QSM10 < Addis2023

Questions for self-monitoring: Transformers

What is the core idea of attention-based computing?
How is attention used in encoder-decoder models?
How can attention be applied to overcome the problem of vanishing gradients in RNN encoders?
What does the architecture of a transformer look like?
What are residual connections and what purpose do they serve?

What is multi-head self-attention and why is it used?
How is multi-head self-attention integrated into the network architecture?
If recurrence is abandoned, word order information is lost. How can it be reintroduced?
How are transformers trained?
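
The premise of the word-order question above (without recurrence, order information is lost) can be checked numerically. The following is a minimal sketch, not part of the course material: it assumes single-head scaled dot-product self-attention with identity query/key/value projections and no positional information. The function name `self_attention`, the toy dimensions, and the fixed permutation are illustrative assumptions.

```python
import math
import random

def self_attention(X):
    """Single-head scaled dot-product self-attention.

    Assumption for illustration: identity projections (Q = K = V = X)
    and no positional information added to the inputs.
    """
    d = len(X[0])
    out = []
    for q in X:
        # Similarity of this token (query) to every token (key)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        # Softmax over the scores (shifted by the max for numerical stability)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

random.seed(0)
# Toy "sentence": 5 tokens, each a 4-dimensional embedding
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
perm = [3, 0, 4, 1, 2]  # an arbitrary reordering of the tokens

out = self_attention(X)
out_shuffled = self_attention([X[i] for i in perm])

# Row j of the shuffled output belongs to original token perm[j];
# compare it with that token's output in the original order.
order_is_ignored = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for i, row in zip(perm, out_shuffled)
    for a, b in zip(out[i], row)
)
print(order_is_ignored)
```

If the check holds, each token receives the same output vector regardless of where it stands in the sequence, which is why order information has to be supplied separately.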

-- WolfgangMenzel - 09 Mar 2023

