Questions for self-monitoring: Transformers
What is the core idea of attention-based computing?
How is attention used in encoder-decoder models?
How can attention be applied to overcome the vanishing gradient problem in RNN encoders?
What does the architecture of a transformer look like?
What are residual connections and what purpose do they serve?
What is multi-head self-attention and why is it used?
How is multi-head self-attention integrated into the network architecture?
If recurrence is abandoned, word order information is lost. How can it be reintroduced?
How are transformers trained?
-- WolfgangMenzel - 09 Mar 2023
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.