Questions for self-monitoring: Transformers
What is the core idea of attention-based computing?
How is attention used in encoder-decoder models?
How can attention be applied to overcome the vanishing gradient problem in RNN encoders?
What does the architecture of a transformer look like?
What are residual connections and which purpose do they serve?
What is multi-head self-attention and why is it used?
How is multi-head self-attention integrated into the network architecture?
If recurrence is abandoned, word order information is lost. How can it be reintroduced?
How are transformers trained?
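
As a starting point for the questions above on attention, multi-head self-attention, residual connections, and positional encoding, the sketch below shows one minimal way these pieces fit together in plain Python/NumPy. It is an illustrative toy, not the lecture's reference implementation; all names and hyperparameters (d_model, n_heads, seq_len) are chosen purely for the example.

    # Minimal illustrative sketch: scaled dot-product attention,
    # multi-head self-attention, and sinusoidal positional encoding.
    # Not a complete transformer; hyperparameters are arbitrary.
    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)      # (heads, seq, seq)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
        return weights @ V                                    # (heads, seq, d_k)

    def multi_head_self_attention(X, W_q, W_k, W_v, W_o, n_heads):
        """Project X into n_heads subspaces, attend in each, concatenate, project."""
        seq_len, d_model = X.shape
        d_k = d_model // n_heads
        # Linear projections, then split the feature dimension into heads.
        Q = (X @ W_q).reshape(seq_len, n_heads, d_k).transpose(1, 0, 2)
        K = (X @ W_k).reshape(seq_len, n_heads, d_k).transpose(1, 0, 2)
        V = (X @ W_v).reshape(seq_len, n_heads, d_k).transpose(1, 0, 2)
        heads = scaled_dot_product_attention(Q, K, V)         # (heads, seq, d_k)
        concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
        return concat @ W_o                                   # output projection

    def sinusoidal_positional_encoding(seq_len, d_model):
        """Reintroduce the word-order information lost by dropping recurrence."""
        pos = np.arange(seq_len)[:, None]                     # (seq, 1)
        i = np.arange(d_model // 2)[None, :]                  # (1, d_model/2)
        angles = pos / np.power(10000.0, 2 * i / d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        seq_len, d_model, n_heads = 5, 8, 2
        X = rng.normal(size=(seq_len, d_model))
        X = X + sinusoidal_positional_encoding(seq_len, d_model)
        W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
        out = multi_head_self_attention(X, W_q, W_k, W_v, W_o, n_heads)
        # A residual connection simply adds the layer input back to its output,
        # which keeps gradients flowing through deep stacks.
        print((X + out).shape)   # (5, 8)

In a full transformer block this output would additionally pass through layer normalisation and a position-wise feed-forward network; the sketch only covers the attention-related questions.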
-- WolfgangMenzel - 09 Mar 2023