Questions for self-monitoring: Transformers

  • What is the core idea of attention-based computing? (A minimal sketch follows this list.)
  • How is attention used in encoder-decoder models?
  • How can attention be applied to overcome the problem of vanishing gradients in RNN encoders?
  • What does the architecture of a transformer look like?
  • What are residual connections and what purpose do they serve?
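As a reference point for the first question, here is a minimal sketch of scaled dot-product attention in plain NumPy. The shapes and the random toy inputs are illustrative assumptions only, not part of the course material.

<verbatim>
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # similarity of each query to each key, scaled by sqrt(d_k)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax over the keys yields the attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # each output vector is a weighted sum of the value vectors
    return weights @ V

# toy example: 3 positions, embedding width 4 (sizes chosen only for illustration)
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
</verbatim>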

  • What is multi-head self-attention and why is it used?
  • How is multi-head self-attention integrated into the network architecture?
  • If recurrence is abandoned, word order information is lost. How can it be reintroduced? (See the positional-encoding sketch after this list.)
  • How are transformers trained?
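For the question on reintroducing word order, the following is a minimal sketch of the fixed sinusoidal positional encoding from "Attention Is All You Need"; the sequence length and model width in the example are arbitrary assumptions.

<verbatim>
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # fixed sine/cosine vectors, one per position, added to the token embeddings
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                   # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dimensions: cosine
    return pe

# 10 positions with a hypothetical model width of 16
print(sinusoidal_positional_encoding(10, 16).shape)   # (10, 16)
</verbatim>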

-- WolfgangMenzel - 09 Mar 2023
 