Q: Do we capture long-term dependencies in n-gram language models? [LE]
A: No, that is impossible. Everything beyond the (n-1)-word history of the n-gram is invisible to the model.
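A minimal sketch (the corpus and the helper function are made up for illustration, not part of the original answer) of a trigram model, showing that only the last two words are available as conditioning context, so a subject-verb agreement dependency that spans more words simply cannot be captured:

# Trigram LM: P(w_i | w_{i-2}, w_{i-1}) -- the history is a fixed two-word window.
from collections import defaultdict, Counter

def train_trigram_lm(corpus):
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>", "<s>"] + sentence + ["</s>"]
        for i in range(2, len(tokens)):
            history = (tokens[i - 2], tokens[i - 1])  # only the last n-1 = 2 words
            counts[history][tokens[i]] += 1
    return counts

corpus = [
    "the dog that chased the cats runs fast".split(),
    "the dogs that chased the cat run fast".split(),
]
lm = train_trigram_lm(corpus)

# The number of "dog(s)" determines "run(s)", but that subject lies outside the
# two-word history, so the model conditions only on the intervening object noun.
print(lm[("the", "cats")])   # Counter({'runs': 1})
print(lm[("the", "cat")])    # Counter({'run': 1})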

Q: What is the use of multi-head self-attention in a Transformer language model? [LE]
A: The different heads can learn and differentiate different kinds of attention aspects, e.g. relationships within a phrase (agreement in complex NPs or between distant parts of a VP, or government and selection phenomena between a verb and its complements).
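A minimal sketch of multi-head self-attention (shapes and parameter names are assumptions for illustration), showing that each head applies its own projections and therefore produces its own attention pattern over the same input positions:

import torch
import torch.nn.functional as F

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # x: (seq_len, d_model); w_q/w_k/w_v/w_o: (d_model, d_model)
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project once, then split the result into heads: (num_heads, seq_len, d_head)
    def split(t):
        return t.view(seq_len, num_heads, d_head).transpose(0, 1)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)

    # Each head computes its own scaled dot-product attention distribution.
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # (num_heads, seq, seq)
    weights = F.softmax(scores, dim=-1)
    heads = weights @ v                                # (num_heads, seq, d_head)

    # Concatenate the heads and mix them with the output projection.
    out = heads.transpose(0, 1).reshape(seq_len, d_model) @ w_o
    return out, weights

d_model, num_heads, seq_len = 8, 2, 5
x = torch.randn(seq_len, d_model)
params = [torch.randn(d_model, d_model) for _ in range(4)]
out, weights = multi_head_self_attention(x, *params, num_heads=num_heads)
print(weights.shape)   # (2, 5, 5): one attention pattern per head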

Q: Why is padding used in NLP? [LE]
A: Padding is rarely used in NLP. It denotes filling differently long representations with empty elements to make them equal in size, a prerequisite for processing with NNs. Usually padding is used to increase the dimensionality of a vector representation by adding zero elements. Sometimes it is also used to enforce equally long sentences, in my opinion a very strange measure for any NLP data.
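A minimal sketch (the token ids are invented for illustration) of padding differently long sequences with zeros so they fit into one tensor of equal-length rows, which is what NN batch processing requires:

import torch
from torch.nn.utils.rnn import pad_sequence

sentences = [
    torch.tensor([4, 17, 9]),          # 3 tokens
    torch.tensor([4, 8, 23, 5, 2]),    # 5 tokens
]

# Shorter sequences are filled with the padding id 0 up to the longest length.
batch = pad_sequence(sentences, batch_first=True, padding_value=0)
print(batch)
# tensor([[ 4, 17,  9,  0,  0],
#         [ 4,  8, 23,  5,  2]])

# A mask records which positions are real tokens and which are padding,
# so the network can ignore the padded positions.
mask = batch != 0
print(mask)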

Q: In a Transformer model, what would happen if we give it syntactically incorrect input for translation? Would it give us back a syntactically incorrect translation in the target language? [YM]
A: I can only speculate, because I never tried this. I assume, however, that the syntactic form of the output will be quite good despite the errors in the input, since the target sentence is generated from a highly abstract representation of the meaning of the sentence. It should retain only few cues of the syntactic form of the input.

-- WolfgangMenzel - 12 Mar 2023