Q: Is it unigram tokenization or BPE that works better for MT? Why? [DM]
A: See Jurafsky & Martin page 254

Q: I need more clarification (working principles) of Data Augmentation and Multilingual Models for low-resource situations. [DM]
A: Data augmentation (sometimes also called bootstrapping) uses a small data set to train a model of low quality. This model is then used to generate additional output (synthetic annotations) for additional input data. In the case of an MT task, one can usually assume that only limited amounts of bilingual data are available, but huge amounts of monolingual data. Thus, an initial system can be trained on the bilingual data, and the resulting model can be used to produce synthetic bilingual data from the monolingual data. This procedure can be repeated several times. Surprisingly, this approach is beneficial, because the amount of additional training data that can be produced outweighs its lower quality. Sometimes even cruder techniques (like the duplication of training items) are applied.
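A minimal, runnable sketch of this iterative procedure; the word-for-word "model" below is only a toy stand-in for a real MT system, and all names are illustrative rather than taken from any particular toolkit:
<verbatim>
# Sketch of iterative data augmentation for MT (self-training style).
# train_mt and translate are toy stand-ins for real training / decoding steps;
# only the loop structure mirrors the procedure described above.

def train_mt(parallel_pairs):
    """Toy 'model': a word-for-word dictionary learned from the parallel pairs."""
    table = {}
    for src, tgt in parallel_pairs:
        for s, t in zip(src.split(), tgt.split()):
            table.setdefault(s, t)
    return table

def translate(model, src):
    return " ".join(model.get(w, w) for w in src.split())

def augment(bilingual, monolingual_src, rounds=3):
    corpus = list(bilingual)                        # real (source, target) pairs
    for _ in range(rounds):
        model = train_mt(corpus)                    # train on real + synthetic data
        synthetic = [(s, translate(model, s)) for s in monolingual_src]
        corpus = list(bilingual) + synthetic        # keep originals, add synthetic pairs
    return train_mt(corpus)                         # final model on the enlarged corpus

model = augment([("das haus", "the house")], ["das haus ist klein"])
</verbatim>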
A multilingual model usually covers both languages supplied with rich resources and others which are under-resourced. The model learns cross-lingual representations which can be expected to capture the universal (language-independent) regularities of natural-language meaning. Thus, representations learned from the large data sets of some languages can help to compensate for the data sparseness of the others.
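One common way to set up such a model (not stated explicitly above, so treat it as an illustrative assumption) is to pool the data of all language pairs into a single system and mark each source sentence with a target-language tag:
<verbatim>
# Illustrative data pooling for one multilingual model: all language pairs share
# one system, and a target-language tag is prepended to every source sentence.
# The sentences and tags below are made up for illustration.

pairs = [
    ("de", "en", "das haus ist klein", "the house is small"),        # high-resource pair
    ("xx", "en", "<low-resource source sentence>", "the house is small"),
]

def tag_source(src_lang, tgt_lang, src, tgt):
    """Mark the desired target language on the source side."""
    return f"<2{tgt_lang}> {src}", tgt

training_data = [tag_source(*p) for p in pairs]
# A shared subword vocabulary and shared parameters let representations learned
# from the high-resource pair support the low-resource one.
</verbatim>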

Q: There are different types of tokenization, like SentencePiece tokenization (unigram tokenization) and the WordPiece algorithm (BPE tokenization). Is there any difference (performance difference) between them for groups of languages like Cold languages and Hot languages (pages 252-254)? [DH]
A: I see little connection between these two phenomena. The distinction between hot and cold languages refers to the possibility/necessity of dropping pronouns in some languages, while they are obligatory in others. The two tokenization procedures, on the other hand, just differ in the direction of the subword generation process (merging vs. segmenting).
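If one wants to compare the two subword procedures empirically, the SentencePiece library supports both; a minimal sketch, where the corpus file name and vocabulary size are placeholders:
<verbatim>
# Minimal comparison of the two subword procedures with SentencePiece.
# 'corpus.txt' and the vocabulary size are placeholders; both model types are
# trained on the same data and applied to the same sentence.
import sentencepiece as spm

for model_type in ("bpe", "unigram"):
    spm.SentencePieceTrainer.train(
        input="corpus.txt",           # one sentence per line
        model_prefix=model_type,
        vocab_size=8000,
        model_type=model_type,        # "bpe" merges pairs, "unigram" prunes a large vocabulary
    )

for model_type in ("bpe", "unigram"):
    sp = spm.SentencePieceProcessor(model_file=f"{model_type}.model")
    print(model_type, sp.encode("a sample sentence to segment", out_type=str))
</verbatim>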
    
Q: Companies like Google, Microsoft and others are in competition to attract more customers. Can't we say they have better (secret) language models that we don't know about or don't learn about right now? Can we say our (the world's) knowledge about language models or machine translation is the latest? [YM]
A: In principle that could be possible, but I assume the effect is only a minor one. In general, the companies you mention are under the pressure of strong competition, not only for customers, but also to attract high-potential candidates as future staff members. To be successful in this competition, they have to continuously demonstrate their top position in research and development. Therefore, they take part, for instance, in open competitions.
Also, many past examples have shown that these companies have a valid interest in publishing their results in the form of toolkits, trained models (e.g. Google: BERT, 5-gram language models) or data sets. In most cases the return on investment considerably outweighs the possible loss of intellectual property rights if as many members of the international research community as possible start working with these results to further develop and refine them.

Q: Why is translating poetry still a difficult task in machine translation? What is recommended to deal with it? What makes it especially difficult? [YM]
A: Because poetry has many more aspects than linguistic meaning (rhythm, tone, connotations, ...) which are hard, if not impossible, to capture from (non-poetry) texts. Therefore, human translators do not transfer a poem by looking into a dictionary but try to reconstruct these aspects in the target language, sometimes in fact coming close to creating a new poem inspired by the original.
    
Q: How to deal with idioms in machine translation? [YM]
A: In rule-based and probabilistic MT, lists of idiomatic expressions have been used (either hand-compiled or learned from bilingual corpora). NN approaches have to rely on the representational power of the encoder/decoder.
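A toy version of such an idiom list, applied as a longest-match lookup before any word-by-word processing; the table and the English paraphrases standing in for target-language translations are invented for illustration:
<verbatim>
# Toy longest-match lookup of idiomatic expressions before translation.
# The idiom table is invented; real systems compile or learn much larger lists
# and handle inflection and word-order variation. Here the 'translation' is
# just an English paraphrase.

IDIOMS = {
    ("kick", "the", "bucket"): "die",
    ("piece", "of", "cake"): "something very easy",
}

def replace_idioms(tokens):
    out, i = [], 0
    while i < len(tokens):
        for length in (3, 2):                       # try longer matches first
            key = tuple(tokens[i:i + length])
            if key in IDIOMS:
                out.append(IDIOMS[key])
                i += length
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

print(replace_idioms("he will kick the bucket soon".split()))
</verbatim>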

Q: Does hybrid MT mean blending rule-based, statistical, and neural MT? If so, how? [LE]
A: Yes, any kind of combination should be covered, although in the literature the term has mostly been used for the combination of rule-based and probabilistic methods. The combination can either be on the level of the models (e.g. a language model for rescoring output alternatives) or as a post-selection between the outputs of concurrently running MT systems, e.g. by means of simple voting or by training and applying a selector component which learns to decide in which cases which translation system produces superior results. Many other combination possibilities can be and have been developed.
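A small sketch of the post-selection idea; the candidate outputs and the toy scorer are invented, and a real setup would use a target language model for rescoring or a trained selector component instead:
<verbatim>
# Sketch of post-selection between the outputs of concurrently running MT systems.
# The candidates and the scorer are toy examples only.
from collections import Counter

candidates = ["the house is small", "house small is", "the house is small"]

def toy_lm_score(sentence):
    """Stand-in for a target language model score (here: simply prefer longer output)."""
    return len(sentence.split())

best_by_lm = max(candidates, key=toy_lm_score)             # rescoring with a (toy) language model
best_by_vote = Counter(candidates).most_common(1)[0][0]    # simple voting among systems
print(best_by_lm, "|", best_by_vote)
</verbatim>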

Q: What are the components of a Machine Translation architecture? [LE]
A: rule-based MT: (morphological and syntactic) analysis, (lexical and structural) transfer, (syntactic and morphological) synthesis
probabilistic MT: transfer, fertility and target language model
neural MT: encoder, decoder
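For the neural case, the two components can be made concrete with a deliberately minimal PyTorch sketch; the layer sizes and the absence of attention are simplifications chosen for brevity, not a recommended architecture:
<verbatim>
# Deliberately minimal encoder/decoder for neural MT (PyTorch), without attention
# or beam search; it only illustrates the two architecture components.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab, emb=256, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)

    def forward(self, src_ids):                   # (batch, src_len)
        _, state = self.rnn(self.embed(src_ids))
        return state                              # summary of the source sentence

class Decoder(nn.Module):
    def __init__(self, tgt_vocab, emb=256, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, tgt_ids, state):            # teacher forcing during training
        output, _ = self.rnn(self.embed(tgt_ids), state)
        return self.out(output)                   # scores over the target vocabulary

# Usage: encode the source, then predict target tokens conditioned on the summary.
enc, dec = Encoder(src_vocab=8000), Decoder(tgt_vocab=8000)
logits = dec(torch.zeros(1, 5, dtype=torch.long), enc(torch.zeros(1, 7, dtype=torch.long)))
</verbatim>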


-- WolfgangMenzel - 09 Mar 2023
 