Domain Adaptation in Machine Translation
since June 2013
Current state-of-the-art techniques for domain adaptation in statistical machine translation (SMT) include mixture models that assign a lower weight to the training (out-of-domain) data and a higher weight to the test (in-domain) data. Mining unknown words by building dictionaries from different resources also helps improve the translation system. A related problem in machine learning, transductive transfer learning, learns a scoring function given two different domains and a single task. However, few techniques from the machine learning community have been carried over to machine translation. Methods such as bootstrapping and structural correspondence learning have been used for parser adaptation and opinion mining adaptation, but not for translation adaptation. One reason lies in the mismatch between domain adaptation of statistical machine translation models and the transductive transfer learning setting, which presupposes a feature space and a label space.
The main focus of this research is on fulfilling the theoretical conditions needed to apply transfer learning algorithms to domain adaptation in SMT, and on successfully implementing and applying these algorithms in a setting with low-resource languages such as Romanian and divergent domains such as Biology and Geography.
Persons involved: Mirela-Stefania Duma (Ph.D. work), Cristina Vertan, Walther v. Hahn, Wolfgang Menzel