
Improving Recombination in a Linear EBMT System by Use of Constraints (Submitted in July 2011, Defended in February 2012)

Abstract

(Automatic) Machine Translation (MT) is one of the most challenging domains in Natural Language Processing and plays an important role in ensuring global communication, especially in a multilingual world with access to large amounts of Internet resources. As rule-based MT approaches need manually developed resources, new MT directions have been developed over the last twenty years under the heading of corpus-based machine translation (CBMT), namely Statistical MT (SMT) and Example-based Machine Translation (EBMT). These new directions are based mainly on the existence of a parallel aligned corpus and, therefore, can be easily employed for lower-resourced languages.

In this dissertation we showed how EBMT systems react when a lower-resourced inflecting language (i.e. Romanian) is involved in the translation process. For this purpose we built an EBMT baseline system based only on surface forms (the Lin-EBMT system). One of our main goals was to investigate the impact of word-order constraints on the translation results: we integrated constraints extracted from generalized examples (i.e. templates) into Lin-EBMT and built the Lin-EBMTREC+ system. Although constraints are a well-known method that is employed quite often in NLP, the use of word-order constraints in an EBMT system is an innovative approach which can open new paths in the domain of example-based MT. We ran our experiments for two language pairs in both directions of translation: Romanian-German and Romanian-English. This aspect raises interesting questions, as Romanian and German present language-specific characteristics which make the translation process even more challenging.

Both EBMT systems developed are easily adaptable to other language pairs. They are platform and language-pair independent, provided that a parallel aligned corpus for the language pair exists and that the tools used for obtaining the needed intermediate information (e.g. word alignment) are available. As a side question, we studied how EBMT performs in comparison to SMT. We compared the EBMT results with those provided by a Moses-based SMT system and by the Google Translate on-line system.

To provide a complete view of CBMT, the performance of each MT system was assessed in several experimental settings, using different corpora (type and size), various system settings and additional part-of-speech information. We evaluated the translation results by means of three automatic evaluation metrics: BLEU, NIST and TER. A subset of the results was manually analyzed for a better overview of the translation quality.

Our experiments showed that constraints improve translation results, although we could not clearly determine which combination of constraints works best. Although the SMT system outperformed the EBMT system in all experiments, the manual analysis revealed cases in which EBMT offered more accurate results. The behavior of the systems under changing experimental settings confirmed that (training and test) data have a substantial impact on both MT approaches. The difference between the results of the two MT approaches decreased when a more restricted corpus was used. As expected, both CBMT approaches worked better for shorter sentences.

Publications on the topic

The EBMT System

-- GavrilaMonica -- 13 Oct 2011


Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.