Semantic Web Technologies for Machine Translation
Satellite Workshop at the MT Summit 2005
12 September, Phuket, Thailand
Invited Talk: Ontologies for Crosslingual Applications, Hans Uszkoreit (DFKI and Saarland University)
With their aim of implementing a semantic structure behind the content of the World Wide Web, Semantic Web activities have recently attracted a large and specialized research community of computer scientists, computational linguists, logicians, knowledge and ontology specialists, programmers, e-commerce specialists, and others.
The Semantic Web needs human language technology, and human language technology will benefit greatly from the Semantic Web. Until now, however, research has addressed mainly the first direction: techniques from human language technology have been used to add meaning to Web data and to make it usable for automatic processing. The second direction, i.e. the use of the new Semantic Web technologies to improve natural language applications, has been neglected.
The development of ontologies for the Semantic Web, their search mechanisms, and the standard formal annotation (e.g. RDF) of large amounts of data on the Web are of high value for monolingual and multilingual natural language (web) applications.
The current workshop focuses on this topic, more precisely on the implications of such Semantic Web technologies for machine translation, a representative subfield of natural language processing. It is well known that multilinguality is one of the main challenges of the Semantic Web. The annotation mechanisms and the development of ontologies and search procedures aim at retrieving relevant information independently of the language in which it was produced. On the other hand, Semantic Web activities will have a major impact on natural language applications that are trained on large corpora.
Example-based machine translation is a case in point: up to now, training has been done on parallel aligned corpora that are, in the best case, additionally annotated with syntactic information. However, large, reliable parallel corpora are available only for a few language pairs and domains. In the absence of such corpora, the Web is the best source for parallel aligned corpora. Aligned via RDF(S) annotations, the Web can be exploited as a multilingual corpus. Moreover, these annotations will provide the semantic information attached to the respective texts. This strategy can have significant implications for example-based machine translation.
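As a rough illustration of the idea of aligning web texts through their annotations, the following minimal Python sketch pairs text fragments in two languages that are attached to the same resource identifier. Everything in it is an assumption made for illustration: the `AnnotatedText` record, the `align_by_resource` helper, the example.org URIs, and the sample sentences are hypothetical, and real RDF(S) annotations would be read with an RDF parser rather than given as a hand-written list.

```python
from collections import defaultdict
from typing import NamedTuple


class AnnotatedText(NamedTuple):
    """A text fragment with a hypothetical RDF-style annotation."""
    resource: str  # URI of the annotated resource (assumed shared across languages)
    lang: str      # language tag of the text, e.g. "en"
    text: str


def align_by_resource(fragments, source_lang, target_lang):
    """Pair fragments in two languages that annotate the same resource URI."""
    by_resource = defaultdict(dict)
    for frag in fragments:
        # Keep the first fragment seen for each (resource, language) pair.
        by_resource[frag.resource].setdefault(frag.lang, frag.text)

    pairs = []
    for resource, texts in by_resource.items():
        # Only resources annotated in both languages yield an aligned pair.
        if source_lang in texts and target_lang in texts:
            pairs.append((resource, texts[source_lang], texts[target_lang]))
    return pairs


# Hypothetical sample data; the URIs and sentences are invented.
fragments = [
    AnnotatedText("http://example.org/id/hotel-42", "en", "The hotel is close to the beach."),
    AnnotatedText("http://example.org/id/hotel-42", "de", "Das Hotel liegt in Strandnähe."),
    AnnotatedText("http://example.org/id/ferry-7", "en", "The ferry leaves at noon."),
]

pairs = align_by_resource(fragments, "en", "de")
for resource, src, tgt in pairs:
    print(resource, "|", src, "|", tgt)
```

In this sketch the shared resource identifier plays the role of the alignment key, and it also carries the semantic link between the two texts; resources annotated in only one language simply produce no pair.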
Knowledge-based machine translation (KBMT) is another technique that can benefit from Semantic Web activities. Until now, KBMT systems have been based mainly on the development of domain-dependent ontologies and on mapping the source language onto the target language via these ontologies. KBMT has been shown to be very successful when applied to restricted domains, but it encounters severe problems with translations of general texts. Semantic Web activities (will) provide a large number of ontologies in various domains, as well as bridges between these ontologies. In this new context, KBMT could become a powerful mechanism for on-line machine translation.
The goal of the workshop is twofold:
- to discuss the implications of Semantic Web technologies for machine translation, namely for example-based and knowledge-based machine translation,
- to contrast the two main technologies of the Semantic Web, topic maps and RDFS, in the machine translation of on-line texts.
We welcome original papers related (but not limited) to the following topics:
- semantic web annotations for multilingual corpora
- use of semantic web annotations for corpus-based machine translation
- integration of semantic information in example-based machine translation
- use of semantic web ontologies for machine translation
- semantic web and on-line translation tools
- integration of semantic web technologies in CAT tools.
We also encourage demonstrations of developed tools. Submissions for a demonstration session should include a 2-page demo note describing the system architecture and performance as well as technical requirements.
- Walther v. Hahn (University of Hamburg)
- Vladislav Kubon (Charles University Prague)
- Cristina Vertan (University of Hamburg)