Tagging And More

  • Lezius, Wolfgang (2000): Morphy - German Morphology, Part-of-Speech Tagging and Applications. Presented at EURALEX 2000, Stuttgart, Germany. (PS)
  • Lezius, Wolfgang, Rapp, R., Wettler, M. (1998): A Freely Available Morphological Analyzer, Disambiguator, and Context Sensitive Lemmatizer for German. In: Proceedings of the COLING-ACL 1998. (PS)
  • Adwait Ratnaparkhi (1996): A Maximum Entropy Model for Part-Of-Speech Tagging. In Proceedings of the Empirical Methods in Natural Language Processing Conference, May 17-18, 1996. University of Pennsylvania (PS)

Ratnaparkhi applies a maximum entropy (ME) model to tagging the WSJ corpus. The words of the two preceding and two following tokens, together with the tags of the two preceding tokens, are available as features, and features that occur fewer than 10 times are ignored. The baseline model reaches 96.43% accuracy overall and 86.23% on unknown words.
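The feature selection described above can be sketched roughly as follows. This is a minimal illustration, not Ratnaparkhi's implementation: the function names, the `<PAD>` boundary symbol, and the exact set of templates are assumptions made for the example. Only the tags of preceding tokens are used, on the assumption that tagging proceeds left to right so that following tags are not yet known.

```python
from collections import Counter

PAD = "<PAD>"  # hypothetical boundary symbol for positions outside the sentence


def context_features(words, tags, i):
    """Return contextual predicates for position i (illustrative templates)."""
    def w(j):
        return words[j] if 0 <= j < len(words) else PAD

    def t(j):
        return tags[j] if 0 <= j < len(tags) else PAD

    return [
        f"w0={w(i)}",
        f"w-1={w(i - 1)}",
        f"w-2={w(i - 2)}",
        f"w+1={w(i + 1)}",
        f"w+2={w(i + 2)}",
        f"t-1={t(i - 1)}",
        f"t-2,t-1={t(i - 2)},{t(i - 1)}",
    ]


def select_features(corpus, cutoff=10):
    """Keep (predicate, tag) pairs seen at least `cutoff` times in the corpus.

    `corpus` is an iterable of (words, tags) sentence pairs.
    """
    counts = Counter()
    for words, tags in corpus:
        for i, tag in enumerate(tags):
            for pred in context_features(words, tags, i):
                counts[(pred, tag)] += 1
    return {feat for feat, n in counts.items() if n >= cutoff}
```

With a toy corpus in which one sentence is repeated ten times and another occurs once, only the pairs from the repeated sentence (and pairs shared by both) survive the cutoff.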

Some words (about, that, more, etc.) are particularly difficult, accounting for 2 to 3 per mille of all errors. The model is therefore refined so that features can ask about the identity of the current word itself and, for example, treat `that' differently from other RB items. This largely fails because the annotation is too inconsistent to support such fine distinctions; when the experiment is restricted to sentences annotated by the same annotator, accuracy can be raised to 97%.
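One way to picture such word-specific features is to conjoin the identity of the current word with its context, but only for a short list of difficult words. The sketch below is an assumption-laden illustration: the word list contains just the three examples named in the text, and the function name, the `<PAD>` symbol, and the three conjunction templates are hypothetical rather than taken from the paper.

```python
DIFFICULT_WORDS = {"about", "that", "more"}  # only the examples named in the text
PAD = "<PAD>"  # hypothetical boundary symbol


def specialized_features(words, prev_tags, i, difficult=DIFFICULT_WORDS):
    """Return predicates tying the current word to its context.

    `prev_tags` holds the tags already assigned to words[0:i].
    Words outside the `difficult` set get no extra predicates.
    """
    word = words[i].lower()
    if word not in difficult:
        return []
    prev_word = words[i - 1] if i > 0 else PAD
    next_word = words[i + 1] if i + 1 < len(words) else PAD
    prev_tag = prev_tags[i - 1] if 0 < i <= len(prev_tags) else PAD
    return [
        f"w0={word}&w-1={prev_word}",
        f"w0={word}&w+1={next_word}",
        f"w0={word}&t-1={prev_tag}",
    ]
```

Because these predicates fire only for the listed words, they let the model learn separate weights for, say, `that' after a verb, without changing the features seen by any other token.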

  • Adwait Ratnaparkhi (1997). A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania. (PS)
  • Jakub Zavrel and Walter Daelemans (1999): Recent Advances in Memory-Based Part-of-Speech Tagging. In: Actas del VI Simposio Internacional de Comunicacion Social, Santiago de Cuba, pp. 590-597, 1999. ILK pub: ILK-9903. (PS)
  • Walter Daelemans, Jakub Zavrel, Peter Berck and Steven Gillis: MBT: A Memory-Based Part of Speech Tagger-Generator. In: E. Ejerhed and I. Dagan (eds.) Proceedings of the Fourth Workshop on Very Large Corpora, Copenhagen, Denmark, 14-27, 1996.
  • Oliver Lorenz (1996): Automatische Wortformenerkennung für das Deutsche im Rahmen von Malaga. Magisterarbeit. Friedrich-Alexander-Universität Erlangen-Nürnberg, Abteilung für Computerlinguistik. (PS)
  • Helmut Schmid (1997): Probabilistic Part-of-Speech Tagging Using Decision Trees. In Daniel Jones and Harold Somers (editors), New Methods in Language Processing, Studies in Computational Linguistics, pp. 154-164. UCL Press, London, GB. (PS)
  • Helmut Schmid (1995): Improvements in Part-of-Speech Tagging with an Application to German. In Proceedings of the ACL SIGDAT-Workshop, pp. 47-50. (PS)

Source:

-- MichaelDaum - 30 Aug 2002
 