Tagging And More
- Lezius, Wolfgang (2000): Morphy - German Morphology, Part-of-Speech Tagging and Applications. Presented at EURALEX 2000, Stuttgart, Germany. (PS)
- Lezius, Wolfgang, Rapp, R., Wettler, M. (1998): A Freely Available Morphological Analyzer, Disambiguator, and Context Sensitive Lemmatizer for German. In: Proceedings of the COLING-ACL 1998. (PS)
- Adwait Ratnaparkhi (1996): A Maximum Entropy Model for Part-Of-Speech Tagging. In Proceedings of the Empirical Methods in Natural Language Processing Conference, May 17-18, 1996. University of Pennsylvania. (PS)
Ratnaparkhi applies a maximum entropy (ME) model to tagging the WSJ
corpus. The surrounding words (two on each side) and the tags of the
two preceding tokens are available as features, and features that
occur fewer than 10 times are ignored. The initial accuracy is 96.43%
overall and 86.23% on unknown words.
Since some words (about, that, more etc.) are particularly difficult,
accounting for 2 to 3 per mille of all errors, the model is refined so
that features can ask about the identity of the word in question
itself, and treat e.g. `that' differently from other RB items. This
largely fails because the annotation itself is too inconsistent to
support such fine distinctions; when the experiment is restricted to
sentences annotated by the same annotator, accuracy can be raised to
97%.
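The feature setup summarized above can be sketched roughly as follows. This is a minimal illustration, not Ratnaparkhi's implementation: the function names, the feature string format, and the `<PAD>` symbol are assumptions made here, and the sketch assumes left-to-right tagging, so only the tags of preceding tokens are used. It covers only the context window and the count cutoff, not the ME training itself.

```python
from collections import Counter

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def context_predicates(words, tags, i):
    """Context predicates for token i: words in a +/-2 window, plus previous tags."""
    def at(seq, j):
        return seq[j] if 0 <= j < len(seq) else PAD

    return [
        f"w-1={at(words, i - 1)}",
        f"w-2={at(words, i - 2)}",
        f"w+1={at(words, i + 1)}",
        f"w+2={at(words, i + 2)}",
        f"t-1={at(tags, i - 1)}",
        f"t-2,t-1={at(tags, i - 2)},{at(tags, i - 1)}",
    ]


def count_features(corpus):
    """Count (context predicate, gold tag) pairs over a tagged corpus.

    corpus is an iterable of (words, tags) pairs of equal length.
    """
    counts = Counter()
    for words, tags in corpus:
        for i, gold in enumerate(tags):
            for pred in context_predicates(words, tags, i):
                counts[(pred, gold)] += 1
    return counts


def prune(counts, cutoff=10):
    """Keep only features seen at least `cutoff` times."""
    return {feat for feat, n in counts.items() if n >= cutoff}


# Toy usage: a sentence repeated 12 times passes the cutoff of 10.
corpus = [(["the", "dog", "barks"], ["DT", "NN", "VBZ"])] * 12
kept = prune(count_features(corpus))
```

The cutoff is applied to feature counts rather than word counts, so a frequent word can still lose features for contexts in which it rarely appears.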
- Adwait Ratnaparkhi (1997). A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania. (PS)
- Jakub Zavrel and Walter Daelemans (1999): Recent Advances in Memory-Based Part-of-Speech Tagging. In: Actas del VI Simposio Internacional de Comunicacion Social, Santiago de Cuba, pp. 590-597, 1999. ILK pub: ILK-9903. (PS)
- Walter Daelemans, Jakub Zavrel, Peter Berck and Steven Gillis (1996): MBT: A Memory-Based Part of Speech Tagger-Generator. In: E. Ejerhed and I. Dagan (eds.) Proceedings of the Fourth Workshop on Very Large Corpora, Copenhagen, Denmark, pp. 14-27.
- Oliver Lorenz (1996): Automatische Wortformenerkennung für das Deutsche im Rahmen von Malaga. Magisterarbeit. Friedrich-Alexander-Universität Erlangen-Nürnberg, Abteilung für Computerlinguistik. (PS)
- Helmut Schmid (1997): Probabilistic Part-of-Speech Tagging Using Decision Trees. In Daniel Jones and Harold Somers (editors), New Methods in Language Processing, Studies in Computational Linguistics, pp. 154-164. UCL Press, London, GB. (PS)
- Helmut Schmid (1995): Improvements in Part-of-Speech Tagging with an Application to German. In Proceedings of the ACL SIGDAT-Workshop, pp. 47-50. (PS)
-- MichaelDaum - 30 Aug 2002