"Automatic Recognition and Morphological Classification of Unknown German Nouns

Preslav Nakov, Galia Angelova, Walther von Hahn"

The work presented here was performed in 2001 as a scientific project within the BIS-21 "Center of Excellence" project, ICA1-2000-70016, and was supported by the cooperation between Hamburg University and Sofia University "St. Kl. Ohridski".

ABSTRACT: A system for the recognition and morphological classification of unknown German words is described. The MorphoClass system takes raw text as input and outputs a list of the unknown nouns together with hypotheses about their morphological class and stem. The morphological classes used uniquely identify a word's gender and the inflectional endings it takes for changes in case and number. MorphoClass exploits global information (ending guessing rules, maximum likelihood estimates, word frequency statistics), local information (adjacent context), morphological properties (compounding, inflection, affixes) and external linguistic knowledge (specially designed lexicons, German grammar information, etc.). The task is solved as a sequence of subtasks: unknown word identification, noun identification, recognition and grouping of inflected forms of the same word (they must share the same stem), compound splitting, morphological stem analysis, generation of stem hypotheses for each group of inflected forms, and finally production of a ranked list of hypotheses about the possible morphological class of each group of words. MorphoClass is a tool for lexical acquisition: it identifies unknown words in raw text, derives their properties and classifies them. Currently only nouns are processed, but the approach can be applied to other parts of speech (especially when the PoS of the unknown word is already determined) as well as to other inflectional languages.
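The subtask of grouping inflected forms that share a stem can be illustrated with a minimal sketch. It is not the MorphoClass implementation: the ending list `ENDINGS`, the minimum stem length of three characters, and the function names `candidate_stems` and `group_by_stem` are all assumptions introduced here for illustration only.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny set of German noun inflection endings.
# The empty string stands for the uninflected form. A real system would
# take its endings from the morphological classes it uses.
ENDINGS = ("", "e", "en", "er", "es", "n", "s")


def candidate_stems(word):
    """Return every stem obtained by stripping one known ending from word."""
    stems = set()
    for ending in ENDINGS:
        # Assumption: a stem must keep at least three characters.
        if word.endswith(ending) and len(word) > len(ending) + 2:
            stems.add(word[:len(word) - len(ending)])
    return stems


def group_by_stem(words):
    """Map each candidate stem to the word forms it could account for.

    Only stems shared by at least two distinct forms are kept, since a
    group of inflected forms is what supports a stem hypothesis.
    """
    groups = defaultdict(set)
    for word in words:
        for stem in candidate_stems(word):
            groups[stem].add(word)
    return {stem: forms for stem, forms in groups.items() if len(forms) > 1}


groups = group_by_stem(["Tisch", "Tisches", "Tische", "Haus"])
```

For the four input words, both `Tisch` (covering all three forms of the word) and `Tische` (covering only `Tische` and `Tisches`) survive as stem hypotheses, while `Haus` forms no group. Competing hypotheses of this kind are why a later ranking step is needed.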

