Teach Tnt Caps Behaviour

Description

The German grammar relies heavily on TnT's part-of-speech predictions. However, TnT consistently assumes that capitalized words are nouns, so it mistags the sentence "Sucht Ihr mich?" as NN PPER NN (rather than VVFIN PPER NN). Since TnT is closed-source, we cannot make it question an initial uppercase letter. We can, however, also try the alternative version "sucht Ihr mich?", which is tagged correctly. The wrapper deutsch-tagger.pl should therefore be changed so that it (optionally?) performs the following algorithm:

  • when a sentence starts with an uppercase word, tag both variants of the sentence (as written, and with the first letter lowercased)
  • take the predicted scores for both variants and average them
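
The two steps above can be sketched as follows. This is only an illustration in Python, not the code of deutsch-tagger.pl: the `tagger` argument is a hypothetical callable that returns one tag-to-score mapping per token, and how such scores would actually be obtained from TnT is left open. All function names are invented for this sketch.

```python
from typing import Callable, Dict, List

# Hypothetical interface (an assumption, not TnT's real API):
# a tagger takes a token list and returns one {tag: score} dict per token.
TagScores = Dict[str, float]
Tagger = Callable[[List[str]], List[TagScores]]


def lowercase_initial(tokens: List[str]) -> List[str]:
    """Return a copy of the sentence with the first letter lowercased."""
    if not tokens:
        return []
    first = tokens[0]
    return [first[:1].lower() + first[1:]] + list(tokens[1:])


def average_scores(a: TagScores, b: TagScores) -> TagScores:
    """Average two score mappings; a tag missing from one side counts as 0."""
    return {tag: (a.get(tag, 0.0) + b.get(tag, 0.0)) / 2
            for tag in set(a) | set(b)}


def tag_both_variants(tokens: List[str], tagger: Tagger) -> List[TagScores]:
    """Tag the sentence as written and, if it starts with an uppercase
    letter, also with a lowercased first word, then average the scores."""
    original = tagger(tokens)
    if not tokens or not tokens[0][:1].isupper():
        # Nothing to question: the sentence does not start in uppercase.
        return original
    variant = tagger(lowercase_initial(tokens))
    return [average_scores(a, b) for a, b in zip(original, variant)]
```

With a tagger that assigns NN to "Sucht" and VVFIN to "sucht", the averaged result for the first token contains both tags, so the verb reading stays available to the grammar instead of being ruled out by the capital letter.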

More examples in which the sentence-initial word is not a noun:

  • Mauer Börsenstart der Telekom Austria
  • Betrug der Umsatz 1998 noch 1,72 Milliarden US-Dollar, steigerte er sich 1999 auf 3,59 Milliarden.

Comments

This is now done by peeking into deutsch-lexikon.cdg instead. No second TnT pass is necessary.
-- KilianAFoth on 04 May 2006, 10:20:53

 

 