Teach TnT Caps Behaviour
Description
The German grammar relies heavily on TnT's part-of-speech predictions. However, TnT
consistently assumes that capitalized words are nouns, so it mistags the sentence
"Sucht Ihr mich?" as NN PPER NN (rather than VVFIN PPER NN). Since TnT is closed-source, we
cannot teach it to question a sentence-initial capital letter. We can, however, also try the
lowercased variant "sucht Ihr mich?", which is tagged correctly. Therefore, the wrapper
deutsch-tagger.pl should be changed so that it (optionally?) performs the following algorithm:
- when a sentence starts with an uppercase word, tag both variants of the sentence
- take the predicted scores for both variants and average them
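
The two steps above could be sketched roughly as follows. This is only an illustration in Python, not the Perl code of deutsch-tagger.pl. It assumes a hypothetical tagger callable that takes a list of tokens and returns one {tag: score} dictionary per token; how the scores are actually obtained from TnT is not covered here. The function names are made up for this sketch.

```python
from typing import Callable, Dict, List

# Assumed (hypothetical) tagger interface: tokens in, one {tag: score} dict per token out.
TagScores = Dict[str, float]
Tagger = Callable[[List[str]], List[TagScores]]


def lowercase_initial(tokens: List[str]) -> List[str]:
    """Return a copy of tokens with the first letter of the first word lowercased."""
    if not tokens:
        return []
    first = tokens[0]
    return [first[:1].lower() + first[1:]] + tokens[1:]


def average_scores(a: List[TagScores], b: List[TagScores]) -> List[TagScores]:
    """Average two per-token score lists; a tag missing on one side counts as 0."""
    averaged = []
    for scores_a, scores_b in zip(a, b):
        tags = set(scores_a) | set(scores_b)
        averaged.append(
            {t: (scores_a.get(t, 0.0) + scores_b.get(t, 0.0)) / 2 for t in tags}
        )
    return averaged


def tag_with_caps_fallback(tokens: List[str], tagger: Tagger) -> List[TagScores]:
    """Tag the sentence as given; if it starts with an uppercase letter,
    also tag the lowercased variant and average the scores of both runs."""
    if not tokens or not tokens[0][:1].isupper():
        return tagger(tokens)
    return average_scores(tagger(tokens), tagger(lowercase_initial(tokens)))
```

Both variants have the same number of tokens, so the scores can be averaged position by position. Treating a tag that one run did not propose as having score 0 is a design choice of this sketch; the ticket does not say how such cases should be handled.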
More great examples:
- Mauer Börsenstart der Telekom Austria
- Betrug der Umsatz 1998 noch 1,72 Milliarden US-Dollar, steigerte er sich 1999 auf 3,59 Milliarden.
Comments
--
KilianAFoth on 04 May 2006, 10:20:53