Hybrid Parsing
Kilian Foth, Wolfgang Menzel, Natural Language Systems Group, Hamburg University, Germany
Parsing natural language sentences requires taking into account evidence from a wide variety of knowledge sources. In addition to the rules of a grammar, these include a huge number of syntactic and semantic preferences, which in many cases are highly lexicalized, i.e. they depend on the identity of a word more than on its morphosyntactic properties. The situation becomes even more complex because model components meant to capture such preferences impose quite different requirements on the data from which the necessary information can be extracted: while some can be trained on plain text, others require sophisticated annotations with categories or even tree structures. Suitable training data are therefore available in vastly different amounts.
Unfortunately, model components which deal with preferences are by no means fully reliable. If several of them are to be integrated into a single system, that system must be able to deal with inconsistencies, since different components are likely to produce conflicting predictions. A formalism which is able to arbitrate between conflicting predictions while deciding on the structure of a sentence is Weighted Constraint Dependency Grammar (WCDG), where rules (i.e. constraints) are generally defeasible and individually scored to indicate how important a particular constraint should be considered. The parser then takes all available evidence into account by combining the scores of violated constraints, trying to find the structure which globally ranks best. In case of conflict, constraints based on strong evidence simply override the information provided by weaker ones.
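The arbitration between defeasible constraints can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy sentence, the five constraints and their penalties are invented, the penalties are assumed to lie in [0, 1] and to be combined by multiplication (0.0 marking a hard constraint), and the exhaustive enumeration stands in for whatever solution procedure a real parser would use. It is not the WCDG constraint language itself.

```python
from itertools import product

# Toy sentence; position 0 is an artificial root node.
WORDS = ["<root>", "she", "sees", "stars"]
TAGS = {"she": "PRON", "sees": "VERB", "stars": "NOUN"}


def tag(i):
    """Tag of the word at position i (only valid for i >= 1)."""
    return TAGS[WORDS[i]]


# Hypothetical constraints: (description, penalty, check(dep, head)).
# A penalty near 1.0 is a weak preference; 0.0 forbids the structure.
CONSTRAINTS = [
    ("no word is its own head", 0.0,
     lambda d, h: d != h),
    ("only verbs attach to the root", 0.1,
     lambda d, h: h != 0 or tag(d) == "VERB"),
    ("verbs attach to the root", 0.2,
     lambda d, h: tag(d) != "VERB" or h == 0),
    ("non-verbs attach to a verb", 0.3,
     lambda d, h: tag(d) == "VERB" or (h != 0 and tag(h) == "VERB")),
    ("prefer the left neighbour as head", 0.9,
     lambda d, h: h == 0 or h == d - 1),
]


def score(heads):
    """Multiply the penalties of all constraints violated by an analysis.

    `heads` maps each dependent position to the position of its head.
    """
    result = 1.0
    for dep, head in heads.items():
        for _, penalty, satisfied in CONSTRAINTS:
            if not satisfied(dep, head):
                result *= penalty
    return result


def best_analysis():
    """Enumerate every head assignment and return the best-scoring one."""
    deps = range(1, len(WORDS))
    candidates = (
        dict(zip(deps, heads))
        for heads in product(range(len(WORDS)), repeat=len(WORDS) - 1)
    )
    return max(candidates, key=score)
```

In this toy setting the weak left-neighbour preference is violated by attaching "she" to "sees", but every alternative violates a stronger constraint, so the analysis with "sees" as the root and both other words attached to it ranks best.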
Starting from an introduction to the WCDG formalism and the solution procedures available for the parsing problem, the course will focus on a series of experiments carried out to integrate uncertain evidence from various probabilistic components into the decision on the optimal syntactic structure. The components considered so far are a POS tagger, a chunker, a supertagger, a PP attacher and a general attachment predictor. Special emphasis will be put on the lessons learned when interfacing external predictors to the parser by means of defeasible constraints and mapping their different metric spaces onto constraint weights.
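One conceivable way of mapping a predictor's output onto a constraint weight is sketched here for a POS tagger. The function name, the normalisation by the best-scoring tag and the floor value are all assumptions chosen for illustration, not the mapping used in the experiments; the floor merely keeps the constraint defeasible, so that the predictor can never rule out an analysis on its own.

```python
def tag_penalty(tag_probs, chosen_tag, floor=0.05):
    """Turn a tagger's probability for `chosen_tag` into a penalty in [floor, 1].

    The tagger's preferred tag receives 1.0 (no penalty); every other tag is
    penalised in proportion to how far its probability falls below the best.
    """
    best = max(tag_probs.values())
    if best <= 0.0:
        raise ValueError("tagger assigned no probability mass to any tag")
    ratio = tag_probs.get(chosen_tag, 0.0) / best
    # Hypothetical floor: never let the predictor act as a hard constraint.
    return max(floor, ratio)


# Usage with an invented tagger distribution for one word.
probs = {"NOUN": 0.6, "VERB": 0.3, "ADJ": 0.1}
noun_penalty = tag_penalty(probs, "NOUN")   # preferred tag, no penalty
verb_penalty = tag_penalty(probs, "VERB")   # half as likely as the best tag
adv_penalty = tag_penalty(probs, "ADV")     # unseen tag, clipped to the floor
```

Normalising by the best tag rather than using the raw probability means that an uncertain tagger does not penalise its own top choice, which is one way to keep predictors with differently shaped outputs comparable.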
-- WolfgangMenzel -- 31 May 2006