+ IWPT Paper 2003

8th International Workshop on Parsing Technologies

  • Date: 23-25 April, 2003
  • Location: Nancy, France
  • URL: http://iwpt03.loria.fr
  • Paper title: Subtree Parsing to Speed up Deep Analysis
  • Authors: KilianAFoth, WolfgangMenzel
  • Summary: Within a grammar formalism that treats syntax analysis as a global optimization problem, methods are investigated to improve parsing performance by recombining the solutions of smaller and easier subproblems. The robust nature of the formalism allows the application of this technique with little change to the original grammar.

IWPT 2003 offered a number of quite interesting contributions, some of which I describe here (there are no e-proceedings, sorry). Most relevant to us are the various reports on deterministic dependency parsing.

First of all, a free download was announced by Bob C. Moore from Microsoft Research. Apparently he agreed to release some of the tools he had used for his 2000 report on parsing evaluation, and it took him until now to actually do it. Perl5 implementations of standard context-free parsing algorithms can be downloaded at www.research.microsoft.com/research/downloads, under "Context-Free Parsing Algorithms".

Eugene Charniak: Syntax-based Language Models for Statistical Machine Translation

Eugene Charniak's invited talk presented an experiment in which his own record-holding statistical parser was combined with a statistical MT system to improve the overall result. The parser basically consists of a comparatively simple PCFG which outputs a packed parse forest, from which the most probable tree is then selected by applying a much more elaborate (lexicalized) grammar. The SMT model consists of a translation model (assigning probabilities to pairs of utterances in the source and target language) and a language model (assigning probabilities to trees in the target language). Charniak's parser can be used in place of the simple n-gram language model, and when so used, more syntactically correct sentences are produced. This shows that more informed language models are useful for machine translation. However, the (standard) BLEU measure for translation evaluation does not reflect the assessment of the human evaluators.
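
The noisy-channel setup is easy to state in code. The following minimal Python sketch reranks candidate translations by the sum of translation-model and language-model log-probabilities; the two scoring functions are hypothetical placeholders, with the language-model term standing in either for an n-gram model or for a parser-based score as in Charniak's experiment.

<verbatim>
import math

def rerank_translations(source, candidates,
                        translation_logprob, language_model_logprob):
    """Noisy-channel reranking: choose the target sentence e that maximizes
    log P(source | e) + log P(e).  The language model term is where a
    syntax-based model can replace a simple n-gram model."""
    best, best_score = None, -math.inf
    for e in candidates:
        score = translation_logprob(source, e) + language_model_logprob(e)
        if score > best_score:
            best, best_score = e, score
    return best
</verbatim>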

Mark-Jan Nederhof, Giorgio Satta: Probabilistic Parsing as Intersection

It is well known that the context-free languages are closed under intersection with regular languages. Given a CFG C and a finite automaton F, a new grammar can be constructed which accepts exactly L(C) ∩ L(F), and this grammar has an intuitive relation to both C and F. This talk showed that the construction carries over to the probabilistic case, where a PCFG is intersected with a probabilistic finite automaton (PFA). This could be useful e.g. when applying a PCFG to the output of a speech recognizer, which is effectively a PFA. "Although it seems to have been ignored in the existing literature, this construction forms the theoretical basis for much of the ongoing work on probabilistic parsing."
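
For concreteness, here is a rough Python sketch of the weighted Bar-Hillel-style construction for a PCFG in Chomsky normal form; the data-structure conventions are my own, not the paper's.

<verbatim>
from itertools import product

def intersect_pcfg_pfa(binary_rules, lexical_rules, pfa_trans, states):
    """Weighted intersection of a CNF PCFG with a PFA.
    binary_rules:  {(A, B, C): p}  for rules A -> B C with probability p
    lexical_rules: {(A, w): p}     for rules A -> w with probability p
    pfa_trans:     {(q, w, r): p}  PFA transition probabilities
    Nonterminals of the result are triples (q, A, r), read as
    'A derives a string that takes the PFA from state q to state r'."""
    new_binary, new_lexical = {}, {}
    for (A, B, C), p in binary_rules.items():
        for q, r, s in product(states, repeat=3):
            # (q,A,s) -> (q,B,r) (r,C,s), rule probability unchanged
            new_binary[((q, A, s), (q, B, r), (r, C, s))] = p
    for (A, w), p in lexical_rules.items():
        for (q, w2, r), pt in pfa_trans.items():
            if w2 == w:
                # terminal rule weight = PCFG probability * PFA weight
                new_lexical[((q, A, r), w)] = p * pt
    return new_binary, new_lexical
</verbatim>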

Joakim Nivre: An Efficient Algorithm for Projective Dependency Parsing

Nivre performs dependency parsing of free Swedish text. He only considers unlabelled, single-level, connected, projective syntactic trees. This allows him to build a shift-reduce parser that runs in linear time. He achieves 89% correctness on a Swedish toy corpus with the help of perfect POS information.

Since parsing is strictly deterministic, i.e. an edge that has been established is never removed from the tree, the choice of arbitration policy (when to shift, when to reduce) is critical for performance. Nivre plans to add statistical knowledge to make these decisions better.
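
To make the idea concrete, here is a toy Python sketch of such a deterministic shift-reduce loop (arc-standard style, a simplification rather than Nivre's actual algorithm); the decide function stands in for the arbitration policy and is entirely hypothetical.

<verbatim>
def parse(words, decide):
    """Deterministic shift-reduce dependency parsing (arc-standard style).
    decide(stack, buffer, arcs) returns "SHIFT", "LEFT" or "RIGHT";
    since no decision is ever undone, each word is handled a constant
    number of times and the whole parse runs in linear time."""
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer or len(stack) > 1:
        action = decide(stack, buffer, arcs)
        if action == "LEFT" and len(stack) >= 2:
            arcs.append((stack[-1], stack[-2]))   # (head, dependent)
            del stack[-2]
        elif action == "RIGHT" and len(stack) >= 2:
            arcs.append((stack[-2], stack[-1]))
            stack.pop()
        elif buffer:                               # SHIFT, also the fallback
            stack.append(buffer.pop(0))
        else:                                      # nothing left to shift
            arcs.append((stack[-2], stack[-1]))
            stack.pop()
    return arcs
</verbatim>

With a trivially shifting oracle this degenerates into attaching everything at the very end, which is exactly why the arbitration policy carries all the weight.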

Tomasz Obrebski: Dependency Parsing using dependency graph for storing alternative structures

This work describes a rather similar parser, except that it allows edge labels and additional node attributes. There is a parsing system, dgp, that implements the approach. Tomasz and I agreed to exchange our earlier papers and, hopefully, our code.

Khalil Sima'an: On Maximizing Metrics for Syntactic Disambiguation

Statistical parsers usually produce many possible parse trees, of which one must be chosen if good precision is desired. Usually one chooses the parse with the maximal probability under the language model that one's grammar implies (the most probable parse, MPP).

In 1996, Goodman showed that this is actually not advisable: choosing the most probable parse only maximizes sentence recall, i.e. the number of perfectly reconstructed parse trees. This number, however, is usually quite low and not very important. Most evaluations actually count the number of correctly recalled constituents (labelled recall). Using the most probable parse will only maximize this metric if the distribution defined by the grammar is identical to the true distribution, which it usually isn't (since the training process is not perfect).

Therefore (says Goodman) one should always select the parse with the most probable constituents, i.e. maximize the evaluation metric directly. Sima'an now gives an example of a construction where this approach doesn't work and suggests optimizing stronger metrics that take more linguistic information into account, e.g. the parent-child match rate or the context-free production match rate.
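
The contrast between the two decision rules can be sketched in a few lines of Python. Here parses are represented as sets of labelled constituents (label, start, end) with model probabilities, and the second function approximates Goodman-style maximum expected labelled recall by reranking an n-best list; the representation is mine, chosen for illustration only.

<verbatim>
from collections import defaultdict

def most_probable_parse(nbest):
    """MPP: take the single tree with the highest model probability.
    nbest is a list of (frozenset_of_constituents, probability) pairs."""
    return max(nbest, key=lambda item: item[1])[0]

def max_expected_recall_parse(nbest):
    """Goodman-style alternative: score each candidate tree by the summed
    posterior probabilities of its constituents, i.e. choose the tree
    that maximizes expected labelled recall under the model."""
    total = sum(p for _, p in nbest)
    posterior = defaultdict(float)
    for constituents, p in nbest:
        for c in constituents:
            posterior[c] += p / total
    return max(nbest,
               key=lambda item: sum(posterior[c] for c in item[0]))[0]
</verbatim>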

The ensuing discussion questioned whether this work is still relevant, or whether trained models have by now become such good approximations that one should simply revert to MPP.

Hiroyasu Yamada, Yuji Matsumoto: Statistical Dependency Analysis with Support Vector Machines

This work presents yet another 90% parser for the WSJ corpus, this time using probabilistic shift-reduce dependency parsing. The arbitration policy is given by SVM classifiers trained on local context features.

Support Vector Machines themselves are the new miracle weapon in machine learning. They allow the use of very large feature spaces through kernel functions, which implicitly map a small feature space into a much larger one in a useful way. Through the maximal-margin strategy (select the separator whose smallest distance to any class member is maximal), they also tend to avoid overfitting the training data. Unfortunately this is all hearsay; I myself can't begin to understand how this actually works.
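
Purely for illustration, here is how one might train such an action classifier today with scikit-learn's SVM implementation; the library, the features, and the toy data are of course my stand-ins, not what Yamada and Matsumoto actually used.

<verbatim>
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

# Toy training data: local context features of parser configurations
# mapped to the action the parser should take.
configs = [
    {"stack_pos": "DT", "buffer_pos": "NN"},
    {"stack_pos": "NN", "buffer_pos": "VBZ"},
    {"stack_pos": "VBZ", "buffer_pos": "."},
]
actions = ["SHIFT", "LEFT", "RIGHT"]

vec = DictVectorizer()
X = vec.fit_transform(configs)

# A polynomial kernel implicitly works in the space of all feature
# conjunctions up to the given degree (the "much larger space" above).
clf = SVC(kernel="poly", degree=2)
clf.fit(X, actions)

print(clf.predict(vec.transform([{"stack_pos": "NN", "buffer_pos": "VBZ"}])))
</verbatim>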

Pierre Boullier: Guided Earley Parsing

Three years ago, Martin Kay challenged the parsing community to find a way to extract a faster, more permissive parser P' from a given statistical parser P, so that P' can be practically used as a guide for P for an overall speedup. Boullier presents a method that works for Earley parsing: it employs a finite-state model, extracted from P and accepting a strict superset of L(P), as a guide for the actual Earley parsing. Compared with the normal Predictor phase of the Earley parser, this model is nine times more accurate.
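
A very rough Python sketch of the idea: the Earley Predictor only expands a nonterminal if the finite-state guide allows it to start at the current position. The item and guide representations are mine, not Boullier's.

<verbatim>
def predictor(chart, i, grammar, guide):
    """Earley Predictor step with a guide.  Items are tuples
    (lhs, rhs, dot, start); grammar maps a nonterminal to its list of
    right-hand sides; guide(nt, i) says whether the finite-state
    approximation allows nonterminal nt to start at position i.
    Without the guide, every rule of every nonterminal behind a dot
    would be predicted; the guide prunes most of these predictions."""
    added = True
    while added:
        added = False
        for (lhs, rhs, dot, start) in list(chart[i]):
            if dot < len(rhs) and rhs[dot] in grammar:  # dot before a nonterminal
                nt = rhs[dot]
                if not guide(nt, i):
                    continue                            # pruned by the guide
                for expansion in grammar[nt]:
                    item = (nt, tuple(expansion), 0, i)
                    if item not in chart[i]:
                        chart[i].add(item)
                        added = True
</verbatim>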

Gregory Grefenstette: Why Parse? Roots of Statistical and Symbolic Processing

Grefenstette described early language processing, as he saw it, as springing from two different roots. The AI camp used the paradigm of a language-controlled robot (short, command-like sentences, where exact analysis is extremely important). Systems like SHRDLU, Conceptual Grammar or MYCIN were named. The task of the parser was regarded as simply producing all possible syntax trees and handing them over. The parser writers generally saw their task as covering as many as possible of the 317 phenomena of language X. Unfortunately, by the time reliable parsers had been created, the AI community had long since moved on to quite different areas, chasing the funding that had also moved on.

The IR camp used the paradigm of a library and queries: the information is out there, and we have to capture it in any way we can. Generally, statistical methods and weighted terms are used, while syntax, morphology etc. are often disregarded entirely (the first thing a retrieval system does is stemming, i.e. explicitly destroying morphological information). Nowadays most NLP tasks do not actually use parsers; the best method for IR is now relevance feedback (perform the query, find the 20 best matching documents, extract more query terms from them, then run that expanded query). He concluded that machine parsing appeared to have entered a period of Cinderella sleep, waiting to be kissed awake by the next "Prince Charming" application.
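
As an aside, the relevance-feedback loop he described can be sketched in a few lines of Python; the search and term-extraction functions are hypothetical placeholders for whatever retrieval engine and term weighting one actually uses.

<verbatim>
from collections import Counter

def pseudo_relevance_feedback(query_terms, search, extract_terms,
                              k=20, extra=5):
    """Blind relevance feedback: run the query, take the top k documents,
    pull out their most frequent new terms, and run the expanded query."""
    top_docs = search(query_terms)[:k]
    counts = Counter()
    for doc in top_docs:
        counts.update(extract_terms(doc))
    expansion = [t for t, _ in counts.most_common()
                 if t not in query_terms][:extra]
    return search(query_terms + expansion)
</verbatim>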

Grefenstette was heavily attacked by the actual first-generation researchers who were present (Eugene Charniak, Martin Kay, Bob C. Moore), who accused him of misrepresenting the whole field and of entirely ignoring the last ten years of research. The phrases "beating a dead horse" and "insulting" were used.

Kenji Sagae, Alon Lavie: Combining Rule-based and Data-driven Techniques for Grammatical Relation Extraction

The title pretty much says it all. The goal is parsing utterances from the CHILDES database (English data of child language acquisition) for grammatical relations, and the tool used is LCFlex (a context-free parser augmented with unification constraints). Multipass parsing is used, with word-skipping and constituent-insertion possibilities added in successive passes. If all passes fail, a backoff model is employed to extract the GRs without parsing (a standard POS tagger retrained on GR tags, additional lexicalized Brill training, and the Charniak parser plus conversion to dependency links). Marvel of marvels, the combined system works better than any of its components in isolation.
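
The multipass architecture amounts to a simple cascade; a schematic Python sketch follows, with the pass and backoff functions as hypothetical placeholders.

<verbatim>
def extract_grammatical_relations(utterance, passes, backoff):
    """Try increasingly permissive parsing passes (e.g. strict grammar,
    then word skipping, then constituent insertion); if every pass fails,
    fall back to extracting grammatical relations without parsing."""
    for parse_pass in passes:
        result = parse_pass(utterance)
        if result is not None:
            return result
    return backoff(utterance)
</verbatim>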

Frederik Fouvry: Constraint Relaxation with Weighted Feature Structures

The bane of unification is that it is an all-or-nothing affair: when it fails, there is no Plan B and all the work was in vain. The idea here is to allow the unification anyway, but to remove the conflicting features and keep track of how much information had to be thrown away to continue. This could help to guide a robust unification parser (prune the search when too much information has been lost). No evaluation, or indeed implementation, has been performed yet.
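
A toy Python sketch of the bookkeeping for flat feature structures (real weighted feature structures are recursive, so this only illustrates the principle); the weighting scheme is my own assumption.

<verbatim>
def relaxed_unify(fs1, fs2, weights):
    """Unify two flat feature structures given as dicts.  Where hard
    unification would fail, drop the conflicting feature instead and add
    its weight to a penalty, so a parser can later prune analyses that
    had to throw away too much information."""
    result, penalty = dict(fs1), 0.0
    for feat, val in fs2.items():
        if feat in result and result[feat] != val:
            penalty += weights.get(feat, 1.0)   # conflict: discard the feature
            del result[feat]
        elif feat not in result:
            result[feat] = val
    return result, penalty

# Example: number agreement fails, so 'num' is dropped at a cost of 1.0:
# relaxed_unify({"num": "sg", "case": "nom"}, {"num": "pl", "pers": 3}, {})
# -> ({"case": "nom", "pers": 3}, 1.0)
</verbatim>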

-- MichaelDaum - 16 Jan 2003
 