+ IWPT Paper 2003
8th International Workshop on Parsing Technologies
- Date: 23-25 April 2003
- Location: Nancy, France
- URL: http://iwpt03.loria.fr
- Paper title: Subtree Parsing to Speed up Deep Analysis
- Authors: Kilian A. Foth, Wolfgang Menzel
- Summary: Within a grammar formalism that treats syntax analysis as a global optimization problem, methods are investigated to improve parsing performance by recombining the solutions of smaller and easier subproblems. The robust nature of the formalism allows the application of this technique with little change to the original grammar.
IWPT 2003 offered a number of quite interesting contributions,
some of which I describe here (there are no e-proceedings, sorry).
Most relevant to us would be the various reports on deterministic
dependency parsing.
First of all, a Free Download was announced by Bob C. Moore from
Microsoft Research. Apparently he agreed to release some of the tools
he had used for reporting on parsing evaluation in 2000, and it took
him until now to actually do it. Perl5 implementations of standard
context-free parsing algorithms can be downloaded at
www.research.microsoft.com/research/downloads, under "Context-Free
Parsing Algorithms".
Eugene Charniak:
Syntax-based Language Models for Statistical Machine Translation
Eugene Charniak's invited talk presented an experiment where his own
record-holding statistical parser was combined with a statistical MT
system to improve the overall result. The parser itself basically
consists of a comparatively simple PCFG which outputs a packed parse
forest, from which the maximally probable tree is then selected by
applying a much more elaborate (lexicalized) grammar. The SMT model
consists of a translation model (assigning probabilities to pairs of
utterances in the source and target language) and a language model
(assigning probabilities to trees in the target language). His parser
can be used in place of the simple n-gram language model, and when it
is, the system produces more syntactically well-formed output. This
shows that more informed language models are useful for machine
translation. However, the improvement seen by the human evaluators is
not reflected in the (standard) BLEU measure for translation evaluation.
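For reference, the underlying noisy-channel setup is the standard SMT
formulation (the notation below is mine, not taken from the talk); the
syntax-based variant simply supplies P(e) from the parser instead of from
an n-gram model:

<verbatim>
  \hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e) \, P(e)

  f : source-language sentence        P(f \mid e) : translation model
  e : candidate translation           P(e)        : language model

  syntax-based LM:  P(e) = \sum_{T} P(e, T)
  (or the probability of the best tree T, if the sum is approximated by a
  maximum; I did not note down which of the two the system actually uses)
</verbatim>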
Mark-Jan Nederhof, Giorgio Satta: Probabilistic Parsing as
Intersection
It is well-known that the context-free languages are closed under
intersection with regular languages. Given a CFG C and a finite
automaton F, a new grammar can be constructed which generates exactly
L(C) ∩ L(F), and this grammar has an intuitive relation to both C and
F. This talk shows that the construction carries over to the
probabilistic case when a PCFG is intersected with a probabilistic
finite automaton (PFA). This could be useful, e.g., when
applying a PCFG to the output of a speech recognizer, which is
effectively a PFA. "Although it seems to have been ignored in the
existing literature, this construction forms the theoretical basis for
much of the ongoing work on probabilistic parsing."
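For the record, the weighted version of the classical (Bar-Hillel style)
construction has the following shape; the notation is mine, see the paper
for the exact formulation. Given a PCFG with rules A \to X_1 \dots X_n of
probability p and a PFA with transition weights w(q, a, q'), one builds a
grammar whose nonterminals are triples (q, A, q'):

<verbatim>
  (q_0, A, q_n) \to (q_0, X_1, q_1)(q_1, X_2, q_2) \cdots (q_{n-1}, X_n, q_n)
                 with weight p, for every choice of states q_0, \dots, q_n

  (q, a, q')    \to a    with weight w(q, a, q'), for every PFA transition

  start symbol: (q_{init}, S, q_{final})
  (assuming a single initial and a single final state)
</verbatim>

The weight of a derivation is then the product of the PCFG probability of
the tree and the PFA weight of the corresponding path through the
automaton, which is exactly what one wants when parsing, say, a speech
recognizer's word lattice.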
Joakim Nivre: An Efficient Algorithm for Projective Dependency Parsing
Nivre performs dependency parsing of free Swedish text. He only
considers unlabelled, single-level, connected projective syntactic
trees. This allows him to build a shift-reduce parser that runs in
linear time. He achieves 89% correctness on a Swedish toy corpus with
the help of perfect POS information.
Since he considers only deterministic parsing, i.e. an edge that has
been established is never removed from the tree, the choice of an
arbitration policy (when to shift, when to reduce) is critical for
performance. Nivre plans to add statistical knowledge to make these
decisions better.
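To make the shift-reduce idea concrete, here is a minimal Python sketch of
an arc-eager style deterministic dependency parser. The transition names
and the oracle interface are my own choices, not Nivre's exact
formulation; the oracle is precisely the arbitration policy discussed
above.

<verbatim>
# Minimal sketch of deterministic (arc-eager style) dependency parsing.
# The oracle is a placeholder for the arbitration policy.

def parse(n_words, oracle):
    """Parse tokens 1..n_words; token 0 is an artificial root.
    Returns a list of (head, dependent) arcs."""
    stack, i = [0], 1                    # i indexes the next input token
    arcs, has_head = [], set()

    while i <= n_words:
        s, b = stack[-1], i
        action = oracle(stack, i, arcs)
        if action == "LEFT-ARC" and s != 0 and s not in has_head:
            arcs.append((b, s))          # attach stack top as dependent of b
            has_head.add(s)
            stack.pop()
        elif action == "RIGHT-ARC" and b not in has_head:
            arcs.append((s, b))          # attach b as dependent of stack top
            has_head.add(b)
            stack.append(b)
            i += 1
        elif action == "REDUCE" and s in has_head:
            stack.pop()
        else:                            # SHIFT (also the fallback)
            stack.append(b)
            i += 1
    return arcs

# A trivial (and linguistically useless) oracle that always shifts:
print(parse(3, lambda stack, i, arcs: "SHIFT"))   # -> []
</verbatim>

Each transition does constant work and consumes either an input token or a
stack entry, which is where the linear running time comes from.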
Tomasz Obrebski:
Dependency Parsing using dependency graph for storing alternative
structures
This work describes a rather similar parser, except that it allows
edge labels and additional node attributes. There exists a parsing
system, dgp, that implements the approach. Tomasz and I have agreed to
exchange our earlier papers and, hopefully, our code.
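I have not seen dgp itself, so the following is only a hypothetical Python
sketch of what a dependency graph storing alternative labelled arcs might
look like; the class and its methods are my invention, not Obrebski's data
structure.

<verbatim>
# Hypothetical sketch: a dependency graph that keeps *alternative* labelled
# arcs per token instead of committing to a single tree.

from collections import defaultdict

class DependencyGraph:
    def __init__(self, tokens):
        self.tokens = tokens              # per-token attributes, e.g. {"form": ..., "pos": ...}
        self.arcs = defaultdict(list)     # dependent index -> [(head index, label), ...]

    def add_arc(self, head, dependent, label):
        """Record one alternative attachment for `dependent`."""
        self.arcs[dependent].append((head, label))

    def alternatives(self, dependent):
        return self.arcs[dependent]

g = DependencyGraph([{"form": "saw", "pos": "V"}, {"form": "her", "pos": "PRP"}])
g.add_arc(0, 1, "obj")     # "her" as object of "saw"
g.add_arc(0, 1, "iobj")    # a competing analysis, kept alongside
print(g.alternatives(1))   # -> [(0, 'obj'), (0, 'iobj')]
</verbatim>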
Khalil Sima'an: On Maximizing Metrics for Syntactic Disambiguation
Statistical parsers usually produce many possible parse trees, of
which one must be chosen if good precision is desired. Usually one
chooses the most probable parse (MPP) under the distribution that one's
grammar defines.
In 1996, Goodman showed that this is actually not advisable; choosing
the most probable parse only maximizes the sentence recall, i.e. the
number of perfectly reconstructed parse trees. This number, however,
is usually quite low and not very important. Most evaluations actually
consider the number of correctly recalled constituents (labelled
recall). Using the most probable parse will only maximize this metric
if the distribution defined by the grammar is identical with the true
distribution, which it usually isn't (since the training process is
not perfect).
Therefore (says Goodman) one should always select the parse with the
most probable constituents, i.e. directly maximize the evaluation
metric. Sima'an now gives an example of a construction where this
approach doesn't work and suggests that you should optimize stronger
metrics that take more linguistic information into account, e.g. the
parent-child match rate or the context-free production match rate.
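In formulas (my notation), for a sentence s the two selection rules
discussed above are:

<verbatim>
  T_{MPP} = \arg\max_{T} P(T \mid s)

  T_{MCP} = \arg\max_{T} \sum_{c \in T} P(c \mid s),
            where c ranges over the labelled constituents of T and
            P(c \mid s) = \sum_{T' : c \in T'} P(T' \mid s)
</verbatim>

The first maximizes the chance of getting the whole tree right, the second
the expected number of correct constituents, i.e. labelled recall under
the model.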
The ensuing discussion questioned whether this work was still
relevant, or whether trained models have by now become such good
approximations that one should simply revert to MPP.
Hiroyasu Yamada, Yuji Matsumoto:
Statistical Dependency Analysis with Support Vector Machines
This work reports yet another parser at the 90% level on the WSJ
corpus, this time using probabilistic shift-reduce dependency parsing.
The arbitration policy is given by SVM decisions computed from the
parsing context.
Support Vector Machines themselves are the new miracle weapon in
machine learning. They allow the use of very large feature spaces
through kernel functions, which magically map a small feature space
into a much larger one in a useful way. Through the Maximal
Margin strategy (select the separator that has the maximal minimal
distance from all class members), they also automatically avoid
overfitting to training data. Unfortunately this is all hearsay; I
myself can't begin to understand how this actually works.
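For completeness, here is the textbook core of it (the standard
hard-margin formulation, nothing specific to this paper): the training
problem and its dual are

<verbatim>
  \min_{w, b}  \tfrac{1}{2} \lVert w \rVert^2
      \text{ s.t. }  y_i (w \cdot \phi(x_i) + b) \ge 1  \text{ for all } i

  \max_{\alpha}  \sum_i \alpha_i
      - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
      \text{ s.t. }  \alpha_i \ge 0, \; \sum_i \alpha_i y_i = 0,
      \text{ with }  K(x, x') = \phi(x) \cdot \phi(x')
</verbatim>

The dual touches the feature map \phi only through the kernel K, which is
why a huge (even infinite-dimensional) feature space remains affordable;
maximizing the margin 1/\lVert w \rVert is what provides the resistance to
overfitting mentioned above.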
Pierre Boullier: Guided Earley Parsing
Three years ago, Martin Kay challenged the parsing community to find a
way to extract a faster but more permissive parser P' from a given
statistical parser P, so that P' can practically be used as a guide
for P and yield an overall speedup. Boullier presents a method that works
for Earley parsing, which employs a finite-state model extracted from
P that accepts a strict superset of L(P) as a guide for the actual
Earley parsing. Compared with the normal Predictor phase of the Earley
parser, this model is nine times more accurate.
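Boullier's actual construction of the guide is more elaborate than this,
but the general idea can be sketched in a few lines of Python. The
assumption below is that the guide is available as a predicate
guide(position, nonterminal) that over-approximates "could this
nonterminal start here?"; the Predictor then simply skips predictions the
guide rules out.

<verbatim>
# Minimal sketch of guided Earley parsing. The guide is a hypothetical
# predicate; Boullier derives his from a finite-state approximation of the
# grammar. The trivial default guide admits everything.

def earley(tokens, grammar, start, guide=lambda i, nt: True):
    """grammar: dict mapping nonterminal -> list of right-hand sides (tuples)."""
    # An item is (lhs, rhs, dot, origin).
    chart = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}

    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                           # Predictor, filtered by the guide
                    if guide(i, sym):
                        for new_rhs in grammar[sym]:
                            new = (sym, new_rhs, 0, i)
                            if new not in chart[i]:
                                chart[i].add(new)
                                agenda.append(new)
                elif i < len(tokens) and sym == tokens[i]:   # Scanner
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                            # Completer
                for p_lhs, p_rhs, p_dot, p_origin in list(chart[origin]):
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        new = (p_lhs, p_rhs, p_dot + 1, p_origin)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(tokens)])

# Toy usage: S -> a S b | a b
g = {"S": [("a", "S", "b"), ("a", "b")]}
print(earley(list("aabb"), g, "S"))   # -> True
</verbatim>

(The sketch ignores epsilon rules, which need the usual extra care in the
Completer.)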
Gregory Grefenstette:
Why Parse? Roots of Statistical and Symbolic Processing
Grefenstette described early language processing, as he saw it, as
springing from two different roots. The AI camp used the
paradigm of a language-controlled robot (short, command-like
sentences, exact analysis is extremely important). Systems like
SHRDLU, Conceptual Grammar or MYCIN were named. The task of the parser
was regarded as simply producing all possible syntax trees and handing
them over. The parser writers generally saw their task as covering as
many as possible of the 317 phenomena of language X. Unfortunately, by
the time reliable parsers had been created, the AI community had long
since moved on to quite different areas, chasing the funding that had
also moved on.
The IR camp used the paradigm of a library and queries: the
information is out there, we have to capture it in any way we can.
Generally, statistical methods and weighted terms are used, while
syntax, morphology etc. are often disregarded entirely (the first
thing a retrieval system does is stemming, i.e. explicitly destroying
morphological information). Nowadays most NLP tasks do not actually use
parsers; the best method for IR is now relevance feedback (perform the
query, find the 20 best matching documents, extract more query terms
from them, then run that expanded query). He concluded that machine parsing
appeared to have entered a period of Cinderella sleep, waiting to be
kissed awake by the next "Prince Charming" application.
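The feedback loop itself is simple enough to write down. The bag-of-words
scoring below is a toy stand-in for a real IR engine (no tf-idf weighting,
no stemming), and the parameter k plays the role of the "20 best matching
documents" above.

<verbatim>
# Minimal sketch of (pseudo-)relevance feedback with a toy search function.

from collections import Counter

def search(query, documents, k=20):
    """Rank documents (lists of terms) by simple term-overlap with the query."""
    scores = [(sum(doc.count(t) for t in query), i) for i, doc in enumerate(documents)]
    return [i for score, i in sorted(scores, reverse=True)[:k] if score > 0]

def relevance_feedback(query, documents, k=20, extra_terms=5):
    top = search(query, documents, k)                        # 1. run the query
    counts = Counter(t for i in top for t in documents[i])   # 2. mine the best hits
    expansion = [t for t, _ in counts.most_common() if t not in query][:extra_terms]
    return search(list(query) + expansion, documents, k)     # 3. re-run the expanded query

docs = [["parsing", "earley", "grammar"], ["grammar", "syntax"], ["cooking", "pasta"]]
print(relevance_feedback(["parsing"], docs, k=2))   # -> [0, 1]
</verbatim>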
Grefenstette was heavily attacked by the first-generation
researchers who were present (Eugene Charniak, Martin Kay, Bob C.
Moore), who accused him of misrepresenting the whole field and
entirely ignoring the last ten years of research. The phrases "beating
a dead horse" and "insulting" were used.
Kenji Sagae, Alon Lavie: Combining Rule-based and Data-driven
Techniques for Grammatical Relation Extraction
The title pretty much says it all. The goal is parsing utterances from
the CHILDES database (English data of child language acquisition) for
grammatical relations, and the tool used is LCFlex (a CFG parser
augmented with unification constraints). Multipass parsing is
used with the successive addition of word-skipping and
constituent-insertion possibilities. If all passes fail, a backoff
model is employed to extract the GRs without parsing (standard POS
tagger retrained on GR tags, additional lexicalized Brill training,
Charniak parser + conversion to dependency links). Marvel of marvels,
the combined system works better than either in isolation.
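The control flow, as I understood it, is a plain cascade. The functions
below are hypothetical stubs, not the actual LCFlex interface; they stand
for parser runs with progressively relaxed settings and for the parse-free
backoff.

<verbatim>
# Hypothetical sketch of the multipass control flow (all passes stubbed out).

def parse_strict(utterance):
    return None          # stub: plain CFG + unification constraints

def parse_with_word_skipping(utterance):
    return None          # stub: same grammar, input words may be skipped

def parse_with_insertions(utterance):
    return None          # stub: additionally, constituents may be inserted

def backoff_grs(utterance):
    return []            # stub: tagger/Brill/Charniak-based GR extraction

def extract_grs(utterance):
    """Try each pass in turn; fall back to parse-free GR extraction."""
    for attempt in (parse_strict, parse_with_word_skipping, parse_with_insertions):
        grs = attempt(utterance)
        if grs is not None:
            return grs
    return backoff_grs(utterance)

print(extract_grs("more cookie"))   # all stubs fail -> backoff, prints []
</verbatim>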
Frederik Fouvy: Constraint relaxation with weighted feature structures
The bane of unification is that it is an all-or-nothing affair: when it
fails, there is no Plan B - all work was in vain. The idea here is to
allow unification anyway, but remove the conflicting features, and
keep track of how much information had to be thrown away to continue.
This could help to guide a robust unification parser (prune the search
when too much information has been lost). No evaluation, or indeed
implementation, has been performed.
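For flat feature structures the idea is easy to sketch in Python; this is
my own illustration, not Fouvy's formalism, which handles full (recursive,
weighted) feature structures.

<verbatim>
# Minimal sketch of "relaxed" unification for flat feature structures:
# conflicting features are dropped instead of failing the whole unification,
# and a penalty records how much information was thrown away.

def relaxed_unify(fs1, fs2):
    """Return (unified feature dict, penalty = number of dropped features)."""
    result, penalty = {}, 0
    for feat in set(fs1) | set(fs2):
        v1, v2 = fs1.get(feat), fs2.get(feat)
        if v1 is None or v2 is None or v1 == v2:
            result[feat] = v1 if v1 is not None else v2
        else:
            penalty += 1          # clash: drop the feature, remember the cost
    return result, penalty

print(relaxed_unify({"num": "sg", "pers": 3}, {"num": "pl", "case": "nom"}))
# -> ({'pers': 3, 'case': 'nom'}, 1)   (feature order may vary)
</verbatim>

A parser could then prune analyses whose accumulated penalty exceeds some
threshold, which is exactly the guiding use suggested above.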
--
MichaelDaum - 16 Jan 2003