+ Partial Parsing
Introduction
Steven Abney 1991: Parsing by Chunks
Abney postulates
chunks as the basic unit of human sentence
processing (with some psycholinguistic evidence) and proposes that
parsers should do likewise. His parser consists of a
chunker (a
nondeterministic LR parser using best-first search, guided by scores
for lexical frequency, category preference etc.) and an
attacher which is similar but has some additional rules (prefer
attachment as an argument, prefer low attachment).
Like a chart parser, this method never discards low-level components
just because no higher-level integration was possible. Abney argues
that a chart parser has to balance the work saved by reusing a cache
of partial edges against the overhead of checking the cache, and that
reusing only the chunks in a sentence is
a near-optimal compromise.
- Carroll, John, Briscoe, Ted and Sanfilippo, Antonio (1998): Parser Evaluation: a Survey and a New Proposal. First International Conference on Language Resources and Evaluation, Granada, Spain.
- Gee, J. and Grosjean, F. (1983): Performance Structures: A Psycholinguistic and Linguistic Appraisal. Cognitive Psychology 15, 411-458.
Finite-State methods
Steven Abney 1996: Partial Parsing via Finite-State Cascades
Generalizing his earlier idea, Abney presents a cascade (i.e. a
pipeline without feedback) of finite-state transducers for parsing.
He employs nine levels of transducers and achieves about 1600 words
per second. Keywords: easy-first parsing, islands of certainty,
containment of ambiguity.
Abney argues that rejecting recursion between the levels increases not
only speed but accuracy as well, since lower-level phrases are not
discarded because of errors at higher levels (the standard `chart'
argument), and since his longest-match rule can capture some phenomena
that a PCFG cannot. (The longest-match rule is supposed to hold for
English as well.) Quote: ``By keeping pattern precision high, we can
parse deterministically with acceptable error rates.'' However,
instead of giving overall accuracy figures, Abney talks at length
about the difference between accuracy and utility, which doesn't sound
promising. The precision/recall on chunks (the lowest level) is
87.9%/87.1%, while human judgement yields 91.3%/88.3%.
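The cascade mechanics are easy to render in code. The following toy
Python sketch mimics two easy-first, longest-match levels over a
string of POS tags; the level inventory and the patterns are
illustrative assumptions, not Abney's actual grammar:

    import re

    # Two cascade levels over a space-joined POS-tag string. Greedy
    # regex matching approximates the longest-match rule; each level is
    # applied exactly once (a pipeline without feedback).
    LEVELS = [
        ("NP", re.compile(r"\b(DT )?(JJ )*NN\b")),  # level 1: noun chunks
        ("PP", re.compile(r"\bIN NP\b")),           # level 2: over level-1 output
    ]

    def cascade(tags):
        s = " ".join(tags)
        for label, pattern in LEVELS:
            s = pattern.sub(label, s)
        return s.split()

    print(cascade(["DT", "JJ", "NN", "IN", "DT", "NN"]))
    # -> ['NP', 'PP']

Since no level ever feeds back into an earlier one, a mistake at
level 2 cannot destroy the NPs already found at level 1.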
Salah Aït-Mokhtar and Jean-Pierre Chanod 1997: Incremental Finite-State Parsing
(`Incremental' here means `multi-stage', not `prefix-based'.)
Implementing Abney's idea of cascaded regular transducers, the authors
parse free-text French input. The successive stages are:
- morpho-syntactic POS tagging
- (cautious) segmentation of several types of phrases
One transducer introduces tentative <np begins> and <np ends>
tags. Another then keeps only the longest stretch spanned by these
tags, thus correctly analysing `le ou les responsables' as a single
NP. The VP detector runs after the NP detector and incorporates its
results.
- detection of subjects and other syntactic functions
An early transducer detects NPs to the left of finite verbs, a later
one detects inverted subjects if the first one found nothing, etc. If
no subject is found by any of them, the output string does not specify
one. Post-nominal nouns etc. are attached to verb clauses. This stage
is optional because it is only reliable in some restricted domains.
The program achieves ~150 words per second. The precision/recall for
subject detection is 92.6%/82.6% on newspaper text. Again, no overall
accuracy is even mentioned.
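The boundary-tag trick can be illustrated with a toy second stage in
Python; the `{np'/`np}' tag syntax and the single greedy pattern are
my own simplifications of the paper's transducers:

    import re

    # Keep the longest stretch from the first tentative opening tag to
    # the last tentative closing tag, dropping any tags in between, so
    # coordinated determiners end up inside one NP.
    def merge_longest(s):
        def repl(m):
            inner = m.group(1).replace("{np ", "").replace(" np}", "")
            return "[NP " + inner + "]"
        return re.sub(r"\{np (.*) np\}", repl, s)

    print(merge_longest("{np le np} ou {np les responsables np}"))
    # -> [NP le ou les responsables]

On a whole sentence a single greedy match like this would over-merge
distant NPs; the real transducers constrain the stretch far more
cautiously.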
Jean-Pierre Chanod and Pasi Tapanainen 1996: A Non-Deterministic Tokenizer for Finite-State Parsing.
This paper reports on a nondeterministic tokenizer for French taken to
its limits. The tokenizer is actually a cascade of two FSTs: the first
finds blank-delimited words, the second applies a huge multiword
lexicon. This contains entries such as true multiwords (bon enfant/ADJ),
their literal counterparts (bon/ADJ enfant/N),
idioms (le pourquoi et le comment/N), foreign-language material
(a priori/ADV), and general regular expressions (il y a <num> ans/ADV).
The tokenizer can access all morphological information in the entire
system and generate ambiguous output. It thus increases recall
maximally, but appears not to make any attempt at precision.
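A skeletal Python version of such a tokenizer shows how the ambiguous
output arises; the lexicon entry and the output format are
illustrative assumptions:

    # Enumerate every tokenization licensed by a multiword lexicon,
    # yielding both the multiword reading and its literal counterpart.
    LEXICON = {("bon", "enfant"): "ADJ"}   # true multiword reading

    def tokenizations(words):
        if not words:
            yield []
            return
        for n in range(len(words), 0, -1):
            head = tuple(words[:n])
            if n == 1:
                for rest in tokenizations(words[1:]):
                    yield [words[0]] + rest
            elif head in LEXICON:
                for rest in tokenizations(words[n:]):
                    yield ["_".join(head) + "/" + LEXICON[head]] + rest

    for t in tokenizations(["bon", "enfant"]):
        print(t)
    # -> ['bon_enfant/ADJ'] and ['bon', 'enfant']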
- Chanod, Jean-Pierre and Tapanainen, Pasi (1996): A Robust Finite-State Parser for French. In ESSLLI'96 Workshop on Robust Parsing, pp. 12-16, Prague, Czech Republic.
- Chanod, Jean-Pierre and Tapanainen, Pasi (1997): Finite-State Based Reductionist Parsing for French. Kornai, András (ed.), Extended Finite State Models of Language. Cambridge University Press, forthcoming.
- Ejerhed, Eva (1996): Finite state segmentation of discourse into clauses. In Natural Language Engineering 2(4) 355-364. Cambridge University Press.
Gregory Grefenstette 1996: Light Parsing as Finite-State Filtering
Grefenstette details how FSTs can be used to build `rapid, robust and
reasonably accurate' parsers, which he claims are the preconditions
for large-scale NLP. He detects non-ambiguous longest-match NPs as
usual. In addition, he automatically introduces head markers (e.g. to
the rightmost noun in an NP) and then uses them to find subjects (head
nouns to the left of an active verb, with no NP or VP markers in
between) with 80% accuracy. The fact that everything in this
system is an FST is cited as an advantage.
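The head-marking and subject rules reduce to a few lines of code. A
rough Python sketch, in which the chunk labels and the (omitted)
active-voice check are simplified assumptions:

    # Mark the rightmost noun of each NP as its head; a head noun whose
    # NP is followed by a VP, with no other NP or VP marker in between,
    # is reported as a subject.
    def find_subjects(chunks):
        # chunks: list of (label, [(word, pos), ...]) items
        subjects, head = [], None
        for label, words in chunks:
            if label == "NP":
                nouns = [w for w, p in words if p.startswith("NN")]
                head = nouns[-1] if nouns else None  # rightmost noun
            elif label == "VP":
                if head is not None:
                    subjects.append(head)
                head = None                          # a VP closes the window
        return subjects

    sent = [("NP", [("the", "DT"), ("parser", "NN")]),
            ("VP", [("runs", "VBZ")])]
    print(find_subjects(sent))   # -> ['parser']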
Aravind Joshi 1996: A Parser from Antiquity: An Early Application of Finite State Transducers to Natural Language Parsing.
Joshi describes how a 1959 project employed methods that would be
called cascaded FSTs today. The motivation was that computational
power was extremely limited (they wrote UNIVAC machine language). They
already had idiom detection, rule-based POS tagging, multiwords with
default selections, and FSTs for NPs, adjuncts, verb clusters and
clauses (the last of which is actually not finite-state). A
commentary by Karttunen sadly
reflects on how advances in hardware have not been equalled by
advances in grammar writing.
- Koskenniemi, Kimmo (1990): Finite-state Parsing and Disambiguation. In Proceedings of COLING-90, Helsinki, 229-232.
- Koskenniemi, Kimmo, Tapanainen, Pasi and Voutilainen, Atro (1992): Compiling and using finite-state syntactic rules. Proceedings of COLING-92, I, pp. 156-162, Nantes, France.
- Mohri, Mehryar (1994): Finite-state transducers in language and speech processing. Computational Linguistics, 20(1).
- Pereira, Fernando C. N. and Wright, Rebecca N. (1996): Finite state approximation of phrase structure grammars.
Emmanuel Roche 1996: Parsing with Finite-State Transducers
Roche claims that "FST models are not an inaccurate but efficient
tool, but one of the best formalisms to accurately represent complex
linguistic information." He substantiates this by showing how a wide
range of phenomena can be modelled with them.
The method is to have an entry in a syntactic dictionary for each
word and its context (e.g. `N thinks that S'), transform the entries
into FSTs and join them all with the union operation, and apply the
resulting huge FST repeatedly until no change occurs in the input string.
To avoid the duplication introduced by lexicalized rules, a
morphological transducer is introduced that outputs sequences like
# pn John # v leave left # det this # nn morning
This method is then applied to modal verb chains, discontinuous
operators, hidden support verbs etc. (basically everything that is not
recursive, although the word is never mentioned).
An example tree-adjoining grammar is transformed into such a monster
FST to show how non-context-free languages can be parsed. Much is made
of the fact that dictionary lookup and CFG transformation
can also be done by FST, and word graphs can be written as DFAs so
that everything is an FST.
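The fixpoint strategy itself is compact. In the following Python
sketch, plain string rewrites stand in for the union transducer, and
matched material is reduced to its category instead of being
bracketed; both are simplifications of Roche's construction:

    # Apply the combined rewriting repeatedly until nothing changes.
    RULES = [("N thinks that S", "S"),   # lexicalized clause pattern
             ("N leaves", "S")]          # toy entries, not Roche's dictionary

    def parse_fixpoint(s):
        while True:
            new = s
            for lhs, rhs in RULES:
                new = new.replace(lhs, rhs)
            if new == s:                 # fixpoint: no rule fired
                return s
            s = new

    print(parse_fixpoint("N thinks that N thinks that N leaves"))
    # -> 'S'

The real transducers insert structural marks while rewriting, so the
fixpoint is a bracketed parse rather than a bare `S'.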
Anne Schiller 1996: Multilingual Finite-State Noun Phrase Extraction
Schiller presents NP extraction in seven languages. The tokenizer
handles clitics, elision etc. The POS tagger employs morphological
analysis, a transducer for tag mapping (e.g. remove case information)
and an HMM tagger. NP detection is done in the usual way. On a small
technical manual the extraction reaches an accuracy of 98.5% (German)
and 96.6% (French).
- Senellart, Jean (1998): Locating noun phrases with finite state transducers. In Proceedings of COLING-ACL '98, pp. 1212-1219.
Probabilistic methods
- Church, K. W. (1988): A stochastic parts program and noun phrase parser for unrestricted text. In Proc. of the 2nd applied natural language processing conference, Austin, Texas, ACL, 136-143.
Wojciech Skut and Thorsten Brants 1998: Chunk Tagger: Statistical Recognition of Noun Phrases
This paper covers the same ground as the authors' later paper (see
Maximum Entropy below), except that it uses normal HMMs rather than
Maximum Entropy learning.
- Zhai, Chengxiang (1997): Fast Statistical Parsing of Noun Phrases for Document Indexing. In Proceedings of the 5th Conference on Applied Natural Language Processing.
- Osborne, Miles (2002): Shallow Parsing using Noisy and Non-Stationary Training Material. Journal of Machine Learning Research, pp 695-719.
- Megyesi, Beata (2002): Shallow Parsing with POS Taggers and Linguistic Features. Journal of Machine Learning Research, pp 639-668.
- Zhang, Tong and Damerau, Fred and Johnson, David (2002): Text Chunking based on a Generalization of Winnow. Journal of Machine Learning Research, pp 615-637.
- Déjean, Hervé (2002): Learning Rules and Their Exceptions. Journal of Machine Learning Research, pp 669-693.
Claire Cardie and David Pierce 1998: Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification
Apparently in direct reaction to Voutilainen, the authors argue for
statistical, corpus-based NP extraction. They extract all possible POS
tag sequences for NPs from a training corpus annotated with simple
base NPs and then find the longest possible match in the POS tags of
the test text. To avoid overgenerating, they prune the pattern set by
counting how many errors each rule introduces, ranking the rules, and
discarding bad rules.
To overcome some obvious systematic errors, local repair heuristics
are used in another step that does look at the individual words, but
employs hand-written rules again. With these, the performance
approaches 94%/94% on the Penn Treebank.
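The pruning loop is simple to sketch. In this condensed Python version
the matching is naive (every occurrence of a pattern counts) and the
threshold is an assumption; the paper ranks rules by their benefit and
uses longest-match decoding:

    from collections import Counter

    # Score each POS-tag pattern by its correct and incorrect matches
    # against gold base-NP spans, then discard patterns whose errors
    # outweigh their hits.
    def occurrences(pattern, tags):
        n = len(pattern)
        return [(i, i + n) for i in range(len(tags) - n + 1)
                if tuple(tags[i:i + n]) == pattern]

    def prune(patterns, benchmark, threshold=1.0):
        hits, errors = Counter(), Counter()
        for tags, gold_spans in benchmark:
            for p in patterns:
                for span in occurrences(p, tags):
                    (hits if span in gold_spans else errors)[p] += 1
        return {p for p in patterns if errors[p] <= threshold * hits[p]}

    patterns = {("DT", "NN"), ("NN",)}
    benchmark = [(["DT", "NN", "VBZ"], {(0, 2)})]
    print(prune(patterns, benchmark))   # -> {('DT', 'NN')}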
Maximum Entropy
Wojciech Skut and Thorsten Brants 1998: A Maximum-Entropy Partial Parser for Unrestricted Text
The authors employ ME for parsing POS tag sequences, arguing that ME
can combine information from multiple sources, detect relevant
features automatically, and allow for feature decomposition. They
model the context of a word by triples of POS tag, relation to
previous word, and tag of the parent node. The relation can be one of
six classes such as `The parent of node A is the grandparent of node
B'. This allows modelling almost all occurring NPs, even the
recursive ones.
A second-order HMM is then trained to assign these triples,
recognizing about 88% of NPs and APs correctly on the NEGRA corpus.
Linear HMM training is about 1% worse, but the difference almost
disappears as more training data are used.
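The triple encoding can be pictured in code; the relation labels below
are hypothetical stand-ins for the paper's six classes:

    from collections import namedtuple

    # Each word carries its POS tag, a structural relation to the
    # previous word, and the category of its parent node, so an
    # ordinary sequence tagger can predict tree structure.
    StructTag = namedtuple("StructTag", "pos relation parent")

    # "[NP the [AP very old] man]" encoded word by word:
    encoded = [
        StructTag("DT", "NEW_PARENT", "NP"),    # opens the NP
        StructTag("RB", "NEW_EMBEDDED", "AP"),  # opens an AP inside it
        StructTag("JJ", "SAME_PARENT", "AP"),
        StructTag("NN", "PARENT_IS_GRANDPARENT_OF_PREVIOUS", "NP"),
    ]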
Memory-Based methods
Erik F. Tjong Kim Sang 2002: Memory-Based Shallow Parsing
This is an extensive treatment of memory-based learning applied to
NLP, claiming that "All natural language tasks can be performed
successfully by memory-based learners." It details how MBL solves the
classification task (with the overlap metric, weighted features or the
gain ratio), how to suppress irrelevant features (bidirectional hill
climbing on the space of features employed), and how combining several
learners improves accuracy (by voting, or with stacked
classifiers).
In NP chunking, the different methods of marking NP chunks are
exploited to build a six-fold combined learner. The combination does
boost recognition from 92 to 93%, but the author's own previous method
without feature selection does just as well, so he reverts to it for
the task of general chunk detection and achieves 92.6%.
Clause identification is more difficult, particularly because of
multiple simultaneous clause openings, and only an f-measure of 67.8%
is reached. NP parsing is performed by finding the heads of NP chunks
and re-running the chunker on them, for an f-measure of 83.8%. The
author achieves only 80.5% on full parsing and notes that although
improvements to his parser can be imagined, it is already bigger and
slower than a better statistical full parser.
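At its core, the memory-based classifier is a nearest-neighbour
lookup. A minimal Python sketch, with uniform weights standing in for
gain-ratio weighting (the feature layout is an illustrative
assumption):

    # Label an instance by its nearest stored neighbour under the
    # weighted overlap metric (count of mismatching feature values).
    def overlap(a, b, weights):
        return sum(w for x, y, w in zip(a, b, weights) if x != y)

    def classify(instance, memory, weights):
        _, label = min(memory,
                       key=lambda m: overlap(instance, m[0], weights))
        return label

    # POS-tag context window -> chunk tag, as in NP chunking
    memory = [(("DT", "NN", "VBZ"), "I-NP"),
              (("NN", "VBZ", "DT"), "O")]
    weights = (1.0, 1.0, 1.0)   # stand-in for gain-ratio weights
    print(classify(("DT", "NN", "VBD"), memory, weights))   # -> I-NP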
Sandra Kübler and Erhard W. Hinrichs 2001: TüSBL: A Similarity-Based Chunk Parser for Robust Syntactic Processing
Although Abney's original chunk paper speaks about a chunker and an
attacher, so far almost all work has been on chunkers. Here (after POS
tagging and applying an Abney chunker) a similarity-based tree
constructor is used to compute the structure within chunks and to
combine them into trees if possible. The metric is the similarity
between entire trees. If no exact match can be found, subchunks, word
sequences or tag sequences are searched for near-matches. The average
precision/recall is 77.2%/67.3%. Since the PARSEVAL measure always considers
partial solutions as wrong, the treebank author was called upon to
judge the output, and found it to be 92.4% precise.
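The backoff search can be summarized schematically; the dictionary
keys are my naming, and TüSBL's actual similarity computation is more
elaborate than this equality test:

    # Try an exact tree match first, then back off to coarser keys
    # (word sequence, then tag sequence) for a near-match.
    def best_tree(chunk, memory):
        for key in ("tree", "words", "tags"):
            for stored in memory:
                if stored[key] == chunk[key]:
                    return stored["tree"]
        return None   # no near-match: leave the chunk unattached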
Other methods
- Chen, Kuang-hua and Chen, Hsin-hsi (1994): Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation. Meeting of the Association for Computational Linguistics, pages 234-241.
- Evans, David A. and Zhai, Chengxiang (1996): Noun-Phrase Analysis in Unrestricted Text for Information Retrieval.
- Federici, Stefano, Montemagni, Simonetta and Pirrelli, Vito (1996): Shallow Parsing and Text Chunking: a View on Underspecification in Syntax. In ESSLLI'96 Workshop on Robust Parsing, Prague, Czech Republic
- Grefenstette, Gregory and Teufel, Simone (1995): Corpus-based Method for Automatic Identification of Support Verbs for Nominalizations.
- Lavie, Alon (1994): An integrated heuristic scheme for partial parse evaluation.
- Lyon, Caroline and Dickerson, Bob (1995) : A fast partial parse of natural language sentences using a connectionist method.
- Min, Kyongho and Wilson, William H. (1998): Integrated Control of Chart Items for Error Repair. In Proceedings of COLING-ACL '98, pp. 862-868.
- Nasukawa, Tetsuya (1995): Robust Parsing Based on Discourse Information - Completing partial parses of ill-formed sentences on the basis of discourse information. In Proceedings of ACL-95. cmp-lg/9505042.
- Ramshaw, Lance A. and Marcus, Mitchell P. (1995): Text Chunking using Transformation-Based Learning. In Proceedings of the Third ACL Workshop on Very Large Corpora. cmp-lg/9505040.
- Rayner, Manny and Carter, David (1996): Fast parsing using pruning and grammar specialization.
- Schneider, David and McCoy, Kathleen F. (1998): Recognizing Syntactic Errors in the Writing of Second Language Learners. In Proceedings of COLING-ACL '98, pp. 1198-1204.
- Voutilainen, Atro and Järvinen, Timo (1995): Specifying a shallow grammatical representation for parsing purposes.
Atro Voutilainen 1995: NPtool, a detector of English noun phrases
Voutilainen openly states that he prefers rule-based
to statistically derived descriptions
because they can be more accurate and reliable. He justifies this by
presenting a multi-stage NP extractor for English, with very high
recall (98-100%) and precision (95-98%).
After segmentation and idiom detection, a very thorough morphological
analyser generates all possible morphological analyses.
Morphosyntactic constraints then eliminate impossible forms (`single'
cannot be a verb after `a') and impossible syntactic functions
(`harry' cannot be a prenominal element in `is harry foolish?').
Two partial FST parsers then scan the sentence for NPs, one of which
penalizes NPs while the other rewards them. Maximal NPs that occur in
the output of both parsers are then assumed to be correct.
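The intersection idea can be rendered as a toy in Python; both
patterns below are illustrative assumptions, not NPtool's actual
grammars:

    import re

    # One NP-friendly and one NP-hostile pattern run over a POS-tag
    # string; only maximal NPs found by both are kept.
    FRIENDLY = re.compile(r"(DT )?(JJ )*NN")  # rewards NPs
    HOSTILE = re.compile(r"DT (JJ )*NN")      # penalizes NPs: requires a DT

    def spans(pattern, s):
        return {m.span() for m in pattern.finditer(s)}

    def nps(tagstring):
        return spans(FRIENDLY, tagstring) & spans(HOSTILE, tagstring)

    print(nps("DT JJ NN"))   # -> {(0, 8)}: both parsers agree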