+ Partial Parsing
Introduction
Steven Abney 1991: Parsing by Chunks
Abney postulates
chunks as the basic unit of human sentence
processing (with some psycholinguistic evidence) and proposes that
parsers should do likewise. His parser consists of a
chunker (a
nondeterministic LR parser using best-first search, guided by scores
for lexical frequency, category preference etc.) and an
attacher which is similar but has some additional rules (prefer
attachment as an argument, prefer low attachment).
Like a chart parser, this method never discards low-level components
just because no higher-level integration was possible. Abney argues
that a chart parser has to balance the work saved by reusing a cache
of partial edges against the overhead of checking the cache, and that
reusing only the chunks in a sentence is
a near-optimal compromise.
- Carroll, John, Briscoe, Ted and Sanfilippo, Antonio (1998): Parser Evaluation: a Survey and a New Proposal. First International Conference on Language Resources and Evaluation, Granada, Spain.
- Gee, J. and Grosjean, F. (1983): Performance Structures: A Psycholinguistic and Linguistic Appraisal. Cognitive Psychology 15, 411-458.
Finite-State methods
Steven Abney 1996: Partial Parsing via Finite-State Cascades
Generalizing his earlier idea, Abney presents a cascade (i.e. a
pipeline without feedback) of finite-state transducers for parsing.
He employs nine levels of transducers and achieves about 1600 words
per second. Keywords: easy-first parsing, islands of certainty,
containment of ambiguity.
Abney argues that rejecting recursion between the levels increases not
only speed but accuracy as well, since lower-level phrases are not
discarded because of errors at higher levels (the standard `chart'
argument), and since his longest-match rule can capture some phenomena
that a PCFG cannot. (The longest-match rule is supposed to hold for
English as well.) Quote: ``By keeping pattern precision high, we can
parse deterministically with acceptable error rates.'' However,
instead of giving overall accuracy figures, Abney talks at length
about the difference between accuracy and utility, which doesn't sound
promising. The precision/recall on chunks (the lowest level) is
87.9%/87.1%, while human judgement yields 91.3%/88.3%.
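The cascade mechanics are easy to render in code. The following toy
Python sketch mimics two easy-first, longest-match levels over a
string of POS tags; the level inventory and the patterns are
illustrative assumptions, not Abney's actual grammar:

    import re

    # Two cascade levels over a space-joined POS-tag string. Greedy
    # regex matching approximates the longest-match rule; each level is
    # applied exactly once (a pipeline without feedback).
    LEVELS = [
        ("NP", re.compile(r"\b(DT )?(JJ )*NN\b")),  # level 1: noun chunks
        ("PP", re.compile(r"\bIN NP\b")),           # level 2: over level-1 output
    ]

    def cascade(tags):
        s = " ".join(tags)
        for label, pattern in LEVELS:
            s = pattern.sub(label, s)
        return s.split()

    print(cascade(["DT", "JJ", "NN", "IN", "DT", "NN"]))
    # -> ['NP', 'PP']

Since no level ever feeds back into an earlier one, a mistake at
level 2 cannot destroy the NPs already found at level 1.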
Salah Aït-Mokhtar and Jean-Pierre Chanod 1997: Incremental Finite-State Parsing
(`Incremental' here means `multi-stage', not `prefix-based'.)
Implementing Abney's idea of cascaded regular transducers, the authors
parse free-text French input. The successive stages are:
- morpho-syntactic POS tagging
- (cautious) segmentation of several types of phrases
One transducer introduces tentative <np begins> and <np ends>
tags. Another then keeps only the longest stretch spanned by these
tags, thus correctly analysing `le ou les responsables' as a single
NP. The VP detector runs after the NP detector and incorporates its
results.
- detection of subjects and other syntactic functions
An early transducer detects NPs to the left of finite verbs, a later
one detects inverted subjects if the first one found nothing, etc. If
no subject is found by any of them, the output string does not specify
one. Post-nominal nouns etc. are attached to verb clauses. This stage
is optional because it is only reliable in some restricted domains.
The program achieves ~150 words per second. The precision/recall for
subject detection is 92.6%/82.6% on newspaper text. Again, no overall
accuracy is even mentioned.
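The boundary-tag trick can be illustrated with a toy second stage in
Python; the `{np'/`np}' tag syntax and the single greedy pattern are
my own simplifications of the paper's transducers:

    import re

    # Keep the longest stretch from the first tentative opening tag to
    # the last tentative closing tag, dropping any tags in between, so
    # coordinated determiners end up inside one NP.
    def merge_longest(s):
        def repl(m):
            inner = m.group(1).replace("{np ", "").replace(" np}", "")
            return "[NP " + inner + "]"
        return re.sub(r"\{np (.*) np\}", repl, s)

    print(merge_longest("{np le np} ou {np les responsables np}"))
    # -> [NP le ou les responsables]

On a whole sentence a single greedy match like this would over-merge
distant NPs; the real transducers constrain the stretch far more
cautiously.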
Jean-Pierre Chanod and Pasi Tapanainen 1996: A Non-Deterministic Tokenizer for Finite-State Parsing.
This paper reports on a nondeterministic tokenizer for French taken to
its limits. The tokenizer is actually a cascade of two FSTs: the first
finds blank-delimited words, the second applies a huge multiword
lexicon. This contains entries such as true multiwords (bon enfant/ADJ),
their literal counterparts (bon/ADJ enfant/N),
idioms (le pourquoi et le comment/N), foreign-language material
(a priori/ADV), and general regular expressions (il y a <num> ans/ADV).
The tokenizer can access all morphological information in the entire
system and generate ambiguous output. It thus increases recall
maximally, but appears not to make any attempt at precision.
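A skeletal Python version of such a tokenizer shows how the ambiguous
output arises; the lexicon entry and the output format are
illustrative assumptions:

    # Enumerate every tokenization licensed by a multiword lexicon,
    # yielding both the multiword reading and its literal counterpart.
    LEXICON = {("bon", "enfant"): "ADJ"}   # true multiword reading

    def tokenizations(words):
        if not words:
            yield []
            return
        for n in range(len(words), 0, -1):
            head = tuple(words[:n])
            if n == 1:
                for rest in tokenizations(words[1:]):
                    yield [words[0]] + rest
            elif head in LEXICON:
                for rest in tokenizations(words[n:]):
                    yield ["_".join(head) + "/" + LEXICON[head]] + rest

    for t in tokenizations(["bon", "enfant"]):
        print(t)
    # -> ['bon_enfant/ADJ'] and ['bon', 'enfant']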
- Chanod, Jean-Pierre and Tapanainen, Pasi (1996): A Robust Finite-State Parser for French. In ESSLLI'96 Workshop on Robust Parsing, pp. 12-16, Prague, Czech Republic.
- Chanod, Jean-Pierre and Tapanainen, Pasi (1997): Finite-State Based Reductionist Parsing for French. Kornai, András (ed.), Extended Finite State Models of Language. Cambridge University Press, forthcoming.
- Ejerhed, Eva (1996): Finite state segmentation of discourse into clauses. In Natural Language Engineering 2(4) 355-364. Cambridge University Press.
Gregory Grefenstette 1996: Light Parsing as Finite-State Filtering
Grefenstette details how FSTs can be used to build `rapid, robust and
reasonably accurate' parsers, which he claims are the preconditions
for large-scale NLP. He detects non-ambiguous longest-match NPs as
usual. In addition, he automatically introduces head markers (e.g. to
the rightmost noun in an NP) and then uses them to find subjects (head
nouns to the left of an active verb, with no NP or VP markers in
between) with 80% accuracy. The fact that everything in this
system is an FST is cited as an advantage.
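The head-marking and subject rules reduce to a few lines of code. A
rough Python sketch, in which the chunk labels and the (omitted)
active-voice check are simplified assumptions:

    # Mark the rightmost noun of each NP as its head; a head noun whose
    # NP is followed by a VP, with no other NP or VP marker in between,
    # is reported as a subject.
    def find_subjects(chunks):
        # chunks: list of (label, [(word, pos), ...]) items
        subjects, head = [], None
        for label, words in chunks:
            if label == "NP":
                nouns = [w for w, p in words if p.startswith("NN")]
                head = nouns[-1] if nouns else None  # rightmost noun
            elif label == "VP":
                if head is not None:
                    subjects.append(head)
                head = None                          # a VP closes the window
        return subjects

    sent = [("NP", [("the", "DT"), ("parser", "NN")]),
            ("VP", [("runs", "VBZ")])]
    print(find_subjects(sent))   # -> ['parser']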
Aravind Joshi 1996: A Parser from Antiquity: An Early Application of Finite State Transducers to Natural Language Parsing.
Joshi describes how a 1959 project employed methods that would be
called cascaded FSTs today. The motivation was that computational
power was extremely limited (they wrote UNIVAC machine language). They
already had idiom detection, rule-based POS tagging, multiwords with
default selections, and FSTs for NPs, adjuncts, verb clusters and
clauses (the last of which is actually not finite-state). A
commentary by Karttunen sadly
reflects on how advances in hardware have not been equalled by
advances in grammar writing.
- Koskenniemi, Kimmo (1990): Finite-state Parsing and Disambiguation. In Proceedings of COLING-90, Helsinki, 229-232.
- Koskenniemi, Kimmo, Tapanainen, Pasi and Voutilainen, Atro (1992): Compiling and using finite-state syntactic rules. Proceedings of COLING-92, I, pp. 156-162, Nantes, France.
- Mohri, Mehryar (1994): Finite-state transducers in language and speech processing. Computational Linguistics, 20(1).
- Pereira, Fernando C. N. and Wright, Rebecca N. (1996): Finite state approximation of phrase structure grammars.
Emmanuel Roche 1996: Parsing with Finite-State Transducers
Roche claims that "FST models are not an inaccurate but efficient
tool, but one of the best formalisms to accurately represent complex
linguistic information." He substantiates this by showing how a wide
range of phenomena can be modelled with them.
The method is to have an entry in a syntactic dictionary for each
word and its context (e.g. `N thinks that S'), transform the entries
into FSTs and join them all with the union operation, and apply the
resulting huge FST repeatedly until no change occurs in the input string.
To avoid the duplication introduced by lexicalized rules, a
morphological transducer is introduced that outputs sequences like
# pn John # v leave left # det this # nn morning
This method is then applied to modal verb chains, discontinuous
operators, hidden support verbs etc. (basically everything that is not
recursive, although the word is never mentioned).
An example tree-adjoining grammar is transformed into such a monster
FST to show how non-context-free languages can be parsed. Much is made
of the fact that dictionary lookup and CFG transformation
can also be done by FST, and word graphs can be written as DFAs so
that everything is an FST.
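The fixpoint strategy itself is compact. In the following Python
sketch, plain string rewrites stand in for the union transducer, and
matched material is reduced to its category instead of being
bracketed; both are simplifications of Roche's construction:

    # Apply the combined rewriting repeatedly until nothing changes.
    RULES = [("N thinks that S", "S"),   # lexicalized clause pattern
             ("N leaves", "S")]          # toy entries, not Roche's dictionary

    def parse_fixpoint(s):
        while True:
            new = s
            for lhs, rhs in RULES:
                new = new.replace(lhs, rhs)
            if new == s:                 # fixpoint: no rule fired
                return s
            s = new

    print(parse_fixpoint("N thinks that N thinks that N leaves"))
    # -> 'S'

The real transducers insert structural marks while rewriting, so the
fixpoint is a bracketed parse rather than a bare `S'.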
Anne Schiller 1996: Multilingual Finite-State Noun Phrase Extraction
Schiller presents NP extraction in seven languages. The tokenizer
handles clitics, elision etc. The POS tagger employs morphological
analysis, a transducer for tag mapping (e.g. remove case information)
and an HMM tagger. NP detection is done in the usual way. On a small
technical manual the extraction reaches an accuracy of 98.5% (German)
and 96.6% (French).
- Senellart, Jean (1998): Locating noun phrases with finite state transducers. In Proceedings of COLING-ACL '98, pp. 1212-1219.
Probabilistic methods
- Church, K. W. (1988): A stochastic parts program and noun phrase parser for unrestricted text. In Proc. of the 2nd applied natural language processing conference, Austin, Texas, ACL, 136-143.
Wojciech Skut and Thorsten Brants 1998: Chunk Tagger: Statistical Recognition of Noun Phrases
This paper covers the same ground as the authors' later paper (see
Maximum Entropy below), except that it uses normal HMMs rather than
Maximum Entropy learning.
- Zhai, Chengxiang (1997): Fast Statistical Parsing of Noun Phrases for Document Indexing. In Proceedings of the 5th Conference on Applied Natural Language Processing.
- Osborne, Miles (2002): Shallow Parsing using Noisy and Non-Stationary Training Material. Journal of Machine Learning Research, pp 695-719.
- Megyesi, Beata (2002): Shallow Parsing with POS Taggers and Linguistic Features. Journal of Machine Learning Research, pp 639-668.
- Zhang, Tong and Damerau, Fred and Johnson, David (2002): Text Chunking based on a Generalization of Winnow. Journal of Machine Learning Research, pp 615-637.
- Déjean, Hervé (2002): Learning Rules and Their Exceptions. Journal of Machine Learning Research, pp 669-693.
Claire Cardie and David Pierce 1998: Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification
Apparently in direct reaction to Voutilainen, the authors argue for
statistical, corpus-based NP extraction. They extract all possible POS
tag sequences for NPs from a training corpus annotated with simple
base NPs and then find the longest possible match in the POS tags of
the test text. To avoid overgenerating, they prune the pattern set by
counting how many errors each rule introduces, ranking the rules, and
discarding bad rules.
To overcome some obvious systematic errors, local repair heuristics
are used in another step that does look at the individual words, but
employs hand-written rules again. With these, the performance
approaches 94%/94% on the Penn Treebank.
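The pruning loop is simple to sketch. In this condensed Python version
the matching is naive (every occurrence of a pattern counts) and the
threshold is an assumption; the paper ranks rules by their benefit and
uses longest-match decoding:

    from collections import Counter

    # Score each POS-tag pattern by its correct and incorrect matches
    # against gold base-NP spans, then discard patterns whose errors
    # outweigh their hits.
    def occurrences(pattern, tags):
        n = len(pattern)
        return [(i, i + n) for i in range(len(tags) - n + 1)
                if tuple(tags[i:i + n]) == pattern]

    def prune(patterns, benchmark, threshold=1.0):
        hits, errors = Counter(), Counter()
        for tags, gold_spans in benchmark:
            for p in patterns:
                for span in occurrences(p, tags):
                    (hits if span in gold_spans else errors)[p] += 1
        return {p for p in patterns if errors[p] <= threshold * hits[p]}

    patterns = {("DT", "NN"), ("NN",)}
    benchmark = [(["DT", "NN", "VBZ"], {(0, 2)})]
    print(prune(patterns, benchmark))   # -> {('DT', 'NN')}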
Maximum Entropy
Wojciech Skut and Thorsten Brants 1998: A Maximum-Entropy Partial Parser for Unrestricted Text
The authors employ ME for parsing POS tag sequences, arguing that ME
can combine information from multiple sources, detect relevant
features automatically, and allow for feature decomposition. They
model the context of a word by triples of POS tag, relation to
previous word, and tag of the parent node. The relation can be one of
six classes such as `The parent of node A is the grandparent of node
B'. This allows modelling almost all occurring NPs, even the
recursive ones.
A second-order HMM is then trained to assign these triples,
recognizing about 88% of NPs and APs correctly on the NEGRA corpus.
Linear HMM training is about 1% worse, but the difference almost
disappears as more training data are used.
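The triple encoding can be pictured in code; the relation labels below
are hypothetical stand-ins for the paper's six classes:

    from collections import namedtuple

    # Each word carries its POS tag, a structural relation to the
    # previous word, and the category of its parent node, so an
    # ordinary sequence tagger can predict tree structure.
    StructTag = namedtuple("StructTag", "pos relation parent")

    # "[NP the [AP very old] man]" encoded word by word:
    encoded = [
        StructTag("DT", "NEW_PARENT", "NP"),    # opens the NP
        StructTag("RB", "NEW_EMBEDDED", "AP"),  # opens an AP inside it
        StructTag("JJ", "SAME_PARENT", "AP"),
        StructTag("NN", "PARENT_IS_GRANDPARENT_OF_PREVIOUS", "NP"),
    ]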
Memory-Based methods
Erik F. Tjong Kim Sang 2002: Memory-Based Shallow Parsing
This is an extensive treatment of memory-based learning applied to
NLP, claiming that "All natural language tasks can be performed
successfully by memory-based learners." It details how MBL solves the
classification task (with the overlap metric, weighted features or the
gain ratio), how to suppress irrelevant features (bidirectional hill
climbing on the space of features employed), and how combining several
learners improves accuracy (by voting, or with stacked
classifiers).
In NP chunking, the different methods of marking NP chunks are
exploited to build a six-fold combined learner. The combination does
boost recognition from 92 to 93%, but the author's own previous method
without feature selection does just as well, so he reverts to it for
the task of general chunk detection and achieves 92.6%.
Clause identification is more difficult, particularly because of
multiple simultaneous clause openings, and only an f-measure of 67.8%
is reached. NP parsing is performed by finding the heads of NP chunks
and re-running the chunker on them, for an f-measure of 83.8%. The
author achieves only 80.5% on full parsing and notes that although
improvements to his parser can be imagined, it is already bigger and
slower than a better statistical full parser.
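At its core, the memory-based classifier is a nearest-neighbour
lookup. A minimal Python sketch, with uniform weights standing in for
gain-ratio weighting (the feature layout is an illustrative
assumption):

    # Label an instance by its nearest stored neighbour under the
    # weighted overlap metric (count of mismatching feature values).
    def overlap(a, b, weights):
        return sum(w for x, y, w in zip(a, b, weights) if x != y)

    def classify(instance, memory, weights):
        _, label = min(memory,
                       key=lambda m: overlap(instance, m[0], weights))
        return label

    # POS-tag context window -> chunk tag, as in NP chunking
    memory = [(("DT", "NN", "VBZ"), "I-NP"),
              (("NN", "VBZ", "DT"), "O")]
    weights = (1.0, 1.0, 1.0)   # stand-in for gain-ratio weights
    print(classify(("DT", "NN", "VBD"), memory, weights))   # -> I-NP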
Sandra Kübler and Erhard W. Hinrichs 2001: TüSBL: A Similarity-Based Chunk Parser for Robust Syntactic Processing
Although Abney's original chunk paper speaks about a chunker and an
attacher, so far almost all work has been on chunkers. Here (after POS
tagging and applying an Abney chunker) a similarity-based tree
constructor is used to compute the structure within chunks and to
combine them into trees if possible. The metric is the similarity
between entire trees. If no exact match can be found, subchunks, word
sequences or tag sequences are searched for near-matches. The average
precision/recall is 77.2%/67.3%. Since the PARSEVAL measure always considers
partial solutions as wrong, the treebank author was called upon to
judge the output, and found it to be 92.4% precise.
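The backoff search can be summarized schematically; the dictionary
keys are my naming, and TüSBL's actual similarity computation is more
elaborate than this equality test:

    # Try an exact tree match first, then back off to coarser keys
    # (word sequence, then tag sequence) for a near-match.
    def best_tree(chunk, memory):
        for key in ("tree", "words", "tags"):
            for stored in memory:
                if stored[key] == chunk[key]:
                    return stored["tree"]
        return None   # no near-match: leave the chunk unattached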
Other methods
- Chen, Kuang-hua and Chen, Hsin-hsi (1994): Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation. Meeting of the Association for Computational Linguistics, pages 234-241.
- Evans, David A. and Zhai, Chengxiang (1996): Noun-Phrase Analysis in Unrestricted Text for Information Retrieval.
- Federici, Stefano, Montemagni, Simonetta and Pirrelli, Vito (1996): Shallow Parsing and Text Chunking: a View on Underspecification in Syntax. In ESSLLI'96 Workshop on Robust Parsing, Prague, Czech Republic
- Grefenstette, Gregory and Teufel, Simone (1995): Corpus-based Method for Automatic Identification of Support Verbs for Nominalizations.
- Lavie, Alon (1994): An integrated heuristic scheme for partial parse evaluation.
- Lyon, Caroline and Dickerson, Bob (1995) : A fast partial parse of natural language sentences using a connectionist method.
- Min, Kyongho and Wilson, William H. (1998): Integrated Control of Chart Items for Error Repair. In Proceedings of COLING-ACL '98, pp. 862-868.
- Nasukawa, Tetsuya (1995): Robust Parsing Based on Discourse Information - Completing partial parses of ill-formed sentences on the basis of discourse information. In Proceedings of ACL-95. cmp-lg/9505042.
- Ramshaw, Lance A. and Marcus, Mitchell P. (1995): Text Chunking using Transformation-Based Learning. In Proceedings of the Third ACL Workshop on Very Large Corpora. cmp-lg/9505040.
- Rayner, Manny and Carter, David (1996): Fast parsing using pruning and grammar specialization.
- Schneider, David and McCoy, Kathleen F. (1998): Recognizing Syntactic Errors in the Writing of Second Language Learners. In Proceedings of COLING-ACL '98, pp. 1198-1204.
- Voutilainen, Atro and Järvinen, Timo (1995): Specifying a shallow grammatical representation for parsing purposes.
Atro Voutilainen 1995: NPtool, a detector of English noun phrases
Voutilainen openly states that he prefers rule-based
to statistically derived descriptions
because they can be more accurate and reliable. He justifies this by
presenting a multi-stage NP extractor for English, with very high
recall (98-100%) and precision (95-98%).
After segmentation and idiom detection, a very thorough morphological
analyser generates all possible morphological analyses.
Morphosyntactic constraints then eliminate impossible forms (`single'
cannot be a verb after `a') and impossible syntactic functions
(`harry' cannot be a prenominal element in `is harry foolish?').
Two partial FST parsers then scan the sentence for NPs, one of which
penalizes NPs while the other rewards them. Maximal NPs that occur in
the output of both parsers are then assumed to be correct.
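The intersection idea can be rendered as a toy in Python; both
patterns below are illustrative assumptions, not NPtool's actual
grammars:

    import re

    # One NP-friendly and one NP-hostile pattern run over a POS-tag
    # string; only maximal NPs found by both are kept.
    FRIENDLY = re.compile(r"(DT )?(JJ )*NN")  # rewards NPs
    HOSTILE = re.compile(r"DT (JJ )*NN")      # penalizes NPs: requires a DT

    def spans(pattern, s):
        return {m.span() for m in pattern.finditer(s)}

    def nps(tagstring):
        return spans(FRIENDLY, tagstring) & spans(HOSTILE, tagstring)

    print(nps("DT JJ NN"))   # -> {(0, 8)}: both parsers agree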