Hybrid Analysis

The term hybrid with respect to NLP methods is particularly ambiguous. It can mean `dealing with syntax and semantics', `using deep and shallow mechanisms', `emulating both halves of the brain', or just `using more than one knowledge source'. Here are some examples.

C.T. Kitzmiller and J.S. Kowalik 1987:
Coupling Symbolic and Numeric Computing in Knowledge-Based Systems.
AI Magazine 8, Nr. 2, 85--90 (not online)

This is a report from the Workshop on Coupling Symbolic and Numeric Computing in Expert Systems (Seattle, 1985). Much thought back then was given to integrating numeric optimisation with expert systems, e.g. to help a user select the proper algorithm for a task or to build an intelligent user interface (which seems much the same thing to me). The second reason was to enable reasoning with ambiguous, contradictory and imprecise data. The report even claims that `integrating formal mathematical methods and methods based on symbolic knowledge' could `solve some of the problems currently deemed intractable'.

`Shallow' coupling is characterised as treating numeric routines as black boxes, i.e. the decision when to apply numeric methods and what to do with their results depends only on the problem's state variables. A `deeply' coupled system, on the other hand, knows the operating envelope of each of its components and selects them accordingly. Although the distinction does not seem very critical to me, much is made of it; deeply coupled systems supposedly increase robustness, performance, and maintainability.
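In today's terms the distinction might be sketched like this; all names, and the idea of the operating envelope as an explicit predicate, are my own invention and not from the report:

# Hypothetical sketch of `shallow' vs `deep' coupling. The component names
# and the explicit envelope predicate are invented for illustration.

class NumericComponent:
    def __init__(self, name, solve, envelope=None):
        self.name = name
        self.solve = solve            # the numeric routine, treated as a black box
        self.envelope = envelope      # predicate: is the problem inside its operating range?

def shallow_couple(problem, components, rules):
    """Shallow coupling: rules look only at the problem's state variables."""
    for condition, component_name in rules:
        if condition(problem):
            chosen = next(c for c in components if c.name == component_name)
            return chosen.solve(problem)
    raise ValueError("no rule matched")

def deep_couple(problem, components):
    """Deep coupling: the system knows each component's operating envelope."""
    for component in components:
        if component.envelope is not None and component.envelope(problem):
            return component.solve(problem)
    raise ValueError("no applicable component")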

Kanaan A. Faisal and Stan C. Kwasny 1990:
Design of a Hybrid Deterministic Parser

This work is an interesting contribution from the time when not just grammars, but the actual parsers were hand-coded for a specific task. It uses the (then) standard PARSIFAL method of deterministic parsing with a fixed lookahead and hand-written rules that can perform complicated actions such as `CREATE VP node' or `ACTIVATE clause-level rule packet'. Ordinarily the actions would have been selected by considering all defined rules in all active `rule packets' and executing the one with the highest static priority.

This experiment instead uses a three-layer connectionist network to decide which rule to fire. It receives a coded representation of the lookahead buffer and the top of the stack and ultimately activates one of the output nodes that represent one rule each. (Apparently the parser still has the same set of complicated actions as before, only the decision which one to use is taken by the network.)
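A minimal sketch of that decision step; the layer sizes, the input coding and the rule names are made up, only the overall architecture follows the paper:

import numpy as np

# A three-layer network maps a coded lookahead buffer plus stack top onto
# one output unit per grammar rule; the most active unit names the rule.

rng = np.random.default_rng(0)

RULES = ["CREATE-VP-NODE", "ATTACH-TO-S", "ACTIVATE-CLAUSE-PACKET"]

N_INPUT, N_HIDDEN = 40, 20                  # coded buffer + stack-top features
W1 = rng.normal(size=(N_HIDDEN, N_INPUT))   # untrained weights, for illustration only
W2 = rng.normal(size=(len(RULES), N_HIDDEN))

def select_rule(coded_buffer_and_stack):
    """Forward pass; returns the rule to fire and the raw output activations."""
    hidden = np.tanh(W1 @ coded_buffer_and_stack)
    output = W2 @ hidden
    return RULES[int(np.argmax(output))], output

rule, activations = select_rule(rng.normal(size=N_INPUT))
print(rule)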

Some grammatical sentences are given for which the hybrid system constructs the same correct parse tree as the grammar by itself. Also, some ungrammatical sentences are shown for which the hybrid system creates plausible parse trees, including one for which Charniak's PARAGRAM yields a nonsensical structure. Syntactic context can also be exploited to disambiguate lexically ambiguous words. The advantage of the hybrid system is thus that it can process some ungrammatical input with the unchanged grammar.

It is not quite clear how this actually works, though. If the rules of the grammar are retained, and the hybrid system manages to navigate the input returning a parse, could the normal parser not have found the same action sequence? Or would it have become trapped by its no-backtracking rules? `Strength' values are given for the ungrammatical sentences which `reflect the certainty with which individual actions are selected'. They are supposed to be `the reciprocal of the average error per processing step'. What is an `error' in using a deterministic parser? It cannot be what we today call a parsing error, for the ungrammatical sentences do not have canonical parse trees. Is it the event that the selected rule can in fact not be applied?

John T. Maxwell III and Ronald M. Kaplan 1993:
The Interface between Phrasal and Functional Constraints

This contribution is classified under `Hybrid Analysis' because it talks about the interface between phrasal and functional constraints as a source of complexity, distinct from the complexity of either part.

It is widely believed that both a context-free and an attribute-value component are needed for linguistic specification. The context-free part can be computed in polynomial time, while equality or unification constraints are exponentially expensive. How then should these components be combined? The obvious solution is to formulate the context-free backbone in the more general formalism and solve it as if it were exponential too. This is unattractive because it contradicts our conviction that part of the problem should be easier than we are making it.

But successive application of both algorithms is even worse: it is easy to first solve the context-free constraints and then enumerate the resulting phrase structure trees, but this is a `computational disaster': the net effect is `to produce an exponential number of potentially exponential functional constraint problems.' This particularly nasty behaviour stems not from either part, but from the unsuitable combination of both. Maxwell & Kaplan propose interleaved processing as an alternative that, while also exponential in the worst case, may fare better overall: functional constraints are solved as the constituent that they apply to is constructed; the constituent is discarded if the constraints cannot be satisfied. This strategy is still exponential but `avoids the blatant excesses of simple composition'. The paper then describes `alternative strategies that can provide exponential improvements in certain common situations and suggests a number of areas for further exploration'.
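As I read it, the interleaved strategy amounts to something like the following toy sketch, where the feature equations and the trivial solver are of course my own stand-ins for the real functional-constraint machinery:

# Functional constraints are solved as soon as a constituent is built, and
# the constituent is discarded if they cannot be satisfied, instead of first
# enumerating all trees and solving their constraints afterwards.

def unify(eqs):
    """Toy constraint solver: fail on contradictory attribute assignments."""
    solution = {}
    for attr, val in eqs:
        if solution.setdefault(attr, val) != val:
            return None
    return solution

def interleaved_combine(left, right, rule_eqs):
    """Build a constituent and solve its constraints immediately;
    return None (i.e. discard it) if they cannot be satisfied."""
    eqs = left["eqs"] + right["eqs"] + rule_eqs
    solution = unify(eqs)
    if solution is None:
        return None                      # pruned before it can multiply out
    return {"eqs": eqs, "fstruct": solution}

np_edge = {"eqs": [("NUM", "sg")]}
vp_edge = {"eqs": [("NUM", "pl")]}
print(interleaved_combine(np_edge, vp_edge, [("TENSE", "past")]))   # -> None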

If the previous algorithmic approach was really as described here (and I doubt it was), it must have been an enormously stupid thing to do. In fact, the authors say right at the start of section two that `all known polynomial parsers make essentially equivalent use of a well-formed substring table', which is exactly the opposite of the naive composition they denounce. So instead of unveiling a dramatic new method of improvement, the authors actually only give a standard `bag of tricks' report.

Amon B. Seagull and Lenhart K. Schubert 2001:
Guiding a Linguistically Well-Founded Parser with Head Patterns

The authors rightly decry previous parsing work as mere `reconstruction of parse trees of the sort annotated in the Penn Treebank'. True parsing, they say, has as its goal the construction of a semantic representation. The flatness of Penn noun phrases is criticised (with `an important job benefit factor' it is unclear whether the job or the factor is important). Also, generative rules of the same form may still differ in content; for instance there are two rules of the form NP -> NP NP, one which generates `5 Dollars a share' and one which generates `the dog biscuit'. To understand what is being said we would need to know which of the two rules created the phrase we see.

The authors want to augment a `linguistically well-founded' grammar with corpus-based probabilities. It is actually the Boeing grammar of Simplified English (which does not seem particularly well-founded to me, but does have the distinction of having been handwritten) with various custom changes mentioned only briefly. WordNet was used to build language models over word senses rather than forms. A test corpus was then extracted from both the Penn Treebank and the sense-tagged part of the Brown corpus; many pages are spent on the apparently very intricate merging mechanism. On a test set of 1144 sentences, the labelled bracketing precision increases insignificantly, but the authors say one should instead evaluate the precision of the expansion decisions, which goes up from 84.4% to 85.1%.

Berthold Crysmann, Anette Frank, Bernd Kiefer, Stefan Müller et al. 2002:
An Integrated Architecture for Shallow and Deep Processing

This paper discusses the WHITEBOARD system developed in Saarbrücken. The authors argue (somewhat defensively, since they use a deep processing system in a world owned by shallow systems) that the results of real-life tasks (information extraction) can be improved by using deep processing, at least on demand.

WHITEBOARD lets different components run in parallel and communicate through an `OOP interface'. Shallow analysis is performed by SPPC, a standard finite-state cascade for tokenization, NE detection, POS tagging etc. Deep analysis is performed by a venerable HPSG that has already done duty in Babel, Verbmobil and LKB, parsed with Callmeier's PET system.

Coupling morphology results to the HPSG was `easily established'. POS tags are used as preferences and to select between different default entries for unknown words. Recognized NEs map to single HPSG types; complex recurring NEs such as date expressions are stored as prototypical entries, i.e. pre-built feature structures that are retrieved in a single step when appropriate, but then filled with the day and month information from the actual instance so it is available to unification.
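I picture the prototypical entries roughly like this; the feature geometry and the regular expression are invented, and real HPSG feature structures are of course typed and far richer:

import copy
import re

# A pre-built feature structure for date expressions is retrieved whole and
# only the day/month values from the actual instance are filled in.

DATE_PROTOTYPE = {
    "HEAD": "date-np",
    "CONT": {"RELN": "date", "DAY": None, "MONTH": None},
}

def instantiate_date(surface):
    match = re.match(r"(\d{1,2})\.(\d{1,2})\.", surface)    # e.g. "24.12."
    if not match:
        return None
    entry = copy.deepcopy(DATE_PROTOTYPE)                   # retrieved in a single step
    entry["CONT"]["DAY"] = int(match.group(1))              # filled from the instance
    entry["CONT"]["MONTH"] = int(match.group(2))
    return entry

print(instantiate_date("24.12."))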

Semantic information about unknown nouns (which is directly used by the HPSG) is retrieved by converting the GermaNet hierarchy to the (much smaller) ontology used by the HPSG through an automated mapping. The effect of this process is not measured, but 77% of all unknown nouns receive the correct sort.
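The paper gives little detail on the mapping itself; I imagine it works along these lines, with the toy hierarchy and the sort names being entirely my own:

# Assign an HPSG sort to an unknown noun by walking up a hypernym hierarchy
# until a synset with a known mapping into the smaller ontology is reached.

HYPERNYM = {                      # synset -> its hypernym
    "Dackel": "Hund",
    "Hund": "Tier",
    "Tier": "Entität",
}
SYNSET_TO_SORT = {                # hand-made mapping into the HPSG's small ontology
    "Tier": "animate",
    "Entität": "entity",
}

def sort_for_unknown_noun(noun):
    node = noun
    while node is not None:
        if node in SYNSET_TO_SORT:
            return SYNSET_TO_SORT[node]
        node = HYPERNYM.get(node)
    return "entity"               # fall-back sort

print(sort_for_unknown_noun("Dackel"))    # -> animate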

Also, the HPSG search space can be reduced by using shallow information for a `partial pre-partitioning' of complex sentences. Becker's stochastic topological parser is used for the purpose.

When testing the entire system on the NEGRA corpus, the shallow components raised the lexical coverage (i.e. the fraction of sentences whose words are all known) from 29% to 71%. The actual syntactic coverage rose from 12.5% to 22.1%.

The actual IE task was management succession detection. Template filling rules are applied both to the shallow and to the deep results; e.g. the string `Nachfolger von X' yields a relation with the person_out slot already filled. A unification rule might match on a structure with PRED[übernehmen], AGENT and THEME features.
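Such rules might look roughly like this: one works on the shallow string level, the other on a deep predicate-argument structure. All slot and feature names here are my own.

import re

def shallow_rule(text):
    """`Nachfolger von X' -> X is the person leaving the position."""
    match = re.search(r"Nachfolger von (\w+ \w+)", text)
    if match:
        return {"person_out": match.group(1), "person_in": None, "position": None}
    return None

def deep_rule(fstruct):
    """A structure with PRED `übernehmen' plus AGENT and THEME features."""
    if fstruct.get("PRED") == "übernehmen":
        return {"person_in": fstruct.get("AGENT"),
                "position": fstruct.get("THEME"),
                "person_out": None}
    return None

print(shallow_rule("Er wird Nachfolger von Hans Meier."))
print(deep_rule({"PRED": "übernehmen", "AGENT": "Dietmar Hopp",
                 "THEME": "Entwicklungsabteilung"}))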

There is no evaluation of the earlier claim that deep processing improves performance, just anecdotal evidence: e.g. the complicated sentence `Peter Miscke zufolge wurde Dietmar Hopp gebeten, die Entwicklungsabteilung zu übernehmen.' (`According to Peter Miscke, Dietmar Hopp was asked to take over the development department.') can only be analysed correctly by a rule operating on the HPSG output, because of difficulties with control verbs, passive voice and free word order.

The discussion of language checking is even shorter: the idea is presented that a shallow system could be used to find error candidates and the deep system to verify that they actually are erroneous. This, however, is entirely hypothetical.

Roberto Bartolini, Alessandro Lenci, Simonetta Montemagni, Vito Pirrelli 2004:
Hybrid Constraints for Robust Parsing: First Experiments and Evaluation

The IDEAL+ system described here pairs `deep' and `shallow' parsing (in the authors' words). Its phases are chunking, dependency assignment and `constraint application'. Only subcategorization and order constraints are used, and the authors call them `hybrid' both because they use different knowledge sources and because they were both handwritten and induced.

The experiments investigate what contribution different constraints make to disambiguation, for the task of subject/object distinction in Italian (comparable to the task in German). Always preferring the SVO reading achieves an f-score of 93.5%; using a lexicon that knows the subcat frames of all verbs concerned achieves 90.6%; and lexicalized constraints that can judge the preverbal and postverbal position differently depending on the verb are best, achieving 95.9%.
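The point about lexicalization can be pictured like this; the probabilities are invented, and the real constraints are of course induced from corpora rather than written down:

# A global SVO preference versus a lexicalized constraint that judges the
# preverbal position per verb. All probabilities are invented for illustration.

SVO_PRIOR = 0.935                    # P(preverbal NP is the subject) overall

LEXICALIZED = {                      # P(preverbal NP is the subject | verb)
    "piacere": 0.30,                 # experiencer verbs often invert
    "mangiare": 0.97,
}

def preverbal_is_subject(verb):
    p = LEXICALIZED.get(verb, SVO_PRIOR)
    return p >= 0.5, p

print(preverbal_is_subject("mangiare"))   # SVO reading preferred
print(preverbal_is_subject("piacere"))    # OVS reading preferred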

The authors conclude that lexicalized constraints play a key role (an exaggeration, since the gain in accuracy is small) for dependency selection, provided that they are probabilistic: it is no good having a lexicon that tells you that `foo' can be transitive if that sense is, in fact, very rare.

Wolfgang Minker 2004:
Comparative Evaluation of a Stochastic Parser on Semantic and Syntactic-Semantic Labels

This reports on research into an NLU system (train travel information) that uses a Semantic Case Grammar. The stochastic parsing component did somewhat better (the error rate drops by a fifth) if syntactic-semantic information is available to it as well as semantic information. `Complex models yielding a high number of parameters are justified as long as they convey significant information.' The author seems to find this remarkable.

Kiril Simov and Petya Osenova 2004:
A Hybrid Strategy for Regular Grammar Parsing

The parser described here performs annotations for the BulTreeBank. It is supposed to annotate only certain cases and leave others underspecified. It uses different regular grammars to try to parse pieces of the input; annotation is non-monotonic, i.e. pieces of annotation can be removed again in a later step, and non-deterministic in the sense that the available grammars can be applied in different orders. Actual annotation proceeds in three steps: easy-first, bottom-up treatment of base NPs, APs and verb nuclei, clitics, NEs, idioms, dates and multiwords; top-down clause and fixed-expression detection; and network-based treatment of PPs, infinitives, coordinations, relatives and discontinuities.
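Roughly, the control regime seems to be a cascade of this kind; the grammar content, the retraction mechanism and the order of application are all invented here:

# A non-monotonic, non-deterministic cascade of regular grammars: each grammar
# may add annotations or retract earlier ones, and the grammars could in
# principle be applied in different orders.

def cascade(grammars, text):
    annotations = set()
    for grammar in grammars:                 # one possible order of application
        added, removed = grammar(annotations, text)
        annotations -= removed               # non-monotonic: retract old pieces
        annotations |= added
    return annotations

# A trivial stand-in grammar: bracket the word "kniga" as a base NP.
def base_np_grammar(annotations, text):
    added = {("NP", word) for word in ("kniga",) if word in text}
    return added, set()

print(cascade([base_np_grammar], "edna kniga"))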

Quote: `The utility of this hybrid strategy is proved during the annotation of the sentences in the BulTreeBank project.'

The article does not mention at all that the project itself (http://www.bultreebank.org/) uses HPSG structures. Interesting to see that even so they use finite-state parsers for bulk annotation.

Mary Swift, James Allen and Daniel Gildea 2004:
Skeletons in the parser: Using a shallow parser to improve deep parsing

This work does what we thought was frivolous when we considered it a while back: it uses one parser (Collins) as an oracle for another (TRIPS). (It seems I shall have to do the same for German to stay competitive.) Note that Swift calls Collins's parser `shallow', not because it doesn't create trees (it does), but simply because it is a PCFG. (I get the impression that `shallow' means `whatever it is that other people do'.)

The domain that TRIPS parses is human-to-human dialogues in (simulated) emergency rescue situations, which is quite different from WSJ text, but it is reported that there are `islands of stability' in Collins's output which are nevertheless useful. TRIPS itself is a GPSG/HPSG of English with `fairly extensive coverage' operating `close to real-time for short utterances' (1-9 words).

The PCFG's performance was only 32%/64% on the medical data, which is not surprising since it assumes systematically different phrase structure than TRIPS (see p. 386). Still, if the edges that it predicts are given a 3% bonus on their score in the TRIPS chart, parsing becomes 2.4 times faster, and sentence accuracy rises from 48.5% to 49.5%.
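The coupling itself is almost trivial; something like the following sketch, where the edge representation and the way the 3% bonus is applied are merely my reading of the paper:

# Chart edges whose bracketing also occurs in the shallow parser's output
# get a small multiplicative score bonus.

BONUS = 1.03

def boost_edges(chart_edges, pcfg_brackets):
    """chart_edges: list of (label, start, end, score);
    pcfg_brackets: set of (label, start, end) predicted by the shallow parser."""
    boosted = []
    for label, start, end, score in chart_edges:
        if (label, start, end) in pcfg_brackets:
            score *= BONUS              # prefer edges the oracle also predicts
        boosted.append((label, start, end, score))
    return boosted

edges = [("NP", 0, 2, 0.8), ("VP", 2, 5, 0.6)]
print(boost_edges(edges, {("NP", 0, 2)}))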

-- KilianAFoth -- 06 Jun 2005
 