Hybrid Analysis
The term `hybrid' with respect to NLP methods is particularly
ambiguous. It can mean `dealing with syntax and semantics', `using
deep and shallow mechanisms', `emulating both halves of the brain', or
just `using more than one knowledge source'. Here are some examples.
C.T. Kitzmiller and J.S. Kowalik 1987:
Coupling Symbolic and Numeric Computing in Knowledge-Based Systems.
AI Magazine 8, Nr. 2, 85--90 (not online)
This is a report from the Workshop on Coupling Symbolic and Numeric
Computing in Expert Systems (Seattle, 1985). Much thought was given
back then to integrating numeric optimisation with expert systems,
e.g. to help a user select the proper algorithm for a task or build an
intelligent user interface (which seems much the same thing to me).
A second motivation was to enable reasoning with ambiguous,
contradictory and imprecise data. The report even claims that
`integrating formal mathematical methods and methods based on symbolic
knowledge' could `solve some of the problems currently deemed
intractable'.
`Shallow' coupling is characterised as treating numeric routines as
black boxes, i.e. the decision when to apply numeric methods and what
to do with their results depends only on the problem's state
variables. A deeply coupled system, on the other hand, knows the
operating envelope of each of its components and selects among them
accordingly. Although the distinction does not seem very critical to
me, much is made of it; deeply coupled systems supposedly increase
robustness, performance, and maintainability.
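To make the distinction concrete, here is a toy sketch; the component
interface and the `operating envelope' predicate are my own invention,
not the report's.

  def shallow_couple(state, numeric_routine):
      """Shallow coupling: the numeric routine is a black box; whether to
      call it depends only on the problem's state variables."""
      if state["needs_optimisation"]:
          return numeric_routine(state["data"])
      return None

  class DeepCoupledSystem:
      """Deep coupling: the controller knows each component's operating
      envelope and selects among the components accordingly."""
      def __init__(self, components):
          self.components = components      # list of (routine, envelope) pairs

      def solve(self, problem):
          for routine, within_envelope in self.components:
              if within_envelope(problem):  # problem inside this envelope?
                  return routine(problem)
          raise ValueError("no component covers this problem")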
Kanaan A. Faisal and Stan C. Kwasny 1990:
Design of a Hybrid Deterministic Parser
This work is an interesting contribution from the time when not just
grammars, but the actual parsers were hand-coded for a specific task.
It uses the (then) standard PARSIFAL method of deterministic parsing
with a fixed lookahead and hand-written rules that can perform
complicated actions such as `CREATE VP node' or `ACTIVATE clause-level
rule packet'. Ordinarily the actions would have been selected by
considering all defined rules in all active `rule packets' and
executing the one with the highest static priority.
This experiment instead uses a three-layer connectionist network to
decide which rule to fire. It receives a coded representation of the
lookahead buffer and the top of the stack and ultimately activates one
of the output nodes that represent one rule each. (Apparently the
parser still has the same set of complicated actions as before; only
the decision about which one to use is made by the network.)
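Very roughly, I imagine the selection step looking like this; the
feature encoding, layer sizes and rule names are made up, and real
weights would of course be trained rather than random.

  import numpy as np

  RULES = ["CREATE-VP-NODE", "ATTACH-TO-NP", "ACTIVATE-CLAUSE-PACKET"]

  rng = np.random.default_rng(0)
  W1 = rng.normal(size=(40, 20))            # input  -> hidden
  W2 = rng.normal(size=(20, len(RULES)))    # hidden -> output, one node per rule

  def encode(buffer_cells, stack_top):
      """Map the lookahead cells plus the top of the stack to a fixed-size
      vector of coded features (placeholder encoding)."""
      vec = np.zeros(40)
      for i, item in enumerate(buffer_cells + [stack_top]):
          vec[(10 * i + hash(item)) % 40] = 1.0
      return vec

  def select_rule(buffer_cells, stack_top):
      x = encode(buffer_cells, stack_top)
      hidden = np.tanh(x @ W1)
      scores = hidden @ W2
      best = int(np.argmax(scores))
      # a crude `strength': how clearly the winner beats the runner-up
      runner_up = float(np.sort(scores)[-2])
      return RULES[best], float(scores[best]) - runner_up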
Some grammatical sentences are given for which the hybrid system
constructs the same correct parse tree as the grammar by itself. Also,
some ungrammatical sentences are shown for which the hybrid system
creates plausible parse trees, including one for which Charniak's
PARAGRAM yields a nonsensical structure. Syntactic context can also be
exploited to disambiguate lexically ambiguous words. The advantage of
the hybrid system is thus that it can process some ungrammatical input
with the unchanged grammar.
It is not quite clear how this actually works, though. If the rules of
the grammar are retained, and the hybrid system manages to navigate
the input returning a parse, could the normal parser not have found
the same action sequence? Or would it have become trapped by its
no-backtracking rules? `Strength' values are given for the
ungrammatical sentences which `reflect the certainty with which
individual actions are selected'. They are supposed to be `the reciprocal
of the average error per processing step'. What is an `error' in using
a deterministic parser? It cannot be what we today call a parsing
error, for the ungrammatical sentences do not have canonical parse
trees. Is it the event that the selected rule can in fact not be
applied?
John T. Maxwell III and Ronald M. Kaplan 1993:
The Interface between Phrasal and Functional Constraints
This contribution is classified under `Hybrid Analysis' because it
talks about the interface between phrasal and functional constraints
as a source of complexity, distinct from the complexity of either
part.
It is widely believed that both a context-free and an attribute-value
component are needed for linguistic specification. The
context-free part can be computed in polynomial time, while equality
or unification constraints are exponentially expensive. How then
should these components be combined? The obvious solution is to
formulate the context-free backbone in the more general formalism and
solve it as if it were exponential too. This is unattractive because
it contradicts our intuition that part of the problem should be easier
than we are making it.
But successive application of both algorithms is even worse: it is
easy to first solve the context-free constraints and then enumerate
the resulting phrase structure trees, but this is a `computational
disaster': the net effect is `to produce an exponential number of
potentially exponential functional constraint problems.' This
particularly nasty behaviour stems not from either part, but from the
unsuitable combination of both. Maxwell & Kaplan propose interleaved
processing as an alternative that, while also exponential in the worst
case, may fare better overall: functional constraints are solved as
the constituent that they apply to is constructed; the constituent is
discarded if the constraints cannot be satisfied. This strategy is
still exponential but `avoids the blatant excesses of simple
composition'. The paper then describes `alternative strategies that can
provide exponential improvements in certain common situations and
suggests a number of areas for further exploration'.
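Reduced to a toy, the interleaving idea looks something like this; the
flat feature dictionaries and the rule format are my simplification,
not LFG's actual machinery.

  def unify(f1, f2):
      """Naive feature unification over flat dictionaries."""
      result = dict(f1)
      for key, value in f2.items():
          if key in result and result[key] != value:
              return None                   # clash: constraints unsatisfiable
          result[key] = value
      return result

  def combine(left_items, right_items, rules):
      """Combine already-built constituents; rules map (catL, catR) to a
      (parent category, extra feature constraints) pair."""
      built = []
      for lcat, lfeat in left_items:
          for rcat, rfeat in right_items:
              if (lcat, rcat) not in rules:
                  continue
              parent, extra = rules[(lcat, rcat)]
              feats = unify(lfeat, rfeat)
              if feats is not None:
                  feats = unify(feats, extra)
              if feats is None:
                  continue                  # discard now, not after enumeration
              built.append((parent, feats))
      return built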
If the previous algorithmic approach was really as described here (and
I doubt it was), it must have been an enormously stupid thing to do.
In fact, the authors say right at the start of section two that `all
known polynomial parsers make essentially equivalent use of a
well-formed substring table', which is exactly the opposite of the
naive composition they denounce. So instead of unveiling a dramatic
new method of improvement, the authors actually only give a standard
`bag of tricks' report.
Amon B. Seagull and Lenhart K. Schubert 2001:
Guiding a Linguistically Well-Founded Parser with Head Patterns
The authors rightly decry previous parsing work as mere
`reconstruction of parse trees of the sort annotated in the Penn
Treebank'. True parsing, they say, has as its goal the construction of
a semantic representation. The flatness of Penn noun phrases is
criticised: in the flat NP `an important job benefit factor' it is
unclear whether the job or the factor is important. Also, generative rules
which have the same form and content may still be different; for
instance there are two rules of the form NP -> NP NP, one which
generates `5 Dollars a share' and one which generates `the dog
biscuit'. To understand what is being said we would need to know which
of the two rules created the phrase we see.
The authors want to augment a `linguistically well-founded' grammar
with corpus-based probabilities. It is actually the Boeing grammar of
Simplified English (which does not seem particularly well-founded to
me, but does have the distinction of having been
handwritten) with various custom changes mentioned only briefly.
WordNet was used to build language models over word senses rather
than forms. A test corpus was then extracted from both the Penn
Treebank and the sense-tagged part of the Brown corpus; many pages are
spent on the apparently very intricate merging mechanism. On a test
set of 1144 sentences, the labelled bracketing precision increases
insignificantly, but the authors say you should evaluate the precision
of the expansion decisions, which goes up from 84.4% to 85.1%.
Berthold Crysmann, Anette Frank, Bernd Kiefer, Stefan Müller et al. 2002:
An Integrated Architecture for Shallow and Deep Processing
This paper discusses the WHITEBOARD system developed in Saarbrücken.
The authors argue (somewhat defensively, since they use a deep
processing system in a world owned by shallow systems) that the
quality of results on real-life tasks (information extraction) can be
improved by using deep processing, at least when invoked on demand.
WHITEBOARD lets different components run in parallel and communicate
through an `OOP interface'. Shallow analysis is performed by SPPC, a
standard finite-state cascade for tokenization, NE detection, POS
tagging etc. Deep analysis is performed by a venerable HPSG that has
already done duty in Babel, Verbmobil and LKB, parsed with Callmeier's
PET system.
Coupling morphology results to the HPSG was `easily established'. POS
tags are used as preferences and to select between different default
entries for unknown words. Recognized NEs map to single HPSG types;
complex recurring NEs such as date expressions are stored as
prototypical entries, i.e. pre-built feature structures that
are retrieved in a single step when appropriate, but then filled with
the day and month information from the actual instance so it is
available to unification.
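The way I picture it: a pre-built feature structure is fetched in one
step and then filled with the values from the shallow analysis. The
feature names below are invented, not taken from the WHITEBOARD HPSG.

  DATE_PROTOTYPE = {
      "HEAD": "noun",
      "SORT": "temporal",
      "CONT": {"RELN": "date", "DAY": None, "MONTH": None},
  }

  def instantiate_date(ne_span):
      """ne_span: shallow NE output, e.g. {'day': 3, 'month': 10}."""
      content = dict(DATE_PROTOTYPE["CONT"],
                     DAY=ne_span["day"], MONTH=ne_span["month"])
      return dict(DATE_PROTOTYPE, CONT=content)   # ready for unification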
Semantic information about unknown nouns (which is directly used by
the HPSG) is retrieved by converting the GermaNet hierarchy to the
(much smaller) ontology used by the HPSG through an automated mapping.
The effect of this process is not measured, but 77% of all unknown
nouns receive the correct sort.
Also, the HPSG search space can be reduced by using shallow
information for a `partial pre-partitioning' of complex sentences.
Becker's stochastic topological parser is used for the purpose.
When testing the entire system on the NEGRA corpus, the shallow
components raised the lexical coverage (i.e. the fraction of sentences
whose words are all known) from 29% to 71%. The actual syntactic
coverage rose from 12.5% to 22.1%.
The actual IE task was management succession detection. Template
filling rules are applied both to the shallow and to the deep results;
e.g. the string `Nachfolger von X' yields a relation with the
person_out slot already filled. A unification rule might match on a
structure with PRED[übernehmen], AGENT and THEME features.
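A toy version of the two kinds of rules; only the slot and feature
names (person_out, PRED, AGENT, THEME) are from the paper, the rest is
my invention.

  import re

  def shallow_rule(text):
      """String-level rule: `Nachfolger von X' yields a succession relation
      with the person_out slot already filled."""
      m = re.search(r"Nachfolger von (\w+(?: \w+)?)", text)
      if m:
          return {"relation": "succession", "person_out": m.group(1)}
      return None

  def deep_rule(fs):
      """Rule over the HPSG output: PRED `uebernehmen' plus AGENT and THEME
      yields the incoming person and the position taken over."""
      if fs.get("PRED") == "uebernehmen" and "AGENT" in fs and "THEME" in fs:
          return {"relation": "succession",
                  "person_in": fs["AGENT"],
                  "position": fs["THEME"]}
      return None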
There is no evaluation of the earlier claim that deep processing
improves performance, just anecdotal evidence: e.g. the complicated
sentence `Peter Miscke zufolge wurde Dietmar Hopp gebeten, die
Entwicklungsabteilung zu übernehmen.' (`According to Peter Miscke,
Dietmar Hopp was asked to take over the development department.') can
only be analysed correctly by a rule operating on the HPSG output
because of difficulties with control verbs, passive voice and free
word order.
The discussion of language checking is even shorter: the idea is
presented that a shallow system could be used to find error candidates
and the deep system to verify that they actually are erroneous. This,
however, is entirely hypothetical.
Roberto Bartolini, Alessandro Lenci, Simonetta Montemagni, Vito Pirrelli 2004:
Hybrid Constraints for Robust Parsing: First Experiments and Evaluation
The IDEAL+ system described here pairs `deep' and `shallow' parsing
(in the authors' words). Its phases are chunking, dependency
assignment and `constraint application'. Only subcategorization and
order constraints are used, and the authors call them `hybrid' both
because they use different knowledge sources and because they were
both handwritten and induced.
The experiments investigate what contribution different constraints
make to disambiguation for the task of subject/object distinction in
Italian (comparable to the task in German). Always preferring the SVO
reading achieves an f-score of 93.5%; using a lexicon that knows the
subcat frames of all verbs concerned achieves 90.6%; and lexicalized
constraints that can judge the preverbal and postverbal position
differently depending on the verb do best, achieving 95.9%.
The authors conclude that lexicalized constraints play a key role (an
exaggeration, since the gain in accuracy is small) for dependency
selection, provided that they are probabilistic: it is no good having
a lexicon that tells you that `foo' can be transitive if that sense
is, in fact, very rare.
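What such a lexicalized, probabilistic constraint might look like; the
verbs and the probabilities are invented, the point being that the
preverbal/postverbal preference is conditioned on the individual verb
and expressed as a probability rather than a hard subcat fact.

  PREVERBAL_IS_SUBJECT = {      # P(preverbal NP is the subject | verb)
      "mangiare": 0.95,         # `to eat': nearly always SVO
      "piacere": 0.35,          # `to please': postverbal subjects are common
  }
  DEFAULT = 0.90                # fall back to the global SVO preference

  def label_arguments(verb, preverbal_np, postverbal_np):
      p = PREVERBAL_IS_SUBJECT.get(verb, DEFAULT)
      if p >= 0.5:
          return {"subject": preverbal_np, "object": postverbal_np, "score": p}
      return {"subject": postverbal_np, "object": preverbal_np, "score": 1 - p}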
Wolfgang Minker 2004:
Comparative Evaluation of a Stochastic Parser on Semantic and Syntactic-Semantic Labels
Reports on research on an NLU system (train travel information) that
uses Semantic Case Grammar. The stochastic parsing component
did somewhat better (the error rate drops by a fifth) if
syntactic-semantic information is available to it as well as semantic
information. `Complex models yielding a high number of parameters are
justified as long as they convey significant information.' The author
seems to find this remarkable.
Kiril Simov and Petya Osenova 2004:
A Hybrid Strategy for Regular Grammar Parsing
The paper describes a parser that produces annotations for the BulTreeBank.
It is supposed to annotate only certain cases and leave others
underspecified. It uses different regular grammars to try to parse
pieces of the input; annotation is non-monotonic, i.e. pieces of
annotation can be removed again in a later step, and
non-deterministic in the sense that the available grammars can be
applied in different orders. Actual annotation proceeds in three
steps: easy-first, bottom-up treatment of base NPs, APs and verb
nuclei, clitics, NEs, idioms, dates and multiwords; top-down clause
and fixed-expression detection; and network-based treatment of PPs,
infinitives, coordinations, relatives and discontinuities.
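Schematically, with the grammars as placeholder functions, the cascade
could look like this; the interface is my guess, not the paper's.

  def apply_cascade(tokens, grammars, order):
      """grammars: dict name -> function(tokens, annotations) returning a
      pair (annotations_to_add, annotations_to_remove)."""
      annotations = set()
      for name in order:                    # the order itself is a choice point
          to_add, to_remove = grammars[name](tokens, annotations)
          annotations -= to_remove          # non-monotonic: retract old spans
          annotations |= to_add
      return annotations

  # A run over the three steps described above might be:
  # apply_cascade(tokens, grammars,
  #               order=["base_phrases", "clauses", "pp_and_coordination"])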
Quote: `The utility of this hybrid strategy is proved during the
annotation of the sentences in the BulTreeBank project.'
The article does not mention at all that the project itself
(http://www.bultreebank.org/) uses HPSG structures. Interesting to see
that even so they use finite-state parsers for bulk annotation.
Mary Swift, James Allen and Daniel Gildea 2004:
Skeletons in the parser: Using a shallow parser to improve deep parsing
This work does what we thought was frivolous when we considered it a
while back: it uses one parser (Collins) as an oracle for another
(TRIPS). (It seems I shall have to do the same for German to stay
competitive.) Note that Swift calls Collins's parser `shallow', not
because it doesn't create trees (it does), but simply because it is a
PCFG. (I get the impression that `shallow' means `whatever it is that
other people do'.)
The domain that TRIPS parses is human-to-human dialogues in
(simulated) emergency rescue situations, which is quite different from
WSJ text, but it is
reported that there are `islands of stability' in Collins's output
which are nevertheless useful. TRIPS itself is a GPSG/HPSG of English
with `fairly extensive coverage' operating `close to real-time for
short utterances' (1-9 words).
The PCFG performance was only 32%/64% on the medical data, which is
not surprising since it assumes systematically different phrase
structure than TRIPS (see p. 386). Still, if you give the edges that it predicts
a 3% bonus on their score in the TRIPS chart, it becomes 2.4 times
faster, and sentence accuracy rises from 48.5% to 49.5%.
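As I read it, the mechanism amounts to the following; the chart and
edge representations are invented, only the 3% figure is from the
paper.

  BONUS = 0.03

  def boost_edges(chart_edges, pcfg_spans):
      """chart_edges: dicts with 'label', 'start', 'end', 'score';
      pcfg_spans: set of (label, start, end) triples from the PCFG parse."""
      for edge in chart_edges:
          if (edge["label"], edge["start"], edge["end"]) in pcfg_spans:
              edge["score"] *= 1.0 + BONUS  # small bonus, nothing is ruled out
      return chart_edges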
-- KilianAFoth -- 06 Jun 2005