Dependency Grammars
Keywords: constraint grammars, property grammars, dependency grammars, dependency modelling, dependency parsing
Bibtex
...
Sources
Comments
Timo Järvinen and Pasi Tapanainen:
Towards an implementable dependency grammar
The authors claim both linguistic adequacy and computational
tractability for their model, which they call
parsing grammar.
(Interestingly, they make the exact same claim as we do: "NP-hard in
general, but there is a specific solution for the given grammar that
can be run quickly.") Dependency as opposed to constituency is
motivated entirely by the argument of discontinuous constituents.
They claim that their own Functional Dependency Grammar conforms to
Tesnière's original, 1959 formulation of dependency syntax (which they
say has been much misunderstood): the nodes in dependency trees should
be `nuclei' rather than words, e.g. the entire verb chain should be
represented in just one node; each node has exactly one regent;
there are nonprojective edges, but no fragments.
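To make the nucleus idea concrete, here is a minimal sketch (my own
encoding, not theirs) of a tree whose nodes are nuclei rather than words:
from dataclasses import dataclass

@dataclass
class Nucleus:
    words: list                # the word forms making up the nucleus
    regent: "Nucleus" = None   # exactly one regent; None only for the root
    label: str = ""            # syntactic function towards the regent

# hypothetical example: the whole verb chain forms a single node
verb = Nucleus(["has", "been", "seen"])
subj = Nucleus(["John"], regent=verb, label="subject")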
Tesnière's notion of structural vs. linear order is contrasted with
the standard ID/LP; a big difference is that Tesnière does not really
formalize the linear order rules at all. The case of English
adjectives is used to argue, rather unconvincingly, that "there
are no simple rules or principles for linearisation", when in fact
they effectively give the three simple relevant rules in a footnote.
Examples are given for sentences with coordinations such as "John and
Bill love Mary and Joan" and "Jack painted the kitchen white and the
living room blue". As usual, the actual trees are quite unreadable
because the words are written into the nodes rather than at the
bottom.
The authors claim to have an efficient, broad-coverage grammar for
English, but give no evaluation. Note that Connexor's web site has a
syntax demo, but one that works rather differently; e.g. it obviously
uses words rather than nuclei.
Vincenzo Lombardo and Leonardo Lesmo:
Unit Coordination and Gapping in Dependency Theory
A dependency account of coordination is detailed that follows Melcuk
(who says that coordination is never truly symmetrical, and that
choosing one of the elements as the head is therefore justified), so
that their model is the same as ours: Dick<---and<---Jane.
Projectivity is maintained; long-distance dependencies are covered
instead by using non-lexical categories (traces), something CDG can't
do. "The dependencies rules that license coordination result from the
application of
metarules to primitive rules." The entire paper is
about particular metarules that allow analysis of gapped sentences
such as "I saw a unicorn and Carol a tyrannosaurus". An "Earley-type
parsing algorithm" is mentioned but not explained.
Tomás Holan, Vladislav Kubon, Karel Oliva and Martin Plátek:
Two Useful Measures of Word Order Complexity
Any realistic account of Czech must allow nonprojective structures;
the question is to what extent it should be allowed in their
Robust Free-Order Dependency Grammar. This paper develops measures for
such nonprojectivity.
Despite its name, FODG uses normal binary rewrite rules, with normal
nonterminals. The node-gap complexity (NG) of an analysis is the
highest number of gaps that appear in the yields of a nonterminal. It is
shown that no fixed bound for NG can be given for correct Czech
sentences.
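A small sketch of how the node-gap measure can be computed, under my own
reading of the definition (a gap is a maximal stretch of word positions
missing from a node's yield):
# 'deps' maps each node (a word position) to the positions of its dependents.
def yield_of(node, deps):
    covered = {node}
    for d in deps.get(node, []):
        covered |= yield_of(d, deps)
    return covered

def gaps(covered):
    lo, hi = min(covered), max(covered)
    missing = [p for p in range(lo, hi + 1) if p not in covered]
    # count maximal runs of consecutive missing positions
    return sum(1 for i, p in enumerate(missing)
               if i == 0 or missing[i - 1] != p - 1)

def node_gap_complexity(deps, nodes):
    return max(gaps(yield_of(n, deps)) for n in nodes)

# toy nonprojective analysis: 0 is the root, 1 governs 3, 2 is a dependent of 0
deps = {0: [1, 2], 1: [3]}
print(node_gap_complexity(deps, [0, 1, 2, 3]))   # 1: the yield {1, 3} has one gap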
Norbert Bröker:
How to define a context-free backbone for DGs: implementing a DG in the LFG formalism
(This is an extension of the earlier paper "Separating Surface order
and Syntactic Relations in a Dependency Grammar".)
Multidimensional Dependency Grammar (DG) "decouples the dependency
tree from word order, such that surface ordering is not determined by
traversing the dependency tree". Previous approaches to word order
rules in DG are compared and criticised. Bröker's solution, which is
"semantically motivated/able to restrict discontinuity to certain
constructions/able to be lexicalized/declarative/formally precise", is
Word Order Domains. "Word order domains loosely correspond to partial
phrases." In "Den Mann hat der Junge gesehen", the verb defines the
three WODs [Den Mann] [hat der Junge gesehen] [], and the nouns define
one WOD each. "Order domains define scopes for precedence predicates."
The trick now is that while dependency trees may be nonprojective, the
WOD structure remains projective. A modifier may not only be inserted
into the WOD of its head, but also of its transitive head. The labels
allowed on the path between the real and the transitive head can then
be used to limit possible extractions. The model is implemented in
LFG, with the WODs used to define a context-free backbone for DG.
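A simplified sketch of this (my own guesses for the dependency analysis;
only the three verb domains come from the example above):
# "Den Mann hat der Junge gesehen": "Mann" depends on "gesehen", so the
# dependency tree is nonprojective, but "Mann" is placed in a domain of
# the finite verb (its transitive head), and the domain structure stays
# projective.
words = ["Den", "Mann", "hat", "der", "Junge", "gesehen"]
domains = [["Den", "Mann"], ["hat", "der", "Junge", "gesehen"], []]

def span_is_contiguous(domain, pos):
    # a domain is projective if it covers a contiguous stretch of the sentence
    if not domain:
        return True
    indices = sorted(pos[w] for w in domain)
    return indices[-1] - indices[0] == len(indices) - 1

def domains_projective(domains, words):
    pos = {w: i for i, w in enumerate(words)}
    return all(span_is_contiguous(d, pos) for d in domains)

print(domains_projective(domains, words))    # True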
Elke Teich:
Types of syntagmatic grammatical relations and their representation
The question `constituency or dependency?' is replaced with `Are constituency
and dependency sufficient for representing language?', and predictably
answered with "no".
1. Coordination. Hudson's word grammar assumes that constituents
are necessary to represent coordinations, and Teich believes him. No
mention is made of alternatives.
2. Information structure. Combinatorial Categorial Grammar can
represent different topic structures of the same sentences well:
Who ate the beans? - Fred ate the beans
What did Fred do? - Fred ate the beans.
Normal CFG always brackets this as [Fred [ate [the beans]]],
which corresponds only to the first variant, while CCG can also
produce [Fred ate] [the beans].
3. Agreement: HPSG can model German noun groups with their case,
number, gender, and inflexion type well.
Teich's solution is Systemic Functional Grammar, which transcends both
paradigms. It has four types of syntagmatic structure:
1. prosodic, meaning not "speech melody", but anything that spreads
out over more than one constituent, e.g. agreement.
2. periodic, dealing with prominence, e.g. information structure.
3. interdependency, used both for subordination and coordination, only
it's called hypotaxis and parataxis.
4. constituency, meaning what you think it means.
A syntagmatic representation of a sentence combines all four
annotation types, so that Fred is simultaneously Actor, Theme, Given,
Subject, alpha (to the beta that is `ate'), etc. Even Teich admits
that "SFG is rarely used in parsing" and cites negative results.
Eva Hajicová:
Movement rules revisited
Praguian dependency grammar contains nodes with complex labels with
morphological and lexical values. Function words are not present in
the tree; their contribution is also expressed by these labels.
The distinction between morphemics (surface strings) and syntax is
motivated by noting
- that function words cannot be said to be in the same kind of
dependency relation to their head as autosemantic words, because they
cannot possibly appear alone
- there can be portmanteau words in the surface structure (such as `zum')
This paper argues for a distinction between three layers of word order:
surface order, which is what you actually observe, communicative
dynamism, and systemic ordering.
`Dynamism' refers to information content: in `I was visited by a
painter in Paris last week' (if `Paris' is stressed), `Paris' (the
focus) is the more dynamic element compared to `I' (the topic). The
principle is motivated by noting that sentences like `Everybody in
this room knows two languages' actually change their meaning depending
on whether `Everybody' or `languages' is the focus. The communicative
order corresponds to the order of nodes in the corresponding
tectogrammatical tree.
Systemic ordering deals with rules such as `Temporal precedes
Actor'.
Movement rules are then described which mediate the differences between
these orderings. Phenomena such as sentences where the focus is not
rightmost, pre-modifying adjectives (which violate the dynamism
principle), the ordering of function words (which, as you will
remember, do not actually appear in the tree), or extraposed relative
clauses are covered.
Cristina Barbero, Leonardo Lesmo, Vincenzo Lombardo and Paola Merlo:
Integration of syntactic and lexical information in a hierarchical dependency grammar
The early, structuralist account of linguistics in this century
relied heavily on the existence of categories, i.e. classes of units
that are in some sense interchangeable. But recently the importance of
lexicalization, i.e. not treating all nouns as equivalent, has grown
considerably. This paper investigates interaction between lexical and
syntactic information by organizing the entire grammar around the
subcategorization principle.
The formalism, of course, is the same as the one used in the second
paper. A dependency rule might look like this:
sehe:VVFIN ( # )
Of course it is wasteful to have a rule of this kind for each
transitive verb. Therefore a fine-grained subcategorization hierarchy
is introduced. The left-corner filtering Earley parser has three
possible actions.
1. Prediction: guessing the dependents of the current node
2. Scanning: checking that the current word matches the subcategory
needed by the current node (otherwise backtracking is done)
3. Completion: Move on when the current node is complete.
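A minimal sketch of how the Scanning step can use such a hierarchy (the
class names and lexicon entries are made up; the paper's actual hierarchy
is finer-grained):
# Rules mention a verb class rather than an individual verb; a word passes
# the Scanning check if its lexical class is subsumed by the predicted class.
parents = {"trans-verb": "verb", "intrans-verb": "verb", "verb": None}
lexicon = {"sehe": "trans-verb", "schlafe": "intrans-verb"}

def subsumes(general, specific):
    while specific is not None:
        if specific == general:
            return True
        specific = parents[specific]
    return False

def scan(word, predicted_class):
    # failure here would trigger backtracking in the parser
    return subsumes(predicted_class, lexicon[word])

assert scan("sehe", "verb")
assert not scan("schlafe", "trans-verb")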
A classification of 101 Italian verbs is then undertaken, with no
discernible relation to the previous exposition.
Harri Arnola:
On Parsing Binary Dependency Structures Deterministically in Linear Time
Very strong claims are made: "Deterministic, linear parsing of
dependency structures is possible under certain conditions. We
discuss a fully implemented parser and argue that those conditions
hold for at least one natural language." The system achieves 85%
correctness for Finnish.
Wolfgang Menzel and Ingo Schröder:
Decision Procedures for Dependency Parsing Using Graded Constraints
This was our own contribution.
Marie Bourdon, Lyne Da Sylva, Michel Gagnon, Alma Kharrat, Sonja Knoll and Anna Maclachlan:
A Case Study in Implementing Dependency-Based Grammars
This reports on a commercial grammar checker for English that uses
dependency structures. A problem arose with stranded prepositions:
I was yelled at.
They knew the man we talked about.
The linguistically correct way of dealing with them opens a huge
search space, since every preposition is hypothetically treated as
dangling. They solved the problem in different ways: for
pseudo-passives, a distinctive new edge label `prep-strand' is used,
and extra constraints make sure that it is only used when the
preposition matches the verb (e.g. `at' matches `yell'). (This cannot
be the whole truth, obviously. This kind of matching constraint would
prevent the `stranded' reading of `I yelled with all my power', but in
the much more common `I yelled at him' it would still be tried.)
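A rough sketch of what the matching constraint amounts to (the table is
invented; the paper only gives the `at'/`yell' example):
# License the 'prep-strand' edge only for lexically sanctioned
# verb/preposition pairs (hypothetical table).
strandable = {("yell", "at"), ("talk", "about")}

def allows_prep_strand(verb_lemma, preposition):
    return (verb_lemma, preposition) in strandable

print(allows_prep_strand("yell", "at"))     # True:  "I was yelled at."
print(allows_prep_strand("yell", "with"))   # False: blocks "I yelled with ..."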
For relative clauses, the stranded preposition is attached to the
antecedent, and this is only allowed if that word really is an
antecedent, i.e. carries a relative clause. However, some other
constraints had to be relaxed because this makes the relative clause
itself incomplete (`we talked').
The approach can deal with long-distance dependencies, but breaks down
when the stranded preposition is not clause-final: `The man I talked
about with Mary' is not possible.
An `overall improvement of the parser's performance' is reported, with
no further details.
Jacques Courtin and Damien Genthial:
Parsing with Dependency Relations and Robust Parsing
Courtin describes generative dependency grammars with rules such as
V(N,*,N) or similar. As usual, this would require one rule for every
conceivable type of VP: one with no adverbials, one with one
adverbial, etc. For efficiency `dependency relations' are introduced,
e.g.:
N -> A := (-14,-15)
N -> D := (-16)
V -> N := (-20,+20)
This means that the determiner must precede adjectives, since -16 <
-15, that nothing else can come between two adjectives, since -15 + 1
= -14, and that nouns can follow or precede verbs, and need not be
adjacent, as there is lots of room between -20 and 20.
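A small sketch of this check, under my reading of the numbers as admissible
position slots of a dependent relative to its governor (0 = the governor
itself); the category pairs are taken from the rules above:
relations = {
    ("N", "A"): (-15, -14),   # adjectives directly before the noun
    ("N", "D"): (-16, -16),   # the determiner before the adjectives
    ("V", "N"): (-20, +20),   # before or after the verb, need not be adjacent
}

def allowed(gov_cat, dep_cat, offset):
    lo, hi = relations[(gov_cat, dep_cat)]
    return lo <= offset <= hi

# a determiner at -16 and adjectives at -15 and -14, all preceding the noun
print(allowed("N", "D", -16), allowed("N", "A", -15), allowed("N", "A", -14))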
The parsing algorithm works on POS-tagged input and the DRs and
enumerates all possible structures in a matrix, by computing
recursively all possible subtrees for each governor. Parsing is
"instantaneous" for sentences with up to 20 analyses.
The parser was then to be used in an NLL system, so it could not use
morphological info (since these funny French people don't know about
soft constraints). Instead, a totally new parser was written which
used semantic knowledge to disambiguate. This one works by combining
rewriting rules of the type
N_V [
(1:{N}, (0, $F:{pf}) 2:{V})
=>
( ( 1, $F ) 2 ) ]
This means that if you find an N with preverbal particles following
it, and a V after that, you may combine them, and the V will be the
head (parentheses indicate dependency). This kind of context-dependent
rule can restrict the number of children one regent has, or forbid
multiple articles for one noun, etc. You can also write
N_C [
(1:{N}, 2:{C}, 3:{N})
=>
( (1) 2 (3) ) ]
and attach stuff at the left and at the right simultaneously.
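A rough procedural reading of the two rules (my own encoding; in
particular, where exactly the preverbal particles attach is a guess):
# Tree encoding: (head_word, [dependent subtrees]).
def apply_N_V(noun, particles, verb):
    # N_V: a noun, optional preverbal particles, then a verb -> verb is head
    deps = [(noun, [])] + [(p, []) for p in particles]
    return (verb, deps)

def apply_N_C(left, conjunction, right):
    # N_C: attach material on the left and on the right simultaneously
    return (conjunction, [(left, []), (right, [])])

print(apply_N_V("Jean", ["le", "lui"], "donne"))   # hypothetical French example
print(apply_N_C("Dick", "et", "Jane"))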
Since this would again require one rule for every possible set of
modifiers, a hierarchy of categories is used where both `cnoun' and
`pnoun' can match `noun'. A tree transducer then computes all possible
trees non-deterministically.
Psi-terms (typed feature structures with sharing) are used to
"represent knowledge about words and trees":
UL(lex => "eats";
cat => verb;
subj => UL(sem => S:ANIMATE);
obj => UL(sem => O:EATABLE);
sem => INGEST(agent => S;
patient => O))
One such term is attached to each node in the tree. Strangely, the
paper ends here.
Tom B.Y. Lai and Huang Changning:
Complements and Adjuncts in Dependency Grammar Parsing Emulated by a Constrained Context-Free Grammar
Chinese is represented in dependency structures with rules such as
X(A,B,*,D,E)
Y(*)
*(Z)
meaning that an X can occur between AB and DE, Y can occur with no
modifiers (= is a leaf), and Z can occur with no regent (= is a root).
(This kind of formalism is weakly equivalent to CFG.) _Functional
annotation_ is the equivalent of edge labels and allows writing rules
like
VVFIN(NN(SUBJ), *, NN(OBJA))
Within Gazdar's PATR parser, such rules are implemented as follows:
TVP ---> [PN1, TV, PN2] :-
    TVP:cat === tv,
    TV:cat === tv,
    PN1:cat === n,
    PN2:cat === n,
    TVP:ds === TV:ds,
    TVP:ds:subj === PN1:ds,
    TVP:ds:obj === PN2:ds.
i.e. with the nonterminals annotated by features. The authors call
this a context-free PSG constrained with grammatical function
annotations.
Control verb sentences (`A wants to hit B'), where Hudson would
require A to have two regents, are instead modeled by function
annotations (but it isn't clear how; in the example tree their own
solution simply omits the second edge with no substitute). Extraction,
raising and extraposition are said to be dealt with similarly.
The authors are obviously concerned about the presence of CFG
derivation rules, and explicitly question whether their creation is
still a dependency grammar. They test themselves against three
postulates made by Hudson:
1. There should be no transformational component - check.
2. `Dependency should be basic, not derived' - check.
3. Rules of grammar should not be formally distinct from
subcategorization facts. - fail.
Apparently the treatment of adjuncts is a problem under these
postulates because they are not motivated lexically (i.e. by
subcategorization info). The authors justify themselves,
"Adjunct rules are not related to any subcategorization facts. We
believe that their existence (in small numbers) is justified in our
emulation model. Even in a DG formalism that does not have
phrase-structure-like rules, there have to be some general facilities
to take care of such non-lexical grammatical mechanisms. DG is
lexically orientated, but it has to cope with non-lexical grammatical
mechanisms, where they exist, in language."
Yves Lepage, Ando Shin-Ichi, Akamine Susumu and Iida Hitoshi:
An Annotated Corpus in Japanese Using Tesnière's Structural Syntax
Tesnière (1959) explained phenomena in many languages to support his
claim that dependency grammar is descriptively adequate. He cited, but
did not exemplify, Japanese; this paper does just that.
Japanese has the usual classes of content and function words, concrete
and abstract concepts etc. Case-marked phrases permute freely, which
fits DG well. Constituents can also be left out, even the subject.
Tesnière's (and our) principle of linking the subject to the finite
verb and the object to the full verb is also followed. Forests are
advocated to represent sentences with elliptical conjunctions. The
concept of transference is used to model na-adjectives (naturalized
foreign words which behave more like nouns) simply.
6500 sentences from hotel reservation dialogues have been annotated using this
model.