+ Fragmentation Experiments
++ Content
Description
The goal of these experiments is to achieve a graceful fragmentation of an utterance analysis that yields reasonable partial parses. There are basically two approaches: (1) write a grammar for reasonable partial parses and then invent recombination constraints that join the parts into a full-fledged syntax analysis, or (2) derive a grammar for partial parsing from an existing syntax grammar that performs a deep syntax analysis. Approach (2) fits our starting point, since we already have a fairly mature grammar for German newspaper articles, the HeiseGrammar. This grammar handles partial parses by constraining fragmentation: fragments are allowed but penalized. A few constraints regulate fragmentation:
- the unary Fragmentation constraint, which forbids arbitrary material from being linked to NIL
- the binary mehrere Hauptsätze ("multiple main clauses") constraint, which forbids more than one finite verb root in a single analysis
- the existence constraints expressed by the auxiliary OBL levels, which forbid unsatisfied verb frame demands
Experiments
In partial parsing, an increasing number of edges are linked to NIL as the Fragmentation constraint is softened. Since this behaviour is desired, we do not count those bindings as errors even though the annotation says they are. The number of ignored errors is counted separately.
Other settings are the following:
- HeiseGrammar version 0.9, that is the last one with OBL levels
- using the TreeTagger (ignoring impossible tags and normalizing the remaining tag scores to one)
- no chunker
- time limit is 3 minutes
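The tagger setting can be pictured with a small sketch. It assumes that "normalizing to one" means the remaining scores sum to one; scaling the best remaining tag to one would be the other plausible reading. The function name and the example tags are hypothetical and not taken from the parser.

```python
def normalize_tag_scores(tag_scores, possible_tags):
    """Drop tags the lexicon rules out and rescale the rest to sum to one.

    Assumption: 'normalizing to one' means a sum of one, not a maximum of one.
    """
    kept = {tag: score for tag, score in tag_scores.items() if tag in possible_tags}
    total = sum(kept.values())
    if total == 0:
        # The tagger offers nothing usable: fall back to a uniform distribution.
        if not possible_tags:
            return {}
        return {tag: 1.0 / len(possible_tags) for tag in sorted(possible_tags)}
    return {tag: score / total for tag, score in kept.items()}


# Hypothetical tagger output for one word; VVFIN is impossible for this word form.
scores = normalize_tag_scores({"NN": 0.6, "VVFIN": 0.3, "ADJA": 0.1}, {"NN", "ADJA"})
print(scores)
```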
Baseline
The following table shows the numbers for frobbing with all levels switched on in an unmodified grammar, using the normal parse verification (no ignored errors).
Softening 'Fragmentation'
In the first experiments we switched off all levels except the SYN level. This reduces the original parsing problem considerably, since no existence constraints have to be obeyed. Below we soften the Fragmentation constraint in a controlled way. The first row is a kind of baseline, since 0.01 is the original penalty of the Fragmentation constraint. So is the last experiment, which in effect switches off the Fragmentation constraint.
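Why a penalty close to 1 softens the constraint can be seen in a minimal sketch. It assumes the usual scoring of weighted constraint grammars, where the score of an analysis is the product of the penalties of all violated constraints, so that 1.0 has no effect and values near 0 are close to a hard prohibition. The competing attachment penalty of 0.5 is a made-up value for illustration.

```python
from math import prod


def analysis_score(penalties):
    """Score of an analysis: product of the penalties of its violated constraints."""
    return prod(penalties)  # the empty product is 1, i.e. nothing is violated


# Hypothetical: attaching a word violates some other soft constraint with penalty 0.5,
# while detaching it violates only the Fragmentation constraint.
ATTACHMENT_PENALTY = 0.5

for fragmentation_penalty in (0.01, 0.8, 0.99, 1.0):
    detached = analysis_score([fragmentation_penalty])
    attached = analysis_score([ATTACHMENT_PENALTY])
    choice = "detach" if detached > attached else "attach"
    print(f"penalty {fragmentation_penalty:.4f}: {choice}")
```

Under this assumption the parser detaches a word as soon as every available attachment costs more than the Fragmentation penalty, which is why the number of NIL edges grows as the penalty approaches 1.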
Overall performance
| no | flavour | penalty | ignored errors % | structural recall % | labelled recall % | lexical recall % | time |
| 1 | frobbing combined | 0.0100 | 03.15 | 72.35 | 66.35 | 60.03 | 1 d 3 h 28 m 35 s 350 ms |
| 2 | frobbing combined | 0.8000 | 03.79 | 72.62 | 66.51 | 60.23 | 1 d 5 h 29 m 11 s 240 ms |
| 3 | frobbing combined | 0.8500 | 03.85 | 72.59 | 66.51 | 60.20 | 1 d 6 h 14 m 850 ms |
| 4 | frobbing combined | 0.9000 | 04.79 | 72.98 | 66.87 | 60.60 | 1 d 7 h 12 m 7 s 100 ms |
| 5 | frobbing combined | 0.9500 | 08.51 | 75.01 | 68.97 | 62.81 | 1 d 9 h 56 m 10 s 660 ms |
| 6 | frobbing combined | 0.9900 | 31.37 | 82.22 | 75.00 | 70.03 | 1 d 4 h 11 m 46 s 380 ms |
| 7 | frobbing combined | 0.9990 | 52.20 | 91.41 | 85.99 | 80.91 | 14 h 49 m 55 s 530 ms |
| 8 | frobbing combined | 0.9999 | 53.12 | 91.83 | 86.52 | 81.42 | 10 h 25 m 12 s 430 ms |
| 9 | frobbing combined | 1.0000 | 61.42 | 93.96 | 91.12 | 85.90 | 6 h 9 m 21 s 190 ms |
| 10 | frobbing dynamic | 0.0100 | 05.40 | 77.27 | 71.27 | 65.03 | 9 h 38 m 59 s 880 ms |
| 11 | frobbing dynamic | 0.8000 | 08.47 | 79.78 | 73.89 | 67.86 | 8 h 57 m 26 s 460 ms |
| 12 | frobbing dynamic | 0.8500 | 08.56 | 79.94 | 74.13 | 68.11 | 9 h 35 m 3 s 680 ms |
| 13 | frobbing dynamic | 0.9000 | 09.64 | 80.17 | 74.32 | 68.36 | 9 h 48 m 55 s 970 ms |
| 14 | frobbing dynamic | 0.9500 | 12.92 | 81.63 | 75.81 | 69.94 | 10 h 16 m 35 s 680 ms |
| 15 | frobbing dynamic | 0.9900 | 33.73 | 86.40 | 79.12 | 74.10 | 5 h 42 m 50 s 30 ms |
| 16 | frobbing dynamic | 0.9990 | 52.62 | 93.57 | 88.12 | 82.54 | 2 h 5 m 39 s 500 ms |
| 17 | frobbing dynamic | 0.9999 | 53.71 | 93.99 | 88.70 | 83.08 | 1 h 46 m 52 s 210 ms |
| 18 | frobbing dynamic | 1.0000 | 62.47 | 96.25 | 93.50 | 87.47 | 1 h 14 m 30 s 80 ms |
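How the ignored errors enter the recall columns can be sketched as follows. This is an assumed reconstruction, not the actual evaluation code: it treats a word that the parser links to NIL against the annotation as an ignored error and still counts it as correct, which is consistent with recall rising together with the ignored errors in the table. Lexical recall is left out because its definition is not given here. Heads are word positions, `None` stands for NIL, and the toy analyses are invented.

```python
NIL = None  # head value of a word that is not attached to any other word


def evaluate(gold, parsed):
    """Compare two analyses given as lists of (head, label) pairs, one per word."""
    total = len(gold)
    ignored = structural = labelled = 0
    for (gold_head, gold_label), (head, label) in zip(gold, parsed):
        if head is NIL and gold_head is not NIL:
            # Detached word: tolerated, but counted separately (assumption).
            ignored += 1
            structural += 1
            labelled += 1
        elif head == gold_head:
            structural += 1
            if label == gold_label:
                labelled += 1
    return {
        "ignored errors %": 100.0 * ignored / total,
        "structural recall %": 100.0 * structural / total,
        "labelled recall %": 100.0 * labelled / total,
    }


# Toy four-word sentence: word 2 is detached, word 4 is attached to the wrong head.
gold = [(2, "DET"), (3, "SUBJ"), (NIL, "ROOT"), (3, "ADV")]
parsed = [(2, "DET"), (NIL, "ROOT"), (NIL, "ROOT"), (2, "ADV")]
result = evaluate(gold, parsed)
print(result)
```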
Three-Way Comparison: which edges become NIL once
Here is a closer look at last week's fragmentation behaviour.
Summary:
- of the structurally wrong edges in the baseline case (normal grammar), more and more edges go to NIL as the fragmentation penalty rises. No surprise here.
- for the additional NIL edges established, two out of three are initially `good' (i.e. the parser refuses attachment instead of making a wrong one). As the penalty rises, this ratio goes down to one out of three.
- initially, mostly very long, very loose relations turn into NIL, namely subclauses and conjunctions. As the penalty rises, other cases become more prominent: PP, ADV, SUBJ, OBJA often become detached (good) but also PN and DET (bad).
Details:
What happened to the errors originally made?
42132 syntax edges, 9089 errors in baseline performance
penalty 0.0100: 18.6% corrected, 63.9% unresolved, 17.5% to NIL
penalty 0.8000: 16.9% corrected, 56.9% unresolved, 26.2% to NIL
penalty 0.8500: 17.2% corrected, 56.4% unresolved, 26.4% to NIL
penalty 0.9000: 16.5% corrected, 55.5% unresolved, 28.0% to NIL
penalty 0.9500: 15.3% corrected, 50.0% unresolved, 34.7% to NIL
penalty 0.9900: 10.3% corrected, 29.7% unresolved, 60.1% to NIL
penalty 0.9990: 5.3% corrected, 14.0% unresolved, 80.7% to NIL
penalty 0.9999: 5.1% corrected, 13.5% unresolved, 81.4% to NIL
penalty 1.0000: 3.4% corrected, 9.1% unresolved, 87.5% to NIL
Were the additional NIL edges good or bad?
penalty 0.0100: 2352 false NILs, 66.0% good, 34.0% bad
penalty 0.8000: 3626 false NILs, 63.2% good, 36.8% bad
penalty 0.8500: 3668 false NILs, 63.1% good, 36.9% bad
penalty 0.9000: 4096 false NILs, 59.7% good, 40.3% bad
penalty 0.9500: 5439 false NILs, 56.0% good, 44.0% bad
penalty 0.9900: 13925 false NILs, 38.2% good, 61.8% bad
penalty 0.9990: 22089 false NILs, 32.5% good, 67.5% bad
penalty 0.9999: 22522 false NILs, 32.1% good, 67.9% bad
penalty 1.0000: 26129 false NILs, 29.8% good, 70.2% bad
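The two breakdowns above can be read as the following classification, given here as a sketch under assumptions. A baseline error counts as corrected if the softened parse finds the annotated head, as going to NIL if the word is detached, and as unresolved otherwise. A NIL edge that the annotation does not have counts as good if the baseline attachment of that word was wrong, and as bad if it was right. The exact bookkeeping of the original evaluation may differ, and labels are ignored here.

```python
from collections import Counter

NIL = None  # head value of a detached word


def three_way(gold_heads, baseline_heads, softened_heads):
    """Compare annotation, baseline parse and softened parse word by word."""
    fate = Counter()      # what happened to the structural errors of the baseline
    new_nils = Counter()  # quality of NIL edges that contradict the annotation
    for gold, base, soft in zip(gold_heads, baseline_heads, softened_heads):
        if base != gold:
            if soft == gold:
                fate["corrected"] += 1
            elif soft is NIL:
                fate["to NIL"] += 1
            else:
                fate["unresolved"] += 1
        if soft is NIL and gold is not NIL:
            # Good: the parser refuses an attachment it previously got wrong.
            new_nils["good" if base != gold else "bad"] += 1
    return fate, new_nils


# Toy five-word sentence with invented head positions.
gold = [2, 3, NIL, 3, 4]
baseline = [2, 1, NIL, 5, 4]
softened = [2, NIL, NIL, 3, NIL]
fate, new_nils = three_way(gold, baseline, softened)
print(dict(fate), dict(new_nils))
```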
These edges turned into ROOT:
penalty 0.0100:
KON 797 REL 298 OBJC 247 ROOT 228 NEB 201 CJ 166 ADV 126 SUBJ 89 AUX 68 OBJA 34
APPO 31 ATTR 28 PN 12 PP 9 DET 6 KONJ 5 GMOD 3 AVZ 1 KOM 1 OBJA2 1
OBJG 1
penalty 0.8000:
KON 890 OBJC 370 SUBJ 362 REL 340 NEB 275 ROOT 258 CJ 199 ADV 166 PP 137 AUX 129
APPO 101 KONJ 93 OBJA 79 DET 59 PN 53 ATTR 42 KOM 17 AVZ 17 GMOD 13 OBJP 9
OBJD 7 PART 5 ZEIT 2 OBJG 1 EXPL 1 OBJA2 1
penalty 0.8500:
KON 897 OBJC 372 SUBJ 365 REL 342 NEB 276 ROOT 258 CJ 204 ADV 162 PP 149 AUX 135
APPO 101 KONJ 96 OBJA 82 DET 59 PN 52 ATTR 45 AVZ 17 KOM 17 GMOD 13 OBJD 8
OBJP 7 PART 6 ZEIT 2 EXPL 1 OBJA2 1 OBJG 1
penalty 0.9000:
KON 901 PP 429 SUBJ 376 OBJC 374 REL 345 NEB 274 ROOT 262 ADV 221 CJ 201 AUX 152
KONJ 105 APPO 101 OBJA 85 DET 75 PN 64 ATTR 48 KOM 17 AVZ 17 GMOD 16 OBJP 12
OBJD 9 PART 7 ZEIT 2 OBJG 1 EXPL 1 OBJA2 1
penalty 0.9500:
PP 1183 KON 983 ADV 402 SUBJ 401 OBJC 391 REL 353 NEB 275 ROOT 260 CJ 216 AUX 200
KONJ 140 PN 136 DET 108 APPO 105 OBJA 105 ATTR 55 GMOD 31 OBJP 28 AVZ 24 KOM 19
PART 9 OBJD 8 ZEIT 4 OBJA2 1 EXPL 1 OBJG 1
penalty 0.9900:
PP 2982 ADV 1941 DET 1170 KON 1070 PN 1009 SUBJ 705 AUX 699 KONJ 436 OBJA 423 OBJC 423
ROOT 380 REL 369 GMOD 363 PART 358 APPO 357 NEB 316 CJ 303 ATTR 240 KOM 166 OBJP 82
AVZ 53 OBJD 28 ZEIT 27 GRAD 20 OBJA2 3 EXPL 1 OBJG 1
penalty 0.9990:
PP 3506 PN 2504 ADV 2470 SUBJ 1733 OBJA 1528 DET 1517 AUX 1428 APPO 1148 KON 1108 GMOD 560
CJ 549 OBJC 479 ATTR 449 KONJ 439 OBJP 438 ROOT 393 REL 385 PART 363 NEB 327 KOM 324
AVZ 139 OBJD 121 ZEIT 91 GRAD 69 OBJA2 13 EXPL 5 OBJG 2 OBJN 1
penalty 0.9999:
PP 3512 PN 2585 ADV 2476 SUBJ 1840 OBJA 1593 DET 1558 AUX 1444 APPO 1225 KON 1107 GMOD 562
CJ 552 OBJC 479 ATTR 457 OBJP 439 KONJ 439 ROOT 393 REL 386 PART 363 NEB 329 KOM 324
AVZ 139 OBJD 128 ZEIT 98 GRAD 69 OBJA2 13 EXPL 9 OBJG 2 OBJN 1
penalty 1.0000:
PP 3608 PN 3191 ADV 2563 SUBJ 2471 DET 1997 OBJA 1796 AUX 1669 ATTR 1349 APPO 1272 KON 1115
CJ 744 GMOD 624 OBJC 493 OBJP 475 KONJ 440 ROOT 394 REL 386 PART 363 NEB 327 KOM 326
OBJD 152 AVZ 150 ZEIT 111 GRAD 69 EXPL 25 OBJA2 16 OBJG 2 OBJN 1
Original label distribution:
ROOT 7387 DET 5965 PN 4514 PP 3675 ADV 2920 SUBJ 2840 ATTR 2398 AUX 2205 OBJA 2089 APPO 1437
CJ 1373 KON 1123 GMOD 743 OBJC 525 OBJP 503 KONJ 442 REL 402 PART 363 NEB 346 KOM 326
OBJD 175 AVZ 150 ZEIT 113 GRAD 69 EXPL 26 OBJA2 19 OBJG 2 OBJN 2
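Relating the per-penalty counts to the original label distribution makes the shift described in the summary easier to see. The sketch below copies a few label counts from the lists above and computes, for each label, the share of its edges that were detached. It assumes that the per-penalty counts refer to the annotated label of each detached edge. The helper names are hypothetical.

```python
def parse_counts(text):
    """Turn a 'LABEL count LABEL count ...' string into a dictionary."""
    tokens = text.split()
    return {label: int(count) for label, count in zip(tokens[0::2], tokens[1::2])}


def detachment_rates(detached, original):
    """Share of each label's edges that were detached, between 0 and 1."""
    return {label: count / original[label] for label, count in detached.items()}


# Subsets of the lists above (original distribution, penalty 0.0100, penalty 1.0000).
original = parse_counts("DET 5965 PN 4514 PP 3675 ADV 2920 SUBJ 2840 KON 1123 REL 402")
low = parse_counts("KON 797 REL 298 ADV 126 SUBJ 89 PN 12 PP 9 DET 6")
high = parse_counts("PP 3608 PN 3191 ADV 2563 SUBJ 2471 DET 1997 KON 1115 REL 386")

for name, counts in (("0.0100", low), ("1.0000", high)):
    rates = detachment_rates(counts, original)
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    print(name, " ".join(f"{label} {rate:.1%}" for label, rate in ranked))
```

At penalty 0.0100 the loose relations dominate: 797 of 1123 KON edges (about 71%) are detached but only 9 of 3675 PP edges. At penalty 1.0000 nearly all PP edges (3608 of 3675, about 98%) and about a third of the DET edges (1997 of 5965) are detached.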
--
MichaelDaum - 27 Nov 2002