+ Fragmentation Experiments

++ Content

Description

The goal of these experiments is to achieve a graceful fragmentation of an utterance analysis that yields reasonable partial parses. There are basically two approaches: (1) write a grammar for reasonable partial parses and then invent recombination constraints that bring the parts together into a full-fledged syntax analysis, or (2) derive a grammar for partial parsing from an existing syntax grammar that performs a deep syntax analysis. Approach (2) fits our starting point, since we already have a fairly mature grammar for German newspaper articles, the HeiseGrammar. This grammar already handles partial parses by constraining fragmentation: fragments are allowed but penalized. Several constraints regulate fragmentation:
  • the unary Fragmentation constraint forbidding arbitrary material to be linked to NIL
  • the binary mehrere Hauptsätze ("several main clauses") constraint forbidding more than one finite verb root in a single analysis
  • the existence constraints expressed by auxiliary OBL levels forbidding unsatisfied verb frame demands

Experiments

When parsing partially, an increasing number of edges are linked to NIL, triggered by the softened Fragmentation constraint. As this behaviour is desired, we don't count those bindings as errors even though the annotation says otherwise. The number of ignored errors is counted separately.
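The counting rule can be sketched as follows. This is only an illustration: the function name and the encoding of NIL as head index 0 are assumptions, not taken from the actual evaluation tool, and how the ignored errors enter the recall figures is left open here.

```python
NIL = 0  # assumed encoding: head index 0 means "attached to NIL"


def count_edges(gold_heads, parsed_heads):
    """Count correct edges, real errors and ignored errors (false NILs)."""
    counts = {"correct": 0, "errors": 0, "ignored": 0}
    for gold, parsed in zip(gold_heads, parsed_heads):
        if parsed == gold:
            counts["correct"] += 1
        elif parsed == NIL:
            # The annotation demands an attachment, but the parser chose NIL:
            # tolerated in these experiments and counted separately.
            counts["ignored"] += 1
        else:
            counts["errors"] += 1
    return counts
```

For example, `count_edges([2, 0, 2, 3], [2, 0, 0, 2])` gives two correct edges, one ignored error and one real error.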

Other settings are the following:
  • HeiseGrammar version 0.9, which is the last one with OBL levels
  • using the TreeTagger (ignoring impossible tags and normalizing the remaining tag scores to one)
  • no chunker
  • time limit is 3 minutes
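The tag score handling mentioned above might look roughly like this. It is a sketch under assumptions: "impossible" is taken to mean tags that the lexicon does not allow for the word, and the function and parameter names are hypothetical rather than part of TreeTagger or the parser.

```python
def normalize_tag_scores(tag_scores, possible_tags):
    """Drop impossible tags, then rescale the remaining scores to sum to one."""
    kept = {tag: score for tag, score in tag_scores.items() if tag in possible_tags}
    total = sum(kept.values())
    if total == 0:
        return {}  # no usable tagger information for this word
    return {tag: score / total for tag, score in kept.items()}
```

With tagger scores `{"NN": 0.6, "VVFIN": 0.3, "XY": 0.1}` and only `NN` and `VVFIN` allowed, the result assigns roughly two thirds to `NN` and one third to `VVFIN`.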

Baseline

The following table shows the results for frobbing with all levels switched on in the unmodified grammar, using the normal parse verification (no ignored errors).

| no | flavour           | structural recall % | labelled recall % | lexical recall % | time                      |
| 1  | frobbing combined | 75.26               | 70.48             | 64.13            | 2 d 6 h 17 m 24 s 670 ms  |
| 2  | frobbing dynamic  | 78.91               | 74.39             | 68.63            | 1 d 9 h 20 m 3 s 420 ms   |

Softening 'Fragmentation'

In the first experiments we switched off all levels except the SYN level. This reduces the original parsing problem a lot, as there are no existence constraints left to obey. Below we soften the Fragmentation constraint in a controlled way. The first row serves as a kind of baseline, since 0.01 is the original penalty of the Fragmentation constraint. The last experiment (penalty 1.0) is a reference point as well, since it in effect switches the Fragmentation constraint off.
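Why a penalty close to 1.0 makes fragments cheap can be illustrated with a small sketch. It assumes that the score of an analysis is the product of the penalties of all violated constraints, as is common in weighted constraint dependency grammars; this page does not spell out the scoring, and the function names are hypothetical.

```python
from math import prod


def analysis_score(violated_penalties):
    """Score of an analysis: product of the penalties of all violated constraints."""
    return prod(violated_penalties)


def prefers_fragment(fragmentation_penalty, attachment_penalties):
    """True if leaving a word unattached scores strictly better than an
    attachment that violates constraints with the given penalties."""
    return fragmentation_penalty > analysis_score(attachment_penalties)
```

Under this assumption, `prefers_fragment(0.01, [0.5])` is false while `prefers_fragment(0.99, [0.9])` is true: with the original penalty almost any attachment beats a fragment, whereas with a softened penalty even a mildly penalized attachment loses against NIL.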

Overall performance

| no | flavour           | penalty | ignored errors % | structural recall % | labelled recall % | lexical recall % | time                      |
| 1  | frobbing combined | 0.0100  |  3.15            | 72.35               | 66.35             | 60.03            | 1 d 3 h 28 m 35 s 350 ms  |
| 2  | frobbing combined | 0.8000  |  3.79            | 72.62               | 66.51             | 60.23            | 1 d 5 h 29 m 11 s 240 ms  |
| 3  | frobbing combined | 0.8500  |  3.85            | 72.59               | 66.51             | 60.20            | 1 d 6 h 14 m 850 ms       |
| 4  | frobbing combined | 0.9000  |  4.79            | 72.98               | 66.87             | 60.60            | 1 d 7 h 12 m 7 s 100 ms   |
| 5  | frobbing combined | 0.9500  |  8.51            | 75.01               | 68.97             | 62.81            | 1 d 9 h 56 m 10 s 660 ms  |
| 6  | frobbing combined | 0.9900  | 31.37            | 82.22               | 75.00             | 70.03            | 1 d 4 h 11 m 46 s 380 ms  |
| 7  | frobbing combined | 0.9990  | 52.20            | 91.41               | 85.99             | 80.91            | 14 h 49 m 55 s 530 ms     |
| 8  | frobbing combined | 0.9999  | 53.12            | 91.83               | 86.52             | 81.42            | 10 h 25 m 12 s 430 ms     |
| 9  | frobbing combined | 1.0000  | 61.42            | 93.96               | 91.12             | 85.90            | 6 h 9 m 21 s 190 ms       |
| 10 | frobbing dynamic  | 0.0100  |  5.40            | 77.27               | 71.27             | 65.03            | 9 h 38 m 59 s 880 ms      |
| 11 | frobbing dynamic  | 0.8000  |  8.47            | 79.78               | 73.89             | 67.86            | 8 h 57 m 26 s 460 ms      |
| 12 | frobbing dynamic  | 0.8500  |  8.56            | 79.94               | 74.13             | 68.11            | 9 h 35 m 3 s 680 ms       |
| 13 | frobbing dynamic  | 0.9000  |  9.64            | 80.17               | 74.32             | 68.36            | 9 h 48 m 55 s 970 ms      |
| 14 | frobbing dynamic  | 0.9500  | 12.92            | 81.63               | 75.81             | 69.94            | 10 h 16 m 35 s 680 ms     |
| 15 | frobbing dynamic  | 0.9900  | 33.73            | 86.40               | 79.12             | 74.10            | 5 h 42 m 50 s 30 ms       |
| 16 | frobbing dynamic  | 0.9990  | 52.62            | 93.57               | 88.12             | 82.54            | 2 h 5 m 39 s 500 ms       |
| 17 | frobbing dynamic  | 0.9999  | 53.71            | 93.99               | 88.70             | 83.08            | 1 h 46 m 52 s 210 ms      |
| 18 | frobbing dynamic  | 1.0000  | 62.47            | 96.25               | 93.50             | 87.47            | 1 h 14 m 30 s 80 ms       |

fragment-combined.png

fragment-dynamic.png

fragment-errors.png

Three-Way Comparison: which edges become NIL once

Here is a closer look at last week's fragmentation behaviour.

Summary:

  • of the structurally wrong edges in the baseline case (normal grammar), more and more edges go to NIL as the fragmentation penalty rises. No surprise here.

  • for the additional NIL edges established, two out of three are initially `good' (i.e. the parser refuses attachment instead of making a wrong one). As the penalty rises, this ratio goes down to one out of three.

  • initially, mostly very long, very loose relations turn into NIL, namely subclauses and conjunctions. As the penalty rises, other cases become more prominent: PP, ADV, SUBJ, OBJA often become detached (good) but also PN and DET (bad).

Details:

What happened to the errors originally made?

42132 syntax edges, 9089 errors in baseline performance

penalty 0.0100: 18.6% corrected, 63.9% unresolved, 17.5% to NIL 
penalty 0.8000: 16.9% corrected, 56.9% unresolved, 26.2% to NIL 
penalty 0.8500: 17.2% corrected, 56.4% unresolved, 26.4% to NIL 
penalty 0.9000: 16.5% corrected, 55.5% unresolved, 28.0% to NIL 
penalty 0.9500: 15.3% corrected, 50.0% unresolved, 34.7% to NIL 
penalty 0.9900: 10.3% corrected, 29.7% unresolved, 60.1% to NIL 
penalty 0.9990:  5.3% corrected, 14.0% unresolved, 80.7% to NIL 
penalty 0.9999:  5.1% corrected, 13.5% unresolved, 81.4% to NIL 
penalty 1.0000:  3.4% corrected,  9.1% unresolved, 87.5% to NIL 
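The three-way split above can be reproduced along the following lines. This is a sketch rather than the script actually used: it compares head indices only (structural errors), encodes NIL as head 0, and uses a hypothetical function name.

```python
def classify_baseline_errors(gold, baseline, softened, nil=0):
    """For every edge the baseline got structurally wrong, record what the
    softened run did with it: corrected, sent to NIL, or still wrong."""
    counts = {"corrected": 0, "unresolved": 0, "to NIL": 0}
    for g, b, s in zip(gold, baseline, softened):
        if b == g:
            continue  # not a baseline error, so not part of this statistic
        if s == g:
            counts["corrected"] += 1
        elif s == nil:
            counts["to NIL"] += 1
        else:
            counts["unresolved"] += 1
    return counts
```

Dividing each count by the number of baseline errors (9089 here) would give percentages of the kind listed above.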

Were the additional NIL edges good or bad?

penalty 0.0100:  2352 false NILs, 66.0% good, 34.0% bad 
penalty 0.8000:  3626 false NILs, 63.2% good, 36.8% bad 
penalty 0.8500:  3668 false NILs, 63.1% good, 36.9% bad 
penalty 0.9000:  4096 false NILs, 59.7% good, 40.3% bad 
penalty 0.9500:  5439 false NILs, 56.0% good, 44.0% bad 
penalty 0.9900: 13925 false NILs, 38.2% good, 61.8% bad 
penalty 0.9990: 22089 false NILs, 32.5% good, 67.5% bad 
penalty 0.9999: 22522 false NILs, 32.1% good, 67.9% bad 
penalty 1.0000: 26129 false NILs, 29.8% good, 70.2% bad 
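To see the absolute numbers behind these ratios, the percentages can be turned back into approximate counts. The helper below is purely illustrative, and the results are only as exact as the rounded percentages allow.

```python
def split_false_nils(total, good_percent):
    """Approximate absolute numbers of good and bad false NILs."""
    good = round(total * good_percent / 100)
    return good, total - good


print(split_false_nils(2352, 66.0))   # penalty 0.0100 -> (1552, 800)
print(split_false_nils(26129, 29.8))  # penalty 1.0000 -> (7786, 18343)
```

So while the number of good NILs grows about fivefold between the lowest and the highest penalty, the number of bad ones grows by more than a factor of twenty.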

These edges turned into ROOT:

penalty 0.0100:

KON 797 REL 298 OBJC 247 ROOT 228 NEB 201 CJ 166 ADV 126 SUBJ 89 AUX 68 OBJA 34 APPO 31 ATTR 28 PN 12 PP 9 DET 6 KONJ 5 GMOD 3 AVZ 1 KOM 1 OBJA2 1 OBJG 1

penalty 0.8000:

KON 890 OBJC 370 SUBJ 362 REL 340 NEB 275 ROOT 258 CJ 199 ADV 166 PP 137 AUX 129 APPO 101 KONJ 93 OBJA 79 DET 59 PN 53 ATTR 42 KOM 17 AVZ 17 GMOD 13 OBJP 9 OBJD 7 PART 5 ZEIT 2 OBJG 1 EXPL 1 OBJA2 1

penalty 0.8500:

KON 897 OBJC 372 SUBJ 365 REL 342 NEB 276 ROOT 258 CJ 204 ADV 162 PP 149 AUX 135 APPO 101 KONJ 96 OBJA 82 DET 59 PN 52 ATTR 45 AVZ 17 KOM 17 GMOD 13 OBJD 8 OBJP 7 PART 6 ZEIT 2 EXPL 1 OBJA2 1 OBJG 1

penalty 0.9000:

KON 901 PP 429 SUBJ 376 OBJC 374 REL 345 NEB 274 ROOT 262 ADV 221 CJ 201 AUX 152 KONJ 105 APPO 101 OBJA 85 DET 75 PN 64 ATTR 48 KOM 17 AVZ 17 GMOD 16 OBJP 12 OBJD 9 PART 7 ZEIT 2 OBJG 1 EXPL 1 OBJA2 1

penalty 0.9500:

PP 1183 KON 983 ADV 402 SUBJ 401 OBJC 391 REL 353 NEB 275 ROOT 260 CJ 216 AUX 200 KONJ 140 PN 136 DET 108 APPO 105 OBJA 105 ATTR 55 GMOD 31 OBJP 28 AVZ 24 KOM 19 PART 9 OBJD 8 ZEIT 4 OBJA2 1 EXPL 1 OBJG 1

penalty 0.9900:

PP 2982 ADV 1941 DET 1170 KON 1070 PN 1009 SUBJ 705 AUX 699 KONJ 436 OBJA 423 OBJC 423 ROOT 380 REL 369 GMOD 363 PART 358 APPO 357 NEB 316 CJ 303 ATTR 240 KOM 166 OBJP 82 AVZ 53 OBJD 28 ZEIT 27 GRAD 20 OBJA2 3 EXPL 1 OBJG 1

penalty 0.9990:

PP 3506 PN 2504 ADV 2470 SUBJ 1733 OBJA 1528 DET 1517 AUX 1428 APPO 1148 KON 1108 GMOD 560 CJ 549 OBJC 479 ATTR 449 KONJ 439 OBJP 438 ROOT 393 REL 385 PART 363 NEB 327 KOM 324 AVZ 139 OBJD 121 ZEIT 91 GRAD 69 OBJA2 13 EXPL 5 OBJG 2 OBJN 1

penalty 0.9999:

PP 3512 PN 2585 ADV 2476 SUBJ 1840 OBJA 1593 DET 1558 AUX 1444 APPO 1225 KON 1107 GMOD 562 CJ 552 OBJC 479 ATTR 457 OBJP 439 KONJ 439 ROOT 393 REL 386 PART 363 NEB 329 KOM 324 AVZ 139 OBJD 128 ZEIT 98 GRAD 69 OBJA2 13 EXPL 9 OBJG 2 OBJN 1

penalty 1.0000:

PP 3608 PN 3191 ADV 2563 SUBJ 2471 DET 1997 OBJA 1796 AUX 1669 ATTR 1349 APPO 1272 KON 1115 CJ 744 GMOD 624 OBJC 493 OBJP 475 KONJ 440 ROOT 394 REL 386 PART 363 NEB 327 KOM 326 OBJD 152 AVZ 150 ZEIT 111 GRAD 69 EXPL 25 OBJA2 16 OBJG 2 OBJN 1

Original label distribution:

ROOT 7387 DET 5965 PN 4514 PP 3675 ADV 2920 SUBJ 2840 ATTR 2398 AUX 2205 OBJA 2089 APPO 1437 CJ 1373 KON 1123 GMOD 743 OBJC 525 OBJP 503 KONJ 442 REL 402 PART 363 NEB 346 KOM 326 OBJD 175 AVZ 150 ZEIT 113 GRAD 69 EXPL 26 OBJA2 19 OBJG 2 OBJN 2
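Relating the detached edges to the original label distribution gives a per-label detachment rate. The sketch below does this for penalty 1.0000, assuming that both lists count edges by the same kind of label; the helper names are hypothetical. The two lists sum to 42132 and 26129 respectively, matching the totals given above.

```python
ORIGINAL = (
    "ROOT 7387 DET 5965 PN 4514 PP 3675 ADV 2920 SUBJ 2840 ATTR 2398 AUX 2205 "
    "OBJA 2089 APPO 1437 CJ 1373 KON 1123 GMOD 743 OBJC 525 OBJP 503 KONJ 442 "
    "REL 402 PART 363 NEB 346 KOM 326 OBJD 175 AVZ 150 ZEIT 113 GRAD 69 EXPL 26 "
    "OBJA2 19 OBJG 2 OBJN 2"
)
DETACHED_AT_1_0 = (
    "PP 3608 PN 3191 ADV 2563 SUBJ 2471 DET 1997 OBJA 1796 AUX 1669 ATTR 1349 "
    "APPO 1272 KON 1115 CJ 744 GMOD 624 OBJC 493 OBJP 475 KONJ 440 ROOT 394 "
    "REL 386 PART 363 NEB 327 KOM 326 OBJD 152 AVZ 150 ZEIT 111 GRAD 69 EXPL 25 "
    "OBJA2 16 OBJG 2 OBJN 1"
)


def parse_counts(line):
    """Turn 'LABEL count LABEL count ...' into a dict of integer counts."""
    tokens = line.split()
    return {label: int(count) for label, count in zip(tokens[::2], tokens[1::2])}


def detachment_rates(detached, original):
    """Share of the original edges per label that ended up attached to NIL."""
    return {label: detached.get(label, 0) / n for label, n in original.items()}


original = parse_counts(ORIGINAL)
detached = parse_counts(DETACHED_AT_1_0)
rates = detachment_rates(detached, original)
for label in ("PP", "PN", "DET"):
    print(label, round(rates[label], 3))  # PP 0.982, PN 0.707, DET 0.335
```

Read this way, nearly all PP edges are detached once the constraint is switched off, compared with about a third of the DET edges.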

-- MichaelDaum - 27 Nov 2002
 