+ Optimality Theory
Keywords: optimality theory, optimal parsing, sentence processing, online parsing, psycholinguistics, computational OT
See also: SentenceProcessing
Sources:
Alan Prince & Paul Smolensky (1993): Optimality Theory: Constraint interaction in generative grammar.
Technical report, TR-2,
Rutgers University Center for Cognitive Science, and CU-CS-696-93,
Department of Computer Science, University of Colorado at Boulder.
To appear in the Linguistic Inquiry Monograph Series, MIT Press.
(pdf)
Edward Gibson & Kevin Broihier (1998): Optimality Theory and Human Sentence Processing.
In: P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis & D. Pesetsky (eds.),
Is the Best Good Enough?, MIT Press, Cambridge, MA, pp. 157-191.
Review by Melissa Svendsen (url):
In "Optimality Theory and Human Sentence Processing," Edward Gibson and
Kevin Broihier argue that the 'winner-takes-all' approach of standard
Optimality Theory (in which the violation of any number of lower ranked
constraints is preferable to the violation of just one highly ranked
constraint) does not account for the facts of sentence parsing. Instead,
they argue for a
'cumulative constraint weighting system', in which the
violation of a highly ranked constraint may be preferred to multiple
violations of lower ranked constraints.
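To make the contrast concrete, here is a minimal Python sketch of the two evaluation schemes. The constraint names, ranking, weights, and violation counts are invented for illustration and are not taken from Gibson & Broihier.

```python
# Toy comparison of standard OT evaluation (strict domination) with a
# cumulative constraint weighting scheme.  Constraint names, ranking,
# weights, and violation counts are invented for illustration.

# Violation profiles: candidate parse -> violations per constraint.
candidates = {
    "parse_a": {"HIGH": 1, "LOW1": 0, "LOW2": 0},  # one violation of the top-ranked constraint
    "parse_b": {"HIGH": 0, "LOW1": 3, "LOW2": 4},  # several violations of lower-ranked constraints
}

# Standard OT: constraints are strictly ranked, so violation profiles are
# compared lexicographically from the highest-ranked constraint downwards.
ranking = ["HIGH", "LOW1", "LOW2"]

def ot_winner(cands, ranking):
    return min(cands, key=lambda c: tuple(cands[c][k] for k in ranking))

# Cumulative weighting: every constraint has a finite weight, and the
# candidate with the smallest weighted sum of violations wins, so enough
# cheap violations can outweigh one expensive violation.
weights = {"HIGH": 5.0, "LOW1": 1.0, "LOW2": 1.0}

def weighted_winner(cands, weights):
    return min(cands, key=lambda c: sum(weights[k] * v for k, v in cands[c].items()))

print(ot_winner(candidates, ranking))        # parse_b: no number of LOW violations can beat one HIGH violation
print(weighted_winner(candidates, weights))  # parse_a: cost 5.0 < 3*1.0 + 4*1.0 = 7.0
```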
Gisbert Fanselow, Matthias Schlesewsky, Damir Cavar, Reinhold Kliegl (1999): Optimal Parsing, Syntactic Parsing Preferences, and Optimality Theory
IK Formal Models of Cognitive Complexity, University of Potsdam.
(pdf)
Remarks & Quotes:
- extensive discussion of garden path effects in the light of OT with various examples and links into the literature
- discussion of parsing preferences as derivable from online application of grammatical principles
- OT is held to be particularly suited for online parsing because:
- referring to Pritchett (1992), Gorrell (1995), and Phillips (1996):
- heuristic parsing strategies reflect the influence of the principles of grammar
- that is: the grammar and the parser are the same
- there are no other cognitive strategies that may influence parsing and thereby "overwrite" parsing preferences explainable with an OT grammar
- but in any case: OT parsing models must restrict the number of alternatives considered
- so an OT parser will fail to note the existence of better parses
- and thereby fail to detect ungrammaticality
- p. 3, example (4), referring to Frazier (1978), Altmann (1988): he told the girl that ...
(a) ... the father has kissed the child (complement clause interpretation)
(b) ... the father has kissed the story (relative clause interpretation)
early satisfaction of the theta-criterion ==> (a) is preferred
- the role of the theta-criterion in online parsing:
- citing Pritchett (1992:12): "the theta-criterion attempts to be satisfied at every point during processing"
- referring to McElree & Griffith (1995): "... we seem to have evidence that the thematic role information is used later than formal syntactic subcategorization information in online parsing, and should therefore not figure in formulating initial preferences at all."
- central idea of Tesar (1995): overparsing (compare to nonspec in Schulz (1998) :> ): generate partial trees with empty slots that cover the input string seen so far in a way that improves the constraint violation profile. Roughly (see also the first sketch after these remarks):
- build up a tree representation which covers all terminal elements found in the input
- "overparse" the input, i.e. postulate empty heads and integrate them into the structure as long as the result has a better constraint violation profile
- restriction of overparsing by OLDHD (Grimshaw 1997) or FILL (Tesar 1995): overparsing will take place only if OLDHD (or FILL) is outranked by a principle that can be satisfied by overparsing
- referring to Maruyama (1990) and Menzel (1997): "[These] parsing models that are not designed to be psychologically realistic..." :-?
- citing Tesar (1995): "... it is unnecessary, and in fact counterproductive, to consider computing optimal forms [like in Maruyama (1990) and Menzel (1997)] in those terms"
- Discussion: case agreement effects in German relative clauses (material from Schlesewsky (1996) and more): "Do semantically coreferent nominal phrases agree in case (==> ARGCASE rule)?" If so, a case mismatch will show up as longer reading times due to reanalysis (see the second sketch after these remarks).
examples 1: - Das ist die Frau, die glücklicherweise die Soldaten besucht hat, obwohl ... ('this is the woman who fortunately visited the soldiers, although ...')
- Das ist die Frau, die glücklicherweise die Soldaten besucht haben, obwohl ... ('this is the woman whom the soldiers fortunately visited, although ...')
- Der Soldat überrascht die Frau, die glücklicherweise die Männer besucht hat, ... ('the soldier surprises the woman who fortunately visited the men, ...')
- Der Soldat überrascht die Frau, die glücklicherweise die Männer besucht haben, ... ('the soldier surprises the woman whom the men fortunately visited, ...')
examples 2: - Der Anwalt bezweifelt, dass man Polizisten, obwohl man sie einlädt, glauben sollte. ('the lawyer doubts that one should believe policemen, although one invites them.')
- Der Anwalt bezweifelt, dass man Polizisten, obwohl man ihnen vertraut, bestechen sollte. ('the lawyer doubts that one should bribe policemen, although one trusts them.')
- Der Anwalt bezweifelt, dass man Polizisten, obwohl man ihnen vertraut, glauben sollte. ('the lawyer doubts that one should believe policemen, although one trusts them.')
- Der Anwalt bezweifelt, dass man Polizisten, obwohl man sie einlädt, bestechen sollte. ('the lawyer doubts that one should bribe policemen, although one invites them.')
Conclusion: rules which are restricted to a small domain in grammatical terms may have considerable parsing effects (earning our daily bread)
- relative clause attachments don't seem to be triggered syntactically ==> OT syntax cannot predict attachment preferences
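The first sketch below renders the overparsing loop from the Tesar remarks above in Python. It is a rough illustration under invented assumptions: a single toy constraint SUBCAT that one empty head can satisfy, FILL penalising each empty head, and strict-domination comparison of violation profiles. It is not Tesar's actual algorithm.

```python
# Rough sketch of the overparsing idea described above (not Tesar's
# actual algorithm): starting from a parse that just covers the input
# terminals, keep postulating empty heads as long as the constraint
# violation profile improves.  SUBCAT is an invented stand-in for a
# constraint that an empty head can satisfy; FILL penalises empty heads.

RANKING = ["SUBCAT", "FILL"]   # overparsing only helps because FILL is outranked

def evaluate(n_empty_heads):
    """Toy violation profile for a partial parse with n_empty_heads empty heads."""
    profile = {
        "SUBCAT": max(0, 1 - n_empty_heads),  # pretend one empty head is needed for this input
        "FILL": n_empty_heads,                # every empty head is one FILL violation
    }
    return tuple(profile[c] for c in RANKING)

def overparse():
    """Greedily add empty heads while the violation profile improves."""
    n_empty, best = 0, evaluate(0)
    while True:
        candidate = evaluate(n_empty + 1)
        if candidate < best:                  # lexicographic comparison = ranked constraints
            n_empty, best = n_empty + 1, candidate
        else:
            return n_empty, best

print(overparse())   # (1, (0, 1)): one empty head removes the SUBCAT violation at the cost of one FILL violation
```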
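The second sketch spells out the ARGCASE prediction for the 'Polizisten' examples: if coreferent nominal phrases must agree in case, a parser that fixes the case of the ambiguous NP from the coreferent pronoun in the obwohl-clause predicts reanalysis, and hence longer reading times, when the clause-final verb assigns a different case. The verb-to-case table reflects standard German case government; the function itself is an invented illustration, not the authors' model.

```python
# Toy illustration of the ARGCASE idea: coreferent NPs must agree in
# case, so a mismatch between the case assigned to the pronoun in the
# obwohl-clause and the case assigned by the clause-final verb forces
# reanalysis (predicted longer reading times).  Simplified; not the
# authors' model.

CASE_ASSIGNED_BY = {
    "einladen": "acc",   # einladen / bestechen take an accusative object ("sie")
    "bestechen": "acc",
    "vertrauen": "dat",  # vertrauen / glauben take a dative object ("ihnen")
    "glauben": "dat",
}

def predicts_reanalysis(adjunct_verb, final_verb):
    """ARGCASE: coreferent NPs agree in case; a mismatch forces reanalysis."""
    return CASE_ASSIGNED_BY[adjunct_verb] != CASE_ASSIGNED_BY[final_verb]

for adjunct, final in [("einladen", "glauben"),     # mismatch -> reanalysis predicted
                       ("vertrauen", "bestechen"),  # mismatch -> reanalysis predicted
                       ("vertrauen", "glauben"),    # match
                       ("einladen", "bestechen")]:  # match
    print(adjunct, final, "reanalysis" if predicts_reanalysis(adjunct, final) else "no reanalysis")
```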
Reinhard Blutner (1999): Some aspects of optimality in natural language interpretation.
In: Helen de Hoop & Henriette de Swart (eds.)
Papers on Optimality Theoretic Semantics.
Utrecht Institute of Linguistics OTS, December 1999, pp 1-21.
Also: Journal of Semantics 17, 189-216, 2000.
(pdf, citeseer)
Bruce Tesar (1995): Computational Optimality Theory
Doctoral Dissertation, University of Colorado at Boulder. ROA-90. 121 pages. (pdf)
Frank Keller (2000): Gradience in Grammar: Experimental and Computational Aspects of Degrees of Grammaticality.
PhD Thesis, University of Edinburgh.
(URL, PDF)
Remarks:
- Linear Optimality Theory
- weighted constraints
- distinction between hard and soft constraints
- the gradation is computed as the weighted sum of violated constraints (see the sketch after the abstract)
Abstract:
This thesis deals with gradience in grammar, i.e., with the fact that some linguistic structures are not fully acceptable or unacceptable, but receive gradient linguistic judgments. The importance of gradient data for linguistic theory has been recognized at least since Chomsky's Logical Structure of Linguistic Theory. However, systematic empirical studies of gradience are largely absent, and none of the major theoretical frameworks is designed to account for gradient data.
The present thesis addresses both questions. In the experimental part of the thesis (Chapters 3-5), we present a set of magnitude estimation experiments investigating gradience in grammar. The experiments deal with unaccusativity/unergativity, extraction, binding, word order, and gapping. They cover all major modules of syntactic theory, and draw on data from three languages (English, German, and Greek). In the theoretical part of the thesis (Chapters 6 and 7), we use these experimental results to motivate a model of gradience in grammar. This model is a variant of Optimality Theory, and explains gradience in terms of the competition of ranked, violable linguistic constraints.
The experimental studies in this thesis deliver two main results. First, they demonstrate that an experimental investigation of gradient phenomena can advance linguistic theory by uncovering acceptability distinctions that have gone unnoticed in the theoretical literature. An experimental approach can also settle data disputes that result from the informal data collection techniques typically employed in theoretical linguistics, which are not well-suited to investigate the behavior of gradient linguistic data.
Second, we identify a set of general properties of gradient data that seem to be valid for a wide range of syntactic phenomena and across languages. (a) Linguistic constraints are ranked, in the sense that some constraint violations lead to a greater degree of unacceptability than others. (b) Constraint violations are cumulative, i.e., the degree of unacceptability of a structure increases with the number of constraints it violates. (c) Two constraint types can be distinguished experimentally: soft constraints lead to mild unacceptability when violated, while hard constraint violations trigger serious unacceptability. (d) The hard/soft distinction can be diagnosed by testing for effects from the linguistic context; context effects only occur for soft constraints; hard constraints are immune to contextual variation. (e) The soft/hard distinction is crosslinguistically stable.
In the theoretical part of the thesis, we develop a model of gradient grammaticality that borrows central concepts from Optimality Theory, a competition-based grammatical framework. We propose an extension, Linear Optimality Theory, motivated by our experimental results on constraint ranking and the cumulativity of violations. The core assumption of our model is that the relative grammaticality of a structure is determined by the weighted sum of the violations it incurs. We show that the parameters of the model (the constraint weights) can be estimated using the least squares method, a standard model fitting algorithm. Furthermore, we prove that standard Optimality Theory is a special case of Linear Optimality Theory.
To test the validity of Linear Optimality Theory, we use it to model data from the experimental part of the thesis, including data on extraction, gapping, and word order. For all data sets, a high model fit is obtained and it is demonstrated that the model's predictions generalize to unseen data. On a theoretical level, our modeling results show that certain properties of gradient data (the hard/soft distinction, context effects, and crosslinguistic effects) do not have to be stipulated, but follow from core assumptions of Linear Optimality Theory.
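As a minimal sketch of the Linear Optimality Theory idea summarised in the remarks and abstract above: relative ungrammaticality is the weighted sum of constraint violations, with hard constraints carrying much larger weights than soft ones. The constraint names, weights, and violation counts are invented; Keller estimates the weights from judgment data by least-squares fitting, which is not reproduced here.

```python
# Toy Linear Optimality Theory evaluation: the degree of ungrammaticality
# of a structure is the weighted sum of its constraint violations.
# Constraint names, weights, and violation counts are invented.

def ungrammaticality(violations, weights):
    """Weighted sum of constraint violations (higher = less acceptable)."""
    return sum(weights[c] * n for c, n in violations.items())

weights = {"HARD": 10.0, "SOFT1": 1.5, "SOFT2": 1.0}  # hard >> soft mirrors the hard/soft distinction

print(ungrammaticality({"SOFT1": 1, "SOFT2": 1}, weights))  # 2.5 -> mildly degraded
print(ungrammaticality({"HARD": 1}, weights))               # 10.0 -> strongly degraded

# Cumulativity: two violations of the same soft constraint are worse than one.
print(ungrammaticality({"SOFT2": 2}, weights))              # 2.0 vs. 1.0 for a single SOFT2 violation

# If the weights are spaced far enough apart relative to the possible number
# of violations (e.g. powers of a large base), comparing weighted sums
# reproduces strict domination -- one way to see how standard OT can fall
# out as a special case of the weighted model.
```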