A DOP Model for LFG
Rens Bod and Ronald Kaplan
Presented by Kathrin Spreyer
Data-Oriented Parsing, 14 June 2005
Lexical-Functional Grammar (LFG)

Levels of linguistic knowledge are represented in formally different ways (non-monostratal): constituent structure as a PS tree, functional relations as an AVM. The mapping φ between c-structure and f-structure is established by annotations in the PS rules:

S → NP VP
    (↑ SUBJ) = ↓   ↑ = ↓

For "Kim sleeps", the c-structure [S [NP Kim] [VP sleeps]] is mapped to the f-structure

[ PRED 'sleep⟨(↑ SUBJ)⟩'
  SUBJ [ PRED 'Kim' ] ]
A DOP model for LFG is more expressive than Tree-DOP, and therefore requires extending the DOP parameters to the multilevel nature of LFG:
1. Representations
2. Fragments
3. Composition operation
4. Probability model
Representations

Tree-DOP: (context-free) phrase structure trees.
LFG-DOP:
1. PS trees (c-structure)
2. AVMs (f-structure)
3. mapping from tree nodes to AVMs (φ)

A c-structure/f-structure pair is a valid representation only if it satisfies
- Nonbranching Dominance: no nonbranching c-structure chain in which a category X dominates another node of the same category X;
- Uniqueness: at most one value for any attribute in an f-structure;
- Coherence: every grammatical relation in an f-structure must be governed by a PRED;
- Completeness: all functions governed by a PRED appear as attributes in the local f-structure.
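The Coherence and Completeness conditions can be sketched in code. This is a minimal illustration, not the authors' implementation: f-structures are nested dicts, a PRED value is a hypothetical pair of (lemma, governed functions), and the set of governable functions is assumed for the example.

```python
# Assumed inventory of governable grammatical functions (illustrative only).
GOVERNABLE = {"SUBJ", "OBJ", "OBJ2", "COMP", "VCOMP", "XCOMP", "OBL"}

def coherent(f):
    """Coherence: every governable function present in a local
    f-structure must be governed by that f-structure's PRED."""
    governed = set(f["PRED"][1]) if "PRED" in f else set()
    for attr, val in f.items():
        if attr in GOVERNABLE:
            if attr not in governed:
                return False
            if not coherent(val):
                return False
    return True

def complete(f):
    """Completeness: every function governed by a PRED must
    actually appear as an attribute in the local f-structure."""
    if "PRED" in f:
        for g in f["PRED"][1]:
            if g not in f:
                return False
    return all(complete(v) for k, v in f.items() if k in GOVERNABLE)

# "Kim sleeps": sleep governs SUBJ, and SUBJ is present -> valid.
kim_sleeps = {"PRED": ("sleep", ["SUBJ"]), "SUBJ": {"PRED": ("Kim", [])}}
```

An f-structure with a SUBJ but no governing PRED fails Coherence; one whose PRED governs an absent SUBJ fails Completeness.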
Fragments

Basic idea: associate Tree-DOP fragments (= connected subtrees) with f-structure units.
Challenge: the fragmentation operations (and, later on, composition) must (i) preserve the validity of the f-structure components and (ii) manipulate the correspondence function φ.

Tree-DOP fragments can be produced by two operations:
- Root: select any node in a tree as the root of the fragment; erase all nodes dominating that root.
- Frontier: select a set of nodes in the Root-generated fragment (excluding its root node); erase the subtrees dominated by these nodes.
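The c-structure half of Root and Frontier can be sketched as follows. This is a hedged illustration only: trees are encoded as (label, children) pairs and nodes are addressed by paths of child indices, an encoding chosen for the example, not taken from the paper.

```python
def subtree(tree, path):
    """Root (c-structure part): the node at `path` becomes the
    fragment's root; everything dominating it is erased."""
    for i in path:
        tree = tree[1][i]
    return tree

def frontier(tree, cut_paths, path=()):
    """Frontier (c-structure part): erase the subtrees below the
    nodes at `cut_paths`, leaving those nodes as substitution sites."""
    label, children = tree
    if path in cut_paths:
        return (label, [])
    return (label, [frontier(c, cut_paths, path + (i,))
                    for i, c in enumerate(children)])

# The "Kim sleeps" tree from the LFG slide.
s = ("S", [("NP", [("Kim", [])]), ("VP", [("sleeps", [])])])
```

Cutting below NP yields the fragment S → NP↓ VP with the NP node as an open substitution site.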
Fragments contd.

Extension for the f-structure components:
- Root removes all φ links leaving the erased nodes, all f-structure units not contained in the φ-projections of the remaining nodes, and all PRED features of the φ-projections of erased nodes.
- Like Root, Frontier removes the φ-correspondences of erased nodes and their respective PRED features; it removes nothing else.
An example: "John said that Kim sleeps."

c-structure: [S [NP John] [VP [V said] [S [C that] [S [NP Kim] [VP sleeps]]]]]

f-structure:
[ PRED  'say⟨(↑ SUBJ)(↑ VCOMP)⟩'
  SUBJ  [ PRED 'John' ]
  VCOMP [ PRED     'sleep⟨(↑ SUBJ)⟩'
          COMPFORM that
          SUBJ     [ PRED 'Kim' ] ] ]
Generalisation of Fragments

Root and Frontier retain all agreement features of nodes that are φ-accessible from the fragment nodes. In some cases, these features are specified more restrictively than necessary:

[VP sleep]  ↦  [ PRED 'sleep⟨(↑ SUBJ)⟩'
                 SUBJ [ PERS 2 ] ]

Intuitively, the fragment should also be compatible with a 1st person SUBJ.

Discard operation: delete a feature whose corresponding node has been erased.

[VP sleep]  ↦  [ PRED 'sleep⟨(↑ SUBJ)⟩'
                 SUBJ [ ] ]
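The effect of Discard on the slide's example can be sketched in a few lines. This is only an illustration of the feature deletion itself, under the simplifying assumption that f-structures are plain dicts and that the eligible feature is named directly; the real operation is driven by which c-structure nodes were erased.

```python
def discard(fstruct, attr):
    """Discard one feature pair; PRED features are never discarded."""
    assert attr != "PRED"
    return {k: v for k, v in fstruct.items() if k != attr}

# The overly specific VP fragment: sleep with a 2nd person SUBJ.
vp_fragment = {"PRED": "sleep<SUBJ>", "SUBJ": {"PERS": 2}}

# Discard the PERS feature to obtain the generalised fragment.
generalised = {**vp_fragment, "SUBJ": discard(vp_fragment["SUBJ"], "PERS")}
```

The generalised fragment has an empty SUBJ and therefore unifies with subjects of any person.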
Composition of Fragments

Composition (∘) is defined in two steps:
1. left-most substitution on the c-structure (cf. Tree-DOP);
2. unification of the f-structures corresponding to the matching nodes.

Given this definition, a derivation of a representation R is a sequence of fragments f1, f2, ..., fk such that root(f1) = S and f1 ∘ f2 ∘ ... ∘ fk = R.

The interaction of composition and Discard may result in valid representations assigned to ungrammatical utterances ⇒ a robust language model. A corpus-based notion of grammaticality can be expressed as a constraint on derivations: a sentence is grammatical iff there is at least one Discard-free derivation of a valid representation.
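The unification step of composition can be sketched as standard recursive AVM unification. A minimal sketch, assuming f-structures as nested dicts; returning None signals a Uniqueness clash (two different atomic values for the same attribute).

```python
def unify(f, g):
    """Unify two attribute-value structures; None means failure."""
    if not isinstance(f, dict) or not isinstance(g, dict):
        return f if f == g else None      # atomic values must be identical
    out = dict(f)
    for attr, val in g.items():
        if attr in out:
            merged = unify(out[attr], val)
            if merged is None:
                return None               # Uniqueness violation propagates up
            out[attr] = merged
        else:
            out[attr] = val
    return out
```

Compatible features merge, while a clash such as NUM SG against NUM PL makes the whole composition step fail.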
Validity Checking

Recall: an LFG-DOP representation is valid iff it obeys the Nonbranching Dominance (a property of the c-structure, ignored here), Uniqueness, Coherence, and Completeness conditions.

Uniqueness and Coherence: on-line or off-line check, since these constraints are monotonic: once an f-structure violates one of them, it remains inconsistent no matter which information is added in subsequent composition steps. But Completeness can only be verified for final representations; otherwise, partial information added in later steps could not be taken into account.

Off-line checking does not affect the generation of derivations directly, whereas on-line evaluation of the conditions restricts the Competition Set (CS): the competition set CSi at step i of the composition contains exactly those fragments that are composable with the analysis obtained by f1 ∘ ... ∘ fi−1.
Composability

- Off-line evaluation of all validity conditions: as in Tree-DOP, the fragment's root category must match the category of the left-most nonterminal in the current analysis.
- On-line evaluation of Uniqueness: the (c-structure) node categories match and their corresponding f-structures unify.
- On-line evaluation of Coherence: the (c-structure) node categories match and the result of unifying the corresponding f-structures is coherent.

Note that on-line satisfaction of the Coherence condition implies satisfaction of Uniqueness.
The Probability Model

Probability of a fragment in Tree-DOP: relative frequency. LFG-DOP should distinguish between Root-/Frontier-generated and Discard-generated (= generalised) fragments, since the number of the latter is exponential in the number of features of the underlying Root-/Frontier-generated fragments.

Discounted Relative Frequency: generalised fragments are treated as unseen events which receive a probability mass of n1/N, where n1 = number of singleton events and N = number of seen events. Let D be the bag of generalised fragments and |f| the frequency of a fragment f; then

P(f | f ∈ D) = (n1/N) · |f| / Σ_{f′∈D} |f′|

and

P(f | f ∉ D) = (1 − n1/N) · |f| / Σ_{f′∉D} |f′|
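The estimator above can be sketched numerically. A hedged illustration with made-up frequency counts, assuming fragment frequencies as two dicts (seen vs. Discard-generated):

```python
def discounted_rf(seen, generalised):
    """Discounted relative-frequency estimator: reserve mass n1/N for
    the Discard-generated (generalised) fragments, distribute each pool
    of mass in proportion to fragment frequency."""
    N = sum(seen.values())                         # total seen occurrences
    n1 = sum(1 for c in seen.values() if c == 1)   # singleton event types
    mass = n1 / N                                  # mass for generalised fragments
    p = {f: (1 - mass) * c / N for f, c in seen.items()}
    total_gen = sum(generalised.values())
    p.update({f: mass * c / total_gen for f, c in generalised.items()})
    return p

# Toy counts: fragment "a" seen once, "b" three times (so n1/N = 1/4),
# plus two generalised fragments sharing the reserved mass.
p = discounted_rf({"a": 1, "b": 3}, {"ga": 1, "gb": 1})
```

The probabilities sum to one by construction: the seen fragments share mass 1 − n1/N and the generalised fragments share mass n1/N.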
The Probability Model contd.

Derivation of a representation as a stochastic process:
1. select the initial fragment f1 from the set of all fragments rooted in S;
2. each subsequent fragment fi is randomly drawn from the competition set CSi.

Competition probability of a fragment f ∈ CS:

CP(f | CS) = P(f) / Σ_{f′∈CS} P(f′)

Probability of a derivation:

P(⟨f1, f2, ..., fk⟩) = Π_{i=1}^{k} CP(fi | CSi)

Probability of a (valid) representation R for sentence W:

P(R) = Σ_{D derives R} P(D) / Σ_{R′ valid and R′ yields W} P(R′)
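The derivation probability is just a product of renormalised fragment probabilities. A small sketch with invented fragment names and probabilities, illustrating one two-step derivation:

```python
def derivation_prob(derivation, competition_sets, p):
    """Product of competition probabilities CP(f_i | CS_i): each fragment
    probability is renormalised over its competition set."""
    prob = 1.0
    for f, cs in zip(derivation, competition_sets):
        prob *= p[f] / sum(p[g] for g in cs)
    return prob

# Toy example: f1 competes with f2 at step 1; f3 is the only option at step 2.
p = {"f1": 0.2, "f2": 0.3, "f3": 0.5}
d_prob = derivation_prob(["f1", "f3"], [["f1", "f2"], ["f3"]], p)
```

Here CP(f1) = 0.2 / (0.2 + 0.3) = 0.4 and CP(f3) = 1, so the derivation has probability 0.4.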
Parsing with LFG-DOP

Step 1: apply the fragmentation operations to a (disambiguated) treebank of LFG representations.
Step 2: parse the input sentence with a bottom-up chart parser, using only the c-structure components of the fragments obtained in step 1.
Step 3: decode the resulting chart with Monte Carlo disambiguation, i.e. generate a large number of random derivations from the chart, filter out representations that violate the Uniqueness or Coherence conditions, and select the most frequently generated representation among the remaining ones.
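Step 3 can be sketched generically. The sampler below is a placeholder: in the real system it would draw a random derivation from the chart, whereas here it is an arbitrary function supplied by the caller, so only the filter-and-count skeleton is illustrated.

```python
import random
from collections import Counter

def monte_carlo(sample, is_valid, n=1000, seed=0):
    """Generate n random representations via `sample`, drop invalid
    ones (Uniqueness/Coherence violations), and return the most
    frequently generated survivor."""
    rng = random.Random(seed)
    counts = Counter(r for r in (sample(rng) for _ in range(n))
                     if is_valid(r))
    return counts.most_common(1)[0][0]

# Toy chart: "R1" is generated twice as often as "R2"; "BAD" is invalid.
best = monte_carlo(lambda rng: rng.choice(["R1", "R1", "R2", "BAD"]),
                   lambda r: r != "BAD")
```

With a fixed seed the estimate is reproducible; with enough samples the most probable valid representation wins.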
Evaluation

LFG-annotated corpora: Verbmobil (540 parses), Homecentre (980 parses); split: 90% training data, 10% test data. The test set is parsed with LFG-DOP using fragments from the training set (limited to fragments of depth up to 4).

Metrics: exact match, plus precision and recall. For the test parse P and the treebank analysis T:

Precision = #correct constituents in P / #constituents in P
Recall    = #correct constituents in P / #constituents in T

Adaptation for f-structures: units of P and T count as matching if they φ-correspond.
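The constituent-based metrics can be sketched directly from the formulas. A minimal illustration, assuming constituents represented as (label, start, end) spans:

```python
def precision_recall(parsed, gold):
    """Precision and recall over labelled constituent spans:
    parsed = constituents of P, gold = constituents of T."""
    correct = len(set(parsed) & set(gold))
    return correct / len(parsed), correct / len(gold)

# Toy example: P proposes three constituents, two of which occur in T.
prec, rec = precision_recall(
    [("S", 0, 2), ("NP", 0, 1), ("VP", 1, 2)],
    [("S", 0, 2), ("NP", 0, 1)])
```

Here precision is 2/3 (one spurious VP) and recall is 1.0 (every gold constituent was found).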
Evaluation contd.

Fragment estimators (results for Homecentre only):

                   Exact Match          Precision            Recall
Estimator          +Discard  −Discard  +Discard  −Discard   +Discard  −Discard
Rel. Freq.          2.7%      37.9%     17.1%     77.8%      15.5%     77.2%
Disc. Rel. Freq.   38.4%      37.9%     80.0%     77.8%      78.6%     77.2%

The simple relative frequency estimator is inaccurate with generalised fragments and scores significantly higher with only Root-/Frontier-generated fragments; the discounted relative frequency estimator takes advantage of the generalised fragments.
Evaluation contd.

Fragment sizes (results for Homecentre only):

Fragment Depth   Exact Match   Precision   Recall
1                31.3%         75.0%       71.5%
2                36.3%         77.1%       74.7%
3                37.8%         77.8%       76.1%
4                38.4%         80.0%       78.6%

This supports the DOP hypothesis: parse accuracy increases with increasing fragment size.
Evaluation contd.

LFG-DOP vs. Tree-DOP (results for Homecentre only):

Model      Exact Match   Precision   Recall
Tree-DOP   49.0%         93.4%       92.1%
LFG-DOP    53.2%         95.8%       94.7%

(Discounted relative frequency estimator and fragments up to depth 4 for LFG-DOP; parse accuracy measured on tree structures only.) F-structures help improve accuracy significantly even when only the tree structures matter.
For Short

LFG-DOP enables robust, deep parsing without a competence grammar; the notion of grammaticality is corpus-based. LFG-DOP probability models define a parametrised stochastic process: the Uniqueness and/or Coherence constraints can be checked on-line or off-line. LFG-DOP outperforms Tree-DOP.