A DOP Model for LFG. Rens Bod and Ronald Kaplan. Kathrin Spreyer Data-Oriented Parsing, 14 June 2005

Lexical-Functional Grammar (LFG). Levels of linguistic knowledge are represented in formally different ways (non-monostratal): constituent structure as a phrase-structure tree, functional relations as an AVM. The mapping φ between c-structure and f-structure is established by annotations in the PS rules, e.g. S → NP VP with the annotations (↑ SUBJ)=↓ on NP and ↑=↓ on VP. For "Kim sleeps", this yields the c-structure [S [NP Kim] [VP sleeps]] and the f-structure [PRED 'sleep⟨(↑ SUBJ)⟩', SUBJ [PRED 'Kim']].
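
To make the three components concrete, here is a minimal Python sketch of how the "Kim sleeps" example above could be encoded; the nested-dict encoding of AVMs and the node numbering are illustrative assumptions, not the authors' implementation.

    # Illustrative encoding (an assumption, not the paper's): c-structure nodes as
    # (id, category, children) with words as string children, f-structures as nested
    # dicts, and phi as a map from node ids to (shared) f-structure units.
    f_kim = {"PRED": "Kim"}
    f_sleep = {"PRED": "sleep<(SUBJ)>", "SUBJ": f_kim}

    c_structure = (0, "S", [(1, "NP", ["Kim"]),
                            (2, "VP", ["sleeps"])])

    # phi: c-structure node id -> f-structure unit (S and VP project the same unit)
    phi = {0: f_sleep, 2: f_sleep, 1: f_kim}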

A DOP model for LFG is more expressive than Tree-DOP and therefore requires extending the DOP parameters to the multilevel nature of LFG: representations, fragments, the composition operation, and the probability model.

Representations. Tree-DOP: (context-free) phrase-structure trees. LFG-DOP: 1. PS trees (c-structure), 2. AVMs (f-structure), 3. a mapping φ from tree nodes to AVMs. A c-structure/f-structure pair is a valid representation only if it satisfies Nonbranching Dominance: no c-structure node of category X nonbranchingly dominates another node of category X; Uniqueness: every attribute has at most one value in an f-structure; Coherence: every grammatical relation in an f-structure must be governed by a PRED; Completeness: all functions governed by a PRED appear as attributes in the local f-structure.
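
As a rough sketch of how the Coherence and Completeness conditions could be checked on the nested-dict f-structures from the sketch above (the set of grammatical functions and the PRED string format "name<(F1)(F2)>" are simplifying assumptions; Uniqueness is trivially enforced by the dict encoding itself):

    import re

    GRAMMATICAL_FUNCTIONS = {"SUBJ", "OBJ", "OBJ2", "COMP", "VCOMP", "XCOMP"}

    def governed_functions(pred):
        # functions listed in a PRED value such as "say<(SUBJ)(VCOMP)>"
        return set(re.findall(r"\((\w+)\)", pred))

    def coherent_and_complete(fs):
        governed = governed_functions(fs.get("PRED", ""))
        local = {attr for attr in fs if attr in GRAMMATICAL_FUNCTIONS}
        coherent = local <= governed      # every relation present is governed by the PRED
        complete = governed <= local      # every governed function is locally present
        return (coherent and complete
                and all(coherent_and_complete(fs[attr]) for attr in local))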

Fragments. Basic idea: association of Tree-DOP fragments (= connected subtrees) with f-structure units. Challenge: the fragmentation operations (and composition later on) must (i) preserve the validity of the f-structure components and (ii) manipulate the correspondence function φ. Tree-DOP fragments can be produced by two operations: Root: select any node in a tree as the root of the fragment and erase all nodes except that root and the nodes it dominates. Frontier: select a set of nodes in the Root-generated fragment (other than its root node) and erase the subtrees dominated by these nodes.
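
A rough sketch (assumed nested-list tree encoding; not the authors' code) of how the Root and Frontier operations generate the bag of Tree-DOP fragments from a single tree:

    import copy
    from itertools import combinations

    # trees as nested lists: ["S", ["NP", "Kim"], ["VP", "sleeps"]]

    def subtrees(tree):
        # every node is a possible Root selection; the fragment keeps the node
        # and everything it dominates, erasing all other nodes
        yield tree
        for child in tree[1:]:
            if isinstance(child, list):
                yield from subtrees(child)

    def node_paths(tree, path=()):
        # paths (child-index sequences) to non-root internal nodes:
        # the possible Frontier selections
        for i, child in enumerate(tree[1:], start=1):
            if isinstance(child, list):
                yield path + (i,)
                yield from node_paths(child, path + (i,))

    def apply_frontier(tree, paths):
        # erase the subtrees dominated by the selected nodes, keeping the nodes
        frag = copy.deepcopy(tree)
        for path in paths:
            node = frag
            for i in path:
                node = node[i]
            del node[1:]
        return frag

    def fragments(tree):
        for root in subtrees(tree):
            paths = list(node_paths(root))
            for k in range(len(paths) + 1):
                for chosen in combinations(paths, k):
                    # skip selections where one chosen node dominates another
                    # (they would only duplicate fragments)
                    if any(p != q and q[:len(p)] == p for p in chosen for q in chosen):
                        continue
                    yield apply_frontier(root, chosen)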

Fragments contd. Extension for the f-structure components: Root removes all φ links leaving the erased nodes, all f-structure units that are not contained in the φ-projections of the remaining nodes, and all PRED features of the φ-projections of erased nodes. Like Root, Frontier removes the φ-correspondences of the erased nodes and their respective PRED features; Frontier removes nothing else.

An example: John said that Kim sleeps. The c-structure is [S [NP John] [VP [V said] [S [C that] [S [NP Kim] [VP sleeps]]]]], and the corresponding f-structure is [PRED 'say⟨(↑ SUBJ)(↑ VCOMP)⟩', SUBJ [PRED 'John'], VCOMP [PRED 'sleep⟨(↑ SUBJ)⟩', COMPFORM that, SUBJ [PRED 'Kim']]].

Generalisation of Fragments. Root and Frontier retain all agreement features of nodes that are φ-accessible from the fragment nodes. In some cases the specification of these features is more restrictive than necessary: the fragment [VP sleep] with f-structure [PRED 'sleep⟨(↑ SUBJ)⟩', SUBJ [PERS 2]] should intuitively also be compatible with a 1st-person SUBJ. The Discard operation deletes a feature whose corresponding node has been erased, yielding [VP sleep] with f-structure [PRED 'sleep⟨(↑ SUBJ)⟩', SUBJ [ ]].
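
A small sketch of the Discard generalisation on the nested-dict encoding used above (the helper name, the path argument, and the copy-rather-than-mutate choice are assumptions):

    import copy

    def discard(f_structure, path, attribute):
        # delete one feature whose phi-corresponding c-structure node was erased,
        # e.g. discard(fs, ["SUBJ"], "PERS") turns SUBJ [PERS 2] into SUBJ [ ]
        generalised = copy.deepcopy(f_structure)
        unit = generalised
        for attr in path:
            unit = unit[attr]
        unit.pop(attribute, None)
        return generalised

    frag_fs = {"PRED": "sleep<(SUBJ)>", "SUBJ": {"PERS": 2}}
    print(discard(frag_fs, ["SUBJ"], "PERS"))   # {'PRED': 'sleep<(SUBJ)>', 'SUBJ': {}}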

Composition of Fragments. Composition (◦) is defined in two steps: 1. left-most substitution on the c-structure (cf. Tree-DOP); 2. unification of the f-structures corresponding to the matching nodes. Given this definition, a derivation of a representation R is a sequence of fragments f1, f2, ..., fk such that root(f1) = S and f1 ◦ f2 ◦ ... ◦ fk = R. The interaction of composition and Discard may result in valid representations being assigned to ungrammatical utterances (= robust language model). A corpus-based notion of grammaticality can be expressed as a constraint on derivations: a sentence is grammatical iff there is at least one Discard-free derivation for a valid representation.
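
A hedged sketch of the unification step of composition on nested-dict f-structures; a return value of None signals the attribute clash that the Uniqueness condition rules out (the encoding is an assumption carried over from the earlier sketches):

    def unify(f1, f2):
        # unify two AVMs given as nested dicts; None means failure
        result = dict(f1)
        for attr, val in f2.items():
            if attr not in result:
                result[attr] = val
            elif isinstance(result[attr], dict) and isinstance(val, dict):
                sub = unify(result[attr], val)
                if sub is None:
                    return None
                result[attr] = sub
            elif result[attr] != val:
                return None      # two distinct atomic values for one attribute
        return result

    # e.g. unify({"SUBJ": {"PERS": 3}}, {"SUBJ": {"NUM": "sg"}})
    #   -> {"SUBJ": {"PERS": 3, "NUM": "sg"}}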

Validity Checking. Recall: an LFG-DOP representation is valid iff it obeys the Nonbranching Dominance (a property of the c-structure, ignored here), Uniqueness, Coherence, and Completeness conditions. Uniqueness and Coherence can be checked on-line or off-line, since these constraints are monotonic: once an f-structure violates one of them, it remains inconsistent no matter which information is added in subsequent composition steps. Completeness, however, can only be verified for final representations; otherwise, partial information added in later steps could not be taken into account. Off-line checking does not affect the generation of derivations directly, whereas on-line evaluation of the conditions restricts the Competition Set (CS): the competition set CSi at step i of the composition contains exactly those fragments that are composable with the analysis obtained by composing the first i−1 fragments.

Composability. Off-line evaluation of all validity conditions: as in Tree-DOP, the fragment's root category must match the category of the left-most nonterminal in the current analysis. On-line evaluation of Uniqueness: the (c-structure) node categories match and their corresponding f-structures unify. On-line evaluation of Coherence: the (c-structure) node categories match and the result of unifying the corresponding f-structures is coherent. Note that on-line satisfaction of the Coherence condition implies satisfaction of Uniqueness.

The Probability Model. Probability of a fragment in Tree-DOP: relative frequency. LFG-DOP should distinguish between Root-/Frontier-generated and Discard-generated (= generalised) fragments, since the number of the latter is exponential in the number of features of the underlying Root-/Frontier-generated fragments. Discounted Relative Frequency: generalised fragments are treated as unseen events which together receive a probability mass of n1/N, where n1 = #singleton events and N = #seen events. Let D be the bag of generalised fragments and |f| the frequency of a fragment f; then

P(f | f ∈ D) = (n1/N) · |f| / Σ_{f' ∈ D} |f'|

P(f | f ∉ D) = (1 − n1/N) · |f| / Σ_{f' ∉ D} |f'|
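
The estimator can be sketched in a few lines; the two Counters over fragment occurrences and the normalisation over the whole bags follow the formulas above, while everything else (names, non-empty bags) is an assumption:

    from collections import Counter

    def discounted_relative_frequency(seen_counts, discard_counts):
        # seen_counts: Counter over Root-/Frontier-generated fragments
        # discard_counts: Counter over Discard-generated (generalised) fragments
        # assumes both bags are non-empty
        N = sum(seen_counts.values())                          # number of seen events
        n1 = sum(1 for c in seen_counts.values() if c == 1)    # number of singleton events
        unseen_mass = n1 / N
        total_discard = sum(discard_counts.values())
        probs = {}
        for frag, c in seen_counts.items():
            probs[frag] = (1 - unseen_mass) * c / N            # P(f | f not in D)
        for frag, c in discard_counts.items():
            probs[frag] = unseen_mass * c / total_discard      # P(f | f in D)
        return probs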

The Probability Model contd. Derivation of a representation as a stochastic process: 1. select the initial fragment f1 from the set of all fragments rooted in S; 2. each subsequent fragment fi is randomly drawn from the competition set CSi. Competition probability of a fragment f ∈ CS:

CP(f | CS) = P(f) / Σ_{f' ∈ CS} P(f')

Probability of a derivation:

P(⟨f1, f2, ..., fk⟩) = ∏_{i=1..k} CP(fi | CSi)

Probability of a (valid) representation R for a sentence W:

P(R) = Σ_{D derives R} P(D) / Σ_{R' valid, R' yields W} P(R')
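
As a worked sketch of the competition-probability product (the list-of-pairs input format is an assumption; computing the competition sets themselves would require the composability checks above):

    def derivation_probability(steps, prob):
        # steps: list of (f_i, CS_i) pairs, where CS_i is the competition set at step i
        # prob: fragment -> P(f), e.g. from the discounted relative frequency estimator
        p = 1.0
        for f, cs in steps:
            p *= prob[f] / sum(prob[g] for g in cs)   # CP(f_i | CS_i)
        return p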

Parsing with LFG-DOP. Step 1: apply the fragmentation operations to a (disambiguated) treebank of LFG representations. Step 2: parse the input sentence with a bottom-up chart parser, using only the c-structure components of the fragments obtained in step 1. Step 3: decode the resulting chart with Monte Carlo disambiguation, i.e. generate a large number of random derivations from the chart, filter out those representations that violate the Uniqueness or Coherence conditions, and select the most frequently generated representation among the remaining ones.
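
Step 3 can be sketched as follows; sample_derivation and is_valid stand in for the chart sampling and the Uniqueness/Coherence checks and are assumptions, not the authors' code:

    from collections import Counter

    def monte_carlo_disambiguate(chart, n_samples, sample_derivation, is_valid):
        counts = Counter()
        for _ in range(n_samples):
            representation = sample_derivation(chart)   # one random derivation -> representation
            if is_valid(representation):                # keep only Uniqueness/Coherence-respecting ones
                counts[representation] += 1
        # most frequently generated representation among the remaining ones
        return counts.most_common(1)[0][0] if counts else None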

Evaluation. LFG-annotated corpora: Verbmobil (540 parses), Homecentre (980 parses); split: 90% training data, 10% test data; the test set is parsed with LFG-DOP using fragments from the training set (limited to fragments of depth up to 4); metrics: exact match, plus precision and recall over constituents, where P is the proposed parse and T the treebank parse:

Precision = #correct constituents in P / #constituents in P

Recall = #correct constituents in P / #constituents in T

Adaptation for f-structures: the f-structures of P and T are compared via the φ-correspondence.
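
A tiny sketch of the constituent-based metrics, assuming constituents are compared as (label, start, end) triples (an illustrative choice, not the authors' scorer):

    def precision_recall(proposed, treebank):
        # proposed, treebank: collections of (label, start, end) constituents of P and T
        correct = len(set(proposed) & set(treebank))
        precision = correct / len(proposed) if proposed else 0.0
        recall = correct / len(treebank) if treebank else 0.0
        return precision, recall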

Evaluation contd. Fragment estimators (results for the Homecentre corpus only):

Estimator        Exact Match           Precision             Recall
                 +Discard  -Discard    +Discard  -Discard    +Discard  -Discard
Rel.Freq.        2.7%      37.9%       17.1%     77.8%       15.5%     77.2%
Disc.Rel.Freq.   38.4%     37.9%       80.0%     77.8%       78.6%     77.2%

The simple Relative Frequency estimator is inaccurate with generalised fragments and scores significantly higher with only Root-/Frontier-generated fragments; the Discounted Relative Frequency estimator takes advantage of the generalised fragments.

Evaluation contd. Fragment sizes (results for the Homecentre corpus only):

Fragment Depth   Exact Match   Precision   Recall
1                31.3%         75.0%       71.5%
2                36.3%         77.1%       74.7%
3                37.8%         77.8%       76.1%
4                38.4%         80.0%       78.6%

Supports the DOP hypothesis: parse accuracy increases with increasing fragment size.

Evaluation contd. LFG-DOP vs. Tree-DOP (results for the Homecentre corpus only):

Model      Exact Match   Precision   Recall
Tree-DOP   49.0%         93.4%       92.1%
LFG-DOP    53.2%         95.8%       94.7%

LFG-DOP uses the Discounted Relative Frequency estimator and fragments up to depth 4; parse accuracy is measured on tree structures only. The f-structures help improve accuracy significantly even when only the tree structures matter.

In Short. LFG-DOP enables robust, deep parsing without a competence grammar; the notion of grammaticality is corpus-based. The LFG-DOP probability models define a parametrised stochastic process: the Uniqueness and/or Coherence constraints can be checked on-line or off-line. LFG-DOP outperforms Tree-DOP.