Computational Linguistics


Computational Linguistics: Dependency-based Parsing
Clayton Greenberg, Stefan Thater
FR 4.7 Allgemeine Linguistik (Computerlinguistik), Universität des Saarlandes
Summer 2016

Acknowledgements
These slides are heavily inspired by an ESSLLI 2007 course by Joakim Nivre and Ryan McDonald, and by an ACL-COLING tutorial by Joakim Nivre and Sandra Kübler.

Phrase-Structure Trees
[Phrase-structure tree for "Economic news had little effect on financial markets.": S dominates NP and VP, with a PP inside the object NP; the words carry the POS tags JJ, NN, VBD, IN, NNS, PU.]

Dependency Trees
Basic idea: syntactic structure = lexical items linked by binary relations. Syntactic structures are usually trees (but not always).
Relations H → D: H is the head (or governor), D is the dependent.
[Dependency tree for "ROOT Economic news had little effect on financial markets." with the relations pred, sbj, obj, nmod, pc, p.]

Dependency Trees
Parsers are easy to implement and evaluate. Dependency-based representations are suitable for free-word-order languages and are often close to the predicate-argument structure.
[Same dependency tree for "ROOT Economic news had little effect on financial markets."]

Dependency Trees
Some criteria for a dependency relation between a head H and a dependent D in a linguistic construction C:
- H determines the syntactic category of C; H can replace C.
- H determines the semantic category of C; D specifies H.
- H is obligatory; D may be optional.
- H selects D and determines whether D is obligatory.
- The form of D depends on H (agreement or government).
- The linear position of D is specified with reference to H.

Dependency Trees
Clear cases: subjects, objects.
Less clear cases: complex verb groups, subordinate clauses, coordination.

Dependency Graphs
A graph G = (V, A, L, <):
- V = a set of vertices (nodes)
- A = a set of arcs (directed edges)
- L = a set of edge labels
- < = a linear order on V
[Dependency tree for "ROOT Economic news had little effect on financial markets."]

Dependency Trees
[Two equivalent notations for the fragment "had little effect", both showing the arcs obj(had, effect) and nmod(effect, little).]

Dependency Graphs / Trees
Formal conditions on dependency graphs:
- G is weakly connected
- G is acyclic
- every node in G has at most one head
- G is projective
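The first three conditions can be checked directly. The following is a minimal sketch (my own Python rendering, not code from the slides), representing a graph as a set of (head, label, dependent) arcs over nodes 0..n with node 0 as the artificial ROOT:

```python
def is_well_formed(n, arcs):
    """True iff the arcs form a tree over nodes 0..n rooted at 0:
    every node except ROOT has exactly one head, the head links are
    acyclic, and the graph is therefore weakly connected."""
    heads = {}
    for h, _, d in arcs:
        if d in heads:                      # more than one head
            return False
        heads[d] = h
    if 0 in heads:                          # ROOT must not have a head
        return False
    if set(heads) != set(range(1, n + 1)):  # every other node needs a head
        return False
    for d in heads:                         # follow head links up to ROOT
        seen = {d}
        while d != 0:
            d = heads[d]
            if d in seen:                   # cycle detected
                return False
            seen.add(d)
    return True
```

With single-headedness enforced, acyclicity of the head links already implies weak connectedness, since every chain of heads must end in ROOT.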

Projectivity
A dependency graph G is projective iff: whenever wi → wj, then wi →* wk for all wk with wi < wk < wj or wj < wk < wi. In other words, if wi is the head of wj, then there must be a directed path from wi to wk for every wk between wi and wj.
We need non-projectivity for long-distance dependencies and free word order.
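This condition translates almost literally into code. A sketch (my own rendering, not from the slides), with the tree given as a dict mapping each word 1..n to its head (ROOT = 0):

```python
def is_projective(heads):
    """True iff every arc heads[j] -> j covers only words that the
    head dominates (reflexive-transitive closure of the head relation)."""
    def dominates(i, k):            # i ->* k, following head links upward
        while k != 0:
            if k == i:
                return True
            k = heads[k]
        return i == 0               # ROOT dominates everything

    for j, i in heads.items():      # arc i -> j
        lo, hi = min(i, j), max(i, j)
        for k in range(lo + 1, hi):
            if not dominates(i, k):
                return False
    return True
```

For example, the tree for "Economic news had little effect on financial markets." is projective, while a tree with crossing arcs (as in wh-extraction) is not.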

Projectivity
[Projective dependency tree for "ROOT Economic news had little effect on financial markets."]
Free word order:
[Non-projective dependency tree for "ROOT What did economic news have little effect on?" with the relations pred, sbj, obj, vg, nmod, pc, p.]

Projectivity
Percentage of non-projective sentences and non-projective dependencies per treebank:

Language                              Sentences   Dependencies
Arabic [Maamouri and Bies 2004]       11.2%       0.4%
Basque [Aduriz et al. 2003]           26.2%       2.9%
Czech [Hajic et al. 2001]             23.2%       1.9%
Danish [Kromann 2003]                 15.6%       1.0%
Greek [Prokopidis et al. 2005]        20.3%       1.1%
Russian [Boguslavsky et al. 2000]     10.6%       0.9%
Slovenian [Dzeroski et al. 2006]      22.2%       1.9%
Turkish [Oflazer et al. 2003]         11.6%       1.5%

Dependency-based Parsing
- Grammar-based
- Data-driven
  - Transition-based
  - Graph-based

Transition-based Parsing
Configurations (S, Q, A):
- S = a stack of partially processed tokens (nodes)
- Q = a queue of unprocessed input tokens
- A = a set of dependency arcs
Initial configuration for input w1 ... wn: ([w0], [w1, ..., wn], {}), where w0 = ROOT.
Terminal (accepting) configurations: (S, [], A).

Transitions ("Arc-Standard")
Left-Arc(r) adds a dependency arc (wj, r, wi) to the arc set A, where wi is the word on top of the stack and wj is the first word in the buffer, and pops the stack.
Right-Arc(r) adds a dependency arc (wi, r, wj) to the arc set A, where wi is the word on top of the stack and wj is the first word in the buffer, pops the stack, and replaces wj by wi at the head of the buffer.

Transitions ("Arc-Standard")
Left-Arc(r): ([σ|wi], [wj|β], A) ⇒ (σ, [wj|β], A ∪ {(wj, r, wi)})
  only if i ≠ 0 and there are no wk, l such that (wk, l, wi) ∈ A
Right-Arc(r): ([σ|wi], [wj|β], A) ⇒ (σ, [wi|β], A ∪ {(wi, r, wj)})
  only if there are no wk, l such that (wk, l, wj) ∈ A
Shift: (σ, [wi|β], A) ⇒ ([σ|wi], β, A)
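As a concrete sketch (my own Python rendering, not code from the slides), the three transitions and their preconditions can be written as functions from configurations to configurations:

```python
# A configuration is (stack, buffer, arcs); the stack top is stack[-1],
# the buffer front is buffer[0]; arcs is a set of (head, label, dependent).

def has_head(w, arcs):
    return any(d == w for _, _, d in arcs)

def left_arc(r, config):
    stack, buffer, arcs = config
    i, j = stack[-1], buffer[0]
    assert i != 0 and not has_head(i, arcs)   # preconditions
    return stack[:-1], buffer, arcs | {(j, r, i)}

def right_arc(r, config):
    stack, buffer, arcs = config
    i, j = stack[-1], buffer[0]
    assert not has_head(j, arcs)
    return stack[:-1], [i] + buffer[1:], arcs | {(i, r, j)}

def shift(config):
    stack, buffer, arcs = config
    return stack + [buffer[0]], buffer[1:], arcs
```

For instance, on "Economic news had ...", SHIFT followed by LEFT-ARC(NMOD) attaches Economic (node 1) to news (node 2).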

An Example
[Figure 2: Dependency graph for an English sentence from the Penn Treebank: ROOT(0) Economic(1) news(2) had(3) little(4) effect(5) on(6) financial(7) markets(8) .(9), with the arc labels NMOD, SBJ, OBJ, PMOD, P.]

An Example
Transition          Configuration
                    ([0], [1,...,9], ∅)
SHIFT            ⇒ ([0,1], [2,...,9], ∅)
LEFT-ARC(NMOD)   ⇒ ([0], [2,...,9], A1 = {(2, NMOD, 1)})
SHIFT            ⇒ ([0,2], [3,...,9], A1)
LEFT-ARC(SBJ)    ⇒ ([0], [3,...,9], A2 = A1 ∪ {(3, SBJ, 2)})
SHIFT            ⇒ ([0,3], [4,...,9], A2)
SHIFT            ⇒ ([0,3,4], [5,...,9], A2)
LEFT-ARC(NMOD)   ⇒ ([0,3], [5,...,9], A3 = A2 ∪ {(5, NMOD, 4)})
SHIFT            ⇒ ([0,3,5], [6,...,9], A3)
SHIFT            ⇒ ([0,...,6], [7,8,9], A3)
SHIFT            ⇒ ([0,...,7], [8,9], A3)
LEFT-ARC(NMOD)   ⇒ ([0,...,6], [8,9], A4 = A3 ∪ {(8, NMOD, 7)})
RIGHT-ARC(PMOD)  ⇒ ([0,3,5], [6,9], A5 = A4 ∪ {(6, PMOD, 8)})
RIGHT-ARC(NMOD)  ⇒ ([0,3], [5,9], A6 = A5 ∪ {(5, NMOD, 6)})
RIGHT-ARC(OBJ)   ⇒ ([0], [3,9], A7 = A6 ∪ {(3, OBJ, 5)})
SHIFT            ⇒ ([0,3], [9], A7)
RIGHT-ARC(P)     ⇒ ([0], [3], A8 = A7 ∪ {(3, P, 9)})
RIGHT-ARC(ROOT)  ⇒ ([], [0], A9 = A8 ∪ {(0, ROOT, 3)})
SHIFT            ⇒ ([0], [], A9)

Deterministic Parsing
oracle(c): predicts the next transition
parse(w1 ... wn):
  c := ([w0 = ROOT], [w1, ..., wn], {})
  while c is not terminal:
    t := oracle(c)
    c := t(c)
  return G = ({w0, ..., wn}, Ac)
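A minimal runnable sketch of this loop (my own rendering, with assumed helper names): here the "oracle" simply replays a gold-standard transition sequence for the example sentence, rather than predicting transitions with a classifier.

```python
def step(c, t, r=None):
    """Apply the arc-standard transition t ('SH', 'LA', 'RA') to c."""
    stack, buf, arcs = c
    if t == 'SH':
        return stack + [buf[0]], buf[1:], arcs
    i, j = stack[-1], buf[0]
    if t == 'LA':                                    # Left-Arc(r)
        return stack[:-1], buf, arcs | {(j, r, i)}
    return stack[:-1], [i] + buf[1:], arcs | {(i, r, j)}   # Right-Arc(r)

def parse(n, oracle):
    """Parse words 1..n (0 = ROOT); oracle maps a configuration to
    the next transition (name, optional label)."""
    c = ([0], list(range(1, n + 1)), set())
    while c[1]:                                      # terminal: empty buffer
        c = step(c, *oracle(c))
    return c[2]                                      # the arc set A

# Gold sequence for "Economic news had little effect on financial markets."
gold = iter([('SH',), ('LA', 'NMOD'), ('SH',), ('LA', 'SBJ'), ('SH',),
             ('SH',), ('LA', 'NMOD'), ('SH',), ('SH',), ('SH',),
             ('LA', 'NMOD'), ('RA', 'PMOD'), ('RA', 'NMOD'), ('RA', 'OBJ'),
             ('SH',), ('RA', 'P'), ('RA', 'ROOT'), ('SH',)])
arcs = parse(9, lambda c: next(gold))
```

Replacing the lambda with a trained classifier over configuration features yields the deterministic parser described on this slide.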

Deterministic Parsing
Linear time complexity: the algorithm terminates after 2n steps for an input sentence with n words.
The algorithm is complete and correct for the class of projective dependency trees:
- for every projective dependency tree T, there is a sequence of transitions that generates T
- every sequence of transitions generates a projective dependency tree
Whether the resulting dependency tree is correct, of course, depends on the oracle.

The Oracle
Approximate the oracle by a classifier. Represent configurations by feature vectors, for instance:
- lexical properties (word form, lemma)
- category (part of speech)
- labels of partial dependency trees
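A sketch of such a feature representation (the feature names and choices here are illustrative assumptions, not the slides' model):

```python
# Map a configuration to a sparse feature dict over the stack top (s0)
# and buffer front (b0): word form, part of speech, and the label of
# s0's leftmost dependent attached so far.

def features(config, words, tags):
    stack, buf, arcs = config
    f = {}
    if stack:
        s0 = stack[-1]
        f['s0.word'], f['s0.pos'] = words[s0], tags[s0]
        deps = sorted((d, l) for h, l, d in arcs if h == s0)
        if deps:                                 # leftmost dependent's label
            f['s0.ldep.label'] = deps[0][1]
    if buf:
        f['b0.word'], f['b0.pos'] = words[buf[0]], tags[buf[0]]
    return f
```

A linear or neural classifier trained on such vectors, with gold transitions as targets, then stands in for the oracle at parsing time.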

Non-projective Parsing
Configurations (L1, L2, Q, A):
- L1, L2 = lists (stacks) of partially processed nodes
- Q = a queue of unprocessed input tokens
- A = a set of dependency arcs
Initial configuration for input w1 ... wn: ([w0], [], [w1, ..., wn], {}), where w0 = ROOT.
Terminal configurations: ([w0, w1, ..., wn], [], [], A).

Transitions
Left-Arc(l): ([L1|wi], L2, [wj|β], A) ⇒ (L1, [wi|L2], [wj|β], A ∪ {(wj, l, wi)})
  only if i ≠ 0, there are no wk, l′ with (wk, l′, wi) ∈ A, and not wi →* wj
Right-Arc(l): ([L1|wi], L2, [wj|β], A) ⇒ (L1, [wi|L2], [wj|β], A ∪ {(wi, l, wj)})
  only if there are no wk, l′ with (wk, l′, wj) ∈ A, and not wj →* wi

Transitions
No-Arc: ([L1|wi], L2, β, A) ⇒ (L1, [wi|L2], β, A)
Shift: (L1, L2, [wi|β], A) ⇒ ([L1·L2|wi], [], β, A)
L1·L2 = the concatenation of L1 and L2
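These four list-based transitions can be sketched as follows (my own Python rendering, not code from the slides); a configuration is (L1, L2, buffer, arcs):

```python
def has_head(w, arcs):
    return any(d == w for _, _, d in arcs)

def path(i, j, arcs):
    """True iff i ->* j in arcs (reflexive-transitive closure)."""
    frontier = {i}
    while frontier:
        if j in frontier:
            return True
        frontier = {d for h, _, d in arcs if h in frontier}
    return False

def left_arc(l, c):
    lam1, lam2, buf, arcs = c
    i, j = lam1[-1], buf[0]
    assert i != 0 and not has_head(i, arcs) and not path(i, j, arcs)
    return lam1[:-1], [i] + lam2, buf, arcs | {(j, l, i)}

def right_arc(l, c):
    lam1, lam2, buf, arcs = c
    i, j = lam1[-1], buf[0]
    assert not has_head(j, arcs) and not path(j, i, arcs)
    return lam1[:-1], [i] + lam2, buf, arcs | {(i, l, j)}

def no_arc(c):
    lam1, lam2, buf, arcs = c
    return lam1[:-1], [lam1[-1]] + lam2, buf, arcs

def shift(c):
    lam1, lam2, buf, arcs = c
    return lam1 + lam2 + [buf[0]], [], buf[1:], arcs
```

Because No-Arc lets the parser skip over intervening words before drawing an arc, crossing (non-projective) arcs become reachable.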

An Example
[Figure 1: Dependency graph for a Czech sentence from the Prague Dependency Treebank: ROOT(0) Z(1, "out of") nich(2, "them") je(3, "is") jen(4, "only") jedna(5, "one-FEM-SG") na(6, "to") kvalitu(7, "quality") .(8), with the relations AuxP, Atr, Pred, AuxZ, Sb, Adv, AuxK. Gloss: "Only one of them concerns quality."]

An Example
Transition        Configuration
                  ([0], [], [1,...,8], ∅)
SHIFT          ⇒ ([0,1], [], [2,...,8], ∅)
RIGHT-ARC(Atr) ⇒ ([0], [1], [2,...,8], A1 = {(1, Atr, 2)})
SHIFT          ⇒ ([0,1,2], [], [3,...,8], A1)
NO-ARC         ⇒ ([0,1], [2], [3,...,8], A1)
NO-ARC         ⇒ ([0], [1,2], [3,...,8], A1)
RIGHT-ARC(Pred)⇒ ([], [0,1,2], [3,...,8], A2 = A1 ∪ {(0, Pred, 3)})
SHIFT          ⇒ ([0,...,3], [], [4,...,8], A2)
SHIFT          ⇒ ([0,...,4], [], [5,...,8], A2)
LEFT-ARC(AuxZ) ⇒ ([0,...,3], [4], [5,...,8], A3 = A2 ∪ {(5, AuxZ, 4)})
RIGHT-ARC(Sb)  ⇒ ([0,1,2], [3,4], [5,...,8], A4 = A3 ∪ {(3, Sb, 5)})
NO-ARC         ⇒ ([0,1], [2,3,4], [5,...,8], A4)
LEFT-ARC(AuxP) ⇒ ([0], [1,...,4], [5,...,8], A5 = A4 ∪ {(5, AuxP, 1)})
SHIFT          ⇒ ([0,...,5], [], [6,7,8], A5)
NO-ARC         ⇒ ([0,...,4], [5], [6,7,8], A5)
NO-ARC         ⇒ ([0,...,3], [4,5], [6,7,8], A5)
RIGHT-ARC(AuxP)⇒ ([0,1,2], [3,4,5], [6,7,8], A6 = A5 ∪ {(3, AuxP, 6)})
SHIFT          ⇒ ([0,...,6], [], [7,8], A6)
RIGHT-ARC(Adv) ⇒ ([0,...,5], [6], [7,8], A7 = A6 ∪ {(6, Adv, 7)})
SHIFT          ⇒ ([0,...,7], [], [8], A7)
NO-ARC         ⇒ ([0,...,6], [7], [8], A7)
NO-ARC         ⇒ ([0,...,5], [6,7], [8], A7)
NO-ARC         ⇒ ([0,...,4], [5,6,7], [8], A7)
NO-ARC         ⇒ ([0,...,3], [4,...,7], [8], A7)
NO-ARC         ⇒ ([0,1,2], [3,...,7], [8], A7)
NO-ARC         ⇒ ([0,1], [2,...,7], [8], A7)
NO-ARC         ⇒ ([0], [1,...,7], [8], A7)
RIGHT-ARC(AuxK)⇒ ([], [0,...,7], [8], A8 = A7 ∪ {(0, AuxK, 8)})
SHIFT          ⇒ ([0,...,8], [], [], A8)

Non-projective Parsing
The algorithm is sound and complete for the class of dependency forests.
Time complexity is O(n²): there are at most n Shift transitions, and between the i-th and (i+1)-th Shift transition there are at most i other transitions (Left-Arc, Right-Arc, No-Arc).

Literature
Sandra Kübler, Ryan McDonald and Joakim Nivre (2009). Dependency Parsing. Morgan & Claypool.
Joakim Nivre (2008). Algorithms for Deterministic Incremental Dependency Parsing. Computational Linguistics 34(4), 513–553.