Transition-Based Parsing


Transition-Based Parsing. Based on a tutorial at COLING-ACL, Sydney 2006, with Joakim Nivre. Sandra Kübler, Markus Dickinson, Indiana University. E-mail: skuebler,md7@indiana.edu Transition-Based Parsing 1(11)

Dependency Parsing
Two paradigms: grammar-based parsing and data-driven parsing.
Learning: given a training set C of sentences (annotated with dependency graphs), induce a parsing model M that can be used to parse new sentences.
Parsing: given a parsing model M and a sentence S, derive the optimal dependency graph G for S based on M.

Deterministic Parsing
Basic idea: derive a single syntactic representation (dependency graph) through a deterministic sequence of elementary parsing actions, sometimes combined with backtracking or repair.
Motivation: psycholinguistic modeling, efficiency, simplicity.

Covington's Incremental Algorithm
Deterministic incremental parsing in O(n^2) time by trying to link each new word to each preceding one:

PARSE(x = (w_1, ..., w_n))
1  for i = 1 up to n
2      for j = i - 1 down to 1
3          LINK(w_i, w_j)

LINK(w_i, w_j) =
    E ← E ∪ {(i, j)}   if w_j is a dependent of w_i
    E ← E ∪ {(j, i)}   if w_i is a dependent of w_j
    E ← E              otherwise

Different conditions, such as Single-Head and Projectivity, can be incorporated into the LINK operation.
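As a sketch, the double loop above can be written out in Python; the `is_dependent` decision function is a hypothetical stand-in for whatever condition (grammar rule or classifier) licenses an arc:

```python
def covington_parse(words, is_dependent):
    """Link each new word to each preceding one: O(n^2) LINK calls."""
    n = len(words)
    edges = set()                      # E: arcs as (head, dependent), 1-based
    for i in range(1, n + 1):          # for i = 1 up to n
        for j in range(i - 1, 0, -1):  # for j = i - 1 down to 1
            # LINK(w_i, w_j): extend E in whichever direction holds
            if is_dependent(head=i, dep=j, words=words):
                edges.add((i, j))      # w_j is a dependent of w_i
            elif is_dependent(head=j, dep=i, words=words):
                edges.add((j, i))      # w_i is a dependent of w_j
            # otherwise E is left unchanged
    return edges
```

With a gold-tree lookup as the decision function this recovers the annotated arcs; conditions like Single-Head are then enforced by the decision function itself.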

Shift-Reduce Type Algorithms
Data structures:
- Stack [..., w_i]S of partially processed tokens
- Queue [w_j, ...]Q of remaining input tokens
Parsing actions built from atomic actions:
- Adding arcs (w_i → w_j, w_i ← w_j)
- Stack and queue operations
Left-to-right parsing in O(n) time.
Restricted to projective dependency graphs.

Yamada's Algorithm
Three parsing actions:
    Shift:  [...]S [w_i, ...]Q  ⇒  [..., w_i]S [...]Q
    Left:   [..., w_i, w_j]S [...]Q  ⇒  [..., w_i]S [...]Q   (adds w_i → w_j)
    Right:  [..., w_i, w_j]S [...]Q  ⇒  [..., w_j]S [...]Q   (adds w_i ← w_j)
Algorithm variants:
- Originally developed for Japanese (strictly head-final) with only the Shift and Right actions
- Adapted for English (with mixed headedness) by adding the Left action
- Multiple passes over the input give time complexity O(n^2)
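The three actions can be sketched as operations on an explicit stack and queue; this is a sketch under the assumption that arcs are stored as (head, dependent) pairs over token indices, and the outer control loop with its multiple passes is omitted:

```python
def shift(stack, queue, arcs):
    # Shift: [...]S [wi,...]Q  =>  [..., wi]S [...]Q
    stack.append(queue.pop(0))

def left(stack, queue, arcs):
    # Left: [..., wi, wj]S  =>  [..., wi]S, adding arc wi -> wj
    wj = stack.pop()
    arcs.add((stack[-1], wj))  # wi stays on the stack as the head

def right(stack, queue, arcs):
    # Right: [..., wi, wj]S  =>  [..., wj]S, adding arc wj -> wi
    wj = stack.pop()
    wi = stack.pop()
    stack.append(wj)           # wj stays on the stack as the head
    arcs.add((wj, wi))
```

For a strictly head-final input, Shift and Right alone suffice, which is why the original Japanese version needed only those two actions.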

Nivre's Algorithm
Four parsing actions:
    Shift:        [...]S [w_i, ...]Q  ⇒  [..., w_i]S [...]Q
    Reduce:       [..., w_i]S [...]Q  ⇒  [...]S [...]Q   (only if ∃w_k: w_k → w_i)
    Left-Arc_r:   [..., w_i]S [w_j, ...]Q  ⇒  [...]S [w_j, ...]Q   (adds w_i ←_r w_j; only if ¬∃w_k: w_k → w_i)
    Right-Arc_r:  [..., w_i]S [w_j, ...]Q  ⇒  [..., w_i, w_j]S [...]Q   (adds w_i →_r w_j; only if ¬∃w_k: w_k → w_j)
Characteristics:
- Integrated labeled dependency parsing
- Arc-eager processing of right-dependents
- Single pass over the input gives time complexity O(n)
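A minimal unlabeled sketch of the four transitions, assuming token indices with 0 as the artificial root and a `heads` dict mapping each dependent to its head (the dependency labels r are omitted for brevity):

```python
def shift(stack, queue, heads):
    # Shift: push the front of the queue onto the stack
    stack.append(queue.pop(0))

def reduce_(stack, queue, heads):
    # Reduce: pop wi, permitted only if wi already has a head
    assert stack[-1] in heads
    stack.pop()

def left_arc(stack, queue, heads):
    # Left-Arc: add arc wj -> wi (wj = front of queue) and pop wi
    assert stack[-1] not in heads
    wi = stack.pop()
    heads[wi] = queue[0]

def right_arc(stack, queue, heads):
    # Right-Arc: add arc wi -> wj and push wj onto the stack
    wj = queue.pop(0)
    assert wj not in heads
    heads[wj] = stack[-1]
    stack.append(wj)
```

Replaying Shift, Left-Arc, Shift, Left-Arc, Right-Arc on indices 1..3 for "Economic news had" attaches Economic to news, news to had, and had to the root, mirroring the start of the derivation on the next slides.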

Parsing "Economic news had little effect on financial markets ." with Nivre's algorithm. Each step shows the action taken and the resulting stack and queue; words attached by Left-Arc or Reduce are popped from the stack:

(initial)       [root]S [Economic news had little effect on financial markets .]Q
Shift           [root Economic]S [news had little effect on financial markets .]Q
Left-Arc        [root]S [news had little effect on financial markets .]Q   (Economic ← news)
Shift           [root news]S [had little effect on financial markets .]Q
Left-Arc        [root]S [had little effect on financial markets .]Q   (news ← had)
Right-Arc pred  [root had]S [little effect on financial markets .]Q   (root → had)
Shift           [root had little]S [effect on financial markets .]Q
Left-Arc        [root had]S [effect on financial markets .]Q   (little ← effect)
Right-Arc obj   [root had effect]S [on financial markets .]Q   (had → effect)
Right-Arc       [root had effect on]S [financial markets .]Q   (effect → on)
Shift           [root had effect on financial]S [markets .]Q
Left-Arc        [root had effect on]S [markets .]Q   (financial ← markets)
Right-Arc pc    [root had effect on markets]S [.]Q   (on → markets)
Reduce          [root had effect on]S [.]Q
Reduce          [root had effect]S [.]Q
Reduce          [root had]S [.]Q
Reduce          [root]S [.]Q
Right-Arc p     [root .]S []Q   (root → .)

Classifier-Based Parsing
Data-driven deterministic parsing:
- Deterministic parsing requires an oracle.
- An oracle can be approximated by a classifier.
- A classifier can be trained using treebank data.
Learning methods:
- Support vector machines (SVM) (Kudo & Matsumoto 2002, Yamada 2003, Nivre 2006)
- Memory-based learning (MBL) (Nivre et al. 2004)
- Maximum entropy modeling (MaxEnt) (Cheng et al. 2005)
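As an illustration of the classifier-as-oracle idea, here is a tiny multiclass perceptron over sparse feature sets. It is a stand-in for the SVM, MBL, and MaxEnt learners named above, not any of those methods themselves; the feature strings are hypothetical:

```python
from collections import defaultdict

class PerceptronOracle:
    """Predict a parser action from a set of feature strings."""

    def __init__(self, actions):
        self.actions = actions
        self.w = {a: defaultdict(float) for a in actions}  # per-action weights

    def score(self, action, feats):
        return sum(self.w[action][f] for f in feats)

    def predict(self, feats):
        # Approximate oracle: pick the highest-scoring action
        return max(self.actions, key=lambda a: self.score(a, feats))

    def update(self, feats, gold):
        # Standard perceptron update on a (state features, gold action) pair
        pred = self.predict(feats)
        if pred != gold:
            for f in feats:
                self.w[gold][f] += 1.0
                self.w[pred][f] -= 1.0
```

Training iterates over the (state, action) pairs read off gold-standard derivations from a treebank; at parsing time `predict` replaces the oracle.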

Feature Models
Learning problem: approximate a function from parser states (represented by feature vectors) to parser actions, given a training set of gold-standard derivations.
Typical features:
- Tokens: target tokens; linear context (neighbors in S and Q); structural context (parents, children, siblings in G)
- Attributes: word form (and lemma); part of speech (and morpho-syntactic features); dependency type (if labeled); distance (between target tokens)
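A sketch of such a feature extractor; the feature names and the list-based state representation are illustrative, not from the slides:

```python
def extract_features(stack, queue, heads, words, pos):
    """Map a parser state to a list of feature strings."""
    feats = []
    if stack:
        s0 = stack[-1]                            # target token: top of stack
        feats.append("s0.form=" + words[s0])      # word form
        feats.append("s0.pos=" + pos[s0])         # part of speech
        if s0 in heads:                           # structural context: parent
            feats.append("s0.head.pos=" + pos[heads[s0]])
    if queue:
        q0 = queue[0]                             # target token: front of queue
        feats.append("q0.form=" + words[q0])
        feats.append("q0.pos=" + pos[q0])
        if stack:
            feats.append("dist=" + str(q0 - stack[-1]))  # distance feature
    return feats
```

The resulting strings feed directly into a classifier such as the learners listed on the previous slide.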

Next Example
There is an ideal shit, of which all the shit that happens is but an imperfect image. (Plato)