Transition-Based Parsing
Based on a tutorial at COLING-ACL, Sydney 2006 with Joakim Nivre
Sandra Kübler, Markus Dickinson
Indiana University
E-mail: skuebler,md7@indiana.edu
Transition-Based Parsing 1(11)
Dependency Parsing
Two approaches: grammar-based parsing and data-driven parsing.
Learning: given a training set C of sentences (annotated with dependency graphs), induce a parsing model M that can be used to parse new sentences.
Parsing: given a parsing model M and a sentence S, derive the optimal dependency graph G for S based on M.
Deterministic Parsing
Basic idea: derive a single syntactic representation (dependency graph) through a deterministic sequence of elementary parsing actions, sometimes combined with backtracking or repair.
Motivation:
- Psycholinguistic modeling
- Efficiency
- Simplicity
Covington's Incremental Algorithm
Deterministic incremental parsing in O(n^2) time by trying to link each new word to each preceding one:

PARSE(x = (w_1, ..., w_n))
1  for i = 1 up to n
2    for j = i-1 down to 1
3      LINK(w_i, w_j)

LINK(w_i, w_j) =
  E ← E ∪ {(i, j)}  if w_j is a dependent of w_i
  E ← E ∪ {(j, i)}  if w_i is a dependent of w_j
  E ← E             otherwise

Different conditions, such as Single-Head and Projectivity, can be incorporated into the LINK operation.
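The double loop above can be sketched directly in Python. This is a minimal illustration, not Covington's full algorithm: the oracle predicate `is_dependent(d, h)` is a hypothetical stand-in for whatever decides attachments, and the Single-Head and Projectivity conditions are left out.

```python
def covington_parse(words, is_dependent):
    """Sketch of Covington's incremental algorithm, O(n^2).

    words        -- tokens w_1..w_n (0-indexed here)
    is_dependent -- hypothetical oracle: is_dependent(d, h) is True
                    iff words[d] should be a dependent of words[h]
    Returns the arc set E as (head_index, dependent_index) pairs.
    """
    arcs = set()
    for i in range(len(words)):          # each new word w_i ...
        for j in range(i - 1, -1, -1):   # ... tries to link to w_{i-1} .. w_1
            # LINK(w_i, w_j): extra conditions (Single-Head,
            # Projectivity) would be checked here in a full version
            if is_dependent(j, i):       # w_j is a dependent of w_i
                arcs.add((i, j))
            elif is_dependent(i, j):     # w_i is a dependent of w_j
                arcs.add((j, i))
    return arcs
```

For example, with the gold arcs of "Economic news had" (Economic ← news, news ← had) supplied as the oracle, the loop recovers exactly those arcs.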
Shift-Reduce Type Algorithms
Data structures:
- Stack [..., w_i]_S of partially processed tokens
- Queue [w_j, ...]_Q of remaining input tokens
Parsing actions built from atomic actions:
- Adding arcs (w_i → w_j, w_i ← w_j)
- Stack and queue operations
Left-to-right parsing in O(n) time
Restricted to projective dependency graphs
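The two data structures plus the arc set form the parser configuration. A minimal sketch (the class name and layout are my own, not from the tutorial):

```python
from collections import deque

class Configuration:
    """Parser configuration for shift-reduce type algorithms:
    a stack of partially processed tokens, a queue of remaining
    input tokens, and the set of arcs built so far."""
    def __init__(self, words):
        self.stack = []            # [..., w_i]_S  (top at the right)
        self.queue = deque(words)  # [w_j, ...]_Q  (front at the left)
        self.arcs = set()          # (head, dependent) pairs

    def shift(self):
        # atomic action: move the front of the queue onto the stack
        self.stack.append(self.queue.popleft())
```

The composite parsing actions of the following slides are built from this shift operation plus arc additions and stack pops.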
Yamada's Algorithm
Three parsing actions:

Shift:  [...]_S [w_i, ...]_Q  ⇒  [..., w_i]_S [...]_Q
Left:   [..., w_i, w_j]_S [...]_Q  ⇒  [..., w_i]_S [...]_Q    w_i → w_j
Right:  [..., w_i, w_j]_S [...]_Q  ⇒  [..., w_j]_S [...]_Q    w_i ← w_j

Algorithm variants:
- Originally developed for Japanese (strictly head-final) with only the Shift and Right actions
- Adapted for English (with mixed headedness) by adding the Left action
- Multiple passes over the input give time complexity O(n^2)
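The three actions can be sketched as one step function over the stack, the queue, and the arc set. This is an unlabeled, single-step illustration under my own encoding of actions as strings; a full Yamada-style parser would repeat passes over the input until no more arcs can be added.

```python
def yamada_step(stack, queue, arcs, action):
    """One step of Yamada-style parsing (unlabeled sketch).
    stack and queue hold token indices; arcs collects (head, dependent)."""
    if action == "shift":
        # move the next input token onto the stack
        stack.append(queue.pop(0))
    elif action == "left":
        # w_i -> w_j : pop w_j, the right token, as dependent of w_i
        wj = stack.pop()
        arcs.add((stack[-1], wj))
    elif action == "right":
        # w_i <- w_j : remove w_i, the left token, as dependent of w_j
        wj = stack.pop()
        wi = stack.pop()
        arcs.add((wj, wi))
        stack.append(wj)
    return stack, queue, arcs
```

On a head-final fragment such as "Economic news had" (indices 0, 1, 2), the Shift and Right actions alone suffice, mirroring the original Japanese setting.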
Nivre's Algorithm
Four parsing actions:

Shift:        [...]_S [w_i, ...]_Q  ⇒  [..., w_i]_S [...]_Q
Reduce:       [..., w_i]_S [...]_Q  ⇒  [...]_S [...]_Q                  if ∃ w_k : w_k → w_i
Left-Arc_r:   [..., w_i]_S [w_j, ...]_Q  ⇒  [...]_S [w_j, ...]_Q    w_i ←_r w_j    if ¬∃ w_k : w_k → w_i
Right-Arc_r:  [..., w_i]_S [w_j, ...]_Q  ⇒  [..., w_i, w_j]_S [...]_Q    w_i →_r w_j    if ¬∃ w_k : w_k → w_j

Characteristics:
- Integrated labeled dependency parsing
- Arc-eager processing of right-dependents
- Single pass over the input gives time complexity O(n)
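The four transitions can be sketched as follows. This is a minimal illustration: the action strings ("left-sbj", "right-pred", ...) are my own encoding of the labeled transitions, token 0 plays the artificial root, and the Single-Head preconditions above are assumed rather than checked.

```python
def arc_eager_parse(words, actions):
    """Apply a sequence of arc-eager transitions (sketch).
    Returns the set of labeled arcs (head, label, dependent);
    in a real parser `actions` comes from an oracle or classifier."""
    stack = [0]                              # 0 = artificial root
    queue = list(range(1, len(words) + 1))   # token indices 1..n
    arcs = set()
    for act in actions:
        if act == "shift":
            stack.append(queue.pop(0))
        elif act == "reduce":                # precondition: top has a head
            stack.pop()
        elif act.startswith("left-"):        # top <-_r front; top is popped
            arcs.add((queue[0], act[5:], stack.pop()))
        elif act.startswith("right-"):       # top ->_r front; front is pushed
            arcs.add((stack[-1], act[6:], queue[0]))
            stack.append(queue.pop(0))
    return arcs
```

For example, parsing "news had" with the sequence Shift, Left-Arc_sbj, Right-Arc_pred yields the arcs had →_sbj news and root →_pred had.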
Example: arc-eager parsing of "Economic news had little effect on financial markets ." (only stack elements are shown; attached dependents are omitted)

Initial:            [root]_S [Economic news had little effect on financial markets .]_Q
 1. Shift           [root Economic]_S [news had little effect on financial markets .]_Q
 2. Left-Arc        [root]_S [news ...]_Q                    (Economic ← news)
 3. Shift           [root news]_S [had ...]_Q
 4. Left-Arc        [root]_S [had ...]_Q                     (news ← had)
 5. Right-Arc pred  [root had]_S [little ...]_Q              (root → had)
 6. Shift           [root had little]_S [effect ...]_Q
 7. Left-Arc        [root had]_S [effect ...]_Q              (little ← effect)
 8. Right-Arc obj   [root had effect]_S [on ...]_Q           (had → effect)
 9. Right-Arc       [root had effect on]_S [financial ...]_Q (effect → on)
10. Shift           [root had effect on financial]_S [markets .]_Q
11. Left-Arc        [root had effect on]_S [markets .]_Q     (financial ← markets)
12. Right-Arc pc    [root had effect on markets]_S [.]_Q     (on → markets)
13. Reduce          [root had effect on]_S [.]_Q
14. Reduce          [root had effect]_S [.]_Q
15. Reduce          [root had]_S [.]_Q
16. Reduce          [root]_S [.]_Q
17. Right-Arc p     [root .]_S []_Q                          (root → .)
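The transition sequence above can be replayed mechanically. The sketch below uses plain lists, encodes actions as strings (my own convention), and gives the Left-Arc and Right-Arc steps whose labels are not shown on the slides the empty label.

```python
# Replay the arc-eager derivation for
# "Economic news had little effect on financial markets ."
words = "Economic news had little effect on financial markets .".split()
seq = ["shift", "left:", "shift", "left:", "right:pred",
       "shift", "left:", "right:obj", "right:",
       "shift", "left:", "right:pc",
       "reduce", "reduce", "reduce", "reduce", "right:p"]

stack = [0]                              # 0 = artificial root
queue = list(range(1, len(words) + 1))   # token indices 1..9
arcs = set()                             # (head, label, dependent)
for act in seq:
    if act == "shift":
        stack.append(queue.pop(0))
    elif act == "reduce":
        stack.pop()
    elif act.startswith("left:"):        # top becomes dependent of front
        arcs.add((queue[0], act[5:], stack.pop()))
    else:                                # "right:<label>"
        arcs.add((stack[-1], act[6:], queue[0]))
        stack.append(queue.pop(0))
```

After the last transition every one of the nine tokens has exactly one head, as the Single-Head condition requires.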
Classifier-Based Parsing
Data-driven deterministic parsing:
- Deterministic parsing requires an oracle.
- An oracle can be approximated by a classifier.
- A classifier can be trained using treebank data.
Learning methods:
- Support vector machines (SVM) (Kudo & Matsumoto 2002, Yamada 2003, Nivre 2006)
- Memory-based learning (MBL) (Nivre et al. 2004)
- Maximum entropy modeling (MaxEnt) (Cheng et al. 2005)
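To make the oracle-approximation idea concrete, here is a minimal multiclass perceptron standing in for the SVM/MBL/MaxEnt learners named on the slide (the class, feature strings, and action names are illustrative, not from the tutorial). It maps feature lists to parser actions, trained on pairs extracted from gold-standard derivations.

```python
from collections import defaultdict

class ActionClassifier:
    """Minimal multiclass perceptron approximating a parsing oracle.
    Instances are lists of string features such as 'stack_pos=JJ';
    labels are parser actions such as 'shift' or 'left-arc'."""
    def __init__(self):
        # weights[feature][action] -> float
        self.weights = defaultdict(lambda: defaultdict(float))

    def score(self, feats, action):
        return sum(self.weights[f][action] for f in feats)

    def predict(self, feats, actions):
        # pick the highest-scoring action (ties: first in the list)
        return max(actions, key=lambda a: self.score(feats, a))

    def train(self, data, actions, epochs=5):
        # data: (features, gold_action) pairs extracted from
        # gold-standard (treebank) derivations
        for _ in range(epochs):
            for feats, gold in data:
                pred = self.predict(feats, actions)
                if pred != gold:          # perceptron update on errors
                    for f in feats:
                        self.weights[f][gold] += 1.0
                        self.weights[f][pred] -= 1.0
```

At parsing time, the classifier's prediction replaces the oracle at each configuration, which is exactly what makes the deterministic algorithms above data-driven.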
Feature Models
Learning problem: approximate a function from parser states, represented by feature vectors, to parser actions, given a training set of gold standard derivations.
Typical features — tokens:
- Target tokens
- Linear context (neighbors in S and Q)
- Structural context (parents, children, siblings in G)
Typical features — attributes:
- Word form (and lemma)
- Part-of-speech (and morpho-syntactic features)
- Dependency type (if labeled)
- Distance (between target tokens)
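A feature model of this kind might be sketched as a function from a parser state to feature strings; the feature names (`s0.form`, `q0.pos`, `dist`, ...) are illustrative conventions, not the tutorial's.

```python
def extract_features(stack, queue, arcs, tokens):
    """Map a parser state to a list of string features (sketch).
    tokens maps index -> (word form, POS tag)."""
    feats = []
    if stack:
        s0 = stack[-1]                       # target token on the stack
        feats += [f"s0.form={tokens[s0][0]}", f"s0.pos={tokens[s0][1]}"]
        # structural context: labels of s0's children attached so far
        feats += [f"s0.child={lab}" for (h, lab, d) in sorted(arcs) if h == s0]
    if queue:
        q0 = queue[0]                        # target token in the queue
        feats += [f"q0.form={tokens[q0][0]}", f"q0.pos={tokens[q0][1]}"]
        if stack:
            # distance between the two target tokens
            feats.append(f"dist={q0 - stack[-1]}")
    if len(queue) > 1:                       # linear context in Q
        feats.append(f"q1.pos={tokens[queue[1]][1]}")
    return feats
```

The resulting strings feed directly into a classifier such as the perceptron sketched on the previous slide, or into SVM/MBL/MaxEnt learners.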
Next Example
There is an ideal shit, of which all the shit that happens is but an imperfect image. (Plato)