Dependency Parsing. COSI 114 Computational Linguistics Marie Meteer. March 21, 2015 Brandeis University


Dependency Grammar and Dependency Structure
- Dependency syntax postulates that syntactic structure consists of lexical items linked by binary asymmetric relations ("arrows") called dependencies.
- The arrows are commonly typed with the name of grammatical relations (subject, prepositional object, apposition, etc.).
- The arrow connects a head (governor, superior, regent) with a dependent (modifier, inferior, subordinate).
- Usually, dependencies form a tree (connected, acyclic, single-head).
[Figure: typed dependency tree for "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas", with arcs labeled nsubjpass, auxpass, prep, pobj, nn, appos, cc, and conj.]
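The tree conditions above can be checked mechanically. A minimal sketch, assuming a CoNLL-style head array in which `heads[m]` is the head of word m and position 0 is ROOT (the representation and the name `is_tree` are illustrative, not from the slides):

```python
def is_tree(heads):
    """Return True if `heads` encodes a dependency tree rooted at ROOT (position 0).

    heads[m] is the head of word m for m = 1..n; heads[0] is a placeholder.
    Single-headedness holds by construction (one head per word), so we check
    that every head is a valid position and that following heads upward from
    every word reaches ROOT without revisiting a word (acyclic), which also
    means every word is connected to ROOT.
    """
    n = len(heads) - 1
    if any(not 0 <= heads[m] <= n for m in range(1, n + 1)):
        return False
    for m in range(1, n + 1):
        seen, node = set(), m
        while node != 0:
            if node in seen:
                return False  # a cycle that never reaches ROOT
            seen.add(node)
            node = heads[node]
    return True


# "She saw the video lecture": heads of She, saw, the, video, lecture.
print(is_tree([-1, 2, 0, 5, 5, 2]))  # True
print(is_tree([-1, 2, 1]))           # False: words 1 and 2 head each other
```

Storing exactly one head per word makes single-headedness automatic, which is one reason this representation is common; only acyclicity and connectedness remain to be checked.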

Dependency Grammar/Parsing
- A sentence is parsed by relating each word to other words in the sentence which depend on it.
- The idea of dependency structure goes back a long way:
  - to Pāṇini's grammar (c. 5th century BCE).
- Constituency is a new-fangled invention:
  - a 20th-century invention.
- Modern work is often linked to the work of L. Tesnière (1959).
  - Dominant approach in the "East" (Eastern bloc/East Asia).
- Among the earliest kinds of parsers in NLP, even in the US:
  - David Hays, one of the founders of computational linguistics, built an early (first?) dependency parser (Hays 1962).

Dependency Structure
[Figure: dependency arrows over "Shaw Publishing acquired 30 % of American City in March".]
- Words are linked from head (regent) to dependent.
- Warning! Some people draw the arrows one way, some the other way (Tesnière has them point from head to dependent).
- Usually a fake ROOT is added so that every word is a dependent of exactly one other node.

Relation Between Phrase Structure and Dependency Structure
- A dependency grammar has a notion of a head. Officially, CFGs don't.
- But modern linguistic theory and all modern statistical parsers (Charniak, Collins, Stanford, …) do, via hand-written phrasal "head rules":
  - The head of a noun phrase is a noun/number/adjective/…
  - The head of a verb phrase is a verb/modal/…
- The head rules can be used to extract a dependency parse from a CFG parse.
- The closure of dependencies gives constituency from a dependency tree.
- But the dependents of a word must be at the same level (i.e., "flat"): there can be no VP!
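The head-rule idea can be sketched in a few lines. The head table `HEAD_RULES` below is hypothetical and far smaller than the real tables used by the Collins or Stanford parsers (which also special-case coordination), and trees are assumed to be nested `(label, children)` tuples with `(tag, word)` leaves. `closure` illustrates the reverse direction: the words reachable from a head form the constituent it heads.

```python
import itertools

# Hypothetical, heavily simplified head table: phrase label -> (search direction,
# child labels in priority order). Real head tables are much larger.
HEAD_RULES = {
    "S": ("left", ["VP", "S"]),
    "VP": ("left", ["VBD", "VBN", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN", "TO"]),
}


def head_child(label, children):
    """Index of the head child: first priority label found, scanning in the rule's direction."""
    direction, priorities = HEAD_RULES.get(label, ("left", []))
    n = len(children)
    order = range(n) if direction == "left" else range(n - 1, -1, -1)
    for wanted in priorities:
        for i in order:
            if children[i][0] == wanted:
                return i
    return order[0]  # fallback (an assumption): first child in the search direction


def extract_dependencies(tree):
    """Return (root position, [(head, dependent), ...]) with 1-based word positions."""
    positions = itertools.count(1)
    arcs = []

    def visit(node):
        label, rest = node
        if isinstance(rest, str):  # preterminal (tag, word): assign the next position
            return next(positions)
        child_heads = [visit(child) for child in rest]
        k = head_child(label, rest)
        for i, dependent in enumerate(child_heads):
            if i != k:  # each non-head child's head word depends on the phrase's head word
                arcs.append((child_heads[k], dependent))
        return child_heads[k]

    return visit(tree), arcs


def closure(word, arcs):
    """Positions dominated by `word` (including itself): the constituent it heads."""
    children = {}
    for head, dependent in arcs:
        children.setdefault(head, []).append(dependent)
    span, todo = set(), [word]
    while todo:
        w = todo.pop()
        span.add(w)
        todo.extend(children.get(w, []))
    return sorted(span)


tree = ("S", [("NP", [("DT", "the"), ("JJ", "little"), ("NN", "boy")]),
              ("VP", [("VBD", "jumped"),
                      ("PP", [("IN", "over"),
                              ("NP", [("DT", "the"), ("NN", "fence")])])])])
root, arcs = extract_dependencies(tree)
print(root)              # 4  ("jumped")
print(closure(3, arcs))  # [1, 2, 3]  ("the little boy")
```

Because all dependents of a head hang directly off it, the closure of "jumped" is the whole sentence: there is no separate VP node, as the slide notes.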

Dependency Conditioning Preferences
Sources of information:
- bilexical dependencies
- distance of dependencies
- valency of heads (number of dependents)
A word's dependents (adjuncts, arguments) tend to fall near it in the string.
These next 6 slides are based on slides by Jason Eisner and Noah Smith.

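Probabilistic Dependency Grammar: Generative Model
1. Start with the left wall \(\$\).
2. Generate the root \(w_0\).
3. Generate left children \(w_{-1}, w_{-2}, \ldots, w_{-\ell}\) from the FSA \(\lambda_{w_0}\).
4. Generate right children \(w_1, w_2, \ldots, w_r\) from the FSA \(\rho_{w_0}\).
5. Recurse on each \(w_i\) for \(i \in \{-\ell, \ldots, -1, 1, \ldots, r\}\), sampling \(\alpha_i\) (steps 2-4).
6. Return \(\alpha_{-\ell} \cdots \alpha_{-1}\, w_0\, \alpha_1 \cdots \alpha_r\).
[Figure: the root \(w_0\) with its left automaton \(\lambda_{w_0}\) generating \(w_{-1}, w_{-2}, \ldots, w_{-\ell}\) and its right automaton \(\rho_{w_0}\) generating \(w_1, w_2, \ldots, w_r\).]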
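A sketch of sampling from this generative model. The grammar is a hypothetical toy: each automaton is a dict mapping a state to `(probability, child, next_state)` choices, with `STOP` ending that side, and the left wall `$` is assumed to generate the root as its single right child. Left children come out nearest-first, so they are reversed before being placed to the left of the head.

```python
import random

STOP = None

# Hypothetical toy grammar: (head, side) -> automaton, where an automaton maps
# state -> list of (probability, child word or STOP, next state). These play the
# role of the FSAs lambda_w (left side) and rho_w (right side).
GRAMMAR = {
    ("$", "right"): {0: [(1.0, "takes", 1)], 1: [(1.0, STOP, 1)]},
    ("takes", "left"): {0: [(1.0, "It", 1)], 1: [(1.0, STOP, 1)]},
    ("takes", "right"): {0: [(1.0, "two", 1)], 1: [(1.0, "to", 2)], 2: [(1.0, STOP, 2)]},
    ("to", "right"): {0: [(1.0, "tango", 1)], 1: [(1.0, STOP, 1)]},
}
NO_CHILDREN = {0: [(1.0, STOP, 0)]}  # automaton that stops immediately


def sample_children(automaton, rng):
    """Run an automaton until it emits STOP; return the children, nearest first."""
    children, state = [], 0
    while True:
        r, total = rng.random(), 0.0
        for prob, child, next_state in automaton[state]:
            total += prob
            if r < total:
                break
        if child is STOP:
            return children
        children.append(child)
        state = next_state


def generate(head, grammar, rng):
    """Steps 2-5: sample left and right children of `head`, then recurse on each."""
    left = sample_children(grammar.get((head, "left"), NO_CHILDREN), rng)    # w_-1, w_-2, ...
    right = sample_children(grammar.get((head, "right"), NO_CHILDREN), rng)  # w_1, w_2, ...
    words = []
    for child in reversed(left):   # alpha_-l ... alpha_-1 in surface order
        words += generate(child, grammar, rng)
    words.append(head)             # w_0
    for child in right:            # alpha_1 ... alpha_r
        words += generate(child, grammar, rng)
    return words


rng = random.Random(0)
sentence = generate("$", GRAMMAR, rng)[1:]  # step 1: start from the wall, then drop it
print(" ".join(sentence))                    # It takes two to tango
```

With probabilities below 1.0 the same code samples different trees; each sampled tree's probability is the product of the automaton choices made along the way.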

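Naïve Recognition/Parsing
- \(O(n^5)\) combinations: combining a span i..j headed by p with an adjacent span j..k headed by c involves five free word positions (i, j, k, p, c).
- \(O(n^5 N^3)\) if there are N nonterminals (one for the parent and one for each of the two children).
[Figure: head-annotated spans over "It takes two to tango" being combined; the goal item spans positions 0 to n.]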

Dependency Grammar Cubic Recognition/Parsing (Eisner & Satta, 1999)
- Triangles: span over words, where the tall side of the triangle is the head, the other side is a dependent, and no non-head words are expecting more dependents.
- Trapezoids: span over words, where the larger side is the head, the smaller side is a dependent, and the smaller side is still looking for dependents on its side of the trapezoid.

Dependency Grammar Cubic Recognition/Parsing (Eisner & Satta, 1999)
- One trapezoid per dependency.
- A triangle is a head with some left (or right) subtrees.
[Figure: triangles and trapezoids over "It takes two to tango" combining into the goal item.]

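Cubic Recognition/Parsing (Eisner & Satta, 1999)
- Building the goal item over 0..n from a left and a right triangle that meet at the root word i: \(O(n)\) combinations.
- Building a trapezoid or triangle over i..k from two pieces split at j: \(O(n^3)\) combinations.
- This gives \(O(n^3)\) dependency grammar parsing.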
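The cubic algorithm can be sketched as the first-order (arc-factored) Eisner algorithm, assuming a score `scores[h][m]` for every possible arc with position 0 as ROOT. Here `complete` holds triangles and `incomplete` holds trapezoids; `LEFT` means the head sits at the right end of the span. The three nested loops over span length, start position, and split point give the \(O(n^3)\) running time. This version lets ROOT take more than one dependent.

```python
import math


def eisner(scores):
    """Best projective tree under an arc-factored model (first-order Eisner algorithm).

    scores[h][m] is the score of the arc h -> m; position 0 is ROOT, and arcs
    into ROOT (scores[h][0]) cannot appear in the result. Returns
    (heads, best_score), where heads[m] is the head of word m and heads[0] is -1.
    """
    n = len(scores) - 1
    LEFT, RIGHT = 0, 1  # LEFT: head at right end t; RIGHT: head at left end s
    NEG = -math.inf
    complete = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]    # triangles
    incomplete = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]  # trapezoids
    bp_complete = [[[0, 0] for _ in range(n + 1)] for _ in range(n + 1)]     # split points
    bp_incomplete = [[[0, 0] for _ in range(n + 1)] for _ in range(n + 1)]
    for s in range(n + 1):
        complete[s][s][LEFT] = complete[s][s][RIGHT] = 0.0

    for length in range(1, n + 1):
        for s in range(n + 1 - length):
            t = s + length
            # Trapezoid: right triangle over [s, q] + left triangle over [q+1, t] + one arc.
            best, r = max((complete[s][q][RIGHT] + complete[q + 1][t][LEFT], q) for q in range(s, t))
            incomplete[s][t][LEFT] = best + scores[t][s]   # arc t -> s
            incomplete[s][t][RIGHT] = best + scores[s][t]  # arc s -> t
            bp_incomplete[s][t][LEFT] = bp_incomplete[s][t][RIGHT] = r
            # Triangle: a trapezoid extended by a triangle on its dependent's side.
            best, r = max((complete[s][q][LEFT] + incomplete[q][t][LEFT], q) for q in range(s, t))
            complete[s][t][LEFT], bp_complete[s][t][LEFT] = best, r
            best, r = max((incomplete[s][q][RIGHT] + complete[q][t][RIGHT], q) for q in range(s + 1, t + 1))
            complete[s][t][RIGHT], bp_complete[s][t][RIGHT] = best, r

    heads = [-1] * (n + 1)

    def backtrack(s, t, direction, is_complete):
        if s == t:
            return
        if is_complete:
            r = bp_complete[s][t][direction]
            if direction == LEFT:
                backtrack(s, r, LEFT, True)
                backtrack(r, t, LEFT, False)
            else:
                backtrack(s, r, RIGHT, False)
                backtrack(r, t, RIGHT, True)
        else:
            r = bp_incomplete[s][t][direction]
            if direction == LEFT:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, RIGHT, True)
            backtrack(r + 1, t, LEFT, True)

    backtrack(0, n, RIGHT, True)
    return heads, complete[0][n][RIGHT]


# Hypothetical arc scores for ROOT + 3 words: 0 -> 2, 2 -> 1 and 2 -> 3 are preferred.
scores = [[0.0] * 4 for _ in range(4)]
scores[0][2], scores[2][1], scores[2][3] = 10.0, 5.0, 5.0
print(eisner(scores))  # ([-1, 2, 0, 2], 20.0)
```

Keeping heads at span ends (triangles and trapezoids) instead of in the middle is what removes two of the five free indices of the naïve approach.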


Evaluation of Dependency Parsing: Simply Use (Labeled) Dependency Accuracy
Accuracy = number of correct dependencies / total number of dependencies = 2 / 5 = 40%
(Only "We" and "eat" have both the correct head and the correct label.)

          GOLD                                PARSED
idx  head  word      label         idx  head  word      label
1    2     We        SUBJ          1    2     We        SUBJ
2    0     eat       ROOT          2    0     eat       ROOT
3    5     the       DET           3    4     the       DET
4    5     cheese    MOD           4    2     cheese    OBJ
5    2     sandwich  OBJ           5    2     sandwich  PRED

McDonald et al. (2005 ACL): Online Large-Margin Training of Dependency Parsers
- Builds a discriminative dependency parser.
  - Can condition on rich features in that context.
- Best-known recent dependency parser.
  - Lots of recent dependency parsing activity connected with the CoNLL 2006/2007 shared tasks.
- Doesn't/can't report constituent LP/LR, but evaluating dependencies correct:
  - Accuracy is similar to, but a fraction below, dependencies extracted from Collins:
    - 90.9% vs. 91.4%; combining them gives 92.2% [all lengths].
  - Stanford parser on lengths up to 40:
    - Pure generative dependency model: 85.0%
    - Lexicalized factored parser: 91.0%

McDonald et al. (2005 ACL): Online Large-Margin Training of Dependency Parsers
- The score of a parse is the sum of the scores of its dependencies.
- Each dependency's score is a linear function: features times weights.
- Feature weights are learned by MIRA, an online large-margin algorithm.
  - But you could think of it as using a perceptron or maxent classifier.
- Features cover:
  - head and dependent word and POS separately
  - head and dependent word and POS bigram features
  - words between head and dependent
  - length and direction of dependency
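The arc-factored score \(s(x, y) = \sum_{(h, m) \in y} \mathbf{w} \cdot \mathbf{f}(h, m)\) can be sketched as follows. The feature templates are hypothetical stand-ins for the categories listed above (for the in-between material they use the POS tags of the words between head and dependent), and `perceptron_update` is a plain structured perceptron, a simpler stand-in for MIRA rather than MIRA itself.

```python
def arc_features(words, tags, h, m):
    """Hypothetical feature templates for the arc h -> m (position 0 is ROOT)."""
    hw, ht, dw, dt = words[h], tags[h], words[m], tags[m]
    direction = "R" if h < m else "L"
    distance = min(abs(h - m), 5)  # bucket long distances together (cutoff is arbitrary)
    feats = [
        f"hw={hw}", f"ht={ht}", f"dw={dw}", f"dt={dt}",       # head / dependent separately
        f"hw,dw={hw},{dw}", f"ht,dt={ht},{dt}",               # head-dependent bigrams
        f"ht,dt,dir,dist={ht},{dt},{direction},{distance}",   # length and direction
    ]
    for b in range(min(h, m) + 1, max(h, m)):                # material between the two
        feats.append(f"ht,bt,dt={ht},{tags[b]},{dt}")
    return feats


def arc_score(weights, words, tags, h, m):
    """Linear score: sum of the weights of the arc's active features."""
    return sum(weights.get(f, 0.0) for f in arc_features(words, tags, h, m))


def tree_score(weights, words, tags, heads):
    """Arc-factored score of a whole parse: the sum of its arc scores."""
    return sum(arc_score(weights, words, tags, heads[m], m) for m in range(1, len(words)))


def perceptron_update(weights, words, tags, gold_heads, predicted_heads, step=1.0):
    """One structured-perceptron step: reward gold arcs, penalize wrong predicted arcs."""
    for m in range(1, len(words)):
        if gold_heads[m] != predicted_heads[m]:
            for f in arc_features(words, tags, gold_heads[m], m):
                weights[f] = weights.get(f, 0.0) + step
            for f in arc_features(words, tags, predicted_heads[m], m):
                weights[f] = weights.get(f, 0.0) - step


words = ["<ROOT>", "She", "saw", "it"]
tags = ["ROOT", "PRP", "VBD", "PRP"]
gold = [-1, 2, 0, 2]
weights = {"ht,dt=VBD,PRP": 1.0, "hw=<ROOT>": 2.0}  # hypothetical weights
print(tree_score(weights, words, tags, gold))  # 4.0
```

Because the tree score decomposes over arcs, the decoder only ever needs a table of arc scores, which is exactly what the Eisner and maximum-spanning-tree algorithms consume.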

Extracting Grammatical Relations from Statistical Constituency Parsers [de Marneffe et al. LREC 2006]
- Exploit the high-quality syntactic analysis done by statistical constituency parsers to get the grammatical relations [typed dependencies].
- Dependencies are generated by pattern-matching rules.
[Figure: Penn Treebank-style parse of "Bills on ports and immigration were submitted by Senator Brownback" and the typed dependencies extracted from it: nsubjpass(submitted, Bills), auxpass(submitted, were), agent(submitted, Brownback), prep_on(Bills, ports), cc_and(ports, immigration), nn(Brownback, Senator).]

Methods of Dependency Parsing
1. Dynamic programming (like in the CKY algorithm)
   - You can do it similarly to lexicalized PCFG parsing: an \(O(n^5)\) algorithm.
   - Eisner (1996) gives a clever algorithm that reduces the complexity to \(O(n^3)\) by producing parse items with heads at the ends rather than in the middle.
2. Graph algorithms
   - You create a maximum spanning tree for a sentence.
   - McDonald et al.'s (2005) MSTParser scores dependencies independently using an ML classifier (it uses MIRA, for online learning, but it could be MaxEnt).
3. Constraint satisfaction
   - Edges that don't satisfy hard constraints are eliminated. Karlsson (1990), etc.
4. Deterministic parsing
   - Greedy choice of attachments guided by machine learning classifiers.
   - MaltParser (Nivre et al. 2008), discussed in the next segment.
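For the graph-based approach, here is a compact sketch of the Chu-Liu/Edmonds maximum spanning arborescence algorithm, the standard algorithm for directed maximum spanning trees and the usual decoder for non-projective graph-based parsing. The scores are hypothetical; with them, greedily picking each word's best head creates the cycle John ⇄ saw, which the algorithm contracts into one node, rescores, and breaks. Unlike the Eisner algorithm, the result need not be projective.

```python
def _find_cycle(best_in):
    """Return a list of nodes forming a cycle in the head map {dep: head}, or None."""
    for start in best_in:
        path, index, node = [], {}, start
        while node in best_in and node not in index:
            index[node] = len(path)
            path.append(node)
            node = best_in[node]
        if node in index:
            return path[index[node]:]
    return None


def _max_arborescence(scores, root):
    """Chu-Liu/Edmonds on a dict {(head, dep): score}; returns {dep: head}."""
    best_in = {}
    for (h, m), s in scores.items():
        if m != root and h != m and (m not in best_in or s > scores[(best_in[m], m)]):
            best_in[m] = h
    cycle = _find_cycle(best_in)
    if cycle is None:
        return best_in
    # Contract the cycle into a fresh node c, rescoring arcs that enter it.
    in_cycle = set(cycle)
    c = max(max(h, m) for h, m in scores) + 1
    contracted, origin = {}, {}
    for (h, m), s in scores.items():
        if h in in_cycle and m in in_cycle:
            continue
        if m in in_cycle:    # entering: pay for dropping the cycle arc into m
            key, s = (h, c), s - scores[(best_in[m], m)]
        elif h in in_cycle:  # leaving the cycle
            key = (c, m)
        else:
            key = (h, m)
        if key not in contracted or s > contracted[key]:
            contracted[key], origin[key] = s, (h, m)
    sub = _max_arborescence(contracted, root)
    # Expand: map arcs back, then keep every cycle arc except the broken one.
    heads = {}
    for m, h in sub.items():
        orig_h, orig_m = origin[(h, m)]
        heads[orig_m] = orig_h
    broken = origin[(sub[c], c)][1]
    for v in cycle:
        if v != broken:
            heads[v] = best_in[v]
    return heads


def mst_parse(matrix):
    """matrix[h][m] = score of arc h -> m; position 0 is ROOT. Returns a head list."""
    n = len(matrix)
    scores = {(h, m): matrix[h][m] for h in range(n) for m in range(1, n) if h != m}
    heads = _max_arborescence(scores, root=0)
    return [-1] + [heads[m] for m in range(1, n)]


# Hypothetical scores for "ROOT John saw Mary" (rows = heads, columns = dependents).
matrix = [
    [0, 9, 10, 9],
    [0, 0, 20, 3],
    [0, 30, 0, 30],
    [0, 11, 0, 0],
]
print(mst_parse(matrix))  # [-1, 2, 0, 2]: saw is the root; John and Mary attach to saw
```

This recursive version is written for clarity; efficient implementations avoid rebuilding the score dictionary at each contraction.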

Dependency Conditioning Preferences
What are the sources of information for dependency parsing?
1. Bilexical affinities: [issues → the] is plausible.
2. Dependency distance: mostly with nearby words.
3. Intervening material: dependencies rarely span intervening verbs or punctuation.
4. Valency of heads: how many dependents on which side are usual for a head?
[Figure: dependency arcs over "ROOT Discussion of the outstanding issues was completed."]

Greedy Transition-Based Parsing: MaltParser

MaltParser [Nivre et al. 2008]
- A simple form of greedy discriminative dependency parser.
- The parser does a sequence of bottom-up actions.
  - Roughly like "shift" or "reduce" in a shift-reduce parser, but the "reduce" actions are specialized to create dependencies with the head on the left or right.
- The parser has:
  - a stack σ, written with top to the right,
    - which starts with the ROOT symbol;
  - a buffer β, written with top to the left,
    - which starts with the input sentence;
  - a set of dependency arcs A,
    - which starts off empty;
  - a set of actions.

Basic Transition-Based Dependency Parser
Start: σ = [ROOT], β = w_1, …, w_n, A = ∅
1. Shift:        σ, w_i|β, A  ⇒  σ|w_i, β, A
2. Left-Arc_r:   σ|w_i, w_j|β, A  ⇒  σ, w_j|β, A ∪ {r(w_j, w_i)}
3. Right-Arc_r:  σ|w_i, w_j|β, A  ⇒  σ, w_i|β, A ∪ {r(w_i, w_j)}
Finish: β = ∅
Notes:
- Unlike the regular presentation of the CFG reduce step, dependencies combine one thing from each of stack and buffer.
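The three transitions above can be replayed directly. A minimal sketch, assuming words are numbered 1..n with ROOT as 0 and arcs stored as `(head, label, dependent)` triples; the action sequence for "She saw it" is a hand-written example, not output from a trained parser.

```python
from collections import deque

ROOT = 0


def run_transitions(n_words, transitions):
    """Apply (action, label) pairs from the start configuration; return the arc set A."""
    stack, buffer, arcs = [ROOT], deque(range(1, n_words + 1)), set()
    for action, label in transitions:
        if action == "SHIFT":        # sigma, w_i|beta  =>  sigma|w_i, beta
            stack.append(buffer.popleft())
        elif action == "LEFT-ARC":   # sigma|w_i, w_j|beta  =>  sigma, w_j|beta; add r(w_j, w_i)
            w_i = stack.pop()
            arcs.add((buffer[0], label, w_i))
        elif action == "RIGHT-ARC":  # sigma|w_i, w_j|beta  =>  sigma, w_i|beta; add r(w_i, w_j)
            w_i, w_j = stack.pop(), buffer.popleft()
            arcs.add((w_i, label, w_j))
            buffer.appendleft(w_i)   # the head returns to the buffer; w_j is done for good
        else:
            raise ValueError(f"unknown action {action!r}")
    if buffer:
        raise ValueError("finish condition not met: buffer is not empty")
    return arcs


# "She saw it": 1 = She, 2 = saw, 3 = it.
arcs = run_transitions(3, [
    ("SHIFT", None), ("LEFT-ARC", "nsubj"), ("SHIFT", None),
    ("RIGHT-ARC", "dobj"), ("RIGHT-ARC", "root"), ("SHIFT", None),
])
print(sorted(arcs))  # [(0, 'root', 2), (2, 'dobj', 3), (2, 'nsubj', 1)]
```

Since Right-Arc removes the dependent w_j permanently, a word can only be attached to a head on its left after it has collected all of its own dependents; the arc-eager variant relaxes this.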

Actions ("Arc-Eager" Dependency Parser)
Start: σ = [ROOT], β = w_1, …, w_n, A = ∅
1. Left-Arc_r:   σ|w_i, w_j|β, A  ⇒  σ, w_j|β, A ∪ {r(w_j, w_i)}
   Precondition: r′(w_k, w_i) ∉ A, w_i ≠ ROOT
2. Right-Arc_r:  σ|w_i, w_j|β, A  ⇒  σ|w_i|w_j, β, A ∪ {r(w_i, w_j)}
3. Reduce:       σ|w_i, β, A  ⇒  σ, β, A
   Precondition: r′(w_k, w_i) ∈ A
4. Shift:        σ, w_i|β, A  ⇒  σ|w_i, β, A
Finish: β = ∅
This is the common "arc-eager" variant: a head can immediately take a right dependent, before its dependents are found.
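A sketch of the arc-eager configuration with both preconditions enforced; the class and method names are illustrative. The replay at the end applies the action sequence of the worked example that follows ("Happy children like to play with their friends.") and finishes with an empty buffer.

```python
from collections import deque

ROOT = 0


class ArcEager:
    """Arc-eager configuration (stack, buffer, arcs); arcs are (head, label, dependent)."""

    def __init__(self, n_words):
        self.stack = [ROOT]
        self.buffer = deque(range(1, n_words + 1))
        self.arcs = set()
        self.head = {}  # dependent -> head, used by the preconditions

    def left_arc(self, label):
        w_i, w_j = self.stack[-1], self.buffer[0]
        if w_i == ROOT or w_i in self.head:
            raise ValueError("Left-Arc needs a non-ROOT w_i that has no head yet")
        self.stack.pop()
        self._add(w_j, label, w_i)

    def right_arc(self, label):
        w_i, w_j = self.stack[-1], self.buffer.popleft()
        self._add(w_i, label, w_j)
        self.stack.append(w_j)  # w_j stays on the stack to take its own right dependents

    def reduce(self):
        if self.stack[-1] not in self.head:
            raise ValueError("Reduce needs a w_i that already has a head")
        self.stack.pop()

    def shift(self):
        self.stack.append(self.buffer.popleft())

    def is_final(self):
        return not self.buffer

    def _add(self, head, label, dependent):
        self.arcs.add((head, label, dependent))
        self.head[dependent] = head


words = ["ROOT", "Happy", "children", "like", "to", "play", "with", "their", "friends", "."]
p = ArcEager(len(words) - 1)
for step in ["shift", "left_arc amod", "shift", "left_arc nsubj", "right_arc root", "shift",
             "left_arc aux", "right_arc xcomp", "right_arc prep", "shift", "left_arc poss",
             "right_arc pobj", "reduce", "reduce", "reduce", "right_arc punc"]:
    name, *label = step.split()
    getattr(p, name)(*label)
print(p.is_final())  # True
for h, r, d in sorted(p.arcs):
    print(f"{r}({words[h]}, {words[d]})")
```

Checking preconditions inside the transition methods means an illegal action fails loudly instead of silently producing a malformed tree.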

Example: Happy children like to play with their friends.

Action     Stack                               Buffer                 Arcs
(start)    [ROOT]                              [Happy, children, …]   ∅
Shift      [ROOT, Happy]                       [children, like, …]    ∅
LA_amod    [ROOT]                              [children, like, …]    A_1 = {amod(children, Happy)}
Shift      [ROOT, children]                    [like, to, …]          A_1
LA_nsubj   [ROOT]                              [like, to, …]          A_2 = A_1 ∪ {nsubj(like, children)}
RA_root    [ROOT, like]                        [to, play, …]          A_3 = A_2 ∪ {root(ROOT, like)}
Shift      [ROOT, like, to]                    [play, with, …]        A_3
LA_aux     [ROOT, like]                        [play, with, …]        A_4 = A_3 ∪ {aux(play, to)}
RA_xcomp   [ROOT, like, play]                  [with, their, …]       A_5 = A_4 ∪ {xcomp(like, play)}
RA_prep    [ROOT, like, play, with]            [their, friends, …]    A_6 = A_5 ∪ {prep(play, with)}
Shift      [ROOT, like, play, with, their]     [friends, .]           A_6
LA_poss    [ROOT, like, play, with]            [friends, .]           A_7 = A_6 ∪ {poss(friends, their)}
RA_pobj    [ROOT, like, play, with, friends]   [.]                    A_8 = A_7 ∪ {pobj(with, friends)}
Reduce     [ROOT, like, play, with]            [.]                    A_8
Reduce     [ROOT, like, play]                  [.]                    A_8
Reduce     [ROOT, like]                        [.]                    A_8
RA_punc    [ROOT, like, .]                     []                     A_9 = A_8 ∪ {punc(like, .)}

You terminate as soon as the buffer is empty. Dependencies = A_9.

MaltParser [Nivre et al. 2008]
- We have left to explain how we choose the next action.
- Each action is predicted by a discriminative classifier (often an SVM, could be a maxent classifier) over each legal move.
  - Max of 4 untyped choices; max of 2|R| + 2 when typed, since Left-Arc and Right-Arc each come in one version per relation type in R.
  - Features: top-of-stack word and POS; first-in-buffer word and POS; etc.
- There is NO search (in the simplest and usual form).
  - But you could do some kind of beam search if you wish.
- The model's accuracy is slightly below the best lexicalized PCFGs (evaluated on dependencies), but
  - it provides close to state-of-the-art parsing performance;
  - it provides VERY fast linear-time parsing.
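A sketch of one greedy decision: compute the legal untyped arc-eager actions, score each with a linear model over stack and buffer features, and take the best. The feature names and the single weight are hypothetical; a real MaltParser model is trained (often as an SVM) on configurations derived from a treebank.

```python
ROOT = 0


def legal_actions(stack, buffer, has_head):
    """Untyped arc-eager actions whose preconditions hold (at most 4)."""
    legal = []
    if buffer:
        legal.append("SHIFT")
        if stack and stack[-1] != ROOT and stack[-1] not in has_head:
            legal.append("LEFT-ARC")
        if stack:
            legal.append("RIGHT-ARC")
    if stack and stack[-1] in has_head:
        legal.append("REDUCE")
    return legal


def features(stack, buffer, words, tags):
    """Hypothetical templates: top-of-stack and first-in-buffer word and POS."""
    feats = []
    if stack:
        feats += [f"s0w={words[stack[-1]]}", f"s0t={tags[stack[-1]]}"]
    if buffer:
        feats += [f"b0w={words[buffer[0]]}", f"b0t={tags[buffer[0]]}"]
    if stack and buffer:
        feats.append(f"s0t,b0t={tags[stack[-1]]},{tags[buffer[0]]}")
    return feats


def choose_action(stack, buffer, has_head, words, tags, weights):
    """Greedy decision: the legal action with the highest linear score (no search)."""
    feats = features(stack, buffer, words, tags)

    def score(action):
        return sum(weights.get((action, f), 0.0) for f in feats)

    return max(legal_actions(stack, buffer, has_head), key=score)


words = ["ROOT", "Happy", "children", "like"]
tags = ["ROOT", "JJ", "NNS", "VBP"]
weights = {("LEFT-ARC", "s0t,b0t=JJ,NNS"): 2.0}  # hypothetical learned weight
print(choose_action([ROOT, 1], [2, 3], set(), words, tags, weights))  # LEFT-ARC
```

Calling `choose_action` once per step and applying the result until the buffer is empty gives the linear-time greedy parser; a beam search would instead keep several configurations alive.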

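Evaluation of Dependency Parsing: (Labeled) Dependency Accuracy
Acc = # correct deps / # of deps
- UAS (unlabeled attachment score): a word counts as correct if its head is correct.
- LAS (labeled attachment score): the head and the relation label must both be correct.
Example: ROOT She saw the video lecture
UAS = 4 / 5 = 80%
LAS = 2 / 5 = 40%

          Gold                                Parsed
idx  head  word     label           idx  head  word     label
1    2     She      nsubj           1    2     She      nsubj
2    0     saw      root            2    0     saw      root
3    5     the      det             3    4     the      det
4    5     video    nn              4    5     video    nsubj
5    2     lecture  dobj            5    2     lecture  ccomp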
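A sketch of both scores for a single sentence, assuming each analysis is given as a list of `(head, label)` pairs in word order (the function name is illustrative). With the gold and parsed analyses above it reproduces UAS = 80% and LAS = 40%.

```python
def attachment_scores(gold, predicted):
    """UAS and LAS for one sentence.

    gold and predicted are lists of (head, label) pairs for words 1..n in order.
    UAS counts words with the right head; LAS also requires the right label.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same words")
    n = len(gold)
    uas = sum(g_head == p_head for (g_head, _), (p_head, _) in zip(gold, predicted)) / n
    las = sum(g == p for g, p in zip(gold, predicted)) / n
    return uas, las


gold = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "nn"), (2, "dobj")]
parsed = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "nsubj"), (2, "ccomp")]
print(attachment_scores(gold, parsed))  # (0.8, 0.4)
```

Over a whole test set, the correct counts are normally summed over all words before dividing, rather than averaging per-sentence scores.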

Representative Performance Numbers
- The CoNLL-X (2006) shared task provides evaluation numbers for various dependency parsing approaches over 13 languages.
  - MALT: LAS scores from 65% to 92%, depending greatly on language/treebank.
- Here we give a few UAS numbers for English to allow some comparison to constituency parsing.

Parser                                                        UAS%
Sagae and Lavie (2006), ensemble of dependency parsers        92.7
Charniak (2000), generative, constituency                     92.2
Collins (1999), generative, constituency                      91.7
McDonald and Pereira (2005), MST graph-based dependency       91.5
Yamada and Matsumoto (2003), transition-based dependency      90.4

Projectivity
- Dependencies from a CFG tree using heads must be projective.
  - There must not be any crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words.
- But dependency theory normally does allow non-projective structures to account for displaced constituents.
  - You can't easily get the semantics of certain constructions right without these non-projective dependencies.
[Figure: non-projective dependency structure for "Who did Bill buy the coffee from yesterday?"]
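A minimal projectivity check, assuming the head-array representation (position 0 is ROOT, drawn at the far left) and treating two arcs as crossing when one endpoint of one arc lies strictly inside the other and the other endpoint strictly outside. The analysis of the "Who" question used below is a hypothetical one chosen to show a crossing arc.

```python
def is_projective(heads):
    """True if no two dependency arcs cross when drawn above the sentence.

    heads[m] is the head of word m (words are 1..n, ROOT is position 0,
    heads[0] is a placeholder). Simple O(n^2) pairwise check.
    """
    spans = [(min(h, m), max(h, m)) for m, h in enumerate(heads) if m > 0]
    for i, (a, b) in enumerate(spans):
        for c, d in spans[i + 1:]:
            if a < c < b < d or c < a < d < b:  # one endpoint inside, one outside
                return False
    return True


print(is_projective([-1, 2, 0, 5, 5, 2]))  # "She saw the video lecture": True
# Hypothetical analysis of "Who did Bill buy the coffee from yesterday ?" with
# buy as root and Who attached to from, so the arc from -> Who crosses ROOT -> buy.
who = [-1, 7, 4, 4, 0, 6, 4, 4, 4, 4]
print(is_projective(who))  # False
```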

Handling Non-Projectivity
- The arc-eager algorithm we presented only builds projective dependency trees.
- Possible directions to head:
  1. Just declare defeat on non-projective arcs.
  2. Use a dependency formalism which only admits projective representations (a CFG doesn't represent such structures…).
  3. Use a postprocessor to a projective dependency parsing algorithm to identify and resolve non-projective links.
  4. Add extra types of transitions that can model at least most non-projective structures.
  5. Move to a parsing mechanism that does not use or require any constraints on projectivity (e.g., the graph-based MSTParser).

Dependencies Encode Relational Structure: Relation Extraction with Stanford Dependencies

Dependency Paths Identify Relations like Protein Interaction [Erkan et al. EMNLP 07, Fundel et al. 2007]
[Figure: dependency tree for "The results demonstrated that KaiC interacts rhythmically with SasA, KaiA, and KaiB", with arcs labeled nsubj, det, ccomp, compl, advmod, prep_with, and conj_and.]
Paths:
KaiC ←nsubj interacts prep_with→ SasA
KaiC ←nsubj interacts prep_with→ SasA conj_and→ KaiA
KaiC ←nsubj interacts prep_with→ SasA conj_and→ KaiB
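Paths like these can be extracted with a breadth-first search over the dependency graph with arc directions ignored. A sketch, with the arcs written out by hand for the example sentence (their exact attachments are an assumption), words used as node names, and the arrows rendered in ASCII:

```python
from collections import deque


def dependency_path(arcs, source, target):
    """Shortest path between two words in a dependency graph, ignoring arc direction.

    arcs: iterable of (head, label, dependent) with words as node names
    (assumes the words are unique; use token positions otherwise).
    Returns e.g. 'KaiC <-nsubj- interacts -prep_with-> SasA', or None.
    """
    neighbours = {}
    for head, label, dep in arcs:
        neighbours.setdefault(head, []).append((dep, f"-{label}->"))  # down to the dependent
        neighbours.setdefault(dep, []).append((head, f"<-{label}-"))  # up to the head
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt, edge in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = (node, edge)
                queue.append(nxt)
    if target not in previous:
        return None
    parts, node = [target], target
    while previous[node] is not None:  # walk back to the source, prepending as we go
        node, edge = previous[node]
        parts[:0] = [node, edge]
    return " ".join(parts)


arcs = [
    ("demonstrated", "nsubj", "results"), ("results", "det", "The"),
    ("demonstrated", "ccomp", "interacts"), ("interacts", "compl", "that"),
    ("interacts", "nsubj", "KaiC"), ("interacts", "advmod", "rhythmically"),
    ("interacts", "prep_with", "SasA"),
    ("SasA", "conj_and", "KaiA"), ("SasA", "conj_and", "KaiB"),
]
print(dependency_path(arcs, "KaiC", "KaiA"))
# KaiC <-nsubj- interacts -prep_with-> SasA -conj_and-> KaiA
```

In a tree the path between two words is unique, so the search simply recovers it; the resulting strings can then serve as features or patterns for a relation classifier.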

Stanford Dependencies [de Marneffe et al. LREC 2006]
- The basic dependency representation is projective.
- It can be generated by postprocessing headed phrase structure parses (Penn Treebank syntax).
- It can also be generated directly by dependency parsers, such as MaltParser or the Easy-First Parser.
[Figure: basic Stanford dependencies for "the little boy jumped over the fence": nsubj(jumped, boy), prep(jumped, over), det(boy, the), amod(boy, little), pobj(over, fence), det(fence, the).]

Graph Modification to Facilitate Semantic Analysis
Bell, based in LA, makes and distributes electronic and computer products.
[Figure: basic dependencies: nsubj(makes, Bell), partmod(Bell, based), prep(based, in), pobj(in, LA), cc(makes, and), conj(makes, distributes), dobj(makes, products), amod(products, electronic), cc(electronic, and), conj(electronic, computer).]

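Graph Modification to Facilitate Semantic Analysis
Bell, based in LA, makes and distributes electronic and computer products.
[Figure: the modified graph: nsubj(makes, Bell), nsubj(distributes, Bell), conj_and(makes, distributes), dobj(makes, products), partmod(Bell, based), prep_in(based, LA), amod(products, electronic), amod(products, computer), conj_and(electronic, computer).]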
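A simplified sketch of this kind of rewriting, with three hypothetical rules that turn the basic dependencies of this sentence into the modified graph; the actual Stanford converter has many more rules and special cases. Arcs are `(head, label, dependent)` triples over token positions, so the two occurrences of "and" stay distinct.

```python
def collapse(arcs, words):
    """Simplified rewriting of basic dependencies into collapsed/propagated ones.

    Hypothetical rules (a sketch, not the full Stanford converter):
      1. prep(h, p) + pobj(p, o)  ->  prep_<p>(h, o)
      2. cc(h, c)   + conj(h, x)  ->  conj_<c>(h, x)
      3. for conj_*(h, x): x inherits h's governor (g -rel-> h gives g -rel-> x),
         and a conjunct x without a subject shares h's nsubj.
    """
    arcs = set(arcs)
    for h, label, p in list(arcs):                                   # rule 1
        if label == "prep":
            for _, _, o in [a for a in arcs if a[0] == p and a[1] == "pobj"]:
                arcs -= {(h, "prep", p), (p, "pobj", o)}
                arcs.add((h, f"prep_{words[p]}", o))
    for h, label, c in list(arcs):                                   # rule 2
        if label == "cc":
            conjuncts = [x for g, rel, x in arcs if g == h and rel == "conj"]
            if conjuncts:
                arcs.discard((h, "cc", c))
                for x in conjuncts:
                    arcs.discard((h, "conj", x))
                    arcs.add((h, f"conj_{words[c]}", x))
    for h, label, x in list(arcs):                                   # rule 3
        if not label.startswith("conj_"):
            continue
        for g, rel, d in list(arcs):
            if d == h and rel != "root" and not rel.startswith("conj_"):
                arcs.add((g, rel, x))
        if not any(g == x and rel == "nsubj" for g, rel, _ in arcs):
            for g, rel, s in list(arcs):
                if g == h and rel == "nsubj":
                    arcs.add((x, "nsubj", s))
    return arcs


words = ["ROOT", "Bell", ",", "based", "in", "LA", ",", "makes", "and",
         "distributes", "electronic", "and", "computer", "products", "."]
basic = {(7, "nsubj", 1), (1, "partmod", 3), (3, "prep", 4), (4, "pobj", 5),
         (7, "cc", 8), (7, "conj", 9), (7, "dobj", 13), (13, "amod", 10),
         (10, "cc", 11), (10, "conj", 12)}
for h, r, d in sorted(collapse(basic, words)):
    print(f"{r}({words[h]}, {words[d]})")
```

Folding function words into labels and propagating shared arguments puts semantically related content words (makes/distributes and Bell, products and computer) one arc apart.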

BioNLP 2009/2011 Relation Extraction Shared Tasks [Björne et al. 2009]
