Transition-based Dependency Parsing with Selectional Branching

1 Transition-based Dependency Parsing with Selectional Branching. Presented at the 4th Workshop on Statistical Parsing of Morphologically Rich Languages, October 18th, 2013. Jinho D. Choi, University of Massachusetts Amherst.

2 Greedy vs. Non-greedy Parsing. Greedy parsing: considers only one head for each token and generates one parse tree per sentence, e.g., transition-based parsing (2 ms / sentence). Non-greedy parsing: considers multiple heads for each token and generates multiple parse trees per sentence, e.g., transition-based parsing with beam search, graph-based parsing, linear programming, dual decomposition ( 93%).

3 Motivation. How often do we need non-greedy parsing? Our greedy parser performs as accurately as our non-greedy parser about 64% of the time. The gap is even smaller when the parsers are evaluated on non-benchmark data (e.g., tweets, chats, blogs). Many applications are time sensitive: some need at least one complete parse tree ready within a limited time period (e.g., search, dialog, Q&A). Hard sentences are hard for any parser! Considering more heads does not guarantee more accurate parse results.

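4 Transition-based Parsing. Transition-based dependency parsing (greedy) considers one transition for each parsing state. [Diagram: a single chain of parsing states from S to the terminal state T; at each state the candidate transitions t1 ... tl are scored and one transition t is followed.] What if t is not the correct transition?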
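
The greedy procedure on this slide can be sketched as a simple loop. This is a minimal illustration, not the parser from the talk: the callbacks `score_transitions`, `apply_transition`, and `is_terminal`, and the toy state (a count of tokens left in the buffer), are hypothetical names introduced here.

```python
def greedy_parse(state, score_transitions, apply_transition, is_terminal):
    """Follow only the highest-scoring transition at every parsing state."""
    sequence = []
    while not is_terminal(state):
        scores = score_transitions(state)      # dict: transition -> score
        best = max(scores, key=scores.get)     # the single transition t that is kept
        state = apply_transition(state, best)  # every other candidate is discarded
        sequence.append(best)
    return state, sequence


# Toy setup (assumption): the state is the number of tokens left to process,
# and every transition consumes exactly one token.
def toy_scores(state):
    return {"shift": 0.9, "reduce": 0.1}

def toy_apply(state, transition):
    return state - 1

def toy_terminal(state):
    return state == 0

final_state, transitions = greedy_parse(3, toy_scores, toy_apply, toy_terminal)
```

Because the loop never revisits a state, a single wrong choice of t cannot be undone, which is the question the slide raises.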

5 Transition-based Parsing. Transition-based dependency parsing with beam search considers b transitions for each block of parsing states. [Diagram: b parallel chains of states S1 ... Sb, each choosing among transitions t1 ... tl and ending in terminal states T1 ... Tb.]
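
For comparison with the greedy case, a generic beam search over transition sequences might look like the sketch below. It assumes that a sequence is ranked by the sum of its transition scores, which the slide does not specify; the callback names and the toy state (a tuple of the transitions taken so far) are hypothetical.

```python
def beam_parse(initial, score_transitions, apply_transition, is_terminal, beam_size):
    """Keep the beam_size best partial sequences at every step."""
    beam = [(0.0, initial)]  # (accumulated score, state)
    while not all(is_terminal(state) for _, state in beam):
        expanded = []
        for total, state in beam:
            if is_terminal(state):
                expanded.append((total, state))  # finished sequences stay in the beam
                continue
            for transition, score in score_transitions(state).items():
                expanded.append((total + score, apply_transition(state, transition)))
        # Sort by score only; Python's sort is stable, so ties keep insertion order.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_size]
    return beam


# Toy setup (assumption): a state is the tuple of transitions taken so far,
# and parsing ends after two transitions.
def toy_scores(state):
    return {"a": 0.6, "b": 0.4}

def toy_apply(state, transition):
    return state + (transition,)

def toy_terminal(state):
    return len(state) == 2

beam = beam_parse((), toy_scores, toy_apply, toy_terminal, beam_size=2)
```

The beam always holds up to `beam_size` sequences, whether or not the sentence needs them.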

6 Selectional Branching. Issue with beam search: it generates a fixed number of parse trees no matter how easy or hard the input sentence is. Is it possible to adjust the beam size dynamically for each individual sentence? Selectional branching: (1) the one-best transition sequence is found by a greedy parser; (2) the k-best state-transition pairs are collected for each low-confidence transition used to generate the one-best sequence; (3) transition sequences are generated from the b - 1 highest-scoring state-transition pairs in the collection.

7 Selectional Branching. [Diagram: the one-best sequence S1, S2, ..., Sn, T; each one-best transition t11, t21, ... is checked for low confidence.] λ = {(S1, t12), ..., (S1, t1k), (S2, t22), ..., (S2, t2k), ...}. Pick the b - 1 pairs with the highest scores. For our experiments, k = 2 is used.

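8 Selectional Branching. λ = {(S1, t12), (S2, t22), (S3, t32)}. [Diagram: each pair starts a new sequence: t12 applied to S1, t22 applied to S2, and t32 applied to S3, continuing through states Sa, Sb, and Sc respectively to a terminal state T.] Each new sequence carries on from a parsing state of the one-best sequence. Selectional branching is guaranteed to generate no more trees than beam search when |λ| ≤ b.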
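
The three steps of selectional branching can be sketched as follows. This is an illustration under stated assumptions, not the ClearNLP implementation: `k_best` is a hypothetical callback that returns the candidates within the margin, best first (so more than one candidate means low confidence), and no further pairs are collected while parsing the branched sequences.

```python
def run_greedy(state, k_best, apply_transition, is_terminal, collect=None):
    """Greedy parse; optionally record the non-best candidates of each state."""
    scores = []
    while not is_terminal(state):
        candidates = k_best(state)  # [(transition, score), ...], best first
        if collect is not None:
            # Low-confidence state: remember (state, transition, score) pairs.
            collect.extend((state, t, s) for t, s in candidates[1:])
        transition, score = candidates[0]
        state = apply_transition(state, transition)
        scores.append(score)
    return state, scores


def selectional_branching(initial, k_best, apply_transition, is_terminal, beam_size):
    pairs = []  # the collection λ
    results = [run_greedy(initial, k_best, apply_transition, is_terminal, collect=pairs)]
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    for state, transition, score in pairs[:beam_size - 1]:  # b - 1 best pairs
        branched = apply_transition(state, transition)
        final, scores = run_greedy(branched, k_best, apply_transition, is_terminal)
        results.append((final, [score] + scores))
    return results


# Toy setup (assumption): a state is the tuple of transitions taken so far;
# only the second decision is low confidence.
def toy_k_best(state):
    if len(state) == 1:
        return [("a", 0.50), ("b", 0.45)]
    return [("a", 0.90)]

def toy_apply(state, transition):
    return state + (transition,)

def toy_terminal(state):
    return len(state) == 3

results = selectional_branching((), toy_k_best, toy_apply, toy_terminal, beam_size=4)
greedy_only = selectional_branching((), toy_k_best, toy_apply, toy_terminal, beam_size=1)
```

In the toy run only two sequences are produced even though the beam size is 4, because only one low-confidence pair was collected.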

9 Low Confidence Transition. Let $C_1$ be a classifier that finds the highest-scoring transition given the parsing state $x$:
$$C_1(x) = \arg\max_{y \in Y} \{ f(x, y) \}, \qquad f(x, y) = \frac{\exp(w \cdot \Phi(x, y))}{\sum_{y' \in Y} \exp(w \cdot \Phi(x, y'))}$$
Let $C_k$ be a classifier that finds the $k$ highest-scoring transitions given the parsing state $x$ and the margin $m$:
$$C_k(x, m) = \operatorname{K\,arg\,max}_{y \in Y} \{ f(x, y) \} \quad \text{s.t.} \quad f(x, C_1(x)) - f(x, y) \le m$$
The highest-scoring transition $C_1(x)$ is low confident if $|C_k(x, m)| > 1$.
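
The definitions on this slide translate fairly directly into code. The sketch below assumes the raw scores w·Φ(x, y) are already available as a dictionary; the function names and the example scores are hypothetical.

```python
import math


def normalized_scores(raw_scores):
    """f(x, y): softmax over the raw scores w·Φ(x, y) of all transitions y."""
    peak = max(raw_scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - peak) for y, s in raw_scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def c_k(raw_scores, k, margin):
    """Up to k best transitions whose score is within `margin` of the best."""
    f = normalized_scores(raw_scores)
    ranked = sorted(f.items(), key=lambda item: item[1], reverse=True)
    best_score = ranked[0][1]
    return [(y, s) for y, s in ranked[:k] if best_score - s <= margin]


def is_low_confidence(raw_scores, k, margin):
    """The one-best transition is low confident if C_k returns more than one."""
    return len(c_k(raw_scores, k, margin)) > 1


# Hypothetical raw scores for one parsing state.
example = {"shift": 2.0, "left_arc": 1.9, "right_arc": -1.0}
```

With these example scores, the two best transitions differ by about 0.05 after normalization, so the state counts as low confidence for a margin of 0.1 but not for a margin of 0.01.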

10 Experiments. Parsing algorithm (Choi & McCallum, 2013): a hybrid between Nivre's arc-eager and list-based algorithms. Projective parsing: O(n). Non-projective parsing: expected linear time. Features: rich non-local features from Zhang & Nivre. For languages with coarse-grained POS tags, feature templates using fine-grained POS tags are replicated. For languages with morphological features, morphologies of σ[0] and β[0] are used as unigram features.

11 Number of Transitions. Number of transitions performed with respect to beam size. [Plot: y-axis Transitions (up to 1,200,000); x-axis Beam size = 1, 2, 4, 8, 16, 32, 64, 80.]

12 Projective Parsing. The benchmark setup using WSJ. [Table: Approach | UAS | LAS | Time. Rows: eight settings with bt = 80 and varying bd, and one setting with bt = 1; numeric values not shown.]

13 Projective Parsing. The benchmark setup using WSJ. [Table: Approach | UAS | LAS | Time. Rows: bt = 80 (bd not shown); Zhang & Clark; Huang & Sagae; Zhang & Nivre; Bohnet & Nivre; McDonald et al.; McDonald & Pereira; Sagae & Lavie; Koo & Collins; Zhang & McDonald; Martins et al.; Rush et al.; numeric values not shown.]

14 Non-projective Parsing. CoNLL-X shared task data. [Table: Approach | Danish (LAS, UAS) | Dutch (LAS, UAS) | Slovene (LAS, UAS) | Swedish (LAS, UAS). Rows: two settings with bt = 80 (bd not shown); Nivre et al.; McDonald et al.; Nivre; F.Gonz. & G.Rodr.; Nivre & McDonald; Martins et al.; numeric values not shown.]

15 SPMRL 2013 Shared Task. Baseline results provided by ClearNLP. [Table: Language | 5K (LAS, UAS, LS) | Full (LAS, UAS, LS). Rows: Arabic, Basque, French, German, Hebrew, Hungarian, Korean, Polish, Swedish; numeric values not shown.]

16 Conclusion. Selectional branching: uses confidence estimates to decide when to employ a beam; shows accuracy comparable to traditional beam search; runs faster than any other non-greedy parsing approach. ClearNLP: provides several NLP tools, including a morphological analyzer, a dependency parser, and a semantic role labeler. Webpage: clearnlp.com.

Transition-based Dependency Parsing with Selectional Branching Jinho D. Choi Department of Computer Science University of Massachusetts Amherst Amherst, MA, 01003, USA jdchoi@cs.umass.edu Andrew McCallum