Vine Pruning for Efficient Multi-Pass Dependency Parsing. Alexander M. Rush and Slav Petrov


1 Vine Pruning for Efficient Multi-Pass Dependency Parsing Alexander M. Rush and Slav Petrov

2 Dependency Parsing

3 Styles of Dependency Parsing
[chart axes: speed, accuracy]
transition-based parsers (Nivre 2004): greedy O(n), k-best O(kn)
graph-based parsers (Eisner 2000) (McDonald 2005): first-order O(n^3), second-order O(n^3), third-order O(n^4)

4 Styles of Dependency Parsing
[same chart, with "this work" marked]

5 Preview: Coarse-to-Fine Cascades cgwire vine first second

6 linear-size dependency representation

7 Representation Heads Modifiers

15 First-Order Feature Calculation

16 First-Order Feature Calculation
[] [VBD] [] [ADP] [] [VERB] [] [IN]
[ VBD] [ ADP] [ ] [VBD ADP] [ VERB] [ IN] [ ] [VERB IN]
[VBD ADP] [ ADP] [ VBD ADP] [ VBD ] [ADJ ADP] [VBD ADP] [VBD ADJ ADP] [VBD ADJ ] [NNS ADP] [NNS VBD ADP] [NNS VBD ] [ADJ ADP NNP] [VBD ADP NNP] [VBD ADJ NNP] [NNS ADP NNP] [NNS VBD NNP]
[ left 5] [VBD left 5] [ left 5] [ADP left 5]
[VERB IN] [ IN] [ VERB IN] [ VERB ] [JJ IN] [VERB IN] [VERB JJ IN] [VERB JJ ] [NOUN IN] [NOUN VERB IN] [NOUN VERB ] [JJ IN NOUN] [VERB IN NOUN] [VERB JJ NOUN] [NOUN IN NOUN] [NOUN VERB NOUN]
[ left 5] [VERB left 5] [ left 5] [IN left 5]
[ VBD ADP] [VBD ADJ ADP] [NNS VBD ADP] [VBD ADJ ADP NNP] [NNS VBD ADP NNP]
[ VBD left 5] [ ADP left 5] [ left 5] [VBD ADP left 5]
[ VERB IN] [VERB JJ IN] [NOUN VERB IN] [VERB JJ IN NOUN] [NOUN VERB IN NOUN]
[ VERB left 5] [ IN left 5] [ left 5] [VERB IN left 5]
[VBD ADP left 5] [ ADP left 5] [ VBD ADP left 5] [ VBD left 5] [ADJ ADP left 5] [VBD ADP left 5] [VBD ADJ ADP left 5] [VBD ADJ left 5] [NNS ADP left 5] [NNS VBD ADP left 5] [NNS VBD left 5] [ADJ ADP NNP left 5] [VBD ADP NNP left 5] [VBD ADJ NNP left 5] [NNS ADP NNP left 5] [NNS VBD NNP left 5]
[VERB IN left 5] [ IN left 5] [ VERB IN left 5] [ VERB left 5] [JJ IN left 5] [VERB IN left 5] [VERB JJ IN left 5] [VERB JJ left 5] [NOUN IN left 5] [NOUN VERB IN left 5]
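The templates above pair the tags around a head and a modifier and then repeat each template with a direction and a length. A minimal Python sketch of that idea follows. The function name `arc_features`, the `<pad>` symbol, the length bucketing, and the particular six templates are assumptions chosen for illustration; this is not the full feature set used in the talk.

```python
from typing import List


def arc_features(tags: List[str], head: int, mod: int, max_len: int = 5) -> List[str]:
    """Return POS-tag feature strings for the arc head -> mod (hypothetical templates)."""
    direction = "left" if mod < head else "right"
    length = min(abs(head - mod), max_len)  # long arcs share one bucket

    def tag(i: int) -> str:
        # Positions outside the sentence get a boundary symbol.
        return tags[i] if 0 <= i < len(tags) else "<pad>"

    base = [
        [tag(head)],                           # head tag
        [tag(mod)],                            # modifier tag
        [tag(head), tag(mod)],                 # head + modifier
        [tag(head), tag(head + 1), tag(mod)],  # plus tag right of the head
        [tag(head - 1), tag(head), tag(mod)],  # plus tag left of the head
        [tag(head), tag(mod), tag(mod + 1)],   # plus tag right of the modifier
    ]
    feats = ["[" + " ".join(t) + "]" for t in base]
    # Conjoin every template with the arc direction and bucketed length.
    feats += [f[:-1] + f" {direction} {length}]" for f in feats]
    return feats


# Hypothetical tag sequence; head is the VBD at index 1, modifier the ADP at index 3.
features = arc_features(["NNS", "VBD", "ADJ", "ADP", "NNP"], head=1, mod=3)
```

Each arc yields twelve strings here: six tag templates, plus the same six conjoined with direction and length.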

17 Arc Length By Part-of-Speech [plot: counts against arc length for NOUN, ADP, DET, VERB, ADJ]

20 Arc Length Examples The bill intends to restrict the RTC to Treasury borrowings only unless the agency receives specific congressional authorization.

21 Arc Length Examples This financing system was created in the new law in order to keep the bailout spending from swelling the budget deficit.

22 Arc Length Examples But the RTC also requires working capital to maintain the bad assets of thrifts that are sold until the assets can be sold separately.

23 Arc Length Examples "It's a problem that clearly has to be resolved," said David Cooke, executive director of the RTC.

24 Arc Length Examples "We would have to wait until we have collected on those assets before we can move forward," he said.

25 Arc Length Examples The complicated language in the huge new law has muddied the fight.

26 Arc Length Examples "That secrecy leads to a proposal like the one from Ways and Means, which seems to me sort of draconian," he said.

27 Arc Length Examples The RTC is going to have to pay a price of prior consultation on the Hill if they want that kind of flexibility.

28 Arc Length Heat Map

30 Banded Matrix

32 Outer Arc
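One way to read the banded matrix and outer arc slides: short arcs are kept individually, and all longer arcs into a word are collapsed into a left or right outer arc. The Python sketch below builds such an index set. The function name `vine_index_set`, the `band` parameter, and the treatment of position 0 as the root are assumptions made for illustration, not details taken from the talk.

```python
from typing import List, Tuple

Arc = Tuple[int, int]    # (head, modifier); position 0 is the root symbol
Outer = Tuple[str, int]  # (side on which the distant head lies, modifier)


def vine_index_set(n: int, band: int) -> Tuple[List[Arc], List[Outer]]:
    """Short arcs inside the band, plus outer arcs standing in for longer ones."""
    short_arcs = [
        (h, m)
        for m in range(1, n + 1)
        for h in range(max(0, m - band), min(n, m + band) + 1)
        if h != m
    ]
    outer_arcs: List[Outer] = []
    for m in range(1, n + 1):
        if m - band > 0:   # some head lies more than `band` positions to the left
            outer_arcs.append(("left", m))
        if m + band < n:   # some head lies more than `band` positions to the right
            outer_arcs.append(("right", m))
    return short_arcs, outer_arcs


short_arcs, outer_arcs = vine_index_set(n=10, band=2)
```

For a fixed band the index set has at most 2 * band short arcs and 2 outer arcs per word, so it grows linearly with sentence length rather than quadratically.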

36 Coarse-to-Fine vine

37 Coarse-to-Fine vine first

38 Coarse-to-Fine cgwire vine first second

39 dynamic programs for parsing

40 Inference Questions
How do we reduce inference time to O(n)?
How do we decide which arcs to prune?
Vine Parsing (Eisner and Smith 2005)

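41 Eisner First-Order Rules
incomplete span (h, m), adding arc h -> m: complete span (h, r) + complete span (r + 1, m)
complete span (h, e): incomplete span (h, m) + complete span (m, e)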
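The two rules can be turned into a small O(n^3) dynamic program. The Python sketch below is a textbook-style version of Eisner's first-order algorithm, not the C++ implementation from the talk. The table names `C` and `I`, the score-matrix layout (`score[h][m]`, index 0 as the root), and the choice to allow several root children are assumptions.

```python
from typing import List, Tuple

NEG = float("-inf")


def eisner(score: List[List[float]]) -> Tuple[float, List[int]]:
    """Best projective tree; score[h][m] is the weight of arc h -> m, 0 is the root."""
    n = len(score)
    # Last index is the direction: 0 = head at the right end, 1 = head at the left end.
    C = [[[0.0 if s == t else NEG] * 2 for t in range(n)] for s in range(n)]
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    bC = [[[0, 0] for _ in range(n)] for _ in range(n)]
    bI = [[[0, 0] for _ in range(n)] for _ in range(n)]

    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            # Incomplete span: two complete halves meet, then an arc joins s and t.
            best, r = max((C[s][q][1] + C[q + 1][t][0], q) for q in range(s, t))
            I[s][t][0], bI[s][t][0] = best + score[t][s], r  # arc t -> s
            I[s][t][1], bI[s][t][1] = best + score[s][t], r  # arc s -> t
            # Complete span: an incomplete span absorbs a complete one.
            C[s][t][0], bC[s][t][0] = max(
                (C[s][q][0] + I[q][t][0], q) for q in range(s, t))
            C[s][t][1], bC[s][t][1] = max(
                (I[s][q][1] + C[q][t][1], q) for q in range(s + 1, t + 1))

    heads = [-1] * n  # heads[0] stays -1 because the root has no head

    def back(s: int, t: int, d: int, complete: bool) -> None:
        if s == t:
            return
        if complete:
            r = bC[s][t][d]
            if d == 0:
                back(s, r, 0, True)
                back(r, t, 0, False)
            else:
                back(s, r, 1, False)
                back(r, t, 1, True)
        else:
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            r = bI[s][t][d]
            back(s, r, 1, True)
            back(r + 1, t, 0, True)

    back(0, n - 1, 1, True)
    return C[0][n - 1][1], heads


# Hypothetical scores for a three-word sentence (index 0 is the root).
scores = [[0.0] * 4 for _ in range(4)]
scores[0][2], scores[2][1], scores[2][3] = 10.0, 9.0, 8.0
best_score, best_heads = eisner(scores)  # 27.0 and [-1, 2, 0, 2]
```

The three nested ranges over width, start, and split point are where the cubic cost comes from, which is the cost that pruning is meant to avoid.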

42 First-Order Parsing

51 Vine Parsing Rules [rule diagrams: items over the prefix span (0, e) are built from (0, e - 1) and (e - 1, e), or from (0, m) and (m, e)]

52 Vine Parsing

64 Arc Pruning
Prune arcs based on max-marginals.
$\mathrm{maxmarginal}(a) = \max_{y : a \in y} (y \cdot w)$
Can compute using inside-outside algorithm.
Generic algorithm using hypergraph parsing.
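To make the definition concrete, the Python sketch below computes max-marginals by brute force over an explicit list of candidate parses, each scored as the sum of its arc weights. This is only an illustration of the definition, since the slide computes max-marginals with inside-outside. The names `parse_score` and `max_marginals` and the toy weights are assumptions.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Arc = Tuple[int, int]  # (head, modifier)


def parse_score(parse: Iterable[Arc], w: Dict[Arc, float]) -> float:
    """Score of a parse: the sum of the weights of its arcs."""
    return sum(w[a] for a in parse)


def max_marginals(parses: Iterable[FrozenSet[Arc]], w: Dict[Arc, float]) -> Dict[Arc, float]:
    """For each arc, the score of the best parse that contains it."""
    mm: Dict[Arc, float] = {}
    for y in parses:
        s = parse_score(y, w)
        for a in y:
            mm[a] = max(mm.get(a, float("-inf")), s)
    return mm


# Hypothetical two-word sentence: its three projective trees and toy arc weights.
w = {(0, 2): 3.0, (2, 1): 2.0, (0, 1): 1.0, (1, 2): 1.0}
parses = [
    frozenset({(0, 2), (2, 1)}),
    frozenset({(0, 1), (1, 2)}),
    frozenset({(0, 1), (0, 2)}),
]
mm = max_marginals(parses, w)
```

Arc (0, 1) appears only in parses that score below the best one, so its max-marginal is lower than that of the arcs in the best parse.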

65 Max-Marginals for First-Order Arcs maxmarginal( ) > threshold?

66 Max-Marginals for Outer Arcs maxmarginal(left ) > threshold?

67 pruning and training

68 Max-Marginal Pruning
goal: Define a threshold on max-marginal score.
Validation parameter α trades off between speed and accuracy.
$t_\alpha(w) = \alpha \max_{y} (y \cdot w) + (1 - \alpha) \frac{1}{|A|} \sum_{a \in A} \mathrm{maxmarginal}(a, w)$
Highest-scoring parse upper-bounds any max-marginal.
Assume the average of max-marginals is lower than gold.
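The threshold interpolates between the best parse score and the average max-marginal. The Python sketch below applies it to a dictionary of max-marginals that is assumed to be already computed. The function names and toy values are assumptions, and keeping arcs whose max-marginal equals the threshold is a choice made here so that α = 1 still keeps the best parse.

```python
from typing import Dict, Set, Tuple

Arc = Tuple[int, int]  # (head, modifier)


def threshold(best_score: float, max_marginals: Dict[Arc, float], alpha: float) -> float:
    """Interpolate between the best parse score and the mean max-marginal."""
    average = sum(max_marginals.values()) / len(max_marginals)
    return alpha * best_score + (1.0 - alpha) * average


def prune(max_marginals: Dict[Arc, float], best_score: float, alpha: float) -> Set[Arc]:
    """Keep the arcs whose max-marginal reaches the threshold."""
    t = threshold(best_score, max_marginals, alpha)
    # ">=" rather than ">" is an assumption: it keeps the best parse when alpha = 1.
    return {a for a, m in max_marginals.items() if m >= t}


# Hypothetical max-marginals for four arcs; the best parse scores 5.0.
toy_mm = {(0, 2): 5.0, (2, 1): 5.0, (0, 1): 4.0, (1, 2): 2.0}
kept = prune(toy_mm, best_score=5.0, alpha=0.5)
```

With α = 0 the threshold is the average max-marginal, and with α = 1 only arcs that occur in a highest-scoring parse survive.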

69 Pruning Threshold [figure: feature space with axes feature one and feature two and weight vector w; successive builds mark the max score, the average max-marginal, and the threshold set by α]

80 Structured Cascade Training (Weiss and Taskar 2011)
Train a linear model with a loss function for pruning.
Regularized risk minimization with loss based on threshold:
$\min_w \lambda \|w\|^2 + \frac{1}{P} \sum_{p=1}^{P} \left[ 1 - y^{(p)} \cdot w + t_\alpha^{(p)}(w) \right]_+$
Can use a simple variant of perceptron/Pegasos to train.
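A minimal Python sketch of evaluating this objective, assuming it has the usual form of a squared-norm regularizer plus the average hinge loss between each gold score and its threshold. The gold scores and thresholds are taken as already computed for each training sentence, and all names and numbers here are hypothetical.

```python
from typing import List, Sequence


def cascade_hinge(gold_score: float, thr: float) -> float:
    """Loss is positive unless the gold parse beats the threshold by a margin of 1."""
    return max(0.0, 1.0 - gold_score + thr)


def cascade_objective(w: Sequence[float], gold_scores: List[float],
                      thresholds: List[float], lam: float) -> float:
    """Regularized risk: lam * ||w||^2 plus the mean hinge loss over the sentences."""
    regularizer = lam * sum(x * x for x in w)
    risk = sum(cascade_hinge(g, t) for g, t in zip(gold_scores, thresholds))
    return regularizer + risk / len(gold_scores)


# Hypothetical values for two training sentences.
objective = cascade_objective(
    w=[1.0, 2.0], gold_scores=[5.0, 3.0], thresholds=[4.5, 1.0], lam=0.1)
```

The first sentence contributes a loss of 0.5 because its gold score clears the threshold by less than the margin; the second clears it comfortably and contributes nothing.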

81 Structured Cascade Training [figure: feature space with axes feature one and feature two and weight vector w; successive builds mark the gold parse, the max score, and the average max-marginal]

92 experiments

93 Implementation
Inference: Experiments use a highly optimized C++ implementation. Baseline first-order parser processes 2000 tokens/sec. Hypergraph parsing framework with shared inference.
Model: Final models trained with Hamming-loss MIRA. Full collection of dependency parsing features (Koo 2010). First-, second-, and third-order models match state-of-the-art.

94 Baselines
NoPrune: exhaustive parsing model with no pruning
LocalShort: unstructured classifier over O(n) short arcs (Bergsma and Cherry 2010)
Local: unstructured classifier over O(n^2) arcs (Bergsma and Cherry 2010)
FirstOnly: structured first-order model in cascade (Koo 2010)
VinePosterior: posterior pruning cascade trained with L-BFGS
ZhangNivre: reimplementation of state-of-the-art k-best transition-based parser (Zhang and Nivre 2011)

95 Speed/Accuracy Experiments: First-Order Parsing [chart: relative speed and accuracy for NoPrune, Local, FirstOnly, VinePosterior, VineCascade, ZhangNivre(8)]

96 Speed/Accuracy Experiments: Second-Order Parsing [chart: relative speed and accuracy for NoPrune, Local, FirstOnly, VinePosterior, VineCascade, ZhangNivre(16)]

97 Speed/Accuracy Experiments: Third-Order Parsing [chart: relative speed and accuracy for NoPrune, Local, FirstOnly, VinePosterior, VineCascade, ZhangNivre(64)]

98 Empirical Complexity: First-Order Parsing [plot: time against sentence length; NoPrune [2.8], VineCascade [1.4]]

99 Empirical Complexity: Second-Order Parsing [plot: time against sentence length; NoPrune [2.8], VineCascade [1.8]]

100 Empirical Complexity: Third-Order Parsing [plot: time against sentence length; NoPrune [3.8], VineCascade [1.9]]
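If the bracketed numbers are read as exponents k of a power-law fit time ≈ c · n^k (an assumption; the slides do not say how they were obtained), such an exponent can be estimated as the slope of a least-squares line in log-log space. The Python sketch below does this on synthetic timings. The function name `fit_exponent` and the data are hypothetical, not measurements from the talk.

```python
import math
from typing import Sequence


def fit_exponent(lengths: Sequence[float], times: Sequence[float]) -> float:
    """Slope of the least-squares line through (log n, log time)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic timings generated from an exact n^2.8 law, for illustration only.
lengths = [10, 20, 40, 80]
times = [1e-6 * n ** 2.8 for n in lengths]
exponent = fit_exponent(lengths, times)  # close to 2.8
```

On real timings the fit would be noisier, and the exponent depends on the range of sentence lengths included.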

101 Multilingual Experiments: First-Order Parsing [chart: relative speed of NoPrune and VineCascade for En, Bg, De, Pt, Sw, Zh]

102 Multilingual Experiments: Second-Order Parsing [chart: relative speed of NoPrune and VineCascade for En, Bg, De, Pt, Sw, Zh]

103 Multilingual Experiments: Third-Order Parsing [chart: relative speed of NoPrune and VineCascade for En, Bg, De, Pt, Sw, Zh]

104 Special thanks to: Ryan McDonald, Hao Zhang, Michael Ringgaard, Terry Koo, Keith Hall, Kuzman Ganchev, Yoav Goldberg, Andre Martins, and the rest of the Google NLP team
