Improving Neural Parsing by Disentangling Model Combination and Reranking Effects. Daniel Fried*, Mitchell Stern* and Dan Klein UC Berkeley

1 Improving Neural Parsing by Disentangling Model Combination and Reranking Effects Daniel Fried*, Mitchell Stern* and Dan Klein UC Berkeley

2 Top-down generative models

3 Top-down generative models S NP VP The man had an idea.

4 Top-down generative models S NP VP The man had an idea. (S

5 Top-down generative models S NP VP The man had an idea. (S (NP

6 Top-down generative models S NP VP The man had an idea. (S (NP The

7 Top-down generative models S NP VP The man had an idea. (S (NP The man

8 Top-down generative models S NP VP The man had an idea. (S (NP The man )

9 Top-down generative models S NP VP The man had an idea. (S (NP The man ) (VP

10 Top-down generative models S NP VP The man had an idea. (S (NP The man ) (VP had (NP an idea ) ) . )

11 Top-down generative models S NP VP The man had an idea. (S (NP The man ) (VP had (NP an idea ) ) . ) G LSTM [Parsing as Language Modeling, Choe and Charniak, 2016]

12 Top-down generative models S NP VP The man had an idea. (S (NP The man ) (VP had (NP an idea ) ) . ) G RNNG [Recurrent Neural Network Grammars, Dyer et al. 2016]
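
As an aside on how such a sequence is produced: a constituency tree can be flattened into exactly the symbol stream shown above, which is what lets a sequence model such as G LSTM score whole parses left to right. A minimal sketch in Python (illustrative only, not the authors' code), assuming trees are given as nested (label, children) structures:

    # Minimal sketch (not the authors' implementation): linearize a constituency
    # tree into the top-down symbol sequence that a generative sequence model
    # can score one symbol at a time.

    def linearize(tree):
        """tree is either a word (str) or a (label, children) pair."""
        if isinstance(tree, str):          # terminal: emit the word itself
            return [tree]
        label, children = tree
        symbols = ["(" + label]            # opening nonterminal symbol, e.g. "(NP"
        for child in children:
            symbols.extend(linearize(child))
        symbols.append(")")                # matching close bracket
        return symbols

    example = ("S",
               [("NP", ["The", "man"]),
                ("VP", ["had", ("NP", ["an", "idea"])]),
                "."])
    print(" ".join(linearize(example)))
    # (S (NP The man ) (VP had (NP an idea ) ) . )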

13 Generative models as rerankers

14 Generative models as rerankers base parser B generative neural model G

15 Generative models as rerankers base parser B generative neural model G [candidate parse trees for The man had an idea.] y ~ p_B(y | x)

16 Generative models as rerankers base parser B generative neural model G [candidate parse trees for The man had an idea.] y ~ p_B(y | x) argmax_y p_G(x, y)
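
In code, the reranking recipe sketched here is simply: collect candidate parses from the base parser B, rescore each with the generative model G, and return the argmax. A minimal sketch (illustrative; candidates_B and score_G are hypothetical stand-ins for the two models):

    # Minimal reranking sketch (illustrative, not the authors' implementation).
    # candidates_B(sentence) -> list of candidate trees y proposed by the base parser B
    # score_G(sentence, y)   -> log p_G(x, y) under the generative neural model G

    def rerank(sentence, candidates_B, score_G):
        candidates = candidates_B(sentence)    # y ~ p_B(y | x), e.g. an n-best list
        return max(candidates, key=lambda y: score_G(sentence, y))   # argmax_y p_G(x, y)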

17 Generative models as rerankers base parser B generative neural model G

18 Generative models as rerankers base parser B generative neural model G F1 on Penn Treebank

19 Generative models as rerankers base parser B generative neural model G F1 on Penn Treebank Choe and Charniak 2016: Charniak parser, reranked with LSTM language model (G LSTM): 92.6

20 Generative models as rerankers base parser B generative neural model G F1 on Penn Treebank Choe and Charniak 2016: Charniak parser, reranked with LSTM language model (G LSTM): 92.6 Dyer et al. 2016: RNNG-discriminative (91.7), reranked with RNNG-generative (G RNNG): 93.3

21 B: Necessary evil, or secret sauce? base parser B generative neural model G

22 B: Necessary evil, or secret sauce? base parser B generative neural model G Should we try to do away with B?

23 B: Necessary evil, or secret sauce? base parser B generative neural model G Should we try to do away with B? No, better to combine B and G more explicitly

24 B: Necessary evil, or secret sauce? base parser B generative neural model G Should we try to do away with B? No, better to combine B and G more explicitly 93.9 F1 on PTB; 94.7 semi-supervised

25-31 Using standard beam search for G: True Parse (S (NP The man ... Beam: competing hypotheses such as (S ..., (VP ..., (PP ... fill the beam with open-nonterminal symbols before any words are generated. [chart: F1 vs. beam size for G RNNG and G LSTM]

32 Standard beam search in G fails: word generation is lexicalized. [chart: log probability of each symbol in (S (NP The man ) (VP had (NP an idea ) ) . )]

33-35 Word-synchronous beam search: hypotheses are grouped by the number of words generated (w 0, w 1, w 2, ...), so partial parses that have emitted the same words, e.g. The, The man, compete within the same beam regardless of how many structural symbols they contain. [Roark 2001; Titov and Henderson 2010; Charniak 2010; Buys and Blunsom 2015]
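
A minimal sketch of this kind of word-synchronous search (illustrative only; the model interface and pruning parameters are assumptions, not the authors' exact implementation):

    # Word-synchronous beam search sketch (illustrative).
    # model.step(prefix) -> list of (symbol, logprob) continuations of a partial
    # linearized parse; a symbol is a structural token such as "(NP" or ")" or the
    # next word of the sentence (assume decoding is constrained so the only word
    # the model can emit is the correct next word of the input).

    def word_sync_beam_search(model, sentence, word_beam=10, struct_beam=100):
        beam = [(0.0, [])]                        # (log prob, partial parse); 0 words emitted
        for word in sentence:
            advanced = []                         # hypotheses that have just emitted `word`
            frontier = beam
            while frontier and len(advanced) < word_beam:
                expansions = []
                for logp, prefix in frontier:
                    for sym, lp in model.step(prefix):
                        hyp = (logp + lp, prefix + [sym])
                        if sym == word:
                            advanced.append(hyp)  # synchronize: this hypothesis reached the next word
                        else:
                            expansions.append(hyp)
                # keep only the best purely structural expansions and keep searching
                frontier = sorted(expansions, key=lambda h: h[0], reverse=True)[:struct_beam]
            beam = sorted(advanced, key=lambda h: h[0], reverse=True)[:word_beam]
        # closing any remaining brackets after the last word is omitted in this sketch
        return max(beam, key=lambda h: h[0]) if beam else None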

36-37 Word-synchronous beam search [charts: F1 on PTB vs. beam size for G LSTM and G RNNG]

38 Finding model combination effects B G

39 Finding model combination effects B G [candidate parse trees from B for The man had an idea.]

40 Finding model combination effects Add G's search proposals to the candidate list: B G [candidate parse trees from B]

41 Finding model combination effects Add G's search proposals to the candidate list: B G [candidate parse trees from B]

42-43 Finding model combination effects Add G's search proposals to the candidate list: B G [candidate parse trees from B, plus parse trees found by word-synchronous search in G]
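
Under the same hypothetical interfaces as the earlier sketches, adding G's search proposals amounts to scoring the union of B's candidates and G's own proposals with G, rather than B's candidates alone:

    # Combined candidate list sketch (illustrative; interfaces are assumed).
    # candidates_B(sentence) -> trees proposed by the base parser B
    # search_G(sentence)     -> trees proposed by word-synchronous search in G
    # score_G(sentence, y)   -> log p_G(x, y)

    def rerank_with_G_proposals(sentence, candidates_B, search_G, score_G):
        pool = list(candidates_B(sentence)) + list(search_G(sentence))
        return max(pool, key=lambda y: score_G(sentence, y))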

44-45 Finding model combination effects [charts: F1 on PTB for the RNNG Generative Model (G RNNG) and the LSTM Generative Model (G LSTM), reranking B's candidates alone vs. adding G's search proposals]

46 Reranking shows implicit model combination: B hides model errors in G

47 Making model combination explicit Can we do better by simply combining model scores? B G log p_G(x, y)

48 Making model combination explicit Can we do better by simply combining model scores? G + B log p_G(x, y) + log p_B(y | x)

49 Making model combination explicit Can we do better by simply combining model scores? G + B λ log p_G(x, y) + (1 - λ) log p_B(y | x)
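
This interpolated score drops straight into the reranking loop, with λ chosen on a development set. A minimal sketch (interfaces and the λ grid are assumptions, not the paper's exact setup):

    # Explicit score combination sketch (illustrative; model interfaces are assumed).
    # score_B(sentence, y) -> log p_B(y | x), score_G(sentence, y) -> log p_G(x, y)

    def combined_score(sentence, y, score_B, score_G, lam):
        return lam * score_G(sentence, y) + (1.0 - lam) * score_B(sentence, y)

    def rerank_combined(sentence, candidates, score_B, score_G, lam=0.5):
        return max(candidates, key=lambda y: combined_score(sentence, y, score_B, score_G, lam))

    def tune_lambda(dev_sentences, candidate_lists, gold_trees, score_B, score_G, f1):
        # pick lambda by grid search on a development set (the grid is an assumption)
        grid = [i / 10 for i in range(11)]
        def dev_f1(lam):
            picks = [rerank_combined(s, cands, score_B, score_G, lam)
                     for s, cands in zip(dev_sentences, candidate_lists)]
            return f1(picks, gold_trees)
        return max(grid, key=dev_f1)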

50-51 Making model combination explicit: score with G vs. score with G + B [charts: F1 on PTB for the RNNG Generative Model (G = G RNNG) and the LSTM Generative Model (G = G LSTM)]

52 Explicit score combination prevents errors [diagram: G + B (fast), G + B (best)]

53 Comparison to past work F1 on PTB

54 Comparison to past work F1 on PTB 92.6 Choe & Charniak 2016

55 Comparison to past work F1 on PTB Choe & Charniak 2016 Dyer et al. 2016

56 Comparison to past work F1 on PTB Choe & Charniak 2016 Dyer et al. 2016 Kuncoro et al. 2017

57 Comparison to past work F1 on PTB G RNNG G RNNG Choe & Charniak 2016 Dyer et al. 2016 Kuncoro et al. 2017 Ours

58 Comparison to past work F1 on PTB add G LSTM 93.5 G RNNG G RNNG Choe & Charniak 2016 Dyer et al. 2016 Kuncoro et al. 2017 Ours

59 Comparison to past work F1 on PTB 93.8 add silver data add G LSTM 93.5 G RNNG G RNNG Choe & Charniak 2016 Dyer et al. 2016 Kuncoro et al. 2017 Ours

60 Comparison to past work F1 on PTB 94.7 add silver data 93.8 add silver data add G LSTM 93.5 G RNNG G RNNG Choe & Charniak 2016 Dyer et al. 2016 Kuncoro et al. 2017 Ours

61 Conclusion Search procedure for G

62 Conclusion Search procedure for G (more effective version forthcoming: Stern et al., EMNLP 2017)

63 Conclusion Search procedure for G (more effective version forthcoming: Stern et al., EMNLP 2017) Found model combination effects in G

64 Conclusion Search procedure for G (more effective version forthcoming: Stern et al., EMNLP 2017) Found model combination effects in G Large improvements from simple, explicit score combination: G + B

65 Thanks!
