Improving Neural Parsing by Disentangling Model Combination and Reranking Effects
Daniel Fried*, Mitchell Stern*, and Dan Klein
UC Berkeley
Top-down generative models

[Figure: the parse tree for "The man had an idea." is generated top-down, left to right, as the linearized bracket sequence (S (NP The man) (VP had (NP an idea)) . ), one symbol at a time.]

G_LSTM: an LSTM language model over linearized trees [Parsing as Language Modeling, Choe and Charniak, 2016]
G_RNNG: Recurrent Neural Network Grammars [Dyer et al., 2016]
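To make the setup concrete, here is a minimal sketch (not the authors' implementation) of how such a generative model assigns a joint score to a sentence and tree: the tree is treated as one long symbol sequence and scored like any other string under the language model. The model.next_token_logprobs interface is a hypothetical stand-in for the neural model.

    def score_tree(model, tree_tokens):
        """Sum log p(symbol | prefix) over the linearized tree, i.e. log p_G(x, y).

        tree_tokens is the bracket sequence, e.g.
        ["(S", "(NP", "The", "man", ")", "(VP", "had", "(NP", "an", "idea", ")", ")", ".", ")"]
        """
        logp = 0.0
        prefix = ["<s>"]
        for tok in tree_tokens:
            # Hypothetical interface: a dict of log probabilities over the next symbol.
            logp += model.next_token_logprobs(prefix)[tok]
            prefix.append(tok)
        return logp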
Generative models as rerankers

The base parser B proposes a list of candidate parses, y ~ p_B(y | x); the generative neural model G then rescores the candidates and returns argmax_y p_G(x, y).

[Figure: several candidate trees for "The man had an idea." proposed by B, including implausible analyses (e.g. an S-INV root or a spurious ADJP), rescored by G.]
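A minimal sketch of this pipeline, assuming a hypothetical k-best interface on the base parser and reusing the score_tree helper above:

    def rerank(base_parser, generative_model, sentence, k=50):
        """Propose k candidates with the base parser B, return the one G scores highest."""
        candidates = base_parser.kbest(sentence, k)   # hypothetical API for y ~ p_B(y | x)
        return max(candidates, key=lambda y: score_tree(generative_model, y))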
Generative models as rerankers base parser generative neural model G
Generative models as rerankers base parser generative neural model G F1 on Penn Tree ank
Generative models as rerankers base parser generative neural model G F1 on Penn Tree ank Choe and Charniak 2016 89.7 Charniak parser 92.6 LSTM language model (G LSTM )
Generative models as rerankers base parser generative neural model G F1 on Penn Tree ank Choe and Charniak 2016 Dyer et al. 2016 89.7 Charniak parser 91.7 RNNG-discriminative 92.6 LSTM language model (G LSTM ) 93.3 RNNG-generative (G RNNG )
B: necessary evil, or secret sauce?

Should we try to do away with the base parser B? No: it is better to combine B and G more explicitly, reaching 93.9 F1 on the PTB and 94.7 semi-supervised.
Using standard beam search for G

[Figure: under standard action-synchronous beam search, the beam fills with competing structural actions such as (S, (NP, (VP, (PP, ..., and the prefix of the true parse, (S (NP The man, falls off the beam before any words are generated.]

With beam size 100, G_RNNG reaches only 29.1 F1 and G_LSTM only 27.4 F1.

Standard beam search in G fails because word generation is lexicalized: producing a word is far less probable than opening yet another constituent, so lexical actions lose out to structural ones.

[Figure: log probability of each action while generating (S (NP The man) (VP had (NP an idea)) . ); the word-generation steps are the low-probability ones.]
Word-synchronous beam search [Roark 2001; Titov and Henderson 2010; Charniak 2010; Buys and Blunsom 2015]

Group beam items by the number of words they have generated (w_0, w_1, w_2, ...): items only compete against other items that have consumed the same sentence prefix, so structural actions can no longer crowd word generation off the beam.
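A minimal sketch of the idea under assumed interfaces (model.successors, returning valid next actions with their log probabilities, is hypothetical, and finishing the tree after the last word is omitted):

    import heapq

    def word_sync_beam_search(model, sentence, beam_size=100, word_beam_size=10, max_struct=40):
        """Beam items are synchronized at word boundaries rather than per action."""
        beam = [(0.0, [])]                     # (log score, action sequence), 0 words generated
        for word in sentence:
            next_word_beam = []                # items that have just generated `word`
            frontier = beam
            for _ in range(max_struct):        # bound on structural actions between words
                if not frontier:
                    break
                candidates = []
                for score, actions in frontier:
                    for action, logp in model.successors(actions, sentence):
                        item = (score + logp, actions + [action])
                        if action == word:
                            next_word_beam.append(item)   # synchronize: the word was consumed
                        else:
                            candidates.append(item)       # structural action: keep expanding
                frontier = heapq.nlargest(beam_size, candidates, key=lambda it: it[0])
            beam = heapq.nlargest(word_beam_size, next_word_beam, key=lambda it: it[0])
        return max(beam, key=lambda it: it[0])   # best (score, actions) after the final word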
[Figure: F1 on the PTB (70 to 95) against beam size (100 to 1000) under word-synchronous search, with one curve for G_LSTM and one for G_RNNG.]
Finding model combination effects

Add G's own search proposals (from word-synchronous beam search) to the candidate list alongside B's candidates, and rerank the combined list with G.

[Figure: B's candidate trees for "The man had an idea.", including implausible analyses such as an S-INV root, together with G's own proposal.]
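In code, this experiment only enlarges the candidate pool before reranking; a sketch reusing the hypothetical helpers above:

    def rerank_union(base_parser, generative_model, sentence, k=50):
        """Rerank B's candidates plus G's own proposal, still scoring with G alone."""
        _, g_proposal = word_sync_beam_search(generative_model, sentence)
        candidates = base_parser.kbest(sentence, k) + [g_proposal]   # the B ∪ G candidate list
        return max(candidates, key=lambda y: score_tree(generative_model, y))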
[Bar chart: F1 on the PTB when scoring with G alone, comparing candidates from B against candidates from B ∪ G, for the RNNG generative model (G_RNNG) and the LSTM generative model (G_LSTM); values shown: 93.7, 93.5, 93.5, 92.8. Adding G's own proposals lowers F1 for both models.]
Reranking shows implicit model combination: B's candidate list hides model errors in G.
Making model combination explicit

Can we do better by simply combining the two models' scores? Rerank candidates with a weighted sum of log probabilities:

  λ log p_G(x, y) + (1 − λ) log p_B(y | x)
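A one-line sketch of the combined score, with λ treated as a hyperparameter to be tuned on development data and base_parser.tree_logprob as an assumed interface:

    def combined_score(base_parser, generative_model, sentence, tree, lam=0.5):
        """lam * log p_G(x, y) + (1 - lam) * log p_B(y | x)."""
        return (lam * score_tree(generative_model, tree)
                + (1.0 - lam) * base_parser.tree_logprob(sentence, tree))   # hypothetical B API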
Score with B + G:

[Bar chart: adding the explicit B + G score combination (values shown: 93.9, 93.9, 94.0, 94.2) improves over scoring with G alone (93.7, 93.5, 93.5, 92.8) for both the RNNG generative model (G = G_RNNG) and the LSTM generative model (G = G_LSTM).]
Explicit score combination prevents errors: scoring with B + G, the candidate list from B alone is fast, and the candidate list from B ∪ G is best.
Comparison to past work (F1 on the PTB):
  Choe & Charniak 2016: 92.6 (93.8 with silver data)
  Dyer et al. 2016: 93.3
  Kuncoro et al. 2017: 93.6
  Ours: B + G_RNNG 93.5; adding G_LSTM 93.9; adding silver data 94.7
Conclusion

A search procedure for G (a more effective version forthcoming: Stern et al., EMNLP 2017)
Found model combination effects hidden inside reranking with G
Large improvements from simple, explicit score combination: B + G
Thanks!