Improving Neural Parsing by Disentangling Model Combination and Reranking Effects. Daniel Fried*, Mitchell Stern* and Dan Klein UC Berkeley


Top-down generative models

The sentence "The man had an idea." with tree S → NP VP is generated top-down as the symbol sequence

(S (NP The man ) (VP had (NP an idea ) ) . )

G_LSTM: score the linearized tree with an LSTM language model [Parsing as Language Modeling, Choe and Charniak, 2016]
G_RNNG: generate it with a recurrent neural network grammar [Recurrent Neural Network Grammars, Dyer et al. 2016]
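
A minimal sketch (assumed tree representation, not the authors' code) of the linearization these models score: a bracketed tree is flattened into the symbol sequence above, which G_LSTM can then treat as an ordinary sentence for a language model.

```python
# Minimal sketch: linearize a (label, children) constituency tree into the
# top-down symbol sequence scored by a sequential generative model.

def linearize(tree):
    """Flatten a (label, children) tree into a top-down symbol sequence."""
    label, children = tree
    tokens = ["(" + label]
    for child in children:
        if isinstance(child, str):      # terminal word
            tokens.append(child)
        else:                           # nonterminal subtree
            tokens.extend(linearize(child))
    tokens.append(")")
    return tokens

# "The man had an idea."
tree = ("S", [("NP", ["The", "man"]),
              ("VP", ["had", ("NP", ["an", "idea"])]),
              "."])
print(" ".join(linearize(tree)))
# (S (NP The man ) (VP had (NP an idea ) ) . )
```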

Generative models as rerankers

Pipeline: a base parser B proposes candidate parses y ~ p_B(y | x) for the sentence (for "The man had an idea.", e.g. an S-INV analysis, an analysis with an ADJP, and the correct S → NP VP parse); the generative neural model G then rescores the candidates and returns argmax_y p_G(x, y).

F1 on Penn Treebank:
Choe and Charniak 2016: Charniak parser 89.7 → reranked with LSTM language model (G_LSTM) 92.6
Dyer et al. 2016: RNNG-discriminative 91.7 → reranked with RNNG-generative (G_RNNG) 93.3
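
A compact sketch of this reranking step, with hypothetical stand-ins for the two models (`candidates` for the base parser's proposals and `log_p_g` for the generative model's joint score):

```python
# Hypothetical sketch of reranking candidates from the base parser B with a
# generative model G; `candidates` and `log_p_g` stand in for real models.

def rerank(sentence, candidates, log_p_g):
    """Return the candidate parse that the generative model scores highest.

    candidates: list of parses y proposed by the base parser, y ~ p_B(y | x)
    log_p_g:    function giving log p_G(x, y) for the joint generative model
    """
    return max(candidates, key=lambda y: log_p_g(sentence, y))
```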

B: Necessary evil, or secret sauce?

Should we try to do away with the base parser B and search with G alone? No: it is better to combine B and G more explicitly, reaching 93.9 F1 on the PTB (94.7 semi-supervised).

Using standard beam search for G

True parse prefix: (S (NP The man ... After a few steps the beam is filled with competing structural prefixes — (S, (NP, (VP, (PP, (S (NP, (VP (NP, ... — and only much later does any hypothesis generate the first word "The".

Results with beam size 100: G_RNNG 29.1 F1, G_LSTM 27.4 F1.

Standard beam search in G fails: word generation is lexicalized. [Plot: log probability of each symbol in the sequence (S (NP The man ) (VP had (NP an idea ) ) . ); y-axis from 0 to -5.]
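
A toy illustration (made-up probabilities, not measured ones) of why this biases an action-synchronous beam: structural actions are cheap while generating a specific word from a large vocabulary is expensive, so prefixes that delay word generation look better at the same step count.

```python
# Toy numbers only: structure actions vs. lexicalized word generation.
from math import log

open_nt = log(0.5)      # e.g. opening a nonterminal
gen_word = log(0.001)   # generating a specific word from a large vocabulary

lazy = 6 * open_nt                                  # six structural actions, no words
eager = 2 * open_nt + gen_word + 3 * open_nt        # same length, one word generated

print(lazy > eager)  # True: the beam keeps the prefix that delays word generation
```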

Word-synchronous beam search

Group hypotheses by the number of words they have generated: one beam for prefixes with zero words (w_0: (S, (S (NP, (VP, (PP, ...), one for prefixes that have generated "The" (w_1), one for prefixes that have generated "The man" (w_2), and so on, so that prefixes only compete against others covering the same words. [Roark 2001; Titov and Henderson 2010; Charniak 2010; Buys and Blunsom 2015]
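
A minimal illustrative sketch of the idea (not the paper's implementation; `next_actions`, `is_word`, and `step_logprob` are hypothetical interfaces to a generative parser G):

```python
import heapq

def word_sync_beam_search(sentence, next_actions, is_word, step_logprob,
                          beam_size=100, max_struct_actions=8):
    """Illustrative word-synchronous beam search over a generative parser G.

    Hypotheses are (score, action_prefix) pairs and are only compared against
    hypotheses that have generated the same number of words. Simplified: the
    closing actions after the last word and the word-level fast track used in
    practice are omitted.
    """
    beam = [(0.0, ())]
    for word in sentence:
        synced = []                           # hypotheses that just emitted `word`
        frontier = beam
        for _ in range(max_struct_actions):   # bounded structural expansion
            expanded = []
            for score, prefix in frontier:
                for action in next_actions(prefix, word):
                    cand = (score + step_logprob(prefix, action),
                            prefix + (action,))
                    (synced if is_word(action) else expanded).append(cand)
            frontier = heapq.nlargest(beam_size, expanded)
            if not frontier:
                break
        beam = heapq.nlargest(beam_size, synced)
    return max(beam)                          # best-scoring synchronized prefix
```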

[Plot: F1 on the PTB (y-axis 70–95) against beam size (100–1000) for decoding with G alone using word-synchronous beam search, one curve for G_LSTM and one for G_RNNG; a second build overlays the corresponding B → G_LSTM and B → G_RNNG reranking results for comparison.]

Finding model combination effects

Add G's own search proposals (from word-synchronous beam search) to the candidate list alongside B's candidates, and rerank the combined pool with G.

F1 on PTB: reranking B's candidates alone gives 93.7 with the RNNG generative model (G_RNNG) and 93.5 with the LSTM generative model (G_LSTM); adding G's own proposals to the pool drops these to 93.5 and 92.8.
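
A small sketch of this candidate-augmentation comparison (the helpers `b_candidates`, `g_search`, `log_p_g`, and `f1` are hypothetical stand-ins for real components):

```python
# Compare reranking over B's candidates vs. the union of B's and G's candidates.

def compare_candidate_pools(sentences, gold, b_candidates, g_search, log_p_g, f1):
    rerank = lambda x, pool: max(pool, key=lambda y: log_p_g(x, y))
    from_b    = [rerank(x, b_candidates(x)) for x in sentences]
    from_both = [rerank(x, b_candidates(x) + g_search(x)) for x in sentences]
    # In the experiments above, the second pool scores lower when ranked by G alone.
    return f1(from_b, gold), f1(from_both, gold)
```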

Reranking shows implicit model combination: the B → G pipeline hides model errors in G.

Making model combination explicit

Can we do better by simply combining model scores? Replace the reranking score log p_G(x, y) with the interpolation

λ log p_G(x, y) + (1 − λ) log p_B(y | x)
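
A small sketch of this interpolation, assuming hypothetical scoring functions `log_p_g` and `log_p_b` and a weight λ that would be tuned on the development set:

```python
# Sketch of explicit score combination; log_p_g, log_p_b, and lam are
# hypothetical stand-ins (lam would be tuned on held-out data).

def combined_rerank(sentence, candidates, log_p_g, log_p_b, lam=0.5):
    def score(y):
        return lam * log_p_g(sentence, y) + (1.0 - lam) * log_p_b(y, sentence)
    return max(candidates, key=score)
```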

Making model combination explicit: scoring the same candidate pools with B + G instead of G alone improves every setting, from 92.8–93.7 F1 to 93.9–94.2 F1 on the PTB (for both G = G_RNNG and G = G_LSTM).

Explicit score combination prevents errors: [example comparing the parse chosen by G alone with the one chosen by B + G].

Comparison to past work (F1 on PTB)

Choe & Charniak 2016: 92.6 (93.8 with silver data)
Dyer et al. 2016: 93.3
Kuncoro et al. 2017: 93.6
Ours: 93.5 with B → G_RNNG; 93.9 after adding G_LSTM to the B + G_RNNG combination; 94.7 after also adding silver data.

Conclusion

A search procedure for decoding with G (a more effective version forthcoming: Stern et al., EMNLP 2017). We found model combination effects hidden inside the B → G reranking pipeline, and obtained large improvements from simple, explicit score combination: B + G.

Thanks!