Multilevel Coarse-to-Fine PCFG Parsing


Multilevel Coarse-to-Fine PCFG Parsing
Eugene Charniak, Mark Johnson, Micha Elsner, Joseph Austerweil, David Ellis, Isaac Haxton, Catherine Hill, Shrivaths Iyengar, Jeremy Moore, Michael Pozar, and Theresa Vu
Brown Laboratory for Linguistic Information Processing (BLLIP)

Statistical Parsing Speed

Lexicalized statistical parsing can be slow: the Charniak parser takes about 0.7 seconds per sentence. Real applications demand more speed:
- large corpora, e.g. NANTC (McClosky, Charniak and Johnson 2006);
- more words to consider, such as lattices from speech recognition (Hall and Johnson 2004);
- a costly second stage, such as question answering.

Bottom-up Parsing I

[Chart figure: a CKY chart over "(NNP Ms.) (NNP Haag) (VBZ plays) (NNP Elianti)", indexed by beginning word and constituent length (POS, 1 wd, 2 wds, 3 wds, 4 wds), with cells containing labels such as NP, VP, S, and S1.]

Standard probabilistic CKY chart parsing: each cell holds the constituents with a given beginning word and length. For example, the constituent (VP (VBZ plays) (NP (NNP Elianti))) spans the last two words. The parser computes the inside probability β for each constituent.
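
To make the chart computation concrete, here is a minimal CKY inside-probability sketch in Python. The toy grammar, rule format, and names are illustrative assumptions, not the paper's grammar or implementation.

# Minimal CKY inside pass for a binarized PCFG (toy grammar, for illustration only).
from collections import defaultdict

lexical = {("NNP", "Ms."): 1.0, ("NNP", "Haag"): 1.0,
           ("VBZ", "plays"): 1.0, ("NNP", "Elianti"): 1.0}
unary = {("NP", ("NNP",)): 0.7, ("VP", ("VBZ",)): 0.1, ("S1", ("S",)): 1.0}
binary = {("NP", ("NNP", "NNP")): 0.3, ("VP", ("VBZ", "NP")): 0.9,
          ("S", ("NP", "VP")): 1.0}

def cky_inside(words):
    n = len(words)
    beta = defaultdict(float)   # beta[(i, j, A)] = inside prob. of A over words[i:j]
    for i, w in enumerate(words):
        for (tag, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1, tag)] += p
        for (parent, (child,)), p in unary.items():      # one round of unaries
            beta[(i, i + 1, parent)] += p * beta[(i, i + 1, child)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):                    # split point
                for (parent, (l, r)), p in binary.items():
                    beta[(i, j, parent)] += p * beta[(i, k, l)] * beta[(k, j, r)]
            for (parent, (child,)), p in unary.items():  # one round of unaries
                beta[(i, j, parent)] += p * beta[(i, j, child)]
    return beta

beta = cky_inside(["Ms.", "Haag", "plays", "Elianti"])
print(beta[(0, 4, "S1")])   # inside probability of the whole sentence under S1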

Bottom-up Parsing II

[Chart figure: the same CKY chart for "(NNP Ms.) (NNP Haag) (VBZ plays) (NNP Elianti)", with the gold constituents (NP, VP, S, S1) highlighted.]

Some constituents are gold constituents (parts of the correct parse). These may not be part of the highest-probability (Viterbi) parse. We can use a reranker to try to pick them out later on.

Pruning

We want to dispose of the incorrect constituents and retain the gold ones. Initial idea: prune constituents with low probability, roughly the outside probability α times the inside probability β, normalized by the sentence probability:

p(n^k_{i,j} | s) = α(n^k_{i,j}) β(n^k_{i,j}) / p(s)

where n^k_{i,j} is a constituent with label k spanning words i through j.

[Chart figure: the CKY chart for "(NNP Ms.) (NNP Haag) (VBZ plays) (NNP Elianti)", with low-probability constituents marked for pruning.]
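
Given inside and outside tables, the pruning step itself is a simple threshold test. A minimal sketch, assuming alpha/beta tables keyed by (i, j, label) as in the CKY sketch above; the threshold value is illustrative, not the paper's.

# Keep only chart entries whose posterior alpha*beta / p(sentence) clears a threshold.
def prune_chart(alpha, beta, sentence_prob, threshold=1e-4):
    kept = set()
    for (i, j, label), inside in beta.items():
        posterior = alpha.get((i, j, label), 0.0) * inside / sentence_prob
        if posterior >= threshold:
            kept.add((i, j, label))
    return kept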

Outside Probabilities

We need the full parse of the sentence to get the outside probability α, which estimates how well the constituent contributes to spanning parses of the sentence.

[Figure: two schematic parses showing the outside contexts α_1 and α_0 of a constituent under S1.]

Caraballo and Charniak (1998), in their agenda-reordering method, found that proper pruning needs an approximation of α; they approximated it using n-grams at the constituent boundaries.
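
For completeness, here is a sketch of the textbook outside pass that pairs with the inside chart above (not Caraballo and Charniak's n-gram approximation, and not the paper's code); the grammar dictionaries are the toy ones assumed earlier.

# Outside pass: alpha[(i, j, A)] = probability of everything outside span (i, j)
# given that A covers it. Handles one round of unaries per span for simplicity.
from collections import defaultdict

def outside(beta, binary, unary, n, root="S1"):
    alpha = defaultdict(float)
    alpha[(0, n, root)] = 1.0
    for length in range(n, 0, -1):                # wide spans first
        for i in range(n - length + 1):
            j = i + length
            for (parent, (child,)), p in unary.items():
                alpha[(i, j, child)] += p * alpha[(i, j, parent)]
            for (parent, (l, r)), p in binary.items():
                out = alpha[(i, j, parent)]
                if out == 0.0:
                    continue
                for k in range(i + 1, j):
                    # each child's outside mass includes its sibling's inside mass
                    alpha[(i, k, l)] += p * out * beta[(k, j, r)]
                    alpha[(k, j, r)] += p * out * beta[(i, k, l)]
    return alpha

alpha = outside(beta, binary, unary, n=4)
print(alpha[(0, 2, "NP")] * beta[(0, 2, "NP")])   # equals p(sentence) in this toy example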

Coarse-to-Fine Parsing

Parse quickly with a smaller grammar.

[Chart figure: the CKY chart for "(NNP Ms.) (NNP Haag) (VBZ plays) (NNP Elianti)", filled with only the coarse labels S1 and P.]

Now calculate α using the full chart.

[Chart figure: the same coarse chart, now used to compute outside probabilities.]

Coarse-to-Fine Parsing II

Prune the chart, then reparse with a more specific grammar.

[Chart figure: the pruned chart over the same sentence, reparsed with more specific labels such as S_, N_, and V_.]

Repeat the process until the final grammar is reached. This reduces the cost of a high grammar constant.
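
The overall control structure is a loop over successively finer grammars. A minimal sketch, where parse_fn, the grammar objects, and the projection functions are assumed placeholders (e.g. an inside pass plus an outside pass that skips disallowed cells), not the paper's implementation:

# Multilevel coarse-to-fine loop: parse, prune, and hand the surviving cells
# down as constraints on the next, finer level.
def coarse_to_fine(words, grammars, projections, parse_fn, threshold=1e-4):
    """grammars: coarsest to finest; projections[l] maps a level-(l+1) label to its
    level-l cluster; parse_fn(words, grammar, allowed) -> (beta, alpha, p_sent)."""
    allowed = lambda i, j, label: True            # no constraints at the coarsest level
    beta = None
    for level, grammar in enumerate(grammars):
        beta, alpha, p_sent = parse_fn(words, grammar, allowed)
        if level == len(grammars) - 1:
            break                                 # finest grammar: extract the parse from beta
        kept = {key for key, inside in beta.items()
                if alpha.get(key, 0.0) * inside / p_sent >= threshold}
        project = projections[level]
        # a finer constituent is allowed only if its coarse projection survived pruning
        allowed = lambda i, j, label, kept=kept, project=project: (
            (i, j, project(label)) in kept)
    return beta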

Related Work

Two-stage parsers:
- Maxwell and Kaplan (1993): automatically extracted first stage
- Goodman (1997): first stage uses regular expressions
- Charniak (2000): first stage is unlexicalized

Agenda reordering:
- Klein and Manning (2003): A* search for the best parse using an upper bound on α
- Tsuruoka and Tsujii (2004): iterative deepening

Parser Details

Binarized grammar based on Klein and Manning (2003), with head annotation and vertical (parent) and horizontal (sibling) Markov context. For example, the flat tree

(S (NP (DT the) (JJ quick) (JJ brown) (NN fox)) ...)

is binarized and annotated so that NP becomes NP^S, with intermediate symbols such as <NP-NN^S+JJ that record the original label (NP), its head (NN), its parent (S), and a neighboring sibling (JJ).
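
As an illustration of the binarization idea (simplified: no head annotation, and the intermediate-symbol format differs from the one on the slide), a short Python sketch with parent annotation and one sibling of horizontal context:

# Right-fold binarization with parent (vertical) annotation and horizontal
# Markov order 1. Node: (label, children); preterminal: (tag, word).
def binarize(label, children, parent="TOP"):
    ann = f"{label}^{parent}"
    kids = [c if isinstance(c[1], str) else binarize(c[0], c[1], label)
            for c in children]
    while len(kids) > 2:
        # fold the two rightmost children under an intermediate symbol that
        # records the base label, the parent, and the previous sibling
        prev_sib = kids[-3][0].split("^")[0].lstrip("<")
        kids = kids[:-2] + [(f"<{label}^{parent}+{prev_sib}", kids[-2:])]
    return (ann, kids)

print(binarize("NP", [("DT", "the"), ("JJ", "quick"),
                      ("JJ", "brown"), ("NN", "fox")], parent="S"))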

Coarse-to-Fine Scheme

- Level 0: S1, P
- Level 1: S1, HP, MP
- Level 2: S1, S_, N_, A_, P_
- Level 3 (full Treebank grammar): S, VP, UCP, SQ, SBAR, SBARQ, NP, NAC, NX, LST, X, FRAG, ADJP, QP, CONJP, ADVP, INTJ, PRN, PRT, PP, RRC, WHADJP, WHADVP, WHNP, WHPP

Each coarser label is a cluster of finer labels, e.g. at level 2, S_ covers the clause-like labels (S, VP, ...) and N_ the NP-like labels (NP, NAC, ...).
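
In code, each level of the scheme is just a mapping from finer labels to coarser clusters. The sketch below shows the idea; the exact cluster membership is an assumption read off this slide, not an official table.

# Illustrative fragment of the level-3 -> level-2 label projection.
LEVEL2_CLUSTER = {
    "S": "S_", "VP": "S_", "SQ": "S_", "SBAR": "S_", "SBARQ": "S_",
    "NP": "N_", "NAC": "N_", "NX": "N_", "LST": "N_", "X": "N_", "FRAG": "N_",
    "ADJP": "A_", "QP": "A_", "ADVP": "A_", "INTJ": "A_", "PRN": "A_",
    "PP": "P_", "RRC": "P_", "WHADJP": "P_", "WHADVP": "P_", "WHNP": "P_",
    "WHPP": "P_",
}

def project_level3_to_level2(label):
    # preterminals (POS tags) are passed through unchanged at every level
    return LEVEL2_CLUSTER.get(label, label)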

Examples

[Figure: example parses of the same sentence at Level 0, Level 1, Level 2, and Level 3 (Treebank).]

Coarse-to-Fine Probabilities

Heuristic probabilities for the coarse grammars are weighted averages of the corresponding treebank-grammar rules:

P(N_ → N_ P_) = weighted-avg( P(NP → NP PP), P(NP → NP PRT), ..., P(NP → NAC PP), P(NP → NAC PRT), ..., P(NAC → NP PP), ... )

Using max instead of the average computes an exact upper bound instead of a heuristic (Geman and Kochanek 2001). No smoothing is needed.
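
One plausible reading of the weighted average in code (the weighting by fine-parent counts and all names here are assumptions; as noted above, replacing the average by a max would give an exact upper bound instead):

# Coarse rule probabilities as count-weighted combinations of the fine rules
# that project onto them.
from collections import defaultdict

def coarse_rule_probs(fine_rules, fine_parent_counts, project):
    """fine_rules: {(A, (B, C)): P(A -> B C)}; project: fine label -> coarse label."""
    num = defaultdict(float)
    den = defaultdict(float)
    for a, count in fine_parent_counts.items():
        den[project(a)] += count
    for (a, rhs), prob in fine_rules.items():
        coarse_rule = (project(a), tuple(project(x) for x in rhs))
        num[coarse_rule] += fine_parent_counts[a] * prob
    return {rule: mass / den[rule[0]] for rule, mass in num.items()}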

Pruning Thresholds

[Plots: pruning threshold vs. probability of pruning a gold constituent, and pruning threshold vs. fraction of incorrect constituents remaining.]

Pruning Statistics

Level         Constits Produced (millions)   Constits Pruned (millions)   % Pruned
Level 0       8.82                           7.55                         86.5
Level 1       9.18                           6.51                         70.8
Level 2       11.2                           9.48                         84.4
Level 3       11.8                           0                            0
Total         40.4                           -                            -
Level 3 only  392                            0                            0

Timing Statistics

Level         Time at Level   Cumulative Time   F-score
Level 0       1598            1598              -
Level 1       2570            4164              -
Level 2       4303            8471              -
Level 3       1527            9998              77.9
Level 3 only  114654          -                 77.9

Pruning gives a more than 10x speed increase.

Discussion

- No loss in F-score from pruning.
- Each pruning level is useful; around 80% of the constituents produced are pruned.
- Pruning works even at level 0, with only two nonterminals (S1 and P), because the preterminals are still informative: the probability of a rule such as P → NN IN (a constituent ending with a preposition) will be very low.

Conclusion

Multilevel coarse-to-fine parsing allows bottom-up parsing to use top-down information:
- deciding on good parent labels;
- using the string boundary.

It can be combined with agenda-reordering methods, using the coarser levels to estimate the outside probability. More stages of parsing can be added, e.g. lexicalization.

Future Work

The coarse-to-fine scheme we use is hand-generated. But a coarse-to-fine scheme is just a hierarchical clustering of constituent labels, and hierarchical clustering is a well-understood task, so it should be possible to define an objective function and search for the best scheme. This could also be used to automatically find useful annotations/lexicalizations.

Acknowledgements

This work began as a class project for CS 241 at Brown University. Funded by DARPA GALE, Brown University fellowships, and the parents of undergraduates. Our thanks to all!