PCFGs 2: Inside-Outside Probabilities
L645 / B659, Dept. of Linguistics, Indiana University, Fall 2015
Topics: questions for PCFGs, calculating P(w_1m), inside and outside probabilities, finding the most likely parse



3 Questions for PCFGs

Three questions for Probabilistic Context-Free Grammars (PCFGs):

- What is the probability of a sentence w_1m according to grammar G?  P(w_{1m} \mid G)
- What is the most likely parse for a sentence?  \arg\max_t P(t \mid w_{1m}, G)
- How can we choose rule probabilities for the grammar G that maximize the probability of a sentence?  \arg\max_G P(w_{1m} \mid G)

Inside and Outside Probabilities

Take one subtree spanning w_p to w_q:

- outside probability: the probability of beginning with the start symbol N^1 and generating the nonterminal N^j_pq and all the words not in the subtree:

  \alpha_j(p, q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} \mid G)

- inside probability: the probability of the words w_p ... w_q given that N^j is the starting point:

  \beta_j(p, q) = P(w_{pq} \mid N^j_{pq}, G)

Inside and Outside Probabilities (2)

[Figure: a parse tree over w_1 ... w_m. The subtree rooted in N^j spans w_p ... w_q and corresponds to the inside probability β; the remainder of the tree, rooted in the start symbol N^1 and covering w_1 ... w_(p-1) and w_(q+1) ... w_m, corresponds to the outside probability α.]
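To make the later algorithm sketches concrete, here is one possible encoding of a CNF PCFG and of the two charts, with 1-based inclusive spans as on the slides; the names binary_rules, lexical_rules, alpha, and beta are illustrative assumptions, not notation from the slides.

# One possible chart representation (illustrative names, not from the slides).
from collections import defaultdict

# A PCFG in Chomsky Normal Form: binary rules A -> B C and lexical rules A -> w
binary_rules = {("S", ("NP", "VP")): 1.0}      # (A, (B, C)) -> P(A -> B C)
lexical_rules = {("NP", "astronomers"): 0.1}   # (A, w)      -> P(A -> w)

# beta[(A, p, q)]  = inside probability  P(w_p ... w_q | A spans p..q, G)
# alpha[(A, p, q)] = outside probability P(w_1 ... w_(p-1), A spans p..q, w_(q+1) ... w_m | G)
beta = defaultdict(float)
alpha = defaultdict(float)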

Inside Base Case

base case: a subtree for a single word w_k, built by a rule N^j → w_k:

\beta_j(k, k) = P(w_k \mid N^j_{kk}, G) = P(N^j \to w_k \mid G)

Inside Induction

induction: find β_j(p, q) for p < q

In Chomsky Normal Form, every non-lexical rule has the form N^j → N^r N^s, so the subtree splits into N^r covering w_p ... w_d and N^s covering w_(d+1) ... w_q.

For all j and 1 ≤ p < q ≤ m:

\beta_j(p, q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p, d)\, \beta_s(d+1, q)

The Inside Algorithm

When at a particular focus non-terminal node N^j, figure out the different ways to expand the node. Implementation: only consider expansions which match a previously calculated inside probability, i.e., use a chart. For example, for a VP covering positions 2 through 5 (astronomers [saw stars with ears]):

\beta_{VP}(2, 5) = P(VP \to V\ NP)\, \beta_V(2, 2)\, \beta_{NP}(3, 5) + P(VP \to VP\ PP)\, \beta_{VP}(2, 3)\, \beta_{PP}(4, 5)
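A minimal Python sketch of this chart-based inside computation, using the CNF grammar encoding introduced earlier; the function name inside_probabilities is an assumption, not from the slides.

# Chart-based inside computation for a CNF PCFG (sketch; names assumed).
from collections import defaultdict

def inside_probabilities(words, binary_rules, lexical_rules):
    # Returns beta with beta[(A, p, q)] = P(w_p ... w_q | A spans p..q, G),
    # using 1-based, inclusive spans as on the slides.
    m = len(words)
    beta = defaultdict(float)

    # Base case: beta_A(k, k) = P(A -> w_k)
    for k, w in enumerate(words, start=1):
        for (A, word), p in lexical_rules.items():
            if word == w:
                beta[(A, k, k)] += p

    # Induction: build wider spans from pairs of adjacent narrower spans
    for length in range(2, m + 1):
        for p in range(1, m - length + 2):
            q = p + length - 1
            for (A, (B, C)), rule_p in binary_rules.items():
                for d in range(p, q):  # split point between the daughters
                    beta[(A, p, q)] += (rule_p
                                        * beta.get((B, p, d), 0.0)
                                        * beta.get((C, d + 1, q), 0.0))
    return beta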

Probability of a String

P(w_{1m} \mid G) = P(N^1 \Rightarrow^* w_{1m} \mid G) = P(w_{1m} \mid N^1_{1m}, G) = \beta_1(1, m)

Outside Base Case

base case: the probability of the root being N^j with nothing outside of it:

\alpha_1(1, m) = 1, since N^1 is the start symbol
\alpha_j(1, m) = 0 for j ≠ 1

Outside Induction

induction: the node N^j_pq can be either the left or the right daughter of its parent; sum over both possibilities

[Figure: left-daughter case: a parent N^f_pe expands to N^j_pq N^g_(q+1)e, with N^g covering w_(q+1) ... w_e. Right-daughter case: a parent N^f_eq expands to N^g_e(p-1) N^j_pq, with N^g covering w_e ... w_(p-1). In both cases N^1_1m dominates the whole string w_1 ... w_m.]

Outside Induction (2)

\alpha_j(p, q) = \left[ \sum_{f,g} \sum_{e=q+1}^{m} P(w_{1(p-1)}, w_{(q+1)m}, N^f_{pe}, N^j_{pq}, N^g_{(q+1)e}) \right] + \left[ \sum_{f,g} \sum_{e=1}^{p-1} P(w_{1(p-1)}, w_{(q+1)m}, N^f_{eq}, N^g_{e(p-1)}, N^j_{pq}) \right]

= \left[ \sum_{f,g} \sum_{e=q+1}^{m} P(w_{1(p-1)}, w_{(e+1)m}, N^f_{pe})\, P(N^j_{pq}, N^g_{(q+1)e} \mid N^f_{pe})\, P(w_{(q+1)e} \mid N^g_{(q+1)e}) \right] + \left[ \sum_{f,g} \sum_{e=1}^{p-1} P(w_{1(e-1)}, w_{(q+1)m}, N^f_{eq})\, P(N^g_{e(p-1)}, N^j_{pq} \mid N^f_{eq})\, P(w_{e(p-1)} \mid N^g_{e(p-1)}) \right]

Outside Induction (3)

Rewriting the factors in terms of the inside and outside probabilities already defined:

\alpha_j(p, q) = \left[ \sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p, e)\, P(N^f \to N^j N^g)\, \beta_g(q+1, e) \right] + \left[ \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e, q)\, P(N^f \to N^g N^j)\, \beta_g(e, p-1) \right]
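A minimal sketch of the corresponding outside computation, reusing the chart conventions and assuming a completed inside chart from the inside_probabilities sketch above; the function name outside_probabilities is likewise an assumption.

# Outside probabilities from a completed inside chart (sketch; names assumed).
from collections import defaultdict

def outside_probabilities(words, binary_rules, beta, start="S"):
    # Returns alpha with
    # alpha[(A, p, q)] = P(w_1 ... w_(p-1), A spans p..q, w_(q+1) ... w_m | G).
    m = len(words)
    alpha = defaultdict(float)
    alpha[(start, 1, m)] = 1.0  # base case: only the start symbol spans 1..m

    # Work from the widest parent spans down, pushing outside probability mass
    # to the left and right daughters of every binary rule.
    for length in range(m, 1, -1):
        for p in range(1, m - length + 2):
            q = p + length - 1
            for (F, (B, C)), rule_p in binary_rules.items():
                out_f = alpha[(F, p, q)]
                if out_f == 0.0:
                    continue
                for d in range(p, q):  # split point between the daughters
                    # N^j as left daughter (first sum on the slide)
                    alpha[(B, p, d)] += out_f * rule_p * beta.get((C, d + 1, q), 0.0)
                    # N^j as right daughter (second sum on the slide)
                    alpha[(C, d + 1, q)] += out_f * rule_p * beta.get((B, p, d), 0.0)
    return alpha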

Example

grammar:

  S  → NP VP   1.0        NP → astronomers   0.1
  NP → NP PP   0.4        NP → ears          0.18
  VP → V NP    0.7        NP → saw           0.04
  VP → VP PP   0.3        NP → stars         0.18
  PP → P NP    1.0        NP → telescopes    0.1
  V  → saw     1.0        P  → with          1.0

sentence: astronomers saw stars with ears

\alpha_S(1, 5) = 1.0

\alpha_{NP}(1, 1) = \alpha_S(1, 5) \cdot P(S \to NP\ VP) \cdot \beta_{VP}(2, 5) = 1.0 \cdot 1.0 \cdot 0.015876 = 0.015876

\alpha_{PP}(4, 5) = \alpha_{NP}(3, 5) \cdot P(NP \to NP\ PP) \cdot \beta_{NP}(3, 3) + \alpha_{VP}(2, 5) \cdot P(VP \to VP\ PP) \cdot \beta_{VP}(2, 3) = 0.07 \cdot 0.4 \cdot 0.18 + 0.1 \cdot 0.3 \cdot 0.126 = 0.00882

Example (2)

complete outside-probability table (rows: end position of the span; columns: beginning position / word):

  end 5 | α_S(1,5) = 1.0       | α_VP(2,5) = 0.1       | α_NP(3,5) = 0.07    | α_PP(4,5) = 0.00882  | α_NP(5,5) = 0.00882
  end 4 |                      |                       |                     | α_P(4,4) = 0.0015876 |
  end 3 |                      | α_VP(2,3) = 0.0054    | α_NP(3,3) = 0.00882 |                      |
  end 2 |                      | α_V(2,2) = 0.0015876  |                     |                      |
  end 1 | α_NP(1,1) = 0.015876 |                       |                     |                      |
  beg   | 1 (astronomers)      | 2 (saw)               | 3 (stars)           | 4 (with)             | 5 (ears)
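As a concrete check, the charts for this example can be computed with the inside_probabilities and outside_probabilities sketches above; the encoding and function names are the assumptions introduced earlier, and the printed values should match the table up to floating-point rounding.

# Usage sketch: the astronomers grammar, run through the sketches above.
binary_rules = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("NP", "PP")): 0.4,
    ("VP", ("V",  "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("PP", ("P",  "NP")): 1.0,
}
lexical_rules = {
    ("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18, ("NP", "saw"): 0.04,
    ("NP", "stars"): 0.18, ("NP", "telescopes"): 0.1,
    ("V", "saw"): 1.0, ("P", "with"): 1.0,
}
words = "astronomers saw stars with ears".split()

beta = inside_probabilities(words, binary_rules, lexical_rules)
alpha = outside_probabilities(words, binary_rules, beta, start="S")

print(beta[("VP", 2, 5)])   # ~0.015876, as used in the worked example
print(alpha[("PP", 4, 5)])  # ~0.00882, matching the table above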

Probability of a String (2)

For any k, 1 ≤ k ≤ m:

P(w_{1m} \mid G) = \sum_j P(w_{1(k-1)}, w_k, w_{(k+1)m}, N^j_{kk} \mid G)

= \sum_j P(w_{1(k-1)}, N^j_{kk}, w_{(k+1)m} \mid G)\, P(w_k \mid w_{1(k-1)}, N^j_{kk}, w_{(k+1)m}, G)

= \sum_j \alpha_j(k, k)\, P(N^j \to w_k)
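A small check of this identity on the astronomers charts computed in the earlier usage sketch (assumed variable names): for every position k, the sum over lexical rules covering w_k should equal the sentence probability β_1(1, m).

# Sanity check: sum_j alpha_j(k, k) * P(N^j -> w_k) equals P(w_1m | G) for all k.
sentence_prob = beta[("S", 1, 5)]
for k, w in enumerate(words, start=1):
    total = sum(alpha.get((A, k, k), 0.0) * p
                for (A, word), p in lexical_rules.items() if word == w)
    assert abs(total - sentence_prob) < 1e-9, (k, total, sentence_prob)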

Inside-Outside

\alpha_j(p, q)\, \beta_j(p, q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} \mid G)\, P(w_{pq} \mid N^j_{pq}, G) = P(w_{1m}, N^j_{pq} \mid G)

But this postulates a particular non-terminal node spanning p through q; summing over all non-terminals gives the probability of the sentence together with some constituent spanning w_p ... w_q:

P(w_{1m}, N_{pq} \mid G) = \sum_j \alpha_j(p, q)\, \beta_j(p, q)
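A quick numerical check of this identity on the astronomers charts from the earlier usage sketch (assumed names): with that grammar, only VP can cover positions 2 through 5, so the sum reduces to a single term.

# Sum alpha * beta over all nonterminals spanning (2, 5); here only VP can
# cover "saw stars with ears", so this equals alpha_VP(2,5) * beta_VP(2,5).
span_prob = sum(alpha.get((A, 2, 5), 0.0) * beta.get((A, 2, 5), 0.0)
                for A in {"S", "NP", "VP", "PP", "V", "P"})
print(span_prob)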

Finding the Most Likely Parse

Goal: find the parse with the highest probability:

P(\hat{t}) = \max_{1 \le i \le n} \delta_i(1, m)

Method: Viterbi-style parsing

- Viterbi: define accumulators δ_i(p, q) which record the highest probability for a subtree with root N^i dominating the words w_pq
- Idea: if we assume that a node N^i_pq is used in the derivation, then it needs to be the subtree with maximal probability; all other subtrees for w_pq with root N^i have a lower probability and will result in an analysis with lower probability

Viterbi Parsing

Initialization:

\delta_i(p, p) = P(N^i \to w_p)

Induction:

\delta_i(p, q) = \max_{1 \le j,k \le n;\ p \le r < q} P(N^i \to N^j N^k)\, \delta_j(p, r)\, \delta_k(r+1, q)

Store backtrace:

\Psi_i(p, q) = \arg\max_{(j,k,r)} P(N^i \to N^j N^k)\, \delta_j(p, r)\, \delta_k(r+1, q)

Viterbi Parsing (2)

Termination:

P(\hat{t}) = \delta_1(1, m)

Reconstruction:

- root node: N^1_1m
- general case: if N^i_pq ∈ \hat{t} and Ψ_i(p, q) = (j, k, r), then left(N^i_pq) = N^j_pr and right(N^i_pq) = N^k_(r+1)q

Viterbi Parsing: Pseudocode

// base case: lexical rules
for i = 1 to n do                                  // for all words
    for all A → w_i ∈ G do                         // for all rules
        δ(i, 1, A) = P(A → w_i)

// recursive case
for len = 2 to n do                                // for all lengths
    for i = 1 to n − len + 1 do                    // for all words
        for k = 1 to len − 1 do                    // for all splits
            for all A → B C ∈ G do                 // for all rules
                prob = P(A → B C) · δ(i, k, B) · δ(i + k, len − k, C)   // new prob
                if prob > δ(i, len, A) then        // new max?
                    δ(i, len, A) = prob
                    Ψ(i, len, A) = (B, C, k)       // backtrace

Indexes of δ and Ψ: (number of the first word of the constituent, length of the constituent, root node). Value of Ψ: (first daughter, second daughter, length of the first daughter). The indices i + k and len − k give the beginning and the length of the second daughter.
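A runnable Python version of the same procedure, following the (start position, length, nonterminal) indexing of the pseudocode; the function name viterbi_parse and the grammar encoding are assumptions carried over from the earlier sketches.

def viterbi_parse(words, binary_rules, lexical_rules, start="S"):
    # Returns (best_prob, best_tree) for a CNF PCFG; charts are indexed
    # (i, length, A) with 1-based i, as in the pseudocode above.
    n = len(words)
    delta, psi = {}, {}

    # base case: lexical rules
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical_rules.items():
            if word == w and p > delta.get((i, 1, A), 0.0):
                delta[(i, 1, A)] = p

    # recursive case
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            for k in range(1, length):                 # length of first daughter
                for (A, (B, C)), rule_p in binary_rules.items():
                    prob = (rule_p
                            * delta.get((i, k, B), 0.0)
                            * delta.get((i + k, length - k, C), 0.0))
                    if prob > delta.get((i, length, A), 0.0):
                        delta[(i, length, A)] = prob
                        psi[(i, length, A)] = (B, C, k)

    # reconstruction via the backtraces, as on the previous slide
    def build(i, length, A):
        if length == 1:
            return (A, words[i - 1])
        B, C, k = psi[(i, length, A)]
        return (A, build(i, k, B), build(i + k, length - k, C))

    best = delta.get((1, n, start), 0.0)
    return best, (build(1, n, start) if best > 0.0 else None)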

Example

Grammar:

  S  → PP MS    1.0        IN  → At        0.5
  PP → IN NP    1.0        IN  → in        0.5
  MS → NP VP    0.8        JJ  → Roman     1.0
  MS → NN VP    0.2        NN  → banquets  0.2
  NP → DT NN    0.5        NN  → guests    0.3
  NP → JJ NN    0.3        NN  → garlic    0.2
  NP → NP PP    0.2        NN  → hair      0.3
  VP → VBD NP   0.6        DT  → the       0.6
  VP → VBD XP   0.4        DT  → their     0.4
  XP → NP PP    0.7        VBD → wore      1.0
  XP → NN PP    0.3

Sentence: At Roman banquets, the guests wore garlic in their hair
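A usage sketch applying the viterbi_parse function sketched above to this grammar; the encoding is the same assumed one, and the comma is dropped from the sentence since the grammar has no rule covering punctuation.

# Usage sketch for the example grammar above (assumed names).
binary_rules = {
    ("S",  ("PP", "MS")): 1.0, ("PP", ("IN", "NP")): 1.0,
    ("MS", ("NP", "VP")): 0.8, ("MS", ("NN", "VP")): 0.2,
    ("NP", ("DT", "NN")): 0.5, ("NP", ("JJ", "NN")): 0.3,
    ("NP", ("NP", "PP")): 0.2, ("VP", ("VBD", "NP")): 0.6,
    ("VP", ("VBD", "XP")): 0.4, ("XP", ("NP", "PP")): 0.7,
    ("XP", ("NN", "PP")): 0.3,
}
lexical_rules = {
    ("IN", "At"): 0.5, ("IN", "in"): 0.5, ("JJ", "Roman"): 1.0,
    ("NN", "banquets"): 0.2, ("NN", "guests"): 0.3, ("NN", "garlic"): 0.2,
    ("NN", "hair"): 0.3, ("DT", "the"): 0.6, ("DT", "their"): 0.4,
    ("VBD", "wore"): 1.0,
}
words = "At Roman banquets the guests wore garlic in their hair".split()
prob, tree = viterbi_parse(words, binary_rules, lexical_rules, start="S")
print(prob)   # probability of the best parse
print(tree)   # the best parse as nested (nonterminal, daughters) tuples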