PCFGs 2: Inside-Outside Probabilities
L645 / B659, Dept. of Linguistics, Indiana University, Fall 2015
Topics: questions for PCFGs, calculating P(w_1m), inside and outside probabilities, finding the most likely parse



3 Questions for PCFGs

Three questions for Probabilistic Context-Free Grammars (PCFGs):

- What is the probability of a sentence w_1m according to grammar G?  P(w_{1m} \mid G)
- What is the most likely parse for a sentence?  \arg\max_t P(t \mid w_{1m}, G)
- How can we choose rule probabilities for the grammar G that maximize the probability of a sentence?  \arg\max_G P(w_{1m} \mid G)

Inside and Outside Probabilities

Take one subtree spanning w_p to w_q:

- outside probability: the probability of beginning with the start symbol N^1 and generating the nonterminal N^j_pq and all the words not in the subtree:

  \alpha_j(p, q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} \mid G)

- inside probability: the probability of the words w_p ... w_q given that N^j is the starting point:

  \beta_j(p, q) = P(w_{pq} \mid N^j_{pq}, G)

Inside and Outside Probabilities (2)

[Figure: a parse tree over w_1 ... w_m. The subtree rooted in N^j spans w_p ... w_q and corresponds to the inside probability β; the remainder of the tree, rooted in the start symbol N^1 and covering w_1 ... w_(p-1) and w_(q+1) ... w_m, corresponds to the outside probability α.]
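To make the later algorithm sketches concrete, here is one possible encoding of a CNF PCFG and of the two charts, with 1-based inclusive spans as on the slides; the names binary_rules, lexical_rules, alpha, and beta are illustrative assumptions, not notation from the slides.

# One possible chart representation (illustrative names, not from the slides).
from collections import defaultdict

# A PCFG in Chomsky Normal Form: binary rules A -> B C and lexical rules A -> w
binary_rules = {("S", ("NP", "VP")): 1.0}      # (A, (B, C)) -> P(A -> B C)
lexical_rules = {("NP", "astronomers"): 0.1}   # (A, w)      -> P(A -> w)

# beta[(A, p, q)]  = inside probability  P(w_p ... w_q | A spans p..q, G)
# alpha[(A, p, q)] = outside probability P(w_1 ... w_(p-1), A spans p..q, w_(q+1) ... w_m | G)
beta = defaultdict(float)
alpha = defaultdict(float)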

Inside Base Case

base case: a subtree for a single word w_k, built by a rule N^j → w_k:

\beta_j(k, k) = P(w_k \mid N^j_{kk}, G) = P(N^j \to w_k \mid G)

Inside Induction

induction: find β_j(p, q) for p < q

In Chomsky Normal Form, every non-lexical rule has the form N^j → N^r N^s, so the subtree splits into N^r covering w_p ... w_d and N^s covering w_(d+1) ... w_q.

For all j and 1 ≤ p < q ≤ m:

\beta_j(p, q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p, d)\, \beta_s(d+1, q)

The Inside Algorithm

When at a particular focus non-terminal node N^j, figure out the different ways to expand the node. Implementation: only consider expansions which match a previously calculated inside probability, i.e., use a chart. For example, for a VP covering positions 2 through 5 (astronomers [saw stars with ears]):

\beta_{VP}(2, 5) = P(VP \to V\ NP)\, \beta_V(2, 2)\, \beta_{NP}(3, 5) + P(VP \to VP\ PP)\, \beta_{VP}(2, 3)\, \beta_{PP}(4, 5)
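A minimal Python sketch of this chart-based inside computation, using the CNF grammar encoding introduced earlier; the function name inside_probabilities is an assumption, not from the slides.

# Chart-based inside computation for a CNF PCFG (sketch; names assumed).
from collections import defaultdict

def inside_probabilities(words, binary_rules, lexical_rules):
    # Returns beta with beta[(A, p, q)] = P(w_p ... w_q | A spans p..q, G),
    # using 1-based, inclusive spans as on the slides.
    m = len(words)
    beta = defaultdict(float)

    # Base case: beta_A(k, k) = P(A -> w_k)
    for k, w in enumerate(words, start=1):
        for (A, word), p in lexical_rules.items():
            if word == w:
                beta[(A, k, k)] += p

    # Induction: build wider spans from pairs of adjacent narrower spans
    for length in range(2, m + 1):
        for p in range(1, m - length + 2):
            q = p + length - 1
            for (A, (B, C)), rule_p in binary_rules.items():
                for d in range(p, q):  # split point between the daughters
                    beta[(A, p, q)] += (rule_p
                                        * beta.get((B, p, d), 0.0)
                                        * beta.get((C, d + 1, q), 0.0))
    return beta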

Probability of a String

P(w_{1m} \mid G) = P(N^1 \Rightarrow^* w_{1m} \mid G) = P(w_{1m} \mid N^1_{1m}, G) = \beta_1(1, m)

Outside Base Case

base case: the probability of the root being N^j with nothing outside of it:

\alpha_1(1, m) = 1, since N^1 is the start symbol
\alpha_j(1, m) = 0 for j ≠ 1

Outside Induction

induction: the node N^j_pq can be either the left or the right daughter of its parent; sum over both possibilities

[Figure: left-daughter case: a parent N^f_pe expands to N^j_pq N^g_(q+1)e, with N^g covering w_(q+1) ... w_e. Right-daughter case: a parent N^f_eq expands to N^g_e(p-1) N^j_pq, with N^g covering w_e ... w_(p-1). In both cases N^1_1m dominates the whole string w_1 ... w_m.]

Outside Induction (2)

\alpha_j(p, q) = \left[ \sum_{f,g} \sum_{e=q+1}^{m} P(w_{1(p-1)}, w_{(q+1)m}, N^f_{pe}, N^j_{pq}, N^g_{(q+1)e}) \right] + \left[ \sum_{f,g} \sum_{e=1}^{p-1} P(w_{1(p-1)}, w_{(q+1)m}, N^f_{eq}, N^g_{e(p-1)}, N^j_{pq}) \right]

= \left[ \sum_{f,g} \sum_{e=q+1}^{m} P(w_{1(p-1)}, w_{(e+1)m}, N^f_{pe})\, P(N^j_{pq}, N^g_{(q+1)e} \mid N^f_{pe})\, P(w_{(q+1)e} \mid N^g_{(q+1)e}) \right] + \left[ \sum_{f,g} \sum_{e=1}^{p-1} P(w_{1(e-1)}, w_{(q+1)m}, N^f_{eq})\, P(N^g_{e(p-1)}, N^j_{pq} \mid N^f_{eq})\, P(w_{e(p-1)} \mid N^g_{e(p-1)}) \right]

Outside Induction (3)

Rewriting the factors in terms of the inside and outside probabilities already defined:

\alpha_j(p, q) = \left[ \sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p, e)\, P(N^f \to N^j N^g)\, \beta_g(q+1, e) \right] + \left[ \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e, q)\, P(N^f \to N^g N^j)\, \beta_g(e, p-1) \right]
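A minimal sketch of the corresponding outside computation, reusing the chart conventions and assuming a completed inside chart from the inside_probabilities sketch above; the function name outside_probabilities is likewise an assumption.

# Outside probabilities from a completed inside chart (sketch; names assumed).
from collections import defaultdict

def outside_probabilities(words, binary_rules, beta, start="S"):
    # Returns alpha with
    # alpha[(A, p, q)] = P(w_1 ... w_(p-1), A spans p..q, w_(q+1) ... w_m | G).
    m = len(words)
    alpha = defaultdict(float)
    alpha[(start, 1, m)] = 1.0  # base case: only the start symbol spans 1..m

    # Work from the widest parent spans down, pushing outside probability mass
    # to the left and right daughters of every binary rule.
    for length in range(m, 1, -1):
        for p in range(1, m - length + 2):
            q = p + length - 1
            for (F, (B, C)), rule_p in binary_rules.items():
                out_f = alpha[(F, p, q)]
                if out_f == 0.0:
                    continue
                for d in range(p, q):  # split point between the daughters
                    # N^j as left daughter (first sum on the slide)
                    alpha[(B, p, d)] += out_f * rule_p * beta.get((C, d + 1, q), 0.0)
                    # N^j as right daughter (second sum on the slide)
                    alpha[(C, d + 1, q)] += out_f * rule_p * beta.get((B, p, d), 0.0)
    return alpha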

Example

grammar:

  S  → NP VP   1.0        NP → astronomers   0.1
  NP → NP PP   0.4        NP → ears          0.18
  VP → V NP    0.7        NP → saw           0.04
  VP → VP PP   0.3        NP → stars         0.18
  PP → P NP    1.0        NP → telescopes    0.1
  V  → saw     1.0        P  → with          1.0

sentence: astronomers saw stars with ears

\alpha_S(1, 5) = 1.0

\alpha_{NP}(1, 1) = \alpha_S(1, 5) \cdot P(S \to NP\ VP) \cdot \beta_{VP}(2, 5) = 1.0 \cdot 1.0 \cdot 0.015876 = 0.015876

\alpha_{PP}(4, 5) = \alpha_{NP}(3, 5) \cdot P(NP \to NP\ PP) \cdot \beta_{NP}(3, 3) + \alpha_{VP}(2, 5) \cdot P(VP \to VP\ PP) \cdot \beta_{VP}(2, 3) = 0.07 \cdot 0.4 \cdot 0.18 + 0.1 \cdot 0.3 \cdot 0.126 = 0.00882

Example (2)

complete outside-probability table (rows: end position of the span; columns: beginning position / word):

  end 5 | α_S(1,5) = 1.0       | α_VP(2,5) = 0.1       | α_NP(3,5) = 0.07    | α_PP(4,5) = 0.00882  | α_NP(5,5) = 0.00882
  end 4 |                      |                       |                     | α_P(4,4) = 0.0015876 |
  end 3 |                      | α_VP(2,3) = 0.0054    | α_NP(3,3) = 0.00882 |                      |
  end 2 |                      | α_V(2,2) = 0.0015876  |                     |                      |
  end 1 | α_NP(1,1) = 0.015876 |                       |                     |                      |
  beg   | 1 (astronomers)      | 2 (saw)               | 3 (stars)           | 4 (with)             | 5 (ears)
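As a concrete check, the charts for this example can be computed with the inside_probabilities and outside_probabilities sketches above; the encoding and function names are the assumptions introduced earlier, and the printed values should match the table up to floating-point rounding.

# Usage sketch: the astronomers grammar, run through the sketches above.
binary_rules = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("NP", "PP")): 0.4,
    ("VP", ("V",  "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("PP", ("P",  "NP")): 1.0,
}
lexical_rules = {
    ("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18, ("NP", "saw"): 0.04,
    ("NP", "stars"): 0.18, ("NP", "telescopes"): 0.1,
    ("V", "saw"): 1.0, ("P", "with"): 1.0,
}
words = "astronomers saw stars with ears".split()

beta = inside_probabilities(words, binary_rules, lexical_rules)
alpha = outside_probabilities(words, binary_rules, beta, start="S")

print(beta[("VP", 2, 5)])   # ~0.015876, as used in the worked example
print(alpha[("PP", 4, 5)])  # ~0.00882, matching the table above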

Probability of a String (2)

For any k, 1 ≤ k ≤ m:

P(w_{1m} \mid G) = \sum_j P(w_{1(k-1)}, w_k, w_{(k+1)m}, N^j_{kk} \mid G)

= \sum_j P(w_{1(k-1)}, N^j_{kk}, w_{(k+1)m} \mid G)\, P(w_k \mid w_{1(k-1)}, N^j_{kk}, w_{(k+1)m}, G)

= \sum_j \alpha_j(k, k)\, P(N^j \to w_k)
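A small check of this identity on the astronomers charts computed in the earlier usage sketch (assumed variable names): for every position k, the sum over lexical rules covering w_k should equal the sentence probability β_1(1, m).

# Sanity check: sum_j alpha_j(k, k) * P(N^j -> w_k) equals P(w_1m | G) for all k.
sentence_prob = beta[("S", 1, 5)]
for k, w in enumerate(words, start=1):
    total = sum(alpha.get((A, k, k), 0.0) * p
                for (A, word), p in lexical_rules.items() if word == w)
    assert abs(total - sentence_prob) < 1e-9, (k, total, sentence_prob)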

Inside-Outside

\alpha_j(p, q)\, \beta_j(p, q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} \mid G)\, P(w_{pq} \mid N^j_{pq}, G) = P(w_{1m}, N^j_{pq} \mid G)

But this postulates a particular non-terminal node spanning p through q; summing over all non-terminals gives the probability of the sentence together with some constituent spanning w_p ... w_q:

P(w_{1m}, N_{pq} \mid G) = \sum_j \alpha_j(p, q)\, \beta_j(p, q)
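A quick numerical check of this identity on the astronomers charts from the earlier usage sketch (assumed names): with that grammar, only VP can cover positions 2 through 5, so the sum reduces to a single term.

# Sum alpha * beta over all nonterminals spanning (2, 5); here only VP can
# cover "saw stars with ears", so this equals alpha_VP(2,5) * beta_VP(2,5).
span_prob = sum(alpha.get((A, 2, 5), 0.0) * beta.get((A, 2, 5), 0.0)
                for A in {"S", "NP", "VP", "PP", "V", "P"})
print(span_prob)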

Finding the Most Likely Parse

Goal: find the parse with the highest probability:

P(\hat{t}) = \max_{1 \le i \le n} \delta_i(1, m)

Method: Viterbi-style parsing

- Viterbi: define accumulators δ_i(p, q) which record the highest probability for a subtree with root N^i dominating the words w_pq
- Idea: if we assume that a node N^i_pq is used in the derivation, then it needs to be the subtree with maximal probability; all other subtrees for w_pq with root N^i have a lower probability and will result in an analysis with lower probability

Viterbi Parsing

Initialization:

\delta_i(p, p) = P(N^i \to w_p)

Induction:

\delta_i(p, q) = \max_{1 \le j,k \le n;\ p \le r < q} P(N^i \to N^j N^k)\, \delta_j(p, r)\, \delta_k(r+1, q)

Store backtrace:

\Psi_i(p, q) = \arg\max_{(j,k,r)} P(N^i \to N^j N^k)\, \delta_j(p, r)\, \delta_k(r+1, q)

Viterbi Parsing (2)

Termination:

P(\hat{t}) = \delta_1(1, m)

Reconstruction:

- root node: N^1_1m
- general case: if N^i_pq ∈ \hat{t} and Ψ_i(p, q) = (j, k, r), then left(N^i_pq) = N^j_pr and right(N^i_pq) = N^k_(r+1)q

Viterbi Parsing: Pseudocode

// base case: lexical rules
for i = 1 to n do                                  // for all words
    for all A → w_i ∈ G do                         // for all rules
        δ(i, 1, A) = P(A → w_i)

// recursive case
for len = 2 to n do                                // for all lengths
    for i = 1 to n − len + 1 do                    // for all words
        for k = 1 to len − 1 do                    // for all splits
            for all A → B C ∈ G do                 // for all rules
                prob = P(A → B C) · δ(i, k, B) · δ(i + k, len − k, C)   // new prob
                if prob > δ(i, len, A) then        // new max?
                    δ(i, len, A) = prob
                    Ψ(i, len, A) = (B, C, k)       // backtrace

Indexes of δ and Ψ: (number of the first word of the constituent, length of the constituent, root node). Value of Ψ: (first daughter, second daughter, length of the first daughter). The indices i + k and len − k give the beginning and the length of the second daughter.
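A runnable Python version of the same procedure, following the (start position, length, nonterminal) indexing of the pseudocode; the function name viterbi_parse and the grammar encoding are assumptions carried over from the earlier sketches.

def viterbi_parse(words, binary_rules, lexical_rules, start="S"):
    # Returns (best_prob, best_tree) for a CNF PCFG; charts are indexed
    # (i, length, A) with 1-based i, as in the pseudocode above.
    n = len(words)
    delta, psi = {}, {}

    # base case: lexical rules
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical_rules.items():
            if word == w and p > delta.get((i, 1, A), 0.0):
                delta[(i, 1, A)] = p

    # recursive case
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            for k in range(1, length):                 # length of first daughter
                for (A, (B, C)), rule_p in binary_rules.items():
                    prob = (rule_p
                            * delta.get((i, k, B), 0.0)
                            * delta.get((i + k, length - k, C), 0.0))
                    if prob > delta.get((i, length, A), 0.0):
                        delta[(i, length, A)] = prob
                        psi[(i, length, A)] = (B, C, k)

    # reconstruction via the backtraces, as on the previous slide
    def build(i, length, A):
        if length == 1:
            return (A, words[i - 1])
        B, C, k = psi[(i, length, A)]
        return (A, build(i, k, B), build(i + k, length - k, C))

    best = delta.get((1, n, start), 0.0)
    return best, (build(1, n, start) if best > 0.0 else None)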

Example

Grammar:

  S  → PP MS    1.0        IN  → At        0.5
  PP → IN NP    1.0        IN  → in        0.5
  MS → NP VP    0.8        JJ  → Roman     1.0
  MS → NN VP    0.2        NN  → banquets  0.2
  NP → DT NN    0.5        NN  → guests    0.3
  NP → JJ NN    0.3        NN  → garlic    0.2
  NP → NP PP    0.2        NN  → hair      0.3
  VP → VBD NP   0.6        DT  → the       0.6
  VP → VBD XP   0.4        DT  → their     0.4
  XP → NP PP    0.7        VBD → wore      1.0
  XP → NN PP    0.3

Sentence: At Roman banquets, the guests wore garlic in their hair
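A usage sketch applying the viterbi_parse function sketched above to this grammar; the encoding is the same assumed one, and the comma is dropped from the sentence since the grammar has no rule covering punctuation.

# Usage sketch for the example grammar above (assumed names).
binary_rules = {
    ("S",  ("PP", "MS")): 1.0, ("PP", ("IN", "NP")): 1.0,
    ("MS", ("NP", "VP")): 0.8, ("MS", ("NN", "VP")): 0.2,
    ("NP", ("DT", "NN")): 0.5, ("NP", ("JJ", "NN")): 0.3,
    ("NP", ("NP", "PP")): 0.2, ("VP", ("VBD", "NP")): 0.6,
    ("VP", ("VBD", "XP")): 0.4, ("XP", ("NP", "PP")): 0.7,
    ("XP", ("NN", "PP")): 0.3,
}
lexical_rules = {
    ("IN", "At"): 0.5, ("IN", "in"): 0.5, ("JJ", "Roman"): 1.0,
    ("NN", "banquets"): 0.2, ("NN", "guests"): 0.3, ("NN", "garlic"): 0.2,
    ("NN", "hair"): 0.3, ("DT", "the"): 0.6, ("DT", "their"): 0.4,
    ("VBD", "wore"): 1.0,
}
words = "At Roman banquets the guests wore garlic in their hair".split()
prob, tree = viterbi_parse(words, binary_rules, lexical_rules, start="S")
print(prob)   # probability of the best parse
print(tree)   # the best parse as nested (nonterminal, daughters) tuples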