CS626-449: Speech, NLP and the Web/Topics in AI
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 17: Probabilistic parsing; inside-outside probabilities

Probability of a parse tree (cont.)

Tree: S_{1,l} → NP_{1,2} VP_{3,l};  NP_{1,2} → DT_{1,1} N_{2,2};  VP_{3,l} → V_{3,3} PP_{4,l};  PP_{4,l} → P_{4,4} NP_{5,l};  yield w_1 w_2 w_3 w_4 w_5 ... w_l

P(t | s) = P(t | S_{1,l})
         = P(NP_{1,2}, DT_{1,1}, w_1, N_{2,2}, w_2, VP_{3,l}, V_{3,3}, w_3, PP_{4,l}, P_{4,4}, w_4, NP_{5,l}, w_{5...l} | S_{1,l})
         = P(NP_{1,2}, VP_{3,l} | S_{1,l}) * P(DT_{1,1}, N_{2,2} | NP_{1,2}) * P(w_1 | DT_{1,1}) * P(w_2 | N_{2,2})
           * P(V_{3,3}, PP_{4,l} | VP_{3,l}) * P(w_3 | V_{3,3}) * P(P_{4,4}, NP_{5,l} | PP_{4,l}) * P(w_4 | P_{4,4}) * P(w_{5...l} | NP_{5,l})

(using the Chain Rule, Context Freeness and Ancestor Freeness)

Example PCFG Rules & Probabilities

S → NP VP      1.0
NP → DT NN     0.5
NP → NNS       0.3
NP → NP PP     0.2
PP → P NP      1.0
VP → VP PP     0.6
VP → VBD NP    0.4
DT → the       1.0
NN → gunman    0.5
NN → building  0.5
VBD → sprayed  1.0
NNS → bullets  1.0
P → with       1.0
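For later reference, this toy grammar can be written down as a small data structure. The sketch below is a minimal Python encoding assumed here (the names PCFG_RULES and LEXICON are illustrative, not from the lecture); it is reused by the subsequent sketches.

```python
# Minimal Python encoding of the example PCFG above (names are illustrative).
# Non-lexical rules: (LHS, tuple-of-RHS-symbols) -> probability
PCFG_RULES = {
    ("S",  ("NP", "VP")):  1.0,
    ("NP", ("DT", "NN")):  0.5,
    ("NP", ("NNS",)):      0.3,
    ("NP", ("NP", "PP")):  0.2,
    ("PP", ("P",  "NP")):  1.0,
    ("VP", ("VP", "PP")):  0.6,
    ("VP", ("VBD", "NP")): 0.4,
}
# Lexical rules: (POS tag, word) -> probability
LEXICON = {
    ("DT",  "the"):      1.0,
    ("NN",  "gunman"):   0.5,
    ("NN",  "building"): 0.5,
    ("VBD", "sprayed"):  1.0,
    ("NNS", "bullets"):  1.0,
    ("P",   "with"):     1.0,
}
```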

Example Parse t1: "The gunman sprayed the building with bullets."

[S 1.0 [NP 0.5 [DT 1.0 The] [NN 0.5 gunman]]
       [VP 0.6 [VP 0.4 [VBD 1.0 sprayed] [NP 0.5 [DT 1.0 the] [NN 0.5 building]]]
               [PP 1.0 [P 1.0 with] [NP 0.3 [NNS 1.0 bullets]]]]]

P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

Another Parse t2: "The gunman sprayed the building with bullets."

[S 1.0 [NP 0.5 [DT 1.0 The] [NN 0.5 gunman]]
       [VP 0.4 [VBD 1.0 sprayed]
               [NP 0.2 [NP 0.5 [DT 1.0 the] [NN 0.5 building]]
                       [PP 1.0 [P 1.0 with] [NP 0.3 [NNS 1.0 bullets]]]]]]

P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
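The two hand computations above can be checked mechanically: a tree's probability is the product of the probabilities of all rules used in it. A minimal sketch, reusing the PCFG_RULES and LEXICON dictionaries from the earlier sketch and assuming a simple nested-tuple encoding of trees:

```python
def tree_prob(tree):
    """P(tree) = product of the probabilities of all rules used in the tree.
    A tree is (label, child1, child2, ...); a leaf child is a word string."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return LEXICON[(label, children[0])]                 # lexical rule, e.g. NN -> gunman
    p = PCFG_RULES[(label, tuple(c[0] for c in children))]   # internal rule, e.g. VP -> VP PP
    for child in children:
        p *= tree_prob(child)
    return p

t1 = ("S", ("NP", ("DT", "the"), ("NN", "gunman")),
           ("VP", ("VP", ("VBD", "sprayed"),
                         ("NP", ("DT", "the"), ("NN", "building"))),
                  ("PP", ("P", "with"), ("NP", ("NNS", "bullets")))))
t2 = ("S", ("NP", ("DT", "the"), ("NN", "gunman")),
           ("VP", ("VBD", "sprayed"),
                  ("NP", ("NP", ("DT", "the"), ("NN", "building")),
                         ("PP", ("P", "with"), ("NP", ("NNS", "bullets"))))))
print(tree_prob(t1), tree_prob(t2))   # ~0.0045 and ~0.0015 (up to float rounding)
```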

HMM vs. PCFG

  HMM: O (observed sequence)    PCFG: w_{1m} (sentence)
  HMM: X (state sequence)       PCFG: t (parse tree)
  HMM: μ (model)                PCFG: G (grammar)

Three fundamental questions follow.

HMM vs. PCFG: the first two fundamental questions

  HMM:  How likely is a certain observation given the model?                  P(O | μ)
  PCFG: How likely is a sentence given the grammar?                           P(w_{1m} | G)

  HMM:  How to choose a state sequence which best explains the observations?  argmax_X P(X | O, μ)
  PCFG: How to choose a parse which best supports the sentence?               argmax_t P(t | w_{1m}, G)

HMM vs. PCFG: the third fundamental question

  HMM:  How to choose the model parameters that best explain the observed data?                    argmax_μ P(O | μ)
  PCFG: How to choose rule probabilities which maximize the probabilities of the observed sentences?  argmax_G P(w_{1m} | G)

Interesting Probabilities

Sentence (word positions): The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)

Inside probability: what is the probability of having an NP at this position such that it will derive "the building"?  β_NP(4,5)
Outside probability: what is the probability of starting from N^1 and deriving "The gunman sprayed", an NP, and "with bullets"?  α_NP(4,5)

Outside Probabilities ≈ Forward

Outside probability α_j(p,q): the probability of beginning with N^1 and generating the non-terminal N^j_{pq} and all words outside w_p..w_q.

  Forward probability:  α_i(t) = P(w_{1(t-1)}, X_t = i | μ)
  Outside probability:  α_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} | G)

Inside Probabilities ≈ Backward

Inside probability β_j(p,q): the probability of generating the words w_p..w_q starting with the non-terminal N^j_{pq}.

  Backward probability:  β_i(t) = P(w_{tT} | X_t = i, μ)
  Inside probability:    β_j(p,q) = P(w_{pq} | N^j_{pq}, G)

Outside & Inside Probabilities

For the sentence "The gunman sprayed the building with bullets" (positions 1-7):

  α_NP(4,5) for "the building" = P(The gunman sprayed, NP_{4,5}, with bullets | G)
  β_NP(4,5) for "the building" = P(the building | NP_{4,5}, G)

Inside Probabilities β_j(p,q)

Base case:  β_j(k,k) = P(w_k | N^j_{kk}, G) = P(N^j → w_k | G)

The base case is used for rules which derive the words (terminals) directly.
E.g., suppose N^j = NN is being considered and NN → building is one of the rules, with probability 0.5:
  β_NN(5,5) = P(building | NN_{5,5}, G) = P(NN → building | G) = 0.5

Induction Step

  β_j(p,q) = P(w_{pq} | N^j_{pq}, G)
           = Σ_{r,s} Σ_{d=p}^{q-1} P(N^j → N^r N^s) * β_r(p,d) * β_s(d+1,q)

Consider different splits of the words, indicated by d: e.g., for "the huge building", the span can be split after "the" or after "huge".
Consider different non-terminals to be used in the rule: NP → DT NN and NP → DT NNS are available options.
Sum over all of these, as in the sketch below.
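Put together, the base case and the induction step give a CKY-style dynamic program that fills the parse triangle bottom-up. The sketch below is one possible implementation under the assumptions already stated (it reuses PCFG_RULES and LEXICON from the earlier sketch; the single unary rule NP → NNS is handled by a one-pass helper, which suffices because this toy grammar has no unary chains):

```python
from collections import defaultdict

def apply_unary(beta, p, q):
    # Handle unary non-lexical rules such as NP -> NNS.
    # One pass suffices here because the toy grammar has no unary chains.
    for (lhs, rhs), prob in PCFG_RULES.items():
        if len(rhs) == 1:
            beta[(lhs, p, q)] += prob * beta[(rhs[0], p, q)]

def inside_probs(words):
    """Inside probabilities beta[(j, p, q)] = P(w_p..w_q | N^j_pq, G),
    with 1-based, inclusive word positions."""
    m = len(words)
    beta = defaultdict(float)
    # Base case: beta_j(k, k) = P(N^j -> w_k)
    for k, word in enumerate(words, start=1):
        for (tag, w), prob in LEXICON.items():
            if w == word:
                beta[(tag, k, k)] += prob
        apply_unary(beta, k, k)
    # Induction: sum over binary rules N^j -> N^r N^s and split points d
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (lhs, rhs), prob in PCFG_RULES.items():
                if len(rhs) == 2:
                    r, s = rhs
                    for d in range(p, q):
                        beta[(lhs, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
            apply_unary(beta, p, q)
    return beta
```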

The Bottom-Up Approach (the idea of induction)

Consider "the gunman".
Base cases: apply the unary (lexical) rules
  DT → the      Prob = 1.0
  NN → gunman   Prob = 0.5
Induction: probability that an NP covers these two words
  = P(NP → DT NN) * P(DT deriving the word "the") * P(NN deriving the word "gunman")
  = 0.5 * 1.0 * 0.5 = 0.25

Parse Triangle

A parse triangle is constructed for calculating β_j(p,q).
Probability of a sentence using β_j(p,q):
  P(w_{1m} | G) = P(N^1 ⇒ w_{1m} | G) = P(w_{1m} | N^1_{1m}, G) = β_1(1,m)

Parse Triangle (diagonal)

Word positions: The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)

Fill the diagonal with β_j(k,k):
  β_DT(1,1) = 1.0,  β_NN(2,2) = 0.5,  β_VBD(3,3) = 1.0,  β_DT(4,4) = 1.0,
  β_NN(5,5) = 0.5,  β_P(6,6) = 1.0,   β_NNS(7,7) = 1.0

Parse Triangle (cont.)

In addition to the diagonal entries, calculate β_NP(1,2) using the induction formula:
  β_NP(1,2) = P(the gunman | NP_{1,2}, G)
            = P(NP → DT NN) * β_DT(1,1) * β_NN(2,2)
            = 0.5 * 1.0 * 0.5 = 0.25

Example Parse t1: "The gunman sprayed the building with bullets."
The rule used at the top VP node here is VP → VP PP:

[S 1.0 [NP 0.5 [DT 1.0 The] [NN 0.5 gunman]]
       [VP 0.6 [VP 0.4 [VBD 1.0 sprayed] [NP 0.5 [DT 1.0 the] [NN 0.5 building]]]
               [PP 1.0 [P 1.0 with] [NP 0.3 [NNS 1.0 bullets]]]]]

Another Parse t2: "The gunman sprayed the building with bullets."
The rule used at the top VP node here is VP → VBD NP:

[S 1.0 [NP 0.5 [DT 1.0 The] [NN 0.5 gunman]]
       [VP 0.4 [VBD 1.0 sprayed]
               [NP 0.2 [NP 0.5 [DT 1.0 the] [NN 0.5 building]]
                       [PP 1.0 [P 1.0 with] [NP 0.3 [NNS 1.0 bullets]]]]]]

Parse Triangle (completed)

Word positions: The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)

Diagonal:     β_DT(1,1) = 1.0,  β_NN(2,2) = 0.5,  β_VBD(3,3) = 1.0,  β_DT(4,4) = 1.0,  β_NN(5,5) = 0.5,  β_P(6,6) = 1.0,  β_NNS(7,7) = 1.0
Larger spans: β_NP(1,2) = 0.25,  β_NP(4,5) = 0.25,  β_PP(6,7) = 0.3,  β_VP(3,5) = 0.1,  β_NP(4,7) = 0.015,  β_VP(3,7) = 0.024,  β_S(1,7) = 0.006

For example:
  β_VP(3,7) = P(sprayed the building with bullets | VP_{3,7}, G)
            = P(VP → VP PP) * β_VP(3,5) * β_PP(6,7) + P(VP → VBD NP) * β_VBD(3,3) * β_NP(4,7)
            = 0.6 * 0.1 * 0.3 + 0.4 * 1.0 * 0.015 = 0.018 + 0.006 = 0.024
and β_S(1,7) = P(S → NP VP) * β_NP(1,2) * β_VP(3,7) = 1.0 * 0.25 * 0.024 = 0.006 = P(t1) + P(t2).
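Running the inside-probability sketch from above on the example sentence reproduces these entries (values subject to floating-point rounding; words lower-cased to match the toy lexicon):

```python
words = "the gunman sprayed the building with bullets".split()
beta = inside_probs(words)
print(beta[("NP", 1, 2)])   # 0.25
print(beta[("VP", 3, 5)])   # 0.1  = 0.4 * 1.0 * 0.25
print(beta[("VP", 3, 7)])   # 0.024
print(beta[("S", 1, 7)])    # 0.006 = P(t1) + P(t2)
```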

Different Parses

Consider:
- different splitting points, e.g., the 5th and the 3rd position;
- different rules for VP expansion, e.g., VP → VP PP and VP → VBD NP.

Different parses for the VP "sprayed the building with bullets" can be constructed this way.

Outside Probabilities α_j(p,q)

Base case:
  α_1(1,m) = 1 for the start symbol
  α_j(1,m) = 0 for j ≠ 1

Inductive step for calculating α_j(p,q), illustrated for the case where N^j_{pq} is the left child of its parent N^f_{pe}, with sibling N^g_{(q+1)e}:
  α_j(p,q) = Σ_{f,g,e} α_f(p,e) * P(N^f → N^j N^g) * β_g(q+1,e)
  (plus the symmetric term in which N^j_{pq} is the right child of its parent)

Summation over f, g and e.
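As a companion to the inside sketch, the outside recursion can be written top-down over decreasing span lengths, with each parent cell pushing probability mass to its two children. The sketch below is an assumed implementation restricted to the binary rules of the toy grammar (the slide's figure shows only the left-child case; the code also adds the symmetric right-child contribution), again reusing PCFG_RULES and the beta table computed earlier:

```python
from collections import defaultdict

def outside_probs(words, beta, start_symbol="S"):
    """Outside probabilities alpha[(j, p, q)], computed from the inside
    probabilities beta; only binary rules take part in the recursion."""
    m = len(words)
    alpha = defaultdict(float)
    alpha[(start_symbol, 1, m)] = 1.0          # base case: alpha_1(1, m) = 1
    for span in range(m, 1, -1):               # parents with larger spans first
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (f, rhs), prob in PCFG_RULES.items():
                if len(rhs) != 2:
                    continue
                left, right = rhs
                parent = alpha[(f, p, q)]
                if parent == 0.0:
                    continue
                for d in range(p, q):
                    # N^f_{p,q} -> N^left_{p,d}  N^right_{d+1,q}
                    alpha[(left, p, d)]      += parent * prob * beta[(right, d + 1, q)]
                    alpha[(right, d + 1, q)] += parent * prob * beta[(left, p, d)]
    return alpha
```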

Probability of a Sentence

The joint probability of the sentence w_{1m} and of there being a constituent spanning words w_p to w_q is:
  P(w_{1m}, N_{pq} | G) = Σ_j P(w_{1m}, N^j_{pq} | G) = Σ_j α_j(p,q) * β_j(p,q)

E.g., for the sentence "The gunman sprayed the building with bullets" (positions 1-7):
  P(The gunman ... bullets, N_{4,5} | G) = α_NP(4,5) * β_NP(4,5) + α_VP(4,5) * β_VP(4,5) + ...
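Continuing the running example with the sketches above, the span (4,5) ("the building") gives a quick check: only the NP term is non-zero there, and α_NP(4,5) * β_NP(4,5) = 0.024 * 0.25 = 0.006 = P(t1) + P(t2), since "the building" is an NP constituent in both parses.

```python
alpha = outside_probs(words, beta)
nonterminals = {lhs for (lhs, _rhs) in PCFG_RULES} | {tag for (tag, _w) in LEXICON}
p_np_span = sum(alpha[(j, 4, 5)] * beta[(j, 4, 5)] for j in nonterminals)
print(p_np_span)   # ~0.006; only the NP term contributes for this span
```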