CS626-449: Speech, NLP and the Web/Topics in AI
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 17: Probabilistic parsing; inside-outside probabilities

Probability of a parse tree (cont.)

Tree: S_{1,l} → NP_{1,2} VP_{3,l};  NP_{1,2} → DT_{1,1} N_{2,2};  VP_{3,l} → V_{3,3} PP_{4,l};  PP_{4,l} → P_{4,4} NP_{5,l};  yield w_1 w_2 w_3 w_4 w_5 ... w_l

P(t | s) = P(t | S_{1,l})
         = P(NP_{1,2}, DT_{1,1}, w_1, N_{2,2}, w_2, VP_{3,l}, V_{3,3}, w_3, PP_{4,l}, P_{4,4}, w_4, NP_{5,l}, w_{5...l} | S_{1,l})
         = P(NP_{1,2}, VP_{3,l} | S_{1,l}) * P(DT_{1,1}, N_{2,2} | NP_{1,2}) * P(w_1 | DT_{1,1}) * P(w_2 | N_{2,2})
           * P(V_{3,3}, PP_{4,l} | VP_{3,l}) * P(w_3 | V_{3,3}) * P(P_{4,4}, NP_{5,l} | PP_{4,l}) * P(w_4 | P_{4,4}) * P(w_{5...l} | NP_{5,l})

(using the Chain Rule, Context Freeness and Ancestor Freeness)

Example PCFG Rules & Probabilities

S → NP VP      1.0
NP → DT NN     0.5
NP → NNS       0.3
NP → NP PP     0.2
PP → P NP      1.0
VP → VP PP     0.6
VP → VBD NP    0.4
DT → the       1.0
NN → gunman    0.5
NN → building  0.5
VBD → sprayed  1.0
NNS → bullets  1.0
P → with       1.0
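For later reference, this toy grammar can be written down as a small data structure. The sketch below is a minimal Python encoding assumed here (the names PCFG_RULES and LEXICON are illustrative, not from the lecture); it is reused by the subsequent sketches.

```python
# Minimal Python encoding of the example PCFG above (names are illustrative).
# Non-lexical rules: (LHS, tuple-of-RHS-symbols) -> probability
PCFG_RULES = {
    ("S",  ("NP", "VP")):  1.0,
    ("NP", ("DT", "NN")):  0.5,
    ("NP", ("NNS",)):      0.3,
    ("NP", ("NP", "PP")):  0.2,
    ("PP", ("P",  "NP")):  1.0,
    ("VP", ("VP", "PP")):  0.6,
    ("VP", ("VBD", "NP")): 0.4,
}
# Lexical rules: (POS tag, word) -> probability
LEXICON = {
    ("DT",  "the"):      1.0,
    ("NN",  "gunman"):   0.5,
    ("NN",  "building"): 0.5,
    ("VBD", "sprayed"):  1.0,
    ("NNS", "bullets"):  1.0,
    ("P",   "with"):     1.0,
}
```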

Example Parse t1: "The gunman sprayed the building with bullets."

[S 1.0 [NP 0.5 [DT 1.0 The] [NN 0.5 gunman]]
       [VP 0.6 [VP 0.4 [VBD 1.0 sprayed] [NP 0.5 [DT 1.0 the] [NN 0.5 building]]]
               [PP 1.0 [P 1.0 with] [NP 0.3 [NNS 1.0 bullets]]]]]

P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

Another Parse t2: "The gunman sprayed the building with bullets."

[S 1.0 [NP 0.5 [DT 1.0 The] [NN 0.5 gunman]]
       [VP 0.4 [VBD 1.0 sprayed]
               [NP 0.2 [NP 0.5 [DT 1.0 the] [NN 0.5 building]]
                       [PP 1.0 [P 1.0 with] [NP 0.3 [NNS 1.0 bullets]]]]]]

P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
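The two hand computations above can be checked mechanically: a tree's probability is the product of the probabilities of all rules used in it. A minimal sketch, reusing the PCFG_RULES and LEXICON dictionaries from the earlier sketch and assuming a simple nested-tuple encoding of trees:

```python
def tree_prob(tree):
    """P(tree) = product of the probabilities of all rules used in the tree.
    A tree is (label, child1, child2, ...); a leaf child is a word string."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return LEXICON[(label, children[0])]                 # lexical rule, e.g. NN -> gunman
    p = PCFG_RULES[(label, tuple(c[0] for c in children))]   # internal rule, e.g. VP -> VP PP
    for child in children:
        p *= tree_prob(child)
    return p

t1 = ("S", ("NP", ("DT", "the"), ("NN", "gunman")),
           ("VP", ("VP", ("VBD", "sprayed"),
                         ("NP", ("DT", "the"), ("NN", "building"))),
                  ("PP", ("P", "with"), ("NP", ("NNS", "bullets")))))
t2 = ("S", ("NP", ("DT", "the"), ("NN", "gunman")),
           ("VP", ("VBD", "sprayed"),
                  ("NP", ("NP", ("DT", "the"), ("NN", "building")),
                         ("PP", ("P", "with"), ("NP", ("NNS", "bullets"))))))
print(tree_prob(t1), tree_prob(t2))   # ~0.0045 and ~0.0015 (up to float rounding)
```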

HMM vs. PCFG

  HMM: O (observed sequence)    PCFG: w_{1m} (sentence)
  HMM: X (state sequence)       PCFG: t (parse tree)
  HMM: μ (model)                PCFG: G (grammar)

Three fundamental questions follow.

HMM vs. PCFG: the first two fundamental questions

  HMM:  How likely is a certain observation given the model?                  P(O | μ)
  PCFG: How likely is a sentence given the grammar?                           P(w_{1m} | G)

  HMM:  How to choose a state sequence which best explains the observations?  argmax_X P(X | O, μ)
  PCFG: How to choose a parse which best supports the sentence?               argmax_t P(t | w_{1m}, G)

HMM vs. PCFG: the third fundamental question

  HMM:  How to choose the model parameters that best explain the observed data?                    argmax_μ P(O | μ)
  PCFG: How to choose rule probabilities which maximize the probabilities of the observed sentences?  argmax_G P(w_{1m} | G)

Interesting Probabilities

Sentence (word positions): The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)

Inside probability: what is the probability of having an NP at this position such that it will derive "the building"?  β_NP(4,5)
Outside probability: what is the probability of starting from N^1 and deriving "The gunman sprayed", an NP, and "with bullets"?  α_NP(4,5)

Outside Probabilities ≈ Forward

Outside probability α_j(p,q): the probability of beginning with N^1 and generating the non-terminal N^j_{pq} and all words outside w_p..w_q.

  Forward probability:  α_i(t) = P(w_{1(t-1)}, X_t = i | μ)
  Outside probability:  α_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} | G)

Inside Probabilities ≈ Backward

Inside probability β_j(p,q): the probability of generating the words w_p..w_q starting with the non-terminal N^j_{pq}.

  Backward probability:  β_i(t) = P(w_{tT} | X_t = i, μ)
  Inside probability:    β_j(p,q) = P(w_{pq} | N^j_{pq}, G)

Outside & Inside Probabilities

For the sentence "The gunman sprayed the building with bullets" (positions 1-7):

  α_NP(4,5) for "the building" = P(The gunman sprayed, NP_{4,5}, with bullets | G)
  β_NP(4,5) for "the building" = P(the building | NP_{4,5}, G)

Inside Probabilities β_j(p,q)

Base case:  β_j(k,k) = P(w_k | N^j_{kk}, G) = P(N^j → w_k | G)

The base case is used for rules which derive the words (terminals) directly.
E.g., suppose N^j = NN is being considered and NN → building is one of the rules, with probability 0.5:
  β_NN(5,5) = P(building | NN_{5,5}, G) = P(NN → building | G) = 0.5

Induction Step

  β_j(p,q) = P(w_{pq} | N^j_{pq}, G)
           = Σ_{r,s} Σ_{d=p}^{q-1} P(N^j → N^r N^s) * β_r(p,d) * β_s(d+1,q)

Consider different splits of the words, indicated by d: e.g., for "the huge building", the span can be split after "the" or after "huge".
Consider different non-terminals to be used in the rule: NP → DT NN and NP → DT NNS are available options.
Sum over all of these, as in the sketch below.
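Put together, the base case and the induction step give a CKY-style dynamic program that fills the parse triangle bottom-up. The sketch below is one possible implementation under the assumptions already stated (it reuses PCFG_RULES and LEXICON from the earlier sketch; the single unary rule NP → NNS is handled by a one-pass helper, which suffices because this toy grammar has no unary chains):

```python
from collections import defaultdict

def apply_unary(beta, p, q):
    # Handle unary non-lexical rules such as NP -> NNS.
    # One pass suffices here because the toy grammar has no unary chains.
    for (lhs, rhs), prob in PCFG_RULES.items():
        if len(rhs) == 1:
            beta[(lhs, p, q)] += prob * beta[(rhs[0], p, q)]

def inside_probs(words):
    """Inside probabilities beta[(j, p, q)] = P(w_p..w_q | N^j_pq, G),
    with 1-based, inclusive word positions."""
    m = len(words)
    beta = defaultdict(float)
    # Base case: beta_j(k, k) = P(N^j -> w_k)
    for k, word in enumerate(words, start=1):
        for (tag, w), prob in LEXICON.items():
            if w == word:
                beta[(tag, k, k)] += prob
        apply_unary(beta, k, k)
    # Induction: sum over binary rules N^j -> N^r N^s and split points d
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (lhs, rhs), prob in PCFG_RULES.items():
                if len(rhs) == 2:
                    r, s = rhs
                    for d in range(p, q):
                        beta[(lhs, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
            apply_unary(beta, p, q)
    return beta
```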

The Bottom-Up Approach (the idea of induction)

Consider "the gunman".
Base cases: apply the unary (lexical) rules
  DT → the      Prob = 1.0
  NN → gunman   Prob = 0.5
Induction: probability that an NP covers these two words
  = P(NP → DT NN) * P(DT deriving the word "the") * P(NN deriving the word "gunman")
  = 0.5 * 1.0 * 0.5 = 0.25

Parse Triangle

A parse triangle is constructed for calculating β_j(p,q).
Probability of a sentence using β_j(p,q):
  P(w_{1m} | G) = P(N^1 ⇒ w_{1m} | G) = P(w_{1m} | N^1_{1m}, G) = β_1(1,m)

Parse Triangle (diagonal)

Word positions: The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)

Fill the diagonal with β_j(k,k):
  β_DT(1,1) = 1.0,  β_NN(2,2) = 0.5,  β_VBD(3,3) = 1.0,  β_DT(4,4) = 1.0,
  β_NN(5,5) = 0.5,  β_P(6,6) = 1.0,   β_NNS(7,7) = 1.0

Parse Triangle (cont.)

In addition to the diagonal entries, calculate β_NP(1,2) using the induction formula:
  β_NP(1,2) = P(the gunman | NP_{1,2}, G)
            = P(NP → DT NN) * β_DT(1,1) * β_NN(2,2)
            = 0.5 * 1.0 * 0.5 = 0.25

Example Parse t1: "The gunman sprayed the building with bullets."
The rule used at the top VP node here is VP → VP PP:

[S 1.0 [NP 0.5 [DT 1.0 The] [NN 0.5 gunman]]
       [VP 0.6 [VP 0.4 [VBD 1.0 sprayed] [NP 0.5 [DT 1.0 the] [NN 0.5 building]]]
               [PP 1.0 [P 1.0 with] [NP 0.3 [NNS 1.0 bullets]]]]]

Another Parse t2: "The gunman sprayed the building with bullets."
The rule used at the top VP node here is VP → VBD NP:

[S 1.0 [NP 0.5 [DT 1.0 The] [NN 0.5 gunman]]
       [VP 0.4 [VBD 1.0 sprayed]
               [NP 0.2 [NP 0.5 [DT 1.0 the] [NN 0.5 building]]
                       [PP 1.0 [P 1.0 with] [NP 0.3 [NNS 1.0 bullets]]]]]]

Parse Triangle (completed)

Word positions: The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)

Diagonal:     β_DT(1,1) = 1.0,  β_NN(2,2) = 0.5,  β_VBD(3,3) = 1.0,  β_DT(4,4) = 1.0,  β_NN(5,5) = 0.5,  β_P(6,6) = 1.0,  β_NNS(7,7) = 1.0
Larger spans: β_NP(1,2) = 0.25,  β_NP(4,5) = 0.25,  β_PP(6,7) = 0.3,  β_VP(3,5) = 0.1,  β_NP(4,7) = 0.015,  β_VP(3,7) = 0.024,  β_S(1,7) = 0.006

For example:
  β_VP(3,7) = P(sprayed the building with bullets | VP_{3,7}, G)
            = P(VP → VP PP) * β_VP(3,5) * β_PP(6,7) + P(VP → VBD NP) * β_VBD(3,3) * β_NP(4,7)
            = 0.6 * 0.1 * 0.3 + 0.4 * 1.0 * 0.015 = 0.018 + 0.006 = 0.024
and β_S(1,7) = P(S → NP VP) * β_NP(1,2) * β_VP(3,7) = 1.0 * 0.25 * 0.024 = 0.006 = P(t1) + P(t2).
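Running the inside-probability sketch from above on the example sentence reproduces these entries (values subject to floating-point rounding; words lower-cased to match the toy lexicon):

```python
words = "the gunman sprayed the building with bullets".split()
beta = inside_probs(words)
print(beta[("NP", 1, 2)])   # 0.25
print(beta[("VP", 3, 5)])   # 0.1  = 0.4 * 1.0 * 0.25
print(beta[("VP", 3, 7)])   # 0.024
print(beta[("S", 1, 7)])    # 0.006 = P(t1) + P(t2)
```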

Different Parses

Consider:
- different splitting points, e.g., the 5th and the 3rd position;
- different rules for VP expansion, e.g., VP → VP PP and VP → VBD NP.

Different parses for the VP "sprayed the building with bullets" can be constructed this way.

Outside Probabilities α_j(p,q)

Base case:
  α_1(1,m) = 1 for the start symbol
  α_j(1,m) = 0 for j ≠ 1

Inductive step for calculating α_j(p,q), illustrated for the case where N^j_{pq} is the left child of its parent N^f_{pe}, with sibling N^g_{(q+1)e}:
  α_j(p,q) = Σ_{f,g,e} α_f(p,e) * P(N^f → N^j N^g) * β_g(q+1,e)
  (plus the symmetric term in which N^j_{pq} is the right child of its parent)

Summation over f, g and e.
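As a companion to the inside sketch, the outside recursion can be written top-down over decreasing span lengths, with each parent cell pushing probability mass to its two children. The sketch below is an assumed implementation restricted to the binary rules of the toy grammar (the slide's figure shows only the left-child case; the code also adds the symmetric right-child contribution), again reusing PCFG_RULES and the beta table computed earlier:

```python
from collections import defaultdict

def outside_probs(words, beta, start_symbol="S"):
    """Outside probabilities alpha[(j, p, q)], computed from the inside
    probabilities beta; only binary rules take part in the recursion."""
    m = len(words)
    alpha = defaultdict(float)
    alpha[(start_symbol, 1, m)] = 1.0          # base case: alpha_1(1, m) = 1
    for span in range(m, 1, -1):               # parents with larger spans first
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (f, rhs), prob in PCFG_RULES.items():
                if len(rhs) != 2:
                    continue
                left, right = rhs
                parent = alpha[(f, p, q)]
                if parent == 0.0:
                    continue
                for d in range(p, q):
                    # N^f_{p,q} -> N^left_{p,d}  N^right_{d+1,q}
                    alpha[(left, p, d)]      += parent * prob * beta[(right, d + 1, q)]
                    alpha[(right, d + 1, q)] += parent * prob * beta[(left, p, d)]
    return alpha
```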

Probability of a Sentence

The joint probability of the sentence w_{1m} and of there being a constituent spanning words w_p to w_q is:
  P(w_{1m}, N_{pq} | G) = Σ_j P(w_{1m}, N^j_{pq} | G) = Σ_j α_j(p,q) * β_j(p,q)

E.g., for the sentence "The gunman sprayed the building with bullets" (positions 1-7):
  P(The gunman ... bullets, N_{4,5} | G) = α_NP(4,5) * β_NP(4,5) + α_VP(4,5) * β_VP(4,5) + ...
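Continuing the running example with the sketches above, the span (4,5) ("the building") gives a quick check: only the NP term is non-zero there, and α_NP(4,5) * β_NP(4,5) = 0.024 * 0.25 = 0.006 = P(t1) + P(t2), since "the building" is an NP constituent in both parses.

```python
alpha = outside_probs(words, beta)
nonterminals = {lhs for (lhs, _rhs) in PCFG_RULES} | {tag for (tag, _w) in LEXICON}
p_np_span = sum(alpha[(j, 4, 5)] * beta[(j, 4, 5)] for j in nonterminals)
print(p_np_span)   # ~0.006; only the NP term contributes for this span
```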