{Probabilistic Stochastic} Context-Free Grammars (PCFGs)

Size: px
Start display at page:

Download "{Probabilistic Stochastic} Context-Free Grammars (PCFGs)"

Transcription

1 {Probabilistic Stochastic} Context-Free Grammars (PCFGs) 116

2 The velocity of the seismic waves rises to... S NP sg VP sg DT NN PP risesto... The velocity IN NP pl of the seismic waves 117

3 PCFGs APCFGGconsists of: A set of terminals, {w k },k=1,...,v A set of nonterminals, {N i },i =1,...,n A designated start symbol, N 1 A set of rules, {N i ζ j }, (where ζ j is a sequence of terminals and nonterminals) A corresponding set of probabilities on rules such that: i P(N i ζ j ) = 1 j 118

4 PCFG notation Sentence: sequence of words w 1 w m w ab : the subsequence w a w b N i ab : nonterminal Ni dominates w a w b N j N i w a w b ζ: Repeated derivation from N i gives ζ. 119

5 PCFG probability of a string P(w 1n ) = t P(w 1n,t) t a parse of w 1n = {t:yield(t)=w 1n } P(t) 120

6 A simple PCFG (in CNF) S NP VP 1.0 NP NP PP 0.4 PP PNP 1.0 NP astronomers 0.1 VP VNP 0.7 NP ears 0.18 VP VP PP 0.3 NP saw 0.04 P with 1.0 NP stars 0.18 V saw 1.0 NP telescopes

7 t 1 : S 1.0 NP 0.1 VP 0.7 astronomers V 1.0 NP 0.4 saw NP 0.18 PP 1.0 stars P 1.0 NP 0.18 with ears 122

8 t 2 : S 1.0 NP 0.1 VP 0.3 astronomers VP 0.7 PP 1.0 V 1.0 NP 0.18 P 1.0 NP 0.18 saw stars with ears 123

9 The two parse trees probabilities and the sentence probability P(t 1 ) = = P(t 2 ) = = P(w 15 ) = P(t 1 )+P(t 2 ) =

10 Assumptions of PCFGs 1. Place invariance (like time invariance in HMM): 2. Context-free: k P(N j k(k+c) ζ)is the same P(N j kl ζ words outside w k...w l )=P(N j kl ζ) 3. Ancestor-free: P(N j kl ζ ancestor nodes of Nj kl ) = P(Nj kl ζ) 125

11 Let the upper left index in i N j be an arbitrary identifying index for a particular token of a nonterminal. Then, P 2 NP 1 S the man 3 VP snores = P(1 S NP 12 VP33, 2 NP 12 the 1 man 2, 3 VP 33 sno =... = P(S NP V P)P(NP the man)p(v P snores) 126

12 Some features of PCFGs Reasons to use a PCFG, and some idea of their limitations: Partial solution for grammar ambiguity: a PCFG gives some idea of the plausibility of a sentence. But not a very good idea, as not lexicalized. Better for grammar induction (Gold 1967) Robustness. (Admit everything with low probability.) 127

13 Some features of PCFGs Gives a probabilistic language model for English. In practice, a PCFG is a worse language model for English than a trigram model. Can hope to combine the strengths of a PCFG and a trigram model. PCFG encodes certain biases, e.g., that smaller trees are normally more probable. 128

14 Improper (inconsistent) distributions S rhubarb P = 1 3 S S S P = 2 3 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb P(L) = = = 2 ( ) 2 2 ( Improper/inconsistent distribution 27 ) 3 2 = Not a problem if you estimate from parsed treebank: Chi and Geman 1998). 129

15 Questions for PCFGs Just as for HMMs, there are three basic questions we wish to answer: P(w 1m G) arg max t P(t w 1m,G) Learning algorithm. Find G such that P(w 1m G) is maximized. 130

16 Chomsky Normal Form grammars We ll do the case of Chomsky Normal Form grammars, which only have rules of the form: N i N j N k N i w j Any CFG can be represented by a weakly equivalent CFG in Chomsky Normal Form. It s straightforward to generalize the algorithm (recall chart parsing). 131

17 PCFG parameters We ll do the case of Chomsky Normal Form grammars, which only have rules of the form: N i N j N k N i w j The parameters of a CNF PCFG are: P(N j N r N s G) A n 3 matrix of parameters P(N j w k G) An nt matrix of parameters For j = 1,...,n, P(N j N r N s )+ P(N j w k ) = 1 r,s k 132

18 Probabilistic Regular Grammar: N i w j N k N i w j Start state, N 1 HMM: P(w 1n ) = 1 n w 1n whereas in a PCFG or a PRG: P(w) = 1 w L 133

19 Consider: P(John decided to bake a) High probability in HMM, low probability in a PRG or a PCFG. Implement via sink state. APRG Π Start HMM Finish 134

20 Comparison of HMMs (PRGs) and PCFGs X: NP N N N sink O: the big brown box NP the N big N brown N 0 box 135

21 Inside and outside probabilities This suggests: whereas for an HMM we have: Forwards = α i (t) = P(w 1(t 1),X t =i) Backwards = β i (t) = P(w tt X t = i) for a PCFG we make use of Inside and Outside probabilities, defined as follows: Outside = α j (p, q) = P(w 1(p 1),Npq,w j (q+1)m G) Inside = β j (p, q) = P(w pq Npq,G) j A slight generalization of dynamic Bayes Nets covers PCFG inference by the inside-outside algorithm (and-or tree of conjunctive daughters disjunctively chosen) 136

22 Inside and outside probabilities in PCFGs. N 1 α w 1 N j β w p 1 w p w q w q+1 w m 137

23 Probability of a string Inside probability P(w 1m G) = P(N 1 w 1m G) = P(w 1m,N1m 1,G) =β 1(1,m) Base case: We want to find β j (k, k) (the probability of a rule N j w k ): β j (k, k) = P(w k N j kk,g) = P(N j w k G) 138

24 Induction: We want to find β j (p, q), forp<q. As this is the inductive step using a Chomsky Normal Form grammar, the first rule must be of the form N j N r N s, so we can proceed by induction, dividing the string in two in various places and summing the result: N j N r N s w p w d w d+1 w q These inside probabilities can be calculated bottom up. 139

25 For all j, β j (p, q) = P(w pq N j pq,g) = r,s = r,s = r,s = r,s q 1 d=p q 1 d=p P(w pd,n r pd,w (d+1)q,n s (d+1)q Nj pq,g) P(N r pd,ns (d+1)q Nj pq,g) P(w pd N j pq,n r pd,ns (d+1)q,g) P(w (d+1)q N j pq,n r pd,ns (d+1)q,w pd,g) q 1 d=p P(N r pd,ns (d+1)q Nj pq,g) P(w pd N r pd, G)P(w (d+1)q N s (d+1)q,g) q 1 d=p P(N j N r N s )β r (p, d)β s (d + 1,q) 140

26 Calculation of inside probabilities (CKY algorithm) β NP = 0.1 β S = β S = β NP = 0.04 β VP = β VP = β V = β NP = 0.18 β NP = β P = 1.0 β PP = β NP = 0.18 astronomers saw stars with ears 141

27 Outside probabilities Probability of a string: For any k, 1 k m, P(w 1m G) = j P(w 1(k 1),w k,w (k+1)m,n j kk G) = j P(w 1(k 1),N j kk,w (k+1)m G) = j P(w k w 1(k 1),N j kk,w (k+1)n,g) α j (k, k)p(n j w k ) Inductive (DP) calculation: One calculates the outside probabilities top down (after determining the inside probabilities). 142

28 Outside probabilities Base Case: α 1 (1,m)=1 α j (1,m)=0,forj 1 Inductive Case: N 1 N f pe N j pq N g (q+1)e w 1 w p 1 w p w q w q+1 w e w e+1 w m 143

29 Outside probabilities Base Case: α 1 (1,m)=1 α j (1,m)=0,forj 1 Inductive Case: it s either a left or right branch we will some over both possibilities and calculate using outside and inside probabilities N 1 N f pe N j pq N g (q+1)e w 1 w p 1 w p w q w q+1 w e w e+1 w m 144

30 Outside probabilities inductive case AnodeNpq j might be the left or right branch of the parent node. We sum over both possibilities. N 1 N f eq N g e(p 1) N j pq w 1 w e 1 w e w p 1 w p w q w q+1 w m 145

31 Inductive Case: α j (p, q) = [ f,g m e=q+1 +[ f,g = [ m P(w 1(p 1),w (q+1)m,n f pe,n j pq,n g (q+1)e )] p 1 e=1 f,gnej e=q+1 P(w 1(p 1),w (q+1)m,n f eq,n g e(p 1),Nj pq)] P(w 1(p 1),w (e+1)m,n f pe)p(n j pq,n g (q+1)e Nf pe) p 1 P(w (q+1)e N g (q+1)e )] + [ P(w 1(e 1),w (q+1)m,neq) f f,g e=1 P(N g e(p 1),Nj pq Neq)P(w f e(p 1) N g e(p 1 )] = [ m α f (p, e)p(n f N j N g )β g (q + 1,e)] f,g e=q+1 +[ f,g p 1 e=1 α f (e, q)p(n f N g N j )β g (e, p 1)] 146

32 Overall probability of a node existing As with a HMM, we can form a product of the inside and outside probabilities. This time: α j (p, q)β j (p, q) = P(w 1(p 1),Npq,w j (q+1)m G)P(w pq Npq,G) j =P(w 1m,Npq G) j Therefore, p(w 1m,N pq G) = j α j (p, q)β j (p, q) Just in the cases of the root node and the preterminals, we know there will always be some such constituent. 147

33 Training a PCFG We construct an EM training algorithm, as for HMMs. We would like to calculate how often each rule is used: ˆP(N j ζ) = C(Nj ζ) γ C(N j γ) Have data count; else work iteratively from expectations of current model. Consider: α j (p, q)β j (p, q) = P(N 1 w 1m,N j = P(N 1 w 1m G)P(N j wpq G) wpq N 1 w 1m,G) We have already solved how to calculate P(N 1 w 1m ); let us call this probability π. Then: and P(N j w pq N 1 E(N j is used in the derivation) = w 1m,G)= α j(p, q)β j (p, q) π m m p=1 q=p α j (p, q)β j (p, q) π 148

34 In the case where we are not dealing with a preterminal, we substitute the inductive definition of β, and r,s,p > q: P(N j N r N s w pq N 1 w 1n,G)= q 1 d=p α j(p, q)p(n j N r N s )β r (p, d)β s (d + 1,q) Therefore the expectation is: E(N j N r N s,n j used) m 1 m q 1 p=1 q=p+1 d=p α j(p, q)p(n j N r N s )β r (p, d)β s (d + 1,q) π Now for the maximization step, we want: π P(N j N r N s ) = E(Nj N r N s,n j used) E(N j used) 149

35 Therefore, the reestimation formula, ˆP(N j N r N s ) is the quotient: ˆP(N j N r N s ) = Similarly, m 1 m q 1 p=1 q=p+1 d=p α j(p, q)p(n j N r N s )β r (p, d)β s (d + 1,q) m m p=1 q=1 α j(p, q)β j (p, q) E(N j w k N 1 w 1m,G)= mh=1 α j (h, h)p(n j w h,w h =w k ) Therefore, ˆP(N j w k ) = π mh=1 α j (h, h)p(n j w h,w h =w k ) mp=1 mq=1 α j (p, q)β j (p, q) Inside-Outside algorithm: repeat this process until the estimated probability change is small. 150

36 Multiple training instances: if we have training sentences W = (W 1,...W ω ), with W i = (w 1,...,w mi ) and we let u and v bet the common subterms from before: and u i (p,q,j,r,s)= q 1 d=p α j(p, q)p(n j N r N s )β r (p, d)β s (d + 1,q) P(N 1 W i G) v i (p,q,j)= α j(p, q)β j (p, q) P(N 1 W i G) Assuming the observations are independent, we can sum contributions: ωi=1 mi 1 mi ˆP(N j N r N s p=1 q=p+1 ) = u i(p,q,j,r,s) ωi=1 mi mi p=1 q=pv i (p,q,j) and ωi=1 ˆP(N j w k ) = ωi=1 mi p=1 {h:w h =w k } v i(h, h, j) mi q=p v i (p,q,j) 151

37 Problems with the Inside-Outside algorithm 1. Slow. Each iteration is O(m 3 n 3 ), where m = ω i=1 m i, and n is the number of nonterminals in the grammar. 2. Local maxima are much more of a problem. Charniak reports that on each trial a different local maximum was found. Use simulated annealing? Restrict rules by initializing some parameters to zero? Or HMM initialization? Reallocate nonterminals away from greedy terminals? 3. Lari and Young suggest that you need many more nonterminals available than are theoretically necessary to get good grammar learning (about a threefold increase?). This compounds the first problem. 4. There is no guarantee that the nonterminals that the algorithm learns will have any satisfactory resemblance to the kinds of nonterminals normally motivated in linguistic analysis (NP, VP, etc.). 152

Probabilistic Context-Free Grammar

Probabilistic Context-Free Grammar Probabilistic Context-Free Grammar Petr Horáček, Eva Zámečníková and Ivana Burgetová Department of Information Systems Faculty of Information Technology Brno University of Technology Božetěchova 2, 612

More information

Natural Language Processing : Probabilistic Context Free Grammars. Updated 5/09

Natural Language Processing : Probabilistic Context Free Grammars. Updated 5/09 Natural Language Processing : Probabilistic Context Free Grammars Updated 5/09 Motivation N-gram models and HMM Tagging only allowed us to process sentences linearly. However, even simple sentences require

More information

PCFGs 2 L645 / B659. Dept. of Linguistics, Indiana University Fall PCFGs 2. Questions. Calculating P(w 1m ) Inside Probabilities

PCFGs 2 L645 / B659. Dept. of Linguistics, Indiana University Fall PCFGs 2. Questions. Calculating P(w 1m ) Inside Probabilities 1 / 22 Inside L645 / B659 Dept. of Linguistics, Indiana University Fall 2015 Inside- 2 / 22 for PCFGs 3 questions for Probabilistic Context Free Grammars (PCFGs): What is the probability of a sentence

More information

Probabilistic Context Free Grammars

Probabilistic Context Free Grammars 1 Defining PCFGs A PCFG G consists of Probabilistic Context Free Grammars 1. A set of terminals: {w k }, k = 1..., V 2. A set of non terminals: { i }, i = 1..., n 3. A designated Start symbol: 1 4. A set

More information

Advanced Natural Language Processing Syntactic Parsing

Advanced Natural Language Processing Syntactic Parsing Advanced Natural Language Processing Syntactic Parsing Alicia Ageno ageno@cs.upc.edu Universitat Politècnica de Catalunya NLP statistical parsing 1 Parsing Review Statistical Parsing SCFG Inside Algorithm

More information

Maschinelle Sprachverarbeitung

Maschinelle Sprachverarbeitung Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other

More information

Maschinelle Sprachverarbeitung

Maschinelle Sprachverarbeitung Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other

More information

Processing/Speech, NLP and the Web

Processing/Speech, NLP and the Web CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 25 Probabilistic Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 14 th March, 2011 Bracketed Structure: Treebank Corpus [ S1[

More information

Natural Language Processing CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Natural Language Processing CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science Natural Language Processing CS 6840 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Statistical Parsing Define a probabilistic model of syntax P(T S):

More information

CS : Speech, NLP and the Web/Topics in AI

CS : Speech, NLP and the Web/Topics in AI CS626-449: Speech, NLP and the Web/Topics in AI Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture-17: Probabilistic parsing; insideoutside probabilities Probability of a parse tree (cont.) S 1,l NP 1,2

More information

Probabilistic Context-Free Grammars. Michael Collins, Columbia University

Probabilistic Context-Free Grammars. Michael Collins, Columbia University Probabilistic Context-Free Grammars Michael Collins, Columbia University Overview Probabilistic Context-Free Grammars (PCFGs) The CKY Algorithm for parsing with PCFGs A Probabilistic Context-Free Grammar

More information

Probabilistic Context Free Grammars. Many slides from Michael Collins and Chris Manning

Probabilistic Context Free Grammars. Many slides from Michael Collins and Chris Manning Probabilistic Context Free Grammars Many slides from Michael Collins and Chris Manning Overview I Probabilistic Context-Free Grammars (PCFGs) I The CKY Algorithm for parsing with PCFGs A Probabilistic

More information

Statistical Methods for NLP

Statistical Methods for NLP Statistical Methods for NLP Stochastic Grammars Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(22) Structured Classification

More information

Parsing. Based on presentations from Chris Manning s course on Statistical Parsing (Stanford)

Parsing. Based on presentations from Chris Manning s course on Statistical Parsing (Stanford) Parsing Based on presentations from Chris Manning s course on Statistical Parsing (Stanford) S N VP V NP D N John hit the ball Levels of analysis Level Morphology/Lexical POS (morpho-synactic), WSD Elements

More information

LECTURER: BURCU CAN Spring

LECTURER: BURCU CAN Spring LECTURER: BURCU CAN 2017-2018 Spring Regular Language Hidden Markov Model (HMM) Context Free Language Context Sensitive Language Probabilistic Context Free Grammar (PCFG) Unrestricted Language PCFGs can

More information

Probabilistic Context-free Grammars

Probabilistic Context-free Grammars Probabilistic Context-free Grammars Computational Linguistics Alexander Koller 24 November 2017 The CKY Recognizer S NP VP NP Det N VP V NP V ate NP John Det a N sandwich i = 1 2 3 4 k = 2 3 4 5 S NP John

More information

Attendee information. Seven Lectures on Statistical Parsing. Phrase structure grammars = context-free grammars. Assessment.

Attendee information. Seven Lectures on Statistical Parsing. Phrase structure grammars = context-free grammars. Assessment. even Lectures on tatistical Parsing Christopher Manning LA Linguistic Institute 7 LA Lecture Attendee information Please put on a piece of paper: ame: Affiliation: tatus (undergrad, grad, industry, prof,

More information

CS460/626 : Natural Language

CS460/626 : Natural Language CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 23, 24 Parsing Algorithms; Parsing in case of Ambiguity; Probabilistic Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 8 th,

More information

Chapter 14 (Partially) Unsupervised Parsing

Chapter 14 (Partially) Unsupervised Parsing Chapter 14 (Partially) Unsupervised Parsing The linguistically-motivated tree transformations we discussed previously are very effective, but when we move to a new language, we may have to come up with

More information

The Infinite PCFG using Hierarchical Dirichlet Processes

The Infinite PCFG using Hierarchical Dirichlet Processes S NP VP NP PRP VP VBD NP NP DT NN PRP she VBD heard DT the NN noise S NP VP NP PRP VP VBD NP NP DT NN PRP she VBD heard DT the NN noise S NP VP NP PRP VP VBD NP NP DT NN PRP she VBD heard DT the NN noise

More information

Multilevel Coarse-to-Fine PCFG Parsing

Multilevel Coarse-to-Fine PCFG Parsing Multilevel Coarse-to-Fine PCFG Parsing Eugene Charniak, Mark Johnson, Micha Elsner, Joseph Austerweil, David Ellis, Isaac Haxton, Catherine Hill, Shrivaths Iyengar, Jeremy Moore, Michael Pozar, and Theresa

More information

Probabilistic Context Free Grammars. Many slides from Michael Collins

Probabilistic Context Free Grammars. Many slides from Michael Collins Probabilistic Context Free Grammars Many slides from Michael Collins Overview I Probabilistic Context-Free Grammars (PCFGs) I The CKY Algorithm for parsing with PCFGs A Probabilistic Context-Free Grammar

More information

Quiz 1, COMS Name: Good luck! 4705 Quiz 1 page 1 of 7

Quiz 1, COMS Name: Good luck! 4705 Quiz 1 page 1 of 7 Quiz 1, COMS 4705 Name: 10 30 30 20 Good luck! 4705 Quiz 1 page 1 of 7 Part #1 (10 points) Question 1 (10 points) We define a PCFG where non-terminal symbols are {S,, B}, the terminal symbols are {a, b},

More information

Natural Language Processing

Natural Language Processing SFU NatLangLab Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University September 27, 2018 0 Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class

More information

Expectation Maximization (EM)

Expectation Maximization (EM) Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted

More information

Statistical Methods for NLP

Statistical Methods for NLP Statistical Methods for NLP Sequence Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(21) Introduction Structured

More information

Parsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22

Parsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22 Parsing Probabilistic CFG (PCFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 22 Table of contents 1 Introduction 2 PCFG 3 Inside and outside probability 4 Parsing Jurafsky

More information

CS460/626 : Natural Language

CS460/626 : Natural Language CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 27 SMT Assignment; HMM recap; Probabilistic Parsing cntd) Pushpak Bhattacharyya CSE Dept., IIT Bombay 17 th March, 2011 CMU Pronunciation

More information

Introduction to Probablistic Natural Language Processing

Introduction to Probablistic Natural Language Processing Introduction to Probablistic Natural Language Processing Alexis Nasr Laboratoire d Informatique Fondamentale de Marseille Natural Language Processing Use computers to process human languages Machine Translation

More information

Probabilistic Linguistics

Probabilistic Linguistics Matilde Marcolli MAT1509HS: Mathematical and Computational Linguistics University of Toronto, Winter 2019, T 4-6 and W 4, BA6180 Bernoulli measures finite set A alphabet, strings of arbitrary (finite)

More information

CKY & Earley Parsing. Ling 571 Deep Processing Techniques for NLP January 13, 2016

CKY & Earley Parsing. Ling 571 Deep Processing Techniques for NLP January 13, 2016 CKY & Earley Parsing Ling 571 Deep Processing Techniques for NLP January 13, 2016 No Class Monday: Martin Luther King Jr. Day CKY Parsing: Finish the parse Recognizer à Parser Roadmap Earley parsing Motivation:

More information

DT2118 Speech and Speaker Recognition

DT2118 Speech and Speaker Recognition DT2118 Speech and Speaker Recognition Language Modelling Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 56 Outline Introduction Formal Language Theory Stochastic Language Models (SLM) N-gram Language

More information

In this chapter, we explore the parsing problem, which encompasses several questions, including:

In this chapter, we explore the parsing problem, which encompasses several questions, including: Chapter 12 Parsing Algorithms 12.1 Introduction In this chapter, we explore the parsing problem, which encompasses several questions, including: Does L(G) contain w? What is the highest-weight derivation

More information

Parsing with Context-Free Grammars

Parsing with Context-Free Grammars Parsing with Context-Free Grammars CS 585, Fall 2017 Introduction to Natural Language Processing http://people.cs.umass.edu/~brenocon/inlp2017 Brendan O Connor College of Information and Computer Sciences

More information

CMPT-825 Natural Language Processing. Why are parsing algorithms important?

CMPT-825 Natural Language Processing. Why are parsing algorithms important? CMPT-825 Natural Language Processing Anoop Sarkar http://www.cs.sfu.ca/ anoop October 26, 2010 1/34 Why are parsing algorithms important? A linguistic theory is implemented in a formal system to generate

More information

Parsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing

Parsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing L445 / L545 / B659 Dept. of Linguistics, Indiana University Spring 2016 1 / 46 : Overview Input: a string Output: a (single) parse tree A useful step in the process of obtaining meaning We can view the

More information

Parsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46.

Parsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46. : Overview L545 Dept. of Linguistics, Indiana University Spring 2013 Input: a string Output: a (single) parse tree A useful step in the process of obtaining meaning We can view the problem as searching

More information

Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing

Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Natural Language Processing CS 4120/6120 Spring 2017 Northeastern University David Smith with some slides from Jason Eisner & Andrew

More information

Unit 2: Tree Models. CS 562: Empirical Methods in Natural Language Processing. Lectures 19-23: Context-Free Grammars and Parsing

Unit 2: Tree Models. CS 562: Empirical Methods in Natural Language Processing. Lectures 19-23: Context-Free Grammars and Parsing CS 562: Empirical Methods in Natural Language Processing Unit 2: Tree Models Lectures 19-23: Context-Free Grammars and Parsing Oct-Nov 2009 Liang Huang (lhuang@isi.edu) Big Picture we have already covered...

More information

CS 188 Introduction to AI Fall 2005 Stuart Russell Final

CS 188 Introduction to AI Fall 2005 Stuart Russell Final NAME: SID#: Section: 1 CS 188 Introduction to AI all 2005 Stuart Russell inal You have 2 hours and 50 minutes. he exam is open-book, open-notes. 100 points total. Panic not. Mark your answers ON HE EXAM

More information

A Context-Free Grammar

A Context-Free Grammar Statistical Parsing A Context-Free Grammar S VP VP Vi VP Vt VP VP PP DT NN PP PP P Vi sleeps Vt saw NN man NN dog NN telescope DT the IN with IN in Ambiguity A sentence of reasonable length can easily

More information

Soft Inference and Posterior Marginals. September 19, 2013

Soft Inference and Posterior Marginals. September 19, 2013 Soft Inference and Posterior Marginals September 19, 2013 Soft vs. Hard Inference Hard inference Give me a single solution Viterbi algorithm Maximum spanning tree (Chu-Liu-Edmonds alg.) Soft inference

More information

This kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this.

This kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this. Chapter 12 Synchronous CFGs Synchronous context-free grammars are a generalization of CFGs that generate pairs of related strings instead of single strings. They are useful in many situations where one

More information

Lecture 15. Probabilistic Models on Graph

Lecture 15. Probabilistic Models on Graph Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how

More information

A* Search. 1 Dijkstra Shortest Path

A* Search. 1 Dijkstra Shortest Path A* Search Consider the eight puzzle. There are eight tiles numbered 1 through 8 on a 3 by three grid with nine locations so that one location is left empty. We can move by sliding a tile adjacent to the

More information

Introduction to Computational Linguistics

Introduction to Computational Linguistics Introduction to Computational Linguistics Olga Zamaraeva (2018) Based on Bender (prev. years) University of Washington May 3, 2018 1 / 101 Midterm Project Milestone 2: due Friday Assgnments 4& 5 due dates

More information

S NP VP 0.9 S VP 0.1 VP V NP 0.5 VP V 0.1 VP V PP 0.1 NP NP NP 0.1 NP NP PP 0.2 NP N 0.7 PP P NP 1.0 VP NP PP 1.0. N people 0.

S NP VP 0.9 S VP 0.1 VP V NP 0.5 VP V 0.1 VP V PP 0.1 NP NP NP 0.1 NP NP PP 0.2 NP N 0.7 PP P NP 1.0 VP  NP PP 1.0. N people 0. /6/7 CS 6/CS: Natural Language Processing Instructor: Prof. Lu Wang College of Computer and Information Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang The grammar: Binary, no epsilons,.9..5

More information

Lecture 12: Algorithms for HMMs

Lecture 12: Algorithms for HMMs Lecture 12: Algorithms for HMMs Nathan Schneider (some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes) ENLP 26 February 2018 Recap: tagging POS tagging is a sequence labelling task.

More information

A Supertag-Context Model for Weakly-Supervised CCG Parser Learning

A Supertag-Context Model for Weakly-Supervised CCG Parser Learning A Supertag-Context Model for Weakly-Supervised CCG Parser Learning Dan Garrette Chris Dyer Jason Baldridge Noah A. Smith U. Washington CMU UT-Austin CMU Contributions 1. A new generative model for learning

More information

Natural Language Processing 1. lecture 7: constituent parsing. Ivan Titov. Institute for Logic, Language and Computation

Natural Language Processing 1. lecture 7: constituent parsing. Ivan Titov. Institute for Logic, Language and Computation atural Language Processing 1 lecture 7: constituent parsing Ivan Titov Institute for Logic, Language and Computation Outline Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing

More information

Lecture 5: UDOP, Dependency Grammars

Lecture 5: UDOP, Dependency Grammars Lecture 5: UDOP, Dependency Grammars Jelle Zuidema ILLC, Universiteit van Amsterdam Unsupervised Language Learning, 2014 Generative Model objective PCFG PTSG CCM DMV heuristic Wolff (1984) UDOP ML IO K&M

More information

The Inside-Outside Algorithm

The Inside-Outside Algorithm The Inside-Outside Algorithm Karl Stratos 1 The Setup Our data consists of Q observations O 1,..., O Q where each observation is a sequence of symbols in some set X = {1,..., n}. We suspect a Probabilistic

More information

The effect of non-tightness on Bayesian estimation of PCFGs

The effect of non-tightness on Bayesian estimation of PCFGs The effect of non-tightness on Bayesian estimation of PCFGs hay B. Cohen Department of Computer cience Columbia University scohen@cs.columbia.edu Mark Johnson Department of Computing Macquarie University

More information

Context- Free Parsing with CKY. October 16, 2014

Context- Free Parsing with CKY. October 16, 2014 Context- Free Parsing with CKY October 16, 2014 Lecture Plan Parsing as Logical DeducBon Defining the CFG recognibon problem BoHom up vs. top down Quick review of Chomsky normal form The CKY algorithm

More information

Lecture 12: Algorithms for HMMs

Lecture 12: Algorithms for HMMs Lecture 12: Algorithms for HMMs Nathan Schneider (some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes) ENLP 17 October 2016 updated 9 September 2017 Recap: tagging POS tagging is a

More information

Context Free Grammars: Introduction

Context Free Grammars: Introduction Context Free Grammars: Introduction Context free grammar (CFG) G = (V, Σ, R, S), where V is a set of grammar symbols Σ V is a set of terminal symbols R is a set of rules, where R (V Σ) V S is a distinguished

More information

Probabilistic Context-Free Grammars and beyond

Probabilistic Context-Free Grammars and beyond Probabilistic Context-Free Grammars and beyond Mark Johnson Microsoft Research / Brown University July 2007 1 / 87 Outline Introduction Formal languages and Grammars Probabilistic context-free grammars

More information

Grammar formalisms Tree Adjoining Grammar: Formal Properties, Parsing. Part I. Formal Properties of TAG. Outline: Formal Properties of TAG

Grammar formalisms Tree Adjoining Grammar: Formal Properties, Parsing. Part I. Formal Properties of TAG. Outline: Formal Properties of TAG Grammar formalisms Tree Adjoining Grammar: Formal Properties, Parsing Laura Kallmeyer, Timm Lichte, Wolfgang Maier Universität Tübingen Part I Formal Properties of TAG 16.05.2007 und 21.05.2007 TAG Parsing

More information

Multiword Expression Identification with Tree Substitution Grammars

Multiword Expression Identification with Tree Substitution Grammars Multiword Expression Identification with Tree Substitution Grammars Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning Stanford University EMNLP 2011 Main Idea Use syntactic

More information

Cross-Entropy and Estimation of Probabilistic Context-Free Grammars

Cross-Entropy and Estimation of Probabilistic Context-Free Grammars Cross-Entropy and Estimation of Probabilistic Context-Free Grammars Anna Corazza Department of Physics University Federico II via Cinthia I-8026 Napoli, Italy corazza@na.infn.it Giorgio Satta Department

More information

To make a grammar probabilistic, we need to assign a probability to each context-free rewrite

To make a grammar probabilistic, we need to assign a probability to each context-free rewrite Notes on the Inside-Outside Algorithm To make a grammar probabilistic, we need to assign a probability to each context-free rewrite rule. But how should these probabilities be chosen? It is natural to

More information

Lab 12: Structured Prediction

Lab 12: Structured Prediction December 4, 2014 Lecture plan structured perceptron application: confused messages application: dependency parsing structured SVM Class review: from modelization to classification What does learning mean?

More information

Context Free Grammars: Introduction. Context Free Grammars: Simplifying CFGs

Context Free Grammars: Introduction. Context Free Grammars: Simplifying CFGs Context Free Grammars: Introduction CFGs are more powerful than RGs because of the following 2 properties: 1. Recursion Rule is recursive if it is of the form X w 1 Y w 2, where Y w 3 Xw 4 and w 1, w 2,

More information

Even More on Dynamic Programming

Even More on Dynamic Programming Algorithms & Models of Computation CS/ECE 374, Fall 2017 Even More on Dynamic Programming Lecture 15 Thursday, October 19, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 26 Part I Longest Common Subsequence

More information

Doctoral Course in Speech Recognition. May 2007 Kjell Elenius

Doctoral Course in Speech Recognition. May 2007 Kjell Elenius Doctoral Course in Speech Recognition May 2007 Kjell Elenius CHAPTER 12 BASIC SEARCH ALGORITHMS State-based search paradigm Triplet S, O, G S, set of initial states O, set of operators applied on a state

More information

Decoding and Inference with Syntactic Translation Models

Decoding and Inference with Syntactic Translation Models Decoding and Inference with Syntactic Translation Models March 5, 2013 CFGs S NP VP VP NP V V NP NP CFGs S NP VP S VP NP V V NP NP CFGs S NP VP S VP NP V NP VP V NP NP CFGs S NP VP S VP NP V NP VP V NP

More information

Spectral Unsupervised Parsing with Additive Tree Metrics

Spectral Unsupervised Parsing with Additive Tree Metrics Spectral Unsupervised Parsing with Additive Tree Metrics Ankur Parikh, Shay Cohen, Eric P. Xing Carnegie Mellon, University of Edinburgh Ankur Parikh 2014 1 Overview Model: We present a novel approach

More information

Parsing with Context-Free Grammars

Parsing with Context-Free Grammars Parsing with Context-Free Grammars Berlin Chen 2005 References: 1. Natural Language Understanding, chapter 3 (3.1~3.4, 3.6) 2. Speech and Language Processing, chapters 9, 10 NLP-Berlin Chen 1 Grammars

More information

Roger Levy Probabilistic Models in the Study of Language draft, October 2,

Roger Levy Probabilistic Models in the Study of Language draft, October 2, Roger Levy Probabilistic Models in the Study of Language draft, October 2, 2012 224 Chapter 10 Probabilistic Grammars 10.1 Outline HMMs PCFGs ptsgs and ptags Highlight: Zuidema et al., 2008, CogSci; Cohn

More information

A DOP Model for LFG. Rens Bod and Ronald Kaplan. Kathrin Spreyer Data-Oriented Parsing, 14 June 2005

A DOP Model for LFG. Rens Bod and Ronald Kaplan. Kathrin Spreyer Data-Oriented Parsing, 14 June 2005 A DOP Model for LFG Rens Bod and Ronald Kaplan Kathrin Spreyer Data-Oriented Parsing, 14 June 2005 Lexical-Functional Grammar (LFG) Levels of linguistic knowledge represented formally differently (non-monostratal):

More information

Remembering subresults (Part I): Well-formed substring tables

Remembering subresults (Part I): Well-formed substring tables Remembering subresults (Part I): Well-formed substring tables Detmar Meurers: Intro to Computational Linguistics I OSU, LING 684.01, 1. February 2005 Problem: Inefficiency of recomputing subresults Two

More information

Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing

Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Natural Language Processing! CS 6120 Spring 2014! Northeastern University!! David Smith! with some slides from Jason Eisner & Andrew

More information

Features of Statistical Parsers

Features of Statistical Parsers Features of tatistical Parsers Preliminary results Mark Johnson Brown University TTI, October 2003 Joint work with Michael Collins (MIT) upported by NF grants LI 9720368 and II0095940 1 Talk outline tatistical

More information

Improved TBL algorithm for learning context-free grammar

Improved TBL algorithm for learning context-free grammar Proceedings of the International Multiconference on ISSN 1896-7094 Computer Science and Information Technology, pp. 267 274 2007 PIPS Improved TBL algorithm for learning context-free grammar Marcin Jaworski

More information

6.864: Lecture 5 (September 22nd, 2005) The EM Algorithm

6.864: Lecture 5 (September 22nd, 2005) The EM Algorithm 6.864: Lecture 5 (September 22nd, 2005) The EM Algorithm Overview The EM algorithm in general form The EM algorithm for hidden markov models (brute force) The EM algorithm for hidden markov models (dynamic

More information

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26 Parsing Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 26 Table of contents 1 Context-Free Grammars 2 Simplifying CFGs Removing useless symbols Eliminating

More information

Final, CSC242, May 2009

Final, CSC242, May 2009 Final, CSC242, May 2009 Name: Write your NAME legibly on your bluebook AND EXAM, WHCH YOU WLL ALSO TURN N. This is a 168-minute) test. Work all problems. You may use two double-sided pages of notes. Please

More information

Bayesian Inference for PCFGs via Markov chain Monte Carlo

Bayesian Inference for PCFGs via Markov chain Monte Carlo Bayesian Inference for PCFGs via Markov chain Monte Carlo Mark Johnson Cognitive and Linguistic Sciences Thomas L. Griffiths Department of Psychology Brown University University of California, Berkeley

More information

Algorithms for Syntax-Aware Statistical Machine Translation

Algorithms for Syntax-Aware Statistical Machine Translation Algorithms for Syntax-Aware Statistical Machine Translation I. Dan Melamed, Wei Wang and Ben Wellington ew York University Syntax-Aware Statistical MT Statistical involves machine learning (ML) seems crucial

More information

Latent Variable Models in NLP

Latent Variable Models in NLP Latent Variable Models in NLP Aria Haghighi with Slav Petrov, John DeNero, and Dan Klein UC Berkeley, CS Division Latent Variable Models Latent Variable Models Latent Variable Models Observed Latent Variable

More information

Marrying Dynamic Programming with Recurrent Neural Networks

Marrying Dynamic Programming with Recurrent Neural Networks Marrying Dynamic Programming with Recurrent Neural Networks I eat sushi with tuna from Japan Liang Huang Oregon State University Structured Prediction Workshop, EMNLP 2017, Copenhagen, Denmark Marrying

More information

Language and Statistics II

Language and Statistics II Language and Statistics II Lecture 19: EM for Models of Structure Noah Smith Epectation-Maimization E step: i,, q i # p r $ t = p r i % ' $ t i, p r $ t i,' soft assignment or voting M step: r t +1 # argma

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Einführung in die Computerlinguistik

Einführung in die Computerlinguistik Einführung in die Computerlinguistik Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 22 CFG (1) Example: Grammar G telescope : Productions: S NP VP NP

More information

Natural Language Processing. Lecture 13: More on CFG Parsing

Natural Language Processing. Lecture 13: More on CFG Parsing Natural Language Processing Lecture 13: More on CFG Parsing Probabilistc/Weighted Parsing Example: ambiguous parse Probabilistc CFG Ambiguous parse w/probabilites 0.05 0.05 0.20 0.10 0.30 0.20 0.60 0.75

More information

ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging

ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk The POS Tagging Problem 2 England NNP s POS fencers

More information

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write: Languages A language is a set (usually infinite) of strings, also known as sentences Each string consists of a sequence of symbols taken from some alphabet An alphabet, V, is a finite set of symbols, e.g.

More information

Computability Theory

Computability Theory CS:4330 Theory of Computation Spring 2018 Computability Theory Decidable Problems of CFLs and beyond Haniel Barbosa Readings for this lecture Chapter 4 of [Sipser 1996], 3rd edition. Section 4.1. Decidable

More information

10/17/04. Today s Main Points

10/17/04. Today s Main Points Part-of-speech Tagging & Hidden Markov Model Intro Lecture #10 Introduction to Natural Language Processing CMPSCI 585, Fall 2004 University of Massachusetts Amherst Andrew McCallum Today s Main Points

More information

The Infinite PCFG using Hierarchical Dirichlet Processes

The Infinite PCFG using Hierarchical Dirichlet Processes The Infinite PCFG using Hierarchical Dirichlet Processes Liang, Petrov, Jordan & Klein Presented by: Will Allen November 8, 2011 Overview 1. Overview 2. (Very) Brief History of Context Free Grammars 3.

More information

Algebraic Dynamic Programming. Solving Satisfiability with ADP

Algebraic Dynamic Programming. Solving Satisfiability with ADP Algebraic Dynamic Programming Session 12 Solving Satisfiability with ADP Robert Giegerich (Lecture) Stefan Janssen (Exercises) Faculty of Technology Summer 2013 http://www.techfak.uni-bielefeld.de/ags/pi/lehre/adp

More information

Suppose h maps number and variables to ɛ, and opening parenthesis to 0 and closing parenthesis

Suppose h maps number and variables to ɛ, and opening parenthesis to 0 and closing parenthesis 1 Introduction Parenthesis Matching Problem Describe the set of arithmetic expressions with correctly matched parenthesis. Arithmetic expressions with correctly matched parenthesis cannot be described

More information

Recap: Lexicalized PCFGs (Fall 2007): Lecture 5 Parsing and Syntax III. Recap: Charniak s Model. Recap: Adding Head Words/Tags to Trees

Recap: Lexicalized PCFGs (Fall 2007): Lecture 5 Parsing and Syntax III. Recap: Charniak s Model. Recap: Adding Head Words/Tags to Trees Recap: Lexicalized PCFGs We now need to estimate rule probabilities such as P rob(s(questioned,vt) NP(lawyer,NN) VP(questioned,Vt) S(questioned,Vt)) 6.864 (Fall 2007): Lecture 5 Parsing and Syntax III

More information

Basic Text Analysis. Hidden Markov Models. Joakim Nivre. Uppsala University Department of Linguistics and Philology

Basic Text Analysis. Hidden Markov Models. Joakim Nivre. Uppsala University Department of Linguistics and Philology Basic Text Analysis Hidden Markov Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakimnivre@lingfiluuse Basic Text Analysis 1(33) Hidden Markov Models Markov models are

More information

CS20a: summary (Oct 24, 2002)

CS20a: summary (Oct 24, 2002) CS20a: summary (Oct 24, 2002) Context-free languages Grammars G = (V, T, P, S) Pushdown automata N-PDA = CFG D-PDA < CFG Today What languages are context-free? Pumping lemma (similar to pumping lemma for

More information

CSE 490 U Natural Language Processing Spring 2016

CSE 490 U Natural Language Processing Spring 2016 CSE 490 U Natural Language Processing Spring 2016 Feature Rich Models Yejin Choi - University of Washington [Many slides from Dan Klein, Luke Zettlemoyer] Structure in the output variable(s)? What is the

More information

CS626: NLP, Speech and the Web. Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 14: Parsing Algorithms 30 th August, 2012

CS626: NLP, Speech and the Web. Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 14: Parsing Algorithms 30 th August, 2012 CS626: NLP, Speech and the Web Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 14: Parsing Algorithms 30 th August, 2012 Parsing Problem Semantics Part of Speech Tagging NLP Trinity Morph Analysis

More information

Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391

Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391 Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391 Parameters of an HMM States: A set of states S=s 1, s n Transition probabilities: A= a 1,1, a 1,2,, a n,n

More information

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information