Unit 2: Tree Models. CS 562: Empirical Methods in Natural Language Processing. Lectures 19-23: Context-Free Grammars and Parsing


1 CS 562: Empirical Methods in Natural Language Processing Unit 2: Tree Models Lectures 19-23: Context-Free Grammars and Parsing Oct-Nov 2009 Liang Huang

2 Big Picture

we have already covered...
- sequence models (WFSAs, WFSTs)
- decoding (Viterbi Algorithm)
- supervised training (counting, smoothing)
- unsupervised training (EM using forward-backward)

in this unit we'll look beyond sequences, and cover...
- tree models (probabilistic context-free grammars and extensions)
- decoding ("parsing", CKY Algorithm)
- supervised training (lexicalization, history-annotation, ...)
- unsupervised training (EM via inside-outside)

3 Course Project (see handout)

Important Dates
- Tue Nov 10: initial project proposal due
- Thu Nov 19: final project proposal due; data preparation and one-hour baseline finished
- Thu Dec 3: midway project presentations
- Wed Dec 16: final project writeups due

Topic (see list of samples from previous years)
- must involve statistical processing of linguistic structures
- NO boring topics like text classification with bags of words

Amount of Work: ~2 HWs for each student

4 Review of HW3

what's wrong with this epron-jpron.wfst?

100
(0 (AA "AA" *e* 1.0))
(0 (D "D" *e* 1.0))
(0 (F "F" *e* 1.0))
(0 (HH "HH" *e* 1.0))
(0 (G "G" *e* 1.0))
(0 (B "B" *e* 1.0))
...
(AA (100 *e* "O O"))
(AA (100 *e* "A"))
(AA (100 *e* "O"))
(AA (100 *e* "A A"))
...
(M (100 *e* "N"))
(M (100 *e* "M"))
(N (100 *e* "NN"))
(100 (0 " " *e* 1.0))
(100 (0 *e* *e* 1.0))

5 Limitations of Sequence Models

can you write an FSA/FST for the following?
- { (a^n, b^n) }
- { (a^2n, b^n) }
- { a^n b^n }
- { w w^R }
- { (w, w^R) }

does it matter to human languages?
- [The woman saw the boy [that heard the man [that left]]].
- [The claim [that the house [he bought] is valuable] is wrong].

but humans can't really process infinite recursions... stack overflow!

6 Let's try to write a grammar... (courtesy of Julia Hockenmaier)

let's take a closer look... we'll try our best to represent English in an FSA...

basic sentence structure: N, V, N

7 Subject-Verb-Object

compose it with a lexicon, and we get an HMM. so far so good

8 (Recursive) Adjectives (courtesy of Julia Hockenmaier)

the ball
the big ball
the big, red ball
the big, red, heavy ball
...

then add Adjectives, which modify Nouns
- the number of modifiers/adjuncts can be unlimited
- how about no determiner before the noun? "play tennis"

9 Recursive PPs (courtesy of Julia Hockenmaier)

the ball
the ball in the garden
the ball in the garden behind the house
the ball in the garden behind the house near the school
...

recursion can be more complex, but we can still model it with FSAs!
so why bother to go beyond finite-state?

10 FSAs can't go hierarchical! (courtesy of Julia Hockenmaier)

but sentences have a hierarchical structure! so that we can infer the meaning
we need not only strings, but also trees
FSAs are flat, and can only do tail recursion (i.e., loops)
but we need real (branching) recursion for languages

11 FSAs can't do Center Embedding (courtesy of Julia Hockenmaier)

The mouse ate the corn.
The mouse that the snake ate ate the corn.
The mouse that the snake that the hawk ate ate ate the corn.
...
vs. The claim that the house he bought was valuable was wrong.
vs. I saw the ball in the garden behind the house near the school.

in theory, these infinite recursions are still grammatical: competence (grammatical knowledge)
in practice, studies show that English has a limit of 3: performance (processing and memory limitations)
FSAs can model finite embeddings, but only very inconveniently.

12 How about Recursive FSAs?

problem of FSAs: only tail recursion, no branching recursion
- can't represent hierarchical structures (trees)
- can't generate center-embedded strings
is there a simple way to improve it? recursive transition networks (RTNs)

[RTN figure: one small network per category, e.g. S: 0 -NP-> 1 -VP-> 2; NP: 0 -Det-> 1 -N-> 2; VP: 0 -V-> 1 -NP-> 2]

13 Context-Free Grammars

S → NP VP
NP → Det N
NP → NP PP
PP → P NP
N → {ball, garden, house, sushi, ...}
P → {in, behind, with}
V → ...
Det → ...
VP → V NP
VP → VP PP
...

14 Context-Free Grammars

A CFG is a 4-tuple ⟨N, Σ, R, S⟩:
- a set of nonterminals N (e.g. N = {S, NP, VP, PP, Noun, Verb, ...})
- a set of terminals Σ (e.g. Σ = {I, you, he, eat, drink, sushi, ball, ...})
- a set of rules R ⊆ {A → β : left-hand side (LHS) A ∈ N, right-hand side (RHS) β ∈ (N ∪ Σ)*}
- a start symbol S (sentence)
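To make the 4-tuple concrete, here is a minimal sketch in Python (the `Grammar` class and the rule format are illustrative, not from the course code):

```python
# A CFG as a 4-tuple <N, Sigma, R, S>: rules are (LHS, RHS-tuple) pairs.
from collections import defaultdict

class Grammar:
    def __init__(self, rules, start="S"):
        self.rules = defaultdict(list)          # R: LHS -> list of RHS tuples
        for lhs, rhs in rules:
            self.rules[lhs].append(tuple(rhs))
        self.start = start                      # S
        self.nonterminals = set(self.rules)     # N: symbols that have rules
        rhs_syms = {s for rhss in self.rules.values() for r in rhss for s in r}
        self.terminals = rhs_syms - self.nonterminals   # Sigma: everything else

g = Grammar([("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP")),
             ("Det", ("the",)), ("N", ("ball",)), ("V", ("saw",))])
```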

15 Parse Trees

N → {sushi, tuna}
P → {with}
V → {eat}
NP → N
NP → NP PP
PP → P NP
VP → V NP
VP → VP PP

16 CFGs for Center-Embedding

The mouse ate the corn.
The mouse that the snake ate ate the corn.
The mouse that the snake that the hawk ate ate ate the corn.
...

{ a^n b^n }   { w w^R }
can you also do { a^n b^n c^n }? or { w w^R w }? { a^n b^n c^m d^m }?
what's the limitation of CFGs?

CFG for center-embedded clauses: S → NP ate NP; NP → NP RC; RC → that NP ate

17 Review

write a CFG for...
- { a^m b^n c^n d^m }
- { a^m b^n c^{3m+2n} }
- { a^m b^n c^m d^n }
- buffalo buffalo buffalo...

write an FST or synchronous CFG for... (see the worked example below)
- { (w, w^R) }
- { (a^n, b^n) }

HW3: including p(eprons) is wrong
HW4: using carmel to test your own code
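As a worked example for the first language (one possible answer, sketched here; not the only solution): the dependencies nest, so put the b-c recursion inside the a-d recursion:

S → a S d
S → T
T → b T c
T → ε

By contrast, { a^m b^n c^m d^n } has crossing a...c and b...d dependencies, which is exactly what a CFG cannot capture.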

18 Chomsky Hierarchy (figure; CS 498 JH: Introduction to NLP, Fall 08)

19 Constituents, Heads, Dependents (figure; CS 498 JH: Introduction to NLP, Fall 08)

20 Constituency Test: how about "there is" or "I do"? (figure; CS 498 JH: Introduction to NLP, Fall 08)

21 Arguments and Adjuncts: arguments are obligatory (figure; CS 498 JH: Introduction to NLP, Fall 08)

22 Arguments and Adjuncts: adjuncts are optional (figure; CS 498 JH: Introduction to NLP, Fall 08)

23 Noun Phrases (NPs) (figure; CS 498 JH: Introduction to NLP, Fall 08)

24 The NP Fragment (figure; CS 498 JH: Introduction to NLP, Fall 08)

25 ADJPs and PPs (figure; CS 498 JH: Introduction to NLP, Fall 08)

26 Verb Phrase (VP) (figure; CS 498 JH: Introduction to NLP, Fall 08)

27 VPs redefined (figure; CS 498 JH: Introduction to NLP, Fall 08)

28 Sentences (figure; CS 498 JH: Introduction to NLP, Fall 08)

29 Sentence Redefined (figure; CS 498 JH: Introduction to NLP, Fall 08)

30 Probabilistic CFG

normalization: for every nonterminal A, Σ_β p(A → β) = 1
what's the most likely tree? (in the finite-state world, what's the most likely string?)
given string w, what's the most likely tree for w?
this is called parsing (like decoding)

(CS 498 JH: Introduction to NLP, Fall 08)
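A minimal sketch of the normalization check in Python (the dict-of-probabilities format is an assumption for illustration):

```python
# Check Sum_beta p(A -> beta) = 1 for every nonterminal A.
from collections import defaultdict

def is_normalized(pcfg, tol=1e-9):
    # pcfg: dict mapping (lhs, rhs_tuple) -> probability
    total = defaultdict(float)
    for (lhs, _rhs), p in pcfg.items():
        total[lhs] += p
    return all(abs(t - 1.0) < tol for t in total.values())

toy = {("S", ("NP", "VP")): 1.0,
       ("NP", ("Det", "N")): 0.7,
       ("NP", ("NP", "PP")): 0.3}
assert is_normalized(toy)
```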

31 Probability of a tree (figure; CS 498 JH: Introduction to NLP, Fall 08)

32 Most likely tree given string

parsing is to search for the best tree t* such that:
t* = argmax_t p(t | w) = argmax_t p(t) p(w | t) = argmax_{t: yield(t)=w} p(t)
(the last step holds because p(w | t) is 1 exactly when yield(t) = w, and 0 otherwise)
analogous to HMM decoding
is it related to intersection or composition in FSTs?

33 CKY Algorithm

[chart figure: the goal item (S, 0, n) at the top of a triangular chart over the input w0 w1 ... wn-1]

34 CKY Algorithm

input: flies like a flower

S → NP VP
NP → DT NN
NP → NNS
NP → NP PP
VP → VB NP
VP → VP PP
VP → VB
PP → P NP
VB → flies
NNS → flies
VB → like
P → like
DT → a
NN → flower

35 CKY Algorithm

[chart figure: the filled CKY chart for "flies like a flower" under the same grammar; the width-1 cells include {NNS, VB, NP}, {VB, P, VP}, {DT}, {NN}, and the full-span cell contains {S, VP, NP}]
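Putting the chart-filling together, here is a Viterbi-CKY sketch in Python for a CNF PCFG (the `unary`/`binary` rule-table format is an illustrative choice, not the course's code):

```python
# Viterbi CKY over a PCFG in Chomsky Normal Form.
# unary:  word -> list of (A, p) for lexical rules A -> word
# binary: (B, C) -> list of (A, p) for rules A -> B C
def cky(words, unary, binary, start="S"):
    n = len(words)
    best, back = {}, {}
    for i, w in enumerate(words):                   # width-1 spans
        for A, p in unary.get(w, []):
            best[A, i, i + 1], back[A, i, i + 1] = p, w
    for width in range(2, n + 1):                   # bottom-up over widths
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):               # split point
                for (B, C), prods in binary.items():
                    if (B, i, k) in best and (C, k, j) in best:
                        s = best[B, i, k] * best[C, k, j]
                        for A, p in prods:
                            if p * s > best.get((A, i, j), 0.0):
                                best[A, i, j] = p * s
                                back[A, i, j] = (k, B, C)   # for tree recovery
    return best.get((start, 0, n), 0.0), back
```

The nested loops over (i, k, j) times the rule lookups give the O(n^3 |P|) time noted on the later complexity slides; the tree is recovered by following backpointers down from (S, 0, n).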

36 CKY Example (figure; CS 498 JH: Introduction to NLP, Fall 08)

37 Chomsky Normal Form

wait! how can you assume a CFG is binary-branching?
well, we can always convert a CFG into Chomsky Normal Form (CNF):
A → B C
A → a
how to deal with epsilon-removal? how to do it with a PCFG?
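A sketch of the binarization step only (unary-rule and epsilon elimination are separate passes); the `@`-prefixed fresh symbols are an illustrative convention:

```python
# Binarize long rules: A -> B C D becomes A -> B @X1 and @X1 -> C D.
# For a PCFG, the original rule's probability stays on the first new rule
# and the introduced rules get probability 1, so tree probabilities are unchanged.
def binarize(rules):
    out, fresh = [], 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            fresh += 1
            new = f"@X{fresh}"
            out.append((lhs, (rhs[0], new)))   # peel off the leftmost symbol
            lhs, rhs = new, rhs[1:]
        out.append((lhs, tuple(rhs)))
    return out
```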

38 Parsing as Deduction

(B, i, k) : a    (C, k, j) : b
------------------------------  A → B C
(A, i, j) : a · b · Pr(A → B C)

39 Parsing as Intersection

(B, i, k) : a    (C, k, j) : b
------------------------------  A → B C
(A, i, j) : a · b · Pr(A → B C)

intersection between a CFG G and an FSA D:
- define L(G) to be the set of strings (i.e., yields) G generates
- define L(G ∩ D) = L(G) ∩ L(D)
- what does this new language generate?? what does the new grammar look like?
- what about CFG ∩ CFG?

40 Parsing as Composition (figure)

41 Packed Forests

a compact representation of many parses by sharing common sub-derivations
- polynomial-space encoding of an exponentially large set

[forest figure over the input 0 I 1 saw 2 him 3 with 4 a 5 mirror 6: nodes plus hyperedges form a hypergraph]

(Klein and Manning, 2001; Huang and Chiang, 2005)

42 Lattice vs. Forest (figure)

43 Forest and Deduction

(B, i, k) : a    (C, k, j) : b
------------------------------  A → B C
(A, i, j) : a · b · Pr(A → B C)

in forest terms: the antecedent items (B, i, k) and (C, k, j) are the tails u1 : a and u2 : b of a hyperedge e with weight function f_e; the consequent item (A, i, j) is its head v, with v : f_e(a, b)

(Nederhof, 2003)

44 Related Formalisms

[figure: a hypergraph drawn as an AND/OR graph; each node v is an OR-node, each hyperedge e is an AND-node, and its tails u1, u2 are OR-nodes]

45 Viterbi Algorithm for DAGs

1. topological sort
2. visit each vertex v in sorted order and do updates:
   for each incoming edge (u, v) in E, use d(u) to update d(v):
   d(v) ⊕= d(u) ⊗ w(u, v)
   key observation: d(u) is fixed to optimal at this time

time complexity: O(|V| + |E|)
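The two steps in Python, instantiated with the Viterbi (max, ×) semiring (the graph format is illustrative; `graphlib` is in the standard library from Python 3.9):

```python
# Viterbi over a DAG: after a topological sort, each d(u) is final before
# any edge (u, v) is relaxed, so one sweep suffices: O(|V| + |E|).
from graphlib import TopologicalSorter

def viterbi_dag(vertices, in_edges, source):
    # in_edges: dict v -> list of (u, weight) incoming edges
    preds = {v: [u for u, _ in in_edges.get(v, [])] for v in vertices}
    d = {v: 0.0 for v in vertices}
    d[source] = 1.0
    for v in TopologicalSorter(preds).static_order():
        for u, w in in_edges.get(v, []):
            d[v] = max(d[v], d[u] * w)      # d(v) (+)= d(u) (x) w(u, v)
    return d
```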

46 Viterbi Algorithm for DAHs

1. topological sort
2. visit each vertex v in sorted order and do updates:
   for each incoming hyperedge e = ((u1, ..., u|e|), v, f_e), use the d(ui)'s to update d(v):
   d(v) ⊕= f_e(d(u1), ..., d(u|e|))
   key observation: the d(ui)'s are fixed to optimal at this time

time complexity: O(|V| + |E|) (assuming constant arity)
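And the same loop generalized to hyperedges, a sketch assuming probability-valued scores where each hyperedge carries its own weight function f_e:

```python
# Viterbi over a directed acyclic hypergraph (DAH): for each incoming
# hyperedge e = ((u1, ..., uk), v, f_e), update d(v) with f_e(d(u1), ..., d(uk)).
def viterbi_dah(order, in_edges, axioms):
    # order: vertices topologically sorted (all tails before their head)
    # in_edges: dict v -> list of (tails_tuple, f_e)
    d = dict(axioms)                        # axiom vertices get initial scores
    for v in order:
        for tails, f_e in in_edges.get(v, []):
            s = f_e(*(d[u] for u in tails))
            if s > d.get(v, 0.0):
                d[v] = s
    return d
```

With items (X, i, j), tails ((B, i, k), (C, k, j)), and f_e(a, b) = a · b · Pr(A → B C), this is exactly the CKY recursion of the next slide.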

47 Example: CKY Parsing

parsing with CFGs in Chomsky Normal Form (CNF)
- a typical instance of the generalized Viterbi for DAHs
- many variants of CKY ~ various topological orderings

[figure: two chart orderings reaching the goal item (S, 0, n): bottom-up and left-to-right]

complexity: O(n^3 |P|)

48 Example: CKY Parsing

parsing with CFGs in Chomsky Normal Form (CNF)
- a typical instance of the generalized Viterbi for DAHs
- many variants of CKY ~ various topological orderings

[figure: three chart orderings reaching the goal item (S, 0, n): bottom-up, left-to-right, and right-to-left]

complexity: O(n^3 |P|)

49 Parser/Tree Evaluation

how would you evaluate the quality of output trees? need to define a similarity measure between trees
for sequences, we used:
- same length: Hamming distance (e.g., POS tagging)
- varying length: edit distance (e.g., Japanese transliteration)
- varying length: precision/recall/F (e.g., word segmentation)
- varying length: crossing brackets (e.g., word segmentation)
for trees, we use precision/recall/F and crossing brackets: the standard PARSEVAL metrics (implemented as evalb.py)

50 PARSEVAL

comparing nodes ("brackets"):
- labelled (by default): (NP, 2, 5); or unlabelled: (2, 5)
- precision: how many predicted nodes are correct?
- recall: how many correct nodes are predicted?
- how to fake precision or recall?
- F-score: F = 2pr / (p + r)
- other metrics: crossing brackets

example: matched = 6, predicted = 7, gold = 7, so precision = 6/7, recall = 6/7, F = 6/7
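The arithmetic as a sketch in Python (evalb.py mentioned above is the real course implementation; the `parseval` function and the bracket sets below are made up to reproduce the matched = 6/7 figures):

```python
# Labelled PARSEVAL over bracket triples (label, i, j).
from collections import Counter

def parseval(predicted, gold):
    matched = sum((Counter(predicted) & Counter(gold)).values())
    p = matched / len(predicted)            # correct among predicted
    r = matched / len(gold)                 # predicted among correct
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

pred = [("S", 0, 7), ("NP", 0, 2), ("VP", 2, 7), ("NP", 3, 5),
        ("PP", 5, 7), ("NP", 6, 7), ("DT", 5, 6)]
gold = [("S", 0, 7), ("NP", 0, 2), ("VP", 2, 7), ("NP", 3, 7),
        ("PP", 5, 7), ("NP", 6, 7), ("DT", 5, 6)]
print(parseval(pred, gold))   # matched=6, predicted=7, gold=7 -> (6/7, 6/7, 6/7)
```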

51 Inside-Outside Algorithm (figure)

52 Inside-Outside Algorithm (figure)

53 Inside-Outside Algorithm

the inside probability β is easy to compute (CKY, with max replaced by +)
what is the outside probability α(X, i, j)? we need to enumerate the ways to get to TOP from (X, i, j)
(X, i, j) can be combined with other nodes on the left or the right:
- L: Σ_{Y → Z X, k} α(Y, k, j) · Pr(Y → Z X) · β(Z, k, i)
- R: Σ_{Y → X Z, k} α(Y, i, k) · Pr(Y → X Z) · β(Z, j, k)
why is β used in computing α? very different from the forward-backward algorithm
what is the likelihood of the sentence? β(TOP, 0, n), or α(w_i, i, i+1) for any i

54 Inside-Outside Algorithm

[figure: the two outside cases for (X, i, j); left: X is the right child, with Y spanning (k, j) and Z spanning (k, i); right: X is the left child, with Y spanning (i, k) and Z spanning (j, k)]

L: Σ_{Y → Z X, k} α(Y, k, j) · Pr(Y → Z X) · β(Z, k, i)
R: Σ_{Y → X Z, k} α(Y, i, k) · Pr(Y → X Z) · β(Z, j, k)

55 Inside-Outside Algorithm

how do you do EM with the alphas and betas?
- easy; M-step: normalize the fractional counts
- the fractional count of a rule instance (X, i, j) → (Y, i, k) (Z, k, j) is
  α(X, i, j) · Pr(X → Y Z) · β(Y, i, k) · β(Z, k, j), divided by the sentence likelihood β(TOP, 0, n)

if we replace + by max, what will alpha/beta mean?
- β': Viterbi inside: the best way to derive (X, i, j)
- α': Viterbi outside: the best way to get to TOP from (X, i, j)
- now what is α'(X, i, j) ⊗ β'(X, i, j)? the best derivation that contains (X, i, j) (useful for pruning)
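A sketch of the inside pass (rule tables as in the CKY sketch above) together with the fractional count of one rule instance; names are illustrative:

```python
# Inside probabilities: the same loops as CKY, with max replaced by +.
def inside(words, unary, binary, top="TOP"):
    n = len(words)
    beta = {}
    for i, w in enumerate(words):
        for A, p in unary.get(w, []):
            beta[A, i, i + 1] = beta.get((A, i, i + 1), 0.0) + p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (B, C), prods in binary.items():
                    bl = beta.get((B, i, k), 0.0)
                    br = beta.get((C, k, j), 0.0)
                    if bl and br:
                        for A, p in prods:
                            beta[A, i, j] = beta.get((A, i, j), 0.0) + p * bl * br
    return beta                 # sentence likelihood: beta[top, 0, n]

# E-step: fractional count of one rule instance (X,i,j) -> (Y,i,k) (Z,k,j),
# normalized by the sentence likelihood.
def frac_count(alpha, beta, rule_prob, X, Y, Z, i, k, j, likelihood):
    return alpha[X, i, j] * rule_prob * beta[Y, i, k] * beta[Z, k, j] / likelihood
```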

56 Translation as Parsing

translation with SCFGs => monolingual parsing
- parse the source input with the source projection
- build the corresponding target sub-strings in parallel

example rules:
VP → ⟨PP(1) VP(2), VP(2) PP(1)⟩
VP → ⟨juxing le huitan, held a meeting⟩
PP → ⟨yu Shalong, with Sharon⟩

[derivation figure: PP(1,3) covers "yu Shalong" / "with Sharon", VP(3,6) covers "juxing le huitan" / "held a talk", and VP(1,6) combines them into "held a talk with Sharon"]

complexity: same as CKY parsing, O(n^3)
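A sketch of the target-side reassembly for one derivation (the data structures are illustrative): each node pairs a target RHS, in which integers index sub-derivations, with its list of children:

```python
# Build the target string from a source-side SCFG derivation.
def target_yield(node):
    target_rhs, children = node    # ints in target_rhs index into children
    out = []
    for sym in target_rhs:
        if isinstance(sym, int):
            out.extend(target_yield(children[sym]))   # recurse, reordered
        else:
            out.append(sym)                           # target terminal
    return out

# the derivation above: VP -> <PP(1) VP(2), VP(2) PP(1)>
pp = (["with", "Sharon"], [])
vp = (["held", "a", "talk"], [])
root = ([1, 0], [pp, vp])          # target side puts VP(2) before PP(1)
print(" ".join(target_yield(root)))   # held a talk with Sharon
```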

57 Adding a Bigram Model

[figure: forest items augmented with target boundary words, e.g. VP(1,6) carrying "held ... talk", "held ... meeting", or "hold ... talks", built from PP(1,3) "with ... Sharon" and VP(3,6) "held ... talk", with bigrams scored where the sub-strings meet]

complexity: O(n^3 |V|^{4(m-1)})
