Unit 2: Tree Models. CS 562: Empirical Methods in Natural Language Processing. Lectures 19-23: Context-Free Grammars and Parsing
|
|
- Milton Lloyd
- 5 years ago
- Views:
Transcription
1 CS 562: Empirical Methods in Natural Language Processing Unit 2: Tree Models Lectures 19-23: Context-Free Grammars and Parsing Oct-Nov 2009 Liang Huang
2 Big Picture we have already covered... sequence models (WFSAs, WFSTs) decoding (Viterbi Algorithm) supervised training (counting, smoothing) unsupervised training (EM using forward-backward) in this unit we ll look beyond sequences, and cover... tree models (prob context-free grammars and extensions) decoding ( parsing, CKY Algorithm) supervised training (lexicalization, history-annotation,...) unsupervised training (EM via inside-outside) 2
3 Course Project (see handout) Important Dates Tue Nov 10: initial project proposal due Thu Nov 19: final project proposal due; data preparation and one-hour baseline finished. Thu Dec 3: midway project presentations Wed Dec 16: final project writeups due at Topic (see list of samples from previous years) must involve statistical processing of linguistic structures NO boring topics like text classification with bags of words Amount of Work: ~2 HWs for each student 3
4 Review of HW3 what s wrong with this epron-jpron.wfst? 100 (0 (AA "AA" *e* 1.0 ) ) (0 (D "D" *e* 1.0 ) ) (0 (F "F" *e* 1.0 ) ) (0 (HH "HH" *e* 1.0 ) ) (0 (G "G" *e* 1.0 ) ) (0 (B "B" *e* 1.0 ) )... (AA (100 *e* "O O" ) ) (AA (100 *e* "A" ) ) (AA (100 *e* "O" ) ) (AA (100 *e* "A A" ) )... (M (100 *e* "N" ) ) (M (100 *e* "M" ) ) (N (100 *e* "NN" ) ) ( 100 ( 0 " " *e* 1.0 ) ) ( 100 ( 0 *e* *e* 1.0 ) ) 4
5 Limitations of Sequence Models can you write an FSA/FST for the following? { (a n, b n ) } { (a 2n, b n ) } { a n b n } { w w R } { (w, w R ) } does it matter to human languages? [The woman saw the boy [that heard the man [that left] ] ]. [The claim [that the house [he bought] is valuable] is wrong]. but humans can t really process infinite recursions... stack overflow! 5
6 Let s try to write a grammar... (courtesy of Julia Hockenmaier) let s take a closer look... we ll try our best to represent English in a FSA... basic sentence structure: N, V, N 6
7 Subject-Verb-Object compose it with a lexicon, and we get an HMM so far so good 7
8 (Recursive) Adjectives the ball (courtesy of Julia Hockenmaier) the big ball the big, red ball the big, red, heavy ball... then add Adjectives, which modify Nouns the number of modifiers/adjuncts can be unlimited. how about no determiner before noun? play tennis 8
9 Recursive PPs (courtesy of Julia Hockenmaier) the ball the ball in the garden the ball in the garden behind the house the ball in the garden behind the house near the school... recursion can be more complex but we can still model it with FSAs! so why bother to go beyond finite-state? 9
10 FSAs can t go hierarchical! (courtesy of Julia Hockenmaier) but sentences have a hierarchical structure! so that we can infer the meaning we need not only strings, but also trees FSAs are flat, and can only do tail recursions (i.e., loops) but we need real (branching) recursions for languages 10
11 FSAs can t do Center Embedding The mouse ate the corn. (courtesy of Julia Hockenmaier) The mouse that the snake ate ate the corn. The mouse that the snake that the hawk ate ate ate the corn.... vs. The claim that the house he bought was valuable was wrong. vs. I saw the ball in the garden behind the house near the school. in theory, these infinite recursions are still grammatical competence (grammatical knowledge) in practice, studies show that English has a limit of 3 performance (processing and memory limitations) FSAs can model finite embeddings, but very inconvenient. 11
12 How about Recursive FSAs? problem of FSAs: only tail recursions, no branching recursions can t represent hierarchical structures (trees) can t generate center-embedded strings is there a simple way to improve it? recursive transition networks (RTNs) S NP VP -> > > 2 -> NP Det N -> > > 2 -> VP V NP -> > > 2 ->
13 Context-Free Grammars S NP VP NP Det N NP NP PP PP P NP N {ball, garden, house, sushi } P {in, behind, with} V... Det... VP V NP VP VP PP... 13
14 Context-Free Grammars A CFG is a 4-tuple N,Σ,R,S A set of nonterminals N (e.g. N = {S, NP, VP, PP, Noun, Verb,...}) A set of terminals Σ (e.g. Σ = {I, you, he, eat, drink, sushi, ball, }) A set of rules R R {A β with left-hand-side (LHS)" A N and right-hand-side (RHS) β (N Σ)* } A start symbol S (sentence) 14
15 Parse Trees N {sushi, tuna} P {with} V {eat} NP N NP NP PP PP P NP VP V NP VP VP PP 15
16 CFGs for Center-Embedding The mouse ate the corn. The mouse that the snake ate ate the corn. The mouse that the snake that the hawk ate ate ate the corn.... { a n b n } { w w R } can you also do { a n b n c n }? or { w w R w }? { a n b n c m d m } what s the limitation of CFGs? CFG for center-embedded clauses: S NP ate NP; NP NP RC; RC that NP ate 16
17 Review write a CFG for... { a m b n c n d m } { a m b n c 3m+2n } { a m b n c m d n } buffalo buffalo buffalo... write an FST or synchronous CFG for... { (w, w R ) } { (a n, b n ) } HW3: including p(eprons) is wrong HW4: using carmel to test your own code 17
18 Chomsky Hierarchy CS 498 JH: Introduction to NLP (Fall 08) 18
19 Constituents, Heads, Dependents CS 498 JH: Introduction to NLP (Fall 08) 19
20 Constituency Test how about there is or I do? CS 498 JH: Introduction to NLP (Fall 08) 20
21 Arguments and Adjuncts arguments are obligatory CS 498 JH: Introduction to NLP (Fall 08) 21
22 Arguments and Adjuncts adjuncts are optional CS 498 JH: Introduction to NLP (Fall 08) 22
23 Noun Phrases (NPs) CS 498 JH: Introduction to NLP (Fall 08) 23
24 The NP Fragment CS 498 JH: Introduction to NLP (Fall 08) 24
25 ADJPs and PPs CS 498 JH: Introduction to NLP (Fall 08) 25
26 Verb Phrase (VP) CS 498 JH: Introduction to NLP (Fall 08) 26
27 VPs redefined CS 498 JH: Introduction to NLP (Fall 08) 27
28 Sentences CS 498 JH: Introduction to NLP (Fall 08) 28
29 Sentence Redefined CS 498 JH: Introduction to NLP (Fall 08) 29
30 Probabilistic CFG normalization sumβ p( A β) =1 what s the most likely tree? in finite-state world, what s the most likely string? given string w, what s the most likely tree for w this is called parsing (like decoding) CS 498 JH: Introduction to NLP (Fall!08) 30
31 Probability of a tree CS 498 JH: Introduction to NLP (Fall!08) 31
32 Most likely tree given string parsing is to search for the best tree t* that: t* = argmax_t p (t w) = argmax_t p(t) p (w t) = argmax_{t: yield(t)=w} p(t) analogous to HMM decoding is it related to intersection or composition in FSTs? 32
33 CKY Algorithm (S, 0, n) NAACL w0 w1... wn-1 Dynamic Programming
34 CKY Algorithm flies like a flower S NP VP NP DT NN NP NNS NP NP PP VP VB NP VP VP PP VP VB PP P NP VB flies NNS flies VB like P like DT a NN flower NAACL Dynamic Programming
35 CKY Algorithm S, VP, NP NNS, VB, NP S VP VB, P, VP, PP DT NP NN flies like a flower S NP VP NP DT NN NP NNS NP NP PP VP VB NP VP VP PP VP VB PP P NP VB flies NNS flies VB like P like DT a NN flower NAACL Dynamic Programming
36 CKY Example NAACL 2009 CS 498 JH: Introduction to NLP (Fall!08) 36 Dynamic Programming
37 Chomsky Normal Form wait! how can you assume a CFG is binary-branching? well, we can always convert a CFG into Chomsky- Normal Form (CNF) A B C A a how to deal with epsilon-removal? how to do it with PCFG? 37
38 Parsing as Deduction (B, i, k) (C, k, j) : a (A, i, j) : b A B C : a b Pr(A B C) NAACL Dynamic Programming
39 Parsing as Intersection (B, i, k) (C, k, j) : a (A, i, j) : b A B C : a b Pr(A B C) intersection between a CFG G and an FSA D: define L(G) to be the set of strings (i.e., yields) G generates define L(G D) = L(G) L(D) what does this new language generate?? what does the new grammar look like? what about CFG CFG? NAACL Dynamic Programming
40 Parsing as Composition NAACL Dynamic Programming
41 Packed Forests a compact representation of many parses by sharing common sub-derivations polynomial-space encoding of exponentially large set nodes hyperedges a hypergraph 0 I 1 saw 2 him 3 with 4 a 5 mirror 6 NAACL 2009 (Klein and Manning, 2001; Huang and Chiang, 2005) 41 Dynamic Programming
42 Lattice vs. Forest NAACL Dynamic Programming
43 Forest and Deduction (B, i, k) u1 : a fe (C, k, j) u2 : b (B, i, k) (C, k, j) (A, i, j) A B C v : a b Pr(A B C) (A, i, j) (Nederhof, 2003) tails u1 : a u2 : b u1 : a : b u2 antecedents fe fe v : fe (a,b) v : fe (a,b) head consequent NAACL Dynamic Programming
44 Related Formalisms v v OR-node e e AND-node u1 u2 u1 u2 OR-nodes NAACL Dynamic Programming
45 Viterbi Algorithm for DAGs 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming edge (u, v) in E use d(u) to update d(v): key observation: d(u) is fixed to optimal at this time u w(u, v) v d(v) = d(u) w(u, v) time complexity: O( V + E ) NAACL Dynamic Programming
46 Viterbi Algorithm for DAHs 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming hyperedge e = ((u1,.., u e ), v, fe) use d(ui) s to update d(v) key observation: d(u i) s are fixed to optimal at this time u1 fe v d(v) = f e (d(u 1 ),,d(u e )) u2 time complexity: O( V + E ) NAACL (assuming constant arity) Dynamic Programming
47 Example: CKY Parsing parsing with CFGs in Chomsky Normal Form (CNF) typical instance of the generalized Viterbi for DAHs many variants of CKY ~ various topological ordering (S, 0, n) (S, 0, n) bottom-up left-to-right O(n 3 P ) NAACL Dynamic Programming
48 Example: CKY Parsing parsing with CFGs in Chomsky Normal Form (CNF) typical instance of the generalized Viterbi for DAHs many variants of CKY ~ various topological ordering (S, 0, n) (S, 0, n) (S, 0, n) bottom-up left-to-right right-to-left O(n 3 P ) NAACL Dynamic Programming
49 Parser/Tree Evaluation how would you evaluate the quality of output trees? need to define a similarity measure between trees for sequences, we used same length: hamming distance (e.g., POS tagging) varying length: edit distance (e.g., Japanese transliteration) varying length: precision/recall/f (e.g., word-segmentation) varying length: crossing brackets (e.g., word-segmentation) for trees, we use precision/recall/f and crossing brackets standard PARSEVAL metrics (implemented as evalb.py) NAACL Dynamic Programming
50 PARSEVAL comparing nodes ( brackets ): labelled (by default): (NP, 2, 5); or unlabelled: (2, 5) precision: how many predicted nodes are correct? recall: how many correct nodes are predicted? how to fake precision or recall? F-score: F=2pr/(p+r) other metrics: crossing brackets NAACL matched=6 predicted=7 gold=7 precision=6/7 recall=6/7 F=6/7 Dynamic Programming
51 Inside-Outside Algorithm NAACL Dynamic Programming
52 Inside-Outside Algorithm NAACL Dynamic Programming
53 Inside-Outside Algorithm inside prob beta is easy to compute (CKY, max=>+) what is outside prob alpha(x,i,j)? need to enumerate ways to go to TOP from X,i,j X,i,j can be combined with other nodes on the left/right L: sum_{y->z X, k} alpha(y,k,j) Pr(Y->Z X) beta(z,k,i) R: sum_{y->x Z, k} alpha(y,i,k) Pr(Y->X Z) beta(z,j,k) why beta is used in alpha? very diff. from F-W algorithm what is the likelihood of the sentence? beta(top, 0, n) or alpha(w_i, i, i+1) for any i NAACL Dynamic Programming
54 Inside-Outside Algorithm TOP TOP Y Y Z X X Z 0 k i j n 0 i j k n L: sum_{y->z X, k} alpha(y,k,j) Pr(Y->Z X) beta(z,k,i) R: sum_{y->x Z, k} alpha(y,i,k) Pr(Y->X Z) beta(z,j,k) NAACL Dynamic Programming
55 Inside-Outside Algorithm how do you do EM with alphas and betas? easy; M-step: divide by fractional counts fractional count of rule (X,i,j -> Y,i,k Z,k,j) is alpha(x,i,j) prob(y Z X) beta(y,i,k) beta(z,k,j) if we replace + by max, what will alpha/beta mean? beta : Viterbi inside: best way to derive X,i,j alpha : Viterbi outside: best way to go to TOP from X,i,j now what is alpha (X, i, j) beta (X, i, j)? best derivation that contains X,i,j (useful for pruning) NAACL Dynamic Programming
56 Translation as Parsing translation with SCFGs => monolingual parsing parse the source input with the source projection build the corresponding target sub-strings in parallel VP PP (1) VP (2), VP (2) PP (1) VP juxing le huitan, held a meeting PP yu Shalong, with Sharon held a talk with Sharon VP1, 6 with Sharon held a talk complexity: same as CKY parsing -- O(n 3 ) PP1, 3 VP3, 6 yu Shalong juxing le huitan NAACL Dynamic Programming
57 Adding a Bigram Model... meeting... Shalong _... talk bigram with Sharon... Sharon... talks held... Sharon NAACL 2009 VP1, 6 bigram held... talk with... Sharon VP3, 6 PP1, 3 complexity: O(n 3 V 4(m-1) ) PP1, 3 VP3, 6 with... Sharon along... Sharon with... Shalong 57 VP1, 6 held... talk held... meeting hold... talks Dynamic Programming
Decoding and Inference with Syntactic Translation Models
Decoding and Inference with Syntactic Translation Models March 5, 2013 CFGs S NP VP VP NP V V NP NP CFGs S NP VP S VP NP V V NP NP CFGs S NP VP S VP NP V NP VP V NP NP CFGs S NP VP S VP NP V NP VP V NP
More informationNatural Language Processing CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Natural Language Processing CS 6840 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Statistical Parsing Define a probabilistic model of syntax P(T S):
More informationLECTURER: BURCU CAN Spring
LECTURER: BURCU CAN 2017-2018 Spring Regular Language Hidden Markov Model (HMM) Context Free Language Context Sensitive Language Probabilistic Context Free Grammar (PCFG) Unrestricted Language PCFGs can
More informationAspects of Tree-Based Statistical Machine Translation
Aspects of Tree-Based Statistical Machine Translation Marcello Federico Human Language Technology FBK 2014 Outline Tree-based translation models: Synchronous context free grammars Hierarchical phrase-based
More informationChapter 14 (Partially) Unsupervised Parsing
Chapter 14 (Partially) Unsupervised Parsing The linguistically-motivated tree transformations we discussed previously are very effective, but when we move to a new language, we may have to come up with
More informationProbabilistic Context-free Grammars
Probabilistic Context-free Grammars Computational Linguistics Alexander Koller 24 November 2017 The CKY Recognizer S NP VP NP Det N VP V NP V ate NP John Det a N sandwich i = 1 2 3 4 k = 2 3 4 5 S NP John
More informationNatural Language Processing : Probabilistic Context Free Grammars. Updated 5/09
Natural Language Processing : Probabilistic Context Free Grammars Updated 5/09 Motivation N-gram models and HMM Tagging only allowed us to process sentences linearly. However, even simple sentences require
More informationAdvanced Natural Language Processing Syntactic Parsing
Advanced Natural Language Processing Syntactic Parsing Alicia Ageno ageno@cs.upc.edu Universitat Politècnica de Catalunya NLP statistical parsing 1 Parsing Review Statistical Parsing SCFG Inside Algorithm
More informationContext-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing
Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Natural Language Processing CS 4120/6120 Spring 2017 Northeastern University David Smith with some slides from Jason Eisner & Andrew
More informationThis kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this.
Chapter 12 Synchronous CFGs Synchronous context-free grammars are a generalization of CFGs that generate pairs of related strings instead of single strings. They are useful in many situations where one
More informationNatural Language Processing
SFU NatLangLab Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University September 27, 2018 0 Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class
More informationLecture 12: Algorithms for HMMs
Lecture 12: Algorithms for HMMs Nathan Schneider (some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes) ENLP 17 October 2016 updated 9 September 2017 Recap: tagging POS tagging is a
More informationMaschinelle Sprachverarbeitung
Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other
More informationMaschinelle Sprachverarbeitung
Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other
More informationLecture 12: Algorithms for HMMs
Lecture 12: Algorithms for HMMs Nathan Schneider (some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes) ENLP 26 February 2018 Recap: tagging POS tagging is a sequence labelling task.
More informationParsing with Context-Free Grammars
Parsing with Context-Free Grammars Berlin Chen 2005 References: 1. Natural Language Understanding, chapter 3 (3.1~3.4, 3.6) 2. Speech and Language Processing, chapters 9, 10 NLP-Berlin Chen 1 Grammars
More informationCS460/626 : Natural Language
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 23, 24 Parsing Algorithms; Parsing in case of Ambiguity; Probabilistic Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 8 th,
More informationSoft Inference and Posterior Marginals. September 19, 2013
Soft Inference and Posterior Marginals September 19, 2013 Soft vs. Hard Inference Hard inference Give me a single solution Viterbi algorithm Maximum spanning tree (Chu-Liu-Edmonds alg.) Soft inference
More informationStatistical Methods for NLP
Statistical Methods for NLP Stochastic Grammars Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(22) Structured Classification
More informationProbabilistic Context-Free Grammar
Probabilistic Context-Free Grammar Petr Horáček, Eva Zámečníková and Ivana Burgetová Department of Information Systems Faculty of Information Technology Brno University of Technology Božetěchova 2, 612
More informationParsing. Based on presentations from Chris Manning s course on Statistical Parsing (Stanford)
Parsing Based on presentations from Chris Manning s course on Statistical Parsing (Stanford) S N VP V NP D N John hit the ball Levels of analysis Level Morphology/Lexical POS (morpho-synactic), WSD Elements
More informationParsing with Context-Free Grammars
Parsing with Context-Free Grammars CS 585, Fall 2017 Introduction to Natural Language Processing http://people.cs.umass.edu/~brenocon/inlp2017 Brendan O Connor College of Information and Computer Sciences
More informationStatistical Methods for NLP
Statistical Methods for NLP Sequence Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(21) Introduction Structured
More informationMultiword Expression Identification with Tree Substitution Grammars
Multiword Expression Identification with Tree Substitution Grammars Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning Stanford University EMNLP 2011 Main Idea Use syntactic
More informationLecture 13: Structured Prediction
Lecture 13: Structured Prediction Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Couse webpage: http://kwchang.net/teaching/nlp16 CS6501: NLP 1 Quiz 2 v Lectures 9-13 v Lecture 12: before page
More informationCS626: NLP, Speech and the Web. Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 14: Parsing Algorithms 30 th August, 2012
CS626: NLP, Speech and the Web Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 14: Parsing Algorithms 30 th August, 2012 Parsing Problem Semantics Part of Speech Tagging NLP Trinity Morph Analysis
More informationCMPT-825 Natural Language Processing. Why are parsing algorithms important?
CMPT-825 Natural Language Processing Anoop Sarkar http://www.cs.sfu.ca/ anoop October 26, 2010 1/34 Why are parsing algorithms important? A linguistic theory is implemented in a formal system to generate
More informationProcessing/Speech, NLP and the Web
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 25 Probabilistic Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 14 th March, 2011 Bracketed Structure: Treebank Corpus [ S1[
More informationAspects of Tree-Based Statistical Machine Translation
Aspects of Tree-Based tatistical Machine Translation Marcello Federico (based on slides by Gabriele Musillo) Human Language Technology FBK-irst 2011 Outline Tree-based translation models: ynchronous context
More informationEmpirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs
Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs (based on slides by Sharon Goldwater and Philipp Koehn) 21 February 2018 Nathan Schneider ENLP Lecture 11 21
More informationSpectral Unsupervised Parsing with Additive Tree Metrics
Spectral Unsupervised Parsing with Additive Tree Metrics Ankur Parikh, Shay Cohen, Eric P. Xing Carnegie Mellon, University of Edinburgh Ankur Parikh 2014 1 Overview Model: We present a novel approach
More informationStatistical Methods for NLP
Statistical Methods for NLP Information Extraction, Hidden Markov Models Sameer Maskey Week 5, Oct 3, 2012 *many slides provided by Bhuvana Ramabhadran, Stanley Chen, Michael Picheny Speech Recognition
More informationContext-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing
Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Natural Language Processing! CS 6120 Spring 2014! Northeastern University!! David Smith! with some slides from Jason Eisner & Andrew
More informationProbabilistic Context Free Grammars. Many slides from Michael Collins
Probabilistic Context Free Grammars Many slides from Michael Collins Overview I Probabilistic Context-Free Grammars (PCFGs) I The CKY Algorithm for parsing with PCFGs A Probabilistic Context-Free Grammar
More informationParsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing
L445 / L545 / B659 Dept. of Linguistics, Indiana University Spring 2016 1 / 46 : Overview Input: a string Output: a (single) parse tree A useful step in the process of obtaining meaning We can view the
More informationParsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46.
: Overview L545 Dept. of Linguistics, Indiana University Spring 2013 Input: a string Output: a (single) parse tree A useful step in the process of obtaining meaning We can view the problem as searching
More informationA Context-Free Grammar
Statistical Parsing A Context-Free Grammar S VP VP Vi VP Vt VP VP PP DT NN PP PP P Vi sleeps Vt saw NN man NN dog NN telescope DT the IN with IN in Ambiguity A sentence of reasonable length can easily
More informationCS460/626 : Natural Language
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 27 SMT Assignment; HMM recap; Probabilistic Parsing cntd) Pushpak Bhattacharyya CSE Dept., IIT Bombay 17 th March, 2011 CMU Pronunciation
More informationRoger Levy Probabilistic Models in the Study of Language draft, October 2,
Roger Levy Probabilistic Models in the Study of Language draft, October 2, 2012 224 Chapter 10 Probabilistic Grammars 10.1 Outline HMMs PCFGs ptsgs and ptags Highlight: Zuidema et al., 2008, CogSci; Cohn
More informationDT2118 Speech and Speaker Recognition
DT2118 Speech and Speaker Recognition Language Modelling Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 56 Outline Introduction Formal Language Theory Stochastic Language Models (SLM) N-gram Language
More informationArtificial Intelligence
CS344: Introduction to Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 20-21 Natural Language Parsing Parsing of Sentences Are sentences flat linear structures? Why tree? Is
More informationCh. 2: Phrase Structure Syntactic Structure (basic concepts) A tree diagram marks constituents hierarchically
Ch. 2: Phrase Structure Syntactic Structure (basic concepts) A tree diagram marks constituents hierarchically NP S AUX VP Ali will V NP help D N the man A node is any point in the tree diagram and it can
More informationRecap: HMM. ANLP Lecture 9: Algorithms for HMMs. More general notation. Recap: HMM. Elements of HMM: Sharon Goldwater 4 Oct 2018.
Recap: HMM ANLP Lecture 9: Algorithms for HMMs Sharon Goldwater 4 Oct 2018 Elements of HMM: Set of states (tags) Output alphabet (word types) Start state (beginning of sentence) State transition probabilities
More informationLING 473: Day 10. START THE RECORDING Coding for Probability Hidden Markov Models Formal Grammars
LING 473: Day 10 START THE RECORDING Coding for Probability Hidden Markov Models Formal Grammars 1 Issues with Projects 1. *.sh files must have #!/bin/sh at the top (to run on Condor) 2. If run.sh is supposed
More information10/17/04. Today s Main Points
Part-of-speech Tagging & Hidden Markov Model Intro Lecture #10 Introduction to Natural Language Processing CMPSCI 585, Fall 2004 University of Massachusetts Amherst Andrew McCallum Today s Main Points
More informationA* Search. 1 Dijkstra Shortest Path
A* Search Consider the eight puzzle. There are eight tiles numbered 1 through 8 on a 3 by three grid with nine locations so that one location is left empty. We can move by sliding a tile adjacent to the
More informationPenn Treebank Parsing. Advanced Topics in Language Processing Stephen Clark
Penn Treebank Parsing Advanced Topics in Language Processing Stephen Clark 1 The Penn Treebank 40,000 sentences of WSJ newspaper text annotated with phrasestructure trees The trees contain some predicate-argument
More informationLatent Variable Models in NLP
Latent Variable Models in NLP Aria Haghighi with Slav Petrov, John DeNero, and Dan Klein UC Berkeley, CS Division Latent Variable Models Latent Variable Models Latent Variable Models Observed Latent Variable
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationNLP Programming Tutorial 11 - The Structured Perceptron
NLP Programming Tutorial 11 - The Structured Perceptron Graham Neubig Nara Institute of Science and Technology (NAIST) 1 Prediction Problems Given x, A book review Oh, man I love this book! This book is
More informationLanguage Technology. Unit 1: Sequence Models. CUNY Graduate Center. Lecture 4a: Probabilities and Estimations
Language Technology CUNY Graduate Center Unit 1: Sequence Models Lecture 4a: Probabilities and Estimations Lecture 4b: Weighted Finite-State Machines required hard optional Liang Huang Probabilities experiment
More informationProbabilistic Context-Free Grammars. Michael Collins, Columbia University
Probabilistic Context-Free Grammars Michael Collins, Columbia University Overview Probabilistic Context-Free Grammars (PCFGs) The CKY Algorithm for parsing with PCFGs A Probabilistic Context-Free Grammar
More informationUnit 1: Sequence Models
CS 562: Empirical Methods in Natural Language Processing Unit 1: Sequence Models Lecture 5: Probabilities and Estimations Lecture 6: Weighted Finite-State Machines Week 3 -- Sep 8 & 10, 2009 Liang Huang
More informationNatural Language Processing 1. lecture 7: constituent parsing. Ivan Titov. Institute for Logic, Language and Computation
atural Language Processing 1 lecture 7: constituent parsing Ivan Titov Institute for Logic, Language and Computation Outline Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing
More informationProbabilistic Context Free Grammars. Many slides from Michael Collins and Chris Manning
Probabilistic Context Free Grammars Many slides from Michael Collins and Chris Manning Overview I Probabilistic Context-Free Grammars (PCFGs) I The CKY Algorithm for parsing with PCFGs A Probabilistic
More informationGrammar formalisms Tree Adjoining Grammar: Formal Properties, Parsing. Part I. Formal Properties of TAG. Outline: Formal Properties of TAG
Grammar formalisms Tree Adjoining Grammar: Formal Properties, Parsing Laura Kallmeyer, Timm Lichte, Wolfgang Maier Universität Tübingen Part I Formal Properties of TAG 16.05.2007 und 21.05.2007 TAG Parsing
More information{Probabilistic Stochastic} Context-Free Grammars (PCFGs)
{Probabilistic Stochastic} Context-Free Grammars (PCFGs) 116 The velocity of the seismic waves rises to... S NP sg VP sg DT NN PP risesto... The velocity IN NP pl of the seismic waves 117 PCFGs APCFGGconsists
More informationAttendee information. Seven Lectures on Statistical Parsing. Phrase structure grammars = context-free grammars. Assessment.
even Lectures on tatistical Parsing Christopher Manning LA Linguistic Institute 7 LA Lecture Attendee information Please put on a piece of paper: ame: Affiliation: tatus (undergrad, grad, industry, prof,
More informationLecture 5: UDOP, Dependency Grammars
Lecture 5: UDOP, Dependency Grammars Jelle Zuidema ILLC, Universiteit van Amsterdam Unsupervised Language Learning, 2014 Generative Model objective PCFG PTSG CCM DMV heuristic Wolff (1984) UDOP ML IO K&M
More informationNatural Language Processing
Natural Language Processing Spring 2017 Unit 1: Sequence Models Lecture 4a: Probabilities and Estimations Lecture 4b: Weighted Finite-State Machines required optional Liang Huang Probabilities experiment
More informationIn this chapter, we explore the parsing problem, which encompasses several questions, including:
Chapter 12 Parsing Algorithms 12.1 Introduction In this chapter, we explore the parsing problem, which encompasses several questions, including: Does L(G) contain w? What is the highest-weight derivation
More informationQuiz 1, COMS Name: Good luck! 4705 Quiz 1 page 1 of 7
Quiz 1, COMS 4705 Name: 10 30 30 20 Good luck! 4705 Quiz 1 page 1 of 7 Part #1 (10 points) Question 1 (10 points) We define a PCFG where non-terminal symbols are {S,, B}, the terminal symbols are {a, b},
More informationPart of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch. COMP-599 Oct 1, 2015
Part of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch COMP-599 Oct 1, 2015 Announcements Research skills workshop today 3pm-4:30pm Schulich Library room 313 Start thinking about
More informationLecture 9: Hidden Markov Model
Lecture 9: Hidden Markov Model Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Couse webpage: http://kwchang.net/teaching/nlp16 CS6501 Natural Language Processing 1 This lecture v Hidden Markov
More informationParsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22
Parsing Probabilistic CFG (PCFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 22 Table of contents 1 Introduction 2 PCFG 3 Inside and outside probability 4 Parsing Jurafsky
More informationSharpening the empirical claims of generative syntax through formalization
Sharpening the empirical claims of generative syntax through formalization Tim Hunter University of Minnesota, Twin Cities ESSLLI, August 2015 Part 1: Grammars and cognitive hypotheses What is a grammar?
More informationX-bar theory. X-bar :
is one of the greatest contributions of generative school in the filed of knowledge system. Besides linguistics, computer science is greatly indebted to Chomsky to have propounded the theory of x-bar.
More informationCS 6120/CS4120: Natural Language Processing
CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang College of Computer and Information Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Assignment/report submission
More informationCS 545 Lecture XVI: Parsing
CS 545 Lecture XVI: Parsing brownies_choco81@yahoo.com brownies_choco81@yahoo.com Benjamin Snyder Parsing Given a grammar G and a sentence x = (x1, x2,..., xn), find the best parse tree. We re not going
More informationAlgorithms for Syntax-Aware Statistical Machine Translation
Algorithms for Syntax-Aware Statistical Machine Translation I. Dan Melamed, Wei Wang and Ben Wellington ew York University Syntax-Aware Statistical MT Statistical involves machine learning (ML) seems crucial
More informationCKY & Earley Parsing. Ling 571 Deep Processing Techniques for NLP January 13, 2016
CKY & Earley Parsing Ling 571 Deep Processing Techniques for NLP January 13, 2016 No Class Monday: Martin Luther King Jr. Day CKY Parsing: Finish the parse Recognizer à Parser Roadmap Earley parsing Motivation:
More informationLab 12: Structured Prediction
December 4, 2014 Lecture plan structured perceptron application: confused messages application: dependency parsing structured SVM Class review: from modelization to classification What does learning mean?
More informationContext- Free Parsing with CKY. October 16, 2014
Context- Free Parsing with CKY October 16, 2014 Lecture Plan Parsing as Logical DeducBon Defining the CFG recognibon problem BoHom up vs. top down Quick review of Chomsky normal form The CKY algorithm
More informationSpectral Learning for Non-Deterministic Dependency Parsing
Spectral Learning for Non-Deterministic Dependency Parsing Franco M. Luque 1 Ariadna Quattoni 2 Borja Balle 2 Xavier Carreras 2 1 Universidad Nacional de Córdoba 2 Universitat Politècnica de Catalunya
More informationContext Free Grammars: Introduction. Context Free Grammars: Simplifying CFGs
Context Free Grammars: Introduction CFGs are more powerful than RGs because of the following 2 properties: 1. Recursion Rule is recursive if it is of the form X w 1 Y w 2, where Y w 3 Xw 4 and w 1, w 2,
More informationContext-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing
Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Natural Language Processing CS 4120/6120 Spring 2016 Northeastern University David Smith with some slides from Jason Eisner & Andrew
More informationProbabilistic Linguistics
Matilde Marcolli MAT1509HS: Mathematical and Computational Linguistics University of Toronto, Winter 2019, T 4-6 and W 4, BA6180 Bernoulli measures finite set A alphabet, strings of arbitrary (finite)
More informationMidterm sample questions
Midterm sample questions CS 585, Brendan O Connor and David Belanger October 12, 2014 1 Topics on the midterm Language concepts Translation issues: word order, multiword translations Human evaluation Parts
More informationP(t w) = arg maxp(t, w) (5.1) P(t,w) = P(t)P(w t). (5.2) The first term, P(t), can be described using a language model, for example, a bigram model:
Chapter 5 Text Input 5.1 Problem In the last two chapters we looked at language models, and in your first homework you are building language models for English and Chinese to enable the computer to guess
More informationStructure and Complexity of Grammar-Based Machine Translation
Structure and of Grammar-Based Machine Translation University of Padua, Italy New York, June 9th, 2006 1 2 Synchronous context-free grammars Definitions Computational problems 3 problem SCFG projection
More informationSyntax-based Statistical Machine Translation
Syntax-based Statistical Machine Translation Philip Williams and Philipp Koehn 29 October 2014 Part I Part II - Introduction - Rule Extraction Part III - Decoding Part IV - Extensions Syntax-based Statistical
More informationHidden Markov Models in Language Processing
Hidden Markov Models in Language Processing Dustin Hillard Lecture notes courtesy of Prof. Mari Ostendorf Outline Review of Markov models What is an HMM? Examples General idea of hidden variables: implications
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted
More informationCross-Entropy and Estimation of Probabilistic Context-Free Grammars
Cross-Entropy and Estimation of Probabilistic Context-Free Grammars Anna Corazza Department of Physics University Federico II via Cinthia I-8026 Napoli, Italy corazza@na.infn.it Giorgio Satta Department
More informationDoctoral Course in Speech Recognition. May 2007 Kjell Elenius
Doctoral Course in Speech Recognition May 2007 Kjell Elenius CHAPTER 12 BASIC SEARCH ALGORITHMS State-based search paradigm Triplet S, O, G S, set of initial states O, set of operators applied on a state
More informationSyntax-Based Decoding
Syntax-Based Decoding Philipp Koehn 9 November 2017 1 syntax-based models Synchronous Context Free Grammar Rules 2 Nonterminal rules NP DET 1 2 JJ 3 DET 1 JJ 3 2 Terminal rules N maison house NP la maison
More informationSynchronous Grammars
ynchronous Grammars ynchronous grammars are a way of simultaneously generating pairs of recursively related strings ynchronous grammar w wʹ ynchronous grammars were originally invented for programming
More informationSequence Labeling: HMMs & Structured Perceptron
Sequence Labeling: HMMs & Structured Perceptron CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu HMM: Formal Specification Q: a finite set of N states Q = {q 0, q 1, q 2, q 3, } N N Transition
More informationBasic Text Analysis. Hidden Markov Models. Joakim Nivre. Uppsala University Department of Linguistics and Philology
Basic Text Analysis Hidden Markov Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakimnivre@lingfiluuse Basic Text Analysis 1(33) Hidden Markov Models Markov models are
More informationStatistical Machine Translation
Statistical Machine Translation -tree-based models (cont.)- Artem Sokolov Computerlinguistik Universität Heidelberg Sommersemester 2015 material from P. Koehn, S. Riezler, D. Altshuler Bottom-Up Decoding
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,
More informationINF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Hidden Markov Models
INF4820: Algorithms for Artificial Intelligence and Natural Language Processing Hidden Markov Models Murhaf Fares & Stephan Oepen Language Technology Group (LTG) October 27, 2016 Recap: Probabilistic Language
More informationAn introduction to PRISM and its applications
An introduction to PRISM and its applications Yoshitaka Kameya Tokyo Institute of Technology 2007/9/17 FJ-2007 1 Contents What is PRISM? Two examples: from population genetics from statistical natural
More informationCISC4090: Theory of Computation
CISC4090: Theory of Computation Chapter 2 Context-Free Languages Courtesy of Prof. Arthur G. Werschulz Fordham University Department of Computer and Information Sciences Spring, 2014 Overview In Chapter
More informationMarrying Dynamic Programming with Recurrent Neural Networks
Marrying Dynamic Programming with Recurrent Neural Networks I eat sushi with tuna from Japan Liang Huang Oregon State University Structured Prediction Workshop, EMNLP 2017, Copenhagen, Denmark Marrying
More informationHidden Markov Models
CS 2750: Machine Learning Hidden Markov Models Prof. Adriana Kovashka University of Pittsburgh March 21, 2016 All slides are from Ray Mooney Motivating Example: Part Of Speech Tagging Annotate each word
More informationLanguage and Statistics II
Language and Statistics II Lecture 19: EM for Models of Structure Noah Smith Epectation-Maimization E step: i,, q i # p r $ t = p r i % ' $ t i, p r $ t i,' soft assignment or voting M step: r t +1 # argma
More informationRecap: Lexicalized PCFGs (Fall 2007): Lecture 5 Parsing and Syntax III. Recap: Charniak s Model. Recap: Adding Head Words/Tags to Trees
Recap: Lexicalized PCFGs We now need to estimate rule probabilities such as P rob(s(questioned,vt) NP(lawyer,NN) VP(questioned,Vt) S(questioned,Vt)) 6.864 (Fall 2007): Lecture 5 Parsing and Syntax III
More informationWhat we have done so far
What we have done so far DFAs and regular languages NFAs and their equivalence to DFAs Regular expressions. Regular expressions capture exactly regular languages: Construct a NFA from a regular expression.
More informationContext Free Grammars: Introduction
Context Free Grammars: Introduction Context free grammar (CFG) G = (V, Σ, R, S), where V is a set of grammar symbols Σ V is a set of terminal symbols R is a set of rules, where R (V Σ) V S is a distinguished
More information