Advanced Natural Language Processing: Syntactic Parsing
Transcription
1 Advanced Natural Language Processing: Syntactic Parsing. Alicia Ageno, Universitat Politècnica de Catalunya.
2 Parsing
- Review
- Statistical parsing: SCFG, inside algorithm, outside algorithm, Viterbi algorithm, learning models
- Grammar acquisition: grammatical induction
3 Parsing
Parsing: recognising higher-level units of structure that allow us to compress our description of a sentence.
Goal of syntactic analysis (parsing): detect whether a sentence is correct and provide its syntactic structure.
Parsing is the task of uncovering the syntactic structure of language, and is often viewed as an important prerequisite for building systems capable of understanding language. Syntactic structure is necessary as a first step towards semantic interpretation, for detecting phrasal chunks for indexing in an IR system, etc.
4 Parsing: a syntactic tree (figure)
5 Parsing: another syntactic tree (figure)
6 Parsing: a dependency tree (figure)
7 Parsing: a real sentence (figure)
8 Parsing. Theories of syntactic structure: constituent trees and dependency trees.
9 Parsing. Factors in parsing:
- grammar expressivity
- coverage
- involved knowledge sources
- parsing strategy
- parsing direction
- production application order
- ambiguity management
10 Parsing. Parsers today:
- CFG (extended or not): tabular chart parsers, LR
- unification-based
- statistical
- dependency parsing
- robust parsing (shallow, fragmental, chunkers, spotters)
11 Parsing: Context-Free Grammars (CFGs)
12 Parsing: Context-Free Grammars, example
13 Parsing: properties of CFGs
(content of these three slides not transcribed)
14 Parsing. Ambiguity: "I saw the man on the hill with the telescope." Possible readings:
- I was on the hill that has a telescope when I saw a man.
- I saw a man who was on a hill and who had a telescope.
- I saw a man who was on the hill that has a telescope on it.
- Using a telescope, I saw a man who was on a hill.
- I was on the hill when I used the telescope to see a man.
- ...
(Figure labels: me, see, a man, the telescope, the hill.)
15 Parsing. Chomsky Normal Form (CNF).
16 Parsing. Tabular methods: dynamic programming for CFGs. CKY (Cocke, Kasami, Younger, 1967), for grammars in CNF; Earley (1969). Extensible to unification grammars, probabilistic grammars, etc.
17 Parsing. Parsing as searching in a search space: characterise the states and, if possible, enumerate them; define the initial state(s); define, if possible, the final states or the condition for reaching one of them.
18 Tabular methods: CKY
General parsing schema (Sikkel 97): ⟨X, H, D⟩, where
X: the domain, a set of items
H ⊆ X: the set of hypotheses
D: the set of deductive steps
V(D) ⊆ X: the set of valid entities
19 Tabular methods: CKY
For G = ⟨N, Σ, P, S⟩ in CNF and input w = a_1 ... a_n, the schema ⟨X, H, D⟩ is instantiated as:
X = {[A, i, j] | 1 ≤ i ≤ j ≤ n, A ∈ N_G} (domain, set of items)
H = {[A, j, j] | (A → a_j) ∈ P_G, 1 ≤ j ≤ n} (hypotheses)
D = {[B, i, j], [C, j+1, k] ⊢ [A, i, k] | (A → BC) ∈ P_G, 1 ≤ i ≤ j < k ≤ n} (deductive steps)
V(D) = {[A, i, j] | A ⇒* a_i ... a_j} (valid entities)
20 Tabular methods: CKY
CKY: spatial cost O(n²), temporal cost O(n³); grammar in CNF; bottom-up strategy. It dynamically builds the parsing table t_{j,i}:
rows j: width of each constituent, 1 ≤ j ≤ |w| − i + 1
columns i: initial position of each constituent, 1 ≤ i ≤ |w|
where w = a_1 ... a_n is the input string and |w| = n.
21 Tabular methods: CKY
(Figure: the table cell t_{j,i} contains A, built from B and C in lower cells over a_1 a_2 ... a_i ... a_n, where A → BC is a binary production of the grammar.)
22 Tabular methods: CKY
A ∈ t_{j,i} means that the text fragment a_i ... a_{i+j−1} (the string of length j starting at position i) can be derived from A. The grammaticality condition is that the start symbol S of the grammar satisfies S ∈ t_{|w|,1}.
23 Tabular methods: CKY
The table is built bottom-up.
Base case: row 1 is built using only the unary (lexical) rules of the grammar: for j = 1, t_{1,i} = {A | (A → a_i) ∈ P}.
Recursive case: rows j = 2, ... are built in turn; the key of the algorithm is that when row j is built, all previous rows (1 to j−1) are already available: for j > 1, t_{j,i} = {A | ∃k, 1 ≤ k < j, (A → BC) ∈ P, B ∈ t_{k,i}, C ∈ t_{j−k,i+k}}.
24 Tabular methods: CKY
1. Add the lexical edges: t[1,i]
2. for j = 2 to n:
     for i = 1 to n − j + 1:
       for k = 1 to j − 1:
         if (A → BC) ∈ P and B ∈ t[k,i] and C ∈ t[j−k, i+k]:
           add A to t[j,i]
3. If S ∈ t[n,1], return the corresponding parse
25 Tabular methods: CKY
Example grammar:
sentence → NP VP
NP → A B | n
VP → C NP | vi
A → det
B → n
C → vt
Parse the sentence "the cat eats fish", with the (det), cat (n), eats (vt, vi), fish (n).
26 Tabular methods: CKY
The resulting table, by constituent width:
width 4: the cat eats fish → sentence
width 3: the cat eats → sentence; cat eats fish → sentence
width 2: the cat → NP; cat eats → sentence; eats fish → VP
width 1: the (det) → A; cat (n) → B, NP; eats (vt, vi) → C, VP; fish (n) → B, NP
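To make the tabular construction concrete, here is a minimal sketch of the recognizer in Python, following the pseudocode of slide 24 and the toy grammar of slide 25 (the function and variable names are illustrative, not from the slides):

```python
from collections import defaultdict

# Grammar of slide 25, already in CNF: binary rules plus lexical categories.
# The POS tags (det, n, vt, vi) are folded into the lexical table for brevity.
binary = [("sentence", "NP", "VP"), ("NP", "A", "B"), ("VP", "C", "NP")]
lexical = {"the": {"A"},            # the (det)     -> A
           "cat": {"B", "NP"},      # cat (n)       -> B, NP
           "fish": {"B", "NP"},     # fish (n)      -> B, NP
           "eats": {"C", "VP"}}     # eats (vt, vi) -> C, VP

def cky_recognize(words, start="sentence"):
    n = len(words)
    t = defaultdict(set)                      # t[(j, i)]: width j, position i
    for i, w in enumerate(words):             # base case: lexical edges
        t[(1, i)] = set(lexical.get(w, ()))
    for j in range(2, n + 1):                 # constituent width
        for i in range(0, n - j + 1):         # starting position (0-based)
            for k in range(1, j):             # width of the left part
                for a, b, c in binary:
                    if b in t[(k, i)] and c in t[(j - k, i + k)]:
                        t[(j, i)].add(a)
    return start in t[(n, 0)]

print(cky_recognize("the cat eats fish".split()))   # True
```

Running it fills exactly the table of this slide, with "sentence" appearing in the top cell.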
27 Statistical parsing
- Introduction
- SCFG
- Inside algorithm
- Outside algorithm
- Viterbi algorithm
- Learning models
- Grammar acquisition: grammatical induction
28 Statistical parsing. Uses of statistical models:
- determining the sentence (e.g. in speech recognizers, where the parser's job is to be a language model)
- guiding parsing: ordering or pruning the search space, getting the most likely parse
- ambiguity resolution, e.g. PP-attachment
29 Statistical parsing
Lexical approaches: context-free (unigram); context-dependent (N-gram, HMM).
Syntactic approaches: SCFG (or PCFG).
Hybrid approaches: stochastic lexicalized TAGs.
Computing the most likely (most probable) parse: Viterbi.
Parameter learning: supervised, from tagged/parsed corpora; unsupervised, Baum-Welch (Forward-Backward) for HMMs and Inside-Outside for SCFGs.
30 SCFG
Stochastic context-free grammars (SCFGs, or PCFGs) associate a probability to each rule and to each lexical entry. Frequent restriction: CNF.
Binary rules A_p → A_q A_r: probabilities stored in a matrix B_{p,q,r}.
Unary (lexical) rules A_p → b_m: probabilities stored in a matrix U_{p,m}.
31 SCFG (slide content not transcribed)
32 SCFG (slide content not transcribed)
33 SCFG (slide content not transcribed)
34 Parsing SCFG
Starting from a CFG, an SCFG: for each rule (A → α) ∈ P_G we must be able to define a probability P(A → α) such that, for every A, Σ_α P(A → α) = 1.
Probability of a tree τ: P(τ) = Π_{(A→α) ∈ P_G} P(A → α)^{f(A→α; τ)}, where f(A→α; τ) is the number of times the rule A → α is used in τ.
35 Parsing SCFG
P(t): probability of a tree t (the product of the probabilities of the rules generating it).
P(w_1n): the probability of a sentence is the sum of the probabilities of all its valid parse trees:
P(w_1n) = Σ_t P(w_1n, t) = Σ_t P(t), where t ranges over the parses of w_1n.
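As a worked illustration of P(t), the sketch below scores the slide-25 parse of "the cat eats fish"; the rule probabilities here are invented for the example (each left-hand side's probabilities sum to 1):

```python
from math import prod

# Invented probabilities for the slide-25 rules; per LHS they sum to 1.
P = {("sentence", ("NP", "VP")): 1.0,
     ("NP", ("A", "B")): 0.6, ("NP", ("n",)): 0.4,
     ("VP", ("C", "NP")): 0.7, ("VP", ("vi",)): 0.3,
     ("A", ("det",)): 1.0, ("B", ("n",)): 1.0, ("C", ("vt",)): 1.0}

# Rules used in the parse of "the cat eats fish" found by CKY above.
tree_rules = [("sentence", ("NP", "VP")),
              ("NP", ("A", "B")), ("A", ("det",)), ("B", ("n",)),
              ("VP", ("C", "NP")), ("C", ("vt",)), ("NP", ("n",))]

p_tree = prod(P[r] for r in tree_rules)     # product over rule occurrences
print(p_tree)                               # 0.168 = 0.6 * 0.7 * 0.4
```

P(w_1n) would then be this value summed with the scores of the sentence's other parse trees, if any.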
36 Parsing SCFG
Independence assumptions:
- Positional invariance: the probability of a subtree is independent of its position in the derivation tree.
- Context-free: the probability of a subtree does not depend on words not dominated by the subtree.
- Ancestor-free: the probability of a subtree does not depend on nodes in the derivation outside the subtree.
37 Parsing SCFG
Parameter estimation:
- Supervised learning from a treebank {τ_1, ..., τ_N} (MLE).
- Unsupervised learning: Inside/Outside (EM), similar to Baum-Welch in HMMs.
38 Parsing SCFG
Supervised learning: maximum likelihood estimation (MLE):
P(A → α) = #(A → α) / #(A), with #(A → α) = Σ_{i=1}^{N} f(A → α; τ_i).
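A minimal sketch of this MLE count, assuming parse trees are encoded as nested tuples; the helper names and the two-tree toy treebank are invented:

```python
from collections import Counter

def rules(tree):
    """Yield (lhs, rhs) for every rule used in a nested-tuple tree."""
    lhs, *children = tree
    if isinstance(children[0], tuple):       # internal node
        yield lhs, tuple(c[0] for c in children)
        for c in children:
            yield from rules(c)
    else:                                    # preterminal: lexical rule
        yield lhs, tuple(children)

def mle(treebank):
    rule_count, lhs_count = Counter(), Counter()
    for tree in treebank:
        for lhs, rhs in rules(tree):
            rule_count[(lhs, rhs)] += 1      # #(A -> alpha)
            lhs_count[lhs] += 1              # #(A)
    return {r: c / lhs_count[r[0]] for r, c in rule_count.items()}

# Invented two-tree treebank.
t1 = ("S", ("NP", ("n", "cat")), ("VP", ("vi", "sleeps")))
t2 = ("S", ("NP", ("n", "fish")), ("VP", ("vt", "eats"), ("NP", ("n", "fish"))))
print(mle([t1, t2]))    # e.g. P(VP -> vi) = 0.5 and P(VP -> vt NP) = 0.5
```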
39 SCFG in CNF
Learning using CNF, the most frequent approach.
Binary rules A_p → A_q A_r: matrix B_{p,q,r}. Unary rules A_p → b_m: matrix U_{p,m}. They must satisfy, for every p: Σ_{q,r} B_{p,q,r} + Σ_m U_{p,m} = 1.
A_1 is the axiom of the grammar. A derivation d is a sequence of rule applications taking A_1 to w: A_1 ⇒^d w, with
p(d|G) = Π_{k=1}^{|d|} p(d_k | G)
p(w|G) = Σ_{d: A_1 ⇒* w} p(d|G)
40 SCFG in CNF
(Figure: a derivation tree rooted at A_1 over w_1 ... w_n; an internal node A_p expands as A_q A_r over w_i ... w_k, and a preterminal A_s rewrites as b_m = w_j.)
41 SCFG in CNF
Learning using CNF: problems to solve (analogous to HMMs):
- Probability of a string (language model): p(w_1n | G)
- Most probable parse of a string: argmax_t p(t | w_1n, G)
- Parameter learning: find the G that maximizes p(w_1n | G)
42 SCFG in CNF
HMM: a probability distribution over strings of a given length; for all n, Σ_{w_1n} P(w_1n) = 1.
PCFG: a probability distribution over the set of strings in the language L: Σ_{w ∈ L} P(w) = 1.
Example: P(John decided to bake a), an incomplete sentence that still receives probability mass under a length-based model such as an HMM, but not as a member of L under a PCFG.
43 SCFG in CNF
HMM: distribution over strings of a given length (for all n, Σ_{w_1n} P(w_1n) = 1), computed with Forward/Backward:
Forward: α_i(t) = P(w_1(t−1), X_t = i)
Backward: β_i(t) = P(w_tT | X_t = i)
PCFG: distribution over the strings of the language L (Σ_{w ∈ L} P(w) = 1), computed with Inside/Outside:
Outside: O_i(p,q) = P(w_1(p−1), N^i_pq, w_(q+1)m | G)
Inside: I_i(p,q) = P(w_pq | N^i_pq, G)
44 SCFG in CNF
(Figure: a tree rooted at A_1; the outside probability covers the material outside the constituent A_p, which expands as A_q A_r; the inside probability covers the yield under A_p.)
45 SCFG in CNF
Inside probability: I_p(i,j) = P(A_p ⇒* w_i ... w_j). It can be computed bottom-up, starting with the shortest constituents.
Base case: I_p(i,i) = P(A_p → w_i) = U_{p,m} (where b_m = w_i).
Recurrence: I_p(i,k) = Σ_{q,r} Σ_{j=i}^{k−1} I_q(i,j) · I_r(j+1,k) · B_{p,q,r}.
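A sketch of the inside recurrence, assuming the grammar is given as dictionaries B[(p, q, r)] = P(A_p → A_q A_r) and U[(p, m)] = P(A_p → b_m); all names are illustrative:

```python
from collections import defaultdict

def inside(words, nonterms, B, U):
    """I[(p, i, j)] = P(A_p =>* w_i .. w_j), with 0-based positions."""
    n = len(words)
    I = defaultdict(float)
    for i, w in enumerate(words):             # base case: lexical rules
        for p in nonterms:
            I[(p, i, i)] = U.get((p, w), 0.0)
    for width in range(2, n + 1):             # shortest constituents first
        for i in range(0, n - width + 1):
            k = i + width - 1
            for (p, q, r), prob in B.items():
                for j in range(i, k):         # split point
                    I[(p, i, k)] += I[(q, i, j)] * I[(r, j + 1, k)] * prob
    return I                                  # I[(A_1, 0, n-1)] is p(w | G)
```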
46 SCFG in CNF
Outside probability: O_q(i,j) = P(A_1 ⇒* w_1 ... w_{i−1} A_q w_{j+1} ... w_n). It can be computed top-down, starting with the widest constituents.
Base case: O_1(1,n) = P(A_1 ⇒* A_1) = 1, and O_j(1,n) = 0 for j ≠ 1.
Recurrence, with two cases, summing over all possible partitions:
O_q(i,j) = Σ_{p,r} Σ_{k=j+1}^{n} O_p(i,k) · I_r(j+1,k) · B_{p,q,r} + Σ_{p,r} Σ_{k=1}^{i−1} O_p(k,j) · I_r(k,i−1) · B_{p,r,q}
47 SCFG in CNF
The two splitting forms. First: O_q(i,j) receives the contribution O_p(i,k) · I_r(j+1,k) · B_{p,q,r}.
(Figure: A_1 dominates A_p spanning w_i ... w_k; A_p splits into A_q over w_i ... w_j and A_r over w_{j+1} ... w_k, with w_1 ... w_{i−1} and w_{k+1} ... w_n outside.)
48 SCFG in CNF
Second: O_q(i,j) receives the contribution O_p(k,j) · I_r(k,i−1) · B_{p,r,q}.
(Figure: A_1 dominates A_p spanning w_k ... w_j; A_p splits into A_r over w_k ... w_{i−1} and A_q over w_i ... w_j, with w_1 ... w_{k−1} and w_{j+1} ... w_n outside.)
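A matching sketch of the outside pass, under the same illustrative conventions; it consumes the table I produced by the inside sketch above, and both splitting forms appear as the two inner loops:

```python
from collections import defaultdict

def outside(n, B, I, start=1):
    """O[(q, i, j)] = P(A_1 =>* w_1..w_{i-1} A_q w_{j+1}..w_n)."""
    O = defaultdict(float)
    O[(start, 0, n - 1)] = 1.0                # base case: O_1(1, n) = 1
    for width in range(n - 1, 0, -1):         # widest constituents first
        for i in range(0, n - width + 1):
            j = i + width - 1
            for (p, q, r), prob in B.items():
                # first form: A_q is the left child of A_p -> A_q A_r
                for k in range(j + 1, n):
                    O[(q, i, j)] += O[(p, i, k)] * I[(r, j + 1, k)] * prob
                # second form: A_r is the right child of A_p -> A_q A_r
                for k in range(0, i):
                    O[(r, i, j)] += O[(p, k, j)] * I[(q, k, i - 1)] * prob
    return O
```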
49 SCFG in CNF
Viterbi: O(|G| · n³). Given a sentence w_1 ... w_n, M_p(i,j) contains the maximum probability of a derivation A_p ⇒* w_i ... w_j. M can be computed incrementally for substrings of increasing length, by induction over the length j − i + 1.
Base case: M_p(i,i) = P(A_p → w_i) = U_{p,m} (where b_m = w_i).
50 SCFG in CNF
Recurrence: consider all the ways of decomposing A_p into two components, updating the maximum probability:
M_p(i,j) = max_{q,r} max_{k=i}^{j−1} M_q(i,k) · M_r(k+1,j) · B_{p,q,r}
Recall that using sum instead of max we get the inside algorithm, i.e. p(w_1n | G).
(Figure: A_p over w_i ... w_j splits into A_q over w_i ... w_k, of length k − i + 1, and A_r over w_{k+1} ... w_j, of length j − k.)
51 SCFG in CNF
The probability of the best (most probable) derivation is M_1(1,n). To recover the best derivation tree we must maintain not only the probability M_p(i,j) but also the cut point and the two categories of the right-hand side of the rule: for each (p,i,j), store
argmax_{q,r,k} M_q(i,k) · M_r(k+1,j) · B_{p,q,r}
as (RHS1(p,i,j), RHS2(p,i,j), SPLIT(p,i,j)), so that A_p rewrites as A_{RHS1(p,i,j)} A_{RHS2(p,i,j)} with the split after w_{SPLIT(p,i,j)}.
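A sketch of the Viterbi recurrence with backpointers, again under the illustrative B/U conventions used above; it assumes the sentence has at least one parse:

```python
from collections import defaultdict

def viterbi(words, nonterms, B, U, start=1):
    n = len(words)
    M = defaultdict(float)                    # best derivation probability
    back = {}                                 # (RHS1, RHS2, SPLIT) per cell
    for i, w in enumerate(words):             # base case: M_p(i, i)
        for p in nonterms:
            M[(p, i, i)] = U.get((p, w), 0.0)
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width - 1
            for (p, q, r), prob in B.items():
                for k in range(i, j):         # max instead of sum: Viterbi
                    score = M[(q, i, k)] * M[(r, k + 1, j)] * prob
                    if score > M[(p, i, j)]:
                        M[(p, i, j)] = score
                        back[(p, i, j)] = (q, r, k)

    def tree(p, i, j):                        # rebuild the best tree
        if i == j:
            return (p, words[i])
        q, r, k = back[(p, i, j)]             # assumes a parse exists
        return (p, tree(q, i, k), tree(r, k + 1, j))

    return M[(start, 0, n - 1)], tree(start, 0, n - 1)
```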
52 SCFG in CNF
Learning the models: supervised approach. The parameters (the probabilities, i.e. the matrices B and U) are estimated by MLE (maximum likelihood estimation) from a fully parsed corpus (a set of pairs ⟨sentence, correct parse tree⟩):
B̂_{p,q,r} = P̂(A_p → A_q A_r) = E(# A_p → A_q A_r | G) / E(# A_p | G)
53 SCFG in CNF
Learning the models: unsupervised approach. The Inside/Outside algorithm, similar to Forward-Backward (Baum-Welch) for HMMs, is a particular application of the Expectation Maximization (EM) algorithm:
1. Start with an initial model µ0 (uniform, random, MLE, ...)
2. Compute the observation probability using the current model µ
3. Use the obtained probabilities as data to re-estimate the model, computing µ'
4. Let µ = µ' and repeat until no significant improvement (convergence)
Iterative hill-climbing: local maxima. EM property: P_µ'(O) ≥ P_µ(O).
54 SCFG in CNF
Learning the models: unsupervised approach. Inside/Outside algorithm:
Input: a set of training examples (unparsed sentences) and a CFG G.
Initialization: choose initial parameters P(A → α) ≥ 0 for each rule in the grammar (randomly, or from a small labelled corpus using MLE), such that for every A: Σ_α P(A → α) = 1.
Expectation: compute the posterior probability of each annotated rule at each position, for each training sentence.
Maximization: use these probabilities as weighted observations to update the rule probabilities.
55 SCFG in CNF
Inside/Outside algorithm: for each training sentence w we compute the inside and outside probabilities. Multiplying them:
O_i(j,k) · I_i(j,k) = P(A_1 ⇒* w_1 ... w_n, A_i ⇒* w_j ... w_k | G) = P(w_1n, A^i_jk | G)
so that the estimate of A_i being used in the derivation is:
E(A_i is used in the derivation) = (1 / I_1(1,n)) · Σ_{p=1}^{n} Σ_{q=p}^{n} O_i(p,q) · I_i(p,q)
56 SCFG in CNF
Inside/Outside algorithm: the estimate of A_i → A_r A_s being used in the derivation:
E(A_i → A_r A_s) = (1 / I_1(1,n)) · Σ_{p=1}^{n−1} Σ_{q=p+1}^{n} Σ_{d=p}^{q−1} O_i(p,q) · B_{i,r,s} · I_r(p,d) · I_s(d+1,q)
For unary rules, the estimate of A_i → w_m being used:
E(A_i → w_m) = (1 / I_1(1,n)) · Σ_{h: w_h = w_m} O_i(h,h) · I_i(h,h)
And we can re-estimate P(A_i → A_r A_s) and P(A_i → w_m):
P(A_i → A_r A_s) = E(A_i → A_r A_s) / E(A_i used)
P(A_i → w_m) = E(A_i → w_m) / E(A_i used)
57 SCFG in CNF
Inside/Outside algorithm: assuming independence of the sentences in the training corpus, we sum the contributions from multiple sentences in the re-estimation process. We re-estimate the values of P(A_p → A_q A_r) and P(A_p → w_m), and from them the new values of U_{p,m} and B_{p,q,r}. The I-O algorithm iterates this parameter re-estimation until the change in the estimated probability of the training corpus is small, i.e. until P(W | G_{i+1}) − P(W | G_i) falls below a threshold.
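The re-estimation formulas above can be put together into one EM iteration. The sketch below builds on the inside() and outside() sketches earlier and is illustrative only; a production implementation would also guard against numerical underflow, e.g. by working in log space or rescaling:

```python
from collections import defaultdict

def io_step(corpus, nonterms, B, U, start=1):
    """One Inside-Outside iteration; returns re-estimated (B, U)."""
    num_B = defaultdict(float)                # E(A_p -> A_q A_r) counts
    num_U = defaultdict(float)                # E(A_p -> w) counts
    den = defaultdict(float)                  # E(A_p used)
    for words in corpus:
        n = len(words)
        I = inside(words, nonterms, B, U)
        O = outside(n, B, I, start)
        Z = I[(start, 0, n - 1)]              # p(w | G) = I_1(1, n)
        if Z == 0.0:
            continue                          # sentence not in the language
        for (p, q, r), prob in B.items():     # binary-rule expectations
            for i in range(n):
                for j in range(i + 1, n):
                    for d in range(i, j):     # split point
                        c = (O[(p, i, j)] * prob *
                             I[(q, i, d)] * I[(r, d + 1, j)]) / Z
                        num_B[(p, q, r)] += c
                        den[p] += c
        for i, w in enumerate(words):         # unary-rule expectations
            for p in nonterms:
                c = O[(p, i, i)] * U.get((p, w), 0.0) / Z
                num_U[(p, w)] += c
                den[p] += c
    new_B = {k: v / den[k[0]] for k, v in num_B.items() if den[k[0]] > 0}
    new_U = {k: v / den[k[0]] for k, v in num_U.items() if den[k[0]] > 0}
    return new_B, new_U
```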
58 SCFG
Pros and cons of SCFGs:
- They give some idea of the probability of a parse, but not a very good one.
- CFGs cannot be learned without negative examples; SCFGs can.
- SCFGs provide a language model for a language, but in practice a worse one than an n-gram model (n > 1).
- Trees built from the same rules receive the same probability regardless of the words, e.g. P([N [N toy] [N [N coffee] [N grinder]]]) = P([N [N [N cat] [N food]] [N tin]]).
- P(NP → Pro) is greater in subject position than in object position, a context dependence an SCFG cannot express.
59 SCFG
Pros and cons of SCFGs (continued):
- Robust.
- Possibility of combining SCFGs with 3-grams.
- SCFGs assign a lot of probability mass to short sentences (a small tree is more probable than a big one).
- Parameter estimation (of the probabilities) suffers from sparseness and from the volume of data required.
60 Statistical parsing
Grammatical induction from corpora. Goal: parsing of unrestricted texts with a reasonable level of accuracy (> 90%) and efficiency. Requirements: POS-tagged corpora (Brown, LOB, Clic-Talp) and parsed corpora (Penn Treebank, Susanne, AnCora).
61 Treebank grammars
Penn Treebank: 50,000 sentences with associated trees. Usual set-up: 40,000 training sentences, 2,400 test sentences.
62 Treebank grammars
Grammars directly derived from a treebank (Charniak, 1996): using the PTB (47,000 sentences) and navigating it so that each local subtree provides the left-hand and right-hand side of a rule. Precision and recall around 80%; around 17,500 rules.
63 Treebank grammars
Learning treebank grammars: for each nonterminal N^i, Σ_j P(N^i → ζ_j | N^i) = 1.
64 Treebank grammars
Supervised learning: MLE.
65 Treebank grammars
Proposals for transforming the PTB-derived grammar (Sekine, 1997; Sekine & Grishman, 1995): treebank grammar compaction. Motivation: the grammar lacks generalization ability, its size grows continuously, and most induced rules have low frequency (Krotov et al., 1999; Krotov, 1998; Gaizauskas, 1995).
66 Treebank grammars
Treebank grammar compaction. Partial bracketing: for example, NP → DT NN CC DT NN can be generated from NP → NP CC NP and NP → DT NN. Redundancy removal: some rules can be generated from others.
67 Treebank grammars
Removing non-linguistically-valid rules: assign probabilities (MLE) to the initial rules, then remove a rule unless the probability of the structure built by applying it is greater than the probability of building the same structure by applying simpler rules. Thresholding: remove rules occurring fewer than n times.

                 Full      Simply       Fully      Linguistically  Linguistically
                 grammar   thresholded  compacted  compacted 1     compacted 2
  Grammar size   15,421    7,278        1,122      4,820           6,417

(Recall and precision figures per grammar were not recovered in the transcription.)
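Of these transformations, frequency thresholding is easy to make concrete. A sketch, assuming rule counts like those produced by the mle() sketch earlier; the example counts and the threshold value are invented:

```python
from collections import Counter, defaultdict

def threshold(rule_count, n=2):
    """Drop rules seen < n times, then renormalize per left-hand side."""
    kept = {r: c for r, c in rule_count.items() if c >= n}
    mass = defaultdict(int)
    for (lhs, _), c in kept.items():
        mass[lhs] += c
    return {r: c / mass[r[0]] for r, c in kept.items()}

# Invented counts: one rare flat NP rule, seen only once.
counts = Counter({("NP", ("DT", "NN")): 120,
                  ("NP", ("NP", "CC", "NP")): 30,
                  ("NP", ("DT", "NN", "CC", "DT", "NN")): 1})
print(sorted(threshold(counts, n=2)))   # the rare flat rule is gone
```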
68 Treebank grammars
Applying compaction: from 17,529 to 1,667 rules. (Figure: number of rules against percentage of the corpus covered, with marks at 60% and 100%.)