Natural Language Processing 1, Lecture 7: Constituent Parsing. Ivan Titov. Institute for Logic, Language and Computation
2 Outline Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing evaluation Unsupervised estimation of PCFGs (?) Grammar refinement (producing a state-of-the-art parser) (?)
3 Syntactic parsing. Syntax: the study of the patterns of formation of sentences and phrases from words. The borders with semantics and morphology are sometimes blurred: e.g., Afyonkarahisarlılaştırabildiklerimizdenmişsinizcesine in Turkish means "as if you are one of the people that we thought to be originating from Afyonkarahisar" [wikipedia]. Parsing: the process of predicting syntactic representations. Different types of syntactic representations are possible, for example a constituent (a.k.a. phrase-structure) tree, (S (NP (PRP My) (N dog)) (VP (V ate) (NP (D a) (N sausage)))), where each internal node corresponds to a phrase, or a dependency tree over the same sentence "My dog ate a sausage", with labelled arcs (poss, nsubj, dobj, det) from a designated root.
4 Constituent trees. Internal nodes correspond to phrases: S - a sentence; NP (Noun Phrase): My dog, a sandwich, lakes, ..; VP (Verb Phrase): ate a sausage, barked, ..; PP (Prepositional Phrase): with a friend, in a car, ... Nodes immediately above words are PoS tags (aka preterminals): PRP - pronoun, D - determiner, V - verb, N - noun, P - preposition. Example: (S (NP (PRP My) (N dog)) (VP (V ate) (NP (D a) (N sausage)))).
5 Bracketing notation. It is often convenient to represent a tree as a bracketed sequence: (S (NP (PRP My) (N dog)) (VP (V ate) (NP (D a) (N sausage)))). We will use this format in the assignment.
6 Dependency trees. Nodes are words (along with their PoS tags); directed arcs encode syntactic dependencies between them; labels are the types of relations between the words. Example, for "My dog ate a sausage": the designated root symbol points to "ate", with arcs poss(dog, My), nsubj(ate, dog), dobj(ate, sausage), det(sausage, a). poss - possessive, dobj - direct object, nsubj - subject, det - determiner.
7 Constituent and dependency representations. Constituent trees can (potentially) be converted to dependency trees: one potential rule extracts an nsubj dependency from the configuration (S (NP ...) (VP ...)) (in practice, the rules are specified a bit differently). Dependency trees can (potentially) be converted to constituent trees, though recovering labels is harder. Roughly: every word along with all its dependents corresponds to a phrase, i.e. to an inner node of the constituent tree, e.g., (S (NP (PRP My) (N dog)) (VP (V barked))) for "My dog barked".
9 Recovering shallow semantics. Some semantic information can be (approximately) derived from syntactic information: subjects (nsubj) are (often) agents ("initiators / doers of an action"); direct objects (dobj) are (often) patients ("affected entities"). But even for agents and patients, consider "Mary is baking a cake in the oven" vs "A cake is baking in the oven": in the first, the syntactic subject corresponds to an agent (who is baking?); in the second, to a patient (what is being baked?). In general, recovering even the most shallow forms of semantics is not trivial. E.g., consider prepositions: "in" can encode direction, position, temporal information, ...
10 Brief history of parsing. Before the mid-90s: syntactic rule-based parsers producing rich linguistic information; they provide low coverage (though this is hard to evaluate) and predict much more information than the formalisms we have just discussed. Then came the realization that basic PCFGs do not result in accurate predictive models (we will discuss this in detail). Mid-90s: the first accurate statistical parsers (e.g., [Magerman 1994, Collins 1996]): no handcrafted grammars, estimated from large datasets (Penn Treebank WSJ). Now: better models, more efficient algorithms, more languages, ...
11 Ambiguity. Ambiguous sentence: "I saw a girl with a telescope". What kind of ambiguity does it have? What implications does it have for parsing?
12 Why is parsing hard? Ambiguity. Prepositional phrase attachment ambiguity: in "I saw a girl with a telescope", the PP "with a telescope" can attach to the VP (the seeing was done with a telescope) or to the NP "a girl" (the girl has a telescope). What kind of clues would be useful? PP-attachment ambiguity is just one example type of ambiguity. How serious is this problem in practice?
13 Why is parsing hard? Ambiguity. Example with 3 prepositional phrases, 5 interpretations: Put the block ((in the box on the table) in the kitchen). Put the block (in the box (on the table in the kitchen)). Put ((the block in the box) on the table) in the kitchen. Put (the block (in the box on the table)) in the kitchen. Put (the block in the box) (on the table in the kitchen).
14 The general case is given by the Catalan numbers: Cat_n = (1/(n+1)) * C(2n, n) ≈ 4^n / (n^(3/2) * sqrt(pi)). Catalan numbers: 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, ...
15 PP-attachment is only one type of ambiguity, but almost all sentences exhibit many types.
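The Catalan growth above is easy to check numerically; a small sketch (the function name `catalan` is illustrative, not from the slides):

```python
from math import comb

def catalan(n):
    # Cat_n = C(2n, n) / (n + 1): the number of binary bracketings
    # of a sequence of n + 1 items (exact integer division).
    return comb(2 * n, n) // (n + 1)

print([catalan(n) for n in range(1, 8)])  # → [1, 2, 5, 14, 42, 132, 429]
```

The first values match the sequence on the slide, so a sentence with a dozen attachment points already has thousands of bracketings.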
16 Why is parsing hard? Ambiguity. A typical tree from a standard dataset (Penn Treebank WSJ) [from Michael Collins' slides]: Canadian Utilities had 1988 revenue of $ 1.16 billion, mainly from its natural gas and electric utility businesses in Alberta, where the company serves about 800,000 customers.
17 Key problems. Recognition problem: is the sentence grammatical? Parsing problem: what is a (most plausible) derivation (tree) corresponding to the sentence? The parsing problem encompasses the recognition problem and is the more interesting one from a practical point of view.
18 Context-Free Grammar. A context-free grammar is a tuple of 4 elements, G = (V, Σ, R, S): V - the set of non-terminals, in our case phrase categories (VP, NP, ..) and PoS tags (N, V, .. aka preterminals); Σ - the set of terminals (words); R - the set of rules of the form X → Y_1 Y_2 ... Y_n, where n ≥ 0, X ∈ V, Y_i ∈ V ∪ Σ; S ∈ V is a dedicated start symbol. Example rules: S → NP VP, NP → D N, N → girl, N → telescope, V → saw, V → eat, PP → P NP.
19 An example grammar. V = {S, VP, NP, PP, N, V, PRP, P, D}, Σ = {girl, telescope, sandwich, I, saw, ate, with, in, a, the}, S = {S}. R consists of inner rules: S → NP VP (e.g., (NP A girl) (VP ate a sandwich)); VP → V NP ((V ate) (NP a sandwich)); VP → VP PP ((VP saw a girl) (PP with a telescope)); NP → NP PP ((NP a girl) (PP with a sandwich)); NP → D N ((D a) (N sandwich)); NP → PRP; PP → P NP ((P with) (NP a sandwich)); and preterminal rules (corresponding to the emission model of an HMM): N → girl, N → telescope, N → sandwich, PRP → I, V → saw, V → ate, P → with, P → in, D → a, D → the.
20 CFGs. The grammar derives "I saw a girl with a telescope" rule by rule, yielding, e.g., the tree (S (NP (PRP I)) (VP (V saw) (NP (NP (D a) (N girl)) (PP (P with) (NP (D a) (N telescope)))))).
28 A CFG defines both: a set of strings (a language) and the structures used to represent sentences (constituent trees).
29 Why context-free? What can be a subtree is only affected by what the phrase type is (e.g., VP), not by the context: (S (NP The dog) (VP (V ate) (NP a sandwich))).
30 But context-freeness also licenses ungrammatical combinations: substituting another VP, e.g., (VP (V bark) (ADVP (A often))), for the VP above yields "The dog bark often", which is not grammatical.
31 This matters if we want to generate language (e.g., language modeling), but is it relevant to parsing?
32 Introduced coordination ambiguity. Consider "Koalas eat leaves and barks". The coarse VP and NP categories cannot enforce subject-verb agreement in number, resulting in a coordination ambiguity: (VP (V eat) (NP leaves (CC and) barks)) vs (VP (VP (V eat) leaves) (CC and) (VP (V barks))). "Barks" can be both a noun and a verb. The second tree would be ruled out if the context (subject-verb agreement) were somehow captured.
33 Introduced coordination ambiguity. Even more detailed PoS tags are not going to help here: with tags NNS koalas, VBP eat, NNS leaves, CC and, NNS/VBZ barks, both trees remain licensed; the incorrect one would only be ruled out if subject-verb agreement were captured.
34 Key problems. Recognition problem: does the sentence belong to the language defined by the CFG? That is, is there a derivation which yields the sentence? Parsing problem: what is a (most plausible) derivation (tree) corresponding to the sentence? The parsing problem encompasses the recognition problem.
35 How to deal with ambiguity? There are (exponentially) many derivations for a typical sentence: e.g., "Put the block in the box on the table in the kitchen" has many parses, differing in how the PPs attach. We want to score all the derivations to encode how plausible they are.
36 An example probabilistic CFG. Associate a probability p(X → α) with each rule: for every rule X → α ∈ R, 0 ≤ p(X → α) ≤ 1, and for every X ∈ V the probabilities of its rules sum to one, Σ_{α : X → α ∈ R} p(X → α) = 1. Now we can score a tree as the product of the probabilities of the rules used in its derivation. The rules are the same as in the example grammar above (S → NP VP, VP → V NP, VP → VP PP, NP → NP PP, NP → D N, NP → PRP, PP → P NP, plus the preterminal rules), each annotated with a probability.
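The product-of-rules scoring can be sketched as follows. The grammar fragment and its probability values here are illustrative assumptions, not the numbers from the slide:

```python
# A tree is (label, child, ...); a preterminal node is (tag, word).
# Probabilities below are illustrative assumptions, not the slide's.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("PRP",)): 0.5,
    ("NP", ("D", "N")): 0.5,
    ("VP", ("V", "NP")): 0.6,
    ("PRP", ("I",)): 1.0,
    ("V", ("saw",)): 0.5,
    ("D", ("a",)): 0.5,
    ("N", ("girl",)): 0.4,
}

def tree_prob(tree):
    # p(T) = product of the probabilities of all rules used in T
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return rule_prob[(label, (children[0],))]      # preterminal rule
    p = rule_prob[(label, tuple(c[0] for c in children))]  # inner rule
    for c in children:
        p *= tree_prob(c)
    return p

t = ("S", ("NP", ("PRP", "I")),
          ("VP", ("V", "saw"), ("NP", ("D", "a"), ("N", "girl"))))
print(tree_prob(t))  # probability of this parse of "I saw a girl"
```

Each derivation of an ambiguous sentence gets its own score, which is exactly what lets us rank the competing parses.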
37 PCFGs. Scoring a tree for "I saw a girl with a telescope": p(T) is the product of the probabilities of all rules used in the derivation, e.g., p(T) = p(S → NP VP) · p(NP → PRP) · p(PRP → I) · p(VP → V NP) · p(V → saw) · ..., multiplied out with the probabilities attached to the rules.
49 Distribution over trees. We defined a distribution over production rules for each non-terminal; our goal was to define a distribution over parse trees. Unfortunately, not all PCFGs give rise to a proper distribution over trees, i.e. the sum of the probabilities of all trees the grammar can generate may be less than 1: Σ_T P(T) < 1. Good news: any PCFG estimated with the maximum likelihood procedure is always proper [Chi and Geman, 98].
50 Distribution over trees. Let us denote by G(x) the set of derivations for the sentence x. The probability distribution P(T) defines the scoring over the trees T ∈ G(x). Finding the best parse for the sentence according to the PCFG: argmax_{T ∈ G(x)} P(T). We will look at this in the next class.
51 Outline Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing evaluation Unsupervised estimation of PCFGs (?) Grammar refinement (producing a state-of-the-art parser) (?) 51
52 ML estimation. A treebank: a collection of sentences annotated with constituent trees, e.g., trees for "My dog ate a sausage", "Put the block in the box on the table in the kitchen", "I saw a girl with a telescope", "Koalas eat leaves and barks", ...
53 ML estimation. The estimated probability of a rule (maximum likelihood estimate): p(X → α) = C(X → α) / C(X), where C(X → α) is the number of times the rule is used in the corpus and C(X) is the number of times the non-terminal X appears in the treebank.
55 ML estimation. Smoothing is helpful, especially for the preterminal rules, i.e. the generation of words (as in the emission model in PoS tagging). See smoothing techniques in Section 4.5 of the J&M book (+ the section on techniques specifically dealing with unknown words).
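The count-and-divide estimate can be sketched directly. The tree encoding (label, child, ...) and all names here are illustrative assumptions:

```python
from collections import Counter

# Maximum likelihood estimation of PCFG rule probabilities:
# p(X -> alpha) = C(X -> alpha) / C(X).
def rules(tree):
    # enumerate every rule application in a tree
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, (children[0],))                 # preterminal rule
        return
    yield (label, tuple(c[0] for c in children))      # inner rule
    for c in children:
        yield from rules(c)

def mle(treebank):
    rule_counts = Counter(r for t in treebank for r in rules(t))
    lhs_counts = Counter()
    for (lhs, rhs), c in rule_counts.items():
        lhs_counts[lhs] += c
    return {r: c / lhs_counts[r[0]] for r, c in rule_counts.items()}

# toy treebank of two trees
treebank = [("S", ("A", "a"), ("B", "b")),
            ("S", ("A", "a"), ("A", "a"))]
probs = mle(treebank)  # e.g., p(S -> A B) = 1/2, p(A -> a) = 1
```

Since every non-terminal's rule counts are normalized by the same denominator, the estimated probabilities sum to one per parent, as the PCFG definition requires (before smoothing).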
56 ML estimation: an example [ex. from Collins IWPT 01]. A toy treebank with three tree types: n_1 trees (S (B a a) (C a a)), n_2 trees (S (C a a a)), and n_3 trees (S (B a)). Without smoothing:
Rule / Count / Prob. estimate:
S → B C / n_1 / n_1/(n_1+n_2+n_3)
S → C / n_2 / n_2/(n_1+n_2+n_3)
S → B / n_3 / n_3/(n_1+n_2+n_3)
B → a a / n_1 / n_1/(n_1+n_3)
B → a / n_3 / n_3/(n_1+n_3)
C → a a / n_1 / n_1/(n_1+n_2)
C → a a a / n_2 / n_2/(n_1+n_2)
59 Penn Treebank: peculiarities. Wall Street Journal: around 40,000 annotated sentences, 1,000,000 words. Fine-grained part-of-speech tags (45), e.g., for verbs: VBD - verb, past tense; VBG - verb, gerund or present participle; VBP - verb, present (non-3rd person singular); VBZ - verb, present (3rd person singular); MD - modal. Flat NPs (no attempt to disambiguate attachment), e.g., (NP (DT a) (JJ hot) (NN dog) (NN food) (NN cart)).
60 Outline. Syntax: intro, CFGs, PCFGs. PCFGs: Estimation. CFGs: Parsing. PCFGs: Parsing. Parsing evaluation. Unsupervised estimation of PCFGs (?). Grammar refinement (producing a state-of-the-art parser) (?)
61 Parsing. Parsing is search through the space of all possible parses: we may want any parse, all parses, or (for a PCFG) the highest-scoring parse, argmax_{T ∈ G(x)} P(T), where P(T) is the probability under the PCFG model and G(x) is the set of all trees given by the grammar for the sentence x. Bottom-up: start from the words and attempt to construct the full tree. Top-down: start from the start symbol and attempt to expand so as to get the sentence.
62 CKY algorithm (aka CYK). The Cocke-Kasami-Younger algorithm, independently discovered in the late 60s / early 70s. An efficient bottom-up parsing algorithm for (P)CFGs; it can be used both for the recognition and the parsing problems. Very important in NLP (and beyond). We will start with the non-probabilistic version.
64 Constraints on the grammar. The basic CKY algorithm supports only rules in Chomsky Normal Form (CNF): unary preterminal rules C → x (generation of words given PoS tags: N → telescope, D → the, ...) and binary inner rules C → C_1 C_2 (e.g., S → NP VP, NP → D N). Any CFG can be converted to an equivalent CNF grammar. Equivalent means that they define the same language; however, the (syntactic) trees will look different, which makes linguists unhappy. It is possible to address this by defining transformations that allow for an easy reverse transformation.
65 Transformation to CNF. What one needs to do to convert to CNF: Get rid of empty (aka epsilon) productions, C → ε: generally not a problem, as there are no empty productions in the standard (post-processed) treebanks. Get rid of unary rules, C → C_1: not a problem, as our CKY algorithm will support unary rules. Binarize n-ary rules, C → C_1 C_2 ... C_n (n > 2): crucial to process these, as required for efficient parsing.
68 Transformation to CNF: binarization. Consider NP → DT NNP VBG NN, as in "the Dutch publishing group". How do we get a set of binary rules which is equivalent? NP → DT X, X → NNP Y, Y → VBG NN. A more systematic way to refer to the new non-terminals is to derive their names from the rule being binarized (e.g., intermediate symbols like @NP marking an incomplete NP).
69 Transformation to CNF: binarization. Instead of binarizing rules, we can binarize trees on preprocessing: (NP (DT the) (NNP Dutch) (VBG publishing) (NN group)) becomes (NP (DT the) (@NP (NNP Dutch) (@NP (VBG publishing) (NN group)))). Also known as lossless Markovization in the context of PCFGs; it can be easily reversed on postprocessing.
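The tree-side binarization can be sketched as a small recursive transform; the tuple encoding and "@" label convention are illustrative assumptions:

```python
# Right-branching binarization of an n-ary tree on preprocessing.
# "Lossless" in the sense that the transform is reversible: the "@"
# labels mark exactly where to re-merge children on postprocessing.
def binarize(tree):
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree                           # preterminal: keep as is
    children = [binarize(c) for c in children]
    while len(children) > 2:
        # fold the last two children under a fresh intermediate label
        left, right = children[-2], children[-1]
        children = children[:-2] + [("@" + label, left, right)]
    return (label, *children)

t = ("NP", ("DT", "the"), ("NNP", "Dutch"),
     ("VBG", "publishing"), ("NN", "group"))
binarize(t)  # nests VBG and NN under @NP, then NNP above them
```

Binary trees pass through unchanged, so the transform can be applied uniformly to a whole treebank before training.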
72 CKY: Parsing task. We are given a grammar G = (V, Σ, R, S) (with start symbol S) and a sequence of words w = (w_1, w_2, ..., w_n); our goal is to produce a parse tree for w. [Some illustrations and slides in this lecture are from Marco Kuhlmann.] We need an easy way to refer to substrings of w: indices refer to fenceposts, and span (i, j) refers to the words between fenceposts i and j.
73 Key problems. Recognition problem: does the sentence belong to the language defined by the CFG, i.e. is there a derivation which yields the sentence? Parsing problem: what is a derivation (tree) corresponding to the sentence? Probabilistic parsing: what is the most probable tree for the sentence?
74 Parsing one word. A preterminal rule C → w_i puts the label C over the single-word span covering w_i.
77 Parsing longer spans. A binary rule C → C_1 C_2 builds a tree over a longer span: check through all C_1, C_2, and split points mid.
80 Signatures. The application of a rule is independent of the inner structure of a parse tree: we only need to know the corresponding span and the root label of the tree, i.e. its signature [min, max, C], also known as an edge.
81 CKY idea Compute for every span a set of admissible labels (may be empty for some spans) Start from small trees (single words) and proceed to larger ones 81
83 CKY idea. When done, check whether S is among the admissible labels for the whole sentence; if yes, the sentence belongs to the language, i.e. a tree with signature [0, n, S] exists.
85 CKY in action. Sentence: "lead can poison". Inner rules: S → NP VP, NP → N N, NP → N, VP → M V, VP → V. Preterminal rules: N → can, N → lead, N → poison, M → can, M → must, V → poison, V → lead.
86 CKY in action. The chart (aka parsing triangle) has a cell for every span (min, max) with 0 ≤ min < max ≤ 3; the question is whether S ends up in the cell for the whole sentence.
87 CKY in action. Filling the chart for "lead can poison": the preterminal rules fill the single-word cells ("lead" {N, V}, "can" {N, M}, "poison" {N, V}), and after each cell is filled, the unary rules NP → N and VP → V are checked and add NP and VP where they apply. The binary rules then fill the two-word spans, trying every split point mid: e.g., "can poison" gets NP via NP → N N and VP via VP → M V. For the full span, S is derived for both mid = 1 and mid = 2.
107 CKY in action. The full-span cell contains S, derived via two different split points: apparently the sentence is ambiguous for the grammar (the grammar overgenerates).
108 Ambiguity. The two parses: (S (NP (N lead)) (VP (M can) (V poison))) and (S (NP (N lead) (N can)) (VP (V poison))). The second has no subject-verb agreement and uses "poison" as an intransitive verb. Apparently the sentence is ambiguous for the grammar (as the grammar overgenerates).
109 CKY more formally. The chart can be represented by a Boolean array chart[min][max][label]: chart[min][max][C] = true if the signature (min, max, C) has already been added to the chart, and false otherwise. Relevant entries have 0 ≤ min < max ≤ n. Here we assume that the labels C are integer indices.
111 Implementation: preterminal rules 111
112 Implementation: binary rules 112
113 Unary rules. How do we integrate unary rules C → C_1?
114 Implementation: unary rules 114
115 Implementation: unary rules But we forgot something! 115
117 Unary closure. What if the grammar contained the 2 rules A → B and B → C? Then C can be derived from A by a chain of rules: A → B → C. One could support chains in the algorithm, but it is easier to extend the grammar to its reflexive transitive closure: from {A → B, B → C} we get {A → B, B → C, A → C, A → A, B → B, C → C}. The identity rules are convenient for programming reasons in the PCFG case.
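The closure construction can be sketched as a fixed-point computation; here each closure entry also stores the full label chain, which is what lets a composite rule like A → C be expanded back when recovering the true tree (names are illustrative):

```python
# Reflexive transitive closure over unary rules.
def unary_closure(unary_rules, labels):
    closure = {(a, a): [a] for a in labels}          # reflexive part
    for a, b in unary_rules:
        closure[(a, b)] = [a, b]
    changed = True
    while changed:                                   # transitive part
        changed = False
        for (a, b), chain1 in list(closure.items()):
            for (c, d), chain2 in list(closure.items()):
                if b == c and (a, d) not in closure:
                    closure[(a, d)] = chain1 + chain2[1:]
                    changed = True
    return closure

closure = unary_closure({("A", "B"), ("B", "C")}, {"A", "B", "C"})
# closure now contains the composite rule ("A", "C") with its chain
```

Computing the closure once, offline, keeps the inner CKY loops free of chain-following logic.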
118 Implementation: skeleton 118
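Putting the preterminal, binary, and unary steps together, a minimal sketch of the recognizer with set-valued chart cells; the rule encodings and the toy grammar are a hedged reconstruction of the "lead can poison" example, not the slides' exact code:

```python
# CKY recognizer: chart[lo][hi] is the set of admissible labels
# for the span (lo, hi).
def cky_recognize(words, preterm, binary, unary, start="S"):
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]

    def close_unary(cell):
        # repeatedly apply unary rules (parent -> child) within a cell
        agenda = list(cell)
        while agenda:
            c = agenda.pop()
            for parent, child in unary:
                if child == c and parent not in cell:
                    cell.add(parent)
                    agenda.append(parent)

    for i, w in enumerate(words):                     # preterminal rules
        chart[i][i + 1] = {c for c, x in preterm if x == w}
        close_unary(chart[i][i + 1])
    for span in range(2, n + 1):                      # binary rules
        for lo in range(n - span + 1):
            hi = lo + span
            for mid in range(lo + 1, hi):
                for parent, l, r in binary:
                    if l in chart[lo][mid] and r in chart[mid][hi]:
                        chart[lo][hi].add(parent)
            close_unary(chart[lo][hi])
    return start in chart[0][n]

preterm = {("N", "lead"), ("N", "can"), ("N", "poison"),
           ("M", "can"), ("M", "must"), ("V", "poison"), ("V", "lead")}
binary = {("S", "NP", "VP"), ("NP", "N", "N"), ("VP", "M", "V")}
unary = {("NP", "N"), ("VP", "V")}
print(cky_recognize(["lead", "can", "poison"], preterm, binary, unary))  # → True
```

The unary closure is applied after each cell is completed, mirroring the "check unary rules" step in the worked example above.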
119 Algorithm analysis. Time complexity: O(n^3 · |R|), where |R| is the number of rules in the grammar. There exist algorithms with better asymptotic time complexity, but the 'constant' makes them slower in practice (in general).
121 Algorithm analysis. In practice: a few seconds for sentences under 20 words for a non-optimized parser.
123 Practical time complexity In practice (for the PCFG version), the runtime grows roughly as n^3.6 [Plot by Dan Klein]
124 Outline Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing evaluation Unsupervised estimation of PCFGs (?) Grammar refinement (producing a state-of-the-art parser) (?)
125 Probabilistic parsing We discussed the recognition problem: check if a sentence is parsable with a CFG. Now we consider parsing with PCFGs. Recognition with PCFGs: what is the probability of the most probable parse tree? Parsing with PCFGs: what is the most probable parse tree?
126 PCFGs An example PCFG and a parse of "I saw a girl with a telescope", with a probability attached to each rule: S → NP VP, VP → V NP, VP → VP PP, NP → NP PP, NP → P, NP → D N, PP → P NP, N → girl, N → telescope, N → sandwich, P → I, V → saw, V → ate, P → with, P → in, D → a, D → the. The probability of the tree is the product of the probabilities of the rules used in its derivation: p(T) = p(S → NP VP) · p(NP → P) · p(P → I) · …
127 Distribution over trees Let us denote by G(x) the set of derivations for the sentence x. The probability distribution defines the scoring P(T) over the trees T ∈ G(x). Finding the best parse for the sentence according to the PCFG: arg max_{T ∈ G(x)} P(T). This will be another Viterbi / Max-Product algorithm!
128 CKY with PCFGs The chart is represented by a double array chart[min][max][C] It stores the probability of the most probable subtree with a given signature; chart[0][n][S] will store the probability of the most probable full parse tree
129 Intuition For a rule C → C1 C2: for every C choose C1, C2 and mid such that P(T1) · P(T2) · P(C → C1 C2) is maximal, where T1 and T2 are the left and right subtrees.
130 Implementation: preterminal rules
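A minimal sketch of the PCFG version of the preterminal step, assuming rule probabilities are given as a dict from (label, word) to p(label → word) (the names are assumptions):

```python
def init_prob_chart(words, preterminal_probs):
    """PCFG preterminal step: chart[i][i + 1][C] = p(C -> word_i), the score
    of the (only) width-1 subtree with signature (i, i + 1, C)."""
    n = len(words)
    # each cell maps a label C to the probability of its best subtree
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for (label, terminal), p in preterminal_probs.items():
            if terminal == word:
                chart[i][i + 1][label] = p
    return chart
```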
131 Implementation: binary rules
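The Viterbi binary step can be sketched as below, keeping for each signature the score of the most probable subtree; binary rule probabilities are assumed to be a dict from (parent, left, right) to p(parent → left right):

```python
def viterbi_binary(chart, n, binary_probs):
    """Viterbi binary step: for each span and each parent C, keep the maximal
    P(T1) * P(T2) * p(C -> C1 C2) over all rules and midpoints, where T1 and
    T2 are the best left and right subtrees."""
    for width in range(2, n + 1):
        for mn in range(0, n - width + 1):
            mx = mn + width
            cell = chart[mn][mx]
            for mid in range(mn + 1, mx):
                for (parent, left, right), p in binary_probs.items():
                    if left in chart[mn][mid] and right in chart[mid][mx]:
                        score = chart[mn][mid][left] * chart[mid][mx][right] * p
                        if score > cell.get(parent, 0.0):
                            cell[parent] = score
```

Replacing the max with a sum would compute the total probability of the sentence instead of the best parse score.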
132 Unary rules Similarly to CFGs: after producing scores for signatures (C, i, j), try to improve the scores by applying unary rules (and rule chains). If improved, update the scores
133 Unary (reflexive transitive) closure As for CFGs, we extend the grammar: A → B, B → C ⇒ A → B, B → C, A → C, A → A, B → B, C → C. Note that this is not a PCFG anymore, as the rules do not sum to 1 for each parent. A closure rule such as A → C is composite (it stands for the chain A → B → C), and this fact needs to be stored to recover the true tree; its probability is the product of the probabilities along the chain. What about loops, like A → B → A → C?
137 Recovery of the tree For each signature we store backpointers to the elements from which it was built (e.g., the rule and, for binary rules, the midpoint). Start recovering from [0, n, S]. Be careful with unary rules: basically you can assume that you always used a unary rule from the closure (but it could be the trivial one C → C)
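A sketch of the recovery step in Python; the backpointer encoding (a dict from signatures to tagged tuples) is an assumption made for illustration, not the slide's data structure:

```python
def recover_tree(back, mn, mx, label):
    """Recover the best tree from backpointers. back maps a signature
    (mn, mx, C) to one of:
      ("term", word)               -- preterminal rule C -> word
      ("unary", child)             -- (closure) unary rule C -> child
      ("binary", mid, left, right) -- rule C -> left right, split at mid
    Trees are returned as nested (label, children...) tuples."""
    entry = back[(mn, mx, label)]
    if entry[0] == "term":
        return (label, entry[1])
    if entry[0] == "unary":
        return (label, recover_tree(back, mn, mx, entry[1]))
    _, mid, left, right = entry
    return (label,
            recover_tree(back, mn, mid, left),
            recover_tree(back, mid, mx, right))
```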
138 Speeding up the algorithm (approximate search) Any ideas?
139 Speeding up the algorithm (approximate search) Basic pruning (roughly): for every span (i, j), store only labels whose probability is at most some fixed factor smaller than the probability of the most probable label for this span. Check not all rules but only rules whose child labels have non-zero probability. Coarse-to-fine pruning: parse with a smaller (simpler) grammar, precompute (posterior) probabilities for each span, and use only the labels with non-negligible probability under the simpler grammar
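The basic pruning idea can be sketched in a few lines; the threshold ratio is whatever the parser chooses (the name `ratio` and the per-cell dict layout are assumptions):

```python
def prune_cell(scores, ratio):
    """Keep only labels whose probability is at least `ratio` times the
    probability of the most probable label in this cell."""
    if not scores:
        return scores
    best = max(scores.values())
    return {label: p for label, p in scores.items() if p >= best * ratio}
```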
140 Outline Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing evaluation Unsupervised estimation of PCFGs (?) Grammar refinement (producing a state-of-the-art parser) (?)
141 Parser evaluation Intrinsic evaluation: Automatic: evaluate against annotation provided by human experts (the gold standard) according to some predefined measure; though it has many drawbacks, it is easier and allows tracking the state of the art across many years. Manual: according to human judgment. Extrinsic evaluation: score a syntactic representation by comparing how well a system using this representation performs on some task. E.g., use the syntactic representation as input to a semantic analyzer and compare results of the analyzer using syntax predicted by different parsers.
142 Standard evaluation setting in parsing Automatic intrinsic evaluation is used: parsers are evaluated against a gold standard provided by linguists. There is a standard split into three parts: training set: used for estimation of model parameters; development set: used for tuning the model (initial experiments); test set: final experiments to compare against previous work
143 Automatic evaluation of constituent parsers Exact match: percentage of trees predicted correctly. Bracket score: scores how well individual phrases (and their boundaries) are identified; the most standard measure, and the one we will focus on. Crossing brackets: percentage of phrase boundaries that cross. Dependency metrics: score the dependency structure corresponding to the constituent tree (percentage of correctly identified heads)
144 Bracket scores The most standard score is the bracket score. It regards a tree as a collection of brackets (compare the subtree signatures [min, max, C] in CKY): the set of brackets predicted by a parser is compared against the set of brackets in the tree annotated by a linguist. Precision, recall and F1 are used as scores
145 Bracketing notation The same tree ("My dog ate a sausage") as a bracketed sequence: (S (NP (P My) (N dog)) (VP (V ate) (NP (D a) (N sausage))))
146 Bracket scores Pr = (number of brackets the parser and annotation agree on) / (number of brackets predicted by the parser); Re = (number of brackets the parser and annotation agree on) / (number of brackets in the annotation); F1 = 2 · Pr · Re / (Pr + Re), the harmonic mean of precision and recall
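The three scores above can be computed directly once both trees are flattened into sets of labelled brackets; a small sketch (the (label, min, max) bracket encoding is an assumption):

```python
def bracket_scores(predicted, gold):
    """Bracket precision, recall and F1. `predicted` and `gold` are sets of
    labelled brackets (label, min, max); the intersection is the set of
    brackets the parser and the annotation agree on."""
    agree = len(predicted & gold)
    pr = agree / len(predicted) if predicted else 0.0
    re = agree / len(gold) if gold else 0.0
    f1 = 2 * pr * re / (pr + re) if pr + re > 0 else 0.0
    return pr, re, f1
```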
147 Preview: F1 bracket score A comparison (shown as a table on the slide) of the Treebank PCFG, the Unlexicalized PCFG (Klein and Manning, 2003; covered in the BSc KI at UvA), the Lexicalized PCFG (Collins, 1999), the Automatically Induced PCFG (Petrov et al., 2006; the one we will look into), and the best results reported (as of 2012)
More informationRecap: HMM. ANLP Lecture 9: Algorithms for HMMs. More general notation. Recap: HMM. Elements of HMM: Sharon Goldwater 4 Oct 2018.
Recap: HMM ANLP Lecture 9: Algorithms for HMMs Sharon Goldwater 4 Oct 2018 Elements of HMM: Set of states (tags) Output alphabet (word types) Start state (beginning of sentence) State transition probabilities
More informationSharpening the empirical claims of generative syntax through formalization
Sharpening the empirical claims of generative syntax through formalization Tim Hunter University of Minnesota, Twin Cities ESSLLI, August 2015 Part 1: Grammars and cognitive hypotheses What is a grammar?
More information