Natural Language Processing 1, Lecture 7: Constituent Parsing. Ivan Titov. Institute for Logic, Language and Computation


Outline Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing evaluation Unsupervised estimation of PCFGs (?) Grammar refinement (producing a state-of-the-art parser) (?)

Syntactic parsing. Syntax is the study of the patterns of formation of sentences and phrases from words; its borders with semantics and morphology are sometimes blurred (e.g., the single Turkish word Afyonkarahisarlılaştırabildiklerimizdenmişsinizcesinee means "as if you are one of the people that we thought to be originating from Afyonkarahisar" [Wikipedia]). Parsing is the process of predicting syntactic representations. Different types of syntactic representations are possible, for example a constituent (a.k.a. phrase-structure) tree, where each internal node corresponds to a phrase, or a dependency tree, illustrated here for the sentence "My dog ate a sausage". 3

Constituent trees. Internal nodes correspond to phrases: S (a sentence), NP (Noun Phrase: "My dog", "a sandwich", "lakes", ...), VP (Verb Phrase: "ate a sausage", "barked", ...), PP (Prepositional Phrase: "with a friend", "in a car", ...). Nodes immediately above words are PoS tags (a.k.a. preterminals): PN (pronoun), D (determiner), V (verb), N (noun), P (preposition). Example: (S (NP (PN My) (N dog)) (VP (V ate) (NP (D a) (N sausage)))). 4

Bracketing notation. It is often convenient to represent a tree as a bracketed sequence: (S (NP (PN My) (N dog)) (VP (V ate) (NP (D a) (N sausage)))). We will use this format in the assignment. 5

Dependency trees. A designated root symbol marks the root of the sentence. Nodes are words (along with their PoS tags), directed arcs encode syntactic dependencies between them, and the arc labels are the types of relations between the words: poss (possessive), nsubj (subject), dobj (direct object), det (determiner). For "My dog ate a sausage": root points to "ate", "dog" is the nsubj of "ate", "My" is a poss modifier of "dog", "sausage" is the dobj of "ate", and "a" is its det. 6

Constituent and dependency representations. Constituent trees can (potentially) be converted to dependency trees; for example, one rule could extract an nsubj dependency from the NP and VP daughters of an S node (in practice the rules are specified a bit differently). Dependency trees can (potentially) be converted to constituent trees, although recovering the labels is harder. Roughly, every word along with all its dependents corresponds to a phrase, i.e. to an inner node in the constituent tree (e.g., for "My dog barked"). 7

Recovering shallow semantics. Some semantic information can be (approximately) derived from syntactic information: subjects (nsubj) are often agents ("initiators / doers of an action") and direct objects (dobj) are often patients ("affected entities"). But even for agents and patients, compare "Mary is baking a cake in the oven", where the syntactic subject corresponds to an agent (who is baking?), with "A cake is baking in the oven", where the syntactic subject corresponds to a patient (what is being baked?). In general this is not trivial even for the most shallow forms of semantics; e.g., consider prepositions: "in" can encode direction, position, temporal information, ... 8-9

Brief history of parsing. Before the mid-90s: syntactic rule-based parsers producing rich linguistic information; they provide low coverage (though this is hard to evaluate) and predict much more information than the formalisms we have just discussed. It was realized that basic PCFGs do not result in accurate predictive models (we will discuss this in detail). Mid-90s: the first accurate statistical parsers (e.g., [Magerman 1994, Collins 1996]): no handcrafted grammars, estimated from large datasets (Penn Treebank WSJ). Now: better models, more efficient algorithms, more languages, ... 10

Ambiguity. An ambiguous sentence: "I saw a girl with a telescope". What kind of ambiguity does it have? What implications does it have for parsing? 11

Why is parsing hard? Ambiguity. Prepositional phrase attachment ambiguity: in "I saw a girl with a telescope", the PP "with a telescope" can attach either to the VP (the seeing was done with a telescope) or to the NP "a girl" (the girl has a telescope). What kind of clues would be useful? PP-attachment ambiguity is just one example type of ambiguity. How serious is this problem in practice? 12

Why is parsing hard? Ambiguity. An example with 3 prepositional phrases and 5 interpretations: Put the block ((in the box on the table) in the kitchen). Put the block (in the box (on the table in the kitchen)). Put ((the block in the box) on the table) in the kitchen. Put (the block (in the box on the table)) in the kitchen. Put (the block in the box) (on the table in the kitchen). In the general case the number of readings is given by the Catalan numbers: Cat_n = (1/(n+1)) (2n choose n) ≈ 4^n / (n^(3/2) sqrt(pi)), i.e. 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, ... This is only one type of ambiguity, and almost all sentences contain many types. 13-15
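
The growth of this ambiguity count can be illustrated with a small sketch (mine, not from the slides); Catalan numbers count binary bracketings, and Cat_3 = 5 matches the 3-PP example above.

from math import comb

def catalan(n):
    # Cat_n = C(2n, n) / (n + 1): the number of binary bracketings,
    # e.g. the number of attachment readings with n prepositional phrases.
    return comb(2 * n, n) // (n + 1)

print([catalan(n) for n in range(1, 12)])
# [1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786]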

Why is parsing hard? Ambiguity. A typical tree from a standard dataset (Penn Treebank WSJ) [from Michael Collins' slides]: "Canadian Utilities had 1988 revenue of $ 1.16 billion, mainly from its natural gas and electric utility businesses in Alberta, where the company serves about 800,000 customers." 16

Key problems. Recognition problem: is the sentence grammatical? Parsing problem: what is a (most plausible) derivation (tree) corresponding to the sentence? The parsing problem encompasses the recognition problem and is the more interesting question from a practical point of view. 17

Context-Free Grammar. A context-free grammar is a tuple of 4 elements G = (V, Σ, R, S): V is the set of non-terminals, in our case phrase categories (S, NP, VP, PP, ...) and PoS tags (N, V, D, ..., a.k.a. preterminals); Σ is the set of terminals (words); R is the set of rules of the form X → Y_1 Y_2 ... Y_n, where n ≥ 0, X ∈ V, Y_i ∈ V ∪ Σ; and S ∈ V is a dedicated start symbol. Example rules: S → NP VP, N → girl, NP → D N, N → telescope, V → saw, NP → NP PP, V → eat, PP → P NP, ... 18

An example grammar. V = {S, NP, VP, PP, N, V, PN, P, D}, Σ = {girl, telescope, sandwich, I, saw, ate, with, in, a, the}, start symbol S. The rules R are split into inner rules: S → NP VP (e.g., (NP a girl) (VP ate a sandwich)); VP → V NP (e.g., (V ate) (NP a sandwich)); VP → V; VP → VP PP (e.g., (VP saw a girl) (PP with a telescope)); NP → NP PP (e.g., (NP a girl) (PP with a sandwich)); NP → D N (e.g., (D a) (N sandwich)); NP → PN; PP → P NP (e.g., (P with) (NP a sandwich)); and preterminal rules (the analogue of the emission model in an HMM PoS tagger): N → girl, N → telescope, N → sandwich, PN → I, V → saw, V → ate, P → with, P → in, D → a, D → the. 19

CFGs. With this grammar we can derive "I saw a girl with a telescope" step by step, starting from the start symbol and repeatedly rewriting a non-terminal using a rule: S ⇒ NP VP ⇒ PN VP ⇒ I VP ⇒ ... ⇒ I saw a girl with a telescope. The sequence of rewrites corresponds to a constituent tree, e.g. (S (NP (PN I)) (VP (V saw) (NP (NP (D a) (N girl)) (PP (P with) (NP (D a) (N telescope)))))). A CFG thus defines both a set of strings (a language) and the structures used to represent sentences (constituent trees). 20-28

Why context-free? What can be a subtree is only affected by the phrase type (e.g., VP), not by the context it appears in. For example, in (S (NP The dog) (VP ate a sandwich)) the VP can be replaced by any other VP the grammar generates, e.g. "bark often", giving the not grammatical "The dog bark often": the grammar does not check agreement with the context. This matters if we want to generate language (e.g., language modeling), but is it relevant to parsing? 29-31

Introduced coordination ambiguity. Here the coarse VP and NP categories cannot enforce subject-verb agreement in number, resulting in a coordination ambiguity for "koalas eat leaves and barks": "barks" can be either a noun (coordinated with "leaves": koalas eat [leaves and barks]) or a verb (coordinated with the VP: koalas [eat leaves] and [barks]). The second tree would be ruled out if the context (subject-verb agreement) were somehow captured. Even more detailed PoS tags (NNS koalas, VBP eat, NNS leaves, CC and, NNS/VBZ barks) are not going to help here. 32-33

Key problems. Recognition problem: does the sentence belong to the language defined by the CFG? That is, is there a derivation which yields the sentence? Parsing problem: what is a (most plausible) derivation (tree) corresponding to the sentence? The parsing problem encompasses the recognition problem. 34

How to deal with ambiguity? There are (exponentially) many derivations for a typical sentence; for "Put the block in the box on the table in the kitchen" the grammar licenses several trees differing in where the PPs attach. We want to score all the derivations to encode how plausible they are. 35

An example probabilistic CFG. Associate probabilities with the rules: for every rule X → α ∈ R we require 0 ≤ p(X → α) ≤ 1, and for every non-terminal X ∈ V the probabilities of its rules sum to one: Σ_{α: X → α ∈ R} p(X → α) = 1. Now we can score a tree as the product of the probabilities of the rules used in it. Inner rules: S → NP VP (1.0), VP → V NP (0.4), VP → V (0.2), VP → VP PP (0.4), NP → NP PP (0.3), NP → D N (0.5), NP → PN (0.2), PP → P NP (1.0). Preterminal rules: N → girl (0.2), N → telescope (0.7), N → sandwich (0.1), PN → I (1.0), V → saw (0.5), V → ate (0.5), P → with (0.6), P → in (0.4), D → a (0.3), D → the (0.7). 36
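
A minimal sketch (not from the slides; the names are mine) of how this PCFG could be stored in Python, together with a check that the rule probabilities for each left-hand side form a proper conditional distribution:

from collections import defaultdict

# (lhs, rhs) -> probability, with rhs a tuple of right-hand-side symbols
pcfg = {
    ("S",  ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.4, ("VP", ("V",)): 0.2, ("VP", ("VP", "PP")): 0.4,
    ("NP", ("NP", "PP")): 0.3, ("NP", ("D", "N")): 0.5, ("NP", ("PN",)): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("N", ("girl",)): 0.2, ("N", ("telescope",)): 0.7, ("N", ("sandwich",)): 0.1,
    ("PN", ("I",)): 1.0, ("V", ("saw",)): 0.5, ("V", ("ate",)): 0.5,
    ("P", ("with",)): 0.6, ("P", ("in",)): 0.4, ("D", ("a",)): 0.3, ("D", ("the",)): 0.7,
}

totals = defaultdict(float)
for (lhs, _rhs), prob in pcfg.items():
    totals[lhs] += prob
assert all(abs(total - 1.0) < 1e-9 for total in totals.values())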

CFGs. For "I saw a girl with a telescope", consider the tree (S (NP (PN I)) (VP (V saw) (NP (NP (D a) (N girl)) (PP (P with) (NP (D a) (N telescope)))))). Multiplying the probabilities of all rules used in the tree gives p(T) = 1.0 · 0.2 · 1.0 · 0.4 · 0.5 · 0.3 · 0.5 · 0.3 · 0.2 · 1.0 · 0.6 · 0.5 · 0.3 · 0.7 = 2.26 · 10⁻⁵. 37-47
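
A small sketch (my own code, reusing the pcfg dictionary from the sketch above) that scores this tree by multiplying the probabilities of the rules used at each node:

# A tree is a nested tuple (label, children...); a preterminal's single child is the word.
tree = ("S",
        ("NP", ("PN", "I")),
        ("VP", ("V", "saw"),
               ("NP", ("NP", ("D", "a"), ("N", "girl")),
                      ("PP", ("P", "with"), ("NP", ("D", "a"), ("N", "telescope"))))))

def tree_prob(node):
    label, children = node[0], node[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = pcfg[(label, rhs)]            # probability of the rule used at this node
    for child in children:
        if not isinstance(child, str):   # recurse into non-terminal children only
            prob *= tree_prob(child)
    return prob

print(tree_prob(tree))  # ~2.27e-05 (the slide truncates this to 2.26e-05)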

Distribution over trees. We defined a distribution over production rules for each non-terminal, but our goal was to define a distribution over parse trees. Unfortunately, not all PCFGs give rise to a proper distribution over trees: the sum of the probabilities of all trees the grammar can generate may be less than 1, i.e. Σ_T P(T) < 1. Good news: any PCFG estimated with the maximum likelihood procedure is always proper [Chi and Geman, 1998]. 48-49

Distribution over trees. Let us denote by G(x) the set of derivations for the sentence x. The probability distribution defines the scoring P(T) over the trees T ∈ G(x). Finding the best parse for the sentence according to the PCFG means computing argmax_{T ∈ G(x)} P(T). We will look at this in the next class. 50

Outline Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing evaluation Unsupervised estimation of PCFGs (?) Grammar refinement (producing a state-of-the-art parser) (?) 51

ML estimation. A treebank is a collection of sentences annotated with constituent trees (e.g., "My dog ate a sausage", "Put the block in the box on the table in the kitchen", "I saw a girl with a telescope", "koalas eat leaves and barks", ...). The estimated probability of a rule (its maximum likelihood estimate) is p(X → α) = C(X → α) / C(X), where C(X → α) is the number of times the rule is used in the corpus and C(X) is the number of times the non-terminal X appears in the treebank. Smoothing is helpful, especially for preterminal rules, i.e. the generation of words (as in PoS tagging); see the smoothing techniques in Section 4.5 of the J&M book (plus Section 5.8.2 for some techniques specifically dealing with unknown words). 52-55
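
A sketch (mine; the helper names are made up) of maximum likelihood estimation over a treebank stored in the nested-tuple format used earlier; the toy tree below resembles the first tree type of the Collins example on the next slide:

from collections import Counter

def rules(node):
    # yield an (lhs, rhs) pair for every internal node of the tree
    label, children = node[0], node[1:]
    yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)

def mle(treebank):
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in treebank:
        for lhs, rhs in rules(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: count / lhs_counts[rule[0]] for rule, count in rule_counts.items()}

toy_treebank = [("S", ("B", "a", "a"), ("C", "a", "a"))]
print(mle(toy_treebank))
# {('S', ('B', 'C')): 1.0, ('B', ('a', 'a')): 1.0, ('C', ('a', 'a')): 1.0}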

ML estimation: an example [example from Collins, IWPT 01]. A toy treebank contains n_1 trees of the form (S (B a a) (C a a)), n_2 trees of the form (S (C a a a)) and n_3 trees of the form (S (B a)). Without smoothing, the estimates are:

Rule       Count   Prob. estimate
S → B C    n_1     n_1 / (n_1 + n_2 + n_3)
S → C      n_2     n_2 / (n_1 + n_2 + n_3)
S → B      n_3     n_3 / (n_1 + n_2 + n_3)
B → a a    n_1     n_1 / (n_1 + n_3)
B → a      n_3     n_3 / (n_1 + n_3)
C → a a    n_1     n_1 / (n_1 + n_2)
C → a a a  n_2     n_2 / (n_1 + n_2)

56-58

Penn Treebank: peculiarities. Wall Street Journal: around 40,000 annotated sentences, 1,000,000 words. Fine-grained part-of-speech tags (45 of them), e.g., for verbs: VBD (verb, past tense), VBG (verb, gerund or present participle), VBP (verb, present, non-3rd person singular), VBZ (verb, present, 3rd person singular), MD (modal). Flat NPs (no attempt to disambiguate NP-internal attachment), e.g., (NP (DT a) (JJ hot) (NN dog) (NN food) (NN cart)). 59

Outline Unsupervised estimation: forward-backward Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing evaluation Unsupervised estimation of PCFGs (?) Grammar refinement (producing a state-of-the-art parser) (?) 60

Parsing. Parsing is search through the space of all possible parses: we may want any parse, all parses, or (for a PCFG) the highest-scoring parse, argmax_{T ∈ G(x)} P(T), where G(x) is the set of all trees the grammar assigns to the sentence x and P(T) is the probability under the PCFG model. Bottom-up parsing starts from the words and attempts to construct the full tree; top-down parsing starts from the start symbol and attempts to expand it so as to derive the sentence. 61

CKY algorithm (a.k.a. CYK). The Cocke-Kasami-Younger algorithm, independently discovered in the late 60s / early 70s, is an efficient bottom-up parsing algorithm for (P)CFGs that can be used both for the recognition and the parsing problems. It is very important in NLP (and beyond). We will start with the non-probabilistic version. 62

Constraints on the grammar. The basic CKY algorithm supports only rules in Chomsky Normal Form (CNF): unary preterminal rules C → x (generation of words given PoS tags, e.g. N → telescope, D → the) and binary inner rules C → C_1 C_2 (e.g., S → NP VP, NP → D N). Any CFG can be converted to an equivalent CNF grammar. Equivalent means that they define the same language; however, the (syntactic) trees will look different, which makes linguists unhappy. This can be addressed by defining transformations that allow for an easy reverse transformation. 63-64

Transformation to CNF. What one needs to do to convert a grammar to CNF: get rid of empty (a.k.a. epsilon) productions C → ε (generally not a problem, as there are no empty productions in the standard postprocessed treebanks); get rid of unary rules C → C_1 (not a problem, as our CKY algorithm will support unary rules); binarize n-ary rules C → C_1 C_2 ... C_n with n > 2 (crucial to process these, as binarization is required for efficient parsing). 65

Transformation to CNF: binarization. Consider a flat rule such as NP → DT NN VBG NN, as in "the Dutch publishing group". How do we get an equivalent set of binary rules? One option is to introduce fresh non-terminals: NP → DT X, X → NN Y, Y → VBG NN. A more systematic way to refer to the new non-terminals is to name them after the rule being binarized and the children consumed so far, e.g. NP → DT @NP->DT, @NP->DT → NN @NP->DT_NN, @NP->DT_NN → VBG NN. 66-68

Transformation to CNF: binarization. Instead of binarizing rules we can binarize the trees during preprocessing: (NP (DT the) (NN Dutch) (VBG publishing) (NN group)) becomes (NP (DT the) (@NP->DT (NN Dutch) (@NP->DT_NN (VBG publishing) (NN group)))). In the context of PCFGs this is also known as lossless Markovization, and it can be easily reversed during postprocessing. 69-70
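
A sketch (my own, with simpler intermediate labels than the @NP->DT_NN scheme above, but the same idea) of right-branching binarization and its inverse on the nested-tuple trees:

def binarize(node):
    if isinstance(node, str):
        return node
    label, children = node[0], [binarize(c) for c in node[1:]]
    if len(children) <= 2:
        return tuple([label] + children)
    # fold the children from the right under intermediate '@' symbols
    right = children[-1]
    for child in reversed(children[1:-1]):
        right = ("@" + label, child, right)
    return (label, children[0], right)

def unbinarize(node):
    if isinstance(node, str):
        return node
    label, out = node[0], []
    for child in node[1:]:
        child = unbinarize(child)
        if not isinstance(child, str) and child[0].startswith("@"):
            out.extend(child[1:])   # splice the intermediate node back into its parent
        else:
            out.append(child)
    return tuple([label] + out)

flat = ("NP", ("DT", "the"), ("NN", "Dutch"), ("VBG", "publishing"), ("NN", "group"))
assert unbinarize(binarize(flat)) == flat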

CKY: the parsing task. We are given a grammar G = (V, Σ, R, S) and a sequence of words w = (w_1, w_2, ..., w_n); our goal is to produce a parse tree for w. We need an easy way to refer to substrings of w: indices refer to the fenceposts between words, and span (i, j) refers to the words between fenceposts i and j. [Some illustrations and slides in this lecture are from Marco Kuhlmann.] 71-72

Key problems. Recognition problem: does the sentence belong to the language defined by the CFG, i.e. is there a derivation which yields the sentence? Parsing problem: what is a derivation (tree) corresponding to the sentence? Probabilistic parsing: what is the most probable tree for the sentence? 73

Parsing one word: a cell for a single word w_i is filled using the preterminal rules C → w_i. 74-76

Parsing longer spans: a cell for a longer span is filled using binary rules C → C_1 C_2, checking through all C_1, C_2 and midpoints mid. 77-79

Signatures. The application of a rule is independent of the inner structure of a parse tree: we only need to know the corresponding span and the root label of the subtree, i.e. its signature [min, max, C], also known as an edge. 80

CKY idea. Compute for every span a set of admissible labels (which may be empty for some spans), starting from small trees (single words) and proceeding to larger ones. When done, how do we detect that the sentence is grammatical? Check whether S is among the admissible labels for the whole sentence; if yes, the sentence belongs to the language, that is, a tree with signature [0, n, S] exists. What about unary rules? 81-84

CKY in action. The example sentence is "lead can poison", with fenceposts 0 1 2 3. Inner rules include S → NP VP, VP → M V, VP → V, NP → N N and NP → N; the preterminal rules are N → can, N → lead, N → poison, M → can, M → must, V → poison, V → lead. The chart (a.k.a. parsing triangle) is filled from short spans to long ones:
- Length-1 spans (preterminal rules, then unary rules): [0,1] "lead" gets {N, V} and then NP and VP; [1,2] "can" gets {N, M} and then NP; [2,3] "poison" gets {N, V} and then NP and VP.
- Length-2 spans: [0,2] "lead can" gets NP via NP → N N; [1,3] "can poison" gets NP (NP → N N), VP (VP → M V) and S (S → NP VP).
- The full span [0,3] gets S, and it gets it in two different ways: with mid = 1 (NP over [0,1] plus VP over [1,3]) and with mid = 2 (NP over [0,2] plus VP over [2,3]).
Apparently the sentence is ambiguous for the grammar (the grammar overgenerates).

Ambiguity. The two parses are (S (NP (N lead)) (VP (M can) (V poison))) and (S (NP (N lead) (N can)) (VP (V poison))); in the second there is no subject-verb agreement and "poison" is used as an intransitive verb. Apparently the sentence is ambiguous for the grammar (as the grammar overgenerates).

CKY more formally. The chart can be represented by a Boolean array chart[min][max][label]: chart[min][max][C] = true if the signature (min, max, C) has already been added to the chart, and false otherwise. Here we assume that the labels C are integer indices; the relevant entries have 0 ≤ min < max ≤ n. 109-110

Implementation: preterminal rules 111
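
The code itself is not shown in this transcript; the following sketch (my own names, with chart[i][j] holding the set of admissible labels for span (i, j), equivalent to the Boolean array above) shows what the preterminal step could look like:

def apply_preterminal_rules(words, chart, lexicon):
    # lexicon maps a word to the set of labels C with a preterminal rule C -> word
    for i, word in enumerate(words):
        for label in lexicon.get(word, set()):
            chart[i][i + 1].add(label)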

Implementation: binary rules 112
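
Again only the slide title survived; a sketch (mine) of the binary-rule step for one span (min, max), trying every midpoint and every pair of labels in the two sub-spans:

def apply_binary_rules(chart, min_, max_, binary_rules):
    # binary_rules maps a pair (C1, C2) to the set of parents C with a rule C -> C1 C2
    for mid in range(min_ + 1, max_):
        for c1 in chart[min_][mid]:
            for c2 in chart[mid][max_]:
                for parent in binary_rules.get((c1, c2), set()):
                    chart[min_][max_].add(parent)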

Unary rules. How to integrate unary rules C → C_1? 113

Implementation: unary rules 114
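
A sketch (mine) of the unary pass over a single cell; chains of unary rules are assumed to be handled by the closure discussed on the next slides:

def apply_unary_rules(chart, min_, max_, unary_rules):
    # unary_rules maps a child label C1 to the set of parents C with a rule C -> C1
    for child in list(chart[min_][max_]):
        for parent in unary_rules.get(child, set()):
            chart[min_][max_].add(parent)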

Implementation: unary rules But we forgot something! 115

Unary closure. What if the grammar contained the two rules A → B and B → C? Then C can be derived from A by a chain of rules: A → B → C. One could support such chains in the algorithm, but it is easier to extend the grammar to its reflexive transitive closure: from {A → B, B → C} we obtain {A → B, B → C, A → C} plus the trivial rules A → A, B → B, C → C, which are convenient for programming reasons in the PCFG case. 116-117

Implementation: skeleton 118
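
A sketch (mine) of a skeleton that ties the previous three functions together into a CKY recognizer over sets of labels:

def cky_recognize(words, start_symbol, lexicon, binary_rules, unary_rules):
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    apply_preterminal_rules(words, chart, lexicon)
    for i in range(n):
        apply_unary_rules(chart, i, i + 1, unary_rules)
    for length in range(2, n + 1):          # proceed from short spans to long ones
        for min_ in range(0, n - length + 1):
            max_ = min_ + length
            apply_binary_rules(chart, min_, max_, binary_rules)
            apply_unary_rules(chart, min_, max_, unary_rules)
    return start_symbol in chart[0][n]      # grammatical iff signature [0, n, S] exists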

Algorithm analysis. The time complexity is O(n³ |R|), where n is the sentence length and |R| is the number of rules in the grammar; in practice this means a few seconds for sentences under 20 words with a non-optimized parser. There exist algorithms with better asymptotic time complexity, but the 'constant' makes them slower in practice (in general). 119-122

Practical time complexity. For the PCFG version, the observed running time grows roughly as n^3.6. [Plot by Dan Klein] 123

Outline Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing evaluation Unsupervised estimation of PCFGs (?) Grammar refinement (producing a state-of-the-art parser) (?) 124

Probabilistic parsing. We discussed the recognition problem: checking whether a sentence is parsable with a CFG. Now we consider parsing with PCFGs. Recognition with PCFGs: what is the probability of the most probable parse tree? Parsing with PCFGs: what is the most probable parse tree? 125

CFGs. (Recap of the example PCFG and tree from before, with p(T) = 2.26 · 10⁻⁵.) 126

Distribution over trees. Let us denote by G(x) the set of derivations for the sentence x. The probability distribution defines the scoring P(T) over the trees T ∈ G(x). Finding the best parse for the sentence according to the PCFG means computing argmax_{T ∈ G(x)} P(T). This will be another Viterbi / max-product algorithm! 127

CKY with PCFGs. The chart is now an array of doubles chart[min][max][label]: chart[min][max][C] stores the probability of the most probable subtree with signature (min, max, C), and chart[0][n][S] will store the probability of the most probable full parse tree. 128

Intuition. For a binary rule C → C_1 C_2: for every C, choose C_1, C_2 and mid such that P(T_1) · P(T_2) · P(C → C_1 C_2) is maximal, where T_1 and T_2 are the left and right subtrees. 129

Implementation: preterminal rules 130

Implementation: binary rules 131
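
The slide's code is again not included in the transcript; a sketch (mine, with chart[i][j] now a dict from label to the best probability) of the probabilistic binary step implementing the max-product intuition above:

def apply_binary_rules_pcfg(chart, min_, max_, binary_rules):
    # binary_rules maps (C1, C2) to a list of (parent, rule_probability) pairs
    for mid in range(min_ + 1, max_):
        for c1, p1 in chart[min_][mid].items():
            for c2, p2 in chart[mid][max_].items():
                for parent, rule_prob in binary_rules.get((c1, c2), ()):
                    score = p1 * p2 * rule_prob
                    if score > chart[min_][max_].get(parent, 0.0):
                        chart[min_][max_][parent] = score
                        # a full parser would also store a backpointer (mid, c1, c2) here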

Unary rules. Similarly to the CFG case: after producing scores for the signatures (min, max, C), try to improve the scores by applying unary rules (and rule chains); if a score is improved, update it. 132

Unary (reflexive transitive) closure. From the rules A → B (0.1) and B → C (0.2) we add the composite rule A → C with probability 0.1 · 0.2 = 0.02, plus the trivial rules A → A, B → B, C → C with probability 1. Note that the result is not a PCFG anymore, as the rules no longer sum to 1 for each parent. The fact that a rule is composite needs to be stored in order to recover the true tree. If the grammar already contains a direct rule A → C with a much smaller probability (say 10⁻⁵), the closure keeps the better value, here 0.02. What about loops, like A → B → A → C? 133-136
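
A sketch (mine) that computes this closure with a Floyd-Warshall-style max-product update; since all probabilities are at most 1, loops cannot improve a chain, so the iteration is safe:

def unary_closure(labels, unary_probs):
    # unary_probs maps (parent, child) to the probability of the unary rule parent -> child
    best = {(a, a): 1.0 for a in labels}                 # reflexive part
    for (parent, child), prob in unary_probs.items():
        best[(parent, child)] = max(best.get((parent, child), 0.0), prob)
    for k in labels:                                     # transitive part, keeping the best chain
        for a in labels:
            for c in labels:
                via_k = best.get((a, k), 0.0) * best.get((k, c), 0.0)
                if via_k > best.get((a, c), 0.0):
                    best[(a, c)] = via_k
    return best

closure = unary_closure({"A", "B", "C"}, {("A", "B"): 0.1, ("B", "C"): 0.2})
print(round(closure[("A", "C")], 3))   # 0.02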

Recovery of the tree. For each signature we store backpointers to the elements from which it was built (e.g., the rule and, for binary rules, the midpoint); we then start recovering from [0, n, S]. Be careful with unary rules: you can basically assume that a unary rule from the closure was always used, but it could be the trivial one, C → C. 137

Speeding up the algorithm (approximate search). Basic pruning (roughly): for every span (i, j), store only labels whose probability is at most some factor smaller than the probability of the most probable label for this span, and check not all rules but only rules whose child labels have non-zero probability. Coarse-to-fine pruning: parse with a smaller (simpler) grammar, precompute (posterior) probabilities for each span, and keep only the spans and labels with non-negligible probability under that coarser grammar. 138-139

Outline Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing evaluation Unsupervised estimation of PCFGs (?) Grammar refinement (producing a state-of-the-art parser) (?) 140

Parser evaluation. Intrinsic evaluation can be automatic, evaluating against annotation provided by human experts (a gold standard) according to some predefined measure, or manual, according to human judgment. Though it has many drawbacks, automatic intrinsic evaluation is easier and allows tracking the state of the art across many years. Extrinsic evaluation scores a syntactic representation by comparing how well a system using this representation performs on some task, e.g. using the syntactic representation as input to a semantic analyzer and comparing the results of the analyzer with syntax predicted by different parsers. 141

Standard evaluation setting in parsing. Automatic intrinsic evaluation is used: parsers are evaluated against a gold standard provided by linguists. There is a standard split into three parts: a training set, used to estimate the model parameters; a development set, used for tuning the model (initial experiments); and a test set, for the final experiments comparing against previous work. 142

Automatic evaluation of constituent parsers. Exact match: the percentage of trees predicted entirely correctly. Bracket score: scores how well individual phrases (and their boundaries) are identified; this is the most standard measure and the one we will focus on. Crossing brackets: the percentage of predicted phrase boundaries that cross the gold-standard ones. Dependency metrics: score the dependency structure corresponding to the constituent tree (the percentage of correctly identified heads). 143

Bracket scores. The most standard score is the bracket score. It regards a tree as a collection of brackets [min, max, C] (the subtree signatures used in CKY): the set of brackets predicted by a parser is compared against the set of brackets in the tree annotated by a linguist, and precision, recall and F1 are used as scores. 144

Bracketing notation. The same tree as a bracketed sequence: (S (NP (PN My) (N dog)) (VP (V ate) (NP (D a) (N sausage)))). 145

Bracket scores. Pr = (number of brackets the parser and the annotation agree on) / (number of brackets predicted by the parser). Re = (number of brackets the parser and the annotation agree on) / (number of brackets in the annotation). F1 = 2 · Pr · Re / (Pr + Re) is the harmonic mean of precision and recall. 146
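
A sketch (my own helper names) that extracts labelled brackets from the nested-tuple trees used earlier and computes the bracket F1 of a predicted tree against a gold tree:

def brackets(tree, start=0):
    # return (end position, set of (min, max, label) brackets); preterminal nodes are not scored
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return start + 1, set()
    pos, out = start, set()
    for child in children:
        pos, child_brackets = brackets(child, pos)
        out |= child_brackets
    out.add((start, pos, label))
    return pos, out

def bracket_f1(gold_tree, predicted_tree):
    _, gold = brackets(gold_tree)
    _, pred = brackets(predicted_tree)
    agree = len(gold & pred)
    pr = agree / len(pred) if pred else 0.0
    re = agree / len(gold) if gold else 0.0
    return 0.0 if pr + re == 0.0 else 2 * pr * re / (pr + re)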

Preview: F1 bracket score. A bar chart compares the F1 bracket scores of a plain treebank PCFG, an unlexicalized PCFG (Klein and Manning, 2003), a lexicalized PCFG (Collins, 1999), an automatically induced PCFG (Petrov et al., 2006), and the best results reported (as of 2012). Annotations on the slide: "Looked into this in BSc KI at UvA" and "We will look into this one". 147-148