Seven Lectures on Statistical Parsing. Attendee information. Phrase structure grammars = context-free grammars. Assessment.


Seven Lectures on Statistical Parsing. Christopher Manning. LSA Linguistic Institute 2007.

Attendee information. Please put on a piece of paper: Name; Affiliation; Status (undergrad, grad, industry, prof, ...); Ling/CS/Stats background; What you hope to get out of the course; Whether the course has so far been too fast, too slow, or about right.

Assessment

Phrase structure grammars = context-free grammars. A grammar G = (T, N, S, R) consists of: T, a set of terminals; N, a set of nonterminals (for NLP, we usually distinguish a set P of preterminals, which always rewrite as terminals); S, the start symbol (one of the nonterminals); and R, a set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals (possibly an empty sequence). A grammar G generates a language L.

A phrase structure grammar. The example grammar has rules such as S → NP VP, VP → V NP, VP → V NP PP, NP → NP PP, NP → N, NP → e, and PP → P NP. By convention, S is the start symbol, but in the PTB we have an extra node at the top (called ROOT or TOP).

Top-down parsing.
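To make the top-down direction concrete, here is a minimal recursive-descent recognizer in Python; the toy grammar, sentence, and function names are illustrative assumptions, not code from the lecture. Note that it would recurse without bound on a left-recursive rule such as NP → NP PP, which is exactly the problem discussed for top-down parsing below.

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["N"], ["N", "PP"]],
    "VP": [["V", "NP"]],
    "PP": [["P", "NP"]],
    "N":  [["people"], ["fish"], ["tanks"]],
    "V":  [["fish"]],
    "P":  [["with"]],
}

def expand(symbols, tokens):
    # Yield every suffix of tokens that can remain after deriving `symbols` top-down.
    if not symbols:
        yield tokens
        return
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:                    # terminal: must match the next word
        if tokens and tokens[0] == first:
            yield from expand(rest, tokens[1:])
        return
    for rhs in GRAMMAR[first]:                  # nonterminal: try each of its rules
        for remaining in expand(rhs, tokens):
            yield from expand(rest, remaining)

def recognize(sentence):
    return any(rest == [] for rest in expand(["S"], sentence.split()))

print(recognize("people fish tanks with fish"))   # True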

Bottom-up parsing. Bottom-up parsing is data-directed. The initial goal list of a bottom-up parser is the string to be parsed. If a sequence in the goal list matches the RHS of a rule, then this sequence may be replaced by the LHS of the rule. Parsing is finished when the goal list contains just the start category. If the RHS of several rules match the goal list, then there is a choice of which rule to apply (a search problem). Can use depth-first or breadth-first search, and goal ordering. The standard presentation is as shift-reduce parsing.

Shift-reduce parsing: one path. SHIFT the first word and reduce: the stack becomes NP. SHIFT and reduce: NP V. SHIFT and reduce: NP V NP. SHIFT and reduce: NP V NP P. SHIFT and reduce: NP V NP P NP. Then reduce PP → P NP (stack: NP V NP PP), NP → NP PP (stack: NP V NP), VP → V NP (stack: NP VP), and S → NP VP (stack: S). What other search paths are there for parsing this sentence?

Soundness and completeness. A parser is sound if every parse it returns is valid/correct. A parser terminates if it is guaranteed not to go off into an infinite loop. A parser is complete if, for any given grammar and sentence, it is sound, produces every valid parse for that sentence, and terminates. (For many purposes, we settle for sound but incomplete parsers: e.g., probabilistic parsers that return a k-best list.)

Problems with bottom-up parsing. Unable to deal with empty categories: a termination problem, unless rewriting empties as constituents is somehow restricted (but then it's generally incomplete). Useless work: locally possible, but globally impossible. Inefficient when there is great lexical ambiguity (grammar-driven control might help here); conversely, it is data-directed: it attempts to parse the words that are there. Repeated work: anywhere there is common substructure.

Problems with top-down parsing. Left-recursive rules. A top-down parser will do badly if there are many different rules for the same LHS. Consider if there are 600 rules for S, 599 of which start with NP, but one of which starts with V, and the sentence starts with V. Useless work: expands things that are possible top-down but not actually there. Top-down parsers do well if there is useful grammar-driven control: search is directed by the grammar. Top-down is hopeless for rewriting parts of speech (preterminals) as words (terminals); in practice that is always done bottom-up, as lexical lookup. Repeated work: anywhere there is common substructure.
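As a sketch of bottom-up parsing as search, the following Python treats shift-reduce parsing as a depth-first search over SHIFT/REDUCE choices; the toy grammar and the sentence are illustrative assumptions (the same sentence reappears in the CKY example later), not the lecture's own code.

RULES = [                     # (LHS, RHS) pairs of a toy grammar
    ("S",  ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("NP", "PP")),
    ("PP", ("P", "NP")),
    ("NP", ("N",)),
    ("N", ("cats",)), ("V", ("scratch",)), ("N", ("walls",)),
    ("P", ("with",)), ("N", ("claws",)),
]

def parse_paths(stack, tokens):
    # Depth-first search: yield the final stack of every successful path.
    if not tokens and stack == ("S",):
        yield stack
        return
    # REDUCE: a RHS matching the top of the goal list is replaced by its LHS
    for lhs, rhs in RULES:
        n = len(rhs)
        if len(stack) >= n and stack[-n:] == rhs:
            yield from parse_paths(stack[:-n] + (lhs,), tokens)
    # SHIFT: move the next input word onto the stack
    if tokens:
        yield from parse_paths(stack + (tokens[0],), tokens[1:])

sentence = tuple("cats scratch walls with claws".split())
print(sum(1 for _ in parse_paths((), sentence)), "successful shift-reduce path(s)")

The many failing branches the search also explores (shifting too early, or reducing and then getting stuck) are the "other search paths" the slide asks about.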

Principles for success: take 1. If you are going to do parsing-as-search with a grammar as is: left-recursive structures must be found, not predicted; empty categories must be predicted, not found. Doing these things doesn't fix the repeated work problem: both TD (LL) and BU (LR) parsers can (and frequently do) do work exponential in the sentence length on NLP problems.

Principles for success: take 2. Grammar transformations can fix both left-recursion and epsilon productions. Then you parse the same language but get different trees. Linguists tend to hate you. But this is a misconception: they shouldn't, because you can fix the trees post hoc: the transform-parse-detransform paradigm.

Principles for success: take 3. Rather than doing parsing-as-search, we do parsing as dynamic programming. This is the most standard way to do things, q.v. CKY parsing, next time. It solves the problem of doing repeated work. But there are also other ways of solving the repeated work problem: memoization (remembering solved subproblems), also next time, and doing graph-search rather than tree-search.

Human parsing. Humans often do ambiguity maintenance: Have the police ... eaten their supper? / ... come in and look around. / ... taken out and shot. But humans also commit early and are garden-pathed: The man who hunts ducks out on weekends. The cotton shirts are made from grows in Mississippi. The horse raced past the barn fell.

Polynomial time parsing of PCFGs. Probabilistic or stochastic context-free grammars (PCFGs). G = (T, N, S, R, P): T is a set of terminals; N is a set of nonterminals (for NLP, we usually distinguish a set P of preterminals, which always rewrite as terminals); S is the start symbol (one of the nonterminals); R is a set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals (possibly an empty sequence); and P(R) gives the probability of each rule: ∀ X ∈ N, Σ_{X→γ ∈ R} P(X → γ) = 1. A grammar G generates a language model L: Σ_{w ∈ T*} P(w) = 1.

PCFGs: Notation. w_1n = w_1 ... w_n = the word sequence from 1 to n (a sentence of length n). w_ab = the subsequence w_a ... w_b. N^j_ab = the nonterminal N^j dominating w_a ... w_b.

The probability of trees and strings. P(t): the probability of a tree t is the product of the probabilities of the rules used to generate it. P(w_1n): the probability of the string is the sum of the probabilities of the trees which have that string as their yield: P(w_1n) = Σ_j P(w_1n, t_j), where t_j is a parse of w_1n, = Σ_j P(t_j). We'll write P(N^i → ζ^j) to mean P(N^i → ζ^j | N^i). We'll want to calculate max_t P(t ⇒* w_ab).

A simple PCFG (in CNF): S → NP VP 1.0; PP → P NP 1.0; VP → V NP 0.7; VP → VP PP 0.3; P → with 1.0; V → saw 1.0; NP → NP PP 0.4; NP → astronomers 0.1; NP → ears 0.18; NP → saw 0.04; NP → stars 0.18; NP → telescope 0.1.

Tree and string probabilities. w_15 = astronomers saw stars with ears.
P(t_1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
P(t_2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804
P(w_15) = P(t_1) + P(t_2) = 0.0009072 + 0.0006804 = 0.0015876
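A small Python check of the arithmetic above: each tree's probability is the product of its rule probabilities, and the string probability is their sum. This is a sketch; t1 and t2 are the two parses of "astronomers saw stars with ears" under the simple PCFG, and the dictionary just restates the grammar above.

RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,   ("PP", ("P", "NP")): 1.0,
    ("VP", ("V", "NP")): 0.7,   ("VP", ("VP", "PP")): 0.3,
    ("P", ("with",)): 1.0,      ("V", ("saw",)): 1.0,
    ("NP", ("NP", "PP")): 0.4,  ("NP", ("astronomers",)): 0.1,
    ("NP", ("ears",)): 0.18,    ("NP", ("saw",)): 0.04,
    ("NP", ("stars",)): 0.18,   ("NP", ("telescope",)): 0.1,
}

def tree_prob(tree):
    # P(t) = product of the probabilities of the rules used in t.
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):   # preterminal rule
        return RULE_PROB[(label, (children[0],))]
    p = RULE_PROB[(label, tuple(child[0] for child in children))]
    for child in children:
        p *= tree_prob(child)
    return p

t1 = ("S", ("NP", "astronomers"),
           ("VP", ("V", "saw"),
                  ("NP", ("NP", "stars"),
                         ("PP", ("P", "with"), ("NP", "ears")))))
t2 = ("S", ("NP", "astronomers"),
           ("VP", ("VP", ("V", "saw"), ("NP", "stars")),
                  ("PP", ("P", "with"), ("NP", "ears"))))

print(round(tree_prob(t1), 7))                    # 0.0009072
print(round(tree_prob(t2), 7))                    # 0.0006804
print(round(tree_prob(t1) + tree_prob(t2), 7))    # 0.0015876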

Chomsky Normal Form. Treebank binarization. All rules are of the form X → Y Z or X → w. A transformation to this form doesn't change the weak generative capacity of CFGs. With some extra book-keeping in symbol names, you can even reconstruct the same trees with a detransform. Unaries/empties are removed recursively. n-ary rules introduce new nonterminals: VP → V NP PP becomes VP → V @VP->_V and @VP->_V → NP PP. In practice it's a pain: reconstructing n-aries is easy, reconstructing unaries can be trickier. But it makes parsing easier/more efficient.

n-ary Trees in Treebank → TreeAnnotations.annotateTree → Binary Trees → Lexicon and Grammar → Parsing. TODO: CKY parsing.

An example: before binarization, the tree has ROOT → S → NP VP, with VP → V NP PP and PP → P NP. After binarization, intermediate symbols such as @S->_NP, @VP->_V, @VP->_V_NP, and @PP->_P are introduced. Binary rule: introducing an intermediate symbol for a rule that was already binary seems redundant. The reason is that it makes it easier to see how to make finite-order horizontal markovizations: it's like a finite automaton (explained later).
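A sketch of the binarization scheme described above, generating intermediate "@" symbols that record the parent and the children already produced; the function name and data layout are assumptions, but the naming follows the slide's @VP->_V, @VP->_V_NP convention.

def binarize(lhs, rhs):
    # Turn X -> Y1 ... Yn (n >= 3) into a chain of binary rules.
    if len(rhs) <= 2:
        return [(lhs, list(rhs))]
    rules, parent, seen = [], lhs, []
    for child in rhs[:-2]:
        seen.append(child)
        new_sym = "@%s->_%s" % (lhs, "_".join(seen))
        rules.append((parent, [child, new_sym]))
        parent = new_sym
    rules.append((parent, list(rhs[-2:])))
    return rules

for rule in binarize("VP", ["V", "NP", "PP"]):
    print(rule)
# ('VP', ['V', '@VP->_V'])
# ('@VP->_V', ['NP', 'PP'])

Reversing the transformation ("reconstructing n-aries") just means splicing out any node whose label starts with "@", which is why the detransform is easy.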

A ternary rule such as VP → V NP PP is binarized with the intermediate symbols @VP->_V and @VP->_V_NP (and @PP->_P inside the PP). The intermediate symbols remember their siblings: if there is a rule VP → V NP PP PP, then @VP->_V_NP_PP will exist.

Treebank: empties and unaries. The PTB tree (with function tags such as NP-SUBJ and empty elements marked -NONE- ε, as in the example tree for "Atone") is successively transformed: NoFuncTags removes the function tags, NoEmpties removes the empty elements, and NoUnaries removes the unary chains, keeping either the high or the low label.
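A tiny sketch of the NoFuncTags step in that pipeline: strip function tags and coindexation from node labels, leaving the empty-element label -NONE- for the separate NoEmpties step. The function name and the regular expression are illustrative assumptions.

import re

def strip_func_tags(label):
    # NP-SUBJ-1 -> NP, PP-LOC -> PP, S -> S; keep -NONE- for the NoEmpties step
    if label == "-NONE-":
        return label
    return re.split(r"[-=]", label, maxsplit=1)[0]

for tag in ["NP-SUBJ-1", "PP-LOC", "S", "-NONE-"]:
    print(tag, "->", strip_func_tags(tag))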

The CKY algorithm (1960/1965).

function CKY(words, grammar) returns most probable parse and its probability
  score = new double[#(words)+1][#(words)+1][#(nonterms)]
  back = new Pair[#(words)+1][#(words)+1][#(nonterms)]
  for i = 0; i < #(words); i++
    for A in nonterms
      if A -> words[i] in grammar
        score[i][i+1][A] = P(A -> words[i])
    // handle unaries
    boolean added = true
    while added
      added = false
      for A, B in nonterms
        if score[i][i+1][B] > 0 && A -> B in grammar
          prob = P(A -> B) * score[i][i+1][B]
          if prob > score[i][i+1][A]
            score[i][i+1][A] = prob
            back[i][i+1][A] = B
            added = true

  for span = 2 to #(words)
    for begin = 0 to #(words) - span
      end = begin + span
      for split = begin+1 to end-1
        for A, B, C in nonterms
          prob = score[begin][split][B] * score[split][end][C] * P(A -> B C)
          if prob > score[begin][end][A]
            score[begin][end][A] = prob
            back[begin][end][A] = new Triple(split, B, C)
      // handle unaries
      boolean added = true
      while added
        added = false
        for A, B in nonterms
          prob = P(A -> B) * score[begin][end][B]
          if prob > score[begin][end][A]
            score[begin][end][A] = prob
            back[begin][end][A] = B
            added = true
  return buildTree(score, back)

The accompanying chart traces fill score[i][j] cells for the sentence "cats scratch walls with claws": the lexical step sets score[i][i+1][A] = P(A -> words[i]), the unary handling extends each cell, and the binary step combines cells with prob = score[begin][split][B] * score[split][end][C] * P(A -> B C), e.g. prob = score[begin][split][P] * score[split][end][@PP->_P] * P(PP -> P @PP->_P). For each A, only the highest-probability A -> B C is kept.
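For reference, here is a runnable Python version of the same recurrence, using dictionaries in place of the 3-D arrays; the toy grammar, lexicon probabilities, and function names are assumptions for illustration, not the lecture's grammar.

from collections import defaultdict

binary  = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0,
           ("NP", "NP", "PP"): 0.3, ("PP", "P", "NP"): 1.0}
unary   = {("NP", "N"): 0.7}                               # A -> B rules
lexicon = {("N", "cats"): 0.3, ("V", "scratch"): 1.0, ("N", "walls"): 0.3,
           ("P", "with"): 1.0, ("N", "claws"): 0.4}

def cky(words):
    n = len(words)
    score = defaultdict(float)       # (begin, end, A) -> best probability
    back = {}                        # backpointers for rebuilding the tree

    def handle_unaries(b, e):
        added = True
        while added:                 # keep closing the cell under unary rules
            added = False
            for (A, B), p in unary.items():
                prob = p * score[b, e, B]
                if prob > score[b, e, A]:
                    score[b, e, A], back[b, e, A], added = prob, (B,), True

    for i, w in enumerate(words):    # lexical step
        for (A, word), p in lexicon.items():
            if word == w:
                score[i, i + 1, A] = p
        handle_unaries(i, i + 1)

    for span in range(2, n + 1):     # binary step over increasing span sizes
        for begin in range(0, n - span + 1):
            end = begin + span
            for split in range(begin + 1, end):
                for (A, B, C), p in binary.items():
                    prob = score[begin, split, B] * score[split, end, C] * p
                    if prob > score[begin, end, A]:
                        score[begin, end, A] = prob
                        back[begin, end, A] = (split, B, C)
            handle_unaries(begin, end)
    return score, back

words = "cats scratch walls with claws".split()
score, back = cky(words)
print(score[0, len(words), "S"])     # probability of the best parse (about 0.0037)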

The completed chart for "cats scratch walls with claws" shows, for every span, the surviving constituents (NP, VP, PP, S and intermediate symbols such as @VP->_V, @VP->_V_NP, @PP->_P) with their probabilities and backpointers. Call buildTree(score, back) to get the best parse.
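A matching sketch of the buildTree step: follow the backpointers left by the cky() sketch above. It assumes the score and back dictionaries and the words list from that example.

def build_tree(words, back, begin, end, A):
    # A cell with no backpointer was filled by the lexicon: A -> words[begin].
    if (begin, end, A) not in back:
        return (A, words[begin])
    bp = back[begin, end, A]
    if len(bp) == 1:                 # unary rule A -> B
        return (A, build_tree(words, back, begin, end, bp[0]))
    split, B, C = bp                 # binary rule A -> B C
    return (A,
            build_tree(words, back, begin, split, B),
            build_tree(words, back, split, end, C))

print(build_tree(words, back, 0, len(words), "S"))   # the best tree as nested tuples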