CMPT-825 Natural Language Processing. Why are parsing algorithms important?


CMPT-825 Natural Language Processing
Anoop Sarkar, http://www.cs.sfu.ca/~anoop
October 26, 2010

Why are parsing algorithms important?

A linguistic theory is implemented in a formal system that generates the set of grammatical strings and rules out ungrammatical strings. Such a formal system has computational properties. One such property is a simple decision problem: given a string, can it be generated by the formal system (recognition)? If it can, what steps were taken to recognize the string (parsing)?

Why are parsing algorithms important?

Consider the recognition problem: find algorithms for this problem for a particular formal system. The problem must be decidable, and preferably the algorithm should run in polynomial time: this enables computational implementations of linguistic theories. Elegant, polynomial-time algorithms exist for formalisms like CFG.

Number of derivations

CFG rules: { S → S S, S → a }

  n : a^n    number of parses
  1          1
  2          1
  3          2
  4          5
  5          14
  6          42
  7          132
  8          429
  9          1430
  10         4862
  11         16796
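These counts are the Catalan numbers shifted by one: a^n has Cat(n − 1) parses under this grammar. A minimal sketch (function name my own, not from the slides) reproducing the table with the standard Catalan recurrence:

```python
# Reproduce the parse-count table for the grammar { S -> S S, S -> a }.
# The number of binary parse trees with n leaves is the Catalan number
# Cat(n-1), via the recurrence Cat(0) = 1, Cat(n) = sum_i Cat(i) * Cat(n-1-i).

def catalan(n: int) -> int:
    if n == 0:
        return 1
    return sum(catalan(i) * catalan(n - 1 - i) for i in range(n))

for n in range(1, 12):
    print(n, catalan(n - 1))  # 1 1, 2 1, 3 2, 4 5, ..., 11 16796
```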

Number of derivations grows exponentially

[Log-scale plot: log(derivations) vs. string length, for L(G) = a+ using CFG rules { S → S S, S → a }]

Syntactic Ambiguity: (Church and Patil 1982)

Algebraic character of parse derivations. Power series for coordination-type grammars (more general than PPs):

  N → natural | language | processing | course
  N → N N

We write an equation for the algebraic expansion starting from N. The equation represents the generation of each string in the language as the terms, and the number of different ways of generating each string as the coefficients:

  N = nat. + lang. + proc. + course + ...
      + nat. lang. + nat. proc. + ...
      + 2 (nat. lang. proc.) + 2 (lang. proc. course) + ...
      + 5 (nat. lang. proc. course) + ...
      + 14 ...

CFG Ambiguity

The coefficients in the previous equation equal the number of parses for each string derived from N. These ambiguity coefficients are Catalan numbers:

  Cat(n) = (1 / (n + 1)) · C(2n, n)

where C(a, b) is the binomial coefficient:

  C(a, b) = a! / (b! (a − b)!)

Catalan numbers

Why Catalan numbers? Cat(n) is the number of ways to parenthesize an expression of length n with two conditions:
1. there must be equal numbers of open and close parens
2. they must be properly nested, so that each open precedes its matching close

Catalan numbers

For an expression with n ways to form constituents there are a total of C(2n, n) parenthesis pairs, e.g. for n = 2, C(4, 2) = 6:

  a(bc), a)bc(, )a(bc, (ab)c, )ab(c, ab)c(

But for each valid parenthesis pair, an additional n pairs are created that have a right parenthesis to the left of its matching left parenthesis, e.g. from above:

  a)bc(, )a(bc, )ab(c, ab)c(

So we divide C(2n, n) by n + 1:

  Cat(n) = (1 / (n + 1)) · C(2n, n)

Catalan numbers

  n    Cat(n)
  1    1
  2    2
  3    5
  4    14
  5    42
  6    132
  7    429
  8    1430
  9    4862
  10   16796

[Plot: Cat(n) vs. n for n = 1..10, growing from 1 to 16796]

[Plots of Cat(n) at increasingly large scales, reaching roughly 1.2 × 10^48]

Syntactic Ambiguity

Cat(n) also gives exactly the number of parses for the sentence:

  John saw the man on the hill with the telescope

(generated by the grammar given below; a different grammar will yield a different number of parses)

  S → NP VP
  NP → John | Det N | NP PP
  Det → the
  N → man | hill | telescope
  VP → V NP | VP PP
  PP → P NP
  V → saw
  P → on | with

Number of parse trees = Cat(2 + 1) = 5. With 8 PPs: Cat(9) = 4862 parse trees.
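These figures are easy to check with the closed form from earlier; a quick sketch (math.comb is Python's built-in binomial coefficient):

```python
from math import comb

def cat(n: int) -> int:
    # Closed form: Cat(n) = C(2n, n) / (n + 1); the division is always exact.
    return comb(2 * n, n) // (n + 1)

print(cat(2 + 1))  # 5 parses for 2 PPs
print(cat(8 + 1))  # 4862 parses for 8 PPs
```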

Syntactic Ambiguity

For the grammar shown above, number of parse trees = Cat(2 + 1) = 5. Why Cat(2 + 1)? For 2 PPs, there are 4 things involved: VP, NP, PP-1, PP-2. We want the items over which the grammar imposes all possible parenthesizations. The grammar is structured in such a way that each combination with a VP or an NP reduces the set of items over which we obtain all possible parentheses to 3. This can be viewed schematically as VP * NP * PP-1 * PP-2 (see the code sketch below):

1. (VP (NP (PP-1 PP-2)))
2. (VP ((NP PP-1) PP-2))
3. ((VP NP) (PP-1 PP-2))
4. ((VP (NP PP-1)) PP-2)
5. (((VP NP) PP-1) PP-2)

Note that combining PP-1 and PP-2 is valid because PP-1 has an NP inside it.

Syntactic Ambiguity

Other sub-grammars are simpler. For chains of adjectives:

  cross-eyed pot-bellied ugly hairy professor

We can write the following grammar and compute its power series (the ε production supplies the leading 1 of the series):

  ADJP → adj ADJP | ε

  ADJP = 1 + adj + adj^2 + adj^3 + ...
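Returning to the VP * NP * PP-1 * PP-2 schematic, here is a small sketch (my own code, not from the slides) that enumerates all binary bracketings of those four items, confirming the five parses listed above:

```python
# Enumerate all binary bracketings of a sequence of items.
# The number of bracketings of n items is Cat(n - 1).

def bracketings(items):
    if len(items) == 1:
        return [items[0]]
    results = []
    for split in range(1, len(items)):           # split between left and right
        for left in bracketings(items[:split]):
            for right in bracketings(items[split:]):
                results.append(f"({left} {right})")
    return results

for tree in bracketings(["VP", "NP", "PP-1", "PP-2"]):
    print(tree)
# Prints 5 bracketings, matching Cat(3) = 5.
```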

Syntactic Ambiguity

Now consider power series of combinations of sub-grammars:

  S → NP VP
  ( The number of products over sales ... ) ( is near the number of sales ... )

Both the NP subgrammar and the VP subgrammar power series have Catalan coefficients.

Syntactic Ambiguity

The power series for the S → NP VP grammar is the multiplication:

  ( N · Σ_i Cat_i (P N)^i ) · ( is · Σ_j Cat_j (P N)^j )

In a parser for this grammar, this leads to a cross-product:

  L × R = { ⟨l, r⟩ | l ∈ L and r ∈ R }

Syntactic Ambiguity

A simple change:

  Is ( The number of products over sales ... ) ( near the number of sales ... )

  = Is · N · ( Σ_i Cat_i (P N)^i ) · ( Σ_j Cat_j (P N)^j )
  = Is · N · Σ_{i+j} Cat_i Cat_j (P N)^{i+j}
  = Is · N · Σ Cat_{i+j+1} (P N)^{i+j}

(the last step uses the Catalan convolution identity Σ_{i+j=n} Cat_i Cat_j = Cat_{n+1})

Dealing with Ambiguity

A CFG for natural language can end up providing exponentially many analyses, approximately n!, for an input sentence of length n. This is much worse than the worst case in part-of-speech tagging, which was m^n for m distinct part-of-speech tags. If we actually have to process all the analyses, then our parser might as well be exponential. Typically, we can directly use the compact description (in the case of CKY, the chart or 2D array, also called a forest).
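To make the last point concrete, here is a sketch (my own, assuming a grammar already in Chomsky Normal Form, given as plain dictionaries) of counting all parses directly from a CKY-style chart in O(n^3) time, without enumerating the exponentially many trees:

```python
from collections import defaultdict

def count_parses(words, lexical, binary, start="S"):
    """Count parse trees from a CKY chart without enumerating them.
    lexical: word -> set of nonterminals; binary: (B, C) -> set of A for A -> B C."""
    n = len(words)
    # chart[i][j][A] = number of parses of words[i:j] rooted in A
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for a in lexical.get(w, ()):
            chart[i][i + 1][a] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # convolve counts at each split
                for b, nb in chart[i][k].items():
                    for c, nc in chart[k][j].items():
                        for a in binary.get((b, c), ()):
                            chart[i][j][a] += nb * nc
    return chart[0][n][start]

lexical = {"a": {"S"}}
binary = {("S", "S"): {"S"}}
print(count_parses(["a"] * 6, lexical, binary))  # 42 = Cat(5)
```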

Dealing with Ambiguity

Solutions to this problem:

CKY algorithm: computes all parses in O(n^3) time. The problem is that worst-case and average-case time are the same.

Earley algorithm: computes all parses in O(n^3) time for arbitrary CFGs, O(n^2) for unambiguous CFGs, and O(n) for so-called bounded-state CFGs (e.g. S → aSa | bSb | aa | bb, which generates palindromes over the alphabet {a, b}). Also, the average-case performance of Earley is better than CKY.

Deterministic parsing: only report one parse. Two options: top-down (LL parsing) or bottom-up (LR or shift-reduce) parsing.

Shift-Reduce Parsing

Every CFG has an equivalent pushdown automaton: a finite-state machine with additional memory in the form of a stack.

Consider the grammar: NP → Det N, Det → the, N → dogs
Consider the input: the dogs

Shift the first word the onto the stack; check if the top n symbols on the stack match the right-hand side of a rule, in which case you can reduce using that rule, or optionally shift another word onto the stack.
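A minimal sketch of this shift-reduce loop in Python (my own code, not from the slides); it reduces greedily whenever the top of the stack matches a rule, which happens to work for this toy grammar but is exactly the kind of commitment that needs a strategy, or backtracking, in general:

```python
# Greedy shift-reduce recognizer for the toy grammar above.
# rules maps a right-hand side (tuple of symbols) to its left-hand side.
rules = {("Det", "N"): "NP", ("the",): "Det", ("dogs",): "N"}

def shift_reduce(tokens, start="NP"):
    stack, buffer = [], list(tokens)
    while True:
        # Reduce whenever the top of the stack matches some rule's rhs.
        for rhs, lhs in rules.items():
            if tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]
                stack.append(lhs)
                break
        else:
            if buffer:
                stack.append(buffer.pop(0))  # shift the next word
            else:
                return stack == [start]      # accept iff only the start symbol remains

print(shift_reduce(["the", "dogs"]))  # True
```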

Shift-Reduce Parsing

Reduce using the rule Det → the, and push Det onto the stack. Shift dogs, then reduce using N → dogs and push N onto the stack. The stack now contains Det, N, which matches the rhs of the rule NP → Det N, so we can reduce using this rule, pushing NP onto the stack. If NP is the start symbol and there is no more input left to shift, we can accept the string.

Can this grammar get stuck (that is, no shift or reduce is possible at some stage while parsing) on a valid string? What happens if we add the rule NP → dogs to the grammar?

Shift-Reduce Parsing

Sometimes humans can be led down the garden path when processing a sentence (from left to right). Such garden-path sentences lead to a situation where one is forced to backtrack because of a commitment to only one of many possible derivations.

Consider the sentence: The emergency crews hate most is domestic violence.
Consider the sentence: The horse raced past the barn fell.

Shift-Reduce Parsing

Once you process the word fell you are forced to reanalyze the previous word raced as a verb inside a relative clause: raced past the barn, meaning the horse that was raced past the barn.

Notice however that other examples with the same structure but different words do not behave the same way. For example: the flowers delivered to the patient arrived.

Earley Parsing

Earley parsing is a more advanced form of CKY parsing with two novel ideas: a dotted rule as a way to get around the explicit conversion of a CFG to Chomsky Normal Form, and goal-directed search instead of exploring every single element in the CKY parse chart.

Since natural language grammars are quite large, and are often modified to be able to parse more data, avoiding the explicit conversion to CNF is an advantage. A dotted rule denotes that the right-hand side of a CF rule has been partially recognized/parsed. By avoiding the explicit n^3 loop of CKY, we can parse some grammars more efficiently, in time n^2 or n. Goal-directed search can be done in any order, including left to right (which is more psychologically plausible).

Earley Parsing

S → • NP VP indicates that once we find an NP and a VP we have recognized an S.
S → NP • VP indicates that we've recognized an NP and we need a VP.
S → NP VP • indicates that we have a complete S.

Consider the dotted rule S → • NP VP and assume our CFG contains a rule NP → John. Because we have such an NP rule we can predict a new dotted rule NP → • John.

Earley Parsing

If we have the dotted rule NP → • John and the next input symbol on our input tape is the word John, we can scan the input and create a new dotted rule NP → John •.

Consider the dotted rules S → • NP VP and NP → John •. Since NP has been completely recognized we can complete, giving S → NP • VP.

These three steps, predictor, scanner and completer, form the Earley parsing algorithm and can be used to parse with any CFG without conversion to CNF. Note that we have not accounted for ε in the scanner.

Earley Parsing

A state is a dotted rule plus a span over the input string, e.g. (S → NP • VP, [4, 8]) implies that we have recognized an NP spanning positions 4 to 8. We store all the states in a chart: in chart[j] we store all states of the form (A → α • β, [i, j]), where α, β ∈ (N ∪ T)*.

Earley Parsing

Note that (S → NP • VP, [0, 8]) implies that the chart contains two states, (NP → α •, [0, 8]) and (S → • NP VP, [0, 0]); this is the completer rule, the heart of the Earley parser. Also, if we have the state (S → • NP VP, [0, 0]) in the chart, then we always predict the state (NP → • α, [0, 0]) for all rules NP → α in the grammar.
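Before the pseudocode that follows, here is one possible Python representation of states and the chart; a sketch with names of my own choosing, since the slides do not prescribe a data structure:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """A dotted rule with a span: (lhs -> rhs[:dot] . rhs[dot:], [i, j])."""
    lhs: str
    rhs: tuple   # right-hand side symbols
    dot: int     # position of the dot within rhs
    i: int       # start of the span
    j: int       # end of the span (the chart index where the state lives)

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_category(self) -> str:
        return self.rhs[self.dot]  # the symbol right after the dot

# chart[j] holds all states ending at position j; using a set gives the
# "insert only if not already present" behaviour of enqueue below.
def make_chart(n_tokens: int):
    return [set() for _ in range(n_tokens + 1)]
```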

Earley Parsing

  S → NP VP
  NP → Det N | NP PP | John
  Det → the
  N → cookie | table
  VP → VP PP | V NP | V
  V → ate
  PP → P NP
  P → on

Consider the input: 0 John 1 ate 2 on 3 the 4 table 5

What can we predict from the state (S → • NP VP, [0, 0])?
What can we complete from the state (V → ate •, [1, 2])?

Earley Parsing

enqueue(state, j):
  input: state = (A → α • β, [i, j])
  input: j (insert state into chart[j])
  if state not in chart[j] then
    chart[j].add(state)
  end if

predictor(state):
  input: state = (A → B • C, [i, j])
  for all rules C → α in the grammar do
    newstate = (C → • α, [j, j])
    enqueue(newstate, j)
  end for

Earley Parsing

scanner(state, tokens):
  input: state = (A → B • a C, [i, j])
  input: tokens (list of input tokens to the parser)
  if tokens[j] == a then
    newstate = (A → B a • C, [i, j + 1])
    enqueue(newstate, j + 1)
  end if

completer(state):
  input: state = (A → B C •, [j, k])
  for all states (X → Y • A Z, [i, j]) in chart[j] do
    newstate = (X → Y A • Z, [i, k])
    enqueue(newstate, k)
  end for

Earley Parsing

earley(tokens[0 ... N], grammar):
  for each rule S → α, where S is the start symbol, do
    add (S → • α, [0, 0]) to chart[0]
  end for
  for 0 ≤ j ≤ N + 1 do
    for each state in chart[j] that has not been marked do
      mark state
      if state = (A → α • B β, [i, j]), where B is a non-terminal, then
        predictor(state)
      else if state = (A → α • b β, [i, j]), where b is a terminal, and j < N + 1 then
        scanner(state, tokens)
      else
        completer(state)
      end if
    end for
  end for
  return yes if chart[N + 1] has a final state

Earley Parsing

isincomplete(state):
  if state is of type (A → α •, [i, j]) then
    return False
  end if
  return True

nextcategory(state):
  if state == (A → B • ν C, [i, j]) then
    return ν (ν can be terminal or non-terminal)
  else
    raise error
  end if

Earley Parsing

isfinal(state):
  input: state = (A → α •, [i, j])
  cond1 = A is the start symbol
  cond2 = isincomplete(state) is False
  cond3 = j is equal to length(tokens)
  if cond1 and cond2 and cond3 then
    return True
  end if
  return False

istoken(category):
  if category is a terminal symbol then
    return True
  end if
  return False
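Putting the three operations together, here is a minimal runnable sketch of the recognizer; my own translation of the pseudocode above into Python, using the example grammar from the earlier slide (a real implementation would also keep back-pointers to recover parse trees):

```python
# A compact Earley recognizer following the predictor/scanner/completer
# pseudocode above. States are tuples (lhs, rhs, dot, i) stored in chart[j],
# mirroring (A -> alpha . beta, [i, j]).

GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP"), ("John",)],
    "Det": [("the",)],
    "N": [("cookie",), ("table",)],
    "VP": [("VP", "PP"), ("V", "NP"), ("V",)],
    "V": [("ate",)],
    "PP": [("P", "NP")],
    "P": [("on",)],
}

def earley(tokens, grammar, start="S"):
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))                # (S -> . alpha, [0, 0])
    for j in range(len(tokens) + 1):
        agenda = list(chart[j])
        while agenda:
            lhs, rhs, dot, i = agenda.pop()
            if dot < len(rhs) and rhs[dot] in grammar:  # predictor
                for alpha in grammar[rhs[dot]]:
                    new = (rhs[dot], alpha, 0, j)
                    if new not in chart[j]:
                        chart[j].add(new)
                        agenda.append(new)
            elif dot < len(rhs):                        # scanner
                if j < len(tokens) and tokens[j] == rhs[dot]:
                    chart[j + 1].add((lhs, rhs, dot + 1, i))
            else:                                       # completer
                for l2, r2, d2, i2 in list(chart[i]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, i2)
                        if new not in chart[j]:
                            chart[j].add(new)
                            agenda.append(new)
    return any(lhs == start and dot == len(rhs) and i == 0
               for lhs, rhs, dot, i in chart[len(tokens)])

print(earley("John ate on the table".split(), GRAMMAR))  # True
```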