Massachusetts Institute of Technology
6.863J/9.611J, Natural Language Processing, Spring, 2001
Department of Electrical Engineering and Computer Science
Department of Brain and Cognitive Sciences

Handout 8: Computation & Hierarchical parsing II

1 FTN parsing redux

            FTN Parser                                Earley Parser

Initialize: Compute initial state set S_0             Compute initial state set S_0
            1. S_0 ← q_0                              1. S_0 ← q_0
               q_0 = [Start → . S, 0]                    q_0 = [Start → . S, 0, 0]
            eta-closure = transitive closure          eta-closure = transitive closure
              of jump arcs                              of Predict and Complete

Loop:       For each word w_i, i = 1, ..., n          For each word w_i, i = 1, ..., n
            S_i ← δ(q, w_i)                           S_i ← δ(q, w_i) = Scan(S_{i-1}), q = item
                                                      e-closure = closure(Predict, Complete)

Final:      If q_f ∈ S_n then accept;                 If q_f ∈ S_n then accept;
            q_f = [Start → S ., 0]                    q_f = [Start → S ., 0, n]

2 Earley's algorithm

Earley's algorithm is like the state set simulation of a nondeterministic FTN presented earlier, with the addition of a single new integer representing the starting point of a hierarchical phrase (since now phrases can start at any point in the input). Note that with simple linear concatenation this information is implicitly encoded via the word position we are at. The stopping or end point of a phrase will be encoded by the word position. To proceed, given an input of n words, a series of state sets S_0, S_1, ..., S_n is built, where S_i contains all the valid items after reading i words. The algorithm as presented is a simple recognizer; as usual, parsing involves more work.
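To make the item concrete, here is a minimal sketch in Lisp of the item as a data structure (illustrative only; this is not the representation used by the laboratory 2 parser). An item packages a dotted rule with the single extra return-pointer integer; the index of the state set an item sits in supplies its end position.

;;; Sketch only: an Earley item = dotted rule + return pointer.
;;; The index of the state set holding the item supplies its end position.
(defstruct (item (:constructor make-item (lhs before after start)))
  lhs      ; category being built, e.g. VP
  before   ; symbols to the left of the dot (already recognized)
  after    ; symbols to the right of the dot (still to be found)
  start)   ; the one new integer: return pointer to the state set where the phrase began

;; Item (12) from the trace in section 4, [VP ==> V . NP, 1], a member of S_2:
(make-item 'vp '(v) '(np) 1)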

In theorem-proving terms, the Earley algorithm selects the leftmost nonterminal (phrase) in a rule as the next candidate to see if one can find a proof for it in the input. (By varying which nonterminal is selected, one can come up with a different strategy for parsing.)

To recognize a sentence using a context-free grammar G and Earley's algorithm:

1   Compute the initial state set, S_0:
    1a  Put the start state, (Start → . S, 0, 0), in S_0.
    1b  Execute the following steps until no new state triples are added.
        1b1  Apply complete to S_0.
        1b2  Apply predict to S_0.
2   For each word w_i, i = 1, 2, ..., n, build state set S_i given state set S_{i-1}.
    2a  Apply scan to state set S_i.
    2b  Execute the following steps until no new state triples are added to state set S_i.
        2b1  Apply complete to S_i.
        2b2  Apply predict to S_i.
    2c  If state set S_i is empty, reject the sentence; else increment i.
    2d  If i < n then go to Step 2a; else go to Step 3.
3   If state set S_n includes the accept state (Start → S ., 0, n), then accept; otherwise reject.

Defining the basic operations on items

Definition 1: Scan: For all states (A → α . t β, k, i-1) in state set S_{i-1}, if w_i = t, then add (A → α t . β, k, i) to state set S_i.

Definition 2: Predict (Push): Given a state (A → α . B β, k, i) in state set S_i, add all states of the form (B → . γ, i, i) to state set S_i.

Definition 3: Complete (Pop): If state set S_i contains the triple (B → γ ., k, i), then, for all states in state set k of the form (A → α . B β, l, k), add (A → α B . β, l, i) to state set S_i. (If the return value is empty, then do nothing.)
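The three operations and the closure loop fit together into a small recognizer. The following Common Lisp sketch builds on the ITEM structure sketched in section 2; the names (EARLEY-RECOGNIZE, ADD-ITEM) are illustrative and are not part of the laboratory 2 GPSG interface. It assumes the grammar is a list of rules (LHS category ...) and the dictionary is an alist mapping each word to its list of categories.

(defun item-equal (a b)
  (and (eq (item-lhs a) (item-lhs b))
       (equal (item-before a) (item-before b))
       (equal (item-after a) (item-after b))
       (= (item-start a) (item-start b))))

(defun add-item (new set)
  "Add NEW to SET (a one-element list whose car holds the items) if absent."
  (unless (member new (car set) :test #'item-equal)
    (push new (car set))
    t))

(defun earley-recognize (words grammar lexicon start-symbol)
  "Return T if WORDS is generated by GRAMMAR from START-SYMBOL."
  (let* ((n (length words))
         (chart (make-array (1+ n))))             ; state sets S_0 ... S_n
    (dotimes (i (1+ n)) (setf (aref chart i) (list nil)))
    ;; Step 1a: put the start state (Start ==> . S, 0, 0) into S_0.
    (add-item (make-item 'start '() (list start-symbol) 0) (aref chart 0))
    (dotimes (i (1+ n))
      (let ((set (aref chart i)))
        ;; Steps 1b/2b: close S_i under Predict and Complete.
        (loop with changed = t
              while changed do
              (setf changed nil)
              (dolist (item (car set))
                (cond
                  ;; Complete (pop): dot at the end; advance the items waiting
                  ;; back in the state set named by the return pointer.
                  ((null (item-after item))
                   (dolist (waiting (car (aref chart (item-start item))))
                     (when (eq (first (item-after waiting)) (item-lhs item))
                       (when (add-item
                              (make-item (item-lhs waiting)
                                         (append (item-before waiting)
                                                 (list (item-lhs item)))
                                         (rest (item-after waiting))
                                         (item-start waiting))
                              set)
                         (setf changed t)))))
                  ;; Predict (push): dot before category B; add B ==> . gamma
                  ;; with return pointer i.
                  (t
                   (dolist (rule grammar)
                     (when (eq (first rule) (first (item-after item)))
                       (when (add-item (make-item (first rule) '() (rest rule) i)
                                       set)
                         (setf changed t))))))))
        ;; Step 2a: Scan word i+1, seeding S_{i+1}.
        (when (< i n)
          (let ((categories (cdr (assoc (nth i words) lexicon))))
            (dolist (item (car set))
              (when (member (first (item-after item)) categories)
                (add-item (make-item (item-lhs item)
                                     (append (item-before item)
                                             (list (first (item-after item))))
                                     (rest (item-after item))
                                     (item-start item))
                          (aref chart (1+ i)))))))))
    ;; Step 3: accept if (Start ==> S ., 0, n) is in S_n.
    (loop for item in (car (aref chart n))
          thereis (and (eq (item-lhs item) 'start)
                       (null (item-after item))
                       (= (item-start item) 0)))))

Called with the grammar and dictionary of section 4 below (the Start → S rule is supplied internally by the start item, and the two ice-cream dictionary entries are merged into one alist entry), the sketch accepts the example sentence:

(earley-recognize '(john ate ice-cream on the table)
                  '((s np vp) (np name) (np name pp) (np det noun)
                    (vp v np) (vp v np pp) (pp prep np))
                  '((john name) (ate v) (ice-cream name noun)
                    (on prep) (the det) (table noun))
                  's)
;; => T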

3 Comparison of FTN and Earley state set parsing

The FTN and Earley parsers are almost identical in terms of representations and algorithmic structure. Both construct a sequence of state sets S_0, S_1, ..., S_n. Both algorithms consist of three parts: an initialization stage, a loop stage, and an acceptance stage. The only difference is that since the Earley parser must handle an expanded notion of an item (it is now a partial tree rather than a partial linear sequence), one must add a single new integer index to mark the return address in the hierarchical structure. Note that prediction and completion both act like ɛ-transitions: they spark parser operations without consuming any input; hence, one must close the construction of each state set under these operations (= we must add all states we can reach after reading i words, including those reached under ɛ-transitions).

Question: where is the stack in the Earley algorithm? (Since we need a stack for return pointers.)

            FTN Parser                                Earley Parser

Initialize: Compute initial state set S_0             Compute initial state set S_0
            1. S_0 ← q_0                              1. S_0 ← q_0
               q_0 = [Start → . S, 0]                    q_0 = [Start → . S, 0, 0]
            eta-closure = transitive closure          eta-closure = transitive closure
              of jump arcs                              of Predict and Complete

Loop:       For each word w_i, i = 1, ..., n          For each word w_i, i = 1, ..., n
            S_i ← δ(q, w_i)                           S_i ← δ(q, w_i) = Scan(S_{i-1}), q = item
                                                      e-closure = closure(Predict, Complete)

Final:      If q_f ∈ S_n then accept;                 If q_f ∈ S_n then accept;
            q_f = [Start → S ., 0]                    q_f = [Start → S ., 0, n]
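One way to see the answer, reading off the trace in section 4 below: the return pointers chain items together exactly the way stack frames would. After the parser has read "John ate", the chart contains, among others:

  S_2:  [VP → V . NP, 1]      the active "frame": a VP still needing an NP, return pointer 1
  S_1:  [S → NP . VP, 0]      the frame underneath it: an S still needing a VP, return pointer 0
  S_0:  [Start → . S, 0]      the bottom of the implicit stack

When the NP ice-cream is finished in S_3 as [NP → Name ., 2], Complete looks back in state set 2 (the finished NP's return pointer), finds [VP → V . NP, 1] waiting there, and advances its dot to give [VP → V NP ., 1] in S_3, the same effect as popping a return address off a pushdown stack. The stack is thus distributed over the state sets as dotted items plus their return pointers, which is how the chart can carry several alternative "stacks" at once.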

4 A simple example of the algorithm in action

Let's now see how this works with a simple grammar and then examine how parses may be retrieved. There have been several schemes proposed for parse storage and retrieval. Here is a simple grammar plus an example parse for John ate ice-cream on the table (ambiguous as to the placement of the Prepositional Phrase on the table).

Start → S
S → NP VP
NP → Name
NP → Det Noun
NP → Name PP
PP → Prep NP
VP → V NP
VP → V NP PP
V → ate
Noun → ice-cream
Name → John
Name → ice-cream
Noun → table
Det → the
Prep → on

Let's follow how this parse works using Earley's algorithm and the parser used in laboratory 2. (The headings and running count of state numbers aren't supplied by the parser. Also note that Start is replaced by *DO*. Some additional duplicated states that are printed during tracing have been removed for clarity, and comments added.)

(in-package gpsg)
(remove-rule-set testrules)
(remove-rule-set testdict)
(add-rule-set testrules CFG)
(add-rule-list testrules
  ((S ==> NP VP)
   (NP ==> name)
   (NP ==> Name PP)
   (VP ==> V NP)
   (NP ==> Det Noun)
   (PP ==> Prep NP)
   (VP ==> V NP PP)))
(add-rule-set testdict DICTIONARY)
(add-rule-list testdict
  ((ate V)
   (John Name)
   (table Noun)
   (ice-cream Noun)
   (ice-cream Name)
   (on Prep)
   (the Det)))
(create-cfg-table testg testrules s 0)

? (pprint (p "john ate ice-cream on the table" :grammar testg
             :dictionary testdict :print-states t))

State set   Return ptr   Dotted rule

(nothing)
    0           0        *DO* ==> . S $           (1)  (start state)
    0           0        S ==> . NP VP            (2)  (predict from 1)
    0           0        NP ==> . NAME            (3)  (predict from 2)
    0           0        NP ==> . NAME PP         (4)  (predict from 2)
    0           0        NP ==> . DET NOUN        (5)  (predict from 2)

John [Name]
    1           0        NP ==> NAME .            (6)  (scan over 3)
    1           0        NP ==> NAME . PP         (7)  (scan over 4)
    1           0        S ==> NP . VP            (8)  (complete 6 to 2)
    1           1        PP ==> . PREP NP         (9)  (predict from 7)
    1           1        VP ==> . V NP            (10) (predict from 8)
    1           1        VP ==> . V NP PP         (11) (predict from 8)

ate [V]
    2           1        VP ==> V . NP            (12) (scan over 10)
    2           1        VP ==> V . NP PP         (13) (scan over 11)
    2           2        NP ==> . NAME            (14) (predict from 12/13)
    2           2        NP ==> . NAME PP         (15) (predict from 12/13)
    2           2        NP ==> . DET NOUN        (16) (predict from 12/13)

ice-cream [Name, Noun]
    3           2        NP ==> NAME .            (17) (scan over 14)
    3           2        NP ==> NAME . PP         (18) (scan over 15)
    3           1        VP ==> V NP . PP         (19) (complete 17 to 13)
    3           1        VP ==> V NP .            (20) (complete 17 to 12)
    3           3        PP ==> . PREP NP         (21) (predict from 18/19)
    3           0        S ==> NP VP .            (22) (complete 20 to 8)
    3           0        *DO* ==> S . $           (23) (complete 8 to 1)

on [Prep]
    4           3        PP ==> PREP . NP         (24) (scan over 21)
    4           4        NP ==> . NAME            (25) (predict from 24)
    4           4        NP ==> . NAME PP         (26) (predict from 24)
    4           4        NP ==> . DET NOUN        (27) (predict from 24)

the [Det]
    5           4        NP ==> DET . NOUN        (28) (scan over 27)

table [Noun]
    6           4        NP ==> DET NOUN .        (29) (scan over 28)
    6           3        PP ==> PREP NP .         (30) (complete 29 to 24)
    6           1        VP ==> V NP PP .         (31) (complete 24 to 19)
    6           2        NP ==> NAME PP .         (32) (complete 24 to 18)
    6           0        S ==> NP VP .            (33) (complete 8 to 1)
    6           0        *DO* ==> S .             (34) (complete 1)  [parse 1]
    6           1        VP ==> V NP .            (35) (complete 18 to 12)
    6           0        S ==> NP VP .            (36) (complete 12 to 1)  = 33

    6           0        *DO* ==> S .             (37) (complete 1)  = 34  [parse 2]
    6           1        VP ==> V NP . PP         (38) (complete 18 to 13)
    6           6        PP ==> . PREP NP         (39) (predict from 38)

[Figure 1: Distinct parses lead to distinct state triple paths in the Earley algorithm. Both paths share *DO* ==> S (34), S ==> NP VP (33), PP ==> PREP NP (30), and NP ==> DET NOUN (29); one parse goes through VP ==> V NP PP (31), the other through VP ==> V NP (35) and NP ==> NAME PP (32).]

((START (S (NP (NAME JOHN))
           (VP (V ATE)
               (NP (NAME ICE-CREAM))
               (PP (PREP ON) (NP (DET THE) (NOUN TABLE))))))
 (START (S (NP (NAME JOHN))
           (VP (V ATE)
               (NP (NAME ICE-CREAM)
                   (PP (PREP ON) (NP (DET THE) (NOUN TABLE))))))))

5 Time complexity of the Earley algorithm

The worst case time complexity of the Earley algorithm is dominated by the time to construct the state sets. This in turn is decomposed into the time to process a single item in a state set times the maximum number of items in a state set (assuming no duplicates; thus, we are assuming some implementation that allows us to quickly check for duplicate states in a state set).

In the worst case, the maximum number of distinct items is the maximum number of dotted rules times the maximum number of distinct return values, or G · n. The time to process a single item can be found by considering separately the scan, predict and complete actions. Scan and predict are effectively constant time (we can build in advance all the possible single next-state transitions, given a possible category). The complete action could force the algorithm to advance the dot in all the items in a state set, which, from the previous calculation, could be G · n items, hence proportional to that much time. We can combine the values as shown below to get an upper bound on the execution time, assuming that the primitive operations of our computer allow us to maintain lists without duplicates without any additional overhead (say, by using bit-vectors; if this is not done, then searching through or ordering the list of states could add in another G factor). Note as before that grammar size (measured by the total number of symbols in the rule system) is an important component of this bound, more so than the input sentence length, as you will see in Laboratory 2. If there is no ambiguity, then this worst case does not arise (why?). The parse is then linear time (why?). If there is only finite ambiguity in the grammar (at each step, there is only a finite, bounded-in-advance number of ambiguous attachment possibilities), then the worst-case time is proportional to n^2.

Total time = maximum number of state sets
             × maximum time to build ONE state set

Maximum time to build ONE state set = maximum number of items
                                      × maximum time to process ONE item

Maximum possible number of items = [maximum number of dotted rules
                                      × maximum number of distinct return values]
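Putting these factors together gives a worked version of the product above (writing G for the grammar size, as in the text, and n for the sentence length):

\[
\underbrace{(n+1)}_{\text{state sets}} \;\times\; \underbrace{G\,n}_{\text{items per set}} \;\times\; \underbrace{G\,n}_{\text{time per item (Complete)}} \;=\; O(G^{2}\,n^{3})
\]

So for a fixed grammar the recognizer runs in worst-case time proportional to n^3 in the sentence length, with the grammar contributing the G^2 factor.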