Parsing. Left-Corner Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 17


Table of contents

1. Motivation
2. Algorithm
3. Look-ahead
4. Chart Parsing

Motivation

Problems with pure TD/BU approaches:

- Top-down parsing does not check whether the actual input corresponds to the predictions made.
- Bottom-up parsing does not check whether the recognized constituents correspond to anything one might predict starting from S.

Mixed approaches help to overcome these problems:

- Left-corner parsing parses parts of the tree top-down and parts bottom-up.
- Earley parsing is a chart-based combination of top-down predictions and bottom-up completions.

Idea

In a production A → X1 … Xk, the first right-hand side element X1 is called the left corner of this production. Notation: ⟨A, X1⟩ ∈ LC.

Idea: parse the left corner X1 bottom-up while parsing X2, …, Xk top-down. In other words, in order to predict the subtree rooted in A, a parse tree for X1 must already be there.

Algorithm (1)

We assume a CFG without ε-productions and without loops. We need the following three stacks:

- a stack Γ_compl containing completed elements that can be used as potential left corners for applying new productions. Initial value: w
- a stack Γ_td containing the top-down predicted elements of a rhs (i.e., the rhs without the left corner). Initial value: S
- a stack Γ_lhs containing the lhs categories that are waiting to be completed. Once all the top-down predicted rhs symbols are completed, the category is moved to Γ_compl. Initial value: ε

Algorithm (2)

Item form: [Γ_compl, Γ_td, Γ_lhs] with Γ_compl ∈ (N ∪ T)*, Γ_td ∈ (N ∪ T ∪ {$})*, Γ_lhs ∈ N*, where $ is a new symbol marking the end of a rhs.

Whenever the symbols X2, …, Xk from a rhs are pushed onto Γ_td, they are followed by $ to mark the end of the rhs (i.e., the point where a category can be completed).

Axiom: [w, S, ε]

Algorithm (3)

Reduce can be applied if the top of Γ_compl is the left corner X1 of some production A → X1 X2 … Xk. Then X1 is popped, X2 … Xk $ is pushed onto Γ_td, and A is pushed onto Γ_lhs:

Reduce: [X1 α, B β, γ] ⊢ [α, X2 … Xk $ B β, A γ]   where A → X1 X2 … Xk ∈ P, B ≠ $

Once the entire right-hand side has been completed (the top of Γ_td is $), the completed category is moved from Γ_lhs to Γ_compl:

Move: [α, $ β, A γ] ⊢ [A α, β, γ]   where A ∈ N

Algorithm (4)

A completed category can be a left corner (then Reduce is applied), or it can be the next symbol on the Γ_td stack; in that case both can be popped:

Remove: [X α, X β, γ] ⊢ [α, β, γ]

The recognizer is successful if Γ_compl = Γ_td = Γ_lhs = ε:

Goal item: [ε, ε, ε]

Algorithm (5)

Example: left-corner parsing with the productions S → aSa | bSb | c and input w = abcba.

Γ_compl   Γ_td        Γ_lhs   operation
abcba     S           ε
bcba      Sa$S        S       reduce
cba       Sb$Sa$S     SS      reduce
ba        $Sb$Sa$S    SSS     reduce
Sba       Sb$Sa$S     SS      move
ba        b$Sa$S      SS      remove
a         $Sa$S       SS      remove
Sa        Sa$S        S       move
a         a$S         S       remove
ε         $S          S       remove
S         S           ε       move
ε         ε           ε       remove
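The Reduce, Move and Remove rules above can be run directly on the three stacks. The following is a minimal sketch, not part of the slides: the function name `lc_recognize` and the stack encoding are illustrative choices. Each stack is a tuple with its top at index 0, the grammar is a dict from lhs to a list of rhs tuples, and, as on the slides, ε-productions and loops are assumed absent.

```python
def lc_recognize(grammar, start, w):
    """Left-corner recognizer; grammar: dict lhs -> list of rhs tuples,
    assumed free of epsilon-productions and loops."""
    # Item form [Gamma_compl, Gamma_td, Gamma_lhs]; axiom [w, S, eps].
    # Each stack is a tuple with its top at index 0; "$" ends a rhs.
    agenda = [(tuple(w), (start,), ())]
    seen = set(agenda)
    while agenda:
        compl, td, lhs = agenda.pop()
        if compl == () and td == () and lhs == ():
            return True  # goal item [eps, eps, eps]
        new_items = []
        # Reduce: top of Gamma_compl is the left corner of some production
        # (side condition: top of Gamma_td is not "$")
        if compl and td and td[0] != "$":
            for lhs_cat, rhss in grammar.items():
                for rhs in rhss:
                    if rhs[0] == compl[0]:
                        new_items.append((compl[1:],
                                          tuple(rhs[1:]) + ("$",) + td,
                                          (lhs_cat,) + lhs))
        # Move: a rhs is fully completed, its category becomes completed
        if td and td[0] == "$" and lhs:
            new_items.append(((lhs[0],) + compl, td[1:], lhs[1:]))
        # Remove: completed symbol matches the top-down predicted one
        if compl and td and compl[0] == td[0]:
            new_items.append((compl[1:], td[1:], lhs))
        for item in new_items:
            if item not in seen:  # item-based formulation: no duplicates
                seen.add(item)
                agenda.append(item)
    return False
```

On the example grammar S → aSa | bSb | c this accepts abcba, following the trace shown above (modulo the order in which alternatives are explored).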

Algorithm (6)

Problematic for left-corner parsing:

- ε-productions: there is no left corner that can trigger a Reduce step with an ε-production. If we allowed ε-productions to be predicted in Reduce steps without a left corner, we would add them an infinite number of times.
- Loops: as in the LL-parsing case, loops can cause an infinite sequence of Reduce and Move steps. This problem is already avoided with the item-based formulation, since we would only try to create the same items again.

Both problems can be overcome using the chart-based version with dotted productions described later.

Look-ahead (1)

Idea: build the reflexive transitive closure LC* of the left-corner relation LC. Before applying Reduce, check whether the top of Γ_td stands in the relation LC* to the lhs of the new production we predict:

Reduce: [X1 α, B β, γ] ⊢ [α, X2 … Xk $ B β, A γ]   where A → X1 X2 … Xk ∈ P, ⟨B, A⟩ ∈ LC*

LC = {⟨A, X⟩ | A → Xα ∈ P}

Difference between LC* and First: the left corners of a given non-terminal can be non-terminals and terminals, while the First sets contain only terminals.

Look-ahead (2)

Example: VP → V NP, VP → VP PP, V → sees, NP → Det N, Det → the, N → N PP, N → girl, N → telescope, PP → P NP, P → with

LC: ⟨VP, V⟩, ⟨VP, VP⟩, ⟨V, sees⟩, ⟨NP, Det⟩, ⟨Det, the⟩, ⟨N, N⟩, ⟨N, girl⟩, ⟨N, telescope⟩, ⟨PP, P⟩, ⟨P, with⟩

LC* = LC ∪ {⟨VP, sees⟩, ⟨V, V⟩, ⟨NP, NP⟩, ⟨NP, the⟩, ⟨PP, PP⟩, ⟨PP, with⟩, ⟨P, P⟩}
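The relation LC and its closure can be computed mechanically. A small sketch (the function name `left_corner_closure` is an illustrative choice, not from the slides); note that it adds reflexive pairs for every symbol, while the slide lists only the pairs relevant for the look-ahead test:

```python
def left_corner_closure(grammar):
    """Compute LC = {(A, X) | A -> X alpha in P} and its reflexive
    transitive closure LC*; grammar: dict lhs -> list of rhs tuples."""
    symbols = set(grammar)
    for rhss in grammar.values():
        for rhs in rhss:
            symbols.update(rhs)
    lc = {(A, rhs[0]) for A, rhss in grammar.items() for rhs in rhss}
    # Reflexive pairs for every symbol, then iterate to a fixed point
    # to obtain transitivity.
    closure = set(lc) | {(X, X) for X in symbols}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return lc, closure

# The example grammar from above
grammar = {
    "VP": [("V", "NP"), ("VP", "PP")],
    "V": [("sees",)],
    "NP": [("Det", "N")],
    "Det": [("the",)],
    "N": [("N", "PP"), ("girl",), ("telescope",)],
    "PP": [("P", "NP")],
    "P": [("with",)],
}
lc, lc_star = left_corner_closure(grammar)
```

For this grammar, ⟨VP, sees⟩ and ⟨NP, the⟩ end up in LC*, while e.g. ⟨VP, the⟩ does not, since no chain of left corners leads from VP to the.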

Chart Parsing (1)

Problem of left-corner parsing: it is non-deterministic. In order to avoid computing partial results several times, we can use tabulation, i.e., adopt chart parsing.

Items we need to tabulate:

- completely recognized categories: passive items [X, i, l]
- partially recognized productions: active items [A → α • β, i, l] with α ∈ (N ∪ T)+, β ∈ (N ∪ T)*

(i is the index of the first terminal in the yield, l the length of the yield.)

Chart Parsing (2)

Let us again assume a CFG without ε-productions. We start with the initial items [w_i, i, 1]. The operations Reduce, Remove and Move are then as follows:

Reduce: If [X1, i, l] and A → X1 X2 … Xk ∈ P, then we add [A → X1 • X2 … Xk, i, l].

Move: If [A → X1 X2 … Xk •, i, l], then we add [A, i, l].

Remove: If [X, i, l] and [A → α • X β, j, i − j], then we add [A → α X • β, j, i − j + l].

Chart Parsing (3)

Parsing schema:

Scan: ⊢ [w_i, i, 1]   for 1 ≤ i ≤ n

Reduce: [X, i, l] ⊢ [A → X • α, i, l]   where A → Xα ∈ P

Remove: [A → α • X β, i, l1], [X, j, l2] ⊢ [A → α X • β, i, l1 + l2]   where j = i + l1

Move: [A → α X •, i, l] ⊢ [A, i, l]

Goal item: [S, 1, n]

(This is actually the same algorithm as the CYK with dotted productions seen earlier in the course, except for different rule names and a different use of indices.)
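The parsing schema above can be implemented with a simple agenda-driven chart. The sketch below uses an encoding of my own, not from the slides: passive items are triples (X, i, l) and active items are 5-tuples (A, rhs, dot, i, l), with 1-based positions as in the schema.

```python
def lc_chart_parse(grammar, start, w):
    """Left-corner chart parser; grammar: dict lhs -> list of rhs tuples,
    assumed free of epsilon-productions. Positions are 1-based."""
    n = len(w)
    chart = set()
    agenda = [(w[i], i + 1, 1) for i in range(n)]  # Scan: [w_i, i, 1]
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        new = []
        if len(item) == 3:  # passive item [X, i, l]
            X, i, l = item
            # Reduce: predict every production whose left corner is X
            for A, rhss in grammar.items():
                for rhs in rhss:
                    if rhs[0] == X:
                        new.append((A, rhs, 1, i, l))
            # Remove: extend active items [A -> alpha . X beta, j, i - j]
            for other in list(chart):
                if len(other) == 5:
                    A, rhs, dot, j, l1 = other
                    if dot < len(rhs) and rhs[dot] == X and j + l1 == i:
                        new.append((A, rhs, dot + 1, j, l1 + l))
        else:  # active item [A -> alpha . beta, i, l]
            A, rhs, dot, i, l = item
            if dot == len(rhs):
                new.append((A, i, l))  # Move: rhs fully recognized
            else:
                # Remove: look for a passive item [X, i + l, l2]
                for other in list(chart):
                    if (len(other) == 3 and other[0] == rhs[dot]
                            and other[1] == i + l):
                        new.append((A, rhs, dot + 1, i, l + other[2]))
        agenda.extend(new)
    return (start, 1, n) in chart  # goal item [S, 1, n]
```

Each Remove combination is found regardless of agenda order: whichever of the two antecedent items is processed second sees the other one already in the chart.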

Chart Parsing (4)

Example: left-corner chart parsing with the productions S → aSa | bSb | c and input w = abcba.

item(s)                                                rule     antecedent items
[a, 1, 1], [b, 2, 1], [c, 3, 1], [b, 4, 1], [a, 5, 1]  scan
[S → a • Sa, 1, 1]                                     reduce   [a, 1, 1]
[S → b • Sb, 2, 1]                                     reduce   [b, 2, 1]
[S → c •, 3, 1]                                        reduce   [c, 3, 1]
[S → b • Sb, 4, 1]                                     reduce   [b, 4, 1]
[S → a • Sa, 5, 1]                                     reduce   [a, 5, 1]
[S, 3, 1]                                              move     [S → c •, 3, 1]
[S → bS • b, 2, 2]                                     remove   [S → b • Sb, 2, 1], [S, 3, 1]
[S → bSb •, 2, 3]                                      remove   [S → bS • b, 2, 2], [b, 4, 1]
[S, 2, 3]                                              move     [S → bSb •, 2, 3]
[S → aS • a, 1, 4]                                     remove   [S → a • Sa, 1, 1], [S, 2, 3]
[S → aSa •, 1, 5]                                      remove   [S → aS • a, 1, 4], [a, 5, 1]
[S, 1, 5]                                              move     [S → aSa •, 1, 5]

Conclusion

- The left corner of a production is the first element of its rhs.
- Predict a production only if its left corner has already been found.
- In general non-deterministic. Problematic for ε-productions and loops.
- Can be implemented as a chart parser with passive and active items.
- In the chart parser, ε-productions can be dealt with (they require an additional Scan-ε rule) and loops are no longer a problem.