Administrivia: Test I during class on 10 March. Bottom-Up Parsing, Lecture 11-12: An Introductory Example


Administrivia
Test I during class on 10 March.

Bottom-Up Parsing
Lecture 11-12 (from slides by G. Necula & R. Bodik)
2/20/08, Prof. Hilfinger, CS164

Bottom-Up Parsing
Bottom-up parsing is more general than top-down parsing
And just as efficient
Builds on ideas in top-down parsing
Most common form is LR parsing
  L means that tokens are read left to right
  R means that it constructs a rightmost derivation

An Introductory Example
LR parsers don't need left-factored grammars and can also handle left-recursive grammars
Consider the following grammar:
  E → E + ( E ) | int
Why is this not LL(1)?
Consider the string:  int + ( int ) + ( int )

The Idea
LR parsing reduces a string to the start symbol by inverting productions:
  sent ← input string of terminals
  while sent ≠ S:
    identify the first β in sent such that A → β is a production and S →* αAγ → αβγ = sent
    replace β by A in sent (so αAγ becomes the new sent)
Such αβ's are called handles
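This reduction loop can be made concrete with a small Python sketch (my own illustration, not the lecture's code; the function name reduce_once and the hand-picked handle positions are assumptions). It replays the parse of int + ( int ) + ( int ) that the next slides walk through step by step; a real LR parser's whole job is to find those handle positions automatically.

    # A minimal sketch of "reduce to the start symbol by inverting productions"
    # for the example grammar  E -> E + ( E ) | int.  Handle positions are
    # picked by hand here; finding them automatically is the parser's real job.
    PRODUCTIONS = {
        ("int",): "E",                      # E -> int
        ("E", "+", "(", "E", ")"): "E",     # E -> E + ( E )
    }

    def reduce_once(sent, start, rhs):
        """Replace the handle rhs at position start by its left-hand side."""
        assert tuple(sent[start:start + len(rhs)]) == rhs, "not a handle here"
        return sent[:start] + [PRODUCTIONS[rhs]] + sent[start + len(rhs):]

    sent = "int + ( int ) + ( int )".split()
    steps = [                               # (position, right-hand side) pairs
        (0, ("int",)),                      # -> E + ( int ) + ( int )
        (3, ("int",)),                      # -> E + ( E ) + ( int )
        (0, ("E", "+", "(", "E", ")")),     # -> E + ( int )
        (3, ("int",)),                      # -> E + ( E )
        (0, ("E", "+", "(", "E", ")")),     # -> E
    ]
    for pos, rhs in steps:
        sent = reduce_once(sent, pos, rhs)
        print(" ".join(sent))
    assert sent == ["E"]                    # reduced to the start symbol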

A Bottom-up Parse in Detail (1)-(6)
(The slides show the handle of each step in red; here it is marked with brackets.)

  [int] + ( int ) + ( int )
  E + ( [int] ) + ( int )
  [E + ( E )] + ( int )
  E + ( [int] )
  [E + ( E )]
  E

A reverse rightmost derivation.

Where Do Reductions Happen
Because an LR parser produces a reverse rightmost derivation:
  If αβγ is a step of a bottom-up parse with handle αβ,
  and the next reduction is by A → β,
  then γ is a string of terminals!
This is because αAγ → αβγ is a step in a rightmost derivation.
Intuition: we make decisions about which reduction to use after seeing all symbols of the handle, rather than before (as for LL(1)).
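As a quick sanity check of that claim, the following sketch (mine, not the lecture's; the handle end positions are transcribed by hand from the parse above) verifies that after each handle only terminals remain.

    # Hedged check: in each step of the reverse rightmost derivation above,
    # the suffix to the right of the handle contains only terminals.
    TERMINALS = {"int", "+", "(", ")"}
    forms_and_handle_ends = [
        ("int + ( int ) + ( int )", 1),   # handle: int         (ends at token 1)
        ("E + ( int ) + ( int )",   4),   # handle: int         (ends at token 4)
        ("E + ( E ) + ( int )",     5),   # handle: E + ( E )   (ends at token 5)
        ("E + ( int )",             4),   # handle: int
        ("E + ( E )",               5),   # handle: E + ( E )
    ]
    for form, end in forms_and_handle_ends:
        suffix = form.split()[end:]
        assert all(tok in TERMINALS for tok in suffix), (form, suffix)
    print("after each handle, only terminals remain")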

Notation
Idea: split the string into two substrings
  The right substring (a string of terminals) is as yet unexamined by the parser
  The left substring has terminals and non-terminals
The dividing point is marked by a I
  The I is not part of the string
  It marks the end of the next potential handle

Shift-Reduce Parsing
Bottom-up parsing uses only two kinds of actions: Shift and Reduce
Shift: move I one place to the right, shifting a terminal to the left string
  E + ( I int )   ⇒   E + ( int I )
Reduce: apply an inverse production at the right end of the left string
  If E → E + ( E ) is a production, then
  E + ( E + ( E ) I )   ⇒   E + ( E I )
Initially, all input is unexamined:  I x1 x2 ... xn

Shift-Reduce Example
(The slides build this trace one step at a time.)

  I int + ( int ) + ( int ) $        shift
  int I + ( int ) + ( int ) $        reduce E → int
  E I + ( int ) + ( int ) $          shift 3 times
  E + ( int I ) + ( int ) $          reduce E → int
  E + ( E I ) + ( int ) $            shift
  E + ( E ) I + ( int ) $            reduce E → E + ( E )
  E I + ( int ) $                    shift 3 times
  E + ( int I ) $                    reduce E → int
  E + ( E I ) $                      shift
  E + ( E ) I $                      reduce E → E + ( E )
  E I $                              accept

The Stack
The left string can be implemented as a stack
  The top of the stack is the I
Shift pushes a terminal on the stack
Reduce pops zero or more symbols off the stack (the production's rhs) and pushes a non-terminal on the stack (the production's lhs)

Key Issue: When to Shift or Reduce?
Decide based on the left string (the stack)
Idea: use a finite automaton (DFA) to decide when to shift or reduce
  The DFA input is the stack up to the potential handle
  The DFA alphabet consists of terminals and non-terminals
  The DFA recognizes complete handles
We run the DFA on the stack and examine the resulting state X and the token tok after the I:
  If X has a transition labeled tok, then shift
  If X is labeled with "A → β on tok", then reduce

LR(1) Parsing. An Example
[The slide shows the LR(1) DFA for the grammar E → E + ( E ) | int, with states 0-11, an accept action on $, and reduce annotations such as "E → int on $, +" and "E → E + ( E ) on ), +", next to the shift-reduce trace above rerun against the DFA.]

Representing the DFA
Parsers represent the DFA as a 2D table
  As for table-driven lexical analysis
Rows correspond to DFA states
Columns correspond to terminals and non-terminals
In classical treatments, the columns are split into:
  Those for terminals: the action table
  Those for non-terminals: the goto table

Representing the DFA. Example
[The slide shows the table for a fragment of the DFA: rows for states 3-7; columns for int, +, (, ), $ and E; shift entries such as s4 ("shift and go to state 4"), reduce entries such as r(E → int), an accept entry, and a goto entry for E.]
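Here is a small Python sketch of that stack discipline (my own illustration; the function name run and the hand-written action script are assumptions). It replays the shift-reduce example above; the DFA and tables just described exist precisely so that a real parser does not need such a script.

    # A sketch of the stack view of shift-reduce parsing: shift pushes the next
    # terminal, reduce pops the production's rhs and pushes its lhs.  The
    # decisions are scripted by hand; a real LR parser gets them from its table.
    RULES = {
        1: ("E", ["E", "+", "(", "E", ")"]),   # E -> E + ( E )
        2: ("E", ["int"]),                     # E -> int
    }

    def run(tokens, script):
        stack, i = [], 0                        # the top of the stack is the I
        for action in script:
            if action == "shift":
                stack.append(tokens[i]); i += 1          # push the next terminal
            else:                                        # ("reduce", rule number)
                lhs, rhs = RULES[action[1]]
                assert stack[-len(rhs):] == rhs          # the rhs must be on top
                del stack[-len(rhs):]                    # pop the right-hand side
                stack.append(lhs)                        # push the left-hand side
            print(" ".join(stack), "I", " ".join(tokens[i:]))
        return stack

    tokens = "int + ( int ) + ( int ) $".split()
    script = ["shift", ("reduce", 2), "shift", "shift", "shift", ("reduce", 2),
              "shift", ("reduce", 1), "shift", "shift", "shift", ("reduce", 2),
              "shift", ("reduce", 1)]
    assert run(tokens, script) == ["E"]                  # then accept on $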

The LR Parsing Algorithm
After a shift or reduce action we rerun the DFA on the entire stack
  This is wasteful, since most of the work is repeated
So record, for each stack element, the state of the DFA after reading that symbol
The LR parser maintains a stack
  ⟨sym1, state1⟩ ... ⟨symn, staten⟩
  statek is the final state of the DFA on sym1 ... symk

The LR Parsing Algorithm
  Let I = w1 w2 ... wn $ be the initial input
  Let j = 1
  Let DFA state 0 be the start state
  Let stack = ⟨dummy, 0⟩
  repeat
    case action[top_state(stack), I[j]] of
      shift k:       push ⟨I[j], k⟩; j = j + 1
      reduce X → α:  pop |α| pairs, push ⟨X, Goto[top_state(stack), X]⟩
      accept:        halt normally
      error:         halt and report error

Parsing Contexts
Consider the state describing the situation at the I in the stack  E + ( I int ) + ( int )
Context:
  We are looking for an E → E + ( E )
    (we have seen E + ( from the right-hand side)
  We are also looking for E → int or E → E + ( E )
    (we have seen nothing from the right-hand side)
One DFA state describes a set of such contexts
(Traditionally, a • is used to show where the I is.)

LR(1) Items
An LR(1) item is a pair:  X → α • β, a
  X → αβ is a production
  a is a terminal (the lookahead terminal)
  LR(1) means 1 lookahead terminal
[X → α • β, a] describes a context of the parser:
  We are trying to find an X followed by an a, and
  we have α already on top of the stack
Thus we need to see next a prefix derived from βa

Convention
We add to our grammar a fresh new start symbol S and a production S → E
  where E is the old start symbol
  (No need to do this if E had only one production)
The initial parsing context contains:  S → • E, $
  Trying to find an S as a string derived from E$
  The stack is empty

Constructing the Parsing DFA. Example
[The slide shows the item-set construction for the example grammar, starting from state 0 = { S → • E, $;  E → • E + ( E ), $/+;  E → • int, $/+ }, with transitions to states containing items such as E → int •, $/+ ("reduce E → int on $, +") and S → E •, $ ("accept on $"), and so on.]
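The table-driven loop above can be sketched as runnable Python (my own sketch; the function name lr_parse is an assumption, and the ACTION and GOTO tables are an SLR table I built by hand for E → E + ( E ) | int, not the lecture's LR(1) table, which has more states but drives exactly the same loop).

    # A runnable sketch of the LR parsing algorithm above.
    RULES = [("S", ["E"]),                      # 0: S -> E   (fresh start symbol)
             ("E", ["E", "+", "(", "E", ")"]),  # 1: E -> E + ( E )
             ("E", ["int"])]                    # 2: E -> int

    ACTION = {                                  # (state, terminal) -> action
        (0, "int"): ("shift", 2),
        (1, "+"): ("shift", 3), (1, "$"): ("accept",),
        (2, "+"): ("reduce", 2), (2, ")"): ("reduce", 2), (2, "$"): ("reduce", 2),
        (3, "("): ("shift", 4),
        (4, "int"): ("shift", 2),
        (5, "+"): ("shift", 3), (5, ")"): ("shift", 6),
        (6, "+"): ("reduce", 1), (6, ")"): ("reduce", 1), (6, "$"): ("reduce", 1),
    }
    GOTO = {(0, "E"): 1, (4, "E"): 5}           # (state, non-terminal) -> state

    def lr_parse(tokens):
        stack = [("dummy", 0)]                  # pairs <symbol, DFA state>
        j = 0
        while True:
            state = stack[-1][1]
            act = ACTION.get((state, tokens[j]))
            if act is None:
                raise SyntaxError(f"unexpected {tokens[j]!r} in state {state}")
            if act[0] == "shift":
                stack.append((tokens[j], act[1])); j += 1
            elif act[0] == "reduce":
                lhs, rhs = RULES[act[1]]
                del stack[-len(rhs):]           # pop |rhs| <symbol, state> pairs
                stack.append((lhs, GOTO[(stack[-1][1], lhs)]))
            else:                               # accept
                return

    lr_parse("int + ( int ) + ( int ) $".split())

Recording a DFA state with every stack symbol is what lets the reduce case resume from Goto[top_state(stack), X] instead of rerunning the DFA over the whole stack.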
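The item-set construction above starts from the closure of the initial context [S → • E, $]. A minimal closure function for this particular grammar is sketched below (my own sketch; the names GRAMMAR, FIRST, first_of, and closure are assumptions, and the FIRST sets are written down by hand since no symbol here derives ε; a real generator computes them).

    # LR(1) closure: items are (lhs, rhs tuple, dot position, lookahead).
    GRAMMAR = {"S": [["E"]],
               "E": [["E", "+", "(", "E", ")"], ["int"]]}
    FIRST = {"E": {"int"}, "int": {"int"}, "+": {"+"},
             "(": {"("}, ")": {")"}, "$": {"$"}}

    def first_of(symbols):
        # No symbol of this grammar derives the empty string, so FIRST of a
        # sequence is just FIRST of its first symbol.
        return FIRST[symbols[0]]

    def closure(items):
        items = set(items)
        changed = True
        while changed:
            changed = False
            for lhs, rhs, dot, la in list(items):
                if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot before a non-terminal
                    for la2 in first_of(list(rhs[dot + 1:]) + [la]):
                        for prod in GRAMMAR[rhs[dot]]:
                            item = (rhs[dot], tuple(prod), 0, la2)
                            if item not in items:
                                items.add(item); changed = True
        return items

    # The initial parsing context from the slides: [S -> . E, $]
    for lhs, rhs, dot, la in sorted(closure({("S", ("E",), 0, "$")})):
        print(lhs, "->", " ".join(rhs[:dot]) + " . " + " ".join(rhs[dot:]), ",", la)

The printed items are exactly state 0 of the DFA sketched in the slide: S → • E with lookahead $, and E → • E + ( E ) and E → • int, each with lookaheads $ and +.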

LR Parsing Tables. Notes
Parsing tables (i.e., the DFA) can be constructed automatically for a CFG
But we still need to understand the construction to work with parser generators
  e.g., they report errors in terms of sets of items
What kind of errors can we expect?

Shift/Reduce Conflicts
If a DFA state contains both [X → α • aβ, b] and [Y → γ •, a]
Then on input a we could either
  shift into state [X → αa • β, b], or
  reduce with Y → γ
This is called a shift-reduce conflict

Shift/Reduce Conflicts
Typically due to ambiguities in the grammar
Classic example: the dangling else
  S → if E then S | if E then S else S | OTHER
Will have a DFA state containing
  [S → if E then S •, else]
  [S → if E then S • else S, $]
If else follows, then we can shift or reduce

More Shift/Reduce Conflicts
Consider the ambiguous grammar
  E → E + E | E * E | int
We will have states containing
  [E → E * • E, +]      [E → E * E •, +]
  [E → • E + E, +]      [E → E • + E, +]
Again we have a shift/reduce conflict on input +
We need to reduce (* binds more tightly than +)
Solution: declare the precedence of * and +

More Shift/Reduce Conflicts
In bison, declare the precedence and associativity of terminal symbols:
  %left +
  %left *
Precedence of a rule = that of its last terminal
  (See the bison manual for ways to override this default)
Resolve a shift/reduce conflict with a shift if:
  the input terminal has higher precedence than the rule, or
  the precedences are the same and the rule is right associative

Using Precedence to Solve S/R Conflicts
Back to our example:
  [E → E * • E, +]      [E → E * E •, +]
  [E → • E + E, +]      [E → E • + E, +]
Will choose reduce, because the precedence of the rule E → E * E is higher than that of the terminal +
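The default resolution rule just described can be written out as a small comparison function (my own illustration of the rule stated above, not bison's implementation; the names PREC, ASSOC, and resolve are assumptions).

    # Shift/reduce resolution by precedence: compare the lookahead terminal
    # against the conflicting rule's last terminal.
    PREC  = {"+": 1, "*": 2}          # %left +  (declared first: lower precedence)
    ASSOC = {"+": "left", "*": "left"}   # %left *  (declared later: higher)

    def resolve(rule_last_terminal, lookahead):
        rule_prec, look_prec = PREC[rule_last_terminal], PREC[lookahead]
        if look_prec > rule_prec:
            return "shift"            # the lookahead binds tighter
        if look_prec < rule_prec:
            return "reduce"           # the rule binds tighter
        return "shift" if ASSOC[lookahead] == "right" else "reduce"

    print(resolve("*", "+"))          # E -> E * E . on '+'  => reduce (this slide)
    print(resolve("+", "*"))          # E -> E + E . on '*'  => shift
    print(resolve("+", "+"))          # E -> E + E . on '+'  => reduce (next slide)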

Using Precedence to Solve S/R Conflicts
Same grammar as before:
  E → E + E | E * E | int
We will also have states containing
  [E → E + • E, +]      [E → E + E •, +]
  [E → • E + E, +]      [E → E • + E, +]
Now we also have a shift/reduce conflict on input +
We choose reduce, because + and + have the same precedence and + is left-associative

Using Precedence to Solve S/R Conflicts
Back to our dangling else example:
  [S → if E then S •, else]
  [S → if E then S • else S, x]
Can eliminate the conflict by declaring else with higher precedence than then
However, it is best to avoid overuse of precedence declarations, or you'll end up with unexpected parse trees

Reduce/Reduce Conflicts
If a DFA state contains both [X → α •, a] and [Y → β •, a]
Then on input a we don't know which production to reduce with
This is called a reduce/reduce conflict

Reduce/Reduce Conflicts
Usually due to gross ambiguity in the grammar
Example: a sequence of identifiers
  S → ε | id | id S
There are two parse trees for the string id:
  S → id
  S → id S → id
How does this confuse the parser?

More on Reduce/Reduce Conflicts
Consider the states
  Initial state:      [S' → • S, $]  [S → •, $]  [S → • id, $]  [S → • id S, $]
  After shifting id:  [S → id •, $]  [S → id • S, $]  [S → •, $]  [S → • id, $]  [S → • id S, $]
Reduce/reduce conflict on input $:
  S' → S → id
  S' → S → id S → id
Better to rewrite the grammar:  S → ε | id S

Relation to Bison
Bison builds this kind of machine; however, for efficiency it collapses many of the states together.
This causes some additional conflicts, but not many.
The machines discussed here are LR(1) engines; bison's optimized versions are LALR(1) engines.

A Hierarchy of Grammar Classes
[Figure: the hierarchy of grammar classes, from Andrew Appel, Modern Compiler Implementation in Java.]

Notes on Parsing
Parsing
  A solid foundation: context-free grammars
  A simple parser: LL(1)
  A more powerful parser: LR(1)
  An efficiency hack: LALR(1)
We use LALR(1) parser generators
But let's also see a more powerful engine.