Introduction to Bottom-Up Parsing


Outline: review of LL parsing; shift-reduce parsing; the LR parsing algorithm; constructing LR parsing tables.

Top-Down Parsing: Review. Top-down parsing expands a parse tree from the start symbol to the leaves, always expanding the leftmost non-terminal. [Figure: a partially expanded parse tree for the input int * int + int, built with the grammar E → T + E | T, T → (E) | int * T | int.]

Top-Down Parsing: Review (cont.). At any point during top-down parsing, the leaves of the partial parse tree form a string βAγ, where β contains only terminals and A is the leftmost non-terminal. If the input string is βbδ, then the prefix β has already been matched and the next token is b.

Predictive Parsing: Review. A predictive parser is described by a table: for each non-terminal A and each token b, we specify a production A → α, and when trying to expand A we use A → α if b is the next token. Once we have the table, the parsing algorithm is simple and fast, and no backtracking is necessary.
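
The driver loop for such a table can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: it assumes the table is a dict mapping (non-terminal, next token) pairs to right-hand sides (tuples of symbols, with ε represented by the empty tuple) and that the end of input is marked by "$".

def ll1_parse(table, start, tokens):
    """Table-driven predictive (LL(1)) parsing; raises SyntaxError on a mismatch."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]              # parsing stack, start symbol on top
    i = 0
    while stack:
        top = stack.pop()
        tok = tokens[i]
        if top == tok:                # terminal (or $) matches the next input token
            i += 1
        elif (top, tok) in table:     # expand the non-terminal using the table
            stack.extend(reversed(table[(top, tok)]))   # leftmost RHS symbol ends up on top
        else:
            raise SyntaxError(f"unexpected token {tok!r} while expanding {top!r}")
    return True

With the table built later in these slides for the example grammar, a call such as ll1_parse(table, "E", ["int", "+", "int"]) returns True.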

Constructing Predictive Parsing Tables. Consider the state S →* βAγ, with b the next token, trying to match βbδ. There are two possibilities: 1. Token b belongs to an expansion of A: any A → α can be used if b can start a string derived from α. We say that b ∈ First(α) in this case.

Constructing Predictive Parsing Tables (Cont.). 2. Token b does not belong to an expansion of A: the expansion of A is empty and b belongs to an expansion of γ. This means that b can appear after A in a derivation of the form S →* βAbω. We say that b ∈ Follow(A) in this case. What productions can we use in this case? Any A → α can be used if α can expand to ε; we say that ε ∈ First(A) in this case.

Computing First Sets. Definition: First(X) = { b | X →* bα } ∪ { ε | X →* ε }. Algorithm sketch: 1. First(b) = { b } for a terminal b. 2. ε ∈ First(X) if X → ε is a production. 3. ε ∈ First(X) if X → A1 … An and ε ∈ First(Ai) for 1 ≤ i ≤ n. 4. First(α) ⊆ First(X) if X → A1 … An α and ε ∈ First(Ai) for 1 ≤ i ≤ n.
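
The algorithm sketch translates directly into a fixed-point computation. The following Python sketch is illustrative only (not from the lecture); it assumes a grammar represented as a list of (lhs, rhs) pairs with ε-productions written as an empty right-hand side, and it spells out the example grammar of the next slide in that form.

GRAMMAR = [
    ("E", ("T", "X")),
    ("X", ("+", "E")), ("X", ()),
    ("T", ("(", "E", ")")), ("T", ("int", "Y")),
    ("Y", ("*", "T")), ("Y", ()),
]
NONTERMINALS = {"E", "X", "T", "Y"}

def first_sets(grammar, nonterminals):
    """Compute First for every non-terminal; a terminal b trivially has First(b) = {b}."""
    first = {A: set() for A in nonterminals}
    def first_of(sym):
        return {sym} if sym not in nonterminals else first[sym]    # rule 1
    changed = True
    while changed:                        # iterate until a fixed point is reached
        changed = False
        for lhs, rhs in grammar:
            before = len(first[lhs])
            all_nullable = True
            for sym in rhs:               # rules 3 and 4
                first[lhs] |= first_of(sym) - {"ε"}
                if "ε" not in first_of(sym):
                    all_nullable = False
                    break
            if all_nullable:              # rule 2, and an RHS whose symbols all derive ε
                first[lhs].add("ε")
            changed |= len(first[lhs]) != before
    return first

Running first_sets(GRAMMAR, NONTERMINALS) reproduces the sets on the next slide, e.g. First(E) = First(T) = { int, ( } and First(X) = { +, ε }.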

First Sets: Example. Recall the grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. First sets: First( ( ) = { ( }, First( ) ) = { ) }, First( int ) = { int }, First( + ) = { + }, First( * ) = { * }, First( T ) = { int, ( }, First( E ) = { int, ( }, First( X ) = { +, ε }, First( Y ) = { *, ε }.

Computing Follow Sets. Definition: Follow(X) = { b | S →* β X b δ }. Intuition: if X → A B then First(B) ⊆ Follow(A) and Follow(X) ⊆ Follow(B); moreover, if B →* ε then Follow(X) ⊆ Follow(A). If S is the start symbol, then $ ∈ Follow(S).

Computing Follow Sets (Cont.). Algorithm sketch: 1. $ ∈ Follow(S). 2. First(β) − { ε } ⊆ Follow(X) for each production A → α X β. 3. Follow(A) ⊆ Follow(X) for each production A → α X β where ε ∈ First(β).
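
As with First, this is a fixed-point computation. The sketch below is again illustrative only, reusing the GRAMMAR, NONTERMINALS and first_sets names assumed above; it tracks Follow only for non-terminals, which is all the LL(1) table construction needs.

def follow_sets(grammar, nonterminals, start, first):
    """Compute Follow for every non-terminal of the grammar."""
    follow = {A: set() for A in nonterminals}
    follow[start].add("$")                        # step 1: $ ∈ Follow(S)

    def first_of_suffix(symbols):
        """First of a string of symbols, plus a flag telling whether it can derive ε."""
        out = set()
        for sym in symbols:
            fs = {sym} if sym not in nonterminals else first[sym]
            out |= fs - {"ε"}
            if "ε" not in fs:
                return out, False
        return out, True

    changed = True
    while changed:                                # iterate until a fixed point is reached
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                if sym not in nonterminals:
                    continue
                before = len(follow[sym])
                beta_first, beta_nullable = first_of_suffix(rhs[i + 1:])
                follow[sym] |= beta_first         # step 2: First(β) − {ε} ⊆ Follow(X)
                if beta_nullable:
                    follow[sym] |= follow[lhs]    # step 3: Follow(A) ⊆ Follow(X)
                changed |= len(follow[sym]) != before
    return follow

For the example grammar, follow_sets(GRAMMAR, NONTERMINALS, "E", first) yields Follow(E) = Follow(X) = { ), $ } and Follow(T) = Follow(Y) = { +, ), $ }, matching the next slide.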

Follow Sets: Example. Recall the grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. Follow sets: Follow( + ) = { int, ( }, Follow( * ) = { int, ( }, Follow( ( ) = { int, ( }, Follow( E ) = { ), $ }, Follow( X ) = { $, ) }, Follow( T ) = { +, ), $ }, Follow( ) ) = { +, ), $ }, Follow( Y ) = { +, ), $ }, Follow( int ) = { *, +, ), $ }.

Constructing LL(1) Parsing Tables. Construct a parsing table T for a CFG G. For each production A → α in G do: for each terminal b ∈ First(α), set T[A, b] = α; if ε ∈ First(α), then for each b ∈ Follow(A) set T[A, b] = α; if ε ∈ First(α) and $ ∈ Follow(A), set T[A, $] = α.
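
Putting the two computations together gives the table construction. The sketch below (illustrative, using the same assumed representation and helper names as before) treats $ simply as a member of the Follow sets, which covers the last two cases at once, and raises an error on a conflicting entry.

def build_ll1_table(grammar, nonterminals, start):
    """Build T[(A, b)] = α for an LL(1) grammar; raise ValueError on a conflict."""
    first = first_sets(grammar, nonterminals)
    follow = follow_sets(grammar, nonterminals, start, first)

    def first_of_string(symbols):
        out, nullable = set(), True
        for sym in symbols:
            fs = {sym} if sym not in nonterminals else first[sym]
            out |= fs - {"ε"}
            if "ε" not in fs:
                nullable = False
                break
        if nullable:
            out.add("ε")
        return out

    table = {}
    def enter(A, b, alpha):
        if (A, b) in table and table[(A, b)] != alpha:
            raise ValueError(f"grammar is not LL(1): conflict at T[{A}, {b}]")
        table[(A, b)] = alpha

    for A, alpha in grammar:
        fa = first_of_string(alpha)
        for b in fa - {"ε"}:             # b ∈ First(α)
            enter(A, b, alpha)
        if "ε" in fa:                    # ε ∈ First(α): use Follow(A), which may contain $
            for b in follow[A]:
                enter(A, b, alpha)
    return table

Combined with the earlier driver, build_ll1_table(GRAMMAR, NONTERMINALS, "E") produces exactly the kind of table that ll1_parse consumes.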

Constructing LL(1) Tables: Example. Recall the grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. Where in the row of Y do we put Y → * T? In the columns of First(* T) = { * }. Where in the row of Y do we put Y → ε? In the columns of Follow(Y) = { $, +, ) }.

Notes on LL(1) Parsing Tables. If any entry is multiply defined, then G is not LL(1). This happens if G is ambiguous, if G is left-recursive, if G is not left-factored, and in other cases as well. For some grammars there is thus a simple parsing strategy: predictive parsing. However, most programming-language grammars are not LL(1), so we need more powerful parsing strategies.

Bottom-Up Parsing

Bottom-Up Parsing. Bottom-up parsing is more general than top-down parsing, yet just as efficient, and it builds on ideas from top-down parsing. It is the preferred method in practice. Bottom-up parsing is also called LR parsing: L means that tokens are read left to right, and R means that the parser constructs a rightmost derivation.

An Introductory Example. LR parsers don't need left-factored grammars and can also handle left-recursive grammars. Consider the following grammar: E → E + ( E ) | int. Why is this not LL(1)? Because the grammar is left-recursive: with only one token of lookahead, an LL(1) parser cannot decide between the two productions for E. Consider the string: int + ( int ) + ( int ).

The Idea. LR parsing reduces a string to the start symbol by inverting productions:
str ← w (the input string of terminals)
repeat
  identify β in str such that A → β is a production (i.e., str = α β γ)
  replace β by A in str (i.e., str becomes α A γ)
until str = S (the start symbol) OR all possibilities are exhausted
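
As a purely illustrative aside (not from the lecture), this loop can be written as a brute-force search in Python for the example grammar E → E + ( E ) | int. It simply tries every possible reduction and backtracks on failure, which is exponential in general; the rest of the lecture is about how LR parsers pick the right reduction deterministically.

PRODUCTIONS = [("E", ("E", "+", "(", "E", ")")), ("E", ("int",))]

def reduce_to_start(symbols, start="E"):
    """Return the list of sentential forms from `symbols` down to the start symbol, or None."""
    symbols = tuple(symbols)
    if symbols == (start,):
        return [symbols]                        # done: str = S
    for lhs, beta in PRODUCTIONS:
        n = len(beta)
        for i in range(len(symbols) - n + 1):
            if symbols[i:i + n] == beta:        # found a β such that A → β is a production
                reduced = symbols[:i] + (lhs,) + symbols[i + n:]
                rest = reduce_to_start(reduced, start)
                if rest is not None:            # otherwise backtrack and try another β
                    return [symbols] + rest
    return None                                 # all possibilities exhausted

# Example: int + ( int ) reduces to E in three steps.
print(reduce_to_start(["int", "+", "(", "int", ")"]))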

A Bottom-up Parse in Detail. For the input int + (int) + (int), the parse proceeds one reduction at a time, each sentential form obtained from the previous one by replacing the right-hand side of a production with its left-hand side:
int + (int) + (int)
E + (int) + (int)
E + (E) + (int)
E + (int)
E + (E)
E
Read from bottom to top this sequence is a derivation; read in the order the parser produces it, it is a rightmost derivation in reverse.

Important Fact #1 about Bottom-up Parsing: an LR parser traces a rightmost derivation in reverse.

Where Do Reductions Happen? Fact #1 has an interesting consequence. Let αβγ be a step of a bottom-up parse and assume the next reduction uses A → β. Then γ is a string of terminals. Why? Because αAγ → αβγ is a step in a rightmost derivation, and in a rightmost derivation everything to the right of the expanded non-terminal A is already a string of terminals.

Notation. Idea: split the string into two substrings. The right substring is as yet unexamined by the parser (it is a string of terminals); the left substring has terminals and non-terminals. The dividing point is marked by a |; the | is not part of the string. Initially, all input is unexamined: | x1 x2 ... xn.

Shift-Reduce Parsing. Bottom-up parsing uses only two kinds of actions: shift and reduce.

Shift. Shift: move | one place to the right; this shifts a terminal onto the left string. For example, E + ( | int ) becomes E + ( int | ). In general: ABC | xyz becomes ABCx | yz.

Reduce. Reduce: apply an inverse production at the right end of the left string. If E → E + ( E ) is a production, then E + ( E + ( E ) | ) becomes E + ( E | ). In general, given A → xy: Cbxy | ijk becomes CbA | ijk.

Shift-Reduce Example. Parsing int + (int) + (int)$ with the grammar E → E + ( E ) | int:
| int + (int) + (int)$      shift
int | + (int) + (int)$      reduce E → int
E | + (int) + (int)$        shift 3 times
E + (int | ) + (int)$       reduce E → int
E + (E | ) + (int)$         shift
E + (E) | + (int)$          reduce E → E + (E)
E | + (int)$                shift 3 times
E + (int | )$               reduce E → int
E + (E | )$                 shift
E + (E) | $                 reduce E → E + (E)
E | $                       accept

The Stack. The left string can be implemented by a stack, with the top of the stack at the |. Shift pushes a terminal on the stack. Reduce pops zero or more symbols off the stack (the production's right-hand side) and pushes a non-terminal on the stack (the production's left-hand side).
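
Concretely, the two actions are just stack operations; the tiny Python sketch below (illustrative only, with assumed helper names) shows them on the running example. Deciding when to reduce, and by which production, is the subject of the next slides.

def shift(stack, tokens, i):
    """Push the next input terminal onto the stack; return the new input position."""
    stack.append(tokens[i])
    return i + 1

def reduce(stack, lhs, rhs):
    """Pop the production's right-hand side off the stack and push its left-hand side."""
    if rhs:                                       # an ε-production pops nothing
        assert tuple(stack[-len(rhs):]) == tuple(rhs), "top of stack must match the RHS"
        del stack[-len(rhs):]
    stack.append(lhs)

stack, tokens = [], ["int", "+", "(", "int", ")", "$"]
i = shift(stack, tokens, 0)       # | int + (int)$   ->   int | + (int)$
reduce(stack, "E", ("int",))      # int | + (int)$   ->   E | + (int)$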

Key Question: To Shift or to Reduce? Idea: use a finite automaton (DFA) to decide when to shift or to reduce. The input to the DFA is the stack, and its alphabet consists of terminals and non-terminals. We run the DFA on the stack and examine the resulting state X and the token tok after the |: if X has a transition labeled tok, then shift; if X is labeled with A → β on tok, then reduce.

LR(1) Parsing: An Example. [Figure: the DFA for the grammar E → E + ( E ) | int, with states 0 through 11. Its reduce states are labeled 'E → int on $, +', 'E → int on ), +', 'E → E + (E) on $, +' and 'E → E + (E) on ), +', and its accept state is labeled 'accept on $'.] Running this DFA on the stack after every action drives exactly the trace of the previous slide: shift, reduce E → int, shift 3 times, reduce E → int, shift, reduce E → E + (E), shift 3 times, reduce E → int, shift, reduce E → E + (E), accept.

Representing the DFA. Parsers represent the DFA as a 2D table (recall table-driven lexical analysis). Rows correspond to DFA states; columns correspond to terminals and non-terminals. Typically the columns are split into two groups: those for terminals form the action table, and those for non-terminals form the goto table.

Representing the DFA: Example. The table for a fragment of our DFA (states 3-7; blank entries are errors):

state | int | +           | (  | )           | $           | E
  3   |     |             | s4 |             |             |
  4   | s5  |             |    |             |             | g6
  5   |     | r E → int   |    | r E → int   |             |
  6   |     | s8          |    | s7          |             |
  7   |     | r E → E+(E) |    |             | r E → E+(E) |
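
One possible encoding of this fragment as Python data (an assumption for illustration, not the lecture's notation) is a pair of dicts: action maps (state, terminal) to ("shift", k), ("reduce", lhs, rhs) or ("accept",), with missing entries meaning error, and goto maps (state, non-terminal) to a state.

E_PLUS_E = ("E", "+", "(", "E", ")")          # right-hand side of E → E + ( E )

ACTION = {
    (3, "("):   ("shift", 4),
    (4, "int"): ("shift", 5),
    (5, "+"):   ("reduce", "E", ("int",)),
    (5, ")"):   ("reduce", "E", ("int",)),
    (6, "+"):   ("shift", 8),
    (6, ")"):   ("shift", 7),
    (7, "+"):   ("reduce", "E", E_PLUS_E),
    (7, "$"):   ("reduce", "E", E_PLUS_E),
    # ... entries for the remaining states of the DFA, including an ("accept",) entry
}
GOTO = {
    (4, "E"): 6,
    # ... goto entries for the remaining states
}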

The LR Parsing Algorithm. After a shift or reduce action we could rerun the DFA on the entire stack, but this is wasteful, since most of the work is repeated. Instead, we remember for each stack element the state into which it brings the DFA: the LR parser maintains a stack of pairs ⟨sym1, state1⟩ ... ⟨symn, staten⟩, where statek is the final state of the DFA on sym1 ... symk.

The LR Parsing Algorithm.
let I = w$ be the initial input
let j = 0
let DFA state 0 be the start state
let stack = ⟨dummy, 0⟩
repeat
  case action[top_state(stack), I[j]] of
    shift k:      push ⟨I[j++], k⟩
    reduce X → A: pop |A| pairs, push ⟨X, Goto[top_state(stack), X]⟩
    accept:       halt normally
    error:        halt and report an error
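
A direct Python rendering of this driver, using the ACTION/GOTO encoding assumed in the earlier sketch, might look as follows. It is illustrative only; the tables passed in must cover the whole DFA, not just the fragment shown above.

def lr_parse(action, goto, tokens):
    """Drive the LR automaton over the input; return True on accept."""
    tokens = list(tokens) + ["$"]
    stack = [("<dummy>", 0)]                    # stack of <symbol, state> pairs
    j = 0
    while True:
        state = stack[-1][1]                    # top_state(stack)
        entry = action.get((state, tokens[j]), ("error",))
        if entry[0] == "shift":
            stack.append((tokens[j], entry[1]))
            j += 1
        elif entry[0] == "reduce":
            _, lhs, rhs = entry
            if rhs:                             # pop |rhs| pairs (nothing for an ε-production)
                del stack[-len(rhs):]
            state = stack[-1][1]
            stack.append((lhs, goto[(state, lhs)]))
        elif entry[0] == "accept":
            return True
        else:
            raise SyntaxError(f"parse error at token {tokens[j]!r}")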

LR Parsers. LR parsers can be used to parse more grammars than LL parsers, and most programming-language grammars are LR. An LR parser can be described by a simple table, and there are tools for building that table. How is the table constructed?