Compiler Principles, PS4

Top-Down Parsing

Parsing problem definition:
The general parsing problem is: given a set of rules and an input stream (in our case, a stream of Scheme tokens), how do we find the parse tree of the input stream? Parsing consists of finding a derivation of a particular sentence using a given grammar.

In top-down parsing we start with the sentence symbol (start symbol) and "generate" the sentence. A top-down parser tries to find the leftmost derivation for a given sentence. It can be viewed as an attempt to construct a parse tree from the start symbol.

leftmost derivation = apply the grammar rules to the leftmost non-terminal.

Recursive Descent Parser:
For each non-terminal, we define a function that checks whether the input contains a right-hand side of that non-terminal. For each terminal, we have a function that checks that the input contains this terminal.
- Example: α → aβ  =>  bool func_α() { return ( token(a) && func_β() ); }
- Example: α → β | γ  =>  bool func_α() { return ( func_β() || func_γ() ); }

The problems:
1. Long run-time with OR: we don't know which alternative to choose. So we check the first option (which might be a long check), find out that it is false, and go back to check the second option.
2. We are unable to predict which rule to choose, and unable to undo a choice we made. Examples:
   α → a | β, β → ab. We will never be able to choose the second rule of α, because if we have ab in the input, the first rule is checked and is true.
   S → Aab, A → a | ε. We can't identify ab.
   S → Aab, A → ε | a. We can't identify aab.
3. We are unable to handle left-recursive grammars: the parser will go into an endless loop. Example: S → Sab | ε.

Predictive Parser
A grammar is said to be left-recursive if there is a derivation of the form A → Ax. The parser might get into an infinite loop while trying to parse some input!

A grammar is ambiguous if there exists a string that can be derived from the start symbol using 2 different parse trees.

Lookahead is the number of symbols required in order to identify a correct derivation. This number is determined by the given grammar. We'll assume our grammars have lookahead = 1.

Grammar ambiguity: given the first token in the input stream, there might be several matching rules. If the choice is unlucky, parsing fails even though a parse tree for the sentence exists!
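The "unable to undo a choice" problem can be made concrete with a minimal Python sketch of a recursive descent recognizer for S → Aab, A → a | ε. The class name `Parser` and the helper `recognize` are illustrative names, not part of the original notes, and the recognizer deliberately has no backtracking.

```python
class Parser:
    """Recursive descent recognizer for S -> A a b, A -> a | ε (no backtracking)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def token(self, ch):
        # Terminal function: consume ch if it is the next input symbol.
        if self.pos < len(self.text) and self.text[self.pos] == ch:
            self.pos += 1
            return True
        return False

    def func_A(self):
        # A -> a | ε : the first alternative that succeeds is kept for good.
        return self.token("a") or True

    def func_S(self):
        # S -> A a b
        return self.func_A() and self.token("a") and self.token("b")


def recognize(text):
    # Accept only if S matched and the whole input was consumed.
    p = Parser(text)
    return p.func_S() and p.pos == len(text)


ACCEPTS_AAB = recognize("aab")  # A consumes the first a, then a b follows
ACCEPTS_AB = recognize("ab")    # A greedily consumes a, so "a b" can no longer match
```

Here `ACCEPTS_AAB` is `True` but `ACCEPTS_AB` is `False`, although ab is in the language (take A → ε). A predictive parser avoids this by consulting the lookahead before committing to an alternative.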

In order to perform top-down parsing we must make sure that the grammar is not left-recursive and not ambiguous.

LL(1)
LL(1) is the set of all grammars in which the string is read from Left to right, the Leftmost derivation is always used, and 1 input symbol is enough to determine which derivation rule to use (i.e. lookahead = 1). If a lookahead of 1 is not enough, the grammar is not LL(1). For top-down parsing we need an unambiguous LL(1) grammar.

Definitions:
Lookahead symbol: the current input symbol or the input end marker ($, for example).
Starter symbol (aka First) of a given non-terminal: any symbol which may appear at the start of a string generated by this non-terminal.
Follower symbol (aka Follow) of a given non-terminal: any symbol that can follow the non-terminal.

Computing First sets:
The First set is computed for every non-terminal and for every right-hand side of a grammar rule. Every First set is initialized to the empty set. The rules below are applied repeatedly until no set changes.
1. For each rule N → α: add First(α) to First(N).
2. For each right-hand side α = AB: add First(A) to First(α).
3. For each right-hand side α = AB where A is nullable: add First(B) to First(α).
4. First(aβ) includes a, when a is a terminal.

Computing Follow sets:
The Follow set is computed for every non-terminal. Every Follow set is initialized to the empty set, except that of the start symbol: Follow(S) = {$}. Again, the rules are applied until no set changes.
1. For each rule M → αNβ: add First(β) to Follow(N).
2. For each rule M → αNβ where β is nullable: add Follow(M) to Follow(N).

Example:
Input grammar:
S → AB | PQx
A → xy | m
B → bC
C → bC | ε
P → pP | ε
Q → qQ | ε

Let's compute the First set for each rule:

Rule        First
S → AB      {x, m}
S → PQx     {p, q, x}
A → xy      {x}
A → m       {m}
B → bC      {b}
C → bC      {b}
C → ε       {}
P → pP      {p}
P → ε       {}
Q → qQ      {q}
Q → ε       {}

Let's compute the Follow set for each non-terminal:

Non-terminal    U(First)        Follow
A               {x, m}          {b}
B               {b}             {$}
C               {b}             {$}
P               {p}             {q, x}
Q               {q}             {x}
S               {x, m, p, q}    {$}
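The First and Follow rules can be sketched as a fixed-point computation and checked against the example grammar. This is a minimal sketch in Python. The dictionary encoding of the grammar and all function names are assumptions made for illustration, with the empty tuple standing for ε.

```python
END = "$"

# Example grammar from the text; each alternative is a tuple of symbols,
# and the empty tuple stands for ε.
GRAMMAR = {
    "S": [("A", "B"), ("P", "Q", "x")],
    "A": [("x", "y"), ("m",)],
    "B": [("b", "C")],
    "C": [("b", "C"), ()],
    "P": [("p", "P"), ()],
    "Q": [("q", "Q"), ()],
}


def compute_nullable(grammar):
    # A non-terminal is nullable if some alternative consists only of nullable symbols.
    nullable, changed = set(), True
    while changed:
        changed = False
        for n, alts in grammar.items():
            if n not in nullable and any(all(s in nullable for s in alt) for alt in alts):
                nullable.add(n)
                changed = True
    return nullable


def first_of(seq, first, nullable, grammar):
    # First set of a symbol sequence; a terminal contributes itself and stops the scan.
    out = set()
    for s in seq:
        if s not in grammar:
            out.add(s)
            return out
        out |= first[s]
        if s not in nullable:
            return out
    return out


def compute_first(grammar, nullable):
    first, changed = {n: set() for n in grammar}, True
    while changed:
        changed = False
        for n, alts in grammar.items():
            for alt in alts:
                new = first_of(alt, first, nullable, grammar)
                if not new <= first[n]:
                    first[n] |= new
                    changed = True
    return first


def compute_follow(grammar, first, nullable, start):
    follow = {n: set() for n in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for m, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # Follow is only defined for non-terminals
                    beta = alt[i + 1:]
                    new = first_of(beta, first, nullable, grammar)
                    if all(s in nullable for s in beta):
                        new |= follow[m]  # beta is nullable (or empty)
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


NULLABLE = compute_nullable(GRAMMAR)
FIRST = compute_first(GRAMMAR, NULLABLE)
FOLLOW = compute_follow(GRAMMAR, FIRST, NULLABLE, "S")
```

`FIRST` here holds the union of First over all alternatives of each non-terminal (the U(First) column), and `FOLLOW` corresponds to the Follow column.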

Now we can calculate the Director Symbol Set (DSS) for each rule N → α:
DSS(N → α) = First(α) U ( Follow(N) if α is nullable )

DSS(S → AB) = {x, m} U ( Follow(S) if AB is nullable ) = {x, m}
DSS(S → PQx) = {p, q, x} U ( Follow(S) if PQx is nullable ) = {p, q, x}
DSS(A → xy) = {x} U ( Follow(A) if xy is nullable ) = {x}
DSS(A → m) = {m} U ( Follow(A) if m is nullable ) = {m}
DSS(B → bC) = {b} U ( Follow(B) if bC is nullable ) = {b}
DSS(C → bC) = {b} U ( Follow(C) if bC is nullable ) = {b}
DSS(C → ε) = {} U ( Follow(C) if ε is nullable ) = {$}
DSS(P → pP) = {p} U ( Follow(P) if pP is nullable ) = {p}
DSS(P → ε) = {} U ( Follow(P) if ε is nullable ) = {q, x}
DSS(Q → qQ) = {q} U ( Follow(Q) if qQ is nullable ) = {q}
DSS(Q → ε) = {} U ( Follow(Q) if ε is nullable ) = {x}

Corresponding Predictive Parsing Table (PPT):
For each rule N → α, we put this rule in each cell (N, x) such that x ∈ DSS(N → α).

      S                  A         B         C         P         Q
b                                  B → bC    C → bC
p     S → PQx                                          P → pP
q     S → PQx                                          P → ε     Q → qQ
x     S → AB, S → PQx    A → xy                        P → ε     Q → ε
y
m     S → AB             A → m
$                                            C → ε

Parsing "mbbb" is easy: S → AB, A → m, B → bC, C → bC, C → bC, C → ε. But an input that starts with "x" cannot be parsed predictively: cell (S, x) holds two rules, so one lookahead symbol does not tell the parser which one to choose.

The grammar is not LL(1) if some cell of the PPT holds 2 entries, that is, if there is a conflict:
First/First conflict: for each non-terminal N, the First sets of all its alternatives must be disjoint. (Example: S → α | β, where First(α) ∩ First(β) ≠ Ø.)
First/Follow conflict: for each non-terminal N that has a nullable alternative, Follow(N) must be disjoint from the First set of each alternative of N. Example: S → Aa, A → a | ε. A has a nullable alternative (ε), and Follow(A) = {a}, First(A) = {a}, so Follow(A) ∩ First(A) ≠ Ø.
Multiple nullable alternatives: for each non-terminal N, at most one alternative may be nullable.
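As a rough illustration, the DSS formula and the table-filling rule can be sketched in Python. The First and Follow values are copied from the example tables, while the tuple layout of `RULES` and the names `dss`, `build_table` and `find_conflicts` are assumptions made for this sketch.

```python
# Each rule: (non-terminal, right-hand side, First(rhs), is rhs nullable?)
RULES = [
    ("S", "AB", {"x", "m"}, False),
    ("S", "PQx", {"p", "q", "x"}, False),
    ("A", "xy", {"x"}, False),
    ("A", "m", {"m"}, False),
    ("B", "bC", {"b"}, False),
    ("C", "bC", {"b"}, False),
    ("C", "ε", set(), True),
    ("P", "pP", {"p"}, False),
    ("P", "ε", set(), True),
    ("Q", "qQ", {"q"}, False),
    ("Q", "ε", set(), True),
]

FOLLOW = {
    "S": {"$"}, "A": {"b"}, "B": {"$"},
    "C": {"$"}, "P": {"q", "x"}, "Q": {"x"},
}


def dss(rule):
    # DSS(N -> α) = First(α) U (Follow(N) if α is nullable)
    n, _rhs, first, nullable = rule
    if nullable:
        return first | FOLLOW[n]
    return set(first)


def build_table(rules):
    # Put rule N -> α into every cell (N, x) with x in DSS(N -> α).
    table = {}
    for rule in rules:
        n, rhs = rule[0], rule[1]
        for terminal in dss(rule):
            table.setdefault((n, terminal), []).append(f"{n} -> {rhs}")
    return table


def find_conflicts(table):
    # A cell with more than one rule means the grammar is not LL(1).
    return {cell: entries for cell, entries in table.items() if len(entries) > 1}


TABLE = build_table(RULES)
CONFLICTS = find_conflicts(TABLE)
```

For this grammar the only conflicting cell is (S, x), which holds both S → AB and S → PQx. Storing a list per cell, rather than a single rule, is what lets the conflict show up instead of being silently overwritten.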

Now we build a Push-Down Automaton from the PPT:

Init: push S onto the stack.

At each step, we pop the top of the stack and do one of two moves:
Prediction move: when pop() = non-terminal N, we look at PPT[N,t], where t is the lookahead token (the next token in the input). If PPT[N,t] is empty, we report a syntax error. If PPT[N,t] = some rule N → α, we push α. Notice that if α = α1 α2, we push α2 first and then α1, so that α1 ends up on top of the stack.
Match move: when pop() = terminal x, we check that the next token in the input is x. If it is, we remove x from the input; otherwise we report a syntax error.

Termination: parsing succeeds when the stack is empty and the input stream is empty. Otherwise there is a syntax error.

This only recognizes the language of the grammar. To construct a parse tree: each prediction move creates new nodes for everything it pushes onto the stack (their parent is the node that was popped), and each match move inserts the token into the node it popped.
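The prediction and match moves can be sketched as a small table-driven loop in Python. This sketch assumes a conflict-free fragment of the example table (the S → PQx alternative and the P and Q rules are left out so that every cell holds exactly one rule). The names `PPT`, `NONTERMINALS` and `parse` are illustrative. Instead of building tree nodes, it records the rules used by the prediction moves.

```python
END = "$"

# Conflict-free fragment of the predictive parsing table (assumption:
# S -> PQx is dropped, so cell (S, x) holds only S -> AB).
PPT = {
    ("S", "x"): ("A", "B"),
    ("S", "m"): ("A", "B"),
    ("A", "x"): ("x", "y"),
    ("A", "m"): ("m",),
    ("B", "b"): ("b", "C"),
    ("C", "b"): ("b", "C"),
    ("C", END): (),  # C -> ε
}
NONTERMINALS = {"S", "A", "B", "C"}


def parse(tokens, start="S"):
    """Return the list of rules applied, or raise SyntaxError."""
    tokens = list(tokens) + [END]
    pos = 0
    stack = [start]
    used = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            # Prediction move: consult PPT[N, t].
            rhs = PPT.get((top, look))
            if rhs is None:
                raise SyntaxError(f"no rule for ({top}, {look})")
            used.append(f"{top} -> {' '.join(rhs) or 'ε'}")
            # Push the right-hand side in reverse so its first symbol is on top.
            stack.extend(reversed(rhs))
        else:
            # Match move: the terminal on the stack must equal the lookahead.
            if look != top:
                raise SyntaxError(f"expected {top}, found {look}")
            pos += 1
    if tokens[pos] != END:
        raise SyntaxError("input left over after the stack emptied")
    return used


DERIVATION = parse("mbbb")
```

`DERIVATION` lists the rules in the same order as the leftmost derivation of "mbbb" given earlier: S → AB, A → m, B → bC, C → bC, C → bC, C → ε.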