Syntax Analysis Part III


Syntax Analysis Part III, Chapter 4: Top-Down Parsing. Slides adapted from: Robert van Engelen, Florida State University

Eliminating Ambiguity
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
The above grammar is ambiguous, since the following string has two parse trees:
if E1 then if E2 then S1 else S2

Eliminating Ambiguity
[Figure: the two parse trees for if E1 then if E2 then S1 else S2. In one the else is attached to the inner if, i.e. if E1 then (if E2 then S1 else S2); in the other it is attached to the outer if, i.e. if E1 then (if E2 then S1) else S2.]

Eliminating Ambiguity
stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | other
open_stmt → if expr then stmt
          | if expr then matched_stmt else open_stmt
In practice, disambiguation is rarely built into the grammar; instead, extra constraints are added to the parser to resolve the ambiguity.

Exercise
Choose the unambiguous version of the given ambiguous grammar S → S S | a | b:
1. S → Sa | Sb | ε
2. S → S S'    S' → a | b
3. S → S' S    S' → a | b
4. S → Sa | Sb | a | b

Immediate Left/Right Recursion
Productions of the form A → A α | β are immediately left recursive.
Productions of the form A → α A | β are immediately right recursive.

Associativity of Operators
Immediate left-recursive productions are used to implement left-associative operators:
left → left + term | term
The string a+b+c has the same meaning as (a+b)+c.

Associativity of Operators
Immediate right-recursive productions are used to implement right-associative operators:
right → term = right | term
The string a=b=c has the same meaning as a=(b=c).

Precedence of Operators
Operators with higher precedence bind more tightly:
expr → expr + term | term
term → term * factor | factor
factor → number | ( expr )
The string 2+3*5 has the same meaning as 2+(3*5).
[Figure: parse tree for 2 + 3 * 5, with the subtree for 3 * 5 below the + node.]

Immediate Left Recursion
When one of the productions in a grammar is immediately left recursive, a top-down predictive parser loops forever on certain inputs.
[Figure: expanding A by A → A β recursively calls A on the same input w without consuming anything.]

Immediate Left Recursion
We can eliminate immediate left-recursive productions by systematically rewriting the grammar using right-recursive productions:
A → A α | β
becomes
A → β R
R → α R | ε

Immediate Left Recursion
[Figure: the left-recursive grammar A → A α | β derives β α ... α by growing the parse tree to the left; the rewritten grammar A → β R, R → α R | ε derives the same string by growing it to the right.]

Immediate Left-Recursion Elimination Method
Rewrite every left-recursive production
A → A α | A δ | β | γ
into right-recursive productions
A → β A_R | γ A_R
A_R → α A_R | δ A_R | ε

Exercise
Choose the grammar that correctly eliminates immediate left recursion from the given grammar:
E → E + T | T
T → id | ( E )

Exercise (cont'd)
1. E → E + id | E + ( E ) | id | ( E )
2. E → T E'    E' → + T E' | ε    T → id | ( E )
3. E → E' + T | T    E' → id | ( E )    T → id | ( E )
4. E → id + E'    E' → + T | T    T → id | ( E )

Left Recursion
A CFG is left recursive if it allows derivations of the form A ⇒+ A α.
This notion is more general than immediate left recursion.

Recognition of Left Recursion
Assume there are no epsilon rules. Draw a graph whose nodes are the nonterminals and whose arcs (A, B) represent the relation "A expands into B α" for some α, i.e. there is a production A → B α.
A cycle in this graph indicates left recursion.

Example
A → B C | a
B → C A | A b
C → A B | C C | a
[Figure: graph over the nodes A, B, C with the corresponding expansion arcs; it contains a cycle, so the grammar is left recursive.]
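This check is easy to mechanize. Below is a minimal C sketch (names and encodings are mine, for illustration only) that hard-codes the expansion arcs of the example grammar above and runs a depth-first search; meeting a node that is still on the DFS stack means the graph has a cycle, hence the grammar is left recursive.

#include <stdio.h>

enum { A, B, C, N };                     /* nonterminal indices              */

/* edge[X][Y] = 1 iff some production X -> Y alpha exists                    */
static const int edge[N][N] = {
  /*        A  B  C */
  /* A */ { 0, 1, 0 },                   /* A -> B C                         */
  /* B */ { 1, 0, 1 },                   /* B -> C A | A b                   */
  /* C */ { 1, 0, 1 },                   /* C -> A B | C C                   */
};

static int state[N];                     /* 0 = unvisited, 1 = on stack, 2 = done */

static int dfs(int x) {                  /* 1 if a cycle is reachable from x */
  state[x] = 1;
  for (int y = 0; y < N; y++) {
    if (!edge[x][y]) continue;
    if (state[y] == 1) return 1;         /* back edge: left recursion        */
    if (state[y] == 0 && dfs(y)) return 1;
  }
  state[x] = 2;
  return 0;
}

int main(void) {
  const char *name[] = { "A", "B", "C" };
  for (int x = 0; x < N; x++)
    if (state[x] == 0 && dfs(x))
      printf("left recursion reachable from %s\n", name[x]);
  return 0;
}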

Elimination of Left Recursion
The basic operation used by the algorithm: expand the first symbol of a right-hand side using one of its rules.
From A → B α and B → β we obtain A → β α.

Elimination of Left Recursion
The basic idea:
Process the nonterminal symbols in some order A1, A2, ..., An.
Replace rules of the form Ai → Aj γ having j < i with rules having j > i.

Example
Choose the ordering A < B < C over the nonterminals A, B, C.
[Figure: the nonterminal graph with the nodes arranged in this order.]

General Left-Recursion Elimination Method
Arrange the nonterminals in some order A1, A2, ..., An
for i = 1, ..., n {
  for j = 1, ..., i-1 {
    replace each Ai → Aj γ
    with Ai → δ1 γ | δ2 γ | ... | δk γ
    where Aj → δ1 | δ2 | ... | δk
  }
  eliminate the immediate left recursion in Ai
}

Example
A → B C | a
B → C A | A b
C → A B | C C | a
Choose arrangement: A, B, C
i = 1: nothing to do
i = 2, j = 1:  B → C A | A b
               becomes  B → C A | B C b | a b
        (imm)  B → C A B_R | a b B_R    B_R → C b B_R | ε
i = 3, j = 1:  C → A B | C C | a
               becomes  C → B C B | a B | C C | a
i = 3, j = 2:  C → C A B_R C B | a b B_R C B | a B | C C | a
        (imm)  C → a b B_R C B C_R | a B C_R | a C_R    C_R → A B_R C B C_R | C C_R | ε

Left Factoring
When a nonterminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing.
[Figure: with the two expansions B β and B δ below A, the symbol b cannot be used to predict the expansion of B.]

Left Factoring
Replace productions
A → α β1 | α β2 | ... | α βn | γ
with
A → α A_R | γ
A_R → β1 | β2 | ... | βn

Top-Down Parsing
Recursive-descent parsing: needs backtracking.
LL methods (Left-to-right scan, Leftmost derivation), also called predictive parsing: a specialization of recursive descent where backtracking is not needed (the correct production is chosen using look-ahead).

Top-Down Parsing
Both methods build leftmost derivations.
Grammar:
E → T + T
T → ( E ) | - E | id
Leftmost derivation: E ⇒lm T + T ⇒lm id + T ⇒lm id + id
[Figure: the parse tree for id + id grown top-down, one leftmost expansion at a time.]

Recursive Descent Parsing
Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal's syntactic category of input tokens.
When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information.

Recursive Descent Parsing
void A() {
  choose an A-production A → X1 X2 ... Xk;
  for ( i = 1 to k ) {
    if ( Xi is a nonterminal )
      call procedure Xi();
    else if ( Xi equals the current input symbol a )
      advance the input to the next symbol;
    else
      error();
  }
}

Recursive Descent Parsing
To allow for backtracking, the previous code must be modified as follows:
Try each A-production in some order.
In case of failure, retract the input pointer to the latest point of choice and check the next production.
If all productions have been checked, we have an input error.

Example
Assume the grammar
E → E' | E' + id
E' → - E' | id | ( E )
The following C code implements a recursive descent parser with backtracking.

Example (cont'd)
/* assumes a TOKEN type, a cursor TOKEN *next into the token buffer,
   and token codes plus, minus, id, open, close defined elsewhere */
bool term(TOKEN tok) { return *next++ == tok; }   /* consume one token and compare */

bool Ep(void);                                    /* E' is written Ep in C */

/* E -> E' | E' + id */
bool E1(void) { return Ep(); }
bool E2(void) { return Ep() && term(plus) && term(id); }
bool E(void) {
  TOKEN *save = next;                             /* remember the input position */
  return (next = save, E1())                      /* try E -> E'                 */
      || (next = save, E2());                     /* backtrack, try E -> E' + id */
}

/* E' -> - E' | id | ( E ) */
bool Ep1(void) { return term(minus) && Ep(); }
bool Ep2(void) { return term(id); }
bool Ep3(void) { return term(open) && E() && term(close); }
bool Ep(void) {
  TOKEN *save = next;
  return (next = save, Ep1())
      || (next = save, Ep2())
      || (next = save, Ep3());
}
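The sketch above leaves TOKEN, next, and the token codes undefined. A minimal, hypothetical harness that supplies them is shown below; with the parser functions pasted in place of the marked comment, the whole program compiles and accepts the input - id via E → E' and E' → - E'.

#include <stdio.h>
#include <stdbool.h>

typedef int TOKEN;
enum { end_of_input, plus, minus, id, open, close };   /* token codes (made up) */

static TOKEN buffer[] = { minus, id, end_of_input };   /* the input: - id       */
static TOKEN *next = buffer;                           /* parser's input cursor */

/* ... the parser functions from the sketch above go here ... */

int main(void) {
  bool ok = E();                                       /* parse a prefix of the input */
  printf(ok && *next == end_of_input ? "accepted\n" : "rejected\n");
  return 0;
}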

Exercise
Choose the derivation that is a valid recursive descent parse for the string id + id in the grammar
E → E' | E' + E'
E' → - E' | id | ( E )
Moves that are followed by backtracking are marked in red.

Exercise (cont'd) 1. E, E, E + E, id + E, id + E, id + id 2. E, E, - E, id, ( E ), E + E, - E + E, id + E, id + E, id + - E, id + id 3. E, E + E, id + E, id + E, id + id 4. E, E, id, E + E, id + E, id + E, id + id

Backtracking & Efficiency
When a failure occurs at a higher level, all lower-level subparses that were successful are deleted.
Later on, those subparses are recomputed from scratch.
Highly inefficient.

Predictive Parsing
The grammar must be non-ambiguous.
Eliminate left recursion from the grammar.
Left factor the grammar.
Compute the functions FIRST() and FOLLOW().
Predictive parsing is a table-driven variant of recursive-descent parsing.

FIRST() and FOLLOW(): Intuitive Idea
Used to predict a production A → α for nonterminal A.
[Figure: in a derivation from S, a look-ahead a with a ∈ FIRST(α) predicts A → α; a look-ahead b with b ∈ FOLLOW(A) predicts an α that derives ε.]

FIRST()
FIRST(α) = the set of terminals that begin the strings derived from α
FIRST(a) = { a } if a ∈ T
FIRST(ε) = { ε }
FIRST(A) = the union of FIRST(α) over all productions A → α in P
FIRST(X1 X2 ... Xk):
  if ε ∈ FIRST(Xj) for all j = 1, ..., i-1, then add FIRST(Xi) \ { ε } to FIRST(X1 X2 ... Xk)
  if ε ∈ FIRST(Xj) for all j = 1, ..., k, then add ε to FIRST(X1 X2 ... Xk)

Example
Grammar:
E → T X
T → ( E ) | int Y
X → + E | ε
Y → * T | ε
FIRST(+) = { + }   FIRST(*) = { * }   FIRST( ( ) = { ( }   FIRST( ) ) = { ) }   FIRST(int) = { int }   FIRST(ε) = { ε }

Example
FIRST(Y) = FIRST( * T ) ∪ FIRST(ε) = FIRST(*) ∪ FIRST(ε) = { * } ∪ { ε } = { *, ε }
FIRST(X) = FIRST( + E ) ∪ FIRST(ε) = FIRST(+) ∪ FIRST(ε) = { + } ∪ { ε } = { +, ε }

Example
FIRST(T) = FIRST( ( E ) ) ∪ FIRST( int Y ) = FIRST( ( ) ∪ FIRST(int) = { ( } ∪ { int } = { (, int }
FIRST(E) = FIRST( T X ) = FIRST(T) = { (, int }
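The FIRST equations can be solved by iterating until nothing changes. The following C sketch (the encodings, array names, and output format are my own assumptions, not part of the slides) computes FIRST and nullability for the example grammar by exactly this fixpoint iteration.

#include <stdio.h>

enum { PLUS, STAR, OPEN, CLOSE, INT, NUM_T };      /* terminal indices     */
enum { E, T, X, Y, NUM_N };                        /* nonterminal indices  */
#define NT(n) (NUM_T + (n))                        /* nonterminal in a rhs */

struct prod { int lhs; int rhs[3]; int len; };
static const struct prod G[] = {
  { E, { NT(T), NT(X) }, 2 },                      /* E -> T X     */
  { T, { OPEN, NT(E), CLOSE }, 3 },                /* T -> ( E )   */
  { T, { INT, NT(Y) }, 2 },                        /* T -> int Y   */
  { X, { PLUS, NT(E) }, 2 },                       /* X -> + E     */
  { X, { 0 }, 0 },                                 /* X -> eps     */
  { Y, { STAR, NT(T) }, 2 },                       /* Y -> * T     */
  { Y, { 0 }, 0 },                                 /* Y -> eps     */
};

static unsigned first[NUM_N];                      /* bit t set: terminal t in FIRST    */
static int nullable[NUM_N];                        /* 1 if the nonterminal derives eps  */

int main(void) {
  int changed = 1;
  while (changed) {                                /* iterate to a fixpoint */
    changed = 0;
    for (unsigned p = 0; p < sizeof G / sizeof G[0]; p++) {
      unsigned add = 0;
      int all_nullable = 1;                        /* whole rhs derives eps so far */
      for (int i = 0; i < G[p].len && all_nullable; i++) {
        int s = G[p].rhs[i];
        if (s < NUM_T) {                           /* terminal begins a string */
          add |= 1u << s;
          all_nullable = 0;
        } else {                                   /* nonterminal contributes its FIRST */
          add |= first[s - NUM_T];
          all_nullable = nullable[s - NUM_T];
        }
      }
      int lhs = G[p].lhs;
      if ((first[lhs] | add) != first[lhs]) { first[lhs] |= add; changed = 1; }
      if (all_nullable && !nullable[lhs])   { nullable[lhs] = 1; changed = 1; }
    }
  }
  const char *tname[] = { "+", "*", "(", ")", "int" };
  const char *nname[] = { "E", "T", "X", "Y" };
  for (int n = 0; n < NUM_N; n++) {
    printf("FIRST(%s) = {", nname[n]);
    for (int t = 0; t < NUM_T; t++)
      if (first[n] & (1u << t)) printf(" %s", tname[t]);
    printf(nullable[n] ? " eps }\n" : " }\n");
  }
  return 0;
}

Running it reproduces the sets derived above: FIRST(E) = FIRST(T) = { (, int }, FIRST(X) = { +, ε }, FIRST(Y) = { *, ε }.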

FOLLOW()
FOLLOW(A) = the set of terminals that can immediately follow nonterminal A
FOLLOW(A) is computed as follows:
for all (B → α A β) ∈ P: add FIRST(β) \ { ε } to FOLLOW(A)
for all (B → α A β) ∈ P with ε ∈ FIRST(β): add FOLLOW(B) to FOLLOW(A)
for all (B → α A) ∈ P: add FOLLOW(B) to FOLLOW(A)
if A is the start symbol: add $ to FOLLOW(A)

Example
Grammar:
E → T X
T → ( E ) | int Y
X → + E | ε
Y → * T | ε
{ $, ) } ⊆ FOLLOW(E)
FOLLOW(X) ⊆ FOLLOW(E)
FOLLOW(E) ⊆ FOLLOW(X)
FOLLOW(X) = FOLLOW(E) = { $, ) }

Example
Grammar:
E → T X
T → ( E ) | int Y
X → + E | ε
Y → * T | ε
FIRST(X) \ { ε } = { + } ⊆ FOLLOW(T)
FOLLOW(E) = { $, ) } ⊆ FOLLOW(T)
FOLLOW(Y) ⊆ FOLLOW(T)
FOLLOW(T) ⊆ FOLLOW(Y)
FOLLOW(Y) = FOLLOW(T) = { +, $, ) }
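FOLLOW sets can be computed by the same kind of fixpoint once FIRST is known. Below is a hypothetical C sketch for the same example grammar, with the FIRST sets and nullability from the previous slides hard-coded as bit masks; it reproduces FOLLOW(E) = FOLLOW(X) = { $, ) } and FOLLOW(T) = FOLLOW(Y) = { +, $, ) }.

#include <stdio.h>

enum { PLUS, STAR, OPEN, CLOSE, INT, DOLLAR, NUM_T };  /* terminal indices     */
enum { E, T, X, Y, NUM_N };                            /* nonterminal indices  */
#define NT(n) (NUM_T + (n))                            /* nonterminal in a rhs */

struct prod { int lhs; int rhs[3]; int len; };
static const struct prod G[] = {
  { E, { NT(T), NT(X) }, 2 },                          /* E -> T X     */
  { T, { OPEN, NT(E), CLOSE }, 3 },                    /* T -> ( E )   */
  { T, { INT, NT(Y) }, 2 },                            /* T -> int Y   */
  { X, { PLUS, NT(E) }, 2 },                           /* X -> + E     */
  { X, { 0 }, 0 },                                     /* X -> eps     */
  { Y, { STAR, NT(T) }, 2 },                           /* Y -> * T     */
  { Y, { 0 }, 0 },                                     /* Y -> eps     */
};

/* FIRST sets and nullability as computed on the previous slides             */
static const unsigned first[NUM_N] = {
  [E] = 1u << OPEN | 1u << INT, [T] = 1u << OPEN | 1u << INT,
  [X] = 1u << PLUS,             [Y] = 1u << STAR,
};
static const int nullable[NUM_N] = { [X] = 1, [Y] = 1 };

static unsigned follow[NUM_N];                         /* bit t set: terminal t in FOLLOW */

int main(void) {
  follow[E] = 1u << DOLLAR;                            /* E is the start symbol */
  int changed = 1;
  while (changed) {                                    /* iterate to a fixpoint */
    changed = 0;
    for (unsigned p = 0; p < sizeof G / sizeof G[0]; p++) {
      for (int i = 0; i < G[p].len; i++) {
        int A = G[p].rhs[i];
        if (A < NUM_T) continue;                       /* only nonterminals    */
        A -= NUM_T;
        unsigned add = 0;
        int suffix_nullable = 1;                       /* does rhs[i+1..] derive eps? */
        for (int j = i + 1; j < G[p].len && suffix_nullable; j++) {
          int s = G[p].rhs[j];
          if (s < NUM_T) { add |= 1u << s; suffix_nullable = 0; }
          else { add |= first[s - NUM_T]; suffix_nullable = nullable[s - NUM_T]; }
        }
        if (suffix_nullable) add |= follow[G[p].lhs];  /* add FOLLOW(B)        */
        if ((follow[A] | add) != follow[A]) { follow[A] |= add; changed = 1; }
      }
    }
  }
  const char *tn[] = { "+", "*", "(", ")", "int", "$" };
  const char *nn[] = { "E", "T", "X", "Y" };
  for (int n = 0; n < NUM_N; n++) {
    printf("FOLLOW(%s) = {", nn[n]);
    for (int t = 0; t < NUM_T; t++)
      if (follow[n] & (1u << t)) printf(" %s", tn[t]);
    printf(" }\n");
  }
  return 0;
}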

LL(1) Grammar
A grammar G is LL(1) if it is not left recursive and, for each collection of productions A → α1 | α2 | ... | αn for nonterminal A, the following holds:
1. FIRST(αi) ∩ FIRST(αj) = ∅ for all i ≠ j
2. if αi ⇒* ε then
   2.a. αj does not derive ε for any j ≠ i
   2.b. FIRST(αj) ∩ FOLLOW(A) = ∅ for all j ≠ i

Non-LL(1) Examples
Grammar                         Not LL(1) because:
S → S a | a                     left recursive
S → a S | a                     FIRST(a S) ∩ FIRST(a) ≠ ∅
S → a R | ε    R → S | ε        for R: both S ⇒* ε and ε ⇒* ε
S → a R a      R → S | ε        for R: FIRST(S) ∩ FOLLOW(R) ≠ ∅

Using FIRST and FOLLOW to Write a Recursive Descent Parser
expr → term rest
rest → + term rest | - term rest | ε
term → id

procedure rest();
begin
  if lookahead in FIRST(+ term rest) then
    match( + ); term(); rest()
  else if lookahead in FIRST(- term rest) then
    match( - ); term(); rest()
  else if lookahead in FOLLOW(rest) then
    return
  else error()
end;

FIRST(+ term rest) = { + }
FIRST(- term rest) = { - }
FOLLOW(rest) = { $ }
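As a concrete illustration, here is a self-contained C sketch of the same parser (the token encoding, the hard-coded input, and the main driver are assumptions added for this example). Each branch is selected purely by comparing the look-ahead against the FIRST and FOLLOW sets above, so no backtracking is ever needed.

#include <stdio.h>
#include <stdlib.h>

enum { ID, PLUS, MINUS, END };                 /* token kinds; END plays '$'      */
static int input[] = { ID, PLUS, ID, MINUS, ID, END };   /* id + id - id         */
static int pos = 0;
static int lookahead(void) { return input[pos]; }

static void error(void) { printf("syntax error\n"); exit(1); }
static void match(int tok) { if (lookahead() == tok) pos++; else error(); }

static void term(void) { match(ID); }          /* term -> id                      */

static void rest(void) {
  if (lookahead() == PLUS) {                   /* lookahead in FIRST(+ term rest) */
    match(PLUS); term(); rest();
  } else if (lookahead() == MINUS) {           /* lookahead in FIRST(- term rest) */
    match(MINUS); term(); rest();
  } else if (lookahead() == END) {             /* lookahead in FOLLOW(rest) = {$} */
    return;                                    /* predict rest -> eps             */
  } else {
    error();
  }
}

static void expr(void) { term(); rest(); }     /* expr -> term rest               */

int main(void) {
  expr();
  if (lookahead() == END) printf("accepted\n");
  else error();
  return 0;
}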

Non-Recursive Predictive Parsing: Table-Driven Parsing
Given an LL(1) grammar G = (N, T, P, S), construct a table M[A, a] for A ∈ N, a ∈ T, and use a driver program with a stack.
[Figure: input buffer a + b $, stack X Y Z $, the predictive parsing program (driver), the parsing table M, and the output.]

Constructing an LL(1) Predictive Parsing Table
for each production A → α do
  for each a ∈ FIRST(α) do
    add A → α to M[A, a]
  enddo
  if ε ∈ FIRST(α) then
    for each b ∈ FOLLOW(A) do
      add A → α to M[A, b]
    enddo
  endif
enddo
Mark each undefined entry in M as error

Example Table
E → T E_R
E_R → + T E_R | ε
T → F T_R
T_R → * F T_R | ε
F → ( E ) | id

A → α              FIRST(α)    FOLLOW(A)
E → T E_R          ( id        $ )
E_R → + T E_R      +           $ )
E_R → ε            ε           $ )
T → F T_R          ( id        + $ )
T_R → * F T_R      *           + $ )
T_R → ε            ε           + $ )
F → ( E )          (           * + $ )
F → id             id          * + $ )

      id           +              *              (            )          $
E     E → T E_R                                  E → T E_R
E_R                E_R → + T E_R                               E_R → ε    E_R → ε
T     T → F T_R                                  T → F T_R
T_R                T_R → ε        T_R → * F T_R                T_R → ε    T_R → ε
F     F → id                                     F → ( E )
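The construction loop from the previous slide can be run directly over the FIRST(α) and FOLLOW(A) columns of this table. The C sketch below (symbol encodings, array names, and the printed layout are illustrative assumptions) fills M for this grammar and would flag a conflict if two productions ever landed in the same cell.

#include <stdio.h>

enum { ID, PLUS, STAR, OPEN, CLOSE, DOLLAR, NUM_T };   /* terminals        */
enum { E, ER, T, TR, F, NUM_N };                       /* nonterminals     */

/* one production: its left-hand side, FIRST(alpha) as a terminal bit mask,
   and a flag marking right-hand sides that derive epsilon                 */
struct prod { int lhs; unsigned first; int has_eps; };
static const struct prod P[] = {
  { 0, 0, 0 },                                         /* 0: unused        */
  { E,  1u << OPEN | 1u << ID, 0 },                    /* 1: E   -> T E_R   */
  { ER, 1u << PLUS,            0 },                    /* 2: E_R -> + T E_R */
  { ER, 0,                     1 },                    /* 3: E_R -> eps     */
  { T,  1u << OPEN | 1u << ID, 0 },                    /* 4: T   -> F T_R   */
  { TR, 1u << STAR,            0 },                    /* 5: T_R -> * F T_R */
  { TR, 0,                     1 },                    /* 6: T_R -> eps     */
  { F,  1u << OPEN,            0 },                    /* 7: F   -> ( E )   */
  { F,  1u << ID,              0 },                    /* 8: F   -> id      */
};
static const unsigned follow[NUM_N] = {                /* FOLLOW(A)        */
  [E]  = 1u << CLOSE | 1u << DOLLAR,
  [ER] = 1u << CLOSE | 1u << DOLLAR,
  [T]  = 1u << PLUS | 1u << CLOSE | 1u << DOLLAR,
  [TR] = 1u << PLUS | 1u << CLOSE | 1u << DOLLAR,
  [F]  = 1u << PLUS | 1u << STAR | 1u << CLOSE | 1u << DOLLAR,
};

static int M[NUM_N][NUM_T];                            /* 0 means error    */

int main(void) {
  for (int p = 1; p <= 8; p++) {
    unsigned predict = P[p].first;                     /* a in FIRST(alpha) */
    if (P[p].has_eps) predict |= follow[P[p].lhs];     /* plus FOLLOW(A)    */
    for (int a = 0; a < NUM_T; a++)
      if (predict & (1u << a)) {
        if (M[P[p].lhs][a]) printf("conflict: not LL(1)\n");
        M[P[p].lhs][a] = p;                            /* enter production p */
      }
  }
  const char *tn[] = { "id", "+", "*", "(", ")", "$" };
  const char *nn[] = { "E", "E_R", "T", "T_R", "F" };
  for (int n = 0; n < NUM_N; n++) {                    /* print the table   */
    printf("%-4s", nn[n]);
    for (int a = 0; a < NUM_T; a++)
      if (M[n][a]) printf("  %s:%d", tn[a], M[n][a]);
    printf("\n");
  }
  return 0;
}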

LL(1) Grammars are Unambiguous
Ambiguous grammar:
S → i E t S S_R | a
S_R → e S | ε
E → b

A → α              FIRST(α)    FOLLOW(A)
S → i E t S S_R    i           e $
S → a              a           e $
S_R → e S          e           e $
S_R → ε            ε           e $
E → b              b           t

      a        b        e                     i                 t    $
S     S → a                                   S → i E t S S_R
S_R                     S_R → e S, S_R → ε                           S_R → ε
E              E → b

Error: duplicate table entry in M[S_R, e], so the ambiguous grammar is not LL(1).

Predictive Parsing Algorithm
Input: a string w and an LL parsing table M for grammar G
Output: a leftmost derivation of w if w ∈ L(G), and an error otherwise
Start configuration: the stack contains $S (S on top) and the input buffer contains w$

Predictive Parsing Algorithm
let a be the first symbol of w;
let X be the top stack symbol;
while ( X ≠ $ ) {
  if ( X = a ) { pop the stack and let a be the next symbol of w; }
  else if ( X is a terminal ) error();
  else if ( M[X, a] is an error entry ) error();
  else if ( M[X, a] = X → Y1 Y2 ... Yk ) {
    output the production X → Y1 Y2 ... Yk;
    pop the stack;
    push Yk, Yk-1, ..., Y2, Y1 onto the stack, with Y1 on top;
  }
  let X be the top stack symbol;
}

Example (top of the stack is on the right)
Stack            Input         Production applied
$ E              id+id*id$     E → T E_R
$ E_R T          id+id*id$     T → F T_R
$ E_R T_R F      id+id*id$     F → id
$ E_R T_R id     id+id*id$     (match id)
$ E_R T_R        +id*id$       T_R → ε
$ E_R            +id*id$       E_R → + T E_R
$ E_R T +        +id*id$       (match +)
$ E_R T          id*id$        T → F T_R
$ E_R T_R F      id*id$        F → id
$ E_R T_R id     id*id$        (match id)
$ E_R T_R        *id$          T_R → * F T_R
$ E_R T_R F *    *id$          (match *)
$ E_R T_R F      id$           F → id
$ E_R T_R id     id$           (match id)
$ E_R T_R        $             T_R → ε
$ E_R            $             E_R → ε
$                $             (accept)
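For reference, here is a compact, self-contained C sketch of the driver with the parsing table for this grammar hard-coded (the symbol encodings, array names, and output format are assumptions made for the example). Run on id + id * id it prints the same sequence of productions as the trace above and then reports acceptance.

#include <stdio.h>

enum { ID, PLUS, STAR, OPEN, CLOSE, END, NUM_T };     /* terminals          */
enum { E = NUM_T, ER, T, TR, F };                     /* nonterminals       */
#define IS_TERM(s) ((s) < NUM_T)

/* right-hand sides, terminated by -1; production 0 is unused               */
static const int rhs[9][4] = {
  { -1 },                     /* 0: (error)        */
  { T, ER, -1 },              /* 1: E   -> T E_R    */
  { PLUS, T, ER, -1 },        /* 2: E_R -> + T E_R  */
  { -1 },                     /* 3: E_R -> eps      */
  { F, TR, -1 },              /* 4: T   -> F T_R    */
  { STAR, F, TR, -1 },        /* 5: T_R -> * F T_R  */
  { -1 },                     /* 6: T_R -> eps      */
  { OPEN, E, CLOSE, -1 },     /* 7: F   -> ( E )    */
  { ID, -1 },                 /* 8: F   -> id       */
};
static const char *pname[9] = { "", "E -> T E_R", "E_R -> + T E_R",
  "E_R -> eps", "T -> F T_R", "T_R -> * F T_R", "T_R -> eps",
  "F -> ( E )", "F -> id" };

/* parsing table M[A - NUM_T][a]: production number, 0 = error              */
static const int M[5][NUM_T] = {
  /*          id + * ( ) $ */
  /* E   */ {  1, 0, 0, 1, 0, 0 },
  /* E_R */ {  0, 2, 0, 0, 3, 3 },
  /* T   */ {  4, 0, 0, 4, 0, 0 },
  /* T_R */ {  0, 6, 5, 0, 6, 6 },
  /* F   */ {  8, 0, 0, 7, 0, 0 },
};

int main(void) {
  int input[] = { ID, PLUS, ID, STAR, ID, END };      /* id + id * id $      */
  int stack[64], top = 0, ip = 0;
  stack[top++] = END;                                 /* $ marks the bottom  */
  stack[top++] = E;                                   /* start symbol on top */
  while (stack[top - 1] != END) {
    int X = stack[top - 1], a = input[ip];
    if (X == a) { top--; ip++; }                      /* match a terminal    */
    else if (IS_TERM(X) || M[X - NUM_T][a] == 0) { printf("error\n"); return 1; }
    else {                                            /* expand via the table */
      int p = M[X - NUM_T][a];
      printf("%s\n", pname[p]);
      top--;
      int len = 0;
      while (rhs[p][len] != -1) len++;
      for (int i = len - 1; i >= 0; i--) stack[top++] = rhs[p][i]; /* push reversed */
    }
  }
  printf(input[ip] == END ? "accepted\n" : "error\n");
  return 0;
}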

Panic Mode Recovery
Add synchronizing actions to undefined table entries, based on FOLLOW sets.
Pro: can be automated
Con: error messages are needed
FOLLOW(E) = { ) $ }   FOLLOW(E_R) = { ) $ }   FOLLOW(T) = { + ) $ }   FOLLOW(T_R) = { + ) $ }   FOLLOW(F) = { + * ) $ }

      id           +              *              (            )          $
E     E → T E_R                                  E → T E_R    synch      synch
E_R                E_R → + T E_R                               E_R → ε    E_R → ε
T     T → F T_R    synch                         T → F T_R    synch      synch
T_R                T_R → ε        T_R → * F T_R                T_R → ε    T_R → ε
F     F → id       synch          synch          F → ( E )    synch      synch

synch: the driver pops the current nonterminal A and skips input till a synch token

Phrase-Level Recovery
Change the input stream by inserting missing tokens.
For example: id id is changed into id * id
Pro: can be automated
Con: recovery is not always intuitive

      id           +              *              (            )          $
E     E → T E_R                                  E → T E_R    synch      synch
E_R                E_R → + T E_R                               E_R → ε    E_R → ε
T     T → F T_R    synch                         T → F T_R    synch      synch
T_R   insert *     T_R → ε        T_R → * F T_R                T_R → ε    T_R → ε
F     F → id       synch          synch          F → ( E )    synch      synch

insert *: the driver inserts the missing * and retries the production; parsing can then continue

Error Productions
E → T E_R
E_R → + T E_R | ε
T → F T_R
T_R → * F T_R | ε
F → ( E ) | id
Add the error production T_R → F T_R to ignore a missing *, e.g. in: id id
Pro: powerful recovery method
Con: cannot be automated

      id            +              *              (            )          $
E     E → T E_R                                   E → T E_R    synch      synch
E_R                 E_R → + T E_R                               E_R → ε    E_R → ε
T     T → F T_R     synch                         T → F T_R    synch      synch
T_R   T_R → F T_R   T_R → ε        T_R → * F T_R                T_R → ε    T_R → ε
F     F → id        synch          synch          F → ( E )    synch      synch