Bottom-up syntactic parsing. LR(k) grammars. LR(0) grammars. Bottom-up parser. Definition Properties

Similar documents
Bottom-up syntactic parsing. LR(k) grammars. LR(0) grammars. Bottom-up parser. Definition Properties

Bottom-up syntactic parsing. LR(k) grammars. LR(0) grammars. Bottom-up parser. Definition Properties

Type 3 languages. Type 2 languages. Regular grammars Finite automata. Regular expressions. Type 2 grammars. Deterministic Nondeterministic.

Compiler Design 1. LR Parsing. Goutam Biswas. Lect 7

CSE302: Compiler Design

Shift-Reduce parser E + (E + (E) E [a-z] In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules.

Creating a Recursive Descent Parse Table

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing

Parsing -3. A View During TD Parsing

n Top-down parsing vs. bottom-up parsing n Top-down parsing n Introduction n A top-down depth-first parser (with backtracking)

Syntax Analysis Part I

Top-Down Parsing and Intro to Bottom-Up Parsing

CS 406: Bottom-Up Parsing

I 1 : {S S } I 2 : {S X ay, Y X } I 3 : {S Y } I 4 : {X b Y, Y X, X by, X c} I 5 : {X c } I 6 : {S Xa Y, Y X, X by, X c} I 7 : {X by } I 8 : {Y X }

Syntax Analysis (Part 2)

Compiler Design Spring 2017

Lecture 11 Sections 4.5, 4.7. Wed, Feb 18, 2009

Syntax Analysis Part I

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

CS415 Compilers Syntax Analysis Top-down Parsing

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 3. Y.N. Srikant

LR2: LR(0) Parsing. LR Parsing. CMPT 379: Compilers Instructor: Anoop Sarkar. anoopsarkar.github.io/compilers-class

Administrivia. Test I during class on 10 March. Bottom-Up Parsing. Lecture An Introductory Example

Syntactic Analysis. Top-Down Parsing

Context free languages

Syntax Analysis Part III

Bottom-Up Syntax Analysis

Bottom up parsing. General idea LR(0) SLR LR(1) LALR To best exploit JavaCUP, should understand the theoretical basis (LR parsing);

Compiler Construction Lectures 13 16

Compiler Principles, PS4

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

CA Compiler Construction

Exercises. Exercise: Grammar Rewriting

THEORY OF COMPILATION

Computer Science 160 Translation of Programming Languages

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

CONTEXT FREE GRAMMAR AND

Compiler Construction

1. For the following sub-problems, consider the following context-free grammar: S AA$ (1) A xa (2) A B (3) B yb (4)

Parsing Algorithms. CS 4447/CS Stephen Watt University of Western Ontario

Compiler Construction Lent Term 2015 Lectures (of 16)

Syntax Analysis Part I. Position of a Parser in the Compiler Model. The Parser. Chapter 4

Compiler Construction Lent Term 2015 Lectures (of 16)

INF5110 Compiler Construction

Bottom-up Analysis. Theorem: Proof: Let a grammar G be reduced and left-recursive, then G is not LL(k) for any k.

Briefly on Bottom-up

Computing if a token can follow

Compilerconstructie. najaar Rudy van Vliet kamer 124 Snellius, tel rvvliet(at)liacs.

Bottom-Up Parsing. Ÿ rm E + F *idÿ rm E +id*idÿ rm T +id*id. Ÿ rm F +id*id Ÿ rm id + id * id

INF5110 Compiler Construction

Announcements. H6 posted 2 days ago (due on Tue) Midterm went well: Very nice curve. Average 65% High score 74/75

Parsing VI LR(1) Parsers

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

Syntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis

LL(1) Grammar and parser

Compila(on* **0368/3133*(Semester*A,*2013/14)*

Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1

CFLs and Regular Languages. CFLs and Regular Languages. CFLs and Regular Languages. Will show that all Regular Languages are CFLs. Union.

CS20a: summary (Oct 24, 2002)

The Parser. CISC 5920: Compiler Construction Chapter 3 Syntactic Analysis (I) Grammars (cont d) Grammars

INF5110 Compiler Construction

Compiler Principles, PS7

CS 314 Principles of Programming Languages

Chapter 4: Bottom-up Analysis 106 / 338

Computer Science 160 Translation of Programming Languages

Translator Design Lecture 16 Constructing SLR Parsing Tables

Bottom-Up Syntax Analysis

1. For the following sub-problems, consider the following context-free grammar: S AB$ (1) A xax (2) A B (3) B yby (5) B A (6)

Lecture 11 Context-Free Languages

LR(1) Parsers Part II. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.

MA/CSSE 474 Theory of Computation

EXAM. Please read all instructions, including these, carefully NAME : Problem Max points Points 1 10 TOTAL 100

CS415 Compilers Syntax Analysis Bottom-up Parsing

Compiler Design. Spring Syntactic Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Bottom-Up Syntax Analysis

Compiling Techniques

CSC 4181Compiler Construction. Context-Free Grammars Using grammars in parsers. Parsing Process. Context-Free Grammar

CS Rewriting System - grammars, fa, and PDA

Bottom-Up Syntax Analysis

Follow sets. LL(1) Parsing Table

LR(1) Parsers Part II. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.

LR(1) Parsers Part III Last Parsing Lecture. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.

Everything You Always Wanted to Know About Parsing

Einführung in die Computerlinguistik

Course Script INF 5110: Compiler construction

Announcement: Midterm Prep

On LR(k)-parsers of polynomial size

Compiler construction

Curs 8. LR(k) parsing. S.Motogna - FL&CD

Why augment the grammar?

CS153: Compilers Lecture 5: LL Parsing

Chomsky Normal Form for Context-Free Gramars

Syntax Analysis, VII The Canonical LR(1) Table Construction. Comp 412

Formal Languages, Automata and Compilers

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών. Lecture 7a Syntax Analysis Elias Athanasopoulos

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Syntax Analysis - Part 1. Syntax Analysis

Transcription:

Course 9-10 1

Bottom-up syntactic parsing Bottom-up parser LR(k) grammars Definition Properties LR(0) grammars Characterization theorem for LR(0) LR(0) automaton 2

a 1... a i... a n # X 1 X 1 Control Parsing table... # p 1 p 2... 3

S S, S asa bsb c 4

LR(0) Syntactic analysis Syntactic analyzer generators FIRST, FOLLOW SLR(1) Grammars SLR(1) parsing table SLR(1) syntactic analysis LR(1) grammars 5

The parsing table is the LR(0) automaton, M. Configuration: (σ, u#, π) where σεt 0 T*, uεt*, πεp*. Initial configuration (t 0, w#, ε), Transitions: Shift: (σt, au#, π) (σtt, u#, π) if g(t, a) = t. Reduce: (σtσ t, u#, π) ( σtt, u#, πr) if A β ε t, r = A β, σ t = β şi t = g (t, A). Accept: (t 0 t 1, #, π) is the acceptance configuration if S S ε t1, π is the parsing of the word. Error: a configuration from which no transitions are possible 6

char ps[]= w# ; //ps is the input string w i = 0; // position in the input string STACK.push(t0); // the stack is initialized with t0 while(true) { // repeat until accept or error t = STACK.top(); a = ps[i] // a is the current symbol at the input if( g(t, a) { //shift STACK.push(g(t, a)); i++; //move forward in the input string } else { if(a X 1 X 2 X m t){ if(a == S ) if(a == # )exit( accept ); else exit( error ); else // reduce for( i = 1; i <= m; i++) STACK.pop(); STACK.push(g(top(STACK), A)); } //endif else exit( error ); }//endelse }//endwhile 7

S S S E$ E E+T T (E) E T T a 8

S S S E$ E E+T T (E) E T T a Stack Input Action Output 0 a+(a+a)$# shift 05 +(a+a)$# reduce T a 03 +(a+a)$# reduce E T 02 +(a+a)$# shift 027 (a+a)$# shift 0274 a+a)$# shift 02745 +a)$# reduce T a 02743 +a)$# reduce E T 02748 +a)$# shift 027487 a)$# shift 0274875 )$# reduce T a 0274879 )$# reduce E E+T 02748 )$# shift 02748'10' $# reduce T (E) 0279 $# reduce E E+T 02 $# shift 026 # reduce S E$ 01 # accept 9

Lemma 1, 2 Let G = (N, T, S, P) be an LR(0), t 0 σ, t 0 τ paths in the LR(0) automaton, labeled with φ and γ respectively and u, v ε T*. It follows that, if in the LR(0) parser the calculation (t 0 σ, uv#, ε) + (t 0 τ, v#, π) holds, then the derivation φ dr π u holds for the grammar G, and reciprocally. Theorem If G is an LR(0) grammar, then for any word w ε T*, the LR(0) parser will reach an acceptance configuration for w, i.e. (t 0 σ, uv#, ε) + (t 0 τ, v#, π) iff φ dr π u. 10

YACC ( Yet Another Compiler Compiler ) designed and built in 1975 by S. C. Johnson at the Bell Laboratories Bison, compatible with YACC, written by Robert Corbett and Richard Stallmann 11

yacc <file_name> The result is a C source file Structure of a yacc file C and yacc declarations %% Rules (context free grammar) %% C code 12

%{C declarations%} Declaring terminals: %token <terminal> Declaring the starting symbol of the grammar : %start <non-terminal> Other declarations: type defines types; left operators associative to the left; right - operators associative to the right; prec operator precedence; nonassoc non-associative. 13

non-terminal: <right hand side> {C code} expr:nr{$$=$1;} expr "+"expr {$$=$1+$3;} expr "-" expr{$$=$1-$3;} expr "*" expr{$$=$1*$3;} expr "/" expr{$$=$1/$3;} "("expr")" {$$=$2;} ; 14

Definition Let G be a grammar for which the LR(0) automaton contains inconsistent states (G is not LR(0)). The grammar G is SLR(1) if for any state t of its LR(0) automaton, the following are true: If A α, B β t then FOLLOW(A) FOLLOW(B) = ; If A α, B β aγ t then a FOLLOW(A). The SLR(1) syntactic analysis is similar to the LR(0) one; the syntactic analysis table has two main components: The first, ACTION, determines whether the parser will shift or reduce, depending on the state at the top of the stack and the next input symbol The second, GOTO, determines the state which will be added to the stack following a reduction. 15

FIRST(α) = {a a ε T, α st * au } if (α st * ε) then {ε} else. FOLLOW(A) = {a a ε T {ε}, S st * uaγ, a ε FIRST (γ) } 16

1.for(X ε Σ) 2.if(X ε T)FIRST(X)={X} else FIRST(X)= ; 3.for(A aβ ε P) 4.FIRST(A)=FIRST(A) {a}; 5.FLAG=true; 6.while(FLAG){ 7.FLAG=false; 8.for(A X 1 X 2 X n ε P){ 9.i=1; 10.if((FIRST(X1) FIRST(A)){ 11.FIRST(A)=FIRST(A) (FIRST(X1); 12.FLAG=true; 13.}//endif 14.while(i<n&&X i st * ε) 15.if((FIRST(X i+1 ) FIRST(A)){ 16.FIRST(A)=FIRST(A) FIRST(X i+1 ); 17.FLAG=true;i++; }//endif }//endwhile }//endfor }//endwhile for(a ε N) if(a st * ε)first(a)=first(a) {ε}; 17

Input: the grammar G=(N,T,S,P). the sets FIRST(X),X ε Σ. α =X 1 X 2 X n, X i ε Σ,1 i n. Output: FIRST(α). 1.FIRST(α)=FIRST(X 1 )-{ε};i=1; 2.while(i<n && X i + ε){ 3.FIRST(α)=FIRST(α) (FIRST(X i+1 )-{ε}); 4.i=i+1; }//endwhile 5.if(i==n && X n + ε) 6.FIRST(α)=FIRST(α) {ε}; 18

Let the grammar G be: S E B, E ε, B a beginsc end, C ε ; SC FIRST(S) = {a, begin, ε} FIRST(E) = {ε} FIRST(B) = {a, begin} FIRST(C) = {;, ε}. FIRST(SEC) = {a, begin, ;, ε}, FIRST(SB)= {a, begin}, FIRST(;SC)= {;}. 19

ε ε FOLLOW(S). If A αbβxγ ε P and β + ε, then FIRST(X) -{ε} FOLLOW (B). S * α 1 A β 1 α 1 αbβxγβ 1 * α 1 αbxγβ 1 it then follows that FIRST(X)-{ε} FOLLOW (B). If A αbβ ε P then FIRST(β)-{ε} FOLLOW (B). If A αbβ ε P and β + ε, then FOLLOW(A) FOLLOW(B). 20

1. for(a Σ)FOLLOW(A)= ; 2.FOLLOW(S)={ε}; 3.for(A X 1 X 2 X n ){ 4.i=1; 5.while(i<n){ 6.while(X i N)++i; 7.if(i<n){ 8.FOLLOW(Xi)= FOLLOW(X i ) (FIRST(X i+1 X i+2 X n )-{ε}); 9.++i; }//endif }//endwhile }//endfor 21

10.FLAG=true; 11.while(FLAG){ 12.FLAG=false; 13.for(A X 1 X 2 X n ){ 14.i=n; 15.while(i>0 && X i N){ 16.if(FOLLOW(A) FOLLOW(X i )){ 17.FOLLOW(Xi)=FOLLOW(X i ) FOLLOW(A); 18.FLAG=true; 19.}//endif 20.if(X i + ε)--i; 21.else continue; 22.}//endwhile 23.}//endfor 24.}//endwhile 22

Let the grammar G be: S E B, E ε, B a begin SC end, C ε ; SC FOLLOW(S)=FOLLOW(E)=FOLLOW(B) ={ε, ;, end} FOLLOW(C) = {end}. 23

Definition Let G be a grammar for which the LR(0) automaton contains inconsistent states (G is not LR(0)). The grammar G is SLR(1) if for any state t of its LR(0) automaton, the following are true: If A α, B β t then FOLLOW(A) FOLLOW(B) = ; If A α, B β aγ t then a FOLLOW(A). The SLR(1) syntactic analysis is similar to the LR(0) one; the syntactic analysis table has two main components: The first, ACTION, determines whether the parser will shift or reduce, depending on the state at the top of the stack and the next input symbol The second, GOTO, determines the state which will be added to the stack following a reduction. 24

Input: The grammar G = (N, T, S, P) augmented with S S; The automaton M = (Q, Σ, g, t 0, Q ); The sets FOLLOW(A), A V Output: The SLR(1) parsing table, made up of two parts: ACTION(t, a), t Q, a T { # }, GOTO(t, A), t Q, A N. 25

for(t Q) for (a T) ACTION(t, a) = error ; for (A V) GOTO(t, A) = error ; for(t Q}{ for(a α aβ t) ACTION(t,a)= D g(t, a) ;//shift to g(t, a) for(b γ t ){// accept or reduce if(b == S ) ACTION(t, a) = accept ; else for(a FOLLOW(B)) ACTION(t,a)= R B γ ; }// endfor for (A N) GOTO(t, A) = g(t, A); }//endfor 26

shift: (σt, au#, π) (σtt, u#, π) if ACTION(t, a)=st ; reduce: (σtσ t, u#, π) ( σtt, u#, πr) ACTION(t, a) = Rp where p= A β, σ t = β and t = GOTO(t, A); accept: (t 0 t, #, π) if ACTION(t,a) = accept ; The analyzer stops and accepts the analyzed word and π is the parse for the word (the sequence of productions, in reverse, for the rightmost derivation of w). error: (σ t, au#, π) error if ACTION(t,a) = error ; The analyzer stops and rejects the analyzed word. 27

Input: The grammar G = (N, T, S, P) which is SLR(1) ; The SLR(1) parsing table ( ACTION, GOTO); The input word w T*. Output: The bottom-up syntactic parse of w, if w L(G); error, otherwise. The analyzer uses the stack St for performing the shift/reduce transitions 28

char ps[] = w# ; //ps is the input word w int i = 0; // the current position of the input letter St.push(t0); // initialize the stack with t0 while(true) { // repeat until success or error t = St.top(); a = ps[i] // a is the current input symbol if(action(t,a) == accept ) exit( accept ); if(action(t,a) == Dt ){ St.push(t ); i++; // move forward in w }//endif else { if(action(t,a) == R A X 1 X 2 X m ){ for( i = 1; i m; i++) St.pop(); St.push(GOTO(St.top, A)); } //endif else exit( error ); }//endelse }//endwhile 29

0.S E, 1.E E+T, 2.E T, 3.T T*F, 4.T F, 5.F (E), 6.F a 30

a + * ( ) E T F 0 5 4 1 2 3 1 6 2 7 3 4 5 4 8 2 3 5 6 5 4 9 3 7 5 4 10 8 6 11 9 7 10 11 31

G is not LR(0), as the states 1, 2, 9 contain shift/reduce conflicts FOLLOW(S)={#}, FOLLOW(E)={#,+,)} The grammar is SLR(1) because: In state 1: + FOLLOW(S); In state 2: * FOLLOW(E); In state 9: * FOLLOW(E). 32

State ACTION GOTO a + * ( ) # E T F 0 S5 S4 1 2 3 1 S6 accept 2 R2 S7 R2 R2 3 R4 R4 R4 R4 4 S5 S4 8 2 3 5 R6 R6 R6 R6 6 S5 S4 9 3 7 S5 S4 10 8 S6 S11 9 R1 S7 R1 R1 10 R2 R3 R2 R2 11 R5 R5 R5 R5 33

Stack Input Action Output 0 a*(a+a)# shift 05 *(a+a)# reduce 6.F a 03 *(a+a)# reduce 4.T F 02 *(a+a)# shift 027 (a+a)# shift 0274 a+a)# shift 02745 +a)# reduce 6.F a 02743 +a)# reduce 4.T F 02742 +a)# reduce 2.E T 02748 +a)# shift 34

Stack Input Action Output 027486 a)# shift 0274865 )# reduce 6.F a 0274863 )# reduce 4.T F 0274869 )# reduce 1.E E+T 02748 )# shift 02748(11) # reduce 5.F (E) 027(10) # reduce 3.T T*F 02 # reduce 2.E T 01 # accept 35

Definition Let G = (V, T, S, P) be a reduced grammar. An LR(1) article for the grammar G is a pair (A α β, a), where A α β is an LR(0) article, and a FOLLOW(A) (# instead of ε). Definition The article (A β1 β2, a) is valid for the viable prefix αβ1 if the following are true S r *αau αβ1β2u and a = 1:u (a = # if u = ε). Theorem A grammar G = (N, T, S, P) is LR(1) iff for any viable prefix φ, there are no two distinct articles, valid for φ, with (A α, a), (B β γ, b) where a FIRST(γb). 36

There are no shift/reduce conflicts. Such a conflict would mean that two articles (A α, a) and (B β aβ, b) are both valid for the same prefix. There are no reduce/reduce conflicts. Such a conflict would mean that two complete articles (A α, a) and (B β, a) are both valid for the same prefix. To check if a grammar is LR(1) we build the LR(1) automaton in a similar manner to the LR(0) automaton: States contain sets of LR(1) articles Transitions are made by reading symbols to the right of the point Closure is based on the fact that, if the article (B β Aβ, b) is valid for the viable prefix φ then all articles of the form (A α, a) where a FIRTST(αa) are valid for the same prefix. 37

flag= true; while( flag) { flag= false; for ( (A α Bβ, a) I) { for B γ P) for( b FIRST(βa)){ if( (B γ, b) I) { I = I {(B γ, b)}; flag = true; }//endif }//endforb }//endforb }//endfora }//endwhile return I; 38

t0 = closure((s S,#));T={t 0 };marked(t 0 )=false; while( t T&&!marked(t)){ // marked(t) = false for( X Σ) { t = Φ; for( (A α Xβ,a) t ) t = t {(B αx β,a) (B B α Xβ,a) t}; if( t Φ){ t = closure( t ); if( t T) { T= T { t }; marked( t ) = false; }//endif g(t, X) = t ; } //endif } //endfor marked( t ) = true; } // endwhile 39

Theorem The automaton M built in algorithm 2 is deterministic and L(M) is the set of viable prefixes for G. Moreover, for any viable prefix γ, g(t 0,γ) is the set of LR(1) articles valid for γ. The LR(1) automaton can be used to check if G is LR(1) Reduce/reduce conflict: If a state t contains articles of the form (A α, a), (B β, a) then G is not LR(1); Shift/reduce conflict : If a state t contains articles of the form (A α, a) and (B β 1 aβ 2, b), then G is not LR(1). G is LR(1) if all the states t T are free of conflicts 40

S L=R R, L *R a, R L 41

42

Grigoraş Gh., Construcţia compilatoarelor. Algoritmi fundamentali, Editura Universităţii Alexandru Ioan Cuza, Iaşi, 2005 43