Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing, Part 3. Y.N. Srikant

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing, Part 3. Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012. NPTEL Course on Principles of Compiler Design.

Outline of the Lecture
- What is syntax analysis? (covered in lecture 1)
- Specification of programming languages: context-free grammars (covered in lecture 1)
- Parsing context-free languages: push-down automata (covered in lectures 1 and 2)
- Top-down parsing: LL(1) and recursive-descent parsing
- Bottom-up parsing: LR parsing

Testable Conditions for LL(1)

From now on we refer to strong LL(1) simply as LL(1), and we will not consider lookaheads longer than 1. The classical condition for the LL(1) property uses FIRST and FOLLOW sets.

If α is any string of grammar symbols (α ∈ (N ∪ T)*), then
  FIRST(α) = {a | a ∈ T, and α ⇒* ax, x ∈ T*}; FIRST(ɛ) = {ɛ}
If A is any nonterminal, then
  FOLLOW(A) = {a | S ⇒* αAaβ, α, β ∈ (N ∪ T)*, a ∈ T ∪ {$}}

FIRST(α) is determined by α alone, but FOLLOW(A) is determined by the context of A, i.e., by the derivations in which A occurs.

FIRST and FOLLOW Computation: Example

Consider the following grammar: S' → S$, S → aAS | c, A → ba | SB, B → ba | S

FIRST(S') = FIRST(S) = {a, c}, because S' ⇒ S$ ⇒ c$, and S' ⇒ S$ ⇒ aAS$ ⇒ abaS$ ⇒ abac$

FIRST(A) = {a, b, c}, because A → ba, and A → SB, and therefore all symbols in FIRST(S) are in FIRST(A)

FOLLOW(S) = {a, b, c, $}, because S' ⇒ S$; S' ⇒* aSBS$ ⇒ aSbaS$; S' ⇒* aSBS$ ⇒ aSSS$ ⇒ aSaASS$; and S' ⇒* aSSS$ ⇒ aScS$

FOLLOW(A) = {a, c}, because S' ⇒* aAS$ ⇒ aAaAS$, and S' ⇒* aAS$ ⇒ aAc$

Computation of FIRST: Terminals and Nonterminals

{
  for each (a ∈ T) FIRST(a) = {a}; FIRST(ɛ) = {ɛ};
  for each (A ∈ N) FIRST(A) = φ;
  while (FIRST sets are still changing) {
    for each production p {
      Let p be the production A → X1 X2 ... Xn;
      FIRST(A) = FIRST(A) ∪ (FIRST(X1) − {ɛ});
      i = 1;
      while ((ɛ ∈ FIRST(Xi)) && (i ≤ n − 1)) {
        FIRST(A) = FIRST(A) ∪ (FIRST(Xi+1) − {ɛ});
        i++;
      }
      if ((i == n) && (ɛ ∈ FIRST(Xn)))
        FIRST(A) = FIRST(A) ∪ {ɛ};
    }
  }
}
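The fixpoint above can be sketched in Python. This is an illustrative implementation, not the lecture's own code: the grammar encoding (a dict mapping each nonterminal to a list of right-hand sides given as tuples of symbols, with the empty tuple for an ɛ-production) and the names compute_first and EPS are assumptions.

```python
EPS = ""  # stands in for epsilon

def compute_first(grammar, terminals):
    """Fixpoint computation of FIRST for every terminal and nonterminal."""
    first = {t: {t} for t in terminals}
    for nt in grammar:
        first[nt] = set()
    changed = True
    while changed:                       # "while FIRST sets are still changing"
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                added = set()
                for sym in rhs:
                    added |= first[sym] - {EPS}
                    if EPS not in first[sym]:
                        break            # sym is not nullable: stop scanning
                else:                    # all of X1..Xn nullable (or rhs empty)
                    added.add(EPS)
                if not added <= first[a]:
                    first[a] |= added
                    changed = True
    return first

# Grammar from the trace slides: S -> aAS | eps, A -> ba | SB, B -> ca | S
g = {"S": [("a", "A", "S"), ()],
     "A": [("b", "a"), ("S", "B")],
     "B": [("c", "a"), ("S",)]}
first = compute_first(g, {"a", "b", "c"})
print(first)
```

Running it on the trace grammar reproduces the sets derived by hand in the trace that follows: FIRST(S) = {a, ɛ}, FIRST(A) = {a, b, c, ɛ}, FIRST(B) = {a, c, ɛ}.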

Computation of FIRST(β), for β a String of Grammar Symbols

{
  /* It is assumed that the FIRST sets of terminals and nonterminals are already available */
  Let β be the string X1 X2 ... Xn;
  FIRST(β) = FIRST(X1) − {ɛ};
  i = 1;
  while ((ɛ ∈ FIRST(Xi)) && (i ≤ n − 1)) {
    FIRST(β) = FIRST(β) ∪ (FIRST(Xi+1) − {ɛ});
    i++;
  }
  if ((i == n) && (ɛ ∈ FIRST(Xn)))
    FIRST(β) = FIRST(β) ∪ {ɛ};
}
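A Python sketch of FIRST(β), again with an illustrative encoding: per-symbol FIRST sets are assumed to be already available (here hardcoded to the values computed in the trace slides that follow), strings of grammar symbols are tuples, and "" stands for ɛ.

```python
# Per-symbol FIRST sets, taken from the trace for S -> aAS | eps,
# A -> ba | SB, B -> ca | S ("" denotes epsilon).
first = {"a": {"a"}, "b": {"b"}, "c": {"c"},
         "S": {"a", ""}, "A": {"a", "b", "c", ""}, "B": {"a", "c", ""}}

def first_of_string(beta):
    """FIRST of a string X1 X2 ... Xn of grammar symbols."""
    result = set()
    for sym in beta:
        result |= first[sym] - {""}
        if "" not in first[sym]:    # sym is not nullable: scanning stops here
            return result
    result.add("")                  # every symbol of beta was nullable
    return result

print(first_of_string(("S", "B")))  # both symbols nullable
print(first_of_string(("b", "a")))
```

Note that once the per-symbol sets are fixed, a single left-to-right pass over β suffices; no outer fixpoint loop is needed for this step.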

FIRST Computation: Algorithm Trace - 1

Consider the following grammar: S' → S$, S → aAS | ɛ, A → ba | SB, B → ca | S

Initially, FIRST(S) = FIRST(A) = FIRST(B) = φ

Iteration 1:
FIRST(S) = {a, ɛ}, from the productions S → aAS | ɛ
FIRST(A) = {b} ∪ (FIRST(S) − {ɛ}) ∪ (FIRST(B) − {ɛ}) = {b, a}, from the productions A → ba | SB (since ɛ ∈ FIRST(S), FIRST(B) is also included; since FIRST(B) = φ, ɛ is not included)
FIRST(B) = {c} ∪ (FIRST(S) − {ɛ}) ∪ {ɛ} = {c, a, ɛ}, from the productions B → ca | S (ɛ is included because ɛ ∈ FIRST(S))

FIRST Computation: Algorithm Trace - 2

The grammar is S' → S$, S → aAS | ɛ, A → ba | SB, B → ca | S
From the first iteration, FIRST(S) = {a, ɛ}, FIRST(A) = {b, a}, FIRST(B) = {c, a, ɛ}

Iteration 2 (the values stabilize and do not change in iteration 3):
FIRST(S) = {a, ɛ} (no change from iteration 1)
FIRST(A) = {b} ∪ (FIRST(S) − {ɛ}) ∪ (FIRST(B) − {ɛ}) ∪ {ɛ} = {b, a, c, ɛ} (changed!)
FIRST(B) = {c, a, ɛ} (no change from iteration 1)

Computation of FOLLOW

{
  for each (X ∈ N ∪ T) FOLLOW(X) = φ;
  FOLLOW(S) = {$}; /* S is the start symbol of the grammar */
  repeat {
    for each production A → X1 X2 ... Xn { /* Xi ≠ ɛ */
      FOLLOW(Xn) = FOLLOW(Xn) ∪ FOLLOW(A);
      REST = FOLLOW(A);
      for i = n downto 2 {
        if (ɛ ∈ FIRST(Xi)) {
          FOLLOW(Xi−1) = FOLLOW(Xi−1) ∪ (FIRST(Xi) − {ɛ}) ∪ REST;
          REST = FOLLOW(Xi−1);
        } else {
          FOLLOW(Xi−1) = FOLLOW(Xi−1) ∪ FIRST(Xi);
          REST = FOLLOW(Xi−1);
        }
      }
    }
  } until no FOLLOW set has changed
}
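As a cross-check, here is a Python sketch of FOLLOW computation in the standard textbook formulation (each occurrence of a nonterminal X in A → ... X β contributes FIRST(β) − {ɛ}, plus FOLLOW(A) when β is nullable), rather than a literal transcription of the REST-based loop above. The grammar encoding and the names compute_follow and EPS are illustrative; on the trace grammar it reproduces the sets derived in the slides that follow.

```python
EPS = ""  # stands in for epsilon

def first_of_string(first, beta):
    """FIRST of a string of grammar symbols, given per-symbol FIRST sets."""
    out = set()
    for sym in beta:
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")               # $ follows the start symbol
    changed = True
    while changed:                       # iterate until no FOLLOW set changes
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                for i, x in enumerate(rhs):
                    if x not in grammar:     # terminal: has no FOLLOW set
                        continue
                    tail = first_of_string(first, rhs[i + 1:])
                    new = tail - {EPS}
                    if EPS in tail:          # rest of the rhs is nullable
                        new |= follow[a]
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow

# Same grammar as the trace: S -> aAS | eps, A -> ba | SB, B -> ca | S
g = {"S": [("a", "A", "S"), ()],
     "A": [("b", "a"), ("S", "B")],
     "B": [("c", "a"), ("S",)]}
first = {"a": {"a"}, "b": {"b"}, "c": {"c"},
         "S": {"a", EPS}, "A": {"a", "b", "c", EPS}, "B": {"a", "c", EPS}}
follow = compute_follow(g, first, "S")
print(follow)
```

The result matches the trace: FOLLOW(S) = FOLLOW(A) = FOLLOW(B) = {a, c, $}.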

FOLLOW Computation: Algorithm Trace

Consider the following grammar: S' → S$, S → aAS | ɛ, A → ba | SB, B → ca | S

Initially, follow(S) = {$}; follow(A) = follow(B) = φ
first(S) = {a, ɛ}; first(A) = {a, b, c, ɛ}; first(B) = {a, c, ɛ}

Iteration 1: /* In the following, x ∪= y means x = x ∪ y */
S → aAS: follow(S) = {$}; rest = follow(S) = {$}; follow(A) ∪= (first(S) − {ɛ}) ∪ rest = {a, $}
A → SB: follow(B) ∪= follow(A) = {a, $}; rest = follow(A) = {a, $}; follow(S) ∪= (first(B) − {ɛ}) ∪ rest = {a, c, $}
B → ca: follow(A) ∪= follow(B) = {a, $}
B → S: follow(S) ∪= follow(B) = {a, c, $}

At the end of iteration 1: follow(S) = {a, c, $}; follow(A) = follow(B) = {a, $}

FOLLOW Computation: Algorithm Trace (contd.)

first(S) = {a, ɛ}; first(A) = {a, b, c, ɛ}; first(B) = {a, c, ɛ}
At the end of iteration 1: follow(S) = {a, c, $}; follow(A) = follow(B) = {a, $}

Iteration 2:
S → aAS: follow(S) = {a, c, $}; rest = follow(S) = {a, c, $}; follow(A) ∪= (first(S) − {ɛ}) ∪ rest = {a, c, $} (changed!)
A → SB: follow(B) ∪= follow(A) = {a, c, $} (changed!); rest = follow(A) = {a, c, $}; follow(S) ∪= (first(B) − {ɛ}) ∪ rest = {a, c, $} (no change)

At the end of iteration 2: follow(S) = follow(A) = follow(B) = {a, c, $}. The follow sets do not change any further.

LL(1) Conditions

Let G be a context-free grammar. G is LL(1) iff for every pair of productions A → α and A → β, the following condition holds:

dirsymb(α) ∩ dirsymb(β) = φ, where
dirsymb(γ) = if (ɛ ∈ first(γ)) then ((first(γ) − {ɛ}) ∪ follow(A)) else first(γ)
(γ stands for α or β; dirsymb stands for "direction symbol set")

An equivalent formulation (as in the book by Aho, Lam, Sethi, and Ullman) is:
first(α · follow(A)) ∩ first(β · follow(A)) = φ

Construction of the LL(1) parsing table:
for each production A → α
  for each symbol s ∈ dirsymb(α) /* s may be either a terminal symbol or $ */
    add A → α to LLPT[A, s]
Make each undefined entry of LLPT an error entry.
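The dirsymb test can be sketched directly. As an illustration, the values below are hardcoded assumptions taken from the statement-list example slide further below: its P3 pair of productions (written here as S' → ;SL and S' → ɛ) with follow(S') = {end}, and "" standing for ɛ.

```python
def dirsymb(first_gamma, follow_a):
    """Direction symbol set of an alternative with FIRST set first_gamma,
    for a production whose left-hand side has FOLLOW set follow_a."""
    if "" in first_gamma:
        return (first_gamma - {""}) | follow_a
    return first_gamma

follow_S1 = {"end"}                   # assumed follow(S') for the example
d1 = dirsymb({";"}, follow_S1)        # alternative ';' SL
d2 = dirsymb({""}, follow_S1)         # alternative epsilon
print(d1, d2, d1 & d2 == set())       # disjoint => this pair satisfies LL(1)
```

Here d1 = {;} and d2 = {end} are disjoint, so this pair of alternatives causes no LL(1) conflict.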

LL(1) Table Construction using FIRST and FOLLOW

for each production A → α {
  for each terminal symbol a ∈ first(α)
    add A → α to LLPT[A, a]
  if (ɛ ∈ first(α)) {
    for each terminal symbol b ∈ follow(A)
      add A → α to LLPT[A, b]
    if ($ ∈ follow(A))
      add A → α to LLPT[A, $]
  }
}
Make each undefined entry of LLPT an error entry.

After the construction of the LL(1) table is complete (by either of the two methods), if any slot of the LL(1) table has two or more productions, then the grammar is NOT LL(1).
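A Python sketch of this table construction with conflict detection, using the FIRST and FOLLOW sets computed in the traces above for S → aAS | ɛ, A → ba | SB, B → ca | S (encoding as before: tuples for right-hand sides, "" for ɛ, "$" for end of input). Since ɛ ∈ first(S) and a ∈ first(aAS) ∩ follow(S), the two S-productions collide in slot [S, a], so the construction reports that this grammar is not LL(1).

```python
# FIRST/FOLLOW sets from the trace slides above ("" denotes epsilon).
first = {"a": {"a"}, "b": {"b"}, "c": {"c"},
         "S": {"a", ""}, "A": {"a", "b", "c", ""}, "B": {"a", "c", ""}}
follow = {"S": {"a", "c", "$"}, "A": {"a", "c", "$"}, "B": {"a", "c", "$"}}
grammar = {"S": [("a", "A", "S"), ()],
           "A": [("b", "a"), ("S", "B")],
           "B": [("c", "a"), ("S",)]}

def first_of_string(beta):
    out = set()
    for sym in beta:
        out |= first[sym] - {""}
        if "" not in first[sym]:
            return out
    out.add("")
    return out

table, conflicts = {}, []
for a, rhss in grammar.items():
    for rhs in rhss:
        f = first_of_string(rhs)
        # terminals (and $) under which A -> rhs is predicted
        symbols = (f - {""}) | (follow[a] if "" in f else set())
        for s in symbols:
            if (a, s) in table:          # two productions in one slot
                conflicts.append((a, s))
            table[(a, s)] = rhs
print(sorted(conflicts))                 # non-empty => grammar is not LL(1)
```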

Simple Example of an LL(1) Grammar

P1: S → if (a) S else S | while (a) S | begin SL end
P2: SL → S S'
P3: S' → ; SL | ɛ

{if, while, begin, end, a, (, ), ;} are all terminal symbols.

Clearly, all alternatives of P1 start with distinct symbols and hence create no problem. P2 has no choices. Regarding P3, dirsymb(; SL) = {;} and dirsymb(ɛ) = {end}, and the two have no common symbols. Hence the grammar is LL(1).

LL(1) Table Construction Example 1
LL(1) Table Problem Example 1
LL(1) Table Construction Example 2
LL(1) Table Problem Example 2
LL(1) Table Construction Example 3
LL(1) Table Construction Example 4

(These slides present worked parsing tables that appear only as figures and are not reproduced in the transcription.)

Elimination of Useless Symbols

We now study three grammar transformations: elimination of useless symbols, elimination of left recursion, and left factoring.

Given a grammar G = (N, T, P, S), a non-terminal X is useful if S ⇒* αXβ ⇒* w, where w ∈ T*. Otherwise, X is useless. Two conditions have to be met to ensure that X is useful:
1. X ⇒* w, w ∈ T* (X derives some terminal string)
2. S ⇒* αXβ (X occurs in some string derivable from S)

Example: S → AB | CA, B → BC | AB, A → a, C → aB | b, D → d
1. After the first step (keeping only productions over symbols that derive terminal strings): A → a, C → b, D → d, S → CA
2. After the second step (keeping only symbols reachable from S): S → CA, A → a, C → b

Testing for X ⇒* w

G' = (N', T', P', S') is the new grammar.

N_OLD = φ; N_NEW = {X | X → w ∈ P, w ∈ T*};
while (N_OLD ≠ N_NEW) {
  N_OLD = N_NEW;
  N_NEW = N_OLD ∪ {X | X → α ∈ P, α ∈ (T ∪ N_OLD)*};
}
N' = N_NEW; T' = T; S' = S;
P' = {p | all symbols of p are in N' ∪ T'}
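The fixpoint above, sketched in Python on the example grammar from the previous slide (S → AB | CA, B → BC | AB, A → a, C → aB | b, D → d). The encoding (dict of tuples) is illustrative.

```python
grammar = {"S": [("A", "B"), ("C", "A")],
           "B": [("B", "C"), ("A", "B")],
           "A": [("a",)],
           "C": [("a", "B"), ("b",)],
           "D": [("d",)]}
terminals = {"a", "b", "d"}

generating = set()        # nonterminals X with X =>* w for some w in T*
changed = True
while changed:
    changed = False
    for x, rhss in grammar.items():
        if x in generating:
            continue
        # X is generating if some rhs uses only terminals and
        # already-known generating nonterminals
        if any(all(s in terminals or s in generating for s in rhs)
               for rhs in rhss):
            generating.add(x)
            changed = True
print(sorted(generating))  # B never derives a terminal string
```

As the slide's example shows, B is eliminated: both of its productions mention B itself, so B never derives a terminal string.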

Testing for S ⇒* αXβ

G' = (N', T', P', S') is the new grammar.

N' = {S}; T' = φ;
repeat {
  for each production A → α1 | α2 | ... | αn with A ∈ N' do
    add all nonterminals of α1, α2, ..., αn to N' and all terminals of α1, α2, ..., αn to T';
} until there is no change in N' and T';
P' = {p | all symbols of p are in N' ∪ T'}; S' = S
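A Python sketch of the reachability test, run on the grammar left after the generating-symbols step of the slide's example (S → CA, A → a, C → b, plus the now-unreachable D → d). The encoding is illustrative, as before.

```python
grammar = {"S": [("C", "A")], "A": [("a",)], "C": [("b",)], "D": [("d",)]}
terminals = {"a", "b", "d"}

reachable_nts, reachable_ts = {"S"}, set()   # N' starts as {S}, T' as empty
changed = True
while changed:
    changed = False
    for a in list(reachable_nts):            # copy: set grows during the scan
        for rhs in grammar.get(a, []):
            for s in rhs:
                if s in terminals:
                    reachable_ts.add(s)
                elif s not in reachable_nts:
                    reachable_nts.add(s)
                    changed = True
print(sorted(reachable_nts), sorted(reachable_ts))  # D is unreachable
```

D (and the terminal d) drop out, leaving exactly the reduced grammar S → CA, A → a, C → b from the example.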