CONTEXT FREE GRAMMAR AND

Similar documents
Syntax Analysis Part I

Syntax Analysis Part I

Syntax Analysis Part III

Top-Down Parsing and Intro to Bottom-Up Parsing

Syntactic Analysis. Top-Down Parsing

n Top-down parsing vs. bottom-up parsing n Top-down parsing n Introduction n A top-down depth-first parser (with backtracking)

Compiling Techniques

Introduction to Bottom-Up Parsing

Lecture 11 Context-Free Languages

The Parser. CISC 5920: Compiler Construction Chapter 3 Syntactic Analysis (I) Grammars (cont d) Grammars

Introduction to Bottom-Up Parsing

CSC 4181Compiler Construction. Context-Free Grammars Using grammars in parsers. Parsing Process. Context-Free Grammar

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing

Follow sets. LL(1) Parsing Table

Computing if a token can follow

Syntax Analysis Part I. Position of a Parser in the Compiler Model. The Parser. Chapter 4

CS415 Compilers Syntax Analysis Top-down Parsing

EXAM. Please read all instructions, including these, carefully NAME : Problem Max points Points 1 10 TOTAL 100

LL(1) Grammar and parser

Compiler Principles, PS4

Predictive parsing as a specific subclass of recursive descent parsing complexity comparisons with general parsing

Compiler Design. Spring Syntactic Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Syntax Analysis - Part 1. Syntax Analysis

Harvard CS 121 and CSCI E-207 Lecture 9: Regular Languages Wrap-Up, Context-Free Grammars

(NB. Pages are intended for those who need repeated study in formal languages) Length of a string. Formal languages. Substrings: Prefix, suffix.

CS 314 Principles of Programming Languages

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 3. Y.N. Srikant

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

CSE302: Compiler Design

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

Bottom-Up Parsing. Ÿ rm E + F *idÿ rm E +id*idÿ rm T +id*id. Ÿ rm F +id*id Ÿ rm id + id * id

Context free languages

CYK Algorithm for Parsing General Context-Free Grammars

CA Compiler Construction

Parsing -3. A View During TD Parsing

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

CS153: Compilers Lecture 5: LL Parsing

Parsing Algorithms. CS 4447/CS Stephen Watt University of Western Ontario

Bottom up parsing. General idea LR(0) SLR LR(1) LALR To best exploit JavaCUP, should understand the theoretical basis (LR parsing);

INF5110 Compiler Construction

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Compiling Techniques

Context Free Grammars

Context-free Grammars and Languages

Compiling Techniques

Bottom-Up Syntax Analysis

Computer Science 160 Translation of Programming Languages

Creating a Recursive Descent Parse Table

Context-Free Grammars (and Languages) Lecture 7

This lecture covers Chapter 5 of HMU: Context-free Grammars

Announcement: Midterm Prep

Compiling Techniques

Administrivia. Test I during class on 10 March. Bottom-Up Parsing. Lecture An Introductory Example

Outline. CS21 Decidability and Tractability. Machine view of FA. Machine view of FA. Machine view of FA. Machine view of FA.

Context-Free Grammars and Languages

h>p://lara.epfl.ch Compiler Construc/on 2011 CYK Algorithm and Chomsky Normal Form

5/10/16. Grammar. Automata and Languages. Today s Topics. Grammars Definition A grammar G is defined as G = (V, T, P, S) where:

Shift-Reduce parser E + (E + (E) E [a-z] In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules.

Formal Languages, Grammars and Automata Lecture 5

Exercises. Exercise: Grammar Rewriting

Context-Free Grammars and Languages. Reading: Chapter 5

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

INF5110 Compiler Construction

Syntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis

Chapter 4: Context-Free Grammars

Syntax Analysis, VI Examples from LR Parsing. Comp 412

From EBNF to PEG. Roman R. Redziejowski. Concurrency, Specification and Programming Berlin 2012

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where

Compiler Principles, PS7

Top-Down Parsing, Part II

Bottom-up syntactic parsing. LR(k) grammars. LR(0) grammars. Bottom-up parser. Definition Properties

Introduction to Theory of Computing

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

Course Script INF 5110: Compiler construction

Compilerconstructie. najaar Rudy van Vliet kamer 124 Snellius, tel rvvliet(at)liacs.

CFLs and Regular Languages. CFLs and Regular Languages. CFLs and Regular Languages. Will show that all Regular Languages are CFLs. Union.

Computer Science 160 Translation of Programming Languages

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1

5 Context-Free Languages

Talen en Compilers. Johan Jeuring , period 2. October 29, Department of Information and Computing Sciences Utrecht University

CS 301. Lecture 18 Decidable languages. Stephen Checkoway. April 2, 2018

Syntax Analysis, VII The Canonical LR(1) Table Construction. Comp 412

An Efficient Context-Free Parsing Algorithm. Speakers: Morad Ankri Yaniv Elia

Introduction and Motivation. Introduction and Motivation. Introduction to Computability. Introduction and Motivation. Theory. Lecture5: Context Free

Parsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing

INF5110 Compiler Construction

Parsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46.

UNIT II REGULAR LANGUAGES

Theory of Computation - Module 3

Type 3 languages. Type 2 languages. Regular grammars Finite automata. Regular expressions. Type 2 grammars. Deterministic Nondeterministic.

Syntax Directed Transla1on

Simplification and Normalization of Context-Free Grammars

Chapter 5: Context-Free Languages

Context-Free Languages

Context-Free Grammars and Languages. We have seen that many languages cannot be regular. Thus we need to consider larger classes of langs.

Suppose h maps number and variables to ɛ, and opening parenthesis to 0 and closing parenthesis

Pushdown Automata. Reading: Chapter 6

Transcription:

CONTEXT FREE GRAMMAR AND PARSING STATIC ANALYSIS - PARSING Source language Scanner (lexical analysis) tokens Parser (syntax analysis) Syntatic structure Semantic Analysis (IC generator) Syntatic/sema ntic structure Code Generator Target language 2 Code Optimizer Syntax described formally Tokens organized into syntax tree that describes structure Error checking Symbol Table CS 540 Spri ng 2009 GM U 1

STATIC ANALYSIS - PARSING We can use context free grammars to specify the syntax of programming languages. 3 CS 540 Spri ng 2009 GM U CONTEXT FREE GRAMMARS Definition: A context free grammar is a formal model that consists of: A set of tokens or terminal symbols Vt A set of nonterminal symbols Vn Start symbol S (in Vn) Finite set of productions of form: A R1 Rn (n >= 0) where A in Vn, R in Vn Vt 4 2

CFG EXAMPLES Indicates a production V t = {+,-,0..9}, V n = {L,D}, s = {L} L L + D L D D D 0 9 Shorthand for multiple productions V t ={(,)}, V n = {L}, s = {L} L ( L ) L L e 5 LANGUAGES Regular Context free Context sensitive Type 0 A a B, C e A a aab agb a b 3

ANY REGULAR LANGUAGE CAN BE EXPREED USING A CFG Starting with a NFA: For each state Si in the NFA Create non-terminal Ai If transition (Si,a) = Sk, create production Ai a Ak If transition (Si,e) = Sk, create production Ai Ak If Si is a final state, create production Ai e If Si is the NFA start state, s = Ai What does the existence of this algorithm tell us about the relationship between regular and context free languages? 7 NFA TO CFG EXAMPLE ab*a a a 1 2 3 A1 a A2 A2 b A2 A2 a A3 A3 e b 8 4

WRITING GRAMMARS When writing a grammar (or RE) for some language, the following must be true: All strings generated are in the language. Your grammar produces all strings in the language. 9 TRY THESE: Integers divisible by 2 Legal postfix expressions Floating point numbers with no extra zeros Strings of 0,1 where there are more 0 than 1 10 5

PARSING The task of parsing is figuring out what the parse tree looks like for a given input and language. If a string is in the given language, a parse tree must exist. However, just because a parse tree exists for some string in a given language doesn t mean a given algorithm can find it. 11 PARSE TREES The parse tree for some string in a language that is defined by the grammar G as follows: The root is the start symbol of G The leaves are terminals or e. When visited from left to right, the leaves form the input string The interior nodes are non-terminals of G For every non-terminal A in the tree with children B1 Bk, there is some production A B1 Bk 12 6

PARSE TREE FOR (())() L L ( L ) L L e 13 L L ( ( L ) L ) ( ) L L e e e e PARSE TREE FOR (())() L L ( L ) L L e 14 L L ( ( L ) L ) ( L ) L e e e e 7

PARSING Parsing algorithms are based on the idea of derivations. General Algorithms: LL (top down) LR (bottom up) 15 SINGLE STEP DERIVATION Definition: Given a A b (with a,b in (Vn Vt)*) and a production A g, a A b a g b is a single step derivation. Examples: L + D L D + D ( L ) ( L ) ( ( L ) L ) ( L ) L L - D L (L) L 16 8

DERIVATIONS Definition: A sequence of the form: w0 w1 wn is a derivation of wn from w0 (w0 * wn) L ( L ) L ( ) L ( ) production L ( L ) L production L e production L e L * ( ) If wi has non-terminal symbols, it is referred to as sentential form. 17 L * (( )) ( ) L ( L ) L ( L ) ( L ) L ( L ) ( L ) ( ( L ) L ) ( L ) ( ( ) L ) ( L ) ( ( ) L ) ( ) ( ( ) ) ( ) production L ( L ) L production L ( L ) L production L e production L ( L ) L production L e production L e production L e 18 9

L(G), the language generated by grammar G is {w in Vt*: s * w, for start symbol s} Example: We ve just shown that both () and (())() are in L(G) for the previous grammar. 19 LEFTMOST DERIVATIONS A derivation where the leftmost nonterminal is always chosen If a string is in a given language (i.e. a derivation exists), then a leftmost derivation must exist Rightmost derivation defined as you would expect 20 10

LEFTMOST DERIVATION FOR (())() L production L ( L ) L ( L ) L production L ( L ) L ( ( L ) L ) L production L e ( ( ) L ) L production L e ( ( ) ) L ( ( ) ) ( L ) L ( ( ) ) ( ) L ( ( ) ) ( ) production L e production L ( L ) L production L e RIGHTMOST DERIVATION FOR (())() L production L ( L ) L ( L ) L production L ( L ) L ( L ) ( L ) L production L e ( L ) ( L ) production L e ( L ) ( ) production L ( L ) L ( ( L ) L ) ( ) production L e ( ( L ) ) ( ) production L e ( ( ) ) ( ) 11

AMBIGUITY Two (or more) parse trees or leftmost derivations for some string in the language E E + E E E E E 0 9 E E E + E E - E E 4 2 - E E + E 2 3 3 4 2 TWO LEFTMOST DERIVATIONS E E + E E E + E 2 E + E 2 3 + E 2 3 + 4 E E E E E + E 2 E + E 2 3 + E 2 3 + 4 24 12

An ambiguous grammar can sometimes be made unambiguous: E E + T E T T T 0 9 Precedence can be specified as well: E E + T E T T T T * F T / F F F ( E ) 0 9 enforces the correct associativity 2 INPUT: { SIMPLESTMT; SIMPLESTMT; } P { } S ; e S simplestmt S { } P 26 S S { simplestmt ; simplestmt ; } e 13

TOP DOWN (LL) PARSING P { } S ; e S simplestmt S { } end P 27 { simplestmt ; simplestmt ; } TOP DOWN (LL) PARSING P { } S ; e S simplestmt S { } end P 28 S { simplestmt ; simplestmt; } 14

TOP DOWN (LL) PARSING P { } S ; e S simplestmt S { } end P 29 S { simplestmt ; simplestmt; } TOP DOWN (LL) PARSING P { } S ; e S simplestmt S { } end P 30 S S { simplestmt ; simplestmt; } 15

TOP DOWN (LL) PARSING P { } S ; e S simplestmt S { } end P 31 S S { simplestmt ; simplestmt; } TOP DOWN (LL) PARSING P { } S ; e S simplestmt S { } end P 32 S S e { simplestmt ; simplestmt; } 16

BOTTOMUP (LR) PARSING P { } S ; e S simplestmt S { } end 33 S { simplestmt ; simplestmt ; } BOTTOMUP (LR) PARSING P { } S ; e S simplestmt S { } end 34 S S { simplestmt ; simplestmt ; } 17

BOTTOMUP (LR) PARSING P { } S ; e S simplestmt S { } end 35 S S { simplestmt ; simplestmt ; } e CS 540 Sp BOTTOMUP (LR) PARSING P { } S ; e S simplestmt S { } end 36 S S e { simplestmt ; simplestmt ; } CS 540 Spri ng 2009 GM U 18

BOTTOMUP (LR) PARSING P { } S ; e S simplestmt S { } end 37 S S { simplestmt ; simplestmt ; } e BOTTOMUP (LR) PARSING P { } S ; e S simplestmt S { } end P 38 S S { simplestmt ; simplestmt ; } e 19

BACKUS-NAUR FORM (BNF) BNF is a meta-language i.e. a language used to describe another language invented by John Backus to describe ALGOL 58 used by Peter Naur to describe ALGOL 60 BNF is equivalent to context-free grammars a BNF grammar is defined by a set of terminal symbols, a set of nonterminal symbols a set of rules a start symbol (one of the terminal symbols) BNF ELEMENTS terminal symbols are the lexemes of the target PL e.g., while, (, ) nonterminal symbols represent classes of syntactic structures rules they act like syntactic variables e.g., <statement> define how a nonterminal symbol can by developed into a sequence of nonterminal and terminal symbols e.g., <while_stmt> while ( <logic_expr> ) <stmt> 20

BNF RULES A rule has a left-hand side (LHS) then a right-hand side (RHS) There can be several rules for one LHS <stmt> <assignment> <stmt> begin <stmt_list> end Syntactic lists are described using recursion <ident_list> ident <ident_list> ident, <ident_list> A grammar is a finite nonempty set of rules EBNF Extended BNF (EBNF) is most often used avoids having numerous rules for the same LHS Extra meta-symbols (in addition to ) [ ] enclosed symbols are optional (1 or 0 times) e.g., <if_stmt> if ( <exp> ) <stmt> [ else <stmt> ] { } enclosed symbols can be repeated (0 to n times) e.g., <ident_list> ident {, ident } choice of one of the symbol sequences separated by e.g., <stmt> <assignment> begin <stmt_list> end ( ) groups enclosed symbols 21

BNF VS. EBNF 43 BNF <expr> <expr> + <term> <expr> <expr> - <term> <expr> <term> <term> <term> * <factor> <term> <term> / <factor> <term> <factor> <factor> <exp> ** <factor> <factor> <exp> <exp> ( <expr> ) <exp> id EBNF <expr> <term> { ( + - ) <term> } <term> <factor> { ( * / ) <factor> } <factor> <exp> [ ** <factor> ] <exp> ( <expr> ) id AUGMENTED EBNF another meta-symbol ::= (equal) instead of meta-symbols for repetitions + means one or more times * means zero or more times <ident> = <letter> + ( <letter> <digit> ) * rules can use iteration instead of recursion e.g.: <stmt_list> <stmt> <stmt> ; <stmt_list> can be formulated as <stmt_list> ::= <stmt> ( ; <stmt> ) * 22

A SMALL LANGUAGE IN EBNF <program> ::= begin <stmt_list> end <stmt_list> ::= <stmt> <stmt> ; <stmt_list> <stmt> ::= <var> = <expr> <expr> ::= <term> + <term> <term> - <term> <term> ::= <var> const <var> ::= a b c PARSING General Algorithms: LL (top down) LR (bottom up) Both algorithms are driven by the input grammar and the input to be parsed Two important sets: FIRST and FOLLOW 46 23

FIRST SETS FIRST(a) is the set of all terminal symbols that can begin some sentential form in a derivation that starts with a a a b FIRST(a) = {a in V t a * ab } { e } if a * e Example: S simple { S } FIRST(S) = { simple, { } 47 To compute FIRST across strings of terminals and non-terminals: FIRST(e) = { e } FIRST(Aa) = A if A is a terminal FIRST(A) FIRST(a) If A * e FIRST(A) otherwise 48 24

COMPUTING FIRST SETS Initially FIRST(A) (for all A) is empty 1. For productions A c b, where c in V t Add { c } to FIRST(A) 2. For productions A e Add { e } to FIRST(A) 3. For productions A a B b, where a * e and NOT (B * e) Add FIRST(aB) to FIRST(A) 4. For productions A a, where a * e Add FIRST(a) and { e } to FIRST(A) same thing as e FIRST(B) 49 EXAMPLE 1 S a S e S B B b B e B C C c C e C d FIRST(C) = {c,d} FIRST(B) = FIRST(S) = Start with the simplest non-terminal 50 25

EXAMPLE 1 S a S e S B B b B e B C C c C e C d FIRST(C) = {c,d} FIRST(B) = {b,c,d} FIRST(S) = CS 540 Spring 2009 GMU Now that we know FIRST(C) 51 EXAMPLE 1 S a S e S B B b B e B C C c C e C d FIRST(C) = {c,d} FIRST(B) = {b,c,d} FIRST(S) = {a,b,c,d} 52 26

EXAMPLE 2 P i c n T S Q P a S d S c S T R b e S e R n e T R S q FIRST(P) = {i,c,n} FIRST(Q) = FIRST(R) = {b,e} FIRST(S) = FIRST(T) = 53 EXAMPLE 2 P i c n T S Q P a S d S c S T R b e S e R n e T R S q FIRST(P) = {i,c,n} FIRST(Q) = {i,c,n,a,d} FIRST(R) = {b,e} FIRST(S) = FIRST(T) = 54 27

EXAMPLE 2 P i c n T S Q P a S d S c S T R b e S e R n e T R S q FIRST(P) = {i,c,n} FIRST(Q) = {i,c,n,a,d} FIRST(R) = {b,e} FIRST(S) = {e,b,n,e} FIRST(T) = 55 EXAMPLE 2 P i c n T S Q P a S d S c S T R b e S e R n e T R S q FIRST(P) = {i,c,n} FIRST(Q) = {i,c,n,a,d} FIRST(R) = {b,e} FIRST(S) = {e,b,n,e} FIRST(T) = {b,c,n,q} 56 28

EXAMPLE 3 S a S e S T S T R S e Q R r S r e Q S T e FIRST(S) = FIRST(R) = FIRST(T) = FIRST(Q) = 57 EXAMPLE 3 S a S e S T S T R S e Q R r S r e Q S T e FIRST(S) = {a} FIRST(R) = {r, e} FIRST(T) = {r,a, e} FIRST(Q) = {a, e} 58 29

FOLLOW SETS FOLLOW(A) is the set of terminals (including end of file - $) that may follow non-terminal A in some sentential form. FOLLOW(A) = {c in Vt S + Ac } {$} if S + A For example, consider L + (())(L)L Both ) and end of file can follow L NOTE: e is never in FOLLOW sets 59 COMPUTING FOLLOW(A) If A is start symbol, put $ in FOLLOW(A) Productions of the form B a A b, Add FIRST(b) {e} to FOLLOW(A) INTUITION: Suppose B AX and FIRST(X) = {c} S + a B b a A X b + a A c d b 60 30

3. Productions of the form B a A or B a A b where b * e Add FOLLOW(B) to FOLLOW(A) INTUITION: Suppose B Y A S + a B b a Y A b FOLLOW(B) Suppose B A X and X * e S + a B b a A X b * a A b FOLLOW(B) 61 Assume the first non-terminal is the start symbol EXAMPLE 4 S a S e B B b B C f C C c C g d e FIRST(C) = {c,d,e} FIRST(B) = {b,c,d,e} FIRST(S) = {a,b,c,d,e} FOLLOW(C) = FOLLOW(B) = FOLLOW(S) {$} Using rule #1 62 31

EXAMPLE 4 S a S e B B b B C f C C c C g d e FIRST(C) = {c,d,e} FIRST(B) = {b,c,d,e} FIRST(S) = {a,b,c,d,e} FOLLOW(C) {f,g} FOLLOW(B) {c,d,f} FOLLOW(S) {$,e} Using rule #2 63 EXAMPLE 4 S a S e B B b B C f C C c C g d e FIRST(C) = {c,d,e} FIRST(B) = {b,c,d,e} FIRST(S) = {a,b,c,d,e} FOLLOW(S) = {$} FOLLOW(B) = FIRST(Cf) {$} = {c,d,e,} {$} = {c,d,e,$} FOLLOW(C) = FOLLOW(B) {f,g} = {c,d,e,g,$} 32

EXAMPLE 5 S A B C A D A e a A B b c e C D d C D e b f c FOLLOW(S) = FOLLOW(A) = FOLLOW(B) = FOLLOW(C) = FOLLOW(D) = FIRST(D) = {e,f} FIRST(C) = {e,f} FIRST(B) = {b,c,e} FIRST(A) = {a, e} FIRST(S) = {a,b,c,e,f} 65 EXAMPLE 5 S A B C A D A e a A B b c e C D d C D e b f c FOLLOW(S) = {$} FOLLOW(A) = {b,c,e,f} FOLLOW(B) = {e,f} FOLLOW(C) = {$} FOLLOW(D) = {$} FIRST(D) = {e,f} FIRST(C) = {e,f} FIRST(B) = {b,c,e} FIRST(A) = {a, e} FIRST(S) = {a,b,c,e,f} 66 33

EXAMPLE 6 S ( A) e A T E E & T E e T ( A ) a b c FOLLOW(S) = FOLLOW(A) = FOLLOW(E) = FOLLOW(T) = FIRST(T) = {(,a,b,c} FIRST(E) = {&, e } FIRST(A) = {(,a,b,c} FIRST(S) = {(, e} 67 EXAMPLE 6 S ( A) e A T E E & T E e T ( A ) a b c FIRST(T) = {(,a,b,c} FIRST(E) = {&, e } FIRST(A) = {(,a,b,c} FIRST(S) = {(, e} FOLLOW(S) = {$} FOLLOW(A) = { ) } FOLLOW(E) = FOLLOW(A) = { ) } FOLLOW(T) = FIRST(E) FOLLOW(A) FOLLOW(E) = {&, )} CS 540 Spring 2009 GMU 68 34

EXAMPLE 7 E T E E + T E e T F T T * F T e F ( E ) id FOLLOW(E) = FOLLOW(E ) = FOLLOW(T) = FOLLOW(T ) = FOLLOW(F) = FIRST(F) = FIRST(T) = FIRST(E) = {(,id} FIRST(T ) = {*,e} FIRST(E ) = {+,e} 69 EXAMPLE 7 E T E E + T E e T F T T * F T e F ( E ) id FOLLOW(E) = {$,)} FOLLOW(E ) = FOLLOW(E) = {$,)} FOLLOW(T) = FIRST(E ) FOLLOW(E) FOLLOW(E ) = {+,$,)} FOLLOW(T ) = FOLLOW(T) = {+,$,)} FOLLOW(F) = FIRST(T ) FOLLOW(T) FOLLOW(T ) = {*,+,$,)} FIRST(F) = FIRST(T) = FIRST(E) = {(,id} FIRST(T ) = {*,e} FIRST(E ) = {+,e} 70 35