Syntax Analysis Part III Chapter 4: Top-Down Parsing Slides adapted from : Robert van Engelen, Florida State University
Eliminating Ambiguity stmt if expr then stmt if expr then stmt else stmt other The above grammar is ambiguous since the following string has two parse trees if E 1 then if E 2 then S 1 else S 2
Eliminating Ambiguity stmt if exp r then stmt if expr then stmt else stmt stmt if expr then stmt if expr then stmt else stmt
Eliminating Ambiguity stmt matched_stmt open_stmt matched_stmt if expr then matched_stmt else matched_stmt other open_stmt if expr then stmt if expr then matched_stmt else open_stmt In practice, disambiguation is rarely implemented into the grammar; extra constraints are added to the parser for solving ambiguity
Exercise Choose the unambiguous version of the given ambiguous grammar: S SS a b S Sa Sb ε S S S S a b S S S S a b S Sa Sb S Sa Sb a b
Immediate Left/Right Recursion Productions of the form A A α β are immediate left recursive Productions of the form A α A β are immediate right recursive
Associativity of Operators Immediate left-recursive productions used to implement left-associative operators left left + term term String a+b+c has the same meaning as (a+b)+c
Associativity of Operators Immediate right-recursive productions used to implement right-associative operators right term = right term String a=b=c has the same meaning as a=(b=c)
Precedence of Operators Operators with higher precedence bind more tightly expr expr + term term term term * factor factor factor number ( expr ) String 2+3*5 has the same meaning as 2+(3*5) expr expr term term factor number term factor number factor number 2 + 3 * 5
Immediate Left Recursion When one of the productions in a grammar is immediate left recursive then a top-down predictive parser loops forever on certain inputs S A Recursive call to A w A β
Immediate Left Recursion We can eliminate immediate left recursive productions by systematically rewriting the grammar using right recursive productions A A α β A β R R α R ε
Immediate Left Recursion A A A α β R A α α R β α R ε
Immediate Left-Recursion Elimination Method Rewrite every left-recursive production A A α A δ β γ into a right-recursive production A β A R γ A R A R α A R δ A R ε
Exercise Choose the grammar that correctly eliminates immediate left recursion from the given grammar: E E + T T T id ( E )
Exercise (cont d) E E + id E + ( E ) id ( E ) E T E E + T E ε T id ( E ) E E + T T E id ( E ) T id ( E ) E id + E E + T T T id ( E )
Left Recursion A CFG is left recursive if it allows derivations of the form A + A α This notion is more general than immediate left recursion
Recognition of Left Recursion Assume there are no epsilon rules Draw a graph where nodes are nonterminals arcs (A, B) represent the relation A expands into B α for some α In this graph, a cycle indicates left recursion
Example A B C a B C A A b C A B C C a A B C
Elimination of Left Recursion The basic operation used by the algorithm: expand first symbol in right-hand side with a rule A B α B β A β α
Elimination of Left Recursion The basic idea Process nonterminal symbols in some order A 1, A 2,, A n Replace rules of the form A i A j γ having j < i with rules having j > i
Example A < B < C A B C A B C
General Left Recursion Elimination Method Arrange the nonterminals in some order A 1, A 2,, A n for i = 1,, n { for j = 1,, i-1 { replace each A i A j γ with A i δ 1 γ δ 2 γ δ k γ where A j δ 1 δ 2 δ k } eliminate the immediate left recursion in A i }
Example A B C a B C A A b C A B C C a Choose arrangement: A, B, C i = 1: i = 2, j = 1: i = 3, j = 1: i = 3, j = 2: nothing to do B C A A b B C A B C b a b (imm) B C A B R a b B R B R C b B R ε C A B C C a C B C B a B C C a C B C B a B C C a C C A B R C B a b B R C B a B C C a (imm) C a b B R C B C R a B C R a C R C R A B R C B C R C C R ε
Left Factoring When a nonterminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing S A Symbol b cannot be used to predict expansion of B w Bβ Bδ b
Left Factoring Replace productions with A α β 1 α β 2 α β n γ A α A R γ A R β 1 β 2 β n
Top-Down Parsing Recursive-descent parsing Needs backtracking LL methods (Left-to-right, Leftmost derivation) Also called predictive parsing Specialization of recursive-descent where backtracking is not needed (correct production chosen using look-ahead)
Top-Down Parsing Both methods build leftmost derivations Grammar: E T + T T ( E ) T - E T id Leftmost derivation: E lm T + T lm id + T lm id + id E E E E T T T T T T + id + id + id
Recursive Descent Parsing Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal s syntactic category of input tokens When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information
Recursive Descent Parsing void A() { choose an A-production A X 1 X 2 X k ; for ( i = 1 to k ) { if ( X i is a nonterminal ) } call procedure X i (); else if ( X i equals the current input symbol a ) advance the input to the next symbol; else error(); }
Recursive Descent Parsing To allow for backtracking previous code must be modified as follows Must try each A-production in some order In case of failure, input pointer should be retracted and next production should be checked at the latest point of choice If all productions have been checked, then we have an input error
Example Assume the grammar E E E + id E - E id ( E ) The following C code implements a recursive descent parser
Example (cont d) 1 bool term(token tok) { return *next++ == tok; } 2 bool E 1 () { return E (); } 3 bool E 2 () { return E () && term(plus) && term(id); } 4 bool E() { 5 TOKEN *save = next; 6 return (next = save, E1()) (next = save, E2()); } 7 bool E 1 () { return term(minus) && E (); } 8 bool E 2 () { return term(id); } 9 bool E 3 () { return term(open) && E() && term(close); } 10 bool E () { 11 TOKEN *save = next; 12 return (next = save, E 1 ()) 13 (next = save, E 2 ()) 14 (next = save, E 3 ()); }
Exercise Choose the derivation that is a valid recursive descent parse for the string id + id in the grammar E E E + E E - E id ( E ) Moves that are followed by backtracking are marked in red
Exercise (cont d) 1. E, E, E + E, id + E, id + E, id + id 2. E, E, - E, id, ( E ), E + E, - E + E, id + E, id + E, id + - E, id + id 3. E, E + E, id + E, id + E, id + id 4. E, E, id, E + E, id + E, id + E, id + id
Backtracking & Efficiency When failure occurs at higher level, all lower level subparses that were successful are deleted Later on, those subparses are recomputed from scratch Highly inefficient
Predictive Parsing Grammar must be non-ambiguous Eliminate left recursion from grammar Left factor the grammar Compute functions FIRST() and FOLLOW() Table-driven variant of descent parsing
FIRST() and FOLLOW(): Intuitive Idea Used to predict production A α for nontermial A S FIRST(α) w A FOLLOW(A) a b
FIRST() FIRST(α) = the set of terminals that begin all strings derived from α FIRST(a) = {a} if a T FIRST(ε) = {ε} FIRST(A) = A α FIRST(α) for A α P FIRST(X 1 X 2 X k ) : if ε FIRST(X j ) for all j = 1,, i-1 then add FIRST(X i )\{ε} to FIRST(X 1 X 2 X k ) if ε FIRST(X j ) for all j = 1,, k then add ε to FIRST(X 1 X 2 X k )
Example Grammar: E T X T ( E ) int Y X + E ε Y * T ε FIRST(+) = {+} FIRST(*) = {*} FIRST( ( ) = {(} FIRST( ) ) = {)} FIRST(int) = {int} FIRST(ε) = {ε}
Example FIRST(Y) = FIRST( * T ) FIRST(ε) = FIRST( * ) FIRST(ε) = { * } { ε } = { *, ε } FIRST(X) = FIRST(+ E) FIRST(ε) = FIRST(+) FIRST(ε) = { + } { ε } = { +, ε }
Example FIRST(T) = FIRST( ( E ) ) FIRST(int Y) = FIRST( ( ) FIRST(int) = { ( } { int } = { (, int } FIRST(E) = FIRST(T X) = FIRST(T) = { (, int }
FOLLOW() FOLLOW(A) = the set of terminals that can immediately follow nonterminal A FOLLOW(A) = for all (B α A β) P do add FIRST(β)\{ε} to FOLLOW(A) for all (B α A β) P and ε FIRST(β) do add FOLLOW(B) to FOLLOW(A) for all (B α A) P do add FOLLOW(B) to FOLLOW(A) if A is the start symbol then add $ to FOLLOW(A)
Example Grammar: E T X T ( E ) int Y X + E ε Y * T ε { $, ) } FOLLOW(E) FOLLOW(X) FOLLOW(E) FOLLOW(E) FOLLOW(X) FOLLOW(X) = FOLLOW(E) = { $, ) }
Example Grammar: E T X T ( E ) int Y X + E ε Y * T ε FIRST(X) \ {ε} = {+} FOLLOW(T) FOLLOW(E) = { $, ) } FOLLOW(T) FOLLOW(Y) FOLLOW(T) FOLLOW(T) FOLLOW(Y) FOLLOW(Y) = FOLLOW(T) = { +, $, ) }
LL(1) Grammar A grammar G is LL(1) if it is not left recursive and for each collection of productions A α 1 α 2 α n for nonterminal A the following holds: 1. FIRST(α i ) FIRST(α j ) = for all i j 2. if α i * ε then 2.a. α j * ε for all j i 2.b. FIRST(α j ) FOLLOW(A) = for all j i
Non-LL(1) Examples Grammar S S a a S a S a S a R ε R S ε S a R a R S ε Not LL(1) because: Left recursive FIRST(a S) FIRST(a) For R: S * ε and ε * ε For R: FIRST(S) FOLLOW(R)
Using FIRST and FOLLOW to Write a Recursive Descent Parser expr term rest rest + term rest - term rest ε term id procedure rest(); begin if lookahead in FIRST(+ term rest) then match( + ); term(); rest() else if lookahead in FIRST(- term rest) then match( - ); term(); rest() else if lookahead in FOLLOW(rest) then return else error() end; FIRST(+ term rest) = { + } FIRST(- term rest) = { - } FOLLOW(rest) = { $ }
Non-Recursive Predictive Parsing: Table-Driven Parsing Given an LL(1) grammar G = (N, T, P, S) construct a table M[A, a] for A N, a T and use a driver program with a stack input a + b $ stack X Y Z $ Predictive parsing program (driver) Parsing table M output
Constructing an LL(1) Predictive Parsing Table for each production A α do for each a FIRST(α) do add A α to M[A, a] enddo if ε FIRST(α) then for each b FOLLOW(A) do add A α to M[A, b] enddo endif enddo Mark each undefined entry in M as error
Example Table E T E R E R + T E R ε T F T R T R * F T R ε F ( E ) id A α FIRST(α) FOLLOW(A) E T E R ( id $ ) E R + T E R + $ ) E R ε ε $ ) T F T R ( id + $ ) T R * F T R * + $ ) T R ε ε + $ ) F ( E ) ( * + $ ) F id id * + $ ) id + * ( ) $ E E T E R E T E R E R E R + T E R E R ε E R ε T T F T R T F T R T R T R ε T R * F T R T R ε T R ε F F id F ( E )
Ambiguous grammar S i E t S S R a S R e S ε E b Error: duplicate table entry LL(1) Grammars are Unambiguous A α FIRST(α) FOLLOW(A) S i E t S S R i e $ S a a e $ S R e S e e $ S R ε ε e $ E b b t a b e i t $ S S a S i E t S S R S R E E b S R ε S R e S S R ε
Predictive Parsing Algorithm Input: string w and LL parsing table M for grammar G Output: leftmost derivation if w L(G) and error otherwise Start configuration stack is $S buffer is w$
Predictive Parsing Algorithm let a be the first symbol of w; let X be the top stack symbol; while ( X $ ) if ( X = a ) pop the stack and let a be the next symbol of w; else if ( X is a terminal ) error(); else if ( M[X, a] is an error entry ) error(); else if ( M[X, a] = X Y 1 Y 2 Y k ) { output the production X Y 1 Y 2 Y k ; pop the stack; push Y k, Y k-1,, Y 2, Y 1 onto the stack, with Y 1 } let X be the top stack symbol; } on top;
Example Stack $E $E R T $E R T R F $E R T R id $E R T R $E R $E R T+ $E R T $E R T R F $E R T R id $E R T R $E R T R F* $E R T R F $E R T R id $E R T R $E R $ Input id+id*id$ id+id*id$ id+id*id$ id+id*id$ +id*id$ +id*id$ +id*id$ id*id$ id*id$ id*id$ *id$ *id$ id$ id$ $ $ $ Production applied E T E R T F T R F id T R ε E R + T E R T F T R F id T R * F T R F id T R ε E R ε
Panic Mode Recovery Add synchronizing actions to undefined entries based on FOLLOW Pro: Can be automated Cons: Error messages are needed FOLLOW(E) = { ) $ } FOLLOW(E R ) = { ) $ } FOLLOW(T) = { + ) $ } FOLLOW(T R ) = { + ) $ } FOLLOW(F) = { + * ) $ } id + * ( ) $ E E T E R E T E R synch synch E R E R + T E R E R ε E R ε T T F T R synch T F T R synch synch T R T R ε T R * F T R T R ε T R ε F F id synch synch F ( E ) synch synch synch: the driver pops current nonterminal A and skips input till synch token
Phrase-Level Recovery Change input stream by inserting missing tokens For example: id id is changed into id * id Pro: Can be automated Cons: Recovery not always intuitive Can then continue here id + * ( ) $ E E T E R E T E R synch synch E R E R + T E R E R ε E R ε T T F T R synch T F T R synch synch T R insert * T R ε T R * F T R T R ε T R ε F F id synch synch F ( E ) synch synch insert *: driver inserts missing * and retries the production
Error Productions E T E R E R + T E R ε T F T R T R * F T R ε F ( E ) id Add error production : T R F T R to ignore missing *, e.g.: id id Pro: Powerful recovery method Cons: Cannot be automated id + * ( ) $ E E T E R E T E R synch synch E R E R + T E R E R ε E R ε T T F T R synch T F T R synch synch T R T R F T R T R ε T R * F T R T R ε T R ε F F id synch synch F ( E ) synch synch