Predictive Parsers op-down Parsing and Intro to Bottom-Up Parsing Lecture 7 Like recursive-descent but parser can predict which production to use By looking at the next few tokens No backtracking Predictive parsers accept LL(k) grammars L means left-to-right scan of input L means leftmost derivation k means predict based on k tokens of lookahead In practice, LL(1) is used Prof. Aiken CS 143 Lecture 7 1 Prof. Aiken CS 143 Lecture 7 2 LL(1) vs. Recursive Descent Predictive Parsing and Left Factoring In recursive-descent, At each step, many choices of production to use Backtracking used to undo bad choices In LL(1), At each step, only one choice of production hat is When a non-terminal A is leftmost in a derivation he next input symbol is t here is a unique production A α to use Or no production to use (an error state) LL(1) is a recursive descent variant without backtracking Prof. Aiken CS 143 Lecture 7 3 Recall the grammar + ( ) Hard to predict because For two productions start with For it is not clear how to predict We need to left-factor the grammar Prof. Aiken CS 143 Lecture 7 4 Left-Factoring xample LL(1) Parsing able xample Recall the grammar + ( ) Factor out common prefixes of productions X X + ε ( ) Y Y ε Prof. Aiken CS 143 Lecture 7 5 Left-factored grammar X ( ) Y he LL(1) parsing table: X + ε Y ε next input token + ( ) $ X X X + ε ε Y ( ) Y ε ε ε leftmost non-terminal rhs of production to use Prof. Aiken CS 143 Lecture 7 6 1
LL(1) Parsing able xample (Cont.) Consider the [, ] entry When current non-terminal is and next input is, use production X his can generate an in the first position Consider the [Y,+] entry When current non-terminal is Y and current token is +, get rid of Y Y can be followed by + only if Y ε LL(1) Parsing ables. rrors Blank entries indicate error situations Consider the [,] entry here is no way to derive a string starting with from non-terminal Prof. Aiken CS 143 Lecture 7 7 Prof. Aiken CS 143 Lecture 7 8 Using Parsing ables Method similar to recursive descent, except For the leftmost non-terminal S We look at the next input token a And choose the production shown at [S,a] A stack records frontier of parse tree Non-terminals that have yet to be expanded erminals that have yet to matched against the input op of stack = leftmost pending terminal or non-terminal Reject on reaching error state Accept on end of input & empty stack Prof. Aiken CS 143 Lecture 7 9 LL(1) Parsing Algorithm initialize stack = <S $> and next repeat case stack of <X, rest> : if [X,next] = Y 1 Y n then stack <Y 1 Y n rest>; else error (); <t, rest> : if t == next ++ then stack <rest>; else error (); until stack == < > Prof. Aiken CS 143 Lecture 7 10 LL(1) Parsing Algorithm $ marks bottom of stack LL(1) Parsing xample initialize stack = <S $> and next repeat For non-terminal X on top of stack, lookup production case stack of <X, rest> : if [X,next] = Y 1 Y n then stack <Y 1 Y n rest>; else error (); <t, rest> : if t == next ++ For terminal t on top of then stack <rest>; stack, check t matches next else error (); input token. until stack == < > Pop X, push production rhs on stack. Note leftmost symbol of rhs is on top of the stack. Prof. Aiken CS 143 Lecture 7 11 Stack Input Action $ $ X X $ $ Y Y X $ $ terminal Y X $ $ X $ $ terminal X $ $ Y Y X $ $ terminal Y X $ $ ε X $ $ ε $ $ ACCP Prof. Aiken CS 143 Lecture 7 12 2
Constructing Parsing ables: he Intuition Computing First Sets Consider non-terminal A, production A α, & token t [A,t] = α in two cases: If α t β α can derive a t in the first position We say that t First(α) If A α and α ε and S β A t δ Useful if stack has A, input is t, and A cannot derive t In this case only option is to get rid of A (by deriving ε) Can work only if t can follow A in at least one derivation We say t Follow(A) Definition First(X) = { t X tα} {ε X ε} Algorithm sketch: 1. First(t) = { t } 2. ε First(X) if X ε if X A 1 A n and ε First(A i ) for 1 i n 3. First(α) First(X) if X A 1 A n α and ε First(A i ) for 1 i n Prof. Aiken CS 143 Lecture 7 13 Prof. Aiken CS 143 Lecture 7 14 First Sets. xample Computing Follow Sets Recall the grammar X ( ) Y X + ε Y ε Definition: Follow(X) = { t S β X t δ } First sets First( ( ) = { ( } First( ) = {, ( } First( ) ) = { ) } First( ) = {, ( } First( ) = { } First( X ) = {+, ε } First( + ) = { + } First( Y ) = {, ε } First( ) = { } Intuition If X A B then First(B) Follow(A) and Follow(X) Follow(B) if B ε then Follow(X) Follow(A) If S is the start symbol then $ Follow(S) Prof. Aiken CS 143 Lecture 7 15 Prof. Aiken CS 143 Lecture 7 16 Computing Follow Sets (Cont.) Algorithm sketch: 1. $ Follow(S) 2. First(β) - {ε} Follow(X) For each production A α X β 3. Follow(A) Follow(X) For each production A α X β where ε First(β) Follow Sets. xample Recall the grammar X ( ) Y X + ε Y ε Follow sets Follow( + ) = {, ( } Follow( ) = {, ( } Follow( ( ) = {, ( } Follow( ) = {), $} Follow( X ) = {$, ) } Follow( ) = {+, ), $} Follow( ) ) = {+, ), $} Follow( Y ) = {+, ), $} Follow( ) = {, +, ), $} Prof. Aiken CS 143 Lecture 7 17 Prof. Aiken CS 143 Lecture 7 18 3
Constructing LL(1) Parsing ables Construct a parsing table for CFG G For each production A α in G do: For each terminal t First(α) do [A, t] = α If ε First(α), for each t Follow(A) do [A, t] = α If ε First(α) and $ Follow(A) do [A, $] = α Notes on LL(1) Parsing ables If any entry is multiply defined then G is not LL(1) If G is ambiguous If G is left recursive If G is not left-factored And in other cases as well Most programming language CFGs are not LL(1) Prof. Aiken CS 143 Lecture 7 19 Prof. Aiken CS 143 Lecture 7 20 Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient Builds on ideas in top-down parsing Bottom-up is the preferred method Concepts today, algorithms next time An Introductory xample Bottom-up parsers don t need left-factored grammars Revert to the natural grammar for our example: + () Consider the string: + Prof. Aiken CS 143 Lecture 7 21 Prof. Aiken CS 143 Lecture 7 22 he Idea Bottom-up parsing reduces a string to the start symbol by inverting productions: + + + + + + Prof. Aiken CS 143 Lecture 7 23 Observation Read the productions in reverse (from bottom to top) his is a rightmost derivation! + + + + + + Prof. Aiken CS 143 Lecture 7 24 4
Important Fact #1 A Bottom-up Parse Important Fact #1 about bottom-up parsing: A bottom-up parser traces a rightmost derivation in reverse + + + + + + Prof. Aiken CS 143 Lecture 7 25 Prof. Aiken CS 143 Lecture 7 26 A Bottom-up Parse in Detail (1) A Bottom-up Parse in Detail (2) + + + + + Prof. Aiken CS 143 Lecture 7 27 Prof. Aiken CS 143 Lecture 7 28 A Bottom-up Parse in Detail (3) A Bottom-up Parse in Detail (4) + + + + + + + + + Prof. Aiken CS 143 Lecture 7 29 Prof. Aiken CS 143 Lecture 7 30 5
A Bottom-up Parse in Detail (5) A Bottom-up Parse in Detail (6) + + + + + + + + + + + + Prof. Aiken CS 143 Lecture 7 31 Prof. Aiken CS 143 Lecture 7 32 A rivial Bottom-Up Parsing Algorithm Let I = input string repeat pick a non-empty substring β of I where X β is a production if no such β, backtrack replace one β by X in I until I = S (the start symbol) or all possibilities are exhausted Questions Does this algorithm terminate? How fast is the algorithm? Does the algorithm handle all cases? How do we choose the substring to reduce at each step? Prof. Aiken CS 143 Lecture 7 33 Prof. Aiken CS 143 Lecture 7 34 Where Do Reductions Happen? Important Fact #1 has an eresting consequence: Let αβω be a step of a bottom-up parse Assume the next reduction is by X β hen ω is a string of terminals Why? Because αxω αβω is a step in a rightmost derivation Notation Idea: Split string o two substrings Right substring is as yet unexamined by parsing (a string of terminals) Left substring has terminals and non-terminals he dividing po is marked by a he is not part of the string Initially, all input is unexamined x 1 x 2... x n Prof. Aiken CS 143 Lecture 7 35 Prof. Aiken CS 143 Lecture 7 36 6
Shift-Reduce Parsing Bottom-up parsing uses only two kinds of actions: Shift Shift Shift: Move one place to the right Shifts a terminal to the left string ABC xyz ABCx yz Reduce Prof. Aiken CS 143 Lecture 7 37 Prof. Aiken CS 143 Lecture 7 38 Reduce he xample with Reductions Only Apply an inverse production at the right end of the left string If A xy is a production, then Cbxy ijk CbA ijk + + reduce reduce Prof. Aiken CS 143 Lecture 7 39 + + + reduce reduce reduce + Prof. Aiken CS 143 Lecture 7 40 he xample with Shift-Reduce Parsing A Shift-Reduce Parse in Detail (1) + + + + + + + + + + reduce reduce reduce reduce reduce + + + Prof. Aiken CS 143 Lecture 7 41 Prof. Aiken CS 143 Lecture 7 42 7
A Shift-Reduce Parse in Detail (2) + + A Shift-Reduce Parse in Detail (3) + + + + + Prof. Aiken CS 143 Lecture 7 43 Prof. Aiken CS 143 Lecture 7 44 A Shift-Reduce Parse in Detail (4) + + + + A Shift-Reduce Parse in Detail (5) + + + + + + + Prof. Aiken CS 143 Lecture 7 45 Prof. Aiken CS 143 Lecture 7 46 A Shift-Reduce Parse in Detail (6) A Shift-Reduce Parse in Detail (7) + + + + + + + + + + + + + + + Prof. Aiken CS 143 Lecture 7 47 Prof. Aiken CS 143 Lecture 7 48 8
A Shift-Reduce Parse in Detail (8) A Shift-Reduce Parse in Detail (9) + + + + + + + + + + + + + + + + + + + Prof. Aiken CS 143 Lecture 7 49 Prof. Aiken CS 143 Lecture 7 50 A Shift-Reduce Parse in Detail (10) A Shift-Reduce Parse in Detail (11) + + + + + + + + + + + Prof. Aiken CS 143 Lecture 7 51 + + + + + + + + + + + Prof. Aiken CS 143 Lecture 7 52 he Stack Conflicts Left string can be implemented by a stack op of the stack is the Shift pushes a terminal on the stack Reduce pops 0 or more symbols off of the stack (production rhs) and pushes a nonterminal on the stack (production lhs) In a given state, more than one action ( or reduce) may lead to a valid parse If it is legal to or reduce, there is a reduce conflict If it is legal to reduce by two different productions, there is a reduce-reduce conflict You will see such conflicts in your project! More next time... Prof. Aiken CS 143 Lecture 7 53 Prof. Aiken CS 143 Lecture 7 54 9