Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS 164 -- Berkley University 1
Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient Builds on ideas in top-down parsing Preferred method in practice Also called LR parsing L means that tokens are read left to right R means that it constructs a rightmost derivation! Prof. Bodik CS 164 Lecture 7-8 2
An Introductory xample LR parsers don t need left-factored grammars and can also handle left-recursive grammars Consider the following grammar: ( ) Why is this not LL(1)? Consider the string: ( ) ( ) Prof. Bodik CS 164 Lecture 7-8 3
The Idea LR parsing reduces a string to the start symbol by inverting productions: str: input string of terminals repeat Identify β in str such that A β is a production (i.e., str = α β γ) Replace β by A in str (i.e., str becomes α A γ) until str = S Prof. Bodik CS 164 Lecture 7-8 4
A Bottom-up Parse in Detail (1) () () ( ) ( ) Prof. Bodik CS 164 Lecture 7-8 5
A Bottom-up Parse in Detail (2) () () () () ( ) ( ) Prof. Bodik CS 164 Lecture 7-8 6
A Bottom-up Parse in Detail (3) () () () () () () ( ) ( ) Prof. Bodik CS 164 Lecture 7-8 7
A Bottom-up Parse in Detail (4) () () () () () () () ( ) ( ) Prof. Bodik CS 164 Lecture 7-8 8
A Bottom-up Parse in Detail (5) () () () () () () () () ( ) ( ) Prof. Bodik CS 164 Lecture 7-8 9
A Bottom-up Parse in Detail (6) () () () () () () () () A rightmost derivation in reverse ( ) ( ) Prof. Bodik CS 164 Lecture 7-8 10
Important Fact #1 Important Fact #1 about bottom-up parsing: An LR parser traces a rightmost derivation in reverse Prof. Bodik CS 164 Lecture 7-8 11
Where Do Reductions Happen Important Fact #1 has an eresting consequence: Let αβγ be a step of a bottom-up parse Assume the next reduction is by A β Then γ is a string of terminals! Why? Because αaγ αβγ is a step in a rightmost derivation. Prof. Bodik CS 164 Lecture 7-8 12
Notation Idea: Split string o two substrings Right substring (a string of terminals) is as yet unexamined by parser Left substring has terminals and non-terminals The dividing po is marked by a I The I is not part of the string Initially, all input is unexamined: Ix 1 x 2... x n Prof. Bodik CS 164 Lecture 7-8 13
Shift-Reduce Parsing Bottom-up parsing uses only two kinds of actions: Shift Reduce Prof. Bodik CS 164 Lecture 7-8 14
Shift Shift: Move I one place to the right Shifts a terminal to the left string (I ) ( I ) Prof. Bodik CS 164 Lecture 7-8 15
Reduce Reduce: Apply an inverse production at the right end of the left string If ( ) is a production, then ( ( ) I ) ( I ) Prof. Bodik CS 164 Lecture 7-8 16
Shift-Reduce xample I () ()$ shift ( ) ( )
Shift-Reduce xample I () ()$ shift I () ()$ red. ( ) ( )
Shift-Reduce xample I () ()$ shift I () ()$ red. I () ()$ shift 3 times ( ) ( )
Shift-Reduce xample I () ()$ shift I () ()$ red. I () ()$ shift 3 times ( I ) ()$ red. ( ) ( )
Shift-Reduce xample I () ()$ shift I () ()$ red. I () ()$ shift 3 times ( I ) ()$ ( I ) ()$ red. shift ( ) ( )
Shift-Reduce xample I () ()$ shift I () ()$ red. I () ()$ shift 3 times ( I ) ()$ ( I ) ()$ () I ()$ red. shift red. () ( ) ( )
Shift-Reduce xample I () ()$ shift I () ()$ red. I () ()$ shift 3 times ( I ) ()$ ( I ) ()$ () I ()$ I ()$ red. shift red. () shift 3 times ( ) ( )
Shift-Reduce xample I () ()$ shift I () ()$ red. I () ()$ shift 3 times ( I ) ()$ ( I ) ()$ () I ()$ I ()$ red. shift red. () shift 3 times ( I )$ red. ( ) ( )
Shift-Reduce xample I () ()$ shift I () ()$ red. I () ()$ shift 3 times ( I ) ()$ red. ( I ) ()$ shift () I ()$ red. () I ()$ shift 3 times ( I )$ red. ( I )$ shift ( ) ( )
Shift-Reduce xample I () ()$ shift I () ()$ red. I () ()$ shift 3 times ( I ) ()$ ( I ) ()$ () I ()$ I ()$ red. shift red. () shift 3 times ( I )$ ( I )$ red. shift () I $ red. () ( ) ( )
Shift-Reduce xample I () ()$ shift I () ()$ red. I () ()$ shift 3 times ( I ) ()$ ( I ) ()$ () I ()$ I ()$ red. shift red. () shift 3 times ( I )$ ( I )$ red. shift () I $ I $ red. () accept ( ) ( )
The Stack Left string can be implemented by a stack Top of the stack is the I Shift pushes a terminal on the stack Reduce pops 0 or more symbols off of the stack (production rhs) and pushes a nonterminal on the stack (production lhs) Prof. Bodik CS 164 Lecture 7-8 28
Key Issue: When to Shift or Reduce? Decide based on the stack s content and the lookahead Idea: use a finite automaton (DFA) to decide when to shift or reduce The DFA input is the stack s content The alphabet consists of terminals and non-terminals We run the DFA on the stack and we examine the resulting state X and the token tok after I If X has a transition labeled tok then shift If X is labeled with A β on tok then reduce Prof. Bodik CS 164 Lecture 7-8 29
accept on $ LR(1) Parsing. An xample () on $, 0 1 ( 2 3 4 7 8 ) 10 6 ( ) 5 11 on $, 9 on ), () on ), I () ()$ shift I () ()$ I () ()$ shift(x3) ( I ) ()$ ( I ) ()$ shift () I ()$ () I ()$ shift (x3) ( I )$ ( I )$ shift () I $ () I $ accept
Representing the DFA Parsers represent the DFA as a 2D table Lines correspond to DFA states Columns correspond to terminals and nonterminals Typically columns are split o: Those for terminals: action table Those for non-terminals: goto table Prof. Bodik CS 164 Lecture 7-8 31
Representing the DFA. xample The table for a fragment of our DFA: ( 3 4 6 ) 7 () on $, 5 on ), ( ) $ 3 s4 4 s5 g6 5 r -> r -> 6 s8 s7 7 r -> () r -> () Prof. Bodik CS 164 Lecture 7-8 32
A Hierarchy of Grammar Classes From Andrew Appel, Modern Compiler Implementation in Java Prof. Bodik CS 164 Lecture 7-8 33