Let s try building an SLR parsing table for another simple grammar: S XaY Y X by c Y X S XaY Y X by c Y X Canonical collection: I 0 : {S S, S XaY, S Y, X by, X c, Y X} I 1 : {S S } I 2 : {S X ay, Y X } I 3 : {S Y } I 4 : {X b Y, Y X, X by, X c} I 5 : {X c } I 6 : {S Xa Y, Y X, X by, X c} I 7 : {X by } I 8 : {Y X } I 9 : {S XaY } 1 2
I 0 : {S S, S XaY, S Y, X by, X c, Y X} I 1 : {S S } I 2 : {S X ay, Y X } I 3 : {S Y } I 4 : {X b Y, Y X, X by, X c} I 5 : {X c } I 6 : {S Xa Y, Y X, X by, X c} I 7 : {X by } I 8 : {Y X } I 9 : {S XaY } FOLLOW(S) = {$} FOLLOW(X) = {a, $} FOLLOW(Y ) = {a, $} action goto STATE a b c $ S X Y 0 1 2 3 4 5 6 7 8 9 3 I 0 : {S S, S XaY, S Y, X by, X c, Y X} I 1 : {S S } I 2 : {S X ay, Y X } I 3 : {S Y } I 4 : {X b Y, Y X, X by, X c} I 5 : {X c } I 6 : {S Xa Y, Y X, X by, X c} I 7 : {X by } I 8 : {Y X } I 9 : {S XaY } FOLLOW(S) = {$} FOLLOW(X) = {a, $} FOLLOW(Y ) = {a, $} action goto STATE a b c $ S X Y 0 s4 s5 1 2 3 1 acc 2 s6,r5? r5 3 r2 4 s4 s5 8 7 5 r4 r4 6 s4 s5 8 9 7 r3 r3 8 r5 r5 9 r1 4
action goto STATE a b c $ S X Y 0 s4 s5 1 2 3 1 acc 2 s6,r5? r5 3 r2 r2 4 s4 s5 8 7 5 r4 r4 6 s4 s5 8 9 7 r3 r3 8 r5 r5 9 r1 Let s try a parse: STACK INPUT ACTION 0 cac$ s5 0c5 ac$ r4 0X2 ac$ s6, not r5! 0X2a6 c$ s5 0X2a6c5 $ r4 0X2a6X8 $ r5 0X2a6Y9 $ r1 0S1 $ acc 5 action goto STATE a b c $ S X Y 0 s4 s5 1 2 3 1 acc 2 s6,r5? r5 3 r2 r2 4 s4 s5 8 7 5 r4 r4 6 s4 s5 8 9 7 r3 r3 8 r5 r5 9 r1 Let s try the alternative... STACK INPUT ACTION 0 cac$ s5 0c5 ac$ r4 0X2 ac$ try r5, instead of s6 0Y3 ac$ r2 0S1 ac$ error In LR parsing, the stack should always contain a viable prefix, and the stack plus lookahead (unless it s $) should be a prefix of a right-sentential form. Otherwise the parse will fail... 6
If we have X on the stack, with lookahead a, we had better not reduce X to Y : Y a is not a prefix of a right-sentential form, despite the fact that a FOLLOW(Y ). FOLLOW(Y ) = {a, $} I 0 : {S S, S XaY, S Y, X by, X c, Y X} I 1 : {S S } I 2 : {S X ay, Y X } I 3 : {S Y } I 4 : {X b Y, Y X, X by, X c} I 5 : {X c } I 6 : {S Xa Y, Y X, X by, X c} I 7 : {X by } I 8 : {Y X } I 9 : {S XaY } S XaY by ay So, what viable prefixes take the parser to state 2? X only! To get an a we must use production S XaY. Then to get a Y in front of that a, we must produce a Y from the X, which we can do only with production X by. (So we can t get substring Y a without a leading b.) (Notice that in this case we showed that Ba is not a prefix of any sentential form of course it follows that it is not a prefix of any right-sentential form.) Whenever the parser is in state 2 with lookahead a, the stack contains X (that is, 0X2), and reduction by 5 leaves us with stack Y (that is, 0Y 3) in which case the stack plus lookahead is not a prefix of any right-sentential form. So in state 2, with lookahead a, we should always shift and go to state 6. The SLR parsing table isn t adequate... But maybe there are still cases in which, in state 2 on lookahed a, the parser should reduce with production 5... The problem is with the use of FOLLOW sets to decide when to reduce. 7 8
Canonical LR parsing tables, LR(1) items The grammar in the previous example is not ambiguous, and can be parsed by the LR method, if only we can construct a more adequate parsing table. We ll look first at canonical LR parsing tables. Then LALR parsing tables, which are smaller and weaker (and are what yacc builds!). Both methods can handle the previous examples. We begin by enriching the notion of an item, to better take into account lookahead tokens. Definition An LR(1) item is a pair of an LR(0) item and a token or $, typically written [X α β, a]. LR(1) closure function Definition Given a set I of LR(1) items from an augmented grammar, closure(i) is the least set of items satisfying the following two conditions: 1. I closure(i) 2. If [X α Y β, a] closure(i) and there is an LR(1) item [Y γ, b] s.t. b FIRST(βa) then [Y γ, b] closure(i). Intuitively, the lookahead a has no effect if β ǫ, but if β = ǫ, then [X α β, a] calls for reduction by X αβ if the lookahead token is a. Thus, we will reduce by a production X α only on those lookaheads a for which there is an item [X α, a]. Claim: FIRST(βa) FOLLOW(Y ). (As elsewhere, I suspect this is not true, but it is suggestive. Why not true? Because the definition of FOLLOW is about existence of sentential forms, but this definition is not.) The set of such a s is always a subset of FOLLOW(X). 9 10
1. I closure(i) 2. If [X α Y β, a] closure(i) and there is an LR(1) item [Y γ, b] s.t. b FIRST(βa), then [Y γ, b] closure(i). Example What is closure({[s S, $]})? So we consider LR(1) items with first components S XaY and S Y. In this case, β = ǫ, so FIRST(β$) = {$}. So we add LR(1) items [S XaY, $] and [S Y, $]. Now consider X by and X c. Here β = ay, so FIRST(β$) = {a}. Thus we add items [X by, a] and [X c, a]. Now consider Y X, with β = ǫ. Since FIRST(β$) = {$}, we add item [Y X, $]. Now consider X by and X c, with β = ǫ. Since FIRST(β$) = {$}, we add items [X by, $] and [X c, $]. 11 1. I closure(i) 2. If [X α Y β, a] closure(i) and there is an LR(1) item [Y γ, b] s.t. b FIRST(βa), then [Y γ, b] closure(i). Example What is closure({[s X ay, $], [Y X, $]})? What is closure({[x b Y, a $]})? (stands for 2 LR(1) items) Consider LR(1) items with first component Y X. In this case, β = ǫ. Since FIRST(β$) = {$}, we add [Y X, $], and since FIRST(βa) = {a}, we add [Y X, a]. Now consider X by and X c. Here again β = ǫ, so FIRST(β$) = {$} and FIRST(βa) = {a}. Thus we add [X by, a $] and [X c, a $]. What is closure({[s Xa Y, $]})? 12
LR(1) goto function Construction of sets of LR(1) items Definition For any set I of LR(1) items and grammar symbol X, goto(i, X) is the closure of the set of all LR(1) items [A αx β, a] s.t. [A α Xβ, a] I. Example I 0 : {[S S, $], [S XaY, $], [S Y, $] [X by, a $], [X c, a $], [Y X, $]} goto(i 0, S) = closure({[s S, $]}) = {[S S, $]} goto(i 0, X) = closure({[s X ay, $], [Y X, $]}) = {[S X ay, $], [Y X, $]}) goto(i 0, Y ) = closure({[s Y, $]}) = {[S Y, $]} goto(i 0, b) = closure({[x b Y, a $]}) = {[X b Y, a $], [Y X, a $], [X by, a $], [X c, a $]} goto(i 0, c) = closure({[x c, a $]}) = {[X c, a $]} Here is the algorithm for construction of the canonical collection of sets of LR(1) items for an augmented grammar: C := { closure({[s S, $]}) }; repeat for each set of LR(1) items I C and each grammar symbol X s.t. goto(i, X) is not empty and is not already in C do add goto(i, X) to C until no more sets of items can be added to C in this way So this is the same construction we used for LR(0) items but now for LR(1) items, with the LR(1) versions of closure and goto, and the initial LR(1) item. 13 14
Construction of canonical LR(1) parsing table 1. Construct the canonical collection C = {I 0,...,I n } of sets of LR(0) items. Each element of C is a state, and we ll often write k to refer to the state I k. 2. The applicable parsing actions for state k and lookahead a are determined as follows: a) If goto(k, a) = j, then shift and go to state j is applicable. b) If [B α, a] k and B is not S, then reduce by production B α is applicable. c) If [S S, $] k and a = $, then accept is applicable. If any conflicting actions are generated in step 2, the grammar is not LR(1). 3. For all states k and variables A, if goto(k, A) is nonempty, then goto(k, A) is the corresponding goto entry. 4. All remaining undefined entries have action error. 5. If I k = closure({[s S, $]}), then k is the initial state of the parser. 15