Bottom-up parsing
General idea; LR(0); SLR; LR(1); LALR
To best exploit JavaCUP, we should understand its theoretical basis (LR parsing).

Top-down vs. bottom-up
Bottom-up is more powerful than top-down: it can process a larger class of grammars than LL (explained later).
Bottom-up parsers are too hard to write by hand, but JavaCUP (and yacc) generates a parser from a specification.
A bottom-up parser traces a rightmost derivation in reverse; a top-down parser traces a leftmost derivation.
Less grammar transformation is required, hence the grammar looks more natural.
Intuition: a bottom-up parser postpones the decision about which production rule to apply until it has seen more input than was available to a top-down parser (explained later).
Bottom-up parsing
Start with a string of terminals.
Build up from the leaves of the parse tree: apply productions backwards.
When we reach the start symbol and have exhausted the input, we are done.
Shift-reduce is a common bottom-up technique.
Example grammar: S → aABe, A → Abc | b, B → d
Reduce abbcde to S: abbcde, aAbcde, aAde, aABe, S
This is the reverse of the rightmost derivation S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde.
Notice that the d in aAbcde (step 2) should not yet be reduced to B.
How do we choose the right reduction steps?

Sentential form
Sentential form: any string that can be derived from the start symbol. It can consist of terminals and non-terminals.
Example (E → E+T | T, T → id): E ⇒ E+T ⇒ E+id ⇒ T+id ⇒ id+id
Sentential forms: E+id, T+id, ...
Right sentential form: a sentential form obtained by a rightmost derivation.
Sentence: a sentential form with no non-terminals; id+id is a sentence.
Handles
Grammar: S → aABe, A → Abc | b, B → d
S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde
Informally, a handle of a sentential form is a substring that can be reduced.
Abc is a handle of the right sentential form aAbcde, because A → Abc is a production, and after Abc is replaced by A, the resulting string aAde is still a right sentential form.
Is d a handle of aAbcde? No, because aAbcBe is not a right sentential form.
Formally, a handle of a right sentential form γ is a production A → β and a position in γ where the string β may be found and replaced by A.
If S ⇒*rm αAw ⇒rm αβw, then A → β in the position after α is a handle of αβw.
When the production A → β and the position are clear, we simply say that the substring β is a handle.

Handles in an expression example
E → T + E | T
T → int * T | int | (E)
Consider the string: int * int + int
The rightmost derivation: E ⇒rm T+E ⇒rm T+T ⇒rm T+int ⇒rm int*T+int ⇒rm int*int+int
For an unambiguous grammar, there is exactly one handle for each right sentential form.
The question is: how do we find the handle?
Observation: the substring to the right of a handle contains only terminal symbols.
Shift-reduce parsing
Break the input string into two parts: an undigested part and a semi-digested part.
The left part of the input has been partly processed; the right part is completely unprocessed.
Example: int foo (double n) { return (int) n+1 ; } splits into a shifted, partly reduced prefix and a so-far-unprocessed suffix.
Use a stack to keep track of the tokens seen so far and the rules already applied backwards (reductions).
Shift the next input token onto the stack.
When the top of the stack contains a good right-hand side of a production, reduce by that rule.
Important fact: the handle is always at the top of the stack.

Shift-reduce main loop
Shift: if we can't perform a reduction and there are tokens remaining in the unprocessed input, transfer a token from the input onto the stack.
Reduce: if we can find a rule A → α and the contents of the stack are βα for some β (β may be empty), reduce the stack to βA. The α is called a handle. Recognizing handles is key!
Accept: S is the only symbol on the stack and the input is now empty; done.
Error: all other cases.
Example 1
Grammar: S → E, E → T | E + T, T → id | (E)
Input string: (id)

Parse stack | Remaining input | Parser action
            | (id)$           | Shift parenthesis onto stack
(           | id)$            | Shift id onto stack
(id         | )$              | Reduce: T → id (pop RHS of production, push LHS, input unchanged)
(T          | )$              | Reduce: E → T
(E          | )$              | Shift right parenthesis
(E)         | $               | Reduce: T → (E)
T           | $               | Reduce: E → T
E           | $               | Reduce: S → E
S           | $               | Done: accept

S ⇒rm E ⇒rm T ⇒rm (E) ⇒rm (T) ⇒rm (id)

Shift-reduce example 2
Grammar: S → E, E → T | E + T, T → id | (E)
Input: (id + id)

Parse stack | Remaining input | Action
            | (id + id)$      | Shift (
(           | id + id)$       | Shift id
(id         | + id)$          | Reduce T → id
(T          | + id)$          | Reduce E → T
(E          | + id)$          | Shift +
(E+         | id)$            | Shift id
(E+id       | )$              | Reduce T → id
(E+T        | )$              | Reduce E → E+T (ignore E → T)
(E          | )$              | Shift )
(E)         | $               | Reduce T → (E)
T           | $               | Reduce E → T
E           | $               | Reduce S → E
S           | $               | Accept

Sequence of reductions: (id+id), (T+id), (E+id), (E+T), (E), T, E, S
Note that it is the reverse of the following rightmost derivation:
S ⇒rm E ⇒rm T ⇒rm (E) ⇒rm (E+T) ⇒rm (E+id) ⇒rm (T+id) ⇒rm (id+id)
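The traces above can be replayed with a small Python sketch. It is a naive shift-reduce loop, not an LR parser: the hypothetical helper find_reduction picks a handle with an ad hoc rule (prefer the longest right-hand side that matches the top of the stack, and reduce S → E only when the input is exhausted). That heuristic happens to work for this small grammar but is not a general way to recognize handles; the LR tables introduced below replace it.

```python
# Grammar of Examples 1 and 2: S -> E, E -> E + T | T, T -> id | ( E )
GRAMMAR = [
    ("S", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]


def find_reduction(stack, input_done):
    """Ad hoc handle choice (hypothetical, not general): the longest
    right-hand side matching the stack top; S -> E only at the very end."""
    best = None
    for lhs, rhs in GRAMMAR:
        n = len(rhs)
        if n <= len(stack) and tuple(stack[-n:]) == rhs:
            if lhs == "S" and not (input_done and len(stack) == 1):
                continue
            if best is None or n > len(best[1]):
                best = (lhs, rhs)
    return best


def shift_reduce(tokens):
    """Naive shift-reduce loop; returns the list of actions taken."""
    stack, actions, pos = [], [], 0
    while True:
        rule = find_reduction(stack, pos == len(tokens))
        if rule is not None:                   # Reduce: pop the handle, push the LHS
            lhs, rhs = rule
            del stack[-len(rhs):]
            stack.append(lhs)
            actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
        elif pos < len(tokens):                # Shift the next token
            stack.append(tokens[pos])
            actions.append(f"shift {tokens[pos]}")
            pos += 1
        elif stack == ["S"]:                   # Accept: only S left, input empty
            actions.append("accept")
            return actions
        else:                                  # Error: anything else
            actions.append("error")
            return actions
```

For example, shift_reduce(["(", "id", "+", "id", ")"]) returns the same 13 actions as Example 2, ending in "accept".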
Conflicts during shift-reduce parsing
Reduce/reduce conflict:
stack: ... (E+T    input: ...
Which rule should we use, E → E+T or E → T?
Shift/reduce conflict:
ifStat → if (E) S | if (E) S else S
stack: ... if (E) S    input: else ...
Both reduce and shift are applicable. What should we do next, reduce or shift?

LR(k) parsing
Left-to-right, Rightmost derivation with k-token lookahead.
L: left-to-right scanning of the input
R: constructing a rightmost derivation in reverse
k: number of input symbols used to select a parser action
Most general parsing technique for deterministic grammars.
Efficient, table-based parsing.
Parses by shift-reduce.
Requires less grammar massaging than LL(1).
Can handle almost all programming language structures.
(Diagram: LL ⊂ LR ⊂ CFG)
In general, not practical: tables too large (10^6 states for C++, Ada).
Common subsets: SLR, LALR(1).
LR parsing (continued)
Data structures:
Stack of states {s}
Action table Action[s, a], a ∈ T
Goto table Goto[s, X], X ∈ N
In LR parsing, we push whole states on the stack.
The stack of states keeps track of what we've seen so far (left context): what we've shifted and reduced, and by which rules.
Use the Action table to decide shift vs. reduce.
Use the Goto table to move to a new state.

Main loop of the LR parser
The initial state S0 starts on top of the stack.
Given state St on top of the stack and the next input token a:
If Action[St, a] == shift Si:
    push new state Si onto the stack
    call yylex to get the next token
If Action[St, a] == reduce by Y → X1...Xn:
    pop off n states to find Su on top of the stack
    push new state Sv = Goto[Su, Y] onto the stack
If Action[St, a] == accept: done!
If Action[St, a] == error: we can't continue to a successful parse.
Example LR parse table
Grammar: (1) E → E + T  (2) E → T  (3) T → (E)  (4) T → id
Action columns: id, +, (, ), $; Goto columns: E, T (state on top of stack in the first column)

State | id | +  | (  | )  | $      | E | T
0     | S4 |    | S3 |    |        | 1 | 2
1     |    | S5 |    |    | accept |   |
2     | R2 | R2 | R2 | R2 | R2     |   |
3     | S4 |    | S3 |    |        | 6 | 2
4     | R4 | R4 | R4 | R4 | R4     |   |
5     | S4 |    | S3 |    |        |   | 8
6     |    | S5 |    | S7 |        |   |
7     | R3 | R3 | R3 | R3 | R3     |   |
8     | R1 | R1 | R1 | R1 | R1     |   |

If Action[St, a] == shift, push the new state Action[St, a] onto the stack and call yylex to get the next token.
If Action[St, a] == reduce by Y → X1...Xn, pop off n states to find Su on top of the stack and push the new state Sv = Goto[Su, Y] onto the stack.
We explain how to construct this table later.

Trace on the input id + (id) with the same table and grammar:
State stack    | Remaining input | Parser action
S0             | id + (id)$      | Shift S4 onto state stack, move head in input
S0S4           | + (id)$         | Reduce 4) T → id, pop state stack, goto S2, input unchanged
S0S2           | + (id)$         | Reduce 2) E → T, goto S1
S0S1           | + (id)$         | Shift S5
S0S1S5         | (id)$           | Shift S3
S0S1S5S3       | id)$            | Shift S4 (saw another id)
S0S1S5S3S4     | )$              | Reduce 4) T → id, goto S2
S0S1S5S3S2     | )$              | Reduce 2) E → T, goto S6
S0S1S5S3S6     | )$              | Shift S7
S0S1S5S3S6S7   | $               | Reduce 3) T → (E), goto S8
S0S1S5S8       | $               | Reduce 1) E → E + T, goto S1
S0S1           | $               | Accept
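A minimal Python sketch of the table-driven main loop, with the example table above typed in by hand. The names ACTION, GOTO, RULES, and lr_parse are illustrative only (not JavaCUP's), and a plain token list stands in for yylex.

```python
# Grammar: (1) E -> E + T  (2) E -> T  (3) T -> ( E )  (4) T -> id
# Each rule maps to (left-hand side, length of right-hand side).
RULES = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1)}
TERMINALS = ["id", "+", "(", ")", "$"]

# ACTION table from the slide: ("s", j) = shift j, ("r", k) = reduce by rule k
ACTION = {
    (0, "id"): ("s", 4), (0, "("): ("s", 3),
    (1, "+"): ("s", 5), (1, "$"): ("acc",),
    (3, "id"): ("s", 4), (3, "("): ("s", 3),
    (5, "id"): ("s", 4), (5, "("): ("s", 3),
    (6, "+"): ("s", 5), (6, ")"): ("s", 7),
}
for state, rule in [(2, 2), (4, 4), (7, 3), (8, 1)]:
    for t in TERMINALS:
        ACTION[(state, t)] = ("r", rule)   # LR(0): reduce on every input

GOTO = {(0, "E"): 1, (0, "T"): 2, (3, "E"): 6, (3, "T"): 2, (5, "T"): 8}


def lr_parse(tokens):
    """Table-driven LR parse; returns (accepted, list of actions)."""
    tokens = tokens + ["$"]
    stack, pos, trace = [0], 0, []
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:                      # empty cell: syntax error
            trace.append("error")
            return False, trace
        if act[0] == "s":
            stack.append(act[1])             # push new state, advance input
            pos += 1
            trace.append(f"shift {act[1]}")
        elif act[0] == "r":
            lhs, n = RULES[act[1]]
            del stack[-n:]                   # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])
            trace.append(f"reduce {act[1]}")
        else:
            trace.append("accept")
            return True, trace
```

lr_parse(["id", "+", "(", "id", ")"]) reproduces the 12 actions of the trace above. On the erroneous input id + + it stops with "error" right after "shift 5", because the cell Action[5, +] is empty.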
Types of LR parsers
LR(k)
SLR(k): Simple LR
LALR(k): LookAhead LR
k = number of lookahead symbols; 0 or 1 in this class (the Dragon Book covers the general case).
Start with the simplest: the LR(0) parser.

LR(0) parser
Advantages: simplest to understand; smallest tables.
Disadvantages: no lookahead, so too simple-minded for real parsers.
It is a good case for seeing how to build tables, though; we'll use LR(0) constructions in the other LR(k) parsers.
The key to LR parsing is recognizing handles.
Handle: a sequence of symbols, encoded in the top stack states, representing the right-hand side of the rule we want to reduce by.
LR tables
Given grammar G, identify the possible states of the parser.
States encapsulate what we've seen and shifted and what has been reduced so far.
Steps to construct an LR table:
1. Construct states using LR(0) configurations (or items).
2. Figure out the transitions between states.

Configuration
A configuration (or item) is a rule of G with a dot in the right-hand side.
If the rule A → XYZ is in the grammar, then its configurations are:
A → ·XYZ    A → X·YZ    A → XY·Z    A → XYZ·
The dot represents what the parser has gotten on the stack in recognizing the production.
A → XYZ· means XYZ is on the stack: reduce!
A → X·YZ means X has been shifted. To continue the parse, we must see a token that could begin a string derivable from Y.
Notational convention:
X, Y, Z: a symbol, either terminal or non-terminal
a, b, c: terminals
α, β, γ: sequences of terminals or non-terminals
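As a quick illustration in Python (items_of is a hypothetical helper), the configurations of a production are just its right-hand side with the dot inserted at every position, so a right-hand side of length n gives n + 1 items.

```python
def items_of(lhs, rhs):
    """All LR(0) items (configurations) for lhs -> rhs.

    rhs is a list of grammar symbols; '.' marks the dot position."""
    return [
        f"{lhs} -> {' '.join(rhs[:i] + ['.'] + rhs[i:])}"
        for i in range(len(rhs) + 1)
    ]


print(items_of("A", ["X", "Y", "Z"]))
# ['A -> . X Y Z', 'A -> X . Y Z', 'A -> X Y . Z', 'A -> X Y Z .']
```

An empty right-hand side (an ε-production) has exactly one item, A → ·, which is already complete.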
Set of configurations
A → X·YZ means X has been shifted. To continue the parse, we must see a token that could begin a string derivable from Y.
That is, we need to see a token in First(Y) (or in Follow(Y) if Y ⇒* ε).
Formally, we need to see a token t such that Y ⇒* tβ for some β.
Suppose Y → α | β is also in G. Then these configurations correspond to the same parse state:
A → X·YZ
Y → ·α
Y → ·β
Since the above configurations represent the same state, we can:
put them into a set together;
add all other equivalent configurations to achieve closure (algorithm later).
This set represents one parser state: a state the parser can be in while parsing a string.

Transitions between states
The parser goes from one state to another based on the symbols processed:
{A → X·YZ, Y → ·α, Y → ·β} --Y--> {A → XY·Z}
Model the parse as a finite automaton!
When a state (configuration set) has a dot at the end of an item, it is an FA accept state.
Build the LR(0) parser based on this FA.
Constructing item sets and closure
Starting configuration:
Augment the grammar with a new symbol S'.
Add the production S' → S to the grammar.
The initial item set I0 gets S' → ·S.
Perform closure on S' → ·S (that completes the parser start state).
Compute the successor function to make the next state (next item set).

Computing closure
Closure(I):
1. Initially, every item in I is added to closure(I).
2. If A → α·Bβ is in closure(I), then for all productions B → γ, add B → ·γ.
3. Repeat step 2 until the set gets no more additions.
Example (grammar: (1) E → E + T (2) E → T (3) T → (E) (4) T → id)
What is the closure of {E → ·E+T}?
E → ·E + T   by rule 1
E → ·T       by rule 2
T → ·(E)     by rules 2 and 3
T → ·id      by rules 2 and 3
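A direct Python transcription of steps 1 to 3, for the expression grammar plus the augmented rule E' → E. Items are encoded as (production index, dot position); this encoding and the helper show are assumptions of the sketch, not part of any tool.

```python
# (0) E' -> E  (1) E -> E + T  (2) E -> T  (3) T -> ( E )  (4) T -> id
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("(", "E", ")")),
    ("T", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure; an item is (production index, dot position)."""
    result = set(items)                       # step 1: keep every item of I
    changed = True
    while changed:                            # step 3: repeat until no additions
        changed = False
        for prod, dot in list(result):
            rhs = GRAMMAR[prod][1]
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for j, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot] and (j, 0) not in result:
                        result.add((j, 0))    # step 2: add B -> .gamma
                        changed = True
    return frozenset(result)


def show(item):
    """Render an item as text, with '.' as the dot."""
    lhs, rhs = GRAMMAR[item[0]]
    return f"{lhs} -> {' '.join(rhs[:item[1]] + ('.',) + rhs[item[1]:])}"
```

closure({(1, 0)}) yields exactly the four items listed in the example, and closure({(0, 0)}) adds E' → ·E to them, giving the start state I0.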
Building state transitions
LR tables need to know what state to go to after a shift or reduce.
Given a set C and a symbol X, we define the set C' = Successor(C, X) as:
For each configuration in C of the form Y → α·Xβ:
1. Add Y → αX·β to C'.
2. Do closure on C'.
Informally: move by symbol X from one item set to another; move the dot to the right of X in all items where the dot is before X; remove all other items; compute the closure.
C --X--> C'

Successor example
Grammar: (1) E → E + T (2) E → T (3) T → (E) (4) T → id
Given I = {E → ·E + T, E → ·T, T → ·(E), T → ·id}
What is successor(I, "(")?
Move the dot after "(": T → (·E)
Compute the closure:
T → (·E)
E → ·E + T
E → ·T
T → ·(E)
T → ·id
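The successor function in the same sketch style (Python). The item encoding (production index, dot position) is an assumption, and closure is repeated here in worklist form so the block runs on its own.

```python
# (0) E' -> E  (1) E -> E + T  (2) E -> T  (3) T -> ( E )  (4) T -> id
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("(", "E", ")")),
    ("T", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure with a worklist; an item is (production index, dot)."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for j, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (j, 0) not in result:
                    result.add((j, 0))
                    work.append((j, 0))
    return frozenset(result)


def successor(items, symbol):
    """Move the dot over `symbol` in every item that allows it (dropping all
    other items), then take the closure."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)
```

Starting from I = closure({(1, 0)}), successor(I, "(") gives exactly the five items of the example; on a symbol that no item expects, such as "+", the result is the empty set.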
Construct the LR(0) table
Construct F = {I0, I1, I2, ..., In}.
State i is determined by Ii. The parsing actions for state i are:
If A → α· is in Ii, then set Action[i, a] to reduce A → α for all inputs a (if A is not S').
If S' → S· is in Ii, then set Action[i, $] to accept.
If A → α·aβ is in Ii and successor(Ii, a) = Ij, then set Action[i, a] to shift j (a is a terminal).
The goto transitions for state i are constructed for all non-terminals A using the rule: if successor(Ii, A) = Ij, then Goto[i, A] = j.
All entries not defined by the above rules are errors.
The initial state I0 is the one constructed from S' → ·S.

Steps of constructing the LR(0) table
1. Augment the grammar.
2. Draw the transition diagram:
   2.1 compute the configuration sets (item sets/states);
   2.2 compute the successors.
3. Fill in the Action table and the Goto table.
Grammar: (0) E' → E (1) E → E + T (2) E → T (3) T → (E) (4) T → id
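Putting closure and successor together, the following Python sketch builds the LR(0) states and fills in Action and Goto by the rules above. Action cells are lists, so a conflict would show up as a cell with two entries. The exploration order (sorted symbols) is an arbitrary choice of this sketch, so its state numbers need not match the numbering on the slides, although the number of states (nine) does.

```python
# (0) E' -> E  (1) E -> E + T  (2) E -> T  (3) T -> ( E )  (4) T -> id
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("(", "E", ")")),
    ("T", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
TERMINALS = {s for _, rhs in GRAMMAR for s in rhs if s not in NONTERMINALS} | {"$"}


def closure(items):
    """LR(0) closure; an item is (production index, dot position)."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for j, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (j, 0) not in result:
                    result.add((j, 0))
                    work.append((j, 0))
    return frozenset(result)


def successor(items, symbol):
    return closure({(p, d + 1) for p, d in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})


def build_lr0():
    """Returns (states, action, goto); a list cell with 2+ entries is a conflict."""
    states = [closure({(0, 0)})]          # I0 = closure of E' -> .E
    action, goto = {}, {}
    i = 0
    while i < len(states):
        for prod, dot in sorted(states[i]):
            lhs, rhs = GRAMMAR[prod]
            if dot < len(rhs):
                continue
            if prod == 0:                 # E' -> E. : accept on $
                action.setdefault((i, "$"), []).append("acc")
            else:                         # A -> alpha. : reduce on every input
                for t in sorted(TERMINALS):
                    action.setdefault((i, t), []).append(f"r{prod}")
        next_symbols = {GRAMMAR[p][1][d] for p, d in states[i] if d < len(GRAMMAR[p][1])}
        for x in sorted(next_symbols):
            target = successor(states[i], x)
            if target not in states:
                states.append(target)
            j = states.index(target)
            if x in NONTERMINALS:
                goto[(i, x)] = j          # Goto entry for a non-terminal
            else:
                action.setdefault((i, x), []).append(f"s{j}")
        i += 1
    return states, action, goto
```

For this grammar every Action cell ends up with exactly one entry, which is another way of saying the grammar is LR(0).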
Item sets example
Configuration set     | Successor
I0: E' → ·E           | I1
    E → ·E + T        | I1
    E → ·T            | I2
    T → ·(E)          | I3
    T → ·id           | I4
I1: E' → E·           | Accept (dot at end of E' rule)
    E → E· + T        | I5
I2: E → T·            | Reduce 2 (dot at end)
I3: T → (·E)          | I6
    E → ·E + T        | I6
    E → ·T            | I2
    T → ·(E)          | I3
    T → ·id           | I4
I4: T → id·           | Reduce 4 (dot at end)
I5: E → E + ·T        | I8
    T → ·(E)          | I3
    T → ·id           | I4
I6: T → (E·)          | I7
    E → E· + T        | I5
I7: T → (E)·          | Reduce 3 (dot at end)
I8: E → E + T·        | Reduce 1 (dot at end)

Transition diagram
(Figure: the LR(0) automaton with states I0 to I8 and the edges listed in the Successor column above.)
The parsing table
State | id | +  | (  | )  | $      | E | T
0     | S4 |    | S3 |    |        | 1 | 2
1     |    | S5 |    |    | accept |   |
2     | R2 | R2 | R2 | R2 | R2     |   |
3     | S4 |    | S3 |    |        | 6 | 2
4     | R4 | R4 | R4 | R4 | R4     |   |
5     | S4 |    | S3 |    |        |   | 8
6     |    | S5 |    | S7 |        |   |
7     | R3 | R3 | R3 | R3 | R3     |   |
8     | R1 | R1 | R1 | R1 | R1     |   |

Parsing an erroneous input
Grammar: (0) E' → E (1) E → E + T (2) E → T (3) T → (E) (4) T → id
State stack | Input    | Parser action
S0          | id + +$  | Shift S4
S0 S4       | + +$     | Reduce 4) T → id, pop S4, Goto S2
S0 S2       | + +$     | Reduce 2) E → T, pop S2, Goto S1
S0 S1       | + +$     | Push S5
S0 S1 S5    | +$       | No Action[S5, +]: error!
Subset construction and closure
(Figure: item sets I0 to I4 for a small grammar over S, built by subset construction and closure.)

LR(0) grammar
A grammar is LR(0) if the following two conditions hold:
1. For any configuration set containing the item A → α·aβ, there is no complete item B → γ· in that set.
   No shift/reduce conflict in any state of the table: for each state, either shift or reduce.
2. There is at most one complete item A → α· in each configuration set.
   No reduce/reduce conflict in the table: for each state, use the same reduction rule for every input symbol.
Very few grammars meet the requirements to be LR(0).
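The two LR(0) conditions can be checked state by state. In this Python sketch an item is (lhs, rhs, dot) and lr0_conflicts is a hypothetical helper. The first test state is {T → id·, T → id·[E]}, the state reached on id once the rule T → id[E] is added to the expression grammar; the second is {T → id·, V → id·}.

```python
def lr0_conflicts(state, nonterminals):
    """Check the two LR(0) conditions for one item set.

    Items are (lhs, rhs_tuple, dot). Returns the list of violated conditions."""
    complete = [(lhs, rhs) for lhs, rhs, dot in state if dot == len(rhs)]
    # Condition 1 is about items A -> alpha . a beta with a terminal after the dot
    shifts = [rhs[dot] for lhs, rhs, dot in state
              if dot < len(rhs) and rhs[dot] not in nonterminals]
    problems = []
    if complete and shifts:
        problems.append("shift/reduce")    # condition 1 violated
    if len(complete) > 1:
        problems.append("reduce/reduce")   # condition 2 violated
    return problems
```

A state like {E → T·} satisfies both conditions, so the function returns an empty list for it.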
(Figure, incomplete diagram: LR(0) item sets I0 to I11 for the grammar E' → E, E → E + T | T, T → id | (E) | id[E]. For example, I4 = successor(I0, id) = {T → id·, T → id·[E]}.)

SLR parse table (incomplete)
Grammar: (0) E' → E (1) E → E + T (2) E → T (3) T → id (4) T → (E) (5) T → id[E]
State | id | +  | (  | )  | $      | [  | ]   | E  | T
0     | S4 |    | S3 |    |        |    |     | 1  | 2
1     |    | S5 |    |    | accept |    |     |    |
2     |    | R2 |    | R2 | R2     |    | R2  |    |
3     | S4 |    | S3 |    |        |    |     | 6  | 2
4     |    | R3 |    | R3 | R3     | S9 | R3  |    |
5     | S4 |    | S3 |    |        |    |     |    | 8
6     |    | S5 |    | S7 |        |    |     |    |
7     |    | R4 |    | R4 | R4     |    | R4  |    |
8     |    | R1 |    | R1 | R1     |    | R1  |    |
9     | S4 |    | S3 |    |        |    |     | 10 | 2
10    |    | S5 |    |    |        |    | S11 |    |
11    |    | R5 |    | R5 | R5     |    | R5  |    |
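Parsing id[id] + id with the SLR table (each state is shown with the grammar symbol it was entered on):
State stack              | Input             | Parser action
S0                       | id [ id ] + id $  | Shift S4
S0 S4id                  | [ id ] + id $     | Shift S9
S0 S4id S9[              | id ] + id $       | Shift S4
S0 S4id S9[ S4id         | ] + id $          | Reduce T → id
S0 S4id S9[ S2T          | ] + id $          | Reduce E → T
S0 S4id S9[ S10E         | ] + id $          | Shift S11
S0 S4id S9[ S10E S11]    | + id $            | Reduce T → id[E]
S0 S2T                   | + id $            | Reduce E → T
S0 S1E                   | + id $            | Shift S5
S0 S1E S5+               | id $              | Shift S4
S0 S1E S5+ S4id          | $                 | Reduce T → id
S0 S1E S5+ S8T           | $                 | Reduce E → E + T
S0 S1E                   | $                 | Accept

Adding assignments (LR(0) item sets)
Grammar: E' → E, E → E + T | T | V = E, T → id | (E) | id[E], V → id
I0: E' → ·E, E → ·E + T, E → ·T, E → ·V = E, T → ·(E), T → ·id, T → ·id[E], V → ·id
I1 = successor(I0, E): E' → E·, E → E· + T
I2 = successor(I0, T): E → T·
I4 = successor(I0, id): T → id·, T → id·[E], V → id·
Shift/reduce conflict: T → id· vs. T → id·[E]
Reduce/reduce conflict: T → id· vs. V → id·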
SLR parse table (incomplete)
Grammar: (0) E' → E (1) E → E + T (2) E → T (3) E → V = E (4) T → id (5) T → (E) (6) T → id[E] (7) V → id
State | id | +  | (  | )  | $      | [  | ]  | E | T
0     | S4 |    | S3 |    |        |    |    | 1 | 2
1     |    | S5 |    |    | accept |    |    |   |
2     |    | R2 |    | R2 | R2     |    | R2 |   |
3     | S4 |    | S3 |    |        |    |    | 6 | 2
4     |    | R4 |    | R4 | R4     | S9 | R4 |   |
5     | S4 |    | S3 |    |        |    |    |   | 8
6     |    | S5 |    | S7 |        |    |    |   |
7     |    | R5 |    | R5 | R5     |    | R5 |   |
8     |    | R1 |    | R1 | R1     |    | R1 |   |
9     |    |    |    |    |        |    |    |   |
10    |    |    |    |    |        |    |    |   |
11    |    |    |    |    |        |    |    |   |

LR(0) key points
Construct the LR table:
Start with the augmented grammar (S' → S).
Generate items from productions: insert the dot into all positions.
Generate item sets (or configuration sets) from items; they are our parser states.
Generate state transitions from the function successor(state, symbol).
Build the Action and Goto tables from the states and transitions.
The tables implement a shift-reduce parser.
View [states and transitions] as a finite automaton.
An item represents how far the parser is in recognizing part of one rule's RHS.
An item set combines the various paths the parser might have taken so far, which diverge as more input is parsed.
LR(0) grammars are the easiest LR grammars to understand, but too simple to use in real-life parsing.
Simple LR(1) parsing: SLR
LR(0): one LR(0) state mustn't have both shift and reduce items, or two reduce items. So any complete item (dot at end) must be in its own state; the parser will always reduce when in this state.
SLR: peek ahead at the input to see if a reduction is appropriate. Before reducing by rule A → XYZ, see if the next token is in Follow(A). Reduce only in that case; otherwise, shift.

Construction of SLR tables
1. Construct F = {I0, I1, ..., In}, the LR(0) item sets.
2. State i is Ii. The parsing actions for the state are:
   a) If A → α· is in Ii, then set Action[i, a] to reduce A → α for all a in Follow(A) (A is not S').
   b) If S' → S· is in Ii, then set Action[i, $] to accept.
   c) If A → α·aβ is in Ii and successor(Ii, a) = Ij, then set Action[i, a] to shift j (a must be a terminal).
3. The goto transitions for state i are constructed for all non-terminals A using the rule: if successor(Ii, A) = Ij, then Goto[i, A] = j.
4. All entries not defined by rules 2 and 3 are errors.
5. The initial state is the closure of the set with item S' → ·S.
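SLR needs Follow sets. A compact Python sketch of the usual fixed-point computation of First, nullable, and Follow, applied to the grammar with T → id[E] used in these slides (the function names are illustrative):

```python
# (0) E' -> E  (1) E -> E + T  (2) E -> T  (3) T -> id  (4) T -> ( E )  (5) T -> id [ E ]
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
    ("T", ("id", "[", "E", "]")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
START = "E'"


def first_sets():
    """First sets of the non-terminals plus the set of nullable non-terminals."""
    first = {a: set() for a in NONTERMINALS}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            before = (len(first[lhs]), lhs in nullable)
            for sym in rhs:
                if sym in NONTERMINALS:
                    first[lhs] |= first[sym]
                    if sym not in nullable:
                        break
                else:                      # a terminal starts the rest of rhs
                    first[lhs].add(sym)
                    break
            else:                          # every symbol of rhs can vanish
                nullable.add(lhs)
            if (len(first[lhs]), lhs in nullable) != before:
                changed = True
    return first, nullable


def follow_sets():
    first, nullable = first_sets()
    follow = {a: set() for a in NONTERMINALS}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            trailer = set(follow[lhs])     # what can follow the suffix seen so far
            for sym in reversed(rhs):
                if sym in NONTERMINALS:
                    if not trailer <= follow[sym]:
                        follow[sym] |= trailer
                        changed = True
                    trailer = trailer | first[sym] if sym in nullable else set(first[sym])
                else:
                    trailer = {sym}
    return follow
```

follow_sets()["T"] evaluates to {"+", ")", "]", "$"}, the Follow(T) used in the SLR example.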
Properties of SLR
The pickier rule for setting Action table entries is the only difference from LR(0) tables.
If G is SLR, it is unambiguous, but not vice versa.
A state can have both shift and reduce items, as long as the terminals shifted in that state are not in the Follow set of the non-terminal being reduced.

SLR example
Item set I0 and successor(I0, id):
I0: E' → ·E, E → ·E + T, E → ·T, T → ·(E), T → ·id, T → ·id[E]
successor(I0, id): T → id·, T → id·[E]
An LR(0) parser sees both a shift and a reduce, but an SLR parser consults the Follow set:
Follow(T) = {+, ), ], $}
so T → id· means reduce on + or ) or ] or $,
and T → id·[E] means shift otherwise (e.g. on [).
SLR example 2
I0: E' → ·E, E → ·E + T, E → ·T, E → ·V = E, T → ·(E), T → ·id, V → ·id
successor(I0, id): T → id·, V → id·
Two complete LR(0) items, so a reduce/reduce conflict for LR(0), but:
Follow(T) = {+, ), $}
Follow(V) = {=}
These are disjoint, so there is no conflict: separate Action entries in the table.

SLR grammar
A grammar is SLR if the following two conditions hold:
If items A → α·aβ and B → γ· are in a state, then terminal a ∉ Follow(B).
No shift/reduce conflict in any state. This means the successor function for a from that set either shifts to a new state or reduces, but not both.
For any two complete items A → α· and B → β· in a state, the Follow sets must be disjoint (Follow(A) ∩ Follow(B) is empty).
No reduce/reduce conflict in any state. If more than one non-terminal could be reduced from this set, it must be possible to uniquely determine which using only one token of lookahead.
Compare with the LR(0) grammar:
1. For any configuration set containing the item A → α·aβ, there is no complete item B → γ· in that set.
2. There is at most one complete item A → α· in each configuration set.
Note that LR(0) ⊂ SLR.
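SLR example
1. S' → S
2. S → dca
3. S → dAb
4. A → c
In S3 there is a reduce/shift conflict for LR(0): it can be R4 or shift. By looking at the Follow set of A, the conflict is removed.
S0: S' → ·S, S → ·dca, S → ·dAb
S1: S' → S·
S2: S → d·ca, S → d·Ab, A → ·c
S3: S → dc·a, A → c·
S4: S → dA·b
S5: S → dca·
S6: S → dAb·
Transitions: S0 --S--> S1, S0 --d--> S2, S2 --c--> S3, S2 --A--> S4, S3 --a--> S5, S4 --b--> S6

State | a  | b  | c  | d  | $      | S | A
S0    |    |    |    | S2 |        | 1 |
S1    |    |    |    |    | accept |   |
S2    |    |    | S3 |    |        |   | 4
S3    | S5 | R4 |    |    |        |   |
S4    |    | S6 |    |    |        |   |
S5    |    |    |    |    | R2     |   |
S6    |    |    |    |    | R3     |   |

Parse trace
State stack        | Input | Parser action
S0                 | dca$  | Shift S2
S0 S2d             | ca$   | Shift S3
S0 S2d S3c         | a$    | Shift S5
S0 S2d S3c S5a     | $     | Reduce 2
S0 S1S             | $     | Accept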
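Non-SLR example
1. S' → S
2. S → dca
3. S → dAb
4. S → Aa
5. A → c
S0: S' → ·S, S → ·dca, S → ·dAb, S → ·Aa, A → ·c
S1: S' → S·
S2: S → d·ca, S → d·Ab, A → ·c
S3: S → dc·a, A → c·
S4: S → dA·b
S5: S → dca·
S6: S → dAb·
S7: S → A·a
S8: S → Aa·
S9: A → c·
Transitions: S0 --S--> S1, S0 --d--> S2, S0 --A--> S7, S0 --c--> S9, S2 --c--> S3, S2 --A--> S4, S3 --a--> S5, S4 --b--> S6, S7 --a--> S8
S3 has a shift/reduce conflict. Follow(A) contains both a and b, so under column a we still don't know whether to reduce or shift.

The conflict in the SLR parsing table
State | a     | b  | c  | d  | $      | S | A
S0    |       |    | S9 | S2 |        | 1 | 7
S1    |       |    |    |    | accept |   |
S2    |       |    | S3 |    |        |   | 4
S3    | S5/R5 | R5 |    |    |        |   |
S4    |       | S6 |    |    |        |   |
S5    |       |    |    |    | R2     |   |
S6    |       |    |    |    | R3     |   |
S7    | S8    |    |    |    |        |   |
S8    |       |    |    |    | R4     |   |
S9    | R5    | R5 |    |    |        |   |
Follow(A) = {a, b}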
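The whole SLR construction fits in one Python function. slr_conflicts is a hypothetical helper that builds the LR(0) states, fills Action with the SLR rule (reduce only on Follow of the left-hand side), and returns the cells holding more than one action. To keep First/Follow short it assumes there are no ε-productions, which holds for both grammars here.

```python
def slr_conflicts(grammar):
    """Build the SLR(1) Action table and return its conflicting cells
    ({} means the grammar is SLR(1)). grammar[0] must be S' -> S; rules are
    reported with the slide numbering (list index + 1). Assumes no epsilon rules."""
    nts = {lhs for lhs, _ in grammar}
    terms = {s for _, rhs in grammar for s in rhs if s not in nts} | {"$"}

    # FIRST: without epsilon rules only the leading symbol of each rhs matters
    first = {a: set() for a in nts}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new = first[rhs[0]] if rhs[0] in nts else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True

    # FOLLOW: what can come right after each occurrence of a non-terminal
    follow = {a: set() for a in nts}
    follow[grammar[0][0]].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for k, sym in enumerate(rhs):
                if sym not in nts:
                    continue
                if k + 1 < len(rhs):
                    nxt = rhs[k + 1]
                    new = first[nxt] if nxt in nts else {nxt}
                else:
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True

    def closure(items):
        result, work = set(items), list(items)
        while work:
            p, d = work.pop()
            rhs = grammar[p][1]
            if d < len(rhs) and rhs[d] in nts:
                for j, (lhs, _) in enumerate(grammar):
                    if lhs == rhs[d] and (j, 0) not in result:
                        result.add((j, 0))
                        work.append((j, 0))
        return frozenset(result)

    def successor(items, x):
        return closure({(p, d + 1) for p, d in items
                        if d < len(grammar[p][1]) and grammar[p][1][d] == x})

    states, action, i = [closure({(0, 0)})], {}, 0
    while i < len(states):
        for p, d in sorted(states[i]):
            lhs, rhs = grammar[p]
            if d == len(rhs):                       # complete item
                if p == 0:
                    action.setdefault((i, "$"), set()).add("acc")
                else:
                    for t in follow[lhs]:           # SLR: reduce only on Follow(lhs)
                        action.setdefault((i, t), set()).add(f"r{p + 1}")
                continue
            target = successor(states[i], rhs[d])   # explore every transition
            if target not in states:
                states.append(target)
            if rhs[d] in terms:
                action.setdefault((i, rhs[d]), set()).add(f"s{states.index(target)}")
        i += 1
    return {cell: acts for cell, acts in action.items() if len(acts) > 1}


G_SLR = [("S'", ("S",)), ("S", ("d", "c", "a")), ("S", ("d", "A", "b")),
         ("A", ("c",))]
G_NON_SLR = [("S'", ("S",)), ("S", ("d", "c", "a")), ("S", ("d", "A", "b")),
             ("S", ("A", "a")), ("A", ("c",))]
```

For the grammar of the SLR example, slr_conflicts(G_SLR) returns an empty dict; for the non-SLR grammar it returns exactly one cell, on input a, holding a shift and R5, matching the S5/R5 entry in the table above. The state number in that cell depends on the exploration order.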
LR(1) parsing
Make items carry more information.
LR(1) item: A → X1...Xi · Xi+1...Xj, tok
Terminal tok is the lookahead.
Meaning: we have states for X1...Xi on the stack already, and we expect to put states for Xi+1...Xj onto the stack and then reduce, but only if the token following Xj is tok.
tok can be $.
This splits Follow(A) into separate cases.
We can cluster items notationally: [A → α·, a/b/c] means the three items [A → α·, a], [A → α·, b], [A → α·, c].
Reduce α to A if the next token is a or b or c.
{a, b, c} ⊆ Follow(A)

LR(1) item sets
More items and more item sets than SLR.
Closure: for each item [A → α·Bβ, a] in I, for each production B → γ in G, and for each terminal b in First(βa), add [B → ·γ, b] to I (add only items with the correct lookahead).
Once we have a closed item set, use the LR(1) successor function to compute transitions and next items.
Example: initial item [S' → ·S, $]. What is the closure?
Grammar: S' → S, S → dca | dAb | Aa, A → c
[S' → ·S, $]
[S → ·dca, $]
[S → ·dAb, $]
[S → ·Aa, $]
[A → ·c, a]
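A Python sketch of the LR(1) closure rule for the grammar S' → S, S → dca | dAb | Aa, A → c. Items are (production index, dot, lookahead), an encoding assumed here; first_of assumes there are no ε-productions, so First(βa) is just First of the leading symbol of βa.

```python
# (0) S' -> S  (1) S -> d c a  (2) S -> d A b  (3) S -> A a  (4) A -> c
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("d", "c", "a")),
    ("S", ("d", "A", "b")),
    ("S", ("A", "a")),
    ("A", ("c",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbols):
    """First of a symbol string; no epsilon rules, so only symbols[0] matters."""
    head = symbols[0]
    if head not in NONTERMINALS:
        return {head}
    result, seen, work = set(), set(), [head]
    while work:
        nt = work.pop()
        if nt in seen:
            continue
        seen.add(nt)
        for lhs, rhs in GRAMMAR:
            if lhs == nt:
                if rhs[0] in NONTERMINALS:
                    work.append(rhs[0])
                else:
                    result.add(rhs[0])
    return result


def closure1(items):
    """LR(1) closure; an item is (production index, dot position, lookahead)."""
    result, work = set(items), list(items)
    while work:
        p, d, a = work.pop()
        rhs = GRAMMAR[p][1]
        if d < len(rhs) and rhs[d] in NONTERMINALS:
            for b in first_of(rhs[d + 1:] + (a,)):   # b ranges over First(beta a)
                for j, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[d] and (j, 0, b) not in result:
                        result.add((j, 0, b))
                        work.append((j, 0, b))
    return frozenset(result)
```

closure1({(0, 0, "$")}) gives the five items of the example; in particular A → ·c gets lookahead a, because it comes from S → ·Aa where β = a.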
LR(1) successor function
Given an item set I with [A → α·Xβ, a], add [A → αX·β, a] to item set J.
successor(I, X) is the closure of the set J.
Similar to the LR(0) successor function, but we propagate the lookahead token of each item.
Example:
S0: [S' → ·S, $], [S → ·dca, $], [S → ·dAb, $], [S → ·Aa, $], [A → ·c, a]
S1 = successor(S0, S): [S' → S·, $]
S2 = successor(S0, d): [S → d·ca, $], [S → d·Ab, $], [A → ·c, b]
S9 = successor(S0, c): [A → c·, a]

LR(1) tables
Action table entries:
If [A → α·, a] ∈ Ii, then set Action[i, a] to reduce by rule A → α (A is not S').
If [S' → S·, $] ∈ Ii, then set Action[i, $] to accept.
If [A → α·aβ, b] is in Ii and succ(Ii, a) = Ij, then set Action[i, a] to shift j. Here a is a terminal.
Goto entries:
For each state Ii and each non-terminal A: if succ(Ii, A) = Ij, then Goto[i, A] = j.
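The LR(1) successor in the same Python sketch style, with closure repeated so the block is self-contained (same item encoding (production index, dot, lookahead) and the same no-ε assumption). The last check computes the state reached from S2 on c: [A → c·, b] reduces only on b while S → dc·a shifts on a, so the state that caused the SLR conflict has no conflict in LR(1).

```python
# (0) S' -> S  (1) S -> d c a  (2) S -> d A b  (3) S -> A a  (4) A -> c
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("d", "c", "a")),
    ("S", ("d", "A", "b")),
    ("S", ("A", "a")),
    ("A", ("c",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbols):
    """First of a symbol string (no epsilon rules in this grammar)."""
    todo, seen, result = [symbols[0]], set(), set()
    while todo:
        sym = todo.pop()
        if sym not in NONTERMINALS:
            result.add(sym)
        elif sym not in seen:
            seen.add(sym)
            todo.extend(rhs[0] for lhs, rhs in GRAMMAR if lhs == sym)
    return result


def closure1(items):
    result, work = set(items), list(items)
    while work:
        p, d, a = work.pop()
        rhs = GRAMMAR[p][1]
        if d < len(rhs) and rhs[d] in NONTERMINALS:
            for b in first_of(rhs[d + 1:] + (a,)):
                for j, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[d] and (j, 0, b) not in result:
                        result.add((j, 0, b))
                        work.append((j, 0, b))
    return frozenset(result)


def successor1(items, symbol):
    """Advance the dot over `symbol`, keeping each item's lookahead, then close."""
    moved = {(p, d + 1, a) for p, d, a in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure1(moved)
```

With S0 = closure1({(0, 0, "$")}), successor1(S0, "d") and successor1(S0, "c") give S2 and S9 exactly as listed above.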
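LR(1) diagram
1. S' → S
2. S → dca
3. S → dAb
4. S → Aa
5. A → c
S0: [S' → ·S, $], [S → ·dca, $], [S → ·dAb, $], [S → ·Aa, $], [A → ·c, a]
S1: [S' → S·, $]
S2: [S → d·ca, $], [S → d·Ab, $], [A → ·c, b]
S3: [S → dc·a, $], [A → c·, b]
S4: [S → dA·b, $]
S5: [S → dca·, $]
S6: [S → dAb·, $]
S7: [S → A·a, $]
S8: [S → Aa·, $]
S9: [A → c·, a]
Transitions: S0 --S--> S1, S0 --d--> S2, S0 --A--> S7, S0 --c--> S9, S2 --c--> S3, S2 --A--> S4, S3 --a--> S5, S4 --b--> S6, S7 --a--> S8

Create the LR(1) parse table
State | a  | b  | c  | d  | $      | S | A
S0    |    |    | S9 | S2 |        | 1 | 7
S1    |    |    |    |    | accept |   |
S2    |    |    | S3 |    |        |   | 4
S3    | S5 | R5 |    |    |        |   |
S4    |    | S6 |    |    |        |   |
S5    |    |    |    |    | R2     |   |
S6    |    |    |    |    | R3     |   |
S7    | S8 |    |    |    |        |   |
S8    |    |    |    |    | R4     |   |
S9    | R5 |    |    |    |        |   |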
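Another LR(1) example
Grammar: 0) S' → S 1) S → AA 2) A → aA 3) A → b
Create the transition diagram:
S0: [S' → ·S, $], [S → ·AA, $], [A → ·aA, a/b], [A → ·b, a/b]
S1: [S' → S·, $]
S2: [S → A·A, $], [A → ·aA, $], [A → ·b, $]
S3: [S → AA·, $]
S4: [A → a·A, $], [A → ·aA, $], [A → ·b, $]
S5: [A → b·, $]
S6: [A → aA·, $]
S7: [A → a·A, a/b], [A → ·aA, a/b], [A → ·b, a/b]
S8: [A → aA·, a/b]
S9: [A → b·, a/b]
Transitions: S0 --S--> S1, S0 --A--> S2, S0 --a--> S7, S0 --b--> S9, S2 --A--> S3, S2 --a--> S4, S2 --b--> S5, S4 --A--> S6, S4 --a--> S4, S4 --b--> S5, S7 --A--> S8, S7 --a--> S7, S7 --b--> S9

Parse table
State | a  | b  | $      | S | A
S0    | S7 | S9 |        | 1 | 2
S1    |    |    | accept |   |
S2    | S4 | S5 |        |   | 3
S3    |    |    | R1     |   |
S4    | S4 | S5 |        |   | 6
S5    |    |    | R3     |   |
S6    |    |    | R2     |   |
S7    | S7 | S9 |        |   | 8
S8    | R2 | R2 |        |   |
S9    | R3 | R3 |        |   |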
Parse trace
Stack          | Remaining input | Parse action
S0             | baab$           | S9
S0S9           | aab$            | R3 (A → b)
S0S2           | aab$            | S4
S0S2S4         | ab$             | S4
S0S2S4S4       | b$              | S5
S0S2S4S4S5     | $               | R3 (A → b)
S0S2S4S4S6     | $               | R2 (A → aA)
S0S2S4S6       | $               | R2 (A → aA)
S0S2S3         | $               | R1 (S → AA)
S0S1           | $               | Accept

LR(1) grammar
A grammar is LR(1) if the following two conditions are satisfied for each configuration set:
For each item [A → α·aβ, b] in the set, there is no item in the set of the form [B → γ·, a].
In the Action table, this translates to no shift/reduce conflict.
If there are two complete items [A → α·, a] and [B → β·, b] in the set, then a and b should be different.
In the Action table, this translates to no reduce/reduce conflict.
Compare with the SLR grammar:
For any item A → α·aβ in the set, with terminal a, there is no complete item B → γ· in that set with a in Follow(B).
For any two complete items A → α· and B → β· in the set, the Follow sets must be disjoint.
Note that SLR(1) ⊂ LR(1).
(Diagram: LR(0) ⊂ SLR(1) ⊂ LR(1))
LR(1) tables continued
LR(1) tables can get big: exponential in the size of the rules.
Can we keep the additional power we got from going SLR → LR without the table explosion?
LALR!
We split SLR(1) states to get LR(1) states, maybe too aggressively.
Try to merge item sets that are almost identical.
Tricky bit: merging must not introduce conflicts. (Merging states with the same core cannot create a new shift-reduce conflict, but it can create a reduce-reduce conflict.)

LALR approach
Just say LALR (it's always 1 in practice).
Given the numerous LR(1) states for a grammar G, consider merging similar states to get fewer states.
Candidates for merging: states with the same core (LR(0) items) that differ only in lookaheads.
Example:
S1: X → α·β, a/b/c
S2: X → α·β, c/d
S12: X → α·β, a/b/c/d
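States with the same core items
Grammar: 0) S' → S 1) S → AA 2) A → aA 3) A → b
In the LR(1) diagram for this grammar, these pairs of states have the same core and differ only in lookaheads:
S4: [A → a·A, $], [A → ·aA, $], [A → ·b, $] and S7: [A → a·A, a/b], [A → ·aA, a/b], [A → ·b, a/b]
S5: [A → b·, $] and S9: [A → b·, a/b]
S6: [A → aA·, $] and S8: [A → aA·, a/b]

Merge the states
S0: [S' → ·S, $], [S → ·AA, $], [A → ·aA, a/b], [A → ·b, a/b]
S1: [S' → S·, $]
S2: [S → A·A, $], [A → ·aA, $], [A → ·b, $]
S3: [S → AA·, $]
S47: [A → a·A, a/b/$], [A → ·aA, a/b/$], [A → ·b, a/b/$]
S59: [A → b·, a/b/$]
S68: [A → aA·, a/b/$]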
Merge the states (result)
Grammar: 0) S' → S 1) S → AA 2) A → aA 3) A → b
Merged states: S0, S1, S2, S3, S47, S59, S68
Follow(A) = {a, b, $}

After the merge
What happened when we merged?
Three fewer states.
The lookaheads on the items were merged.
In this case, the lookaheads in the merged sets constitute the entire Follow set, so the merged table coincides with the SLR(1) table.
In general, the result of a merge is not the SLR(1) table: merged lookaheads are often a strict subset of the Follow set.
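Merging by core is mechanical. A Python sketch (merge_by_core and reduce_reduce_conflicts are hypothetical helpers; an item is (lhs, rhs, dot, lookahead)). The first test merges S5 and S9 above into S59. The second uses the two LR(1) states reached on e in the grammar S → aBc | aCd | bBd | bCc, B → e, C → e: each is conflict-free on its own, but their merge has a reduce/reduce conflict.

```python
from collections import defaultdict


def merge_by_core(states):
    """LALR merge sketch: LR(1) states whose LR(0) cores are equal are unioned.

    An item is (lhs, rhs, dot, lookahead); the core drops the lookahead."""
    groups, order = defaultdict(set), []
    for state in states:
        core = frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)
        if core not in groups:
            order.append(core)            # remember first-seen order of cores
        groups[core] |= set(state)
    return [frozenset(groups[core]) for core in order]


def reduce_reduce_conflicts(state):
    """Lookaheads on which two different complete items would both reduce."""
    by_lookahead = defaultdict(set)
    for lhs, rhs, dot, la in state:
        if dot == len(rhs):
            by_lookahead[la].add((lhs, rhs))
    return {la for la, rules in by_lookahead.items() if len(rules) > 1}
```

Merging never changes the cores, so the LALR automaton has exactly as many states as the LR(0) automaton; only the lookaheads attached to complete items grow.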
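A conflict after merging
1) S → aBc | aCd | bBd | bCc
2) B → e
3) C → e
S0: [S' → ·S, $], [S → ·aBc, $], [S → ·aCd, $], [S → ·bBd, $], [S → ·bCc, $]
S1 = successor(S0, S): [S' → S·, $]
S2 = successor(S0, a): [S → a·Bc, $], [S → a·Cd, $], [B → ·e, c], [C → ·e, d]
S3 = successor(S2, e): [B → e·, c], [C → e·, d]
S4 = successor(S0, b): [S → b·Bd, $], [S → b·Cc, $], [B → ·e, d], [C → ·e, c]
S5 = successor(S4, e): [B → e·, d], [C → e·, c]
S3 and S5 have the same core. Merging them gives [B → e·, c/d] and [C → e·, c/d]: a reduce/reduce conflict on c and on d that the LR(1) states did not have.

Practical consideration
Ambiguity in LR grammars: an ambiguous grammar G produces multiple rightmost derivations for some string (i.e. we can build two different parse trees for one input string).
Remember: E → E + E | E * E | (E) | id
We added terms and factors to force an unambiguous parse with the correct precedence and associativity.
What if we threw the grammar into an LR table-construction machine anyway?
Conflicts = multiple action entries for one cell.
We choose which entry to keep and toss the others.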
Precedence and associativity in JavaCUP
Grammar: E → E + E | E * E | (E) | id
S0: E' → ·E, E → ·E+E, E → ·E*E, E → ·(E), E → ·id
S1: E' → E·, E → E·+E, E → E·*E
S2: E → (·E), E → ·E+E, E → ·E*E, E → ·(E), E → ·id
S3: E → id·
S4: E → E+·E, E → ·E+E, E → ·E*E, E → ·(E), E → ·id
S5: E → E*·E, E → ·E+E, E → ·E*E, E → ·(E), E → ·id
S6: E → (E·), E → E·+E, E → E·*E
S7: E → E+E·, E → E·+E, E → E·*E
S8: E → E*E·, E → E·+E, E → E·*E
S9: E → (E)·
S7 and S8 each contain a complete item together with items that shift + and *: these are the shift/reduce conflicts.

JavaCUP grammar
terminal PLUS, TIMES;
precedence left PLUS;
precedence left TIMES;
E ::= E PLUS E | E TIMES E | ID;
What if the input is x+y+z?
When shifting + conflicts with reducing a production containing +, choose reduce.
What if the input is x+y*z?
What if the input is x*y+z?
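How the precedence declarations settle the conflicts in S7 and S8 can be sketched as a tiny decision function in Python. The PRECEDENCE table is a hypothetical encoding of the two declarations, with TIMES binding tighter because it is declared later (the yacc convention that JavaCUP follows). For a conflict between reducing E → E op E and shifting a lookahead operator, the higher precedence wins; on a tie, left associativity means reduce.

```python
# Hypothetical encoding of:  precedence left PLUS;  precedence left TIMES;
# Later declarations bind tighter, so "*" gets the higher level.
PRECEDENCE = {"+": (1, "left"), "*": (2, "left")}


def resolve(rule_operator, lookahead):
    """Settle a shift/reduce conflict between reducing E -> E op E
    (op = rule_operator, already on the stack) and shifting `lookahead`."""
    rule_prec, _ = PRECEDENCE[rule_operator]
    tok_prec, assoc = PRECEDENCE[lookahead]
    if rule_prec > tok_prec:
        return "reduce"                  # the rule binds tighter
    if rule_prec < tok_prec:
        return "shift"                   # the incoming operator binds tighter
    return "reduce" if assoc == "left" else "shift"   # tie: use associativity
```

Applied to the three questions: x+y+z reduces when the second + arrives (left associativity), x+y*z shifts * (TIMES binds tighter), and x*y+z reduces x*y before shifting +.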
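Transition diagram for assignment expressions
Grammar: S → id | V = E, V → id, E → V | n
S0: [S' → ·S, $], [S → ·id, $], [S → ·V=E, $], [V → ·id, =]
S1 = successor(S0, S): [S' → S·, $]
S2 = successor(S0, id): [S → id·, $], [V → id·, =]
S3 = successor(S0, V): [S → V·=E, $]
S4 = successor(S3, =): [S → V=·E, $], [E → ·V, $], [E → ·n, $], [V → ·id, $]
S5 = successor(S4, E): [S → V=E·, $]
S6 = successor(S4, V): [E → V·, $]
S7 = successor(S4, n): [E → n·, $]
S8 = successor(S4, id): [V → id·, $]

Why are there conflicts in some rules in assignments?
Non-LR(1) grammar: P → m | Pm | ε
S0: [S' → ·P, $], [P → ·m, $], [P → ·Pm, $], [P → ·, $], [P → ·, m], [P → ·m, m], [P → ·Pm, m]
S1 = successor(S0, m): [P → m·, $/m]
S2 = successor(S0, P): [S' → P·, $], [P → P·m, $/m]
It is an ambiguous grammar. There are two rightmost (and leftmost) derivations for the sentence m:
P ⇒ Pm ⇒ m
P ⇒ m
JavaCUP reports:
*** Shift/Reduce conflict found in state #0 between P ::= (*) and P ::= (*) m under symbol m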
A slightly changed grammar, still not LR
Non-LR(1) grammar: P → m | mP | ε
S0: [S' → ·P, $], [P → ·m, $], [P → ·mP, $], [P → ·, $]
S1 = successor(S0, m): [P → m·, $], [P → m·P, $], [P → ·m, $], [P → ·mP, $], [P → ·, $]
S2 = successor(S0, P): [S' → P·, $]
S3 = successor(S1, P): [P → mP·, $]
It is an ambiguous grammar. There are two parse trees for the sentence m:
P ⇒ mP ⇒ m
P ⇒ m
Reduce/Reduce conflict found in state #1 between P ::= m (*) and P ::= (*) under symbols: {EOF}
(Produced by JavaCUP)

Modified LR(1) grammar
LR(1) grammar: P → mP | ε
S0: [S' → ·P, $], [P → ·, $], [P → ·mP, $]
S1 = successor(S0, P): [S' → P·, $]
S2 = successor(S0, m): [P → m·P, $], [P → ·mP, $], [P → ·, $]
S3 = successor(S2, P): [P → mP·, $]
Note that there are no conflicts.
The derivation: P ⇒ mP ⇒ mmP ⇒ mm
Another way of changing to an LR(1) grammar
LR(1) grammar: P → Q | ε, Q → m | mQ
S0: [S' → ·P, $], [P → ·Q, $], [P → ·, $], [Q → ·m, $], [Q → ·mQ, $]
S1 = successor(S0, P): [S' → P·, $]
S2 = successor(S0, m): [Q → m·, $], [Q → m·Q, $], [Q → ·m, $], [Q → ·mQ, $]
S3 = successor(S2, Q): [Q → mQ·, $]

LR grammars: comparison
(Diagram: LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ LR(1) ⊂ CFG)
Class   | Advantages                                                   | Disadvantages
LR(0)   | Smallest tables, easiest to build                            | Inadequate for many PL structures
SLR(1)  | More inclusive, more information than LR(0)                  | Many useful grammars are not SLR(1)
LALR(1) | Same size tables as SLR, more languages, efficient to build  | Empirical, not mathematical
LR(1)   | Most precise use of lookahead, most PL structures we want    | Tables an order of magnitude larger than SLR(1)
The space of grammars
(Diagram: nested grammar classes LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ LR(1) ⊂ unambiguous CFG ⊂ CFG, with LL(1) ⊂ LR(1).)

The space of grammars: what is used in practice
(Same diagram.)
Verifying the language generated by a grammar
To verify a grammar G for a language L, show:
every string generated by G is in L;
every string in L can be generated by G.
Example: S → (S)S | ε
The language is all strings of balanced parentheses, such as (), (()), ()(()()).
Proof part 1: every sentence derived from S is balanced.
Basis: the empty string is balanced.
Induction: suppose that all derivations of fewer than n steps produce balanced sentences, and consider a leftmost derivation of n steps. Such a derivation must be of the form S ⇒ (S)S ⇒* (x)S ⇒* (x)y, where x and y are each derived from S in fewer than n steps. So x and y are balanced, and hence (x)y is balanced.
Proof part 2: every balanced string can be derived from S.
Basis: the empty string can be derived from S.
Induction: suppose that every balanced string of length less than 2n can be derived from S. Consider a balanced string w of length 2n. w must start with (. w can be written as (x)y, where x and y are balanced. Since x and y are shorter than 2n, both can be derived from S, so S ⇒ (S)S ⇒* (x)S ⇒* (x)y = w.

Hierarchy of grammars
CFG is more powerful than RE.
A type n grammar is more powerful than a type n+1 grammar.
Example: Σ = {a, b}
Language 1, any string of a and b: A → aA | bA | ε. It can be described by an RE.
Language 2, palindromes over a and b: A → aAa | bAb | a | b | ε. It can be described by a CFG, but not by an RE.
(Figure: Venn diagram of languages 1 to 5, with the region describable by regular grammars (RG) marked.)
When a grammar is more powerful, it is not that it can describe a larger language. Instead, the power means the ability to restrict the set.
A more powerful grammar can define a more complicated boundary between correct and incorrect sentences, and therefore more different languages.
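The two proof parts can be sanity-checked mechanically for short strings. This Python sketch (a check, not a proof) builds every string of length at most 8 generated by S → (S)S | ε by iterating the production to a fixed point, and compares the result with a brute-force enumeration of balanced strings of the same lengths. The function names are illustrative.

```python
from itertools import product


def derive(max_len):
    """All strings of length <= max_len generated by S -> ( S ) S | epsilon,
    iterating L := {""} | {"(" + x + ")" + y : x, y in L} to a fixed point."""
    lang = {""}
    while True:
        new = {""} | {"(" + x + ")" + y for x in lang for y in lang
                      if len(x) + len(y) + 2 <= max_len}
        if new == lang:
            return lang
        lang = new


def balanced(s):
    """True if every ')' closes an earlier '(' and nothing is left open."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def brute_force(max_len):
    return {"".join(p) for n in range(0, max_len + 1, 2)
            for p in product("()", repeat=n) if balanced("".join(p))}
```

derive(8) == brute_force(8) holds, and both sets have 23 elements (1 + 1 + 2 + 5 + 14 strings of lengths 0, 2, 4, 6, 8).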
Metaphoric comparison of grammars
RE: draw the rose using only straight lines (a ruler and T-square suffice).
CFG: approximate the outline by straight lines and circle segments (ruler and compasses).