Lecture 4 & 5: Bottom-up parsing

Bottom-up analysis

Shift-reduce parsing.
Constructs the derivation tree from bottom to top.
Reduces the string to the start symbol.
Produces a reverse right derivation.

Example: G(E):
1. E → E + T
2.   | T
3. T → T * F
4.   | F
5. F → id
6.   | ( E )

Input: id + id

Reverse right derivation (the art of finding handles). The handles are shown in brackets:

[id] + id ⇐ [F] + id ⇐ [T] + id ⇐ E + [id] ⇐ E + [F] ⇐ [E + T] ⇐ E (start symbol, accept)

This reversed right derivation is called the canonical reduction sequence. The sequence is reached by so-called "handle pruning" (handle elimination).

How is this sequence arrived at?

Method: w = γ_n ⇐ γ_(n-1) ⇐ ... ⇐ γ_1 ⇐ γ_0 = S
where
w: the sentence (input string)
γ_n, γ_(n-1), ..., γ_1: sentential forms
γ_0: start symbol

1. Locate the handle β_n in the sentential form γ_n.
2. Replace the handle β_n with A (there is a rule A → β_n). This gives γ_(n-1).
3. Locate the handle β_(n-1) in the sentential form γ_(n-1).
4. And so on until the target symbol is reached.

A shift-reduce parser finds the rules in the order: 5, 4, 2, 5, 4, 1.

A stack is needed to implement a shift-reduce parser.

New symbols:
$ (or _!_, or ⊣) marks the end of the string
⊢ marks the start (the stack is empty)

                              Stack   Input
Configuration at the start    ⊢       w ⊣
Configuration at the end      ⊢ S     ⊣

The analyser works by shifting symbols from the input onto the stack until a handle (i.e., the complete right side of some rule) is on the top of the stack (never inside it). Then the handle is reduced to the corresponding left side (i.e., a nonterminal).

Example: Parse id + id according to G(E):

Step  Stack    Input    Event   Handle      Rule
1     ⊢        id+id⊣   Shift
2     ⊢id      +id⊣     Reduce  F → id      5
3     ⊢F       +id⊣     Reduce  T → F       4
4     ⊢T       +id⊣     Reduce  E → T       2
5     ⊢E       +id⊣     Shift
6     ⊢E+      id⊣      Shift
7     ⊢E+id    ⊣        Reduce  F → id      5
8     ⊢E+F     ⊣        Reduce  T → F       4
9     ⊢E+T     ⊣        Reduce  E → E + T   1
10    ⊢E       ⊣        Accept

Accept only if the target symbol is the only symbol on the stack and there is no more input (otherwise an error has occurred).

Problems:
1. How do we find the handle?
2. How is the reduction chosen when several rules have the same right side?

An LR-parser uses lookahead.
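The handle-pruning steps above can be replayed mechanically. The following is a minimal sketch, not a parser: it takes the rule order 5, 4, 2, 5, 4, 1 as given and assumes that, for this particular input, the handle is always the leftmost occurrence of the rule's right side (this holds for id + id but not in general, where the parser's state decides the handle). The names RULES, find_leftmost and prune are illustrative only.

```python
# Grammar G(E); rule numbers follow the example above.
RULES = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["id"]),
    6: ("F", ["(", "E", ")"]),
}


def find_leftmost(form, rhs):
    """Index of the leftmost occurrence of rhs in form, or -1 if absent."""
    n = len(rhs)
    for i in range(len(form) - n + 1):
        if form[i:i + n] == rhs:
            return i
    return -1


def prune(form, rule_numbers):
    """Replay a reduction sequence and return every sentential form."""
    forms = [list(form)]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        current = forms[-1]
        # Assumption for this example: the handle is the leftmost match.
        pos = find_leftmost(current, rhs)
        if pos < 0:
            raise ValueError(f"rule {number} does not apply to {current}")
        forms.append(current[:pos] + [lhs] + current[pos + len(rhs):])
    return forms


steps = prune(["id", "+", "id"], [5, 4, 2, 5, 4, 1])
for form in steps:
    print(" ".join(form))
```

The printed forms are the canonical reduction sequence read from the input down to the start symbol E.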
LR-parser
Automatic construction

Characteristics:
+ An LR analyser can analyse almost all languages which can be described with a CFG.
+ More general: rewriting is seldom needed; left recursion, right recursion and several rules with the same symbols at the beginning are not a problem.
+ As fast as a hand-written parser.
+ Discovers an error as soon as it is possible.
- The semantics cannot be introduced so easily.
- Difficult to generate by hand.

LR classes
LR(0)     Too weak (no lookahead)
SLR(1)    Simple LR, 1 token lookahead
LALR(1)   Most common, 1 token lookahead
LR(1)     1 token lookahead, tables far too big
LR(k)     k tokens lookahead

Example of table size: Pascal: LALR(1) 20 pages, 2 bytes per entry.

Differences:
Table size varies widely.
Errors are not discovered as quickly.
Limitations in the language definition.

Augmented grammar (extended grammar)
Add a rule which includes the target symbol <S> and the end symbol _!_ (not really part of the grammar, but it makes the end symbol visible):

<SYS> → <S> _!_

Example of parsing using the LR method
(See the following pages for grammar and parse tables. L stands for <list>, E for <element>.)

Input: a, b

Step  Stack            Input      Table entries
1     0                a,b _!_    ACTION[0, a] = S4
2     0 a 4            ,b _!_     ACTION[4, ,] = R3 (E → a)
      0 E                         GOTO[0, E] = 6
3     0 E 6            ,b _!_     ACTION[6, ,] = R2 (L → E)
      0 L                         GOTO[0, L] = 1
4     0 L 1            ,b _!_     ACTION[1, ,] = S2
5     0 L 1 , 2        b _!_      ACTION[2, b] = S5
6     0 L 1 , 2 b 5    _!_        ACTION[5, _!_] = R4 (E → b)
      0 L 1 , 2 E                 GOTO[2, E] = 3
7     0 L 1 , 2 E 3    _!_        ACTION[3, _!_] = R1 (L → L , E)
      0 L                         GOTO[0, L] = 1
8     0 L 1            _!_        ACTION[1, _!_] = A (accept)

Example of an SLR(1) grammar

V O C A B U L A R Y

TERMINALS      NONTERMINALS
1. _!_         5. <list>
2. ,           6. <element>
3. a
4. b

GOAL SYMBOL IS: <list>

P R O D U C T I O N S

1. <list>    ::= <list> , <element>
2.             ! <element>
3. <element> ::= a
4.             ! b

P A R S E R   A C T I O N   T A B L E

TOP OF STACK         !  INPUT SYMBOL
                     !  _!_   ,     a     b
STATE  NAME          !  1     2     3     4
---------------------!----------------------
0      <SYS>         !  X     X     S4    S5
1      <list>        !  A     S2    *     *
2      ,             !  X     X     S4    S5
3      <element>     !  R1    R1    *     *
4      a             !  R3    R3    X     X
5      b             !  R4    R4    X     X
6      <element>     !  R2    R2    *     *

G O T O   T A B L E

(Note: The GOTO table is not the same as the GOTO graph, which represents both the ACTION table and the GOTO table.)

TOP OF STACK         !  SYMBOL
                     !  <list>  <element>
STATE  NAME          !  5       6
---------------------!-------------------
0      <SYS>         !  1       6
1      <list>        !  *       *
2      ,             !  *       3
3      <element>     !  *       *
4      a             !  *       *
5      b             !  *       *
6      <element>     !  *       *

NB: the following abbreviations are used in the tables above
X = error
* = impossible to end up here
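The ACTION and GOTO tables above are enough to drive a parser. The following is a minimal sketch of such a driver for the grammar <list> ::= <list> , <element> ! <element>, <element> ::= a ! b. The table contents are copied from the slides; the dictionary layout, the names PRODUCTIONS and parse, and the choice to keep only states (not grammar symbols) on the stack are assumptions made for the illustration.

```python
END = "_!_"  # end-of-input symbol used in the tables

# Production number -> (left side, length of right side)
PRODUCTIONS = {
    1: ("<list>", 3),     # <list> ::= <list> , <element>
    2: ("<list>", 1),     # <list> ::= <element>
    3: ("<element>", 1),  # <element> ::= a
    4: ("<element>", 1),  # <element> ::= b
}

# Only non-error entries are stored; a missing key means error.
ACTION = {
    (0, "a"): "S4", (0, "b"): "S5",
    (1, END): "A", (1, ","): "S2",
    (2, "a"): "S4", (2, "b"): "S5",
    (3, END): "R1", (3, ","): "R1",
    (4, END): "R3", (4, ","): "R3",
    (5, END): "R4", (5, ","): "R4",
    (6, END): "R2", (6, ","): "R2",
}
GOTO = {(0, "<list>"): 1, (0, "<element>"): 6, (2, "<element>"): 3}


def parse(tokens):
    """Return the production numbers in reduction order, or raise SyntaxError."""
    stack = [0]                     # state stack, start state 0
    tokens = list(tokens) + [END]
    pos = 0
    used = []
    while True:
        state, symbol = stack[-1], tokens[pos]
        entry = ACTION.get((state, symbol))
        if entry is None:
            raise SyntaxError(f"unexpected {symbol!r} in state {state}")
        if entry == "A":
            return used
        number = int(entry[1:])
        if entry[0] == "S":         # shift: push the new state, consume a token
            stack.append(number)
            pos += 1
        else:                       # reduce: pop one state per right-side symbol
            lhs, length = PRODUCTIONS[number]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
            used.append(number)


print(parse(["a", ",", "b"]))
```

For the input a, b the driver performs the same shifts and reductions as the step table above and reports the productions 3, 2, 4, 1.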
Construction of the parse tree (bottom-up)

Shift operations: Create a one-node tree containing the shifted symbol.
Reduce operations: When reducing a handle β to A (as in A → β), create a new node A whose children are the nodes that were created for the symbols in the handle.

During the analysis we have a forest of sub-trees. Each entry in the stack points to its corresponding sub-tree. When we accept, the whole parse tree is completed.

Example. Construction of the parse tree.
[Figure: parse stack and semantic stack (the forest of sub-trees) at steps 2, 3, 4, 5 (cleaned up) and 8 of the parse of a, b]

Definition: LR grammar
A grammar for which a unique LR table can be constructed is called an LR grammar (LR(0), SLR(1), LALR(1), LR(1), ...).
No ambiguous grammars are LR grammars.
There are unambiguous grammars which are not LR grammars.
The state at the top of the stack contains all the information we need.

Definition: Viable prefix
The prefixes of a right sentential form which do not contain any symbols to the right of the handle.

Example: (See next page for grammar and parse table)

Input: a, b, a

1. <list>    → <list> , <element>
2.           | <element>
3. <element> → a
4.           | b

Right derivation (handles are shown in brackets):

<list> ⇒ [<list> , <element>]
       ⇒ <list> , [a]
       ⇒ [<list> , <element>] , a
       ⇒ <list> , [b] , a
       ⇒ [<element>] , b , a
       ⇒ [a] , b , a

Viable prefixes of the sentential form <list> , b , a are:
{ ε,
  <list>,
  <list> ,
  <list> , b }
A parser generator constructs a DFA which recognises all viable prefixes. This DFA is then transformed into table form.

Automatic construction of the ACTION and GOTO tables:

Definition: LR(0) item
An LR(0) item of a rule P is the rule with a dot "•" somewhere in the right side.

Example: All LR(0) items of the production
1. <list> → <list> , <element>
are
<list> → • <list> , <element>
<list> → <list> • , <element>
<list> → <list> , • <element>
<list> → <list> , <element> •

Intuitively an item is interpreted as how much of the rule we have found and how much remains. Items are put together in sets which become the LR analyser's states.

We want to construct a DFA which recognises all viable prefixes of G(<SYS>):

Augmented grammar:
0. <SYS>     → <list>
1. <list>    → <list> , <element>
2.           | <element>
3. <element> → a
4.           | b

GOTO graph (start state 0), listed edge by edge:
0 --<list>-->    1
0 --<element>--> 6
0 --a-->         4
0 --b-->         5
1 --,-->         2
2 --<element>--> 3
2 --a-->         4
2 --b-->         5

(A GOTO graph is not the same as a GOTO table but corresponds to an ACTION + GOTO table. The graph discovers viable prefixes.)

Algorithm to construct the GOTO graph from the set of LR(0) items
(A detailed description is given in the textbook, p. 221 ff.)

Based on the canonical collection of LR(0) items, draw the GOTO graph.

State 0:
<SYS>     → • <list>                 Kernel (basis)
<list>    → • <list> , <element>     Closure (of kernel items)
<list>    → • <element>
<element> → • a
<element> → • b

Each set corresponds to a node in the GOTO graph. When we have found the symbol behind the dot "•" we move the dot over the symbol.

State 1:
<SYS>  → <list> •                    Kernel
<list> → <list> • , <element>
(empty closure, as "•" precedes terminals)
(shift-reduce conflict in the kernel set!)

State 2:
<list>    → <list> , • <element>     Kernel
<element> → • a                      Closure
<element> → • b

etc. (see the final result below).

The GOTO graph discovers those prefixes of right sentential forms which have (at most) one handle, furthest to the right in the prefix.

[Figure: GOTO graph with its LR(0) items, states I0 to I6]
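The closure and "move the dot" steps described above can be written out directly. The following is a minimal sketch that builds the canonical collection of LR(0) item sets for the augmented list grammar. An item is represented as a pair (production number, dot position); this representation and the function names closure, goto and canonical_collection are assumptions for the illustration. States are numbered in order of discovery, so the numbers differ from those in the slides even though the sets are the same.

```python
GRAMMAR = [
    ("<SYS>", ("<list>",)),                    # 0
    ("<list>", ("<list>", ",", "<element>")),  # 1
    ("<list>", ("<element>",)),                # 2
    ("<element>", ("a",)),                     # 3
    ("<element>", ("b",)),                     # 4
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add X -> .rhs for every nonterminal X that directly follows a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over symbol in every item where symbol follows the dot."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def canonical_collection():
    """Return (list of item sets, edges of the GOTO graph)."""
    states = [closure({(0, 0)})]
    edges = {}
    for state in states:                 # the list grows while we iterate
        symbols = []
        for p, d in sorted(state):
            rhs = GRAMMAR[p][1]
            if d < len(rhs) and rhs[d] not in symbols:
                symbols.append(rhs[d])
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
            edges[(states.index(state), symbol)] = states.index(target)
    return states, edges


states, edges = canonical_collection()
for (source, symbol), target in sorted(edges.items()):
    print(f"{source} --{symbol}--> {target}")
```

The sketch finds seven item sets and eight edges, matching the size of the GOTO graph for this grammar.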
Canonical collection of LR(0) items for the grammar G(<SYS>)

I0: <SYS>     → • <list>
    <list>    → • <list> , <element>
    <list>    → • <element>
    <element> → • a
    <element> → • b
I1: <SYS>     → <list> •
    <list>    → <list> • , <element>
I2: <list>    → <list> , • <element>
    <element> → • a
    <element> → • b
I3: <list>    → <list> , <element> •
I4: <element> → a •
I5: <element> → b •
I6: <list>    → <element> •

L R ( 0 )   S E T S   O F   I T E M S

(==>n means: reduce according to production n)

STATE NR.  ITEMSET                           SUCCESSOR
0          <SYS> --> .<list>                 1
           <list> --> .<list> , <element>    1
           <list> --> .<element>             6
           <element> --> .a                  4
           <element> --> .b                  5
1          <SYS> --> <list>.                 ==>0
           <list> --> <list>. , <element>    2
2          <list> --> <list> ,.<element>     3
           <element> --> .a                  4
           <element> --> .b                  5
3          <list> --> <list> , <element>.    ==>1
4          <element> --> a.                  ==>3
5          <element> --> b.                  ==>4
6          <list> --> <element>.             ==>2

L O O K - A H E A D   S E T S

NONTERMINAL SYMBOL   LOOK-AHEAD SET
<list>               _!_ ,
<element>            _!_ ,

Fill in the ACTION table according to the GOTO graph

I_i: state i (line i, item set i)

1. If there is an item <A> → α • a β in I_i and GOTO(I_i, a) = I_j (an edge from i to j labelled a in the graph):
   fill in shift j for row i and the column for symbol a.

2. If there is a complete item (i.e., one that ends in a dot "•") <A> → α • in I_i:
   fill in reduce x, where x is the production number of x: <A> → α.
   In which column(s) should reduce x be written?
   LR(0) fills in for all input.
   SLR(1) fills in for all input in FOLLOW(<A>).
   LALR(1) fills in for all those symbols that can follow a certain instance of <A>.

3. If we have <SYS> → <S> • in I_i: accept on the end symbol _!_.

4. Otherwise error.
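For the SLR(1) case, the columns for "reduce x" come from FOLLOW(<A>). The following is a minimal sketch of a fixed-point computation of the FOLLOW sets for the list grammar. It assumes a grammar without ε-productions (true for this grammar), which keeps FIRST simple; the function names first_sets and follow_sets are illustrative.

```python
END = "_!_"
GRAMMAR = [
    ("<SYS>", ("<list>",)),
    ("<list>", ("<list>", ",", "<element>")),
    ("<list>", ("<element>",)),
    ("<element>", ("a",)),
    ("<element>", ("b",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST for each nonterminal; assumes no epsilon productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """FOLLOW for each nonterminal, iterated until nothing changes."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow["<SYS>"].add(END)          # the end symbol follows the goal
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):  # something follows inside the rule
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                 # last symbol inherits FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


follow = follow_sets()
for name in ("<list>", "<element>"):
    print(name, sorted(follow[name]))
```

Both <list> and <element> get the set { _!_ , "," }, which agrees with the look-ahead sets printed by the generator and explains why R1 to R4 appear only in the _!_ and "," columns of the ACTION table.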
Fill in the GOTO table

If the GOTO graph has GOTO(I_i, <A>) = I_j (an edge from i to j labelled <A>), fill in GOTO[i, <A>] = j.

Limitations of LR grammars

No ambiguous grammar is LR(k).
Shift-reduce conflict
Reduce-reduce conflict

What is a conflict?
Given the current state and input symbol it is not possible to select an action (there are two or more entries in the action table).

Example. Reduce/reduce conflict:
procid → ident
exp    → ident

Example. Shift/reduce conflict:
if ... then ...    else
(stack top)        (next token)

Conflict elimination by factorising, expansion, rewriting

Example of a reduce/reduce conflict:
X → ( A ) | ( B )
A → REAL | INTEGER | IDENT
B → BOOL | CHAR | IDENT

Here IDENT inside parentheses can be reduced either to A, as in X → ( A ), or to B, as in X → ( B ).

Factorised grammar:
X → ( A ) | ( B ) | ( C )
A → REAL | INTEGER
B → BOOL | CHAR
C → IDENT

Example of a shift/reduce conflict:
X     → ( A ) OPT_Y ( B )
OPT_Y → ε | Y
Y     → ...
A     → ...
B     → ...
(One token of lookahead: "(")

Expanded grammar:
X → ( A ) ( B ) | ( A ) Y ( B )
Y → ...
A → ...
B → ...
A concrete example from the book, pp. 174, 175: rewriting

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

This grammar is ambiguous, as the following statement has two parse trees:

if E1 then if E2 then S1 else S2

Alternative 1 (the else belongs to the inner if):
stmt → if expr then stmt
  expr = E1
  inner stmt → if expr then stmt else stmt, with E2, S1, S2

Alternative 2 (the else belongs to the outer if):
stmt → if expr then stmt else stmt
  expr = E1, else branch = S2
  inner stmt → if expr then stmt, with E2, S1

Rewriting the grammar:

stmt           → matched_stmt
               | unmatched_stmt
matched_stmt   → if expr then matched_stmt else matched_stmt
               | other
unmatched_stmt → if expr then stmt
               | if expr then matched_stmt else unmatched_stmt

This grammar will create the first alternative of the parse tree, i.e. it tries to match immediately if possible.

Shift-reduce conflict
If both criteria (1) and (2) hold for a certain item set:
(1) <A> → α •        (complete item: reduce)
(2) <B> → γ • a β    (shift a)

Many conflicts are resolved in SLR(1) and LALR(1) by using lookahead. These methods construct FOLLOW sets, e.g.:

FOLLOW(<list>) = { _!_ , "," }

This set specifies the terminal symbols which can follow the nonterminal <list> during a derivation; it is also called the lookahead set.

All SLR(1) grammars are unambiguous, but there are unambiguous grammars which are not SLR(1). The same is true for LR(k).
For LR(k) there is a conflict only if the intersection of the lookahead sets is also non-empty.

Reduce-reduce conflict
If there are several complete items in an item set, e.g.
1. <A> → α •      (reduce 1)
2. <B> → β α •    (reduce 2)

An LALR(1) grammar run through the SLR(1) generator

V O C A B U L A R Y

TERMINALS      NONTERMINALS
1. _!_         7. <S>
2. B           8. <A>
3. C
4. A
5. E
6. D

GOAL SYMBOL IS: <S>

P R O D U C T I O N S

1. <S> ::= B <A> C
2.       ! A E C
3.       ! A <A> D
4. <A> ::= E
L R ( 0 )   S E T S   O F   I T E M S

STATE NR.  ITEMSET                 SUCCESSOR
0          <SYS> --> .<S>          1
           <S> --> .B <A> C        2
           <S> --> .A E C          6
           <S> --> .A <A> D        6
1          <SYS> --> <S>.          ==>0
2          <S> --> B.<A> C         3
           <A> --> .E              5
3          <S> --> B <A>.C         4
4          <S> --> B <A> C.        ==>1
5          <A> --> E.              ==>4
6          <S> --> A.E C           7
           <S> --> A.<A> D         9
           <A> --> .E              7
7          <A> --> E.              ==>4
           <S> --> A E.C           8
8          <S> --> A E C.          ==>2
9          <S> --> A <A>.D         10
10         <S> --> A <A> D.        ==>3

L O O K - A H E A D   S E T S

NONTERMINAL SYMBOL   LOOK-AHEAD SET
<S>                  _!_
<A>                  C D

****** SHIFT-REDUCE CONFLICT IN STATE 7
PRODUCTIONS: 2 4
THIS GRAMMAR IS NOT SLR(1)

An LALR(1) grammar run through the LALR(1) generator

V O C A B U L A R Y

TERMINALS      NONTERMINALS
1. _!_         7. <s>
2. b           8. <a>
3. c
4. a
5. e
6. d

Goal symbol is: <s>

P R O D U C T I O N S

1. <s> ::= b <a> c
2.       ! a e c
3.       ! a <a> d
4. <a> ::= e

L R ( 0 )   S E T S   O F   I T E M S

state nr.  itemset                 successor
0          <SYS> --> .<s>          1
           <s> --> .b <a> c        2
           <s> --> .a e c          6
           <s> --> .a <a> d        6
1          <SYS> --> <s>.          ==>0
2          <s> --> b.<a> c         3
           <a> --> .e              5
3          <s> --> b <a>.c         4
4          <s> --> b <a> c.        ==>1
5          <a> --> e.              ==>4
6          <s> --> a.e c           7
           <s> --> a.<a> d         9
           <a> --> .e              7
7          <a> --> e.              ==>4
           <s> --> a e.c           8
8          <s> --> a e c.          ==>2
9          <s> --> a <a>.d         10
10         <s> --> a <a> d.        ==>3

L O O K - A H E A D   S E T S

STATE  PRODUCTION  LOOK-AHEAD SET
1      0           _!_
4      1           _!_ b c a e d
5      4           _!_ b c a e d
7      4           d
8      2           _!_ b c a e d
10     3           _!_ b c a e d

P A R S E R   A C T I O N   T A B L E

TOP OF STACK     !  INPUT SYMBOL
                 !  _!_   b     c     a     e     d
STATE  NAME      !  1     2     3     4     5     6
-----------------!----------------------------------
0      <SYS>     !  X     S2    X     S6    X     X
1      <s>       !  A     X     X     X     X     X
2      b         !  X     X     X     X     -4    X
3      <a>       !  X     X     -1    X     X     X
6      a         !  X     X     X     X     S7    X
7      e         !  X     X     -2    X     X     R4
9      <a>       !  *     *     *     *     *     -3

G O T O   T A B L E

TOP OF STACK     !  SYMBOL
                 !  <s>   <a>
STATE  NAME      !  7     8
-----------------!-----------
0      <SYS>     !  1     *
1      <s>       !  *     *
2      b         !  *     3
3      <a>       !  *     *
6      a         !  *     9
7      e         !  *     *
9      <a>       !  *     *

NB: The tables are optimised. For example, -4 in the action table stands for shift-reduce according to rule 4.
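The difference between the two generator runs comes down to one set intersection in state 7. The following is a minimal sketch of that check. The sets are copied from the listings above (written with the lowercase symbols of the second run); the function name shift_reduce_conflicts is illustrative, and the sketch only compares one complete item against the shift symbols of one state.

```python
def shift_reduce_conflicts(shift_symbols, reduce_lookahead):
    """Symbols on which the state would both shift and reduce."""
    return shift_symbols & reduce_lookahead


# State 7 contains  <a> --> e.   (reduce by production 4)
#              and  <s> --> a e.c (shift c)
SHIFT_SYMBOLS_STATE_7 = {"c"}

# SLR(1): reduce on every symbol in FOLLOW(<a>), taken from the first run.
SLR_LOOKAHEAD = {"c", "d"}

# LALR(1): reduce only on the look-ahead set of production 4 in state 7,
# taken from the second run.
LALR_LOOKAHEAD = {"d"}

slr = shift_reduce_conflicts(SHIFT_SYMBOLS_STATE_7, SLR_LOOKAHEAD)
lalr = shift_reduce_conflicts(SHIFT_SYMBOLS_STATE_7, LALR_LOOKAHEAD)
print("SLR(1) conflict symbols: ", sorted(slr))
print("LALR(1) conflict symbols:", sorted(lalr))
```

With FOLLOW(<a>) the symbol c is claimed by both the shift and the reduction, which is the reported shift-reduce conflict. With the state-specific look-ahead set the intersection is empty, so the LALR(1) table has a single entry in every cell of row 7.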
Methods for syntax error management

1. Panic mode
2. Coding of wrong entries in the ACTION table
3. Wrong productions
4. Language-independent methods:
   4a) Continuation method, Röhrich (1980)
   4b) Automatic error recovery, Burke & Fisher (1982)

1. Panic mode
a) Skip input until either
   i) parsing can continue, or
   ii) an important symbol has been found (e.g. PROCEDURE, BEGIN, WHILE, ...)
b) If parsing cannot continue: pop the stack until the important symbol is accepted.
   If you reach the bottom of the stack: "Quit -- Unrecoverable error."

- Much input can be removed.
- Semantic info on the stack disappears.
+ Systematic, simple to implement.
+ Efficient, very fast and does not require extra memory.

(The other methods are dealt with later in the section on error management.)
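The two panic-mode steps can be sketched as a small recovery routine for a table-driven parser. This is an illustration under several assumptions, not a method taken from the slides: "parsing can continue" is approximated by "the ACTION table has an entry for the top state and the current token", the table is a dictionary keyed by (state, token), and the function name panic_mode, the toy table and the token names in the usage lines are hypothetical.

```python
def panic_mode(stack, tokens, pos, action, sync):
    """Return (stack, pos) from which parsing may resume, or raise SyntaxError."""
    # First step: skip input until parsing can continue
    # or an important (synchronising) symbol is found.
    while pos < len(tokens):
        if (stack[-1], tokens[pos]) in action:
            return stack, pos
        if tokens[pos] in sync:
            break
        pos += 1
    else:
        raise SyntaxError("Quit -- Unrecoverable error.")
    # Second step: pop the stack until the important symbol is accepted.
    stack = list(stack)
    while stack:
        if (stack[-1], tokens[pos]) in action:
            return stack, pos
        stack.pop()
    # The bottom of the stack was reached.
    raise SyntaxError("Quit -- Unrecoverable error.")


# Toy table: state 0 accepts BEGIN, state 1 accepts x.
toy_action = {(0, "BEGIN"): "S1", (1, "x"): "S2"}
recovered = panic_mode([0, 1, 2], ["?", "?", "BEGIN", "x"], 0,
                       toy_action, {"BEGIN"})
print(recovered)
```

In the usage lines two tokens are skipped, BEGIN is found, and the stack is popped down to state 0, the first state that accepts BEGIN. The popped entries are lost, which is the "semantic info on the stack disappears" drawback listed above.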