Syntax Analysis Contxt-fr grammar Top-down and bottom-up parsing cs5363 1
Front nd Sourc program for (w = 1; w < 100; w = w * 2); Input: a stram of charactrs f o r ( `w = 1 ; w < 1 0 0 ; w Scanning--- convrt input to a stram of words (tokns) for ( w = 1 ; w < 100 ; w Parsing---discovr th syntax/structur of sntncs forstmt assign lss assign mptystmt Lv(w) int(1) Lv(w) Lv(w) int(100) mult Lv(w) int(2) cs5363 2
Contxt-fr Syntax Analysis Goal: rcogniz th structur of programs Dscription of th languag Contxt-fr grammar Parsing: discovr th structur of an input string Rjct th input if it cannot b drivd from th grammar cs5363 3
Dscribing contxt-fr syntax Dscrib how to rcursivly compos programs/sntncs from tokns forstmt: for ( xpr ; xpr ; xpr ) stmt xpr: xpr + xpr xpr xpr xpr * xpr xpr / xpr! xpr stmt: assignmnt forstmt whilstmt cs5363 4
Contxt-fr Grammar A contxt-fr grammar includs (T,NT,S,P) A st of tokns or trminals --- T Atomic symbols in th languag A st of non-trminals --- NT Variabls rprsnting constructs in th languag A st of productions --- P Ruls idntifying componnts of a construct BNF: ach production has format A ::= B (or AB) whr A is a singl non-trminal B is a squnc of trminals and non-trminals A start non-trminal --- S Th main construct of th languag Backus-Naur Form: txtual formula for xprssing contxtfr grammars cs5363 5
Exampl: simpl xprssions BNF: a collction of production ruls ::= n + * / Non-trminals: Trminal (tokn): n, +, -, *, / Start symbol: Using CFG to dscrib rgular xprssions n ::= d n d d ::= 0 1 2 3 4 5 6 7 8 9 Drivation: top-down rplacmnt of non-trminals Each rplacmnt follows a production rul On or mor drivations xist for ach program Exampl: drivations for 5 + 15 * 20 =>*=>+*=>5+*=>5+15*=>5+15*20 =>+=>5+=>5+*=>5+15*=>5+15*20 cs5363 6
Pars trs and drivations Givn a CFG G=(T,NT,P,S), a sntnc si blongs to L(G) if thr is a drivation from S to si Lft-most drivation rplac th lft-most non-trminal at ach stp Right-most drivation rplac th right-most non-trminal at ach stp Pars tr: graphical rprsntation of drivations Grammar: ::= n + * / Sntnc: 5 + 15 * 20 Drivations: =>*=>+*=>5+*=>5+15*=>5+15*20 =>+=>5+=>5+*=>5+15*=>5+15*20 Pars trs: 5 + * 20 15 15 20 cs5363 7 5 + *
Languags dfind by CFG ::= num string id + Support both altrnativ ( ) and rcursion Cannot incorporat contxt information Cannot dtrmin th typ of variabl nams Dclaration of variabls is in th contxt (symbol tabl) Cannot nsur variabls ar always dfind bfor usd int w; 0 = w; for (w = 1; w < 100; w = 2w) a = c + 3; a = c + w cs5363 8
Writing CFGs Giv BNFs to dscrib th following languags All strings gnratd by RE (0 1)*11 Symmtric strings of {a,b}. For xampl aba and babab ar in th languag abab and babbb ar not in th languag All rgular xprssions ovr {0,1}. For xampl 0 1, 0*, (01 10)* ar in th languag 0 and *0 ar not in th languag For ach solution, giv an xampl input of th languag. Thn draw a pars tr for th input basd on your BNF cs5363 9
Abstract vs. Concrt Syntax Concrt syntax: th syntax programmrs writ Exampl: diffrnt notations of xprssions Prfix + 5 * 15 20 Infix 5 + 15 * 20 Postfix 5 15 20 * + Abstract syntax: th structur rcognizd by compilrs Idntifis only th maningful componnts Th opration Th componnts of th opration Pars Tr for 5+15*20 5 + 15 * 20 Abstract Syntax Tr for 5 + 15 * 20 + 5 * 15 20 cs5363 10
Abstract syntax trs Condnsd form of pars tr Oprators and kywords do not appar as lavs Thy dfin th maning of th intrior (parnt) nod S If-thn-ls IF B THEN S1 ELSE S2 B S1 S2 Chains of singl productions may b collapsd E E + T T 5 + 3 5 3 cs5363 11
Ambiguous Grammars A grammar is syntactically ambiguous if Som program has multipl pars trs Consqunc of multipl pars trs Multipl ways to intrprt a program Grammar: ::= n + * / Sntnc: 5 + 15 * 20 Pars trs: + * 20 5 + * 5 15 15 20 cs5363 12
Rwrit ambiguous Exprssions Solution1: introduc prcdnc and associativity ruls to dictat th choics of applying production ruls ::= n + * / Prcdnc and associativity * / >> + - All oprators ar lft associativ Drivation for n+n*n =>+=>n+=>n+*=>n+n*=>n+n*n Solution2: rwrit productions with additional non-trminals E ::= E + T E T T T ::= T * F T / F F F ::= n Drivation for n + n * n E=>E+T=>T+T=>F+T=>n+T=>n+T*F=>n+F*F=>n+n*F=>n+n*n How to modify th grammar if + and - has high prcdnc than * and / All oprators ar right associativ cs5363 13
Rwrit Ambiguous Grammars Disambiguat composition of non-trminals Original grammar S = IF <xpr> THEN S IF <xpr> THEN S ELSE S <othr> Altrnativ grammar S ::= MS US US ::= IF <xpr> THEN MS ELSE US IF <xpr> THEN S MS ::= IF <xpr> THEN MS ELSE MS <othr> cs5363 14
Parsing Rcogniz th structur of programs Givn an input string, discovr its structur by constructing a pars tr Rjct th input if it cannot b drivd from th grammar Top-down parsing Construct th pars tr in a top-down rcursiv dscnt fashion Start from th root of th pars tr, build down towards lavs Bottom-up parsing Construct th pars tr in a bottom-up fashion Start from th lavs of th pars tr, build up towards th root cs5363 15
Top-down Parsing Start from th starting non-trminal, try to find a lft-most drivation E ::= E + T E T T T ::= T * F T / F F F ::= n T void ParsE() { if (us th first rul) { ParsE(); if (gtnxttokn()!= PLUS) ErrorRcovry() ParsT(); } ls if (us th scond rul) { } ls } void ParsT() { } void ParsF() { } - T + F T T * F 7 F F 20 15 Crat a procdur for ach non-trminal S Rcogniz th languag dscribd by S Pars th whol languag in a rcursiv dscnt fashion How to dcid which production rul to us? cs5363 16
LL(k) Parsrs Lft-to-right, lftmost-drivation, k-symbol lookahad parsrs Th production for ach non-trminal can b dtrmind by chcking at most k input tokns LL(k) grammar: grammars that can b parsd by LL(k) parsrs LL(1) parsr: th slction of vry production can b dtrmind by th nxt input tokn Grammar: E ::= E + T E T T T ::= T * F T / F F F ::= n (E) Evry production starts with a numbr. Not LL(1) Lft rcursiv ==> not LL(K) Equivalnt LL(1) grammar : Grammar: E ::= TE E ::= + TE - TE ε T ::= FT T ::= *FT / FT ε F ::= n (E) cs5363 17
Eliminating lft rcursion A grammar is lft-rcursiv if it has a drivation AA for som string Lft rcursiv grammar cannot b parsd by rcursiv dscnt parsrs vn with backtracking A::=A β Grammar: E ::= E + T E T T T ::= T * F T / F F F ::= n A::= β A A ::= A ε Grammar: E ::= TE E ::= + TE - TE ε T ::= FT T ::= *FT / FT ε F ::= n Problm: Lft-rcursion could involv multipl drivations cs5363 18
Algorithm: Eliminating lft-rcursion 1. Arrang th non-trminals in som ordr A1,A2,,An 2. for i = 1 to n do for j = 1 to i-1 do Rplac ach production Ai::=Aj whr Aj ::= β 1 β 2 β k β with Ai::=β 1 β 2 k nd Eliminat lft-rcursion for all Ai productions nd Exampl: S ::= Aa b A ::= Ac Sd Exampl: S ::= Aa b A ::= Ac Aad bd Exampl: S ::= Aa b A ::= bda A A ::= ca ada ε cs5363 19
Lft factoring Whn two altrnativ productions start with th sam symbols, dlay th dcision until w can mak th right choic Can chang LL(k) into LL(1) A::=β 1 β 2 S ::= IF <xpr> THEN S ELSE S IF <xpr> THEN S <othr> A::=A A ::=β 1 β 2 S ::= IF <xpr> THEN S S <othr> S ::= ELSE S ε S ::= MS US US ::= IF <xpr> THEN MS ELSE US IF <xpr> THEN S MS ::= IF <xpr> THEN MS ELSE MS <othr> S ::= MS US US ::= IF <xpr> THEN US US ::= MS ELSE S S MS ::= IF <xpr> THEN MS ELSE MS <othr> cs5363 20
Prdictiv parsing tabl Grammar: E ::= TE E ::= + TE - TE ε T ::= FT T ::= *FT / FT ε F ::= n E n + - * / $ E::=TE E E ::=+TE E ::=-TE E ::= ε T T::=FT T T ::= ε T ::= ε T ::=*FT T ::=/FT T ::= ε F F::=n cs5363 21
Constructing Prdictiv Parsing Tabl For ach string, comput First(): trminals that can start all strings drivd from For ach non-trminal A, comput Follow(A): trminals thn can immdiatly follow A in som drivation Algorithm For ach production A::=, do For ach trminal a in First(), add A::= to M[A,a] If ε First(), add A::= to M[A,b] for ach b Follow(A). Each undfind ntry of M is rror cs5363 22
Comput First E ::= TE E ::= + TE - TE ε T ::= FT T ::= *FT / FT ε F ::= n Non-trminals: First(E )={+,-, ε} First(T )={*,/, ε} First(F) = {n} First(T)=First(F)={n} First(E)=First(T)={n} Strings: First(TE )={n} First(+TE )={+} First(-TE )={-} First(FT )={n} First(*FT )={*} First(/FT )={/} If X is trminal, thn First(X)= {X} If X::= ε is a production, thn ε First(X) If x::=y1y2 yk is a production, thn First(x)=First(y1y2 yk) If X=Y1Y2 Yk is a string, thn First(Y1) First(X) If ε First(Y1), ε First(Y2) ε First(Yi), thn First(Yi+1) First(X) cs5363 23
Comput Follow Grammar: E ::= TE E ::= + TE - TE ε T ::= FT T ::= *FT / FT ε F ::= n Non-trminals: Follow(E)={$} Follow(E )={$} Follow(T) = {$,+,-} Follow(T )={$,+,-} Follow(F)={*,/,+,-,$} If S is th start non-trminal, thn $ Follow(S) If A::=Bβ is a production, thn First(β )-{ε} Follow(B) If ε First(β ), thn Follow(A) Follow(B) If A::= B is a production, thn Follow(A) Follow(B) cs5363 24
Build prdictiv parsing tabls First(TE )={n} First(+TE )={+} First(-TE )={-} First(FT )={n} First(*FT )={*} First(/FT )={/} Follow(E)={$} Follow(E )={$} Follow(T) = {$,+,-} Follow(T )={$,+,-} Follow(F)={*,/,+,-,$} E n + - * / $ E::=TE E E ::=+TE E ::=-TE E ::= ε T T::=FT T T ::= ε T ::= ε T ::=*FT T ::=/FT T ::= ε F F::=n cs5363 25
Bottom-up Parsing Start from th input string, try rduc it to th starting nontrminal. Equivalnt to th rvrs of a right-most drivation Grammar: E ::= E + T E T T T ::= T * F T / F F F ::= n - T T Right-most drivation for 5+15*20-7: + F EE-TE-FE-7E+T-7E+T*F-7 T T * F E+T*20-7E+F*20-7E+15*20-7 T+15*20-7F+15*20-75+15*20-7 F F 20 7 Bottom-up parsing: 5+15*20-7F+15*20-7T+15*20-7 E+15*20-7E+F*20-7E+T*20-7 E+T*F-7E+T-7E-7E-FE-TE 5 15 Right-sntntial form: any sntnc that can appar as an intrmdiat form of a right-most drivation. Th handl of a right-sntntial form : th substring to rduc to a non-trminal at ach stp cs5363 26
Handl pruning Grammar: E ::= E + T E T T T ::= T * F T / F F F ::= n Right-sntntial form Handl Rducing production 5+15*20-7 F+15*20-7 T+15*20-7 E+15*20-7 E+F*20-7 E+T*20-7 E+T*F-7 E+T-7 E-7 E-F E-T E 5 F T 15 F 20 T*F E+T 7 F E-T F::=n T::=F E::=T F::=n T::=F F::=n T::=T*F E::=E+T F::=n T::=F E::=E-T cs5363 27
LR(k) parsrs Lft-to-right, rightmost-drivation, k-symbol lookahad Dcisions ar mad by chcking th nxt k input tokns Us a finit automata to configur actions Automata stats rmmbr symbols to xpct for ach production Each (stat, input tokn) pair dtrmins a uniqu action Why us LR parsrs? Can rcogniz mor CFGs than can prdictiv LL(k) parsrs Can rcogniz virtually all programming languags Gnral non-backtracking mthod, fficint implmntation Can dtct rror at th lftmost position of input string Tradoff: LR(k) vs LL(k) parsrs LR parsrs ar hard to build by hand --- us automatic parsr gnrators (g., yacc) cs5363 28
Shift-rduc parsing Us a stack to sav symbols alrady procssd Prfix of handls procssd so far Us a finit automata to mak dcisions Stat + lookahad => action + goto stat Implmnt handl pruning through four actions Shift th currnt tokn from input string onto stack Rduc symbols on th top of stack to a non-trminal Accpt: succss Error How to locat th handl to b rducd? Which production to us in rducing a handl? Shift/rduc conflict: to shift or to rduc? Rduc/rduc conflict: choos a production to rduc cs5363 29
Exampl: LR(1) parsing tabl I0 n E T I1:(T::=n., $/+/*) I2:(E ::=E., $) I3(E::=T., $/+) + * n I4 I5 T * n I6:(E::=E+T., $/+) I7:T::=T*n., $/+/*) n + * $ E T 0 s1 Goto2 Goto3 1 R(T::=n) R(T::=n) R(T::=n) 2 s4 Acc 3 R(E::=T) s5 R(E::=T) 4 s1 Goto6 5 s7 6 R(E::=E+T) s5 R(E::=E+T) 7 R(T::=T*n) R(T::=T*n) R(T::=T*n)
LR shift-rduc parsing Stack Input Action (0) (0)5(1) (0)T (0)T(3) (0)E (0)E(2) (0)E(2)+(4) (0)E(2)+(4)15(1) (0)E(2)+(4)T (0)E(2)+(4)T(6) (0)E(2)+(4)T(6)*(5) (0)E(2)+(4)T(6)*(5)20(7) (0)E(2)+(4)T (0)E(2)+(4)T(6) (0)E (0)E(2) 5+15*20$ +15*20$ +15*20$ +15*20$ +15*20$ +15*20$ 15*20$ *20$ *20$ *20$ 20$ $ $ $ $ $ Shift 1 Rduc by T::=n Goto3 Rduc by E::=T Goto2 Shift 4 Shift 1 Rduc by T::=n Goto6 Shift5 Shift7 Rduc by T::=T*n Goto6 Rduc by E::=E+T Goto2 Accpt cs5363 31
Modl of an LR parsr Input a1a2.ai an$ Stack Sm Xm Sm-1 LR parsr Output Xm-1 action goto s0 Pars tabl Configuration of LR parsr: (s0x1s1x2s2 Xmsm, aiai+1 an$) Right-sntntial form: Automata stats: s0s1s2 sm X1X2 Xmaiai+1 an$ cs5363 32
Constructing LR parsing tabls Augmntd grammar: add a nw starting non-trminal E Build a finit automata to modl prfix of handls NFA stats: production + position of procssd symbols + lookahad Build a DFA by grouping NFA stats NFA stats: (Sα.β FOLLOW(S),γ ) whr Sαβ is a production, γ Rmmbrs th handl(α.β ) and lookahad(γ ) for ach stat Us lookahad information in automata stats LR(0): no lookahad; LR(1): look-ahad on tokn Grammar: E ::= E E ::= E + T T T ::= T * n n LR(0) itms: (NFA stats) (E ::=.E) E (E ::=E.) (E::=.T) T (E::=T.) (E::=.E+T) E (E::=E.+T) + (E::=E+.T) T (E::=E+T.) LR(1) itms: (E ::=.E,$) E (E ::=E.,$) (E::=.T,$)T(E::=T.,$) cs5363 33
Closur of LR(1) itms If I is a st of LR(1) itms, closur(i) Includs vry itm in I If (A::= α.bβ,a) is in closur(i), and B::=γ is a production, thn for vry b FIRST(β a), add (B::=.γ,b) to closur(i) Rpat until no mor nw itms to add Grammar: E ::= E E ::= E + T T T ::= T * n n Closur({E ::=.E,$}) = {(E ::=.E,$), (E::=.E+T,$/+), (E::=.T,$/+) (T::=.T*F,$/+/*), (T::=.n,$/+/*) } cs5363 34
Goto (DFA) transitions If I is a st of LR(1) itms, X is a grammar symbol, thn Goto(I,X) contains For ach (A::= α.xβ,a) in I, Closur({(A::= αx.β,a)}) Not: thr is no transition from (A::= ε, a) Cononical collction of LR(1) sts Bgin C ::= {closur({(s ::=.S,$)})} rpat for ach itm st I in C for ach grammar symbol X add Goto(I,X) to C until no mor itm sts can b addd to C cs5363 35
Exampl: Building DFA Grammar: E ::= E E ::= E + T T T ::= T * n n I0: {(E ::=.E,$), (E::=.E+T,$/+), (E::=.T,$/+), (T::=.T*F,$/+/*), (T::=.n,$/+/*)} Goto(I0,n): {(T::=n.,$/+/*)} I1 Goto(I0,E): {(E ::=E.,$), (E::=E.+T,$/+)} I2 Goto(I0,T): {(E::=T.,$/+), (T::=T.*n,$/+/*)} I3 Goto(I2,+): {(E::=E+.T,$/+), (T::=.T*n,$/+/*), (T::=.n,$/+/*)} I4 Goto(I3,*): {(T::=T*.n,$/+/*)} I5 Goto(I4,T): {(E::=E+T.,$/+), (T::=T.*n,$/+/*)} I6 Goto(I4,n): {(T::=n.,$/+/*)} I1 Goto(I5,n): {(T::=T*n.,$/+/*)} I7 Goto(I6,*): {(T::=T*.n,$/+/*)} I5 cs5363 36
LR(1) DFA Transitions I0: {(E ::=.E,$), (E::=.E+T,$/+), (E::=.T,$/+), (T::=.T*n,$/+/*), (T::=.n,$/+/*)} Goto(I0,n): {(T::=n.,$/+/*)} I1 Goto(I0,E): {(E ::=E.,$), (E::=E.+T,$/+)} I2 Goto(I0,T): {(E::=T.,$/+), (T::=T.*n,$/+/*)} I3 Goto(I2,+): {(E::=E+.T,$/+), (T::=.T*n,$/+/*), (T::=.n,$/+/*)} I4 Goto(I3,*): {(T::=T*.n,$/+/*)} I5 Goto(I4,T): {(E::=E+T.,$/+), (T::=T.*n,$/+/*)} I6 Goto(I4,n): {(T::=n.,$/+/*)} I1 Goto(I5,n): {(T::=T*n.,$/+/*)} I7 Goto(I6,*): {(T::=T*.n,$/+/*)} I5 I0 n E T I1:(T::=n., $/+/*) I2:(E ::=E., $) I3(E::=T., $/+) + * n I4 I5 T * n I6:(E::=E+T., $/+) I7:T::=T*n., $/+/*) cs5363 37
Constructing LR(1) Parsing Tabl Input: augmntd grammar G Output: parsing tabl functions (action and goto) Mthod: 1. Construct C={I0,I1,,In}, th canonical LR(1) collction 2. Crat a stat i for ach Ii C a) if Goto(Ii, a) = Ij and a is a trminal st action[i,a] to shift j. b) if Goto(Ii,A) = Ij and A is a non-trminal, st GOTO[i,A] to j. b) if (A::=.,a) is in Ii (not: could b ε) st action[i,a] to rduc A::= c) if (S ::=S.,$) is in Ii, st action[i,$] to accpt. cs5363 38
Exampl: LR(1) parsing tabl I0 n E T I1:(T::=n., $/+/*) I2:(E ::=E., $) I3(E::=T., $/+) + * n I4 I5 T * n I6:(E::=E+T., $/+) I7:T::=T*n., $/+/*) n + * $ E T 0 s1 Goto2 Goto3 1 R(T::=n) R(T::=n) R(T::=n) 2 s4 Acc 3 R(E::=T) s5 R(E::=T) 4 s1 Goto6 5 s7 6 R(E::=E+T) s5 R(E::=E+T) 7 R(T::=T*n) R(T::=T*n) R(T::=T*n)
Prcdnc and Associativity E ::= E + E E * E (E) id I7: {E::=E+E., E::=E.+E, E::=E.*E} Oprator + is lft-associativ on input tokn +, rduc with E::=E+E Oprator * has highr prcdnc than + on input tokn *, shift * onto stack I8: {E::=E*E., E::=E.+E, E::=E.*E} Oprator * is lft-associativ on input tokn *, rduc with E::=E*E Oprator * has highr prcdnc than + on input tokn +, rduc with E::=E*E cs5363 40
Parsr hirarchy LL(k) LL(1) LR(k) LR(1) LALR SLR cs5363 41
Summary: grammars and Parsrs Spcification and implmntation of languags Grammars spcify th syntax of languags Parsrs implmnt th spcification Contxt-fr grammars Ambiguous vs non-ambiguous grammars Lft-rcursiv grammars vs. LL parsrs Lft-factoring of grammars Parsrs Backtracking vs prdictiv parsrs LL parsrs vs LR parsrs Lookahad information cs5363 42