Compila(on 0368/3133(SemesterA,2013/14) Admin Lecture4:SyntaxAnalysis (Top/DownParsing) ModernCompilerDesign:Chapter2.2 NoamRinetzky Nextweek:Trubowicz101(Lawschool) Mobiles... Slidescredit:RomanManevich,MoolySagiv,JeffUllman,EranYahav 1 2 WhatisaCompiler? Acompilerisacomputerprogramthat transformssourcecodewri\enina programminglanguage(sourcelanguage) intoanotherlanguage(targetlanguage). Themostcommonreasonforwan(ngto transformsourcecodeistocreatean executableprogram. //Wikipedia( ConceptualStructureofaCompiler Compiler txt exe Frontend Semantic Backend Source Executable Representation text code Lexical Syntax Semantic Intermediate Code Analysis Analysis Analysis Representation Generation Parsing (IR) 3 words sentences 4 1
program(text( token(stream( Fromscanningtoparsing! ( LP Grammar: E... Id!!!Id! a... z ( LP ((23$+$7)$$x)$ syntax error 23 Num Lexical Analyzer + OP Parser Op() valid 7 Num ) RP OP x Id Abstract(Syntax(Tree( ) RP ContextFreeGrammars G=(V,T,P,S) V nonterminals(syntac(cvariables) T terminals(tokens) P deriva(onrules EachruleoftheformVt(T4V) S startsymbol Op(+) Id(b) Num(23) Num(7) 5 6 CFGterminology! CFGterminology S S;S S id:=e E id! E num! E E+E Symbols:! Terminals(tokens):;!:=!(!)id!num!print! Non:terminals:SEL Start(non:terminal:S Conven(on:thenon/terminalappearing inthefirstderiva(onrule Grammar(produc=ons((rules)( N μ Deriva(on/asequenceofreplacementsof non/terminalsusingthederiva(onrules Language/thesetofstringsofterminals derivablefromthestartsymbol Senten(alform/theresultofapar(al deriva(oninwhichtheremaybenon/ terminals 7 8 2
S S ; S id := E ; S id := id ; S id := id ; id := E id := id ; id := E + E ParseTree S S E S id := E Leomost/rightmostDeriva(on Leomostderiva(on alwaysexpandleomostnon/terminal Rightmostderiva(on Alwaysexpandrightmostnon/terminal id := id ; id := E + id id := id ; id := id + id x:= z ; y := x + z id := id ; E id + E id Allowsustodescribederiva(onbylis(ngthe sequenceofrules alwaysknowwhataruleisappliedto 9 10 LeomostDeriva(on Broadkindsofparsers x":="z;" y":="x"+"z" S t S;S S t id := E S S ; S id := E ; S id := id ; S id := id ; id := E id := id ; id := E + E id := id ; id := id + E id := id ; id := id + id x$:=$z$ ;$$y$$:=$$x$+$z$ E t id E + E E E ( E ) S t S;S S t id := E E t id S t id := E E t E + E E t id E t id 11 Parsersforarbitrarygrammars Earley smethod,cykmethod Usually,notusedinprac(ce(thoughmightchange) Top/Downparsers Constructparsetreeinatop/downma\er Findtheleomostderiva(on LinearBo\om/Upparsers Constructparsetreeinabo\om/upmanner Findtherightmostderiva(oninareverseorder 12 3
Intui(on:Top/DownParsing! Beginwithstartsymbol Guess theproduc(ons Checkifparsetreeyieldsuser'sprogram E ET E T T T+F T F F id F num F (E) Intui(on:Top/Downparsing! E T E Recall:Nonstandard precedence T T F F F +! 2!! 3! 13 14 Intui(on:Top/Downparsing! Intui(on:Top/Downparsing! E ET E T T T+F T F F id F num F (E) Weneedthis ruletogetthe! E! E ET E T T T+F T F F id F num F (E) E! E T! +! 2!! 3! +! 2!! 3! 15 16 4
Intui(on:Top/Downparsing! Intui(on:Top/Downparsing! E ET E T T T+F T F F id F num F (E) E T! E E ET E T T T+F T F F id F num F (E) E T E T T! T F! +! 2!! 3! +! 2!! 3! 17 18 Intui(on:Top/Downparsing! Intui(on:Top/Downparsing! E ET E T T T+F T F F id F num F (E) E T E E ET E T T T+F T F F id F num F (E) E T E T T T T F! F F F +! 2!! 3! +! 2!! 3! 19 20 5
Intui(on:Top/Downparsing! Intui(on:Top/Downparsing! E ET E T T T+F T F F id F num F (E) E T E E ET E T T T+F T F F id F num F (E) E T E T T T T F F F F F +! 2!! 3! +! 2!! 3! 21 22 E ET E T T T+F T F F id F num F (E) Intui(on:Top/Downparsing! T E T E T Challengesintop/downparsing! Top/downparsingbeginswithvirtuallyno informa(on Beginswithjustthestartsymbol,which matcheseveryprogram Howcanweknowwhichproduc(onsto apply? F F F +! 2!! 3! 23 24 6
RecursiveDescent! Recursivedescent bool A(){ // A " A 1 A n pos = recordcurrentposition(); Blindexhaus(vesearch Goesoverallpossibleproduc(onrules Read&parseprefixesofinput Backtracksifguesseswrong Implementa(on Uses(possiblyrecursive)func(onsforevery produc(onrule Backtracks! rewind input for (i = 1; i n; i++){ if (A i ()) return true; rewindcurrent(pos); } return false; } bool A i (){ // A i = X 1 X 2 X k for (j=1; j k; j++) if (X j is a terminal) if (X j == current) match(current); else return false; else if (! X j ()) return false; return true; } current 25 token(stream( ( LP ( LP 23 Num + OP 7 Num ) RP OP x Id ) RP 26 Example Grammar E"T T+E T"int intt (E) Input:(5) Tokenstream:LPintRP Problem:LeoRecursion E E/term term inte(){ returne()&&match(token( / ))&&term(); } # Whathappenswiththisprocedure? # Recursivedescentparserscannothandleleo/recursivegrammars p. 127! 27 28 7
Leorecursionremoval p. 130! Challengesintop/downparsing! L(G 1 )=β,βα,βαα,βααα, L(G 2 )=same N Nα β E E/term term N βn N αn ε G 1 G 2 # Forour3 rd example: Canbedonealgorithmically. Problem:grammarbecomes mangledbeyondrecogni(on! E termte term TE >termte ε Top/downparsingbeginswithvirtuallyno informa(on Beginswithjustthestartsymbol,which matcheseveryprogram Howcanweknowwhichproduc(onsto apply? Wanted:Top/Downparsingwithout backtracking 29 30 Predic(veparsing! GivenagrammarGandawordwderivewusingG Applyproduc(ontoleomostnonterminal Pickproduc(onrulebasedonnextinputtoken Generalgrammar Morethanoneop(onforchoosingthenext produc(onbasedonatoken Restrictedgrammars(LL) Knowexactlywhichsingleruletoapplybasedon Nonterminal Next(k)tokens(lookahead) Booleanexpressionsexample! E LIT (E OP E) not E LIT true false OP and or xor not ( not true or false ) 31 32 8
Booleanexpressionsexample! E LIT (E OP E) not E LIT true false OP and or xor produc(onto applyknownfrom nexttoken! not ( not true or false ) E Problem:produc(onswithcommonprefix! term ID ID[expr] Cannottellwhichruletousebasedonlookahead(ID) E => not E => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ) not E ( E OP E ) not LIT or LIT true false 33 34 Solu(on:leofactoring RewritethegrammartobeinLL(1) term ID ID[expr] Leofactoring anotherexample! S ifethenselses ifethens T term IDaoer_ID Aoer_ID [expr] ε S ifethenss T S elses ε Intui(on:justlikefactoringxy+xzintox(y+z) 35 36 9
LL(k)Parsers LL(k)parsingviapushdown automataandpredic(ontable! Predic(veparser Canbegeneratedautoma(cally Doesnotuserecursion Efficient Incontrast,recursivedescent Manualconstruc(on Recursive Expensive 37 Pushdownautomatonuses Predic(onstack Inputtokenstream Transi(ontable nonterminalsxtokens/>produc(onalterna(ve EntryindexedbynonterminalNandtokent containsthealterna(veofnthatmustbe predicatedwhencurrentinputstartswitht 38 Nonterminals Exampletransi(ontable! (1) E LIT (2) E ( E OP E ) (3) E not E (4) LIT true (5) LIT false (6) OP and (7) OP or (8) OP xor (! )! not! true! false! and! or! xor!! $! E 2 3 1 1 LIT 4 5 Input tokens OP 6 7 8 Which!rule! should!be!used! Non/RecursivePredic(veParser! Stack! X! Y! Z! $! a! +! Predic(veParsing program! Parsing Table! b! $! Output! Table 39 40 10
LL(k)parsingviapushdown automataandpredic(ontable! Twopossiblemoves Predic(on WhentopofstackisnonterminalN,popN,lookuptable[N,t]. Iftable[N,t]isnotempty,pushtable[N,t]onpredic(onstack, otherwise syntaxerror Match Whentopofpredic(onstackisaterminalT,mustbeequalto nextinputtokent.if(t==t),poptandconsumet.if(t T) syntaxerror Parsingterminateswhenpredic(onstackisempty Ifinputisemptyatthatpoint,success.Otherwise, syntaxerror Runningparserexample! aacbb$ A t aab c Input!suffix! Stack!content! Move! aacbb$ A$ predict(a,a)=ataab aacbb$ aab$ match(a,a) acbb$ Ab$ predict(a,a)=ataab acbb$ aabb$ match(a,a) cbb$ Abb$ predict(a,c)=atc cbb$ cbb$ match(c,c) bb$ bb$ match(b,b) b$ b$ match(b,b) $ $ match($,$) success 41 a! b! c! A AtaAb Atc 42 FIRSTsets FIRST(α)={t α"tβ} {X" } FIRST(α)=allterminalsthatαcanappearas firstinsomederiva(onforα + ifcanbederivedfromx Example: FIRST(LIT)={true,false} FIRST((EOPE))={ ( } FIRST(notE)={not} 43 Compu(ngFIRSTsets! FIRST(t)={t}// t nonterminal FIRST(X)if X" or X"A 1..A k and FIRST(A i )i=1 k FIRST(α) FIRST(X)if X"A 1..A k αand FIRST(A i )i=1 k 44 11
12 FIRSTsetscomputa(onexample! STMT ifexprthenstmt whileexprdostmt EXPR; EXPR TERM zero?term notexpr ++id //id TERM id constant! TERM! EXPR! STMT! 45 1.Ini(aliza(on! TERM! EXPR! STMT! id constant! zero? Not ++ //! if while! 46 STMT ifexprthenstmt whileexprdostmt EXPR; EXPR TERM zero?term notexpr ++id //id TERM id constant! TERM! EXPR! STMT! id constant! zero? Not ++ //! if while! zero? Not ++ //! 47 2.Iterate STMT ifexprthenstmt whileexprdostmt EXPR; EXPR TERM zero?term notexpr ++id //id TERM id constant! 2.Iterate2! TERM! EXPR! STMT! id constant! zero? Not ++ //! if while! id constant!! zero? Not ++ //! 48 STMT ifexprthenstmt whileexprdostmt EXPR; EXPR TERM/>id zero?term notexpr ++id //id TERM id constant!
2.Iterate3 fixed/point! FOLLOWsets p. 189! STMT ifexprthenstmt whileexprdostmt EXPR; EXPR TERM zero?term notexpr ++id //id TERM id constant! STMT! if while! zero? Not ++ //! id constant! EXPR! zero? Not ++ //! id constant!! TERM! id constant! FOLLOW(X)={t S" αxtβ} FOLLOW(X)=setoftokensthatcanimmediatelyfollowXin somesenten(alform NOTrelatedtowhatcanbederivedfromX Intui(on:X"AB FIRST(B) FOLLOW(A) FOLLOW(B) FOLLOW(X) IfB" εthenfollow(a) FOLLOW(X) 49 50 FOLLOWsets:Constraints $ FOLLOW(S) FIRST(β) { } FOLLOW(X) ForeachA"αXβ FOLLOW(A) FOLLOW(X) ForeachA"αXβand FIRST(β) Example:FOLLOWsets E"TX X"+E T"(E) intyy"t Terminal! +! (!! )! int! FOLLOW int,( int,( int,( _,),$,),+,$ Non.! Term.! E! T! X! Y! FOLLOW ),$ +,),$ $,) _,),$ 51 52 13
A"α Predic(onTable Problem:NonLLGrammars S Aab A a ε T[A,t]=αift FIRST(α) T[A,t]=αif FIRST(α)andt FOLLOW(A) tcanalsobe$ Tisnotwelldefined!thegrammarisnotLL(1) bools(){ returna()&&match(token( a ))&&match(token( b )); } boola(){ returnmatch(token( a )) true; } # Whathappensforinput ab? # Whathappensifyoufliporderofalterna(vesandtry aab? 53 54 Problem:NonLLGrammars Solu(on:subs(tu(on S Aab A a ε S Aab A a ε FIRST(S)={a} FIRST(A)={aε} FOLLOW(S)={$} FOLLOW(A)={a} S aab ab Subs(tuteAinS FIRST/FOLLOWconflict S aaoer_a aoer_a ab b Leofactoring 55 56 14
LL(k)grammars AgrammarisLL(k)ifitcanbederivedvia: Top/downderiva(on Scanningtheinputfromleotoright(L) Producingtheleomostderiva(on(L) Withlookaheadofktokens(k) Tiswelldefined AlanguageissaidtobeLL(k)whenithasan LL(k)grammar LL(k)grammars AgrammarisnotLL(k)ifitis Ambiguous Leorecursive Notleofactored 57 58 EarleyParsing Dynamicprogramming InventedbyEarley[PhD.1968] HandlesarbitraryCFG Canhandleambiguousgrammars ComplexityO(N 3 )whenn= input Usesdynamicprogramming Compactlyencodesambiguity 59 BreakaproblemPintosubproblemsP 1 P k SolvePbycombiningsolu(onsforP 1 P k Memo/ize(store)solu(onstosubproblems insteadofre/computa(on Bellman/Fordshortestpathalgorithm Sol(x,y,i)=minimumof Sol(x,y,i/1) Sol(t,y,i/1)+weight(x,t)foredges(x,t) 60 15
EarleyParsing Dynamicprogrammingimplementa(onofa recursivedescentparser S[N+1]Sequenceofsetsof Earleystates N= INPUT Earleystate(item)sisasenten(alform+auxinfo S[i]Allparsetreethatcanbeproduced(bya RDP)aoerreadingthefirstitokens S[i+1]builtusingS[0] S[i] EarleyStates s=<cons(tuent,back> cons(tuent(do\edrule)fora"αβ A" αβpredicatedcons(tuents A"α βin/progresscons(tuents A"αβ completedcons(tuents backpreviousearlystateinderiva(on 61 62 EarleyParser Input=x[1 N] S[0]=<E " E,0>;S[1]= S[N]={} fori=0...ndo un(ls[i]doesnotchangedo foreachs S[i] ifs=<a" a,b>anda=x[i+1]then S[i+1]=S[i+1] {<A" a,b>}//scan ifs=<a" X,b>andX"αthen S[i]=S[i] {<X" α,i>}//predict ifs=<a",b>and<x" A,k> S[b]then S[i]=S[i] {<X" A,k>}//complete S0 S E,0 E E + E,0 E n,0 Example n + n S1 S2 S3 E n, 0 S E, 0 E E +E,0 E E + E,0 E E + E,2 E n,2 E n, 2 E E + E, 0 E E +E,2 S E, 0 FIGURE 1. Earley sets for the grammar E E + E n and the input n + n.itemsinboldareoneswhichcorrespondtothe input s derivation. 63 64 16