Vertalerbouw (Compiler Construction) HC 9: Parsing: Top-down & LL(1)
  Thu 3 May 2001 (56)
  Theo Ruys
  INF 5037 - tel. 3716
  ruys@cs.utwente.nl

Thursday 3 May 2001 (56) Vertalerbouw - HC9

Overview HC9
  Ch. 8 - Parsing
    8.1 Context-free Grammars
    8.2 Top-down Parsing
    8.3 LL(1) Grammars
  See also [Aho, Sethi & Ullman 1986] for a more thorough discussion.

Introduction
  Parser (= syntax analyser)
  - checks whether the input program is syntactically correct
  - is usually specified by a context-free grammar
      regular expressions   → finite-state automaton
      context-free grammar  → stack automaton
  - is usually augmented with actions for
      context constraints
      code optimisation and generation
  - is used not only for programming languages, but for all programs that process structured data
  - parsing strategies:
      top-down parsing
      bottom-up parsing

Context-free Grammars (1)
  A Context-free Grammar (CFG) G is defined by a 4-tuple (N, T, P, S)
    N: finite set of non-terminals (define structure)
    T: finite set of terminals (tokens that occur)
    P: production rules A → β, with A ∈ N and β ∈ (N ∪ T)* = V*
    S: start symbol, S ∈ N
  Example: G = ({A, B}, {a, b, c}, P, A), where P consists of four production rules
    [production rules and the equivalent regexp are garbled in the source]
  Notational conveniences:
  - only provide the production rules
  - use the choice operator: A → α | β abbreviates the two rules A → α and A → β

Context-free Grammars (2)
  A CFG is a specification of a rewrite system.
  CFGs are used to derive strings of terminals.
  Notation:
    α, β, γ, δ ∈ (N ∪ T)* = V*   string of symbols
    u, v, w ∈ T*                 string of terminals
    X, Y, Z ∈ (N ∪ T)            single grammar symbol
    A, B, C, D ∈ N               single non-terminal
    a, b, c ∈ T                  single terminal
  1-step derivation: αAγ ⇒ αβγ, using production rule A → β

CFGs (3)
  Derivation αAγ ⇒ αβγ
  - left-most derivation:  if α = w, then wAγ ⇒l wβγ
  - right-most derivation: if γ = w, then αAw ⇒r αβw
  - zero or more steps: α ⇒* β
  - one or more steps:  α ⇒+ β
  Recursion
  - left-recursive derivation:  if A ⇒+ Aα then the CFG is left-recursive
  - right-recursive derivation: if A ⇒+ αA then the CFG is right-recursive
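
The difference between a left-most and a right-most derivation step can be illustrated with a few lines of Python. This is only a sketch: the grammar S → AB, A → a, B → b and the names `derive_step`, `NONTERMINALS` and `PRODUCTIONS` are assumptions made for the illustration, not part of the slides.

```python
# Hypothetical grammar for illustration: S -> AB, A -> a, B -> b.
# Symbols are single characters; productions are (lhs, rhs) pairs.
NONTERMINALS = {"S", "A", "B"}
PRODUCTIONS = [("S", "AB"), ("A", "a"), ("B", "b")]


def derive_step(form, production, leftmost=True):
    """Rewrite the left-most (or right-most) non-terminal of `form`.

    Returns the new sentential form, or None when `form` has no
    non-terminal or the production does not apply to the selected one.
    """
    lhs, rhs = production
    positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
    if not positions:
        return None
    i = positions[0] if leftmost else positions[-1]
    if form[i] != lhs:
        return None
    return form[:i] + rhs + form[i + 1:]


# A complete left-most derivation S => AB => aB => ab.
steps = ["S"]
for production in PRODUCTIONS:
    steps.append(derive_step(steps[-1], production))

# One right-most step from the sentential form AB rewrites B first.
right = derive_step("AB", ("B", "b"), leftmost=False)
```

In the left-most derivation the prefix before the rewritten non-terminal is always a string of terminals, which matches the condition α = w on the slide.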

CFGs (4)
  Terminology (cont.)
  - if S ⇒* β then β is a sentential form
  - if S ⇒* w then w is a sentence
  Example: a grammar with the non-terminals A and D
    [production rules garbled in the source]
  [Figure: a derivation in the example grammar, with its sentential forms and the resulting sentence.]

Context-free Grammars (5)
  Context-free Language (CFL)
  - CFL = the set of all sentences derived from a CFG
  - CFL(G) = { w ∈ T* | S ⇒+ w }
  Previous example: CFL = { ... }  [sentences garbled in the source]
  Parse tree: another representation of a derivation
  [Figure: a derivation and the parse tree it corresponds with.]

Context-free Grammars (6)
  Example: a grammar G with three production rules over two operators
    [production rules garbled in the source]
  There are two ways of deriving a sentence in the CFL corresponding to G:
  G is ambiguous!
  In this case, G does not define the relative priorities of the two operators.
  Both derivations are left-derivations.

Context-free Grammars (7)
  Unambiguous grammar: four production rules
    [production rules garbled in the source]
  One operator has priority over the other.
  An extra nonterminal is used to solve the priority/ambiguity problem.

Context-free Grammars (8)
  Infamous dangling-else problem:
    S → if E then S
    S → if E then S else S
  The sentence
    if E then if E then S else S
  has two parse trees: the else can belong to the inner if or to the outer if.
  [Figure: the two parse trees.]

Context-free Grammars (9)
  Problem: verifying that the language L is generated by grammar G,
  i.e. to prove that L(G) = L
  - verify: if S ⇒* w then w ∈ L
  - verify: if w ∈ L then S ⇒* w
  Example: L is the language consisting of balanced parentheses.
    G: S → ( S ) S | ε
  Proof sketch: use induction on the number of derivation steps and on the length of the sentence.
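
The two verification directions can be checked by brute force for short sentences. This is no substitute for the induction proof. The Python sketch below enumerates all sentences of S → ( S ) S | ε up to a length bound and compares them with all balanced strings up to that bound; the function names are chosen for this illustration only.

```python
from itertools import product


def generate(max_len):
    """All sentences of S -> (S)S | ε with at most max_len terminals."""
    sentences, todo, seen = set(), ["S"], {"S"}
    while todo:
        form = todo.pop()
        i = form.find("S")              # left-most non-terminal, if any
        if i < 0:
            sentences.add(form)         # no S left: form is a sentence
            continue
        for rhs in ("(S)S", ""):        # the two alternatives of S
            new = form[:i] + rhs + form[i + 1:]
            terminals = len(new.replace("S", ""))
            if terminals <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return sentences


def balanced(word):
    """True if `word` over '(' and ')' is a balanced-parentheses string."""
    depth = 0
    for ch in word:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def all_balanced(max_len):
    """All balanced strings over '(' and ')' of length <= max_len."""
    return {
        "".join(p)
        for n in range(max_len + 1)
        for p in product("()", repeat=n)
        if balanced("".join(p))
    }


# Both inclusions L(G) ⊆ L and L ⊆ L(G) hold up to length 6.
same_up_to_6 = generate(6) == all_balanced(6)
```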

CFGs (10)
  S → ( S ) S | ε
  Verify that every generated string is balanced:
  - n=1 (one-step derivation): ε is balanced
  - n>1: assume that all strings are balanced for <n-step derivations;
    consider an n-step derivation, which will be of the form:
      S ⇒ (S)S ⇒* (x)S ⇒* (x)y
    x and y must be balanced (both are cases of <n-step derivations),
    hence (x)y is balanced.

CFGs (11)
  S → ( S ) S | ε
  Verify that all balanced-parenthesis strings can be generated from S:
  - n=0 (length of sentence): ε is derivable from S
  - n>0: assume that every balanced string of length <2n is derivable;
    consider a balanced string of length 2n (for n ≥ 1);
    let (x) be the shortest non-empty balanced prefix of the string;
    the balanced string can be written as (x)y, where x and y are both balanced
    and both <2n in length; therefore they are derivable.
    Hence, we can find
      S ⇒ (S)S ⇒* (x)S ⇒* (x)y

Context-free Grammars (12)
  REs vs CFGs
  An RE can always be expressed as a CFG. Algorithm:
    1. ∀ state s, create a nonterminal A_s
    2. ∀ transition from s to t labelled a, write A_s → a A_t
    3. For accept states s, write A_s → ε
    4. The start state is the begin symbol.
  Example:
    RE:  (a|b)*abb
    NFA: states 0, 1, 2, 3
    CFG: A0 → aA0 | bA0 | aA1
         A1 → bA2
         A2 → bA3
         A3 → ε

Context-free Grammars (13)
  So: an RL is always context-free. A CFL is usually not regular.
  Examples:
    L1 = { a^n | n ≥ 1 }            regular: a+
    L2 = { a^n b^n | n ≥ 1 }        not regular; context free: S → aSb | ab
    L3 = { a^n b^n c^n | n ≥ 1 }    not regular; not context free
  - a finite automaton cannot keep count
  - a grammar can count two items, but not three
  Idea: a sufficiently long sentence of an RL/CFL can always be written in a form in which a substring/state is repeated.
  The Pumping Lemmas for regular expressions and grammars should be used to prove that a language L is not an RL or CFL (see [Sudkamp 1991]).

Top-down Parsing (1)
  RE
  - Use the subset construction to generate a DFA (= scanner)
  - DFA = finite-state automaton
  CFG
  - Can we also generate a parser for a CFG?
  - SA = stack automaton
  - An SA is an NFA (or DFA) with an extra stack. The stack gives the FA the extra power.
  A parser is an algorithm based on an SA that begins with the start symbol of the CFG and derives a sentence.
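
To make the remark about the extra stack concrete, here is a small Python sketch of a recogniser with two control states and one stack. It assumes that L2 reads { a^n b^n | n ≥ 1 }; the function name `accepts_anbn` is hypothetical. The stack does the counting that a finite automaton alone cannot do.

```python
def accepts_anbn(word):
    """Recognise { a^n b^n | n >= 1 } with one stack and two control states."""
    stack = []
    state = "push"                 # "push": reading a's, "pop": reading b's
    for ch in word:
        if state == "push" and ch == "a":
            stack.append("a")      # count an a by pushing it
        elif ch == "b" and stack:
            state = "pop"
            stack.pop()            # match one b against one stored a
        else:
            return False           # a after b, too many b's, or foreign symbol
    # Accept only if at least one b was read and every a was matched.
    return state == "pop" and not stack


results = {w: accepts_anbn(w) for w in ["ab", "aabb", "aab", "abb", "aba", ""]}
```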

Top-down Parsing (2)
  Recall recursive-descent parsing (Ch. 1)
  - A procedure is associated with each nonterminal N in the grammar.
  - The body of the procedure may contain
      statements that match terminals;
      statements that call procedures for each nonterminal in the right-hand side of the production of N;
      semantic actions.
  - Recursive-descent parsers implicitly use a stack, i.e. the call-stack of the procedures.
  Top-down parsing: building the parse tree from the root (i.e. the start symbol).
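
As a reminder of what such a parser looks like, here is a minimal recursive-descent sketch in Python for the balanced-parentheses grammar S → ( S ) S | ε. The class and method names are assumptions for this illustration. The nesting depth that it returns merely stands in for a semantic action.

```python
class ParseError(Exception):
    """Raised when the input is not a sentence of the grammar."""


class Parser:
    def __init__(self, text):
        self.text = text + "$"     # "$" marks the end of the input
        self.pos = 0

    @property
    def look_ahead(self):
        return self.text[self.pos]

    def match(self, terminal):
        """Statement that matches one terminal and advances the input."""
        if self.look_ahead != terminal:
            raise ParseError(f"expected {terminal!r} at position {self.pos}")
        self.pos += 1

    def S(self):
        """Procedure for non-terminal S: S -> ( S ) S | ε."""
        if self.look_ahead == "(":
            self.match("(")
            inner = self.S()       # call for the first S in the right-hand side
            self.match(")")
            rest = self.S()        # call for the second S
            return max(inner + 1, rest)   # sample semantic action: nesting depth
        return 0                   # ε-production: consume nothing


def parse(text):
    parser = Parser(text)
    depth = parser.S()
    parser.match("$")              # the whole input must have been consumed
    return depth
```

The Python call stack plays the role of the parser stack here: each pending call to `S` corresponds to a non-terminal that is still waiting to be expanded.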

Top-down Parsing (3)
  Example (using an explicit stack):
    A → aAB | ε
    B → b
    string = aabb
  [Table: successive stack contents and remaining input, ending with an empty stack and input $; entries garbled in the source.]
  - For the nonterminal on top of the stack, a production rule is executed.
  - Terminals on top of the stack get popped, while advancing the look-ahead pointer in the input.
  - The steps correspond to a derivation of the string; all steps are left-derivations.

TD Parsing (4)
  Top-down Table-driven (TD-TD) algorithm:

    bool TDTD() {
      Stack s;
      bool accept = true;
      s.init();
      s.push(S);
      while (accept && (look_ahead != $ || !s.empty())) {
        top = s.pop();
        if (top ∈ T) {
          if (top != look_ahead) accept = false;
          else look_ahead = read_input();
        }
        else if (top ∈ N) {
          // assume top == A
          Select some production A → X1 ... Xn   // might be nondeterministic...
          s.push(X1, ..., Xn);                    // such that X1 ends up on top
        }
        else accept = false;                      // e.g. the stack was empty
      }
      return accept;
    }

  - Note that in each iteration a symbol is popped from the stack.
  - In the TD-TD algorithm, the selection of a production rule is driven by a table.

LL(1) (1)
  So we may have a choice of production rules A → α
  Choosing a production rule
  - non-predictive: randomly (requires backtracking!)
  - predictive: using the look-ahead symbols in the input
  LL(k)
    If, by looking ahead k symbols in the input stream, we can always choose the right production rule, the given grammar is (strong) LL(k).
    L: left-to-right scanning through the input stream
    L: left-derivation

LL(1) (2)
  strong LL(k) vs. (normal) LL(k)
  - strong LL(k): we only consider the look-ahead tokens in the input stream when choosing a production rule.
  - LL(k): apart from the look-ahead symbols in the input, we may also use the input tokens that have already been read to choose a production rule.
  class of strong LL(k) grammars ⊆ class of LL(k) grammars
  LL(1) ⊂ LL(2) ⊂ LL(3) ...
  Example: two production rules p1 and p2 for the same non-terminal, whose right-hand sides start with the same terminal
    [production rules garbled in the source]
  If k=1, we cannot tell whether p1 or p2 should be applied.
  Therefore, the grammar is not LL(1); it is LL(2).
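
The table-driven loop of the TD-TD algorithm can be sketched in Python once the selection of a production is made deterministic. The sketch below is not the slide's exact code: it uses the grammar S → ( S ) S | ε with a hand-written table, pushes `$` as a bottom-of-stack marker instead of testing the look-ahead in the loop condition, and all names are assumptions.

```python
TERMINALS = {"(", ")", "$"}

# Hand-written table: (non-terminal, look-ahead) -> right-hand side to push.
TABLE = {
    ("S", "("): "(S)S",
    ("S", ")"): "",        # ε-production
    ("S", "$"): "",        # ε-production
}


def td_parse(text, start="S"):
    """Return True if `text` is accepted by the table-driven parser."""
    tokens = list(text) + ["$"]
    pos = 0
    stack = ["$", start]               # the top of the stack is the list's end
    while stack:
        top = stack.pop()              # one symbol is popped per iteration
        look_ahead = tokens[pos]
        if top in TERMINALS:
            if top != look_ahead:
                return False
            pos += 1                   # terminal matched: advance the input
        elif (top, look_ahead) in TABLE:
            rhs = TABLE[(top, look_ahead)]
            stack.extend(reversed(rhs))   # push Xn ... X1, so X1 ends on top
        else:
            return False               # no table entry: syntax error
    return pos == len(tokens)


accepted = [w for w in ["", "()", "(())()", "(()", ")("] if td_parse(w)]
```

Because every table cell holds at most one right-hand side, no backtracking is needed. Guaranteeing that property for a grammar is the purpose of the LL(1) condition.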

LL(1) (3)
  Consider k=1: LL(1) = strong LL(1)
  LL(1) grammars are sufficient to describe most programming constructs.
  Define:
    prefix(w) = first terminal of w
    FIRST(α)  = { terminals that are first in a sentence w derived from α }
              = { a | α ⇒* w and a = prefix(w), for some w }
    FOLLOW(A) = { terminals that are in FIRST(γ) in some sentential form βAγ }
              = { a | S ⇒* βAγ and a ∈ FIRST(γ), for some β, γ ∈ V* }
  New definition of LL(1):
    Given G, if for all pairs of distinct productions A → α and A → β
    (α and β might be ε)
      FIRST(α.FOLLOW(A)) ∩ FIRST(β.FOLLOW(A)) = ∅
    then G is LL(1).

LL(1) (4)
  Example: G1 is defined by
    p1: A → aB
    p2: A → ε
    p3: B → bA
    p4: B → cA
  L(G1) = { ε, ab, ac, abab, abac, acab, acac, ... }
  LL(1)-test for G1:
    p1 and p2:
      FIRST(aB.FOLLOW(A)) ∩ FIRST(ε.FOLLOW(A)) = {a} ∩ {$} = ∅
    p3 and p4:
      FIRST(bA.FOLLOW(B)) ∩ FIRST(cA.FOLLOW(B)) = {b} ∩ {c} = ∅
  Hence, G1 is LL(1)
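
The LL(1)-test can be automated with a fixed-point computation of EMPTY, FIRST and FOLLOW. The Python sketch below is one possible formulation, not the algorithm from the book. It assumes that G1 reads A → aB | ε, B → bA | cA. The grammar `G_CONFLICT` is a hypothetical variant, made up here so that a ∈ FOLLOW(A); it is not the slides' G2.

```python
def first_of(symbols, first, empty, nts):
    """FIRST set of a symbol string, and whether the string can derive ε."""
    result = set()
    for X in symbols:
        if X not in nts:               # terminal: it starts every derived sentence
            result.add(X)
            return result, False
        result |= first[X]
        if not empty[X]:
            return result, False
    return result, True


def analyse(grammar, start):
    """Return (empty, first, follow) for a grammar {non-terminal: [rhs, ...]}."""
    nts = set(grammar)
    empty = {A: False for A in nts}
    first = {A: set() for A in nts}
    follow = {A: set() for A in nts}
    follow[start].add("$")             # "$" marks the end of the input
    changed = True
    while changed:                     # iterate until nothing changes any more
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                f, e = first_of(rhs, first, empty, nts)
                if not f <= first[A] or (e and not empty[A]):
                    first[A] |= f
                    empty[A] = empty[A] or e
                    changed = True
                for i, X in enumerate(rhs):
                    if X in nts:       # what can follow X in this alternative
                        f, e = first_of(rhs[i + 1:], first, empty, nts)
                        new = f | (follow[A] if e else set())
                        if not new <= follow[X]:
                            follow[X] |= new
                            changed = True
    return empty, first, follow


def is_ll1(grammar, start):
    """Pairwise disjointness of FIRST(rhs.FOLLOW(A)) over the alternatives of A."""
    empty, first, follow = analyse(grammar, start)
    for A, alternatives in grammar.items():
        seen = set()
        for rhs in alternatives:
            f, e = first_of(rhs, first, empty, set(grammar))
            director = f | (follow[A] if e else set())
            if director & seen:
                return False
            seen |= director
    return True


G1 = {"A": ["aB", ""], "B": ["bA", "cA"]}             # assumed reading of G1
G_CONFLICT = {"A": ["aB", ""], "B": ["bAa", "cAa"]}   # hypothetical variant
```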

LL(1) (5)
  Example: G2 is defined by
    p1: A → aB
    p2: A → ε
    p3: B → [...]
    p4: B → [...]
  [The right-hand sides of p3 and p4 and the listing of L(G2) are garbled in the source.]
  LL(1)-test for G2:
    p1 and p2:
      FIRST(aB.FOLLOW(A)) ∩ FIRST(ε.FOLLOW(A))
        = {a} ∩ FOLLOW(A) = {a} ∩ {a} = {a} ≠ ∅
  G2 is not LL(1)

LL(1) (6)
  Notational convenience:
  Instead of using the expression FIRST(α.FOLLOW(A)) for a production rule A → α, we define
    DIRS(A → α) = FIRST(α),              if ¬EMPTY(α)
                = FIRST(α) ∪ FOLLOW(A),  otherwise
  The DIRS set can be used to compute the parse table M:
    DIRS(A → α) = { a1, a2, ... }
    DIRS(B → β) = { b1, b2, ... }
  Now:
    M(A, ai) = A → α
    M(B, bi) = B → β
  In general: M(A, a) = A → α, if a ∈ DIRS(A → α)

LL(1) (7)
  ... we know how to check whether G is LL(1) ...
  When G is not LL(1), can it be made LL(1)?
  Answer: sometimes. It is not decidable, though.
  Left factorisation
    A → αβ
    A → αγ      cannot be LL(1)
  becomes
    A → αB
    B → β
    B → γ       which is LL(1) if FIRST(β) ∩ FIRST(γ) = ∅
  Eliminate left recursion
    A → Aα
    A → β       cannot be LL(1)
  becomes
    A → βC
    C → αC
    C → ε       which is LL(1) if FIRST(α.FOLLOW(C)) ∩ FIRST(ε.FOLLOW(C)) = ∅

LL(1) (8)
  ... it does not always work (e.g. left factorisation)
  Example: [the example grammar and its successive left-factored versions are garbled in the source]
  - D is not important.
  - It may not be directly clear why this G is not LL(1).
  - After left factorisation, a new nonterminal assumes the role of the original one: this method will not terminate!

LL(1) (9)
  Concluding remarks:
  - Section 8.3.3 of the book/reader contains an extensive and formal discussion on the LL(1)-test using the function EMPTY and the sets LEADING (generalisation of FIRST), TRAILING, FOLLOW and DIRS.
  - Algorithms are presented to automatically calculate these sets to perform the LL(1)-test; and if the grammar is LL(1), the DIRS can directly be used to construct the parse table.
  - We will briefly discuss these sets (and algorithms) in HC10 when presenting CFGs.
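
How the DIRS sets lead directly to the parse table M can be sketched in a few lines of Python. The sketch assumes that G1 reads A → aB | ε, B → bA | cA. The FIRST, EMPTY and FOLLOW values are hard-coded rather than computed, and FOLLOW(B) = {$} is derived here from that reading of G1, not taken from the slides. The names `dirs` and `build_table` are hypothetical.

```python
# Productions of G1 (assumed reading): name -> (left-hand side, right-hand side).
PRODUCTIONS = {
    "p1": ("A", "aB"),
    "p2": ("A", ""),       # ε-production
    "p3": ("B", "bA"),
    "p4": ("B", "cA"),
}

# FIRST of each right-hand side, whether it can derive ε, and the FOLLOW sets.
FIRST_RHS = {"p1": {"a"}, "p2": set(), "p3": {"b"}, "p4": {"c"}}
EMPTY_RHS = {"p1": False, "p2": True, "p3": False, "p4": False}
FOLLOW = {"A": {"$"}, "B": {"$"}}


def dirs(name):
    """DIRS(A -> α): FIRST(α), extended with FOLLOW(A) when α can derive ε."""
    lhs, _ = PRODUCTIONS[name]
    if not EMPTY_RHS[name]:
        return set(FIRST_RHS[name])
    return FIRST_RHS[name] | FOLLOW[lhs]


def build_table():
    """M(A, a) = the production A -> α with a in DIRS(A -> α)."""
    table = {}
    for name, (lhs, _) in PRODUCTIONS.items():
        for terminal in dirs(name):
            if (lhs, terminal) in table:   # two productions claim the same cell
                raise ValueError(f"not LL(1): conflict at M({lhs}, {terminal})")
            table[(lhs, terminal)] = name
    return table


M = build_table()
```

A cell that would receive two productions is exactly a failed LL(1)-test, so constructing the table and testing the grammar can be done in one pass.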