COMP 412 FALL 2017 Lexical Analysis, II

Comp 412 (Figure: source code → Front End → IR → Optimizer → IR → Back End → target code)

Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Chapter 2 in EaC2e
Determinism (or not)

So far, we have only looked at deterministic automata, or DFAs.
DFA = Deterministic Finite Automaton
Deterministic means that it has only one transition out of a state on a given character (figure: a single transition s0 → s1, rather than s0 → s1, s2, s3 on the same character).

Can a finite automaton have multiple transitions out of a single state on the same character?
Yes, we call such an FA a Nondeterministic Finite Automaton.
And, yes, the NFA is truly an odd notion, but a useful one.
NFAs and DFAs are equivalent.
Sometimes, it is easier to build an NFA than to build a DFA.

An ε-transition does not consume an input character, which should worry us. (O(1)?)
Whoa. What Does That NFA Mean?

An NFA accepts a string x iff ∃ a path through the transition graph from s0 to a final state such that the edge labels spell x, ignoring ε's.

Two models for NFA execution:
1. To run the NFA, start in s0 and guess the right transition at each step.
2. To run the NFA, start in s0 and, at each non-deterministic choice, clone the NFA to pursue all possible paths. If any of the clones succeeds, accept.

(Figure: NFA for what | who, built from one path that spells w-h-a-t and one that spells w-h-o, both leaving s0.)

In some sense, this same operational definition works on a DFA. See page 44 in EaC2e.
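Model 2 can be sketched without literally cloning anything: it is enough to track the set of states that the live clones occupy. The sketch below assumes a hypothetical encoding of the what | who NFA in which s0 has two transitions on w; the state names a1..a4 and b1..b3 are invented for illustration and do not follow the figure's numbering.

```python
# Hypothetical encoding of an NFA for what | who. State names are
# illustrative; s0 has two transitions on 'w', one per word.
NFA_DELTA = {
    ("s0", "w"): {"a1", "b1"},  # the nondeterministic choice
    ("a1", "h"): {"a2"},
    ("a2", "a"): {"a3"},
    ("a3", "t"): {"a4"},
    ("b1", "h"): {"b2"},
    ("b2", "o"): {"b3"},
}
START = "s0"
FINAL = {"a4", "b3"}


def nfa_accepts(word: str) -> bool:
    """Model 2: run every clone at once by tracking the set of live states."""
    live = {START}
    for ch in word:
        # Each live clone takes every transition available on ch.
        live = {t for s in live for t in NFA_DELTA.get((s, ch), set())}
        if not live:  # every clone has died
            return False
    # Accept if at least one clone stopped in a final state.
    return bool(live & FINAL)


for w in ["what", "who", "wha", "whot"]:
    print(w, nfa_accepts(w))
```

After reading "wh", the live set is {a2, b2}; the next character decides which clone survives. This set-of-states view is the seed of the subset construction.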
Why Do We Care?

We need a construction that takes an RE to a DFA to a scanner. NFAs will help us get there.

Overview:
1. Simple and direct construction of a nondeterministic finite automaton (NFA) to recognize a given RE
   Easy to build in an algorithmic way
   Key idea is to combine NFAs for the terms with ε-transitions
2. Construct a deterministic finite automaton (DFA) that simulates the NFA
   Use a set-of-states construction
3. Minimize the number of states in the DFA
   We will look at 2 different algorithms: Hopcroft's & Brzozowski's
   Optional, but worthwhile; reduces DFA size
4. Generate the scanner code
   Additional specifications needed for the actions
   lex and flex work along these lines
Example of a DFA

Here is a DFA for (a|b)* abb (figure: states s0, s1, s2, s3).

This DFA is not particularly obvious from the RE.
Each RE corresponds to one or more deterministic finite automata (DFAs).
We know a DFA exists for each RE.
The DFA may be hard to build directly.
Automatic techniques will build it for us.

For algorithm aficionados in the class, this DFA is reminiscent of the way that the failure function works in the Knuth, Morris, & Pratt linear-time pattern matcher.
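A DFA like this one is usually run from a transition table. The minimal sketch below assumes the four states are numbered 0 to 3, where state k records how much of the suffix abb has been seen; that numbering is an assumption chosen for the example and may not match the figure's labels exactly.

```python
# One row per DFA state; state k means "the longest suffix of the input
# that is a prefix of abb has length k". State 3 accepts.
DELTA = [
    {"a": 1, "b": 0},  # 0: no progress toward abb
    {"a": 1, "b": 2},  # 1: seen a
    {"a": 1, "b": 3},  # 2: seen ab
    {"a": 1, "b": 0},  # 3: seen abb
]
ACCEPTING = {3}


def dfa_accepts(word: str) -> bool:
    state = 0
    for ch in word:
        row = DELTA[state]
        if ch not in row:  # character outside the alphabet {a, b}
            return False
        state = row[ch]    # exactly one transition per character
    return state in ACCEPTING
```

The fall-back entries (for example, state 3 on a returning to state 1 rather than to state 0) are the part that resembles the KMP failure function.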
Example as an NFA

Here is a simpler, more obvious NFA for (a|b)* abb (figure: a state that loops on a and b, joined by an ε-transition to a path that spells a-b-b).

Here is an NFA for the same language.
The relationship between the RE and the NFA is more obvious.
The ε-transition pastes together two DFAs to form a single NFA.
We can rewrite this NFA to eliminate the ε-transition.
ε-transitions are an odd and convenient quirk of NFAs.
Eliminating this one makes it obvious that the NFA has 2 transitions on a from s0.
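To see that the pasted NFA really recognizes (a|b)* abb, it can be simulated directly: after each move, follow ε-transitions to collect every state the NFA could be in. In this sketch the state names q0..q4 are hypothetical, and Python's `re.fullmatch` serves only as an independent oracle for the cross-check.

```python
import itertools
import re

# Hypothetical state names. q0 loops on a and b (a DFA for (a|b)*);
# the epsilon move q0 -> q1 pastes it onto q1 -a-> q2 -b-> q3 -b-> q4.
EPS = {"q0": {"q1"}}
MOVES = {
    ("q0", "a"): {"q0"},
    ("q0", "b"): {"q0"},
    ("q1", "a"): {"q2"},
    ("q2", "b"): {"q3"},
    ("q3", "b"): {"q4"},
}
FINAL = {"q4"}


def eps_closure(states):
    """All states reachable from `states` by zero or more epsilon moves."""
    stack, seen = list(states), set(states)
    while stack:
        for t in EPS.get(stack.pop(), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def accepts(word):
    current = eps_closure({"q0"})
    for ch in word:
        moved = {t for s in current for t in MOVES.get((s, ch), ())}
        current = eps_closure(moved)
    return bool(current & FINAL)


# Cross-check against Python's regex engine on all strings over {a, b}
# of length 0 through 8.
for n in range(9):
    for letters in itertools.product("ab", repeat=n):
        w = "".join(letters)
        assert accepts(w) == bool(re.fullmatch(r"(a|b)*abb", w)), w
```

Eliminating the ε-transition corresponds to emptying `EPS` and changing the entry for ("q0", "a") to {"q0", "q2"}. The result is exactly the version with two transitions on a from the start state.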
Relationship between NFAs and DFAs

DFA is a special case of an NFA
  DFA has no ε-transitions
  DFA's transition function is single-valued
  Same rules will work
DFA can be simulated with an NFA
  Obviously
NFA can be simulated with a DFA (less obvious, but still true)
  Simulate sets of possible states
  Possible exponential blowup in the state space
  Still, one state per character in the input stream
NFA & DFA are equivalent in ability to recognize languages

Rabin & Scott, 1959
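The "possible exponential blowup" is easy to observe. The sketch below uses a family of NFAs chosen for this illustration (it is not from the slides): the language "the k-th symbol from the end is a", recognized by an ε-free NFA with k + 1 states. Simulating sets of possible states reaches 2^k distinct sets.

```python
def nfa_kth_from_end(k):
    """Hypothetical NFA (states 0..k) for: the k-th symbol from the end is a.
    State 0 loops on a and b and guesses, on an a, that k-1 symbols remain."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, k):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta  # state k is the final state and has no outgoing edges


def count_reachable_subsets(delta, start=0, sigma="ab"):
    """Simulate sets of possible states; count the distinct sets that occur."""
    first = frozenset({start})
    seen, stack = {first}, [first]
    while stack:
        current = stack.pop()
        for ch in sigma:
            nxt = frozenset(t for s in current for t in delta.get((s, ch), ()))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


for k in range(1, 9):
    print(k + 1, "NFA states ->", count_reachable_subsets(nfa_kth_from_end(k)), "sets")
```

Each distinct pattern of the last k input symbols leaves the NFA in a different set of states, so the count doubles with every added NFA state. Even so, the resulting DFA still makes one transition per input character.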
The Plan for Scanner Construction

RE → NFA (Thompson's construction) ✓
  Build an NFA for each term in the RE
  Combine them in patterns that model the operators
NFA → DFA (Subset construction) ✓
  Build a DFA that simulates the NFA
DFA → Minimal DFA
  Hopcroft's algorithm
  Brzozowski's algorithm
Minimal DFA → Scanner
  See §2.5 in EaC2e
DFA → RE
  All pairs, all paths problem
  Union together paths from s0 to a final state

(Figure: The Cycle of Constructions: RE → NFA → DFA → minimal DFA → Scanner, with DFA → RE closing the cycle.)

Automata Theory Moment: Taken together, the constructions on the cycle show that REs, NFAs, and DFAs are all equivalent in their expressive power.
RE → NFA using Thompson's Construction

Key idea:
  NFA pattern for each symbol & each operator
  Join them with ε-moves in precedence order

(Figure: the four patterns: NFA for a; NFA for ab; NFA for a | b; NFA for a*.)

Precedence in REs: Closure > Concatenation > Alternation

Ken Thompson, CACM, 1968
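The four patterns translate almost line for line into code. In this minimal sketch, the representation (an `NFA` record holding a start state, a single final state, and a list of (src, label, dst) edges, with `None` as the ε label) and the helper names `symbol`, `concat`, `alt`, and `star` are assumptions made for the example. State numbers come from a global counter, so they differ from the slides' numbering.

```python
from dataclasses import dataclass, field
from itertools import count

EPS = None            # label used for an epsilon edge
_fresh = count()      # global source of new state numbers


@dataclass
class NFA:
    start: int
    final: int
    edges: list = field(default_factory=list)  # (src, label, dst) triples


def symbol(c):
    """Pattern for a single symbol: start --c--> final."""
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, [(s, c, f)])


def concat(m, n):
    """Pattern for mn: epsilon from m's final state to n's start state."""
    return NFA(m.start, n.final, m.edges + n.edges + [(m.final, EPS, n.start)])


def alt(m, n):
    """Pattern for m | n: new start and final states, four epsilon edges."""
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, m.edges + n.edges + [
        (s, EPS, m.start), (s, EPS, n.start),
        (m.final, EPS, f), (n.final, EPS, f)])


def star(m):
    """Pattern for m*: new start and final, a loop back, and a bypass edge."""
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, m.edges + [
        (s, EPS, m.start), (m.final, EPS, f),
        (m.final, EPS, m.start), (s, EPS, f)])


def accepts(m, word):
    """Simulate the NFA by tracking the epsilon-closed set of live states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for src, label, dst in m.edges:
                if src == s and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    live = closure({m.start})
    for ch in word:
        live = closure({dst for src, label, dst in m.edges
                        if src in live and label == ch})
    return m.final in live


# a(b|c)*, built in precedence order: closure, then concatenation.
nfa = concat(symbol("a"), star(alt(symbol("b"), symbol("c"))))
states = {x for src, _, dst in nfa.edges for x in (src, dst)}
print(len(states), "states,", len(nfa.edges), "edges")
```

For a(b|c)* this yields 10 states and 12 edges, 9 of them ε-edges, which matches the size of the ten-state NFA in the worked example.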
Example of Thompson's Construction

Let's build an NFA for a(b|c)*
1. a, b, & c (figure: three two-state NFAs, s0 → s1 on a, on b, and on c)
2. b | c (figure: the alternation pattern applied to the NFAs for b and c)
3. (b|c)* (figure: the closure pattern wrapped around the NFA for b | c)

Note that states are being renamed at each step.
Example of Thompson's Construction

4. a(b|c)* (figure: the NFA for a concatenated with the NFA for (b|c)*, ten states s0 through s9)

Of course, a human would design something simpler... (figure: s0 → s1 on a, with s1 looping on b and c)

But, we can automate production of the more complex NFA version...
Thompson's Construction Warning

You will be tempted to take shortcuts, such as leaving out some of the ε-transitions.
Do not do it.
Memorize these four patterns. They will keep you out of trouble.

(Figure: the four patterns again: NFA for a; NFA for ab; NFA for a | b; NFA for a*.)

Ken Thompson, CACM, 1968
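Simulating an NFA with a DFA

NFA for a(b|c)*: (figure: Thompson NFA with states n0 through n9)

DFA: (figure: d0 → d1 on a; d1 → d2 on b and d1 → d3 on c; from both d2 and d3, b leads to d2 and c leads to d3)

Where the mapping between NFA states and DFA states is:
DFA | NFA
d0 | n0
d1 | n1, n2, n3, n4, n6, n9
d2 | n5, n8, n9, n3, n4, n6
d3 | n7, n8, n9, n3, n4, n6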
NFA → DFA with Subset Construction

The subset construction builds a DFA that simulates the NFA.

Two key functions:
  Move(si, a) is the set of states reachable from si by a transition on a
  FollowEpsilon(si) is the set of states reachable from si by zero or more ε-transitions

The algorithm:
  Derive the DFA's start state from n0 of the NFA
    Add all states reachable from n0 by following ε-transitions
    d0 = FollowEpsilon({n0})
  Let D = {d0}
  For each α ∈ Σ, compute FollowEpsilon(Move(d0, α))
    If this creates a new state, add it to D
  Iterate until no more states are added

It sounds more complex than it is...

Rabin & Scott, 1959
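Both key functions are short. The sketch below encodes the Thompson NFA for a(b|c)* with the n0..n9 numbering; the exact edge list is an assumption reconstructed from the figure (n0 goes to n1 on a; n4 goes to n5 on b; n6 goes to n7 on c; everything else is ε). `move` and `follow_epsilon` are illustrative names for Move and FollowEpsilon.

```python
# Hypothetical encoding of the Thompson NFA for a(b|c)*, using the n0..n9
# numbering from the example (edges reconstructed from the figure).
DELTA = {(0, "a"): {1}, (4, "b"): {5}, (6, "c"): {7}}
EPSILON = {1: {2}, 2: {3, 9}, 3: {4, 6}, 5: {8}, 7: {8}, 8: {3, 9}}


def move(states, ch):
    """Set of states reachable from `states` by one transition on ch."""
    return {t for s in states for t in DELTA.get((s, ch), ())}


def follow_epsilon(states):
    """Set of states reachable from `states` by zero or more epsilon moves."""
    reached, worklist = set(states), list(states)
    while worklist:
        for t in EPSILON.get(worklist.pop(), ()):
            if t not in reached:
                reached.add(t)
                worklist.append(t)
    return reached


d0 = follow_epsilon({0})
d1 = follow_epsilon(move(d0, "a"))
d2 = follow_epsilon(move(d1, "b"))
print(sorted(d0), sorted(d1), sorted(d2))
```

Note that `follow_epsilon` always includes its argument (zero ε-moves), which is why d0 is simply {n0}: n0 has no outgoing ε-edges.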
NFA → DFA with Subset Construction

The algorithm:

    d0 ← FollowEpsilon({n0})
    D ← {d0}
    W ← {d0}
    while (W ≠ Ø) {
        select and remove s from W
        for each α ∈ Σ {
            t ← FollowEpsilon(Move(s, α))
            T[s, α] ← t
            if (t ∉ D) then {    // this test is a little tricky: it compares sets of states
                add t to D
                add t to W
            }
        }
    }

d0 is a set of states; D & W are sets of sets of states.
Any DFA state that contains a final state of the NFA becomes a final state of the DFA.

The algorithm halts:
1. D contains no duplicates (test before addition)
2. 2^|NFA states| is finite
3. The while loop adds to D, but does not remove from D (monotone)
⇒ the loop halts

D contains all the reachable sets of NFA states. The algorithm tries each character in each di, so it builds every possible NFA configuration.
D and T form the DFA.
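The worklist algorithm can be sketched as a single Python function. DFA states are `frozenset`s so that they can be stored inside the set D; the function name and signature are assumptions for the example. One deliberate difference from the pseudocode: an empty t is skipped rather than added as an explicit error state, which matches the "none" entries in the worked table.

```python
def subset_construction(n0, sigma, delta, epsilon, nfa_final):
    """Worklist subset construction; DFA states are frozensets of NFA states."""

    def move(states, ch):
        return {t for s in states for t in delta.get((s, ch), ())}

    def follow_epsilon(states):
        reached, worklist = set(states), list(states)
        while worklist:
            for t in epsilon.get(worklist.pop(), ()):
                if t not in reached:
                    reached.add(t)
                    worklist.append(t)
        return frozenset(reached)  # hashable, so it can live inside D

    d0 = follow_epsilon({n0})
    D, W, T = {d0}, [d0], {}
    while W:
        s = W.pop()                      # select and remove s from W
        for ch in sigma:
            t = follow_epsilon(move(s, ch))
            if not t:                    # empty set: the table's "none"
                continue
            T[s, ch] = t
            if t not in D:               # the set-of-sets membership test
                D.add(t)
                W.append(t)
    finals = {d for d in D if d & nfa_final}
    return d0, D, T, finals


# Thompson NFA for a(b|c)*, numbered n0..n9 as in the example (assumed edges).
DELTA = {(0, "a"): {1}, (4, "b"): {5}, (6, "c"): {7}}
EPSILON = {1: {2}, 2: {3, 9}, 3: {4, 6}, 5: {8}, 7: {8}, 8: {3, 9}}
d0, D, T, finals = subset_construction(0, "abc", DELTA, EPSILON, {9})
print(len(D), "DFA states;", len(finals), "final")
```

The "tricky" test `t not in D` is cheap here only because Python hashes frozensets. A production scanner generator would more likely use a hash of sorted state lists or a bit-vector encoding.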
NFA → DFA with Subset Construction

Example of a fixed-point computation
  Monotone construction of some finite set
  Halts when it stops adding to the set
  Proofs of halting & correctness are similar
  These computations arise in many contexts

Other fixed-point computations
  Canonical construction of sets of LR(1) items
    Quite similar to the subset construction
  Classic data-flow analysis & Gaussian Elimination
    Solving sets of simultaneous set equations

We will see many more fixed-point computations.
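The shape shared by these computations can be written once: apply a monotone step that only adds elements, and stop when an iteration adds nothing. The helper `fixed_point` below is a hypothetical name for this sketch; it is applied to the ε-edges of the a(b|c)* example (edge list assumed from the figure) to recompute FollowEpsilon as a fixed point.

```python
def fixed_point(step, initial):
    """Grow a set by a monotone step until an iteration adds nothing new."""
    current = frozenset(initial)
    while True:
        bigger = current | frozenset(step(current))
        if bigger == current:   # nothing added: fixed point reached
            return current
        current = bigger


# Epsilon edges of the a(b|c)* NFA from the example.
EPSILON = {1: {2}, 2: {3, 9}, 3: {4, 6}, 5: {8}, 7: {8}, 8: {3, 9}}


def one_epsilon_step(states):
    return {t for s in states for t in EPSILON.get(s, ())}


print(sorted(fixed_point(one_epsilon_step, {1})))
```

The halting argument mirrors the one for the subset construction. Every iteration that does not stop strictly enlarges a subset of a finite universe, so only finitely many iterations are possible.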
NFA → DFA with Subset Construction

a(b|c)*: (figure: the Thompson NFA, states n0 through n9)

States (DFA | NFA) | FollowEpsilon(Move(s, a)) | FollowEpsilon(Move(s, b)) | FollowEpsilon(Move(s, c))
d0 | n0 | n1, n2, n3, n4, n6, n9 (d1) | none | none
d1 | n1, n2, n3, n4, n6, n9 | none | n5, n8, n9, n3, n4, n6 (d2) | n7, n8, n9, n3, n4, n6 (d3)
d2 | n5, n8, n9, n3, n4, n6 | none | d2 | d3
d3 | n7, n8, n9, n3, n4, n6 | none | d2 | d3

n5 is the core state of d2; n7 is the core state of d3.
d1, d2, and d3 are final states because each contains n9.
NFA → DFA with Subset Construction

Transition table for the DFA:

δ | a | b | c
d0 | d1 | none | none
d1 | none | d2 | d3
d2 | none | d2 | d3
d3 | none | d2 | d3
NFA → DFA with Subset Construction

The DFA for a(b|c)* (figure: d0 → d1 on a; d1 → d2 on b and d1 → d3 on c; from both d2 and d3, b leads to d2 and c leads to d3; d1, d2, and d3 are final)

Much smaller than the NFA (no ε-transitions)
All transitions are deterministic
Use the same code skeleton as before

But, remember, our goal was: (figure: s0 → s1 on a, with s1 looping on b and c)
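The gap between the four-state DFA and the two-state goal can be checked mechanically with one shared DFA-running skeleton. The sketch below compares both machines on every string over {a, b, c} up to a bounded length. This is a sanity check, not a proof of equivalence; merging d1, d2, and d3 is the job of the minimization algorithms (Hopcroft's or Brzozowski's). The names `SUBSET_DFA`, `GOAL_DFA`, `run`, and `first_disagreement` are hypothetical.

```python
from itertools import product

# DFA produced by the subset construction for a(b|c)*...
SUBSET_DFA = {
    ("d0", "a"): "d1",
    ("d1", "b"): "d2", ("d1", "c"): "d3",
    ("d2", "b"): "d2", ("d2", "c"): "d3",
    ("d3", "b"): "d2", ("d3", "c"): "d3",
}
# ...and the two-state DFA a human would draw.
GOAL_DFA = {("s0", "a"): "s1", ("s1", "b"): "s1", ("s1", "c"): "s1"}


def run(delta, start, finals, word):
    """Same skeleton for any DFA: one table lookup per input character."""
    state = start
    for ch in word:
        state = delta.get((state, ch))  # a missing entry means "none"
        if state is None:
            return False
    return state in finals


def first_disagreement(max_len):
    """Return a string the two DFAs treat differently, or None."""
    for n in range(max_len + 1):
        for w in map("".join, product("abc", repeat=n)):
            if (run(SUBSET_DFA, "d0", {"d1", "d2", "d3"}, w)
                    != run(GOAL_DFA, "s0", {"s1"}, w)):
                return w
    return None


print(first_disagreement(7))
```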
Rabin and Scott, 1959 (page 8)

(Scanned page from the original paper; the extracted text is only partly legible. The recoverable passages:)

The page opens with the end of the definition of the dual automaton 𝔄* = (S, M*, F, S0), where s′ is in M*(s, σ) if and only if s is in M(s′, σ).

Definition 11. Let 𝔄 = (S, M, S0, F) be a nondeterministic automaton. 𝔇(𝔄) is the system (T, N, t0, G) where T is the set of all subsets of S, N is a function on T × Σ such that N(t, σ) is the union of the sets M(s, σ) for s in t, t0 = S0, and G is the set of all subsets of S containing at least one member of F. Clearly 𝔇(𝔄) is an ordinary automaton, but it is actually equivalent to 𝔄.

Theorem 11. If 𝔄 is a nondeterministic automaton, then T(𝔄) = T(𝔇(𝔄)).

Proof (outline of the scanned text): given a tape x = σ0σ1…σn-1 in T(𝔄) with an accepting sequence of internal states s0, s1, …, sn, induction on k shows that sk is in N(t0, xk), where xk is the prefix of x of length k; hence N(t0, x) is in G and T(𝔄) ⊆ T(𝔇(𝔄)). For the converse, the proof works backwards from tn ∈ G to recover such a sequence of states.

Slide annotation (partly lost in extraction): "You must appreciate the …"

Notice that we have at once the equation 𝔄** = 𝔄. The relation between the sets defined by an automaton and its dual is as follows.

Theorem 12. If 𝔄 is a nondeterministic automaton, then T(𝔄*) = T(𝔄)*.

It should be noted that Theorem 12 together with Theorem 11 yields a direct construction and proof for Theorem 4 of Section 3, which was first proved by the indirect method of Theorem 1. In the next section we make heavy use of the direct constructions supplied by the nondeterministic machines to obtain results not easily apparent from the mathematical characterizations of Theorems 1 and 2.

6. Further closure properties. Simplifying a result due originally to Kleene, Myhill in unpublished work has shown that the class T can be characterized as the least class of sets of tapes containing the finite sets and closed under some simple operations on sets of tapes. We indicate here a different proof using … (the page breaks off here)
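Definition 11 builds the deterministic machine on every subset of S, whereas the worklist algorithm earlier in the deck materializes only the subsets reachable from the start state. The sketch below follows the definition literally for a small ε-free NFA for (a|b)* abb. That NFA, its state names p0..p3, and the helper names are assumptions for the example; the names S, M, S0, F, T, N, t0, and G mirror the paper's notation.

```python
from itertools import combinations

# Hypothetical epsilon-free NFA for (a|b)*abb, written as Rabin and
# Scott's (S, M, S0, F): p0 loops on a and b, then p0 -a-> p1 -b-> p2 -b-> p3.
S = ["p0", "p1", "p2", "p3"]
SIGMA = "ab"
M = {
    ("p0", "a"): {"p0", "p1"},
    ("p0", "b"): {"p0"},
    ("p1", "b"): {"p2"},
    ("p2", "b"): {"p3"},
}
S0 = {"p0"}
F = {"p3"}

# Definition 11, read literally: T is the set of ALL subsets of S.
T = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]
# N(t, x) is the union of M(s, x) over the states s in t.
N = {(t, x): frozenset(q for s in t for q in M.get((s, x), ()))
     for t in T for x in SIGMA}
t0 = frozenset(S0)
# G is the set of subsets containing at least one member of F.
G = {t for t in T if t & F}


def accepts(word):
    t = t0
    for x in word:
        t = N[t, x]
    return t in G


def reachable_from_t0():
    """Subsets actually reachable from t0 (what the worklist version builds)."""
    seen, stack = {t0}, [t0]
    while stack:
        t = stack.pop()
        for x in SIGMA:
            u = N[t, x]
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


print(len(T), "subsets in T;", len(reachable_from_t0()), "reachable from t0")
```

Here all 16 subsets are built, but only 4 are ever entered from t0. This is why practical scanner generators use the reachable-only worklist form.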