Principles of Programming Languages
http://www.di.unipi.it/~andrea/Didattica/PLP-14/
Prof. Andrea Corradini
Department of Computer Science, Pisa

Lesson 6
From RE to DFA, directly
Minimization of DFA's
Exercises on lexical analysis

From Regular Expression to DFA Directly
The important states of an NFA are those with a non-ε outgoing transition: if move({s}, a) ≠ ∅ for some a, then s is an important state.
The subset construction algorithm uses only the important states when it determines ε-closure(move(T, a)).

What are the important states in the NFA built from a Regular Expression?
[Figure: the NFA's built by the Thompson construction for ε, a, r1 | r2, r1 r2 and r*, each with start state i and accepting state f]

From Regular Expression to DFA Directly (Algorithm)
The only accepting state (via the Thompson algorithm) is not important.
Augment the regular expression r with a special end symbol # to make accepting states important: the new expression is r#.
Construct a syntax tree for r#.
Attach a unique integer to each node not labeled by ε.

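From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#
[Figure: syntax tree of (a|b)*abb#. The alternation (or-node) has leaves a (position 1) and b (position 2) and sits under the closure (star-node). Concatenation (cat-nodes) then attach the leaves a (position 3), b (position 4), b (position 5) and # (position 6). Position numbers are given to leaves ≠ ε.]
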
From Regular Expression to DFA Directly: Annotating the Tree
Traverse the tree to construct the functions nullable, firstpos, lastpos, and followpos.
For a node n, let L(n) be the language generated by the subtree with root n.
nullable(n): L(n) contains the empty string ε
firstpos(n): the set of positions under n that can match the first symbol of a string in L(n)
lastpos(n): the set of positions under n that can match the last symbol of a string in L(n)
followpos(i): the set of positions that can follow position i in any generated string

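From Regular Expression to DFA Directly: Annotating the Tree
Leaf ε:
  nullable(n) = true
  firstpos(n) = ∅
  lastpos(n) = ∅
Leaf i:
  nullable(n) = false
  firstpos(n) = {i}
  lastpos(n) = {i}
Or-node n = c1 | c2:
  nullable(n) = nullable(c1) or nullable(c2)
  firstpos(n) = firstpos(c1) ∪ firstpos(c2)
  lastpos(n) = lastpos(c1) ∪ lastpos(c2)
Cat-node n = c1 c2:
  nullable(n) = nullable(c1) and nullable(c2)
  firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
  lastpos(n) = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
Star-node n = c1*:
  nullable(n) = true
  firstpos(n) = firstpos(c1)
  lastpos(n) = lastpos(c1)
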
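The rules in this table translate almost line by line into recursive functions. The following is a minimal sketch, not part of the slides: the tuple encoding of tree nodes and the helper name `leaf` are assumptions made only for illustration. It evaluates the three functions on the syntax tree of (a|b)*abb#.

```python
# Node encoding (an assumption of this sketch, not taken from the slides):
#   ("eps",)                     leaf labeled with the empty string
#   ("leaf", symbol, position)   leaf with a position number
#   ("or", c1, c2), ("cat", c1, c2), ("star", c1)


def nullable(n):
    kind = n[0]
    if kind in ("eps", "star"):
        return True
    if kind == "leaf":
        return False
    if kind == "or":
        return nullable(n[1]) or nullable(n[2])
    if kind == "cat":
        return nullable(n[1]) and nullable(n[2])
    raise ValueError(f"unknown node kind: {kind!r}")


def firstpos(n):
    kind = n[0]
    if kind == "eps":
        return frozenset()
    if kind == "leaf":
        return frozenset({n[2]})
    if kind == "or":
        return firstpos(n[1]) | firstpos(n[2])
    if kind == "cat":
        if nullable(n[1]):
            return firstpos(n[1]) | firstpos(n[2])
        return firstpos(n[1])
    if kind == "star":
        return firstpos(n[1])
    raise ValueError(f"unknown node kind: {kind!r}")


def lastpos(n):
    kind = n[0]
    if kind == "eps":
        return frozenset()
    if kind == "leaf":
        return frozenset({n[2]})
    if kind == "or":
        return lastpos(n[1]) | lastpos(n[2])
    if kind == "cat":
        if nullable(n[2]):
            return lastpos(n[1]) | lastpos(n[2])
        return lastpos(n[2])
    if kind == "star":
        return lastpos(n[1])
    raise ValueError(f"unknown node kind: {kind!r}")


def leaf(symbol, position):
    return ("leaf", symbol, position)


# Syntax tree of (a|b)*abb# with positions 1..6.
star = ("star", ("or", leaf("a", 1), leaf("b", 2)))
root = star
for symbol, position in [("a", 3), ("b", 4), ("b", 5), ("#", 6)]:
    root = ("cat", root, leaf(symbol, position))

print(nullable(root), sorted(firstpos(root)), sorted(lastpos(root)))
# prints: False [1, 2, 3] [6]
```

Recomputing nullable at every call is wasteful on large trees; a single bottom-up pass that stores the three values per node would avoid it, at the cost of a slightly longer sketch.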
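From Regular Expression to DFA: Annotating the Syntax Tree of (a|b)*abb#
[Figure: the syntax tree annotated with firstpos (left) and lastpos (right) at each node; the star-node is the only nullable node]
Leaf a (1): firstpos {1}, lastpos {1}
Leaf b (2): firstpos {2}, lastpos {2}
Or-node: firstpos {1, 2}, lastpos {1, 2}
Star-node (nullable): firstpos {1, 2}, lastpos {1, 2}
Leaf a (3): firstpos {3}, lastpos {3}
Cat-node above position 3: firstpos {1, 2, 3}, lastpos {3}
Leaf b (4): firstpos {4}, lastpos {4}
Cat-node above position 4: firstpos {1, 2, 3}, lastpos {4}
Leaf b (5): firstpos {5}, lastpos {5}
Cat-node above position 5: firstpos {1, 2, 3}, lastpos {5}
Leaf # (6): firstpos {6}, lastpos {6}
Root cat-node: firstpos {1, 2, 3}, lastpos {6}
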
From Regular Expression to DFA: followpos on the Syntax Tree of (a|b)*abb#
[Figure: the annotated syntax tree of (a|b)*abb#, with the firstpos and lastpos sets of each node, used to compute followpos]

From Regular Expression to DFA Directly: followpos
for each node n in the tree do
    if n is a cat-node with left child c1 and right child c2 then
        for each i in lastpos(c1) do
            followpos(i) := followpos(i) ∪ firstpos(c2)
        end do
    else if n is a star-node then
        for each i in lastpos(n) do
            followpos(i) := followpos(i) ∪ firstpos(n)
        end do
    end if
end do

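From Regular Expression to DFA Directly: Example
Node    followpos
a 1     {1, 2, 3}
b 2     {1, 2, 3}
a 3     {4}
b 4     {5}
b 5     {6}
# 6     -
[Figure: the followpos graph over positions 1 to 6, and the resulting DFA with states {1,2,3} (start), {1,2,3,4}, {1,2,3,5} and {1,2,3,6}]
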
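The followpos table for (a|b)*abb# can be reproduced with one bottom-up traversal. The sketch below is an illustration under stated assumptions: the tuple encoding of nodes and the function name `annotate` are invented here, and the function computes nullable, firstpos and lastpos on the way up so that the cat-node and star-node rules can be applied in the same pass.

```python
# Node encoding (an assumption of this sketch):
#   ("eps",)  ("leaf", symbol, position)  ("or", c1, c2)  ("cat", c1, c2)  ("star", c1)


def annotate(n, followpos):
    """Return (nullable, firstpos, lastpos) of n and fill the followpos dict."""
    kind = n[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "leaf":
        position = n[2]
        followpos.setdefault(position, set())
        return False, {position}, {position}
    if kind == "or":
        n1, f1, l1 = annotate(n[1], followpos)
        n2, f2, l2 = annotate(n[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = annotate(n[1], followpos)
        n2, f2, l2 = annotate(n[2], followpos)
        # cat-node rule: everything in firstpos(c2) can follow lastpos(c1)
        for i in l1:
            followpos[i] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f1, l1 = annotate(n[1], followpos)
        # star-node rule: firstpos(n) can follow lastpos(n)
        for i in l1:
            followpos[i] |= f1
        return True, f1, l1
    raise ValueError(f"unknown node kind: {kind!r}")


# Syntax tree of (a|b)*abb# with positions 1..6.
root = ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2)))
for symbol, position in [("a", 3), ("b", 4), ("b", 5), ("#", 6)]:
    root = ("cat", root, ("leaf", symbol, position))

followpos = {}
root_nullable, root_first, root_last = annotate(root, followpos)
for position in sorted(followpos):
    print(position, sorted(followpos[position]))
```

The printed sets agree with the table: positions 1 and 2 are followed by {1, 2, 3}, positions 3, 4 and 5 by {4}, {5} and {6}, and position 6 by nothing.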
From Regular Expression to DFA Directly: The Algorithm
s0 := firstpos(root), where root is the root of the syntax tree for (r)#
Dstates := {s0} and s0 is unmarked
while there is an unmarked state T in Dstates do
    mark T
    for each input symbol a ∈ Σ do
        let U be the union of followpos(p) for all positions p in T such that the symbol at position p is a
        if U is not empty and not in Dstates then
            add U as an unmarked state to Dstates
        end if
        Dtran[T, a] := U
    end do
end do

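As a rough illustration of this loop, the sketch below runs it on the followpos table of (a|b)*abb#. The names `build_dfa`, `symbol_at` and `accepts` are assumptions of this sketch; taking a state to be accepting when it contains the position of # follows from the augmentation of r with the end symbol.

```python
def build_dfa(start, followpos, symbol_at, alphabet):
    """Direct RE-to-DFA construction from firstpos(root) and followpos."""
    start = frozenset(start)
    dstates = [start]      # all states found so far, in discovery order
    unmarked = [start]     # states whose transitions are still to be computed
    dtran = {}
    while unmarked:
        T = unmarked.pop()                      # "mark T"
        for a in alphabet:
            U = frozenset().union(
                *(followpos[p] for p in T if symbol_at[p] == a)
            )
            if U and U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[T, a] = U
    return dstates, dtran


# Data for (a|b)*abb#, copied from the followpos example.
followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
symbol_at = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
start = frozenset({1, 2, 3})                    # firstpos(root)

dstates, dtran = build_dfa(start, followpos, symbol_at, "ab")
accepting = [T for T in dstates if 6 in T]      # states containing the position of #


def accepts(word):
    state = start
    for ch in word:
        # a missing entry means the empty (dead) state was reached
        state = dtran.get((state, ch), frozenset())
    return state in accepting


for T in dstates:
    print(sorted(T), {a: sorted(dtran[T, a]) for a in "ab"})
```

On this input the construction finds the four states {1,2,3}, {1,2,3,4}, {1,2,3,5} and {1,2,3,6}, the last one being accepting.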
Minimizing the Number of States of a DFA
[Figure: a DFA with states A, B, C, D, E and an equivalent DFA with states AC, B, D, E]
Given a DFA, let us show how to get a DFA which accepts the same regular language with a minimal number of states.

Equivalent States: Example
Consider the accept states c and g. They are both sinks.
Q: Do we need both states?
[Figure: a DFA with states a, b, c, d, e, f, g]

Equivalent States: Example
A: No, they can be merged!
Q: Can any other states be merged because any subsequent string suffixes produce identical results?
[Figure: the DFA with c and g merged into the single state cg]

Equivalent States: Example
A: Yes, b and f. Notice that if you're in b or f then:
1. if the string ends, reject in both cases
2. on one of the possible next characters, forever accept in both cases
3. on the other next character, forever reject in both cases
So merge b with f.
[Figure: the DFA with c and g merged into the single state cg]

Equivalent States: Definition
Intuitively, two states are equivalent if all subsequent behavior from those states is the same.
[Figure: the DFA with states a, d, e, cg, bf]
DEF: Two states q and q' in a DFA M = (Q, Σ, δ, q0, F) are equivalent (or indistinguishable) if for all strings u ∈ Σ*, the states on which u ends when read from q and q' are both accept, or both non-accept.

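The definition quantifies over all strings u, but it can be checked by a finite search over pairs of states. The sketch below is a hedged illustration: the function name `distinguishing_string` and the four-state DFA (states x, w, y, z over the alphabet {0, 1}) are hypothetical and are not the automaton drawn on the slides. It returns a shortest string that tells two states apart, or None when the states are equivalent.

```python
from collections import deque


def distinguishing_string(delta, accepting, alphabet, q1, q2):
    """Shortest u such that exactly one of the states reached from q1 and q2
    by reading u is accepting; None if q1 and q2 are equivalent."""
    seen = {(q1, q2)}
    queue = deque([(q1, q2, "")])
    while queue:
        p, q, u = queue.popleft()
        if (p in accepting) != (q in accepting):
            return u
        for a in alphabet:
            pair = (delta[p, a], delta[q, a])
            if pair not in seen:
                seen.add(pair)
                queue.append((pair[0], pair[1], u + a))
    return None


# Hypothetical DFA: y and z are accepting sinks, x and w are non-accepting.
delta = {
    ("x", "0"): "y", ("x", "1"): "w",
    ("w", "0"): "z", ("w", "1"): "x",
    ("y", "0"): "y", ("y", "1"): "y",
    ("z", "0"): "z", ("z", "1"): "z",
}
accepting = {"y", "z"}

print(distinguishing_string(delta, accepting, "01", "y", "z"))  # None: both are accepting sinks
print(distinguishing_string(delta, accepting, "01", "x", "w"))  # None: same behavior on every suffix
print(repr(distinguishing_string(delta, accepting, "01", "x", "y")))  # '': the empty string separates them
```

The search visits at most |Q|² pairs, so it terminates even though Σ* is infinite.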
Finishing the Example
Q: Any other ways to simplify the automaton?
[Figure: the DFA with states a, d, e, cg, bf]

Useless States
A: Get rid of d.
Getting rid of unreachable useless states doesn't affect the accepted language.
[Figure: the DFA with states a, e, cg, bf]

Minimization Algorithm: Goals
DEF: An automaton is irreducible if it contains no useless states, and no two distinct states are equivalent.
The goal of the Minimization Algorithm is to create an irreducible automaton from an arbitrary one, accepting the same language.
The minimization algorithm incrementally builds a partition of the states of the given DFA:
It starts with a partition separating just accepting/non-accepting states.
Next it splits an equivalence class if it contains two non-equivalent states.

Minimization Algorithm. (Partition Refinement) Code
DFA minimize(DFA (Q, Σ, δ, q0, F))
    remove any state q unreachable from q0
    Partition P = {F, Q - F}
    boolean Consistent = false
    while (Consistent == false)
        Consistent = true
        for (every Set S ∈ P, char a ∈ Σ, Set T ∈ P)
            // collect states of T that reach S using a
            Set temp = {q ∈ T | δ(q, a) ∈ S}
            if (temp != Ø && temp != T)
                Consistent = false
                P = (P - T) ∪ {temp, T - temp}
    return defineminimizor((Q, Σ, δ, q0, F), P)

Minimization Algorithm. (Partition Refinement) Code
DFA defineminimizor(DFA (Q, Σ, δ, q0, F), Partition P)
    Set Q' = P
    State q0' = the set in P which contains q0
    F' = { S ∈ P | S ⊆ F }
    for (each S ∈ P, a ∈ Σ)
        define δ'(S, a) = the set T ∈ P which contains the states δ(s, a) for s ∈ S
    return (Q', Σ, δ', q0', F')

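The two procedures can be combined into a short runnable sketch. Several things here are assumptions rather than content of the slides: the names `find_split` and `minimize`, the choice to restart the search after every split (so that no stale block is ever split twice), and the example DFA, which is taken to be the usual five-state subset-construction DFA for (a|b)*abb plus one hypothetical unreachable state Z.

```python
def find_split(partition, alphabet, delta):
    """Return (T, temp) for a block T that some pair (S, a) splits, else None."""
    for S in partition:
        for a in alphabet:
            for T in partition:
                # states of T that reach S using a
                temp = frozenset(q for q in T if delta[q, a] in S)
                if temp and temp != T:
                    return T, temp
    return None


def minimize(alphabet, delta, start, accepting):
    # Remove states unreachable from the start state.
    reachable, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in alphabet:
            r = delta[q, a]
            if r not in reachable:
                reachable.add(r)
                stack.append(r)
    # Initial partition: accepting / non-accepting (empty blocks dropped).
    final = frozenset(accepting) & reachable
    partition = {blk for blk in (final, frozenset(reachable) - final) if blk}
    # Refine until no block can be split any further.
    split = find_split(partition, alphabet, delta)
    while split is not None:
        T, temp = split
        partition = (partition - {T}) | {temp, T - temp}
        split = find_split(partition, alphabet, delta)
    # Build the quotient automaton (the role of defineminimizor).
    block_of = {q: blk for blk in partition for q in blk}
    new_delta = {(blk, a): block_of[delta[next(iter(blk)), a]]
                 for blk in partition for a in alphabet}
    new_accepting = {blk for blk in partition if blk <= final}
    return partition, new_delta, block_of[start], new_accepting


# Assumed example: rows give the targets on a and on b; Z is unreachable.
rows = {"A": "BC", "B": "BD", "C": "BC", "D": "BE", "E": "BC", "Z": "AE"}
delta = {(q, a): r for q, targets in rows.items() for a, r in zip("ab", targets)}

states, new_delta, new_start, new_accepting = minimize("ab", delta, "A", {"E"})
print(sorted("".join(sorted(blk)) for blk in states))
# prints: ['AC', 'B', 'D', 'E']
```

Restarting the search after each split is simple but not efficient; it is chosen here only because it keeps the partition well formed at every step.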
Minimization Algorithm: Example
Show the result of applying the minimization algorithm to this DFA.

Proof of Minimal Automaton
The previous algorithm is guaranteed to produce an irreducible DFA. Why should that FA be the smallest possible FA for its accepted language?
Analogous question in calculus: Why should a local minimum be a global minimum? Usually not the case!

Proof of Minimal Automaton
THM (Myhill-Nerode): The minimization algorithm produces the smallest possible automaton for its accepted language.
Proof. Show that any irreducible automaton is the smallest for its accepted language L:
We say that two strings u, v ∈ Σ* are indistinguishable if for all strings x, ux ∈ L ⇔ vx ∈ L.
Notice that if u and v are distinguishable, their paths from the start state must have different endpoints.

Proof of Minimal Automaton
Consequently, the number of states in any DFA for L must be at least as great as the number of mutually distinguishable strings for L.
But an irreducible DFA has the property that every state gives rise to another mutually distinguishable string!
Therefore, any other DFA must have at least as many states as the irreducible DFA.
Let's see how the proof works on a previous example:

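Proof of Minimal Automaton: Example
The strings labelling a spanning tree of the DFA (ε and three others, one per state) form a mutually distinguishable set (otherwise redundancy would occur and hence the DFA would be reducible).
Any other DFA for L has at least 4 states.
[Figure: the irreducible DFA with states a, e, cg, bf]
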
Exercises on Lexical Analysis
3.1.1 Divide the following C++ program into appropriate lexemes:
    float limitedsquare(x){float x;
    /* returns x-squared, but never more than 100 */
    return (x <= -10.0 || x >= 10.0)? 100 : x*x;
    }
Which lexemes should get associated lexical values? What should those values be?

From RE to Automata and backwards
We have seen:
RE → NFA
NFA → DFA [and obviously DFA → NFA]
RE → DFA, directly
DFA → minimal DFA
What about NFA, DFA → RE? More difficult. Three approaches (not presented):
Dynamic Programming [Scott, Section 2.4 on CD] [Hopcroft, Motwani, Ullman, Section 3.2.1]
Incremental state elimination [HMU, Section 3.2.2]
RE as fixpoint solution of a system of language equations [uses right-linear grammars for Regular Languages]

Exercises on Regular Expressions
3.3.2 Describe the languages denoted by the following regular expressions:
b) ((ε|a)b*)*
c) (a|b)*a(a|b)(a|b)
3.3.5 Write regular definitions for the following languages:
b) All strings of lowercase letters in which the letters are in ascending lexicographic order.
c) Comments, consisting of a string surrounded by /* and */, without an intervening */, unless it is inside double-quotes (").
i) All strings of a's and b's that do not contain the subsequence abb.

Exercises with Lex or Flex
3.5.2 Write a Lex program that copies a file, replacing each non-empty sequence of white spaces by a single blank.
3.5.3 Write a Lex program that copies a C program, replacing each instance of the keyword float by double.

Exercises on Finite Automata
3.6.2 Design finite automata for the following languages (providing both the transition graph and the transition table):
a) All strings of lowercase letters that contain the five vowels in order.
d) All strings of digits with no repeated digits. Hint: Try this problem first with a few digits, such as {0, 1, 2}.
f) All strings of a's and b's with an even number of a's and an odd number of b's.

Exercises: from RE to DFA
3.7.3 Convert the following regular expressions to deterministic finite automata, using the [McNaughton-Yamada-]Thompson algorithm (3.23) and the subset construction algorithm (3.20):
a) (a|b)*
b) (a*|b*)*
c) ((ε|a)b*)*
d) (a|b)*abb(a|b)*

Exercises: Minimizing DFA
3.9.3 Show that the RE
a) (a|b)*
b) (a*|b*)*
c) ((ε|a)b*)*
are equivalent by showing that their minimum state DFA's are isomorphic.