CS 36 Meeting 2 /3/8

Announcements

1. Homework 4 is due Friday. If Friday is Mountain Day, homework should be turned in at my office or the department office before 4.
2. Homework 5 will be available over the weekend.
3. Our midterm will occur during the week of /22.

Quite Distinguished

1. Last time I introduced the notion of strings that were distinguished by a language:
   Definition: We say that $w, v \in \Sigma^*$ are distinguishable by a language $L$ if for some $z \in \Sigma^*$, exactly one of $wz$ and $vz$ is a member of $L$.
2. Of course, the opposite of distinguished is indistinguishable:
   Definition: We say that $w, v \in \Sigma^*$ are indistinguishable by language $L$ if for all $z \in \Sigma^*$, $wz \in L \iff vz \in L$, and we write $w \equiv_L v$.
3. Two strings are indistinguishable relative to a language if, in some sense, they are equivalent as prefixes of strings in the language.
4. The relation "indistinguishable by $L$" defined above is an equivalence relation on strings.
5. The number of distinct equivalence classes of strings under the indistinguishable relationship has an interesting relationship to regularity.

Given a language $L$ and a set $X$ of strings over $L$'s alphabet, we say that $X$ is pairwise distinguishable by $L$ if every pair of distinct strings in $X$ is distinguishable by $L$.

The index of a language $L$ is the size of the largest set of strings $X$ that is pairwise distinguishable by $L$. Equivalently, the index of a language is the size of the set of equivalence classes induced by the indistinguishable relation relative to the language.

The Myhill-Nerode Theorem

1. This brings us to the big Theorem (introduced through problem 1.52 in Sipser):
   Theorem (Myhill-Nerode): A language $L$ is regular iff it has finite index, and each regular language is accepted by a DFA whose description includes as many states as the index of the language.
2. We start by showing that regularity implies finite index.
   Proof: (a) Suppose that $L$ is regular. Then there is some DFA $D = (Q, \Sigma, \delta, s, F)$ such that $L = L(D)$. Suppose that $X$ is a set of strings that is pairwise distinguishable by $L$ with distinct $w, v \in X$.
Consider $\hat\delta(s, w)$ and $\hat\delta(s, v)$. If $\hat\delta(s, w) = \hat\delta(s, v)$ then for all $z \in \Sigma^*$, $\hat\delta(s, wz) = \hat\delta(s, vz)$. But
   $wz \in L \iff \hat\delta(s, wz) \in F \iff \hat\delta(s, vz) \in F \iff vz \in L$
which would imply that $w$ and $v$ were indistinguishable by $L$. Since the members of $X$ are pairwise distinguishable, this cannot be the case, so for all distinct $w, v \in X$ it must be the case that $\hat\delta(s, w) \neq \hat\delta(s, v)$. This implies that the number of elements in $X$ cannot exceed the number of elements in $Q$, since otherwise there would have to be at least two strings
in $X$ such that $\hat\delta(s, w) = \hat\delta(s, v)$. Therefore, if $L$ is regular it is of finite index.
3. Now we need to consider the other direction of the if and only if...
   Proof (continued): (b) Suppose $L$ is of finite index.

First, let's look at a very simple concrete example: $a^*b^*$. We have seen that this language has index 3 since $\{a, ab, ba\}$ forms a maximal set of distinguishable strings relative to the language. Any other collection of representatives of the equivalence classes associated with these strings also forms a maximal distinguishable set. For example, $\{\epsilon, b, ba\}$ is another maximal distinguishable set for $a^*b^*$.

Consider the following machine which uses the obvious set of states to recognize $a^*b^*$, and names those states, as we have often done, with representatives of the strings that would move the machine to each state:

[Figure: DFA for $a^*b^*$ with states $[\epsilon]$ (start), $[b]$, and $[ba]$. State $[\epsilon]$ loops on $a$ and moves to $[b]$ on $b$; state $[b]$ loops on $b$ and moves to $[ba]$ on $a$; state $[ba]$ loops on $a, b$.]

This suggests the following general construction. Let $X$ be a maximal set of strings pairwise distinguishable by $L$. Construct a DFA
   $D = (\{[x] \mid x \in X\}, \Sigma, \delta, [\epsilon], \{[x] \mid x \in L \cap X\})$
where $\delta([x], a) = [xa]$.

We claim that $L = L(D)$. To justify this claim, we need to show that
   (i) $\delta$ is well-defined; in particular, that if $a \in \Sigma$ and $x, x' \in [x]$ then $[xa] = [x'a]$,
   (ii) for all $w \in \Sigma^*$, $\hat\delta([\epsilon], w) = [w]$, and
   (iii) $[w] \in \{[x] \mid x \in L \cap X\} \iff w \in L$.

The first condition is true because of the way the indistinguishable relation is defined. If $[xa] \neq [x'a]$, then $xa$ and $x'a$ must be distinguishable by $L$, which would imply that for some $z$, one of the strings $xaz$ and $x'az$ belonged to $L$ and the other didn't. In that case, however, $x$ and $x'$ would be distinguished by the string $az$. If $[x] = [x']$, $x$ and $x'$ must be indistinguishable. Thus, $\delta$ is well defined.

We can show the second condition by induction on the length of $w$. It is clearly true for $w = \epsilon$. Suppose it is true for $w$ and consider a string of the form $wx$ with $x \in \Sigma$. By definition,
   $\hat\delta([\epsilon], wx) = \delta(\hat\delta([\epsilon], w), x) = \delta([w], x) = [wx]$.

For the final condition, suppose that $[w] \in \{[x] \mid x \in L \cap X\}$. We know that there is some $x \in L \cap X$ such that $[w] = [x]$, which implies that $w \equiv_L x$. Therefore, for any $z \in \Sigma^*$, $wz$ and $xz$ must either both belong to $L$ or neither be in $L$. Consider $z = \epsilon$. Since $x \in L$, this implies that $w \in L$.
In the opposite direction, if $w \in L$, then $w \equiv_L x$ for some $x \in X$ (and taking $z = \epsilon$ shows $x \in L$), and therefore $[w] = [x] \in \{[x] \mid x \in L \cap X\}$.

Minimization of DFAs

1. Given that we now know that for any regular language there is a DFA of size equal to the index of the language it recognizes, we would like to have a way to algorithmically find this DFA given any precise description of the language (i.e., a DFA, an NFA, or a regular expression).
2. Given a regular expression, we can construct an NFA for the language using the constructions embedded in the proofs that regular languages are closed under union, concatenation and closure.
3. Given an NFA, we can build an equivalent DFA using the subset construction presented earlier.
4. All we need is a way to convert a non-minimal DFA into one of minimal size (or to realize that the one we started with was already minimal).
5. We can precisely specify when two states can be merged by defining state equivalence formally for two states $p, q \in Q$ as:
   $p \equiv q \iff \forall w \in \Sigma^*,\ (\hat\delta(p, w) \in F \iff \hat\delta(q, w) \in F)$
6. It should be clear that this notion of equivalence of states is in fact reflexive, symmetric and transitive. Therefore, it partitions the set of states of a DFA into equivalence classes.
7. If you look at the equivalent states of the original DFA for the Huffman code example and compare them to the states of the reduced DFA, you will notice that each state of the reduced DFA corresponds to one of the equivalence classes of the original DFA. (Recall that the states of the reduced DFA correspond in turn to the equivalence classes of strings induced by the "indistinguishable by $L$" relation for the language recognized by the machine.)
8. This suggests a way that we could use the equivalence relation on the states of a DFA to determine the minimal DFA. Namely, if $[q]$ denotes the equivalence class of state $q$ induced by the state equivalence relation we just defined: Given $M = (Q, \Sigma, \delta, s, F)$ define
   $M' = (\{[q] \mid q \in Q\}, \Sigma, \delta', [s], \{[f] \mid f \in F\})$
   where $\delta'([p], x) = [\delta(p, x)]$.
9. As in the proof of the Myhill-Nerode theorem, we should be careful to verify that $\delta'$ is well defined, behaves as desired, and that the set of final states is appropriate. We won't.
10. Instead, we will consider an algorithm that determines which states are equivalent to one another (by actually determining which states are not equivalent to one another).
11. The basis of the algorithm is a somewhat recursive definition of not being equivalent. The base case is basically
   $p \not\equiv q$ if $p \in F \wedge q \notin F$ (or vice versa)
   and the recursive clause is
   $p \not\equiv q$ if $\exists w \in \Sigma^*,\ \hat\delta(p, w) \not\equiv \hat\delta(q, w)$
12. I used a new machine as my example for how we can algorithmically compute this relationship this semester. You saw all the new slides. I didn't have time (or energy) to turn those slides into LaTeX. So, here I will present an old example based on the following DFA and its equivalent minimal DFA:

[Figure: the example DFA and its equivalent minimal DFA.]
13. The mechanics of the algorithm use a table in which we record all pairs of states we can identify as non-equivalent. Each entry in the table reflects our knowledge of the relationship between the states at the top of its column and the right end of its row. For our example machine, the table starts out empty (with names such as oh! used to make it easy to distinguish the states for empty and zero from those for the letters O and E).

[Table: the initial, empty table of state pairs.]

14. The first step is to use the basis step described above to realize that all final states are not equivalent to all non-final states. We record this by putting big X's in all of the cells in the table for such pairs.

[Table: the table after the basis step.]

15. Next, we use the recursive step over and over again for different pairs of states that still appear to be equivalent, restricting our attention to strings $w$ of length 1. For example: At this point in our table, the entry for the pair of states , is empty because these states might still be equivalent:

[Table: the current table with a ? in the entry under consideration.]

Looking back at the state diagram, we can see that on input , δ(, ) = and δ(, ) =. Since the entry in our table for this pair of destinations (, ) is still empty, these states might be equivalent, so it would still appear that and might be equivalent. On the other hand, on input , δ(, ) = and δ(, ) = oh!. The entry for the pair of states (, oh!) in our table already has an X in
it, indicating we know these states are not equivalent. Therefore, we can conclude that and are not equivalent and record this fact with a new X in our table.

[Table: the table with one new X.]

16. We then continue methodically (we will go left to right and top to bottom) through the table considering all of the unmarked pairs:

(,) Since δ(, ) = and δ(, ) = and the pair (, ) is still unmarked in our table, we make no changes. However, δ(, ) = and δ(, ) = g and the states and g are known not to be equivalent, so we get to put another X in for , :

[Table: the table with two new X's.]

(,) Since δ(, ) = and δ(, ) = and the pair (, ) is still unmarked in our table, we make no changes. Similarly, δ(, ) = oh! and δ(, ) = g and the states oh! and g are still unmarked so we make no changes.

It is important to note, however, that in both cases, we are deciding whether the two states the machine would move into are not equivalent by checking to see if their entry in our table contains an X before we have even gotten to that entry. If, when we eventually process those entries, we discover they should have X's, we will need to reconsider the pair (,). We won't do this by specially reconsidering (,). Instead, we will make an additional pass over all table entries that are still blank after the first pass.

(,) Since δ(, ) = and δ(, ) = and the pair (, ) is marked as non-equivalent, we get to mark (,). Similarly, δ(, ) = oh! and δ(, ) = g and the states oh! and g are still unmarked so we make no changes.

[Table: the table with three new X's.]

17. Continuing to consider every empty cell in the table in the same way until we reach (i,n), we eventually get the following:
[Table: the table after the first complete pass, with eight new X's.]

18. At this point, as mentioned above, we need to reconsider all of the blank cells because when we considered them on the first pass we might have based our decision not to mark them on cells that we had not yet processed. In this case, on the second pass, we will discover that nothing actually changes. In general, we would keep making passes until nothing changes during one complete pass.
19. The information in the table justifies many simplifications of the original machine. It indicates that states and can be merged as can and. It also says that all of the final states are equivalent and can be merged. Thus, the reduced machine will look like:

[Figure: the reduced machine.]
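
The table-filling passes and the final merging of states can be sketched in Python roughly as follows. This is only a sketch under stated assumptions. The DFA is assumed to be given as a dictionary `delta` keyed by `(state, symbol)` and to have no unreachable states. The function names `inequivalent_pairs` and `minimize` are hypothetical labels chosen here. The five-state machine at the bottom is a made-up example (strings over {0, 1} containing at least two 1s), not the machine from the slides.

```python
from itertools import combinations


def inequivalent_pairs(states, alphabet, delta, finals):
    """Return the set of unordered state pairs that would get an X."""
    # Basis step: a final state and a non-final state are never equivalent.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in finals) != (q in finals)}
    changed = True
    while changed:  # keep making passes until one complete pass changes nothing
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            # Recursive step, restricted to strings of length 1.
            for a in alphabet:
                dest = frozenset((delta[p, a], delta[q, a]))
                # A one-element dest (same destination) is never in marked.
                if dest in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked


def minimize(states, alphabet, delta, start, finals):
    """Build M' = ({[q]}, Sigma, delta', [s], {[f]}) from the unmarked pairs."""
    marked = inequivalent_pairs(states, alphabet, delta, finals)
    # [q] is q together with every state whose pair with q is still blank.
    cls = {q: frozenset(p for p in states
                        if p == q or frozenset((p, q)) not in marked)
           for q in states}
    new_delta = {(cls[p], a): cls[delta[p, a]]
                 for p in states for a in alphabet}
    return set(cls.values()), new_delta, cls[start], {cls[f] for f in finals}


# Made-up example, NOT the lecture's machine: at least two 1s,
# with deliberately redundant states (A/B and C/D).
states = ["A", "B", "C", "D", "E"]
alphabet = ["0", "1"]
delta = {
    ("A", "0"): "B", ("A", "1"): "C",
    ("B", "0"): "A", ("B", "1"): "D",
    ("C", "0"): "D", ("C", "1"): "E",
    ("D", "0"): "C", ("D", "1"): "E",
    ("E", "0"): "E", ("E", "1"): "E",
}
finals = {"E"}

new_states, new_delta, new_start, new_finals = minimize(
    states, alphabet, delta, "A", finals)
print(len(new_states))  # 3: the classes {A, B}, {C, D} and {E}
```

In this example the basis step marks the four pairs that involve E, and the first pass marks (A, C), (A, D), (B, C) and (B, D) because on input 1 each of those pairs leads to a pair that is already marked. A second pass changes nothing, so A merges with B and C merges with D.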