Intermediate Math Circles
Wednesday, November 14, 2018
Finite Automata II
Nickolas Rollick

nrollick@uwaterloo.ca

Regular Languages

Last time, we were introduced to the idea of a DFA (deterministic finite automaton), one of the simplest models of computing. We spent a good deal of time getting familiar with how they work, by constructing examples of DFAs accepting given languages. Today, we want to take a closer look at the kind of languages accepted by DFAs. We give them a special name: regular languages. Ultimately, we want to get a feel for what regular languages look like, so that we can identify them by sight. In that connection, our first task is to figure out how to build new regular languages out of old ones.

The first such construction is taking the complement of a language. In other words, if we have a regular language (one accepted by some DFA), we want to look at the language containing all and only the strings that are not in that first language. It turns out this language is always regular as well. To get a sense for it, let's look at a specific example. Let's re-visit the ABBA machine from last time:

[Figure: the ABBA machine, a DFA with states 0, 1, 2, 3, 4]

The language accepted by this machine is the set of strings where abba is repeated any number of times (including zero). Now, we want to build a DFA that accepts exactly the opposite kind of strings. In other words, this new DFA should reject the strings where abba is repeated any number of times, but it should accept everything else. Is there an easy way to take the DFA we have and produce a DFA accepting this new language, the complement language?

The quick and easy solution is this: we want a DFA that accepts all the strings that used to be rejected, and rejects all the strings that used to be accepted. If we just swap the accepting and rejecting states of the original machine, we'll get a DFA accepting the complement of the ABBA language:

[Figure: the ABBA machine with its accepting and rejecting states swapped]

Given this idea, can you propose a way to do this in general? Given a DFA accepting a certain language, how can we modify it to create a DFA accepting the complementary language? The same idea applies: take the original DFA, turn all its accepting states into rejecting states, and turn all its rejecting states into accepting states. This argument tells us that given a regular language, the complement of that language is also regular: we've built a new regular language out of an old one.

Of course, there are many more things of this kind that we can do. Given two regular languages, which we'll call L_1 and L_2, there are two related languages we can build. First, we might want to build a new DFA accepting only the strings belonging to both L_1 and L_2. Second, we may want to build a new DFA accepting the strings belonging to either L_1 or L_2. Like before, we will illustrate with an example. Last time, you built a DFA that accepts the strings starting and ending with a, and you also built one accepting exactly the strings containing abb somewhere in the string. In other words, both these languages are regular.
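The complement construction described above (swap the accepting and rejecting states) can be tried out in a few lines of Python. This is only a minimal sketch: the dictionary representation of a DFA, the names `ABBA`, `accepts` and `complement`, and the exact transition table (states 0 to 3 tracking progress through abba, state 4 acting as a trap state) are assumptions made for illustration, not details taken from the handout's diagram.

```python
# A DFA is modelled as a dictionary. This representation is an assumption
# made for illustration only.
ABBA = {
    "states": {0, 1, 2, 3, 4},
    "start": 0,
    "accepting": {0},
    # Assumed transitions: states 0-3 track progress through "abba",
    # and state 4 is a trap state entered on any wrong letter.
    "delta": {
        (0, "a"): 1, (0, "b"): 4,
        (1, "a"): 4, (1, "b"): 2,
        (2, "a"): 4, (2, "b"): 3,
        (3, "a"): 0, (3, "b"): 4,
        (4, "a"): 4, (4, "b"): 4,
    },
}


def accepts(dfa, word):
    """Run the DFA on the word and report whether it ends in an accepting state."""
    state = dfa["start"]
    for letter in word:
        state = dfa["delta"][(state, letter)]
    return state in dfa["accepting"]


def complement(dfa):
    """Return a copy of the DFA with accepting and rejecting states swapped."""
    flipped = dict(dfa)
    flipped["accepting"] = dfa["states"] - dfa["accepting"]
    return flipped


NOT_ABBA = complement(ABBA)
for word in ["", "abba", "abbaabba", "abb", "ba"]:
    print(repr(word), accepts(ABBA, word), accepts(NOT_ABBA, word))
```

Whatever word is tried, the two machines should always give opposite answers, since they follow the same path through the states and disagree only about which states count as accepting.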

One possible DFA accepting the first language is:

[Figure: DFA with states 0, 1, 2, 3 accepting the strings that start and end with a]

One possible DFA accepting the second language is:

[Figure: DFA with states 0, 1, 2, 3 accepting the strings that contain abb]

Given these two DFAs, is there a way to build a DFA accepting the intersection of the two languages? In other words, we want a DFA accepting only the strings belonging to both languages. More concretely, it will accept the strings starting and ending with a that also contain abb somewhere in the string. Take a few minutes to chat this over. We'll take it up as a group shortly.

The key to this is that we need to keep track of what's happening in both machines at the same time. You can imagine running both machines in parallel, giving them both the same string independently. Each machine will either accept or reject the string, and we only want to accept the string in the end if both machines have accepted the string.

The cool part is that we can model this with a single DFA. In this new DFA, we keep track of pairs of states, one for each of the two original DFAs. Each state in the new DFA matches up with a pair of states from the old DFAs. Then, when the new DFA reads a letter, each state in the pair changes in the same way that the two old DFAs would have. The accepting states of the new DFA match up with pairs of accepting states from the two old DFAs, since we only want to accept the strings that both the old DFAs liked. Here's what we get when all is said and done:
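In code, the pair-of-states construction might look roughly like the sketch below. The dictionary representation, the names `STARTS_ENDS_A`, `CONTAINS_ABB`, `accepts` and `intersection`, and both transition tables are assumptions chosen to match the description in the text (state 2 accepting in the first machine, state 3 accepting in the second); they are not copied from the diagrams. The sketch only creates the pairs that can actually be reached from the start pair.

```python
def accepts(dfa, word):
    """Run the DFA on the word and report whether it ends in an accepting state."""
    state = dfa["start"]
    for letter in word:
        state = dfa["delta"][(state, letter)]
    return state in dfa["accepting"]


# Assumed machine for "starts and ends with a": state 1 is a trap state
# entered when the first letter is b.
STARTS_ENDS_A = {
    "start": 0,
    "accepting": {2},
    "delta": {
        (0, "a"): 2, (0, "b"): 1,
        (1, "a"): 1, (1, "b"): 1,
        (2, "a"): 2, (2, "b"): 3,
        (3, "a"): 2, (3, "b"): 3,
    },
}

# Assumed machine for "contains abb": the state number counts how much of
# abb has just been read.
CONTAINS_ABB = {
    "start": 0,
    "accepting": {3},
    "delta": {
        (0, "a"): 1, (0, "b"): 0,
        (1, "a"): 1, (1, "b"): 2,
        (2, "a"): 1, (2, "b"): 3,
        (3, "a"): 3, (3, "b"): 3,
    },
}


def intersection(d1, d2, letters=("a", "b")):
    """Build the DFA whose states are pairs (state of d1, state of d2)."""
    start = (d1["start"], d2["start"])
    delta, seen, todo = {}, {start}, [start]
    while todo:
        p, q = todo.pop()
        for letter in letters:
            # Each half of the pair moves exactly as its own machine would.
            nxt = (d1["delta"][(p, letter)], d2["delta"][(q, letter)])
            delta[((p, q), letter)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    # Accept only when both halves of the pair are accepting.
    accepting = {(p, q) for (p, q) in seen
                 if p in d1["accepting"] and q in d2["accepting"]}
    return {"states": seen, "start": start, "accepting": accepting, "delta": delta}


BOTH = intersection(STARTS_ENDS_A, CONTAINS_ABB)
print(sorted(BOTH["states"]))
print(accepts(BOTH, "abba"), accepts(BOTH, "aba"), accepts(BOTH, "babba"))
```

With these assumed tables, the construction reaches nine pairs of states, and the only accepting pair is (2, 3).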

[Figure: the new DFA, with states (0,0), (2,1), (3,2), (2,3), (3,3), (1,0), (1,1), (1,2), (1,3)]

In this new DFA, all the states have pairs of numbers in their labels. The first number represents a state in the first DFA, and the second number represents a state in the second DFA. For example, state (2, 1) represents the first DFA being in state 2, and the second being in state 1. If you give b to both machines, the first machine moves to state 3 and the second moves to state 2, so the new DFA moves to state (3, 2). If you give a to both machines instead, both of them stay in the same state, so the DFA stays in state (2, 1). Notice that (2, 3) is the only accepting state in this new machine. This is because the only way both of the original DFAs accept a string is if the first ends up in state 2 and the second ends up in state 3. You can convince yourself that this DFA accepts the intersection of the two languages: all the strings starting and ending with a, and which contain abb somewhere.

Using a similar idea, we can build a DFA accepting the union of these two languages: all the strings belonging to either language. In our particular example, this means the collection of strings that start and end with a, OR contain abb somewhere. In our minds, we get the same intuitive idea: imagine running the two original DFAs with the same input, and keeping track of which states both are in at any given time. After both finish reading the string, we accept it if either DFA has accepted it. With this in mind, look back at the DFA accepting the intersection of the two languages. What small adjustment can we make to it so that it accepts the union instead?

All we have to change are the accepting states of the machine. We now want the DFA to accept the string if either state in the pair is an accepting state for the matching DFA. Since state 2 is an accepting state for the first DFA, any state in the new machine starting with 2 should be an accepting state. Likewise, since state 3 is an accepting state in the second DFA, any state in the new machine ending with 3 should be an accepting state as well. The end result looks like this:

[Figure: the same DFA on pairs of states, now with every state starting with 2 or ending with 3 marked as accepting]

This is the machine accepting the strings that either start and end with a, or else contain abb somewhere inside of them.

All right, let's recap to make sure you've got the gist. Let's say I give you two DFAs, accepting two different languages. How do I build a DFA accepting only the strings belonging to both languages? To either language?

Great! We have now described several different ways to make new regular languages out of old ones. We can take the complement of a language, or the union or intersection of two languages. But now the all-important question: is every language a regular language? In other words, if you write down any old collection of strings, can we always build a DFA accepting exactly that collection? Take five minutes to play with this question now, chatting it over with your friends.

The key to this question is that DFAs only have a fixed, finite number of states, and the states are the only memory the DFA has. Any language requiring us to store an unbounded amount of information can never be handled by a DFA. The simplest example of this phenomenon is the language of strings of the form a^n b^n, where n can be any whole number bigger than or equal to 0. In other words, the strings in the language are the empty string, ab, aabb, aaabbb, and so on. The important thing is that there is some number of a's, followed by an equal number of b's.
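To see what "unbounded information" means here, it may help to look at how an ordinary program would recognize this language. The sketch below is not a DFA: the function name `in_anbn` is hypothetical, and the variable `count` is an integer that can grow without limit, which is exactly the kind of memory a DFA does not have.

```python
def in_anbn(word):
    """Check whether the word is some number of a's followed by equally many b's."""
    count = 0        # a's read so far that still need a matching b
    seen_b = False   # once a b has been read, no further a is allowed
    for letter in word:
        if letter == "a":
            if seen_b:
                return False
            count += 1
        elif letter == "b":
            seen_b = True
            count -= 1
            if count < 0:
                return False   # more b's than a's
        else:
            return False       # only the letters a and b are allowed
    return count == 0


for word in ["", "ab", "aabb", "aab", "abab", "ba"]:
    print(repr(word), in_anbn(word))
```

The counter has to be able to hold any whole number, because the word may start with any number of a's.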

Now that I've given you this example, can you explain why this language can't be regular? Intuitively, we need to keep count of how many a's have been read so far, in order to match up the corresponding number of b's. But since any number of a's are allowed, we seem to need infinitely many states to keep track of it all.

But let's convince ourselves beyond a shadow of a doubt. Let's write down a mathematical proof that this language is not regular. This will be an argument no one can disagree with.

Suppose there was a DFA that accepted this language. Necessarily, this DFA has only a finite number of states. Now, let's track what states the DFA passes through as it reads a bunch of a's in a row. There are only so many states to be in, so eventually, we'll be able to find two different whole numbers m and n, where the DFA is in the same state after reading a^m and a^n. Mathematicians like to call this the pigeonhole principle: if you're trying to put pigeons into boxes, and you have more pigeons than boxes, at least two pigeons must end up in the same box. In the language of our problem: we're trying to put strings of the form a^k into states, and we have more strings than states, so at least two of the strings end up associated to the same state.

Let's say that the DFA ends up in state q after reading both a^m and a^n. Without doing any harm, we can assume m is smaller than n. Since this DFA accepts the language we're interested in, it accepts the string a^m b^m. Therefore, if you start in state q and read m letter b's in a row, you'll end up in some accepting state, which we'll call r. But now we have a problem. After reading a^n, the machine also ends up in state q, so if the input is a^n b^m, we'll end up in state q, then read m letter b's in a row, ending in state r. This means the DFA has to accept a^n b^m. But this can't be: since n is bigger than m, this string has more a's than b's, so it is not in the language! All in all, this means no DFA can accept this language. We've found a language that's not regular!

Another very natural language (for computers) that isn't regular is the language of legal bracketings. For this, pretend that a represents a left bracket, and b represents a right bracket (we could actually use brackets as our symbols, but letters are easier to read).
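The pigeonhole proof for a^n b^n can also be replayed mechanically: hand the sketch below any DFA over the letters a and b, and it finds the two numbers m and n from the proof and then a word the machine gets wrong. Everything here is an assumption made for illustration: the dictionary representation, the names `find_collision` and `misclassified_word`, and the sample machine `ATTEMPT`, which is a made-up DFA that handles a^n b^n correctly only for n up to 2.

```python
def accepts(dfa, word):
    """Run the DFA on the word and report whether it ends in an accepting state."""
    state = dfa["start"]
    for letter in word:
        state = dfa["delta"][(state, letter)]
    return state in dfa["accepting"]


def find_collision(dfa):
    """Return m < n such that the DFA is in the same state after a^m and a^n."""
    first_seen = {}                  # state -> number of a's read on first visit
    state, k = dfa["start"], 0
    while state not in first_seen:   # must stop: there are finitely many states
        first_seen[state] = k
        state = dfa["delta"][(state, "a")]
        k += 1
    return first_seen[state], k


def misclassified_word(dfa):
    """Return a word on which the DFA disagrees with the language a^n b^n."""
    m, n = find_collision(dfa)
    good = "a" * m + "b" * m   # belongs to the language
    bad = "a" * n + "b" * m    # does not belong, since n > m
    # Both words lead to the same state, so the DFA gives them the same verdict.
    return bad if accepts(dfa, good) else good


# Hypothetical attempt: accepts the empty string, ab and aabb; state 5 is a trap.
ATTEMPT = {
    "start": 0,
    "accepting": {0, 4},
    "delta": {
        (0, "a"): 1, (0, "b"): 5,
        (1, "a"): 2, (1, "b"): 4,
        (2, "a"): 5, (2, "b"): 3,
        (3, "a"): 5, (3, "b"): 4,
        (4, "a"): 5, (4, "b"): 5,
        (5, "a"): 5, (5, "b"): 5,
    },
}

print(find_collision(ATTEMPT))
print(misclassified_word(ATTEMPT))
```

Adding more states to `ATTEMPT` only postpones the collision. The loop in `find_collision` always stops, because a machine with finitely many states cannot keep visiting new ones forever.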
When you type an expression with brackets into a computer or calculator, what's the rule you need to follow? You can't have a right bracket without a matching left bracket coming first. In terms of letters, you can't have a b without a matching a that comes first. So a string like aabb is allowed, but anything starting with b is not, and a string like abba isn't either. I'll let you come up with an argument for why this isn't regular on your own, but my hint is to take a close look at the argument we just finished using for the other language!

Conclusion

Today, we got a much better sense of how regular languages work. First, we explored how to build new regular languages from old ones, by taking complements, unions, and intersections. Then, we discovered that not all languages are regular: some are just too complicated to be accepted by any DFA. The natural question is: how can we change the design of the DFA to accept more languages? Next time, we'll explore one possible option: making the DFA non-deterministic, allowing it to make choices. This has some interesting tie-ins with the modern day concept of quantum computers.