PART 2. REGULAR LANGUAGES, GRAMMARS AND AUTOMATA

RIGHT LINEAR LANGUAGES.

Right linear grammar: rules of the form $A \to \alpha B$, $A \to \alpha$, with $A, B \in V_N$, $\alpha \in V_T^+$.
Left linear grammar: rules of the form $A \to B\alpha$, $A \to \alpha$, with $A, B \in V_N$, $\alpha \in V_T^+$.
Rewrite a nonterminal into a non-empty string of terminal symbols, or into a non-empty string of terminal symbols followed by a nonterminal (right linear) / a nonterminal followed by a non-empty string of terminal symbols (left linear).

Restricted right linear grammar: rules of the form $A \to aB$, $A \to a$, with $A, B \in V_N$, $a \in V_T$.
Restricted left linear grammar: rules of the form $A \to Ba$, $A \to a$, with $A, B \in V_N$, $a \in V_T$.
Rewrite a nonterminal into a terminal symbol, or into a terminal symbol followed by a nonterminal (right linear) / a nonterminal followed by a terminal symbol (left linear).

Linear grammar: rules of the form $A \to \alpha B \beta$, with $A, B \in V_N$ and either $\alpha \in V_T^+$ and $\beta \in V_T^*$, or $\alpha \in V_T^*$ and $\beta \in V_T^+$; or rules of the form $A \to \alpha$, with $A \in V_N$, $\alpha \in V_T^+$.
Rewrite a nonterminal into a non-empty string of terminals, or into a nonterminal flanked by two strings of terminals (one of which may be empty, but not both).

Remark 1: For all these types, we include in the type the rule $S \to e$ for grammars G of the type in reduced form.
Remark 2: Clearly, a restricted right linear grammar is right linear, a right linear grammar is linear, and a linear grammar is context free.
Remark 3: Linear grammars are stronger than right linear grammars. We will prove shortly that the language $a^n b^n$ ($n > 0$) cannot be generated by a right linear grammar. This is the set of all strings consisting of a's followed by an equal number of b's, with at least 1 a and at least 1 b. But $S \to ab$, $S \to aSb$ is a linear grammar, and it generates $a^n b^n$ ($n > 0$).

The terminology is explained by the generated parse trees:
[Figure: parse trees of a right linear and of a restricted right linear grammar: a single spine of nonterminals on the right side.]

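[Figure: parse trees of a left linear and of a restricted left linear grammar: a single spine of nonterminals on the left side. Parse tree of a linear grammar: a single spine of nonterminals, with terminal strings on both sides.]

Theorem: For every right linear grammar there is an equivalent restricted right linear grammar.
Let G be a right linear grammar. The rules are of the form $A \to \alpha$ or $A \to \alpha B$, where $\alpha = a_1 \dots a_n$. Take any such rule R with $n > 1$. Add to the grammar new nonterminal symbols $X_1, \dots, X_{n-1}$ (fresh for each rule), and replace R by the rules:
$A \to a_1 X_1$, $X_1 \to a_2 X_2$, ..., $X_{n-2} \to a_{n-1} X_{n-1}$, and then, depending on R, $X_{n-1} \to a_n$ or $X_{n-1} \to a_n B$.
Example: $A \to abcB$, $B \to b$ gives the parse tree $[_A\ a\ b\ c\ [_B\ b]]$.
$A \to aX_1$, $X_1 \to bX_2$, $X_2 \to cB$, $B \to b$ gives the parse tree $[_A\ a\ [_{X_1}\ b\ [_{X_2}\ c\ [_B\ b]]]]$.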
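The rule-splitting step of this proof can be sketched in Python. This is a minimal sketch built on a representation the notes do not give. A rule is a pair (lhs, rhs) with rhs a tuple of symbols. Nonterminals start with an uppercase letter and terminals are lowercase. The fresh names X1, X2, ... are hypothetical and are assumed not to occur in the grammar already. A single counter keeps the new nonterminals of different rules distinct.

```python
# Assumptions (not from the notes): rules are (lhs, rhs) pairs with rhs a tuple,
# nonterminals start with an uppercase letter, terminals are lowercase, and no
# existing nonterminal is called X<number>.

def is_nonterminal(symbol):
    return symbol[0].isupper()

def restrict(rules):
    """Replace right linear rules by restricted right linear rules."""
    out, counter = [], 0
    for lhs, rhs in rules:
        if not rhs:                       # S -> e is kept unchanged
            out.append((lhs, rhs))
            continue
        if is_nonterminal(rhs[-1]):       # A -> a1 ... an B
            terminals, tail = rhs[:-1], rhs[-1]
        else:                             # A -> a1 ... an
            terminals, tail = rhs, None
        current = lhs
        for a in terminals[:-1]:          # A -> a1 X1, X1 -> a2 X2, ...
            counter += 1
            fresh = f"X{counter}"
            out.append((current, (a, fresh)))
            current = fresh
        last = terminals[-1]              # X_{n-1} -> an  or  X_{n-1} -> an B
        out.append((current, (last, tail) if tail else (last,)))
    return out

rules = [("A", ("a", "b", "c", "B")), ("B", ("b",))]
print(restrict(rules))
# [('A', ('a', 'X1')), ('X1', ('b', 'X2')), ('X2', ('c', 'B')), ('B', ('b',))]
```

Run on the example $A \to abcB$, $B \to b$, it produces exactly the rules $A \to aX_1$, $X_1 \to bX_2$, $X_2 \to cB$, $B \to b$.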

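The resulting grammar generates the same language and is a restricted right linear grammar.

Example: the language $(ab)^+ ccd (ab)^+$.
Right linear grammar: $S \to abS$, $S \to abA$, $A \to ccdB$, $B \to abB$, $B \to ab$.
Restricted right linear grammar: $S \to aA_1$, $A_1 \to bS$, $A_1 \to bA$, $A \to cA_2$, $A_2 \to cA_3$, $A_3 \to dB$, $B \to aA_4$, $A_4 \to bB$, $A_4 \to b$.
Parse tree of ababccdabab in the right linear grammar:
$[_S\ a\ b\ [_S\ a\ b\ [_A\ c\ c\ d\ [_B\ a\ b\ [_B\ a\ b]]]]]$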

Parse tree of the same string in the restricted right linear grammar:
$[_S\ a\ [_{A_1}\ b\ [_S\ a\ [_{A_1}\ b\ [_A\ c\ [_{A_2}\ c\ [_{A_3}\ d\ [_B\ a\ [_{A_4}\ b\ [_B\ a\ [_{A_4}\ b]]]]]]]]]]]$

We obviously have the same result for left linear grammars and restricted left linear grammars.

Theorem: For every right linear grammar there is an equivalent left linear grammar (and vice versa).
This will follow from other facts later. It means that the right linear languages are exactly the restricted right linear languages, and also exactly the left linear languages.

REGULAR LANGUAGES.

Kleene closure, *, is an operation which maps every language A onto $A^*$.
Union, $\cup$, is an operation which maps every two languages A and B onto $A \cup B$.
We introduce a third operation, product:
Product, $\cdot$, is the operation which maps every two languages A and B onto $A \cdot B$, defined as:
The product of A and B: $A \cdot B = \{\alpha\beta : \alpha \in A \text{ and } \beta \in B\}$
Example: If $A = \{a, b\}$ and $B = \{cc, d\}$, then $A \cdot B = \{acc, ad, bcc, bd\}$.

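Fact: $A^+ = A \cdot A^*$

Case 1. Let $e \in A$. Then $A^+ = A^*$, so the claim is: $A^* = A \cdot A^*$.
- Let $\alpha \in A$ and $\beta \in A^*$. Then obviously, by definition of $A^*$, $\alpha\beta \in A^*$. Hence $A \cdot A^* \subseteq A^*$.
- $A \cdot A^* = \{\alpha\beta : \alpha \in A \text{ and } \beta \in A^*\} \supseteq \{e\beta : \beta \in A^*\} = A^*$, since $e \in A$. So $A^* \subseteq A \cdot A^*$.
So indeed $A \cdot A^* = A^*$.

Case 2. $e \notin A$. In this case $A^+ = A^* - \{e\}$.
- We have already proved that $A \cdot A^* \subseteq A^*$.
- Since $e \notin A$, also $e \notin A \cdot A^*$, because every string in $A \cdot A^*$ starts with a string from A, which is non-empty.
These two prove that $A \cdot A^* \subseteq A^+$.
- Let $\alpha \in A^+$. Then either $\alpha \in A$, and since $\alpha = \alpha e$ and $e \in A^*$, $\alpha \in A \cdot A^*$. Or $\alpha = \alpha_1 \dots \alpha_n$, where $\alpha_1, \dots, \alpha_n \in A$. But then $\alpha_1 \in A$ and $\alpha_2 \dots \alpha_n \in A^*$. Hence $\alpha \in A \cdot A^*$. So $A^+ \subseteq A \cdot A^*$.
Consequently, $A^+ = A \cdot A^*$.

Example: $A = \{a, b\}$.
$A^* = \{e, a, b, aa, ab, ba, bb, aaa, \dots\}$
$A \cdot A^* = \{ae, aa, ab, aaa, aab, \dots, be, ba, bb, baa, bab, \dots\} = \{a, b, aa, ab, ba, bb, aaa, \dots\} = A^+$

Take two languages A and B. First take their union, $(A \cup B)$. Then take the Kleene closure of that: $(A \cup B)^*$. The composition operation $* \circ \cup$ is the operation which takes any two languages A and B and maps them onto $(A \cup B)^*$.
Let O be the operation which maps any five languages A, B, C, D, E onto the language $(A^* \cdot B^*) \cup (C \cdot (D \cup E)^*)^*$. This operation can be decomposed as a finite sequence of compositions of the operations *, $\cup$, $\cdot$:
First apply * to A, then apply * to B, then apply $\cdot$ to the results. Call this $O_1$.
Then apply $\cup$ to D and E, and apply * to the result. Call this $O_2$.
Apply $\cdot$ to C and $O_2$, and apply * to the result. Call this $O_3$.
Now apply $\cup$ to $O_1$ and $O_3$, and you get the output of O.
Defining the notion of 'finite sequence of compositions' is technically complex and nitty-gritty. I won't do that here, but assume instead that the intuition is clear.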
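For finite languages, the product and bounded versions of * and + are easy to compute. That gives a quick sanity check of the fact $A^+ = A \cdot A^*$ on strings up to a fixed length. This is a sketch that assumes languages are Python sets of strings, with '' standing for e. The helper names (concat, star_upto, plus_upto) are hypothetical. $A^*$ is infinite as soon as A contains a non-empty string, so the check only covers strings of length at most max_len.

```python
def concat(A, B):
    """Product A . B = {alpha beta : alpha in A, beta in B}; '' plays the role of e."""
    return {x + y for x in A for y in B}

def closure_upto(A, max_len, start):
    """Close `start` under appending A-strings, keeping strings of length <= max_len."""
    result, frontier = set(start), set(start)
    while frontier:
        frontier = {x + y for x in frontier for y in A
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

def star_upto(A, max_len):        # the strings of A* up to max_len
    return closure_upto(A, max_len, {""})

def plus_upto(A, max_len):        # one or more A-strings, up to max_len
    return closure_upto(A, max_len, {x for x in A if len(x) <= max_len})

A = {"a", "b"}
print(sorted(concat(A, {"cc", "d"})))        # ['acc', 'ad', 'bcc', 'bd']
n = 4
lhs = plus_upto(A, n)
rhs = {x for x in concat(A, star_upto(A, n)) if len(x) <= n}
print(lhs == rhs)                            # True
```

With $A = \{e, ab\}$ (Case 1 of the proof), plus_upto and star_upto give the same set, as the fact predicts.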

The class of regular operations on languages is given by:
1. *, $\cup$, $\cdot$ are regular operations on languages.
2. Any operation which can be decomposed as a finite sequence of compositions of the operations *, $\cup$, $\cdot$ is a regular operation.

We define: Language A is a regular language iff A is a finite language, or there is a regular operation O and finite languages $A_1, \dots, A_n$ such that $A = O(A_1, \dots, A_n)$.
This means that any regular language can be gotten by starting with a finite number of finite languages and applying a finite sequence of the operations *, $\cup$, $\cdot$.
For example:
$a^n b^m$ ($n, m \geq 0$) is: $\{a\}^* \cdot \{b\}^*$
$a^n b^m$ ($n, m \geq 1$) is: $\{a\}^+ \cdot \{b\}^+$
Since we have shown that $\{a\}^+ = \{a\} \cdot \{a\}^*$, we see that $a^n b^m$ ($n, m \geq 1$) is: $(\{a\} \cdot \{a\}^*) \cdot (\{b\} \cdot \{b\}^*)$

Equivalently, we can define the class of regular languages inductively as:
R, the class of all regular languages, is the smallest class such that:
1. Every finite language is regular.
2. If A and B are regular languages, then $A \cup B$ is regular.
3. If A and B are regular languages, then $A \cdot B$ is regular.
4. If A is a regular language, then $A^*$ is regular.
(We say 'class' and not 'set' because in this definition we don't put any constraints on the alphabets that the languages are languages in.)

Theorem: Every regular language is a right linear language.
Step 1: Every finite language is a right linear language.
Let $A = \{\alpha_1, \dots, \alpha_n\}$. Then $S \to \alpha_1, \dots, S \to \alpha_n$ is a right linear grammar generating A. Note that this also holds if $e \in A$, because this grammar is trivially in reduced form.
Step 2: If A and B are right linear languages, then $A \cup B$ is a right linear language.
Suppose $G_A$ is a right linear grammar generating A and $G_B$ is a right linear grammar generating B.
- Replace everywhere every nonterminal X in $G_A$ by a new nonterminal $X_A$.
- Replace everywhere every nonterminal X in $G_B$ by a new nonterminal $X_B$.
(i.e. we make all nonterminals in $G_A$ and $G_B$ disjoint.)
- Take the union of the resulting grammars.

- Add a new start symbol S. For every rule of the form $S_A \to \alpha X_A$, $S_A \to \alpha$, $S_B \to \beta Y_B$, $S_B \to \beta$, add a rule of the form $S \to \alpha X_A$, $S \to \alpha$, $S \to \beta Y_B$, $S \to \beta$.
Call the resulting grammar $G_{A \cup B}$. (Note that $G_{A \cup B}$ is in reduced form.) $G_{A \cup B}$ is a right linear grammar and $G_{A \cup B}$ generates $A \cup B$.

Step 3: If A and B are right linear languages, then $A \cdot B$ is a right linear language.
Suppose $G_A$ is a right linear grammar generating A and $G_B$ is a right linear grammar generating B.
- Replace everywhere every nonterminal X in $G_B$ by a new nonterminal $X_B$ that does not occur in $G_A$ (i.e. we make all nonterminals in $G_A$ and $G_B$ disjoint).
1. If $e \notin A$ and $e \notin B$, then replace every $G_A$ rule of the form $A \to \alpha$ by a rule of the form $A \to \alpha S_B$. Call the resulting grammar $G_{A \cdot B}$.
2. If $e \notin A$ and $e \in B$, then $G_B$ contains the rule $S_B \to e$. Delete that rule, and for every $G_A$ rule of the form $A \to \alpha$ add a rule of the form $A \to \alpha S_B$. Call the resulting grammar $G_{A \cdot B}$.
3. If $e \in A$ and $e \notin B$, then delete $S \to e$, and replace every remaining $G_A$ rule of the form $A \to \alpha$ by a rule of the form $A \to \alpha S_B$. Also, for every rule of the form $S_B \to \beta Y_B$ or $S_B \to \beta$, add a rule of the form $S \to \beta Y_B$ or $S \to \beta$. Call the resulting grammar $G_{A \cdot B}$.
4. If $e \in A$ and $e \in B$, then $G_B$ contains the rule $S_B \to e$. Delete that rule, and for every $G_A$ rule of the form $A \to \alpha$ other than $S \to e$ add a rule of the form $A \to \alpha S_B$,

and for every rule of the form $S_B \to \beta Y_B$ or $S_B \to \beta$ add a rule of the form $S \to \beta Y_B$ or $S \to \beta$. Call the resulting grammar $G_{A \cdot B}$.
In all four cases $G_{A \cdot B}$ is a right linear grammar and $G_{A \cdot B}$ generates $A \cdot B$.

Step 4: If A is a right linear language, then $A^*$ is a right linear language.
Suppose $G_A$ is a right linear grammar generating A.
- If $G_A$ contains the rule $S \to e$, delete that rule.
- For every remaining rule of the form $A \to \alpha$, add a rule $A \to \alpha S$. This grammar generates $A^* - \{e\}$.
- Convert the resulting grammar into reduced form and add $S \to e$ to the result. Call the result $G_{A^*}$.
$G_{A^*}$ is a right linear grammar and generates $A^*$. This completes the proof.
So we now know that the class of regular languages is a subclass of the class of right linear languages. We will soon see that the two classes actually coincide.
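Step 2 of the proof (union) can be sketched as follows. A bounded generator for right linear grammars is included to check the result. This is a minimal sketch under assumptions the notes do not make. A grammar is a pair (start, rules), and nonterminals start with an uppercase letter. The suffixes _A and _B produce the disjoint copies $X_A$ and $X_B$.

```python
# Assumptions: a grammar is (start, rules); rules are (lhs, rhs) with rhs a tuple;
# nonterminals start with an uppercase letter, terminals are lowercase.

def is_nonterminal(symbol):
    return symbol[0].isupper()

def rename(grammar, suffix):
    """Make nonterminals disjoint: every nonterminal X becomes X + suffix."""
    start, rules = grammar
    def ren(symbol):
        return symbol + suffix if is_nonterminal(symbol) else symbol
    return ren(start), [(ren(l), tuple(ren(s) for s in r)) for l, r in rules]

def union_grammar(gA, gB, new_start="S"):
    """Step 2: copy the rules of both old start symbols to a new start symbol S."""
    sA, rA = rename(gA, "_A")
    sB, rB = rename(gB, "_B")
    copies = [(new_start, r) for l, r in rA + rB if l in (sA, sB)]
    return new_start, rA + rB + copies

def generate(grammar, max_len):
    """All terminal strings of length <= max_len derived by a right linear grammar."""
    start, rules = grammar
    found, agenda = set(), [("", start)]
    while agenda:
        prefix, nt = agenda.pop()
        for lhs, rhs in rules:
            if lhs != nt:
                continue
            if rhs and is_nonterminal(rhs[-1]):      # A -> alpha B
                new = prefix + "".join(rhs[:-1])
                if len(new) <= max_len:              # alpha is non-empty, so this ends
                    agenda.append((new, rhs[-1]))
            else:                                     # A -> alpha  (or S -> e)
                new = prefix + "".join(rhs)
                if len(new) <= max_len:
                    found.add(new)
    return found

a_plus = ("S", [("S", ("a", "S")), ("S", ("a",))])
b_plus = ("S", [("S", ("b", "S")), ("S", ("b",))])
print(sorted(generate(union_grammar(a_plus, b_plus), 2)))   # ['a', 'aa', 'b', 'bb']
```

The new start symbol never occurs on a right-hand side, so the result is in reduced form, as the proof notes.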

FINITE STATE AUTOMATA

We now take a parsing perspective. Whereas grammars generate strings, automata read strings symbol by symbol, from left to right, and follow instructions. When the string has been read, they determine whether it is accepted or rejected.
At any point of its operation, we assume that the automaton is in a certain state. We can think of this state as an array of switches which can be on or off. Each combination of switch-settings that the automaton allows is a possible state that the automaton can be in. The instructions that the automaton follows, then, can be interpreted as instructions to reset switches, and hence as instructions to move from one state to another.
A state automaton is an automaton that can do this and no more: it can read the input from left to right, symbol by symbol, and at each point follow an instruction to switch state. The state that it is in after reading the input will determine whether or not the string is accepted. Importantly: a state automaton does not have any memory.
A finite state automaton is a state automaton that has a finite number of possible states it can be in.
A finite state automaton is deterministic iff, in any state it is in and for any symbol it reads, there is at most one state it can switch to, according to its instructions.
A finite state automaton is non-deterministic iff possibly, in some state and for some symbol, there is more than one state it can switch to, according to its instructions.
A deterministic finite state automaton is total iff, in any state it is in and for any symbol it reads, there is exactly one state it can switch to.
In the formal definition we only specify the things that vary from automaton to automaton:
A finite state automaton is a tuple $M = \langle S, \Sigma, \delta, S_0, F \rangle$ where:
1. S is a finite set, the set of states.
2. $\Sigma$ is a finite alphabet, the input alphabet.
3. $\delta$, the transition relation, is a three-place relation relating a symbol in $\Sigma$ to two states (an input state and an output state): $\delta \subseteq S \times \Sigma \times S$.
4. $S_0 \in S$. $S_0$ is the initial state.
5. $F \subseteq S$. F is the set of final states.
Let M be a finite state automaton. M is deterministic iff $\delta$ is a partial function from $S \times \Sigma$ into S, i.e. iff $\delta$ maps every pair consisting of an input state and an input symbol onto at most one output state.
Let M be a deterministic finite state automaton. M is total iff $\delta$ is a total function from $S \times \Sigma$ into S, i.e. iff $\delta$ maps every pair consisting of an input state and an input symbol onto exactly one output state.

Since we read 'non-deterministic' as 'possibly non-deterministic', we take 'finite state automaton' and 'non-deterministic finite state automaton' to be the same notion.
I mentioned that we only specify the variable parts of the automaton. The invariable parts are the following:
1. Every automaton has an input tape, on which a string in the input alphabet is written.
2. Every automaton has a reading head which reads one symbol at a time.
3. Every computation starts while the automaton is in the initial state $S_0$, reading the first symbol of the input string.
4. We assume that after having read the last symbol of the input string, the automaton reads e.
5. At each computation step the automaton follows a transition. We write transition $\delta(S_i, a) = S_k$ as: $(S_i, a) \to S_k$. With this transition, the automaton can perform the following computation step:
6. Computation step: If the automaton is in state $S_i$ and reads symbol a on the input tape, it switches to state $S_k$ and reads the next symbol on the input tape.
The automaton halts iff there is no transition rule to continue.
Let $\alpha \in \Sigma^*$. A computation path for $\alpha$ in M is a sequence of computation steps that begins in $S_0$ reading the first symbol of $\alpha$ and follows instructions in $\delta$ until M halts.
Fact: If M is a deterministic finite state machine, then every input string $\alpha \in \Sigma^*$ has a unique computation path. Since every step consumes one input symbol, the automaton halts on every input string.
Now, since $e \notin \Sigma$ ($\Sigma$ is an alphabet), there is by definition of $\delta$ no instruction for when M reads e. This means that if the automaton reads e, it halts. We use this in defining acceptance:
7 DET. Deterministic finite state automaton M accepts string $\alpha \in \Sigma^*$ iff at the end of the computation path of $\alpha$ in M, M reads e and M is in a final state. Otherwise M rejects $\alpha$.
This means that M rejects $\alpha$ in two cases. In the first, at the end of the computation path for $\alpha$ in M, M reads e but is not in a final state. In the second, M halts at a symbol before reading the whole input string, that is, at the end of the computation path of $\alpha$ in M, M doesn't read e.
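Definition 7 DET translates directly into a loop over the input. This sketch assumes that $\delta$ is given as a Python dict from (state, symbol) pairs to states. A missing key plays the role of an undefined transition, on which M halts. The example is the two-state automaton for $a^n$ with n even, which is discussed in these notes.

```python
def dfa_accepts(delta, start, finals, string):
    """Run a deterministic automaton whose transition function may be partial.

    delta maps (state, symbol) to a state. Following 7 DET, the string is
    accepted iff the run reaches the end of the input (M reads e) in a final state.
    """
    state = start
    for symbol in string:
        if (state, symbol) not in delta:   # no instruction: M halts before the end
            return False
        state = delta[state, symbol]
    return state in finals                  # M reads e: accept iff the state is final

even = {("S0", "a"): "S1", ("S1", "a"): "S0"}
print([dfa_accepts(even, "S0", {"S0"}, "a" * k) for k in range(4)])
# [True, False, True, False]
print(dfa_accepts(even, "S0", {"S0"}, "ab"))   # False: M halts on b
```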

If M is a non-deterministic automaton, there may be more than one instruction that M can follow while reading an input symbol in a state. This means that M can choose, and so for each string $\alpha$ of the input alphabet there may be more than one computation path for $\alpha$ in M. (A computation path for $\alpha$ in M is, once again, a sequence of computation steps licensed by transitions in M, starting in $S_0$ reading the first symbol of $\alpha$, and halting in some state.)
For non-deterministic automata we define acceptance:
7 NDET. Non-deterministic finite state automaton M accepts string $\alpha \in \Sigma^*$ iff for some computation path for $\alpha$ in M, at the end of that computation path, M reads e and M is in a final state. Otherwise M rejects $\alpha$.
This means that there may be computation paths for $\alpha$ in M at the end of which M is not reading e, or M is not in a final state, and yet M accepts $\alpha$. As long as there is at least one computation path where M ends up reading e in a final state, M accepts $\alpha$.
8. Let M be a finite state automaton. L(M), the language accepted by M, is the set of all accepted input strings (so $L(M) \subseteq \Sigma^*$). We call L(M) a finite state language. $M_1$ and $M_2$ are equivalent iff $L(M_1) = L(M_2)$.
Now we introduce pictures of finite state automata, called state diagrams. A state diagram of a finite state automaton represents the states as circles with the state names as labels. It represents the transitions in $\delta$ as arrows between the appropriate state circles, where each arrow is labeled by the appropriate input symbol, according to $\delta$. It represents final states as double circles.
Example. M is given by:
$S = \{S_0, S_1\}$, $\Sigma = \{a\}$, $\delta(S_0, a) = S_1$, $\delta(S_1, a) = S_0$, $F = \{S_0\}$.
[State diagram: $S_0$ (final) and $S_1$, with an a-arrow from each to the other.]
Accepted: e, aa, aaaa, aaaaaa, ...
Rejected: a, aaa, aaaaa, aaaaaaa, ...
Accepted language: $a^n$ (n is even).
Note that e is accepted by this automaton. The automaton starts out in $S_0$ reading e, and halts there. Since $S_0$ is a final state, e is accepted.
Fact: A finite state automaton M accepts e iff $S_0$ is a final state.
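Definition 7 NDET asks whether some computation path ends in a final state after reading the whole input. A direct sketch searches all paths, which takes exponential time in the worst case. $\delta$ is assumed to be a set of triples (state, symbol, state). The example automaton is hypothetical: from S0, reading a may either stay in S0 or move to the final state S1, so it accepts $a^n$ with $n > 0$.

```python
def nfa_accepts(delta, start, finals, string):
    """7 NDET: accept iff SOME computation path reads all of `string`
    and ends in a final state. delta is a set of (state, symbol, state)."""
    def path_from(state, pos):
        if pos == len(string):               # M reads e
            return state in finals
        return any(path_from(target, pos + 1)
                   for (source, symbol, target) in delta
                   if source == state and symbol == string[pos])
    return path_from(start, 0)

delta = {("S0", "a", "S0"), ("S0", "a", "S1")}
print(nfa_accepts(delta, "S0", {"S1"}, "aaa"))   # True
print(nfa_accepts(delta, "S0", {"S1"}, ""))      # False
```

Some paths for "aaa" end in the non-final S0, but one path ends in S1, and that is enough.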

M is given by:
$S = \{S_0, S_1\}$, $\Sigma = \{a\}$, $\delta(S_0, a) = S_1$, $\delta(S_1, a) = S_0$, $F = \{S_1\}$.
Accepted: a, aaa, aaaaa, ...
Rejected: e, aa, aaaa, ...
Accepted language: $a^n$ (n is odd).

M is given by:
$S = \{S_0, S_1, S_2\}$, $\Sigma = \{a, b\}$, $\delta$: $(S_0, a) \to S_1$, $(S_1, a) \to S_1$, $(S_1, b) \to S_2$, $(S_2, b) \to S_2$, $F = \{S_2\}$.
[State diagrams: $S_0$ and $S_1$ for the first automaton; $S_0$, $S_1$ and $S_2$ for the second.]
Accepted language: $a^n b^m$ ($n > 0$, $m > 0$).

Two state diagrams are equivalent iff they are state diagrams for equivalent automata.
A state diagram is reduced iff there is no equivalent state diagram with fewer states.
Reduction Theorem: Any two reduced equivalent deterministic state diagrams are isomorphic (i.e. they differ at most in the names of the states). Proof omitted.
This means that for each finite state language there is, up to isomorphism, a unique smallest deterministic automaton recognizing that language.

Theorem: For every deterministic finite state automaton, there is an equivalent total deterministic finite state automaton.
Let M be a deterministic finite state automaton. Add a new state G (for garbage) to M, which is not a final state. For each state $S_i$ (including G itself) and each symbol a such that $\delta(S_i, a)$ is undefined, add $\delta(S_i, a) = G$. The resulting automaton is deterministic and total, and clearly recognizes the same language as M.
Example. We make the last automaton total:
[State diagram: states $S_0$, $S_1$, $S_2$ and G, with the new transitions $(S_0, b) \to G$, $(S_2, a) \to G$, $(G, a) \to G$, $(G, b) \to G$.]
Since it is so easy to turn a deterministic automaton into a total deterministic automaton, I don't require you to make it total when I ask you to make a deterministic automaton. But make sure that it is deterministic. It is often much easier to make a non-deterministic automaton than a deterministic one, and I sometimes don't want you to do the easier thing.
Example: In alphabet {a, b} I want an automaton that recognizes $a^n \cup b^m \cup ab^k \cup ba^p$ ($n, m > 0$; $k, p \geq 0$):
- any string of one or more a's
- any string of one or more b's
- any string consisting of one a, followed by as many b's as you want
- any string consisting of one b, followed by as many a's as you want.
This is straightforward to do non-deterministically:
[State diagram: $S_0$ has arrows labeled a and b to $S_1$, which loops on a, and arrows labeled a and b to $S_2$, which loops on b; $S_1$ and $S_2$ are final.]

On the path from $S_0$ to $S_1$ and looping on $S_1$, you accept any string of one or more a's, and any string starting with b, followed by any number of a's. On the path from $S_0$ to $S_2$ and looping on $S_2$, you accept one or more b's, and any string starting with a, followed by any number of b's. What the automaton accepts is the union of what it accepts along each of these paths, so it accepts the language specified.
Deterministically, you need to think about it more, though it is not very difficult:
[State diagram: a deterministic automaton with states $S_0, \dots, S_6$.]
We will prove below that the class of non-deterministic finite state languages and the class of deterministic finite state languages coincide. But we will prove some simpler things first.
It is useful to introduce for automata a familiar notion and a familiar theorem:
Finite state automaton M is in reduced form iff $S_0$ does not occur in the range of $\delta$, i.e. iff M has no arrows going into $S_0$.
Theorem: Every finite state automaton is equivalent to a finite state automaton in reduced form.
The construction is the same as for grammars. Replace each occurrence of $S_0$ in M by $S_0'$ and add a new initial state $S_0$. For each transition $(S_0', a) \to S_k$, add a transition $(S_0, a) \to S_k$. Make $S_0$ a final state iff $e \in L(M)$. The resulting automaton is in reduced form and accepts the same language as M.
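The reduced-form construction can be sketched as follows. Two assumptions do not come from the notes: $\delta$ is a set of triples, and the old initial state keeps its name instead of being renamed to $S_0'$. The fresh initial state is called S0_new, a hypothetical name. The two versions differ only in naming.

```python
def to_reduced_form(states, delta, start, finals, new_start="S0_new"):
    """Give M a fresh initial state with no incoming arrows.

    delta is a set of (state, symbol, state); new_start is assumed not to occur in M.
    """
    new_delta = set(delta)
    new_delta |= {(new_start, a, q) for (p, a, q) in delta if p == start}
    new_finals = set(finals)
    if start in finals:          # e is in L(M) iff the old initial state is final
        new_finals.add(new_start)
    return set(states) | {new_start}, new_delta, new_start, new_finals

def accepts(delta, start, finals, string):
    """Some path accepts iff the set of reachable states meets the final states."""
    current = {start}
    for symbol in string:
        current = {q for (p, a, q) in delta if p in current and a == symbol}
    return bool(current & finals)

# The automaton for a^n (n even) has an arrow back into S0.
delta = {("S0", "a", "S1"), ("S1", "a", "S0")}
_, rdelta, rstart, rfinals = to_reduced_form({"S0", "S1"}, delta, "S0", {"S0"})
print(any(q == rstart for (_, _, q) in rdelta))   # False: no arrows into the new start
print(all(accepts(delta, "S0", {"S0"}, "a" * k) == accepts(rdelta, rstart, rfinals, "a" * k)
          for k in range(8)))                     # True
```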

Example:
[State diagrams: an automaton with states $S_0$ and $S_1$, and the equivalent automaton in reduced form, with states $S_0$, $S_0'$ and $S_1$.]

Theorem: The right linear languages are exactly the finite state languages.
Step 1: If a language is right linear, there is a finite state automaton accepting it.
Let $G = \langle V_N, V_T, S, P \rangle$ be a restricted right linear grammar. We construct a finite state automaton M:
1. $\Sigma = V_T$.
2. The set of states is $V_N \cup \{Q\}$, with Q a symbol not in $V_N$.
3. For every rule $A \to aB$, we have a transition $(A, a) \to B$ in $\delta$.
4. For every rule $A \to a$, we have a transition $(A, a) \to Q$ in $\delta$.
5. S is the initial state.
6. If $S \to e$ is not in G, then Q is the final state. If $S \to e$ is in G, then Q and S are the final states.
Claim: G and M are equivalent.
A. If G generates $\alpha$, M accepts $\alpha$.
- If $\alpha = e$ and G generates $\alpha$, then S is a final state and M accepts e.
- Suppose G generates $\alpha$, and $\alpha = a_1 \dots a_n$. Then there is a derivation in G of the form:
$S \Rightarrow a_1 A_1 \Rightarrow \dots \Rightarrow a_1 \dots a_{n-1} A_{n-1} \Rightarrow a_1 \dots a_n$
This means that G contains the rules $S \to a_1 A_1, \dots, A_{n-2} \to a_{n-1} A_{n-1}, A_{n-1} \to a_n$.
This means that in the automaton we have the path $S \xrightarrow{a_1} A_1 \dots \xrightarrow{a_{n-1}} A_{n-1} \xrightarrow{a_n} Q$.
Clearly, then, M accepts $\alpha$.
B. If M accepts $\alpha$, G generates $\alpha$.
- If $\alpha = e$ and M accepts e, then $S \to e$ is in G, by definition of M, so G generates e.
- If M accepts $\alpha$ and $\alpha = a_1 \dots a_n$, M contains a path of the above form. S may in principle be a final state. Even so, if $\alpha \neq e$, M will not accept $\alpha$ in S. S is only a final state if $S \to e$ is in G, and that can only be the case if G is in reduced form. But then M is also in reduced form (no arrows go into S), so the path accepting $\alpha$ is indeed of the above form.

But from the construction of M, we know that all of the rules $S \to a_1 A_1, \dots, A_{n-2} \to a_{n-1} A_{n-1}, A_{n-1} \to a_n$ are then in G, because that's how we got those transitions in the first place. Hence G generates $\alpha$.
Example:
[Figure: a restricted right linear grammar with nonterminals S, A, B, and the corresponding automaton with states S, A, B and Q.]

Step 2: If a language is a finite state language, there is a right linear grammar generating it.
Let M be a finite state automaton in reduced form. We define a grammar $G_M$:
1. $V_T = \Sigma$.
2. $V_N = S$ (the states of M).
3. For every instruction $(A_i, a) \to A_k$ in $\delta$, we add a rule $A_i \to aA_k$.
4. For every instruction $(A_i, a) \to A_k$ in $\delta$ where $A_k$ is a final state, we also add a rule $A_i \to a$.
5. The start symbol is $S_0$.
6. If $S_0$ is a final state, we add $S_0 \to e$.
Since M was in reduced form, $G_M$ is in reduced form. By an argument which is the inverse of the above argument, $G_M$ generates exactly what M accepts. And $G_M$ is right linear (in fact, restricted right linear). The class of finite state languages is the class of languages accepted by finite state automata in reduced form, so we have proved our theorem.
Example:
[Figure: an automaton with states $S_0$, $S_1$, $S_2$, and the corresponding right linear grammar with nonterminals S, $S_1$, $S_2$.]
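Step 1 of the proof (grammar to automaton) can be sketched like this. Rules are assumed to be encoded as (A, (a, B)) for $A \to aB$, (A, (a,)) for $A \to a$ and (S, ()) for $S \to e$. The extra state is named Q as in the notes and is assumed not to be a nonterminal. The acceptance check tracks the set of states reachable so far, which amounts to asking for some accepting computation path. The example grammar is hypothetical.

```python
def grammar_to_nfa(rules, start, q="Q"):
    """Step 1: restricted right linear grammar -> finite state automaton.

    rules: (A, (a, B)) for A -> aB, (A, (a,)) for A -> a, (S, ()) for S -> e.
    """
    delta = set()
    for lhs, rhs in rules:
        if len(rhs) == 2:            # A -> aB  gives (A, a) -> B
            delta.add((lhs, rhs[0], rhs[1]))
        elif len(rhs) == 1:          # A -> a   gives (A, a) -> Q
            delta.add((lhs, rhs[0], q))
    finals = {q}
    if (start, ()) in rules:         # S -> e makes S final as well
        finals.add(start)
    return delta, start, finals

def accepts(delta, start, finals, string):
    current = {start}
    for symbol in string:
        current = {t for (s, a, t) in delta if s in current and a == symbol}
    return bool(current & finals)

# Hypothetical grammar for (ab)+: S -> aA, A -> bS, A -> b
nfa = grammar_to_nfa([("S", ("a", "A")), ("A", ("b", "S")), ("A", ("b",))], "S")
print([w for w in ["", "a", "ab", "aba", "abab"] if accepts(*nfa, w)])   # ['ab', 'abab']
```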

Theorem: The left linear languages are exactly the finite state languages.
We define, for a string $a_1 \dots a_n$, for a language A, and for a restricted right linear grammar G:
The reversal of $a_1 \dots a_n$: $(a_1 a_2 \dots a_n)^R = a_n \dots a_2 a_1$
The reversal of A: $A^R = \{\alpha^R : \alpha \in A\}$
The reversal of G, $G^R$, is the result of replacing in G every rule of the form $A \to aB$ by $A \to Ba$.
Fact: $L(G^R) = (L(G))^R$
This is obvious: a right linear derivation D gets replaced by a left linear derivation D':
[Figure: parse tree D, with the spine of nonterminals S, A, B on the right, and parse tree D', with the same spine on the left.]

THEOREM: If A is a finite state language, $A^R$ is a finite state language.
Let M be a finite state automaton that accepts A.
Case 1. Assume M has one final state F.
- Turn every transition $(S_i, a) \to S_k$ into $(S_k, a) \to S_i$.
- Make $S_0$ the final state.
- Make F the initial state.
The resulting finite state automaton accepts $A^R$.
Case 2. Assume M has final states $F_1, \dots, F_n$.
- Turn every transition $(S_i, a) \to S_k$ into $(S_k, a) \to S_i$.
- Make $S_0$ the final state.
- Add a new initial state S', and for every (reversed) transition $(F_i, a) \to S_k$ add a transition $(S', a) \to S_k$.
The resulting automaton is in reduced form. If $e \in A$, make S' a final state as well. The resulting automaton recognizes $A^R$. This completes the proof.

Corollary: The left linear languages are exactly the right linear languages.
- Let A be a right linear language. Then A is a finite state language. Then $A^R$ is a finite state language, by the above theorem, and hence $A^R$ is a right linear language. Take a restricted right linear grammar G for $A^R$. By the earlier fact, $G^R$ is a left linear grammar that generates $A^{RR}$. But $A^{RR} = A$. Hence A is a left linear language.
- Let A be a left linear language. Then $A^R$ is a right linear language. Hence $A^R$ is a finite state language, hence $A^{RR}$ is a finite state language, so A is a finite state language, and hence A is a right linear language.

The next proof is a difficult proof. It is one of the two difficult proofs I do in this class. I do it because it illuminates the structure of regular languages so well.
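Before turning to that proof, here is a sketch of the reversal construction of Case 2, which also covers Case 1. $\delta$ is assumed to be a set of triples, and S_new is a hypothetical name for the new initial state S'. The test reverses the automaton for $a^n b^m$ ($n, m > 0$), so the result should accept $b^m a^n$.

```python
def reverse_automaton(delta, start, finals, new_start="S_new"):
    """Case 2 of the reversal theorem. delta: set of (state, symbol, state)."""
    rev = {(q, a, p) for (p, a, q) in delta}            # turn every arrow around
    # the new initial state copies the outgoing arrows of every old final state
    rev |= {(new_start, a, q) for (p, a, q) in rev if p in finals}
    rev_finals = {start}                                # the old initial state is final
    if start in finals:                                 # e in A, so e in A^R
        rev_finals.add(new_start)
    return rev, new_start, rev_finals

def accepts(delta, start, finals, string):
    current = {start}
    for symbol in string:
        current = {t for (s, a, t) in delta if s in current and a == symbol}
    return bool(current & finals)

delta = {("S0", "a", "S1"), ("S1", "a", "S1"), ("S1", "b", "S2"), ("S2", "b", "S2")}
rev = reverse_automaton(delta, "S0", {"S2"})
print([w for w in ["ab", "ba", "bbaa", "baab"] if accepts(*rev, w)])   # ['ba', 'bbaa']
```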

Remember that we proved earlier that every regular language is a right linear language, and hence (we now know) a finite state language. We will now prove the converse of this:
THEOREM: Every finite state language is a regular language.
Let M be a finite state automaton with n states. Assign numbers 1, ..., n to the states in M: state m is the state we assign number m.
For each number $k \leq n$ and each two states i and j, with $i, j \leq n$, we are going to define a set of strings $R^k_{i,j}$.
The intuition is that we look at all the paths through the automaton that bring you from state i to state j, and we are interested in the strings that are read along those paths. This does not mean that these strings are accepted by the automaton M. It only means that if you start in state i, these strings will bring you from there to state j.
The number k puts a restriction on which paths to include and which to exclude. k says: ignore any path that goes through any state m where $m > k$. Here, going through a state means passing it on the way; the start state i and the end state j themselves do not count.
This means that $R^n_{i,j}$ is the set of all strings that bring you from state i to state j: there are no states m with $m > n$, so all paths count.
Similarly, $R^0_{i,j}$ is the set of strings that bring you from state i to state j, while ignoring any path that goes through any of the states 1, ..., n. We interpret this as meaning that $R^0_{i,j}$ is the set of strings that directly bring you from state i to state j. Following this intuition, we define the latter sets as follows:
Definition: for every $i, j \leq n$:
if $i \neq j$ then: $R^0_{i,j} = \{a \in \Sigma : (i, a) \to j \text{ is a transition of } M\}$
if $i = j$ then: $R^0_{i,i} = \{a \in \Sigma : (i, a) \to i \text{ is a transition of } M\} \cup \{e\}$
We are now going to look at $R^k_{i,j}$ where $k > 0$. $R^k_{i,j}$ is the set of strings which bring you from i to j without going through any state with a number higher than k. Intuitively, we can split this set of strings into two sets:
- the set of strings that bring you from i to j without going through any state with a number higher than $k-1$: that is, $R^{k-1}_{i,j}$;
- the set of strings that bring you from i to j while going through state k. Let us call the latter set, for the moment, K.
That means that: $R^k_{i,j} = R^{k-1}_{i,j} \cup K$

Now we focus our attention on set K, the set of strings that bring us from i to j while going through state k. Such strings may go through state k more than once; in that case they loop in state k. But intuitively we can divide any such string into three parts:
- a string that brings you from state i to state k on a path that doesn't itself go through state k (i.e. the string you get the first time you reach state k);
- a string that brings you from state k to state k, 0 or more times;
- a string that brings you from state k to state j on a path that doesn't itself go through state k (i.e. the string you get by going from the last time you are in state k to state j).
Thus, any string in K is a concatenation of a string in $R^{k-1}_{i,k}$, followed (possibly) by a string that loops from k to k, followed by a string in $R^{k-1}_{k,j}$.
Writing for the moment L for the set of all middle parts, the strings that loop in k, we see that:
$K = R^{k-1}_{i,k} \cdot L \cdot R^{k-1}_{k,j}$
Now, the loop strings are strings that bring you from state k to state k. Each such string can be described as a concatenation of strings that bring you from state k to state k without going through state k itself. This is obvious: if you loop m times in state k and get string $\alpha$, divide $\alpha$ into the substrings you get each time you reach state k again. These substrings themselves do not go through state k. So the loop set L is the closure under string formation of $R^{k-1}_{k,k}$. That is the set of strings that bring you from k back to k without going through k, or through a state with a higher number:
$L = (R^{k-1}_{k,k})^*$
Filling in L in K, we get:
$K = R^{k-1}_{i,k} \cdot (R^{k-1}_{k,k})^* \cdot R^{k-1}_{k,j}$
Filling in K in $R^k_{i,j}$, we get:
$R^k_{i,j} = R^{k-1}_{i,j} \cup (R^{k-1}_{i,k} \cdot (R^{k-1}_{k,k})^* \cdot R^{k-1}_{k,j})$
This we use as a definition:
Definition: for every k such that $0 < k \leq n$, and for every $i, j \leq n$:
$R^k_{i,j} = R^{k-1}_{i,j} \cup (R^{k-1}_{i,k} \cdot (R^{k-1}_{k,k})^* \cdot R^{k-1}_{k,j})$
With our two definitions, we have now defined $R^k_{i,j}$ for every number $k \leq n$ and for all states $i, j \leq n$. Now we state a theorem:

Theorem: For every number $k \leq n$ and for every two states $i, j \leq n$: $R^k_{i,j}$ is regular.
We prove this by induction on the number k. We will prove the following two things:
Proposition 1: For every two states $i, j \leq n$: $R^0_{i,j}$ is regular.
Proposition 2: For any number k with $0 < k \leq n$: if for every two states p, q, $R^{k-1}_{p,q}$ is regular, then for every two states i, j, $R^k_{i,j}$ is regular.
Together, the proofs of these two propositions form an induction proof of the theorem. Proposition 2 says that if the theorem holds for $k-1$, it holds for k (with $k > 0$). Proposition 1 says that the theorem holds for $k = 0$, so by Proposition 2 it holds for $k = 1$. Since it holds for $k = 1$, Proposition 2 once again says that it holds for $k = 2$, and so on. So if we can prove Propositions 1 and 2, we have indeed proved the theorem.
Proof of Proposition 1: By definition of $R^0_{i,j}$, $R^0_{i,j}$ is a finite set for every i and j. Hence for every i and j, $R^0_{i,j}$ is regular, because finite sets are regular.
Proof of Proposition 2: We assume that for every two states p, q, $R^{k-1}_{p,q}$ is regular. Let i and j be any states. We prove that $R^k_{i,j}$ is regular.
By definition: $R^k_{i,j} = R^{k-1}_{i,j} \cup (R^{k-1}_{i,k} \cdot (R^{k-1}_{k,k})^* \cdot R^{k-1}_{k,j})$
By assumption: $R^{k-1}_{i,j}$, $R^{k-1}_{i,k}$, $R^{k-1}_{k,k}$ and $R^{k-1}_{k,j}$ are regular.
But then $R^{k-1}_{i,j} \cup (R^{k-1}_{i,k} \cdot (R^{k-1}_{k,k})^* \cdot R^{k-1}_{k,j})$ is regular, since it is built from those sets with the regular operations $\cup$, $\cdot$ and *. Hence $R^k_{i,j}$ is regular.
With the proofs of Propositions 1 and 2 we have proved the theorem.
Now we will use this theorem to prove the main theorem. Let s be the number of the initial state and f the number of a final state. $R^n_{s,f}$ is the set of all strings accepted by M in final state f. It follows from the theorem just proved that $R^n_{s,f}$ is regular.

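Let s be the number of the initial state, and let $f_1, \dots, f_m$ be the numbers of all the final states in M. Then the language accepted by M is:
$L(M) = R^n_{s,f_1} \cup \dots \cup R^n_{s,f_m}$
We have just seen that all the sets in this union are regular. So L(M) is a union of regular sets, and hence L(M) is itself regular. (If M has no final states, $L(M) = \emptyset$, which is finite and hence regular.) This proves the main theorem: every language accepted by a finite state automaton is regular.
We have now proved that all the language classes discussed here form one and the same class of languages: the class of regular languages. These are the right linear languages, the left linear languages and the finite state languages.

Example: an automaton with states $S_1$, $S_2$, $S_3$ (numbered 1, 2, 3). It has one arrow from $S_1$ to $S_2$, one from $S_1$ to $S_3$ and one from $S_2$ to $S_3$, a loop on $S_2$ and a loop on $S_3$. We compute $R^3_{1,3}$:
$R^3_{1,3} = R^2_{1,3} \cup (R^2_{1,3} \cdot (R^2_{3,3})^* \cdot R^2_{3,3})$
$R^2_{1,3} = R^1_{1,3} \cup (R^1_{1,2} \cdot (R^1_{2,2})^* \cdot R^1_{2,3})$
$R^2_{3,3} = R^1_{3,3} \cup (R^1_{3,2} \cdot (R^1_{2,2})^* \cdot R^1_{2,3})$
$R^1_{1,3} = R^0_{1,3} \cup (R^0_{1,1} \cdot (R^0_{1,1})^* \cdot R^0_{1,3})$
$R^1_{1,2} = R^0_{1,2} \cup (R^0_{1,1} \cdot (R^0_{1,1})^* \cdot R^0_{1,2})$
$R^1_{2,2} = R^0_{2,2} \cup (R^0_{2,1} \cdot (R^0_{1,1})^* \cdot R^0_{1,2})$
$R^1_{2,3} = R^0_{2,3} \cup (R^0_{2,1} \cdot (R^0_{1,1})^* \cdot R^0_{1,3})$
$R^1_{3,3} = R^0_{3,3} \cup (R^0_{3,1} \cdot (R^0_{1,1})^* \cdot R^0_{1,3})$
$R^1_{3,2} = R^0_{3,2} \cup (R^0_{3,1} \cdot (R^0_{1,1})^* \cdot R^0_{1,2})$
The base sets are:
$R^0_{1,1} = \{e\}$;
$R^0_{1,2}$, $R^0_{1,3}$ and $R^0_{2,3}$ each contain just the symbol on the corresponding arrow;
$R^0_{2,2}$ and $R^0_{3,3}$ each contain e and the symbol on the loop;
$R^0_{2,1} = R^0_{3,1} = R^0_{3,2} = \emptyset$.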

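Hence, since $R^0_{1,1} = \{e\}$, $R^0_{2,1} = R^0_{3,1} = \emptyset$, and any product with $\emptyset$ is $\emptyset$:
$R^1_{3,2} = \emptyset \cup (\emptyset \cdot \{e\} \cdot R^0_{1,2}) = \emptyset$
$R^1_{3,3} = R^0_{3,3} \cup (\emptyset \cdot \{e\} \cdot R^0_{1,3}) = R^0_{3,3}$
$R^1_{2,3} = R^0_{2,3} \cup (\emptyset \cdot \{e\} \cdot R^0_{1,3}) = R^0_{2,3}$
$R^1_{2,2} = R^0_{2,2} \cup (\emptyset \cdot \{e\} \cdot R^0_{1,2}) = R^0_{2,2}$
$R^1_{1,2} = R^0_{1,2} \cup (\{e\} \cdot \{e\} \cdot R^0_{1,2}) = R^0_{1,2}$
$R^1_{1,3} = R^0_{1,3} \cup (\{e\} \cdot \{e\} \cdot R^0_{1,3}) = R^0_{1,3}$
And:
$R^2_{3,3} = R^0_{3,3} \cup (\emptyset \cdot (R^0_{2,2})^* \cdot R^0_{2,3}) = R^0_{3,3}$
$R^2_{1,3} = R^0_{1,3} \cup (R^0_{1,2} \cdot (R^0_{2,2})^* \cdot R^0_{2,3})$
$R^3_{1,3} = R^2_{1,3} \cup (R^2_{1,3} \cdot (R^0_{3,3})^* \cdot R^0_{3,3})$
Since $e \in R^0_{3,3}$, $(R^0_{3,3})^* \cdot R^0_{3,3} = (R^0_{3,3})^*$, so we simplify to:
$R^3_{1,3} = R^2_{1,3} \cup (R^2_{1,3} \cdot (R^0_{3,3})^*)$
Since $R^2_{1,3}$ falls under $R^2_{1,3} \cdot (R^0_{3,3})^*$ (take e from the starred set), we simplify to:
$R^3_{1,3} = (R^0_{1,3} \cup (R^0_{1,2} \cdot (R^0_{2,2})^* \cdot R^0_{2,3})) \cdot (R^0_{3,3})^*$
Here $(R^0_{2,2})^*$ and $(R^0_{3,3})^*$ are just the sets of all repetitions of the loop symbol on $S_2$ and on $S_3$. So this language has two kinds of strings, each followed by any number of repetitions of the $S_3$ loop symbol. The first kind is the symbol on the arrow from $S_1$ to $S_3$. The second is the symbol on the arrow from $S_1$ to $S_2$, then any number of repetitions of the $S_2$ loop symbol, then the symbol on the arrow from $S_2$ to $S_3$.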
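The recursive definition of $R^k_{i,j}$ can be run directly if sets of strings are represented by regular expressions. This is a sketch under stated assumptions. States are numbered 1..n, $\delta$ is a set of triples (i, a, j), and Python's re module serves as the matcher. None stands for $\emptyset$ and '' for {e}. No simplification is done, so the expressions grow exponentially with n; the sketch is only meant for small automata such as the example above. The test automaton is hypothetical: it accepts $a^n b^m$ ($n, m > 0$).

```python
import re
from itertools import product

def kleene_regex(n, delta, start, finals):
    """Build a Python regex for L(M) from the sets R^k_{i,j}.

    States are 1..n; delta is a set of (i, a, j); None stands for the empty set,
    '' for {e}. Returns None if L(M) is empty.
    """
    def union(x, y):
        if x is None or x == y:
            return y
        if y is None:
            return x
        return f"(?:{x}|{y})"
    def cat(*parts):                     # a product with the empty set is empty
        if any(p is None for p in parts):
            return None
        return "".join(f"(?:{p})" for p in parts if p != "")
    def star(x):                         # the empty set starred, and {e} starred, are {e}
        return "" if x is None or x == "" else f"(?:{x})*"

    states = range(1, n + 1)
    R = {}
    for i, j in product(states, states):                 # R^0_{i,j}
        syms = sorted(a for (p, a, q) in delta if (p, q) == (i, j))
        R[i, j] = "|".join(map(re.escape, syms)) if syms else None
        if i == j:
            R[i, j] = union(R[i, j], "")                 # add e
    for k in states:                                     # R^k computed from R^{k-1}
        R = {(i, j): union(R[i, j], cat(R[i, k], star(R[k, k]), R[k, j]))
             for i, j in product(states, states)}
    result = None
    for f in sorted(finals):                             # L(M) = union of R^n_{s,f}
        result = union(result, R[start, f])
    return result

delta = {(1, "a", 2), (2, "a", 2), (2, "b", 3), (3, "b", 3)}
regex = kleene_regex(3, delta, 1, {3})
words = ["".join(w) for k in range(6) for w in product("ab", repeat=k)]
matches = {w for w in words if re.fullmatch(regex, w)}
print(matches == {w for w in words if re.fullmatch("a+b+", w)})   # True
```

The dict comprehension builds all of $R^k$ from the previous $R^{k-1}$ before rebinding R, which mirrors the induction on k.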

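Now the theorem promised above:
Theorem: For every non-deterministic finite state automaton there is an equivalent deterministic finite state automaton.
We start with some notation for non-deterministic finite state automata:
$\delta[S, a]$: the set of states that you get to from S by a: $\delta[S, a] = \{S_1 : \langle S, a, S_1 \rangle \in \delta\}$. This is the set of states that $\delta$ maps S and a onto.
$\delta[S, \alpha]$: the set of states you get to from S by $\alpha$. Let $\alpha = a_1 \dots a_n$:
$\delta_1[S, \alpha] = \delta[S, a_1]$
$\delta_2[S, \alpha] = \{S_2 : \exists S_1 \in \delta_1[S, \alpha]: S_2 \in \delta[S_1, a_2]\}$
$\delta_i[S, \alpha] = \{S_i : \exists S_{i-1} \in \delta_{i-1}[S, \alpha]: S_i \in \delta[S_{i-1}, a_i]\}$
$\delta[S, \alpha] = \delta_n[S, \alpha]$ (and $\delta[S, e] = \{S\}$)
This is the set of states that M can reach from S along a computation path that reads $\alpha$.
Let M be a non-deterministic finite state automaton. We define a deterministic finite state automaton K:
$S_K = pow(S_M)$
$S_{0,K} = \{S_{0,M}\}$
$F_K = \{X \in S_K : \exists S \in X: S \in F_M\}$, the set of K-states that contain at least one final state of M.
$\delta_K[\{S_1, \dots, S_i\}, a] = \delta[S_1, a] \cup \dots \cup \delta[S_i, a]$
Claim: $\delta_K[\{S_0\}, \alpha] = \delta[S_0, \alpha]$
Step 1: $\delta_K[\{S_0\}, e] = \{S_0\} = \delta[S_0, e]$
Step 2: Assume that $\delta_K[\{S_0\}, \alpha] = \delta[S_0, \alpha]$. Then:
$\delta_K[\{S_0\}, \alpha a] = \{S : \exists S_1 \in \delta_K[\{S_0\}, \alpha]: S \in \delta[S_1, a]\}$
$= \{S : \exists S_1 \in \delta[S_0, \alpha]: S \in \delta[S_1, a]\}$ (by the induction hypothesis)
$= \delta[S_0, \alpha a]$
With this it follows that: $\delta_K[\{S_0\}, \alpha] \in F_K$ iff $\delta[S_0, \alpha] \in F_K$.
By definition of $F_K$: $\delta[S_0, \alpha] \in F_K$ iff $\exists S \in \delta[S_0, \alpha]: S \in F_M$.
Hence: $\delta_K[\{S_0\}, \alpha] \in F_K$ iff $\exists S \in \delta[S_0, \alpha]: S \in F_M$.
This means that K accepts $\alpha$ iff M accepts $\alpha$; hence K and M are equivalent.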
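The subset construction can be sketched as follows. One design choice differs from the notes: instead of taking all of pow($S_M$), the sketch only builds the subsets reachable from $\{S_0\}$. This does not change the accepted language. $\delta$ is assumed to be a set of triples. The test uses the non-deterministic automaton for $a^n \cup b^m \cup ab^k \cup ba^p$ from earlier in this section and compares the result against Python's re module.

```python
import re
from itertools import product

def determinize(delta, start, finals, alphabet):
    """Subset construction: the DFA states are frozensets of NFA states.

    delta is a set of (state, symbol, state). Only the subsets reachable
    from {S_0} are built; the other subsets do not affect L(K).
    """
    def step(X, a):                       # delta_K[X, a] = union of delta[S, a] for S in X
        return frozenset(q for (p, b, q) in delta if p in X and b == a)
    start_k = frozenset({start})
    dfa_delta, todo, seen = {}, [start_k], {start_k}
    while todo:
        X = todo.pop()
        for a in alphabet:
            Y = step(X, a)
            dfa_delta[X, a] = Y
            if Y not in seen:
                seen.add(Y)
                todo.append(Y)
    dfa_finals = {X for X in seen if X & finals}    # contains a final state of M
    return dfa_delta, start_k, dfa_finals

def dfa_accepts(delta, start, finals, string):
    state = start
    for symbol in string:
        state = delta[state, symbol]      # K is total; the empty set acts as garbage
    return state in finals

nfa = {("S0", "a", "S1"), ("S0", "b", "S1"), ("S1", "a", "S1"),
       ("S0", "a", "S2"), ("S0", "b", "S2"), ("S2", "b", "S2")}
K = determinize(nfa, "S0", {"S1", "S2"}, "ab")
words = ["".join(w) for k in range(6) for w in product("ab", repeat=k)]
print(all(dfa_accepts(*K, w) == bool(re.fullmatch("a+|b+|ab*|ba*", w))
          for w in words))   # True
```

For this automaton only five subsets turn out to be reachable: $\{S_0\}$, $\{S_1, S_2\}$, $\{S_1\}$, $\{S_2\}$ and $\emptyset$.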

THE PUMPING LEMMA FOR REGULAR LANGUAGES.

Let A be a regular language. A is accepted by a finite state automaton M with, say, n states. Let $\alpha \in A$, $\alpha = a_1 \dots a_m$, $m \geq n$. Assume that there is a path through M for $a_1 \dots a_m$ from $S_0$ to a final state F. Let's call the occurrences of states on that path $S_0 \dots S_m$.
Since $m \geq n$, the states $S_0 \dots S_m$ cannot all be distinct: they form at least n+1 occurrences of states, and there are only n states. This means that for some $j, k \leq n$: $S_j = S_k$ (let's assume $j < k$). In other words, $S_0 \dots S_m$ contains a loop.
Suppose the substring $a_{j+1} \dots a_k$ is the part of $a_1 \dots a_m$ accepted by going through this loop once. We know then that $1 \leq |a_{j+1} \dots a_k| \leq n$.
Now, instead of going through the loop from $S_j$ to $S_k$ in $S_0 \dots S_m$ and then on to $S_{k+1}$, we could have skipped the loop and gone on directly from $S_j$ to $S_{k+1}$. The resulting string, $\alpha$ with $a_{j+1} \dots a_k$ replaced by e, would also have been accepted. Hence it is also in language A.
Similarly, we could have gone through the loop twice and then gone on as before. The resulting string, $\alpha$ with $a_{j+1} \dots a_k$ replaced by $a_{j+1} \dots a_k a_{j+1} \dots a_k$, would also have been accepted. Hence it is also in language A.
Thus, if $a_1 \dots a_j a_{j+1} \dots a_k a_{k+1} \dots a_m \in A$, then $a_1 \dots a_j (a_{j+1} \dots a_k)^z a_{k+1} \dots a_m \in A$, for every $z \geq 0$.
Hence, for every sufficiently long string $a_1 \dots a_m \in A$, we can find a substring that can be 'pumped' through the loop, and the result is also in A. This is the pumping lemma.

Pumping lemma for regular languages: Let A be a regular language. There is a number n, called the pumping constant for A, with the following property. n is not greater than the number of states in the smallest automaton accepting A. For every string $\varphi \in A$ with $|\varphi| \geq n$, $\varphi$ can be written as the concatenation of three substrings $\alpha\beta\gamma$ such that:
1. $|\alpha\beta| \leq n$
2. $|\beta| > 0$
3. for every $i \geq 0$: $\alpha\beta^i\gamma \in A$.
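The argument above finds $\beta$ by following the run of the automaton until a state repeats. A sketch, assuming a total deterministic automaton given as a dict. It returns the split at the first repeated state, so $|\alpha\beta|$ is at most the number of states and $|\beta| > 0$. The example is the total automaton for $a^n b^m$ ($n, m > 0$) with garbage state G, which has 4 states.

```python
def pumping_split(delta, start, string):
    """Cut `string` at the first repeated state of a total DFA's run.

    Returns (alpha, beta, gamma), where beta is read on the loop S_j ... S_k,
    or None if no state repeats (possible only if the string is shorter
    than the number of states).
    """
    first_seen = {start: 0}
    state = start
    for pos, symbol in enumerate(string, start=1):
        state = delta[state, symbol]
        if state in first_seen:                 # S_j = S_k with j < k
            j, k = first_seen[state], pos
            return string[:j], string[j:k], string[k:]
        first_seen[state] = pos
    return None

def dfa_accepts(delta, start, finals, string):
    state = start
    for symbol in string:
        state = delta[state, symbol]
    return state in finals

delta = {("S0", "a"): "S1", ("S1", "a"): "S1", ("S1", "b"): "S2", ("S2", "b"): "S2",
         ("S0", "b"): "G", ("S2", "a"): "G", ("G", "a"): "G", ("G", "b"): "G"}
alpha, beta, gamma = pumping_split(delta, "S0", "aaabb")
print(alpha, beta, gamma)   # a a abb
print(all(dfa_accepts(delta, "S0", {"S2"}, alpha + beta * i + gamma) for i in range(5)))  # True
```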

Application: $a^m b^m$ is not a regular language.
Assume $a^m b^m$ is a regular language. Let n be the pumping constant for this language. Choose a number k such that $2k > n$, and consider the string $a^k b^k$ of length 2k: $a \dots a\, b \dots b$ (k a's followed by k b's).
According to the pumping lemma, we can write this string as $\alpha\beta\gamma$, where $\beta \neq e$, $|\alpha\beta| \leq n$ and $\alpha\beta^i\gamma \in a^m b^m$ for every $i \geq 0$. Try to divide this string.
- If $\beta$ consists only of a's, then pumping $\beta$ will make the numbers of a's and b's different, hence the result is not in $a^m b^m$.
- If $\beta$ consists only of b's, the same.
- If $\beta$ consists of a's and b's, it is of the form $a^u b^z$ ($u, z > 0$). So our string is $a \dots (a^u b^z) \dots b$. But then pumping $\beta$ once more gives $a \dots (a^u b^z)(a^u b^z) \dots b$. This string has the a's and b's mixed in the middle, hence it is not in $a^m b^m$.
Since these are the only three possibilities, we cannot divide this string in a way that satisfies the pumping lemma. This means that $a^m b^m$ does not satisfy the pumping lemma for regular languages, and hence $a^m b^m$ is not a regular language.
We will see shortly that it follows from this that English is not a regular language.
Note that the pumping lemma goes one way: if a language is regular, it satisfies the pumping lemma. But languages that satisfy the pumping lemma are not necessarily regular. Let $L_1$ and $L_2$ be languages such that $L_1 \cap L_2 = \emptyset$. Assume that $L_1$ is regular, but $L_2$ is not; say, $L_2$ is intractable. Let n be the pumping constant for $L_1$ and let $L_1^n$ be the set of $L_1$ strings of length larger than n. Look at $L_1^n \cdot L_2$. Obviously, $L_1^n \cdot L_2$ is intractable in the same way that $L_2$ is. But $L_1^n \cdot L_2$ satisfies the pumping lemma, because the strings in $L_1^n$ do, by the fact that $L_1$ is regular.
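For small n, the case analysis can be checked mechanically by trying every split the pumping lemma allows. This is a brute-force sketch; the names pumpable and in_anbn are hypothetical. It only tests a few exponents i, so it can refute pumpability, but it only gives evidence for it.

```python
def in_anbn(s):
    """Membership in a^m b^m (m >= 0, so e counts as a member here)."""
    m = len(s) // 2
    return s == "a" * m + "b" * m

def pumpable(s, n, member, exponents=range(4)):
    """Is there a split s = alpha beta gamma with |alpha beta| <= n and |beta| > 0
    such that alpha beta^i gamma is a member for every tested i?"""
    for k in range(1, min(n, len(s)) + 1):       # k = |alpha beta|
        for j in range(k):                        # j = |alpha|, so |beta| = k - j > 0
            alpha, beta, gamma = s[:j], s[j:k], s[k:]
            if all(member(alpha + beta * i + gamma) for i in exponents):
                return True
    return False

for n in range(1, 9):
    k = n // 2 + 1                                # 2k > n, as in the argument above
    assert not pumpable("a" * k + "b" * k, n, in_anbn)
print(pumpable("aaaa", 2, lambda s: set(s) <= {"a"}))   # True: a* can be pumped
```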

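CLOSURE PROPERTIES OF REGULAR LANGUAGES.

We already know that if A and B are regular, then so are $A^*$, $A \cup B$, and $A \cdot B$.

Theorem: Let L be a language in alphabet A ($L \subseteq A^*$). If L is regular, then $A^* - L$ is regular.

$A^* - L$ is the complement of L in $A^*$. This means that the class of regular languages is closed under complementation.

Proof: Let L be regular, and let M be a deterministic and total automaton accepting L. Make every final state in M non-final and every non-final state final. The resulting automaton accepts $A^* - L$. (Totality matters here. In a partial automaton, a string that gets stuck is rejected both before and after the swap.)

Example: $a^n b^m$ ($n, m > 0$):
[Figure: deterministic total automaton for $a^n b^m$ ($n, m > 0$), with states $S_0$, $S_1$, $S_2$ and G]
[Figure: the complement automaton, with the same states and final and non-final states swapped. It accepts any string, as long as it is not just a's followed by b's.]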
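Here is a minimal sketch of the complement construction. The figure is not reproduced here, so the transition table below for $a^n b^m$ is an assumption. Only the set of final states changes; the transitions stay the same.

```python
from typing import Dict, Set, Tuple

Delta = Dict[Tuple[str, str], str]


def accepts(delta: Delta, start: str, finals: Set[str], string: str) -> bool:
    """Run the automaton on `string` and report whether it ends in a final state."""
    state = start
    for symbol in string:
        state = delta[(state, symbol)]
    return state in finals


def complement_finals(states: Set[str], finals: Set[str]) -> Set[str]:
    """Swap final and non-final states; only correct for a deterministic total automaton."""
    return set(states) - set(finals)


# Hypothetical transition table for a^n b^m (n, m > 0).
STATES = {"S0", "S1", "S2", "G"}
DELTA = {
    ("S0", "a"): "S1", ("S0", "b"): "G",
    ("S1", "a"): "S1", ("S1", "b"): "S2",
    ("S2", "a"): "G", ("S2", "b"): "S2",
    ("G", "a"): "G", ("G", "b"): "G",
}
FINALS = {"S2"}
CO_FINALS = complement_finals(STATES, FINALS)  # {"S0", "S1", "G"}

# Every string is accepted by exactly one of the two automata.
for w in ["", "a", "b", "ab", "ba", "aabbb", "aba"]:
    assert accepts(DELTA, "S0", CO_FINALS, w) != accepts(DELTA, "S0", FINALS, w)
```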

Corollary: If A and B are regular languages, then $A \cap B$ is a regular language.

Proof: Let A and B be regular languages in alphabet Σ (which can be taken to be just the union of the symbols occurring in A and the symbols occurring in B). Then
$A \cap B = \Sigma^* - ((\Sigma^* - A) \cup (\Sigma^* - B))$.
That is, the operation of intersection can be defined as a sequence of compositions of the operations of complementation and union. We have proved the latter operations to be regular, and sequences of compositions of regular operations are regular. Hence intersection is regular.

Making an intersection automaton is a lot of work, though:
- Start with a deterministic total automaton for A and a deterministic total automaton for B.
- For both of them, take the complement automaton (i.e. switch final and non-final states).
- For the resulting two automata, $M_1$ and $M_2$, form the union automaton. This goes the same way as we did for right linear grammars. Make the states of the two automata disjoint and add a new initial state. For every arrow leaving $M_1$'s old initial state to some state $S_i$, add a similar arrow from the new initial state to $S_i$, and do the same for any such arrow leaving $M_2$'s old initial state. Make the new initial state a final state if one of the old initial states was final.
- Next, convert this automaton to a deterministic automaton (since the union procedure tends to give you a non-deterministic automaton).
- Finally, take the complement automaton of the result. This will be an automaton for the intersection.

I will give a simple construction of an intersection automaton later in this course. These results mean that the set of regular languages in a certain alphabet forms a Boolean algebra.

Let A and B be alphabets. A homomorphism from $A^*$ into $B^*$ is a function that maps strings in $A^*$ onto strings in $B^*$, where the value for a complex string in $A^*$ is completely determined by the values for the symbols of A. Formally, a homomorphism from A into B is a function $h: A^* \to B^*$ such that:
1. $h(e) = e$
2. for every string in $A^*$ of the form $\alpha a$, with $\alpha \in A^*$ and $a \in A$: $h(\alpha a) = h(\alpha)h(a)$.
So the value of h on a string is the concatenation of the values of h on its symbols; for instance, $h(abb) = h(ab)h(b) = h(a)h(b)h(b)$.
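For comparison, here is a hedged Python sketch that applies the De Morgan identity from the corollary literally to automata. It makes one swap: instead of the union-then-determinize procedure described above, it forms the union with a product construction. That construction yields a deterministic total automaton directly, so no determinization step is needed. The `DFA` class and the two example languages are made up for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, FrozenSet, Hashable, Tuple


@dataclass
class DFA:
    """A deterministic total automaton over `alphabet`."""
    states: FrozenSet[Hashable]
    alphabet: FrozenSet[str]
    delta: Dict[Tuple[Hashable, str], Hashable]
    start: Hashable
    finals: FrozenSet[Hashable]

    def accepts(self, string: str) -> bool:
        state = self.start
        for symbol in string:
            state = self.delta[(state, symbol)]
        return state in self.finals


def complement(m: DFA) -> DFA:
    """Switch final and non-final states."""
    return DFA(m.states, m.alphabet, m.delta, m.start, m.states - m.finals)


def union(m1: DFA, m2: DFA) -> DFA:
    """Product construction (assumes a shared alphabet): run both automata in
    parallel and accept if either one would accept."""
    states = frozenset(product(m1.states, m2.states))
    delta = {((p, q), a): (m1.delta[(p, a)], m2.delta[(q, a)])
             for (p, q) in states for a in m1.alphabet}
    finals = frozenset(s for s in states if s[0] in m1.finals or s[1] in m2.finals)
    return DFA(states, m1.alphabet, delta, (m1.start, m2.start), finals)


def intersection(m1: DFA, m2: DFA) -> DFA:
    """A intersect B = Sigma* - ((Sigma* - A) union (Sigma* - B)), on automata."""
    return complement(union(complement(m1), complement(m2)))


SIGMA = frozenset("ab")
# Hypothetical example languages: an even number of a's, and strings ending in b.
EVEN_A = DFA(frozenset({0, 1}), SIGMA,
             {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, frozenset({0}))
ENDS_B = DFA(frozenset({"n", "y"}), SIGMA,
             {("n", "a"): "n", ("y", "a"): "n", ("n", "b"): "y", ("y", "b"): "y"},
             "n", frozenset({"y"}))
BOTH = intersection(EVEN_A, ENDS_B)
```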
Let L be a language in alphabet A, and let $h: A^* \to B^*$ be a homomorphism. Then the homomorphic image of L under h, h(L), is given by: $h(L) = \{h(\alpha) : \alpha \in L\}$.

Theorem: If L is a regular language in alphabet A and $h: A^* \to B^*$ is a homomorphism, then h(L) is a regular language in alphabet B. (Proof below.)
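The two clauses of the definition amount to concatenating the images of the symbols. Below is a small Python sketch that also computes the image of a finite sample of a language. The helper names are illustrative, and so is the choice of h, which keeps a and b and erases c.

```python
from typing import Dict, Iterable, Set


def hom(h: Dict[str, str], alpha: str) -> str:
    """Extend h from symbols to strings: h(e) = e and h(alpha a) = h(alpha) h(a)."""
    result = ""
    for a in alpha:
        result = result + h[a]
    return result


def image(h: Dict[str, str], language: Iterable[str]) -> Set[str]:
    """h(L) = {h(alpha) : alpha in L}, computed for a finite sample of L."""
    return {hom(h, alpha) for alpha in language}


# Illustrative choice: keep a and b, erase c.
H = {"a": "a", "b": "b", "c": ""}
SAMPLE = {"a" * n + "c" + "b" * n for n in range(5)}
IMAGE = image(H, SAMPLE)  # the strings a^n b^n for n = 0..4
```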

Example: Let A = {a, b}. Let B = {a, b, c}. Let $h: A^* \to B^*$ be a homomorphism such that h(e) = e, h(a) = w for a fixed string w over {a, b}, and h(b) = cc. We know that $a^n b^m$ ($n, m > 0$) is a regular language. $h(a^n b^m \ (n, m > 0)) = w^n (cc)^m$ ($n, m > 0$). It follows that $w^n (cc)^m$ ($n, m > 0$) is also a regular language.

Example: Let A = {a, b, c}. Let $h: A^* \to A^*$ be a homomorphism such that h(e) = e, h(a) = a, h(b) = b, h(c) = e. We have proved that $a^n b^n$ ($n \geq 0$) is not a regular language. We look at $a^n c b^n$ ($n \geq 0$). $h(a^n c b^n \ (n \geq 0)) = a^n b^n$ ($n \geq 0$). Consequently, if $a^n c b^n$ ($n \geq 0$) were regular, $a^n b^n$ ($n \geq 0$) would be regular. But $a^n b^n$ ($n \geq 0$) is not regular. Hence $a^n c b^n$ ($n \geq 0$) is not regular.

Let $h: A^* \to B^*$ be a homomorphism. For each string $\beta \in B^*$, we define: $h^{-1}(\beta) = \{\alpha \in A^* : h(\alpha) = \beta\}$. We call $h^{-1}$ the inverse homomorphism of h. (Note: $h^{-1}$ is not a function from $B^*$ into $A^*$, but from $B^*$ into $pow(A^*)$.) For $L \subseteq B^*$ we define: $h^{-1}(L) = \{\alpha \in A^* : h(\alpha) \in L\}$.

Theorem: If L is a regular language in alphabet B and $h: A^* \to B^*$ is a homomorphism, then $h^{-1}(L)$ is a regular language in alphabet A.

Proof: Let $h: A^* \to B^*$ be a homomorphism and L a regular language in $B^*$. Let M be a deterministic total finite state automaton for L. We simplify the notation introduced for non-deterministic finite state automata. Let $\beta = b_1 \dots b_n \in B^*$:
$\delta_1[S, \beta] = \delta[S, b_1]$
$\delta_2[S, \beta] = \delta[\delta_1[S, \beta], b_2]$
$\delta_i[S, \beta] = \delta[\delta_{i-1}[S, \beta], b_i]$
$\delta[S, \beta] = \delta_n[S, \beta]$
(and for the empty string, $\delta[S, e] = S$).

We turn M into M', a deterministic total finite state automaton on $A^*$, by defining $\delta'[S, a] = \delta[S, h(a)]$ for every state S and every $a \in A$.

Claim: M' accepts α iff M accepts h(α). We show by induction on the length of α that after reading α, M' is in the same state as M after reading h(α). The claim follows, since M and M' have the same final states. (For α = e this holds trivially, since h(e) = e.)
Case 1: M' accepts a iff M accepts h(a). By the construction, after a, M' gets to the same state (final or non-final) that M gets to after h(a).
Case 2: Assume that M' is in the same state after α as M after h(α), so M' accepts α iff M accepts h(α). Then M' accepts αa iff M accepts $h(\alpha)h(a) = h(\alpha a)$. Again, this is obvious from the construction.

Let A and B be alphabets. A substitution from A into B is a function that maps every string of $A^*$ onto a set of strings in $B^*$. This set is determined by the values of the symbols in the following way: the strings in the set associated with a complex string α are obtained by substituting all values for the symbols in α at the places where they occur. Formally, a substitution from A into B is a function $s: A^* \to pow(B^*)$ such that:
1. $s(e) = \{e\}$
2. for any string of the form $\alpha a$, with $\alpha \in A^*$ and $a \in A$: $s(\alpha a) = s(\alpha) \cdot s(a)$.
If L is a language in alphabet A and s a substitution from A into B, then the substitution language of L relative to s, s(L), is given by: $s(L) = \bigcup_{\alpha \in L} s(\alpha)$.

[Figure (idea): each symbol of a string in L is replaced by one of the members of the set that s assigns to it; every string obtained this way is in s(L).]

Example: $a b^n c d^m$ ($n, m > 0$) is a regular language in A = {a, b, c, d}. It consists of a, followed by as many b's as you want, followed by c, followed by as many d's as you want. We take alphabet B = {John, Bill, Mary, and, walk, talk, sing}, and we take s, a substitution from A into B given by:
s(a) = {John, Bill, Mary}
s(b) = {and John, and Bill, and Mary}
s(c) = {walk, talk, sing}
s(d) = {and walk, and talk, and sing}
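The construction of M' can be sketched in Python as follows. The automaton M, the alphabet {p, q, r} and the homomorphism h are all hypothetical. `delta_hat` plays the role of the extended transition function δ[S, β], including δ[S, e] = S for erased symbols. The check at the end tests the claim on all short strings; it does not prove it.

```python
from itertools import product
from typing import Dict, Set, Tuple

Delta = Dict[Tuple[str, str], str]


def delta_hat(delta: Delta, state: str, beta: str) -> str:
    """Extended transition function: delta[S, e] = S, delta[S, b1...bn] = delta_n[S, beta]."""
    for b in beta:
        state = delta[(state, b)]
    return state


def inverse_hom_delta(delta: Delta, states: Set[str], alphabet_a: Set[str],
                      h: Dict[str, str]) -> Delta:
    """Transition table of M': delta'[S, a] = delta[S, h(a)]."""
    return {(s, a): delta_hat(delta, s, h[a]) for s in states for a in alphabet_a}


# M: hypothetical automaton for a^n b^m (n, m > 0) over B = {a, b}.
STATES = {"S0", "S1", "S2", "G"}
DELTA = {
    ("S0", "a"): "S1", ("S0", "b"): "G",
    ("S1", "a"): "S1", ("S1", "b"): "S2",
    ("S2", "a"): "G", ("S2", "b"): "S2",
    ("G", "a"): "G", ("G", "b"): "G",
}
FINALS = {"S2"}
# Hypothetical homomorphism from A = {p, q, r} into B*; r is erased.
H = {"p": "aa", "q": "b", "r": ""}
DELTA_PRIME = inverse_hom_delta(DELTA, STATES, set(H), H)


def claim_holds(max_len: int) -> bool:
    """Check 'M' accepts alpha iff M accepts h(alpha)' for all alpha up to max_len."""
    for length in range(max_len + 1):
        for symbols in product(sorted(H), repeat=length):
            alpha = "".join(symbols)
            h_alpha = "".join(H[a] for a in alpha)
            in_m_prime = delta_hat(DELTA_PRIME, "S0", alpha) in FINALS
            in_m = delta_hat(DELTA, "S0", h_alpha) in FINALS
            if in_m_prime != in_m:
                return False
    return True
```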

$s(abbcdd)$ = {John, Bill, Mary} · {and John, and Bill, and Mary} · {and John, and Bill, and Mary} · {walk, talk, sing} · {and walk, and talk, and sing} · {and walk, and talk, and sing}.

So one of the strings in the substitution language of abbcdd is: John and Bill and Mary walk and talk and sing. Another one is: Mary and Mary and Mary talk and talk and talk.

$s(a b^n c d^m \ (n, m > 0))$ = {John, Bill, Mary} · {and John, and Bill, and Mary}$^+$ · {walk, talk, sing} · {and walk, and talk, and sing}$^+$.

This language contains any string that starts with John, Bill or Mary and continues with one or more occurrences of strings in {and John, and Bill, and Mary}. The string then continues with one of the items walk, talk or sing, and ends with one or more of the items in {and walk, and talk, and sing}.

Theorem: If L is a regular language in alphabet A and $s: A^* \to pow(B^*)$ is a substitution that maps every symbol in A onto a regular subset of $B^*$, then s(L) is a regular language in alphabet B.

Proof: Let L be a regular language in A. This means that L is derived with ∪, · and * from finite languages $L_1, \dots, L_n$. We proceed by induction on this derivation.
1. If $a \in A$, then s(a) is a regular language. If $\alpha \in A^*$, $\alpha = a_1 \dots a_n$, then $s(\alpha) = s(a_1) \cdot \ldots \cdot s(a_n)$. Since $s(a_i)$ is regular for each $i \leq n$, s(α) is regular. If L is finite, then $L = \{\alpha_1, \dots, \alpha_n\}$ and $s(L) = s(\{\alpha_1, \dots, \alpha_n\}) = s(\{\alpha_1\}) \cup \dots \cup s(\{\alpha_n\})$. Since $s(\{\alpha_i\})$ is regular for each $i \leq n$, $s(\{\alpha_1\}) \cup \dots \cup s(\{\alpha_n\})$ is regular. Hence s(L) is regular.
2. Assume $L = L_1 \cup L_2$, with $s(L_1)$ and $s(L_2)$ regular. $s(L_1 \cup L_2) = s(L_1) \cup s(L_2)$, hence $s(L_1 \cup L_2)$ is also regular, so s(L) is regular.
3. Assume $L = L_1 \cdot L_2$, with $s(L_1)$ and $s(L_2)$ regular. $s(L_1 \cdot L_2) = s(L_1) \cdot s(L_2)$, hence $s(L_1 \cdot L_2)$ is also regular, so s(L) is regular.
4. Assume $L = L_1^*$, with $s(L_1)$ regular. $s(L_1^*) = s(L_1)^*$, hence $s(L_1^*)$ is also regular, so s(L) is regular.
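The substitution example can be enumerated directly. In this sketch, `substitute` picks one value from s(x) for each symbol x and concatenates the chosen values. One assumption: the values are word strings, so concatenation inserts a space between them.

```python
from itertools import product
from typing import Dict, List, Set


def substitute(s: Dict[str, List[str]], alpha: str) -> Set[str]:
    """s(e) = {e}; s(alpha a) = s(alpha) . s(a): choose one value per symbol and concatenate.

    Values are word strings, so the chosen values are joined with spaces.
    """
    return {" ".join(choice) for choice in product(*(s[a] for a in alpha))}


S = {
    "a": ["John", "Bill", "Mary"],
    "b": ["and John", "and Bill", "and Mary"],
    "c": ["walk", "talk", "sing"],
    "d": ["and walk", "and talk", "and sing"],
}
RESULT = substitute(S, "abbcdd")
```

The set s(abbcdd) has $3^6 = 729$ members, one for each choice of values.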

Corollary: If L is a regular language in alphabet A and h is a homomorphism from $A^*$ into $B^*$, then h(L) is a regular language.

Proof: Instead of h(α) = β, we can write h(α) = {β} without losing any information. This means that homomorphisms are a special case of substitutions, whose values are singleton sets. Singleton sets are finite, hence regular, so the above theorem applies to them. Thanks to the theorem, we don't have to prove separately that languages like the substitution language above are regular; that follows from the theorem.

We see that the notions of homomorphism, inverse homomorphism and substitution let us extend our theorems from languages that look like toy languages (with little a's and b's) to languages that look suspiciously like natural languages. We use two facts to prove that the natural language English is not a regular language: the intersection of regular languages is regular, and regular languages are closed under homomorphisms.

Theorem: English is not a regular language.

Proof: Let the set of grammatical strings of English be E. I specify a sequence of strings $\alpha_1, \alpha_2, \dots$:
$\alpha_1$ = the fact that Fred was clever was surprising
$\alpha_2$ = the fact that the fact that Fred was clever was surprising was surprising
$\alpha_3$ = the fact that the fact that the fact that Fred was clever was surprising was surprising was surprising
...
and we set: $L = \{\alpha_1, \alpha_2, \alpha_3, \dots\}$. We are dealing with the following structure:
[IP [DP [D the] [NP [N fact] [CP [C that] [IP Fred was clever]]]] [I' [I was] [PRED surprising]]]

The idea is, of course, that we expand the tree by looping the bit between the IP nodes:
[IP [DP [D the] [NP [N fact] [CP [C that] [IP [DP [D the] [NP [N fact] [CP [C that] [IP Fred was clever]]]] [I' [I was] [PRED surprising]]]]]] [I' [I was] [PRED surprising]]]

And L is: $L = \{(\text{the fact that})^n \ \text{Fred was clever} \ (\text{was surprising})^n : n > 0\}$.

Choose the following homomorphism, treating each of these phrases as a single symbol: h(the fact that) = a, h(Fred was clever) = c, h(was surprising) = b. Then the homomorphic image of L is $a^n c b^n$ ($n > 0$), which we showed not to be regular. Since the homomorphic image of a regular language is regular, L is not a regular language.

Now we look at a wider language, L':
$L' = \{(\text{the fact that})^n \ \text{Fred was clever} \ (\text{was surprising})^m : n, m > 0\}$.
We choose a homomorphism such that h(a) = the fact that, h(c) = Fred was clever, h(b) = was surprising. L' is the homomorphic image of $a^n c b^m$ ($n, m > 0$), which we showed earlier to be regular. Hence L' is regular.
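The homomorphism used for L can be checked on the first few $\alpha_n$. Like the homomorphism above, this sketch treats each of the three phrases as a single symbol. The helper names `alpha_n` and `h` are hypothetical.

```python
from typing import List

# Phrases treated as single symbols (an assumption, as in the text's homomorphism).
H = {"the fact that": "a", "Fred was clever": "c", "was surprising": "b"}


def alpha_n(n: int) -> List[str]:
    """alpha_n as a list of phrase units: (the fact that)^n Fred was clever (was surprising)^n."""
    return ["the fact that"] * n + ["Fred was clever"] + ["was surprising"] * n


def h(units: List[str]) -> str:
    """Map a sequence of phrase units to its image over {a, b, c}."""
    return "".join(H[u] for u in units)


for n in range(1, 4):
    print(" ".join(alpha_n(n)), "->", h(alpha_n(n)))
```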

Now we observe an empirical fact about English:

Empirical Fact: $E \cap L' = L$.

The only strings of L' that are grammatical in English are the sentences in L. So we have three facts:
1. L is not regular.
2. L' is regular.
3. $L = L' \cap E$.

Suppose English were a regular language. Then both E and L' would be regular. The intersection of two regular languages is regular, hence L would be regular. But L is not regular. So the assumption is false: since L' is regular, E cannot be regular. This completes the proof.

Other example: The man who$^n$ sincerely$^n$ believes that he is crazy, is indeed crazy ($n \geq 0$); a = the man who, b = believes that he is crazy