Lecture 09: Myhill-Nerode Theorem

CS 373: Theory of Computation
Madhusudan Parthasarathy
16 February 2010

In this lecture, we will see that every regular language has a unique minimal DFA. We will see this fact from two perspectives. First, we will see a practical algorithm for minimizing a DFA, and then we will provide a theoretical analysis of the situation.

1 On the number of states of a DFA

1.1 Starting a DFA from different states

Consider the DFA on the right. It has a particular defined start state. However, we could start it from any of its states. If the original DFA was named $M$, define $M_q$ to be the DFA with its start state changed to state $q$. Then the language $L_q$ is the one accepted if you start at $q$, that is, $L_q = L(M_q)$.

[Figure: a DFA with states 1 through 7.]

For example, in this picture, $L_3$ is $(a+b)^*$, and $L_6$ is the same. Also, $L_2$ and $L_5$ are both $b^*a(a+b)^*$. Finally, $L_7$ is $\emptyset$.

Suppose that $L_q = L_r$ for two states $q$ and $r$. Then once we get to $q$ or $r$, the DFA is going to do the same thing from then on (i.e., it is going to accept or reject exactly the same strings). So these two states can be merged. In particular, in the above automaton, we can merge the states 2 and 5, and the states 3 and 6. We get the new automaton, depicted on the right.

[Figure: the merged DFA, with states 1, 2/5, 3/6, 4, and 7.]

1.2 Suffix languages

Let $\Sigma$ be some alphabet.

Definition 1.1. Let $L \subseteq \Sigma^*$ be any language. The suffix language of $L$ with respect to a word $x \in \Sigma^*$ is defined as

    $L/x = \{\, y \mid xy \in L \,\}$.

In words, $L/x$ is the language made out of all the words $y$ such that if we attach $x$ to them as a prefix, we get a word in $L$. The class of suffix languages of $L$ is

    $C(L) = \{\, L/x \mid x \in \Sigma^* \,\}$.

Example 1.2. For example, if $L = 0^*1^*$, then:

    $L/\varepsilon = 0^*1^* = L$
    $L/0 = 0^*1^* = L$
    $L/0^i = 0^*1^* = L$, for any $i \in \mathbb{N}$
    $L/1 = 1^*$
    $L/1^i = 1^*$, for any $i \geq 1$
    $L/10 = \{\, y \mid 10y \in L \,\} = \emptyset$

Hence there are only three suffix languages for $L$: $0^*1^*$, $1^*$, and $\emptyset$. So $C(L) = \{ 0^*1^*,\ 1^*,\ \emptyset \}$.

As the above example demonstrates, if there is a word $x$ such that no word $w$ that has $x$ as a prefix is in $L$, then $L/x = \emptyset$, which implies that $\emptyset$ is one of the suffix languages of $L$.

Example 1.3. The above suggests the following automaton for the language of Example 1.2, $L = 0^*1^*$.

[Figure: a DFA for $0^*1^*$ whose states correspond to the suffix languages above.]

And clearly, this is the automaton with the smallest number of states that accepts this language.

1.2.1 Regular languages have few suffix languages

Now, consider a DFA $M = (Q, \Sigma, \delta, q_0, F)$ accepting some language $L$. Let $x \in \Sigma^*$, and let $M$ reach the state $q$ on reading $x$. The suffix language $L/x$ is precisely the set of strings $w$ such that $xw$ is in $L$. But this is exactly the same as $L_q$. That is, $L/x = L_q$, where $q$ is the state reached by $M$ on reading $x$. Hence the suffix languages of a regular language accepted by a DFA are precisely those languages $L_q$, where $q \in Q$ is a state reachable from the start state.

Notice that the definition of suffix languages is more general, because it can also be applied to non-regular languages.
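The identity $L/x = L_q$ can be checked mechanically on Example 1.2. The following is a minimal sketch, assuming a three-state DFA for $0^*1^*$ with one state per suffix language; the state names `q0`, `q1`, `dead` and all function names are hypothetical labels introduced here, not taken from the lost figure.

```python
from itertools import product

# Assumed DFA for L = 0*1*: q0 ~ 0*1*, q1 ~ 1*, dead ~ the empty language.
DELTA = {
    ("q0", "0"): "q0",     ("q0", "1"): "q1",
    ("q1", "0"): "dead",   ("q1", "1"): "q1",
    ("dead", "0"): "dead", ("dead", "1"): "dead",
}
START = "q0"
ACCEPTING = {"q0", "q1"}


def run(state, word):
    """Return the state reached from `state` after reading `word`."""
    for c in word:
        state = DELTA[(state, c)]
    return state


def accepts_from(state, word):
    """Membership of `word` in L_state (the DFA started at `state`)."""
    return run(state, word) in ACCEPTING


def in_suffix_language(x, w):
    """Membership of w in L/x, straight from the definition: xw in L."""
    return accepts_from(START, x + w)


def agree(x, max_len=5):
    """Check L/x == L_q on all words of length <= max_len, q = run(START, x)."""
    q = run(START, x)
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            if in_suffix_language(x, w) != accepts_from(q, w):
                return False
    return True
```

For instance, `run(START, "10")` is `dead`, matching $L/10 = \emptyset$. The check only compares words up to a bounded length, so it illustrates the identity rather than proving it.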

Lemma 1.4. For a regular language $L$, the number of different suffix languages it has is bounded; that is, $|C(L)|$ is bounded by a constant (that depends on $L$).

Proof: Consider a DFA $M = (Q, \Sigma, \delta, q_0, F)$ that accepts $L$. For any string $x$, let $q$ be the state $M$ is in after reading $x$. The suffix language $L/x$ is the set of strings $w$ such that $xw \in L$. Since the DFA reaches $q$ on $x$, the suffix language of $x$ is precisely the language accepted by $M$ starting from the state $q$, which is $L_q$. Hence, for every $x \in \Sigma^*$, $L/x = L_{\delta(q_0, x)}$, where $\delta(q_0, x)$ is the state the automaton reaches on $x$. As such, any suffix language of $L$ is realizable as the language of a state of $M$. Since the number of states of $M$ is some constant $k$, it follows that the number of suffix languages of $L$ is bounded by $k$.

An immediate implication of the above lemma is the following.

Lemma 1.5. If a language $L$ has an infinite number of suffix languages, then $L$ is not regular.

1.2.2 The suffix languages of a non-regular language

Consider the language $L = \{\, a^n b^n \mid n \in \mathbb{N} \,\}$. The suffix language of $L$ for $a^i$ is

    $L/a^i = \{\, a^{n-i} b^n \mid n \in \mathbb{N},\ n \geq i \,\}$.

Note that $b^i \in L/a^i$, but this is the only string made out of only $b$'s that is in this language. As such, for any $i, j$, where $i$ and $j$ are different, the suffix language of $L$ with respect to $a^i$ is different from that of $L$ with respect to $a^j$ (i.e., $L/a^i \neq L/a^j$). Hence $L$ has infinitely many suffix languages, and hence is not regular, by Lemma 1.5.

Let us summarize what we have seen so far:

- Any state of a DFA of a language $L$ is associated with a suffix language of $L$.
- If two states are associated with the same suffix language, then we can merge them into a single state.
- At least one non-regular language, $\{\, a^n b^n \mid n \in \mathbb{N} \,\}$, has an infinite number of suffix languages.

It is thus natural to conjecture that the number of suffix languages of a language is a good indicator of how many states an automaton for this language would require. And this is indeed true, as the following section testifies.
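The argument for $\{a^n b^n\}$ rests on one concrete witness: the word $b^i$ lies in $L/a^i$ but not in $L/a^j$ for $j \neq i$. A small sketch of that check follows; the function names are hypothetical, and membership in $L/x$ is tested directly from the definition.

```python
def in_L(w):
    """Membership in L = { a^n b^n : n >= 0 }."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * n + "b" * n


def in_suffix_language(x, w):
    """Membership of w in L/x, i.e. whether xw is in L."""
    return in_L(x + w)


def separates(i, j):
    """True if b^i is in L/a^i but not in L/a^j."""
    w = "b" * i
    return in_suffix_language("a" * i, w) and not in_suffix_language("a" * j, w)


def pairwise_distinct(limit):
    """Check L/a^i != L/a^j for all distinct i, j below `limit`."""
    return all(
        separates(i, j)
        for i in range(limit)
        for j in range(limit)
        if i != j
    )
```

Any finite `limit` only confirms finitely many of the inequalities; the infinitely many distinct suffix languages come from the fact that the same witness works for every pair $i \neq j$.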

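2 Regular Languages and Suffix Languages

2.1 A few easy observations

Lemma 2.1. $\varepsilon \in L/x$ if and only if $x \in L$.

Proof: By definition, if $\varepsilon \in L/x$ then $x = x\varepsilon \in L$. Similarly, if $x \in L$, then $x\varepsilon \in L$, which implies that $\varepsilon \in L/x$.

Lemma 2.2. Let $L$ be a language over an alphabet $\Sigma$. For all $x, y \in \Sigma^*$, if $L/x = L/y$ then for all $a \in \Sigma$ we have $L/xa = L/ya$.

Proof: If $w \in L/xa$, then (by definition) $xaw \in L$. But then, $aw \in L/x$. Since $L/x = L/y$, this implies that $aw \in L/y$, which implies that $yaw \in L$, which implies that $w \in L/ya$. This implies that $L/xa \subseteq L/ya$; a symmetric argument implies that $L/ya \subseteq L/xa$. We conclude that $L/xa = L/ya$.

2.2 Regular languages and suffix languages

We can now state a characterization of regular languages in terms of suffix languages.

Theorem 2.3 (Myhill-Nerode theorem). A language $L \subseteq \Sigma^*$ is regular if and only if the number of suffix languages of $L$ is finite (i.e., $C(L)$ is finite). Moreover, if $C(L)$ contains exactly $k$ languages, we can build a DFA for $L$ that has $k$ states; also, any DFA accepting $L$ must have at least $k$ states.

Proof: If $L$ is regular, then $C(L)$ is a finite set by Lemma 1.4.

Second, let us show that if $C(L)$ is finite, then $L$ is regular. Let the suffix languages of $L$ be

    $C(L) = \{\, L/x_1, L/x_2, \ldots, L/x_k \,\}$.    (1)

Note that for any $y \in \Sigma^*$, $L/y = L/x_j$ for some $j \in \{1, \ldots, k\}$. We will construct a DFA whose states are the various suffix languages of $L$; hence we will have $k$ states in the DFA. Moreover, the DFA will be designed such that after reading $y$, the DFA will end up in the state $L/y$.

The DFA is $M = (Q, \Sigma, \delta, q_0, F)$, where

    $Q = \{\, L/x_1, L/x_2, \ldots, L/x_k \,\}$,
    $q_0 = L/\varepsilon$,
    $F = \{\, L/x \mid \varepsilon \in L/x \,\}$ (note that, by Lemma 2.1, $\varepsilon \in L/x$ if and only if $x \in L$),
    $\delta(L/x, a) = L/xa$ for every $a \in \Sigma$.

The transition function $\delta$ is well-defined because of Lemma 2.2: the target state does not depend on which word $x$ is chosen to represent the state $L/x$.

We can now prove, by induction on the length of $x$, that after reading $x$, the DFA reaches the state $L/x$: the start state is $L/\varepsilon$, and each letter $a$ moves the DFA from $L/x$ to $L/xa$. If $x \in L$, then $\varepsilon \in L/x$, which implies that $\delta(q_0, x) = L/x \in F$. Thus,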

$x \in L(M)$. Similarly, if $x \in L(M)$, then $L/x \in F$, which implies that $\varepsilon \in L/x$, and by Lemma 2.1 this implies that $x \in L$. As such, $L(M) = L$.

We have shown that the DFA $M$ accepts $L$, which implies that $L$ is regular; furthermore, $M$ has $k$ states.

We next prove that any DFA for $L$ must have at least $k$ states. So, let $N = (Q_N, \Sigma, \delta_N, q_{\mathrm{init}}, F_N)$ be any DFA accepting $L$. The language $L$ has $k$ suffix languages, generated by the strings $x_1, x_2, \ldots, x_k$; see Eq. (1). For any $i \neq j$, we have that $L/x_i \neq L/x_j$. As such, there must exist a word $w$ such that $w \in L/x_i$ and $w \notin L/x_j$ (the symmetric case, where $w \in L/x_j \setminus L/x_i$, is handled in a similar fashion). But then, $x_i w \in L$ and $x_j w \notin L$. Namely, $N$ accepts $x_i w$ and rejects $x_j w$, so $\delta_N(q_{\mathrm{init}}, x_i w) \neq \delta_N(q_{\mathrm{init}}, x_j w)$, and the two states that $N$ reaches for $x_i$ and $x_j$, respectively, are distinguishable. Formally, let $q_i = \delta_N(q_{\mathrm{init}}, x_i)$, for $i = 1, \ldots, k$. All these states are pairwise distinguishable, and hence pairwise distinct, which implies that $N$ must have at least $k$ states.

Remark 2.4. The full Myhill-Nerode theorem also shows that all minimal DFAs for $L$ are isomorphic, i.e., they have identical transitions as well as the same number of states, but we will not show that part. This is done by arguing that any DFA for $L$ that has $k$ states must be identical to the DFA we created above. This is a bit more involved notationally, and is proved by showing a 1-1 correspondence between the two DFAs and arguing they must be connected the same way. We omit this part of the theorem and proof.

2.3 Examples

Let us explain the theorem we just proved using an example. Consider the language $L \subseteq \{a, b\}^*$:

    $L = \{\, w \mid w \text{ has an odd number of } a\text{'s} \,\}$.

The suffix language of $x \in \Sigma^*$, where $x$ has an even number of $a$'s, is

    $L/x = \{\, w \mid w \text{ has an odd number of } a\text{'s} \,\} = L$.

The suffix language of $x \in \Sigma^*$, where $x$ has an odd number of $a$'s, is

    $L/x = \{\, w \mid w \text{ has an even number of } a\text{'s} \,\}$.

Hence there are only two distinct suffix languages for $L$. By the theorem, we know $L$ must be regular and the minimal DFA for $L$ has two states. Going with the construction of the DFA mentioned in the proof of the theorem, we see that we have two states, $q_0 = L/\varepsilon$ and $q_1 = L/a$. The transitions are as follows:

- From $q_0 = L/\varepsilon$, on $a$ we go to $L/a$, which is the state $q_1$.
- From $q_0 = L/\varepsilon$, on $b$ we go to $L/b$, which is the same as $L/\varepsilon$, i.e., the state $q_0$.
- From $q_1 = L/a$, on $a$ we go to $L/aa$, which is the same as $L/\varepsilon$, i.e., the state $q_0$.
- From $q_1 = L/a$, on $b$ we go to $L/ab$, which is the same as $L/a$, i.e., the state $q_1$.

The initial state is $L/\varepsilon$, which is the state $q_0$, and the final states are those states $L/x$ that have $\varepsilon$ in them, which is the set $\{q_1\}$.

We hence have a DFA for $L$, and in fact this is the minimal automaton accepting $L$.
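The construction from the proof of Theorem 2.3 can be imitated in code for this example. The sketch below is an approximation, not the construction itself: a suffix language $L/x$ is infinite, so it is replaced by a finite "signature" (the members of $L/x$ of length at most `max_len`). The bounds `max_prefix` and `max_len` and all names are assumptions chosen for illustration; they happen to suffice for this language but would not in general.

```python
from itertools import product

ALPHABET = "ab"


def in_L(w):
    """L = words over {a, b} with an odd number of a's."""
    return w.count("a") % 2 == 1


def words(max_len):
    """All words over ALPHABET of length at most max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(ALPHABET, repeat=n):
            yield "".join(letters)


def signature(x, max_len=4):
    """Finite stand-in for L/x: the words w with |w| <= max_len and xw in L."""
    return frozenset(w for w in words(max_len) if in_L(x + w))


def build_dfa(max_prefix=3, max_len=4):
    """States are signatures; delta(L/x, c) = L/xc, as in the proof."""
    start = signature("", max_len)            # q_0 = L/epsilon
    representative = {start: ""}              # state -> one word x reaching it
    delta = {}
    for x in words(max_prefix):
        state = signature(x, max_len)
        representative.setdefault(state, x)
        for c in ALPHABET:
            target = signature(x + c, max_len)
            representative.setdefault(target, x + c)
            delta[(state, c)] = target
    # Accepting states are those whose suffix language contains epsilon.
    accepting = {s for s in representative if "" in s}
    return start, representative, delta, accepting


def accepts(dfa, w):
    """Run the constructed DFA on w."""
    start, _, delta, accepting = dfa
    state = start
    for c in w:
        state = delta[(state, c)]
    return state in accepting


DFA = build_dfa()
```

Here `DFA[1]` maps each discovered state to a representative word; for this language the representatives are the empty word and `a`, which correspond to $q_0 = L/\varepsilon$ and $q_1 = L/a$ in the text.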