Fall 1999 Formal Language Theory Dr. R. Boyer

Week Four: Regular Languages; Pumping Lemma

1. There are other methods of finding a regular expression equivalent to a finite automaton in addition to the ones discussed in the last lecture. One treats the problem as one of solving a system of equations for the language, where concatenation plays the role of multiplication and union the role of addition.

We are given the DFA $M = (Q, \Sigma, \delta, s, F)$. Let

$$X_q = \{x \in \Sigma^* : \delta(q, x) \in F\},$$

which is the set of all strings in $\Sigma^*$ that are accepted by the DFA if its start state is $q$ rather than $s$. Then we find that

$$X_q = \bigcup_{a \in \Sigma} a X_{\delta(q,a)} \quad \text{if } q \notin F,$$

while

$$X_q = \bigcup_{a \in \Sigma} a X_{\delta(q,a)} \cup \{e\} \quad \text{if } q \in F,$$

where $e$ denotes the empty string. This is the linear system for the sets $X_q$ mentioned above, where addition is replaced by union and multiplication by concatenation. For regular languages, we need what is known as Arden's Lemma.

Arden's Lemma: Let $A, B \subseteq \Sigma^*$ with $e \notin A$. Then the equation

$$X = AX \cup B$$

has the unique solution $X = A^*B$.

Proof: Step 1: If $X$ is a solution, then $A^*B \subseteq X$. To see this, note first that $A^*B$ is itself a solution:

$$A^*B = (A^+ \cup \{e\})B = A^+B \cup B = A(A^*B) \cup B.$$

Moreover, if $X$ is any solution, then $B \subseteq X$ and $AX \subseteq X$, so $A^nB \subseteq X$ for every $n \geq 0$ by induction on $n$; hence $A^*B \subseteq X$.

Step 2: $X \subseteq A^*B$. By Step 1, $X = A^*B \cup C$, since $A^*B \subseteq X$, with $C \cap A^*B = \emptyset$. We want to show that $C = \emptyset$. Now $X = AX \cup B$, so

$$A^*B \cup C = A(A^*B \cup C) \cup B = A^+B \cup AC \cup B = A^+B \cup B \cup AC = (A^+ \cup \{e\})B \cup AC = A^*B \cup AC.$$
Next, consider the relation

$$(A^*B \cup C) \cap C = (A^*B \cup AC) \cap C.$$

Then $C = AC \cap C$, so $C \subseteq AC$. Since $e \notin A$, if $C$ were non-empty, the shortest string in $AC$ would be strictly longer than the shortest string in $C$, which contradicts $C \subseteq AC$. Hence $AC = C = \emptyset$. We conclude: $A^*B$ is the unique solution.

Note: If $e \in A$, then the solution $A^*B$ is no longer unique, but it is the smallest solution.

Example: We shall indicate an outline of the calculation of the regular expression for the DFA given in the last lecture using the equational approach. We obtain the following system of equations:

$$X_1 = aX_2 \cup bX_3$$
$$X_2 = aX_1 \cup bX_3 \cup e$$
$$X_3 = aX_2 \cup bX_3 \cup e$$

Note the term $e$ in the equations for the states which are accepting. The goal is to find $X_1$, since $q_1$ is the initial state.

We start solving this system. Since $X_1 = aX_2 \cup bX_3$, we obtain $X_2 = a(aX_2 \cup bX_3) \cup bX_3 \cup e$, that is, $X_2 = aaX_2 \cup (ab \cup b)X_3 \cup e$. By Arden's Lemma, we find $X_2 = (aa)^*[(ab \cup b)X_3 \cup e]$. By substituting this equation into $X_3 = aX_2 \cup bX_3 \cup e$, we obtain another equation for $X_3$ that can be solved using Arden's Lemma. Because $X_2$ and $X_3$ are now known, these identities can be used in $X_1 = aX_2 \cup bX_3$ to determine $X_1$.

2. We now present a useful theoretical result that states that regular languages must obey a certain type of "periodicity" property. It is used to show that certain simple languages cannot be regular.

Pumping Lemma. Let $M = (K, \Sigma, \delta, s, F)$ be a DFA, with $L = L(M)$. Suppose $m = |K|$. Let $w \in L(M)$ with $|w| \geq m$. Then there are strings $x$, $y$, and $z$ such that $w = xyz$, $|xy| \leq m$, $y \neq e$, and $xy^kz \in L$ for all $k \geq 0$. We call $m$ the pumping constant.

Idea of the Proof. Any string accepted by $M$ whose length is at least the number of states of the machine forces $M$ to visit some state twice, so its computation must have a loop in it. It is precisely this loop that can be iterated.
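The idea of the proof can be made concrete with a short sketch. It assumes a DFA whose transition function is stored as a Python dictionary keyed by (state, symbol); the names `pump_split` and `accepts` and the two-state example machine are illustrative and do not come from the lecture. The function runs the machine on $w$, stops at the first repeated state, and returns the factorization $w = xyz$ in which $y$ is the string read around the loop.

```python
def pump_split(delta, start, w):
    """Return (x, y, z) with w = x + y + z and y non-empty, where the DFA is
    in the same state before and after reading y. Returns None if no state
    repeats (only possible when len(w) is less than the number of states)."""
    seen = {start: 0}              # state -> position at which it was first entered
    state = start
    for i, symbol in enumerate(w, start=1):
        state = delta[(state, symbol)]
        if state in seen:          # first repeated state closes the loop
            j = seen[state]
            return w[:j], w[j:i], w[i:]
        seen[state] = i
    return None


def accepts(delta, start, finals, w):
    """Run the DFA on w and report whether it halts in an accepting state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals


# Hypothetical example: strings over {a, b} with an even number of a's.
delta = {("even", "a"): "odd", ("even", "b"): "even",
         ("odd", "a"): "even", ("odd", "b"): "odd"}
x, y, z = pump_split(delta, "even", "abab")
```

Because the loop is detected at the first repetition, the prefix $xy$ never exceeds the number of states, which is the bound $|xy| \leq m$ in the lemma.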
We use the contrapositive form of the pumping lemma to show that a language is NOT regular. Let $L$ be a language. Suppose that for every $m \geq 1$ there exists a string $w \in L$ with $|w| \geq m$ such that, for every factorization $w = xyz$ with $|xy| \leq m$ and $y \neq e$, we have $xy^kz \notin L$ for some integer $k \geq 0$. Then $L$ cannot be a regular language.

So, to show a language is NOT regular, think of playing the following sort of game: given the pumping constant $m$, find a string $w \in L$ with $|w| \geq m$ so that for any non-empty substring $y$ of $w$ lying within the first $m$ symbols, there exists some pumped form $xy^kz$ of $w$ with $xy^kz \notin L$.

Example (1) $L_1 = \{a^nb^n : n \geq 1\}$ is not regular. Suppose the language were regular, with pumping constant $m$. Choose $n$ greater than $m$. Then $w = a^nb^n$ can be factored as $xyz$ with $|xy| \leq m$, $y \neq e$, and $xy^kz \in L_1$ for all $k \geq 0$. Since $|xy| \leq m < n$, the string $y$ consists only of $a$'s. Choose $k = 0$; then $xz \in L_1$, but $xz = a^{n-|y|}b^n \notin L_1$. Contradiction. We say that we pumped "down" in this example.

Example (2) $L_2 = \{a^{n^2} : n \geq 1\}$ is not regular. Suppose $L_2$ were regular. Choose $n$ greater than the pumping constant $m$. Then $a^{n^2} = xyz$, where $y \neq e$ and $|y| \leq m < n$. So $xy^kz \in L_2$ for all $k \geq 0$. Choose $k = 2$. Then $xy^2z \in L_2$ implies $|xy^2z|$ is a perfect square. But

$$n^2 < |xy^2z| = n^2 + |y| < n^2 + n < (n+1)^2.$$

Contradiction. In this example, we say that we pumped "up."

Example (3) $L_3 = \{ww^R : w \in \Sigma^*\}$ is not regular if $\Sigma = \{a, b\}$. Suppose $L_3$ were regular. Choose the string $w$ so $|w| \geq m + 1$, where $m$ is the pumping constant. Further, we may choose $w$ to have the special form $w = a^mb$, so $ww^R = a^mbba^m$. By the Pumping Lemma, $ww^R = xyz$ with $|xy| \leq m$, $y \neq e$, and $xy^kz \in L_3$ for $k \geq 0$. Since $|xy| \leq m$, the string $y$ lies within the leading block of $a$'s. Take $k = 0$. Then $xz = a^{m-|y|}bba^m$ would belong to $L_3$, but this string is not a palindrome because $|y| \geq 1$. Contradiction.

3. Problem: Given a DFA $M$, find an equivalent DFA with a minimum number of states.
We present two solutions to this problem. The first one is algorithmic. The second one is more conceptual and proves that the equivalent minimum state DFA is unique, up to the labeling of its states.

4. First Method: Merging of Equivalent States

Let $M = (K, \Sigma, \delta, q_0, F)$ be a DFA. Given two states $q$ and $q'$ from $K$, we define an equivalence relation $\equiv$ on the states of $M$ by:

$$q \equiv q' \text{ means } \delta(q, w) \in F \iff \delta(q', w) \in F, \quad \forall w \in \Sigma^*.$$

The $\equiv$-equivalence classes are computed by a sequence of other equivalence relations $\equiv_n$ by successive refinements. Let $q$ and $q'$ be two states of $M$. Then:

$$q \equiv_0 q' \text{ means } q \in F \iff q' \in F.$$

That is, $\equiv_0$ has two equivalence classes: the set of accepting states $F$ and the set of rejecting states $K \setminus F$. For $n \geq 0$, define $\equiv_{n+1}$ by:

$$q \equiv_{n+1} q' \text{ means } q \equiv_n q' \text{ and } \delta(q, a) \equiv_n \delta(q', a), \quad \forall a \in \Sigma.$$

That is, $q \equiv_n q'$ means $\delta(q, w) \in F \iff \delta(q', w) \in F$ for all strings $w$ whose length is less than or equal to $n$.

The equivalence classes of $\equiv_n$ stabilize for $n$ less than or equal to the number of states of the automaton $M$. Further, $q \equiv q'$ if and only if $q \equiv_n q'$ for all $n$.

Now, if all the states of $M$ are reachable from the start state $q_0$ and if equivalent states are merged, then the resulting automaton has a minimum number of states.

These observations give rise to an effective algorithm to find the minimum state automaton, by successively computing the $\equiv_n$-equivalence classes for $n = 0, 1, 2, \ldots$. The process terminates when the equivalence classes for two successive values of $n$ agree.

Algorithm for Merging Equivalent States:

We first make a table of unordered pairs of distinct states. No pair is marked.

(1) First, mark all pairs of inequivalent states relative to strings of length 0; so mark the pair $\{p, q\}$ if $p \in F$, $q \in K \setminus F$ or $p \in K \setminus F$, $q \in F$.
(2) Next, we mark all pairs of inequivalent states relative to strings of length $k = 1, 2, \ldots, n$, where $n$ is the total number of states of the original DFA.

    for k = 1..n do
        for each unmarked pair {p, q}:
            if {δ(p, a), δ(q, a)} is marked for some symbol a in Σ,
            then mark the pair {p, q};
    od;

(3) When the loop terminates, all inequivalent pairs are marked; so the unmarked pairs are equivalent states. Merge these pairs together.

Example.

    State   a   b
    0       1   2
    1       3   4
    2       4   3
    3       5   5
    4       5   5
    5       5   5

The accepting states are $\{1, 2, 5\}$. The pairs left unmarked after step (1) are those in which both states are accepting or both are rejecting. Their images under $a$ and $b$ are:

    {0,3} -a-> {1,5};   {0,3} -b-> {2,5}
    {0,4} -a-> {1,5};   {0,4} -b-> {2,5}
    {1,2} -a-> {3,4};   {1,2} -b-> {3,4}
    {1,5} -a-> {3,5};   {1,5} -b-> {4,5}
    {3,4} -a-> {5,5};   {3,4} -b-> {5,5}
    {2,5} -a-> {4,5};   {2,5} -b-> {3,5}

The pairs $\{1,5\}$ and $\{2,5\}$ are marked first, because $\{3,5\}$ and $\{4,5\}$ were marked in step (1). Then $\{0,3\}$ and $\{0,4\}$ are marked, because $\{1,5\}$ is now marked. The pairs $\{1,2\}$ and $\{3,4\}$ are never marked. The result of the algorithm is seen to be that states 1 and 2 should be merged and states 3 and 4 should be merged as well.
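The marking procedure can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `equivalent_pairs` and the dictionary encoding of the transition table are assumptions, and the loop runs until no new pair is marked rather than for a fixed count of $n$ rounds, which yields the same final table.

```python
from itertools import combinations


def equivalent_pairs(states, alphabet, delta, finals):
    """Table-filling sketch: return the set of pairs that are never marked."""
    all_pairs = {frozenset(p) for p in combinations(states, 2)}
    # Step (1): mark pairs with exactly one accepting state.
    marked = {pair for pair in all_pairs
              if len(pair & set(finals)) == 1}
    # Step (2): keep marking until a full pass adds nothing new.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                image = frozenset((delta[p][a], delta[q][a]))
                # A one-element image means both states move to the same state.
                if len(image) == 2 and image in marked:
                    marked.add(pair)
                    changed = True
                    break
    # Step (3): the unmarked pairs are the equivalent states.
    return all_pairs - marked


# Transition table of the example above; accepting states are {1, 2, 5}.
delta = {0: {"a": 1, "b": 2}, 1: {"a": 3, "b": 4}, 2: {"a": 4, "b": 3},
         3: {"a": 5, "b": 5}, 4: {"a": 5, "b": 5}, 5: {"a": 5, "b": 5}}
result = equivalent_pairs(range(6), "ab", delta, {1, 2, 5})
```

On the example table, `result` contains exactly the pairs {1, 2} and {3, 4}, in agreement with the hand computation.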
5. Second Method: Construction of the Minimum State DFA directly from the Language L

Let $M = (K, \Sigma, \delta, q_0, F)$ be a finite deterministic automaton such that all its states are reachable from its start state. Let $L = L(M)$ be the language it accepts.

We associate with $M$ a special equivalence relation $R_M$ on $\Sigma^*$, where

$$x R_M y \iff \delta(q_0, x) = \delta(q_0, y), \quad \text{where } x, y \in \Sigma^*;$$

that is, two strings $x$ and $y$ are equivalent if they terminate at the same state. Hence, we can identify the $R_M$-equivalence classes $[x]_M$ with the states of $M$. The language $L(M)$ is the union of the $R_M$-equivalence classes which include an element $x$ with $\delta(q_0, x) \in F$. We may call $R_M$ machine equivalence.

We call an equivalence relation $R$ on $\Sigma^*$ right-invariant if

$$x R y \implies xz R yz, \quad \text{for all strings } z \in \Sigma^*.$$

Note: $R_M$ is right-invariant.

Let $L$ be any language over the alphabet $\Sigma$; that is, $L \subseteq \Sigma^*$. We can associate an equivalence relation $R_L$ on $\Sigma^*$ directly from $L$, without using a finite automaton. Given any two strings $x, y \in \Sigma^*$, we say

$$x R_L y \iff xz \in L \text{ exactly when } yz \in L, \quad \text{for all } z \in \Sigma^*.$$

Note: the equivalence relation $R_L$ is right-invariant, and $R_M$ is a refinement of $R_L$ if $L = L(M)$ for a DFA $M$: if two strings lead $M$ to the same state, then no suffix $z$ can distinguish them.

6. We can construct a deterministic finite automaton $M_L$ directly from the equivalence relation $R_L$, if $R_L$ has FINITE index; that is, if the number of $R_L$-equivalence classes is finite.

We set $M_L = (K_L, \Sigma, \delta_L, s_L, F_L)$. Let $K_L$, the states of the machine $M_L$, be the collection of all $R_L$-equivalence classes; write them as $[x]_L$, for a string $x$.
The transition function $\delta_L : K_L \times \Sigma \to K_L$ is given as:

$$\delta_L([x]_L, a) = [xa]_L.$$

Note: $\delta_L$ is well defined. Set $s_L = [e]_L$ and $F_L = \{[x]_L : x \in L\}$.

The minimum state automaton accepting $L$ is given by $M_L$; further, any other minimum state automaton that accepts $L$ can be identified with $M_L$ by a re-labeling of its states.

The regular languages are closed under homomorphisms. A homomorphism is a map $h : \Sigma^* \to \Gamma^*$ such that for all $x, y \in \Sigma^*$ we have that $h(xy) = h(x)h(y)$ and $h(e) = e$. It follows at once that the values of $h$ are determined on any string once they are known for the letters of $\Sigma$.

Proposition. Let $h : \Sigma^* \to \Gamma^*$ be a homomorphism. Let $L \subseteq \Sigma^*$ be a regular language. Then $h(L)$ is also regular.

For the proof, we use the fact that $L$ is denoted by some regular expression $\alpha$. Write $h(\alpha)$ for the regular expression obtained from $\alpha$ by replacing each letter $a$ by the string $h(a)$. Then the argument reduces to the following formula:

$$L(h(\alpha)) = h(L(\alpha)), \quad \text{for any regular expression } \alpha.$$

This formula can be established by induction on the number of operators in the regular expression. The base case of zero operators corresponds to $\alpha = a \in \Sigma$, $e$, or $\emptyset$. In all three cases, the desired formula holds. We next need to make the observations that (1) $h(L_1 L_2) = h(L_1)h(L_2)$, and (2) $h(\bigcup L_k) = \bigcup h(L_k)$. The result now follows in a routine fashion.
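The formula $L(h(\alpha)) = h(L(\alpha))$ can be spot-checked on small cases. The sketch below is only an illustration under stated assumptions: regular expressions are encoded as nested tuples, the helper names `apply_h`, `lang`, and `h_string` are hypothetical, and the comparison is restricted to strings of length at most $n$. The check uses a homomorphism that erases no letter, so every preimage of a string of length at most $n$ also has length at most $n$ and the bounded sets can be compared directly.

```python
# Assumed encoding of regular expressions as nested tuples:
# ("empty",), ("eps",), ("sym", a), ("cat", r, s), ("alt", r, s), ("star", r)

def apply_h(h, r):
    """Replace every letter a in the expression r by the string h[a]."""
    kind = r[0]
    if kind == "sym":
        image = ("eps",)
        for c in h[r[1]]:
            image = ("cat", image, ("sym", c))
        return image
    if kind in ("cat", "alt"):
        return (kind, apply_h(h, r[1]), apply_h(h, r[2]))
    if kind == "star":
        return ("star", apply_h(h, r[1]))
    return r  # "eps" and "empty" are unchanged


def lang(r, n):
    """All strings of length at most n in the language denoted by r."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "alt":
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":
        return {u + v for u in lang(r[1], n) for v in lang(r[2], n)
                if len(u + v) <= n}
    # star: close {""} under appending one more factor, within the bound n
    base, result = lang(r[1], n), {""}
    while True:
        bigger = result | {u + v for u in result for v in base
                           if len(u + v) <= n}
        if bigger == result:
            return result
        result = bigger


def h_string(h, w):
    """Extend h from letters to strings."""
    return "".join(h[c] for c in w)


# alpha = (a|b)*a, with the non-erasing homomorphism h(a) = 01, h(b) = 1.
alpha = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
h = {"a": "01", "b": "1"}
n = 6
left = lang(apply_h(h, alpha), n)
right = {h_string(h, w) for w in lang(alpha, n) if len(h_string(h, w)) <= n}
```

Here `left` and `right` agree, which is what the formula predicts for this expression up to length 6. A bounded comparison of this kind is evidence for one example, not a proof.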
Proposition. Let $h : \Sigma^* \to \Gamma^*$ be a homomorphism. Let $L' \subseteq \Gamma^*$ be a regular language. Then so is $h^{-1}(L')$.

To establish this result, we use the fact that $L'$ is accepted by some DFA $M' = (K', \Gamma, \delta', s', F')$. Let $M$ denote the DFA that will accept $h^{-1}(L')$, where $M = (K, \Sigma, \delta, s, F)$ with $K = K'$, $F = F'$, and $s = s'$. We define

$$\delta(q, a) = \delta'(q, h(a)).$$

We can establish by induction that $\delta(q, x) = \delta'(q, h(x))$, where $x \in \Sigma^*$. Finally, we observe that

$$x \in L(M) \iff \delta(s, x) \in F \iff \delta'(s', h(x)) \in F' \iff h(x) \in L(M') \iff x \in h^{-1}(L(M')).$$

Let $L$ be a language over the one letter alphabet $\{a\}$. Then $L$ is regular if and only if the set of non-negative integers $U = \{m : a^m \in L\}$ is ultimately periodic; that is, there are integers $n \geq 0$ and $p > 0$ such that for all $m \geq n$, $m \in U$ if and only if $m + p \in U$. We call the number $p$ the period of $U$. To see this result, we use the fact that a regular language is accepted by a DFA.

Complexity of Algorithms for Finite Automata and Regular Languages:

(a) There is an exponential algorithm which constructs an equivalent deterministic finite automaton for a given non-deterministic finite automaton.

(b) There is an exponential algorithm that constructs an equivalent regular expression for a given non-deterministic automaton.

(c) There is a polynomial algorithm which constructs an equivalent non-deterministic finite automaton for a given regular expression.

(d) There is a polynomial algorithm that constructs a minimum state deterministic finite automaton for a given deterministic finite automaton.
(e) There is a polynomial algorithm which decides when two deterministic finite automata are equivalent.

(f) There is an exponential algorithm which decides when two non-deterministic finite automata are equivalent.

(g) For a given regular language $L$ and string $w$, there is an algorithm that decides if $w$ belongs to $L$ whose complexity is linear in the length of $w$.

(h) For a given regular language $L = L(M)$, where $M$ is a deterministic finite automaton, and string $w$, there is an algorithm that decides if $w$ belongs to $L$ whose complexity is linear in the length of $w$. For a given regular language $L = L(M)$, where $M$ is a non-deterministic finite automaton, and string $w$, there is an algorithm that decides if $w$ belongs to $L$ whose complexity is $O(|K|^2 |w|)$.
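The bound in item (h) for non-deterministic automata comes from simulating all runs at once. The following sketch assumes an NFA without $e$-moves whose transition relation is a dictionary from (state, symbol) to a set of states; the name `nfa_accepts` and the example machine are hypothetical. For each symbol of $w$, the loop visits at most $|K|$ current states and unions in at most $|K|$ successors for each, which gives the $O(|K|^2 |w|)$ estimate.

```python
def nfa_accepts(delta, start, finals, w):
    """Decide membership by tracking the set of states reachable so far."""
    current = {start}
    for symbol in w:
        successors = set()
        for q in current:                                  # at most |K| states
            successors |= delta.get((q, symbol), set())    # at most |K| each
        current = successors
    return bool(current & set(finals))


# Hypothetical NFA over {a, b}: accepts strings whose second-to-last symbol is a.
delta = {(0, "a"): {0, 1}, (0, "b"): {0},
         (1, "a"): {2}, (1, "b"): {2}}
```

For a deterministic automaton the set `current` always has exactly one element, so the same loop runs in time linear in the length of $w$.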