Prime Languages, Orna Kupferman, Jonathan Mosheiff. School of Engineering and Computer Science The Hebrew University, Jerusalem, Israel

Size: px

Start display at page:

Download "Prime Languages, Orna Kupferman, Jonathan Mosheiff. School of Engineering and Computer Science The Hebrew University, Jerusalem, Israel"

Cory Turner
5 years ago
Views:

1 Prime Languages, Orna Kupferman, Jonathan Mosheiff School of Engineering and Computer Science The Hebrew University, Jerusalem, Israel Abstract We say that a deterministic finite automaton (DFA) A is composite if there are DFAs A 1,..., A t such that L(A) = t i=1 L(A i) and the index of every A i is strictly smaller than the index of A. Otherwise, A is prime. We study the problem of deciding whether a given DFA is composite, the number of DFAs required in a decomposition, decompositions that are based on abstractions, methods to prove primality, and structural properties of DFAs that make the problem simpler or are retained in a decomposition. We also provide an algebraic view of the problem and demonstrate its usefulness for the special case of permutation DFAs. Keywords: Deterministic Finite Automata (DFA), Regular Languages, DFA Decomposition, Prime DFA, Prime Regular Languages 1. Introduction Compositionality is a well motivated and studied notion in computer science [2]. By decomposing a problem into several smaller problems, it is possible not only to increase parallelism, but also to sometimes handle inputs that are otherwise intractable. A major challenge is to identify problems and instances that can be decomposed. Consider for example the LTL model-checking problem [11]. Given a system S and a specification ψ, checking whether all the computations of S satisfy ψ can be done in time linear in S and exponential in ψ. If ψ is a conjunction of smaller specifications, say ψ = ϕ 1 ϕ t, then it is possible to check instead whether S satisfies each of the ϕ i s. 1 Not all problems allow for easy decomposition. For example, if we wish to synthesize a transducer that realizes the specification ψ above, it is not clear how to use A preliminary version appeared in the 38th International Symposium on Mathematical Foundations of Computer Science. The work is supported in part by ERC project no QUALITY. addresses: orna@cs.huji.ac.il (Orna Kupferman), yonatanm@cs.huji.ac.il (Jonathan Mosheiff) 1 The ability to decompose the specification causes the systems, which are much bigger than the subspecifications, to be the computational bottleneck in the model-checking problem. Thus, a different big challenge is to decompose the system [12]. Preprint submitted to Elsevier June 22, 2014

2 the decomposition of ψ into its conjuncts. In particular, it is not clear how to compose a transducer that realizes ψ from t transducers that realize ϕ 1,..., ϕ t [8]. In the automata-theoretic approach to formal verification, we use automata in order to model systems and their specifications. A natural question then is whether we can decompose a given automaton A into smaller automata A 1,..., A t such that L(A) = t i=1 L(A i). Then, for example, we can reduce checking L(S) L(A) to checking whether L(S) L(A i ) for all 1 i t. The automata used for reasoning about systems and their specifications are typically nondeterministic automata on infinite words [13]. As it turns out, however, the question of automata decomposition is open already for the basic model of deterministic automata on finite words (DFAs). Studying DFAs also suggests a very clean mathematical approach, as each regular language has a canonical minimal DFA recognizing it. Researchers have developed a helpful algebraic approach for DFAs that offers some very interesting results on DFAs and their decomposition. To the best of our knowledge, however, the basic question of decomposing a DFA into smaller DFAs is still open. In the algebraic approach, a DFA A is matched with a monoid. 2 The members of this monoid are the actions of words in Σ on the states of A. That is, each member is associated with a word and corresponds to the states-to-states transition function induced by the word. In particular, ɛ corresponds to the identity element. The product of two such transition functions is their composition. A DFA is called a permutation DFA if its monoid consists of permutations. A DFA is called a reset DFA if its monoid consists of constant functions and the identity function. A wreath product of a sequence A 1, A 2..., A t of DFAs is a cascade in which the transition function of each DFA A i may depend on the state of A i as well as the states of the DFAs preceding A i in the sequence. The algebraic approach is used in [7] in order to show that every DFA A can be presented as a wreath product of reset DFAs and permutation DFAs, whose algebraic structure is simpler than that of A. Specifically, the monoid associated with each of the permutation DFA factors in the wreath product is a group. This group divides a subgroup of the monoid DFA of A. The algebraic approach is based on a syntactic congruence between words in Σ : given a regular language L Σ, we have that x L y, for words x, y Σ, if for every w, z Σ, it holds that w x z L iff w y z L. Thus, the congruence refers to extensions of words from both right and left. In the context of minimization, which motivates the practical study of decomposition, one is interested in a right congruence. There, x L y iff for all words z Σ, we have that x z L iff y z L. By the Myhill-Nerode theorem [9, 10], the equivalence classes of L constitute the state space of a minimal canonical DFA for L. The number of equivalence classes is referred to as the index of L. We say that a language L Σ is composite if there are languages L 1,..., L t such that L = t i=1 L t and the index of L i, for all 1 i t, is strictly 2 A semigroup is a non-empty set S coupled with an associative binary operation : S S. A semigroup M is called a monoid if there exists an identity element e M such that for every x M it holds that x e = e x = x. A monoid G is called a group if for every x G there exists an element x 1 G such that x x 1 = x 1 x = e. 2

3 smaller than the index of L. Otherwise, we say that L is prime 3. The definitions apply also to DFAs, referring to the languages they recognize. For example, for Σ with Σ > 1 and w Σ, let L w = {w}. Clearly, the index of L w is w + 2. We claim that if w contains at least two different letters, then L w is composite. To see this, we show we can express L w as the intersection of two DFAs of index at most w +1. Let σ be some letter in w, and let m be its number of occurrences in w. By the condition on w, we have that 1 m < w. It is easy to see that L w is the intersection of the language w, whose index is w + 1, and the language of all words in which σ appears exactly m times, whose index is m + 2 w + 1. On the other hand, if w consists of a single letter, then L w is prime. One of our goals in this work is to develop techniques for deciding primality. The decomposition of L w described above is of width 2; that is, it has two factors. The case of decompositions of width 2 was studied in [3], where the question of whether one may need wider decompositions was left open. We answer the question positively; that is, we present a language that does not have a decomposition of width 2 but has one of width 3. We conjecture that there exist composite languages of arbitrarily large width. For compositions of width 2, the question of deciding whether a given DFA is composite is clearly in NP, as one can guess the two factors. In the general case, the only bound we have on the width is exponential, which follows from the bound on the size of the underlying DFAs. This bound suggests an EXPSPACE algorithm for deciding whether a given DFA is composite. Consider a DFA A. We define the roof of A as the intersection of all languages L(B), where B is a DFA such that L(A) L(B) and the index of B is smaller than that of A. Thus, the roof of A is the minimal (with respect to containment) language that can be defined as an intersection of DFAs whose language contain the language of A and whose index is smaller than the index of A. Accordingly, A is composite iff L(A) = roof (A). We use roofs in order to study primality further. In particular, if A is prime then there exists a word w roof (A) \ L(A). The word w is called a primality witness for A. Indeed, A is prime iff it has a primality witness. Let us go back to the language L w from the example above. We wish to prove that when w = σ n for some letter σ, then L w is prime. Let l > n be a natural number such that for every p n+1 it holds that n l mod p. The existence of l is guaranteed by the Chinese remainder theorem. In Section 2, we prove that σ l is a primality witness for L w and conclude that L w is prime. We use the notion of a primality witness to prove the primality of additional, more involved, families of languages. We then turn to study structural properties of composite and prime DFAs. Each DFA A induces a directed graph G A. We study the relation between the structure of G A and the primality of A. We identify cases where primality (with a short primality witness) is guaranteed. An example to such a case are co-safety languages, whose DFA contains one rejecting strongly connected component from which an accepting sink is reachable, We also study structural properties that can be retained in a decomposition and prove, for example, that a composite strongly connected DFA can be decomposed 3 We note that a different notion of primality, relative to the concatanation operator rather than to intersection, has been studied in [4]. 3

4 into strongly connected DFAs. Recall that a decomposition of a DFA A consists of DFAs that contain the language of A and are still of a smaller index. A simple way to get a DFA with the above properties is by merging states of A. A simple decomposition of A is a decomposition in which each of the underlying DFAs is a result of merging states of A. Simple decompositions have also been studied in [5] in the context of sequential machines. It follows from [3] that some DFAs have a decomposition of width 2 yet do not have a simple decomposition. We characterize simple decompositions and show that the problem of deciding whether a given DFA has a simple decomposition is in PTIME. Finally, we develop an algebraic view of DFA decomposition and primality. As [7], our approach is based on the transition monoid of A. First, we show that once we fix the set of accepting states, the question of primality of a DFA A depends only on A s transition monoid, rather than its transition function or alphabet. We then focus on permutation DFAs. Given a permutation DFA A we construct a new DFA, termed the monoid DFA, such that compositionally of A can be reduced to simplecompositionality of its monoid DFA. Driven by observations about monoid DFAs, we show a PSPACE algorithm for deciding the primality of A. We also show that composite permutation DFAs can be decomposed into permutation DFAs. 2. Preliminaries A deterministic finite automaton (DFA) is a 5-tuple A = Q, Σ, q 0, δ, F, where Q is a finite set of states, Σ is a finite non-empty alphabet, δ : Q Σ Q is a transition function, q 0 Q is an initial state, and F Q is a set of accepting states. For q Q, we use A q to denote the DFA A with q as the initial state. That is, A q = Q, Σ, q, δ, F. We extend δ to words in the expected way, thus δ : Q Σ Q is defined recursively by δ(q, ɛ) = q and δ(q, w 1 w 2 w n ) = δ(δ(q, w 1 w 2 w n 1 ), w n ). The run of A on a word w = w 1... w n is the sequence of states s 0, s 1... s n such that s 0 = q 0 and for each 1 i n it holds that δ(s i 1, w i ) = s i. Note that s n = δ(q 0, w). The DFA A accepts w iff δ(q 0, w) F. Otherwise, A rejects w. The set of words accepted by A is denoted L(A) and is called the language of A. We say that A recognizes L(A). A language recognized by some DFA is called a regular language. A DFA A is minimal if every DFA B that has less states than A satisfies L(B) L(A). Every regular language L has a single (up to DFA isomorphism) minimal DFA A such that L(A) = L. The index of L, denoted ind(l), is the size of the minimal DFA recognizing L. Consider a language L Σ. The Myhill-Nerode relation relative to L, denoted L, is a binary relation on Σ defined as follows: For x, y Σ, we say that x L y if for every z Σ it holds that x z Σ iff y z Σ. Note that L is an equivalence relation. It is known that L is regular iff L has a finite number of equivalence classes. The number of these equivalence classes is equal to ind(l). Definition 2.1. [DFA decomposition] Consider a DFA A. For k N, we say that A is k-decomposable if there exist DFAs A 1,..., A t such that for all 1 i t it holds that 4

5 ind(a i ) k and t i=1 L(A i) = L(A). The DFAs are then a k-decomposition of A. The depth of A, denoted depth(a), is the minimal k such that A is k-decomposable. Obviously, every DFA A is ind(a)-decomposable. The question is whether a decomposition of A can involve DFAs of a strictly smaller index. Formally, we have the following. Definition 2.2. [Composite and Prime DFAs] A DFA A is composite if depth(a) < ind(a). Otherwise, A is prime. We identify a regular language with its minimal DFA. Thus, we talk also about a regular language being k-decomposable or composite, referring to its minimal DFA. Similarly, for a DFA A, we refer to ind(l(a)) as ind(a). Example 2.1. Let Σ = {a} and let L k = (a k ). We show that if k is not a prime power, then L k is composite. Clearly, ind(l k ) = k. If k is not a prime power, then there exist 2 p, q < k such that p and q are coprime and p q = k. It then holds that L k = L p L q. Since ind(l p ) < k and ind(l q ) < k, it follows that L k is composite. In Example 3.1, we will show that the above condition is necessary and sufficient, thus, L k is prime iff k is a prime power. Let A be a DFA. We define: α(a) = {B : B is a minimal DFA such that L(A) L(B) and ind(b) < ind(a)}. That is, α(a) is the set of DFAs that contain A and have an index smaller than the index of A. As with other notations, we also refer to α(l), for a regular language L, and we identify α(l) with α(a), for the minimal DFA A for L. The roof of A is the intersection of the languages of all DFAs in α(a). Thus, roof (A) = B α(a) L(B). Clearly, L(A) roof (A). Also, if A is composite, then α(l) is an (ind(a) 1)- decomposition of A. We thus have the following. Theorem 2.2. A DFA A is prime iff L(A) roof (A), unless L(A) = Σ. The PRIME-DFA problem is to decide, given a DFA A, whether A is prime. A more general problem is, given a DFA A, to compute depth(a). We now prove an upper bound on the complexity of PRIME-DFA. We first need the following lemma. Lemma 2.3. Let A be a DFA and let n = ind(a). Then, α(a) = 2 O(n Σ log n) and ind(roof (A)) 2 2O(n Σ log n). PROOF. Note that a DFA of size smaller than n has an equivalent DFA of size n. Thus, α(a) is at most the number of DFAs of index n. A DFA with a state space Q and alphabet Σ can be characterized by a function f : Q Σ Q describing the transition function, a starting state and an accepting state set. Therefore α(l) n n Σ n 2 n = 2 O(n Σ log n). A DFA for roof (A) can be built using the product DFA construction. As each DFA in α(l) is of size less than n, the size of A is at most n α(l) = 2 2O(n Σ log n). This bound is doubly-exponential in n. Since ind(roof (A)) is at most the size of A, we are done. 5

6 Combining Theorem 2.2 and Lemma 2.3, we have the following. Theorem 2.4. The PRIME-DFA problem is in EXPSPACE. PROOF. We present a NEXPSPACE algorithm that decides the complementary language. Given a DFA A, our algorithm computes a DFA A such that L(A ) = roof (A) using the product DFA construction. The algorithm now checks whether L(A ) L(A) and accepts or rejects accordingly. Note that L(A ) L(A) iff A is prime. Let n = ind(a). From the proof of Lemma 2.3, we have that the size of A is at most = 2 2O(n Σ log n). The question of whether L(A ) L(A) can be nondeterministically decided in space logarithmic in the number of states in A and A. The relevant parts of the transition function of A can be computed on-the-fly so that A needs not be explicitly computed. Therefore, our algorithm indeed decides the problem in NEXPSPACE. The lower bound we are able to show for the PRIME-DFA problem is much lower: Theorem 2.5. The PRIME-DFA problem is NLOGSPACE-hard. PROOF. We show a reduction from the problem of deciding the emptiness of a given DFA, known to be NLOGSPACE-hard [6]. Let C be a minimal composite DFA with a decomposition L(C) = t i=1 C i, where ind(c i ) < ind(c). For simplicity, we assume that C does not have a rejecting sink state. The DFA from Example 3.1 can serve as our C. Given a DFA A, we construct a DFA B such that L(A) is empty iff B is prime. Let Σ be the alphabet of A and let # be a letter not in Σ. The DFA B then accepts the L(A) # L(C). It is not hard to see that B can be generated in logarithmic space we only have to add # transitions from A s accepting states to C s initial state. We prove that indeed L(A) is empty iff B is prime. Assume first that L(A) =. Then, L(B) = L(A) # L(C) =, which is indeed a prime language. For the other direction, assume that L(A). We claim that then, L(B) is composite, with the decomposition t i=1 L(A) # L(C i). Clearly, L(A) # L(C) = t i=1 L(A) # L(C i), thus it is left to prove that for every 1 i t, it holds that ind(l(a) # L(C i )) < ind(l(a) # L(C)). To show this, we distinguish between two cases. If L(A) = Σ, then ind(l(a) # L(C i )) = ind(c i ) + 1 < ind(c) + 1 = ind(l(a) # L(C)), satisfying the inequality. If L(A) Σ, then ind(l(a) # L(C i )) = ind(a)+ind(c i )+1 < ind(a)+ind(c)+1 = ind(l(a) # L(C)), again satisfying the inequality. Hence, L(B) = L(A) # L(C) is composite, and we are done We note the big gap between the upper bound given in Theorem 2.4 and the lower bound given in Theorem 2.5. Tightening these bounds is currently an open problem. 6

7 3. Primality Witnesses Recall that a minimal DFA A is prime iff roof (A) L(A). We define a primality witness for A as a word in roof (A) \ L(A). Clearly, a DFA A is prime iff A has a primality witness. Let A be a DFA. By the above, we can prove that A is prime by pointing to a primality witness for A. The following example utilizes this approach to introduce a family of prime languages. It complements Example 2.1. Example 3.1. Let Σ = {a} and let L k = (a k ). We show that if k is a prime power, then L k is prime. Let p, r N {0} be such that p is a prime and k = p r. Let w k = a (p+1)pr 1. Note that w k / L k. We claim that w k is a primality witness for L k, and conclude that L k is prime. Clearly, ind(l k ) = k. Consider a language L α(l). We show that w k L. Assume by contradiction that w k / L. Let A be a DFA for L. Since ind(l ) < ind(l k ) = k and w k > k, the rejecting run of A on w k must traverse at least one cycle. Let t be the length of such a cycle, and note that 0 < t < k. Note that for all i 0, we have that a it+(p+1)pr 1 is not accepted by A, and hence, a it+(p+1)pr 1 / L. On the other hand, since t 0 mod k, there exists i 0 such that it p r 1 mod k. For this value of i, we have a it+(p+1)pr 1 L \ L, and thus, L L, in contradiction to the assumption that L α(l). Therefore, w k L and we are done. As another example, consider the language L w = {w}. In Section 1 we argued that L w is prime iff w is over a single letter. We now provide the complete proof. Example 3.2. For w Σ, let L w = {w}. Assume first that w is not of the form σ n. Set w = w 1... w n. Note that ind(l) = n + 2. Let σ = w 1 and let m be the number of times that σ appears in w. Since w σ n, we have that m < n. Now, set L 1 = w and L 2 = {z Σ : The letter σ appears in z exactly m times}. Note that ind(l 1 ) = n + 1 and that ind(l 2 ) = m + 2 < n + 2. Therefore L = L 1 L 2 is a decomposition of L. For the other direction, assume that w = σ n for some σ Σ and n N {0}. We claim that L w is prime. Let l > n + 1 be a natural number such that for every p n + 1 it holds that n l mod p. The existence of l is guaranteed by the Chinese remainder theorem. Now, let z = σ l. We claim that z is a primality witness for L w. Note that ind(l w ) = n + 2. Let K α(l). Note that w K and ind(k) n + 1. By pumping, there exists 1 p n + 1 such that σ n+ip K for every i N {0}. Since n l mod p, there exists an i such that l = n + ip. Therefore, z K, proving that z is indeed a primality witness for L and therefore, that L is prime. We continue to study primality witnesses and focus on their lengths. For a DFA A, we define min witness(a) as follows. min witness(a) = min{ w : w is a primality witness for A}. Likewise, for a language L with minimal DFA A, we define min witness(l) = min witness(a). 7

8 Proposition 3.3. A prime DFA has a primality witness of length doubly exponential. PROOF. Let n = ind(a) and let A be the minimal DFA of roof (A). From Lemma 2.3, we have that ind(a ) 2 2O(n Σ log n). Using the product construction, we define a DFA A such that L(A ) = roof (A) \ L. The DFA A is a product of A and A and therefore, ind(a ) ind(a) ind(a ). Note that L(A ) contains exactly all the primality witnesses for A. Since A is prime, the language of A is not empty. Let w Σ be the shortest word in L(A ). The word w is the shortest primality witness for A. Since w is the shortest word in L(A ), the run of A on w is simple (that is, has no cycles). Therefore, w ind(a ) ind(a) ind(a n) 2O(n Σ log ) n 2 which is doubly exponential in ind(a), and we are done. Proposition 3.3 implies a naive algorithm for PRIME-DFA: Given an input DFA A, the algorithm proceeds by going over all words w Σ of length at most 2 2O(n Σ log n), and checking, for each B α(a), whether w L(B). While the algorithm is naive, it suggests that if we strengthen Proposition 3.3 to give a polynomial bound on min witness(a), we would have a PSPACE algorithm for PRIME-DFA. The question of whether such a polynomial bound exists is currently open. The following examples introduce more involved families of prime languages. Example 3.4. For n N, let K n = {ww : w Σ n } and L n = comp(k n); that is, L n = Σ \ K n. Let w n be a concatenation of all words of the form ss for s Σ n in some arbitrary order. Note that w n / L n. We prove that w n is a primality witness for L n, and conclude that L n is prime. Furthermore, min witness(l n ) is polynomial in ind(l). We first describe the Myhill-Nerode equivalence classes of L n. These consist of the following types of classes. 1. Classes of the form K n {x} for every x Σ such that 0 x < n. There are 2 n 1 such classes. 2. Classes of the form K n {x y x : x Σ and x = n k} for every 0 < k n and y Σ k. There are 2 n+1 2 such classes. 3. The single accepting sink class {x Σ : for every y Σ it holds that x y / K n }. Overall, we have ind(l n ) = 3 2 n 2. Let A n = Q, Σ, q 0, δ, F be the minimal DFA for L n. Consider a language L Σ with ind(l ) < ind(l n ) such that w n / L. To prove that w n is a primality witness for L n, we need to show that L n L. Let A = Q, Σ, q 0, δ, F be the minimal DFA for L. It is not hard to see that for every q Q other than the accepting sink, there exists a prefix x of w n such that δ(q 0, x) = q. Since ind(l ) < ind(l n ), there exist two prefixes x, y of w n such that δ(q 0, x) δ(q 0, y) but δ (q 0, x) = δ (q 0, y). Assume without loss of generality that x is shorter than y and let z be such that w n = yz. It then holds that δ (q 0, xz) = δ (q 0, w n ) / F and therefore xz / L. We claim that xz L n, implying that L n L. In order to verify that xz L n, we need to distinguish between 3 cases: 8

9 1. The case x y mod 2n. Then, xz 0 mod 2n, and therefore xz L n. 2. The case x y mod 2n and x mod 2n < n. Then, x K n w and y K n w for some 0 k < n and w, w Σ k such that w w. Assume that z = z 1 z m with z i Σ. Then it holds that z n k+1 z n = w w. Therefore, xz L n. 3. The case x y mod 2n and x mod 2n n. Then, there exist 0 k < n, w = w 1 w n Σ m and w = w 1... w n such that x K n w w 1... w k and y K n w w 1 w k and such that w k+1 w n w k+1 w n. Assume that z = z 1 z m with z i Σ. Then, it holds that z 1 z n k = w k+1 w n w k+1 w n. Therefore, xz L n. We have shown that L n L, implying that w n is indeed a primality witness for L n. Note that w n = 2 n+1 n = O(ind(L n ) log(ind(l n ))), proving that L n has primality witnesses of length polynomial in ind(l n ). Example 3.5. Consider words s = s 1 s m and w = w 1 w t, both over the same alphabet Σ. If there exists an increasing sequence of indices 1 i 1 < i 2 < < i t m such that for each 1 j t it holds that w j = s ij, we say that w is a subsequence of s. If w is a subsequence of s and w s, then we say that w is a proper subsequence of s. Let w Σ. Consider the language L w = {s Σ : w is a subsequence of s}. We claim that L w is prime, and furthermore, that min witness(l w ) 2 ind(l w ). If w = ɛ then L w = Σ and the claim is trivial. Thus, we may assume w ɛ. Let n = w and w = w 1 w n. Note that ind(l w ) = n + 1. The Myhill-Nerode equivalence classes defined by the language L w are as follows: For every 0 k < n, the set {s Σ : w 1 w k is a subsequence of s but w 1 w k+1 is not a subsequence of s} is an equivalence class. There are n such sets. The set L w itself is an equivalence class. We first prove the following claim. Claim 3.6. Let w = w 1... w n Σ with n > 0. There exists a word s Σ such that the following hold: 1. w is not a subsequence of s. 2. For every word c Σ such that c < n and c is a subsequence of w, it holds that c is a subsequence of s. Furthermore, there exists such a word s for which s < 2 w. PROOF. For each 1 i n 1, we define x i and y i as follows: x i = w i+1. { ɛ if w i = w i+1, y i = w i if w i w i+1. 9

10 Define s = x 1 y 1 x 2 y 2 x n 1 y n 1. We show that s satisfies the requirements in the lemma. We claim that for every i, the longest prefix of w that is a subsequence of x 1 y 1 x 2 y 2 x i y i is the word w 1 w i. This claim can be easily verified by an induction on i. Symmetrically, we claim that for every i, the longest suffix of w that is a subsequence of x i.y i... x n 1.y n 1 is w i+1... w n. This claim can be verified by an induction on n i. From the first claim, the longest prefix of w that is a subsequence of s is w 1... w n 1. Therefore, w is not a subsequence of s, implying that s satisfies the first requirement. Let c Σ n 1 be such that c is a subsequence of w. To show that s satisfies the second requirement, it is enough to show that c is a subsequence of s. Assume that c is of the form c = w 1 w i 1 w i+1 w n for some i. From the above claims about s, we know that w 1... w i 1 is a subsequence of x 1 y 1... x i 1 y i 1 and that w i+1... w n is a subsequence of x i y i x n y n. Therefore, c is a subsequence of s. We have shown that s satisfies both requirements. Finally, clearly, s < 2n. We now continue to complete the claim on the primality of L w. Let w = w 1... w n Σ and let s be the string guaranteed by Claim 3.6. We claim that s is a primality witness for L w. Let L α(l w ) and let A = Q, Σ, q 0, δ, F be a minimal DFA for L. Let c Σ be a proper subsequence of w. Note that s L c. Therefore, to show that s is a primality witness for L w, it is enough to show that there exists a proper subsequence c of w such that L c L. For each 0 i n, let Q i = {q Q : L(A q ) L wi+1 w n }. Note that L w = Σ L w. Hence, every reachable state in A is in Q 0. Since A is minimal, all states are reachable and so Q = Q 0. Note that for every 1 i n, and for every word x that contains the letter w i, it holds that δ(q i 1, x) Q i. Now, for every 0 i < n let R i = Q i \ Q i+1 and R n = Q n. The sets R 0... R n are a partition of Q into n + 1 disjoint sets. Since A has at most n states, one of these sets must be empty. It is impossible to have R n = Q n = since δ(q 0, w) R n. Therefore, there exists 0 i < n such that R i =. Set c = w 1... w i w i+2... w n. We now claim that L c L. Let x L c. We can write x as a concatenation of words x = x 1 x 2 x i x i+2 x n such that for each j i + 1, the word x j contains the letter w i. Set q = δ(q 0, x 1 x 2 x i ). It holds that q Q i, but since R i =, this implies q Q i+1. Therefore, L(A q ) L wi+2...w n, so x i+2... x n L(A q ). Hence, x L(A) = L as required. We have shown that L L c, which proves that s is indeed a witness for the primality of L w. Furthermore note that s is of length at most 2n + 1 < 2 ind(l w ). 4. The Width of a Decomposition Languages that can be decomposed into two factors have been studied in [3], where the question of whether one may need more than two factors was left open. In this section we answer the question positively. Formally, we have the following. Let A be a DFA. If there exist DFAs A 1, A 2... A m α(a) such that A = m i=1 L(A i), we say that A is m-factors composite. Assume that A is composite. Then, width(l) is defined as the minimal m such that A is m-factors composite. Clearly, for every composite A, 10

11 it holds that width(a) 2. The question left open in [3] is whether there exists a composite A such that width(l) > 2. Such a language is presented in the following example. Example 4.1. Let Σ = {a, b, c} and let L = {w Σ : w is a prefix of a word in c.(a +.b +.c + ) }. We show that the language L is composite with width(l) = 3. Consider A, the minimal DFA for L, described in Figure 1. A decomposition of A is given by the DFAs described in Figure 2. Note that all three factors are of index 3, while A is of index 4. Missing edges lead to a rejecting sink, hence the additional state in the index. b c q 3 b c q 1 q 2 a a Figure 1: The DFA A is of width 3. a b b, c q 1 q 2 a a, c q 1 q 2 b c c a, b q 1 q 2 c a b Figure 2: A decomposition for A. It can be verified by a case-by-case analysis that L is not 2-factor composite. Example 4.1 motivates us to conjecture that the width of composite languages is unbounded. That is, that width induces a strong hierarchy on the set of composite languages. On the other hand, given a composite L Σ, we wish to provide an upper bound on width(l). We conjecture that there exists a polynomial f such that width(l) f(ind(l)) for some polynomial f. If this is true, the algorithm given in the proof of Theorem 2.4 can be improved to a PSPACE algorithm by going over all subsets of D α(a) such that D f(ind(a)), and checking for each such D whether B D L(B) = L(A). 11

12 5. Structural Properties Consider a minimal DFA A and a DFA B α(a). Recall that L(A) L(B) and ind(b) < ind(a). Thus, intuitively, in B, fewer states have to accept more words. In this section we examine whether this requirement on B can be of help in reasoning about possible decompositions. The DFA A = Q, Σ, q 0, δ, F induces a directed graph G A = Q, E, where E = {(q, q ) : σ Σ such that δ(q, σ) = q }. The strongly connected components (SCCs) of G A are called the SCCs of A. We refer to the directed acyclic graph (DAG) induced by the SCCs of G A as the SCC DAG of A. A leaf in G A is a SCC that is a sink in this DAG. A DFA A is said to be strongly connected if it consists of a single SCC. Let A = Q, Σ, q 0, δ, F and B = S, Σ, s 0, η, G be DFAs. Let q Q and s S. If there exists a word w Σ such that δ(q 0, w) = q and η(s 0, w) = s, then we say that q touches s, denoted q s. Obviously, this is a symmetric relation. Lemma 5.1. Let A and B be DFAs such that L(A) L(B) and let q and s be states of A and B, respectively, such that q s. Then, L(A q ) L(B s ). PROOF. Let A = Q, Σ, q 0, δ, F and B = S, Σ, s 0, η, G be DFAs. Let w Σ be such that δ(q 0, w) = q and η(s 0, w) = s. Let x L(A q ). Since w x L(A) and L(A) L(B), we have that w x L(B), and thus, x L(B s ). Hence, L(A q ) L(B q ). For each s S, consider the subset of Q consisting of the states that touch s. Recall that S < Q. Intuitively, if one attempts to design B so that L(B) over-approximates L(A) as tightly as possible, one would try to avoid, as much as possible, having states in S that touch more than one state in Q. However, by the pigeonhole principle, there must be a state s S that touches more than one state in Q. The following lemma provides a stronger statement: There must exist a non-empty set Q Q relative to which the DFA B is confused when attempting to imitate A. Lemma 5.2. Let A = Q, Σ, q 0, δ, F and B = S, Σ, s 0, η, G be minimal DFAs such that B α(a). Then, there exists a non empty set Q Q such that for every q 1 Q and s S with q 1 s, there exists q 2 Q such that q 1 q 2 and q 2 s. PROOF. Since A and B are minimal, all states in Q and S are reachable. Therefore, every state in Q touches some state in S. We now construct two sequences Q 0, Q 1, Q 2... and S 0, S 1, S 2... as follows. First, Q 0 = Q and S 0 = S. Now, if there exist q Q i and s S i, for i 0, such that s touches q and s touches no other state in Q i, we set Q i+1 = Q i \ {q} and S i+1 = S i \ {s}. We repeat this step until no such q and s exist. If no such q and s exist, we are done. This process eventually terminates, since the sequence Q i is decreasing. Assume that the process terminates after m iterations. Set Q = Q m. We claim that Q satisfies the lemma s conditions. First, note that since S < Q, then for every i >= 0 it holds that S i < Q i and therefore Q. 12

13 Now, let q 1 Q and let s S such that s touches q 1. If there exists i such that s S i \ S i+1, then there exists q Q i \ Q i+1 such that s touches q but doesn t touch any other member of Q i. However, q 1 Q i is also touched by s, leading to contradiction. Therefore, s S m, and since the process halts after m iterations, there must exist q 2 Q such that s touches q 2. This proves that Q satisfies the lemma s conditions. We are going to use Lemma 5.2 in our study of primality of classes of DFAs. We start with safe and co-safe DFAs. A language L Σ is a safety language if for every w / L there exists a prefix x of w such that x y / L for all y Σ. A language L Σ is a co-safety language if comp(l) is a safety language. That is, L is co-safety if for every w Σ there exists a prefix x of w such that x y L for all y Σ. The word x is called a good prefix of L [1]. Let A = Q, Σ, q 0, δ, F be a minimal DFA such that L(A) Σ. It is easy to see that L(A) is co-safety iff A has a single accepting state s, which is an accepting sink. Obviously, the singleton {s} is a SCC of A. If Q \ {s} is a SCC, we say that A is a simple co-safety DFA. For example, it is not hard to see that for all w Σ, if Σ > 2, then the language L w of all words that have w as a subword is such that the minimal DFA for L w is a simple co-safety DFA. Example 5.3. Fix Σ such that Σ 3. For w Σ, let L w = {v Σ : w is a substring of v}. It is easy to verify that for all w Σ, the minimal DFA for L w is a simple co-safety DFA. We prove that simple co-safe DFAs are prime, and they have short primality witnesses. We first need the following lemma. Lemma 5.4. Let A be a simple co-safe DFA of index k. For every word x Σ such that x / L(A) and DFA B α(a), there exists a word y B Σ such that y B < k 2 +k, x y B / L(A), and x y B Σ L(B). Furthermore, given x Σ, it is possible to choose the words y B for x and for each B α(a) so that {y B : B α(a)} k 2. PROOF. Let A = Q, Σ, q 0, δ, F. Denote the two SCCs of A by C 1 and C 2, where C 2 is the singleton accepting sink. Assume that B = S, Σ, s 0, η, G. Let w L(A). Note that for every y Σ it holds that w y L(A). Since L(A) L(B), it also holds that w y L(B). Therefore, the state η(s 0, w) is an accepting sink of B. Since B is minimal, it only has a single accepting sink. Let s S be this accepting sink. Let Q Q be the set of states of A guaranteed by Lemma 5.2, with respect to B. Note that Q 2. Let q Q be such that the set L(A q ) is minimal relative to the containment partial order. For every q q Q it holds that L(A q ) L(A q ). Obviously, q C 1. Given x, let q = δ(q 0, x). Since x / L(A), we have q C 1. Since q C 1, there exists a word w Σ such that δ(q, w) = q and w k 2 = C 1 1. Let s = η(s 0, x w). Note that s q, and therefore, there exists a state q Q such that q q and s q. Furthermore, there exists z L(A q ) \ L(A q ). Let y B = wz and we have δ(q 0, x y B ) = δ(q 0, x w z) = δ(q, w z) = δ(q, z) / F. 13

14 Therefore, x y B / L(A). Let x Σ be a word such that δ(q 0, x ) = q and η(s 0, x ) = s. Note that δ(q 0, x z) = δ(q, z) F. Therefore, x z L(A). It follows that η(s 0, x y) = η(s 0, x w z) = η(s, z) = η(s, x z) = s. Therefore, x y B Σ L(B) as required. We now show that there exists z L(A q ) \ L(A q ) such that z < k 2, and conclude that y B = w + z < k 2 + k. Let z be a string of minimal length in L(A q ) \ L(A q ). Let z = z 1 z t. Assume by way of contradiction that t k 2. For each 0 i t, define a i = δ(q, z 1 z i ) and b i = δ(q, z 1... z i ). For every i, we have that (a i, b i ) Q 2 and Q 2 = k 2. Since t k 2, there exist 0 i < j t such that (a i, b i ) = (a j, b j ), but then we have z 1 z i z j+1 z t L(A s ) \ L(A s ), contradicting the minimality of z. Therefore, z < k 2 and y B < k 2 + k. Note that y B is only dependent on q and q. Therefore, by making y B a function of q and q, we have {y B : A α(a)} Q Q = k 2. Theorem 5.5. Every simple co-safe DFA is prime with a primality witness of polynomial length. PROOF. Given a simple co-safe DFA A, we define the primality witness for L(A) as w = y 0 y 1 y t, where y 0, y 1,..., y t is the finite sequence of words over Σ defined by the following algorithm: 1. Let D 0 = α(a). 2. Let y 0 = ɛ. 3. To define y i+1, set x = y 0 y 1 y i. Let B D i. By Lemma 5.4, there exists a word y B Σ such that x y B / L(A) and x y B Σ L(B). Furthermore, it is possible to choose the words y B such that {y B : B α(a)} k 2. Therefore, there exists a subset D i D i such that D i Di k and y 2 B is identical for every B D i. Let y i+1 be this y B, and let D i+1 = D i \ D i. 4. Repeat Step 3, increasing i, until D i =. We claim that w is a primality witness for L(A). Let B α(l). There exists 0 i t such that B D i. Note that y i+1 = y B. Therefore, y 0 y 1 y i+1 Σ L(B). Since w y 0 y 1 y i+1 Σ, we have w L(B). On the other hand, w / L(A), making it a primality witness. Now, note that for every i it holds that Di+1. Therefore, k 2 D i k2 1 k 2 ( k 2 1 )t α(a) 2 O(k log k Σ ). Hence, t is bounded by a polynomial in k. Now, note that for every i it holds that y i k 2 + k. Therefore, w t (k 2 + k) is also bounded by a polynomial in k. The requirement on A being simple is essential. For example, consider the DFA A appearing in Figure 3. While A is co-safe, it is not simple: It has two SCCs C 1 and C 2 such that no path exists either from C 1 to C 2 or from C 2 to C 1. In such a case, A can be decomposed into DFAs A 1 and A 2 as in the example in Figure 3. On the other hand, consider a non simple co-safety DFA in which the SCCs are linearly ordered by a path that traverses all of them. In this case, the question of primality remains open. 14

15 b C 2 C 0 C 3 C a 1 Σ 1 C 0 C 1 C 3 a b Σ 2 b C 2 C 0 C 3 a Σ Figure 3: The DFA A can be decomposed into A 1 and A 2. Our main result about the structural properties of composite DFAs shows that strong connectivity can be carried over to the DFAs in the decomposition. Intuitively, let A i be a member of a decomposition of A, and let B i be a DFA induced by a leaf of the SCC graph of A i. We can replace A i by B i and still get a valid decomposition of A. Formally, we have the following. Theorem 5.6. Let A be a strongly connected composite DFA. Then, A can be decomposed using only strongly connected DFAs as factors. PROOF. Let A = Q, Σ, q 0, δ, F. Since A is composite, there exist A 1... A m α(a) such that L(A) = m i=1 L(A i). Let A i = Q i, Σ, q0, i δ i, F i. We recursively define a sequence of words w 1... w m as follows. Let q i = δ i (q0, i w 1 w i 1 ). Consider the SCC DAG of A i. Let w i be a word such that the state δ i (q i, w i ) is in a leaf of the SCC DAG. Now, let q = δ(q 0, w 1 w m ), and let w be a string that satisfies δ(q, w ) = q 0. Such a string exists since A is strongly connected. Finally, define w = w 1 w 2 w m w. Note that δ(q 0, w) = q 0. Let 1 i m. Note that the run of A i on the string w, ends inside a leaf of the SCC DAG of A i. Let B i be a DFA consisting only of the states of that leaf, with the initial state δ i (q0, i w). Note that B i is strongly connected, and that ind(b i ) ind(a i ). We claim that L(A) = m i=1 L(B i), showing that A can be t-decomposed into strongly connected DFAs. Let z Σ. Recall that δ(q 0, w) = q 0. Therefore, z L(A) iff w z L(A). Recall that the initial state of B i is δ i (q i 0, w). Therefore, for every 1 i m, it holds that w z L(A i ) iff z L(B i ). We conclude that z L(A) iff z m i=1 L(Bi ), and we are done. We find the result surprising, as strong connectivity significantly restricts the overapproximating DFAs in the decomposition. 15

16 6. Simple Decompositions Consider the task of decomposing a DFA A. A natural approach is to build the factors of A by merging states of A into equivalence classes in such a manner that the transition function of A respects the partition of its states into equivalence classes. 4 The result of such a construction is a DFA B that contains A and still has fewer states. If A has a decomposition into factors constructed by this approach, we say that A is simply-composite. In this section, we formally define the concept of a simple decomposition and investigate its properties. In particular, we show that it is computationally easy to check whether a given DFA is simply-composite. Consider DFAs A = Q, Σ, q 0, δ, F and B = S, Σ, s 0, η, G. Recall that q s if there exists w Σ such that δ(q 0, w) = q and η(s 0, w) = s. We say that B is an abstraction of A if for every q Q there exists a single s S such that q s. An abstraction B of A is called a miser abstraction of A if L(A) L(B), and the set G of B s accepting states cannot be reduced retaining the containment. That is, for all s G, the DFA B s = S, Σ, s 0, η, G \ {s} is such that L(A) L(B s ). It is not hard to see that L(A) L(B) iff for every q F and s S such that q s, it holds that s G. The above suggests a simple criterion for fixing the set of accepting states required for an abstraction to be miser. Simple decompositions of a DFA consists of miser abstractions. Let L Σ. Clearly, if L is simply-composite, then it is composite. The opposite is not necessarily true, as shown by the following example. Example 6.1. Let A be the minimal DFA for the singleton language { ab }. According to Example 3.2, the DFA A is composite. However, one can go over all the miser abstractions of its 4-state DFA and verify that A is not simply-composite. q a 0 q b 1 q 2 a {q 0} {q 1} b a a, b b a, b q 3 {q 2, q 3} a, b a, b Figure 4: The language of the DFA A is { ab }. The DFA B is a miser abstraction of A, obtained by merging the states q 2 and q 3. Had we changed the accepting state set of B, it would have been an abstraction of A, but not a miser abstraction. Consider a DFA A = Q, Σ, q 0, δ, F and t N. We use γ t (A) to denote the set of all miser abstractions of A with index at most t. Let ceiling t (A) = B γ t(a) L(B). 4 Note that a naive merge of states introduces nondeterminism. Here, merging states implies also a merge of their successors, thus determinism is retained. 16

17 Note that A is simply decomposable iff L(A) = ceiling t (A) for t = ind(a) 1. Our next goal is an algorithm that decides whether a given DFA is simply-composite. For t N, a t-partition of Q is a set P = {Q 1,..., Q t } of t t nonempty and pairwise disjoint subsets of Q whose union is Q. For q Q, we use [q] P to refer to the set Q i such that q Q i. Definition 6.1. Let A = Q, Σ, q 0, δ, F be a DFA. A t-partition P of Q is a good t-partition of A if δ respects P. That is, for every q Q, σ Σ, and q [q] P, we have that δ(q, σ) [δ(q, σ)] P. A good t-partition of A induces a miser abstraction of it. In the other direction, each abstraction of A with index at most t induces a t-partition of A. Formally, we have the following. Lemma 6.2. There is a one-to-one correspondence between the DFAs in γ t (A) and the good t-partitions of A. PROOF. Consider a DFA A = Q, Σ, q 0, δ, F such that every state of A is reachable. Let P be a good t-partition of A. Define B P = P, Σ, [q 0 ] P, η, G, where η is naturally induced by δ and G = {Q i P : Q i F }. Note that B P is a miser abstraction of A and furthermore, that B P γ t (A). It is not hard to verify that the function from P to B P is one to one and onto γ t (A). Let A = Q, Σ, q 0, δ, F be a DFA and let q Q \ F. Let P be a good partition of A. If [q] P F =, we say that P is a q-excluding partition. Lemma 6.3. Let A = Q, Σ, q 0, δ, F be a DFA such that every state of A is reachable and F Q. Then, the DFA A is t-simply-decomposable iff for every state q Q \ F there exists a q-excluding t-partition. PROOF. Assume that for every q Q\F, there exists a q-excluding t-partition P q. Let B q γ t (A) be the DFA associated with the good partition P q as in Lemma 6.2. Note that ind(b q ) = P q t. It is straightforward to verify that L(A) = q Q\F L(B q) and therefore, that A is t-simply-decomposable. For the other direction, assume that A is t-simply-decomposable. Then, L(A) = B γ t(a) L(B). Let q Q \ F and let w Σ be such that δ(q 0, w) = q. Note that such a word w exists since every state ofa is reachable. Since w / L(A), there exists a DFA B = S, Σ, s 0, η, G γ t (A) such that w / L(B). Let P be the good partition of A defined by B. Note that P ind(b) t. Let s S be the unique state of B satisfying s q. Since γ t (A) contains only abstractions of A, the state is indeed unique. Note that η(s 0, w) = s, thus s / G, which implies that [q] P F =. Therefore, the partition P is a q-excluding partition, and we are done. For two partitions P and P of Q, we say that P is a refinement of P if for every R P there exist sets R 1... R s P such that R = s i=1 R i. Let q, q Q be such that q q. Let P be a good partition of A. If [q] P = [q ] P, we say that P joins q and q. 17

18 It is not hard to see that good partitions are closed under intersection. That is, if P and P are good partitions, so is the partition that contains the intersections of their members. Thus, there exists a unique good partition of A, denoted P q,q such that if P is a good partition of A that joins q and q, then P q,q is a refinement of P. The following lemma states that the partition P q,q can be computed efficiently. Lemma 6.4. Given a DFA A = Q, Σ, q 0, δ, F and q, q Q such that q q, the partition P q,q can be computed in polynomial time. PROOF. We present a polynomial time algorithm to compute P q,q. The algorithm iteratively computes a sequence P 0, P 1... P m of partitions of A. In the partition P 0, the states q and q belong to the same set, and all other sets are singletons. In the i-th iteration, the algorithm searches for s, s Q and σ Σ such that [s] Pi = [s ] Pi but [δ(s, σ)] Pi [δ(s, σ)] Pi. If such s, s, and σ exist, the partition P i+1 is identical to the partition P i, except for the sets [δ(s, σ)] Pi and [δ(s, σ)] Pi, which are replaced by their union in P i+1. If such s, s, and σ do not exist, the algorithm sets m = i and returns P m as P q,q. First, note that the stopping condition of the algorithm implies that the partition P m is a good partition of A. Let P be a good partition of A such that [q] P = [q ] P. Let s, s Q be such that [s] Pm = [s ] Pm. We claim that [s] P = [s ] P. This can be easily proven by induction on the minimal index i such that [s] Pi = [s ] Pi. We have shown that P m is a refinement of P. Therefore, P = P q,q. To see that the algorithm runs in polynomial time, note that m < Q and that each iteration runs in time polynomial in Q and Σ. Using Lemmas 6.3 and 6.4, we can now prove this section s main result. Theorem 6.5. Given a DFA A, it is possible to decide in polynomial time whether A is simply-composite. PROOF. Let A = Q, Σ, q 0, δ, F and k = ind(a). According to Lemma 6.3, the DFA A is simply-composite iff for every s Q \ F there exists an s-excluding (k 1)- partition. Let P be a good (k 1)-partition. Since A has k states, there must exist q, q Q such that q q and [q] P = [q ] P. The partition P q,q is thus a refinement of P. Therefore, if P is an s-excluding partition then P q,q is an s-excluding partition. On the other hand, note that P q,q k 1. Therefore, there exists an s-excluding (k 1)-partition iff there exists an s-excluding partition of the form P q,q for some q, q Q. We now present a polynomial time algorithm to decide whether A is simply-composite. 1. For each q, q Q, proceed as follows. (a) Compute P q,q. (b) For every s Q \ F, if P q,q is an s-excluding partition, then mark s. 2. If all members of Q \ F have been marked, then return true. Otherwise, return false. The correctness of the algorithm is obvious from the above. Note that P q,q can be computed in polynomial time based on Lemma 6.4, thus the algorithm runs in polynomial time. 18

19 Note that the above algorithm can be generalized to decide whether A is (ind(a) t)-simply-decomposable in time exponential in t. 7. Algebraic Approach In this section we use and develop concepts from the algebraic approach to automata in order to study the DFA primality problem. Utilizing this approach, we associate with every DFA A, a monoid M(A), and relate the properties of M(A) with the primality of A. We then focus on permutation DFAs those for which the transition monoid is a group. By analyzing their transition monoid, we are able to show a simpler algorithm for checking the primality of permutation DFAs and to prove that composite permutation DFAs can be decomposed into permutation DFAs Definitions and notations A semigroup is a set S together with an associative binary operation : S S S. A monoid is a semigroup S with an identity element e S such that for every x S it holds that e x = x e = x. Let S be a monoid with an identity element e S and let A S. Let A be the smallest monoid such that A A and e A. If A = S we say that A generates S. Consider a DFA A = Q, Σ, q 0, δ, F. For w Σ, let δ w : Q Q be such that for every q Q, we have δ w (q) = δ(q, w). For w 1, w 2 Σ, the composition of δ w1 and δ w2 is, as expected, the operation δ w2 w 1 : Q Q with δ w2 w 1 (q) = δ(q, w 2 w 1 ) = δ w1 (δ w2 (q)). The set {δ w : w Σ }, equipped with the composition binary operation, is a monoid called the transition monoid of A, denoted M(A). Its identity element is δ ɛ, denoted id. Note that M(A) is generated by {δ σ : σ Σ} A monoid-driven characterization of primality The following theorem and its corollary show that in order to decide whether a DFA is composite, we only need to know its state set, set of accepting states, and transition monoid. Thus, interestingly, changing the transition function or even the alphabet does not affect the composability of a DFA as long as the transition monoid remains the same. Theorem 7.1. Let A and A be two DFAs with the same set of states, initial state, and set of accepting states. If M(A) = M(A ), then depth(a) = depth(a ). PROOF. We prove that M(A) M(A ) implies that depth(a) depth(a ). We then conclude that M(A) = M(A ) implies that depth(a) = depth(a ). We parametrize the definition of α(a) by a bound t on the index of the DFAs in it. Thus, α t (A) = {B : B is a minimal DFA with ind(b) t and L(A) L(B)}. Let A = Q, Σ, q 0, δ, F and A = Q, Σ, q 0, δ, F. Consider the set δ Σ = {δ σ : σ Σ} that generates M(A). Since M(A) M(A ), it holds that δ Σ M(A ). Therefore, there exists an encoding function f : Σ Σ such that for every σ Σ, we have δ σ = δ f(σ). We can extend the function f to the domain Σ as follows: For w = w 1 w m with w i Σ, define f(w) = m i=1 f(w i). It is trivial to verify that for 19

Variable Automata over Infinite Alphabets

Variable Automata over Infinite Alphabets Orna Grumberg a, Orna Kupferman b, Sarai Sheinvald b a Department of Computer Science, The Technion, Haifa 32000, Israel b School of Computer Science and Engineering,