Simple equations on binary factorial languages

A. E. Frid

Sobolev Institute of Mathematics SB RAS, Koptyug av., 4, 630090 Novosibirsk, Russia. E-mail: frid@math.nsc.ru

Abstract

We consider equations on the monoid of factorial languages on the binary alphabet. We use the notion of a canonical decomposition of a factorial language and previous results by Avgustinovich and the author to solve several simple equations on binary factorial languages, including $X^n = Y^n$, the commutation equation $XY = YX$ and the conjugacy equation $XZ = ZY$. At the end of the paper we discuss the difficulties that hinder reducing equations on factorial languages to equations on words and enlarging the alphabet considered.

Key words: language equations, commutation, conjugacy, catenation of languages, monoid of factorial languages, canonical decompositions

1 Introduction

Language equations constitute an extensively developing and highly non-trivial area. Their behaviour differs strikingly from that of word equations and is much more complicated [10]. Even if we restrict ourselves to single equations involving catenation as the only operation, many intricate effects appear.

As an example, consider the commutation equation. On words, it is easy to solve completely: if $x$ and $y$ are finite words, we have $xy = yx$ if and only if $x = z^n$ and $y = z^m$ for some word $z$ and some non-negative integers $n$ and $m$.

Supported in part by RFBR. Preprint submitted to Elsevier Science, 26 September 2008.

However, on languages, the commutation equation becomes very difficult to solve. In particular, much attention has been paid to the centralizer of a language, that is, the maximal language commuting with it: the centralizer always exists since the set of languages commuting with a given one is closed under union. Conway [3] conjectured in 1971 that the centralizer of a rational language is rational. However, this conjecture was disproved by Kunc [9] in a very strong sense: the centralizer of a finite language may fail to be recursively enumerable. At the same time, positive partial results are known for prefix codes [12], codes [7] and languages with at most three elements [8].

In this paper we consider three simple equations on binary factorial languages. A language is called factorial if it contains all factors of each of its elements. The study of the monoid of factorial languages was started by S. V. Avgustinovich and the author in [1], where a theorem of existence and uniqueness of a canonical decomposition of a factorial language was proved. Note that no similar result is possible for languages in general [13]. Then we showed that languages occurring in the canonical decomposition of a regular factorial language are always regular [2] and investigated possible forms of the canonical decomposition of the catenation of languages [5]. This latter result allowed us to develop a technique for solving some simple equations on binary factorial languages, and that is what we do in this paper. Problems arising when we try to consider languages on a larger alphabet or to solve longer equations are described in the last section of this paper.

The results concerning commutation have been reported at DLT 2007 [6]. The results on the first equation, $X^n = Y^n$, and on conjugacy are new.

2 Canonical decompositions

Let $\Sigma$ be a finite alphabet. A language is an arbitrary subset of the set $\Sigma^*$ of all finite words on $\Sigma$. The empty word is denoted by $\lambda$.
A word $v$ is called a factor of a word $u$ if $u = svt$ for some words $s$ and $t$ (which can be empty). In particular, $\lambda$ is a factor of any word. The factorial closure $\mathrm{Fac}(L)$ of a language $L$ is the set of all factors of all its elements. Clearly, $\mathrm{Fac}(L) \supseteq L$. If $\mathrm{Fac}(L) = L$, that is, if $L$ is closed under taking factors, we say that $L$ is a factorial language. Typical examples of factorial languages include the set of factors of a finite or infinite word, the set of words avoiding a pattern, etc. Clearly, the factorial closure of an arbitrary language is a factorial language; if the initial language is regular, so is its factorial closure. The family of factorial languages is closed
under taking union, intersection and catenation; here the catenation of languages is defined naturally as $L_1 L_2 = \{u_1 u_2 \mid u_1 \in L_1, u_2 \in L_2\}$. Factorial languages equipped with catenation constitute a submonoid of the monoid of all languages, and its unit is the language $\{\lambda\}$. We are interested in properties of this submonoid.

A factorial language $L$ is called indecomposable if $L = L_1 L_2$ implies $L = L_1$ or $L = L_2$ for any factorial $L_1$ and $L_2$. In particular, we have the following

Lemma 1 [1] For each alphabet $\Sigma$, the language $\Sigma^*$ is indecomposable.

Other examples of indecomposable languages include $a^* + b^*$ with $a, b \in \Sigma$, and languages of factors of any recurrent infinite word. (Here and below $(+)$ denotes the union of languages.)

A decomposition $L = L_1 \cdots L_k$ of a factorial language $L$ into a catenation of factorial languages is called minimal if $L \neq L_1 \cdots L_{i-1} L_i' L_{i+1} \cdots L_k$ for any factorial language $L_i' \subsetneq L_i$. A minimal decomposition into indecomposable languages is called canonical. The following theorem is the starting point of our technique.

Theorem 1 [1] For each factorial language $L$, a canonical decomposition exists and is unique.

Example 1 If $L$ is indecomposable, its canonical decomposition is just $L = L$. The canonical decomposition of the language $a^* b^* + b^* a^*$ is $(a^* + b^*)(a^* + b^*)$.

In what follows, the canonical decomposition of a factorial language $L$ is denoted by $\overline{L}$. A canonical decomposition can be interpreted as a word on the infinite alphabet $F$ of all indecomposable factorial languages (although not all words on that alphabet are allowable canonical decompositions). We write $\overline{L_1} \doteq \overline{L_2}$, or simply $L_1 \doteq \overline{L_2}$, to mark that the canonical decompositions are equal. Clearly, $L_1 \doteq \overline{L_2}$ if and only if $L_1 = L_2$, and this is our main tool. We should also know what happens to the canonical decomposition when we catenate languages: given $\overline{L_1}$ and $\overline{L_2}$, how can we describe $\overline{L_1 L_2}$? The answer has been described in [5], and to state it we need more notation.
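The basic notions of this section can be tried out on finite languages. The following is only a minimal sketch, assuming languages are stored as Python sets of strings with `""` playing the role of $\lambda$; the helper names `fac`, `cat` and `is_factorial` are illustrative and do not come from the paper.

```python
def fac(language):
    """Factorial closure Fac(L): all factors of all words of a finite language."""
    return {w[i:j]
            for w in language
            for i in range(len(w) + 1)
            for j in range(i, len(w) + 1)}


def cat(l1, l2):
    """Catenation L1 L2 = {u1 u2 | u1 in L1, u2 in L2}."""
    return {u + v for u in l1 for v in l2}


def is_factorial(language):
    """A language is factorial when it equals its own factorial closure."""
    return fac(language) == set(language)


L1 = fac({"ab"})   # the factors of the word ab
L2 = fac({"ba"})   # the factors of the word ba
UNIT = {""}        # the language {lambda}, the unit of the monoid

print(sorted(L1))
print(is_factorial(cat(L1, L2)))   # catenation preserves factoriality
print(cat(L1, UNIT) == L1)         # {lambda} acts as the unit
```

Only finite languages can be handled this way; infinite languages such as $a^* + b^*$ need a different representation.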
3 Preliminary facts

For a factorial language $L$, we define the subalphabets $\Pi(L) = \{x \in \Sigma \mid Lx \subseteq L\}$ and $\Delta(L) = \{x \in \Sigma \mid xL \subseteq L\}$.

So, $\Pi(L)$ is defined as the greatest subalphabet such that each word from $L$ can be extended to the right by any letter of $\Pi(L)$; and $\Delta(L)$ is defined symmetrically in the left direction.

Remark 1 If $\Pi(L) = \Sigma$ or $\Delta(L) = \Sigma$, then clearly $L = \Sigma^*$. If $\Sigma$ is the binary alphabet, $\Sigma = \{a, b\}$, this implies that $\Pi$ and $\Delta$ of any language not equal to $\Sigma^*$ can be equal to $\{a\}$, $\{b\}$, or $\emptyset$.

Example 2 If $L = a^* b^*$, then $\Delta(L) = \{a\}$ and $\Pi(L) = \{b\}$. If $L = a^* + b^*$, then $\Delta(L) = \Pi(L) = \emptyset$. We also have $\Delta(L) = \Pi(L) = \emptyset$ for each finite language $L$.

Lemma 2 [5] If $X \doteq X_1 \cdots X_n$, $X_i \in F$, then $\Pi(X) = \Pi(X_n)$ and $\Delta(X) = \Delta(X_1)$.

Now, given a factorial language $X$ and a subalphabet $\Delta$, let the operators $L_\Delta$ and $R_\Delta$ on factorial languages be defined by $L_\Delta(X) = \mathrm{Fac}(X \setminus \Delta X)$ and $R_\Delta(X) = \mathrm{Fac}(X \setminus X\Delta)$. The meaning of these sets is described by the following lemma.

Lemma 3 [5] For factorial languages $X$ and $Y$ we have $R_{\Delta(Y)}(X)\,Y = XY$, and $R_{\Delta(Y)}(X)$ is the minimal factorial set with this property: it is equal to the intersection of all factorial languages $Z$ such that $ZY = XY$. Symmetrically, $Y L_{\Pi(Y)}(X) = YX$, and $L_{\Pi(Y)}(X)$ is the minimal factorial language with this property.

Note that $Y = \Sigma^*$ implies that $XY = Y$ for all $X$, and $\Delta(Y) = \emptyset$ implies that for all $X$, the minimal language giving $XY$ when catenated with $Y$ is $X$ itself. So, in the binary case the situation is non-trivial only if $\Delta(Y) = \{x\}$ for some symbol $x$. In what follows we write $R_x$ and $L_x$ instead of $R_{\{x\}}$ and $L_{\{x\}}$ for a symbol $x \in \Sigma$.

Let us list several straightforward properties of the operators $L$ and $R$.

Lemma 4 Let $X$ be a binary factorial language on $\Sigma = \{a, b\}$. Then for any symbol $x \in \Sigma$ we have $\{\lambda\} = L_\Sigma(X) \subseteq L_x(X) \subseteq L_\emptyset(X) = X$ and $\{\lambda\} = R_\Sigma(X) \subseteq R_x(X) \subseteq R_\emptyset(X) = X$.

Lemma 5 For all factorial languages $X \subseteq \Sigma^*$ and subalphabets $\Delta \subseteq \Sigma$, the equality $R_\Delta(R_\Delta(X)) = R_\Delta(X)$ holds.

Lemma 6 For each symbol $x$ and each factorial language $X$ we have $X x^* = R_x(X) x^*$ and $x^* X = x^* L_x(X)$.
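The subalphabets $\Pi(L)$ and $\Delta(L)$ of Example 2 can be probed numerically. The sketch below is an assumption-laden illustration, not part of the paper: a language is given by a membership predicate built from a regular expression, and the conditions $Lx \subseteq L$ and $xL \subseteq L$ are tested only on words of length at most `max_len`. A letter rejected by this bounded test is certainly outside $\Pi(L)$ or $\Delta(L)$; a letter accepted by it is only a candidate. The names `pi_bounded` and `delta_bounded` are hypothetical.

```python
import re
from itertools import product


def words(max_len, alphabet="ab"):
    """Yield all words over the alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def regex_language(pattern):
    """Membership predicate of the language described by a regular expression."""
    compiled = re.compile(pattern)
    return lambda w: compiled.fullmatch(w) is not None


def pi_bounded(member, max_len, alphabet="ab"):
    """Letters x with wx in L for every w in L of length <= max_len."""
    return {x for x in alphabet
            if all(member(w + x) for w in words(max_len, alphabet) if member(w))}


def delta_bounded(member, max_len, alphabet="ab"):
    """Letters x with xw in L for every w in L of length <= max_len."""
    return {x for x in alphabet
            if all(member(x + w) for w in words(max_len, alphabet) if member(w))}


a_star_b_star = regex_language(r"a*b*")        # L = a* b*
a_star_or_b_star = regex_language(r"a*|b*")    # L = a* + b*

print(delta_bounded(a_star_b_star, 6), pi_bounded(a_star_b_star, 6))
print(delta_bounded(a_star_or_b_star, 6), pi_bounded(a_star_or_b_star, 6))
```

For these two languages the bounded test agrees with Example 2, since the rejecting witnesses ($ba \notin a^* b^*$, and $ab, ba \notin a^* + b^*$) are already short.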

Lemma 7 [5] Let $X$ be a factorial language with $X \doteq X_1 \cdots X_k$ and $\Delta(X) = \{x\}$. Then $L_x(X) \doteq X_2 \cdots X_k$ if $X_1 = x^*$, and $L_x(X) = X$ otherwise. The symmetric statement for $\Pi(X)$ and $X_k$ also holds.

The following theorem, proved for an arbitrary alphabet, is the main result of [5]. Here we reformulate it for the binary alphabet to simplify reading.

Theorem 2 [5] Let $A$ and $B$ be binary factorial languages with $A \doteq A_1 \cdots A_k$ and $B \doteq B_1 \cdots B_m$, where $A_i, B_j \in F$. Then the canonical decomposition of $AB$ is the following:
(1) If $A = \{a, b\}^*$ or $B = \{a, b\}^*$, then $AB \doteq \{a, b\}^*$;
(2) If $\Pi(A) = \emptyset$ and $\Delta(B) = \{x\}$ for a symbol $x$, then $AB \doteq \overline{R_x(A)}\,\overline{B}$; symmetrically, if $\Pi(A) = \{x\}$ and $\Delta(B) = \emptyset$, then $AB \doteq \overline{A}\,\overline{L_x(B)}$;
(3) If $\Delta(B) = \Pi(A) = \{x\}$, then $AB \doteq A_1 \cdots A_{k-1}\,\overline{B}$ when $A_k = x^*$, and symmetrically, $AB \doteq \overline{A}\,B_2 \cdots B_m$ when $B_1 = x^*$. Note that the situation of $A_k = B_1 = x^*$ falls into both cases. If $A_k \neq x^*$ and $B_1 \neq x^*$, then $AB \doteq \overline{A}\,\overline{B}$.
(4) If $\Pi(A) = \{x\}$ and $\Delta(B) = \{y\}$, $x \neq y$, or if $\Pi(A) = \Delta(B) = \emptyset$, then $AB \doteq \overline{A}\,\overline{B}$.

Corollary 1 [5] For all factorial languages $A$ and $B$, the canonical decomposition of $AB$ is either $AB \doteq \overline{R_{\Delta(B)}(A)}\,\overline{B}$ or $AB \doteq \overline{A}\,\overline{L_{\Pi(A)}(B)}$. If $R_{\Delta(B)}(A) \neq A$, the first equality holds, and if $L_{\Pi(A)}(B) \neq B$, the second equality holds.

Example 3 If $A = a^* b^*$ and $B = b^* a^*$, then $\Pi(A) = \Delta(B) = \{b\}$ and $AB \doteq a^* b^* a^*$; here it does not matter which of the $b^*$s was erased. This falls into the first part of Case (3) of Theorem 2.

Example 4 Let us consider $F_a = \mathrm{Fac}(\{a, ab\}^*)$, which means that $F_a$ is the language of all binary words which do not contain two successive $b$s. Then $F_a$ is indecomposable, $\Pi(F_a) = \Delta(F_a) = \{a\}$, and $F_a F_a \doteq \overline{F_a}\,\overline{F_a}$: here $F_a F_a$ is the language of all words containing the factor $bb$ at most once. This falls into the second part of Case (3) of Theorem 2.

To make Theorem 2 applicable, we need to specify the form of $\overline{R_x(A)}$ (and $\overline{L_x(B)}$) in Case (2) of it.

Lemma 8 Let $X \doteq X_1 \cdots X_m$, $X_i \in F$, be the canonical decomposition of a factorial language $X$. Then each factor $X_i \cdots X_j$, $i \le j$, of the word $X_1 \cdots X_m$ is also a canonical decomposition of the respective language: $X_i \cdots X_j \doteq \overline{X_i \cdots X_j}$.
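The language identities behind Examples 3 and 4 can be checked on truncations. The sketch below relies on one observation: a word of length at most $N$ in $AB$ splits into two parts of length at most $N$, so the truncation of $AB$ to length $N$ is determined by the truncations of $A$ and $B$. The bound `N = 7` and all helper names are assumptions made for illustration, and the check concerns the languages only, not their canonical decompositions.

```python
import re
from itertools import product

N = 7  # truncation length, an arbitrary illustrative choice


def words(max_len, alphabet="ab"):
    """Yield all words over the alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def trunc(member):
    """All words of length <= N accepted by a membership predicate."""
    return {w for w in words(N) if member(w)}


def matches(pattern):
    """Membership predicate of a regular expression."""
    compiled = re.compile(pattern)
    return lambda w: compiled.fullmatch(w) is not None


def cat(l1, l2):
    """Catenation of two truncated languages, truncated to length N again."""
    return {u + v for u in l1 for v in l2 if len(u) + len(v) <= N}


def count_bb(w):
    """Number of (possibly overlapping) occurrences of the factor bb."""
    return sum(w[i:i + 2] == "bb" for i in range(len(w) - 1))


# Example 3: (a* b*)(b* a*) = a* b* a*
A = trunc(matches(r"a*b*"))
B = trunc(matches(r"b*a*"))
example3_ok = cat(A, B) == trunc(matches(r"a*b*a*"))

# Example 4: F_a F_a is the set of words containing bb at most once
Fa = trunc(lambda w: "bb" not in w)
example4_ok = cat(Fa, Fa) == trunc(lambda w: count_bb(w) <= 1)

print(example3_ok, example4_ok)
```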

Proof. Suppose that $X_i \cdots X_j$ is not a canonical decomposition. Since all languages $X_k$ are already indecomposable, this is possible only if $X_i \cdots X_j = X_i \cdots X_k' \cdots X_j$ for some factorial language $X_k' \subsetneq X_k$, $i \le k \le j$. But then $X = X_1 \cdots (X_i \cdots X_k' \cdots X_j) \cdots X_m = X_1 \cdots X_k' \cdots X_m$, that is, $X \not\doteq X_1 \cdots X_m$, a contradiction.

Lemma 9 Let $x \in \{a, b\}$ be a symbol and $X$ be a binary factorial language containing both symbols $a$ and $b$. Denote $R_x(X) = Y$; then $\Delta(Y) = \Delta(X)$. Symmetrically, $\Pi(L_x(X)) = \Pi(X)$.

Proof. First of all, $Y \neq \{\lambda\}$ since $X$ contains the symbol not equal to $x$. On the other hand, $Y \subseteq X$, so that $\Delta(Y) \neq \{a, b\}$. It remains to prove that $x \in \Delta(Y)$. Suppose on the contrary that $x \notin \Delta(Y)$, that is, $xu \notin Y$ for some $u \in Y$. If $u$ is not the empty word, $u \in Y$ means that $uv \in X$ for some $v$ such that the last symbol of $uv$ is not equal to $x$. Since $x \in \Delta(X)$, we have $xuv \in X$ and thus $xu \in Y$, a contradiction. Now it remains to observe that if $xu \in Y$ for all non-empty words $u \in Y$ (which exist since $Y \neq \{\lambda\}$), then $x = x\lambda \in Y$ since $Y$ is factorial. We have shown that $x \in \Delta(Y)$, which was to be proved.

The following lemma is non-trivial in the binary case only when $\Delta$ and $\Pi$ are of cardinality one, but we prove it for the general case.

Lemma 10 Let $X$ be a factorial language. Then for all subalphabets $\Delta, \Pi \subseteq \Sigma$ the equality $L_\Pi(R_\Delta(X)) = R_\Delta(L_\Pi(X))$ holds.

Proof. If a non-empty word $u$ belongs to $L_\Pi(R_\Delta(X))$, then there exists $v$ (which can be empty) such that $vu$ starts with a symbol from $\Sigma \setminus \Pi$ and belongs to $R_\Delta(X)$. This, in its turn, means that there exists a word $w$ (which can be empty) such that the last symbol of the word $vuw$ belongs to $\Sigma \setminus \Delta$, and $vuw \in X$. We see that the obtained condition is symmetric with respect to the order of applying the operators $L_\Pi$ and $R_\Delta$, so we get it again if we consider an arbitrary word $u \in R_\Delta(L_\Pi(X))$. Thus, these two sets are equal.

Lemma 11 [5] Let $X$ be a factorial language with $X \doteq X_1 \cdots X_m$, $X_i \in F$. Consider a subalphabet $\Delta \subseteq \Sigma$ and the factorial language $Y = R_\Delta(X)$.
Then $\overline{Y}$ is obtained by deleting $\{\lambda\}$ entries from the decomposition $U_1 \cdots U_m$, where $U_i \in F$ and subalphabets $\Delta_i \subseteq \Sigma$ are defined iteratively as follows: $\Delta_m = \Delta$, and for each $i$ from $m$ to 1 we put
$U_i = R_{\Delta_i}(X_i)$ and $\Delta_{i-1} = \Delta(U_i)$ if $R_{\Delta_i}(X_i) \not\subseteq \Delta_i^*$, and $U_i = \{\lambda\}$ and $\Delta_{i-1} = \Delta_i$ otherwise.

Note that in the binary case, the described situation can be non-trivial only if $\Delta$ is of cardinality one, and $U_i$ may be not equal to $X_i$ only if $\Delta_i \neq \Delta(X_{i+1})$, which means that we had $\Delta(X_{i+1}) = \emptyset$.

Example 5 Consider $X = (a^* + b^*)^{2k}$ and $\Delta = \{a\}$. Then $U_{2k} = R_a(a^* + b^*) = b^*$, $\Delta_{2k-1} = \{b\}$, $U_{2k-1} = R_b(a^* + b^*) = a^*$, $\Delta_{2k-2} = \{a\}$, etc., so that we have $R_a(X) \doteq (a^* b^*)^k$. Consequently, $X a^* \doteq (a^* b^*)^k a^*$, giving an example of Case (2) of Theorem 2.

Corollary 2 In the notation of the previous lemma, consider the indecomposable factors $Y_i$ of $Y$: $Y \doteq Y_1 \cdots Y_n$, $Y_j \in F$. Then there exist integers $0 = i_0 \le \dots \le i_{m-1} \le i_m = n$ such that $Y_{i_{k-1}+1} \cdots Y_{i_k} \subseteq X_k$ for all $k = 1, \dots, m$. Moreover, if $Y = R_\Delta(X)$, then for each $k < m$ we have $Y_1 \cdots Y_{i_k} = R_{\Delta(Y_{i_k+1})}(X_1 \cdots X_k)$ and $Y_{i_k+1} \cdots Y_n = R_\Delta(X_{k+1} \cdots X_m)$.

Lemma 12 Suppose that $Y = R_\Delta(X)$ (or $Y = L_\Delta(X)$) for some $\Delta \subseteq \Sigma$, $X \doteq X_1 \cdots X_n$, $X_i \in F$, and $Y \doteq X_{\sigma(1)} \cdots X_{\sigma(n)}$ for some permutation $\sigma$. Then $X = Y$.

Proof. The hypothesis of the lemma means that each indecomposable factorial language occurs in the canonical decompositions of $X$ and $Y$ an equal number of times. For the sake of convenience, let us denote $X_{\sigma(i)} = Y_i$. Due to Corollary 2, there exist integers $0 = i_0 \le \dots \le i_{n-1} \le i_n = n$ such that $Y_{i_{k-1}+1} \cdots Y_{i_k} \subseteq X_k$ for all $k = 1, \dots, n$. We wish to prove that $i_k = k$ for all $k$, and that all the inclusions are in fact equalities (of the form $Y_i = X_i$). Suppose the opposite. Then there exists some $k_1$ such that the corresponding inclusion is of the form $Y_{i_{k_1-1}+1} \cdots Y_{i_{k_1}} \subsetneq X_{k_1}$ (the equality is impossible even if $i_{k_1} - i_{k_1-1} \ge 2$, since all the involved languages are indecomposable, and decompositions are minimal). In particular, none of the languages $Y_{i_{k_1-1}+1}, \dots, Y_{i_{k_1}}$ is equal to $X_{k_1}$. But we know that the language $X_{k_1}$ occurs in $\overline{X}$ and $\overline{Y}$ an equal number of times. So, $X_{k_1}$ is equal to some $Y_j$, where $i_{k_2-1} + 1 \le j \le i_{k_2}$, and $X_{k_1} = Y_j \subsetneq X_{k_2}$.
Continuing this argument, we get an infinite sequence $X_{k_1} \subsetneq X_{k_2} \subsetneq \cdots \subsetneq X_{k_m} \subsetneq \cdots$. But there is only a finite number of entries in the canonical decomposition of a factorial language, a contradiction.

The following lemma follows directly from the definitions and will be used several times below.

Lemma 13 Let $Y$ be a factorial language with $Y = R_\Delta(Y)$ ($Y = L_\Pi(Y)$)
for given $\Delta, \Pi \subseteq \Sigma$. Then for a factorial language $X$ we have $Y = R_\Delta(X)$ ($Y = L_\Pi(X)$) if and only if $Y \subseteq X \subseteq Y\Delta^*$ (respectively, $Y \subseteq X \subseteq \Pi^* Y$).

4 Simple Word Equations

Here we list several classical word equations and their solutions. Words are considered on an alphabet $A$ which may be infinite, since all considered words are finite anyway.

Lemma 14 (Commutation of words, see e.g. [11]) Let words $x, y \in A^*$ commute: $xy = yx$. Then $x = z^n$ and $y = z^m$ for some $z \in A^*$ and $n, m \ge 0$.

Lemma 15 (Conjugacy of words, see, e.g., [4]) Let $xz = zy$ for some $x, y, z \in A^*$. Then either $x = y = \lambda$, or $z = \lambda$, or $x = rs$, $y = sr$, and $z = (rs)^k r$ for some $r, s \in A^*$ with $r \neq \lambda$ and $k \ge 0$.

Finally, the following lemma can be easily proved by a standard technique described, e.g., in [4].

Lemma 16 Let $xay = yax$ for some $x, y \in A^*$, $a \in A$. Then $x = (za)^n z$ and $y = (za)^m z$ for some $z \in A^*$ and $n, m \ge 0$.

5 Unary factorial languages

Before we pass to the main part of the paper, note that equations on unary factorial languages are in general easy to solve. Indeed, if the alphabet consists of one symbol $a$, then all possible factorial languages are $a^*$ and $A_k = \{a^i \mid 0 \le i \le k\}$ for all $k \ge 0$. We have $A_k A_m = A_{k+m}$ and $a^* A_k = A_k a^* = a^*$ for all $k$ and $m$. Thus, unary factorial languages equipped with catenation are equivalent to non-negative integers and infinity under addition, that is, to the Presburger arithmetic with infinity, which is decidable.

In particular, we easily see that $(A_k)^n = A_{kn}$ and $(a^*)^n = a^*$, so that for unary factorial languages $X^n = Y^n$ if and only if $X = Y$; any two unary factorial languages commute; and $XZ = ZY$ if and only if $Z = a^*$ or $X = Y$.

So, from now on we may assume that both symbols do occur in at least one of the languages constituting the considered equations.
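The claims of Section 5 can be checked directly in the arithmetic model. The sketch below assumes the encoding described there: $A_k$ is represented by the integer $k$, $a^*$ by `math.inf`, and catenation by addition. The function names and the range of tested values are illustrative choices, not part of the paper.

```python
import math
from itertools import product

INF = math.inf  # stands for the language a*


def cat(x, y):
    """Catenation of unary factorial languages: A_k A_m = A_{k+m}, a* absorbs."""
    return x + y


def power(x, n):
    """n-th power of a unary factorial language, starting from A_0 = {lambda}."""
    result = 0
    for _ in range(n):
        result = cat(result, x)
    return result


langs = list(range(6)) + [INF]  # A_0, ..., A_5 and a*

# X^n = Y^n if and only if X = Y
powers_ok = all((power(x, n) == power(y, n)) == (x == y)
                for x, y in product(langs, repeat=2) for n in range(2, 5))

# any two unary factorial languages commute
commute_ok = all(cat(x, y) == cat(y, x) for x, y in product(langs, repeat=2))

# XZ = ZY if and only if Z = a* or X = Y
conjugacy_ok = all((cat(x, z) == cat(z, y)) == (z == INF or x == y)
                   for x, y, z in product(langs, repeat=3))

print(powers_ok, commute_ok, conjugacy_ok)
```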

6 The equation $X^n = Y^n$

Theorem 3 Let $X$ and $Y$ be factorial languages. Then for all $n \ge 2$ we have $X^n = Y^n$ if and only if $X = Y$.

We shall give two proofs of this theorem: the first one is easy and is valid for an arbitrary alphabet, and the second one is longer and less general but uses the same technique that works for the other equations considered.

Proof 1 (S. V. Avgustinovich). Suppose that $X^n = Y^n$ but $X \neq Y$; then without loss of generality there exists a word $x \in X \setminus Y$. Then $x^n \in X^n = Y^n$; now consider all prefixes of $x^n$ belonging to $Y$. The longest of them, denoted by $y_1$, is shorter than $x$: otherwise we would have $x \in Y$ since $Y$ is factorial. So, $x = y_1 z_1$ for $z_1 \neq \lambda$, and $z_1 x^{n-1}$ belongs to $Y^{n-1}$, which is a factorial language. Similarly, we see that the longest prefix $y_2$ of $z_1 x^{n-1}$ belonging to $Y$ is shorter than $z_1 x$: $z_1 x = y_2 z_2$ for $z_2 \neq \lambda$ and $z_2 x^{n-2} \in Y^{n-2}$, etc.; at last we obtain that $z_{n-1} x \in Y$ and thus $x \in Y$, a contradiction.

Proof 2. This proof is valid only if the alphabet $\Sigma$ is binary, $\Sigma = \{a, b\}$. First of all, since $\{a, b\}^*$ is indecomposable, we have $X^n = \{a, b\}^*$ if and only if $X = \{a, b\}^*$. So, it remains to list all the possible forms of $\overline{X^n}$ and $\overline{Y^n}$ when $\Pi(X)$, $\Delta(X)$, $\Pi(Y)$, $\Delta(Y)$ are of cardinality 1 or empty. In fact, due to Corollary 1, the cases to be considered are: $XX \doteq \overline{X}\,\overline{X}$; or $XX \doteq \overline{X'}\,\overline{X}$, where $X' = R_x(X) \neq X$, $\Delta(X) = \{x\}$; or $XX \doteq \overline{X}\,\overline{X''}$, where $X'' = L_x(X) \neq X$, $\Pi(X) = \{x\}$; and these three cases may arbitrarily combine with the three analogous situations for $YY$. Some of the combinations are symmetric to each other, so that the case study is not too long.

The equality $XX \doteq \overline{X}\,\overline{X}$ due to Corollary 1 implies that $X = R_{\Delta(X)}(X) = L_{\Pi(X)}(X)$. So, by an easy induction we see that $X^n \doteq (\overline{X})^n$. If the equality $XX \doteq \overline{X'}\,\overline{X}$ holds, where $X' = R_x(X) \neq X$, $\Delta(X) = \{x\}$, then $\Delta(X') = \{x\}$ due to Lemma 9. Now we see that $X^3 = X'X'X$ due to Lemma 3, and this is the canonical decomposition of $X^3$ due to Corollary 1. Continuing the process, we obtain that $X^n \doteq (\overline{X'})^{n-1}\,\overline{X}$.
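Symmetrically, if $XX \doteq \overline{X}\,\overline{X''}$, then $X^n \doteq \overline{X}\,(\overline{X''})^{n-1}$.

Case 1. If the equality $X^n = Y^n$ is rewritten for the canonical decompositions as $(\overline{X})^n \doteq (\overline{Y})^n$, then clearly $X = Y$.

Case 2. Let only one of the canonical decompositions $\overline{X^n}$ and $\overline{Y^n}$ be not equal to $(\overline{X})^n$ (or $(\overline{Y})^n$), say, let the equation for canonical decompositions be

$(\overline{X'})^{n-1}\,\overline{X} \doteq (\overline{Y})^n$ (1)

with $X' = R_x(X)$, $\{x\} = \Delta(X)$. Since $\Delta(X') = \{x\}$, we have $\Delta(Y) = \{x\}$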
due to Lemma 2 (applied several times), and we can see that $Y$ does not change when we apply $R_x$ to it: $R_x(Y) = Y$. Thus, applying $R_x$ to both parts of Equation (1), we obtain $(\overline{X'})^n \doteq (\overline{Y})^n$ and thus clearly $X' = Y$. Substituting this into (1), we see that $X = Y = X'$, contradicting the assumption that $X' \neq X$.

Case 3. Let the canonical decompositions of $X^n$ and $Y^n$ be biased in the same direction, say, let the equation for the canonical decompositions be

$(\overline{X'})^{n-1}\,\overline{X} \doteq (\overline{Y'})^{n-1}\,\overline{Y}$. (2)

Note that $\Delta(X) = \Delta(X')$ and $\Delta(Y) = \Delta(Y')$ due to Lemma 9; $\Delta(X') = \Delta(Y')$ due to Lemma 2, so that $\Delta(X) = \Delta(Y) = \{x\}$ for some $x \in \{a, b\}$. Applying $R_x$ to (2), we obtain the word equation $(\overline{X'})^n \doteq (\overline{Y'})^n$ whose only solution is $X' = Y'$. Substituting it into (2), we obtain $X = Y$, which is what we needed.

Case 4. Let the canonical decompositions of $X^n$ and $Y^n$ be biased in opposite directions, that is, let the equation for the canonical decompositions be

$(\overline{X'})^{n-1}\,\overline{X} \doteq \overline{Y}\,(\overline{Y''})^{n-1}$. (3)

Here we have $\Delta(X) = \Delta(X') = \{x\}$ and $\Pi(Y) = \Pi(Y'') = \{y\}$ for $x, y \in \{a, b\}$; as usual, $X' = R_x(X)$ and $Y'' = L_y(Y)$. Let us apply to both parts of (3) the operators $R_x$ and $L_y$: due to Lemma 10, the order of applying does not matter. We have $R_x((X')^{n-1}X) \doteq (\overline{X'})^n$ and $\Delta(X') = \{x\}$.

Suppose first that $x \neq y$; then $X'$ does not change under $L_y$, and $L_y((X')^n) = (X')^n$. Symmetrically, in this case we have $R_x(L_y(Y(Y'')^{n-1})) \doteq (\overline{Y''})^n$; since these canonical decompositions are equal, this means $X' = Y''$. Returning to (3), we see that $\overline{X}$ and $\overline{Y''} = \overline{X'}$ are both suffixes of $\overline{X^n}$; clearly, $X' \subsetneq X$, that is, the suffix corresponding to $X$ is longer: $\overline{X} = W\,\overline{X'}$ for some $W \in F^+$. So, $R_x(X) = X' = R_x(WX')$, but due to Lemma 11, $R_x(WX') = W' R_x(X')$ for some $W'$, and due to Lemma 5, $R_x(X') = X'$, so that $W' = \{\lambda\}$. Here $W' = R_{\Delta(X')}(W) = R_x(W)$; at the same time, we know that $W\,\overline{X'} \doteq \overline{X}$, which means that $W = R_{\Delta(X')}(W) = W'$. So, $W = \{\lambda\}$ and $X = X'$, contradicting our assumption.

Now suppose that $x = y$.
If $X'$ does not change under $L_x$ and $Y''$ does not change under $R_x$, we repeat the arguments above and obtain a contradiction. Suppose that $X'$ changes under $L_x$; due to Theorem 2, this is possible only if $X' \doteq x^*\,\overline{X'''}$ for some $X''' = L_x(X')$. Then $L_x((X')^n) \doteq (\overline{X'''} x^*)^{n-1}\,\overline{X'''}$; we see that the number of elements of $F$ in this canonical decomposition modulo $n$ is equal to $n - 1$, so that we cannot have $R_x(L_x(Y(Y'')^{n-1})) \doteq (\overline{Y''})^n$. Thus, $Y''$ must change under $R_x$, $Y'' \doteq \overline{Y'''}\,x^*$. So, after applying $R_x$ and $L_x$ to both
parts of (3) we obtain $(\overline{X'''} x^*)^{n-1}\,\overline{X'''} \doteq (\overline{Y'''} x^*)^{n-1}\,\overline{Y'''}$ and thus $X''' = Y'''$. Denote that language by $Z$; then $X' = x^* Z$ and $Y'' = Z x^*$, and (3) can be rewritten as $(x^* Z)^{n-1}\,\overline{X} \doteq \overline{Y}\,(Z x^*)^{n-1}$. We see that $\overline{X}$ ends with $x^*$, which means that $X = R_x(X) x^* = X' x^* = x^* Z x^*$. Symmetrically, we obtain $Y = x^* Z x^*$, that is, $X = Y$, which was to be proved.

We have listed all the cases and thus proved the theorem.

Of course this second proof is much more complicated and less general than the first one, but its technique also works for other equations on binary factorial languages, and we show it in the subsequent sections.

7 Commutation

In this section, we completely solve the equation $XY = YX$, where $X$ and $Y$ are binary factorial languages.

Clearly, if factorial languages (in fact, languages in general) are powers of the same language, they commute. We call it word type commutation:

Word type commutation: $XY = YX$ if $X = Z^m$ and $Y = Z^n$ for some factorial language $Z$ and non-negative integers $n$ and $m$.

However, it is easy to see that binary factorial languages may also commute in other situations. The simplest of them is absorption:

Commutation by absorption: Let $\Sigma_X$ be the subalphabet of all letters occurring in a factorial language $X$. Then $XY = YX = Y$ if $Y\Sigma_X \subseteq Y$, $\Sigma_X Y \subseteq Y$, and thus $Y = Y\Sigma_X^* = \Sigma_X^* Y$: the language $Y$ absorbs $X$. In the binary case, absorption means that either $X = \{\lambda\}$, or $X \subseteq x^*$ for some letter $x$ and $\Pi(Y) = \Delta(Y) = \{x\}$, or $Y = \{a, b\}^*$.

There are also less obvious examples of commutation. Let us list them:

Unexpected commutation I. Let $Z$ be a binary factorial language with $\Delta(Z) = \{x\}$ and $\Pi(Z) = \{y\}$, $x \neq y$. Then for all $r, p > 0$ the language $Z^p$ commutes with any language $X$ satisfying the inclusion

$Z^r \subseteq X \subseteq Z^r x^* \cap y^* Z^r$. (4)

Such a language not equal to $Z^r$ exists if and only if there exists a word $v$ such that $yv \in Z^r$, $vx \in Z^r$, but $yvx \notin Z^r$.

Example 6 Consider the languages $F_a = \mathrm{Fac}(\{a, ab\}^*)$ and $F_b = \mathrm{Fac}(\{b, ab\}^*)$: the language $F_a$ contains all words avoiding two successive $b$s, and the language $F_b$ contains all words avoiding two successive $a$s. Consider $Z = F_b F_a$; then $\Pi(Z) = \{a\}$ and $\Delta(Z) = \{b\}$. Let us fix $r = 1$. Then any language $X = Z + S$, where $S$ is a factorial subset of $a^* b^*$, commutes with any power $Z^p$ of $Z$. The word $v$ satisfying the condition above is equal to $ab$ since $aab \in Z$, $abb \in Z$, but $aabb \notin Z$.

Unexpected commutation II. Let $x \in \Sigma_2$ be a symbol and $Q$ be a binary factorial language with $L_x(Q) = R_x(Q) = Q$ and $\Delta(Q)$, $\Pi(Q)$ equal to $\emptyset$ or $\{y\}$, $y \neq x$. Then for all $p \ge 0$ and $r \ge 1$ the language $(x^* Q)^p x^*$ commutes with any language $A$ satisfying the inclusions

$(x^* Q)^r + (Q x^*)^r \subseteq A \subseteq (x^* Q)^r x^*$. (5)

Example 7 The languages $A = a^* b^* + b^* a^*$ and $B = a^*$ commute since $AB = BA = a^* b^* a^*$. Here $x = a$ and $Q = b^*$, so, in fact, $B$ commutes with any factorial language which includes $a^* b^* + b^* a^*$ and is included in $a^* b^* a^*$.

The following example based on the same idea is more sophisticated.

Example 8 For each $p \ge 0$, the language $(a^* b^*)^p a^*$ commutes with the language $A = a^* b^* (aa)^* b^* a^* + a^* b^* (aa)^* a b^*$, since $a^* b^* a^* b^* + b^* a^* b^* a^* \subseteq A \subseteq a^* b^* a^* b^* a^*$.

Unexpected commutation III. Let $Z$ be a binary factorial language such that $ZZ \doteq \overline{Z}\,\overline{Z}$ and $\Delta(Z) = \{x\}$. Let $B$ be a factorial language satisfying $Z^n \subseteq B \subseteq Z^n x^*$, $n > 0$. Then $B$ commutes with $Z^m B$ for all $m > 0$. Symmetrically, if $Z$ is a binary factorial language with $ZZ \doteq \overline{Z}\,\overline{Z}$ and $\Pi(Z) = \{x\}$, and if $B$ is a factorial language satisfying $Z^n \subseteq B \subseteq x^* Z^n$, then $B$ commutes with $BZ^m$ for all $n, m > 0$.

Example 9 Consider $Z = a^* b^*$ and $B = \mathrm{Fac}(a^* (bb)^* a^* b^* + a^* b (bb)^* a^* b^* a^*)$. Here $\Delta(Z) = \{a\}$ and $Z^2 = a^* b^* a^* b^* \subseteq B \subseteq a^* b^* a^* b^* a^* = Z^2 a^*$. We see that $B$ commutes with all sets $A$ of the form $A = Z^m B$: $AB = BA = Z^{m+2} B$.
The following theorem states that in fact we have listed all possible situations of commutation:

Theorem 4 Two binary factorial languages commute if and only if one of the situations above is realized: either word type commutation, or absorption, or unexpected commutation I, II, or III.
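Examples 6 and 7 can be checked mechanically on truncated languages. As a sketch under stated assumptions: every language is cut to words of length at most `N = 7`, which is enough to decide equalities of catenations up to that length, because a word of length at most $N$ in a product splits into parts of length at most $N$. For Example 6 the factorial set $S = \mathrm{Fac}(\{aabb\})$ is one illustrative choice of a factorial subset of $a^* b^*$; it and all helper names are assumptions, not taken from the paper.

```python
import re
from itertools import product

N = 7  # truncation length, an arbitrary illustrative choice


def words(max_len, alphabet="ab"):
    """Yield all words over the alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def trunc(member):
    """All words of length <= N accepted by a membership predicate."""
    return {w for w in words(N) if member(w)}


def matches(pattern):
    """Membership predicate of a regular expression."""
    compiled = re.compile(pattern)
    return lambda w: compiled.fullmatch(w) is not None


def cat(l1, l2):
    """Catenation of two truncated languages, truncated to length N again."""
    return {u + v for u in l1 for v in l2 if len(u) + len(v) <= N}


def fac(language):
    """Factorial closure of a finite language."""
    return {w[i:j] for w in language
            for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


# Example 7: A = a*b* + b*a* and B = a* commute, AB = BA = a*b*a*
A = trunc(matches(r"a*b*|b*a*"))
B = trunc(matches(r"a*"))
example7_ok = cat(A, B) == cat(B, A) == trunc(matches(r"a*b*a*"))

# Example 6: Z = F_b F_a, X = Z + S with S = Fac({aabb}) (illustrative choice)
Fa = trunc(lambda w: "bb" not in w)
Fb = trunc(lambda w: "aa" not in w)
Z = cat(Fb, Fa)
X = Z | fac({"aabb"})
Z2 = cat(Z, Z)
example6_ok = (X != Z
               and cat(X, Z) == cat(Z, X)
               and cat(X, Z2) == cat(Z2, X))

print(example7_ok, example6_ok)
```

A successful run only confirms the equalities on words of length at most `N`; it is a sanity check, not a proof.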

Proof. Let $XY = YX$ for binary factorial languages $X$ and $Y$. Due to Corollary 1, there are only three possible forms of the equality of canonical decompositions: either

$\overline{X'}\,\overline{Y} \doteq \overline{Y}\,\overline{X''}$, (6)

where $X' = R_{\Delta(Y)}(X)$ and $X'' = L_{\Pi(Y)}(X)$ (or $\overline{X}\,\overline{Y''} \doteq \overline{Y'}\,\overline{X}$, which is the same up to renaming $X$ and $Y$); or

$\overline{X'}\,\overline{Y} \doteq \overline{Y'}\,\overline{X}$, (7)

where $X' = R_{\Delta(Y)}(X)$ and $Y' = R_{\Delta(X)}(Y)$; or $\overline{X}\,\overline{Y''} \doteq \overline{Y}\,\overline{X''}$, and this case is completely symmetric to (7).

These cases intersect: for example, the situation when $L_{\Pi(Y)}(X) = X$ and $R_{\Delta(X)}(Y) = Y$ falls into both (6) and (7). However, to get a classification of the cases of commutation, we consider the cases (6) and (7) separately.

Suppose first that (6) holds. It is a conjugacy equation on the alphabet $F$, and it can be solved according to Lemma 15. Since the unit element of the semigroup $F^*$ is the language $\{\lambda\}$, the equation has the following solutions:
(1) Either $Y = \{\lambda\}$; then $X' = X'' = X$ and this is a particular case of absorption.
(2) Or $X' = X'' = \{\lambda\}$, and this is again absorption, since $XY = YX = Y$.
(3) Or $X' \doteq RS$, $X'' \doteq SR$, and $Y \doteq (RS)^k R$ for some $R, S \in F^*$, where $R \neq \{\lambda\}$, $k \ge 0$.

Let us consider this third situation in detail. First, note that due to Lemma 8, the languages $R$ and $S$ are given in canonical decompositions. Due to Lemma 2 (applied several times), we have

$\Delta(Y) = \Delta(R) = \Delta(X')$ and $\Pi(Y) = \Pi(R) = \Pi(X'')$; (8)

in what follows we denote these subalphabets just by $\Delta$ and $\Pi$.

Suppose first that one of the subalphabets $\Delta$ and $\Pi$ is empty: say, $\Delta = \emptyset$. Then $X' = R_\emptyset(X) = X \doteq RS$ and $X'' = L_\Pi(X) \doteq SR$; due to Lemma 12, $X = X''$, and the commutation equation (6) is just $\overline{X}\,\overline{Y} \doteq \overline{Y}\,\overline{X}$. Due to Lemma 14, we have $X \doteq Z^n$ and $Y \doteq Z^m$ for some factorial language $Z \in F^*$, and this is word type commutation.

Note that if $Y = \{a, b\}^*$, then $X' = X'' = \{\lambda\}$, and this is absorption. So, the only non-trivial situation is $\#\Delta = \#\Pi = 1$, that is, either $\Delta = \{x\}$ and $\Pi = \{y\}$, $y \neq x$, or $\Delta = \Pi = \{x\}$. We shall consider these two situations in
succession, but before that, note that in both cases

$L_\Pi(X') = R_\Delta(X'')$ (9)

due to Lemma 10 and $X' + X'' \subseteq X \subseteq X'\Delta^* \cap \Pi^* X''$ by the definitions of $X' = R_\Delta(X)$ and $X'' = L_\Pi(X)$, that is,

$RS + SR \subseteq X \subseteq RS\Delta^* \cap \Pi^* SR$. (10)

Suppose first that $\Delta = \{x\}$ and $\Pi = \{y\}$, $x \neq y$. Then it can be easily seen that $L_y(X') = X'$ and $R_x(X'') = X''$. By (9) we see that $X' = X''$, that is, $RS \doteq SR$, and due to Lemma 14, we have $R = Z^n$ and $S = Z^m$ for some $Z \in F^+$. So, $X' = X'' = Z^{n+m}$ and $Y = Z^{k(n+m)+n}$. After renaming variables we can write $X' = X'' = Z^r$ and $Y = Z^p$ for some $r, p > 0$ (if $p$ or $r$ is equal to 0, the language $X$ or $Y$ is equal to $\{\lambda\}$, and we have already considered these degenerate situations). Now (10) can be rewritten as

$Z^r \subseteq X \subseteq Z^r x^* \cap y^* Z^r$. (11)

If $Z^r = Z^r x^* \cap y^* Z^r$, then $X = Z^r$ and we have word type commutation. But if $Z^r \subsetneq Z^r x^* \cap y^* Z^r$, consider a word $u$ of minimal length belonging to $(Z^r x^* \cap y^* Z^r) \setminus Z^r$. We see that $u = yvx$, where $yv \in Z^r$, $vx \in Z^r$, but $yvx \notin Z^r$. If such a word $u$ exists, then we can take any factorial set $X$ lying between $Z^r$ and $Z^r x^* \cap y^* Z^r$, and it will commute with any power of $Z$. This is exactly Unexpected commutation I described above.

Now suppose that $\Delta = \Pi = \{x\}$. First consider the case when $\overline{R}$ does not start with $x^*$. Then we have $L_x(X') = L_x(RS) = RS$ due to Lemma 7, and thus $RS = R_x(SR)$ due to (9). So, due to Lemma 12 we have $RS \doteq SR$, and due to Lemma 14, $RS = SR = X' = X'' = Z^r$ for some factorial language $Z$ with $R \doteq Z^n$ and $S \doteq Z^m$. Now (10) can be rewritten as $Z^r \subseteq X \subseteq Z^r x^* \cap x^* Z^r$; but in fact, both inclusions here are equalities: $Z^r x^* = x^* Z^r = Z^r$ since $\Delta(Z) = \Delta(R) = \{x\}$ and $\Pi(Z) = \Pi(R) = \{x\}$ due to Lemma 2. So, $X = Z^r$, $Y = (RS)^k R = Z^p$, and this is word type commutation.

Symmetrically, the same holds if $\overline{R}$ does not end with $x^*$. So, it remains to check the situation when $R \doteq x^*$ or $R \doteq x^* T x^*$ for some $T \in F^+$ (note that $T \neq \{\lambda\}$ since $x^* x^* \doteq x^*$).

Suppose first that $R \doteq x^*$. Then (10) can be rewritten as

$x^* S + S x^* \subseteq X \subseteq x^* S x^*$. (12)

Any language $X$ satisfying these inclusions commutes with all languages of the form $(x^* S)^k x^*$. Here $S$ is an arbitrary language which can precede and follow $x^*$ in a canonical decomposition: that is, an arbitrary language such that $L_x(S) = R_x(S) = S$ and $x \notin \Delta(S), \Pi(S)$ (which means that $\Delta(S)$ and $\Pi(S)$ are equal to $\{y\}$ or to $\emptyset$). Note that if $X$ is the maximal possible, $X = x^* S x^*$, this is again a word type commutation since $X^k = (x^* S x^*)^k = (x^* S)^k x^* = Y$. If $X \subsetneq x^* S x^*$, this type of commutation is new.

Now suppose that $R \doteq x^* T x^*$, $T \in F^+$. Then $L_x(RS) \doteq T x^* S$ and $R_x(SR) \doteq S x^* T$ due to Lemma 7; due to (9), we have the following word equation on $F$: $T x^* S = S x^* T$. Due to Lemma 16, the general solution of this equation is $S \doteq (Q x^*)^n Q$ and $T \doteq (Q x^*)^m Q$ for some $Q \in F^*$ such that $L_x(Q) = R_x(Q) = Q$ and $x \notin \Delta(S), \Pi(S)$, and for $n, m \ge 0$. So, $RS = (x^* Q)^{n+m+2}$, $SR = (Q x^*)^{n+m+2}$, and $Y = (x^* Q)^{k(n+m+2)+m+1} x^*$. After renaming variables, we get $RS = (x^* Q)^r$, $SR = (Q x^*)^r$, and $Y = (x^* Q)^p x^*$ for some $r \ge 2$ and $p \ge 1$; and (10) takes the form (5). Together with (12) (which adds the cases of $r = 1$ and $p = 0$) this inclusion gives exactly Unexpected commutation II.

We have considered all situations possible if (6) holds. Now suppose that (7) holds, that is, the canonical decompositions for the commutation equation $XY = YX$ are $\overline{X'}\,\overline{Y} \doteq \overline{Y'}\,\overline{X}$.

Suppose first that $X' = \{\lambda\}$ or $Y' = \{\lambda\}$. Then $XY = Y$ or $XY = X$, and this is commutation by absorption. So, in what follows we assume that $X'$ and $Y'$ are not equal to $\{\lambda\}$.

Suppose that $\Delta(X) = \emptyset$. Then $Y' = Y$ due to Lemma 4, and our case has been considered in the previous subsection (where it has been shown that this is inevitably word type commutation). Thus we have $\Delta(X) = \{x\}$ and $\Delta(Y) = \{y\}$ for some $x, y \in \{a, b\}$. But $\{y\} = \Delta(Y') = \Delta(XY) = \Delta(X') = \{x\}$ due to Lemmas 9 and 2, since $X'$ and $Y'$ are not equal to $\{\lambda\}$. So, $x = y$.
Note that this is the main critical point in this theorem where we require the alphabet to be binary: all the previous arguments in this section could be extended to the general alphabet.

Note that if $X' = Y'$, then $X = Y$, and this is word type commutation. So, we may assume that one of the words $\overline{X'}$, $\overline{Y'}$ on the alphabet $F$ is a proper prefix of the other: say, $X' \doteq \overline{Y'}\,C$ for some $C \in F^+$. Then $X \doteq C\,\overline{Y}$ because of (7), and $\overline{Y'}\,C \doteq X' = R_x(X) = R_x(CY) \doteq C'\,\overline{R_x(Y)} = C'\,\overline{Y'}$ because of
Corollary 2; here $C' = R_x(C)$ since $\Delta(Y) = \Delta(Y') = \{x\}$. Clearly, $C' = C$ since $C$ precedes $\overline{Y}$ in the canonical decomposition of $X$, and $\Delta(Y) = \{x\}$. Thus, we have $\overline{Y'}\,C \doteq C\,\overline{Y'}$, so that $Y' = Z^n$, $C = Z^m$ for some $n, m > 0$ due to Lemma 14. Here $Z$ is an arbitrary factorial language with $\Delta(Z) = \{x\}$ and $ZZ \doteq \overline{Z}\,\overline{Z}$. By the definition of $Y'$, we have $Y' = Z^n \subseteq Y \subseteq Z^n x^*$, and $Y$ can be equal to any set satisfying these inclusions. Note that $Y$ can be not equal to $Y'$ only if $\Pi(Z)$ is equal to $\emptyset$ or $\{z\}$, $z \neq x$. Now we can just define $X = Z^m Y$ and observe that $X$ and $Y$ really commute: $XY = YX = Z^{n+m} Y$.

So, this is the right-to-left version of Unexpected commutation III. The symmetric left-to-right version of unexpected commutation III can be found and stated symmetrically, starting from the equation $\overline{X}\,\overline{Y''} \doteq \overline{Y}\,\overline{X''}$.

Of course unexpected commutation III includes some cases of word type commutation: in particular, if $Y = Z^{n-1} D$ for some $Z \subseteq D \subseteq Z x^*$, where $\{x\} = \Delta(Z)$, then $Y = D^n$ and $X = D^{m+n}$. But situations when it is not word type commutation also exist, as Example 9 shows.

We have studied all possible cases when binary factorial languages commute. Theorem 4 is proved.

8 Conjugacy

The conjugacy equation is $XZ = ZY$, and its solutions on words have been described in Lemma 15. Clearly, for factorial languages, all the word solutions are also admitted:

Word type conjugacy: either $X = Y = \{\lambda\}$; or $Z = \{\lambda\}$; or $X = RS$, $Y = SR$, $Z = (RS)^k R$ for some $R, S \in F^*$ such that $R \neq \{\lambda\}$, $k \ge 0$, and if $S \neq \{\lambda\}$, then $RS \doteq \overline{R}\,\overline{S}$, $SR \doteq \overline{S}\,\overline{R}$, otherwise $RR \doteq \overline{R}\,\overline{R}$.

On the other hand, it is easy to list all cases when $XZ = ZY = \{a, b\}^*$: we call them trivial absorption.

Trivial absorption: We have $XZ = ZY = \{a, b\}^*$ if and only if $X = Y = \{a, b\}^*$ or $Z = \{a, b\}^*$.

So, in all other cases on the binary alphabet, the subalphabets $\Delta$ and $\Pi$ of $X$, $Y$, and $Z$ are either empty or of cardinality one. To list all solutions, we should consider all possible cases. As above, we shall group them according
to the form of the canonical decompositions of $XZ$ and $ZY$, assuming that $X$, $Y$, $Z$ are not empty. Basically, there are only four possible cases:

$$X'Z \doteq ZY', \quad \text{where } X' = R_{\Delta(Z)}(X),\ Y' = L_{\Pi(Z)}(Y); \tag{13}$$

$$XZ' \doteq Z''Y, \quad \text{where } Z' = L_{\Pi(X)}(Z),\ Z'' = R_{\Delta(Y)}(Z); \text{ or} \tag{14}$$

$$XZ' \doteq ZY' \tag{15}$$

or, symmetrically, $X'Z \doteq Z''Y$, where $X'$, $Y'$, $Z'$, $Z''$ are defined as above. Of course, each of the reduced languages (with primes) can be equal to the initial languages, in particular when the respective subalphabet is empty.

We could consider these cases successively, but the resulting list of cases is long and too awkward to form a nice-looking theorem. So, let us show how the technique works on an example.

In what follows we consider $X = F_aF_b$, where $F_a$ and $F_b$ are defined as in Example 6. In particular, $F_a$ and $F_b$ are the two components of the canonical decomposition of $X$, and we have $\Delta(F_a) = \Pi(F_a) = \Delta(X) = \{a\}$, $\Delta(F_b) = \Pi(F_b) = \Pi(X) = \{b\}$. Also we have $L_a(X) = L_b(X) = R_a(X) = R_b(X) = X$, so that $X$ remains unchanged under any of these operators. So, we should eliminate the situation when $XZ = \{a, b\}^*$, and after that due to Corollary 1 it is sufficient to consider equalities (14) and (15).

First of all, clearly, $XZ = ZY = \{a, b\}^*$ if and only if $Z = \{a, b\}^*$; here $Y$ is arbitrary. This gives us

Solution 1. $Z = \{a, b\}^*$ and $Y \subseteq \{a, b\}^*$ is arbitrary.

Suppose first that (15) holds: $F_aF_bZ' \doteq ZY'$, where $Z' = L_b(Z)$ and $Y' = L_{\Pi(Z)}(Y)$. If $Z = \{\lambda\}$, we have $Y = X$, which gives

Solution 2. $Z = \{\lambda\}$ and $Y = X = F_aF_b$.

Now suppose that $Z \neq \{\lambda\}$. Let us apply $L_b$ to (15). We see that the left part does not change under this operator, and the right part turns into $Z'Y''$, where $Y'' = L_{\Pi(Z')}(Y') = L_{\Pi(Z')}(Y)$ since $\Pi(Z') \supseteq \Pi(Z)$. So, we have $F_aF_bZ' \doteq Z'Y''$, and this is a conjugacy equation for words on $F$. Since $F_aF_b \neq \{\lambda\}$, there are only two possibilities: either $Z' = \{\lambda\}$, that is, $Z \subseteq b^*$. But due to (15), the canonical decomposition of $Z$ starts with $F_a$, a contradiction.
Or $F_aF_b = RS$, where $R \neq \{\lambda\}$, $Y'' = SR$, and $Z' = (RS)^kR$ for some $k \geq 0$. In the second case, there are again two possibilities: $R = F_a$, $S = F_b$, or $R = F_aF_b$, $S = \{\lambda\}$.
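The word-level step used here can be checked mechanically on ordinary strings. The following is only an illustrative sketch: the letters "A" and "B" are assumed stand-ins for $F_a$ and $F_b$ viewed as letters of the alphabet $F$, and the function names are hypothetical. It enumerates the factorizations $RS$ of the two-letter word with $R$ nonempty and verifies that $y = SR$, $z = (RS)^kR$ satisfy $xz = zy$.

```python
def factorizations(word):
    """Yield all pairs (r, s) with r + s == word and r nonempty."""
    for i in range(1, len(word) + 1):
        yield word[:i], word[i:]


def conjugacy_solutions(x, max_k=3):
    """Build (y, z) with y = s r and z = (r s)^k r for 0 <= k <= max_k."""
    found = []
    for r, s in factorizations(x):
        for k in range(max_k + 1):
            found.append((s + r, (r + s) * k + r))
    return found


X = "AB"  # assumed stand-in for the word F_a F_b over the alphabet F
splits = list(factorizations(X))
solutions = conjugacy_solutions(X)

# Every constructed pair satisfies the conjugacy equation on words.
for y, z in solutions:
    assert X + z == z + y
```

The two factorizations found for "AB" correspond to the two cases $R = F_a$, $S = F_b$ and $R = F_aF_b$, $S = \{\lambda\}$ treated in the text.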

If $R = F_a$, we have $Z' = (F_aF_b)^kF_a = L_b(Z)$. Returning to (15), we see that the leftmost undecomposable language in the canonical decomposition of $Z$ is equal to $F_a$. So, due to Theorem 2, $L_b(Z) = Z' = Z$. Then $Y'' = L_{\Pi(Z')}(Y) = L_a(Y) = F_bF_a$, so that $F_bF_a \subseteq Y \subseteq a^*F_bF_a$. It is easy to check that $Z$ and any $Y$ satisfying this double inclusion fit the conjugacy equation: $XZ = ZY = (F_aF_b)^{k+1}F_a$. This is

Solution 3. $Z = (F_aF_b)^kF_a$ for some $k \geq 0$ and $Y$ is an arbitrary factorial language satisfying $F_bF_a \subseteq Y \subseteq a^*F_bF_a$.

If $R = F_aF_b$, we have $Z' = (F_aF_b)^{k+1}$ for some $k \geq 0$. As above, the canonical decomposition of $Z$ starts with $F_a$, so that $Z' = Z$. Then $Y'' = F_aF_b$ and $F_aF_b \subseteq Y \subseteq b^*F_aF_b$. Clearly, $Z$ and any $Y$ satisfying this double inclusion fit the conjugacy equation: $XZ = ZY = (F_aF_b)^{k+2}$. This is

Solution 4. $Z = (F_aF_b)^k$ for some $k \geq 1$ and $Y$ is an arbitrary factorial language satisfying $F_aF_b \subseteq Y \subseteq b^*F_aF_b$.

Now suppose that (14) holds. Here we may suppose that $Z'' \neq Z$ since otherwise the situation falls also into the previous case and has been considered. So, $\Delta(Y) = \{y\}$ for some $y \in \{a, b\}$.

If $Z'' = \{\lambda\}$, we have $Y = XZ = F_aF_bZ$, which means that $\Delta(Y) = \{a\}$ and thus $Z \subseteq a^*$. Consequently, $Z' = Z$, and we get

Solution 5. $Z \subseteq a^*$ (that is, $Z = a^*$ or $Z = A_k$ for some $k \geq 0$, where $A_k$ is defined in Section 5) and $Y = F_aF_bZ$.

If $Z' = \{\lambda\}$, then $Z \subseteq b^*$ and thus $Z'' = Z$ or $Z'' = \{\lambda\}$. Both cases have been considered above.

Now suppose that $Z' \neq \{\lambda\}$, $Z'' \neq \{\lambda\}$. Due to Lemma 10, we have $R_y(Z') = L_b(Z'')$. Note that $L_b(Z'') = Z''$ since $Z''$ is not empty and thus its canonical decomposition starts with $F_a$; so, $Z'' = R_y(Z')$. Let us apply $R_y$ to both parts of (14); we obtain $F_aF_bZ'' \doteq Z''Y_1$, where $Y_1 = R_y(Y)$; here the right part of the equality holds since due to Lemma 9 $\Delta(Y_1) = \Delta(Y) = \{y\}$. This equality is the conjugacy equation for words on $F$; since we have already considered the case when $Z'' = \{\lambda\}$, the only new situation is $F_aF_b = RS$, where $R \neq \{\lambda\}$, $Y_1 = SR$, and $Z'' = (RS)^kR$ for some $k \geq 0$. As above, there are two cases: either $R = F_a$, $S = F_b$, or $R = F_aF_b$, $S = \{\lambda\}$.

If $R = F_a$, we have $Z'' = (F_aF_b)^kF_a$ and $Y_1 = F_bF_a$ so that $y = b$. Since $Z'' = R_y(Z)$, we have $(F_aF_b)^kF_a \subseteq Z \subseteq (F_aF_b)^kF_ab^*$, and since $Y_1 = R_b(Y)$ we have $F_bF_a \subseteq Y \subseteq F_bF_ab^*$. Note also that $Z' = L_b(Z) \supseteq (F_aF_b)^kF_a$ by the definition of $L_b$, so that $(F_aF_b)^kF_a \subseteq Z' \subseteq Z \subseteq (F_aF_b)^kF_ab^*$.

If we return to (14), we see that the cases when $k = 0$ and $k > 0$ give different solutions.

If $k = 0$, we have $F_aF_bZ' \doteq F_aY$, so that $Y \doteq F_bZ'$. Here $Z$ is an arbitrary factorial language such that $F_a \subseteq Z \subseteq F_ab^*$, and $Z' = L_b(Z)$. Note also that $Y = F_bZ$, and any $Z$ satisfying the inclusions gives such a solution. This is

Solution 6. $Z$ is an arbitrary factorial language such that $F_a \subseteq Z \subseteq F_ab^*$, and $Y = F_bZ$. (For example we can take $Z = F_a + b^*$, and then $Y = F_bF_a$; or $Z = F_a + \mathrm{Fac}(bab^*)$, and then $Y = F_bA_1(F_a + b^*)$.)

If $k > 0$, we have $F_aF_bZ' \doteq (F_aF_b)^kF_aY$, so that $Y$ is an arbitrary factorial language such that $F_bF_a \subseteq Y \subseteq F_bF_ab^*$ and $\Delta(Y) = \{b\}$ (then automatically $Y$ can follow $F_a$ in a canonical decomposition). Then $Z' \doteq (F_aF_b)^{k-1}F_aY$. Here $Z$ is an arbitrary language such that $L_b(Z) = Z'$ and $R_b(Z) = Z''$, that is, an arbitrary language such that $Z' \cup Z'' \subseteq Z \subseteq b^*Z' \cap Z''b^*$. Note also that $Z'' = (F_aF_b)^kF_a \subseteq Z'$, so it is not necessary to mention it. This is

Solution 7. $Y$ is an arbitrary factorial language such that $F_bF_a \subseteq Y \subseteq F_bF_ab^*$ and $\Delta(Y) = \{b\}$; and $Z$ is an arbitrary factorial language satisfying $(F_aF_b)^kF_aY \subseteq Z \subseteq b^*(F_aF_b)^kF_aY \cap (F_aF_b)^kF_ab^*$, $k \geq 0$.

Example 10 We can take Y = F b F a + b a b and Z = F a F b F a + F a B a b + b F a b and will have F a F b Z = ZY.

It remains to consider the case of $R = F_aF_b$, that is, $Z'' = (F_aF_b)^k$, $k \geq 1$, and $Y_1 = F_aF_b$; here $y = a$ and thus $F_aF_b \subseteq Y \subseteq F_aF_ba^*$; here $\Delta(Y)$ must be equal to $\{a\}$. Equation (14) gives $F_aF_bZ' \doteq (F_aF_b)^kY$, so that $Z' = (F_aF_b)^{k-1}Y$ and thus $(F_aF_b)^{k-1}Y \subseteq Z \subseteq b^*(F_aF_b)^{k-1}Y \cap (F_aF_b)^ka^*$. This is

Solution 8. $Y$ is an arbitrary factorial language such that $F_aF_b \subseteq Y \subseteq F_aF_ba^*$ and $\Delta(Y) = \{a\}$; and $Z$ is an arbitrary factorial language such that $(F_aF_b)^{k-1}Y \subseteq Z \subseteq b^*(F_aF_b)^{k-1}Y \cap (F_aF_b)^ka^*$ for some $k \geq 1$.

Example 11 We can take Y = F a F b + a F b a and Z = F a F b +Fac(ba F b a ) and will have F a F b Z = ZY.
We have listed all the possible cases and can state

Lemma 17 Define $X = F_aF_b$. Then $XZ = ZY$ for some binary factorial languages $Z$ and $Y$ if and only if $Y$ and $Z$ are defined according to one of Solutions 1–8.

A general theorem describing when binary factorial languages commute can be stated as well, but will contain an intricate list of cases.
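The verification of Solution 3, described above as easy to check, can be written out explicitly. This is a sketch under two assumptions that are not restated in this section: that $\Pi(F_a) = \{a\}$ means $F_a a \subseteq F_a$, so that $F_a a^* = F_a$, and that catenation is monotone with respect to inclusion. With $Z = (F_aF_b)^kF_a$ and $F_bF_a \subseteq Y \subseteq a^*F_bF_a$:

```latex
\begin{align*}
XZ &= F_aF_b\,(F_aF_b)^kF_a = (F_aF_b)^{k+1}F_a,\\
ZY &\supseteq (F_aF_b)^kF_a \cdot F_bF_a = (F_aF_b)^{k+1}F_a,\\
ZY &\subseteq (F_aF_b)^kF_a \cdot a^*F_bF_a
    = (F_aF_b)^k\,(F_aa^*)\,F_bF_a = (F_aF_b)^{k+1}F_a.
\end{align*}
```

The two inclusions together give $ZY = (F_aF_b)^{k+1}F_a = XZ$. The check for Solution 4 would follow the same pattern with $b^*$ absorbed by $F_b$ on its right.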

9 Further problems

Two natural questions arise after several equations have been solved over binary factorial languages. First, is it possible to generalize our results to larger alphabets? In fact, we know that the theorem concerning the equation $X^n = Y^n$ holds for an arbitrary alphabet; and it is not a problem to solve the conjugacy equation on a larger alphabet, but the situation with commutation is less clear. The case study of subalphabets occurring when we consider the case of (7) grows rapidly with the alphabet and instantly becomes very complicated.

The second question concerns equations other than the considered ones: Is there a way to standardize solving general equations on binary factorial languages and to describe something like the Makanin algorithm for them? Clearly, solving equations on factorial languages by our technique cannot be easier than solving word equations: every time we list all possible forms of the equation for the canonical decompositions, and one of them just repeats the initial equation (but holds for words on $F$). We should solve it, as well as all the other possible equations for the canonical decompositions.

Note that the number of the word equations to study increases rapidly with the cardinality of the alphabet considered and the length of the (left and right parts of the) initial equation: for each language variable $X$, we should consider all possible values of the subalphabets $\Delta(X)$ and $\Pi(X)$ and can meet the word variables $L_\Pi(X)$, $R_\Delta(X)$ and $L_\Pi(R_\Delta(X))$ for all possible subalphabets $\Delta$ and $\Pi$. As shown above, the case study is far from trivial even if the alphabet is binary and the initial equation is very short.

In fact, if we consider a longer equation, the following problem arises. A particular equation involving, e.g., variables $X$, $L_a(X)$ and $L_b(X)$ can admit a solution in terms of some new variable factorial languages (above they have been denoted for instance by $R$, $S$, and $Q$).
We must have $L_a(X) + L_b(X) \subseteq X \subseteq a^*L_a(X) \cap b^*L_b(X)$: a solution of the word equation exists if and only if these inclusions hold. However, it is not even clear if satisfiability of such inclusions on factorial languages is decidable. In the considered examples, it was every time clear that a solution exists, but it was just some luck.

So, it is not clear if generalizing the described technique to larger alphabets or longer equations is possible.
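To give a rough feeling for the growth of the case study described in this section, one can count the candidate pairs of subalphabets per language variable. The sketch below is an assumption-laden illustration, not a count taken from the paper: it treats every pair of subsets of the alphabet as a candidate value of $(\Delta(X), \Pi(X))$ and treats the variables as independent, which only gives a crude upper bound, since many combinations may be impossible or equivalent. All function names are hypothetical.

```python
from itertools import combinations


def subsets(alphabet):
    """Return all subsets of the alphabet as frozensets."""
    letters = sorted(alphabet)
    return [
        frozenset(chosen)
        for size in range(len(letters) + 1)
        for chosen in combinations(letters, size)
    ]


def candidate_pairs(alphabet):
    """All candidate (Delta, Pi) pairs for one variable (upper bound only)."""
    subs = subsets(alphabet)
    return [(delta, pi) for delta in subs for pi in subs]


def case_bound(alphabet_size, num_variables):
    """Crude bound: 4^|alphabet| pairs per variable, variables independent."""
    return (4 ** alphabet_size) ** num_variables


pairs_binary = candidate_pairs("ab")
bound_binary = case_bound(2, 3)    # e.g. three variables X, Y, Z
bound_ternary = case_bound(3, 3)
```

Even under this coarse count, moving from two letters to three multiplies the bound for three variables by 64.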

References

[1] S. V. Avgustinovich, A. E. Frid, A unique decomposition theorem for factorial languages, Internat. J. Algebra Comput. 15 (2005), 149–160.
[2] S. V. Avgustinovich, A. E. Frid, Canonical decomposition of a regular factorial language, in: CSR 2006, LNCS 3967, Springer, 2006, 18–22.
[3] J. H. Conway, Regular Algebra and Finite Machines, Chapman & Hall, London, 1971.
[4] V. Diekert, Makanin's Algorithm, in: M. Lothaire, Algebraic combinatorics on words, Cambridge Univ. Press, 2002, 387–442.
[5] A. E. Frid, Canonical decomposition of a catenation of factorial languages, Siberian Electronic Mathematical Reports 4 (2007), 14–22, http://semr.math.nsc.ru/2007/v4/p12-19.pdf.
[6] A. E. Frid, Commutation of binary factorial languages, in: DLT 2007, LNCS 4588, Springer, 2007, 193–204.
[7] J. Karhumäki, M. Latteux, I. Petre, Commutation with codes, Theoret. Comput. Sci. 340 (2005), 322–333.
[8] J. Karhumäki, M. Latteux, I. Petre, The commutation with ternary sets of words, Theory Comput. Systems 38 (2005), 161–169.
[9] M. Kunc, The power of commuting with finite sets of words, Theory of Computing Systems 40 (2007), 521–551.
[10] M. Kunc, What do we know about language equations?, in: DLT 2007, LNCS 4588, Springer, 2007, 23–27.
[11] M. Lothaire, Combinatorics on words, Addison-Wesley, 1983.
[12] B. Ratoandromanana, Codes et motifs, RAIRO Inform. Theor. 23 (1989), 425–444.
[13] A. Salomaa, S. Yu, On the decomposition of finite languages, in: Developments in Language Theory. Foundations, Applications, Perspectives, World Scientific, 2000, 22–31.