On closures of lexicographic star-free languages. E. Ochmański and K. Stawikowska

Preprint No 7/2005. Version 1, posted on April 19, 2005.

Edward Ochmański and Krystyna Stawikowska
Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń
{edoch,entropia}@mat.uni.torun.pl

Abstract. Muscholl and Petersen showed that, in the case of transitive dependencies, closures of star-free lexicographic languages are star-free or non-regular. It implies that, in the same case of transitive dependencies, closures of star-free lexicographic languages are star-free. In this paper, it is shown to be true also in the case of transitive independencies. The main result is even more general, but the general question, whether closures of star-free lexicographic languages are star-free in every case, remains open.

1 Introduction

Trace theory, as a formal tool for the description of concurrent behaviours, was proposed by Mazurkiewicz [2]. The approach is based on the notion of an independency relation, expressing concurrent execution of actions. One of the most useful operations on languages related to independency is the closure operation, introduced by Szíjártó [8] in the initial period of trace theory.

The theory of star-free languages started with the fundamental Schützenberger [7] Theorem (star-free = aperiodic in free monoids). Guaiana/Restivo/Salemi [1] combined both subjects, proving the Schützenberger Theorem for arbitrary trace monoids. The main inspiration for the present paper comes from Muscholl/Petersen [4], where the closure operation in the context of star-free languages was studied.

Lexicographic words play an important role in the theory of recognizable trace languages, because

    Closures of regular sets of lexicographic words are again regular (EO [5]).

Results of Muscholl/Petersen [4] arouse a suspicion that

    Closures of star-free sets of lexicographic words are again star-free.

It is true for trace monoids with transitive dependencies (cartesian products of free monoids), by results of [4] and [5].
We prove in this paper that it is also true for trace monoids with transitive independencies (free products of commutative monoids). To this end, we define a subclass of star-free languages (called LSF-languages) and show that each star-free language of lexicographic words is an LSF-language. The proof consists of a detailed analysis of minimal deterministic automata for sets of lexicographic words.

2 Preliminaries

In this section, we briefly recall the basic notions and known results needed here. Finally, we state the main question of the paper.

2.1 Basic Notions

An alphabet $A$ is assumed to be finite. The set $A^*$ with concatenation as the product operation forms the free monoid; subsets of $A^*$ are called (word) languages. Let $I \subseteq A \times A$ be a symmetric and irreflexive relation on $A$, called independency; its complement $D = (A \times A) \setminus I$ is named dependency. The couple $(A,I)$ or $(A,D)$ is said to be a concurrent alphabet. Given a concurrent alphabet $(A,I)$, the trace monoid $A^*/I$ is the quotient of the free monoid $A^*$ by the least congruence on $A^*$ containing the relation $\{ab = ba \mid a\,I\,b\}$. Members of $A^*/I$ are called traces, and sets of traces (i.e. subsets of $A^*/I$) are called trace languages. Clearly, a trace monoid $A^*/I$ is free iff $I = \emptyset$.

Given a monoid $M$, the complement of a subset $X \subseteq M$ will be denoted by $\complement X$, i.e. $\complement X = M \setminus X$.

Let $(A,I)$ be a concurrent alphabet. Any word $w \in A^*$ induces a trace $[w] \in A^*/I$, the congruence class of $w$; any word language $L \subseteq A^*$ induces a trace language $[L] = \{[w] \mid w \in L\}$, the set of all traces induced by members of $L$. Given a trace language $T \subseteq A^*/I$, the flattening of $T$ is the word language $\bigcup T = \{w \in A^* \mid [w] \in T\}$, the union of traces in $T$, viewed as subsets of $A^*$. Given a word language $L \subseteq A^*$, the closure of $L$ is the word language $\overline{L} = \bigcup [L]$. A word language $L$ is said to be closed (w.r.t. $I$) iff $L = \overline{L}$.

Given a trace monoid $M = A^*/I$ and a trace language $T \subseteq M$, the following notions are well-known:
- atomic trace languages (atoms, for short): $\emptyset$, $[\varepsilon]$, and all $[a]$ for $a \in A$;
- syntactic congruence $\sim_T \subseteq M \times M$ of $T$: $x \sim_T y$ iff $(\forall r,s \in M)\; rxs \in T \Leftrightarrow rys \in T$;
- syntactic monoid of $T$: the quotient monoid $M/{\sim_T}$.

A trace language $T \subseteq M = A^*/I$ is said to be:
- rational iff it is built from atoms with union, product and star;
- recognizable iff its syntactic monoid is finite;
- star-free iff it is built from atoms with union, product and complement;
- aperiodic iff its syntactic monoid is finite and aperiodic: $(\exists n)(\forall x)\; x^n = x^{n+1}$.
The classes of languages defined above will be denoted by $\mathrm{RAT}(M)$, $\mathrm{REC}(M)$, $\mathrm{SF}(M)$ and $\mathrm{AP}(M)$, respectively. The argument may be omitted when no confusion arises. As $\mathrm{RAT}(A^*) = \mathrm{REC}(A^*)$ in finitely generated free monoids (Kleene Theorem), the class $\mathrm{RAT}(A^*) = \mathrm{REC}(A^*)$ will be uniformly denoted by $\mathrm{REG}(A^*)$ in that case. By definition, $\mathrm{AP}(M) \subseteq \mathrm{REC}(M)$ for any monoid $M$. Moreover, if $M$ is a trace monoid, the inclusions $\mathrm{SF}(M) \subseteq \mathrm{REC}(M) \subseteq \mathrm{RAT}(M)$ hold.
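The notions of trace and closure can be illustrated on finite languages. The following is a minimal sketch, not part of the original paper: the names `trace_of`, `closure`, `symmetric` and the sample independency `INDEP` are hypothetical, and the trace of a word is enumerated simply by repeatedly swapping adjacent independent letters.

```python
from collections import deque


def symmetric(pairs):
    """Return the symmetric closure of a set of letter pairs."""
    return set(pairs) | {(b, a) for a, b in pairs}


def trace_of(word, indep):
    """Return the trace [word] as a set of words.

    Breadth-first search over swaps of adjacent independent letters.
    """
    seen = {word}
    queue = deque([word])
    while queue:
        w = queue.popleft()
        for i in range(len(w) - 1):
            a, b = w[i], w[i + 1]
            if (a, b) in indep:
                swapped = w[:i] + b + a + w[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    queue.append(swapped)
    return seen


def closure(language, indep):
    """Closure of a finite word language: the union of the traces of its words."""
    result = set()
    for w in language:
        result |= trace_of(w, indep)
    return result


# Hypothetical sample alphabet {a, b, c} in which only a and b are independent.
INDEP = symmetric({("a", "b")})
```

With this sample independency, `closure({"ab", "c"}, INDEP)` contains `"ab"`, `"ba"` and `"c"`, and a finite language `L` is closed exactly when `closure(L, INDEP) == L`.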

2.2 Roots and influences

Let us recall some fundamental results used in the paper.

Theorem 2.1 (Schützenberger [7]). $\mathrm{SF}(A^*) = \mathrm{AP}(A^*)$.

Theorem 2.2 (Guaiana/Restivo/Salemi [1]). Let $M = A^*/I$ be a trace monoid and $T \subseteq M$. Then
$T \in \mathrm{SF}(M) \Leftrightarrow T \in \mathrm{AP}(M) \Leftrightarrow \bigcup T \in \mathrm{AP}(A^*) \Leftrightarrow \bigcup T \in \mathrm{SF}(A^*)$.

Corollary 2.3. The family of closed SF-languages is the least family containing atoms and closed under union, complement and closed product (where $X \odot Y = \overline{XY}$).

Corollary 2.4. Let $X, Y \subseteq A^*$. If $\overline{X}, \overline{Y} \in \mathrm{SF}$, then $\overline{XY} = \overline{X} \odot \overline{Y} \in \mathrm{SF}$.

Theorem 2.5 (Muscholl/Petersen [4]). Let $(A,D)$ be a concurrent alphabet with transitive dependency $D$, and let $L \subseteq A^*$. If $L \in \mathrm{SF}$ then $\overline{L} \in \mathrm{SF}$ or $\overline{L} \notin \mathrm{REG}$.

Example 2.6 (Muscholl/Petersen [4]). If $D$ is not transitive, then Theorem 2.5 does not hold, as shown by the set $L = (abcbac)^*$ for the dependency graph $D$: $a - c - b$.

2.3 Lexicographic languages and their closures

A concurrent alphabet equipped with a strict order $<$ on the alphabet will be called an oriented concurrent alphabet and denoted by $(A,<,I)$. Such an alphabet is said to be transitively oriented iff the relation ${<} \cap I$ is transitive. Any strict order on $A$ induces the well-known lexicographic order on $A^*$. A word $w \in A^*$ is said to be lexicographic (w.r.t. $<$ and $I$) iff it is lexicographically first in its closure $\overline{w} \subseteq A^*$. The set of all lexicographic words will be denoted by $\mathrm{LEX}$ (assuming $<$ and $I$ are unambiguously fixed). Notice that, for each oriented concurrent alphabet, $\mathrm{LEX}$ is star-free, as

$\mathrm{LEX} = A^* \setminus \bigcup \{A^*\, b\, I_a^*\, a\, A^* \mid a\,I\,b \wedge a < b\}$, where $I_a = \{c \in A \mid a\,I\,c\}$.

Property 2.7. $(\forall x,y,z \in A^*)\; xyz \in \mathrm{LEX} \Rightarrow y \in \mathrm{LEX}$.

Given an oriented concurrent alphabet $(A,<,I)$, the lexicographic representation of a trace language $T \subseteq A^*/I$ is defined as the word language $\mathrm{Lex}(T) = \bigcup T \cap \mathrm{LEX}$.

Theorem 2.8 (EO [5]). Let $M = A^*/I$ be a trace monoid.
- trace formulation: $T \in \mathrm{REC}(M) \Leftrightarrow \mathrm{Lex}(T) \in \mathrm{REG}(A^*)$;
- word formulation: $\overline{L} \in \mathrm{REG}(A^*) \Leftrightarrow \overline{L} \cap \mathrm{LEX} \in \mathrm{REG}(A^*)$.
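The star-free description of LEX by forbidden factors $b\,I_a^*\,a$ can be checked mechanically on short words. The sketch below is an illustration under stated assumptions, not code from the paper: the alphabet, the order `ORDER` and the independency `INDEP` are hypothetical sample data, and `is_lex` and `is_lex_bruteforce` are names introduced here. The first function searches for a forbidden factor; the second compares the word with the lexicographically first member of its trace.

```python
from collections import deque
from itertools import product

# Hypothetical sample data: order a < b < c < d, independent pairs ab, ad, cd.
ORDER = "abcd"
RANK = {x: i for i, x in enumerate(ORDER)}
INDEP = {("a", "b"), ("a", "d"), ("c", "d")}
INDEP |= {(y, x) for x, y in INDEP}


def is_lex(word):
    """True iff word has no factor b u a with a < b, a I b and u in I_a*."""
    for j, a in enumerate(word):
        # Scan leftwards from position j through letters independent of a.
        for i in range(j - 1, -1, -1):
            x = word[i]
            if (x, a) not in INDEP:
                break  # x depends on a, so a cannot be moved past x
            if RANK[a] < RANK[x]:
                return False  # x plays the role of b in the forbidden factor
    return True


def is_lex_bruteforce(word):
    """True iff word is the lexicographically first word of its trace."""
    seen, queue = {word}, deque([word])
    while queue:
        w = queue.popleft()
        for i in range(len(w) - 1):
            if (w[i], w[i + 1]) in INDEP:
                v = w[:i] + w[i + 1] + w[i] + w[i + 2:]
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return word == min(seen, key=lambda w: [RANK[x] for x in w])


def all_words(max_len):
    """All words over ORDER of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(ORDER, repeat=n):
            yield "".join(letters)
```

Comparing the two predicates on `all_words(5)` gives a quick sanity check of the forbidden-factor description for this sample alphabet; it is a finite test, of course, not a proof.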

Theorem 2.8 raises a question for star-free languages:

Question 2.9. Let $(A,<,I)$ be an oriented concurrent alphabet. Is it true that
- trace formulation: $T \in \mathrm{SF}(A^*/I) \Leftrightarrow \mathrm{Lex}(T) \in \mathrm{SF}(A^*)$?
- word formulation: $\overline{L} \in \mathrm{SF}(A^*) \Leftrightarrow \overline{L} \cap \mathrm{LEX} \in \mathrm{SF}(A^*)$?

The question is supported by the result and example of Muscholl/Petersen [4], as Theorems 2.5 and 2.8 yield the positive answer in the case of transitive dependency (as $\mathrm{SF}(A^*) \subseteq \mathrm{REG}(A^*)$ and $\overline{\overline{L} \cap \mathrm{LEX}} = \overline{L}$), and moreover, the language of Example 2.6 does not work as a counterexample for our question, because it is not included in $\mathrm{LEX}$. Remark that the implication $\Rightarrow$ is true, because the family SF is closed under intersection. Thus the crucial question may be formulated as follows:

Question 2.10. Does $L \in \mathrm{SF} \wedge L \subseteq \mathrm{LEX}$ imply $\overline{L} \in \mathrm{SF}$?

3 LSF-languages

Let us define two operations on languages, related to $\mathrm{LEX}$:
- lexicographic product: $X \diamond Y = XY$ if $XY \subseteq \mathrm{LEX}$, and undefined otherwise;
- lexicographic complement: $\complement_{\mathrm{LEX}} X = \mathrm{LEX} \setminus X$.

Let LSF be the class of languages built from atoms with union, lexicographic product and lexicographic complement. Any expression built from atoms with $\cup$, $\diamond$ (whenever defined) and $\complement_{\mathrm{LEX}}$ is called an LSF-expression.

Fact 3.1. If $L \in \mathrm{LSF}$ then $L \in \mathrm{SF}$ and $L \subseteq \mathrm{LEX}$.

Proof. Obvious, as atoms and $\mathrm{LEX}$ are star-free and included in $\mathrm{LEX}$.

We will see, in the next section, that the converse of Fact 3.1 is true in the transitively oriented case, as a consequence of Lemmas 4.6 and 4.7. We do not know whether it is true in general.

Lemma 3.2. If $L \in \mathrm{LSF}$ then $\overline{L} \in \mathrm{SF}$.

Proof. Structural induction on LSF-expressions: for atoms it is obvious; for $\cup$ because $\overline{X \cup Y} = \overline{X} \cup \overline{Y}$; for $\diamond$ from Corollary 2.4; for $\complement_{\mathrm{LEX}}$ because $\overline{\mathrm{LEX} \setminus X} = A^* \setminus \overline{X}$ if $X \subseteq \mathrm{LEX}$.

4 Transitively oriented case

For this section, we assume $(A,<,I)$ with transitive ${<} \cap I$. In this case the following characterization of $\mathrm{LEX}$ holds:

Proposition 4.1 (Métivier/EO [3]). If $(A,<,I)$ is transitively oriented, then
$\mathrm{LEX} = \{w \in A^* \mid (\forall a,b \in A)(\forall u,v \in A^*)\; w = uabv \Rightarrow a\,D\,b \vee a < b\}$.

Let us denote, for $y \in A$,
- ${}_{y}\mathrm{LEX} = \mathrm{LEX} \cap yA^*$: lexicographic words starting with $y$;
- $\mathrm{LEX}_{y} = \mathrm{LEX} \cap A^*y$: lexicographic words ending in $y$.

Lemma 4.2. If $(a_1 < \dots < a_n; I)$ is transitively oriented, then $(\forall y \in A)\; {}_{y}\mathrm{LEX} \in \mathrm{LSF}$ and $\mathrm{LEX}_{y} \in \mathrm{LSF}$.

Proof. Notice that, as a consequence of Proposition 4.1, we have (with $\diamond$ denoting the lexicographic product of Section 3)
(1) $(\forall y \in A)\; {}_{y}\mathrm{LEX} = y \diamond (\mathrm{LEX} \setminus \bigcup \{{}_{x}\mathrm{LEX} \mid x < y \wedge x\,I\,y\})$,
(2) $(\forall y \in A)\; \mathrm{LEX}_{y} = (\mathrm{LEX} \setminus \bigcup \{\mathrm{LEX}_{z} \mid y < z \wedge y\,I\,z\}) \diamond y$.
Clearly,
(1') ${}_{a_1}\mathrm{LEX} = a_1 \diamond \mathrm{LEX} \in \mathrm{LSF}$ and
(2') $\mathrm{LEX}_{a_n} = \mathrm{LEX} \diamond a_n \in \mathrm{LSF}$.
Now, inductively from $a_1$ to $a_n$, starting with (1'), we get
(1'') $(\forall y \in A)\; {}_{y}\mathrm{LEX} \in \mathrm{LSF}$,
and from $a_n$ to $a_1$, starting with (2'), we get
(2'') $(\forall y \in A)\; \mathrm{LEX}_{y} \in \mathrm{LSF}$.

4.1 Automata for LEXes

An automaton is a quadruple $(A,Q,s_0,F)$, where $A$ is an alphabet, $Q$ is a set of states, $s_0 \in Q$ is an initial state and $F \subseteq Q$ is a set of final states; states are partial functions $q: A \to Q$; the domain of $q$ will be denoted by $\mathrm{dom}(q)$. Whenever $F = Q$ (any state is final), we write such an automaton as a triple $(A,Q,s_0)$; this is the case of automata for LEXes. Proposition 4.1 justifies, in the case of transitively oriented alphabets, the following construction of the minimal deterministic automaton $\mathcal{A}_{\mathrm{LEX}}$ for the set $\mathrm{LEX}$.

Construction 4.3. M.d.a. for LEX, transitively oriented case.
Given a transitively oriented concurrent alphabet $(a_1 < \dots < a_m; D)$, the automaton is built up inductively, as follows.

$\mathcal{A}_1 = \mathcal{A}_{\mathrm{LEX}}(a_1; D)$: $(\{a_1\}, \{q_1\}, q_1)$, where $q_1(a_1) = q_1$.

Given the automaton $\mathcal{A}_n = \mathcal{A}_{\mathrm{LEX}}(a_1 < \dots < a_n; D) = (\{a_1,\dots,a_n\}, Q_n, q_1)$ for $n < m$ (set $|Q_n| = k$), we build $\mathcal{A}_{n+1} = \mathcal{A}_{\mathrm{LEX}}(a_1 < \dots < a_n < a_{n+1}; D) = (\{a_1,\dots,a_n,a_{n+1}\}, Q_{n+1}, q_1)$:

1° $Q_{n+1} = Q_n \cup \{q\}$, where $q = q_i \in Q_n$ if $(\forall x \in \mathrm{dom}(q_i))\; x\,D\,a_{n+1}$ and $(\forall j < i)(\exists x \in \mathrm{dom}(q_j))\; x\,I\,a_{n+1}$; in words, $q_i$ is the first state of $Q_n$ such that all members of its domain depend on $a_{n+1}$. If such a state does not exist, $q = q_{k+1}$, i.e. a new state is added.

2° We extend the domains of the states in $Q_n$: $(\forall i \le k)\; q_i(a_{n+1}) = q$ (the state found or added in 1°).

3° If a new state was added in 1°, we define its activity: $\mathrm{dom}(q_{k+1}) = \{x \in \{a_1,\dots,a_n,a_{n+1}\} \mid x\,D\,a_{n+1}\}$; $q_{k+1}(x) = q_i$ iff $(\forall y \in \mathrm{dom}(q_i))\; (y\,D\,x \vee x < y)$ and $(\forall j < i)(\exists y \in \mathrm{dom}(q_j))\; (y\,I\,x \wedge y < x)$; in words, $q_i$ is the first state of $Q_{n+1}$ such that each member of its domain depends on $x$ or follows $x$ (in the alphabetic order).

Example 4.4. For the concurrent alphabet $(A,D)$: $a - c - b - d$, with the order $a < b < c < d$, the construction of the m.d.a. for $\mathrm{LEX}$ proceeds as follows (from left to right: $\mathcal{A}_1$, $\mathcal{A}_2$, $\mathcal{A}_3$ and $\mathcal{A}_4$):

[Figure: the automata $\mathcal{A}_1$, $\mathcal{A}_2$, $\mathcal{A}_3$ and $\mathcal{A}_4$; states are numbered 1, 2, 3 and arcs are labelled with the letters $a$, $b$, $c$, $d$.]

Property 4.5. Let $(A,<,I)$ be transitively oriented and $\mathcal{A}_{\mathrm{LEX}} = (A,Q,q_1)$. Then, for each $a \in A$, there is exactly one state $q_a \in Q$ such that $(\forall p,r \in Q)\; p(a) = r \Rightarrow r = q_a$, i.e. all $a$-labelled arcs in $\mathcal{A}_{\mathrm{LEX}}$ aim at a common state $q_a$.

Proof. Directly from Construction 4.3.

Given an automaton $\mathcal{A}_{\mathrm{LEX}} = (A,Q,q_1)$, by $L(p,r)$ we denote, for $p,r \in Q$, the language recognized by the automaton $(A,Q,p,\{r\})$, i.e. the language given by all paths in $\mathcal{A}_{\mathrm{LEX}}$ from $p$ (as the initial state) to $r$ (as the final state). By $L(p,Q)$ we denote $\bigcup \{L(p,r) \mid r \in Q\}$.

Lemma 4.6. Let $(A,<,I)$ be transitively oriented and $\mathcal{A}_{\mathrm{LEX}} = (A,Q,q_1)$. Then $(\forall p,r \in Q)\; L(p,r) \in \mathrm{LSF}$.

Proof. We have $L(q_1,r) = \bigcup \{\mathrm{LEX}_{y} \mid (\exists q \in Q)\; q(y) = r\}$ ($\cup\, \{\varepsilon\}$ if $r = q_1$), by Property 4.5, and $L(p,Q) = \bigcup \{{}_{y}\mathrm{LEX} \mid y \in \mathrm{dom}(p)\} \cup \{\varepsilon\}$, by Properties 2.7 and 4.5. Hence, by Lemma 4.2, $L(q_1,r)$ and $L(p,Q)$ are in LSF. Now observe that $L(p,r) \setminus \{\varepsilon\} = (L(p,Q) \cap L(q_1,r)) \setminus \{\varepsilon\}$, by Properties 2.7 and 4.5. As LSF is closed under boolean operations, $L(p,r) \in \mathrm{LSF}$.
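For experiments, an automaton for LEX in the transitively oriented case can also be obtained directly from Proposition 4.1, without the inductive steps of Construction 4.3. The sketch below is such a shortcut and is not the authors' construction: a state is identified with its domain, the state reached by a letter `a` is the follower set of letters `b` with `a D b` or `a < b`, and letters with equal follower sets share a state. All function names are hypothetical, and the sample data assumes that the dependency graph of Example 4.4 is the path a - c - b - d.

```python
def follower_sets(order, indep):
    """Map each letter a to the set of letters b with a D b or a < b."""
    rank = {x: i for i, x in enumerate(order)}
    return {
        a: frozenset(b for b in order
                     if (a, b) not in indep or rank[a] < rank[b])
        for a in order
    }


def is_transitively_oriented(order, indep):
    """Check that the relation (< intersected with I) is transitive."""
    rank = {x: i for i, x in enumerate(order)}
    rel = {(a, b) for (a, b) in indep if rank[a] < rank[b]}
    return all((a, d) in rel
               for (a, b) in rel for (c, d) in rel if b == c)


def build_lex_automaton(order, indep):
    """Return (states, initial, follow); every state is final.

    A state is the frozenset of letters it can read; reading letter x
    from a state containing x leads to follow[x].
    """
    if not is_transitively_oriented(order, indep):
        raise ValueError("Proposition 4.1 needs a transitively oriented alphabet")
    follow = follower_sets(order, indep)
    initial = frozenset(order)  # every letter may start a lexicographic word
    states = {initial} | set(follow.values())
    return states, initial, follow


def accepts(automaton, word):
    """Run the partial deterministic automaton on word."""
    _, state, follow = automaton
    for x in word:
        if x not in state:
            return False  # missing transition: word is rejected
        state = follow[x]
    return True


# Assumed reading of Example 4.4: D is the path a - c - b - d, order a < b < c < d,
# hence the independent pairs are ab, ad and cd.
ORDER = "abcd"
INDEP = {("a", "b"), ("a", "d"), ("c", "d")}
INDEP |= {(y, x) for x, y in INDEP}
AUTOMATON = build_lex_automaton(ORDER, INDEP)
```

Under these assumptions the automaton has three states, with `a` and `c` leading to the initial state. Since every arc labelled `x` leads to `follow[x]`, the sketch also reflects Property 4.5.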

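4.2 Main result

The following lemma holds for arbitrary oriented alphabets.

Lemma 4.7. Let $(A,<,I)$ be an oriented concurrent alphabet and $\mathcal{A}_{\mathrm{LEX}} = (A,Q,q_1)$. If every $L(p,r) \in \mathrm{LSF}$, then $L \in \mathrm{SF} \wedge L \subseteq \mathrm{LEX} \Rightarrow L \in \mathrm{LSF}$.

Proof. First we prove, under the hypothesis of the lemma, the following claim.

Claim: $(\forall L \in \mathrm{SF})(\forall p,r \in Q)\; L \cap L(p,r) \in \mathrm{LSF}$.

Proof. Structural induction on SF-expressions. For atoms it is obvious. Induction step (with $\diamond$ the lexicographic product and $\complement_{\mathrm{LEX}} X = \mathrm{LEX} \setminus X$ the lexicographic complement):
- $(X \cup Y) \cap L(p,r) = (X \cap L(p,r)) \cup (Y \cap L(p,r)) \in \mathrm{LSF}$;
- $XY \cap L(p,r) = \bigcup \{(X \cap L(p,q)) \diamond (Y \cap L(q,r)) \mid q \in Q\} \in \mathrm{LSF}$;
- $\complement X \cap L(p,r) = \complement_{\mathrm{LEX}} X \cap L(p,r) = L(p,r) \cap \complement_{\mathrm{LEX}}(X \cap L(p,r)) \in \mathrm{LSF}$.
End of Claim.

Now observe that $\mathrm{LEX} = \bigcup \{L(q_1,r) \mid r \in Q\}$, thus $L \cap \mathrm{LEX} = \bigcup \{L \cap L(q_1,r) \mid r \in Q\}$. As all $L \cap L(q_1,r) \in \mathrm{LSF}$ (by the Claim), their union is in LSF, too.

Remark that Lemmas 4.6 and 4.7 yield the converse of Fact 3.1 in the case of transitively oriented concurrent alphabets. And now we can prove:

Proposition 4.8. Let $(A,<,I)$ be a transitively oriented concurrent alphabet. If $L \in \mathrm{SF}$ and $L \subseteq \mathrm{LEX}$, then $\overline{L} \in \mathrm{SF}$.

Proof. Every $L(p,r) \in \mathrm{LSF}$, by Lemma 4.6. Then $L \cap \mathrm{LEX} = L \in \mathrm{LSF}$, by Lemma 4.7. And Lemma 3.2 ends the proof.

5 Conclusions

Let us recall Theorem 2.5 and notice that, with the support of Theorems 2.8 and 2.2, it yields

Proposition 5.1. Let $(A,I)$ be a concurrent alphabet with transitive $D = (A \times A) \setminus I$, and let $T \subseteq A^*/I$ be a trace language. Then
(i) $T \in \mathrm{SF}$ iff
(ii) $(\exists <)\; \mathrm{Lex}_{<}(T) \in \mathrm{SF}$ iff
(iii) $(\forall <)\; \mathrm{Lex}_{<}(T) \in \mathrm{SF}$.

Proof. (iii) $\Rightarrow$ (ii) is obvious. (ii) $\Rightarrow$ (i): Let $L = \mathrm{Lex}_{<}(T)$; by Theorem 2.8, $\overline{L} \in \mathrm{REG}$; then by Theorem 2.5, $\bigcup T = \overline{L} \in \mathrm{SF}$; and finally, by Theorem 2.2, $T \in \mathrm{SF}$. (i) $\Rightarrow$ (iii): by Theorem 2.2, $\bigcup T \in \mathrm{SF}$, thus $\mathrm{Lex}_{<}(T) = \bigcup T \cap \mathrm{LEX} \in \mathrm{SF}$, since $\mathrm{LEX} \in \mathrm{SF}$.

Remark that only (ii) $\Rightarrow$ (i) uses the assumption that $D$ is transitive (Theorem 2.5).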

Let us look at concurrent alphabets with transitive $I$. Recall that such trace monoids constitute an important class of trace monoids. Namely, it is just the class of trace monoids with decidable recognizability problem (Sakarovitch [6]) and SF-problem (Muscholl/Petersen [4]). The results of the present paper show that the claim of Proposition 5.1 is true also under the hypothesis of transitive $I$.

Proposition 5.2. Let $(A,I)$ be a concurrent alphabet with transitive $I$, and let $T \subseteq A^*/I$ be a trace language. Then
(i) $T \in \mathrm{SF}$ iff
(ii) $(\exists <)\; \mathrm{Lex}_{<}(T) \in \mathrm{SF}$ iff
(iii) $(\forall <)\; \mathrm{Lex}_{<}(T) \in \mathrm{SF}$.

Proof. From Proposition 4.8, as in this case $(A,<,I)$ is transitively oriented for any $<$.

The question whether Propositions 5.1 and 5.2 hold unconditionally (for arbitrary concurrent alphabets) remains open. Notice that there are concurrent alphabets with non-transitive $D$ and without transitive orientations, for example the pentagon $(A,D)$:

[Figure: the pentagon, a dependency graph forming a cycle on five letters.]

References

1. G. Guaiana, A. Restivo, S. Salemi: Star-free trace languages. Theoretical Computer Science 97, pp. 301-311, 1992.
2. A. Mazurkiewicz: Concurrent program schemes and their interpretations. Report DAIMI-PB-78, Aarhus University, 1977.
3. Y. Métivier, E. Ochmański: On lexicographic semi-commutations. Information Processing Letters 26, pp. 55-59, 1987.
4. A. Muscholl, H. Petersen: A note on the commutative closure of star-free languages. Information Processing Letters 57, pp. 71-74, 1996.
5. E. Ochmański: Regular behaviour of concurrent systems. Bulletin of EATCS 27, pp. 56-67, 1985.
6. J. Sakarovitch: The last decision problem for rational trace languages. Proc. of LATIN '92, LNCS 583, pp. 460-473. Springer, 1992.
7. M.P. Schützenberger: On finite monoids having only trivial subgroups. Information and Control 8, pp. 190-194, 1965.
8. M. Szíjártó: A classification and closure properties of languages for describing concurrent system behaviours. Fundamenta Informaticae 4, pp. 531-549, 1981.