Cover Automata for Finite Languages


Michaël Cadilhac
Technical Report no. 0504, June 2005, revision 681

Abstract. Although regular languages combined with finite automata are widely used and studied, many applications only use finite languages. Cover automata were introduced in Câmpeanu et al. (2001) as an efficient way to represent such languages. The idea is to have an automaton that recognizes not only the given language but possibly also words longer than any word in it. The construction of a minimal deterministic cover automaton of a finite language results in an automaton with at most as many states as, and usually fewer than, the minimal deterministic automaton that recognizes the language exactly. In this technical report, the theory of cover automata is presented; we then focus on the algorithms that compute a minimal deterministic cover automaton.

Résumé. Although rational languages, seen through finite automata, are widely used, many applications in the end only deal with finite languages. Cover automata, introduced in Câmpeanu et al. (2001), are an efficient way to represent these languages. The goal is to build an automaton that recognizes a language larger than the original one, but from which the original language can be recovered by a single test on the length of the evaluated word. The minimal automaton built this way has fewer states than the minimal automaton that recognizes the language exactly. In this technical report, the theory of cover automata is presented, followed by a detailed account of the algorithms that compute the minimal cover automaton.

Keywords: Automata, Cover Automata, Minimization, Finite Languages.

Laboratoire de Recherche et Développement de l'Epita
14-16, rue Voltaire
F-94276 Le Kremlin-Bicêtre cedex, France
Tél. +33 1 53 14 59 47 — Fax. +33 1 53 14 59 22
michael.cadilhac@lrde.epita.fr
http://www.lrde.epita.fr/~cadilh_m

Copying this document
Copyright © 2005 LRDE. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being just "Copying this document", no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is provided in the file COPYING.DOC.

Contents
1 Basics
  1.1 Definitions and notations
  1.2 Isomorphism and minimal automata
    1.2.1 Preliminaries
    1.2.2 On the way to minimization
    1.2.3 Minimization algorithms
2 Cover automata
  2.1 Basic idea
  2.2 On the similarity of states
  2.3 Minimal DFCA
3 Cover minimization
  3.1 An O(n²) algorithm
    3.1.1 Definitions
    3.1.2 Algorithm
    3.1.3 Example
  3.2 An O(n log(n)) algorithm
    3.2.1 Algorithm
    3.2.2 Example
4 Implementation in the VAUCANSON framework
  4.1 Implementations
    4.1.1 The O(n²) algorithm
    4.1.2 The O(n log(n)) algorithm
  4.2 Performances
Bibliography
List of Figures
Index

Introduction

Finite languages are perhaps the most often used but the least studied family of languages in the formal language hierarchy. For most of the past century, the work done in this field appeared only as parts of larger studies that did not focus on it. Only recently have several aspects of finite languages, such as state complexity and decompositions, been studied (see for example Câmpeanu et al. (1999), Yu (1999)).

However, the wide use of finite languages¹ has led the research effort to focus on ways to represent finite languages efficiently. This effort has led to the creation of a specific field in automata theory: cover automata. First introduced by Câmpeanu et al. (2001), cover automata are an alternative representation of finite languages. The aim was to create an automaton that can, in some way or another, effectively represent a finite language but with fewer states than the corresponding automaton for this language. By fewer states, we are referring to the number of states of the minimal deterministic version of each automaton.

Roughly speaking, if L is a finite language and l the length of the longest word(s) in L, a cover automaton for the language L accepts all words in L and possibly additional words of length greater than l. We will see in the sequel how this construction can be used to:
- express finite languages as powerfully as with a classical automaton,
- reduce the size of the automaton.

In the first part, some basics of automata theory are introduced, together with a reminder on classical automata minimization. Then, in the second part, cover automata theory is described, and the main properties of these automata are presented. In the third part, the class of algorithms that compute a minimal deterministic finite cover automaton of a language is studied. Last, the final part presents the implementations and performances of these algorithms in the VAUCANSON² framework.

¹ For instance in lexical analysis or in user interface translations (see Wood and Yu (1998)).
² VAUCANSON: http://vaucanson.lrde.epita.fr

Chapter 1
Basics

This chapter presents some basics of automata theory that we need in this report. First, some classical notions are presented, then the class of algorithms that minimize an automaton. The following definitions are tailored to the use we will make of them (i.e. they are not absolute). Unless specified otherwise, they are taken from Sakarovitch (2003). This part being only a reminder, the theorems and propositions listed here will not be proved.

1.1 Definitions and notations

Notation 1 (Alphabet and words). If Σ is a set of letters, we will write Σ* for the set of words over the alphabet Σ. The empty word will be denoted ε. Additionally, we will write Σ^{>l} for the set of words longer than l (respectively, Σ^{<l} for the set of words shorter than l).

Definition 1 (Language and finite language). A language L over the alphabet Σ is a (possibly empty) set of words, that is to say, L ⊆ Σ*. A language L is finite if Card(L) is finite.

Definition 2 (Automaton). An automaton A is specified by the following elements:
- a non-empty set Q called the set of states of A,
- a non-empty finite set Σ called the alphabet of A,
- two subsets I and F of Q, I being the set of initial states and F the set of final ones,
- a function ϕ : Q × Σ* → 2^Q called the transition function. Given a state q and a word w, ϕ(q, w) represents the set of states that have an incoming transition from q labeled by w.

Definition 3 (Language of an automaton). The language recognized by the automaton A is
L(A) = { w ∈ Σ* | ∃ q_i ∈ I, ∃ q_f ∈ F such that w leads from q_i to q_f }.

Definition 4 (Finite automaton). We will say that an automaton is finite if the set Q is finite.

Definition 5 (Real-time automaton). An automaton is real-time if the transition function ϕ is defined as ϕ : Q × Σ → 2^Q. In other words, all the transitions are labeled by a single letter.

Notation 2 (Automaton). With the notations of Definition 2, an automaton A will be written as a quintuple A = ⟨Q, Σ, ϕ, I, F⟩. We will say that A is an automaton over Σ.

The sequel of this report will only consider real-time automata.

Definition 6 (Completeness). An automaton is complete if for every state q and every letter a of the alphabet, there is an outgoing transition from q labeled by a. An automaton can be completed: a sink state is added, to which every missing transition is directed.

Definition 7 (Trim). An automaton is trimmed if all its states are reachable from the initial state and can reach a final state. Trimming usually removes the sink state.

Definition 8 (Deterministic Finite Automaton). A Deterministic Finite Automaton (DFA) is an automaton such that ∀q ∈ Q, ∀a ∈ Σ, Card(ϕ(q, a)) ≤ 1 and Card(I) = 1, i.e. for every state and every letter, there exists at most one outgoing transition labeled by this letter.

Definition 9 (Non-deterministic Finite Automaton). A Non-deterministic Finite Automaton (NFA) is the general case with respect to the determinism property: determinism and non-determinism are not mutually exclusive concepts, and a deterministic automaton is in particular a non-deterministic one.

The remainder of this document will only consider deterministic automata.

Definition 10 (Evaluation). An evaluation of a word w = (w_1, ..., w_n) on an automaton A is defined recursively on the letters by:
- Q_0 is the set of initial states; letting Q_k be the set of states reached by the prefix (w_1, ..., w_k), Q_{k+1} is the set of states reached with the letter w_{k+1} from the states of Q_k,
- the result of the evaluation is true if Q_n ∩ F ≠ ∅, false otherwise.

Notation 3 (Evaluation). The evaluation of a word w on an automaton A will be written eval(w, A). If eval(w, A) is true, then we say that w is recognized by A, or that w is in the language L(A) described by A.

1.2 Isomorphism and minimal automata

1.2.1 Preliminaries

First, the reader has to be convinced that there is not only one automaton for a given language. For instance, the two automata of Figure 1.1 recognize the same words.

Figure 1.1: Automata for a*

The question that then arises is: does an automaton have a canonical representation? It is the case for deterministic automata: the minimal deterministic automaton is unique for a given language. For example, the minimal DFA for a* is the one on the left of Figure 1.1.
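To make Definition 10 and Notation 3 concrete, here is a minimal standalone C++ sketch of evaluation on a complete deterministic real-time automaton. This is illustrative code only, not VAUCANSON: the Dfa type, its encoding and the eval function are names chosen for the example. It is exercised on a one-state automaton for a*, as on the left of Figure 1.1, completed here with a sink state so that the transition function is total over {a, b}.

// Illustrative sketch (not VAUCANSON code): evaluating a word on a complete
// deterministic real-time automaton, as in Definition 10 and Notation 3.
#include <cassert>
#include <map>
#include <set>
#include <string>

struct Dfa {
  int initial;
  std::set<int> finals;
  std::map<std::pair<int, char>, int> delta;  // complete transition function
};

// eval(w, A): follow the unique run of w and test whether it ends in a final state.
bool eval(const std::string& w, const Dfa& a) {
  int q = a.initial;
  for (char letter : w)
    q = a.delta.at({q, letter});
  return a.finals.count(q) != 0;
}

int main() {
  // A one-state automaton for a* (the minimal DFA of Figure 1.1), completed
  // with a sink state 1 so that the transition function is total over {a, b}.
  Dfa a{0, {0}, {{{0, 'a'}, 0}, {{0, 'b'}, 1}, {{1, 'a'}, 1}, {{1, 'b'}, 1}}};
  assert(eval("aaa", a));
  assert(!eval("ab", a));
  return 0;
}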

The following definitions and theorems will be needed to understand the algorithms that compute the minimal automaton of a DFA.

Definition 11 (Isomorphism of automata). Two automata are isomorphic if they are the same up to the renaming of their states.

Definition 12 (Minimal automaton). An automaton A is minimal if for every automaton B such that L(A) = L(B), Card(A) ≤ Card(B), Card(·) giving the number of states.

Theorem 1 (Unicity of the minimal automaton). The minimal automaton of a language L is unique up to automata isomorphism.

Definition 13 (Equivalency of states). Let A = ⟨Q, Σ, ϕ, I, F⟩ and (p, q) ∈ Q². Suppose we reached p and q with two words w1 and w2 respectively. We say that:
- w1 is equivalent to w2, written w1 ∼ w2, if ∀z ∈ Σ*, w1z ∈ L ⟺ w2z ∈ L,
- p is equivalent to q, written p ∼ q, if w1 ∼ w2.

The relation ∼ is an equivalence relation. Intuitively, p ∼ q if p and q have the same future. The following theorem establishes the link between equivalency and minimal automata.

Theorem 2 (Myhill (1957)). A DFA is minimal if and only if ∀(p, q) ∈ Q², p ∼ q ⟹ p = q.

1.2.2 On the way to minimization

From Theorem 2, we can easily deduce a simple algorithm that performs the minimization. Given an oracle that says whether two states are equivalent, we can merge equivalent states (as they have the same future) to obtain an automaton in which no two distinct states are equivalent. For instance, in the automaton of Figure 1.2, q and r are equivalent.

Figure 1.2: Automaton for (a + b)c

Thus, to obtain the minimal automaton for (a + b)c, q and r should be merged, as in Figure 1.3.

Figure 1.3: Minimal automaton for (a + b)c

1.2.3 Minimization algorithms

In this section, we work on the automaton A = ⟨Q, Σ, ϕ, I, F⟩. Minimization algorithms aim at finding the equivalences between the states. They usually work by partitioning the set of states into equivalence classes, that is to say into classes in which all states are equivalent. The first gross division that can be made is Q = F ∪ (Q \ F). Indeed, we know that the elements of F cannot be equivalent to the elements of Q \ F: only a state in F leads to a final state on the input ε. It does not mean, however, that the states in F are all equivalent. The algorithms begin with this first partitioning, then make successive refinements until a fixed point is reached. The following theorem ensures the equivalence between this task and minimization.

Theorem 3 (Myhill (1957)). For a deterministic finite automaton M, the minimum number of states in any equivalent deterministic finite automaton is the same as the number of equivalence classes of M's states.

Moore's algorithm

The algorithm of Moore (1956) is based on the following fact:

Fact 1. All equivalent states go to equivalent states under all inputs.

The time complexity of the algorithm is in O(n²), with n = Card(A). Its principle is quite simple:

start with two groups F and Q \ F
do {
  for every group {
    for every state in the group {
      find which groups the inputs lead to
    }
    if there are differences {
      partition the group into sets containing states which go to the
      same groups under the same inputs
    }
  }
} while a partitioning has been made in the loop
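Before the worked example that follows, here is a compact standalone C++ sketch of this refinement loop. It is illustrative only, not the VAUCANSON implementation; moore_classes, the transition-table encoding and the state numbering are assumptions made for the example. It regroups states by the signature "own group, plus the groups reached under each letter" until the partition is stable, and checks the merge of q and r from Figures 1.2 and 1.3.

// Standalone sketch of Moore's refinement (illustrative, not VAUCANSON code):
// states are 0..n-1, trans[q][letter] is the successor of q, finals marks final states.
#include <cassert>
#include <vector>

std::vector<int> moore_classes(const std::vector<std::vector<int>>& trans,
                               const std::vector<bool>& finals) {
  const int n = (int) trans.size();
  // Start with two groups: F and Q \ F.
  std::vector<int> cls(n);
  for (int q = 0; q < n; ++q) cls[q] = finals[q] ? 1 : 0;

  bool changed = true;
  while (changed) {
    changed = false;
    // Signature of a state: its group plus the groups its inputs lead to.
    std::vector<std::vector<int>> sig(n);
    for (int q = 0; q < n; ++q) {
      sig[q].push_back(cls[q]);
      for (int suc : trans[q]) sig[q].push_back(cls[suc]);
    }
    // Re-number groups: states sharing a signature share a group.
    std::vector<int> newcls(n);
    std::vector<std::vector<int>> seen;
    for (int q = 0; q < n; ++q) {
      int id = -1;
      for (int k = 0; k < (int) seen.size(); ++k)
        if (seen[k] == sig[q]) { id = k; break; }
      if (id == -1) { id = (int) seen.size(); seen.push_back(sig[q]); }
      newcls[q] = id;
    }
    if (newcls != cls) { cls = newcls; changed = true; }
  }
  return cls;  // cls[p] == cls[q] iff p and q are equivalent
}

int main() {
  // The automaton of Figure 1.2 over {a, b, c}: states p=0, q=1, r=2, s=3, sink=4.
  std::vector<std::vector<int>> trans = {
      {1, 2, 4}, {4, 4, 3}, {4, 4, 3}, {4, 4, 4}, {4, 4, 4}};
  std::vector<bool> finals = {false, false, false, true, false};
  std::vector<int> cls = moore_classes(trans, finals);
  assert(cls[1] == cls[2]);  // q and r are equivalent, as in Figure 1.3
  assert(cls[0] != cls[1]);
  return 0;
}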

Let us treat an example: we will use the automaton M of Figure 1.4. This automaton is real-time and deterministic; in other words, it matches our requirements.

Figure 1.4: The automaton M

Our first partition is the split of Q with regard to final states:
A = {q3, q8}, B = {q1, q2, q4, q5, q6, q7}

We shall now find whether the states in these groups go to the same group under the inputs a and b. It can be seen that the states of group A both go to states in group B under both inputs. However, this is not the case for group B; the following table shows the result of applying the inputs to these states (for instance, the input b leads from q7 to q8, i.e. to A):

In state:    q1  q2  q4  q5  q6  q7
a leads to:  B   B   B   B   B   B
b leads to:  B   A   B   B   B   A

The input b helps us distinguish two of the states (q2 and q7) from the rest of the states in the group, since it leads to group A for these two instead of group B. Thus we should split B into two groups, and we now have:
A = {q3, q8}, B = {q1, q4, q5, q6}, C = {q2, q7}

The next loop of the algorithm shows us that q4 is not equivalent to the rest of group B with input a, and we must split again. Continuing this process until we cannot distinguish the states in any group by employing input tests, we end up with the following classes:
A = {q3, q8}, B = {q1, q5, q6}, C = {q2}, D = {q4}, E = {q7}

Considering the above theoretical definitions and results, we can say that all states in each group are equivalent because they all go to the same groups under the inputs a and b. The minimal automaton for L(M) is then the one of Figure 1.5.

Figure 1.5: Minimal automaton for L(M)

Hopcroft's algorithm

The minimization algorithm of Hopcroft (1971) is based on the same fact as Moore's, but its formulation is reversed:

Fact 2. For every input letter a and every group G, no group may contain both states that are predecessors of G under a and states that are not.

This reversed point of view (i.e. considering predecessors), together with a trick in the enqueueing done by the algorithm, leads to a time complexity in O(n log(n)). Hopcroft's algorithm can be written as follows:

initialize Π = {F, Q \ F},
put F in a "to treat" queue T,
while T is not empty {
  remove a set S from T,
  for each letter a in Σ {
    compute X: the list of states that go into S under a,
            Y: the list of states that do not,
    for each set π in Π {
      if π ∩ X ≠ ∅ and π ∩ Y ≠ ∅ {
        split π into π ∩ X and π ∩ Y,
        enqueue the smaller one in T.
      }
    }
  }
}

Let us apply this algorithm to the automaton M of Figure 1.4. One of the first iterations considers S = {q3, q8} with the input letter b. In the other set, namely {q1, q2, q4, q5, q6, q7}, the states q2 and q7 are predecessors of S but the others are not. A first split is therefore made, and the partition becomes:
Π = {{q1, q4, q5, q6}, {q3, q8}, {q2, q7}}

The last group is enqueued, being the smaller one in the split, then the loop continues: the iteration with S = {q2, q7} and the input letter a splits {q1, q4, q5, q6}, q4 not being a predecessor of S. We get:
Π = {{q1, q5, q6}, {q3, q8}, {q2, q7}, {q4}}

Again, the last group is enqueued, and the iteration with S = {q4} causes an ultimate split of {q2, q7}. We end up with the same partition as in Moore's algorithm, that is to say:
Π = {{q1, q5, q6}, {q3, q8}, {q2}, {q4}, {q7}}

Chapter 2
Cover automata

In this chapter, we present the theory of cover automata and detail the current results of this field. As one of the most important, we will show that a minimal deterministic cover automaton for a finite language has at most as many states as the minimal deterministic automaton for this language. The following is mostly taken from Câmpeanu et al. (2001).

2.1 Basic idea

A finite language has a finite number of words; some of them are the longest ones. When such languages are used, the length l of the longest words is often, if not always, known. An automaton for a finite language L, whose longest words have length l, can be seen as the combination of:
- a checker for a word w to be in L ∪ W, for some W ⊆ Σ^{>l},
- a checker for this word w to have length less than or equal to l.

A cover automaton, supposed deterministic (DFCA), for a language L will not check the length of the input. In other words, it accepts all words of L and possibly words longer than the longest ones in L. If, as we will suppose, l is known¹, the length check can be made before the evaluation in the automaton, and we end up with both functionalities. For instance, the automaton of Figure 2.1 is a cover automaton for a five-word language L over {a, b} whose longest words have length l = 3.

Figure 2.1: A cover automaton for L, l = 3

¹ It can be computed in time linear in the size of L.
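In practice, membership in L is thus decided by combining the length check with an evaluation on the cover automaton, following the definition given below. Here is a minimal standalone C++ sketch of that combination; it is illustrative only, not VAUCANSON code, and the one-state cover automaton and helper names are assumptions chosen for the example.

// Standalone sketch (not VAUCANSON code): deciding membership in a finite
// language L from a cover automaton A and the bound l:
// w is in L  iff  |w| <= l  and  A accepts w.
#include <cassert>
#include <string>

// A one-state cover automaton accepting a*; with l = 2 it covers L = {"", "a", "aa"}.
bool cover_accepts(const std::string& w) {
  for (char c : w)
    if (c != 'a') return false;  // any other letter falls into the (implicit) sink
  return true;
}

bool in_language(const std::string& w, unsigned l) {
  return w.size() <= l && cover_accepts(w);
}

int main() {
  const unsigned l = 2;
  assert(in_language("aa", l));    // in L
  assert(!in_language("aaa", l));  // accepted by the cover automaton, but too long
  assert(!in_language("ab", l));   // short enough, but rejected by the automaton
  return 0;
}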

Obviously, the automaton of Figure 2.1 does not recognize L exactly: for example, it accepts some words of length greater than 3. However, the longest words of L have length l = 3, and such words are longer than that; the length check failing, we can say that they are not in L. This leads to the following formal definition:

Definition 14 (Cover automaton). A deterministic finite cover automaton for a finite language L_f whose longest words have length l is a DFA A such that L(A) ∩ Σ^{≤l} = L_f. We say that L(A) is a cover language of L_f.

The sequel of this report focuses on the cover minimization of an automaton, that is to say the building of a minimal deterministic cover automaton (MDFCA) from a DFA.

2.2 On the similarity of states

As we saw in Section 1.2, the construction of a minimal automaton is based on the equivalence relation ∼ (Definition 13): if p ∼ q, p and q have the same future. In the context of cover automata, we need another important relation:

Definition 15 (Similarity of states). Let A = ⟨Q, Σ, ϕ, I, F⟩ and (p, q) ∈ Q². Suppose we reached p and q with, respectively, two words w1 and w2. We say that:
- w1 is similar to w2, written w1 ≈ w2, if for all z ∈ Σ* such that |w1z| ≤ l and |w2z| ≤ l, we have w1z ∈ L ⟺ w2z ∈ L,
- p is similar to q, written p ≈ q, if w1 and w2 are shortest paths to p and q respectively, and w1 ≈ w2.

The relation ≈ so defined is reflexive and symmetric but not transitive. Intuitively, p ≈ q if p and q have the same bounded future. In Figure 2.1, for instance, some pairs of states are similar without being equivalent, while for other pairs a word short enough to respect the length bound leads to a final state from one state but not from the other, so that p ≉ q.

Definition 16 (Gap). The function gap : Q × Q → ℕ computes the length of the shortest word that shows that its two arguments are dissimilar (i.e. a word that leads to a final state from one state and does not from the other one). For convenience, if the two states are similar, the function returns l. Consider for instance the automaton of Figure 2.2.

Figure 2.2: Automaton for (a + b)a

The gap function can be computed as follows:

gap | q  r  s  t
 q  | l  1  1  0
 r  |    l  l  0
 s  |       l  0
 t  |          l

For example, gap(r, q) is 1 because the shortest word that shows the dissimilarity between r and q is a.

Definition 17 (Level). For i the initial state of the automaton A = ⟨Q, Σ, ϕ, I, F⟩, we define, for every q ∈ Q,
level(q) = min{ |w| : w leads from i to q }
Intuitively, level(q) is the length of a shortest path from the initial state to q.

2.3 Minimal DFCA

Definition 18 (Minimal DFCA). An automaton A is a minimal DFCA for L if for every automaton B that is a cover automaton of L, Card(A) ≤ Card(B).

The main theorem in the field of cover minimization is the equivalent of Theorem 2, but with the relation ≈:

Theorem 4 (Câmpeanu et al. (2001)). A DFCA A is minimal if and only if ∀(p, q) ∈ Q², p ≈ q ⟹ p = q.

A strong link can be made with the gap function: computing this function gives the similarity classes, since if gap(r, s) is known and equal to l, then r and s are similar. Therefore:

Fact 3. An algorithm that computes the gap function allows us to cover minimize the automaton.

It can be noticed that the relation ∼ is more restrictive than ≈. In other words, p ∼ q ⟹ p ≈ q; this implies that the similarity classes are at least as large as the equivalence classes. The following fact is then deduced:

Fact 4. The minimal DFCA for L has at most as many states as the minimal DFA for L.

As the minimal DFA was first defined to be a canonical representation of a language, one may wonder whether a minimal DFCA is still unique, this property being really useful when, for instance, comparing automata. Alas, it is not the case, due to the fact that the similarity relation is not an equivalence relation. For example, the two automata of Figure 2.3 are both minimal DFCA for the same seven-word language L over {a, b}.

Figure 2.3: Two distinct minimal DFCA for the same language L

Some questions may arise. First, why can we not apply classical minimization to a minimal DFCA in order to get a canonical form? The answer is simple: a minimal DFCA is already a minimal DFA. But the minimal DFCA A is a cover automaton for a language L_f, whereas viewing A as a minimal DFA refers to the language L(A), and these languages are different (see Definition 14).

Another question is: how many distinct automata can an algorithm that minimizes through similarity classes yield? The following theorem answers this question:

Theorem 5 (Câmpeanu and Păun (2002)). The number of minimal DFCA that can be obtained from a given minimal DFA with n states by merging the similar states in the given DFA is upper bounded by

k₀! / (2k₀ − n + 1)!,   with   k₀ = ⌈(4n − 9 + √(8n + 1)) / 8⌉.

Moreover, this bound is reached, i.e. for any given positive integer n we can find a minimal DFA with n states which has a number of minimal DFCA obtained by merging similar states equal to this maximum.

This is a penalty one has to deal with when working with cover automata. Another penalty is the following: when one considers a minimal automaton, the sink state is usually not represented. However, a minimization algorithm should, in theory, return a complete automaton. A drawback of cover automata is that the sink state may be merged with another state, leading to the creation of transitions that a trimmed automaton would not have (Figure 2.4).

Figure 2.4: Merge of the sink state
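Before turning to the dedicated algorithms of the next chapter, note that similarity and the gap function can be checked directly from Definitions 15 and 16 by enumerating all words within the length bound. The following standalone C++ sketch does exactly that; it is illustrative only and exponential in l, unlike the algorithms of Chapter 3, and the encoding of Figure 2.2 used in main as well as the helper names are assumptions made for the example.

// Brute-force sketch (illustrative only): checking similarity of two states of a
// complete DFA directly from Definition 15, and computing gap from Definition 16.
#include <algorithm>
#include <cassert>
#include <vector>

struct TableDfa {
  std::vector<std::vector<int>> trans;  // trans[state][letter] = successor
  std::vector<bool> finals;
};

int run(const TableDfa& a, int from, const std::vector<int>& word) {
  for (int letter : word)
    from = a.trans[from][letter];
  return from;
}

// gap(p, q): length of a shortest word on which p and q "fail" (one run ends in a
// final state and the other does not) within the length bound, or l if p and q
// are similar.  bound_p and bound_q stand for l - level(p) and l - level(q).
int gap(const TableDfa& a, int p, int q, int bound_p, int bound_q, int l) {
  int nletters = (int) a.trans[0].size();
  std::vector<std::vector<int>> words = {{}};  // all words of the current length
  for (int len = 0; len <= std::min(bound_p, bound_q); ++len) {
    for (const std::vector<int>& w : words)
      if (a.finals[run(a, p, w)] != a.finals[run(a, q, w)])
        return len;
    std::vector<std::vector<int>> next;
    for (const std::vector<int>& w : words)
      for (int x = 0; x < nletters; ++x) {
        next.push_back(w);
        next.back().push_back(x);
      }
    words = next;
  }
  return l;  // no failing word within the bound: p and q are similar
}

int main() {
  // The automaton of Figure 2.2 with a sink state added: states q=0, r=1, s=2,
  // t=3 (final), sink=4, letters a=0 and b=1, and l = 2.
  TableDfa a{{{1, 2}, {3, 4}, {3, 4}, {4, 4}, {4, 4}},
             {false, false, false, true, false}};
  assert(gap(a, 1, 2, 1, 1, 2) == 2);  // gap(r, s) = l: r and s are similar
  assert(gap(a, 0, 1, 2, 1, 2) == 1);  // gap(q, r) = 1, as in the table above
  return 0;
}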

Chapter 3
Cover minimization

This chapter presents two algorithms that perform cover minimization, that is to say, transform a DFA A into a minimal DFCA for L(A). The historically first algorithm, presented in the original paper (Câmpeanu et al. (2001)) with a time complexity in O(n⁴), has a messy theoretical study; we will therefore not treat it, as the original authors published, shortly after the first paper, another one detailing an O(n²) algorithm.

3.1 An O(n²) algorithm

This algorithm has been presented by Păun et al. (2001) and is a significant improvement over the previous one of Câmpeanu et al. (2001). As seen in Fact 3, computing the gap function is a way to compute the minimal DFCA. As in the very first algorithm, the proposal is to compute this function efficiently.

3.1.1 Definitions

We assume that A = ⟨Q, Σ, ϕ, I, F⟩ is a DFA accepting a finite language L whose longest words have length l. A is complete and without any useless state except the sink state. One may refer to Păun et al. (2001) for the proofs of the following lemmas and theorems.

Definition 19 (Range). For (p, q) ∈ Q² with p ≠ q, we define
range(p, q) = l − max(level(p), level(q))
Intuitively, if w_p and w_q are the shortest words that lead to p and q respectively, range(p, q) is the maximum length of a word w such that |w_p w| ≤ l and |w_q w| ≤ l.

Definition 20 (State failure). Let p, q ∈ Q and z ∈ Σ*. We say that p and q fail on z if ϕ(p, z) ∈ F and ϕ(q, z) ∈ Q \ F, or vice versa, and |z| ≤ range(p, q).

Theorem 6. p ≉ q if and only if there exists z ∈ Σ* such that p and q fail on z.

With these definitions, gap can be redefined as follows:

Definition 21 (Gap). If p ≉ q, then gap(p, q) = min{ |z| : p and q fail on z }.
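The range of a pair of states is determined entirely by their levels. The following one-function standalone C++ sketch (illustrative only; the helper name and encoding are assumptions) spells out Definition 19 and checks it on level values that appear in the example of Section 3.1.3, where l = 7, level(q1) = 0 and level(q8) = 3.

// Definition 19: range(p, q) = l - max(level(p), level(q)); only words of at most
// this length are relevant when comparing p and q for similarity.
#include <algorithm>
#include <cassert>
#include <vector>

unsigned range(const std::vector<unsigned>& level, unsigned l, int p, int q) {
  return l - std::max(level[p], level[q]);
}

int main() {
  // Levels of q1 and q8 in the example of Section 3.1.3, with l = 7.
  std::vector<unsigned> level = {0, 3};
  assert(range(level, 7, 0, 1) == 4);
  return 0;
}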

Theorem 7.
1. Let d be the sink state of A. If level(d) > l, then for every q ∈ Q \ {d}, d ≈ q. If level(d) ≤ l, then for every f ∈ F, d ≉ f and gap(d, f) = 0.
2. If p ∈ F and q ∈ Q \ F \ {d}, then p ≉ q and gap(p, q) = 0.

Lemma 1. Let (p, q) ∈ Q² with p ≠ q, and let r = ϕ(p, a), t = ϕ(q, a) for some a ∈ Σ; then range(p, q) ≤ range(r, t) + 1.

The following theorems are the crux of the algorithm:

Theorem 8 (Păun et al. (2001)). Let p and q be two states such that either p, q ∈ F or p, q ∈ Q \ F. Then p ≉ q if and only if there exists a ∈ Σ such that ϕ(p, a) = r, ϕ(q, a) = t, r ≉ t, and gap(r, t) + 1 ≤ range(p, q).

Theorem 9 (Păun et al. (2001)). If p ≉ q with p, q ∈ F or p, q ∈ Q \ F, then
gap(p, q) = min{ gap(r, t) + 1 : ϕ(p, a) = r and ϕ(q, a) = t for some a ∈ Σ, r ≉ t, and gap(r, t) + 1 ≤ range(p, q) }.

3.1.2 Algorithm

The algorithm presented here assumes that its input automaton is ordered, which means that the states are numbered from 0 to n and that the last one is the sink state. Theorem 9 naturally leads to the following algorithm:

Input: an ordered, reduced and complete DFA A = ⟨Q, Σ, ϕ, I, F⟩ with n + 1 states,
       which accepts a finite language L whose longest words have length l.
Output: gap(i, j) for each pair i, j ∈ Q with i < j.
Algorithm:
1. for each i ∈ Q { compute level(i) }
2. for i = 0 to n − 1 { gap(i, n) = l }
   if level(n) ≤ l {
     for each i ∈ F { gap(i, n) = 0 }
   }
3. for each pair i, j ∈ Q \ {n} such that i < j {
     if i ∈ F and j ∈ Q \ F, or vice versa {
       gap(i, j) = 0
     } else {
       gap(i, j) = l
     }
   }
4. for i = n − 2 down to 0 {
     for j = n down to i + 1 {
       for each a ∈ Σ {
         let i′ = ϕ(i, a) and j′ = ϕ(j, a)
         if i′ ≠ j′ {
           g = if i′ < j′ then gap(i′, j′) else gap(j′, i′)
           if g + 1 ≤ range(i, j) {
             gap(i, j) = min(gap(i, j), g + 1)
           }
         }
       }
     }
   }

The time complexity is established as follows:
- each of Step 1 and Step 2 is in O(n),
- Step 3 takes O(n²) iterations,
- Step 4 consists of two nested loops, each of which has O(n) iterations; each inner iteration is in O(|Σ|), where |Σ| is a constant.
Therefore, this algorithm that computes the gap function is in O(n²).

3.1.3 Example

Let A = ⟨Q, Σ, ϕ, i, F⟩ be the automaton of Figure 3.1. L(A) consists of three words, of lengths 3, 5 and 7, each ending with the letter c, so that l = 7. We suppose the existence of a sink state q9, not represented here for the sake of clarity.

Figure 3.1: A DFA for the three-word language L(A)

We follow the algorithm of Section 3.1.2 to compute the gap function. At Step 1, level(q) is computed for each q ∈ Q:

State   q1  q2  q3  q4  q5  q6  q7  q8  q9
Level    0   1   2   3   4   5   6   3   1

After Steps 2, 3 and 4, we have gap(qi, qj) for each 1 ≤ i ≤ 8, 2 ≤ j ≤ 9 and i < j, as follows:

      q2  q3  q4  q5  q6  q7  q8  q9
q1     2   1   2   1   2   1   0   3
q2         1   7   1   7   1   0   2
q3             1   7   1   7   0   1
q4                 1   7   1   0   2
q5                     1   7   0   1
q6                         1   0   2
q7                             0   1
q8                                 0

In the above table, two states are similar (and will be merged) whenever gap(qi, qj) = 7, l being 7. The merging of similar states results in the minimal DFCA for L(A) shown in Figure 3.2 (the sink state is still not represented).

Figure 3.2: MDFCA for L(A)

3.2 An O(n log(n)) algorithm

This algorithm, presented by Körner (2003), is the major and most recent improvement in cover minimization. Inspired by the algorithm of Hopcroft (1971), it is unlikely to be improved upon, as its time complexity matches that of the best known algorithm for classical minimization.

3.2.1 Algorithm

Basically, it runs Hopcroft's algorithm with a bounded evaluation of each state. The algorithm can be expressed as follows, to emphasize the similarities with Hopcroft's algorithm:

 1  initialize Π = {F, Q \ F},
 2  put (F, 0) in a "to treat" queue T,
 3  while T is not empty {
 4    remove a set (S, k) from T,
 5    for each letter a in Σ {
 6      compute X: the list of states that go into S under a,
 7              Y: the list of states that do not,
 8                 both restricted to states p with level(p) + k < l,
 9      for each set π in Π {
10        if π ∩ X ≠ ∅ and π ∩ Y ≠ ∅ {
11          split π into π ∩ X and π ∩ Y,
12          enqueue the smaller one in T with k + 1.
13        }
14      }
15    }
16  }

The modifications with respect to the original algorithm are the following:
- l.2: the final states are enqueued together with 0; this number represents the distance from the enqueued states to a final state,
- l.4: k, this distance, is retrieved along with the set of states,
- l.8: the states are considered with regard to their level and to the distance k from them to a final state; a state p is taken into account only if level(p) + k < l, i.e. only if a word that reaches p and then witnesses the split stays within the length bound,
- l.12: the new set is enqueued with an incremented distance to the final states.

As the computation of the level function is in O(n), the whole algorithm is still in O(n log(n)).

3.2.2 Example

This example considers the automaton B = ⟨Q, Σ, ϕ, i, F⟩ of Figure 3.3, which recognizes a two-word language L(B), the words being of lengths 2 and 4 and both ending with the letter c, so that l = 4. We suppose the existence of a sink state q6, not represented in the figure.

Figure 3.3: A DFA for the two-word language L(B)

One of the first iterations considers S = {q5}, k = 0, with the input letter c. The set {q1, q2, q3, q4, q6} is split into {q2, q4} and {q1, q3, q6}, and the former, being the smaller one, is enqueued together with k + 1 = 1. Another iteration then considers S = {q2, q4}, k = 1: the set {q1, q3, q6} is split into {q1, q3} and {q6}, and ({q6}, k + 1 = 2) is enqueued. If we now consider S = {q6}, k = 2, the list of states going into S under one of the letters would be X = {q1, q3, q4, q5, q6}, were it not for the level condition. This X would split the set {q2, q4} if q4 belonged to the actual X; but level(q4) + (k = 2) = 5 > l, so q4 is not in X. The algorithm produces no further split, and the resulting automaton is the one of Figure 3.4.

Figure 3.4: MDFCA for L(B)
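Both algorithms of this chapter rely on the level function of Definition 17; in the VAUCANSON implementations of the next chapter, this is the role of the compute_levels calls. Here is a standalone C++ sketch (illustrative only, not the VAUCANSON helper; the transition-table encoding is an assumption) that computes level(q) by a breadth-first traversal from the initial state.

// level(q) = length of a shortest path from the initial state to q (Definition 17),
// computed by a breadth-first traversal.
#include <cassert>
#include <queue>
#include <vector>

std::vector<int> levels_by_bfs(const std::vector<std::vector<int>>& trans,
                               int initial) {
  std::vector<int> level(trans.size(), -1);  // -1 marks unreachable states
  std::queue<int> todo;
  level[initial] = 0;
  todo.push(initial);
  while (!todo.empty()) {
    int q = todo.front();
    todo.pop();
    for (int suc : trans[q])
      if (level[suc] == -1) {
        level[suc] = level[q] + 1;
        todo.push(suc);
      }
  }
  return level;
}

int main() {
  // A four-state chain 0 -> 1 -> 2 -> 3 over a one-letter alphabet,
  // with state 3 acting as the sink.
  std::vector<std::vector<int>> trans = {{1}, {2}, {3}, {3}};
  std::vector<int> level = levels_by_bfs(trans, 0);
  assert(level[0] == 0 && level[2] == 2 && level[3] == 3);
  return 0;
}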

Chapter 4
Implementation in the VAUCANSON framework

The two algorithms of Section 3.1 and Section 3.2 have been implemented using the VAUCANSON framework. In this chapter, we present those implementations, together with a study of their respective performances.

4.1 Implementations

4.1.1 The O(n²) algorithm

The implementation is quite straightforward; it only uses basic functionalities of VAUCANSON. The "Step" comments refer to the steps of Section 3.1.

void compute_gaps(const automaton_t& aut,
                  unsigned l,
                  gaps_t& gaps,
                  vstates_t& states,
                  levels_t& levels)
{
  const alphabet_t& alphabet = aut.series().monoid().alphabet();

  // Step 1.
  for_each_initial_state(istate, aut)
    compute_levels(aut, *istate, 0, levels);

  // Get the sink state.
  hstate_t sink = get_sink_state(aut);

  // Compute Q \ {sink}.
  states.clear();
  for_each_state(istate, aut)
    if (*istate != sink)
      states.push_back(*istate);

  // Step 2.
  for_all(vstates_t, istate, states)
    gaps[*istate][sink] = l;

  if (levels[sink] <= l)
    for_each_final_state(istate, aut)
      gaps[*istate][sink] = 0;

  // Step 3.
  for_all(vstates_t, i, states)
    for_all(vstates_t, j, states)
    {
      if (not (*i < *j))
        continue;

      if ((aut.is_final(*i) and not aut.is_final(*j))
          or (not aut.is_final(*i) and aut.is_final(*j)))
        gaps[*i][*j] = 0;
      else
        gaps[*i][*j] = l;
    }

  // Now add the sink state at the end of the container.
  states.push_back(sink);

  // Step 4.
  delta_t delta;
  vstates_t::reverse_iterator i = states.rbegin();
  for (++i, ++i; i != states.rend(); ++i)
  {
    for (vstates_t::reverse_iterator j = states.rbegin(); j != i; ++j)
      for_each_letter(iletter, alphabet)
      {
        delta.clear();
        aut.letter_deltac(delta, *i, *iletter, delta_kind::states());
        hstate_t ip = *(delta.begin());
        delta.clear();
        aut.letter_deltac(delta, *j, *iletter, delta_kind::states());
        hstate_t jp = *(delta.begin());

        if (ip != jp)
        {
          unsigned g = (ip < jp) ? gaps[ip][jp] : gaps[jp][ip];
          if (g + 1 <= range(*i, *j))
            gaps[*i][*j] = std::min(gaps[*i][*j], g + 1);
        }
      }
  }
}

4.1.2 The O(n log(n)) algorithm

This implementation of the algorithm presented in Section 3.2 computes similarity state decompositions (SSD). Only the relevant portions of the code are kept here; the comments help the reader locate the corresponding lines of the algorithm of Section 3.2. Thanks to VAUCANSON, this implementation is a great improvement over the original 400-line implementation presented in Körner (2003).

void compute_ssd(automaton_t& aut,
                 unsigned l,
                 levels_t& levels,
                 ssds_t& Qs)
{
  // Compute level(q) for all q in Q.
  levels.clear();
  for_each_initial_state(istate, aut)
    compute_levels(aut, *istate, 0, levels);

  // Q(0) = Q \ F; Q(1) = F.
  for_each_state(istate, aut)
    if (not aut.is_final(*istate))
      Qs[0].insert(*istate);
    else
      Qs[1].insert(*istate);

  unsigned r = 2;       // ssds index.
  ssd_queue_t T;        // FIFO queue for splitting.
  hstates_t X;
  hstates_t Y;
  delta_t delta;
  const alphabet_t& alphabet = aut.series().monoid().alphabet();

  // Initialize T with (F, 0).
  T.push(ssd_pair_t(hstates_t(), 0));
  for_all(hstates_t, istate, Qs[1])
    T.front().first.insert(*istate);

  // Main loop.
  while (not T.empty())
  {
    // The first element of T is (S, k).
    ssd_pair_t S_k = T.front();
    T.pop();

    for_each_letter(iletter, alphabet)
    {
      // X = { p | delta(p, iletter) is included in S and level(p) + k < l }
      // Y = { p | delta(p, iletter) is not included in S and level(p) + k < l }
      X.clear();
      Y.clear();
      for_each_state(istate, aut)
      {
        delta.clear();
        aut.letter_deltac(delta, *istate, *iletter, delta_kind::states());
        // w.r.t. level and k.
        if (levels[*istate] + S_k.second < l)
        {
          bool is_in_S = true;
          for_all(delta_t, isucc, delta)
            if (S_k.first.find(*isucc) == S_k.first.end())
            {
              is_in_S = false;
              break;
            }
          if (is_in_S)
            X.insert(*istate);
          else
            Y.insert(*istate);
        }
      }

      for (int i = r - 1; i >= 0; --i)
      {
        hstates_t Qi_inter_X, Qi_minus_Qi_inter_X;
        bool Qi_inter_Y_empty = true;
        const unsigned n_Qi_size = Qs[i].size();

        // Compute Qi inter X, Qi minus X and Qi inter Y.
        for_all(hstates_t, istate, Qs[i])
        {
          if (X.find(*istate) != X.end())
            Qi_inter_X.insert(*istate);
          else
            Qi_minus_Qi_inter_X.insert(*istate);
          if (Y.find(*istate) != Y.end())
            Qi_inter_Y_empty = false;
        }

        // if Qi inter X != 0 and Qi inter Y != 0
        if (not Qi_inter_X.empty() and not Qi_inter_Y_empty)
        {
          // Z is Qi_inter_X.
          // If |Z| <= |Qi \ Z|
          if (Qi_inter_X.size() <= Qi_minus_Qi_inter_X.size())
          {
            Qs[r] = Qi_inter_X;
            Qs[i] = Qi_minus_Qi_inter_X;
          }
          else
          {
            Qs[r] = Qi_minus_Qi_inter_X;
            Qs[i] = Qi_inter_X;
          }
          assert(Qs[i].size() + Qs[r].size() == n_Qi_size);
          T.push(ssd_pair_t(Qs[r], S_k.second + 1));
          r += 1;
        }
      }
    }
  }
}

4.2 Performances

At the time this report is written, performance tests have not yet been carried out in depth. A first result is the following.

Protocol:
- let L be a language composed of random words over Σ, with Σ = {a, b},
- let A be a deterministic automaton that recognizes L,
- we compute the minimal automaton for A, and a minimal cover automaton for it.

The results are the following:

Cardinality                              Time (s)
|A|    Words  Min. DFA  Min. DFCA   Hopcroft  O(n²)  O(n log(n))
55     20     37        30          0.01      0.01   0.02
412    40     172       140         0.1       0.9    0.5
963    60     498       440         0.2       3.0    1.1
1418   80     742       698         0.5       6.4    3.1
2437   100    1481      1323        0.9       34.5   9.2

More tests and comments on performances are to be made in the next few months. Câmpeanu et al. plan to make their own tests too, but the best protocol to do so is not really settled at this time: real-world applications tend not to give good results, and we have to focus on specific uses of cover automata.

Conclusion

In this technical report, we presented the theory of cover automata, an efficient way to represent finite languages. This field tries to fulfill a real need from the world of language processing, which commonly uses finite languages. Cover automata are useful thanks to algorithms that cover minimize automata, yielding an automaton whose language agrees with that of the input automaton on all words up to the length of the longest words of the language. The best algorithm known is in O(n log(n)) and is inspired by Hopcroft's algorithm, the most efficient algorithm known for classical minimization. It is rather unlikely that this bound will be improved, and it is assumed to be tight for both minimizations. Ongoing research in this field focuses on performance testing: scientists are interested in knowing in which cases cover automata can significantly reduce the number of states of an automaton.

Bibliography

Câmpeanu, C., Culik II, K., Salomaa, K., and Yu, S. (1999). State complexity of basic operations on finite languages. In WIA, pages 60–70.

Câmpeanu, C., Păun, A., and Yu, S. (2002). An efficient algorithm for constructing minimal cover automata for finite languages. International Journal of Foundations of Computer Science (IJFCS), 13(1):83–??.

Câmpeanu, C. and Păun, A. (2002). The number of similarity relations and the number of minimal deterministic finite cover automata. In CIAA, pages 67–76.

Câmpeanu, C., Sântean, N., and Yu, S. (2001). Minimal cover-automata for finite languages. Theor. Comput. Sci., 267(1–2):3–16.

Hopcroft, J. E. (1971). An n log n algorithm for minimizing the states in a finite automaton. In Kohavi, Z., editor, Theory of Machines and Computations, pages 189–196. Academic Press.

Körner, H. (2003). A time and space efficient algorithm for minimizing cover automata for finite languages. International Journal of Foundations of Computer Science (IJFCS), 14(6):1071–??.

Moore, E. (1956). Gedanken-experiments on sequential machines. In Shannon, C. and McCarthy, J., editors, Automata Studies, pages 129–153. Princeton University Press, Princeton, NJ.

Myhill, J. (1957). Finite automata and the representation of events. Technical Report 57-624, WADC.

Păun, A., Sântean, N., and Yu, S. (2001). An O(n²) algorithm for constructing minimal cover automata for finite languages. In CIAA '00: Revised Papers from the 5th International Conference on Implementation and Application of Automata, pages 243–251, London, UK. Springer-Verlag.

Sakarovitch, J. (2003). Éléments de théorie des automates. Éditions Vuibert. Table of contents, preface and introductions to chapters available at http://perso.enst.fr/~jsaka/eta/.

Wood, D. and Yu, S., editors (1998). Automata Implementation, Second International Workshop on Implementing Automata, WIA '97, London, Ontario, Canada, September 18–20, 1997, Revised Papers, volume 1436 of Lecture Notes in Computer Science. Springer.

Yu, S. (1999). State complexity of regular languages. In IWDCAGRS: Proceedings of the International Workshop on Descriptional Complexity of Automata, Grammars and Related Structures.

List of Figures
1.1 Automata for a*
1.2 Automaton for (a + b)c
1.3 Minimal automaton for (a + b)c
1.4 The automaton M
1.5 Minimal automaton for L(M)
2.1 A cover automaton for L, l = 3
2.2 Automaton for (a + b)a
2.3 Two distinct minimal DFCA for the same language L
2.4 Merge of the sink state
3.1 A DFA for the three-word language L(A)
3.2 MDFCA for L(A)
3.3 A DFA for the two-word language L(B)
3.4 MDFCA for L(B)

Index
Alphabet
Automaton
  complete
  cover, see DFCA
  deterministic finite, see DFA
  finite
  isomorphism of
  minimal, see MDFA
  minimal deterministic, see MDFA
  minimal deterministic cover, see MDFCA
  non-deterministic finite, see NFA
  real-time
Complexity
Cover automata
Cover minimization
DFA
DFCA
Equivalency
Evaluation
Gap
Hopcroft's algorithm
Language
  cover
  finite
  of an automaton
Level
MDFA
MDFCA
Minimization
  cover, see Cover minimization
Moore's algorithm
NFA
Range
Similarity
State failure
Trim
VAUCANSON
Words