Hopcroft and Karp's algorithm for Non-deterministic Finite Automata


To cite this version: Filippo Bonchi, Damien Pous. Hopcroft and Karp's algorithm for Non-deterministic Finite Automata. 2011. ⟨hal-00639716v1⟩

HAL Id: hal-00639716
https://hal.archives-ouvertes.fr/hal-00639716v1
Submitted on 9 Nov 2011 (v1), last revised 11 Jul 2012 (v5)

Filippo Bonchi and Damien Pous
ENS Lyon, Université de Lyon, LIP (UMR 5668); CNRS, Université de Grenoble, LIG (UMR 5217)
November 2011

Abstract

An algorithm is given for determining if two non-deterministic finite automata are language equivalent. We exploit up-to techniques to improve the standard algorithm by Hopcroft and Karp for deterministic finite automata [5], so as to avoid computing the whole deterministic automata. Although the proposed algorithm remains exponential in the worst case (the problem is PSPACE-complete), experimental results show that it can be much faster than the standard algorithm: only a very small portion of the determinized automata has to be explored in practice.

Keywords: Language Equivalence, Non-deterministic Finite Automata, Bisimulation, Coinduction, Up-to techniques, Congruence.

1 Introduction

Checking language equivalence of finite automata is a classical problem in computer science, which finds applications in many fields ranging from compiler construction to model checking.

Equivalence of deterministic finite automata (DFA) can be checked either via minimization [4, 2] or through Hopcroft and Karp's algorithm [5], which exploits an instance of what is nowadays called the coinduction proof principle [7, 12, 10, 1]: two states recognise the same language if and only if there exists a bisimulation relating them. In order to check the equivalence of two given states, Hopcroft and Karp's algorithm creates a relation containing them and tries to build a bisimulation by adding pairs of states to this relation: if it succeeds then the two states are equivalent, otherwise they are different.

On the one hand, minimization has the advantage of checking the equivalence of all the states at once (while Hopcroft and Karp's algorithm only checks a given pair of states). On the other hand, minimization has the disadvantage of needing the whole automata from the beginning, while Hopcroft and Karp's algorithm can be executed on the fly on a lazy DFA, whose states are constructed on demand.

This difference is fundamental for our work: when starting from non-deterministic finite automata (NFA), the powerset construction used to get deterministic automata induces an exponential factor. Indeed, the algorithm we introduce in this work for checking equivalence of NFA usually does not build the whole deterministic automaton, but just a small part of it. We write "usually" because in a few bad cases, the algorithm still needs exponentially many states of the DFA (otherwise we would have solved in polynomial time the problem of language equivalence, which is PSPACE-complete [6]).

Our algorithm is grounded on a simple observation on determinized NFA: for all sets $X$ and $Y$ of states of the original NFA, the union (written $+$) of the language recognised by $X$ (written $[\![X]\!]$) with the language recognised by $Y$ ($[\![Y]\!]$) is equal to the language recognised by the union of $X$ and $Y$ ($[\![X + Y]\!]$). In symbols:

$$[\![X + Y]\!] = [\![X]\!] + [\![Y]\!]$$

This fact allows us to introduce a sound and complete proof technique for language equivalence, namely bisimulation up to context, that exploits both induction (on the operator $+$) and coinduction: if a bisimulation $R$ equates both the (sets of) states $X_1, Y_1$ and $X_2, Y_2$, then $[\![X_1]\!] = [\![Y_1]\!]$ and $[\![X_2]\!] = [\![Y_2]\!]$ and, by the above observation, we can immediately conclude that $X_1 + X_2$ and $Y_1 + Y_2$ are also language equivalent. Intuitively, bisimulations up to context are bisimulations which do not need to relate $X_1 + X_2$ and $Y_1 + Y_2$ when $X_1$ (resp. $X_2$) and $Y_1$ (resp. $Y_2$) are already related.

To better illustrate this idea, consider the following example, where we check the equivalence of the states $x$ and $u$ from the NFA depicted below on the left-hand side. (Final states are overlined, labelled edges represent transitions.)

[Figure: an NFA over the states x, y, z, u, v, w (left) and its determinization (right), with states {x}, {y}, {z}, {x, y}, {y, z}, {x, y, z} and {u}, {v, w}, {u, w}, {u, v, w}; lines numbered 1 to 6 relate pairs of determinized states.]

The determinized automaton is depicted on the right-hand side.
Each state is a set of states of the NFA; final states are overlined: they contain at least one final state of the NFA. The numbered lines show a relation which is a bisimulation containing $x$ and $u$. Actually, this is the relation that is built by the standard algorithm of Hopcroft and Karp (the numbers express the order in which each pair is added). The dashed lines (numbered 1, 2 and 3) form a smaller relation which is not a bisimulation, but a bisimulation up to context: the equivalence of states $\{x, y\}$ and $\{u, v, w\}$ can be immediately deduced from the fact that $\{x\}$ is related to $\{u\}$ and $\{y\}$ to $\{v, w\}$, without the need to explore the determinized automaton further.

Bisimulations up-to, and in particular bisimulations up to context, have been introduced in the context of concurrency theory [7, 8, 11] as a proof technique for bisimilarity of CCS or π-calculus processes. As far as we know, they have never been employed for proving language equivalence of non-deterministic automata.

Among these techniques one should also mention bisimulation up to equivalence, which, as we show in this paper, is implicitly used in the original algorithm of Hopcroft and Karp. This technique can be briefly explained by noting that not all bisimulations are equivalence relations, and thus it might be the case that a bisimulation relates (for instance) $X$ and $Y$, $Y$ and $Z$ but not $X$ and $Z$. However, since $[\![X]\!] = [\![Y]\!]$ and $[\![Y]\!] = [\![Z]\!]$, we can immediately conclude that $X$ and $Z$ recognise the same language. Analogously to bisimulations up to context, a bisimulation up to equivalence does not need to relate $X$ and $Z$ when $X$ and $Z$ are already related to some $Y$ (more generally, when $X$ and $Z$ belong to the equivalence closure of the relation).

The techniques of up-to equivalence and up-to context can be combined, resulting in a powerful proof technique which we call bisimulation up to congruence. Our algorithm is in fact just an extension of Hopcroft and Karp's algorithm that attempts to build a bisimulation up to congruence instead of a bisimulation up to equivalence. An important consequence, when using the up to congruence technique, is that we do not need to build the whole deterministic automata, but just those states that are needed for the bisimulation up-to. For instance, in the above NFA, the algorithm stops after equating $z$ and $u + w$ and does not build the remaining four states of the DFA. Despite its use of the up to equivalence technique, this is not the case with Hopcroft and Karp's algorithm, where all accessible subsets of the deterministic automata have to be visited at least once.
Summarising, the contributions of this work are (1) the observation that Hopcroft and Karp implicitly use bisimulations up to equivalence for DFA (Section 2), (2) a new sound and complete proof technique for proving language equivalence of NFA (Section 3.1), and (3) an efficient algorithm for checking language equivalence of NFA (Sections 3.2, 3.3, and 3.4).

Outline. The remainder of the paper is structured as follows. We recall Hopcroft and Karp's algorithm for DFA in Section 2, showing how it can be interpreted in terms of bisimulation up to equivalence. We then describe our algorithm in Section 3, based on bisimulations up to congruence. We discuss the complexity of this algorithm in Section 4; in particular, we provide good and bad cases, as well as some experimental data showing that the introduced optimisation can be very useful in practice.

Notation. We denote sets by capital letters $X, Y, S, T, \dots$ and functions by lower case letters $f, g, \dots$ Given sets $X$ and $Y$, $X \times Y$ is the Cartesian product of $X$ and $Y$, $X \uplus Y$ is the disjoint union and $X^Y$ is the set of functions $f\colon Y \to X$. Finite iterations of a function $f\colon X \to X$ are denoted by $f^n$ (formally, $f^0(x) = x$, $f^{n+1}(x) = f(f^n(x))$). The collection of subsets of $X$ is denoted by $\mathcal{P}(X)$. The (omega) iteration of a function $f\colon \mathcal{P}(X) \to \mathcal{P}(X)$ on a powerset is denoted by $f^\omega$ (formally, $f^\omega(Y) = \bigcup_{n \ge 0} f^n(Y)$). For a set of letters $A$, $A^\star$ denotes the set of all finite words over $A$; $\epsilon$ the empty word; and $w_1 \cdot w_2$ (and $w_1 w_2$) the concatenation of words $w_1, w_2 \in A^\star$. We use $2$ to denote the set $\{0, 1\}$ and $2^{A^\star}$ to denote the set of all formal languages over $A$.

2 Hopcroft and Karp's algorithm for DFA

In this section, we introduce (1) a notion of bisimulation for proving language equivalence of deterministic finite automata, (2) a naive algorithm that automatically checks language equivalence of DFA by means of bisimulations (Section 2.2) and (3) the standard algorithm of Hopcroft and Karp (Section 2.3). Moreover, we observe that this algorithm exploits the proof technique of bisimulation up to equivalence (Section 2.4).

A deterministic finite automaton (DFA) over the input alphabet $A$ is a triple $(S, o, t)$, where $S$ is a finite set of states, $o\colon S \to 2$ is the output function, which determines if a state $x \in S$ is final ($o(x) = 1$) or not ($o(x) = 0$), and $t\colon S \to S^A$ is the transition function which returns for each state $x$ and input letter $a \in A$ the next state. For $a \in A$, we write $x \xrightarrow{a} x'$ to mean that $t(x)(a) = x'$. For $w \in A^\star$, we write $x \xrightarrow{w} x'$ for the least relation such that (1) $x \xrightarrow{\epsilon} x$ and (2) $x \xrightarrow{a \cdot w} x'$ iff $x \xrightarrow{a} x''$ and $x'' \xrightarrow{w} x'$.

For any DFA, there exists a unique function $[\![-]\!]\colon S \to 2^{A^\star}$ mapping states to formal languages, defined as follows for all $x \in S$:

$$[\![x]\!](\epsilon) = o(x) \qquad [\![x]\!](a \cdot w) = [\![t(x)(a)]\!](w)$$

The language $[\![x]\!]$ is called the language accepted by $x$.
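The two defining equations of the accepted language translate directly into a recursive membership test. The following is a minimal sketch, assuming that `o` and `t` are stored as Python dictionaries and that words are strings; the function name `accepts` and the two-state automaton are hypothetical and chosen only for illustration.

```python
def accepts(o, t, x, word):
    """Return [[x]](word) for the DFA (S, o, t): 1 if accepted, 0 otherwise."""
    if word == "":
        return o[x]                       # [[x]](epsilon) = o(x)
    a, rest = word[0], word[1:]
    return accepts(o, t, t[x][a], rest)   # [[x]](a.w) = [[t(x)(a)]](w)

# Hypothetical two-state DFA over A = {"a"}: p is not final, q is final.
o = {"p": 0, "q": 1}
t = {"p": {"a": "q"}, "q": {"a": "p"}}

print(accepts(o, t, "p", "a"))    # a word of odd length, read from p
print(accepts(o, t, "p", "aa"))   # a word of even length, read from p
```

In this encoding the state `p` accepts exactly the words of odd length, since each letter toggles between `p` and `q`.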
Given two automata $(S_1, o_1, t_1)$ and $(S_2, o_2, t_2)$, the states $x_1 \in S_1$ and $x_2 \in S_2$ are said to be language equivalent (written $x_1 \sim x_2$) iff they accept the same language.

In the following, we always consider the problem of checking the equivalence of states of one single and fixed automaton $(S, o, t)$. We do not lose generality since for any two DFA $(S_1, o_1, t_1)$ and $(S_2, o_2, t_2)$ it is always possible to build the automaton $(S_1 \uplus S_2, o_1 \uplus o_2, t_1 \uplus t_2)$ with

$$(o_1 \uplus o_2)(x) = \begin{cases} o_1(x) & \text{if } x \in S_1 \\ o_2(x) & \text{if } x \in S_2 \end{cases} \qquad (t_1 \uplus t_2)(x) = \begin{cases} t_1(x) & \text{if } x \in S_1 \\ t_2(x) & \text{if } x \in S_2 \end{cases}$$

where the language accepted by every state $x \in S_1 \uplus S_2$ is the same as the language accepted by the same state in the original automaton $(S_i, o_i, t_i)$. For this reason, we also work with automata without explicit initial states: we focus on the equivalence of two arbitrary states of a fixed DFA.

2.1 Proving language equivalence via coinduction

We first define a notion of bisimulation on states. We make explicit the underlying notion of progression, which we need in the sequel for up-to techniques.

Definition 1 (Progression, Bisimulation). Given two relations $R, R' \subseteq S \times S$ on states, $R$ progresses to $R'$, denoted $R \rightarrowtail R'$, if whenever $x \mathrel{R} y$ then
1. $o(x) = o(y)$ and
2. for all $a \in A$, $t(x)(a) \mathrel{R'} t(y)(a)$.
A bisimulation is a relation $R$ such that $R \rightarrowtail R$.

As expected, the bisimulation proof technique is sound and complete w.r.t. language equivalence:

Proposition 1. Two states are language equivalent iff there exists a bisimulation that relates them.

Proof. Let $R_{[\![-]\!]}$ be the relation $\{(x, y) \mid [\![x]\!] = [\![y]\!]\}$. We prove that $R_{[\![-]\!]}$ is a bisimulation. If $x \mathrel{R_{[\![-]\!]}} y$, then $o(x) = [\![x]\!](\epsilon) = [\![y]\!](\epsilon) = o(y)$. Moreover, for all $a \in A$ and $w \in A^\star$, $[\![t(x)(a)]\!](w) = [\![x]\!](a \cdot w) = [\![y]\!](a \cdot w) = [\![t(y)(a)]\!](w)$, which means $[\![t(x)(a)]\!] = [\![t(y)(a)]\!]$, that is, $t(x)(a) \mathrel{R_{[\![-]\!]}} t(y)(a)$.

We now prove the other direction. Let $R$ be a bisimulation. We want to prove that $x \mathrel{R} y$ entails $[\![x]\!] = [\![y]\!]$, i.e., for all $w \in A^\star$, $[\![x]\!](w) = [\![y]\!](w)$. We proceed by induction on $w$. For $w = \epsilon$, we have $[\![x]\!](\epsilon) = o(x) = o(y) = [\![y]\!](\epsilon)$. For $w = a \cdot w'$, since $R$ is a bisimulation, we have $t(x)(a) \mathrel{R} t(y)(a)$ and thus $[\![t(x)(a)]\!](w') = [\![t(y)(a)]\!](w')$ by induction. This allows us to conclude since $[\![x]\!](a \cdot w') = [\![t(x)(a)]\!](w')$ and $[\![y]\!](a \cdot w') = [\![t(y)(a)]\!](w')$.

2.2 Naive algorithm

Figure 1 shows a naive version of Hopcroft and Karp's algorithm for checking language equivalence of the states $x$ and $y$ of a deterministic finite automaton $(S, o, t)$. Starting from $x$ and $y$, the algorithm builds a relation $R$ that, in case of success, is a bisimulation.
In order to do that, it employs the set (of pairs of states) todo which, intuitively, at any step of the execution, contains the pairs $(x', y')$ that must be checked: if $(x', y')$ already belongs to $R$, then it has already been checked and nothing else needs to be done. Otherwise, the algorithm checks if $x'$ and $y'$ have the same outputs (i.e., if both are final or not). If $o(x') \neq o(y')$, then $x$ and $y$ are different (because there exists $w \in A^\star$ such that $x \xrightarrow{w} x'$ and $y \xrightarrow{w} y'$). If $o(x') = o(y')$, then the algorithm inserts $(x', y')$ in $R$ and, for all $a \in A$, the pairs $(t(x')(a), t(y')(a))$ in todo.

Naive(x, y)
(1) R is empty; todo is empty;
(2) insert $(x, y)$ in todo;
(3) while todo is not empty {
    (3.1) extract $(x', y')$ from todo;
    (3.2) if $(x', y') \in R$ then skip, else {
        (3.3) if $o(x') \neq o(y')$ then return false, else {
            (3.4) for all $a \in A$, insert $(t(x')(a), t(y')(a))$ in todo;
            (3.5) insert $(x', y')$ in R;
    }}
}
(4) return true;

Figure 1: Naive algorithm for checking the equivalence of states $x$ and $y$ of a DFA $(S, o, t)$; R and todo are sets of pairs of states.

For the time being, we do not discuss which data structures are convenient for implementing $R$ and todo (nor any complexity issue); we focus on the correctness of the algorithm. Just notice that the algorithm terminates, since a new pair is added to $R$ at each iteration and there are finitely many such pairs. (In the sequel, when enumerating iterations, we ignore those where a pair from todo is already in $R$, so that there is nothing to do. We can moreover notice that when the algorithm returns true, it necessarily went $1 + |A| \cdot |R|$ times through loop (3), where $|A|$ and $|R|$ respectively denote the size of the alphabet and of the produced relation $R$.)

Proposition 2. For all states $x$ and $y$, we have $x \sim y$ iff Naive(x, y).

Proof. We first observe that if Naive(x, y) returns true then the relation $R$ that is built before arriving at step (4) is a bisimulation. Indeed, the following proposition is an invariant for the loop corresponding to step (3):

$$R \rightarrowtail R \cup \mathit{todo}$$

This invariant is preserved since at any iteration of the algorithm a pair $(x', y')$ is removed from todo and inserted in $R$ unless it was already present (after checking that $o(x') = o(y')$ and adding the corresponding derivatives to todo). Since todo is empty at the end of the loop, we actually have $R \rightarrowtail R$, i.e., $R$ is a bisimulation. By Proposition 1, we deduce $x \sim y$.

We now prove that if Naive(x, y) returns false, then $x \not\sim y$. Note that for all $(x', y')$ inserted in todo, there exists a word $w \in A^\star$ such that $x \xrightarrow{w} x'$ and $y \xrightarrow{w} y'$. Since $o(x') \neq o(y')$, we have $[\![x']\!](\epsilon) \neq [\![y']\!](\epsilon)$ and thus $[\![x]\!](w) = [\![x']\!](\epsilon) \neq [\![y']\!](\epsilon) = [\![y]\!](w)$, that is, $x \not\sim y$.
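The steps of Figure 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary encoding of `o` and `t`, the FIFO processing of todo, and the small automaton are all hypothetical choices, and the function returns the relation built so far alongside the Boolean answer so that it can be inspected.

```python
from collections import deque

def naive(o, t, alphabet, x, y):
    """Naive(x, y): try to build a bisimulation R containing (x, y)."""
    R = set()                                   # (1) pairs already checked
    todo = deque([(x, y)])                      # (2) pairs still to check
    while todo:                                 # (3)
        x1, y1 = todo.popleft()                 # (3.1)
        if (x1, y1) in R:                       # (3.2) nothing to do
            continue
        if o[x1] != o[y1]:                      # (3.3) outputs differ
            return False, R
        for a in alphabet:                      # (3.4) successor pairs
            todo.append((t[x1][a], t[y1][a]))
        R.add((x1, y1))                         # (3.5)
    return True, R                              # (4)

# Hypothetical DFA over A = {"a"}: a two-state cycle and a three-state lasso.
o = {"p0": 0, "p1": 1, "q0": 0, "q1": 1, "q2": 0}
t = {"p0": {"a": "p1"}, "p1": {"a": "p0"},
     "q0": {"a": "q1"}, "q1": {"a": "q2"}, "q2": {"a": "q1"}}

print(naive(o, t, ["a"], "p0", "q0"))
print(naive(o, t, ["a"], "p0", "q1"))
```

On this automaton, `p0` and `q0` both accept the words of odd length, so the first call succeeds and builds a three-pair bisimulation, while the second call fails immediately because `p0` and `q1` have different outputs.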
Since both Hopcroft and Karp's algorithm and the one we introduce in Section 3 are simple variations of this naive one, it is important to illustrate its execution with an example.

[Figure 2: Checking for DFA equivalence, naively. Left: a DFA over $A = \{a\}$ with states x, y, z, u, v, w and dashed lines numbered 1 to 3. Right: a DFA over $\{a, b\}$ with the same state names and dashed lines numbered 1 to 5.]

Consider the DFA with input alphabet $A = \{a\}$ on the left-hand side of Figure 2, and suppose we want to check if $x$ and $u$ are language equivalent. During the initialisation, $(x, u)$ is inserted in todo. At the first iteration (of cycle (3)), since $o(x) = 0 = o(u)$, $(x, u)$ is inserted in $R$ and $(y, v)$ in todo. At the second iteration, since $o(y) = 1 = o(v)$, $(y, v)$ is inserted in $R$ and $(z, w)$ in todo. At the third iteration, since $o(z) = 0 = o(w)$, $(z, w)$ is inserted in $R$ and $(y, v)$ in todo. At the fourth iteration, since $(y, v)$ is already in $R$, the algorithm does nothing. Since there are no more pairs to check (in todo), the relation $R$ is a bisimulation and the algorithm terminates returning true.

The iterations of the algorithm are concisely described by the numbered dashed lines in Figure 2. The line $i$ means that the connected pair is inserted in $R$ at iteration $i$.

Remark 1. Minimization [4, 2] is an alternative way of checking language equivalence; there are two main differences.
1. Minimization algorithms equate all the language equivalent states of a given automaton. The above naive algorithm instead checks the equivalence of only two states and, at the end, not all the equivalent states are in the relation $R$. For instance, the states $x$ and $w$ of the left-hand side example from Figure 2 are also language equivalent, but they are not in the relation $R$ computed by Naive(x, u).
2. Minimization algorithms need to know the whole automaton $(S, o, t)$ from the beginning, while Naive can be executed on the fly [3]: it can be executed on a lazy DFA, which is constructed on demand. This property is essential for the algorithm that we introduce in Section 3.

2.3 Hopcroft and Karp's algorithm

The previous naive algorithm is approximately quadratic in the number of states of the DFA: a new pair is added to $R$ at each iteration, and there are only $n^2$ such pairs, where $n = |S|$ is the number of states of the DFA.
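The quadratic bound comes from recording visited pairs one by one. The refinement developed in this section records equivalence classes instead, which can be sketched with a union-find structure as follows. This is a minimal illustration under stated assumptions: the `UnionFind` class uses path compression only (no union by rank), the names `hk` and `kept` are hypothetical, and the automaton is an invented one over $\{a, b\}$, not the automaton of Figure 2.

```python
from collections import deque

class UnionFind:
    """Equivalence classes of states, with path compression only."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:          # compress the path to the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def hk(o, t, alphabet, x, y):
    """Like the naive loop, but skips pairs already in the same class."""
    classes = UnionFind()
    kept = []                                   # pairs actually merged
    todo = deque([(x, y)])
    while todo:
        x1, y1 = todo.popleft()
        if classes.find(x1) == classes.find(y1):
            continue                            # already known equivalent
        if o[x1] != o[y1]:
            return False, kept
        for a in alphabet:
            todo.append((t[x1][a], t[y1][a]))
        classes.union(x1, y1)
        kept.append((x1, y1))
    return True, kept

# Hypothetical DFA over A = {"a", "b"}; x and u accept all non-empty words.
o = {"x": 0, "u": 0, "y": 1, "z": 1, "v": 1, "w": 1}
t = {"x": {"a": "y", "b": "z"}, "y": {"a": "z", "b": "y"},
     "z": {"a": "z", "b": "z"}, "u": {"a": "v", "b": "w"},
     "v": {"a": "v", "b": "w"}, "w": {"a": "w", "b": "v"}}

equivalent, kept = hk(o, t, ["a", "b"], "x", "u")
print(equivalent, kept)
```

On this automaton the pair `(y, w)` is reached but skipped: by the time it is extracted, `y`, `v`, `z` and `w` already belong to the same class, so only four pairs are merged.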

HK(x, y)
(1) R is empty; todo is empty;
(2) insert $(x, y)$ in todo;
(3) while todo is not empty {
    (3.1) extract $(x', y')$ from todo;
    (3.2) if $(x', y') \in e(R)$ then skip, else {
        (3.3) if $o(x') \neq o(y')$ then return false, else {
            (3.4) for all $a \in A$, insert $(t(x')(a), t(y')(a))$ in todo;
            (3.5) insert $(x', y')$ in R;
    }}
}
(4) return true;

Figure 3: Hopcroft and Karp's algorithm for checking the equivalence of states $x$ and $y$ of a DFA $(S, o, t)$; R and todo are sets of pairs of states.

To make this algorithm (almost) linear, Hopcroft and Karp actually use a union-find data structure to record a set of equivalence classes rather than a set of visited pairs. With respect to Figure 1, it suffices to update steps 3.2 and 3.5 as follows:

(3.2) if equiv(R, x', y') then ...
(3.5) merge(R, x', y')

As a consequence, their algorithm may stop earlier, when an encountered pair of states is not already in $R$ but in its reflexive, symmetric, and transitive closure. For instance, in the right-hand side example from Figure 2, we can skip the fifth pair $(y, w)$, since $y$ and $w$ already belong to the same equivalence class according to the previous four pairs.

More generally, there can be at most $n$ iterations (two equivalence classes are merged at each iteration); the algorithm is thus almost linear: by using a union-find data structure, steps 3.2 and 3.5 can be performed in almost constant time [13].

Let $e(R)$ be the symmetric, reflexive, and transitive closure of a binary relation $R$ on states; an alternative way of presenting this algorithm, without considering the concrete union-find data structure, consists in simply replacing step 3.2 with

(3.2) if $(x', y') \in e(R)$ then ...

The whole algorithm, named HK, is given in Figure 3. We now show that this actually corresponds to using an up-to technique to improve the coinductive proof method.

2.4 Bisimulations up-to

Definition 2 (Bisimulation up-to). Let $f\colon \mathcal{P}(S \times S) \to \mathcal{P}(S \times S)$ be a function on relations on $S$. A relation $R$ is a bisimulation up to $f$ if $R \rightarrowtail f(R)$, i.e., whenever $x \mathrel{R} y$ then
1. $o(x) = o(y)$ and
2. for all $a \in A$, $t(x)(a) \mathrel{f(R)} t(y)(a)$.

With this definition, Hopcroft and Karp's algorithm just consists in building a bisimulation up to $e$. To prove the correctness of the algorithm it suffices to show that any bisimulation up to $e$ is contained in a bisimulation. We use for that the notion of compatible function [11, 9]:

Definition 3 (Compatible function). A function $f\colon \mathcal{P}(S \times S) \to \mathcal{P}(S \times S)$ on relations on $S$ is compatible if it preserves progressions: for all $R, R'$, $R \rightarrowtail R'$ entails $f(R) \rightarrowtail f(R')$.

Theorem 1 (Correctness of compatible functions). Let $f$ be a compatible function. Any bisimulation up to $f$ is contained in a bisimulation.

Proof. Suppose that $R$ is a bisimulation up to $f$: $R \rightarrowtail f(R)$. Using compatibility of $f$ and by a simple induction on $n$, we get $\forall n,\ f^n(R) \rightarrowtail f^{n+1}(R)$. Therefore, we have

$$\bigcup_n f^n(R) \rightarrowtail \bigcup_n f^n(R),$$

i.e., $f^\omega(R) = \bigcup_n f^n(R)$ is a bisimulation. This latter relation trivially contains $R$, by taking $n = 0$.

We could prove directly that $e$ is a compatible function; we however take a detour to ease our correctness proof for the algorithm we propose in Section 3.

Proposition 3 (Compositionality of compatible functions). The following functions are compatible:
1. the identity function $\mathrm{id}\colon R \mapsto R$;
2. the composition $f \circ g\colon R \mapsto f(g(R))$ of compatible functions $f$ and $g$;
3. the union $\bigcup F\colon R \mapsto \bigcup_{f \in F} f(R)$ of an arbitrary family $F$ of compatible functions.

Proof. The first two points are straightforward; for the last one, assume that $F$ is a family of compatible functions. Suppose that $R \rightarrowtail R'$; for all $f \in F$, we have $f(R) \rightarrowtail f(R')$, so that $\bigcup_{f \in F} f(R) \rightarrowtail \bigcup_{f \in F} f(R')$.

As a consequence, the iteration $f^\omega$ of a compatible function $f$ is compatible.

Lemma 1. The following functions are compatible:
1. the constant to identity function $r\colon R \mapsto \{(x, x) \mid \forall x\}$;
2. the converse function $s\colon R \mapsto \{(y, x) \mid x \mathrel{R} y\}$;
3. the squaring function $t\colon R \mapsto \{(x, z) \mid \exists y,\ x \mathrel{R} y \wedge y \mathrel{R} z\}$.

Intuitively, given a relation $R$, $(s \cup \mathrm{id})(R)$ is the symmetric closure of $R$, $(r \cup s \cup \mathrm{id})(R)$ is its reflexive and symmetric closure, and $(r \cup s \cup t \cup \mathrm{id})^\omega(R)$ is its symmetric, reflexive and transitive closure: we have $e = (r \cup s \cup t \cup \mathrm{id})^\omega$.

Another way to understand this decomposition of the symmetric, reflexive, and transitive closure function ($e$) is to recall that for a given $R$, $e(R)$ can be defined inductively by the following rules:

$$\frac{}{x \mathrel{e(R)} x}\ r \qquad \frac{x \mathrel{e(R)} y}{y \mathrel{e(R)} x}\ s \qquad \frac{x \mathrel{e(R)} y \quad y \mathrel{e(R)} z}{x \mathrel{e(R)} z}\ t \qquad \frac{x \mathrel{R} y}{x \mathrel{e(R)} y}\ \mathrm{id}$$

Therefore, together with Proposition 3, Lemma 1 ensures that $e$ is compatible.

Corollary 1. For all states $x$ and $y$, we have $x \sim y$ iff HK(x, y).

Proof. Same proof as for Proposition 2, using the invariant $R \rightarrowtail e(R) \cup \mathit{todo}$ for the loop. We deduce that $R$ is a bisimulation up to $e$ after the loop. Since $e$ is compatible, $R$ is contained in a bisimulation by Theorem 1.

As an example, take the automaton on the right of Figure 2. While the naive algorithm constructs the relation

$$R_{\mathit{Naive}} = \{(x, u), (y, v), (z, w), (z, v), (y, w)\},$$

which is a bisimulation, Hopcroft and Karp's algorithm stops one step earlier, resulting in the relation

$$R_{\mathit{HK}} = \{(x, u), (y, v), (z, w), (z, v)\},$$

which is not a bisimulation (because $(x, u) \in R_{\mathit{HK}}$, $x \xrightarrow{b} y$, $u \xrightarrow{b} w$ and $(y, w) \notin R_{\mathit{HK}}$), but a bisimulation up to $e$ (since $(y, w) \in e(R_{\mathit{HK}})$).

Remark 2. Observe that unlike with the naive algorithm, the relation $R$ built by Hopcroft and Karp's algorithm might change depending on the order in which the pairs $(x', y')$ are processed from the todo list (step (3.1)). For instance, after inserting $(x, u)$ in $R$, we might insert $(y, w)$, then $(z, v)$ and finally $(z, w)$, resulting in the following relation:

$$R'_{\mathit{HK}} = \{(x, u), (y, w), (z, v), (z, w)\}$$

We can however notice that $e(R'_{\mathit{HK}}) = e(R_{\mathit{HK}}) = e(R_{\mathit{Naive}})$. This actually holds in general, whatever the order in which the todo list is processed: we always have

$$R_{\mathit{HK}} \subseteq R_{\mathit{Naive}} \subseteq e(R_{\mathit{HK}})$$

(the first containment holds by definition of the algorithm; the second holds because $e(R_{\mathit{HK}})$ is a bisimulation: by the proof of Theorem 1, $e^\omega(R_{\mathit{HK}})$ is a bisimulation, and we have $e^\omega = e$). It follows that $e(R_{\mathit{HK}}) = e(R_{\mathit{Naive}})$, $e$ being monotonic and idempotent. Since $e(R_{\mathit{HK}})$ is obtained by merging equivalence classes, this means that the number of iterations required by HK does not depend on the order in which the pairs are processed. This latter property will no longer hold for the algorithm that we introduce in Section 3, so that the policy for choosing $(x', y')$ in step (3.1) will be relevant for the efficiency of the algorithm.

3 Optimised algorithm for NFA

We now introduce our optimised algorithm for non-deterministic finite automata (NFA). We start with standard definitions about semi-lattices, NFA, determinisation, and language equivalence for NFA.

A semi-lattice with bottom $(X, +, 0)$ consists of a set $X$ and a binary operation $+\colon X \times X \to X$ that is associative, commutative, idempotent (ACI) and has $0 \in X$ (the bottom) as identity. Since we always consider semi-lattices with bottom, hereafter we omit "with bottom" and just write semi-lattice. Given two semi-lattices $(X_1, +_1, 0_1)$ and $(X_2, +_2, 0_2)$, a homomorphism of semi-lattices $f\colon (X_1, +_1, 0_1) \to (X_2, +_2, 0_2)$ is a function $f\colon X_1 \to X_2$ such that for all $x, y \in X_1$, $f(x +_1 y) = f(x) +_2 f(y)$ and $f(0_1) = 0_2$.

The set $2 = \{0, 1\}$ is a semi-lattice when taking $+$ to be the ordinary Boolean or. The set of all languages $2^{A^\star}$ also carries a semi-lattice structure, where $+$ is the union of languages and $0$ is the empty language. More generally, for any set $X$, $\mathcal{P}(X)$ is a semi-lattice where $+$ is the union of sets and $0$ is the empty set. In the rest of the paper we indiscriminately use $0$ to denote the element $0 \in 2$, the empty language in $2^{A^\star}$ and the empty set in $\mathcal{P}(X)$. Analogously, $+$ denotes the Boolean or in $2$, the union of languages in $2^{A^\star}$ and the union of sets in $\mathcal{P}(X)$.
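A non-deterministic finite automaton (NFA) over the input alphabet $A$ is a triple $(S, o, \delta)$, where $S$ is a finite set of states, $o\colon S \to 2$ is the output function (as for DFA), and $\delta\colon S \to \mathcal{P}(S)^A$ is the transition relation, which assigns to each state $x \in S$ and input letter $a \in A$ a set of possible successor states.

The powerset construction transforms every NFA $(S, o, \delta)$ into the DFA $(\mathcal{P}(S), o^\sharp, \delta^\sharp)$ where $o^\sharp\colon \mathcal{P}(S) \to 2$ and $\delta^\sharp\colon \mathcal{P}(S) \to \mathcal{P}(S)^A$ are defined for all $X \in \mathcal{P}(S)$ as

$$o^\sharp(X) = \begin{cases} o(x) & \text{if } X = \{x\} \text{ with } x \in S \\ 0 & \text{if } X = 0 \\ o^\sharp(X_1) + o^\sharp(X_2) & \text{if } X = X_1 + X_2 \end{cases}$$

$$\delta^\sharp(X)(a) = \begin{cases} \delta(x)(a) & \text{if } X = \{x\} \text{ with } x \in S \\ 0 & \text{if } X = 0 \\ \delta^\sharp(X_1)(a) + \delta^\sharp(X_2)(a) & \text{if } X = X_1 + X_2 \end{cases}$$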
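Because $o^\sharp$ and $\delta^\sharp$ are defined pointwise, a state of the determinized automaton can be computed on demand from the NFA, without building $\mathcal{P}(S)$. The following is a minimal sketch, assuming that sets of states are represented as `frozenset` values and that `delta[x]` maps each letter to a set of successors; the function names and the three-state NFA are hypothetical.

```python
def o_sharp(o, X):
    """Output of a set of states: Boolean 'or' of the outputs, 0 for the empty set."""
    return max((o[x] for x in X), default=0)

def delta_sharp(delta, X, a):
    """Successor of the set X on letter a: union of the successors of its members."""
    result = set()
    for x in X:
        result |= delta[x].get(a, set())
    return frozenset(result)

# Hypothetical NFA over A = {"a"} with states x, y, z; only y is final.
o = {"x": 0, "y": 1, "z": 0}
delta = {"x": {"a": {"y"}}, "y": {"a": {"z"}}, "z": {"a": {"x", "y"}}}

X = frozenset({"x", "y"})
print(o_sharp(o, X), sorted(delta_sharp(delta, X, "a")))
```

Representing sets of states as `frozenset` makes them hashable, so they can be stored directly in the relation and todo sets of the algorithms of Section 2 when these are run on the lazily determinized automaton.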

For n exmple consider the NFA (S, o, δ) depicted on the left below. Prt of the corresponding DFA is depicted on the right, where we use new nottion: sttes re denoted by expressions of the form x 1 + + x n corresponding to the set {x 1,..., x n } (thus x corresponds to {x} nd 0 to the empty set). Like previously, expressions re overlined iff they re finl sttes. x y z x y z x + y y + z x + y + z Observe tht the stte z mkes one single -trnsition going into x + y. This stte is finl, since o (x + y) = o (x) + o (y) = o(x) + o(y) = 1 + 0 = 1. Moreover it mkes n -trnsition into δ (x + y)() = δ (x)() + δ (y)() = δ(x)() + δ(y)() = y + z. The lnguge ccepted by the sttes of NFA (S, o, δ) cn be conveniently defined vi the powerset construction: the lnguge ccepted by x S is the lnguge ccepted by the singleton {x} in the DFA (P(S), o, δ ), in symbols [{x}]. Therefore, in the following, insted of considering the problem of lnguge equivlence of sttes of the NFA, we will focus on lnguge equivlence of sets of sttes of the NFA: given the sets of sttes X nd Y in P(S), we sy tht X nd Y re lnguge equivlent (X Y ) iff [X ] = [[Y ]. This is exctly wht hppens in clssicl utomt theory where NFA re equipped with set of initil sttes. It is worth to note tht, by definition, both o nd δ re semi-lttices homomorphisms. This property will be fundmentl in Lemm 2 for proving the soundness of the up-to technique we re going to introduce. Moreover it induces compositionlity of lnguge equivlence, s stted by following theorem. Theorem 2. Let (S, o, δ) be non-deterministic utomton nd (P(S), o, δ ) be the corresponding deterministic utomton obtined through the powerset construction. The function [ ]: P(S) 2 A is semi-lttice homomorphism, tht is, for ll X 1, X 2 P(S), [X 1 + X 2 ] = [X 1 ] + [X 2 ] nd [0]] = 0. Proof. We prove tht for ll words w A, [[X 1 +X 2 ](w) = [[X 1 ](w)+ [X 2 ](w), by induction on w. for ɛ, we hve: [X 1 + X 2 ](ɛ) = o (X 1 + X 2 ) = o (X 1 ) + o (X 2 ) = [[X 1 ](ɛ) + [[X 2 ](ɛ). 12

- for $aw$, we have:

$$\begin{aligned} [\![X_1 + X_2]\!](aw) &= [\![\delta^\sharp(X_1 + X_2)(a)]\!](w) && \text{(by definition)} \\ &= [\![\delta^\sharp(X_1)(a) + \delta^\sharp(X_2)(a)]\!](w) && \text{(by definition)} \\ &= [\![\delta^\sharp(X_1)(a)]\!](w) + [\![\delta^\sharp(X_2)(a)]\!](w) && \text{(by induction hypothesis)} \\ &= [\![X_1]\!](aw) + [\![X_2]\!](aw). && \text{(by definition)} \end{aligned}$$

For the second part, we prove that for all words $w \in A^*$, $[\![0]\!](w) = 0$, again by induction on $w$. Base case: $[\![0]\!](\varepsilon) = o^\sharp(0) = 0$. Inductive case: $[\![0]\!](aw) = [\![\delta^\sharp(0)(a)]\!](w) = [\![0]\!](w)$, which by induction hypothesis is $0$.

In order to check whether the sets of states $X$ and $Y$ of an NFA $(S, o, \delta)$ are language equivalent, we can employ bisimulation on the DFA $(\mathcal{P}(S), o^\sharp, \delta^\sharp)$. In other words, a bisimulation for an NFA $(S, o, \delta)$ is a relation $R \subseteq \mathcal{P}(S) \times \mathcal{P}(S)$ on sets of states, such that whenever $X \mathrel{R} Y$ then

1. $o^\sharp(X) = o^\sharp(Y)$, and
2. for all $a \in A$, $\delta^\sharp(X)(a) \mathrel{R} \delta^\sharp(Y)(a)$.

Since this is just the old definition of bisimulation (Definition 1) applied to $(\mathcal{P}(S), o^\sharp, \delta^\sharp)$, it is immediate to see that $X \sim Y$ iff there exists a bisimulation that relates them.

Remark 3 (Linear time vs. branching time). It is important not to confuse these bisimulation relations with the standard Milner-and-Park bisimulations [7] (which strictly imply language equivalence): in a standard bisimulation $R$, if states $x$ and $y$ with transitions $x \xrightarrow{a} x_1, \dots, x \xrightarrow{a} x_n$ and $y \xrightarrow{a} y_1, \dots, y \xrightarrow{a} y_m$ are in $R$, then each $x_i$ should be in $R$ with some $y_j$ (and vice versa). Here, instead, we first transform the transition relation into $x \xrightarrow{a} x_1 + \dots + x_n$ and $y \xrightarrow{a} y_1 + \dots + y_m$, using the powerset construction, and then we require that $x_1 + \dots + x_n$ and $y_1 + \dots + y_m$ are related by $R$.

3.1 Bisimulation up to congruence

As explained in the introduction, we rely on the notion of bisimulation up to congruence. More precisely, this notion of congruence has to be understood w.r.t. set-theoretic union of sets of states ($+$). We start with the following notion of congruence closure:

Definition 4 (Congruence closure). Let $u\colon \mathcal{P}(\mathcal{P}(S) \times \mathcal{P}(S)) \to \mathcal{P}(\mathcal{P}(S) \times \mathcal{P}(S))$ be the function on relations on sets of states defined as:

$$R \mapsto \{(X_1 + X_2,\ Y_1 + Y_2) \mid X_1 \mathrel{R} Y_1 \wedge X_2 \mathrel{R} Y_2\}.$$

The function $c = (r \cup s \cup t \cup u \cup id)^\omega$ is called the congruence closure function.

Intuitively, $c(R)$ is the smallest equivalence relation which is a congruence with respect to the operation $+$ and which includes $R$. It could alternatively be defined inductively using the rules $r$, $s$, $t$, and $id$ from the previous section, and the following one:

$$\frac{X_1 \mathrel{c(R)} Y_1 \qquad X_2 \mathrel{c(R)} Y_2}{X_1 + X_2 \mathrel{c(R)} Y_1 + Y_2}\ u$$

(Note that we do not include a rule for the constant $0$ since it is subsumed by reflexivity ($r$).)

Here is a concrete example; consider the following relation: $R = \{(x, y + z), (u, y + v)\}$. By summing these two pairs using rule $u$, we deduce $x + u \mathrel{c(R)} y + z + v$, and since $u \mathrel{R} y + v$, we also get $x + u \mathrel{c(R)} z + u$. We can deduce many other equations; in fact, $c(R)$ defines the following partition of sets of states:

$\{0\}$, $\{y\}$, $\{z\}$, $\{v\}$, $\{z + v\}$,
$\{x,\ y + z,\ x + z,\ x + y,\ x + y + z\}$,
$\{u,\ y + v,\ u + v,\ y + u,\ y + u + v\}$,
$\{x + u,\ z + u,\ y + z + v$, and the 14 remaining sets$\}$.

Lemma 2. The function $u$ is compatible.

Proof. Suppose that $R \rightarrowtail R'$; we have to show $u(R) \rightarrowtail u(R')$. Suppose that $X \mathrel{u(R)} Y$, i.e., $X = X_1 + X_2$ and $Y = Y_1 + Y_2$ for some $X_1, X_2, Y_1, Y_2$ such that $X_1 \mathrel{R} Y_1$ and $X_2 \mathrel{R} Y_2$. By assumption, we have, for all $a \in A$,

$$o^\sharp(X_1) = o^\sharp(Y_1), \quad o^\sharp(X_2) = o^\sharp(Y_2), \quad \delta^\sharp(X_1)(a) \mathrel{R'} \delta^\sharp(Y_1)(a), \quad \delta^\sharp(X_2)(a) \mathrel{R'} \delta^\sharp(Y_2)(a).$$

Since $o^\sharp$ and $\delta^\sharp$ are homomorphisms, we deduce, for all $a \in A$,

$$o^\sharp(X_1 + X_2) = o^\sharp(Y_1 + Y_2), \qquad \delta^\sharp(X_1 + X_2)(a) \mathrel{u(R')} \delta^\sharp(Y_1 + Y_2)(a).$$

Theorem 3. Any bisimulation up to $c$ is contained in a bisimulation.

Proof. By Theorem 1, it suffices to show that $c$ is compatible, which follows from Lemmas 1 and 2, and Proposition 3.
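Returning to the concrete relation $R = \{(x, y+z), (u, y+v)\}$ used after Definition 4, the two deductions stated there can be unfolded step by step. The derivation below only spells out the rules $r$, $s$, $t$, $u$ and the idempotence of $+$; it adds no new result (note that $u$ denotes both a state and a rule name, as in the text).

```latex
\begin{align*}
x + u &\mathrel{c(R)} (y + z) + (y + v)
      && \text{(rule $u$ applied to $x \mathrel{R} y + z$ and $u \mathrel{R} y + v$)} \\
      &= y + z + v
      && \text{($+$ is idempotent)} \\
      &= z + (y + v) \\
      &\mathrel{c(R)} z + u
      && \text{(rules $r$, $s$, $u$: $z \mathrel{c(R)} z$ and $y + v \mathrel{c(R)} u$)}
\end{align*}
```

Transitivity (rule $t$) then gives $x + u \mathrel{c(R)} z + u$.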

[Figure 4: Bisimulations up to congruence, on a single-letter NFA.]

We already gave an example of bisimulation up to context in the introduction, which is a particular case of bisimulation up to congruence (up to context corresponds to using the function $(u \cup id)^\omega$, where we do not close the given relation under reflexivity, symmetry, and transitivity).

A more involved example illustrating the use of all ingredients of the congruence closure function ($c$) is given in Figure 4. The relation $R$ expressed by the dashed numbered lines (formally $R = \{(y, x), (v, y + u)\}$) is neither a bisimulation, nor a bisimulation up to equivalence, since $v \xrightarrow{a} x + u$ and $y + u \xrightarrow{a} y + v$, but $(x + u, y + v) \notin e(R)$. However, $R$ is a bisimulation up to congruence: we have $(x + u, y + v) \in c(R)$:

$$\begin{aligned} x + u &\mathrel{c(R)} y + u && ((y, x) \in R) \\ &\mathrel{c(R)} y + y + u && (+ \text{ is idempotent}) \\ &\mathrel{c(R)} y + v && ((v, y + u) \in R) \end{aligned}$$

In contrast, we need eight pairs to get a bisimulation up to $e$ containing the pair $(y, x)$: this is the relation depicted with both dashed and dotted lines in Figure 4.

3.2 Optimised algorithm for NFA

As expected, this up-to congruence proof technique can be turned into an algorithm, HKC, for checking equivalence of sets of states in an NFA. The code is given in Figure 5. It basically corresponds to Hopcroft and Karp's algorithm (Figure 3), except that:

1. the states of the underlying DFA are computed on the fly, by the powerset construction;
2. we use the up-to congruence technique in step 3.2.

HKC($X$, $Y$)
(1)    $R$ is empty; todo is empty;
(2)    insert $(X, Y)$ in todo;
(3)    while todo is not empty {
(3.1)      extract $(X', Y')$ from todo;
(3.2)      if $(X', Y') \in c(R)$ then skip, else {
(3.3)          if $o^\sharp(X') \neq o^\sharp(Y')$ then return false, else {
(3.4)              for all $a \in A$, insert $(\delta^\sharp(X')(a), \delta^\sharp(Y')(a))$ in todo;
(3.5)              insert $(X', Y')$ in $R$; }}
       }
(4)    return true;

Figure 5: On-the-fly and up-to congruence variant of Hopcroft and Karp's algorithm, for checking the equivalence of sets of states $X$ and $Y$ of an NFA $(S, o, \delta)$.

Corollary 2. For all sets of states $X$ and $Y$, we have $X \sim Y$ iff HKC($X$, $Y$).

Proof. Same proof as for Proposition 2, using the invariant $R \rightarrowtail c(R) \cup \text{todo}$ for the loop. We deduce that $R$ is a bisimulation up to $c$ after the loop. We conclude with Theorem 3.

3.3 Computing the congruence closure

In the algorithm of Figure 5, we need to check whether some pairs belong to the congruence closure of the current relation $R$ (step 3.2). We present here a simple solution, based on rewriting modulo associativity, commutativity, and idempotence (ACI). The idea is to look at each pair $(X, Y)$ in the relation $R$ as a pair of rewriting rules, which we will use to compute normal forms for sets of states:

$$X \to X + Y \qquad\qquad Y \to X + Y.$$

Indeed, by idempotence, $X \mathrel{R} Y$ entails $X \mathrel{c(R)} Y \mathrel{c(R)} X + Y$.

Definition 5. Let $R \subseteq \mathcal{P}(S) \times \mathcal{P}(S)$ be a relation on sets of states. The rewriting relation ${\rightsquigarrow_R} \subseteq \mathcal{P}(S) \times \mathcal{P}(S)$ is the smallest irreflexive relation defined by the following rules:

$$\frac{X \mathrel{R} Y}{X \rightsquigarrow_R X + Y} \qquad \frac{X \mathrel{R} Y}{Y \rightsquigarrow_R X + Y} \qquad \frac{Z \rightsquigarrow_R Z'}{U + Z \rightsquigarrow_R U + Z'}$$

Lemma 3. The rewriting relation $\rightsquigarrow_R$ is convergent and contained in $c(R)$.

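Proof. We have that $Z \rightsquigarrow_R Z'$ implies $|Z'| > |Z|$, where $|X|$ denotes the cardinality of the set $X$ (note that $\rightsquigarrow_R$ is irreflexive). Since $|Z'|$ is bounded by $|S|$, the number of states of the NFA, the relation $\rightsquigarrow_R$ is strongly normalising. We can also check that whenever $Z \rightsquigarrow_R Z_1$ and $Z \rightsquigarrow_R Z_2$, either $Z_1 = Z_2$ or there is some $Z'$ such that $Z_1 \rightsquigarrow_R Z'$ and $Z_2 \rightsquigarrow_R Z'$. Therefore, $\rightsquigarrow_R$ is convergent. Finally, if $Z \rightsquigarrow_R Z'$ then there exists $(X, Y) \in (s \cup id)(R)$ such that $Z = Z + X$ and $Z' = Z + Y$. Therefore $Z \mathrel{c(R)} Z'$ and, thus, $\rightsquigarrow_R$ is contained in $c(R)$.

In the sequel, we denote by $X{\downarrow}_R$ the normal form of a set $X$ w.r.t. $\rightsquigarrow_R$. Intuitively, the normal form of a set is the largest set of its equivalence class. Recalling the example from Figure 4, the common normal form of $x + u$ and $y + v$ can be computed as follows ($R$ is the relation $\{(y, x), (v, y + u)\}$):

$$x + u \ \rightsquigarrow_R\ x + y + u \ \rightsquigarrow_R\ x + y + u + v \ \mathrel{{}_R{\leftsquigarrow}}\ x + y + v \ \mathrel{{}_R{\leftsquigarrow}}\ y + v$$

Lemma 4. Let $X, Y \in \mathcal{P}(S)$; we have $(X + Y){\downarrow}_R = (X{\downarrow}_R + Y{\downarrow}_R){\downarrow}_R$.

Proof. Follows from confluence (Lemma 3) and from the fact that for all $Z, Z', U$, $Z \rightsquigarrow_R Z'$ entails $U + Z \rightsquigarrow_R U + Z'$.

Theorem 4. We have $(X, Y) \in c(R)$ iff $X{\downarrow}_R = Y{\downarrow}_R$.

Proof. From left to right. We proceed by induction on the derivation of $(X, Y) \in c(R)$. The cases for rules $r$, $s$, and $t$ are straightforward. For rule $id$, suppose that $X \mathrel{R} Y$; we have to show $X{\downarrow}_R = Y{\downarrow}_R$:

- if $X = Y$, we are done;
- if $X \subsetneq Y$, then $X \rightsquigarrow_R X + Y = Y$;
- if $Y \subsetneq X$, then $Y \rightsquigarrow_R X + Y = X$;
- if neither $Y \subseteq X$ nor $X \subseteq Y$, then $X, Y \rightsquigarrow_R X + Y$.

(In the last three cases, we conclude by confluence, Lemma 3.) For rule $u$, suppose by induction that $X_i{\downarrow}_R = Y_i{\downarrow}_R$ for $i \in \{1, 2\}$; we have to show that $(X_1 + X_2){\downarrow}_R = (Y_1 + Y_2){\downarrow}_R$. This follows by Lemma 4: $(X_1 + X_2){\downarrow}_R = (X_1{\downarrow}_R + X_2{\downarrow}_R){\downarrow}_R = (Y_1{\downarrow}_R + Y_2{\downarrow}_R){\downarrow}_R = (Y_1 + Y_2){\downarrow}_R$.

From right to left. By Lemma 3, we have $X \mathrel{c(R)} X{\downarrow}_R$ for any set $X$, so that $X \mathrel{c(R)} X{\downarrow}_R = Y{\downarrow}_R \mathrel{c(R)} Y$.

In the corresponding normalisation algorithm, each pair of $R$ may be used only once as a rewriting rule. However, we do not know in advance in which order to apply these rules. Therefore, checking whether $(X, Y) \in c(R)$ with this algorithm requires time proportional to $r^2 n$, where $r = |R|$ is the size of the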

relation $R$, and $n = |S|$ is the number of states of the NFA (assuming linear time complexity for set-theoretic union and containment of sets of states).

There is room for optimisation; we could try for instance to normalise the set of rewriting rules when adding new rules (step 3.5), or to optimise rewriting rules during normalisation (as is done with the union-find data structure, where paths are compressed during the find operation). We could also look for better data structures to represent congruence classes. We leave this for future work: it is orthogonal to the ideas presented here.

3.4 Heuristics matter

Like Hopcroft and Karp's algorithm for DFA (Figure 3), our algorithm for NFA produces a relation which is not a bisimulation, only a bisimulation up-to (here, up to congruence). The same argument as in Remark 2 can be made: while the produced relation depends on the order in which todo is processed, its congruence closure, which is a bisimulation, does not. However, unlike in Hopcroft and Karp's algorithm, the number of steps required to build this bisimulation up-to does depend on this choice. Consider for instance the following NFA over the alphabet $A = \{a, b\}$:

[Figure: an NFA over {a, b} with states x, y, and z.]

Starting from the singleton sets $\{x\}$ and $\{y\}$, the algorithm may compute a bisimulation up to congruence containing three or four pairs. If we start with the $b$-transitions, we reach the pair $(0, x + y)$, which imposes strong constraints (namely, both $x$ and $y$ are empty), so that most subsequently visited pairs are already in the congruence closure:

[Figure: the explored part of the DFA, with the pairs of R numbered 1 to 3 (left), and the list of visited pairs, indexed by the word leading to them (right).]

(On the right-hand side, we list the visited pairs, marking them with a dash (-) when they are added to $R$, and with dots (...) when they already belong to $c(R)$.) On the contrary, if we delay the processing of $(0, x + y)$, the other pairs

we add to $R$ are less constraining, so that we need one more step, as illustrated below:

[Figure: the explored part of the DFA, with the pairs of R numbered 1 to 4 (left), and the list of visited pairs, indexed by the word leading to them (right).]

While the order in which letters of the alphabet should be processed to be optimal seems hard to guess, empirical experiments tend to show that it is usually better to explore the underlying DFA in a breadth-first manner rather than depth-first. Other heuristics are possible, like guessing which pairs will impose strong constraints in the congruence closure of the relation $R$, or which ones will result in a small sub-DFA. We leave the study of such ideas for future work.

4 Complexity hints

We do not know how to properly analyse the complexity of our algorithm, for several reasons:

1. it depends on the heuristic we use to decide in which order to process the pairs in todo;
2. it depends on the algorithm we use to check whether a pair belongs to the congruence closure of a finite relation, and the algorithm we proposed in Section 3.3 to this end can certainly be improved;
3. the worst-case complexity might not be representative of the actual behaviour in practice (this also holds for Hopcroft and Karp's algorithm: NFA which produce very large DFA by the powerset construction are not so frequent);
4. an average-case analysis seems out of reach: we found no results about Hopcroft and Karp's algorithm in the literature, and here we would moreover need to understand the average size of the minimal relation generating a given congruence (this is trivial for equivalences).

Therefore, we only provide partial answers here: we first give an example showing that HKC can be exponentially better than HK; we then show a bad case, where both algorithms behave the same and require exponential time; we finally give experimental data obtained on all small NFA and on larger random NFA.
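The algorithm of Figure 5, the normal-form test of Theorem 4, and the choice of exploration order discussed in Section 3.4 can be sketched together in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `normal_form`, `hkc`, and `figure6` are hypothetical names, the naive saturation loop stands in for the rewriting-based normalisation of Section 3.3, and `figure6(n)` is a reconstruction of the automata family of Section 4.1 from its regular expressions (assuming that the final states $y_n$ and $z_n$ have no outgoing transitions).

```python
from collections import deque


def normal_form(X, R):
    """Largest set congruent to X: saturate X with the rules induced by R."""
    X = set(X)
    changed = True
    while changed:
        changed = False
        for A, B in R:
            # A -> A+B and B -> A+B apply inside any larger set (third rule of Definition 5)
            if (A <= X or B <= X) and not (A | B) <= X:
                X |= A | B
                changed = True
    return frozenset(X)


def hkc(alphabet, accepting, delta, X, Y, breadth_first=True):
    """Return (equivalent?, R), where R is the relation built by the main loop."""
    def out(S):
        return any(s in accepting for s in S)

    def step(S, a):
        return frozenset(t for s in S for t in delta.get((s, a), ()))

    R = []
    todo = deque([(frozenset(X), frozenset(Y))])
    while todo:
        # todo used as a queue gives breadth-first search, as a stack depth-first
        X1, Y1 = todo.popleft() if breadth_first else todo.pop()
        if normal_form(X1, R) == normal_form(Y1, R):   # step 3.2, via Theorem 4
            continue
        if out(X1) != out(Y1):                          # step 3.3
            return False, R
        for a in alphabet:                              # step 3.4
            todo.append((step(X1, a), step(Y1, a)))
        R.append((X1, Y1))                              # step 3.5
    return True, R


def figure6(n):
    """Assumed reconstruction of the NFA family of Section 4.1 (alphabet {a, b})."""
    delta = {}
    for c in "ab":
        delta[("x", c)] = {"x0"}
        delta[(f"x{n}", c)] = {f"x{n}"}     # x_n is final and loops on both letters
        delta[("y", c)] = {"y"}             # y and z loop on both letters
        delta[("z", c)] = {"z"}
        for i in range(n):
            delta[(f"x{i}", c)] = {f"x{i + 1}"}
            delta[(f"y{i}", c)] = {f"y{i + 1}"}
            delta[(f"z{i}", c)] = {f"z{i + 1}"}
    delta[("y", "a")] = {"y", "y0"}         # y guesses the distinguished letter a
    delta[("z", "b")] = {"z", "z0"}         # z guesses the distinguished letter b
    return "ab", {f"x{n}", f"y{n}", f"z{n}"}, delta


if __name__ == "__main__":
    alphabet, accepting, delta = figure6(3)
    ok, R = hkc(alphabet, accepting, delta, {"x"}, {"y", "z"})
    print(ok, len(R))
```

Passing `breadth_first=False` makes the same function explore depth-first, so the sizes of the relations produced by the two strategies can be compared on the same automaton.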

[Figure 6: Cases where HKC is linear while HK requires exponential time.]

In all cases, we focus on the size of the constructed bisimulation up-to: this does not depend on the implementation of the congruence closure, and this number is strongly related to the number of iterations (as mentioned before Proposition 2, the main loop is executed $1 + |A| \cdot |R|$ times, where $|A|$ and $|R|$ respectively denote the size of the alphabet and of the produced relation $R$).

4.1 Good cases

Consider the family of NFA given in Figure 6, where $n$ is an arbitrary natural number. They intuitively correspond to the following regular expressions, and we have $x \sim y + z$.

$$x : (a + b)^{n+1} (a + b)^* \qquad y : (a + b)^*\, a\, (a + b)^n \qquad z : (a + b)^*\, b\, (a + b)^n$$

The top automaton ($x$) is already deterministic, and the two other ones ($y$ and $z$) yield exponential minimal DFA (in fact, starting from state $y$ or $z$, the powerset construction results in a minimal DFA which has $2^{n+1}$ states; starting from state $y + z$, it yields a DFA with $2^{n+2} - 1$ states which is not minimal: $x$ is the minimal DFA in this case).

While the standard algorithm (HK) requires $2^{n+2} - 1$ steps to prove that $x \sim y + z$ (all states of both DFA have to be mentioned in a bisimulation up to equivalence), our algorithm (HKC) requires only $2n + 3$ steps with a breadth-first heuristic for picking pairs from todo. Indeed, it constructs the following relation:

$$\begin{aligned} R_n = {} & \{(x, y + z)\} \\ & \cup \{(x_i,\ y + y_0 + \dots + y_i + z) \mid 0 \le i \le n\} && (1) \\ & \cup \{(x_i,\ y + y_0 + \dots + y_{i-1} + z_i + z) \mid 0 \le i \le n\}. && (2) \end{aligned}$$

We have $|R_n| = 2n + 3$ and we can check that $R_n$ is a bisimulation up to congruence:

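Lemma 5. For all natural numbers $n$, $R_n$ is a bisimulation up to $c$ for the NFA depicted in Figure 6.

Proof. First notice that

$$y + y_0 + z \mathrel{c(R_n)} y + z_0 + z, \tag{$\dagger$}$$

since these two sets are related to $x_0$ by $R_n$, and

$$x_i \mathrel{c(R_n)} y + y_0 + \dots + y_i + z_i + z, \tag{$\ddagger$}$$

by summing up pairs (1) and (2) and using idempotence. We then consider each kind of pair of $R_n$ separately:

- $(x, y + z)$: we have $\delta^\sharp(x)(a) = x_0 \mathrel{R_n} y + y_0 + z = \delta^\sharp(y + z)(a)$; similarly, $\delta^\sharp(x)(b) = x_0 \mathrel{R_n} y + z_0 + z = \delta^\sharp(y + z)(b)$.

- $(x_i,\ y + y_0 + \dots + y_i + z)$ for $i < n$: we have

$$\begin{aligned} \delta^\sharp(x_i)(a) = x_{i+1} &\mathrel{R_n} y + y_0 + \dots + y_i + y_{i+1} + z && \text{(by (1))} \\ &= \delta^\sharp(y + y_0 + \dots + y_i + z)(a), \text{ and} \\ \delta^\sharp(x_i)(b) = x_{i+1} &\mathrel{R_n} y + y_0 + \dots + y_i + y_{i+1} + z && \text{(by (1))} \\ &\mathrel{c(R_n)} y + y_1 + \dots + y_i + y_{i+1} + z_0 + z && \text{(by ($\dagger$))} \\ &= \delta^\sharp(y + y_0 + \dots + y_i + z)(b); \end{aligned}$$

- $(x_i,\ y + y_0 + \dots + y_{i-1} + z_i + z)$ for $i < n$: we have

$$\begin{aligned} \delta^\sharp(x_i)(a) = x_{i+1} &\mathrel{R_n} y + y_0 + \dots + y_i + z_{i+1} + z && \text{(by (2))} \\ &= \delta^\sharp(y + y_0 + \dots + y_{i-1} + z_i + z)(a), \text{ and} \\ \delta^\sharp(x_i)(b) = x_{i+1} &\mathrel{R_n} y + y_0 + \dots + y_i + z_{i+1} + z && \text{(by (2))} \\ &\mathrel{c(R_n)} y + y_1 + \dots + y_i + z_{i+1} + z_0 + z && \text{(by ($\dagger$))} \\ &= \delta^\sharp(y + y_0 + \dots + y_{i-1} + z_i + z)(b); \end{aligned}$$

- $(x_n,\ y + y_0 + \dots + y_n + z)$: we have

$$\begin{aligned} \delta^\sharp(x_n)(a) = x_n &\mathrel{R_n} y + y_0 + \dots + y_n + z && \text{(by (1))} \\ &= \delta^\sharp(y + y_0 + \dots + y_n + z)(a), \text{ and} \\ \delta^\sharp(x_n)(b) = x_n &\mathrel{R_n} y + y_0 + \dots + y_n + z && \text{(by (1))} \\ &\mathrel{c(R_n)} y + y_1 + \dots + y_n + z_0 + z && \text{(by ($\dagger$))} \\ &= \delta^\sharp(y + y_0 + \dots + y_n + z)(b); \end{aligned}$$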

- $(x_n,\ y + y_0 + \dots + y_{n-1} + z_n + z)$: we have

$$\begin{aligned} \delta^\sharp(x_n)(a) = x_n &\mathrel{c(R_n)} y + y_0 + \dots + y_n + z_n + z && \text{(by ($\ddagger$))} \\ &= \delta^\sharp(y + y_0 + \dots + y_{n-1} + z_n + z)(a), \text{ and} \\ \delta^\sharp(x_n)(b) = x_n &\mathrel{c(R_n)} y + y_0 + \dots + y_n + z_n + z && \text{(by ($\ddagger$))} \\ &\mathrel{c(R_n)} y + y_1 + \dots + y_n + z_n + z_0 + z && \text{(by ($\dagger$))} \\ &= \delta^\sharp(y + y_0 + \dots + y_{n-1} + z_n + z)(b). \end{aligned}$$

Considering heuristics, notice that the breadth-first strategy is crucial to get this behaviour: with a depth-first strategy, we would not add the pairs $(x_0, y + y_0 + z)$ and $(x_0, y + z_0 + z)$ from the beginning, and these pairs, which entail ($\dagger$), are used to skip most pairs appearing during the unfolding of the DFA. Indeed, using a depth-first strategy, we get bisimulations up to congruence whose size is exponential, although they are smaller than the bisimulations up to equivalence obtained with HK.

4.2 Bad cases

Even though the previous example shows that our algorithm can be polynomial where the standard Hopcroft and Karp algorithm is exponential, the problem of checking language equivalence of NFA is PSPACE-complete [6], and coNP-complete when restricted to the one-letter case. We now give an example in the one-letter case where our algorithm requires exponential time (a nice property of the one-letter case is that todo contains at most one element, which means that we do not have to choose a heuristic for extracting pairs from that set).

The example is given in Figure 7; it consists of the DFA corresponding to the full language (state $x$), which we compare with the parallel composition of cycles of length varying in $[1..n]$, for a given natural number $n$. In regular expression syntax, it amounts to the following (trivial) equation:

$$a^* = \sum_{i=1}^{n} (a^i)^*.$$

For any natural number $k$, let $S_k$ be the following set of states:

$$S_k = \sum_{i=1}^{n} x^i_{k \bmod i}.$$

For all $k$, we have $S_k \xrightarrow{a} S_{k+1}$, and this sequence is periodic, of period $p = \mathrm{lcm}[1..n]$, the least common multiple of the first $n$ positive numbers (known to be greater than $2^n$ for $n > 7$).

By running Hopcroft and Karp's algorithm between $x$ and $S_0$, we get a bisimulation up to equivalence in $p$ steps (this relation is actually a bisimulation,

the up-to equivalence technique is not used).

[Figure 7: Bad case, where HK and HKC behave the same and require exponential time with one-letter automata. The states are $x$ and the cycle states $x^j_i$ with $i < j$.]

We can show that the optimised algorithm behaves exactly the same: the up-to congruence technique does not help. Intuitively, at the $j$-th iteration, we have $R = \{(x, S_k) \mid k < j\}$; therefore if $j < p$ then $(x, S_j)$ does not belong to $R$. This pair does not belong to $c(R)$ either: $x{\downarrow}_R = x + \sum_{k<j} S_k$, while $S_j$ is in normal form and does not contain $x$.

Remark 4. Note that if we merge states $x$ and $x^1_0$ in this example, the standard algorithm still requires $\mathrm{lcm}[1..n]$ steps, but the optimised one only requires $n$ steps: at step $n - 1$, we have $R = \{(x^1_0, S_k) \mid k < n\}$, and now the pair $(x^1_0, S_n)$ belongs to $c(R)$: $x^1_0{\downarrow}_R \subseteq S_n{\downarrow}_R$, since $x^1_0$ belongs to $S_n$; and $x^1_0{\downarrow}_R$ is the full state, i.e., $\{x^j_i \mid i < j\}$, since we went through all states of each cycle. Therefore, $x^1_0{\downarrow}_R = S_n{\downarrow}_R$, and thus $x^1_0 \mathrel{c(R)} S_n$ by Theorem 4.

4.3 Experimental data

We performed an exhaustive simulation for small NFA with one letter. The results are summarised in the table below: for each line, we ran the two algorithms on all NFA, storing the size of the largest bisimulation up-to obtained in this way (i.e., the worst case).

  |S|    worst case
          HK    HKC
   3       5      3
   4      12      5
   5      17      7
   6      26      9

These exhaustive computations are not tractable for larger sizes, and the apparent linear behaviour for HKC is a trap, as the example from Section 4.2 shows.

To get an intuition of the behaviour of our algorithm on larger NFA, we performed a few tests on random automata:

  |S|          HK                  HKC
          mean     median      mean   median
   20     710.3     653.0      15.5     14.0
   30    4884.1    4367.0      19.4     19.0

Here we worked with an alphabet of three letters; the probability of having a transition with a given label between two nodes is 10% in the first line and 5% in the second one (we chose these values to avoid getting NFA which mostly degenerate into trivial DFA). In both cases the mean and the median values were computed based on 1000 experiments, and we used a breadth-first heuristic in the implementation of HKC.

These preliminary experimental results look really promising; we would like to understand whether it is possible to assess them more formally.

Acknowledgements. We are grateful to Marcello Bonsangue, Jan Rutten, and Alexandra Silva for the helpful discussions we had.

References

[1] A. Silva, F. Bonchi, M. Bonsangue, and J. Rutten. Generalizing the powerset construction, coalgebraically. In Proc. FSTTCS, volume 8 of LIPIcs, pages 272-283. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2010.
[2] J. A. Brzozowski. Canonical regular expressions and minimal state graphs for definite events. In Mathematical Theory of Automata, volume 12(6), pages 529-561. Polytechnic Press, NY, 1962.
[3] J.-C. Fernandez, L. Mounier, C. Jard, and T. Jéron. On-the-fly verification of finite transition systems. Formal Methods in System Design, 1(2/3):251-273, 1992.
[4] J. E. Hopcroft. An n log n algorithm for minimizing states in a finite automaton. In Proc. International Symposium of Theory of Machines and Computations, pages 189-196. Academic Press, 1971.
[5] J. E. Hopcroft and R. M. Karp. A linear algorithm for testing equivalence of finite automata. Technical Report 114, Cornell University, December 1971.

[6] A. R. Meyer and L. J. Stockmeyer. Word problems requiring exponential time. In Proc. STOC, pages 1-9. ACM, 1973.
[7] R. Milner. Communication and Concurrency. Prentice Hall, 1989.
[8] R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes, I/II. Information and Computation, 100(1):1-77, 1992.
[9] D. Pous. Complete lattices and up-to techniques. In Proc. APLAS, volume 4807 of Lecture Notes in Computer Science, pages 351-366. Springer, 2007.
[10] J. Rutten. Automata and coinduction (an exercise in coalgebra). In Proc. CONCUR, volume 1466 of Lecture Notes in Computer Science, pages 194-218. Springer, 1998.
[11] D. Sangiorgi. On the Bisimulation Proof Method. Journal of Mathematical Structures in Computer Science, 8:447-479, 1998.
[12] D. Sangiorgi. Introduction to Bisimulation and Coinduction. Cambridge University Press, 2011.
[13] R. E. Tarjan. Efficiency of a good but not linear set union algorithm. Journal of the ACM, 22(2):215-225, 1975.