DFA Minimization in Map-Reduce
DFA Minimization in Map-Reduce

Gösta Grahne, Concordia University, Montreal, Canada, H3G 1M8
Shahab Harrafi, Concordia University, Montreal, Canada, H3G 1M8, s harraf@alumni.concordia.ca
Iraj Hedayati, Concordia University, Montreal, Canada, H3G 1M8, h iraj@encs.concordia.ca
Ali Moallemi, Concordia University, Montreal, Canada, H3G 1M8, moa ali@encs.concordia.ca

ABSTRACT
We describe Map-Reduce implementations of two of the most prominent DFA minimization methods, namely Moore's and Hopcroft's algorithms. Our analysis shows that the one based on Hopcroft's algorithm is more efficient, both in terms of running time and communication cost. This is validated by our extensive experiments on various types of DFAs, with up to 2^17 states. It also turns out that both algorithms are sensitive to skewed input, Hopcroft's algorithm being intrinsically so.

CCS Concepts
Theory of computation → MapReduce algorithms; Formal languages and automata theory;

Keywords
Map-Reduce, Automata, DFA Minimization, Communication Cost.

1. INTRODUCTION
Google introduced Map-Reduce (MR) [3] as a parallel programming model that can work over large clusters of commodity computers. Map-Reduce provides a high-level framework for designing and implementing such parallelism. MR is by now well established in academia and industry. While many single-round (non-recursive) problems have been studied and implemented, recursive problems that require several rounds of Map and Reduce are still being explored. A growing number of papers (see e.g. [,, 3]) deal with such multi-round MR algorithms. The problem of DFA minimization provides an interesting vehicle for studying multi-round MR problems for several reasons. Firstly, because of its importance and wide use in applications. Secondly, because of the multi-round structure of this problem. In contrast with single-round problems
(* Contact author.)
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. BeyondMR'16, June 26-July 01, 2016, San Francisco, CA, USA. © 2016 ACM. ISBN /16/06...$15.00. DOI: http://dx.doi.org/10.1145/
such as the NFA intersections studied in [4], multi-round problems require coordination between rounds. Thirdly, although DFA minimization is fairly straightforward to express in Datalog, the results of the present paper show that designing DFA minimization algorithms in MR is far from trivial. Thus, we hope that DFA minimization can shed some light on the more general problem of implementing arbitrary Datalog programs in the MR framework. We note that DFA minimization has been studied for various parallel architectures (see e.g. [6,, 4, 5]). However, all of these architectures contain some kind of shared memory, while the Map-Reduce model corresponds to a shared-nothing architecture. To the best of our knowledge, ours is the first paper studying DFA minimization in MR. However, the somewhat similar problem of graph reductions based on bisimulation has recently been the object of MR implementations, see e.g. [8] and [2]. In the serial context, the two major algorithms for DFA minimization are the one of Moore [9] and the one of Hopcroft [5]. Of these, Hopcroft's algorithm is considered superior due to its running time, which is less than that of Moore's algorithm. In this paper we derive MR versions of Moore's and Hopcroft's algorithms. The resulting algorithms are called Moore-MR and Hopcroft-MR, respectively.
We analyze Moore-MR and Hopcroft-MR in terms of total communication cost, i.e. the size of all data communicated by mappers and reducers over the commodity cluster. These turn out to be O(k^2 n^2 log n) and O(kn^2 log n) bits, respectively. Here n is the number of states of the automaton to be minimized, and k is the size of its alphabet. We have implemented both algorithms in Apache Hadoop, and performed extensive experiments on synthetically generated DFAs with various graph topologies. These topologies allow us to compare the communication cost and running time of the algorithms with respect to worst and average inputs, as well as to see the effect of skewed input. Our experiments validate the analysis and show that Moore-MR and Hopcroft-MR are comparable in performance, but for larger alphabets Hopcroft-MR is more efficient. The experiments also show that the performance of both algorithms deteriorates when the input is skewed. The rest of this paper is organized as follows. Section 2 provides the necessary technical definitions, and Section 3 describes Moore's and Hopcroft's algorithms. In Section 4 we adapt the two algorithms for the MR framework, along with an analysis of the communication cost of the two adapted algorithms. Section 5 describes the experimental results. Conclusions are drawn in the last section.
2. PRELIMINARIES
In this section we introduce the basic technical preliminaries and definitions. A Finite Automaton (FA) is a 5-tuple A = (Q, Σ, δ, q_s, F), where Q is a finite set of states, Σ is a finite set of alphabet symbols, δ ⊆ Q × Σ × Q is the transition relation, q_s ∈ Q is the start state, and F ⊆ Q is a set of final states. By Σ* we denote the set of all finite strings over Σ. Let w = a_1 a_2 ... a_l, where a_i ∈ Σ, be a string in Σ*. An accepting computation path of w in A is a sequence (q_s, a_1, q_1), (q_1, a_2, q_2), ..., (q_{l-1}, a_l, q_f) of tuples of δ, where q_f ∈ F. The language accepted by A, denoted L(A), is the set of all strings in Σ* for which there exists an accepting computation path in A. An FA A is said to be deterministic if for all p ∈ Q and a ∈ Σ there is a unique q ∈ Q such that (p, a, q) ∈ δ. Otherwise, the FA is non-deterministic. Deterministic FAs are called DFAs, and non-deterministic ones are called NFAs. By the well-known subset construction, any NFA A can be turned into a DFA A^D such that L(A^D) = L(A). For a DFA A = (Q, Σ, δ, q_s, F), we also write δ in functional format, i.e. δ(p, a) = q iff (p, a, q) ∈ δ. For a state p ∈ Q and a string w = a_1 a_2 ... a_l ∈ Σ*, we denote by δ̂(p, w) the unique state δ(δ(··· δ(δ(p, a_1), a_2) ···, a_{l-1}), a_l). Then, for a DFA A, its language L(A) can be defined as {w : δ̂(q_s, w) ∈ F}. A DFA A is said to be minimal if all DFAs B such that L(A) = L(B) have at least as many states as A. For each regular language L, there is a unique (up to isomorphism of their graph representations) minimal DFA that accepts L. The DFA minimization problem has been studied since the 1950s. A taxonomy of DFA minimization algorithms was created by B. W. Watson in 1993 [6]. The taxonomy reveals that most of the algorithms are based on the notion of equivalent states (to be defined). The sole exception is Brzozowski's algorithm [7], which is based on reversal and determinization of automata. Let A = (Q, Σ, δ, q_s, F) be a DFA.
Then the reversal of A is the NFA A^R = (Q, Σ, δ^R, F, {q_s}), where δ^R = {(p, a, q) : (q, a, p) ∈ δ}. Brzozowski showed in 1962 that if A is a DFA, then (((A^R)^D)^R)^D is a minimal DFA for L(A). This rather surprising result does not, however, yield a practical minimization algorithm, since there is a potential double exponential blow-up in the number of states, due to the two determinization steps. The rest of the algorithms are based on equivalence of states. Let A = (Q, Σ, δ, q_s, F) be a DFA, and p, q ∈ Q. Then p is equivalent with q, denoted p ≡ q, if for all strings w ∈ Σ* it holds that δ̂(p, w) ∈ F iff δ̂(q, w) ∈ F. The relation ≡ is an equivalence relation, and by Q/≡ we denote the induced partition of the state-space Q. The quotient DFA A/≡ = (Q/≡, Σ, γ, q_s/≡, F/≡), where γ(p/≡, a) = δ(p, a)/≡, is then a minimal DFA such that L(A/≡) = L(A). An important observation is that ≡ can be computed iteratively as a sequence ≡_0, ≡_1, ≡_2, ..., ≡_m = ≡, where p ≡_0 q iff (p ∈ F ⟺ q ∈ F), and p ≡_{i+1} q iff p ≡_i q and δ(p, a) ≡_i δ(q, a) for all a ∈ Σ. For each DFA A the sequence converges in at most l steps, where l is the length of the longest simple path from the start state to a final state. This means that Q/≡ can be computed iteratively, each step refining Q/≡_i to Q/≡_{i+1}, and eventually converging to Q/≡. We reserve the letter n for the number of states in a DFA and the letter k for the size of its alphabet. We assume that Σ = {a_1, ..., a_k}, and use a and b as metalinguistic variables denoting arbitrary alphabet symbols.

3. THE CLASSICAL ALGORITHMS
3.1 Moore's Algorithm
The first iterative algorithm was proposed by E. F. Moore in 1956 [9]. The algorithm computes the partition Q/≡ by iteratively refining the initial partition π = {F, Q \ F}. After termination, π = Q/≡. In the algorithm, the partition π is encoded as a mapping that assigns to each state p ∈ Q a block-identifier π_p, which is a bit-string. The number of blocks in π, i.e. the cardinality of the range of π, is denoted |π|. The value of π_p at the i:th iteration of lines 8–11 is denoted π_p^i.
The symbol ⊙ on line 10 of the algorithm denotes concatenation (of strings).

Algorithm 1 Moore's DFA minimization algorithm
Input: DFA A = (Q, {a_1, ..., a_k}, δ, q_s, F)
Output: π = Q/≡
1: i ← 0
2: for all p ∈ Q do                          ▷ The initial partition
3:   if p ∈ F then π_p^i ← 0
4:   else π_p^i ← 1
5:   end if
6: end for
7: repeat
8:   i ← i + 1
9:   for all p ∈ Q do
10:    π_p^i ← π_p^{i−1} ⊙ π_{δ(p,a_1)}^{i−1} ⊙ π_{δ(p,a_2)}^{i−1} ⊙ ... ⊙ π_{δ(p,a_k)}^{i−1}
11:  end for
12: until |π^i| = |π^{i−1}|

The minimal automaton is now obtained by replacing Q by π, q_s by π_{q_s}, F by {π_f : f ∈ F}, and replacing each transition (p, a, q) ∈ δ by the transition (π_p, a, π_q), and then removing duplicate transitions.

3.2 Hopcroft's Algorithm
This algorithm was proposed by J. E. Hopcroft in 1971 [5]. We start with some definitions. Let A = (Q, Σ, δ, q_s, F) be a DFA, S, P ⊆ Q, S ∩ P = ∅, and a ∈ Σ. Then ⟨S, a⟩ is called a splitter, and P ↾ ⟨S, a⟩ = {P_1, P_2}, where
P_1 = {p ∈ P : δ(p, a) ∈ S}
P_2 = {p ∈ P : δ(p, a) ∉ S}.
If either of P_1 or P_2 is empty, we set P ↾ ⟨S, a⟩ = P. Otherwise ⟨S, a⟩ is said to split P. Furthermore, for S ⊆ Q and a ∈ Σ, we define aS = {s ∈ S : δ(p, a) = s for some p ∈ Q}. That is, aS consists of those states in S that have an incoming transition labeled a. As in Moore's algorithm, we encode the current partition with a function π that maps states to integers. We also define B_j = {p ∈ Q : π_p = j}.
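The splitter operation just defined can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper; the transition function is assumed to be a plain dict keyed by (state, symbol):

```python
def apply_splitter(P, S, a, delta):
    """Compute P restricted by the splitter (S, a): P1 holds the states
    of P whose a-transition enters S, P2 holds the rest."""
    P1 = {p for p in P if delta[(p, a)] in S}
    P2 = P - P1
    if not P1 or not P2:
        return [P]            # <S, a> does not split P
    return [P1, P2]

# Tiny example: three states over the one-letter alphabet {'a'}
delta = {(0, 'a'): 1, (1, 'a'): 1, (2, 'a'): 0}
print(apply_splitter({0, 1, 2}, {1}, 'a', delta))  # -> [{0, 1}, {2}]
```

Here the splitter ⟨{1}, a⟩ splits the block {0, 1, 2}, since states 0 and 1 move into S on a while state 2 does not.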
Algorithm 2 Hopcroft's DFA minimization algorithm
Input: DFA A = (Q, Σ, δ, q_s, F)
Output: π = Q/≡
1: Execute lines 1–6 of Algorithm 1          ▷ Compute π^0
2: Q ← ∅                                     ▷ Queue set
3: for all a ∈ Σ do
4:   if |aB_0| < |aB_1| then add ⟨B_0, a⟩ to Q
5:   else add ⟨B_1, a⟩ to Q.
6:   end if
7: end for
8: i ← 0
9: while Q ≠ ∅ do
10:  i ← i + 1; size ← |π^{i−1}|
11:  Pick and delete a splitter ⟨S, a⟩ from Q.
12:  for each B_j ∈ π^{i−1} which is split by ⟨S, a⟩ do
13:    size ← size + 1
14:    for all p ∈ B_j where δ(p, a) ∈ S do
15:      π_p^i ← size                        ▷ New block-number
16:    end for
17:    for all b ∈ Σ do                      ▷ Update Q
18:      if ⟨B_j, b⟩ ∈ Q then
19:        Add ⟨B_{π_p^i}, b⟩ to Q
20:      else
21:        if |bB_j| < |bB_{π_p^i}| then
22:          add ⟨B_j, b⟩ to Q
23:        else add ⟨B_{π_p^i}, b⟩ to Q.
24:        end if
25:      end if
26:    end for
27:  end for
28: end while
29: π ← π^i

At line 10, the size of the current partition (the number of equivalence classes) is stored in the variable size. At lines 12–16 we examine all blocks B_j ∈ π^{i−1} that are split by ⟨S, a⟩, and all states p ∈ B_j where δ(p, a) ∈ S are given the new block identifier size for π^i at line 15. At the end, the minimal automaton A/≡ can be obtained as in Moore's algorithm.

4. THE ALGORITHMS IN MAP-REDUCE
We assume familiarity with the Map-Reduce model (see e.g. [7]). Here we will only recall that the initial data is stored on the DFS, and each mapper is responsible for a chunk of the input. In a round of Map and Reduce, the mappers emit key-value pairs ⟨K, V⟩. Each reducer K receives and aggregates key-value lists of the form ⟨K, [V_1, ..., V_l]⟩, where the ⟨K, V_i⟩ pairs were emitted by the mappers.

4.1 Moore's Algorithm in Map-Reduce
Our Map-Reduce version of Moore's algorithm, called Moore-MR, consists of a pre-processing stage, and one or more rounds of map and reduce functions.
Pre-processing: Let A = (Q, {a_1, ..., a_k}, δ, q_s, F). We first build a set of annotated transitions from δ.
The set will consist of annotated transitions of the form (p, a, q, π, D), where π is a bit-string representing the initial block where the state belongs, D = + indicates that the tuple represents a transition (an outgoing edge), and D = − indicates that the tuple is a dummy transition carrying in its fourth position the information of the initial block of the aforementioned state q. More specifically, for each (p, a, q) ∈ δ we insert into the set
(p, a, q, 0, +) and (p, a, q, 0, −) when p, q ∈ F
(p, a, q, 0, +) and (p, a, q, 1, −) when p ∈ F, q ∉ F
(p, a, q, 1, +) and (p, a, q, 0, −) when p ∉ F, q ∈ F
(p, a, q, 1, +) and (p, a, q, 1, −) when p, q ∉ F.
Recall that Moore's algorithm is an iterative refinement of the initial partition {F, Q \ F}. In the MR version each reducer will be responsible for one or more states p ∈ Q. Since there is no global data structure, we need to annotate each state p with the identifier π_p of the block it currently belongs to. This annotation is kept in all transitions (p, a_i, q_i), i = 1, ..., k. Hence tuples (p, a_i, q_i, π_p, +) are in the set. In order to generate new block identifiers, the reducer also needs to know the current block of all the above states q_i, which is why tuples (p, a_i, q_i, π_{q_i}, −) are in the set. Furthermore, the new block of state p will be needed in the next round when updating the block annotation of states r_1, ..., r_m, where (r_1, a_{i_1}, p), ..., (r_m, a_{i_m}, p) are all the transitions leading to p. Thus tuples (r_j, a_{i_j}, p, π_p, −) are in the set.
Map function: Let ν be the number of reducers available, and h : Q → {0, 1, ..., ν − 1} a hashing function. At round i each mapper gets a chunk of the set as input, and for each tuple (p, a, q, π_p^{i−1}, +) it emits the key-value pair ⟨h(p), (p, a, q, π_p^{i−1}, +)⟩, and for each tuple (p, a, q, π_q^{i−1}, −) it emits ⟨h(p), (p, a, q, π_q^{i−1}, −)⟩ and ⟨h(q), (p, a, q, π_q^{i−1}, −)⟩.
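The map function just described admits a very small sketch. This is illustrative Python, not the paper's Hadoop code; the tuples are plain Python tuples and, for determinism, h is assumed to be p mod ν:

```python
def moore_mr_map(chunk, nu):
    """One Moore-MR mapper. Input tuples are (p, a, q, pi, d), where
    d = '+' marks a real transition annotated with the block of p, and
    d = '-' marks a dummy transition annotated with the block of q."""
    h = lambda state: state % nu       # assumed hash h : Q -> {0, ..., nu-1}
    for p, a, q, pi, d in chunk:
        if d == '+':
            yield h(p), (p, a, q, pi, d)   # outgoing edge: only to reducer h(p)
        else:
            yield h(p), (p, a, q, pi, d)   # dummy: to h(p) ...
            yield h(q), (p, a, q, pi, d)   # ... and to h(q)
```

Note the replication: a '+' tuple is emitted once, while a '−' tuple is emitted twice, giving the replication rate of 3 per transition that is used in the cost analysis of Section 4.3.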
Reduce function: Each reducer ρ ∈ {0, 1, ..., ν − 1} receives, for all p ∈ Q where h(p) = ρ, all outgoing transitions (p, a_1, q_1, π_p^{i−1}, +), ..., (p, a_k, q_k, π_p^{i−1}, +), as well as the dummy transitions (p, a_1, q_1, π_{q_1}^{i−1}, −), ..., (p, a_k, q_k, π_{q_k}^{i−1}, −), and (r_1, a_{i_1}, p, π_p^{i−1}, −), ..., (r_m, a_{i_m}, p, π_p^{i−1}, −). The reducer can now compute π_p^i ← π_p^{i−1} ⊙ π_{q_1}^{i−1} ⊙ π_{q_2}^{i−1} ⊙ ... ⊙ π_{q_k}^{i−1}, corresponding to line 10 in Algorithm 1, and write the new value π_p^i in the tuples (p, a_j, q_j, π_p^i, +), for j ∈ {1, ..., k}, and (r_j, a_{i_j}, p, π_p^i, −), for j ∈ {1, ..., m}, which it then outputs. The reducer also outputs a change tuple (p, true) if the new value of π_p^i means that the number of blocks in π^i has increased. In order to see how reducer h(p) can determine this internally, consider for example the DFA A = ({p, q, r}, {a}, δ, p, {r}), where δ(p, a) = q, δ(q, a) = r, and δ(r, a) = r.

Round i | π_p^i | π_q^i | π_r^i
0       | 1     | 1     | 0
1       | 11    | 10    | 00
2       | 1110  | 1000  | 0000
3       | 11101000 | 10000000 | 00000000

After round 1 a new block was created for state q, and after round 2 a new block was created for state p. This can be seen from the fact that after round 1, π_p consists of two equal parts 1 and 1, whereas π_q consists of two distinct parts 1 and 0. Similarly, after round 2, π_q still consists of two distinct parts, whereas π_p has changed from two similar
parts to two distinct parts. We conclude that a new block has been created for p but not for q. After round 3, all block-identifiers consist of two distinct parts, as they did after round 2. No new blocks have been created, and the algorithm can terminate. The above can be generalized as follows: if |Σ| = k, then a block-identifier π_p^i is a bit-string that consists of k + 1 bit-string components from the previous round. If the number of distinct components has increased, it means that a new block-identifier has been created. The salient point is that the increase can be detected inside reducer h(p), which then can emit a change tuple (p, true).
Termination detection: At the end of each MR-round, if any change tuples (p, true) have been emitted, it means that another round of MR is needed.

Proposition 1. At round i of Moore-MR, π_p^i = π_q^i if and only if p ≡_i q.

Proof. We have π_p^0 = π_q^0 iff p ≡_0 q by construction. Assume that π_p^{i−1} = π_q^{i−1} iff p ≡_{i−1} q, and suppose that π_p^i = π_q^i. In the i:th Map-Reduce round, π_p^i = π_p^{i−1} ⊙ π_{δ(p,a_1)}^{i−1} ⊙ ... ⊙ π_{δ(p,a_k)}^{i−1}, and π_q^i = π_q^{i−1} ⊙ π_{δ(q,a_1)}^{i−1} ⊙ ... ⊙ π_{δ(q,a_k)}^{i−1}. By the inductive hypothesis, all the above (k + 1) components are equal iff p ≡_{i−1} q and δ(p, a_j) ≡_{i−1} δ(q, a_j) for all j ∈ {1, ..., k}, iff p ≡_i q.

For proving termination of the algorithm, we denote by sub(π_p^i) the set of substrings obtained by dividing π_p^i into (k + 1) contiguous substrings of equal length.

Proposition 2. sub(π_p^{i_1}) = sub(π_p^{i_2}) for all states p if and only if ≡_{i_1} = ≡_{i_2}.

Proof. Suppose sub(π_p^{i_1}) ≠ sub(π_p^{i_2}) for some state p. We have π_p^{i_1} = π_p^{i_1−1} ⊙ π_{δ(p,a_1)}^{i_1−1} ⊙ ... ⊙ π_{δ(p,a_k)}^{i_1−1}, and π_p^{i_2} = π_p^{i_2−1} ⊙ π_{δ(p,a_1)}^{i_2−1} ⊙ ... ⊙ π_{δ(p,a_k)}^{i_2−1}. From this it is easy to see that if, say, π_{δ(p,a_j)}^{i_2−1} ≠ π_{δ(p,a_m)}^{i_2−1}, then necessarily π_{δ(p,a_j)}^{i_1−1} ≠ π_{δ(p,a_m)}^{i_1−1}. Thus sub(π_p^{i_1}) ≠ sub(π_p^{i_2}) entails that |sub(π_p^{i_1})| > |sub(π_p^{i_2})|. Consequently there are j, m ∈ {1, ..., k} such that π_{δ(p,a_j)}^{i_2−1} = π_{δ(p,a_m)}^{i_2−1} and π_{δ(p,a_j)}^{i_1−1} ≠ π_{δ(p,a_m)}^{i_1−1}. (Note: the components compared could also be π_p^{i_2−1} and π_p^{i_1−1}.) Proposition 1 now entails that δ(p, a_j) ≡_{i_2−1} δ(p, a_m) and δ(p, a_j) ≢_{i_1−1} δ(p, a_m). Consequently, ≡_{i_1} ≠ ≡_{i_2}.
Suppose then that sub(π_p^{i_1}) = sub(π_p^{i_2}) for all states p. This means that for all states p and all j, m ∈ {1, ..., k}, π_{δ(p,a_j)}^{i_1} = π_{δ(p,a_m)}^{i_1} if and only if π_{δ(p,a_j)}^{i_2} = π_{δ(p,a_m)}^{i_2}. Then, by Proposition 1, δ(p, a_j) ≡_{i_1} δ(p, a_m) ⟺ δ(p, a_j) ≡_{i_2} δ(p, a_m). Thus q/≡_{i_1} = q/≡_{i_2} for every state q that is of the form δ(p, a_j) for some p and a_j. Since we assume that the DFA has no inaccessible states, this is true for all states except possibly the initial state q_s. For the initial state q_s, if (∗) q_s/≡_{i_1} ≠ q_s/≡_{i_2}, there would have to be an a_j ∈ Σ such that |sub(π_{δ(q_s,a_j)}^{i_1})| > |sub(π_{δ(q_s,a_j)}^{i_2})|, implying, as in the first part of the proof, that δ(q_s, a_j)/≡_{i_1} ≠ δ(q_s, a_j)/≡_{i_2}, contradicting what was established above for states of the form δ(p, a_j). Thus necessarily ≡_{i_1} = ≡_{i_2}.

Dealing with large block-labels: It is easy to see that the bit-strings π_p, encoding the block-identifiers, grow by a factor of k + 1 at each round, meaning that the length of each π_p is O((k + 1)^i log n) bits at round i. In order to avoid such excessive growth we apply the technique of Perfect Hashing Functions (PHF). A PHF is a total injective function which takes a subset of size s of a set S and maps it to a set T, where |T| = s and s ≤ |S|. However, a PHF requires some shared data structures, which is not applicable in the Map-Reduce environment. In order to implement a PHF in MR, in each reducer ρ a new block number is assigned from the range {ρ·n, ..., ρ·n + n − 1}. Since there can be at most n reducers, the resulting block numbers will be in the range {0, ..., n^2 − 1}, and our hashing scheme is thus guaranteed to be injective. We call the scheme Parallel PHF (PPHF). Technically, applying the PPHF is a separate MR job, following the main map and reduce functions in each round.

4.2 Hopcroft's Algorithm in Map-Reduce
This section describes Hopcroft-MR, a Map-Reduce version of Hopcroft's algorithm. It consists of a pre-processing stage, and rounds of a sequence of two map-reduce jobs and a wrap-up job.
Pre-processing. We first produce the set {(p, a, q, π_p, +) : (p, a, q) ∈ δ}, where π_p = 0 if p ∈ F, and π_p = 1 otherwise.
In this stage we also build a set Γ, containing information about aB, for all a ∈ Σ and blocks B in the current partition π. More specifically,
Γ = ⋃_{B_j ∈ π} {(p, a, q, π_q, −) : (p, a, q) ∈ δ, q ∈ B_j}.
Again, the initial value of π_q is 0 or 1 as in Moore-MR. Both sets are stored in the DFS. After the pre-processing we execute rounds of two map-reduce jobs and a wrap-up. The main idea is that, in each round, for each splitter ⟨S, a⟩ ∈ Q and each q where q ∈ S, the reducer h(q) in the first MR-job calculates a new block-id for all states p where (p, a, q, π_p, +). It will be the responsibility of the reducer h(p) in the second MR-job to update the block-id in all relevant tuples.
First map function. At round i, each mapper is assigned a chunk of the transition set and a chunk of Γ^{i−1}, partitioned according to Q. Note that before the second and subsequent rounds, the set Γ^{i−1} has been updated by the second job of the previous round, according to the partition π^{i−1}. We have
Γ^{i−1} = ⋃_{B_j ∈ π^{i−1}} {(p, a, q, π_q^{i−1}, −) : (p, a, q) ∈ δ, q ∈ B_j}.
For each tuple (p, a, q, π_p^{i−1}, +) in its chunk of the transition set, the mapper emits the key-value pair ⟨h(q), (p, a, q, π_p^{i−1}, +)⟩, and for each tuple (p, a, q, π_q^{i−1}, −) in its chunk of Γ^{i−1}, the mapper emits
⟨h(q), (p, a, q, π_q^{i−1}, −)⟩. After mapping all splitters, the Master removes all elements of Q.
First reduce function. Reducer h(q) then receives, for each ⟨S, a⟩ in Q where q ∈ S, the tuples (p_1, a, q, π_{p_1}^{i−1}, +), ..., (p_m, a, q, π_{p_m}^{i−1}, +), m ≤ n, of incoming a-transitions to q, and a subset of the tuples (p_1, a, q, π_q^{i−1}, −), ..., (p_m, a, q, π_q^{i−1}, −), namely those that were in the current Γ^{i−1}. If state q had no incoming a-transitions, or if q doesn't participate in any of the current splitters, the reducer does nothing. Otherwise, some of the tuples (p_j, a, q, π_q^{i−1}, −), where j ∈ {1, ..., m}, were received, and for each such tuple the state p_j needs to move to a new block. The identifier of this new block will be computed from the values of π_{p_j}^{i−1}, π_q^{i−1}, and β_{p_j}^a, where β_{p_j}^a is a bit-vector of length k = |Σ| such that β_{p_j}^a(i) = 1 if a = a_i, and β_{p_j}^a(i) = 0 otherwise. The reducer will output update tuples (p_j, π_{p_j}^{i−1}, β_{p_j}^a, π_q^{i−1}). Note that reducer h(q) will possibly also receive transition tuples (p, a, q', π_p^{i−1}, +), in case h(q') = h(q). These states will be updated only if (p, a, q', π_{q'}^{i−1}, −) also was received.
Second map function. Each mapper is assigned its chunks of the transition set as before, a chunk of Γ^{i−1}, and a chunk of the update tuples produced by the first map-reduce job. The mapper emits key-value pairs ⟨h(p), (p, a, q, π_p^{i−1}, +)⟩ from the transition set, and ⟨h(p), (p, π_p^{i−1}, β_p^a, π_{δ(p,a)}^{i−1})⟩ from its assigned update tuples.
Second reduce function. Each reducer h(p) will receive the tuples (p, a_1, δ(p, a_1), π_p^{i−1}, +), ..., (p, a_k, δ(p, a_k), π_p^{i−1}, +) from the transition set, and possibly update tuples (p, π_p^{i−1}, β_p^{a_{i_1}}, π_{δ(p,a_{i_1})}^{i−1}), ..., (p, π_p^{i−1}, β_p^{a_{i_m}}, π_{δ(p,a_{i_m})}^{i−1}), where m ≤ k. If the reducer received the update tuples, it first computes β_p = ⋁_{j=1}^{m} β_p^{a_{i_j}}, and will then write the value π_p^i = π_p^{i−1} ⊙ β_p ⊙ π_{δ(p,a_{i_1})}^{i−1} ⊙ ... ⊙ π_{δ(p,a_{i_m})}^{i−1} in the above tuples from the transition set and from Γ^{i−1}. Finally, the reducer outputs all its updated (or not) transition tuples, and updated (or not) Γ^{i−1} tuples as Γ^i.

Proposition 3. At the end of the i:th round of Hopcroft-MR, π_p^i = π_q^i if and only if p ≡_i q.

Proof.
We have π_p^0 = π_q^0 iff p ≡_0 q by definition. Assume that π_p^{i−1} = π_q^{i−1} iff p ≡_{i−1} q, and suppose that π_p^i = π_q^i. Then necessarily π_p^{i−1} = π_q^{i−1}, β_p = β_q, and π_{δ(p,a_{i_j})}^{i−1} = π_{δ(q,a_{i_j})}^{i−1} for all i_j ∈ {i_1, ..., i_m}. By the inductive hypothesis, p ≡_{i−1} q and δ(p, a_{i_j}) ≡_{i−1} δ(q, a_{i_j}) for all i_j ∈ {i_1, ..., i_m}. The remaining possibility is that δ(p, a_j) ≢_{i−1} δ(q, a_j) for some j such that β_p(j) = β_q(j) = 0. Then, for all splitters ⟨S, a_j⟩ in Q at round i, there are no states r ∈ S such that δ(p, a_j) = r or δ(q, a_j) = r. On the other hand, since p ≡_{i−1} q it must be that δ(p, a_j) ≡_{i−2} δ(q, a_j). Since we had δ(p, a_j) ≢_{i−1} δ(q, a_j), it follows that either δ(p, a_j) or δ(q, a_j) was moved to a new block at round i−1, which means that a sub-block of π_{δ(p,a_j)}^{i−1} or π_{δ(q,a_j)}^{i−1} was put in Q for round i, contradicting our assumption that β_p(j) = β_q(j) = 0.
Suppose then that p ≡_i q. Then p ≡_{i−1} q, and by the inductive hypothesis π_p^{i−1} = π_q^{i−1}. Also, for all j ∈ {1, ..., k}, δ(p, a_j) ≡_{i−1} δ(q, a_j), and by the inductive hypothesis π_{δ(p,a_j)}^{i−1} = π_{δ(q,a_j)}^{i−1}. Denote π_{δ(p,a_j)}^{i−1} by S. We have two cases to consider. In the first case, ⟨S, a_j⟩ is in Q at round i. Then β_p(j) = β_q(j) = 1, and reducer h(p) will in the second job concatenate the π_{δ(p,a_j)}^{i−1} generated in the first job by reducer h(δ(p, a_j)) to π_p^i, and reducer h(q) will concatenate π_{δ(q,a_j)}^{i−1} to π_q^i. Since π_{δ(p,a_j)}^{i−1} = π_{δ(q,a_j)}^{i−1}, it follows that π_p^i = π_q^i. In the second case, ⟨S, a_j⟩ is not in Q at round i. Then β_p(j) = β_q(j) = 0, meaning that π_{δ(p,a_j)}^{i−1} and π_{δ(q,a_j)}^{i−1} were not part of π_p^i and π_q^i, and that therefore π_p^i = π_q^i.

Wrap-up. The Master now executes lines 17–26 of Algorithm 2. If there is any split, the Master will put new splitters into Q and initiate another round of the two map-reduce jobs.
Dealing with large block-labels: In Hopcroft-MR the block-identifiers π_p consist of O((k + 2)^i) bits after round i. In order to avoid long bit-strings, between rounds we use a PHF with range {0, ..., n^2 − 1}, as in Moore-MR.
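The PPHF scheme used by both algorithms to shorten block-labels between rounds can be sketched as follows. This is an illustrative sketch; the class name and interface are ours, not the paper's:

```python
class ParallelPHF:
    """Per-reducer shortening of long block labels. Reducer rho assigns
    new block numbers only from its private range {rho*n, ..., rho*n + n - 1},
    so numbers chosen by distinct reducers can never collide, and the
    overall mapping into {0, ..., n^2 - 1} is injective."""

    def __init__(self, rho, n):
        self.base = rho * n
        self.n = n
        self.table = {}        # long bit-string label -> short block number

    def shorten(self, label):
        if label not in self.table:
            assert len(self.table) < self.n   # at most n blocks per reducer
            self.table[label] = self.base + len(self.table)
        return self.table[label]
```

For example, with n = 4, reducer 0 maps its first two distinct labels to 0 and 1, while reducer 1 maps its first label to 4; no shared data structure is needed to keep the assignments disjoint.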
4.3 Communication Cost of the Algorithms
It is known that, in the worst case, the refinement sequence requires m = n − 2 steps, where n is the number of states in the DFA [9]. This happens in linear DFAs, such as A = ({p, q, r}, {a}, δ, p, {r}) from Section 4.1. This means that Algorithm 1 needs n rounds in the worst case. It follows from Lemma 9 in [2] that Hopcroft-MR requires the same number of rounds as Moore-MR. We need to account for the amount of data communicated in each round. In Moore-MR each transition (p, a, q) ∈ δ gives rise to the tuples (p, a, q, π_p, +) and (p, a, q, π_q, −) in the pre-processed set. The first tuple is mapped to h(p) and the second to h(p) and h(q). This gives a replication rate of 3. A state can be encoded using log n bits and an alphabet symbol using log k bits. Since block-labels are in the range {0, ..., n^2 − 1} using the PPHF, these require 2 log n bits. We note that since the PPHF actually is a Map-Reduce job, the mappers in this job emit updated block numbers which require (k + 1) log n bits, as a result of the concatenation operations in the main Map-Reduce job. Thus a tuple (p, a, q, π_p/π_q, +/−) requires log n + log k + log n + (k + 1) log n + 1 = Θ(k log n) bits. Since there are kn transitions, a total of Θ(k^2 n log n) bits need to be communicated in each round. The reducers in Moore-MR output the two types of updated tuples above, and possibly an update tuple (p, true). It follows from [5] that the total number of update tuples ever emitted is bounded by O(n log n), which however is absorbed in one round by O(k^2 n log n). We therefore have

Proposition 4. The communication cost of Moore-MR is O(k^2 n^2 log n).

The analysis for Hopcroft-MR is similar, except that there are two MR jobs in each round, and only two tuples are
mapped for every transition in the input DFA. As in Moore-MR, there can be at most n log n update tuples in total. All of this yields O((log n + log k)·kn) bits of communication per round. We can assume k < n, which gives us O(kn log n) bits per round.

Proposition 5. The communication cost of Hopcroft-MR is O(kn^2 log n).

The effect of skewed input. One of the problems with the Map-Reduce framework is that the reducers might not be evenly utilized. This happens when the input is skewed in some way. We note that in Moore-MR, each reducer h(p) receives k tuples representing outgoing transitions, and k dummy tuples. However, reducer h(p) also receives dummy tuples representing transitions leading into p. There is at least one, and at most n, incoming transitions for each state. Thus we can expect that those reducers h(p), where p has a large number of incoming transitions, have to deal with larger inputs than those reducers h(p'), where p' has only a few incoming transitions. Thus the reducers h(p') might have to wait for the h(p) reducers. The same phenomenon occurs in the first Reduce job in Hopcroft-MR, where each reducer h(q) receives tuples corresponding to transitions leading into state q. In the second Reduce job, on the other hand, all reducers receive tuples corresponding to outgoing transitions. However, Hopcroft-MR is inherently sensitive to skewness, as the splitting is determined by a state q for all blocks B that contain a state p such that (p, a, q) ∈ δ. On the other hand, Moore-MR determines a new block for a state based on all its outgoing transitions, and each state has exactly k outgoing transitions. The possible skewness in the number of incoming transitions is only manifested in the way we update the dummy tuples representing transitions incident upon p. We are currently exploring ways of circumventing this skewness sensitivity in Moore-MR.

5. EXPERIMENTAL RESULTS
In this section we examine the behavior of Moore-MR and Hopcroft-MR by implementing them on Apache Hadoop on a cluster of two machines, and then running the algorithms on various types of DFAs.
The number of reducers in the experiments was set to … .

5.1 Data Generation
We first briefly describe the sample data we ran the algorithms on. In a DFA (Q, Σ, δ, q_s, F) where |Q| = n and |Σ| = k, we assume that Q = {1, ..., n}, q_s is state 1, and Σ = {1, ..., k}. We will generate four types of DFAs:

Slow DFAs. This family of automata was described in [5]. Slow DFAs behave badly for Hopcroft's and Moore's (serial) algorithms. Here each newly generated block has a constant number of states, and at each iteration only one block can be split. The prototypical slow DFA is a linear one, where F = {n} and δ(i, j) = i + 1 for all i ∈ Q \ {n} and j ∈ Σ. Additionally, δ(n, j) = n for all j ∈ Σ. Linear DFAs are already minimal.

Circular DFAs. These DFAs were also utilized in the original paper of Hopcroft. For all states i and alphabet symbols j, we compute δ(i, j) as follows:
δ(i, j) = (i + j) mod n       if j ≤ k/2
δ(i, j) = (i + k − j) mod n   otherwise.
We also have F = {i ∈ Q : i mod k = 0}. This type of DFA does not exhibit any skew. If we assume that n is a multiple of k, we will have i ≡ j iff i mod k = j mod k, so the minimal version of the DFA is a similar circular DFA, only with k states.

Replicated-Random DFAs. We first generate a random DFA and then make k copies of it. We add a global start state, from which symbol i takes us to (what was) the start state of the i-th copy DFA. The minimized version of the Replicated-Random DFA is equal to the minimized version of the original random DFA plus the new start state, and its transitions to the old start state on all symbols.

Star DFAs. The last type of DFA has a star shape, where F = {n}. For all states i and alphabet symbols j, we compute δ(i, j) as follows:
δ(i, j) = (i mod (n − 1)) + 1   if i ≠ n and j = 1
δ(i, j) = n                     if i ≠ n and j ≠ 1
δ(i, j) = n                     if i = n.
This type of DFA exhibits drastic skew, and will be minimized to ({1, n}, {1, ..., k}, γ, 1, {n}), where γ(1, j) = n for all j ∈ {2, ..., k}, γ(1, 1) = 1, and γ(n, j) = n for all j ∈ {1, ..., k}.

5.2 Experiments

Slow DFAs.
We compared Slow DFAs, each with 4 to 512 states and alphabet size k = 2. As we know, Moore-MR updates all transitions in each round. According to Table 1, on this particular dataset the minimization takes exactly n rounds for Moore-MR. In each round all n states are involved. Thus, this requires n^2 · O(k log n) = O(n^2 k log n) bits. Since k = 2, we can consider it a constant, so the bound becomes O(n^2 log n). On the other hand, in Hopcroft-MR the total number of updated states is independent of the number of rounds and is bounded by O(n log n). Figure 1 clearly shows the above polynomial relation between these two algorithms. However, Figure 2 shows that Hopcroft-MR takes more time than Moore-MR. This is due to the fact that Hopcroft-MR has one more job in each round. The relation here is linear and exhibits the combined impact of the number of map-reduce jobs and rounds on the execution time. (Since the initial state is labeled 1, we map δ(i, j) = 0 to n.)
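The linear slow DFA just discussed is easy to reproduce. The sketch below (illustrative Python, not the paper's Hadoop code) builds one and counts serial Moore refinement rounds, which grow linearly with n, in line with the behavior reported in Table 1:

```python
def linear_dfa(n, k=2):
    """Linear 'slow' DFA on states 1..n: delta(i, j) = i + 1 for i < n,
    delta(n, j) = n, and F = {n}. Already minimal."""
    delta = {(i, j): (i + 1 if i < n else n)
             for i in range(1, n + 1) for j in range(1, k + 1)}
    return delta, {n}

def moore_rounds(n, k, delta, F):
    """Serial Moore refinement; returns (number of blocks, rounds used).
    Block labels are re-numbered each round instead of concatenated."""
    pi = {i: 0 if i in F else 1 for i in range(1, n + 1)}
    rounds = 0
    while True:
        # signature of a state: its block plus the blocks of its successors
        sig = {i: (pi[i],) + tuple(pi[delta[(i, j)]] for j in range(1, k + 1))
               for i in pi}
        relabel = {s: b for b, s in enumerate(sorted(set(sig.values())))}
        new_pi = {i: relabel[sig[i]] for i in pi}
        rounds += 1
        if len(set(new_pi.values())) == len(set(pi.values())):
            return len(set(new_pi.values())), rounds
        pi = new_pi
```

On the linear DFA each round creates exactly one new block, so the block count climbs from 2 to n one step at a time, which is what makes this family slow for both serial algorithms.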
Figure 1: Communication cost on slow DFAs for alphabet size k = 2
Figure 2: Execution time on slow DFAs for alphabet size k = 2
Figure 3: Communication cost on circular DFAs for alphabet size k = 4
Figure 4: Execution time on circular DFAs for alphabet size k = 4

Circular DFAs. The main feature of this class of automata is that each state has k incoming transitions. This creates a uniform distribution of transitions both in mapping schemas based on the source state (the key-value pairs ⟨h(p), (p, a, q, π_p, +)⟩ in Moore-MR, and the second map function in Hopcroft-MR) and in mapping schemas based on the target state (⟨h(q), (p, a, q, π_q, −)⟩ in Moore-MR, and the first map function in Hopcroft-MR). As can be seen from Figure 3, both Moore-MR and Hopcroft-MR have the same communication cost on small alphabets. On the other hand, similarly to the Slow DFAs, when the size of the alphabet is fixed and relatively small, we see from Figure 4 that Hopcroft-MR requires more execution time. This is mainly because of the overhead around I/O operations and job execution, as was the case for the Slow DFAs. We note that when the number of states is increased, both algorithms perform similarly. In order to see the effect of the alphabet size, we ran experiments using increasing alphabet sizes and a fixed number (n = 1024) of states. Figures 5 and 6 show that the communication cost and running time of Moore-MR are larger than those of Hopcroft-MR by a factor of k, as stated earlier in Section 4.3.

Figure 5: Communication cost on circular DFAs for n = 1024
Figure 6: Execution time on circular DFAs for n = 1024
Replicated-Random DFAs. Here the size of the alphabet is fixed and equal to 4. The number of states in the Replicated-Random DFAs varies from … to … . Each DFA is replicated four times and the copies are connected using an auxiliary start state. Results are plotted in Figures 7 and 8. These figures show that Replicated-Random DFAs behave similarly to Circular DFAs. We note that the small standard deviation of the distribution of tuples among reducers in the Replicated-Random DFAs (see Table 2) does not affect the performance significantly. The irregular value in Figure 8 for the DFA with … states is, as shown in Table 1, due to the fact that the set of final states for this DFA is empty, and it converges at the beginning of the process.

Figure 7: Communication cost on replicated-random DFAs for alphabet size k = 4
Figure 8: Execution time on replicated-random DFAs for alphabet size k = 4
Figure 9: Communication cost on star DFAs for alphabet size k = 4
Figure 10: Execution time on star DFAs for alphabet size k = 4
Figure 11: Communication cost on star DFAs for n = 1024
Figure 12: Execution time on star DFAs for n = 1024

Star DFAs. The last dataset is designed to highlight the sensitivity of the algorithms to data skewness. The main property of this dataset is that (k − 1)·n out of k·n transitions will be mapped to one reducer. Interestingly, both algorithms demonstrate the same performance, which shows that both have the same sensitivity to skewness, as this is a clear consequence of their mapping schemas. Results are plotted in Figures 9 and 10. In conclusion, these two algorithms behave equally when k is fixed and small. On the other hand, when the number of states is fixed and the alphabet size increases, there is a noticeable difference between Moore-MR and Hopcroft-MR. Figure 11 shows the difference in communication cost between these algorithms when k increases, while Figure 12 shows this difference in execution time.
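The skew in the star family can be seen concretely by generating one and counting, per reducer, the tuples routed by target state. This is an illustrative sketch; the reducer count and the hash h(q) = q mod ν are our assumptions, not values from the paper:

```python
from collections import Counter

def star_dfa(n, k):
    """Star DFA on states 1..n with F = {n}: symbol 1 cycles through the
    non-final states, every other symbol (and every symbol from n) goes
    to the final state n."""
    delta = {}
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            if i != n and j == 1:
                delta[(i, j)] = (i % (n - 1)) + 1
            else:
                delta[(i, j)] = n
    return delta, {n}

def transitions_per_reducer(delta, nu):
    """Count tuples routed by target state q with h(q) = q % nu,
    mimicking the target-based mapping schemas of both algorithms."""
    return Counter(q % nu for q in delta.values())

delta, F = star_dfa(100, 4)
load = transitions_per_reducer(delta, 10)
# state n alone receives (k-1)*n + 1 = 301 of the k*n = 400 tuples,
# so the reducer responsible for it dominates all the others
```

With n = 100, k = 4 and 10 reducers, the reducer owning state n handles 310 tuples while every other reducer handles 10, which is exactly the imbalance the Star experiments are designed to expose.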
5.3 Summary

The number of rounds required by each algorithm is categorized by dataset and listed in Table 1. For the Slow DFA's, Moore-MR and Hopcroft-MR both required exactly n rounds, as is expected. On the other hand, for the Circular DFA's, both algorithms accomplished the task in a fixed number of rounds, independent of the size of the alphabet. The same results are observed for the Star DFA's. Lastly, running the algorithms on the Replicated-Random DFA's shows that one of the algorithms finished the task in fewer rounds than the other (see Table 1).

Table 1: Number of rounds for minimizing DFA's using Moore-MR and Hopcroft-MR with different datasets. [Columns: Dataset, k, n, Moore-MR, Hopcroft-MR; for the Slow DFA's with k = 2, both algorithms take n rounds. Footnote: one randomly generated DFA does not have any final state and minimizes into a single-state machine.]

Figures 13 to 16 represent the data distribution and execution time among all reducers in the first round of execution only. As is expected, for the Circular and Replicated-Random DFA's the data is distributed and executed uniformly. However, for the Star DFA's these results show that both Moore-MR and Hopcroft-MR are sensitive to skewness. The data distribution shows that one reducer receives most of the input, while the rest of the input is evenly shared among all other reducers.

Figure 13: Maximum number of transitions in reducer groups for Moore's algorithm
Figure 14: Maximum execution time in reducer groups for Moore's algorithm
Figure 15: Maximum number of transitions in reducer groups for Hopcroft's algorithm
Figure 16: Maximum execution time in reducer groups for Hopcroft's algorithm
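The n rounds on Slow DFA's reflect the round structure of the underlying serial algorithm: each round of Moore refinement regroups the states by their current block and the blocks of their successors, and the MR version spends a round of map-reduce per refinement step. A serial sketch (not the paper's code) that counts these rounds, run here on a single-letter chain DFA where each round splits off only one state:

```python
# Serial Moore partition refinement -- an illustrative sketch, not the
# paper's implementation. Each round relabels every state by the signature
# (its block, blocks of its successors); iteration stops when the number
# of blocks no longer grows.

def moore_rounds(states, alphabet, delta, finals):
    """Return (rounds used, final partition as a state -> block-id dict)."""
    # Round 0 partition: final vs. non-final states.
    block = {s: (s in finals) for s in states}
    rounds = 0
    while True:
        rounds += 1
        sig = {s: (block[s],) + tuple(block[delta[(s, a)]] for a in alphabet)
               for s in states}
        classes = {v: i for i, v in enumerate(sorted(set(sig.values())))}
        new_block = {s: classes[sig[s]] for s in states}
        if len(set(new_block.values())) == len(set(block.values())):
            return rounds, new_block
        block = new_block

# A "slow" chain DFA on one letter: state i goes to i+1, only state n-1 final.
n = 5
states = list(range(n))
delta = {(i, 'a'): min(i + 1, n - 1) for i in states}
rounds, partition = moore_rounds(states, ['a'], delta, finals={n - 1})
```

On this chain, refinement peels off one state per round, so the round count grows linearly with n, which is the worst-case behavior the Slow DFA dataset is built to trigger.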
A statistical summary of the distribution of the tuples over the reducers is shown in Table 2 for both algorithms. For all four datasets the size of the alphabet is set to 4 and the number of states is 2^17. The columns Min. and Max. show the minimum and maximum number of tuples mapped to each reducer in every job of all the rounds of execution. For Hopcroft-MR we have two rows per DFA type, describing its two jobs. In the Circular DFA's, tuples are evenly distributed for both incoming and outgoing transitions, and the standard deviation for both algorithms is 0. However, a high degree of skewness is observed for the Star DFA's (all states have one incoming transition except the final state, which has (k − 1)n incoming transitions).

Table 2: Statistical summary of the distribution of records among reducers. [Columns: Dataset, Min., Max., Avg., Std. Dev.; rows for the Circular, Random, and Star datasets under each algorithm.]

6. CONCLUSION

In this paper we developed and studied implementations in Map-Reduce of two algorithms for minimizing DFA's. Our analytical and experimental results show that the Moore-MR algorithm and the Hopcroft-MR algorithm behave very similarly when the alphabet size is small, whereas Hopcroft-MR outperforms Moore-MR in communication cost when the cardinality of the alphabet is at least 16, and in wall-clock time when the cardinality is at least 32, both measured on DFA's with 1024 states. Moreover, our study shows that both algorithms are equally sensitive to skewness in the input data. However, Moore-MR (and serial Moore) determines the block of a state based on its (constant number of) outgoing transitions, while Hopcroft-MR (and serial Hopcroft) does it based on the transitions incident upon the state. This means that there is potential to reduce skew-sensitivity in Moore-MR. In future work we will also investigate the average communication cost of these algorithms. Finally, the capacity of the reducers (independently of the mapping schema) may affect the number of rounds needed. We intend to explore this trade-off in future work.

7. REFERENCES

[1] F. N. Afrati and J. D. Ullman.
Transitive closure and recursive Datalog implemented on clusters. In 15th International Conference on Extending Database Technology, EDBT, Berlin, Germany, pages 132–143, 2012.
[2] J. David. Average complexity of Moore's and Hopcroft's algorithms. Theor. Comput. Sci., 417:50–65, 2012.
[3] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
[4] G. Grahne, S. Harrafi, A. Moallemi, and A. Onet. Computing NFA intersections in Map-Reduce. In Proceedings of the Workshops of the EDBT/ICDT Joint Conference (EDBT/ICDT), pages 42–45, 2015.
[5] J. E. Hopcroft. An n log n algorithm for minimizing states in a finite automaton. In Theory of Machines and Computations, pages 189–196. Academic Press, New York, 1971.
[6] J. F. JáJá and K. W. Ryu. An efficient parallel algorithm for the single function coarsest partition problem. In Proceedings of the Fifth Annual ACM Symposium on Parallel Algorithms and Architectures. ACM, 1993.
[7] J. Leskovec, A. Rajaraman, and J. D. Ullman. Mining of Massive Datasets, 2nd Ed. Cambridge University Press, 2014.
[8] Y. Luo, G. H. Fletcher, J. Hidders, Y. Wu, and P. De Bra. External memory k-bisimulation reduction of big graphs. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. ACM, 2013.
[9] E. F. Moore. Gedanken-experiments on sequential machines. Automata Studies, 34:129–153, 1956.
[10] V. Rastogi, A. Machanavajjhala, L. Chitnis, and A. D. Sarma. Finding connected components in Map-Reduce in logarithmic rounds. In 29th IEEE International Conference on Data Engineering, ICDE, Brisbane, Australia, pages 50–61, 2013.
[11] B. Ravikumar and X. Xiong. A parallel algorithm for minimization of finite automata. In The 10th International Parallel Processing Symposium, pages 187–191, April 1996.
[12] A. Schätzle, A. Neu, G. Lausen, and M. Przyjaciel-Zablocki. Large-scale bisimulation of RDF graphs. In Proceedings of the Fifth Workshop on Semantic Web Information Management, page 1. ACM, 2013.
[13] M. Shaw, P. Koutris, B. Howe, and D. Suciu. Optimizing large-scale semi-naïve Datalog evaluation in Hadoop. In Datalog in Academia and Industry – Second International Workshop, Datalog 2.0, Vienna, Austria, pages 165–176, 2012.
[14] Y. N. Srikant. A parallel algorithm for the minimization of finite state automata. International Journal of Computer Mathematics, 32(1–2), 1990.
[15] A. Tewari, U. Srivastava, and P. Gupta. A parallel DFA minimization algorithm. In High Performance Computing – HiPC. Springer, 2002.
[16] B. W. Watson. A taxonomy of finite automata minimization algorithms. Computing Science Note 93/44, Eindhoven University of Technology, The Netherlands, 1993.
[17] M. Yoeli. Canonical representations of chain events. Information and Control, 8(2):180–189, 1965.
More informationAn Analysis of Reliable Classifiers through ROC Isometrics
An Analysis of Reliable Classifiers through ROC Isometrics Stijn Vanderlooy s.vanderlooy@cs.unimaas.nl Ida G. Srinkhuizen-Kuyer kuyer@cs.unimaas.nl Evgueni N. Smirnov smirnov@cs.unimaas.nl MICC-IKAT, Universiteit
More informationSTABILITY ANALYSIS TOOL FOR TUNING UNCONSTRAINED DECENTRALIZED MODEL PREDICTIVE CONTROLLERS
STABILITY ANALYSIS TOOL FOR TUNING UNCONSTRAINED DECENTRALIZED MODEL PREDICTIVE CONTROLLERS Massimo Vaccarini Sauro Longhi M. Reza Katebi D.I.I.G.A., Università Politecnica delle Marche, Ancona, Italy
More informationGeneral Linear Model Introduction, Classes of Linear models and Estimation
Stat 740 General Linear Model Introduction, Classes of Linear models and Estimation An aim of scientific enquiry: To describe or to discover relationshis among events (variables) in the controlled (laboratory)
More informationLecture 9: Connecting PH, P/poly and BPP
Comutational Comlexity Theory, Fall 010 Setember Lecture 9: Connecting PH, P/oly and BPP Lecturer: Kristoffer Arnsfelt Hansen Scribe: Martin Sergio Hedevang Faester Although we do not know how to searate
More informationEXACTLY PERIODIC SUBSPACE DECOMPOSITION BASED APPROACH FOR IDENTIFYING TANDEM REPEATS IN DNA SEQUENCES
EXACTLY ERIODIC SUBSACE DECOMOSITION BASED AROACH FOR IDENTIFYING TANDEM REEATS IN DNA SEUENCES Ravi Guta, Divya Sarthi, Ankush Mittal, and Kuldi Singh Deartment of Electronics & Comuter Engineering, Indian
More information