DFA Minimization in Map-Reduce
DFA Minimization in Map-Reduce

Gösta Grahne, Concordia University, Montreal, Canada, H3G 1M8
Shahab Harrafi, Concordia University, Montreal, Canada, H3G 1M8, s harraf@alumni.concordia.ca
Iraj Hedayati, Concordia University, Montreal, Canada, H3G 1M8, h iraj@encs.concordia.ca
Ali Moallemi, Concordia University, Montreal, Canada, H3G 1M8, moa ali@encs.concordia.ca

ABSTRACT
We describe Map-Reduce implementations of two of the most prominent DFA minimization methods, namely Moore's and Hopcroft's algorithms. Our analysis shows that the one based on Hopcroft's algorithm is more efficient, both in terms of running time and communication cost. This is validated by our extensive experiments on various types of DFAs, with up to 2^17 states. It also turns out that both algorithms are sensitive to skewed input, Hopcroft's algorithm being intrinsically so.

CCS Concepts
Theory of computation → MapReduce algorithms; Formal languages and automata theory;

Keywords
Map-Reduce, Automata, DFA Minimization, Communication Cost.

1. INTRODUCTION
Google introduced Map-Reduce (MR) [3] as a parallel programming model that can work over large clusters of commodity computers. Map-Reduce provides a high-level framework for designing and implementing such parallelism. MR is by now well established in academia and industry. While many single-round (non-recursive) problems have been studied and implemented, recursive problems that require several rounds of Map and Reduce are still being explored. A growing number of papers (see e.g. [,, 3]) deal with such multi-round MR algorithms. The problem of DFA minimization provides an interesting vehicle for studying multi-round MR problems for several reasons. Firstly, because of its importance and wide use in applications. Secondly, because of the multi-round structure of this problem. In contrast with single-round problems
(* Contact author.)
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. BeyondMR'16, June 26-July 01, 2016, San Francisco, CA, USA. © 2016 ACM. ISBN /16/06...$15.00. DOI: http://dx.doi.org/10.1145/
such as the NFA intersections studied in [4], multi-round problems require coordination between rounds. Thirdly, although DFA minimization is fairly straightforward to express in Datalog, the results of the present paper show that designing DFA minimization algorithms in MR is far from trivial. Thus, we hope that DFA minimization can shed some light on the more general problem of implementing arbitrary Datalog programs in the MR framework. We note that DFA minimization has been studied for various parallel architectures (see e.g. [6,, 4, 5]). However, all of these architectures contain some kind of shared memory, while the Map-Reduce model corresponds to a shared-nothing architecture. To the best of our knowledge, ours is the first paper studying DFA minimization in MR. However, the somewhat similar problem of graph reductions based on bisimulation has recently been the object of MR implementations, see e.g. [8] and [2]. In the serial context, the two major algorithms for DFA minimization are the one of Moore [9] and the one of Hopcroft [5]. Of these, Hopcroft's algorithm is considered superior due to its running time, which is less than that of Moore's algorithm. In this paper we derive MR versions of Moore's and Hopcroft's algorithms. The resulting algorithms are called Moore-MR and Hopcroft-MR, respectively.
We analyze Moore-MR and Hopcroft-MR in terms of total communication cost, i.e. the size of all data communicated by mappers and reducers over the commodity cluster. These turn out to be O(k^2 n^2 log n) and O(kn^2 log n) bits, respectively. Here n is the number of states of the automaton to be minimized, and k is the size of its alphabet. We have implemented both algorithms in Apache Hadoop, and performed extensive experiments on synthetically generated DFAs with various graph topologies. These topologies allow us to compare the communication cost and running time of the algorithms with respect to worst and average inputs, as well as to see the effect of skewed input. Our experiments validate the analysis and show that Moore-MR and Hopcroft-MR are comparable in performance, but for larger alphabets Hopcroft-MR is more efficient. The experiments also show that the performance of both algorithms deteriorates when the input is skewed. The rest of this paper is organized as follows. Section 2 provides the necessary technical definitions, and Section 3 describes Moore's and Hopcroft's algorithms. In Section 4 we adapt the two algorithms for the MR framework, along with an analysis of the communication cost of the two adapted algorithms. Section 5 describes the experimental results. Conclusions are drawn in the last section.
2. PRELIMINARIES
In this section we introduce the basic technical preliminaries and definitions. A Finite Automaton (FA) is a 5-tuple A = (Q, Σ, δ, q_s, F), where Q is a finite set of states, Σ is a finite set of alphabet symbols, δ ⊆ Q × Σ × Q is the transition relation, q_s ∈ Q is the start state, and F ⊆ Q is a set of final states. By Σ* we denote the set of all finite strings over Σ. Let w = a_1 a_2 ... a_l, where a_i ∈ Σ, be a string in Σ*. An accepting computation path of w in A is a sequence (q_s, a_1, q_1), (q_1, a_2, q_2), ..., (q_{l-1}, a_l, q_f) of tuples of δ, where q_f ∈ F. The language accepted by A, denoted L(A), is the set of all strings in Σ* for which there exists an accepting computation path in A. An FA A is said to be deterministic if for all p ∈ Q and a ∈ Σ there is a unique q ∈ Q such that (p, a, q) ∈ δ. Otherwise, the FA is non-deterministic. Deterministic FAs are called DFAs, and non-deterministic ones are called NFAs. By the well-known subset construction, any NFA A can be turned into a DFA A^D such that L(A^D) = L(A). For a DFA A = (Q, Σ, δ, q_s, F), we also write δ in functional format, i.e. δ(p, a) = q iff (p, a, q) ∈ δ. For a state p ∈ Q and a string w = a_1 a_2 ... a_l ∈ Σ*, we denote by δ̂(p, w) the unique state δ(δ(··· δ(δ(p, a_1), a_2) ···, a_{l-1}), a_l). Then, for a DFA A, its language L(A) can be defined as {w : δ̂(q_s, w) ∈ F}. A DFA A is said to be minimal if all DFAs B such that L(A) = L(B) have at least as many states as A. For each regular language L, there is a unique (up to isomorphism of their graph representations) minimal DFA that accepts L. The DFA minimization problem has been studied since the 1950s. A taxonomy of DFA minimization algorithms was created by B. W. Watson in 1993 [6]. The taxonomy reveals that most of the algorithms are based on the notion of equivalent states (to be defined). The sole exception is Brzozowski's algorithm [7], which is based on reversal and determinization of automata. Let A = (Q, Σ, δ, q_s, F) be a DFA.
Then the reversal of A is the NFA A^R = (Q, Σ, δ^R, F, {q_s}), where δ^R = {(p, a, q) : (q, a, p) ∈ δ}. Brzozowski showed in 1962 that if A is a DFA, then (((A^R)^D)^R)^D is a minimal DFA for L(A). This rather surprising result does not, however, yield a practical minimization algorithm, since there is a potential double exponential blow-up in the number of states, due to the two determinization steps. The rest of the algorithms are based on equivalence of states. Let A = (Q, Σ, δ, q_s, F) be a DFA, and p, q ∈ Q. Then p is equivalent with q, denoted p ≡ q, if for all strings w ∈ Σ* it holds that δ̂(p, w) ∈ F iff δ̂(q, w) ∈ F. The relation ≡ is an equivalence relation, and by Q/≡ we denote the induced partition of the state-space Q. The quotient DFA A/≡ = (Q/≡, Σ, γ, q_s/≡, F/≡), where γ(p/≡, a) = δ(p, a)/≡, is then a minimal DFA such that L(A/≡) = L(A). An important observation is that ≡ can be computed iteratively as a sequence ≡_0, ≡_1, ≡_2, ..., ≡_m = ≡, where p ≡_0 q iff (p ∈ F ⟺ q ∈ F), and p ≡_{i+1} q iff p ≡_i q and δ(p, a) ≡_i δ(q, a) for all a ∈ Σ. For each DFA A the sequence converges in at most l steps, where l is the length of the longest simple path from the start state to a final state. This means that Q/≡ can be computed iteratively, each step refining Q/≡_i to Q/≡_{i+1}, and eventually converging to Q/≡. We reserve the letter n for the number of states in a DFA and the letter k for the size of its alphabet. We assume that Σ = {a_1, ..., a_k}, and use a and b as metalinguistic variables denoting arbitrary alphabet symbols.

3. THE CLASSICAL ALGORITHMS
3.1 Moore's Algorithm
The first iterative algorithm was proposed by E. F. Moore in 1956 [9]. The algorithm computes the partition Q/≡ by iteratively refining the initial partition π = {F, Q \ F}. After termination, π = Q/≡. In the algorithm, the partition π is encoded as a mapping that assigns to each state p ∈ Q a block-identifier π_p, which is a bit-string. The number of blocks in π, i.e. the cardinality of the range of π, is denoted |π|. The value of π_p at the i:th iteration of lines 8–11 is denoted π_p^i.
The symbol ⊙ on line 10 of the algorithm denotes concatenation (of strings).

Algorithm 1 Moore's DFA minimization algorithm
Input: DFA A = (Q, {a_1, ..., a_k}, δ, q_s, F)
Output: π = Q/≡
1: i ← 0
2: for all p ∈ Q do                          ▷ The initial partition
3:   if p ∈ F then π_p^i ← 0
4:   else π_p^i ← 1
5:   end if
6: end for
7: repeat
8:   i ← i + 1
9:   for all p ∈ Q do
10:    π_p^i ← π_p^{i−1} ⊙ π_{δ(p,a_1)}^{i−1} ⊙ π_{δ(p,a_2)}^{i−1} ⊙ ... ⊙ π_{δ(p,a_k)}^{i−1}
11:  end for
12: until |π^i| = |π^{i−1}|

The minimal automaton is now obtained by replacing Q by π, q_s by π_{q_s}, F by {π_f : f ∈ F}, and replacing each transition (p, a, q) ∈ δ by the transition (π_p, a, π_q), and then removing duplicate transitions.

3.2 Hopcroft's Algorithm
This algorithm was proposed by J. E. Hopcroft in 1971 [5]. We start with some definitions. Let A = (Q, Σ, δ, q_s, F) be a DFA, S, P ⊆ Q, S ∩ P = ∅, and a ∈ Σ. Then ⟨S, a⟩ is called a splitter, and P ↾ ⟨S, a⟩ = {P_1, P_2}, where
P_1 = {p ∈ P : δ(p, a) ∈ S}
P_2 = {p ∈ P : δ(p, a) ∉ S}.
If either of P_1 or P_2 is empty, we set P ↾ ⟨S, a⟩ = P. Otherwise ⟨S, a⟩ is said to split P. Furthermore, for S ⊆ Q and a ∈ Σ, we define aS = {s ∈ S : δ(p, a) = s for some p ∈ Q}. That is, aS consists of those states in S that have an incoming transition labeled a. As in Moore's algorithm, we encode the current partition with a function π that maps states to integers. We also define B_j = {p ∈ Q : π_p = j}.
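The splitter operation just defined can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper; the transition function is assumed to be a plain dict keyed by (state, symbol):

```python
def apply_splitter(P, S, a, delta):
    """Compute P restricted by the splitter (S, a): P1 holds the states
    of P whose a-transition enters S, P2 holds the rest."""
    P1 = {p for p in P if delta[(p, a)] in S}
    P2 = P - P1
    if not P1 or not P2:
        return [P]            # <S, a> does not split P
    return [P1, P2]

# Tiny example: three states over the one-letter alphabet {'a'}
delta = {(0, 'a'): 1, (1, 'a'): 1, (2, 'a'): 0}
print(apply_splitter({0, 1, 2}, {1}, 'a', delta))  # -> [{0, 1}, {2}]
```

Here the splitter ⟨{1}, a⟩ splits the block {0, 1, 2}, since states 0 and 1 move into S on a while state 2 does not.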
Algorithm 2 Hopcroft's DFA minimization algorithm
Input: DFA A = (Q, Σ, δ, q_s, F)
Output: π = Q/≡
1: Execute lines 1–6 of Algorithm 1          ▷ Compute π^0
2: Q ← ∅                                     ▷ Queue set
3: for all a ∈ Σ do
4:   if |aB_0| < |aB_1| then add ⟨B_0, a⟩ to Q
5:   else add ⟨B_1, a⟩ to Q.
6:   end if
7: end for
8: i ← 0
9: while Q ≠ ∅ do
10:  i ← i + 1; size ← |π^{i−1}|
11:  Pick and delete a splitter ⟨S, a⟩ from Q.
12:  for each B_j ∈ π^{i−1} which is split by ⟨S, a⟩ do
13:    size ← size + 1
14:    for all p ∈ B_j where δ(p, a) ∈ S do
15:      π_p^i ← size                        ▷ New block-number
16:    end for
17:    for all b ∈ Σ do                      ▷ Update Q
18:      if ⟨B_j, b⟩ ∈ Q then
19:        Add ⟨B_{π_p^i}, b⟩ to Q
20:      else
21:        if |bB_j| < |bB_{π_p^i}| then
22:          add ⟨B_j, b⟩ to Q
23:        else add ⟨B_{π_p^i}, b⟩ to Q.
24:        end if
25:      end if
26:    end for
27:  end for
28: end while
29: π ← π^i

At line 10, the size of the current partition (the number of equivalence classes) is stored in the variable size. At lines 12–16 we examine all blocks B_j ∈ π^{i−1} that are split by ⟨S, a⟩, and all states p ∈ B_j where δ(p, a) ∈ S are given the new block identifier size for π^i at line 15. At the end, the minimal automaton A/≡ can be obtained as in Moore's algorithm.

4. THE ALGORITHMS IN MAP-REDUCE
We assume familiarity with the Map-Reduce model (see e.g. [7]). Here we will only recall that the initial data is stored on the DFS, and each mapper is responsible for a chunk of the input. In a round of Map and Reduce, the mappers emit key-value pairs ⟨K, V⟩. Each reducer K receives and aggregates key-value lists of the form ⟨K, [V_1, ..., V_l]⟩, where the ⟨K, V_i⟩ pairs were emitted by the mappers.

4.1 Moore's Algorithm in Map-Reduce
Our Map-Reduce version of Moore's algorithm, called Moore-MR, consists of a pre-processing stage, and one or more rounds of map and reduce functions.
Pre-processing: Let A = (Q, {a_1, ..., a_k}, δ, q_s, F). We first build a set of annotated transitions from δ.
The set will consist of annotated transitions of the form (p, a, q, π, D), where π is a bit-string representing the initial block where the state belongs, D = + indicates that the tuple represents a transition (an outgoing edge), and D = − indicates that the tuple is a dummy transition carrying in its fourth position the information of the initial block of the aforementioned state q. More specifically, for each (p, a, q) ∈ δ we insert into the set
(p, a, q, 0, +) and (p, a, q, 0, −) when p, q ∈ F
(p, a, q, 0, +) and (p, a, q, 1, −) when p ∈ F, q ∉ F
(p, a, q, 1, +) and (p, a, q, 0, −) when p ∉ F, q ∈ F
(p, a, q, 1, +) and (p, a, q, 1, −) when p, q ∉ F.
Recall that Moore's algorithm is an iterative refinement of the initial partition {F, Q \ F}. In the MR version each reducer will be responsible for one or more states p ∈ Q. Since there is no global data structure, we need to annotate each state p with the identifier π_p of the block it currently belongs to. This annotation is kept in all transitions (p, a_i, q_i), i = 1, ..., k. Hence tuples (p, a_i, q_i, π_p, +) are in the set. In order to generate new block identifiers, the reducer also needs to know the current block of all the above states q_i, which is why tuples (p, a_i, q_i, π_{q_i}, −) are in the set. Furthermore, the new block of state p will be needed in the next round when updating the block annotation of states r_1, ..., r_m, where (r_1, a_{i_1}, p), ..., (r_m, a_{i_m}, p) are all the transitions leading to p. Thus tuples (r_j, a_{i_j}, p, π_p, −) are in the set.
Map function: Let ν be the number of reducers available, and h : Q → {0, 1, ..., ν − 1} a hashing function. At round i each mapper gets a chunk of the set as input, and for each tuple (p, a, q, π_p^{i−1}, +) it emits the key-value pair ⟨h(p), (p, a, q, π_p^{i−1}, +)⟩, and for each tuple (p, a, q, π_q^{i−1}, −) it emits ⟨h(p), (p, a, q, π_q^{i−1}, −)⟩ and ⟨h(q), (p, a, q, π_q^{i−1}, −)⟩.
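The map function just described admits a very small sketch. This is illustrative Python, not the paper's Hadoop code; the tuples are plain Python tuples and, for determinism, h is assumed to be p mod ν:

```python
def moore_mr_map(chunk, nu):
    """One Moore-MR mapper. Input tuples are (p, a, q, pi, d), where
    d = '+' marks a real transition annotated with the block of p, and
    d = '-' marks a dummy transition annotated with the block of q."""
    h = lambda state: state % nu       # assumed hash h : Q -> {0, ..., nu-1}
    for p, a, q, pi, d in chunk:
        if d == '+':
            yield h(p), (p, a, q, pi, d)   # outgoing edge: only to reducer h(p)
        else:
            yield h(p), (p, a, q, pi, d)   # dummy: to h(p) ...
            yield h(q), (p, a, q, pi, d)   # ... and to h(q)
```

Note the replication: a '+' tuple is emitted once, while a '−' tuple is emitted twice, giving the replication rate of 3 per transition that is used in the cost analysis of Section 4.3.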
Reduce function: Each reducer ρ ∈ {0, 1, ..., ν − 1} receives, for all p ∈ Q where h(p) = ρ, all outgoing transitions (p, a_1, q_1, π_p^{i−1}, +), ..., (p, a_k, q_k, π_p^{i−1}, +), as well as the dummy transitions (p, a_1, q_1, π_{q_1}^{i−1}, −), ..., (p, a_k, q_k, π_{q_k}^{i−1}, −), and (r_1, a_{i_1}, p, π_p^{i−1}, −), ..., (r_m, a_{i_m}, p, π_p^{i−1}, −). The reducer can now compute π_p^i ← π_p^{i−1} ⊙ π_{q_1}^{i−1} ⊙ π_{q_2}^{i−1} ⊙ ... ⊙ π_{q_k}^{i−1}, corresponding to line 10 in Algorithm 1, and write the new value π_p^i in the tuples (p, a_j, q_j, π_p^i, +), for j ∈ {1, ..., k}, and (r_j, a_{i_j}, p, π_p^i, −), for j ∈ {1, ..., m}, which it then outputs. The reducer also outputs a change tuple (p, true) if the new value of π_p^i means that the number of blocks in π^i has increased. In order to see how reducer h(p) can determine this internally, consider for example the DFA A = ({p, q, r}, {a}, δ, p, {r}), where δ(p, a) = q, δ(q, a) = r, and δ(r, a) = r.

Round i | π_p^i | π_q^i | π_r^i
0       | 1     | 1     | 0
1       | 11    | 10    | 00
2       | 1110  | 1000  | 0000
3       | 11101000 | 10000000 | 00000000

After round 1 a new block was created for state q, and after round 2 a new block was created for state p. This can be seen from the fact that after round 1, π_p consists of two equal parts 1 and 1, whereas π_q consists of two distinct parts 1 and 0. Similarly, after round 2, π_q still consists of two distinct parts, whereas π_p has changed from two similar
parts to two distinct parts. We conclude that a new block has been created for p but not for q. After round 3, all block-identifiers consist of two distinct parts, as they did after round 2. No new blocks have been created, and the algorithm can terminate. The above can be generalized as follows: if |Σ| = k, then a block-identifier π_p^i is a bit-string that consists of k + 1 bit-string components from the previous round. If the number of distinct components has increased, it means that a new block-identifier has been created. The salient point is that the increase can be detected inside reducer h(p), which then can emit a change tuple (p, true).
Termination detection: At the end of each MR-round, if any change tuples (p, true) have been emitted, it means that another round of MR is needed.

Proposition 1. At round i of Moore-MR, π_p^i = π_q^i if and only if p ≡_i q.

Proof. We have π_p^0 = π_q^0 iff p ≡_0 q by construction. Assume that π_p^{i−1} = π_q^{i−1} iff p ≡_{i−1} q, and suppose that π_p^i = π_q^i. In the i:th Map-Reduce round, π_p^i = π_p^{i−1} ⊙ π_{δ(p,a_1)}^{i−1} ⊙ ... ⊙ π_{δ(p,a_k)}^{i−1}, and π_q^i = π_q^{i−1} ⊙ π_{δ(q,a_1)}^{i−1} ⊙ ... ⊙ π_{δ(q,a_k)}^{i−1}. By the inductive hypothesis, all the above (k + 1) components are equal iff p ≡_{i−1} q and δ(p, a_j) ≡_{i−1} δ(q, a_j) for all j ∈ {1, ..., k}, iff p ≡_i q.

For proving termination of the algorithm, we denote by sub(π_p^i) the set of substrings obtained by dividing π_p^i into (k + 1) contiguous substrings of equal length.

Proposition 2. sub(π_p^{i_1}) = sub(π_p^{i_2}) for all states p if and only if ≡_{i_1} = ≡_{i_2}.

Proof. Suppose sub(π_p^{i_1}) ≠ sub(π_p^{i_2}) for some state p. We have π_p^{i_1} = π_p^{i_1−1} ⊙ π_{δ(p,a_1)}^{i_1−1} ⊙ ... ⊙ π_{δ(p,a_k)}^{i_1−1}, and π_p^{i_2} = π_p^{i_2−1} ⊙ π_{δ(p,a_1)}^{i_2−1} ⊙ ... ⊙ π_{δ(p,a_k)}^{i_2−1}. From this it is easy to see that if, say, π_{δ(p,a_j)}^{i_2−1} ≠ π_{δ(p,a_m)}^{i_2−1}, then necessarily π_{δ(p,a_j)}^{i_1−1} ≠ π_{δ(p,a_m)}^{i_1−1}. Thus sub(π_p^{i_1}) ≠ sub(π_p^{i_2}) entails that |sub(π_p^{i_1})| > |sub(π_p^{i_2})|. Consequently there are j, m ∈ {1, ..., k} such that π_{δ(p,a_j)}^{i_2−1} = π_{δ(p,a_m)}^{i_2−1} and π_{δ(p,a_j)}^{i_1−1} ≠ π_{δ(p,a_m)}^{i_1−1}. (Note: the components compared could also be π_p^{i_2−1} and π_p^{i_1−1}.) Proposition 1 now entails that δ(p, a_j) ≡_{i_2−1} δ(p, a_m) and δ(p, a_j) ≢_{i_1−1} δ(p, a_m). Consequently, ≡_{i_1} ≠ ≡_{i_2}.
Suppose then that sub(π_p^{i_1}) = sub(π_p^{i_2}) for all states p. This means that for all states p and all j, m ∈ {1, ..., k}, π_{δ(p,a_j)}^{i_1} = π_{δ(p,a_m)}^{i_1} if and only if π_{δ(p,a_j)}^{i_2} = π_{δ(p,a_m)}^{i_2}. Then, by Proposition 1, δ(p, a_j) ≡_{i_1} δ(p, a_m) ⟺ δ(p, a_j) ≡_{i_2} δ(p, a_m). Thus q/≡_{i_1} = q/≡_{i_2} for every state q that is of the form δ(p, a_j) for some p and a_j. Since we assume that the DFA has no inaccessible states, this is true for all states except possibly the initial state q_s. For the initial state q_s, if (∗) q_s/≡_{i_1} ≠ q_s/≡_{i_2}, there would have to be an a_j ∈ Σ such that |sub(π_{δ(q_s,a_j)}^{i_1})| > |sub(π_{δ(q_s,a_j)}^{i_2})|, implying, as in the first part of the proof, that δ(q_s, a_j)/≡_{i_1} ≠ δ(q_s, a_j)/≡_{i_2}, contradicting what was established above for states of the form δ(p, a_j). Thus necessarily ≡_{i_1} = ≡_{i_2}.

Dealing with large block-labels: It is easy to see that the bit-strings π_p, encoding the block-identifiers, grow by a factor of k + 1 at each round, meaning that the length of each π_p is O((k + 1)^i log n) bits at round i. In order to avoid such excessive growth we apply the technique of Perfect Hashing Functions (PHF). A PHF is a total injective function which takes a subset of size s of a set S and maps it to a set T, where |T| = s and s ≤ |S|. However, a PHF requires some shared data structures, which is not applicable in the Map-Reduce environment. In order to implement a PHF in MR, in each reducer ρ a new block number is assigned from the range {ρ·n, ..., ρ·n + n − 1}. Since there can be at most n reducers, the resulting block numbers will be in the range {0, ..., n^2 − 1}, and our hashing scheme is thus guaranteed to be injective. We call the scheme Parallel PHF (PPHF). Technically, applying the PPHF is a separate MR job, following the main map and reduce functions in each round.

4.2 Hopcroft's Algorithm in Map-Reduce
This section describes Hopcroft-MR, a Map-Reduce version of Hopcroft's algorithm. It consists of a pre-processing stage, and rounds of a sequence of two map-reduce jobs and a wrap-up job.
Pre-processing. We first produce the set {(p, a, q, π_p, +) : (p, a, q) ∈ δ}, where π_p = 0 if p ∈ F, and π_p = 1 otherwise.
In this stage we also build a set Γ, containing information about aB, for all a ∈ Σ and blocks B in the current partition π. More specifically,
Γ = ⋃_{B_j ∈ π} {(p, a, q, π_q, −) : (p, a, q) ∈ δ, q ∈ B_j}.
Again, the initial value of π_q is 0 or 1 as in Moore-MR. Both sets are stored in the DFS. After the pre-processing we execute rounds of two map-reduce jobs and a wrap-up. The main idea is that, in each round, for each splitter ⟨S, a⟩ ∈ Q and each q where q ∈ S, the reducer h(q) in the first MR-job calculates a new block-id for all states p where (p, a, q, π_p, +). It will be the responsibility of the reducer h(p) in the second MR-job to update the block-id in all relevant tuples.
First map function. At round i, each mapper is assigned a chunk of the transition set and a chunk of Γ^{i−1}, partitioned according to Q. Note that before the second and subsequent rounds, the set Γ^{i−1} has been updated by the second job of the previous round, according to the partition π^{i−1}. We have
Γ^{i−1} = ⋃_{B_j ∈ π^{i−1}} {(p, a, q, π_q^{i−1}, −) : (p, a, q) ∈ δ, q ∈ B_j}.
For each tuple (p, a, q, π_p^{i−1}, +) in its chunk of the transition set, the mapper emits the key-value pair ⟨h(q), (p, a, q, π_p^{i−1}, +)⟩, and for each tuple (p, a, q, π_q^{i−1}, −) in its chunk of Γ^{i−1}, the mapper emits
⟨h(q), (p, a, q, π_q^{i−1}, −)⟩. After mapping all splitters, the Master removes all elements of Q.
First reduce function. Reducer h(q) then receives, for each ⟨S, a⟩ in Q where q ∈ S, the tuples (p_1, a, q, π_{p_1}^{i−1}, +), ..., (p_m, a, q, π_{p_m}^{i−1}, +), m ≤ n, of incoming a-transitions to q, and a subset of the tuples (p_1, a, q, π_q^{i−1}, −), ..., (p_m, a, q, π_q^{i−1}, −), namely those that were in the current Γ^{i−1}. If state q had no incoming a-transitions, or if q doesn't participate in any of the current splitters, the reducer does nothing. Otherwise, some of the tuples (p_j, a, q, π_q^{i−1}, −), where j ∈ {1, ..., m}, were received, and for each such tuple the state p_j needs to move to a new block. The identifier of this new block will be computed from the values of π_{p_j}^{i−1}, π_q^{i−1}, and β_{p_j}^a, where β_{p_j}^a is a bit-vector of length k = |Σ| such that β_{p_j}^a(i) = 1 if a = a_i, and β_{p_j}^a(i) = 0 otherwise. The reducer will output update tuples (p_j, π_{p_j}^{i−1}, β_{p_j}^a, π_q^{i−1}). Note that reducer h(q) will possibly also receive transition tuples (p, a, q', π_p^{i−1}, +), in case h(q') = h(q). These states will be updated only if (p, a, q', π_{q'}^{i−1}, −) also was received.
Second map function. Each mapper is assigned its chunks of the transition set as before, a chunk of Γ^{i−1}, and a chunk of the update tuples produced by the first map-reduce job. The mapper emits key-value pairs ⟨h(p), (p, a, q, π_p^{i−1}, +)⟩ from the transition set, and ⟨h(p), (p, π_p^{i−1}, β_p^a, π_{δ(p,a)}^{i−1})⟩ from its assigned update tuples.
Second reduce function. Each reducer h(p) will receive the tuples (p, a_1, δ(p, a_1), π_p^{i−1}, +), ..., (p, a_k, δ(p, a_k), π_p^{i−1}, +) from the transition set, and possibly update tuples (p, π_p^{i−1}, β_p^{a_{i_1}}, π_{δ(p,a_{i_1})}^{i−1}), ..., (p, π_p^{i−1}, β_p^{a_{i_m}}, π_{δ(p,a_{i_m})}^{i−1}), where m ≤ k. If the reducer received the update tuples, it first computes β_p = ⋁_{j=1}^{m} β_p^{a_{i_j}}, and will then write the value π_p^i = π_p^{i−1} ⊙ β_p ⊙ π_{δ(p,a_{i_1})}^{i−1} ⊙ ... ⊙ π_{δ(p,a_{i_m})}^{i−1} in the above tuples from the transition set and from Γ^{i−1}. Finally, the reducer outputs all its updated (or not) transition tuples, and updated (or not) Γ^{i−1} tuples as Γ^i.

Proposition 3. At the end of the i:th round of Hopcroft-MR, π_p^i = π_q^i if and only if p ≡_i q.

Proof.
We have π_p^0 = π_q^0 iff p ≡_0 q by definition. Assume that π_p^{i−1} = π_q^{i−1} iff p ≡_{i−1} q, and suppose that π_p^i = π_q^i. Then necessarily π_p^{i−1} = π_q^{i−1}, β_p = β_q, and π_{δ(p,a_{i_j})}^{i−1} = π_{δ(q,a_{i_j})}^{i−1} for all i_j ∈ {i_1, ..., i_m}. By the inductive hypothesis, p ≡_{i−1} q and δ(p, a_{i_j}) ≡_{i−1} δ(q, a_{i_j}) for all i_j ∈ {i_1, ..., i_m}. The remaining possibility is that δ(p, a_j) ≢_{i−1} δ(q, a_j) for some j such that β_p(j) = β_q(j) = 0. Then, for all splitters ⟨S, a_j⟩ in Q at round i, there are no states r ∈ S such that δ(p, a_j) = r or δ(q, a_j) = r. On the other hand, since p ≡_{i−1} q it must be that δ(p, a_j) ≡_{i−2} δ(q, a_j). Since we had δ(p, a_j) ≢_{i−1} δ(q, a_j), it follows that either δ(p, a_j) or δ(q, a_j) was moved to a new block at round i−1, which means that a sub-block of π_{δ(p,a_j)}^{i−1} or π_{δ(q,a_j)}^{i−1} was put in Q for round i, contradicting our assumption that β_p(j) = β_q(j) = 0.
Suppose then that p ≡_i q. Then p ≡_{i−1} q, and by the inductive hypothesis π_p^{i−1} = π_q^{i−1}. Also, for all j ∈ {1, ..., k}, δ(p, a_j) ≡_{i−1} δ(q, a_j), and by the inductive hypothesis π_{δ(p,a_j)}^{i−1} = π_{δ(q,a_j)}^{i−1}. Denote π_{δ(p,a_j)}^{i−1} by S. We have two cases to consider. In the first case, ⟨S, a_j⟩ is in Q at round i. Then β_p(j) = β_q(j) = 1, and reducer h(p) will in the second job concatenate the π_{δ(p,a_j)}^{i−1} generated in the first job by reducer h(δ(p, a_j)) to π_p^i, and reducer h(q) will concatenate π_{δ(q,a_j)}^{i−1} to π_q^i. Since π_{δ(p,a_j)}^{i−1} = π_{δ(q,a_j)}^{i−1}, it follows that π_p^i = π_q^i. In the second case, ⟨S, a_j⟩ is not in Q at round i. Then β_p(j) = β_q(j) = 0, meaning that π_{δ(p,a_j)}^{i−1} and π_{δ(q,a_j)}^{i−1} were not part of π_p^i and π_q^i, and that therefore π_p^i = π_q^i.

Wrap-up. The Master now executes lines 17–26 of Algorithm 2. If there is any split, the Master will put new splitters into Q and initiate another round of the two map-reduce jobs.
Dealing with large block-labels: In Hopcroft-MR the block-identifiers π_p consist of O((k + 2)^i) bits after round i. In order to avoid long bit-strings, between rounds we use a PHF with range {0, ..., n^2 − 1}, as in Moore-MR.
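The PPHF scheme used by both algorithms to shorten block-labels between rounds can be sketched as follows. This is an illustrative sketch; the class name and interface are ours, not the paper's:

```python
class ParallelPHF:
    """Per-reducer shortening of long block labels. Reducer rho assigns
    new block numbers only from its private range {rho*n, ..., rho*n + n - 1},
    so numbers chosen by distinct reducers can never collide, and the
    overall mapping into {0, ..., n^2 - 1} is injective."""

    def __init__(self, rho, n):
        self.base = rho * n
        self.n = n
        self.table = {}        # long bit-string label -> short block number

    def shorten(self, label):
        if label not in self.table:
            assert len(self.table) < self.n   # at most n blocks per reducer
            self.table[label] = self.base + len(self.table)
        return self.table[label]
```

For example, with n = 4, reducer 0 maps its first two distinct labels to 0 and 1, while reducer 1 maps its first label to 4; no shared data structure is needed to keep the assignments disjoint.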
4.3 Communication Cost of the Algorithms
It is known that, in the worst case, the refinement sequence requires m = n − 2 steps, where n is the number of states in the DFA [9]. This happens in linear DFAs, such as A = ({p, q, r}, {a}, δ, p, {r}) from Section 4.1. This means that Algorithm 1 needs n rounds in the worst case. It follows from Lemma 9 in [2] that Hopcroft-MR requires the same number of rounds as Moore-MR. We need to account for the amount of data communicated in each round. In Moore-MR each transition (p, a, q) ∈ δ gives rise to the tuples (p, a, q, π_p, +) and (p, a, q, π_q, −) in the pre-processed set. The first tuple is mapped to h(p) and the second to h(p) and h(q). This gives a replication rate of 3. A state can be encoded using log n bits and an alphabet symbol using log k bits. Since block-labels are in the range {0, ..., n^2 − 1} using the PPHF, these require 2 log n bits. We note that since the PPHF actually is a Map-Reduce job, the mappers in this job emit updated block numbers which require (k + 1) log n bits, as a result of the concatenation operations in the main Map-Reduce job. Thus a tuple (p, a, q, π_p/π_q, +/−) requires log n + log k + log n + (k + 1) log n + 1 = Θ(k log n) bits. Since there are kn transitions, a total of Θ(k^2 n log n) bits need to be communicated in each round. The reducers in Moore-MR output the two types of updated tuples above, and possibly an update tuple (p, true). It follows from [5] that the total number of update tuples ever emitted is bounded by O(n log n), which however is absorbed in one round by O(k^2 n log n). We therefore have

Proposition 4. The communication cost of Moore-MR is O(k^2 n^2 log n).

The analysis for Hopcroft-MR is similar, except that there are two MR jobs in each round, and only two tuples are
mapped for every transition in the input DFA. As in Moore-MR, there can be at most n log n update tuples in total. All of this yields O((log n + log k)·kn) bits of communication per round. We can assume k < n, which gives us O(kn log n) bits per round.

Proposition 5. The communication cost of Hopcroft-MR is O(kn^2 log n).

The effect of skewed input. One of the problems with the Map-Reduce framework is that the reducers might not be evenly utilized. This happens when the input is skewed in some way. We note that in Moore-MR, each reducer h(p) receives k tuples representing outgoing transitions, and k dummy tuples. However, reducer h(p) also receives dummy tuples representing transitions leading into p. There is at least one, and at most n, incoming transitions for each state. Thus we can expect that those reducers h(p), where p has a large number of incoming transitions, have to deal with larger inputs than those reducers h(p'), where p' has only a few incoming transitions. Thus the reducers h(p') might have to wait for the h(p) reducers. The same phenomenon occurs in the first Reduce job in Hopcroft-MR, where each reducer h(q) receives tuples corresponding to transitions leading into state q. In the second Reduce job, on the other hand, all reducers receive tuples corresponding to outgoing transitions. However, Hopcroft-MR is inherently sensitive to skewness, as the splitting is determined by a state q for all blocks B that contain a state p such that (p, a, q) ∈ δ. On the other hand, Moore-MR determines a new block for a state based on all its outgoing transitions, and each state has exactly k outgoing transitions. The possible skewness in the number of incoming transitions is only manifested in the way we update the dummy tuples representing transitions incident upon p. We are currently exploring ways of circumventing this skewness sensitivity in Moore-MR.

5. EXPERIMENTAL RESULTS
In this section we examine the behavior of Moore-MR and Hopcroft-MR by implementing them on Apache Hadoop on a cluster of two machines, and then running the algorithms on various types of DFAs.
The number of reducers in the experiments was set to … .

5.1 Data Generation
We first briefly describe the sample data we ran the algorithms on. In a DFA (Q, Σ, δ, q_s, F) where |Q| = n and |Σ| = k, we assume that Q = {1, ..., n}, q_s is state 1, and Σ = {1, ..., k}. We will generate four types of DFAs:

Slow DFAs. This family of automata was described in [5]. Slow DFAs behave badly for Hopcroft's and Moore's (serial) algorithms. Here each newly generated block has a constant number of states, and at each iteration only one block can be split. The prototypical slow DFA is a linear one, where F = {n} and δ(i, j) = i + 1 for all i ∈ Q \ {n} and j ∈ Σ. Additionally, δ(n, j) = n for all j ∈ Σ. Linear DFAs are already minimal.

Circular DFAs. These DFAs were also utilized in the original paper of Hopcroft. For all states i and alphabet symbols j, we compute δ(i, j) as follows:
δ(i, j) = (i + j) mod n       if j ≤ k/2
δ(i, j) = (i + k − j) mod n   otherwise.
We also have F = {i ∈ Q : i mod k = 0}. This type of DFA does not exhibit any skew. If we assume that n is a multiple of k, we will have i ≡ j iff i mod k = j mod k, so the minimal version of the DFA is a similar circular DFA, only with k states.

Replicated-Random DFAs. We first generate a random DFA and then make k copies of it. We add a global start state, from which symbol i takes us to (what was) the start state of the i-th copy DFA. The minimized version of the Replicated-Random DFA is equal to the minimized version of the original random DFA plus the new start state, and its transitions to the old start state on all symbols.

Star DFAs. The last type of DFA has a star shape, where F = {n}. For all states i and alphabet symbols j, we compute δ(i, j) as follows:
δ(i, j) = (i mod (n − 1)) + 1   if i ≠ n and j = 1
δ(i, j) = n                     if i ≠ n and j ≠ 1
δ(i, j) = n                     if i = n.
This type of DFA exhibits drastic skew, and will be minimized to ({1, n}, {1, ..., k}, γ, 1, {n}), where γ(1, j) = n for all j ∈ {2, ..., k}, γ(1, 1) = 1, and γ(n, j) = n for all j ∈ {1, ..., k}.

5.2 Experiments

Slow DFAs.
We compared Slow DFAs, each with 4 to 512 states and alphabet size k = 2. As we know, Moore-MR updates all transitions in each round. According to Table 1, on this particular dataset the minimization takes exactly n rounds for Moore-MR. In each round all n states are involved. Thus, this requires n^2 · O(k log n) = O(n^2 k log n) bits. Since k = 2, we can consider it a constant, so the bound becomes O(n^2 log n). On the other hand, in Hopcroft-MR the total number of updated states is independent of the number of rounds and is bounded by O(n log n). Figure 1 clearly shows the above polynomial relation between these two algorithms. However, Figure 2 shows that Hopcroft-MR takes more time than Moore-MR. This is due to the fact that Hopcroft-MR has one more job in each round. The relation here is linear and exhibits the combined impact of the number of map-reduce jobs and rounds on the execution time. (Since the initial state is labeled 1, we map δ(i, j) = 0 to n.)
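The linear slow DFA just discussed is easy to reproduce. The sketch below (illustrative Python, not the paper's Hadoop code) builds one and counts serial Moore refinement rounds, which grow linearly with n, in line with the behavior reported in Table 1:

```python
def linear_dfa(n, k=2):
    """Linear 'slow' DFA on states 1..n: delta(i, j) = i + 1 for i < n,
    delta(n, j) = n, and F = {n}. Already minimal."""
    delta = {(i, j): (i + 1 if i < n else n)
             for i in range(1, n + 1) for j in range(1, k + 1)}
    return delta, {n}

def moore_rounds(n, k, delta, F):
    """Serial Moore refinement; returns (number of blocks, rounds used).
    Block labels are re-numbered each round instead of concatenated."""
    pi = {i: 0 if i in F else 1 for i in range(1, n + 1)}
    rounds = 0
    while True:
        # signature of a state: its block plus the blocks of its successors
        sig = {i: (pi[i],) + tuple(pi[delta[(i, j)]] for j in range(1, k + 1))
               for i in pi}
        relabel = {s: b for b, s in enumerate(sorted(set(sig.values())))}
        new_pi = {i: relabel[sig[i]] for i in pi}
        rounds += 1
        if len(set(new_pi.values())) == len(set(pi.values())):
            return len(set(new_pi.values())), rounds
        pi = new_pi
```

On the linear DFA each round creates exactly one new block, so the block count climbs from 2 to n one step at a time, which is what makes this family slow for both serial algorithms.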
Figure 1: Communication cost on slow DFAs for alphabet size k = 2
Figure 2: Execution time on slow DFAs for alphabet size k = 2
Figure 3: Communication cost on circular DFAs for alphabet size k = 4
Figure 4: Execution time on circular DFAs for alphabet size k = 4

Circular DFAs. The main feature of this class of automata is that each state has k incoming transitions. This creates a uniform distribution of transitions both in mapping schemas based on the source state (the key-value pairs ⟨h(p), (p, a, q, π_p, +)⟩ in Moore-MR, and the second map function in Hopcroft-MR) and in mapping schemas based on the target state (⟨h(q), (p, a, q, π_q, −)⟩ in Moore-MR, and the first map function in Hopcroft-MR). As can be seen from Figure 3, both Moore-MR and Hopcroft-MR have the same communication cost on small alphabets. On the other hand, similarly to the Slow DFAs, when the size of the alphabet is fixed and relatively small, we see from Figure 4 that Hopcroft-MR requires more execution time. This is mainly because of the overhead around I/O operations and job execution, as was the case for the Slow DFAs. We note that when the number of states is increased, both algorithms perform similarly. In order to see the effect of the alphabet size, we ran experiments using increasing alphabet sizes and a fixed number (n = 1024) of states. Figures 5 and 6 show that the communication cost and running time of Moore-MR are larger than those of Hopcroft-MR by a factor of k, as stated earlier in Section 4.3.

Figure 5: Communication cost on circular DFAs for n = 1024
Figure 6: Execution time on circular DFAs for n = 1024
Replicated-Random DFAs. Here the size of the alphabet is fixed and equal to 4. The number of states in the Replicated-Random DFAs varies from … to … . Each DFA is replicated four times and the copies are connected using an auxiliary start state. Results are plotted in Figures 7 and 8. These figures show that Replicated-Random DFAs behave similarly to Circular DFAs. We note that the small standard deviation of the distribution of tuples among reducers in the Replicated-Random DFAs (see Table 2) does not affect the performance significantly. The irregular value in Figure 8 for the DFA with … states is, as shown in Table 1, due to the fact that the set of final states for this DFA is empty, and it converges at the beginning of the process.

Figure 7: Communication cost on replicated-random DFAs for alphabet size k = 4
Figure 8: Execution time on replicated-random DFAs for alphabet size k = 4
Figure 9: Communication cost on star DFAs for alphabet size k = 4
Figure 10: Execution time on star DFAs for alphabet size k = 4
Figure 11: Communication cost on star DFAs for n = 1024
Figure 12: Execution time on star DFAs for n = 1024

Star DFAs. The last dataset is designed to highlight the sensitivity of the algorithms to data skewness. The main property of this dataset is that (k − 1)·n out of k·n transitions will be mapped to one reducer. Interestingly, both algorithms demonstrate the same performance, which shows that both have the same sensitivity to skewness, as this is a clear consequence of their mapping schemas. Results are plotted in Figures 9 and 10. In conclusion, these two algorithms behave equally when k is fixed and small. On the other hand, when the number of states is fixed and the alphabet size increases, there is a noticeable difference between Moore-MR and Hopcroft-MR. Figure 11 shows the difference in communication cost between these algorithms when k increases, while Figure 12 shows this difference in execution time.
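The skew in the star family can be seen concretely by generating one and counting, per reducer, the tuples routed by target state. This is an illustrative sketch; the reducer count and the hash h(q) = q mod ν are our assumptions, not values from the paper:

```python
from collections import Counter

def star_dfa(n, k):
    """Star DFA on states 1..n with F = {n}: symbol 1 cycles through the
    non-final states, every other symbol (and every symbol from n) goes
    to the final state n."""
    delta = {}
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            if i != n and j == 1:
                delta[(i, j)] = (i % (n - 1)) + 1
            else:
                delta[(i, j)] = n
    return delta, {n}

def transitions_per_reducer(delta, nu):
    """Count tuples routed by target state q with h(q) = q % nu,
    mimicking the target-based mapping schemas of both algorithms."""
    return Counter(q % nu for q in delta.values())

delta, F = star_dfa(100, 4)
load = transitions_per_reducer(delta, 10)
# state n alone receives (k-1)*n + 1 = 301 of the k*n = 400 tuples,
# so the reducer responsible for it dominates all the others
```

With n = 100, k = 4 and 10 reducers, the reducer owning state n handles 310 tuples while every other reducer handles 10, which is exactly the imbalance the Star experiments are designed to expose.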
5.3 Summary

The number of rounds required by each algorithm is categorized by dataset and listed in Table 1. For the Slow DFA's, Moore-MR and Hopcroft-MR both required exactly n rounds, as is expected. On the other hand, for the Circular DFA's, both algorithms accomplished the task in a fixed number of rounds, independent of the size of the alphabet. The same results are observed for the Star DFA's. Lastly, running the algorithms on the Replicated-Random DFA's shows that one of the algorithms finished the task in fewer rounds than the other (see Table 1).

Table 1: Number of rounds for minimizing DFA's using Moore-MR and Hopcroft-MR with different datasets. [Columns: Dataset, k, n, Moore-MR, Hopcroft-MR; for the Slow DFA's with k = 2, both algorithms take n rounds. Footnote: one randomly generated DFA does not have any final state and minimizes into a single-state machine.]

Figures 13 to 16 represent the data distribution and execution time among all reducers in the first round of execution only. As is expected, for the Circular and Replicated-Random DFA's the data is distributed and executed uniformly. However, for the Star DFA's these results show that both Moore-MR and Hopcroft-MR are sensitive to skewness. The data distribution shows that one reducer receives most of the input, while the rest of the input is evenly shared among all other reducers.

Figure 13: Maximum number of transitions in reducer groups for Moore's algorithm
Figure 14: Maximum execution time in reducer groups for Moore's algorithm
Figure 15: Maximum number of transitions in reducer groups for Hopcroft's algorithm
Figure 16: Maximum execution time in reducer groups for Hopcroft's algorithm
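The n rounds on Slow DFA's reflect the round structure of the underlying serial algorithm: each round of Moore refinement regroups the states by their current block and the blocks of their successors, and the MR version spends a round of map-reduce per refinement step. A serial sketch (not the paper's code) that counts these rounds, run here on a single-letter chain DFA where each round splits off only one state:

```python
# Serial Moore partition refinement -- an illustrative sketch, not the
# paper's implementation. Each round relabels every state by the signature
# (its block, blocks of its successors); iteration stops when the number
# of blocks no longer grows.

def moore_rounds(states, alphabet, delta, finals):
    """Return (rounds used, final partition as a state -> block-id dict)."""
    # Round 0 partition: final vs. non-final states.
    block = {s: (s in finals) for s in states}
    rounds = 0
    while True:
        rounds += 1
        sig = {s: (block[s],) + tuple(block[delta[(s, a)]] for a in alphabet)
               for s in states}
        classes = {v: i for i, v in enumerate(sorted(set(sig.values())))}
        new_block = {s: classes[sig[s]] for s in states}
        if len(set(new_block.values())) == len(set(block.values())):
            return rounds, new_block
        block = new_block

# A "slow" chain DFA on one letter: state i goes to i+1, only state n-1 final.
n = 5
states = list(range(n))
delta = {(i, 'a'): min(i + 1, n - 1) for i in states}
rounds, partition = moore_rounds(states, ['a'], delta, finals={n - 1})
```

On this chain, refinement peels off one state per round, so the round count grows linearly with n, which is the worst-case behavior the Slow DFA dataset is built to trigger.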
A statistical summary of the distribution of the tuples over the reducers is shown in Table 2 for both algorithms. For all four datasets the size of the alphabet is set to 4 and the number of states is 2^17. The columns Min. and Max. show the minimum and maximum number of tuples mapped to each reducer in every job of all the rounds of execution. For Hopcroft-MR we have two rows per DFA type, describing its two jobs. In the Circular DFA's, tuples are evenly distributed for both incoming and outgoing transitions, and the standard deviation for both algorithms is 0. However, a high degree of skewness is observed for the Star DFA's (all states have one incoming transition except the final state, which has (k − 1)n incoming transitions).

Table 2: Statistical summary of the distribution of records among reducers. [Columns: Dataset, Min., Max., Avg., Std. Dev.; rows for the Circular, Random, and Star datasets under each algorithm.]

6. CONCLUSION

In this paper we developed and studied implementations in Map-Reduce of two algorithms for minimizing DFA's. Our analytical and experimental results show that the Moore-MR algorithm and the Hopcroft-MR algorithm behave very similarly when the alphabet size is small, whereas Hopcroft-MR outperforms Moore-MR in communication cost when the cardinality of the alphabet is at least 16, and in wall-clock time when the cardinality is at least 32, both measured on DFA's with 1024 states. Moreover, our study shows that both algorithms are equally sensitive to skewness in the input data. However, Moore-MR (and serial Moore) determines the block of a state based on its (constant number of) outgoing transitions, while Hopcroft-MR (and serial Hopcroft) does it based on the transitions incident upon the state. This means that there is potential to reduce skew-sensitivity in Moore-MR. In future work we will also investigate the average communication cost of these algorithms. Finally, the capacity of the reducers (independently of the mapping schema) may affect the number of rounds needed. We intend to explore this trade-off in future work.

7. REFERENCES

[1] F. N. Afrati and J. D. Ullman.
Transitive closure and recursive Datalog implemented on clusters. In 15th International Conference on Extending Database Technology, EDBT, Berlin, Germany, pages 132–143, 2012.
[2] J. David. Average complexity of Moore's and Hopcroft's algorithms. Theor. Comput. Sci., 417:50–65, 2012.
[3] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
[4] G. Grahne, S. Harrafi, A. Moallemi, and A. Onet. Computing NFA intersections in Map-Reduce. In Proceedings of the Workshops of the EDBT/ICDT Joint Conference (EDBT/ICDT), pages 42–45, 2015.
[5] J. E. Hopcroft. An n log n algorithm for minimizing states in a finite automaton. In Theory of Machines and Computations, pages 189–196. Academic Press, New York, 1971.
[6] J. F. JáJá and K. W. Ryu. An efficient parallel algorithm for the single function coarsest partition problem. In Proceedings of the Fifth Annual ACM Symposium on Parallel Algorithms and Architectures. ACM, 1993.
[7] J. Leskovec, A. Rajaraman, and J. D. Ullman. Mining of Massive Datasets, 2nd Ed. Cambridge University Press, 2014.
[8] Y. Luo, G. H. Fletcher, J. Hidders, Y. Wu, and P. De Bra. External memory k-bisimulation reduction of big graphs. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. ACM, 2013.
[9] E. F. Moore. Gedanken-experiments on sequential machines. Automata Studies, 34:129–153, 1956.
[10] V. Rastogi, A. Machanavajjhala, L. Chitnis, and A. D. Sarma. Finding connected components in Map-Reduce in logarithmic rounds. In 29th IEEE International Conference on Data Engineering, ICDE, Brisbane, Australia, pages 50–61, 2013.
[11] B. Ravikumar and X. Xiong. A parallel algorithm for minimization of finite automata. In The 10th International Parallel Processing Symposium, pages 187–191, April 1996.
[12] A. Schätzle, A. Neu, G. Lausen, and M. Przyjaciel-Zablocki. Large-scale bisimulation of RDF graphs. In Proceedings of the Fifth Workshop on Semantic Web Information Management, page 1. ACM, 2013.
[13] M. Shaw, P. Koutris, B. Howe, and D. Suciu. Optimizing large-scale semi-naïve Datalog evaluation in Hadoop. In Datalog in Academia and Industry – Second International Workshop, Datalog 2.0, Vienna, Austria, pages 165–176, 2012.
[14] Y. N. Srikant. A parallel algorithm for the minimization of finite state automata. International Journal of Computer Mathematics, 32(1–2), 1990.
[15] A. Tewari, U. Srivastava, and P. Gupta. A parallel DFA minimization algorithm. In High Performance Computing – HiPC. Springer, 2002.
[16] B. W. Watson. A taxonomy of finite automata minimization algorithms. Computing Science Note 93/44, Eindhoven University of Technology, The Netherlands, 1993.
[17] M. Yoeli. Canonical representations of chain events. Information and Control, 8(2):180–189, 1965.
More informationAn Analysis of Reliable Classifiers through ROC Isometrics
An Analysis of Reliable Classifiers through ROC Isometrics Stijn Vanderlooy s.vanderlooy@cs.unimaas.nl Ida G. Srinkhuizen-Kuyer kuyer@cs.unimaas.nl Evgueni N. Smirnov smirnov@cs.unimaas.nl MICC-IKAT, Universiteit
More informationSTABILITY ANALYSIS TOOL FOR TUNING UNCONSTRAINED DECENTRALIZED MODEL PREDICTIVE CONTROLLERS
STABILITY ANALYSIS TOOL FOR TUNING UNCONSTRAINED DECENTRALIZED MODEL PREDICTIVE CONTROLLERS Massimo Vaccarini Sauro Longhi M. Reza Katebi D.I.I.G.A., Università Politecnica delle Marche, Ancona, Italy
More informationGeneral Linear Model Introduction, Classes of Linear models and Estimation
Stat 740 General Linear Model Introduction, Classes of Linear models and Estimation An aim of scientific enquiry: To describe or to discover relationshis among events (variables) in the controlled (laboratory)
More informationLecture 9: Connecting PH, P/poly and BPP
Comutational Comlexity Theory, Fall 010 Setember Lecture 9: Connecting PH, P/oly and BPP Lecturer: Kristoffer Arnsfelt Hansen Scribe: Martin Sergio Hedevang Faester Although we do not know how to searate
More informationEXACTLY PERIODIC SUBSPACE DECOMPOSITION BASED APPROACH FOR IDENTIFYING TANDEM REPEATS IN DNA SEQUENCES
EXACTLY ERIODIC SUBSACE DECOMOSITION BASED AROACH FOR IDENTIFYING TANDEM REEATS IN DNA SEUENCES Ravi Guta, Divya Sarthi, Ankush Mittal, and Kuldi Singh Deartment of Electronics & Comuter Engineering, Indian
More information