Cole's MergeSort. prof. Ing. Pavel Tvrdík CSc. Faculty of Information Technology, Czech Technical University in Prague. © Pavel Tvrdík, 2010


1 Cole's MergeSort. prof. Ing. Pavel Tvrdík CSc., Department of Computer Systems, Faculty of Information Technology, Czech Technical University in Prague. © Pavel Tvrdík, 2010. Advanced Parallel Algorithms (PI-PPA), summer semester 2010/11, Seminar 3. European Social Fund. Prague & EU: We invest in your future. prof. Pavel Tvrdík (FIT ČVUT) Cole MS PI-PPA, 2011, Seminar 3 1 / 27

2 Standard MergeSort Algorithms: Even-Odd MergeSort
Standard sorting on hypercubic networks uses Batcher's algorithms: Even-Odd MergeSort and Bitonic MergeSort.
Even-Odd MergeSort (EOMS):
[Figure: EOMS(2N) built from two EOMS(N) networks on inputs $a_0,\dots,a_{N-1}$ and $a_N,\dots,a_{2N-1}$, followed by EOM(N,N), which in turn contains two EOM(N/2,N/2) blocks.]
$\mathrm{EOMS}(a_0,\dots,a_{2N-1}) = \mathrm{EOMerge}(\mathrm{EOMS}(a_0,\dots,a_{N-1}),\ \mathrm{EOMS}(a_N,\dots,a_{2N-1}))$.

3 Standard MergeSort Algorithms: Even-Odd Merge
Even-Odd Merge (EOM) of two sorted sequences $A = a_0,\dots,a_{N-1}$ and $B = b_0,\dots,b_{N-1}$:
[Figure: EOM(N,N) network: two EOM(N/2,N/2) blocks producing C and D, an interleave stage producing L', and a final column of CE comparators producing L.]
$C = \mathrm{EOMerge}(\mathrm{even}(A), \mathrm{odd}(B))$
$D = \mathrm{EOMerge}(\mathrm{odd}(A), \mathrm{even}(B))$
$L' = \mathrm{Interleave}(C, D)$
$L = \mathrm{EOMerge}(A, B) = \mathrm{PairwiseCE}(L')$
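The recursive definition above can be sketched sequentially as follows. This is a minimal illustration, not the network itself: it assumes both inputs are sorted, have the same power-of-two length, and are indexed from 0; the names `eo_merge` and `eo_merge_sort` are hypothetical, and each `min`/`max` pair stands in for one CE comparator.

```python
def eo_merge(a, b):
    """Even-odd merge of two sorted lists of equal power-of-two length."""
    n = len(a)
    if n == 1:
        # EOM(1,1) is a single comparator.
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = eo_merge(a[0::2], b[1::2])  # C = EOMerge(even(A), odd(B))
    d = eo_merge(a[1::2], b[0::2])  # D = EOMerge(odd(A), even(B))
    out = []
    # L' = Interleave(C, D), then one column of comparators on (c_i, d_i).
    for x, y in zip(c, d):
        out += [min(x, y), max(x, y)]
    return out


def eo_merge_sort(xs):
    """EOMS: sort a list whose length is a power of two (assumption)."""
    if len(xs) == 1:
        return list(xs)
    half = len(xs) // 2
    return eo_merge(eo_merge_sort(xs[:half]), eo_merge_sort(xs[half:]))


print(eo_merge([1, 4, 6, 7], [2, 3, 5, 8]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because C and D mix even and odd positions of opposite inputs, a single comparator per interleaved pair is enough, which is what the sketch relies on.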

4 Standard MergeSort Algorithms: Correctness and parallel complexity of EOM
Lemma 1. EOM merges 2 sorted sequences A, B of length N in $\log N + 1$ parallel CE steps using $N(\log N + 1)$ comparators.
Proof. (By the 0-1 Sorting Lemma.) The lemma holds for $N = 1$. Let $N = 2^k$, $k \ge 1$. Let $\alpha$ and $\beta$ be the # of zeros in A and B, resp. Then the # of zeros in C and D is $\gamma = \lceil \alpha/2 \rceil + \lfloor \beta/2 \rfloor$ and $\delta = \lfloor \alpha/2 \rfloor + \lceil \beta/2 \rceil$, resp. Hence $|\gamma - \delta| \le 1$, i.e., the # of 0's in C and D can differ by at most one. One comparator column can fix the only possible misordered pair.

5 Standard MergeSort Algorithms: Correctness and parallel complexity of EOM
$d_m(2) = 1$ and $d_m(2N) = d_m(N) + 1 \ \Rightarrow\ d_m(2N) = \log N + 1 = \log(2N)$,
$c_m(2N) = N\, d_m(2N) = N \log(2N)$.

6 Standard MergeSort Algorithms: Parallel complexity analysis of EOMS
Time and cost complexity of EOMS
Lemma 2. EOMS sorts N numbers in $O(\log^2 N)$ parallel CE steps using $O(N \log^2 N)$ comparators.
$d_s(2) = 1$ and $d_s(2N) = d_s(N) + d_m(2N) = d_s(N) + \log(2N) \ \Rightarrow\ d_s(N) = \log N(\log N + 1)/2 = O(\log^2 N)$.
$c_s(2) = 1$ and $c_s(2N) = 2c_s(N) + c_m(2N) = 2c_s(N) + N \log(2N) \ \Rightarrow\ c_s(N) = N\, d_s(N)/2 = O(N \log^2 N)$.
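As a sanity check, the recurrences for depth and comparator count can be evaluated directly and compared with the closed forms. The function names below are hypothetical, arguments are output sizes that are powers of two, and $c_m$ is taken from its closed form $c_m(2N) = N\,d_m(2N)$ rather than from a separate recurrence.

```python
def d_m(size):
    """Depth of EOM producing `size` outputs: d_m(2) = 1, d_m(2N) = d_m(N) + 1."""
    return 1 if size == 2 else d_m(size // 2) + 1


def c_m(size):
    """Comparators of EOM producing `size` outputs: c_m(2N) = N * d_m(2N)."""
    return (size // 2) * d_m(size)


def d_s(size):
    """Depth of EOMS: d_s(2) = 1, d_s(2N) = d_s(N) + d_m(2N)."""
    return 1 if size == 2 else d_s(size // 2) + d_m(size)


def c_s(size):
    """Comparators of EOMS: c_s(2) = 1, c_s(2N) = 2 c_s(N) + c_m(2N)."""
    return 1 if size == 2 else 2 * c_s(size // 2) + c_m(size)


for k in range(1, 6):
    n = 2 ** k
    # Closed forms: d_s(N) = log N (log N + 1) / 2 and c_s(N) = N d_s(N) / 2.
    print(n, d_s(n), k * (k + 1) // 2, c_s(n), n * d_s(n) // 2)
```

For N = 8, for example, both the recurrence and the closed form give depth 6 and 24 comparators.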

7 Standard MergeSort Algorithms: EOMS iteratively
Unfolded EOMS
EOMS treats the input sequence as a sequence of N pairs. By passing through a comparator, each pair becomes a sorted subsequence of length 2. These subsequences are then merged into N/2 sorted subsequences of length 4, and so on. In the last merge step, two ascending sequences of length N are merged into the final ascending sequence of length 2N.

8 Standard MergeSort Algorithms: Even-Even MergeSort
Even-Even MergeSort (EEMS)
$\mathrm{EEMS}(a_0,\dots,a_{2N-1}) = \mathrm{EEMerge}(\mathrm{EEMS}(a_0,\dots,a_{N-1}),\ \mathrm{EEMS}(a_N,\dots,a_{2N-1}))$,
where
$\mathrm{EEMerge}(A, B) = \mathrm{PairwiseCE}(\mathrm{EvenOddInterleave}(C, D))$,
$C = \mathrm{EEMerge}(\mathrm{even}(A), \mathrm{even}(B))$,
$D = \mathrm{EEMerge}(\mathrm{odd}(A), \mathrm{odd}(B))$.
[Figure: (a) EEMS(2N) built from two EEMS(N) networks followed by EEM(N,N); (b) EEM(N,N) built from two EEM(N/2,N/2) blocks, an interleave stage, and a column of CE comparators.]

9 Standard MergeSort Algorithms: Non-binary proofs of EEMS
Non-binary proof of the standard EEMS
Lemma 3. Let $N = 2^k$, let $L[1,\dots,N]$ and $R[1,\dots,N]$ be monotonic sequences, and let $Z = \mathrm{EEMerge}(L, R)$ be the merge of L and R, of length 2N. Then
$Z = \mathrm{PairwiseCE}(\mathrm{EvenOddInterleave}(OM, EM))$,
where
$EM[1,\dots,N] = \mathrm{EEMerge}(\mathrm{even}(L), \mathrm{even}(R))$,
$OM[1,\dots,N] = \mathrm{EEMerge}(\mathrm{odd}(L), \mathrm{odd}(R))$.

10 Standard MergeSort Algorithms: Non-binary proofs of EEMS
Proof.
[Figure: arrays L, R, EM and OM with indexes 1, ..., N; the position of EM[i] relative to the elements of OM around index i.]
The only uncertain number in OM with respect to EM[i] is OM[i + 1], so we just need CE(EM[i], OM[i + 1]) to put these two numbers in order.
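The statement of Lemma 3 and the pairing CE(EM[i], OM[i + 1]) can be sketched as a sequential routine. This is an illustration under assumptions: inputs are sorted, of equal power-of-two length, and stored in 0-based Python lists, so the 1-based odd positions are the slice `[0::2]` and the even positions are `[1::2]`; the name `ee_merge` is hypothetical.

```python
def ee_merge(l, r):
    """Even-even merge of two sorted lists of equal power-of-two length."""
    n = len(l)
    if n == 1:
        return [min(l[0], r[0]), max(l[0], r[0])]
    om = ee_merge(l[0::2], r[0::2])  # OM: merge of the 1-based odd positions
    em = ee_merge(l[1::2], r[1::2])  # EM: merge of the 1-based even positions
    z = [om[0]]                      # OM[1] is the overall minimum
    for i in range(n - 1):
        # CE(EM[i], OM[i + 1]) in 1-based terms: 0-based em[i] against om[i + 1].
        x, y = em[i], om[i + 1]
        z += [min(x, y), max(x, y)]
    z.append(em[n - 1])              # EM[N] is the overall maximum
    return z


print(ee_merge([1, 4], [2, 3]))  # [1, 2, 3, 4]
```

In contrast to the even-odd variant, the interleave here is shifted by one position: the first element of OM and the last element of EM pass through without a comparator.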

11 Standard MergeSort Algorithms: Non-binary proofs of EEMS
Yet another proof of the standard EEMS
Lemma 4. Let $z = EM[i]$, let $j$ be the original index of $z$ in its original input sequence, say L, and let $f(z)$ be the final position of $z$ in the final sequence Z. Then $2i \le f(z) \le 2i + 1$.
Proof.
[Figure: arrays L, R, EM and Z; the element at index j of L appears at index i of EM and at one of two candidate positions in Z.]

12 Standard MergeSort Algorithms: Problem analysis
What can be said about OM if we have only EM? In general, very little.
[Figure: example arrays L, R, EM and Z in which the positions of the odd-positioned numbers in Z are unknown.]
But with some additional $O(N)$ computation, we can determine exactly the final positions of the odd-positioned numbers without performing the operation EEMerge(odd(A), odd(B)).

13 The main idea
The main idea of the cost-optimal EREW PRAM MergeSort
In each level of EEMS, apply recursively only 1 even-even merge operation.
Perform some auxiliary $O(N)$ computations to find the final positions for all numbers in $L \cup R$.
Then the # of CE operations for the merge operation of 2N numbers is $c_m(2N) = c_m(N) + O(N)$.
This equation has solution $c_m(2N) = O(N)$, since $N + N/2 + N/4 + \cdots = 2N$.
Therefore, the total # of CE operations of the whole EEMS algorithm is $c_s(2N) = 2c_s(N) + c_m(2N) = O(N \log N)$.
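The geometric-series argument can be written out as a short derivation. The constant $c$ bounding the auxiliary $O(N)$ work per level is an assumption introduced here for illustration only.

```latex
\begin{align*}
c_m(2N) &\le c_m(N) + cN \\
        &\le c_m(N/2) + c\,(N + N/2) \\
        &\le \dots \le c_m(2) + c\,(N + N/2 + N/4 + \dots + 2) \\
        &< c_m(2) + 2cN = O(N), \\[4pt]
c_s(2N) &= 2\,c_s(N) + c_m(2N) \le 2\,c_s(N) + c'N \\
        &\le 4\,c_s(N/2) + 2c'N \\
        &\le \dots \le N\,c_s(2) + c'N\log N = O(N \log N),
\end{align*}
```

where $c'$ is any constant with $c_m(2N) \le c'N$. Each of the $\log N$ levels of the sorting recursion contributes at most $c'N$ comparisons, which is where the $N \log N$ bound comes from.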

14 Pseudocode
Pseudocode of Cole's EREW PRAM MergeSort
function EffMerge(L, R) returns Z;
input EM, L, R: array[1, ..., N] of T = record {Val: int; Origin: {L, R}; OrigIndex: int};
output Z: array[1, ..., 2N] of T;
auxiliary
  OrPI, OpPI: array[1, ..., N] of int;
    (* OrPI[i] = index of the predecessor of EM[i].Val in its original array EM[i].Origin *)
    (* OpPI[i] = index of the predecessor of EM[i].Val in the array opposite to its original one *)
  LPI, RPI: array[1, ..., N] of int;
    (* LPI[i] = index of the predecessor of EM[i].Val in L *)
    (* RPI[i] = index of the predecessor of EM[i].Val in R *)
  LEMPI, REMPI: array[1, ..., N] of int;
    (* LEMPI[i] = index of the predecessor of L[i].Val, i is odd, in EM *)
    (* REMPI[i] = index of the predecessor of R[i].Val, i is odd, in EM *)

15 Pseudocode
  FI: array[1, ..., 2N] of int;
    (* FI = final indexes of numbers from L ∪ R in Z *)
begin
  (* merge recursively the even-numbered subsequences *)
  EM := EffMerge(even(L), even(R));
  (* Phase A: calculate indexes of predecessors for all numbers from EM in L and R *)
  for all i = 1, ..., N do in parallel {
    OrPI[i] := EM[i].OrigIndex - 1;
    OpPI[i] := 2i - EM[i].OrigIndex;
    if (EM[i].Origin = L)
      then { if (OpPI[i] < N and R[OpPI[i] + 1].Val < EM[i].Val) then OpPI[i]++ }
      else { if (OpPI[i] < N and L[OpPI[i] + 1].Val < EM[i].Val) then OpPI[i]++ };
  }

16 Pseudocode
  (* Phase B: rename arrays *)
  for all i = 1, ..., N do in parallel
    if (EM[i].Origin = L)
      then { LPI[i] := OrPI[i]; RPI[i] := OpPI[i] }
      else { LPI[i] := OpPI[i]; RPI[i] := OrPI[i] }
  (* Phase C: calculate indexes of predecessors in EM for all elements from (L ∪ R) - EM *)
  for all i = 1, ..., N do in parallel {
    if (LPI[i - 1] < LPI[i] and LPI[i] is odd) then LEMPI[LPI[i]] := i - 1;
    if (RPI[i - 1] < RPI[i] and RPI[i] is odd) then REMPI[RPI[i]] := i - 1;
  }

17 Pseudocode
  (* Phase D: calculate the final positions in Z *)
  (* D1: calculation for EM *)
  for all i = 1, ..., N do in parallel {
    FI[i] := LPI[i] + RPI[i] + 1;
    Z[FI[i]] := EM[i];
  }
  (* D2: calculation for L - EM, i.e., the odd-indexed elements of L *)
  for all odd i = 1, ..., N - 1 do in parallel {
    j := LEMPI[i];
    if (L[i].Val > R[RPI[j + 1]].Val)
      then FI[i] := i + RPI[j + 1]
      else FI[i] := i + RPI[j + 1] - 1;
    Z[FI[i]] := L[i];
  }
  (* D3: calculation for R - EM, i.e., the odd-indexed elements of R *)
  for all odd i = 1, ..., N - 1 do in parallel {
    j := REMPI[i];
    if (R[i].Val > L[LPI[j + 1]].Val)
      then FI[i] := i + LPI[j + 1]
      else FI[i] := i + LPI[j + 1] - 1;
    Z[FI[i]] := R[i];
  }
end
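The four phases can be traced in a sequential Python sketch. It is an illustration under several assumptions that the pseudocode leaves implicit: keys are distinct, N is a power of two, a merge of two single elements is one comparison, the sentinels LPI[0] = RPI[0] = 0 and L[0] = R[0] = -infinity are added, indexes returned by the recursive call are doubled to refer to the current arrays, and ordinary loops stand in for the parallel loops. The name `eff_merge` and the tuple layout `(value, origin, orig_index)` are hypothetical.

```python
import math


def eff_merge(L, R):
    """Merge sorted lists L, R (equal power-of-two length, distinct keys).

    Returns a list of (value, origin, orig_index) with 1-based orig_index.
    """
    n = len(L)
    if n == 1:
        a, b = (L[0], "L", 1), (R[0], "R", 1)
        return [a, b] if L[0] < R[0] else [b, a]
    Lv = [-math.inf] + list(L)  # 1-based copies with a sentinel at index 0
    Rv = [-math.inf] + list(R)
    # Recursive merge of the (1-based) even positions; remap indexes k -> 2k.
    sub = eff_merge(L[1::2], R[1::2])
    EM = [None] + [(v, o, 2 * k) for (v, o, k) in sub]
    LPI = [0] * (n + 1)
    RPI = [0] * (n + 1)
    # Phases A and B: predecessor indexes of EM[i] in L and in R.
    for i in range(1, n + 1):
        v, o, j = EM[i]
        opp = Rv if o == "L" else Lv
        orpi = j - 1
        oppi = 2 * i - j
        if oppi < n and opp[oppi + 1] < v:
            oppi += 1
        LPI[i], RPI[i] = (orpi, oppi) if o == "L" else (oppi, orpi)
    # Phase C: predecessors in EM of the odd-indexed elements of L and R.
    LEMPI = [0] * (n + 1)
    REMPI = [0] * (n + 1)
    for i in range(1, n + 1):
        if LPI[i - 1] < LPI[i] and LPI[i] % 2 == 1:
            LEMPI[LPI[i]] = i - 1
        if RPI[i - 1] < RPI[i] and RPI[i] % 2 == 1:
            REMPI[RPI[i]] = i - 1
    # Phase D: final positions in Z.
    Z = [None] * (2 * n + 1)
    for i in range(1, n + 1):                      # D1: elements of EM
        Z[LPI[i] + RPI[i] + 1] = EM[i]
    for i in range(1, n, 2):                       # D2: odd-indexed L
        r = RPI[LEMPI[i] + 1]
        Z[i + r if Lv[i] > Rv[r] else i + r - 1] = (Lv[i], "L", i)
    for i in range(1, n, 2):                       # D3: odd-indexed R
        l = LPI[REMPI[i] + 1]
        Z[i + l if Rv[i] > Lv[l] else i + l - 1] = (Rv[i], "R", i)
    return Z[1:]


print([v for v, _, _ in eff_merge([0, 3, 4, 7], [1, 2, 5, 6])])
# [0, 1, 2, 3, 4, 5, 6, 7]
```

Each odd-indexed element costs one comparison in Phase D and each element of EM one comparison in Phase A, so the sketch performs O(N) comparisons on top of the single recursive call.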

18 Correctness proofs: Proof of the correctness of Cole's MergeSort
Phase A
[Figure: example of computing OpPI for an element EM[i] with i = 5 and original index j = 4; the candidate predecessor index in the opposite array is 2i - j. Arrays shown: EM, OpPI, L, R.]

19 Correctness proofs: Phase B
[Figure: arrays EM, LPI and RPI together with L and R after renaming.]

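20 Correctness proofs: Span of RPI and LPI
Lemma 5. The maximal span between adjacent LPIs or RPIs is at most 2.
Proof. Let $e = EM[i-1]$ and $e' = EM[i]$, and suppose the span is 3.
Case (a): $e_1 < e < o_1 < e_2 < o_2 < e' \ \Rightarrow\ e < e_2 < e'$.
Case (b): $o_1 < e \le e_1 < o_2 < e_2 < e' \ \Rightarrow\ e < e_2 < e'$.
In both cases $e_2$, which is even-indexed in L and therefore belongs to EM, would lie strictly between two adjacent elements of EM, a contradiction.
[Figure: (a) L: $e_1, o_1, e_2, o_2$ and (b) L: $o_1, e_1, o_2, e_2$, each with the LPI pointers of EM[i-1] and EM[i].]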

21 Correctness proofs: Phase C
Let $e = EM[i-1]$ and $e' = EM[i]$.
Case (a): $x < e < z < e' \le y$.
Case (b): $e \le x$ and $x < e'$ $\Rightarrow$ x must appear in EM before $e'$, but not before $e$ $\Rightarrow$ $x = e$ $\Rightarrow$ LEMPI[j] = i - 1.
Case (c): cannot appear due to Lemma 5.
[Figure: (a) L: x, z, y at indexes j-1, j, j+1; (b) L: y, x, z at indexes j-2, j-1, j; (c) L: u, y, x, z at indexes j-3, j-2, j-1, j; each with the LPI pointers of EM[i-1] and EM[i] and the resulting LEMPI pointer.]

22 Correctness proofs: The indexes LEMPI and REMPI
[Figure: example arrays LEMPI, EM and REMPI together with L and R.]

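23 Correctness proofs: Phase D2
Calculate for the numbers in L - EM the # of predecessors in R.
Let $k = L[i]$, $j = LEMPI[i]$, $e = EM[j]$ and $e' = EM[j+1]$.
Trivially, $k > e$ and $e > r$ imply $k > r$; likewise, $k \le e'$ and $e' \le r'$ imply $k \le r'$.
It follows that the interval between $r$ and $r'$ contains at most two numbers.
[Figure: EM with adjacent elements e, e' at indexes j, j+1; L with k at index i and pointer LEMPI[i]; R with the elements around indexes RPI[j] and RPI[j+1].]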

24 Correctness proofs: Case analysis
[Figure: four cases (a) to (d) of the position of k = L[i] relative to the elements of R between indexes RPI[j] and RPI[j+1], where j = LEMPI[i].]

25 Complexity analysis: Complexity of Cole's EREW PRAM MergeSort
Theorem 6. EffMerge(N, N) requires at most 4N CEs and its parallel time is $T(2N, N) = O(\log N)$.
Proof. Phases A and C perform at most N comparisons plus some $O(1)$ operations per number. The same for Phase D.
$c_m(2N) \le c_m(N) + 2N \ \Rightarrow\ c_m(2N) \le 4N$.
$p = N \ \Rightarrow$ one stage takes $O(1)$ time. The # of stages $= O(\log N)$.

26 Complexity analysis
Theorem 7. EffMerge(N, N) takes parallel time $O(\log N)$ using $N/\log N$ processors.
Proof. There are $\log N$ numbers per processor. The first stage takes $\alpha \log N$ parallel CEs plus auxiliary operations, $\alpha \le 4$. The 1st recursive call of EffMerge takes $(\alpha \log N)/2$, the 2nd recursive call $(\alpha \log N)/4$, and so on. In the $(\log\log N)$-th recursive call the # of numbers is $N/\log N$ $\Rightarrow$ at most one number per processor $\Rightarrow$ $O(1)$ time.
The total time is $T(2N, N/\log N) = \alpha\, O\bigl(\log N + (\log N)/2 + (\log N)/4 + \cdots\bigr)$, where the # of nonconstant terms is $\log\log N$ and the remaining $\log N - \log\log N$ terms are $O(1)$ $\Rightarrow$ $T(2N, N/\log N) \le 3\alpha \log N$.
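The summation in this proof can be checked numerically with a small sketch. It assumes a simplified cost model that is not stated in the slides: $\alpha = 1$, recursion level $k$ handles $N/2^k$ numbers spread evenly over $N/\log N$ processors, and a level costs the rounded-up number of items per processor. The name `stage_times` and the parameter `m` (standing for $\log N$) are hypothetical.

```python
def stage_times(m):
    """Per-level time for m = log2(N), with N / m processors (assumed model).

    Level k handles N / 2**k numbers, i.e. m / 2**k numbers per processor,
    rounded up; once that drops to one number or fewer, the level costs 1.
    """
    return [-(-m // 2 ** k) for k in range(m + 1)]  # integer ceiling


for m in (4, 8, 16):
    times = stage_times(m)
    print(m, times, sum(times), 3 * m)
```

For m = 16 the per-level times are 16, 8, 4, 2, 1 followed by twelve ones, for a total of 43, which stays below 3 · 16 = 48 as the bound predicts.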

27 Complexity analysis
Theorem 8. Cole's MergeSort performs at most $2N \log N$ comparisons to sort N numbers and runs in parallel time $O(\log^2 N)$ using $N/\log N$ processors.
Proof. The total # of CE operations per stage is at most 2N (EffMerge of two sequences of length N/2). The # of merge stages is $\log N$. Each stage takes parallel time $\log N$ on $N/\log N$ processors.
