CSE 613: Parallel Programming Lecture 8 ( Analyzing Divide-and-Conquer Algorithms ) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2012
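A Useful Recurrence
Consider the following recurrence:
$$T(n) = \begin{cases} \Theta(1), & \text{if } n \le 1, \\ a\,T(n/b) + f(n), & \text{otherwise;} \end{cases} \qquad \text{where } a \ge 1 \text{ and } b > 1.$$
It arises frequently in the analyses of divide-and-conquer algorithms.
Recall the following from the analyses of QSort (quicksort) in Lecture 1.
Serial: $T(n) = 2T(n/2) + \Theta(n)$
Parallel (with serial partition): $T(n) = T(n/2) + \Theta(n)$
Parallel (with parallel partition): $T(n) = T(n/2) + \Theta(\log n)$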
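How the Recurrence Unfolds
$$T(n) = \begin{cases} \Theta(1), & \text{if } n \le 1, \\ a\,T(n/b) + f(n), & \text{otherwise.} \end{cases}$$
[Figure: recursion tree for the recurrence, built up level by level.]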
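How the Recurrence Unfolds: Case 1
$T(n) = \Theta(1)$ if $n \le 1$, and $T(n) = a\,T(n/b) + f(n)$ otherwise.
$f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$.
Sums increase geometrically level by level. The last level dominates.
$T(n) = \Theta(n^{\log_b a})$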
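How the Recurrence Unfolds: Case 2
$T(n) = \Theta(1)$ if $n \le 1$, and $T(n) = a\,T(n/b) + f(n)$ otherwise.
$f(n) = \Theta(n^{\log_b a} \lg^k n)$ for some constant $k \ge 0$.
Sums increase arithmetically level by level. No level dominates.
$T(n) = \Theta(n^{\log_b a} \lg^{k+1} n)$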
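How the Recurrence Unfolds: Case 3
$T(n) = \Theta(1)$ if $n \le 1$, and $T(n) = a\,T(n/b) + f(n)$ otherwise.
$f(n) = \Omega(n^{\log_b a + \epsilon})$ and $a\,f(n/b) \le c\,f(n)$ for constants $\epsilon > 0$ and $c < 1$.
Sums decrease geometrically level by level. The first level dominates.
$T(n) = \Theta(f(n))$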
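The three cases can be seen numerically by summing the cost of each level of the recursion tree. The sketch below is illustrative only: `level_sums` is a hypothetical helper (not from the lecture), and it assumes that n is an exact power of b and ignores the leaf level.

```python
def level_sums(a, b, f, n):
    """Total cost of each internal level of the recursion tree for
    T(n) = a*T(n/b) + f(n), assuming n is an exact power of b."""
    sums = []
    nodes, size = 1, n
    while size > 1:
        sums.append(nodes * f(size))  # a^i nodes, each costing f(n / b^i)
        nodes *= a
        size //= b
    return sums

# Case 1 flavour: a=8, b=2, f(n)=1 -> sums grow by a factor of 8 per level
print(level_sums(8, 2, lambda m: 1, 16))   # [1, 8, 64, 512]
# Case 2 flavour: a=2, b=2, f(n)=n -> every level sums to n
print(level_sums(2, 2, lambda m: m, 16))   # [16, 16, 16, 16]
# Case 3 flavour: a=1, b=2, f(n)=n -> sums halve at every level
print(level_sums(1, 2, lambda m: m, 16))   # [16, 8, 4, 2]
```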
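The Master Theorem
$$T(n) = \begin{cases} \Theta(1), & \text{if } n \le 1, \\ a\,T(n/b) + f(n), & \text{otherwise} \end{cases} \qquad (a \ge 1,\ b > 1).$$
Case 1: $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$. Then $T(n) = \Theta(n^{\log_b a})$.
Case 2: $f(n) = \Theta(n^{\log_b a} \lg^k n)$ for some constant $k \ge 0$. Then $T(n) = \Theta(n^{\log_b a} \lg^{k+1} n)$.
Case 3: $f(n) = \Omega(n^{\log_b a + \epsilon})$ and $a\,f(n/b) \le c\,f(n)$ for constants $\epsilon > 0$ and $c < 1$. Then $T(n) = \Theta(f(n))$.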
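For the common special case where the driving function has the form $f(n) = \Theta(n^d \lg^k n)$, choosing the case reduces to comparing $d$ with $\log_b a$. The following is a minimal sketch under that assumption; the function name and return convention are hypothetical, and it does not handle driving functions outside this form.

```python
import math

def master_theorem(a, b, d, k=0):
    """Solve T(n) = a*T(n/b) + Theta(n^d * lg^k n), with a >= 1, b > 1.

    Returns (case, p, q), meaning T(n) = Theta(n^p * lg^q n).
    Assumes f(n) is exactly of the polynomial-times-polylog form above."""
    if a < 1 or b <= 1:
        raise ValueError("need a >= 1 and b > 1")
    crit = math.log(a, b)                    # critical exponent log_b(a)
    if math.isclose(d, crit, abs_tol=1e-9):
        return (2, d, k + 1)                 # Case 2: gain one log factor
    if d < crit:
        return (1, crit, 0)                  # Case 1: leaves dominate
    return (3, d, k)                         # Case 3: root dominates

print(master_theorem(2, 2, 1))      # (2, 1, 1): Theta(n lg n)
print(master_theorem(1, 2, 1))      # (3, 1, 0): Theta(n)
print(master_theorem(1, 2, 0, 1))   # (2, 0, 2): Theta(lg^2 n)
```

The three calls correspond to the three QSort recurrences from Lecture 1. In Case 3 the regularity condition holds automatically for this form of $f$, because $a\,f(n/b) \le (a / b^d)\, f(n)$ and $a / b^d < 1$ whenever $d > \log_b a$.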
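Back to QSort Complexities
Now let's try the QSort (quicksort) recurrences from Lecture 1.
Serial: $T(n) = 2T(n/2) + \Theta(n)$. Master Theorem Case 2: $T(n) = \Theta(n \log n)$.
Parallel (with serial partition): $T(n) = T(n/2) + \Theta(n)$. Master Theorem Case 3: $T(n) = \Theta(n)$.
Parallel (with parallel partition): $T(n) = T(n/2) + \Theta(\log n)$. Master Theorem Case 2: $T(n) = \Theta(\log^2 n)$.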
Multithreaded Matrix Multiplication
Parallel Iterative MM

Iter-MM ( Z, X, Y )    { X, Y, Z are n × n matrices, where n is a positive integer }
  for i ← 1 to n do
    for j ← 1 to n do
      Z[ i ][ j ] ← 0
      for k ← 1 to n do
        Z[ i ][ j ] ← Z[ i ][ j ] + X[ i ][ k ] × Y[ k ][ j ]

Par-Iter-MM ( Z, X, Y )    { X, Y, Z are n × n matrices, where n is a positive integer }
  parallel for i ← 1 to n do
    parallel for j ← 1 to n do
      Z[ i ][ j ] ← 0
      for k ← 1 to n do
        Z[ i ][ j ] ← Z[ i ][ j ] + X[ i ][ k ] × Y[ k ][ j ]
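The structure of Par-Iter-MM can be sketched in Python by treating every entry Z[i][j] as an independent task. This is only an illustration of the task decomposition: the function name is hypothetical, the thread pool stands in for the two `parallel for` loops, and CPython threads will not yield a real speedup for this arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def par_iter_mm(X, Y):
    """Return Z = X * Y for n x n matrices given as lists of lists."""
    n = len(X)

    def cell(ij):
        i, j = ij
        # The loop over k stays serial, as in the pseudocode.
        return sum(X[i][k] * Y[k][j] for k in range(n))

    pairs = [(i, j) for i in range(n) for j in range(n)]
    with ThreadPoolExecutor() as pool:
        flat = list(pool.map(cell, pairs))   # one independent task per Z[i][j]
    # Reassemble the flat, row-major result into n rows.
    return [flat[i * n:(i + 1) * n] for i in range(n)]

print(par_iter_mm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```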
Parallel Iterative MM
Analysis of Par-Iter-MM (each of the two nested parallel for loops contributes $\Theta(\log n)$ to the span, and the serial loop over k contributes $\Theta(n)$):
Work: $T_1(n) = \Theta(n^3)$
Span: $T_\infty(n) = \Theta(\log n + \log n + n) = \Theta(n)$
Parallelism: $T_1(n) / T_\infty(n) = \Theta(n^2)$
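Parallel Recursive MM
Partition each $n \times n$ matrix into four $n/2 \times n/2$ blocks:
$$\begin{bmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{bmatrix} = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix} \times \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix} = \begin{bmatrix} X_{11} Y_{11} + X_{12} Y_{21} & X_{11} Y_{12} + X_{12} Y_{22} \\ X_{21} Y_{11} + X_{22} Y_{21} & X_{21} Y_{12} + X_{22} Y_{22} \end{bmatrix}$$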
Parallel Recursive MM

Par-Rec-MM ( Z, X, Y )    { X, Y, Z are n × n matrices, where n = 2^k for integer k ≥ 0 }
  if n = 1 then
    Z ← Z + X × Y
  else
    spawn Par-Rec-MM ( Z11, X11, Y11 )
    spawn Par-Rec-MM ( Z12, X11, Y12 )
    spawn Par-Rec-MM ( Z21, X21, Y11 )
    Par-Rec-MM ( Z22, X21, Y12 )
    sync
    spawn Par-Rec-MM ( Z11, X12, Y21 )
    spawn Par-Rec-MM ( Z12, X12, Y22 )
    spawn Par-Rec-MM ( Z21, X22, Y21 )
    Par-Rec-MM ( Z22, X22, Y22 )
    sync
  endif
Parallel Recursive MM
Analysis of Par-Rec-MM:
Work: $T_1(n) = \Theta(1)$ if $n = 1$, and $T_1(n) = 8\,T_1(n/2) + \Theta(1)$ otherwise. Hence $T_1(n) = \Theta(n^3)$ [ MT Case 1 ].
Span: $T_\infty(n) = \Theta(1)$ if $n = 1$, and $T_\infty(n) = 2\,T_\infty(n/2) + \Theta(1)$ otherwise. Hence $T_\infty(n) = \Theta(n)$ [ MT Case 1 ].
Parallelism: $T_1(n) / T_\infty(n) = \Theta(n^2)$
Additional Space: $s_\infty(n) = \Theta(1)$
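A serial Python sketch of the recursion may help make the two phases concrete. It is an illustration under stated assumptions, not the lecture's implementation: blocks are addressed by hypothetical row/column offsets instead of being copied, and the four calls inside each phase run one after another where a fork-join runtime would spawn them.

```python
def par_rec_mm(Z, X, Y, zr=0, zc=0, xr=0, xc=0, yr=0, yc=0, n=None):
    """Accumulate X*Y into Z on n x n blocks, where n is a power of 2.

    (zr, zc), (xr, xc), (yr, yc) are the top-left corners of the
    current blocks of Z, X and Y."""
    if n is None:
        n = len(Z)
    if n == 1:
        Z[zr][zc] += X[xr][xc] * Y[yr][yc]
        return
    h = n // 2
    # s = 0: Z11 += X11*Y11, Z12 += X11*Y12, Z21 += X21*Y11, Z22 += X21*Y12
    # s = h: Z11 += X12*Y21, Z12 += X12*Y22, Z21 += X22*Y21, Z22 += X22*Y22
    for s in (0, h):
        for i in (0, h):
            for j in (0, h):
                # The four (i, j) calls write disjoint quadrants of Z,
                # so they could be spawned in parallel.
                par_rec_mm(Z, X, Y, zr + i, zc + j, xr + i, xc + s,
                           yr + s, yc + j, h)
        # A sync belongs here: the next phase writes the same quadrants.

X = [[1, 2], [3, 4]]
Y = [[5, 6], [7, 8]]
Z = [[0, 0], [0, 0]]
par_rec_mm(Z, X, Y)
print(Z)  # [[19, 22], [43, 50]]
```

The sync between the two phases is what doubles the span at every level of the recursion, giving the $2\,T_\infty(n/2)$ term in the span recurrence.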
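Recursive MM with More Parallelism
With each block of size $n/2 \times n/2$, the product splits into two sums of independent block products:
$$\begin{bmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{bmatrix} = \begin{bmatrix} X_{11} Y_{11} + X_{12} Y_{21} & X_{11} Y_{12} + X_{12} Y_{22} \\ X_{21} Y_{11} + X_{22} Y_{21} & X_{21} Y_{12} + X_{22} Y_{22} \end{bmatrix} = \begin{bmatrix} X_{11} Y_{11} & X_{11} Y_{12} \\ X_{21} Y_{11} & X_{21} Y_{12} \end{bmatrix} + \begin{bmatrix} X_{12} Y_{21} & X_{12} Y_{22} \\ X_{22} Y_{21} & X_{22} Y_{22} \end{bmatrix}$$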
Recursive MM with More Parallelism

Par-Rec-MM2 ( Z, X, Y )    { X, Y, Z are n × n matrices, where n = 2^k for integer k ≥ 0 }
  if n = 1 then
    Z ← Z + X × Y
  else    { T is a temporary n × n matrix, initially all zeros }
    spawn Par-Rec-MM2 ( Z11, X11, Y11 )
    spawn Par-Rec-MM2 ( Z12, X11, Y12 )
    spawn Par-Rec-MM2 ( Z21, X21, Y11 )
    spawn Par-Rec-MM2 ( Z22, X21, Y12 )
    spawn Par-Rec-MM2 ( T11, X12, Y21 )
    spawn Par-Rec-MM2 ( T12, X12, Y22 )
    spawn Par-Rec-MM2 ( T21, X22, Y21 )
    Par-Rec-MM2 ( T22, X22, Y22 )
    sync
    parallel for i ← 1 to n do
      parallel for j ← 1 to n do
        Z[ i ][ j ] ← Z[ i ][ j ] + T[ i ][ j ]
  endif
Recursive MM with More Parallelism
Analysis of Par-Rec-MM2:
Work: $T_1(n) = \Theta(1)$ if $n = 1$, and $T_1(n) = 8\,T_1(n/2) + \Theta(n^2)$ otherwise. Hence $T_1(n) = \Theta(n^3)$ [ MT Case 1 ].
Span: $T_\infty(n) = \Theta(1)$ if $n = 1$, and $T_\infty(n) = T_\infty(n/2) + \Theta(\log n)$ otherwise. Hence $T_\infty(n) = \Theta(\log^2 n)$ [ MT Case 2 ].
Parallelism: $T_1(n) / T_\infty(n) = \Theta(n^3 / \log^2 n)$
Additional Space: $s_\infty(n) = \Theta(1)$ if $n = 1$, and $s_\infty(n) = 8\,s_\infty(n/2) + \Theta(n^2)$ otherwise. Hence $s_\infty(n) = \Theta(n^3)$ [ MT Case 1 ].
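The idea behind Par-Rec-MM2 can be sketched in a functional style: compute all eight block products independently, then add the two halves. This is a simplified variant for illustration, not the lecture's in-place procedure. It returns a new matrix instead of accumulating into Z, copies blocks rather than indexing them, and runs serially; the helper names `split`, `join`, and `add` are hypothetical.

```python
def split(M):
    """Return the four quadrants M11, M12, M21, M22 of a square matrix."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def join(M11, M12, M21, M22):
    """Reassemble four quadrants into one matrix."""
    top = [a + b for a, b in zip(M11, M12)]      # concatenate rows
    bottom = [a + b for a, b in zip(M21, M22)]
    return top + bottom

def add(A, B):
    """Entrywise sum; stands in for the doubly nested parallel for."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def par_rec_mm2(X, Y):
    """Return X * Y for n x n matrices, n a power of 2."""
    if len(X) == 1:
        return [[X[0][0] * Y[0][0]]]
    X11, X12, X21, X22 = split(X)
    Y11, Y12, Y21, Y22 = split(Y)
    P = par_rec_mm2
    # All eight products are independent and could be spawned together.
    Z = join(P(X11, Y11), P(X11, Y12), P(X21, Y11), P(X21, Y12))
    T = join(P(X12, Y21), P(X12, Y22), P(X22, Y21), P(X22, Y22))
    return add(Z, T)   # after the sync

print(par_rec_mm2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Because nothing forces the second group of four products to wait for the first, the span recurrence has a single $T_\infty(n/2)$ term; the price is the temporary matrix allocated at every level.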
Multithreaded Merge Sort
Parallel Merge Sort

Merge-Sort ( A, p, r )    { sort the elements in A[ p … r ] }
  if p < r then
    q ← ⌊ ( p + r ) / 2 ⌋
    Merge-Sort ( A, p, q )
    Merge-Sort ( A, q + 1, r )
    Merge ( A, p, q, r )

Par-Merge-Sort ( A, p, r )    { sort the elements in A[ p … r ] }
  if p < r then
    q ← ⌊ ( p + r ) / 2 ⌋
    spawn Par-Merge-Sort ( A, p, q )
    Par-Merge-Sort ( A, q + 1, r )
    sync
    Merge ( A, p, q, r )
Parallel Merge Sort
Analysis of Par-Merge-Sort (with serial Merge):
Work: $T_1(n) = \Theta(1)$ if $n = 1$, and $T_1(n) = 2\,T_1(n/2) + \Theta(n)$ otherwise. Hence $T_1(n) = \Theta(n \log n)$ [ MT Case 2 ].
Span: $T_\infty(n) = \Theta(1)$ if $n = 1$, and $T_\infty(n) = T_\infty(n/2) + \Theta(n)$ otherwise. Hence $T_\infty(n) = \Theta(n)$ [ MT Case 3 ].
Parallelism: $T_1(n) / T_\infty(n) = \Theta(\log n)$
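The spawn/sync pattern of Par-Merge-Sort can be sketched with Python threads: starting a thread plays the role of spawn and joining it plays the role of sync. This is a sketch under assumptions, not a performant implementation. The `depth` cutoff is a hypothetical parameter that limits how many threads are created, and CPython threads do not run this code truly in parallel.

```python
import threading

def merge(A, p, q, r):
    """Serially merge sorted A[p..q] and A[q+1..r] (inclusive) in place."""
    left, right = A[p:q + 1], A[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):
        # Take from left if right is exhausted or left's head is smaller.
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            A[k] = left[i]
            i += 1
        else:
            A[k] = right[j]
            j += 1

def par_merge_sort(A, p, r, depth=3):
    """Sort A[p..r] in place; spawn threads for the top `depth` levels."""
    if p < r:
        q = (p + r) // 2
        if depth > 0:
            t = threading.Thread(target=par_merge_sort,
                                 args=(A, p, q, depth - 1))
            t.start()                               # spawn
            par_merge_sort(A, q + 1, r, depth - 1)
            t.join()                                # sync
        else:
            par_merge_sort(A, p, q, 0)
            par_merge_sort(A, q + 1, r, 0)
        merge(A, p, q, r)                           # serial Theta(n) merge

data = [5, 2, 9, 1, 5, 6, 0, 3]
par_merge_sort(data, 0, len(data) - 1)
print(data)  # [0, 1, 2, 3, 5, 5, 6, 9]
```

The two halves touch disjoint ranges of A, so the spawned call and the continuation do not race. The serial merge after the sync is the $\Theta(n)$ term that keeps the span linear.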
Parallel Merge
Source: Cormen et al., Introduction to Algorithms, 3rd Edition.
Subarrays to merge: $T[p_1 .. r_1]$ with $n_1 = r_1 - p_1 + 1$ elements, and $T[p_2 .. r_2]$ with $n_2 = r_2 - p_2 + 1$ elements.
Suppose: $n_1 \ge n_2$.
Merged output: $A[p_3 .. r_3]$ with $n_3 = r_3 - p_3 + 1 = n_1 + n_2$ elements.
Step 1: Find $x = T[q_1]$, where $q_1$ is the midpoint of $T[p_1 .. r_1]$.
Step 2: Use binary search to find the index $q_2$ in subarray $T[p_2 .. r_2]$ so that the subarray would still be sorted if we insert $x$ between $T[q_2 - 1]$ and $T[q_2]$.
Step 3: Copy $x$ to $A[q_3]$, where $q_3 = p_3 + (q_1 - p_1) + (q_2 - p_2)$.
Perform the following two steps in parallel.
Step 4(a): Recursively merge $T[p_1 .. q_1 - 1]$ with $T[p_2 .. q_2 - 1]$, and place the result into $A[p_3 .. q_3 - 1]$.
Step 4(b): Recursively merge $T[q_1 + 1 .. r_1]$ with $T[q_2 .. r_2]$, and place the result into $A[q_3 + 1 .. r_3]$.
Parallel Merge

Par-Merge ( T, p1, r1, p2, r2, A, p3 )
  1.  n1 ← r1 − p1 + 1,  n2 ← r2 − p2 + 1
  2.  if n1 < n2 then
  3.      p1 ↔ p2,  r1 ↔ r2,  n1 ↔ n2
  4.  if n1 = 0 then return
  5.  else
  6.      q1 ← ⌊ ( p1 + r1 ) / 2 ⌋
  7.      q2 ← Binary-Search ( T[ q1 ], T, p2, r2 )
  8.      q3 ← p3 + ( q1 − p1 ) + ( q2 − p2 )
  9.      A[ q3 ] ← T[ q1 ]
  10.     spawn Par-Merge ( T, p1, q1 − 1, p2, q2 − 1, A, p3 )
  11.     Par-Merge ( T, q1 + 1, r1, q2, r2, A, q3 + 1 )
  12.     sync
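A serial Python sketch of Par-Merge follows. It assumes inclusive index ranges as in the pseudocode and uses the standard library's `bisect_left` as the binary search, which is one reasonable choice rather than the lecture's own routine. The second recursive call starts the smaller subarray at q2, because T[q2] is at least T[q1] and has not yet been placed. The two recursive calls write disjoint parts of A and would be spawned in a fork-join runtime.

```python
from bisect import bisect_left

def par_merge(T, p1, r1, p2, r2, A, p3):
    """Merge sorted T[p1..r1] and T[p2..r2] (inclusive) into A[p3..]."""
    n1, n2 = r1 - p1 + 1, r2 - p2 + 1
    if n1 < n2:                      # make the first subarray the larger one
        p1, p2, r1, r2, n1, n2 = p2, p1, r2, r1, n2, n1
    if n1 == 0:                      # both subarrays are empty
        return
    q1 = (p1 + r1) // 2              # median position of the larger subarray
    # Smallest index q2 in [p2, r2 + 1] such that T[q2] >= T[q1].
    q2 = bisect_left(T, T[q1], p2, r2 + 1)
    q3 = p3 + (q1 - p1) + (q2 - p2)  # final position of T[q1] in A
    A[q3] = T[q1]
    par_merge(T, p1, q1 - 1, p2, q2 - 1, A, p3)      # would be spawned
    par_merge(T, q1 + 1, r1, q2, r2, A, q3 + 1)
    # sync would go here

T = [1, 4, 7, 9, 2, 3, 8]            # T[0..3] and T[4..6] are each sorted
A = [None] * len(T)
par_merge(T, 0, 3, 4, 6, A, 0)
print(A)  # [1, 2, 3, 4, 7, 8, 9]
```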
Parallel Merge
Let $n = n_1 + n_2$. We have $n_2 \le n_1$, and therefore $2 n_2 \le n_1 + n_2 = n$.
In the worst case, a recursive call in lines 10-11 of Par-Merge merges half the elements of $T[p_1 .. r_1]$ with all elements of $T[p_2 .. r_2]$.
Hence, the number of elements involved in such a call is
$$\left\lfloor \frac{n_1}{2} \right\rfloor + n_2 \le \frac{n_1}{2} + \frac{n_2}{2} + \frac{n_2}{2} = \frac{n_1 + n_2}{2} + \frac{2 n_2}{4} \le \frac{n}{2} + \frac{n}{4} = \frac{3n}{4}.$$
Parallel Merge
Analysis of Par-Merge:
Span: $T_\infty(n) = \Theta(1)$ if $n = 1$, and $T_\infty(n) = T_\infty(3n/4) + \Theta(\log n)$ otherwise. Hence $T_\infty(n) = \Theta(\log^2 n)$ [ MT Case 2 ].
Work: Clearly, $T_1(n) = \Omega(n)$. We show below that $T_1(n) = O(n)$.
For some $\alpha \in [1/4, 3/4]$, we have the following recurrence:
$$T_1(n) = T_1(\alpha n) + T_1((1 - \alpha) n) + O(\log n).$$
Assuming $T_1(n) \le c_1 n - c_2 \log n$ for positive constants $c_1$ and $c_2$, and substituting on the right-hand side of the above recurrence, gives us
$$T_1(n) \le c_1 n - c_2 \log n = O(n).$$
Hence, $T_1(n) = \Theta(n)$.
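The substitution step for the work bound can be written out as follows. This is a sketch of the standard argument; the constant $c_3$ (standing for the constant hidden in the $O(\log n)$ term) and the range $\alpha \in [1/4, 3/4]$ (which follows from the $3n/4$ bound on the size of a recursive call) are stated here as assumptions of the derivation.

```latex
\begin{align*}
T_1(n) &\le \bigl(c_1 \alpha n - c_2 \log(\alpha n)\bigr)
          + \bigl(c_1 (1-\alpha) n - c_2 \log((1-\alpha) n)\bigr)
          + c_3 \log n \\
       &= c_1 n - c_2 \log n
          - \bigl( (c_2 - c_3) \log n + c_2 \log(\alpha (1-\alpha)) \bigr) \\
       &\le c_1 n - c_2 \log n
          \qquad \text{whenever } (c_2 - c_3) \log n \ge c_2 \log \frac{1}{\alpha (1-\alpha)} .
\end{align*}
```

Since $\alpha(1-\alpha) \ge 3/16$ on the assumed range, the right-hand side of the condition is at most $c_2 \log(16/3)$, so choosing $c_2 > c_3$ makes the condition hold for all sufficiently large $n$; $c_1$ is then chosen large enough to cover the base cases.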
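Parallel Merge Sort with Parallel Merge

Par-Merge-Sort ( A, p, r )    { sort the elements in A[ p … r ] }
  if p < r then
    q ← ⌊ ( p + r ) / 2 ⌋
    spawn Par-Merge-Sort ( A, p, q )
    Par-Merge-Sort ( A, q + 1, r )
    sync
    Par-Merge ( A, p, q, r )

Work: $T_1(n) = \Theta(1)$ if $n = 1$, and $T_1(n) = 2\,T_1(n/2) + \Theta(n)$ otherwise. Hence $T_1(n) = \Theta(n \log n)$ [ MT Case 2 ].
Span: $T_\infty(n) = \Theta(1)$ if $n = 1$, and $T_\infty(n) = T_\infty(n/2) + \Theta(\log^2 n)$ otherwise. Hence $T_\infty(n) = \Theta(\log^3 n)$ [ MT Case 2 ].
Parallelism: $T_1(n) / T_\infty(n) = \Theta(n / \log^2 n)$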