Lecture 22: Multithreaded Algorithms CSCI 700 - Algorithms I Andrew Rosenberg

Last Time: Open Addressing Hashing

Today: Multithreading

Two Styles of Threading
Shared Memory: every thread can access the same memory (data).
Distributed Memory: each thread has its own partition of the memory.

Types of Parallelism
Nested Parallelism: call a method, but don't wait for it to return.

Types of Parallelism
Parallel Loops: just like a for loop, but the loop iterations run concurrently.

Parallelization keywords
spawn: start a new thread running
sync: wait here until all spawned threads finish
parallel: indicates a parallel loop

Fibonacci example for parallelism
Fib(n)
  if n <= 1 then
    return n
  else
    x = Fib(n-1)
    y = Fib(n-2)
    return x + y
  end if

Fibonacci example for parallelism
Fib(n)
  if n <= 1 then
    return n
  else
    x = spawn Fib(n-1)
    y = Fib(n-2)
    sync
    return x + y
  end if
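A runnable analogue of the spawn/sync pattern, sketched in C++ with std::async standing in for spawn and future::get() for sync. The serial cutoff is our practical addition (naively spawning a thread per call would swamp the useful work); it is not part of the lecture's pseudocode.

#include <future>
#include <iostream>

// Parallel Fibonacci: std::async plays the role of "spawn",
// and fut.get() plays the role of "sync".
long pfib(int n) {
    if (n <= 1) return n;
    // Our addition: below a cutoff, recurse serially so we don't
    // create far more threads than there is work to do.
    if (n < 25) return pfib(n - 1) + pfib(n - 2);
    auto fut = std::async(std::launch::async, pfib, n - 1);  // spawn
    long y = pfib(n - 2);  // runs concurrently with the spawned call
    long x = fut.get();    // sync: wait for the spawned call to finish
    return x + y;
}

int main() {
    std::cout << pfib(30) << "\n";  // prints 832040
}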

Recursion of Fib(6) [figure: the recursion tree of the Fib(6) computation]

Graphical example of P-Fib(4)
While the computation uses 17 commands, the critical path (the longest path from the initial strand to the final strand) is only 8 units long. (It can be found using a BFS.)

Some Terminology
Work (T_1) is the total amount of computation performed, i.e., the running time on a single processor.
Span (T_∞) is the length of the critical path.
The fastest a program can run on a multicore processor with P threads is bounded by T_P ≥ T_1/P.
The speedup is T_1/T_P. If the speedup is P, we have perfect linear speedup.
The parallelism of an algorithm is T_1/T_∞. This is the maximum possible speedup that an algorithm can achieve by adding more processors.
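A quick worked instance of these definitions, using the P-Fib(4) figures from the earlier slide (17 commands of work, a critical path of length 8):

\[ T_1 = 17, \qquad T_\infty = 8, \qquad \text{parallelism} = \frac{T_1}{T_\infty} = \frac{17}{8} \approx 2.1 \]

So no matter how many processors are available, this small computation cannot be sped up by more than about a factor of two.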

Analyzing multithreaded algorithms
Most of what we've done so far has been the analysis and optimization of work. Analyzing span is different. The total span of a parallel algorithm whose two components A and B run in parallel is max(T_∞(A), T_∞(B)). For P-Fib(n):
T_∞(n) = max(T_∞(n-1), T_∞(n-2)) + Θ(1)
       = T_∞(n-1) + Θ(1)
       = Θ(n)
Huge parallelism: T_1(n)/T_∞(n) = Θ(φ^n / n)

Parallel Loops
If the iterations of a loop don't depend on each other, they can run in parallel. For example, multiplying a matrix A by a vector x:
y_i = Σ_{j=1}^{n} A_{ij} x_j

y_i = Σ_{j=1}^{n} A_{ij} x_j
Mat-Vec(A, x)
  n = A.rows
  y = new vector with n cells
  for i = 1 to n do
    y_i = 0
  end for
  for i = 1 to n do
    for j = 1 to n do
      y_i = y_i + A_{ij} x_j
    end for
  end for

y_i = Σ_{j=1}^{n} A_{ij} x_j
Mat-Vec(A, x)
  n = A.rows
  y = new vector with n cells
  for parallel i = 1 to n do
    y_i = 0
  end for
  for parallel i = 1 to n do
    for j = 1 to n do
      y_i = y_i + A_{ij} x_j
    end for
  end for
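A hedged C++ rendering of the parallel outer loop (the name mat_vec is ours; this assumes a toolchain that supports the C++17 parallel algorithms). Each row's inner product is independent of the others, so the row loop may run concurrently while the inner dot product stays serial, mirroring the pseudocode above.

#include <algorithm>
#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

// Parallel y = A x: iteration i computes y_i, and the iterations
// are independent, so they are allowed to run in parallel.
std::vector<double> mat_vec(const std::vector<std::vector<double>>& A,
                            const std::vector<double>& x) {
    const std::size_t n = A.size();
    std::vector<double> y(n, 0.0);
    std::vector<std::size_t> rows(n);
    std::iota(rows.begin(), rows.end(), 0);  // rows = 0, 1, ..., n-1
    // "for parallel i": each iteration writes only its own y[i].
    std::for_each(std::execution::par, rows.begin(), rows.end(),
                  [&](std::size_t i) {
                      double sum = 0.0;
                      for (std::size_t j = 0; j < x.size(); ++j)
                          sum += A[i][j] * x[j];
                      y[i] = sum;  // distinct cell per iteration: no race
                  });
    return y;
}

int main() {
    std::vector<std::vector<double>> A = {{1, 2}, {3, 4}};
    std::vector<double> x = {1, 1};
    for (double v : mat_vec(A, x)) std::cout << v << " ";  // prints 3 7
}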

Race Conditions
Race()
  x = 0
  for parallel i = 1 to 2 do
    x = x + 1
  end for
  print x
We need to make sure that parallel threads are independent: no thread may write to memory that another thread is reading from or writing to.
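The same race written as real C++ (a sketch; the variable names are ours). Two threads increment a shared counter: with a plain int the updates can interleave and the result is unpredictable, while std::atomic<int> makes each increment indivisible.

#include <atomic>
#include <iostream>
#include <thread>

int main() {
    // Racy version: both threads read-modify-write x with no
    // coordination. This is a data race; the output may be 1 or 2.
    int x = 0;
    std::thread t1([&] { x = x + 1; });
    std::thread t2([&] { x = x + 1; });
    t1.join(); t2.join();
    std::cout << "racy x = " << x << "\n";

    // Fixed version: fetch_add is atomic, so the result is always 2.
    std::atomic<int> y{0};
    std::thread t3([&] { y.fetch_add(1); });
    std::thread t4([&] { y.fetch_add(1); });
    t3.join(); t4.join();
    std::cout << "atomic y = " << y.load() << "\n";
}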

Multithreaded Merge Sort
Merge-Sort(A, p, r)
  if p < r then
    q = ⌊(p + r)/2⌋
    spawn Merge-Sort(A, p, q)
    Merge-Sort(A, q + 1, r)
    sync
    Merge(A, p, q, r)
  end if
Spawn the first recursive call to Merge-Sort and run the second in the current thread, so the two halves are sorted in parallel. No problem.

Multithreaded Merge Sort
Work: MS_1(n) = 2 MS_1(n/2) + Θ(n) = Θ(n log n)
Span: MS_∞(n) = MS_∞(n/2) + Θ(n) = Θ(n)
Parallelism: MS_1(n)/MS_∞(n) = Θ(n log n)/Θ(n) = Θ(log n)
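Why the span recurrence solves to Θ(n), sketched by unrolling (our addition, not on the original slide): the per-level costs form a geometric series dominated by its first term.

\[ MS_\infty(n) = \Theta\Big(\sum_{k=0}^{\lg n} \frac{n}{2^k}\Big) = \Theta(2n) = \Theta(n) \]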

Parallel Merge
The serial Merge dominates the span. How can we parallelize merging?
Divide-and-conquer Merge:
Put the middle element, z, of the larger of the two lists in its correct position.
Recursively merge the subarrays containing elements smaller than z.
Recursively merge the subarrays containing elements greater than z.

Parallel Merge
P-Merge(T, p1, r1, p2, r2, A, p3)
  n1 = r1 - p1 + 1
  n2 = r2 - p2 + 1
  if n1 < n2 then
    swap the p's, r's, and n's  // ensure n1 >= n2
  end if
  if n1 == 0 then
    return
  else
    q1 = ⌊(p1 + r1)/2⌋
    q2 = Binary-Search(T[q1], T, p2, r2)
    q3 = p3 + (q1 - p1) + (q2 - p2)  // Where to put T[q1]
    A[q3] = T[q1]
    spawn P-Merge(T, p1, q1 - 1, p2, q2 - 1, A, p3)
    P-Merge(T, q1 + 1, r1, q2, r2, A, q3 + 1)
    sync
  end if

Parallel Merge Span
Identify the maximum number of elements that can appear in the larger of the two recursive calls to P-Merge. The worst case merges n1/2 elements (from the larger subarray) with all n2 elements (from the smaller subarray). Since n2 ≤ n1 implies n2 ≤ n/2:
n1/2 + n2 = n1/2 + n2/2 + n2/2 = (n1 + n2)/2 + n2/2 ≤ n/2 + n/4 = 3n/4
Each call also does Θ(log n) work for the binary search, so
PM_∞(n) = PM_∞(3n/4) + Θ(log n) = Θ(log² n)
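A sketch of why this recurrence gives Θ(log² n) (our unrolling, not on the original slide): the problem shrinks by a 3/4 factor each level, so there are Θ(log n) levels, each contributing a Θ(log n) binary-search term.

\[ PM_\infty(n) = \sum_{k=0}^{\log_{4/3} n} \Theta\Big(\log\big(\big(\tfrac{3}{4}\big)^k n\big)\Big) \le \Theta(\log n)\cdot\Theta(\log n) = \Theta(\log^2 n) \]

A matching lower bound follows since the first half of the levels each contribute Ω(log n).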

Parallel Merge Work
PM_1(n) = PM_1(αn) + PM_1((1 - α)n) + O(log n), where 1/4 ≤ α ≤ 3/4.
PM_1(n) is clearly Ω(n). One can show that PM_1(n) ≤ c1·n - c2·log n for suitable constants c1, c2, proving that PM_1(n) = O(n). Thus, PM_1(n) = Θ(n).

Parallel Merge Sort
P-Merge-Sort(A, p, r, B, s)
  n = r - p + 1
  if n == 1 then
    B[s] = A[p]
  else
    let T[1..n] be a new array
    q = ⌊(p + r)/2⌋
    q' = q - p + 1
    spawn P-Merge-Sort(A, p, q, T, 1)
    P-Merge-Sort(A, q + 1, r, T, q' + 1)
    sync
    P-Merge(T, 1, q', q' + 1, n, B, s)
  end if
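A runnable C++ sketch of P-Merge and P-Merge-Sort together (our translation: 0-based indices instead of the slides' 1-based ones, std::async for spawn, future::get() for sync, and std::lower_bound playing the role of Binary-Search).

#include <algorithm>
#include <future>
#include <iostream>
#include <vector>

// Merge sorted ranges T[p1..r1] and T[p2..r2] into A starting at p3.
void p_merge(std::vector<int>& T, int p1, int r1, int p2, int r2,
             std::vector<int>& A, int p3) {
    int n1 = r1 - p1 + 1, n2 = r2 - p2 + 1;
    if (n1 < n2) {  // ensure the first range is the larger one
        std::swap(p1, p2); std::swap(r1, r2); std::swap(n1, n2);
    }
    if (n1 == 0) return;  // both ranges empty
    int q1 = (p1 + r1) / 2;  // median of the larger range
    // Binary-Search: first index q2 in [p2, r2+1] with T[q1] <= T[q2].
    int q2 = (int)(std::lower_bound(T.begin() + p2, T.begin() + r2 + 1,
                                    T[q1]) - T.begin());
    int q3 = p3 + (q1 - p1) + (q2 - p2);  // where T[q1] lands in A
    A[q3] = T[q1];
    auto left = std::async(std::launch::async, p_merge, std::ref(T),
                           p1, q1 - 1, p2, q2 - 1, std::ref(A), p3);  // spawn
    p_merge(T, q1 + 1, r1, q2, r2, A, q3 + 1);
    left.get();  // sync
}

// Sort A[p..r] into B starting at position s.
void p_merge_sort(std::vector<int>& A, int p, int r,
                  std::vector<int>& B, int s) {
    int n = r - p + 1;
    if (n == 1) { B[s] = A[p]; return; }
    std::vector<int> T(n);
    int q = (p + r) / 2;
    int qq = q - p + 1;  // size of the left half (q' on the slide)
    auto left = std::async(std::launch::async, p_merge_sort, std::ref(A),
                           p, q, std::ref(T), 0);  // spawn
    p_merge_sort(A, q + 1, r, T, qq);
    left.get();  // sync
    p_merge(T, 0, qq - 1, qq, n - 1, B, s);
}

int main() {
    std::vector<int> a = {5, 2, 4, 6, 1, 3}, b(a.size());
    p_merge_sort(a, 0, (int)a.size() - 1, b, 0);
    for (int v : b) std::cout << v << " ";  // prints 1 2 3 4 5 6
}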

Parallel Merge Sort Work
PMergeSort_1(n) = 2 PMergeSort_1(n/2) + PM_1(n)
               = 2 PMergeSort_1(n/2) + Θ(n)
               = Θ(n log n)

Parallel Merge Sort Span
PMergeSort_∞(n) = PMergeSort_∞(n/2) + PM_∞(n)
               = PMergeSort_∞(n/2) + Θ(log² n)
               = Θ(log³ n)
Parallelism
PMergeSort_1(n)/PMergeSort_∞(n) = Θ(n log n)/Θ(log³ n) = Θ(n/log² n)
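Unrolling the span recurrence over its lg n levels makes the Θ(log³ n) bound concrete (our addition, not on the original slide):

\[ PMergeSort_\infty(n) = \sum_{k=0}^{\lg n} \Theta\Big(\log^2 \frac{n}{2^k}\Big) = \sum_{j=0}^{\lg n} \Theta(j^2) = \Theta(\log^3 n) \]

using the fact that the sum of j² for j up to m is Θ(m³).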

Bye. Next time: NP-Completeness