
ICS 643: Advanced Parallel Algorithms                                Fall 2016
Lecture 6, September 21, 2016
Prof. Nodari Sitchinava                              Scribe: Tiffany Eulalio

1 Overview

In the last lecture, we wrote a non-recursive summation program and showed that the analysis of its runtime is more intuitive than for the recursive summation program written previously. We also proved a theorem stating that a p-processor CRCW PRAM algorithm that runs in time t_p(n) can be implemented on a p-processor EREW PRAM in time t'_p(n) = O(t_p(n) \log p).

In this lecture, we introduce the parallel prefix-sums algorithm. We start with an overview of an iterative version of prefix-sums. We then present a parallel prefix-sums algorithm, followed by an inductive proof showing that this algorithm correctly computes prefix-sums.

2 Prefix-Sums Algorithm

Given an array A with n elements, the prefix-sums algorithm returns an array B such that

    B[i] = \sum_{k=0}^{i} A[k]    (1)

In other words, the element of B in position i is equal to the sum of all elements of A in positions 0 through i. So, if we have the array A = [1, 5, 3] as input, then

    B[0] = A[0] = 1
    B[1] = A[0] + A[1] = 1 + 5 = 6
    B[2] = A[0] + A[1] + A[2] = 1 + 5 + 3 = 9

and the resulting array is B = [1, 6, 9]. We can see, then, that the following algorithm iteratively computes prefix-sums.

Algorithm 1 PrefixSums(A[0 .. n-1])
    B[0] <- A[0]
    for i <- 1 to n-1 do
        B[i] <- B[i-1] + A[i]
    return B[0 .. n-1]

This algorithm iterates through the array, adding each element to the running sum and storing the new sum in its proper position in B along the way. We can see that this algorithm runs in linear time.
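As a point of reference, Algorithm 1 translates directly into a few lines of Python; this is our own sketch (the function name prefix_sums is ours, not from the lecture):

    # Sequential prefix sums (Algorithm 1): B[i] = A[0] + A[1] + ... + A[i].
    def prefix_sums(A):
        B = [0] * len(A)
        B[0] = A[0]
        for i in range(1, len(A)):
            B[i] = B[i - 1] + A[i]
        return B

    print(prefix_sums([1, 5, 3]))   # [1, 6, 9], matching the example above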

Although our iterative algorithm takes its input from array A and returns the prefix-sums in array B, it is also possible to write the prefix-sums directly into A, creating an in-place prefix-sums algorithm without the use of the additional array B (see Algorithm 2).

Algorithm 2 InPlacePrefixSums(A[0 .. n-1])
    for i <- 1 to n-1 do
        A[i] <- A[i-1] + A[i]
    return

2.1 Parallel Prefix-Sums

Prefix-sums inherently seems like an iterative problem, since each entry depends upon those that come before it. However, we want to be able to perform this task in parallel. We claim that the following in-place algorithm computes prefix-sums in parallel. For the purpose of analysis, we assume that the input array is conceptually extended to size 2n - 1, with index positions -(n-1) <= i <= n-1, where the elements in positions i < 0 are equal to zero.

Algorithm 3 ParallelPrefixSums(A[0 .. n-1])
    for j <- 0 to log n - 1 do
        for i <- 2^j to n-1 in parallel do
            A[i] <- A[i] + A[i - 2^j]
    return

We can see that the runtime of the inner for loop is constant, since the operation is done by all processors in parallel. The outer for loop runs from 0 to log n - 1, and so requires log n iterations. This tells us that our runtime is T(n) = O(log n).
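To make Algorithm 3 concrete, here is a minimal Python sketch that simulates the parallel rounds sequentially; the function name is ours, and we assume n is a power of two, as in the analysis. In a synchronous PRAM step all reads happen before all writes, so each simulated round reads from a snapshot of the array taken before the round:

    import math

    # Sequential simulation of Algorithm 3 (ParallelPrefixSums).
    def parallel_prefix_sums_sim(A):
        n = len(A)
        for j in range(int(math.log2(n))):      # rounds j = 0 .. log n - 1
            old = A[:]                          # values as they were before round j
            for i in range(2 ** j, n):          # the "in parallel" loop, run sequentially here
                A[i] = old[i] + old[i - 2 ** j]
        return A

    print(parallel_prefix_sums_sim([1, 5, 3, 7]))   # [1, 6, 9, 16]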

To calculate the work, we also need to count the iterations of the inner for loop. We know that i goes from 2^j to n-1, while the outer for loop goes from j = 0 to log n - 1. So we can describe the work as follows:

    W(n) = \sum_{j=0}^{\log n - 1} \sum_{i=2^j}^{n-1} 1 = \sum_{j=0}^{\log n - 1} (n - 2^j) = n \log n - \sum_{j=0}^{\log n - 1} 2^j

To evaluate the remaining sum, let us first prove the following:

Claim 1. \sum_{j=0}^{k-1} 2^j = 2^k - 1.

Proof. We'll prove this using induction.

Base case: k = 1. Then \sum_{j=0}^{0} 2^j = 2^0 = 1 = 2^1 - 1.

Inductive step: Assume the claim holds for some k >= 1 and let us prove it for k' = k + 1:

    \sum_{j=0}^{k'-1} 2^j = \sum_{j=0}^{k} 2^j = \sum_{j=0}^{k-1} 2^j + 2^k = (2^k - 1) + 2^k = 2 \cdot 2^k - 1 = 2^{k+1} - 1

Thus, we can say that \sum_{j=0}^{\log n - 1} 2^j = 2^{\log n} - 1 = n - 1.

Another way to understand this is by picturing a full binary tree. If a binary tree has n leaves, then the terms we are summing count the internal nodes of the tree, level by level. The number of internal nodes of a full binary tree is one less than the number of leaves, so in our case the number of internal nodes is n - 1. In Figure 1 below, the subtree of internal nodes is colored in red. Notice that the level right above the leaves has n/2 nodes; this matches the largest term of our sum, 2^{\log n - 1} = n/2.

Figure 1: Binary Tree and subtree

Now that we have figured out what the term \sum_{j=0}^{\log n - 1} 2^j equals, we can continue our analysis of the work done by the parallel prefix-sums algorithm, starting where we left off:

    W(n) = n \log n - \sum_{j=0}^{\log n - 1} 2^j = n \log n - (n - 1) = \Theta(n \log n)

So the work is equal to Θ(n log n).
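For concreteness, here is a small worked instance of the work bound (our own check, not from the lecture): with n = 8 the algorithm performs

    W(8) = (8 - 1) + (8 - 2) + (8 - 4) = 17 = 8 \cdot \log 8 - (8 - 1) = n \log n - (n - 1)

additions, matching the closed form above.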

2.2 Proof of Correctness of Parallel Prefix-Sums

We want to prove that the Parallel Prefix-Sums algorithm actually does what it is supposed to do, which is to calculate the correct prefix-sums and store them back into the appropriate positions of the array. First, let us begin by understanding what is happening in the algorithm, repeated here as Algorithm 4.

Algorithm 4 ParallelPrefixSums(A[0 .. n-1])
    for j <- 0 to log n - 1 do
        for i <- 2^j to n-1 in parallel do
            A[i] <- A[i] + A[i - 2^j]
    return

Let us say that we have an input array A of size n = 14 with all elements initialized to one (see Figure 2). In the first iteration, j = 0, and each element with index i from 2^j = 2^0 = 1 to n-1 gets the sum of its own value and that of the element to its left. On the next iteration, j = 1, and each element with 2 <= i <= n-1 gets the sum of its own value and that of the element 2 positions to its left. We continue this until j = log n - 1; in our case, this is when j = 3. See Figure 2 for a visualization of what is happening in this algorithm.

Figure 2: Parallel Prefix Sum
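The rows of Figure 2 can be regenerated with a few lines of Python (our own illustration, not from the lecture), printing the array after each round on the all-ones input of size 14:

    import math

    # Reproduce Figure 2: n = 14 ones, showing the array after each round j.
    A = [1] * 14
    n = len(A)
    for j in range(math.ceil(math.log2(n))):    # rounds j = 0 .. 3 for n = 14
        old = A[:]                              # snapshot: reads precede writes within a round
        for i in range(2 ** j, n):
            A[i] = old[i] + old[i - 2 ** j]
        print("after round", j, ":", A)
    # after round 0 : [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
    # after round 1 : [1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
    # after round 2 : [1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 8, 8, 8, 8]
    # after round 3 : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]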

We can now look at the algorithm in steps, as j is incremented. Notice that after step j completes, the first 2^{j+1} elements of array A hold correct prefix-sums values; in other words, before step j occurs, the first 2^j elements of array A hold correct prefix-sums values. After step 0 (j = 0) has occurred, the first two (2^{j+1} = 2^{0+1} = 2) elements hold correct prefix-sums values. We see this in Figure 2, where A[0] = 1 and A[1] = 2. Notice also that, after step j has completed, each of the remaining elements is set to the sum of the 2^{j+1} preceding elements, including itself. In our example, after step 0 the remaining elements A[2 .. 13] are all set to the value 2, which is the sum of two elements: the preceding one and itself.

We will now use these observations to prove the correctness of the parallel prefix-sums algorithm using induction.

Claim 2. Before each round j of the algorithm,

    A[i] = \sum_{k=i-(2^j-1)}^{i} A[k]

where the values A[k] on the right-hand side denote the original input values (with A[k] = 0 for k < 0).

We can run through the summation to get an understanding of what is going on. Let us use our example from Figure 2, but concern ourselves only with the first four elements. Remember, we stated earlier that the input array A is conceptually of size 2n - 1, with the elements at indices less than zero initialized to zero. We begin with j = 0. Then 2^j - 1 = 0, and for i = 0 the lower summation limit is k = 0 - 0 = 0. So the summation goes from zero to zero, and before round zero, A[0] = A[0]. We see that before round zero this is the case for all i, so that A[i] = A[i]. This makes sense, since no work has been done before round zero. The same can be done for additional values of j to verify that this summation makes sense. Table 1 below shows this for 0 <= i < 4 and j <= 2. Notice the general pattern emerging as j increases, and see how it matches the steps in Figure 2.

Table 1: Illustration of Claim 2 for the first few iterations (the last column gives the value on the all-ones input)

    j   2^j - 1   i   k = i - (2^j - 1)   A[i] before round j                    value
    0      0      0          0            A[0] = A[0]                              1
    0      0      1          1            A[1] = A[1]                              1
    0      0      2          2            A[2] = A[2]                              1
    0      0      3          3            A[3] = A[3]                              1
    1      1      0         -1            A[0] = A[-1] + A[0]                      1
    1      1      1          0            A[1] = A[0] + A[1]                       2
    1      1      2          1            A[2] = A[1] + A[2]                       2
    1      1      3          2            A[3] = A[2] + A[3]                       2
    2      3      0         -3            A[0] = A[-3] + A[-2] + A[-1] + A[0]      1
    2      3      1         -2            A[1] = A[-2] + A[-1] + A[0] + A[1]       2
    2      3      2         -1            A[2] = A[-1] + A[0] + A[1] + A[2]        3
    2      3      3          0            A[3] = A[0] + A[1] + A[2] + A[3]         4
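The same check can be run mechanically. The snippet below (our own verification, not part of the lecture) asserts that before every round j, each A[i] equals the sum of the 2^j original entries ending at position i, treating negative indices as zero, and that the final array holds the prefix sums:

    import math

    # Verify Claim 2 on a sample power-of-two-sized input.
    A0 = [1, 5, 3, 7, 2, 4, 6, 8]          # original input values (n = 8)
    A = A0[:]
    n = len(A)
    for j in range(int(math.log2(n))):
        for i in range(n):                  # check the claim *before* round j
            lo = i - (2 ** j - 1)
            expected = sum(A0[k] for k in range(lo, i + 1) if k >= 0)
            assert A[i] == expected, (j, i)
        old = A[:]                          # now perform round j (reads precede writes)
        for i in range(2 ** j, n):
            A[i] = old[i] + old[i - 2 ** j]
    assert A == [sum(A0[:i + 1]) for i in range(n)]   # final array = prefix sums
    print("Claim 2 verified; final A =", A)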

Now that we understand the claim being made, we can begin the inductive proof of correctness for the parallel prefix-sums algorithm.

Proof. Base case: j = 0. Before round zero,

    A[i] = \sum_{k=i-(2^0-1)}^{i} A[k] = \sum_{k=i}^{i} A[k] = A[i]

Thus, we see that the base case holds for j = 0: no work has been done yet, and every element still holds its original value.

Now we do the inductive step. First, we assume that the claim holds for some j, so that before the j-th round,

    A[i] = \sum_{k=i-(2^j-1)}^{i} A[k]

Then, looking back at the algorithm, we consider the state of the array elements before round j' = j + 1. From the algorithm, we know that the value stored at position i before round j' is A[i] + A[i - 2^j], where both terms refer to values from before round j. Let i' = i - 2^j. Then, by our inductive hypothesis, the new value at position i is

    A[i] + A[i'] = \sum_{k=i-(2^j-1)}^{i} A[k] + \sum_{k=i'-(2^j-1)}^{i'} A[k]

To prove our claim, we need to show that this equals \sum_{k=i-(2^{j'}-1)}^{i} A[k]. We can distribute the minus sign over (2^j - 1) in the first summation and replace i' with i - 2^j in the second summation:

    \sum_{k=i-2^j+1}^{i} A[k] + \sum_{k=i-2^j-(2^j-1)}^{i-2^j} A[k]

Now let us take a look at these summations separately. The first summation, \sum_{k=i-2^j+1}^{i} A[k], is equal to A[i-2^j+1] + ... + A[i-2] + A[i-1] + A[i]. What this means is that we are summing consecutive elements of the array, from position i-2^j+1 to position i. If we look at the second summation, \sum_{k=i-2^j-(2^j-1)}^{i-2^j} A[k], it is equal to A[i-2^j-(2^j-1)] + ... + A[i-2^j-1] + A[i-2^j]. This is also a sum of consecutive elements of the array. Notice that these two ranges are actually next to one another and form one long consecutive range (see Figure 3).

Figure 3: Consecutive summation elements

The first summation term is the latter half of the consecutive sequence, while the second summation term is the former half. We see that each sequence is a sum of 2^j elements. We can easily check this by subtracting the first index from the last and adding one, as we did earlier:

    i - (i - 2^j + 1) + 1 = i - i + 2^j - 1 + 1 = 2^j
    (i - 2^j) - (i - 2^j - (2^j - 1)) + 1 = i - 2^j - i + 2^j + 2^j - 1 + 1 = 2^j

Then we can see that, together, the combined sequence sums 2 \cdot 2^j = 2^{j+1} elements. Now that we know the sequences are consecutive, we can say that

    \sum_{k=i-2^j-(2^j-1)}^{i-2^j} A[k] + \sum_{k=i-2^j+1}^{i} A[k] = \sum_{k=i-2^{j+1}+1}^{i} A[k] = \sum_{k=i-(2^{j+1}-1)}^{i} A[k] = \sum_{k=i-(2^{j'}-1)}^{i} A[k]

which is exactly what Claim 2 requires before round j'.
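As a concrete instance (our own worked example, not from the lecture), take j = 1, j' = 2, and i = 5. The two halves of length 2^j = 2 sit side by side and merge into one range of 2^{j+1} = 4 consecutive elements:

    (A[2] + A[3]) + (A[4] + A[5]) = \sum_{k=2}^{5} A[k] = \sum_{k=5-(2^{j'}-1)}^{5} A[k]

which is the value Claim 2 predicts for A[5] before round 2.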

Thus, after round j = \log n - 1 completes (equivalently, before round j' = \log n), we have A[i] = \sum_{k=i-(2^{\log n}-1)}^{i} A[k] = \sum_{k=i-(n-1)}^{i} A[k] for every index 0 <= i <= n-1. Since A[k] = 0 for all k < 0, this sum equals \sum_{k=0}^{i} A[k], which is the definition of prefix-sums (Equation (1)).