Divide-and-conquer. Course 2015

The divide-and-conquer strategy:
1. Break the problem into smaller subproblems,
2. recursively solve each subproblem,
3. appropriately combine their answers.
Known examples: binary search, mergesort, quicksort, Strassen matrix multiplication. Julius Caesar (1st century BC): "Divide et impera". J. von Neumann (1903-57): mergesort.

Collaborative filtering. On many commercial websites, collaborative filtering is a technique that matches your preferences against those of other customers in order to guess which products to recommend to you. One way to do it: customers rank a given set of products (music, movies, novels), and your rankings are matched against similar rankings by other people. One way to quantify how similar two ordered lists are is to count the inversions between the two lists.

Counting inversions. Given n items (1, 2, ..., n), suppose A has a list of preferences L_A = (1, 2, ..., n) and B has a list L_B = (b_1, b_2, ..., b_n). We want to measure how similar (close) L_B is to L_A. Two items i, j form an inversion if i comes before j in L_A but after j in L_B. For example, consider L_A = (1, 2, 3, 4, 5, 6, 7, 8) and L_B = (1, 5, 3, 2, 7, 4, 8, 6): the number of inversions is 7, namely (5,3), (5,2), (5,4), (3,2), (7,4), (7,6), (8,6). For L_B = (8, 7, 6, 5, 4, 3, 2, 1) the number of inversions is 28: all pairs are inversions.
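The brute-force definition translates directly into code. A minimal Python sketch (not from the slides) that checks the two example counts above:

```python
def count_inversions_brute(L):
    """Count pairs (a, b) with a before b in L but L[a] > L[b]: O(n^2)."""
    n = len(L)
    return sum(1 for a in range(n) for b in range(a + 1, n)
               if L[a] > L[b])

print(count_inversions_brute([1, 5, 3, 2, 7, 4, 8, 6]))  # 7
print(count_inversions_brute([8, 7, 6, 5, 4, 3, 2, 1]))  # 28
```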

D&C for counting inversions. Brute-force algorithm: look at every pair (i, j) to see whether it is an inversion: (n choose 2) = n(n-1)/2 = O(n^2). D&C algorithm:
1. Divide: split the list into two halves.
2. Conquer: recursively count the inversions in each half.
3. Combine: count the inversions where i and j lie in different halves.
4. Return: the sum of the three quantities.
The strategy for step 3 is similar to the merge step of mergesort.

Combining the two halves E and D. Key idea: sort while counting, as in mergesort. Given the two subproblems E and D at the same level of the recursion, with E and D already sorted, scan E and D from left to right, comparing i in E with j in D:
- if i < j, then i is not inverted with any element remaining in D;
- if i > j, then j is inverted with every element remaining in E;
and append the smaller of the two to the sorted output list.
Complexity: T(n) = 2T(n/2) + O(n) = O(n lg n).

Example. Counting inversions in (1, 5, 3, 2, 7, 4, 8, 6): split into (1, 5, 3, 2) and (7, 4, 8, 6) and recurse. The left half contributes the inversions (5, 3), (5, 2), (3, 2); the right half contributes (7, 4), (7, 6), (8, 6); merging the two sorted halves (1, 2, 3, 5) and (4, 6, 7, 8) finds the split inversion (5, 4). Total inversions = 7.
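The merge-and-count step can be sketched in Python (a sketch of the idea above, not the authors' code):

```python
def count_inversions(L):
    """Divide and conquer: return (sorted list, #inversions) in O(n lg n)."""
    if len(L) <= 1:
        return L, 0
    mid = len(L) // 2
    E, inv_E = count_inversions(L[:mid])
    D, inv_D = count_inversions(L[mid:])
    merged, inv_split = [], 0
    i = j = 0
    while i < len(E) and j < len(D):
        if E[i] <= D[j]:
            merged.append(E[i]); i += 1
        else:
            # D[j] is inverted with every element remaining in E
            inv_split += len(E) - i
            merged.append(D[j]); j += 1
    merged += E[i:] + D[j:]
    return merged, inv_E + inv_D + inv_split

print(count_inversions([1, 5, 3, 2, 7, 4, 8, 6])[1])  # 7
```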

2D closest pair of points. Given n points in the plane, find a pair of points with the smallest Euclidean distance. Assumption: no two points have the same x-coordinate. Brute-force algorithm: compute the distance between every pair (i, j) and take the minimum: O(n^2). The one-dimensional version is very easy: sort by coordinate and compare consecutive points, O(n lg n). But the sorting method does not generalize to two (or more) dimensions. Why?

D&C for the 2D closest pair:
1. Divide: separate the plane by a vertical line L into two halves E (left) and D (right) with the same number of points (plus or minus 1).
2. Conquer: recursively find the minimal distance between pairs of points in each half.
3. Combine: take into consideration pairs of points (p, q) with p in E and q in D.
4. Return: the pair of points at minimal distance.

D&C algorithm. At each step:
Divide: sort the n points by x-coordinate; put n/2 of them to the left of L (into E) and n/2 to the right (into D). Cost: O(n lg n).
Conquer: recursively compute d_E and d_D and return d = min{d_E, d_D}. Cost: 2T(n/2).
Combine: there might be two points, one in E and the other in D, that are closer than d.

D&C algorithm: combine phase. Take a vertical band of width 2d around L. Any p in E and q in D such that d(p, q) <= d must lie in this band, so we can focus only on the points in the band; there could still be many other points inside it. To find the closest pair p, q in the band: sort the points in the band by increasing y-coordinate, Y = (y_1, y_2, ..., y_m). Cost: O(n lg n).

D&C algorithm: combine phase (cont.). Overlay the band with a grid of cells of side d/2. There is at most one point inside each (d/2) x (d/2) cell, since the diagonal of a cell has length d/sqrt(2) < d. Two points more than 2 cell rows apart are at distance > d, and by the same argument two points more than 2 cell columns apart are at distance > d. (For instance, two points separated by two cell rows and one cell column are at distance at least sqrt(5/4) d, about 1.118d > d.)

How many cells can a point influence? For every point in the sorted list Y = (y_1, y_2, ..., y_m), starting from y_1, we only have to compute the distance between y_i and the next 10 points in Y: if d(y_i, y_j) <= d then |i - j| <= 10. So for every point in the band we only compare against the 10 nearest points in Y, for a total cost of 10n = O(n).

Closest-pair algorithm:
Closest-Pair(p_1, ..., p_n)
  Sort the points by x-coordinate to compute L
  d_1 = Closest-Pair(E)
  d_2 = Closest-Pair(D)
  d = min{d_1, d_2}
  Delete the points at distance > d from L
  Sort the remaining points by y-coordinate to form the list Y
  Scan Y in order, computing the distance from each point to the next 11 elements
  If any of those distances is < d, update d
T(n) = 2T(n/2) + O(n lg n) = O(n lg^2 n). Do you know how to improve this to O(n lg n)?
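A minimal Python sketch of the O(n lg^2 n) version above (the y-sort is redone inside each combine step; the point set and the "next 11" window follow the slides, the rest is an assumed implementation):

```python
import math

def closest_pair(points):
    """Closest pair by D&C: sort by x, split at the line L, recurse,
    then scan the 2d-band in y-order comparing each point to the next 11."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def solve(P):
        if len(P) <= 3:  # base case: brute force
            return min(dist(p, q) for i, p in enumerate(P) for q in P[i + 1:])
        mid = len(P) // 2
        x_L = P[mid][0]                       # the dividing line L
        d = min(solve(P[:mid]), solve(P[mid:]))
        band = sorted((p for p in P if abs(p[0] - x_L) < d),
                      key=lambda p: p[1])     # band of width 2d, sorted by y
        for i, p in enumerate(band):
            for q in band[i + 1:i + 12]:      # next 11 points suffice
                d = min(d, dist(p, q))
        return d

    return solve(sorted(points))              # pre-sort by x-coordinate

print(closest_pair([(0, 0), (3, 4), (1, 1), (5, 5)]))  # 1.4142135623730951
```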

Random quicksort. Consider the function Ran-Partition:
Ran-Partition(A[p, ..., q])
  choose r = rand(p, q) u.a.r. and interchange A[p] and A[r]
  partition A[p, ..., q] around the pivot and return its final position r
Using Ran-Partition, consider the following randomized divide-and-conquer algorithm, on input A[1, ..., n]:
Ran-Quicksort(A[p, ..., q])
  if p < q then
    r = Ran-Partition(A[p, ..., q])
    Ran-Quicksort(A[p, ..., r - 1])
    Ran-Quicksort(A[r + 1, ..., q])
  end if
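The pseudocode above can be sketched in Python; the Lomuto-style partition body is an assumption (the slides leave the partition details implicit):

```python
import random

def ran_partition(A, p, q):
    """Pick a pivot u.a.r. from A[p..q], swap it to A[p], then partition."""
    r = random.randint(p, q)
    A[p], A[r] = A[r], A[p]
    pivot = A[p]
    i = p
    for j in range(p + 1, q + 1):
        if A[j] < pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[p], A[i] = A[i], A[p]   # pivot lands in its final position i
    return i

def ran_quicksort(A, p=0, q=None):
    """Sort A[p..q] in place using Ran-Partition."""
    if q is None:
        q = len(A) - 1
    if p < q:
        r = ran_partition(A, p, q)
        ran_quicksort(A, p, r - 1)
        ran_quicksort(A, r + 1, q)

A = [8, 3, 16, 1, 6, 12, 18, 5, 10, 14, 15, 17, 22, 20, 23]
ran_quicksort(A)
print(A)  # [1, 3, 5, 6, 8, 10, 12, 14, 15, 16, 17, 18, 20, 22, 23]
```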

Example. On input A = (8, 3, 16, 1, 6, 12, 18, 5, 10, 14, 15, 17, 22, 20, 23), repeated applications of Ran-Partition sort A into (1, 3, 5, 6, 8, 10, 12, 14, 15, 16, 17, 18, 20, 22, 23).

Expected complexity of Ran-Quicksort. The expected running time T(n) of Ran-Quicksort is dominated by the number of comparisons: every call to Ran-Partition on A[p, ..., q] has cost Theta(1) + Theta(number of comparisons in the call). So if we can count the number of comparisons, we can bound the total running time of quicksort. Let X be the number of comparisons made over all calls to Ran-Partition. X is a random variable, as it depends on the random choices made by Ran-Partition.

Expected complexity (cont.). Note: in the first application of Ran-Partition, the pivot A[r] is compared with all the other n - 1 elements. Key observation: two keys are compared iff one of them is a pivot, and they are compared at most once (for instance, once 16 is the pivot in the example above, 10 and 23 fall into different subarrays and are never compared). For simplicity assume all keys are distinct. For 1 <= i < j <= n, let Z_{i,j} = {z_i, z_{i+1}, ..., z_j} be the ordered set of keys between the i-th smallest, z_i, and the j-th smallest, z_j. Note |Z_{i,j}| = j - i + 1, so choosing a pivot u.a.r. within Z_{i,j} picks any given key with probability 1/|Z_{i,j}| = 1/(j - i + 1).

Define the indicator random variable: X_{i,j} = 1 if z_i is compared to z_j, and 0 otherwise. Then
X = sum_{i=1}^{n-1} sum_{j=i+1}^{n} X_{i,j}
(this is true because we never compare a pair more than once), so
E[X] = E[ sum_{i=1}^{n-1} sum_{j=i+1}^{n} X_{i,j} ] = sum_{i=1}^{n-1} sum_{j=i+1}^{n} E[X_{i,j}].
As E[X_{i,j}] = 0 * Pr[X_{i,j} = 0] + 1 * Pr[X_{i,j} = 1], we get
E[X_{i,j}] = Pr[X_{i,j} = 1] = Pr[z_i is compared to z_j].

End of the proof and main theorem.
E[X] = sum_{i=1}^{n-1} sum_{j=i+1}^{n} Pr[z_i is compared to z_j].
As z_i and z_j are compared iff one of them is chosen as a pivot (before any other element of Z_{i,j}),
Pr[X_{i,j} = 1] = Pr[z_i is pivot] + Pr[z_j is pivot].
Because pivots are chosen u.a.r. within Z_{i,j}: Pr[z_i is pivot] = Pr[z_j is pivot] = 1/(j - i + 1). Therefore
E[X] = sum_{i=1}^{n-1} sum_{j=i+1}^{n} 2/(j - i + 1).

E[X] = sum_{i=1}^{n-1} sum_{j=i+1}^{n} 2/(j - i + 1)
     = 2 sum_{i=1}^{n-1} (1/2 + 1/3 + ... + 1/(n - i + 1))
     < 2 sum_{i=1}^{n} (1/2 + 1/3 + ... + 1/n)
     <= 2 sum_{i=1}^{n} H_n = 2n H_n = O(n lg n).
Therefore E[X] = 2n ln n + Theta(n).
Theorem. The expected complexity of Ran-Quicksort is E[T(n)] = O(n lg n).
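The 2n ln n estimate can be checked empirically with a small simulation (my own sketch; it counts the n - 1 comparisons each pivot makes, exactly as in the analysis above, on a functional rather than in-place quicksort):

```python
import math
import random

def quicksort_comparisons(A):
    """Number of comparisons made by randomized quicksort on A."""
    if len(A) <= 1:
        return 0
    pivot = random.choice(A)
    left = [x for x in A if x < pivot]
    right = [x for x in A if x > pivot]
    # the pivot is compared with every other element: len(A) - 1 comparisons
    return len(A) - 1 + quicksort_comparisons(left) + quicksort_comparisons(right)

n, trials = 10_000, 20
avg = sum(quicksort_comparisons(list(range(n))) for _ in range(trials)) / trials
ratio = avg / (2 * n * math.log(n))
print(ratio)  # somewhat below 1, reflecting the negative Theta(n) correction
```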

Selection and order statistics. Problem: given a list A of n unordered distinct keys and an i in Z, 1 <= i <= n, select the element x in A that is larger than exactly i - 1 other elements in A. Notice that if:
1. i = 1: the MINIMUM element;
2. i = n: the MAXIMUM element;
3. i = ceil((n+1)/2): the MEDIAN;
4. i = 0.9n: an order statistic (the 90th percentile).
Easy solution: sort A (O(n lg n)) and return A[i] (Theta(1)). Can we do it in linear time? Yes: selection is easier than sorting.

Quick-Select. Given unordered A[1, ..., n], return the i-th smallest element.
Quick-Select(A[p, ..., q], i)
  r = Ran-Partition(A[p, ..., q])   (position of the pivot)
  if i = r then return A[r]
  if i < r then Quick-Select(A[p, ..., r - 1], i)
  else Quick-Select(A[r + 1, ..., q], i)
Example: search for i = 2 in A = (m, u, h, e, c, b, k, v). If Ran-Partition(1, 8) returns pivot position 3, rearranging A as, say, (c, b, e, h, u, v, k, m), then since i = 2 < 3 we recurse on A[1..2] = (c, b).

Quick-Select algorithm.
Quickselect(A[p, ..., q], i)
  if p = q then return A[p]
  else
    r = Ran-Partition(A[p, ..., q])
    k = r - p + 1   (rank of the pivot within A[p, ..., q])
    if i = k then return A[r]
    if i < k then return Quickselect(A[p, ..., r - 1], i)
    else return Quickselect(A[r + 1, ..., q], i - k)
  end if
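A compact Python sketch of Quickselect (functional rather than in-place, assuming distinct keys as the slides do):

```python
import random

def quickselect(A, i):
    """Return the i-th smallest element of A (1-indexed), expected O(n)."""
    assert 1 <= i <= len(A)
    pivot = random.choice(A)                 # Ran-Partition's random pivot
    smaller = [x for x in A if x < pivot]
    larger = [x for x in A if x > pivot]
    k = len(smaller) + 1                     # rank of the pivot
    if i == k:
        return pivot
    if i < k:
        return quickselect(smaller, i)
    return quickselect(larger, i - k)

print(quickselect([13, 21, 8, 5, 3, 2, 1, 10], 2))  # 2
```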

Analysis. Lucky case: at each recursive call the search space shrinks to 9/10 of its size; then T(n) <= T(9n/10) + Theta(n) = Theta(n). Unlucky case: T(n) = T(n - 1) + Theta(n) = Theta(n^2); in this case it is worse than sorting!
Theorem. Given A[1, ..., n] and i, the expected number of steps for Quick-Select to find the i-th smallest element in A is O(n).

Proof. Given A[1, ..., n], let T(n) be a random variable counting the number of steps of Quick-Select. Ran-Partition returns a pivot of rank k with probability 1/n. Define the indicator random variable: X_k = 1 if the pivot has rank k, and 0 otherwise. Therefore E[X_k] = 1/n. To get an upper bound on E[T(n)], assume the desired i-th element always falls in the larger side of the partition. When X_k = 1 we have subarrays of sizes k - 1 and n - k, so we get the recurrence
T(n) <= sum_{k=1}^{n} X_k T(max{k - 1, n - k}) + O(n).

Proof (cont.)
E[T(n)] <= E[ sum_{k=1}^{n} X_k T(max{k - 1, n - k}) ] + O(n)
        = sum_{k=1}^{n} E[X_k T(max{k - 1, n - k})] + O(n)
        = sum_{k=1}^{n} E[X_k] E[T(max{k - 1, n - k})] + O(n)   (by independence)
        = (1/n) sum_{k=1}^{n} E[T(max{k - 1, n - k})] + O(n).
Notice that max{k - 1, n - k} = k - 1 if k > n/2, and n - k otherwise, so every size appears at most twice and
E[T(n)] <= (2/n) sum_{k=floor(n/2)}^{n-1} E[T(k)] + O(n) = O(n).

Deterministic linear selection. Generate deterministically a good split element x. Divide the n elements into ceil(n/5) groups, each with 5 elements (plus possibly one group with fewer than 5 elements).

Deterministic linear selection. Sort each group to find its median, say x_i. (Each group needs at most 6 comparisons, i.e. Theta(1).) Total: 6 ceil(n/5) = O(n).

Deterministic linear selection. Use Select recursively to find the median x of the medians {x_i}, 1 <= i <= ceil(n/5). Use the deterministic Partition of quicksort to rearrange the elements around x, in linear time.

Deterministic linear selection. At least 3 ceil((1/2) ceil(n/5)), roughly 3n/10, of the elements are >= x.

Deterministic linear selection. Symmetrically, at least 3n/10 of the elements are <= x.

The deterministic algorithm.
Select(A, i)
1. Divide the n elements into ceil(n/5) groups of 5.
2. Find the median of each group by insertion sort, taking the middle element.
3. Use Select recursively to find the median x of the ceil(n/5) medians.
4. Use Partition to place x; let k = rank of x.
5. if i = k then return x
   else if i < k then use Select recursively to find the i-th smallest in the left part
   else use Select recursively to find the (i - k)-th smallest in the right part
   end if
Notice that steps 4 and 5 are the same as in Quickselect.
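The five steps above can be sketched in Python (a functional sketch assuming distinct keys; the slides' in-place Partition is replaced by list filtering):

```python
def select(A, i):
    """Median-of-medians: i-th smallest element (1-indexed), worst-case O(n)."""
    if len(A) <= 5:
        return sorted(A)[i - 1]
    # Steps 1-2: medians of the ceil(n/5) groups of 5
    medians = [sorted(A[j:j + 5])[len(A[j:j + 5]) // 2]
               for j in range(0, len(A), 5)]
    # Step 3: median of the medians, found recursively
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x; k = rank of x
    smaller = [y for y in A if y < x]
    larger = [y for y in A if y > x]
    k = len(smaller) + 1
    # Step 5: recurse on the side containing the i-th element
    if i == k:
        return x
    if i < k:
        return select(smaller, i)
    return select(larger, i - k)

A = [3, 13, 9, 4, 5, 1, 15, 12, 10, 2, 6, 14, 8, 11, 17]
print(select(A, 7))  # 8, the 7th smallest
```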

Example. Get the median (i = floor(n/2) = 7) of the following input: 3 13 9 4 5 | 1 15 12 10 2 | 6 14 8 11 17. Sorting each group of 5 gives 3 4 5 9 13 | 1 2 10 12 15 | 6 8 11 14 17, with medians 5, 10, 11, whose median is x = 10. PARTITION around 10: 3 4 5 9 1 2 6 8 | 10 | 13 12 15 11 14 17, so k = 9. To get the 7th element (the median), since 7 < 9, call SELECT recursively on the left part.

Worst-case analysis. As at least 3n/10 of the elements are >= x, and at least 3n/10 of the elements are < x, in the worst case step 5 calls Select recursively on at most 7n/10 elements. Steps 1, 2 and 4 take O(n) time; step 3 takes time T(n/5) and step 5 takes time T(7n/10). So we have
T(n) = Theta(1) if n <= 50, and T(n) <= T(n/5) + T(7n/10) + Theta(n) if n > 50.
Therefore T(n) = Theta(n).

Notice: if we make groups of 7, the number of elements >= x is at least 2n/7, which yields T(n) <= T(n/7) + T(5n/7) + O(n), with solution T(n) = O(n). However, if we make groups of 3, then T(n) <= T(n/3) + T(2n/3) + O(n), which has solution T(n) = O(n lg n).

Conclusions. Starting from a randomized algorithm, we removed the randomization to get a fast deterministic algorithm for selection. From the practical point of view, however, the deterministic algorithm is slow: use Quickselect.