Divide-and-conquer, Course 2015
The divide-and-conquer strategy:
1. Break the problem into smaller subproblems,
2. recursively solve each subproblem,
3. appropriately combine their answers.
Known examples: binary search, Mergesort, Quicksort, Strassen matrix multiplication.
Julius Caesar (1st century BC): Divide et impera. J. von Neumann (1903-57): Merge sort.
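As a concrete reference for the three steps, here is a minimal Python sketch of Mergesort. The function name `merge_sort` is ours and is not taken from the course.

```python
def merge_sort(a):
    """Return a new list with the elements of a in non-decreasing order."""
    if len(a) <= 1:                 # base case: nothing to split
        return list(a)
    mid = len(a) // 2
    left = merge_sort(a[:mid])      # 1. break, 2. solve each half recursively
    right = merge_sort(a[mid:])
    merged, i, j = [], 0, 0         # 3. combine: merge the two sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])         # at most one of these is non-empty
    merged.extend(right[j:])
    return merged
```

The combine step does linear work, which gives $T(n) = 2T(n/2) + O(n) = O(n \lg n)$.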
Collaborative Filtering
Many commercial websites use collaborative filtering to match your preferences with those of other customers, in order to guess which products they should recommend to you. One way to do it is to collect rankings of a given set of products (music, movies, novels) and match your ranking with similar rankings by other people. One way to quantify how similar two ordered lists are is to count the inversions between them.
Counting inversions
Given n items 1, 2, ..., n, suppose A has the list of preferences $L_A = (1, 2, \dots, n)$ and B has the list $L_B = (b_1, b_2, \dots, b_n)$. We want to measure how similar (close) $L_B$ is to $L_A$. Two items i, j form an inversion if i comes before j in $L_A$ but after j in $L_B$. Since $L_A$ is in increasing order, this means i < j but j appears before i in $L_B$.
For example, consider $L_A = (1, 2, 3, 4, 5, 6, 7, 8)$ and $L_B = (1, 5, 3, 2, 7, 4, 8, 6)$. The number of inversions is 7: (5,3), (5,2), (5,4), (3,2), (7,4), (7,6), (8,6).
For $L_B = (8, 7, 6, 5, 4, 3, 2, 1)$ all pairs are inversions, so the number of inversions is $\binom{8}{2} = 28$.
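A minimal Python sketch of the definition, assuming $L_A = (1, \dots, n)$. Under that assumption an inversion is simply a pair that appears in decreasing order in $L_B$. The sketch checks all $\binom{n}{2}$ pairs, so it is the brute-force algorithm. The name `inversion_pairs` is hypothetical.

```python
from itertools import combinations

def inversion_pairs(lb):
    """All pairs (x, y) with x placed before y in lb but x > y.

    Assumes L_A = (1, ..., n), so increasing order is the reference order.
    combinations() keeps the order of lb, so x always comes before y.
    """
    return [(x, y) for x, y in combinations(lb, 2) if x > y]
```

On $L_B = (1, 5, 3, 2, 7, 4, 8, 6)$ it returns the 7 pairs listed above. On the reversed list it returns all 28 pairs.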
D&C for counting inversions
Brute-force algorithm: look at every pair (i, j) to see if it is an inversion: $\binom{n}{2} = n(n-1)/2 = O(n^2)$ pairs.
D&C algorithm:
1. Divide: split the list into two halves.
2. Conquer: recursively count the inversions in each half.
3. Combine: count the inversions (i, j) where i and j are in different halves.
4. Return: the sum of the 3 quantities.
The strategy for step 3 will be similar to Mergesort.
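Combine two halves E (left) and D (right)
Key idea: sort while counting, as in Mergesort. Take the subproblems E and D at the same level of the recursion, with E and D already sorted. Scan E and D from left to right, compare the current elements i ∈ E and j ∈ D, and append the smaller one to the sorted list:
- if i < j, append i: i is inverted with exactly the elements of D already appended, because they are smaller than i and came after it;
- if i > j, append j: j is inverted with i and with every element still remaining in E, and these pairs are counted when those elements of E are appended.
Complexity: $T(n) = 2T(n/2) + O(n) = O(n \lg n)$.
Example: merge E = (1, 3, 6, 8) and D = (2, 4, 5, 7) into (1, 2, 3, 4, 5, 6, 7, 8). Appending 3 adds 1 (the pair (3,2)), appending 6 adds 3, and appending 8 adds 4. That gives 8 inversions between the two halves.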
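A minimal Python sketch of the D&C algorithm that uses this combine rule: each element of E, when appended, is charged with the elements of D already appended. The name `sort_and_count` is hypothetical. Ties are taken from E first, so equal keys are never counted as an inversion.

```python
def sort_and_count(a):
    """Return (sorted copy of a, number of inversions in a)."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, inv_left = sort_and_count(a[:mid])     # E, sorted, with its inversions
    right, inv_right = sort_and_count(a[mid:])   # D, sorted, with its inversions
    merged, cross, i, j = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            cross += j          # left[i] is inverted with the j elements of D already merged
            i += 1
        else:
            merged.append(right[j])
            j += 1
    for x in left[i:]:          # remaining elements of E exceed all of D
        merged.append(x)
        cross += len(right)
    merged.extend(right[j:])
    return merged, inv_left + inv_right + cross
```

The merge only adds a counter to Mergesort, so the running time stays $O(n \lg n)$.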
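Example on $L_B = (1, 5, 3, 2, 7, 4, 8, 6)$: the recursion splits the list down to the pairs (1,5), (3,2), (7,4), (8,6).
- Merging inside the pairs gives (1,5), (2,3), (4,7), (6,8) and finds the inversions (3,2), (7,4), (8,6).
- Merging into (1,2,3,5) and (4,6,7,8) finds (5,2), (5,3), (7,6).
- The final merge into (1,2,3,4,5,6,7,8) finds (5,4).
Total inversions = 7.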
2D closest pair of points
Given n points in the plane, find a pair of points with the smallest Euclidean distance. Assumption: no two points have the same x-coordinate.
Brute-force algorithm: compute the distance between every pair (i, j) and keep the smallest: $O(n^2)$.
In one dimension it is very easy: sort the points by coordinate and compare consecutive points, $O(n \lg n)$. But this sorting method does not generalize to dimension 2 or higher. Why?
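A minimal Python sketch of the two easy cases. The first is the $O(n^2)$ brute force in the plane, with points given as (x, y) tuples. The second is the one-dimensional sort-and-scan, which works because in 1D the closest pair is always adjacent after sorting. Both function names are hypothetical.

```python
import math
from itertools import combinations

def closest_pair_brute(points):
    """O(n^2): smallest distance over all pairs of 2D points (tuples)."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))

def closest_pair_1d(xs):
    """O(n lg n) in one dimension: after sorting, the closest pair is adjacent."""
    s = sorted(xs)
    return min(b - a for a, b in zip(s, s[1:]))
```

In 2D, neighbours in x-order (or in y-order) need not be the closest pair, which is why the 1D trick does not carry over.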
D&C for the 2D closest pair of points
1. Divide: split the plane by a vertical line L into two halves E and D with the same number of points (±1).
2. Conquer: recursively find the minimal distance between pairs of points in each half.
3. Combine: take into consideration the pairs of points (p, q) with p ∈ E and q ∈ D.
4. Return: the pair of points at minimal distance.
[Figure: recursive calls split each half again by vertical lines L1, L2, L3.]
D&C algorithm
At each step:
Divide: sort the n points by their x-coordinate. Put the first n/2 to the left of L (E) and the other n/2 to the right of L (D): $O(n \lg n)$.
Conquer: return $d = \min\{d_E, d_D\}$: $2T(n/2)$.
Combine: there might be two points, one in E and the other in D, that are closer than d.
D&C algorithm: Combine phase
Take a vertical band of width 2d centred on L. Any p ∈ E and q ∈ D with d(p, q) < d must lie in this band, since their x-coordinates differ by less than d. There could still be many other points inside the band, but we focus only on the points in the band.
To find the closest p, q in this band, sort the points in the band by increasing y-coordinate: $Y = (y_1, y_2, \dots, y_m)$. Cost: $O(n \lg n)$.
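D&C algorithm: Combine phase
Divide the band into a grid of square cells of side d/2. This gives 4 columns, 2 on each side of L.
- There is at most 1 point inside each $\frac{d}{2} \times \frac{d}{2}$ cell. The diagonal of a cell is $\frac{d}{2}\sqrt{2} = \frac{d}{\sqrt{2}} < d$, each cell lies on one side of L, and on each side all pairs of points are at distance at least d.
- Two points with at least 2 full rows of cells between them have distance at least $2 \cdot \frac{d}{2} = d$. Two points in two consecutive cells can be up to $\sqrt{5/4}\,d \approx 1.118d$ apart, so neighbouring cells cannot be discarded.
- The same argument applies to columns.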
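How many squares can a point influence?
Take the points of the sorted list $Y = (y_1, y_2, \dots, y_m)$ in order, starting from $y_1$. For each $y_i$ we only have to compute its distance to the next 11 points in Y. Any point whose y-coordinate exceeds that of $y_i$ by less than d lies in the row of cells of $y_i$ or in one of the next two rows. These 3 rows of 4 cells hold at most 12 points, $y_i$ included. Hence $d(y_i, y_j) < d \Rightarrow |i - j| \le 11$.
So each point of the band is compared with at most 11 points of Y, for a total cost of at most $11m = O(n)$.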
Closest-Pair algorithm:
Closest-Pair($p_1, \dots, p_n$)
 Sort the points by x-coordinate to compute L
 $d_1$ = Closest-Pair(E)
 $d_2$ = Closest-Pair(D)
 $d = \min\{d_1, d_2\}$
 Delete the points at distance > d from L
 Sort the remaining points by y-coordinate to form the list Y
 Scan Y in order, computing the distance from each point to the next 11 points
 If any of those distances is < d, update d
$T(n) = 2T(n/2) + O(n \lg n) = O(n \lg^2 n)$.
Do you know how to improve it to $O(n \lg n)$?
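A minimal Python sketch that follows Closest-Pair step by step and returns only the distance d. It assumes at least 2 points, given as (x, y) tuples with distinct x-coordinates. The names `closest_pair` and `_closest` are hypothetical. Like the pseudocode, it re-sorts the band by y at every call, so it matches the $O(n \lg^2 n)$ analysis.

```python
import math
from itertools import combinations

def closest_pair(points):
    """Smallest Euclidean distance among >= 2 points with distinct x-coordinates."""
    return _closest(sorted(points))            # sort by x-coordinate once

def _closest(px):
    n = len(px)
    if n <= 3:                                  # base case: brute force
        return min(math.dist(p, q) for p, q in combinations(px, 2))
    mid = n // 2
    x_line = px[mid][0]                         # the vertical line L
    d = min(_closest(px[:mid]), _closest(px[mid:]))   # halves E and D
    # Keep only the points of the band of width 2d around L ...
    band = [p for p in px if abs(p[0] - x_line) < d]
    # ... sorted by y-coordinate (the list Y)
    band.sort(key=lambda p: p[1])
    for i, p in enumerate(band):
        for q in band[i + 1:i + 12]:            # the next 11 points of Y
            d = min(d, math.dist(p, q))
    return d
```

The base case stops at 3 points, so that every recursive half keeps at least 2 points.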
Random-Quicksort
Consider the function Ran-Partition:
Ran-Partition(A[p, ..., q])
 r = rand(p, q), chosen u.a.r.
 interchange A[p] and A[r]
 partition A[p, ..., q] around the pivot A[p] (as in the deterministic Partition) and return its final position
Using Ran-Partition, consider the following randomized divide-and-conquer algorithm, on input A[1, ..., n]:
Ran-Quicksort(A[p, ..., q])
 if p < q then
  r = Ran-Partition(A[p, ..., q])
  Ran-Quicksort(A[p, ..., r - 1])
  Ran-Quicksort(A[r + 1, ..., q])
 end if
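A minimal Python sketch of Ran-Partition and Ran-Quicksort, 0-indexed. The slide does not fix the partition scheme, so the Lomuto-style scan around the pivot moved to A[p] is our assumption. So is the optional `rng` argument, which only makes runs reproducible.

```python
import random

def ran_partition(a, p, q, rng):
    """Pick a pivot u.a.r. in a[p..q], move it to a[p], partition, return its final index."""
    r = rng.randint(p, q)           # inclusive on both ends
    a[p], a[r] = a[r], a[p]
    pivot, i = a[p], p              # invariant: a[p+1..i] < pivot
    for j in range(p + 1, q + 1):   # q - p comparisons with the pivot
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]         # put the pivot in its final position
    return i

def ran_quicksort(a, p=0, q=None, rng=random):
    """Sort a[p..q] in place and return a."""
    if q is None:
        q = len(a) - 1
    if p < q:
        r = ran_partition(a, p, q, rng)
        ran_quicksort(a, p, r - 1, rng)
        ran_quicksort(a, r + 1, q, rng)
    return a
```

For example, `ran_quicksort([8, 3, 16, 1, 6], rng=random.Random(0))` sorts the list in place.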
Example: A = {1, 3, 5, 6, 8, 10, 12, 14, 15, 16, 17, 18, 20, 22, 23}.
[Figure: the tree of pivots chosen by successive calls of Ran-Partition on this input, with 8 as the first pivot.]
Expected Complexity of Ran-Partition
The expected running time T(n) of Ran-Quicksort is dominated by the number of comparisons. Every call to Ran-Partition on A[p, ..., q] costs Θ(1) + Θ(number of comparisons), and it makes q - p comparisons. If we can count the comparisons, we can bound the total time of Ran-Quicksort.
Let X be the number of comparisons made in all calls of Ran-Quicksort. X is a r.v., as it depends on the random choices of Ran-Partition.
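Expected Complexity of Ran-Partition
Note: in the first application of Ran-Partition, A[r] is compared with all the other n - 1 elements.
Key observation: two keys are compared iff one of them is the pivot while both are still in the same subarray. They are compared at most once, because a pivot is never compared again after its partition. In particular, keys that end up on different sides of a pivot are never compared.
For simplicity, assume all keys are different, and let $z_1 < z_2 < \dots < z_n$ be the keys in increasing order. For $1 \le i < j \le n$, let $Z_{i,j} = \{z_i, z_{i+1}, \dots, z_j\}$. Note $|Z_{i,j}| = j - i + 1$.
Until some key of $Z_{i,j}$ is chosen as pivot, all of $Z_{i,j}$ stays in the same subarray. Since pivots are chosen u.a.r., each key of $Z_{i,j}$ is the first pivot chosen from $Z_{i,j}$ with probability $\frac{1}{|Z_{i,j}|} = \frac{1}{j-i+1}$.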
Define the indicator r.v. $X_{i,j} = 1$ if $z_i$ is compared to $z_j$, and $X_{i,j} = 0$ otherwise. Then
$X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{i,j}$
(this is true because we never compare a pair more than once). By linearity of expectation,
$E[X] = E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{i,j}\right] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{i,j}]$.
As $E[X_{i,j}] = 0 \cdot \Pr[X_{i,j} = 0] + 1 \cdot \Pr[X_{i,j} = 1]$,
$E[X_{i,j}] = \Pr[X_{i,j} = 1] = \Pr[z_i \text{ is compared to } z_j]$.
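End of the proof and main theorem
$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{i,j}] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr[z_i \text{ is compared to } z_j]$.
$z_i$ and $z_j$ are compared iff one of them is the first pivot chosen from $Z_{i,j}$. Otherwise that first pivot lies strictly between them and separates them. Hence
$\Pr[X_{i,j} = 1] = \Pr[z_i \text{ is the first pivot from } Z_{i,j}] + \Pr[z_j \text{ is the first pivot from } Z_{i,j}]$.
Because pivots are chosen u.a.r. in $Z_{i,j}$, each of these probabilities is $\frac{1}{j-i+1}$. Therefore:
$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}$.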
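$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1} = 2\sum_{i=1}^{n-1}\left(\frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n-i+1}\right) < 2\sum_{i=1}^{n}\left(\frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n}\right) < 2\sum_{i=1}^{n} H_n = 2nH_n = O(n \lg n)$,
where $H_n = 1 + \frac{1}{2} + \dots + \frac{1}{n} = \ln n + O(1)$. A more careful computation gives $E[X] = 2n \ln n + \Theta(n)$.
Theorem: The expected complexity of Ran-Quicksort is $E[T(n)] = O(n \lg n)$.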
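A small Python check of the derivation. It evaluates the double sum $\sum_{i<j} 2/(j-i+1)$ exactly with `fractions.Fraction`. It then compares the result with the closed form $2(n+1)H_n - 4n$ and with the bound $2nH_n$. The closed form is our addition, not on the slide: it follows by grouping the $n-k+1$ pairs with $j - i + 1 = k$ for each k.

```python
from fractions import Fraction

def expected_comparisons(n):
    """Exact E[X] = sum over 1 <= i < j <= n of 2/(j - i + 1)."""
    return sum((Fraction(2, j - i + 1)
                for i in range(1, n) for j in range(i + 1, n + 1)), Fraction(0))

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def closed_form(n):
    """2(n+1)H_n - 4n, from grouping the pairs by k = j - i + 1."""
    return 2 * (n + 1) * harmonic(n) - 4 * n

for n in (10, 100):
    print(n, float(expected_comparisons(n)), float(2 * n * harmonic(n)))
```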
Selection and order statistics
Problem: given a list A of n unordered distinct keys and an integer i, 1 ≤ i ≤ n, select the element x ∈ A that is larger than exactly i - 1 other elements of A.
Notice that if:
1. i = 1: the MINIMUM element,
2. i = n: the MAXIMUM element,
3. $i = \lfloor (n+1)/2 \rfloor$: the MEDIAN,
4. i = 0.9n: a general order statistic.
We can sort A in $O(n \lg n)$ and then take A[i]. Can we do it in linear time? Yes: selection is easier than sorting.
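Quick-Select
Given an unordered A[1, ..., n], return its i-th element:
Quick-Select(A[p, ..., q], i)
 r = Ran-Partition(A[p, ..., q]) to find the position of the pivot
 if i = r return A[r]
 if i < r then Quick-Select(A[p, ..., r - 1], i)
 else Quick-Select(A[r + 1, ..., q], i)
Here i is a position in the whole array A[1, ..., n].
Example: search for i = 2 in A = (m, u, h, e, c, b, k, v). After Ran-Partition(A[1, ..., 8]) the array is (e, c, b, h, u, v, k, m), with the pivot h in position 4. Since 2 < 4, the search continues in A[1, ..., 3].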
Quick-Select algorithm
Quickselect(A[p, ..., q], i)
 if p = q then return A[p]
 else
  r = Ran-Partition(A[p, ..., q])
  k = r - p + 1
  if i = k then return A[r]
  if i < k then return Quickselect(A[p, ..., r - 1], i)
  else return Quickselect(A[r + 1, ..., q], i - k)
 end if
Here i is the rank inside A[p, ..., q] and k is the rank of the pivot.
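A minimal Python sketch of Quickselect. Arrays are 0-indexed internally, while i is the 1-based rank used on the slide. A loop replaces the tail recursion. The Lomuto-style partition and the `rng` argument are our assumptions, as in any standalone sketch of Ran-Partition.

```python
import random

def ran_partition(a, p, q, rng):
    """Random pivot moved to a[p], partition a[p..q], return the pivot's final index."""
    r = rng.randint(p, q)
    a[p], a[r] = a[r], a[p]
    pivot, i = a[p], p
    for j in range(p + 1, q + 1):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i

def quickselect(a, i, rng=random):
    """Return the i-th smallest key of a (1 <= i <= len(a)); a is reordered in place."""
    p, q = 0, len(a) - 1
    while True:
        if p == q:
            return a[p]
        r = ran_partition(a, p, q, rng)
        k = r - p + 1               # rank of the pivot inside a[p..q]
        if i == k:
            return a[r]
        if i < k:                   # continue in the left part
            q = r - 1
        else:                       # continue in the right part, shifting the rank
            p, i = r + 1, i - k
```

On the example above, `quickselect(list("muhecbkv"), 2)` returns `'c'`, whatever pivots are drawn.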
Analysis
Lucky: at each recursive call the search space is reduced to at most 9/10 of its size. Then $T(n) \le T(9n/10) + \Theta(n) = \Theta(n)$.
Unlucky: $T(n) = T(n-1) + \Theta(n) = \Theta(n^2)$. In this case it is worse than sorting!
Theorem: Given A[1, ..., n] and i, the expected number of steps for Quick-Select to find the i-th element of A is O(n).
Proof
Given A[1, ..., n], let T(n) be the r.v. counting the number of steps of Quick-Select to find the i-th element. Ran-Partition(A) chooses the k-th smallest element as pivot with probability 1/n. Define the indicator r.v.:
$X_k = 1$ if the subarray A[p, ..., r] has exactly k elements (the pivot is the k-th smallest), and $X_k = 0$ otherwise.
Therefore $E[X_k] = 1/n$.
To get an upper bound on E[T(n)], assume the desired i-th element always falls in the larger side of the partition. When $X_k = 1$ we have subarrays of sizes k - 1 and n - k. We get the recurrence
$T(n) \le \sum_{k=1}^{n} X_k \, T(\max\{k-1, n-k\}) + O(n)$.
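Proof (cont.)
$E[T(n)] \le E\left[\sum_{k=1}^{n} X_k \, T(\max\{k-1, n-k\}) + O(n)\right]$
$= \sum_{k=1}^{n} E[X_k \, T(\max\{k-1, n-k\})] + O(n)$
$= \sum_{k=1}^{n} E[X_k] \, E[T(\max\{k-1, n-k\})] + O(n)$ (why? $X_k$ is independent of the random choices made in the recursive call)
$= \frac{1}{n} \sum_{k=1}^{n} E[T(\max\{k-1, n-k\})] + O(n)$.
Notice $\max\{k-1, n-k\} = k-1$ if $k > n/2$, and $n-k$ otherwise. So each value from $\lfloor n/2 \rfloor$ to $n-1$ appears at most twice, and
$E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n) = O(n)$.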
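The last step, $E[T(n)] = O(n)$, can be checked by substitution. Below is a sketch of the usual argument. It assumes $E[T(k)] \le ck$ for all $k < n$, that the $O(n)$ term is at most $an$, and $n \ge 4$, so that $\lfloor n/2 \rfloor \ge n/2 - 1 \ge 1$.

```latex
% Recurrence: E[T(n)] <= (2/n) * sum_{k=floor(n/2)}^{n-1} E[T(k)] + a n
\begin{aligned}
E[T(n)] &\le \frac{2}{n}\sum_{k=\lfloor n/2\rfloor}^{n-1} ck + an
  = \frac{2c}{n}\Bigl(\sum_{k=1}^{n-1}k - \sum_{k=1}^{\lfloor n/2\rfloor-1}k\Bigr) + an\\
&\le \frac{2c}{n}\Bigl(\frac{n(n-1)}{2} - \frac{(n/2-2)(n/2-1)}{2}\Bigr) + an
  = \frac{2c}{n}\Bigl(\frac{3n^2}{8} + \frac{n}{4} - 1\Bigr) + an\\
&\le \frac{3cn}{4} + \frac{c}{2} + an
  = cn - \Bigl(\frac{cn}{4} - \frac{c}{2} - an\Bigr) \le cn
\end{aligned}
```

The last inequality holds when $c > 4a$ and $n \ge 2c/(c-4a)$. For the finitely many smaller n, $E[T(n)] = O(1)$, and a large enough c covers them.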
Deterministic linear selection
Generate deterministically a good split element x. Divide the n elements into ⌊n/5⌋ groups of 5 elements each (plus possibly one group with fewer than 5 elements).
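Deterministic linear selection
Sort each group to find its median, say $x_i$. Each group takes Θ(1) time: the median of 5 elements can be found with 6 comparisons, and fully sorting them takes at most 7. Total: about 6n/5 comparisons, i.e. Θ(n).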
Deterministic linear selection
Use Select recursively to find the median x of the medians $\{x_i\}$, $1 \le i \le \lceil n/5 \rceil$. Use the deterministic Partition (from Quicksort) to rearrange the groups around x according to their medians $x_i$, in time linear in the number of medians.
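Deterministic linear selection
At least $3 \cdot \frac{1}{2} \cdot \frac{n}{5} = \frac{3n}{10}$ of the elements are ≤ x. Half of the group medians are ≤ x, and each of those groups contributes 3 elements ≤ x: its median and the two elements below it.
Symmetrically, at least 3n/10 of the elements are ≥ x.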
The deterministic algorithm
Select(A, i)
1.- Divide the n elements into ⌈n/5⌉ groups of 5.
2.- Find the median of each group by insertion sort, taking the middle element.
3.- Use Select recursively to find the median x of the ⌈n/5⌉ medians.
4.- Use Partition to place x and its group. Let k = rank of x.
5.- if i = k then return x
 else if i < k then use Select recursively to find the i-th smallest element in the left part
 else use Select recursively to find the (i - k)-th smallest element in the right part
 end if
Notice that steps 4 and 5 are the same as in Quickselect.
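A minimal Python sketch of Select with groups of 5. For readability it partitions into new lists instead of working in place. This keeps the linear time bound but not the memory use of an in-place version. The slides assume distinct keys; this sketch also tolerates repeated keys by counting the copies of x. The name `select` is ours.

```python
def select(a, i):
    """Return the i-th smallest element of a (1 <= i <= len(a)) in worst-case O(n)."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Steps 1-2: groups of 5 and their medians (sorting a group is constant work)
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Step 3: median of the medians, found recursively
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x (with distinct keys, the rank of x is len(left) + 1)
    left = [y for y in a if y < x]
    right = [y for y in a if y > x]
    n_equal = len(a) - len(left) - len(right)
    # Step 5: stop, or recurse on one side
    if i <= len(left):
        return select(left, i)
    if i <= len(left) + n_equal:
        return x
    return select(right, i - len(left) - n_equal)
```

On the 15-element input of the next example, the medians of the groups are 5, 10 and 11, so x = 10.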
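Example
Get the median (the ⌈n/2⌉-th element, here the 8th) of the following input of n = 15 elements:
3 13 9 4 5 | 1 15 12 10 2 | 6 14 8 11 17
Sorting each group of 5 gives 3 4 5 9 13 | 1 2 10 12 15 | 6 8 11 14 17. The group medians are 5, 10, 11, and their median is x = 10.
PARTITION around 10: 3 4 5 9 1 2 6 8 | 10 | 13 12 15 11 14 17
x = 10 has rank k = 9, and 8 < 9, so to get the median we call SELECT on the left part (3 4 5 9 1 2 6 8) with i = 8.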
Worst-case analysis
At least 3n/10 of the elements are ≤ x, so at most 7n/10 are > x. Symmetrically, at most 7n/10 are < x. Hence, in the worst case, step 5 calls Select recursively on at most 7n/10 elements.
Steps 1, 2 and 4 take O(n) time. Step 3 takes time T(n/5) and step 5 takes time T(7n/10), so we have
$T(n) = \begin{cases} \Theta(1) & \text{if } n \le 50,\\ T(n/5) + T(7n/10) + \Theta(n) & \text{if } n > 50. \end{cases}$
Since $\frac{1}{5} + \frac{7}{10} = \frac{9}{10} < 1$, $T(n) = \Theta(n)$.
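The step from the recurrence to $T(n) = \Theta(n)$ can be checked by substitution. The sketch below assumes $T(m) \le cm$ for all $m < n$ and that the $\Theta(n)$ term is at most $an$. Like the analysis above, it ignores ceilings.

```latex
% Recurrence for n > 50: T(n) = T(n/5) + T(7n/10) + a n
\begin{aligned}
T(n) &\le c\,\frac{n}{5} + c\,\frac{7n}{10} + an
      = \frac{9}{10}\,cn + an \le cn
      \quad\text{whenever } c \ge 10a.
\end{aligned}
```

For the base case $n \le 50$, it is enough to also choose $c \ge \max_{n \le 50} T(n)/n$.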
Notice: if we make groups of 7, at least $4 \cdot \frac{1}{2} \cdot \frac{n}{7} = \frac{2n}{7}$ of the elements are ≤ x (and ≥ x). This yields $T(n) \le T(n/7) + T(5n/7) + O(n)$, with solution T(n) = O(n). However, if we make groups of 3, then $T(n) \le T(n/3) + T(2n/3) + O(n)$, which has solution $T(n) = O(n \lg n)$.
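To see the effect of the group size numerically, here is a small Python sketch that evaluates toy versions of the three recurrences. Floors, a non-recursive term of exactly n, and T(0) = 0, T(1) = 1 are our simplifying assumptions. By induction, `t5(n) <= 10 * n` and `t7(n) <= 7 * n` for every n, while `t3(n) / n` keeps growing with n.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def t5(n):
    """Toy model for groups of 5: T(n) = T(n//5) + T(7n//10) + n."""
    return n if n <= 1 else t5(n // 5) + t5(7 * n // 10) + n

@lru_cache(maxsize=None)
def t7(n):
    """Toy model for groups of 7: T(n) = T(n//7) + T(5n//7) + n."""
    return n if n <= 1 else t7(n // 7) + t7(5 * n // 7) + n

@lru_cache(maxsize=None)
def t3(n):
    """Toy model for groups of 3: T(n) = T(n//3) + T(2n//3) + n."""
    return n if n <= 1 else t3(n // 3) + t3(2 * n // 3) + n

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, t5(n) / n, t7(n) / n, t3(n) / n)
```

With groups of 5 or 7 the two subproblem fractions add up to less than 1, so the ratio stays bounded. With groups of 3 they add up to exactly 1, so every level of the recursion costs about n.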
Conclusions
Starting from a randomized algorithm, we removed the randomization to get a fast (linear-time) deterministic algorithm for selection. From a practical point of view, the deterministic algorithm is slow: use Quickselect.