Randomization in Algorithms and Data Structures

Randomization in Algorithms and Data Structures 3 lectures (Tuesdays 14:15-16:00) May 1: Gerth Stølting Brodal May 8: Kasper Green Larsen May 15: Peyman Afshani For each lecture one handin exercise deadline June 1 Pre-talent track activity in Algorithms, Department of Computer Science, Aarhus University, May, 2018

Backwards Analysis Gerth Stølting Brodal Pre-talent track activity in Algorithms, Department of Computer Science, Aarhus University, May 1, 2018

Pareto optimal / non-dominated points / skyline No points Pareto optimal Dominated points Skyline Vilfredo Pareto (1848 1923)

Exercise 1 (skyline construction) Given n points (x 1, y 1 ),..., (x n, y n ) in R 2 Give an algorithm for computing the points on the skyline? What is the running time of your algorithm? skyline Skyline

Problem expected skyline size? Consider n points (x 1, y 1 ),..., (x n, y n ) in R 2 Each x i and y i is selected independently and uniformly at random from [0, 1] What is the expected skyline size? 1 0 0 1 Skyline size

Exercise 2 (dependent points) Describe an algorithm for generating n distinct points (x 1, y 1 ),..., (x n, y n ) in R 2 Each x i and y i is selected uniformly at random from [0, 1] The points are not independent 1 0 0 1 Skyline size

Generating random points Assume the points are generated by the following algorithm 1) Generate n random x-values x 1,..., x n 2) sort the x-values in decreasing order 3) for decreasing x i generate random y i (x i, y i ) is a skyline point y i = max(y 1,..., y i ) 1 (x i, y i ) Prob[y i skyline point] = 1 i since y 1,..., y i independt and the same distribution, all permutations are equally likely, i.e. probability y i to be largest among i values is 1/i 0 (x 1, y 1 ) 0 1 Skyline size

Expected skyline size Stochastic variable: X i = ቊ 1 if point i on skyline 0 otherwise E[ skyline ] 1 = E[X 1 + + X n ] 1 = 1 + 1 2 + + 1 n 1 = σ i=1..n (harmonic number H i n ) ln n + 1 0 (x i, y i ) (x 1, y 1 ) 0 1 Skyline size

Σ n n-th Harmonic number H n = 1/1 + 1/2 + 1/3 + + 1/n = 1/i i = 1 n n H n 1 1/x dx = [ ln x ] = ln n ln 1 = ln n H n 1/n 1 1 1/n ln n + 1/n H n ln n + 1 1/1 1/2 1/3 1/4 1/n 1 2 3 4 5 Harmonic numbers n

Exercise 3 (divide-and-conquer skyline) Consider the following algorithm Find the topmost point p in O(n) time recurse on points to the right of p Show that the expected running time is O(n) 1 p 0 0 1 Skyline

QuickSort a randomized sorting algorithm Quicksort(x 1,..., x n ) Pick a random pivot element x i Partition remaining elements: S smaller than x i, and L larger than x i Recursively sort S and L 7 4 15 12 8 11 3 5 1 14 6 7 4 3 5 1 6 8 15 12 11 14 3 1 4 7 5 6 11 15 12 14 1 3 5 6 7 12 14 15 3 5 6 12 15 6 QuickSort

QuickSort analysis I Alternative Quicksort Consider a random permutation π of the input, such that x π(1) is the first pivot, then x π(2), x π(3),... Observations Changes the order unsorted sublists are partitioned, but the pivots are still selected uniformly among the elements x i and x j are compared if and only x i or x j is selected as a pivot before any element in the sorted list between x i and x j output x j x i QuickSort

QuickSort analysis II E[comparisons by quicksort] = E[Σ i<j cost of comparing x i and x j ] = Σ i<j E[cost of comparing x i and x j ] 2 = Σ i<j r j r(i) + 1 where r(i) = position of x i in output 2 = Σ 1 p<q n q p + 1 1 2n Σ 2 d n d output r(j) r(i) 2n ((ln n + 1) 1) x j x i p q = 2n ln n d QuickSort

Exercise 4 (random search trees) Construct a unbalanced binary search tree for n numbers x 1 < < x n by inserting the numbers in random order What is the probability that x j is an ancestor of x i? What is the expected depth of a node x i? 15 8 17 3 13 5 10 Insert: 15, 8, 17, 13, 3, 5, 10 Random search trees

Convex hull Convex hull = smallest polygon encolosing all points Convex hull

Exercise 5: Convex hull Give an O(n log n) worst-case algorithm finding the Convex Hull Convex hull

Convex hull randomized incremental 1) Let p 1,..., p n be a random permutation of the points 2) Compute convex hull of {p 1, p 2, p 3 } 3) c = (p 1 + p 2 + p 3 ) / 3 4) for i = 4 to n insert p i and construct H i from H i-1 : if p i inside H i-1 skip, otherwise insert p i in H i-1 and delete chain inside p 3 p i c H 3 p 1 p 2 H i-1 c Convex hull

Convex hull inserting p i e p i c H i-1 Convex hull

Convex hull inserting p i e p i c H i-1 How to find e for p i? store set of points with e and reference to e from p i Convex hull

Convex hull - analysis Each point inserted / deleted / inside at most once in convex hull E[# point-edge updates] = E[Σ 4 i n Σ p p updated on insertion i] = Σ 4 i n Σ p E[p updated on insertion i] Σ 4 i n Σ p 2 i 3 2n (ln n + 1) since p only updated on insertion i if p i is u or v Total expected time O(n log n) H i u c e p v Convex hull

Binary search - but forgot to sort the array... (a debugging case) 31 17 useful How many cells can ever be reached by a binary search? 43 useful 3 useful 13 47 not reachable 29 2 41 11 7 23 19 5 37 find(41) 2 3 41 31 11 13 7 17 23 47 19 43 5 29 37 Binary searching unsorted array

Reachable nodes analysis Pr[ v i useful ] = L i / Σ j L j E[ # useful nodes at level ] = Σ i ( L i / Σ j L j ) = 1 E[ # useful nodes in tree ] = height - 1 E[ # reachable nodes in tree ] height 2 = O(log 2 n) useful 17 31 ]-,17[ ]17,43[ useful 43 ]43, [ # useful nodes? v 1 v 2 v 3 v 4 L 1 = { 2,3,4,5,11 } L 2 = L 3 = { 19,23,29,37,41 } L 4 = { 47 } Binary searching unsorted array

Closest pair p Given n points, find pair (p, q) with minimum distance q Algorithm (idea) permute points randomly p 1, p 2,..., p n for i = 2..n compute Δ i = distance between closest pair for p 1,..., p i Observation Pr[Δ i < Δ i-1 ] 2 i since minimum distance can only decrease if p i is defining Δ i Closest pair

Closest pair grid cells Construct grid cells with side-length Δ i-1 Point (x, y) in cell ( x/δ i 1, y/δ i 1 ) 4 points in cell if all pairwise distances Δ i-1 Neighbors of p i within distance Δ i-1 are in 9 cells Store non-empty cells in a hash tabel (using randomization) Rebuild grid whenever Δ i decreases Analysis E[rebuild cost] = E[Σ 3 i n rebuild cost inserting p i ] = Σ 3 i n E[rebuild cost inserting p i ] Σ 3 i n 2 i i 2n Total expected time O(n) p i Δ i-1 Closest pair

Handin (Treaps) 3 0.1 7 0.9 13 0.5 A treap is a binary search tree with a random priority assigned to each element when inserted (in the example elements are white and priorities yellow). A left-to-right inorder traversal gives the elements in sorted order, whereas the priorities satisfy heap order, i.e. priorities increase along a leaf-to-root path. 8 0.1 11 0.3 21 0.4 a) Argue that an arbitrary element in a treap of size n has expected depth O(log n) b) Describe how to insert an element into a treap and give running time Treaps

References Raimund Seidel, Backwards Analysis of Randomized Geometric Algorithms, in Pach J. (eds) New Trends in Discrete and Computational Geometry. Algorithms and Combinatorics, vol 10, 37-67. Springer, Berlin, Heidelberg 1993. DOI: 10.1007/978-3-642-58043-7_3 (public available at CiteSeerX). [Chapters 4-5] Sariel Har-Peled, Backwards analysis, lecture notes, 2014. sarielhp.org. [Chapters 39.1-39.3] Raimund Seidel and Cecilia R. Aragon, Randomized search trees, Algorithmica, 16(4 5), 464 497, 1996. DOI 10.1007/BF01940876. David R. Karger, Philip N. Klein, Robert E. Tarjan, A randomized linear-time algorithm to find minimum spanning trees, Journal of the ACM, 42(2), 321-328, 1995. DOI 10.1145/201019.201022. Timothy M. Chan, Backwards analysis of the Karger-Klein-Tarjan algorithm for minimum spanning trees, Information Processing Letters, 67(6), 303-304, 1998. DOI 10.1016/S0020-0190(98)00129-X. References