Quicksort Where Average and Worst Case Differ S.V. N. (vishy) Vishwanathan University of California, Santa Cruz vishy@ucsc.edu February 1, 2016 S.V. N. Vishwanathan (UCSC) CMPS101 1 / 28
Outline
1. Basic Idea
2. Worst Case Analysis
3. Average Case Analysis
Basic Idea: Divide and Conquer
Divide: Choose a random pivot $a \in A$ and split the array into $A_{<}$ and $A_{>}$.
Conquer: Sort $A_{<}$ and $A_{>}$ by calling quicksort.
Combine: Nothing to do! (Note that $a$ is already in its correct position.)
Basic Idea: Pseudocode
quicksort(A, p, r)
    if p < r
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)
Basic Idea: Partition
partition(A, p, r)
1   x = A[r]
2   i = p - 1
3   for j = p to r - 1
4       if A[j] ≤ x
5           i = i + 1
6           exchange A[i] with A[j]
7   exchange A[i + 1] with A[r]
8   return i + 1
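The quicksort and partition pseudocode can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: it assumes 0-based, inclusive indices `p` and `r`, and the default arguments on `quicksort` and the sample list are conveniences added here.

```python
def partition(A, p, r):
    """Partition A[p..r] (inclusive) around the pivot x = A[r]; return the pivot's final index."""
    x = A[r]
    i = p - 1
    for j in range(p, r):              # j = p, ..., r - 1
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]    # put the pivot between the two regions
    return i + 1


def quicksort(A, p=0, r=None):
    """Sort A[p..r] in place; with no bounds given, sort the whole list."""
    if r is None:
        r = len(A) - 1
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)


data = [2, 8, 7, 1, 3, 5, 6, 4]
quicksort(data)
print(data)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Nothing is combined after the two recursive calls, because the pivot already sits at index `q`, its final position.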
Basic Idea: Loop Invariant
At the beginning of each iteration of the loop of lines 3-6, for any array index $k$:
1. If $p \le k \le i$, then $A[k] \le x$.
2. If $i + 1 \le k \le j - 1$, then $A[k] > x$.
3. If $k = r$, then $A[k] = x$.
Indices between $j$ and $r - 1$ have no particular relationship to $x$.
Basic Idea: In Pictures
(Figure not reproduced.)
Basic Idea: Initialization
Before the first iteration, $i = p - 1$ and $j = p$.
No values lie between $p$ and $i$, and none lie between $i + 1$ and $j - 1$.
The first two conditions are therefore trivially satisfied.
The assignment $x = A[r]$ in line 1 satisfies the third condition.
Basic Idea: Maintenance
Case 1: $A[j] > x$
    Only $j$ is incremented.
    Condition 2 now holds for $A[j - 1]$.
    Other entries remain unchanged.
Case 2: $A[j] \le x$
    Increment $i$ and swap $A[i]$ with $A[j]$.
    Now $A[i] \le x$, which satisfies condition 1.
    Increment $j$.
    Now $A[j - 1] > x$, because the element swapped into position $j - 1$ came from the region of elements greater than $x$. This satisfies condition 2.
Basic Idea: Termination
At termination $j = r$, so every element in the array other than the pivot $A[r]$ is either in $A_{\le}$ or in $A_{>}$.
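The initialization, maintenance, and termination arguments can also be checked mechanically. The sketch below is an illustration only: the name `partition_checked`, the 0-based inclusive indexing, and the random test lists are assumptions made here. It asserts the three invariant conditions at the start of every loop iteration and again at termination.

```python
import random


def partition_checked(A, p, r):
    """partition with the loop invariant asserted at the start of each iteration."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        assert all(A[k] <= x for k in range(p, i + 1))   # condition 1: p <= k <= i
        assert all(A[k] > x for k in range(i + 1, j))    # condition 2: i+1 <= k <= j-1
        assert A[r] == x                                 # condition 3: k = r
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    # Termination (j = r): every element of A[p..r-1] is in one of the two regions.
    assert all(A[k] <= x for k in range(p, i + 1))
    assert all(A[k] > x for k in range(i + 1, r))
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1


random.seed(0)  # fixed seed so the run is repeatable
for _ in range(1000):
    A = [random.randint(0, 9) for _ in range(random.randint(1, 12))]
    partition_checked(A, 0, len(A) - 1)
print("invariant held on all trials")
```

A passing run is evidence rather than proof; the proof is the three-step argument above.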
Worst Case Analysis: Worst Case Time Complexity
The partitioning procedure produces one sub-problem with $n - 1$ elements and one with $0$ elements.
Recurrence relationship:
    $T(n) = T(n - 1) + T(0) + \Theta(n) = T(n - 1) + \Theta(n)$
Therefore $T(n) = O(n^2)$.
With the last element as the pivot, the worst case occurs when the array is already sorted!
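The quadratic behaviour on sorted input can be observed by counting comparisons. The sketch below is illustrative: `count_comparisons` is a name introduced here, and it uses an explicit stack instead of recursion (an implementation choice, not part of the slides) so that the $n - 1$ nested sub-problems do not hit Python's recursion limit.

```python
def count_comparisons(values):
    """Quicksort a copy of values (pivot = last element); return the number of A[j] <= x tests."""
    A = list(values)
    comparisons = 0
    stack = [(0, len(A) - 1)]          # pending sub-problems (p, r)
    while stack:
        p, r = stack.pop()
        if p >= r:
            continue
        x, i = A[r], p - 1
        for j in range(p, r):
            comparisons += 1
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        q = i + 1
        stack.append((p, q - 1))
        stack.append((q + 1, r))
    return comparisons


n = 1000
print(count_comparisons(range(n)))   # 499500
print(n * (n - 1) // 2)              # 499500
```

On sorted input every call peels off only the pivot, so the counts are $(n-1) + (n-2) + \dots + 1 = n(n-1)/2$.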
Worst Case Analysis: Best Case Time Complexity
The partitioning procedure produces one sub-problem with $\lfloor n/2 \rfloor$ elements and one with $\lceil n/2 \rceil - 1$ elements.
Recurrence relationship:
    $T(n) = 2T(n/2) + \Theta(n)$
Therefore $T(n) = \Omega(n \log n)$.
Average Case Analysis: Average Case
Suppose each permutation of an array is equally likely as an input to quicksort.
What is the expected time complexity?
It turns out that the average case time complexity of quicksort is $O(n \log n)$.
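For small $n$ the average over all equally likely permutations can be computed by brute force. The sketch below is illustrative only: the function names are introduced here, and it assumes distinct keys $0, \dots, n-1$ and the last-element pivot.

```python
from fractions import Fraction
from itertools import permutations


def comparisons(A, p, r):
    """Sort A[p..r] in place with last-element-pivot quicksort; return the comparison count."""
    if p >= r:
        return 0
    x, i = A[r], p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    q = i + 1
    # The loop above made r - p comparisons.
    return (r - p) + comparisons(A, p, q - 1) + comparisons(A, q + 1, r)


def average_comparisons(n):
    """Exact average number of comparisons over all n! input permutations."""
    perms = list(permutations(range(n)))
    total = sum(comparisons(list(perm), 0, n - 1) for perm in perms)
    return Fraction(total, len(perms))


for n in range(1, 8):
    avg = average_comparisons(n)
    print(n, avg, float(avg), "worst case:", n * (n - 1) // 2)
```

The averages grow much more slowly than the worst-case count $n(n-1)/2$ printed beside them.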
Average Case Analysis: What about Imbalanced Splits?
Suppose the partitioning procedure always produces one sub-problem with $9n/10$ elements and one with $n/10$ elements.
Recurrence relationship:
    $T(n) = T(9n/10) + T(n/10) + \Theta(n)$
This still evaluates to $T(n) = O(n \log n)$.
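A recursion-tree argument shows why the 9-to-1 split is still $O(n \log n)$. The following is a sketch, writing the $\Theta(n)$ term as $cn$ for an assumed constant $c > 0$ and ignoring floors and ceilings:

```latex
\begin{align*}
T(n) &= T(9n/10) + T(n/10) + cn \\
\text{cost of each level of the recursion tree} &\le cn \\
\text{depth of the longest root-to-leaf path} &= \log_{10/9} n = \frac{\lg n}{\lg(10/9)} = \Theta(\lg n) \\
T(n) &\le cn \log_{10/9} n + O(n) = O(n \log n)
\end{align*}
```

The same argument works for any split of constant proportionality; only the base of the logarithm, and hence the hidden constant, changes.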
Average Case Analysis: Proof by Picture
(Figure not reproduced.)
Average Case Analysis: Randomizing Pivot Selection
randomized-partition(A, p, r)
    i = random(p, r)
    exchange A[r] with A[i]
    return partition(A, p, r)
Call randomized-partition(A, p, r) instead of partition(A, p, r) in quicksort.
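A minimal Python sketch of the randomized variant follows. It assumes 0-based inclusive indices and uses `random.randint(p, r)` in the role of the pseudocode's `random(p, r)`; the name `randomized_quicksort` is introduced here for illustration.

```python
import random


def partition(A, p, r):
    """Partition A[p..r] around the pivot A[r]; return the pivot's final index."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1


def randomized_partition(A, p, r):
    """Move a uniformly random element of A[p..r] into position r, then partition."""
    i = random.randint(p, r)           # both endpoints are included
    A[r], A[i] = A[i], A[r]
    return partition(A, p, r)


def randomized_quicksort(A, p, r):
    """Sort A[p..r] in place using random pivots."""
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)


data = list(range(500))                # already sorted input
randomized_quicksort(data, 0, len(data) - 1)
print(data == list(range(500)))        # True
```

With random pivots no fixed input, sorted or otherwise, reliably triggers the quadratic case; bad behaviour now depends only on unlucky random choices.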
Average Case Analysis: Bounding Runtime
Lemma: Let $X$ be the number of comparisons performed in line 4 of partition over the entire execution of quicksort on an $n$-element array. Then the running time of quicksort is $O(n + X)$.
Proof:
There are at most $n$ calls to partition, because each call fixes one pivot that is never part of a later call.
Each call takes $O(1)$ time to set up, plus time proportional to the number of iterations of the loop in lines 3-6.
The loop in lines 3-6 is the only place where we make a comparison, and each iteration makes exactly one.
Average Case Analysis: Terminology
Rename the elements of $A$ as $z_1, z_2, \ldots, z_n$, with the $z_i$ in sorted order ($z_i$ is the $i$-th smallest element).
Define $Z_{ij} = \{z_i, z_{i+1}, \ldots, z_j\}$ to be the set of elements between $z_i$ and $z_j$, inclusive.
Let $X_{ij}$ be an indicator variable:
    $X_{ij} = I\{z_i \text{ is compared to } z_j\}$
Therefore
    $X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}$
Average Case Analysis: Expectations
\[
\begin{aligned}
E[X] &= E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right] \\
     &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}] \\
     &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr\{z_i \text{ is compared to } z_j\}.
\end{aligned}
\]
Average Case Analysis: Observation I
If some element $x$ with $z_i < x < z_j$ is chosen as a pivot before either $z_i$ or $z_j$, then $z_i$ and $z_j$ are never compared: partitioning around $x$ places them in different sub-arrays.
Average Case Analysis: Observation II
If $z_i$ is chosen as a pivot before any other item in $Z_{ij}$, then $z_i$ is compared with every other item in $Z_{ij}$, including $z_j$.
The same holds for $z_j$.
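Average Case Analysis: Probability of Comparison
Each of the $j - i + 1$ elements of $Z_{ij}$ is equally likely to be the first pivot chosen from $Z_{ij}$, and the two events below are mutually exclusive.
\[
\begin{aligned}
\Pr\{z_i \text{ is compared to } z_j\}
  &= \Pr\{z_i \text{ or } z_j \text{ is the first pivot chosen from } Z_{ij}\} \\
  &= \Pr\{z_i \text{ is the first pivot chosen from } Z_{ij}\} + \Pr\{z_j \text{ is the first pivot chosen from } Z_{ij}\} \\
  &= \frac{1}{j - i + 1} + \frac{1}{j - i + 1} \\
  &= \frac{2}{j - i + 1}
\end{aligned}
\]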
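The value $2/(j - i + 1)$ can be cross-checked by computing the probability exactly for small $n$. The sketch below is an illustration under stated assumptions: `prob_compared` is a helper introduced here, elements are identified with their ranks $1, \dots, n$, and every pivot is chosen uniformly at random from the current sub-array. It conditions on the first pivot and recurses into the sub-array that still contains both $z_i$ and $z_j$.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def prob_compared(lo, hi, i, j):
    """Exact Pr{z_i is compared to z_j} for quicksort on ranks lo..hi (lo <= i < j <= hi)."""
    total = Fraction(0)
    for pivot in range(lo, hi + 1):
        if pivot == i or pivot == j:
            total += 1                                   # the pivot meets every other element
        elif pivot < i:
            total += prob_compared(pivot + 1, hi, i, j)  # both land in the right sub-array
        elif pivot > j:
            total += prob_compared(lo, pivot - 1, i, j)  # both land in the left sub-array
        # i < pivot < j: z_i and z_j are separated and never compared (adds 0)
    return total / (hi - lo + 1)


n = 8
for i in range(1, n):
    for j in range(i + 1, n + 1):
        assert prob_compared(1, n, i, j) == Fraction(2, j - i + 1)
print(prob_compared(1, n, 1, n))  # 1/4
```

Pivots outside $Z_{ij}$ do not change the answer, which is why the result depends only on $j - i$ and not on $n$.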
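Average Case Analysis: Expectations
Substituting $k = j - i$:
\[
\begin{aligned}
E[X] &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} \\
     &= \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k + 1} \\
     &< \sum_{i=1}^{n-1} \sum_{k=1}^{n} \frac{2}{k} \\
     &= \sum_{i=1}^{n-1} O(\log n) \\
     &= O(n \log n)
\end{aligned}
\]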
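The double sum and the looser harmonic bound can be evaluated numerically to see how close they are. This is a sketch; the function names and the choice of $n$ values are illustrative, and exact rational arithmetic is used only to avoid rounding questions.

```python
import math
from fractions import Fraction


def expected_comparisons(n):
    """E[X] = sum over 1 <= i < j <= n of 2 / (j - i + 1), computed exactly."""
    return sum(Fraction(2, j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))


def harmonic_upper_bound(n):
    """The bound sum_{i=1}^{n-1} sum_{k=1}^{n} 2/k used in the derivation."""
    return (n - 1) * sum(Fraction(2, k) for k in range(1, n + 1))


for n in (10, 50, 200):
    e = expected_comparisons(n)
    b = harmonic_upper_bound(n)
    print(n, round(float(e), 1), round(float(b), 1), round(n * math.log(n), 1))
```

The last column prints $n \ln n$ for scale; both sums stay within a small constant factor of it.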
Questions?