Quicksort (CLRS 7)

We previously saw how the divide-and-conquer technique can be used to design the sorting algorithm Merge-sort:

- Partition the n-element array A into two subarrays of n/2 elements each.
- Sort the two subarrays recursively.
- Merge the two sorted subarrays.

Running time: T(n) = 2T(n/2) + Θ(n), so T(n) = Θ(n log n).

Another possibility is to divide the elements such that there is no need for merging:

- Partition A[1..n] into subarrays A' = A[1..q] and A'' = A[q+1..n] such that all elements in A'' are larger than all elements in A'.
- Recursively sort A' and A''.
- (There is nothing to combine/merge: A is already sorted after sorting A' and A''.)

Pseudocode for Quicksort:

Quicksort(A, p, r)
    IF p < r THEN
        q = Partition(A, p, r)
        Quicksort(A, p, q - 1)
        Quicksort(A, q + 1, r)
    FI

Sort the whole array using Quicksort(A, 1, n).

If q = n/2 and we divide in Θ(n) time, we again get the recurrence T(n) = 2T(n/2) + Θ(n) for the running time, so T(n) = Θ(n log n). The problem is that it is hard to develop a partition algorithm that always divides A into two halves.
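To make the pseudocode concrete, here is a minimal runnable sketch in Python (the choice of Python, the 0-indexing, and the function names are mine; the notes themselves use 1-indexed pseudocode). Partition implements the scheme given in the next section.

# Direct Python translation of the Quicksort/Partition pseudocode.
# Arrays are 0-indexed here, so the initial call is quicksort(A, 0, len(A) - 1).

def partition(A, p, r):
    """Partition A[p..r] around the pivot x = A[r]; return the pivot's final index."""
    x = A[r]
    i = p - 1
    for j in range(p, r):              # j = p, ..., r-1
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)         # left part: elements <= pivot
        quicksort(A, q + 1, r)         # right part: elements > pivot

A = [2, 8, 7, 1, 3, 5, 6, 4]
quicksort(A, 0, len(A) - 1)
print(A)                               # [1, 2, 3, 4, 5, 6, 7, 8]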

Partition(A, p, r)
    x = A[r]
    i = p - 1
    FOR j = p TO r - 1 DO
        IF A[j] <= x THEN
            i = i + 1
            Exchange A[i] and A[j]
        FI
    OD
    Exchange A[i + 1] and A[r]
    RETURN i + 1

Quicksort correctness: easy to show, inductively, if Partition works correctly.

Example (pivot x = 4; each line shows i, j, and the array at the start of the corresponding FOR-loop iteration):

    i=0, j=1:   2 8 7 1 3 5 6 4
    i=1, j=2:   2 8 7 1 3 5 6 4
    i=1, j=3:   2 8 7 1 3 5 6 4
    i=1, j=4:   2 8 7 1 3 5 6 4
    i=2, j=5:   2 1 7 8 3 5 6 4
    i=3, j=6:   2 1 3 8 7 5 6 4
    i=3, j=7:   2 1 3 8 7 5 6 4
    i=3, j=8:   2 1 3 8 7 5 6 4   (FOR-loop finished)
    q=4:        2 1 3 4 7 5 6 8   (after the final exchange)

Partition can be proved correct (by induction) using the loop invariant:

- A[k] <= x for p <= k <= i
- A[k] > x for i+1 <= k <= j-1
- A[k] = x for k = r

Quicksort analysis:

Partition runs in time Θ(r - p). The running time depends on how well Partition divides A; in the example above it does reasonably well. If the array is always partitioned nicely into two halves (Partition returns q = (p + r)/2), we have the recurrence T(n) = 2T(n/2) + Θ(n), so T(n) = Θ(n lg n). But in the worst case Partition always returns q = p or q = r, and the running time becomes T(n) = Θ(n) + T(0) + T(n - 1), which solves to T(n) = Θ(n²).
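The trace above can be reproduced mechanically. The following Python snippet (my illustration; the printed i and j are converted to 1-indexing to match the notes) prints the state at the start of every FOR-loop iteration of Partition.

def partition_traced(A, p, r):
    x = A[r]
    i = p - 1
    for j in range(p, r):
        print(f"i={i - p + 1}, j={j - p + 1}: {A}")     # 1-indexed, as in the notes
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    print(f"i={i - p + 1}, j={r - p + 1}: {A} (FOR-loop finished)")
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

A = [2, 8, 7, 1, 3, 5, 6, 4]
q = partition_traced(A, 0, len(A) - 1)
print(f"q={q + 1}: {A}")                                # q=4: [2, 1, 3, 4, 7, 5, 6, 8]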

And, what is maybe even worse, the worst case occurs when A is already sorted. So why is it called "quick"-sort? Because it often performs very well. Can we theoretically justify this?

Even if all the splits are relatively bad, we still get Θ(n log n) time. Example: suppose every split is (9/10)n, (1/10)n:

T(n) = T(9n/10) + T(n/10) + n

Solution? Guess T(n) <= cn log n and verify by induction:

T(n) = T(9n/10) + T(n/10) + n
     <= (9cn/10) log(9n/10) + (cn/10) log(n/10) + n
     = (9cn/10) log n + (9cn/10) log(9/10) + (cn/10) log n + (cn/10) log(1/10) + n
     = cn log n + (9cn/10) log 9 - (9cn/10) log 10 - (cn/10) log 10 + n
     = cn log n - n(c log 10 - (9c/10) log 9 - 1)

So T(n) <= cn log n provided that c log 10 - (9c/10) log 9 - 1 >= 0, which holds for any sufficiently large constant c (with base-2 logarithms, any c > log 10 ≈ 3.32 certainly suffices).

So, in other words: if the splits happen at any constant fraction of n, we get Θ(n lg n). Quicksort is almost never bad!

Average running time

The natural question is: what is the average-case running time of quicksort? Is it close to the worst case, Θ(n²), or to the best case, Θ(n lg n)? The answer depends on the distribution of inputs over which we take the average.

If we run quicksort on a set of inputs that are all almost sorted, the average running time will be close to the worst case. Similarly, if we run quicksort on a set of inputs that give good splits, the average running time will be close to the best case.

If we run quicksort on a set of inputs picked uniformly at random from the space of all possible input permutations, then the average case will also be close to the best case. Why? Intuitively, if any input ordering is equally likely, then we expect at least as many good splits as bad splits, so on average a bad split will be followed by a good split and gets "absorbed" by it.

So, under the assumption that all input permutations are equally likely, the average time of Quicksort is Θ(n lg n) (intuitively). Is this assumption realistic? Not really: in many cases the input is almost sorted (think of rebuilding indexes in a database, etc.).

The question is: how can we make Quicksort have a good average time irrespective of the input distribution? Using randomization.
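The induction above can be sanity-checked numerically. This small Python experiment (my own check, not from the notes) evaluates the recurrence T(n) = T(floor(9n/10)) + T(floor(n/10)) + n with memoization and prints T(n)/(n log2 n); the ratio stays bounded by a small constant, consistent with T(n) = Θ(n log n).

from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def T(n):
    """Quicksort cost when every split is 9/10 vs 1/10 (base case: T(n) = n for n <= 1)."""
    if n <= 1:
        return n
    return T(9 * n // 10) + T(n // 10) + n

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, round(T(n) / (n * log2(n)), 3))
# The ratio levels off around 2.1, i.e. T(n) = Theta(n log n).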

Randomization

We now consider randomized algorithms, that is, algorithms that make some random choices during their execution.

- The running time of a normal, deterministic algorithm depends only on the input.
- The running time of a randomized algorithm depends not only on the input but also on the random choices made by the algorithm.
- The running time of a randomized algorithm is therefore not fixed for a given input!

Randomized algorithms still have best-case and worst-case running times, but the inputs for which these are achieved are not known; they can be any of the inputs. We are normally interested in analyzing the expected running time of a randomized algorithm, that is, the expected (average) running time over all inputs of size n:

Te(n) = E_{|X| = n}[T(X)]

Randomized Quicksort

We can enforce that all n! permutations are equally likely by randomly permuting the input before running the algorithm.

- Most computers have a pseudo-random number generator random(1, n) returning a "random" number between 1 and n.
- Using the pseudo-random number generator we can generate a random permutation (such that all n! permutations are equally likely) in O(n) time: choose the element in A[1] randomly among the elements in A[1..n], then choose the element in A[2] randomly among the elements in A[2..n], then choose the element in A[3] randomly among the elements in A[3..n], and so on.
- Note: just choosing A[i] randomly among the elements in A[1..n] for all i will not give a random permutation! Why? (One way to see it: that procedure has n^n equally likely outcomes, and since n^n is in general not divisible by n!, the n! permutations cannot all come up equally often.)

Alternatively, we can modify Partition slightly and exchange the last element in A with a random element in A before partitioning:

RandPartition(A, p, r)
    i = random(p, r)
    Exchange A[r] and A[i]
    RETURN Partition(A, p, r)

RandQuicksort(A, p, r)
    IF p < r THEN
        q = RandPartition(A, p, r)
        RandQuicksort(A, p, q - 1)
        RandQuicksort(A, q + 1, r)
    FI
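The bias of the naive shuffle is easy to observe. The Python experiment below (my illustration; random.randrange stands in for the notes' random(1, n)) compares the correct O(n) procedure described above, known as the Fisher-Yates shuffle, with the naive variant, tallying the 3! = 6 permutations of a 3-element array.

import random
from collections import Counter

def fisher_yates(A):
    """Correct: choose A[i] among A[i..n-1]; all n! permutations are equally likely."""
    for i in range(len(A)):
        j = random.randrange(i, len(A))
        A[i], A[j] = A[j], A[i]
    return A

def naive_shuffle(A):
    """Wrong: choose A[i] among A[0..n-1]; n^n outcomes cannot split evenly into n! groups."""
    for i in range(len(A)):
        j = random.randrange(len(A))
        A[i], A[j] = A[j], A[i]
    return A

for shuffle in (fisher_yates, naive_shuffle):
    counts = Counter(tuple(shuffle([1, 2, 3])) for _ in range(270_000))
    print(shuffle.__name__, sorted(counts.values()))
# fisher_yates: all six counts near 45000 (uniform, 1/6 each);
# naive_shuffle: counts near 40000 or 50000 (probabilities 4/27 and 5/27).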

Expected Running Time of Randomized Quicksort

Let T(n) be the running time of RandQuicksort for an input of size n.

- The running time of RandQuicksort is the total running time spent in all the Partition calls.
- Partition is called at most n times (the pivot element x is not included in either recursive call).
- One call of Partition takes O(1) time plus time proportional to the number of iterations of its FOR-loop.
- In each iteration of the FOR-loop we compare an element with the pivot element.

So if X is the number of comparisons A[j] <= x performed in Partition over the entire execution of RandQuicksort, then the running time is O(n + X), and

E[T(n)] = E[O(n + X)] = O(n + E[X])

To analyze the expected running time we therefore need to compute E[X].

To compute X we use z_1, z_2, ..., z_n to denote the elements of A, where z_i is the i-th smallest element, and we use Z_ij to denote the set {z_i, z_{i+1}, ..., z_j}. Each pair of elements z_i and z_j is compared at most once (elements are compared only with the pivot, and each element is the pivot in at most one call), so

X = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} X_ij,   where X_ij = 1 if z_i is compared to z_j, and X_ij = 0 otherwise.

By linearity of expectation,

E[X] = E[Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} X_ij]
     = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} E[X_ij]
     = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} Pr[z_i compared to z_j]

To compute Pr[z_i compared to z_j] it is useful to consider when two elements are not compared.

Example: consider an input consisting of the numbers 1 through 10, and assume the first pivot is 7. The first partition separates the numbers into the sets {1, 2, 3, 4, 5, 6} and {8, 9, 10}. In partitioning, 7 is compared to all the other numbers, but no number from the first set will ever be compared to a number from the second set.

In general: once a pivot x with z_i < x < z_j is chosen, we know that z_i and z_j cannot be compared at any later time. On the other hand, if z_i is chosen as pivot before any other element of Z_ij, then it is compared to every other element of Z_ij; similarly for z_j.
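The pair-comparison probability can also be estimated empirically. This Python sketch (my illustration; distinct keys 1..n are assumed so that ranks coincide with values) runs RandQuicksort repeatedly and records whether the values a and b are ever compared; the measured frequency approaches 2/(j - i + 1), the probability derived next.

import random

def pair_compared(A, a, b):
    """Randomized quicksort on A; return True iff values a and b were compared."""
    hit = [False]
    def sort(p, r):
        if p >= r:
            return
        k = random.randrange(p, r + 1)      # RandPartition: pick a random pivot
        A[k], A[r] = A[r], A[k]
        x = A[r]
        i = p - 1
        for j in range(p, r):
            if {A[j], x} == {a, b}:         # the comparison A[j] <= x involves both a and b
                hit[0] = True
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        sort(p, i)
        sort(i + 2, r)
    sort(0, len(A) - 1)
    return hit[0]

n, zi, zj, trials = 10, 2, 9, 100_000
freq = sum(pair_compared(list(range(1, n + 1)), zi, zj) for _ in range(trials)) / trials
print(freq, "vs", 2 / (zj - zi + 1))        # both close to 2/8 = 0.25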

In the example: 7 and 9 are compared because 7 is the first element of Z_{7,9} to be chosen as pivot, while 2 and 9 are not compared because the first pivot chosen in Z_{2,9} is 7.

Prior to any element of Z_ij being chosen as pivot, the whole set Z_ij is together in the same partition, so any element of Z_ij is equally likely to be the first one chosen as pivot. The set contains j - i + 1 elements, so the probability that z_i (or z_j) is the first one chosen is 1/(j - i + 1), and therefore

Pr[z_i compared to z_j] = 2/(j - i + 1)

We now have:

E[X] = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} Pr[z_i compared to z_j]
     = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} 2/(j - i + 1)
     = Σ_{i=1}^{n-1} Σ_{k=1}^{n-i} 2/(k + 1)        (substituting k = j - i)
     < Σ_{i=1}^{n-1} Σ_{k=1}^{n-i} 2/k
     = Σ_{i=1}^{n-1} O(log n)
     = O(n log n)

Since the best case is Θ(n lg n), it follows that E[X] = Θ(n lg n) and therefore E[T(n)] = Θ(n lg n).

Next time we will see how to make quicksort run in worst-case O(n log n) time.
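As a closing sanity check (my own addition, not part of the notes), the double sum can be evaluated exactly: Σ_{i<j} 2/(j - i + 1) = 2(n + 1)H_n - 4n, where H_n is the n-th harmonic number. The Python experiment below compares this against the average number of comparisons measured over many runs of RandQuicksort; the two agree closely.

import random

def count_comparisons(A):
    """Randomized quicksort on A; return the number of A[j] <= x comparisons made."""
    count = [0]
    def sort(p, r):
        if p >= r:
            return
        k = random.randrange(p, r + 1)      # RandPartition: pick a random pivot
        A[k], A[r] = A[r], A[k]
        x = A[r]
        i = p - 1
        for j in range(p, r):
            count[0] += 1
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        sort(p, i)
        sort(i + 2, r)
    sort(0, len(A) - 1)
    return count[0]

n, trials = 200, 2000
measured = sum(count_comparisons(list(range(n))) for _ in range(trials)) / trials
H = sum(1 / k for k in range(1, n + 1))
print(round(measured, 1), "vs", round(2 * (n + 1) * H - 4 * n, 1))   # nearly equal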