Partition and Select

Divide-Conquer-Glue Algorithms: Quicksort, Quickselect and the Master Theorem
Tyler Moore
CSE 3353, SMU, Dallas, TX
Lecture 11

Some slides created by or adapted from Dr. Kevin Wayne. For more information see http://www.cs.princeton.edu/~wayne/kleinberg-tardos. Some code reused from Python Algorithms by Magnus Lie Hetland.

Quickselect algorithm

Selection problem: find the kth smallest number of an unsorted sequence.
What is the name for selection when k = n/2?
A Θ(n lg n) solution is easy. How?
There is a linear-time solution available in the average case.

Partitioning with pivots

To use a divide-conquer-glue strategy, we need a way to split the problem in half. Furthermore, to make the running time linear, we need to always identify the half of the problem where the kth element is.

Key insight: split the sequence around a pivot. If the subset of smaller items happens to have size k - 1, then you have found the pivot. Otherwise, recurse on the half known to contain the kth element.

Partition and Select

(Note that the code below indexes k from zero, so the pivot is the kth smallest exactly when the small partition has length m == k.)

    def partition(seq):
        pi, seq = seq[0], seq[1:]          # Pick and remove the pivot
        lo = [x for x in seq if x <= pi]   # All the small elements
        hi = [x for x in seq if x > pi]    # All the large ones
        return lo, pi, hi                  # pi is in the right place

    def select(seq, k):
        lo, pi, hi = partition(seq)        # [<= pi], pi, [> pi]
        m = len(lo)
        if m == k:
            return pi                      # Found kth smallest
        elif m < k:                        # Too far to the left
            return select(hi, k - m - 1)   # Remember to adjust k
        else:                              # Too far to the right
            return select(lo, k)           # Use original k here
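As a quick usage example, recall the opening question: selection with k = n/2 computes the median. A minimal sketch of a median helper built on the select() above (the helper is my own illustration, not from the original slides; it assumes an odd-length sequence and the zero-indexed k convention of the code):

    def median(seq):
        # With zero-indexed k, the middle element of an
        # odd-length sequence sits at k = len(seq) // 2
        return select(seq, len(seq) // 2)

    >>> median([3, 1, 4, 1, 5])
    3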

A verbose Select function

    def select(seq, k):
        lo, pi, hi = partition(seq)       # [<= pi], pi, [> pi]
        print(lo, pi, hi)
        m = len(lo)
        print('small partition length %i' % m)
        if m == k:
            print('found kth smallest %s' % pi)
            return pi                     # Found kth smallest
        elif m < k:                       # Too far to the left
            print('small partition has %i elements, so kth must be in right sequence' % m)
            return select(hi, k - m - 1)  # Remember to adjust k
        else:                             # Too far to the right
            print('small partition has %i elements, so kth must be in left sequence' % m)
            return select(lo, k)          # Use original k here

Seeing the Select in action

    >>> select([3, 4, 1, 6, 3, 7, 9, 13, 93, 0, 100, 1, 2, 2, 3, 3, 2], 4)
    [1, 3, 0, 1, 2, 2, 3, 3, 2] 3 [4, 6, 7, 9, 13, 93, 100]
    small partition length 9
    small partition has 9 elements, so kth must be in left sequence
    [0, 1] 1 [3, 2, 2, 3, 3, 2]
    small partition length 2
    small partition has 2 elements, so kth must be in right sequence
    [2, 2, 3, 3, 2] 3 []
    small partition length 5
    small partition has 5 elements, so kth must be in left sequence
    [2, 2] 2 [3, 3]
    small partition length 2
    small partition has 2 elements, so kth must be in left sequence
    [2] 2 []
    small partition length 1
    found kth smallest 2
    2

From Quickselect to Quicksort

Question: what if we wanted to know all k smallest items, for k = 1, ..., n?

    def quicksort(seq):
        if len(seq) <= 1:
            return seq                                   # Base case
        lo, pi, hi = partition(seq)                      # pi is in its place
        return quicksort(lo) + [pi] + quicksort(hi)      # Sort lo and hi separately

Best case for Quicksort

The total partitioning on each level is O(n), and it takes lg n levels of perfect partitions to reach single-element subproblems. When we are down to single elements, the problems are sorted. Thus the total time in the best case is O(n lg n).
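For reference, here is a run of the quicksort above on the same input as the select trace (my own example, assuming the partition() defined earlier):

    >>> quicksort([3, 4, 1, 6, 3, 7, 9, 13, 93, 0, 100, 1, 2, 2, 3, 3, 2])
    [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 6, 7, 9, 13, 93, 100]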

Worst case for Quicksort

Suppose instead our pivot element splits the array as unequally as possible: instead of n/2 elements in the smaller half, we get zero, meaning that the pivot element is the biggest or smallest element in the array. Now we have n - 1 levels, instead of lg n, for a worst-case time of Θ(n^2), since the first n/2 levels each have at least n/2 elements to partition.

Picking Better Pivots

Having the worst case occur when the input is sorted or almost sorted is very bad, since that is likely to be the case in certain applications. To eliminate this problem, pick a better pivot:

1. Use the middle element of the subarray as pivot.
2. Use a random element of the array as the pivot.
3. Perhaps best of all, take the median of three elements (first, last, middle) as the pivot. Why should we use the median instead of the mean?

Whichever of these three rules we use, the worst case remains O(n^2).

Randomized Quicksort

Suppose you are writing a sorting program to run on data given to you by your worst enemy. Quicksort is good on average, but bad on certain worst-case instances. If you used Quicksort, what kind of data would your enemy give you to run it on? Exactly the worst-case instance, to make you look bad.

But instead of picking the median of three or the first element as pivot, suppose you picked the pivot element at random. Now your enemy cannot design a worst-case instance to give to you, because no matter which data they give you, you would have the same probability of picking a good pivot!

Randomized Guarantees

Randomization is a very important and useful idea. By either picking a random pivot or scrambling the permutation before sorting it, we can say:

    With high probability, randomized quicksort runs in Θ(n lg n) time.

Where before, all we could say is:

    If you give me random input data, quicksort runs in expected Θ(n lg n) time.

See the difference?
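A minimal sketch of what a random-pivot partition could look like (the random swap is my own illustration; the slides' partition() always takes the first element):

    import random

    def randomized_partition(seq):
        seq = list(seq)                    # don't mutate the caller's list
        i = random.randrange(len(seq))     # choose the pivot position uniformly at random
        seq[0], seq[i] = seq[i], seq[0]    # swap it to the front, then partition as before
        pi, rest = seq[0], seq[1:]
        lo = [x for x in rest if x <= pi]  # all the small elements
        hi = [x for x in rest if x > pi]   # all the large ones
        return lo, pi, hi

Dropping this in place of partition() inside quicksort or select gives the randomized variants discussed above.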

Importance of Randomization

Since the time bound does not depend upon your input distribution, this means that unless we are extremely unlucky (as opposed to ill prepared or unpopular) we will certainly get good performance. Randomization is a general tool to improve algorithms with bad worst-case but good average-case complexity. The worst case is still there, but we almost certainly won't see it.

Recurrence Relations

Recurrence relations specify the cost of executing recursive functions. Consider mergesort:

1. Linear-time cost to divide the lists
2. Two recursive calls are made, each given half the original input
3. Linear-time cost to merge the resulting lists together

Recurrence: T(n) = 2T(n/2) + Θ(n)

Great, but how does this help us estimate the running time?

Master Theorem Extracts Time Complexity from Some Recurrences

If T(n) = a T(n/b) + c n^k for constants a >= 1, b > 1, c > 0, k >= 0, then:

    T(n) = Θ(n^k)           if a < b^k
    T(n) = Θ(n^k log n)     if a = b^k
    T(n) = Θ(n^(log_b a))   if a > b^k

Apply the Master Theorem

So what's the complexity of Mergesort? The mergesort recurrence is T(n) = 2T(n/2) + Θ(n). Since a = 2, b = 2, k = 1, and 2 = 2^1, we get T(n) = Θ(n^k log n) = Θ(n log n).

Let's try another one: T(n) = 3T(n/5) + 8n^2. Well, a = 3, b = 5, c = 8, k = 2, and 3 < 5^2. Thus T(n) = Θ(n^2).
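The case analysis above is mechanical enough to write down in code. A small sketch (the master() helper is my own illustration, assuming the simplified theorem form used on these slides with driving function exactly Θ(n^k)):

    from math import log

    def master(a, b, k):
        # Classify T(n) = a*T(n/b) + Theta(n**k) by the simplified Master Theorem
        if a < b**k:
            return 'Theta(n^%g)' % k
        elif a == b**k:
            return 'Theta(n^%g log n)' % k
        else:
            return 'Theta(n^%g)' % log(a, b)

    >>> master(2, 2, 1)   # mergesort: a = b^k
    'Theta(n^1 log n)'
    >>> master(3, 5, 2)   # a < b^k: glue cost dominates
    'Theta(n^2)'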

Apply the Master Theorem

Now it's your turn: T(n) = 4T(n/2) + 5n

What's going on in the three cases?

1. Too many leaves: the cost of the leaf nodes outweighs the sum of the divide and glue costs
2. Equal work per level: the split between leaves matches the divide and glue costs
3. Too expensive a root: the divide and glue costs dominate

Quicksort Recurrence (Average Case)

    def quicksort(seq):
        if len(seq) <= 1:
            return seq                                   # Base case
        lo, pi, hi = partition(seq)                      # pi is in its place
        return quicksort(lo) + [pi] + quicksort(hi)      # Sort lo and hi separately

T(n) = 2T(n/2) + Θ(n), so T(n) = Θ(n log n)

Selection (Average Case)

    def select(seq, k):
        lo, pi, hi = partition(seq)       # [<= pi], pi, [> pi]
        m = len(lo)
        if m == k:
            return pi                     # Found kth smallest
        elif m < k:                       # Too far to the left
            return select(hi, k - m - 1)  # Remember to adjust k
        else:                             # Too far to the right
            return select(lo, k)          # Use original k here

On average the pivot splits the sequence roughly in half, but select recurses into only one side: T(n) = T(n/2) + Θ(n), so T(n) = Θ(n).
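For checking the "your turn" exercise above (my worked answer, not from the slides): with T(n) = 4T(n/2) + 5n we have a = 4, b = 2, k = 1, and 4 > 2^1, so the "too many leaves" case applies and T(n) = Θ(n^(log_2 4)) = Θ(n^2). The sketch from the previous slide agrees: master(4, 2, 1) returns 'Theta(n^2)'.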