CS 332: Algorithms. Linear-Time Sorting. Order statistics. Slide credit: David Luebke (Virginia)

Similar documents
Design and Analysis of Algorithms

CS 332: Algorithms. Quicksort

Sorting Algorithms. Algorithms Kyuseok Shim SoEECS, SNU.

Analysis of Algorithms -Quicksort-

CS / MCS 401 Homework 3 grader solutions

CS583 Lecture 02. Jana Kosecka. some materials here are based on E. Demaine, D. Luebke slides

4.3 Growth Rates of Solutions to Recurrences

This Lecture. Divide and Conquer. Merge Sort: Algorithm. Merge Sort Algorithm. MergeSort (Example) - 1. MergeSort (Example) - 2

Divide & Conquer. Divide-and-conquer algorithms. Conventional product of polynomials. Conventional product of polynomials.

Analysis of Algorithms. Introduction. Contents

Algorithms and Data Structures Lecture IV

CS 270 Algorithms. Oliver Kullmann. Growth of Functions. Divide-and- Conquer Min-Max- Problem. Tutorial. Reading from CLRS for week 2

Model of Computation and Runtime Analysis

Data Structures Lecture 9

Data Structures and Algorithm. Xiaoqing Zheng

Model of Computation and Runtime Analysis

Average-Case Analysis of QuickSort

COMP285 Midterm Exam Department of Mathematics

CSE 4095/5095 Topics in Big Data Analytics Spring 2017; Homework 1 Solutions

Divide and Conquer. 1 Overview. 2 Multiplying Bit Strings. COMPSCI 330: Design and Analysis of Algorithms 1/19/2016 and 1/21/2016

OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES

Design and Analysis of ALGORITHM (Topic 2)

Introduction to Algorithms 6.046J/18.401J LECTURE 3 Divide and conquer Binary search Powering a number Fibonacci numbers Matrix multiplication

Algorithm Analysis. Algorithms that are equally correct can vary in their utilization of computational resources

A Probabilistic Analysis of Quicksort

Department of Informatics Prof. Dr. Michael Böhlen Binzmühlestrasse Zurich Phone:

Test One (Answer Key)

CS161: Algorithm Design and Analysis Handout #10 Stanford University Wednesday, 10 February 2016

Algorithms Design & Analysis. Divide & Conquer

Fundamental Algorithms

Advanced Course of Algorithm Design and Analysis

Classification of problem & problem solving strategies. classification of time complexities (linear, logarithmic etc)

A recurrence equation is just a recursive function definition. It defines a function at one input in terms of its value on smaller inputs.

Matriculation number: You have 90 minutes to complete the exam of InformatikIIb. The following rules apply:

CSE 5311 Notes 1: Mathematical Preliminaries

Review Of Topics. Review: Induction

Divide and Conquer. 1 Overview. 2 Insertion Sort. COMPSCI 330: Design and Analysis of Algorithms 1/19/2016 and 1/21/2016

CSI 5163 (95.573) ALGORITHM ANALYSIS AND DESIGN

) n. ALG 1.3 Deterministic Selection and Sorting: Problem P size n. Examples: 1st lecture's mult M(n) = 3 M ( È

An Introduction to Randomized Algorithms

6.046 Recitation 5: Binary Search Trees Bill Thies, Fall 2004 Outline

CIS 121 Data Structures and Algorithms with Java Spring Code Snippets and Recurrences Monday, February 4/Tuesday, February 5

Chapter 22 Developing Efficient Algorithms

Infinite Sequences and Series

CS:3330 (Prof. Pemmaraju ): Assignment #1 Solutions. (b) For n = 3, we will have 3 men and 3 women with preferences as follows: m 1 : w 3 > w 1 > w 2

Sums, products and sequences

Merge and Quick Sort

COMP26120: More on the Complexity of Recursive Programs (2018/19) Lucas Cordeiro

CS161 Design and Analysis of Algorithms. Administrative

CSED233: Data Structures (2018F) Lecture13: Sorting and Selection

Zeros of Polynomials

CSE 202 Homework 1 Matthias Springer, A Yes, there does always exist a perfect matching without a strong instability.

Hashing and Amortization

ECEN 655: Advanced Channel Coding Spring Lecture 7 02/04/14. Belief propagation is exact on tree-structured factor graphs.

Number Representation

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled

Divide and Conquer II

0 1 sum= sum= sum= sum= sum= sum= sum=64

Induction: Solutions

Recursive Algorithms. Recurrences. Recursive Algorithms Analysis

ITEC 360 Data Structures and Analysis of Algorithms Spring for n 1

Riemann Sums y = f (x)

Recurrence Relations

XT - MATHS Grade 12. Date: 2010/06/29. Subject: Series and Sequences 1: Arithmetic Total Marks: 84 = 2 = 2 1. FALSE 10.

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.

6.3 Testing Series With Positive Terms

UC Berkeley CS 170: Efficient Algorithms and Intractable Problems Handout 17 Lecturer: David Wagner April 3, Notes 17 for CS 170

1 Hash tables. 1.1 Implementation

Quantum Computing Lecture 7. Quantum Factoring

Disjoint set (Union-Find)

Examples: data compression, path-finding, game-playing, scheduling, bin packing

Math 475, Problem Set #12: Answers

Data Structures and Algorithms

Lecture 3: Asymptotic Analysis + Recurrences

Mathematical Foundation. CSE 6331 Algorithms Steve Lai

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Lecture 19: Convergence

MA131 - Analysis 1. Workbook 3 Sequences II

COMP26120: Introducing Complexity Analysis (2018/19) Lucas Cordeiro

Recursive Algorithm for Generating Partitions of an Integer. 1 Preliminary

Design and Analysis of Algorithms

Sequences, Mathematical Induction, and Recursion. CSE 2353 Discrete Computational Structures Spring 2018

Dotting The Dot Map, Revisited. A. Jon Kimerling Dept. of Geosciences Oregon State University

The Growth of Functions. Theoretical Supplement

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:

CS 330 Discussion - Probability

Spectral Partitioning in the Planted Partition Model

Bertrand s Postulate

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

Practice Problems: Taylor and Maclaurin Series

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

EE260: Digital Design, Spring n Binary Addition. n Complement forms. n Subtraction. n Multiplication. n Inputs: A 0, B 0. n Boolean equations:

1. By using truth tables prove that, for all statements P and Q, the statement

Homework 5 Solutions

Calculus with Analytic Geometry 2

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

1 Generating functions for balls in boxes

3.2 Properties of Division 3.3 Zeros of Polynomials 3.4 Complex and Rational Zeros of Polynomials

4.1 SIGMA NOTATION AND RIEMANN SUMS

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

Transcription:

1 CS 332: Algorithms Liear-Time Sortig. Order statistics. Slide credit: David Luebke (Virgiia)

Quicksort: Partitio I Words Partitio(A, p, r): Select a elemet to act as the pivot (which?) Grow two regios, A[p..i] ad A[j..r] All elemets i A[p..i] <= pivot All elemets i A[j..r] >= pivot Icremet i util A[i] >= pivot Decremet j util A[j] <= pivot Swap A[i] ad A[j] Repeat util i >= j Retur j 2 Note: slightly differet from book s partitio()

Partitio Code Partitio(A, p, r) x = A[p]; i = p - 1; j = r + 1; while (TRUE) repeat j--; util A[j] <= x; repeat i++; util A[i] >= x; if (i < j) Swap(A, i, j); else retur j; 3 Illustrate o A = {5, 3, 2, 6, 4, 1, 3, 7}; What is the ruig time of partitio()?

Partitio Code Partitio(A, p, r) x = A[p]; i = p - 1; j = r + 1; while (TRUE) repeat j--; util A[j] <= x; repeat i++; util A[i] >= x; if (i < j) Swap(A, i, j); else retur j; 4 partitio() rus i O() time

5 Sortig So Far Isertio sort: Easy to code Fast o small iputs (less tha ~50 elemets) Fast o early-sorted iputs O( 2 ) worst case O( 2 ) average (equally-likely iputs) case O( 2 ) reverse-sorted case

6 Sortig So Far Merge sort: Divide-ad-coquer: Split array i half Recursively sort subarrays Liear-time merge step O( lg ) worst case Does t sort i place

7 Sortig So Far Quick sort: Divide-ad-coquer: Partitio array ito two subarrays, recursively sort All of first subarray < all of secod subarray No merge step eeded! O( lg ) average case Fast i practice O( 2 ) worst case Naïve implemetatio: worst case o sorted iput Address this with radomized quicksort

8 How Fast Ca We Sort? We will provide a lower boud, the beat it How do you suppose we ll beat it? First, a observatio: all of the sortig algorithms so far are compariso sorts The oly operatio used to gai orderig iformatio about a sequece is the pairwise compariso of two elemets Theorem: all compariso sorts are Ω( lg ) A compariso sort must do O() comparisos (why?) What about the gap betwee O() ad O( lg )

9 Decisio Trees Decisio trees provide a abstractio of compariso sorts A decisio tree represets the comparisos made by a compariso sort. Every thig else igored (Draw examples o board) What do the leaves represet? How may leaves must there be?

10 Decisio Trees Decisio trees ca model compariso sorts. For a give algorithm: Oe tree for each Tree paths are all possible executio traces What s the logest path i a decisio tree for isertio sort? For merge sort? What is the asymptotic height of ay decisio tree for sortig elemets? Aswer: Ω( lg ) (ow let s prove it )

11 Lower Boud For Compariso Sortig Thm: Ay decisio tree that sorts elemets has height Ω( lg ) What s the miimum # of leaves? What s the maximum # of leaves of a biary tree of height h? Clearly the miimum # of leaves is less tha or equal to the maximum # of leaves i a biary tree of that height

12 Lower Boud For Compariso Sortig So we have! 2 h Takig logarithms: lg (!) h Stirlig s approximatio tells us:!> e Thus: h lg e

Lower Boud For Compariso Sortig So we have h lg e = lg lg e = Ω ( lg ) Thus the miimum height of a decisio tree is Ω( lg ) 13

14 Lower Boud For Compariso Sorts Thus the time to compariso sort elemets is Ω( lg ) Corollary: Mergesort is a asymptotically optimal compariso sort But the ame of this lecture is Sortig i liear time! How ca we do better tha Ω( lg )?

15 Sortig I Liear Time Coutig sort No comparisos betwee elemets! But depeds o assumptio about the umbers beig sorted We assume umbers are i the rage 1.. k The algorithm: Iput: A[1..], where A[j] {1, 2, 3,, k} Output: B[1..], sorted (otice: ot sortig i place) Also: Array C[1..k] for auxiliary storage

16 Coutig Sort 1 CoutigSort(A, B, k) 2 for i=1 to k 3 C[i]= 0; 4 for j=1 to 5 C[A[j]] += 1; 6 for i=2 to k 7 C[i] = C[i] + C[i-1]; 8 for j= dowto 1 9 B[C[A[j]]] = A[j]; 10 C[A[j]] -= 1; Work through example: A={4 1 3 4 3}, k = 4

17 Coutig Sort 1 CoutigSort(A, B, k) 2 for i=1 to k Takes time O(k) 3 C[i]= 0; 4 for j=1 to 5 C[A[j]] += 1; 6 for i=2 to k 7 C[i] = C[i] + C[i-1]; Takes time O() 8 for j= dowto 1 9 B[C[A[j]]] = A[j]; 10 C[A[j]] -= 1; What will be the ruig time?

18 Coutig Sort Total time: O( + k) Usually, k = O() Thus coutig sort rus i O() time But sortig is Ω( lg )! No cotradictio--this is ot a compariso sort (i fact, there are o comparisos at all!) Notice that this algorithm is stable

19 Coutig Sort Cool! Why do t we always use coutig sort? Because it depeds o rage k of elemets Could we use coutig sort to sort 32 bit itegers? Why or why ot? Aswer: o, k too large (2 32 = 4,294,967,296)

20 Coutig Sort How did IBM get rich origially? Aswer: puched card readers for cesus tabulatio i early 1900 s. I particular, a card sorter that could sort cards ito differet bis Each colum ca be puched i 12 places Decimal digits use 10 places Problem: oly oe colum ca be sorted o at a time

21 Radix Sort Ituitively, you might sort o the most sigificat digit, the the secod msd, etc. Problem: lots of itermediate piles of cards (read: scratch arrays) to keep track of Key idea: sort the least sigificat digit first RadixSort(A, d) for i=1 to d StableSort(A) o digit i

Radix Sort Ca we prove it will work? Sketch of a iductive argumet (iductio o the umber of passes): Assume lower-order digits {j: j<i}are sorted Show that sortig ext digit i leaves array correctly sorted If two digits at positio i are differet, orderig umbers by that digit is correct (lower-order digits irrelevat) If they are the same, umbers are already sorted o the lower-order digits. Sice we use a stable sort, the 22

23 Radix Sort What sort will we use to sort o digits? Coutig sort is obvious choice: Sort umbers o digits that rage from 1..k Time: O( + k) Each pass over umbers with d digits takes time O(+k), so total time O(d+dk) Whe d is costat ad k=o(), takes O() time How may bits i a computer word?

24 Radix Sort Problem: sort 1 millio 64-bit umbers Treat as four-digit radix 2 16 umbers Ca sort i just four passes with radix sort! Compares well with typical O( lg ) compariso sort Requires approx lg = 20 operatios per umber beig sorted So why would we ever use aythig but radix sort?

25 Radix Sort I geeral, radix sort based o coutig sort is Fast Asymptotically fast (i.e., O()) Simple to code A good choice To thik about: Ca radix sort be used o floatig-poit umbers?

Bucket Sort Bucket sort Assumptio: iput is reals from [0, 1) Basic idea: Create liked lists (buckets) to divide iterval [0,1) ito subitervals of size 1/ Add each iput elemet to appropriate bucket ad sort buckets with isertio sort Uiform iput distributio O(1) bucket size Therefore the expected total time is O() These ideas will retur whe you will lear about hash tables 26

27 Order Statistics The ith order statistic i a set of elemets is the ith smallest elemet The miimum is thus the 1st order statistic The maximum is (duh) the th order statistic The media is the /2 order statistic If is eve, there are 2 medias How ca we calculate order statistics? What is the ruig time?

Order Statistics How may comparisos are eeded to fid the miimum elemet i a set? The maximum? Ca we fid the miimum ad maximum with less tha twice the cost? Yes: Walk through elemets by pairs Compare each elemet i pair to the other Compare the largest to maximum, smallest to miimum Total cost: 3 comparisos per 2 elemets = O(3/2) 28

29 Fidig Order Statistics: The Selectio Problem A more iterestig problem is selectio: fidig the ith smallest elemet of a set We will show: A practical radomized algorithm with O() expected ruig time A cool algorithm of theoretical iterest oly with O() worst-case ruig time

30 Radomized Selectio Key idea: use partitio() from quicksort But, oly eed to examie oe subarray This savigs shows up i ruig time: O() We will agai use a slightly differet partitio tha the book: q = RadomizedPartitio(A, p, r) A[q] A[q] p q r

Radomized Selectio RadomizedSelect(A, p, r, i) if (p == r) the retur A[p]; q = RadomizedPartitio(A, p, r) k = q - p + 1; if (i == k) the retur A[q]; // ot i book if (i < k) the retur RadomizedSelect(A, p, q-1, i); else retur RadomizedSelect(A, q+1, r, i-k); k A[q] A[q] p 31 q r

32 Radomized Selectio Aalyzig RadomizedSelect() Worst case: partitio always 0:-1 T() = T(-1) + O() =??? = O( 2 ) (arithmetic series) No better tha sortig! Best case: suppose a 9:1 partitio T() = T(9/10) + O() =??? = O() (Master Theorem, case 3) Better tha sortig! What if this had bee a 99:1 split?

33 Radomized Selectio Average case For upper boud, assume ith elemet always falls i larger side of partitio: T 1 1 ( ) T ( max( k, k 1) ) + Θ( ) k = 0 2 1 T k = / 2 ( k ) + Θ( ) What happeed here? Let s show that T() = O() by substitutio

34 What happeed here? Split the recurrece What happeed here? What happeed here? What happeed here? Radomized Selectio Assume T() c for sufficietly large c: ( ) ( ) ( ) ( ) ( ) ( ) ( ) c c c k k c ck k T T k k k k + Θ = + Θ = + Θ = + Θ + Θ = = = = 1 2 2 1 2 1 2 2 1 1 2 1 2 2 2 ) ( 2 ) ( 1 2 1 1 1 1 2 / 1 2 / The recurrece we started with Substitute T() c for T(k) Expad arithmetic series Multiply it out

35 Radomized Selectio Assume T() c for sufficietly large c: T ( ) = = = c c 2 2 c c c c + + Θ( ) 4 2 c c c + Θ( ) 4 2 c c c + Θ( ) 4 2 c (if c is big eough) ( 1) 1 + Θ( ) The recurrece so far What Multiply happeed it out here? What Subtract happeed c/2 here? What Rearrage happeed the arithmetic here? What happeed we set out here? to prove

36 Worst-Case Liear-Time Selectio Radomized algorithm works well i practice What follows is a worst-case liear time algorithm, really of theoretical iterest oly Basic idea: Geerate a good partitioig elemet Call this elemet x

37 Worst-Case Liear-Time Selectio The algorithm i words: 1. Divide elemets ito groups of 5 2. Fid media of each group (How? How log?) 3. Use Select() recursively to fid media x of the /5 medias 4. Partitio the elemets aroud x. Let k = rak(x) 5. if (i == k) the retur x if (i < k) the use Select() recursively to fid ith smallest elemet i first partitio else (i > k) use Select() recursively to fid (i-k)th smallest elemet i last partitio

38 Worst-Case Liear-Time Selectio (Sketch situatio o the board) How may of the 5-elemet medias are x? At least 1/2 of the medias = /5 / 2 = /10 How may elemets are x? At least 3 /10 elemets For large, 3 /10 /4 (How large?) So at least /4 elemets x Similarly: at least /4 elemets x

Worst-Case Liear-Time Selectio Thus after partitioig aroud x, step 5 will call Select() o at most 3/4 elemets The recurrece is therefore: T ( ) T 5 + T 3 4 = T c c c ( ) ( ) + Θ( ) ( 5) + T ( 3 4) + Θ( ) 5 = 19c + 3c ( c 20 Θ( ) ) if 20 + Θ( ) c is 39 4 + Θ( ) big eough /5??? /5 Substitute T() =??? c Combie fractios??? Express i desired form??? What we set out to prove???

40 Worst-Case Liear-Time Selectio Ituitively: Work at each level is a costat fractio (19/20) smaller Geometric progressio! Thus the O() work at the root domiates

41 Liear-Time Media Selectio Give a black box O() media algorithm, what ca we do? ith order statistic: Fid media x Partitio iput aroud x if (i (+1)/2) recursively fid ith elemet of first half else fid (i - (+1)/2)th elemet i secod half T() = T(/2) + O() = O() Ca you thik of a applicatio to sortig?

42 Liear-Time Media Selectio Worst-case O( lg ) quicksort Fid media x ad partitio aroud it Recursively quicksort two halves T() = 2T(/2) + O() = O( lg )

The Ed 43