CSCE 750, Spring 2001 -- Notes 2

4 Chapter 4: Sorting

- Reasons for studying sorting:
  - sorting is a big deal in practice
  - sorting is pedagogically useful:
    - the application itself is easy to understand
    - a complete analysis can often be done
    - different sorting algorithms illustrate different ideas
- Theoretical issues:
  - count comparisons
  - worst case versus average case
- Practical issues:
  - space
  - recursion versus iteration
  - external sort versus internal sort
  - sort keys, not data
  - number of moves
  - memory to memory? disk to disk? locality of reference?

/* Sort the array a[1..count].
   Perform only the first `howmany' sort steps.
   Keep track of the total number of comparisons. */
void bubble(long a[], long howmany, long count, long *comparisons)
{
    long i, j, temp;

    if (howmany == count)
        howmany--;
    for (i = 1; i <= howmany; i++)
        for (j = i + 1; j <= count; j++) {
            (*comparisons)++;            /* count every key comparison */
            if (a[i] > a[j]) {           /* out of order: exchange */
                temp = a[i];
                a[i] = a[j];
                a[j] = temp;
            }
        }
}

/* Sort the array a[1..count] by exchanging adjacent elements.
   Perform only the first `howmany' sort steps.
   Keep track of the total number of comparisons. */
void insert(long a[], long howmany, long count, long *comparisons)
{
    long j, length, temp;

    if (howmany == count)
        howmany--;
    for (length = 2; length <= howmany; length++)
        for (j = length - 1; j >= 1; j--) {
            (*comparisons)++;
            if (a[j] > a[j+1]) {         /* out of order: exchange */
                temp = a[j];
                a[j] = a[j+1];
                a[j+1] = temp;
            } else
                break;                   /* a[1..length] is now sorted */
        }
}

/* Sort the array a[1..count] by shifting elements to fill a vacancy.
   Perform only the first `howmany' sort steps.
   Keep track of the total number of comparisons. */
void insert2(long a[], long howmany, long count, long *comparisons)
{
    long j, length, insertvalue;

    if (howmany == count)
        howmany--;
    for (length = 2; length <= howmany; length++) {
        insertvalue = a[length];         /* pull out: a[length] is vacant */
        for (j = length - 1; j >= 1; j--) {
            (*comparisons)++;
            if (a[j] > insertvalue)
                a[j+1] = a[j];           /* shift right; vacancy moves left */
            else
                break;
        }
        a[j+1] = insertvalue;            /* drop the value into the vacancy */
    }
}
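As a usage sketch (my addition, not part of the original notes): the routines above expect a 1-based array and a pointer to a running comparison counter, so a driver might look like the following. The reverse-order input is chosen to illustrate the worst case.

#include <stdio.h>
#include <stdlib.h>

void insert2(long a[], long howmany, long count, long *comparisons);

int main(void)
{
    long count = 1000;
    long *a = malloc((count + 1) * sizeof(long));  /* slots 1..count used */
    long comparisons = 0;
    long i;

    for (i = 1; i <= count; i++)
        a[i] = count - i + 1;            /* reverse order: worst case input */

    insert2(a, count, count, &comparisons);
    printf("comparisons: %ld\n", comparisons);

    free(a);
    return 0;
}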

4.1 Lower Bounds

For some classes of sorting algorithms we can prove lower bounds on complexity. We will describe the nature of sorting algorithms by noting first that any list of n elements is some permutation π of its sorted order.

Definition 4.1. Given a permutation π on n integer keys, an inversion is a pair (i, j) such that i < j but π(i) > π(j).

For example, the list 4 3 5 2 1 has the eight inversions (written as pairs of out-of-order values)

    (4,3), (4,2), (4,1), (3,2), (3,1), (5,2), (5,1), (2,1).

(For those of you who know about permutation groups: note that this definition of inversion is not at all the same as that of a transposition.)

For every n, there exists a permutation with n(n-1)/2 inversions, namely the reverse of sorted order. (This can be proved either directly or by induction; note that achieving n(n-1)/2 requires that every pair be out of order.)
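As a concrete check of the definition (my addition, not from the notes), here is a brute-force routine that counts inversions; run on the list above, it reports 8.

#include <stdio.h>

/* Count inversions by checking every pair (i, j) with i < j.
   O(n^2), but fine for illustrating the definition. */
long count_inversions(const long a[], long n)
{
    long i, j, inv = 0;
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (a[i] > a[j])
                inv++;
    return inv;
}

int main(void)
{
    long list[] = {4, 3, 5, 2, 1};
    printf("%ld inversions\n", count_inversions(list, 5));  /* prints 8 */
    return 0;
}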

Theorem 4.2. A sorting algorithm that sorts by comparison of keys and that removes at most one inversion for every comparison made has worst case running time of at least n(n-1)/2 comparisons and average case running time of at least n(n-1)/4 comparisons.

Proof. The worst case is immediate: for every n there exists a permutation with n(n-1)/2 inversions, so if only one inversion is removed with each comparison, the worst case requires at least n(n-1)/2 comparisons.

Average case: Consider the permutations in pairs, each permutation paired with its reverse. Each of the n(n-1)/2 pairs of elements is an inversion in exactly one of the two permutations, so the n(n-1)/2 inversions are shared between the two permutations. On average, then, a random permutation has half that many inversions, and the theorem is proved.

Corollary 4.3. Insertion sort has worst case running time of at least n(n-1)/2 comparisons and average case running time of at least n(n-1)/4 comparisons.

Proof (of the corollary). Insertion sort has the effect of comparing adjacent elements in a list and exchanging them if they are out of order. ("Has the effect" because one version does not exchange adjacent elements but instead creates a vacancy. However, the effect is no different from what would happen if the element to be inserted were in fact inserted into the vacancy at every step, and then it would be true that adjacent elements had been compared.) If one compares two adjacent elements a_i and a_{i+1} that are out of order (that is, a_i > a_{i+1}) and then exchanges them, the only change in the number of inversions is that the inversion (a_i, a_{i+1}) is eliminated. So the conditions of the theorem are met, and the corollary follows.

4.2 Quicksort

- Let's assume that we have n items to sort.
- (a) Assume that we could (invoke an oracle here) pick out the (n/2)-th smallest element a_k, which we can re-index to be a_{n/2}.
- (b) Now let's assume that we rearrange the array of n items so that all items subscripted less than n/2 are less than a_{n/2} and all items subscripted greater than n/2 are greater than a_{n/2}. Then call this procedure recursively, invoking the oracle at every stage so as to choose the midpoint element every time.
- n items.
- lg n recursion depth.
- Let's assume that with each recursion we can arrange the data as in (b) in time proportional to n.
- Then we could sort the entire array in time O(n lg n).
- We can't, of course, find an oracle. But if we're lucky and careful, we might be able to come close in the average case to choosing the midpoint element as our pivot element.
- As for the rearrangement (b):
  - Assume the pivot element is the first element in the array. (If it isn't, exchange the pivot element with the first element.) Pull the pivot element aside to create a vacancy in position one.
  - Work from the end toward the front until we find the first element smaller than the pivot. Put that element in the vacancy in position one, thus creating a vacancy in the second half of the array.
  - Now work forward from position 2 until we find an element that belongs in the second half of the array. Put that in the vacancy, creating a first-half vacancy.
  - And so forth.
  - At the end, the vacancy will be in the middle of the array, and we put our pivot element there.
- Bottom line: in time proportional to n we can split the array into a first half smaller than the pivot element and a second half larger than the pivot element. (A code sketch of the rearrangement follows.)
- Assuming the oracle, then, we could guarantee n lg n running time.
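Here is one way the vacancy rearrangement might look in C (my reconstruction; the notes give no code for this, and the function and variable names are mine):

/* Partition a[lo..hi] (inclusive) around the pivot a[lo], using the
   "vacancy" scheme described above.  Returns the pivot's final index. */
long partition_vacancy(long a[], long lo, long hi)
{
    long pivot = a[lo];   /* pull the pivot aside: a[lo] is now vacant */
    long i = lo, j = hi;

    while (i < j) {
        /* scan from the right for an element smaller than the pivot */
        while (i < j && a[j] >= pivot)
            j--;
        a[i] = a[j];      /* fill the left vacancy; a[j] is now vacant */

        /* scan from the left for an element larger than the pivot */
        while (i < j && a[i] <= pivot)
            i++;
        a[j] = a[i];      /* fill the right vacancy; a[i] is now vacant */
    }
    a[i] = pivot;         /* the vacancy ends up where the pivot belongs */
    return i;
}

Quicksort itself is then: if lo < hi, let p = partition_vacancy(a, lo, hi) and recurse on a[lo..p-1] and a[p+1..hi].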

4.2.1 Worst Case

Without the oracle, how bad could this be? If our pivot element were in fact the smallest element, then we would not be doing a divide and conquer that splits the array in half each time; we would only be shortening the list by one element every time. In this case we do n recursion steps and have to compare the pivot element against n - k elements in the k-th step.

- So we have Θ(n^2) running time in the worst case.

4.2.2 Average Case

Theorem 4.4. Quicksort has average case running time, for large n, of approximately 2 n log n comparisons.

Proof.
- Let A(n) be the average case number of comparisons for sorting n elements.
- Our choice of pivot element is random, so it is the i-th element for random i and splits the array of length n into subarrays of length i and n - i - 1, with each i having probability 1/n.
- We do n - 1 comparisons in the initial split.
- We have A(0) = A(1) = 0.
- So the average case running time satisfies

    A(n) = n - 1 + (1/n) Σ_{i=0}^{n-1} [ A(i) + A(n-i-1) ],   n ≥ 2.

- This can be collapsed into

    A(n) = n - 1 + (2/n) Σ_{i=1}^{n-1} A(i),   n ≥ 1,          (*)

  because the initial conditions happen to fit the formula and because the A(i) and A(n-i-1) terms are symmetric.
- We have already seen that if we had an oracle and could choose exactly the right pivot point, we could get the running time down to n log n.
- What if we make a wild leap of faith that average isn't asymptotically worse than perfect?
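Before proving anything we can sanity-check the claim numerically (my addition, not from the notes): the program below evaluates (*) directly and compares A(n) against 2 n ln n. The two columns grow at the same rate, with the ratio drifting slowly toward 1.

#include <stdio.h>
#include <math.h>

/* Evaluate A(n) = n - 1 + (2/n) * sum_{i=1}^{n-1} A(i) directly. */
int main(void)
{
    enum { N = 10000 };
    static double A[N + 1];
    double sum = 0.0;        /* running value of sum_{i=1}^{n-1} A(i) */
    int n;

    A[0] = A[1] = 0.0;
    for (n = 2; n <= N; n++) {
        sum += A[n - 1];
        A[n] = (n - 1) + 2.0 * sum / n;
    }
    for (n = 1000; n <= N; n += 3000)
        printf("n = %5d   A(n) = %12.1f   2 n ln n = %12.1f\n",
               n, A[n], 2.0 * n * log(n));
    return 0;
}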

Proposition 4.5. For n ≥ 1, we have A(n) ≤ c n log n for some constant c. (Here log is the natural logarithm, so that ∫ x log x dx = x² log x / 2 - x² / 4.)

Proof.
- Induct. True for n = 1. Assume true for all i with 1 ≤ i < n. From (*) we have

    A(n) = n - 1 + (2/n) Σ_{i=1}^{n-1} A(i)
         ≤ n - 1 + (2/n) Σ_{i=1}^{n-1} c i log i
         ≤ n - 1 + (2c/n) ∫_1^n x log x dx
         = n - 1 + (2c/n) (n² log n / 2 - n² / 4)
         = n - 1 + c n log n - c n / 2
         = c n log n + (1 - c/2) n - 1.

  If we choose c ≥ 2, then clearly the proposition is proved.
- This gives us an upper bound of 2 n log n. To get the lower bound:
  1. Choose c = 2 - ε.
  2. Restate the proposition with A(n) ≥ c n log n.
  3. Shift the boundaries on the integral to make it go the other way. (The inequality above says that the integral of an increasing function is bounded below by rectangles whose height is taken at the left endpoint of each subinterval; the integral is likewise bounded above by rectangles whose height is taken at the right endpoint.)
- So we're done.

Empirical comparison counts (apparently for n = 10000 keys: the bubble sort count equals n(n-1)/2 exactly for n = 10000, and the ratios are consistent with log meaning the natural log):

    Type                        comparisons     comparisons / (n log n)
    bubble                      49995000.0      542.814
    insertion                   25083810.1      272.344
    quicksort (original)          155983.1        1.694
    quicksort (baase)             153569.2        1.667
    quicksort (median)            134376.2        1.459
    quicksort (median actual)     149383.9        1.622

4.3 Merging Arrays

- If we have two sorted lists each of size n/2, we can merge them with no more than n - 1 comparisons.
- Requires additional space (i.e., can't be done in place).
- Do the obvious thing: keep three pointers, ptr_list_1, ptr_list_2, and ptr_merged_list.

    while (not done)
        if list_1[ptr_list_1] < list_2[ptr_list_2]
            merged_list[ptr_merged_list] <- list_1[ptr_list_1]
            ptr_list_1++
            ptr_merged_list++
        else
            merged_list[ptr_merged_list] <- list_2[ptr_list_2]
            ptr_list_2++
            ptr_merged_list++
        endif
    endwhile
    when we run out of one list, copy the rest of the other list to merged_list

- Worst case: we never run out of either list (until the very end) because the two lists exactly interleave.
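A runnable C version of the merge pseudocode above (my rendering; the names are mine):

/* Merge two sorted arrays list1[0..n1-1] and list2[0..n2-1] into
   merged[0..n1+n2-1].  Uses at most n1 + n2 - 1 comparisons. */
void merge(const long list1[], long n1,
           const long list2[], long n2, long merged[])
{
    long p1 = 0, p2 = 0, pm = 0;

    while (p1 < n1 && p2 < n2) {
        if (list1[p1] < list2[p2])
            merged[pm++] = list1[p1++];
        else
            merged[pm++] = list2[p2++];
    }
    /* when we run out of one list, copy the rest of the other */
    while (p1 < n1)
        merged[pm++] = list1[p1++];
    while (p2 < n2)
        merged[pm++] = list2[p2++];
}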

Theorem 4.6. Any algorithm that merges by comparison of keys two sorted arrays of n/2 elements into one sorted array of n elements must do at least n - 1 comparisons in the worst case.

Proof. Assume arrays A = [a_i] and B = [b_i]. We claim that if

    a_1 < b_1 < a_2 < b_2 < ... < a_i < b_i < a_{i+1} < b_{i+1} < ... < a_{n/2} < b_{n/2},

then we must do n - 1 comparisons: we must compare a_i with b_i for 1 ≤ i ≤ n/2, and we must compare b_i with a_{i+1} for 1 ≤ i ≤ n/2 - 1, for a total of n - 1 comparisons. If for some reason the algorithm is written so as not to compare a_i with b_i for some i, then the algorithm cannot correctly merge two arrays in which these elements are reversed in magnitude. Likewise, if the algorithm is written so as not to compare b_i with a_{i+1} for some i, then it cannot correctly merge two arrays in which those elements are reversed in magnitude. But reversing either pair above does nothing to change the sorted order of the final array except for the order of the pair in question.

4.4 Mergesort

Theorem 4.7. Mergesort has worst case running time of Θ(n log n).

Proof. Clearly

    W(n) = 2 W(n/2) + n - 1
         = 4 W(n/4) + (n - 1) + 2 (n/2 - 1)
         = 4 W(n/4) + 2n - 1 - 2
         = ...
         = n W(1) + n log n - (n - 1)
         = Θ(n log n),

since after k levels of expansion the total merging cost is kn - (2^k - 1), and the recursion bottoms out at k = log n with W(1) = 0.
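For reference, the recursion that Theorem 4.7 analyzes can be rendered as follows (a sketch, my code rather than the notes'; it uses the merge() routine from Section 4.3, and the per-call scratch buffer is one reasonable way to supply the extra space that merging needs):

#include <stdlib.h>
#include <string.h>

void merge(const long list1[], long n1,
           const long list2[], long n2, long merged[]);

/* Sort a[0..n-1] by splitting in half, sorting each half, and merging. */
void merge_sort(long a[], long n)
{
    long half = n / 2;
    long *scratch;

    if (n < 2)
        return;
    merge_sort(a, half);
    merge_sort(a + half, n - half);

    scratch = malloc(n * sizeof(long));   /* the extra space merging needs */
    merge(a, half, a + half, n - half, scratch);
    memcpy(a, scratch, n * sizeof(long));
    free(scratch);
}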

4.5 Lower Bounds in General

- Consider an algorithm that sorts by comparison of keys as a decision tree, as in Figure 4.15.
- There are n! total permutations of n elements, so a decision tree for sorting n elements must have at least n! leaves.
- For a binary tree with k leaves and height h we have k ≤ 2^h, that is, h ≥ lg k.
- So a decision tree with n! leaves must have height at least ⌈lg(n!)⌉.
- The lower bound on the number of comparisons is the lower bound on the height of the decision tree.
- So the lower bound for sorting by comparison of keys is ⌈lg(n!)⌉ comparisons.
- Now, n! ≥ n (n-1) ... ⌈n/2⌉, so lg(n!) ≥ (n/2) lg(n/2).
- Note that lg(n!) ≥ n lg n - (lg e) n ≈ n lg n - 1.443 n.

4.6 Lower Bounds for the Average Case

- The worst case requires looking for the maximal path length down the decision tree.
- The average case requires looking at the average path length, that is,

    (sum of all path lengths to leaves) / n!.

Proposition 4.8. The sum of all path lengths to leaves is minimized when the tree is completely balanced.

Proof. Disconnecting any subtree from a balanced binary tree, and moving that subtree so as to connect it somewhere else in the tree, necessarily increases the sum of the path lengths.

Theorem 4.9. The average case of any sort by comparison of keys is at least lg(n!) comparisons.

Proof. The tree has at least n! leaves, so the minimal sum of all path lengths to leaves is at least (n!)(lg(n!)), and the average path length is therefore bounded below by lg(n!).

4.7 Heapsort

- Heapsort is in theory a better sort:
  1. worst case is n log n,
  2. average case is n log n,
  3. and it doesn't require extra space.
- On the other hand:
  1. it's a fixed-cost algorithm: there's no benefit to having the list sorted to start with,
  2. its constant factor is worse than other sorts' constants,
  3. and it doesn't cache well.

Definition 4.10. A binary tree T of height h is a heap if and only if

  1. T is complete at least through depth h - 1;
  2. all leaves are at depth h or depth h - 1;
  3. all paths to a leaf at depth h are to the left of any path to a leaf at depth h - 1;
  4. the key value at any node is greater than or equal to the key value of either of its children, if any children exist.

- Note that Baase carefully defines a heap structure as a tree with properties 1-3 and a partial order tree as a tree with property 4, and then fails to define a heap itself. Say what?

- Examples: page 183, and elsewhere.
- Heapsort metaphysics: if we have a heap, then
  1. the largest element is the root of the tree;
  2. we can exchange the root with the "last" element
  3. and then recreate the heap in log n steps by pushing the used-to-be-last element into its proper place,
  4. resulting in a heap with one element fewer than before.
  5. So if we can create a heap in the first place in n log n steps,
  6. then we can put the elements into sorted order in another n log n steps.
  7. Further, this would be guaranteed, fixed-cost, running time.
- Figure 4.18, page 185, describes the re-heaping process.
- Algorithm 4.6, page 186.

Lemma 4.11. At most 2h comparisons, where h is the height of the heap, are needed to recreate the heap after exchange of the root with the currently-last element.

Proof. Worst case: at each level we compare the two children of the "current" node to find the larger, and then we compare the larger child with the "current" node to get the largest of all three.

4.7.1 Creating the heap

    void constructheap(heap)
        if (heap != leaf)
            constructheap(left subtree of heap)
            constructheap(right subtree of heap)
            key = root(heap)
            fixheap(heap, key)
        endif

Theorem 4.12. Procedure constructheap() creates a structure with the partial order property (4).
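The notes use fixheap(heap, key) without giving its body. Here is one plausible array-based rendering (my sketch, anticipating the 1-based storage scheme of Section 4.7.3, where the children of node i live at 2i and 2i + 1); in this setting constructheap becomes an equivalent bottom-up loop over the internal nodes.

/* Sift key down from node i until the partial order property holds.
   a[] is 1-based; the children of node i are 2i and 2i+1.  At each
   level we do at most 2 comparisons, matching Lemma 4.11. */
void fixheap(long a[], long n, long i, long key)
{
    long child;

    while ((child = 2 * i) <= n) {
        if (child < n && a[child + 1] > a[child])
            child++;                  /* child = the larger child */
        if (a[child] <= key)
            break;                    /* key can rest at node i */
        a[i] = a[child];              /* pull the larger child up */
        i = child;
    }
    a[i] = key;
}

/* Array form of constructheap: fix subtrees bottom-up.  Both children
   of node i are already heaps by the time fixheap runs on node i, so
   this is equivalent to the recursive version above. */
void constructheap(long a[], long n)
{
    long i;
    for (i = n / 2; i >= 1; i--)
        fixheap(a, n, i, a[i]);
}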

4.7.2 Constructing a Heap: Worst Case Analysis

- Note that since the two halves of the recursive breakdown of the problem are not necessarily equal in size, a precise analysis could be painful.
- However, the work for a problem of size n is bounded below and above by the work for problems of size 2^{h-1} and 2^h, respectively, where 2^{h-1} ≤ n < 2^h.
- So we deal with complete binary trees, with N = 2^h - 1 total nodes.
- Then

    W(N) = 2 W((N-1)/2) + 2 lg N.

- Apply the Master Theorem with b = 2, c = 2, E = 1, and f(N) = 2 lg N.
- Choosing ε such that 0 < ε < 1, we have f(N) = O(N^{1-ε}), and we get W(N) = Θ(N).

4.7.3 Heapsort: Worst Case Analysis

- The description of Heapsort on pages 188-191 seems to me to be especially confused...
- But we can at least describe the storage of the array to be sorted.
- If we have a complete binary tree and we number the nodes top to bottom, left to right, from 1 to n, we find that the two nodes below any particular node i are those labelled 2i and 2i + 1. So we can use this numbering to create and sort a heap using no extra storage.
- The array is a heap under this numbering scheme if, for all i, a[i] ≥ a[2i] and a[i] ≥ a[2i+1] (whenever those children exist).
- Now, in each step, we remove the largest element, shorten the array by 1, and then fix up the heap. Fixing up a heap of k nodes takes at most 2 ⌊lg k⌋ comparisons.

- So the total number of comparisons is the Θ(n) to construct the heap originally, plus

    2 Σ_{k=1}^{n-1} ⌊lg k⌋ < 2 Σ_{k=1}^{n-1} lg k
                           = 2 (lg e) Σ_{k=1}^{n-1} ln k
                           ≤ 2 (lg e) ∫_1^n ln x dx
                           = 2 (lg e) (n ln n - n + 1).

- So the running time is asymptotically n log n worst case.
- Actually, it's Θ(n log n) average case also.
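Assembling the pieces under this numbering scheme (my sketch, assuming the fixheap and constructheap renderings from Section 4.7.1):

void fixheap(long a[], long n, long i, long key);
void constructheap(long a[], long n);

/* Heapsort a[1..n]: build the heap, then repeatedly move the maximum
   (the root) to the end of the shrinking array and re-fix the heap. */
void heap_sort(long a[], long n)
{
    long k, last;

    constructheap(a, n);
    for (k = n; k >= 2; k--) {
        last = a[k];                  /* the used-to-be-last element */
        a[k] = a[1];                  /* largest element to its final slot */
        fixheap(a, k - 1, 1, last);   /* push 'last' into its proper place */
    }
}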

4.7.4 Accelerated Heapsort