Data selection. Lower complexity bound for sorting


Lecturer: Georgy Gimel'farb, COMPSCI 220 Algorithms and Data Structures

Outline
1. Data selection: Quickselect
2. Lower complexity bound for sorting
3. The worst-case complexity bound
4. The average-case complexity bound
5. Lower sorting complexity under additional constraints

Data Selection vs. Data Sorting

Selection: finding only the k-th smallest element, called the element of rank k, or the k-th order statistic, in a list of n items. Main question: can selection be done faster, without sorting?

Quickselect: average Θ(n) and worst-case Θ(n^2) complexity.
1. If n = 0 or 1, return "not found" or the single list item, respectively.
2. Otherwise, choose one of the list items as a pivot p and partition the list into disjoint head and tail sublists with j and n − j − 1 items, respectively, separated by p at the position with index j. (All head items are less than the pivot p and precede it; all tail items are greater than p and follow it.)
3. Return the result of quickselect on the head if k < j; the element p itself if k = j; or the result of quickselect on the tail otherwise.

Analysis of Quickselect: Average-case Complexity

Theorem 2.33: The average-case time complexity of quickselect is linear, i.e. Θ(n).

Proof. Partitioning the list into head and tail sublists of sizes j and n − 1 − j, where 0 ≤ j ≤ n − 1, takes up to cn operations. As in quicksort, each final pivot index j occurs with equal probability 1/n. The average time T(n) to select the k-th smallest item out of n items therefore satisfies

  T(n) = (1/n) Σ_{j=0}^{n−1} [T(j) + T(n − j − 1)]/2 + cn = (1/n) Σ_{j=0}^{n−1} T(j) + cn.

Therefore nT(n) = Σ_{j=0}^{n−1} T(j) + cn^2. Subtracting the same relation for n − 1 gives

  nT(n) − (n − 1)T(n − 1) = T(n − 1) + c(2n − 1),

so T(n) = T(n − 1) + c(2n − 1)/n ≤ T(n − 1) + 2c, and hence T(n) ∈ Θ(n).
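The recurrence in the proof can be iterated numerically to see the linear growth directly. A minimal sketch (with the arbitrary constant fixed at c = 1); by the derivation above, T(n)/n should then stay below 2:

```python
# Iterate the average-case recurrence T(n) = (1/n) * sum_{j<n} T(j) + c*n
# with c = 1 and T(0) = 0, and inspect the ratio T(n)/n.
def average_case_T(n_max, c=1.0):
    T = [0.0] * (n_max + 1)
    prefix = 0.0                      # running sum T(0) + ... + T(n-1)
    for n in range(1, n_max + 1):
        T[n] = prefix / n + c * n
        prefix += T[n]
    return T

T = average_case_T(1000)
print(T[1000] / 1000)                 # bounded by 2c, consistent with Theta(n)
```

Since T(n) − T(n − 1) = c(2n − 1)/n ≤ 2c, the ratio approaches 2c from below, confirming the linear bound.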

Implementation of Quickselect

Algorithm quickselect: array-based quickselect finds the k-th smallest element in the subarray a[l..r].
Input: array a[0..n−1]; array indices l, r; integer k.

begin
  if l ≤ r then
    i ← pivot(a, l, r)         — initial position of the pivot
    j ← partition(a, l, r, i)  — final position of the pivot
    q ← j − l + 1              — the pivot's rank in a[l..r]
    if k = q then return a[j]
    else if k < q then return quickselect(a, l, j − 1, k)
    else return quickselect(a, j + 1, r, k − q)
  else return "not found"
end
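The pseudocode above can be sketched in Python. The slides leave the pivot choice abstract, so as an assumption this sketch uses Lomuto partitioning with the last element of the subarray as pivot:

```python
def partition(a, l, r):
    """Lomuto partition of a[l..r] around the pivot a[r].

    Returns the pivot's final index j: items left of j are smaller,
    items right of j are not smaller."""
    p = a[r]
    j = l
    for i in range(l, r):
        if a[i] < p:
            a[i], a[j] = a[j], a[i]
            j += 1
    a[j], a[r] = a[r], a[j]
    return j

def quickselect(a, l, r, k):
    """Return the k-th smallest element (1-based rank k) of a[l..r]."""
    if l > r:
        return None                   # "not found"
    j = partition(a, l, r)
    q = j - l + 1                     # the pivot's rank within a[l..r]
    if k == q:
        return a[j]
    elif k < q:
        return quickselect(a, l, j - 1, k)
    else:
        return quickselect(a, j + 1, r, k - q)

data = [35, 10, 17, 42, 3]
print(quickselect(data, 0, len(data) - 1, 2))   # 2nd smallest -> 10
```

Note that the array is partitioned in place, so `data` is reordered as a side effect; pass a copy if the original order matters.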

Sorting by Pairwise Comparisons: Decision Tree

Any sorting of n items by pairwise comparisons can be represented by a binary decision tree with n! leaves; the internal nodes are comparisons.

[Figure: decision tree for n = 3. The root compares 1 : 2; each outcome (a_1 ≥ a_2 or a_1 < a_2) leads to a comparison 2 : 3, whose outcomes lead either to a leaf or to a comparison 1 : 3. The six leaves are the sorted orders 123, 132, 213, 231, 312, 321.]

Sorting by Pairwise Comparisons: Decision Tree

- Each leaf ijk represents the sorted array (a_i, a_j, a_k) obtained from the initial list (a_1, a_2, a_3).
- Each internal node i : j is a pairwise comparison between the elements a_i and a_j; its two downward arcs correspond to the two possible results, a_i ≥ a_j or a_i < a_j.
- Any of the n! permutations of n arbitrary items a_1, ..., a_n may be met after sorting, so the decision tree must have at least n! leaves.
- The path length from the root to a leaf equals the total number of comparisons needed to obtain the sorted list at that leaf; the longest path (the tree height) equals the worst-case number of comparisons.
- Example: 3 items are sorted with no more than 3 comparisons, because the height of the tree for n = 3 is 3.
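The minimum tree height, and hence the lower bound on worst-case comparisons, can be evaluated directly as the least h with 2^h ≥ n!, i.e. ceil(lg(n!)); the value 3 for n = 3 matches the example above:

```python
import math

def min_comparisons(n):
    """Lower bound on worst-case comparisons for sorting n items:
    the least height h with 2**h >= n!, i.e. ceil(lg(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

for n in range(2, 8):
    print(n, min_comparisons(n))   # n = 3 gives 3, as in the slide
```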

Sorting by Pairwise Comparisons: Decision Tree

[Figure: the same decision tree for n = 3, with the path traced for the input a = (35, 10, 17), ending at the leaf that yields the sorted output a = (10, 17, 35).]

Example: a_1 = 35, a_2 = 10, a_3 = 17.
1. Comparison 1 : 2 (35 > 10): left branch, a_1 > a_2.
2. Comparison 2 : 3 (10 < 17): right branch, a_2 < a_3.
3. Comparison 1 : 3 (35 > 17): left branch, a_1 > a_3.
4. Sorted array 231: a_2 = 10, a_3 = 17, a_1 = 35.

The Worst-case Complexity Bound

Lemma: A decision tree of height h has at most 2^h leaves.

Proof by mathematical induction.
Base case: a tree of height 0 has at most 2^0 = 1 leaf.
Hypothesis: any tree of height h − 1 has at most 2^(h−1) leaves.
Induction step: any tree of height h consists of a root and two subtrees of height at most h − 1 each. The number of leaves in the whole tree equals the total number of leaves in its subtrees, that is, at most 2^(h−1) + 2^(h−1) = 2^h.

The Worst-case Complexity Bound

Theorem 2.35 (Textbook): Every pairwise-comparison-based sorting algorithm takes Ω(n log n) time in the worst case.

Proof: By the lemma above, a binary decision tree of height h has at most 2^h leaves, so a tree with at least n! leaves must satisfy 2^h ≥ n!, giving the lower bound h ≥ lg(n!). By Stirling's approximation, n! ≈ n^n e^(−n) √(2πn) as n → ∞. Therefore, asymptotically, lg(n!) ≈ n lg n − 1.44n, so lg(n!) ∈ Ω(n log n).

Therefore heapsort and mergesort have asymptotically optimal worst-case time complexity among comparison-based sorting algorithms.
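The approximation lg(n!) ≈ n lg n − 1.44n used in the proof can be checked numerically; `math.lgamma(n + 1)` gives ln(n!) without computing the huge factorial:

```python
import math

def lg_factorial(n):
    """lg(n!) via the log-gamma function, avoiding overflow."""
    return math.lgamma(n + 1) / math.log(2)

def stirling_estimate(n):
    """Leading terms of Stirling's approximation: n*lg(n) - n*lg(e),
    where lg(e) is approximately 1.44."""
    return n * math.log2(n) - n * math.log2(math.e)

n = 10**6
print(lg_factorial(n) / stirling_estimate(n))   # ratio tends to 1 as n grows
```

The neglected lower-order term 0.5 lg(2πn) is positive, so the estimate is always a slight underestimate of lg(n!).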

The Average-case Complexity Bound

Theorem 2.36 (Textbook): Every pairwise-comparison-based sorting algorithm takes Ω(n log n) time in the average case.

Proof: Let H(k) denote the sum of the heights of all k leaves in a balanced decision tree, i.e. one with k/2 leaves in each of its left and right subtrees. Such a tree minimises the total height: in any other decision tree with k leaves, the sum of the leaf heights cannot be smaller than H(k). Since the link to the root adds one to the height of each leaf,

  H(k) = 2H(k/2) + k, so that H(k) = k lg k.

When k = n! (the number of permutations of an array of n keys), H(n!) = n! lg(n!). Given equiprobable permutations, the average height of a leaf is

  H_avg(n!) = (1/n!) H(n!) = lg(n!) ≈ n lg n − 1.44n.

Thus the lower bound on the average-case complexity of sorting n items by pairwise comparisons is Ω(n log n).
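The closed form H(k) = k lg k can be verified against the recurrence H(k) = 2H(k/2) + k for powers of two:

```python
import math

def H_recursive(k):
    """Sum of leaf heights in a balanced decision tree with k leaves
    (k a power of two), via the recurrence H(k) = 2*H(k/2) + k."""
    if k == 1:
        return 0           # a single leaf at the root has height 0
    return 2 * H_recursive(k // 2) + k

for k in (2, 4, 8, 1024):
    assert H_recursive(k) == k * int(math.log2(k))   # matches k * lg(k)
```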

Counting Sort (Exercise 2.7.2, Textbook)

Input: an integer array a = (a_1, ..., a_n), where each a_i ∈ Q = {0, ..., Q − 1}.
1. Make a counting array t of size Q and set t_q ← 0 for each q ∈ Q.
2. Scan through a to accumulate in the counters t_q, q ∈ Q, how many times each value q occurs: if a_i = q, then t_q ← t_q + 1.
3. Loop through q = 0, ..., Q − 1 and output t_q copies of q at each step.

This gives linear worst- and average-case time complexity, Θ(n), when Q is fixed: Q + n elementary operations to first set t to zero and then count how many times each value q occurs in a, followed by the output of the sorted array by repeating each value q exactly t_q times.

Theorems 2.35 and 2.36 do not hold under additional data constraints!
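The three steps of the exercise translate directly into Python; Q is the assumed key range passed in by the caller:

```python
def counting_sort(a, Q):
    """Sort integers a[i] in {0, ..., Q-1} in Theta(n + Q) time."""
    t = [0] * Q                    # step 1: counting array, all zeros
    for x in a:                    # step 2: one scan accumulates counts
        t[x] += 1
    out = []
    for q in range(Q):             # step 3: output t[q] copies of each q
        out.extend([q] * t[q])
    return out

print(counting_sort([3, 1, 4, 1, 5, 2, 0], 6))   # [0, 1, 1, 2, 3, 4, 5]
```

No comparisons between array elements are made, which is why the Ω(n log n) bounds of Theorems 2.35 and 2.36 do not apply.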