Selection and Adversary Arguments. COMP 215 Lecture 19


Selection Problems We want to find the kth largest entry in an unsorted array. Could be the largest, smallest, median, etc. Ideas for an n lg n algorithm? We will think about: largest; smallest; largest AND smallest; second largest; kth.
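For the n lg n question, one minimal baseline (my own illustration, not from the lecture) is to sort and then index:

def kth_largest_by_sorting(a, k):
    # Sorting costs O(n lg n) comparisons; indexing is free afterwards.
    return sorted(a, reverse=True)[k - 1]

print(kth_largest_by_sorting([7, 2, 9, 4], 2))   # -> 7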

Finding the Largest Item We have seen a simple algorithm that requires n - 1 comparisons in the worst case. That's optimal.

Largest and Smallest We could run the previous algorithm twice, which would give us 2n - 2 comparisons. We have seen an algorithm that requires 3n/2 - 2 comparisons in the worst case. Anyone remember? Is this optimal? First idea is to consider a decision tree. There must be at least n leaves. Therefore the height must be at least lg n. Obviously not a very tight bound.
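A sketch of the pairwise algorithm that achieves 3n/2 - 2 comparisons (my reconstruction, assuming n is even and keys are distinct):

def max_and_min(a):
    # One comparison decides the first pair.
    big, small = (a[0], a[1]) if a[0] > a[1] else (a[1], a[0])
    # Each later pair costs 3 comparisons: one within the pair, one
    # against the running max, one against the running min.
    for i in range(2, len(a), 2):
        x, y = (a[i], a[i + 1]) if a[i] > a[i + 1] else (a[i + 1], a[i])
        if x > big:
            big = x
        if y < small:
            small = y
    # Total: 1 + 3(n/2 - 1) = 3n/2 - 2 comparisons for even n.
    return big, small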

Adversary Arguments We design an adversary that forces any algorithm to do as much work as possible. The adversary does not have a particular solution in mind; all answers are chosen to reveal as little information as possible while remaining consistent with earlier answers. The analysis proceeds by: determining what information the algorithm needs to solve the problem, then determining how long it will take to get that information from the adversary.

Adversary for Largest/Smallest In order to determine both the largest and the smallest we assign states to keys: X, the key has not been involved in a comparison (0 units); L, the key has lost at least one comparison (1 unit); W, the key has won at least one comparison (1 unit); WL, the key has both won and lost a comparison (2 units). We cannot determine the largest and smallest keys until all keys except one have lost a comparison (n - 1 units) and all keys except one have won a comparison (n - 1 units). We need to learn 2n - 2 units of information overall. We design an adversary that reveals as little as possible.

Adversary for Largest/Smallest The adversary's replies, one row per pair of key states ("winner 1" means the first key is declared larger):

Before    Winner          After     Information
X, X      1               W, L      2
X, L      1               W, L      1
X, W      2               L, W      1
X, WL     1               W, WL     1
L, L      1               WL, L     1
L, W      2               L, W      0
L, WL     2               L, WL     0
W, W      1               W, WL     1
W, WL     1               W, WL     0
WL, WL    be consistent   WL, WL    0
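Read as code, each row maps a pair of states to the adversary's reply; a hedged Python rendering (the encoding and names are mine):

def adversary_reply(s1, s2):
    # (state1, state2) -> (declared winner: 1 or 2, new state1, new state2, units)
    table = {
        ('X', 'X'):  (1, 'W', 'L', 2),
        ('X', 'L'):  (1, 'W', 'L', 1),
        ('X', 'W'):  (2, 'L', 'W', 1),
        ('X', 'WL'): (1, 'W', 'WL', 1),
        ('L', 'L'):  (1, 'WL', 'L', 1),
        ('L', 'W'):  (2, 'L', 'W', 0),
        ('L', 'WL'): (2, 'L', 'WL', 0),
        ('W', 'W'):  (1, 'W', 'WL', 1),
        ('W', 'WL'): (1, 'W', 'WL', 0),
    }
    if (s1, s2) in table:
        return table[(s1, s2)]
    if (s2, s1) in table:
        # Mirror a row when the pair arrives in the opposite order.
        w, t2, t1, units = table[(s2, s1)]
        return (3 - w, t1, t2, units)
    # WL vs WL: any answer consistent with earlier replies; nothing new learned.
    return (1, 'WL', 'WL', 0)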

Adversary for Largest/Smallest We need to get 2n - 2 units of information. How many comparisons are necessary? The most information per comparison (2 units) occurs when neither item has won or lost. This can happen at most n/2 times, for a total of n units of information. Any other comparison yields at most 1 unit of information, so we need at least 2n - 2 - n = n - 2 of these. The total number of comparisons is therefore at least n - 2 + n/2 = 3n/2 - 2. This lower bound matches the worst case of our algorithm.

Second Largest Item One possibility is to find the largest, remove it, then find the largest remaining: (n - 1) + (n - 2) = 2n - 3 comparisons. A better alternative is the tournament approach: hold a tournament to determine the largest key. Since the second largest key must have been defeated by the largest key at some point, we look for the largest key among those keys beaten by the largest key.
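A sketch of the tournament method in Python (assuming distinct keys; the bookkeeping dict is my own device, not the lecture's):

def second_largest(a):
    # Single-elimination tournament: n - 1 comparisons to find the max,
    # recording whom each winner defeats.
    beaten = {x: [] for x in a}
    players = list(a)
    while len(players) > 1:
        survivors = []
        for i in range(0, len(players) - 1, 2):
            w, l = ((players[i], players[i + 1])
                    if players[i] > players[i + 1]
                    else (players[i + 1], players[i]))
            beaten[w].append(l)
            survivors.append(w)
        if len(players) % 2:          # odd player out gets a bye
            survivors.append(players[-1])
        players = survivors
    champion = players[0]
    # The runner-up lost only to the champion, so it is the largest of the
    # roughly lg n keys the champion beat: about lg n - 1 more comparisons.
    return max(beaten[champion])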

Second Largest Analysis We will just do the analysis for powers of 2. First, how many comparisons in the tournament? n/2 + n/4 + ... + 1 = n(1/2 + 1/4 + ... + 1/2^lg n) = n - 1. Next, how many keys did the max key defeat? lg n. Therefore searching for the max among the defeated keys will take lg n - 1 comparisons. Total comparisons: (n - 1) + (lg n - 1) = n + lg n - 2. If n is not a power of 2: n + ⌈lg n⌉ - 2. Is this optimal?

Second Largest Adversary For any algorithm, the largest key is involved in some number of comparisons, m. Discounting the largest key, there are n - 1 keys, and it takes at least n - 2 comparisons to find the second largest among those keys. The m comparisons with the largest key cannot count toward that n - 2. Therefore the total number of comparisons required is at least m + n - 2. We will construct an adversary that forces m to be at least ⌈lg n⌉.

Second Largest Adversary The adversary builds a forest of trees and uses it to guide answers. Initially every node is a root (throughout, a root represents a key that hasn't lost a comparison). When two roots are compared: if both trees are the same size, the answer is arbitrary, and the losing root is made a child of the winner; if one tree is larger, the root of the smaller tree loses and is made a child of the root of the larger tree. When a root and a non-root are compared, the non-root is declared smaller and the trees are not changed. When two non-roots are compared, the answer is consistent with previous answers and the trees are not changed.
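A hedged sketch of this adversary in Python (parent pointers and subtree sizes are my bookkeeping, not the lecture's):

class SecondLargestAdversary:
    def __init__(self, n):
        self.parent = [None] * n   # parent[i] is None while key i is a root
        self.size = [1] * n        # tree size, maintained at roots only

    def compare(self, i, j):
        # Answer "which of keys i, j is larger?" revealing as little as possible.
        i_root = self.parent[i] is None
        j_root = self.parent[j] is None
        if i_root and j_root:
            # Two roots: the root of the larger tree wins (arbitrary on a tie);
            # the loser becomes a child of the winner.
            win, lose = (i, j) if self.size[i] >= self.size[j] else (j, i)
            self.parent[lose] = win
            self.size[win] += self.size[lose]
            return win                 # index of the declared winner
        if i_root:
            return i                   # root beats non-root; trees unchanged
        if j_root:
            return j
        # Two non-roots: any reply consistent with earlier answers will do;
        # the trees are not changed. (Consistency bookkeeping omitted here.)
        return i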

Second Largest Adversary When any correct algorithm terminates, there can be at most one root; otherwise there would be two keys that never lost a comparison, and either could be the largest. The rules on the previous slide are designed so that the tree rooted at the largest key grows as slowly as possible.

Second Largest Adversary The question is: how many comparisons must a key have been involved in to end up as the root of the final tree? The size of the tree rooted at the largest key at most doubles after each comparison: size_k <= 2 size_{k-1}. The initial size of the tree rooted at the largest key is 1: size_0 = 1. This is a homogeneous linear recurrence, with solution size_k <= 2^k. If the final root was involved in m comparisons, then n = size_m <= 2^m, so lg n <= m, i.e. m >= ⌈lg n⌉.

Second Largest Adversary The final result is an n + ⌈lg n⌉ - 2 lower bound. Recall that our tournament algorithm required n + ⌈lg n⌉ - 2 comparisons, so the tournament approach is optimal.

Kth Smallest Entry We can solve this problem with order n comparisons (average case) with a small modification to quicksort. Recall that after each call to partition, the pivot item is in its final sorted position. If the pivot item is at the kth position, then we have the kth smallest item. Here is the algorithm: partition the data; if the pivot position = k, terminate; if the pivot position > k, partition the subarray before the pivot position; if the pivot position < k, partition the subarray after the pivot position. Repeat until the pivot position = k.
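A sketch of this selection-by-partitioning loop (quickselect) in Python; the Lomuto-style partition is one standard choice, not necessarily the lecture's:

def quickselect(a, k):
    # Returns the kth smallest element of a (1-indexed); mutates a.
    lo, hi = 0, len(a) - 1
    while True:
        # Partition a[lo..hi] around a[hi]; the pivot lands at index p.
        pivot, p = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[p] = a[p], a[i]
                p += 1
        a[p], a[hi] = a[hi], a[p]
        if p == k - 1:
            return a[p]          # pivot position = k: done
        elif p > k - 1:
            hi = p - 1           # kth smallest lies before the pivot
        else:
            lo = p + 1           # kth smallest lies after the pivot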

Selection Analysis Worst case comparisons? Average case comparisons?

Selection Analysis Worst case comparisons? Same as quicksort: n². Average case comparisons? In order to do an average case analysis we sum up the cost associated with every possible input and divide by the number of possible inputs. Assume that every k is equally likely, and every pivot position is equally likely after a partition.

Selection Analysis Average Case The input size of the recursive call can be anything from 0 to n - 1. The size is 0 if k = p (the pivot position) after the first partition; there are n ways for that to happen. The size is 1 if k = 1 and p = 2, or if k = n and p = n - 1: two possible ways. There are four ways for the size to be 2, ... and 2(n - 1) ways for the size to be n - 1.

Selection Analysis Average Case This leads to an ugly recurrence: A(n) = (n - 1) + [n A(0) + 2 A(1) + 4 A(2) + ... + 2(n - 1) A(n - 1)] / n². The book informs us that this works out to A(n) ≈ 3n.

Selection in Worst Case Linear Time In our previous algorithm we wanted the pivot item to be near the median: this causes the size of the array to be roughly halved on each recursive call. There is no obvious way to select a pivot item near the median without knowing what the median is, and determining the median requires solving the selection problem. Doh. Amazingly, there is a way forward...

Linear Time Selection Alter our partition function as follows. First break the array into n/5 groups of five. Compute the median of each of those groups; this can be done with six comparisons per group, or alternatively, just use insertion sort on each group. Now, recursively call the selection algorithm to determine the median of our n/5 medians. It turns out that this median of medians will be reasonably close to the median. Use the result as the pivot and partition as usual. The resulting algorithm is worst case linear time.
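A hedged sketch of the whole scheme in Python (sorting each group of five stands in for the six-comparison median, and the partition here copies rather than swapping in place):

def select(a, k):
    # kth smallest element of list a (1-indexed), worst case linear time.
    if len(a) <= 5:
        return sorted(a)[k - 1]
    # Medians of the roughly n/5 groups of five.
    groups = [a[i:i + 5] for i in range(0, len(a), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # Recursively find the median of the medians; use it as the pivot.
    pivot = select(medians, (len(medians) + 1) // 2)
    lows = [x for x in a if x < pivot]
    highs = [x for x in a if x > pivot]
    equal = len(a) - len(lows) - len(highs)   # copies equal to the pivot
    if k <= len(lows):
        return select(lows, k)
    elif k <= len(lows) + equal:
        return pivot
    else:
        return select(highs, k - len(lows) - equal)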

Linear Selection Analysis First, how close is the median of medians to the median? Let's draw a picture... Roughly speaking, at least 3(n/10) items are larger than the median of medians, and at least that many are smaller. So the worst case size of the input to the recursive call is about 7n/10; the book comes up with 7n/10 + 3/2.

Linear Selection Analysis We can write the following recurrence to describe the worst case number of comparisons: W(n) = W(7n/10 + 3/2) + W(n/5) + 6(n/5) + n. The terms are, in order: the worst case cost of the next recursive call, the cost of finding the median of the n/5 medians, the cost of finding the n/5 medians, and the cost of the partition.

Linear Selection Analysis We don't really have any tools for dealing with this recurrence, so revert to guess and check. We guess that W(n) is linear, i.e. W(n) <= cn. Therefore: W(n) = W(7n/10 + 3/2) + W(n/5) + 11n/5 <= c(7n/10 + 3/2) + c(n/5) + 11n/5. For this to be at most cn we need (dropping the constant 3c/2 term) 11n/5 <= cn/10; solving this, we get 22 <= c. We then use induction to prove that W(n) <= 22n.

Linear Selection Analysis There are selection algorithms that are closer to 3n in the worst case. We could construct an adversary argument to provide a lower bound for this problem; the best lower bound determined so far is a bit more than 2n. (Computer Algorithms, Baase and Van Gelder, 2000.)