CS361 Homework #3 Solutions

1. Suppose I have a hash table with 50 locations. I would like to know how many items I can store in it before it becomes fairly likely that I have a collision, i.e., that two items get hashed to the same location. Assume that the hash function is random, and solve this problem in two ways:

(a) Find the smallest value of n for which, if I store n items, the probability I don't get a collision falls below 1/2. Do this by calculating this probability exactly for n = 1, n = 2, and so on.

(b) Find the smallest value of n for which the expected (or average) number of colliding pairs is equal to or greater than 1. Do this by calculating this average exactly for the relevant values of n.

Do these two methods give roughly the same answer? Can you explain why they are slightly different?

Answer. The exact probability that there is no collision is the product of the probabilities, for each item, that it doesn't fall into any of the locations taken by the previous items. For the ith item, there are $50 - (i-1)$ locations not taken by the previous $i-1$ items, so this probability is

$$\frac{50}{50} \cdot \frac{49}{50} \cdot \frac{48}{50} \cdots \frac{50-(n-1)}{50}.$$

The first value of n for which this is less than 1/2 is n = 9, for which this probability is roughly 0.4655.

The expected number of colliding pairs is the number of possible pairs of items, times the probability that a given one of them collides. This is

$$\binom{n}{2} \cdot \frac{1}{50} = \frac{n(n-1)}{100}.$$

The smallest value of n for which this is greater than 1 is n = 11, for which the expected number of pairs is 1.1.

These values are somewhat different because, first of all, setting the threshold at 1/2 is arbitrary; secondly, even if we have a collision with probability 1/2, the expected number of collisions could be just (0 + 1)/2 = 1/2.
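Both calculations are easy to reproduce programmatically. Here is a minimal Python sketch (the function names are mine, not part of the assignment) that finds both thresholds for a table with 50 locations:

    from math import comb

    M = 50  # number of hash-table locations

    def p_no_collision(n, m=M):
        """Exact probability that n randomly hashed items land in distinct locations."""
        p = 1.0
        for i in range(n):
            p *= (m - i) / m
        return p

    def expected_colliding_pairs(n, m=M):
        """Expected number of colliding pairs: C(n,2) * (1/m)."""
        return comb(n, 2) / m

    n = 1
    while p_no_collision(n) >= 0.5:            # part (a)
        n += 1
    print(n, p_no_collision(n))                # 9 0.4655...

    n = 1
    while expected_colliding_pairs(n) < 1:     # part (b)
        n += 1
    print(n, expected_colliding_pairs(n))      # 11 1.1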

2. Consider the following question:

Input: a min-heap H containing a set of n numbers, an integer k, and a number x.
Question: does H contain k or more numbers which are smaller than x?

Show how to answer this question in O(k) time; in other words, in an amount of time that grows linearly with k and doesn't depend on n.

Answer. Our goal is to go looking for elements smaller than x. We will do this by starting at the root and exploring downward; since this is a min-heap, whenever we reach a node greater than or equal to x we don't need to explore that branch any further, since all its descendants are also greater than or equal to x. As soon as we find k nodes smaller than x, we halt and return "yes"; if our search ends after finding fewer than k such nodes by hitting bedrock, i.e., nodes greater than or equal to x, we return "no". We can do this depth-first or breadth-first, but in either case we never need to explore more than O(k) nodes of the heap. Here's pseudocode for a version that explores depth-first; to do it breadth-first, simply make s a queue instead of a stack.

    are_there_k_smaller_than_x(k, x) {
        t = 0;                      // number found smaller than x so far
        stack s;
        push root onto s;
        while (t < k && s is not empty) {
            y = pop(s);
            if (y < x) {
                t++;
                push y's daughters onto s;
            }
        }
        if (t >= k) return "yes"; else return "no";
    }
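For concreteness, here is a runnable Python version of the same depth-first search; it is a sketch assuming the usual array layout for a binary min-heap (root at index 0, children of node i at 2i+1 and 2i+2), and the function name is mine.

    def has_k_smaller(heap, k, x):
        """True iff the array-based binary min-heap contains at least k
        elements smaller than x; runs in O(k) time."""
        t = 0                            # number found smaller than x so far
        stack = [0] if heap else []      # start at the root
        while t < k and stack:
            i = stack.pop()
            if heap[i] < x:
                t += 1
                # explore the daughters; subtrees rooted at nodes >= x are pruned
                for child in (2 * i + 1, 2 * i + 2):
                    if child < len(heap):
                        stack.append(child)
        return t >= k

    h = [1, 3, 2, 7, 4, 5, 9]            # a valid min-heap
    print(has_k_smaller(h, 3, 5))        # True: 1, 3, 2, 4 are all < 5
    print(has_k_smaller(h, 5, 3))        # False: only 1 and 2 are < 3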

3. A d-ary heap is a heap based on a balanced d-ary tree, where each node has d children (except that the bottom row may be incomplete).

(a) How would you implement a d-ary heap in an array?

Answer: if the array is labelled a[1...n] where a[1] is the root, the d children of a[k] should be

$$a[dk-d+2],\; a[dk-d+3],\; \ldots,\; a[dk+1].$$

This generalizes the implementation for a binary heap with d = 2, where the children of a[k] are a[2k] and a[2k+1]. If we instead label the array a[0...n-1] where a[0] is the root, the d children of a[k] are

$$a[dk+1],\; a[dk+2],\; \ldots,\; a[dk+d],$$

which looks a little simpler.

(b) How should downsift work in a d-ary heap?

Answer: to call downsift on a node x, find the minimum of x's d children (this takes d - 1 comparisons). If x's key is larger than this minimum (which takes 1 more comparison), swap them. This takes a total of d comparisons.

(c) Analyze the running time of makeheap, deletemin, and insert in terms of n and d. Find the dependence on d instead of just absorbing it into the constant hidden in Θ, and discuss whether they are faster or slower than in a binary heap.

Answer: since the depth of the tree is roughly $\log_d n$ and each step of downsift takes d comparisons, downsifting the root, which we have to do for deletemin, takes $d \log_d n = (d/\log d)\log n$ steps at worst. Since $d/\log d$ is an increasing function of d, the worst-case running time of deletemin is worse than for a binary heap.

For insert, we need to upsift a new leaf. But this only takes one comparison for each step, so even if we have to go all the way to the root this takes just $\log_d n = (\log n)/(\log d)$ time. Since this is a decreasing function of d, insert is faster than in a binary heap.

For makeheap, the recurrence for makeheaping a set of n nodes is

$$f(n) = d\,f(n/d) + d\log_d n,$$

since we first makeheap the d subheaps and then downsift the root. Since the driving term $d \log_d n = O(\log n)$ is small compared to the homogeneous solution Θ(n), this gives f(n) = Θ(n). But to get the constant hidden in Θ, we need to think about the fraction of nodes at each level of the tree. Let's write f(h) for the fraction of the n nodes that are h levels above the leaves; f(0) is the fraction of nodes that are leaves, f(1) is the fraction of nodes that are one step above a leaf, and so on. In a balanced d-ary tree, each level has d times as many nodes as the one above it, so f(h + 1) = f(h)/d, which means that $f(h) = A/d^h$ for some constant A. But by definition all these fractions have to add up to 1, so in a large tree we can say

$$1 = \sum_{h=0}^{\infty} f(h) = A \sum_{h=0}^{\infty} (1/d)^h = \frac{A}{1 - 1/d} = A\,\frac{d}{d-1},$$

where we used the formula for a geometric sum. But then A = (d - 1)/d, so

$$f(h) = \frac{d-1}{d}\,(1/d)^h.$$

Now, we saw before that each step of downsift takes d comparisons, to find whether we should swap keys with one of our daughters, and if so, which one. Since there are f(h) n nodes at each level, and since we might have to downsift a node at level h up to h times before it finds its proper place, the total worst-case running time is

$$\sum_{h=0}^{\infty} f(h)\,n\,d\,h = nAd \sum_{h=0}^{\infty} \frac{h}{d^h} = nAd\,\frac{1/d}{(1-1/d)^2} = n(d-1)\,\frac{1/d}{(1-1/d)^2},$$

where we used A = (d - 1)/d and the formula $\sum_{k=0}^{\infty} k r^k = r/(1-r)^2$ with r = 1/d. Simplifying the fraction, this becomes

$$n\,\frac{d}{d-1}.$$

This decreases slightly (towards n) as d increases, so makeheap is faster for a d-ary heap than for a binary heap, but not by much.
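Parts (a) and (b) translate directly into code. Here is a Python sketch using the 0-based layout above, with makeheap done in the constructor (the class and method names are mine, not prescribed by the problem):

    class DaryMinHeap:
        """Array-based d-ary min-heap with the root at index 0."""

        def __init__(self, d, items=()):
            self.d = d
            self.a = list(items)
            # makeheap: downsift every node that can have children, deepest first
            for k in range((len(self.a) - 2) // d, -1, -1):
                self.downsift(k)

        def children(self, k):
            # the d children of a[k] are a[d*k + 1], ..., a[d*k + d]
            return range(self.d * k + 1, min(self.d * k + self.d + 1, len(self.a)))

        def downsift(self, k):
            a = self.a
            while True:
                kids = self.children(k)
                if not kids:
                    return
                m = min(kids, key=lambda i: a[i])   # d - 1 comparisons
                if a[k] <= a[m]:                    # 1 more comparison
                    return
                a[k], a[m] = a[m], a[k]
                k = m

    h = DaryMinHeap(3, [9, 4, 7, 1, 8, 2, 5, 3, 6])
    print(h.a[0])   # 1: the minimum ends up at the root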

4. Suppose I have a binary search tree. It starts out empty, and I insert three items into it. The shape of the resulting tree depends on which of the 6 possible orders I insert them in. Assume that the 6 possible orders are equally likely; then give the possible shapes the tree can have, and calculate the probability of each one. Now, for each of these shapes, suppose that I search for one of the three items, where each of the three is equally likely. Calculate the average number of steps I need to find this item, i.e., its average depth in the tree, where the root has depth 0. Finally, take the average over the possible shapes, weighted by their probabilities, to get the average of this average: in other words, calculate the average depth of a random item in a random tree. If all goes well, you should get 1/3 times the average number of comparisons Quicksort needs to sort 3 items, or 8/9!

Figure 1: The five possible tree shapes with three items.

Answer. There are 5 possible tree shapes, shown in Figure 1. The four stringy ones each result from one of the 6 possible orders, and so they have probability 1/6. The balanced one results from two orders, namely 2, 1, 3 and 2, 3, 1, and so its probability is 1/3. For the stringy ones, the average depth is (0 + 1 + 2)/3 = 1, while for the balanced one it is (0 + 1 + 1)/3 = 2/3. Thus the total average depth is

$$\frac{4}{6} \cdot 1 + \frac{1}{3} \cdot \frac{2}{3} = \frac{8}{9},$$

as promised.
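The number 8/9 is small enough to verify by brute force. The following Python sketch (helper names are mine) builds the BST for each of the 6 insertion orders and averages the search depths, with the root at depth 0:

    from itertools import permutations

    def insert(tree, key):
        """Insert key into a BST represented as nested dicts (None = empty tree)."""
        if tree is None:
            return {'key': key, 'left': None, 'right': None}
        side = 'left' if key < tree['key'] else 'right'
        tree[side] = insert(tree[side], key)
        return tree

    def depth(tree, key, d=0):
        """Depth of key in the tree, counting the root as depth 0."""
        if key == tree['key']:
            return d
        side = 'left' if key < tree['key'] else 'right'
        return depth(tree[side], key, d + 1)

    total = 0
    for order in permutations([1, 2, 3]):       # all 6 insertion orders
        tree = None
        for key in order:
            tree = insert(tree, key)
        total += sum(depth(tree, k) for k in (1, 2, 3)) / 3
    print(total / 6)                            # 0.888... = 8/9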

5. Let's prove that an AVL tree with n nodes has depth O(log n). We will do this by solving the opposite problem: finding the smallest number of nodes that can cause an AVL tree to have a certain depth. Call an AVL tree extreme if every node except the leaves is unbalanced to the right, and let f(d) be the number of nodes in an extreme tree of depth d. Then f(1) = 1, f(2) = 2, f(3) = 4, and f(4) = 7. Find a recurrence for f(d) and solve it within Θ. Then argue that n ≥ f(d) and therefore $d \le f^{-1}(n)$, where $f^{-1}$ is the inverse function of f. End by proving that d = O(log n). What is the base of the logarithm? How much deeper are AVL trees than perfectly balanced binary trees?

Figure 2: The recurrence for extreme AVL trees: the extreme tree of depth d is a root whose subtrees are the extreme trees of depth d - 1 and d - 2.

Answer. A little thought and some doodling reveals that the extreme tree of depth d consists of a root whose subtrees are the extreme trees of depth d - 1 and d - 2. Adding the root, this gives the recurrence

$$f(d) = f(d-1) + f(d-2) + 1.$$

Except for the tiny driving term of +1, this is the Fibonacci recurrence. Thus $f(d) = \Theta(\varphi^d)$, where $\varphi = (\sqrt{5}+1)/2 \approx 1.618$ is the golden ratio, and inverting this gives $d = O(\log_\varphi n)$. (To be more precise, since $f(d) = \Theta(\varphi^d)$ we have $n \ge C\varphi^d$ for some constant C, and so $d \le \log_\varphi(n/C) = \log_\varphi n - \log_\varphi C$.) Since $\log_\varphi n = \log_2 n / \log_2 \varphi \approx 1.44 \log_2 n$, this is about 44% deeper than a balanced binary tree.
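A few lines of Python (a sketch, with names of my choosing) confirm the base cases and the ≈1.44 log₂ n depth bound:

    from math import log2

    def f(d):
        """Fewest nodes in an AVL tree of depth d: f(d) = f(d-1) + f(d-2) + 1."""
        return d if d <= 2 else f(d - 1) + f(d - 2) + 1

    print([f(d) for d in range(1, 5)])       # [1, 2, 4, 7]
    for d in (10, 20, 30):
        n = f(d)                             # fewest nodes that force depth d
        print(d, round(1.44 * log2(n), 1))   # d stays just below 1.44 * log2(n)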

Extra Credit: We analyzed the version of Quicksort in which the pivot is a randomly chosen element. We found that the average number of comparisons is f(n) = 2n ln n. We did this in the following way. Let x be the position in the list that the pivot ends up in, and let P(x) be the probability of a given value of x. Then the recurrence for the average number of comparisons is

$$f(n) = \underbrace{n-1}_{\text{partitioning}} + \underbrace{\sum_{x=1}^{n} P(x)\,\bigl(f(x-1) + f(n-x)\bigr)}_{\text{recursion}} \approx n + 2\int_0^n P(x)\,f(x)\,dx. \qquad (1)$$

In the second line we assumed that P(x) is symmetric, i.e., that P(x) = P(n - x). Now, if the pivot is random, its rank is equally likely to take any value from 1 to n, so every value of x occurs with equal probability 1/n:

$$P(x) = \frac{1}{n} \quad \text{for all } 1 \le x \le n.$$

Then Equation (1) becomes

$$f(n) = n + \frac{2}{n}\int_0^n f(x)\,dx.$$

By substituting a solution of the form f(n) = An ln n into this equation and solving for A, we get A = 2 as before.

Now, an improved version of Quicksort uses the median of three random elements as the pivot. In this case, P(x) is the probability distribution of the median of three random numbers in the interval [0, n]. Calculate P(x) by asking: given a random number x between 0 and n, what is the probability that if I choose two more random numbers y and z, then x is greater than y and less than z? Then, how many ways are there for this to happen? (You might want to check your work by confirming that the total probability $\int_0^n P(x)\,dx$ is 1.) Intuitively, rather than being uniform, P(x) should be peaked at x = n/2. Finally, use this expression for P(x) in Equation (1) and again try a solution of the form f(n) = An ln n. You should be able to derive that A is now somewhat smaller than 2, indicating that this method of choosing the pivot improves the constant in the number of comparisons.

Answer. In the median-of-three version, the probability distribution of the median of three random numbers between 0 and n is (when n is large)

$$P(x) = \frac{6x(n-x)}{n^3}.$$

To see this, note that once we choose x, the probability that another random number y is less than x is x/n, and the probability that a random number z is greater than x is (n - x)/n. The probability that both of these things occur, i.e., that y < x < z, and that we chose x in the first place, is $x(n-x)/n^3$. But there are 6 ways for this to happen: the median could be any of x, y, or z, and we can switch which of the other two is greater and which is smaller.

Then, with a little help from my computer, I get

$$2\int_0^n P(x)\,Ax\ln x\,dx = 2\int_0^n \frac{6x(n-x)}{n^3}\,Ax\ln x\,dx = An\ln n - \frac{7}{12}An,$$

and so Equation (1) becomes

$$An\ln n = n + An\ln n - \frac{7}{12}An,$$

and solving for A gives A = 12/7 ≈ 1.7, roughly 14% faster than the original version. Indeed, taking the median of three elements is often used in practice.

Even more extra credit: What happens if we choose the pivot by taking the median of 5, 7, ... elements? I leave this to you...
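The constant 12/7 can also be checked empirically. Below is a Python sketch (the comparison-counting quicksort and its names are my own construction, not from the homework) that measures the average number of comparisons with a median-of-three random pivot and compares it to the predicted leading term (12/7) n ln n:

    import random
    from math import log

    def qsort_comparisons(a):
        """Comparisons used by quicksort with a median-of-3 random pivot."""
        n = len(a)
        if n <= 1:
            return 0
        pivot = sorted(random.sample(a, 3))[1] if n >= 3 else a[0]
        less = [v for v in a if v < pivot]
        greater = [v for v in a if v > pivot]
        # n - 1 comparisons: the pivot is compared against every other element
        return (n - 1) + qsort_comparisons(less) + qsort_comparisons(greater)

    random.seed(1)
    n = 50_000
    data = list(range(n))
    trials = []
    for _ in range(10):
        random.shuffle(data)
        trials.append(qsort_comparisons(data))
    print(sum(trials) / len(trials))   # measured average
    print(12 / 7 * n * log(n))         # predicted leading term, ~(12/7) n ln n

The two numbers agree only up to the lower-order O(n) terms that the continuum analysis drops, so expect the measured average to sit a few percent below the leading term.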