Written Homework #1: Analysis of Algorithms

CIS 121 Fall 2016
cis121-16fa-staff@googlegroups.com

Due: Thursday, September 15th, 2015 before 10:30am
(You must submit your homework online via Canvas. A paper copy is not required.)

Learning Goals

The goals of this assignment are to understand different methods of analyzing algorithms. The skills that you will practice in this homework are:

- Analyzing programs by timing them on different sizes of input, and building a mathematical model (an equation) that predicts the run time for other sizes of inputs.
- Using the doubling test and log ratio to predict the order of growth, by taking the slope in a power-law relationship.
- Understanding approximate models and orders of growth.
- Analyzing the orders of growth of different code snippets.
- Learning the definitions of the common notations that we use to classify algorithms. Note that your textbook focuses on a particular approximate model, called Tilde notation, which is not as common as big O, big Theta, and big Omega. For the parts of this homework assignment that deal with O, Theta and Omega, you will need to do research and draw on sources other than your textbook.
- Practicing solving recurrences. If you've forgotten how to do these, there is a textbook called Mathematics for Computer Science by Eric Lehman and Tom Leighton that is linked from the Resources section of the course homepage. It has a chapter on recurrences that you should read.

You should read Chapter 1.4 in your textbook to help you understand how to answer questions 2-7. For the remainder of the assignment, the textbook will be useful in helping you understand the fundamentals of approximate models, but you will need to use external resources to go into greater detail about the definitions of O, Theta and Omega.

1 Partner

0 points

Did you work with a partner? If so, please say who you worked with. Give their name and their PennKey username.

Remember that it is fine to discuss the problems with your partner, but you should write up your answers independently. Doing so improves your understanding and retention of the concepts.

2 Experimental analysis through timing experiments

The table below contains observations of the running times of a program for different sized inputs.

    Problem size N    Running time t(n)
    10                0.0002
    20                0.0021
    40                0.0339
    80                0.4665
    160               7.2499
    320               120.3161

Using any method, plot the problem size N on the x-axis against the running time t(n) on the y-axis. Give a mathematical model for the data.

Solution

(0.5 pts) Any reasonable plot of the data points.
(0.5 pts) Model: $T(N) = 1.1 \times 10^{-8} N^4$

3 Applying the doubling test

Using the data from the previous question, calculate the log ratio and predict what the running times will be for N = {640, 1280, 4000000} using the doubling test. Compare these values to your model.

Solution

(0.5 pts each) Around the magnitude of 1800, 30000, and 2.8e18. If the numerical answer falls within a reasonable range and the overall idea is there, full credit is given. Otherwise, partial credit is given based on the amount of work shown.
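The log-ratio calculation in problem 3 can be sketched in Java as follows. This is only an illustration under stated assumptions: the class and method names are made up for this sketch, the exponent is rounded to the nearest integer, and predictions are extrapolated from the last observation (N = 320) with the power law T(N) = T(320) * (N/320)^b. Results from this approach differ slightly from the fitted model but have the same magnitude.

```java
/** Doubling-test sketch for the timing table in problem 2 (names are illustrative). */
final class DoublingTest {
    // Observations from the table; N doubles from one row to the next.
    static final int[] SIZES = {10, 20, 40, 80, 160, 320};
    static final double[] TIMES = {0.0002, 0.0021, 0.0339, 0.4665, 7.2499, 120.3161};

    /** Base-2 logarithm of the ratio between the running times in rows row-1 and row. */
    static double logRatio(int row) {
        return Math.log(TIMES[row] / TIMES[row - 1]) / Math.log(2);
    }

    /** Extrapolates from the last observation, assuming T(N) grows like N^b. */
    static double predict(double n, double b) {
        int last = SIZES.length - 1;
        return TIMES[last] * Math.pow(n / SIZES[last], b);
    }

    public static void main(String[] args) {
        for (int row = 1; row < SIZES.length; row++) {
            System.out.printf("N = %d, log ratio = %.2f%n", SIZES[row], logRatio(row));
        }
        // Assumption: the true exponent is the nearest integer to the last log ratio.
        double b = Math.round(logRatio(SIZES.length - 1));
        for (double n : new double[] {640, 1280, 4_000_000}) {
            System.out.printf("predicted T(%.0f) = %.4g%n", n, predict(n, b));
        }
    }
}
```

The log ratios settle near 4, which is consistent with the quartic model given in the solution to problem 2.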

4 Tilde approximation

Let each cell in the g(n) column represent a different function. Compute the tilde approximation f(n) of each g(n). Then compute the ratio g(n)/f(n) for each pair at N = 1, N = 100, N = 10,000, and N = 1,000,000.

The last four columns hold the ratio g(n)/f(n) at the given N.

    No.  g(n)                          f(n)    N = 1    N = 100    N = 10,000    N = 1,000,000
    1.   $N + 3$
    2.   $1234 + 3/N$
    3.   $(1/3 + 1/N^2)(1/2 + 2/N)$
    4.   $N(2N - 1)(N - 3)/7$
    5.   $N \lg N + 1000 \lg N$
    6.   $2N^2 + N + 4$
    7.   $3 \lg 2^N + 2^{10}$
    8.   $N^3 + \lg N^6$
    9.   $2^{\lg N} + N^2 + 1$
    10.  $2^{\lg N} + N + 5$

Solution

The last four columns hold the ratio g(n)/f(n) at the given N.

    No.  g(n)                          f(n)         N = 1         N = 100       N = 10,000    N = 1,000,000
    1.   $N + 3$                       $N$          4             1.03          1.0003        1.000003
    2.   $1234 + 3/N$                  $1234$       1.002431118   1.000024311   1.000000243   1.000000002
    3.   $(1/3 + 1/N^2)(1/2 + 2/N)$    $1/6$        20            1.040312      1.00040003    1.000004
    4.   $N(2N - 1)(N - 3)/7$          $2N^3/7$     -1            0.96515       0.999650015   0.9999965
    5.   $N \lg N + 1000 \lg N$        $N \lg N$    N/A (0/0)     11            1.1           1.001
    6.   $2N^2 + N + 4$                $2N^2$       3.5           1.0052        1.00005002    1.0000005
    7.   $3 \lg 2^N + 2^{10}$          $3N$         342.3333333   4.413333333   1.034133333   1.000341333
    8.   $N^3 + \lg N^6$               $N^3$        1             1.000039863   1             1
    9.   $2^{\lg N} + N^2 + 1$         $N^2$        3             1.0101        1.00010001    1.000001
    10.  $2^{\lg N} + N + 5$           $2N$         3.5           1.025         1.00025       1.0000025

Point breakdown. Per row: (1 pt) correct f(n), (0.25 pts each) correct ratio for each N. Small rounding errors can be overlooked.

5 Orders of growth

Given the 10 functions in the chart of the previous problem, order the g(n) by order of growth, with line breaks in between functions with different orders of growth. The first function should have the smallest order of growth, and the last should have the largest. (If two functions have the same order of growth, put them on the same line as each other. You can also refer to the functions by number.) Remember that orders of growth make an additional simplification beyond what Tilde notation does.
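The ratio columns in the problem 4 table can be recomputed with a few lines of Java. The sketch below is illustrative only: the class and method names are made up here, and it evaluates just row 6, $g(N) = 2N^2 + N + 4$ with $f(N) = 2N^2$. Any other row can be checked by swapping in its two lambdas.

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch for checking a tilde approximation numerically (names are illustrative). */
final class TildeRatio {
    /** Ratio g(N)/f(N); it approaches 1 as N grows when f is the tilde approximation of g. */
    static double ratio(DoubleUnaryOperator g, DoubleUnaryOperator f, double n) {
        return g.applyAsDouble(n) / f.applyAsDouble(n);
    }

    public static void main(String[] args) {
        // Row 6 of the table: g(N) = 2N^2 + N + 4, candidate f(N) = 2N^2.
        DoubleUnaryOperator g = n -> 2 * n * n + n + 4;
        DoubleUnaryOperator f = n -> 2 * n * n;
        for (double n : new double[] {1, 100, 10_000, 1_000_000}) {
            System.out.printf("N = %.0f, g/f = %.10f%n", n, ratio(g, f, n));
        }
    }
}
```

A ratio that drifts toward 1 supports the chosen f(N). A ratio that tends to another constant indicates a wrong leading coefficient, and one that tends to 0 or grows without bound indicates the wrong order of growth.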

6 Code analysis to illustrate common orders of growth (Part 1)

How many times does this code print the word bar as a function of n?

    public static void foo(int n) {
        int max = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j < n - i; j *= 2) {
                System.out.println("bar");
            }
        }
    }

Solution

We know the inner loop runs about lg(n - i) times for each i from 1 to n. Hence we can sum as follows:

$\sum_{i=1}^{n-1} \lg(n - i) = \lg(n - 1) + \dots + \lg(1) = O(n \lg n)$

7 Code analysis to illustrate common orders of growth (Part 2)

How many times does this code print the word bar as a function of n?

    public static void bar(int n) {
        int max = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= Math.log(i) / Math.log(2); j++) {
                for (int k = 0; k < Math.pow(2, j); k++) {
                    System.out.println("bar");
                }
            }
        }
    }

Solution

We know the innermost loop runs $2^j$ times and that the second nested loop runs lg(i) times. Hence, we can sum as follows:

$\sum_{i=1}^{n} \sum_{j=1}^{\lg i} 2^j = \sum_{i=1}^{n} 2(i - 1) = 2(0 + 1 + \dots + (n - 1)) = (n - 1)n = O(n^2)$

The inner sum equals 2(i - 1) exactly when i is a power of 2. For other values of i the loop stops at the floor of lg(i) and the sum is smaller, so (n - 1)n is an upper bound on the total.

8 Definitions of O, Theta, and Omega

Give formal definitions of O, Theta, and Omega. These are not covered in your textbook, so you should do research on your own to find their definitions. For this problem, it is fine for you to copy a definition from another source, so long as you (1) attribute the source material, and (2) also explain in your own words what the definition means.
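The counts derived in problems 6 and 7 can be sanity-checked by replacing each print statement with a counter. The sketch below is a hedged illustration, not part of the assignment: the class and method names are made up, and for problem 7 it swaps the floating-point bound `Math.log(i) / Math.log(2)` for an integer floor of lg(i) (an assumption made here to avoid rounding surprises), adding 2^j per iteration instead of running the innermost loop.

```java
/** Counts how often "bar" would be printed in problems 6 and 7 (names are illustrative). */
final class CountBar {
    /** Number of prints made by foo(n) from problem 6. */
    static long countFoo(int n) {
        long count = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j < n - i; j *= 2) {
                count++;
            }
        }
        return count;
    }

    /** Number of prints made by bar(n) from problem 7, using an integer floor(lg i). */
    static long countBar(int n) {
        long count = 0;
        for (int i = 1; i <= n; i++) {
            int floorLg = 31 - Integer.numberOfLeadingZeros(i); // floor(lg i) for i >= 1
            for (int j = 1; j <= floorLg; j++) {
                count += 1L << j; // the innermost loop would run 2^j times
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n = 16; n <= 4096; n *= 4) {
            double nLgN = n * Math.log(n) / Math.log(2);
            System.out.printf("n = %d: foo = %d (n lg n = %.0f), bar = %d (n(n-1) = %d)%n",
                    n, countFoo(n), nLgN, countBar(n), (long) n * (n - 1));
        }
    }
}
```

For growing n, the foo count should stay within a constant factor of n lg n, and the bar count should stay below n(n - 1) while remaining quadratic.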

9 Upper and lower bounds

The running time T(n) of an algorithm P is at least $O(n^2)$. Does this statement allow us to make any conclusions about the lower bound or upper bound of algorithm P? Justify your answer.

Solution

The statement says that $T(n) \ge O(n^2)$, which gives us no information about the upper bound of T(n). For the lower bound, we know that $T(n) \ge f(n)$ for some $f(n) \in O(n^2)$. However, many functions f(n) satisfy $f(n) \in O(n^2)$, including very slowly growing ones such as constants. Thus, we have no real information about the lower bound either. Hence, the above statement tells us nothing about the running time T(n) of algorithm P.
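To make the argument in problem 9 concrete, it may help to pick one specific member of $O(n^2)$. The choice $f(n) = 1$ below is only an illustration. Any function bounded above by a constant multiple of $n^2$ would serve equally well.

```latex
f(n) = 1 \le 1 \cdot n^2 \quad \text{for all } n \ge 1
\;\Longrightarrow\; f(n) \in O(n^2)
\qquad\text{and}\qquad
T(n) \ge 1 = f(n) \ \text{ for any algorithm that performs at least one step.}
```

So every algorithm that does any work at all satisfies "T(n) is at least O(n^2)", which is why the statement carries no information.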

10 Analyze the O, Ω, and Θ of the following algorithm.

    public int maximize(int[] prices) {
        int max = 0;
        for (int i = 0; i < prices.length; i++) {
            for (int j = i; j < prices.length; j++) {
                // Profit from buying at index i and selling at index j.
                int profit = prices[j] - prices[i];
                if (profit > max) {
                    max = profit;
                }
            }
        }
        return max;
    }

Solution

The algorithm loops through the array using two indices and examines every possible pair of points. The inner body runs n(n + 1)/2 times: once for each of the $\binom{n}{2}$ unique pairs, plus the n cases where j = i. The running time is therefore both $O(n^2)$ and $\Omega(n^2)$. Thus, the algorithm is $\Theta(n^2)$.

11 When can we say that an algorithm is optimal?

Give a definition of what it means for an algorithm to be optimal. Given an array of integers, we want to find the minimum value in this array. We can achieve this by performing a single pass through the array and keeping track of the smallest value seen so far. Is this algorithm optimal? If so, explain why it is. If not, give an algorithm with a better runtime.

12 Recurrences

Solve the following recurrences and express your answers in Θ notation:

a. $T(n) = 2T(n - 1) + 4$ where $T(0) = 1$
b. $T(n) = T(n/2) + n$ where $T(0) = 1$

Solution

a. $T(n) = 2T(n - 1) + 4$

$T(n) = 2T(n - 1) + 4$
$T(n) = 2[2T(n - 2) + 4] + 4$
$T(n) = 4T(n - 2) + 12$
$T(n) = 4[2T(n - 3) + 4] + 12$
$T(n) = 8T(n - 3) + 28$
$T(n) = 2^k T(n - k) + 4 \cdot 2^{k-1} + 4 \cdot 2^{k-2} + \dots + 4 \cdot 2^1 + 4 \cdot 2^0$
$T(n) = 2^k T(n - k) + 4 \sum_{i=0}^{k-1} 2^i$

Setting k = n, we obtain:

$T(n) = 2^n T(0) + \sum_{i=0}^{n-1} 4 \cdot 2^i$

Thus, we have $T(n) = 2^n + 4(2^n - 1) = 5 \cdot 2^n - 4$, and we can conclude that $T(n) \in \Theta(2^n)$.
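The closed form for recurrence (a) can be checked numerically by unrolling the recurrence from its base case. The sketch below uses made-up class and method names and compares the unrolled values against $2^n + 4(2^n - 1) = 5 \cdot 2^n - 4$, which gives 1, 6, 16, 36 for n = 0, 1, 2, 3.

```java
/** Numeric check for recurrence (a): T(n) = 2T(n-1) + 4, T(0) = 1 (names are illustrative). */
final class RecurrenceA {
    /** Unrolls the recurrence iteratively from the base case T(0) = 1. */
    static long unroll(int n) {
        long t = 1;
        for (int k = 1; k <= n; k++) {
            t = 2 * t + 4;
        }
        return t;
    }

    /** Closed form 2^n + 4(2^n - 1) = 5 * 2^n - 4. */
    static long closedForm(int n) {
        return 5 * (1L << n) - 4;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 10; n++) {
            System.out.printf("n = %d: unrolled = %d, closed form = %d%n",
                    n, unroll(n), closedForm(n));
        }
    }
}
```

Agreement on small n is not a proof, but it is a quick way to catch a sign error in the geometric sum.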

b. $T(n) = T(n/2) + n$

$T(n) = T(n/2) + n$
$T(n) = [T(n/4) + n/2] + n$
$T(n) = [T(n/8) + n/4] + n/2 + n$
$T(n) = T(n/2^k) + n \sum_{i=0}^{k-1} \frac{1}{2^i}$

Setting k = lg n, and approximating the remaining term $T(n/2^k) = T(1)$ by the base-case value T(0) = 1 (this affects only the additive constant), we obtain:

$T(n) = T(0) + n \sum_{i=0}^{\lg n - 1} \frac{1}{2^i}$

Thus, we have

$T(n) = 1 + 2n\left(1 - \frac{1}{2^{\lg n}}\right) = 1 + 2n\left(1 - \frac{1}{n}\right)$

Hence, we see that $T(n) = 2n - 1$, and we can conclude that $T(n) \in \Theta(n)$.

13 Recurrences (Part 2)

Convert the following code fragment to a recurrence and then derive its running time. Assume that the function divide is called as follows: divide(0, n, numbers), where numbers is an array whose values are unknown and n is the length of the array. Express your answer in Θ notation as a function of n.

    public int divide(int start, int end, int[] numbers) {
        // Stop on ranges of size 0 or 1. Testing only start >= end would
        // recurse forever on a one-element range, because index == start there.
        if (end - start <= 1) {
            return 0;
        }
        int index = start + (end - start) / 2;
        divide(start, index, numbers);
        divide(index, end, numbers);
        for (int i = start; i < end; i++) {
            System.out.println(numbers[i]);
        }
        return numbers.length;
    }

Solution

Let T(n) be the running time of the code fragment on a range of n elements. We observe that the function makes two recursive calls to itself, each on only half of the input. We further observe that the for loop takes time proportional to n on each call. Thus, our recurrence is $T(n) = 2T(n/2) + n$. We proceed to solve this recurrence as follows:

$T(n) = 2T(n/2) + n$
$T(n) = 2[2T(n/4) + n/2] + n$
$T(n) = 4T(n/4) + 2n$
$T(n) = 2^k T(n/2^k) + kn$

Setting k = lg n, we have $T(n) = 2^{\lg n} T(1) + n \lg n$. Taking the constant base-case cost T(1) to be 1, $T(n) = n + n \lg n$, so we can conclude that $T(n) \in \Theta(n \lg n)$.

14 Extra Credit: Asymptotic analysis

Prove that $2^n$ is $O(n!)$ using induction.

Hint: Start with n = 4 as a base case and c = 1 as your constant.

Solution

A sketch of the proof should look something like this:

1. Prove the base case (n = 4): $4! > 2^4$.
2. Induction Hypothesis: Assume $k! > 2^k$ for some $k \ge 4$.
3. Inductive Step: Prove that $(k + 1)! > 2^{k+1}$. Dividing both sides by 2, this is equivalent to $k! \cdot \frac{k+1}{2} > 2^k$. Since $k \ge 4$, we know that $\frac{k+1}{2}$ is greater than 1, and we know by our IH that $k! > 2^k$. This means $k! \cdot \frac{k+1}{2}$ must be greater than $2^k$.

15 Extra Credit: Big Oh Proofs

Prove that f(n) is O(g(n)) if and only if g(n) is Ω(f(n)).

Hint: Start by proving that if f(n) is O(g(n)), then g(n) is Ω(f(n)).

- If f(n) is O(g(n)), then what do we know about f(n) from the definition of big-O?
- We want to prove that g(n) is Ω(f(n)). What must we show about g(n) under the definition of Ω?
- Given what we know about the relationship between f(n) and g(n), complete the proof.

Solution

a. Since $f(n) \in O(g(n))$, from the definition of big-O we know that $\exists c > 0, \exists n_0, \forall n \ge n_0: f(n) \le c \cdot g(n)$. Rearranging this inequality, we can write $\frac{1}{c} f(n) \le g(n)$, or equivalently, $g(n) \ge \frac{1}{c} f(n)$. Setting $c' = \frac{1}{c}$, we have $g(n) \ge c' \cdot f(n)$. Thus, from the definition of Ω, we can conclude that $g(n) \in \Omega(f(n))$. This completes the proof.

b. Since $g(n) \in \Omega(f(n))$, from the definition of Ω we know that $\exists c > 0, \exists n_0, \forall n \ge n_0: g(n) \ge c \cdot f(n)$. Rearranging this inequality, we can write $\frac{1}{c} g(n) \ge f(n)$, or equivalently, $f(n) \le \frac{1}{c} g(n)$. Setting $c' = \frac{1}{c}$, we have $f(n) \le c' \cdot g(n)$. Thus, from the definition of big-O, we can conclude that $f(n) \in O(g(n))$. This completes the proof.

16 Extra Credit: Relation between O, Theta, and Omega

Prove or disprove the following statements.

1. $f(n) \in O(g(n))$ implies $2^{f(n)} \in O(2^{g(n)})$.
2. The running time of an algorithm is Θ(g(n)) if and only if its worst-case running time is O(g(n)) and its best-case running time is Ω(g(n)).

Solution

1. False. Consider f(n) = 2n and g(n) = n. Then $f(n) \in O(g(n))$, but $2^{f(n)} = 4^n$ is not $O(2^n)$.
2. True. Let a be the claim that the running time is Θ(g(n)), and let b be the claim that the worst-case running time is O(g(n)) and the best-case running time is Ω(g(n)). You must show a implies b and b implies a. For a implies b, we can get the definitions of big O and big Omega from big Theta. To show b implies a, we can combine the definitions of big O and big Omega to derive the exact definition of big Theta.
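The counterexample for the first statement in problem 16 can be illustrated numerically. With f(n) = 2n and g(n) = n, the ratio $2^{f(n)} / 2^{g(n)}$ simplifies to $2^n$, so no constant c can bound it. The sketch below uses made-up names and simply prints that ratio for a few values of n.

```java
/** Illustrates that 2^(2n) is not O(2^n) (names are illustrative). */
final class ExponentCounterexample {
    /** Ratio 2^{f(n)} / 2^{g(n)} for f(n) = 2n and g(n) = n. */
    static double ratio(int n) {
        return Math.pow(2, 2 * n) / Math.pow(2, n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 5, 10, 20}) {
            System.out.printf("n = %d, 2^(2n) / 2^n = %.0f%n", n, ratio(n));
        }
    }
}
```

Because the ratio doubles with every increment of n, any proposed constant c is exceeded once n is larger than lg c.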