Data and Algorithm Analysis
Clifford A. Shaffer
Department of Computer Science, Virginia Tech, Blacksburg, Virginia
Spring 2017
Copyright © 2010, 2017 by Clifford A. Shaffer

Recurrence Relations
A (math) function defined in terms of itself.
Example: Fibonacci numbers:
$F(n) = F(n-1) + F(n-2)$ (general case)
$F(1) = F(2) = 1$ (base cases)
There are always one or more general cases and one or more base cases.
We will use recurrences for time complexity of recursive (computer) functions.
General format is $T(n) = E(T, n)$ where $E(T, n)$ is an expression in $T$ and $n$. Example: $T(n) = 2T(n/2) + n$.
Alternately, an upper bound: $T(n) \leq E(T, n)$.
Note: We won't spend a lot of time on techniques... just enough to be able to use them.

Modeling Recursive Function Cost
How does Binary Search work? Look at the middle element, then do binary search on one side. $T(n) = T(n/2) + 1$; $T(1) = 1$.
Mergesort: $T(n) = 2T(n/2) + O(n)$
Towers of Hanoi (TOH): $T(n) = 2T(n-1) + 1$; $T(1) = 0$
Insertion Sort or Bubble Sort: We typically think of these as double for loops that naturally yield a summation, but we can also think of them recursively: $T(n) = T(n-1) + O(n)$; $T(1) = 1$

Solving Recurrences
We would like to find a closed form solution for $T(n)$ such that $T(n) = \Theta(f(n))$.
Methods: guess (and test) a solution; expand the recurrence; theorems.
Note: Finding a closed form means that we have $f(n)$ that doesn't include $T$. Guessing is useful for finding an asymptotic solution. Use induction to prove the guess correct.
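The recurrences listed for the recursive cost models can be evaluated directly, which is a convenient way to generate values to guess from. The following is a minimal sketch, not part of the original slides: where the slides write $O(n)$ it assumes the cost is exactly $n$, it assumes a Mergesort base case of $T(1) = 1$, and all function names are hypothetical.

```python
from functools import lru_cache

# Each function mirrors one recurrence. The exact "+ n" terms and the
# Mergesort base case T(1) = 1 are assumptions made for illustration.

@lru_cache(maxsize=None)
def binary_search_cost(n):
    """T(n) = T(n/2) + 1, T(1) = 1 (n a power of 2)."""
    return 1 if n == 1 else binary_search_cost(n // 2) + 1

@lru_cache(maxsize=None)
def mergesort_cost(n):
    """T(n) = 2T(n/2) + n, T(1) = 1 (n a power of 2)."""
    return 1 if n == 1 else 2 * mergesort_cost(n // 2) + n

@lru_cache(maxsize=None)
def hanoi_cost(n):
    """T(n) = 2T(n-1) + 1, T(1) = 0."""
    return 0 if n == 1 else 2 * hanoi_cost(n - 1) + 1

@lru_cache(maxsize=None)
def insertion_sort_cost(n):
    """T(n) = T(n-1) + n, T(1) = 1."""
    return 1 if n == 1 else insertion_sort_cost(n - 1) + n

if __name__ == "__main__":
    # Print a small table of values to guess closed forms from.
    for k in range(1, 6):
        n = 2 ** k
        print(n, binary_search_cost(n), mergesort_cost(n),
              hanoi_cost(k + 1), insertion_sort_cost(n))
```

Under these assumptions the tabulated values suggest the guesses $\log_2 n + 1$, $n(\log_2 n + 1)$, $2^{n-1} - 1$, and $n(n+1)/2$, each of which would still need an induction proof.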
Guessing
$T(n) = 2T(n/2) + 5n^2$ for $n \geq 2$; $T(1) = 7$.
Note that $T$ is defined only for powers of 2.
Guess a solution: $T(n) \leq c_1 n^3 = f(n)$.
$T(1) = 7$ implies that $c_1 \geq 7$.
Inductively, assume $T(n/2) \leq f(n/2)$. Then
$T(n) = 2T(n/2) + 5n^2 \leq 2c_1(n/2)^3 + 5n^2 = c_1(n^3/4) + 5n^2 \leq c_1 n^3$ if $c_1 \geq 20/3$.
Note: For big-Oh, there are not many choices in what to guess. Base case: $7 \cdot 1^3 = 7$. The condition $c_1 \geq 20/3$ suffices because $\frac{20}{4 \cdot 3} n^3 + 5n^2 = \frac{20}{3} n^3$ when $n = 1$, and as $n$ grows, the right side grows even faster.

Therefore, if $c_1 = 7$, a proof by induction yields $T(n) \leq 7n^3$, so $T(n) \in O(n^3)$.
Is this the best possible solution?
Note: No; try something tighter.

Guess again.
$T(n) \leq c_2 n^2 = g(n)$.
$T(1) = 7$ implies $c_2 \geq 7$.
Inductively, assume $T(n/2) \leq g(n/2)$. Then
$T(n) = 2T(n/2) + 5n^2 \leq 2c_2(n/2)^2 + 5n^2 = c_2(n^2/2) + 5n^2 \leq c_2 n^2$ if $c_2 \geq 10$.
Therefore, if $c_2 = 10$, $T(n) \leq 10n^2$, so $T(n) = O(n^2)$.
Is this the best possible upper bound?
Note: With $c_2 = 10$ the induction step holds because $\frac{10}{2} n^2 + 5n^2 = 10n^2$ for every $n$. Yes, this is best, since $T(n)$ can be as bad as $5n^2$, so no bound of lower order than $n^2$ is possible.

Now, reshape the recurrence so that $T$ is defined for all values of $n$.
$T(n) \leq 2T(\lceil n/2 \rceil) + 5n^2$ for $n \geq 2$.
For arbitrary $n$, let $2^{k-1} < n \leq 2^k$.
We have already shown that $T(2^k) \leq 10(2^k)^2$. Taking $T$ to be nondecreasing,
$T(n) \leq T(2^k) \leq 10(2^k)^2 = 10(2^k/n)^2 n^2 \leq 10(2)^2 n^2 = 40n^2$.
Hence, $T(n) = O(n^2)$ for all values of $n$.
Typically, the bound for powers of two generalizes to all $n$.
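The two bounds from the guessing argument can be spot-checked numerically. This is a minimal sketch, not a proof: it assumes the equality version of the recurrence with the ceiling, $T(n) = 2T(\lceil n/2 \rceil) + 5n^2$ and $T(1) = 7$, and the helper name `bound_holds` is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """T(n) = 2T(ceil(n/2)) + 5n^2 for n >= 2, with T(1) = 7 (assumed equality form)."""
    if n == 1:
        return 7
    return 2 * T((n + 1) // 2) + 5 * n * n

def bound_holds(limit):
    """Check T(n) <= 10n^2 at powers of 2 and T(n) <= 40n^2 for every n <= limit."""
    for n in range(1, limit + 1):
        is_power_of_two = (n & (n - 1)) == 0
        if is_power_of_two and T(n) > 10 * n * n:
            return False
        if T(n) > 40 * n * n:
            return False
    return True

if __name__ == "__main__":
    print(bound_holds(1000))
```

A check like this can only falsify a guess; the induction argument is what establishes the bound for all $n$.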
Expanding Recurrences (Intro)
Take a look at the simple example slideshows at OpenDSA.

Expanding Recurrences
Usually, start with the equality version of the recurrence.
$T(n) = 2T(n/2) + 5n^2$ for $n \geq 2$; $T(1) = 7$.
Assume $n$ is a power of 2; $n = 2^k$.

Expanding Recurrences (cont)
$T(n) = 2T(n/2) + 5n^2$
$= 2(2T(n/4) + 5(n/2)^2) + 5n^2$
$= 2(2(2T(n/8) + 5(n/4)^2) + 5(n/2)^2) + 5n^2$
$= 2^k T(1) + 2^{k-1} \cdot 5(n/2^{k-1})^2 + 2^{k-2} \cdot 5(n/2^{k-2})^2 + \cdots + 2 \cdot 5(n/2)^2 + 5n^2$
$= 7n + 5 \sum_{i=0}^{k-1} n^2/2^i$
$= 7n + 5n^2 \sum_{i=0}^{k-1} 1/2^i$
$= 7n + 5n^2 (2 - 1/2^{k-1})$
$= 7n + 5n^2 (2 - 2/n)$.
This is the exact solution for powers of 2. $T(n) = \Theta(n^2)$.

D & C Recurrences
These have the form:
$T(n) = aT(n/b) + cn^k$; $T(1) = c$
where $a$, $b$, $c$, $k$ are constants. A problem of size $n$ is divided into $a$ subproblems of size $n/b$, while $cn^k$ is the amount of work needed to combine the solutions.
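A recurrence of the form $T(n) = aT(n/b) + cn^k$ is easy to evaluate directly, which gives a way to check an expansion such as the one for $T(n) = 2T(n/2) + 5n^2$. The sketch below is illustrative only: the function names are hypothetical, and the base value is passed as a separate parameter `base` (an assumption, since the worked example uses $T(1) = 7$ rather than $T(1) = c$).

```python
def dc_cost(n, a, b, c, k, base):
    """Evaluate T(n) = a*T(n/b) + c*n**k with T(1) = base, for n a power of b."""
    if n == 1:
        return base
    return a * dc_cost(n // b, a, b, c, k, base) + c * n ** k

def expanded_form(n):
    """Result of the expansion: 7n + 5n^2 (2 - 2/n), which simplifies to 10n^2 - 3n."""
    return 10 * n * n - 3 * n

if __name__ == "__main__":
    # Compare the recurrence with the expanded closed form at powers of 2.
    for m in range(6):
        n = 2 ** m
        print(n, dc_cost(n, 2, 2, 5, 2, 7), expanded_form(n))
```

The two columns agree at every power of 2 tried, consistent with the expansion being exact there.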
D & C Recurrences (cont)
Expand the sum; $n = b^m$.
$T(n) = a(aT(n/b^2) + c(n/b)^k) + cn^k$
$= a^m T(1) + a^{m-1} c(n/b^{m-1})^k + \cdots + ac(n/b)^k + cn^k$
$= c a^m \sum_{i=0}^{m} (b^k/a)^i$
$a^m = a^{\log_b n} = n^{\log_b a}$
The summation is a geometric series whose sum depends on the ratio $r = b^k/a$. There are 3 cases.
Note: $n = b^m \Rightarrow m = \log_b n$. Set $a = b^{\log_b a}$. Switch the order of the logs, giving $a^m = (b^{\log_b a})^{\log_b n} = (b^{\log_b n})^{\log_b a} = n^{\log_b a}$.

D & C Recurrences (cont)
(1) $r < 1$.
$\sum_{i=0}^{m} r^i < 1/(1 - r)$, a constant.
$T(n) = \Theta(a^m) = \Theta(n^{\log_b a})$.
(2) $r = 1$.
$\sum_{i=0}^{m} r^i = m + 1 = \log_b n + 1$
$T(n) = \Theta(n^{\log_b a} \log n) = \Theta(n^k \log n)$
Note: When $r = 1$, since $r = b^k/a = 1$, we get $a = b^k$, and so $k = \log_b a$.

D & C Recurrences (Case 3)
(3) $r > 1$.
$\sum_{i=0}^{m} r^i = \frac{r^{m+1} - 1}{r - 1} = \Theta(r^m)$
So, from $T(n) = c a^m \sum_{i=0}^{m} r^i$,
$T(n) = \Theta(a^m r^m) = \Theta(a^m (b^k/a)^m) = \Theta(b^{km}) = \Theta(n^k)$

Summary
Theorem 3.4:
$T(n) = \begin{cases} \Theta(n^{\log_b a}) & \text{if } a > b^k \\ \Theta(n^k \log n) & \text{if } a = b^k \\ \Theta(n^k) & \text{if } a < b^k \end{cases}$
Note: We simplify by approximating summations.
Apply the theorem: $T(n) = 3T(n/5) + 8n^2$.
$a = 3$, $b = 5$, $c = 8$, $k = 2$.
$b^k/a = 25/3$.
Case (3) holds: $T(n) = \Theta(n^2)$.
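Applying Theorem 3.4 is mechanical: compare $a$ with $b^k$ and pick the matching case. The following is a minimal sketch of that case selection, assuming integer $a \geq 1$, $b \geq 2$, and $k \geq 0$ so the comparison is exact; the names `theorem_3_4` and `fmt_power` are hypothetical.

```python
import math

def fmt_power(e):
    """Format n**e as text, e.g. 'n^2', 'n', or '1'."""
    text = f"{e:.3g}"
    if text == "0":
        return "1"
    if text == "1":
        return "n"
    return f"n^{text}"

def theorem_3_4(a, b, k):
    """Return the Theta bound for T(n) = a*T(n/b) + c*n**k as a string.

    Assumes integer a >= 1, b >= 2, k >= 0, so a and b**k compare exactly.
    """
    if a > b ** k:
        # r = b^k / a < 1: the recursion dominates.
        return f"Theta({fmt_power(math.log(a, b))})"
    if a == b ** k:
        # r = 1: every level of the expansion contributes equally.
        return "Theta(log n)" if k == 0 else f"Theta({fmt_power(k)} log n)"
    # r > 1: the top-level work dominates.
    return f"Theta({fmt_power(k)})"

if __name__ == "__main__":
    print(theorem_3_4(3, 5, 2))  # the worked example T(n) = 3T(n/5) + 8n^2
```

The constant $c$ is deliberately not a parameter, since it does not affect which case applies.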
Examples
Mergesort: $T(n) = 2T(n/2) + n$. $2^1/2 = 1$, so $T(n) = \Theta(n \log n)$.
Binary search: $T(n) = T(n/2) + 2$. $2^0/1 = 1$, so $T(n) = \Theta(\log n)$.
Insertion sort: $T(n) = T(n-1) + n$. Can't apply the theorem. Sorry!
Standard Matrix Multiply (recursively): $T(n) = 8T(n/2) + n^2$. $2^2/8 = 1/2$ so $T(n) = \Theta(n^{\log_2 8}) = \Theta(n^3)$.
Note:
$\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$
In the straightforward implementation, the $2 \times 2$ case is:
$c_{11} = a_{11} b_{11} + a_{12} b_{21}$
$c_{12} = a_{11} b_{12} + a_{12} b_{22}$
$c_{21} = a_{21} b_{11} + a_{22} b_{21}$
$c_{22} = a_{21} b_{12} + a_{22} b_{22}$
So the recursion is 8 calls of half size, and the additions take $\Theta(n^2)$ work.

Searching
Assumptions for search problems:
Target is well defined.
Target is fixed.
Search domain is finite.
We (can) remember all information gathered during search.
We search for a record with a key.
Note: Well defined: We recognize a hit or miss. Fixed: The target doesn't move during the life of the search. We often choose not to remember information. For example, sequential search does not remember the values seen already.

A Search Model (1)
Problem:
Given: A list L, of n elements, and a search key X.
Solve: Identify one element in L which has key value X, if any exist.
Model:
The key values for elements in L are unique.
One comparison determines <, =, >.
Comparison is our only way to find ordering information.
Every comparison costs the same.
Note: What if the key values are not unique? Probably the cost goes down, not up. This is an assumption for analysis, not for implementation. We would have a slightly different model (though no asymptotic change in cost) if our only comparison test was <. We would have a very different model if our only comparison was = / ≠. This is a comparison-based model. String data might require comparisons with very different costs.

A Search Model (2)
Goal: Solve the problem using the minimum number of comparisons.
Cost model: Number of comparisons.
(Implication) Access to every item in L costs the same (array).
Is this a reasonable model and goal?
Note: We are assuming that the number of comparisons is proportional to runtime. The data might not always be in an array, in which case the assumption that all accesses cost the same can fail. For example, linked lists, or data on disk, or across a network. We assume there is no relationship between value X and its position.
Linear Search
General algorithm strategy: Reduce the problem.
Compare X to the first element.
If not done, then solve the problem for n - 1 elements.

Position linear_search(L, lower, upper, X) {
  if L[lower] = X then return lower;
  else if lower = upper then return -1;
  else return linear_search(L, lower+1, upper, X);
}

What equation represents the worst case cost?
Note: Warning: We are using this simple, familiar algorithm as an illustration of how to do full, formal analysis. This includes some recurrence solving techniques, and attention to lower bounds. Cost given on next slide.

Worst Cost Upper Bound
$f(n) = \begin{cases} 1 & n = 1 \\ f(n-1) + 1 & n > 1 \end{cases}$
Reasonable to guess that $f(n) = n$.
Prove by induction:
Basis step: $f(1) = 1$, so $f(n) = n$ when $n = 1$.
Induction hypothesis: For $k < n$, $f(k) = k$.
Induction step: From the recurrence, $f(n) = f(n-1) + 1 = (n-1) + 1 = n$.
Thus, the worst case cost for n elements is linear.
Induction is great for verifying a hypothesis.
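The linear search pseudocode and its worst-case recurrence can be checked against each other by counting comparisons. This is a minimal Python sketch of the same recursive strategy; the `counter` argument and the helper `worst_case_comparisons` are hypothetical additions for instrumentation and are not part of the slide's pseudocode.

```python
def linear_search(L, lower, upper, X, counter):
    """Recursive linear search over L[lower..upper], counting comparisons of X.

    counter is a one-element list used as a mutable comparison count.
    Returns the position of X, or -1 if X is absent.
    """
    counter[0] += 1                      # one comparison of X against L[lower]
    if L[lower] == X:
        return lower
    if lower == upper:
        return -1
    return linear_search(L, lower + 1, upper, X, counter)

def worst_case_comparisons(n):
    """Search n unique keys for a key that is absent, forcing every comparison."""
    counter = [0]
    linear_search(list(range(n)), 0, n - 1, -1, counter)
    return counter[0]

if __name__ == "__main__":
    print([worst_case_comparisons(n) for n in range(1, 8)])
```

The counts match $f(n) = n$ for each size tried. Because the function recurses once per element, very large lists would exceed Python's default recursion limit; the sketch is meant for small inputs only.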
Approach #2
What if we couldn't guess a solution?
Try: Substitute and Guess. Iterate a few steps of the recurrence, and look for a summation.
$f(n) = f(n-1) + 1$
$= \{f(n-2) + 1\} + 1$
$= \{\{f(n-3) + 1\} + 1\} + 1$
Now what? Guess $f(n) = f(n-i) + i$.
When do we stop? When we reach a value for $f$ that we know.
$f(n) = f(n - (n-1)) + n - 1 = f(1) + n - 1 = n$
Now, go back and test the guess using induction.
Note: Replace $i$ with $n - 1$. Alternative: Recognize $f(n) = f(1) + \sum_{i=2}^{n} 1$.

Approach #3
Guess and Test: Guess the form of the solution, then solve the resulting equations.
Guess: $f(n)$ is linear. $f(n) = rn + s$ for some $r$, $s$.
What do we know?
$f(1) = r(1) + s = r + s = 1$.
$f(n) = r(n) + s = r(n-1) + s + 1$.
Solving these two simultaneous equations, $r = 1$, $s = 0$.
Final form of guess: $f(n) = n$.
Now, prove using induction.
Note: Often, $f(0)$ is easier. Or maybe $f(2)$. By definition, $f(n) = f(n-1) + 1$, so $rn + s = r(n-1) + s + 1$. So $rn + s = rn - r + s + 1$, which gives $s = s - r + 1$ and hence $r - 1 = 0$. Since $r + s = 1$, $s = 0$.
Why is this a guess and not a proof? Because all we did is show that our model passes through two points that the real curve also passes through. If the curve really is linear, 2 points is all that we need. But, we need to prove that it is linear.
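The guess-and-test idea can be sketched in code: fit the guessed form $f(n) = rn + s$ to two known values, then test it against the recurrence. This is an illustration under the assumption that the two points are taken at $n = 1$ and $n = 2$; the function names are hypothetical, and passing the test is evidence for the guess, not a proof.

```python
from fractions import Fraction

def f(n):
    """The recurrence itself: f(1) = 1, f(n) = f(n-1) + 1."""
    return 1 if n == 1 else f(n - 1) + 1

def fit_line(n1, n2):
    """Solve r*n1 + s = f(n1) and r*n2 + s = f(n2) exactly for (r, s)."""
    r = Fraction(f(n2) - f(n1), n2 - n1)
    s = f(n1) - r * n1
    return r, s

def guess_matches(r, s, limit):
    """Test (not prove) the guess f(n) = r*n + s for n = 1..limit."""
    return all(r * n + s == f(n) for n in range(1, limit + 1))

if __name__ == "__main__":
    r, s = fit_line(1, 2)
    print(r, s, guess_matches(r, s, 300))
```

Exact rational arithmetic via `Fraction` avoids rounding when the fitted coefficients are not integers. A line through two points always exists, which is exactly why the fit alone proves nothing; the induction step is still required.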
Lower Bound on the Problem (1)
... comparing X with L[n]. We can feed the algorithm an input with X in position n.
Note: Be careful about assumptions on how an algorithm might (must) behave. After all, where do new, clever algorithms come from? From different behavior than was previously assumed!

Fixing the Proof (1)
Error #1: An algorithm need not consistently skip position n. It could, for example, work from right to left.
Fix: Generalize to skip some other position i.

Lower Bound on the Problem (2)
... comparing X with L[i] for some value i. We can feed the algorithm an input with X in position i.
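The step "feed the algorithm an input with X in position i" can be made concrete with a small sketch. Everything here is hypothetical and for illustration only: `skipping_search` stands in for an algorithm that never compares X with L[skip], and `adversary_input` builds the input that defeats it. It illustrates only this one step, not the full lower bound proof.

```python
def skipping_search(L, X, skip):
    """A hypothetical algorithm that never compares X with L[skip]."""
    for i, value in enumerate(L):
        if i != skip and value == X:
            return i
    return -1

def adversary_input(n, skip, X):
    """Build a list of n unique keys with X placed exactly at position skip."""
    L = [X + 1 + j for j in range(n)]    # keys distinct from X and from each other
    L[skip] = X
    return L

if __name__ == "__main__":
    L = adversary_input(6, 4, 0)
    # The key is present at position 4, yet the algorithm reports a miss.
    print(L, skipping_search(L, 0, 4))
```

Whichever position the algorithm leaves unexamined, the adversary can place X there, so the algorithm answers incorrectly on that input.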