COMP 355 Advanced Algorithms
Algorithm Design Review: Mathematical Background

Polynomial Running Time

Brute force. For many non-trivial problems, there is a natural brute-force search algorithm that checks every possible solution. This typically takes 2^N time or worse for inputs of size N, which is unacceptable in practice. (For stable matching with n men and n women, brute force takes n! time.)

Desirable scaling property. When the input size doubles, the algorithm should slow down by only some constant factor c. Equivalently, there exist constants c > 0 and d > 0 such that on every input of size n, the running time is bounded by c * n^d steps. (If T(n) <= c * n^d, then T(2n) <= 2^d * T(n), so doubling the input slows the algorithm down by at most the constant factor 2^d.)

Def. An algorithm is poly-time if the above scaling property holds.
Worst-Case Analysis

Worst-case running time. Obtain a bound on the largest possible running time of the algorithm on any input of a given size N.
Generally captures efficiency in practice.
A draconian view, but it is hard to find an effective alternative.

Average-case running time. Obtain a bound on the running time of the algorithm on a random input, as a function of input size N.
It is hard (or impossible) to accurately model real instances by random distributions.
An algorithm tuned for a certain distribution may perform poorly on other inputs.

Worst-Case Polynomial Time

An algorithm is efficient if its running time is polynomial.

Justification: it really works in practice!
Although 6.02 * 10^23 * N^20 is technically poly-time, it would be useless in practice.
In practice, the poly-time algorithms that people develop almost always have low constants and low exponents.
Breaking through the exponential barrier of brute force typically exposes some crucial structure of the problem.

Exceptions.
Some poly-time algorithms have high constants and/or exponents, and are useless in practice.
Some exponential-time (or worse) algorithms are widely used, because the worst-case instances seem to be rare.
Big-O Notation

Asymptotic O-notation ("big-O") provides a way to simplify the messy functions that often arise in analyzing the running times of algorithms.
It allows us to ignore less important elements (constants) and focus on the important issues (growth rate for large values of n).

Formal Definition of Big-O

Formally, f(n) is O(g(n)) if there exist constants c > 0 and n_0 >= 0 such that f(n) <= c * g(n) for all n >= n_0.

Thus, big-O notation can be thought of as a way of expressing a sort of fuzzy relation between functions, where by "fuzzy" we mean that constant factors are ignored and we are only interested in what happens as n tends to infinity.
Intuitive Form of Big-O

f(n) is O(g(n)) if lim_{n -> infinity} f(n)/g(n) = c, for some constant c >= 0.

For example, if f(n) = 15n^2 + 7n log^3 n and g(n) = n^2, we have f(n) is O(g(n)) because

    lim_{n -> infinity} (15n^2 + 7n log^3 n) / n^2 = 15 + lim_{n -> infinity} (7 log^3 n) / n = 15 + 0 = 15.

In the last step of the derivation, we have used the important fact that log n raised to any positive power grows asymptotically more slowly than n raised to any positive power.

Useful Facts About Limits
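The limit above can be checked numerically. The sketch below is my own illustration, not part of the slides; it uses the natural log (the base only changes the constant) and prints the ratio f(n)/g(n), which settles near 15 as n grows.

```python
import math

def f(n):
    # f(n) = 15n^2 + 7n (log n)^3, the example from the slide
    return 15 * n**2 + 7 * n * math.log(n)**3

def g(n):
    return n**2

# (log n)^3 / n -> 0, so the ratio tends to the constant 15.
for n in [10, 10**3, 10**6, 10**9]:
    print(n, f(n) / g(n))
```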
Survey of Common Running Times

Linear time: O(n)
Linearithmic time: O(n log n)
Quadratic time: O(n^2)
Cubic time: O(n^3)
Polynomial time: O(n^k)
Exponential time: O(2^{n^k})

Linear Time: O(n)

Linear time. Running time is at most a constant factor times the size of the input.

Computing the maximum. Compute the maximum of n numbers a_1, ..., a_n.

    max <- a_1
    for i = 2 to n {
        if (a_i > max)
            max <- a_i
    }
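The pseudocode above translates directly into Python; `compute_max` is my own name for this sketch.

```python
def compute_max(a):
    """Maximum of n numbers in O(n) time: one pass, as in the pseudocode."""
    best = a[0]                 # max <- a_1
    for i in range(1, len(a)):  # for i = 2 to n
        if a[i] > best:
            best = a[i]
    return best
```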
Linear Time: O(n)

Merge. Combine two sorted lists A = a_1, a_2, ..., a_n and B = b_1, b_2, ..., b_n into a sorted whole.

    i = 1, j = 1
    while (both lists are nonempty) {
        if (a_i <= b_j) append a_i to the output list and increment i
        else            append b_j to the output list and increment j
    }
    append the remainder of the nonempty list to the output list

Claim. Merging two lists of size n takes O(n) time.
Pf. After each comparison, the length of the output list increases by 1.

O(n log n) Time

O(n log n) time (also referred to as linearithmic time) arises in divide-and-conquer algorithms.

Sorting. Mergesort and heapsort are sorting algorithms that perform O(n log n) comparisons.

Largest empty interval. Given n time-stamps x_1, ..., x_n on which copies of a file arrive at a server, what is the largest interval of time during which no copies of the file arrive?

O(n log n) solution. Sort the time-stamps. Scan the sorted list in order, identifying the maximum gap between successive time-stamps.
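Both routines above can be sketched in Python (the function names are my own):

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list in O(n) time."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # append the remainder of whichever
    out.extend(b[j:])  # list is still nonempty
    return out

def largest_empty_interval(timestamps):
    """O(n log n): sort, then scan for the maximum gap between successive stamps."""
    xs = sorted(timestamps)
    return max(xs[k + 1] - xs[k] for k in range(len(xs) - 1))
```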
Quadratic Time: O(n^2)

Quadratic time. Enumerate all pairs of elements.

Closest pair of points. Given a list of n points in the plane (x_1, y_1), ..., (x_n, y_n), find the pair that is closest.

O(n^2) solution. Try all pairs of points. (We don't need to take square roots: comparing squared distances gives the same answer.)

    min <- (x_1 - x_2)^2 + (y_1 - y_2)^2
    for i = 1 to n {
        for j = i+1 to n {
            d <- (x_i - x_j)^2 + (y_i - y_j)^2
            if (d < min)
                min <- d
        }
    }

Cubic Time: O(n^3)

Cubic time. Enumerate all triples of elements.

Set disjointness. Given n sets S_1, ..., S_n, each of which is a subset of {1, 2, ..., n}, is there some pair of these sets that are disjoint?

O(n^3) solution. For each pair of sets, determine if they are disjoint.

    foreach set S_i {
        foreach other set S_j {
            foreach element p of S_i {
                determine whether p also belongs to S_j
            }
            if (no element of S_i belongs to S_j)
                report that S_i and S_j are disjoint
        }
    }
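Both brute-force patterns can be sketched in Python; the function names and return conventions here are my own choices, not from the slides.

```python
from itertools import combinations

def closest_pair(points):
    """O(n^2) brute force: try all pairs of points.
    Comparing squared distances avoids square roots and
    still identifies the same closest pair."""
    best_d, best_pair = None, None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        d = (x1 - x2) ** 2 + (y1 - y2) ** 2
        if best_d is None or d < best_d:
            best_d, best_pair = d, ((x1, y1), (x2, y2))
    return best_pair

def find_disjoint_pair(sets):
    """O(n^3) brute force: for each pair of sets, scan every element
    of S_i for membership in S_j (a linear scan, as in the pseudocode).
    Returns the indices of a disjoint pair, or None."""
    for i, si in enumerate(sets):
        for j, sj in enumerate(sets):
            if i != j and all(p not in sj for p in si):
                return (i, j)
    return None
```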
Polynomial Time: O(n^k)

Independent set of size k (where k is a constant). Given a graph, are there k nodes such that no two are joined by an edge?

O(n^k) solution. Enumerate all subsets of k nodes.

    foreach subset S of k nodes {
        check whether S is an independent set
        if (S is an independent set)
            report S is an independent set
    }

Checking whether S is an independent set takes O(k^2) time. The number of k-element subsets is

    C(n, k) = n(n-1)(n-2)...(n-k+1) / (k(k-1)(k-2)...(2)(1)) <= n^k / k!,

so the total time is O(k^2 n^k / k!) = O(n^k). This is poly-time for k = 17, but not practical.

Exponential Time

Independent set. Given a graph, what is the maximum size of an independent set?

O(n^2 2^n) solution. Enumerate all subsets.

    S* <- empty set
    foreach subset S of nodes {
        check whether S is an independent set
        if (S is the largest independent set seen so far)
            update S* <- S
    }
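Both enumeration routines can be sketched in Python. The adjacency-dict representation and function names are my assumptions; the logic follows the pseudocode above.

```python
from itertools import combinations

def is_independent(adj, s):
    """O(k^2): check that no two nodes of s are joined by an edge.
    adj maps each node to the set of its neighbors."""
    return all(v not in adj[u] for u, v in combinations(s, 2))

def independent_set_of_size_k(adj, k):
    """O(n^k) for constant k: enumerate every k-subset of the nodes."""
    for s in combinations(adj, k):
        if is_independent(adj, s):
            return set(s)
    return None

def max_independent_set(adj):
    """O(n^2 2^n): enumerate all subsets, trying the largest sizes first."""
    for k in range(len(adj), 0, -1):
        s = independent_set_of_size_k(adj, k)
        if s is not None:
            return s
    return set()
```

For example, on the path 1 - 2 - 3, the unique maximum independent set is {1, 3}.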
Comparison of Running Times

Other Asymptotic Forms

Big-O has a number of relatives, which are useful for expressing other sorts of relations. More on these next time!
Summations

Summations naturally arise in the analysis of iterative algorithms.
More complex forms of analysis, such as recurrences, are often solved by reducing them to summations.
Solving a summation means reducing it to a closed-form formula: no summations, recurrences, integrals, or other complex operators.
Often we don't need to solve a summation exactly to find an asymptotic approximation.

Constant Series
Arithmetic Series

Geometric Series
Quadratic Series

Linear-geometric Series
Harmonic Series

Summations With General Bounds
Linearity of Summation

Approximate Using Integrals
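As a compact reference for the series named above, the standard closed forms are:

```latex
\begin{aligned}
\sum_{i=1}^{n} 1 &= n
  && \text{(constant series)}\\
\sum_{i=1}^{n} i &= \frac{n(n+1)}{2}
  && \text{(arithmetic series)}\\
\sum_{i=0}^{n} x^i &= \frac{x^{n+1}-1}{x-1}
  && \text{(geometric series, } x \neq 1\text{)}\\
\sum_{i=1}^{n} i^2 &= \frac{n(n+1)(2n+1)}{6}
  && \text{(quadratic series)}\\
\sum_{i=1}^{n} i\,x^i &= \frac{x-(n+1)x^{n+1}+n\,x^{n+2}}{(1-x)^2}
  && \text{(linear-geometric series, } x \neq 1\text{)}\\
H_n = \sum_{i=1}^{n} \frac{1}{i} &= \ln n + O(1)
  && \text{(harmonic series)}
\end{aligned}
```

For the integral approximation: if f is monotonically increasing, then the sum is sandwiched between shifted integrals, int_0^n f(x) dx <= sum_{i=1}^n f(i) <= int_1^{n+1} f(x) dx, which is often enough to pin down the asymptotic growth.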
Example: Previous Larger Element

Given a sequence of numeric values <a_1, a_2, ..., a_n>. For each element a_i, for 1 <= i <= n, we want to know the index of the rightmost element of the subsequence <a_1, a_2, ..., a_{i-1}> whose value is strictly larger than a_i. If no element of this subsequence is larger than a_i then, by convention, the index will be 0. (Or, if you like, you may imagine that there is a fictitious sentinel value a_0 = +infinity.)

More formally, for 1 <= i <= n, define p_i to be

    p_i = max{ j : 0 <= j < i and a_j > a_i }, where a_0 = +infinity

(see Fig. 2).

Naive Algorithm for Previous Larger Element
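A naive solution scans leftward from each element until it finds a strictly larger value; in the worst case (e.g. a sorted increasing sequence) this takes O(n^2) time. The sketch below is my own rendering of that naive approach, using the 1-based / sentinel-0 index convention from the text.

```python
def previous_larger(a):
    """Naive O(n^2) algorithm for the previous-larger-element problem.

    Returns 1-based indices; 0 plays the role of the sentinel
    a_0 = +infinity (no earlier element is strictly larger).
    """
    n = len(a)
    p = [0] * n
    for i in range(n):
        j = i - 1
        while j >= 0 and a[j] <= a[i]:  # walk left past smaller-or-equal values
            j -= 1
        p[i] = j + 1  # convert to 1-based; j == -1 means the sentinel index 0
    return p
```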
Recurrences

Recurrences arise naturally in the analysis of divide-and-conquer algorithms:
Divide: divide the problem into two or more sub-problems (ideally of roughly equal sizes).
Conquer: solve each sub-problem recursively.
Combine: combine the solutions to the sub-problems into a single global solution.

To analyze recursive procedures such as divide-and-conquer, we need to set up a recurrence.

Example. Suppose we break a problem into two sub-problems, each of size roughly n/2. The additional overhead of splitting and merging the solutions is O(n). When sub-problems are reduced to size 1, we can solve them in O(1) time. Ignoring constants and writing O(n) as n, we get:

    T(n) = 1               if n = 1,
    T(n) = 2T(n/2) + n     if n > 1.
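Unrolling this recurrence for n a power of 2 gives T(n) = n lg n + n (the extra +n comes from the n leaves, each contributing the base cost T(1) = 1). A quick sanity check, evaluating the recurrence directly (not a proof, just my own numeric verification):

```python
def T(n):
    """The recurrence from the text: T(1) = 1, T(n) = 2T(n/2) + n."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

# For n = 2^k: T(n) = n lg n + n = n*k + n.
for k in range(0, 15):
    n = 2**k
    assert T(n) == n * k + n
```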
Example Problem

Use mathematical induction to show that when n is an exact power of 2, the solution of the recurrence

    T(n) = 2               if n = 2,
    T(n) = 2T(n/2) + n     if n = 2^k, for k > 1,

is T(n) = n lg n.

Next Time

Other Asymptotic Forms
Read Section 2.2