Overview Lecture 20: Analysis of Algorithms Which algorithm should I choose for a particular application?! Ex: symbol table. (linked list, hash table, red-black tree,... )! Ex: sorting. (insertion sort, quicksort, mergesort,... )! Ex: halting problem. (none) Analysis of algorithms: framework for comparing algorithms and predicting performance. Computational complexity: framework for studying intrinsic difficulty of problems. COS 126: General Computer Science http://www.princeton.edu/~cos126 2 Ex 1: N-body Simulation. Overview! Simulate gravitational interactions among N bodies.! Brute force: N 2 steps.! Barnes-Hut: N log N steps, enables new research. Ex 2: Discrete Fourier transform.! Break down waveform of N samples into periodic components. Applications: DVD players, JPEG, analysis of astronomical data, medical imaging, nonlinear Schroedinger equation,...! Brute force: N 2 steps.! FFT algorithm: N log N steps, enables new technology. Ex 3: Sorting.! Rearrange N items in ascending order. physicists want N = # atoms in universe! Fundamental information processing abstraction. Andrew Appel PU '81 Freidrich Gauss 1805 Attribute Cost Applicability Improvement Better Machines vs. Better Algorithms Better Machine $$$ or more. Makes "everything" run faster. Incremental quantitative improvements. $ or less. May not apply to some problems. Ex: sorting a billion items with quicksort on home PC is 100x faster than insertion sorting it on supercomputer. Computer home PC Comparisons Per Second 10 7 Better Algorithm Dramatic quantitative improvements possible. Insertion 3 centuries 3 hours Jon von Neumann IAS 1945 super 10 12 2 weeks 3 4
Case Study: Sorting Insertion Sort Sorting problem:! Given N items, rearrange them in ascending order.! Applications: databases, data compression, computational biology, computer graphics,... Insertion sort! Brute-force sorting solution.! Move left-to-right through array.! Exchange next element with larger elements to its left, one-by-one. name Hauser Hong Hsu Hayes Haskell Hanley Hornet name Hanley Haskell Hauser Hayes Hong Hornet Hsu static void insertionsort(string[] a) { for (int i = 0; i < a.length; i++) { for (int j = i; j > 0; j--) { if (less(a[j], a[j-1])) exch(a, j, j-1); else break; 5 6 Helper Functions Estimating the Running Time Is string s lexicographically less than string t? static boolean less(string s, String t) { int n = Math.min(s.length(), t.length()); for (int i = 0; i < n; i++) { if (s.charat(i) < t.charat(i)) return true; else if (s.charat(i) > t.charat(i)) return false; return (s.length() < t.length()); Swap string references a[i] and a[j]. static void exch(string[] a, int i, int j) { String swap = a[i]; a[i] = a[j]; a[j] = swap; Total run time. Sum over all instructions: frequency! cost. Frequency. Determined by algorithm and input. Cost. Determined by compiler and machine. Easier alternative. (i) Analyze asymptotic growth. (ii) For medium N, run and measure time. (iii) For large N, use (i) and (ii) to predict time. Donald Knuth Asymptotic growth rates.! Estimate time as a function of input size N. N, N log N, N 2, N 3, 2 N, N!! Typically ignore lower order terms and leading coefficients. Ex. 6N 3 + 17N 2 + 56 is asymptotically proportional to N 3 7 8
Insertion Sort: Analysis Insertion Sort: Analysis Worst case.! Elements in reverse sorted order. i th iteration requires i - 1 compare and exchange operations total = 0 + 1 + 2 +... + N-2 + N-1 = N (N-1) / 2 Best case.! Elements in sorted order already. i th iteration requires only 1 compare operation total = 0 + 1 + 1 +... + 1 = N - 1 E F G H I J D C B A A B C D E F G H I J unsorted sorted active unsorted sorted active 10 11 Insertion Sort: Analysis Insertion Sort: Analysis Average case.! Elements are randomly ordered. i th iteration requires i / 2 comparison on average total = 0 + 1/2 + 2/2 +... + (N-1)/2 = N (N-1) / 4 Analysis Worst Comparisons N (N - 1) / 2 Asymptotic 0.5! N 2 Best N 1 N A D E F H J C B I G Average N (N 1) / 4 0.25! N 2 unsorted sorted active 12 13
Estimating the Running Time: Insertion Sort Insertion Sort: Number of Comparisons (ii) Collect empirical data for medium N.! Use System.currentTimeMillis to get number of milliseconds since January 1, 1970. long start = System.currentTimeMillis(); insertionsort(a); long stop = System.currentTimeMillis(); double elapsed = (stop - start) / 1000.0; N 5,000 10,000 20,000 comparisons 6.1 million 23 million 98 million running time 0.7 seconds 4 seconds 25 seconds (iii) Predict for large N.! Quadratic: 2x factor in input size " 4x factor in comparisons. Comparsions (millions) 100000 10000 Actual 1000 Predicted 100 10 1 1000 10000 100000 1000000 Input Size Lesson. Supercomputer can't rescue bad algorithm. 40,000 400 million 130 seconds 80,000 1.6 billion 570 seconds Data source: first N words of Charles Dicken's life work. Machine: Dell 2.2GHz PC with 1GB memory. computer home super comparisons per second 10 7 10 12 thousand Input Size million 1 day 1 second billion 3 centuries 2 weeks 14 15 Profiling What is my program doing? % java Xprof InsertionSort 50000 < dickens.txt Flat profile of 238.75 secs (11909 total ticks): main Compiled + native Method 90.9% 10824 + 0 InsertionSort.less 6.2% 733 + 0 java.lang.math.min 2.8% 337 + 0 InsertionSort.insertionsort 0.0% 0 + 3 InsertionSort.main 0.0% 2 + 0 java.io.bufferedinputstream.read 0.0% 1 + 0 java.lang.stringbuffer.length 0.0% 1 + 0 java.lang.stringbuffer.append 0.0% 1 + 0 java.lang.string.<init> 0.0% 1 + 0 StdIn.readString 99.9% 11900 + 3 Total compiled. C. A. R. Hoare, 1960 Q U I C K S O R T I S C O O L How to speed up the program?! Make less run substantially faster.! Call less substantially fewer times. 16 17
..! Sort each "half" recursively. partitioning element partitioning element Q U I C K S O R T I S C O O L Q U I C K S O R T I S C O O L I C K I C L Q U S O R T S O O CI C KI I CK L QO OU OS QO R ST S OT OU # L $ L partitioned array sort each piece 18 19 : Java Implementation : Implementing Partition.! Sort each "half" recursively. How do we partition in-place efficiently? static int partition(string[] a, int left, int right) { int i = left - 1; int j = right; while(true) { public void quicksort(string[] a, int left, int right) { if (right <= left) return; int i = partition(a, left, right); quicksort(a, left, i-1); quicksort(a, i+1, right); while (less(a[++i], a[right])) ; while (less(a[right], a[--j])) if (j == left) break; if (i >= j) break; exch(a, i, j); check if pointers cross swap find item on left to swap find item on right to swap exch(a, i, right); return i; swap with partitioning element return index where crossing occurs 20 21
: Analysis Sorting Summary Average case running time.! Roughly 2 N log e N comparisons. see Sedgewick book or COS 226 for proof! Assumption: partition on random element (instead of rightmost one). Worst case running proportional to N 2.! More likely that you are struck by lightning and meteor at same time. Numerical experiments roughly match predictions.! A bit better because of many equal keys. N comparisons running time Comparsions (thousands) 1000000 Insertion sort 100000 10000 1000 100 1000 10000 100000 1000000 Input Size 200,000 400,000 1 million 4.0 million 8.3 million 23 million 0.9 seconds 2.2 seconds 5.8 seconds 2 million 47 million 13 seconds 4 million 97 million 31 seconds Lesson. Good algorithm can be much more powerful than supercomputer. Computer home PC super Comparisons Per Second 10 7 10 12 Insertion 3 centuries 2 weeks 3 hours N = 1 billion 22 23 Design, Analysis, and Implementation of Algorithms Computational Complexity of Problems Algorithm.! Step-by-step recipe used to solve a problem.! Generally independent of programming language or machine on which it is to be executed. Design. Find a method to solve the problem. Analysis. Evaluate effectiveness and predict theoretical performance. Implementation. Write Java code to test your theory. Analysis Design Computational complexity. Framework to study efficiency of algorithms for solving a particular problem X. Upper bound. Cost guarantee provided by some algorithm for X. Lower bound. Proven limit on cost guarantee of any algorithm for X. Optimal algorithm. Algorithm with best cost guarantee for X. Example 1: X = sorting.! Measure costs in terms of comparisons.! Upper bound = N log 2 N with mergesort.! Lower bound = N log 2 N - N log 2 e.! Optimal algorithm = mergesort. lower bound ~ upper bound quicksort usually faster, but mergesort never slow applies to any comparison-based algorithm (see COS 226) Implement Theory simplifies and provides guidelines. 24 25
Computational Complexity of Problems Summary Computational complexity. Framework to study efficiency of algorithms for solving a particular problem X. Upper bound. Cost guarantee provided by some algorithm for X. Lower bound. Proven limit on cost guarantee of any algorithm for X. Optimal algorithm. Algorithm with best cost guarantee for X. How can I evaluate the performance of my algorithm?! Computational experiments.! Analysis of algorithms. What if it's not fast enough?! Understand why. analysis of algorithms! Buy a faster computer. performance improves incrementally Example 2: X = Euclidean TSP.! Measure cost in terms of arithmetic operations.! Upper bound = 2 N by dynamic programming. N! by brute force! Lower bound = N.! Optimal algorithm = ask again in 50 years.! Find a better algorithm in a textbook. performance can improve dramatically! Discover a new algorithm. not always easy or possible Sobering philosophical thoughts. see NP-completeness! In theory, most problems are undecidable.! In practice, most remaining problems are intractable.! Analysis of algorithms helps us improve the ones we use. Essence of computational complexity: closing the gap. 26 27