Introduction to Algorithms 6.046J/18.410J — Lecture 1 — Prof. Piotr Indyk

Welcome to Introduction to Algorithms, Spring 2008.

Handouts: 1. Course Information  2. Calendar  3. Signup sheet (PLEASE return at the end of this lecture!)

Charles Leiserson and Piotr Indyk

Course information — Design and analysis of algorithms
1. Staff
2. Prerequisites
3. Lectures & Recitations
4. Problem sets
5. Describing algorithms
6. Grading policy
7. Collaboration policy
8. Textbook (CLRS)
9. Course web site
10. Extra help (HKN)

What is the course about? The theoretical study of how to solve computational problems efficiently.
- Computational problems: e.g., sorting data, finding shortest paths, etc.
- Solve: design an algorithm that does the job (correctly).
- Theoretical: use the language of (algorithmic) mathematics to understand the performance of algorithms. In particular, it enables us to define what "efficiently" means.
Typical question: what is the fastest algorithm for a given problem?

Performance vs. the rest of Course 6
Also important: modularity, maintainability, functionality, robustness, user-friendliness, programmer time, simplicity, extensibility, reliability.
Performance is often one of the key aspects: searching the Web, analyzing the genome(s).

The problem of sorting
Input: a sequence a_1, a_2, …, a_n of numbers.
Output: a permutation a'_1, a'_2, …, a'_n of the input such that a'_1 ≤ a'_2 ≤ … ≤ a'_n.
Example: Input: 8 4 3 6.  Output: 3 4 6 8.
Insertion sort pseudocode

INSERTION-SORT (A, n)        ⊳ A[1 .. n]
    for j ← 2 to n
        do key ← A[j]
           i ← j − 1
           while i > 0 and A[i] > key
               do A[i+1] ← A[i]
                  i ← i − 1
           A[i+1] ← key

[Slide animation: A[1 .. j−1] is the sorted prefix; key = A[j] is inserted into place. Frames step through the example array 8 4 3 6.]
[Animation frames, reconstructed:
8 4 3 6 → 4 8 3 6 (insert key 4)
4 8 3 6 → 3 4 8 6 (insert key 3)
3 4 8 6 → 3 4 6 8 (insert key 6) — done]
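The INSERTION-SORT pseudocode on the slides can be transcribed into runnable Python. This is a sketch that follows the slides' logic, adapted from the pseudocode's 1-indexed arrays to Python's 0-indexed lists:

```python
def insertion_sort(a):
    """Sort the list a in place, following the INSERTION-SORT pseudocode.

    Invariant: before each outer iteration, a[0 .. j-1] is sorted.
    """
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # Shift elements of the sorted prefix that are larger than key.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a
```

On the slides' example, `insertion_sort([8, 4, 3, 6])` produces `[3, 4, 6, 8]`, matching the animation frames above.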
Running time of insertion sort?

Issues: the running time depends on
- The input: an already-sorted sequence is easy to sort.
- The processor: ZX80 vs. Pentium.
How can we develop metrics that do not depend on these factors?

Input-independence: kinds of analyses
- Worst-case (usually): T(n) = maximum time of the algorithm on any input of size n.
- Average-case (sometimes): T(n) = expected time of the algorithm over all inputs of size n. Needs an assumption about the statistical distribution of inputs.
- Best-case (bogus): cheat with a slow algorithm that works fast on some input.

Machine-independence: Θ-notation
What is insertion sort's worst-case time?
BIG IDEA: ignore machine-dependent constants; look at the growth of T(n) as n → ∞. (Asymptotic analysis.)
Math: Θ(g(n)) = { f(n) : there exist positive constants c_1, c_2, and n_0 such that 0 ≤ c_1 g(n) ≤ f(n) ≤ c_2 g(n) for all n ≥ n_0 }.
We write f(n) = Θ(g(n)) instead of f(n) ∈ Θ(g(n)).
Engineering: drop low-order terms; ignore leading constants.
Example: 3n³ + 90n² − 5n + 6046 = Θ(n³).
Also: O(·), Ω(·), o(·), ω(·), …

Asymptotic analysis
When n gets large enough, a Θ(n²) algorithm always beats a Θ(n³) algorithm.
We shouldn't ignore asymptotically slower algorithms, however. Real-world design situations often call for a careful balancing of engineering objectives.

Insertion sort analysis
Worst case: T(n) = Θ(Σ_{j=2}^{n} j) = Θ(n²).
Is insertion sort a fast sorting algorithm?
- Moderately so, for small n.
- Not at all, for large n.
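The engineering rule "drop low-order terms; ignore leading constants" can be seen numerically: for the slides' example polynomial, the ratio f(n)/n³ settles near the leading constant 3, which is exactly why constants c_1, c_2 exist in the Θ(n³) definition. A small sketch:

```python
def f(n):
    # The slide's example polynomial: 3n^3 + 90n^2 - 5n + 6046.
    return 3 * n**3 + 90 * n**2 - 5 * n + 6046

# The ratio f(n)/n^3 tends to the leading constant 3 as n grows,
# so for large n we could take, e.g., c1 = 2 and c2 = 4 in the
# definition of Theta(n^3): 2*n^3 <= f(n) <= 4*n^3.
for n in [10, 100, 1000, 10**6]:
    print(n, f(n) / n**3)
```

The low-order terms 90n² − 5n + 6046 dominate the picture for tiny n but become irrelevant as n grows, which is the whole point of asymptotic notation.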
Merge sort

MERGE-SORT A[1 .. n]
1. If n = 1, done.
2. Recursively sort A[1 .. ⌈n/2⌉] and A[⌈n/2⌉+1 .. n].
3. Merge the two sorted lists into one.

Key subroutine: MERGE

[Slide animation: MERGE repeatedly compares the heads of the two sorted lists and outputs the smaller element.]
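The three-step outline and the MERGE subroutine can be sketched in Python; this follows the slides' structure (including the ⌈n/2⌉ split), returning a new sorted list rather than sorting in place:

```python
def merge(left, right):
    """MERGE: repeatedly take the smaller head of two sorted lists."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])    # One of these two is empty;
    out.extend(right[j:])   # append whichever has leftovers.
    return out

def merge_sort(a):
    """Return a sorted copy of a, following the MERGE-SORT outline."""
    n = len(a)
    if n <= 1:
        return list(a)               # Step 1: if n = 1, done.
    mid = (n + 1) // 2               # ceil(n/2), as in the slides.
    left = merge_sort(a[:mid])       # Step 2: recursively sort
    right = merge_sort(a[mid:])      #         both halves.
    return merge(left, right)        # Step 3: merge the sorted lists.
```

Each call to `merge` does one comparison per output element, which is where the Θ(n) merge cost on the next slide comes from.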
Time to merge a total of n elements? Θ(n) — linear time.

Analyzing merge sort

MERGE-SORT A[1 .. n]                                  T(n)
1. If n = 1, done.                                    Θ(1)
2. Recursively sort A[1 .. ⌈n/2⌉] and A[⌈n/2⌉+1 .. n].  2T(n/2)
3. Merge the sorted lists.                            Θ(n)

Sloppiness: should be T(⌈n/2⌉) + T(⌊n/2⌋), but it turns out not to matter asymptotically.

Recurrence for merge sort
T(n) = Θ(1)            if n = 1;
T(n) = 2T(n/2) + Θ(n)  if n > 1.
How to solve it? One approach on the next slide; more in CLRS and a later lecture.

Recursion tree: solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
[Slide animation: the root T(n) is expanded level by level.]
[Recursion-tree frames, reconstructed — each node shows its cost, and each level sums to cn:]

                         cn
                cn/2            cn/2
            cn/4   cn/4     cn/4   cn/4
              …   (height h = lg n)   …
            Θ(1)  Θ(1)   …   Θ(1)  Θ(1)
Each of the lg n levels contributes cn, and the leaf level contributes Θ(1) per leaf:
#leaves = n ⇒ leaf level costs Θ(n)
Total = cn · lg n + Θ(n) = Θ(n lg n)

Conclusions
- Θ(n lg n) grows more slowly than Θ(n²).
- Therefore, merge sort asymptotically beats insertion sort in the worst case.
- In practice, merge sort beats insertion sort for n > 30 or so.
- Go test it out for yourself!
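The recursion-tree conclusion can be checked numerically. A sketch that evaluates the exact integer recurrence T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + cn (with c = 1 and leaf cost 1, both assumptions for illustration) and compares it to n lg n:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Merge-sort cost recurrence with c = 1 and T(1) = 1."""
    if n <= 1:
        return 1
    # Exact integer halving: T(ceil(n/2)) + T(floor(n/2)) + c*n.
    return T((n + 1) // 2) + T(n // 2) + n

# T(n) / (n * lg n) approaches c = 1, confirming T(n) = Theta(n lg n).
for n in [2**10, 2**15, 2**20]:
    print(n, T(n) / (n * math.log2(n)))
```

For n a power of two this recurrence solves exactly to T(n) = n lg n + n, so the printed ratios are 1 + 1/lg n and drift toward 1, matching the slide's total of cn · lg n + Θ(n).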