CS 483 Data Structures and Algorithm Analysis


CS 483 - Data Structures and Algorithm Analysis Lecture II: Chapter 2 R. Paul Wiegand George Mason University, Department of Computer Science February 1, 2006

Outline 1 Analysis Framework 2 Asymptotic Notation 3 Nonrecursive Algorithms 4 Recursive Algorithms & Recurrence Relations 5 Empirical Analysis 6 Homework

Analyzing for Efficiency

Though there are other factors, we concentrate our analysis on efficiency. Time efficiency: how fast an algorithm runs. Space efficiency: how much memory an algorithm requires. Input size: wall-clock time depends on the machine, but algorithms are machine independent. Typically, algorithms run longer as the size of their input increases. We are interested in how efficiency scales with respect to input size.

Measuring input size

What do we measure to judge the size of the problem?

Sort(a_1, a_2, ..., a_n): the number of list elements.

Matrix multiplication, with A an n x m matrix [a_11 ... a_1m; ...; a_n1 ... a_nm] and B an m x k matrix [b_11 ... b_1k; ...; b_m1 ... b_mk]: the order of the matrices? The total number of elements?

Polynomial evaluation, p(x) = a_n x^n + ... + a_1 x + a_0: the degree of the polynomial? The total number of coefficients?

Measuring Running Time

What do we measure to judge the running time of an algorithm? We could count all operations executed, but we typically concern ourselves with the basic operation, the one contributing the most to the total running time.

Sort(a_1, a_2, ..., a_n): key comparisons.

Matrix multiplication: additions? Multiplications?

In general, we can approximate running time as T(n) ≈ c_op · C(n), where c_op is the cost of one execution of the basic operation and C(n) is the number of times it executes.

Example: Given C(n) := (1/2) n (n − 1), what is the run-time effect of doubling the input size?
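The closing example can be answered by plugging in: since C(n) = (1/2)n(n − 1) is quadratic, doubling the input size roughly quadruples the count. A quick sketch in Python (the function name is illustrative):

```python
def C(n):
    # Basic-operation count from the example: C(n) = n(n - 1) / 2
    return n * (n - 1) // 2

# C(2n)/C(n) = 2n(2n - 1) / (n(n - 1)), which tends to 4 as n grows,
# so doubling the input size roughly quadruples the running time.
ratio = C(2 * 1000) / C(1000)
```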

Sidebar: Logarithm Basics

If b^x = y then log_b y = x.

We can change bases: log_a x = log_b x / log_b a.

Useful identities: log_b x^y = y log_b x; log(xy) = log x + log y; log(x/y) = log x − log y; a^(log_b x) = x^(log_b a).

[Plot: y = 2^x, y = x, and y = log_2(x) for comparison.]
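These identities are easy to sanity-check numerically; a small sketch (the sample values are arbitrary):

```python
import math

x, y, a, b = 12.0, 5.0, 3.0, 2.0

# Each expression should be (numerically) zero if the identity holds.
change_of_base = math.log(x, a) - math.log(x, b) / math.log(a, b)       # log_a x = log_b x / log_b a
product_rule   = math.log(x * y, b) - (math.log(x, b) + math.log(y, b)) # log(xy) = log x + log y
power_identity = a ** math.log(x, b) - x ** math.log(a, b)              # a^(log_b x) = x^(log_b a)
```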

How Does Your Running Time Grow?

We aren't so interested in running times on small inputs, but in how running time scales on very large inputs. We are interested in an algorithm's order of growth, ignoring constant factors.

1          Constant
log n      Logarithmic (sub-linear)
n          Linear
n lg n     n-log-n
n^k, for some constant k    Polynomial
k^n, for some constant k    Exponential
n!         Factorial ("exponential")

NOTE: Given a nonnegative integer d, we can write a polynomial in the form p(n) = sum_{i=0}^{d} a_i n^i, where the a_i are constants. We call d the degree of the polynomial.

[Plot: y = 2^x, y = x^4, y = x^3, y = x^2, and y = x for comparison.]

Worst-Case Efficiency

SequentialSearch(A[0 ... n−1], K)
  i ← 0
  while i < n and A[i] ≠ K do
    i ← i + 1
  if i < n return i
  else return −1

The worst case is when K is not in A[]: we search every element, requiring C_worst(n) = n comparisons. Worst-case analysis provides an upper bound.
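The pseudocode translates directly to Python; a sketch instrumented to count the key comparisons (function name illustrative):

```python
def sequential_search(A, K):
    # Returns (index of K or -1, number of key comparisons A[i] != K).
    comparisons = 0
    i = 0
    while i < len(A):
        comparisons += 1          # one key comparison per loop iteration
        if A[i] == K:
            return i, comparisons
        i += 1
    return -1, comparisons

# Worst case: K is absent, so every element is compared -> C_worst(n) = n.
idx, count = sequential_search([2, 7, 1, 8], 99)
```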

Best-Case Efficiency

SequentialSearch(A[0 ... n−1], K)
  i ← 0
  while i < n and A[i] ≠ K do
    i ← i + 1
  if i < n return i
  else return −1

The best case is when K is in A[0]: we examine only the first element, requiring C_best(n) = 1 comparison. Best-case analysis provides a lower bound. Generally, best-case efficiency is not as useful...

Average-Case Efficiency

SequentialSearch(A[0 ... n−1], K)
  i ← 0
  while i < n and A[i] ≠ K do
    i ← i + 1
  if i < n return i
  else return −1

The average case asks a useful question: what kind of running time do we expect when we don't know the data? Let p ∈ [0, 1] be the probability that K ∈ A[]. If the search is successful, assume K is equally likely to be at any position, so Pr{K = A[i]} = p/n for all i ∈ [0, n−1]. Then

C_avg(n) = [1 · (p/n) + 2 · (p/n) + ... + n · (p/n)] + n · (1 − p) = p(n + 1)/2 + n(1 − p).

If p = 1 (i.e., K ∈ A[]), C_avg(n) = (n + 1)/2, and the running time is linear.
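The closed form can be checked against the defining sum directly; a sketch (function name illustrative):

```python
def avg_comparisons(n, p):
    # C_avg(n) = p(n + 1)/2 + n(1 - p): expected key comparisons when the
    # target is present with probability p, uniformly at any position.
    return p * (n + 1) / 2 + n * (1 - p)

# Check against the defining sum for p = 1 (target always present):
n = 10
direct = sum(i * (1.0 / n) for i in range(1, n + 1))  # 1*(p/n) + ... + n*(p/n)
```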

O, Ω, Θ: Sets of functions

Informally, we can think of O, Ω, and Θ as sets of functions.

O(g(n)): the set of all functions with the same or smaller order of growth as g(n).
  2n^2 − 5n + 1 ∈ O(n^2);  2^n + n^100 ∈ O(n!);  2n + 6 ∉ O(log n).

Ω(g(n)): the set of all functions with the same or larger order of growth as g(n).
  2n^2 − 5n + 1 ∈ Ω(n^2);  2^n + n^100 ∉ Ω(n!);  2n + 6 ∈ Ω(log n).

Θ(g(n)): the set of all functions with the same order of growth as g(n).
  2n^2 − 5n + 1 ∈ Θ(n^2);  2^n + n^100 ∉ Θ(n!);  2n + 6 ∉ Θ(log n).

O-Notation, more formally

Definition: O(g(n)) = { t(n) : there exist c > 0 and n_0 > 0 such that 0 ≤ t(n) ≤ c · g(n) for all n ≥ n_0 }.

I.e., t(n) is in O(g(n)) if t(n) is bounded above by some constant multiple of g(n) for all large n. We call O an asymptotic upper bound.

[Plot: t(n) lying below c · g(n) for all n ≥ n_0.]

Ω-Notation, more formally

Definition: Ω(g(n)) = { t(n) : there exist c > 0 and n_0 > 0 such that 0 ≤ c · g(n) ≤ t(n) for all n ≥ n_0 }.

I.e., t(n) is in Ω(g(n)) if t(n) is bounded below by some constant multiple of g(n) for all large n. We call Ω an asymptotic lower bound.

[Plot: t(n) lying above c · g(n) for all n ≥ n_0.]

Θ-Notation, more formally

Definition: For any two functions t(n) and g(n), t(n) ∈ Θ(g(n)) if and only if t(n) ∈ O(g(n)) and t(n) ∈ Ω(g(n)).

I.e., t(n) is in Θ(g(n)) if t(n) is bounded below by some constant multiple c_1 · g(n) and bounded above by some constant multiple c_2 · g(n) for all large n. We sometimes call such a bound tight.

[Plot: t(n) sandwiched between c_1 · g(n) and c_2 · g(n) for all n ≥ n_0.]

A Useful Asymptotic Property

Theorem: If t_1(n) ∈ O(g_1(n)) and t_2(n) ∈ O(g_2(n)), then t_1(n) + t_2(n) ∈ O(max{g_1(n), g_2(n)}).

Review the proof in the book (p. 56).

Advantage: We can restrict our analysis to one part of an algorithm if we know the other(s) to have a lower order of growth (e.g., CountingSort, p. 24).

Advantage: We can front-load an algorithm without increasing the time complexity, as long as the preparation step has no greater an order of growth than the main part (e.g., shuffling data before QuickSort).

Comparing Orders of Growth

One common way to compare two algorithms is by taking the limit of the ratio of their running times. Comparing t(n) and f(n):

lim_{n→∞} t(n)/f(n) = 0 if t(n) is of smaller order than f(n); = c > 0 if t(n) is of the same order as f(n); = ∞ if t(n) is of greater order than f(n).

Example: Compare the orders of growth of lg n and √n:

lim_{n→∞} lg n / √n                          (given)
  = lim_{n→∞} (d/dn lg n) / (d/dn √n)        (L'Hôpital's rule)
  = lim_{n→∞} (lg e / n) / (1 / (2√n))       (differentiation)
  = lim_{n→∞} 2 lg e · √n / n                (algebra)
  = lim_{n→∞} 2 lg e / √n = 0

Therefore lg n is of smaller order than √n.
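The same conclusion can be checked numerically: the ratio lg n / √n should shrink toward 0 as n grows. A sketch (function name illustrative):

```python
import math

def ratio(n):
    # lg(n) / sqrt(n): tends to 0, so lg n is of smaller order than sqrt(n).
    return math.log2(n) / math.sqrt(n)

samples = [ratio(10 ** k) for k in range(1, 7)]   # n = 10, 100, ..., 10^6
decreasing = all(a > b for a, b in zip(samples, samples[1:]))
```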

General Analysis Plan

1. How will you measure input size? What algorithmic parameter is it?
2. What is the algorithm's basic operation? (Is it in the inner-most loop?)
3. Does the number of times the basic operation is executed depend on something other than the input size (e.g., the ordering of the input)? If so, you have to consider worst, best, and average cases separately.
4. Specify a summation for the number of times the basic operation is executed.
5. Find a closed-form solution if possible, and determine the order of growth from the formula.

Example: Matrix Multiplication

MatrixMultiply(A[0...n−1, 0...n−1], B[0...n−1, 0...n−1])
  for i ← 0 to n−1 do
    for j ← 0 to n−1 do
      C[i, j] ← 0
      for k ← 0 to n−1 do
        C[i, j] ← C[i, j] + A[i, k] · B[k, j]
  return C

1. Input size measure: n, the order of the matrices.
2. Basic operation: multiplication.
3. Dependency: depends only on the input size.
4. Summation: M(n) = Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} 1 = n^3.
5. Find the order: M(n) ∈ O(n^3).
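The triple-loop analysis is easy to confirm by instrumenting the algorithm; a sketch that counts the scalar multiplications (names illustrative):

```python
def matrix_multiply(A, B):
    # Textbook triple-loop product of two n x n matrices,
    # counting scalar multiplications (the basic operation).
    n = len(A)
    C = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                mults += 1
    return C, mults

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
product, mults = matrix_multiply(identity, identity)  # M(3) = 3^3 = 27
```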

Example: Binary Digits

Binary(n)
  count ← 1
  while n > 1 do
    count ← count + 1
    n ← ⌊n/2⌋
  return count

1. Input size measure: n, the integer given.
2. Basic operation: the n > 1 comparison.
3. Dependency: depends only on the input size.
4. Summation: here it is actually a recurrence relation, but we know the answer: B(n) = ⌊lg n⌋ + 1.
5. Find the order: B(n) ∈ O(lg n).
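The pseudocode ports directly to Python, and the closed form B(n) = ⌊lg n⌋ + 1 (the bit length of n) can be checked exhaustively; a sketch (function name illustrative):

```python
def binary_digit_count(n):
    # Number of binary digits of n >= 1, via repeated halving;
    # the loop runs floor(lg n) times, so the result is floor(lg n) + 1.
    count = 1
    while n > 1:
        count += 1
        n //= 2
    return count
```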

Example: Factorial

Factorial(n)
  if n = 0 return 1
  else return Factorial(n − 1) · n

n indicates the input size; we count multiplications. F(n) = F(n − 1) · n, for n > 0. We call such equations recurrence relations. This defines a sequence, but to make it unique we need an initial condition (i.e., if n = 0 return 1). Recurrence relations can be useful for analyzing some nonrecursive algorithms, too (e.g., Binary).
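A sketch of the recursive algorithm, instrumented to show that the multiplication count satisfies M(n) = M(n − 1) + 1, i.e. M(n) = n (names illustrative):

```python
def factorial(n):
    # Returns (n!, number of multiplications performed).
    # Value recurrence: F(n) = F(n - 1) * n, F(0) = 1.
    # Count recurrence: M(n) = M(n - 1) + 1, M(0) = 0, so M(n) = n.
    if n == 0:
        return 1, 0
    value, mults = factorial(n - 1)
    return value * n, mults + 1
```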

General Analysis Plan

1. How will you measure input size? What algorithmic parameter is it?
2. What is the algorithm's basic operation? (Is it in the inner-most loop?)
3. Does the number of times the basic operation is executed depend on something other than the input size (e.g., the ordering of the input)? If so, you have to consider worst, best, and average cases separately.
4. Specify a recurrence relation for the number of times the basic operation is executed.
5. Solve the recurrence relation and determine the order of growth of its solutions: back substitution and induction, recurrence trees, or the Master Theorem.

Solving Recurrences: Back Substitution & Induction

Start with a large / difficult input and substitute it into the equation. Continue for a few steps and note the pattern. Use this intuition to posit a guess for the form of the solution. Make use of the asymptotic definition to set up an inequality. Use induction to show that the solution works, by showing that the inequality holds for all values of n.

Example: Binary (v2)

Binary(n)
  if n = 1 return 1
  else return Binary(⌊n/2⌋) + 1

Relation: B(n) = B(⌊n/2⌋) + 1, with B(1) = 1.

Back substitution (take n = 2^k):
B(2^k) = B(2^(k−1)) + 1
       = [B(2^(k−2)) + 1] + 1 = B(2^(k−2)) + 2
       = ...
       = B(2^(k−k)) + k = k + 1

Guess: B(n) ∈ O(lg n).

From the definition of O, we want to prove that B(n) ≤ c lg n:
B(n) = B(⌊n/2⌋) + 1 ≤ c lg(n/2) + 1 = c lg n − c lg 2 + 1 = c lg n − c + 1 ≤ c lg n (for any c ≥ 1).
We also have to show that the base case holds.
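The back-substitution pattern B(2^k) = k + 1 can be confirmed by evaluating the recurrence itself; a sketch (function name illustrative):

```python
def B(n):
    # Recurrence from the slide: B(1) = 1, B(n) = B(floor(n/2)) + 1.
    return 1 if n == 1 else B(n // 2) + 1

# For powers of two, back substitution predicts B(2^k) = k + 1:
checks = [B(2 ** k) == k + 1 for k in range(11)]
```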

General Analysis Plan

1. Choose an appropriate experimental design.
2. Decide on the efficiency metric and the measurement unit (count vs. time).
3. How will the input space be sampled? (range, size, etc.)
4. Implement the algorithm.
5. Generate a sample of inputs.
6. Run the algorithm on the sample inputs, record results.
7. Analyze the results.

What Are We Measuring?

Counter: identify & count basic operations as the program runs. Requires additions to the code. Sometimes you can use a profiler to help identify the basic operation.

Time: measure the length of time it takes for the algorithm to complete the task. May be done in the implementation or (sometimes) using OS commands. Times can be inaccurate: small inputs may clock in below the threshold of timer precision, and on a multiprocessing OS you will need to consider how much CPU time the process actually got, etc. Times on different machines may differ.
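Both styles of measurement can live in one harness; a hedged sketch using an instrumented BubbleSort (one of the project's algorithms) that reports an operation count alongside a machine-dependent wall-clock time (names illustrative):

```python
import time

def bubble_sort(a):
    # Instrumented bubble sort: returns (sorted copy, comparison count).
    a = list(a)
    comparisons = 0
    for i in range(len(a) - 1):
        for j in range(len(a) - 1 - i):
            comparisons += 1           # the basic operation: a key comparison
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, comparisons

data = list(range(200, 0, -1))          # worst case: reverse-sorted input
start = time.perf_counter()
result, count = bubble_sort(data)       # count is exactly n(n-1)/2 here
elapsed = time.perf_counter() - start   # wall-clock time, machine-dependent
```

The count is reproducible across machines; the elapsed time is not, which is why the slides recommend reporting aggregated times over many runs.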

Obtaining Statistics

Sample the input space: counts & times can vary on different inputs of the same size. Times can vary on the same input; for some kinds of algorithms, counts can too (e.g., RandomizedQuickSort). Choose an input range that makes sense for your purposes.

Report the results: aggregate (average, median, etc.) results across runs for each input size (and possibly for the same input). Tabulate or graph the data, typically as size versus time (p. 89). Fit a function to the data.

Obtaining Statistics (2)

Be careful what you conclude. Constant terms and factors can be misleading when input sizes are relatively small. When possible, use a regression method to be as statistically certain as possible. Remember: empiricism is about belief (don't conclude too much).

[Plots: fitted curves 5.7n + 5 and n log_2 n against input size.]

Assignments

This week's assignments:
Section 2.1: Problems 2, 6, and 9
Section 2.2: Problems 2, 5, and 6
Section 2.3: Problems 2 and 4
Section 2.4: Problems 1, 8, 9, and 10
Section 2.5: Problem 6
Section 2.6: Problems 1, 6, and 8
Challenge problem

Project I: Analysis of Simple Sorting

Write a paper comparing and contrasting BubbleSort, MergeSort, and ShellSort. Discuss the advantages and disadvantages of each in the context of computational running-time efficiency with respect to input size. For BubbleSort, include a proof of correctness for the algorithm.

In addition, implement the three algorithms and conduct an empirical analysis. Describe how you chose to measure input size and running times, including your motivation for those choices. Be as clear and detailed as possible about all of these choices. For example, if you choose to count operations, explain which operation you will be counting and why. Additionally, the report should contain graphs and/or tables of input size versus running time and a discussion of how the data matches the formal running time.

Submit the project electronically by sending me an email message with an attached gzipped tar file. The tarball should include:
1. The report in text, PS, PDF, HTML, or RTF format (do NOT send me a Word document);
2. Any non-embedded graphs/graphics in non-proprietary formats such as PS, JPEG, PNG, etc. (do NOT send me an Excel spreadsheet);
3. Source code, but not binaries;
4. Notes for compiling the sources, if necessary;
5. Any relevant data files.

The project will be due by midnight, March 8.