Analysis of Algorithms I: Asymptotic Notation, Induction, and MergeSort


Analysis of Algorithms I: Asymptotic Notation, Induction, and MergeSort
Xi Chen, Columbia University

We continue with two more asymptotic notations: o(·) and ω(·). Let f(n) and g(n) be functions that map n = 1, 2, ... to real numbers. Then we let

o(g(n)) = { f(n) : for any constant c > 0, there exists an n₀ > 0 such that 0 ≤ f(n) < c·g(n) for all n ≥ n₀ }.

Usually we write f(n) = o(g(n)) to denote f(n) ∈ o(g(n)). It means that asymptotically f(n) is dominated by g(n), or that the order of f(n) is strictly less than that of g(n).

Note the crucial difference between O(g(n)) and o(g(n)): "there exists a constant c > 0" versus "for any constant c > 0". Usually f(n) = o(g(n)) can be proved by showing that

lim_{n→∞} f(n)/g(n) = 0,

when the limit exists.

For example,

lim_{n→∞} n^1.9 / n² = 0.

By the definition of limits, this implies that for any constant c > 0, there exists an n₀ > 0 such that n^1.9 / n² < c for all n ≥ n₀. It follows from the definition of o(n²) that n^1.9 = o(n²) and, in general, that n^a = o(n^b) for all constants a, b with 0 < a < b.

Similarly, we have, for any positive constants a > 1 and b, c > 0:

lim_{n→∞} n^b / a^n = 0, and thus n^b = o(a^n);    (1)
lim_{n→∞} lg^c n / n^b = 0, and thus lg^c n = o(n^b).    (2)

Here the limit in (2) follows from the one in (1), while (1) can be proved using l'Hôpital's rule.

As a result, we have

lg n < lg² n < ··· < n < n lg n < n² < n³ < ··· < 2ⁿ < 3ⁿ < n!

where "<" means "is little-o of", i.e., "is asymptotically dominated by". Also read Section 3.2 if you are not familiar with common functions such as floors, ceilings, polynomials, exponentials, and logarithms.
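The ordering in this chain can be spot-checked numerically: for each adjacent pair f, g, the ratio f(n)/g(n) should shrink toward 0 as n grows. This is only an experiment, not a proof; the function names and sample points below are arbitrary choices for illustration.

```python
import math

# Illustrative check (not a proof): for adjacent pairs (f, g) in the chain
# lg n < lg^2 n < n < n lg n < n^2 < n^3 < 2^n, the ratio f(n)/g(n)
# shrinks toward 0 as n grows.
chain = [
    ("lg n",   lambda n: math.log2(n)),
    ("lg^2 n", lambda n: math.log2(n) ** 2),
    ("n",      lambda n: float(n)),
    ("n lg n", lambda n: n * math.log2(n)),
    ("n^2",    lambda n: float(n ** 2)),
    ("n^3",    lambda n: float(n ** 3)),
    ("2^n",    lambda n: 2.0 ** n),
]

for (name_f, f), (name_g, g) in zip(chain, chain[1:]):
    ratios = [f(n) / g(n) for n in (2 ** 4, 2 ** 6, 2 ** 8)]
    assert ratios[0] > ratios[1] > ratios[2] > 0  # ratio keeps shrinking
    print(f"{name_f} / {name_g}: {ratios}")
```

Of course, a decreasing ratio at three sample points does not establish the limit; the actual proofs use the limit definitions and l'Hôpital's rule as above.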

Finally, let f(n) and g(n) be functions that map n = 1, 2, ... to real numbers. Then

ω(g(n)) = { f(n) : for any constant c > 0, there exists an n₀ > 0 such that 0 ≤ c·g(n) < f(n) for all n ≥ n₀ }.

Usually we write f(n) = ω(g(n)) to denote f(n) ∈ ω(g(n)). Check that f(n) = ω(g(n)) if and only if g(n) = o(f(n)). It means that f(n) dominates g(n) asymptotically.

In the last class, we described InsertionSort and showed that its worst-case running time is Θ(n²). However, we did not prove its correctness. See Figure 2.2 for the intuition behind why it is correct. To give a formal proof, we use (mathematical) induction.

Induction is usually used to prove that a mathematical statement, involving a parameter n, holds for all n = 1, 2, ... It has three steps:

1. Basis: Check that the statement holds for n = 1.
2. Induction Step: Prove that for any k ≥ 2, if the statement holds for 1, 2, ..., k − 1, then it also holds for k.
3. Conclude that, by induction, the statement holds for 1, 2, ...

Here is how to get the conclusion from the Basis and the Induction Step. Let n ≥ 1 be any positive integer. From the Basis, the statement holds for 1. Next, by applying the Induction Step with k = 2, we know that the statement holds for 1 and 2. Keep applying the Induction Step, n − 1 times in total, and we know that the statement holds for 1, 2, ..., n, which finishes the proof.

In the Induction Step, we assume that the statement holds for 1, 2, ..., k − 1. This assumption is usually referred to as the Inductive Hypothesis. The goal of the Induction Step is then to use the Inductive Hypothesis to prove the statement for k. Here is an example:

Theorem. For any n ≥ 1, we have 1² + 2² + ··· + n² = n(n + 1)(2n + 1)/6.

Proof.
1. Basis: It is easy to check that the equation holds for n = 1.
2. Induction Step: Let k ≥ 2 be any integer, and assume the equation holds for 1, 2, ..., k − 1 (the Inductive Hypothesis). In particular, this implies that it holds for k − 1. Thus,
   1² + ··· + k² = (1² + ··· + (k − 1)²) + k² = (k − 1)k(2k − 1)/6 + k² = k(k + 1)(2k + 1)/6.
3. Conclude that the equation holds for any n = 1, 2, ...
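The closed form can also be spot-checked mechanically. The short Python script below (illustrative only; the induction above is what proves the identity for all n) compares both sides for the first hundred values of n:

```python
# Sanity check of the closed form 1^2 + 2^2 + ... + n^2 = n(n+1)(2n+1)/6.
# Induction proves it for all n; this only spot-checks small cases.
def sum_of_squares(n):
    return sum(i * i for i in range(1, n + 1))

def closed_form(n):
    return n * (n + 1) * (2 * n + 1) // 6

for n in range(1, 101):
    assert sum_of_squares(n) == closed_form(n)

print(sum_of_squares(10))  # → 385
```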

Now back to InsertionSort; we prove the following theorem:

Theorem. Let n ≥ 1 be a positive integer. If A = (a₁, ..., aₙ) is the input sequence of InsertionSort, then after the ith loop, where i = 1, 2, ..., n, the list B has length i and is a nondecreasing permutation of the first i integers of A.

The intuition for this statement comes from the example of Figure 2.2 in the textbook. The correctness of InsertionSort clearly follows from this theorem.

Proof.
1. Basis: The statement holds for i = 1 because in the first loop we simply insert the first integer a₁ into the empty list B.
2. Induction Step: Assume the statement holds for all i with 1 ≤ i ≤ k − 1, for some k with 2 ≤ k ≤ n. In particular, after round k − 1, B has length k − 1 and is a nondecreasing permutation of a₁, ..., a_{k−1}. Now in loop k, InsertionSort finds the first integer in B smaller than a_k and inserts a_k after that integer. As a result, the list B after round k has length (k − 1) + 1 = k and is a nondecreasing permutation of a₁, ..., a_{k−1}, a_k.
3. Conclude that the statement holds for 1, 2, ..., n.

Note that the Induction Step in the last proof only works for k with 2 ≤ k ≤ n, because the algorithm only has n rounds. As a result, we conclude that the statement holds for 1, 2, ..., n (exactly what we need), instead of for all 1, 2, ...
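As an illustration of the invariant maintained in the proof, here is a minimal Python sketch of this list-building variant of InsertionSort. It builds a separate output list B rather than sorting in place as the CLRS version does; the scan direction (from the back of B) is one way to realize "insert a_k after the first smaller integer":

```python
def insertion_sort(A):
    """Sketch of the list-building InsertionSort from the proof: in loop i,
    a_i is inserted into the sorted list B, so that after loop i, B is a
    nondecreasing permutation of the first i elements of A."""
    B = []
    for a in A:
        # Scan B from the back, past every element larger than a,
        # so that a is inserted right after the last element <= a.
        j = len(B)
        while j > 0 and B[j - 1] > a:
            j -= 1
        B.insert(j, a)
        # Invariant: B is sorted and contains exactly the elements seen so far.
    return B

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # → [1, 2, 3, 4, 5, 6]
```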

Next we describe MergeSort, a sorting algorithm that is asymptotically faster than InsertionSort. It is an example of the divide-and-conquer technique:

1. Divide the problem into smaller subproblems.
2. Conquer (or solve) each of the subproblems recursively.
3. Combine the solutions to the subproblems to get a solution to the original problem.

MergeSort(A), where A = (a₁, ..., aₙ) is a sequence of n integers:

1. If n = 1, return.
2. MergeSort(a₁, a₂, ..., a_⌈n/2⌉)
3. MergeSort(a_⌈n/2⌉+1, ..., a_{n−1}, aₙ)
4. Merge the two sequences obtained to produce a sorted permutation of the n integers in A.

An implementation of line 4 can be found on page 31 of the textbook. It takes Θ(n) time (in other words, cn steps for some positive constant c). One can use induction (pages 32 and 33) to show that, if the two sequences we obtain from the two recursive calls are sorted permutations of a₁, ..., a_⌈n/2⌉ and a_⌈n/2⌉+1, ..., aₙ, respectively, then the Merge subroutine outputs a sorted permutation of a₁, ..., aₙ.
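The four lines of pseudocode, together with a Merge subroutine, can be sketched in Python as follows. This is an illustrative sketch rather than the textbook's array-based implementation (it copies slices instead of using index ranges and sentinels):

```python
def merge_sort(A):
    """Sketch of MergeSort: split into halves, sort each recursively,
    then merge the two sorted halves in Theta(n) time."""
    n = len(A)
    if n <= 1:                       # line 1: length-0/1 input is already sorted
        return list(A)
    mid = (n + 1) // 2               # first half has ceil(n/2) elements
    left = merge_sort(A[:mid])       # line 2: sort a_1 .. a_ceil(n/2)
    right = merge_sort(A[mid:])      # line 3: sort a_{ceil(n/2)+1} .. a_n
    return merge(left, right)        # line 4: combine

def merge(left, right):
    """Merge two sorted lists into one sorted list (Theta(n) work)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])             # one side is exhausted;
    out.extend(right[j:])            # append the leftovers of the other
    return out

print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))  # → [1, 2, 2, 3, 4, 5, 6, 7]
```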

Practice the use of induction to show the following:

Theorem. MergeSort outputs correctly for sequences of length n = 1, 2, ...

To understand the performance of MergeSort, we use T(n) to denote the number of steps it takes on sequences of length n. We get the following recurrence:

T(1) = Θ(1)    (we usually let Θ(1) denote a positive constant)
T(n) = Θ(1) + T(⌈n/2⌉) + T(⌊n/2⌋) + Θ(n)    for all n ≥ 2

For now, we focus the analysis on powers of 2: n = 2^k for some integer k ≥ 0. We will see later that this suffices to understand the asymptotic complexity of T(n). For powers of 2, we have

T(1) = Θ(1)
T(n) = 2T(n/2) + Θ(n)

Putting the constants back gives us

T(1) = c₁    (3)
T(n) = 2T(n/2) + c₂n    (4)

for some positive constants c₁ and c₂.

Let n = 2^k. Start with T(n) and substitute using (4):

T(n) = 2T(n/2) + c₂n.

Next, expand further by substituting T(n/2) = 2T(n/4) + c₂n/2, again using (4). We get

T(n) = 2(2T(n/4) + c₂n/2) + c₂n = 4T(n/4) + c₂n + c₂n.

Repeating this for k rounds gives the so-called recursion tree in Figure 2.5 on page 38. T(n) is the sum of all numbers on the nodes:

T(n) = c₂nk + c₁n = c₂n lg n + c₁n.
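The recursion-tree result can be verified by computing the recurrence directly: with any concrete constants plugged in for c₁ and c₂ (the values below are arbitrary choices for the experiment), T(n) from (3) and (4) should match c₂·n·lg n + c₁·n exactly on powers of 2.

```python
# Check that the recurrence T(1) = c1, T(n) = 2 T(n/2) + c2*n agrees with
# the closed form T(n) = c2 * n * lg(n) + c1 * n when n is a power of 2.
# The constants c1 = 3, c2 = 5 are arbitrary illustrative choices.
c1, c2 = 3, 5

def T(n):
    if n == 1:
        return c1
    return 2 * T(n // 2) + c2 * n

for k in range(11):
    n = 2 ** k
    assert T(n) == c2 * n * k + c1 * n  # k = lg n

print(T(1024))  # → 54272, i.e. 5*1024*10 + 3*1024
```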

To see why this gives us T(n) = Θ(n lg n), we use the fact that T(n) is monotonically nondecreasing over n = 1, 2, ... (try to prove it using induction). Given any integer n > 1, let n′ denote the largest power of 2 that is smaller than n. This implies that n′ < n ≤ 2n′, and by the monotonicity of T, we have

T(n′) ≤ T(n) ≤ T(2n′).

Because both n′ and 2n′ are powers of 2, from the last slide,

T(n) ≤ T(2n′) = 2c₂n′ lg(2n′) + 2c₁n′ < 2c₂n lg(2n) + 2c₁n,
T(n) ≥ T(n′) = c₂n′ lg n′ + c₁n′ ≥ c₂(n/2) lg(n/2) + c₁(n/2).

From these two inequalities, it is easy to show that T(n) = Θ(n lg n).
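The two facts this argument relies on, monotonicity of T and the sandwich between neighboring powers of 2, can themselves be checked numerically. The experiment below (not a proof; c₁ = c₂ = 1 is an arbitrary choice) evaluates the general recurrence with floors and ceilings:

```python
from functools import lru_cache

c1, c2 = 1, 1  # arbitrary positive constants for the experiment

@lru_cache(maxsize=None)
def T(n):
    # General-n recurrence: T(n) = T(ceil(n/2)) + T(floor(n/2)) + c2*n.
    if n == 1:
        return c1
    return T((n + 1) // 2) + T(n // 2) + c2 * n

# T is monotonically nondecreasing over n = 1, 2, ...
vals = [T(n) for n in range(1, 513)]
assert all(a <= b for a, b in zip(vals, vals[1:]))

# Sandwich: with n_prime the largest power of 2 smaller than n,
# we have n_prime < n <= 2*n_prime and T(n_prime) <= T(n) <= T(2*n_prime).
for n in range(2, 513):
    n_prime = 1
    while n_prime * 2 < n:
        n_prime *= 2
    assert T(n_prime) <= T(n) <= T(2 * n_prime)

print(T(512))  # → 5120, i.e. n*lg(n) + n for n = 512 with c1 = c2 = 1
```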

From the analysis, we see that c₁ and c₂ do not affect the asymptotic complexity of T(n). This is why we can suppress them and denote the running times of lines 1 and 4 by Θ(1) and Θ(n), respectively, without caring about their exact values in the analysis.

However, the coefficient 2 of T(n/2) cannot be suppressed; changing it would change the order of T(n). In particular, if we changed it to 3, the recursion tree would look different: every internal node would have 3 children instead of 2, which would change the order of T(n) significantly. We will use Strassen's algorithm for matrix multiplication as an example to demonstrate the importance of this coefficient in the next class.