data structures and algorithms, 2018-09-06, lecture 2
recall: insertion sort

Algorithm insertionsort(A, n):
    for j := 2 to n do
        key := A[j]
        i := j - 1
        while i >= 1 and A[i] > key do
            A[i + 1] := A[i]
            i := i - 1
        A[i + 1] := key

- worst-case time complexity is in Θ(n^2)
- and the best-case time complexity?
- what about space complexity?
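As a concrete sketch, the pseudocode above can be written in Python as follows (0-indexed, unlike the 1-indexed pseudocode; the function name is mine):

```python
def insertion_sort(a):
    """Sort list a in place: Theta(n^2) worst case, Theta(n) best case (sorted input)."""
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # shift elements larger than key one slot to the right
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a
```

Note that the sort is in-place (constant extra space) and stable, two of the properties discussed below.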
time complexity

- we aim to express the time complexity as a function T that takes as input the size n of the input of the algorithm (for example the length of the input array) and computes the number of elementary steps the algorithm performs
- usually we are interested in the function for the worst-case complexity
- usually we are interested in the rate of growth of that function
insertion sort: worst-case time complexity

- running time depends on the input: best case if the input is sorted, worst case if the input is inversely sorted
- running time depends on the size of the input: we want to express the running time as a function T of the size of the input
- which running time? the best case is useless; the average case is interesting; the worst case gives a guarantee
- the worst-case time complexity is quadratic in the size of the input: T(n) is in Θ(n^2)
properties of sorting algorithms

a sorting algorithm may or may not be:
- comparison-based: based on comparisons of pairs of elements
- in-place: uses the space for the input sequence plus a constant amount of extra space
- stable: keeps the order of equal elements (only interesting if they are keys of some bigger item)
- efficient: what is a good efficiency for sorting? what is the best we can do?
sorting

- there are many different sorting algorithms
- see for example: The Art of Computer Programming by Donald Knuth
overview

- asymptotic notation
- merge sort
rate of growth of some important functions

(figure: plots of f(x) = e^x, f(x) = x^2, f(x) = x log x, f(x) = x, and f(x) = log x against x)
Θ: first intuition

- notation for the rate of growth of a function when the input goes to infinity
- first approximation: drop lower-order terms and drop leading constants
- example: n^2 + 50n + 5000 is in Θ(n^2)
- example: 2^n + 5000n^2 is in Θ(2^n)
- example: n log n + 5000n is in Θ(n log n)
more asymptotic bounds: first intuition

what is the behaviour of the function for large inputs?
- Θ: asymptotically tight bound (=)
- O: asymptotic upper bound (≤)
- Ω: asymptotic lower bound (≥)
- o: asymptotic upper bound, not tight (<)
- ω: asymptotic lower bound, not tight (>)
definition: asymptotic tight bound Θ

for f, g : N → R+:
f(n) is in Θ(g(n)) if there exist c1, c2 > 0 in R and n0 > 0 in N such that for all n in N:
    n ≥ n0 implies c1 · g(n) ≤ f(n) ≤ c2 · g(n)

f(n) is eventually as large as g(n), up to constants
asymptotic upper bound: O

for f, g : N → R+:
f(n) is in O(g(n)) if there exist c > 0 in R and n0 > 0 in N such that for all n in N:
    n ≥ n0 implies f(n) ≤ c · g(n)

f(n) is eventually smaller than or equal to g(n), up to a constant
Θ in terms of big-O

alternative definition of Θ using O:
f(n) is in Θ(g(n)) if f(n) is in O(g(n)) and g(n) is in O(f(n))
O: examples in math approach

- 5n^2 + 3n log n + 2n + 5 is in O(n^2): for n ≥ 2 we have 5n^2 + 3n log n + 2n + 5 ≤ (5 + 3 + 2 + 5)n^2, so take c = 15
- 3n^2 + 5n + 8 is in O(n^2)
- 3n^2 + 5n + 8 is in Θ(n^2)
- 3n^2 + 5n + 8 is in O(n^3)
- 3n^2 + 5n + 8 is not in Θ(n^3)
- 1000 is in Θ(1)
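A quick numeric sanity check of the first example, using the slide's witnesses c = 15 and n0 = 2 (an illustration over a finite range, not a proof):

```python
import math

def f(n):
    # the function from the first example; log read as log base 2
    return 5 * n**2 + 3 * n * math.log2(n) + 2 * n + 5

c, n0 = 15, 2
# f(n) <= c * n^2 holds for all tested n >= n0
assert all(f(n) <= c * n**2 for n in range(n0, 10_000))
```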
Θ and O

- the worst-case time complexity of insertion sort is in O(n^2)
- the time complexity of insertion sort is in O(n^2)
- the worst-case time complexity of insertion sort is in Θ(n^2)
- the time complexity of insertion sort is not in Θ(n^2) (sometimes better!)
mathematics or algorithm analysis

- from a mathematical point of view, n^2 in O(n^3) is correct
- from an algorithm analysis point of view, we write n^2 in O(n^2)
- we almost always take the algorithm analysis point of view!
big-O: some properties

- n^a is in O(n^b) for all 0 < a ≤ b; for example (1/2)n^2 is in O(n^3)
- log_a(n) is in O(n^b) for all a > 1 and b > 0; for example log n is in O(√n)
- n^a is in O(b^n) for all a > 0 and b > 1; for example n^5 is in O(2^n)
- log_a n is in O(log_b n) for all a, b > 1; for example log_2 n is in O(log_3 n)
- the latter holds because a^(log_a b · log_b n) = (a^(log_a b))^(log_b n) = b^(log_b n) = n, hence log_a b · log_b n = log_a n
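The change-of-base identity in the last bullet can be checked numerically; it shows that log_2 n and log_3 n differ only by the constant factor log_2 3, which is why each is big-O of the other:

```python
import math

for n in [2, 10, 1000, 10**6]:
    # log_2(3) * log_3(n) == log_2(n)
    lhs = math.log(3, 2) * math.log(n, 3)
    rhs = math.log(n, 2)
    assert math.isclose(lhs, rhs)
```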
some summations we often use

sum_{i=0}^{n} a^i = (1 - a^(n+1)) / (1 - a)    (for a ≠ 1)

sum_{i=1}^{n} i = n(n + 1) / 2

see also book section 3.2
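A small numeric check of both closed forms (illustration only, over a handful of values):

```python
def geom(a, n):
    # sum_{i=0}^{n} a^i, computed term by term
    return sum(a**i for i in range(n + 1))

def gauss(n):
    # sum_{i=1}^{n} i, computed term by term
    return sum(range(1, n + 1))

for a in [2, 3, 0.5]:
    for n in [0, 1, 5, 10]:
        assert abs(geom(a, n) - (1 - a**(n + 1)) / (1 - a)) < 1e-9

assert all(gauss(n) == n * (n + 1) // 2 for n in range(100))
```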
quiz

Algorithm Loop1(n):
    s := 0
    for i := 1 to n do
        s := s + i

T(n) is in Θ(?)
quiz

Algorithm Loop2(n):
    p := 1
    for i := 1 to 2n do
        p := p · i

T(n) is in Θ(?)
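One way to check your intuition for both quizzes: instrument the loops with a step counter and see how the count grows with n (a sketch; the counter names are mine):

```python
def loop1_steps(n):
    steps, s = 0, 0
    for i in range(1, n + 1):
        s += i
        steps += 1       # one elementary loop body per iteration
    return steps

def loop2_steps(n):
    steps, p = 0, 1
    for i in range(1, 2 * n + 1):
        p *= i
        steps += 1
    return steps

# loop1 does n iterations, loop2 does 2n: both counts grow
# linearly in n, so both algorithms are in Theta(n)
```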
overview

- asymptotic notation
- merge sort
merge sort: idea

how do we sort?
- stop condition: if the sequence contains fewer than 2 elements, it is already sorted
- otherwise:
  - split the sequence into two parts of (almost) equal length
  - sort the two subsequences using merge sort (recursion)
  - merge the two sorted subsequences

how do we merge?
- loop: as long as both sequences are non-empty, compare the first elements of both sequences, remove the smallest and move it to the end of the output sequence
- last: if one of the sequences is empty, append the other to the tail of the output sequence
applying merge sort: see also exercise class 1

(figure: recursion tree for the input 7 1 2 9 6 5 3 8 — the sequence is split into halves down to single elements, which are then merged pairwise in sorted order back up to the result 1 2 3 5 6 7 8 9; each node represents a recursive call)
mergesort: some pseudo-code

Algorithm mergesort(A, p, r):
    if p < r then
        q := ⌊(p + r)/2⌋
        mergesort(A, p, q)
        mergesort(A, q + 1, r)
        Merge(A, p, q, r)

- where is the work done?
- for the pseudo-code of Merge, see the book
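A runnable Python sketch of both steps (using 0-indexed slices instead of the book's index arithmetic on A, p, q, r, so the shape differs slightly from the pseudocode above):

```python
def merge(left, right):
    """Merge two sorted lists in Theta(len(left) + len(right)) time."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        # take the smaller head element; <= keeps the sort stable
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    # one of the two lists is exhausted now; append the rest of the other
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def mergesort(a):
    if len(a) <= 1:      # stop condition: fewer than 2 elements
        return a
    q = len(a) // 2      # split into (almost) equal halves
    return merge(mergesort(a[:q]), mergesort(a[q:]))
```

The work is done in merge: the recursive calls only split, and all comparisons happen while merging.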
merge: worst-case time complexity

- example: merge 3 5 7 8 and 1 2 4 5
- each comparison takes elementary (constant) time
- there are at most as many comparisons as there are elements in total
- so the time complexity of merging n elements is in Θ(n)
mergesort: worst-case time complexity via tree analysis

analyze the recursion tree:
- height is log n, for n the length of the input array
- number of levels is log n + 1
- number of leaves is n
- total: c·n·log n (from the upper levels) + d·n (from the leaves)
- so T(n) is of the shape a·n·log n + b·n
- so the time complexity of merge sort is in Θ(n log n)
mergesort: picture of the recursion tree

depth   nodes   size
0       1       n
1       2       n/2
i       2^i     n/2^i
...     ...     ...
mergesort: time complexity via recurrence equations

- divide is easy, in Θ(1)
- conquer uses recursion: two subproblems of half the size
- combine is difficult, in Θ(n)

first assume n = 2^i:

T(n) = c                  if n = 1
T(n) = 2·T(n/2) + c·n     if n > 1
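Unfolding this recurrence for n = 2^i gives the closed form T(n) = c·n·(log_2 n + 1), matching the tree analysis above. A small numeric check (with c = 1 as an arbitrary constant):

```python
import math

def T(n, c=1):
    # the recurrence from the slide, for n a power of two
    if n == 1:
        return c
    return 2 * T(n // 2, c) + c * n

for i in range(11):
    n = 2**i
    assert T(n) == n * (math.log2(n) + 1)
```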
mergesort: drawback

- space complexity: merging two sublists takes place outside the memory for the input sequence
- so: worst-case time complexity of insertion sort is in Θ(n^2), worst-case time complexity of merge sort is in Θ(n log n)
- but insertion sort is in-place and merge sort is not in-place
- obvious question: is there an n log n in-place sorting algorithm?
insertion sort versus merge sort

- insertion sort is in Θ(n^2), say with T_I(n) = 2·n^2
- merge sort is in Θ(n log n), say with T_M(n) = 50·n·log n
- assume our computer performs 10^9 operations per second
- assume our input sequence has length n = 10^7
- insertion sort takes more or less 2·10^5 seconds (55 hours)
- merge sort takes about 12 seconds
- see the exercise on exercise sheet 1
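The arithmetic behind these two estimates, spelled out (log read as log base 2; the variable names are mine):

```python
import math

ops_per_second = 1e9
n = 1e7

# T_I(n) = 2 n^2 and T_M(n) = 50 n log n, converted to seconds
insertion_seconds = 2 * n**2 / ops_per_second
merge_seconds = 50 * n * math.log2(n) / ops_per_second

# insertion sort: 2e5 seconds, i.e. about 55.6 hours
# merge sort: roughly 11.6 seconds
```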
mergesort: inventor

mergesort was invented by John von Neumann (1903-1957) in 1945
summary

- book: chapters 2, 3, 4.3, 4.4
- we continue with chapter 5 in lecture 3
- exercise class 2 tomorrow at 11.00, with groups as in Canvas:
  group 1 in WN-P647
  group 2 in HG-08A2
  group 3 in WN-C659
  group 4 in HG-11A24