Babes-Bolyai University — arthur@cs.ubbcluj.ro
Overview
Feedback for the course
You can write feedback at academicinfo.ubbcluj.ro
It is both important and anonymous.
Write both what you like (so we keep and improve it) and what you don't.
Best if you write about all activities (lecture, seminar and laboratory).
Data are available in the internal memory, as a sequence of records (k_1, k_2, ..., k_n).
Search for a record having a certain value for one of its fields, called the search key. If the search is successful, we have the position of the record in the given sequence.
We approach the search's two cases separately:
- with unordered keys
- with ordered keys
- unordered keys
Problem specification
Data: a, n, (k_i, i = 0, ..., n-1), where n ∈ N, n ≥ 0.
Results: p, where 0 ≤ p ≤ n-1 and a = k_p, or p = -1 if the key is not found.
- unordered keys

def searchSeq(el, l):
    """Search for an element in list
    el - element
    l - list of elements
    Return the position of the element, -1 if not found"""
    poz = -1
    for i in range(0, len(l)):
        if el == l[i]:
            poz = i
    return poz

Computational complexity is T(n) = Σ(i=0..n-1) 1 = n ∈ Θ(n).
- unordered keys

def searchSeq(el, l):
    """Search for an element in list
    el - element
    l - list of elements
    Return the position of the element, -1 if not found"""
    i = 0
    while i < len(l) and el != l[i]:
        i += 1
    if i < len(l):
        return i
    return -1

What is the difference between this and the previous version?
- unordered keys
Best case: the element is at the first position, T(n) ∈ Θ(1).
Worst case: the element is at position n-1 (or absent), T(n) ∈ Θ(n).
Average case: if the element's position is distributed uniformly, the loop can execute 0, 1, ..., n-1 times, so T(n) = (1 + 2 + ... + n)/n ∈ Θ(n).
Overall complexity is O(n).
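The two versions above can be contrasted directly: the for-loop version scans the whole list and ends up reporting the last occurrence, while the while-loop version stops early and reports the first one. A small sketch (the function names are mine, chosen to highlight the difference):

```python
def search_last(el, l):
    # for-loop version: keeps overwriting poz, so the LAST match wins
    poz = -1
    for i in range(len(l)):
        if el == l[i]:
            poz = i
    return poz

def search_first(el, l):
    # while-loop version: stops at the FIRST match
    i = 0
    while i < len(l) and el != l[i]:
        i += 1
    return i if i < len(l) else -1

print(search_last(2, [2, 1, 2]))   # position of the last 2
print(search_first(2, [2, 1, 2]))  # position of the first 2
```

Besides the returned position, the early exit also changes the best case: the while-loop version can finish after a single comparison.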
- ordered keys
Problem specification
Data: a, n, (k_i, i = 0, ..., n-1), where n ∈ N, n ≥ 0, and k_0 < k_1 < ... < k_{n-1}.
Results: p, where (p = 0 and a ≤ k_0) or (p = n and a > k_{n-1}) or (0 < p ≤ n-1 and k_{p-1} < a ≤ k_p).
- ordered keys

def searchSeq(el, l):
    """Search for an element in list
    el - element
    l - list of ordered elements
    Return the position of the first occurrence,
    or the position where the element can be inserted"""
    if len(l) == 0:
        return 0
    poz = -1
    for i in range(0, len(l)):
        if el <= l[i]:
            # stop at the first position where el <= l[i]
            poz = i
            break
    if poz == -1:
        return len(l)
    return poz

Computational complexity is T(n) = Σ(i=0..n-1) 1 = n ∈ Θ(n).
- ordered keys

def searchSuccessor(el, l):
    """Search for an element in list
    el - element
    l - list of ordered elements
    Return the position of the first occurrence,
    or the position where the element can be inserted"""
    if len(l) == 0 or el <= l[0]:
        return 0
    if el > l[-1]:
        return len(l)
    i = 0
    while i < len(l) and el > l[i]:
        i += 1
    return i
- ordered keys
Best case: the element is at the first position, T(n) ∈ Θ(1).
Worst case: the element is at position n-1 (or larger than all keys), T(n) ∈ Θ(n).
Average case: if the element's position is distributed uniformly, the loop can execute 0, 1, ..., n-1 times, so T(n) = (1 + 2 + ... + n)/n ∈ Θ(n).
Overall complexity is O(n).
Sequential search: keys are successively examined; keys may not be ordered.
Binary search: uses the divide and conquer technique; keys must be ordered.
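Python's standard library already provides a divide and conquer search over ordered keys, in the bisect module; a quick sketch:

```python
import bisect

data = [10, 20, 30, 40, 50]  # ordered keys

# leftmost position where the key could be inserted keeping the order
pos_found = bisect.bisect_left(data, 30)    # key present: its position
pos_missing = bisect.bisect_left(data, 35)  # key absent: insertion point
print(pos_found, pos_missing)
```

bisect_left returns exactly the successor position specified above: the first index p with key ≤ data[p], or len(data) if the key is larger than every element.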
Recursive binary-search algorithm

def binarySearch(key, data, left, right):
    """Search for an element in an ordered list
    key - element to search
    left, right - bounds of the search
    Return insertion position of key that keeps list ordered"""
    if left >= right - 1:
        return right
    middle = (left + right) // 2
    if key < data[middle]:
        return binarySearch(key, data, left, middle)
    else:
        return binarySearch(key, data, middle, right)

# data is an ordered list defined elsewhere
print(binarySearch(2000, data, 0, len(data)))
Recursive binary-search function

def search(key, data):
    """Search for an element in an ordered list
    key - element to search
    data - the list
    Return insertion position of key that keeps list ordered"""
    if len(data) == 0 or key < data[0]:
        return 0
    if key > data[-1]:
        return len(data)
    return binarySearch(key, data, 0, len(data))
Binary-search recurrence
The recurrence:
T(n) = 1,            if n = 1
T(n) = T(n/2) + 1,   if n > 1
Unrolling it, each step halves n and adds one unit of work, so T(n) = log2(n) + 1 ∈ Θ(log2 n).
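The depth of the recurrence is simply the number of times n can be halved before reaching 1; a tiny sketch (the helper name is mine) that counts the halvings:

```python
def halving_steps(n):
    """Count how many times n can be halved until it reaches 1 --
    the depth of the recurrence T(n) = T(n/2) + 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps

print(halving_steps(1024))  # 2^10 elements -> 10 halvings
print(halving_steps(1000))
```

For n = 1024 the answer is 10 = log2(1024): doubling the input adds only one more step.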
Iterative binary-search function

def binarySearch(key, data):
    """Search for an element in an ordered list
    key - element to search
    data - the list
    Return insertion position of key that keeps list ordered"""
    if len(data) == 0 or key < data[0]:
        return 0
    if key > data[-1]:
        return len(data)
    left = 0
    right = len(data)
    while right - left > 1:
        middle = (left + right) // 2
        if key <= data[middle]:
            right = middle
        else:
            left = middle
    return right
Search runtime complexity

Algorithm      | Best case | Average     | Worst case  | Overall
Sequential     | Θ(n)      | Θ(n)        | Θ(n)        | Θ(n)
Successor      | Θ(1)      | Θ(n)        | Θ(n)        | O(n)
Binary search  | Θ(1)      | Θ(log2 n)   | Θ(log2 n)   | O(log2 n)
Demo Collections and search Examine the source code in ex34 search.py
Rearrange a data collection in such a way that the elements of the collection satisfy a given order.
Internal sort - the data to be sorted are available in the internal memory.
External sort - the data are available as a file (on external media).
In-place sort - transforms the input into the output using only a small amount of additional space. Its opposite is called out-of-place.
Stability - a sort is stable when the original order of records having the same key is preserved.
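Stability can be illustrated with Python's built-in sorted, which is guaranteed to be stable (the record values below are made up for illustration):

```python
# records: (name, grade); two pairs share each grade
records = [("bob", 1), ("alice", 2), ("dave", 1), ("carol", 2)]

# sort by the numeric key only; sorted() is stable, so records
# with equal keys keep their original relative order
by_key = sorted(records, key=lambda r: r[1])
print(by_key)  # bob before dave, alice before carol
```

An unstable sort would be free to emit ("dave", 1) before ("bob", 1); a stable one never does.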
Demo Stable sort example Examine the source code in ex35 stablesort.py
Elements of the data collection are called records.
A record is formed by one or more components, called fields.
A key K is associated to each record, and is usually one of the fields.
We say that a collection of n records is:
- sorted in increasing order by the key K: if K(i) ≤ K(j) for 0 ≤ i < j < n
- sorted in decreasing order: if K(i) ≥ K(j) for 0 ≤ i < j < n
Internal sorting
Problem specification
Data: n, K, where K = (k_1, k_2, ..., k_n), k_i ∈ R, i = 1..n.
Results: K', where K' is a permutation of K having sorted elements: k'_1 ≤ k'_2 ≤ ... ≤ k'_n.
A few that we will study:
- Selection sort
- Insertion sort
- Bubble sort
- Quick sort
Selection Sort
Determine the element having the minimal key, and swap it with the first element.
Repeat the procedure for the remaining elements, until all elements have been considered.
- algorithm

def selectionSort(data):
    for i in range(len(data)):
        min_index = i
        # Find smallest element in the rest of the list
        for j in range(i + 1, len(data)):
            if data[j] < data[min_index]:
                min_index = j
        data[i], data[min_index] = data[min_index], data[i]
- time complexity
The total number of comparisons is
Σ(i=1..n-1) Σ(j=i+1..n) 1 = n(n-1)/2 ∈ Θ(n^2)
Independent of the input data, what are the best, average and worst-case computational complexities?
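The question can be checked empirically: an instrumented copy of the algorithm (the helper name is mine) performs exactly n(n-1)/2 comparisons regardless of the input order, so best, average and worst case are all Θ(n^2):

```python
def selection_sort_comparisons(data):
    """Selection sort on a copy of data, returning the number of
    key comparisons performed."""
    data = list(data)  # don't mutate the caller's list
    comparisons = 0
    for i in range(len(data)):
        min_index = i
        for j in range(i + 1, len(data)):
            comparisons += 1  # one comparison per inner-loop step
            if data[j] < data[min_index]:
                min_index = j
        data[i], data[min_index] = data[min_index], data[i]
    return comparisons

print(selection_sort_comparisons([1, 2, 3, 4, 5]))  # already sorted
print(selection_sort_comparisons([5, 4, 3, 2, 1]))  # reverse sorted
```

Both calls report 10 = 5*4/2 comparisons: the comparison count depends only on n, never on the arrangement of the input.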
- space complexity
In-place: algorithms that use a small (constant) amount of additional memory.
Out-of-place (or not-in-place): algorithms that use a non-constant amount of extra space.
The additional memory required by selection sort is O(1); selection sort is an in-place sorting algorithm.
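The distinction can be seen with Python's built-ins: sorted returns a new list and leaves its argument untouched, while list.sort rearranges the list it is called on. (Both use Timsort internally, which needs more than constant auxiliary memory, so "in place" for list.sort refers to mutating its input, not to strict Θ(1) space.)

```python
data = [3, 1, 2]
new_list = sorted(data)   # out-of-place: builds a new list
print(data)               # original is unchanged

data_copy = [3, 1, 2]
data_copy.sort()          # mutates the list it is called on
print(data_copy)
```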
Direct selection sort

def directSelectionSort(data):
    for i in range(0, len(data) - 1):
        # Select the smallest element
        for j in range(i + 1, len(data)):
            if data[j] < data[i]:
                data[i], data[j] = data[j], data[i]
Direct selection sort
Overall time complexity:
Σ(i=1..n-1) Σ(j=i+1..n) 1 = n(n-1)/2 ∈ Θ(n^2)
Insertion Sort
Traverse the elements. Insert the current element at the right position in the subsequence of already sorted elements.
The subsequence containing the already processed elements is kept sorted, so that, at the end of the traversal, the whole sequence is sorted.
Insertion Sort - Algorithm

def insertSort(data):
    for i in range(1, len(data)):
        index = i - 1
        elem = data[i]
        # Insert into correct position
        while index >= 0 and elem < data[index]:
            data[index + 1] = data[index]
            index -= 1
        data[index + 1] = elem
Insertion Sort - time complexity
The maximum number of iterations (worst case) happens when the initial array is sorted in descending order:
T(n) = Σ(i=2..n) (i-1) = n(n-1)/2 ∈ Θ(n^2)
Insertion Sort - time complexity
The minimum number of iterations (best case) happens when the initial array is already sorted:
T(n) = Σ(i=2..n) 1 = n-1 ∈ Θ(n)
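Both bounds can be checked with an instrumented copy of insertion sort that counts element moves (the helper name is mine): an already-sorted input causes no moves at all, while a reverse-sorted one causes n(n-1)/2:

```python
def insertion_sort_moves(data):
    """Insertion sort on a copy of data, returning how many
    element moves (shifts) were performed."""
    data = list(data)  # work on a copy
    moves = 0
    for i in range(1, len(data)):
        index = i - 1
        elem = data[i]
        while index >= 0 and elem < data[index]:
            data[index + 1] = data[index]
            moves += 1  # one shift per displaced element
            index -= 1
        data[index + 1] = elem
    return moves

print(insertion_sort_moves([1, 2, 3, 4, 5]))  # best case
print(insertion_sort_moves([5, 4, 3, 2, 1]))  # worst case
```

For n = 5 the worst case performs 10 = 5*4/2 moves, matching the formula above.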
Insertion Sort - complexity
Time complexity - the overall time complexity of insertion sort is O(n^2).
Space complexity - the space complexity of insertion sort is Θ(1); insertion sort is an in-place sorting algorithm.
Bubble Sort
Compares pairs of consecutive elements, which are swapped if not in the expected order.
The process ends when all pairs of consecutive elements are in the expected order.
- Algorithm

def bubbleSort(data):
    done = False
    while not done:
        done = True
        for i in range(0, len(data) - 1):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                done = False
- Complexity
Best-case running time complexity is Θ(n).
Worst-case running time complexity is Θ(n^2).
Average running time complexity is Θ(n^2).
Space complexity (additional memory required) is Θ(1).
Bubble sort is an in-place sorting algorithm.
Quick Sort
Based on the divide and conquer technique:
1. Divide: partition the array into 2 sub-arrays such that the elements in the lower part ≤ the elements in the higher part.
   Partitioning re-arranges the elements so that the element called the pivot occupies its final position in the sub-sequence. If i is that position:
   k_j ≤ k_i ≤ k_l, for left ≤ j < i < l ≤ right
2. Conquer: recursively sort the 2 sub-arrays.
3. Combine: trivial, since sorting is done in place.
- partitioning algorithm

def partition(data, left, right):
    pivot = data[left]
    i = left
    j = right
    while i != j:
        # Find an element smaller than the pivot
        while data[j] >= pivot and i < j:
            j -= 1
        data[i] = data[j]
        # Find an element larger than the pivot
        while data[i] <= pivot and i < j:
            i += 1
        data[j] = data[i]
    # Place the pivot in position
    data[i] = pivot
    return i
- algorithm

def quickSort(data, left, right):
    # Partition the list
    pos = partition(data, left, right)
    # Order left side
    if left < pos - 1:
        quickSort(data, left, pos - 1)
    # Order right side
    if pos + 1 < right:
        quickSort(data, pos + 1, right)
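For intuition, the divide and conquer structure can also be written as a compact out-of-place variant (a sketch of mine, not the in-place algorithm above; it allocates new lists at every level, so it trades the Θ(1) extra space for readability):

```python
def quick_sort_simple(data):
    """Out-of-place quicksort sketch: returns a new sorted list."""
    if len(data) <= 1:
        return list(data)
    pivot = data[0]                                # divide: choose a pivot
    smaller = [x for x in data[1:] if x <= pivot]  # elements <= pivot
    larger = [x for x in data[1:] if x > pivot]    # elements > pivot
    # conquer both halves, combine by concatenation
    return quick_sort_simple(smaller) + [pivot] + quick_sort_simple(larger)

print(quick_sort_simple([3, 1, 4, 1, 5]))
```

The three phases map directly onto the steps above: the comprehensions are the divide, the recursive calls the conquer, and the concatenation the combine.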
- time complexity
The run time of quick-sort depends on the distribution of splits.
The partitioning function requires linear time.
In the best case, the partitioning function splits the array evenly:
T(n) = 2T(n/2) + Θ(n), so T(n) ∈ Θ(n log2 n)
- best partitioning
We partition n elements log2(n) times, so T(n) ∈ Θ(n log2 n).
- worst partitioning
In the worst case, the partition function splits the array such that one side of the partition has only one element:
T(n) = T(1) + T(n-1) + Θ(n) = T(n-1) + Θ(n) = Σ(k=1..n) Θ(k) ∈ Θ(n^2)
- Worst case
Worst-case partitioning appears when the input array is sorted or reverse sorted: n elements are partitioned n times, so T(n) ∈ Θ(n^2).
Sorting runtime complexity

Algorithm       | Worst case | Average
Selection sort  | Θ(n^2)     | Θ(n^2)
Insertion sort  | Θ(n^2)     | Θ(n^2)
Bubble sort     | Θ(n^2)     | Θ(n^2)
Quick sort      | Θ(n^2)     | Θ(n log2 n)
Demo Examine the source code in ex36 sort.py
Lambda expressions
Small anonymous functions that you define and use in the same place.
Syntactically restricted to a single expression.
Can reference variables from the containing scope (just like nested functions).
They are syntactic sugar for a function definition.
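A small example (the names are mine); note how a lambda serves as a throwaway sort key, which is its most common use:

```python
# a lambda bound to a name, equivalent to: def square(x): return x * x
square = lambda x: x * x
print(square(4))

# a lambda defined and used in the same place, as a sort key
words = ["pear", "fig", "banana"]
by_length = sorted(words, key=lambda w: len(w))
print(by_length)
```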
Demo Examine the source code in ex36 lambdas.py