Chapter 1 Search Structures
1.2 AVL Trees: Motivation
Objective: maintain the binary search tree structure so that access time stays O(log n) while identifiers (i.e., elements) in the tree change dynamically.
[Figure: binary search trees over the month names (JAN ... DEC), one produced by a best-case sequence of dynamic insertions and one by a worst-case sequence.]
1.2 AVL Trees: Definition
Height-balanced: An empty tree is height-balanced. If T is a non-empty binary tree with $T_L$ and $T_R$ as its left and right subtrees, respectively, then T is height-balanced iff (1) $T_L$ and $T_R$ are height-balanced and (2) $|h_L - h_R| \le 1$, where $h_L$ and $h_R$ are the heights of $T_L$ and $T_R$, respectively.
Balance factor: The balance factor BF(k) of a node k in a binary tree is $h_L - h_R$, the height of its left subtree minus the height of its right subtree.
An AVL tree is a binary search tree in which BF(k) = -1, 0, or 1 for every node k.
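The definition can be checked directly on a small tree. The following is a minimal sketch, not taken from the slides: the `Node` class and the function names are illustrative assumptions, and the empty tree is assumed to have height 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(t: Optional[Node]) -> int:
    """Height of a tree; the empty tree is assumed to have height 0."""
    return 0 if t is None else 1 + max(height(t.left), height(t.right))


def balance_factor(t: Node) -> int:
    """BF(k) = h_L - h_R for the node t."""
    return height(t.left) - height(t.right)


def is_height_balanced(t: Optional[Node]) -> bool:
    """True iff every node of t has balance factor -1, 0, or 1."""
    if t is None:
        return True
    return (abs(balance_factor(t)) <= 1
            and is_height_balanced(t.left)
            and is_height_balanced(t.right))


# A right-leaning chain MAR -> MAY -> NOV versus the same keys rooted at MAY.
skewed = Node("MAR", right=Node("MAY", right=Node("NOV")))
balanced = Node("MAY", left=Node("MAR"), right=Node("NOV"))
```

Recomputing heights on every call keeps the sketch short; a practical AVL tree stores the height or balance factor in each node.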
1.2 AVL Trees: Insertion
[Figure: (a) insert MARCH; (b) insert MAY; (c) insert NOVEMBER, which brings the balance factor of MAR to -2 and is repaired by an RR rotation that makes MAY the root; (d) insert AUGUST.]
1.2 AVL Trees: Insertion (2)
[Figure: (e) insert APRIL, repaired by an LL rotation; (f) insert JANUARY, repaired by an LR rotation. Node labels show balance factors before and after each rotation.]
1.2 AVL Trees: Insertion (3)
[Figure: (g) insert DECEMBER; (h) insert JULY; (i) insert FEBRUARY, repaired by an RL rotation.]
1.2 AVL Trees: Insertion (4)
[Figure: (j) insert JUNE, repaired by an LR rotation; (k) insert OCTOBER, repaired by an RR rotation.]
1.2 AVL Trees: Rebalancing rotations
[Figure: LL and RR rotations, each shown as balanced subtree, unbalanced subtree following insertion, and rebalanced subtree. LL: the height of $B_L$ increases to h+1, giving BF(B) = +1 and BF(A) = +2; after the rotation B is the subtree root. RR: the height of $B_R$ increases to h+1, giving BF(B) = -1 and BF(A) = -2; after the rotation B is the subtree root. The subtree has height h+2 before the insertion and again after rebalancing.]
1.2 AVL Trees: Rebalancing rotations (2)
[Figure: LR rotations, each shown as balanced subtree, unbalanced subtree following insertion, and rebalanced subtree. LR(a): the inserted node C is the right child of B. LR(b): the insertion goes into $C_L$, giving BF(C) = +1, BF(B) = -1, BF(A) = +2. In both cases C becomes the subtree root with children B and A.]
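1.2 AVL Trees: Rebalancing rotations (3)
[Figure: LR(c): the insertion goes into $C_R$, giving BF(C) = -1, BF(B) = -1, BF(A) = +2; after the rotation C is the subtree root with children B and A, and BF(B) = +1.]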
1.2 AVL Trees: Performance comparisons
Operation | Sequential list | Linked list | AVL tree
Search for x | O(log n) | O(n) | O(log n)
Search for k-th item | O(1) | O(k) | O(log n)
Delete x | O(n) | O(1) (1) | O(log n)
Delete k-th item | O(n-k) | O(k) | O(log n)
Insert x | O(n) | O(1) (2) | O(log n)
Output in order | O(n) | O(n) | O(n)
(1) Doubly linked list and position of x known.
(2) Position for insertion known.
1.3 2-3 Trees
- node degree is more than 2
- a special case of B-trees
Definition (2-3 tree):
(1) Each internal node is a 2-node or a 3-node.
(2) If e is a 2-node with key K, then every key in LeftChild(e) < K and every key in MiddleChild(e) > K.
(3) If e is a 3-node with keys KeyL < KeyR, then every key in LeftChild(e) < KeyL, KeyL < every key in MiddleChild(e) < KeyR, and every key in RightChild(e) > KeyR.
(4) All external nodes are at the same level.
[Figure: example 2-3 tree with root A and children B and C.]
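Searching follows directly from conditions (2) and (3). The sketch below is illustrative only: the `Node23` representation (a list of one or two keys plus a list of children) and the sample keys are assumptions chosen for the example, not the slide's data structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node23:
    keys: List[int]                      # one key (2-node) or two keys (3-node)
    children: List["Node23"] = field(default_factory=list)  # empty at the lowest internal level


def search(node, key):
    """Return True if key is stored in the 2-3 tree rooted at node."""
    while node is not None:
        if key in node.keys:
            return True
        if not node.children:            # the next level holds only external nodes
            return False
        # The number of keys smaller than `key` is the index of the child to follow.
        node = node.children[sum(k < key for k in node.keys)]
    return False


# Hypothetical example: a 2-node root over a 3-node leaf and a 2-node leaf.
root = Node23([40], [Node23([10, 20]), Node23([80])])
```

Because all external nodes are on the same level, the loop runs once per level, which gives the O(log n) search time.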
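1.3.3 Inserting into a 2-3 Tree
Searching: O(log n)
Inserting: O(log n)
[Figure: successive insertions into the example 2-3 tree: (a) an insertion that fits into an existing leaf; (b) an insertion that splits a leaf into two and adds a key to the root; (c) an insertion whose split propagates to the root and creates a new root G with children A and F.]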
1.3.4 Deletion from a 2-3 Tree
Deleting: O(log n)
[Figure: successive deletions from an example 2-3 tree with a 3-node root A and leaves B, C, D; the first two deletions are shown.]
1.3.4 Deletion from a 2-3 Tree (2)
[Figure: further deletions from the same tree, continuing until the root A has only two children B and C.]
1.3.4 Deletion from a 2-3 Tree: Rotations
[Figure: the three rotation cases for a node p with parent r and sibling q: (a) p is the left child of r; (b) p is the middle child of r; (c) p is the right child of r. Each case shows the subtrees a, b, c, d (and e in case (c)) before and after the rotation.]
1.3.4 Deletion from a 2-3 Tree: Procedure
Step 1: Modify node p as necessary to reflect its status after the desired element has been deleted.
Step 2:
for (; p has zero elements && p != root; p = r) {
    let r be the parent of p, and let q be the left or right sibling of p (as appropriate);
    if (q is a 3-node) perform a rotation;
    else perform a combine;
}
Step 3: If p has zero elements, then p must be the root. The left child of p becomes the new root, and node p is deleted.
1.6 B-Trees
Definition: An m-way search tree satisfies
(1) The root has at most m subtrees and has the structure $n, A_0, (K_1, A_1), (K_2, A_2), \ldots, (K_n, A_n)$, where the $A_i$ are subtrees and the $K_i$ are key values.
(2) $K_i < K_{i+1}$, $1 \le i < n$.
(3) $K_i <$ all key values in subtree $A_i$ $< K_{i+1}$, $1 \le i < n$.
(4) $K_n <$ all key values in subtree $A_n$, and all key values in subtree $A_0$ $< K_1$.
(5) The subtrees $A_i$, $0 \le i \le n$, are also m-way search trees.
Figure 1.35: Example of a 3-way search tree that is not a 2-3 tree
[Figure: root a holds 20, 40; its children b, c, d hold 10, 15; 25, 30; and 45, 50; node e, a child of c, holds 28.]
Node | Schematic format
a | 2, b, (20, c), (40, d)
b | 2, 0, (10, 0), (15, 0)
c | 2, 0, (25, e), (30, 0)
d | 2, 0, (45, 0), (50, 0)
e | 1, 0, (28, 0)
(0 denotes an empty subtree.)
Definition: A B-tree of order m is an m-way search tree satisfying
(1) The root node has at least two children.
(2) All nodes other than the root and failure nodes have at least $\lceil m/2 \rceil$ children.
(3) All failure nodes are at the same level.
1.6.3 B-Trees: Properties
N: minimum number of keys in a B-tree of order m with l levels.
$N + 1$ = the number of failure nodes = the number of nodes at level $l+1$ $\ge 2\lceil m/2 \rceil^{l-1}$
If there are N key values, the number of levels l of the B-tree satisfies $l \le \log_{\lceil m/2 \rceil}\{(N+1)/2\} + 1$.
Choice of m: depends on the access time, i.e., the time for reading nodes from disk plus the time to search the nodes for x.
[Figure: plot of total maximum search time against m.]
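The level bound can be evaluated with integer arithmetic. The following sketch assumes the bound $N + 1 \ge 2\lceil m/2 \rceil^{l-1}$ for $l \ge 2$; the function names `min_keys` and `max_levels` are illustrative and not from the slides.

```python
def min_keys(m: int, levels: int) -> int:
    """Smallest number of keys a B-tree of order m with `levels` levels can hold."""
    if levels == 1:
        return 1                      # a lone root needs at least one key
    d = -(-m // 2)                    # ceil(m / 2) using integer arithmetic
    return 2 * d ** (levels - 1) - 1


def max_levels(m: int, n_keys: int) -> int:
    """Largest l such that a B-tree of order m with n_keys >= 1 keys can have l levels."""
    levels = 1
    while min_keys(m, levels + 1) <= n_keys:
        levels += 1
    return levels


# With order m = 200, fewer than two million keys always fit in three levels.
three = max_levels(200, 2 * 10**6 - 2)
```

Using integers avoids the rounding problems of evaluating the logarithm in floating point when N+1 is close to a power of $\lceil m/2 \rceil$.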
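1.9 Tries
[Figure: trie over the bird names bluebird, bunting, cardinal, chickadee, godwit, goshawk, gull, oriole, thrasher, thrush, and wren. Each branch node has one link per character (blank, a, b, c, ...), and the keys are sampled one character per level from left to right.]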
1.9 Tries: Searching and Sampling Strategies
1. Searching: O(l), where l is the number of levels.
2. How to reduce l: choose a sampling strategy that decides which character of key value x is used at the i-th level.
Example: Sample(x, i) = $x_{r(x,i)}$ for a randomization function r(x, i).
[Figure: a trie for the bird names obtained by sampling one character at a time, from right to left.]
[Figure: an optimal trie for the same keys; sampling on the first level is done by using the fourth character.]
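A small sketch shows how the sampling function drives both insertion and search. It is an illustration under stated assumptions: branch nodes are plain dictionaries, leaves are the key strings themselves, `sample` takes characters from left to right, and the empty string stands in for the blank link. None of these names come from the slides.

```python
BLANK = ""


def sample(key, i):
    """Character sampled at level i (1-based); blank once the key is exhausted."""
    return key[i - 1] if i <= len(key) else BLANK


def insert(node, key, level=1):
    """Insert key below the branch node `node` (a dict); leaves are plain strings."""
    c = sample(key, level)
    child = node.get(c)
    if child is None:
        node[c] = key                     # free link: store the key as a leaf
    elif isinstance(child, dict):
        insert(child, key, level + 1)     # follow the existing branch node
    elif child != key:
        branch = {}                       # two keys collide: grow a new branch node
        node[c] = branch
        insert(branch, child, level + 1)
        insert(branch, key, level + 1)


def search(node, key):
    """Return (found, number of branch nodes visited)."""
    level = 1
    while isinstance(node, dict):
        node = node.get(sample(key, level))
        level += 1
    return node == key, level - 1


root = {}
for bird in ["bluebird", "bunting", "cardinal", "chickadee", "godwit", "goshawk",
             "gull", "oriole", "thrasher", "thrush", "wren"]:
    insert(root, bird)
```

Replacing `sample` with another strategy (for instance right-to-left, or starting with the fourth character) changes the shape of the trie without touching `insert` or `search`.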
1.9 Tries: Insertion and Deletion
Grow when inserting; shrink when deleting.
Deletion needs a count data member in each branch node.
[Figure: section of a trie showing the changes resulting from inserting bobwhite and bluejay.]
Outline
1. Introduction
2. Finding max. and min.
3. Finding the 2nd largest key
4. The Selection Problem
5. A lower bound for finding the median
1. Introduction
SP (Selection Problem): Given a set of n real numbers, find the k-th smallest one, $1 \le k \le n$.
How can you solve it?
(1) Sort the numbers.
(2) Pick the k-th smallest one.
This takes O(n log n) time. Any better way?
What is a trivial lower bound on the time complexity of solving SP?
$T_L(n) = \Omega(n)$. Why?
What if we consider only comparisons?
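P: Given a set S of n real numbers, find the largest one.
Let $W_c$ be the set of possible outcomes and L the set of leaves of a decision tree, so that $|W_c| \le |L|$.
$W_c = \{(?, \ldots, ?, x_1), (?, \ldots, ?, x_2), \ldots, (?, \ldots, ?, x_n)\}$, $|W_c| = n$
$T_L(n) = \lceil \log_2 |W_c| \rceil = \lceil \log_2 n \rceil$
However, this is not tight. Why?
Example: n = 3, $S = \{x_1, x_2, x_3\}$, $L = \{l_1, l_2, l_3, l_4\}$, $W_c = \{(?, ?, x_1), (?, ?, x_2), (?, ?, x_3)\}$
[Figure: decision tree for n = 3. The root compares 1:2; its two children compare 2:3 and 1:3. The four leaves $l_1, \ldots, l_4$ report $x_3$, $x_2$, $x_3$, $x_1$, so distinct orderings such as $(x_1, x_2, x_3)$ and $(x_2, x_1, x_3)$ reach different leaves with the same outcome.]
$|L| \gg |W_c| = n$ as $n \to \infty$.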
Adversary Arguments
A guessing game: let $Z_{1000} = \{0, 1, \ldots, 999\}$. Guess the number in $Z_{1000}$ that I have in mind.
I can change my mind as long as my answers (responses) are consistent.
Maximize the number of leaves in a decision tree.
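2. Finding Max. and Min.
MM: Given a set of n real numbers, find max (the largest number) and min (the smallest number).
How can you solve MM?
Let n = 2m and compare the keys in pairs: $(x_1, x_2), (x_3, x_4), \ldots, (x_{2m-1}, x_{2m})$.
Winners $W = \{x_{11}, x_{21}, x_{31}, \ldots, x_{m1}\}$: max is among them.
Losers $L = \{x_{12}, x_{22}, x_{32}, \ldots, x_{m2}\}$: min is among them.
How many comparisons? m for dividing, m-1 for finding max, m-1 for finding min:
$m + (m-1) + (m-1) = 3m - 2 = \frac{3n}{2} - 2$
Any better way?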
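The pairing idea can be sketched with an explicit comparison counter. This is an illustrative version, not the slide's exact procedure: it processes the pairs in one pass (compare the pair, then the smaller key against the running min and the larger against the running max), which gives the same count of 3n/2 - 2 for even n. The function name and the handling of odd n are assumptions.

```python
def max_min(xs):
    """Return (max, min, number of key comparisons) for a non-empty list xs."""
    n = len(xs)
    comparisons = 0
    if n % 2:                         # odd n: the first key starts as both max and min
        lo = hi = xs[0]
        start = 1
    else:                             # even n: one comparison settles the first pair
        comparisons += 1
        lo, hi = (xs[0], xs[1]) if xs[0] < xs[1] else (xs[1], xs[0])
        start = 2
    for i in range(start, n - 1, 2):
        a, b = xs[i], xs[i + 1]
        comparisons += 3              # pair, smaller vs. min, larger vs. max
        small, large = (a, b) if a < b else (b, a)
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return hi, lo, comparisons


result = max_min([21, 15, 13, 8, 7, 29])   # n = 6, so 3 * 6 / 2 - 2 = 7 comparisons
```

For even n the count is 1 + 3(n/2 - 1) = 3n/2 - 2, compared with 2n - 3 for finding max and min independently.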
What information is needed for finding max and min?
Finding max: all numbers except max itself must lose at least once in some comparison (n-1 losses).
Finding min: all numbers except min itself must win at least once in some comparison (n-1 wins).
1 unit of information = 1 win or 1 loss.
(2n - 2) units of information are needed.
[Figure: keys $x_1, \ldots, x_6$ with max and min marked; the other keys carry the required L (loss) and W (win) labels.]
Status of a number $(x_i, s_i)$:
W: at least one win, no loss
L: at least one loss, no win
WL: wins and losses
N: no comparisons
Adversary strategy for keys x and y compared by an algorithm:
Status of (x, y) | Adversary response | New status | Units of new information
(N, N) | x > y | (W, L) | 2
(W, N) or (WL, N) * | x > y | (W, L) or (WL, L) | 1
(L, N) ** | x < y | (L, W) | 1
(W, W) | x > y | (W, WL) | 1
(L, L) | x > y | (WL, L) | 1
(W, L), (WL, L) or (W, WL) *** | x > y | No change | 0
(WL, WL) | Consistent with assigned values | No change | 0
* (N, W) or (N, WL) can be treated symmetrically.
** (N, L) can be treated symmetrically.
*** (L, W), (L, WL) or (WL, W) can be treated symmetrically.
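The status part of the adversary's table can be written as a lookup. This sketch tracks statuses and information units only; it does not assign the numeric values the adversary needs to stay consistent, and the function names are illustrative assumptions.

```python
# (status of x, status of y) -> (new status of x, new status of y, units of information)
TABLE = {
    ("N", "N"): ("W", "L", 2),
    ("W", "N"): ("W", "L", 1),
    ("WL", "N"): ("WL", "L", 1),
    ("L", "N"): ("L", "W", 1),
    ("W", "W"): ("W", "WL", 1),
    ("L", "L"): ("WL", "L", 1),
    ("W", "L"): ("W", "L", 0),
    ("WL", "L"): ("WL", "L", 0),
    ("W", "WL"): ("W", "WL", 0),
    ("WL", "WL"): ("WL", "WL", 0),
}


def respond(sx, sy):
    """Return (new sx, new sy, units) for a comparison of keys with statuses sx, sy."""
    if (sx, sy) in TABLE:
        return TABLE[(sx, sy)]
    new_y, new_x, units = TABLE[(sy, sx)]     # symmetric case: swap the roles
    return new_x, new_y, units


def run(n, comparisons):
    """Apply the adversary to a list of (i, j) comparisons on keys 1..n."""
    status = {i: "N" for i in range(1, n + 1)}
    total = 0
    for i, j in comparisons:
        status[i], status[j], units = respond(status[i], status[j])
        total += units
    return status, total


# Comparison sequence x1:x2, x1:x5, x3:x4, x3:x6, x3:x1, x2:x4, x5:x6, x6:x4 on n = 6 keys.
status, total = run(6, [(1, 2), (1, 5), (3, 4), (3, 6), (3, 1), (2, 4), (5, 6), (6, 4)])
```

For this sequence the eight comparisons yield 2n - 2 = 10 units in total, leaving exactly one key with status W (the max) and one with status L (the min).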
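Example
Each entry gives the status and the value assigned by the adversary; blank entries are unchanged. Initially every key has status N and no value (*).
Comparison | x1 | x2 | x3 | x4 | x5 | x6
(initial) | N * | N * | N * | N * | N * | N *
x1, x2 | W 20 | L 10 | | | |
x1, x5 | W 20 | | | | L 5 |
x3, x4 | | | W 15 | L 8 | |
x3, x6 | | | W 15 | | | L 12
x3, x1 | WL 20 | | W 25 | | |
x2, x4 | | WL 10 | | L 6 | |
x5, x6 | | | | | WL 5 | L 3
x6, x4 | | | | L 2 | | WL 3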
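Theorem: Any algorithm to find max and min of n numbers must do at least $\frac{3n}{2} - 2$ comparisons in the worst case.
[Proof] Let n = 2m (for simplicity). $2n - 2 = 4m - 2$ units of information are needed.
Only a comparison of two keys with status (N, N) yields 2 units, and at most $\frac{n}{2} = m$ such comparisons are possible; together they yield at most $n = 2m$ units.
Every other comparison yields at most 1 unit, so the remaining $(2n - 2) - n = n - 2 = 2m - 2$ units require at least $n - 2$ further comparisons.
$\frac{n}{2} + (n - 2) = \frac{3n}{2} - 2$
What if n = 2m + 1? Exercise.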
3. Finding the 2nd largest key
2L: Given a set of n real numbers, find the largest two numbers (max and max2), where max2 is the 2nd largest one.
[Figure: keys $x_1, x_2, x_3, \ldots, x_n$ with the wins (W) and losses (L) recorded when max and then max2 are found by successive scans.]
Straightforward approach: 2n - 3 comparisons. Do we need all those L? Any better algorithm?
max: n - 1 comparisons.
max2: ? How many numbers were compared directly with max?
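[Figure: tournament tree; the winner of each comparison moves up, the root holds max, and max2 is the largest of the keys that lost directly to max.]
$(n-1) + (\lceil \log_2 n \rceil - 1) = n + \lceil \log_2 n \rceil - 2$ comparisons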
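The tournament method can be sketched by recording, for every surviving key, the keys that lost to it directly. This is an illustrative version with assumed names; it expects at least two keys and gives an unpaired key a bye in each round.

```python
def second_largest(xs):
    """Return (max, second largest, number of key comparisons); needs len(xs) >= 2."""
    comparisons = 0
    # Each player is a pair (key, list of keys that lost directly to it).
    players = [(x, []) for x in xs]
    while len(players) > 1:
        winners = []
        for i in range(0, len(players) - 1, 2):
            (a, lost_to_a), (b, lost_to_b) = players[i], players[i + 1]
            comparisons += 1
            if a > b:
                lost_to_a.append(b)
                winners.append((a, lost_to_a))
            else:
                lost_to_b.append(a)
                winners.append((b, lost_to_b))
        if len(players) % 2:              # odd player out advances without a comparison
            winners.append(players[-1])
        players = winners
    largest, direct_losers = players[0]
    # max2 must be among the keys that lost directly to max.
    second = direct_losers[0]
    for x in direct_losers[1:]:
        comparisons += 1
        if x > second:
            second = x
    return largest, second, comparisons


result = second_largest([7, 19, 9, 15, 3, 6, 1, 2])   # n = 8: 7 + (3 - 1) = 9 comparisons
```

The tournament uses n - 1 comparisons, and max meets at most $\lceil \log_2 n \rceil$ opponents, so the second phase adds at most $\lceil \log_2 n \rceil - 1$ comparisons.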
Initially, $w(x_i) = 1$, $i = 1, 2, \ldots, n$.
Upon each comparison $(x_i, x_j)$, the adversary's response and the new weights depend on $w(x_i)$ and $w(x_j)$:
Case | Response | Weight update
$w(x_i) > w(x_j)$ | $x_i > x_j$ | $w(x_i) := w(x_i) + w(x_j)$; $w(x_j) := 0$
$w(x_i) = w(x_j) > 0$ | same as above | same as above
$w(x_i) < w(x_j)$ | $x_i < x_j$ | $w(x_j) := w(x_j) + w(x_i)$; $w(x_i) := 0$
$w(x_i) = w(x_j) = 0$ | consistent with previous responses | no change
Example (n = 5):
Comparison | Response | w(x1) | w(x2) | w(x3) | w(x4) | w(x5)
(initial) | | 1 | 1 | 1 | 1 | 1
(x1, x2) | x1 > x2 | 2 | 0 | 1 | 1 | 1
(x3, x4) | x3 > x4 | 2 | 0 | 2 | 0 | 1
(x3, x5) | x3 > x5 | 2 | 0 | 3 | 0 | 0
Lemma: The number of direct losers to max is at least $\lceil \log_2 n \rceil$.
[Proof] Let max = $x_i$ for some i; at the end, $w(x_i) = n$.
Let $w_k(x_i)$ be $w(x_i)$ after the k-th win against a previously undefeated key.
Claim: $w_k(x_i) \le 2 w_{k-1}(x_i)$. Why?
$w_0(x_i) = 1$ and $w_k(x_i) = w_{k-1}(x_i) + w(x_j) \le 2 w_{k-1}(x_i)$, since $w(x_j) \le w_{k-1}(x_i)$ for the comparison $(x_i, x_j)$.
Suppose that $x_i$ eventually wins against t previously undefeated keys. Then $n = w_t(x_i) \le 2^t w_0(x_i) = 2^t$, so $\log_2 n \le t$ and, since t is an integer, $\lceil \log_2 n \rceil \le t$.
Theorem: Any algorithm to find the 2nd largest number in a set of n real numbers must do at least $n + \lceil \log_2 n \rceil - 2$ comparisons.
Lecture Schedule November 18 (Friday) 1: ~ 11: class A 14: ~ 15: class B Room #4443 (Oh Sang-su lecture room) 38
4. Selection Problem
SP: Given a set S of n real numbers, find the k-th smallest one, $N_k$.
[Figure: $N_k$ with n - k numbers greater than $N_k$ above it and k - 1 numbers less than $N_k$ below it; an edge from a down to b means b is less than a (b < a).]
In order to fix the k-th smallest number $N_k$, the relation of $N_k$ to each number in S must be established. Why?
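[Figure: a key y whose relation to $N_k$ has not been established, next to keys x that have been compared with $N_k$.]
An adversary could change the value of y, which is not related to $N_k$.
n - 1 crucial comparisons are needed. Why?
Theorem: Finding the k-th smallest element in S requires at least |S| - 1 comparisons.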
How to find the k-th smallest one
A straightforward approach:
(1) Sort S.
(2) Pick the k-th smallest one.
This takes O(n log n) time, which is far from optimal. Any better idea? Try divide and conquer.
S = { 21, 15, 13, 8, 7, 29, 22, 2, 5, 10, 3, 26, 4, 19, 12, 20, 18, 24, 16, 23, 11, 1, 25, 14, 27, 6, 17, 9, 28 }
Divide S into ⌊|S|/5⌋ sequences of 5 elements each, with up to 4 leftover elements (one sequence per column):
21 29  3 20 11  6
15 22 26 18  1 17
13  2  4 24 25  9
 8  5 19 16 14 28
 7 10 12 23 27
Sort each 5-element sequence:
21 29 26 24 27  6
15 22 19 23 25 17
13 10 12 20 14  9
 8  5  4 18 11 28
 7  2  3 16  1
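Rearrange the sorted sequences in order of their medians (the leftover elements stay in the last column):
29 26 21 27 24  6
22 19 15 25 23 17
10 12 13 14 20  9
 5  4  8 11 18 28
 2  3  7  1 16
M = the set of medians = {10, 12, 13, 14, 20}
m = the median of M = 13
$S_1 = \{s \mid s < m \text{ and } s \in S\}$
$S_2 = \{s \mid s = m \text{ and } s \in S\}$
$S_3 = \{s \mid s > m \text{ and } s \in S\}$
[Figure: the grid is split around m into quadrants A, B (top) and C, D (bottom).]
$|S_1| \le \frac{3}{4}|S|$ and $|S_3| \le \frac{3}{4}|S|$. Why?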
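[Figure: schematic of the grid with m at the centre and quadrants A, B (top) and C, D (bottom). Every element of the upper-right quadrant B is at least m, and every element of the lower-left quadrant C is at most m.]
$S_1 = \{s \mid s < m \text{ and } s \in S\}$
$S_2 = \{s \mid s = m \text{ and } s \in S\}$
$S_3 = \{s \mid s > m \text{ and } s \in S\}$
$|S_1| \le \frac{3}{4}|S|$ and $|S_3| \le \frac{3}{4}|S|$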
if $|S_1| \ge k$ then select($S_1$, k)
else if $|S_1| + |S_2| \ge k$ then m is the k-th smallest one
else select($S_3$, $k - |S_1| - |S_2|$)
end
$T(n) = T\left(\frac{3n}{4}\right) + c\,n + T\left(\frac{n}{5}\right)$. Why?
Algorithm (finding the k-th smallest element in S)
procedure SELECT(k, S)
begin
    if |S| < 50 then
        Sort S;
        SELECT := the k-th smallest one
    else
        Divide S into ⌊|S|/5⌋ sequences of 5 elements each, with up to 4 leftover elements;    { c_1 n }
        Sort each 5-element sequence;    { c_2 n }
        Let M be the set of medians of the 5-element sequences;
        m := SELECT(⌈|M|/2⌉, M);    { T(n/5) }
        S_1 := {s | s < m and s ∈ S};
        S_2 := {s | s = m and s ∈ S};
        S_3 := {s | s > m and s ∈ S};    { c_3 n }
        if |S_1| ≥ k then SELECT := SELECT(k, S_1)    { T(3n/4) }
        else if |S_1| + |S_2| ≥ k then SELECT := m
        else SELECT := SELECT(k - |S_1| - |S_2|, S_3)    { T(3n/4) }
    end {if}
end
Why do these costs hold?
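The procedure translates almost line for line into runnable code. The sketch below is an illustration under assumptions: the small-case threshold is exposed as a parameter `small` (50 by default, and it must be at least 5 so that at least one full group exists), and list comprehensions stand in for the partition step.

```python
import random


def select(k, s, small=50):
    """Return the k-th smallest element of the list s (1 <= k <= len(s))."""
    if len(s) < small:
        return sorted(s)[k - 1]
    # Sort each group of 5; a final group with fewer than 5 elements is left over.
    groups = [sorted(s[i:i + 5]) for i in range(0, len(s), 5)]
    medians = [g[2] for g in groups if len(g) == 5]
    m = select((len(medians) + 1) // 2, medians, small)   # median of the medians
    s1 = [x for x in s if x < m]
    s2 = [x for x in s if x == m]
    s3 = [x for x in s if x > m]
    if len(s1) >= k:
        return select(k, s1, small)
    if len(s1) + len(s2) >= k:
        return m
    return select(k - len(s1) - len(s2), s3, small)


# Usage: the median of 1001 shuffled integers 0..1000 is the 501st smallest.
data = list(range(1001))
random.Random(0).shuffle(data)
median = select(501, data)
```

Since m itself is excluded from both `s1` and `s3`, every recursive call works on a strictly smaller list, so the recursion terminates even when S contains repeated values.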
$T(n) \le c\,n$ if $n < 50$
$T(n) \le T(n/5) + T(3n/4) + c\,n$ if $n \ge 50$
Show that $T(n) \le 20cn$. How? By induction.
Basis (n = 50):
$T(50) \le T(10) + T(150/4) + c \cdot 50 \le T(10) + T(38) + c \cdot 50 \le c \cdot 10 + c \cdot 38 + c \cdot 50 = c \cdot 98 \le 20c \cdot 50$
Hypothesis: $T(n) \le 20cn$ for $50 < n \le k$.
Step (n = k+1):
$T(k+1) \le T((k+1)/5) + T(3(k+1)/4) + c(k+1)$
$\le 20c(k+1)/5 + 20c \cdot 3(k+1)/4 + c(k+1)$
$= 4c(k+1) + 15c(k+1) + c(k+1) = 20c(k+1)$
Finding the Median: number of comparisons
16n; Blum [1973]; 5.4n; Hyafil [1976]; Schönhage, Paterson, and Pippenger [1976]: 3n + o(n) (little-o notation), the best known algorithm.
5. A Lower Bound for Finding the Median
k = (n+1)/2: n - 1 (crucial) comparisons.
Can you find any tighter lower bound? Why not use an adversary argument?
Observation
[Figure: pairs x, y placed relative to the median, illustrating crucial comparisons and non-crucial comparisons.]
Definition: A comparison involving an element x is said to be a crucial comparison for x if it is the first comparison with some y satisfying one of the following conditions:
(1) x > y for some y ≥ median.
(2) x < y for some y ≤ median.
Note:
(i) A crucial comparison for x establishes the relation of x to the median.
(ii) The relation of y to the median is not necessarily known at the time the crucial comparison for x is done.
Adversary Strategy
Force an algorithm to perform as many non-crucial comparisons as possible. How? By assigning values to variables.
Status of a key $(x_i, s_i)$:
L: assigned a value larger than the median
S: assigned a value smaller than the median
N: not yet in a comparison
Comparing $x_i$ and $x_j$, $i \ne j$:
Status of $(x_i, x_j)$ | Adversary response
(N, N) | make $x_i$ > median > $x_j$; new status (L, S)
(L, N) | make the unassigned one smaller than the median
(S, N) | reverse of the above
(L, S), (S, L), (L, L), (S, S) | consistent with previous responses
[Figure: the n elements in sorted order: $\frac{n-1}{2}$ elements < median, the median as the $\frac{n+1}{2}$-th element, and $\frac{n-1}{2}$ elements > median.]
1. Unless there are already (n - 1)/2 elements with status S (or L), keep the strategy previously stated.
2. Otherwise, keep the balance between the numbers of keys with status L and S.
(n - 1)/2 non-crucial comparisons are possible. Why?
$\frac{n-1}{2} \text{ (non-crucial)} + (n-1) \text{ (crucial)} = \frac{3}{2}(n-1)$ comparisons
Theorem: Any algorithm to find the median of n numbers must do at least $\frac{3}{2}(n-1)$ comparisons.
Progress of lower bounds: $\frac{3}{2}(n-1)$ → 1.75n - log n → 1.8n → 2n (best lower bound currently known).
(n - 1) comparisons is a tight lower bound only for k = 1 and k = n.
Project 2
Graph-related algorithms
Both directed and undirected graphs
Menu-driven:
1) Initialize Graphs
2) Min-Cost spanning tree
3) Dijkstra's shortest path
4) Depth-first Search
5) Breadth-first Search
6) Biconnected components
7) Strongly connected components