2-3 Trees 1
Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2
2-3 trees A kind of balanced search tree Assume keys are totally ordered (<, >, =) Assume n key/value pairs are stored in the dictionary Time per dictionary operation is O(log n) Support of other useful operations as well 3
Basic structure: a tree Key/value pairs stored only at leaves (no duplicate keys) All leaves at the same level, with keys in sorted order Each internal node: has either 2 or 3 children has a guide : the maximum key in its subtree 4
Example 17 8 17 3 6 8 11 17 1 3 4 6 7 8 10 11 12 15 17 5
Let h := height of tree (Recall: height = length of longest path from root to leaf) Claim: n 2 h Proof by induction on h Base case: h = 0, n = 1 Induction step: h > 0, assume claim holds for h 1 Tree has a root node, which has either 2 or 3 children Each of these children is the root of a subtree, which itself is a 2-3 tree of height h 1 By induction hypothesis, if the th subtree has n leaves, then n 2 h 1 [here, = 1.. 2 (or 3)] Corollary: h log 2 n n = n 2h 1 2 2 h 1 = 2 h 6
Se rch( ): use guides Se rch(, p, he ght): // Invoke as Se rch(, root, h) if (he ght > 0) then else if ( p.child0.g ide) then return Se rch(, p.child0, he ght 1) else if ( p.child1.g ide or p.child2 = n ll) then return Se rch(, p.child1, he ght 1) else return Se rch(, p.child2, he ght 1) if = p.g ide then return p.v l e else return null (or a default value) 7
nsert( ): Search for, and if it should belong under p: add as a child of p (if not already present) if p now has 4 children: split p into two two nodes, p 1 and p 2, each with two children p p p1 p2 y z y z y z process p s parent in the same way Special case: no parent create new root, increasing height of tree by 1 Also need to update guides easy Time = O(height) = O(log n) 8
Insert 5 5 Insert 69 Insert 21 5 21 5 8 21 63 69 Insert 8 5 8 21 Insert 10 5 8 10 21 63 69 Insert 63 5 8 21 63 5 8 21 63 5 8 10 21 63 69 9
Insert 69 Insert 6 5 8 21 63 69 5 6 8 10 21 63 69 Insert 10 Insert 7 5 8 10 21 63 69 5 6 7 8 10 21 63 69 5 6 7 8 10 21 63 69 5 8 10 21 63 69 5 6 7 8 10 21 63 69 10
De ete( ): Search for, and if found under p: remove if p now only has one child: if p is the root: delete p (height decreases by 1) if one of p s adjacent siblings has 3 children: borrow one q p q p y y 11
if none of p s adjacent siblings has 3 children: one sibling q must have 2 children give p s only child to q delete p process p s parent q p q p y y 12
Delete 69 5 6 7 8 10 21 63 5 6 7 8 10 21 63 69 (give) Delete 8 5 6 7 8 10 21 63 5 6 7 8 10 21 63 (borrow) (give) 5 6 7 10 21 63 5 6 7 8 10 21 63 (delete root) 5 6 7 10 21 63 5 6 7 8 10 21 63 13
Delete 10 5 6 7 8 10 21 63 5 6 7 10 21 63 Delete 8 (give) 5 6 7 8 10 21 63 5 6 7 21 63 (borrow) 5 6 7 10 21 63 5 6 7 21 63 5 6 7 10 21 63 14
2-3 trees: summary Assume n items in dictionary Running time for lookup, insert, delete: O(log n) comparisons, plus O(log n) overhead Space: O(n) pointers Note: in the literature, 2-3 trees usually store the guides in the parent node every node contains two guides (the guide for the third child is not needed) A generalization: B-trees allow many children (which makes the height smaller) again, store guides in the parent node useful for high-latency memory (like hard drives) 15
Augmenting 2-3 trees Idea: augment nodes with additional information to support new types of queries Example: store # of items in subtree at each internal node Queries: What is the kth smallest item? How many items are? 16
Items may be marked with an attribute, say, active / inactive Store a count of active items in subtree at each internal node Queries: What is the kth smallest active item? How many active items are? 17
Attribute flipping Operation F pr nge(, y) flips all attribute bits of items in the range Assume attributes are bits Store an attribute bit at every node: internal nodes and leaves effective value of the attribute is the XOR of all bits on path from root to leaf 0 1 0 1 1 0 0 1 1 = 1 1 18
To perform F pr nge(, y): trace paths e, ƒ to, y flip bits at, y, and all roots of internal subtrees y 19
2-3 Trees: Join and Split Jo n(t 1, T 2 ) joins two 2-3 trees in time O(log n) Assume m x(t 1 ) < min(t 2 ) Assume T has height h for = 1, 2 Case 1: h 1 = h 2 new root Time: O(1) T 1 T 2 20
Case 2: h 1 < h 2 p T 1 T 2 Attach as the left-most child of p If p now has 4 children, we split p, and proceed up the tree as in nsert Time: O(h 2 h 1 ) = O(log n) Case 3: h 1 > h 2 similar 21
Sp t(t, ) = (T 1 [ ], T 2 [> ]) T 1 T 2 T 3 T 4 T 5 join from inside out 22
Analysis of rejoin step Start with trees T 0, T 1,..., T k with h = he ght(t ) h 0 h 1 h k [Monotonicity] There are at most two trees of any given height (except there may be three of height 0) [Diversity] Form new trees T 0, T 1,..., T k with h = he ght(t ): T 0 = T 0, T +1 = Jo n(t, T +1 ) Total running time: O(S), where k 1 S = ( h +1 h =0 + 1) 23
Claim: For = 0.. k 1: h {h, h + 1} From the claim, h +1 h h +1 h + 1, and so k 1 S (h +1 h + 2) = 2k + h k h 0 = O(log n) =0 Example: 0 0 0 1 1 2 3 3 5 1 0 1 1 2 3 3 5 1 1 1 2 3 3 5 2 1 2 3 3 5 2 2 3 3 5 3 3 3 5 4 3 5 4 5 5 = a fresh tree: root has only 2 children 24
We prove the claim by induction, but we need to prove more: strengthening the induction hypothesis For = 0.. k 1, we have P( ) : Q( ) : h h {h, h + 1} > h +1 = T fresh Base case: = 0 (recall h 0 = h 0) Induction step: assume P( ), Q( ) and prove P( + 1), Q( + 1), where = 0.. k 2 25
Case I. Recall: T +1 = Jo n(t, T +1 ) 1. Assume h h +1 2. By logic of Join and (1), either h +1 = h +1 or h +1 = h +1 + 1 and T is fresh +1 3. So P( + 1) holds 4. To prove Q( + 1): h +1 > h +2 = h +1 > h +1 [By monotonicity] = T is fresh [By (2)] +1 26
Case II. Recall: T +1 = Jo n(t, T +1 ) 1. Assume h > h +1 2. Must have > 0 3. By Q( ) and (1), T is fresh, and Join logic implies h +1 = h 4. We have h h + 1 [By P( )] h +1 + 1 [By monotonicity] and so h h [By (1)] = h + 1 = h +1 + 1 5. By (4), we have h = h +1, and by diversity, monotonicity, and (2): h +1 + 1 h +2 6. Therefore: h +1 = h = h +1 + 1 [ = P( + 1)] h +2 [ = Q( + 1)] 27