B-TREE
Michael Tsai
2017/06/06
2 B-Tree Overview
- Balanced search tree
- Very large branching factor
- Height = O(log n), but much smaller than that of a red-black tree
- Usage: a large amount of data to be stored
  - Partially in memory, and partially in secondary storage (e.g., hard drive)
- Goals:
  1. Minimize disk I/O operations
  2. Minimize CPU time
3 Typical Storage Speed / Capacity
Storage      Read speed                             Capacity
Hard Drive   Typically ~100 MB/s                    Up to 10 TB
SSD          ~500 MB/s                              Up to 1 TB
SD           30 MB/s (UHS-3), 10 MB/s (class 10)    Up to 1 TB (typically 32 GB or 64 GB)
Memory       6400 MB/s (DDR3)                       Desktop/laptop has 4~16 GB
4 Typical B-Tree (keys)
T.root:                    [M]
               [D H]                  [Q T X]
      [B C] [F G] [J K L]    [N P] [R S] [V W] [Y Z]
- An internal node x has x.n keys (3 for the node [Q T X]).
- An internal node x has x.n+1 children (4 for the node [Q T X]).
- The keys in x separate the ranges of keys in its subtrees.
5 Typical B-Tree (search)
Search key: R
T.root:                    [M]
               [D H]                  [Q T X]
      [B C] [F G] [J K L]    [N P] [R S] [V W] [Y Z]
- The search for R follows the path [M] -> [Q T X] -> [R S].
- At each node, the keys in x separate the ranges of keys in its subtrees, so only one child has to be visited.
6 A more realistic B-tree
[Figure: T.root holds 1000 keys and has 1001 children. Each child holds 1000 keys and has 1001 children of its own, each of which again holds 1000 keys.]
- Usually only one node's keys/data are read from the disk at a time.
- The root is always kept in memory.
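To make the size of the tree in the figure concrete, the short sketch below counts the keys at each depth, assuming every node holds exactly 1000 keys. The function name `keys_per_level` is made up for this illustration.

```python
def keys_per_level(keys_per_node: int, height: int) -> list[int]:
    """Keys stored at each depth when every node holds keys_per_node keys."""
    branching = keys_per_node + 1   # a node with n keys has n + 1 children
    return [keys_per_node * branching ** depth for depth in range(height + 1)]

levels = keys_per_level(1000, 2)
print(levels)        # [1000, 1001000, 1002001000]
print(sum(levels))   # 1003003000
```

With the root kept in memory, any of these roughly one billion keys can be reached with at most two disk reads.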
7 B-Tree Definition (1)
A B-tree is a rooted tree. For each node x:
- x.n is the number of keys in x.
- The keys x.key_1, x.key_2, ..., x.key_{x.n} are stored in non-decreasing order:
  $x.key_1 \le x.key_2 \le \cdots \le x.key_{x.n}$
- x.leaf is TRUE if x is a leaf and FALSE otherwise.
- Each internal node x contains x.n+1 pointers x.c_1, x.c_2, ..., x.c_{x.n+1} to its children. Leaves have these undefined.
8 B-Tree Definitions (2)
- The keys x.key_i separate the ranges of keys stored in each subtree: if k_i is any key stored in the subtree with root x.c_i, then
  $k_1 \le x.key_1 \le k_2 \le x.key_2 \le \cdots \le x.key_{x.n} \le k_{x.n+1}$
- All leaves have the same depth: the tree's height h.
- Minimum degree of the B-tree: $t \ge 2$
  - Every node other than the root has at least t-1 keys (thus at least t children if it is internal).
  - Every node has at most 2t-1 keys (thus at most 2t children). A node with exactly 2t-1 keys is full.
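The definitions above can be turned into a small checker. This is a minimal in-memory sketch, not part of the slides: the `Node` class, the `check` function, and the 0-based lists are assumptions made for illustration, and a node is treated as a leaf when it has no children.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list = field(default_factory=list)      # x.key_1 .. x.key_n (0-based here)
    children: list = field(default_factory=list)  # x.c_1 .. x.c_{n+1}; empty for a leaf

def check(x: Node, t: int, lo=None, hi=None, is_root=True) -> int:
    """Return the height of the subtree rooted at x; raise AssertionError on a violation."""
    n = len(x.keys)
    assert n <= 2 * t - 1, "too many keys"
    assert is_root or n >= t - 1, "too few keys"
    assert all(a <= b for a, b in zip(x.keys, x.keys[1:])), "keys out of order"
    assert all((lo is None or lo <= k) and (hi is None or k <= hi)
               for k in x.keys), "key outside the range set by the ancestors"
    if not x.children:                             # leaf
        return 0
    assert len(x.children) == n + 1, "internal node needs n + 1 children"
    bounds = [lo] + x.keys + [hi]                  # child i lies between bounds[i] and bounds[i+1]
    heights = {check(c, t, bounds[i], bounds[i + 1], False)
               for i, c in enumerate(x.children)}
    assert len(heights) == 1, "leaves at different depths"
    return heights.pop() + 1

def leaves(*groups):
    return [Node(list(g)) for g in groups]

# The example tree from the "Typical B-Tree" slide, which is valid for t = 2.
root = Node(["M"], [Node(list("DH"), leaves("BC", "FG", "JKL")),
                    Node(list("QTX"), leaves("NP", "RS", "VW", "YZ"))])
print(check(root, t=2))   # 2
```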
9 Proof: B-Tree Height
Theorem: If $n \ge 1$, then for any n-key B-tree T of height h and minimum degree $t \ge 2$,
$$h \le \log_t \frac{n+1}{2}$$
Proof: Consider the case in which every node has the least possible number of keys: the root has 1 key and every other node has t-1 keys (thus t children if internal).
depth    number of nodes
0        1
1        2
2        2t
3        2t^2
In general, depth $i \ge 1$ has at least $2t^{i-1}$ nodes.
10 Proof: B-Tree Height
$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\left(\frac{t^h - 1}{t-1}\right) = 2t^h - 1$$
$$\Rightarrow\; t^h \le \frac{n+1}{2} \;\Rightarrow\; h \le \log_t \frac{n+1}{2}$$
Hence h = O(log n).
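As a numerical sanity check of the derivation, the sketch below recomputes the minimum key count from the per-depth node counts and compares it with the closed form $2t^h - 1$. The helper names `min_keys` and `height_bound` are invented for this example.

```python
def min_keys(t: int, h: int) -> int:
    """Fewest keys in a B-tree of height h: the root has 1 key, all other nodes t - 1."""
    nodes_below_root = sum(2 * t ** (i - 1) for i in range(1, h + 1))
    return 1 + (t - 1) * nodes_below_root

def height_bound(n: int, t: int) -> int:
    """Largest integer h with 2*t**h - 1 <= n, i.e. floor(log_t((n + 1) / 2))."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h

# The sum over depths agrees with the closed form 2*t**h - 1.
for t in (2, 3, 1001):
    for h in range(5):
        assert min_keys(t, h) == 2 * t ** h - 1

print(height_bound(10**9, 1001))   # 2
print(height_bound(10**9, 2))      # 28
```

Integer arithmetic is used instead of `math.log` so that the result is not affected by floating-point rounding.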
11 Disk Operation
- DISK-READ(x): required before accessing x if x is not in memory; a no-op if x is already in memory.
- DISK-WRITE(x): required to put any changes to x back on the disk.
- The root is always stored in memory.
- Typical workflow:
    x = pointer to an object
    DISK-READ(x)
    (operations that modify x)
    DISK-WRITE(x)
    (operations that access x but do not modify it)
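The workflow above can be imitated with a toy page store that counts I/O operations. This is only a sketch under stated assumptions: the `Disk` class, its two dictionaries, and the page id 7 are hypothetical stand-ins for a real disk and buffer.

```python
import copy

class Disk:
    """Toy page store; plain dicts stand in for the disk and for main memory."""
    def __init__(self):
        self.pages = {}    # page id -> copy of the node stored "on disk"
        self.cache = {}    # page id -> copy of the node held in memory
        self.reads = 0
        self.writes = 0

    def disk_read(self, page_id):
        if page_id not in self.cache:          # no-op when already in memory
            self.cache[page_id] = copy.deepcopy(self.pages[page_id])
            self.reads += 1
        return self.cache[page_id]

    def disk_write(self, page_id):
        self.pages[page_id] = copy.deepcopy(self.cache[page_id])
        self.writes += 1

disk = Disk()
disk.pages[7] = {"keys": [10, 20]}
x = disk.disk_read(7)        # first access: one real read
x["keys"].append(30)         # modify the in-memory copy
disk.disk_write(7)           # push the change back to the disk
disk.disk_read(7)            # already in memory: not counted
print(disk.reads, disk.writes)   # 1 1
```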
12 Search in B-Tree
B-TREE-SEARCH(x, k)
1  i = 1
2  while i ≤ x.n and k > x.key_i
3      i = i + 1
4  if i ≤ x.n and k == x.key_i
5      return (x, i)
6  elseif x.leaf
7      return NIL
8  else DISK-READ(x.c_i)
9      return B-TREE-SEARCH(x.c_i, k)
Input:
- x: search from this node
- k: key to be searched
Return value:
- (x, i): key k is found as the i-th key of node x (NIL if k is not in the tree)
Using a linear-search procedure, lines 1-3 find the smallest index i such that k ≤ x.key_i, or else they set i to x.n + 1.
CPU time: O(t log_t n)
Disk I/O: O(log_t n)
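A minimal in-memory Python version of the search may help when tracing it by hand. It assumes a hypothetical `Node` class with 0-based `keys` and `children` lists, treats a node without children as a leaf, and omits the DISK-READ call.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for a leaf

def btree_search(x: Node, k):
    """Return (node, 0-based index) where k is stored, or None if k is absent."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:        # linear scan within the node
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if not x.children:                              # reached a leaf without a match
        return None
    return btree_search(x.children[i], k)           # a DISK-READ would go here

def leaves(*groups):
    return [Node(list(g)) for g in groups]

# The example tree from the "Typical B-Tree (search)" slide.
root = Node(["M"], [Node(list("DH"), leaves("BC", "FG", "JKL")),
                    Node(list("QTX"), leaves("NP", "RS", "VW", "YZ"))])

node, index = btree_search(root, "R")
print(node.keys, index)          # ['R', 'S'] 0
print(btree_search(root, "E"))   # None
```

Because each node's keys are sorted, the linear scan could be replaced by a binary search, which lowers the per-node cost from O(t) to O(log t) without changing the number of disk reads.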
13 Create an empty B-Tree
B-TREE-CREATE(T)
    x = ALLOCATE-NODE()
    x.leaf = TRUE
    x.n = 0
    DISK-WRITE(x)
    T.root = x
- ALLOCATE-NODE() is an O(1) operation that allocates a disk page to store a new node.
- CPU time: O(1)
- Disk I/O: O(1)
14 B-Tree Insertion: Overview
- We cannot simply create a new leaf node and insert it: this would violate the B-tree definition.
- Solution: insert into an existing leaf node.
- Problem: what if that leaf node is already FULL?
  - FULL: having 2t-1 keys (and 2t children if the node is internal)
- Solution: split a full node y around its median key y.key_t:
    y.key_1, y.key_2, ..., y.key_{t-1}  <  y.key_t  <  y.key_{t+1}, y.key_{t+2}, ..., y.key_{2t-1}
    (t-1 keys smaller than the median)     (median)     (t-1 keys larger than the median)
  - The median key y.key_t then moves up into y's parent node.
- What if y's parent is also full? We split it too.
- Workflow: start from the root (as in a search) and split every full node encountered on the way down, so the parent of a node being split is never full.
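15 B-Tree Insertion: Overview
[Figure: splitting a full child with t = 4, so 2t-1 = 7.]
Before: x = [... N W ...] with x.key_{i-1} = N and x.key_i = W
        y = x.c_i = [P Q R S T U V], with subtrees T1 T2 T3 T4 T5 T6 T7 T8
After:  x = [... N S W ...] with x.key_{i-1} = N, x.key_i = S and x.key_{i+1} = W
        y = x.c_i = [P Q R], with subtrees T1 T2 T3 T4
        z = x.c_{i+1} = [T U V], with subtrees T5 T6 T7 T8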
16 B-Tree: Split Full Child
B-TREE-SPLIT-CHILD(x, i)
    z = ALLOCATE-NODE()
    y = x.c_i
    z.leaf = y.leaf
    z.n = t - 1
    for j = 1 to t - 1
        z.key_j = y.key_{j+t}
    if not y.leaf
        for j = 1 to t
            z.c_j = y.c_{j+t}
    y.n = t - 1
    for j = x.n + 1 downto i + 1
        x.c_{j+1} = x.c_j
    x.c_{i+1} = z
    for j = x.n downto i
        x.key_{j+1} = x.key_j
    x.key_i = y.key_t
    x.n = x.n + 1
    DISK-WRITE(y)
    DISK-WRITE(z)
    DISK-WRITE(x)
- Splits the i-th child of node x, which is full.
- CPU time: O(t)
- Disk I/O: O(1)
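The split can be sketched in a few lines of Python using list slicing in place of the copy loops. The `Node` class and `split_child` function are hypothetical names, indices are 0-based, and the DISK-WRITE calls are left out.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty for a leaf

def split_child(x: Node, i: int, t: int) -> None:
    """Split the full child x.children[i] (0-based i) around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "the child must be full"
    z = Node(keys=y.keys[t:], children=y.children[t:])   # the t - 1 largest keys
    median = y.keys[t - 1]
    y.keys = y.keys[:t - 1]                               # the t - 1 smallest keys stay in y
    y.children = y.children[:t]
    x.keys.insert(i, median)                              # the median moves up into x
    x.children.insert(i + 1, z)                           # z becomes y's right sibling

# The situation from the overview figure: t = 4 and y = [P Q R S T U V].
# The two sibling leaves are placeholders added for this example.
x = Node(list("NW"), [Node(list("ABC")), Node(list("PQRSTUV")), Node(list("XYZ"))])
split_child(x, 1, t=4)
print(x.keys)                          # ['N', 'S', 'W']
print([c.keys for c in x.children])
```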
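17 B-Tree: Split the Root
[Figure: splitting a full root.]
Before: T.root = r = [A D F H L N P], with subtrees T1 T2 T3 T4 T5 T6 T7 T8
After:  T.root = s = [H]
        s has two children: r = [A D F] with subtrees T1 T2 T3 T4, and a new node [L N P] with subtrees T5 T6 T7 T8
- Splitting the root is the only way to increase the height of a B-tree.
- The height is increased at the top, not at the bottom.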
18 B-Tree: Split the Root
Please review the pseudocode below yourself.
B-TREE-INSERT(T, k)
1  r = T.root
2  if r.n == 2t - 1
3      s = ALLOCATE-NODE()
4      T.root = s
5      s.leaf = FALSE
6      s.n = 0
7      s.c_1 = r
8      B-TREE-SPLIT-CHILD(s, 1)
9      B-TREE-INSERT-NONFULL(s, k)
10 else B-TREE-INSERT-NONFULL(r, k)
- Lines 3-9 handle the case in which the root node r is full.
- Splitting full nodes on the way down guarantees that the recursion never descends to a full node.
19 Insertion Example
20 B-Tree Insertion: pseudocode
B-TREE-INSERT-NONFULL(x, k)
    i = x.n
    if x.leaf
        while i ≥ 1 and k < x.key_i
            x.key_{i+1} = x.key_i
            i = i - 1
        x.key_{i+1} = k
        x.n = x.n + 1
        DISK-WRITE(x)
    else while i ≥ 1 and k < x.key_i
            i = i - 1
        i = i + 1
        DISK-READ(x.c_i)
        if x.c_i.n == 2t - 1
            B-TREE-SPLIT-CHILD(x, i)
            if k > x.key_i
                i = i + 1
        B-TREE-INSERT-NONFULL(x.c_i, k)
- If x is a leaf, insert k at the right location.
- If x is not a leaf, then:
  - Find the right child node.
  - If the child node is full, split it first (its median key moves up into this node).
  - Finally, make a recursive call to continue in the child node.
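Putting the pieces together, here is a compact in-memory sketch of the whole insertion path. It is an illustration under assumptions rather than a reference implementation: the `BTree` and `Node` classes are hypothetical, indices are 0-based, disk operations are omitted, and `bisect` from the standard library replaces the right-to-left scans.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty for a leaf

class BTree:
    def __init__(self, t: int):
        self.t = t
        self.root = Node()                         # an empty leaf, as in B-TREE-CREATE

    def _full(self, x: Node) -> bool:
        return len(x.keys) == 2 * self.t - 1

    def _split_child(self, x: Node, i: int) -> None:
        t, y = self.t, x.children[i]
        z = Node(y.keys[t:], y.children[t:])
        x.keys.insert(i, y.keys[t - 1])            # the median moves up
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:t - 1], y.children[:t]

    def insert(self, k) -> None:
        if self._full(self.root):                  # grow the tree at the top
            s = Node(children=[self.root])
            self.root = s
            self._split_child(s, 0)
        self._insert_nonfull(self.root, k)

    def _insert_nonfull(self, x: Node, k) -> None:
        if not x.children:                         # leaf: place k in sorted position
            bisect.insort_right(x.keys, k)
            return
        i = bisect.bisect_right(x.keys, k)         # child whose range contains k
        if self._full(x.children[i]):
            self._split_child(x, i)
            if k > x.keys[i]:                      # k belongs right of the new median
                i += 1
        self._insert_nonfull(x.children[i], k)

    def inorder(self, x=None) -> list:
        x = self.root if x is None else x
        if not x.children:
            return list(x.keys)
        out = []
        for i, child in enumerate(x.children):
            out += self.inorder(child)
            if i < len(x.keys):
                out.append(x.keys[i])
        return out

    def height(self) -> int:
        h, x = 0, self.root
        while x.children:
            h, x = h + 1, x.children[0]
        return h

tree = BTree(t=2)
for key in "FSQKCLHTVWMRNPABXYDZE":
    tree.insert(key)
print("".join(tree.inorder()))   # ABCDEFHKLMNPQRSTVWXYZ
```

An in-order walk returning the keys in sorted order is a quick way to confirm that the splits preserved the search-tree property.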
21 B-Tree Insertion: running time
B-TREE-INSERT together with B-TREE-INSERT-NONFULL:
- CPU time: O(t log_t n)
- Disk I/O: O(log_t n)
- Reason: the insertion makes a single pass down the tree, and the recursion never descends to a full node. Each of the O(log_t n) levels costs O(1) disk operations and O(t) CPU time.
22 Reading Assignment (Real)
Chapter 18.3: Deleting a key from a B-tree