B-TREE. Michael Tsai, 2017/06/06


2 B-Tree Overview
- Balanced search tree with a very large branching factor
- Height = O(log n), but much smaller than that of a red-black tree
- Usage: a large amount of data, stored partially in memory and partially in secondary storage (e.g., a hard drive)
- Goals:
  1. Minimize disk I/O operations
  2. Minimize CPU time

3 Typical Storage Speed / Capacity

Storage     | Read speed                          | Capacity
Hard drive  | Typically ~100 MB/s                 | Up to 10 TB
SSD         | ~500 MB/s                           | Up to 1 TB
SD          | 30 MB/s (UHS-3), 10 MB/s (class 10) | Up to 1 TB (typically 32 GB or 64 GB)
Memory      | 6400 MB/s (DDR3)                    | Desktop/laptop has 4~16 GB

4 Typical B-Tree (keys)

T.root:  [M]
           [D H]:    [B C]  [F G]  [J K L]
           [Q T X]:  [N P]  [R S]  [V W]  [Y Z]

- An internal node x has x.n keys (e.g., 3 in node Q T X).
- An internal node x has x.n + 1 children (e.g., 4 for node Q T X).
- Keys in x separate the ranges of keys in its subtrees.

5 Typical B-Tree (search)
Search for key R in the tree of the previous slide: R > M, so descend into node Q T X; Q < R < T, so descend into leaf R S, where R is found.

6 A more realistic B-tree
(Figure: T.root holds 1000 keys and has 1001 children; each child again holds 1000 keys and has 1001 children.)
- Usually only one node's keys/data are read from the disk at a time.
- The root is always kept in memory.
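To get a feel for the scale in the figure, the sketch below counts nodes and keys under the simplifying assumption that every node holds the same number of keys. The function name `btree_capacity` and its parameters are illustrative and not part of the slides.

```python
def btree_capacity(keys_per_node, height):
    """Return (nodes, keys) for a tree in which every node holds
    keys_per_node keys and every internal node has keys_per_node + 1 children."""
    fanout = keys_per_node + 1
    # Depth d contains fanout**d nodes; sum over depths 0..height.
    nodes = sum(fanout ** depth for depth in range(height + 1))
    return nodes, nodes * keys_per_node


nodes, keys = btree_capacity(1000, 2)
print(f"{nodes:,} nodes, {keys:,} keys")
```

With 1000 keys per node, a tree of height 2 already holds over one billion keys, and since the root stays in memory, any key can be reached with at most two disk reads.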

7 B-Tree Definition (1)
A B-tree is a rooted tree. For each node x:
- x.n is the number of keys in x.
- The keys x.key_1, x.key_2, ..., x.key_{x.n} are stored in non-decreasing order: $x.key_1 \le x.key_2 \le \dots \le x.key_{x.n}$.
- x.leaf is TRUE if x is a leaf and FALSE otherwise.
- Each internal node x contains x.n + 1 pointers x.c_1, x.c_2, ..., x.c_{x.n+1} to its children. Leaves have these undefined.

8 B-Tree Definitions (2)
- The keys x.key_i separate the ranges of keys stored in each subtree: if k_i is any key stored in the subtree with root x.c_i, then
  $k_1 \le x.key_1 \le k_2 \le x.key_2 \le \dots \le x.key_{x.n} \le k_{x.n+1}$
- All leaves have the same depth: the tree's height h.
- Minimum degree of the B-tree: an integer $t \ge 2$.
  - Every node other than the root has at least t - 1 keys (thus every internal node other than the root has at least t children).
  - Every node has at most 2t - 1 keys (thus an internal node has at most 2t children). A node with exactly 2t - 1 keys is full.
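The properties above can be checked mechanically. The following is a minimal sketch, assuming an in-memory node with 0-based Python lists instead of the 1-based x.key_i and x.c_i of the slides. The names `BTreeNode` and `check` are hypothetical, and the example tree is the one from the "Typical B-Tree" slide.

```python
from dataclasses import dataclass, field


@dataclass
class BTreeNode:
    keys: list = field(default_factory=list)      # x.key_1 .. x.key_n (0-based here)
    children: list = field(default_factory=list)  # x.c_1 .. x.c_{n+1}; empty for a leaf

    @property
    def leaf(self):
        return not self.children


def check(node, t, lo=None, hi=None, is_root=True):
    """Return the height of the subtree; raise AssertionError if a property fails."""
    n = len(node.keys)
    assert n <= 2 * t - 1, "too many keys"
    assert is_root or n >= t - 1, "too few keys"
    assert node.keys == sorted(node.keys), "keys out of order"
    # Every key must lie inside the range inherited from the ancestors.
    assert all(lo is None or lo <= k for k in node.keys), "key below range"
    assert all(hi is None or k <= hi for k in node.keys), "key above range"
    if node.leaf:
        return 0
    assert len(node.children) == n + 1, "internal node needs n + 1 children"
    bounds = [lo] + node.keys + [hi]
    heights = {check(c, t, bounds[i], bounds[i + 1], False)
               for i, c in enumerate(node.children)}
    assert len(heights) == 1, "leaves at different depths"
    return heights.pop() + 1


root = BTreeNode(list("M"), [
    BTreeNode(list("DH"), [BTreeNode(list("BC")), BTreeNode(list("FG")),
                           BTreeNode(list("JKL"))]),
    BTreeNode(list("QTX"), [BTreeNode(list("NP")), BTreeNode(list("RS")),
                            BTreeNode(list("VW")), BTreeNode(list("YZ"))]),
])
print(check(root, t=2))  # height of the example tree
```

Representing a leaf as "no children" is a design shortcut for this sketch; the slides store an explicit x.leaf flag instead.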

9 Proof: B-Tree Height
If $n \ge 1$, then for any n-key B-tree T of height h and minimum degree $t \ge 2$,
$$h \le \log_t \frac{n+1}{2}.$$
Proof: Consider the case in which every node has the least possible number of keys: the root has 1 key and every other node has t - 1 keys.

depth | number of nodes
0     | 1
1     | 2
2     | 2t
3     | 2t^2

10 Proof: B-Tree Height
$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\,\frac{t^h - 1}{t-1} = 2t^h - 1$$
$$\Rightarrow\ t^h \le \frac{n+1}{2} \ \Rightarrow\ h \le \log_t \frac{n+1}{2}$$
Hence h = O(log n).
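The bound can be sanity-checked numerically. The sketch below recomputes the minimum key count level by level, following the counting argument of the proof, and derives the largest possible height for a given n. The function names are illustrative only.

```python
def min_keys(t, h):
    """Fewest keys a B-tree of height h and minimum degree t can hold."""
    # The root has 1 key; depth i >= 1 has 2 * t**(i - 1) nodes with t - 1 keys each.
    return 1 + (t - 1) * sum(2 * t ** (i - 1) for i in range(1, h + 1))


def max_height(t, n):
    """Largest h such that a B-tree with n keys and minimum degree t can exist."""
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h


for t in (2, 1001):
    print(t, max_height(t, 10**9))
```

For one billion keys this gives a height of at most 28 with t = 2 but at most 2 with t = 1001, which is why a large branching factor matters when each level costs a disk read.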

11 Disk Operation
- DISK-READ(x): required before accessing x if x is not in memory; a no-op if x is already in memory.
- DISK-WRITE(x): required to put any changes to x back on the disk.
- The root is always stored in memory.
Typical workflow:
    x = pointer to an object
    DISK-READ(x)
    (operations that modify x)
    DISK-WRITE(x)
    (operations that access x but do not modify it)
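The workflow above can be mimicked with a toy model that counts page transfers. This is only a sketch: `SimulatedDisk` and its dictionaries are hypothetical stand-ins for real secondary storage, and pages are plain dictionaries.

```python
class SimulatedDisk:
    """Toy stand-in for secondary storage that counts page transfers."""

    def __init__(self):
        self.pages = {}      # page id -> copy stored "on disk"
        self.in_memory = {}  # page id -> working copy currently in memory
        self.reads = 0
        self.writes = 0

    def disk_read(self, page_id):
        # No-op if the object is already in memory.
        if page_id not in self.in_memory:
            self.in_memory[page_id] = dict(self.pages[page_id])
            self.reads += 1
        return self.in_memory[page_id]

    def disk_write(self, page_id):
        # Put the in-memory changes back on the disk.
        self.pages[page_id] = dict(self.in_memory[page_id])
        self.writes += 1


disk = SimulatedDisk()
disk.pages["x"] = {"n": 0}
x = disk.disk_read("x")   # first access: one real read
x["n"] += 1               # modify x in memory
disk.disk_write("x")      # persist the change
disk.disk_read("x")       # already in memory: no additional read
print(disk.reads, disk.writes)
```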

12 Search in B-Tree

B-TREE-SEARCH(x, k)
1  i = 1
2  while i ≤ x.n and k > x.key_i
3      i = i + 1
4  if i ≤ x.n and k == x.key_i
5      return (x, i)
6  elseif x.leaf
7      return NIL
8  else DISK-READ(x.c_i)
9      return B-TREE-SEARCH(x.c_i, k)

Input: x: the node to search from; k: the key to search for
Return value: (x, i) if key k is found as the i-th key of node x; NIL if k is not in the tree
CPU time: O(t log_t n)
Disk I/O: O(log_t n)
Using a linear search, lines 1-3 find the smallest index i such that k ≤ x.key_i, or else they set i to x.n + 1.
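A direct Python transcription of the search procedure might look as follows. It is a sketch under a few assumptions: nodes live entirely in memory (the DISK-READ step is only marked by a comment), indices are 0-based, `None` plays the role of NIL, and the `Node` class is hypothetical. The example tree is the one from the "Typical B-Tree" slide.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

    @property
    def leaf(self):
        return not self.children


def btree_search(x, k):
    """Return (node, index) with node.keys[index] == k, or None if k is absent."""
    i = 0
    # Linear scan for the smallest index i with k <= x.keys[i].
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if x.leaf:
        return None
    # A disk-backed version would call DISK-READ on x.children[i] here.
    return btree_search(x.children[i], k)


root = Node(list("M"), [
    Node(list("DH"), [Node(list("BC")), Node(list("FG")), Node(list("JKL"))]),
    Node(list("QTX"), [Node(list("NP")), Node(list("RS")),
                       Node(list("VW")), Node(list("YZ"))]),
])
node, i = btree_search(root, "R")
print(node.keys, i)
print(btree_search(root, "E"))
```

Because a node can hold up to 2t - 1 keys, the linear scan could be replaced by a binary search within the node; the slide's O(t log_t n) CPU bound assumes the linear version shown here.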

13 Create an empty B-Tree

B-TREE-CREATE(T)
    x = ALLOCATE-NODE()
    x.leaf = TRUE
    x.n = 0
    DISK-WRITE(x)
    T.root = x

ALLOCATE-NODE() is an O(1) operation that allocates a disk page to store a new node.
CPU time: O(1)
Disk I/O: O(1)

14 B-Tree Insertion: Overview
- We cannot simply create a new leaf node and insert it: this would violate the B-tree definition.
- Solution: insert into an existing leaf node.
- Problem: what if that leaf node is already full? (Full: having 2t - 1 keys, and 2t children if the node is internal.)
- Solution: split a full node y around its median key y.key_t:
  y.key_1, y.key_2, ..., y.key_{t-1} < y.key_t < y.key_{t+1}, y.key_{t+2}, ..., y.key_{2t-1}
  (t - 1 keys smaller than the median, t - 1 keys larger than the median)
- Then move y.key_t up to y's parent node.
- What if y's parent is also full? We split it too.
- Workflow: start from the root (as in search) and split every full node encountered on the way down.

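15 B-Tree Insertion: Overview
Example with t = 4 (a full node has 2t - 1 = 7 keys).

Before the split:
  x:              ... N W ...      (x.key_{i-1} = N, x.key_i = W)
  y = x.c_i:      P Q R S T U V    (subtrees T_1 ... T_8)

After the split:
  x:              ... N S W ...    (x.key_{i-1} = N, x.key_i = S, x.key_{i+1} = W)
  y = x.c_i:      P Q R            (subtrees T_1 ... T_4)
  z = x.c_{i+1}:  T U V            (subtrees T_5 ... T_8)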

16 B-Tree: Split Full Child
Split node x's i-th child, which is full.

B-TREE-SPLIT-CHILD(x, i)
    z = ALLOCATE-NODE()
    y = x.c_i
    z.leaf = y.leaf
    z.n = t - 1
    for j = 1 to t - 1
        z.key_j = y.key_{j+t}
    if not y.leaf
        for j = 1 to t
            z.c_j = y.c_{j+t}
    y.n = t - 1
    for j = x.n + 1 downto i + 1
        x.c_{j+1} = x.c_j
    x.c_{i+1} = z
    for j = x.n downto i
        x.key_{j+1} = x.key_j
    x.key_i = y.key_t
    x.n = x.n + 1
    DISK-WRITE(y)
    DISK-WRITE(z)
    DISK-WRITE(x)

CPU time: O(t)
Disk I/O: O(1)
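The split can be expressed compactly with list slicing. The sketch below assumes in-memory nodes, 0-based indices, and a hypothetical `Node` class; the DISK-WRITE calls are omitted. The example reuses the t = 4 node P Q R S T U V from the overview slide, with two made-up sibling leaves so that x has a valid number of children.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []

    @property
    def leaf(self):
        return not self.children


def split_child(x, i, t):
    """Split the full child x.children[i] (0-based) around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "child must be full"
    # z takes the t - 1 largest keys and, if y is internal, the last t children.
    z = Node(keys=y.keys[t:], children=y.children[t:])
    median = y.keys[t - 1]
    # y keeps the t - 1 smallest keys and the first t children.
    y.keys = y.keys[:t - 1]
    y.children = y.children[:t]
    # The median moves up into x, with z placed just right of y.
    x.keys.insert(i, median)
    x.children.insert(i + 1, z)


# Sibling leaves "ABC" and "XYZ" are placeholders, not taken from the slides.
x = Node(list("NW"), [Node(list("ABC")), Node(list("PQRSTUV")), Node(list("XYZ"))])
split_child(x, 1, t=4)
print(x.keys, [c.keys for c in x.children])
```

`list.insert` shifts the later elements, which corresponds to the two `downto` loops of the pseudocode.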

17 B-Tree: Split the Root

Before the split:
  T.root = r:  A D F H L N P    (subtrees T_1 ... T_8)

After the split:
  T.root = s:  H
    r:         A D F            (subtrees T_1 ... T_4)
    new node:  L N P            (subtrees T_5 ... T_8)

- Splitting the root is the only way to increase the height of a B-tree.
- The height increases at the top, not at the bottom.

18 B-Tree: Split the Root
Please review the pseudocode below yourself.

B-TREE-INSERT(T, k)
1   r = T.root
2   if r.n == 2t - 1
3       s = ALLOCATE-NODE()
4       T.root = s
5       s.leaf = FALSE
6       s.n = 0
7       s.c_1 = r
8       B-TREE-SPLIT-CHILD(s, 1)
9       B-TREE-INSERT-NONFULL(s, k)
10  else B-TREE-INSERT-NONFULL(r, k)

Lines 3-9 handle the case in which the root node is full.

19 Insertion Example

20 B-Tree Insertion: pseudocode

B-TREE-INSERT-NONFULL(x, k)
    i = x.n
    if x.leaf
        while i ≥ 1 and k < x.key_i
            x.key_{i+1} = x.key_i
            i = i - 1
        x.key_{i+1} = k
        x.n = x.n + 1
        DISK-WRITE(x)
    else while i ≥ 1 and k < x.key_i
            i = i - 1
        i = i + 1
        DISK-READ(x.c_i)
        if x.c_i.n == 2t - 1
            B-TREE-SPLIT-CHILD(x, i)
            if k > x.key_i
                i = i + 1
        B-TREE-INSERT-NONFULL(x.c_i, k)

- If x is a leaf, insert k at the right location.
- If x is not a leaf:
  - Find the right child node.
  - If the child node is full, split it first (its median key moves up into this node).
  - Finally, make a recursive call to continue in the child node.
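Putting the insertion procedures together, a minimal in-memory sketch could look like this. It assumes 0-based indices, leaves represented as nodes without children, and no disk layer; the class and method names (`BTree`, `_split_child`, `_insert_nonfull`, `inorder`) are hypothetical, and `inorder` exists only to inspect the result.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []

    @property
    def leaf(self):
        return not self.children


class BTree:
    def __init__(self, t):
        self.t = t
        self.root = Node()  # empty leaf, as after B-TREE-CREATE

    def _split_child(self, x, i):
        t = self.t
        y = x.children[i]
        z = Node(y.keys[t:], y.children[t:])
        x.keys.insert(i, y.keys[t - 1])      # median moves up
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:t - 1], y.children[:t]

    def _insert_nonfull(self, x, k):
        # i ends as the number of keys in x that are <= k.
        i = len(x.keys)
        while i > 0 and k < x.keys[i - 1]:
            i -= 1
        if x.leaf:
            x.keys.insert(i, k)
            return
        if len(x.children[i].keys) == 2 * self.t - 1:
            self._split_child(x, i)
            if k > x.keys[i]:                # k belongs right of the new median
                i += 1
        self._insert_nonfull(x.children[i], k)

    def insert(self, k):
        r = self.root
        if len(r.keys) == 2 * self.t - 1:    # full root: grow the tree at the top
            s = Node(children=[r])
            self.root = s
            self._split_child(s, 0)
            self._insert_nonfull(s, k)
        else:
            self._insert_nonfull(r, k)

    def inorder(self, x=None):
        x = self.root if x is None else x
        if x.leaf:
            return list(x.keys)
        out = []
        for i, child in enumerate(x.children):
            out += self.inorder(child)
            if i < len(x.keys):
                out.append(x.keys[i])
        return out


tree = BTree(t=2)
for key in [50, 20, 70, 10, 30, 60, 80, 40, 90, 5]:
    tree.insert(key)
print(tree.inorder())
```

Splitting full nodes on the way down means a single pass from root to leaf suffices; the insertion never has to walk back up the tree.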

21 B-Tree Insertion: running time
For B-TREE-INSERT together with B-TREE-INSERT-NONFULL (pseudocode on the previous slides):
- CPU time: O(t log_t n)
- Disk I/O: O(log_t n)
The recursion never descends to a full node, so the insertion makes a single pass from the root to a leaf. Each level costs O(1) disk operations and O(t) CPU time (including any call to B-TREE-SPLIT-CHILD), and the height is h = O(log_t n).

22 Reading Assignment (Real)
Chapter 18.3: Deleting a key from a B-tree