Advanced Implementations of Tables: Balanced Search Trees and Hashing

Similar documents
Lecture: Analysis of Algorithms (CS )

Introduction to Hashtables

Searching. Constant time access. Hash function. Use an array? Better hash function? Hash function 4/18/2013. Chapter 9

Hash tables. Hash tables

Data Structures and Algorithms " Search Trees!!

Hash tables. Hash tables

Hashing. Data organization in main memory or disk

Fundamental Algorithms

Introduction to Hash Tables

INF2220: algorithms and data structures Series 1

INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class

Insert Sorted List Insert as the Last element (the First element?) Delete Chaining. 2 Slide courtesy of Dr. Sang-Eon Park

Motivation. Dictionaries. Direct Addressing. CSE 680 Prof. Roger Crawfis

Hash tables. Hash tables

CS Data Structures and Algorithm Analysis

A Lecture on Hashing. Aram-Alexandre Pooladian, Alexander Iannantuono March 22, Hashing. Direct Addressing. Operations - Simple

Lecture 5: Hashing. David Woodruff Carnegie Mellon University

Hash Tables. Direct-Address Tables Hash Functions Universal Hashing Chaining Open Addressing. CS 5633 Analysis of Algorithms Chapter 11: Slide 1

Binary Search Trees. Motivation

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts)

Dictionary: an abstract data type

Algorithms lecture notes 1. Hashing, and Universal Hash functions

1 Maintaining a Dictionary

ENS Lyon Camp. Day 2. Basic group. Cartesian Tree. 26 October

Solution suggestions for examination of Logic, Algorithms and Data Structures,

Hashing Data Structures. Ananda Gunawardena

Problem: Data base too big to fit memory Disk reads are slow. Example: 1,000,000 records on disk Binary search might take 20 disk reads

Dictionary: an abstract data type

Hashing, Hash Functions. Lecture 7

Symbol-table problem. Hashing. Direct-access table. Hash functions. CS Spring Symbol table T holding n records: record.

Search Trees. Chapter 10. CSE 2011 Prof. J. Elder Last Updated: :52 AM

Collision. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST

5 Spatial Access Methods

Searching, mainly via Hash tables

Module 1: Analyzing the Efficiency of Algorithms

So far we have implemented the search for a key by carefully choosing split-elements.

? 11.5 Perfect hashing. Exercises

16. Binary Search Trees. [Ottman/Widmayer, Kap. 5.1, Cormen et al, Kap ]

Amortized analysis. Amortized analysis

Binary Search Trees. Dictionary Operations: Additional operations: find(key) insert(key, value) erase(key)

Assignment 5: Solutions

AVL Trees. Manolis Koubarakis. Data Structures and Programming Techniques

5 Spatial Access Methods

7 Dictionary. EADS c Ernst Mayr, Harald Räcke 109

16. Binary Search Trees. [Ottman/Widmayer, Kap. 5.1, Cormen et al, Kap ]

Data Structures and Algorithms

Data Structures and Algorithm. Xiaoqing Zheng

Hashing. Dictionaries Hashing with chaining Hash functions Linear Probing

Premaster Course Algorithms 1 Chapter 3: Elementary Data Structures

Search Trees. EECS 2011 Prof. J. Elder Last Updated: 24 March 2015

compare to comparison and pointer based sorting, binary trees

Advanced Data Structures

Data Structures and Algorithms Winter Semester

Algorithms for Data Science

COMP251: Hashing. Jérôme Waldispühl School of Computer Science McGill University. Based on (Cormen et al., 2002)

CMSC 132, Object-Oriented Programming II Summer Lecture 12

CSCB63 Winter Week10 - Lecture 2 - Hashing. Anna Bretscher. March 21, / 30

Tutorial 4. Dynamic Set: Amortized Analysis

Discrete Mathematics

Review Of Topics. Review: Induction

7 Dictionary. 7.1 Binary Search Trees. Binary Search Trees: Searching. 7.1 Binary Search Trees

CS 240 Data Structures and Data Management. Module 4: Dictionaries

Array-based Hashtables

Integer Sorting on the word-ram

Hash Tables. Given a set of possible keys U, such that U = u and a table of m entries, a Hash function h is a

Lecture 1 : Data Compression and Entropy

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Ordered Dictionary & Binary Search Tree

Advanced Data Structures

Heaps and Priority Queues

Functional Data Structures

CSE548, AMS542: Analysis of Algorithms, Fall 2017 Date: Oct 26. Homework #2. ( Due: Nov 8 )

Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data

Hashing. Why Hashing? Applications of Hashing

Chapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn

past balancing schemes require maintenance of balance info at all times, are aggresive use work of searches to pay for work of rebalancing

Hashing. Hashing DESIGN & ANALYSIS OF ALGORITHM

Priority queues implemented via heaps

Data Structures and and Algorithm Xiaoqing Zheng

Problem. Maintain the above set so as to support certain

CS60007 Algorithm Design and Analysis 2018 Assignment 1

Optimal Color Range Reporting in One Dimension

1 Hashing. 1.1 Perfect Hashing

Each internal node v with d(v) children stores d 1 keys. k i 1 < key in i-th sub-tree k i, where we use k 0 = and k d =.

Part IA Algorithms Notes

CS361 Homework #3 Solutions

CS 591, Lecture 6 Data Analytics: Theory and Applications Boston University

Hashing. Dictionaries Chained Hashing Universal Hashing Static Dictionaries and Perfect Hashing. Philip Bille

Fundamental Algorithms

CS1802 Week 11: Algorithms, Sums, Series, Induction

Integer factorization, part 1: the Q sieve. part 2: detecting smoothness. D. J. Bernstein

Hashing. Hashing. Dictionaries. Dictionaries. Dictionaries Chained Hashing Universal Hashing Static Dictionaries and Perfect Hashing

Outline. 1 Merging. 2 Merge Sort. 3 Complexity of Sorting. 4 Merge Sort and Other Sorts 2 / 10

Lecture 8 HASHING!!!!!

Problem 1: (Chernoff Bounds via Negative Dependence - from MU Ex 5.15)

Graphs and Trees Binary Search Trees AVL-Trees (a,b)-trees Splay-Trees. Search Trees. Tobias Lieber. April 14, 2008

Weight-balanced Binary Search Trees

6.854 Advanced Algorithms

Chapter 5 Arrays and Strings 5.1 Arrays as abstract data types 5.2 Contiguous representations of arrays 5.3 Sparse arrays 5.4 Representations of

Transcription:

Advanced Implementations of Tables: Balanced Search Trees and Hashing

Balanced Search Trees Binary search tree operations such as insert, delete, retrieve, etc. depend on the length of the path to the desired node The path length can vary from log 2 (n+1) to O(n) depending on how balanced or unbalanced the tree is The shape of the tree is determined by the values of the items and the order in which they were inserted Binary searching & introduction to trees 2

Examples 10 20 30 40 20 60 40 50 60 10 30 50 70 Can you get the same tree with different insertion orders? 70 Binary searching & introduction to trees 3

2-3 Trees Each internal node has two or three children All leaves are at the same level 2 children = 2-node, 3 children = 3-node Binary searching & introduction to trees 4

2-3 Trees (continued) 2-3 trees are not binary trees (duh) A 2-3 tree of height h always has at least 2 h -1 nodes i.e. greater than or equal to a binary tree of height h A 2-3 tree with n nodes has height less than or equal to log 2 (n+1) i.e less than or equal to the height of a full binary tree with n nodes Binary searching & introduction to trees 5

Definition of 2-3 Trees T is a 2-3 tree of height h if T is empty (height 0), OR T is of the form r T L T R Where r is a node that contains one data item and T L and T R are 2-3 trees, each of height h-1, and the search key of r is greater than any in T L and less than any in T R, OR Binary searching & introduction to trees 6

Definition of 2-3 Trees (continued) T is of the form r T L T M T R Where r is a node that contains two data items and T L, T M, and T R are 2-3 trees, each of height h-1, and the smaller search key of r is greater than any in T L and less than any in T M and the larger search key in r is greater than any in T M and smaller than any in T R. Binary searching & introduction to trees 7

Placing Data in a 2-3 tree 1. A 2-node must contain a single data item whose value is greater than any in its left subtree, and smaller than any in its right subtree 2. A 3-mode must contain two data items, the smaller of which is greater than any in its left subtree, and smaller than any in its middle subtree, and the greater of which is greater than any in its middle subtree, and smaller than any in its right subtree 3. A leaf may contain either one or two data items Binary searching & introduction to trees 8

2-3 Node Code import searchkeys.*; public class Tree23Node { private KeyedItem smallitem; private KeyedItem largeitem; private Tree23Node leftchild; private Tree23Node middlechild; private Tree23Node rightchild; } Binary searching & introduction to trees 9

Traversing a 2-3 Tree Just like a binary tree: preorder, inorder, postorder 50 90 20 70 120 150 10 30 40 60 80 100 110 130 140 160 Binary searching & introduction to trees 10

Searching a 2-3 tree Same efficiency as a balanced binary search tree: O(logn) BUT, easy to keep balanced, unlike binary search trees This means that as you insert elements, the balance is easily maintained, and worst case performance remains O(logn) Binary searching & introduction to trees 11

Inserting 39 Into A 2-3 Tree 50 30 70 90 60 10 20 80 160 40 Binary searching & introduction to trees 12

Inserting 38 50 30 70 90 60 10 20 80 160 39 40 Binary searching & introduction to trees 13

Inserting 38 (continued) 50 30 70 90 60 10 20 80 160 38 39 40 Binary searching & introduction to trees 14

Inserting 37 50 30 39 70 90 60 10 20 80 160 38 40 Binary searching & introduction to trees 15

Inserting 36 50 30 39 70 90 60 10 20 80 160 37 38 40 Binary searching & introduction to trees 16

Inserting 36 (continued) 50 30 39 70 90 60 10 20 80 160 36 37 38 40 Binary searching & introduction to trees 17

Inserting 36 (continued) 50 30 37 39 70 90 10 20 36 38 40 60 80 160 Binary searching & introduction to trees 18

Inserting 75 37 50 30 39 70 90 10 20 36 38 40 60 80 160 Binary searching & introduction to trees 19

Inserting 77 37 50 30 39 70 90 10 20 36 38 40 60 75 80 160 Binary searching & introduction to trees 20

Inserting 77 (continued) 37 50 30 39 70 90 10 20 36 38 40 60 75 77 80 160 Binary searching & introduction to trees 21

Inserting 77 (continued) 37 50 30 39 70 77 90 10 20 36 38 40 60 75 80 160 Binary searching & introduction to trees 22

Inserting 77 (continued) 37 50 77 30 39 70 90 10 20 36 38 40 60 75 80 160 Binary searching & introduction to trees 23

Inserting 77 (continued) 50 37 77 30 39 70 90 10 20 36 38 40 60 75 80 160 Binary searching & introduction to trees 24

Inserting Into A 2-3 Tree Insert into the leaf node in which the search key belongs If the leaf has two values, stop If the leaf has three values, split the node into two nodes with the smallest and largest values, and Push the middle value into the parent node Continue with the parent node until either you push a value into a node that had only one value, or you create a new root node Binary searching & introduction to trees 25

Deleting From A 2-3 Tree The inverse of inserting Delete the value (in a leaf), then Merge empty nodes If necessary, delete empty root Binary searching & introduction to trees 26

Redistribute I P L S L S P Binary searching & introduction to trees 27

Merge I L S S L Binary searching & introduction to trees 28

Redistribute II P L S L S P a b c d a b c d Binary searching & introduction to trees 29

Merge II L S S L a b c a b c Binary searching & introduction to trees 30

Delete S L S L a b c a b c Binary searching & introduction to trees 31

2-3 Trees: Results Slightly more complicated than binary search trees, BUT 2-3 Trees are always balanced Every operation takes O(logn) Binary searching & introduction to trees 32

2-3-4 Trees Slightly less complicated than 2-3 Trees Each node can contain 1 3 values and have 1 4 children Inserting Split 4-nodes on the way down Insert into leaf Deleting: Only delete from 3-node or 4-node Binary searching & introduction to trees 33

Red-Black Trees 2-3 Trees are always balanced O(logn) time for all operations 2-3-4 Trees are always balanced O(logn) time for all operations, and Insertion and deletion can be done in a single pass from root to leaf But, require slightly more storage per node Red-Black Trees have the advantage of 2-3-4 trees, without the overhead Represent the 2-3-4 Tree as a binary tree with colored references (red or black) Binary searching & introduction to trees 34

Red-Black Representation of a 4-Node M S M L S L a b c d a b c d Binary searching & introduction to trees 35

Red-Black Representation of a 3-Node L S L a b c a S S b c a L b c Binary searching & introduction to trees 36

2-3-4 Tree 37 50 30 35 39 70 90 10 20 32 33 34 36 38 40 60 80 160 Binary searching & introduction to trees 37

Equivalent Red-Black Tree 37 30 50 20 35 39 90 10 33 36 38 40 70 100 32 34 60 80 Binary searching & introduction to trees 38

Red-Black Tree Node public class RBTreeNode { public static final int RED = 0; public static final int BLACK = 1; } private KeyedItem item; private RBTreeNode leftchild; private RBTreeNode rightchild; private int leftcolor; private int rightcolor; Binary searching & introduction to trees 39

Searching and Traversing Red-Black Trees Red-Black trees are binary search trees Just search them the same way you would any other binary search tree Inserting Split 4-nodes on the way down by changing paired red child references to black Insert into a leaf Binary searching & introduction to trees 40

Splitting a 4-node root M M S L S L a b c d a b c d Binary searching & introduction to trees 41

Splitting a 4-node P P M e M e S L S L a b c d a b c d Binary searching & introduction to trees 42

Splitting a 4-node P P a M a M S L S L b c d e b c d e Binary searching & introduction to trees 43

Splitting a 4-node P P M Q M Q S L e f S L e f a b c d a b c d Binary searching & introduction to trees 44

Splitting a 4-node Q P P f M Q M e S L e f S L a b c d a b c d Binary searching & introduction to trees 45

Splitting a 4-node P a Q M P Q M f a S L f S L b c d e b c d e Binary searching & introduction to trees 46

AVL Trees Insert in the appropriate spot Rotate as necessary to restore balance Rotate your way back up the tree Every operation is O(logn) Binary searching & introduction to trees 47

Hashing Trees allow for efficient table operations, but What if you want O(1) behavior? What if time is much more critical than space? Basic idea: Same for insert, lookup, delete Array Search Key Hash Function Number in 0..n-1 index Hash Table Binary searching & introduction to trees 48

Issues How to create a hash function Easy or difficult, depending upon the desired properties How large the array should be Factors: how many items will be stored, how much memory you have, how fast you want the operations to be Static or dynamic Hash collisions: what if two input values produce the same hash value? Binary searching & introduction to trees 49

Hash Functions Goals Fast, easy to compute Distributes hash values evenly in the target range Possible functions for an n-entry hash table h(k) = random number in 0..n-1 h(k) = the sum of the digits in k h(k) = the first m digits of k, where m = log 10 (n) h(k) = k mod(n) Are these good or bad? Why? Binary searching & introduction to trees 50

Hash Functions on Strings Need to convert string to a number Possible solutions: Add the binary representations of the characters Concatenate the binary representations to get a very large number Hello = H 32 4 + e 32 3 + l 32 2 + l 32 + o Horner s rule: (((H 32 + e) 32 + l) 32 + l) 32 + o This is a very big number, so (A B) mod n = (A mod n B mod n) mod n (A + B) mod n = (A mod n + B mod n) mod n Apply the modulo operator early and often to keep the number small Binary searching & introduction to trees 51

Dealing with Hash Collisions Open addressing: Collision try to place the object in other locations in a predictable sequence Linear probing: search starting from the current location Issues: wrapping, slow, empty/deleted entries, clustering Quadratic probing: search +1,4,9,16,25modn Double hashing: Use second hash function to determine the size of the steps: h 2 (k) 0 h 2 h 1 Example: h 1 (k) = k mod 11, h 2 (k) = 7 (k mod 7) Note: size of steps and size of table must be relatively prime so that all entries get visited Use prime size, step, etc. Binary searching & introduction to trees 52

Increase The Size of the Hash Table Increasing the actual size is infeasible Everything must be rehashed and moved Chain-bucket hashing Each entry in a hash table is a bucket into which multiple values may be added Each bucket can be implemented as a chain, or linked list Now the size of the hash table is variable Binary searching & introduction to trees 53

Efficiency of Hashing Load factor α = Number _ of Size _ of items table α is a measure of how full the table is Small α little chance of collision and low search time Large α high chance of collision and high search time Unsuccessful searches require more time than successful searches Binary searching & introduction to trees 54

Linear Probing Successful search 1 2 1 + 1 1 α Unsuccessful search 1 2 1 1 + 2 1 ( α ) Binary searching & introduction to trees 55

Quadratic Probing and Double Hashing Successful search ( ) log 1 α e α Unsuccessful search 1 1 α Binary searching & introduction to trees 56

Chain-Bucket Hashing Successful search 1+ α 2 Unsuccessful search α Binary searching & introduction to trees 57

How well does a hash function work? How fast is it to compute? How well does it scatter random data? How well does it scatter non-random data? This can be very important It is always possible to construct a worst case General principles: The hash function should involve the whole search key If a hash function involves a modulo operation, the base should be prime Binary searching & introduction to trees 58

Hashing vs. Binary Trees Hashing supports efficient insert and remove Hashing does not support efficient sorting Hashing does not support efficient range queries Binary searching & introduction to trees 59