Hash Tables. Direct-Address Tables Hash Functions Universal Hashing Chaining Open Addressing. CS 5633 Analysis of Algorithms Chapter 11: Slide 1

Similar documents
Data Structures and Algorithm. Xiaoqing Zheng

Hashing, Hash Functions. Lecture 7

Symbol-table problem. Hashing. Direct-access table. Hash functions. CS Spring Symbol table T holding n records: record.

Hash tables. Hash tables

Fundamental Algorithms

Hash tables. Hash tables

Hash tables. Hash tables

Insert Sorted List Insert as the Last element (the First element?) Delete Chaining. 2 Slide courtesy of Dr. Sang-Eon Park

Hash Tables. Given a set of possible keys U, such that U = u and a table of m entries, a Hash function h is a

Introduction to Hashtables

Lecture: Analysis of Algorithms (CS )

Hashing. Data organization in main memory or disk

COMP251: Hashing. Jérôme Waldispühl School of Computer Science McGill University. Based on (Cormen et al., 2002)

Searching. Constant time access. Hash function. Use an array? Better hash function? Hash function 4/18/2013. Chapter 9

Motivation. Dictionaries. Direct Addressing. CSE 680 Prof. Roger Crawfis

Data Structures and Algorithm. Xiaoqing Zheng

Advanced Implementations of Tables: Balanced Search Trees and Hashing

CSE 502 Class 11 Part 2

Introduction to Hash Tables

Collision. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST

A Lecture on Hashing. Aram-Alexandre Pooladian, Alexander Iannantuono March 22, Hashing. Direct Addressing. Operations - Simple

Hashing. Dictionaries Hashing with chaining Hash functions Linear Probing

CSCB63 Winter Week10 - Lecture 2 - Hashing. Anna Bretscher. March 21, / 30

Round 5: Hashing. Tommi Junttila. Aalto University School of Science Department of Computer Science

? 11.5 Perfect hashing. Exercises

INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class

Amortized analysis. Amortized analysis

12 Hash Tables Introduction Chaining. Lecture 12: Hash Tables [Fa 10]

Hashing. Hashing DESIGN & ANALYSIS OF ALGORITHM

Analysis of Algorithms I: Perfect Hashing

Searching, mainly via Hash tables

Hashing. Why Hashing? Applications of Hashing

Problem 1: (Chernoff Bounds via Negative Dependence - from MU Ex 5.15)

Algorithms lecture notes 1. Hashing, and Universal Hash functions

Module 1: Analyzing the Efficiency of Algorithms

Array-based Hashtables

So far we have implemented the search for a key by carefully choosing split-elements.

Abstract Data Type (ADT) maintains a set of items, each with a key, subject to

Hashing. Algorithm : Design & Analysis [09]

CS483 Design and Analysis of Algorithms

Hashing. Martin Babka. January 12, 2011

Hashing Data Structures. Ananda Gunawardena

CS 591, Lecture 6 Data Analytics: Theory and Applications Boston University

Lecture Notes for Chapter 17: Amortized Analysis

Hashing. Dictionaries Chained Hashing Universal Hashing Static Dictionaries and Perfect Hashing. Philip Bille

Lecture 7: Amortized analysis

Hashing. Hashing. Dictionaries. Dictionaries. Dictionaries Chained Hashing Universal Hashing Static Dictionaries and Perfect Hashing

CS 473: Algorithms. Ruta Mehta. Spring University of Illinois, Urbana-Champaign. Ruta (UIUC) CS473 1 Spring / 32

Tutorial 4. Dynamic Set: Amortized Analysis

Introduction to Algorithms

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

Cache-Oblivious Hashing

6.1 Occupancy Problem

Graduate Analysis of Algorithms Dr. Haim Levkowitz

11. Hash Tables. m is not too large. Many applications require a dynamic set that supports only the directory operations INSERT, SEARCH and DELETE.

s 1 if xπy and f(x) = f(y)

Hashing. Alexandra Stefan

1 Maintaining a Dictionary

Problem Set 4 Solutions

N/4 + N/2 + N = 2N 2.

Solution suggestions for examination of Logic, Algorithms and Data Structures,

Algorithms for Data Science

1 Hashing. 1.1 Perfect Hashing

Quiz 1 Solutions. (a) f 1 (n) = 8 n, f 2 (n) = , f 3 (n) = ( 3) lg n. f 2 (n), f 1 (n), f 3 (n) Solution: (b)

CS361 Homework #3 Solutions

Lecture Lecture 3 Tuesday Sep 09, 2014

Today: Amortized Analysis

CSE548, AMS542: Analysis of Algorithms, Fall 2017 Date: Oct 26. Homework #2. ( Due: Nov 8 )

CS 580: Algorithm Design and Analysis

CS 4407 Algorithms Lecture 2: Iterative and Divide and Conquer Algorithms

Another way of saying this is that amortized analysis guarantees the average case performance of each operation in the worst case.

CS1802 Week 11: Algorithms, Sums, Series, Induction

Lecture 2: A Las Vegas Algorithm for finding the closest pair of points in the plane

7 Dictionary. EADS c Ernst Mayr, Harald Räcke 109

Amortized Analysis (chap. 17)

compare to comparison and pointer based sorting, binary trees

data structures and algorithms lecture 2

Data Structure. Mohsen Arab. January 13, Yazd University. Mohsen Arab (Yazd University ) Data Structure January 13, / 86

Algorithm Design CS 515 Fall 2015 Sample Final Exam Solutions

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts)

Pipelined Computations

CS 4407 Algorithms Lecture 2: Growth Functions

Premaster Course Algorithms 1 Chapter 3: Elementary Data Structures

Introduction to Algorithms

Hash Table Analysis. e(k) = bp(k). kp(k) = n b.

Data Structures. Element Uniqueness Problem. Hash Tables. Example. Hash Tables. Dana Shapira. 19 x 1. ) h(x 4. ) h(x 2. ) h(x 3. h(x 1. x 4. x 2.

Module 9: Tries and String Matching

Outline. CS38 Introduction to Algorithms. Max-3-SAT approximation algorithm 6/3/2014. randomness in algorithms. Lecture 19 June 3, 2014

1.1 Our Contribution. 1.2 Paper Organization. The Goshawk Protocol 2.1 Improved Two-layer Hybrid Consensus

Hash Tables (Cont'd) Carlos Moreno uwaterloo.ca EIT

7 Dictionary. 7.1 Binary Search Trees. Binary Search Trees: Searching. 7.1 Binary Search Trees

CS Data Structures and Algorithm Analysis

CS 161 Summer 2009 Homework #2 Sample Solutions

Databases. DBMS Architecture: Hashing Techniques (RDBMS) and Inverted Indexes (IR)

Basics of hashing: k-independence and applications

Randomized Algorithms. Zhou Jun

Lecture 12: Hash Tables [Sp 15]

Electrical & Computer Engineering University of Waterloo Canada February 6, 2007

University of New Mexico Department of Computer Science. Final Examination. CS 561 Data Structures and Algorithms Fall, 2013

CS Asymptotic Notations for Algorithm Analysis

Transcription:

Hash Tables Direct-Address Tables Hash Functions Universal Hashing Chaining Open Addressing CS 5633 Analysis of Algorithms Chapter 11: Slide 1

Direct-Address Tables 2 2 Let U = {0,...,m 1}, the set of possible keys. Use array T[0...m 1] as a direct-address table. Implies 1-1 correspondence between keys and slots. Direct-Address-Search(T, k) return T[k] Direct-Address-Insert(T, x) T[x.key] x Direct-Address-Delete(T, k) T[x.key] nil Advantage: operations are Θ(1). Disadvantage: Θ( U ) space required. CS 5633 Analysis of Algorithms Chapter 11: Slide 2

Hash Tables 2 2 Let K be the set of keys to be stored. Goal: use Θ( K ) space and Θ(1) time/op. Idea: Use array T[0...m 1] as a hash table, and use a Θ(1) hash function h, where h: U {0,...,m 1} maps from keys to slots. A collision is when two keys map to the same slot. CS 5633 Analysis of Algorithms Chapter 11: Slide 3

Good Hash Functions 2 2 Division method: h(k) = k mod m m is prime, not close to any 2 i. Division variation: h(k) = (k mod M) mod m M is prime, << than U, not close to any 2 i. m is << than M. Multiplication method: h(k) = m((ka) mod 1) m is a power of 2. A = ( 5 1)/2 CS 5633 Analysis of Algorithms Chapter 11: Slide 4

Horner s Method for Division Hash Function 2 2 If k = k[1],...,k[l], and if 0 k[i] < r, then compute hash function by: h k[1] mod m for i 2 to l do h (rh+k[i]) mod m CS 5633 Analysis of Algorithms Chapter 11: Slide 5

Universal Hashing 2 2 Let H be a set of hashing functions. H is universal if h(k) = h(k ) with prob. 1/m m is a prime number. k = k[1],...,k[l], where 0 k[i] < m Assign a[i] Random(0,m 1) ( ) l h(k) = Σ a[i] k[i] mod m i=1 The set of possible functions h(k) is universal. h(k) = h(k ) with prob. 1/m. If k[i] k [i], (a[i] (k[i] k [i])) mod m has equally likely results. CS 5633 Analysis of Algorithms Chapter 11: Slide 6

Chaining 2 2 In, slots are linked lists of the elements that hash to that slot, i.e., collisions. Consider m slots, n elts., load factor α = n/m. Worst-case: Θ(n) if all elts. hash to same slot. Best-case: Θ(1+α), each slot has α or α. Average-case: Assume each slot is equally likely. Unsuccessful search: Θ(1 + α) This is because average slot length = α. CS 5633 Analysis of Algorithms Chapter 11: Slide 7

Chaining, Part 2 2 2 Successful search: Θ(1 + α) Before ith elt. inserted, avg. length = (i 1)/m. Expected position of ith elt. = 1+(i 1)/m. Expected search length is the summation: n Σ i=1 n elements to search for. 1/n Prob. for ith element is 1/n. 1+(i 1)/m Expected position of ith elt. n Σ i=1 ( )( 1 1+ i 1 ) n m = 1+ α 2 1 2m CS 5633 Analysis of Algorithms Chapter 11: Slide 8

Open-Address Hashing 2 2 In ing, when a collision occurs, probe for an empty slot and insert the new elt. there. The hash function becomes: h : U {0,...,m 1} {0,...,m 1} The probe sequence h(k,0),...,h(k,m 1) should include all the slots. CS 5633 Analysis of Algorithms Chapter 11: Slide 9

Open-Address Hashing, Part 2 2 2 Hash-Insert(T, x) for i 0 to m 1 do j h(x.key,i) if T[j] = nil then T[j] x return j error hash table overflow Hash-Delete marks the slot as deleted. Hash-Search must continue past deleted slots. Hash-Insert can put new elts. in deleted slots. CS 5633 Analysis of Algorithms Chapter 11: Slide 10

Uniform Hashing Analysis 2 2 Uniform hashing assumes each open-address probe-sequence is equally likely. Unsuccessful Search: Θ ( ) 1 Let p i = prob. exactly i probes find full slots. Let q i = prob. first i probes find full slots. p i = q i q i+1 q 1 = n/m = α and q 2 = ( )( n n 1 ) m m 1 < α 2 q i = i 1 Π k=0 n k ( n ) i m k = α i m CS 5633 Analysis of Algorithms Chapter 11: Slide 11

Uniform Hashing Analysis, Part 2 2 2 Average number of probes is: 1+ Σ n ip i = 1+ Σ n q i Σ α i = 1 i=1 i=1 i=0 Successful Search: Θ ( 1 α ln ) 1 Inserting ith elt. = unsuccessful search i 1 elts. Average number of probes is: ( )( n 1 1 Σ i=1 n 1 (i 1)/m ) 1 α ln 1 CS 5633 Analysis of Algorithms Chapter 11: Slide 12

Performance of Practical Methods 2 2 Linear Probing: h(k,i) = (h (k)+i) mod m Successful Search: Θ ( ) 1 ( 1 Unsuccessful Search: Θ () 2 ) Linear probing suffers from primary clustering, from long runs of occupied slots. An empty slot preceded by i full slots gets filled next with probability (i + 1)/m. CS 5633 Analysis of Algorithms Chapter 11: Slide 13

Performance of Practical Methods 2 2 Quadratic Probing assumes m is a power of 2. h(k,i) = (h (k)+ i 2 + i2 2 ) mod m Successful Search: Θ ( 1 α ln 1 Unsuccessful Search: Θ ( 1 Double Hashing, m is prime, 1 h 2 (k) m 1 h(k,i) = (h 1 (k)+ih 2 (k)) mod m Successful Search: Θ ( 1 α ln ) 1 Unsuccessful Search: Θ ( ) 1 CS 5633 Analysis of Algorithms Chapter 11: Slide 14 ) )