Collision. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST

Similar documents
Introduction to Hashtables

Hash tables. Hash tables

Hash tables. Hash tables

Hash tables. Hash tables

Searching. Constant time access. Hash function. Use an array? Better hash function? Hash function 4/18/2013. Chapter 9

Hashing. Dictionaries Hashing with chaining Hash functions Linear Probing

Fundamental Algorithms

Motivation. Dictionaries. Direct Addressing. CSE 680 Prof. Roger Crawfis

Insert Sorted List Insert as the Last element (the First element?) Delete Chaining. 2 Slide courtesy of Dr. Sang-Eon Park

Searching, mainly via Hash tables

Hash Tables. Direct-Address Tables Hash Functions Universal Hashing Chaining Open Addressing. CS 5633 Analysis of Algorithms Chapter 11: Slide 1

INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class

Hashing. Why Hashing? Applications of Hashing

Data Structures and Algorithm. Xiaoqing Zheng

CSCB63 Winter Week10 - Lecture 2 - Hashing. Anna Bretscher. March 21, / 30

Advanced Implementations of Tables: Balanced Search Trees and Hashing

Hash Tables. Given a set of possible keys U, such that U = u and a table of m entries, a Hash function h is a

Lecture: Analysis of Algorithms (CS )

A Lecture on Hashing. Aram-Alexandre Pooladian, Alexander Iannantuono March 22, Hashing. Direct Addressing. Operations - Simple

Hashing. Data organization in main memory or disk

So far we have implemented the search for a key by carefully choosing split-elements.

Hashing, Hash Functions. Lecture 7

CS 473: Algorithms. Ruta Mehta. Spring University of Illinois, Urbana-Champaign. Ruta (UIUC) CS473 1 Spring / 32

Hashing Data Structures. Ananda Gunawardena

? 11.5 Perfect hashing. Exercises

6.1 Occupancy Problem

Symbol-table problem. Hashing. Direct-access table. Hash functions. CS Spring Symbol table T holding n records: record.

CS 591, Lecture 6 Data Analytics: Theory and Applications Boston University

1 Maintaining a Dictionary

Hashing. Martin Babka. January 12, 2011

12 Hash Tables Introduction Chaining. Lecture 12: Hash Tables [Fa 10]

Analysis of Algorithms I: Perfect Hashing

Hashing. Hashing DESIGN & ANALYSIS OF ALGORITHM

Problem 1: (Chernoff Bounds via Negative Dependence - from MU Ex 5.15)

COMP251: Hashing. Jérôme Waldispühl School of Computer Science McGill University. Based on (Cormen et al., 2002)

Introduction to Hash Tables

Data Structures and Algorithm. Xiaoqing Zheng

Lecture 5: Hashing. David Woodruff Carnegie Mellon University

Hashing. Dictionaries Chained Hashing Universal Hashing Static Dictionaries and Perfect Hashing. Philip Bille

Hashing. Hashing. Dictionaries. Dictionaries. Dictionaries Chained Hashing Universal Hashing Static Dictionaries and Perfect Hashing

Tutorial 4. Dynamic Set: Amortized Analysis

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

Independence, Variance, Bayes Theorem

Databases. DBMS Architecture: Hashing Techniques (RDBMS) and Inverted Indexes (IR)

Round 5: Hashing. Tommi Junttila. Aalto University School of Science Department of Computer Science

Array-based Hashtables

Linear Equations in Linear Algebra

Problem Set 4 Solutions

1 Hashing. 1.1 Perfect Hashing

Amortized analysis. Amortized analysis

Advanced Algorithm Design: Hashing and Applications to Compact Data Representation

Skip Lists. What is a Skip List. Skip Lists 3/19/14

CS 125 Section #12 (More) Probability and Randomized Algorithms 11/24/14. For random numbers X which only take on nonnegative integer values, E(X) =

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Algorithms lecture notes 1. Hashing, and Universal Hash functions

Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data

s 1 if xπy and f(x) = f(y)

Cuckoo Hashing with a Stash: Alternative Analysis, Simple Hash Functions

Configuring Spatial Grids for Efficient Main Memory Joins

Fully Understanding the Hashing Trick

Premaster Course Algorithms 1 Chapter 3: Elementary Data Structures

CSE 502 Class 11 Part 2

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts)

Lecture 8: Number theory

CSE 548: (Design and) Analysis of Algorithms

Outline Resource Introduction Usage Testing Exercises. Linked Lists COMP SCI / SFWR ENG 2S03. Department of Computing and Software McMaster University

Hash Tables (Cont'd) Carlos Moreno uwaterloo.ca EIT

Rainbow Tables ENEE 457/CMSC 498E

Assignment 7 (Sol.) Introduction to Data Analytics Prof. Nandan Sudarsanam & Prof. B. Ravindran

Special types of matrices

Solution suggestions for examination of Logic, Algorithms and Data Structures,

INF2220: algorithms and data structures Series 1

Algorithms for Data Science

Data Structure. Mohsen Arab. January 13, Yazd University. Mohsen Arab (Yazd University ) Data Structure January 13, / 86

Problem. Maintain the above set so as to support certain

Abstract Data Type (ADT) maintains a set of items, each with a key, subject to

CS5314 Randomized Algorithms. Lecture 15: Balls, Bins, Random Graphs (Hashing)

compare to comparison and pointer based sorting, binary trees

4.5 Applications of Congruences

Algorithms for NLP. Language Modeling III. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley

Linear Algebra Section 2.6 : LU Decomposition Section 2.7 : Permutations and transposes Wednesday, February 13th Math 301 Week #4

Lecture and notes by: Alessio Guerrieri and Wei Jin Bloom filters and Hashing

Hashing. Alexandra Stefan

A General-Purpose Counting Filter: Making Every Bit Count. Prashant Pandey, Michael A. Bender, Rob Johnson, Rob Patro Stony Brook University, NY

Lecture Lecture 3 Tuesday Sep 09, 2014

Lecture 4: Divide and Conquer: van Emde Boas Trees

Queues. Principles of Computer Science II. Basic features of Queues

Genome Assembly. Sequencing Output. High Throughput Sequencing

1. Introduction. Mathematisches Institut, Seminars, (Y. Tschinkel, ed.), p Universität Göttingen,

Integers and Division

Lecture 12: Hash Tables [Sp 15]

Advanced Data Structures

Vector, Matrix, and Tensor Derivatives

Testing a Hash Function using Probability

Chapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn

Inequalities. Some problems in algebra lead to inequalities instead of equations.

Hashing. Algorithm : Design & Analysis [09]

Module 1: Analyzing the Efficiency of Algorithms

F.3 Special Factoring and a General Strategy of Factoring

Improved Collision Attack on MD5

Transcription:

Collision Kuan-Yu Chen ( 陳冠宇 ) 2018/12/17 @ TR-212, NTUST

Review Hash table is a data structure in which keys are mapped to array positions by a hash function When two or more keys map to the same memory location, a collision is said to occur 2

Collision Collisions occur when the hash function maps two different keys to the same location A method used to solve the problem of collision, also called collision resolution technique, is applied Open addressing Chaining 3

Open Addressing By using the technique, the hash table contains two types of values: sentinel values (e.g., 1) and data values The sentinel value indicates that the location contains no data value at present but can be used to hold a value If the location already has some data value stored in it, then other slots are examined systematically in the forward direction to find a free slot If even a single free location is not found, then we have an OVERFLOW condition The process of examining memory locations in the hash table is called probing linear probing, quadratic probing, double hashing, and rehashing 4

Linear Probing The simplest approach to resolve a collision is linear probing If a value is already stored at a location generated by h(xx), then the following hash function is used to resolve the collision h xx, ii = h xx + ii mod MM MM is the size of the hash table, h xx = xx mod MM, and ii is the probe number that varies from 0 to MM 1 When we have to store a value, we try the slots: h xx mod MM, h xx + 1 mod MM, h xx + 2 mod MM, h xx + 3 mod MM, and so no, until a vacant location is found 5

Example. h xx, ii = h xx + ii mod MM the keys 72, 27, 36, 24, 63, 81, 92, and 101 into the table Initial Step 1 xx = 72 h 72,0 = h 72 + 0 mod 10 = 72 mod 10 + 0 mod 10 = 2 6

Example.. h xx, ii = h xx + ii mod MM the keys 72, 27, 36, 24, 63, 81, 92, and 101 into the table Step 2 xx = 27 h 27,0 = h 27 + 0 mod 10 = 27 mod 10 + 0 mod 10 = 7 Step 3 xx = 36 h 36,0 = h 36 + 0 mod 10 = 36 mod 10 + 0 mod 10 = 6 7

Example h xx, ii = h xx + ii mod MM the keys 72, 27, 36, 24, 63, 81, 92, and 101 into the table Step 4 xx = 24 h 24,0 = h 24 + 0 mod 10 = 24 mod 10 + 0 mod 10 = 4 Step 5 xx = 63 h 63,0 = h 63 + 0 mod 10 = 63 mod 10 + 0 mod 10 = 3 8

Example. h xx, ii = h xx + ii mod MM the keys 72, 27, 36, 24, 63, 81, 92, and 101 into the table Step 6 xx = 81 h 81,0 = h 81 + 0 mod 10 = 81 mod 10 + 0 mod 10 = 1 Step 7 xx = 92 h 92,0 = h 92 + 0 mod 10 = 92 mod 10 + 0 mod 10 = 2 h 92,1 = h 92 + 1 mod 10 = 92 mod 10 + 1 mod 10 = 3 h 92,2 = h 92 + 2 mod 10 = 92 mod 10 + 2 mod 10 = 4 h 92,3 = h 92 + 3 mod 10 = 92 mod 10 + 3 mod 10 = 5 9

Example.. h xx, ii = h xx + ii mod MM the keys 72, 27, 36, 24, 63, 81, 92, and 101 into the table Step 8 xx = 101 h 101,0 = h 101 + 0 mod 10 = 101 mod 10 + 0 mod 10 = 1 h 101,0 = h 101 + 1 mod 10 = 101 mod 10 + 1 mod 10 = 2 h 101,0 = h 101 + 7 mod 10 = 101 mod 10 + 7 mod 10 = 8 101 10

Quadratic Probing If a value is already stored at a location generated by h(xx), then the following hash function is used to resolve the collision h xx, ii = h xx + cc 1 ii + cc 2 ii 2 mod MM MM is the size of the hash table, h xx = xx mod MM, ii is the probe number that varies from 0 to MM 1, and cc 1 and cc 2 are constants such that cc 1 0 and cc 2 0 11

Example. h xx, ii = h xx + cc 1 ii + cc 2 ii 2 mod MM the keys 72, 27, 36, 24, 63, 81, 101, and 92 into the table. Take cc 1 = 1 and cc 2 = 3. Initial Step 1 xx = 72 h 72,0 = h 72 + 1 0 + 3 0 2 mod 10 = 72 mod 10 + 0 + 0 mod 10 = 2 12

Example.. h xx, ii = h xx + cc 1 ii + cc 2 ii 2 mod MM the keys 72, 27, 36, 24, 63, 81, 101, and 92 into the table. Take cc 1 = 1 and cc 2 = 3. Step 2 xx = 27 h 27,0 = h 27 + 1 0 + 3 0 2 mod 10 = 27 mod 10 + 0 + 0 mod 10 = 7 Step 3 xx = 36 h 36,0 = h 36 + 1 0 + 3 0 2 mod 10 = 36 mod 10 + 0 + 0 mod 10 = 6 13

Example h xx, ii = h xx + cc 1 ii + cc 2 ii 2 mod MM the keys 72, 27, 36, 24, 63, 81, 101, and 92 into the table. Take cc 1 = 1 and cc 2 = 3. Step 4 h 24,0 = h 24 + 1 0 + 3 0 2 mod 10 = 24 mod 10 + 0 + 0 mod 10 = 4 Step 5 h 63,0 = h 63 + 1 0 + 3 0 2 mod 10 = 63 mod 10 + 0 + 0 mod 10 = 3 Step 6 h 81,0 = h 81 + 1 0 + 3 0 2 mod 10 = 81 mod 10 + 0 + 0 mod 10 = 1 14

Example. h xx, ii = h xx + cc 1 ii + cc 2 ii 2 mod MM the keys 72, 27, 36, 24, 63, 81, 101, and 92 into the table. Take cc 1 = 1 and cc 2 = 3. Step 7 h 101,0 = h 101 + 1 0 + 3 0 2 mod 10 = 101 mod 10 + 0 + 0 mod 10 = 1 h 101,1 = h 101 + 1 1 + 3 1 2 mod 10 = 101 mod 10 + 1 + 3 mod 10 = 5 15

Example.. h xx, ii = h xx + cc 1 ii + cc 2 ii 2 mod MM the keys 72, 27, 36, 24, 63, 81, 101, and 92 into the table. Take cc 1 = 1 and cc 2 = 3. Step 8 h 92,0 = h 92 + 1 0 + 3 0 2 mod 10 = 92 mod 10 + 0 + 0 mod 10 = 2 h 92,1 = h 92 + 1 1 + 3 1 2 mod 10 = 92 mod 10 + 1 + 3 mod 10 = 6 h 92,2 = h 92 + 1 2 + 3 2 2 mod 10 = 92 mod 10 + 2 + 12 mod 10 = 6 h 92,3 = h 92 + 1 3 + 3 3 2 mod 10 = 92 mod 10 + 3 + 27 mod 10 = 2 h 92,4 = h 92 + 1 4 + 3 4 2 mod 10 = 92 mod 10 + 4 + 48 mod 10 = 4 h 92,5 = h 92 + 1 5 + 3 5 2 mod 10 = 92 mod 10 + 5 + 75 mod 10 = 2 16

Example... h xx, ii = h xx + cc 1 ii + cc 2 ii 2 mod MM the keys 72, 27, 36, 24, 63, 81, 101, and 92 into the table. Take cc 1 = 1 and cc 2 = 3. h 92,6 = h 92 + 1 6 + 3 6 2 mod 10 = 92 mod 10 + 6 + 108 mod 10 = 6 h 92,7 = h 92 + 1 7 + 3 7 2 mod 10 = 92 mod 10 + 7 + 147 mod 10 = 6 h 92,8 = h 92 + 1 8 + 3 8 2 mod 10 = 92 mod 10 + 8 + 192 mod 10 = 2 h 92,9 = h 92 + 1 9 + 3 9 2 mod 10 = 92 mod 10 + 9 + 243 mod 10 = 4 One of the major drawbacks of quadratic probing is that a sequence of successive probes may only explore a fraction of the table, and this fraction may be quite small If this happens, then we will not be able to find an empty location in the table despite the fact that the table is by no means full 17

Double Hashing Double hashing uses one hash value and then repeatedly steps forward an interval until an empty location is reached The interval is decided using a second, independent hash function, hence the name double hashing In double hashing, we use two hash functions rather than a single function h xx, ii = h 1 xx + ii h 2 xx mod MM MM is the size of the hash table, h 1 xx and h 2 xx are two hash functions given as h 1 xx = xx mod MM and h 2 xx = xx mod MMM, ii is the probe number that varies from 0 to MM 1, and MMM is chosen to be less than MM We can choose MMM = MM 1 or MM = MM 2 18

Example. h xx, ii = h 1 xx + ii h 2 xx mod MM the keys 72, 27, 36, 24, 63, 81, 92, and 101 into the table. Take h 1 = xx mod 10 and h 2 = xx mod 8. Initial Step 1 xx = 72 h 72,0 = h 1 72 + 0 h 2 72 mod 10 = 72 mod 10 + 0 72 mod 8 mod 10 = 2 Step 2 xx = 27 h 27,0 = h 1 27 + 0 h 2 27 mod 10 = 27 mod 10 + 0 27 mod 8 mod 10 = 7 19

Example.. h xx, ii = h 1 xx + ii h 2 xx mod MM the keys 72, 27, 36, 24, 63, 81, 92, and 101 into the table. Take h 1 = xx mod 10 and h 2 = xx mod 8. Step 3 xx = 36 h 36,0 = h 1 36 + 0 h 2 36 mod 10 = 36 mod 10 + 0 36 mod 8 mod 10 = 6 Step 4 xx = 24 h 24,0 = h 1 24 + 0 h 2 24 mod 10 = 24 mod 10 + 0 24 mod 8 mod 10 = 4 Step 5 xx = 63 h 63,0 = h 1 63 + 0 h 2 63 mod 10 = 63 mod 10 + 0 63 mod 8 mod 10 = 3 20

Example... h xx, ii = h 1 xx + ii h 2 xx mod MM the keys 72, 27, 36, 24, 63, 81, 92, and 101 into the table. Take h 1 = xx mod 10 and h 2 = xx mod 8. Step 6 xx = 81 h 81,0 = h 1 81 + 0 h 2 81 mod 10 = 81 mod 10 + 0 81 mod 8 mod 10 = 1 Step 7 xx = 92 h 92,0 = h 1 92 + 0 h 2 92 mod 10 = 92 mod 10 + 0 92 mod 8 mod 10 = 2 h 92,1 = h 1 92 + 1 h 2 92 mod 10 = 92 mod 10 + 1 92 mod 8 mod 10 = 6 h 92,2 = h 1 92 + 2 h 2 92 mod 10 = 92 mod 10 + 2 92 mod 8 mod 10 = 0 21

Example... h xx, ii = h 1 xx + ii h 2 xx mod MM the keys 72, 27, 36, 24, 63, 81, 92, and 101 into the table. Take h 1 = xx mod 10 and h 2 = xx mod 8. Step 8 xx = 101 h 101,0 = h 1 101 + 0 h 2 101 mod 10 = 101 mod 10 + 0 101 mod 8 mod 10 = 1 h 101,1 = h 1 101 + 1 h 2 101 mod 10 = 101 mod 10 + 1 101 mod 8 mod 10 = 6 h 101,2 = h 1 101 + 2 h 2 101 mod 10 = 101 mod 10 + 2 101 mod 8 mod 10 = 1. 22

Rehashing When the hash table becomes nearly full, the number of collisions increases, thereby degrading the performance of insertion and search operations A better option is to create a new hash table with size double of the original hash table By performing the rehashing, all the entries in the original hash table will then have to be moved to the new hash table This is done by taking each entry, computing its new hash value, and then inserting it in the new hash table Though rehashing seems to be a simple process, it is quite expensive and must therefore not be done frequently 23

Example Consider the hash table of size 5 given below The hash function used is h(xx) = xx mod 5 Rehash the entries into to a new hash table Note that the new hash table is of 10 locations, double the size of the original table Rehash the key values from the old hash table into the new one using hash function h(xx) = xx mod 10 24

Chaining In chaining, each location in a hash table stores a pointer to a linked list that contains all the key values that were hashed to that location Location nn in the hash table points to the head of the linked list of all the key values that hashed to nn If no key value hashes to nn, then location nn in the hash table contains NULL 25

Example. Insert the keys 7, 24, 18, 52, 36, 54, 11, and 23 in a chained hash table of 9 memory locations. Use h(xx) = xx mod 9. Step 1 xx = 7 h 7 = 7 mod 9 = 7 Step 2 xx = 24 h 24 = 24 mod 9 = 6 Step 3 xx = 18 h 18 = 18 mod 9 = 0 Step 4 xx = 52 h 52 = 52 mod 9 = 7 26

Example.. Insert the keys 7, 24, 18, 52, 36, 54, 11, and 23 in a chained hash table of 9 memory locations. Use h(xx) = xx mod 9. Step 5 xx = 36 h 36 = 36 mod 9 = 0 Step 6 xx = 54 h 54 = 54 mod 9 = 0 Step 7 xx = 11 h 11 = 11 mod 9 = 2 Step 8 xx = 23 h 23 = 23 mod 9 = 5 27

Questions? kychen@mail.ntust.edu.tw 28