

7.7 Hashing

Ernst Mayr, Harald Räcke

Dictionary:
- S.insert(x): Insert an element x.
- S.delete(x): Delete the element pointed to by x.
- S.search(k): Return a pointer to an element e with key[e] = k in S if it exists; otherwise return null.

So far we have implemented the search for a key by carefully choosing split-elements. Then the memory location of an object x with key k is determined by successively comparing k to split-elements. Hashing tries to directly compute the memory location from the given key. The goal is to have constant search time.

Definitions:
- Universe $U$ of keys, e.g., $U \subseteq \mathbb{N}_0$. $|U|$ very large.
- Set $S \subseteq U$ of keys, $|S| = m \le |U|$.
- Array $T[0, \dots, n-1]$: the hash-table.
- Hash function $h : U \to [0, \dots, n-1]$.

The hash-function h should fulfill:
- Fast to evaluate.
- Small storage requirement.
- Good distribution of elements over the whole table.

Direct Addressing

Ideally the hash function maps all keys to different memory locations.

[Figure: every key of the universe U is mapped to its own table position.]

This special case is known as Direct Addressing. It is usually very unrealistic as the universe of keys typically is quite large, and in particular larger than the available memory.

Perfect Hashing

Suppose that we know the set S of actual keys (no insert/no delete). Then we may want to design a simple hash-function that maps all these keys to different memory locations.

[Figure: the actual keys S inside the universe U are mapped to distinct table positions.]

Such a hash function h is called a perfect hash function for set S.

If we do not know the keys in advance, the best we can hope for is that the hash function distributes keys evenly across the table.

Problem: Collisions

Usually the universe U is much larger than the table-size n. Hence, there may be two elements $k_1, k_2$ from the set S that map to the same memory location (i.e., $h(k_1) = h(k_2)$). This is called a collision.

Typically, collisions appear not only once the size of the set S of actual keys gets close to n, but already when $|S| = \omega(\sqrt{n})$.

Lemma 1. The probability of having a collision when hashing m elements into a table of size n under uniform hashing is at least

$$1 - e^{-\frac{m(m-1)}{2n}} \approx 1 - e^{-\frac{m^2}{2n}}.$$

Uniform hashing: Choose a hash function uniformly at random from all functions $f : U \to [0, \dots, n-1]$.

Proof. Let $A_{m,n}$ denote the event that inserting m keys into a table of size n does not generate a collision. Then

$$\Pr[A_{m,n}] = \prod_{\ell=1}^{m} \frac{n-\ell+1}{n} = \prod_{j=0}^{m-1} \Big(1 - \frac{j}{n}\Big) \le \prod_{j=0}^{m-1} e^{-j/n} = e^{-\sum_{j=0}^{m-1} \frac{j}{n}} = e^{-\frac{m(m-1)}{2n}}.$$

Here the first equality follows since the $\ell$-th element that is hashed has a probability of $\frac{n-\ell+1}{n}$ to not generate a collision under the condition that the previous elements did not induce collisions.

[Figure: plot of $f(x) = e^{-x}$ and $f(x) = 1 - x$.]

The inequality $1 - x \le e^{-x}$ is derived by stopping the Taylor-expansion of $e^{-x}$ after the second term.
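The bound in the proof above can be checked numerically. The following is a minimal sketch, assuming ideal uniform hashing; the function names are illustrative and not part of the slides. It compares the exact no-collision probability $\prod_{j=0}^{m-1}(1 - j/n)$ with the upper bound $e^{-m(m-1)/(2n)}$.

```python
import math


def no_collision_probability(m: int, n: int) -> float:
    """Exact Pr[A_{m,n}]: m keys hashed uniformly into n slots without a collision."""
    p = 1.0
    for j in range(m):
        p *= 1.0 - j / n
    return p


def no_collision_upper_bound(m: int, n: int) -> float:
    """Upper bound from the proof: Pr[A_{m,n}] <= exp(-m(m-1)/(2n))."""
    return math.exp(-m * (m - 1) / (2 * n))


if __name__ == "__main__":
    n = 365  # illustrative table size (the classic birthday setting)
    for m in (10, 23, 50):
        exact = no_collision_probability(m, n)
        bound = no_collision_upper_bound(m, n)
        print(f"m={m}: Pr[collision] = {1 - exact:.4f} >= {1 - bound:.4f}")
```

With n = 365 the collision probability already exceeds 1/2 for m = 23, which illustrates that collisions show up long before the table is close to full.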

Resolving Collisions

The methods for dealing with collisions can be classified into the two main types
- open addressing, aka. closed hashing
- hashing with chaining, aka. closed addressing, open hashing.

There are applications, e.g. computer chess, where you do not resolve collisions at all.

Hashing with Chaining

Arrange elements that map to the same position in a linear list.
- Access: compute h(x) and search the list for key[x].
- Insert: insert at the front of the list.

[Figure: keys of S hashed into a table; keys that collide are stored in a linked list attached to their table position.]

Let A denote a strategy for resolving collisions. We use the following notation:
- $A^+$ denotes the average time for a successful search when using A;
- $A^-$ denotes the average time for an unsuccessful search when using A.

We parameterize the complexity results in terms of $\alpha := \frac{m}{n}$, the so-called fill factor of the hash-table. We assume uniform hashing for the following analysis.

The time required for an unsuccessful search is 1 plus the length of the list that is examined. The average length of a list is $\alpha = \frac{m}{n}$. Hence, if A is the collision resolving strategy "Hashing with Chaining" we have

$$A^- = 1 + \alpha.$$
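Hashing with chaining as described above can be sketched as follows. This is a minimal illustration, not the implementation from the lecture: the class name is made up, Python lists stand in for linked lists, and Python's built-in `hash` reduced modulo n is assumed as the hash function.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs."""

    def __init__(self, n: int):
        self.n = n                      # table size
        self.m = 0                      # number of stored elements
        self.slots = [[] for _ in range(n)]

    def _h(self, key) -> int:
        # Assumption: the built-in hash plays the role of h.
        return hash(key) % self.n

    def insert(self, key, value) -> None:
        # Insert at the front of the chain (no duplicate check).
        self.slots[self._h(key)].insert(0, (key, value))
        self.m += 1

    def search(self, key):
        # Scan the chain that belongs to position h(key).
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key) -> bool:
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.m -= 1
                return True
        return False

    def fill_factor(self) -> float:
        return self.m / self.n          # alpha = m / n


if __name__ == "__main__":
    table = ChainedHashTable(4)
    for k in range(10):
        table.insert(k, k * k)
    print(table.search(7), table.search(42), table.fill_factor())
```

Because the elements live in lists outside the table, the fill factor may exceed 1; an unsuccessful search then scans a chain of average length α.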

For a successful search observe that we do not choose a list at random, but we consider a random key k in the hash-table and ask for the search-time for k. This is 1 plus the number of elements that lie before k in k's list.

Let $k_\ell$ denote the $\ell$-th key inserted into the table. Let for two keys $k_i$ and $k_j$, $X_{ij}$ denote the indicator variable for the event that $k_i$ and $k_j$ hash to the same position. Clearly, $\Pr[X_{ij} = 1] = 1/n$ for uniform hashing.

The expected successful search cost is

$$E\Big[\frac{1}{m} \sum_{i=1}^{m} \Big(1 + \sum_{j=i+1}^{m} X_{ij}\Big)\Big],$$

where $\sum_{j=i+1}^{m} X_{ij}$ is the number of keys before $k_i$ in its list and $1 + \sum_{j=i+1}^{m} X_{ij}$ is the cost for key $k_i$.

$$E\Big[\frac{1}{m} \sum_{i=1}^{m} \Big(1 + \sum_{j=i+1}^{m} X_{ij}\Big)\Big] = \frac{1}{m} \sum_{i=1}^{m} \Big(1 + \sum_{j=i+1}^{m} E[X_{ij}]\Big) = \frac{1}{m} \sum_{i=1}^{m} \Big(1 + \sum_{j=i+1}^{m} \frac{1}{n}\Big)$$
$$= 1 + \frac{1}{mn} \sum_{i=1}^{m} (m - i) = 1 + \frac{1}{mn} \Big(m^2 - \frac{m(m+1)}{2}\Big) = 1 + \frac{m-1}{2n} = 1 + \frac{\alpha}{2} - \frac{\alpha}{2m}.$$

Hence, the expected cost for a successful search is $A^+ \le 1 + \frac{\alpha}{2}$.

Disadvantages:
- pointers increase memory requirements
- pointers may lead to bad cache efficiency

Advantages:
- no à priori limit on the number of elements
- deletion can be implemented efficiently
- by using balanced trees instead of linked lists one can also obtain worst-case guarantees.

Open Addressing

All objects are stored in the table itself. Define a function $h(k, j)$ that determines the table-position to be examined in the j-th step. The values $h(k, 0), \dots, h(k, n-1)$ must form a permutation of $0, \dots, n-1$.

Search(k): Try position $h(k, 0)$; if it is empty your search fails; otherwise continue with $h(k, 1), h(k, 2), \dots$.

Insert(x): Search until you find an empty slot; insert your element there. If your search reaches $h(k, n-1)$, and this slot is non-empty then your table is full.

Choices for $h(k, j)$:
- Linear probing: $h(k, i) = h(k) + i \bmod n$ (sometimes: $h(k, i) = h(k) + ci \bmod n$).
- Quadratic probing: $h(k, i) = h(k) + c_1 i + c_2 i^2 \bmod n$.
- Double hashing: $h(k, i) = h_1(k) + i h_2(k) \bmod n$.

For quadratic probing and double hashing one has to ensure that the search covers all positions in the table (i.e., for double hashing $h_2(k)$ must be relatively prime to n; for quadratic probing $c_1$ and $c_2$ have to be chosen carefully).

Linear Probing

Advantage: Cache-efficiency. The new probe position is very likely to be in the cache.

Disadvantage: Primary clustering. Long sequences of occupied table-positions get longer as they have a larger probability to be hit. Furthermore, they can merge, forming larger sequences.

Lemma 2. Let L be the method of linear probing for resolving collisions:

$$L^+ \approx \frac{1}{2}\Big(1 + \frac{1}{1-\alpha}\Big), \qquad L^- \approx \frac{1}{2}\Big(1 + \frac{1}{(1-\alpha)^2}\Big).$$

Quadratic Probing

Not as cache-efficient as Linear Probing.

Secondary clustering: caused by the fact that all keys mapped to the same position have the same probe sequence.

Lemma 3. Let Q be the method of quadratic probing for resolving collisions:

$$Q^+ \approx 1 + \ln\Big(\frac{1}{1-\alpha}\Big) - \frac{\alpha}{2}, \qquad Q^- \approx \frac{1}{1-\alpha} + \ln\Big(\frac{1}{1-\alpha}\Big) - \alpha.$$

Double Hashing

Any probe into the hash-table usually creates a cache-miss.

Lemma 4. Let D be the method of double hashing for resolving collisions:

$$D^+ \approx \frac{1}{\alpha} \ln\Big(\frac{1}{1-\alpha}\Big), \qquad D^- \approx \frac{1}{1-\alpha}.$$
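The three probe functions and the requirement that a probe sequence covers the whole table can be illustrated with a short sketch. The function names and the concrete parameters are assumptions made for this example; the check simply tests whether the first n probe positions form a permutation of 0, ..., n-1.

```python
from math import gcd


def linear_probe(h: int, i: int, n: int, c: int = 1) -> int:
    """h(k, i) = h(k) + c*i mod n."""
    return (h + c * i) % n


def quadratic_probe(h: int, i: int, n: int, c1: int, c2: int) -> int:
    """h(k, i) = h(k) + c1*i + c2*i^2 mod n."""
    return (h + c1 * i + c2 * i * i) % n


def double_hash_probe(h1: int, h2: int, i: int, n: int) -> int:
    """h(k, i) = h1(k) + i*h2(k) mod n."""
    return (h1 + i * h2) % n


def covers_table(probe, n: int) -> bool:
    """True if probe(0), ..., probe(n-1) is a permutation of 0, ..., n-1."""
    return sorted(probe(i) for i in range(n)) == list(range(n))


if __name__ == "__main__":
    n = 8
    print("linear:", covers_table(lambda i: linear_probe(3, i, n), n))
    for h2 in (2, 3):
        ok = covers_table(lambda i: double_hash_probe(3, h2, i, n), n)
        print(f"double hashing, h2={h2}, gcd={gcd(h2, n)}:", ok)
    # c1 = c2 = 1 is a poor choice for n = 8: i + i^2 is always even.
    print("quadratic:", covers_table(lambda i: quadratic_probe(3, i, n, 1, 1), n))
```

For double hashing the sequence covers the table exactly when $h_2(k)$ and n are relatively prime, which is why a prime table size is a common choice.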

Some values (number of probes):

             Linear Probing    Quadratic Probing    Double Hashing
  α          L+       L-       Q+       Q-          D+       D-
  0.5        1.5      2.5      1.44     2.19        1.39     2
  0.9        5.5      50.5     2.85     11.40       2.55     10
  0.95       10.5     200.5    3.52     22.05       3.15     20

[Figure: number of probes as a function of α for $L^+$, $L^-$, $Q^+$, $Q^-$, $D^+$, $D^-$.]

Analysis of Idealized Open Address Hashing

We analyze the time for a search in a very idealized Open Addressing scheme. The probe sequence $h(k, 0), h(k, 1), h(k, 2), \dots$ is equally likely to be any permutation of $0, 1, \dots, n-1$.

Let X denote a random variable describing the number of probes in an unsuccessful search. Let $A_i$ denote the event that the i-th probe occurs and is to a non-empty slot.

$$\Pr[A_1 \cap A_2 \cap \dots \cap A_{i-1}] = \Pr[A_1] \cdot \Pr[A_2 \mid A_1] \cdot \Pr[A_3 \mid A_1 \cap A_2] \cdot \ldots \cdot \Pr[A_{i-1} \mid A_1 \cap \dots \cap A_{i-2}]$$

$$\Pr[X \ge i] = \frac{m}{n} \cdot \frac{m-1}{n-1} \cdot \frac{m-2}{n-2} \cdot \ldots \cdot \frac{m-i+2}{n-i+2} \le \Big(\frac{m}{n}\Big)^{i-1} = \alpha^{i-1}.$$

$$E[X] = \sum_{i=1}^{\infty} \Pr[X \ge i] \le \sum_{i=1}^{\infty} \alpha^{i-1} = \sum_{i=0}^{\infty} \alpha^{i} = \frac{1}{1-\alpha}.$$

$$\frac{1}{1-\alpha} = 1 + \alpha + \alpha^2 + \alpha^3 + \dots$$

[Figure: $\sum_i i \cdot \Pr[X = i] = \sum_i \Pr[X \ge i]$, shown as rectangles of height $\Pr[X = i]$. The j-th rectangle appears in both sums j times: j times in the first due to multiplication with j, and j times in the second for summands $i = 1, 2, \dots, j$.]

The number of probes in a successful search for k is equal to the number of probes made in an unsuccessful search for k at the time that k is inserted.

Let k be the (i+1)-st element. The expected time for a search for k is at most $\frac{1}{1 - i/n} = \frac{n}{n-i}$.

$$\frac{1}{m} \sum_{i=0}^{m-1} \frac{n}{n-i} = \frac{n}{m} \sum_{i=0}^{m-1} \frac{1}{n-i} = \frac{1}{\alpha} \sum_{k=n-m+1}^{n} \frac{1}{k} \le \frac{1}{\alpha} \int_{n-m}^{n} \frac{1}{x}\, dx = \frac{1}{\alpha} \ln \frac{n}{n-m} = \frac{1}{\alpha} \ln \frac{1}{1-\alpha}.$$
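The two bounds derived above can be compared with the exact sums for concrete m and n. The sketch below is only a numerical sanity check under the idealized assumption that every probe sequence is a uniformly random permutation; the function names are illustrative and m < n is assumed.

```python
import math


def unsuccessful_probes_exact(m: int, n: int) -> float:
    """E[X] = sum_{i>=1} Pr[X >= i] with Pr[X >= i] = prod_{k=0}^{i-2} (m-k)/(n-k).

    Assumes m < n, so that at least one slot is empty.
    """
    expected = 0.0
    p = 1.0  # Pr[X >= 1]
    for i in range(1, m + 2):
        expected += p
        # Turn Pr[X >= i] into Pr[X >= i+1]: the i-th probe hits an occupied slot.
        p *= (m - (i - 1)) / (n - (i - 1))
    return expected


def successful_probes_sum(m: int, n: int) -> float:
    """(1/m) * sum_{i=0}^{m-1} n/(n-i), the averaged insertion-time bound."""
    return sum(n / (n - i) for i in range(m)) / m


def unsuccessful_bound(alpha: float) -> float:
    return 1.0 / (1.0 - alpha)


def successful_bound(alpha: float) -> float:
    return math.log(1.0 / (1.0 - alpha)) / alpha


if __name__ == "__main__":
    n = 1000  # illustrative table size
    for m in (500, 900, 950):
        a = m / n
        print(f"alpha={a}: unsuccessful {unsuccessful_probes_exact(m, n):.3f}"
              f" <= {unsuccessful_bound(a):.3f},"
              f" successful {successful_probes_sum(m, n):.3f}"
              f" <= {successful_bound(a):.3f}")
```

For α = 0.5 the closed forms evaluate to 2 and about 1.39, which agrees with the double hashing column of the table of values.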

[Figure: plot of $f(x) = \frac{1}{x}$ illustrating $\sum_{k=n-m+1}^{n} \frac{1}{k} \le \int_{n-m}^{n} \frac{1}{x}\, dx$.]

Deletions

How do we delete in a hash-table?
- For hashing with chaining this is not a problem. Simply search for the key, and delete the item in the corresponding list.
- For open addressing this is difficult.

Simply removing a key might interrupt the probe sequence of other keys which then cannot be found anymore.

One can delete an element by replacing it with a deleted-marker.
- During an insertion, if a deleted-marker is encountered an element can be inserted there.
- During a search a deleted-marker must not be used to terminate the probe sequence.

The table could fill up with deleted-markers, leading to bad performance. If a table contains many deleted-markers (a linear fraction of the keys) one can rehash the whole table and amortize the cost for this rehash against the cost for the deletions.

Deletions for Linear Probing

For Linear Probing one can delete elements without using deletion-markers. Upon a deletion, elements that are further down in the probe-sequence may be moved to guarantee that they are still found during a search.
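Marker-free deletion for linear probing can be sketched as follows. The sketch follows the delete(p) procedure of these notes (empty the cell, then remove and re-insert every key in the run of occupied cells behind it); the class name, the use of Python's built-in `hash`, and `None` as the empty marker are assumptions of this example, and keys are assumed to be different from `None`.

```python
class LinearProbingTable:
    """Open addressing with linear probing and deletion without markers."""

    def __init__(self, n: int):
        self.n = n
        self.T = [None] * n              # None marks an empty cell

    def _h(self, key) -> int:
        return hash(key) % self.n        # assumed hash function

    def _succ(self, p: int) -> int:
        return (p + 1) % self.n          # next cell in the probe sequence

    def insert(self, key) -> int:
        p = self._h(key)
        for _ in range(self.n):
            if self.T[p] is None or self.T[p] == key:
                self.T[p] = key
                return p
            p = self._succ(p)
        raise RuntimeError("table is full")

    def search(self, key):
        """Return the index of key, or None if it is not stored."""
        p = self._h(key)
        for _ in range(self.n):
            if self.T[p] is None:
                return None              # an empty cell ends the probe sequence
            if self.T[p] == key:
                return p
            p = self._succ(p)
        return None

    def delete(self, p: int) -> None:
        """Delete the key stored in cell p and repair the following run."""
        self.T[p] = None
        p = self._succ(p)
        while self.T[p] is not None:
            y = self.T[p]
            self.T[p] = None
            p = self._succ(p)
            self.insert(y)               # y lands in the first free cell on its path


if __name__ == "__main__":
    t = LinearProbingTable(8)
    for k in (0, 8, 16, 1):              # 0, 8 and 16 all hash to cell 0
        t.insert(k)
    t.delete(t.search(0))
    print(t.T)
```

Every re-inserted key ends up at or before its old cell, so all remaining keys stay reachable from their hash position, but indices that callers kept from earlier searches are no longer valid after a deletion.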

Algorithm 6 delete(p)
  T[p] ← null
  p ← succ(p)
  while T[p] ≠ null do
    y ← T[p]
    T[p] ← null
    p ← succ(p)
    insert(y)

p is the index into the table-cell that contains the object to be deleted. Pointers into the hash-table become invalid.

Universal Hashing

Regardless of the choice of hash-function there is always an input (a set of keys) that has a very poor worst-case behaviour. Therefore, so far we assumed that the hash-function is random so that regardless of the input the average case behaviour is good.

However, the assumption of uniform hashing that h is chosen randomly from all functions $f : U \to [0, \dots, n-1]$ is clearly unrealistic as there are $n^{|U|}$ such functions. Even writing down such a function would take $|U| \log n$ bits.

Universal hashing tries to define a set H of functions that is much smaller but still leads to good average case behaviour when selecting a hash-function uniformly at random from H.

Definition 5. A class H of hash-functions from the universe U into the set $\{0, \dots, n-1\}$ is called universal if for all $u_1, u_2 \in U$ with $u_1 \ne u_2$

$$\Pr[h(u_1) = h(u_2)] \le \frac{1}{n},$$

where the probability is w.r.t. the choice of a random hash-function from set H. Note that this means that the probability of a collision is at most $\frac{1}{n}$.

Definition 6. A class H of hash-functions from the universe U into the set $\{0, \dots, n-1\}$ is called 2-independent (pairwise independent) if the following two conditions hold:
- For any key $u \in U$ and $t \in \{0, \dots, n-1\}$: $\Pr[h(u) = t] = \frac{1}{n}$, i.e., a key is distributed uniformly within the hash-table.
- For all $u_1, u_2 \in U$ with $u_1 \ne u_2$, and for any two hash-positions $t_1, t_2$: $\Pr[h(u_1) = t_1 \wedge h(u_2) = t_2] \le \frac{1}{n^2}$.

This requirement clearly implies a universal hash-function.

Definition 7. A class H of hash-functions from the universe U into the set $\{0, \dots, n-1\}$ is called k-independent if for any choice of $\ell \le k$ distinct keys $u_1, \dots, u_\ell \in U$, and for any set of $\ell$ not necessarily distinct hash-positions $t_1, \dots, t_\ell$:

$$\Pr[h(u_1) = t_1 \wedge \dots \wedge h(u_\ell) = t_\ell] \le \frac{1}{n^\ell},$$

where the probability is w.r.t. the choice of a random hash-function from set H.

Definition 8. A class H of hash-functions from the universe U into the set $\{0, \dots, n-1\}$ is called $(\mu, k)$-independent if for any choice of $\ell \le k$ distinct keys $u_1, \dots, u_\ell \in U$, and for any set of $\ell$ not necessarily distinct hash-positions $t_1, \dots, t_\ell$:

$$\Pr[h(u_1) = t_1 \wedge \dots \wedge h(u_\ell) = t_\ell] \le \frac{\mu}{n^\ell},$$

where the probability is w.r.t. the choice of a random hash-function from set H.

Let $U := \{0, \dots, p-1\}$ for a prime p. Let $\mathbb{Z}_p := \{0, \dots, p-1\}$, and let $\mathbb{Z}_p^* := \{1, \dots, p-1\}$ denote the set of invertible elements in $\mathbb{Z}_p$.

Define

$$h_{a,b}(x) := (ax + b \bmod p) \bmod n.$$

Lemma 9. The class

$$H = \{h_{a,b} \mid a \in \mathbb{Z}_p^*, b \in \mathbb{Z}_p\}$$

is a universal class of hash-functions from U to $\{0, \dots, n-1\}$.

Proof. Let $x, y \in U$ be two distinct keys. We have to show that the probability of a collision is only $1/n$.

First, $ax + b \not\equiv ay + b \pmod p$: If $x \ne y$ then $(x - y) \not\equiv 0 \pmod p$. Multiplying with $a \not\equiv 0 \pmod p$ gives $a(x - y) \not\equiv 0 \pmod p$, where we use that $\mathbb{Z}_p$ is a field and, hence, has no zero divisors.

The hash-function does not generate collisions before the $(\bmod\ n)$-operation. Furthermore, every choice $(a, b)$ is mapped to a different pair $(t_x, t_y)$ with $t_x := ax + b$ and $t_y := ay + b$.

This holds because we can compute a and b when given $t_x$ and $t_y$:

$$t_x \equiv ax + b \pmod p, \qquad t_y \equiv ay + b \pmod p$$
$$t_x - t_y \equiv a(x - y) \pmod p, \qquad t_y \equiv ay + b \pmod p$$
$$a \equiv (t_x - t_y)(x - y)^{-1} \pmod p, \qquad b \equiv t_y - ay \pmod p$$

There is a one-to-one correspondence between hash-functions (pairs $(a, b)$, $a \ne 0$) and pairs $(t_x, t_y)$, $t_x \ne t_y$. Therefore, we can view the first step (before the $\bmod\ n$-operation) as choosing a pair $(t_x, t_y)$, $t_x \ne t_y$, uniformly at random.

What happens when we do the $\bmod\ n$ operation? Fix a value $t_x$. There are $p - 1$ possible values for choosing $t_y$. From the range $0, \dots, p-1$ the values $t_x, t_x + n, t_x + 2n, \dots$ map to $t_x$ after the modulo-operation. These are at most $\lceil p/n \rceil$ values.

As $t_y \ne t_x$ there are

$$\Big\lceil \frac{p}{n} \Big\rceil - 1 \le \frac{p}{n} + \frac{n-1}{n} - 1 \le \frac{p-1}{n}$$

possibilities for choosing $t_y$ such that the final hash-value creates a collision. This happens with probability at most $\frac{1}{n}$.

It is also possible to show that H is an (almost) pairwise independent class of hash-functions:

$$\frac{\lfloor p/n \rfloor^2}{p(p-1)} \le \Pr_{t_x \ne t_y \in \mathbb{Z}_p^2}\big[\, t_x \bmod n = h_1 \wedge t_y \bmod n = h_2 \,\big] \le \frac{\lceil p/n \rceil^2}{p(p-1)}.$$

Note that the middle is the probability that $h(x) = h_1$ and $h(y) = h_2$. The total number of choices for $(t_x, t_y)$ is $p(p-1)$. The number of choices for $t_x$ ($t_y$) such that $t_x \bmod n = h_1$ ($t_y \bmod n = h_2$) lies between $\lfloor p/n \rfloor$ and $\lceil p/n \rceil$.
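For small parameters the universality of the class $\{h_{a,b}\}$ can be verified exhaustively. The sketch below enumerates all pairs (a, b) with a ≠ 0 for a fixed pair of distinct keys and reports the exact collision probability; the values p = 17 and n = 5 and the function names are illustrative choices, not taken from the slides.

```python
from fractions import Fraction


def h(a: int, b: int, p: int, n: int, x: int) -> int:
    """h_{a,b}(x) = ((a*x + b) mod p) mod n."""
    return ((a * x + b) % p) % n


def collision_probability(x: int, y: int, p: int, n: int) -> Fraction:
    """Exact Pr[h(x) = h(y)] over a uniformly random choice of a != 0 and b."""
    collisions = sum(
        1
        for a in range(1, p)
        for b in range(p)
        if h(a, b, p, n, x) == h(a, b, p, n, y)
    )
    return Fraction(collisions, p * (p - 1))


if __name__ == "__main__":
    p, n = 17, 5  # illustrative prime and table size
    worst = max(
        collision_probability(x, y, p, n)
        for x in range(p)
        for y in range(x + 1, p)
    )
    print("largest collision probability:", worst, "<= 1/n =", Fraction(1, n))
```

Because of the one-to-one correspondence between pairs (a, b) and pairs $(t_x, t_y)$, the collision probability does not depend on which two distinct keys are chosen.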

Definition 10. Let $d \in \mathbb{N}$; let $q \ge (d+1)n$ be a prime; and let $\bar{a} \in \{0, \dots, q-1\}^{d+1}$. Define for $x \in \{0, \dots, q-1\}$

$$h_{\bar{a}}(x) := \Big(\sum_{i=0}^{d} a_i x^i \bmod q\Big) \bmod n.$$

Let $H_n^d := \{h_{\bar{a}} \mid \bar{a} \in \{0, \dots, q-1\}^{d+1}\}$. The class $H_n^d$ is $(e, d+1)$-independent.

Note that in the previous case we had $d = 1$ and chose $a_d \ne 0$.

For the coefficients $\bar{a} \in \{0, \dots, q-1\}^{d+1}$ let $f_{\bar{a}}$ denote the polynomial

$$f_{\bar{a}}(x) = \Big(\sum_{i=0}^{d} a_i x^i\Big) \bmod q.$$

The polynomial is defined by $d + 1$ distinct points.

Fix $\ell \le d + 1$; let $x_1, \dots, x_\ell \in \{0, \dots, q-1\}$ be keys, and let $t_1, \dots, t_\ell$ denote the corresponding hash-function values.

Let $A^\ell = \{h_{\bar{a}} \in H \mid h_{\bar{a}}(x_i) = t_i \text{ for all } i \in \{1, \dots, \ell\}\}$. Then

$$h_{\bar{a}} \in A^\ell \iff h_{\bar{a}} = f_{\bar{a}} \bmod n \ \text{ and } \ f_{\bar{a}}(x_i) \in \underbrace{\{t_i + \alpha \cdot n \mid \alpha \in \{0, \dots, \lceil q/n \rceil - 1\}\}}_{=:\, B_i}.$$

$A^\ell$ denotes the set of hash-functions such that every $x_i$ hits its pre-defined position $t_i$. $B_i$ is the set of positions that $f_{\bar{a}}$ can hit so that $h_{\bar{a}}$ still hits $t_i$.

In order to obtain the cardinality of $A^\ell$ we choose our polynomial by fixing $d + 1$ points. We first fix the values for inputs $x_1, \dots, x_\ell$. We have $|B_1| \cdot \ldots \cdot |B_\ell|$ possibilities to do this (so that $h_{\bar{a}}(x_i) = t_i$).

Now, we choose $d - \ell + 1$ other inputs and choose their value arbitrarily. We have $q^{d-\ell+1}$ possibilities to do this.

Therefore we have

$$|B_1| \cdot \ldots \cdot |B_\ell| \cdot q^{d-\ell+1} \le \Big\lceil \frac{q}{n} \Big\rceil^{\ell} \cdot q^{d-\ell+1}$$

possibilities to choose $\bar{a}$ such that $h_{\bar{a}} \in A^\ell$.

Therefore the probability of choosing $h_{\bar{a}}$ from $A^\ell$ is only

$$\frac{\lceil q/n \rceil^{\ell} \cdot q^{d-\ell+1}}{q^{d+1}} \le \frac{\big(\frac{q+n}{n}\big)^{\ell}}{q^{\ell}} \le \Big(\frac{q+n}{q}\Big)^{\ell} \cdot \frac{1}{n^{\ell}} \le \Big(1 + \frac{1}{\ell}\Big)^{\ell} \cdot \frac{1}{n^{\ell}} \le \frac{e}{n^{\ell}}.$$

The second-to-last step uses $q \ge (d+1)n \ge \ell n$. This shows that H is $(e, d+1)$-independent.

Perfect Hashing

Suppose that we know the set S of actual keys (no insert/no delete). Then we may want to design a simple hash-function that maps all these keys to different memory locations.

[Figure: the actual keys S inside the universe U are mapped to distinct table positions.]

Let $m = |S|$. We could simply choose the hash-table size very large so that we don't get any collisions.

Using a universal hash-function the expected number of collisions is

$$E[\#\text{Collisions}] = \binom{m}{2} \cdot \frac{1}{n}.$$

If we choose $n = m^2$ the expected number of collisions is strictly less than $\frac{1}{2}$.

Can we get an upper bound on the probability of having collisions? The probability of having 1 or more collisions can be at most $\frac{1}{2}$ as otherwise the expectation would be larger than $\frac{1}{2}$.

We can find such a hash-function by a few trials. However, a hash-table size of $n = m^2$ is very, very high.

We construct a two-level scheme. We first use a hash-function that maps elements from S to m buckets. Let $m_j$ denote the number of items that are hashed to the j-th bucket. For each bucket we choose a second hash-function that maps the elements of the bucket into a table of size $m_j^2$. The second function can be chosen such that all elements are mapped to different locations.
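The two-level scheme can be sketched for a static set of integer keys. This is a minimal illustration under several assumptions that are not part of the slides: keys are integers in $\{0, \dots, P-1\}$ for the fixed prime P = 2^31 - 1, hash-functions are drawn from the family $h_{a,b}$, the first level is redrawn until the second-level tables need at most 4m cells in total, and all names are hypothetical.

```python
import random

P = 2_147_483_647  # assumed prime modulus (2^31 - 1); keys must lie in {0, ..., P-1}


def random_hash(n: int, rng: random.Random):
    """Draw h_{a,b}(x) = ((a*x + b) mod P) mod n with a != 0."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda x: ((a * x + b) % P) % n


class PerfectHashTable:
    """Static two-level table: m buckets, bucket j gets its own table of size m_j^2."""

    def __init__(self, keys, seed: int = 0):
        rng = random.Random(seed)
        keys = list(set(keys))
        m = max(1, len(keys))
        # First level: redraw h until the total second-level size is at most 4m.
        while True:
            self.h = random_hash(m, rng)
            buckets = [[] for _ in range(m)]
            for k in keys:
                buckets[self.h(k)].append(k)
            if sum(len(b) ** 2 for b in buckets) <= 4 * m:
                break
        # Second level: redraw h_j until bucket j is stored without collisions.
        self.tables = []
        for bucket in buckets:
            size = len(bucket) ** 2
            while True:
                hj = random_hash(size, rng) if size else None
                table = [None] * size
                collision_free = True
                for k in bucket:
                    pos = hj(k)
                    if table[pos] is not None:
                        collision_free = False
                        break
                    table[pos] = k
                if collision_free:
                    break
            self.tables.append((hj, table))

    def __contains__(self, key) -> bool:
        hj, table = self.tables[self.h(key)]
        return hj is not None and table[hj(key)] == key


if __name__ == "__main__":
    t = PerfectHashTable([3, 17, 42, 1000, 123456])
    print(42 in t, 43 in t)
```

A lookup evaluates exactly two hash-functions and inspects one cell, so the search time is constant in the worst case.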

[Figure: two-level scheme. The keys of S are hashed into m buckets with $\sum_i m_i = m$; bucket j has its own second-level table of size $m_j^2$.]

The total memory that is required by all hash-tables is $O(\sum_j m_j^2)$. Note that $m_j$ is a random variable.

$$E\Big[\sum_j m_j^2\Big] = E\Big[2 \sum_j \binom{m_j}{2} + \sum_j m_j\Big] = 2\, E\Big[\sum_j \binom{m_j}{2}\Big] + E\Big[\sum_j m_j\Big]$$

The first expectation is simply the expected number of collisions for the first level. Since we use universal hashing we have

$$= 2 \binom{m}{2} \frac{1}{m} + m = 2m - 1.$$

We need only expected $O(m)$ time to construct a hash-function h with $\sum_j m_j^2 \le 4m$, because with probability at least 1/2 a random function from a universal family will have this property.

Then we construct a hash-table $h_j$ for every bucket. This takes expected time $O(m_j)$ for every bucket. A random function $h_j$ is collision-free with probability at least 1/2. We need $O(m_j)$ to test this.

We only need that the hash-functions are chosen from a universal family!

Cuckoo Hashing

Goal: Try to generate a hash-table with constant worst-case search time in a dynamic scenario.
- Two hash-tables $T_1[0, \dots, n-1]$ and $T_2[0, \dots, n-1]$, with hash-functions $h_1$ and $h_2$.
- An object x is either stored at location $T_1[h_1(x)]$ or $T_2[h_2(x)]$.
- A search clearly takes constant time if the above constraint is met.

[Figure: example of an insert into the two tables $T_1$ and $T_2$; the inserted key evicts the key in its $T_1$-cell, which in turn moves to its $T_2$-cell, and so on.]

Algorithm 7 Cuckoo-Insert(x)
  if T_1[h_1(x)] = x ∨ T_2[h_2(x)] = x then return
  steps ← 1
  while steps ≤ maxsteps do
    exchange x and T_1[h_1(x)]
    if x = null then return
    exchange x and T_2[h_2(x)]
    if x = null then return
    steps ← steps + 1
  rehash() // change hash-functions; rehash everything
  Cuckoo-Insert(x)

We call one iteration through the while-loop a step of the algorithm. We call a sequence of iterations through the while-loop without the termination condition becoming true a phase of the algorithm. We say a phase is successful if it is not terminated by the maxstep-condition, but the while loop is left because x = null.

What is the expected time for an insert-operation?

We first analyze the probability that we end up in an infinite loop (that is then terminated after maxsteps steps). Formally, what is the probability to enter an infinite loop that touches s different keys?
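The Cuckoo-Insert procedure above can be sketched as runnable code. This is an illustration under assumptions that go beyond the slides: keys are integers below the prime P = 2^31 - 1, the two hash-functions are drawn from the simple family $h_{a,b}$ (which is only universal and therefore weaker than the independence that the analysis requires), the table size stays fixed, and the class and method names are made up.

```python
import random

P = 2_147_483_647  # assumed prime modulus (2^31 - 1); keys must lie in {0, ..., P-1}


class CuckooHashTable:
    """Two tables T1, T2; a key x lives in T1[h1(x)] or in T2[h2(x)]."""

    def __init__(self, n: int = 8, maxsteps: int = 16, seed: int = 0):
        self.n = n
        self.maxsteps = maxsteps
        self.rng = random.Random(seed)
        self.T1 = [None] * n
        self.T2 = [None] * n
        self._new_hash_functions()

    def _new_hash_functions(self) -> None:
        # One pair (a, b) per table, defining ((a*x + b) mod P) mod n.
        self.params = [(self.rng.randrange(1, P), self.rng.randrange(P))
                       for _ in range(2)]

    def _h(self, i: int, x: int) -> int:
        a, b = self.params[i]
        return ((a * x + b) % P) % self.n

    def search(self, x: int) -> bool:
        # Constant time: only two cells can contain x.
        return self.T1[self._h(0, x)] == x or self.T2[self._h(1, x)] == x

    def insert(self, x: int) -> None:
        if self.search(x):
            return
        for _ in range(self.maxsteps):
            p = self._h(0, x)
            x, self.T1[p] = self.T1[p], x      # exchange x and T1[h1(x)]
            if x is None:
                return
            p = self._h(1, x)
            x, self.T2[p] = self.T2[p], x      # exchange x and T2[h2(x)]
            if x is None:
                return
        self._rehash()                          # phase was not successful
        self.insert(x)

    def _rehash(self) -> None:
        # Change the hash-functions and re-insert every stored key.
        items = [k for k in self.T1 + self.T2 if k is not None]
        self.T1 = [None] * self.n
        self.T2 = [None] * self.n
        self._new_hash_functions()
        for k in items:
            self.insert(k)


if __name__ == "__main__":
    t = CuckooHashTable(n=16)
    for k in (5, 21, 37, 1000, 42, 7, 99, 123456):
        t.insert(k)
    print(t.search(42), t.search(6))
```

The sketch only behaves well while the number of stored keys stays well below n; growing or shrinking the tables is left out.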

[Figure: example of an insert of x that runs into a cycle, and the corresponding cycle-structure: cells $p_1, \dots, p_9$ alternating between $T_1$ and $T_2$, linked by the keys $x = x_1, x_2, \dots$.]

A cycle-structure is active if for every key $x_\ell$ (linking a cell $p_i$ from $T_1$ and a cell $p_j$ from $T_2$) we have

$$h_1(x_\ell) = p_i \quad \text{and} \quad h_2(x_\ell) = p_j.$$

Observation: If during a phase the insert-procedure runs into a cycle there must exist an active cycle-structure of size $s \ge 3$.

What is the probability that all keys in a cycle-structure of size s correctly map into their $T_1$-cell? This probability is at most $\frac{\mu}{n^s}$ since $h_1$ is a $(\mu, s)$-independent hash-function.

What is the probability that all keys in the cycle-structure of size s correctly map into their $T_2$-cell? This probability is at most $\frac{\mu}{n^s}$ since $h_2$ is a $(\mu, s)$-independent hash-function.

These events are independent.

The probability that a given cycle-structure of size s is active is at most $\frac{\mu^2}{n^{2s}}$.

What is the probability that there exists an active cycle-structure of size s?

The number of cycle-structures of size s is at most

$$s^3 \cdot n^{s-1} \cdot m^{s-1}.$$

- There are at most $s^2$ possibilities where to attach the forward and backward links.
- There are at most s possibilities to choose where to place key x.
- There are $m^{s-1}$ possibilities to choose the keys apart from x.
- There are $n^{s-1}$ possibilities to choose the cells.

The probability that there exists an active cycle-structure is therefore at most

$$\sum_{s=3}^{\infty} s^3 \cdot n^{s-1} \cdot m^{s-1} \cdot \frac{\mu^2}{n^{2s}} = \frac{\mu^2}{nm} \sum_{s=3}^{\infty} s^3 \Big(\frac{m}{n}\Big)^{s} \le \frac{\mu^2}{m^2} \sum_{s=3}^{\infty} s^3 \Big(\frac{1}{1+\epsilon}\Big)^{s} = O\Big(\frac{1}{m^2}\Big).$$

Here we used the fact that $(1 + \epsilon)m \le n$. Hence,

$$\Pr[\text{cycle}] = O\Big(\frac{1}{m^2}\Big).$$

Now, we analyze the probability that a phase is not successful without running into a closed cycle.

[Figure: cells $p_1, \dots, p_9$ linked by the keys $x_1, \dots, x_9$.] Sequence of visited keys:

$$x = x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_3, x_2, x_1 = x, x_8, x_9, \dots$$

Consider the sequence of not necessarily distinct keys starting with x in the order that they are visited during the phase.

Lemma 11. If the sequence is of length p then there exists a sub-sequence of at least $p/3$ distinct keys starting with x.

Proof. x is contained at most twice in the sequence. Either the sub-sequence starting from x until right before the first repeated key, or the sub-sequence starting from the repetition of x until the end must contain at least $p/3$ distinct keys.

[Figure: two path-structures, each consisting of the cells $p_1, \dots, p_9$ linked by the keys $x_1, \dots, x_8$.]

A path-structure of size s is defined by
- $s + 1$ different cells (alternating between cells from $T_1$ and $T_2$),
- s distinct keys $x = x_1, x_2, \dots, x_s$, linking the cells,
- the leftmost cell is either from $T_1$ or $T_2$.

A path-structure is active if for every key $x_\ell$ (linking a cell $p_i$ from $T_1$ and a cell $p_j$ from $T_2$) we have

$$h_1(x_\ell) = p_i \quad \text{and} \quad h_2(x_\ell) = p_j.$$

Observation: If a phase takes at least t steps without running into a cycle there must exist an active path-structure of size $(2t - 1)/3$.

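The probability that a given path-structure of size s is active is at most $\frac{\mu^2}{n^{2s}}$.

The probability that there exists an active path-structure of size s is at most

$$2 \cdot n^{s+1} \cdot m^{s-1} \cdot \frac{\mu^2}{n^{2s}} \le 2\mu^2 \Big(\frac{m}{n}\Big)^{s-1} \le 2\mu^2 \Big(\frac{1}{1+\epsilon}\Big)^{s-1}.$$

Plugging in $s = (2t-1)/3$ gives

$$2\mu^2 \Big(\frac{1}{1+\epsilon}\Big)^{(2t-1)/3 - 1} = 2\mu^2 \Big(\frac{1}{1+\epsilon}\Big)^{2(t-2)/3}.$$

We choose $\text{maxsteps} \ge 3\ell/2 + 1$. Then the probability that a phase is terminated unsuccessfully without running into a cycle is at most

$$\Pr[\text{unsuccessful} \wedge \text{no cycle}] \le \Pr\Big[\exists \text{ active path-structure of size at least } \tfrac{2\,\text{maxsteps}+1}{3}\Big]$$
$$\le \Pr[\exists \text{ active path-structure of size at least } \ell + 1] \le 2\mu^2 \Big(\frac{1}{1+\epsilon}\Big)^{\ell} \le \frac{1}{m^2}$$

by choosing

$$\ell \ge \log\Big(\frac{1}{2\mu^2 m^2}\Big) \Big/ \log\Big(\frac{1}{1+\epsilon}\Big) = \log\big(2\mu^2 m^2\big) / \log(1+\epsilon).$$

Note that this gives $\text{maxsteps} = \Theta(\log m)$.

The expected number of steps in the successful phase of an insert operation is:

$$E[\text{number of steps} \mid \text{phase successful}] = \sum_{t \ge 1} \Pr[\text{search takes at least } t \text{ steps} \mid \text{phase successful}].$$

We have

$$\Pr[\text{search at least } t \text{ steps} \mid \text{successful}] = \frac{\Pr[\text{search at least } t \text{ steps} \wedge \text{successful}]}{\Pr[\text{successful}]} \le \frac{1}{c} \cdot \frac{\Pr[\text{search at least } t \text{ steps} \wedge \text{no cycle}]}{\Pr[\text{no cycle}]} = \frac{1}{c} \Pr[\text{search at least } t \text{ steps} \mid \text{no cycle}],$$

where we use the fact that a successful phase does not run into a cycle and that, for a suitable constant $c > 0$,

$$\Pr[\text{successful}] = \Pr[\text{no cycle}] - \Pr[\text{unsuccessful} \wedge \text{no cycle}] \ge c \cdot \Pr[\text{no cycle}].$$

Hence,

$$E[\text{number of steps} \mid \text{phase successful}] \le \frac{1}{c} \sum_{t \ge 1} \Pr[\text{search at least } t \text{ steps} \mid \text{no cycle}] \le \frac{1}{c} \Big[1 + \sum_{t \ge 2} 2\mu^2 \Big(\frac{1}{1+\epsilon}\Big)^{2(t-2)/3}\Big]$$
$$= \frac{1}{c} + \frac{2\mu^2}{c} \sum_{t \ge 0} \Big(\frac{1}{(1+\epsilon)^{2/3}}\Big)^{t} = O(1).$$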

A phase that is not successful induces cost $O(m)$ for doing a complete rehash (this dominates the cost for the steps in the phase).

The probability that a phase is not successful is $p = O(1/m^2)$ (probability $O(1/m^2)$ of running into a cycle and probability $O(1/m^2)$ of reaching maxsteps without running into a cycle).

The expected number of unsuccessful phases is

$$\sum_{i \ge 1} p^i = \frac{1}{1-p} - 1 = \frac{p}{1-p} = O(p).$$

Therefore the expected cost for re-hashes is $O(m) \cdot O(p) = O(1/m)$.

What kind of hash-functions do we need? Since maxsteps is $\Theta(\log m)$ the largest size of a path-structure or cycle-structure contains just $\Theta(\log m)$ different keys. Therefore, it is sufficient to have $(\mu, \Theta(\log m))$-independent hash-functions.

How do we make sure that $n \ge (1 + \epsilon)m$?
- Let $\alpha := 1/(1 + \epsilon)$.
- Keep track of the number of elements in the table. When $m \ge \alpha n$ we double n and do a complete re-hash (table-expand).
- Whenever m drops below $\alpha n/4$ we divide n by 2 and do a rehash (table-shrink).
- Note that right after a change in table-size we have $m = \alpha n/2$. In order for a table-expand to occur at least $\alpha n/2$ insertions are required. Similarly, for a table-shrink at least $\alpha n/4$ deletions must occur.
- Therefore we can amortize the rehash cost after a change in table-size against the cost for insertions and deletions.

Lemma 12. Cuckoo Hashing has an expected constant insert-time and a worst-case constant search-time.

Note that the above lemma only holds if the fill-factor (number of keys / total number of hash-table slots) is at most $\frac{1}{2(1+\epsilon)}$.