Algorithms Proessor John Rei Hash Function : A B ALG 4.2 Universal Hash Functions: CLR - Chapter 34 Auxillary Reading Selections: AHU-Data Section 4.7 BB Section 8.4.4 Handout: Carter & Wegman, "Universal Classes o Hash Functions", JCSS, Vol. 8, pp. 43-54, 979. keys indices has conlict at x,y e A i xπy but (x = (y { s i xπy and (x = (y (x,y = 0 else 2
I H is a set o hash unctions, s H (x,y = Â s (x,y e H or set o keys S, s H (x,s = Â e H Â y e S s (x,y a Keys a ŒH ( a b ( a b - Total Conlicts b Indices b a b a b keys / index conlicts / index [( ( a b - ] a 2 b - a 3 4
H is a universal 2 set o hash unctions i s H (x,y H B or all x,y e A i.e. no pair o keys x,y are mapped into the same index by > B o all unctions in H Proposition Given any set H o hash n, $ x,y e A s.t. ( s H (x,y > H - B A proo let a = A, b = B By counting, we can show x (x = (y A B y Conlict ( - ( AA b - a a a s, b 2 2 b 5 6
Thus sh (A,A a 2 H ( b - a By the pidgeon hole principle $ x,y e A s.t s H (x,y > H ( b - a note in most applications, A >> B and then any universal 2 class, has asymptotically a minimum number o conlicts Proposition 2: Let x A, S c A For chosen randomly rom a universal 2 class H o hash unctions, the expected number o colisions is proo E( s (x,s = H = H Â y e S H Â S = B y e S S s (x,s B Â e H s (x,s s H (x,y by deinition H B ' by deinition o universal 2 7 8
application associative memory storage o S keys onto B linked lists. Given key x e A, store x in list (x Proposition 2 implies each list has expected S length B = 0( i B S Gives 0( time or STORE, RETRIEVE, and DELETE operations Proposition 3 Let R be a sequence o requests with k insertion operations into an associative memory. I is chosen at random rom set o universal 2 class H, the expected total cost o all k searches is R ( + k B. proo There are R total search ops, and each takes by Proposition 2 expected k time +. B 9 note i B k, then expected total time is O( R. 0
Bounds on distribution o s (x,s Proposition 4 Let x Œ A, S A Let m = expected value o s (x,s For chosen randomly rom universal 2 set o unctions H, Prob (s (x,s > t. m < t H = universal set o hash unctions. 2 E = Expected cost o random set o k requests using a worst case unction in H (random input proo immediate rom Markov bound improved prob t 4 bounds on probability: or universal hash ns. H2, H 3 (using 2nd and 4th moments o prob. distribution. E = Expected cost o worst case set o k requests 2 using a random unction in H (randomized algorithm 2
P rop 5 E ( - e E 2 where e = B A Example o Universal 2 Class proo Let a = A, b = B. S Prop 2 implies E 2 + b Suppose S is chosen randomly. or x, y e S, E( s (x,y = a 2 s (A,A Set o Keys Table Let A = {0,,..., a-} Set o Keys B = {0,,..., b-} Table Let p be a prime a Zp = {0,,..., p-} = number ield mod p deine g : Z p Æ B s.t. g(x = x mod b deine or n,m e Z p with m π o, a 2 a 2 ( b - a by Prop h n, m : A Æ Z p with h n, m (x = (mx+n mod p deine n, m : A Æ B s.t. n, m (x = g(h m, n (x ( b - a H = { m,n m,n e Z p, mπo} So E + E (s (x,s Claim: H is universal 2 + S ( b - a 3 4
proo Lemma or distinct x, y ea, s H (x,y = s g (Z p, Z p s g (Z p, Z p = {(r,s r,s e Z p, r π s, g(r = g(s} Observe that the linear equations: xm + n = r (mod p ym + n = s (mod p have unique solutions in Z p So (r,s = (h m, n (x, h m, n (y then ( m, n (x = m, n (y i and only i g(r = g(s s H (x,y is the number o such pairs in (r,s e s g (Z p, Z p Theorem proo H is universal 2 Let n i = {t e Z p g(t = i} By deinition o g(x = x mod b, p- i n i + b For any given r, the number o s where s π r and g(r = g(s is s (r, Z p g p-. b But there are p choices o r, (p- so p ( b s (Z p, Z p g = s H (x,y by Lemma (Also note s (x,x = 0 H H Hence s H (x,y b since H = p(p- so H is universal 2 5 6
Universal Hash Fns on Long keys Given class o hash unctions H, deine hash unctions J = {h,g,g Œ H } where h,g (x, x 2 = (x g(x 2 exclusive or { 0 } Theorem Suppose B =,,, b = power o 2. Suppose this class o ns A $ real r" ieb" x, y ea, x π y K where b is a Æ B i { ( ( = } H e x y i rh Then " xy, Œ ( A A, x π y hj e hx ( hy ( = i rh { } Proo or x x, x, y y, y in A A = ( 2 = ( 2 { } ib e then hj e hx ( hy ( = i { g H x g x y g y i} =, ( ( ( ( = e 2 2 { }  H x y i gx gy = e ( ( = ( ( yh e 2 2 { Œ ( ( = } H x y i rh 7 example H with m = 0 gives J with r = universal! B 8
Universal 2 Hashing with out Multiplication A = set o d digit numbers base a so, A = a d B = set o binary numbers length j M = arrays o length d. a, with elements in B "m e M let m(k = kth element o array m "x e A let x k = kth digit o x base a d deinition m (x = m(x + m(x +x 2 +2... m( Â x k k= +k array m m( m(k Theorem H 2 = { m m e M } is universal 2 proo or x, y e A, let m (x = r r 2... r s rows o m m (y = r s+... r t Then m (x = m (y i r... r t = o But i xπy i $ k s. t. r k in only one o m (x, m (y so ( (x = (y m m i r k = r i i π k But there are only B possibilities or row r k so x,y will collide or o ns m e H B 2 Hence H 2 is universal 2 9 20
Analysis o Hashing or Uniorm Random Hash n Hashing with Chaining keep list o conlicts at each index load actor a = # o keys hashed # o indicies in Hash Table...... Conlicts... length is binomial variable expected length = a 2 Expected Time Cost per hash = O( + a By Cherno Bounds, with high likelyhood time cost per hash O( alog (# keys 22
Open Address Hashing (With Uniorm Random Hash n Resolve conlicts by applying another hash unction a = load actor = prob. o occupied hash address # rehashes as geometric variable expected hash time = - a = 2 + a + a + K 23