Hashing. Alexandra Stefan



Hash tables
Tables:
- Direct access table (or key-index table): key => index
- Hash table: key => hash value => index
Main components:
- Hash function
- Collision resolution: different keys mapped to the same index
- Dynamic hashing: reallocate the table as needed. If an Insert operation brings the load factor to 1/2, double the table. If a Delete operation brings the load factor to 1/8, halve the table.
Properties:
- Good time-space trade-off
- Good for: Search, insert, delete: O(1)
- Not good for: Select, sort: not supported, must use a different method
Reading: chapter 11, CLRS (chapter 14, Sedgewick, has more complexity analysis)
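The doubling/halving policy above can be sketched as follows. This is a minimal sketch under the thresholds stated on the slide; `resized_capacity` is an illustrative helper name, not from the slides.

```c
#include <assert.h>

/* Resizing policy from the slide (thresholds are the slide's 1/2 and 1/8):
 * double the table when an insert brings the load factor N/M up to 1/2,
 * halve it when a delete brings the load factor down to 1/8. */
int resized_capacity(int n, int m) {
    if (2 * n >= m)           /* load factor >= 1/2 after an insert */
        return 2 * m;
    if (8 * n <= m && m > 1)  /* load factor <= 1/8 after a delete */
        return m / 2;
    return m;                 /* no resize needed */
}
```

Keeping the load factor inside [1/8, 1/2) this way bounds chain lengths while avoiding thrashing: a table that was just doubled must lose three quarters of its items before it shrinks again.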

Example
Let M = 10, h(k) = k%10
Insert keys:
6 -> 6
15 -> 5
20 -> 0
37 -> 7
3 -> 3
25 -> 5 collision
35 -> 5 collision
29 -> 9

index | k
0 | 20
3 | 3
5 | 15
6 | 6
7 | 37
(indexes 1, 2, 4, 8, 9 empty)

Collision resolution:
- Separate chaining
- Open addressing
  - Linear probing
  - Quadratic probing
  - Double hashing

Hash functions
M - table size. h - hash function that maps a key to an index.
We want random-like behavior: any key can be mapped to any index with equal probability.
Typical functions:
- h(k,M) = floor( ((k-s)/(t-s)) * M ), where s <= k < t.
  Simple; good if keys are random, not so good otherwise.
- h(k,M) = k % M
  Best M is a prime number. (Avoid M that has a power of 2 as a factor; it will generate more collisions.) Choose M to be a prime number that is closest to the desired table size. If M = 2^p, it uses only the lower-order p bits => bad; ideally use all bits.
- h(k,M) = floor(M * (k*A mod 1)), 0 < A < 1 (in CLRS). Good A = 0.618033 (the golden ratio). Useful when M is not prime (can pick M to be a power of 2).
- Alternative: h(k,M) = (16161 * (unsigned)k) % M (from Sedgewick)
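A minimal sketch of two of the functions above: the division method and the CLRS multiplication method. The function names are illustrative, not from the slides.

```c
#include <assert.h>

/* Division method: h(k,M) = k % M. Best when M is prime. */
int hash_mod(int k, int M) {
    return k % M;
}

/* Multiplication method (CLRS): h(k,M) = floor(M * (k*A mod 1)),
 * with A close to the golden ratio. Works even when M is not prime,
 * e.g. a power of 2. */
int hash_mult(int k, int M) {
    const double A = 0.618033;
    double x = k * A;
    double frac = x - (long)x;   /* k*A mod 1 */
    return (int)(M * frac);
}
```

For example, with M = 10 and k = 25: hash_mod gives 5, while hash_mult gives floor(10 * 0.4508...) = 4.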

Collision Resolution: Separate Chaining
α = N/M (N items in the table, M table size) - load factor
Separate chaining:
- Each table entry points to a list of all items whose keys were mapped to that index.
- Requires extra space for links in lists.
- Lists will be short. On average, size α.
- Preferred when the table must support deletions.
Operations:
- Chain_Insert(T,x) - O(1): insert x in list T[h(x.key)] at the beginning. No search for duplicates.
- Chain_Delete(T,x) - O(1): delete x from list T[h(x.key)].
- Chain_Search(T,k) - Θ(1+α) (both successful and unsuccessful): search in list T[h(k)] for an item x with x.key == k.
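The operations above can be sketched as a minimal chaining table for integer keys. The names `chain_insert` and `chain_search` mirror the slide's Chain_Insert/Chain_Search; the fixed global table is a simplification for illustration.

```c
#include <stdlib.h>

/* Minimal separate-chaining table for integer keys, M = 10. */
#define M 10

typedef struct node { int key; struct node *next; } node;

static node *table[M];

static int h(int k) { return k % M; }

/* O(1): insert at the beginning of the chain, no duplicate check. */
void chain_insert(int k) {
    node *x = malloc(sizeof *x);
    x->key = k;
    x->next = table[h(k)];
    table[h(k)] = x;
}

/* Theta(1 + alpha) expected: scan only the one chain for the key. */
int chain_search(int k) {
    for (node *p = table[h(k)]; p != NULL; p = p->next)
        if (p->key == k) return 1;
    return 0;
}
```

Inserting the running example's keys 6, 15, 20, 37, 3, 25 puts 25 and 15 in the same chain at index 5, with 25 at the front because insertion is at the head.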

Separate Chaining
Example: insert 25. Let M = 10, h(k) = k%10.
Insert keys:
6 -> 6
15 -> 5
20 -> 0
37 -> 7
3 -> 3
25 -> 5 collision

index | keys
0 | 20
3 | 3
5 | 25 -> 15
6 | 6
7 | 37

Inserting at the beginning of the list is advantageous in cases where the more recent data is more likely to be accessed again (e.g., new students will have to go see several offices at UTA and so they will be looked up more frequently than continuing students).
α = 6/10

Collision Resolution: Open Addressing
α = N/M (N items in the table, M table size) - load factor
Open addressing: use empty cells in the table to store colliding elements.
- M > N
- α - ratio of used cells in the table (< 1).
- Probing - examining slots in the table. Number of probes = number of slots examined.
Linear probing: h(k,i,M) = (h1(k) + i) % M
- If the slot where the key is hashed is taken, use the next available slot.
- Long chains.
Quadratic probing: h(k,i,M) = (h1(k) + c1*i + c2*i^2) % M
- If two keys hash to the same value, they follow the same set of probes.
Double hashing: h(k,i,M) = (h1(k) + i * h2(k)) % M
- h2(k) should not evaluate to 0. (E.g. use: h2(k) = 1 + k%(M-1).)
- Use a second hash value as the jump size (as opposed to size 1 in linear probing).
- Want: h2(k) relatively prime with M. Either:
  - M prime and h2(k) = 1 + k%(M-1), or
  - M = 2^p and h2(k) odd (M and h2(k) will be relatively prime since all the divisors of M are powers of 2, thus even).
See figure 14.10, page 596 (Sedgewick) for the clustering produced by linear probing and double hashing.
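The three probe formulas can be written directly as functions. This sketch uses the constants from the worked example on the following slides (M = 10, c1 = 2, c2 = 1, h2(k) = 1 + (k%9)); those specific choices are the worksheet's, not general requirements.

```c
#include <assert.h>

/* Probe sequences for the M = 10 worked example. */
#define M 10

static int h1(int k) { return k % M; }

/* Linear: step 1 each probe. */
int linear_probe(int k, int i)    { return (h1(k) + i) % M; }

/* Quadratic with c1 = 2, c2 = 1, as in the worksheet. */
int quadratic_probe(int k, int i) { return (h1(k) + 2*i + i*i) % M; }

/* Double hashing with h2(k) = 1 + (k % 9): never 0, so the
 * probe sequence always advances. */
int double_probe(int k, int i) {
    int h2 = 1 + (k % 9);
    return (h1(k) + i * h2) % M;
}
```

For k = 25, quadratic_probe yields the worksheet's slot sequence 5, 8, 3, 0 and double_probe yields 5, 3, 1, 9, ...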

Open Addressing: Quadratic - Worksheet
M = 10, h1(k) = k%10. Table already contains keys: 6, 15, 20, 37, 3.
Next we want to insert 25: h1(25) = 5 (collision: 25 with 15).

Linear probing: h(k,i,M) = (h1(k) + i) % M (try slots: 5, 6, 7, 8)
Quadratic probing example: h(k,i,M) = (h1(k) + 2i + i^2) % M (try slots: 5, 8)
- Inserting 35 (not shown in table): try slots 5, 8, 3, 0.

(probe) i | h1(k) + 2i + i^2 | %10
0 | |
1 | |
2 | |
3 | |

index | Linear | Quadratic | Double hashing h2(k)=1+(k%7) | Double hashing h2(k)=1+(k%9)
0 | 20 | 20 | 20 | 20
3 | 3 | 3 | 3 | 3
5 | 15 | 15 | 15 | 15
6 | 6 | 6 | 6 | 6
7 | 37 | 37 | 37 | 37

Where will 29 be inserted now (after 35)?

Open Addressing: Quadratic - Answers
M = 10, h1(k) = k%10. Table already contains keys: 6, 15, 20, 37, 3.
Next we want to insert 25: h1(25) = 5 (collision: 25 with 15).

Linear probing: h(k,i,M) = (h1(k) + i) % M (try slots: 5, 6, 7, 8)
Quadratic probing example: h(k,i,M) = (h1(k) + 2i + i^2) % M (try slots: 5, 8)
- Inserting 35 (not shown in table): try slots 5, 8, 3, 0.

(probe) i | h1(k) + 2i + i^2 | %10
0 | 5+0 = 5 | 5
1 | 5+3 = 8 | 8
2 | 5+8 = 13 | 3
3 | 5+2*3+3^2 = 5+15 = 20 | 0

index | Linear | Quadratic | Double hashing h2(k)=1+(k%7) | Double hashing h2(k)=1+(k%9)
0 | 20 | 20 | 20 | 20
1 |  |  |  | 25
3 | 3 | 3 | 3 | 3
5 | 15 | 15 | 15 | 15
6 | 6 | 6 | 6 | 6
7 | 37 | 37 | 37 | 37
8 | 25 | 25 |  | 

Where will 29 be inserted now (after 35)?

Open Addressing: Double Hashing - Worksheet
M = 10, h1(k) = k%10. Table already contains keys: 6, 15, 20, 37, 3.
Try to insert 25: h1(25) = 5 (collision: 25 with 15).
Double hashing example: h(k,i,M) = (h1(k) + i * h2(k)) % M
Choice of h2 matters:
- h2a(k) = 1+(k%7): h2a(25) = 1+(25%7) = 1+4 = 5 => h(k,i,M) = (5 + i*5)%M => slots: 5, 0, 5, 0, ... Cannot insert 25.
- h2b(k) = 1+(k%9): h2b(25) = 1+(25%9) = 1+7 = 8 => h(k,i,M) = (5 + i*8)%M => slots: 5, 3, 1, 9, 7, 5, ...

(probe) i | index | (h1(k) + i*h2a(k))%M = (5+i*5)%10
0 | |
1 | |
2 | |
3 | |

(probe) i | index | (h1(k) + i*h2b(k))%M = (5+i*8)%10
0 | |
1 | |
2 | |
3 | |
4 | |
5 | |

index | Linear | Quadratic | Double hashing h2(k)=1+(k%7) | Double hashing h2(k)=1+(k%9)
0 | 20 | 20 | 20 | 20
3 | 3 | 3 | 3 | 3
5 | 15 | 15 | 15 | 15
6 | 6 | 6 | 6 | 6
7 | 37 | 37 | 37 | 37
8 | 25 | 25 |  | 

Where will 29 be inserted now?

Open Addressing: Double Hashing - Answers
M = 10, h1(k) = k%10. Table already contains keys: 6, 15, 20, 37, 3.
Try to insert 25: h1(25) = 5 (collision: 25 with 15).
Double hashing example: h(k,i,M) = (h1(k) + i * h2(k)) % M
Choice of h2 matters:
- h2a(k) = 1+(k%7): h2a(25) = 1+(25%7) = 1+4 = 5 => h(k,i,M) = (5 + i*5)%M => slots: 5, 0, 5, 0, ... Cannot insert 25.
- h2b(k) = 1+(k%9): h2b(25) = 1+(25%9) = 1+7 = 8 => h(k,i,M) = (5 + i*8)%M => slots: 5, 3, 1, 9, 7, 5, ...

(probe) i | index | (h1(k) + i*h2a(k))%M = (5+i*5)%10
0 | 5 | (5+0)%10 = 5
1 | 0 | (5+5)%10 = 0
2 | 5 | (5+2*5)%10 = 15%10 = 5, cycles back to 5 => cannot insert 25
3 | 0 | (5+3*5)%10 = 20%10 = 0

(probe) i | index | (h1(k) + i*h2b(k))%M = (5+i*8)%10
0 | 5 | (5+0)%10 = 5
1 | 3 | (5+8)%10 = 3
2 | 1 | (5+2*8)%10 = 21%10 = 1
3 | 9 | (5+3*8)%10 = 29%10 = 9
4 | 7 | (5+4*8)%10 = 37%10 = 7
5 | 5 | (5+5*8)%10 = 45%10 = 5, cycles back to 5

index | Linear | Quadratic | Double hashing h2(k)=1+(k%7) | Double hashing h2(k)=1+(k%9)
0 | 20 | 20 | 20 | 20
1 |  |  |  | 25
3 | 3 | 3 | 3 | 3
5 | 15 | 15 | 15 | 15
6 | 6 | 6 | 6 | 6
7 | 37 | 37 | 37 | 37
8 | 25 | 25 |  | 

Where will 29 be inserted now?
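The cycle above is exactly what happens when h2(k) shares a factor with M: the probe sequence visits only M/gcd(h2(k), M) distinct slots. A small check of that claim for M = 10 (the function name is illustrative):

```c
#include <assert.h>

/* Count how many distinct slots (h1 + i*h2) % M visits in M probes.
 * If gcd(h2, M) = g > 1, only M/g slots are ever examined --
 * the cycle seen in the worksheet when h2(25) = 5 and M = 10. */
#define M 10

int distinct_slots(int h1, int h2) {
    int seen[M] = {0}, count = 0;
    for (int i = 0; i < M; i++) {
        int s = (h1 + i * h2) % M;
        if (!seen[s]) { seen[s] = 1; count++; }
    }
    return count;
}
```

With h1 = 5: jump h2 = 5 reaches only 2 slots (5 and 0), jump h2 = 8 reaches 5 slots, while any h2 relatively prime to 10 (e.g. 3) reaches all 10.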

Open Addressing: Quadratic vs. Double Hashing
M = 10, h1(k) = k%10. Table already contains keys: 6, 15, 20, 37, 3.
Try to insert 25: h1(25) = 5 (collision: 25 with 15).

Quadratic hashing with 2i + i^2: h(k) = (h1(k) + 2i + i^2)%M, where h1(k) = k%M:
(probe) i | index | h(k) = (h1(k) + 2i + i^2)%M = (5 + 2i + i^2)%10 (k=25)
0 | |
1 | |
2 | |
3 | |

Double hashing: h(k) = (h1(k) + i*h2(k))%M, where h1(k) = k%M and h2(k) = 1+(k%(M-1)) = 1+(k%9); h2(25) = 1+(25%9) = 1+7 = 8:
(probe) i | index | h(k) = (h1(k) + i*h2(k))%M = (5 + i*8)%10 (for k=25)
0 | |
1 | |
2 | |
3 | |
4 | |
5 | |

index | Linear | Quadratic | Double hashing h2(k)=1+(k%7) | Double hashing h2(k)=1+(k%9)
0 | 20 | 20 | 20 | 20
1 |  |  |  | 25
3 | 3 | 3 | 3 | 3
5 | 15 | 15 | 15 | 15
6 | 6 | 6 | 6 | 6
7 | 37 | 37 | 37 | 37
8 | 25 | 25 |  | 

Choice of h2 matters. See h2(k) = 1+(k%7): h2(25) = 1+4 = 5 => h(25) cycles: 5, 0, 5, 0 => could not insert 25.

Where will 29 be inserted now?

Open Addressing: Quadratic vs. Double Hashing - Answers
M = 10, h1(k) = k%10. Table already contains keys: 6, 15, 20, 37, 3.
Try to insert 25: h1(25) = 5 (collision: 25 with 15).

Quadratic hashing with 2i + i^2: h(k) = (h1(k) + 2i + i^2)%M, where h1(k) = k%M:
(probe) i | index | h(k) = (5 + 2i + i^2)%10 (k=25)
0 | 5 | 5+0 = 5
1 | 8 | 5+3 = 8
2 | 3 | 5+8 = 13
3 | 0 | 5+2*3+3^2 = 5+15 = 20

Double hashing: h(k) = (h1(k) + i*h2(k))%M, where h1(k) = k%M and h2(k) = 1+(k%(M-1)) = 1+(k%9); h2(25) = 1+(25%9) = 1+7 = 8:
(probe) i | index | h(k) = (5 + i*8)%10 (for k=25)
0 | 5 | (5+0)%10 = 5
1 | 3 | (5+8)%10 = 3
2 | 1 | (5+2*8)%10 = 21%10 = 1
3 | 9 | (5+3*8)%10 = 29%10 = 9
4 | 7 | (5+4*8)%10 = 37%10 = 7
5 | 5 | (5+5*8)%10 = 45%10 = 5, cycles back to 5

index | Linear | Quadratic | Double hashing h2(k)=1+(k%7) | Double hashing h2(k)=1+(k%9)
0 | 20 | 20 | 20 | 20
1 |  |  |  | 25
3 | 3 | 3 | 3 | 3
5 | 15 | 15 | 15 | 15
6 | 6 | 6 | 6 | 6
7 | 37 | 37 | 37 | 37
8 | 25 | 25 |  | 

Choice of h2 matters. See h2(k) = 1+(k%7): h2(25) = 1+4 = 5 => h(25) cycles: 5, 0, 5, 0 => could not insert 25.

Where will 29 be inserted now?

Search and Deletion in Open Addressing
Searching: report as not found when you land on an EMPTY cell.
Deletion: mark the cell as DELETED, not as an EMPTY cell. Otherwise you will break the chain and not be able to find the elements following in that chain.
E.g., with linear probing and hash function h(k,i,10) = (k + i) % 10: insert 15, 25, 35, 5; search for 5; then delete 25 and search for 5 or 35.
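The EMPTY vs. DELETED distinction can be sketched for the slide's example, h(k,i,10) = (k + i) % 10. The function names and sentinel values are illustrative; the point is that search stops at EMPTY but probes past DELETED.

```c
#include <assert.h>

/* Linear-probing table with tombstones for non-negative integer keys. */
#define M 10
#define EMPTY   (-1)
#define DELETED (-2)

static int table[M];

void table_init(void) { for (int i = 0; i < M; i++) table[i] = EMPTY; }

void oa_insert(int k) {
    for (int i = 0; i < M; i++) {
        int s = (k + i) % M;
        if (table[s] == EMPTY || table[s] == DELETED) { table[s] = k; return; }
    }
}

int oa_search(int k) {
    for (int i = 0; i < M; i++) {
        int s = (k + i) % M;
        if (table[s] == EMPTY) return 0;  /* chain ends at an EMPTY cell */
        if (table[s] == k) return 1;      /* keep probing past DELETED */
    }
    return 0;
}

void oa_delete(int k) {
    for (int i = 0; i < M; i++) {
        int s = (k + i) % M;
        if (table[s] == EMPTY) return;    /* not in the table */
        if (table[s] == k) { table[s] = DELETED; return; }
    }
}
```

Running the slide's scenario: 15, 25, 35, 5 land in slots 5, 6, 7, 8. After deleting 25, slot 6 becomes DELETED rather than EMPTY, so searches for 5 and 35 still probe through it and succeed; marking it EMPTY would make them fail.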

Open Addressing: Clustering
Linear probing - primary clustering: the longer the chain, the higher the probability that it will increase.
Given a chain of size T in a table of size M, what is the probability that this chain will increase after a new insertion?
Quadratic probing?

Expected Time Complexity for Hash Operations (under perfect conditions)

Operation \ Method | Separate chaining | Open addressing
Successful search | Θ(1+α) | (1/α)ln(1/(1-α))
Unsuccessful search | Θ(1+α) | 1/(1-α)
Insert | Θ(1) - when: insert at the beginning and no search for duplicates | 1/(1-α)
Delete | Θ(1) - assumes: doubly-linked list and the node with the item to be deleted is given | The time complexity does not depend only on α, but also on the deleted cells. In such cases separate chaining may be preferred to open addressing, as its behavior has better guarantees.
Reference | Theorems 11.1 and 11.2 | Theorems 11.6 and 11.8 and Corollary 11.7 in CLRS

Perfect conditions: simple uniform hashing (separate chaining); uniform hashing (open addressing).
α = N/M is the load factor.

Perfect Hashing
Similar to separate chaining, but use another hash table instead of a linked list.
Can be done for static keys (once the keys are stored in the table, the set of keys never changes).
Corollary 11.11 (CLRS): Suppose that we store N keys in a hash table using perfect hashing. Then the expected storage used for the secondary hash tables is less than 2N.

Hashing Strings
Given strings with 7-bit encoded characters:
- View the string as a number in base 128 where each letter is a digit.
- Convert it to base 10.
- Apply mod M.
Example: "now": n = 110, o = 111, w = 119:
h("now", M) = (110*128^2 + 111*128^1 + 119) % M
See Sedgewick page 576 (figure 14.3) for the importance of M for collisions: M = 31 is better than M = 64.
When M = 64, it is a divisor of 128 => (110*128^2 + 111*128^1 + 119) % M = 119 % M => only the last letter is used.
=> If lowercase English letters: a,...,z => 97,...,122 => (% 64) => 26 indexes, [33,...,58], used out of the [0,63] available. Additional collisions will happen due to the last-letter distribution (how many words end in "t"?).
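The "only the last letter is used" claim is easy to check numerically with the slide's example word. This is a sketch of that one calculation; `hash_now` is an illustrative name, hard-coded to "now".

```c
#include <assert.h>

/* "now" as a base-128 number: n = 110, o = 111, w = 119. */
long now_as_number(void) {
    return 110L * 128 * 128 + 111L * 128 + 119;  /* = 1816567 */
}

/* h("now", M) for a given table size M. */
int hash_now(int M) {
    return (int)(now_as_number() % M);
}
```

Since 64 divides 128, every 128^j term vanishes mod 64 and hash_now(64) equals 119 % 64 = 55: the first two letters contribute nothing. With the prime M = 31, all three letters influence the result.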

Hashing Long Strings
For long strings, the previous method will result in number overflow. E.g., with 32-bit integers and an 11-character word: c*128^10 = c*(2^7)^10 = c*2^70.
(View the word as a number in base 128; each letter is a "digit".)
Solution: partial calculations.
Replace: c10*128^10 + c9*128^9 + ... + c1*128^1 + c0
with: ((...(c10*128 + c9)*128 + c8)*128 + ... + c1)*128 + c0
Compute it iteratively and apply mod M at each iteration:

// (Sedgewick)
int hash(char *v, int M) {
    int h = 0, a = 128;
    for (; *v != '\0'; v++)
        h = (a*h + *v) % M;
    return h;
}

// improvement: use 127, not 128: (Sedgewick)
int hash(char *v, int M) {
    int h = 0, a = 127;
    for (; *v != '\0'; v++)
        h = (a*h + *v) % M;
    return h;
}
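To see that applying % M at every iteration changes nothing, the Horner-style hash above can be compared against a direct base-128 evaluation on a string short enough not to overflow. This sketch restates the slide's a = 128 version next to the direct computation (names are illustrative):

```c
#include <assert.h>

/* Horner's rule with % M at each step, as in the slide's hash(). */
int hash_iter(const char *v, int M) {
    int h = 0;
    for (; *v != '\0'; v++)
        h = (128 * h + *v) % M;
    return h;
}

/* Direct base-128 value -- only safe for short strings,
 * which is exactly why the iterative version exists. */
long base128_value(const char *v) {
    long x = 0;
    for (; *v != '\0'; v++)
        x = 128 * x + *v;
    return x;
}
```

For "now", base128_value gives 1816567 and hash_iter("now", 31) gives 1816567 % 31 = 29; the two agree for any M because (a*h + c) % M depends only on h % M.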