Outline for Today. A simple and lightning fast hash table implementation. Why the degree of independence matters.

Franklin Hardy
5 years ago

1 Linear Probing

2 Outline for Today: Linear Probing Hashing: A simple and lightning-fast hash table implementation. Analyzing Linear Probing: Why the degree of independence matters. Fourth Moment Bounds: Another approach for estimating frequencies.

3 Hashing Strategies: All hash table implementations need to address what happens when collisions occur. Common strategies: Closed addressing: Store all elements with hash collisions in a secondary data structure (linked list, BST, etc.). Perfect hashing: Choose hash functions to ensure that collisions don't happen, and rehash or move elements when they do. Open addressing: Allow elements to leak out from their preferred position and spill over into other positions. Linear probing is an example of open addressing. We'll see a type of perfect hashing (cuckoo hashing) on Thursday.

4 Linear Probing: Linear probing is a simple open-addressing hashing strategy. To insert an element x, compute h(x) and try to place x there. If that spot is occupied, keep moving through the array, wrapping around at the end, until a free spot is found. [Figure: elements x, y, z, and w placed in a circular table with numbered slots.]
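The insertion rule can be sketched in Python as follows. This is a minimal illustration, not the lecture's code. The function name `lp_insert`, the use of `None` for free slots, and passing the hash function `h` as a parameter are all assumptions. The sketch also assumes the key is not already in the table.

```python
from typing import Callable, List, Optional

def lp_insert(table: List[Optional[object]], key: object,
              h: Callable[[object], int]) -> int:
    """Insert key using linear probing and return the slot it lands in.

    Hypothetical helper: None marks a free slot, and key is assumed absent.
    """
    m = len(table)
    start = h(key) % m
    for step in range(m):
        slot = (start + step) % m      # wrap around at the end of the array
        if table[slot] is None:        # the first free spot wins
            table[slot] = key
            return slot
    raise RuntimeError("no free slot: table is full")
```

For example, suppose the hand-picked hash values send both `x` and `z` to slot 2. Then `z` spills into slot 3. A key that hashes to an occupied last slot wraps around to slot 0.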


7 Linear Probing: To look up an element x, compute h(x) and start looking there. Move around the ring until either the element is found or a blank spot is detected. (We'll assume the load factor prohibits us from inserting so many elements that there are no free spaces.) [Figure: a lookup for element r walking around the circular table.]
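A matching lookup can be sketched the same way. As before, `lp_lookup` is a hypothetical name, `None` marks a blank slot, and the sketch assumes no deletions have happened (deletions are discussed next).

```python
from typing import Callable, List, Optional

def lp_lookup(table: List[Optional[object]], key: object,
              h: Callable[[object], int]) -> Optional[int]:
    """Return the slot holding key, or None if key is absent.

    Hypothetical helper: None marks a blank slot; assumes no deletions.
    """
    m = len(table)
    start = h(key) % m
    for step in range(m):              # at most one full trip around the ring
        slot = (start + step) % m
        if table[slot] is None:        # an insert of key would have stopped here
            return None
        if table[slot] == key:
            return slot
    return None                        # went all the way around without finding key
```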


11 Linear Probing: Deletions are a bit trickier than in chained hashing. We cannot just do a search and remove the element where we find it. Why? [Figure: deleting w from the circular table, then looking up r.]
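A small experiment shows what goes wrong. The hash values in `H` are hypothetical and were picked by hand so that `z` collides with `x`. The helper `naive_delete` is deliberately broken: it simply blanks the slot.

```python
from typing import Dict, List, Optional

# Hypothetical hash values, picked by hand so that z collides with x.
H: Dict[str, int] = {"x": 2, "z": 2}

def lookup(table: List[Optional[str]], key: str) -> Optional[int]:
    """Linear-probing lookup that stops at the first blank (None) slot."""
    m = len(table)
    for step in range(m):
        slot = (H[key] + step) % m
        if table[slot] is None:
            return None
        if table[slot] == key:
            return slot
    return None

def naive_delete(table: List[Optional[str]], key: str) -> None:
    """Deliberately broken deletion: blanks out the slot holding key."""
    slot = lookup(table, key)
    if slot is not None:
        table[slot] = None

table: List[Optional[str]] = [None] * 8
table[2], table[3] = "x", "z"   # z hashed to slot 2 but spilled into slot 3
print(lookup(table, "z"))       # 3
naive_delete(table, "x")
print(lookup(table, "z"))       # None, even though z is still stored in slot 3
```

Blanking slot 2 cuts the run that led from z's home slot to z, so the lookup stops early.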


14 Linear Probing: Deletions are often implemented using tombstones. When removing an element, mark that the cell is empty and was previously occupied. When doing a lookup, don't stop at a tombstone. Instead, keep the search going. You need to watch out for wraparounds. When inserting, feel free to replace any tombstone you encounter. [Figure: w's cell replaced by a tombstone (墓); a lookup for r continues past it.]
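Here is a minimal sketch of tombstone deletion. The class name `TombstoneTable` and the `TOMBSTONE` sentinel object are hypothetical. Insertion first checks whether the key is already present; otherwise a copy stored past a tombstone could be duplicated. Every probe loop is bounded by the table size, which handles the wraparound concern even when the table holds no blank slots.

```python
from typing import Callable, Iterator, List, Optional

EMPTY = None          # a slot that has never held an element
TOMBSTONE = object()  # a slot that is empty now but was previously occupied

class TombstoneTable:
    """Hypothetical linear-probing table that deletes by leaving tombstones."""

    def __init__(self, m: int, h: Callable[[object], int]) -> None:
        self.slots: List[object] = [EMPTY] * m
        self.h = h

    def _probe(self, key: object) -> Iterator[int]:
        m = len(self.slots)
        start = self.h(key) % m
        for step in range(m):               # bounded, so wraparound can't loop forever
            yield (start + step) % m

    def lookup(self, key: object) -> Optional[int]:
        for slot in self._probe(key):
            cell = self.slots[slot]
            if cell is EMPTY:               # only a never-used slot ends the search
                return None
            if cell is not TOMBSTONE and cell == key:
                return slot                 # tombstones are skipped, not treated as blanks
        return None

    def delete(self, key: object) -> bool:
        slot = self.lookup(key)
        if slot is None:
            return False
        self.slots[slot] = TOMBSTONE
        return True

    def insert(self, key: object) -> int:
        found = self.lookup(key)
        if found is not None:               # avoid storing a second copy past a tombstone
            return found
        for slot in self._probe(key):
            if self.slots[slot] is EMPTY or self.slots[slot] is TOMBSTONE:
                self.slots[slot] = key      # reuse the first tombstone (or blank) we meet
                return slot
        raise RuntimeError("no free slot: table is full")
```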


17 Linear Probing in Practice: In practice, linear probing is one of the fastest general-purpose hashing strategies available. This is surprising: it was originally invented in 1954! It's pretty amazing that it still holds up so well. Why is this? Low memory overhead: just need an array and a hash function. Excellent locality: when collisions occur, we only search in adjacent locations in the array. Great cache performance: a combination of the above two factors.

18 The Weakness: Linear probing exhibits severe performance degradations when the load factor gets high. The number of collisions tends to grow as a function of the number of existing collisions. This is called primary clustering.
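A quick simulation makes primary clustering visible. It assumes idealized, truly random hash values, drawn with Python's `random` module as a stand-in for h(x). It measures the average number of probes per insertion as the load factor grows. For comparison, Knuth's classical analysis predicts roughly ½(1 + 1/(1 − α)) probes for a successful search under truly random hashing, which blows up as α approaches 1. The function name and the table size 10,007 are arbitrary choices.

```python
import random

def average_probes(load: float, m: int = 10_007, seed: int = 0) -> float:
    """Average probes per insertion into an m-slot table filled to `load`."""
    rng = random.Random(seed)
    occupied = [False] * m
    n = int(load * m)
    total = 0
    for _ in range(n):
        slot = rng.randrange(m)        # stand-in for h(x): assumes truly random hashing
        probes = 1
        while occupied[slot]:          # linear probing with wraparound
            slot = (slot + 1) % m
            probes += 1
        occupied[slot] = True
        total += probes
    return total / n

for load in (0.25, 0.5, 0.75, 0.9):
    print(load, round(average_probes(load), 2))
```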

19 So how fast is linear probing?

20 Time-Out for Announcements!

21 Final Project Topics: Final project topics have been assigned, and we're really excited to see what you end up making! We recommend that you make slow and steady progress on the project over the next couple of weeks. We'll work out a presentation schedule in a week or so.

22 Problem Sets: Problem Set Four is due this Thursday at 2:30PM. Have questions? Stop by office hours or ask on Piazza! We're working on grading PS3 right now and will try to get it back to you soon. PS5 will go out on Thursday and will be due one week from this Thursday. And that's it!

23 Later This Week: Keith will be out of town through the end of the week. Rafa and Mitchell will be covering Keith's office hours at the regular time (2PM to 4PM) in the Huang Basement. Sam will be giving Thursday's lecture on cuckoo hashing (super interesting stuff!).

25 GTGTC Exec Applications: Girls Teaching Girls to Code (GTGTC) is looking for people to serve on next year's executive committee. This is an excellent program that's been around for years. It's a great way to make an impact. Interested? Apply here by this Sunday.

26 Back to CS166!

27 Analyzing Linear Probing

28 You probably saw an analysis of chained hash tables in CS161. What makes linear probing different, interesting, or noteworthy?

29 Why Linear Probing is Different: In chained hashing, collisions only occur when two values have exactly the same hash code. In linear probing, collisions can occur between elements with entirely different hash codes. To analyze linear probing, we need to know more than just how many elements collide with us. [Figure: the lookup time here is huge even though this key only directly collides with one other.]

30 Some Brief History: In 1954, Gene Amdahl, Elaine McGraw, and Arthur Samuel invent linear probing as a subroutine for an assembler. In 1962, Don Knuth, in his first ever analysis of an algorithm, proves that linear probing takes expected time O(1) for lookups if the hash function is truly random (n-wise independence). In 1995, Schmidt and Siegel proved that O(log n)-independent hash functions guarantee fast performance for linear probing, but note that such hash functions either take a long time to evaluate or require a lot of space. In 2006, Anna Pagh et al. proved that 5-independent hash functions give expected constant-time lookups. (This is the analysis we'll see today.) These hash functions can be stored in O(1) space and evaluated in O(1) time. In 2007, Mitzenmacher and Vadhan proved that 2-independence will give expected O(1)-time lookups, assuming there's some measure of randomness in the keys. In 2010, Pătrașcu and Thorup proved that 5-independence is the minimum independence needed for adversarially-chosen keys.

31 The Analysis!

32 For simplicity, let's assume a load factor of α = ¹/₃. A region of size m is a consecutive set of m locations in the hash table. An element x hashes to region R if $h(x) \in R$, though x may not be placed in R. On expectation, a region of size $2^s$ should have at most $\frac{1}{3} \cdot 2^s$ elements hash to it. It would be very unlucky if a region had twice as many elements in it as expected. A region of size $2^s$ is overloaded if at least $\frac{2}{3} \cdot 2^s$ elements hash to it. Intuition: If an element ends up far from its home location, then some large region near its home has to be overloaded. [Figure: an element that is far from home.]
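The definition of an overloaded region translates directly into code. This sketch makes two assumptions the slide leaves open. First, hash values are integers in [0, m). Second, a region of even size $2^s$ "centered" on slot c is taken to be the $2^s$ consecutive slots starting at $c - 2^{s-1}$, wrapping around the table. The function names are hypothetical.

```python
from typing import Sequence

def region_count(hashes: Sequence[int], center: int, s: int, m: int) -> int:
    """Number of keys whose hash lands in the 2**s slots centered on `center`."""
    size = 2 ** s
    lo = center - size // 2            # assumed convention for "centered"
    return sum(1 for hv in hashes if (hv - lo) % m < size)   # modular test handles wraparound

def is_overloaded(hashes: Sequence[int], center: int, s: int, m: int) -> bool:
    # With load factor 1/3, about size/3 keys are expected; overloaded means >= (2/3) * size.
    return 3 * region_count(hashes, center, s, m) >= 2 * 2 ** s
```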

33 Theorem: The probability that an element $x_a$ ends up between $2^s$ and $2^{s+1}$ steps from its home location is upper-bounded by

$$c \cdot \Pr[\text{the region of size } 2^s \text{ centered on } h(x_a) \text{ is overloaded}]$$

for some fixed constant c independent of s. Proof: Set up some cleverly-chosen ranges over the hash table and use the pigeonhole principle. See Thorup's lecture notes.

34 Analyzing the Runtime: The cost of looking up some key $x_a$ is bounded from above by the length of the run containing $x_a$. The expected cost of performing a lookup is therefore at most

$$O(1) + \sum_{s=0}^{\log n} 2^{s+1} \Pr[x_a \text{ is between } 2^s \text{ and } 2^{s+1} \text{ spots from home}].$$

The previous theorem tells us that, up to the constant factor 2c, this cost is

$$O(1) + \sum_{s=0}^{\log n} 2^s \Pr[\text{the region of size } 2^s \text{ centered on } h(x_a) \text{ is overloaded}].$$

If we can determine the probability that a region of size $2^s$ is overloaded, we'll have a bound on the expected lookup cost for $x_a$.

35 Overloaded Regions: Recall: A region is a contiguous span of table slots, and we've chosen α = ¹/₃. An overloaded region has at least $\frac{2}{3} \cdot 2^s$ elements in it. Let the random variable $B_s$ represent the number of keys that hash into the block of size $2^s$ centered on $h(x_a)$. We want to know $\Pr[B_s \ge \frac{2}{3} \cdot 2^s]$. Assuming our hash functions are at least 2-independent, we have $E[B_s] = \frac{1}{3} \cdot 2^s$. Then the above quantity is equivalent to $\Pr[B_s \ge 2E[B_s]]$, and looking up an element takes, on expectation, time

$$O(1) + \sum_{s=0}^{\log n} 2^s \Pr[B_s \ge 2E[B_s]].$$

36 Concentration Inequalities: The expression $\Pr[B_s \ge 2E[B_s]]$ seems like a perfect case to try to use a concentration bound, like we did last Thursday. Knowing nothing about $B_s$ other than the fact that it's nonnegative, we could start off by trying to use Markov's inequality, $\Pr[X \ge c] \le E[X] / c$. Using what we have:

$$\Pr[B_s \ge 2E[B_s]] \le \frac{E[B_s]}{2E[B_s]} = \frac{1}{2}.$$

That's a pretty weak bound. What does that do to our analysis?

37 A Runtime Bound: The expected cost of looking up $x_a$ in a linear probing table is $O(1) + \sum_{s=0}^{\log n} 2^s \Pr[B_s \ge 2E[B_s]]$. Assuming 2-independent hashing, this is

$$O(1) + \sum_{s=0}^{\log n} 2^s \Pr[B_s \ge 2E[B_s]] \le O(1) + \sum_{s=0}^{\log n} \frac{2^s}{2} = O(n).$$

This bound is not at all useful. We're going to need to do better than this!

38 Concentration Inequalities: This analysis used Markov's inequality without any additional knowledge about $B_s$. $B_s$ is the number of elements that hash into the block of size $2^s$ near $h(x_a)$. What does that tell us? Let $X_{is}$ be an indicator variable that's 1 if $x_i$ hashes into the region of size $2^s$ centered on $h(x_a)$ and 0 otherwise. Then we can write $B_s = \sum_{i=1}^{n} X_{is}$. Notice that $E[B_s] = E\left[\sum_{i=1}^{n} X_{is}\right] = \sum_{i=1}^{n} E[X_{is}]$.

39 Chernoff Bounds: Last time, we saw the Chernoff bound, which says that if $X \sim \text{Binom}(n, p)$ and $p < 1/2$, then

$$\Pr[X > n/2] < e^{-\frac{(1/2 - p)^2 n}{2p}}.$$

We just saw that our variable $B_s$ is the sum of a number of Bernoulli variables $X_{is}$, so it seems like we might be able to apply Chernoff bounds here. Problem: These $X_{is}$ variables are not independent of one another! We know h is k-independent and we know what $h(x_a)$ is. So any other group of k − 1 hashes is independent, but not all of them. Therefore, $B_s$ is not binomially distributed, so we can't use a Chernoff bound.

40 Chebyshev's Inequality: The last remaining bound that we used last time was Chebyshev's inequality, which states that $\Pr[|X - E[X]| \ge c] \le \operatorname{Var}[X] / c^2$. If we can determine $\operatorname{Var}[B_s]$, then we can try using Chebyshev's inequality to bound the probability that $B_s$ is too large.

41 The Variance: Assume, going forward, that the $X_{is}$'s are pairwise independent. We're already conditioning on knowing $h(x_a)$, so this means that we need our hash function to be at least 3-independent from this point onward. Then

$$\operatorname{Var}[B_s] = \operatorname{Var}\left[\sum_{i=1}^{n} X_{is}\right] = \sum_{i=1}^{n} \operatorname{Var}[X_{is}] \le \sum_{i=1}^{n} E[X_{is}^2] = \sum_{i=1}^{n} E[X_{is}] = E[B_s].$$

The last steps use standard techniques we saw last time: use the fact that $\operatorname{Var}[Z] \le E[Z^2]$, and if Z is an indicator variable, then $Z^2 = Z$. More generally: if X is a sum of pairwise independent indicator variables, then $\operatorname{Var}[X] \le E[X]$.
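Pairwise independence really is enough for this step, even when the indicators are far from mutually independent. A classic example (chosen here for illustration, not from the lecture) uses two independent fair bits X1 and X2 and sets X3 = X1 XOR X2. Any two of the three are independent, but all three are not. Exact arithmetic with `fractions.Fraction` confirms that Var[X1 + X2 + X3] = 3/4 ≤ 3/2 = E[X1 + X2 + X3].

```python
from fractions import Fraction
from itertools import product

def xor_indicators():
    """All equally likely outcomes of (X1, X2, X3) with X3 = X1 XOR X2.

    X1, X2 are independent fair bits; the three indicators are pairwise
    independent but not mutually independent.
    """
    return [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]

def mean_and_variance_of_sum():
    outcomes = xor_indicators()
    prob = Fraction(1, len(outcomes))          # each outcome has probability 1/4
    sums = [sum(o) for o in outcomes]
    mean = sum(prob * s for s in sums)
    var = sum(prob * (s - mean) ** 2 for s in sums)
    return mean, var

mean, var = mean_and_variance_of_sum()
print(mean, var)   # 3/2 3/4
```

The sum is never 1 or 3, which would be impossible for three mutually independent fair bits. Yet the variance still matches what pairwise independence predicts, 3 · 1/4.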

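45 Using Chebyshev: We want to know $\Pr[B_s \ge 2E[B_s]] = \Pr[B_s - E[B_s] \ge E[B_s]]$. Using Chebyshev's inequality:

$$\Pr[B_s - E[B_s] \ge E[B_s]] \le \Pr[|B_s - E[B_s]| \ge E[B_s]] \le \frac{\operatorname{Var}[B_s]}{E[B_s]^2} \le \frac{E[B_s]}{E[B_s]^2} = \frac{1}{E[B_s]} = \frac{3}{2^s}.$$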

46 A Better Bound: The expected cost of looking up $x_a$ in a linear probing table is $O(1) + \sum_{s=0}^{\log n} 2^s \Pr[B_s \ge 2E[B_s]]$. Assuming 3-independent hashing, this is

$$O(1) + \sum_{s=0}^{\log n} 2^s \Pr[B_s \ge 2E[B_s]] \le O(1) + \sum_{s=0}^{\log n} 2^s \cdot \frac{3}{2^s} = O(1) + \sum_{s=0}^{\log n} 3 = O(\log n).$$

Theorem: This runtime bound is tight (there's an adversarial choice of a 3-independent hash function that degrades the runtime to this level).

47 Why This Works: Key idea: Increasing the degree of independence lets us control the variance of the distribution. With 2-independent hashing, we use one degree of independence to condition on knowing where some specific key lands. At that point, we only have one more degree of independence, which is not enough to control the variance! With 3-independent hashing, we use one degree of independence to condition on knowing where the key lands. We can then use the two remaining degrees of independence to control the variance and use Chebyshev's inequality. Small increases to the independence of a hash function can dramatically tighten concentration bounds.

48 Question: If we increase the degree of independence further, can we constrain the spread of the elements in a way that improves our runtime? (This is the theory version of "can we do better?")

49 Generalizing Variance: The variance of a random variable X is defined as $\operatorname{Var}[X] = E[(X - E[X])^2]$. We can generalize this to higher exponents. The fourth central moment of X, denoted $\operatorname{4th}[X]$, is defined as $\operatorname{4th}[X] = E[(X - E[X])^4]$. Like the variance, $\operatorname{4th}[X]$ measures how likely we are to get far away from $E[X]$. Because of the fourth-power term, $\operatorname{4th}[X]$ is much more sensitive to outliers.

50 Generalizing Chebyshev: The fourth moment inequality states that $\Pr[|X - E[X]| \ge c] \le \operatorname{4th}[X] / c^4$. Proof: Let X be a random variable. Then $\Pr[|X - E[X]| \ge c] = \Pr[(X - E[X])^4 \ge c^4]$. Let $Y = (X - E[X])^4$. Notice that $E[Y] = E[(X - E[X])^4] = \operatorname{4th}[X]$, so via Markov's inequality, we have

$$\Pr[|X - E[X]| \ge c] = \Pr[Y \ge c^4] \le \frac{E[Y]}{c^4} = \frac{\operatorname{4th}[X]}{c^4}.$$

Good question to ponder: why doesn't this work for the third central moment, where $\operatorname{3rd}[X] = E[(X - E[X])^3]$?
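Both inequalities can be checked exactly on a concrete distribution. This sketch uses X ~ Binomial(20, ½) and c = 6. These parameters are arbitrary choices; the inequalities themselves hold for any distribution. The check uses exact `Fraction` arithmetic and computes the true tail, the Chebyshev bound, and the fourth-moment bound. Here c is large relative to the standard deviation, so the fourth-moment bound is the tighter of the two. For small c, Chebyshev can win instead.

```python
from fractions import Fraction
from math import comb

def check_bounds(n: int, p: Fraction, c: int):
    """Exact Pr[|X - E[X]| >= c] for X ~ Binomial(n, p), plus the Chebyshev and fourth-moment bounds."""
    pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
    mean = sum(k * q for k, q in pmf.items())
    var = sum((k - mean) ** 2 * q for k, q in pmf.items())
    fourth = sum((k - mean) ** 4 * q for k, q in pmf.items())   # 4th[X] from the definition
    tail = sum(q for k, q in pmf.items() if abs(k - mean) >= c)
    return tail, var / c**2, fourth / c**4

tail, chebyshev_bound, fourth_bound = check_bounds(20, Fraction(1, 2), 6)
assert tail <= fourth_bound <= chebyshev_bound
print(float(tail), float(fourth_bound), float(chebyshev_bound))
```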

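51 Generalizing Indicator Variance: Theorem: If X is an indicator variable for the event $\mathcal{E}$, then $\operatorname{4th}[X] \le E[X]$. Proof: X takes on value 1 with probability $\Pr[\mathcal{E}]$ and 0 with probability $1 - \Pr[\mathcal{E}]$. Therefore, we have

$$\begin{aligned}
\operatorname{4th}[X] &= E[(X - E[X])^4] \\
&= (1 - \Pr[\mathcal{E}])^4 \Pr[\mathcal{E}] + \Pr[\mathcal{E}]^4 (1 - \Pr[\mathcal{E}]) \\
&\le (1 - \Pr[\mathcal{E}]^3) \Pr[\mathcal{E}] + \Pr[\mathcal{E}]^4 \\
&= \Pr[\mathcal{E}] - \Pr[\mathcal{E}]^4 + \Pr[\mathcal{E}]^4 \\
&= \Pr[\mathcal{E}] = E[X].
\end{aligned}$$

(The inequality uses $(1 - \Pr[\mathcal{E}])^4 \le 1 - \Pr[\mathcal{E}] \le 1 - \Pr[\mathcal{E}]^3$ and $1 - \Pr[\mathcal{E}] \le 1$.) Read this on your own time; it's cute!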
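The theorem is easy to sanity-check with exact arithmetic. This sketch evaluates $\operatorname{4th}[X]$ for an indicator straight from its definition over a grid of probabilities p = k/100. It confirms $\operatorname{4th}[X] \le E[X] = p$. It also confirms the equivalent closed form p(1 − p)(1 − 3p(1 − p)), which follows by factoring p(1 − p) out of the second line of the proof. The function name is hypothetical.

```python
from fractions import Fraction

def fourth_central_indicator(p: Fraction) -> Fraction:
    """4th[X] = E[(X - E[X])^4] for an indicator X with Pr[X = 1] = p."""
    return (1 - p) ** 4 * p + (0 - p) ** 4 * (1 - p)

for k in range(101):
    p = Fraction(k, 100)
    m4 = fourth_central_indicator(p)
    assert m4 <= p                                     # 4th[X] <= E[X]
    assert m4 == p * (1 - p) * (1 - 3 * p * (1 - p))   # equivalent closed form
```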

52 Updating our Analysis: For linear probing, we're ultimately interested in bounding $\Pr[B_s \ge 2E[B_s]]$ in the case where $B_s$ represents the number of elements hitting a particular block. Using 2-independent hashing, the best bound we could use was Markov's inequality, which gave an extremely weak bound. Using 3-independent hashing, we could use Chebyshev's inequality, which gave an inverse exponential bound. Question: If we use stronger hash functions, can we tighten this bound using the fourth moment inequality?

53 What is $\operatorname{4th}[B_s]$?

54 The Limits of Our Generalization: There's a lovely little expression for Var[X]: $\operatorname{Var}[X] = E[X^2] - E[X]^2$. That's because

$$\operatorname{Var}[X] = E[(X - E[X])^2] = E[X^2 - 2X E[X] + E[X]^2] = E[X^2] - 2E[X]E[X] + E[X]^2 = E[X^2] - 2E[X]^2 + E[X]^2 = E[X^2] - E[X]^2.$$

We can try this for fourth moments, but, well, um...

$$\begin{aligned}
\operatorname{4th}[X] &= E[(X - E[X])^4] \\
&= E[X^4 - 4X^3 E[X] + 6X^2 E[X]^2 - 4X E[X]^3 + E[X]^4] \\
&= E[X^4] - 4E[X]E[X^3] + 6E[X^2]E[X]^2 - 4E[X]E[X]^3 + E[X]^4 \\
&= E[X^4] - 4E[X]E[X^3] + 6E[X^2]E[X]^2 - 3E[X]^4
\end{aligned}$$

= ¯\_(ツ)_/¯
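The expansion, unhelpful as it is for our purposes, is correct. One way to confirm it is to compute both sides exactly for a small, lopsided distribution. The distribution in `dist` and the helper `moment` are made up for illustration.

```python
from fractions import Fraction

def moment(dist, k, center=Fraction(0)):
    """E[(X - center)^k] for a finite distribution given as {value: probability}."""
    return sum(p * (x - center) ** k for x, p in dist.items())

# A hypothetical, lopsided example distribution.
dist = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 3), Fraction(5): Fraction(1, 6)}

mu = moment(dist, 1)
direct = moment(dist, 4, center=mu)                       # E[(X - E[X])^4]
expanded = (moment(dist, 4) - 4 * mu * moment(dist, 3)
            + 6 * mu**2 * moment(dist, 2) - 3 * mu**4)    # the expansion above
assert direct == expanded
```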

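55 The Fourth Moment: Let's see if we can bound $\operatorname{4th}[B_s]$.

$$\begin{aligned}
\operatorname{4th}[B_s] &= E[(B_s - E[B_s])^4] \\
&= E\left[\left(\sum_{i=1}^{n} X_{is} - E\left[\sum_{i=1}^{n} X_{is}\right]\right)^4\right] \\
&= E\left[\left(\sum_{i=1}^{n} (X_{is} - E[X_{is}])\right)^4\right] \\
&= E\left[\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{l=1}^{n} (X_{is} - E[X_{is}])(X_{js} - E[X_{js}])(X_{ks} - E[X_{ks}])(X_{ls} - E[X_{ls}])\right] \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{l=1}^{n} E[(X_{is} - E[X_{is}])(X_{js} - E[X_{js}])(X_{ks} - E[X_{ks}])(X_{ls} - E[X_{ls}])]
\end{aligned}$$

So now we just need to simplify this expression.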

56 Increasing our Independence: We now have this lovely expression:

$$\operatorname{4th}[B_s] = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{l=1}^{n} E[(X_{is} - E[X_{is}])(X_{js} - E[X_{js}])(X_{ks} - E[X_{ks}])(X_{ls} - E[X_{ls}])]$$

Recall: If our hash function is k-independent, then we've already used one degree of independence conditioning on knowing where $h(x_a)$ is. That leaves us with k − 1 degrees of independence. Let's suppose we're using a 5-independent hash function. This means that, even after conditioning on $h(x_a)$, any four other hash values are independent of one another. This allows us to dramatically simplify this expression.
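The lecture doesn't say how to build a 5-independent hash function. One standard construction, usually credited to Wegman and Carter, picks a uniformly random polynomial of degree k − 1 over a prime field. The sketch below assumes integer keys in [0, p) with the Mersenne prime p = 2^61 − 1. Under that assumption, the values mod p are exactly k-independent for distinct keys. The final reduction mod m, needed to index a table of size m, makes the output only approximately uniform. The class `PolyHash` and the helper `poly_mod_p` are hypothetical names. Storage is k words and evaluation is k multiply-adds, so both are O(1) for k = 5, matching the claim on the history slide.

```python
import random
from typing import List, Optional, Sequence

MERSENNE_61 = (1 << 61) - 1   # prime; keys are assumed to be integers in [0, p)

def poly_mod_p(coeffs: Sequence[int], x: int, p: int) -> int:
    """Evaluate a_0 + a_1 x + ... + a_{k-1} x^{k-1} mod p by Horner's rule."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % p
    return acc

class PolyHash:
    """Hypothetical k-independent hash: random degree-(k-1) polynomial mod p, then mod m."""

    def __init__(self, m: int, k: int = 5, p: int = MERSENNE_61,
                 seed: Optional[int] = None) -> None:
        rng = random.Random(seed)
        self.m, self.p = m, p
        self.coeffs: List[int] = [rng.randrange(p) for _ in range(k)]  # k words of state

    def __call__(self, x: int) -> int:
        # k multiply-adds; the final "% m" makes the output only
        # approximately uniform over the m table slots.
        return poly_mod_p(self.coeffs, x, self.p) % self.m

h = PolyHash(m=1024, seed=2024)
print([h(x) for x in range(5)])
```

The exact-independence claim rests on polynomial interpolation. For k distinct keys, each choice of the k coefficients yields a different k-tuple of values mod p, so a uniformly random polynomial yields a uniformly random tuple.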

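57 Exploring this Summation: The terms of this summation might sometimes range over the same variables at the same time:

$$\operatorname{4th}[B_s] = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{l=1}^{n} E[(X_{is} - E[X_{is}])(X_{js} - E[X_{js}])(X_{ks} - E[X_{ks}])(X_{ls} - E[X_{ls}])]$$

Claim: Any term in the above summation where $X_{is}$ is a different random variable than $X_{js}$, $X_{ks}$, and $X_{ls}$ is zero. Proof: Suppose that $X_{is}$ is a different random variable from the others. Then since $X_{is}$, $X_{js}$, $X_{ks}$, and $X_{ls}$ are independent, we have

$$\begin{aligned}
&E[(X_{is} - E[X_{is}])(X_{js} - E[X_{js}])(X_{ks} - E[X_{ks}])(X_{ls} - E[X_{ls}])] \\
&= E[X_{is} - E[X_{is}]] \cdot E[(X_{js} - E[X_{js}])(X_{ks} - E[X_{ks}])(X_{ls} - E[X_{ls}])] \\
&= 0 \cdot E[(X_{js} - E[X_{js}])(X_{ks} - E[X_{ks}])(X_{ls} - E[X_{ls}])] = 0.
\end{aligned}$$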

58 Exploring this Summation: Claim: Every term in this sum is zero except for the following: terms where i = j = k = l, and terms where two of i, j, k, and l refer to one value and the other two of i, j, k, and l refer to another. Proof: If a variable appears exactly one time, then by our previous logic the term evaluates to zero. If a variable appears exactly three times, then the other variable appears exactly once and the term evaluates to zero. That leaves behind the two remaining cases here.

59 Exploring this Summation: Keeping only those two kinds of terms leaves

$$\operatorname{4th}[B_s] = \sum_{i=1}^{n} E[(X_{is} - E[X_{is}])^4] + \binom{4}{2} \sum_{i=1}^{n} \sum_{j=i+1}^{n} E[(X_{is} - E[X_{is}])^2 (X_{js} - E[X_{js}])^2].$$

The $\binom{4}{2}$ factor accounts for which two of i, j, k, and l refer to the first value. The double sum then picks the first value and the second value, which must be different than the first (so we take j > i). We'll use i and j as summation variables, since that's easier to read.
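A purely combinatorial check confirms the counting behind this expression, independent of any probabilities. Among the n⁴ index tuples (i, j, k, l), the surviving ones are those in which every index value appears an even number of times. There are n such tuples with i = j = k = l. There are $\binom{4}{2}\binom{n}{2} = 3n(n-1)$ "two pairs" tuples. The function name below is hypothetical.

```python
from collections import Counter
from itertools import product

def surviving_terms(n: int) -> int:
    """Count (i, j, k, l) in [n]^4 in which every index value appears an even number of times."""
    return sum(
        1
        for t in product(range(n), repeat=4)
        if all(c % 2 == 0 for c in Counter(t).values())
    )

for n in range(1, 7):
    # n tuples with i = j = k = l, plus C(4,2) * C(n,2) = 3n(n-1) two-pair tuples
    assert surviving_terms(n) == n + 3 * n * (n - 1)
```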

62 Bounding the Fourth Moment:

$$\begin{aligned}
\operatorname{4th}[B_s] &= \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{l=1}^{n} E[(X_{is} - E[X_{is}])(X_{js} - E[X_{js}])(X_{ks} - E[X_{ks}])(X_{ls} - E[X_{ls}])] \\
&= \sum_{i=1}^{n} E[(X_{is} - E[X_{is}])^4] + \binom{4}{2} \sum_{i=1}^{n} \sum_{j=i+1}^{n} E[(X_{is} - E[X_{is}])^2 (X_{js} - E[X_{js}])^2] \\
&= \sum_{i=1}^{n} \operatorname{4th}[X_{is}] + 6 \sum_{i=1}^{n} \sum_{j=i+1}^{n} E[(X_{is} - E[X_{is}])^2]\, E[(X_{js} - E[X_{js}])^2] \\
&= \sum_{i=1}^{n} \operatorname{4th}[X_{is}] + 6 \sum_{i=1}^{n} \sum_{j=i+1}^{n} \operatorname{Var}[X_{is}] \operatorname{Var}[X_{js}] \\
&\le \sum_{i=1}^{n} \operatorname{4th}[X_{is}] + 3 \sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{Var}[X_{is}] \operatorname{Var}[X_{js}] \\
&= \sum_{i=1}^{n} \operatorname{4th}[X_{is}] + 3 \left(\sum_{i=1}^{n} \operatorname{Var}[X_{is}]\right)^2 \\
&= \sum_{i=1}^{n} \operatorname{4th}[X_{is}] + 3 \operatorname{Var}[B_s]^2 \\
&\le \sum_{i=1}^{n} E[X_{is}] + 3 E[B_s]^2 \\
&= E[B_s] + 3 E[B_s]^2 \\
&\le 4 E[B_s]^2
\end{aligned}$$

Justifying the steps: Since h is 5-independent and we're conditioning on just knowing one hash location ($h(x_a)$), the factors in each pair term are independent random variables, so the expectation splits into a product. The first sum then matches the definition of the fourth central moment, and each factor in the second sum is the definition of variance. Replacing $6 \sum_{j=i+1}^{n}$ with $3 \sum_{j=1}^{n}$ can only increase the total: by symmetry, $6 \sum_{j>i}$ equals $3 \sum_{j \ne i}$, and the added $j = i$ terms are nonnegative. Because the $X_{is}$'s are pairwise independent, $\sum_{i} \operatorname{Var}[X_{is}] = \operatorname{Var}[\sum_{i} X_{is}] = \operatorname{Var}[B_s]$. If X is an indicator, then $\operatorname{4th}[X] \le E[X]$, and we know from our 3-independence analysis that $\operatorname{Var}[B_s] \le E[B_s]$. The last step holds as long as $E[B_s] \ge 1$, which we can assume if we're talking about sufficiently large regions.
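We can't cheaply enumerate a 5-independent family here. So this sketch instead checks the chain on fully independent indicators, which satisfy every independence assumption the derivation uses. In that case $B_s$ is a binomial, and we take B ~ Binomial(30, 1/10), an arbitrary choice with E[B] = 3 ≥ 1. The fourth central moment is computed exactly from the pmf. It matches the derivation's middle expression, Σ 4th[X_i] + 3 Σ_{i≠j} Var[X_i] Var[X_j], and it stays below E[B] + 3E[B]² ≤ 4E[B]².

```python
from fractions import Fraction
from math import comb

def fourth_central_binomial(n: int, p: Fraction) -> Fraction:
    """E[(B - E[B])^4] for B ~ Binomial(n, p), summed directly from the pmf."""
    mean = n * p
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) * (k - mean) ** 4
               for k in range(n + 1))

n, p = 30, Fraction(1, 10)
E = n * p
m4 = fourth_central_binomial(n, p)

var_i = p * (1 - p)                        # Var[X_i] for one indicator
fourth_i = var_i * (1 - 3 * var_i)         # 4th[X_i] for one indicator
assert m4 == n * fourth_i + 3 * n * (n - 1) * var_i**2   # sum of 4th's + 3 * sum_{i != j} Var Var
assert m4 <= E + 3 * E**2 <= 4 * E**2                    # the final bounds (needs E >= 1)
```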

69 The Net Result: We've just shown that $\operatorname{4th}[B_s] \le 4E[B_s]^2$. Phew! That was crazy. But at least we now have a bound on the fourth moment, which lets us use the fourth moment inequality!

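70 Fourth Moments for Victory: Using the fourth moment inequality:

$$\Pr[B_s \ge 2E[B_s]] = \Pr[B_s - E[B_s] \ge E[B_s]] \le \Pr[|B_s - E[B_s]| \ge E[B_s]] \le \frac{\operatorname{4th}[B_s]}{E[B_s]^4} \le \frac{4E[B_s]^2}{E[B_s]^4} = \frac{4}{E[B_s]^2} = \frac{4}{(\frac{1}{3} \cdot 2^s)^2} = \frac{36}{4^s}.$$

Notice that this is exponentially better than our previous bound!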

71 A Strong Runtime Bound: The expected cost of looking up $x_a$ in a linear probing table is $O(1) + \sum_{s=0}^{\log n} 2^s \Pr[B_s \ge 2E[B_s]]$. Assuming 5-independent hashing, this is

$$O(1) + \sum_{s=0}^{\log n} 2^s \Pr[B_s \ge 2E[B_s]] \le O(1) + \sum_{s=0}^{\log n} 2^s \cdot \frac{36}{4^s} = O(1) + \sum_{s=0}^{\log n} \frac{36}{2^s} = O(1).$$

We've finally obtained an O(1) bound on the cost of operations in a linear probing table, provided that we use 5-independent hashing!
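The three runtime bounds can be compared numerically. The sketch evaluates $\sum_{s=0}^{\log_2 n} 2^s \cdot \min(1, \text{bound}_s)$ for the Markov, Chebyshev, and fourth-moment tail bounds derived above. The O(1) term and the constant from the overloading theorem are dropped. Capping each probability at 1 is a harmless refinement, since no probability exceeds 1. The function names are hypothetical. The Markov sum grows linearly in n and the Chebyshev sum grows like 3 log n, while the fourth-moment sum stays below 16 no matter how large n gets.

```python
import math
from typing import Callable

def bound_sum(n: int, tail: Callable[[int], float]) -> float:
    """sum_{s=0}^{log2 n} 2^s * min(1, tail(s)), the lookup-cost sum with constants dropped."""
    top = int(math.log2(n))
    return sum(2**s * min(1.0, tail(s)) for s in range(top + 1))

def markov(s: int) -> float:         # 2-independence: Pr[B_s >= 2E[B_s]] <= 1/2
    return 0.5

def chebyshev(s: int) -> float:      # 3-independence: <= 3 / 2^s
    return 3 / 2**s

def fourth_moment(s: int) -> float:  # 5-independence: <= 36 / 4^s
    return 36 / 4**s

for n in (2**10, 2**20, 2**30):
    print(n, bound_sum(n, markov), bound_sum(n, chebyshev), bound_sum(n, fourth_moment))
```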

72 What Just Happened? With one degree of independence, we could obtain the expected value and use that to bound the probability with Markov's inequality. Using two degrees of independence, we could obtain the variance and use that to bound the probability with Chebyshev's inequality. Using four degrees of independence, we could obtain the fourth central moment and use that to bound the probability with the fourth moment bound. Increasing the strength of a hash function allows us to obtain more central moments and, therefore, to tighten our bound more than might initially be suspected.

73 More to Explore: Mitzenmacher and Vadhan's paper "Why Simple Hash Functions Work" provides a fundamentally different strategy for analyzing linear probing. Pătrașcu and Thorup's paper on the lower bound for 5-independence here gives a glimpse of how you'd argue that these bounds can't be improved.

74 Next Time: Cuckoo Hashing: Hashing with worst-case O(1) lookups! The Cuckoo Graph: Random graphs for fun and profit.
