CS 224: Advanced Algorithms (Spring 2017)
Prof. Jelani Nelson          Lecture 3: January 31, 2017          Scribe: Saketh Rama

1 Overview

In the last lecture we covered y-fast tries and fusion trees. In this lecture we start our discussion of hashing. We will study load balancing, $k$-wise independence, and the dynamic dictionary problem, solved using hashing with chaining and linear probing.

2 Load Balancing

Formally, consider jobs with IDs in the universe $[u]$, and machines labeled $1, \ldots, m$. The task of load balancing studies the assignment of jobs to machines so that no machine is too overloaded. For example, we could have a centralized scheduler which decides where jobs should go. However, local decisions for scheduling are preferable for complexity reasons; this motivates our study of hashing. The idea is to have a random function $h : [u] \to [m]$.

2.1 Chernoff Bound

We will assume there are $n$ jobs in the system, with $n \le u$, and focus on the case where $m = n$. Studying this case will motivate our statement and proof of Chernoff bounds.

2.1.1 Application of Bound

Good load balancing corresponds to a small probability $P(\text{max load of any machine} > T)$. We can restate this as follows:
$$P(\text{max load of any machine} > T) = P(\exists \text{ machine } i : \text{load}(i) > T) \le \sum_{i=1}^{n} P(\text{load}(i) > T) = n \cdot P(\text{load}(1) > T),$$
where the inequality follows from the union bound and the final equality from symmetry over the machines.
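Before stating the bound, a quick empirical sanity check may help. The following Python sketch (ours, not from the notes) simulates the idealized random function $h : [u] \to [m]$ with $m = n$ and reports the average max load, which grows very slowly with $n$; the function name is hypothetical.

```python
import random
from collections import Counter

def avg_max_load(n, trials=50):
    """Hash n jobs to n machines uniformly at random and return the
    max load of any machine, averaged over several trials."""
    total = 0
    for _ in range(trials):
        loads = Counter(random.randrange(n) for _ in range(n))
        total += max(loads.values())
    return total / trials

# The max load grows roughly like lg n / lg lg n, e.g.:
for n in (10**3, 10**4, 10**5):
    print(n, avg_max_load(n))
```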
Lemma. Chernoff Bound Dscrete Case). Let random varables X,..., X R {0, } be ndependent, wth X = X and E[X] = µ. Then for all δ > 0, e δ PX > + δ)µ) < + δ) +δ To apply ths to hashng, defne R = n ndcator varables { h) = X =. 0 o.w. ) µ Then µ = E[X] = E[X ] = n n =. We can now analyze the probablty n load balancng. Pload) > T ) n < et T T n e ) T < n < n et /T ) T T ) By settng T = Θ lg n such that /T ) T /n 2, we get n e T /T ) T < /n o). In load balancng ) jargon, we say that f the left-hand condton s satsfed, then the max load s O wth hgh probablty. lg n 2..2 Proof of Bound Because X s Bernoull, E[X ] = p mples µ = p. We wll make use of the followng nequalty to bound the probablty usng an expectaton. Lemma 2. Markov s Inequalty. Let X be a nonnegatve r.v. Then for all λ > 0, Because f s strctly ncreasng, we can say that PX > λ) < E[X] λ. PX > + δ)µ) = Pfx) > f + δ)µ)). As a somewhat magcal step, choose fz) = e tz, such that we can guess at the form of z. 2
Pe tx > e t+δ)µ ) < e t+δ)µ Ee t X ) by Markov s nequalty) = e t+δ)µ E e tx ) = e t+δ)µ Ee tx ) = e t+δ)µ p + p e t ) e t+δ)µ e p e t ) ) = e t+δ)µ e µet ) ) µ e Ths establshes that PX > + δ)µ) < δ +δ). +δ The above proof requred a magcal step of guessng at the functon s form. We can also consder Chernoff bounds more ntutvely as a moment bound for large p n the expresson derved from a repeated applcaton of Markov s nequalty: P X E[X] p ) < E X E[X] )p λ p. 2.2 Alternatve Analyss We can also approach load balancng more drectly. P max load > T ) < n Pload) > T ) = n P T jobs mappng to machne ) ) n < n T n T We can bound n T as follows. For I = {,..., T } wth < < T, let { f all s map to X I =. 0 o.w. Then P T jobs mappng to ) = P I : X I = ) I PX I = ) by the unon bound. Note that ) n T n T = n! T!n T!) nn ) n T + ) = nt T! n T < T!. Here, we can ether use Strlng s approxmaton or be slopper wth the nequalty T! > T/2) T/2. We choose the latter, and so T! < q where q = T/2. Ths quantty s much smaller than /n for ) q T = q = Θ. lg n 3
3 k-wise Independence

It turns out that the above analysis only requires $k$-wise independence (with $k = T$), a concept which we will now study. Note that
$$P(X_I = 1) = P_h\left( \bigwedge_{j=1}^{T} h(i_j) = 1 \right) = \prod_{j=1}^{T} P_h\left(h(i_j) = 1\right) = (1/n)^{T},$$
where the probability is taken over the randomness of the hash function.

Definition 3. A family $\mathcal{H}$ of functions $h : [u] \to [m]$ is a $k$-wise independent hash family if for any $i_1 < \cdots < i_k \in [u]$ and $y_1, \ldots, y_k \in [m]$, we have
$$P_{h \in \mathcal{H}}\left( \bigwedge_{j=1}^{k} h(i_j) = y_j \right) = \prod_{j=1}^{k} P_h\left(h(i_j) = y_j\right).$$

This condition is useful because a totally random hash function would require $u \lg m$ bits to store. With the less restrictive $k$-wise independence, we can get away with less storage. Note that if $\mathcal{H}$ is the set of all functions mapping $[u]$ to $[m]$, then a random $h \in \mathcal{H}$ is what we just analyzed.

3.1 Example

Let $u = p$ where $p$ is prime, with $p \ge 2m$. Define
$$\mathcal{H} = \left\{ h : h(x) = \left( \textstyle\sum_{i=0}^{k-1} a_i x^{i} \right) \bmod p \right\}.$$
Then $|\mathcal{H}| = p^{k}$, and so describing a member takes $\lg |\mathcal{H}| = k \lg p$ bits. We will omit the analysis of this example for the purposes of this course. It can be derived using polynomial interpolation.

4 Dynamic Dictionary Problem

The dynamic dictionary problem is a data structure problem. The goal is to maintain items $S \subseteq [u]$ as keys, where each $x \in S$ has an associated value in the universe $[u]$, subject to the following operations:

1. insert(x, v): associate key $x$ with value $v$
2. query(x): return the value of key $x$
3. del(x): remove $x$ from $S$

4.1 First Solution: Hashing with Chaining

We can define an array $A[1 \ldots m]$ whose entries are pointers to linked lists with key-value pairs as nodes. The hash function maps the universe into this array. It turns out that if $\mathcal{H}$ is 2-wise independent with $m \ge |S|$, then $E[\text{time per op}] = O(1)$. The analysis of this is available in the notes for CS 124/125 and in CLRS.
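As an illustration, here is a minimal Python sketch (ours, not from the notes) combining the polynomial hash family of Section 3.1 with the chaining dictionary of Section 4.1. Reducing the polynomial mod $p$ and then mod $m$ is a common practical shortcut: the result is only approximately $k$-wise independent, with bias that shrinks as $p$ grows relative to $m$. All class and method names are hypothetical.

```python
import random

class PolyHash:
    """Degree-(k-1) polynomial with random coefficients:
    h(x) = ((a_0 + a_1 x + ... + a_{k-1} x^{k-1}) mod p) mod m."""
    def __init__(self, k, m, p=(1 << 61) - 1):  # 2^61 - 1 is prime
        self.coeffs = [random.randrange(p) for _ in range(k)]
        self.p, self.m = p, m

    def __call__(self, x):
        h = 0
        for a in reversed(self.coeffs):  # Horner's rule
            h = (h * x + a) % self.p
        return h % self.m

class ChainedDict:
    """Dynamic dictionary via hashing with chaining: an array of m
    buckets, each a list of (key, value) pairs."""
    def __init__(self, m):
        self.h = PolyHash(k=2, m=m)  # 2-wise independence suffices
        self.buckets = [[] for _ in range(m)]

    def insert(self, x, v):
        b = self.buckets[self.h(x)]
        for i, (key, _) in enumerate(b):
            if key == x:
                b[i] = (x, v)  # key already present: overwrite its value
                return
        b.append((x, v))

    def query(self, x):
        for key, v in self.buckets[self.h(x)]:
            if key == x:
                return v
        return None

    def delete(self, x):
        j = self.h(x)
        self.buckets[j] = [(key, v) for key, v in self.buckets[j] if key != x]
```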
4.2 Second Solution: Linear Probing

The approach we will focus on in this course is linear probing. We again have an array $A[1 \ldots m]$. To insert a key $k$, we start at $A[h(k)]$ and move right until we find an empty slot. (For now, we will consider the simpler case which does not support deletions; a code sketch of these operations appears after the references.)

Linear probing first appeared in an IBM 701 program by Samuel, Amdahl, and Boehme in 1954, and was subsequently analyzed by Knuth in 1963 in the case where $\mathcal{H}$ is the set of all functions [1]. Knuth showed that if $m \ge (1+\epsilon) n$, where $n = |S|$, then $E[\text{time per op}] = O(1/\epsilon^{2})$. Pagh, Pagh, and Ružić showed more recently that if $m \ge c \cdot n$ (e.g., $c = 3$ works), then 5-wise independence guarantees constant expected time as well [2], which we will prove in the next lecture. In the case of 4-wise independence, Pǎtraşcu and Thorup showed that there exist $\mathcal{H}$ with expected runtime $\Omega(\lg n)$, which is not constant [3].

References

[1] Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching (2nd ed.). Addison-Wesley, pp. 513-558, 1998.

[2] Anna Pagh, Rasmus Pagh, and Milan Ružić. Linear probing with 5-wise independence. SIAM Review 53.3 (2011): 547-558.

[3] Mihai Pǎtraşcu and Mikkel Thorup. On the k-independence required by linear probing and minwise independence. International Colloquium on Automata, Languages, and Programming. Springer Berlin Heidelberg, 2010.
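As referenced in Section 4.2, here is a minimal Python sketch (ours, not from the notes) of linear probing without deletions; the class and parameter names are hypothetical, and we assume the table is never full so a scan always finds an empty slot.

```python
class LinearProbing:
    """Open addressing with linear probing: insert key x at A[h(x)],
    scanning right (with wraparound) to the first empty slot."""
    def __init__(self, m, h):
        self.m, self.h = m, h    # h: any function from keys into range(m)
        self.slots = [None] * m  # each slot is None or a (key, value) pair

    def insert(self, x, v):
        i = self.h(x)
        while self.slots[i] is not None and self.slots[i][0] != x:
            i = (i + 1) % self.m  # move right, wrapping at the end
        self.slots[i] = (x, v)

    def query(self, x):
        i = self.h(x)
        while self.slots[i] is not None:
            if self.slots[i][0] == x:
                return self.slots[i][1]
            i = (i + 1) % self.m
        return None  # reached an empty slot: x is absent

# Pairing with the PolyHash sketch above, m = 3n and k = 5 matches the
# Pagh-Pagh-Ruzic setting: table = LinearProbing(300, PolyHash(5, 300))
```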