Algorithms Non-Lecture E: Tail Inequalities

    If you hold a cat by the tail you learn things you cannot learn any other way.
        (Mark Twain)

E  Tail Inequalities

The simple recursive structure of skip lists made it relatively easy to derive an upper bound on the expected worst-case search time, by way of a stronger high-probability upper bound on the worst-case search time. We can prove similar results for treaps, but because of the more complex recursive structure, we need slightly more sophisticated probabilistic tools. These tools are usually called tail inequalities; intuitively, they bound the probability that a random variable with a bell-shaped distribution takes a value in the tails of the distribution, far away from the mean.

E.1  Markov's Inequality

Perhaps the simplest tail inequality was named after the Russian mathematician Andrey Markov; however, in strict accordance with Stigler's Law of Eponymy, it first appeared in the works of Markov's probability teacher, Pafnuty Chebyshev.¹

    ¹The closely related tail bound traditionally called Chebyshev's inequality was actually discovered by the French statistician Irénée-Jules Bienaymé, a friend and colleague of Chebyshev's.

Markov's Inequality. Let X be a non-negative integer random variable. For any t > 0, we have Pr[X ≥ t] ≤ E[X]/t.

Proof: The inequality follows from the definition of expectation by simple algebraic manipulation.

    E[X] = ∑_{k=0}^{∞} k · Pr[X = k]        [definition of E[X]]
         = ∑_{k=0}^{∞} Pr[X > k]            [algebra]
         ≥ ∑_{k=0}^{t-1} Pr[X > k]          [since t < ∞]
         ≥ ∑_{k=0}^{t-1} Pr[X ≥ t]          [since k < t]
         = t · Pr[X ≥ t]                    [algebra]

Unfortunately, the bounds that Markov's inequality implies (at least directly) are often very weak, even useless. (For example, Markov's inequality implies that with high probability, every node in an n-node treap has depth O(n^2 log n). Well, duh!) To get stronger bounds, we need to exploit some additional structure in our random variables.
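To make that weakness concrete, here is a minimal numerical sanity check. It is not part of the original notes; it assumes only Python's standard library, and the names markov_check, n, t, and trials are mine. It estimates Pr[X ≥ t] by simulation for a Binomial(n, 1/2) variable X and compares the estimate with the Markov bound E[X]/t.

    import random

    def markov_check(n=100, t=75, trials=50_000):
        """Empirically compare Pr[X >= t] with the Markov bound E[X]/t,
        where X ~ Binomial(n, 1/2) is a non-negative integer random variable."""
        mean = n / 2                          # E[X] for n fair coin flips
        hits = 0
        for _ in range(trials):
            x = sum(random.randint(0, 1) for _ in range(n))   # one sample of X
            if x >= t:
                hits += 1
        print(f"empirical Pr[X >= {t}] : {hits / trials:.6f}")
        print(f"Markov bound E[X]/t    : {mean / t:.6f}")

    if __name__ == "__main__":
        markov_check()

For these parameters the true tail probability is astronomically small (the threshold sits about five standard deviations above the mean), while the Markov bound is only 2/3; this is exactly the kind of slack the remark above is complaining about.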

E.2  Sums of Indicator Variables

A set of random variables X_1, X_2, ..., X_n are said to be mutually independent if and only if

    Pr[⋀_{i=1}^{n} (X_i = x_i)] = ∏_{i=1}^{n} Pr[X_i = x_i]

for all possible values x_1, x_2, ..., x_n. For example, different flips of the same fair coin are mutually independent, but the number of heads and the number of tails in a sequence of n coin flips are not independent (since they must add to n). Mutual independence of the X_i's implies that the expectation of the product of the X_i's is equal to the product of the expectations:

    E[∏_{i=1}^{n} X_i] = ∏_{i=1}^{n} E[X_i].

Moreover, if X_1, X_2, ..., X_n are independent, then for any function f, the random variables f(X_1), f(X_2), ..., f(X_n) are also mutually independent.

Suppose X = ∑_{i=1}^{n} X_i is the sum of n mutually independent random indicator variables X_i. For each i, let p_i = Pr[X_i = 1], and let µ = E[X] = ∑_i E[X_i] = ∑_i p_i.

Chernoff Bound (Upper Tail). Pr[X > (1 + δ)µ] < (e^δ / (1 + δ)^{1+δ})^µ for any δ > 0.

Proof: The proof is fairly long, but it relies on just a few basic components: a clever substitution, Markov's inequality, the independence of the X_i's, The World's Most Useful Inequality e^x > 1 + x, a tiny bit of calculus, and lots of high-school algebra. We start by introducing a variable t > 0, whose role will become clear shortly.

    Pr[X > (1 + δ)µ] = Pr[e^{tX} > e^{t(1+δ)µ}]

To cut down on the superscripts, I'll usually write exp(x) instead of e^x in the rest of the proof. Now apply Markov's inequality to the right side of this equation:

    Pr[X > (1 + δ)µ] < E[exp(tX)] / exp(t(1 + δ)µ).

We can simplify the expectation on the right using the fact that the terms X_i are independent:

    E[exp(tX)] = E[exp(t ∑_i X_i)] = E[∏_i exp(tX_i)] = ∏_i E[exp(tX_i)].

We can bound the individual expectations E[exp(tX_i)] using The World's Most Useful Inequality:

    E[exp(tX_i)] = p_i e^t + (1 - p_i) = 1 + (e^t - 1)p_i < exp((e^t - 1)p_i).

This inequality gives us a simple upper bound for E[exp(tX)]:

    E[exp(tX)] < ∏_i exp((e^t - 1)p_i) = exp(∑_i (e^t - 1)p_i) = exp((e^t - 1)µ).

Substituting this back into our original fraction from Markov's inequality, we obtain

    Pr[X > (1 + δ)µ] < E[exp(tX)] / exp(t(1 + δ)µ) < exp((e^t - 1)µ) / exp(t(1 + δ)µ) = (exp(e^t - 1 - t(1 + δ)))^µ.

Notice that this last inequality holds for every value of t > 0. To obtain the final tail bound, we will choose t to make this bound as tight as possible. To minimize e^t - 1 - t - tδ, we take its derivative with respect to t and set it to zero:

    d/dt (e^t - 1 - t(1 + δ)) = e^t - 1 - δ = 0.

(And you thought calculus would never be useful!) This equation has just one solution, t = ln(1 + δ). Plugging this back into our bound gives us

    Pr[X > (1 + δ)µ] < (exp(δ - (1 + δ) ln(1 + δ)))^µ = (e^δ / (1 + δ)^{1+δ})^µ.

And we're done!

This form of the Chernoff bound can be a bit clumsy to use. A more complicated argument gives us the bound

    Pr[X > (1 + δ)µ] < e^{-µδ^2/3}    for any 0 < δ < 1.

A similar argument gives us an inequality bounding the probability that X is significantly smaller than its expected value:

Chernoff Bound (Lower Tail). Pr[X < (1 - δ)µ] < (e^{-δ} / (1 - δ)^{1-δ})^µ < e^{-µδ^2/2} for any 0 < δ < 1.
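The following sketch is likewise not from the original notes; it assumes only Python's standard library, and the names chernoff_upper_demo, p, n, delta, and trials are mine. It compares the empirical upper tail of a sum of independent indicators against the bound (e^δ / (1 + δ)^{1+δ})^µ.

    import math
    import random

    def chernoff_upper_demo(p=0.1, n=200, delta=0.5, trials=50_000):
        """Compare Pr[X > (1+delta)*mu] with the Chernoff upper-tail bound,
        where X is the sum of n independent indicators, each 1 with probability p."""
        mu = n * p
        threshold = (1 + delta) * mu
        hits = 0
        for _ in range(trials):
            x = sum(1 for _ in range(n) if random.random() < p)   # one sample of X
            if x > threshold:
                hits += 1
        bound = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
        print(f"empirical Pr[X > (1 + delta) mu] : {hits / trials:.6f}")
        print(f"Chernoff upper-tail bound        : {bound:.6f}")

    if __name__ == "__main__":
        chernoff_upper_demo()

The bound is not tight, but unlike the Markov bound it falls off exponentially in µ: doubling n (and hence µ) roughly squares it.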

E.3  Back to Treaps

In our analysis of randomized treaps, we defined the indicator variable A_{i↑k} to have the value 1 if and only if the node with the ith smallest key ("node i") was a proper ancestor of the node with the kth smallest key ("node k"). We argued that

    Pr[A_{i↑k} = 1] = 1 / (|i - k| + 1),

and from this we concluded that the expected depth of node k is

    E[depth(k)] = ∑_{i=1}^{n} Pr[A_{i↑k} = 1] = H_k + H_{n-k+1} - 2 < 2 ln n.

To prove a worst-case expected bound on the depth of the tree, we need to argue that the maximum depth of any node is small. Chernoff bounds make this argument easy, once we establish that the relevant indicator variables are mutually independent.
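As a sanity check of the formula Pr[A_{i↑k} = 1] = 1/(|i - k| + 1) and the resulting expected depth, here is a short simulation sketch. It is not part of the original notes, it assumes only Python's standard library, and the function and parameter names are mine. It uses the standard characterization behind that formula: node i is a proper ancestor of node k exactly when node i has the smallest priority among all nodes whose keys lie between the ith and kth smallest.

    import math
    import random

    def treap_depth(priorities, k):
        """Depth of the node with the (k+1)-th smallest key (0-indexed k) in the
        treap determined by `priorities`: count the indices i != k whose priority
        is the minimum over the key range between i and k (the indicator A_{i↑k})."""
        depth = 0
        for i in range(len(priorities)):
            if i == k:
                continue
            lo, hi = min(i, k), max(i, k)
            if priorities[i] == min(priorities[lo:hi + 1]):
                depth += 1
        return depth

    def expected_depth(n, k):
        """E[depth(k)] = H_k + H_{n-k+1} - 2, with k 1-indexed as in the notes."""
        harmonic = lambda m: sum(1.0 / j for j in range(1, m + 1))
        return harmonic(k) + harmonic(n - k + 1) - 2

    def demo(n=200, k=50, trials=2_000):
        avg = sum(treap_depth([random.random() for _ in range(n)], k - 1)
                  for _ in range(trials)) / trials
        print(f"simulated average depth of node {k} : {avg:.3f}")
        print(f"H_k + H_(n-k+1) - 2                 : {expected_depth(n, k):.3f}")
        print(f"2 ln n                              : {2 * math.log(n):.3f}")

    if __name__ == "__main__":
        demo()

The simulated average and the harmonic-number formula should agree closely, and both sit below 2 ln n.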

Lemma 1. For any index k, the k - 1 random variables A_{i↑k} with i < k are mutually independent. Similarly, for any index k, the n - k random variables A_{i↑k} with i > k are mutually independent.

Proof: To simplify the notation, we explicitly consider only the case k = 1, although the argument generalizes easily to other values of k. Fix n - 1 arbitrary indicator values x_2, x_3, ..., x_n. We prove the lemma by induction on n, with the vacuous base case n = 1. The definition of conditional probability gives us

    Pr[⋀_{i=2}^{n} (A_{i↑1} = x_i)] = Pr[⋀_{i=2}^{n-1} (A_{i↑1} = x_i) ∧ (A_{n↑1} = x_n)]
                                    = Pr[⋀_{i=2}^{n-1} (A_{i↑1} = x_i) | A_{n↑1} = x_n] · Pr[A_{n↑1} = x_n].

Now recall that A_{n↑1} = 1 if and only if node n has the smallest priority, and the other n - 2 indicator variables A_{i↑1} depend only on the order of the priorities of nodes 1 through n - 1. There are exactly (n - 1)! permutations of the n priorities in which the nth priority is smallest, and each of these permutations is equally likely. Thus,

    Pr[⋀_{i=2}^{n-1} (A_{i↑1} = x_i) | A_{n↑1} = x_n] = Pr[⋀_{i=2}^{n-1} (A_{i↑1} = x_i)].

The inductive hypothesis implies that the variables A_{2↑1}, ..., A_{(n-1)↑1} are mutually independent, so

    Pr[⋀_{i=2}^{n-1} (A_{i↑1} = x_i)] = ∏_{i=2}^{n-1} Pr[A_{i↑1} = x_i].

We conclude that

    Pr[⋀_{i=2}^{n} (A_{i↑1} = x_i)] = Pr[A_{n↑1} = x_n] · ∏_{i=2}^{n-1} Pr[A_{i↑1} = x_i] = ∏_{i=2}^{n} Pr[A_{i↑1} = x_i],

or in other words, that the indicator variables are mutually independent.

Theorem 2. The depth of a randomized treap with n nodes is O(log n) with high probability.

Proof: First let's bound the probability that the depth of node k exceeds 8 ln n. There is nothing special about the constant 8 here; I'm being generous to make the analysis easier. The depth of node k is a sum of n indicator variables A_{i↑k}, as i ranges from 1 to n. Lemma 1 allows us to partition these variables into two mutually independent subsets. Let d_<(k) = ∑_{i<k} A_{i↑k} and d_>(k) = ∑_{i>k} A_{i↑k}, so that depth(k) = d_<(k) + d_>(k). If depth(k) > 8 ln n, then either d_<(k) > 4 ln n or d_>(k) > 4 ln n.

Recall that the proof of the Chernoff bound actually shows that Pr[X > x] < exp((e^t - 1)µ) / exp(tx) for every t > 0 and every threshold x. Setting t = ln 4 and x = 4 ln n, and using the fact that µ = E[d_<(k)] = H_k - 1 < ln n, we can bound the probability that d_<(k) > 4 ln n as follows:

    Pr[d_<(k) > 4 ln n] < exp(3µ) / exp(4 ln 4 · ln n) ≤ e^{3 ln n} / 4^{4 ln n} = (e^3 / 4^4)^{ln n} = n^{3 - 4 ln 4} < 1/n^2.

(The last step uses the fact that 4 ln 4 ≈ 5.54518 > 5.) The same analysis implies that Pr[d_>(k) > 4 ln n] < 1/n^2. These inequalities imply the crude bound Pr[depth(k) > 8 ln n] < 2/n^2.

Now consider the probability that the treap has depth greater than 8 ln n. Even though the distributions of different nodes' depths are not independent, we can conservatively bound the probability of failure as follows:

    Pr[max_k depth(k) > 8 ln n] = Pr[⋁_{k=1}^{n} (depth(k) > 8 ln n)] ≤ ∑_{k=1}^{n} Pr[depth(k) > 8 ln n] < 2/n.

This argument implies more generally that for any constant c, the depth of the treap is greater than 2c ln n with probability at most 2/n^{c ln c - c}. We can make the failure probability an arbitrarily small polynomial by choosing c appropriately.
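To see the theorem numerically, here is a rough simulation sketch; it is again not part of the original notes, it assumes only Python's standard library, and the function names are mine. It builds each treap implicitly, using the fact that the root of any contiguous range of keys is the key with the smallest priority in that range, and counts how often the maximum depth exceeds 8 ln n.

    import math
    import random

    def max_treap_depth(priorities):
        """Maximum node depth (root = depth 0) of the treap on keys 0..n-1 with
        the given priorities.  The root of any key range [lo, hi] is the key with
        the smallest priority in that range; recurse on the two sides."""
        def deepest(lo, hi, depth):
            if lo > hi:
                return -1                      # empty range contributes nothing
            root = min(range(lo, hi + 1), key=priorities.__getitem__)
            return max(depth,
                       deepest(lo, root - 1, depth + 1),
                       deepest(root + 1, hi, depth + 1))
        return deepest(0, len(priorities) - 1, 0)

    def demo(n=1000, trials=300):
        threshold = 8 * math.log(n)
        exceed = sum(
            max_treap_depth([random.random() for _ in range(n)]) > threshold
            for _ in range(trials))
        print(f"8 ln n                                  : {threshold:.1f}")
        print(f"fraction of treaps deeper than 8 ln n   : {exceed / trials:.4f}")
        print(f"bound from the proof of Theorem 2 (2/n) : {2 / n:.4f}")

    if __name__ == "__main__":
        demo()

In practice the observed fraction is essentially always 0, comfortably below the 2/n bound, which is itself a generous overestimate.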

This theorem implies that any search, insertion, deletion, or merge operation on an n-node treap requires O(log n) time with high probability. In particular, the expected worst-case time for each of these operations is O(log n).

Exercises

1. Prove that for any integer k such that 1 < k < n, the n - 1 indicator variables A_{i↑k} with i ≠ k are not mutually independent. [Hint: Consider the case n = 3.]

2. Recall from Exercise 1 in the previous note that the expected number of descendants of any node in a treap is O(log n). Why doesn't the Chernoff-bound argument for depth imply that, with high probability, every node in a treap has O(log n) descendants? The conclusion is clearly bogus (every treap has a node with n descendants!), but what is the hole in the argument?

3. A heater is a sort of dual treap, in which the priorities of the nodes are given, but their search keys are generated independently and uniformly from the unit interval [0, 1]. You can assume all priorities and keys are distinct.

   (a) Prove that for any r, the node with the rth smallest priority has expected depth O(log r).

   (b) Prove that an n-node heater has depth O(log n) with high probability.

   (c) Describe algorithms to perform the operations INSERT and DELETEMIN in a heater. What are the expected worst-case running times of your algorithms? In particular, can you express the expected running time of INSERT in terms of the priority rank of the newly inserted item?

© Copyright 2009 Jeff Erickson. Released under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License (http://creativecommons.org/licenses/by-nc-sa/3.0/). Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/ for the most recent revision.