Lecture 02: Bounding tail distributions of a random variable


CSCI-B609: A Theorist's Toolkit (Fall 2016), Aug 25
Lecturer: Yuan Zhou    Scribe: Yuan Xie & Yuan Zhou

Let us consider the unbiased coin flips again. That is, let the outcome of the i-th coin toss be a random variable

    X_i = +1 with probability 1/2,   X_i = -1 with probability 1/2.

We assume all coin tosses are independent, and we would like to study the sum S_n of the first n coin tosses,

    S_n = Σ_{i=1}^n X_i.

In this lecture, we study the probability that S_n greatly deviates from its mean E[S_n] = 0. Specifically, for a parameter t, we would like to estimate the probability Pr[S_n > t]. Intuitively, this probability should be small for large enough t. The goal of this lecture (and part of the next one) is to derive quantitative upper bounds on the tail probability mass, parameterized by t.

As we did in the previous lecture, using the Berry-Esseen theorem, we know that Pr[S_n ≥ t] is within O(1/√n) of Pr[G ≥ t/√n], where G ~ N(0, 1) is a standard Gaussian. For convenience, we may also use the following informal notation:

    Pr[S_n ≥ t] = Pr[G ≥ t/√n] ± O(1/√n).    (1)

Using basic calculus, we can estimate that

    Pr[G ≥ t] = ∫_t^∞ (1/√(2π)) e^{-u²/2} du ≤ O(1) · e^{-t²/2}.    (2)

Now let us fix the parameter t = 10√(n ln n). Combining (1) and (2), we have

    Pr[S_n ≥ t] ≤ Pr[G ≥ 10√(ln n)] + O(1/√n) = O(exp(-(10√(ln n))²/2)) + O(1/√n) = O(1/n^50) + O(1/√n) = O(1/√n).
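As a quick numerical sanity check of the Gaussian tail estimate (2) (an illustration added here, not part of the original notes), the following Python sketch compares the exact tail Pr[G ≥ t] with e^{-t²/2}; the hidden constant in the O(1) is taken to be 1/2, an assumption that the check itself confirms for the sampled values of t.

import math

def gaussian_tail(t: float) -> float:
    """Exact upper tail Pr[G >= t] of a standard Gaussian, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def tail_bound(t: float) -> float:
    """The bound from (2), with the O(1) constant taken to be 1/2 (an assumption)."""
    return 0.5 * math.exp(-t * t / 2)

for t in (0.5, 1.0, 2.0, 4.0, 6.0):
    exact, bound = gaussian_tail(t), tail_bound(t)
    print(f"t={t:4.1f}  Pr[G>=t]={exact:.3e}  (1/2)e^(-t^2/2)={bound:.3e}  bound holds: {exact <= bound}")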

We see that the tail mass of the standard Gaussian is only O(1/n^50). However, the error term O(1/√n) introduced by the Berry-Esseen theorem is much greater, and this error term is the main reason we cannot get better results this way. In the remainder of this lecture, we will try several other methods to improve the upper bound.

1  Markov inequality

When we only know the mean of a nonnegative random variable, Markov's inequality gives a simple upper bound on the probability that it deviates from its mean.

Theorem 1 (Markov inequality). Let X be a random variable with X ≥ 0. For every parameter t > 0, we have Pr[X ≥ t · E[X]] ≤ 1/t.

Proof. For each α > 0, we have

    E[X] = Pr[X ≥ α] · E[X | X ≥ α] + Pr[X < α] · E[X | X < α]
         ≥ Pr[X ≥ α] · α + Pr[X < α] · 0
         = Pr[X ≥ α] · α.

Dividing both sides of the inequality by α > 0 gives

    E[X]/α ≥ Pr[X ≥ α].

Taking α = t · E[X], we get the desired bound.

Now let us try to apply Markov's inequality to bound the tail mass of S_n. Since S_n is not a nonnegative random variable, we cannot apply the inequality directly. However, note that S_n ≥ -n always holds, so we apply the inequality to T = S_n + n, where E[T] = E[S_n] + n = n. Let t = 10√(n ln n). We have

    Pr[S_n ≥ t] = Pr[T ≥ t + n] = Pr[T ≥ ((t + n)/n) · E[T]] ≤ n/(t + n) = 1/(1 + 10√(ln n / n)).

This is a very bad bound: it does not even converge to 0 as n grows!
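To see concretely how weak this is, here is a minimal Python sketch (added for illustration, not part of the original notes) that evaluates the Markov bound n/(t + n) = 1/(1 + 10√(ln n / n)) for increasing n; the values approach 1 rather than 0.

import math

def markov_tail_bound(n: int) -> float:
    """Markov bound on Pr[S_n >= 10*sqrt(n ln n)], obtained via the shifted variable T = S_n + n."""
    t = 10 * math.sqrt(n * math.log(n))
    return n / (t + n)  # equals 1 / (1 + 10*sqrt(ln(n)/n))

for n in (10**2, 10**4, 10**6, 10**8):
    print(f"n={n:>9}  Markov bound = {markov_tail_bound(n):.4f}")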

2  Chebyshev inequality

The Chebyshev inequality uses not only the mean of the random variable but also its variance (equivalently, its second moment). Since we have more information about the random variable, we may potentially get better bounds.

Theorem 2 (Chebyshev inequality). Assume that E[X] = µ and Var[X] = σ² > 0. For every parameter t > 0, we have

    Pr[|X - µ| ≥ t · σ] ≤ 1/t².

Proof. Let Y = (X - µ)². We can check that E[Y] = σ² and Y ≥ 0. Applying Markov's inequality, we have

    Pr[|X - µ| ≥ t · σ] = Pr[(X - µ)² ≥ t² σ²] = Pr[Y ≥ t² · E[Y]] ≤ 1/t².

Now let us go back to the scenario discussed at the beginning of this lecture. We compute that µ = E[S_n] = 0 and σ = √(Var[S_n]) = √(E[S_n²]) = √n. Therefore

    Pr[S_n ≥ 10√(n ln n)] ≤ Pr[|S_n| ≥ 10√(n ln n)] = Pr[|S_n - µ| ≥ σ · (10√(n ln n)/σ)] ≤ σ²/(10√(n ln n))² = n/(100 n ln n) = 1/(100 ln n).

This bound is still not as good as we would like. However, at least it converges to 0 as n → ∞.

Remark 1. Note that the Chebyshev inequality only needs pairwise independence among the X_i's. Specifically, when computing the variance of S_n, we have

    Var[S_n] = Var[X_1 + X_2 + ... + X_n]
             = E[(X_1 + X_2 + ... + X_n)²] - E[X_1 + X_2 + ... + X_n]²
             = E[(X_1 + X_2 + ... + X_n)²]
             = Σ_i E[X_i²] + Σ_{i ≠ j} E[X_i X_j]
             = Σ_i E[X_i²]
             = n.

In the penultimate equality, we used the fact that X_i is independent of X_j for i ≠ j (so that E[X_i X_j] = E[X_i] E[X_j] = 0).
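Both the variance computation in Remark 1 and the bound 1/(100 ln n) are easy to check numerically. The Python sketch below (added for illustration, not part of the original notes) estimates Var[S_n] by Monte Carlo simulation and prints the Chebyshev bound for a few values of n; the trial count of 5000 is an arbitrary choice.

import math
import random

def estimate_variance(n: int, trials: int = 5000) -> float:
    """Monte Carlo estimate of Var[S_n], where S_n is the sum of n independent +/-1 coin flips."""
    samples = [sum(random.choice((-1, 1)) for _ in range(n)) for _ in range(trials)]
    mean = sum(samples) / trials
    return sum((s - mean) ** 2 for s in samples) / trials

for n in (100, 400, 1000):
    var_hat = estimate_variance(n)
    chebyshev_bound = 1 / (100 * math.log(n))
    print(f"n={n:5}  estimated Var[S_n] = {var_hat:8.1f} (exact: {n})  Chebyshev bound = {chebyshev_bound:.4f}")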

3  The fourth moment method

Using the first two moments, we obtained a better bound than using only the mean of the random variable. Now let us extend this method to the fourth moment. Consider S_n^4 ≥ 0. By Markov's inequality, we have

    Pr[S_n ≥ 10√(n ln n)] ≤ Pr[S_n^4 ≥ (10√(n ln n))^4] ≤ E[S_n^4] / (10000 n² ln² n).    (3)

Now let us estimate

    E[S_n^4] = E[(Σ_{i=1}^n X_i)^4]
             = Σ_i E[X_i^4] + 3 Σ_{i ≠ j} E[X_i² X_j²] + 4 Σ_{i ≠ j} E[X_i X_j³]
               + 6 Σ_{i, j, k distinct} E[X_i X_j X_k²] + Σ_{i, j, k, q distinct} E[X_i X_j X_k X_q].    (4)

Fortunately, because of independence and E[X_i] = 0, we have

    E[X_i X_j³] = E[X_i X_j X_k²] = E[X_i X_j X_k X_q] = 0

whenever the indices shown are distinct. Therefore we can simplify (4) to

    E[S_n^4] = Σ_i E[X_i^4] + 3 Σ_{i ≠ j} E[X_i² X_j²] = n + 3n(n - 1) ≤ 3n².    (5)

Combining (3) and (5), we get

    Pr[S_n ≥ 10√(n ln n)] ≤ 3n² / (10000 n² ln² n) = 3 / (10000 ln² n).

This is a better bound than what we got from Chebyshev.

Remark 2. The fourth moment method only uses independence within every quadruple of random variables, so the bound also holds for 4-wise independent random variables.

Remark 3. We can extend this method by considering S_n^{2k} for positive integers k and picking the k that optimizes the upper bound. However, this plan would lead to a painful estimation of E[S_n^{2k}]. We will instead use a slightly different method to get better upper bounds.
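The identity E[S_n^4] = n + 3n(n - 1) from (5) can be verified exactly for small n by enumerating all 2^n equally likely outcomes; the short Python sketch below (added for illustration, not part of the original notes) does exactly that.

import itertools

def exact_fourth_moment(n: int) -> float:
    """E[S_n^4] computed by enumerating all 2^n equally likely +/-1 outcomes."""
    total = sum(sum(signs) ** 4 for signs in itertools.product((-1, 1), repeat=n))
    return total / 2 ** n

for n in range(1, 13):
    formula = n + 3 * n * (n - 1)  # the closed form derived in (5)
    print(f"n={n:2}  enumeration = {exact_fourth_moment(n):7.1f}   n + 3n(n-1) = {formula}")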

4  The Chernoff method

Instead of S_n^{2k}, let us consider the function e^{λ S_n} for some positive parameter λ. Since e^x is a monotonically increasing function, we have

    Pr[S_n ≥ 10√(n ln n)] = Pr[λ S_n ≥ 10λ√(n ln n)] = Pr[e^{λ S_n} ≥ e^{10λ√(n ln n)}].

By Markov's inequality (also checking that e^{λ S_n} > 0), we have

    Pr[e^{λ S_n} ≥ e^{10λ√(n ln n)}] ≤ E[e^{λ S_n}] / e^{10λ√(n ln n)}.    (6)

Now it remains to upper bound E[e^{λ S_n}]. We have

    E[e^{λ S_n}] = E[e^{λ Σ_i X_i}] = E[Π_i e^{λ X_i}] = Π_i E[e^{λ X_i}].    (7)

Note that in the last equality we used the full independence among all the X_i's. On the other hand, by the distribution of X_i, we have

    E[e^{λ X_i}] = (1/2) e^λ + (1/2) e^{-λ}
                 = (1/2)(1 + λ + λ²/2! + λ³/3! + λ⁴/4! + ...) + (1/2)(1 - λ + λ²/2! - λ³/3! + λ⁴/4! - ...)    (Taylor expansion)
                 = 1 + λ²/2! + λ⁴/4! + λ⁶/6! + ...
                 ≤ e^{λ²/2}.

Getting back to (7), we have

    E[e^{λ S_n}] ≤ (e^{λ²/2})^n = e^{λ² n / 2}.

Combining this with (6), we have

    Pr[e^{λ S_n} ≥ e^{10λ√(n ln n)}] ≤ e^{λ² n / 2 - 10λ√(n ln n)}.    (8)

Picking λ = 10√(ln n / n), we minimize the right-hand side of (8) and get our desired upper bound

    Pr[e^{λ S_n} ≥ e^{10λ√(n ln n)}] ≤ e^{50 ln n - 100 ln n} = 1/n^50.

At the beginning of the next lecture, we are going to extend this method to more general random variables and more general thresholds, and go through the proof of the famous Chernoff bound.
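As a final sanity check (added for illustration, not part of the original notes), the Python sketch below minimizes the exponent in (8) over a grid of λ values and confirms that the choice λ = 10√(ln n / n) attains exponent -50 ln n, i.e. the bound 1/n^50; the value n = 10^6 is an arbitrary example.

import math

def exponent(lmbda: float, n: int, t: float) -> float:
    """Exponent of the bound in (8): lambda^2 * n / 2 - lambda * t."""
    return lmbda ** 2 * n / 2 - lmbda * t

n = 10 ** 6
t = 10 * math.sqrt(n * math.log(n))

# Setting the derivative lambda * n - t to zero gives the optimal lambda = t / n = 10 * sqrt(ln(n) / n).
lmbda_star = t / n

# Crude grid search around lmbda_star to confirm it is (close to) the minimizer.
grid = [lmbda_star * (0.5 + 0.01 * k) for k in range(101)]
best = min(grid, key=lambda l: exponent(l, n, t))

print("optimal lambda        =", lmbda_star)
print("grid-search minimizer =", best)
print("exponent at optimum   =", exponent(lmbda_star, n, t), "   -50 ln n =", -50 * math.log(n))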