Chapter 5: Inequalities

5.1 The Markov and Chebyshev inequalities

As you have probably seen on today's front page: every person in the upper tenth percentile earns at least 12 times more than the average salary. In other words, if you pick a random person, the probability that his salary is more than 12 times the average salary is more than 0.1. Put into a formal setting, denote by $\Omega$ the set of people, endowed with a uniform probability, and let $X(\omega)$ be the person's salary; then
\[
P(X > 12\, E[X]) > 0.1.
\]
The following theorem will show that this is not possible.

Theorem 5.1 (Markov inequality): Let $X$ be a random variable assuming non-negative values, i.e., $X(\omega) \geq 0$ for every $\omega$. Then for every $a > 0$,
\[
P(X \geq a) \leq \frac{E[X]}{a}.
\]

Comment: Andrey Andreyevich Markov (1856-1922) was a Russian mathematician notably known for his work on stochastic processes.
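As a quick numerical illustration of the opening example (not part of the original notes; the lognormal "salary" distribution, the numpy library, and the seed are arbitrary choices for this sketch), one can check that no non-negative distribution can put more than $1/12$ of its mass above 12 times its mean:

import numpy as np

# Opening example via Markov's inequality: for any non-negative "salary"
# distribution, the fraction earning more than 12 times the mean is at most
# 1/12 < 0.1. The lognormal distribution and the seed are arbitrary choices.
rng = np.random.default_rng(0)
salaries = rng.lognormal(mean=10.0, sigma=1.5, size=1_000_000)
fraction = (salaries > 12 * salaries.mean()).mean()
print(f"fraction above 12x mean: {fraction:.4f}   (Markov bound: {1/12:.4f})")

Whatever non-negative distribution is plugged in, the printed fraction stays below $1/12 \approx 0.083$, so a claim of "more than 0.1" cannot hold.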

Comment: Note first that this is a vacuous statement for $a < E[X]$. For $a > E[X]$ this inequality limits the probability that $X$ assumes values larger than its expected value.

This is the first time in this course that we derive an inequality. Inequalities, in general, are an important tool in analysis, where estimates (rather than exact identities) are often needed. The strength of such an inequality is that it holds for every random variable. As a result, this estimate may be far from being tight.

Proof: Let $A$ be the event that $X$ is at least $a$, that is, $A = \{\omega : X(\omega) \geq a\}$. Define a new random variable $Y = a\, I_A$, where $I_A$ is the indicator of $A$. That is,
\[
Y(\omega) = \begin{cases} a & X(\omega) \geq a \\ 0 & \text{otherwise.} \end{cases}
\]
Clearly, $X \geq Y$, so by the monotonicity of expectation,
\[
E[X] \geq E[Y] = a\, E[I_A] = a\, P(A) = a\, P(X \geq a),
\]
where we used the fact that $I_A$ is a Bernoulli variable, hence its expectation equals the probability that it is equal to 1.

A corollary of Markov's inequality is another inequality due to Chebyshev:

Theorem 5.2 (Chebyshev's inequality): Let $X$ be a random variable with expected value $\mu$ and variance $\sigma^2$. Then, for every $a > 0$,
\[
P(|X - \mu| \geq a) \leq \frac{\sigma^2}{a^2}.
\]

Comment: Pafnuty Lvovich Chebyshev (1821-1894) was a Russian mathematician known for many contributions; he was Markov's advisor.

Comment: If we write $a = k\sigma$, this theorem states that the probability that a random variable assumes a value whose absolute distance from its expected value is more than $k$ times its standard deviation is at most $1/k^2$:
\[
P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}.
\]
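Before turning to the proof, here is a small numerical check of the $k\sigma$ form of the bound (a sketch, not part of the notes; the standard normal distribution and the sample size are arbitrary choices):

import numpy as np

# Chebyshev's k-sigma form: P(|X - mu| >= k*sigma) <= 1/k**2.
# The standard normal below is an arbitrary choice; the bound holds for
# any distribution with finite variance, however loosely.
rng = np.random.default_rng(1)
X = rng.standard_normal(1_000_000)
mu, sigma = X.mean(), X.std()
for k in [1, 2, 3, 4]:
    empirical = (np.abs(X - mu) >= k * sigma).mean()
    print(f"k = {k}   P(|X - mu| >= k*sigma) ≈ {empirical:.5f}   bound 1/k^2 = {1/k**2:.5f}")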

Proof: Define $Y = (X - \mu)^2$. $Y$ is a non-negative random variable satisfying, by definition, $E[Y] = \sigma^2$. Applying Markov's inequality to $Y$,
\[
P(|X - \mu| \geq a) = P(Y \geq a^2) \leq \frac{E[Y]}{a^2} = \frac{\sigma^2}{a^2}.
\]

Example: On average, John Doe drinks 25 liters of wine every week. What can be said about the probability that he drinks more than 50 liters of wine in a given week? Here we apply the Markov inequality. If $X$ is the amount of wine John Doe drinks in a given week, then
\[
P(X > 50) \leq \frac{E[X]}{50} = \frac{1}{2}.
\]
Note that this may be a very loose estimate; it may well be that John Doe drinks exactly 25 liters of wine every week, in which case this probability is zero.

Example: Let $X \sim B(n, 1/2)$. Then $E[X] = n/2$ and $\mathrm{Var}[X] = n/4$. If you toss a fair coin $10^6$ times, the distribution of the number of Heads is $X \sim B(10^6, 1/2)$; the probability of getting between 495,000 and 505,000 Heads can be bounded by
\[
P(|X - E[X]| < 5000) = 1 - P(|X - E[X]| \geq 5000) \geq 1 - \frac{10^6/4}{5000^2} = 0.99.
\]
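For comparison, the exact tail probability in this coin-tossing example can be computed numerically; the sketch below uses scipy.stats.binom, which is a choice of convenience rather than anything in the notes:

from scipy.stats import binom

# Coin-tossing example: X ~ B(10**6, 1/2). Chebyshev bounds the tail
# P(|X - 500000| >= 5000) by (10**6/4) / 5000**2 = 0.01; the exact tail
# (computed here with scipy.stats.binom) is astronomically smaller.
n, p, dev = 10**6, 0.5, 5000
chebyshev_tail = (n * p * (1 - p)) / dev**2
exact_tail = binom.cdf(n // 2 - dev, n, p) + binom.sf(n // 2 + dev - 1, n, p)
print(f"Chebyshev tail bound: {chebyshev_tail:.3g}")
print(f"Exact tail:           {exact_tail:.3g}")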

Since a binomial variable $B(n, p)$ can be represented as a sum of $n$ Bernoulli variables, the following generalization is called for. Let $X_1, X_2, \dots, X_n$ be independent and identically distributed random variables, with $E[X_1] = \mu$ and $\mathrm{Var}[X_1] = \sigma^2$. Define
\[
X = \sum_{k=1}^{n} X_k.
\]
Then, by the linearity of the expectation and the independence of the random variables, $E[X] = n\mu$ and $\mathrm{Var}[X] = n\sigma^2$. Chebyshev's inequality yields
\[
P(|X - n\mu| \geq a) \leq \frac{n\sigma^2}{a^2}.
\]
This inequality starts getting useful when $a > \sqrt{n}\,\sigma$. For further reference, we note that in the particular case where $\mu = 0$, we get a fortiori that for every $a > 0$,
\[
P(X \geq a) \leq \frac{n\sigma^2}{a^2}.
\]

5.2 Hoeffding's inequality

We have just seen that if we consider a sum of independent random variables, we can use Chebyshev's inequality to bound the probability that this sum deviates by much from its expectation. We now ask: is this result tight? It turns out there are much better bounds that we can get in this setting.

Theorem 5.3 (Hoeffding's inequality): Let $X_1, X_2, \dots, X_n$ be independent random variables (not necessarily identically distributed) satisfying $E[X_i] = 0$ and $|X_i| \leq 1$ for all $1 \leq i \leq n$ (that is, for every $i$ and every $\omega$, $|X_i(\omega)| \leq 1$). Then, for every $a > 0$,
\[
P\Big(\sum_{i=1}^{n} X_i \geq a\Big) \leq \exp\Big(-\frac{a^2}{2n}\Big).
\]

Recall the definition of the moment generating function of $X$,
\[
M_X(t) = E[e^{tX}].
\]
The proof will use moment generating functions and Markov's inequality. We need the following lemma.

Lemma 5.1: Let $X$ be a random variable satisfying $E[X] = 0$ and $|X| \leq 1$. Then for all $t \in \mathbb{R}$,
\[
M_X(t) \leq \exp\Big(\frac{t^2}{2}\Big).
\]
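Before the proof, a quick numerical look at the lemma in its simplest instance (a fair $\pm 1$ variable, for which $M_X(t) = \cosh(t)$); this sketch and its grid of $t$ values are not part of the notes:

import numpy as np

# Lemma 5.1 in its simplest case: for X uniform on {-1, +1}, E[X] = 0, |X| <= 1,
# and M_X(t) = cosh(t); the lemma asserts cosh(t) <= exp(t**2 / 2) for all t.
for t in np.linspace(-3.0, 3.0, 7):
    mgf, bound = np.cosh(t), np.exp(t**2 / 2)
    print(f"t = {t:+.1f}   M_X(t) = {mgf:8.4f}   exp(t^2/2) = {bound:8.4f}   ok = {mgf <= bound}")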

Proof: Fix $t \in \mathbb{R}$, and consider the function $f(x) = e^{tx}$. This function is convex (the second derivative of $f$ is everywhere positive), meaning that any line segment between two points on its graph lies above the graph; i.e., for every $a, b \in \mathbb{R}$ and $0 \leq \lambda \leq 1$,
\[
f(\lambda a + (1 - \lambda) b) \leq \lambda f(a) + (1 - \lambda) f(b).
\]
In particular, taking $a = 1$ and $b = -1$ we get that for every $t$ and every $0 \leq \lambda \leq 1$,
\[
e^{t(\lambda - (1 - \lambda))} \leq \lambda e^{t} + (1 - \lambda) e^{-t}.
\]
Setting $x = 2\lambda - 1$, i.e., $\lambda = \tfrac{1}{2}(1 + x)$ and $1 - \lambda = \tfrac{1}{2}(1 - x)$, we get that for any $-1 \leq x \leq 1$,
\[
e^{tx} \leq \frac{1 + x}{2} e^{t} + \frac{1 - x}{2} e^{-t} = \frac{e^{t} + e^{-t}}{2} + x\, \frac{e^{t} - e^{-t}}{2}.
\]
Since $X$ is always between $-1$ and $1$, it follows that
\[
e^{tX} \leq \frac{e^{t} + e^{-t}}{2} + X\, \frac{e^{t} - e^{-t}}{2},
\]
which is an inequality between random variables. By monotonicity of expectation, using the linearity of the expectation and the fact that $E[X] = 0$,
\[
M_X(t) = E[e^{tX}] \leq \frac{e^{t} + e^{-t}}{2}.
\]
It remains to show that for all $t \in \mathbb{R}$,
\[
\frac{e^{t} + e^{-t}}{2} \leq \exp\Big(\frac{t^2}{2}\Big).
\]
This is done by comparison of the Taylor series around 0,
\[
\frac{e^{t} + e^{-t}}{2} = \frac{1}{2}\sum_{k=0}^{\infty} \frac{t^k + (-t)^k}{k!} = \sum_{\ell=0}^{\infty} \frac{t^{2\ell}}{(2\ell)!} \leq \sum_{\ell=0}^{\infty} \frac{t^{2\ell}}{2^{\ell}\, \ell!} = \sum_{\ell=0}^{\infty} \frac{(t^2/2)^{\ell}}{\ell!} = \exp\Big(\frac{t^2}{2}\Big),
\]
where we used the fact that $(2\ell)! \geq 2^{\ell}\, \ell!$.
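The last step rests on the term-wise inequality $(2\ell)! \geq 2^{\ell}\, \ell!$; a short check of the first few terms (illustrative only, not part of the notes):

from math import factorial

# Term-wise comparison used at the end of the proof:
# t^(2l)/(2l)! <= (t^2/2)^l / l!  because (2l)! >= 2**l * l!.
for l in range(8):
    lhs, rhs = factorial(2 * l), 2**l * factorial(l)
    print(f"l = {l}   (2l)! = {lhs:>12}   2^l * l! = {rhs:>8}   ok = {lhs >= rhs}")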

Proof (of Hoeffding's inequality): Let $X = \sum_{i=1}^{n} X_i$. First note that
\[
M_X(t) = E\Big[\exp\Big(t \sum_{i=1}^{n} X_i\Big)\Big] = E\Big[\prod_{i=1}^{n} e^{t X_i}\Big] = \prod_{i=1}^{n} E\big[e^{t X_i}\big] = \prod_{i=1}^{n} M_{X_i}(t),
\]
where we used Proposition 4.3 and the independence of the random variables. Using the above lemma we get that
\[
M_X(t) \leq \exp\Big(\frac{n t^2}{2}\Big).
\]
For $t > 0$ the event $X \geq a$ is the same as the event $e^{tX} \geq e^{ta}$. Using Markov's inequality we get that for any $t > 0$,
\[
P(X \geq a) = P(e^{tX} \geq e^{ta}) \leq \frac{E[e^{tX}]}{e^{ta}} = \frac{M_X(t)}{e^{ta}} \leq \exp\Big(\frac{n t^2}{2} - ta\Big).
\]
The right-hand side depends on the parameter $t$, whereas the left-hand side does not; to get a tighter bound, we may choose $t$ so as to minimize the right-hand side. The minimum is achieved when $t = a/n$, hence
\[
P(X \geq a) \leq \exp\Big(\frac{n (a/n)^2}{2} - \frac{a}{n}\, a\Big) = \exp\Big(-\frac{a^2}{2n}\Big).
\]

Comment: Wassily Hoeffding (1914-1991) was a Finnish statistician. The inequality named after him was proved in 1963.

Comment: We have stated Hoeffding's inequality under the assumptions that the random variables are bounded by 1 in absolute value and that the expectation is 0. In general, if we have bounded random variables we can apply an affine transformation to get random variables which satisfy the conditions of the theorem. For example, if $E[X_i] = \mu$ and $|X_i| \leq b$, then
\[
Y_i = \frac{X_i - \mu}{b + |\mu|}
\]
satisfies $E[Y_i] = 0$ and
\[
|Y_i| \leq \frac{|X_i| + |\mu|}{b + |\mu|} \leq 1.
\]
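The centering and rescaling described in the comment can be written out directly; a minimal sketch, assuming for illustration that the $X_i$ are uniform on $[0, b]$ with $b = 3$ (so $\mu = b/2$ exactly):

import numpy as np

# Affine rescaling from the comment: if E[X_i] = mu and |X_i| <= b, then
# Y_i = (X_i - mu) / (b + |mu|) satisfies E[Y_i] = 0 and |Y_i| <= 1.
# Illustrative choice: X_i uniform on [0, b], so mu = b/2 exactly.
rng = np.random.default_rng(2)
b = 3.0
mu = b / 2
X = rng.uniform(0.0, b, size=100_000)
Y = (X - mu) / (b + abs(mu))
print(f"mean(Y) ≈ {Y.mean():+.4f} (should be ≈ 0),   max|Y| = {np.abs(Y).max():.3f} <= 1")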

Example: Going back to the last example from the previous section, if $X_i$ are independent Bernoulli$(1/2)$ random variables, then $X = \sum_{i=1}^{n} X_i \sim B(n, 1/2)$. We define new random variables $Y_i = 2X_i - 1$, so that $|Y_i| \leq 1$ and $E[Y_i] = 0$. Furthermore, set $Y = \sum_{i=1}^{n} Y_i$, so that $Y = 2X - n$. The event $X \geq \tfrac{n}{2} + 5 \cdot 10^3$ equals the event $Y \geq 10^4$. Hoeffding's inequality gives us the bound
\[
P(Y \geq 10^4) \leq \exp\Big(-\frac{(10^4)^2}{2 \cdot 10^6}\Big) = e^{-50}.
\]
A similar bound holds for $X \leq \tfrac{n}{2} - 5 \cdot 10^3$. Put together,
\[
P\big(|X - \tfrac{n}{2}| \geq 5 \cdot 10^3\big) \leq 2 e^{-50}.
\]
So, if you tossed a fair coin $10^6$ times, the probability that you get between 495,000 and 505,000 heads is at least
\[
1 - 2e^{-50} \approx 0.99999999999999999999961425\ldots
\]
Quite an improvement over our previous bound of 99 percent!
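To put the two bounds for the coin-tossing example side by side (a sketch, not part of the notes; the complement of the Hoeffding tail bound is the $1 - 2e^{-50}$ figure quoted above):

import numpy as np

# Coin-tossing example with n = 10**6 fair coins and deviation 5000 heads.
# Chebyshev:  P(|X - n/2| >= 5000) <= (n/4) / 5000**2 = 0.01
# Hoeffding (applied to Y = 2X - n, |Y_i| <= 1): P(|Y| >= 10**4) <= 2*exp(-50)
n, dev = 10**6, 5000
chebyshev = (n / 4) / dev**2
hoeffding = 2 * np.exp(-(2 * dev) ** 2 / (2 * n))
print(f"Chebyshev tail bound: {chebyshev:.3g}")   # 0.01
print(f"Hoeffding tail bound: {hoeffding:.3g}")   # ~3.86e-22, i.e. 2*e^-50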

5.3 Jensen's inequality

Recall that a function $f : \mathbb{R} \to \mathbb{R}$ is called convex if its graph between every two points lies under the secant, i.e., if for every $x, y \in \mathbb{R}$ and $0 < \lambda < 1$,
\[
f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y).
\]

Proposition 5.1 (Jensen's inequality): If $g$ is a convex real-valued function and $X$ is a real-valued random variable, then
\[
g(E[X]) \leq E[g(X)],
\]
provided that the expectations exist.

Proof: Let's start with a proof under the additional assumption that $g$ is twice differentiable. Since $g$ is convex, it follows that $g''(x) \geq 0$ for all $x$. Taylor expanding $g$ about $\mu = E[X]$,
\[
g(x) = g(\mu) + g'(\mu)(x - \mu) + \tfrac{1}{2}\, g''(\xi)(x - \mu)^2
\]
for some $\xi$. The last term is non-negative, so we get that for all $x$,
\[
g(x) \geq g(\mu) + g'(\mu)(x - \mu),
\]
and as an inequality between random variables:
\[
g(X) \geq g(\mu) + g'(\mu)(X - \mu).
\]
By the monotonicity of expectation,
\[
E[g(X)] \geq E[g(\mu) + g'(\mu)(X - \mu)] = g(\mu) + g'(\mu)(E[X] - \mu) = g(E[X]),
\]
which is precisely what we need to show.

What about the more general case? Any convex function is continuous, and has one-sided derivatives with
\[
g'_{-}(x) = \lim_{y \uparrow x} \frac{g(x) - g(y)}{x - y} \leq \lim_{y \downarrow x} \frac{g(x) - g(y)}{x - y} = g'_{+}(x).
\]
For every $m \in [g'_{-}(\mu), g'_{+}(\mu)]$,
\[
g(x) \geq g(\mu) + m(x - \mu),
\]
so the same proof holds with $m$ replacing $g'(\mu)$.

Comment: Jensen's inequality is valid also for convex functions of several variables.

Example: Since $\exp$ is a convex function,
\[
\exp(t\, E[X]) \leq E[e^{tX}] = M_X(t),
\]
or, $E[X] \leq \tfrac{1}{t} \log M_X(t)$, for all $t > 0$.
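A simulation-style illustration of Jensen's inequality (not in the original notes): here $X$ is standard normal and $g(x) = e^x$, both arbitrary choices, for which $g(E[X]) = 1$ while $E[g(X)] = e^{1/2} \approx 1.65$:

import numpy as np

# Jensen's inequality: g(E[X]) <= E[g(X)] for convex g.
# Illustrative choices: X standard normal, g(x) = exp(x);
# exactly, g(E[X]) = e^0 = 1 and E[g(X)] = e^(1/2) ≈ 1.6487.
rng = np.random.default_rng(3)
X = rng.standard_normal(1_000_000)
print(f"g(E[X]) ≈ {np.exp(X.mean()):.4f}")
print(f"E[g(X)] ≈ {np.exp(X).mean():.4f}")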

Example: Consider a discrete random variable $X$ assuming the positive values $x_1, \dots, x_n$ with equal probability $1/n$. Jensen's inequality for $g(x) = -\log(x)$ gives
\[
-\log\Big(\frac{1}{n}\sum_{i=1}^{n} x_i\Big) \leq \frac{1}{n}\sum_{i=1}^{n} \big(-\log(x_i)\big) = -\log\Big(\prod_{i=1}^{n} x_i^{1/n}\Big).
\]
Reversing signs, and exponentiating, we get
\[
\frac{1}{n}\sum_{i=1}^{n} x_i \geq \prod_{i=1}^{n} x_i^{1/n},
\]
which is the classical arithmetic mean-geometric mean inequality. In fact, this inequality can be generalized for arbitrary distributions, $p_X(x_i) = p_i$, yielding
\[
\sum_{i=1}^{n} p_i x_i \geq \prod_{i=1}^{n} x_i^{p_i}.
\]
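The weighted AM-GM inequality at the end of the example is easy to verify numerically for any particular choice of values and weights; the numbers below are arbitrary (a sketch, not part of the notes):

import numpy as np

# Weighted AM-GM from Jensen: sum(p_i * x_i) >= prod(x_i ** p_i)
# for positive x_i and probabilities p_i. The values below are arbitrary.
x = np.array([0.5, 2.0, 3.0, 10.0])
p = np.array([0.1, 0.4, 0.3, 0.2])   # non-negative, sums to 1
print(f"weighted arithmetic mean = {np.sum(p * x):.4f}")
print(f"weighted geometric mean  = {np.prod(x ** p):.4f}")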

5.4 Kolmogorov's inequality (not taught in 2016-7)

The Kolmogorov inequality may at first seem to be of a similar flavor as Chebyshev's inequality, but it is considerably stronger. I have decided to include it here because its proof involves some interesting subtleties. First, a lemma:

Lemma 5.2: If $X, Y, Z$ are random variables such that $Y$ is independent of $X$ and $Z$, then
\[
E[XY \mid Z] = E[X \mid Z]\, E[Y].
\]

Proof: The fact that $Y$ is independent of both $X, Z$ implies that (in the case of discrete variables),
\[
p_{X,Y,Z}(x, y, z) = p_{X,Z}(x, z)\, p_Y(y).
\]
Now, for every $z$,
\[
E[XY \mid Z = z] = \sum_{x,y} x y\, p_{X,Y \mid Z}(x, y \mid z) = \sum_{x,y} x y\, \frac{p_{X,Y,Z}(x, y, z)}{p_Z(z)} = \sum_{x,y} x y\, \frac{p_{X,Z}(x, z)\, p_Y(y)}{p_Z(z)} = \sum_{y} y\, p_Y(y) \sum_{x} x\, \frac{p_{X,Z}(x, z)}{p_Z(z)} = E[Y]\, E[X \mid Z = z].
\]

Theorem 5.4 (Kolmogorov's inequality): Let $X_1, \dots, X_n$ be independent random variables such that $E[X_k] = 0$ and $\mathrm{Var}[X_k] = \sigma_k^2 < \infty$. Then, for all $a > 0$,
\[
P\Big(\max_{1 \leq k \leq n} |X_1 + \dots + X_k| \geq a\Big) \leq \frac{1}{a^2} \sum_{i=1}^{n} \sigma_i^2.
\]

Comment: For $n = 1$ this is nothing but the Chebyshev inequality. For $n > 1$ it would still be Chebyshev's inequality if the maximum over $1 \leq k \leq n$ were replaced by $k = n$, since by independence $\mathrm{Var}[X_1 + \dots + X_n] = \sum_{i=1}^{n} \sigma_i^2$.
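A Monte Carlo check of Theorem 5.4 before its proof (illustrative only, not part of the notes; the $\pm 1$ steps, the threshold $a$, and the number of trials are arbitrary choices):

import numpy as np

# Kolmogorov's inequality: P(max_k |S_k| >= a) <= (sum of variances) / a**2.
# Here the X_i are fair +/-1 steps, so each variance is 1 and the bound is n/a**2.
rng = np.random.default_rng(4)
n, a, trials = 100, 25.0, 50_000
steps = rng.choice([-1.0, 1.0], size=(trials, n))
max_abs_partial = np.abs(np.cumsum(steps, axis=1)).max(axis=1)
empirical = (max_abs_partial >= a).mean()
print(f"P(max_k |S_k| >= {a}) ≈ {empirical:.4f}   bound n/a^2 = {n / a**2:.4f}")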

Proof: We introduce the notation $S_k = X_1 + \dots + X_k$. This theorem is concerned with the probability that $|S_k| > a$ for some $k$. We define the random variable $N(\omega)$ to be the smallest integer $k$ for which $|S_k(\omega)| > a$; if there is no such number we set $N(\omega) = n$. We observe the equivalence of events,
\[
\Big\{\omega : \max_{1 \leq k \leq n} |S_k(\omega)| > a\Big\} = \big\{\omega : |S_{N(\omega)}(\omega)| > a\big\},
\]
and from the Markov inequality,
\[
P\Big(\max_{1 \leq k \leq n} |S_k| > a\Big) \leq \frac{1}{a^2}\, E[S_N^2].
\]
We need to estimate the right-hand side. If we could replace $E[S_N^2]$ by $E[S_n^2] = \mathrm{Var}[S_n] = \sum_{i=1}^{n} \sigma_i^2$, then we would be done. The trick is to show that $E[S_N^2] \leq E[S_n^2]$ by using conditional expectations. If
\[
E[S_N^2 \mid N = k] \leq E[S_n^2 \mid N = k] \qquad \text{for all } 1 \leq k \leq n,
\]
then the inequality holds, since we then have an inequality between random variables, $E[S_N^2 \mid N] \leq E[S_n^2 \mid N]$, and applying expectations on both sides gives the desired result. For $k = n$ the identity holds trivially, $E[S_N^2 \mid N = n] = E[S_n^2 \mid N = n]$. Otherwise, we write
\[
E[S_n^2 \mid N = k] = E[S_k^2 \mid N = k] + E[(X_{k+1} + \dots + X_n)^2 \mid N = k] + 2\, E[S_k (X_{k+1} + \dots + X_n) \mid N = k].
\]
The first term on the right-hand side equals $E[S_N^2 \mid N = k]$, whereas the second term is non-negative. There remains the third term, for which we remark that $X_{k+1} + \dots + X_n$ is independent of both $S_k$ and $N$, and by the previous lemma,
\[
E[S_k (X_{k+1} + \dots + X_n) \mid N = k] = E[S_k \mid N = k]\, E[X_{k+1} + \dots + X_n] = 0.
\]
Putting it all together, $E[S_n^2 \mid N = k] \geq E[S_N^2 \mid N = k]$. Since this holds for all $k$'s, we have thus shown that
\[
E[S_N^2] \leq E[S_n^2] = \sum_{i=1}^{n} \sigma_i^2,
\]
which completes the proof.
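The key step of the proof, $E[S_N^2] \leq E[S_n^2]$, can also be checked by simulation; a sketch using the same $\pm 1$ walk as above (the values of $n$, $a$ and the trial count are arbitrary, and not part of the notes):

import numpy as np

# Check of the key step: E[S_N^2] <= E[S_n^2] = n, where N is the first index k
# with |S_k| > a (and N = n if the threshold is never crossed).
rng = np.random.default_rng(5)
n, a, trials = 100, 15.0, 50_000
S = np.cumsum(rng.choice([-1.0, 1.0], size=(trials, n)), axis=1)
exceeded = np.abs(S) > a
# first (0-based) index where |S_k| > a, or n-1 (i.e. N = n) if never crossed
N = np.where(exceeded.any(axis=1), exceeded.argmax(axis=1), n - 1)
S_N = S[np.arange(trials), N]
print(f"E[S_N^2] ≈ {np.mean(S_N**2):.1f}  <=  E[S_n^2] ≈ {np.mean(S[:, -1]**2):.1f}")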