MASSACHUSETTS INSTITUTE OF TECHNOLOGY
6.436J/15.085J                                                     Fall 2008
Lecture 9                                                          /7/2008

LAWS OF LARGE NUMBERS II

Contents

1. The strong law of large numbers
2. The Chernoff bound

1  THE STRONG LAW OF LARGE NUMBERS

While the weak law of large numbers establishes convergence of the sample
mean, in probability, the strong law establishes almost sure convergence.
Before we proceed, we point out two common methods for proving almost sure
convergence.

Proposition 1: Let {X_n} be a sequence of random variables, not necessarily
independent.

(i) If \sum_{n=1}^\infty E[|X_n|^s] < \infty, for some s > 0, then
X_n \to 0, a.s.

(ii) If \sum_{n=1}^\infty P(|X_n| > \epsilon) < \infty, for every
\epsilon > 0, then X_n \to 0, a.s.

Proof. (i) By the monotone convergence theorem, we obtain

    E[\sum_{n=1}^\infty |X_n|^s] = \sum_{n=1}^\infty E[|X_n|^s] < \infty,

which implies that the random variable \sum_{n=1}^\infty |X_n|^s is finite,
with probability 1. Therefore, |X_n|^s \to 0, a.s., which also implies that
X_n \to 0, a.s.

(ii) Setting \epsilon = 1/k, for any positive integer k, the Borel-Cantelli
Lemma shows that the event {|X_n| > 1/k} occurs only a finite number of
times, with probability 1. Thus, P(lim sup_n |X_n| > 1/k) = 0, for every
positive integer k. Note that the sequence of events
{lim sup_n |X_n| > 1/k} is monotone and converges to the event
{lim sup_n |X_n| > 0}. The continuity of probability measures implies that
P(lim sup_n |X_n| > 0) = 0. This establishes that X_n \to 0, a.s.
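The Borel-Cantelli criterion in Proposition 1(ii) can be illustrated numerically. The sketch below (not part of the original notes; the choice X_n ~ Bernoulli(1/n^2) is an assumption made purely for illustration) checks that the tail of \sum_n 1/n^2 is tiny and that, accordingly, simulated sample paths exhibit almost no exceedances beyond a moderate index:

```python
# Numerical sketch of Proposition 1(ii): take X_n ~ Bernoulli(1/n^2), so that
# sum_n P(|X_n| > eps) <= sum_n 1/n^2 < infinity for every eps in (0, 1).
# Borel-Cantelli then says each sample path has only finitely many n with
# X_n = 1, i.e. X_n -> 0 almost surely.
import numpy as np

# The summability hypothesis: the tail of sum_n 1/n^2 beyond n = 100 is tiny.
n = np.arange(1, 100_001)
tail_sum = float(np.sum(1.0 / n[100:] ** 2))
print(f"sum over n > 100 of 1/n^2 ~ {tail_sum:.5f}")   # about 0.01

# Simulate 200 independent sample paths of X_n ~ Bernoulli(1/n^2) and count
# how many times X_n = 1 with n > 100; Borel-Cantelli predicts very few.
rng = np.random.default_rng(0)
m = np.arange(1, 20_001)
paths = rng.random((200, m.size)) < 1.0 / m**2   # X_n = 1 with prob 1/n^2
late = int(paths[:, 100:].sum())                 # exceedances with n > 100
print("late exceedances across 200 paths:", late)
```

The expected number of late exceedances across all 200 paths is about 200 times the tail sum, i.e. roughly 2, even though each path has 20,000 coordinates.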
Theorem 1: Let X, X_1, X_2, ... be i.i.d. random variables, and assume that
E[|X|] < \infty. Let S_n = X_1 + ... + X_n. Then, S_n/n converges almost
surely to E[X].

Proof, assuming finite fourth moments. Let us make the additional
assumption that E[X^4] < \infty. Note that this implies E[|X|] < \infty.
Indeed, using the inequality |x| \le 1 + x^4, we have

    E[|X|] \le 1 + E[X^4] < \infty.

Let us assume first that E[X] = 0. We will show that

    \sum_{n=1}^\infty E[(X_1 + ... + X_n)^4 / n^4] < \infty.

We have

    E[(X_1 + ... + X_n)^4 / n^4]
        = (1/n^4) \sum_{i_1=1}^n \sum_{i_2=1}^n \sum_{i_3=1}^n
          \sum_{i_4=1}^n E[X_{i_1} X_{i_2} X_{i_3} X_{i_4}].

Let us consider the various terms in this sum. If one of the indices is
different from all of the other indices, the corresponding term is equal to
zero. For example, if i_1 is different from i_2, i_3, and i_4, the
assumption E[X_{i_1}] = 0 yields

    E[X_{i_1} X_{i_2} X_{i_3} X_{i_4}]
        = E[X_{i_1}] E[X_{i_2} X_{i_3} X_{i_4}] = 0.

Therefore, the nonzero terms in the above sum are either of the form
E[X_i^4] (there are n such terms), or of the form E[X_i^2 X_j^2], with
i \ne j. Let us count the number of terms of the second type. Such terms
are obtained in three different ways: by setting i_1 = i_2 \ne i_3 = i_4,
or by setting i_1 = i_3 \ne i_2 = i_4, or by setting
i_1 = i_4 \ne i_2 = i_3. For each one of these three ways, we have n
choices for the first pair of indices, and n - 1 choices for the second
pair. We conclude that there are 3n(n-1) terms of this type. Thus,

    E[(X_1 + ... + X_n)^4 / n^4]
        = (n E[X^4] + 3n(n-1) E[X_1^2 X_2^2]) / n^4.

Using the inequality xy \le (x^2 + y^2)/2, we obtain
E[X_1^2 X_2^2] \le E[X^4], and

    E[(X_1 + ... + X_n)^4 / n^4]
        \le (n + 3n(n-1)) E[X^4] / n^4
        \le 3n^2 E[X^4] / n^4
        = 3 E[X^4] / n^2.
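The counting identity above can be sanity-checked against a case where the fourth moment of S_n is known in closed form. The check below (not from the notes) uses a standard normal X, for which E[X^4] = 3 and E[X^2] = 1, so the identity predicts E[S_n^4] = 3n + 3n(n-1) = 3n^2; this matches the known fourth moment of S_n ~ N(0, n):

```python
# Verify: for i.i.d. mean-zero X_i,
#   E[S_n^4] = n E[X^4] + 3 n (n-1) (E[X^2])^2,
# against the exact Gaussian value E[S_n^4] = 3 n^2 when X ~ N(0, 1).

def fourth_moment_formula(n, ex2, ex4):
    """n E[X^4] + 3 n (n-1) (E[X^2])^2, from the term-counting argument."""
    return n * ex4 + 3 * n * (n - 1) * ex2**2

for n in (1, 2, 5, 50):
    # S_n ~ N(0, n) has fourth moment 3 * var^2 = 3 n^2.
    assert fourth_moment_formula(n, ex2=1.0, ex4=3.0) == 3 * n**2
print("counting identity matches the Gaussian fourth moment")
```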
It follows that

    E[\sum_{n=1}^\infty (X_1 + ... + X_n)^4 / n^4]
        = \sum_{n=1}^\infty E[(X_1 + ... + X_n)^4] / n^4
        \le 3 E[X^4] \sum_{n=1}^\infty 1/n^2 < \infty,

where the last step uses the well known property
\sum_{n=1}^\infty 1/n^2 < \infty. This implies that
(X_1 + ... + X_n)^4 / n^4 converges to zero with probability 1, and
therefore, (X_1 + ... + X_n)/n also converges to zero with probability 1,
which is the strong law of large numbers. For the more general case where
the mean of the random variables X_i is nonzero, the preceding argument
establishes that (X_1 + ... + X_n - n E[X])/n converges to zero, which is
the same as (X_1 + ... + X_n)/n converging to E[X], with probability 1.

Proof, assuming finite second moments. We now consider the case where we
only assume that E[X^2] < \infty. Let \mu = E[X]. We have

    E[(S_n/n - \mu)^2] = var(X)/n.

If we only consider values of n that are perfect squares, we obtain

    \sum_{i=1}^\infty E[(S_{i^2}/i^2 - \mu)^2]
        = \sum_{i=1}^\infty var(X)/i^2 < \infty,

which implies that (S_{i^2}/i^2 - E[X])^2 converges to zero, with
probability 1. Therefore, S_{i^2}/i^2 converges to E[X], with
probability 1.

Suppose that the random variables X_i are nonnegative. Consider some n such
that i^2 \le n < (i+1)^2. We then have S_{i^2} \le S_n \le S_{(i+1)^2}. It
follows that

    S_{i^2}/(i+1)^2 \le S_n/n \le S_{(i+1)^2}/i^2,

or

    (i^2/(i+1)^2) (S_{i^2}/i^2)
        \le S_n/n
        \le ((i+1)^2/i^2) (S_{(i+1)^2}/(i+1)^2).

As n \to \infty, we also have i \to \infty. Since i/(i+1) \to 1, and since
S_{i^2}/i^2 converges to E[X], with probability 1, we see that for almost
all sample points, S_n/n is sandwiched between two sequences that converge
to E[X]. This proves that S_n/n \to E[X], with probability 1.

For a general random variable X, we write it in the form X = X^+ - X^-,
where X^+ and X^- are nonnegative. The strong law applied to X^+ and X^-
separately, implies the strong law for X as well.
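The sandwich step above is a pathwise (deterministic) inequality for nonnegative summands, so it can be verified directly on a simulated sample path. The sketch below (not in the notes; the exponential distribution is an arbitrary nonnegative example) checks it for every n in the simulated range:

```python
# Pathwise check of the sandwich step: for nonnegative X_i and
# i^2 <= n < (i+1)^2, monotonicity of the partial sums S_k gives
#   S_{i^2}/(i+1)^2  <=  S_n/n  <=  S_{(i+1)^2}/i^2.
import math
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=10_000)   # nonnegative i.i.d. samples
s = np.cumsum(x)                              # s[k-1] = S_k

for n in range(1, 9_000):                     # keep (i+1)^2 within the sample
    i = math.isqrt(n)                         # i^2 <= n < (i+1)^2
    lo = s[i * i - 1] / (i + 1) ** 2
    hi = s[(i + 1) ** 2 - 1] / i**2
    assert lo <= s[n - 1] / n <= hi
print("sandwich inequality holds along the whole sample path")
```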
The proof for the most general case (finite mean, but possibly infinite
variance) is omitted. It involves truncating the distribution of X, so that
its moments are all finite, and then verifying that the errors due to such
truncation are not significant in the limit.

2  The Chernoff bound

Let again X, X_1, ... be i.i.d., and S_n = X_1 + ... + X_n. Let us assume,
for simplicity, that E[X] = 0. According to the weak law of large numbers,
we know that P(S_n \ge na) \to 0, for every a > 0. We are interested in a
more detailed estimate of P(S_n \ge na), involving the rate at which this
probability converges to zero. It turns out that if the moment generating
function of X is finite on some interval [0, c] (where c > 0), then
P(S_n \ge na) decays exponentially with n, and much is known about the
precise rate of exponential decay.

2.1  Upper bound

Let M(s) = E[e^{sX}], and assume that M(s) < \infty, for s \in [0, c],
where c > 0. Recall that

    M_{S_n}(s) = E[e^{s(X_1 + ... + X_n)}] = (M(s))^n.

For any s > 0, the Markov inequality yields

    P(S_n \ge na) = P(e^{s S_n} \ge e^{sna})
        \le e^{-sna} E[e^{s S_n}]
        = e^{-sna} (M(s))^n.

Every nonnegative value of s gives us a particular bound on
P(S_n \ge na). To obtain the tightest possible bound, we minimize over s,
and obtain the following result.

Theorem 2 (Chernoff upper bound): Suppose that E[e^{sX}] < \infty for some
s > 0, and that a > 0. Then,

    P(S_n \ge na) \le e^{-n \phi(a)},

where

    \phi(a) = \sup_{s \ge 0} (sa - \log M(s)).

For s = 0, we have sa - \log M(s) = 0 - \log 1 = 0, where we have used the
generic property M(0) = 1. Furthermore,

    (d/ds)(sa - \log M(s)) |_{s=0}
        = a - (1/M(0)) (dM/ds)(0) = a - E[X] > 0.
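The optimization defining \phi(a) can be carried out numerically by a simple grid search over s. The sketch below (not from the notes; the centered exponential distribution and the function name phi_grid are illustrative assumptions) uses X = Z - 1 with Z ~ Exp(1), for which M(s) = e^{-s}/(1 - s) when s < 1, and the supremum has the closed form \phi(a) = a - \log(1 + a), attained at s^* = a/(1 + a):

```python
# Grid-search computation of phi(a) = sup_{s >= 0} (s a - log M(s)) for the
# centered exponential X = Z - 1, Z ~ Exp(1), where M(s) = e^{-s}/(1 - s)
# for s < 1.  The exact answer is phi(a) = a - log(1 + a).
import numpy as np

def phi_grid(a, num=200_000):
    s = np.linspace(0.0, 0.999999, num)   # M(s) is finite only for s < 1
    log_m = -s - np.log(1.0 - s)          # log M(s) = -s - log(1 - s)
    return float(np.max(s * a - log_m))

for a in (0.5, 1.0, 2.0):
    exact = a - np.log(1.0 + a)
    approx = phi_grid(a)
    print(f"a = {a}: grid {approx:.6f}, closed form {exact:.6f}")
    assert abs(approx - exact) < 1e-4
```

A grid search is crude but adequate here because sa - \log M(s) is concave in s, so the maximizer is unique and interior for these values of a.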
Since the function sa - \log M(s) is zero and has a positive derivative at
s = 0, it must be positive when s is positive and small. It follows that
the supremum \phi(a) of the function sa - \log M(s) over all s \ge 0 is
also positive. In particular, for any fixed a > 0, the probability
P(S_n \ge na) decays at least exponentially fast with n.

Example: For a standard normal random variable X, we have
M(s) = e^{s^2/2}. Therefore, sa - \log M(s) = sa - s^2/2. To maximize this
expression over all s \ge 0, we form the derivative, which is a - s, and
set it to zero, resulting in s = a. Thus, \phi(a) = a^2/2, which leads to
the bound

    P(X \ge a) \le e^{-a^2/2}.

2.2  Lower bound

Remarkably, it turns out that the estimate \phi(a) of the decay rate is
tight, under minimal assumptions. To keep the argument simple, we introduce
some simplifying assumptions.

Assumption 1:
(i) M(s) = E[e^{sX}] < \infty, for all s \in R.
(ii) The random variable X is continuous, with PDF f_X.
(iii) The random variable X does not admit finite upper and lower bounds.
(Formally, 0 < F_X(x) < 1, for all x \in R.)

We then have the following lower bound.

Theorem 3 (Chernoff lower bound): Under Assumption 1, we have

    lim_{n \to \infty} (1/n) \log P(S_n \ge na) = -\phi(a),           (1)

for every a > 0.

We note two consequences of our assumptions, whose proof is left as an
exercise:

(a) lim_{s \to \infty} \log M(s) / s = \infty;
(b) M(s) is differentiable at every s.

The first property guarantees that for any a > 0 we have
lim_{s \to \infty} (\log M(s) - sa) = \infty. Since M(s) > 0 for all s,
and since M(s) is differentiable, it follows that \log M(s) is also
differentiable and that there exists some s^* \ge 0 at which
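The Gaussian bound in the example can be checked against the exact standard normal tail, which is expressible through the complementary error function. This check (not in the notes) confirms the inequality P(X \ge a) \le e^{-a^2/2} numerically for a few values of a:

```python
# Check of the Gaussian Chernoff bound P(X >= a) <= e^{-a^2/2}:
# the exact standard normal tail is P(X >= a) = 0.5 * erfc(a / sqrt(2)).
import math

for a in (0.5, 1.0, 2.0, 3.0):
    tail = 0.5 * math.erfc(a / math.sqrt(2.0))   # exact N(0,1) tail
    bound = math.exp(-a * a / 2.0)               # Chernoff bound e^{-a^2/2}
    print(f"a = {a}: tail {tail:.6g} <= bound {bound:.6g}")
    assert tail <= bound
```

The bound is loose by a polynomial factor of order 1/a (the Gaussian tail behaves like e^{-a^2/2}/(a sqrt(2 pi))), which is invisible on the exponential scale that the Chernoff bound captures.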
\log M(s) - sa is minimized over all s \ge 0. Taking derivatives, we see
that such a s^* satisfies

    a = M'(s^*)/M(s^*),

where M' stands for the derivative of M. In particular,

    \phi(a) = s^* a - \log M(s^*).                                    (2)

Let us introduce a new PDF

    f_Y(x) = e^{s^* x} f_X(x) / M(s^*).

This is a legitimate PDF because

    \int f_Y(x) dx = (1/M(s^*)) \int e^{s^* x} f_X(x) dx
        = M(s^*)/M(s^*) = 1.

The moment generating function associated with the new PDF is

    M_Y(s) = (1/M(s^*)) \int e^{sx} e^{s^* x} f_X(x) dx
        = M(s + s^*)/M(s^*).

Thus,

    E[Y] = (d/ds) (M(s + s^*)/M(s^*)) |_{s=0}
        = M'(s^*)/M(s^*) = a,

where the last equality follows from our definition of s^*. The
distribution of Y is called a tilted version of the distribution of X.

Let Y_1, ..., Y_n be i.i.d. random variables with PDF f_Y. Because of the
close relation between f_X and f_Y, approximate probabilities of events
involving Y_1, ..., Y_n can be used to obtain approximate probabilities of
events involving X_1, ..., X_n.

We keep assuming that a > 0, and fix some \delta > 0. Let

    B = {(x_1, ..., x_n) \in R^n :
         a - \delta \le (1/n) \sum_{i=1}^n x_i \le a + \delta}.

Let S_n = X_1 + ... + X_n and T_n = Y_1 + ... + Y_n. We have

    P(S_n \ge n(a - \delta))
        \ge P(n(a - \delta) \le S_n \le n(a + \delta))
        = \int_B f_X(x_1) ... f_X(x_n) dx_1 ... dx_n
        = (M(s^*))^n \int_B e^{-s^* x_1} f_Y(x_1) ...
            e^{-s^* x_n} f_Y(x_n) dx_1 ... dx_n
        \ge (M(s^*))^n e^{-n s^*(a + \delta)}
            \int_B f_Y(x_1) ... f_Y(x_n) dx_1 ... dx_n
        = (M(s^*))^n e^{-n s^*(a + \delta)} P(T_n \in B).             (3)
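For a standard normal X the tilted density can be written down explicitly: M(s) = e^{s^2/2}, so M'(s)/M(s) = s and the optimality condition gives s^* = a, and f_Y works out to the N(s^*, 1) density. The sketch below (not in the notes; the grid width and the value a = 1.5 are illustrative assumptions) verifies by numerical integration that f_Y integrates to 1 and has mean a:

```python
# Tilted density for X ~ N(0,1): f_Y(x) = e^{s* x} f_X(x) / M(s*) with
# M(s) = e^{s^2/2}.  Completing the square shows f_Y is the N(s*, 1)
# density, so the tilted mean equals s* = a.
import numpy as np

a = 1.5
s_star = a                                  # for N(0,1): M'(s)/M(s) = s
x = np.linspace(-10.0, 13.0, 400_001)       # wide grid around both means
dx = x[1] - x[0]
f_x = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)
m_star = np.exp(s_star**2 / 2.0)            # M(s*) = e^{s*^2 / 2}
f_y = np.exp(s_star * x) * f_x / m_star     # tilted density

mass = float(np.sum(f_y) * dx)              # should integrate to 1
mean = float(np.sum(x * f_y) * dx)          # should equal a
print(f"tilted mass {mass:.6f}, tilted mean {mean:.6f}")
assert abs(mass - 1.0) < 1e-6 and abs(mean - a) < 1e-6
```

This makes the role of the tilt concrete: under f_Y, the event {S_n close to na} is typical rather than rare, which is exactly what the change-of-measure step in (3) exploits.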
The second inequality above was obtained because for every
(x_1, ..., x_n) \in B, we have x_1 + ... + x_n \le n(a + \delta), so that

    e^{-s^* x_1} ... e^{-s^* x_n} \ge e^{-n s^*(a + \delta)}.

By the weak law of large numbers, we have

    P(T_n \in B) = P((Y_1 + ... + Y_n)/n \in [a - \delta, a + \delta])
        \to 1,

as n \to \infty. Taking logarithms, dividing by n, then taking the limit
of the two sides of Eq. (3), and finally using Eq. (2), we obtain

    lim inf_{n \to \infty} (1/n) \log P(S_n \ge n(a - \delta))
        \ge \log M(s^*) - s^* a - s^* \delta = -\phi(a) - s^* \delta.

This inequality is true for every \delta > 0; since \delta is arbitrary
and \phi is continuous, letting \delta \to 0 establishes the lower bound
in Eq. (1).
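For the standard normal case the convergence in Eq. (1) can be observed directly, since the tail of S_n ~ N(0, n) is known exactly. The check below (not in the notes) computes (1/n) log P(S_n \ge na) for increasing n and compares it with -\phi(a) = -a^2/2:

```python
# Numerical check that the rate in Eq. (1) is attained for standard normal X:
# P(S_n >= na) = 0.5 * erfc(a sqrt(n) / sqrt(2)) exactly, since S_n ~ N(0, n),
# and (1/n) log P(S_n >= na) should approach -phi(a) = -a^2/2.
import math

a = 1.0
for n in (10, 100, 1000):
    tail = 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2.0))
    rate = math.log(tail) / n
    print(f"n = {n:4d}: (1/n) log P(S_n >= na) = {rate:.4f}")
print("limit -phi(a) =", -a * a / 2.0)
```

The computed rates approach -0.5 from below as n grows; the residual gap of order (log n)/n reflects the polynomial prefactor in the Gaussian tail, which vanishes on the exponential scale.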
MIT OpenCourseWare
http://ocw.mit.edu

6.436J / 15.085J Fundamentals of Probability
Fall 2008

For information about citing these materials or our Terms of Use, visit:
http://ocw.mit.edu/terms.