Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3 Strong Convergence

As pointed out in Chapter 2, there are multiple ways to define the notion of convergence of a sequence of random variables. That chapter defined convergence in probability, convergence in distribution, and convergence in $k$th mean. We now consider a fourth mode of convergence, almost sure convergence or convergence with probability one, which will be shown to imply both convergence in probability and convergence in distribution. It is for this reason that we attach the term "strong" to almost sure convergence and "weak" to the other two; these terms are not meant to indicate anything about their usefulness. In fact, the "weak" modes of convergence are used much more frequently in asymptotic statistics than the strong mode, and thus a reader new to the subject may wish to skip this chapter; most of the rest of the book may be understood without a grasp of strong convergence.

3.1 Definition of almost sure convergence

This form of convergence is in some sense the simplest to understand, since it depends only on the concept of limit of a sequence of real numbers, Definition 1.1. Since a random variable like $X_n$ or $X$ is a function on a sample space (say, $\Omega$), if we fix a particular element of that space (say, $\omega$), we obtain the real numbers $X_n(\omega)$ and $X(\omega)$. We may then ask, for each $\omega \in \Omega$, whether $X_n(\omega)$ converges to $X(\omega)$ as a sequence of real numbers.

Definition 3.1 Suppose $X$ and $X_1, X_2, \ldots$ are random variables defined on the same sample space $\Omega$ (and as usual $P$ denotes the associated probability measure). If
\[ P(\{\omega \in \Omega : X_n(\omega) \to X(\omega)\}) = 1, \]
then $X_n$ is said to converge almost surely (or with probability one) to $X$, denoted $X_n \stackrel{\mathrm{a.s.}}{\to} X$ or $X_n \to X$ a.s. or $X_n \to X$ w.p. 1.

In other words, convergence with probability one means exactly what it sounds like: The probability that $X_n$ converges to $X$ equals one. Later, in Theorem 3.3, we will formulate an equivalent definition of almost sure convergence that makes it much easier to see why it is such a strong form of convergence of random variables. Yet the intuitive simplicity of Definition 3.1 makes it the standard definition.

As in the case of convergence in probability, we may replace the limiting random variable $X$ by any constant $c$, in which case we write $X_n \stackrel{\mathrm{a.s.}}{\to} c$. In the most common statistical usage of convergence to a constant, the random variable $X_n$ is some estimator of a particular parameter, say $g(\theta)$:

Definition 3.2 If $X_n \stackrel{\mathrm{a.s.}}{\to} g(\theta)$, $X_n$ is said to be strongly consistent for $g(\theta)$.

As the names suggest, strong consistency implies consistency, a fact to be explored in more depth below.

3.1.1 Strong Consistency versus Consistency

As before, suppose that $X$ and $X_1, X_2, \ldots$ are random variables defined on the same sample space, $\Omega$. For given $n$ and $\epsilon > 0$, define the events
\[ A_n = \{\omega \in \Omega : |X_k(\omega) - X(\omega)| < \epsilon \text{ for all } k \ge n\} \tag{3.1} \]
and
\[ B_n = \{\omega \in \Omega : |X_n(\omega) - X(\omega)| < \epsilon\}. \tag{3.2} \]
Evidently, both $A_n$ and $B_n$ occur with high probability when $X_n$ is in some sense close to $X$, so it might be reasonable to say that $X_n$ converges to $X$ if $P(A_n) \to 1$ or $P(B_n) \to 1$ for any $\epsilon > 0$. In fact, Definition 2.1 is nothing more than the latter statement; that is, $X_n \stackrel{P}{\to} X$ if and only if $P(B_n) \to 1$ for any $\epsilon > 0$.

Yet what about the sets $A_n$? One fact is immediate: Since $A_n \subset B_n$, we must have $P(A_n) \le P(B_n)$. Therefore, $P(A_n) \to 1$ implies $P(B_n) \to 1$. By now, the reader might already have guessed that $P(A_n) \to 1$ for all $\epsilon$ is equivalent to $X_n \stackrel{\mathrm{a.s.}}{\to} X$:

Theorem 3.3 With $A_n$ defined as in Equation (3.1), $P(A_n) \to 1$ for any $\epsilon > 0$ if and only if $X_n \stackrel{\mathrm{a.s.}}{\to} X$.

The proof of Theorem 3.3 is the subject of Exercise 3.1. The following corollary now follows from the preceding discussion:

Corollary 3.4 If $X_n \stackrel{\mathrm{a.s.}}{\to} X$, then $X_n \stackrel{P}{\to} X$.

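The converse of Corollary 3.4 is not true, as the following example illustrates.

Example 3.5 Take $\Omega$ to be the half-open interval $(0, 1]$, and for any interval $J \subset \Omega$, say $J = (a, b]$, take $P(J) = b - a$ to be the length of that interval. Define a sequence of intervals $J_1, J_2, \ldots$ as follows:
\[
\begin{array}{rcl}
J_1 & = & (0, 1] \\
J_2 \text{ through } J_4 & = & \left(0, \frac13\right], \left(\frac13, \frac23\right], \left(\frac23, 1\right] \\
J_5 \text{ through } J_9 & = & \left(0, \frac15\right], \left(\frac15, \frac25\right], \left(\frac25, \frac35\right], \left(\frac35, \frac45\right], \left(\frac45, 1\right] \\
 & \vdots & \\
J_{m^2+1} \text{ through } J_{(m+1)^2} & = & \left(0, \frac{1}{2m+1}\right], \ldots, \left(\frac{2m}{2m+1}, 1\right]
\end{array}
\]
Note in particular that $P(J_n) = 1/(2m+1)$, where $m = \lfloor \sqrt{n-1} \rfloor$ is the largest integer not greater than $\sqrt{n-1}$.

Now, define $X_n = I\{J_n\}$ and take $0 < \epsilon < 1$. Then $P(|X_n - 0| < \epsilon)$ is the same as $1 - P(J_n)$. Since $P(J_n) \to 0$, we conclude $X_n \stackrel{P}{\to} 0$ by definition.

However, it is not true that $X_n \stackrel{\mathrm{a.s.}}{\to} 0$. Since every $\omega \in \Omega$ is contained in infinitely many $J_n$, the set $A_n$ defined in Equation (3.1) is empty for all $n$. Alternatively, consider the set $S = \{\omega : X_n(\omega) \to 0\}$. For any $\omega$, $X_n(\omega)$ has no limit because $X_n(\omega) = 1$ and $X_n(\omega) = 0$ both occur for infinitely many $n$. Thus $S$ is empty. This is not convergence with probability one; it is convergence with probability zero!

3.1.2 Multivariate Extensions

We may extend Definition 3.1 to the multivariate case in a completely straightforward way:

Definition 3.6 $\mathbf{X}_n$ is said to converge almost surely (or with probability one) to $\mathbf{X}$ ($\mathbf{X}_n \stackrel{\mathrm{a.s.}}{\to} \mathbf{X}$) if
\[ P(\mathbf{X}_n \to \mathbf{X} \text{ as } n \to \infty) = 1. \]
Alternatively, since the proof of Theorem 3.3 applies to random vectors as well as random variables, we say $\mathbf{X}_n \stackrel{\mathrm{a.s.}}{\to} \mathbf{X}$ if for any $\epsilon > 0$,
\[ P(\|\mathbf{X}_k - \mathbf{X}\| < \epsilon \text{ for all } k \ge n) \to 1 \text{ as } n \to \infty. \]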
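Returning to Example 3.5, the following Python sketch may help make the behavior of the sliding intervals concrete. It is only an illustration under stated assumptions: the helper names `interval` and `x_n` and the choice $\omega = 3/10$ are ours, not part of the text. It checks, over the first 100 blocks, that $P(J_n)$ shrinks toward zero while $X_n(\omega)$ still equals 1 exactly once per block.

```python
from fractions import Fraction
from math import isqrt


def interval(n):
    """Return (a, b) for J_n = (a, b] in Example 3.5, n >= 1 (hypothetical helper)."""
    m = isqrt(n - 1)                # largest integer not greater than sqrt(n - 1)
    width = Fraction(1, 2 * m + 1)  # P(J_n) = 1 / (2m + 1)
    j = n - m * m - 1               # position of J_n inside its block: 0, ..., 2m
    return j * width, (j + 1) * width


def x_n(n, omega):
    """Indicator X_n(omega) = I{omega in J_n} (hypothetical helper)."""
    a, b = interval(n)
    return 1 if a < omega <= b else 0


omega = Fraction(3, 10)  # an arbitrary sample point, chosen for illustration
values = [x_n(n, omega) for n in range(1, 10_001)]

# Interval lengths P(J_n) for a few n: they decrease toward zero.
lengths = [interval(n)[1] - interval(n)[0] for n in (1, 4, 9, 10_000)]

# Number of indices n in block m (n = m^2 + 1, ..., (m + 1)^2) with X_n(omega) = 1.
hits_per_block = [sum(values[m * m:(m + 1) * (m + 1)]) for m in range(100)]
```

Because each block partitions $(0, 1]$, the hit count is 1 in every block examined, which is consistent with the claim that $X_n(\omega) = 1$ for infinitely many $n$ even though $P(J_n) \to 0$.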

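We saw in Theorems 2.24 and 2.30 that continuous functions preserve both convergence in probability and convergence in distribution. Yet these facts were quite difficult to prove. Fortunately, the analogous result for convergence with probability one is quite easy to prove. In fact, since almost sure convergence is defined in terms of convergence of sequences of real (not random) vectors, the following theorem may be proven using the same method of proof used for Theorem 1.16.

Theorem 3.7 Suppose that $f : S \to \mathbb{R}^\ell$ is a continuous function defined on some subset $S \subset \mathbb{R}^k$, $\mathbf{X}_n$ is a $k$-component random vector, and $P(\mathbf{X} \in S) = 1$. If $\mathbf{X}_n \stackrel{\mathrm{a.s.}}{\to} \mathbf{X}$, then $f(\mathbf{X}_n) \stackrel{\mathrm{a.s.}}{\to} f(\mathbf{X})$.

We conclude this section with a simple diagram summarizing the implications among the modes of convergence defined so far. In the diagram, a double arrow like $\Rightarrow$ means "implies". Note that the picture changes slightly when convergence is to a constant $\mathbf{c}$ rather than a random vector $\mathbf{X}$.
\[
\begin{array}{ccccc}
 & & \mathbf{X}_n \stackrel{qm}{\to} \mathbf{X} & & \\
 & & \Downarrow & & \\
\mathbf{X}_n \stackrel{\mathrm{a.s.}}{\to} \mathbf{X} & \Rightarrow & \mathbf{X}_n \stackrel{P}{\to} \mathbf{X} & \Rightarrow & \mathbf{X}_n \stackrel{d}{\to} \mathbf{X}
\end{array}
\]
\[
\begin{array}{ccccc}
 & & \mathbf{X}_n \stackrel{qm}{\to} \mathbf{c} & & \\
 & & \Downarrow & & \\
\mathbf{X}_n \stackrel{\mathrm{a.s.}}{\to} \mathbf{c} & \Rightarrow & \mathbf{X}_n \stackrel{P}{\to} \mathbf{c} & \Leftrightarrow & \mathbf{X}_n \stackrel{d}{\to} \mathbf{c}
\end{array}
\]

Exercises for Section 3.1

Exercise 3.1 Prove Theorem 3.3.
Hint: Note that the sets $A_n$ are increasing in $n$, so that by the lower continuity of any probability measure (which you may assume without proof), $\lim_n P(A_n)$ exists and is equal to $P\left(\bigcup_{n=1}^\infty A_n\right)$.

Exercise 3.2 Prove Theorem 3.7.

3.2 The Strong Law of Large Numbers

Some of the results in this section are presented for univariate random variables and some are presented for random vectors. Take note of the use of bold print to denote vectors.

Theorem 3.8 Strong Law of Large Numbers: Suppose that $\mathbf{X}_1, \mathbf{X}_2, \ldots$ are independent and identically distributed and have finite mean $\boldsymbol{\mu}$. Then $\bar{\mathbf{X}}_n \stackrel{\mathrm{a.s.}}{\to} \boldsymbol{\mu}$.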
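As an informal illustration of Theorem 3.8 (not a proof), the sketch below follows the running sample mean along a single simulated path. The choices here are assumptions made only for the example: Uniform(0, 1) draws with mean 0.5, a fixed seed, 100,000 draws, and the helper name `running_means`.

```python
import random


def running_means(draws):
    """Return the sequence of sample means (X_1 + ... + X_n) / n for n = 1, 2, ..."""
    total = 0.0
    means = []
    for n, x in enumerate(draws, start=1):
        total += x
        means.append(total / n)
    return means


rng = random.Random(0)                          # fixed seed so the path is reproducible
draws = [rng.random() for _ in range(100_000)]  # i.i.d. Uniform(0, 1), mean 0.5
means = running_means(draws)

# Largest deviation from the mean over the tail n > 10,000 of this one path.
tail_error = max(abs(m - 0.5) for m in means[10_000:])
```

Looking at the maximum deviation over a whole tail of the path, rather than at a single $n$, mirrors the event "$|X_k - X| < \epsilon$ for all $k \ge n$" that characterizes almost sure convergence in Theorem 3.3. A finite simulation can only suggest this behavior.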

It is possible to use fairly simple arguments to prove a version of the Strong Law under more restrictive assumptions than those given above. See Exercise 3.4 for details of a proof of the univariate Strong Law under the additional assumption that $E\,X_n^4 < \infty$. To aid the proof of the Strong Law in its full generality, we first establish a useful lemma.

Lemma 3.9 If $\sum_{k=1}^\infty P(\|\mathbf{X}_k - \mathbf{X}\| > \epsilon) < \infty$ for any $\epsilon > 0$, then $\mathbf{X}_n \stackrel{\mathrm{a.s.}}{\to} \mathbf{X}$.

Proof: The proof relies on the countable subadditivity of any probability measure, a standard property stating that for any sequence $A_1, A_2, \ldots$ of events,
\[ P\left(\bigcup_{k=1}^\infty A_k\right) \le \sum_{k=1}^\infty P(A_k). \tag{3.3} \]
To prove the lemma, we must demonstrate that $P(\|\mathbf{X}_k - \mathbf{X}\| \le \epsilon \text{ for all } k \ge n) \to 1$ as $n \to \infty$, which (taking complements) is equivalent to $P(\|\mathbf{X}_k - \mathbf{X}\| > \epsilon \text{ for some } k \ge n) \to 0$. Letting $A_k$ denote the event that $\|\mathbf{X}_k - \mathbf{X}\| > \epsilon$, countable subadditivity implies
\[ P(A_k \text{ for some } k \ge n) = P\left(\bigcup_{k=n}^\infty A_k\right) \le \sum_{k=n}^\infty P(A_k), \]
and the right hand side tends to 0 as $n \to \infty$ because it is the tail of a convergent series.

Lemma 3.9 is nearly the same as a famous result called the First Borel-Cantelli Lemma, or sometimes simply the Borel-Cantelli Lemma; see Exercise 3.3. The utility of Lemma 3.9 is illustrated by the following useful result, which allows us to relate almost sure convergence to convergence in probability (see Theorem 2.24, for instance).

Theorem 3.10 $\mathbf{X}_n \stackrel{P}{\to} \mathbf{X}$ if and only if each subsequence $\mathbf{X}_{n_1}, \mathbf{X}_{n_2}, \ldots$ contains a further subsequence that converges almost surely to $\mathbf{X}$.

The proof of Theorem 3.10, which uses Lemma 3.9, is the subject of Exercise 3.7.

3.2.1 Independent but not identically distributed variables

Here, we generalize the univariate version of the Strong Law to a situation in which the $X_n$ are assumed to be independent and satisfy a second moment condition:

Theorem 3.11 Kolmogorov's Strong Law of Large Numbers: Suppose that $X_1, X_2, \ldots$ are independent with mean $\mu$ and
\[ \sum_{i=1}^\infty \frac{\operatorname{Var} X_i}{i^2} < \infty. \]
Then $\bar{X}_n \stackrel{\mathrm{a.s.}}{\to} \mu$.

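Note that there is no reason the $X_i$ in Theorem 3.11 must have the same means: If $E\,X_i = \mu_i$, then the theorem as written implies that $(1/n) \sum_i (X_i - \mu_i) \stackrel{\mathrm{a.s.}}{\to} 0$. Theorem 3.11 may be proved using Kolmogorov's inequality from Exercise 1.31; this proof is the focus of Exercise 3.6. In fact, Theorem 3.11 turns out to be very important because it may be used to prove the Strong Law, Theorem 3.8. The key to completing this proof is to introduce truncated versions of $X_1, X_2, \ldots$ as in the following lemma.

Lemma 3.12 Suppose that $X_1, X_2, \ldots$ are independent and identically distributed and have finite mean $\mu$. Define $X_i^* = X_i I\{|X_i| \le i\}$. Then
\[ \sum_{i=1}^\infty \frac{\operatorname{Var} X_i^*}{i^2} < \infty \tag{3.4} \]
and $\bar{X}_n^* - \bar{X}_n \stackrel{\mathrm{a.s.}}{\to} 0$.

Under the assumptions of Lemma 3.12, we see immediately that
\[ \bar{X}_n = \bar{X}_n^* + (\bar{X}_n - \bar{X}_n^*) \stackrel{\mathrm{a.s.}}{\to} \mu, \]
because Equation (3.4) implies $\bar{X}_n^* \stackrel{\mathrm{a.s.}}{\to} \mu$ by Theorem 3.11 (applied in the unequal-means form noted above, together with the fact that $E\,X_i^* \to \mu$ as $i \to \infty$). This proves the univariate version of Theorem 3.8; the full multivariate version follows because $\bar{\mathbf{X}}_n \stackrel{\mathrm{a.s.}}{\to} \boldsymbol{\mu}$ if and only if $\bar{X}_{nj} \stackrel{\mathrm{a.s.}}{\to} \mu_j$ for all $j$ (Lemma 1.31). A proof of Lemma 3.12 is the subject of Exercise 3.5.

Exercises for Section 3.2

Exercise 3.3 Let $B_1, B_2, \ldots$ denote a sequence of events. Let $B_n$ i.o., which stands for $B_n$ infinitely often, denote the set
\[ B_n \text{ i.o.} \stackrel{\mathrm{def}}{=} \{\omega \in \Omega : \text{for every } n, \text{ there exists } k \ge n \text{ such that } \omega \in B_k\}. \]
Prove the first Borel-Cantelli Lemma, which states that if $\sum_{n=1}^\infty P(B_n) < \infty$, then $P(B_n \text{ i.o.}) = 0$.
Hint: Argue that
\[ B_n \text{ i.o.} = \bigcap_{n=1}^\infty \bigcup_{k=n}^\infty B_k, \]
then adapt the proof of Lemma 3.9.

Exercise 3.4 Use the hint below to prove that if $X_1, X_2, \ldots$ are independent and identically distributed and $E\,X_1^4 < \infty$, then $\bar{X}_n \stackrel{\mathrm{a.s.}}{\to} E\,X_1$. You may assume without loss of generality that $E\,X_1 = 0$.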

Hint: Use Markov's inequality (1.22) with $r = 4$ to put an upper bound on $P(|\bar{X}_n| > \epsilon)$ involving $E(X_1 + \cdots + X_n)^4$. Expand $E(X_1 + \cdots + X_n)^4$ and then count the nonzero terms. Finally, argue that the conditions of Lemma 3.9 are satisfied.

Exercise 3.5 Lemma 3.12 makes two assertions about the random variables $X_i^* = X_i I\{|X_i| \le i\}$:
(a) Prove that
\[ \sum_{i=1}^\infty \frac{\operatorname{Var} X_i^*}{i^2} < \infty. \]
Hint: Use the fact that the $X_i$ are independent and identically distributed, then show that
\[ X_1^2 \sum_{k=1}^\infty \frac{1}{k^2} I\{|X_1| \le k\} \le 2|X_1|, \]
perhaps by bounding the sum on the left by an easy-to-evaluate integral.
(b) Prove that $\bar{X}_n^* - \bar{X}_n \stackrel{\mathrm{a.s.}}{\to} 0$.
Hint: Use Lemma 3.9 and Exercise 1.32 to show that $X_n^* - X_n \stackrel{\mathrm{a.s.}}{\to} 0$. Then use Exercise 1.3.

Exercise 3.6 Prove Theorem 3.11. Use the following steps:
(a) For $k = 1, 2, \ldots$, define
\[ Y_k = \max_{2^{k-1} \le n < 2^k} |\bar{X}_n - \mu|. \]
Use the Kolmogorov inequality from Exercise 1.31 to show that
\[ P(Y_k \ge \epsilon) \le \frac{4 \sum_{i=1}^{2^k} \operatorname{Var} X_i}{4^k \epsilon^2}. \]
(b) Use Lemma 3.9 to show that $Y_k \stackrel{\mathrm{a.s.}}{\to} 0$, then argue that this proves $\bar{X}_n \stackrel{\mathrm{a.s.}}{\to} \mu$.
Hint: Letting $\lceil \log_2 i \rceil$ denote the smallest integer greater than or equal to $\log_2 i$ (the base-2 logarithm of $i$), verify and use the fact that
\[ \sum_{k = \lceil \log_2 i \rceil}^\infty \frac{1}{4^k} \le \frac{4}{3 i^2}. \]

Exercise 3.7 Prove Theorem 3.10.
Hint: To simplify notation, let $\mathbf{Y}_k = \mathbf{X}_{n_k}$ denote an arbitrary subsequence. If $\mathbf{Y}_k \stackrel{P}{\to} \mathbf{X}$, show that there exist $k_1, k_2, \ldots$ such that
\[ P(\|\mathbf{Y}_{k_j} - \mathbf{X}\| > \epsilon) < \frac{1}{2^j}, \]
then use Lemma 3.9.
On the other hand, if $\mathbf{X}_n$ does not converge in probability to $\mathbf{X}$, argue that there exists a subsequence $\mathbf{Y}_1 = \mathbf{X}_{n_1}, \mathbf{Y}_2 = \mathbf{X}_{n_2}, \ldots$ and $\epsilon > 0$ such that
\[ P(\|\mathbf{Y}_k - \mathbf{X}\| > \epsilon) > \epsilon \]
for all $k$. Then use Corollary 3.4 to argue that $\mathbf{Y}_n$ does not have a subsequence that converges almost surely.

3.3 The Dominated Convergence Theorem

We now consider the question of when $Y_n \stackrel{d}{\to} Y$ implies $E\,Y_n \to E\,Y$. This is not generally the case: Consider contaminated normal distributions with distribution functions
\[ F_n(x) = \left(1 - \frac{1}{n}\right) \Phi(x) + \frac{1}{n} \Phi(x - 37n). \tag{3.5} \]
These distributions converge in distribution to the standard normal $\Phi(x)$, yet each has mean 37.

However, recall Theorem 2.25, which guarantees that $Y_n \stackrel{d}{\to} Y$ implies $E\,Y_n \to E\,Y$ if all of the $Y_n$ and $Y$ are uniformly bounded (say, $|Y_n| < M$ and $|Y| < M$), since in that case, there is a bounded, continuous function $g(y)$ for which $g(Y_n) = Y_n$ and $g(Y) = Y$: Simply define $g(y) = y$ for $-M < y < M$, and $g(y) = My/|y|$ otherwise.

To say that the $Y_n$ are uniformly bounded is a much stronger statement than saying that each $Y_n$ is bounded. The latter statement implies that the bound we choose is allowed to depend on $n$, whereas the uniform bound means that the same bound must apply to all $Y_n$. When there are only finitely many $Y_n$, then boundedness implies uniform boundedness since we may take as a uniform bound the maximum of the bounds of the individual $Y_n$. However, in the case of an infinite sequence of $Y_n$, the maximum of an infinite set of individual bounds might not exist.
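The contaminated normal example in Equation (3.5) can be checked numerically. The sketch below is a rough illustration with assumed helper names (`Phi`, `F`, `mean`): it evaluates $F_n$ at the single point $x = 1$ to show it approaching $\Phi(1)$, and it computes the mean of the mixture as the weighted average of the component means $0$ and $37n$.

```python
from math import erf, sqrt


def Phi(x):
    """Standard normal distribution function, written in terms of erf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def F(n, x):
    """Contaminated normal distribution function F_n(x) from Equation (3.5)."""
    return (1 - 1 / n) * Phi(x) + (1 / n) * Phi(x - 37 * n)


def mean(n):
    """Mean of F_n: weight (1 - 1/n) on mean 0 plus weight 1/n on mean 37n."""
    return (1 - 1 / n) * 0.0 + (1 / n) * (37 * n)


ns = (10, 100, 1000)
gaps = [abs(F(n, 1.0) - Phi(1.0)) for n in ns]  # distance from the limit at x = 1
means = [mean(n) for n in ns]                   # stays at 37 (up to rounding)
```

The gap at $x = 1$ shrinks roughly like $1/n$, while the mean does not move: the vanishing weight $1/n$ is exactly offset by the component mean $37n$ drifting off to infinity. This is the kind of behavior a uniform bound rules out.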

The intuition above, then, is that some sort of uniform bound on the $Y_n$ should be enough to guarantee $E\,Y_n \to E\,Y$. The most common way to express this idea is the Dominated Convergence Theorem, given later in this section as Theorem 3.17.

The proof of the Dominated Convergence Theorem that we give here relies on a powerful technique that is often useful for proving results about convergence in distribution. This technique is called the Skorohod Representation Theorem, which guarantees that convergence in distribution implies almost sure convergence for a possibly different sequence of random variables. More precisely, if we know $X_n \stackrel{d}{\to} X$, the Skorohod Representation Theorem guarantees the existence of $Y_n \stackrel{d}{=} X_n$ and $Y \stackrel{d}{=} X$ such that $Y_n \stackrel{\mathrm{a.s.}}{\to} Y$, where $\stackrel{d}{=}$ means "has the same distribution as". Construction of such $Y_n$ and $Y$ will depend upon inverting the distribution functions of $X_n$ and $X$. However, since not all distribution functions are invertible, we first generalize the notion of the inverse of a distribution function by defining the quantile function.

Definition 3.13 If $F(x)$ is a distribution function, then we define the quantile function $F^- : (0, 1) \to \mathbb{R}$ by
\[ F^-(u) \stackrel{\mathrm{def}}{=} \inf\{x \in \mathbb{R} : u \le F(x)\}. \]

With the quantile function thus defined, we may prove a useful lemma:

Lemma 3.14 $u \le F(x)$ if and only if $F^-(u) \le x$.

Proof: Using the facts that $F^-$ and $F$ are nondecreasing and $F^-[F(x)] \le x$,
\[ u \le F(x) \Rightarrow F^-(u) \le F^-[F(x)] \le x \Rightarrow F[F^-(u)] \le F(x) \Rightarrow u \le F(x), \]
where the first implication follows because $F^-[F(x)] \le x$ and the last follows because $u \le F[F^-(u)]$ (the latter fact requires right-continuity of $F$).

Now the construction of $Y_n$ and $Y$ proceeds as follows. Let $F_n$ and $F$ denote the distribution functions of $X_n$ and $X$, respectively, for all $n$. Take the sample space $\Omega$ to be the interval $(0, 1)$ and adopt the probability measure that assigns to each interval subset $(a, b) \subset (0, 1)$ its length $(b - a)$. (There is a unique probability measure on $(0, 1)$ with this property, a fact we do not prove here.) Then for every $\omega \in \Omega$, define
\[ Y_n(\omega) \stackrel{\mathrm{def}}{=} F_n^-(\omega) \quad \text{and} \quad Y(\omega) \stackrel{\mathrm{def}}{=} F^-(\omega). \tag{3.6} \]
The random variables $Y_n$ and $Y$ are exactly the random variables we need, as asserted in Theorem 3.15.
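To make Definition 3.13 and Lemma 3.14 concrete, here is a small Python sketch for one hypothetical discrete distribution (support $\{0, 1, 3\}$ with probabilities $1/4, 1/2, 1/4$, chosen only for illustration). The names `F` and `F_inv` are ours. For a step function like this one, the infimum in the definition is attained at the smallest support point whose cumulative probability is at least $u$, which is what `F_inv` returns.

```python
from fractions import Fraction

# A hypothetical discrete distribution, used only as an example.
support = [0, 1, 3]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]


def F(x):
    """Distribution function F(x) = P(X <= x)."""
    return sum(p for s, p in zip(support, probs) if s <= x)


def F_inv(u):
    """Quantile function F^-(u) = inf{x : u <= F(x)} for 0 < u < 1."""
    cumulative = Fraction(0)
    for s, p in zip(support, probs):
        cumulative += p
        if u <= cumulative:
            return s
    raise ValueError("u must lie in (0, 1)")


# Check Lemma 3.14 (u <= F(x) iff F^-(u) <= x) on a grid of exact rationals.
us = [Fraction(k, 20) for k in range(1, 20)]
xs = [Fraction(k, 4) for k in range(-4, 17)]
lemma_holds = all((u <= F(x)) == (F_inv(u) <= x) for u in us for x in xs)

# F is flat on [1, 3), and F_inv jumps from 1 to 3 as u crosses 3/4.
jump = (F_inv(Fraction(3, 4)), F_inv(Fraction(4, 5)))
```

In the construction (3.6), $Y(\omega) = F^-(\omega)$ with $\omega$ drawn uniformly from $(0, 1)$, so in this example $Y$ takes the value 1 exactly when $1/4 < \omega \le 3/4$, an interval of length $1/2$ matching the probability assigned to 1.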

Theorem 3.15 Skorohod representation theorem: Assume $F_n \stackrel{d}{\to} F$. The random variables defined in expression (3.6) satisfy
1. $P(Y_n \le t) = F_n(t)$ for all $n$ and $P(Y \le t) = F(t)$;
2. $Y_n \stackrel{\mathrm{a.s.}}{\to} Y$.

Before proving Theorem 3.15, we first state a technical lemma, a proof of which is the subject of Exercise 3.8(a).

Lemma 3.16 Assume $F_n \stackrel{d}{\to} F$ and let the random variables $Y_n$ and $Y$ be defined as in expression (3.6). Then for any $\omega \in (0, 1)$ and any $\epsilon > 0$ such that $\omega + \epsilon < 1$,
\[ Y(\omega) \le \liminf_n Y_n(\omega) \le \limsup_n Y_n(\omega) \le Y(\omega + \epsilon). \tag{3.7} \]

Proof of Theorem 3.15: By Lemma 3.14, $Y \le t$ if and only if $\omega \le F(t)$. But $P(\omega \le F(t)) = F(t)$ by construction. A similar argument for $Y_n$ proves the first part of the theorem.

For the second part of the theorem, letting $\epsilon \to 0$ in inequality (3.7) shows that $Y_n(\omega) \to Y(\omega)$ whenever $\omega$ is a point of continuity of $Y(\omega)$. Since $Y(\omega)$ is a nondecreasing function of $\omega$, there are at most countably many points of discontinuity of $Y(\omega)$; see Exercise 3.8(b). Let $D$ denote the set of all points of discontinuity of $Y(\omega)$. Since each individual point in $\Omega$ has probability zero, the countable subadditivity property (3.3) implies that $P(D) = 0$. Since we have shown that $Y_n(\omega) \to Y(\omega)$ for all $\omega \notin D$, we conclude that $Y_n \stackrel{\mathrm{a.s.}}{\to} Y$.

Note that the points of discontinuity of $Y(\omega)$ mentioned in the proof of Theorem 3.15 are not in any way related to the points of discontinuity of $F(x)$. In fact, "flat spots" of $F(x)$ lead to discontinuities of $Y(\omega)$ and vice versa.

Having thus established the Skorohod Representation Theorem, we now introduce the Dominated Convergence Theorem.

Theorem 3.17 Dominated Convergence Theorem: If for some random variable $Z$, $|X_n| \le |Z|$ for all $n$ and $E|Z| < \infty$, then $X_n \stackrel{d}{\to} X$ implies that $E\,X_n \to E\,X$.

Proof: Fatou's Lemma (see Exercise 3.9) states that
\[ E \liminf_n |X_n| \le \liminf_n E|X_n|. \tag{3.8} \]
A second application of Fatou's Lemma to the nonnegative random variables $|Z| - |X_n|$ implies
\[ E|Z| - E \limsup_n |X_n| \le E|Z| - \limsup_n E|X_n|. \]

Because $E|Z| < \infty$, subtracting $E|Z|$ preserves the inequality, so we obtain
\[ \limsup_n E|X_n| \le E \limsup_n |X_n|. \tag{3.9} \]
Together, inequalities (3.8) and (3.9) imply
\[ E \liminf_n |X_n| \le \liminf_n E|X_n| \le \limsup_n E|X_n| \le E \limsup_n |X_n|. \]
Therefore, the proof would be complete if $|X_n| \stackrel{\mathrm{a.s.}}{\to} |X|$. This is where we invoke the Skorohod Representation Theorem: Because there exists a sequence $Y_n$ that does converge almost surely to $Y$, having the same distributions and expectations as $X_n$ and $X$, the above argument shows that $E\,Y_n \to E\,Y$, hence $E\,X_n \to E\,X$, completing the proof.

Exercises for Section 3.3

Exercise 3.8 This exercise proves two results used to establish Theorem 3.15.
(a) Prove Lemma 3.16.
Hint: For any $\delta > 0$, let $x$ be a continuity point of $F(t)$ in the interval $(Y(\omega) - \delta, Y(\omega))$. Use the fact that $F_n \stackrel{d}{\to} F$ to argue that for large $n$, $Y(\omega) - \delta < Y_n(\omega)$. Take the limit inferior of each side and note that $\delta$ is arbitrary. Similarly, argue that for large $n$, $Y_n(\omega) < Y(\omega + \epsilon) + \delta$.
(b) Prove that any nondecreasing function has at most countably many points of discontinuity.
Hint: If $x$ is a point of discontinuity, consider the open interval whose endpoints are the left- and right-sided limits at $x$. Note that each such interval contains a rational number, of which there are only countably many.

Exercise 3.9 Prove Fatou's lemma:
\[ E \liminf_n |X_n| \le \liminf_n E|X_n|. \tag{3.10} \]
Hint: Argue that $E|X_n| \ge E \inf_{k \ge n} |X_k|$, then take the limit inferior of each side. Use the monotone convergence property on page 25.

Exercise 3.10 If $Y_n \stackrel{d}{\to} Y$, a sufficient condition for $E\,Y_n \to E\,Y$ is the uniform integrability of the $Y_n$.

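Definition 3.18 The random variables $Y_1, Y_2, \ldots$ are said to be uniformly integrable if
\[ \sup_n E\left(|Y_n| \, I\{|Y_n| \ge \alpha\}\right) \to 0 \text{ as } \alpha \to \infty. \]
Prove that if $Y_n \stackrel{d}{\to} Y$ and the $Y_n$ are uniformly integrable, then $E\,Y_n \to E\,Y$.

Exercise 3.11 Prove that if there exists $\epsilon > 0$ such that $\sup_n E|Y_n|^{1+\epsilon} < \infty$, then the $Y_n$ are uniformly integrable.

Exercise 3.12 Prove that if there exists a random variable $Z$ such that $E|Z| = \mu < \infty$ and $P(|Y_n| \ge t) \le P(|Z| \ge t)$ for all $n$ and for all $t > 0$, then the $Y_n$ are uniformly integrable. You may use the fact (without proof) that for a nonnegative $X$,
\[ E(X) = \int_0^\infty P(X \ge t) \, dt. \]
Hints: Consider the random variables $|Y_n| \, I\{|Y_n| \ge t\}$ and $|Z| \, I\{|Z| \ge t\}$. In addition, use the fact that
\[ E|Z| = \sum_{i=1}^\infty E\left(|Z| \, I\{i - 1 \le |Z| < i\}\right) \]
to argue that $E\left(|Z| \, I\{|Z| < \alpha\}\right) \to E|Z|$ as $\alpha \to \infty$.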