Probability: Limit Theorems I. Charles Newman, Transcribed by Ian Tobasco

Size: px

Start display at page:

Download "Probability: Limit Theorems I. Charles Newman, Transcribed by Ian Tobasco"

Gilbert Pope
5 years ago
Views:

1 Probability: Limit Theorems I Charles Newma, Trascribed by Ia Tobasco

2 Abstract. This is part oe of a two semester course o measure theoretic probability. The course was offered i Fall 2011 at the Courat Istitute for Mathematical Scieces, a divisio of New York Uiversity. The text is Probability Theory by S. R. S. Varadha. Homework will be assiged o a regular basis.

3 Cotets Chapter 1. Itroductio to Probability Theory 5 1. Probability Spaces 5 2. Measure Theoretic Itegratio 9 3. Product Spaces ad Product Measures Distributios ad Expectatios 12 Chapter 2. Weak Covergece Characteristic Fuctios Weak Covergece 17 Chapter 3. Idepedet Radom Variables Idepedece ad Covolutio Weak Law of Large Numbers Strog Law of Large Numbers Kolmogorov s 0-1 Law Cetral Limit Theorem Triagular Arrays ad Ifiite Divisibility Law of the Iterated Logarithm 39 Chapter 4. Depedet Radom Variables Coditioig Markov Chais ad Radom Walks Trasiece ad Recurrece 48 3

5 CHAPTER 1 Itroductio to Probability Theory 1. Probability Spaces Defiitio 1.1. A measurable space is a pair (Ω, B) where Ω is a set ad B is a σ-field of subsets of Ω, i.e. B cotais ad is closed uder complemetatio ad coutable uios. It follows immediately from the defiitio that every σ-field also cotais Ω ad is closed uder coutable itersectios. Propositio Give ay collectio F of subsets of Ω, there exists a uique σ-field B such that B F ad for ay σ-field G with G F, it follows that G B. The σ-field B give by the propositio above is called the σ-field geerated by F ad is ofte deoted σ (F). It is the smallest σ-field cotaiig F. I geeral it is difficult to say exactly whether a set belogs to a give σ-field. But the propositio above gives a coveiet way to costruct a σ-field from the iterestig sets (whatever those may be). Examples 1.3. Here are some first examples of measure spaces. (1) A pair of dice: Ω 1 = {(ω 1, ω 2 ) : ω i {1, 2,..., 6} for i = 1, 2}, B 1 = P (Ω 1 ). Or if dice are idistiguishable, replace B 1 with B 1 = {A : (ω 1, ω 2 ) A = (ω 2, ω 1 ) A}. (2) Number of cars passig a itersectio (i a oe-miute period): Ω 2 = {0, 1, 2,... }, B 2 = P (Ω 2 ). (3) Arbitrarily log (or ifiite) sequeces of tosses of a coi: Ω 3 = {(ω 1, ω 2,... ) : ω i {0, 1}}. We wat to uderstad evets of the form {ω = (ω 1, ω 2,... ) : ω 1 = ε 1,..., ω k = ε k, ε 1,..., ε k {0, 1}}. So let F deote the collectio of such evets, ad take B 3 = σ (F). Lecture 1, 9/6/11 1 This is exercise 1.7 i the text. 5

6 6 1. INTRODUCTION TO PROBABILITY THEORY (4) Poit processes i space (e.g. oe could study the distributio of galaxies i the uiverse): Ω 4 = { ω : ω R 3 s.t. bouded Λ R 3, ω Λ is fiite }. Let Λ be a ope, bouded subset of R 3 ad let A = {ω : ω Λ = } for fixed {0, 1, 2,... }. Let F be the collectio of all such A for all possible Λ, ad take B 4 = σ (F). Defiitio 1.4. A probability measure P o (Ω, B) is a fuctio from B to [0, 1 such that P (Ω) = 1 P is coutably additive, i.e. if A 1, A 2,... are disjoit members of B the P ( =1A ) = P (A ). Note that coutable additivity is equivalet to the followig cotiuity property: 2 If B 1 B 2 ad B B for all the P ( =1B ) = lim P (B ). Takig complemets, we get aother equivalet cotiuity property: If C 1 C 2 ad C B for all the =1 P ( =1C ) = lim P (C ). Defiitio 1.5. A probability space is a triple (Ω, B, P ) as above. If Ω is coutable ad B = P (Ω) the a probability measure is uiquely determied by the umber p (ω) = P ({ω}) for each ω Ω, provided p (ω) 0 for each ω ad ω Ω p (ω) = 1. I this case, we have for ay A B. P (A) = ω A p (ω) Example 1.6. Recall the previous examples. For Ω 1, we could take p (ω) = 1/36 for each ω, supposig we have fair, idepedet dice. For Ω 2, we could take λ λω p (ω) = e ω!, the Poisso distributio with mea λ > 0. But (3) ad (4) are ot as straightforward. What if Ω = σ (F) is ucoutable? We may kow what P should be o the smaller collectio F, e.g. ( ) k 1 P ({ω : ω 1 = ε 1,..., ω k = ε k }) = 2 i the third example above. This gives rise to the followig questios i the geeral cotext: Give Ω ad B = σ (F), if P is first defied o F, the (1) Ca the domai of P be exteded to B such that P is a probability measure? (I particular, how ca we guaratee coutable additivity?) 2 Proof of which is problem 1 from problem set 1 (also exercise 1.2 i the text).

7 1. PROBABILITY SPACES 7 (2) Is the extesio to B uique? Basic aswers to these questios are i the followig defiitios ad theorems. Defiitio 1.7. A field F existig of subsets of Ω is a collectio closed uder complemetatio ad fiite uios. A fiitely addivitive fuctio P from F to [0, 1 with P (Ω) = 1 is called coutably additive o F if for ay decreasig sequece A 1 A 2 with A j F for all j ad with =1A =, we have lim P (A ) = 0. Theorem 1.8 (Caratheodory Extesio Theorem). If F is a field ad P is coutably additive o F as above, the there exists a uique probability measure P o B = σ (F) such that P F = P. See page four of the text for a discussio of the proof. To use the theorem, we eed a field. Cosider the ext example. Example 1.9. The field of subsets of R. Let I be the collectio of itervals of the form (a, b with a < b or (, b or (a, ). The the fiite disjoit uios of itervals from I form a field. Exercise Check that the collectio of fiite disjoit uios of itervals i the example above ideed form a field. Defiitio The σ-field geerated by the field i the previous example is called the Borel σ-field, B. Ay elemet of B is called a Borel subset of R. To defie a probability measure o (R, B), we ll defie it first o itervals (a, b, ad the exted uiquely via the Caratheodory extesio theorem. Defiitio The Lebesgue measure is the uique measure λ such that λ ((a, b) = b a. Remark. But λ is ot a probability measure, as λ (R) =. We ll see ext how to costruct probability measures via the extesio theorem. Suppose F (x) is a mootoe icreasig ad right-cotiuous fuctio o R. Defie the set fuctio λ F ((a, b) = F (b) F (a). Note if F (x) = x, the λ F ((a, b) = λ ((a, b) ad so λ F exteds to the Lebesgue measure. Now suppose that F ( ) = 0 ad F (+ ) = 1. The λ F is a probability measure, for by our assumptio. λ F (R) = lim F (b) F (a) = 1 0 = 1 b a Example Let F : R R be give by { 0 x < 0 F (x) = 1 x 0. Note that λ F ((, ɛ) = F ( ɛ) F ( ) = 0 0 = 0, λ F ((, 0) = F (0) F ( ) = 1, Lecture 2, 9/13/11

8 8 1. INTRODUCTION TO PROBABILITY THEORY ad λ F ((, )) = 1. λ F geerates the Dirac δ-measure (or poit mass) which satisfies { 1 0 U δ (U) = 0 0 / U. I a similar way we ca geerate the poit mass at x, { 1 x U δ x (U) = 0 x / U. Claim. λ F is coutably additive o F iff F is right-cotiuous. Defiitio A distributio fuctio F (x) is a right-cotiuous, odecreasig fuctio o R which satisfies F ( ) = 0 ad F (+ ) = 1. Propositio If P is a probability measure o (R, B), the F (x) = P ((, x) is a distributio fuctio. Theorem Coversely, for every distributio fuctio F, there exists a uique probability measure o (R, B) with P ((, x) = F (x). Now we ll see a importat class of probability measures. Let x 1, x 2,... be a sequece of distict real umbers, ad let p 1, p 2, 0 satisfy p = 1. Defie the measure P (A) = p = p 1 A (x ). :x A These are called discrete probability measures o R. What is the distributio fuctio for P? We have F (x) = P ((, x) = p. =1 Here is a example of such a probaiblity measure. :x x Example The Poisso probability measure o R with rate λ > 0. Set x 1 = 0, x 2 = 1,..., x = 1,... ad set p = λ 1 ( 1)! e λ. The discrete probability measure geerated by these x, p is the Poisso measure. Example Let x 1, x 2,... be a deumeratio of Q. Let p = 2. Let P = p δ x = 1 2 δ x. This is a importat measure ad has some strage properties. For example: Propositio The distributio fuctio F (x) = 1 2 is strictly icreasig. :x x Proof. Fix x ad let ɛ > 0. The iterval (x, x + ɛ cotais ratioal umbers, ad hece has positive probability. Thus F (x + ɛ) F (x) = P ((x, x + ɛ) > 0.

9 2. MEASURE THEORETIC INTEGRATION 9 This measure is a great source of couterexamples, ad is worth rememberig. Example Suppose f is a o-egative, (Lebesgue)-itegrable fuctio with f (y) dy = 1. (Such a f is kow as a desity fuctio.) Defie the distributio fuctio F (x) = x f (y) dy ad let P satisfy P ((, x) = F (x). (Check that F is ideed a distributio fuctio.) By the fudametal theorem of calculus, almost everywhere. We ll see why later. d F (x) = f (x) dx 2. Measure Theoretic Itegratio First we must idetify which fuctios we expect to itegrate. Defiitio 2.1. A (real-valued) measurable fuctio o a measure space (Ω, Σ) is a map f : Ω R such that for all Borel B R, f 1 (B) Ω. Geerically, a measurable fuctio has the property that the preimages of measurable sets are measurable. Defiitio 2.2. Let (Ω, Σ, P ) be a probability space. A radom variable (rv) o (Ω, Σ, P ) is a measurable fuctio. So what is the differece here? I measure theory we care about the space, (R, B), ad objects like measurable fuctios help you uderstad the space. I probability, we care about radom variables (observables), ad we choose the space Ω to help us uderstad the radom variables. We usually supress the argumet whe writig a radom variable, so we would write X to mea the radom variable (measurable fuctio) X (a). Defiitio 2.3. A map f : Ω 1 Ω 2 betwee measure spaces is measurable if for all B Σ 2, f 1 (B) Σ 1. Such a f is called a Ω 2 -valued radom variable. Remark. I this course, the fuctos we ll discuss will be measurable. So it s ice to kow that certai classes of fuctios (e.g. cotiuous fuctios) are measurable ad that the usual operatios (e.g. +,, ( ) 1,, ( ), d dx ( ), lim ( )) preserve the property of measurability. Proof of these facts ca be foud i ay stadard text o measure theory. Example 2.4. Let X 1, X 2, X 3, X 4 be real-valued radom variables (o some measure space (Ω, Σ, P )). Defie the matrix-valued radom variable ( ) X1 X Y = 2. X 3 X 4 Let Ω 2 2 be the space of 2 2 matrices (Ω 2 2 measurable. = R 4 ), the Y : Ω Ω 2 2 is Here are the steps to buildig a theory of itegratio:

10 10 1. INTRODUCTION TO PROBABILITY THEORY (1) Defie the itegral Ω f (ω) dp (ω) = f dp for simple fuctios f, i.e. fiite liear combiatios of idicator fuctios of measurable sets. (2) Exted the defiitio to bouded fuctios. (3) Exted to o-egative fuctios. (4) Exted to itegrable fuctios (suitably defied). Here is step four. If f is measurable, defie the fuctios f + (ω) = f (ω) 1 [0, ) (f (ω)) = max (f, 0) f (ω) = f (ω) 1 (,0 (f (ω)) = mi (f, 0). The f +, f are measurable, ad by the costructio f = f + f ad f = f + +f. We way that f is itegrable if f dp <, ad i that case we defie Ω f dp = f + dp f dp. Example 2.5. Cosider f : R R give by f (x) = si x. Oe might be tempted to say f = 0, but i fact f is ot itegrable uder our defiitio. So f is ot well defied. Lecture 3, 9/20/11 Defiitio 2.6. If X : Ω R is a real-valued radom variable which is itegrable, the we defie its expectatio to be the quatity E [X = X (ω) dp (ω). Here are some useful theorems of itegratio: Ω Mootoe Covergece Theorem: If X, the lim E [X = E [lim X. Domiated Covergece Theorem: If X Y for itegrable Y, the lim E [X = E [lim X. Bouded Covergece Theorem: If X M <, the lim E [X = E [lim X. Jese s Iequality: If φ is covex, the φ (E [X ) E [φ (X ). These theorems hold uder poitwise covergece of X X, but they also hold uder a weaker kid of covergece, i which the set where X X has probability zero. Defiitio 2.7. If X is a sequece of (real-valued) radom variables o (Ω, B, P ), the we say X X almost surely (a.s.) if P {ω : X (ω) X (ω)} = 1. Here is a aother importat kid of covergece. Defiitio 2.8. If P {ω : X (ω) X (ω) > ɛ} 0 for all ɛ > 0, the we say that X X i probability. Note. If we were doig aalysis, we would say almost everywhere istead of almost surely ad i measure istead of i probability.

11 3. PRODUCT SPACES AND PRODUCT MEASURES 11 Example 2.9. O {[0, 1, Borel sets, Lebesgue measure}, cosider the sequece of radom variables X 1 = 1 [0,1/2 X 2 = 1 [1/2,1 X 3 = 1 [0,1/4 X 4 = 1 [1/4,1/2.. ad so o. The X do ot coverge a.e. to aythig (cosider ay o-dyadic umber i [0, 1). But they do coverge i probability to X 0, for with k () as. P { X X > ɛ} 1 2 k() 3. Product Spaces ad Product Measures This is importat for studyig idepedece i the abstract settig. Let (Ω 1, B 1, P 1 ), (Ω 2, B 2, P 2 ) be probability spaces, ad recall their Cartesia product Ω = Ω 1 Ω 2 is the set of (ω 1, ω 2 ) with ω i Ω i. There is a atural way to costruct a σ-field B ad probability measure P o Ω 1 Ω 2, which is aalogous to goig from oe-dimesio Lebesgue measure (legth) o R 1 to two-dimesioal Lebesgue measure (area) o R 2. The product σ-field, B = B 1 B 2, is the σ-field geerated by {A 1 A 2 : A 1 B 1, A 2 B 2 } (the measurable rectagles). Now we look to defie a measure o B. Let F be the field (ot σ-field) of fiite disjoit uios of the measurable rectagles. The defie P o F by first settig P (A 1 A 2 ) = P 1 (A 1 ) P 2 (A 2 ) for oe measurable rectagle A 1 A 2 ad the extedig via fiite additivity to all of F. Lemma 3.1. P defied as above is coutably additive o F. Thus P exteds uiquely to a probability measure o B, the product measure P 1 P 2. Furthermore, if A B (but ot eccessarily i F), the P (A) ca be evaluated by iterated itegratio (i either order). This is corollary 1.11 i the text, ad is a special case of (ad leads to) the followig theorem. Theorem 3.2 (Fubii). If f (ω 1, ω 2 ) is a (real-valued) itegrable fuctio o (Ω, B, P ) = (Ω 1 Ω 2, B 1 B 2, P 1 P 2 ), the the fuctio ω 2 f (ω 1, ω 2 ) is itegrable for a.e. ω 1, the fuctio ω 1 Ω 2 f (ω 1, ω 2 ) dp 2 is itegrable, ad [ f (ω 1, ω 2 ) dp = f (ω 1, ω 2 ) dp 2 dp 1. Ω 1 Ω 2 Ω 2 Similarly, f (ω 1, ω 2 ) dp = Ω 1 Ω 2 Ω 1 Ω 2 [ f (ω 1, ω 2 ) dp 1 dp 2. Ω 1 Remark. The coverse holds for o-egative fuctios, a result due to Toelli.

12 12 1. INTRODUCTION TO PROBABILITY THEORY 4. Distributios ad Expectatios Defiitio 4.1. Suppose (Ω 1, B 1, P ) is a probability space ad (Ω 2, B 2 ) is a measure space. If X : Ω 1 Ω 2 is a Ω 2 -valued radom variable, the Q defied by Q (A) = P ( X 1 (A) ) for A B 2 is a probability measure o (Ω 2, B 2 ), called the iduced measure or the distributio of X. Note. I the aalysis settig, oe might see Q = P X 1. I the probability settig, we ca iterpret Q (A) as the probability that X takes values i A. Example 4.2. If X is real-valued ((Ω 2, B 2 ) = (R, Borel sets)), the Q is the probability measure o R whose distributio fuctio is F Q (x) = Q ((, x) = P (X x) = F X (x). F X is the cumulative distributio fuctio of X from classical probability. So the distributio of a radom variable (i the ew sese) is just a geeralizatio of cumulative distributio to abstract probability space. Theorem 4.3 (Chage of Variables). If X is as i the above defiitio ad h is a real-valued measureable fuctio o (Ω 2, B 2 ) the the fuctio g give by g (ω 1 ) = h (X (ω 1 )) is a real-valued radom variable o (Ω 1, B 1 ). g is itegrable o (Ω 1, B 1, P ) iff h is itegrable o (Ω 2, B 2, Q) where Q is the distributio of X, ad E [g = E [h (X) = h (X (ω 1 )) dp = h (ω 2 ) dq. Ω 1 Ω 2 Example 4.4. If X is real-valued, the we have the usual formula E [h (X) = h (X (ω 1 )) dp = h (ω 2 ) dq = h (x) df X (x) Ω 1 R with the Lebesgue-Stieltjes itegral o the right. Here is the situatio thus far. If X is a real-valued radom variable o (Ω, B, P ), the it iduces a measure α = P X 1 called the probability distributio of X. The distributio fuctio of X is the F (x) = α ((, x) = P (X x). Ad if X is itegrable, its expectatio (mea) is E [X = X (ω) dp = x dα = x df (x). Ω Fially, if g a α-itegrable real-valued fuctio o R, the E [g (X) = g (X (ω)) dp = g (x) dα = g (x) df (x). Ω R R Defiitios 4.5. The mth momet of X for m = 1, 2, 3,... is E [X m = x m dα, provided E [ X m = R x m dα <. The variace of X is [ Var (X) = E (X E (X)) 2 = E [ X 2 (E [X) 2 ad the stadard deviatio of X is R σ (X) = Var (X).

13 4. DISTRIBUTIONS AND EXPECTATIONS 13 Problem oe o the secod homework is to calculate the mea ad stadard deviatio of a shufflig radom variable. Here is a hit. Propositio 4.6. If X 1, X 2,..., X are itegrable radom variables the c i X i is itegrable ad E [ c i X i = c i E [X i.

15 CHAPTER 2 Weak Covergece 1. Characteristic Fuctios Note. I probability, characteristic fuctios are ot idicator fuctios. Recall the followig elemetary facts about complex umbers: i 2 = 1 C = {a + bi a, b R} e ir = cos r + i si r for r R. Exercise 1.1. Defie e ir = (ir)!. Prove the third fact usig i 2 = 1 ad Taylor series for cos r ad si r. Defiitio 1.2. A complex-valued fuctio f (ω) = a (ω) + ib (ω) is itegrable if a ad b are itegrable, ad the f (ω) dp = a (ω) dp + i b (ω) dp. Thus if α is a probability measure o R, the for t R we have e itx dα = cos tx dα + i si tx dα. R R If α is the probability distributio of some real-valued radom variable X, the cosider the complex-valued fuctio φ (t) = e itx dα = e itx df X = E [ e itx. R Defiitio 1.3. The fuctio φ above is called the characteristic fuctio of α (or of F X or of X). Characteristic fuctios are importat to may parts of this course. For example, a stadard proof of the cetral limit theorem is via characteristic fuctios. Examples 1.4. (1) If X has uiform distributio o [C, D, the D ( e itx dα = e itx 1 D C dx = 1 e itd e itc D C it R C Note that ( 1 e itd e itc ) lim = 1 t 0 D C it as we expect (α is a probability measure). A special case is C = 1/2, D = +1/2, for which e itx dα = si t 2 t. R 2 15 R ).

16 16 2. WEAK CONVERGENCE (2) If X is Biomial(, p), the E [ e itx ( ) = e itk p k (1 p) k = ( pe it + (1 p) ). k k=0 (3) If X is Poisso(λ), the E [ e itx = e itk λ λk e k! = it 1) eλ(e. k=0 (4) If X is expoetial with mea µ, the E [ e itx 1 = 1 iµt. (5) If X is Normal ( µ, σ 2), the Lecture 4, 9/27/11 E [ e itx = e iµt 1 2 σ2 t 2. Exercise 1.5. Verify these calculatios. What is the relatio betwee characteristic fuctios ad momets? Formally, by iterchagig derivatives ad itegrals, we get d dt φ (t) = d e itx dα = (ix) e itx dα dt ad by settig t = 0 φ (0) = ix dα = ie [X. More geerally ( m d e dt) itx = i t=0 m x m ad so φ (m) (0) = i m E [X m, at least formally. Whe is this justified? Propositio 1.6. If E [ X m <, the φ is m-times (cotiuously) differetiable ad φ (m) (0) = i m E [X m. Proof of this will be problem two o the secod problem set. Also see exercises 2.2, 2.4 of the text. What about a coverse? If φ is m-times (cotiuously) differetiable, does this imply that E [ X m < ad hece E [X m = 1 i m φ(m) (0)? The aswer is yes for m eve, but o for m odd. See exercise 2.3 of the text for the details. Here is a classic theorem, which tells us how to determie if a complex-valued fuctio is the characteristic fuctio of a radom variable. Theorem 1.7 (Bocher). A complex-valued fuctio φ (t) is the characteristic fuctio of some α (or F or X) iff φ satisfies the followig three properties: (1) φ (0) = 1

17 2. WEAK CONVERGENCE 17 (2) φ is cotiuous (for all t) (3) φ is positive (semi-)defiite; i.e. for ay = 1, 2, 3,... ad ay real t 1,..., t, the matrix (φ (t i t j )) i,j=1 should be positive (semi-)defiite (self-adjoit with all positive eigevalues). Equivaletly, give N ad t 1,..., t R, φ should satisfy c i c j φ (t i t j ) 0 i,j=1 for all complex c 1,..., c. See the text for the proof; the oly if part is easier. This ext propositio is used to fiish the proof of the cetral limit theorem. Observe first that a distributio fuctio F is determied uiquely by its values at its cotiuity poits (F is rightcotiuous ad icreasig). Propositio 1.8. A distributio α (or distributio fuctio F ) is uiquely determied by its characteristic fuctio: if a, b are cotiuity poits of F the 1 α ((a, b) = F (b) F (a) = lim T 2π See pp of the text for the proof. T T e ita e itb φ (t) dt. it Remark. Eve if a, b are ot cotiuity poits, the limit above equals This is cosistet with the propositio. α ((a, b)) α ({α}) α ({b}). 2. Weak Covergece We are workig towards the cetral limit theorem, which is about the covergece of distributios to the ormal distributio. We eed to reiterpret the word covergece to acheive the theorem. Theorem 2.1. Let α, α be probability measures o R ad let F, F be the correspodig ditributio fuctios. The followig are equivalet: (1) lim F (x) = F (x) for every x that is a cotiuity poit of F. (2) α ([a, b) α ([a, b) as for all a, b R which are ot atoms of α (i.e. α ({a}), α ({b}) 0). (3) lim R f (x) dα (x) = f (x) dα (x) for every bouded, cotiuous R (real/complex-valued) fuctio f o R. Remark. If X has distributio α ad X has distributio α, the (3) says that E [f (X ) E [f (X). Defiitio 2.2. If α, α (F, F ) satisfy ay of the coditios above, we say α coverges weakly to α, ad we write α α (F F ). Theorem 2.3. α α iff the characteristic fuctios φ, φ satisfy lim φ (t) = φ (t) for every real t.

18 18 2. WEAK CONVERGENCE This last iterpretatio of weak covergece is used to prove the cetral limit theorem. Remarks. (1) If X, X have distributios α, α with α α, we say X coverges i distributio (i law) to X. As oted above, this meas E [f (X ) E [f (X) for ay bouded, cotiuous f. (2) (1) or (2) i theorem 2.1 ca be stregtheed to: α (A) α (A) for every Borel set A so that α ( A\A 0) = 0. (This is theorem 2.6 i the text.) Such a set A is called a cotiuity set of α. Example 2.4. Here is a simple example. Take α = δ 1 1/ ad α = δ 1. The α α, but α ({1}) = 0 for all so that α ({1}) α ({1}). Ideed, {1} is ot a cotiuity set of α (α ( A\A 0) 0). Cosider the proof of theorems 2.1 ad 2.3. (1) = (3) is fairly straightforward (see pp of the text). (3) = φ (t) φ (t) is trivial (take f (x) = e itx ). The iterestig proof is that φ (t) φ (t) = (1), which follows from the ext theorem. Theorem 2.5. Suppose φ, = 1, 2,..., are characteristic fuctios (of probability measures α ) ad there exists a fuctio φ so that φ (t) φ (t) for each t. Suppose also that φ is cotiuous at t = 0. The φ is a characteristic fuctio of some probability measure α, ad α α. Proof sketch. Part 1. (Steps 1-3 o pp ) Let F be ay sequece of distributio fuctios, the there exists a subsequece k ad a sub-distributio fuctio G (G is o-decreasig, right-cotiuous, with values i [0, 1; but G ( ) > 0 ad/or G (+ ) < 1 are allowed) such that F k (x) G (x) at every cotiuity poit of G. Proof of this fact uses boudedess of {F (x)} for each x, coutability of the ratioals, diagoalizatio, ad so forth. I some books, this type of covergece is called vague covergece. Example 2.6. For α = 2 3 δ δ δ, F (x) G (x) with { 1 G (x) = 12 x < x 1. Part 2. (Steps 4-5 o pp ) The key argumet is to show that φ (t) is cotiuous at t = 0 implies that G ( ) = 0 ad G (+ ) = 1. This follows from the Lemma 2.7. If F is a distributio fuctio with characteristic fuctio φ, the for ay T > 0 we have ( ( ) ( 2 1 F F 2 )) ( ) T φ (t) dt. T T 2T T This lemma is proved by elemetary maipulatios ad Fubii s theorem (see p. 27 of the text). Applyig the lemma to the F k ad takig k yields that for small T such that ±2/T are cotiuity poits of G, ( ( ) ( )) ( G G ) T φ (t) dt. T T 2T T

19 2. WEAK CONVERGENCE 19 This holds eve though we do t yet kow G is a distributio fuctio (so we do t kow φ is a characteristic fuctio). But we do kow φ is cotiuous at t = 0 ad φ (0) = 1, so G ( ) = 0 ad G (+ ) = 1. So G is a distributio fuctio. Now we have a subsequece F k which coverges to a distributio fuctio G at cotiuity poits. By what we ve already show, φ k (t) e itx dg for every t. But sice φ k (t) φ (t), we have φ (t) = e itx dg. Now for ay subsequece k, by the same argumet there exists a sub-subsequece k j so that F kj coverges to a distributio fuctio G. But agai φ G = φ G so that G = G ad hece F kj G. This implies F G; ow reame G as F to complete the proof. The argumets used i this proof are iterestig i their ow right. Defiitio 2.8. A collectio A of probability distributios o R is called totally bouded if there exists a probability distributio α so that every sequece α from A has a subsequece α k α. Theorem 2.9. A family of probability distributios A is totally bouded iff either of the followig holds: (1) The family is tight, i.e. lim sup α ({x : x l}) = 0. l α A (2) Let φ α be the characteristic fuctio of α. The lim sup h 0 t h 1 φ α (t) = 0. That (1) ad (2) are equivalet is the cotet of lemma 2.7. That (1) implies A is totally bouded is part 1 of the proof of theorem 2.5. That (2) implies A is totally bouded is part 2 of the proof of theorem 2.5. Lecture 5, 10/4/11

21 CHAPTER 3 Idepedet Radom Variables 1. Idepedece ad Covolutio Recall the otio of idepedet radom variables from classical probability. We eed a more geeral defiitio to tackle the classic theorems of probability. Defiitios 1.1. A fiite family of (real-valued) radom variables X 1, X 2,, X is idepedet (joitly idepedet) if P (X 1 B 1, X 2 B 2,..., X B ) = P (X 1 B 1 ) P (X B ) for B 1, B 2,... ay Borel subsets of R. A ifiite family of radom variables {X α } is idepedet if every fiite subfamily is. Evets A 1, A 2,... are said to be idepedet if their idicators are idepedet, or equivaletly if P (C 1 C 2 C ) = P (C 1 ) P (C ) for each ad for every choice of C j = A j or (A j ) c. Note. (1) For a pair of evets A 1, A 2 it suffices to check P (A 1 A 2 ) = P (A 1 ) P (A 2 ). This is what is usually preseted i a first course o probability. Exercise 1.2. Prove this. (2) It is possible for a family of evets (or radom variables) to be pairwise idepedet but ot (joitly) idepedet. See the ext example. Example 1.3. Fair coi tossig. Set { +1 ith toss is heads X i =, 1 ith toss is tails the X 1, X 2 ad Y = X 1 X 2 are pairwise idepedet but ot joitly idepedet. It might seem o-ituitive that these are pairwise idepedet. But thik of the coditioal distributios. This really shows idepedece is a statistical property (ad ot a fuctioal property). Here is a secod way to uderstad idepedece. Recall that if X 1,..., X are real-valued radom variables o some (Ω, F, P ) we ca regard X = (X 1,..., X ) as a R -valued radom variable whose distributio (the joit distributio of X 1,..., X from elemetary probability) is the measure µ o R so that µ (B) = P ({ω : X (ω) B}). If B is a rectagle B 1 B ad if X 1,..., X are idepedet, the µ (B 1 B ) = µ 1 (B 1 ) µ (B ) where µ i is the distributio (o R) of X i. I this case, we write µ = µ 1 µ. The coverse is also true. 21

22 22 3. INDEPENDENT RANDOM VARIABLES Lemma 1.4. X 1,..., X are idepedet iff the distributio of (X 1,..., X ) is the product measure of the distributios of the idividual X i s. We re workig our way towards applyig the theory to characteristic fuctios. This will lead to the classic theorems. Here is a tool we ll use quite ofte. Propositio 1.5. Suppose f 1,..., f are (bouded) measurable fuctios o R. 1 If X 1,..., X are idepedet (real-valued) radom variables, the E [f 1 (X 1 ) f (X ) = Π j=1e [f j (X j ). Proof. Apply the above lemma ad Fubii s theorem. A importat case of this is whe f j (X j ) = e itjxj with real umbers t j ad j = 1,...,. The, the propositio implies [ E e i(t1x1+ +tx) = Π j=1e [ e itjxj. The object o the left is the (multivariate) characteristic fuctio φ (t) = φ ((t 1,..., t )) of X = (X 1,..., X ). Of course this defiitio does ot require idepedece. The coverse is also true (if the equality holds for all choices of t j, the X 1,..., X are idepedet). I this course, we ll maily cosider t = t 1 = t 2 = = t. I this case, we see that if X 1,..., X are idepedet, the the characteristic fuctio of X = X X is φ X1+ +X (t) = E [e it(x1+ +X) = Π j=1e [ e itxj = Π j=1φ Xj (t). This is a importat relatio which we ll use time ad time agai. But ote that eve if the equality above holds for all t it may ot be true that X 1,..., X are idepedet. (??) The discussio above shows the utility of the characteristic fuctios. But i practice, distributios are closer to what we re actually iterested i. So we ask: Is there a represetatio for the distributio of sums of idepedet radom variables? The aswer is yes, but ufortuately it s ot quite as ice. By Fubii s theorem, oe has that for Z = X + Y (X, Y idepedet) ad A ay Borel subset of R, µ Z (A) = µ X (A y) dµ Y (y) = µ Y (A x) dµ X (x). R This is called the covolutio of µ X ad µ Y ad is deoted by µ X µ Y. If X, Y have probability desities f X, f Y, the Z also has a desity f Z with f Z (z) = f X (z y) f Y (y) dy = R f Y (z x) f X (x) dx. Here is a stadard ad useful fact for idepedet X, Y, which follow from the relatio E [XY = E [X E [Y : If E [ X 2, E [ Y 2 <, the Var (X + Y ) = Var (X) + Var (Y ). j. 1 The assumptio that f1,..., f are bouded ca be weakeed to E [ f j (X j ) < for each

23 2. WEAK LAW OF LARGE NUMBERS 23 For X 1,..., X idepedet, Var (X X ) = Var (X 1 ) + + Var (X ). Exercise 1.6. Derive these results. May of the stadard limit theorems of probability theory (e.g. the Law of Large Numbers, the Cetral Limit Theorem) cocer sums of idepedet radom variables X X as. I order to make sese of these theorems, we eed to a probability space o which ifiitely may idepedet radom variables X 1, X 2,... are defied (e.g. with some commo prescribed distributio). This comes from the classical samplig techiques from statistics. Defiitio 1.7. If X 1, X 2,... are idepedet radom variables which share a commo distributio, the they are said to be idepedet, idetically distributed (i.i.d.) radom variables. Example 1.8. Coi tossig where { +1 jth toss is heads X j =. 1 jth toss is tails There is a geeral argumet from measure theory that gives the existece of such spaces. Theorem 1.9. Let µ 1, µ 2,... be (Borel) probability measures o R, ad let Ω = {a = (x 1, x 2,... ) : x j R} be the space of ifiite sequeces of real umbers (sometimes deoted R ). Let F be the field of fiite dimesioal cylider sets, i.e. sets of the form {ω = (x 1, x 2,... ) : (x 1,..., x ) A} for N ad A a Borel subset of R. Let Σ be the σ-field geerated by F. The there exists a uique probability measure µ = µ 1 µ 2 o (Ω, Σ) such that µ ({ω = (x 1, x 2,... ) : (x 1,..., x ) A}) = (µ 1 µ ) (A) for ay N ad ay Borel subset A of R. Remark. (1) There is o covergece issue here. We re workig o a probability space. (2) We could have defied µ via ay choice of coordiates from ω, istead of just the first. The result would be the same. 2. Weak Law of Large Numbers Now we ca cosider i.i.d. radom variables X 1, X 2,... ad study the sample mea X1+ +X as. We ll begi with the weaker result, where we ll assumig ot oly that E ( X 1 ) < (so that E (X 1 ) = m is defied) but also that E ( ) X1 2 < (fiite variace). Write S = X X. We first study the weak law of large umbers, which claims that S m i probability, i.e. that P ( S m ) > δ 0 as. Note. Usually covergece i probability ad i distributio are ot the same. But sice the covergece here is to a costat (ad ot a arbitrary radom variable), the weak law of large umbers equivaletly claims that the distributios of S coverge i distributio to δ m as. Exercise 2.1. Check that α δ 0 iff P ( X > δ) 0 as.

24 24 3. INDEPENDENT RANDOM VARIABLES Later, we ll see the strog law of large umbers, where the covergece is almost surely. This takes much more work! Recall the followig result from measure theory. Propositio 2.2 (Chebyshev s Iequality). For ay radom variable X with E [ X 2 <, for ay a > 0. Proof. We have P ( X > a) 1 a 2 E [ X 2 P ( X > a) = E [ 1 { X >a} [ X 2 E a 2 1 { X >a} 1 a 2 E [ X 2 ad so we re doe. Now apply this to X = S ( ) S P m > δ m with a = δ to get 1 δ 2 E [ ((S = 1 δ 2 Var ( S ) ) m ) 2 = 1 ( δ 2 Var X1 + + X = 1 ( ) δ 2 Var X1 = 1 1 δ 2 Var (X 1) = 1 δ 2 σ2 0 as. Note the secod lie follows as [ X1 + + X m = E [X 1 = E = E ) [ S. So we have obtaied a first weak law of large umbers. (See theorem 3.2 i the text.) Theorem 2.3. If X 1, X 2,... are i.i.d. with E [X 1, E [ X1 2 < ad m = E [X i, the S m i probability as. S We ca do better. Theorem 2.4. If X 1, X 2,... are i.i.d. with E [ X 1 < ad m = E [X i, the m i probability as. There are two proofs.

25 2. WEAK LAW OF LARGE NUMBERS 25 X C j First proof. This proof is by a trucatio argumet. Give C > 0, defie = X j 1 { Xj C} ad Yj C = X j Xj C = X j 1 { Xj >C}. The S = 1 j=1 X C j + 1 j=1 Y C j = ξ C + η C, ad m = E [X 1 [ S = E = E [ X1 C [ + E Y C 1 = a C + b C = E [ ξ C [ + E η C we re we ve itroduced ξ C, η C ad a C, b C for bookkeepig. To prove S m it suffices to show that E [ S m 0. (Follows from Chebyshev s iequality.) Write [ S E m = E [ ξ C + η C a C b C E [ ξ C a C + 1 i=1 E [ Y C i + b C = E [ ξ C a C + E [ Y1 C + b C ( E [ ξ C a C 2) 1/2 [ + 2 E Y C 1 sice by Cauchy-Schwarz E [ W ( E [ W 2) 1/2. Now E [ ξ C a C 2 = Var ( ξ C ( ) 2 1 = Var ) ( X1 C ) 0 as, so that [ S lim sup E m 2 E [ Y1 C. Now we claim E [ Y1 C 0 as C. This follows from the domiated covergece theorem sice X 1 1 { X1 >C} 0 almost everywhere. This completes the proof. The idea of trucatio will appear agai whe we prove the strog law of large umbers. The secod proof is completely differet. It uses a importat lie of reasoig that will show up time ad time agai (e.g. i the proof of the cetral limit theorem).

26 26 3. INDEPENDENT RANDOM VARIABLES Secod proof. Argue with characteristic fuctios. Let φ (t) = E [ e itxj (for ay j), the set [ ψ (t) = E e it S [ = E e it( X 1 + +X ) = ( [ = E ( = φ e i tx 1 ( t ) )). Sice E [ X 1 <, φ is differetiable with φ (0) = ie [X 1 = im. So by Taylor expasio, ( ) t φ = 1 + im t ( ) 1 + o. The ( ψ (t) = 1 + im t ( )) 1 + o Lecture 6, 10/18/11 ad the limit as is the same as of ( 1 + im t ) e imt. But the characteristic fuctio of δ m is e imt, ad hece S δ m. We used the followig result: Lemma 2.5. If a is a sequece of complex umbers such that a z C (i.e. a = z/ + o (1/)), the (1 + a ) e z. 3. Strog Law of Large Numbers The coclusio of the strog law of large umbers is that S m almost surely. To prove almost sure limits, a basic (ad very importat) tool is the Borel-Catelli lemma. Lemma 3.1 (Borel-Catelli). Let {A } be ay sequece of evets i some probability space (Ω, F, P ). If =1 P (A ) <, the P ({A occurs for oly fiitely may }) = 1 or equivaletly P ({A occurs ifiitely ofte}) = 0 where {A occurs ifiitely ofte} = {ω : ω A for ifiitely may s}. Coversely, if the A are (mutually) idepedet evets with =1 P (A ) =, the P ({A occurs ifiitely ofte}) = 1. Note. Sometimes we write i.o. to mea ifiitely ofte. Proof. {A occurs i.o.} = k=1 ( =k A ), a decreasig limit as k of the sets =k A. So by coutable additivity we have P ({A occurs i.o.}) = lim P k ( =ka ) lim P (A ) = 0 k sice =1 P (A ) < by assumptio. =k

27 3. STRONG LAW OF LARGE NUMBERS 27 I the other directio, write {A occurs for oly fiitely may } = m=1 ( =m (A ) c ), which is a icreasig limit as k of the sets =m (A ) c. So P ({A occurs for oly fiitely may }) = lim m P ( =m (A ) c ) = lim m Π =mp ((A ) c ) = lim m Π =m (1 P (A )) sice by assumptio the A are idepedet. Usig the fact that 1 x e x for x 0 (proof by calculus) we get P ({A occurs for oly fiitely may }) lim m e ad hece the claim. = lim m 0 = 0 P =m P (Am) Exercise 3.2. Check the details of the proof by replacig Π =mp ((A ) c ) with Π l =mp ((A ) c ) ad lettig l i the ed. Now we are equipped to prove a first strog law of large umbers. [ Theorem 3.3. If X 1, X 2,... are i.i.d. ad E (X 1 ) 4 <, the S m = E (X 1 ) almost surely. Proof. By replacig X i by X i m, we may assume that m = 0. To show S 0 almost surely, it suffices to show that for every δ > 0, ({ }) S P δ for oly fiitely may = 1 as { ω : S } { (ω) 0 = j=1 ω : S } 1 for oly fiitely may, j ad as the evets o the right form a decreasig sequece i j, ({ }) { S P 0 = lim P S 1 } j j for oly fiitely may. Fially by Borel-Catelli, it suffices to show =1 P ( S ) [ > δ <. Sice E [X 1 = 0 ad E (X 1 ) 4 <, Chebyshev s iequality with p = 4 yields the estimate ( ) S P > δ 1 [ δ 4 E S 4 = 1 4 δ 4 E [ S 4. So it will be eough to show E [ S 4 B 2 for some costat B. Write E [ S 4 = E [(X X ) 4 [ = E (X 1 ) (X ) (X 1 ) 2 (X 2 ) (X 1 ) 2 (X ) 2 + other terms like X 1 (X 2 ) 2 X 3 or X 4 (X 7 ) 3 or X 1 X 3 X 5 X 8.

28 28 3. INDEPENDENT RANDOM VARIABLES [ Now E (X i ) 4 [ = C < for all i, ad E (X j ) 2 (X k ) 2 [ = E (X j ) 2 [ E (X k ) 2 = σ 2 σ 2 = σ 4 for j k where σ 2 = Var [ (X 1 ) < (sice E [X 1 = 0). O the other had, E [X 1 (X 2 ) 2 X 3 = E [X 1 E (X 2 ) 2 E [X 3 = 0 ad similarly all other such terms have zero expectatio. Thus E [ S 4 ( ) = C + 6 σ 4 = C + 3 ( 1) σ 4 B 2 2 for some B <. This completes the proof. We wat to claim the law of large umbers holds uder weaker coditios. First we ll see some theorems about sums of idepedet (ot eccessarily idetically distributed) radom variables X j, which will say whe j=1 X j is coverget (almost surely). I order to do this, we eed a techical lemma that improves the Chebyshev iequality: P ({ S l}) E ((S ) 2) l 2. Lemma [ 3.4 (Kolmogorov s Iequality). Suppose X 1,..., X are idepedet with E (X j ) 2 = (σ j ) 2 < ad E [X j = 0 for all j. Let T (ω) = sup 1 k S k (ω). The, E ((S ) 2) P ({T l}) l 2. The proof has a iterestig structure, which shows up agai i the study of Markov chais ad martigales. Proof. This is a stoppig-time argumet. Cosider the first time that the sequece S 1, S 2,... is l, ad defie E j to be the evet that that time is j. Explicitly we have E 1 = { S 1 l}, E 2 = { S 1 < l, S 2 l}, ad so o. Also, {T l} = k=1 E k ad 1 {T l} = k=1 1 E k. Now it suffices to show that P ({T l}) 1 l E [(S 2 ) 2 1 {T l}, ad by our choice of E k we have E [(S ) 2 1 {T l} = k=1 [(S E ) 2 1 Ek. Now P ({T l}) = E [ 1 {T l} = E [1 Ek k=1 k=1 1 [ l 2 E (S k ) 2 1 Ek by our choice of E k. Goig forwards, 1 [( P ({T l}) l 2 E (S k ) 2 + (S S k ) 2) 1 Ek = k=1 k=1 1 [( ) l 2 E (S ) 2 2S k (S S k ) 1 Ek.

29 3. STRONG LAW OF LARGE NUMBERS 29 But observe E [S k (S S k ) 1 Ek = E [S k 1 Ek E [S S k = 0 by idepedece ad sice E [X j = 0 for all j. So we ve show 1 [ P ({T l}) l 2 E (S ) 2 1 Ek, k=1 ad hece the result. Lecture 7, 10/31/11 Now we have Kolmogorov s two- ad three-series theorems. These importat results describe the covergece of idepedet series of radom variables. Theorem 3.5 (Kolmogorov s two-series theorem). Suppose X 1, X 2,... (o some (Ω, F, P )) are idepedet. If i=1 E [X i ad i=1 Var (X i) are coverget, the the series i=1 X i (ω) coverges almost surely. Remark. The two-series theorem gives a sufficiet coditio for almost sure covergece of series of idepedet r.v.s. The ext theorem (the three-series theorem) will give a eccesary coditio. Example 3.6. Recall from calculus that =1 1/s < iff s > 1. But =1 ( 1) / s < coverges for s > 0. What if oe takes radom sigs? Let Y 1, Y 2,... be idepedet with P (Y = +1) = 1/2, P (Y = 1) = 1/2 (e.g. fair coi-tossig). Does =1 Y / s coverge? The ituitive aswer is that we still get eough cacellatio to allow covergece. By the two-series theorem, the series coverges a.s. for s > 1/2. Ideed, if we defie X = Y / s, the E [X = 0 ad Var (X ) = 1/ 2s so that =1 Var (X ) < if 2s > 1. After we prove the three-series theorem, we ll be able to say that =1 X is a.s. ot coverget for s 1/2. Proof. First, assume E [X i = 0 for all i. To show a.s. covergece, it suffices to prove for all δ > 0, ( ) lim P sup S k S m δ = 0 m, m<k where S k = X X k. By Kolmogorov s iequality applied to the radom variables X m+1, X m+1 + X m+2,..., X m X, ( ) P sup S k S m δ 1 m<k δ 2 Var (X m X ) = 1 δ 2 Var (X i ) 0 i=m+1 as m,. If the meas are o-zero, defie Y i = X i E [X i so that X i = Y i +E [X i with E [Y i = 0. The apply the previous result to the Y i s ad use that i E [X i <. Suppose ow we have idepedet X i s to which the two-series theorem does ot apply. Note that if i Var (X i) < but i E [X i is diverget, we ca still apply the two-series theorem to Y i = X i E [X i ad argue that i X i = i (Y i + E [X i ) is diverget. But suppose i Var (X i) = +. We ca the try a

30 30 3. INDEPENDENT RANDOM VARIABLES trucatio argumet, lettig Y i = X i 1 { Xi C} with C a fixed costat. If the twoseries theorem works for the Y i s, ad if i=1 P (X i Y i ) = i=1 P ( X i > C) <, the the Borel-Catelli lemma implies P (X i = Y i for all but fiitely may i) = 1. Ad the i X i coverges a.s. because i Y i does. This proves half of the ext theorem. Theorem 3.7 (Kolmogorov s three-series thereom). Let X 1, X 2,... be idepedet radom variables, ad let C > 0. The i X i coverges almost surely iff (1) i P ( X i > C) <, (2) i E [ X i 1 { Xi C} <, ad (3) i Var ( X i 1 { Xi C}) <. We already proved sufficiecy; for ecessity, see the text. Remark. It seems strage that we ca take ay C > 0 i the theorem above, for the series of trucated variables that result are differet for differet Cs. But as asserted above, they coverge simultaeously. Sometimes we ca make problems easier by choosig the right C. Now we state ad prove the strog law of large umbers. Theorem 3.8 (Strog law of large umbers). If X 1, X 2,... are i.i.d. with E [ X 1 <, the X1+ +X E [X 1 almost surely. Proof sketch. There are several steps. (1) W.l.o.g., assume E [X 1 = 0. (2) Let Y = X 1 { X } ad b = E [Y. Sice E [ X 1 <, we get P ( X 1 > ) <. As P ( X 1 > ) = P ( X > ), we get P ( X > ) < ad hece P (X Y ) <. So by Borel- Catelli, P (X Y oly fiitely ofte) = 1. Thus if Y b coverges a.s., the so does X b. (3) Sice E [X = 0, b E [X = b = E [ X 1 1 { X1 >} 0 as because E [ X 1 <. (This is by the domiated covergece theorem.) (4) A elemetary lemma about ifiite series says that X b < implies (X1 b1)+ +(X b) 0. Sice b 0 implies b1+ +b 0, we coclude X1+ +X 0 as desired. (5) It remais to show Y b <. This is a applicatio of the Komogorov three-series theorem. Coditios (1) ad (2) i the theorem are immediate. Showig (3) requires a estimate based o the assumptio that E [ X 1 <. 4. Kolmogorov s 0-1 Law Cosider radom variables X 1, X 2,... o the ifiite product space (Ω, B, P ) = Π i=1 (R, Borel sets, µ i). Defiitios 4.1. Let B be the σ-field geerated by all evets of the form {X j D} for j where D is ay Borel set. The B is called the σ-field geerated by X, X +1,..., or the future σ-field from time. The tail σ-field B is the σ-field B = =1B. Elemets of B are called tail evets.

31 5. CENTRAL LIMIT THEOREM 31 Oe way to thik about the future σ-field is to otice that if F B, the the defiitio of F does ot deped o X 1,..., X 1. Similarly if F B, the F does ot deped o ay fiite collectio of X i s. Example 4.2. Let ω = (ω 1, ω 2,... ) ad X (ω) = ω. The evets { } E 1 = ω : lim sup X (ω) 5, { } X 1 (ω) + + X (ω) E 2 = ω : lim = 2 are tail evets. Theorem 4.3 (Kolmogorov s 0-1 law). If X 1, X 2,... are idepedet ad A is a tail evet, the either P (A) = 0 or P (A) = 1. Example 4.4. Cosider the tail evet E 2 from the previous example, ad suppose the X i are i.i.d. evets. The if E [X 1 = 2, the strog law of large umbers says P (E 2 ) = 1. Ad if E [X 1 2, the strog law of large umbers says P (E 2 ) = 0. This agrees with the result of the Kolmogorov 0-1 law. Proof. The idea is to show that A is idepedet of A, sice the P (A) = P (A A) = P (A) P (A) ad so (P (A)) 2 P (A) = 0. Hece P (A) (P (A) 1) = 0 ad hece P (A) = 0 or P (A) = 1. Let B be the σ-field geerated by X 1,..., X. A B implies A is idepedet of every evet i B for every. Let A = {evets that are idepedet of A}, the A is a mootoe class (closed uder icreasig/decreasig limits). Sice A cotais each of the fields B, it cotais the whole σ-field B o the ifiite product space. I particular we have A A, so A is idepedet of itself. Lecture 8, 11/1/11 Here are some more examples of tail evets: Examples 4.5. Let {X i } be a sequece of idepedet evets. (1) { 1 ω : lim ( X i (ω)) 3 } { is a tail evet. (2) ω : lim sup 1 log log ( } i=1 X i (ω) a) = b is a tail evet. Here a, b are costats. (3) I fact, {ω : lim X i (ω) exists} is a tail evet. I most cases, this evet has probability zero. The degeerate case is whe the distributios approach a poit mass. (4) { 1 ω : sup i=1 X i (ω) 3 } is ot a tail evet. 5. Cetral Limit Theorem Let X 1, X 2,... be i.i.d. r.v. s ad let S = X X be the th partial sum. Our best law of large umbers says that if µ = E [ X 1 <, the S / µ with probability oe. We ca reiterpret this as sayig 1 (S µ) 0 with probability oe. This suggests the questio: how fast does 1 (S [ µ) ted to zero? The cetral limit theorem gives a aswer, at least whe E (X 1 ) 2 <. The, as we ll see, 1 (S µ) 0 roughly like order 1/. More precisely, the distributio of (S µ) = 1 (S µ) does ot ted to δ 0 as (except i the trivial case where the commo distributio of the X i s is δ µ ).

32 32 3. INDEPENDENT RANDOM VARIABLES [ Theorem 5.1 (Cetral limit theorem). Let X 1, X 2,... be i.i.d. r.v. s with E (X 1 ) 2 1 < ad call µ = E [X 1. The the distributio of (S µ) coverges [ weakly to the ormal distributio with mea zero ad variace σ 2 = E (X 1 µ) 2 = Var (X 1 ). Remark. I the otrivial case σ 0, we ca read the theorem as sayig ( ) S µ u 1 P u e x2 2σ 2 dx 2πσ for every u R. Proof. The proof is a exercise i characteristic fuctios. Recall that (1) The characteristic fuctio of a ormal r.v. W with mea µ ad variace σ 2 is ψ (t) = E [ e itw = e iµt 1 2 σ2 t 2. (2) Distributios of r.v. s W coverge weakly to the distributio of W iff ψ (t) = E [ e itw ψ (t) = E [ e itw for all real t. (3) If Y 1, Y 2,..., Y are idepedet, the E [ e it(α1y1+ +αy) = Π i=1 E [ e i(αit)yi. (4) If E [ Y 2 <, the E [ e ity = 1 + ite [Y + (it)2 2 E [ Y 2 + o ( t 2) = 1 + ie [Y t E [ Y 2 t 2 + o ( t 2) 2 as t 0. See exercise 2.4 i the text; see also the secod proof of the weak law of large umbers (theorem 2.4). Now the proof. By (1) ad (2), we oly eed to prove that E [e it 1 (S µ) e 1 2 σ2 t 2 for all t R. To study 1 (S µ) = 1 i=1 (X i µ), let Y i = X i µ [ ad ote that E [Y i = E [X i µ = 0 ad E (Y i ) 2 = Var (X i ) = σ 2. The by (3), E [e it 1 (S µ) = E [e it P Y i i=1 where φ (t) = E [ e ity1. Fially by (1), = Π i=1e [e it Y i = ( φ ( t )) φ (t) = 1 σ2 2 t2 + o ( t 2) as t 0, so for fixed t R [ ( ) t φ = [1 σ2 t 2 ( t 2 + o = 2 [1 σ2 2 e 1 2 σ2 t 2 by lemma 2.5. This completes the proof. t 2 + o ( 1 ) )

33 5. CENTRAL LIMIT THEOREM 33 There are a variety of extesios of the cetral limit theorem. I what follows we ll assume X 1, X 2,... are idepedet with E [ Xi 2 <, but ot that they are idetically distributed. To simplify the discussio, let s assume E [X i = 0 for all i so that Var (X i ) = E [ Xi 2. (Of course, we ca always replace Xi by Y i = X i E [X i to get to this case.) Let S = X X ad let s 2 = Var (S ) = i=1 Var (X i) = i=1 σ2 i. I the i.i.d. case, σ2 = σ 2 ad the cetral limit theorem says (for σ 2 0) that S /s N (0, 1) i distributio. 2 I the exteded settig, ca we coclude S /s N (0, 1) i distributio? The ext example shows we eed some more assumptios. Example 5.2. Let X 1, X 2,... be i.i.d. ad (to be cocrete) with values ±1 with probability 1/2. Let X j = σ j X j with j=1 σ2 j < (e.g. with σ j = j 2 ). The by the Komogorov 2- or 3-series theorem, sice s 2 = j=1 Var (X j) = j=1 σ2 2 s 2 = j=1 σ2 j < we have that S j=1 σ jx j a.s. ad so S /s 1 s j=1 σ jx j a.s. ad hece also i distributio. But this limit is ot N (0, 1)-distributed. There are (at least) two ways to see why the limit is ot N (0, 1)-distributed. The first is to cosider the characteristic fuctio of the limit. The secod is to observe that i the σ j = j 2 case, the limit is a bouded radom variable. The example above wet wrog because j=1 σ2 j least) eed to assume j=1 σ2 j =. Is this eough? Example 5.3. Let +j with probability p j /2 X j = j with probability p j /2. 0 with probability 1 p j <. So it seems we (at [ The σj 2 = E (X j ) 2 = j 2 p j. Suppose j=1 j2 p j = but also that j=1 P j < (e.g. p j = j 2 ). The by Borel-Catelli (or the 3-series theorem), j=1 X j is a.s. coverget, sice it is a.s. a fiite sum. So S /s 0 a.s. ad therefore ot to N (0, 1) i distributio. The problem i the example above is that although s = j=1 σ2 j =, the mai cotributio to s = j=1 σ2 j comes from very large values of X j. To avoid this pheomeo, oe imposes the Lideberg coditio : If α i deotes the distributio of X i, we require 1 x 2 dα i 0 x ɛ s s 2 i=1 for every ɛ > 0. This says that the cotributio to the variace from very large values of X j is egligible. (Recall for E [X i = 0, Var (X i ) = R x2 dα i.) As we ll see, Lideberg s coditio plus the requiremet that s are eough to imply that S /s N (0, 1) i distributio. Lecture 9, 11/8/11 Theorem [ 5.4 (Lideberg s CLT). Let X 1, X 2,... be idepedet with E [X j = 0 ad E (X j ) 2 = Var (X j ) = σj 2 < ad let S = X X ad s 2 = 2 N (0, 1) meas the stadard ormal distributio, with mea 0 ad variace 1.

34 34 3. INDEPENDENT RANDOM VARIABLES σ σ 2 = Var (S ). The S /s N (0, 1) if s ad if for all ɛ > 0, 1 (5.1) E [ Xj 2 1 { Xj/s >ɛ} 0 as. s 2 j=1 Remark. Coditio (5.1) is called the Lideberg coditio. Proof. We have S s = X1 s + + X s ad suppose X j /s has the characteristic fuctio φ,j. ({φ,j } is a example of a triagular array.) We eed to show Π j=1 φ,j (t) e t2 /2 as for ay fixed t R. The trick: as φ,j (t) is close to 1 for large (by the Lideberg coditio), write φ,j (t) = (1 + (φ,j (t) 1)) = e φ,j(t) 1 + O (φ,j (t) 1) 2 as. So replace φ,j with ψ,j (t) = e φ,j(t) 1 ; oe ca show it suffices to prove Π j=1 ψ,j (t) = e P j=1 (φ,j(t) 1) e t2 /2, or equivaletly ( ) K = φ,j (t) 1 + σ2 j t2 2s 2 0 j=1 j=1 for all t. Observe that [ K E e itxj/s 1 + it X j + t2 (X j ) 2, s 2 sice E [X j = 0 for all j. Now we ll write each term as a itegral w.r.t. α j, the distributio of X j, break up the itegral accordig to where x < ɛs ad x ɛs (with ɛ < 1), ad use the bouds { e ir 1 ir + r2 2 C r 3 r < t C r 2 otherwise with r = tx/s. So K Ct 3 j=1 = I + II. { x <ɛs } x 3 s 3 dα j + C t 2 j=1 s 2 x 2 { x ɛs } s 2 Notice the Lideberg coditio says exactly that II 0. Ad x 3 { x <ɛs } s 3 dα j = ɛ x 2 { x <ɛs } s 2 dα j x 2 ɛ dα j = ɛ σ2 j s 2 R s 2 which gives K ɛct 3 after summig o j. Thus lim sup K Dɛ for all ɛ > 0, ad so lim K = 0. This completes the proof. dα j

35 6. TRIANGULAR ARRAYS AND INFINITE DIVISIBILITY Triagular Arrays ad Ifiite Divisibility The proof of the Lideberg CLT used a example of a triagular array. Here is aother example. Example 6.1. Biomial approximatio of the Poisso distributio. Let S be the umber of successes i idepedet trials with probability p of success o each trial. Claim. If p λ (0, ), the S Poisso (mea λ). Proof. (By characteristic fuctios.) Write S = X,1 + + X, with X,j idepedet Beroulli variables with parameter p. The E [ e its = ( E [ e itx1,) = ( (1 p ) + p e it) = ( ( 1 + p e it 1 )) ( = 1 + λ ( e it 1 ) ( )) 1 + o e λ(eit 1) which is the characteristic fuctio of Poisso (λ). Hece the claim. Now we have two examples of triagular arrays, oe i which we get a ormally distributed limit ad oe i which we get a Poisso limit. This is the begiig of the theory of triagular arrays. Defiitio 6.2. A triagular array cosists of radom variables Note. Ofte k =. X 1,1,..., X 1,k1 X 2,1,..., X 2,k2. We ll cosider the case where the radom variables i each row are idepedet, ad ask whether S = X,1 + + X,k has a limitig distributio as. Examples 6.3. (1) I Lideberg CLT, k =, X,j = X j /s, ad the limit is N (0, 1). (2) I the biomial approximatio of the Poisso distributio, k =, { 1 probability p X,j = 0 probability (1 p ), ad the limit is Poisso (λ). Besides idepedece i each row, we wat o idividual summad to cotribute too much (as ). So we ll assume that for all δ > 0, sup P ( X,j δ) 0 1 j k as. This coditio is called uiform ifiitesimality or uifomly asymptotically egligible, ad is aalogous to the Lideberg coditio (which says idividual summads do ot cotribute too much to the variace). Through all of this, we seek to aswer the questios:

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space