Lecture 20: Multivariate convergence and the Central Limit Theorem

Lecture 20: Multivariate covergece ad the Cetral Limit Theorem Covergece i distributio for radom vectors Let Z,Z 1,Z 2,... be radom vectors o R k. If the cdf of Z is cotiuous, the we ca defie covergece i distributio of Z to Z by lim F Z (x) = F Z (x), for every x R k. But this is ot good eough if F Z is ot always cotiuous. We ca adopt the followig defiitio. Defiitio. Let Z,Z 1,Z 2,... be radom vectors o R k. If, for every c R k, c Z coverges i distributio to c Z, the we say that Z coverges i distributio to Z. For ay costat vector c R k, c Z is a liear combiatio of compoets of Z. Note that i this defiitio, the covergece of c Z has to be true for every c (ot just some c). UW-Madiso (Statistics) Stat 609 Lecture 20 2015 1 / 16

If Z coverges i distributio to Z, the every compoet of Z coverges i distributio to the correspodig compoet of Z. The coverse is ot true: if every compoet of Z coverges i distributio to the correspodig compoet of Z, Z does ot ecessarily coverge i distributio to Z, because each compoet correspods to a particular c oly. (A couter-example is give below.) This is differet from covergece i probability ad covergece almost surely. Theorems 5.5.12 ad 5.5.13 ca be exteded to the case of radom vectors. A couter-example Let Z = (X,Y ) have the joit pdf f (x,y) = 1 2π e (x 2 +y 2 )/2 [1 + g(x,y)], (x,y) R 2 { xy 1 < x < 1, 1 < y < 1 g(x,y) = 0 otherwise UW-Madiso (Statistics) Stat 609 Lecture 20 2015 2 / 16

I Chapter 4 we showed that X N(0,1) ad Y N(0,1), but the joit distributio of Z is ot ormal. Let Z = (X,Y ), where for each, the radom variables X ad Y are idepedet, X N(0,1), ad Y N(0,1). Sice X s have the same N(0,1) distributio for all, obviously that X coverges" i distributio to X N(0,1); similarly, Y coverges i distributio to Y N(0,1). But the joit distributio of Z = (X,Y ) for every is f Z (x,y) = 1 2π e (x 2 +y 2 )/2, (x,y) R 2 sice X ad Y are idepedet. Sice Z s have the same distributio for every, Z coverges i distributio to Z 1, which has a differet distributio from Z. Thus, Z does ot coverge i distributio to Z although each compoet of Z coverges i distributio to the correspodig compoet of Z. UW-Madiso (Statistics) Stat 609 Lecture 20 2015 3 / 16

The covergece i distributio is equivalet to the covergece i chf (Theorem C7 or M3(ii)) so that the covergece i chf is a tool to study covergece i distributio. We ca also use Theorem 2.3.12 or M3(i) to establish covergece i distributio by showig the covergece i mgf, but we have to kow the existece of mgf s. Example. Let X 1,...,X be iid radom variables. We wat to show that there does ot exist a sequece of real umbers {c } such that lim i=1 (X i c i ) exists almost surely, uless P(X 1 = c) = 1 for a costat c. Suppose that Y = lim i=1 (X i c i ) exists almost surely. For ay, the chf of Y = i=1 (X i c i ) is φ Y (t) = i=1 φ X1 (t)e ıtc i = [φ X1 (t)] e ıt(c 1+ +c ) If Y exists almost surely, the Y coverges i distributio to Y ad UW-Madiso (Statistics) Stat 609 Lecture 20 2015 4 / 16

lim [φ X 1 (t)] e ıt(c 1+ +c ) = lim φ X 1 (t) = φ Y (t). However, lim φ X1 (t) is either 0 or 1, depedig o whether φ X1 (t) < 1 or = 1, which meas that φ Y (t) must be either 0 or 1. Sice φ Y (t) is cotiuous ad φ Y (0) = 1, φ Y (t) = 1 for all t ad hece φ X1 (t) = 1 for all t. This proves that P(X 1 = c) = 1 for a costat c. The Cetral Limit Theorem (CLT) is oe of the most importat theorems i probability ad statistics. It derives the limitig distributio of a sequece of ormalized radom variables/vectors. Theorem 5.5.15 (Cetral Limit Theorem) Let X 1,X 2,... be iid radom variables with E(X 1 ) = µ ad Var(X i ) = σ 2 <. The, for ay x R, lim P( ( X µ)/σ x) = Φ(x) = x 1 2π e t2 /2 dt That is, ( X µ)/σ coverges i distributio to Z N(0,1). UW-Madiso (Statistics) Stat 609 Lecture 20 2015 5 / 16

Proof. Normality comes from sums of iid radom variables without distributioal assumptio except the fiiteess of the variace. The assumptio of fiite variace is essetially ecessary for covergece of a sume of iid radom variables to ormality. We apply Theorem C7, i.e., we try to show that the chf of ( X µ) Z = = 1 X σ i µ σ i=1 coverges to the chf of N(0,1), which is e t2 /2. Let φ be the chf of (X i µ)/σ (ot depedig o i sice X i s are iid). From the properties of chf, the chf of Z is [ ( )] t φ Z (t) = φ. It remais to show that lim φ Z (t) = e t2 /2 for t R. Sice E((X i µ)/σ) = 0 ad Var((X i µ)/σ) = 1, by Theorem C1 ad Taylor s expasio, UW-Madiso (Statistics) Stat 609 Lecture 20 2015 6 / 16

where The, for ay t R, φ(s) = 1 s2 2 + R(s) as s 0 lim s 0 R(s)/s2 = 0 ( ) t lim R = 0 ad hece [ ( )] t lim φ Z (t) = lim φ = lim [1 t2 2 + R [ = lim 1 1 { t 2 2 + R [ = lim 1 1 ( )] t 2 2 = e t2 /2 ( t )] ( t )}] UW-Madiso (Statistics) Stat 609 Lecture 20 2015 7 / 16

The multivariate CLT Let X 1,X 2,... be iid radom vectors o R k with E(X 1 ) = µ ad fiite covariace matrix Σ. The ( X µ) coverges i distributio to a radom vector X N(0, Σ), the k-dimesioal ormal distributio with mea 0 ad covariace matrix Σ. Proof. By the defiitio of covergece i distributio for radom vectors, we eed to show that for ay costat vector c R k, c ( X µ) coverges i distributio to c X. From the result discussed earlier, we kow that c X N(0,c Σc). By the CLT, (c X E(c X)) Var(c X) = (c X c µ) c Σc coverges i distributio to Z N(0,1). Note that c ΣcZ N(0,c Σc). The proof is the completed because we ca show from the defiitio that, if Y coverges i distributio to Y, the ay coverges i distributio to ay. UW-Madiso (Statistics) Stat 609 Lecture 20 2015 8 / 16

Example 5.5.16. Let X 1,...,X be a radom sample from a egative-biomial(r,p) distributio. Recall that r(1 p) µ = E(X 1 ) =, σ 2 r(1 p) = Var(X 1 ) = p p 2 The, the CLT tells us that the sequece of ormalized radom variables ( X µ) = σ coverges i distributio to Z N(0,1). ( X r(1 p)/p), = 1,2,... r(1 p)/p 2 This provides a way to approximately calculate probabilities that are very difficult to compute exactly. For example, if r = 10, p = 1/2 ad = 30, usig the fact that 30 i=1 X i egative-biomial(r,p), we ca exactly compute UW-Madiso (Statistics) Stat 609 Lecture 20 2015 9 / 16

( ) 30 P( X 11) = P X i 330 i=1 = 330 x=0 The CLT gives us the approximatio ( P( X 30( X 10) 11) = P 20 ( 300 + x 1 x ) (0.5) 300+x = 0.8916 ) 30(11 20) 20 Φ ( 30(11 20) 20 ) = Φ(1.2247) = 0.8888 where Φ(x) is the cdf of N(0,1). Normal approximatio to biomial Let Z biomial(,p), = 1,2,... To apply the CLT, ote that each Z is a sum of idepedet Beroulli radom variables X 1,...,X, i.e., Z = X 1 + + X, = 1,2,... X 1,X 2,... are iid with E(X i ) = p, Var(X i ) = p(1 p) UW-Madiso (Statistics) Stat 609 Lecture 20 2015 10 / 16

The, X = Z / ad Z p ( X p) = coverges i distributio to Z N(0,1) p(1 p) p(1 p) Cosequetly, for ay m, ( P(Z m) = P Z p p(1 p) ) ( m p Φ p(1 p) ) m p p(1 p) From the previous two examples, to approximate a probability of the form P(Z x), we always first ormalize Z to [Z E(Z )]/ Var(Z ) ad the approximate the probability by Φ([x E(Z )]/ Var(Z )). I fact, a little bit more ca be doe usig the followig result. Pólya s theorem If a sequece of radom vectors Z coverges i distributio to Z ad Z has a cotiuous cdf F Z o R k, the lim F Z (x) F Z (x) = 0. x R k UW-Madiso (Statistics) Stat 609 Lecture 20 2015 11 / 16

If the limitig cdf F Z is cotiuous, the the covergece of F Z to F Z is ot oly for every x, but also uiformly for all x R k. This result implies the followig useful result: If Z coverges i distributio to Z with a cotiuous F Z ad c R k with c c, the F Z (c ) F Z (c). Asymptotic ormality If Y, = 1,2,..., is a sequece of radom variables ad µ ad σ > 0, = 1,2,..., are costats (typically µ = E(Y ) ad σ 2 = Var(Y ) if they are fiite) such that (Y µ )/σ coverges i distributio to N(0,1), the for ay sequece {c } of costats, ( ) lim P(Y c µ c ) Φ = 0, i.e., P(Y c ) ca be approximated by Φ ( c µ ) σ, regardless of whether c, µ, or σ has a limit. Sice Φ ( t µ ) σ is the cdf of N(µ,σ 2 ), Y is said to be asymptotically distributed as N(µ,σ 2 ) or simply asymptotically ormal. σ UW-Madiso (Statistics) Stat 609 Lecture 20 2015 12 / 16

Other forms of CLT There are other forms of CLT for o-iid sequeces of radom variables/vectors. We itroduce without proof the followig three. Lideberg s CLT Let X, = 1,2,..., be idepedet radom variables with ) 0 < σ 2 = Var( X j <, = 1,2,... If lim 1 σ 2 [ ] E (X j E(X j )) 2 I { Xj E(X j ) >εσ } = 0 for ay ε > 0, the 1 σ (X j E(X j )) coverges i distributio to N(0,1). UW-Madiso (Statistics) Stat 609 Lecture 20 2015 13 / 16

Liapouov s CLT Let X, = 1,2,..., be idepedet radom variables with lim 1 σ 2+δ E X j EX j 2+δ = 0 for some δ > 0. The the same coclusio as i Lideberg s CLT holds. The coditio i Liapouov s CLT is stroger tha the coditio i Lideberg s CLT, Liapouov s coditio is easier to verify. The CLT for a liear combiatio of iid radom variables Let X, = 1,2,..., be iid radom variables with E(X 1 ) = µ ad Var(X 1 ) = σ 2 <. Let c R, = 1,2,..., be costats such that ( ) The i=1 c i (X i µ) σ lim / ci 2 i=1 max 1 i c2 i / ci 2 i=1 = 0. coverges i distributio to N(0,1). UW-Madiso (Statistics) Stat 609 Lecture 20 2015 14 / 16

Example. We apply Liapouov s CLT to idepedet radom variables X 1,X 2,... satisfyig P(X j = ±j a ) = P(X j = 0) = 1/3, where a > 0, j = 1,2,... Note that E(X j ) = 0 ad for all j ) σ 2 = Var( X j = j ) = Var(X 2 3 We eed to show that, for some δ > 0, For ay δ > 0, Sice lim 1 σ 2+δ E X j EX j 2+δ = 0 E X j E(X j ) 2+δ = 2 3 1 j t 1 j+1 x t dx j j (2+δ)a. j t j=2 j 2a. UW-Madiso (Statistics) Stat 609 Lecture 20 2015 15 / 16

ad we coclude that for ay t > 0. The lim 1 σ 2+δ 1 j+1 x t dx = x t dx = t+1 1 j 1 t + 1, lim 1 t+1 E X j EX j 2+δ = lim = lim ( 3 = 2 = 0. j t = 1 t + 1 2 3 j(2+δ)a ( 2 3 j2a ) 1+δ/2 ( ) 3 δ/2 (2a + 1) 1+δ/2 2 (2 + δ)a + 1 ) δ/2 (2a + 1) 1+δ/2 (2 + δ)a + 1 lim (2+δ)a+1 (2a+1)(1+δ/2) 1 δ/2 UW-Madiso (Statistics) Stat 609 Lecture 20 2015 16 / 16