Lecture 20: Multivariate convergence and the Central Limit Theorem

Similar documents
Lecture 19: Convergence

Lecture 8: Convergence of transformations and law of large numbers

Large Sample Theory. Convergence. Central Limit Theorems Asymptotic Distribution Delta Method. Convergence in Probability Convergence in Distribution

Probability 2 - Notes 10. Lemma. If X is a random variable and g(x) 0 for all x in the support of f X, then P(g(X) 1) E[g(X)].

This section is optional.

LECTURE 8: ASYMPTOTICS I

Lecture Chapter 6: Convergence of Random Sequences

7.1 Convergence of sequences of random variables

Lecture 18: Sampling distributions

Convergence of random variables. (telegram style notes) P.J.C. Spreij

7.1 Convergence of sequences of random variables

Distribution of Random Samples & Limit theorems

4. Partial Sums and the Central Limit Theorem

STA Object Data Analysis - A List of Projects. January 18, 2018

Lecture 33: Bootstrap

Lecture 23: Minimal sufficiency

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables

ST5215: Advanced Statistical Theory

ECE 330:541, Stochastic Signals and Systems Lecture Notes on Limit Theorems from Probability Fall 2002

1 = δ2 (0, ), Y Y n nδ. , T n = Y Y n n. ( U n,k + X ) ( f U n,k + Y ) n 2n f U n,k + θ Y ) 2 E X1 2 X1

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Lecture 15: Density estimation

Parameter, Statistic and Random Samples

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

EE 4TM4: Digital Communications II Probability Theory

Probability and Random Processes

An Introduction to Asymptotic Theory

The Central Limit Theorem

1 Convergence in Probability and the Weak Law of Large Numbers

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

Chapter 6 Principles of Data Reduction

Notes 19 : Martingale CLT

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS

SDS 321: Introduction to Probability and Statistics

2.2. Central limit theorem.

Solution to Chapter 2 Analytical Exercises

Introduction to Probability. Ariel Yadin

STATISTICAL METHODS FOR BUSINESS

Elements of Statistical Methods Lots of Data or Large Samples (Ch 8)

Mathematics 170B Selected HW Solutions.

Notes 27 : Brownian motion: path properties

Exercise 4.3 Use the Continuity Theorem to prove the Cramér-Wold Theorem, Theorem. (1) φ a X(1).

Entropy Rates and Asymptotic Equipartition

Expectation and Variance of a random variable

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15

Fall 2013 MTH431/531 Real analysis Section Notes

Sequences and Series of Functions

HOMEWORK I: PREREQUISITES FROM MATH 727

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction

MA Advanced Econometrics: Properties of Least Squares Estimators

Unbiased Estimation. February 7-12, 2008

Lecture 24: Variable selection in linear models

32 estimating the cumulative distribution function

5. Limit Theorems, Part II: Central Limit Theorem. ECE 302 Fall 2009 TR 3 4:15pm Purdue University, School of ECE Prof.

STAT Homework 1 - Solutions

Kernel density estimator

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Probability for mathematicians INDEPENDENCE TAU

Variance of Discrete Random Variables Class 5, Jeremy Orloff and Jonathan Bloom

Asymptotic Results for the Linear Regression Model

Lecture 3 : Random variables and their distributions

Limit Theorems. Convergence in Probability. Let X be the number of heads observed in n tosses. Then, E[X] = np and Var[X] = np(1-p).

Lecture 16: UMVUE: conditioning on sufficient and complete statistics

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. Comments:

Regression with an Evaporating Logarithmic Trend

Lecture 3. Properties of Summary Statistics: Sampling Distribution

Solutions: Homework 3

STAT Homework 2 - Solutions

MAT1026 Calculus II Basic Convergence Tests for Series

Singular Continuous Measures by Michael Pejic 5/14/10

Math 525: Lecture 5. January 18, 2018

BHW #13 1/ Cooper. ENGR 323 Probabilistic Analysis Beautiful Homework # 13

Asymptotic distribution of products of sums of independent random variables

Lecture 7: Properties of Random Samples

MIT Spring 2016

Estimation of the Mean and the ACVF

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013

Advanced Stochastic Processes.

Math 61CM - Solutions to homework 3

Are the following series absolutely convergent? n=1. n 3. n=1 n. ( 1) n. n=1 n=1

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

The central limit theorem for Student s distribution. Problem Karim M. Abadir and Jan R. Magnus. Econometric Theory, 19, 1195 (2003)

11 THE GMM ESTIMATION

Central Limit Theorem using Characteristic functions

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Lecture 4. Random variable and distribution of probability

2.1. Convergence in distribution and characteristic functions.

Chapter 6 Infinite Series

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

AMS570 Lecture Notes #2

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization

Introduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT

Problems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman:

4. Basic probability theory

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Statistical Analysis on Uncertainty for Autocorrelated Measurements and its Applications to Key Comparisons

sin(n) + 2 cos(2n) n 3/2 3 sin(n) 2cos(2n) n 3/2 a n =

6.3 Testing Series With Positive Terms

Transcription:

Lecture 20: Multivariate covergece ad the Cetral Limit Theorem Covergece i distributio for radom vectors Let Z,Z 1,Z 2,... be radom vectors o R k. If the cdf of Z is cotiuous, the we ca defie covergece i distributio of Z to Z by lim F Z (x) = F Z (x), for every x R k. But this is ot good eough if F Z is ot always cotiuous. We ca adopt the followig defiitio. Defiitio. Let Z,Z 1,Z 2,... be radom vectors o R k. If, for every c R k, c Z coverges i distributio to c Z, the we say that Z coverges i distributio to Z. For ay costat vector c R k, c Z is a liear combiatio of compoets of Z. Note that i this defiitio, the covergece of c Z has to be true for every c (ot just some c). UW-Madiso (Statistics) Stat 609 Lecture 20 2015 1 / 16

If Z coverges i distributio to Z, the every compoet of Z coverges i distributio to the correspodig compoet of Z. The coverse is ot true: if every compoet of Z coverges i distributio to the correspodig compoet of Z, Z does ot ecessarily coverge i distributio to Z, because each compoet correspods to a particular c oly. (A couter-example is give below.) This is differet from covergece i probability ad covergece almost surely. Theorems 5.5.12 ad 5.5.13 ca be exteded to the case of radom vectors. A couter-example Let Z = (X,Y ) have the joit pdf f (x,y) = 1 2π e (x 2 +y 2 )/2 [1 + g(x,y)], (x,y) R 2 { xy 1 < x < 1, 1 < y < 1 g(x,y) = 0 otherwise UW-Madiso (Statistics) Stat 609 Lecture 20 2015 2 / 16

I Chapter 4 we showed that X N(0,1) ad Y N(0,1), but the joit distributio of Z is ot ormal. Let Z = (X,Y ), where for each, the radom variables X ad Y are idepedet, X N(0,1), ad Y N(0,1). Sice X s have the same N(0,1) distributio for all, obviously that X coverges" i distributio to X N(0,1); similarly, Y coverges i distributio to Y N(0,1). But the joit distributio of Z = (X,Y ) for every is f Z (x,y) = 1 2π e (x 2 +y 2 )/2, (x,y) R 2 sice X ad Y are idepedet. Sice Z s have the same distributio for every, Z coverges i distributio to Z 1, which has a differet distributio from Z. Thus, Z does ot coverge i distributio to Z although each compoet of Z coverges i distributio to the correspodig compoet of Z. UW-Madiso (Statistics) Stat 609 Lecture 20 2015 3 / 16

The covergece i distributio is equivalet to the covergece i chf (Theorem C7 or M3(ii)) so that the covergece i chf is a tool to study covergece i distributio. We ca also use Theorem 2.3.12 or M3(i) to establish covergece i distributio by showig the covergece i mgf, but we have to kow the existece of mgf s. Example. Let X 1,...,X be iid radom variables. We wat to show that there does ot exist a sequece of real umbers {c } such that lim i=1 (X i c i ) exists almost surely, uless P(X 1 = c) = 1 for a costat c. Suppose that Y = lim i=1 (X i c i ) exists almost surely. For ay, the chf of Y = i=1 (X i c i ) is φ Y (t) = i=1 φ X1 (t)e ıtc i = [φ X1 (t)] e ıt(c 1+ +c ) If Y exists almost surely, the Y coverges i distributio to Y ad UW-Madiso (Statistics) Stat 609 Lecture 20 2015 4 / 16

lim [φ X 1 (t)] e ıt(c 1+ +c ) = lim φ X 1 (t) = φ Y (t). However, lim φ X1 (t) is either 0 or 1, depedig o whether φ X1 (t) < 1 or = 1, which meas that φ Y (t) must be either 0 or 1. Sice φ Y (t) is cotiuous ad φ Y (0) = 1, φ Y (t) = 1 for all t ad hece φ X1 (t) = 1 for all t. This proves that P(X 1 = c) = 1 for a costat c. The Cetral Limit Theorem (CLT) is oe of the most importat theorems i probability ad statistics. It derives the limitig distributio of a sequece of ormalized radom variables/vectors. Theorem 5.5.15 (Cetral Limit Theorem) Let X 1,X 2,... be iid radom variables with E(X 1 ) = µ ad Var(X i ) = σ 2 <. The, for ay x R, lim P( ( X µ)/σ x) = Φ(x) = x 1 2π e t2 /2 dt That is, ( X µ)/σ coverges i distributio to Z N(0,1). UW-Madiso (Statistics) Stat 609 Lecture 20 2015 5 / 16

Proof. Normality comes from sums of iid radom variables without distributioal assumptio except the fiiteess of the variace. The assumptio of fiite variace is essetially ecessary for covergece of a sume of iid radom variables to ormality. We apply Theorem C7, i.e., we try to show that the chf of ( X µ) Z = = 1 X σ i µ σ i=1 coverges to the chf of N(0,1), which is e t2 /2. Let φ be the chf of (X i µ)/σ (ot depedig o i sice X i s are iid). From the properties of chf, the chf of Z is [ ( )] t φ Z (t) = φ. It remais to show that lim φ Z (t) = e t2 /2 for t R. Sice E((X i µ)/σ) = 0 ad Var((X i µ)/σ) = 1, by Theorem C1 ad Taylor s expasio, UW-Madiso (Statistics) Stat 609 Lecture 20 2015 6 / 16

where The, for ay t R, φ(s) = 1 s2 2 + R(s) as s 0 lim s 0 R(s)/s2 = 0 ( ) t lim R = 0 ad hece [ ( )] t lim φ Z (t) = lim φ = lim [1 t2 2 + R [ = lim 1 1 { t 2 2 + R [ = lim 1 1 ( )] t 2 2 = e t2 /2 ( t )] ( t )}] UW-Madiso (Statistics) Stat 609 Lecture 20 2015 7 / 16

The multivariate CLT Let X 1,X 2,... be iid radom vectors o R k with E(X 1 ) = µ ad fiite covariace matrix Σ. The ( X µ) coverges i distributio to a radom vector X N(0, Σ), the k-dimesioal ormal distributio with mea 0 ad covariace matrix Σ. Proof. By the defiitio of covergece i distributio for radom vectors, we eed to show that for ay costat vector c R k, c ( X µ) coverges i distributio to c X. From the result discussed earlier, we kow that c X N(0,c Σc). By the CLT, (c X E(c X)) Var(c X) = (c X c µ) c Σc coverges i distributio to Z N(0,1). Note that c ΣcZ N(0,c Σc). The proof is the completed because we ca show from the defiitio that, if Y coverges i distributio to Y, the ay coverges i distributio to ay. UW-Madiso (Statistics) Stat 609 Lecture 20 2015 8 / 16

Example 5.5.16. Let X 1,...,X be a radom sample from a egative-biomial(r,p) distributio. Recall that r(1 p) µ = E(X 1 ) =, σ 2 r(1 p) = Var(X 1 ) = p p 2 The, the CLT tells us that the sequece of ormalized radom variables ( X µ) = σ coverges i distributio to Z N(0,1). ( X r(1 p)/p), = 1,2,... r(1 p)/p 2 This provides a way to approximately calculate probabilities that are very difficult to compute exactly. For example, if r = 10, p = 1/2 ad = 30, usig the fact that 30 i=1 X i egative-biomial(r,p), we ca exactly compute UW-Madiso (Statistics) Stat 609 Lecture 20 2015 9 / 16

( ) 30 P( X 11) = P X i 330 i=1 = 330 x=0 The CLT gives us the approximatio ( P( X 30( X 10) 11) = P 20 ( 300 + x 1 x ) (0.5) 300+x = 0.8916 ) 30(11 20) 20 Φ ( 30(11 20) 20 ) = Φ(1.2247) = 0.8888 where Φ(x) is the cdf of N(0,1). Normal approximatio to biomial Let Z biomial(,p), = 1,2,... To apply the CLT, ote that each Z is a sum of idepedet Beroulli radom variables X 1,...,X, i.e., Z = X 1 + + X, = 1,2,... X 1,X 2,... are iid with E(X i ) = p, Var(X i ) = p(1 p) UW-Madiso (Statistics) Stat 609 Lecture 20 2015 10 / 16

The, X = Z / ad Z p ( X p) = coverges i distributio to Z N(0,1) p(1 p) p(1 p) Cosequetly, for ay m, ( P(Z m) = P Z p p(1 p) ) ( m p Φ p(1 p) ) m p p(1 p) From the previous two examples, to approximate a probability of the form P(Z x), we always first ormalize Z to [Z E(Z )]/ Var(Z ) ad the approximate the probability by Φ([x E(Z )]/ Var(Z )). I fact, a little bit more ca be doe usig the followig result. Pólya s theorem If a sequece of radom vectors Z coverges i distributio to Z ad Z has a cotiuous cdf F Z o R k, the lim F Z (x) F Z (x) = 0. x R k UW-Madiso (Statistics) Stat 609 Lecture 20 2015 11 / 16

If the limitig cdf F Z is cotiuous, the the covergece of F Z to F Z is ot oly for every x, but also uiformly for all x R k. This result implies the followig useful result: If Z coverges i distributio to Z with a cotiuous F Z ad c R k with c c, the F Z (c ) F Z (c). Asymptotic ormality If Y, = 1,2,..., is a sequece of radom variables ad µ ad σ > 0, = 1,2,..., are costats (typically µ = E(Y ) ad σ 2 = Var(Y ) if they are fiite) such that (Y µ )/σ coverges i distributio to N(0,1), the for ay sequece {c } of costats, ( ) lim P(Y c µ c ) Φ = 0, i.e., P(Y c ) ca be approximated by Φ ( c µ ) σ, regardless of whether c, µ, or σ has a limit. Sice Φ ( t µ ) σ is the cdf of N(µ,σ 2 ), Y is said to be asymptotically distributed as N(µ,σ 2 ) or simply asymptotically ormal. σ UW-Madiso (Statistics) Stat 609 Lecture 20 2015 12 / 16

Other forms of CLT There are other forms of CLT for o-iid sequeces of radom variables/vectors. We itroduce without proof the followig three. Lideberg s CLT Let X, = 1,2,..., be idepedet radom variables with ) 0 < σ 2 = Var( X j <, = 1,2,... If lim 1 σ 2 [ ] E (X j E(X j )) 2 I { Xj E(X j ) >εσ } = 0 for ay ε > 0, the 1 σ (X j E(X j )) coverges i distributio to N(0,1). UW-Madiso (Statistics) Stat 609 Lecture 20 2015 13 / 16

Liapouov s CLT Let X, = 1,2,..., be idepedet radom variables with lim 1 σ 2+δ E X j EX j 2+δ = 0 for some δ > 0. The the same coclusio as i Lideberg s CLT holds. The coditio i Liapouov s CLT is stroger tha the coditio i Lideberg s CLT, Liapouov s coditio is easier to verify. The CLT for a liear combiatio of iid radom variables Let X, = 1,2,..., be iid radom variables with E(X 1 ) = µ ad Var(X 1 ) = σ 2 <. Let c R, = 1,2,..., be costats such that ( ) The i=1 c i (X i µ) σ lim / ci 2 i=1 max 1 i c2 i / ci 2 i=1 = 0. coverges i distributio to N(0,1). UW-Madiso (Statistics) Stat 609 Lecture 20 2015 14 / 16

Example. We apply Liapouov s CLT to idepedet radom variables X 1,X 2,... satisfyig P(X j = ±j a ) = P(X j = 0) = 1/3, where a > 0, j = 1,2,... Note that E(X j ) = 0 ad for all j ) σ 2 = Var( X j = j ) = Var(X 2 3 We eed to show that, for some δ > 0, For ay δ > 0, Sice lim 1 σ 2+δ E X j EX j 2+δ = 0 E X j E(X j ) 2+δ = 2 3 1 j t 1 j+1 x t dx j j (2+δ)a. j t j=2 j 2a. UW-Madiso (Statistics) Stat 609 Lecture 20 2015 15 / 16

ad we coclude that for ay t > 0. The lim 1 σ 2+δ 1 j+1 x t dx = x t dx = t+1 1 j 1 t + 1, lim 1 t+1 E X j EX j 2+δ = lim = lim ( 3 = 2 = 0. j t = 1 t + 1 2 3 j(2+δ)a ( 2 3 j2a ) 1+δ/2 ( ) 3 δ/2 (2a + 1) 1+δ/2 2 (2 + δ)a + 1 ) δ/2 (2a + 1) 1+δ/2 (2 + δ)a + 1 lim (2+δ)a+1 (2a+1)(1+δ/2) 1 δ/2 UW-Madiso (Statistics) Stat 609 Lecture 20 2015 16 / 16