Lecture 19: Convergence

Asymptotic approach

In statistical analysis or inference, a key to the success of finding a good procedure is being able to find some moments and/or distributions of various statistics. In many complicated problems we are not able to find exactly the moments or distributions of given statistics.

When the sample size n is large, we may approximate the moments and distributions of statistics using asymptotic tools, some of which are studied in this course. In an asymptotic analysis, we consider a sample X = (X_1, ..., X_n) not for fixed n, but as a member of a sequence corresponding to n = n_0, n_0 + 1, ..., and obtain the limit of the distribution of an appropriately normalized statistic or variable T_n(X) as n → ∞. The limiting distribution and its moments are used as approximations to the distribution and moments of T_n(X) in the situation with a large but actually finite n.

UW-Madison (Statistics) Stat 609 Lecture 19 2015 1 / 17

This leads to some asymptotic statistical procedures and asymptotic criteria for assessing their performances. The asymptotic approach is not only applied in situations where no exact method (the approach considering a fixed n) is available, but is also used to provide a procedure simpler (e.g., in terms of computation) than that produced by the exact approach. In addition to providing more theoretical results and/or simpler procedures, the asymptotic approach requires less stringent mathematical assumptions than does the exact approach.

Definition 5.5.1 (convergence in probability)
A sequence of random variables Z_n, n = 1, 2, ..., converges in probability to a random variable Z iff for every ε > 0,
lim_{n→∞} P(|Z_n − Z| ≥ ε) = 0.
A sequence of random vectors Z_n converges in probability to a random vector Z iff each component of Z_n converges in probability to the corresponding component of Z.

Theorem 5.5.2 (Weak Law of Large Numbers (WLLN))
Let X_1, ..., X_n be iid random variables with E(X_i) = µ and finite Var(X_i) = σ². Then the sample mean X̄ converges in probability to µ.

Proof.
By Chebychev's inequality and Theorem 5.2.6,
P(|X̄ − µ| ≥ ε) ≤ Var(X̄)/ε² = σ²/(nε²),
which converges to 0 as n → ∞.

Remarks.
1. Although we write the sample mean as X̄, it depends on n.
2. The WLLN states that the probability of the sample mean X̄ being close to the population mean µ converges to 1.
3. The existence of a finite variance σ² is not needed; we only need the existence of E(X_i); a proof will be given later.
4. The independence assumption is not necessary either: in the previous proof, we only need that the X_i's are uncorrelated.
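The WLLN and the Chebychev bound in the proof can be checked empirically. The following Python sketch (not part of the lecture; the function name freq_far_from_mean and all parameter values are chosen for illustration) estimates P(|X̄ − µ| ≥ ε) for Bernoulli(p) samples, where µ = p and σ² = p(1 − p), and compares it with the bound σ²/(nε²):

```python
import random

# Illustration (not from the lecture): empirical frequency of the event
# |Xbar - p| >= eps for Bernoulli(p) data, versus the Chebychev bound
# p(1-p)/(n eps^2) used in the proof of the WLLN.
random.seed(0)

def freq_far_from_mean(n, p, eps, reps=2000):
    """Fraction of replications in which |Xbar_n - p| >= eps."""
    count = 0
    for _ in range(reps):
        xbar = sum(random.random() < p for _ in range(n)) / n
        if abs(xbar - p) >= eps:
            count += 1
    return count / reps

p, eps = 0.5, 0.1
for n in (10, 100, 1000):
    emp = freq_far_from_mean(n, p, eps)
    bound = p * (1 - p) / (n * eps ** 2)
    print(n, emp, min(bound, 1.0))
```

The empirical frequency shrinks toward 0 as n grows, and for each n it sits below the (often loose) Chebychev bound.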

Example.
Suppose that X_1, ..., X_n are identically distributed with E(X_i) = µ and Var(X_i) = σ² < ∞, and that
Cov(X_t, X_s) = c if |s − t| = 1, and 0 if |s − t| > 1.
Then X̄ converges in probability to µ, because
Var(X̄) = (1/n²) Var(Σ_{i=1}^n X_i) = (1/n²) [Σ_{i=1}^n Var(X_i) + Σ_{i≠j} Cov(X_i, X_j)] = (1/n²) [nσ² + 2(n − 1)c] = [σ² + 2(1 − 1/n)c]/n
and, by Chebychev's inequality,
P(|X̄ − µ| ≥ ε) ≤ Var(X̄)/ε² = [σ² + 2(1 − 1/n)c]/(nε²) → 0.
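A concrete sequence with exactly this covariance structure (an assumption of this sketch, not from the lecture) is the moving average X_i = µ + (Z_i + Z_{i+1})/2 with Z_j iid standard normal: then Cov(X_i, X_j) = 1/4 when |i − j| = 1 and 0 when |i − j| > 1, and the example says X̄ should still converge to µ:

```python
import random

# Illustration (not from the lecture): a 1-dependent sequence built as a
# moving average X_i = mu + (Z_i + Z_{i+1})/2 with Z_j iid N(0,1), so
# adjacent X_i's are correlated (c = 1/4) but Xbar still converges to mu.
random.seed(1)

def sample_mean(n, mu=2.0):
    z = [random.gauss(0.0, 1.0) for _ in range(n + 1)]
    x = [mu + (z[i] + z[i + 1]) / 2 for i in range(n)]
    return sum(x) / n

for n in (100, 10000):
    print(n, sample_mean(n))
```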

A proof of the WLLN using chf's
Let X_1, ..., X_n be iid random variables with E|X_1| < ∞ and E(X_i) = µ. From the result for the chf (Theorem C1), the chf of X_1 is differentiable at 0 and
φ_{X_1}(t) = 1 + ıµt + o(|t|) as t → 0.
Then the chf of X̄ is
φ_{X̄}(t) = [φ_{X_1}(t/n)]^n = [1 + ıµt/n + o(|t|/n)]^n → e^{ıµt} for any t ∈ R as n → ∞,
because (1 + c_n/n)^n → e^c for any complex sequence {c_n} satisfying c_n → c. The limiting function e^{ıµt} is the chf of the constant µ. By Theorem C7, if F_{X̄}(x) is the cdf of X̄, then
lim_{n→∞} F_{X̄}(x) = 0 if x < µ, and 1 if x > µ.
This shows that X̄ converges in probability to µ, because of Theorem 5.5.13 to be established later.
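The key limit φ_{X̄}(t) → e^{ıµt} can be seen numerically in a case where the chf is known in closed form (this choice of distribution is an assumption for illustration, not from the slides): for X_1 exponential with mean µ = 1, φ_{X_1}(t) = 1/(1 − ıt), so φ_{X̄}(t) = (1 − ıt/n)^{−n}:

```python
import cmath

# Numeric illustration (not in the slides): for X_1 ~ Exponential with mean
# mu = 1, the chf of Xbar is (1/(1 - it/n))^n, which should approach
# e^{i mu t} = e^{it}, the chf of the point mass at mu.
def chf_sample_mean(t, n):
    return (1.0 / (1.0 - 1j * t / n)) ** n

t = 1.5
limit = cmath.exp(1j * t)  # chf of the constant mu = 1
for n in (10, 100, 10000):
    print(n, abs(chf_sample_mean(t, n) - limit))
```

The printed distances to e^{ıt} shrink roughly like 1/n, as the expansion 1 + ıµt/n + o(|t|/n) suggests.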

Theorem 5.5.4.
Let Z_1, Z_2, ... be random vectors that converge in probability to a random vector Z and let h be a continuous function. Then h(Z_1), h(Z_2), ... converges in probability to h(Z).

Example 5.5.3.
Let X_1, X_2, ... be iid random variables with E(X_i) = µ and Var(X_i) = σ². Consider the sample variance
S² = [1/(n − 1)] Σ_{i=1}^n (X_i − X̄)² = [n/(n − 1)] { (1/n) Σ_{i=1}^n (X_i − µ)² − (X̄ − µ)² }.
Define
Z_n = (1/n) Σ_{i=1}^n (X_i − µ)², U_n = X̄ − µ, a_n = n/(n − 1).
By the WLLN, (Z_n, U_n) converges in probability to (σ², 0). Note that a_n → 1 and a_n is not random, but we can view a_n as converging in probability to 1. Then, by Theorem 5.5.4,
S² = h(a_n, Z_n, U_n) = a_n (Z_n − U_n²)
converges in probability to h(1, σ², 0) = σ².
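A quick simulation (not part of the lecture; the distribution and parameter values are chosen for illustration) of Example 5.5.3: the sample variance of N(µ, σ²) data with σ² = 4 settles near 4 as n grows:

```python
import random

# Illustration (not from the lecture): the sample variance S^2 of iid
# N(1, 4) data converging in probability to sigma^2 = 4.
random.seed(2)

def sample_variance(n, mu=1.0, sigma=2.0):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    return sum((xi - xbar) ** 2 for xi in x) / (n - 1)

for n in (50, 50000):
    s2 = sample_variance(n)
    print(n, s2)
```

Taking square roots of the printed values also previews Example 5.5.5: S = √(S²) approaches σ = 2.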

Example 5.5.5.
Consider h(x) = √x. By Theorem 5.5.4, the sample standard deviation S = h(S²) converges in probability to the population standard deviation σ = h(σ²).

Note that convergence in probability is different from the convergence of a sequence of deterministic functions g_n(x) to a function g(x) for x in a set A ⊂ R^k. Similar to the convergence of deterministic functions (note that random variables are random functions), we have the following concept.

Definition 5.5.6 (convergence almost surely)
A sequence of random variables Z_n, n = 1, 2, ..., converges almost surely to a random variable Z iff
P(lim_{n→∞} Z_n = Z) = 1.
A sequence of random vectors Z_n, n = 1, 2, ..., converges almost surely to a random vector Z iff each component of Z_n converges almost surely to the corresponding component of Z.

The almost sure convergence of Z_n to Z means that there is an event N such that P(N) = 0 and, for every element ω ∈ N^c,
lim_{n→∞} Z_n(ω) = Z(ω),
which is almost the same as pointwise convergence for deterministic functions (Example 5.5.7).

If a sequence of random vectors Z_n converges almost surely to a random vector Z, and h is a continuous function, then h(Z_n) converges almost surely to h(Z).

If Z_n converges almost surely to Z, then Z_n converges in probability to Z. Convergence in probability, however, does not imply almost sure convergence (Example 5.5.8).

If Z_n converges in probability to Z fast enough, then it converges almost surely, i.e., if for every ε > 0,
Σ_{n=1}^∞ P(|Z_n − Z| ≥ ε) < ∞,
then Z_n converges almost surely to Z.

It is, however, not easy to construct an example of convergence in probability but not almost surely. Similar to the WLLN in Theorem 5.5.2, we have the following result with almost sure convergence.

Theorem 5.5.9 (Strong Law of Large Numbers (SLLN))
Let X_1, ..., X_n be iid random variables with E(X_i) = µ. Then the sample mean X̄ converges almost surely to µ.

Note that we still only require the existence of the mean, not the second-order moment. The proof is omitted, since it is out of the scope of the textbook.

Approximation to an integral
Suppose that h(x) is a function of x ∈ R^k. In many applications we want to calculate an integral
∫_{R^k} h(x) dx.

If the integral is not easy to calculate, a numerical method is needed. The following is the so-called Monte Carlo approximation method, which is based on the SLLN. Suppose that we can generate iid random vectors X_1, X_2, ... from a pdf p(x) on R^k satisfying p(x) > 0 whenever h(x) ≠ 0. By the SLLN, with probability equal to 1 (almost surely),
lim_{n→∞} (1/n) Σ_{i=1}^n h(X_i)/p(X_i) = E[h(X_1)/p(X_1)] = ∫_{R^k} [h(x)/p(x)] p(x) dx = ∫_{R^k} h(x) dx.
Thus, we can approximate the integral by the average
(1/n) Σ_{i=1}^n h(X_i)/p(X_i)
with a very large n. We can actually find what n is large enough to give a good approximation.
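The method above can be sketched in a few lines. In this illustration (the choice h(x) = x² e^{−x} and p(x) = e^{−x} is an assumption, not from the slides), the exact integral over (0, ∞) is Γ(3) = 2, and h(X_i)/p(X_i) reduces to X_i²:

```python
import random

# Monte Carlo sketch of the slide's method: approximate the integral of
# h(x) = x^2 e^{-x} over (0, infinity), with exact value Gamma(3) = 2, by
# drawing X_i iid from the Exponential(1) pdf p(x) = e^{-x} and averaging
# h(X_i)/p(X_i) = X_i^2.  The SLLN gives almost sure convergence.
random.seed(3)

def mc_integral(n):
    total = 0.0
    for _ in range(n):
        x = random.expovariate(1.0)  # draw from p
        total += x * x               # h(x)/p(x) = x^2
    return total / n

for n in (100, 100000):
    print(n, mc_integral(n))
```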

We often need to consider a convergence even weaker than convergence in probability.

Definition 5.5.10 (convergence in distribution)
A sequence of random variables Z_n, n = 1, 2, ..., converges in distribution to a random variable Z iff
lim_{n→∞} F_{Z_n}(x) = F_Z(x) for every x ∈ {y : F_Z(y) is continuous},
where F_{Z_n} and F_Z are the cdf's of Z_n and Z, respectively.

Note that we only need to consider the convergence at x that is a continuity point of F_Z. Note also that cdf's, not pdf's or pmf's, are involved in this definition. In convergence in distribution, it is really the cdf's that converge, not the random variables; in fact, the random variables can be defined on different spaces, which is very different from convergence in probability or almost surely.

Example 5.5.11.
Let X_1, X_2, ... be iid from the uniform distribution on (0,1) and X_(n) = max_{i=1,...,n} X_i. For every ε > 0,
P(|X_(n) − 1| ≥ ε) = P(X_(n) ≤ 1 − ε) + P(X_(n) ≥ 1 + ε) = P(X_(n) ≤ 1 − ε) = P(X_i ≤ 1 − ε, i = 1, ..., n) = (1 − ε)^n.
Hence, X_(n) converges in probability to 1. In fact, since Σ_{n=1}^∞ (1 − ε)^n < ∞, X_(n) converges almost surely to 1.

For any t > 0,
P(n(1 − X_(n)) ≤ t) = 1 − P(n(1 − X_(n)) > t) = 1 − P(X_(n) < 1 − t/n) = 1 − (1 − t/n)^n → 1 − e^{−t},
which is the cdf of the exponential(0,1) distribution.
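A simulation sketch of this example (not in the slides; the function name empirical_cdf_at and the values of n, t, and reps are chosen for illustration) compares the empirical cdf of n(1 − X_(n)) at a point t with the limit 1 − e^{−t}:

```python
import math
import random

# Simulation of Example 5.5.11 (not in the slides): for X_i iid Uniform(0,1),
# the empirical P(n(1 - max) <= t) should be close to 1 - e^{-t} for large n.
random.seed(4)

def empirical_cdf_at(t, n, reps=3000):
    """Empirical P(n(1 - X_(n)) <= t) over many replications."""
    count = 0
    for _ in range(reps):
        m = max(random.random() for _ in range(n))
        if n * (1 - m) <= t:
            count += 1
    return count / reps

n, t = 1000, 1.0
print(empirical_cdf_at(t, n), 1 - math.exp(-t))
```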

It is clear that P(n(1 − X_(n)) ≤ t) = 0 if t ≤ 0. Thus, n(1 − X_(n)) converges in distribution to Z ~ exponential(0,1).

The next theorem shows that convergence in distribution is weaker than convergence in probability and, hence, is also weaker than almost sure convergence.

Theorem 5.5.12.
If Z_n converges in probability to Z, then Z_n converges in distribution to Z.

Proof.
For any x ∈ R and ε > 0,
F_Z(x − ε) = P(Z ≤ x − ε) = P(Z ≤ x − ε, Z_n ≤ x) + P(Z ≤ x − ε, Z_n > x) ≤ F_{Z_n}(x) + P(|Z_n − Z| > ε).
Letting n → ∞, we obtain
F_Z(x − ε) ≤ liminf_{n→∞} F_{Z_n}(x).

Switching Z_n and Z in the previous argument, we can show that
F_{Z_n}(x) ≤ F_Z(x + ε) + P(|Z_n − Z| > ε),
i.e.,
limsup_{n→∞} F_{Z_n}(x) ≤ F_Z(x + ε).
Since ε is arbitrary,
lim_{ε↓0} F_Z(x − ε) ≤ liminf_{n→∞} F_{Z_n}(x) ≤ limsup_{n→∞} F_{Z_n}(x) ≤ lim_{ε↓0} F_Z(x + ε).
Now, if F_Z is continuous at x, then the limit on the far left-hand side equals the limit on the far right-hand side, and both are equal to F_Z(x), which shows that F_Z(x) = lim_{n→∞} F_{Z_n}(x).

Example.
The converse of Theorem 5.5.12 is not true in general. Let θ_n = 1 + 1/n and let X_n be a random variable having the exponential(0, θ_n) distribution, n = 1, 2, .... Let X be a random variable having the exponential(0, 1) distribution.

For any x > 0, as n → ∞,
F_{X_n}(x) = 1 − e^{−x/θ_n} → 1 − e^{−x} = F_X(x).
Since F_{X_n}(x) ≡ 0 ≡ F_X(x) for x ≤ 0, we have shown that X_n converges in distribution to X.

Does X_n converge in probability to X? We need further information about the random variables X_n and X; we consider two cases in which different answers can be obtained.

Case 1.
Suppose that X_n = θ_n X (then X_n has the given distribution). Then
|X_n − X| = (θ_n − 1)X = X/n,
which has the cdf (1 − e^{−nx}) I_[0,∞)(x). Then X_n converges in probability to X, because, for any ε > 0,
P(|X_n − X| ≥ ε) = e^{−nε} → 0.
In fact, X_n converges almost surely to X, since
Σ_{n=1}^∞ e^{−nε} < ∞.

Case 2.
Suppose that X_n and X are independent random variables. Since the pdf's of X_n and X are θ_n^{−1} e^{−x/θ_n} I_(0,∞)(x) and e^{−y} I_(0,∞)(y), respectively, we have
P(|X_n − X| ≤ ε) = ∫∫_{|x−y|≤ε} θ_n^{−1} e^{−x/θ_n} e^{−y} I_(0,∞)(x) I_(0,∞)(y) dx dy,
which converges (by the dominated convergence theorem) to
∫∫_{|x−y|≤ε} e^{−x} e^{−y} I_(0,∞)(x) I_(0,∞)(y) dx dy = 1 − e^{−ε}.
Thus,
P(|X_n − X| ≥ ε) → e^{−ε} > 0
for any ε > 0 and, therefore, X_n does not converge in probability to X.

In one situation, however, convergence in distribution is equivalent to convergence in probability, as the following result shows.

Theorem 5.5.13.
Z_n converges in probability to a constant c iff Z_n converges in distribution to c.

Proof.
The "only if" part is a special case of Theorem 5.5.12. Hence, we only need to show the "if" part. If Z_n converges in distribution to a constant c, then
lim_{n→∞} P(Z_n ≤ x) = 0 if x < c, and 1 if x > c,
which is the cdf of the constant c. (Note that the limit does not include the case x = c, which is a discontinuity point of the cdf of c.) For every ε > 0,
P(|Z_n − c| ≥ ε) = P(Z_n ≥ c + ε) + P(Z_n ≤ c − ε) ≤ 1 − P(Z_n ≤ c + ε/2) + P(Z_n ≤ c − ε) → 1 − 1 + 0 = 0,
since c − ε < c and c + ε/2 > c. This proves that Z_n converges in probability to c.
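As a numerical companion to Case 2 of the example above (a sketch, not part of the slides; the function name prob_far and the values of n, ε, and reps are chosen for illustration): with X_n and X independent, the estimated P(|X_n − X| ≥ ε) stays near e^{−ε} > 0 even for very large n, so convergence in distribution holds without convergence in probability:

```python
import math
import random

# Simulation of Case 2 (not in the slides): X_n ~ exponential with mean
# theta_n = 1 + 1/n, X ~ exponential with mean 1, drawn independently.
# P(|X_n - X| >= eps) should stay near e^{-eps} rather than go to 0.
random.seed(5)

def prob_far(n, eps, reps=20000):
    theta = 1 + 1 / n
    count = 0
    for _ in range(reps):
        xn = random.expovariate(1 / theta)  # mean theta_n
        x = random.expovariate(1.0)         # independent, mean 1
        if abs(xn - x) >= eps:
            count += 1
    return count / reps

eps = 0.5
print(prob_far(10 ** 6, eps), math.exp(-eps))
```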