Central limit theorem. Paninski, Intro. Math. Stats., October 5, 2005.


A sequence of r.v.'s $Z_N$ converges to $Z$ in probability, $Z_N \stackrel{P}{\to} Z$, if $P(|Z_N - Z| > \epsilon) \to 0$ as $N \to \infty$, for every $\epsilon > 0$.

(The weak LLN is called "weak" because it asserts convergence in probability, which turns out to be a somewhat weak sense of stochastic convergence, in the mathematical sense that there are stronger forms of convergence; that is, it's possible to find sequences of r.v.'s which converge in probability but not in these stronger senses. In addition, it's possible to prove the LLN without assuming that the variance exists; existence of the mean turns out to be sufficient. But discussing these stronger concepts of convergence would take us too far afield$^3$; convergence in probability will be plenty strong enough for our purposes.)

We discussed convergence of r.v.'s above; it's often also useful to think about convergence of distributions. We say a sequence of r.v.'s with cdf's $F_N(u)$ converges in distribution if

$$\lim_{N \to \infty} F_N(u) = F(u)$$

for all $u$ such that $F$ is continuous at $u$ (here $F$ is itself a cdf). Exercise 35: Explain why we need to restrict our attention to continuity points of $F$. (Hint: think of the following sequence of distributions: $F_N(u) = 1(u \geq 1/N)$, where the indicator function $1(x \in A)$ of a set $A$ is one if $x \in A$ and zero otherwise.)

It's worth emphasizing that convergence in distribution (because it only looks at the cdf) is in fact weaker than convergence in probability. For example, if $p_X$ is symmetric, then the sequence $X, -X, X, -X, \ldots$ trivially converges in distribution to $X$, but obviously doesn't converge in probability. Exercise 36: Prove that convergence in probability actually is stronger, that is, implies convergence in distribution.

Central limit theorem

The second fundamental result in probability theory, after the LLN, is the CLT: if $X_i$ are i.i.d. with mean zero and variance 1, then

$$\frac{1}{\sqrt{N}} \sum_{i=1}^N X_i \stackrel{D}{\to} \mathcal{N}(0, 1),$$

where $\mathcal{N}(0, 1)$ is the standard normal density.

$^3$ Again, see e.g. Breiman '68 for more information.
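The definition of convergence in probability is easy to probe by simulation. Here is a minimal sketch (my illustration, not from the notes; it assumes numpy, an Exponential(1) population so that $E(X) = 1$, and an arbitrary tolerance $\epsilon = 0.1$) that estimates $P(|\bar{X}_N - E(X)| > \epsilon)$ by Monte Carlo and watches it shrink as $N$ grows, as the weak LLN asserts for the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, reps = 0.1, 1000  # arbitrary tolerance; Monte Carlo repetitions

for N in [10, 100, 1000, 10000]:
    # reps independent copies of the sample mean of N Exponential(1) draws
    z_bar = rng.exponential(scale=1.0, size=(reps, N)).mean(axis=1)
    # Monte Carlo estimate of P(|sample mean - E(X)| > eps), with E(X) = 1
    p_hat = np.mean(np.abs(z_bar - 1.0) > eps)
    print(f"N = {N:6d}: P(|mean - 1| > {eps}) ~ {p_hat:.4f}")
```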

[Figure 9: De Moivre and Laplace.]

More generally, the usual rescalings tell us that

$$\frac{1}{\sigma(X) \sqrt{N}} \sum_{i=1}^N (X_i - E(X)) \stackrel{D}{\to} \mathcal{N}(0, 1).$$

Thus we know not only that (from the LLN) the distribution of the sample mean approaches the degenerate distribution on $E(X)$, but moreover (from the CLT) we know exactly what this distribution looks like, asymptotically, if we take out our magnifying glass and zoom in on $E(X)$, to a scale of $N^{-1/2}$. In this sense the CLT is a stronger result than the WLLN: it gives more details about what the asymptotic distribution actually looks like.

One thing worth noting: keep in mind that the CLT really only tells us what's going on in the local neighborhood $(E(X) - N^{-1/2} c, E(X) + N^{-1/2} c)$; think of this as the mean plus or minus a few standard deviations. But this does not imply that, say,

$$P\left( \frac{1}{N} \sum_{i=1}^N X_i \geq \epsilon \right) \approx \int_\epsilon^\infty \mathcal{N}(0, 1/N)(x) \, dx = \int_{\sqrt{N} \epsilon}^\infty \mathcal{N}(0, 1)(x) \, dx;$$

this is not true. A different asymptotic approximation typically holds for the large deviations, the tails of the sample mean distribution$^4$.

$^4$ See, e.g., Large deviations techniques and applications, Dembo and Zeitouni '93, for more information.
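The "zooming in" can be seen directly by simulation: standardize the sum and compare its empirical cdf with the standard normal cdf. A minimal sketch (my example, not from the notes; the skewed Exponential(1) population and the test points $u$ are arbitrary choices):

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    # Standard normal cdf, via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(1)
N, reps = 1000, 10000

# Exponential(1) population: E(X) = 1, sigma(X) = 1, so the standardized
# sum is (sum - N) / sqrt(N).
z = (rng.exponential(scale=1.0, size=(reps, N)).sum(axis=1) - N) / sqrt(N)

for u in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(f"u = {u:+.1f}: empirical F_N(u) = {np.mean(z <= u):.4f}, "
          f"Phi(u) = {Phi(u):.4f}")
```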

More on stochastic convergence

So, as emphasized above, convergence in distribution can drastically simplify our lives, if we can find a simple approximate (limit) distribution to substitute for our original complicated distribution. The CLT is the canonical example of this; the Poisson theorem is another. What are some general methods to prove convergence in distribution?

Delta method

The first thing to note is that if $X_N$ converge in distribution or probability to a constant $c$, then $g(X_N) \stackrel{D}{\to} g(c)$ for any continuous function $g(\cdot)$. Exercise 37: Prove this, using the definition of continuity of a function: a function $g(u)$ is continuous at $u$ if for any possible fixed $\epsilon > 0$, there is some (possibly very small) $\delta$ such that $|g(u + v) - g(u)| < \epsilon$ for all $v$ such that $-\delta < v < \delta$. (If you're having trouble, just try proving this for convergence in probability.)

So the LLN for sample means immediately implies an LLN for a bunch of functions of the sample mean; e.g., if $X_i$ are i.i.d. with $V(X) < \infty$, then

$$\left( \prod_{i=1}^N e^{X_i} \right)^{1/N} = e^{\frac{1}{N} \sum_i X_i} \stackrel{P}{\to} e^{E(X)}$$

(which of course should not be confused with $E(e^X)$; in fact, Exercise 38: Which is greater, $E(e^X)$ or $e^{E(X)}$? Give an example where one of $E(e^X)$ or $e^{E(X)}$ is infinite, but the other is finite).
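This LLN-for-functions statement is also easy to check numerically. The sketch below (an illustration of mine; the Normal(0.5, 1) population is an arbitrary choice) tracks $e^{\bar{X}_N} \to e^{E(X)}$, and also prints the sample average of $e^{X_i}$, which estimates the different quantity $E(e^X)$ from Exercise 38:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 0.5  # E(X) for a Normal(0.5, 1) population (illustrative choice)

for N in [100, 10000, 1000000]:
    x = rng.normal(loc=mu, scale=1.0, size=N)
    # exp(mean(X)) estimates e^{E(X)}; mean(exp(X)) estimates E(e^X),
    # which for this population is e^{mu + 1/2}, a different number.
    print(f"N = {N:7d}: exp(mean(X)) = {np.exp(x.mean()):.4f} "
          f"(target e^E(X) = {np.exp(mu):.4f}); "
          f"compare mean(exp(X)) = {np.exp(x).mean():.4f}")
```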

We can also zoom in to look at the asymptotic distribution (not just the limit point) of $g(Z_N)$, whenever $g$ is sufficiently smooth. For example, let's say $g(\cdot)$ has a Taylor expansion at $u$:

$$g(z) = g(u) + g'(u)(z - u) + o(|z - u|), \quad z - u \to 0,$$

where $g'(u) \neq 0$ and $z = o(y)$ means $z/y \to 0$. Then if

$$a_N (Z_N - u) \stackrel{D}{\to} q,$$

for some limit distribution $q$ and a sequence of constants $a_N \to \infty$ (think $a_N = N^{1/2}$, if $Z_N$ is the sample mean), then

$$a_N \left( g(Z_N) - g(u) \right) \stackrel{D}{\to} g'(u) \, q,$$

since

$$a_N \left( g(Z_N) - g(u) \right) = g'(u) \, a_N (Z_N - u) + o\left( a_N |Z_N - u| \right);$$

the first term converges in distribution to $g'(u) q$ (by our assumption) and the second one converges to zero in probability (Exercise 39: Prove this; i.e., prove that the remainder term $a_N (g(Z_N) - g(u)) - g'(u) a_N (Z_N - u)$ converges to zero in probability, by using the Taylor expansion formula). In other words, limit distributions are passed through functions in a pretty simple way. This is called the "delta method" (I suppose because of the deltas and epsilons involved in this kind of limiting argument), and we'll be using it a lot. The main application is when we've already proven a CLT for $Z_N$,

$$\frac{\sqrt{N} (Z_N - \mu)}{\sigma} \stackrel{D}{\to} \mathcal{N}(0, 1),$$

in which case

$$\sqrt{N} \left( g(Z_N) - g(\mu) \right) \stackrel{D}{\to} \mathcal{N}\left( 0, \sigma^2 (g'(\mu))^2 \right).$$

Exercise 40: Assume $N^{1/2} Z_N \stackrel{D}{\to} \mathcal{N}(0, 1)$. Then 1) what is the asymptotic distribution of $g(Z_N) = (Z_N - 1)^2$? 2) What about $g(Z_N) = Z_N^2$? Does anything go wrong when applying the delta method in this case? Can you fix this problem?
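A quick Monte Carlo sanity check of this asymptotic (my sketch, with illustrative choices throughout: sample means of Exponential(1) draws, so $\mu = \sigma = 1$, and $g(z) = \log z$, so $g'(\mu) = 1$ and the limit should be exactly $\mathcal{N}(0, 1)$):

```python
import numpy as np

rng = np.random.default_rng(3)
N, reps = 10000, 50000

# Sample means of N Exponential(1) draws; the sum of N such draws is
# Gamma(N, 1), so we can draw the means directly: mu = 1, sigma = 1.
z_bar = rng.gamma(shape=N, size=reps) / N

# Delta method with g(z) = log z at mu = 1:
#   sqrt(N) * (g(Z_N) - g(1)) -> N(0, sigma^2 * g'(1)^2) = N(0, 1).
t = np.sqrt(N) * np.log(z_bar)  # g(1) = log(1) = 0
print(f"mean ~ {t.mean():+.3f} (expect ~ 0), std ~ {t.std():.3f} (expect ~ 1)")
```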

Mgf method

What if the r.v. we're interested in, $Y_N$, can't be written as $g(X_N)$, i.e., a nice function of an r.v. we already know converges? Are there methods to prove limit theorems directly? Here we turn to our old friend the mgf. It turns out that the following generalization of the mgf invertibility theorem we quoted above is true:

Theorem 2. The distribution functions $F_N$ converge to $F$ if: (a) the corresponding mgf's $M_{X_N}(s)$ and $M_X(s)$ exist (and are finite) for all $s \in (-z, z)$, for all $N$, for some positive constant $z$; and (b) $M_{X_N}(s) \to M_X(s)$ for all $s \in (-z, z)$.

So, once again, if we have a good handle on the mgf's of $X_N$, we can learn a lot about the limit distribution. In fact, this idea provides the simplest way to prove the CLT.

Proof: assume $X_i$ has mean zero and unit variance; the general case follows easily, by the usual rescalings. Now let's look at $M_N(s)$, the mgf of $\frac{1}{\sqrt{N}} \sum_{i=1}^N X_i$. If $X_i$ has mgf $M(s)$, then $\frac{1}{\sqrt{N}} \sum_i X_i$ has mgf $M(s/\sqrt{N})^N$. Now let's make a Taylor expansion. We know that $M(0) = 1$, $M'(0) = 0$, and $M''(0) = 1$. (Why?) So we can write

$$M(s) = 1 + s^2/2 + o(s^2).$$

Now we just note that

$$M_N(s) = M(s/\sqrt{N})^N = \left( 1 + \frac{s^2}{2N} + o\!\left( \frac{s^2}{N} \right) \right)^N \to e^{s^2/2};$$

recall that $e^{s^2/2}$ is the mgf of a standard normal r.v., and then appeal to our general convergence-in-distribution theorem for mgf's.
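The convergence $M(s/\sqrt{N})^N \to e^{s^2/2}$ in the last step is easy to see numerically. For one concrete mean-zero, unit-variance example (my choice, not from the notes: a Rademacher variable, $X = \pm 1$ with probability 1/2 each, whose mgf is $M(s) = \cosh(s)$):

```python
import numpy as np

# M(s) = cosh(s) is the mgf of X = +/-1 with probability 1/2 each
# (mean 0, variance 1). Check that M(s / sqrt(N))^N -> exp(s^2 / 2).
s = 1.5  # any fixed point in the interval (-z, z) of Theorem 2
for N in [10, 100, 1000, 100000]:
    mgf_N = np.cosh(s / np.sqrt(N)) ** N
    print(f"N = {N:6d}: M(s/sqrt(N))^N = {mgf_N:.6f}")
print(f"limit exp(s^2/2)        = {np.exp(s**2 / 2):.6f}")
```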