A Brief Analysis of Central Limit Theorem. SIAM Chapter, Florida State University


1 / 36 A Brief Analysis of Central Limit Theorem
Omid Khanmohamadi (okhanmoh@math.fsu.edu), Diego Hernán Díaz Martínez (ddiazmar@math.fsu.edu), Tony Wills (twills@math.fsu.edu), Kouadio David Yao (kyao@math.fsu.edu)
SIAM Chapter, Florida State University. March 17, 2014

2 / 36 Outline: Examples; Statement of Theorem; Modes of Convergence; Fourier Transform and Convolution; Outline of Proof; Generalizations

3 / 36 Outline (current section: Examples)

4 / 36 From Concrete to Abstract: Examples then Theorems!
"You should start with understanding the interesting examples and build up to explain what the general phenomena are. This was your progress from initial understanding to more understanding." Michael Atiyah [image source: Wikipedia]
"The source of all great mathematics is the special case, the concrete example. It is frequent in mathematics that every instance of a concept of seemingly great generality is in essence the same as a small and concrete special case." Paul Halmos (1916–2006) [image source: Wikipedia]

5 / 36 Sum of Dice Throws is (Eventually) Normally Distributed
[Figure: comparison of probability mass functions $p(k)$ for the sum of $n$ fair 6-sided dice, $n = 1, \ldots, 5$, showing convergence to a normal shape as $n$ increases; e.g., for $n = 1$ each face has $p = 1/6$, while for $n = 2$ the peak is $p(7) = 1/6$. Image source: Wikipedia]

6 / 36 Dice Throws (Cont'd)
Roll a fair die $10^9$ times, each roll independent of the others. (Fair = all faces have equal probability; the rolls are identically distributed.) Let $X_i$ be the number that comes up on the $i$-th roll and let $S_{10^9} = \sum_{i=1}^{10^9} X_i$ be the total (sum) of the numbers rolled. The probability that $S_{10^9}$ is less than $x$ standard deviations above its mean is approximately $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$.
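
A quick numerical illustration of this claim, as a minimal sketch in Python with NumPy. The sample sizes n and trials are arbitrary choices ($10^9$ rolls per experiment is impractical to simulate here; the approximation is already good for modest n):

# Estimate Pr(S_n < mean + x*std) by simulation and compare with the
# standard normal CDF value Phi(x) that the CLT predicts.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n, trials, x = 100, 100_000, 1.0          # dice per sum, repetitions, threshold

sums = rng.integers(1, 7, size=(trials, n)).sum(axis=1)
mean = n * 3.5                            # E[S_n]; E[X_i] = 3.5 for a fair die
std = sqrt(n * 35.0 / 12.0)               # Var[X_i] = 35/12 for a fair die

empirical = np.mean(sums < mean + x * std)
phi = 0.5 * (1.0 + erf(x / sqrt(2.0)))    # Phi(x) = P(Z <= x) for Z ~ N(0, 1)
print(f"empirical: {empirical:.4f}   normal approximation: {phi:.4f}")

Both numbers come out near 0.84 for x = 1.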

7 / 36 Outline (current section: Statement of Theorem)

8 / 36 Definitions and Assumptions
Let $X_1, X_2, \ldots, X_n$ be a sequence of i.i.d. random variables, each with mean $\mu = 0$ and variance $\sigma^2 = 1$. Let $S_n = \sum_{i=1}^n X_i$. (Any other finite $\mu$ and $\sigma^2$ may be reduced to this case.)
$E\left[\frac{S_n}{\sqrt{n}}\right] = \frac{1}{\sqrt{n}}\, E[S_n] = \frac{1}{\sqrt{n}} \sum_{i=1}^n E[X_i] = 0$, since expectation is linear.
$\mathrm{Var}\left[\frac{S_n}{\sqrt{n}}\right] = \left(\frac{1}{\sqrt{n}}\right)^2 \mathrm{Var}[S_n] = \frac{1}{n} \sum_{i=1}^n \mathrm{Var}[X_i] = \frac{1}{n} \cdot n = 1$. Variance is not a linear function: it distributes over sums (when the random variables are independent) and it squares scalar multipliers.
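
A small numeric sanity check of these two identities (a sketch; Uniform$(-\sqrt{3}, \sqrt{3})$ is just one convenient mean-0, variance-1 choice):

import numpy as np

rng = np.random.default_rng(1)
n, trials = 100, 50_000
a = np.sqrt(3.0)                          # Uniform(-a, a) has mean 0, variance a^2/3 = 1
X = rng.uniform(-a, a, size=(trials, n))
Z = X.sum(axis=1) / np.sqrt(n)            # the normalized sum S_n / sqrt(n)
print(Z.mean(), Z.var())                  # both should be close to 0 and 1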

9 / 36 Definitions and Assumptions (cont'd)
The Central Limit Theorem is a statement about the so-called normalized sum, defined as $\frac{S_n - n\mu}{\sqrt{n}\,\sigma}$, which in our case is $\frac{S_n}{\sqrt{n}}$. The normalized sum is the difference between the sum $S_n$ and its expected value $n\mu$, measured relative to (in units of) the standard deviation $\sqrt{n}\,\sigma$; it measures how many standard deviations the sum is from its expected value.

10 / 36 Statement of Central Limit Theorem
With the assumptions of the previous slide, we have
$\Pr\left(a \le \frac{S_n}{\sqrt{n}} \le b\right) \to \frac{1}{\sqrt{2\pi}} \int_a^b e^{-t^2/2}\, dt$ as $n \to \infty$.
The convergence is in distribution. It is not convergence in probability or almost sure convergence, and it is not uniform: the tails of the distribution converge more slowly than its center.

11 / 36 Outline (current section: Modes of Convergence)

12 / 36 Convergence in Distribution
The Central Limit Theorem is expressed in terms of convergence in distribution, which is defined as follows.
Definition (Convergence in Distribution). A sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ converges in distribution to $X$ if $F_{X_n}(x) \to F_X(x)$ as $n \to \infty$ at all points $x$ where $F_X$ is continuous, where $F_X$ denotes the distribution function of the random variable $X$, given by $F_X(x) := \Pr(X \le x)$.

13 / 36 Characteristic Function and its Relation to Convergence in Distribution
The characteristic function of a real-valued random variable completely determines its probability distribution.
Definition (Characteristic function). Let $F_X$ be the distribution function of the random variable $X$. The characteristic function of $X$ is the function $\varphi_X$ given by
$\varphi_X(\xi) = E[e^{i\xi X}] = \int_{-\infty}^{\infty} e^{i\xi x}\, dF_X(x) = \int_{-\infty}^{\infty} f_X(x)\, e^{i\xi x}\, dx$,
where $f_X$ is the density function of $X$ (if it exists). Notice the relation to the Fourier transform when the density $f_X$ exists. Pointwise convergence of characteristic functions is equivalent to convergence in distribution (Lévy's continuity theorem).
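
As a concrete check, here is a minimal sketch; $N(0,1)$ samples are just a convenient choice whose characteristic function $e^{-\xi^2/2}$ is known exactly:

import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal(1_000_000)

for xi in (0.5, 1.0, 2.0):
    estimate = np.mean(np.exp(1j * xi * X))   # Monte Carlo estimate of E[e^{i xi X}]
    exact = np.exp(-xi**2 / 2)                # characteristic function of N(0, 1)
    print(f"xi={xi}: estimate={estimate.real:.4f}{estimate.imag:+.4f}i  exact={exact:.4f}")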

14 / 36 Outline (current section: Fourier Transform and Convolution)

15 / 36 Fourier Transform Pair
The convention we will be using is that the (1-dimensional) Fourier transform of a function $f(x)$ is
$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{i\xi x}\, dx$
and the inverse Fourier transform of a function $\hat{f}(\xi)$ is
$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{-i\xi x}\, d\xi$.
With this convention, the Fourier transform of a density is exactly the characteristic function of the previous slide.

16 / 36 Convolution
If $f$ and $g$ are integrable functions, we define the convolution $f * g$ by
$(f * g)(x) = \int_{-\infty}^{\infty} f(x - y)\, g(y)\, dy$.
Convolution is sometimes also known by its German name, Faltung ("folding"). Later, in the proof section, we will see the $n$-fold convolution $f * f * \cdots * f$ ($n$ factors), which gives the density of a sum of $n$ i.i.d. terms.
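
The discrete analogue is easy to compute directly; this sketch (assuming NumPy) rebuilds the dice PMFs from the figure at the start by repeatedly convolving the single-die PMF:

import numpy as np

die = np.full(6, 1/6)              # PMF of one fair 6-sided die, faces 1..6
pmf = die.copy()
for n in range(2, 6):              # PMFs for sums of 2, 3, 4, 5 dice
    pmf = np.convolve(pmf, die)    # one more factor in the n-fold convolution
    k = np.arange(n, 6 * n + 1)    # the sum of n dice takes values n..6n
    print(f"n={n}: peak p({k[pmf.argmax()]}) = {pmf.max():.4f}")

For n = 2 this prints the peak p(7) = 1/6 visible in the figure.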

17 / 36 Basic Properties of Fourier Transform
There are a few basic properties of the Fourier transform that we will need. In particular, we need to know what the Fourier transform does to scaling, to a Gaussian, and to convolution.
Scaling: for a non-zero real number $\alpha$, if $g(x) = f(\alpha x)$, then $\hat{g}(\xi) = \frac{1}{|\alpha|}\, \hat{f}\!\left(\frac{\xi}{\alpha}\right)$.
Gaussian: if $f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$, then $\hat{f}(\xi) = \sqrt{2\pi}\, f(\xi) = e^{-\xi^2/2}$.
Convolution: under the Fourier transform, convolution becomes multiplication: $\widehat{(f * g)}(\xi) = \hat{f}(\xi)\, \hat{g}(\xi)$.
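
The Gaussian property can be verified numerically; in this sketch, a plain Riemann-sum quadrature on a wide grid stands in for the exact integral:

import numpy as np

x = np.linspace(-20, 20, 40_001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)         # standard Gaussian density

for xi in (0.0, 1.0, 2.0):
    f_hat = np.sum(f * np.exp(1j * xi * x)) * dx   # f_hat(xi) by quadrature
    print(f"xi={xi}: f_hat={f_hat.real:.6f}  exp(-xi^2/2)={np.exp(-xi**2 / 2):.6f}")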

18 / 36 Outline (current section: Outline of Proof)

19 / 36 Overview, View, Review!
"Tell them what you're going to tell them, tell them, and tell them what you told them." Paul Halmos (1916–2006) [image source: Wikipedia]

20 / 36 An Overview of the Outline of the Proof
Our goal is to outline the steps in showing:
$\Pr\left(a \le \frac{S_n}{\sqrt{n}} \le b\right) \to \frac{1}{\sqrt{2\pi}} \int_a^b e^{-t^2/2}\, dt$
1. Write the density of the sum $S_n$ in terms of the density of its i.i.d. terms $X_i$ (by using an $n$-fold convolution) to go from $f$ to $f_{S_n}$.
2. Find the effect of scaling on the density (by using a substitution in the integral) to go from $f_{S_n}$ to $f_{S_n/\sqrt{n}}$.
3. Use the scaling and convolution properties of the Fourier transform to go from $f_{S_n/\sqrt{n}}$ to $\hat{f}_{S_n/\sqrt{n}}$.
4. Expand $\hat{f}$ around zero to find a useful converging expression.
5. Rewrite that converging expression for $\hat{f}_{S_n/\sqrt{n}}$ to get convergence to a Gaussian.
6. Take the inverse Fourier transform to arrive at the standard Gaussian density.

21 / 36 Step 1: From $f$ to $f_{S_n}$: $n$-fold Convolution
We show the result for two i.i.d. variables, $X_1$ and $X_2$, with identical distributions $F_{X_1} = F_{X_2} =: F$ and densities $f_{X_1} = f_{X_2} =: f$.
$f_{X_1+X_2}(a) = \frac{d}{da} F_{X_1+X_2}(a) = \frac{d}{da} \Pr\{X_1 + X_2 \le a\}$.
$F_{X_1+X_2}(a)$ is given by the integral of $f_{X_1}(x_1)\, f_{X_2}(x_2) = f(x_1)\, f(x_2)$ over $\{(x_1, x_2) : x_1 + x_2 \le a\}$:
$F_{X_1+X_2}(a) = \Pr\{X_1 + X_2 \le a\} = \int_{-\infty}^{\infty} \int_{-\infty}^{a - x_2} f(x_1)\, f(x_2)\, dx_1\, dx_2 = \int_{-\infty}^{\infty} F(a - x)\, f(x)\, dx$
Differentiation gives
$f_{X_1+X_2}(a) = \frac{d}{da} \int_{-\infty}^{\infty} F(a - x)\, f(x)\, dx = \int_{-\infty}^{\infty} f(a - x)\, f(x)\, dx = (f * f)(a)$

22 / 36 Step 2: From $f_{S_n}$ to $f_{S_n/\sqrt{n}}$: Effect of Scaling on Density
The Central Limit Theorem involves the probability $\Pr\left(a \le \frac{S_n}{\sqrt{n}} \le b\right)$. Notice that if the density of $S_n$ is $f_{S_n}(t)$, then
$\Pr\left(a \le \frac{S_n}{\sqrt{n}} \le b\right) = \Pr\left(a\sqrt{n} \le S_n \le b\sqrt{n}\right) = \int_{a\sqrt{n}}^{b\sqrt{n}} f_{S_n}(t)\, dt = \int_a^b \sqrt{n}\, f_{S_n}(\sqrt{n}\, s)\, ds$
by making the substitution $t = \sqrt{n}\, s$. This shows that the density of $\frac{S_n}{\sqrt{n}}$ is $\sqrt{n}\, f_{S_n}(\sqrt{n}\, t)$.

23 / 36 Step 3: From $f_{S_n/\sqrt{n}}$ to $\hat{f}_{S_n/\sqrt{n}}$
Now we have everything we need to get from the density $f$ of a sequence of i.i.d. random variables to the characteristic function $\hat{f}_{S_n/\sqrt{n}}(\xi)$ of the corresponding normalized sum $S_n/\sqrt{n}$:
$f_{S_n}(t) = (f * \cdots * f)(t)$, so $\hat{f}_{S_n}(\xi) = \widehat{(f * \cdots * f)}(\xi) = \left(\hat{f}(\xi)\right)^n$.
$f_{S_n/\sqrt{n}}(t) = \sqrt{n}\, f_{S_n}(\sqrt{n}\, t)$, so by the scaling property
$\hat{f}_{S_n/\sqrt{n}}(\xi) = \sqrt{n} \cdot \frac{1}{\sqrt{n}}\, \hat{f}_{S_n}\!\left(\frac{\xi}{\sqrt{n}}\right) = \hat{f}_{S_n}\!\left(\frac{\xi}{\sqrt{n}}\right) = \left(\hat{f}\!\left(\frac{\xi}{\sqrt{n}}\right)\right)^n$

24 / 36 Step 4: Taylor Expansion of $\hat{f}$ at 0
The Fourier transform of the density $f$ (identical for all the $X_i$) is $\hat{f}(\xi) = \int_{-\infty}^{\infty} e^{i\xi x} f(x)\, dx$. Differentiation under the integral sign is justified, so the Taylor expansion is
$\hat{f}(\xi) = \hat{f}(0) + \hat{f}'(0)\, \xi + \frac{\hat{f}''(0)}{2}\, \xi^2 + \epsilon(\xi)\, \xi^2$ as $\xi \to 0$, in which limit $\epsilon(\xi) \to 0$ also.
Observe that
$\hat{f}(0) = \int f(x)\, dx = 1$
$\hat{f}'(0) = i \int x\, f(x)\, dx = 0$ (mean 0)
$\hat{f}''(0) = -\int x^2\, f(x)\, dx = -1$ (variance 1)

25 / 36 Taylor Expansion of $\hat{f}$ at 0 (cont'd)
So $\hat{f}(\xi) = 1 - \frac{\xi^2}{2} + \epsilon(\xi)\, \xi^2$ as $\xi \to 0$, which is the same as
$\frac{\hat{f}(\xi) - \left(1 - \frac{\xi^2}{2}\right)}{\xi^2} \to 0$ as $\xi \to 0$.
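
To see this expansion concretely (a sketch; Uniform$(-\sqrt{3}, \sqrt{3})$ is one convenient mean-0, variance-1 law, whose characteristic function $\sin(\sqrt{3}\,\xi)/(\sqrt{3}\,\xi)$ is known in closed form):

import numpy as np

for xi in (0.5, 0.2, 0.1, 0.05):
    phi = np.sin(np.sqrt(3) * xi) / (np.sqrt(3) * xi)   # char. fn. of Uniform(-sqrt(3), sqrt(3))
    taylor = 1 - xi**2 / 2
    print(f"xi={xi}: phi={phi:.8f}  1-xi^2/2={taylor:.8f}  gap={phi - taylor:.2e}")

The gap shrinks like $\xi^4$, i.e., $\epsilon(\xi) \to 0$, as the expansion promises.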

26 / 36 Step 5: Convergence of $\hat{f}_{S_n/\sqrt{n}}(\xi)$ to $e^{-\xi^2/2}$
Hoping that we may get a similar convergence result for $\hat{f}_{S_n/\sqrt{n}}$, we factor the difference (using $A^n - B^n = (A - B) \sum_{k=0}^{n-1} A^k B^{n-1-k}$):
$\left| \left(\hat{f}(\xi/\sqrt{n})\right)^n - \left(1 - \frac{\xi^2}{2n}\right)^n \right| = \left| \hat{f}(\xi/\sqrt{n}) - \left(1 - \frac{\xi^2}{2n}\right) \right| \cdot \left| \sum_{k=0}^{n-1} \left(\hat{f}(\xi/\sqrt{n})\right)^k \left(1 - \frac{\xi^2}{2n}\right)^{n-1-k} \right|$
$\le \left| \hat{f}(\xi/\sqrt{n}) - \left(1 - \frac{\xi^2}{2n}\right) \right| \cdot \sum_{k=0}^{n-1} \left| \hat{f}(\xi/\sqrt{n}) \right|^k \left| 1 - \frac{\xi^2}{2n} \right|^{n-1-k}$

27 / 36 Convergence of $\hat{f}_{S_n/\sqrt{n}}(\xi)$ to $e^{-\xi^2/2}$ (cont'd)
Since $|\hat{f}(\xi)| \le \|\hat{f}\|_{L^\infty} \le \|f\|_{L^1} = 1$, for $n$ large enough (so that $\left|1 - \frac{\xi^2}{2n}\right| \le 1$) we have
$\left| \left(\hat{f}(\xi/\sqrt{n})\right)^n - \left(1 - \frac{\xi^2}{2n}\right)^n \right| \le n \left| \hat{f}(\xi/\sqrt{n}) - \left(1 - \frac{\xi^2}{2n}\right) \right| = \left| \epsilon(\xi/\sqrt{n}) \right|\, \xi^2$.
As $n \to \infty$, $\xi/\sqrt{n} \to 0$, so by the Taylor estimate the right-hand side tends to 0. Since $\left(1 - \frac{\xi^2}{2n}\right)^n \to e^{-\xi^2/2}$, we conclude
$\hat{f}_{S_n/\sqrt{n}}(\xi) = \left(\hat{f}(\xi/\sqrt{n})\right)^n \to e^{-\xi^2/2}$
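
This limit is easy to watch numerically; a sketch reusing the Uniform$(-\sqrt{3}, \sqrt{3})$ characteristic function from the Taylor-expansion example:

import numpy as np

def phi(xi):
    # char. fn. of Uniform(-sqrt(3), sqrt(3)); np.sinc(t) = sin(pi t)/(pi t)
    return np.sinc(np.sqrt(3) * xi / np.pi)

xi = 2.0
for n in (1, 10, 100, 1000, 10_000):
    print(f"n={n:6d}: (phi(xi/sqrt(n)))^n = {phi(xi / np.sqrt(n))**n:.6f}")
print(f"limit : exp(-xi^2/2) = {np.exp(-xi**2 / 2):.6f}")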

28 / 36 Step 6: Convergence of $f_{S_n/\sqrt{n}}(x)$ to $e^{-x^2/2}/\sqrt{2\pi}$: Inverse Fourier Transform
Taking the inverse Fourier transform, we obtain
$f_{S_n/\sqrt{n}}(x) \to \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ as $n \to \infty$,
which is the conclusion of the Central Limit Theorem! Observe that this is pointwise convergence of densities, which in turn implies convergence in distribution (Scheffé's lemma).

29 / 36 Outline (current section: Generalizations)

30 / 36 Directions for Generalization
Three general versions of the CLT will be discussed:
Lyapunov's CLT, which drops the hypothesis of identical distribution at the price of strengthening the finite-variance hypothesis to a finite $(2+\delta)$-moment condition (Lyapunov's condition).
Lindeberg's CLT, which weakens Lyapunov's condition back to finite variance while keeping the same weak requirements on the distributions of the random variables.
The multivariate CLT, which uses the covariance matrix of the random vectors for the generalization.

31 / 36 Lyapunov's CLT
Suppose $X_1, X_2, \ldots, X_n$ is a sequence of independent random variables, each with finite expected value $\mu_i$ and variance $\sigma_i^2$ (not necessarily identically distributed). Let $s_n^2 = \sum_{i=1}^n \sigma_i^2$. If, for some $\delta > 0$, the following condition (called Lyapunov's condition) holds:
$\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^n E\left[ |X_i - \mu_i|^{2+\delta} \right] = 0$,
then $\frac{1}{s_n} \sum_{i=1}^n (X_i - \mu_i)$ converges in distribution to a standard normal random variable as $n \to \infty$.
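
A simulated illustration (a sketch, not a proof): the terms $X_i \sim \mathrm{Uniform}(-a_i, a_i)$ with growing $a_i = i^{1/4}$ below are independent but not identically distributed, and one can check they satisfy Lyapunov's condition with $\delta = 1$, so the normalized sum should look standard normal:

import numpy as np

rng = np.random.default_rng(3)
n, trials = 500, 20_000
a = np.arange(1, n + 1) ** 0.25
s_n = np.sqrt(np.sum(a**2 / 3))              # s_n^2 = sum Var[X_i]; Var[Uniform(-a,a)] = a^2/3

X = rng.uniform(-a, a, size=(trials, n))     # each row is one draw of X_1, ..., X_n
Z = X.sum(axis=1) / s_n                      # the Lyapunov-normalized sum (mu_i = 0 here)

for q, z in ((0.025, -1.960), (0.5, 0.0), (0.975, 1.960)):
    print(f"quantile {q}: empirical={np.quantile(Z, q):+.3f}  N(0,1)={z:+.3f}")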

32 / 36 Lindeberg's CLT
Suppose $X_1, X_2, \ldots, X_n$ is a sequence of independent random variables, each with finite expected value $\mu_i$ and variance $\sigma_i^2$ (not necessarily identically distributed). Let $s_n^2 = \sum_{i=1}^n \sigma_i^2$. If, for every $\epsilon > 0$, the following condition (called Lindeberg's condition) holds:
$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^n E\left[ (X_i - \mu_i)^2\, \mathbf{1}_{\{|X_i - \mu_i| > \epsilon s_n\}} \right] = 0$,
then $\frac{1}{s_n} \sum_{i=1}^n (X_i - \mu_i)$ converges in distribution to a standard normal random variable as $n \to \infty$.

33 / 36 Comparison of Finite Variance Conditions
Lindeberg: $\int_{|x - \mu_i| > \epsilon s_n} (x - \mu_i)^2\, dF_i(x)$ (summed over $i$ and normalized by $s_n^2$, this must vanish)
Classical: $\int_{\mathbb{R}} (x - \mu)^2\, dF(x) < \infty$
Lyapunov: $\int_{\mathbb{R}} |x - \mu_i|^{2+\delta}\, dF_i(x) < \infty$
Observe that, in the classical CLT, $\mu_i = \mu$ and $F_i = F$ for all $i$.

34 / 36 Generalizations in a Nutshell: CLT is Robust
If one has many small random terms that are mostly independent, each contributing a small fraction of the total sum, then the total sum is approximately normally distributed.

35 / 36 Multivariate CLT
Suppose $X_1, X_2, \ldots, X_n \in \mathbb{R}^d$ is a sequence of independent, identically distributed random vectors with finite mean vector $E[X_i] = \mu$ and finite covariance matrix $\Sigma$. Then
$\frac{1}{\sqrt{n}} \left( \sum_{i=1}^n X_i - n\mu \right) \to N_d(0, \Sigma)$
in distribution as $n \to \infty$, where $N_d(0, \Sigma)$ is the multivariate normal distribution with mean vector $0$ and covariance matrix $\Sigma$. Note: addition is done componentwise.
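
A quick simulated check (a sketch; the uniform construction below is just one way to make non-Gaussian vectors with a prescribed mean $\mu$ and covariance $\Sigma$):

import numpy as np

rng = np.random.default_rng(4)
d, n, trials = 2, 500, 5_000
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

L = np.linalg.cholesky(Sigma)                 # Sigma = L L^T
U = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(trials, n, d))  # mean 0, covariance I
X = mu + U @ L.T                              # i.i.d. vectors with mean mu, covariance Sigma

Z = (X.sum(axis=1) - n * mu) / np.sqrt(n)     # the normalized vector sum
print("sample covariance of Z (should be close to Sigma):")
print(np.cov(Z.T))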

36 / 36 Thank you for your attention! [Figure: portrait of Laplace]

37 / 36 Outline: More Details

38 / 36 Almost Sure Convergence and Convergence in Probability
Because of their relationship to convergence in distribution, it is useful to review almost sure convergence and convergence in probability. We let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables defined on the probability space $(\Omega, \mathcal{F}, P)$.
Almost sure convergence (strong convergence): $X_1, X_2, \ldots, X_n, \ldots$ converges almost surely to a random variable $X$ if
$P\left( \lim_{n \to \infty} X_n = X \right) = 1$;
equivalently, for every $\varepsilon > 0$, $P(|X_n - X| < \varepsilon \text{ for all sufficiently large } n) = 1$.
Convergence in probability (weak convergence): $X_1, X_2, \ldots, X_n, \ldots$ converges in probability to $X$ if, for every $\varepsilon > 0$,
$\lim_{n \to \infty} P(|X_n - X| < \varepsilon) = 1$, or equivalently $\lim_{n \to \infty} P(|X_n - X| \ge \varepsilon) = 0$.

39 / 36 Notable Relationships between Convergence Concepts
Almost sure convergence $\implies$ convergence in probability $\implies$ convergence in distribution.