
Math 408 - Mathematical Statistics, Lecture 9-10
Tricks with Random Variables: The Law of Large Numbers & The Central Limit Theorem
Konstantin Zuev (USC), February 6-8, 2013

Agenda
- Large Sample Theory
- Types of Convergence: Convergence in Probability, Convergence in Distribution
- The Law of Large Numbers
- The Monte Carlo Method
- The Central Limit Theorem (and its multivariate version)
- Summary

Large Sample Theory

One of the most important aspects of probability theory concerns the behavior of sequences of random variables. This part of probability is called large sample theory, limit theory, or asymptotic theory, and it is extremely important for statistical inference. The basic question is this: what can we say about the limiting behavior of a sequence of random variables $X_1, X_2, X_3, \dots$? In the statistical context: what happens as we gather more and more data?

In calculus, we say that a sequence of real numbers $x_1, x_2, \dots$ converges to a limit $x$ if, for every $\epsilon > 0$, we can find $N$ such that $|x_n - x| < \epsilon$ for all $n > N$. In probability, convergence is more subtle. Going back to calculus, suppose that $x_n = 1/n$. Then trivially $\lim_{n\to\infty} x_n = 0$. Consider a probabilistic version of this example: suppose that $X_1, X_2, \dots$ are independent and $X_n \sim N(0, 1/n)$. Intuitively, $X_n$ is very concentrated around 0 for large $n$, and we are tempted to say that $X_n$ converges to zero. However, $P(X_n = 0) = 0$ for all $n$!

Types of Convergence

There are two main types of convergence: convergence in probability and convergence in distribution.

Definition. Let $X_1, X_2, \dots$ be a sequence of random variables and let $X$ be another random variable. Let $F_n$ denote the CDF of $X_n$ and let $F$ denote the CDF of $X$.
- $X_n$ converges to $X$ in probability, written $X_n \xrightarrow{P} X$, if for every $\epsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| \ge \epsilon) = 0$.
- $X_n$ converges to $X$ in distribution, written $X_n \xrightarrow{D} X$, if $\lim_{n\to\infty} F_n(x) = F(x)$ for all $x$ at which $F$ is continuous.
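A quick numerical illustration of convergence in probability (the code and the choice $\epsilon = 0.1$ are mine, not from the slides): for $X_n \sim N(0, 1/n)$, the probability $P(|X_n - 0| \ge \epsilon)$ shrinks toward 0 as $n$ grows.

```python
# Monte Carlo estimate of P(|X_n| >= eps) for X_n ~ N(0, 1/n):
# the estimates shrink toward 0, matching X_n ->P 0.
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1
for n in (10, 100, 1000):
    x = rng.normal(0.0, 1.0 / np.sqrt(n), size=100_000)  # draws of X_n
    print(n, np.mean(np.abs(x) >= eps))
```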

Relationships Between the Types of Convergence

Example: Let $X_n \sim N(0, 1/n)$. Then $X_n \xrightarrow{D} 0$ and $X_n \xrightarrow{P} 0$.

Question: Is there any relationship between $\xrightarrow{P}$ and $\xrightarrow{D}$? Answer: Yes: $X_n \xrightarrow{P} X$ implies that $X_n \xrightarrow{D} X$.

Important Remark: The reverse implication does not hold: convergence in distribution does not imply convergence in probability. Example: Let $X \sim N(0, 1)$ and let $X_n = -X$ for all $n$. Since $-X \sim N(0, 1)$ as well, $X_n \xrightarrow{D} X$; but $|X_n - X| = 2|X|$ does not shrink to 0, so $X_n$ does not converge to $X$ in probability.

The Law of Large Numbers

The law of large numbers is one of the main achievements of probability. This theorem says that the mean of a large sample is close to the mean of the distribution.

The Law of Large Numbers. Let $X_1, X_2, \dots$ be an i.i.d. sample with $\mu = E[X_1]$ and $\sigma^2 = V[X_1] < \infty$. Then
$\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{P} \mu.$

Useful Interpretation: The distribution of $\bar X_n$ becomes more concentrated around $\mu$ as $n$ gets larger.

Example: Let $X_i \sim \mathrm{Bernoulli}(p)$. The fraction of heads after $n$ tosses is $\bar X_n$. According to the LLN, $\bar X_n \xrightarrow{P} E[X_i] = p$. It means that, when $n$ is large, the distribution of $\bar X_n$ is tightly concentrated around $p$.
Q: How large should $n$ be so that $P(|\bar X_n - p| < \epsilon) \ge 1 - \alpha$? Answer: $n \ge \frac{p(1-p)}{\alpha \epsilon^2}$.
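The sample-size bound is stated without derivation on the slide; it follows from Chebyshev's inequality, as this short derivation shows:

```latex
P\bigl(|\bar X_n - p| \ge \epsilon\bigr)
  \le \frac{V[\bar X_n]}{\epsilon^2}    % Chebyshev's inequality
  =   \frac{p(1-p)}{n\epsilon^2}        % V[\bar X_n] = V[X_1]/n
  \le \alpha
  \quad\Longleftrightarrow\quad
  n \ge \frac{p(1-p)}{\alpha\epsilon^2}.
```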

The Monte Carlo Method

Suppose we want to calculate $I(f) = \int_0^1 f(x)\,dx$, where the integration cannot be done by elementary means. The Monte Carlo method works as follows:
1. Generate independent uniform random variables on $[0,1]$: $X_1, \dots, X_n \sim U[0, 1]$.
2. Compute $Y_1 = f(X_1), \dots, Y_n = f(X_n)$. Then $Y_1, \dots, Y_n$ are i.i.d.
3. By the law of large numbers, $\bar Y_n$ should be close to $E[Y_1]$. Therefore:
$\frac{1}{n}\sum_{i=1}^{n} f(X_i) = \bar Y_n \approx E[Y_1] = E[f(X_1)] = \int_0^1 f(x)\,dx.$

Monte Carlo method: Example

Suppose we want to compute the following integral: $I = \int_0^1 x^2\,dx$. From calculus: $I = 1/3$. Using the Monte Carlo method: $I(n) = \frac{1}{n}\sum_{i=1}^{n} X_i^2$, where $X_i \sim U[0, 1]$.

[Figure: the Monte Carlo estimate $I(n)$ plotted against $n$ for $n$ up to 1000; the estimate fluctuates at first and settles near $1/3$ as $n$ grows.]
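A minimal sketch of this estimate in Python (the slides give no code; the function name mc_estimate and the seed are mine):

```python
# Monte Carlo estimate of \int_0^1 f(x) dx by averaging f over uniform draws.
import numpy as np

def mc_estimate(f, n, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)  # X_1, ..., X_n ~ U[0, 1], i.i.d.
    return f(x).mean()                 # (1/n) * sum_i f(X_i)

print(mc_estimate(lambda x: x**2, 1000))  # close to I = 1/3
```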

Accuracy of the Monte Carlo method

$\frac{1}{n}\sum_{i=1}^{n} f(X_i) \approx \int_0^1 f(x)\,dx, \qquad X_1, \dots, X_n \sim U[0, 1].$

Question: How large should $n$ be to achieve a desired accuracy?
Answer: Let $f: [0, 1] \to [0, 1]$. To get $\frac{1}{n}\sum_{i=1}^{n} f(X_i)$ within $\epsilon$ of the true value $I(f)$ with probability at least $p$, we should choose $n$ so that
$n \ge \frac{1}{\epsilon^2 (1 - p)}.$
(This again follows from Chebyshev's inequality, using $V[f(X_1)] \le 1$ when $f$ takes values in $[0, 1]$.) Thus, the Monte Carlo method tells us how large to take $n$ to get a desired accuracy.
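A quick numeric illustration of the bound (the numbers are mine, not on the slide): to be within $\epsilon = 0.01$ of $I(f)$ with probability at least $p = 0.95$, it suffices to take
$n \ge \frac{1}{(0.01)^2 \cdot 0.05} = 200{,}000.$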

The Central Limit Theorem

Suppose that $X_1, \dots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$. The central limit theorem (CLT) says that $\bar X_n$ has a distribution which is approximately Normal. This is remarkable since nothing is assumed about the distribution of the $X_i$, except the existence of the mean and variance.

The Central Limit Theorem. Let $X_1, \dots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2$, and let $\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then
$Z_n \equiv \frac{\bar X_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} Z \sim N(0, 1).$

Useful Interpretation: Probability statements about $\bar X_n$ can be approximated using a Normal distribution.

The Central Limit Theorem

$Z_n \equiv \frac{\bar X_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} Z \sim N(0, 1).$

There are several forms of notation to denote the fact that the distribution of $Z_n$ is converging to a Normal. They all mean the same thing:
$Z_n \approx N(0, 1)$
$\bar X_n \approx N\!\left(\mu, \frac{\sigma^2}{n}\right)$
$\bar X_n - \mu \approx N\!\left(0, \frac{\sigma^2}{n}\right)$
$\sqrt{n}(\bar X_n - \mu) \approx N(0, \sigma^2)$
$\frac{\bar X_n - \mu}{\sigma/\sqrt{n}} \approx N(0, 1)$
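A small simulation of the CLT (my own illustration; the Exponential(1) distribution, sample size, and seed are arbitrary choices): standardized means of i.i.d. samples behave like a standard Normal.

```python
# Z_n = (X_bar - mu) / (sigma / sqrt(n)) for i.i.d. Exponential(1) data,
# which has mu = 1 and sigma = 1.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 1.0, 100, 10_000

x = rng.exponential(scale=1.0, size=(reps, n))    # reps samples of size n
z = (x.mean(axis=1) - mu) / (sigma / np.sqrt(n))  # one Z_n per sample

print(z.mean(), z.std())   # approximately 0 and 1
print(np.mean(z <= 1.96))  # approximately Phi(1.96) = 0.975
```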

The Central Limit Theorem: Remarks

The CLT asserts that the CDF of $\bar X_n$, suitably normalized to have mean 0 and variance 1, converges to the CDF of $N(0, 1)$. Q: Is the corresponding result valid at the level of PDFs and PMFs? Broadly speaking the answer is yes, but some condition of smoothness is necessary (in general, $F_n(x) \to F(x)$ does not imply $F_n'(x) \to F'(x)$).

The CLT does not say anything about the rate of convergence. It tells us that in the long run we know what the distribution must be. Even better: it is always the same distribution. Still better: it is one which is remarkably easy to deal with, and for which we have a huge amount of theory.

Historic Remark: For the special case of Bernoulli variables with $p = 1/2$, the CLT was proved by de Moivre around 1733. General values of $p$ were treated later by Laplace. The first rigorous proof of the CLT was given by Lyapunov around 1901.

The Central Limit Theorem: Example

Suppose that the number of errors per computer program has a Poisson distribution with mean $\lambda = 5$:
$f(k \mid \lambda) = \frac{e^{-\lambda} \lambda^k}{k!}.$
We get $n = 125$ programs ($n$ is the sample size). Let $X_1, \dots, X_n$ be the numbers of errors in the programs, $X_i \sim \mathrm{Poisson}(\lambda)$. Estimate the probability $P(\bar X_n \le \lambda + \epsilon)$, where $\epsilon = 0.5$.

Answer: Since a Poisson variable has mean $\lambda$ and variance $\sigma^2 = \lambda$,
$P(\bar X_n \le \lambda + \epsilon) \approx \Phi\!\left(\epsilon\sqrt{\frac{n}{\lambda}}\right) = \Phi(2.5) \approx 0.994.$
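A quick numeric check of $\Phi(2.5)$ (my own sanity check, using only the Python standard library):

```python
# Standard Normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2.
from math import erf, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

lam, n, eps = 5.0, 125, 0.5
print(phi(eps * sqrt(n / lam)))  # Phi(2.5) ~ 0.9938
```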

The Central Limit Theorem: Example

A tourist in Las Vegas was attracted by a certain gambling game in which:
- the customer stakes 1 dollar on each play;
- a win pays the customer 2 dollars plus the return of her stake;
- a loss costs her only her stake.
The probability of winning at this game is $p = 1/4$. The tourist played this game $n = 240$ times.

Q: Assuming that no near-miracles happened, about how much poorer was the tourist upon leaving the casino? Answer: $E[\text{payoff}] = -\$60$, so about 60 dollars poorer.
Q: What is the probability that she lost no money? Answer: $P(\text{payoff} \ge 0) \approx 0$. (See the worked arithmetic below.)
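The worked arithmetic behind both answers (the variance computation is mine; the slide only states the results):

```latex
% Per play: net +2 with probability 1/4, net -1 with probability 3/4.
E[X_1] = 2 \cdot \tfrac{1}{4} - 1 \cdot \tfrac{3}{4} = -\tfrac{1}{4},
\qquad
V[X_1] = E[X_1^2] - E[X_1]^2
       = \Bigl(4 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{3}{4}\Bigr) - \tfrac{1}{16}
       = \tfrac{27}{16}.
% Over n = 240 plays, with S_n = X_1 + ... + X_{240}:
E[S_n] = 240 \cdot \bigl(-\tfrac{1}{4}\bigr) = -60,
\qquad
\sqrt{V[S_n]} = \sqrt{240 \cdot \tfrac{27}{16}} = \sqrt{405} \approx 20.1.
% By the CLT:
P(S_n \ge 0) \approx 1 - \Phi\!\left(\frac{0 - (-60)}{20.1}\right)
             = 1 - \Phi(2.98) \approx 0.0014 \approx 0.
```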

The Central Limit Theorem

The central limit theorem tells us that
$Z_n = \frac{\bar X_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} N(0, 1).$
However, in applications we rarely know $\sigma$. We can estimate $\sigma^2$ from $X_1, \dots, X_n$ by the sample variance
$S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X_n)^2.$
Question: If we replace $\sigma$ with $S_n$, is the central limit theorem still true? Answer: Yes!

Theorem. Assume the same conditions as the CLT. Then
$\frac{\bar X_n - \mu}{S_n/\sqrt{n}} \xrightarrow{D} Z \sim N(0, 1).$
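A minimal sketch of computing the studentized statistic on simulated data (the Poisson setup reuses the earlier example; the names and seed are mine):

```python
# (X_bar - mu) / (S_n / sqrt(n)) with the sample standard deviation S_n
# computed with the 1/(n-1) convention (ddof=1).
import numpy as np

rng = np.random.default_rng(0)
mu, n = 5.0, 125
x = rng.poisson(lam=mu, size=n)

s_n = x.std(ddof=1)                       # sample standard deviation
z = (x.mean() - mu) / (s_n / np.sqrt(n))  # approximately one N(0, 1) draw
print(z)
```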

Multivariate Central Limit Theorem

Let $X_1, \dots, X_n$ be i.i.d. random vectors with mean $\mu$ and covariance matrix $\Sigma$:
$X_i = \begin{pmatrix} X_{1i} \\ X_{2i} \\ \vdots \\ X_{ki} \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{pmatrix} = \begin{pmatrix} E[X_{1i}] \\ E[X_{2i}] \\ \vdots \\ E[X_{ki}] \end{pmatrix},$
$\Sigma = \begin{pmatrix} V[X_{1i}] & \mathrm{Cov}(X_{1i}, X_{2i}) & \cdots & \mathrm{Cov}(X_{1i}, X_{ki}) \\ \mathrm{Cov}(X_{2i}, X_{1i}) & V[X_{2i}] & \cdots & \mathrm{Cov}(X_{2i}, X_{ki}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_{ki}, X_{1i}) & \mathrm{Cov}(X_{ki}, X_{2i}) & \cdots & V[X_{ki}] \end{pmatrix}.$
Let $\bar X_n = (\bar X_{1n}, \dots, \bar X_{kn})^T$. Then
$\sqrt{n}(\bar X_n - \mu) \xrightarrow{D} N(0, \Sigma).$

Summary

$X_n \xrightarrow{P} X$: $X_n$ converges to $X$ in probability if for every $\epsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| \ge \epsilon) = 0$.

$X_n \xrightarrow{D} X$: $X_n$ converges to $X$ in distribution if $\lim_{n\to\infty} F_n(x) = F(x)$ for all $x$ at which $F$ is continuous.

$X_n \xrightarrow{P} X$ implies that $X_n \xrightarrow{D} X$.

The Law of Large Numbers: Let $X_1, X_2, \dots$ be an i.i.d. sample with $\mu = E[X_1]$. Then $\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{P} \mu$.

The Central Limit Theorem: Let $X_1, \dots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Then $Z_n \equiv \frac{\bar X_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} Z \sim N(0, 1)$.