Distribution of Random Samples & Limit Theorems

STAT/MATH 395 A - PROBABILITY II, UW Winter Quarter 2017, Néhémy Lim

Distribution of Random Samples & Limit Theorems

1 Distribution of i.i.d. Samples

Motivating example. Assume that the goal of a study is to demonstrate the digital transformation in the US population over the past 5 years. One feature of this technological revolution is, for instance, the time spent on smartphones. Let us denote by X that random variable for some US individual. If one is interested in E[X], the mean time spent on smartphones by the US population, that quantity is exactly

E[X] = (1/N) ∑_{i=1}^{N} x_i

where N ≈ 320 million is the size of the US population and x_i is the time spent by individual i on his/her smartphone. However, in most statistical studies, it is rare to have access to the whole population. Therefore, we need some principled guidance to estimate E[X] from a sample of the population of size n, with generally n ≪ N.

Definition 1.1 (i.i.d. Sample). Let X_1, ..., X_n be a collection of n ∈ N random variables on a probability space (Ω, A, P). (X_1, ..., X_n) is a random sample of size n if and only if:
(1) X_1, ..., X_n are mutually independent;
(2) X_1, ..., X_n are identically distributed, that is, each X_i comes from the same distribution.
We say that X_1, ..., X_n are independent and identically distributed (abbreviated i.i.d.).

A good statistical sample is such that the picked individuals are representative of the population. The latter is ensured by the mutual independence of the individuals. Upon existence, the population mean is denoted µ = E[X_1] and the population variance σ² = Var(X_1).

Definition 1.2 (Sample mean). Let (X_1, ..., X_n) be a random sample. Then the random variable

X̄_n = (1/n) ∑_{i=1}^{n} X_i    (1)

is called the sample mean.
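The motivating example above can be sketched in a short simulation. This is a minimal illustration, not part of the notes: the population of "daily smartphone minutes" is assumed exponential with mean 180, and both numbers are arbitrary choices made purely for illustration.

```python
import random
import statistics

random.seed(0)

# Hypothetical population of daily smartphone minutes; the exponential shape
# and the mean mu = 180 are assumptions for illustration only.
mu = 180.0
N = 200_000
population = [random.expovariate(1 / mu) for _ in range(N)]

# A random sample of size n << N, as in Definition 1.1.
n = 500
sample = random.sample(population, n)

# The sample mean (Equation (1)) is the estimate of the population mean E[X].
pop_mean = statistics.mean(population)
xbar = statistics.mean(sample)
print(round(pop_mean, 1), round(xbar, 1))
```

With only n = 500 of the N individuals, the sample mean already lands near the population mean, which is the estimation problem the rest of the notes formalizes.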

We are interested in the distribution of X̄_n. We first need to define formally the distribution of more than 2 random variables. You should see that most of the following definitions/properties are natural extensions of the case n = 2.

Definition 1.3 (Joint cumulative distribution function). Let X_1, ..., X_n be n ∈ N rrvs on a probability space (Ω, A, P). The joint (cumulative) distribution function (c.d.f.) of X_1, ..., X_n is defined as follows: for (x_1, ..., x_n) ∈ R^n,

F_{X_1...X_n}(x_1, ..., x_n) = P(X_1 ≤ x_1, ..., X_n ≤ x_n).    (2)

Definition 1.4 (Discrete rrvs). Let X_1, ..., X_n be n ∈ N discrete rrvs on a probability space (Ω, A, P). The joint probability mass function (pmf) of X_1, ..., X_n, denoted p_{X_1...X_n}, is defined as follows:

p_{X_1...X_n}(x_1, ..., x_n) = P(X_1 = x_1, ..., X_n = x_n),    (3)

for (x_1, ..., x_n) ∈ X_1(Ω) × ... × X_n(Ω).

Property 1.1. Let X_1, ..., X_n be n ∈ N discrete rrvs on a probability space (Ω, A, P) with joint pmf p_{X_1...X_n}. Then the following holds:
- p_{X_1...X_n}(x_1, ..., x_n) ≥ 0, for (x_1, ..., x_n) ∈ X_1(Ω) × ... × X_n(Ω)
- ∑_{x_1 ∈ X_1(Ω)} ... ∑_{x_n ∈ X_n(Ω)} p_{X_1...X_n}(x_1, ..., x_n) = 1

Definition 1.5 (Joint probability density function). Let X_1, ..., X_n be n ∈ N rrvs on a probability space (Ω, A, P). X_1, ..., X_n are said to be jointly continuous if there exists a function f_{X_1...X_n} such that, for any subset B ⊆ R^n:

P((X_1, ..., X_n) ∈ B) = ∫_B f_{X_1...X_n}(x_1, ..., x_n) dx_1 ... dx_n    (4)

The function f_{X_1...X_n} is called the joint probability density function of X_1, ..., X_n.

Property 1.2. Let X_1, ..., X_n be n ∈ N rrvs on a probability space (Ω, A, P) with joint pdf f_{X_1...X_n}. Then the following holds:
- f_{X_1...X_n}(x_1, ..., x_n) ≥ 0, for (x_1, ..., x_n) ∈ R^n
- ∫_{R^n} f_{X_1...X_n}(x_1, ..., x_n) dx_1 ... dx_n = 1

Definition 1.6 (Marginal distributions). Let X_1, ..., X_n be n ∈ N rrvs on a probability space (Ω, A, P).
(Discrete case) If X_1, ..., X_n have joint pmf p_{X_1...X_n}, then the marginal probability mass function of X_i is obtained by summing over all X_j, j ≠ i. For instance,

p_{X_1}(x_1) = ∑_{x_2 ∈ X_2(Ω)} ... ∑_{x_n ∈ X_n(Ω)} p_{X_1...X_n}(x_1, ..., x_n), for x_1 ∈ X_1(Ω)    (5)
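The joint pmf and the marginalization in Equation (5) can be checked concretely in the case n = 2. The two independent fair dice below are a hypothetical example, not from the notes:

```python
from itertools import product

# Joint pmf of two hypothetical independent fair dice (Definition 1.4, n = 2).
p = {(x1, x2): (1 / 6) * (1 / 6) for (x1, x2) in product(range(1, 7), repeat=2)}

# Property 1.1: the joint pmf is nonnegative and sums to 1.
total = sum(p.values())

# Equation (5): the marginal pmf of X1 sums the joint pmf over all values of x2.
p_x1 = {x1: sum(p[(x1, x2)] for x2 in range(1, 7)) for x1 in range(1, 7)}

print(round(total, 10), {k: round(v, 4) for k, v in p_x1.items()})
```

Summing out x_2 recovers the uniform marginal p_{X_1}(x_1) = 1/6, exactly as the definition predicts.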

(Continuous case) If X_1, ..., X_n have joint pdf f_{X_1...X_n}, then the marginal probability density function of X_i is obtained by integrating over all X_j, j ≠ i. For instance,

f_{X_1}(x_1) = ∫_{R^{n−1}} f_{X_1...X_n}(x_1, ..., x_n) dx_2 ... dx_n, for x_1 ∈ R    (6)

Definition 1.7 (Independence). Let X_1, ..., X_n be n ∈ N rrvs on a probability space (Ω, A, P).
(Discrete case) If X_1, ..., X_n have joint pmf p_{X_1...X_n} with respective marginal pmfs p_{X_1}, ..., p_{X_n}, then X_1, ..., X_n are said to be independent if and only if:

p_{X_1...X_n}(x_1, ..., x_n) = ∏_{i=1}^{n} p_{X_i}(x_i), for all (x_1, ..., x_n) ∈ X_1(Ω) × ... × X_n(Ω)    (7)

(Continuous case) If X_1, ..., X_n have joint pdf f_{X_1...X_n} with respective marginal pdfs f_{X_1}, ..., f_{X_n}, then X_1, ..., X_n are said to be independent if and only if:

f_{X_1...X_n}(x_1, ..., x_n) = ∏_{i=1}^{n} f_{X_i}(x_i), for all (x_1, ..., x_n) ∈ R^n    (8)

Property 1.3 (Distribution of an i.i.d. sample). Let (X_1, ..., X_n) be a random sample of size n ∈ N on a probability space (Ω, A, P). Then
(Discrete case)

p_{X_1...X_n}(x_1, ..., x_n) = ∏_{i=1}^{n} p_{X_1}(x_i), for all (x_1, ..., x_n) ∈ X_1(Ω) × ... × X_n(Ω)    (9)

(Continuous case)

f_{X_1...X_n}(x_1, ..., x_n) = ∏_{i=1}^{n} f_{X_1}(x_i), for all (x_1, ..., x_n) ∈ R^n    (10)

Definition 1.8 (Expected Value). Let X_1, ..., X_n be n ∈ N rrvs on a probability space (Ω, A, P) and let g : R^n → R.
(Discrete case) If X_1, ..., X_n have joint pmf p_{X_1...X_n}, then the mathematical expectation of g(X_1, ..., X_n), if it exists, is:

E[g(X_1, ..., X_n)] = ∑_{x_1 ∈ X_1(Ω)} ... ∑_{x_n ∈ X_n(Ω)} g(x_1, ..., x_n) p_{X_1...X_n}(x_1, ..., x_n)    (11)

(Continuous case) If X_1, ..., X_n have joint pdf f_{X_1...X_n}, then the mathematical expectation of g(X_1, ..., X_n), if it exists, is:

E[g(X_1, ..., X_n)] = ∫_{R^n} g(x_1, ..., x_n) f_{X_1...X_n}(x_1, ..., x_n) dx_1 ... dx_n    (12)

Theorem 1.1. Let X_1, ..., X_n be n ∈ N independent rrvs on a probability space (Ω, A, P) and let g_1, ..., g_n be real-valued functions on R. Then,

E[g_1(X_1) ... g_n(X_n)] = E[g_1(X_1)] ... E[g_n(X_n)]    (13)

provided that the expectations exist.

Theorem 1.2 (Variance of independent rrvs). Let X_1, ..., X_n be n ∈ N independent rrvs on a probability space (Ω, A, P). Then,

Var(∑_{i=1}^{n} X_i) = ∑_{i=1}^{n} Var(X_i)    (14)

provided that the variances exist.

Property 1.4. Let (X_1, ..., X_n) be a random sample of size n ∈ N on a probability space (Ω, A, P) with mean µ = E[X_1] and variance σ² = Var(X_1) < ∞. Then

E[X̄_n] = µ    (15)

and

Var(X̄_n) = σ²/n    (16)

2 Convergence of random variables

2.1 Convergence in probability

Definition 2.1. Let (X_n)_{n ∈ N} be a sequence of rrvs on a probability space (Ω, A, P) and X be a rrv on the same probability space. The sequence (X_n) is said to converge in probability towards X if, for all ε > 0:

lim_{n→∞} P(|X_n − X| > ε) = 0    (17)

Convergence in probability is denoted as follows: X_n →P X

Example 1. Let X be a discrete rrv with pmf p_X defined by:

p_X(x) = 1/3 if x = 1; 2/3 if x = 0; 0 otherwise
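Property 1.4 lends itself to a quick empirical check. The sketch below is an illustration under an assumed population, a fair six-sided die, whose moments µ = 3.5 and σ² = 35/12 are standard; neither the die nor the sample size n = 12 comes from the notes.

```python
import random
import statistics

random.seed(1)

# Assumed population for illustration: a fair die, mu = 3.5, sigma^2 = 35/12.
mu, var = 3.5, 35 / 12
n = 12
reps = 100_000

# Replicate the sample mean many times; its mean should approach mu (Eq. 15)
# and its variance should approach sigma^2 / n (Eq. 16).
means = [statistics.mean(random.randint(1, 6) for _ in range(n)) for _ in range(reps)]

print(round(statistics.mean(means), 3))      # close to mu = 3.5
print(round(statistics.variance(means), 4))  # close to var / n
print(round(var / n, 4))
```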

and let X_n = (1 + 1/n)X. Show that X_n →P X.

Answer. We have:

|X_n − X| = |X + X/n − X| = |X/n| = X/n

since X can only take nonnegative values. Then, for any ε > 0, P(|X_n − X| > ε) = P(X/n > ε) = P(X > nε). Note that the event {X > nε} can only occur when X = 1 and nε < 1, since ε > 0. Therefore, we get:

P(|X_n − X| > ε) = p_X(1) = 1/3 if n < 1/ε; 0 if n ≥ 1/ε

It now becomes obvious that P(|X_n − X| > ε) converges to 0, because it is identically equal to zero for all n > 1/ε, which entails the desired result.

Example 2. For n ≥ 1, let (X_n) be a sequence of random variables where X_n follows an exponential distribution with parameter n. Show that (X_n) converges in probability to 0.

Answer. The probability density function of X_n is given by:

f_{X_n}(x) = n e^{−nx} 1_{[0,∞)}(x).

Let ε > 0 be an arbitrary constant. We have

P(|X_n − 0| > ε) = P(X_n > ε) = ∫_ε^∞ n e^{−nx} dx = e^{−nε} → 0 as n → ∞

since ε > 0, given that an exponential rrv can only take on nonnegative values. Hence, the result holds.

2.2 Almost sure convergence

Definition 2.2. Let (X_n)_{n ∈ N} be a sequence of rrvs on a probability space (Ω, A, P) and X be a rrv on the same probability space. The sequence (X_n) is said to converge almost surely (or almost everywhere, or with probability 1, or strongly) towards X if:

P(lim_{n→∞} X_n = X) = 1    (18)

Almost sure convergence is denoted as follows: X_n →a.s. X
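The computation in Example 2 can be cross-checked by simulation: the Monte Carlo estimate of P(X_n > ε) should track the exact value e^{−nε}. This is an illustrative sketch; the trial count and the choice ε = 0.1 are arbitrary.

```python
import math
import random

random.seed(2)

def prob_exceeds(n, eps, trials=100_000):
    """Monte Carlo estimate of P(X_n > eps) with X_n ~ Exponential(rate n)."""
    return sum(random.expovariate(n) > eps for _ in range(trials)) / trials

# Example 2 predicts P(X_n > eps) = e^{-n eps}, which vanishes as n grows.
eps = 0.1
for n in (1, 10, 50):
    print(n, round(prob_exceeds(n, eps), 4), round(math.exp(-n * eps), 4))
```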

From Equation (18), we note that almost sure convergence is a slightly modified version of the concept of pointwise convergence of functions (recall that a random variable is formally a mapping from the sample space Ω to R). That is,

∀ω ∈ Ω, X_n(ω) → X(ω)

Requiring convergence for all ω ∈ Ω is actually too stringent. To define almost sure convergence, we relax the above statement and allow that convergence might not be reached for some outcomes in Ω. Rigorously, let E be the following event:

E = {ω ∈ Ω : X_n(ω) does not converge to X(ω)}

and let F be an event with zero probability, where F need not be the impossible event (i.e. possibly F ≠ ∅). Then we say that the sequence (X_n) converges almost surely to X if E ⊆ F. Almost sure convergence is a widespread concept in the probability and statistics literature, but proving almost sure convergence requires tools from measure theory, which is out of the scope of this course.

2.3 Convergence in mean

Definition 2.3. Let (X_n)_{n ∈ N} be a sequence of rrvs on a probability space (Ω, A, P) and X be a rrv on the same probability space. Given a real number r ≥ 1, the sequence (X_n) is said to converge in the r-th mean (or in the L^r-norm) towards X if:

lim_{n→∞} E[|X_n − X|^r] = 0    (19)

provided that for all n, E[|X_n|^r] and E[|X|^r] exist. Convergence in the r-th mean is denoted as follows: X_n →L^r X

The most important cases of convergence in the r-th mean are:
- When Equation (19) holds for r = 1, we say that (X_n) converges in mean to X.
- When Equation (19) holds for r = 2, we say that (X_n) converges in mean square to X.

2.4 Convergence in distribution

Definition 2.4. Let (X_n)_{n ∈ N} be a sequence of rrvs on a probability space (Ω, A, P). For any n, the distribution function of X_n is denoted by F_n. Let X be a rrv with distribution function F_X. The sequence (X_n) is said to converge in distribution (or converge weakly) towards X if:

lim_{n→∞} F_n(x) = F_X(x)    (20)
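Convergence in mean square (r = 2) can be illustrated with the same sequence as Example 2: for X_n ~ Exponential(rate n) one has E[X_n²] = 2/n², so E[|X_n − 0|²] → 0. The sketch below estimates that second moment by Monte Carlo; the closed-form 2/n² is a standard exponential moment, and the trial count is an arbitrary choice.

```python
import random

random.seed(3)

def second_moment(n, trials=200_000):
    """Monte Carlo estimate of E[|X_n - 0|^2] with X_n ~ Exponential(rate n)."""
    return sum(random.expovariate(n) ** 2 for _ in range(trials)) / trials

# For an exponential with rate n, E[X_n^2] = 2 / n^2, so (X_n) converges to 0
# in mean square (r = 2 in Definition 2.3).
for n in (1, 5, 25):
    print(n, round(second_moment(n), 5))
```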

for all x ∈ R at which F_X is continuous. Convergence in distribution is denoted as follows: X_n →D X

The first fact to notice is that convergence in distribution, as the name suggests, only involves the distributions of the random variables. Thus, the random variables need not even be defined on the same probability space (that is, they need not be defined for the same random experiment), and indeed we don't even need the random variables at all. This is in contrast to the other modes of convergence we have studied.

Example 3. Let (X_n)_{n ∈ N} be a sequence of rrvs with cdf F_n given by

F_n(x) = (1 − (1 − 1/n)^{nx}) 1_{[0,∞)}(x)

What is the asymptotic distribution of (X_n)?

Answer. Note that for x ∈ (−∞, 0), we trivially have F_n(x) = 0 → 0. Now, let x ∈ [0, ∞); a result in calculus gives:

lim_{n→∞} (1 − 1/n)^{nx} = e^{−x}

Therefore, F_n(x) = 1 − (1 − 1/n)^{nx} → 1 − e^{−x}. We recognize the cumulative distribution function of an exponential distribution with parameter 1 (for those who are not convinced, you can differentiate the expression on the right-hand side). We conclude that the sequence (X_n) converges in distribution to an exponential distribution with parameter 1.

Theorem 2.1. Let (X_n)_{n ∈ N} be a sequence of rrvs on a probability space (Ω, A, P) with respective mgfs M_n. Let X be a rrv with mgf M_X. If the following holds:

lim_{n→∞} M_n(x) = M_X(x)    (21)

for all x ∈ R where M_n(x) and M_X(x) exist, then the sequence (X_n) converges in distribution to X.

2.5 Implications between modes of convergence

The following summary gives the implications for the various modes of convergence; no other implications hold in general.
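The limit in Example 3 is easy to verify numerically: at each fixed x ≥ 0 the gap between F_n(x) and the Exponential(1) cdf shrinks as n grows. The cdf formula below is the reconstruction used above, F_n(x) = 1 − (1 − 1/n)^{nx} for x ≥ 0; the particular grid of x and n values is arbitrary.

```python
import math

def F_n(n, x):
    """Reconstructed cdf from Example 3: F_n(x) = 1 - (1 - 1/n)^(n x) for x >= 0."""
    return 1 - (1 - 1 / n) ** (n * x) if x >= 0 else 0.0

def F_limit(x):
    """cdf of an Exponential(1) distribution, the claimed limit."""
    return 1 - math.exp(-x) if x >= 0 else 0.0

# The pointwise gap |F_n(x) - F(x)| shrinks as n grows, at every fixed x.
for x in (0.5, 1.0, 2.0):
    print(x, [round(abs(F_n(n, x) - F_limit(x)), 6) for n in (10, 100, 1000)])
```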

Proposition 2.1.
1. For s > r ≥ 1, convergence in the s-th mean implies convergence in the r-th mean.
2. Convergence in mean implies convergence in probability.
3. Almost sure convergence implies convergence in probability.
4. Convergence in probability implies convergence in distribution.

3 The Laws of Large Numbers

Property 3.1 (Markov's Inequality). Let X be a rrv that takes only nonnegative values. Then, for any a > 0, we have:

P(X ≥ a) ≤ E[X]/a    (22)

Property 3.2 (Bienaymé-Chebyshev's Inequality). Let X be a rrv that has an expectation and a variance. Then, for any α > 0, we have:

P(|X − E[X]| ≥ α) ≤ Var(X)/α²    (23)

Theorem 3.1 (Weak law of large numbers). Let (X_n)_{n ∈ N} be a sequence of i.i.d. rrvs, each having finite expectation. The weak law of large numbers (also called Khinchine's law) states that the sample mean X̄_n converges in probability towards E[X_1], that is, for all ε > 0:

lim_{n→∞} P(|X̄_n − E[X_1]| > ε) = 0    (24)

Theorem 3.2 (Strong law of large numbers). Let (X_n)_{n ∈ N} be a sequence of i.i.d. rrvs, each having finite expectation. The strong law of large numbers (SLLN), also called Kolmogorov's strong law, states that the sample mean X̄_n converges almost surely towards E[X_1], that is:

P(lim_{n→∞} X̄_n = E[X_1]) = 1    (25)

Fundamental implication. Let (X_n) be a sequence of independent Bernoulli random variables with parameter p, that is, X_n = 1 when some event E occurs with probability p = P(E), and X_n = 0 with probability 1 − p when E does not occur. According to the Strong Law of Large Numbers,

(1/n) ∑_{i=1}^{n} X_i →a.s. E[X_1] = p

In words, ∑_{i=1}^{n} X_i is the number of times that E occurs over n trials. The SLLN thus states that the frequency of observing E converges to P(E) as the size of the sample gets larger and larger. This justifies the frequentist school that sees the probability of an event as the theoretical frequency of observing that event.
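The fundamental implication above can be watched happening in a simulation: the observed frequency of an event E drifts toward P(E) as the number of Bernoulli trials grows. The value p = 0.3 and the trial counts are arbitrary choices for illustration.

```python
import random

random.seed(4)

p = 0.3  # P(E); the value 0.3 is an arbitrary choice for illustration

def frequency(trials):
    """Observed frequency of E over i.i.d. Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(trials)) / trials

# The observed frequency settles near p as the number of trials grows (SLLN).
for trials in (100, 10_000, 1_000_000):
    print(trials, round(frequency(trials), 4))
```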

4 The Central Limit Theorem

Property 4.1. Let X_1, ..., X_n be n ∈ N independent rrvs on a probability space (Ω, A, P) with respective moment generating functions M_1, ..., M_n. Then the moment generating function of S_n = ∑_{i=1}^{n} X_i is:

M_{S_n}(x) = ∏_{i=1}^{n} M_i(x)    (26)

for all x ∈ R where the M_i(x) exist.

Corollary 4.1 (Mgf of an i.i.d. sample). Let (X_1, ..., X_n) be a random sample of size n ∈ N on a probability space (Ω, A, P) with moment generating function M = M_{X_1}. Then the moment generating function of S_n = ∑_{i=1}^{n} X_i is:

M_{S_n}(x) = (M(x))^n    (27)

for all x ∈ R where M(x) exists.

Theorem 4.2 (Central Limit Theorem). Let (X_n)_{n ∈ N} be a sequence of i.i.d. rrvs, each having expectation E[X_1] = µ and finite variance Var(X_1) = σ² < ∞. The Central Limit Theorem states that the sequence of variables (Z_n)_{n ∈ N} defined by:

Z_n = (X̄_n − µ) / √(σ²/n)

converges in distribution towards Z following a standard normal distribution N(0, 1), that is:

lim_{n→∞} F_{Z_n}(x) = Φ(x), for all x ∈ R    (28)

Equivalently, the CLT can be rewritten as:

X̄_n →D N(µ, σ²/n)

Applications of the CLT. Along with the Strong Law of Large Numbers, the CLT is the other most important result in probability and statistics. In words, the CLT states that the distribution of the sum (or mean) of any i.i.d. random variables converges to a normal distribution, provided that the population distribution has finite variance. As a consequence, you can use the normal distribution to approximate probabilities as long as the sample size is large enough. How large is large enough? The answer depends on two factors.
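Theorem 4.2 can be probed empirically: build many standardized sample means Z_n and compare their empirical cdf with Φ at a point. The Uniform(0, 1) population (µ = 1/2, σ² = 1/12), the sample size n = 30, and the replication count are all assumptions for illustration, not from the notes.

```python
import math
import random
import statistics

random.seed(5)

# Assumed population for illustration: Uniform(0, 1), mu = 1/2, sigma^2 = 1/12.
mu, sigma = 0.5, math.sqrt(1 / 12)
n = 30

def z_n():
    """One standardized sample mean Z_n, as in Theorem 4.2."""
    xbar = statistics.mean(random.random() for _ in range(n))
    return (xbar - mu) / (sigma / math.sqrt(n))

zs = [z_n() for _ in range(50_000)]

# Compare the empirical P(Z_n <= 1) with Phi(1) of the standard normal cdf.
phi_1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))
empirical = sum(z <= 1 for z in zs) / len(zs)
print(round(empirical, 4), round(phi_1, 4))
```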

Requirements for accuracy. The more closely the sampling distribution needs to resemble a normal distribution, the more sample points will be required.

The shape of the underlying population. The more closely the original population resembles a normal distribution, the fewer sample points will be required.

Empirical evidence shows that a sample size of 30 is large enough when the population distribution is roughly bell-shaped. Some statisticians may recommend a sample size of at least 40, though. But if the original population is distinctly not normal, the sample size should be even larger.

Example 4. Let (X_1, ..., X_15) be a random sample with probability density function:

f(x) = (3/2) x² 1_{(−1,1)}(x)

What is the approximate probability that the sample mean X̄_15 falls between −2/5 and 1/5?

Answer. The CLT states that X̄_15 follows approximately a normal distribution N(E[X_1], Var(X_1)/15). Let us compute E[X_1] and Var(X_1):

E[X_1] = ∫_{−1}^{1} x (3/2) x² dx = 0

Var(X_1) = E[X_1²] − E[X_1]² = ∫_{−1}^{1} x² (3/2) x² dx − 0² = 3/5

Therefore, Var(X_1)/15 = (3/5)/15 = 3/75 = 1/25, and X̄_15 approximately follows N(0, 1/25). Hence,

P(−2/5 ≤ X̄_15 ≤ 1/5) = P((−2/5 − 0)/√(1/25) ≤ (X̄_15 − 0)/√(1/25) ≤ (1/5 − 0)/√(1/25))
≈ P(−2 ≤ Z ≤ 1)    (Z is a standard normal random variable)
= Φ(1) − Φ(−2)    (Φ is the cdf of N(0, 1))
= Φ(1) + Φ(2) − 1
≈ 0.8413 + 0.9772 − 1 = 0.8185
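The CLT answer to Example 4 is only an approximation, so it is worth checking against a direct Monte Carlo estimate. Since F(x) = (x³ + 1)/2 on (−1, 1), inverse-transform sampling gives X = (2U − 1)^{1/3}; the sampler and trial count below are an illustrative sketch, not part of the notes.

```python
import math
import random

random.seed(6)

def draw_x():
    """Sample from f(x) = (3/2) x^2 on (-1, 1) by inverting F(x) = (x^3 + 1) / 2."""
    v = 2 * random.random() - 1
    # Signed cube root, since v may be negative.
    return math.copysign(abs(v) ** (1 / 3), v)

def estimate(trials=100_000):
    """Monte Carlo estimate of P(-2/5 <= sample mean of 15 draws <= 1/5)."""
    hits = 0
    for _ in range(trials):
        xbar = sum(draw_x() for _ in range(15)) / 15
        hits += -2 / 5 <= xbar <= 1 / 5
    return hits / trials

est = estimate()
print(round(est, 3))  # the CLT approximation above gives about 0.8185
```

The simulated probability lands close to the CLT value 0.8185, which is reassuring given the modest sample size of 15.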