Distribution of Random Samples & Limit theorems

STAT/MATH 395 A - PROBABILITY II
UW Winter Quarter 2017
Néhémy Lim

1 Distribution of i.i.d. Samples

Motivating example. Assume that the goal of a study is to demonstrate the digital transformation in the US population over the past 5 years. One feature of this technological revolution is, for instance, the time spent on smartphones. Let us denote by $X$ that random variable for some US individual. If one is interested in $E[X]$, the mean time spent on smartphones by the US population, that quantity is exactly
\[ E[X] = \frac{1}{N} \sum_{i=1}^{N} x_i \]
where $N \approx 320$ million is the size of the US population and $x_i$ is the time spent by individual $i$ on his/her smartphone. However, in most statistical studies, it is rare to have access to the whole population. Therefore, we need some principled guidance to estimate $E[X]$ from a sample of the population of size $n$, with generally $n \ll N$.

Definition 1.1 (i.i.d. Sample). Let $X_1, \dots, X_n$ be a collection of $n \in \mathbb{N}$ random variables on probability space $(\Omega, \mathcal{A}, P)$. $(X_1, \dots, X_n)$ is a random sample of size $n$ if and only if:
(1) $X_1, \dots, X_n$ are mutually independent;
(2) $X_1, \dots, X_n$ are identically distributed, that is, each $X_i$ comes from the same distribution.
We say that $X_1, \dots, X_n$ are independent and identically distributed (abbreviated i.i.d.).

A good statistical sample is such that the picked individuals are representative of the population. The latter is ensured by the mutual independence of the individuals. Upon existence, the population mean is denoted $\mu = E[X_1]$ and the variance of the population $\sigma^2 = \mathrm{Var}(X_1)$.

Definition 1.2 (Sample mean). Let $(X_1, \dots, X_n)$ be a random sample. Then the random variable
\[ \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{1} \]
is called the sample mean.
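The motivating example can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the "population" is synthetic (gamma-distributed smartphone minutes, a hypothetical choice, not real data), and $N$ is far smaller than 320 million so that it runs quickly. It compares the exact population mean with the sample mean of Definition 1.2 computed from $n \ll N$ draws.

```python
import random

def sample_mean(values):
    """Return the sample mean (1/n) * sum of the observations."""
    return sum(values) / len(values)

rng = random.Random(395)  # fixed seed so that the run is reproducible

# Hypothetical finite population of N individuals (synthetic data, an assumption).
N = 100_000
population = [rng.gammavariate(4.0, 45.0) for _ in range(N)]
population_mean = sample_mean(population)  # exact E[X] for this finite population

# Random sample of size n << N. Drawing with replacement makes the draws i.i.d.
n = 1_000
sample = [rng.choice(population) for _ in range(n)]
x_bar = sample_mean(sample)

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {x_bar:.2f}")
```

Sampling with replacement is used here because it yields draws that are exactly independent and identically distributed; when $n \ll N$, sampling without replacement behaves almost identically.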

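We are interested in the distribution of $\bar{X}_n$. We need to define formally the distribution of more than 2 random variables. You should see that most of the following definitions/properties are natural extensions of the case $n = 2$.

Definition 1.3 (Joint cumulative distribution function). Let $X_1, \dots, X_n$ be $n \in \mathbb{N}$ rrvs on probability space $(\Omega, \mathcal{A}, P)$. The joint (cumulative) distribution function (c.d.f.) of $X_1, \dots, X_n$ is defined as follows: for $(x_1, \dots, x_n) \in \mathbb{R}^n$,
\[ F_{X_1 \dots X_n}(x_1, \dots, x_n) = P(X_1 \leq x_1, \dots, X_n \leq x_n) \tag{2} \]

Definition 1.4 (Discrete rrvs). Let $X_1, \dots, X_n$ be $n \in \mathbb{N}$ discrete rrvs on probability space $(\Omega, \mathcal{A}, P)$. The joint probability mass function (pmf) of $X_1, \dots, X_n$, denoted $p_{X_1 \dots X_n}$, is defined as follows:
\[ p_{X_1 \dots X_n}(x_1, \dots, x_n) = P(X_1 = x_1, \dots, X_n = x_n) \tag{3} \]
for $(x_1, \dots, x_n) \in X_1(\Omega) \times \dots \times X_n(\Omega)$.

Property 1.1. Let $X_1, \dots, X_n$ be $n \in \mathbb{N}$ discrete rrvs on probability space $(\Omega, \mathcal{A}, P)$ with joint pmf $p_{X_1 \dots X_n}$. Then the following holds:
- $p_{X_1 \dots X_n}(x_1, \dots, x_n) \geq 0$, for $(x_1, \dots, x_n) \in X_1(\Omega) \times \dots \times X_n(\Omega)$
- $\sum_{x_1 \in X_1(\Omega)} \dots \sum_{x_n \in X_n(\Omega)} p_{X_1 \dots X_n}(x_1, \dots, x_n) = 1$

Definition 1.5 (Joint probability density function). Let $X_1, \dots, X_n$ be $n \in \mathbb{N}$ rrvs on probability space $(\Omega, \mathcal{A}, P)$. $X_1, \dots, X_n$ are said to be jointly continuous if there exists a function $f_{X_1 \dots X_n}$ such that, for any subset $B \subseteq \mathbb{R}^n$:
\[ P((X_1, \dots, X_n) \in B) = \int_B f_{X_1 \dots X_n}(x_1, \dots, x_n) \, dx_1 \dots dx_n \tag{4} \]
The function $f_{X_1 \dots X_n}$ is called the joint probability density function of $X_1, \dots, X_n$.

Property 1.2. Let $X_1, \dots, X_n$ be $n \in \mathbb{N}$ rrvs on probability space $(\Omega, \mathcal{A}, P)$ with joint pdf $f_{X_1 \dots X_n}$. Then the following holds:
- $f_{X_1 \dots X_n}(x_1, \dots, x_n) \geq 0$, for $(x_1, \dots, x_n) \in \mathbb{R}^n$
- $\int_{\mathbb{R}^n} f_{X_1 \dots X_n}(x_1, \dots, x_n) \, dx_1 \dots dx_n = 1$

Definition 1.6 (Marginal distributions). Let $X_1, \dots, X_n$ be $n \in \mathbb{N}$ rrvs on probability space $(\Omega, \mathcal{A}, P)$.

(Discrete case) If $X_1, \dots, X_n$ have joint pmf $p_{X_1 \dots X_n}$, then the marginal probability mass function of $X_i$ is obtained by summing over the values of all $X_j$, $j \neq i$. For instance,
\[ p_{X_1}(x_1) = \sum_{x_2 \in X_2(\Omega)} \dots \sum_{x_n \in X_n(\Omega)} p_{X_1 \dots X_n}(x_1, \dots, x_n), \quad \text{for } x_1 \in X_1(\Omega) \tag{5} \]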

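(Continuous case) If $X_1, \dots, X_n$ have joint pdf $f_{X_1 \dots X_n}$, then the marginal probability density function of $X_i$ is obtained by integrating over the values of all $X_j$, $j \neq i$. For instance,
\[ f_{X_1}(x_1) = \int_{\mathbb{R}^{n-1}} f_{X_1 \dots X_n}(x_1, \dots, x_n) \, dx_2 \dots dx_n, \quad \text{for } x_1 \in \mathbb{R} \tag{6} \]

Definition 1.7 (Independence). Let $X_1, \dots, X_n$ be $n \in \mathbb{N}$ rrvs on probability space $(\Omega, \mathcal{A}, P)$.

(Discrete case) If $X_1, \dots, X_n$ have joint pmf $p_{X_1 \dots X_n}$ with respective marginal pmfs $p_{X_1}, \dots, p_{X_n}$, then $X_1, \dots, X_n$ are said to be independent if and only if:
\[ p_{X_1 \dots X_n}(x_1, \dots, x_n) = \prod_{i=1}^{n} p_{X_i}(x_i), \quad \text{for all } (x_1, \dots, x_n) \in X_1(\Omega) \times \dots \times X_n(\Omega) \tag{7} \]

(Continuous case) If $X_1, \dots, X_n$ have joint pdf $f_{X_1 \dots X_n}$ with respective marginal pdfs $f_{X_1}, \dots, f_{X_n}$, then $X_1, \dots, X_n$ are said to be independent if and only if:
\[ f_{X_1 \dots X_n}(x_1, \dots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i), \quad \text{for all } (x_1, \dots, x_n) \in \mathbb{R}^n \tag{8} \]

Property 1.3 (Distribution of iid sample). Let $(X_1, \dots, X_n)$ be a random sample of size $n \in \mathbb{N}$ on probability space $(\Omega, \mathcal{A}, P)$. Then:

(Discrete case)
\[ p_{X_1 \dots X_n}(x_1, \dots, x_n) = \prod_{i=1}^{n} p_{X_1}(x_i), \quad \text{for all } (x_1, \dots, x_n) \in X_1(\Omega) \times \dots \times X_n(\Omega) \tag{9} \]

(Continuous case)
\[ f_{X_1 \dots X_n}(x_1, \dots, x_n) = \prod_{i=1}^{n} f_{X_1}(x_i), \quad \text{for all } (x_1, \dots, x_n) \in \mathbb{R}^n \tag{10} \]

Definition 1.8 (Expected Value). Let $X_1, \dots, X_n$ be $n \in \mathbb{N}$ rrvs on probability space $(\Omega, \mathcal{A}, P)$ and let $g : \mathbb{R}^n \to \mathbb{R}$.

(Discrete case) If $X_1, \dots, X_n$ have joint pmf $p_{X_1 \dots X_n}$, then the mathematical expectation of $g(X_1, \dots, X_n)$, if it exists, is:
\[ E[g(X_1, \dots, X_n)] = \sum_{x_1 \in X_1(\Omega)} \dots \sum_{x_n \in X_n(\Omega)} g(x_1, \dots, x_n) \, p_{X_1 \dots X_n}(x_1, \dots, x_n) \tag{11} \]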

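(Continuous case) If $X_1, \dots, X_n$ have joint pdf $f_{X_1 \dots X_n}$, then the mathematical expectation of $g(X_1, \dots, X_n)$, if it exists, is:
\[ E[g(X_1, \dots, X_n)] = \int_{\mathbb{R}^n} g(x_1, \dots, x_n) \, f_{X_1 \dots X_n}(x_1, \dots, x_n) \, dx_1 \dots dx_n \tag{12} \]

Theorem 1.1. Let $X_1, \dots, X_n$ be $n \in \mathbb{N}$ independent rrvs on probability space $(\Omega, \mathcal{A}, P)$ and let $g_1, \dots, g_n$ be real-valued functions on $\mathbb{R}$. Then,
\[ E[g_1(X_1) \cdots g_n(X_n)] = E[g_1(X_1)] \cdots E[g_n(X_n)] \tag{13} \]
provided that the expectations exist.

Theorem 1.2 (Variance of independent rrvs). Let $X_1, \dots, X_n$ be $n \in \mathbb{N}$ independent rrvs on probability space $(\Omega, \mathcal{A}, P)$. Then,
\[ \mathrm{Var}\left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) \tag{14} \]
provided that the variances exist.

Property 1.4. Let $(X_1, \dots, X_n)$ be a random sample of size $n \in \mathbb{N}$ on probability space $(\Omega, \mathcal{A}, P)$ with mean $\mu = E[X_1]$ and variance $\sigma^2 = \mathrm{Var}(X_1) < \infty$. Then
\[ E[\bar{X}_n] = \mu \tag{15} \]
and
\[ \mathrm{Var}(\bar{X}_n) = \frac{\sigma^2}{n} \tag{16} \]

2 Convergence of random variables

2.1 Convergence in probability

Definition 2.1. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of rrvs on probability space $(\Omega, \mathcal{A}, P)$ and $X$ be a rrv on the same probability space. Sequence $(X_n)$ is said to converge in probability towards $X$ if, for all $\varepsilon > 0$:
\[ \lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0 \tag{17} \]
Convergence in probability is denoted as follows:
\[ X_n \xrightarrow{P} X \]

Example 1. Let $X$ be a discrete rrv with pmf $p_X$ defined by:
\[ p_X(x) = \begin{cases} 1/3 & \text{if } x = 1 \\ 2/3 & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases} \]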

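and let $X_n = \left(1 + \frac{1}{n}\right) X$. Show that $X_n \xrightarrow{P} X$.

Answer. We have:
\[ |X_n - X| = \left| X + \frac{X}{n} - X \right| = \left| \frac{X}{n} \right| = \frac{X}{n} \]
since $X$ can only take nonnegative values. Then, for any $\varepsilon > 0$, $P(|X_n - X| > \varepsilon) = P(X > n\varepsilon)$. Note that, since $X$ only takes the values 0 and 1 and $n\varepsilon > 0$, the event $\{X > n\varepsilon\}$ can only occur when $X = 1$ and $n\varepsilon < 1$. Therefore, we get:
\[ P(|X_n - X| > \varepsilon) = \begin{cases} p_X(1) = 1/3 & \text{if } n < 1/\varepsilon \\ 0 & \text{if } n \geq 1/\varepsilon \end{cases} \]
It now becomes obvious that $P(|X_n - X| > \varepsilon)$ converges to 0, because it is identically equal to zero for all $n \geq 1/\varepsilon$, which entails the desired result.

Example 2. For $n \geq 1$, let $(X_n)$ be a sequence of random variables where $X_n$ follows an exponential distribution with parameter $n$. Show that $(X_n)$ converges in probability to 0.

Answer. The probability density function of $X_n$ is given by:
\[ f_{X_n}(x) = n e^{-nx} \, 1_{[0, \infty)}(x). \]
Let $\varepsilon > 0$ be an arbitrary constant. Given that an exponential rrv can only take on nonnegative values, we have
\[ P(|X_n - 0| > \varepsilon) = P(X_n > \varepsilon) = \int_{\varepsilon}^{\infty} n e^{-nx} \, dx = e^{-n\varepsilon} \xrightarrow[n \to \infty]{} 0 \]
since $\varepsilon > 0$. Hence, the result holds.

2.2 Almost sure convergence

Definition 2.2. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of rrvs on probability space $(\Omega, \mathcal{A}, P)$ and $X$ be a rrv on the same probability space. Sequence $(X_n)$ is said to converge almost surely (or almost everywhere, or with probability 1, or strongly) towards $X$ if:
\[ P\left( \lim_{n \to \infty} X_n = X \right) = 1 \tag{18} \]
Almost sure convergence is denoted as follows:
\[ X_n \xrightarrow{a.s.} X \]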
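As a numerical companion to Example 2 of Section 2.1, the following sketch estimates $P(|X_n - 0| > \varepsilon)$ by simulation and compares it with the closed form $e^{-n\varepsilon}$. The function name `tail_probability`, the value $\varepsilon = 0.1$ and the number of trials are illustrative assumptions, not part of the notes.

```python
import math
import random

def tail_probability(n, eps, trials, rng):
    """Monte Carlo estimate of P(X_n > eps) for X_n ~ Exponential(rate n)."""
    hits = sum(1 for _ in range(trials) if rng.expovariate(n) > eps)
    return hits / trials

rng = random.Random(0)  # fixed seed so that the run is reproducible
eps = 0.1               # illustrative choice of epsilon
for n in (1, 5, 10, 50, 100):
    estimate = tail_probability(n, eps, 20_000, rng)
    exact = math.exp(-n * eps)  # closed form derived in Example 2
    print(f"n={n:3d}  estimate={estimate:.4f}  exact={exact:.4f}")
```

Both columns should shrink towards 0 as $n$ grows, which is what convergence in probability to 0 requires for this particular $\varepsilon$; the definition asks for the same behaviour for every $\varepsilon > 0$.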

From Equation (18), we note that almost sure convergence is a slightly modified version of the concept of pointwise convergence of functions (recall that a random variable is formally a mapping from the sample space $\Omega$ to $\mathbb{R}$). That is,
\[ \forall \omega \in \Omega, \quad X_n(\omega) \xrightarrow[n \to \infty]{} X(\omega) \]
Requiring convergence for all $\omega \in \Omega$ is actually too stringent. To define almost sure convergence, we relax the above statement and allow that convergence might not be reached for some outcomes in $\Omega$. Rigorously, let $E$ be the following event:
\[ E = \{ \omega \in \Omega : X_n(\omega) \text{ does not converge to } X(\omega) \} \]
and let $F$ be an event with zero probability, where $F$ need not be the impossible event (that is, $F \neq \emptyset$ is allowed). Then we say that the sequence $(X_n)$ converges almost surely to $X$ if $E \subseteq F$.

Almost sure convergence is a widespread concept in the Probability and Statistics literature, but proving almost sure convergence requires tools from measure theory, which is out of the scope of this course.

2.3 Convergence in mean

Definition 2.3. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of rrvs on probability space $(\Omega, \mathcal{A}, P)$ and $X$ be a rrv on the same probability space. Given a real number $r \geq 1$, sequence $(X_n)$ is said to converge in the $r$-th mean (or in the $L^r$-norm) towards $X$ if:
\[ \lim_{n \to \infty} E[|X_n - X|^r] = 0 \tag{19} \]
provided that for all $n$, $E[|X_n|^r]$ and $E[|X|^r]$ exist. Convergence in the $r$-th mean is denoted as follows:
\[ X_n \xrightarrow{L^r} X \]
The most important cases of convergence in the $r$-th mean are:
- When Equation (19) holds for $r = 1$, we say that $(X_n)$ converges in mean to $X$.
- When Equation (19) holds for $r = 2$, we say that $(X_n)$ converges in mean square to $X$.

2.4 Convergence in distribution

Definition 2.4. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of rrvs on probability space $(\Omega, \mathcal{A}, P)$. For any $n$, the distribution function of $X_n$ is denoted by $F_n$. Let $X$ be a rrv with distribution function $F_X$. Sequence $(X_n)$ is said to converge in distribution (or converge weakly) towards $X$ if:
\[ \lim_{n \to \infty} F_n(x) = F_X(x) \tag{20} \]

for all $x \in \mathbb{R}$ at which $F_X$ is continuous. Convergence in distribution is denoted as follows:
\[ X_n \xrightarrow{D} X \]
The first fact to notice is that convergence in distribution, as the name suggests, only involves the distributions of the random variables. Thus, the random variables need not even be defined on the same probability space (that is, they need not be defined for the same random experiment), and indeed we don't even need the random variables at all. This is in contrast to the other modes of convergence we have studied.

Example 3. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of rrvs with cdf $F_n$:
\[ F_n(x) = \left( 1 - \left( 1 - \frac{1}{n} \right)^{nx} \right) 1_{(0, \infty)}(x) \]
What is the asymptotic distribution of $(X_n)$?

Answer. Note that for $x \in (-\infty, 0)$, we trivially have that $F_n(x) = 0 \to 0$. Now, let $x \in [0, \infty)$. A result in calculus gives:
\[ \lim_{n \to \infty} \left( 1 - \frac{1}{n} \right)^{nx} = e^{-x} \]
Therefore,
\[ F_n(x) = 1 - \left( 1 - \frac{1}{n} \right)^{nx} \xrightarrow[n \to \infty]{} 1 - e^{-x}. \]
We recognize the cumulative distribution function of an exponential distribution with parameter 1 (for those who are not convinced, you can differentiate the expression on the right-hand side). We conclude that the sequence $(X_n)$ converges in distribution to an exponential distribution with parameter 1.

Theorem 2.1. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of rrvs on probability space $(\Omega, \mathcal{A}, P)$ with respective mgfs $M_n$. Let $X$ be a rrv with mgf $M_X$. If the following holds:
\[ \lim_{n \to \infty} M_n(x) = M_X(x) \tag{21} \]
for all $x \in \mathbb{R}$ where $M_n(x)$ and $M_X(x)$ exist, then the sequence $(X_n)$ converges in distribution to $X$.

2.5 Implications between modes of convergence

The following summary gives the implications for the various modes of convergence; no other implications hold in general.

Proposition 2.1.
1. For $s > r \geq 1$, convergence in the $s$-th mean implies convergence in the $r$-th mean.
2. Convergence in mean implies convergence in probability.
3. Almost sure convergence implies convergence in probability.
4. Convergence in probability implies convergence in distribution.

3 The Laws of Large Numbers

Property 3.1 (Markov's Inequality). Let $X$ be a rrv that takes on only nonnegative values. Then, for any $a > 0$, we have:
\[ P(X \geq a) \leq \frac{E[X]}{a} \tag{22} \]

Property 3.2 (Bienaymé-Chebyshev's Inequality). Let $X$ be a rrv that has expectation and variance. Then, for any $\alpha > 0$, we have:
\[ P(|X - E[X]| \geq \alpha) \leq \frac{\mathrm{Var}(X)}{\alpha^2} \tag{23} \]

Theorem 3.1 (Weak law of large numbers). Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of i.i.d. rrvs, each having finite expectation. The weak law of large numbers (also called Khintchine's law) states that the sample mean $\bar{X}_n$ converges in probability towards $E[X_1]$, that is, for all $\varepsilon > 0$:
\[ \lim_{n \to \infty} P\left( |\bar{X}_n - E[X_1]| > \varepsilon \right) = 0 \tag{24} \]

Theorem 3.2 (Strong law of large numbers). Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of i.i.d. rrvs, each having finite expectation. The strong law of large numbers (SLLN), also called Kolmogorov's strong law, states that the sample mean $\bar{X}_n$ converges almost surely towards $E[X_1]$, that is:
\[ P\left( \lim_{n \to \infty} \bar{X}_n = E[X_1] \right) = 1 \tag{25} \]

Fundamental implication. Let $(X_n)$ be a sequence of independent Bernoulli random variables with parameter $p$, that is, $X_n = 1$ when some event $E$ occurs, with probability $p = P(E)$, and $X_n = 0$ when $E$ does not occur, with probability $1 - p$. According to the Strong Law of Large Numbers,
\[ \frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{a.s.} E[X_1] = p \]
In words, $\sum_{i=1}^{n} X_i$ is the number of times that $E$ occurs over $n$ trials. The SLLN thus states that the frequency of observing $E$ converges to $P(E)$ as the size of the sample gets larger and larger. This justifies the frequentist school that sees the probability of an event as the theoretical frequency of observing that event.
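The frequentist reading of the SLLN can be observed on a single simulated sequence of Bernoulli trials. The sketch below is an illustration only: the value $p = 0.3$, the seed and the helper name `running_frequencies` are assumptions chosen for the example.

```python
import random

def running_frequencies(p, n_trials, rng):
    """Return the relative frequency of E after each of n_trials Bernoulli(p) trials."""
    freqs = []
    count = 0  # number of times E has occurred so far
    for n in range(1, n_trials + 1):
        if rng.random() < p:  # E occurs with probability p
            count += 1
        freqs.append(count / n)
    return freqs

rng = random.Random(2017)  # fixed seed so that the run is reproducible
p = 0.3                    # illustrative value of P(E)
freqs = running_frequencies(p, 100_000, rng)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={n:6d}  frequency={freqs[n - 1]:.4f}")
```

A single finite run cannot prove almost sure convergence, but the printed frequencies should settle near $p$ as $n$ grows, which is the behaviour the SLLN describes.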

4 The Central Limit Theorem

Property 4.1. Let $X_1, \dots, X_n$ be $n \in \mathbb{N}$ independent rrvs on probability space $(\Omega, \mathcal{A}, P)$ with respective moment generating functions $M_1, \dots, M_n$. Then the moment generating function of $S_n = \sum_{i=1}^{n} X_i$ is:
\[ M_{S_n}(x) = \prod_{i=1}^{n} M_i(x) \tag{26} \]
for all $x \in \mathbb{R}$ where the $M_i(x)$ exist.

Corollary 4.1 (Mgf of iid sample). Let $(X_1, \dots, X_n)$ be a random sample of size $n \in \mathbb{N}$ on probability space $(\Omega, \mathcal{A}, P)$ with moment generating function $M = M_{X_1}$. Then the moment generating function of $S_n = \sum_{i=1}^{n} X_i$ is:
\[ M_{S_n}(x) = (M(x))^n \tag{27} \]
for all $x \in \mathbb{R}$ where $M(x)$ exists.

Theorem 4.2 (Central Limit Theorem). Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of i.i.d. rrvs, each having expectation $E[X_1] = \mu$ and finite variance $\mathrm{Var}(X_1) = \sigma^2 < \infty$. The Central Limit Theorem states that the sequence of variables $(Z_n)_{n \in \mathbb{N}}$ defined by:
\[ Z_n = \frac{\bar{X}_n - \mu}{\sqrt{\sigma^2 / n}} \]
converges in distribution towards $Z$ following a standard normal distribution $N(0, 1)$, that is:
\[ \lim_{n \to \infty} F_{Z_n}(x) = \Phi(x), \quad \text{for all } x \in \mathbb{R} \tag{28} \]
Equivalently, the CLT is often written informally as: for large $n$,
\[ \bar{X}_n \overset{D}{\approx} N\left( \mu, \frac{\sigma^2}{n} \right) \]

Applications of the CLT. Together with the Strong Law of Large Numbers, the CLT is one of the most important results in Probability and Statistics. In words, the CLT states that the distribution of the suitably standardized sum (or mean) of any iid random variables converges to a normal distribution, provided that the population distribution has finite variance. As a consequence, you can use the normal distribution to approximate probabilities as long as the sample size is large enough. How large is large enough? The answer depends on two factors.

- Requirements for accuracy. The more closely the sampling distribution needs to resemble a normal distribution, the more sample points will be required.
- The shape of the underlying population. The more closely the original population resembles a normal distribution, the fewer sample points will be required.

Empirical evidence shows that a sample size of 30 is large enough when the population distribution is roughly bell-shaped. Some statisticians may recommend a sample size of at least 40 though. But if the original population is distinctly not normal, the sample size should be even larger.

Example 4. Let $(X_1, \dots, X_{15})$ be a random sample with probability density function:
\[ f(x) = \frac{3}{2} x^2 \, 1_{(-1, 1)}(x) \]
What is the approximate probability that the sample mean $\bar{X}_{15}$ falls between $-2/5$ and $1/5$?

Answer. The CLT states that $\bar{X}_{15}$ follows approximately a normal distribution $N(E[X_1], \mathrm{Var}(X_1)/15)$. Let us compute $E[X_1]$ and $\mathrm{Var}(X_1)$:
\[ E[X_1] = \int_{-1}^{1} x \cdot \frac{3}{2} x^2 \, dx = 0 \]
\[ \mathrm{Var}(X_1) = E[X_1^2] - E[X_1]^2 = \int_{-1}^{1} x^2 \cdot \frac{3}{2} x^2 \, dx - 0^2 = \frac{3}{5} \]
Therefore, $\mathrm{Var}(X_1)/15 = 3/75 = 1/25$ and
\[ \bar{X}_{15} \overset{D}{\approx} N\left( 0, \frac{1}{25} \right) \]
Hence,
\[ P\left( -\frac{2}{5} \leq \bar{X}_{15} \leq \frac{1}{5} \right) = P\left( \frac{-2/5 - 0}{\sqrt{1/25}} \leq \frac{\bar{X}_{15} - 0}{\sqrt{1/25}} \leq \frac{1/5 - 0}{\sqrt{1/25}} \right) \]
\[ \approx P(-2 \leq Z \leq 1) \quad (Z \text{ is a standard normal random variable}) \]
\[ = \Phi(1) - \Phi(-2) \quad (\Phi \text{ is the cdf of } N(0, 1)) \]
\[ = \Phi(1) + \Phi(2) - 1 \]
\[ \approx 0.8413 + 0.9772 - 1 \]
\[ \approx 0.8185 \]
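The normal approximation of Example 4 can be compared with a direct simulation. The sketch below is an illustration under stated assumptions: it draws from $f$ by inverse transform sampling, using the cdf $F(x) = (x^3 + 1)/2$ on $(-1, 1)$ obtained by integrating $f$, and the helper names `phi`, `draw_x` and `sample_mean`, the seed and the number of trials are all choices made for this example.

```python
import math
import random

def phi(z):
    """Cdf of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def draw_x(rng):
    """Draw from f(x) = (3/2) x^2 on (-1, 1) by inverting F(x) = (x^3 + 1) / 2."""
    v = 2.0 * rng.random() - 1.0                     # v = 2u - 1 with u uniform on [0, 1)
    return math.copysign(abs(v) ** (1.0 / 3.0), v)   # real cube root of v

def sample_mean(n, rng):
    """Sample mean of n i.i.d. draws from f."""
    return sum(draw_x(rng) for _ in range(n)) / n

# CLT approximation from Example 4: mean 0 and standard deviation 1/5.
clt_value = phi((1 / 5 - 0) / (1 / 5)) - phi((-2 / 5 - 0) / (1 / 5))

# Monte Carlo estimate of P(-2/5 <= sample mean of 15 draws <= 1/5).
rng = random.Random(15)  # fixed seed so that the run is reproducible
trials = 50_000
hits = sum(1 for _ in range(trials) if -2 / 5 <= sample_mean(15, rng) <= 1 / 5)
mc_value = hits / trials

print(f"CLT approximation    = {clt_value:.4f}")
print(f"Monte Carlo estimate = {mc_value:.4f}")
```

The CLT value reproduces the hand computation (about 0.82), and the simulated frequency gives a rough idea of how accurate the approximation is for $n = 15$ with this distinctly non-normal population.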