STAT 515 fa 2016 Lec Sampling distribution of the mean, part 2 (central limit theorem)

Lec 15-16
Karl B. Gregory
Monday, Sep 26th

Contents
1 The central limit theorem
1.1 The most important theorem in statistics
1.2 More adjectives for probability distributions
1.3 Central limit theorem for the sample proportion
1.4 Diagrams for $\bar{X}$ and $\hat{p}$
1.5 Further examples of Normal approximation

1 The central limit theorem

1.1 The most important theorem in statistics

Probably the most important theorem in statistics is the central limit theorem. This theorem tells us that the sample mean behaves like a Normal random variable if the sample size is large enough, even if the population itself is not Normal!

Theorem 1 (Central Limit Theorem). If $X$ has mean $\mu$ and variance $\sigma^2 < \infty$, then for a random sample $X_1, \dots, X_n$ of $X$ values, the sample mean
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]
behaves more and more like a Normal$(\mu, \sigma^2/n)$ random variable for larger and larger sample sizes $n$.

Example 1. Let $X$ be the marathon time of a randomly selected runner of the next Columbia marathon. The distribution is skewed to the right and has mean 4.5 hours and standard deviation 2 hours. Suppose you take a random sample of 30 finishers.

Question: What is the probability that the mean of the 30 times is less than 4.25 hours?

Answer: Even though the marathon times are not Normally distributed, the sample mean $\bar{X}$ should behave like a random variable with a Normal$(\mu, \sigma^2/n)$, i.e. a Normal$(4.5, 4/30)$, distribution. Thus we can get $P(\bar{X} < 4.25)$ using the Normal distribution:
\[ Z = \frac{4.25 - 4.5}{\sqrt{4/30}} = -.68, \quad \text{and} \quad P(Z < -.68) = .2483. \]
So the answer is $P(\bar{X} < 4.25) = .2483$.

Example 2. Let $X$ be the number of ships which come through a set of locks in an afternoon, and suppose the mean and standard deviation of $X$ are 6 and 1, respectively. Suppose you observe on five randomly selected afternoons and compute $\bar{X}$, the mean of the numbers of ships you counted on the 5 afternoons.

Question: What is $P(\bar{X} > 7)$?

Answer: We cannot compute it, because the sample size is small, and the central limit theorem holds only for large sample sizes.

1.2 More adjectives for probability distributions

The Normal distribution has a bell-shaped probability density function. We often describe distributions by the way their probability density functions differ in shape from that of the Normal distribution:

A left-skewed distribution produces more observations to the far left of the mean than the Normal distribution,
a heavy-tailed distribution produces more extreme values far away from the mean in both directions,
a right-skewed distribution produces more observations to the far right of the mean than the Normal distribution.

The plots below show probability density functions for a left-skewed, a heavy-tailed, and a right-skewed distribution (solid lines) with the Normal probability density function (dashed line) overlaid. Below these plots are histograms from a sample of size $n = 500$ drawn from the respective distributions. In the bottom row of the figure, Normal QQ plots are given comparing the quantiles of the sample to the quantiles of the Normal distribution.
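As a rough check of Example 1 under a right-skewed population of the kind described here, the sketch below simulates many samples of size 30 and compares the fraction of sample means below 4.25 with the Normal approximation. The notes do not say which right-skewed distribution the marathon times follow; the gamma distribution used here is purely an assumption chosen to match the mean of 4.5 and standard deviation of 2, and the function names are illustrative.

```python
import random
from statistics import NormalDist

MU, SIGMA, N = 4.5, 2.0, 30  # population mean, sd, and sample size from Example 1

# ASSUMPTION (not from the notes): the population is gamma distributed,
# a right-skewed shape with shape*scale = MU and shape*scale**2 = SIGMA**2.
SCALE = SIGMA**2 / MU
SHAPE = MU / SCALE


def normal_approx(cutoff):
    """Central limit theorem approximation to P(sample mean < cutoff)."""
    return NormalDist(MU, SIGMA / N**0.5).cdf(cutoff)


def simulated(cutoff, reps=20000, seed=515):
    """Fraction of simulated sample means that fall below cutoff."""
    rng = random.Random(seed)
    below = 0
    for _ in range(reps):
        xbar = sum(rng.gammavariate(SHAPE, SCALE) for _ in range(N)) / N
        below += xbar < cutoff
    return below / reps


if __name__ == "__main__":
    print(f"Normal approximation: {normal_approx(4.25):.4f}")
    print(f"Simulated proportion: {simulated(4.25):.4f}")
```

The Normal approximation printed here uses the unrounded Z value, so it can differ slightly from the table value .2483, which is based on Z rounded to two decimals.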

[Figure: for each of the left-skewed, heavy-tailed, and right-skewed distributions, a density plot, a histogram, and a Normal QQ plot of sample quantiles against Normal quantiles.]

The central limit theorem says that even when the population has a distribution which is left-skewed, heavy-tailed, right-skewed, or which differs from the Normal distribution in some other way, the mean of a large enough sample may be treated as a Normal random variable. This is taken advantage of all the time in statistical practice.

1.3 Central limit theorem for the sample proportion

We can express the sample proportion $\hat{p}$ as a mean and use the central limit theorem to treat it as a random variable having a Normal distribution. Suppose we encode the outcome of a Bernoulli trial in the random variable $Y$ such that
\[ Y = \begin{cases} 1 & \text{if outcome a success} \\ 0 & \text{if outcome a failure.} \end{cases} \]
If the Bernoulli trial has success probability $p$, then we have $P(Y = 1) = p$ and $P(Y = 0) = 1 - p$. We can compute $\mu = EY = p$ and $\sigma^2 = \operatorname{Var} Y = p(1 - p)$. Suppose we ran the Bernoulli trial $n$ times independently and got $Y_1, \dots, Y_n$. Then
\[ \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i = \frac{\#\{\text{successes}\}}{n} \]
is the sample proportion $\hat{p}$ of successes. We can apply the central limit theorem to $\bar{Y}$, that is to $\hat{p}$. The central limit theorem says that $\bar{Y}$ should behave approximately like a Normal$(p, p(1 - p)/n)$ random variable when $n$ is large. From here we can compute probabilities about $\hat{p}$ using the Normal distribution.

Remark 1. How large should $n$ be before we can invoke the central limit theorem for the sample proportion $\hat{p}$? A rule of thumb is that we can treat $\hat{p}$ as Normal if $np \geq 5$ and $n(1 - p) \geq 5$.

Example 3. Suppose you take a random sample of 15 USC undergraduates and you ask each one if they are registered to vote. Let $\hat{p}$ be the proportion in your sample who are registered to vote.

Question: Supposing that the true proportion of USC undergraduates who are registered to vote is .6, what is the probability that $\hat{p}$ of your sample is greater than .7?

Answer #1: For a sample of size 15 and with the population proportion equal to $p = .6$, $\hat{p}$ should behave approximately like a Normal$(p, p(1 - p)/n)$, i.e. a Normal$(.60, .6(1 - .6)/15)$, random variable, since $15(.6) = 9$ and $15(.4) = 6$ are both at least 5. Now
\[ Z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \quad \text{gives} \quad \frac{.7 - .6}{\sqrt{.6(1 - .6)/15}} = 0.79. \]
We get from the Z table that $P(Z > .79) = .2148$. So the answer is $P(\hat{p} > .7) \approx .2148$.

Answer #2: We could also use the Binomial distribution to get the exact answer. The event $\hat{p} > .7$ corresponds to observing 11 or more successes out of the 15 Bernoulli trials, since $.7 \times 15 = 10.5$. So if $X$ is the number of successes,
\[ P(\hat{p} > .7) = P(X \geq 11) = 1 - P(X \leq 10). \]
We can compute $P(X \leq 10)$ in R using the command

    pbinom(q=10,size=15,prob=.6)

We get $P(X \leq 10) = .7827$, so the answer is $P(\hat{p} > .7) = P(X \geq 11) = 1 - .7827 = .2173$. It is close to answer #1, which is approximate.

1.4 Diagrams for $\bar{X}$ and $\hat{p}$

The diagram below summarizes the distribution of the sample mean $\bar{X}$:

    X ~ Normal(µ, σ²)            X̄ ~ Normal(µ, σ²/n)
    X non-Normal, n ≥ 30         X̄ approx Normal(µ, σ²/n)
    X non-Normal, n < 30         X̄ non-Normal

The next diagram summarizes the distribution of the sample proportion $\hat{p}$:

    min{np, n(1 − p)} < 5        n p̂ ~ Binomial(n, p)
    min{np, n(1 − p)} ≥ 5        p̂ approx Normal(p, p(1 − p)/n), and n p̂ ~ Binomial(n, p)

Recall that $n\hat{p} = X$, the number of successes in $n$ Bernoulli trials, so saying that $n\hat{p} \sim$ Binomial$(n, p)$ is nothing new, and it is always true, no matter what $n$ is.
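The two routes in the diagram for the sample proportion can be compared numerically. The following is a minimal sketch using only Python's standard library, applied to the numbers from Example 3 ($n = 15$, $p = .6$, cutoff .7); the function names are illustrative and not part of the notes.

```python
from math import comb, sqrt
from statistics import NormalDist


def rule_of_thumb(n, p):
    """True when min{np, n(1-p)} >= 5, the condition for the Normal route."""
    return min(n * p, n * (1 - p)) >= 5


def binom_upper(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def normal_upper(c, n, p):
    """Normal approximation to P(p_hat > c)."""
    z = (c - p) / sqrt(p * (1 - p) / n)
    return 1 - NormalDist().cdf(z)


if __name__ == "__main__":
    n, p, c = 15, 0.6, 0.7
    k = 11  # p_hat > .7 means 11 or more successes out of 15
    print("rule of thumb satisfied:", rule_of_thumb(n, p))
    print(f"Normal approximation: {normal_upper(c, n, p):.4f}")
    print(f"exact Binomial:       {binom_upper(k, n, p):.4f}")
```

Here `binom_upper(11, 15, 0.6)` plays the role of `1 - pbinom(q=10,size=15,prob=.6)` in R. The approximate value is computed from the unrounded Z, so its last decimal may differ from the table lookup at Z = .79.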

1.5 Further examples of Normal approximation

Example 4. Suppose $X$ is the time in hours between consecutive phone calls to a customer service call center, and suppose it follows the exponential distribution with mean equal to 1/20, so that $\lambda = 20$. Suppose we observe the next 30 time intervals between calls and record them as $X_1, \dots, X_{30}$. Let $\bar{X}$ be the mean length of the 30 time intervals.

Question: What is $P(\bar{X} > .075)$?

Answer: For the exponential distribution, we have $\mu = 1/\lambda$ and $\sigma^2 = 1/\lambda^2$. According to the central limit theorem, $\bar{X}$ should behave approximately like a Normal$\left(1/\lambda, \frac{1/\lambda^2}{30}\right)$, i.e. a Normal$(.05, .0025/30)$, distribution. So we get
\[ Z = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} = \frac{.075 - .05}{\sqrt{.0025/30}} = 2.74. \]
We get $P(Z > 2.74) = .0031$.

Example 5. Suppose $X$ is the number of phone calls to a customer service call center in an hour, and suppose it follows the Poisson distribution with $\lambda = 20$. Suppose we observe the call center during 25 randomly selected hours and let $X_1, \dots, X_{25}$ be the numbers of calls we observed and $\bar{X}$ the mean number of calls.

Question: What is $P(\bar{X} < 18)$?

Answer: For the Poisson distribution, we have $\mu = \lambda$ and $\sigma^2 = \lambda$. According to the central limit theorem, $\bar{X}$ should behave approximately like a Normal$(\lambda, \lambda/25)$ = Normal$(20, 20/25)$ distribution. So we get
\[ Z = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} = \frac{18 - 20}{\sqrt{20/25}} = -2.24. \]
We get $P(Z < -2.24) = .0125$.
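The arithmetic in Examples 4 and 5 follows the same pattern: standardize the cutoff with $\mu$ and $\sigma^2/n$, then look up a Normal tail probability. The sketch below redoes both calculations with Python's standard library; the helper names are illustrative and not part of the notes.

```python
from math import sqrt
from statistics import NormalDist


def z_score(cutoff, mu, var, n):
    """Standardize a cutoff for the sample mean: (cutoff - mu) / sqrt(var / n)."""
    return (cutoff - mu) / sqrt(var / n)


def example4(lam=20, n=30, cutoff=0.075):
    """Exponential population: mu = 1/lam, var = 1/lam**2; upper tail."""
    z = z_score(cutoff, 1 / lam, 1 / lam**2, n)
    return z, 1 - NormalDist().cdf(z)


def example5(lam=20, n=25, cutoff=18):
    """Poisson population: mu = lam, var = lam; lower tail."""
    z = z_score(cutoff, lam, lam, n)
    return z, NormalDist().cdf(z)


if __name__ == "__main__":
    for name, (z, prob) in [("Example 4", example4()), ("Example 5", example5())]:
        print(f"{name}: Z = {z:.2f}, probability = {prob:.4f}")
```

Because the probabilities here come from the unrounded Z values, they may differ from the table-based answers in the later decimals.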