STAT 350 Handout 9
Sampling Distribution, Central Limit Theorem (6.6)

A random sample is a sequence of random variables $X_1, X_2, \ldots, X_n$ that are independent and identically distributed.
  o This property is often abbreviated as i.i.d.
  o The number $n$ is called the sample size.

A statistic is a function of the random variables in a random sample.
  o Each statistic is itself a random variable and therefore has its own probability distribution, describing how it would vary under repeated random sampling.
  o The probability distribution of a statistic is called a sampling distribution.

Example 9-1: Grade Point Average

Suppose that a student named Marius has a 0.3 probability of getting an A, a 0.5 probability of getting a B, and a 0.2 probability of getting a C in a class. Suppose further that this probability distribution holds independently for each of two classes that he is taking this term. Let $X_1$ denote the number of grade points (A = 4 points, B = 3 points, C = 2 points) that he receives in the first course, and similarly let $X_2$ denote his grade points in the second course.

a) Calculate the expected value, variance, and standard deviation of Marius's grade points in a single course.

Now consider the statistic $\bar{X} = (X_1 + X_2)/2$, the average (mean) grade points in the two courses. The following table lists all of Marius's possible grades in these two courses.

b) Determine the probabilities of these 9 possible outcomes, and record them in the table along with the value of the sample mean (GPA).

    Grades        A,A   A,B   A,C   B,A   B,B   B,C   C,A   C,B   C,C
    Probability
    GPA

c) Report the probability (sampling) distribution of the sample mean grade points by listing its possible values and the probability of each:

    $\bar{x}$
    $p(\bar{x})$
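One way to check your answers to a) through c) is to let R do the arithmetic. The sketch below computes the single-course moments, lists the 9 ordered outcomes (in the same order as the table) with their probabilities under the independence assumption stated above, and then adds up the probabilities of outcomes that share the same GPA. The variable names `pts`, `prob`, `outcomes`, and `samp_dist` are illustrative choices, not part of the handout.

```r
# Example 9-1 inputs: grade points and their probabilities for one course
pts  <- c(A = 4, B = 3, C = 2)
prob <- c(A = 0.3, B = 0.5, C = 0.2)

# a) moments of X for a single course
mu     <- sum(pts * prob)              # E(X)
sigma2 <- sum((pts - mu)^2 * prob)     # Var(X)
sigma  <- sqrt(sigma2)                 # SD(X)

# b) all 9 ordered outcomes (course 1, course 2), same order as the table
g <- names(pts)
outcomes <- data.frame(c1 = rep(g, each = 3), c2 = rep(g, times = 3),
                       stringsAsFactors = FALSE)
outcomes$prob <- unname(prob[outcomes$c1] * prob[outcomes$c2])      # independence: multiply
outcomes$gpa  <- unname((pts[outcomes$c1] + pts[outcomes$c2]) / 2)  # sample mean of 2 courses

# c) sampling distribution of the sample mean: add probabilities of equal GPAs
samp_dist <- tapply(outcomes$prob, outcomes$gpa, sum)

print(c(mean = mu, var = sigma2, sd = sigma))
print(outcomes)
print(samp_dist)
```

`tapply()` groups the 9 outcomes by their GPA value, so the printed result has one entry per possible value of $\bar{x}$.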
d) Determine the expected value of the sample mean grade points. How does it compare to the expected grade points in a single course?

e) Determine the variance and SD of the sample mean grade points. How do they compare to their counterparts for grade points in a single course?

Now suppose that you want to investigate Marius's academic performance over a year in which he takes 10 courses.

f) If you were to list all possible outcomes (grade permutations) for those 10 courses, how many would there be?

It's no longer feasible to enumerate all possible outcomes, but we can rely on simulation to approximate the sampling distributions of these statistics. The following R code performs such a simulation:

    # start with N = number of repetitions, n = number of courses
    # also start with pa = Pr(A), pb = Pr(B), pc = Pr(C)
    N = 100000; n = 10             # values used in part i)
    pa = 0.3; pb = 0.5; pc = 0.2   # grade probabilities from Example 9-1
    #
    grpts = rep(NA, times = n)
    GPA = rep(NA, times = N)
    for (i in 1:N) {
      rand = runif(n, 0, 1)
      for (j in 1:n) {
        if (rand[j] < pa) {grpts[j] = 4}
        if ((rand[j] >= pa) & (rand[j] < pa+pb)) {grpts[j] = 3}
        if (rand[j] >= pa+pb) {grpts[j] = 2}
      }
      GPA[i] = mean(grpts)
    }
    hist(GPA); table(GPA)
    mean(GPA); sd(GPA)
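For parts d) through f), a minimal sketch assuming the same grade probabilities as above: `outer()` builds the 3 x 3 table of two-course sample means and the matching joint probabilities, from which the exact mean and variance of $\bar{X}$ follow directly. Names such as `gpa2` and `var_xbar` are hypothetical.

```r
# Example 9-1 inputs (grade points and probabilities for one course)
pts  <- c(4, 3, 2)
prob <- c(0.3, 0.5, 0.2)

# d) and e): exact moments of the sample mean for n = 2 courses
gpa2  <- outer(pts, pts, "+") / 2          # 3 x 3 matrix of possible sample means
prob2 <- outer(prob, prob)                 # joint probabilities (independence)
mean_xbar <- sum(gpa2 * prob2)                    # E(Xbar)
var_xbar  <- sum((gpa2 - mean_xbar)^2 * prob2)    # Var(Xbar)

# single-course moments, for comparison
mu     <- sum(pts * prob)
sigma2 <- sum((pts - mu)^2 * prob)

# f) ordered grade sequences for 10 courses, 3 possible grades per course
n_outcomes <- 3^10

print(c(E_X = mu, E_Xbar = mean_xbar, Var_X = sigma2, Var_Xbar = var_xbar,
        var_ratio = var_xbar / sigma2, outcomes = n_outcomes))
```

The `var_ratio` entry compares $\operatorname{Var}(\bar{X})$ with the single-course variance, which is the comparison part e) asks about.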
g) Explain the difference between the for (i in 1:N) and the for (j in 1:n) loops.

h) Explain what the GPA vector does.

i) Run this code for 100,000 simulated years of 10 courses per year. What do you notice about the (approximate) sampling distribution of the sample mean GPA? Comment on its shape, mean, and SD. How do these compare to their counterparts with a sample size of n = 2?

j) Use the simulation results to approximate the probability that Marius's GPA will be at least 3.0. Then do the same for a GPA of 3.25.

k) Comment on how these probabilities in the n = 10 case compare to the n = 2 case.

l) Increase the sample size (number of courses) to 40, representing an entire college career. Before you run the simulation, predict what you will see with regard to the distribution of the sample mean (GPA).

m) Run a simulation with 100,000 simulated college careers. Comment on what the simulation reveals about the sampling distribution of the sample mean (GPA).
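The double loop can also be written without explicit loops. This is a sketch, not the handout's method: the hypothetical helper `sim_gpa()` draws all grades at once with `sample()` (using its `prob` argument instead of comparing `runif()` values against `pa` and `pa + pb`), arranges them one simulated year or career per row, and averages each row with `rowMeans()`. Because the mean of a logical vector is the proportion of `TRUE` values, `mean(g >= 3.00)` estimates the probabilities asked for in j) and, for 40 courses, in the next part. The seed is arbitrary.

```r
set.seed(350)   # arbitrary seed so the results are reproducible

# Hypothetical helper (not from the handout): vectorized version of the double loop.
# Returns N simulated sample-mean GPAs, each the average of n course grades.
sim_gpa <- function(N, n, pts = c(4, 3, 2), prob = c(0.3, 0.5, 0.2)) {
  grades <- matrix(sample(pts, size = N * n, replace = TRUE, prob = prob),
                   nrow = N, ncol = n)     # one row per simulated year or career
  rowMeans(grades)
}

N <- 100000
GPA10 <- sim_gpa(N, 10)   # 10 courses per year
GPA40 <- sim_gpa(N, 40)   # 40 courses per college career

# mean of a logical vector = proportion of TRUE values = estimated probability
summ <- function(g) c(mean = mean(g), sd = sd(g),
                      p_3.00 = mean(g >= 3.00), p_3.25 = mean(g >= 3.25))
print(rbind(n10 = summ(GPA10), n40 = summ(GPA40)))

hist(GPA10); hist(GPA40)
```

Drawing with `sample()` is equivalent in distribution to the `runif()` thresholds in the loop; the vectorized form is simply faster for large `N * n`.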
n) Again use the simulation results to approximate the probability that Marius's GPA will be at least 3.0. Then do the same for a GPA of 3.25. Comment on how these probabilities in the n = 40 case compare to the n = 10 case.

Example 9-2: Fast-food service time

Suppose again that the service time for a randomly selected customer at a particular fast-food restaurant follows an exponential distribution with mean 1.25 minutes. Let the random variable $T$ represent this service time, and let $\bar{T} = \frac{1}{n}\sum_{i=1}^{n} T_i$ denote the average service time in a random sample of $n$ customers.

a) Report the mean and standard deviation of $T$.

b) Simulate the service times for N = 100,000 samples, using each of the following sample sizes (numbers of customers): n = 1, n = 5, n = 25, n = 100. For each sample size, comment on the shape of the sampling distribution of $\bar{T}$ and report the mean and SD of the sample means.

c) Comment on how the sampling distribution of $\bar{T}$ changes as the sample size increases.
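A sketch for part b), assuming service times are exponential with mean 1.25 minutes. R's `rexp()` is parameterized by rate, so the rate is 1/1.25 per minute. Each row of the matrix is one sample of n customers, and `rowMeans()` returns the N sample means. The seed and the `breaks = 50` histogram setting are arbitrary; the `sd_theory` column is $\sigma/\sqrt{n}$ with $\sigma = 1.25$, included only for comparison with the theoretical result that follows. The n = 100 case holds about 10^7 values (roughly 80 MB) in memory.

```r
set.seed(350)            # arbitrary seed for reproducibility
N      <- 100000         # number of simulated samples
mean_T <- 1.25           # exponential mean (minutes); the exponential SD equals the mean
sizes  <- c(1, 5, 25, 100)

res <- data.frame(n = sizes, mean = NA_real_, sd = NA_real_)
for (k in seq_along(sizes)) {
  n <- sizes[k]
  # each row holds one random sample of n service times
  times <- matrix(rexp(N * n, rate = 1 / mean_T), nrow = N, ncol = n)
  tbar  <- rowMeans(times)                   # N sample means
  hist(tbar, breaks = 50, main = paste("Sample mean service time, n =", n),
       xlab = "minutes")
  res$mean[k] <- mean(tbar)
  res$sd[k]   <- sd(tbar)
}
res$sd_theory <- mean_T / sqrt(res$n)        # sigma / sqrt(n), for comparison
print(res)
```

The four histograms show the shape changing with n, while the printed table tracks the mean and SD of the sample means.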
Theoretical result: Let $X_1, X_2, \ldots, X_n$ be i.i.d. from any probability distribution. Denote $E(X_i)$ by $\mu$ and $\operatorname{Var}(X_i)$ by $\sigma^2$. Let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ for some positive integer $n$ (sample size).

a) Use properties of expectation to determine $E(\bar{X})$.

b) Use properties of variance to determine $\operatorname{Var}(\bar{X})$ and $\operatorname{SD}(\bar{X})$.

c) Now suppose that the $X_i$'s have a normal distribution. What do you know about the distribution of $\bar{X}$ in this case? Explain.

Your simulations and theoretical derivations from last time lead to the following result, arguably the most important in all of probability and statistics:

Central Limit Theorem (CLT): Let $X_1, X_2, \ldots, X_n$ be i.i.d. with $\mu = E(X_i)$ and $\sigma^2 = \operatorname{Var}(X_i)$. Also let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ denote the sample mean. Then the sampling distribution of $\bar{X}$ has:
  o $E(\bar{X}) = \mu$
      Be careful in reading this statement, which speaks of 3 different means:
      - The sample mean, $\bar{X}$
      - The population mean, $\mu$
      - The mean of the sample means, $E(\bar{X})$
  o $\operatorname{Var}(\bar{X}) = \sigma^2/n$, so $\operatorname{SD}(\bar{X}) = \sigma/\sqrt{n}$
      - Averages vary less than individual values.
      - The SD is inversely proportional to the square root of the sample size; for example, quadrupling $n$ cuts $\operatorname{SD}(\bar{X})$ in half.
  o An approximately normal distribution for large values of $n$
      - Regardless of the distribution of the $X_i$'s
      - Exactly normal for any $n$ if the $X_i$'s are normally distributed
      - Becomes closer and closer to normal as the sample size increases
      - Also closer to normal for $X_i$'s whose distribution is closer to normal
  o Corollary: The distribution of the sum $\sum_{i=1}^{n} X_i$ of i.i.d. random variables also approaches a normal distribution as the sample size increases, with $E(\text{Sum}) = n\mu$ and $\operatorname{Var}(\text{Sum}) = n\sigma^2$.
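The mean and variance statements above, and the corollary's mean and variance for the sum, can be worked out in a few lines in the document's notation. In the second line, independence is what lets the variance of the sum split into a sum of variances, and the constant $1/n$ comes out of the variance squared:

```latex
\begin{aligned}
E(\bar{X}) &= E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}(n\mu) = \mu \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
            = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n} \\
\operatorname{SD}(\bar{X}) &= \sqrt{\sigma^2/n} = \frac{\sigma}{\sqrt{n}} \\
E\left(\sum_{i=1}^{n} X_i\right) &= n\,E(\bar{X}) = n\mu, \qquad
\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = n^2\operatorname{Var}(\bar{X}) = n\sigma^2
\end{aligned}
```

If each $X_i \sim N(\mu, \sigma^2)$, then $\bar{X}$ is a linear combination of independent normal random variables and is therefore exactly $N(\mu, \sigma^2/n)$ for every $n$.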
Example 9-3: Manufacturing potato chips

Suppose that the weights of bags of potato chips coming off an assembly line are normally distributed with mean $\mu = 2$ ounces and standard deviation $\sigma = 0.4$ ounces.

a) Determine the probability that one randomly selected bag weighs less than 1.9 ounces.

b) If you take a random sample of 10 bags, would you expect the probability of their sample mean weight being less than 1.9 ounces to be greater or less than the probability found in (a)? Explain, without performing the calculation.

c) Calculate the probability asked about in the previous question. [Hint: Draw and label a sketch of the sampling distribution and shade the region whose area corresponds to this probability.] Does this probability indicate that a sample mean as small as 1.9 ounces would be surprising if the population mean were really 2 ounces?

d) Repeat this analysis for a sample of 100 randomly selected bags.
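Parts a) through d) reduce to normal probabilities: a single bag has SD $\sigma$, while the sample mean of n bags has SD $\sigma/\sqrt{n}$ (and is exactly normal here because the weights are normal). A sketch using R's `pnorm()`, whose `sd` argument is a standard deviation; the helper name `p_mean()` is hypothetical.

```r
mu     <- 2      # population mean weight (ounces)
sigma  <- 0.4    # population SD (ounces)
cutoff <- 1.9

# a) a single bag: X ~ N(mu, sigma^2)
p_one <- pnorm(cutoff, mean = mu, sd = sigma)

# b)-d) the sample mean of n bags is normal with SD sigma / sqrt(n)
p_mean <- function(n) pnorm(cutoff, mean = mu, sd = sigma / sqrt(n))

print(c(single = p_one, n10 = p_mean(10), n100 = p_mean(100)))
```

Comparing the three printed probabilities is one way to check the reasoning asked for in b).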
e) What is the smallest sample size $n$ for which the probability of the sample mean being less than 1.9 ounces is less than .01? [Hints: Find the first percentile of the standard normal distribution, that is, the value $z$ such that $P(Z < z) = .01$. Set this percentile equal to the z-score from standardizing 1.9 and solve for $n$.]

f) If you were told that a consumer group had weighed $n$ randomly selected bags and found a sample mean weight of 1.9 ounces, would you doubt the claim that the true mean weight of all of the potato chip bags is 2 ounces? On what unspecified information does your answer depend? Explain.

g) Which of your answers to the questions above would be affected if the distribution of the weights of the bags were not normal but rather skewed?

h) Find a value $k$ such that the probability of the sample mean weight of 1000 randomly selected bags being between $2 - k$ and $2 + k$ is roughly 0.95. In other words, between what two $\bar{x}$ values do the middle 95% of the $\bar{x}$ values fall?

i) Determine the smallest sample size $n$ for which the probability is at least .95 that the sample mean falls within ±.05 of 2 ounces (i.e., between 1.95 and 2.05).
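For e), h), and i), solving the standardized inequalities for n or k gives closed forms. The sketch below evaluates them with `qnorm()`, rounds sample sizes up with `ceiling()`, and then double-checks e) with `pnorm()` at the answer and one below it. Variable names are illustrative.

```r
mu    <- 2      # population mean weight (ounces)
sigma <- 0.4    # population SD (ounces)

# e) need (1.9 - mu) * sqrt(n) / sigma < qnorm(0.01); dividing by the negative
#    quantity (1.9 - mu) / sigma flips the inequality, giving
#    sqrt(n) > qnorm(0.01) * sigma / (1.9 - mu)
z01 <- qnorm(0.01)                                   # first percentile of Z
n_e <- ceiling((z01 * sigma / (1.9 - mu))^2)
check_e <- pnorm(1.9, mean = mu, sd = sigma / sqrt(c(n_e - 1, n_e)))

# h) middle 95% of sample means for n = 1000: mu +/- qnorm(0.975) * sigma / sqrt(n)
k <- qnorm(0.975) * sigma / sqrt(1000)

# i) need qnorm(0.975) * sigma / sqrt(n) <= 0.05
n_i <- ceiling((qnorm(0.975) * sigma / 0.05)^2)

print(c(z01 = z01, n_e = n_e, k = k, n_i = n_i))
print(check_e)   # probability at n_e - 1 and at n_e
```

`check_e` should show the probability just above .01 at `n_e - 1` and just below it at `n_e`, which confirms that rounding up gave the smallest qualifying sample size.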