STAT 203 Chapter 18 Sampling Distribution Models

Population vs. sample, parameter vs. statistic

Recall that a population contains the entire collection of individuals that one wants to study, and a sample is a subset of individuals selected from a population. A parameter is a numerical summary of a population. Its counterpart for a sample is a statistic. The value of a parameter is fixed yet unknown in practice. We use statistics to estimate population parameters. Due to sampling variability (variation from sample to sample), a statistic takes on different values for different samples.

Sampling Distribution of Proportions (Percentages)

A local burger store is interested in finding out the proportion of vegetarian customers (who most likely purchase veggie burgers). It randomly samples 500 customers over a month and asks each whether he/she is vegetarian. Here, the population of interest is all customers visiting the burger store, and the 500 randomly chosen customers make up the sample. The parameter (a numerical summary of a population) is the proportion of all customers who are vegetarian, and the statistic (a numerical summary of a sample) is the sample proportion of customers who are vegetarian.

Sample data (of the burger store's sample):

Customer   Vegetarian?
1          No
2          No
3          No
4          No
5          Yes
...        ...
499        Yes
500        No

Suppose there are 32 vegetarian customers in the sample. The sample proportion of vegetarian customers is 32/500 = 0.064. How reliable is this sample proportion as an estimate of the true proportion of vegetarian customers?

The burger store has drawn one random sample of size 500. Imagine the sampling procedure is repeated: many more random samples of size 500 are drawn. For each of these samples, we have a sample proportion whose value will be different for different samples.

Repeated samples data (of the many more samples):

Sample   Sample proportion
1        0.064
2        0.048
3        0.050
4        0.070
5        0.068

Things to think about...
1. Where are the sample proportion values centered?
2. How spread out are the sample proportion values?
3. What is the shape of the distribution of the sample proportion values?

The true proportion of individuals sharing a certain characteristic in a population is the population proportion p (which is a parameter), 0 < p < 1. For a sample of individuals randomly selected from the population, the sample proportion (which is a statistic) is given by

$$\hat{p} = \frac{\text{number of individuals sampled who have the characteristic}}{\text{sample size}}$$

The value of the population proportion p is fixed but is usually unknown. The sample proportion $\hat{p}$ is used to estimate the true population proportion. Due to sampling variation, $\hat{p}$ varies across samples, and is unlikely to be exactly equal to p. How close will $\hat{p}$ be to p? It would be useful if we could make probability statements about the proximity of $\hat{p}$ to p.
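The thought experiment of repeated sampling can be imitated on a computer. The following is a minimal sketch in Python, not part of the original notes. It assumes a hypothetical true proportion of 0.06 (in practice p is unknown), treats the customer population as large enough that each sampled customer is vegetarian independently with probability p, and uses an illustrative function name, `simulate_sample_proportions`.

```python
import random
from statistics import mean, stdev


def simulate_sample_proportions(p_true, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return their sample proportions.

    p_true is an ASSUMED population proportion, chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    proportions = []
    for _ in range(num_samples):
        # Each sampled customer is vegetarian with probability p_true.
        successes = sum(rng.random() < p_true for _ in range(n))
        proportions.append(successes / n)
    return proportions


# 1000 hypothetical repetitions of the burger store's survey (n = 500).
p_hats = simulate_sample_proportions(p_true=0.06, n=500, num_samples=1000)

print("first five sample proportions:", p_hats[:5])
print("center (mean of sample proportions):", round(mean(p_hats), 4))
print("spread (SD of sample proportions):", round(stdev(p_hats), 4))
```

Under these assumptions the simulated sample proportions differ from sample to sample but cluster around the assumed value of p. A histogram of `p_hats` would address the question about shape.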
Sampling distribution of $\hat{p}$:

The sampling distribution of proportions is the distribution of the sample proportions of all possible random samples of size n that can be obtained from a population.

The mean $\mu(\hat{p})$ of the sampling distribution of $\hat{p}$ is equal to p. In other words, the sample proportions from repeated random samples of size n have a mean equal to the population proportion p.

The standard deviation $\sigma(\hat{p})$ of the sampling distribution of $\hat{p}$ is

$$\sigma(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}, \quad \text{where } q = 1 - p.$$

When p is unknown, $\sigma(\hat{p})$ is estimated by substituting $\hat{p}$ for p. We call this estimated $\sigma(\hat{p})$ the standard error of $\hat{p}$:

$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{\hat{p}\hat{q}}{n}}, \quad \text{where } \hat{q} = 1 - \hat{p}.$$

For the burger store's sample, for instance, $SE(\hat{p}) = \sqrt{0.064 \times 0.936 / 500} \approx 0.011$.

For sufficiently large samples, the sampling distribution of $\hat{p}$ is approximately normal. The larger the sample size, the better the normal approximation.

Assumptions and conditions for the validity of the normal approximation are:
1. the sample is randomly drawn from the population.
2. the individual values in the sample are independent. (Individuals are drawn without replacement from the population, so independence can never be achieved exactly. But this assumption is reasonably well satisfied as long as the sample size is no greater than 10% of the population size.)
3. the sample size n has to be large. (It is sufficient to check the conditions np > 10 and n(1 − p) > 10.)

Sampling Distribution of Means

An Enrolment Services staff member at an institution is interested in finding the mean GPA of students of the institution. A sample of 100 students is randomly drawn from all students, and their academic records are retrieved. All students of the institution comprise the population, and the 100 students selected comprise the sample. The parameter is the mean GPA of all students of the institution, and the statistic is the sample mean GPA.
Sample data (obtained by the staff):

Student   GPA
1         3.0
2         3.3
3         2.7
4         4.0
5         2.0
...       ...
99        1.7
100       3.7

Suppose the staff's sample gives a mean of 2.4. How reliable is this sample mean as an estimate of the true mean GPA of all students of the institution?

The staff has drawn one random sample of size 100. Imagine the sampling procedure is repeated: many more random samples of size 100 are drawn. For each of these samples, we have a sample mean whose value will be different for different samples.

Repeated samples data (of the many more samples):

Sample   Sample mean
1        2.4
2        2.6
3        2.0
4        2.3
5        3.6

Things to think about...
1. Where are the sample mean values centered?
2. How spread out are the sample mean values?
3. What is the shape of the distribution of the sample mean values?

The population mean µ is a parameter, which is fixed but usually unknown. The sample mean $\bar{y}$ is a statistic and is used to estimate the true population mean µ. Due to sampling variation, $\bar{y}$ varies across samples, and is unlikely to be exactly equal to µ. How close will $\bar{y}$ be to µ?
Sampling Distribution of Means:

The sampling distribution of means is the distribution of the means of all the possible random samples of size n that could be selected from a population.

Suppose a random sample of n subjects is to be drawn from a population, and the observation on a subject (y) in the population follows a distribution with mean µ and standard deviation σ.

The mean of the sampling distribution of means is represented by $\mu(\bar{y})$, and is equal to µ. Equivalently, let $y_1, y_2, \ldots, y_n$ be a random sample from some population with mean µ. The sample means from repeated random samples of size n drawn from this population have a mean equal to the population mean µ.

The standard deviation of the sampling distribution of means is represented by $\sigma(\bar{y})$. It is given by

$$\sigma(\bar{y}) = \frac{\sigma}{\sqrt{n}}.$$

Equivalently, let $y_1, y_2, \ldots, y_n$ be a random sample from a population with mean µ and standard deviation σ. The sample means from repeated random samples of size n drawn from this population have a standard deviation equal to $\sigma/\sqrt{n}$.

When σ is unknown, $\sigma(\bar{y})$ is estimated by substituting the sample SD s for σ. We call this estimated $\sigma(\bar{y})$ the standard error of $\bar{y}$:

$$SE(\bar{y}) = \frac{s}{\sqrt{n}}$$

The larger the sample size, the smaller the standard deviation of the sample means, and the better the approximation of the normal model to the sampling distribution of $\bar{y}$.

The Central Limit Theorem (CLT):

Let $y_1, y_2, \ldots, y_n$ be independent values of a random sample from some population with mean µ and standard deviation σ. For sufficiently large samples, the sample mean $\bar{y}$ follows approximately the normal model with mean µ and standard deviation $\sigma/\sqrt{n}$, even if the underlying distribution of the individual observations (y's) in the population is not normal.

Assumptions and conditions for the validity of the CLT are:
1. the sample is randomly drawn from the population.
2. the individual values in the sample are independent. (The sample size should be no greater than 10% of the population size.)
3. the sample size has to be sufficiently large.
If the underlying distribution is normal, the sample mean $\bar{y}$ follows the normal model with mean µ and standard deviation $\sigma/\sqrt{n}$ regardless of the sample size. The normality of the sample mean in this case is not a result of the CLT.
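The behaviour described by the CLT can be checked informally by simulation. The sketch below is an illustration only, not part of the original notes. It assumes a hypothetical right-skewed population (an exponential distribution with µ = 1 and σ = 1), a sample size of n = 40, and an illustrative function name, `simulate_sample_means`. It compares the simulated sample means with the normal model that has mean µ and standard deviation $\sigma/\sqrt{n}$.

```python
import random
from math import sqrt
from statistics import mean, stdev

# ASSUMED population: exponential with rate 1, so mu = 1 and sigma = 1.
# It is strongly right-skewed, i.e. clearly not normal.
MU = 1.0
SIGMA = 1.0


def simulate_sample_means(n, num_samples, seed=0):
    """Return the means of num_samples random samples of size n from the assumed population."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(num_samples)
    ]


n = 40
y_bars = simulate_sample_means(n=n, num_samples=2000)

model_sd = SIGMA / sqrt(n)  # SD of the sample mean according to the normal model
# Share of simulated means within 2 model SDs of mu (about 95% under a normal model).
within_2sd = sum(abs(yb - MU) <= 2 * model_sd for yb in y_bars) / len(y_bars)

print("mean of sample means:", round(mean(y_bars), 3), "vs mu =", MU)
print("SD of sample means:", round(stdev(y_bars), 3), "vs sigma/sqrt(n) =", round(model_sd, 3))
print("share within 2 SDs of mu:", round(within_2sd, 3))
```

Rerunning with a smaller n (say 5) gives sample means that are more spread out and visibly skewed, which is one way to see why the sample size has to be sufficiently large.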
Examples

1. It is generally believed that nearsightedness affects about 12% of children. A school district gives vision tests to 144 incoming kindergarten children.
   (a) Describe the sampling distribution model for the sample proportion by naming the model and telling its mean and standard deviation. Justify your answer.
   (b) Sketch and clearly label the model.
   (c) What is the probability that in this group over 15% of the children will be found to be nearsighted?

2. A recent study involving attrition rates at a major university has shown that 43% of all incoming freshmen do not graduate within 4 years of entrance.
   (a) Describe the sampling distribution of the sample proportion of 200 randomly selected freshmen who will graduate within the next 4 years. State any assumption(s) made to reach your answer.
   (b) What is the approximate probability that the percentage of sampled freshmen graduating within 4 years will be between 50% and 64%?

3. Your mail-order company advertises that it ships 90% of its orders within three working days. You select a random sample of 120 orders for an audit. The audit reveals that 98 out of the 120 were shipped on time.
   (a) Find the sample proportion of orders that were shipped on time.
   (b) If the company really ships 90% of its orders on time, what is the probability that the proportion in a random sample of 120 orders is smaller than or equal to the proportion in your audit sample? Do you think the company's claim is trustworthy?

4. A random sample of n = 100 observations is selected from a population with µ = 30 and σ = 16.
   (a) Describe the sampling distribution of the mean $\bar{y}$.
   (b) Approximate the following probabilities:
       i. $\bar{y}$ is greater than 28
       ii. $\bar{y}$ is between 22.1 and 26.8
5. The ages of U.S. commercial aircraft have a mean of 13.0 years and a standard deviation of 7.9 years (based on data from Aviation Data Services). The Federal Aviation Administration randomly selects 36 commercial aircraft for special stress tests.
   (a) Describe the sampling distribution of the mean age of a sample of 36 aircraft.
   (b) Find the probability that the mean age of this sample group is greater than 15.0 years.
   (c) Is the probability calculated in part (b) an exact or an approximate probability? Justify your answer.

6. A bottling company uses a filling machine to fill plastic bottles with cola. A bottle should contain 300 ml. In fact, the contents vary according to the normal model with mean 298 ml and standard deviation 3 ml.
   (a) What is the probability that an individual bottle contains less than 295 ml?
   (b) What is the probability that the mean contents of bottles in a six-pack is less than 295 ml?
   (c) Within what range of values does the mean contents of bottles in a 12-pack have a 95% chance of falling?
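Probabilities like those asked for in the examples above are usually found from a normal table, but they can also be computed directly. The following Python sketch shows one possible setup, using Example 1(c) and Example 6(b) as illustrations. The helper names `proportion_model` and `mean_model` are made up for this sketch. The code only checks the large-sample condition for proportions; randomness, independence and the 10% condition still have to be judged from the context of each problem.

```python
from math import sqrt
from statistics import NormalDist


def proportion_model(p, n):
    """Normal model for the sample proportion: mean p, SD sqrt(p(1 - p)/n).

    Raises ValueError if the condition np > 10 and n(1 - p) > 10 is not met.
    """
    if n * p <= 10 or n * (1 - p) <= 10:
        raise ValueError("sample size too small for the normal approximation")
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


def mean_model(mu, sigma, n):
    """Normal model for the sample mean: mean mu, SD sigma/sqrt(n)."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))


# Example 1(c): p = 0.12, n = 144; probability that the sample proportion exceeds 0.15.
nearsighted = proportion_model(p=0.12, n=144)
prob_over_15 = 1 - nearsighted.cdf(0.15)

# Example 6(b): mu = 298 ml, sigma = 3 ml, n = 6; probability that the mean is below 295 ml.
# The population is stated to be normal, so this model does not rely on the CLT.
six_pack = mean_model(mu=298, sigma=3, n=6)
prob_under_295 = six_pack.cdf(295)

print("SD of sample proportion:", round(nearsighted.stdev, 4))
print("P(sample proportion > 0.15):", round(prob_over_15, 3))
print("P(six-pack mean < 295 ml):", round(prob_under_295, 4))
```

The same two helpers cover the remaining examples by changing the arguments; a probability between two values is the difference of two `cdf` calls.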