Chapter 23: Inferences About Means

Enough Proportions!

We've spent the last two units working with proportions (or qualitative variables, at least); now it's time to turn our attention to quantitative variables. For qualitative variables, the parameter (when there was one) was the population proportion. Now, there are two parameters: the population mean and the population variance. Inference for variance is beyond the scope of this course, so we'll only concern ourselves with the mean. The statistic that estimates the population mean is the sample mean. Naturally, this is a random variable, and as such, we need to know something about the sampling distribution of this statistic.

Some Theory About the Sample Mean

A Reminder

Recall from a previous chapter that the sampling distribution of $\bar{x}$ has mean $\mu_{\bar{x}} = \mu$, variance $\sigma_{\bar{x}}^2 = \sigma^2 / n$, and an approximately normal shape under certain circumstances. We now need to revisit this idea with an eye towards reality. The key for the mean is that the equation is true; the equals sign was correct. It actually doesn't affect our (upcoming) calculations to not know the value of $\mu$. The variance equation was only valid if the sample was small relative to the population (less than 10%). That continues to be true, and also continues to be the least of our worries. The bigger issue is that we need this value for our upcoming calculations. How often are we going to know the variance of the population? Never!

A Complication

We're certainly not going to stop and cry about this! There is a way around it; in fact, we've encountered the problem before. A few chapters ago, we let go of the parameter $p$ and started using the statistic $\hat{p}$ in its place. Logically, then, now that we don't have the value of $\sigma^2$ (or $\sigma$), what should we use in its place? The statistic, of course! Let's replace $\sigma$ with $s$. Just like before, we'll start calling this thing the standard error: $SE(\bar{x}) = s / \sqrt{n}$.

Previously, switching $p$ for $\hat{p}$ had no effect on the shape of the distribution because $\hat{p}$ was an unbiased estimator of $p$. Wouldn't it be cool if $s$ was an unbiased estimator of $\sigma$?
HOLLOMAN S AP STATISTICS BVD CHAPTER 23, PAGE 1 OF 7
Alas, it isn't. The larger the sample, the smaller the variation; thus, the value of $s$ will typically get smaller as the sample size increases, right up until the point that the sample is the population, and $s$ becomes $\sigma$. What you should take from that is that $\sigma$ is typically smaller than $s$. One direct result of this is that $SE(\bar{x})$ will typically be larger than $\sigma_{\bar{x}}$.

The other direct result of this concerns the Central Limit Theorem. It said that the sampling distribution approached normal with standard deviation $\sigma / \sqrt{n}$. We are now replacing that existing standard deviation with one that is larger; that means that the shape of the distribution will be different!

Another Complication

The Central Limit Theorem says that $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ has an approximately standard normal shape; what kind of shape does $\frac{\bar{x} - \mu}{s / \sqrt{n}}$ have?

Fortunately, a very smart guy figured this out a long time ago. He derived a new distribution, which he called Student's t (be sure to read your textbook for the full story). This distribution looks a lot like the standard normal, but with fatter tails, and the shape changes as the sample size increases! We saw this before in our study of Chi Square, and we saw how the idea was handled in terms of the graph: degrees of freedom. It turns out that the degrees of freedom for the t distribution are (for now) $n - 1$.

Finding Probabilities

You need to be able to find probabilities for a t distribution. Happily, this skill is identical to finding probabilities for a Chi Square distribution!

When using the chart, first find the degrees of freedom down the left hand side. Next, find the spot where the given statistic value ought to lie. Then, look up to find the right hand area. Finally, make sure that you actually answer the question that was asked (this may involve symmetry and the complement).

When using the calculator, know that tcdf() works identically to chisqcdf()!

Examples

[1.] Find $P(t > 2)$ if $df = 15$.
I get an exact answer of 0.03197, and a chart answer between 0.025 and 0.05.
Figure 1 - T Table Excerpt for Example 1

[2.] Find $P(t < 2.5)$ if $df = 20$.
I get an exact answer of 0.9894, and a chart answer between 0.975 and 0.99. You must use the complement for this one, since the question asked for left hand area but the chart only gives right hand area.

[3.] Find $P(t < -1.75)$ if $df = 5$.
I get an exact answer of 0.0703, and a chart answer between 0.05 and 0.1. You must use symmetry for this one, since the chart only uses positive values of t.

[4.] Find $P(t > -3)$ if $df = 6$.
I get an exact answer of 0.988 and a chart answer between 0.975 and 0.99. You must use symmetry and the complement for this one.

A Confidence Interval for the Mean

So how does this affect our procedures for confidence intervals?

The Formula

$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ with $df = n - 1$.

The Conditions

The t procedures require that the sample was obtained randomly, that the sample is small enough (the 10% condition), and that the variable has a normal distribution in the population.

Yet Another Complication

As before, we will often assume that the sample was obtained randomly. Also, I'll keep the second condition in mind, but I'll rarely mention it.

The last condition will almost certainly fail! Fortunately, the t procedures are what we call robust. That means that they still give reasonably accurate results even when the conditions are violated. This does not mean that we will simply plow ahead and forget about the condition; rather, it means that what we need to check is going to be slightly different from one problem to another.
If the population is normal, then we are good to go. If the population is approximately normal, or not very non-normal, then the robust nature of the t procedures will allow us to continue. If the population is clearly (or incredibly) non-normal, then we'll only be able to continue if the sample size is large (because as the sample size increases, the closer we get to a situation where the Central Limit Theorem kicks in).

But how can we know anything about the population? How can we determine if it is OK to continue if we don't have the whole population to look at? Think, think! What have we done in the past cases? We replaced the population information with sample information. Thus, if we can't look at the shape of the population, we should instead look at the shape of the sample!

The Solution

For small samples (say, less than 15), we need a fairly normal population; in terms of the sample, there cannot be any clear indication of skewness.

For slightly larger samples (say 15 through 40), we need a population that isn't too skew, so we need to see a sample that isn't too terribly skew. No, you can't make that any more precise! Deal with it.

For larger samples, we almost don't care what the population looks like, so we could have almost any amount of skew in the sample.

How are you going to decide if there is skew? If you have data, then you'll need to graph the data, and that means that you'll have to draw the graph as part of your answer. If you don't have the data, then you're going to have to make an assumption about the population. Make sure that you don't assume too much! Only assume as much as is needed in order to move forward with the procedure.

In all cases, outliers are an issue in reality. As far as AP is concerned, outliers should not stop you from performing the procedure.

The Conditions (AP Exam Version)

Alas, what has appeared in the scoring rubrics on the AP Exam isn't exactly what I've described, specifically with regards to the shape requirement.
Here's what is typically expected: We need the sample to be a random sample from the population, and either a normally distributed population or a large sample size.

All of the scoring rubrics result in either a very small sample size (in which case you should check to see if the sample shows any sign of skew, or assume something about the population) or a quite large sample size (in which case the requirement is met). I haven't seen any with a medium-sized sample where you must either see a not-too-skew sample or assume a not-too-skew population.

So I'm going to go ahead and work examples the way I'd like to see them in class. Be aware that the same answers on the AP Exam might not receive full credit.
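Before working an example, it may help to see that the chart and calculator lookups from the Finding Probabilities section can also be reproduced in code. The following is only a sketch using Python's standard library: the names t_pdf, t_cdf, and t_star are my own (they are not calculator functions), the area is found by numerical integration (Simpson's rule) rather than by whatever method the calculator uses, and t_star assumes the critical value is below 100.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(x, df, steps=2000):
    """Left-hand area P(T <= x): Simpson's rule on [0, |x|], then symmetry."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_star(conf, df):
    """Critical value with central area conf, by bisection (assumes t* < 100)."""
    target = 1 - (1 - conf) / 2        # left-hand area at t*
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p1 = 1 - t_cdf(2, 15)     # [1.] right-hand area; the text reports 0.03197
p2 = t_cdf(2.5, 20)       # [2.] left-hand area; the text reports 0.9894
p3 = t_cdf(-1.75, 5)      # [3.] left of a negative value; the text reports 0.0703
p4 = 1 - t_cdf(-3, 6)     # [4.] right of a negative value; the text reports 0.988
print(round(p1, 5), round(p2, 4), round(p3, 4), round(p4, 3))

# Running the lookup backwards gives a critical value for an interval,
# for instance 95% confidence with 49 degrees of freedom.
print(round(t_star(0.95, 49), 2))
```

Notice that the code does the same bookkeeping you do with the chart: it finds one area, then uses the complement or symmetry to answer the question that was actually asked.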
Example

[5.] During a study on car safety, the braking distance (feet) was measured for a car traveling at several different speeds. The data are as follows:

Table 1 - Braking Distances for Example 5

  2   10    4   22   16   10   18   26   34   17
 28   14   20   24   28   26   34   26   36   60
 80   20   26   54   32   40   32   40   50   42
 56   76   84   36   32   48   52   56   64   66
 54   70   92   93  120   85   46   68   46   34

Construct a 95% confidence interval for the population mean braking distance for these cars.

This calls for a one sample t interval for the true mean. This requires that the sample was obtained randomly and that the population variable is normally distributed.

I'll have to assume that the sample was obtained randomly. I don't know anything about the population distribution, but with a sample size of 50 I'll almost certainly be able to continue regardless of how the sample distribution looks. I'll take a look anyway.

Figure 2 - Histogram for Example 5 (horizontal axis: Braking Distance (ft); vertical axis: Frequency)

Nothing in the sample indicates a problem; I should be able to continue.

With 95% confidence and 49 degrees of freedom, $t^* = 2.01$.

The interval is $\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 42.98 \pm 2.01 \cdot \frac{25.77}{\sqrt{50}} = (35.656, 50.304)$.

I am 95% confident that the population mean braking distance is between 35.656 feet and 50.304 feet.

A Hypothesis Test for the Mean

This will follow the same pattern as the other tests we've learned.

The Hypotheses

We'll assume that the parameter ($\mu$) has some specific value ($\mu_0$). The alternative will be one of the three inequalities. Be sure to explicitly define the parameter!

$H_0: \mu = \mu_0$
$H_a: \mu \;?\; \mu_0$ (where ? is one of $<$, $>$, or $\neq$)
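Before moving on to the conditions for the test, the interval from Example 5 can be double-checked numerically. This is a sketch only: the variable names are my own, the data are typed in the order they appear in Table 1, and the critical value is the rounded 2.01 quoted in the example rather than something the code computes.

```python
import math
import statistics

# Braking distances (ft) from Table 1, in the order printed there.
distances = [
    2, 10, 4, 22, 16, 10, 18, 26, 34, 17,
    28, 14, 20, 24, 28, 26, 34, 26, 36, 60,
    80, 20, 26, 54, 32, 40, 32, 40, 50, 42,
    56, 76, 84, 36, 32, 48, 52, 56, 64, 66,
    54, 70, 92, 93, 120, 85, 46, 68, 46, 34,
]

n = len(distances)
xbar = statistics.mean(distances)
s = statistics.stdev(distances)    # sample standard deviation (divides by n - 1)
t_star = 2.01                      # rounded critical value from the example (df = 49)

margin = t_star * s / math.sqrt(n) # t* times the standard error
lower, upper = xbar - margin, xbar + margin

print(n, round(xbar, 2), round(s, 2))
print(round(lower, 3), round(upper, 3))
```

Because this uses the rounded critical value, the endpoints may differ from the ones in the example in the third decimal place; presumably the example carried more digits of $t^*$.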
The Conditions

The conditions here are the same as for the interval: the sample was obtained randomly, the sample is small enough (the 10% condition), and the variable has a normal distribution in the population. As was the case before, that last condition will fail. You'll need to graph the data or make an appropriate assumption in order to continue. Be sure to read my earlier comments about how what I'm writing might not be exactly what is expected on the AP Exam!

The Mechanics

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ with $df = n - 1$.

Then calculate the p-value using the t-distribution, much as you did for one sample proportion tests. Be sure to explicitly state the level of significance that you will be using.

The Conclusion

The conclusion is much the same as it was in previous procedures!

If [null hypothesis] then I can expect to find [probability statement] in [p-value] of repeated samples. Since [$p < \alpha$ / $p > \alpha$], this occurs [too rarely / often enough] to attribute to chance at the [$\alpha$] level; it is [significant / not significant], and I [reject / fail to reject] the null hypothesis. [conclusion in context; make a statement about the alternate hypothesis].

Example

[6.] The girth (diameter; measured in inches) of 31 black cherry trees was measured. The data are as follows:

Table 2 - Cherry Tree Data for Example 6

 8.3   8.6   8.8  10.5  10.7  10.8  11.0  11.0
11.1  11.2  11.3  14.2  14.5  16.0  16.3  17.3
17.5  17.9  18.0  18.0  20.6  12.9  13.3  13.7
13.8  11.4  11.4  11.7  12.0  14.0  12.9

Do these data provide evidence that the population mean girth is different from 12 inches?

I'll let $\mu$ represent the population mean girth of a cherry tree.

$H_0: \mu = 12$ (the population mean girth is 12 inches)
$H_a: \mu \neq 12$ (the population mean girth is not 12 inches)

This calls for a one sample t test for the population mean. This requires that the sample was obtained randomly and that the population variable is distributed normally.

I'll have to assume that the sample was obtained randomly. I don't know how the population is distributed, but with a sample size of 31 I should be able to continue in almost any case. I'll go ahead and look at the data anyway.
Figure 3 - Histogram for Example 6 (horizontal axis: Girth (in); vertical axis: Frequency)

The skew here is OK; I can continue.

I'll use $\alpha = 0.05$.

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{13.248 - 12}{3.138 / \sqrt{31}} = 2.215$. With $df = 30$, $2P(t > 2.215) = 0.0345$.

If the population mean girth is 12 inches, then I can expect to find a sample with a mean girth less than 10.75 inches or greater than 13.248 inches in about 3.45% of samples. Since $p < \alpha$, this occurs too rarely to attribute to chance at the 5% level. This is significant; I reject the null hypothesis. The data do provide evidence that the population mean girth is different from 12 inches.
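The mechanics of Example 6 can be checked the same way. What follows is a sketch under stated assumptions, not the calculator's method: the helper names t_pdf and t_cdf are my own, the tail area comes from numerical integration (Simpson's rule) using only Python's standard library, and the data are typed in the order they appear in Table 2.

```python
import math
import statistics

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(x, df, steps=2000):
    """Left-hand area P(T <= x): Simpson's rule on [0, |x|], then symmetry."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

# Girths (in) from Table 2, in the order printed there.
girths = [
    8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0, 11.0,
    11.1, 11.2, 11.3, 14.2, 14.5, 16.0, 16.3, 17.3,
    17.5, 17.9, 18.0, 18.0, 20.6, 12.9, 13.3, 13.7,
    13.8, 11.4, 11.4, 11.7, 12.0, 14.0, 12.9,
]

mu_0 = 12                              # value of the mean under the null hypothesis
alpha = 0.05                           # significance level chosen in the example

n = len(girths)
xbar = statistics.mean(girths)
s = statistics.stdev(girths)           # sample standard deviation (divides by n - 1)

t_stat = (xbar - mu_0) / (s / math.sqrt(n))
p_value = 2 * (1 - t_cdf(abs(t_stat), n - 1))   # two-sided: double the tail area

print(round(t_stat, 3), round(p_value, 4))
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The doubling in the p-value line is there because the alternative is two-sided; for a one-sided alternative you would keep just the single tail that matches the inequality.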