Confidence Intervals III

What we know so far:

We have seen how to set confidence intervals for the mean, or expected value, of a normal probability distribution, both when the variance is known (using the standard normal, or Z, table) and when it is not (using the "Student's t" table). The two intervals are

$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \qquad\text{and}\qquad \bar{X} \pm t^{\,n-1}_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

We have also learned that, thanks to the Central Limit Theorem and the Law of Large Numbers,

$$\bar{X} \pm Z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

is an approximate confidence interval for the expected value, $E[X]$, when the sample size ($n$) is large, even if the observations are coming from a distribution other than the normal.

Authors: Blume, Greevy. Bios 3 Lecture Notes.

Most commonly, the second interval (with the coefficient from the t table, $t^{\,n-1}_{\alpha/2}$) is used only when the distribution of the individual observations is believed to be nearly normal. Otherwise the third interval is used, and it is called a "large sample confidence interval" or an "approximate confidence interval" to emphasize that the coverage probability is only approximate.

Thus,

$$\bar{X} \pm Z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

is an approximate $(1-\alpha)100\%$ CI. This all works because

$$\frac{\bar{X} - E[X]}{s/\sqrt{n}} \;\overset{\text{approx}}{\sim}\; N(0,1) \quad\text{for large } n.$$

Confidence Intervals for the difference between two means

In order to learn about how oral contraceptive use affects blood pressure, we find some women who use oral contraceptives and some who don't, observe their systolic blood pressures, and see what we see. We conceptualize the two groups as different populations, from each of which we draw a sample:

Group 1 (OC users): $X_1, \ldots, X_n$ are i.i.d. with $E[X_i] = \mu_x$ and $\mathrm{Var}[X_i] = \sigma_x^2$.
A sample of $n = 8$ yields $\bar{x} = 132.86$ and $s_x = 15.34$.

Group 2 (non-users): $Y_1, \ldots, Y_m$ are i.i.d. with $E[Y_i] = \mu_y$ and $\mathrm{Var}[Y_i] = \sigma_y^2$.
A sample of $m = 21$ yields $\bar{y} = 127.44$ and $s_y = 18.23$.

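To analyze these observations we might use a probability model that says that blood pressures are normally distributed, and we might not. We'll see later why this becomes important.

The quantity we are trying to estimate is $E[X] - E[Y] = \mu_x - \mu_y$, and our estimator is simply $\bar{X} - \bar{Y}$. Because $\bar{X}$ and $\bar{Y}$ are random variables, so is $\bar{X} - \bar{Y}$. Hence we can standardize it like so:

$$Z = \frac{(\bar{X} - \bar{Y}) - E[\bar{X} - \bar{Y}]}{\sqrt{\mathrm{Var}(\bar{X} - \bar{Y})}}$$

Now we know that:

1) $E[\bar{X} - \bar{Y}] = E[\bar{X}] - E[\bar{Y}] = \mu_x - \mu_y$

and

2) $\mathrm{Var}(\bar{X} - \bar{Y}) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(-\bar{Y}) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) = \dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{m}$

(the variances add because the two samples are independent).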

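And because the CLT and LLN work on averages of random variables, we have that

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{m}}} \;\overset{\text{approx}}{\sim}\; N(0,1) \quad\text{as } n, m \text{ get large.*}$$

Finally, because $P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$, an approximate* $(1-\alpha)100\%$ confidence interval for $\mu_x - \mu_y$ is given by

$$(\bar{X} - \bar{Y}) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}$$

(*Which is exact if the underlying distributions of the X's and Y's are both normal.)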

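But the variance is unknown. If we estimate $\sigma_x^2, \sigma_y^2$ with $s_x^2, s_y^2$, we run into a problem because:

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}} \sim\ ???$$

The distribution of T is unknown; it is not normal or Student's t. We have no way of calculating an exact or even approximate confidence interval. (Why?)

Fortunately, we can always use the CLT to save us in large samples because:

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}} \;\overset{\text{approx}}{\sim}\; N(0,1) \quad\text{as } n, m \text{ get large.}$$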

Now, in large samples, an approximate $(1-\alpha)100\%$ confidence interval for $\mu_x - \mu_y$ is given by

$$(\bar{X} - \bar{Y}) \pm Z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$

For our case this yields the following 95% CI:

$$132.86 - 127.44 \pm 1.96\sqrt{\frac{15.34^2}{8} + \frac{18.23^2}{21}}$$

or $5.42 \pm 13.18 = (-7.76,\ 18.60)$.

At the 95% level the data suggest that the difference between the two population means is at least -7.76 but no more than 18.60.

Our observations are evidence that OC use causes an increase in blood pressure. (Our best estimate is that it increases mean blood pressure by 5.42.) But the evidence is not very strong (because $\mu_x - \mu_y = 0$ means that there is no increase, and the 95% CI includes this value).
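The arithmetic for the large-sample interval in the oral contraceptive example can be checked with a few lines of Python. This is only a sketch: the function name `large_sample_ci` and its argument names are illustrative choices, and the critical value 1.96 is the usual normal table value for a 95% interval.

```python
import math

def large_sample_ci(xbar, s_x, n, ybar, s_y, m, z=1.96):
    """Approximate CI for mu_x - mu_y: (xbar - ybar) +/- z * sqrt(s_x^2/n + s_y^2/m)."""
    diff = xbar - ybar
    se = math.sqrt(s_x**2 / n + s_y**2 / m)  # estimated standard error of Xbar - Ybar
    margin = z * se
    return diff, margin, (diff - margin, diff + margin)

# Summary statistics from the example: OC users (n = 8), non-users (m = 21).
diff, margin, (lo, hi) = large_sample_ci(132.86, 15.34, 8, 127.44, 18.23, 21)
print(f"{diff:.2f} +/- {margin:.2f} -> ({lo:.2f}, {hi:.2f})")
# prints: 5.42 +/- 13.18 -> (-7.76, 18.60)
```

Working from summary statistics rather than raw data mirrors how the interval is computed by hand in the notes.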

Is our sample size large enough to invoke the CLT? Here we have $n = 8$ and $m = 21$, so probably not.

So what can we do for small samples? Unless we are willing to make some additional assumptions, nothing more can be done.

So, if we assume that the underlying distributions of X and Y are approximately normal (symmetrical and not too skewed), then when the sample size is small (either n or m or both) there are several available methods.

The price we pay is that our procedure is no longer "robust". This is because all of our future calculations will depend on the fact that the X's and Y's are normally distributed, and if in fact they have some other distribution our calculations will be wrong.

(This is why the large sample interval discussed earlier is used so often, and also why us statisticians bug you doctors about getting a large sample size.)

The general problem is to come up with an estimate of $\mathrm{Var}(\bar{X} - \bar{Y})$, call it $\hat{V}$, so that

$$\frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\hat{V}}} \sim Q$$

where the distribution Q is known.

However, there is more than one "natural" way to estimate the variance of $\bar{X} - \bar{Y}$. Unfortunately, only one of these ways leads to a tidy solution (another Student's t interval), and it is often inappropriate.

Unknown Variance Method 1 (The Case of Equal Variances)

Assuming that both the X's and Y's are normally distributed, there is a neat, exact solution only for the case when the variances, although unknown, are assumed to be equal ($\sigma_x^2 = \sigma_y^2 = \sigma^2$).

In this case $\mathrm{Var}(\bar{X} - \bar{Y}) = \sigma^2\left(\dfrac{1}{n} + \dfrac{1}{m}\right)$ is estimated with

$$\hat{V} = s_p^2\left(\frac{1}{n} + \frac{1}{m}\right)$$

where $s_p^2$ is the "pooled" variance estimate. It is a weighted average of the two sample variances, $s_x^2$ and $s_y^2$, with the one that is based on more observations getting more weight:

$$s_p^2 = \frac{(n-1)\,s_x^2 + (m-1)\,s_y^2}{n + m - 2}$$

Now the standardized difference

$$\frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{s_p\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} \sim t_{n+m-2}$$

has exactly a t-distribution with $n + m - 2$ degrees of freedom. Thus the $(1-\alpha)100\%$ confidence interval for $\mu_x - \mu_y$ is given by

$$(\bar{X} - \bar{Y}) \pm t^{\,n+m-2}_{\alpha/2}\; s_p\sqrt{\frac{1}{n} + \frac{1}{m}}$$

assuming that the variances in the two populations are equal and the underlying distributions are approximately normal.

However, this interval is fairly robust to nonnormality. (That is, it continues to have approximately the correct coverage probability when the distributions are not normal.)

In our example of how oral contraceptive use affects blood pressure, the pooled variance estimate is

$$s_p^2 = \frac{7\,(15.34^2) + 20\,(18.23^2)}{8 + 21 - 2} = 307.1803$$

so for a confidence coefficient of 0.95 we find from Table A. that $t_{27} = 2.052$, and the 95% confidence interval for the mean blood pressure difference between OC users and non-users is

$$132.86 - 127.44 \pm 2.052\sqrt{307.1803\left(\frac{1}{8} + \frac{1}{21}\right)}$$

or $5.42 \pm 2.052\,(7.28)$, or $5.42 \pm 14.94$, or $(-9.52,\ 20.36)$.

However, the variances are rarely, if ever, equal. So what can we do if we assume that the X's and Y's are normally distributed, but the variances are unequal?
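The pooled-variance interval for the oral contraceptive example can be reproduced as follows. This is a sketch under stated assumptions: the function name `pooled_t_ci` is illustrative, and the critical value 2.052 for 27 degrees of freedom is passed in by hand (it is the t-table value used in the example) rather than computed.

```python
import math

def pooled_t_ci(xbar, s_x, n, ybar, s_y, m, t_crit):
    """Equal-variance Student's t interval for mu_x - mu_y."""
    df = n + m - 2
    # Pooled variance: weighted average of the two sample variances.
    s2_p = ((n - 1) * s_x**2 + (m - 1) * s_y**2) / df
    se = math.sqrt(s2_p * (1 / n + 1 / m))
    diff = xbar - ybar
    return s2_p, se, (diff - t_crit * se, diff + t_crit * se)

T_27 = 2.052  # 97.5th percentile of t with 27 df, taken from a t table
s2_p, se, (lo, hi) = pooled_t_ci(132.86, 15.34, 8, 127.44, 18.23, 21, T_27)
print(f"s_p^2 = {s2_p:.4f}, SE = {se:.2f}, CI = ({lo:.2f}, {hi:.2f})")
# prints: s_p^2 = 307.1803, SE = 7.28, CI = (-9.52, 20.36)
```

Passing the critical value as an argument keeps the sketch free of third-party dependencies; a statistics library could supply the quantile instead.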

Unknown Variance Method 2 (The Case of Unequal Variances)

Assuming that both the X's and Y's are normally distributed, and that $\sigma_x^2 \neq \sigma_y^2$, there is no neat solution (it remains unsolved today!).

In this case we estimate $\mathrm{Var}(\bar{X} - \bar{Y})$ with $\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}$, but run into a problem because:

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}} \sim\ ???$$

(Remember that we are assuming that the sample sizes are small enough that T would not be approximately normal by the CLT.)

We can use an approximation such as

$$T \;\overset{\text{approx}}{\sim}\; t^*$$

where $t^*$ is approximately a t-distribution with

$$DF = \frac{\left(\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}\right)^2}{\dfrac{(s_x^2/n)^2}{n-1} + \dfrac{(s_y^2/m)^2}{m-1}}$$

rounded down. This is called Satterthwaite's correction for df. (Most computer programs do this, but it is too much of a pain to do by hand.)

Alternatively, there is a conservative approximation

$$T \;\overset{\text{approx}}{\sim}\; t_{\min(n-1,\ m-1)}$$

That is, just use the df for the smaller sample. Why is this conservative?

Each approach suggests the interval

$$(\bar{X} - \bar{Y}) \pm t^*_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$

where the degrees of freedom are calculated either as

$$DF = \min[\,n-1,\ m-1\,]$$

or

$$DF = \frac{\left(\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}\right)^2}{\dfrac{(s_x^2/n)^2}{n-1} + \dfrac{(s_y^2/m)^2}{m-1}}$$

Both intervals are approximately correct under the assumption that the X's and Y's are normally distributed with unequal variances. But neither is the exact solution.
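The two degrees-of-freedom rules can be compared on the oral contraceptive data. This is a minimal sketch; the function names `satterthwaite_df` and `conservative_df` are illustrative, and the notes themselves do not work this calculation.

```python
import math

def satterthwaite_df(s_x, n, s_y, m):
    """Satterthwaite's approximate degrees of freedom (before rounding down)."""
    a = s_x**2 / n  # estimated variance of Xbar
    b = s_y**2 / m  # estimated variance of Ybar
    return (a + b) ** 2 / (a**2 / (n - 1) + b**2 / (m - 1))

def conservative_df(n, m):
    """Conservative choice: the degrees of freedom of the smaller sample."""
    return min(n - 1, m - 1)

df_s = satterthwaite_df(15.34, 8, 18.23, 21)
df_floor = math.floor(df_s)   # rounded down, as the notes prescribe
df_c = conservative_df(8, 21)
print(f"Satterthwaite df = {df_s:.2f} -> {df_floor}; conservative df = {df_c}")
```

For these data the Satterthwaite value rounds down to 15, while the conservative rule gives 7. Fewer degrees of freedom means a larger t critical value, so the conservative interval is the wider of the two.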

Take home message:

When we have independent samples from two normal distributions with unknown and unequal variances, we cannot find sensible exact confidence intervals for the difference between the means (in the small sample case).

Fortunately the Central Limit Theorem and the Law of Large Numbers still apply, and they provide the basis for approximate CI's when both sample sizes, n and m, are large.

These two results (CLT and LLN) can be used to prove that

$$(\bar{X} - \bar{Y}) \pm Z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$

is an approximate $(1-\alpha)100\%$ CI for $\mu_x - \mu_y$ (as we saw earlier). That is, the probability that this random interval will include $\mu_x - \mu_y$ approaches $1 - \alpha$ (0.95 for a 95% interval) as the sample sizes grow.
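The claim that coverage approaches 95% as the sample sizes grow can be illustrated with a small simulation. Everything specific here is an assumption made for illustration and does not come from the notes: the two populations are exponential (skewed, with means 1 and 2 and unequal variances), and the function names, sample sizes, replication count, and seed are arbitrary choices.

```python
import math
import random

def covers(rng, n, m, z=1.96):
    """Draw one pair of samples; report whether the large-sample CI covers mu_x - mu_y."""
    xs = [rng.expovariate(1.0) for _ in range(n)]  # mean 1, variance 1
    ys = [rng.expovariate(0.5) for _ in range(m)]  # mean 2, variance 4
    xbar = sum(xs) / n
    ybar = sum(ys) / m
    s2_x = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    s2_y = sum((y - ybar) ** 2 for y in ys) / (m - 1)
    margin = z * math.sqrt(s2_x / n + s2_y / m)
    true_diff = 1.0 - 2.0
    return abs((xbar - ybar) - true_diff) <= margin

def coverage(n, m, reps=2000, seed=0):
    """Fraction of simulated intervals that contain the true difference."""
    rng = random.Random(seed)
    return sum(covers(rng, n, m) for _ in range(reps)) / reps

cov_small = coverage(5, 5)
cov_large = coverage(200, 200)
print(f"n = m = 5:   coverage = {cov_small:.3f}")
print(f"n = m = 200: coverage = {cov_large:.3f}")
```

With large samples the simulated coverage should sit close to 0.95; with very small samples from skewed populations it typically falls short, which is the practical meaning of "approximate".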

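If the X's and Y's are normal and the two variances are equal, then the Student's t CI,

$$(\bar{X} - \bar{Y}) \pm t^{\,n+m-2}_{\alpha/2}\; s_p\sqrt{\frac{1}{n} + \frac{1}{m}},$$

is exact for all sample sizes, large or small.

If the variances are very unequal, the coverage probability of this interval might not be even approximately correct, even when the X's and Y's are normally distributed and the samples are large.

The coverage probability of this interval can be seriously wrong if the two variances $\sigma_x^2$ and $\sigma_y^2$ are not equal, because the pooled variance estimate, $s_p^2\left(\frac{1}{n} + \frac{1}{m}\right)$, estimates

$$E\left[s_p^2\left(\frac{1}{n} + \frac{1}{m}\right)\right] = E\left[\frac{(n-1)\,s_x^2 + (m-1)\,s_y^2}{n + m - 2}\right]\left(\frac{1}{n} + \frac{1}{m}\right) = \frac{(n-1)\,\sigma_x^2 + (m-1)\,\sigma_y^2}{n + m - 2}\left(\frac{1}{n} + \frac{1}{m}\right),$$

not the correct quantity

$$\mathrm{Var}(\bar{X} - \bar{Y}) = \frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}.$$

In fact, when the sample sizes differ, the two are the same only when the variances are equal, i.e., when $\sigma_x^2 = \sigma_y^2$.

Interestingly enough, the main source of the problem with the Student's t confidence interval that is caused by unequal variances disappears when the two sample sizes are equal ($n = m$)!

In that special case, the pooled variance is estimating the right quantity after all, because the two variance estimates are identical:

$$s_p^2\left(\frac{1}{n} + \frac{1}{n}\right) = \frac{(n-1)\,s_x^2 + (n-1)\,s_y^2}{n + n - 2}\left(\frac{2}{n}\right) = \left(\frac{s_x^2 + s_y^2}{2}\right)\left(\frac{2}{n}\right) = \frac{s_x^2}{n} + \frac{s_y^2}{n}.$$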
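A quick numerical check makes the mismatch concrete. The sketch below compares what the pooled estimate targets on average with the true variance of the difference in sample means. The population variances (100 and 400) are hypothetical values chosen only for illustration, and the function names are likewise illustrative.

```python
def pooled_target(var_x, var_y, n, m):
    """Expected value of s_p^2 * (1/n + 1/m) for population variances var_x, var_y."""
    return ((n - 1) * var_x + (m - 1) * var_y) / (n + m - 2) * (1 / n + 1 / m)

def true_variance(var_x, var_y, n, m):
    """Var(Xbar - Ybar) for independent samples of sizes n and m."""
    return var_x / n + var_y / m

# Hypothetical population variances, not taken from the example data.
var_x, var_y = 100.0, 400.0
results = {}
for n, m in [(8, 21), (21, 8), (15, 15)]:
    results[(n, m)] = (pooled_target(var_x, var_y, n, m),
                       true_variance(var_x, var_y, n, m))
    pooled, true = results[(n, m)]
    print(f"n={n:2d}, m={m:2d}: pooled targets {pooled:7.3f}, true variance {true:7.3f}")
```

When the larger sample comes from the more variable population, the pooled quantity overstates the true variance; when the smaller sample does, it understates it; with equal sample sizes the two agree.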

The distribution is still not exactly Student's t, so the coverage probability won't be exactly the value shown in the t-table. But because the variance estimate is estimating the right thing, the possibility of serious discrepancy between the table value and the actual coverage probability of the interval is avoided when the two sample sizes are roughly equal.

Summary

Interval: $(\bar{X} - \bar{Y}) \pm Z_{\alpha/2}\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}$
When: both n and m are large.

Interval: $(\bar{X} - \bar{Y}) \pm t^{\,n+m-2}_{\alpha/2}\; s_p\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}$
When: n, m small and X's & Y's normal; variances equal or n = m.

Interval: $(\bar{X} - \bar{Y}) \pm t^{\,\min[n-1,\ m-1]}_{\alpha/2}\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}$
When: n, m small and X's & Y's normal; variances unequal (can also use Satterthwaite's df).