UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences

Instructor: Ivo Dinov, Asst. Prof. of Statistics and Neurology
Teaching Assistants: Brandi Shanata & Tiffany Head
University of California, Los Angeles, Fall 2007
http://www.stat.ucla.edu/~dinov/courses_students.html

Sample Size Calculations & Confidence Intervals for Proportions

Planning a Study to Estimate μ

Before you begin collecting data, it is important to consider whether the estimates will be sufficiently precise. Two factors to consider:
- the population variability of Y
- the sample size

First: In certain situations the variability of Y should not be controlled for (for example, the response to treatment in a medical study). However, in most studies it is important to reduce the variability of Y by holding extraneous conditions as constant as possible. For example, a study of breast cancer might want to examine only women.

Second: Once the experiment is planned to reduce the variability of Y as much as possible, we consider the sample size. For example, how many women should we sample to achieve the desired precision for our estimate?

RECALL: $SE_{\bar{y}} = \dfrac{s}{\sqrt{n}}$

To decide on a proper value of n, we must specify what value of SE is desirable and have a guess of s.
- For SE, we need to ask what value we would tolerate.
- For s, we could use information from a pilot study or previous research.

$\text{Desired } SE = \dfrac{\text{Guessed } s}{\sqrt{n}}$, or equivalently $n = \left(\dfrac{\text{Guessed } s}{\text{Desired } SE}\right)^2$

Planning a Study to Estimate μ

Example: Reindeer (cont.)

$\bar{y} = 54.78$, $s = 8.83$, $SE_{\bar{y}} = 0.874$

Suppose we would like to estimate the sample size necessary for next year's round-up to keep SE < 0.6:

$n = \left(\dfrac{8.83}{0.60}\right)^2 \approx 14.717^2 \approx 216.58 \rightarrow 217$ reindeer

We can't have 0.6 of a reindeer, so we round up to 217 reindeer (ALWAYS round up on sample size calculations).

What happens to n as the desired SE gets smaller? Suppose we would like to estimate the sample size necessary for next year's round-up to keep SE < 0.3:

$n = \left(\dfrac{8.83}{0.30}\right)^2 \approx 866.3 \rightarrow 867$ reindeer

When we double the precision (i.e., cut SE in half), it requires 4 times as many reindeer. This is the result of the $\sqrt{n}$ in the SE formula.

Decisions About SE

How do we make the decision of what SE we will tolerate in the estimation of μ?

RECALL: $\bar{y} \pm t(df)_{0.025}\, SE_{\bar{y}}$

The ± part is called the margin of error and is equivalent to $t(df)_{0.025} \times SE_{\bar{y}}$ for a 95% confidence interval. If we scan the 0.025 (or 95%) column of the t table, the t multipliers are roughly equal to 2.

So, for example, maybe we reason that we want our estimate to be within ± 1.2 of μ with 95% confidence. Thinking of the span of the CI, suppose a total span of 2.4, or ± 1.2, is desired. Then SE would need to be < 0.60:

$t(df)_{0.025}\, SE_{\bar{y}} \approx 2\, SE_{\bar{y}} \le 1.2 \;\Rightarrow\; SE_{\bar{y}} \le 0.6$

Conditions for Validity of Estimation Methods

We have to be careful when making estimations:
- computers make the calculations easy
- interpretations are valid only under certain conditions

Conditions of validity of the SE formula

For $\bar{y}$ to be an estimate of μ, we must have sampled randomly from the population. If not, the inference is questionable or biased.

The validity of $SE_{\bar{y}}$ also requires:
- The population is large when compared to the sample size. It is rare that this is a problem; the sample size can be as much as 5% of the population without seriously inflating the SE.
- Observations must be independent of each other. We want the n observations to give n independent pieces of information about the population.
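The reindeer sample size calculations above can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_size_for_se` is made up for illustration, and it assumes the guessed s from a pilot study or previous research is trustworthy.

```python
import math


def sample_size_for_se(guessed_sd: float, desired_se: float) -> int:
    """Smallest n for which guessed_sd / sqrt(n) is at most desired_se.

    Hypothetical helper: solves Desired SE = Guessed s / sqrt(n) for n
    and always rounds up, as the slides require.
    """
    if guessed_sd <= 0 or desired_se <= 0:
        raise ValueError("guessed_sd and desired_se must be positive")
    n_exact = (guessed_sd / desired_se) ** 2
    return math.ceil(n_exact)


# Reindeer example: s = 8.83, desired SE of 0.60 and then 0.30
n_se_060 = sample_size_for_se(8.83, 0.60)
n_se_030 = sample_size_for_se(8.83, 0.30)
print(n_se_060, n_se_030)
print(n_se_030 / n_se_060)  # roughly 4: halving SE quadruples n
```

Rounding up with `math.ceil` keeps the SE at or below the target; rounding to the nearest integer could leave it slightly above.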

Conditions of validity of the SE formula

Definition: A hierarchical structure exists when observations are nested within the sampling units. This is a common problem in the sciences.

Example: Measure the pulse of 10 patients 3 times each. We don't have 30 pieces of independent information. One possible naïve solution: we could use each person's average.

Conditions of validity of a CI for μ

Data must be from a random sample and observations must be independent of each other. If the data is biased, the sampling distribution concepts on which the CI method is based do not hold; knowing the average of a biased sample does not provide information about μ.

We also need to consider the shape of the data for Student's T distribution:
- If Y is normally distributed, then Student's T is exactly valid.
- If Y is approximately normal, then Student's T is approximately valid.
- If Y is not normal, then Student's T is approximately valid only if n is large (CLT).

How large? It really depends on the severity of the non-normality; however, our rule of thumb is n > 30. The text has a nice summary of these conditions.

NOTE: If the sampling distribution cannot be considered normal, Student's T will not hold.

Verifications of Conditions

In practice these conditions are often assumptions, but it is important to check to make sure they are reasonable.

Scrutinize the study design for:
- random sampling
- possible bias
- non-independent observations

Population normal?
- previous experience with other similar data
- histogram / normal probability plot
- increase the sample size
- try a transformation and analyze on the transformed scale

CI for a Population Proportion

So far we have discussed a confidence interval using quantitative data. There is also a CI for a dichotomous categorical variable when the parameter of interest is a population proportion:
- $\hat{p}$ is the sample proportion
- p is the population proportion

When the sample size is large, the sampling distribution of $\hat{p}$ is approximately normal (related to the CLT). When the sample size is small, the normal approximation may be inadequate. To accommodate this we will modify $\hat{p}$ slightly.

CI for a Population Proportion

The adjustment we are going to make to $\hat{p}$ is to use $\tilde{p}$ instead:

$\tilde{p} = \dfrac{y + 0.5\, z_{\alpha/2}^2}{n + z_{\alpha/2}^2}$

Relax and remember that the formula for $\hat{p}$ was $\hat{p} = \dfrac{y}{n}$.

So what is the $z_{\alpha/2}$ bit?

[Figure: standard normal curve with central area 0.95 and an area of 0.025 in each tail, beyond $-z_{0.025}$ and $z_{0.025}$]

RECALL: In chapter 4, $z_{\alpha}$ was the cut point of the upper part of the standard normal distribution for a given α. Now we want $z_{\alpha/2}$ because we are calculating a confidence interval and need to account for both sides of the distribution. So in the distribution above α would be 0.05, which corresponds to a 95% confidence interval.

The standard error of $\tilde{p}$ also needs a slight modification:

$SE_{\tilde{p}} = \sqrt{\dfrac{\tilde{p}(1 - \tilde{p})}{n + z_{\alpha/2}^2}}$

A sample value $\tilde{p}$ is typically within $\pm SE_{\tilde{p}}$ of p.

Before we define the formula for a CI for p, let's remember the formula for a CI(μ).

RECALL: $\bar{y} \pm t(df)_{\alpha/2} \dfrac{s}{\sqrt{n}}$, where 100(1 − α) is the desired confidence.

If we pick this apart, we are really saying that a CI(μ) is: the estimate of μ ± (an appropriate multiplier) × (SE).

Incorporate that logic and we get:

$\tilde{p} \pm z_{\alpha/2}\, SE_{\tilde{p}}$, where 100(1 − α) is the desired confidence.

This time we will use a z multiplier instead of a t multiplier.

Application to Data

Example: Suppose a researcher is interested in studying the effect of aspirin in reducing heart attacks. He randomly recruits 500 subjects with evidence of early heart disease and has them take one aspirin daily for two years. At the end of the two years he finds that during the study only 17 subjects had a heart attack. Calculate a 95% confidence interval for the true proportion of subjects with early heart disease that have a heart attack while taking aspirin daily.

Application to Data

Example: Heart Attacks (cont.)

First, we need to find $z_{\alpha/2}$. Because this is a 95% CI, α will be 0.05 and $z_{\alpha/2}$ will be $z_{0.025}$. In this case $z_{\alpha/2} = 1.96$.

[Figure: standard normal curve with central area 0.95 and an area of 0.025 in each tail, beyond $-z_{0.025}$ and $z_{0.025}$]

Next, solve for $\tilde{p}$:

$\tilde{p} = \dfrac{y + 0.5\, z_{\alpha/2}^2}{n + z_{\alpha/2}^2} = \dfrac{y + 0.5 (z_{0.025})^2}{n + (z_{0.025})^2} = \dfrac{y + 0.5 (1.96^2)}{n + 1.96^2} = \dfrac{y + 1.92}{n + 3.84}$

The text rounds this to $\dfrac{y + 2}{n + 4}$.

That's just the formula for $\tilde{p}$; now we actually have to find $\tilde{p}$:

$\tilde{p} = \dfrac{17 + 1.92}{500 + 3.84} = 0.038$

Next, solve for $SE_{\tilde{p}}$:

$SE_{\tilde{p}} = \sqrt{\dfrac{(0.038)(0.962)}{500 + 3.84}} = 0.0085$

Finally, the 95% CI for p:

$\tilde{p} \pm z_{\alpha/2}\, SE_{\tilde{p}} = 0.038 \pm 1.96(0.0085) = 0.038 \pm 0.0167 = (0.0213,\ 0.0547)$

What is our interpretation of this interval?

CONCLUSION: We are highly confident, at the 0.05 level (95% confidence), that the true proportion of subjects with early heart disease who have a heart attack after taking aspirin daily is between 0.0213 and 0.0547. Is this meaningful?

Practice

Calculate $\tilde{p}$ and $SE_{\tilde{p}}$ for a 99% confidence interval. Here $z_{0.005}$ is 2.58.

[Figure: standard normal curve with central area 0.99 and an area of 0.005 in each tail, beyond $-z_{0.005}$ and $z_{0.005}$]

$\tilde{p} = \dfrac{y + 0.5\, z_{0.005}^2}{n + z_{0.005}^2} = \dfrac{y + 0.5 (2.58^2)}{n + 2.58^2} = \dfrac{y + 3.33}{n + 6.66}$

$SE_{\tilde{p}} = \sqrt{\dfrac{\tilde{p}(1 - \tilde{p})}{n + z_{0.005}^2}} = \sqrt{\dfrac{\tilde{p}(1 - \tilde{p})}{n + 2.58^2}} = \sqrt{\dfrac{\tilde{p}(1 - \tilde{p})}{n + 6.66}}$

This is a lot of work! Consider the following shortcuts.

The value of $z_{\alpha/2}$ can be carried through for all three formulas:

$\tilde{p} = \dfrac{y + 0.5\, z^2}{n + z^2}$, $\quad SE_{\tilde{p}} = \sqrt{\dfrac{\tilde{p}(1 - \tilde{p})}{n + z^2}}$, $\quad \tilde{p} \pm z\, SE_{\tilde{p}}$

Just don't forget to square it in $\tilde{p}$ and $SE_{\tilde{p}}$.

RECALL: The t distribution approaches a z distribution when df = ∞. This means that at the bottom of the t table there are several t multipliers that can be substituted for z (use the df = ∞ row).

CAUTION: This will only work for certain levels of α. If the level is not found on the t table, you must go back and solve with the z table!
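The three formulas for the adjusted interval can be collected into one short Python function. This is a sketch under stated assumptions rather than the course's own software: the name `adjusted_proportion_ci` is invented here, and the z multiplier comes from the standard library's `statistics.NormalDist` instead of a printed table. Because the code carries full precision, its interval for the heart attack data differs slightly from the hand calculation, which rounds the estimate to 0.038 before continuing.

```python
import math
from statistics import NormalDist


def adjusted_proportion_ci(y: int, n: int, confidence: float = 0.95):
    """Return (p_tilde, se, (lower, upper)) for the adjusted proportion CI.

    Hypothetical helper following the slide formulas:
      p_tilde = (y + 0.5 z^2) / (n + z^2)
      se      = sqrt(p_tilde (1 - p_tilde) / (n + z^2))
      CI      = p_tilde +/- z * se
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    if n <= 0 or not 0 <= y <= n:
        raise ValueError("need n > 0 and 0 <= y <= n")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    p_tilde = (y + 0.5 * z**2) / (n + z**2)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + z**2))
    margin = z * se
    return p_tilde, se, (p_tilde - margin, p_tilde + margin)


# Heart attack example: 17 heart attacks among 500 subjects, 95% confidence
p_tilde, se, (lower, upper) = adjusted_proportion_ci(17, 500)
print(round(p_tilde, 4), round(se, 4), round(lower, 4), round(upper, 4))
```

Changing `confidence` to 0.99 reproduces the Practice setup, with the z multiplier close to 2.58.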

Planning a Study to Estimate p

We talked about finding the sample size necessary to ensure a desired SE for quantitative data. This method depended on:

$\text{Desired } SE = \dfrac{\text{Guessed } s}{\sqrt{n}}$

For proportions we use a similar idea:

$\text{Desired } SE = \sqrt{\dfrac{(\text{Guessed } \tilde{p})(1 - \text{Guessed } \tilde{p})}{n + z_{\alpha/2}^2}}$

where a guess for $\tilde{p}$ can be made from previous research or in ignorance.

Example: Heart Attacks (cont.)

How many subjects are needed if researchers want SE < 0.005 for a 95% CI, and have a guess based on previous research that $\tilde{p}$ would be 0.04?

$0.005 = \sqrt{\dfrac{(0.04)(0.96)}{n + 1.96^2}}$

$0.005^2 = \dfrac{(0.04)(0.96)}{n + 3.84}$

$n + 3.84 = \dfrac{0.0384}{0.000025} = 1536$

$n = 1536 - 3.84 = 1532.16 \rightarrow 1533$ subjects

Example 6.1

StatisticalBarChartDemo: http://socr.ucla.edu/htmls/socr_charts.html

6.1. Six healthy three-year-old female Suffolk sheep were injected with the antibiotic Gentamicin, at a dosage of 10 mg/kg body weight. Their blood serum concentrations (µg/ml) of Gentamicin 1.5 hours after injection were as follows: 33, 26, 34, 31, 23, 25. For these data, the mean is 28.7 and the standard deviation is 4.6.

(a) Construct a 95% confidence interval for the population mean μ.

There are five degrees of freedom, so the t multiplier is 2.571.

$\bar{y} = 28.7$; $s = 4.5898$; $SE_{\bar{y}} = 4.5898/\sqrt{6} = 1.8738 \approx 1.9$ µg/ml

$28.7 \pm (2.571)(1.8738) = (23.9,\ 33.5)$, or $23.9 < \mu < 33.5$

(b) Define in words the population mean.

The population mean μ is the mean blood serum concentration, in µg/ml, of Gentamicin 1.5 hours after injection at a dosage of 10 mg/kg body weight in healthy three-year-old female Suffolk sheep. The value of μ is unknown; however, it does exist.

(c) The 95% confidence interval for μ contains nearly all the observations. Will this be generally true?

The fact that, in this case, the 95% confidence interval for μ contains nearly all the observations is mainly due to the small sample size. For much larger samples, confidence in the location of μ is much more concentrated and the interval will be much tighter.
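Both calculations in this last part, the sample size for a proportion and the t interval for the sheep data, can be sketched in Python as follows. The function names are hypothetical. The t multiplier 2.571 is the table value quoted in the example, since the standard library has no t quantile function. The code works from the unrounded mean (28.67 rather than 28.7), so its lower limit comes out a little below the hand-calculated 23.9.

```python
import math
from statistics import NormalDist, mean, stdev


def sample_size_for_proportion_se(guessed_p: float, desired_se: float,
                                  confidence: float = 0.95) -> int:
    """Smallest n with sqrt(p(1 - p) / (n + z^2)) at most desired_se.

    Hypothetical helper: solves the planning formula for n and rounds up.
    """
    if not 0 < guessed_p < 1 or desired_se <= 0:
        raise ValueError("need 0 < guessed_p < 1 and desired_se > 0")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = guessed_p * (1 - guessed_p) / desired_se**2 - z**2
    return max(1, math.ceil(n_exact))


def t_interval(data, t_multiplier: float):
    """CI for the mean: y_bar +/- t * s / sqrt(n), with t read from a t table."""
    n = len(data)
    y_bar = mean(data)
    se = stdev(data) / math.sqrt(n)  # stdev uses the n - 1 divisor
    margin = t_multiplier * se
    return y_bar - margin, y_bar + margin


# Heart attack planning example: guessed p of 0.04, desired SE of 0.005
print(sample_size_for_proportion_se(0.04, 0.005))

# Sheep data from the example, with t(5)_{0.025} = 2.571 from the t table
sheep = [33, 26, 34, 31, 23, 25]
lower, upper = t_interval(sheep, 2.571)
print(round(lower, 2), round(upper, 2))
```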