Confidence Interval for one population mean or one population proportion, continued

1. Sample size estimation based on the large sample C.I. for p

From the interval
$$\left[\hat p - Z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}},\ \hat p + Z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}\right],$$
the length of your 100(1-α)% CI is
$$L = 2Z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}.$$
Here $L$, $\alpha$ and $\hat p$ are given, and we are interested in the sample size $n$. Therefore,
$$n = \frac{4Z_{\alpha/2}^2\,\hat p(1-\hat p)}{L^2} \le \frac{4Z_{\alpha/2}^2}{L^2}\cdot\frac{1}{2}\cdot\frac{1}{2} = \frac{Z_{\alpha/2}^2}{L^2}.$$
(When $\hat p = 1/2$, $\hat p(1-\hat p)$ has its maximum value, $1/4$.)

Example 1. $L = 0.02$, $\alpha = 0.05$, $\hat p = 0.54$: $n = ?$
Example 2. $L = 0.02$, $\alpha = 0.05$, $\hat p = 0.5$: $n = ?$

2. Sample size calculation for p based on the maximum error E

Definition. $P(|\hat p - p| \le E) = 1-\alpha$: we want to estimate $p$ within $E$ with a probability of $1-\alpha$. Derive the formula for $n$.

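$$P(|\hat p - p| \le E) = P(-E \le \hat p - p \le E) = P\left(\frac{-E}{\sqrt{p(1-p)/n}} \le \frac{\hat p - p}{\sqrt{p(1-p)/n}} \le \frac{E}{\sqrt{p(1-p)/n}}\right) = 1-\alpha.$$
Since the sample size is large, by the CLT we have
$$Z = \frac{\hat p - p}{\sqrt{p(1-p)/n}} \overset{\cdot}{\sim} N(0,1).$$
Thus $P(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}) = 1-\alpha$, so
$$\frac{E}{\sqrt{p(1-p)/n}} = Z_{\alpha/2} \quad\Longrightarrow\quad n = \frac{Z_{\alpha/2}^2\,p(1-p)}{E^2}.$$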
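The derivation relies on the CLT approximation, so it can be checked by simulation. The sketch below (Python standard library only; the helper names `n_for_margin` and `coverage` are hypothetical, chosen for illustration) computes $n = \lceil Z_{\alpha/2}^2 p(1-p)/E^2 \rceil$ for an assumed true $p$, draws many samples of that size, and reports the fraction in which $|\hat p - p| \le E$ holds. That fraction should be close to $1-\alpha$.

```python
import math
import random
from statistics import NormalDist

def n_for_margin(p, E, alpha):
    """n = Z_{alpha/2}^2 p(1-p) / E^2, rounded up to a whole sample size."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0,1)
    return math.ceil(z ** 2 * p * (1 - p) / E ** 2)

def coverage(p, E, n, reps=4000, seed=1):
    """Fraction of simulated samples of size n with |p_hat - p| <= E."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))  # one Binomial(n, p) draw
        if abs(successes / n - p) <= E:
            hits += 1
    return hits / reps

n = n_for_margin(p=0.5, E=0.05, alpha=0.05)
print(n)                        # 385
print(coverage(0.5, 0.05, n))   # roughly 0.95; the exact value depends on the seed
```

Because the binomial distribution is discrete and the normal curve is only an approximation, the simulated coverage will be near, but not exactly equal to, $1-\alpha$.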

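When $p$ is unknown, we can plug in the estimate of $p$, $\hat p$, and obtain the following formula:
$$n = \frac{Z_{\alpha/2}^2\,\hat p(1-\hat p)}{E^2}.$$
If no estimate of $p$ is available, use $\hat p(1-\hat p) \le 1/4$, which gives the conservative choice $n = Z_{\alpha/2}^2/(4E^2)$.

Recall we also derived $n$ based on $L$, the length of the 100(1-α)% large sample confidence interval for $p$. Their relationship is $L = 2E$. (This uses the second sample-size formula, with $\hat p$ plugged in, because our CI formula was based on the second pivotal quantity (PQ), where we estimated the $p$ in the denominator.) To see this:
$$P(|\hat p - p| \le E) = 1-\alpha \Rightarrow P(-E \le \hat p - p \le E) = 1-\alpha \Rightarrow P(-E - \hat p \le -p \le E - \hat p) = 1-\alpha \Rightarrow P(\hat p - E \le p \le \hat p + E) = 1-\alpha.$$
The 100(1-α)% confidence interval for $p$ is $[\hat p - E,\ \hat p + E]$, and the length of the confidence interval is
$$L = (\hat p + E) - (\hat p - E) = 2E.$$

Example 3. In order to estimate the percent of children with inadequate immunization to within 0.05 of the true proportion with a probability of 98%:
(a) How many children should be sampled?
Solution. $E = 0.05$, $1-\alpha = 0.98$, $\alpha = 0.02$, $Z_{0.01} = 2.33$:
$$n = \frac{Z_{0.01}^2}{4(0.05)^2} = \frac{2.33^2}{4(0.05)^2} \approx 542.9 \Rightarrow n = 543.$$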
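Both sample-size formulas for $p$ are one-liners. In this sketch (function names are hypothetical) the critical value `z` is passed in directly, so the table value $Z_{0.01} = 2.33$ from Example 3 can be reused. Results are rounded up, since rounding down would give slightly less precision than required. The second call shows that plugging $L = 2E$ into the length-based formula gives the same $n$.

```python
import math

def n_from_length(L, z, p_hat=0.5):
    """Large-sample n so that the CI for p has length L: 4 z^2 p_hat(1-p_hat) / L^2.
    The default p_hat = 0.5 maximizes p_hat(1-p_hat), giving the conservative n."""
    return math.ceil(4 * z ** 2 * p_hat * (1 - p_hat) / L ** 2)

def n_from_error(E, z, p_hat=0.5):
    """Large-sample n so that P(|p_hat - p| <= E) = 1 - alpha: z^2 p_hat(1-p_hat) / E^2."""
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / E ** 2)

z = 2.33                              # Z_{0.01} for 98% confidence, as in Example 3
print(n_from_error(0.05, z))          # 543, Example 3(a)
print(n_from_length(2 * 0.05, z))     # 543, the same n because L = 2E
print(n_from_error(0.05, z, 0.2))     # 348, with p_hat = 0.20 as in Example 3(b)
```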

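(b) If the percentage of children with inadequate immunization is estimated to be 20%, how many children should be sampled then?
Solution. $\hat p = 20\%$:
$$n = \frac{Z_{0.01}^2\,(0.2)(1-0.2)}{(0.05)^2} = \frac{2.33^2(0.2)(0.8)}{(0.05)^2} \approx 347.4 \Rightarrow n = 348.$$

These are the sample size calculations for one population proportion based on the maximum error E.

3. Confidence Interval for 1 population mean

There are 4 scenarios; we cover only the first 3 scenarios in our class.

1. Normal population, $\sigma^2$ known.
   a. Point estimator: $\bar X$.
   b. Pivotal quantity: $Z = \dfrac{\bar X - \mu}{\sigma/\sqrt n} \sim N(0,1)$.
   c. 100(1-α)% CI for μ: $P(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}) = 1-\alpha \Rightarrow \left[\bar X - Z_{\alpha/2}\dfrac{\sigma}{\sqrt n},\ \bar X + Z_{\alpha/2}\dfrac{\sigma}{\sqrt n}\right]$.
   d. Length of CI: $L = 2Z_{\alpha/2}\dfrac{\sigma}{\sqrt n}$.
   e. Sample size based on L: $n = \dfrac{4Z_{\alpha/2}^2\sigma^2}{L^2}$.
   f. Sample size based on E: $P(|\bar X - \mu| \le E) = 1-\alpha \Rightarrow P(-E \le \bar X - \mu \le E) = 1-\alpha \Rightarrow P(\bar X - E \le \mu \le \bar X + E) = 1-\alpha$, so $L = (\bar X + E) - (\bar X - E) = 2E$, $E = Z_{\alpha/2}\dfrac{\sigma}{\sqrt n}$, and $n = \dfrac{Z_{\alpha/2}^2\sigma^2}{E^2}$.
2. Normal population, $\sigma^2$ unknown.
   a. Point estimator: $\bar X$.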

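   b. Pivotal quantity: $T = \dfrac{\bar X - \mu}{S/\sqrt n} \sim t_{n-1}$.
   c. 100(1-α)% CI for μ: $P(-t_{n-1,\alpha/2} \le T \le t_{n-1,\alpha/2}) = 1-\alpha \Rightarrow \left[\bar X - t_{n-1,\alpha/2}\dfrac{S}{\sqrt n},\ \bar X + t_{n-1,\alpha/2}\dfrac{S}{\sqrt n}\right]$.
   d. Length of CI: $L = 2t_{n-1,\alpha/2}\dfrac{S}{\sqrt n}$.
   e. Sample size based on L: $n = \dfrac{4t_{n-1,\alpha/2}^2 S^2}{L^2}$.
   f. Sample size based on E: $P(|\bar X - \mu| \le E) = 1-\alpha \Rightarrow P(-E \le \bar X - \mu \le E) = 1-\alpha \Rightarrow P(\bar X - E \le \mu \le \bar X + E) = 1-\alpha$, so $L = 2E$, $E = t_{n-1,\alpha/2}\dfrac{S}{\sqrt n}$, and $n = \dfrac{t_{n-1,\alpha/2}^2 S^2}{E^2}$.
3. Any population, large sample.
   a. Point estimator: $\bar X$.
   b. Pivotal quantity: $Z = \dfrac{\bar X - \mu}{\sigma/\sqrt n} \overset{\cdot}{\sim} N(0,1)$ or $Z = \dfrac{\bar X - \mu}{S/\sqrt n} \overset{\cdot}{\sim} N(0,1)$.
   c. 100(1-α)% CI for μ: $P(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}) = 1-\alpha \Rightarrow \bar X \pm Z_{\alpha/2}\dfrac{\sigma}{\sqrt n}$ or $\bar X \pm Z_{\alpha/2}\dfrac{S}{\sqrt n}$.
   d. Length of CI: $L = 2Z_{\alpha/2}\dfrac{\sigma}{\sqrt n}$ or $L = 2Z_{\alpha/2}\dfrac{S}{\sqrt n}$.
   e. Sample size based on L: $n = \dfrac{4Z_{\alpha/2}^2\sigma^2}{L^2}$ or $n = \dfrac{4Z_{\alpha/2}^2 S^2}{L^2}$.
   f. Sample size based on E: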

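      $P(|\bar X - \mu| \le E) = 1-\alpha \Rightarrow P(-E \le \bar X - \mu \le E) = 1-\alpha \Rightarrow P(\bar X - E \le \mu \le \bar X + E) = 1-\alpha$, so $L = 2E$ and $n = \dfrac{Z_{\alpha/2}^2\sigma^2}{E^2}$ or $n = \dfrac{Z_{\alpha/2}^2 S^2}{E^2}$.
4. There also exist other cases, but we don't cover those in our class.

Now I will present more details for Scenario 2 mentioned above (Scenarios 1 & 3 are easy).

Scenario 2: normal population, $\sigma^2$ unknown
1. Point estimation: $\bar X$.
2. $\bar X \sim N(\mu, \sigma^2/n)$, so $Z = \dfrac{\bar X - \mu}{\sigma/\sqrt n} \sim N(0,1)$.
3. Theorem (sampling from a normal population).
   a. $Z = \dfrac{\bar X - \mu}{\sigma/\sqrt n} \sim N(0,1)$
   b. $W = \dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
   c. $Z$ and $W$ are independent.
Definition. $T = \dfrac{Z}{\sqrt{W/(n-1)}} = \dfrac{\bar X - \mu}{S/\sqrt n} \sim t_{n-1}$.

------ Derivation of CI, normal population, $\sigma^2$ is unknown ------
$\bar X \sim N(\mu, \sigma^2/n)$ is not a pivotal quantity.
$\bar X - \mu \sim N(0, \sigma^2/n)$ is not a pivotal quantity.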

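$Z = \dfrac{\bar X - \mu}{\sigma/\sqrt n} \sim N(0,1)$ is not a pivotal quantity, because it still contains the unknown σ. Remove σ!!!
Therefore $T = \dfrac{\bar X - \mu}{S/\sqrt n} \sim t_{n-1}$ is a pivotal quantity.

Now we will use this pivotal quantity to derive the 100(1-α)% confidence interval for μ. We start by plotting the pdf of the t-distribution with $n-1$ degrees of freedom as follows:

[Figure: pdf of the t-distribution with n-1 degrees of freedom]

The above pdf plot corresponds to the following probability statement:
$$P(-t_{n-1,\alpha/2} \le T \le t_{n-1,\alpha/2}) = 1-\alpha$$
$$\Rightarrow P\left(-t_{n-1,\alpha/2} \le \frac{\bar X - \mu}{S/\sqrt n} \le t_{n-1,\alpha/2}\right) = 1-\alpha$$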

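$$\Rightarrow P\left(-t_{n-1,\alpha/2}\frac{S}{\sqrt n} \le \bar X - \mu \le t_{n-1,\alpha/2}\frac{S}{\sqrt n}\right) = 1-\alpha$$
$$\Rightarrow P\left(-\bar X - t_{n-1,\alpha/2}\frac{S}{\sqrt n} \le -\mu \le -\bar X + t_{n-1,\alpha/2}\frac{S}{\sqrt n}\right) = 1-\alpha$$
$$\Rightarrow P\left(\bar X + t_{n-1,\alpha/2}\frac{S}{\sqrt n} \ge \mu \ge \bar X - t_{n-1,\alpha/2}\frac{S}{\sqrt n}\right) = 1-\alpha$$
$$\Rightarrow P\left(\bar X - t_{n-1,\alpha/2}\frac{S}{\sqrt n} \le \mu \le \bar X + t_{n-1,\alpha/2}\frac{S}{\sqrt n}\right) = 1-\alpha$$
Thus the 100(1-α)% C.I. for μ when $\sigma^2$ is unknown is
$$\left[\bar X - t_{n-1,\alpha/2}\frac{S}{\sqrt n},\ \bar X + t_{n-1,\alpha/2}\frac{S}{\sqrt n}\right].$$
(*Please note that $t_{n-1,\alpha/2} \neq Z_{\alpha/2}$.)

Example 4. In a random sample of 36 parochial schools throughout the south, the average number of pupils per school is 379.2 with a standard deviation of 124. Use the sample to construct a 95% CI for μ, the mean number of pupils per school for all parochial schools in the south.
Solution. CI for μ, large sample: $n = 36$, $\bar X = 379.2$, $s = 124$, $\alpha = 0.05$. The 95% CI for μ is
$$\bar X \pm Z_{0.025}\frac{s}{\sqrt n} = 379.2 \pm 1.96\cdot\frac{124}{\sqrt{36}} = [338.7,\ 419.7].$$

Example 5. In a psychological depth-perception test, a random sample of 14 airline pilots were asked to judge the distance between markers at the other end of a laboratory. The test data are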

2.7, 2.4, 1.9, 2.4, 1.9, 2.3, 2.2, 2.5, 2.3, 1.8, 2.5, 2.0, 2.2, 2.6
Please construct a 95% CI for μ, the average distance.
Solution. (Note: we can perform the Shapiro-Wilk test to examine whether the sample comes from a normal population or not. This test is not required in our class. Here we simply assume the population is normal. I will always give you such information in the exams.)
CI for μ, small sample, normal population, population variance unknown: $n = 14$, $\bar X = 2.26$, $s = 0.28$, $\alpha = 0.05$. The 95% CI for μ is
$$\bar X \pm t_{13,0.025}\frac{s}{\sqrt n} = 2.26 \pm 2.16\cdot\frac{0.28}{\sqrt{14}} = [2.10,\ 2.42].$$

Example 6. A federal agency has decided to investigate the advertised weight printed on cartons of a certain brand of cereal. Historical data show that $\sigma = 0.75$ ounce. If we wish to estimate the mean weight within 0.25 ounce with 99% confidence, how many cartons should be sampled?
Solution. $E = 0.25$, $\sigma = 0.75$, $\alpha = 0.01$:
$$n = \left(\frac{Z_{0.005}\,\sigma}{E}\right)^2 = \left(\frac{2.575 \cdot 0.75}{0.25}\right)^2 \approx 59.7 \Rightarrow n = 60.$$

Example 7. (Review of the exact CI for the mean when the population is normal and the population variance is known.) (PSU) A random sample of 126 police officers subjected to constant inhalation of automobile exhaust fumes in downtown Cairo had an average blood lead level concentration of 29.2 μg/dl. Assume X, the blood lead level of a randomly selected policeman, is normally distributed with a standard deviation of σ = 7.5 μg/dl. Historically, it is known that the average blood lead level concentration of humans with no exposure to automobile exhaust is 18.2 μg/dl. Is there convincing evidence that policemen exposed to constant auto exhaust have

elevated blood lead level concentrations? (Data source: Kamal, Eldamaty, and Faris, "Blood lead level of Cairo traffic policemen," Science of the Total Environment, 105 (1991): 165-170.)
Solution. Let's try to answer the question by calculating a 95% confidence interval for the population mean. For a 95% confidence interval, $1-\alpha = 0.95$, so that $\alpha = 0.05$ and $\alpha/2 = 0.025$. Therefore, as the following diagram illustrates, $z_{0.025} = 1.96$.

Now, substituting what we know ($\bar x = 29.2$, $n = 126$, $\sigma = 7.5$, and $z_{0.025} = 1.96$) into the formula for a Z-interval for a mean, we get:
$$\left[\bar x - z_{0.025}\frac{\sigma}{\sqrt n},\ \bar x + z_{0.025}\frac{\sigma}{\sqrt n}\right] = \left[29.2 - 1.96\cdot\frac{7.5}{\sqrt{126}},\ 29.2 + 1.96\cdot\frac{7.5}{\sqrt{126}}\right].$$
Simplifying, we get a 95% confidence interval for the mean blood lead level concentration of all policemen exposed to constant auto exhaust:
$$[27.89,\ 30.51]$$
That is, we can be 95% confident that the mean blood lead level concentration of all policemen exposed to constant auto exhaust is

between 27.9 μg/dl and 30.5 μg/dl. Note that the interval does not contain the value 18.2, the average blood lead level concentration of humans with no exposure to automobile exhaust. In fact, all of the values in the confidence interval are much greater than 18.2. Therefore, there is convincing evidence that policemen exposed to constant auto exhaust have elevated blood lead level concentrations.

Example 8. (Large sample CI for population mean, variance unknown) (BU) Descriptive statistics on variables measured in a sample of n = 3,539 participants attending the 7th examination of the offspring in the Framingham Heart Study are shown below.

| Characteristic           | n     | Sample Mean | Standard Deviation (s) |
|--------------------------|-------|-------------|------------------------|
| Systolic Blood Pressure  | 3,534 | 127.3       | 19.0                   |
| Diastolic Blood Pressure | 3,532 | 74.0        | 9.9                    |
| Total Serum Cholesterol  | 3,310 | 200.3       | 36.8                   |
| Weight                   | 3,506 | 174.4       | 38.7                   |
| Height                   | 3,326 | 65.957      | 3.749                  |
| Body Mass Index          | 3,326 | 28.15       | 5.32                   |

Because the sample is large, we can generate a 95% confidence interval for systolic blood pressure using the following formula:
$$\bar X \pm Z\frac{s}{\sqrt n}$$

Substituting the sample statistics and the Z value for 95% confidence, $Z = 1.96$, we have
$$127.3 \pm 1.96\cdot\frac{19.0}{\sqrt{3534}} = 127.3 \pm 0.63 = [126.7,\ 127.9].$$
Therefore, the point estimate for the true mean systolic blood pressure in the population is 127.3, and we are 95% confident that the true mean is between 126.7 and 127.9. The margin of error is very small (the confidence interval is narrow) because the sample size is large.

Example 9. (Small sample CI for population mean, variance unknown, NORMAL POPULATION) (BU) The table below shows data on a subsample of n = 10 participants in the 7th examination of the Framingham Offspring Study.

| Characteristic           | n  | Sample Mean | Standard Deviation (s) |
|--------------------------|----|-------------|------------------------|
| Systolic Blood Pressure  | 10 | 121.2       | 11.1                   |
| Diastolic Blood Pressure | 10 | 71.3        | 7.2                    |
| Total Serum Cholesterol  | 10 | 202.3       | 37.7                   |
| Weight                   | 10 | 176.0       | 33.0                   |
| Height                   | 10 | 67.175      | 4.205                  |
| Body Mass Index          | 10 | 27.26       | 3.10                   |

Suppose we compute a 95% confidence interval for the true systolic blood pressure using data in the subsample. Because the sample size is small, we must now use the confidence interval formula that involves t rather than Z:
$$\bar X \pm t\frac{s}{\sqrt n}$$

The sample size is n = 10, so the degrees of freedom are df = n - 1 = 9. The t value for 95% confidence with df = 9 is t = 2.262. Substituting the sample statistics and the t value for 95% confidence, we have
$$121.2 \pm 2.262\cdot\frac{11.1}{\sqrt{10}} = 121.2 \pm 7.9 = [113.3,\ 129.1].$$
Interpretation: Based on this sample of size n = 10, our best estimate of the true mean systolic blood pressure in the population is 121.2. Based on this sample, we are 95% confident that the true mean systolic blood pressure in the population is between 113.3 and 129.1. Note that the margin of error is larger here primarily due to the small sample size.
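The three scenarios for μ reduce to two interval formulas and one sample-size formula. The sketch below uses only Python's standard library, and its function names are hypothetical. Because the `statistics` module has no t quantile function, `t_interval` takes the critical value $t_{n-1,\alpha/2}$ as an argument, for example the table value 2.262 used in Example 9. (`scipy.stats.t.ppf` could supply it instead, but it is not used here.) The same `z_interval` covers Scenario 1 (pass σ) and Scenario 3 (pass s when n is large).

```python
import math
from statistics import NormalDist

def z_interval(xbar, sd, n, alpha=0.05):
    """Scenario 1 (sd = sigma) or Scenario 3 (large n, sd = s): xbar +/- z_{alpha/2} sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sd / math.sqrt(n)
    return xbar - half, xbar + half

def t_interval(xbar, s, n, t_crit):
    """Scenario 2 (normal population, sigma unknown): xbar +/- t_{n-1,alpha/2} s / sqrt(n).
    t_crit must be looked up for n - 1 degrees of freedom."""
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half

def n_for_mean(E, sigma, alpha=0.05):
    """Sample size based on E with sigma known: (z_{alpha/2} sigma / E)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / E) ** 2)

lo, hi = z_interval(29.2, 7.5, 126)                 # Example 7
print(round(lo, 2), round(hi, 2))                   # 27.89 30.51
lo, hi = z_interval(127.3, 19.0, 3534)              # Example 8
print(round(lo, 1), round(hi, 1))                   # 126.7 127.9
lo, hi = t_interval(121.2, 11.1, 10, t_crit=2.262)  # Example 9, df = 9
print(round(lo, 1), round(hi, 1))                   # 113.3 129.1
print(n_for_mean(0.25, 0.75, alpha=0.01))           # 60, Example 6
```

Here `z_interval` uses the exact normal quantile (1.95996... rather than 1.96), which does not change any of the rounded answers above.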