Mathacle. PSet Stats, Concepts In Statistics Level Number Name: Date: Confidence Interval Guesswork with Confidence

VII. CONFIDENCE INTERVAL

7.1. Significance Level and Confidence Interval (CI)

The Significance Level

The significance level, often denoted by $\alpha$, is a pre-specified value that is compared with the probability of an event under a proposition known as the null hypothesis. The probability of the event is called the P-value. The significance level itself is the conditional probability of making a Type I error, that is, of rejecting the null hypothesis when the null hypothesis is true.

The significance level is associated with a critical z-score $z^*$ in the $N(0,1)$ distribution or a critical $t^*$ in the t distribution, depending on which distribution the application uses. Three cases associated with $\alpha$ are often practiced: left-tailed, right-tailed, and two-tailed. Only the two-tailed case is used in determining the confidence interval (CI) in the normal and t distributions. df is the degrees of freedom in the t distribution.
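The three tail cases named above can be sketched numerically. The following is a minimal illustration for the standard normal case only, using Python's standard-library `statistics.NormalDist`; the function name `critical_z` is a hypothetical helper, and the t distribution is left out because the standard library has no inverse t function.

```python
from statistics import NormalDist


def critical_z(alpha: float, tail: str) -> float:
    """Critical z-score of N(0, 1) for significance level alpha.

    `critical_z` is an illustrative helper, not a calculator function.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.inv_cdf(alpha)          # Pr(Z < z*) = alpha
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)      # Pr(Z > z*) = alpha
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)  # Pr(|Z| > z*) = alpha
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    print(tail, round(critical_z(0.10, tail), 3))
```

For the same $\alpha$, the left- and right-tailed values differ only in sign, while the two-tailed value is larger because each tail holds only $\alpha/2$.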

[MATH] Assume the null hypothesis, or proposition, $H_0$. The significance level $\alpha$ is the probability of rejecting the null hypothesis when the null hypothesis is true:

$$\alpha = \Pr(\text{reject } H_0 \mid H_0 \text{ is true})$$

The critical value $z^*$ of the Z-distribution and the critical value $t^*$ of the t-distribution are defined mathematically as follows:

1.) For left-sided: $\Pr(Z < z^*) = \alpha$, $\Pr(T < t^*) = \alpha$
2.) For right-sided: $\Pr(Z > z^*) = \alpha$, $\Pr(T > t^*) = \alpha$
3.) For two-sided: $\Pr(|Z| > z^*) = \alpha$, $\Pr(|T| > t^*) = \alpha$

[Ti-84] The critical $z^*$ or $t^*$ values can be calculated as follows:

1.) Left-tailed: z* = invNorm(α), t* = invT(α, df)
2.) Right-tailed: z* = invNorm(1 − α), t* = invT(1 − α, df)
3.) Two-tailed: z* = invNorm(1 − α/2), t* = invT(1 − α/2, df)

Example 7.1.1. For the given $\alpha$, df, and type of tail, find $z^*$ or $t^*$ as indicated.

α | Type | df | z* or t* | Graph
0.1 | Z-distribution, right-tailed | - | |
0.1 | Z-distribution, left-tailed | - | |
0.1 | Z-distribution, two-tailed | - | |
0.05 | Z-distribution, two-tailed | - | |

0.01 | Z-distribution, two-tailed | - | |
0.1 | t-distribution, two-tailed | 5 | |
0.1 | t-distribution, two-tailed | 20 | |
0.1 | t-distribution, two-tailed | 30 | |
0.1 | t-distribution, two-tailed | 60 | |
0.05 | t-distribution, two-tailed | 5 | |
0.05 | t-distribution, two-tailed | 20 | |
0.05 | t-distribution, two-tailed | 30 | |
0.01 | t-distribution, two-tailed | 5 | |
0.01 | t-distribution, two-tailed | 30 | |

Solution:

α | Type | df | z* or t* | Graph
0.1 | Z-distribution, right-tailed | - | z* = 1.28 |
0.1 | Z-distribution, left-tailed | - | z* = −1.28 |
0.1 | Z-distribution, two-tailed | - | z* = 1.645 |
0.05 | Z-distribution, two-tailed | - | z* = 1.960 |
0.01 | Z-distribution, two-tailed | - | z* = 2.576 |
0.1 | t-distribution, two-tailed | 5 | t* = 2.015 |

0.1 | t-distribution, two-tailed | 20 | t* = 1.725 |
0.1 | t-distribution, two-tailed | 30 | t* = 1.697 |
0.1 | t-distribution, two-tailed | 60 | t* = 1.671 |
0.05 | t-distribution, two-tailed | 5 | t* = 2.571 |
0.05 | t-distribution, two-tailed | 20 | t* = 2.086 |
0.05 | t-distribution, two-tailed | 30 | t* = 2.042 |
0.01 | t-distribution, two-tailed | 5 | t* = 4.032 |
0.01 | t-distribution, two-tailed | 30 | t* = 2.750 |

Example 7.1.2. Find the interval for $\mu$ or $p_0$. Write the results in inequality form and in interval notation.

a.) $\left|\dfrac{5-\mu}{3}\right| \le 1$
b.) $\left|\dfrac{16-\mu}{\sigma}\right| \ge 3$
c.) $\left|\dfrac{10-\mu}{\sigma}\right| \le 1$
d.) $|5-\mu| \le \dfrac{\sigma}{\sqrt{n}}$

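e.) $\left|\dfrac{x-\mu}{\sigma/\sqrt{n}}\right| \le z^*$
f.) $\left|\dfrac{\hat{p}-p_0}{\sqrt{\hat{p}\hat{q}/n}}\right| \le z^*$

Solution:

a.) $2 \le \mu \le 8$, or $[2, 8]$
b.) $\mu \le 16-3\sigma$ or $\mu \ge 16+3\sigma$, or $(-\infty, 16-3\sigma] \cup [16+3\sigma, +\infty)$
c.) $10-\sigma \le \mu \le 10+\sigma$, or $[10-\sigma, 10+\sigma]$
d.) $5-\dfrac{\sigma}{\sqrt{n}} \le \mu \le 5+\dfrac{\sigma}{\sqrt{n}}$, or $\left[5-\dfrac{\sigma}{\sqrt{n}},\ 5+\dfrac{\sigma}{\sqrt{n}}\right]$
e.) $x-z^*\dfrac{\sigma}{\sqrt{n}} \le \mu \le x+z^*\dfrac{\sigma}{\sqrt{n}}$, or $\left[x-z^*\dfrac{\sigma}{\sqrt{n}},\ x+z^*\dfrac{\sigma}{\sqrt{n}}\right]$
f.) $\hat{p}-z^*\sqrt{\dfrac{\hat{p}\hat{q}}{n}} \le p_0 \le \hat{p}+z^*\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, or $\left[\hat{p}-z^*\sqrt{\dfrac{\hat{p}\hat{q}}{n}},\ \hat{p}+z^*\sqrt{\dfrac{\hat{p}\hat{q}}{n}}\right]$

Example 7.1.3. Assume that the symmetrical interval that contains the mean of the normally distributed random variable X is [10, 30], with $\Pr(x \in [10, 30]) = 0.5$. Find $\mu$ and $\sigma$.

Solution: Since the interval is symmetrical about the mean, $[10, 30] = [x_a, x_b]$ and

$$\mu = \frac{10+30}{2} = 20.$$

In terms of z, the interval is $[z_a, z_b]$, where $z_b = \text{invNorm}(0.75, 0, 1) = 0.6745$. Therefore

$$\sigma = \frac{x_b-\mu}{z_b} = \frac{30-20}{0.6745} = 14.826.$$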
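The reasoning in Example 7.1.3 can be checked with a short sketch. It assumes the interval is centered on the mean; the helper name `mean_sd_from_central_interval` is hypothetical, and `NormalDist().inv_cdf` plays the role of invNorm.

```python
from statistics import NormalDist


def mean_sd_from_central_interval(lo: float, hi: float, prob: float):
    """Recover (mu, sigma) of a normal variable from a symmetric interval
    [lo, hi] that holds probability `prob` around the mean."""
    mu = (lo + hi) / 2                          # midpoint of the interval
    z_b = NormalDist().inv_cdf(0.5 + prob / 2)  # upper z-score, e.g. invNorm(0.75)
    sigma = (hi - mu) / z_b                     # from z_b = (x_b - mu) / sigma
    return mu, sigma


mu, sigma = mean_sd_from_central_interval(10, 30, 0.5)
print(mu, round(sigma, 3))

# Sanity check: the recovered distribution puts probability 0.5 on [10, 30].
X = NormalDist(mu, sigma)
print(round(X.cdf(30) - X.cdf(10), 4))
```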

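Example 7.1.4. Solve for each indicated variable.

a.) $1.96\sqrt{\dfrac{0.8(1-0.8)}{n}} \le 0.01$, for integer $n$
b.) $z^*\sqrt{\dfrac{0.6(1-0.6)}{900}} \le 0.03$, for $z^*$

Solution:

a.) $n \ge \dfrac{1.96^2 \cdot 0.8(1-0.8)}{0.01^2} \approx 6146.6$, so $n > 6146$
b.) $z^* \le 1.837$

Example 7.1.5.

a.) The interval for $\mu$ with $|x-2\mu| \le 3$ is [1, 4], i.e., $\mu \in [1, 4]$, for a given $x$. What is the value of $x$?
b.) The interval for $\mu$, $3 \le \mu \le 13$, is described by the inequality $|x-\mu| \le \dfrac{1.96\sigma}{\sqrt{200}}$ for a given $x$. What are $x$ and $\sigma$?

Solution:

a.) $\dfrac{x-3}{2} \le \mu \le \dfrac{x+3}{2}$. Matching the midpoint, $\dfrac{x}{2} = 2.5$, so $x = 5$.
b.) $x = \dfrac{3+13}{2} = 8$, and $\dfrac{1.96\sigma}{\sqrt{200}} = \dfrac{13-3}{2} = 5$, so $\sigma = 36.0769$.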

Example 7.1.6. Given $\Pr(|z| < z^*) = 0.95$, $x = 0.5$, $\sigma = 0.25$, where X is normally distributed. Find the domain for $\mu$, and decide if $\mu$ could be 0.07.

Solution: $z_b = z^* = \text{invNorm}(0.975, 0, 1) = 1.960$, so

$$-1.960 < \frac{0.5-\mu}{0.25} < 1.960,$$

which gives $0.01 < \mu < 0.99$, or $\mu \in (0.01, 0.99)$. Yes, $\mu$ could be 0.07.

Example 7.1.7. $\Pr(|z| < z^*) = 1-\alpha$, $\mu = 4$, $\sigma = 0.5$, where X is normally distributed. What is the minimum $z^*$ such that $5 \in [x_a, x_b]$? What is $\alpha$?

Solution: Since $-z^* < z < z^*$ and $z = \dfrac{x-4}{0.5}$, then

$$-z^* < \frac{x-4}{0.5} < z^*, \quad\text{or}\quad 4-0.5z^* < x < 4+0.5z^*.$$

This implies that $4+0.5z^* \ge 5$, or $z^* \ge 2$. Then $1-\alpha = \Pr(|z| \le 2) = 0.9545$, or $\alpha = 1-0.9545 = 0.0455$.
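The last example runs the usual calculation backwards: it starts from a point the interval must reach and recovers $z^*$ and $\alpha$. A minimal sketch under that reading follows; the function name `min_z_star_and_alpha` is a hypothetical helper.

```python
from statistics import NormalDist


def min_z_star_and_alpha(mu: float, sigma: float, x_target: float):
    """Smallest z* whose interval mu +/- z* * sigma reaches x_target,
    together with the matching two-tailed significance level alpha."""
    z_min = abs(x_target - mu) / sigma
    std_normal = NormalDist()
    confidence = std_normal.cdf(z_min) - std_normal.cdf(-z_min)  # Pr(|z| <= z*)
    return z_min, 1 - confidence


z_min, alpha = min_z_star_and_alpha(mu=4, sigma=0.5, x_target=5)
print(z_min, round(alpha, 4))
```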

The Confidence Interval

[MATH] An estimator $\hat{\theta}$ is called an unbiased estimator for the parameter $\theta$ if the expected value of $\hat{\theta}$ is $\theta$:

$$E[\hat{\theta}] = \theta$$

The difference $E[\hat{\theta}] - \theta$ is called the bias of $\hat{\theta}$. Intuitively, an unbiased estimator is one that does not systematically overestimate or underestimate $\theta$.

The confidence level (CL) is a probability defined as $1-\alpha$. In the two-tailed case, when $\alpha$ is given, the Confidence Interval (CI) is defined as the interval containing the parameter for some obtained statistic such that

$$\Pr\left(\left|\frac{\text{Statistic}-\text{Parameter}}{\text{Variability}}\right| < c^*\right) = 1-\alpha$$

where $c^*$ is the critical value, either $z^*$ or $t^*$. The CI is the domain of this probability in terms of the parameter, and in this section the CI is defined only for symmetric distributions:

$$\text{Statistic} - c^* \cdot \text{Variability} < \text{Parameter} < \text{Statistic} + c^* \cdot \text{Variability}$$

or

$$CI = (\text{Statistic} - c^* \cdot \text{Variability},\ \text{Statistic} + c^* \cdot \text{Variability})$$

Specifically, for a given measure $x$ under the normal distribution, the confidence level is used to estimate the mean $\mu$ by finding the domain of the probability $\Pr(|z| < z^*) = 1-\alpha$, or

$$\Pr\left(\left|\frac{x-\mu}{\sigma}\right| < z^*\right) = 1-\alpha$$

The interval that includes the mean is $\mu \in [x - z^*\sigma,\ x + z^*\sigma]$. The interval

$$CI = [x - z^*\sigma,\ x + z^*\sigma]$$

is the confidence interval (CI) for a given measure $x$ and significance level $\alpha$ to estimate the mean $\mu$.
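As a small illustration of the last formula, the sketch below computes the two-tailed interval around a single measure $x$ with known $\sigma$. The function name `z_interval` and the input values ($x = 0.5$, $\sigma = 0.25$, $\alpha = 0.05$) are illustrative assumptions.

```python
from statistics import NormalDist


def z_interval(x: float, sigma: float, alpha: float):
    """Two-tailed interval [x - z* sigma, x + z* sigma] for the mean,
    given one measure x and a known standard deviation sigma."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-tailed
    margin = z_star * sigma                       # c* times Variability
    return x - margin, x + margin


lo, hi = z_interval(x=0.5, sigma=0.25, alpha=0.05)
print(round(lo, 2), round(hi, 2))
```

A smaller $\alpha$ gives a larger $z^*$ and therefore a wider interval for the same $x$ and $\sigma$.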

7.2. Confidence Interval for a Proportion in One Sample

The Confidence Interval (CI) is used to estimate the population parameter from the statistic obtained from the sample data.

Example 7.2.1. [THE TENAFLY SKITTLES PROBLEM] You are given a jar of Skittles. Some are red. You are interested in finding the proportion of red Skittles in the jar. The method of finding the proportion depends on the number of Skittles in the jar. When the number is small, direct counting may be the best way; when the number is relatively large, a sampling method may be more suitable. Assume that

$N$ -- the number of total Skittles in the jar
$x$ -- the number of red Skittles in the jar
$p = \dfrac{x}{N}$ -- the proportion of red Skittles in the jar, a nonnegative real number
$n$ -- the size of a sample
$\hat{x}$ -- the number of red Skittles in a sample
$\hat{p} = \dfrac{\hat{x}}{n}$ -- the proportion of red Skittles in a sample

Based on probability theory, develop a sampling method to find the interval that covers the proportion of red Skittles in the jar. Clearly state the conditions under which approximations are used.

Solution: Use the SRS method to select a sample of size $n$. The selection of Skittles is random and independent. Picking Skittles from a jar can be viewed as a Bernoulli process when the sample size is small compared with the population size. In the following example, assume 4 red Skittles in the small bag and 40 red in the large jar. 10 Skittles are randomly selected in each container.

[MATH] The selection problem should be modeled as a hypergeometric problem, that is, $k$ successes in $n$ draws without replacement. The standard deviation for the hypergeometric model is

$$\sigma = \sqrt{np(1-p)}\,\sqrt{\frac{N-n}{N-1}}$$

where $\sqrt{\dfrac{N-n}{N-1}}$ is the modification factor. When the sampling fraction $n/N$ is small, say $n \le 10\%\,N$, the standard deviation can be approximated as

$$\sigma = \sqrt{np(1-p)}\,\sqrt{\frac{1-\frac{n}{N}}{1-\frac{1}{N}}} \approx \sqrt{np(1-p)}$$

This is the 10% rule. That is, when the sample size is less than 10% of the population, the selection problem can be treated as sampling with replacement.
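The effect of the modification factor can be seen numerically. The sketch below assumes $\sigma$ refers to the count of red Skittles in the sample and compares the without-replacement standard deviation with the with-replacement one; the function names and the values $n = 10$, $p = 0.4$ are illustrative assumptions.

```python
import math


def hypergeom_sd(N: int, n: int, p: float) -> float:
    """SD of the red count in n draws without replacement from N items."""
    fpc = math.sqrt((N - n) / (N - 1))  # the modification factor
    return math.sqrt(n * p * (1 - p)) * fpc


def binomial_sd(n: int, p: float) -> float:
    """SD of the red count in n independent draws (with replacement)."""
    return math.sqrt(n * p * (1 - p))


# Fixed sample of 10 with p = 0.4; only the population size N changes.
for N in (50, 100, 1000, 10000):
    ratio = hypergeom_sd(N, 10, 0.4) / binomial_sd(10, 0.4)
    print(N, round(ratio, 4))
```

The ratio approaches 1 as the sample becomes a smaller fraction of the population, which is what the 10% rule relies on.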

Let us assume that the 10% rule is satisfied. In this case, each time a Skittle is selected during sampling, the probability of getting a red is $p$. The probability of getting a Skittle of another color in each selection is $q = 1-p$. (Figure: sketch of the tree diagram.)

The mean and standard deviation are

$$\mu_{\hat{x}} = np, \qquad \sigma_{\hat{x}} = \sqrt{npq}$$

When the conditions $np > 10$ and $nq > 10$ are satisfied, the binomial distribution can be approximated by the normal distribution. The z-score for $\hat{x}$ is

$$z_{\hat{x}} = \frac{\hat{x}-\mu_{\hat{x}}}{\sigma_{\hat{x}}} = \frac{\hat{x}-np}{\sqrt{npq}} = \frac{\frac{\hat{x}}{n}-p}{\sqrt{\frac{pq}{n}}} = \frac{\hat{p}-p}{\sqrt{\frac{pq}{n}}} = z_{\hat{p}}$$

The term $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$ is the standard deviation of the sample proportion. Since $p$ is unknown, an approximation can be made by

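$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \approx \sqrt{\frac{\hat{p}\hat{q}}{n}} \quad\text{and}\quad z \approx \frac{\hat{p}-p}{\sqrt{\frac{\hat{p}\hat{q}}{n}}}$$

For a given significance level $\alpha$, the confidence interval (CI) is

$$\Pr(|z| < z^*) = \Pr\left(\left|\frac{\hat{p}-p}{\sigma_{\hat{p}}}\right| < z^*\right) = 1-\alpha$$

$$CI = \left(\hat{p} - z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$

Or, in inequality form:

$$\hat{p} - z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

In the form of the margin of error (MoE) $z^*\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$:

$$\hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$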
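The interval formula above translates directly into a few lines of code. This is a minimal sketch, not a replacement for the calculator routine; the function name `one_prop_z_interval` and the counts (60 successes in 75 trials) are illustrative assumptions, and the sketch does not check the sampling conditions.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(successes: int, n: int, confidence: float = 0.95):
    """One-sample z interval for a proportion: p_hat +/- z* * SE."""
    p_hat = successes / n
    alpha = 1 - confidence
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    se = sqrt(p_hat * (1 - p_hat) / n)            # standard error of p_hat
    moe = z_star * se                             # margin of error
    return p_hat - moe, p_hat + moe


lo, hi = one_prop_z_interval(60, 75)
print(round(lo, 4), round(hi, 4))
```

The function takes the raw count and the sample size rather than $\hat{p}$, so the proportion is never rounded before the interval is computed.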

The meaning of the confidence interval is that, over repeated sampling, the CI constructed from the obtained sample proportion $\hat{p}$ contains the population proportion $p$ with probability $1-\alpha$. There are cases in which the CI obviously fails to catch the population proportion when $0 < p < 1$:

$$\hat{p} = 0: CI = (0, 0) \quad\text{or}\quad \hat{p} = 1: CI = (1, 1)$$

As an example, when $\alpha = 5\%$ and the sampling is repeated 100 times, about 95 of the CIs are expected to contain the population proportion $p$.

For a given $n$, and from the characteristic property of the quadratic function, the standard deviation $\sigma_p = \sqrt{\dfrac{p(1-p)}{n}}$ reaches its maximum when $p = 0.5$:

$$\sigma_{\max} = \sigma(0.5) = \sqrt{\frac{0.5(1-0.5)}{n}} = \frac{1}{2\sqrt{n}}$$

[PROCEDURE] Confidence Interval for a Proportion in One Sample

The steps to obtain a CI for the population proportion $p$ from the sample proportion (statistic) $\hat{p}$ with sample size $n$ are

1.) The sample is an independent random sample (the variables are i.i.d.).
2.) The sample size is less than 10% of the population.
3.) $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$, to approximate the binomial distribution by the normal distribution.
4.) The CI is constructed from the sample statistic $\hat{p}$ for a given significance level $\alpha$:

$$CI = \left(\hat{p} - z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$

where $z^*\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$ is the margin of error (MoE) and $\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$ is the Standard Error (SE).
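The repeated-sampling reading of the confidence level can be illustrated with a small simulation. This is a sketch under stated assumptions: the true proportion $p = 0.5$, the sample size 200, the number of trials, and the seed are arbitrary choices, and `coverage` is a hypothetical helper that draws with replacement.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(p: float, n: int, alpha: float, trials: int, seed: int = 0) -> float:
    """Fraction of simulated one-sample z intervals that contain the true p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(trials):
        # Draw n independent Bernoulli(p) outcomes and count the successes.
        x = sum(rng.random() < p for _ in range(n))
        p_hat = x / n
        moe = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= p <= p_hat + moe:
            hits += 1
    return hits / trials


rate = coverage(p=0.5, n=200, alpha=0.05, trials=2000)
print(rate)
```

The printed fraction should land near 0.95 rather than exactly on it, since the normal approximation is only approximate for a discrete count.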

[Ti-84] Confidence Interval for a Proportion in One Sample

1.) STAT -> TESTS -> A: 1-PropZInt
2.) Input x, n, and the confidence level $1-\alpha$. Note that you need to enter x and n, not $\hat{p}$!

Example 7.2.2. THS administrators wanted to know how many 10th graders and 11th graders did either internships or community services in the past summer. A random sample of 75 students indicated that 60 students did one of the two. Find the 95% confidence interval for the proportion of all 10th and 11th graders who did internships or community services. Assume that the total population of these two classes is 800 and that both grades are equally likely to do summer internships or community services. Use the calculator to verify your answers.

Solution: $\hat{p} = 60/75 = 0.8$, $\alpha = 0.05$, $z^* = z_{\alpha/2} = 1.96$.

$n = 75 < 10\% \cdot 800 = 80$, $n\hat{p} = 75 \cdot 0.8 = 60 \ge 10$, $n(1-\hat{p}) = 75 \cdot 0.2 = 15 \ge 10$. The conditions for using the normal distribution are satisfied. The CI is

$$\hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.8 \pm 1.96\sqrt{\frac{0.8(1-0.8)}{75}} = 0.8 \pm 0.091$$

or $CI = (0.709, 0.891)$.

That is, the school administrators are 95% confident that the true proportion of students in those two grades who did summer internships or community services is between 0.709 and 0.891.

Example 7.2.3. A previous study has suggested that about 19.3% of teens (aged 12-19) are obese. How large a sample will be needed in order to estimate the true proportion of obese teens with 95% confidence and a margin of error of no more than 1%?

Solution: $p = 0.193$, $\alpha = 0.05$, $z^* = z_{\alpha/2} = 1.96$.

$$z^*\sqrt{\frac{p(1-p)}{n}} < 0.01 \;\Rightarrow\; n > \frac{(z^*)^2\,p(1-p)}{0.0001} = \frac{1.96^2 \cdot 0.193\,(1-0.193)}{0.0001} \approx 5983.3$$

So a sample of at least 5984 teens is needed. It is assumed that $6000 < 10\%\,N$, and that $np > 10$ and $n(1-p) > 10$.
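The sample-size calculation in this example amounts to solving the margin-of-error inequality for $n$ and rounding up. A minimal sketch follows; the function name `sample_size_for_proportion` is a hypothetical helper, and it uses the unrounded critical value rather than 1.96.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p: float, moe: float, confidence: float = 0.95) -> int:
    """Smallest n with z* * sqrt(p (1 - p) / n) <= moe."""
    alpha = 1 - confidence
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    # Solve the inequality for n, then round up to the next whole subject.
    return ceil(z_star ** 2 * p * (1 - p) / moe ** 2)


print(sample_size_for_proportion(0.193, 0.01))                   # planning value p = 0.193
print(sample_size_for_proportion(0.5, 0.06, confidence=0.90))    # worst case p = 0.5
```

When no planning value for $p$ is available, using $p = 0.5$ gives the largest, and therefore safest, sample size.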

Example 7.2.4. I want to construct a 99% confidence interval for the proportion of Americans who think that the government has placed too many regulations on businesses, and I want a margin of error of no more than 3%. Assume the population proportion is 0.5. How large a sample will this require?

Solution: $\alpha = 0.01$, $p = 0.5$, $z^* = z_{0.005} = 2.576$, $z^*\sqrt{\dfrac{p(1-p)}{n}} \le 3\%$, $n = ?$

$$2.576\sqrt{\frac{0.5(0.5)}{n}} \le 3\% \;\Rightarrow\; n \ge \left(\frac{2.576\,(0.5)}{0.03}\right)^2 \approx 1843.3$$

So a sample of at least 1844 is required.

Example 7.2.5. A study of 5320 people aged 60 or older in the US found 124 with rheumatoid arthritis. Construct a 90% confidence interval for the actual proportion of all people aged 60 and older who have rheumatoid arthritis. Use the calculator to verify your answers.

Solution: $\alpha = 0.1$, $\hat{p} = \dfrac{124}{5320} = 0.0233$, $z^* = z_{0.05} = 1.6449$, $n = 5320$, $CI = ?$

$$0.0233 \pm 1.6449\sqrt{\frac{0.0233(1-0.0233)}{5320}} = 0.0233 \pm 0.0034$$

or $CI = (0.0199, 0.0267)$.

Example 7.2.6. [MC109] Based on a survey of a random sample of 900 adults in the United States, a journalist reports that 60 percent of adults in the United States are in favor of increasing the minimum hourly wage. If the reported percent has a margin of error of 2.7 percentage points, what is the level of confidence?

Solution:

$$z^*\sqrt{\frac{p(1-p)}{n}} = 2.7\% \;\Rightarrow\; z^*\sqrt{\frac{0.6(1-0.6)}{900}} = 2.7\% \;\Rightarrow\; z^* = 1.6534$$

so $\dfrac{\alpha}{2} = 0.049$ and $\alpha \approx 0.1$. The confidence level is 90.0%. It is assumed that all the conditions are satisfied.

Example 7.2.7. [FRQ1605] A polling agency showed the following two statements to a random sample of 1048 adults in the United States. The order in which the statements were shown was randomly selected for each person in the sample. After reading the statements, each person was asked to choose the statement that was most consistent with his or her opinion. The results are shown in the table.

(a) Assume the conditions for inference have been met. Construct and interpret a 95 percent confidence interval for the proportion of all adults in the United States who would have chosen the economy statement.

(b) One of the conditions for inference that was met is that the number who chose the economy statement and the number who did not choose the economy statement are both greater than 10. Explain why it is necessary to satisfy that condition.

(c) A suggestion was made to use a two-sample z-interval for a difference between proportions to investigate whether the difference in proportions between adults in the United States who would have chosen the environment statement and adults in the United States who would have chosen the economy statement is statistically significant. Is the two-sample z-interval for a difference between proportions an appropriate procedure to investigate the difference? Justify your answer.

Solution:

a.) $n = 1048$, $\hat{p} = 0.37$, $\alpha = 0.05$, $z^* = 1.96$,

$$\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{0.37(1-0.37)}{1048}} = 0.0149$$

$$CI = 0.37 \pm 1.96(0.0149) \approx 0.37 \pm 0.03, \qquad CI = (0.34, 0.40)$$

We are 95% confident that the interval from 0.34 to 0.40 contains the population proportion of adults who would choose the economy statement.

b.) The conditions $n\hat{p} > 10$ and $n\hat{q} > 10$ are what allow the binomial distribution to be approximated by the normal distribution, which the z-interval relies on. Since $\hat{p}$ and $\hat{q}$ are both less than one, the sample size must also be greater than 10.

c.) No. A two-sample z-interval requires two independent samples, and here both proportions come from the same sample.

Example 7.2.8. For each of the following problems of finding the confidence interval of a population proportion from a one-sample proportion, find the indicated variables. Assume that the sample is independent and the sample size is less than 10% of the population.

Case # | α | $\hat{p}$ | n | $n\hat{p}$ | $n(1-\hat{p})$ | Conf. Interval | Margin of error
1 | 0.05 | 0.5 | 30 | | | |
2 | 0.05 | | | | | (0.7, 0.9) |
3 | 0.05 | 0.2 | | | | | 0.01
4 | | 0.6 | 900 | | | | 0.03

Solution:

Case # | α | $\hat{p}$ | n | $n\hat{p}$ | $n(1-\hat{p})$ | Conf. Interval | Margin of error
1 | 0.05 | 0.5 | 30 | 15 | 15 | (0.3211, 0.6789) | 0.1789
2 | 0.05 | 0.8 | 61 | 49 | 12 | (0.7, 0.9) | 0.1
3 | 0.05 | 0.2 | 6146 | 1229 | 4917 | (0.19, 0.21) | 0.01
4 | 0.067 | 0.6 | 900 | 540 | 360 | (0.57, 0.63) | 0.03

Case #1: $n\hat{p} = 30(0.5) = 15$, $n(1-\hat{p}) = 30(1-0.5) = 15$, $0.5 \pm 1.96\sqrt{\dfrac{0.5(1-0.5)}{30}} = 0.5 \pm 0.1789$.

Case #2: $\hat{p} = \dfrac{0.7+0.9}{2} = 0.8$, $MoE = 0.9 - 0.8 = 0.1$, $1.96\sqrt{\dfrac{0.8(1-0.8)}{n}} = 0.1 \Rightarrow n \approx 61$.

Case #3: $1.96\sqrt{\dfrac{0.2(1-0.2)}{n}} = 0.01 \Rightarrow n \approx 6146$, $CI = 0.2 \pm 0.01$.

Case #4: $z^*\sqrt{\dfrac{0.6(1-0.6)}{900}} = 0.03 \Rightarrow z^* \approx 1.83$, $\alpha = 2(0.0336) \approx 0.067$.

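Example 7.2.9. Given a 95% confidence level and sample size $n$, prove that the margin of error (MoE) of the CI is bounded by $\dfrac{1}{\sqrt{n}}$. Assume that the conditions for constructing the CI are satisfied.

Proof: The MoE is $z^*\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$, and

$$z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le 1.96\sqrt{\frac{(0.5)(0.5)}{n}} = 1.96\,(0.5)\,\frac{1}{\sqrt{n}} \le \frac{1}{\sqrt{n}}.$$

Note that $\max\{\hat{p}(1-\hat{p})\} = \hat{p}(1-\hat{p})\big|_{\hat{p}=0.5} = 0.25$.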
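The bound in the proof can also be spot-checked numerically. The sketch below scans a grid of sample proportions for a few sample sizes; the grid, the sample sizes, and the helper name `margin_of_error` are illustrative assumptions, and a numeric check of this kind complements the proof rather than replacing it.

```python
from math import sqrt

Z_95 = 1.96  # two-tailed critical value for 95% confidence


def margin_of_error(p_hat: float, n: int, z_star: float = Z_95) -> float:
    """MoE = z* * sqrt(p_hat (1 - p_hat) / n)."""
    return z_star * sqrt(p_hat * (1 - p_hat) / n)


for n in (30, 100, 1000):
    # Largest MoE over p_hat = 0.00, 0.01, ..., 1.00; it occurs at p_hat = 0.5.
    worst = max(margin_of_error(k / 100, n) for k in range(101))
    assert worst <= 1 / sqrt(n)
    print(n, round(worst, 4), round(1 / sqrt(n), 4))
```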

Quick-Check 7.2.1. Confidence Interval for a Proportion in One Sample

QC 7.2.1.1. [CB] Courtney has constructed a cricket out of paper and rubber bands. According to the instructions for making the cricket, when it jumps it will land on its feet half of the time and on its back the other half of the time. In the first 50 jumps, Courtney's cricket landed on its feet 35 times. In the next 10 jumps, it landed on its feet only twice. Based on this experience, Courtney can conclude that

(A) the cricket was due to land on its feet less than half the time during the final 10 jumps, since it had landed too often on its feet during the first 50 jumps.
(B) a confidence interval for estimating the cricket's true probability of landing on its feet is wider after the final 10 jumps than it was before the final 10 jumps.
(C) a confidence interval for estimating the cricket's true probability of landing on its feet after the final 10 jumps is exactly the same as it was before the final 10 jumps.
(D) a confidence interval for estimating the cricket's true probability of landing on its feet is narrower after the final 10 jumps than it was before the final 10 jumps.
(E) a confidence interval for estimating the cricket's true probability of landing on its feet based on the initial 50 jumps does not include 0.2, so there must be a defect in the cricket's construction to account for the poor showing in the final 10 jumps.

QC 7.2.1.2. [MC14] A random sample of 432 voters revealed that 100 are in favor of a certain bond issue. A 95 percent confidence interval for the proportion of the population of voters who are in favor of the bond issue is

(A) $100 \pm 1.96\sqrt{\dfrac{0.5(0.5)}{432}}$
(B) $100 \pm 1.645\sqrt{\dfrac{0.5(0.5)}{432}}$
(C) $100 \pm 1.96\sqrt{\dfrac{0.231(0.769)}{432}}$
(D) $0.231 \pm 1.96\sqrt{\dfrac{0.231(0.769)}{432}}$
(E) $0.231 \pm 1.645\sqrt{\dfrac{0.231(0.769)}{432}}$

QC 7.2.1.3. [MC1413] The manager of a car company will select a random sample of its customers to create a 90 percent confidence interval to estimate the proportion of its customers who have children. What is the smallest sample size that will result in a margin of error of no more than 6 percentage points?

QC 7.2.1.4. [MC117] A large-sample 98 percent confidence interval for the proportion of hotel reservations that are canceled on the intended arrival day is (0.048, 0.112). What is the point estimate for the proportion of hotel reservations that are canceled on the intended arrival day from which this interval was constructed?

(A) 0.032
(B) 0.064
(C) 0.080
(D) 0.160
(E) It cannot be determined from the information given.

QC 7.2.1.5. [MC16] In 2009 a survey of Internet usage found that 79 percent of adults age 18 years and older in the United States use the Internet. A broadband company believes that the percent is greater now than it was in 2009 and will conduct a survey. The company plans to construct a 98 percent confidence interval to estimate the current percent and wants the margin of error to be no more than 2.5 percentage points. Assuming that at least 79 percent of adults use the Internet, which of the following should be used to find the sample size ($n$) needed?

(A) $1.96\sqrt{\dfrac{0.5}{n}} \le 0.025$
(B) $1.96\sqrt{\dfrac{0.5(0.5)}{n}} \le 0.025$
(C) $2.33\sqrt{\dfrac{0.5(0.5)}{n}} \le 0.05$
(D) $2.33\sqrt{\dfrac{0.79(0.21)}{n}} \le 0.025$
(E) $2.33\sqrt{\dfrac{0.79(0.21)}{n}} \le 0.05$

Answers

QC 7.2.1.1. D. The proportion is assumed to be $p = 0.5$. The error term (half-width) of the confidence interval is $\pm z^*\sqrt{\dfrac{p(1-p)}{n}}$. So when the sample size increases, the error term decreases. That is, the CI narrows as the sample size increases.

QC 7.2.1.2. D. $\hat{p} = \dfrac{100}{432} = 0.231$

QC 7.2.1.3. Since the proportion is normalized to 100%, the question indicates that $1.645\sqrt{\dfrac{p(1-p)}{n}} \le 6\%$ without giving $p$. The quadratic function $f(p) = p(1-p)$ reaches its maximum when $p = 0.5$; therefore, for all $0 \le p \le 1$, $f(0.5) = 0.5(1-0.5) = 0.25$ is the maximum value. That is,

$$1.645\sqrt{\frac{0.25}{n}} \le 6\% \;\Rightarrow\; n \ge 187.9,$$

so the smallest sample size is 188.

QC 7.2.1.4. C. $\hat{p} = \dfrac{0.048+0.112}{2} = 0.08$

QC 7.2.1.5. D. $\hat{p} = 0.79$, $\alpha = 0.02$, $z^* = \text{invNorm}\left(1-\dfrac{0.02}{2}\right) = 2.33$