Chapter 22 Comparing Two Proportions 1 /29
Homework p519 2, 4, 12, 13, 15, 17, 18, 19, 24 2 /29
Objective Students test null and alternative hypotheses about two population proportions. 3 /29
Comparing Two Proportions In the real world we are much more likely to make comparisons between two values (proportions or means) than we are to investigate the validity of an isolated value. And let's be honest, we are much more interested in comparisons. Do you really care how good your test score is, or would you be more interested in your score compared to the rest of the class? Who won the Super Bowl? More often than not we want to know how two groups differ. Is treatment 1 more effective than treatment 2? Is a treatment more effective than a placebo control? Is my group better than your group? Who is the best baseball player of all time? (Hint: it was Willie Mays.) 4/29
Another Ruler To examine the difference between two proportions, we will use another standard deviation: the standard deviation of the sampling distribution model for the difference between two proportions. Remember, standard deviations do not add; variances add. The variance of the sum (or difference) of two independent random variables is the sum of their individual variances. 5/29
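A small worked example may help here. The standard deviations 3 and 4 below are made-up values chosen only for illustration; X and Y stand for any two independent random variables.

\[ Var(X - Y) = Var(X) + Var(Y) = 3^2 + 4^2 = 25 \]
\[ SD(X - Y) = \sqrt{25} = 5 \neq 3 + 4 \]

The variances add to 25, so the standard deviation of the difference is 5, not the 7 we would get by adding the standard deviations directly.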
The Standard Deviation of the Difference Between Two Proportions
Proportions observed in independent random samples are independent. Thus, we can add their variances. So
\[ SD(\hat p_1) = \sqrt{\frac{p_1 q_1}{n_1}}, \qquad SD(\hat p_2) = \sqrt{\frac{p_2 q_2}{n_2}} \]
\[ SD(\hat p_1)^2 = \frac{p_1 q_1}{n_1}, \qquad SD(\hat p_2)^2 = \frac{p_2 q_2}{n_2} \]
The standard deviation of the difference between two sample proportions is the square root of the sum of the variances.
\[ SD(\hat p_1 - \hat p_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \]
And it follows that the standard error is
\[ SE(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}} \]
6/29
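The standard error formula can be sketched in a few lines of Python. This is only an illustration: the function name `se_difference` is hypothetical, and the counts in the usage line are the smoking-cessation counts used later in these slides.

```python
from math import sqrt

def se_difference(x1, n1, x2, n2):
    """Unpooled standard error of p1_hat - p2_hat (hypothetical helper)."""
    p1_hat = x1 / n1          # sample proportion, group 1
    p2_hat = x2 / n2          # sample proportion, group 2
    q1_hat = 1 - p1_hat
    q2_hat = 1 - p2_hat
    # Variances add, then take the square root of the sum.
    return sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)

# 46 of 143 successes in group 1, 30 of 151 in group 2
se = se_difference(46, 143, 30, 151)
print(round(se, 4))
```

Note that the function adds the two variances before taking the square root; it never adds the two standard deviations.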
Assumptions and Conditions
Independence Assumption (old news):
Randomization Condition: The data in each group should be drawn independently and at random from a homogeneous population or generated by a randomized comparative experiment.
The 10% Condition: If the data are sampled without replacement, the sample should not exceed 10% of the population.
Independent Groups Assumption (new news): The two groups we are comparing must be independent of each other. 7/29
Assumptions and Conditions Sample Size Condition: Success/Failure Condition: Both groups are big enough that at least 10 successes and at least 10 failures have been observed in each. We already know that for large enough samples, each of our proportions has an approximately Normal sampling distribution. The same is true of their difference. 8/29
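The Success/Failure Condition is easy to check mechanically. The sketch below assumes the usual threshold of 10 stated above; the function name `success_failure_ok` is hypothetical.

```python
def success_failure_ok(x1, n1, x2, n2, minimum=10):
    """Return True if each group has at least `minimum` successes and failures."""
    counts = [
        x1, n1 - x1,   # successes and failures in group 1
        x2, n2 - x2,   # successes and failures in group 2
    ]
    return all(count >= minimum for count in counts)

print(success_failure_ok(46, 143, 30, 151))   # both groups are large enough
print(success_failure_ok(5, 143, 30, 151))    # only 5 successes in group 1
```

This checks only the sample size condition. Randomization, the 10% Condition, and independence of the groups still have to be judged from how the data were collected.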
The Sampling Distribution
Provided that the sampled values are independent, the samples are independent, and the sample sizes are large enough, the sampling distribution of $\hat p_1 - \hat p_2$ is modeled by a Normal model with:
Mean: $\mu = p_1 - p_2$
Standard deviation: \[ SD(\hat p_1 - \hat p_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \]
\[ N\left(p_1 - p_2,\ \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}\right) \]
Think about that standard deviation for a moment. It is the square root of the sum of the two variances, so each group's contribution is scaled by its own sample size. 9/29
Two-Proportion z-interval
When the conditions are met, we are ready to find the confidence interval for the difference of two proportions. The confidence interval is
\[ (\hat p_1 - \hat p_2) \pm z^* \sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}} \]
where
\[ SE(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}} \]
As always, the critical value z* is determined by the desired confidence level, C, that you specify. 10/29
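As a rough sketch of the interval calculation, the Python below uses only the standard library. The function name `two_prop_z_interval` is hypothetical; `statistics.NormalDist().inv_cdf` supplies the critical value z*.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z* leaves (1 - confidence) / 2 in each tail of the standard Normal model.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

low, high = two_prop_z_interval(46, 143, 30, 151)
print(round(low, 5), round(high, 5))
```

With the smoking-cessation counts from the later example (46 of 143 and 30 of 151), this should agree with the calculator's 2-PropZInt result to within rounding.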
Pooled The typical hypothesis test for the difference in two proportions is that there is no difference. Ain't nuthin' happenin'. In symbols, $H_0: p_1 - p_2 = 0$ or, if you prefer, $H_0: p_1 = p_2$. Since we are hypothesizing that there is no difference between the two proportions, both samples are estimating the same underlying proportion, and the standard error should be built from a single estimate of it. Since this is the case, we combine (pool) the counts to get a more accurate overall proportion. 11/29
Pooled
The pooled proportion is
\[ \hat p_{pooled} = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2} \]
If you have been given the number of successes for each sample, then
\[ \hat p_{pooled} = \frac{x_1 + x_2}{n_1 + n_2} \]
If the numbers of successes are not whole numbers, round them first. (This is the only time you should round values in the middle of a calculation.) 12/29
Pooled
We then put this pooled value into the formula, substituting it for both sample proportions in the standard error formula:
\[ SE_{pooled}(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_{pooled}\, \hat q_{pooled}}{n_1} + \frac{\hat p_{pooled}\, \hat q_{pooled}}{n_2}} \]
13/29
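The pooling step can be sketched as follows. The function names are hypothetical, and the counts in the usage lines are the smoking-cessation counts that appear later in these slides.

```python
from math import sqrt

def pooled_proportion(x1, n1, x2, n2):
    """Combine the successes and the sample sizes from both groups."""
    return (x1 + x2) / (n1 + n2)

def pooled_se(x1, n1, x2, n2):
    """Standard error of p1_hat - p2_hat with the pooled proportion in both terms."""
    p_pool = pooled_proportion(x1, n1, x2, n2)
    q_pool = 1 - p_pool
    return sqrt(p_pool * q_pool / n1 + p_pool * q_pool / n2)

print(round(pooled_proportion(46, 143, 30, 151), 4))
print(round(pooled_se(46, 143, 30, 151), 4))
```

The same pooled value appears in both terms under the square root; only the sample sizes differ.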
Compared to What? We then reject our null hypothesis if we see a large enough difference in the two proportions. How can we decide whether the difference we see is large? Same way we always do: measure it against the standard error. Unlike previous hypothesis testing situations, the null hypothesis doesn't provide a standard deviation, so we'll use a standard error (here, pooled). 14/29
Two-Proportion z-test
The conditions for the two-proportion z-test are the same as for the two-proportion z-interval. We are testing the hypothesis $H_0: p_1 - p_2 = 0$, or, $H_0: p_1 = p_2$. Because we hypothesize that the proportions are equal and form the same sampling distribution, we pool them to find
\[ \hat p_{pooled} = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2} \]
15/29
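Two-Proportion z-test
We use the pooled value to estimate the standard error:
\[ SE_{pooled}(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_{pooled}\, \hat q_{pooled}}{n_1} + \frac{\hat p_{pooled}\, \hat q_{pooled}}{n_2}} \]
Now we find the test statistic:
\[ z = \frac{(\hat p_1 - \hat p_2) - 0}{\sqrt{\dfrac{\hat p_{pooled}\, \hat q_{pooled}}{n_1} + \dfrac{\hat p_{pooled}\, \hat q_{pooled}}{n_2}}} \]
When the conditions are met and the null hypothesis is true, the difference $\hat p_1 - \hat p_2$ follows the Normal model
\[ N\left(p_1 - p_2,\ \sqrt{\frac{\hat p_{pooled}\, \hat q_{pooled}}{n_1} + \frac{\hat p_{pooled}\, \hat q_{pooled}}{n_2}}\right) \]
with $p_1 - p_2 = 0$, and the statistic z follows the standard Normal model, so we can use that model to obtain a P-value. 16/29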
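Putting the pooled standard error and the test statistic together, a minimal sketch of the test in Python might look like this. The function name `two_prop_z_test` and the `alternative` labels are assumptions made for the illustration; they are not calculator or library names.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Return (z, P-value) for H0: p1 - p2 = 0 using the pooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    q_pool = 1 - p_pool
    se = sqrt(p_pool * q_pool / n1 + p_pool * q_pool / n2)
    z = ((p1_hat - p2_hat) - 0) / se
    standard_normal = NormalDist()          # mean 0, standard deviation 1
    if alternative == "greater":            # Ha: p1 > p2
        p_value = 1 - standard_normal.cdf(z)
    elif alternative == "less":             # Ha: p1 < p2
        p_value = standard_normal.cdf(z)
    else:                                   # Ha: p1 != p2
        p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    return z, p_value

z, p_value = two_prop_z_test(46, 143, 30, 151, alternative="greater")
print(round(z, 4), round(p_value, 4))
```

The three branches correspond to the calculator's >p2, <p2, and ≠p2 choices. For very large z the computed P-value can round to exactly 0 in floating point, which matches the calculator's display of .0000.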
TI-84
For a two-proportion confidence interval (for the difference in proportions): STAT, TESTS, B:2-PropZInt. Enter x1, n1, x2, n2, and C-Level, then choose Calculate.
For a two-proportion z-test (for the difference in proportions): STAT, TESTS, 6:2-PropZTest. Enter x1, n1, x2, n2, choose the alternative for p1 (≠p2, <p2, or >p2), then choose Calculate. 17/29
Example We sample students attending various colleges and universities to see how many plan to go to Florida for Spring Break. 5690 of the 12931 randomly selected male students indicated they had plans to go, while 1051 of 3285 randomly selected females indicated intentions to go. Is there a significant difference in the proportions of the males and females intending to behave like idiots during spring break?
We are to find whether or not there is a difference in the proportions of males and females intending to go to Florida for Spring Break.
$H_0: p_m - p_f = 0$, $H_a: p_m - p_f \neq 0$
$H_0: p_m = p_f$, $H_a: p_m \neq p_f$
$p_m$ = proportion of males, $p_f$ = proportion of females
Write the hypotheses one way or the other; do not mix and match, and do not use both. 18/29
Spring Break Male and female students were randomly selected from the universities. We will assume that the intentions of students to go misbehave are independent. The two groups are independent. $n_m \hat p_m = 5690$, $n_m \hat q_m = 7241$, $n_f \hat p_f = 1051$, $n_f \hat q_f = 2234$, so we have enough successes and failures. 12931 males and 3285 females are less than 10% of the population of college students. I will use a Normal model N(0, .0096) to conduct a 2-proportion z-test for the difference in proportions. 19/29
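Spring Break
$n_m = 12931$, $x_m = 5690$, $\hat p_m = .4400$; $n_f = 3285$, $x_f = 1051$, $\hat p_f = .3199$
\[ \hat p_{pooled} = \frac{n_m \hat p_m + n_f \hat p_f}{n_m + n_f} = \frac{5690 + 1051}{12931 + 3285} = .4157 \]
\[ z = \frac{(\hat p_m - \hat p_f) - 0}{\sqrt{\dfrac{\hat p_{pooled}\, \hat q_{pooled}}{n_m} + \dfrac{\hat p_{pooled}\, \hat q_{pooled}}{n_f}}} = \frac{.4400 - .3199}{\sqrt{\dfrac{(.4157)(.5843)}{12931} + \dfrac{(.4157)(.5843)}{3285}}} = \frac{.1201}{.00963} \approx 12.47 \]
2-PropZTest(5690,12931,1051,3285,≠p2): z = 12.4711, p = .0000
[Figure: Normal model N(0, .0096) with axis marks at 0, ±.0096, ±.019, and ±.0289; the observed difference $\hat p_m - \hat p_f = .1201$ lies far beyond the upper tail, with tail area .0000.]
With a P-value of essentially zero, we reject the null hypothesis and conclude there is a significant difference in the proportions of males and females intending to be idiotic in Florida for Spring Break. 20/29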
Spring Break
We will create a 95% confidence interval for the true difference in proportions.
\[ (\hat p_m - \hat p_f) \pm z^* \sqrt{\frac{\hat p_m \hat q_m}{n_m} + \frac{\hat p_f \hat q_f}{n_f}} \]
\[ (.4400 - .3199) \pm 1.96 \sqrt{\frac{(.44)(.56)}{12931} + \frac{(.3199)(.6801)}{3285}} \]
Since we rejected $H_0$, we no longer pool the variance because we no longer assume both samples are from the same population.
\[ .1201 \pm 1.96(.0092) = .1201 \pm .0181 = (.1020, .1382) \]
2-PropZInt(5690,12931,1051,3285,.95) = (.10199, .13819)
I am 95% confident that the true difference in the proportions of males and females intending to go to Florida for Spring Break is between 10.2 and 13.8 percentage points. 21/29
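Computer Here is a Minitab output for testing the two proportions: [Minitab output image not reproduced; its annotations label the sample proportions, their difference $\hat p_1 - \hat p_2$, and z.] 22/29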
Example Do support groups aid in quitting smoking? A county health department tries an experiment using several hundred volunteers who were planning to use the patch to help them quit smoking. The subjects were randomly divided into two groups. People in Group 1 were given the patch and attended a weekly discussion meeting with counselors and others trying to quit. People in Group 2 also used the patch but did not participate in the counseling groups. After six months 46 of the 143 smokers in Group 1 and 30 of 151 smokers in Group 2 had successfully stopped smoking. Do these results suggest that such support groups could be an effective way to help people stop smoking?
$n_1 = 143$, $x_1 = 46$, $\hat p_1 = .3217$; $n_2 = 151$, $x_2 = 30$, $\hat p_2 = .1987$ 23/29
Example We are interested in determining if smokers trying to quit using a nicotine patch and counseling (Group 1) had a greater success rate (proportion of subjects quitting) than those subjects using a nicotine patch without counseling (Group 2).
$H_0: p_1 - p_2 = 0$, $H_a: p_1 - p_2 > 0$
Smokers are independent and were randomly assigned to 2 groups. The two groups are independent. $n_1 \hat q_1 = 97$, $n_1 \hat p_1 = 46$, $n_2 \hat q_2 = 121$, $n_2 \hat p_2 = 30$, all > 10, thus we have sufficient successes and failures to use a Normal model. 24/29
Example I will use a 2-proportion z-test with the Normal model N(0, .0511).
\[ \hat p_{pooled} = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2} = \frac{46 + 30}{143 + 151} = .2585 \]
\[ z = \frac{(\hat p_1 - \hat p_2) - 0}{\sqrt{\dfrac{\hat p_{pooled}\, \hat q_{pooled}}{n_1} + \dfrac{\hat p_{pooled}\, \hat q_{pooled}}{n_2}}} = \frac{.3217 - .1987}{\sqrt{\dfrac{(.2585)(.7415)}{143} + \dfrac{(.2585)(.7415)}{151}}} \]
2-PropZTest(46,143,30,151,>p2): z = 2.4077, p = .0080 25/29
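Example
\[ P(\hat p_1 - \hat p_2 \geq .123) = P(z \geq 2.4077) = .0080 \]
[Figure: Normal model N(0, .0511) with axis marks at 0, ±.051, ±.102, and ±.153 (z from -3 to 3); the observed difference $\hat p_1 - \hat p_2 = .123$ sits at z = 2.4077 with upper-tail area .008.]
\[ SE_{pooled}(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_{pooled}\, \hat q_{pooled}}{n_1} + \frac{\hat p_{pooled}\, \hat q_{pooled}}{n_2}} = .0511 \]
You can check the P-value with normalcdf(.123, 1, 0, .0511) = .008041 26/29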
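The same normalcdf check can be reproduced off the calculator. The sketch below uses Python's standard-library `statistics.NormalDist` with the mean 0 and standard deviation .0511 from the Normal model above; the variable names are chosen only for the illustration.

```python
from statistics import NormalDist

# Normal model for the difference in proportions under H0: mean 0, SD .0511
model = NormalDist(mu=0, sigma=0.0511)
observed = 0.123   # observed difference p1_hat - p2_hat

# Area between the observed difference and 1, as in normalcdf(.123, 1, 0, .0511)
p_value = model.cdf(1) - model.cdf(observed)
print(round(p_value, 6))
```

The upper bound of 1 works because a difference in proportions cannot exceed 1, and the model's area beyond 1 is negligible.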
Example If $H_0$ is true, the probability of getting a difference in proportions of .123 or greater simply by chance is .008. I reject the null hypothesis. There is sufficient evidence to believe it plausible that counseling in concert with the nicotine patch improves the proportion of subjects that quit smoking over those that use the patch without counseling. Since the difference is not 0 (we rejected $H_0: p_1 - p_2 = 0$), we are now interested in determining the true population difference. 27/29
Example Confidence interval
We will create a 95% confidence z-interval for the difference in population proportions of smoking cessation between those who get counseling and those who do not. The conditions have been met to create a 95% confidence interval for the Normal model N(.123, .0508).
\[ (\hat p_1 - \hat p_2) \pm z^* \sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}} = .123 \pm 1.96 \sqrt{\frac{(.3217)(.6783)}{143} + \frac{(.1987)(.8013)}{151}} \]
\[ .123 \pm 1.96(.0508) = .123 \pm .0996 = (.023, .223) \]
2-PropZInt(46,143,30,151,.95) = (.02344, .22256)
I am 95% confident that the true increase in the proportion of people quitting with counseling is between 2.3 and 22.3 percentage points. Note that 0 is not in our interval, confirming our results from the hypothesis test. Is that accurate enough to determine if it is worth the cost? 28/29
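All Together Now With TI
Hypothesis test: 2-PropZTest(46,143,30,151,>p2) gives z = 2.4077, p = .0080.
[Figure: Normal model N(0, .0511) with axis marks at 0, ±.051, ±.102, and ±.153; the observed difference $\hat p_1 - \hat p_2 = .123$ has upper-tail area .008, shown with the confidence interval.]
Confidence interval: 2-PropZInt(46,143,30,151,.95) gives (.02344, .22256). 29/29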