Chapter 22. Comparing Two Proportions 1 /29

Similar documents
Chapter 22. Comparing Two Proportions 1 /30

STA Module 10 Comparing Two Proportions

9-6. Testing the difference between proportions /20

Chapter 24. Comparing Means

Is Yawning Contagious video

CHAPTER 10 Comparing Two Populations or Groups

Chapter 22. Comparing Two Proportions. Bin Zou STAT 141 University of Alberta Winter / 15

10.1. Comparing Two Proportions. Section 10.1

Difference Between Pair Differences v. 2 Samples

Chapter 10: Comparing Two Populations or Groups

hypotheses. P-value Test for a 2 Sample z-test (Large Independent Samples) n > 30 P-value Test for a 2 Sample t-test (Small Samples) n < 30 Identify α

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

DETERMINE whether the conditions for performing inference are met. CONSTRUCT and INTERPRET a confidence interval to compare two proportions.

Chapter 18. Sampling Distribution Models /51

Difference between means - t-test /25

Statistics for Business and Economics: Confidence Intervals for Proportions

Chapter 10: Comparing Two Populations or Groups

Chapter 10: Comparing Two Populations or Groups

Hypothesis Testing with Z and T

Chapter 24. Comparing Means. Copyright 2010 Pearson Education, Inc.

Chapter 9. Hypothesis testing. 9.1 Introduction

Comparing Means from Two-Sample

Correlation and regression

Inferences About Two Proportions

The Components of a Statistical Hypothesis Testing Problem

Chapter 26: Comparing Counts (Chi Square)

Chapter 20 Comparing Groups

Chapter 9 Inferences from Two Samples

Two-Sample Inference for Proportions and Inference for Linear Regression

Midterm 1 and 2 results

Chapter 23: Inferences About Means

Two-Sample Inferential Statistics

One-sample categorical data: approximate inference

STAT Chapter 9: Two-Sample Problems. Paired Differences (Section 9.3)

STP 226 EXAMPLE EXAM #3 INSTRUCTOR:

Harvard University. Rigorous Research in Engineering Education

Sampling Distributions: Central Limit Theorem

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01

Analysis of variance (ANOVA) Comparing the means of more than two groups

Statistical Inference

A proportion is the fraction of individuals having a particular attribute. Can range from 0 to 1!

Chapter 15 Sampling Distribution Models

Ron Heck, Fall Week 3: Notes Building a Two-Level Model

M(t) = 1 t. (1 t), 6 M (0) = 20 P (95. X i 110) i=1

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. describes the.

χ test statistics of 2.5? χ we see that: χ indicate agreement between the two sets of frequencies.

T test for two Independent Samples. Raja, BSc.N, DCHN, RN Nursing Instructor Acknowledgement: Ms. Saima Hirani June 07, 2016

One-factor analysis of variance (ANOVA)

HYPOTHESIS TESTING. Hypothesis Testing

10/4/2013. Hypothesis Testing & z-test. Hypothesis Testing. Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2

STAT Chapter 8: Hypothesis Tests

Lecture 5: Sampling Methods

Data Analysis and Statistical Methods Statistics 651

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output

[ z = 1.48 ; accept H 0 ]

Design of Engineering Experiments Part 2 Basic Statistical Concepts Simple comparative experiments

10.4 Hypothesis Testing: Two Independent Samples Proportion

hypothesis a claim about the value of some parameter (like p)

Statistical Inference. Section 9.1 Significance Tests: The Basics. Significance Test. The Reasoning of Significance Tests.

Chapter 9. Inferences from Two Samples. Objective. Notation. Section 9.2. Definition. Notation. q = 1 p. Inferences About Two Proportions

MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics

Chapter. Hypothesis Testing with Two Samples. Copyright 2015, 2012, and 2009 Pearson Education, Inc. 1

Review. One-way ANOVA, I. What s coming up. Multiple comparisons

In ANOVA the response variable is numerical and the explanatory variables are categorical.

Problem Set 4 - Solutions

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

CHAPTER 10 HYPOTHESIS TESTING WITH TWO SAMPLES

Practice AP Statistics Exam Saturday April 29, 2017 University of Delaware. Section II: Free Response

Proportion. Lecture 25 Sections Fri, Oct 10, Hampden-Sydney College. Sampling Distribution of a Sample. Proportion. Robb T.

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

t-test for b Copyright 2000 Tom Malloy. All rights reserved. Regression

STA Module 11 Inferences for Two Population Means

STA Rev. F Learning Objectives. Two Population Means. Module 11 Inferences for Two Population Means

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.

CBA4 is live in practice mode this week exam mode from Saturday!

79 Wyner Math Academy I Spring 2016

STAT 515 fa 2016 Lec Statistical inference - hypothesis testing

Single Sample Means. SOCY601 Alan Neustadtl

CHAPTER 7. Hypothesis Testing

Hypothesis Testing and Confidence Intervals (Part 2): Cohen s d, Logic of Testing, and Confidence Intervals

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

This gives us an upper and lower bound that capture our population mean.

Psych 230. Psychological Measurement and Statistics

Study Ch. 9.3, #47 53 (45 51), 55 61, (55 59)

Stat 135 Fall 2013 FINAL EXAM December 18, 2013

Many natural processes can be fit to a Poisson distribution

Lecture 5: ANOVA and Correlation

COSC 341 Human Computer Interaction. Dr. Bowen Hui University of British Columbia Okanagan

Distribution-Free Procedures (Devore Chapter Fifteen)

10/31/2012. One-Way ANOVA F-test

Data Analysis and Statistical Methods Statistics 651

Relax and good luck! STP 231 Example EXAM #2. Instructor: Ela Jackiewicz

ONE FACTOR COMPLETELY RANDOMIZED ANOVA

Transcription:

Chapter 22 Comparing Two Proportions 1 /29

Homework p519 2, 4, 12, 13, 15, 17, 18, 19, 24 2 /29

Objective Students test null and alternate hypothesis about two population proportions. 3 /29

Comparing Two Proportions In the real world we are much more likely to make comparisons between two values (proportions or means) than we are to investigate the validity of an isolated value. And let s be honest, we are much more interested in comparisons. Do you really care how good your test score is, or would you be more interested in your score compared to the rest of the class? Who won the super bowl? More often than not we want to know how two groups differ. Is treatment 1 more effective than treatment 2? Is a treatment more effective than a placebo control? Is my group better than your group. Who is the best baseball player of all time? (Hint: it was Willy Mays). 4/29

Another Ruler To examine the difference between two proportions, we will use another standard deviation, the standard deviation of the sampling distribution model for the difference between two proportions. Remember, standard deviations do not add, variances add. The variance of the sum (or difference) of two independent, random distributions is the sum of their individual variances. 5/29

The Standard Deviation of the Difference Between Two Proportions Proportions observed in independent random samples are independent. Thus, we can add their variances. So! SD p 1 ( ) = p 1 q 1 (! ) SD p n = p q 2 2 2 1 (! n 2 SD p 1 )2 ( = p q! 1 1 SD p n 2 1 )2 = p 2 q 2 n 2 The standard deviation of the difference between two sample proportions is the square root of the sum of the variances.!! p q SD(p p2 ) = 1 1 + p q 2 2 1 n 2 And it follows that the standard error is!! q1!! q2!! p SE (p p2 ) = 1 1 + p 2 n 2 6/29

Assumptions and Conditions Independence Assumption: Old news Randomization Condition: The data in each group should be drawn independently and at random from a homogeneous population or generated by a randomized comparative experiment. The 10% Condition: If the data are sampled without replacement, the sample should not exceed 10% of the population. New news Independent Groups Assumption: The two groups we are comparing must be independent of each other. 7/29

Assumptions and Conditions Sample Size Condition: Success/Failure Condition: Both groups are big enough that at least 10 successes and at least 10 failures have been observed in each. We already know that for large enough samples, each of our proportions has an approximately Normal sampling distribution. The same is true of their difference. 8/29

The Sampling Distribution Provided that the sampled values are independent, the samples are independent, and the samples sizes are large enough, the sampling distribution of!! p2 p 1 is modeled by a Normal model with: Mean µ = p 1 p 2!! p q Standard deviation: SD(p p2 ) = 1 1 + p q 2 2 1 n 2 N p p, 1 2 p 1 q 1 + p 2 q 2 n 2 Think about that standard error for a moment. It is the sum of the variances which, in effect, weights the standard error of the difference distribution. 9/29

Two-Proportion z-interval When the conditions are met, we are ready to find the confidence interval for the difference of two proportions: The confidence interval is!!!! p q1 (p p2 ) ± z * 1 1 + p 2!! q2 n 2!!!! p q1 SE (p p2 ) = 1 1 + p 2!! q2 n 2 As always, the critical value z* is determined by the desired confidence level, C, that you specify. 10/29

Pooled The typical hypothesis test for the difference in two proportions is that there is no difference. Ain t nuthin happenin. In symbols, H 0 : p 1 p 2 = 0 or, if you prefer H 0 : p 1 = p 2. Since we are hypothesizing that there is no difference between the two proportions, that means the sampling distribution for those samples is the same sampling distribution. That implies that the standard errors for each proportion s sampling distribution are actually the same. Since this is the case, we combine (pool) the counts to get a more accurate overall proportion. 11/29

Pooled n The pooled proportion is p! pooled = p! 1 + n 2 p! 2 + n 2 n If you have been given the number of successes for each sample, then p! pooled = x 1 + x 2 + n 2 If the numbers of successes are not whole numbers, round them first. (This is the only time you should round values in the middle of a calculation.) 12/29

Pooled We then put this pooled value into the formula, substituting it for both sample proportions in the standard error formula: "!! p SE (p p2 ) = pooled q" pooled pooled 1 " p + pooled q" pooled n 2 13/29

Compared to What? We then reject our null hypothesis if we see a large enough difference in the two proportions. How can we decide whether the difference we see is large? Same way we always do, measure against the standard error. Unlike previous hypothesis testing situations, the null hypothesis doesn t provide a standard deviation, so we ll use a standard error (here, pooled). 14/29

Two-Proportion z-test The conditions for the two-proportion z-test are the same as for the two-proportion z-interval. We are testing the hypothesis H 0 : p 1 p 2 = 0, or, H 0 : p 1 = p 2. Because we hypothesize that the proportions are equal and form the same sampling distribution, we pool them to find p! pooled = p! 1 + n 2 p! 2 + n 2 15/29

Two-Proportion z-test We use the pooled value to estimate the standard error:!!! p SE (p p2 ) = pooled q! pooled pooled 1! p + pooled q! pooled n 2 Now we find the test statistic: z = p " pooled q" pooled!! (p p2 ) 0 1 " p + pooled q" pooled n 2 When the conditions are met and the null hypothesis is true, this statistic follows the standard Normal model,!!! p q! pooled pooled N p 1 p2, n 1! + p pooled q! pooled n 2 so we can use that model to obtain a P-value. 16/29

TI-84 For a two proportion confidence interval (for the difference in proportions) STAT TESTS B:2-PropZInterval x1: n1: x2: n2: C-Level: Calculate For a two proportion z-test (for the difference in proportions). STAT TESTS 6:2-PropZTest x1: n1: x2: n2: p1: p2 <p2 >p2 Calculate 17/29

Example We sample students attending various Colleges and Universities to see how many plan to go to Florida for Spring Break. 5690 of the 12931 randomly selected male students indicated they had plans to go, while 1051 of 3285 randomly selected females indicated intentions to go. Is there a significant difference in the proportions of the males and females intending to behave like idiots during spring break? We are to find whether or not there is a difference in the proportions of males and females intending to go to Florida for Spring Break. H 0 : p m - p f 0, H a : p m - p f 0 H 0 : p m = p f, H a : p m p f p m = proportion of males p f = proportion of females One or the other, do not mix and match, do not use both. 18/29

Spring Break Male and female students were randomly selected from the university. We will assume that the intentions of students to go misbehave are independent. The two groups are independent. p 1 = 5690, q 1 = 7241, n 2 p 2 = 1051, n 2 q 2 = 2234, we have enough success/failure. 12391 males and 3285 females are less tha0% of the population of college students. I will use a Normal model N(0,.0096) to conduct a 2-proportion z-test for the difference in proportions. 19/29

Spring Break n ^ m = 12931 x m = 5690, p m =.4400 n ^ f = 3285 x f = 1051, p f =.3199 p! pooled = p! 1 + n 2 p! 2 + n 2 = 5690 + 1051 12931 + 3285 =.4157 z = p! pooled q! pooled!! (p p2 ) 0 1! p + pooled q! pooled n 2 = (.4157)(.5843) 12931.4400.3199 + (.4157)(.5843) 3285 =.4400.3199.0096 = 12.4723 2-propZTest(46,143,30,151,>p2) ^pm - ^pf =.1201.0000 z = 12.4711, p=.0000 -.0289 -.019 -.0096 0.0096.019.0289 With a probability of essentially zero, we reject the null hypothesis and conclude there is a significant difference in the proportions of males and females intending to be idiotic in Florida for Spring Break. 20/29

Spring Break We will create a 95% confidence interval for the true difference in proportions.!!!! p q1 (p p2 ) ± z * 1 1 + p 2!! q2 n 2 (.4400.3199) ± 1.96 (.44)(.56) 12931 + (.3199)(.6801) 3285 Since we rejected H 0, we no longer pool the variance because we no longer assume both samples are from the same population..1201 ± 1.96(.0096) =.1201 ±.0181 = (.102,.1382) 2-PropZInt(5690,12931,1051,3285,.95) = (.10199,.13819) I am 95% confident that the difference in proportion of males and females intending to go to Florida for Spring Break will fall between 10.2% and 13.9%. 21/29

Computer Here is a Minitab output for testing the two proportions: ^ p ^p1 ^ - p2 z 22/29

Example Do support groups aid in quitting smoking? A county health department tries an experiment using several hundred volunteers who were planning to use the patch to help them quit smoking. The subjects were randomly divided into two groups. People in Group 1 were given the patch and attended a weekly discussion meeting with counselors and others trying to quit. People in Group 2 also used the patch but did not participate in the counseling groups. After six months 46 of the 143 smokers in Group 1 and 30 of 151 smokers in Group 2 had successfully stopped smoking. Do these results suggest that such support groups could be an effective way to help people stop smoking? n ^ 1 = 143 x 1 = 46, p 1 =.3217 n ^ 2 = 151 x 2 = 30, p 2 =.1987 23/29

Example = 143 x 1 = 46, p^ 1 =.3217 n ^ 2 = 151 x 2 = 30, p 2 =.1987 We are interested in determining if smokers trying to quit using a nicotine patch and counseling (Group 1) had a greater success rate (proportion of subjects quitting) than those subjects using a nicotine patch without counseling (Group 2). H 0 : p 1 - p 2 0, H a : p 1 - p 2 > 0 Smokers are independent and were randomly assigned to 2 groups. The two groups are independent. q 1 = 97, p 1 = 46, n 2 q 2 = 121 n 2 p 2 = 30, all > 10, thus we have sufficient success/failure to use a Normal model. 24/29

Example I will use a 2-proportion z test for the normal model N(0,.0511) = 143 x 1 = 46, p^ 1 =.3217 n ^ 2 = 151 x 2 = 30, p 2 =.1987 p! pooled = p! 1 + n 2 p! 2 + n 2 = 46 + 30 143 + 151 =.2585 z = p " pooled q" pooled!! (p p2 ) 0 1 " p + pooled q" pooled n 2 = (.2585)(.7415) 143.3217.1987 + (.2585)(.7415) 151 2-propZTest(46,143,30,151,>p2) z = 2.4077, p=.0080 25/29

Example P(p 1 - p 2.123) = P(z 2.4077) =.0080 ^p1 - ^ p2 =.123.008 z = 2.4077.008 -.153 -.102 -.051 0.051.102.153-3 -2-1 0 1 2 3 SE (p! 1 p! 2 ) = p! pooled q! pooled! p + pooled q! pooled n 2 =.0511 You can check the p-value with normalcdf(.123, 1, 0,.0511) =.008041 26/29

Example If H 0 is true, The probability of getting a difference in proportions as great as.123 simply by chance is.008. I reject the null hypothesis. There is sufficient evidence to believe it plausible that counseling in concert with the nicotine patch improves the proportion of subjects that quit smoking over those that use the patch without counseling. Since the difference is not 0. (We rejected H 0 : p 1 - p 2 = 0) We are now interested in determining the true population difference. 27/29

Example We Confidence will create interval a 95% confidence z-interval for the difference in population proportions of smoking cessation between those who get counseling and those who do not. The conditions have been met to create a 95% confidence interval for the normal model N(.123,.0510)!!!! p q1 (p p2 ) ± z * 1 1!! q2 + p 2 =.123 ± 1.96 n n 1 2.3127 i.6873 143 +.1987 i.8013 151.123 ± 1.96(.0510) =.123 ±.1000 = (.023,.223) 2-PropZInt(46,143,30,151,.95) =(.02344,.22256) I am 95% confident that the increase in proportion of people quitting with counseling will fall between 2.3% and 22.3%. Note that 0 is not in our interval, confirming our results from the hypothesis test. Is that accurate enough to determine if it is worth the cost? 28/29

All Together Now With TI Hypothesis test 2-propZTest(46,143,30,151,>p2) z = 2.4077, p=.0080 ^p1 - ^p2 =.123.008 with the confidence interval. -.153 -.102 -.051 0.051.102.153 2-PropZInt(46,143,30,151,.95) (.02344,.22256) 29/29