
STA 2023 Module 10: Comparing Two Proportions

Learning Objectives

Upon completing this module, you should be able to:
1. Perform large-sample inferences (hypothesis tests and confidence intervals) to compare two population proportions.
2. Describe the relationship between the sample sizes, confidence level, and margin of error for a confidence interval for the difference between two population proportions.
3. Determine the sample size required for a specified confidence level and margin of error for the estimate of the difference between two population proportions.

Population Proportions

In this module, we are going to learn how to compare two population proportions. Remember, a population proportion, p, is simply the percentage of a population that has a specified attribute.

Quick Review on Population Proportion and Sample Proportion

In short, a sample proportion is obtained by dividing the number of members sampled that have the specified attribute (x) by the total number of members sampled (n): $\hat{p} = x / n$. Sometimes, we refer to x as the number of successes and n - x as the number of failures.

Quick Review on One-Proportion z-interval

When the conditions are met, we are ready to find the confidence interval for the population proportion, p. The confidence interval is

$\hat{p} \pm z^{*} \times SE(\hat{p})$, where $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$ and $\hat{q} = 1 - \hat{p}$.

The critical value, z*, depends on the particular confidence level, C, that you specify.

Comparing Two Proportions

Comparisons between two percentages are much more common than questions about isolated percentages. And they are more interesting. We often want to know how two groups differ, whether a treatment is better than a placebo control, or whether this year's results are better than last year's.
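As a quick numerical companion to the one-proportion z-interval reviewed above, here is a minimal Python sketch. The function name and the sample counts are hypothetical, chosen only for illustration; the critical value z* is taken from the standard Normal model using Python's standard library.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_interval(x, n, confidence=0.95):
    """Return (lower, upper) for p_hat +/- z* x SE(p_hat)."""
    p_hat = x / n                      # sample proportion
    q_hat = 1 - p_hat
    se = sqrt(p_hat * q_hat / n)       # SE(p_hat) = sqrt(p_hat * q_hat / n)
    # z* leaves (1 - C) / 2 in each tail of the standard Normal model.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * se
    return p_hat - margin, p_hat + margin


# Hypothetical data: 120 successes among 400 sampled members.
lower, upper = one_proportion_z_interval(120, 400, confidence=0.95)
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```

Raising the confidence level makes z* larger and the interval wider; increasing n makes the standard error smaller and the interval narrower.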

Another Ruler

In order to examine the difference between two proportions, we need another ruler: the standard deviation of the sampling distribution model for the difference between two proportions. Recall that standard deviations don't add, but variances do. In fact, the variance of the sum or difference of two independent random variables is the sum of their individual variances.

The Standard Deviation of the Difference Between Two Proportions

Proportions observed in independent random samples are independent. Thus, we can add their variances. So the standard deviation of the difference between two sample proportions is

$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$

Thus, the standard error is

$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$

Assumptions and Conditions

Independence Assumptions:
- Randomization Condition: The data in each group should be drawn independently and at random from a homogeneous population or generated by a randomized comparative experiment.
- The 10% Condition: If the data are sampled without replacement, the sample should not exceed 10% of the population.
- Independent Groups Assumption: The two groups we're comparing must be independent of each other.
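To make the point that variances add while standard deviations do not more concrete, the sketch below computes the standard error of each sample proportion and of their difference. The helper names and the sample values are hypothetical.

```python
from math import sqrt


def se_single(p_hat, n):
    """Standard error of one sample proportion."""
    return sqrt(p_hat * (1 - p_hat) / n)


def se_difference(p1_hat, n1, p2_hat, n2):
    """Standard error of p1_hat - p2_hat for independent samples."""
    return sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)


# Hypothetical samples: 60% of 200 and 50% of 250.
se1 = se_single(0.60, 200)
se2 = se_single(0.50, 250)
se_diff = se_difference(0.60, 200, 0.50, 250)

print(f"SE1 = {se1:.4f}, SE2 = {se2:.4f}")
print(f"SE of difference      = {se_diff:.4f}")
print(f"sqrt(SE1^2 + SE2^2)   = {sqrt(se1**2 + se2**2):.4f}  (variances add)")
print(f"SE1 + SE2             = {se1 + se2:.4f}  (not the right ruler)")
```

The standard error of the difference matches the square root of the summed variances and is smaller than the plain sum of the two standard errors.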

Assumptions and Conditions (cont.)

Sample Size Condition: Each of the groups must be big enough.
- Success/Failure Condition: Both groups are big enough that at least 5 successes and at least 5 failures have been observed in each.

The Sampling Distribution

We already know that for large enough samples, each of our proportions has an approximately Normal sampling distribution. The same is true of their difference.

The Sampling Distribution (cont.)

Provided that the sampled values are independent, the samples are independent, and the sample sizes are large enough, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is modeled by a Normal model with

Mean: $\mu = p_1 - p_2$

Standard deviation: $SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$
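Two of the conditions listed above, the Success/Failure Condition and the 10% Condition, can be checked numerically. The sketch below is one possible way to do that; the function name, its parameters, and the example counts are hypothetical. The randomization and independence assumptions cannot be verified by code and still have to be judged from how the data were collected.

```python
def check_conditions(x1, n1, x2, n2, pop1=None, pop2=None, min_count=5):
    """Return {condition name: True/False} for the numerically checkable conditions.

    x1, x2     : observed successes in each group
    n1, n2     : sample sizes
    pop1, pop2 : population sizes, if known (needed for the 10% Condition)
    min_count  : threshold for successes and failures (5, as stated above)
    """
    results = {}
    # Success/Failure Condition: enough successes AND failures in each group.
    results["success_failure"] = all(
        count >= min_count for count in (x1, n1 - x1, x2, n2 - x2)
    )
    # 10% Condition: each sample is at most 10% of its population.
    if pop1 is not None and pop2 is not None:
        results["ten_percent"] = n1 <= 0.10 * pop1 and n2 <= 0.10 * pop2
    return results


# Hypothetical study: the second group has only 3 successes.
print(check_conditions(45, 100, 3, 80))
# Hypothetical study with known population sizes.
print(check_conditions(45, 100, 30, 80, pop1=5000, pop2=700))
```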

Two-Proportion z-interval

When the conditions are met, we are ready to find the confidence interval for the difference of two proportions, $p_1 - p_2$. The confidence interval is

$(\hat{p}_1 - \hat{p}_2) \pm z^{*} \times SE(\hat{p}_1 - \hat{p}_2)$, where $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$

The critical value z* depends on the particular confidence level, C, that you specify.

Pool or Not Pool?

The typical hypothesis test for the difference in two proportions is the one of no difference. In symbols, $H_0: p_1 - p_2 = 0$. Since we are hypothesizing that there is no difference between the two proportions, that means that the standard deviations for each proportion are the same. Since this is the case, we combine (pool) the counts to get one overall proportion.

What is the Pooled Proportion?

The pooled proportion is

$\hat{p}_{pooled} = \dfrac{Success_1 + Success_2}{n_1 + n_2}$, where $Success_1 = n_1 \hat{p}_1$ and $Success_2 = n_2 \hat{p}_2$.

If the numbers of successes are not whole numbers, round them first. (This is the only time you should round values in the middle of a calculation.)
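The pooling step, including the rounding rule for success counts, can be sketched as follows. This assumes only the sample proportions and sample sizes were reported; the function name and the numbers are hypothetical.

```python
def pooled_proportion(p1_hat, n1, p2_hat, n2):
    """Pool two samples into one overall proportion.

    Success counts are recovered as n * p_hat and rounded to whole numbers
    first, as described above. (Python's round() sends exact .5 ties to the
    nearest even integer, which rarely matters for counts recovered this way.)
    """
    success1 = round(n1 * p1_hat)
    success2 = round(n2 * p2_hat)
    return (success1 + success2) / (n1 + n2)


# Hypothetical report: 57% of 248 in group 1, 70% of 184 in group 2.
# 248 * 0.57 = 141.36 rounds to 141; 184 * 0.70 = 128.8 rounds to 129.
p_pooled = pooled_proportion(0.57, 248, 0.70, 184)
print(f"pooled proportion = {p_pooled:.3f}")
```

When the raw success counts are available, it is simpler to use them directly and skip the rounding.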

What is the Pooled Proportion? (cont.)

We then put this pooled value into the formula, substituting it for both sample proportions in the standard error formula:

$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_1} + \dfrac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_2}}$

Compared to What?

We'll reject our null hypothesis if we see a large enough difference in the two proportions. How can we decide whether the difference we see is large? Just compare it with its standard deviation. Unlike previous hypothesis testing situations, the null hypothesis doesn't provide a standard deviation, so we'll use a standard error (here, pooled).

Two-Proportion z-test

The conditions for the two-proportion z-test are the same as for the two-proportion z-interval. We are testing the hypothesis $H_0: p_1 = p_2$. Because we hypothesize that the proportions are equal, we pool them to find

$\hat{p}_{pooled} = \dfrac{Success_1 + Success_2}{n_1 + n_2}$

Two-Proportion z-test (cont.)

We use the pooled value to estimate the standard error:

$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_1} + \dfrac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_2}}$

Now we find the test statistic:

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)}$

When the conditions are met and the null hypothesis is true, this statistic follows the standard Normal model, so we can use that model to obtain a P-value.

Quick Review

Let's look at the following one more time:
- How to find a one-proportion z-interval?
- How to perform a one-proportion z-test?
- What is the sampling distribution of the difference between two sample proportions?
- How to perform a two-proportion z-test?
- How to find a two-proportion z-interval?

Let's review.

How to Construct a One-Proportion z-interval?
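The two-proportion z-test steps described above (pool the counts, compute the pooled standard error, standardize the observed difference) can be sketched in Python. The function name and the counts are hypothetical, and the two-sided alternative is an assumption made for this example; the right alternative depends on the situation being investigated.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Test H0: p1 - p2 = 0 with a pooled standard error.

    Returns (z, p_value). The P-value is two-sided, which is an assumption
    of this sketch rather than something fixed by the method.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)          # one overall proportion
    q_pooled = 1 - p_pooled
    se_pooled = sqrt(p_pooled * q_pooled / n1 + p_pooled * q_pooled / n2)
    z = ((p1_hat - p2_hat) - 0) / se_pooled   # hypothesized difference is 0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 141 successes of 248 versus 129 successes of 184.
z, p_value = two_proportion_z_test(141, 248, 129, 184)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

A small P-value indicates that a difference this large would be unlikely if the two population proportions were really equal.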

A Quick Review on How to Perform a One-Proportion z-test

The Sampling Distribution of the Difference Between Two Sample Proportions

What does it mean? For large independent samples, the possible differences between two sample proportions have approximately a normal distribution with mean $p_1 - p_2$ and standard deviation $SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$, as given above.
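One way to see this sampling distribution is to simulate it. The sketch below repeatedly draws two independent samples from populations with known (hypothetical) proportions and compares the simulated mean and standard deviation of the differences with the values given by the model. The population proportions, sample sizes, number of repetitions, and seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_differences(p1, n1, p2, n2, reps=4000, seed=1):
    """Simulate p1_hat - p2_hat for `reps` pairs of independent samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes, sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes, sample 2
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


# Hypothetical populations: p1 = 0.50, p2 = 0.40, with 200 sampled from each.
p1, n1, p2, n2 = 0.50, 200, 0.40, 200
diffs = simulate_differences(p1, n1, p2, n2)
theoretical_sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(f"simulated mean = {mean(diffs):.4f}   model mean = {p1 - p2:.4f}")
print(f"simulated SD   = {stdev(diffs):.4f}   model SD   = {theoretical_sd:.4f}")
```

With samples this large, the simulated values should land close to the model's mean and standard deviation, and a histogram of `diffs` should look roughly bell-shaped.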

How to Perform a Two-Proportion z-test?

How to Perform a Two-Proportion z-interval?
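For the two-proportion z-interval, a minimal Python sketch follows. It uses the unpooled standard error, because no assumption of equal proportions is made when estimating the difference. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) for (p1_hat - p2_hat) +/- z* x SE."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    q1_hat, q2_hat = 1 - p1_hat, 1 - p2_hat
    # Unpooled standard error: each sample keeps its own proportion.
    se = sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Hypothetical counts: 141 successes of 248 versus 129 successes of 184.
lower, upper = two_proportion_z_interval(141, 248, 129, 184, confidence=0.95)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

If the interval does not contain 0, the data are consistent with a real difference between the two population proportions at the chosen confidence level.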

What is the Margin of Error for the Estimate of $p_1 - p_2$?

What does it mean? The margin of error equals half the length of the confidence interval. It represents the precision with which the difference between the sample proportions estimates the difference between the population proportions at the specified confidence level.

How to Find the Sample Size for Estimating $p_1 - p_2$?

What Can Go Wrong?

Don't Misstate What the Interval Means:
- Don't suggest that the parameter varies.
- Don't claim that other samples will agree with yours.
- Don't be certain about the parameter.
- Don't forget: It's the parameter (not the statistic).
- Don't claim to know too much.
- Do take responsibility (for the uncertainty).
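Returning to the margin of error and the sample-size question above, the sketch below shows one common way to connect them. It assumes equal sample sizes in the two groups and solves the margin-of-error formula $E = z^{*}\sqrt{p_1 q_1 / n + p_2 q_2 / n}$ for n. When no prior guesses for the proportions are available, it uses 0.5 for both, which gives the largest (most conservative) sample size. The function names, defaults, and target margin are assumptions for illustration, not taken from the slides.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_star(confidence):
    """Critical value for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(p1_hat, n1, p2_hat, n2, confidence=0.95):
    """Half the length of the two-proportion z-interval."""
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return z_star(confidence) * se


def sample_size_per_group(margin, confidence=0.95, p1_guess=0.5, p2_guess=0.5):
    """Equal sample size n1 = n2 so the margin of error is at most `margin`.

    Assumption: both groups get the same n. Guesses of 0.5 are conservative.
    """
    spread = p1_guess * (1 - p1_guess) + p2_guess * (1 - p2_guess)
    n = spread * (z_star(confidence) / margin) ** 2
    return ceil(n)  # always round up to a whole number of subjects


# Hypothetical target: margin of error 0.03 at 95% confidence.
n = sample_size_per_group(0.03, confidence=0.95)
print(f"n per group = {n}")
print(f"resulting margin = {margin_of_error(0.5, n, 0.5, n):.4f}")
```

The same functions show the relationships in the learning objectives: for fixed samples, a higher confidence level gives a larger margin of error, and for a fixed confidence level, larger samples give a smaller one.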

What Can Go Wrong? (cont.)

Margin of Error Too Large to Be Useful:
- We can't be exact, but how precise do we need to be?
- One way to make the margin of error smaller is to reduce your level of confidence. (That may not be a useful solution.)
- You need to think about your margin of error when you design your study.
- To get a narrower interval without giving up confidence, you need to have less variability. You can do this with a larger sample.

What Can Go Wrong? (cont.)

Violations of Assumptions:
- Watch out for biased sampling.
- Think about independence.

What Can Go Wrong? (cont.)

- Don't base your null hypothesis on what you see in the data. Think about the situation you are investigating and develop your null hypothesis appropriately.
- Don't base your alternative hypothesis on the data, either. Again, you need to think about the situation.

What Can Go Wrong? (cont.)

- Don't use two-sample proportion methods when the samples aren't independent. These methods give wrong answers when the independence assumption is violated.
- Don't apply inference methods when there was no randomization. Our data must come from representative random samples or from a properly randomized experiment.
- Don't interpret a significant difference in proportions causally. Be careful not to jump to conclusions about causality.

What have we learned?

We have learned to:
1. Perform large-sample inferences (hypothesis tests and confidence intervals) to compare two population proportions.
2. Describe the relationship between the sample sizes, confidence level, and margin of error for a confidence interval for the difference between two population proportions.
3. Determine the sample size required for a specified confidence level and margin of error for the estimate of the difference between two population proportions.

Credit

Some of these slides have been adapted/modified in part/whole from the slides of the following textbooks:
- Weiss, Neil A., Introductory Statistics, 8th Edition
- Bock, David E., Stats: Data and Models, 3rd Edition