Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p) ( pˆ p) Note that N(0,1) and N(0,1) if n is large pq / n pq ˆˆ/ n ( q = 1- p, npˆ 10 and nqˆ 10) It follows that: Confidence interval for p: ( pˆ p) P z z 1 pq ˆˆ n α α α pq ˆˆ pˆ z p pˆ + z n α α pq ˆˆ n 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1

A Better Confidence Interval for Proportion Use this probability statement ( pˆ p) P z z 1 pq n α α α Solve for p using quadratic equation CI for p: 1/15/008 Unit 9 - Stat 571 - Ramón V. León 3 1/15/008 Unit 9 - Stat 571 - Ramón V. León 4

CI for Proportion in JMP Arbitrary choice of names = 0.45 x 800 = 0.55 x 800 Value column has two categories. Can you imagine a situation where one would have more categories in this column? 1/15/008 Unit 9 - Stat 571 - Ramón V. León 5 CI for Proportion in JMP 1/15/008 Unit 9 - Stat 571 - Ramón V. León 6 3

Sample Size Determination for a Confidence Interval for Proportion Want (1-α)-level two-sided CI: pq ˆˆ pˆ ± E where E is the margin of error. Then E = zα. n zα Solving for n gives n= pq ˆˆ (Formula 9.4) E 1 1 1 Largest value of pq = = so conservative sample size is: 4 zα 1 n = (Formula 9.5) E 4 1/15/008 Unit 9 - Stat 571 - Ramón V. León 7 Example 9.: Presidential Poll n 1.96 1 = = 9604 0.01 4 1067.11 9 = 9604 Threefold increase in precision requires ninefold increase in sample size 1/15/008 Unit 9 - Stat 571 - Ramón V. León 8 4

Largest Sample Hypothesis Test on Proportion H : p = p vs. H : p p 0 0 1 0 Best test statistics: z = ˆp p p q 0 0 0 n Dual relationship between CI and test of hypothesis holds if the better confidence interval is used. There is an exact test that can be used when the sample size is small (given in Section 9.1.3). We do not cover it. 1/15/008 Unit 9 - Stat 571 - Ramón V. León 9 Basketball Problem: z-test.18 1/15/008 Unit 9 - Stat 571 - Ramón V. León 10 5

Test for Proportion in JMP: Baseball Problem Value has two categories. 1/15/008 Unit 9 - Stat 571 - Ramón V. León 11 Test for Proportion in JMP: Basketball Problem 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 6

Sample Size for Z-Test of Proportion H : p p vs. H : p > p 1 0 o α 0 1 0 Suppose that the power for rejecting H 0 must be at least 1- β when the true proportion is p = p > p. Let δ = p p. Then 1 0 zα p0q0 + z pq β 1 1 n = ˆp p0 δ z = p0q0 n Replace z by z for two-sided test sample size. α Test based on: 1/15/008 Unit 9 - Stat 571 - Ramón V. León 13 H H Example 9.4: Pizza Testing 0 1 : Can't tell two pizzas Apart, : Can tell pizzas apart α =.10, We wants β =.5 when p = 0.5 ± 0.1 zα p0q0 + z pq β 1 1 n = δ ( )( ) + ( )( ) 1.645.5.5 0.675.6.4 = = 13.98 133 0.1 1/15/008 Unit 9 - Stat 571 - Ramón V. León 14 7

Multinomial Test of Proportions 1/15/008 Unit 9 - Stat 571 - Ramón V. León 15 Multinomial Test in JMP Note that one could have gotten confidence intervals Observed count does not exhibit significant deviation from the uniform model. 1/15/008 Unit 9 - Stat 571 - Ramón V. León 16 8

Comparing Two Proportion: Independent Sample Design If np 1 1, nq 1 1, np, nq 10, then pˆ1 pˆ ( p1 p) Z = N(0,1) pq ˆˆ ˆˆ 1 1 pq + n n Confidence Interval: 1 pˆˆ q pˆˆ q pq ˆˆ pˆˆ q pˆ pˆ z + p p pˆ pˆ + z + 1 1 1 1 1 α 1 1 α n1 n n1 n 1/15/008 Unit 9 - Stat 571 - Ramón V. León 17 Test for Equality of Proportions (Large n) Independent Sample Design H : p = p vs. H : p p 0 1 1 1 pˆ1 pˆ Test statitics: z = 1 1 pq ˆˆ + n 1 n npˆ + npˆ x+ y where pˆ 1 1 = = n1+ n n1+ n There is small sample test called Fisher s exact test. See JMP output latter. See Example 9.5 for application to the Salk Polio Vaccine Trial There a test for Matched Pair Design in Section 9... Please read Example 9.9 to see its application for testing the effectiveness of presidential debates. Do voters change their minds about candidates because of debates? 1/15/008 Unit 9 - Stat 571 - Ramón V. León 18 9

Example 9.6 Comparing Two Leukemia Therapies 1/15/008 Unit 9 - Stat 571 - Ramón V. León 19 Test for Equality of Proportions in JMP: Example 9.6 1/15/008 Unit 9 - Stat 571 - Ramón V. León 0 10

Example 9.6 JMP Output 1.00 0.75 Result 0.50 Success 0.5 0.00 Prednisone Prednisone + VCR Failure Drug Group 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Test for Equality of Proportions in JMP Tests Source Model Error C. Total N DF 1 61 6 63 -LogLike.600496 6.575470 9.175966 RSquare (U) 0.0891 Recall that the P- value of the twosided z-test was calculated to be 0.019 Test Likelihood Ratio Pearson Fisher's Exact Test Left Right -Tail ChiSquare 5.01 5.507 Prob 0.9958 0.054 0.033 Prob>ChiSq 0.06 0.0189 z = (-.347) Less significant result 1/15/008 Unit 9 - Stat 571 - Ramón V. León 11

Inferences for Two-Way Count Data Sampling Model 1: Multinomial Model (Total Sample Size Fixed) Sample of 901 from a single population that is then cross-classified The null hypothesis is that X and Y are independent: H : p = P( X = i, Y = j) = P( X = i) P( Y = j) = p p for all i, j 0 ij i.. j 1/15/008 Unit 9 - Stat 571 - Ramón V. León 3 Sampling Model 1 (Total Sample Size Fixed) 6 06 6 06 Estimated Expected Frequency = 901 = = 14.18 901 901 901 (Cell 1,1) = np p 1 1 1/15/008 Unit 9 - Stat 571 - Ramón V. León 4 1

Chi-Square Statistics χ c ( ni ei) = e i= 1 i 1/15/008 Unit 9 - Stat 571 - Ramón V. León 5 Chi-Square Test Critical Value The d.f. for this χ statistics is (4-1)(4-1) = 9. Since χ = 16.919 9,.05 the calculated χ = 11.989 is not sufficiently large to reject the hypothesis of independence at α =.05 level In general df = (r-1)(c-1) where c is the number of columns and r is the number of rows. 1/15/008 Unit 9 - Stat 571 - Ramón V. León 6 13

JMP Analysis 1/15/008 Unit 9 - Stat 571 - Ramón V. León 7 JMP Analysis Shows no significance 1/15/008 Unit 9 - Stat 571 - Ramón V. León 8 14

JMP Analysis Note that most of the contribution to the chi-square statistics comes from the corner cells. Lack of significance in the chi-square statistics is the result of the low contribution to the chisquare statistic coming from the center cells. 1/15/008 Unit 9 - Stat 571 - Ramón V. León 9 JMP Analysis Restricting our chi-square analysis to the corner cells shows a strong relationship between income and level of satisfaction. 1/15/008 Unit 9 - Stat 571 - Ramón V. León 30 15

Product Multinomial Model: Row Totals Fixed Notice that it does not make any sense to say that for the population 1 out of 80 returned wallets came from Big Cities. Homework Problem 9.8 1/15/008 Unit 9 - Stat 571 - Ramón V. León 31 Product Multinomial Model: Row Totals Fixed Sampling Model : Product Multinomial Total number of patients in each drug group is fixed. The null hypothesis is that the probability of column response (success or failure) is the same, regardless of the row population: H : ( ) 0 PY= j X= i = pj 1/15/008 Unit 9 - Stat 571 - Ramón V. León 3 16

Chi-Square Statistics χ c ( ni ei) = e i= 1 i 1/15/008 Unit 9 - Stat 571 - Ramón V. León 33 JMP Analysis 1/15/008 Unit 9 - Stat 571 - Ramón V. León 34 17

JMP Analysis Recall Slide 19: ( ) z = =.347 5.5084 1/15/008 Unit 9 - Stat 571 - Ramón V. León 35 Remarks About Chi-Square Test The distribution of the chi-square statistics under the null hypothesis is approximately chi-square only when the sample sizes are large The rule of thumb is that all expected cell counts should be greater than 1 and No more than 1/5 th of the expected cell counts should be less than 5. Combine sparse cell (having small expected cell counts) with adjacent cells. Unfortunately, this has the drawback of losing some information. 1/15/008 Unit 9 - Stat 571 - Ramón V. León 36 18

Odds Ratio as a Measure of Association for a x Table Sampling Model I: Multinomial p11 p1 ψ = p p 1 The numerator is the odds of the column 1 outcome vs. the column outcome for row 1, and the denominator is the same odds for row, hence the name odds ratio 1/15/008 Unit 9 - Stat 571 - Ramón V. León 37 Odds Ratio as a Measure of Association for a x Table Sampling Model II: Product Multinomial ψ = p p ( 1 p ) ( 1 p ) 1 1 The two column outcomes are labeled as success and failure, then ψ is the odds of success for the row 1 population vs. the odds of success for the row population 1/15/008 Unit 9 - Stat 571 - Ramón V. León 38 19

Odds Ratio as a Measure of Association for a x Table n 11 n 1 14 38 n11 + n1 n1 + n 1 4 14 38 ψˆ = = = n n 7 4 7 4 1 n 1 4 11 + n 1 n1 + n n11 n1 n11n 14 4 ψˆ = = = = 0.105 n n n n 38 7 1 1 1 Confidence Inteval: [0.053, 0.831] 1/15/008 Unit 9 - Stat 571 - Ramón V. León 39 Case-Control Studies: The Odds Ratio Approximates the Relative Risk if the Disease is Rare 1/15/008 Unit 9 - Stat 571 - Ramón V. León 40 0

JMP Output for Case-Control Study 1/15/008 Unit 9 - Stat 571 - Ramón V. León 41 How to Do It in JMP 1/15/008 Unit 9 - Stat 571 - Ramón V. León 4 1

1/15/008 Unit 9 - Stat 571 - Ramón V. León 43 1/15/008 Unit 9 - Stat 571 - Ramón V. León 44

1/15/008 Unit 9 - Stat 571 - Ramón V. León 45 Do You Need to Know More? 578 Categorical Data Analysis (3) Log-linear analysis of multidimensional contingency tables. Logistic regression. Theory, applications, and use of statistical software. Prereq: 1 yr graduate-level statistics, regression analysis and analysis of variance, or consent of instructor. Sp Reference: An Introduction to Categorical Data Analysis by Alan Agresti. Wiley. 1/15/008 Unit 9 - Stat 571 - Ramón V. León 46 3