Unit 9: Inferences for Proportions and Count Data

Size: px

Start display at page:

Download "Unit 9: Inferences for Proportions and Count Data"

Polly Mosley
6 years ago
Views:

1 Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat Ramón V. León 1

2 Large Sample Confidence Interval for Proportion ( pˆ p) ( pˆ p) Note that N(0,1) and N(0,1) if n is large pq / n pq ˆˆ/ n ( q = 1- p, npˆ 10 and nqˆ 10) It follows that: Confidence interval for p: ( pˆ p) P z z 1 pq ˆˆ n α 2 α 2 α pq ˆˆ pˆ z p pˆ + z n α 2 α 2 pq ˆˆ n 12/15/2008 Unit 9 - Stat Ramón V. León 2

3 A Better Confidence Interval for Proportion Use this probability statement ( pˆ p) P z z 1 pq n α 2 α 2 α Solve for p using quadratic equation CI for p: 12/15/2008 Unit 9 - Stat Ramón V. León 3

4 12/15/2008 Unit 9 - Stat Ramón V. León 4

Can you imagine a situation where one would have more

5 CI for Proportion in JMP Arbitrary choice of names = 0.45 x 800 = 0.55 x 800 Value column has two categories. Can you imagine a situation where one would have more categories in this column? 12/15/2008 Unit 9 - Stat Ramón V. León 5

6 CI for Proportion in JMP 12/15/2008 Unit 9 - Stat Ramón V. León 6

7 Sample Size Determination for a Confidence Interval for Proportion Want (1-α)-level two-sided CI: pq ˆˆ pˆ ± E where E is the margin of error. Then E = zα 2. n zα 2 Solving for n gives n= pq ˆˆ (Formula 9.4) E Largest value of pq = = so conservative sample size is: zα 2 n = E (Formula 9.5) 12/15/2008 Unit 9 - Stat Ramón V. León 7

8 Example 9.2: Presidential Poll n = = = 9604 Threefold increase in precision requires ninefold increase in sample size 12/15/2008 Unit 9 - Stat Ramón V. León 8

9 Largest Sample Hypothesis Test on Proportion H : p = p vs. H : p p Best test statistics: z = ˆp pq 0 0 p 0 n Dual relationship between CI and test of hypothesis holds if the better confidence interval is used. There is an exact test that can be used when the sample size is small (given in Section 9.1.3). We do not cover it. 12/15/2008 Unit 9 - Stat Ramón V. León 9

10 Basketball Problem: z-test /15/2008 Unit 9 - Stat Ramón V. León 10

11 Test for Proportion in JMP: Baseball Problem Value has two categories. 12/15/2008 Unit 9 - Stat Ramón V. León 11

12 Test for Proportion in JMP: Basketball Problem 12/15/2008 Unit 9 - Stat Ramón V. León 12

13 Sample Size for Z-Test of Proportion 1 0 H : p p vs. H : p > p o α Suppose that the power for rejecting H must be at least 1- β when the true proportion is p= p > p. Let δ = p p. Then zα p0q0 + z pq β 1 1 n = ˆp p0 δ z = pq 0 0 n Replace z by z for two-sided test sample size. α Test based on: 12/15/2008 Unit 9 - Stat Ramón V. León 13

14 Example 9.4: Pizza Testing H H 0 1 : Can't tell two pizzas Apart, : Can tell pizzas apart α =.10, We wants β =.25 when p = 0.5 ± 0.1 n zα 2 p0q0 + zβ pq 1 1 = δ ( )( ) + ( )( ) = = /15/2008 Unit 9 - Stat Ramón V. León 14

15 Multinomial Test of Proportions 12/15/2008 Unit 9 - Stat Ramón V. León 15

16 Multinomial Test in JMP Note that one could have gotten confidence intervals Observed count does not exhibit significant deviation from the uniform model. 12/15/2008 Unit 9 - Stat Ramón V. León 16

17 Comparing Two Proportion: Independent Sample Design If np, nq, np, nq 10, then Z Confidence Interval: pˆ pˆ ( p p ) = pq ˆˆ n + pq ˆˆ n N(0,1) pq ˆˆ pq ˆˆ pq ˆˆ pq ˆˆ pˆ pˆ z + p p pˆ pˆ + z α α 2 n1 n2 n1 n2 12/15/2008 Unit 9 - Stat Ramón V. León 17

18 Test for Equality of Proportions (Large n) Independent Sample Design H : p = p vs. H : p p pˆ1 pˆ2 Test statitics: z = 1 1 pq ˆˆ + n n npˆ ˆ 1 1+ np 2 2 x+ y where pˆ = = n + n n + n There is small sample test called Fisher s exact test. See JMP output latter. See Example 9.5 for application to the Salk Polio Vaccine Trial There a test for Matched Pair Design in Section Please read Example 9.9 to see its application for testing the effectiveness of presidential debates. Do voters change their minds about candidates because of debates? 12/15/2008 Unit 9 - Stat Ramón V. León 18

19 Example 9.6 Comparing Two Leukemia Therapies 12/15/2008 Unit 9 - Stat Ramón V. León 19

20 Test for Equality of Proportions in JMP: Example /15/2008 Unit 9 - Stat Ramón V. León 20

21 Example 9.6 JMP Output Result 0.50 Success Prednisone Prednisone + VCR Failure Drug Group 12/15/2008 Unit 9 - Stat Ramón V. León 21

22 Test for Equality of Proportions in JMP Tests Source Model Error C. Total N DF LogLike RSquare (U) Recall that the P- value of the twosided z-test was calculated to be Test Likelihood Ratio Pearson Fisher's Exact Test Left Right 2-Tail ChiSquare Prob Prob>ChiSq z 2 = (-2.347) 2 Less significant result 12/15/2008 Unit 9 - Stat Ramón V. León 22

23 Inferences for Two-Way Count Data Sampling Model 1: Multinomial Model (Total Sample Size Fixed) Sample of 901 from a single population that is then cross-classified The null hypothesis is that X and Y are independent: H : p = P( X = i, Y = j) = P( X = i) P( Y = j) = p p for all i, j 0 ij i.. j 12/15/2008 Unit 9 - Stat Ramón V. León 23

24 Sampling Model 1 (Total Sample Size Fixed) Estimated Expected Frequency = 901 = = (Cell 1,1) = np p /15/2008 Unit 9 - Stat Ramón V. León 24

25 Chi-Square Statistics χ 2 c = i= 1 ( n e) i e i i 2 12/15/2008 Unit 9 - Stat Ramón V. León 25

26 Chi-Square Test Critical Value 2 The d.f. for this χ statistics is (4-1)(4-1) = 9. Since χ = the calculated χ 2 9,.05 = is not sufficiently large to reject the hypothesis of independence at α =.05 level In general df = (r-1)(c-1) where c is the number of columns and r is the number of rows. 12/15/2008 Unit 9 - Stat Ramón V. León 26

27 JMP Analysis 12/15/2008 Unit 9 - Stat Ramón V. León 27

28 JMP Analysis Shows no significance 12/15/2008 Unit 9 - Stat Ramón V. León 28

29 JMP Analysis Note that most of the contribution to the chi-square statistics comes from the corner cells. Lack of significance in the chi-square statistics is the result of the low contribution to the chisquare statistic coming from the center cells. 12/15/2008 Unit 9 - Stat Ramón V. León 29

30 JMP Analysis Restricting our chi-square analysis to the corner cells shows a strong relationship between income and level of satisfaction. 12/15/2008 Unit 9 - Stat Ramón V. León 30

31 Product Multinomial Model: Row Totals Fixed Notice that it does not make any sense to say that for the population 21 out of 80 returned wallets came from Big Cities. Homework Problem /15/2008 Unit 9 - Stat Ramón V. León 31

32 Product Multinomial Model: Row Totals Fixed Sampling Model 2: Product Multinomial Total number of patients in each drug group is fixed. The null hypothesis is that the probability of column response (success or failure) is the same, regardless of the row population: H0 : P ( Y = j X = i ) = pj 12/15/2008 Unit 9 - Stat Ramón V. León 32

33 Chi-Square Statistics χ 2 c = i= 1 ( n e) i e i i 2 12/15/2008 Unit 9 - Stat Ramón V. León 33

34 JMP Analysis 12/15/2008 Unit 9 - Stat Ramón V. León 34

35 JMP Analysis Recall Slide 19: ( ) 2 2 z = = /15/2008 Unit 9 - Stat Ramón V. León 35

36 Remarks About Chi-Square Test The distribution of the chi-square statistics under the null hypothesis is approximately chi-square only when the sample sizes are large The rule of thumb is that all expected cell counts should be greater than 1 and No more than 1/5 th of the expected cell counts should be less than 5. Combine sparse cell (having small expected cell counts) with adjacent cells. Unfortunately, this has the drawback of losing some information. 12/15/2008 Unit 9 - Stat Ramón V. León 36

37 Odds Ratio as a Measure of Association for a 2x2 Table Sampling Model I: Multinomial ψ = p p p p The numerator is the odds of the column 1 outcome vs. the column 2 outcome for row 1, and the denominator is the same odds for row 2, hence the name odds ratio 12/15/2008 Unit 9 - Stat Ramón V. León 37

38 Odds Ratio as a Measure of Association for a 2x2 Table Sampling Model II: Product Multinomial ψ = p p 1 1 p p ( ) 1 1 ( ) 2 2 The two column outcomes are labeled as success and failure, then ψ is the odds of success for the row 1 population vs. the odds of success for the row 2 population 12/15/2008 Unit 9 - Stat Ramón V. León 38

Odds Ratio as a Measure of Association for a 2x2 Table n 11 n 21 14 38 n11 + n12 n21 + n 22 21 42 14 38 ψˆ = = = n 7 4 7 4 12 n 22 n 21 42 11 n 12 n21 n

39 Odds Ratio as a Measure of Association for a 2x2 Table n 11 n n11 + n12 n21 + n ψˆ = = = n n 22 n n 12 n21 n n11 n12 n11n ψˆ = = = = n n n n Confidence Inteval: [0.053, 0.831] 12/15/2008 Unit 9 - Stat Ramón V. León 39

40 Case-Control Studies: The Odds Ratio Approximates the Relative Risk if the Disease is Rare 12/15/2008 Unit 9 - Stat Ramón V. León 40

41 JMP Output for Case-Control Study 12/15/2008 Unit 9 - Stat Ramón V. León 41

42 How to Do It in JMP 12/15/2008 Unit 9 - Stat Ramón V. León 42

43 12/15/2008 Unit 9 - Stat Ramón V. León 43

44 12/15/2008 Unit 9 - Stat Ramón V. León 44

45 12/15/2008 Unit 9 - Stat Ramón V. León 45

46 Do You Need to Know More? 578 Categorical Data Analysis (3) Log-linear analysis of multidimensional contingency tables. Logistic regression. Theory, applications, and use of statistical software. Prereq: 1 yr graduate-level statistics, regression analysis and analysis of variance, or consent of instructor. Sp Reference: An Introduction to Categorical Data Analysis by Alan Agresti. Wiley. 12/15/2008 Unit 9 - Stat Ramón V. León 46

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)