Unit 9: Inferences for Proportions and Count Data


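Slide 1: Unit 9: Inferences for Proportions and Count Data
Statistics 571: Statistical Methods
Ramón V. León
1/15/2008 Unit 9 - Stat 571 - Ramón V. León

Slide 2: Large Sample Confidence Interval for Proportion
Note that
$$\frac{\hat{p} - p}{\sqrt{pq/n}} \approx N(0,1) \quad\text{and}\quad \frac{\hat{p} - p}{\sqrt{\hat{p}\hat{q}/n}} \approx N(0,1)$$
if n is large ($q = 1 - p$, $n\hat{p} \ge 10$ and $n\hat{q} \ge 10$). It follows that
$$P\left(-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{\hat{p}\hat{q}/n}} \le z_{\alpha/2}\right) \approx 1 - \alpha.$$
Confidence interval for p:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$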
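The large-sample interval can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `wald_ci` is hypothetical, and the counts (360 successes in 800 trials, i.e. 0.45 x 800) are used purely as an example.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(x, n, conf=0.95):
    """Large-sample CI: p_hat +/- z_{alpha/2} * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    q_hat = 1.0 - p_hat
    # Rule of thumb from the slide: n*p_hat >= 10 and n*q_hat >= 10.
    if n * p_hat < 10 or n * q_hat < 10:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)  # z_{alpha/2}
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width


# Illustrative counts (assumption): 360 successes out of 800 trials.
lo, hi = wald_ci(360, 800)
print(f"95% CI for p: ({lo:.4f}, {hi:.4f})")
```

The explicit sample-size check mirrors the slide's condition; a caller who wants the interval regardless would have to relax it.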

Slide 3: A Better Confidence Interval for Proportion
Use this probability statement:
$$P\left(-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{pq/n}} \le z_{\alpha/2}\right) \approx 1 - \alpha.$$
Solve for p using the quadratic equation. CI for p: the two roots of the resulting quadratic in p are the lower and upper confidence limits.
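As a sketch of the "solve for p" step: squaring the boundary condition $|\hat{p} - p| = z_{\alpha/2}\sqrt{p(1-p)/n}$ gives the quadratic $(1 + z^2/n)\,p^2 - (2\hat{p} + z^2/n)\,p + \hat{p}^2 = 0$, whose roots are the limits (this is commonly called the score or Wilson interval). The function name `score_ci` and the example counts are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def score_ci(x, n, conf=0.95):
    """CI limits as the two roots of (p_hat - p)^2 = z^2 * p * (1 - p) / n."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)  # z_{alpha/2}
    # Quadratic a*p^2 + b*p + c = 0 obtained by expanding the squared equation.
    a = 1.0 + z * z / n
    b = -(2.0 * p_hat + z * z / n)
    c = p_hat * p_hat
    disc = sqrt(b * b - 4.0 * a * c)
    return (-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)


# Illustrative counts (assumption): 360 successes out of 800 trials.
lower, upper = score_ci(360, 800)
print(f"95% score CI for p: ({lower:.4f}, {upper:.4f})")
```

Unlike the large-sample interval, the standard error here is evaluated at the unknown p rather than at the estimate, which is why a quadratic has to be solved.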

Slide 5: CI for Proportion in JMP
[JMP data table screenshot] Annotations: the category names are an arbitrary choice; the two counts are 0.45 x 800 and 0.55 x 800. The Value column has two categories. Can you imagine a situation where one would have more categories in this column?

Slide 6: CI for Proportion in JMP
[JMP output screenshot]

Slide 7: Sample Size Determination for a Confidence Interval for Proportion
Want a (1-α)-level two-sided CI $\hat{p} \pm E$, where E is the margin of error. Then
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}.$$
Solving for n gives
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}\hat{q} \quad \text{(Formula 9.4)}$$
The largest value of pq is $\frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$, so the conservative sample size is
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \frac{1}{4} \quad \text{(Formula 9.5)}$$

Slide 8: Example 9.2: Presidential Poll
$$n = \left(\frac{1.96}{0.01}\right)^2 \frac{1}{4} = 9604$$
$$1067.11 \times 9 = 9604$$
A threefold increase in precision requires a ninefold increase in sample size.
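The arithmetic behind Formulas 9.4 and 9.5 can be checked with a short sketch. The function name `sample_size_for_margin` is hypothetical; z = 1.96 is the rounded 95% value used on the slide, and the margin of 0.03 is inferred from the threefold-precision remark.

```python
def sample_size_for_margin(E, z=1.96, p_guess=0.5):
    """n = (z / E)^2 * p * q (Formula 9.4).

    With p_guess = 0.5 the product p*q takes its maximum 1/4, which gives
    the conservative sample size of Formula 9.5.
    """
    return (z / E) ** 2 * p_guess * (1.0 - p_guess)


n_1pct = sample_size_for_margin(0.01)  # margin of error of 1 percentage point
n_3pct = sample_size_for_margin(0.03)  # margin of error of 3 percentage points

print(f"E = 0.01: n = {n_1pct:.2f}")
print(f"E = 0.03: n = {n_3pct:.2f}")
print(f"ratio    : {n_1pct / n_3pct:.2f}")
```

The function returns the raw value; in practice it would be rounded up to the next whole observation.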

Slide 9: Large Sample Hypothesis Test on Proportion
$$H_0: p = p_0 \quad \text{vs.} \quad H_1: p \ne p_0$$
Best test statistic:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$$
The dual relationship between the CI and the test of hypothesis holds if the better confidence interval is used. There is an exact test that can be used when the sample size is small (given in Section 9.1.3). We do not cover it.

Slide 10: Basketball Problem: z-test
[Worked z-test computation; only the fragment ".18" survives in the transcript]
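A minimal sketch of the test statistic above follows. The data for the basketball problem are not recoverable from the transcript, so the counts below (60 successes in 100 trials against $p_0 = 0.5$) are invented for illustration only, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_prop_ztest(x, n, p0):
    """Return (z, two-sided P-value) for H0: p = p0 vs. H1: p != p0."""
    p_hat = x / n
    # The standard error uses the null value p0, not the estimate p_hat.
    z = (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented data (assumption): 60 successes in 100 trials, null value 0.5.
z_stat, p_val = one_sample_prop_ztest(60, 100, 0.5)
print(f"z = {z_stat:.3f}, two-sided P-value = {p_val:.4f}")
```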

Slide 11: Test for Proportion in JMP: Basketball Problem
[JMP data table screenshot] The Value column has two categories.

Slide 12: Test for Proportion in JMP: Basketball Problem
[JMP output screenshot]

Slide 13: Sample Size for Z-Test of Proportion
$$H_0: p \le p_0 \quad \text{vs.} \quad H_1: p > p_0$$
Suppose that the power for rejecting $H_0$ must be at least $1 - \beta$ when the true proportion is $p = p_1 > p_0$. Let $\delta = p_1 - p_0$. Then
$$n = \left[\frac{z_{\alpha}\sqrt{p_0 q_0} + z_{\beta}\sqrt{p_1 q_1}}{\delta}\right]^2$$
Test based on:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$$
Replace $z_{\alpha}$ by $z_{\alpha/2}$ for the two-sided test sample size.

Slide 14: Example 9.4: Pizza Testing
$H_0$: Can't tell the two pizzas apart. $H_1$: Can tell the pizzas apart.
α = .10. We want β = .25 when p = 0.5 ± 0.1.
$$n = \left[\frac{z_{\alpha/2}\sqrt{p_0 q_0} + z_{\beta}\sqrt{p_1 q_1}}{\delta}\right]^2 = \left[\frac{(1.645)\sqrt{(.5)(.5)} + (0.675)\sqrt{(.6)(.4)}}{0.1}\right]^2 = 132.98 \approx 133$$
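The pizza-testing calculation can be reproduced with a short sketch. The function name is hypothetical; the critical values 1.645 and 0.675 are the rounded values printed on the slide, passed in explicitly rather than recomputed.

```python
from math import ceil, sqrt


def sample_size_for_power(p0, p1, z_alpha, z_beta):
    """n = [(z_alpha*sqrt(p0*q0) + z_beta*sqrt(p1*q1)) / delta]^2, delta = p1 - p0.

    For a two-sided test, pass z_{alpha/2} as z_alpha.
    """
    delta = p1 - p0
    numerator = z_alpha * sqrt(p0 * (1.0 - p0)) + z_beta * sqrt(p1 * (1.0 - p1))
    return (numerator / delta) ** 2


# Example 9.4: two-sided test with alpha = .10, so z_{alpha/2} = 1.645.
n_pizza = sample_size_for_power(0.5, 0.6, z_alpha=1.645, z_beta=0.675)
print(f"n = {n_pizza:.2f}, rounded up to {ceil(n_pizza)}")
```

By symmetry of pq around 0.5, the alternative p = 0.4 gives the same sample size as p = 0.6.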

Slide 15: Multinomial Test of Proportions

Slide 16: Multinomial Test in JMP
[JMP output screenshot] Note that one could have gotten confidence intervals. The observed counts do not exhibit a significant deviation from the uniform model.

Slide 17: Comparing Two Proportions: Independent Sample Design
If $n_1 p_1, n_1 q_1, n_2 p_2, n_2 q_2 \ge 10$, then
$$Z = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}} \approx N(0,1)$$
Confidence interval:
$$\hat{p}_1 - \hat{p}_2 - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \le p_1 - p_2 \le \hat{p}_1 - \hat{p}_2 + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

Slide 18: Test for Equality of Proportions (Large n), Independent Sample Design
$$H_0: p_1 = p_2 \quad \text{vs.} \quad H_1: p_1 \ne p_2$$
Test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{where} \quad \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{x + y}{n_1 + n_2}$$
There is a small-sample test called Fisher's exact test. See the JMP output later.
See Example 9.5 for an application to the Salk Polio Vaccine Trial.
There is a test for the Matched Pair Design in Section 9.2.2. Please read Example 9.9 to see its application for testing the effectiveness of presidential debates. Do voters change their minds about candidates because of debates?
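As an illustration of the pooled test statistic and the unpooled confidence interval, the sketch below uses the Example 9.6 leukemia counts as they appear on the odds-ratio slide (14 successes out of 21 on Prednisone, 38 out of 42 on Prednisone + VCR). The function names are hypothetical. With only 7 and 4 failures the normal approximation is rough, so the numbers are best read as a check on the formulas.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ztest(x, n1, y, n2):
    """Pooled z-test of H0: p1 = p2; returns (z, two-sided P-value)."""
    p1_hat, p2_hat = x / n1, y / n2
    p_pool = (x + y) / (n1 + n2)
    se = sqrt(p_pool * (1.0 - p_pool) * (1.0 / n1 + 1.0 / n2))
    z = (p1_hat - p2_hat) / se
    return z, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def two_prop_ci(x, n1, y, n2, conf=0.95):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x / n1, y / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


z_stat, p_val = two_prop_ztest(14, 21, 38, 42)
ci_lo, ci_hi = two_prop_ci(14, 21, 38, 42)
print(f"z = {z_stat:.3f}, two-sided P-value = {p_val:.4f}")
print(f"95% CI for p1 - p2: ({ci_lo:.4f}, {ci_hi:.4f})")
```

The test pools the two samples because the null hypothesis says they share one p; the interval does not, because it makes no such assumption.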

Slide 19: Example 9.6: Comparing Two Leukemia Therapies

Slide 20: Test for Equality of Proportions in JMP: Example 9.6
[JMP screenshot]

Slide 21: Example 9.6 JMP Output
[Mosaic plot: proportion of Result (Success, Failure) on a 0.00 to 1.00 scale, by Drug Group (Prednisone, Prednisone + VCR)]

Slide 22: Test for Equality of Proportions in JMP
Tests
Source      DF    -LogLike     RSquare (U)
Model        1     2.600496    0.0891
Error       61    26.575470
C. Total    62    29.175966
N           63

Test                ChiSquare    Prob>ChiSq
Likelihood Ratio    5.201        0.0226
Pearson             5.507        0.0189

Fisher's Exact Test    Prob
Left                   0.9958
Right                  0.0254
2-Tail                 0.0323

Annotations: Recall that the P-value of the two-sided z-test was calculated to be 0.019. The Pearson chi-square equals $z^2 = (-2.347)^2$. The 2-Tail Fisher's exact test gives a less significant result.
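Fisher's exact test conditions on the table margins, so the count in one cell follows a hypergeometric distribution. The sketch below computes the three tail probabilities for the Example 9.6 counts with the standard library only. Two choices here are assumptions: putting Failure in the first column so that the tails line up with the Left/Right labels in the JMP output, and defining the two-tail probability as the sum over all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Exact tail probabilities for the 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, the (1,1) count X is hypergeometric.
    Returns (left, right, two_tail) = (P(X <= a), P(X >= a), total
    probability of all tables no more likely than the observed one).
    """
    r1, c1, n = a + b, a + c, a + b + c + d
    k_min, k_max = max(0, r1 - (n - c1)), min(r1, c1)
    total = comb(n, r1)
    pmf = {k: comb(c1, k) * comb(n - c1, r1 - k) / total
           for k in range(k_min, k_max + 1)}
    left = sum(p for k, p in pmf.items() if k <= a)
    right = sum(p for k, p in pmf.items() if k >= a)
    cutoff = pmf[a] * (1.0 + 1e-9)  # small tolerance for floating-point ties
    two_tail = sum(p for p in pmf.values() if p <= cutoff)
    return left, right, two_tail


# Rows: Prednisone, Prednisone + VCR. Columns: Failure, Success.
left, right, two_tail = fisher_exact_2x2(7, 14, 4, 38)
print(f"Left = {left:.4f}, Right = {right:.4f}, 2-Tail = {two_tail:.4f}")
```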

Slide 23: Inferences for Two-Way Count Data
Sampling Model 1: Multinomial Model (Total Sample Size Fixed)
Sample of 901 from a single population that is then cross-classified.
The null hypothesis is that X and Y are independent:
$$H_0: p_{ij} = P(X = i, Y = j) = P(X = i)\,P(Y = j) = p_{i\cdot}\,p_{\cdot j} \quad \text{for all } i, j$$

Slide 24: Sampling Model 1 (Total Sample Size Fixed)
Estimated expected frequency (Cell 1,1):
$$n\,\hat{p}_{1\cdot}\,\hat{p}_{\cdot 1} = 901 \cdot \frac{62}{901} \cdot \frac{206}{901} = \frac{62 \times 206}{901} = 14.18$$
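Under independence the estimated expected frequency of a cell is (row total x column total) / n. The sketch below checks the cell (1,1) value; the margins 62 and 206 are the values consistent with the slide's result of 14.18 for n = 901, and the function name is hypothetical.

```python
def expected_count(row_total, col_total, n):
    """Estimated expected frequency n * p_hat_row * p_hat_col under independence."""
    p_row = row_total / n   # estimate of P(X = i)
    p_col = col_total / n   # estimate of P(Y = j)
    return n * p_row * p_col


# Cell (1,1): row total 62, column total 206, overall sample size 901.
e11 = expected_count(62, 206, 901)
print(f"expected count for cell (1,1): {e11:.2f}")
```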

Slide 25: Chi-Square Statistic
$$\chi^2 = \sum_{i=1}^{c} \frac{(n_i - e_i)^2}{e_i}$$
(in this sum, i runs over the cells of the table).

Slide 26: Chi-Square Test Critical Value
The d.f. for this $\chi^2$ statistic is (4-1)(4-1) = 9. Since $\chi^2_{9,.05} = 16.919$, the calculated $\chi^2 = 11.989$ is not sufficiently large to reject the hypothesis of independence at the α = .05 level.
In general, df = (r-1)(c-1), where c is the number of columns and r is the number of rows.
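A sketch of the Pearson chi-square computation for a general r x c table follows. The 4 x 4 income table itself is not in the transcript, so the function is exercised on the 2 x 2 leukemia table from Example 9.6, and the income example is represented only by the two numbers quoted on the slide. Function and variable names are hypothetical.

```python
def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Example 9.6 counts. Rows: Prednisone, Prednisone + VCR. Columns: Success, Failure.
chi2_stat, df = pearson_chi_square([[14, 7], [38, 4]])
print(f"chi-square = {chi2_stat:.3f} on {df} d.f.")

# Income example: compare the quoted statistic with the quoted critical value.
reject_independence = 11.989 > 16.919
print("reject independence at alpha = .05:", reject_independence)
```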

Slide 27: JMP Analysis
[JMP screenshot]

Slide 28: JMP Analysis
[JMP output screenshot] Shows no significance.

Slide 29: JMP Analysis
Note that most of the contribution to the chi-square statistic comes from the corner cells. The lack of significance in the chi-square statistic is the result of the low contribution coming from the center cells.

Slide 30: JMP Analysis
Restricting our chi-square analysis to the corner cells shows a strong relationship between income and level of satisfaction.

Slide 31: Product Multinomial Model: Row Totals Fixed
Notice that it does not make any sense to say that for the population 1 out of 80 returned wallets came from Big Cities.
Homework Problem 9.8

Slide 32: Product Multinomial Model: Row Totals Fixed
Sampling Model 2: Product Multinomial
The total number of patients in each drug group is fixed.
The null hypothesis is that the probability of a column response (success or failure) is the same, regardless of the row population:
$$H_0: P(Y = j \mid X = i) = p_j$$

Slide 33: Chi-Square Statistic
$$\chi^2 = \sum_{i=1}^{c} \frac{(n_i - e_i)^2}{e_i}$$

Slide 34: JMP Analysis
[JMP screenshot]

Slide 35: JMP Analysis
Recall Slide 19:
$$z^2 = (-2.347)^2 = 5.5084$$

Slide 36: Remarks About Chi-Square Test
The distribution of the chi-square statistic under the null hypothesis is approximately chi-square only when the sample sizes are large.
The rule of thumb is that all expected cell counts should be greater than 1 and no more than 1/5th of the expected cell counts should be less than 5.
Combine sparse cells (those having small expected cell counts) with adjacent cells. Unfortunately, this has the drawback of losing some information.
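The rule of thumb can be written as a small check on the expected counts. This sketch is an illustration only: the function names are hypothetical, and the rule is implemented literally as stated (every expected count above 1, at most one fifth of them below 5). It is applied to the Example 9.6 leukemia table, whose expected counts are computed from the margins.

```python
def expected_counts(table):
    """Expected counts (row total * column total / n) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_rule_ok(expected):
    """True if all expected counts exceed 1 and at most 1/5 of them are below 5."""
    cells = [e for row in expected for e in row]
    all_above_one = all(e > 1 for e in cells)
    share_below_five = sum(e < 5 for e in cells) / len(cells)
    return all_above_one and share_below_five <= 0.2


# Example 9.6 counts. Rows: Prednisone, Prednisone + VCR. Columns: Success, Failure.
leukemia_expected = expected_counts([[14, 7], [38, 4]])
leukemia_ok = chi_square_rule_ok(leukemia_expected)
print([[round(e, 2) for e in row] for row in leukemia_expected])
print("rule of thumb satisfied:", leukemia_ok)
```

For this table one of the four expected counts falls below 5, so the literal rule is not met, which fits with the slides also reporting Fisher's exact test for the same data.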

Slide 37: Odds Ratio as a Measure of Association for a 2 x 2 Table
Sampling Model I: Multinomial
$$\psi = \frac{p_{11}/p_{12}}{p_{21}/p_{22}}$$
The numerator is the odds of the column 1 outcome vs. the column 2 outcome for row 1, and the denominator is the same odds for row 2, hence the name odds ratio.

Slide 38: Odds Ratio as a Measure of Association for a 2 x 2 Table
Sampling Model II: Product Multinomial
$$\psi = \frac{p_1/(1 - p_1)}{p_2/(1 - p_2)}$$
If the two column outcomes are labeled as success and failure, then ψ is the odds of success for the row 1 population vs. the odds of success for the row 2 population.

Slide 39: Odds Ratio as a Measure of Association for a 2 x 2 Table
$$\hat{\psi} = \frac{\dfrac{n_{11}}{n_{11}+n_{12}} \Big/ \dfrac{n_{12}}{n_{11}+n_{12}}}{\dfrac{n_{21}}{n_{21}+n_{22}} \Big/ \dfrac{n_{22}}{n_{21}+n_{22}}} = \frac{\dfrac{14}{21} \Big/ \dfrac{7}{21}}{\dfrac{38}{42} \Big/ \dfrac{4}{42}}$$
$$\hat{\psi} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11} n_{22}}{n_{12} n_{21}} = \frac{14 \times 4}{7 \times 38} = 0.2105$$
Confidence interval: [0.053, 0.831]

Slide 40: Case-Control Studies: The Odds Ratio Approximates the Relative Risk if the Disease is Rare
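The sample odds ratio for the leukemia table can be checked with a short sketch. The slide does not say how its confidence interval was obtained; the code assumes the common large-sample interval on the log scale, with standard error sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22), at the 95% level. Under that assumption the result rounds to the slide's [0.053, 0.831]. The function name is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(n11, n12, n21, n22, conf=0.95):
    """Sample odds ratio with a large-sample CI computed on the log scale."""
    psi_hat = (n11 * n22) / (n12 * n21)
    # Assumed method: normal approximation for log(psi_hat).
    se_log = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    lower = exp(log(psi_hat) - z * se_log)
    upper = exp(log(psi_hat) + z * se_log)
    return psi_hat, lower, upper


# Rows: Prednisone, Prednisone + VCR. Columns: Success, Failure.
psi_hat, psi_lo, psi_hi = odds_ratio_ci(14, 7, 38, 4)
print(f"odds ratio = {psi_hat:.4f}, 95% CI = [{psi_lo:.3f}, {psi_hi:.3f}]")
```

Because the interval lies entirely below 1, it agrees with the two-sided tests in rejecting equal success odds at the 5% level.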

Slide 41: JMP Output for Case-Control Study
[JMP output screenshot]

Slide 42: How to Do It in JMP
[JMP screenshot]


Slide 46: Do You Need to Know More?
578 Categorical Data Analysis (3): Log-linear analysis of multidimensional contingency tables. Logistic regression. Theory, applications, and use of statistical software. Prereq: 1 yr graduate-level statistics, regression analysis and analysis of variance, or consent of instructor. Sp
Reference: An Introduction to Categorical Data Analysis by Alan Agresti. Wiley.