ST3241 Categorical Data Analysis I: Two-way Contingency Tables. 2 × 2 Tables, Relative Risks and Odds Ratios


What Is A Contingency Table (p.16)
Suppose X and Y are two categorical variables, where X has I categories and Y has J categories. Display the IJ possible combinations of outcomes in a rectangular table having I rows for the categories of X and J columns for the categories of Y. A table of this form in which the cells contain frequency counts of outcomes is called a contingency table.

Example: Belief In Afterlife Data (p.18)

                  Belief in Afterlife
Gender      Yes     No or Undecided
Female      435     147
Male        375     134

A contingency table that cross-classifies two variables is called a two-way table; a table that cross-classifies three variables is called a three-way table. A two-way table having I rows and J columns is called an I × J table.

Some Notations, Definitions
π_ij = P[X = i, Y = j] = probability that (X, Y) falls in the cell in row i and column j. The probabilities {π_ij} form the joint distribution of X and Y. Note that

    Σ_{i=1}^{I} Σ_{j=1}^{J} π_ij = 1

Marginal Distributions (p.17)
The marginal distribution of X is {π_i+}, obtained from the row sums:

    π_i+ = Σ_{j=1}^{J} π_ij

The marginal distribution of Y is {π_+j}, obtained from the column sums:

    π_+j = Σ_{i=1}^{I} π_ij

For example, for a 2 × 2 table, π_1+ = π_11 + π_12 and π_+1 = π_11 + π_21.
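As a quick numerical check of these definitions, the marginal sums can be computed directly from a table of counts. The following is an illustrative Python sketch (not part of the original SAS/R material), using the belief-in-afterlife counts from the example:

```python
# Belief-in-afterlife counts: rows = Female, Male; cols = Yes, No/Undecided
counts = [[435, 147],
          [375, 134]]
n = sum(sum(row) for row in counts)                   # total sample size
p = [[c / n for c in row] for row in counts]          # cell proportions p_ij
row_marg = [sum(row) for row in p]                    # p_i+ (row sums)
col_marg = [sum(p[i][j] for i in range(2)) for j in range(2)]  # p_+j (column sums)
print(n, [round(m, 3) for m in row_marg], [round(m, 3) for m in col_marg])
# 1091 [0.533, 0.467] [0.742, 0.258]
```

These match the marginal proportions shown on the sample-proportions slide.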

Notations For The Data
Cell counts are denoted by {n_ij}, with

    n = Σ_{i=1}^{I} Σ_{j=1}^{J} n_ij

Cell proportions are p_ij = n_ij / n. The marginal frequencies are the row totals {n_i+} and the column totals {n_+j}.

Example

                     Belief in Afterlife
Gender     Yes           No or Undecided    Total
Female     n_11 = 435    n_12 = 147         n_1+ = 582
Male       n_21 = 375    n_22 = 134         n_2+ = 509
Total      n_+1 = 810    n_+2 = 281         n = 1091

Example: Sample Proportions

                     Belief in Afterlife
Gender     Yes            No or Undecided    Total
Female     p_11 = 0.398   p_12 = 0.135       p_1+ = 0.533
Male       p_21 = 0.344   p_22 = 0.123       p_2+ = 0.467
Total      p_+1 = 0.742   p_+2 = 0.258       p = 1.00

Conditional Probabilities
Let Y be a response variable and X be an explanatory variable. It is informative to construct separate probability distributions for Y at each level of X. Such a distribution consists of conditional probabilities for Y given the level of X and is called a conditional distribution.

Example: Sample Conditional Distributions
For females: proportion of yes responses = 0.747, proportion of no responses = 0.253.
For males: proportion of yes responses = 0.737, proportion of no responses = 0.263.
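The conditional proportions above come from dividing each cell count by its row total. A minimal Python sketch (illustrative only):

```python
# Conditional distributions of belief (Yes, No/Undecided) given gender:
# divide each cell count by its row total n_i+
counts = {"Female": (435, 147), "Male": (375, 134)}
for gender, (yes, no) in counts.items():
    total = yes + no
    print(gender, round(yes / total, 3), round(no / total, 3))
# Female 0.747 0.253
# Male 0.737 0.263
```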

Independence
Is belief in the afterlife independent of gender? Two variables are statistically independent if all joint probabilities equal the product of their marginal probabilities:

    π_ij = π_i+ π_+j, for i = 1, …, I and j = 1, …, J

Equivalently, the conditional distributions of Y are identical at each level of X.

Probability Models For A 2 × 2 Table
Poisson model: each of the 4 cell counts is an independent Poisson random variable.
Binomial model: the marginal totals of X are fixed, and the conditional distributions of Y at each level of X are binomial.
Multinomial model: the total sample size is fixed but not the row or column totals; the distribution of the 4 cell counts is then multinomial.

Comparing Proportions In 2 × 2 Tables
Assume that the row totals are fixed, so that we have a binomial model. Suppose the two categories of Y are success and failure. Let π_1 = probability of success in row 1 and π_2 = probability of success in row 2. The difference π_1 − π_2 compares the success probabilities in the two rows.

Sample Difference of Proportions
Let p_1 and p_2 be the sample proportions of success for the two rows. The sample difference p_1 − p_2 estimates π_1 − π_2. If the counts in the two rows are independent samples, the estimated standard error of p_1 − p_2 is

    σ̂(p_1 − p_2) = sqrt( p_1(1 − p_1)/n_1+ + p_2(1 − p_2)/n_2+ )

Example: Belief in Afterlife (p.16)
In our example,

    p_1 = n_11/n_1+ = 435/582 = 0.747
    p_2 = n_21/n_2+ = 375/509 = 0.737

Therefore, p_1 − p_2 = 0.747 − 0.737 = 0.010, and the estimated standard error is

    σ̂(p_1 − p_2) = sqrt( p_1(1 − p_1)/n_1+ + p_2(1 − p_2)/n_2+ ) = 0.02656
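The standard error above can be verified numerically; here is an illustrative Python sketch (not part of the original notes):

```python
from math import sqrt

# Sample proportions of "yes" for females (row 1) and males (row 2)
p1, n1 = 435 / 582, 582
p2, n2 = 375 / 509, 509
# Estimated standard error of p1 - p2 under independent binomial samples
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(round(se, 5))   # 0.02656
```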

Example: Aspirin Use And Myocardial Infarction (p.20)

               Myocardial Infarction
Group      Yes     No       Total
Placebo    189     10845    11034
Aspirin    104     10933    11037

Goal: to find out whether regular intake of aspirin reduces mortality from cardiovascular diseases.

Example: Continued
In this example, p_1 = 189/11034 = 0.0171 and p_2 = 104/11037 = 0.0094. Thus p_1 − p_2 = 0.0077, and the estimated standard error is

    σ̂(p_1 − p_2) = sqrt( (0.0171 × 0.9829)/11034 + (0.0094 × 0.9906)/11037 ) = 0.0015

Confidence Interval (p.19)
A large-sample 100(1 − α)% confidence interval for π_1 − π_2 is

    p_1 − p_2 ± z_{α/2} σ̂(p_1 − p_2)

where z_{α/2} denotes the standard normal percentile with right-tail probability equal to α/2. For the aspirin use example, a 95% C.I. for π_1 − π_2 is 0.0077 ± 1.96 × 0.0015 = (0.005, 0.011).
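Computing this interval from the raw counts reproduces (0.005, 0.011); an illustrative Python sketch (not part of the original notes):

```python
from math import sqrt

# Aspirin study: MI proportions for placebo (row 1) and aspirin (row 2)
p1, n1 = 189 / 11034, 11034
p2, n2 = 104 / 11037, 11037
diff = p1 - p2
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
# Large-sample 95% CI: diff +/- z_{0.025} * SE, with z_{0.025} = 1.96
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(round(lo, 3), round(hi, 3))   # 0.005 0.011
```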

Notes (p.21)
A difference between two proportions of a certain fixed size may have greater importance when both proportions are near 0 or 1 than when they are near the middle of the range. For example, the difference between 0.010 and 0.001 is the same as the difference between 0.410 and 0.401, namely 0.009, but the former may be more important than the latter. Examples of such cases include a comparison of drugs on the proportion of subjects who have adverse reactions when using the drug.

Relative Risk (p.21)
In 2 × 2 tables, the relative risk is the ratio of the success probabilities for the two groups, π_1/π_2. The proportions 0.010 and 0.001 have a relative risk of 10.0, whereas the proportions 0.410 and 0.401 have a relative risk of 1.02.

Relative Risk - Continued
The sample relative risk is p_1/p_2. Its distribution is heavily skewed and cannot be well approximated by a normal distribution unless the sample sizes are quite large. A large-sample confidence interval is given by

    exp[ log(p_1/p_2) ± z_{α/2} sqrt( (1 − p_1)/(n_1+ p_1) + (1 − p_2)/(n_2+ p_2) ) ]

Example: Aspirin Use and MI
The sample relative risk is 1.818. A large-sample 95% confidence interval for the relative risk π_1/π_2 is [1.4330, 2.3059]. The C.I. (0.005, 0.011) for the difference of proportions π_1 − π_2 makes it seem as if the two groups differ by a trivial amount, but the relative risk shows that the difference may have important public health implications.
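These numbers can be reproduced from the counts with the log-scale interval given above; an illustrative Python sketch (not part of the original SAS/R material):

```python
from math import sqrt, log, exp

p1, p2 = 189 / 11034, 104 / 11037
rr = p1 / p2                                 # sample relative risk
# SE on the log scale; note n_{1+} p_1 = n_11 = 189 and n_{2+} p_2 = n_21 = 104
se_log = sqrt((1 - p1) / 189 + (1 - p2) / 104)
lo = exp(log(rr) - 1.96 * se_log)
hi = exp(log(rr) + 1.96 * se_log)
print(round(rr, 3), round(lo, 3), round(hi, 3))   # 1.818 1.433 2.306
```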

Odds Ratio (p.22)
Within row 1, the odds of success are Odds_1 = π_1/(1 − π_1). Similarly, within row 2, the odds of success are Odds_2 = π_2/(1 − π_2). The odds ratio is

    θ = Odds_1/Odds_2 = [π_1/(1 − π_1)] / [π_2/(1 − π_2)]

Notes (p.23)
For example, if π_1 = 0.75, then Odds_1 = 0.75/0.25 = 3. Odds are non-negative, and values greater than 1 indicate that a success is more likely than a failure. When X and Y are independent, the conditional distributions in rows 1 and 2 are the same, that is, π_1 = π_2, which implies θ = 1.

Observations
X and Y are independent ⟺ π_1 = π_2 ⟺ θ = 1. If 1 < θ < ∞, the odds of success are higher in row 1 than in row 2. If 0 < θ < 1, a success is less likely in row 1 than in row 2.

More Observations
Values of θ farther from 1 in a given direction indicate a stronger association. If the order of the rows or the order of the columns is reversed (but not both), the new value of θ is the reciprocal of the original value. This ordering is usually arbitrary, so whether we get θ = 4.0 or θ = 0.25 is simply a matter of how we label the rows and columns.

More Observations
Because the odds ratio treats the variables symmetrically, it is unnecessary to identify one classification as the response variable in order to calculate it. When both variables are responses, the odds ratio can be defined using the joint probabilities as

    θ = (π_11/π_12) / (π_21/π_22) = (π_11 π_22)/(π_12 π_21)

and is then called the cross-product ratio.

Sample Odds Ratio (p.24)
The sample odds ratio is defined as

    θ̂ = [p_1/(1 − p_1)] / [p_2/(1 − p_2)] = (n_11/n_12)/(n_21/n_22) = (n_11 n_22)/(n_12 n_21)

For the aspirin use example, the odds of MI are 0.0174 for the placebo group and 0.0095 for the aspirin group. The sample odds ratio is 0.0174/0.0095 = 1.832: the estimated odds of MI are 83% higher for the placebo group.
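The two equivalent forms of the sample odds ratio give the same value; an illustrative Python sketch (not part of the original notes):

```python
# Odds of MI in each group, and the equivalent cross-product form
odds_placebo = 189 / 10845        # n11 / n12
odds_aspirin = 104 / 10933        # n21 / n22
theta = odds_placebo / odds_aspirin
theta_cross = (189 * 10933) / (10845 * 104)   # n11*n22 / (n12*n21)
print(round(theta, 3), round(theta_cross, 3))   # 1.832 1.832
```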

SAS Code For Binomial Proportions

data veg;
  input habit $ count;
  datalines;
Veg 10
Nonveg 15
;
run;

proc freq data=veg order=data;
  weight count;
  tables habit / binomial (p=0.5);
run;

Output

The FREQ Procedure

                               Cumulative    Cumulative
habit     Frequency  Percent    Frequency      Percent
Veg             10     40.00           10        40.00
Nonveg          15     60.00           25       100.00

Binomial Proportion for habit = Veg
Proportion    0.4000
ASE           0.0980

Output (continued)

95% Lower Conf Limit    0.2080
95% Upper Conf Limit    0.5920

Exact Conf Limits
95% Lower Conf Limit    0.2113
95% Upper Conf Limit    0.6133

Test of H0: Proportion = 0.5
ASE under H0          0.1000
Z                    -1.0000
One-sided Pr < Z      0.1587
Two-sided Pr > |Z|    0.3173

Sample Size = 25

SAS Code For The 2 × 2 Table

data aspirin;
  input group $ mi $ count;
  datalines;
Placebo Yes 189
Placebo No 10845
Aspirin Yes 104
Aspirin No 10933
;
run;

proc freq data=aspirin order=data;
  weight count;
  tables group*mi / measures nopercent norow nocol;
run;

Output

The FREQ Procedure
Table of group by mi

Frequency    Yes     No       Total
Placebo      189     10845    11034
Aspirin      104     10933    11037
Total        293     21778    22071

Output

Estimates of the Relative Risk (Row1/Row2)
Type of Study               Value     95% Confidence Limits
Case-Control (Odds Ratio)   1.8321    1.4400    2.3308
Cohort (Col1 Risk)          1.8178    1.4330    2.3059
Cohort (Col2 Risk)          0.9922    0.9892    0.9953

Sample Size = 22071

R Code For Binomial Proportions

For the asymptotic test:
> veg <- 10
> total <- 25
> prop.test(veg, total, 0.5, correct=FALSE)

For the exact test:
> binom.test(veg, total, 0.5)

Output For prop.test

1-sample proportions test without continuity correction
data: veg out of total, null probability 0.5
X-squared = 1, df = 1, p-value = 0.3173
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.2340330 0.5926054
sample estimates:
  p
0.4

Output For binom.test

Exact binomial test
data: veg and total
number of successes = 10, number of trials = 25, p-value = 0.4244
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2112548 0.6133465
sample estimates:
probability of success
                   0.4

R Code For 2 × 2 Tables

> phs <- matrix(c(189, 10845, 104, 10933), ncol=2, byrow=TRUE)
> dimnames(phs) <- list(Group=c("Placebo", "Aspirin"), MI=c("Yes", "No"))
> phs
         MI
Group     Yes    No
  Placebo 189 10845
  Aspirin 104 10933

R Code For Difference In Proportions

> prop.test(phs, correct=FALSE)

2-sample test for equality of proportions without continuity correction
data: phs
X-squared = 25.0139, df = 1, p-value = 5.692e-07
alternative hypothesis: two.sided
95 percent confidence interval:
 0.004687751 0.010724297
sample estimates:
    prop 1     prop 2
0.01712887 0.00942285

Relative Risk and Odds Ratio

> phs.test <- prop.test(phs, correct=FALSE)
> phs.test$estimate[1]/phs.test$estimate[2]   # relative risk
  prop 1
1.817802
> odds <- phs.test$estimate/(1 - phs.test$estimate)
> odds[1]/odds[2]                             # odds ratio
  prop 1
1.832054