Analysis of Categorical Data Three-Way Contingency Table

Similar documents
STAC51: Categorical data Analysis

Lecture 23. November 15, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Lecture 28 Chi-Square Analysis

3 Way Tables Edpsy/Psych/Soc 589

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21

Describing Contingency tables

Three-Way Contingency Tables

Chapter 2: Describing Contingency Tables - II

Testing Independence

ST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence

Lecture 45 Sections Wed, Nov 19, 2008

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

We know from STAT.1030 that the relevant test statistic for equality of proportions is:

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Statistics 3858 : Contingency Tables

Cohen s s Kappa and Log-linear Models

Describing Stratified Multiple Responses for Sparse Data

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Ordinal Variables in 2 way Tables

Topic 21 Goodness of Fit

Statistics of Contingency Tables - Extension to I x J. stat 557 Heike Hofmann

# of 6s # of times Test the null hypthesis that the dice are fair at α =.01 significance

1 Comparing two binomials

Lecture 8: Summary Measures

Lecture 9. Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests

Suppose that we are concerned about the effects of smoking. How could we deal with this?

STA102 Class Notes Chapter Logistic Regression

Solution to Tutorial 7

BIOS 625 Fall 2015 Homework Set 3 Solutions

Lecture 41 Sections Mon, Apr 7, 2008

Optimal exact tests for complex alternative hypotheses on cross tabulated data

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).

Discrete Multivariate Statistics

Ling 289 Contingency Table Statistics

Computational Systems Biology: Biology X

Logistic regression: Miscellaneous topics

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

Case-Control Association Testing. Case-Control Association Testing

REVIEW 8/2/2017 陈芳华东师大英语系

Goodness of Fit Tests

Hypothesis testing (cont d)

Lecture 15: Chapter 10

2. AXIOMATIC PROBABILITY

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

Example. χ 2 = Continued on the next page. All cells

Marginal Screening and Post-Selection Inference

Lecture 5: ANOVA and Correlation

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

Frequency Distribution Cross-Tabulation

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels.

Inferences About Two Proportions

Poisson regression: Further topics

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

For more information about how to cite these materials visit

Epidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval

HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC

TA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM

2 Describing Contingency Tables

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

Log-linear Models for Contingency Tables

10: Crosstabs & Independent Proportions

Means or "expected" counts: j = 1 j = 2 i = 1 m11 m12 i = 2 m21 m22 True proportions: The odds that a sampled unit is in category 1 for variable 1 giv

Class 24. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

How is the Statistical Power of Hypothesis Tests affected by Dose Uncertainty?

Part IV Statistics in Epidemiology

Multiple Sample Categorical Data

Visual interpretation with normal approximation

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables.

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Statistics in medicine

Chapter 7. Practice Exam Questions and Solutions for Final Exam, Spring 2009 Statistics 301, Professor Wardrop

Analysis of categorical data S4. Michael Hauptmann Netherlands Cancer Institute Amsterdam, The Netherlands

Example. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences

13.1 Categorical Data and the Multinomial Experiment

Psych 230. Psychological Measurement and Statistics

Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution

STAT 705: Analysis of Contingency Tables

Lecture 2: Categorical Variable. A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Introduction to Inferential Statistics. Jaranit Kaewkungwal, Ph.D. Faculty of Tropical Medicine Mahidol University

Goodness of Fit Tests: Homogeneity

E509A: Principle of Biostatistics. GY Zou

EXAM # 2. Total 100. Please show all work! Problem Points Grade. STAT 301, Spring 2013 Name

ECON Introductory Econometrics. Lecture 2: Review of Statistics

Binary Logistic Regression

The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.

Analysing data: regression and correlation S6 and S7

Multi-Level Test of Independence for 2 X 2 Contingency Table using Cochran and Mantel Haenszel Statistics

Chapter 2: Describing Contingency Tables - I

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Topic 10: Hypothesis Testing

Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA

CDA Chapter 3 part II

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Chapters 10. Hypothesis Testing

POLI 443 Applied Political Research

Comparison of Two Samples

Transcription:

Yu Lecture 4 p. 1/17 Analysis of Categorical Data Three-Way Contingency Table

Yu Lecture 4 p. 2/17 Outline Three way contingency tables Simpson s paradox Marginal vs. conditional independence Homogeneous association Cochran-Mantel-Haenszel Methods

Yu Lecture 4 p. 3/17 Three-Way Contingency Tables Partial Tables Make 2-way tables of X Y at variaous levels of Z. This effectively removes the effect of Z by holding it constant. The associations of partial tables are called conditional associations because we are looking at X and Y conditional on a fixed level of Z. Focus is on relationship between variables X and Y at fixed levels of another variable Z = 1,..., K. Marginal Tables Sum the counts from the same cell location of partial tables. The idea is to form an X, Y table by summing over Z. Marginal tables can be quite misleading: Simpson s Paradox.

Yu Lecture 4 p. 4/17 Simpson s Paradox: Example 1 Table 1: Admission to Graduate School (Verducci) Accepted Rejected Science Male 60 15 Female 25 5 Accepted Rejected Arts Male 10 15 Female 30 40 X = Sex: Male, Female Y = Admission: Accepted, Rejected Z = College: Science, Arts

Yu Lecture 4 p. 5/17 Example 1 (Cont d) Condition on Z. O XY (Sci) = 4/5 < 1 O XY (Art) = 8/9 < 1 O XY = 21/11 > 1 Condition on X. O ZY (M) = 6 O ZY (F) = 20/3 O ZY = 187/12 Condition on Y. O ZX(Acc) = 36/5 O ZX(Rej) = 8 O ZX = 7

Yu Lecture 4 p. 6/17 Simpson s Paradox (Cont d) Paradox In each College, women have a greater acceptance rate than do men; Overall, men have a greater acceptance rate than do women; Resolution The sciences have a much higher acceptance rate than do the arts Most men apply to sciences; women to arts Simpson s paradox happens when there are different associations in partial and marginal tables.

Yu Lecture 4 p. 7/17 Marginal vs. Conditional Independence If X and Y are independent in each partial table, controlling for Z, then X and Y are conditionally independent. If X and Y are conditionally independent at each level of Z, but may still not be marginally independent Example 2 : Clinic and Treatment Good Bad Good Bad Clinic 1 A 18 12 Clinic 2 A 2 8 B 12 8 B 8 32

Yu Lecture 4 p. 8/17 Example 2 (Cont d) Condition on Z. O XY (C1) = 1 O XY (C2) = 1 O XY = 2 Condition on X. O ZY (A) = 6 O ZY (B) = 6 O ZY = 6 Condition on Y. O ZX(Good) = 6 O ZX(Bad) = 6 O ZX = 6

Yu Lecture 4 p. 9/17 Example 2 (Summary) X and Y are conditionally independent at each level of Z, but they are not marginally independent. This happens because, across levels of Z, there is a reversal in the odds of success: 3:2 in Clinic 1 1:4 in Clinic 2 There is a reversal in prevalence of treatment: Clinic 1 uses Treatment A the most Clinic 2 uses Treatment B the most

Yu Lecture 4 p. 10/17 Homogeneous Association Effect of X on Y is the same at all levels of Z. Happens when the conditional odds ratio using any two levels of X and any two levels of Y is the same at all levels of Z: O XY (1) =... = O XY (K) Conditional Independence is a special case, when these all equal 1. In the case when K=2, homogeneous association implies that the other conditional odds ratios will also be the same: O ZY (1) = O ZY (2) and O ZX(1) = O ZX(2) For 3-way tables of larger dimensions, homogeneous association generalizes to the model of no-three way interaction.

Yu Lecture 4 p. 11/17 Example 3: Bipoloar Children Trtment 200 families with a bipolar child 100 randomized to immediate treatment 100 randomized to 1-year waitlist Outcome Variable: Social functioning at one year into the study 100 good and 100 bad Moderating Variable: Both biological parents as caregivers 100 Yes and 100 No Good Bad Good Bad Intact imm 60 20 Not Intact imm 15 5 Family wait 10 10 Family wait 40 40

Yu Lecture 4 p. 12/17 Example 3 (Cont d) Condition on Z. O XY (IF) = 3 O XY (NIF) = 3 O XY = 3 Condition on X. O ZY (imm) = 1 O ZY (wait) = 1 O ZY = 1.9 Condition on Y. O ZX(Good) = 16 O ZX(Bad) = 16 O ZX = 16

Yu Lecture 4 p. 13/17 CMH Test Motivation: Is there an association between X and Y? Can t just collapse table [why not?] Assume there is a common odds ratio θ at each level of Z Hypotheses Null hypothesis H 0 : θ = 1 Alternative hypothesis H 1 : θ < 1 or θ > 1 Evidence Condition on the margins of XY table at each level of Z Only need to consider one entry n 11k at level k of Z (k = 1,..., K) Under the null hypothesis, {n 11k } are independent hypergeometric random variables

Yu Lecture 4 p. 14/17 Why Not? Could wrongly find association: Example 4 Player1 Player2 Made Missed Made Missed Made 40 4 Made 5 5 Missed 10 1 Missed 5 5 Collapsed Made Missed Made 45 9 Missed 15 6 Odds Ratio = 2

Yu Lecture 4 p. 15/17 Example 4 (Cont d): Why Not? Could wrongly mistake diverse association for no association Player1 Player2 Made Missed Made Missed Made 10 1 Made 10 4 Missed 8 8 Missed 24 0 Odds Ratio = 10 Odds Ratio = 0 Collapsed Made Missed Made 20 5 Missed 32 8 Odds Ratio = 1

Yu Lecture 4 p. 16/17 CMH Test Under the null hypothesis, {n 11k } are independent hypergeometric random variables CMH Test Statistics µ 11k = E(n 11k ) = n 1+kn +1k n ++k V ar(n 11k ) = n 1+kn 1+k n +1k n +1k n 2 ++k (n ++k 1) CMH = [ K k=1 (n 11k µ 11k )] 2 K k=1 V ar(n 11k) Important: In the numerator, sum before squaring Under the null hypothesis CMH χ 2 1

Yu Lecture 4 p. 17/17 CMH Test (Cont d) The CMH test is a powerful summary of evidence against the hypothesis of conditional independence, as long as the sample associations fall primarily in a single direction. Mantel-Haenszel Estimator for Common Odds Ratio ˆθ MH = k (n 11kn 22k n ++k ) k (n 12kn 21k n ++k ) Example 5: Coronary Artery Disease