Lecture 23. November 15, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Similar documents
Lecture 21. December 19, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Lecture 27. December 13, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Analysis of Categorical Data Three-Way Contingency Table

Simple Linear Regression. John McGready Johns Hopkins University

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

STAC51: Categorical data Analysis

Solution to Tutorial 7

Statistics for laboratory scientists II

Sampling Variability and Confidence Intervals. John McGready Johns Hopkins University

1 Comparing two binomials

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21

3 Way Tables Edpsy/Psych/Soc 589

1 Hypothesis testing for a single mean

Section B. The Theoretical Sampling Distribution of the Sample Mean and Its Estimate Based on a Single Sample

Outline. 1. Define likelihood 2. Interpretations of likelihoods 3. Likelihood plots 4. Maximum likelihood 5. Likelihood ratio benchmarks

Chapter 2: Describing Contingency Tables - II

BIOS 625 Fall 2015 Homework Set 3 Solutions

Optimal exact tests for complex alternative hypotheses on cross tabulated data

From last time... The equations

Probability measures A probability measure, P, is a real valued function from the collection of possible events so that the following

Lecture 24. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Suppose that we are concerned about the effects of smoking. How could we deal with this?

Lecture 8: Summary Measures

Lecture 22. December 19, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Example. Test for a proportion

ST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence

Cohen s s Kappa and Log-linear Models

Testing Independence

Inference using structural equations with latent variables

SEM for Categorical Outcomes

Contingency Tables Part One 1

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

Describing Stratified Multiple Responses for Sparse Data

Toxicokinetics: Absorption, Distribution, and Excretion. Michael A. Trush, PhD Johns Hopkins University

Three-Way Contingency Tables

Lecture 28 Chi-Square Analysis

Logistic regression: Miscellaneous topics

For more information about how to cite these materials visit

Statistics in medicine

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Lecture 2: Categorical Variable. A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Solutions for Examination Categorical Data Analysis, March 21, 2013

2 Describing Contingency Tables

Introduction To Logistic Regression

Generalized Linear Models Introduction

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS

Tests for Two Correlated Proportions in a Matched Case- Control Design

Epidemiology Principles of Biostatistics Chapter 10 - Inferences about two populations. John Koval

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

13.1 Categorical Data and the Multinomial Experiment

Statistics for Managers Using Microsoft Excel

Lab 8. Matched Case Control Studies

We know from STAT.1030 that the relevant test statistic for equality of proportions is:

Testing Non-Linear Ordinal Responses in L2 K Tables

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution

Probability Pr(A) 0, for any event A. 2. Pr(S) = 1, for the sample space S. 3. If A and B are mutually exclusive, Pr(A or B) = Pr(A) + Pr(B).

CHAPTER 1: BINARY LOGIT MODEL

Normalization. Example of Replicate Data. Biostatistics Rafael A. Irizarry

Inference for Categorical Data. Chi-Square Tests for Goodness of Fit and Independence

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons:

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

Part IV Statistics in Epidemiology

Computational Systems Biology: Biology X

4. Comparison of Two (K) Samples

Introduction to Structural Equations

Choosing among models

Statistics in medicine

A Framework for the Study of Urban Health. Abdullah Baqui, DrPH, MPH, MBBS Johns Hopkins University

Confounding and effect modification: Mantel-Haenszel estimation, testing effect homogeneity. Dankmar Böhning

Topic 21 Goodness of Fit

Lecture 9. Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests

16.400/453J Human Factors Engineering. Design of Experiments II

Economics 520. Lecture Note 19: Hypothesis Testing via the Neyman-Pearson Lemma CB 8.1,

MA : Introductory Probability

Advanced Structural Equations Models I

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

Lecture 5: ANOVA and Correlation

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

Lecture 5. October 21, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Categorical Data Analysis Chapter 3

Session 3 The proportional odds model and the Mann-Whitney test

STA 2101/442 Assignment 2 1

Linear Methods for Prediction

Generalized logit models for nominal multinomial responses. Local odds ratios

A. Motivation To motivate the analysis of variance framework, we consider the following example.

2.6.3 Generalized likelihood ratio tests

UNIVERSITY OF TORONTO Faculty of Arts and Science

Analytic Methods for Applied Epidemiology: Framework and Contingency Table Analysis

Asymptotic equivalence of paired Hotelling test and conditional logistic regression

INTRODUCTION TO LOG-LINEAR MODELING

CIVL 7012/8012. Simple Linear Regression. Lecture 3

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Describing Contingency tables

STAT 705: Analysis of Contingency Tables

Chapter 2: Describing Contingency Tables - I

Transcription:

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2008, The Johns Hopkins University and. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided AS IS ; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.

Lecture 23 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 15, 2007

1 2 3 4 5 6 7

1 2 3 CMH estimate 4 CMH test

(perceived) Death penalty Victim Defendant yes no % yes White White 53 414 11.3 Black 11 37 22.9 Black White 0 16 0.0 Black 4 139 2.8 White 53 430 11.0 Black 15 176 7.9 White 64 451 12.4 Black 4 155 2.5 1 1 From Agresti, Categorical Data Analysis, second edition

Discussion Marginally, white defendants received the death penalty a greater percentage of time than black defendants Across white and black victims, black defendant s received the death penalty a greater percentage of time than white defendants refers to the fact that marginal and conditional associations can be opposing The death penalty was enacted more often for the murder of a white victim than a black victim. Whites tend to kill whites, hence the larger marginal association.

Example Wikipedia s entry on gives an example comparing two player s batting averages First Second Whole Half Half Season Player 1 4/10 (.40) 25/100 (.25) 29/110 (.26) Plater 2 35/100 (.35) 2/10 (.20) 37/110 (.34) Player 1 has a better batting average than Player 2 in both the first and second half of the season, yet has a worse batting average overall Consider the number of at-bats

Berkeley admissions data The Berkeley admissions data is a well known data set regarding Simpsons?UCBAdmissions data(ucbadmissions) apply(ucbadmissions, c(1, 2), sum) Gender Admit Male Female Admitted 1198 557 Rejected 1493 1278.445.304 <- Acceptance rate

Acceptance rate by department > apply(ucbadmissions, 3, function(x) c(x[1] / sum(x[1 : 2]), x[3] / sum(x[3 : 4]) ) ) Dept M F A 0.62 0.82 B 0.63 0.68 C 0.37 0.34 D 0.33 0.35 E 0.28 0.24 F 0.06 0.07

Why? The application rates by department > apply(ucbadmissions, c(2, 3), sum) Dept Gender A B C D E F Male 825 560 325 417 191 373 Female 108 25 593 375 393 341

Discussion Mathematically, pardox is not ical a/b < c/d e/f < g/h (a + e)/(b + f ) > (c + g)/(d + h) More statistically, it says that the apparent relationship between two variables can change in the light or absence of a third

Variables that are correlated with both the explanatory and response variables can distort the estimated effect Victim s race was correlated with defendant s race and death penalty One strategy to adjust for confounding variables is to stratify by the confounder and then combine the strata-specific estimates Requires appropriately weighting the strata-specific estimates Unnecessary stratification reduces precision

Aside: weighting Suppose that you have two unbiased scales, one with variance 1 lb and and one with variance 9 lbs Confronted with weights from both scales, would you give both measurements equal creedance? Suppose that X 1 N(µ, σ 2 1 ) and X 2 N(µ, σ 2 2 ) where σ 1 and σ 2 are both known log-likelihood for µ (x 1 µ) 2 /2σ 2 1 (x 2 µ) 2 /2σ 2 2

Continued Derivative wrt µ set equal to 0 Answer (x 1 µ)/σ 2 1 + (x 2 µ)/σ 2 2 = 0 x 1 r 1 + x 2 r 2 r 1 + r 2 = x 1 p + x 2 (1 p) where r i = 1/σ 2 i and p = r 1 /(r 1 + r 2 ) Note, if X 1 has very low variance, its term dominates the estimate of µ General principle: instead of averaging over several unbiased estimates, take an average weighted according to inverse variances For our example σ1 2 = 1, σ2 2 = 9 so p =.9

Let n ijk be entry i, j of table k The k th sample odds ratio is ˆθ k = n 11kn 22k n 12k n 21k The Mantel Haenszel is of the form ˆθ = The weights are r k = n 12kn 21k n ++k The simplifies to ˆθ MH = k n 11kn 22k /n ++k k n 12kn 21k /n ++k SE of the log is given in Agresti (page 235) or Rosner (page 656) k r k ˆθ k k r k

Center 1 2 3 4 5 6 7 8 S F S F S F S F S F S F S F S F T 11 25 16 4 14 5 2 14 6 11 1 10 1 4 4 2 C 10 27 22 10 7 12 1 16 0 12 0 10 1 8 6 1 n 73 52 38 33 29 21 14 13 S - Success, F - failure T - Active Drug, C - placebo 2 ˆθ MH = (11 27)/73 + (16 10)/25 +... + (4 1)/13 (10 25)/73 + (4 22)/25 +... + (6 2)/13) = 2.13 Also log ˆθ MH =.758 and ˆ SE log ˆθ MH =.303 2 Data from Agresti, Categorical Data Analysis, second edition

CMH test H 0 : θ 1 =... = θ k = 1 versus H a : θ 1 =... = θ k 1 The CHM test applies to other alternatives, but is most powerful for the H a given above Same as testing conditional independence of the response and exposure given the stratifying variable CMH conditioned on the rows and columns for each of the k contingency tables resulting in k hypergeometric distributions and leaving only the n 11k cells free

CMH test cont d Under the conditioning and under the null hypothesis E(n 11k ) = n 1+k n +1k /n ++k Var(n 11k ) = n 1+k n 2+k n +1k n +2k /n 2 ++k (n ++k 1) The CMH test statistic is [ k {n 11k E(n 11k )}] 2 k Var(n 11k) For large sample sizes and under H 0, this test statistic is χ 2 (1) (regardless of how many tables you are summing up)

In R dat <- array(c(11, 10, 25, 27, 16, 22, 4, 10, 14, 7, 5, 12, 2, 1, 14, 16, 6, 0, 11, 12, 1, 0, 10, 10, 1, 1, 4, 8, 4, 6, 2, 1), c(2, 2, 8)) mantelhaen.test(dat, correct = FALSE) Results: CMH TS = 6.38 P-value:.012 Test presents evidence to suggest that the treatment and response are not conditionally independent given center

Some final notes on CMH It s possible to perform an analogous test in a random effects logit model that benefits from a complete model specification It s also possible to test heterogeneity of the strata-specific odds ratios Exact tests (guarantee the type I error rate) are also possible exact = TRUE in R