Suppose that we are concerned about the effects of smoking. How could we deal with this?

Similar documents
Lecture 12: Effect modification, and confounding in logistic regression

Confounding and effect modification: Mantel-Haenszel estimation, testing effect homogeneity. Dankmar Böhning

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data

Logistic Regression - problem 6.14

10.4 Hypothesis Testing: Two Independent Samples Proportion

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

Section 9.5. Testing the Difference Between Two Variances. Bluman, Chapter 9 1

Inferences About Two Proportions

DATA IN SERIES AND TIME I. Several different techniques depending on data and what one wants to do

The Chi-Square Distributions

10: Crosstabs & Independent Proportions

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

STA102 Class Notes Chapter Logistic Regression

Testing Independence

Lecture 23. November 15, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Epidemiology Principle of Biostatistics Chapter 14 - Dependent Samples and effect measures. John Koval

E509A: Principle of Biostatistics. GY Zou

Lecture 5: ANOVA and Correlation

Correlation and regression

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

Chapter Six: Two Independent Samples Methods 1/51

Part IV Statistics in Epidemiology

Logistic regression: Miscellaneous topics

Tests for Two Correlated Proportions in a Matched Case- Control Design

Cohen s s Kappa and Log-linear Models

Sociology 362 Data Exercise 6 Logistic Regression 2

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies

Calculating Effect-Sizes. David B. Wilson, PhD George Mason University

Statistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017

Lecture 9. Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests

Statistics in medicine

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. describes the.

Epidemiology Wonders of Biostatistics Chapter 13 - Effect Measures. John Koval

Analysis of Categorical Data Three-Way Contingency Table

Lab 8. Matched Case Control Studies

Institute of Actuaries of India

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

ST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2

Three-Way Contingency Tables

hypotheses. P-value Test for a 2 Sample z-test (Large Independent Samples) n > 30 P-value Test for a 2 Sample t-test (Small Samples) n < 30 Identify α

Topic 21 Goodness of Fit

Data Analysis and Statistical Methods Statistics 651

Comparison of Two Population Means

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels.

CBA4 is live in practice mode this week exam mode from Saturday!

Lecture 14: ANOVA and the F-test

2 Describing Contingency Tables

Chi Square Analysis M&M Statistics. Name Period Date

The Chi-Square Distributions

Chapter 9 Inferences from Two Samples

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations)

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

10/31/2012. One-Way ANOVA F-test

Chapter 7 Comparison of two independent samples

Data Analysis and Statistical Methods Statistics 651

Hypothesis Testing in Action: t-tests

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

STA6938-Logistic Regression Model

Tables Table A Table B Table C Table D Table E 675

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

Chapter Seven: Multi-Sample Methods 1/52

Binary Logistic Regression

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Small n, σ known or unknown, underlying nongaussian

12.10 (STUDENT CD-ROM TOPIC) CHI-SQUARE GOODNESS- OF-FIT TESTS

Chapter 20: Logistic regression for binary response variables

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).

Lecture 6 Multiple Linear Regression, cont.

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Hypothesis Testing in Action

For more information about how to cite these materials visit

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Chapter 8 Heteroskedasticity

One-Way Analysis of Variance. With regression, we related two quantitative, typically continuous variables.

Chapter 9. Hypothesis testing. 9.1 Introduction

APPENDIX B Sample-Size Calculation Methods: Classical Design

Chapter 8 - Statistical intervals for a single sample

TUTORIAL 8 SOLUTIONS #

Statistical methods for comparing multiple groups. Lecture 7: ANOVA. ANOVA: Definition. ANOVA: Concepts

Unit 5 Logistic Regression

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Parameter Estimation and Fitting to Data

Chapter 3. Comparing two populations

POLI 443 Applied Political Research

Lecture 8: Summary Measures

Practical Meta-Analysis -- Lipsey & Wilson

STAT 285: Fall Semester Final Examination Solutions

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

PHP2510: Principles of Biostatistics & Data Analysis. Lecture X: Hypothesis testing. PHP 2510 Lec 10: Hypothesis testing 1

STAT 705: Analysis of Contingency Tables

Survey of Smoking Behavior. Survey of Smoking Behavior. Survey of Smoking Behavior

Statistics Handbook. All statistical tables were computed by the author.

(i) Given that a student is female, what is the probability of having a GPA of at least 3.0?

Means or "expected" counts: j = 1 j = 2 i = 1 m11 m12 i = 2 m21 m22 True proportions: The odds that a sampled unit is in category 1 for variable 1 giv

Unit 5 Logistic Regression

Unit 5 Logistic Regression

Transcription:

Suppose that we want to study the relationship between coffee drinking and heart attacks in adult males under 55. In particular, we want to know if there is an association between coffee drinking and heart attacks; and we also might like to estimate the odds of having a heart attack for the coffee drinkers versus the non coffee drinkers. We could collect a sample of males under 55 and ask two questions: Have you had a heart attack? Do you drink coffee? If we constructed a 2 x 2 table with the data, what test would be appropriate to test for an association between coffee drinking and heart attack incidence? Suppose that we are concerned about the effects of smoking. How could we deal with this? 1

To address this, we could obtain samples from smoking and nonsmoking males under 55, conduct two 2 x 2 tests, and estimate two odds ratios. Suppose we did that: For the smokers, MI 1011 81 1092 No MI 390 77 467 total 1401 158 1559 2 For the non-smokers, MI 383 66 449 No MI 365 123 488 total 748 189 937 Conduct the tests...

3 How should we combine the results? MI 1394 147 1541 No MI 755 200 955 total 2149 347 2496 Now what happens? Conduct the test...

Simply combining the data defeats the purpose of controlling for the confounding variable! The Mantel-Haenszel test can be used to assess the association of a dichotomous disease and a dichotomous exposure variable after controlling for k confounding variables (in our example, k = 2). To do this, Compute O = k i=1 O i in the (1,1) cell over the k strata. Compute E = k i=1 E i in the (1,1) cell over the k strata. Compute the variance of O: V = k i=1 (a i +b i )(c i +d i )(a i +c i )(b i +d i ) (n i ) 2 (n i 1) The test statistic is given by χ 2 MH = ( O E 0.5)2 V which under H 0 is distributed as chi-square with 1 df. Note: This test should only be used when V 5. 4 For our example, O = 1394, E = 1339.76 and V = 67.5. This gives χ 2 MH = 42.79

5 As before, we might like to get estimates of the odds ratio: For the smokers, MI 1011 81 1092 No MI 390 77 467 total 1401 158 1559 and ˆ (OR S ) = (1011)(77) (390)(81) = 2.46 For the non-smokers, MI 383 66 449 No MI 365 123 488 total 748 189 937 and ˆ (OR NS ) = (383)(123) (365)(66) = 1.96

Of course, what we really want to know is the true underlying OR. Mantel-Haenszel s Method can be used to combine data from multiple 2 x 2 tables to estimate the underlying odds ratio and test if that underlying odds ratio is significantly different than 1. Before combining the data from the two tables, we should verify that the population odds ratios are not significantly different across strata. If they are, there is no good reason to try to obtain a common estimate. 6

We need to test the null H 0 : OR S = OR NS against the alternative H A : OR S OR NS. 7 Recall that Let se(ln( ˆ OR)) ˆ = 1 a + 1 b + 1 c + 1 d se ˆ 2 (ln( OR)) ˆ = 1 a + 1 b + 1 c + 1 d ŝe 2 S = se2 ˆ (ln( ORˆ S )) and ŝe 2 NS = se2 ˆ (ln( ORˆ NS )) Under the null, X 2 = 1 ˆ se 2 S (ln( OR ˆ S ) ln(or)) ˆ 2 + 1 ˆ se 2 NS (ln( ORˆ NS ) ln(or)) ˆ 2 is approximately chi-square with (2-1) degrees of freedom. ˆ ln(or) = 1 ŝe 2 S ln( ORˆ S ) + 1 1 ŝe 2 S ŝe 2 NS + 1 ŝe 2 NS ln( ORˆ NS )

8 Recall the data For the smokers, MI 1011 81 1092 No MI 390 77 467 total 1401 158 1559 and ˆ (OR S ) = (1011)(77) (390)(81) = 2.46 For the non-smokers, MI 383 66 449 No MI 365 123 488 total 748 189 937 and ˆ (OR NS ) = (383)(123) (365)(66) = 1.96

We need to test the null H 0 : OR S = OR NS against the alternative H A : OR S OR NS. ORˆ S = 2.46 and ORˆ NS = 1.96. So, ln( ORˆ S ) = 0.9 and ln( ORˆ NS ) = 0.673 9 se ˆ 2 S = 1 1011 + 1 390 + 1 81 + 1 77 = 0.02888504 and 1 ŝe 2 S Similarly, = 34.62. se ˆ 2 NS = 1 383 + 1 365 + 1 66 + 1 123 = 0.02862869 and 1 ŝe 2 NS = 34.93. ln( ˆ OR) = = 1 ŝe 2 S ln( ORˆ S ) + 1 1 ŝe 2 S ŝe 2 NS + 1 ŝe 2 NS ln( ORˆ NS ) 34.62(0.9) + 34.93(0.673) 34.62 + 34.93 = = 0.786

10 Under the null, X 2 = 1 ˆ se 2 S (ln( OR ˆ S ) ln( OR)) ˆ 2 + 1 ˆ se 2 NS (ln( ORˆ NS ) ln( OR)) ˆ 2 is approximately chi-square with (2-1) degrees of freedom. Evaluating the test statistic gives x 2 = (34.62)(0.9 0.786) 2 + (34.93)(0.673 0.786) 2 = 0.896 The null is not rejected (for α = 0.05) and we can proceed. Recall that we want to combine the data from the two 2 x 2 tables to obtain a better estimate of the underlying odds ratio.

11 Using the Mantel-Haenszel method, ˆ OR = (ad) S T S (bc) S T S + (ad) NS T NS + (bc) NS T NS where T S and T NS represent the total number of observations in the tables obtained from smokers and non-smokers, repsectively. For the data here, ˆ OR = (1011)(77) 1559 + (383)(123) 937 (390)(81) + (365)(66) 1559 937 = 2.18 Conclusion: We estimate that males under the age of 55 who drink coffee have odds 2.18 times greater of having a heart attack than males who do not drink coffee.

12 An approximate 95 % confidence interval for the true odds ratio is given by ( e ln( OR) 1.96ŝe(ln( ˆ OR)),e ˆ ln( OR) 1.96ŝe(ln( ˆ OR)) ˆ ) This gives ( e 0.786 1.96(0.120), e 0.786+1.96(0.120)) = (1.73, 2.78)