Estimating the Marginal Odds Ratio in Observational Studies


Travis Loux, Christiana Drake
Department of Statistics, University of California, Davis
June 20, 2011

Outline

- The Counterfactual Model
- Odds Ratios Defined
- Estimation of Odds Ratios
- The Propensity Score
- Matching on the Propensity Score
- Weighting by the Propensity Score
- Simulations

The Counterfactual Model: Definitions

We will use the following notation:

- $Y_1$: the potential binary response if the unit is exposed
- $Y_0$: the potential binary response if the unit is not exposed
- $Z$: the binary exposure (or treatment) of interest
- $X$: covariates associated with exposure and/or response

Although $Y_1$ and $Y_0$ are both real values, we can observe only one of them. We call the observed response $Y$:

$$Y = Z Y_1 + (1 - Z) Y_0$$

The Counterfactual Model: A Population

A hypothetical population would then look like the following:

Unit   X_1    X_2    X_3    Z   Y_1   Y_0
1      0.09   1.80   1.86   0   1     1
2      0.51   0.25   1.62   1   0     1
3      0.48   0.69   0.35   1   0     0
4      1.48   1.76   0.47   0   0     1
5      0.55   0.50   0.82   1   1     0
...

On the original slide, the observed $Y$ for each unit ($Y_1$ when $Z = 1$, $Y_0$ when $Z = 0$) is colored blue.
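As a small illustration of the observed-response rule $Y = Z Y_1 + (1 - Z) Y_0$, the sketch below applies it to the $(Z, Y_1, Y_0)$ values of the five listed units. The function name `observed_response` is a hypothetical helper introduced here, not something from the slides.

```python
def observed_response(z, y1, y0):
    """Return the observed response Y = Z*Y1 + (1 - Z)*Y0 for a binary exposure z."""
    return z * y1 + (1 - z) * y0


# (Z, Y1, Y0) for the five units listed in the hypothetical population.
units = [(0, 1, 1), (1, 0, 1), (1, 0, 0), (0, 0, 1), (1, 1, 0)]

# Exposed units reveal Y1; unexposed units reveal Y0.
observed = [observed_response(z, y1, y0) for z, y1, y0 in units]
print(observed)
```

Only one of the two potential outcomes is ever revealed per unit, which is why the marginal odds ratio cannot be read directly off the observed data.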

Odds Ratios: Marginal Odds Ratio

The marginal odds ratio can be obtained by comparing the odds of response in the population if everyone is exposed,

$$\text{Odds}_{exp} = \frac{P(Y_1 = 1)}{P(Y_1 = 0)},$$

to the odds of response if everyone is not exposed,

$$\text{Odds}_{unexp} = \frac{P(Y_0 = 1)}{P(Y_0 = 0)}.$$

Taking the ratio $\text{Odds}_{exp} / \text{Odds}_{unexp}$, the marginal odds ratio is

$$\psi_{marg} = \frac{P(Y_1 = 1)\, P(Y_0 = 0)}{P(Y_1 = 0)\, P(Y_0 = 1)}$$

Odds Ratios: Crude Odds Ratio

The marginal odds ratio is often approximated by estimating the crude odds ratio:

$$\psi_{crude} = \frac{P(Y_1 = 1 \mid Z = 1)\, P(Y_0 = 0 \mid Z = 0)}{P(Y_1 = 0 \mid Z = 1)\, P(Y_0 = 1 \mid Z = 0)}$$

When confounding is present,

$$P(Y_1 = 1 \mid Z = 1) \neq P(Y_1 = 1), \qquad P(Y_0 = 1 \mid Z = 0) \neq P(Y_0 = 1),$$

so the crude and marginal odds ratios may take different values.
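The gap between the crude and marginal odds ratios can be checked by exact arithmetic on a toy population. The sketch below assumes a single discrete covariate and that the potential outcomes are independent of exposure within each covariate level; all probabilities are made-up illustration values, and `marginal_and_crude_or` is a hypothetical helper.

```python
def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


def marginal_and_crude_or(p_x, e, p1, p0):
    """Return (marginal OR, crude OR) for a discrete covariate X.

    p_x[x] = P(X = x), e[x] = P(Z = 1 | X = x),
    p1[x] = P(Y1 = 1 | X = x), p0[x] = P(Y0 = 1 | X = x).
    Assumes (Y1, Y0) is independent of Z given X.
    """
    # Marginal response probabilities: average over the whole population.
    m1 = sum(px * a for px, a in zip(p_x, p1))
    m0 = sum(px * b for px, b in zip(p_x, p0))
    # Crude response probabilities: average over the exposed / unexposed only.
    pz1 = sum(px * ex for px, ex in zip(p_x, e))
    c1 = sum(px * ex * a for px, ex, a in zip(p_x, e, p1)) / pz1
    c0 = sum(px * (1 - ex) * b for px, ex, b in zip(p_x, e, p0)) / (1 - pz1)
    return odds(m1) / odds(m0), odds(c1) / odds(c0)


# Illustration values only: X raises both exposure and response probabilities.
p_x, p1, p0 = [0.5, 0.5], [0.3, 0.7], [0.2, 0.5]
confounded = marginal_and_crude_or(p_x, [0.2, 0.8], p1, p0)
unconfounded = marginal_and_crude_or(p_x, [0.5, 0.5], p1, p0)
print(confounded, unconfounded)
```

When the exposure probability does not depend on the covariate, the exposed and unexposed groups share the population's covariate distribution and the two odds ratios coincide.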

Odds Ratios: Conditional Odds Ratio

The conditional odds ratio is defined as

$$\psi_{cond}(x) = \frac{P(Y = 1 \mid Z = 1, X = x)\, P(Y = 0 \mid Z = 0, X = x)}{P(Y = 0 \mid Z = 1, X = x)\, P(Y = 1 \mid Z = 0, X = x)}$$

With the assumption of strongly ignorable treatment assignment, i.e. $(Y_1, Y_0) \perp Z \mid X$ and $0 < P(Z = 1 \mid X) < 1$, we can simplify:

$$\psi_{cond}(x) = \frac{P(Y_1 = 1 \mid X = x)\, P(Y_0 = 0 \mid X = x)}{P(Y_1 = 0 \mid X = x)\, P(Y_0 = 1 \mid X = x)}$$

Odds Ratios: Non-collapsibility

In the linear model $E(Y \mid X, Z) = \beta^T X + \delta Z$, where $\delta$ is the coefficient on exposure, the conditional effect of exposure is equal to the marginal effect:

$$E(Y_1 \mid X) - E(Y_0 \mid X) = (\beta^T X + \delta) - (\beta^T X) = \delta$$

$$E(Y_1) - E(Y_0) = E_X(\beta^T X + \delta) - E_X(\beta^T X) = (\beta^T \mu_X + \delta) - (\beta^T \mu_X) = \delta$$

- The linear model is collapsible
- In the logistic model, this property does not hold
- Because the marginal and conditional effects differ, a separate estimator is needed for each
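A quick numerical check makes the non-collapsibility of the logistic model concrete. The sketch below assumes a single binary covariate that is independent of exposure (so there is no confounding) and illustrative coefficient values; `marginal_or_logistic` is a hypothetical helper, not part of the talk.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def odds(p):
    return p / (1.0 - p)


def marginal_or_logistic(alpha, beta, beta0=-1.0, p_x1=0.5):
    """Marginal OR when logit P(Y=1 | X, Z) = beta0 + beta*X + alpha*Z.

    X is binary with P(X = 1) = p_x1 and is independent of Z.
    The conditional OR is exp(alpha) in both strata of X.
    """
    p1 = (1 - p_x1) * expit(beta0 + alpha) + p_x1 * expit(beta0 + beta + alpha)
    p0 = (1 - p_x1) * expit(beta0) + p_x1 * expit(beta0 + beta)
    return odds(p1) / odds(p0)


alpha = math.log(3.0)  # conditional OR of 3 (illustration value)
print(marginal_or_logistic(alpha, beta=2.0))  # X predicts Y: marginal OR falls below 3
print(marginal_or_logistic(alpha, beta=0.0))  # X unrelated to Y: marginal OR equals 3
```

Even without confounding, averaging the response probabilities over a covariate that predicts the outcome pulls the marginal odds ratio toward 1 relative to the conditional one.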

Odds Ratios: Standard Estimation

If the data follow a logistic model with

$$P(Y = 1 \mid X = x, Z = z) = \frac{\exp\left\{\beta_0 + \sum_{j=1}^{p} \beta_j x_j + \alpha z\right\}}{1 + \exp\left\{\beta_0 + \sum_{j=1}^{p} \beta_j x_j + \alpha z\right\}}$$

the conditional odds ratio is constant across $X$, with $\psi_{cond}(x) = \psi_{cond} = e^{\alpha}$.

- Logistic regression will lead to an unbiased and asymptotically efficient estimator of $\psi_{cond}(x)$
- In order for the estimate to be unbiased, $X$ must contain all predictors of $Y$, not just the confounders (Gail, Wieand, and Piantadosi, 1984)

Odds Ratios: Standard Estimation

Subclassifying observations based on covariates into multiple 2 × 2 tables, we can estimate the odds ratio by the Mantel-Haenszel (MH) estimator. If the $k$th table takes the form

          Z = 1   Z = 0
Y = 1     a_k     b_k
Y = 0     c_k     d_k
Total                     n_k

the MH estimator is defined as

$$\hat{\psi}_{MH} = \frac{\sum_k a_k d_k / n_k}{\sum_k b_k c_k / n_k}$$

But what odds ratio are we estimating?

Odds Ratios: Standard Estimation

- If the covariates are constant within each subclass, $\hat{\psi}_{MH}$ estimates the conditional odds ratio, assuming $\psi_{cond}$ is constant
  - Examples: subclassification defined by categorical covariates; perfect matching on covariates
- If covariates vary within the subclasses, $\hat{\psi}_{MH}$ will be biased due to non-collapsibility (e.g. Greenland, Robins, and Pearl, 1999)
  - Extreme case: when the data are summarized in one table, $\hat{\psi}_{MH}$ estimates the crude odds ratio
- Cochran (1968) shows that using 5 subclasses removes approximately 90% of bias due to confounding in linear models
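The Mantel-Haenszel formula and the "one table" extreme case can be sketched in a few lines. The cell counts below are invented for illustration, and `mantel_haenszel_or` is a hypothetical helper name.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel odds ratio from a list of 2x2 tables.

    Each table is (a, b, c, d) with
      a = #(Y=1, Z=1), b = #(Y=1, Z=0), c = #(Y=0, Z=1), d = #(Y=0, Z=0).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


# Two illustrative strata, each with a stratum-specific odds ratio of 2.
strata = [(40, 5, 20, 5), (2, 10, 10, 100)]
print(mantel_haenszel_or(strata))

# Collapsing the strata into a single table gives the crude odds ratio ad/bc.
collapsed = tuple(sum(cells) for cells in zip(*strata))
print(mantel_haenszel_or([collapsed]))
```

With these counts the stratified estimate recovers the common stratum-specific odds ratio, while the single collapsed table returns a much larger crude value because exposure and outcome are both more common in the first stratum.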

The Propensity Score: Definition and Consequences

The propensity score (Rosenbaum and Rubin, 1983) is the probability of exposure conditional on covariates:

$$e(x) = P(Z = 1 \mid X = x)$$

- Populations of exposed and unexposed with the same propensity score have the same distribution of observed covariates: $X \perp Z \mid e(X)$
- Under strongly ignorable treatment assignment on $X$, $(Y_1, Y_0) \perp Z \mid e(X)$ as well
- Common uses include:
  - Subclassification
  - Weighting
  - Matching
  - As a covariate in a regression model

Matching on the Propensity Score

In 1-to-1 matching, the MH estimator simplifies. The $k$th table has the form

          Z = 1     Z = 0
Y = 1     a_k       b_k
Y = 0     1 - a_k   1 - b_k
Total     1         1         2

and the MH estimator becomes

$$\hat{\psi}_{MH} = \frac{\sum_k a_k (1 - b_k)}{\sum_k (1 - a_k)\, b_k}$$

Similar simplifications hold for case-control designs.
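For matched pairs, the simplified estimator reduces to a ratio of discordant-pair counts. A minimal sketch, with invented pair outcomes and a hypothetical helper name `matched_pair_mh_or`:

```python
def matched_pair_mh_or(pairs):
    """MH odds ratio for 1-to-1 matched pairs.

    Each pair is (a, b): a is the binary outcome of the exposed unit,
    b is the binary outcome of its matched unexposed unit.
    """
    num = sum(a * (1 - b) for a, b in pairs)  # exposed responds, unexposed does not
    den = sum((1 - a) * b for a, b in pairs)  # unexposed responds, exposed does not
    if den == 0:
        raise ValueError("no pairs with (a, b) = (0, 1); the estimator is undefined")
    return num / den


# Illustrative outcomes: three (1, 0) pairs, one (0, 1) pair, two concordant pairs.
pairs = [(1, 0), (1, 0), (1, 0), (0, 1), (1, 1), (0, 0)]
print(matched_pair_mh_or(pairs))
```

Concordant pairs contribute zero to both sums, so only the discordant pairs carry information about the odds ratio.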

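Matching on the Propensity Score

As the number of matched pairs $n$ increases, it can be shown that

$$\sqrt{n}\,(\hat{\psi}_{MH} - \psi) \to N(0, \sigma^2)$$

where

$$\psi = \lim_{n \to \infty} \frac{\sum_k p_{1k} (1 - p_{0k})}{\sum_k (1 - p_{1k})\, p_{0k}}$$

and

$$p_{zk} = P(Y = 1 \mid Z = z, e(X) = e_k) \quad \text{for } z = 0, 1$$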

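Matching on the Propensity Score

Assuming a logistic outcome model,

$$p_{zk} = \int \frac{\exp\{\beta^T x + \alpha z\}}{1 + \exp\{\beta^T x + \alpha z\}} \, dF_{X \mid e(X)}(x \mid e_k)$$

so, writing $dF_k$ for $dF_{X \mid e(X)}(x \mid e_k)$,

$$\psi = \lim_{n \to \infty} \frac{\sum_k \left[\int \frac{\exp\{\beta^T x + \alpha\}}{1 + \exp\{\beta^T x + \alpha\}} \, dF_k\right] \left[\int \frac{1}{1 + \exp\{\beta^T x\}} \, dF_k\right]}{\sum_k \left[\int \frac{1}{1 + \exp\{\beta^T x + \alpha\}} \, dF_k\right] \left[\int \frac{\exp\{\beta^T x\}}{1 + \exp\{\beta^T x\}} \, dF_k\right]}$$

or

$$\psi = e^{\alpha} \lim_{n \to \infty} \frac{\sum_k \left[\int \frac{\exp\{\beta^T x\}}{1 + \exp\{\beta^T x + \alpha\}} \, dF_k\right] \left[\int \frac{1}{1 + \exp\{\beta^T x\}} \, dF_k\right]}{\sum_k \left[\int \frac{1}{1 + \exp\{\beta^T x + \alpha\}} \, dF_k\right] \left[\int \frac{\exp\{\beta^T x\}}{1 + \exp\{\beta^T x\}} \, dF_k\right]}$$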

Matching on the Propensity Score

When the exposure follows the model $\text{logit}\, P(Z = 1 \mid X) = \gamma^T X$ and the outcome follows the model $\text{logit}\, P(Y = 1 \mid X, Z) = \beta^T X + \alpha Z$, the value of $\psi$ depends on the relationship between $\gamma^T X$ and $\beta^T X$:

- If $\beta^T X = f(\gamma^T X)$ for some $f$, then $\psi = \psi_{cond}$
  - $\beta^T X$ is constant in domains defined by $e(X)$
- If $\gamma^T X$ and $\beta^T X$ are independent, then $\psi = \psi_{marg}$
  - Let $H = h(X) = \beta^T X$. Then $F_{H \mid e(X)} = F_H$
- If $0 < \rho(\gamma^T X, \beta^T X) < 1$, $\psi$ falls between $\psi_{cond}$ and $\psi_{marg}$

Weighting by the Propensity Score

A simple weighted estimate is obtained by

$$\hat{\psi}_{IPW1} = \frac{\hat{\mu}_1 (1 - \hat{\mu}_0)}{(1 - \hat{\mu}_1)\, \hat{\mu}_0}$$

where

$$\hat{\mu}_1 = \frac{1}{n} \sum_i \frac{Z_i Y_i}{e(X_i)} \quad \text{and} \quad \hat{\mu}_0 = \frac{1}{n} \sum_i \frac{(1 - Z_i) Y_i}{1 - e(X_i)}$$

- Unbiased for $\psi_{marg}$ if $e(X)$ is correctly specified (Lunceford and Davidian, 2004)
- Extremely sensitive to extreme propensity scores (near 0 or 1)
- Either of $\hat{\mu}_1$ and $\hat{\mu}_0$ can be greater than 1, so that $\hat{\psi}_{IPW1} < 0$

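Weighting by the Propensity Score

Improvements can be made by using

$$\hat{\psi}_{IPW2} = \frac{\tilde{\mu}_1 (1 - \tilde{\mu}_0)}{(1 - \tilde{\mu}_1)\, \tilde{\mu}_0}$$

where

$$\tilde{\mu}_1 = \left(\sum_i \frac{Z_i}{e(X_i)}\right)^{-1} \sum_i \frac{Z_i Y_i}{e(X_i)} \quad \text{and} \quad \tilde{\mu}_0 = \left(\sum_i \frac{1 - Z_i}{1 - e(X_i)}\right)^{-1} \sum_i \frac{(1 - Z_i) Y_i}{1 - e(X_i)}$$

- Remains unbiased
- Decreases sensitivity to propensity scores near 0 and 1
- Less variance in small samples
- Always positive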
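The two weighted estimators differ only in how the weighted sums are normalized, which the following sketch makes explicit. The data are a tiny invented example with two small propensity scores, and the function names are hypothetical; the propensity scores are taken as known rather than estimated.

```python
def ipw1_means(z, y, e):
    """Weighted response means normalized by the sample size n (IPW1)."""
    n = len(z)
    mu1 = sum(zi * yi / ei for zi, yi, ei in zip(z, y, e)) / n
    mu0 = sum((1 - zi) * yi / (1 - ei) for zi, yi, ei in zip(z, y, e)) / n
    return mu1, mu0


def ipw2_means(z, y, e):
    """Weighted response means normalized by the sum of the weights (IPW2)."""
    w1 = sum(zi / ei for zi, ei in zip(z, e))
    w0 = sum((1 - zi) / (1 - ei) for zi, ei in zip(z, e))
    mu1 = sum(zi * yi / ei for zi, yi, ei in zip(z, y, e)) / w1
    mu0 = sum((1 - zi) * yi / (1 - ei) for zi, yi, ei in zip(z, y, e)) / w0
    return mu1, mu0


def odds_ratio(mu1, mu0):
    return mu1 * (1 - mu0) / ((1 - mu1) * mu0)


# Invented data: two exposed units have a small propensity score (0.1).
z = [1, 1, 1, 0, 0, 0]
y = [1, 1, 0, 1, 0, 0]
e = [0.1, 0.1, 0.5, 0.5, 0.5, 0.5]

print(ipw1_means(z, y, e), odds_ratio(*ipw1_means(z, y, e)))
print(ipw2_means(z, y, e), odds_ratio(*ipw2_means(z, y, e)))
```

In this example the IPW1 mean for the exposed exceeds 1 and the resulting odds ratio is negative, while the IPW2 means are weighted averages of binary outcomes and therefore stay within [0, 1].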

Simulations: Mechanics

- Created a population of 2 million units:
  - Standard normal covariates $X = (X_1, X_2, X_3)$
  - Exposure $Z$: $\text{logit}\, P(Z = 1 \mid X) = \gamma^T X$
  - Potential outcome $Y_1$: $\text{logit}\, P(Y_1 = 1 \mid X, Z) = \beta^T X + \alpha Z$
  - Potential outcome $Y_0$: $\text{logit}\, P(Y_0 = 1 \mid X, Z) = \beta^T X$
- Took 10,000 independent samples of 2,000 observations each
  - Crude and logistic estimation
  - Subclassified and matched on the propensity score (PS)
  - Weighted by the inverse propensity score, with $0.005 < e(x) < 0.995$
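A much smaller version of this data-generating setup can be sketched with the standard library alone. This is not the authors' simulation code: the coefficient values, sample size, and seed are assumptions chosen for illustration, a single sample is drawn instead of 10,000, the true propensity score is used rather than an estimated one, and the truncation of $e(x)$ is omitted because scores this extreme are very unlikely with these coefficients.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def odds(p):
    return p / (1.0 - p)


def simulate(n=20000, alpha=math.log(3.0), gamma=(0.5, 0.5, 0.5),
             beta=(0.5, 0.5, 0.5), seed=0):
    """Return (marginal OR, crude OR, IPW2 OR) for one simulated sample."""
    rng = random.Random(seed)
    sum_p1 = sum_p0 = 0.0
    cells = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}  # counts keyed by (z, y)
    w1 = wy1 = w0 = wy0 = 0.0
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        e = expit(sum(g * xi for g, xi in zip(gamma, x)))  # true propensity score
        lin = sum(b * xi for b, xi in zip(beta, x))
        p1, p0 = expit(lin + alpha), expit(lin)
        sum_p1 += p1
        sum_p0 += p0
        z = 1 if rng.random() < e else 0
        y1 = 1 if rng.random() < p1 else 0
        y0 = 1 if rng.random() < p0 else 0
        y = z * y1 + (1 - z) * y0  # only one potential outcome is observed
        cells[(z, y)] += 1
        if z:
            w1 += 1.0 / e
            wy1 += y / e
        else:
            w0 += 1.0 / (1.0 - e)
            wy0 += y / (1.0 - e)
    marginal = odds(sum_p1 / n) / odds(sum_p0 / n)
    crude = (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])
    ipw2 = odds(wy1 / w1) / odds(wy0 / w0)
    return marginal, crude, ipw2


marginal, crude, ipw2 = simulate()
print(marginal, crude, ipw2)
```

Here `marginal` is computed from the model-based response probabilities averaged over the sampled covariates, so it serves as the target that the crude and weighted estimates can be compared against. Setting `gamma` equal to `beta` corresponds to the perfectly correlated case, where confounding is strongest.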

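Simulations: Strong Correlation

[Figure: OR estimates by method (Crude, Logistic Reg, 5 PS classes, PS matched, IPW1, IPW2), with the OR Estimate axis running from 2 to 10. Cor($\gamma^T X$, $\beta^T X$) = 1.]

Reference values: Crude: 6.924; Conditional: 3.004; Marginal: 1.85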

Simulations: Weak Correlation

[Figure: OR estimates by method (Crude, Logistic Reg, 5 PS classes, PS matched, IPW1, IPW2), with the OR Estimate axis running from 1 to 6. Cor($\gamma^T X$, $\beta^T X$) = 0.0008.]

Reference values: Conditional: 3.015; Marginal: 1.604; Crude: 1.602

Simulations: Moderate Correlation

[Figure: OR estimates by method (Crude, Logistic Reg, 5 PS classes, PS matched, IPW1, IPW2), with the OR Estimate axis running from 1 to 6. Cor($\gamma^T X$, $\beta^T X$) = 0.6413.]

Reference values: Crude: 4.068; Conditional: 2.996; Marginal: 1.802

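Simulations: Correlation and Matching Bias

Using

$$\text{scaled bias} = \frac{\hat{E}(\hat{\psi}_{MH}) - \psi_{marg}}{\psi_{cond} - \psi_{marg}}$$

for 28 simulations with various $\gamma$ and $\beta$:

[Figure: scaled bias (vertical axis, 0.0 to 1.0) plotted against cor($\gamma^T X$, $\beta^T X$) (horizontal axis, -1.0 to 1.0), with separate series for Conditional OR = 2 and Conditional OR = 4. On the scaled-bias axis, 0 corresponds to the marginal OR and 1 to the conditional OR.]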
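The scaled bias is a simple rescaling that places the marginal odds ratio at 0 and the conditional odds ratio at 1. A minimal sketch, using the conditional and marginal values reported for the strong-correlation setting as inputs; the mean estimate of 2.4 is a made-up number for illustration, and `scaled_bias` is a hypothetical helper.

```python
def scaled_bias(mean_estimate, psi_marg, psi_cond):
    """(mean estimate - marginal OR) / (conditional OR - marginal OR)."""
    return (mean_estimate - psi_marg) / (psi_cond - psi_marg)


psi_marg, psi_cond = 1.85, 3.004  # values reported for the strong-correlation setting

print(scaled_bias(psi_marg, psi_marg, psi_cond))  # estimator centered on the marginal OR
print(scaled_bias(psi_cond, psi_marg, psi_cond))  # estimator centered on the conditional OR
print(scaled_bias(2.4, psi_marg, psi_cond))       # hypothetical mean estimate in between
```

Values strictly between 0 and 1 indicate an estimator whose mean lies between the two odds ratios.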

Conclusion

- Matching on the propensity score leads to an estimate which is consistent for neither the conditional nor the marginal odds ratio
- Inverse propensity weighting yields an estimate which is unbiased for the marginal odds ratio
  - Can adjustments be made to improve variability?

References

Cochran, W.G. (1968). The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24, 295-313.

Gail, M.H., Wieand, S., and Piantadosi, S. (1984). Biased Estimates of Treatment Effect in Randomized Experiments with Nonlinear Regressions and Omitted Covariates. Biometrika 71, 431-444.

Greenland, S., Robins, J.M., and Pearl, J. (1999). Confounding and Collapsibility in Causal Inference. Statistical Science 14, 29-46.

Lunceford, J.K. and Davidian, M. (2004). Stratification and Weighting via the Propensity Score in Estimation of Causal Treatment Effects: A Comparative Study. Statistics in Medicine 23, 2937-2960.

Rosenbaum, P.R. and Rubin, D.B. (1983). The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70, 41-55.