Propensity Score Methods for Causal Inference

John Pura BIOS790 October 2, 2015

Causal inference
- Philosophical problem, statistical solution
- Important in various disciplines (e.g. Koch's postulates, Bradford Hill criteria, Granger causality)
- Good reference on the history of causal inference: Paul Holland, "Statistics and Causal Inference," JASA, 1986

What can we estimate? Potential Outcomes Framework (Rubin's Causal Model)
- Notation: treatment $Z$ (1 = treated, 0 = control), baseline covariates $X = (X_1, \ldots, X_p)$, outcome $Y$, potential outcomes $Y(0)$, $Y(1)$
- We observe $(Z, Y, X)$ for an individual, where $Y = Z\,Y(1) + (1 - Z)\,Y(0)$
- Causal effect of treatment: $Y(1) - Y(0)$
- Average causal effect: $E[Y(1) - Y(0)]$

What can we estimate? Average causal effect (ACE)
- $\mathrm{ACE}_{\mathrm{All}}$ or ATE $= E[Y(1) - Y(0)]$
- $\mathrm{ACE}_{\mathrm{Exp}}$ or ATT $= E[Y(1) - Y(0) \mid Z = 1]$
- $\mathrm{ACE}_{\mathrm{Un}}$ or ATU $= E[Y(1) - Y(0) \mid Z = 0]$
- The estimand and the statistical methods depend on the study goal/question

Assumptions
1. $Z$ precedes $Y$
2. Stable Unit Treatment Value Assumption (SUTVA)
   - non-interference
   - no variation in treatment
3. Strongly Ignorable Treatment Assignment (SITA)
   - $0 < P(Z = 1 \mid X) < 1$, where $P(Z = 1 \mid X)$ is the propensity score
   - $(Y(0), Y(1)) \perp Z \mid X$ (very strong assumption): no unobserved confounders

Randomized Controlled Trials vs. Observational Studies
RCTs
- Treatment effects on the outcome are considered causal
- $Z$ is determined for each participant at random, so $(Y(0), Y(1)) \perp Z$
- The difference in group means, $E[Y \mid Z = 1] - E[Y \mid Z = 0]$, is an unbiased estimate of ATT = ATE
Observational studies
- $Z$ is not controlled, so in general $(Y(0), Y(1)) \not\perp Z$
- $E(Y \mid Z = 1) = E(Y(1) \mid Z = 1) \neq E(Y(1))$
- Cannot obtain an unbiased estimate by direct comparison. But...
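The contrast between the two designs can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the data-generating model (a single covariate, a constant treatment effect of 1, and an assignment probability of 0.2 + 0.6x in the observational case) and the function name `simulate` are hypothetical and chosen only for illustration.

```python
import random

def simulate(n, randomized, seed=0):
    """Return the naive difference in mean observed outcomes between groups."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        x = rng.random()                     # baseline covariate, uniform on [0, 1)
        y0 = 2.0 * x + rng.gauss(0.0, 1.0)   # potential outcome under control
        y1 = y0 + 1.0                        # potential outcome under treatment (true effect = 1)
        # Assumed assignment mechanism: coin flip in the RCT, dependent on x otherwise
        p = 0.5 if randomized else 0.2 + 0.6 * x
        z = 1 if rng.random() < p else 0
        # Only one potential outcome is observed: Y = Z*Y(1) + (1 - Z)*Y(0)
        (treated if z else control).append(y1 if z else y0)
    return sum(treated) / len(treated) - sum(control) / len(control)

rct_diff = simulate(20000, randomized=True)    # should be close to the true effect of 1
obs_diff = simulate(20000, randomized=False)   # inflated because x drives both Z and Y
print(round(rct_diff, 2), round(obs_diff, 2))
```

In this toy model the covariate raises both the chance of treatment and the outcome, so the direct comparison in the observational setting overstates the effect, while the randomized comparison does not.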

Potential Solution
- In observational studies, if the SITA assumption is met, then treatment assignment $Z$ among individuals with a particular $X$ is essentially random and independent of the potential outcomes
- Rosenbaum and Rubin (1983): by conditioning on the propensity score (PS), we can identify $E(Y(0))$ and $E(Y(1))$ from the observed data $(Z, Y, X)$ and ultimately estimate the average causal effect

Propensity Score
- Austin (2011): The propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects
- This is a large-sample property
- Unknown in practice, but can be estimated from the data, given some assumptions on $e(x)$ (e.g. parametric regression model, classification and regression trees)
- Mathematically: $e(x) = P(Z = 1 \mid X = x)$
- Rosenbaum and Rubin (R&R) showed that $X \perp Z \mid e(X)$ and, under the SITA assumption, additionally $(Y(0), Y(1)) \perp Z \mid e(X)$
- For theoretical properties see R&R (1983) and Lunceford and Davidian (2004)
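To make the identification step explicit, the following short derivation sketches how the independence result lets $E[Y(1)]$ be written in terms of observed quantities. It assumes SITA, including $0 < e(X) < 1$ so that the conditioning events are well defined, and uses the relation $Y = Z\,Y(1) + (1 - Z)\,Y(0)$.

```latex
\begin{align*}
E[Y(1)] &= E\{\,E[Y(1) \mid e(X)]\,\}
          && \text{(iterated expectations)} \\
        &= E\{\,E[Y(1) \mid Z = 1,\, e(X)]\,\}
          && \text{since } (Y(0), Y(1)) \perp Z \mid e(X) \\
        &= E\{\,E[Y \mid Z = 1,\, e(X)]\,\}
          && \text{since } Y = Y(1) \text{ when } Z = 1
\end{align*}
```

The same argument with $Z = 0$ gives $E[Y(0)] = E\{E[Y \mid Z = 0, e(X)]\}$, so the ATE equals $E\{E[Y \mid Z = 1, e(X)] - E[Y \mid Z = 0, e(X)]\}$.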

The Propensity Score Model. Goal: Covariate balance
- Popular method for estimating the PS is logistic regression, though others exist (e.g. tree-based methods, random forests, neural networks)
- Regress $\operatorname{logit}[P(Z = 1 \mid X)]$ on $X$ and obtain predicted probabilities $\hat{e}(x)$
- R&R (1984) and Austin (2011) describe an iterative approach:
  1. Specify an initial model to estimate $\hat{e}(x)$
  2. Perform diagnostics to assess covariate balance between treatment groups
  3. Modify the PS model by adding covariates, interactions, or non-linear terms
  4. Important: each step should be motivated not by statistical significance but by the objective of covariate balance
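As an illustration of the logistic regression step, here is a minimal standard-library sketch that fits the PS model by plain gradient ascent on the log-likelihood. The function names `fit_propensity` and `predict_propensity`, the learning rate, the iteration count, and the simulated data are all assumptions made for this example; in practice a statistical package's logistic regression routine would normally be used.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_propensity(X, z, lr=1.0, n_iter=500):
    """Fit logit P(Z=1|X) = b0 + b1*x1 + ... by gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)                      # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            fitted = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            resid = zi - fitted                 # score contribution of this subject
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict_propensity(beta, X):
    """Return estimated propensity scores e_hat(x) for each row of X."""
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi))) for xi in X]

# Hypothetical data: two standard-normal covariates, true logit = -0.5 + x1 - x2
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
z = [1 if rng.random() < sigmoid(-0.5 + x1 - x2) else 0 for x1, x2 in X]

beta = fit_propensity(X, z)
e_hat = predict_propensity(beta, X)
```

The fitted values `e_hat` are what the balance diagnostics and the PS methods operate on; the coefficients themselves are not of direct interest.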

The Propensity Score Model. Goal: Covariate balance
What covariates do we include?
- Selection driven by subject-matter knowledge
- Only baseline variables
- Include all confounders and possible non-linear transformations (e.g. interactions)
- Overfitting is generally not an issue (unless treatment is uncommon)
- Always include variables that affect the outcome, even if they don't affect treatment assignment (Brookhart et al., 2006)

Diagnostics
How do we know the PS model has been adequately specified?
- Assess standardized differences of each covariate between treatment groups (very useful)
- Assess PS distributions by treatment (need common support condition)
- Compare distributions of the covariates between treatments
- Varies with PS method
- Difficult in practice with high-dimensional data
- Assess the sensitivity of study conclusions to the SITA assumption
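The standardized difference for a continuous covariate can be sketched as follows. This assumes the common definition that divides the difference in group means by the square root of the average of the two group variances; the function name and the toy data are hypothetical.

```python
import math
from statistics import mean, variance

def standardized_difference(x_treated, x_control):
    """Difference in means divided by the pooled SD, sqrt((s_t^2 + s_c^2) / 2)."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2.0)
    return (mean(x_treated) - mean(x_control)) / pooled_sd

# Toy covariate values for the two treatment groups
d = standardized_difference([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(round(d, 3))
```

Unlike a p-value, this measure does not depend on sample size, which is one reason it is favoured for balance checks. An absolute value below about 0.1 is a frequently cited rule of thumb for acceptable balance, not a formal threshold.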

Methods utilizing PS
1. Matching
2. Stratification
3. Inverse PS weighting
4. Covariate adjustment by PS
- PS methods allow for estimation of the marginal treatment effect
- The first three separate the design of the study from the analysis of the study
- Can do subsequent regression adjustment to eliminate residual imbalance in prognostically important covariates after applying any of the first three PS methods

Matching
Simple formulation for ATT:
- For each treated subject, select a single untreated subject (without replacement) with the same value of $\hat{e}(x)$ or its logit (R&R, 1985)
- Take the difference of outcomes for each matched pair and average over all matched pairs
- Calculating ATE and ATU requires slightly different sampling, possibly with replacement
- Advantage: Eliminates a large proportion of systematic differences in baseline characteristics between treated and untreated subjects
- Disadvantage: Inexact matching may lead to bias. Unmatched individuals are discarded, leading to loss in statistical power. Discarding individuals may also alter our estimand (Hill, 2008)
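A minimal sketch of the ATT formulation above, assuming greedy 1:1 nearest-neighbour matching on the logit of the PS. The greedy order, the optional `caliper` argument, and the function name `match_att` are illustrative choices, not something the source prescribes; optimal matching and other variants exist.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def match_att(ps, z, y, caliper=None):
    """Greedy 1:1 nearest-neighbour matching on logit(PS), without replacement.

    Returns (ATT estimate, number of matched pairs).
    """
    lp = [logit(p) for p in ps]
    treated = [i for i, zi in enumerate(z) if zi == 1]
    available = {i for i, zi in enumerate(z) if zi == 0}
    diffs = []
    for i in treated:
        if not available:
            break                                # no untreated subjects left to match
        # sorted() makes tie-breaking deterministic
        j = min(sorted(available), key=lambda k: abs(lp[i] - lp[k]))
        if caliper is not None and abs(lp[i] - lp[j]) > caliper:
            continue                             # no acceptable match: treated subject is discarded
        available.remove(j)                      # without replacement
        diffs.append(y[i] - y[j])
    if not diffs:
        raise ValueError("no matched pairs")
    return sum(diffs) / len(diffs), len(diffs)

# Toy data: three treated subjects, four untreated subjects
ps = [0.8, 0.6, 0.3, 0.75, 0.55, 0.35, 0.1]
z = [1, 1, 1, 0, 0, 0, 0]
y = [5.0, 4.0, 3.0, 3.5, 3.0, 2.5, 1.0]
att, n_pairs = match_att(ps, z, y)
print(att, n_pairs)
```

In the toy data the untreated subject with PS 0.1 is never matched and is dropped, which illustrates the loss of data noted as a disadvantage.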

Stratification
Easily estimate ATE:
- Create quantiles (e.g. quintiles) of the PS values, thereby dividing the subjects into equal-sized strata
- Within each stratum, estimate the treatment effect
- Calculate a weighted average of the within-stratum estimates of treatment effect. The weight of each stratum is simply its share of the sample (e.g. 1/5 for quintiles)
- Estimating ATT and ATU requires weighting by the fraction of treated or untreated individuals, respectively, per stratum
- Advantage: Easy to construct and to estimate causal effects
- Disadvantage: A small number of strata may result in residual confounding within the strata, resulting in bias. ATT estimates largely biased (compared to weighting)
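The stratified estimator can be sketched as below. The weighting convention is stated here as an assumption: stratum size for the ATE, number of treated subjects per stratum for the ATT, and number of untreated subjects for the ATU. The function name `stratified_effect` and the toy data are hypothetical.

```python
def stratified_effect(ps, z, y, n_strata=5, estimand="ATE"):
    """Weighted average of within-stratum differences in mean outcome."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])     # subjects sorted by PS
    n = len(order)
    strata = [order[k * n // n_strata:(k + 1) * n // n_strata] for k in range(n_strata)]
    total, total_weight = 0.0, 0.0
    for idx in strata:
        t = [y[i] for i in idx if z[i] == 1]
        c = [y[i] for i in idx if z[i] == 0]
        if not t or not c:
            continue          # stratum lacks one group, so no contrast can be formed
        effect = sum(t) / len(t) - sum(c) / len(c)
        # Assumed weights: stratum size (ATE), treated count (ATT), untreated count (ATU)
        w = {"ATE": len(idx), "ATT": len(t), "ATU": len(c)}[estimand]
        total += w * effect
        total_weight += w
    return total / total_weight

# Toy data with two strata: within-stratum effects are 1 (low PS) and 3 (high PS)
ps = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
z = [0, 0, 0, 1, 0, 1, 1, 1]
y = [1.0, 1.0, 1.0, 2.0, 2.0, 5.0, 5.0, 5.0]
ate = stratified_effect(ps, z, y, n_strata=2, estimand="ATE")
att = stratified_effect(ps, z, y, n_strata=2, estimand="ATT")
atu = stratified_effect(ps, z, y, n_strata=2, estimand="ATU")
print(ate, att, atu)
```

Because most treated subjects sit in the high-PS stratum, where the effect is larger, the ATT exceeds the ATE in this example and the ATU falls below it.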

Inverse weighting
- Weighted linear regression of outcome on treatment, with weights $w = Z\,w_1 + (1 - Z)\,w_2$
- For ATE: $w_1 = 1/e(x)$, $w_2 = 1/(1 - e(x))$
- For ATT: $w_1 = 1$, $w_2 = e(x)/(1 - e(x))$
- For ATU: $w_1 = (1 - e(x))/e(x)$, $w_2 = 1$ (Morgan & Todd, 2008)
- Advantage: Uses all available data; can deal with more complex non-linear link functions (e.g. odds ratio); generally less biased than stratification (Lunceford & Davidian, 2004)
- Disadvantage: An individual with PS close to 0 or 1 will have unstable weights, leading to potentially spurious treatment effects with high variance and wide CIs
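A minimal sketch of PS weighting, assuming the standard weights: inverse probability of the treatment actually received for the ATE, and odds-type weights for the ATT and ATU. With a single binary regressor, weighted least squares of the outcome on treatment reduces to the difference in weighted group means, which is what this sketch computes. The function names and the toy data are hypothetical.

```python
def ps_weight(z, e, estimand="ATE"):
    """Weight for one subject with treatment z and propensity score e."""
    if estimand == "ATE":
        return 1.0 / e if z == 1 else 1.0 / (1.0 - e)
    if estimand == "ATT":
        return 1.0 if z == 1 else e / (1.0 - e)
    if estimand == "ATU":
        return (1.0 - e) / e if z == 1 else 1.0
    raise ValueError("estimand must be 'ATE', 'ATT' or 'ATU'")

def weighted_effect(ps, z, y, estimand="ATE"):
    """Difference in weighted mean outcomes, treated minus untreated."""
    w = [ps_weight(zi, ei, estimand) for zi, ei in zip(z, ps)]

    def wmean(group):
        num = sum(wi * yi for wi, zi, yi in zip(w, z, y) if zi == group)
        den = sum(wi for wi, zi in zip(w, z) if zi == group)
        return num / den

    return wmean(1) - wmean(0)

# Toy data: PS is 0.25 for the first four subjects and 0.75 for the last four
ps = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
z = [1, 0, 0, 0, 1, 1, 1, 0]
y = [2.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 2.0]
ate = weighted_effect(ps, z, y, "ATE")
att = weighted_effect(ps, z, y, "ATT")
atu = weighted_effect(ps, z, y, "ATU")
print(ate, att, atu)
```

The instability mentioned above is visible in `ps_weight`: as `e` approaches 0 or 1, one of the denominators approaches zero and a single subject can dominate the weighted mean.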

Covariate adjustment using PS
- Fit model: $E(Y \mid Z, X) = \alpha + \beta Z + f(e(x))$ (may include interaction of $Z$ and $e(x)$)
- Can obtain ATE, ATT, and ATU by evaluating at different values of $\hat{e}(x)$
- Advantage: Allows for a flexible relationship between PS and outcome (e.g. use of splines for PS)
- Disadvantage: Sensitive to whether the PS has been accurately estimated. The analyst may be tempted to work toward a desired or anticipated result, given that the outcome is in sight

Final Thoughts
- PS methods can be done without reference to the outcome, i.e. they separate study design from analysis
- Balance of covariates can be easily checked
- PS methods are more robust to model misspecification than traditional outcome regression (all we care about is balance)
- They measure a different quantity, namely the marginal/population treatment effect (vs. the conditional/individual treatment effect in traditional regression)
- Important to distinguish the two in relation to study goals
- Omitted variable bias affects the internal validity of both approaches similarly
- Strategy so far is to balance covariates. Another idea is to find an "instrument" $S$ that is randomly assigned and affects $Y$ only through $Z$