Combining Experimental and Non-Experimental Design in Causal Inference. Kari Lock Morgan, Department of Statistics, Penn State University. Rao Prize Conference, May 12th, 2017.
A Tribute to Don: Design trumps analysis; Motivated by a real study; Experimental design & rerandomization; Observational study & propensity scores; Rubin causal model & potential outcomes; Educational testing (AP scores); (Missing data); (Noncompliance).
Design trumps Analysis: "For Objective Causal Inference, Design Trumps Analysis" (Rubin 2008). $X$ = covariates, $W$ = treatment, $Y$ = outcome(s). Design: $W \mid X$ (balance covariates). Analysis: $Y \mid W, X$. As much as possible should be done without observed outcomes!
Knowledge in Action: The goal is to estimate the causal effect of Knowledge in Action (KIA), a form of project-based learning, in AP classes on AP scores and other outcomes. Part 1 (Efficacy Study): randomize schools to KIA or control; compare outcomes after 1 year. Part 2 (Maturation Study): continue to follow schools another year (experimental & observational).
Districts (blocks): District 1, ..., District 5 > Schools (clusters): RANDOMIZATION > Teachers > Students: OUTCOMES. *In this talk I'll just focus on one district.
Covariates: Available at randomization are school covariates (e.g. Title 1 status, type, etc.); teacher covariates (e.g. years of experience); and previous student (class) covariates (race/ethnicity, poverty status, parental education, PSAT scores ($x_1$), 8th grade standardized test scores, total number of students, number of students who took the AP exam). If covariates are available, we should use them when we randomize! Two covariates, $x_1$ and $x_2$, were used for randomization.
Rerandomization: 1. Collect covariate data. 2. Specify criteria for acceptable balance. 3. (Re)randomize units to treatment groups. 4. Check balance: $|\bar{x}_{1,T} - \bar{x}_{1,C}| < 0.05$ and $|\bar{x}_{2,T} - \bar{x}_{2,C}| < 0.05$; if unacceptable, return to step 3; if acceptable, continue. 5. Conduct experiment. 6. Analyze results.
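The rerandomization loop can be sketched in a few lines of Python. This is a minimal illustration and not the study's actual procedure: the function names, the simulated covariates, and the group sizes are all assumptions made for the example; only the 0.05 threshold on each difference in means comes from the slide.

```python
import random
from statistics import mean


def diff_in_means(x, assignment):
    """Treated-minus-control difference in means of one covariate."""
    treated = [xi for xi, w in zip(x, assignment) if w == 1]
    control = [xi for xi, w in zip(x, assignment) if w == 0]
    return mean(treated) - mean(control)


def rerandomize(covariates, n_treated, threshold=0.05, max_tries=100_000, seed=None):
    """Redraw the assignment until every covariate is acceptably balanced.

    covariates: list of covariate columns, each a list with one value per unit.
    Returns (assignment, number_of_draws_used).
    """
    rng = random.Random(seed)
    n = len(covariates[0])
    base = [1] * n_treated + [0] * (n - n_treated)
    for tries in range(1, max_tries + 1):
        assignment = base[:]
        rng.shuffle(assignment)  # one pure randomization
        if all(abs(diff_in_means(x, assignment)) < threshold for x in covariates):
            return assignment, tries  # acceptable balance: stop
    raise RuntimeError("no acceptable randomization found; loosen the threshold")


# Hypothetical standardized covariates for 20 units (not the study's data).
data_rng = random.Random(0)
x1 = [data_rng.gauss(0, 1) for _ in range(20)]
x2 = [data_rng.gauss(0, 1) for _ in range(20)]

assignment, tries = rerandomize([x1, x2], n_treated=10, threshold=0.05, seed=1)
print("draws needed:", tries)
print("x1 difference:", diff_in_means(x1, assignment))
print("x2 difference:", diff_in_means(x2, assignment))
```

The balance criterion is fixed before any assignment is drawn and uses covariates only, so no outcome data enters the design stage.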
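Covariate Balance: Empirical. [Figure: distributions of $\bar{x}_T - \bar{x}_C$ under pure randomization and rerandomization, for $x_1$ (PRIV = 98.4%) and $x_2$ (PRIV = 97.4%).] Percent reduction in variance: $\text{PRIV} = \dfrac{\operatorname{var}(\bar{x}_{j,T} - \bar{x}_{j,C}) - \operatorname{var}(\bar{x}_{j,T} - \bar{x}_{j,C} \mid \text{rerand.})}{\operatorname{var}(\bar{x}_{j,T} - \bar{x}_{j,C})}$.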
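Covariate Balance: Theoretical. Suppose $\bar{x}_{j,T} - \bar{x}_{j,C} \sim$ Normal for $j = 1, \dots, k$, with $x_1 \perp x_2 \perp \dots \perp x_k$. Rerandomize if $|\bar{x}_{j,T} - \bar{x}_{j,C}| > a_j$ for any $j = 1, \dots, k$. Then the PRIV for $x_j$ is $\text{PRIV}_{x_j} = 1 - 2\,\dfrac{\gamma\!\left(\frac{3}{2},\, \frac{a_j^2}{2\operatorname{var}(\bar{x}_{j,T} - \bar{x}_{j,C})}\right)}{\gamma\!\left(\frac{1}{2},\, \frac{a_j^2}{2\operatorname{var}(\bar{x}_{j,T} - \bar{x}_{j,C})}\right)}$, where $\gamma(b, c) \equiv \int_0^c y^{b-1} e^{-y}\,dy$. Here $\text{PRIV}_{x_1} = 0.984$ and $\text{PRIV}_{x_2} = 0.973$.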
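As a rough numerical companion to the theoretical PRIV, the sketch below evaluates the variance reduction for a normal difference in means that is accepted only when its absolute value is at most a threshold. It uses the closed form for the variance of a symmetrically truncated normal, which is equivalent to the incomplete gamma ratio but needs only the standard library. The function name and the example inputs are assumptions for illustration; the standard deviation of the difference in means in the study is not given on the slide.

```python
import math


def priv_threshold(a, sd):
    """PRIV for a N(0, sd^2) difference in means accepted only when |difference| <= a.

    With c = a / sd, the accepted draws have variance sd^2 * (1 - 2*c*phi(c) / P(|Z| <= c)),
    so the percent reduction in variance is 2*c*phi(c) / P(|Z| <= c).
    """
    c = a / sd
    pdf_at_c = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)  # standard normal density
    accept_prob = math.erf(c / math.sqrt(2.0))                     # P(|Z| <= c)
    return 2.0 * c * pdf_at_c / accept_prob


# Hypothetical inputs: threshold 0.05 on a difference in means with sd 0.25.
print(priv_threshold(a=0.05, sd=0.25))

# A tighter threshold gives a larger variance reduction.
for a in (0.5, 0.2, 0.05):
    print(a, priv_threshold(a, sd=0.25))
```

Only the ratio of the threshold to the standard deviation matters, which is why a fixed cutoff such as 0.05 buys more balance when the unrestricted difference in means is more variable.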
Outcome PRIV: If rerandomization is equal percent variance reducing (EPVR), then the PRIV for the outcome difference in means is $\text{PRIV}_Y = R^2 \cdot \text{PRIV}_x$. Here, $R^2 \approx 0.75$ and $\text{PRIV}_x \approx$ 98%, so $\text{PRIV}_Y \approx 0.75 \times 0.98 \approx$ 74%. Precision increases by a factor of $\frac{1}{1 - 0.74} = 3.85$. Equivalent to almost quadrupling $n$!!! (Effective sample size goes from 76 to 293!)
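The arithmetic on this slide can be checked directly. The helper names below are made up for the example; the inputs (0.75, 0.98, 74%, and n = 76) are the slide's values.

```python
def outcome_priv(r_squared, covariate_priv):
    """PRIV for the outcome difference in means under an EPVR rerandomization."""
    return r_squared * covariate_priv


def precision_factor(priv):
    """Factor by which precision (1 / variance) increases for a given PRIV."""
    return 1.0 / (1.0 - priv)


priv_y = outcome_priv(0.75, 0.98)   # about 0.735, reported as 74% on the slide
factor = precision_factor(0.74)     # uses the rounded 74%, as the slide does
effective_n = 76 * factor           # sample size giving the same precision without rerandomization

print(round(priv_y, 3), round(factor, 2), round(effective_n, 1))
```

The result is sensitive to rounding: using the unrounded 0.735 in place of 0.74 gives a somewhat smaller factor, so the "almost quadrupling" figure is best read as approximate.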
Correlational Structure: [Figure: joint distribution of $\bar{x}_{1,T} - \bar{x}_{1,C}$ and $\bar{x}_{2,T} - \bar{x}_{2,C}$.]
Affine Invariance: Rerandomization stays the same for any affine transformation $a + bx$. If the rerandomization criterion is affinely invariant and $x$ is ellipsoidally symmetric, then: 1. $E(\bar{X}_T - \bar{X}_C \mid \text{rerand.}) = E(\bar{X}_T - \bar{X}_C) = 0$, so rerandomization leads to unbiased estimates for any linear function of $x$. 2. $\operatorname{cov}(\bar{X}_T - \bar{X}_C \mid \text{rerand.}) \propto \operatorname{cov}(\bar{X}_T - \bar{X}_C)$, which preserves the correlations of $\bar{X}_T - \bar{X}_C$ and makes the balance improvement equal for each $x_j$ (equal percent variance reducing). (Morgan and Rubin, Annals of Statistics, 2012)
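Mahalanobis distance: $(\bar{X}_T - \bar{X}_C)' \left[\operatorname{cov}(\bar{X}_T - \bar{X}_C)\right]^{-1} (\bar{X}_T - \bar{X}_C)$. [Figure: distributions of $\bar{x}_T - \bar{x}_C$ under pure randomization and rerandomization; PRIV = 97.4% for both $x_1$ and $x_2$.]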
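A small sketch of the Mahalanobis criterion for two covariates follows, with the 2x2 inverse written out by hand so that only the standard library is needed. The function names, simulated data, and acceptance threshold of 0.1 are illustrative assumptions. The covariance of the difference in means is taken to be the sample covariance of the covariates scaled by $1/n_T + 1/n_C$, which holds under pure randomization.

```python
import random
from statistics import mean


def mean_diff(x, assignment):
    """Treated-minus-control difference in means of one covariate."""
    treated = [xi for xi, w in zip(x, assignment) if w == 1]
    control = [xi for xi, w in zip(x, assignment) if w == 0]
    return mean(treated) - mean(control)


def sample_cov(u, v):
    """Sample covariance with the usual n - 1 denominator."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def mahalanobis_two_covariates(x1, x2, assignment):
    """Mahalanobis distance between treated and control covariate means (k = 2)."""
    n_t = sum(assignment)
    n_c = len(assignment) - n_t
    scale = 1.0 / n_t + 1.0 / n_c  # cov(mean difference) = scale * cov(x)
    s11 = scale * sample_cov(x1, x1)
    s22 = scale * sample_cov(x2, x2)
    s12 = scale * sample_cov(x1, x2)
    d1, d2 = mean_diff(x1, assignment), mean_diff(x2, assignment)
    det = s11 * s22 - s12 * s12
    # d' S^{-1} d, using the explicit inverse of a 2x2 matrix
    return (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det


def rerandomize_mahalanobis(x1, x2, n_treated, threshold, seed=None, max_tries=100_000):
    """Redraw the assignment until the Mahalanobis distance is at most `threshold`."""
    rng = random.Random(seed)
    base = [1] * n_treated + [0] * (len(x1) - n_treated)
    for _ in range(max_tries):
        assignment = base[:]
        rng.shuffle(assignment)
        if mahalanobis_two_covariates(x1, x2, assignment) <= threshold:
            return assignment
    raise RuntimeError("no acceptable randomization found; loosen the threshold")


# Hypothetical data for 20 units (not the study's data).
data_rng = random.Random(0)
x1 = [data_rng.gauss(0, 1) for _ in range(20)]
x2 = [0.5 * a + data_rng.gauss(0, 1) for a in x1]  # correlated with x1

assignment = rerandomize_mahalanobis(x1, x2, n_treated=10, threshold=0.1, seed=1)
m_original = mahalanobis_two_covariates(x1, x2, assignment)

# Affine invariance: rescaling and shifting a covariate leaves the distance unchanged.
x1_transformed = [3.0 + 2.0 * v for v in x1]
m_transformed = mahalanobis_two_covariates(x1_transformed, x2, assignment)
print(m_original, m_transformed)
```

Because the distance is unchanged by affine transformations, the same assignments are accepted regardless of the covariates' units, unlike separate per-covariate thresholds.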
Knowledge in Action: Part 1 (Efficacy Study): randomize schools to KIA or control; compare outcomes after 1 year. Part 2 (Maturation Study): continue to follow schools another year (experimental & observational).
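[Diagram, timeline 2015-2016, 2016-2017, 2017-2018. Covariate data for schools in RCT > RANDOMIZE > Wave 1: KIA (2016-2017), KIA: 2nd year (2017-2018); Wave 2: no KIA (2016-2017), KIA: 1st year (2017-2018). Covariate data for schools not in RCT > MATCHING > Matched sample: no KIA. Comparisons: Wave 1 vs. Wave 2 in 2016-2017 is 1 year of KIA vs. no KIA; Wave 1 vs. Wave 2 in 2017-2018 is 2 years of KIA vs. 1 year of KIA; Wave 1 vs. matched sample in 2017-2018 is 2 years of KIA vs. no KIA. Question: 2 years of KIA vs. no KIA?]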
2 years of KIA vs. no KIA? WHICH IS BETTER??? Non-experimental direct approach: Wave 1 (KIA: 2nd year) vs. matched sample in 2017-2018, i.e. 2 years of KIA vs. no KIA. Experimental indirect approach: Wave 1 vs. Wave 2, combining 1 year of KIA vs. no KIA (2016-2017) with 2 years of KIA vs. 1 year of KIA (2017-2018).
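Potential Outcomes & Estimands: $Y_j(W_j, t)$ = potential outcome for school $j$ under treatment $W_j$ in year $t$. Causal effect: compare potential outcomes under different treatments. $\tau_{1,t} \equiv \bar{Y}(1,t) - \bar{Y}(0,t) = \frac{\sum_{j=1}^{n} Y_j(1,t)}{n} - \frac{\sum_{j=1}^{n} Y_j(0,t)}{n}$; $\tau_{2-1,t} \equiv \bar{Y}(2,t) - \bar{Y}(1,t) = \frac{\sum_{j=1}^{n} Y_j(2,t)}{n} - \frac{\sum_{j=1}^{n} Y_j(1,t)}{n}$; $\tau_{2,t} \equiv \bar{Y}(2,t) - \bar{Y}(0,t) = \frac{\sum_{j=1}^{n} Y_j(2,t)}{n} - \frac{\sum_{j=1}^{n} Y_j(0,t)}{n}$. *Note: difference in means presented for clarity; actual analysis to use HLM.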
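Estimators: $\hat{\tau}_{1,2017} \equiv \frac{\sum_{j=1}^{n} W_j Y_j(1,2017)}{\sum_{j=1}^{n} W_j} - \frac{\sum_{j=1}^{n} (1 - W_j) Y_j(0,2017)}{\sum_{j=1}^{n} (1 - W_j)}$; $\hat{\tau}_{2-1,2018} \equiv \frac{\sum_{j} I(W_j = 2)\, Y_j(2,2018)}{\sum_{j} I(W_j = 2)} - \frac{\sum_{j} I(W_j = 1)\, Y_j(1,2018)}{\sum_{j} I(W_j = 1)}$; $\hat{\tau}_{2,2018} \equiv \frac{\sum_{j} I(W_j = 2)\, Y_j(2,2018)}{\sum_{j} I(W_j = 2)} - \frac{\sum_{j} I(W_j = 0)\, Y_j(0,2018)}{\sum_{j} I(W_j = 0)}$.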
Propensity Score Matching: $W_j = 1$ if in Wave 1 of experiment, $W_j = 0$ if not in experiment. Propensity score: $e_j = P(W_j = 1 \mid x_j)$. Match each Wave 1 teacher with a control with a similar propensity score. Criteria for success: quality of observed covariate data (can only balance observed data); good matches available (adequate overlap between groups, large enough pool of potential controls).
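The matching step can be illustrated with a greedy 1:1 nearest-neighbor match on the propensity score, without replacement and with an optional caliper. This is one common variant chosen for simplicity; the slide does not say which matching algorithm the study uses. The sketch assumes the propensity scores have already been estimated (for example by logistic regression), and the function name and scores below are made up.

```python
def match_nearest(treated_scores, control_scores, caliper=None):
    """Greedy 1:1 nearest-neighbor matching on the propensity score, without replacement.

    Returns {treated_index: control_index}. A treated unit is left unmatched
    when its nearest available control is farther away than the caliper.
    """
    available = set(range(len(control_scores)))
    matches = {}
    for i, e_t in enumerate(treated_scores):
        if not available:
            break  # pool of potential controls is exhausted
        # sorted() makes tie-breaking deterministic (lowest index wins)
        j = min(sorted(available), key=lambda c: abs(control_scores[c] - e_t))
        if caliper is None or abs(control_scores[j] - e_t) <= caliper:
            matches[i] = j
            available.remove(j)
    return matches


# Hypothetical estimated propensity scores.
wave1_scores = [0.62, 0.35, 0.80]
pool_scores = [0.30, 0.60, 0.10, 0.58, 0.95]

print(match_nearest(wave1_scores, pool_scores))               # every Wave 1 unit matched
print(match_nearest(wave1_scores, pool_scores, caliper=0.1))  # poor matches dropped
```

The caliper run shows both success criteria at work: without adequate overlap or a large enough control pool, some Wave 1 units either get poor matches or go unmatched.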
Propensity Score Matching: If we have good matches, we can balance observed covariates. Key point: unless we have data on all relevant covariates (which we won't), there will still be bias (baseline differences). This bias is usually hard to quantify, BUT we have a very rare feature!!
We can validate the nonexperimental approach by comparing 1-year impact estimates! In 2016-2017, 1 year of KIA vs. no KIA can be estimated both experimentally (Wave 1 vs. Wave 2) and non-experimentally (Wave 1 vs. matched sample); in 2017-2018, Wave 1 vs. matched sample gives 2 years of KIA vs. no KIA.
2 years of KIA vs. no KIA? WHICH IS BETTER??? The non-experimental direct approach (Wave 1 vs. matched sample) or the experimental indirect approach (Wave 1 vs. Wave 2 in 2016-2017 and 2017-2018)?
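Experimental Indirect Approach: $\tau_{2-1,2018} + \tau_{1,2017} = \left[\bar{Y}(2,2018) - \bar{Y}(1,2018)\right] + \left[\bar{Y}(1,2017) - \bar{Y}(0,2017)\right]$. Critical assumption: potential outcomes may depend on year, but treatment effects do not. That is, $\bar{Y}(1,2017) \neq \bar{Y}(1,2018)$, but $\tau_{1,2017} = \tau_{1,2018} \equiv \tau_1$. This implies $\tau_1 + \tau_{2-1} = \tau_2$.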
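Unbiased: Define $\hat{\tau}_2 \equiv \hat{\tau}_1 + \hat{\tau}_{2-1}$. Theorem: Assuming treatment effects do not vary by year, $E(\hat{\tau}_2) = \tau_2$. Proof: $E(\hat{\tau}_2) = E(\hat{\tau}_1 + \hat{\tau}_{2-1}) = \tau_1 + \tau_{2-1} = \tau_2$.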
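Variance: $\operatorname{var}(\hat{\tau}_2) = \operatorname{var}(\hat{\tau}_1 + \hat{\tau}_{2-1}) = \operatorname{var}(\hat{\tau}_1) + \operatorname{var}(\hat{\tau}_{2-1}) + 2\operatorname{cov}(\hat{\tau}_1, \hat{\tau}_{2-1})$. Both estimates are comparisons of the same teachers, so they are likely to be highly positively correlated. The result is more than double the variance of each individual estimate.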
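Constant Treatment Effect? Suppose a constant treatment effect, so $Y_j(1,t) = Y_j(0,t) + \tau_1$ and $Y_j(2,t) = Y_j(1,t) + \tau_{2-1}$ for all $j$. Then, writing $\bar{Y}_1$ and $\bar{Y}_2$ for the Wave 1 and Wave 2 means: $\hat{\tau}_1 = \tau_1 + \bar{Y}_1(0,2017) - \bar{Y}_2(0,2017)$ and $\hat{\tau}_{2-1} = \tau_{2-1} + \bar{Y}_1(0,2018) - \bar{Y}_2(0,2018)$. Under additivity, and if we again assume differences in time cancel with comparisons within the same year, then $\hat{\tau}_1$ and $\hat{\tau}_{2-1}$ are perfectly correlated! $\operatorname{var}(\hat{\tau}_2) = \operatorname{var}(\hat{\tau}_1) + \operatorname{var}(\hat{\tau}_{2-1}) + 2\sqrt{\operatorname{var}(\hat{\tau}_1)\operatorname{var}(\hat{\tau}_{2-1})}$. If $\operatorname{var}(\hat{\tau}_1) \approx \operatorname{var}(\hat{\tau}_{2-1})$, then $\operatorname{var}(\hat{\tau}_2) \approx 4\operatorname{var}(\hat{\tau}_1)$.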
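The variance inflation can be made concrete with a short calculation. The function name and the unit variances are assumptions for illustration; the formula is the usual variance of a sum, with the covariance written as correlation times the product of standard deviations.

```python
import math


def var_of_sum(var_a, var_b, correlation):
    """Variance of the sum of two estimators with the given correlation."""
    return var_a + var_b + 2.0 * correlation * math.sqrt(var_a * var_b)


# Two estimates with equal (unit) variance, at increasing correlation.
for rho in (0.0, 0.5, 1.0):
    print(rho, var_of_sum(1.0, 1.0, rho))
```

With independent estimates the combined variance is double that of one estimate; with perfect positive correlation it is quadruple, which is the cost of the experimental indirect approach.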
2 years of KIA vs. no KIA? WHICH IS BETTER??? BIAS-VARIANCE TRADEOFF! The non-experimental direct approach (Wave 1 vs. matched sample) risks bias from unobserved covariates; the experimental indirect approach (Wave 1 vs. Wave 2) is unbiased under its assumption but has higher variance. Complementary!
Other Interesting Tidbits: Student-level versus school-level analysis; Combined analyses?; Student/parental consent => missing data; Joiners; Non-compliance; Teachers switching schools/courses; Anticipation bias; and more!
Conclusion: Rerandomization can improve experimental design; Propensity score matching can improve observational studies; Bias-variance tradeoff for 2-year impact; Lots of fun statistics in rich applied problems!
klm47@psu.edu. Funded by George Lucas Educational Foundation. Joint work with Anna Saavedra, Amie Rappaport, Ying Liu, and Juan Saavedra.