Analysis of propensity score approaches in difference-in-differences designs

Author: Diego A. Luna Bazaldua
Institution: Lynch School of Education, Boston College
Contact:
Conference section: Research methods

Background. Difference-in-differences (DD) designs compare the mean difference between treatment and control groups after an intervention is implemented with the corresponding difference at the pre-intervention baseline (Somers, Zhu, Jacob, & Bloom, 2013). Earlier work by Abadie (2005) and Stuart et al. (2014) expresses DD within the Rubin Causal Model framework (RCM; Rubin, 1974) to estimate treatment effects, with the added requirement that DD needs longitudinal outcome measures for each treatment and control case before and after the intervention is implemented. Because pre- and post-implementation measures are available, the counterfactual assumption in DD is that, had the intervention not been implemented, both groups would have shown similar average changes over time (Somers et al., 2013). The design therefore relies on the assumption that the treatment and control groups are equivalent.

Nevertheless, observational and quasi-experimental methodological research recognizes that when these groups differ on relevant characteristics, the observed changes over time are likely to be explained by those preexisting differences rather than by the effect of the intervention on the outcome (Shadish, Cook, & Campbell, 2002; Rubin, 2005). In longitudinal quasi-experiments such as DD, if there is no guarantee of equivalence between treatment and control groups, alternative hypotheses (e.g., maturation effects, differential attrition, or historical events affecting only one group) could explain the observed outcomes (Shadish, 2010; Shadish & Steiner, 2010).

Previous research has documented approaches to reduce bias in DD that focus on the composition of the control group and its similarity to the treatment group. For this purpose, the literature describes several propensity score (PS) approaches to improve the estimation of treatment effects and to establish equivalence between treatment and comparison groups on relevant characteristics (Fortson, Verbitsky-Savitz, Kopa, & Gleason, 2012; Somers et al., 2013; Song et al., 2012; Stuart et al., 2014). Some authors propose matching treatment and control cases on PS estimates of the binary treatment status (e.g., Somers et al., 2013), whereas others have analyzed inverse weighting by the PS (IWPS) for multiple treatments, where four groups are defined by crossing treatment or control status with the period before or after the intervention is implemented (Stuart et al., 2014).
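As an illustration of the DD contrast described in the Background, the following minimal Python sketch computes the difference in mean change between the treatment and control groups. The function name `dd_estimate`, its arguments, and the toy data are hypothetical and are not taken from the study. The optional `weights` argument only shows where inverse PS weights could enter; how such weights are constructed is outside this sketch.

```python
import numpy as np

def dd_estimate(y, treated, post, weights=None):
    """Difference-in-differences from (optionally weighted) group means.

    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    post = np.asarray(post, dtype=bool)
    # Unit weights reproduce the unweighted (naive) DD contrast.
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)

    def group_mean(is_treated, is_post):
        mask = (treated == is_treated) & (post == is_post)
        return np.average(y[mask], weights=w[mask])

    change_treated = group_mean(True, True) - group_mean(True, False)
    change_control = group_mean(False, True) - group_mean(False, False)
    return change_treated - change_control

# Toy data: treated units move from 3 to 6, control units from 1 to 2.
y = [3.0, 3.0, 6.0, 6.0, 1.0, 1.0, 2.0, 2.0]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
post = [0, 0, 1, 1, 0, 0, 1, 1]
print(dd_estimate(y, treated, post))  # (6 - 3) - (2 - 1) = 2.0
```

With unit weights this is the naive DD contrast; passing PS-based weights changes only the four group means that enter the contrast.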

Purpose. Because some of the research on the use of PS in DD has been illustrated only with empirical data, there is no conclusive evidence about whether IWPS or PS matching provides more accurate estimates of treatment effects in DD or better covariate balance. The present research project therefore analyzes these PS techniques with different PS estimators in a Monte Carlo study.

Methods. The study consisted of four conditions that simulate data in a long format. All conditions shared the following features: units $i = 1, \ldots, 500$, each with four time observations $t$, two pre-intervention and two post-intervention. Standard normally distributed covariates were generated for each unit $i$: two time-varying covariates $X_{ti}$ with covariance equal to 0.30 and two time-constant covariates $W_i$, also with covariance equal to 0.30. The covariance between $X_{ti}$ and $W_i$ was set to zero. For each condition, 500 data sets were generated.

The four conditions were defined by the model used to generate the PS for the binary treatment assignment indicator $Z_{ti}$ and the model used to generate the outcome variable $Y_{ti}$. For the PS, a model with only main effects for the probability of treatment assignment was defined as

$$P(Z_{ti} = 1) = \left\{ 1 + \exp\left( 1 + \beta_1 W_{1i} + \beta_2 W_{2i} + \beta_3 X_{1ti} + \beta_4 X_{2ti} + r_{ti} \right) \right\}^{-1} \tag{1}$$

where $P(Z_{ti} = 1)$ is the probability of being assigned to treatment. Although four PS values are available for each participant $i$ because of the four time points $t$, only the first pre-intervention value, $P(Z_{1i} = 1)$, was used to determine treatment assignment, by comparing it with a random number drawn from a standard uniform distribution. The values of the four coefficients, $\beta = (1, 0.5, 0.35, 0.40)$, were chosen so that a larger proportion of units $i$ was allocated to the control group. The residual term $r_{ti}$ was generated from a normal distribution $N(0, 2)$.
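The data-generating steps for the main-effects PS model can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the study's code: Equation (1) is read as $\{1 + \exp(1 + \beta_1 W_{1i} + \beta_2 W_{2i} + \beta_3 X_{1ti} + \beta_4 X_{2ti} + r_{ti})\}^{-1}$, the 2 in $N(0, 2)$ is read as a variance, and a unit is assumed to be treated when its uniform draw falls below its first pre-intervention PS. The function name `simulate_assignment` is hypothetical.

```python
import numpy as np

def simulate_assignment(n_units=500, n_times=4, seed=0):
    """Simulate covariates, PS values, and treatment status (illustrative)."""
    rng = np.random.default_rng(seed)
    # Standard normal covariates with covariance 0.30 within each pair.
    cov = np.array([[1.0, 0.3], [0.3, 1.0]])
    W = rng.multivariate_normal(np.zeros(2), cov, size=n_units)             # (n, 2)
    X = rng.multivariate_normal(np.zeros(2), cov, size=(n_units, n_times))  # (n, T, 2)
    # Assumption: N(0, 2) denotes variance 2, so the SD is sqrt(2).
    r = rng.normal(0.0, np.sqrt(2.0), size=(n_units, n_times))

    beta = np.array([1.0, 0.5, 0.35, 0.40])
    eta = (1.0
           + beta[0] * W[:, [0]] + beta[1] * W[:, [1]]        # time-constant part
           + beta[2] * X[:, :, 0] + beta[3] * X[:, :, 1] + r)  # time-varying part
    # Assumed functional form of Equation (1).
    ps = 1.0 / (1.0 + np.exp(eta))                             # (n, T)

    # Assumption: treated if the uniform draw is below the PS at t = 1.
    u = rng.uniform(size=n_units)
    Z = (u < ps[:, 0]).astype(int)
    return W, X, ps, Z

W, X, ps, Z = simulate_assignment()
print("proportion treated:", Z.mean())
```

Under these assumptions the average PS is below 0.5, so most units end up in the control group, which matches the stated intent of the coefficient values.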
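A second model for the PS included squared terms for the $W_i$ and interactions between the $X_{ti}$ and the $W_i$:

$$P(Z_{ti} = 1) = \left\{ 1 + \exp\left( \beta_1 W_{1i} + \beta_2 W_{2i} + \beta_3 X_{1ti} + \beta_4 X_{2ti} + \beta_5 W_{1i}^2 + \beta_6 W_{2i}^2 + \beta_7 W_{1i} X_{1ti} + \beta_8 W_{2i} X_{2ti} + r_{ti} \right) \right\}^{-1} \tag{2}$$

where the eight coefficients, $\beta = (1, 0.5, 0.35, 0.4, 0.5, 0.9, 0.8, 1)$, were also chosen to place a larger proportion of units $i$ in the control group.

The first model for the outcome $Y_{ti}$ was

$$Y_{ti} = \beta_1 Z_i + \beta_2 P_{ti} + \beta_3 Z_i P_{ti} + \beta_4 W_{1i} + \beta_5 W_{2i} + \beta_6 X_{1ti} + \beta_7 X_{2ti} + \beta_8 W_{1i} W_{2i} X_{1ti} X_{2ti} + r_{ti} \tag{3}$$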

where the eight coefficients, $\beta = (0.5, 0.5, 1, 0.67, 0.3, 0.45, 0.2, 0.02)$, indicate the effect of assignment to the treatment group $Z_i$, the change from pre- to post-implementation $P_{ti}$, the DD treatment effect $Z_i P_{ti}$ resulting from the outcome change over time for those assigned to the treatment, and the effects of the covariates $W_i$ and $X_{ti}$.

A second model for the outcome $Y_{ti}$ was

$$Y_{ti} = \beta_1 Z_i + \beta_2 P_{ti} + \beta_3 Z_i P_{ti} + \beta_4 W_{1i} + \beta_5 W_{2i} + \beta_6 X_{1ti} + \beta_7 X_{2ti} + \beta_8 W_{1i}^2 + \beta_9 W_{2i}^2 + \beta_{10} W_{1i} X_{1ti} + \beta_{11} W_{2i} X_{2ti} + \beta_{12} W_{1i} W_{2i} X_{1ti} X_{2ti} + r_{ti} \tag{4}$$

with coefficients $\beta = (0.5, 0.5, 1, 0.67, 0.3, 0.45, 0.2, 0.5, 0.37, 0.32, 1, 0.02)$. Combining these models for the PS and the outcome produced the four data-generating conditions (see Table 1).

Finally, each generated data set was analyzed using different models to estimate the PS for binary treatment assignment and for multiple treatment assignment with four groups: logistic regression, multinomial regression, generalized boosted models, and neural networks (Stuart et al., 2014; Keller, Kim, & Steiner, 2015; McCaffrey et al., 2013; Shadish et al., 2002). The PS for binary treatment status were used to match treatment and control cases. The PS for multiple treatment assignment were used as inverse weights in DD models.

Results. As shown in Table 2, the methods using inverse weights achieve better covariate balance than the matching techniques. Additionally, Tables 3 to 5 show that the IWPS methods also estimate the temporal and treatment effects more accurately, especially when generalized boosted models or neural networks are used to estimate the PS.

References

Abadie, A. (2005). Semiparametric difference-in-differences estimators. The Review of Economic Studies, 72(1).

Fortson, K., Verbitsky-Savitz, N., Kopa, E., & Gleason, P. (2012). Using an experimental evaluation of charter schools to test whether nonexperimental comparison group methods can replicate experimental impacts (NCEE Technical Methods Report). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.

Keller, B., Kim, J.-S., & Steiner, P. M. (2015). Neural networks for propensity score estimation: Simulation results and recommendations. In L. A. van der Ark, D. M. Bolt, S. M. Chow, J. A. Douglas, & W. C. Wang (Eds.), Quantitative psychology research. New York, NY: Springer.

McCaffrey, D. F., Griffin, B. A., Almirall, D., Slaughter, M. E., Ramchand, R., & Burgette, L. F. (2013). A tutorial on propensity score estimation for multiple treatments using generalized boosted models. Statistics in Medicine, 32(19).

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5).

Rubin, D. B. (2005). Causal inference using potential outcomes. Journal of the American Statistical Association, 100(469).

Shadish, W. R. (2010). Campbell and Rubin: A primer and comparison of their approaches to causal inference in field settings. Psychological Methods, 15(1).

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Shadish, W. R., & Steiner, P. M. (2010). A primer on propensity score analysis. Newborn and Infant Nursing Reviews, 10(1).

Somers, M., Zhu, P., Jacob, R., & Bloom, H. (2013). The validity and precision of the comparative interrupted time series design and the difference-in-difference design in educational evaluation (MDRC working paper in research methodology).

Song, Z., Safran, D. G., Landon, B. E., Landrum, M. B., He, Y., Mechanic, R. E., Day, M. P., & Chernew, M. E. (2012). The alternative quality contract, based on a global budget, lowered medical spending and improved quality. Health Affairs, 31(8).

Stuart, E. A., Huskamp, H. A., Duckworth, K., Simmons, J., Song, Z., Chernew, M. E., & Barry, C. L. (2014). Using propensity scores in difference-in-differences models to estimate the effects of a policy change. Health Services and Outcomes Research Methodology, 14(4).

Tables

Table 1. Conditions in the simulation study.

Condition   PS model       Outcome model
1           Equation (1)   Equation (3)
2           Equation (2)   Equation (3)
3           Equation (1)   Equation (4)
4           Equation (2)   Equation (4)

Table 2. Average standardized absolute mean difference between treatment and comparison groups.

[Table body not recoverable. Surviving column labels: Condition, Naïve, Regression, Boosted Model.]

Note: The four conditions are specified in Table 1 in terms of the models used to generate the propensity scores and the outcome. The values in the naïve category are calculated before the use of any PS technique.
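The balance measure named in Table 2 can be sketched in Python as follows. The source does not state which standardization was used, so the pooled-standard-deviation denominator below is an assumption, as are the function names `abs_std_mean_diff` and `average_balance`. The optional `weights` argument shows where PS-based weights could enter the group means.

```python
import numpy as np

def abs_std_mean_diff(x, treated, weights=None):
    """Absolute standardized mean difference for one covariate (illustrative)."""
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    mean_t = np.average(x[treated], weights=w[treated])
    mean_c = np.average(x[~treated], weights=w[~treated])
    # Assumption: standardize by the pooled unweighted sample SD.
    pooled_sd = np.sqrt((x[treated].var(ddof=1) + x[~treated].var(ddof=1)) / 2.0)
    return abs(mean_t - mean_c) / pooled_sd

def average_balance(covariates, treated, weights=None):
    """Average the measure over the columns of a 2-D covariate array."""
    covariates = np.asarray(covariates, dtype=float)
    return float(np.mean([abs_std_mean_diff(covariates[:, j], treated, weights)
                          for j in range(covariates.shape[1])]))

# Toy check: group means 2 and 3, both sample SDs equal to 1.
x = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
treated = [1, 1, 1, 0, 0, 0]
print(abs_std_mean_diff(x, treated))  # |2 - 3| / 1 = 1.0
```

Values closer to zero indicate better balance; comparing the unweighted value with the weighted or matched value mirrors the naïve versus PS-adjusted comparison described in the table note.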

Table 3. Estimated effect for the treatment assignment Z i.

[Table body not recoverable. Surviving column labels: Cond, Naïve, Naïve with covariates, Regression, Boosting Model.]

Note: The four conditions are specified in Table 1. The values in the naïve categories are estimated with the full sample without the use of any PS technique. The reported statistics are the average estimate among replications, the average absolute bias, and the mean standard error.

Table 4. Estimated effect for the pre- and post-implementation P ti.

[Table body not recoverable. Surviving column labels: Cond, Naïve, Naïve with covariates, Regression, Boosting Model.]

Note: The four conditions are specified in Table 1. The values in the naïve categories are estimated with the full sample without the use of any PS technique. The reported statistics are the average estimate among replications, the average absolute bias, and the mean standard error.

Table 5. Estimated DD effect Z i P ti.

[Table body not recoverable. Surviving column labels: Cond, Naïve, Naïve with covariates, Regression, Boosting Model.]

Note: The four conditions are specified in Table 1. The values in the naïve categories are estimated with the full sample without the use of any PS technique. The reported statistics are the average estimate among replications, the average absolute bias, and the mean standard error.
