Working Papers. A Regression-with-Residuals Method for Estimating Controlled Direct Effects

Size: px
Start display at page:

Download "Working Papers. A Regression-with-Residuals Method for Estimating Controlled Direct Effects"

Transcription

1 Working Papers A Regression-with-Residuals Method for Estimating Controlled Direct Effects Xiang Zhou, Harvard University Geoffrey T. Wodtke, University of Toronto UT Sociology Working Paper No March, 2018

2 A Regression-with-Residuals Method for Estimating Controlled Direct Effects Xiang Zhou 1 and Geoffrey T. Wodtke 2 1 Harvard University 2 University of Toronto February 2, 2018 Abstract In a recent contribution, Acharya, Blackwell and Sen (2016) described the method of sequential g-estimation for estimating the controlled direct effect (CDE). We propose an alternative method, which we call regression-with-residuals (RWR), for estimating the CDE. Compared with sequential g-estimation, the RWR method is easier to understand and to implement. More important, unlike sequential g-estimation, it can easily accommodate several different types of effect moderation, including cases in which the effect of the mediator on the outcome is moderated by a post-treatment, or intermediate, confounder. Although common in the social sciences, this type of effect moderation is typically assumed away in applications of sequential g-estimation, which may lead to bias if effect moderation is in fact present. We illustrate RWR by reanalyzing the effect of plough use on female political participation while allowing the effect of log GDP per capita (the mediator) to vary across levels of several intermediate confounders. Direct all correspondence to Xiang Zhou, Department of Government, Harvard University, 1737 Cambridge Street, Cambridge, MA 02138, USA; xiang zhou@fas.harvard.edu. The authors thank Felix Elwert and Gary King for helpful comments on previous versions of this work. 1

3 Introduction Over the past decade, the use of causal mediation analysis has been rapidly growing in empirical political science. Many scholars are no longer satisfied with merely establishing the presence of a causal effect between one variable and another; rather, they now seek to additionally identify causal mechanisms that explain such effects (e.g., Abramson and Carter 2016; Hall 2017; Holbein 2017; Knutsen et al. 2017; Reese, Ruby and Pape 2017; Zhu 2017). The study of causal mediation, however, often rests on strong and untestable assumptions (VanderWeele and Vansteelandt 2009; Imai et al. 2011). In a recent contribution, Acharya, Blackwell and Sen (2016, henceforth ABS) show that these assumptions are relatively weak when we focus on a quantity called the controlled direct effect (CDE). The CDE measures the strength of a causal relationship between a treatment and outcome when a putative mediator is fixed at a given value for all units (Pearl 2001; Robins 2003). This estimand is useful because it helps to adjudicate between alternative causal explanations. A nonzero CDE, for example, would imply that the causal effect of treatment on the outcome does not operate exclusively through the mediator of interest. Moreover, as ABS suggest, the difference between the total effect and the CDE can be interpreted as the degree to which the mediator contributes to a causal mechanism that transmits the effect of treatment to the outcome. Nevertheless, identification of the CDE is not straightforward. Simply conditioning on the mediator (via stratification, matching, or regression adjustment) is usually insufficient because the effect of the mediator on the outcome may be confounded, possibly by post-treatment variables. For example, when assessing the CDE of historical plough use on female political participation at a given level of log GDP per capita (the mediator), post-treatment variables, such as the industrial composition of the contemporary economy or level of democracy, may affect both log GDP per capita and female political participation. Following ABS, we call these variables intermediate confounders. Intermediate confounders pose a dilemma for the identification and estimation of CDEs. On the one hand, omitting intermediate confounders would bias estimates of the effect of the mediator on the outcome, and by extension, also estimates of the CDE. On the other hand, naively controlling for intermediate confounders may block causal pathways, and unblock noncausal pathways, from treatment to the outcome, which would also bias estimates of the CDE. Fortunately, several different approaches have been developed to overcome this dilemma. First, we could estimate a model for the marginal expectation of the potential outcomes under different levels of the treatment and mediator, known as a marginal structural model (MSM), us- 2

4 ing the method of inverse probability weighting (IPW) (Robins, Hernan and Brumback 2000; VanderWeele 2009). This approach performs best when both the treatment and mediator are binary. When the treatment and/or mediator are many valued or continuous, it tends to perform poorly because the inverse probability weights involve conditional density estimates that are often unreliable. Second, to overcome these limitations, we could instead estimate a structural nested mean model (SNMM) for the conditional expectation of the potential outcomes given a set of both pretreatment and intermediate confounders using the method of sequential g-estimation as described in ABS (see also Vansteelandt 2009; Joffe and Greene 2009). Sequential g-estimation of an SNMM involves a two-stage regression-based procedure in which the variation in the outcome due to the causal effect of the mediator is removed, and then the de-mediated outcome is regressed on treatment and the pretreatment confounders. Like IPW estimation, however, sequential g-estimation also suffers from several limitations. First, the underlying logic of g-estimation is not especially intuitive, which is perhaps why the application of this method remains infrequent (Vansteelandt and Joffe 2014). Second, sequential g- estimation is complicated to implement when there are intermediate interactions, that is, when the effect of the mediator on the outcome is moderated by an intermediate confounder. As ABS note [I]f Assumption 2 [no intermediate interactions] is violated, it is still possible to estimate the ACDE in a second stage, but that requires (i) a model for the distribution of the intermediate covariates conditional on the treatment and (ii) the evaluation of the average of within-stratum ACDEs across the distribution of that model. The second part entails a high-dimensional integral that is computationally challenging, though Monte Carlo procedures have been developed (Robins 1986, 1997). Because of these computational challenges, intermediate interactions are typically assumed away in applications of sequential g-estimation, but if this assumption is not met in practice, then estimates of the CDE may be biased. In this paper, we introduce an alternative method, termed regression-with-residuals (RWR), for estimating the CDE that is both more intuitive and easier to implement than sequential g- estimation. In particular, RWR estimation is easily implemented even in the presence of intermediate interactions, while in the absence of such interactions, we show that this method is algebraically equivalent to sequential g-estimation. We illustrate the utility of RWR by reanalyzing 3

5 data considered in ABS to estimate the CDE of plough use on female political participation, now allowing the effect of log GDP per capita (the mediator) to vary across levels of the intermediate confounders, such as oil revenues per capita. We find evidence of a significant intermediate interaction that, when naively excluded, would appear to suppress estimates of the CDE. Notation, Assumptions, and Sequential G-estimation Following ABS, we use A to denote the treatment, M to denote the mediator, Y to denote the observed outcome, and Y(a, m) to denote the potential outcome under treatment a and mediator m. With this notation, the CDE is formally defined as the average effect of changing treatment from a to a while fixing the mediator at a given level m 1 : CDE(a, a, m) = E[Y(a, m) Y(a, m)] This quantity is nonparametrically identified under the assumption of sequential ignorability (Robins 1997; Vansteelandt 2009), 2 which can be formally expressed in two parts as follows: 1. Y(a, m) A X (i.e., no unmeasured treatment outcome confounders) 2. Y(a, m) M X, A, Z (i.e., no unmeasured mediator-outcome confounders) Here, X denotes a vector of observed pretreatment confounders that may affect both treatment and the outcome, while Z denotes a vector of observed post-treatment, or intermediate, confounders that may affect both the mediator and the outcome and that may be affected by treatment. The sequential ignorability assumption is satisfied in Figure 1, which contains a directed acyclic graph summarizing a set of hypothesized causal relationships between the variables outlined previously. Although the sequential ignorability assumption is sufficient for nonparametric identification of the CDE, additional modeling assumptions are needed to estimate the CDE in finite samples. Sequential g-estimation, for example, typically relies on an unsaturated linear model for the conditional mean of the potential outcomes, Y(a, m), given X and Z. Moreover, because sequential g-estimation is difficult to implement in the presence of intermediate interactions, its application 1 The same quantity is defined as ACDE (i.e., the average CDE) in ABS, who use CDE i to denote the individual-level controlled direct effect. We avoid this distinction for concision. 2 The sequential ignorability assumption defined here is weaker than that stated in ABS (Assumption 1), as it does not require M(a) A X (i.e. no unmeasured treatment-mediator confounders). This condition, as discussed in VanderWeele and Vansteelandt (2009), is not required for identifying controlled direct effects. 4

6 X Z A M Y Figure 1: Causal Relationships under Sequential Ignorability Shown in Direct Acyclic Graph. Note: A denotes the treatment, M denotes the mediator, Y denotes the outcome, X denotes pretreatment confounders, Z denotes intermediate confounders. in practice, as with ABS, relies on an additional simplifying assumption that the effect of the mediator on the outcome is not moderated by post-treatment confounders, which can be formally expressed as follows: E[Y(a, m) Y(a, m ) X = x, Z = z] = E[Y(a, m) Y(a, m ) X = x] for any a,m, m, x and z Under this assumption, ABS illustrate sequential g-estimation of the CDE using the following SNMM: E[Y(a, m) X = x, Z = z] = β 0 + β T 1 x + β 2a + β T 3 z + m(γ 0 + γ T 1 x + γ 2a) (1) Specifically, with this model, sequential g-estimation proceeds in three steps: 1. Compute least squares estimates for equation (1) and save ˆγ 2 2. Construct a de-mediated outcome defined as y d = y m( ˆγ 0 + ˆγ 1 Tx + ˆγ 2a) 3. Compute least squares estimates for a linear regression of y d on x and a, which can be expressed as ŷ d = ˆκ 0 + ˆκ 1 Tx + ˆκ 2a The sequential g-estimate of the CDE is then given by ĈDE SG (a, a, m) = ( ˆκ 2 + ˆγ 2 m)(a a ) (2) 5

7 Standard errors can be obtained via the nonparametric bootstrap or a consistent variance estimator derived in ABS. In Figure 2, we attempt to illustrate the logic of sequential g-estimation. First, because the mediator-outcome relationship is unconfounded, the regression in step 1 identifies the causal effect of M on Y. Then, the de-mediation calculation in step 2 neutralizes the causal path from M to Y while keeping all other causal paths intact. Finally, the regression of the de-mediated outcome, Y d, on X and A in step 3 identifies the controlled direct effect of A when M = 0, and because ˆγ 2 is a consistent estimate of the treatment-mediator interaction effect, the CDE when M = m can be estimated with equation (2). X Z A M X Y d Figure 2: The Logic of Sequential G-estimation. Regression-with-Residuals Estimation Like g-estimation of SNMMs, RWR estimation was originally developed to assess how timevarying confounders moderate the effect of time-varying treatments (Almirall, Ten Have and Murphy 2010; Wodtke and Almirall 2017). In this section, we show how this method can be adapted to estimate CDEs while properly adjusting for intermediate confounders under the assumptions outlined previously. Specifically, RWR estimation of the CDE in a SNMM similar to that considered in ABS proceeds in two steps: 1. For each of the intermediate confounders, compute least squares estimates for a linear regression of z on x and a, and save the residuals, which we denote by z 6

8 2. Compute least squares estimates for a model similar to equation (1) but with z replaced by z, which can be expressed as ŷ = β 0 + β T 1 x + β 2 a + β T 3 z + m( γ 0 + γ T 1 x + γ 2a) The RWR estimate of the CDE is then given by ĈDE RWR (a, a, m) = ( β 2 + γ 2 m)(a a ) (3) As we show in Appendix A, when there are no intermediate interactions, RWR and sequential g-estimation are algebraically equivalent (i.e., ˆκ 2 = β 2 ; ˆγ 2 = γ 2 ). They rely on the same identification and modeling assumptions, and they share the same statistical properties. But compared with sequential g-estimation, the logic of RWR estimation is somewhat more intuitive. As shown in Figure 3, residualizing the intermediate confounders in step 1 neutralizes the causal paths emanating from X and A to Z. The residualized confounders, which have been purged of their association with treatment, can then be included in the regression model for the outcome in order to adjust for mediator-outcome confounding while avoiding bias due to conditioning on posttreatment variables. In other words, RWR estimation avoids post-treatment bias because Z is no longer a consequence of A, and it avoids omitted variable bias because all treatment-outcome and mediator-outcome confounders have been appropriately controlled in the model for the outcome. X X Z X A M Y Figure 3: The Logic of Regression-with-residuals. 7

9 Intermediate Interactions In the models considered previously, the effect of the mediator is assumed to be invariant across all intermediate confounders. This is a strong and arguably implausible assumption in many social science applications, and when it fails to hold, estimates of the CDE may be biased and inconsistent. Thus, accommodating, rather than naively assuming away, intermediate interactions will make analyses of causal mediation more robust. In the presence of intermediate interactions, the advantages of RWR over sequential g-estimation become more apparent. For example, consider the following model, which extends equation (1) by including an interaction term between M and Z: E[Y(a, m) X = x, Z = z] = β 0 + β T 1 x + β 2a + β T 3 z + m(γ 0 + γ1 T x + γ 2a + γ3 T z) (4) With this model, sequential g-estimation can still be used to estimate the CDE, but only at M = 0. The only modification to the sequential g-estimator in this situation is that the de-mediated outcome, y d, is obtained by subtracting m( ˆγ 0 + ˆγ T 1 x + ˆγ 2a + ˆγ T 3 z) instead of m( ˆγ 0 + ˆγ T 1 x + ˆγ 2a) from the observed outcome. Then, ĈDE SG (a, a, 0) = ˆκ 2 (a a ), where ˆκ 2 is the coefficient on treatment from the regression of y d on x and a. Unfortunately, however, we can no longer estimate the CDE in general for M = m using ˆγ 2 m(a a ) because this expression is no longer a consistent estimate of the treatment-mediator interaction effect. In equation (4), the inclusion of an intermediate interaction, γ T 3 z, leads to post-treatment bias in the treatment-mediator interaction, γ 2a, just as the inclusion of main effects for the intermediate confounders, β T 3 z, leads to post-treatment bias in the main effect of treatment, β 2 a. With sequential g-estimation, the latter bias is removed by the de-mediation step, but the former is not. By contrast, with RWR estimation, intermediate interactions can be easily accommodated, and its implementation in their presence remains almost exactly the same as before: 1. For each of the intermediate confounders, compute least squares estimates for a linear regression of z on x and a, and save the residuals, denoted by z 2. Compute least squares estimates for a model similar to equation (4) but with z replaced by z, which can be expressed as ŷ = β 0 + β T 1 x + β 2 a + β T 3 z + m( γ 0 + γ T 1 x + γ 2a + γ T 3 z ) 8

10 The RWR estimate of the CDE is then given by 3 ĈDE RWR (a, a, m) = ( β 2 + γ 2 m)(a a ) (5) As we show in Appendix B, equation (5) is a consistent estimator of the CDE under the identification assumptions outlined previously and provided that there is no model misspecification. In words, RWR estimation remains consistent even in the presence of intermediate interactions because, by appropriately residualizing the intermediate confounders, it removes any posttreatment bias for both the main effect of treatment and the treatment-mediator interaction effect. As before, standard errors can be computed using the nonparametric bootstrap. The Effect of Plough Use on Female Political Participation To illustrate the RWR method and demonstrate its utility, we reanalyze the CDE of historical plough use on female political participation. In their original study, Alesina, Giuliano and Nunn (2013) find that the total effect of plough use on female political participation is small and statistically insignificant, but they also find that the coefficient on plough use doubles and becomes statistically significant after controlling for log GDP per capita in the year Based on these results, Alesina, Giuliano and Nunn (2013) conclude that log GDP per capita is an important mediator and suggest that the small total effect of plough use on female political participation is due to the combination of a positive indirect effect (through log GDP per capita) and a negative direct effect. However, as highlighted by ABS, this conclusion may not be warranted because the analysis in Alesina, Giuliano and Nunn (2013) does not account for intermediate confounders that likely affect both log GDP per capita and women s participation in politics. To rectify this problem, ABS use sequential g-estimation to estimate the CDE of plough use on female political participation, controlling for log GDP per capita, and they find that this effect is even larger after appropriately adjusting for intermediate confounders. ABS: To compare RWR with sequential g-estimation, we begin by considering the same model as in E[Y(a, m) x, z] = β 0 + β T 1 x + β 2a + β T 3 z + m(γ 0 + γ 1 a) + m 2 (γ 2 + γ 3 a), 3 In previous work (Almirall, Ten Have and Murphy 2010; Wodtke and Almirall 2017), where RWR has been used to estimate the moderated, or conditional, effects of time-varying treatments, the residualized confounders are only included as main effects and are not used in any cross-product terms. In our adaptation of RWR for estimating CDEs, however, the residualized confounders must be included both as main effects and in the relevant cross-product terms, which ensures that β 2 and γ 2 capture all the information needed to construct estimates of the CDE. 9

11 Table 1: Estimated CDE of Plough Use on Female Political Participation using Sequential G- estimation, RWR, and RWR with intermediate interactions intercept plough use (i.e., ĈDE(a, a + 1, 0)) log GDP per capita Sequential g-estimation (final step) RWR RWR with intermediate interactions (5.42) (5.42) (6.33) -8.64** -8.64** *** (3.14) (3.14) (3.83) 4.32** 5.76*** (1.65) (1.65) log GDP per capita (0.71) (0.64) plough use * log GDP per capita (1.83) (1.86) plough use * log GDP per capita (0.84) (0.76) years of civil conflict years of interstate conflict oil revenues per capita proportion European descent former Communist rule Polity score in 2000 value added in Service as share of GDP in 2000 oil revenues per capita * log GDP per capita 0.13* 0.12* (0.06) (0.06) -0.3* (0.14) (0.14) ** (12.7) (23.86) (0.03) (0.03) (2.33) (2.22) (0.17) (0.17) (0.08) (0.09) 26.56* (10.81) Note: Numbers in parentheses are bootstrapped standard errors. p<.1, *p<.05, **p<.01, ***p<.001 (two-tailed z tests). Coefficients of pretreatment confounders are omitted. 10

12 where the outcome, Y, is the share of political positions held by women in the year 2000; the treatment, a, is the relative proportion of ethnic groups in a country that traditionally used the plough; the mediator, m, is log GDP per capita in the year 2000, recentered at its mean; the vector of pretreatment confounders, x, includes measures of agricultural suitability, tropical climate, presence of large animals, political hierarchy, economic complexity, and terrain ruggedness; and finally, the vector of intermediate confounders, z, includes measures of civil conflict, interstate conflict, oil revenues per capita, the proportion of population that is of European descent, former Communist rule, the policy score in 2000, and the value added of the service industry as share of GDP in the year The first two columns of Table 1 present sequential g-estimates and RWR estimates, respectively, for the CDE of historical plough use based on this model. 5 As expected, the estimated CDE given by these two different methods is exactly the same. Note that, with sequential g-estimation, only the CDE at m = 0 is reported in the final step (in this case, the CDE when log GDP per capita is set at its mean). To construct the CDE at other levels of the mediator, the analyst must return to the regression in the first step of the sequential g-estimation procedure and extract the coefficients on the treatment-mediator interactions. With RWR, by contrast, all the necessary coefficients are reported in a single regression, which allows the analyst to quickly construct the CDE at any level of the mediator from the output in Table 1. Thus far, the effect of log GDP per capita on female political participation has been assumed to be invariant across levels of the intermediate confounders, but if the effect of the mediator is in fact moderated by any of these post-treatment variables, then the estimates reported previously are likely biased and inconsistent. We now relax this assumption by additionally including an interaction term between log GDP and oil revenues per capita. 6 Results from this analysis are shown in the last column of Table 1, and Appendix C presents the R code used to generate them. As indicated by the additional interaction term, which is statistically significant at the 0.05 level, the effect of log GDP per capita is larger in countries with more oil revenue. When this intermediate interaction is appropriately modeled via RWR, the estimated CDE at m = 0 (i.e., when log GDP per capita is set at its mean) increases by almost 50%, from to , while the treatmentmediator interaction between plough use and log GDP per capita becomes muted and no longer 4 Detailed definitions of all these variables can be found in Alesina, Giuliano and Nunn (2013). 5 Our results are slightly different from those reported in ABS because we handle missing values differently. Whereas ABS include countries with missing values on z in the third step of the sequential g-estimator, we use only complete observations throughout the analysis. 6 We also estimated a model with all two-way interactions between log GDP per capita and the intermediate confounders, from which we obtained an RWR estimate of the CDE that is very similar to the one reported in Table 1. 11

13 statistically significant. Taken together, these results suggest that naively assuming away intermediate interactions when they do in fact exist could induce substantial bias in estimates of the CDE. Fortunately, this bias can be easily avoided with RWR. Summary In this paper, we introduced RWR for estimating controlled direct effects. In the absence of intermediate interactions, RWR is algebraically equivalent to the sequential g-estimator described in ABS. However, unlike the sequential g-estimator, RWR can easily accommodate several different types of effect moderation, including as we have shown intermediate interactions, which are likely ubiquitous in the social sciences. And in general, models with less stringent parametric constraints can be estimated more easily with RWR than with sequential g estimation. Given its simplicity, flexibility, and utility, we expect that RWR estimation will be used more widely in causal mediation analyses. Appendix A: Equivalence between RWR and Sequential G-Estimation Under No Intermediate Interactions To see the equivalence between RWR and sequential g-estimation, let us consider model (1) and write the naive least squares regression of it as y = ˆβ 0 + ˆβ T 1 x + ˆβ 2 a + ˆβ T 3 z + m( ˆγ 0 + ˆγ T 1 x + ˆγ 2a) + y, (6) where y denotes the residual. Suppose x is a column vector of p pretreatment confounders and z is a column vector of q intermediate confounders. For each of the components in z, it has a least squares fit on x and a. These least squares fits can be combined in matrix form: z = ˆλ 0 + ˆΛ 1 x + ˆλ 2 a + z, (7) where ˆλ 0 and ˆλ 2 are q 1 vectors, ˆΛ 1 is a q p matrix, and z is a q 1 vector of residuals. Substituting equation (7) into equation (6), we have y = ( ˆβ 0 + ˆβ T 3 ˆλ 0 ) + ( ˆβ T 1 + ˆβ T 3 ˆΛ 1 )x + ( ˆβ 2 + ˆβ T 3 ˆλ 2 )a + ˆβ T 3 z + m( ˆγ 0 + ˆγ T 1 x + ˆγ 2a) + y. (8) 12

14 Since y is the least squares residual for regression (6), it is orthogonal to the span of {1, x, a, z, m, mx, ma}. Because z is a linear combination of x, a, and z, {1, x, a, z, m, mx, ma} and {1, x, a, z, m, mx, ma} span the same space. Thus equation (8) represents the least squares fit of y on {1, x, a, z, m, mx, ma}, meaning that the RWR estimator of the CDE is ĈDE RWR (a, a, m) = ( ˆβ 2 + ˆβ T 3 ˆλ 2 + ˆγ 2 m)(a a ). From equation (8), we also know that the de-mediated outcome can be written as y d = ( ˆβ 0 + ˆβ T 3 ˆλ 0 ) + ( ˆβ T 1 + ˆβ T 3 ˆΛ 1 )x + ( ˆβ 2 + ˆβ T 3 ˆλ 2 )a + ˆβ T 3 z + y. (9) Since z and y are both orthogonal to the span of {1, x, a} (from the properties of least squares residuals), ˆβ T 3 z + y is also orthogonal to the span of {1, x, a}. Thus equation (9) represents the least squares fit of y d on x and a, meaning that the sequential g-estimator of the CDE is ĈDE SG (a, a, m) = ( ˆκ 2 + ˆγ 2 m)(a a ) = ( ˆβ 2 + ˆβ T 3 ˆλ 2 + ˆγ 2 m)(a a ). Obviously, the sequential g-estimator is the same as the RWR estimator. Appendix B: Consistency of RWR in the Presence of Intermediate Interactions First, we explain an implicit modeling assumption that underlies both the sequential g-estimator and the RWR estimator described in the main text. For the sequential g-estimator, the least squares regression in step 3 implies the linearity of E[Y(a, 0) X = x] in x and a: E[Y(a, 0) X = x] = κ 0 + κ T 1 x + κ 2a. (10) Setting m = 0 in model (1) or (4), we have E[Y(a, 0) X = x, Z = z] = β 0 + β T 1 x + β 2a + β T 3 z. (11) 13

15 Taking the expectation of equation (11) over z (given x and a) yields E[Y(a, 0) X = x] = β 0 + β T 1 x + β 2a + β T 3 E[z x, a]. (12) Comparing equations (10) and (12), we can see that β T 3 E[z x, a] must be linear in x and a. Since β 3 represents model parameters that can vary freely in R q, the linearity of β T 3 E[z x, a] implies that each component of E[z x, a] should be linear in x and a. Conversely, when each component of E[z x, a] is linear in x and a, model (1) or (4) implies equation (10). Thus, the sequential g- estimator implicitly assumes each component of E[z x, a] to be linear in x and a. This assumption is more explicit in the RWR estimator, which requires the user to fit a linear model for each of the intermediate confounders. Thus, both the sequential g-estimator and the RWR estimator are based on the linearity of E[z x, a] 7 E[z x, a] = λ 0 + Λ 1 x + λ 2 a. (13) Paralleling equation (7) in Appendix A, λ 0 and λ 2 are both q 1 vectors and Λ 1 is a q p matrix. To see the consistency of the RWR estimator in the presence of intermediate interactions, let us consider model (4). Given equation (13), the CDE can be written as E[y(a, m) y(a, m)] = E x E z x,a E[y(a, m) x, z] E x E z x,a E[y(a, m) x, z] = β 2 (a a ) + γ 2 m(a a ) + β T 3 E x [E[z x, a] E[z x, a ]] + γ3 T m E x [E[z x, a] E[z x, a ]] = β 2 (a a ) + γ 2 m(a a ) + β T 3 λ 2 (a a ) + γ3 T λ 2 m(a a ) = [(β 2 + β T 3 λ 2 ) + (γ 2 + γ3 T λ 2 )m](a a ) It is easy to show that the RWR estimator for model (4) equals ĈDE RWR (a, a, m) = [( ˆβ 2 + ˆβ T 3 ˆλ 2 ) + ( ˆγ 2 + ˆγ T 3 ˆλ 2 )m](a a ). Thus, when the models for E[Y(a, m) x, z] and E[z x, a] are both correctly specified, all coefficient estimates are consistent. It follows that ĈDE RWR (a, a, m) is also consistent. 7 In practice, this model can be specified more flexibly, for example, by including higher-order or interaction terms of x and a. 14

16 Appendix C: R Code for RWR In this appendix, we illustrate the implementation of RWR in R for estimating the CDE of plough use on female labor force participation. Replication data can be found at Matthew Blackwell s Dataverse: library(foreign) library(dplyr) # load data ploughs <- read.dta("crosscountry_dataset.dta") # center log income at its mean, select variables, and keep complete cases ploughs <- ploughs %>% tbl_df() %>% mutate(centered_ln_inc = ln_income - mean(ln_income, na.rm = TRUE), centered_ln_incsq = centered_ln_inc^2) %>% select(women_politics, plow, centered_ln_inc, centered_ln_incsq, agricultural_suitability, tropical_climate, large_animals, political_hierarchies, economic_complexity, rugged, years_civil_conflict, years_interstate_conflict, oil_pc, european_descent, communist_dummy, polity2_2000, serv_va_gdp2000) %>% na.omit() # a function returning residualized intermediate confounders residualize <- function(y){ residuals(lm(y ~ plow + agricultural_suitability + tropical_climate + large_animals + political_hierarchies + economic_complexity + rugged, data = ploughs)) } # generate residualized intermediate confounders ploughs <- ploughs %>% mutate_at(vars(years_civil_conflict:serv_va_gdp2000), funs(res = residualize)) # regression with residualized confounders rwr_mod <- lm(women_politics ~ plow * centered_ln_inc + plow * centered_ln_incsq + agricultural_suitability + tropical_climate + large_animals + political_hierarchies + economic_complexity + rugged + years_civil_conflict_res + years_interstate_conflict_res + oil_pc_res + european_descent_res + communist_dummy_res + polity2_2000_res + serv_va_gdp2000_res + oil_pc_res * centered_ln_inc, data = ploughs) 15

17 References Abramson, Scott F and David B Carter The Historical Origins of Territorial Disputes. American Political Science Review 110(4): Acharya, Avidit, Matthew Blackwell and Maya Sen Explaining Causal Findings Without Bias: Detecting and Assessing Direct Effects. American Political Science Review 110(3): Alesina, Alberto, Paola Giuliano and Nathan Nunn On the Origins of Gender Roles: Women and the Plough. The Quarterly Journal of Economics 128(2): Almirall, Daniel, Thomas Ten Have and Susan A Murphy Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Biometrics 66(1): Hall, Matthew EK Macro Implementation: Testing the Causal Paths from US Macro Policy to Federal Incarceration. American Journal of Political Science 61(2): Holbein, John B Childhood Skill Development and Adult Political Participation. American Political Science Review 111(3): Imai, Kosuke, Luke Keele, Dustin Tingley and Teppei Yamamoto Unpacking the Black Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies. American Political Science Review 105(4): Joffe, Marshall M and Tom Greene Related Causal Frameworks for Surrogate Outcomes. Biometrics 65(2): Knutsen, Carl Henrik, Andreas Kotsadam, Eivind Hammersmark Olsen and Tore Wig Mining and Local Corruption in Africa. American Journal of Political Science 61(2): Pearl, Judea Direct and Indirect Effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc. pp Reese, Michael J, Keven G Ruby and Robert A Pape Days of Action or Restraint? How the Islamic Calendar Impacts Violence. American Political Science Review 111(3): Robins, James A New Approach to Causal Inference in Mortality Studies with a Sustained Exposure Period-Application to Control of the Healthy Worker Survivor Effect. Mathematical Modelling 7(9-12):

18 Robins, James M Causal Inference from Complex Longitudinal Data. Latent Variable Modeling and Applications to Causality pp Robins, James M Semantics of Causal DAG models and the Identification of Direct and Indirect effects. Highly Structured Stochastic Systems pp Robins, James M, Miguel Angel Hernan and Babette Brumback Marginal Structural Models and Causal Inference in Epidemiology. Epidemiology 11(5): VanderWeele, Tyler J Marginal Structural Models for the Estimation of Direct and Indirect effects. Epidemiology 20(1): VanderWeele, Tyler J and Stijn Vansteelandt Conceptual Issues Concerning Mediation, Interventions and Composition. Statistics and its Interface 2(4): Vansteelandt, Stijn Estimating Direct Effects in Cohort and Case control Studies. Epidemiology 20(6): Vansteelandt, Stijn and Marshall Joffe Structural Nested Models and g-estimation: The Partially Realized Promise. Statistical Science 29(4): Wodtke, Geoffrey T and Daniel Almirall Estimating Moderated Causal Effects with Time- Varying Treatments and Time-Varying Moderators: Structural Nested Mean Models and Regression with Residuals. Sociological Methodology 47(1): Zhu, Boliang MNCs, Rents, and Corruption: Evidence from China. American Journal of Political Science 61(1):

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016 An Introduction to Causal Mediation Analysis Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016 1 Causality In the applications of statistics, many central questions

More information

Geoffrey T. Wodtke. University of Toronto. Daniel Almirall. University of Michigan. Population Studies Center Research Report July 2015

Geoffrey T. Wodtke. University of Toronto. Daniel Almirall. University of Michigan. Population Studies Center Research Report July 2015 Estimating Heterogeneous Causal Effects with Time-Varying Treatments and Time-Varying Effect Moderators: Structural Nested Mean Models and Regression-with-Residuals Geoffrey T. Wodtke University of Toronto

More information

Comparison of Three Approaches to Causal Mediation Analysis. Donna L. Coffman David P. MacKinnon Yeying Zhu Debashis Ghosh

Comparison of Three Approaches to Causal Mediation Analysis. Donna L. Coffman David P. MacKinnon Yeying Zhu Debashis Ghosh Comparison of Three Approaches to Causal Mediation Analysis Donna L. Coffman David P. MacKinnon Yeying Zhu Debashis Ghosh Introduction Mediation defined using the potential outcomes framework natural effects

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work

More information

Causal Mechanisms Short Course Part II:

Causal Mechanisms Short Course Part II: Causal Mechanisms Short Course Part II: Analyzing Mechanisms with Experimental and Observational Data Teppei Yamamoto Massachusetts Institute of Technology March 24, 2012 Frontiers in the Analysis of Causal

More information

Casual Mediation Analysis

Casual Mediation Analysis Casual Mediation Analysis Tyler J. VanderWeele, Ph.D. Upcoming Seminar: April 21-22, 2017, Philadelphia, Pennsylvania OXFORD UNIVERSITY PRESS Explanation in Causal Inference Methods for Mediation and Interaction

More information

Telescope Matching: A Flexible Approach to Estimating Direct Effects

Telescope Matching: A Flexible Approach to Estimating Direct Effects Telescope Matching: A Flexible Approach to Estimating Direct Effects Matthew Blackwell and Anton Strezhnev International Methods Colloquium October 12, 2018 direct effect direct effect effect of treatment

More information

Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula.

Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula. FACULTY OF PSYCHOLOGY AND EDUCATIONAL SCIENCES Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula. Modern Modeling Methods (M 3 ) Conference Beatrijs Moerkerke

More information

Causal mediation analysis: Definition of effects and common identification assumptions

Causal mediation analysis: Definition of effects and common identification assumptions Causal mediation analysis: Definition of effects and common identification assumptions Trang Quynh Nguyen Seminar on Statistical Methods for Mental Health Research Johns Hopkins Bloomberg School of Public

More information

Statistical Models for Causal Analysis

Statistical Models for Causal Analysis Statistical Models for Causal Analysis Teppei Yamamoto Keio University Introduction to Causal Inference Spring 2016 Three Modes of Statistical Inference 1. Descriptive Inference: summarizing and exploring

More information

Telescope Matching: A Flexible Approach to Estimating Direct Effects *

Telescope Matching: A Flexible Approach to Estimating Direct Effects * Telescope Matching: A Flexible Approach to Estimating Direct Effects * Matthew Blackwell Anton Strezhnev August 4, 2018 Abstract Estimating the direct effect of a treatment fixing the value of a consequence

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Duke University Medical, Dept. of Biostatistics Joint work

More information

Identification and Sensitivity Analysis for Multiple Causal Mechanisms: Revisiting Evidence from Framing Experiments

Identification and Sensitivity Analysis for Multiple Causal Mechanisms: Revisiting Evidence from Framing Experiments Identification and Sensitivity Analysis for Multiple Causal Mechanisms: Revisiting Evidence from Framing Experiments Kosuke Imai Teppei Yamamoto First Draft: May 17, 2011 This Draft: January 10, 2012 Abstract

More information

Supplementary Materials for Residual Balancing: A Method of Constructing Weights for Marginal Structural Models

Supplementary Materials for Residual Balancing: A Method of Constructing Weights for Marginal Structural Models Supplementary Materials for Residual Balancing: A Method of Constructing Weights for Marginal Structural Models Xiang Zhou Harvard University Geoffrey T. Wodtke University of Toronto March 29, 2019 A.

More information

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects Guanglei Hong University of Chicago, 5736 S. Woodlawn Ave., Chicago, IL 60637 Abstract Decomposing a total causal

More information

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies Kosuke Imai Princeton University February 23, 2012 Joint work with L. Keele (Penn State)

More information

Statistical Analysis of Causal Mechanisms

Statistical Analysis of Causal Mechanisms Statistical Analysis of Causal Mechanisms Kosuke Imai Princeton University April 13, 2009 Kosuke Imai (Princeton) Causal Mechanisms April 13, 2009 1 / 26 Papers and Software Collaborators: Luke Keele,

More information

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies Kosuke Imai Princeton University January 23, 2012 Joint work with L. Keele (Penn State)

More information

Running Head: Effect Heterogeneity with Time-varying Treatments and Moderators ESTIMATING HETEROGENEOUS CAUSAL EFFECTS WITH TIME-VARYING

Running Head: Effect Heterogeneity with Time-varying Treatments and Moderators ESTIMATING HETEROGENEOUS CAUSAL EFFECTS WITH TIME-VARYING Running Head: Effect Heterogeneity with Time-varying Treatments and Moderators ESTIMATING HETEROGENEOUS CAUSAL EFFECTS WITH TIME-VARYING TREATMENTS AND TIME-VARYING EFFECT MODERATORS: STRUCTURAL NESTED

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University Joint

More information

Discussion of Papers on the Extensions of Propensity Score

Discussion of Papers on the Extensions of Propensity Score Discussion of Papers on the Extensions of Propensity Score Kosuke Imai Princeton University August 3, 2010 Kosuke Imai (Princeton) Generalized Propensity Score 2010 JSM (Vancouver) 1 / 11 The Theme and

More information

Causal Mediation Analysis in R. Quantitative Methodology and Causal Mechanisms

Causal Mediation Analysis in R. Quantitative Methodology and Causal Mechanisms Causal Mediation Analysis in R Kosuke Imai Princeton University June 18, 2009 Joint work with Luke Keele (Ohio State) Dustin Tingley and Teppei Yamamoto (Princeton) Kosuke Imai (Princeton) Causal Mediation

More information

19 Effect Heterogeneity and Bias in Main-Effects-Only Regression Models

19 Effect Heterogeneity and Bias in Main-Effects-Only Regression Models 19 Effect Heterogeneity and Bias in Main-Effects-Only Regression Models FELIX ELWERT AND CHRISTOPHER WINSHIP 1 Introduction The overwhelming majority of OLS regression models estimated in the social sciences,

More information

Unpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies

Unpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies Unpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies Kosuke Imai Princeton University Joint work with Keele (Ohio State), Tingley (Harvard), Yamamoto (Princeton)

More information

Identification, Inference, and Sensitivity Analysis for Causal Mediation Effects

Identification, Inference, and Sensitivity Analysis for Causal Mediation Effects Identification, Inference, and Sensitivity Analysis for Causal Mediation Effects Kosuke Imai Luke Keele Teppei Yamamoto First Draft: November 4, 2008 This Draft: January 15, 2009 Abstract Causal mediation

More information

Identification and Estimation of Causal Mediation Effects with Treatment Noncompliance

Identification and Estimation of Causal Mediation Effects with Treatment Noncompliance Identification and Estimation of Causal Mediation Effects with Treatment Noncompliance Teppei Yamamoto First Draft: May 10, 2013 This Draft: March 26, 2014 Abstract Treatment noncompliance, a common problem

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Princeton University Asian Political Methodology Conference University of Sydney Joint

More information

Revision list for Pearl s THE FOUNDATIONS OF CAUSAL INFERENCE

Revision list for Pearl s THE FOUNDATIONS OF CAUSAL INFERENCE Revision list for Pearl s THE FOUNDATIONS OF CAUSAL INFERENCE insert p. 90: in graphical terms or plain causal language. The mediation problem of Section 6 illustrates how such symbiosis clarifies the

More information

Statistical Analysis of Causal Mechanisms

Statistical Analysis of Causal Mechanisms Statistical Analysis of Causal Mechanisms Kosuke Imai Princeton University November 17, 2008 Joint work with Luke Keele (Ohio State) and Teppei Yamamoto (Princeton) Kosuke Imai (Princeton) Causal Mechanisms

More information

A Unification of Mediation and Interaction. A 4-Way Decomposition. Tyler J. VanderWeele

A Unification of Mediation and Interaction. A 4-Way Decomposition. Tyler J. VanderWeele Original Article A Unification of Mediation and Interaction A 4-Way Decomposition Tyler J. VanderWeele Abstract: The overall effect of an exposure on an outcome, in the presence of a mediator with which

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University

More information

Effect Heterogeneity and Bias in Main-Effects- Only Regression Models

Effect Heterogeneity and Bias in Main-Effects- Only Regression Models Effect Heterogeneity and Bias in Regression 1 Effect Heterogeneity and Bias in Main-Effects- Only Regression Models FELIX ELWERT AND CHRISTOPHER WINSHIP Introduction The overwhelming majority of OLS regression

More information

Estimating the effects of timevarying treatments in the presence of time-varying confounding

Estimating the effects of timevarying treatments in the presence of time-varying confounding Estimating the effects of timevarying treatments in the presence of time-varying confounding An application to neighborhood effects on high school graduation David J. Harding, University of Michigan (joint

More information

Identification and Inference in Causal Mediation Analysis

Identification and Inference in Causal Mediation Analysis Identification and Inference in Causal Mediation Analysis Kosuke Imai Luke Keele Teppei Yamamoto Princeton University Ohio State University November 12, 2008 Kosuke Imai (Princeton) Causal Mediation Analysis

More information

Estimating direct effects in cohort and case-control studies

Estimating direct effects in cohort and case-control studies Estimating direct effects in cohort and case-control studies, Ghent University Direct effects Introduction Motivation The problem of standard approaches Controlled direct effect models In many research

More information

Gov 2002: 4. Observational Studies and Confounding

Gov 2002: 4. Observational Studies and Confounding Gov 2002: 4. Observational Studies and Confounding Matthew Blackwell September 10, 2015 Where are we? Where are we going? Last two weeks: randomized experiments. From here on: observational studies. What

More information

Statistical Analysis of Causal Mechanisms

Statistical Analysis of Causal Mechanisms Statistical Analysis of Causal Mechanisms Kosuke Imai Luke Keele Dustin Tingley Teppei Yamamoto Princeton University Ohio State University July 25, 2009 Summer Political Methodology Conference Imai, Keele,

More information

The propensity score with continuous treatments

The propensity score with continuous treatments 7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.

More information

13.1 Causal effects with continuous mediator and. predictors in their equations. The definitions for the direct, total indirect,

13.1 Causal effects with continuous mediator and. predictors in their equations. The definitions for the direct, total indirect, 13 Appendix 13.1 Causal effects with continuous mediator and continuous outcome Consider the model of Section 3, y i = β 0 + β 1 m i + β 2 x i + β 3 x i m i + β 4 c i + ɛ 1i, (49) m i = γ 0 + γ 1 x i +

More information

Sensitivity analysis and distributional assumptions

Sensitivity analysis and distributional assumptions Sensitivity analysis and distributional assumptions Tyler J. VanderWeele Department of Health Studies, University of Chicago 5841 South Maryland Avenue, MC 2007, Chicago, IL 60637, USA vanderweele@uchicago.edu

More information

Methods for inferring short- and long-term effects of exposures on outcomes, using longitudinal data on both measures

Methods for inferring short- and long-term effects of exposures on outcomes, using longitudinal data on both measures Methods for inferring short- and long-term effects of exposures on outcomes, using longitudinal data on both measures Ruth Keogh, Stijn Vansteelandt, Rhian Daniel Department of Medical Statistics London

More information

Experimental Designs for Identifying Causal Mechanisms

Experimental Designs for Identifying Causal Mechanisms Experimental Designs for Identifying Causal Mechanisms Kosuke Imai Princeton University October 18, 2011 PSMG Meeting Kosuke Imai (Princeton) Causal Mechanisms October 18, 2011 1 / 25 Project Reference

More information

SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM)

SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM) SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM) SEM is a family of statistical techniques which builds upon multiple regression,

More information

Estimation of direct causal effects.

Estimation of direct causal effects. University of California, Berkeley From the SelectedWorks of Maya Petersen May, 2006 Estimation of direct causal effects. Maya L Petersen, University of California, Berkeley Sandra E Sinisi Mark J van

More information

Residual Balancing: A Method of Constructing Weights for Marginal Structural Models

Residual Balancing: A Method of Constructing Weights for Marginal Structural Models Residual Balancing: A Method of Constructing Weights for Marginal Structural Models Xiang Zhou Harvard University Geoffrey T. Wodtke University of Toronto March 29, 2019 Abstract When making causal inferences,

More information

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Kosuke Imai Department of Politics Princeton University November 13, 2013 So far, we have essentially assumed

More information

Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability. June 19, 2017

Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability. June 19, 2017 Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability Alessandra Mattei, Federico Ricciardi and Fabrizia Mealli Department of Statistics, Computer Science, Applications, University

More information

Parametric and Non-Parametric Weighting Methods for Mediation Analysis: An Application to the National Evaluation of Welfare-to-Work Strategies

Parametric and Non-Parametric Weighting Methods for Mediation Analysis: An Application to the National Evaluation of Welfare-to-Work Strategies Parametric and Non-Parametric Weighting Methods for Mediation Analysis: An Application to the National Evaluation of Welfare-to-Work Strategies Guanglei Hong, Jonah Deutsch, Heather Hill University of

More information

Statistical Analysis of Causal Mechanisms for Randomized Experiments

Statistical Analysis of Causal Mechanisms for Randomized Experiments Statistical Analysis of Causal Mechanisms for Randomized Experiments Kosuke Imai Department of Politics Princeton University November 22, 2008 Graduate Student Conference on Experiments in Interactive

More information

Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha

Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha January 18, 2010 A2 This appendix has six parts: 1. Proof that ab = c d

More information

Help! Statistics! Mediation Analysis

Help! Statistics! Mediation Analysis Help! Statistics! Lunch time lectures Help! Statistics! Mediation Analysis What? Frequently used statistical methods and questions in a manageable timeframe for all researchers at the UMCG. No knowledge

More information

Unpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies

Unpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies Unpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies Kosuke Imai Princeton University September 14, 2010 Institute of Statistical Mathematics Project Reference

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Statistical Methods for Causal Mediation Analysis

Statistical Methods for Causal Mediation Analysis Statistical Methods for Causal Mediation Analysis The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. Citation Accessed Citable

More information

Applied Microeconometrics (L5): Panel Data-Basics

Applied Microeconometrics (L5): Panel Data-Basics Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics

More information

Difference-in-Differences Methods

Difference-in-Differences Methods Difference-in-Differences Methods Teppei Yamamoto Keio University Introduction to Causal Inference Spring 2016 1 Introduction: A Motivating Example 2 Identification 3 Estimation and Inference 4 Diagnostics

More information

Estimating Post-Treatment Effect Modification With Generalized Structural Mean Models

Estimating Post-Treatment Effect Modification With Generalized Structural Mean Models Estimating Post-Treatment Effect Modification With Generalized Structural Mean Models Alisa Stephens Luke Keele Marshall Joffe December 5, 2013 Abstract In randomized controlled trials, the evaluation

More information

Comparative effectiveness of dynamic treatment regimes

Comparative effectiveness of dynamic treatment regimes Comparative effectiveness of dynamic treatment regimes An application of the parametric g- formula Miguel Hernán Departments of Epidemiology and Biostatistics Harvard School of Public Health www.hsph.harvard.edu/causal

More information

Causal Directed Acyclic Graphs

Causal Directed Acyclic Graphs Causal Directed Acyclic Graphs Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Causal DAGs Stat186/Gov2002 Fall 2018 1 / 15 Elements of DAGs (Pearl. 2000.

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

Introduction to Statistical Inference

Introduction to Statistical Inference Introduction to Statistical Inference Kosuke Imai Princeton University January 31, 2010 Kosuke Imai (Princeton) Introduction to Statistical Inference January 31, 2010 1 / 21 What is Statistics? Statistics

More information

Notes on causal effects

Notes on causal effects Notes on causal effects Johan A. Elkink March 4, 2013 1 Decomposing bias terms Deriving Eq. 2.12 in Morgan and Winship (2007: 46): { Y1i if T Potential outcome = i = 1 Y 0i if T i = 0 Using shortcut E

More information

AGEC 661 Note Fourteen

AGEC 661 Note Fourteen AGEC 661 Note Fourteen Ximing Wu 1 Selection bias 1.1 Heckman s two-step model Consider the model in Heckman (1979) Y i = X iβ + ε i, D i = I {Z iγ + η i > 0}. For a random sample from the population,

More information

Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1

Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1 Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1 Gary King GaryKing.org April 13, 2014 1 c Copyright 2014 Gary King, All Rights Reserved. Gary King ()

More information

A Distinction between Causal Effects in Structural and Rubin Causal Models

A Distinction between Causal Effects in Structural and Rubin Causal Models A istinction between Causal Effects in Structural and Rubin Causal Models ionissi Aliprantis April 28, 2017 Abstract: Unspecified mediators play different roles in the outcome equations of Structural Causal

More information

Simple Sensitivity Analysis for Differential Measurement Error. By Tyler J. VanderWeele and Yige Li Harvard University, Cambridge, MA, U.S.A.

Simple Sensitivity Analysis for Differential Measurement Error. By Tyler J. VanderWeele and Yige Li Harvard University, Cambridge, MA, U.S.A. Simple Sensitivity Analysis for Differential Measurement Error By Tyler J. VanderWeele and Yige Li Harvard University, Cambridge, MA, U.S.A. Abstract Simple sensitivity analysis results are given for differential

More information

Mediation analyses. Advanced Psychometrics Methods in Cognitive Aging Research Workshop. June 6, 2016

Mediation analyses. Advanced Psychometrics Methods in Cognitive Aging Research Workshop. June 6, 2016 Mediation analyses Advanced Psychometrics Methods in Cognitive Aging Research Workshop June 6, 2016 1 / 40 1 2 3 4 5 2 / 40 Goals for today Motivate mediation analysis Survey rapidly developing field in

More information

A proof of Bell s inequality in quantum mechanics using causal interactions

A proof of Bell s inequality in quantum mechanics using causal interactions A proof of Bell s inequality in quantum mechanics using causal interactions James M. Robins, Tyler J. VanderWeele Departments of Epidemiology and Biostatistics, Harvard School of Public Health Richard

More information

Gov 2002: 13. Dynamic Causal Inference

Gov 2002: 13. Dynamic Causal Inference Gov 2002: 13. Dynamic Causal Inference Matthew Blackwell December 19, 2015 1 / 33 1. Time-varying treatments 2. Marginal structural models 2 / 33 1/ Time-varying treatments 3 / 33 Time-varying treatments

More information

Bounding the Probability of Causation in Mediation Analysis

Bounding the Probability of Causation in Mediation Analysis arxiv:1411.2636v1 [math.st] 10 Nov 2014 Bounding the Probability of Causation in Mediation Analysis A. P. Dawid R. Murtas M. Musio February 16, 2018 Abstract Given empirical evidence for the dependence

More information

Are Forecast Updates Progressive?

Are Forecast Updates Progressive? CIRJE-F-736 Are Forecast Updates Progressive? Chia-Lin Chang National Chung Hsing University Philip Hans Franses Erasmus University Rotterdam Michael McAleer Erasmus University Rotterdam and Tinbergen

More information

arxiv: v2 [math.st] 4 Mar 2013

arxiv: v2 [math.st] 4 Mar 2013 Running head:: LONGITUDINAL MEDIATION ANALYSIS 1 arxiv:1205.0241v2 [math.st] 4 Mar 2013 Counterfactual Graphical Models for Longitudinal Mediation Analysis with Unobserved Confounding Ilya Shpitser School

More information

Mediation and Interaction Analysis

Mediation and Interaction Analysis Mediation and Interaction Analysis Andrea Bellavia abellavi@hsph.harvard.edu May 17, 2017 Andrea Bellavia Mediation and Interaction May 17, 2017 1 / 43 Epidemiology, public health, and clinical research

More information

Front-Door Adjustment

Front-Door Adjustment Front-Door Adjustment Ethan Fosse Princeton University Fall 2016 Ethan Fosse Princeton University Front-Door Adjustment Fall 2016 1 / 38 1 Preliminaries 2 Examples of Mechanisms in Sociology 3 Bias Formulas

More information

The identification of synergism in the sufficient-component cause framework

The identification of synergism in the sufficient-component cause framework * Title Page Original Article The identification of synergism in the sufficient-component cause framework Tyler J. VanderWeele Department of Health Studies, University of Chicago James M. Robins Departments

More information

CAUSALITY CORRECTIONS IMPLEMENTED IN 2nd PRINTING

CAUSALITY CORRECTIONS IMPLEMENTED IN 2nd PRINTING 1 CAUSALITY CORRECTIONS IMPLEMENTED IN 2nd PRINTING Updated 9/26/00 page v *insert* TO RUTH centered in middle of page page xv *insert* in second paragraph David Galles after Dechter page 2 *replace* 2000

More information

Flexible Mediation Analysis in the Presence of Nonlinear Relations: Beyond the Mediation Formula

Flexible Mediation Analysis in the Presence of Nonlinear Relations: Beyond the Mediation Formula Multivariate Behavioral Research ISSN: 0027-3171 (Print) 1532-7906 (Online) Journal homepage: http://www.tandfonline.com/loi/hmbr20 Flexible Mediation Analysis in the Presence of Nonlinear Relations: Beyond

More information

Extending causal inferences from a randomized trial to a target population

Extending causal inferences from a randomized trial to a target population Extending causal inferences from a randomized trial to a target population Issa Dahabreh Center for Evidence Synthesis in Health, Brown University issa dahabreh@brown.edu January 16, 2019 Issa Dahabreh

More information

Causality II: How does causal inference fit into public health and what it is the role of statistics?

Causality II: How does causal inference fit into public health and what it is the role of statistics? Causality II: How does causal inference fit into public health and what it is the role of statistics? Statistics for Psychosocial Research II November 13, 2006 1 Outline Potential Outcomes / Counterfactual

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 155 Estimation of Direct and Indirect Causal Effects in Longitudinal Studies Mark J. van

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Unbiased estimation of exposure odds ratios in complete records logistic regression

Unbiased estimation of exposure odds ratios in complete records logistic regression Unbiased estimation of exposure odds ratios in complete records logistic regression Jonathan Bartlett London School of Hygiene and Tropical Medicine www.missingdata.org.uk Centre for Statistical Methodology

More information

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Matthew Harding and Carlos Lamarche January 12, 2011 Abstract We propose a method for estimating

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

Treatment Effects with Normal Disturbances in sampleselection Package

Treatment Effects with Normal Disturbances in sampleselection Package Treatment Effects with Normal Disturbances in sampleselection Package Ott Toomet University of Washington December 7, 017 1 The Problem Recent decades have seen a surge in interest for evidence-based policy-making.

More information

Chapter 5. Introduction to Path Analysis. Overview. Correlation and causation. Specification of path models. Types of path models

Chapter 5. Introduction to Path Analysis. Overview. Correlation and causation. Specification of path models. Types of path models Chapter 5 Introduction to Path Analysis Put simply, the basic dilemma in all sciences is that of how much to oversimplify reality. Overview H. M. Blalock Correlation and causation Specification of path

More information

Modern Mediation Analysis Methods in the Social Sciences

Modern Mediation Analysis Methods in the Social Sciences Modern Mediation Analysis Methods in the Social Sciences David P. MacKinnon, Arizona State University Causal Mediation Analysis in Social and Medical Research, Oxford, England July 7, 2014 Introduction

More information

Propensity scores for repeated treatments: A tutorial for the iptw function in the twang package

Propensity scores for repeated treatments: A tutorial for the iptw function in the twang package Propensity scores for repeated treatments: A tutorial for the iptw function in the twang package Lane Burgette, Beth Ann Griffin and Dan McCaffrey RAND Corporation July, 07 Introduction While standard

More information

Covariate Selection for Generalizing Experimental Results

Covariate Selection for Generalizing Experimental Results Covariate Selection for Generalizing Experimental Results Naoki Egami Erin Hartman July 19, 2018 Abstract Social and biomedical scientists are often interested in generalizing the average treatment effect

More information

Robust Estimation of Inverse Probability Weights for Marginal Structural Models

Robust Estimation of Inverse Probability Weights for Marginal Structural Models Robust Estimation of Inverse Probability Weights for Marginal Structural Models Kosuke IMAI and Marc RATKOVIC Marginal structural models (MSMs) are becoming increasingly popular as a tool for causal inference

More information

Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems

Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems Functional form misspecification We may have a model that is correctly specified, in terms of including

More information

This paper revisits certain issues concerning differences

This paper revisits certain issues concerning differences ORIGINAL ARTICLE On the Distinction Between Interaction and Effect Modification Tyler J. VanderWeele Abstract: This paper contrasts the concepts of interaction and effect modification using a series of

More information

Bootstrapping Sensitivity Analysis

Bootstrapping Sensitivity Analysis Bootstrapping Sensitivity Analysis Qingyuan Zhao Department of Statistics, The Wharton School University of Pennsylvania May 23, 2018 @ ACIC Based on: Qingyuan Zhao, Dylan S. Small, and Bhaswar B. Bhattacharya.

More information

Gov 2000: 9. Regression with Two Independent Variables

Gov 2000: 9. Regression with Two Independent Variables Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Fall 2016 1 / 62 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics

More information

Technical Appendix C: Methods

Technical Appendix C: Methods Technical Appendix C: Methods As not all readers may be familiar with the multilevel analytical methods used in this study, a brief note helps to clarify the techniques. The general theory developed in

More information

Measuring Social Influence Without Bias

Measuring Social Influence Without Bias Measuring Social Influence Without Bias Annie Franco Bobbie NJ Macdonald December 9, 2015 The Problem CS224W: Final Paper How well can statistical models disentangle the effects of social influence from

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 269 Diagnosing and Responding to Violations in the Positivity Assumption Maya L. Petersen

More information

Causal Effect Estimation Under Linear and Log- Linear Structural Nested Mean Models in the Presence of Unmeasured Confounding

Causal Effect Estimation Under Linear and Log- Linear Structural Nested Mean Models in the Presence of Unmeasured Confounding University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Summer 8-13-2010 Causal Effect Estimation Under Linear and Log- Linear Structural Nested Mean Models in the Presence of

More information

When Should We Use Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai In Song Kim First Draft: July 1, 2014 This Draft: December 19, 2017 Abstract Many researchers

More information

POL 681 Lecture Notes: Statistical Interactions

POL 681 Lecture Notes: Statistical Interactions POL 681 Lecture Notes: Statistical Interactions 1 Preliminaries To this point, the linear models we have considered have all been interpreted in terms of additive relationships. That is, the relationship

More information