
Multivariate Behavioral Research, 49:425-442, 2014
Copyright © Taylor & Francis Group, LLC
ISSN: 0027-3171 print / 1532-7906 online
DOI: 10.1080/00273171.2014.931797

Theory and Analysis of Total, Direct, and Indirect Causal Effects

Axel Mayer, Ghent University and University of Jena
Felix Thoemmes, Cornell University
Norman Rose, University of Tübingen
Rolf Steyer, University of Jena
Stephen G. West, Arizona State University

Correspondence concerning this article should be addressed to Axel Mayer, Ghent University, Faculty of Psychology and Educational Sciences, Department of Data Analysis, Henri Dunantlaan 1, B-9000, Ghent, Belgium. E-mail: Axel.Mayer@ugent.be

Mediation analysis, or more generally the analysis of models with direct and indirect effects, is commonly used in the behavioral sciences. As we show in our illustrative example, traditional methods of mediation analysis that omit confounding variables can lead to systematically biased direct and indirect effects, even in the context of a randomized experiment. Therefore, several definitions of causal effects in mediation models have been presented in the literature (Baron & Kenny, 1986; Imai, Keele, & Tingley, 2010; Pearl, 2012). We illustrate the stochastic theory of causal effects as an alternative foundation of causal mediation analysis based on probability theory. In this theory we define total, direct, and indirect effects and show how they can be identified in the context of our illustrative example. A particular strength of the stochastic theory of causal effects is its causality conditions, which imply causal unbiasedness of effect estimates. The causality conditions have empirically testable implications and can be used for covariate selection. In the discussion, we highlight some similarities and differences between the stochastic theory of causal effects and other theories of causal effects.

Direct and indirect effects, commonly encountered in mediation analysis (Hyman, 1955; Judd & Kenny, 1981a; MacKinnon, 2008), are central to many questions in the applied behavioral sciences. The prominent role of these effects can also be seen in the frequent citation of the seminal work by Baron and Kenny (1986) and more recent work by MacKinnon, Lockwood, Hoffmann, West, and Sheets (2002). Mediation analysis, or more generally the analysis of models with direct and indirect effects, is of interest whenever researchers assume that the effect of a putative cause on an outcome of interest might be transmitted through one or more intermediate variables. The basic mediation model depicted in Figure 1 has received much attention and is now widely used by social scientists. The basic model includes an indirect effect a·b and a direct effect c' of treatment X on outcome Y. In the linear form of the basic model, the total effect c of the treatment variable X is the sum of the direct and indirect effects, that is, c = c' + a·b.
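
For reference, the linear form of this basic model can be written out explicitly. The following is a minimal LaTeX sketch of the standard Baron and Kenny (1986) setup assumed by Figure 1; the intercept symbols i_1, i_2, i_3 and the conditional-expectation form are notational choices made here, not taken from the article.

    % Basic linear mediation model (cf. Baron & Kenny, 1986; Figure 1)
    \begin{align*}
      E(M \mid X)    &= i_1 + a X            \\
      E(Y \mid X, M) &= i_2 + c' X + b M     \\
      E(Y \mid X)    &= i_3 + c X
    \end{align*}
    % In the linear case the total effect decomposes into direct and indirect parts:
    \[
      c = c' + a \cdot b
    \]
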

FIGURE 1 A basic mediation model with three variables.

EFFECTS IN MEDIATION MODELS AND THEIR CAUSAL INTERPRETATION

The causal interpretation of direct and indirect effects in the basic mediation model has long been considered a difficult problem (Holland, 1988; Robins & Greenland, 1992). Despite advances in the statistical analysis of the basic mediation model (MacKinnon, 2008), many applied researchers continue to use the basic model and statistical tests of mediation rather uncritically. Bullock, Green, and Ha (2010) and Green, Ha, and Bullock (2010) questioned the reliance on the basic mediation model and strongly argued for a more critical evaluation of the underlying assumptions of the model, even in randomized experiments. Several alternative approaches to the basic mediation model have recently been offered (Frangakis & Rubin, 2002; Imai, Keele, & Tingley, 2010; Jo, 2008; Jo, Stuart, MacKinnon, & Vinokur, 2011; Kraemer, Wilson, Fairburn, & Agras, 2002; Pearl, 2011b; Robins, Hernán, & Brumback, 2000; Small, 2011; Ten Have & Joffe, 2010; VanderWeele & Vansteelandt, 2009). Each of these suggests different solutions to the problem, sometimes at the expense of redefining the indirect and direct effects that are of key interest to researchers (Pearl, 2011a).

In this article we present an introduction to the stochastic theory of causality, a framework that can be used to analyze mediation effects. The stochastic theory is a comprehensive and consistent theory of causal effects that encompasses numerous applications. Here, we restrict our consideration to its application to mediational processes. The stochastic theory meets a comprehensive set of desiderata. Some of these desiderata are met by other theories, but no alternative theory meets the full set:

1. The theory should explain why causal total effects can be identified in randomized experiments.
2. It should clearly define causal effects in terms of probability theory.
3. It should allow for processes through which effects are transmitted (e.g., mediation).
4. It should specify the conditions under which causal total, direct, and indirect effects can be identified from estimable parameters.
5. It should allow consideration of latent variables such as traits and states as covariates and as outcome variables.
6. It should be able to capture intraindividual development over time and consider persons at multiple time points.

To meet these desiderata, the theory must be stochastic because measurement error, situational effects, and transmitting processes cannot be adequately described using deterministic relationships between variables. The stochastic theory was originally presented by Steyer (1984, 1988), was further developed by Steyer, Gabler, and Rucai (1996) and Steyer, Gabler, von Davier, Nachtigall, and Buhl (2000), and was most recently updated by Steyer, Mayer, and Fiege (2014). Applications to latent variables can be found in Steyer (2005) and Steyer, Mayer, Geiser, and Cole (2014). The present article complements prior work through the application of the stochastic theory to causal mediation analysis. The existing approaches to mediation mentioned earlier have considerable overlap with the stochastic theory, but they are not fully consistent with each other. None of the other existing approaches meets all of the desiderata identified earlier. The result is that the stochastic theory allows us to present the implications in a single, unified framework. In the Discussion section, we present a comparison of some aspects of the different theories in the context of causal mediation analysis.
As a vehicle for presenting some key ideas from the theory, we begin with an illustration in which a pretest of the mediator M and outcome Y are also included in the mediation model for an experiment in which participants are randomly assigned to treatment and control conditions. The illustration vividly shows that even pretest variables in a randomized experiment can be confounders that produce substantial bias in the estimate of the mediated effect, an issue well known to methodologists (e.g., Cole & Maxwell, 2003; Judd & Kenny, 1981b; Maxwell & Cole, 2007). In contrast, applied researchers in the social sciences do not appear to appreciate the role of potential confounders in mediation, rarely control for pretests and other possible confounders in longitudinal mediation designs, and almost never probe whether the covariates examined were sufficient to rule out confounding of the mediational results. We then present the basic ideas of the stochastic theory of causal mediation as they apply in our illustrative example. The stochastic theory presents clear definitions of causal effects based on well-defined terms of probability theory (for an introduction to probability theory, see, e.g., Steyer & Nagel, 2014). An important feature is the causality conditions that imply unbiasedness. Of importance and unlike most other recent approaches, the conditions are empirically testable in the sense that they can be falsified. The empirical tests allow researchers to probe whether their hypothesized causal model is compatible with the data and are illustrated in the context of our example. Given space limitations, in this initial presentation of the stochastic theory of causal mediation, we address only those causality conditions that apply in the context of our illustrative example.

Finally, after our initial presentation, we identify some similarities and differences with other approaches to causal inference and briefly mention some other features of the stochastic theory.

ILLUSTRATIVE EXAMPLE

We first illustrate why the basic mediation model can yield biased estimates of effects. For this purpose we use the basic mediation model in which there is random assignment to a binary treatment variable X that has an effect on an outcome variable Y partly mediated by variable M. We use a randomized experiment because it provides the strongest warrant for causal inference (Holland, 1986; Shadish, Cook, & Campbell, 2002) and thus provides a clear context for understanding the problem of causal inference in mediation analysis.

Table 1 displays the population means, variances (diagonal), covariances (lower triangle), and correlations (upper triangle) of a treatment variable X with values 0 and 1, an intermediate variable M, and an outcome variable Y, with both M and Y being continuous variables. For our illustration, we use an example adapted from Holland (1988) in which the treatment variable X indicates whether or not a particular teaching intervention is given, the intermediate variable M denotes posttreatment study time (the number of minutes spent studying the academic subject in a single week), and outcome variable Y denotes achievement on a standardized test. According to the researcher's hypothesis, the teaching intervention has a positive effect on study time, which, in turn, has a positive effect on achievement. The key questions an applied researcher would like to answer are whether the treatment X has a positive effect on achievement and to what extent this effect is direct, that is, not mediated by study time, the intermediate variable M.

TABLE 1
Covariances, Correlations, and Means

                                    X        M        Y
Treatment X                       0.25     .727     .597
Study time M                      5.00   189.00     .893
Posttreatment achievement Y       5.00   205.70   280.45
Means                             0.50    90.00   140.00

Note. Correlations (in italics) are above the main diagonal. Variances are on the main diagonal. Covariances are below the main diagonal.

Basic Mediation Model

For our initial approach to the problem, we follow the logic of the traditional mediation analysis popularized by Baron and Kenny (1986). This approach requires the specification of three linear regression equations for the correlation matrix presented in Table 1. These three equations aim at identifying (a) the total effect of the treatment X on outcome Y, (b) the effect of treatment X and mediator M on Y, and (c) the effect of treatment X on mediator M. We use conditional expectations in order to describe how the conditional expected values of Y and M depend on the other random variables considered.

E(Y | X) = 130 + 20 X.   (1)
E(Y | X, M) = 35.00 - 3.75 X + 1.19 M.   (2)
E(M | X) = 80 + 20 X.   (3)

From Equation 1, the total effect of X on Y is 20. From Equation 2, the direct effect of X on Y, controlling for M, is approximately -3.75. Finally, the indirect effect can be determined in two equivalent ways in the linear case: (a) by subtracting the direct effect from the total effect, 20 - (-3.75) = 23.75, or (b) as the product of the a path (from Equation 3) and the b path (from Equation 2) in Figure 1, that is, 20 × 1.19 = 23.8, which are equal within rounding error. Note that the direct and indirect effects obtained from this basic mediation model are not causal effects; in contrast, the total effect is a causal effect due to randomization.
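
Because these are population regressions, their coefficients follow directly from the moments in Table 1. The following short sketch (added here for illustration, not part of the article) recovers Equations 1-3 and the effect decomposition from those moments with the normal equations; NumPy and the helper name regression_from_moments are assumptions of this sketch.

    import numpy as np

    # Population moments from Table 1 (variable order: X, M, Y)
    means = np.array([0.50, 90.00, 140.00])
    cov = np.array([
        [0.25,   5.00,   5.00],    # Treatment X
        [5.00, 189.00, 205.70],    # Study time M
        [5.00, 205.70, 280.45],    # Posttreatment achievement Y
    ])

    def regression_from_moments(cov, means, yi, xi):
        """Population regression of variable yi on variables xi via the normal equations."""
        slopes = np.linalg.solve(cov[np.ix_(xi, xi)], cov[np.ix_(xi, [yi])]).ravel()
        intercept = means[yi] - slopes @ means[xi]
        return intercept, slopes

    i1, (c,) = regression_from_moments(cov, means, yi=2, xi=[0])            # Equation 1
    i2, (c_prime, b) = regression_from_moments(cov, means, yi=2, xi=[0, 1])  # Equation 2
    i3, (a,) = regression_from_moments(cov, means, yi=1, xi=[0])            # Equation 3

    print(f"total effect c = {c:.2f}")                       # 20.00
    print(f"direct effect c' = {c_prime:.2f}, b = {b:.2f}")  # -3.75, 1.19
    print(f"a = {a:.2f}")                                    # 20.00
    print(f"indirect: c - c' = {c - c_prime:.2f}, a*b = {a * b:.2f}")  # both about 23.75
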
The causal interpretations of direct and indirect effects rest on strong, typically untested assumptions, which are unlikely to hold true (cf. Holland, 1988; Sobel, 2008). We consider these assumptions in depth later. The causal interpretation presumes that there are no other variables that are related to both M and Y and that, due to randomization, no other variables are related to both X and M. We show in our second approach to this example that this assumption can be easily violated in the behavioral sciences, even in randomized experiments.

Mediation Model with Pretreatment Variables

As noted earlier, M represents posttreatment study time in a randomized experiment designed to evaluate a teaching intervention. Even if unobserved, there will be a variable M_pre representing pretreatment study time. Some students study more minutes per week than others prior to any intervention. There will also be a variable Y_pre representing pretreatment achievement. Our illustration captures the common situation in which baseline measures of the mediator and outcome exist, are related to the posttreatment variables, and are correlated with each other. Given a randomized experiment, the treatment variable X and the two pretreatment variables M_pre and Y_pre are uncorrelated (see Table 2). Table 1 depicts the means, variances, and covariances of X, M, and Y. Table 2 repeats these identical values and adds the means, variances, and covariances of M_pre and Y_pre, constraining the covariances of X and M_pre and of X and Y_pre to zero. Using the parameters presented in Table 2, we first compute the average total treatment effect:

E(Y | X, M_pre, Y_pre) = 0.00 + 20 X + 0.40 M_pre + 0.90 Y_pre.   (4)

TABLE 2
Covariances, Correlations, and Means (Teaching Experiment)

                                           Y_pre    M_pre      X        M        Y
Pretreatment achievement Y_pre            100.00     .85      .00      .50      .74
Pretreatment time spent studying M_pre     85.00   100.00     .00      .58      .70
Treatment (yes = 1, no = 0) X               0.00     0.00    0.25      .73      .60
Posttreatment time spent studying M        68.00    80.00    5.00   189.00      .89
Posttreatment achievement Y               124.00   116.50    5.00   205.70   280.45
Means                                     100.00   100.00    0.50    90.00   140.00

Note. Correlations (in italics) are above the main diagonal. Variances are on the main diagonal. Covariances are below the main diagonal. These values served as the population parameters for the simulated data set.

This value is identical to the average total effect that we computed in the previous simple regression model (see Equation 1) because X and (M_pre, Y_pre) are independent. The differences between the models depicted in Figures 1 and 2 become apparent when we turn to the computation of the direct effect of X. If instead of E(Y | X, M), we now consider the regression E(Y | X, M, M_pre, Y_pre), we find

E(Y | X, M, M_pre, Y_pre) = 0.00 + 10 X + 0.50 M + 0.00 M_pre + 0.90 Y_pre.   (5)

Now the regression coefficient 10 for X would be interpreted to be the direct effect of treatment. It is the effect of X controlling for M and for all covariates (in this example M_pre and Y_pre) and controlling for all variables that are in between X and M (in this example, there are no variables of this type). In empirical studies there might be additional covariates that are related to M, Y, or the pretreatment measures. However, Shadish, Cook, and Campbell (2002) and Steiner, Cook, Shadish, and Clark (2010) note that carefully selected pretreatment measures will typically reduce a large proportion of any bias that could stem from unobserved covariates.

FIGURE 2 Path diagram representing the two regressions E(M | X, M_pre, Y_pre) and E(Y | X, M, M_pre, Y_pre). We report unstandardized coefficients (X -> M: 20; M_pre -> M: .80; X -> Y: 10; M -> Y: .50; Y_pre -> Y: .90; covariance of M_pre and Y_pre: 85; residuals ε_M and ε_Y). The treatment is dichotomous and we wish to compare the coefficients with the coefficients obtained from a multigroup structural equation model.

Finally, the indirect effect can be computed as a difference between the total treatment effect and the direct treatment effect in the model with pretreatment measures. In this example, the indirect treatment effect is the difference 20 - 10 = 10. In this linear model without interactions, this indirect effect is again equal to the product 20 × 0.50 = 10.

How can we explain that omitting pretreatment variables in a randomized experiment yields a highly biased direct effect (-3.75 instead of +10)? The answer is that even though X and (Y_pre, M_pre) are independent, conditional independence of X and (Y_pre, M_pre) given M does not hold. Conditioning on M induces conditional dependence of X and M_pre if both M_pre and X are correlated with M. Because both M_pre and X affect M, a high value of M means that both X and M_pre tend to be high, whereas a low value of M means that both X and M_pre tend to be low. Hence, conditioning on M induces conditional dependence of X and M_pre, even though randomization has made X and M_pre unconditionally independent (see Pearl, 2009; Rosenbaum, 1984, chapter 1, p. 17; Spirtes, Glymour, & Scheines, 2000).

THE THEORY OF CAUSAL EFFECTS

In this section we outline the aspects of the stochastic theory of causal mediation that apply in this illustration.
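
The same normal-equations computation applied to the Table 2 moments reproduces Equations 4 and 5, and it also makes the collider argument concrete: the covariance of X and M_pre is zero, but the residual covariance after linearly partialling out M is not. Again, this is an illustrative sketch added here (NumPy assumed), not code from the article.

    import numpy as np

    # Population moments from Table 2 (variable order: Y_pre, M_pre, X, M, Y)
    means = np.array([100.00, 100.00, 0.50, 90.00, 140.00])
    cov = np.array([
        [100.00,  85.00, 0.00,  68.00, 124.00],   # Y_pre
        [ 85.00, 100.00, 0.00,  80.00, 116.50],   # M_pre
        [  0.00,   0.00, 0.25,   5.00,   5.00],   # X
        [ 68.00,  80.00, 5.00, 189.00, 205.70],   # M
        [124.00, 116.50, 5.00, 205.70, 280.45],   # Y
    ])

    def regression(cov, means, yi, xi):
        """Population regression of variable yi on variables xi (intercept and slopes)."""
        slopes = np.linalg.solve(cov[np.ix_(xi, xi)], cov[np.ix_(xi, [yi])]).ravel()
        return means[yi] - slopes @ means[xi], slopes

    # Equation 4: E(Y | X, M_pre, Y_pre) -> intercept 0.00, slopes [20, 0.40, 0.90]
    print(regression(cov, means, yi=4, xi=[2, 1, 0]))
    # Equation 5: E(Y | X, M, M_pre, Y_pre) -> intercept 0.00, slopes [10, 0.50, 0.00, 0.90]
    print(regression(cov, means, yi=4, xi=[2, 3, 1, 0]))

    # Collider bias: Cov(X, M_pre) = 0, but the residual covariance of X and M_pre
    # after linearly partialling out M is nonzero, so conditioning on M induces dependence.
    print(cov[2, 1] - cov[2, 3] * cov[3, 1] / cov[3, 3])   # 0 - 5*80/189, roughly -2.1
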
The theory explicitly builds on the notation and concepts of probability theory and specifies a probabilistic view on causation instead of a deterministic one (see Steyer, 2005). Because the probability theory notation may be unfamiliar to some readers, we provide Appendix A, Table A1, which includes a full explanation of the notation. The basic idea of the theory is to define our effects of interest at the level of the smallest indivisible possible unit, here the individual participant. This definition gives rise to so-called atomic effect variables that cannot be partitioned into smaller units. To define the atomic total effect of a putative cause X on outcome Y, we condition on all variables that are prior to or simultaneous with X. To define the atomic direct effect of putative cause X on outcome Y, we condition on all variables that are prior to or simultaneous with the potential mediator M, except for X. The atomic effect variables can be aggregated to determine the average and conditional causal effects. The theory specifies causality conditions that allow for the identification of causal effects.

In this section we provide a formal presentation of the stochastic theory of causal mediation, focusing on those features that apply in the context of our illustration. Following the presentation of the theory, we return to the illustration, showing how researchers can use the causality conditions to probe whether the proposed causal model is consistent with the data.

Random Variables and Their Temporal Ordering

All variables considered in an analysis of causal effects have to be random variables on the same probability space representing the random experiment considered. The random experiment is the empirical phenomenon under consideration, and an applied researcher needs to consider a random experiment with which it is possible to answer his or her substantive research questions. The random experiment in our introductory example consists of the following:

1. Sampling a person u from a population of persons (u is the value of the nonnumerical random variable U);
2. Observing the values of pretreatment study time (M_pre) and pretreatment achievement (Y_pre);
3. Assigning the unit at random to one of the two treatment conditions represented by X;
4. Observing the value of the putative mediator, posttreatment study time (M); and
5. Observing the value of the outcome variable Y, posttreatment achievement.

The description of the random experiment is specific to our (simplified) illustrative example. In real-world examples, many events and variables may occur in between the assessment of pretreatment covariates and treatment assignment or between onset of treatment and the assessment of a measured potential mediator M. All random variables in our illustrative example refer to this random experiment. The random variables have a joint distribution and a specified temporal ordering (see Footnote 1). The treatment variable X has to be prior to the outcome variable Y, the pretreatment covariates Y_pre and M_pre have to be prior to or simultaneous with X, the intermediate variable M has to be prior to Y, and X has to be prior to M. We use the term covariate of X (represented by Z) to denote a covariate that is prior to or simultaneous with the treatment variable. Accordingly, the term t_M-covariate of X (Z_tM) denotes a covariate of X that is prior to or simultaneous with the intermediate variable M. A global or comprehensive covariate C_X is a vector of variables including all covariates of X; a global covariate C_X,tM is a vector of variables including all t_M-covariates of X, including M itself. In our example C_X = (U, Y_pre, M_pre) and C_X,tM = (U, M, Y_pre, M_pre).

Footnote 1. An ideal tool for representing a process with a time structure and introducing pre- and equiorderedness between random variables, events, and sets of events is a filtration (see Steyer et al., 2014). A filtration, which is a fundamental concept in the theory of stochastic processes (see, e.g., Bauer, 1996, p. 133, or Øksendal, 2007, p. 31), not only represents the process to be studied but is also used to distinguish intermediate variables from covariates.

True-Outcome Variables

The distinction between C_X and C_X,tM is key to the definition of causal effects of interest. The definition of total effects is based on the conditional expectations E_X=x(Y | C_X) of Y given C_X in a treatment condition (see Footnote 2). In contrast, the definition of direct effects is based on the conditional expectations E_X=x(Y | C_X,tM).
In our illustrative example we can be certain that no other covariates are involved because we constructed the example to have this property. However, in general, we need to condition on all covariates that are potentially related to X, M, and Y in order to define the causal effects of interest. Returning to our example, we define

τ_x := E_X=x(Y | C_X),   (6)

where C_X = (U, Y_pre, M_pre); τ_x is the total-effect true-outcome variable pertaining to x. True-outcome variables cannot be observed in any experiment. Nevertheless, we can use them to define the total (causal) effects of interest. In a similar vein, we can define direct-effect true-outcome variables τ_x,tM by conditioning on M and all variables that are prior or equiordered to M. Again using our example, we define the direct-effect true-outcome variable τ_x,tM:

τ_x,tM := E_X=x(Y | C_X,tM),   (7)

where C_X,tM = (U, M, Y_pre, M_pre). We use the index t_M in order to emphasize that we condition on all variables prior to or simultaneous with M, including those in between X and M, in order to define the causal effects. In our illustrative example, there is only one variable M to consider, but in real applications, there may be more. The variables τ_x,tM are used to define conditional and unconditional direct and indirect effects.

Footnote 2. In many cases, we can build the theory on the conditional expectations E_X=x(Y | U) of the outcome variable Y given the observational-unit variable U in treatment condition x. In these cases, the true-outcome variables can be considered a stochastic version of Rubin's (1974) potential outcome variables. However, in other cases, there is still systematic variability within the observational units, for example, if the values on one or more covariates of X are not fixed due to measurement error, situational effects, or both. Such cases occur if fallible pretreatment variables intended to measure achievement, personality, or motivation variables are assessed. In such cases, we would need to condition on U and the fallible pretreatment variables.

Atomic Total, Direct, and Indirect Effects

Now we present the definition of atomic total, direct, and indirect effects. We confine ourselves to presenting the key ideas, omitting the details of the mathematical assumptions required for the definitions (see Steyer et al., 2014).

The atomic total-effect variable of treatment 1 (intervention) compared with treatment 0 (control) on the outcome Y is δ_10 := τ_1 - τ_0. The atomic total-effect variable is unbiased because conditioning on C_X is equivalent to conditioning on all possible covariates. We chose the term atomic total-effect variable because δ_10 is a random variable whose values are the total effects given every possible combination of covariates contained in C_X; that is, it is not possible to further subdivide the total effects represented by δ_10. Going down to the atomic level of effects in our illustrative example, the atomic level is the individual person level, and the values of δ_10 are the person-specific total effects of treatment (X = 1) vs. control (X = 0). Ideally, we would like to estimate the values of δ_10 itself, but that is not possible (cf. the fundamental problem of causal inference; Holland, 1988). Therefore, we introduce causality conditions that allow for identification of conditional and unconditional expectations of the δ_10 variable.

In a parallel manner, we can also define the atomic direct-effect variable using the variables τ_x,tM defined in Equation 7: δ_10,tM := τ_1,tM - τ_0,tM. The atomic direct-effect variable is defined by conditioning on all covariates that are prior to or simultaneous with the time point t_M in the stochastic process considered. In our example, the time point t_M refers to the time point when M is measured. By definition, we control for all possible pre- and posttreatment confounders to obtain an unbiased atomic direct-effect variable. The values of the atomic direct-effect variable in our example are the direct effects given a person U = u and a value of the intermediate variable M = m. This definition of an atomic direct-effect variable is based on the notion of a stochastic process and represents a more general definition of direct effects compared with potential outcomes notation (see the paragraph about true-outcome variables in the Discussion for details).

Finally, the atomic indirect-effect variable is defined to be the difference between the atomic total-effect variable and the atomic direct-effect variable: δ_10 - δ_10,tM. Following our definitions, the values of the atomic indirect-effect variable represent the effects that are not direct. In the stochastic theory of causal effects, the indirect effect is not defined as a product of coefficients as in the basic mediation model. Instead, it is defined as a difference between two random variables, that is, as a difference between the atomic total-effect variable and the atomic direct-effect variable. In a simple linear path model, the two ways of defining the indirect effect yield equivalent results. Of importance, the product-of-coefficients approach does not generalize to nonlinear models.

Average and Conditional Total, Direct, and Indirect Effects

Based on the definitions of the atomic total-, direct-, and indirect-effect variables, we can consider conditional and unconditional expectations of these random variables. The unconditional expectation E(δ_10) of the atomic total-effect variable is defined to be the average total effect of X = 1 vs. X = 0. Accordingly, E(δ_10,tM) and E(δ_10 - δ_10,tM) are the average direct and average indirect effects, respectively. Considering total effects, we may be interested in the regression E(δ_10 | X) and its values E(δ_10 | X = x). If X = 1 denotes treatment (the new teaching intervention), then E(δ_10 | X = 1) is the total effect given treatment.
If we had included another covariate of X such as sex Z, we might be interested in the two possible values of E(δ_10 | Z): the conditional total treatment effect E(δ_10 | Z = m) on males and the conditional total treatment effect E(δ_10 | Z = f) on females. Finally, we might also consider the conditional expectation E(δ_10 | X, Z), whose values are the treatment effects of the treated males, untreated males, treated females, and untreated females, respectively.

Now considering direct effects, we may condition on a variable in (X, C_X,tM) or any combination thereof. An example is conditioning on M. The values of E(δ_10,tM | M) are the direct effects of the treatment given a specific level m of the intermediate variable M, for example, a specific amount of study time. A second example is conditioning on (Z, M). In this case, a value of E(δ_10,tM | Z, M) is the conditional direct treatment effect given specific values (z, m) of the covariate Z (e.g., sex = male) and the intermediate variable M. Other aggregations of the δ_10,tM variable are possible as well. For example, Pearl (2001) suggests first averaging over the conditional distribution of M given X = x and Z = z and then averaging over the unconditional distribution of pretreatment covariates Z to obtain the average natural direct effect. Didelez, Dawid, and Geneletti (2006) suggest averaging over any distribution of M that is not influenced by X in order to obtain an average standardized direct effect. Which of these effects are most interesting will depend on the specific research question.

Unbiasedness

In the preceding section, we outlined how to define average and conditional total, direct, and indirect effects: the effects we would ideally like to estimate. These causal effects are defined based on true-outcome variables, which are not observable. How can these effects be identified from empirically estimable quantities? In practice, we can only estimate conditional expectations such as E(Y | X, M, Z_tM) and conditional regressions E_X=x(Y | M, Z_tM) given treatment x. In order to link these regressions to the corresponding theoretical true-outcome variables and atomic effect variables, we have to make assumptions that specify the conditions under which these conditional expectations are identical with conditional or unconditional expectations of the true-outcome variables.

The equivalence of conditional expectations such as E(Y | X, M, Z_tM) or E_X=x(Y | M, Z_tM) with conditional and unconditional expectations of the true-outcome variables is referred to as unbiasedness.

TABLE 3
Definitions of Unbiasedness With Respect to Total and Direct Effects

Unbiasedness with respect to total effects
E(Y | X = x)                          E(Y | X = x) = E(τ_x)
E(Y | X)                              E(Y | X = x) = E(τ_x) for each value x of X
E(Y | X = x, Z = z)                   E(Y | X = x, Z = z) = E(τ_x | Z = z)
E_X=x(Y | Z)                          E_X=x(Y | Z) = E(τ_x | Z)
E(Y | X, Z)                           E_X=x(Y | Z) = E(τ_x | Z) for each value x of X

Unbiasedness with respect to direct effects
E(Y | X = x, M = m, Z_tM = z_tM)      E(Y | X = x, M = m, Z_tM = z_tM) = E(τ_x,tM | M = m, Z_tM = z_tM)
E_X=x(Y | M, Z_tM)                    E_X=x(Y | M, Z_tM) = E(τ_x,tM | M, Z_tM)
E(Y | X, M, Z_tM)                     E_X=x(Y | M, Z_tM) = E(τ_x,tM | M, Z_tM) for each value x of X

Note. The empirically estimable quantities in the left column are defined as unbiased if they are equal to the corresponding theoretical concept as specified in the right column. Z is a covariate of X and Z_tM is a t_M-covariate of X.

The upper part of Table 3 presents the definitions of unbiasedness that are relevant for the analysis of total effects. In our illustrative example of the randomized experiment on the teaching intervention, the conditional expectation E(Y | X = x) is unbiased with respect to total effects if it is equal to the expectation E(τ_x) of the total-effect true-outcome variable τ_x. The example was constructed in such a way that this equality holds for X = 1 and X = 0. Because both values of the regression E(Y | X) are unbiased in our example, we can also say that the regression itself is unbiased (cf. second row of Table 3). If E(Y | X = 1) and E(Y | X = 0) are both unbiased, then their difference is unbiased as well, that is, E(Y | X = 1) - E(Y | X = 0) = E(τ_1) - E(τ_0) = E(δ_10). Similarly, E_X=x(Y | Y_pre) is unbiased with respect to (Y_pre)-conditional total effects if it is equal to E(τ_x | Y_pre) (cf. fourth row of Table 3). This equality also holds true in our example. Note that conditional unbiasedness does not imply unconditional unbiasedness and vice versa.

The lower part of Table 3 displays the corresponding definitions of unbiasedness that are important for the analysis of direct effects with respect to the time point t_M. In these definitions the direct-effect true-outcome variables τ_x,tM are involved. Using our example, the regression E_X=x(Y | M) is not unbiased because it is not equal to the regression E(τ_x,tM | M). In contrast, the regression E_X=x(Y | M, M_pre) is unbiased because it is equal to E(τ_x,tM | M, M_pre).

CAUSALITY CONDITIONS

Unbiasedness as defined in the preceding section is the weakest condition implying identifiability of causal effects. Unbiasedness cannot be tested empirically. In this section, we identify stronger conditions that imply unbiasedness (and identifiability) and that can be tested empirically. We term these conditions causality conditions. In this article we restrict ourselves to treating two types of these causality conditions that can be applied in the context of our empirical example. These two types of conditions are termed independent cause conditions and regressively independent outcome conditions. In practice, it is not possible to control for all covariates.
Causality conditions provide the theoretical foundation to decide which covariates need to be controlled for in order to causally interpret results from a mediation analysis. They yield implications that can be used to test if all relevant covariates out of a set of observed covariates have been included in the hypothesized model. These tests can be easily conducted with standard statistical software packages and are a useful tool for applied researchers. We believe it is not sufficient to simply assume that all relevant confounders have been controlled. Rather, we argue for careful empirical evaluation of the plausibility of the assumptions. Such evaluation is especially important in mediation models, even in the context of a randomized experiment, as we demonstrate. We begin by introducing independent cause conditions and regressively independent outcome conditions and then consider their consequences with regard to design techniques and covariate selection. Independent Cause Conditions The independent cause conditions refer to the conditional or unconditional independence of the treatment variable X and a global covariate. We consider four independent cause conditions that apply in the context of our example. Recall that X

in our example is a dichotomous treatment variable (teaching intervention vs. control). Two (1, 2) of the independent cause conditions apply to the total effects and two (3, 4) apply to the direct and indirect effects. Beginning on the left-hand side, we present a shorthand for the corresponding condition, followed by its definition, and then the application of the definition to our example.

Total Effects:

(1) X ⊥ C_X: P(X = 1 | C_X) = P(X = 1). Independence of X and C_X = (U, Y_pre, M_pre) can be created by random assignment of units to treatment conditions. It implies unbiasedness of E(Y | X). X ⊥ C_X holds true in our example.

(2) X ⊥ C_X | Z: P(X = 1 | C_X) = P(X = 1 | Z). Conditional independence of X and C_X given a covariate Z can be created by conditional random assignment of units to treatment conditions based on the values z of Z. For example, random assignment to treatment conditions could be carried out within each gender. Furthermore, the covariate Z = (Z_1, ..., Z_Q) can be selected such that X ⊥ C_X | Z might hold. If X ⊥ C_X | Z holds, then E(Y | X, Z) is unbiased. X ⊥ C_X | Z holds true in our example for every Z = f(C_X) because X ⊥ C_X implies X ⊥ C_X | f(C_X).

Direct and Indirect Effects:

(3) X ⊥ C_X,tM | M: P(X = 1 | C_X,tM) = P(X = 1 | M). Conditional independence of X and C_X,tM given the intermediate variable M cannot be created by any known design technique. It also cannot be used for covariate selection. X ⊥ C_X,tM | M implies that E(Y | X, M) is unbiased. X ⊥ C_X,tM | M does not hold in our example.

(4) X ⊥ C_X,tM | Z_tM, M: P(X = 1 | C_X,tM) = P(X = 1 | M, Z_tM). Conditional independence of X and C_X,tM given a t_M-covariate Z_tM and the intermediate variable M cannot be created by any known design technique. However, the covariate Z_tM = (Z_1, ..., Z_Q) can be selected such that X ⊥ C_X,tM | Z_tM, M might hold. X ⊥ C_X,tM | Z_tM, M implies that E(Y | X, M, Z_tM) is unbiased. X ⊥ C_X,tM | Z_tM, M holds true in our example for Z_tM = M_pre and for Z_tM = (M_pre, Y_pre).

Regressively Independent Outcome Conditions

The regressively independent outcome conditions refer to conditional or unconditional regressive independence of the outcome variable Y and all covariates of X, or all t_M-covariates of X, respectively. Consider a continuous random variable Y and a discrete random variable Z with values z. Y is regressively independent of Z if all conditional expectations E(Y | Z = z) are equal. In this case the regression E(Y | Z) = E(Y) is a constant. Similarly, for conditional regressive independence, Y is regressively independent of Z conditional on X if E(Y | X, Z) = E(Y | X). We consider four regressively independent outcome conditions that apply in the context of our example. Note that the symbol ⊥_reg represents regressive independence and that P(X = x | C_X,tM) > 0 holds in our example. The first two conditions (5, 6) presented here are relevant for total effects and the second two conditions (7, 8) are relevant for direct and indirect effects. None of the regressively independent outcome conditions can be created by (conditional) randomization. Beginning on the left-hand side, we present a shorthand for the corresponding condition, followed by its definition, and then the application of the definition to our example.

Total Effects:

(5) Y ⊥_reg C_X | X: E(Y | X, C_X) = E(Y | X). This condition is called regressive independence of Y from all covariates Z given X. Y ⊥_reg C_X | X implies unbiasedness of E(Y | X). It does not hold in our example (see Equation 4).

(6) Y ⊥_reg C_X | X, Z: E(Y | X, C_X) = E(Y | X, Z).
This condition is called regressive independence of Y from all covariates of X given X and Z. The covariate Z = (Z_1, ..., Z_Q) can be selected such that Y ⊥_reg C_X | X, Z might hold. Y ⊥_reg C_X | X, Z implies that E(Y | X, Z) is unbiased. In our example, it holds for neither Z = M_pre nor Z = Y_pre.

Direct and Indirect Effects:

(7) Y ⊥_reg C_X,tM | X, M: E(Y | X, C_X,tM) = E(Y | X, M). This condition is called regressive independence of Y from all covariates Z_tM given X and M. Y ⊥_reg C_X,tM | X, M implies that E(Y | X, M) is unbiased. Y ⊥_reg C_X,tM | X, M does not hold in our example (see Equation 5).

(8) Y ⊥_reg C_X,tM | X, M, Z_tM: E(Y | X, C_X,tM) = E(Y | X, M, Z_tM). This condition is called regressive independence of Y from all t_M-covariates of X, given X, M, and Z_tM. The covariate Z_tM = (Z_1, ..., Z_Q) can be selected such that Y ⊥_reg C_X,tM | X, M, Z_tM might hold. Y ⊥_reg C_X,tM | X, M, Z_tM implies that E(Y | X, M, Z_tM) is unbiased. Y ⊥_reg C_X,tM | X, M, Z_tM holds true in our example for Z_tM = Y_pre and for Z_tM = (M_pre, Y_pre).

From a methodological point of view, there are three key points: (a) Direct and indirect effects estimated in the basic mediation model are only causally interpretable under unbiasedness of E(Y | X, M), which would be implied by X ⊥ C_X,tM | M or by Y ⊥_reg C_X,tM | X, M.

However, unbiasedness of E(Y | X, M) is unlikely to hold, as shown in our example. (b) There is no known randomization technique that could create X ⊥ C_X,tM | Z_tM, M or unbiasedness of the regression E(Y | X, M, Z_tM). Hence, whenever the goal is to analyze direct and indirect effects of treatments, there is no alternative to using causal modeling techniques. (c) If the focus of the analysis is only on average and conditional total treatment effects, then either randomization (X ⊥ C_X) or conditional randomization (X ⊥ C_X | Z) guarantees unbiasedness of effect estimates, given that the regressions E(Y | X) and E(Y | X, Z) are correctly specified. Only in this case can we rely on the conditional or unconditional independence created by randomization.

Testing Causality Conditions

The unbiasedness conditions listed in Table 3 represent properties of the true-outcome variables, which have no implications that can be tested empirically. Other causality conditions such as strong ignorability (Rosenbaum & Rubin, 1983) or sequential ignorability (Imai, Keele, & Tingley, 2010) share this same limitation. They provide no opportunity for researchers to directly probe their hypothesized causal model. In contrast, the independent cause conditions and the regressively independent outcome conditions are potentially falsifiable. Although they all involve the global covariates C_X or C_X,tM, they imply propositions that can be tested empirically.

We consider again our illustrative example of the teaching experiment. Independence of X and all covariates of X (X ⊥ C_X) implies the following testable propositions:

P(X = 1 | Y_pre) = P(X = 1 | M_pre) = P(X = 1 | Y_pre, M_pre) = P(X = 1).

As we will illustrate here, these hypotheses can be tested using logistic regression techniques. In our illustrative example, we would test whether the inclusion of covariates Z = f(U, Y_pre, M_pre) significantly alters the probability of receiving treatment. Similarly, conditional independence of X and all covariates of X given M_pre implies the following testable proposition:

P(X = 1 | M_pre, Y_pre) = P(X = 1 | M_pre),

which follows from X ⊥ C_X | Z by choosing Z = M_pre. We could also consider the testable propositions of conditional independence of X and all covariates of X given Y_pre by choosing Z = Y_pre. The independent cause conditions for direct effects (X ⊥ C_X,tM | M and X ⊥ C_X,tM | Z_tM, M) have corresponding implications.

Turning to the regressively independent outcome conditions, similar arguments apply. Y ⊥_reg C_X | X, applied to our illustrative example, implies

E(Y | X, Y_pre) = E(Y | X, M_pre) = E(Y | X, Y_pre, M_pre) = E(Y | X).

As shown here in the context of our illustrative example, these hypotheses can be directly tested using regression techniques. Similarly, Y ⊥_reg C_X | X, Z implies E(Y | X, Y_pre, M_pre) = E(Y | X, Y_pre) for Z = Y_pre. Note that we can also choose to test Y ⊥_reg C_X | X, Z for Z = M_pre. These implications can also be directly tested. The regressively independent outcome conditions for direct effects (Y ⊥_reg C_X,tM | X, M and Y ⊥_reg C_X,tM | X, M, Z_tM) have corresponding implications.
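
To make the testing strategy concrete, the following sketch (added here, not from the article) generates a large synthetic sample from one generative model that is consistent with Table 2 and Figure 2 and then runs the two kinds of tests with statsmodels: a likelihood-ratio test of the logistic regression of X on the pretests, and nested-model F tests for the regressive independence conditions. The residual standard deviations (5 for M, 4 for Y) are derived from the stated variances and, like all variable and function names below, are assumptions of this sketch.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 100_000

    # One generative model consistent with Table 2 / Figure 2 (residual SDs are derived assumptions)
    pre = rng.multivariate_normal([100, 100], [[100, 85], [85, 100]], size=n)
    Y_pre, M_pre = pre[:, 0], pre[:, 1]
    X = rng.binomial(1, 0.5, size=n)                                  # randomized treatment
    M = 20 * X + 0.80 * M_pre + rng.normal(0, 5, size=n)              # posttreatment study time
    Y = 10 * X + 0.50 * M + 0.90 * Y_pre + rng.normal(0, 4, size=n)   # posttreatment achievement
    df = pd.DataFrame(dict(X=X, M=M, Y=Y, M_pre=M_pre, Y_pre=Y_pre))

    # Independent cause condition: P(X = 1 | Y_pre, M_pre) = P(X = 1).
    # The likelihood-ratio test against the intercept-only model is expected to be
    # nonsignificant (up to Type I error), because X was randomized.
    logit_fit = smf.logit("X ~ Y_pre + M_pre", data=df).fit(disp=False)
    print("LR p-value, X ~ Y_pre + M_pre:", logit_fit.llr_pvalue)

    # Condition (5), E(Y | X, Y_pre, M_pre) = E(Y | X): the nested-model F test should
    # reject here, i.e., condition (5) is falsified in this example (see Equation 4).
    print(sm.stats.anova_lm(smf.ols("Y ~ X", data=df).fit(),
                            smf.ols("Y ~ X + Y_pre + M_pre", data=df).fit()))

    # Condition (8) with Z_tM = Y_pre, E(Y | X, M, Y_pre, M_pre) = E(Y | X, M, Y_pre):
    # adding M_pre should not improve the fit, consistent with the text.
    print(sm.stats.anova_lm(smf.ols("Y ~ X + M + Y_pre", data=df).fit(),
                            smf.ols("Y ~ X + M + Y_pre + M_pre", data=df).fit()))

This simply operationalizes the testable propositions above with standard software, as the article suggests; any equivalent likelihood-ratio, Wald, or F test could be used instead.
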
IDENTIFICATION OF CAUSAL EFFECTS

Under the assumption of unbiased conditional regressions, we can identify average total, direct, and indirect effects. The basic idea is to estimate the expectations of Y conditional on a covariate Z = (Z_1, ..., Z_Q) within each treatment condition x. We need to select (Z_1, ..., Z_Q) in such a way that unbiasedness can be assumed. We can then average over the distribution of (Z_1, ..., Z_Q) in order to obtain adjusted means. The adjusted means are estimates of the expectations of the τ_x and τ_x,tM variables, and their differences are estimates of the average total and direct effects. Table 4 summarizes identification of conditional and unconditional causal effects in general terms.

We again use our example to illustrate identification of average total, direct, and indirect effects. Consider the unbiased regressions E_X=1(Y | Y_pre) and E_X=0(Y | Y_pre). Recall that these regressions are unbiased because X ⊥ C_X | Y_pre. Consequently, the expectation E[E_X=x(Y | Y_pre)] of these conditional regressions is equal to the expectation E(τ_x) of the true-outcome variable with respect to total effects. The average total effect E(δ_10) then is identified by

E(δ_10) = E(τ_1) - E(τ_0)
        = E[E_X=1(Y | Y_pre)] - E[E_X=0(Y | Y_pre)]
        = E[α_10 + α_11 Y_pre] - E[α_00 + α_01 Y_pre]
        = α_10 + α_11 E(Y_pre) - (α_00 + α_01 E(Y_pre)),   (8)

as shown in the first row of Table 4; α_xi denotes the ith regression coefficient in treatment condition X = x. This is one way to identify the average total effect. Because the regressions E(Y | X), E(Y | X, M_pre), and E(Y | X, Y_pre, M_pre) are also unbiased, we can alternatively identify the average total effect as E(Y | X = 1) - E(Y | X = 0), or as E[E_X=1(Y | M_pre)] - E[E_X=0(Y | M_pre)], or as E[E_X=1(Y | Y_pre, M_pre)] - E[E_X=0(Y | Y_pre, M_pre)]. In our example, all these regressions are unbiased; therefore, we may use any of these ways to identify the average total effect.

TABLE 4
Identification of Causal Effects

Identification of average and conditional total effects under unbiasedness of E(Y | X, Z). Z is a covariate of X and V is a function of Z.

Effect              Term                        Identification
E(δ_10)             Average total effect        E[E_X=1(Y | Z) - E_X=0(Y | Z)]
E(δ_10 | V = v)     Total effect given V = v    E[E_X=1(Y | Z) - E_X=0(Y | Z) | V = v]
E(δ_10 | X = x)     Total effect given X = x    E[E_X=1(Y | Z) - E_X=0(Y | Z) | X = x]

Identification of average and conditional direct effects under unbiasedness of E(Y | X, M, Z_tM). Z_tM is a t_M-covariate of X and W is a function of (M, Z_tM).

Effect                  Term                        Identification
E(δ_10,tM)              Average direct effect       E[E_X=1(Y | M, Z_tM) - E_X=0(Y | M, Z_tM)]
E(δ_10,tM | W = w)      Direct effect given W = w   E[E_X=1(Y | M, Z_tM) - E_X=0(Y | M, Z_tM) | W = w]
E(δ_10,tM | M = m)      Direct effect given M = m   E[E_X=1(Y | M, Z_tM) - E_X=0(Y | M, Z_tM) | M = m]
E(δ_10,tM | X = x)      Direct effect given X = x   E[E_X=1(Y | M, Z_tM) - E_X=0(Y | M, Z_tM) | X = x]

Similarly, the average direct effect can be identified based on any of the following unbiased regressions: E(Y | M, X, Y_pre), or E(Y | M, X, M_pre), or E(Y | M, X, Y_pre, M_pre). Using E(Y | M, X, Y_pre) as an example, the average direct effect is identified by

E(δ_10,tM) = E(τ_1,tM) - E(τ_0,tM)
           = E[E_X=1(Y | M, Y_pre)] - E[E_X=0(Y | M, Y_pre)]
           = E[γ_10 + γ_11 M + γ_12 Y_pre] - E[γ_00 + γ_01 M + γ_02 Y_pre]
           = γ_10 + γ_11 E(M) + γ_12 E(Y_pre) - (γ_00 + γ_01 E(M) + γ_02 E(Y_pre)),   (9)

where γ_xi denotes the ith regression coefficient in treatment condition X = x. Note that the regression E(Y | M, X) may not be used to identify the average direct effect because it is biased in our example. The average indirect effect is identified by

E(δ_10) - E(δ_10,tM) = E(τ_1) - E(τ_0) - [E(τ_1,tM) - E(τ_0,tM)]
                     = E[E_X=1(Y | Y_pre)] - E[E_X=0(Y | Y_pre)]
                       - [E[E_X=1(Y | M, Y_pre)] - E[E_X=0(Y | M, Y_pre)]].   (10)

We demonstrated how to identify average causal effects in the context of our illustrative example. In other applications, including interactions between M and X, it can be informative to also consider other aggregates of atomic effect variables, such as conditional effects as shown in Table 4 or natural direct effects (Pearl, 2001). See Pearl (2012) and the references mentioned in his Footnote 6 for substantive discussions about policy implications. For example, Pearl's (2001) natural direct effect (NDE_x), expressed in terms of our stochastic theory, can be identified based on any of the following unbiased regressions: E(Y | M, X, Y_pre), or E(Y | M, X, M_pre), or E(Y | M, X, Y_pre, M_pre). Using E(Y | M, X, Y_pre) as an example, the NDE_x is identified by

NDE_x = E[ E_X=x[ E_X=1(Y | M, Y_pre) - E_X=0(Y | M, Y_pre) | Y_pre ] ]
      = (γ_10 - γ_00) + (γ_11 - γ_01) E[E_X=x(M | Y_pre)] + (γ_12 - γ_02) E(Y_pre)   (11)

for x = 0, 1. Similarly, the direct effect given X = 1, which is sometimes also termed the direct effect on the treated or the direct effect on the exposed (Vansteelandt & VanderWeele, 2012), is identified by

E(δ_10,tM | X = 1) = E_X=1[E_X=1(Y | M, Y_pre) - E_X=0(Y | M, Y_pre)]
                   = (γ_10 - γ_00) + (γ_11 - γ_01) E_X=1(M) + (γ_12 - γ_02) E_X=1(Y_pre).   (12)

In our illustrative example with randomized assignment to treatment conditions and no interactions, the average direct effect, the natural direct effect, the direct effect given X = 1, and all other conditional effects shown in Table 4 are identical.
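
As a concrete illustration of Table 4 and Equations 8-10, the adjusted-means estimator can be sketched as follows: fit the unbiased regression within each treatment group, average its predictions over the whole sample, and take differences. The code below is an added sketch, not from the article; it reuses the synthetic generative model described above (pandas and statsmodels assumed, residual standard deviations derived), and the helper adjusted_mean is a name introduced here.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 100_000

    # Synthetic teaching-experiment data (same assumed generative model as above)
    pre = rng.multivariate_normal([100, 100], [[100, 85], [85, 100]], size=n)
    Y_pre, M_pre = pre[:, 0], pre[:, 1]
    X = rng.binomial(1, 0.5, size=n)
    M = 20 * X + 0.80 * M_pre + rng.normal(0, 5, size=n)
    Y = 10 * X + 0.50 * M + 0.90 * Y_pre + rng.normal(0, 4, size=n)
    df = pd.DataFrame(dict(X=X, M=M, Y=Y, M_pre=M_pre, Y_pre=Y_pre))

    def adjusted_mean(formula, group, data):
        """Fit E_{X=x}(Y | ...) in one treatment group, then average its predictions over the full sample."""
        fit = smf.ols(formula, data=data[data.X == group]).fit()
        return fit.predict(data).mean()

    # Average total effect via E_{X=x}(Y | Y_pre), cf. Equation 8 (expected value: 20)
    ate = adjusted_mean("Y ~ Y_pre", 1, df) - adjusted_mean("Y ~ Y_pre", 0, df)

    # Average direct effect via E_{X=x}(Y | M, Y_pre), cf. Equation 9 (expected value: 10)
    ade = adjusted_mean("Y ~ M + Y_pre", 1, df) - adjusted_mean("Y ~ M + Y_pre", 0, df)

    print(f"average total effect    ~ {ate:.2f}")
    print(f"average direct effect   ~ {ade:.2f}")
    print(f"average indirect effect ~ {ate - ade:.2f}")   # cf. Equation 10 (expected value: 10)

Because the example has no treatment-by-mediator interaction, averaging the same group-specific fits only over the treated units, or using the natural-direct-effect weighting of Equations 11-12, would give essentially the same answer, as the article notes.
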
A SIMULATED DATA EXAMPLE

Our simulated data example has two goals. First, we wish to demonstrate how the general theory of total, direct, and indirect effects can be applied to a data set and how the regressions and expectations discussed earlier can be estimated and tested using structural equation modeling. Second, we wish to illustrate problems that occur in an ideal randomized experiment when direct and indirect treatment effects are estimated. The simulated data set is based on the teaching experiment described earlier.