Causal Inference for Mediation Effects


Causal Inference for Mediation Effects
by
Jing Zhang
B.S., University of Science and Technology of China, 2006
M.S., Brown University, 2008
A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Biostatistics at Brown University
Providence, Rhode Island
May 2012

© Copyright by Jing Zhang, May 2012

This dissertation by Jing Zhang is accepted in its present form by Biostatistics as satisfying the dissertation requirement for the degree of Doctor of Philosophy.
Joseph W. Hogan, Advisor
Recommended to the Graduate Council:
Hernando Ombao, Reader
Tao Liu, Reader
Christopher W. Kahler, Reader
Approved by the Graduate Council:
Peter M. Weber, Dean of the Graduate School

Acknowledgments
First and foremost, I express my deepest gratitude to my advisor and mentor, Professor Joseph W. Hogan, who guided and supported me throughout my research with his patience and knowledge while allowing me room to work in my own way. Without his advice and persistent help, this thesis would not have been possible. I would also like to acknowledge the comments from Hernando Ombao, Tao Liu, Christopher W. Kahler, Michael J. Daniels and Jason Roy. This research was supported through a grant from RC1 AA

Table of Contents
List of Tables
List of Figures
1 Imputation-Based Inference for Natural Direct and Indirect Effects
  Introduction
  Estimation of causal mediation effects
    Assumptions and joint model of potential mediator
    Estimating NDE and NIE
    Inference about natural direct and indirect effects
  Simulation Study
  Data Application
  Discussion
2 Causal Inference about Mediation when there are Several Mediators
  Introduction
  Estimation of natural causal effects with multiple mediators
    Notations and natural causal effects definitions
    Assumptions
    Method of Estimation
  Application to STRIDE Data
  Discussion
3 Causal Mediation Analysis for Observational Survival Data
  Introduction
  Nutrition Data from USAID-AMPATH program
  Estimation of causal mediation effect
    Notations and natural causal effects definitions
    Assumptions
    NDE and NIE Estimations
  AMPATH Data application
  Discussion
A Consistency of Estimators in Chapter 1
  A.1 The Proof of IPW estimator's consistency
  A.2 The Proof of REG estimator's consistency
  A.3 The Proof of AIPW estimator's consistency
B The Decomposition of Total Effect in Chapter 2
C Insensitivity of Causal Effects Estimators to the sensitivity parameter in Chapter 2

List of Tables
1.1 Simulated estimates of NDE when working models are correctly specified (True NDE = 2.84)
1.2 Simulated estimates of NDE when models of $[M_0 \mid X]$ and $[M_1 \mid X]$ are correctly specified and the regression imputation model is misspecified (True NDE = 2.84)
1.3 Simulated estimates of the natural direct effect when models of $[M_0 \mid X]$ and $[M_1 \mid X]$ are misspecified and the regression imputation model is correctly specified (True NDE = 2.84)
1.4 Inference of natural effects by carrying forward the most recent behavior process measurement for the missing mediator
Strata of binary mediator
Estimates and SE (in parentheses) of natural causal effects in the STRIDE study: unconstrained models allow overlapping effects, and constrained models do not
Frequency of number of controls in matched sets
Full data and observed data: o represents observed and I represents imputation

List of Figures
1.1 The histograms of potential outcomes in one generated dataset
The weights boxplot depending on M
Partitioning of direct and indirect effects when there are two mediators
The causal diagram of Nutrition Data
The proportion surviving after ART initiation for exposed and control groups

Abstract of "Causal Inference for Mediation Effects," by Jing Zhang, Ph.D., Brown University, May 2012
This thesis is motivated by specific challenges in analyzing data from behavioral trials, in particular the STRIDE study, which was designed to increase physical activity. Typically, the objective is to understand how the effect of an intervention on a primary outcome operates through and around potential mediating variables. The overall effect of a treatment can be broken down into its direct and indirect effects. The indirect effect is the effect of the intervention on the outcome that passes through the mediator, and the direct effect is the effect that flows around the mediator. The most widely used approach to mediation analysis is the Baron-Kenny method. However, it requires that subjects be randomized to the baseline intervention and, following randomization, to mediator levels within each intervention group. Other methods assume that each individual has a potential outcome at each level of the mediator, and they measure controlled effects. In behavioral trials, when the mediator reflects inherent characteristics of an individual, such as motivation to exercise, models capturing controlled effects may not be appropriate, because the mediator cannot be externally manipulated. Here, we propose methodology that addresses this shortcoming by estimating natural causal effects. The natural direct effect measures the intervention effect when the mediator for each subject is fixed at its potential level under no intervention, while the natural indirect effect measures the effect of the intervention through the mediator by contrasting the mediator value under intervention with its value under no intervention. Although these methods are motivated by behavioral data, they are applicable in other clinical trial settings as well.
The first part of this thesis proposes three methods to estimate natural direct and indirect effects in randomized trials with a single mediator: inverse probability weighting (IPW), regression imputation (REG) and augmented inverse probability weighting (AIPW). We use baseline covariates to impute the unobserved potential mediator, and a sensitivity parameter to capture the association between the two potential mediators. The unobserved potential outcome is treated as a missing value whose expectation can be consistently estimated under certain assumptions. We study the properties of our methods in simulation studies and illustrate them using an analysis of a recent intervention trial designed to increase physical activity.
The second part of this thesis develops a model to estimate natural causal mediation effects when there are several potential mediators in a randomized trial, and decomposes the total intervention effect into a natural direct effect and specific indirect effects. Our model identifies the joint distribution of the potential mediators using baseline covariates and targeted restrictions on their correlation structure. Unobserved potential mediators and the associated potential outcomes can therefore be imputed under the model, and causal contrasts of interest can be computed in a straightforward manner. We illustrate our methods using an analysis of a recent intervention trial designed to increase physical activity.
The third part of this thesis examines causal mediation analysis for survival data from an observational study. A matched nutrition dataset is obtained from the USAID-AMPATH program using optimal matching, and we propose a method to identify natural causal effects on the survival probability odds ratio scale. For each individual in the treatment group, the unobserved potential outcome with the mediator fixed at its value under control can be imputed based on baseline covariates and the mediator values of matched controls. Causal contrasts of interest can then be inferred using conditional logistic regression that accounts for the matching.

Chapter 1
Imputation-Based Inference for Natural Direct and Indirect Effects
Abstract
In randomized trials of behavioral interventions, such as those designed to increase physical activity or reduce substance abuse, a typical objective is to understand how the effect of an intervention on a primary outcome operates through potential mediating variables. A common approach to mediation analysis is the Baron-Kenny method, a two-stage regression that requires randomization of the mediating variable within each intervention group. However, because mediating variables are observed after randomization, this assumption typically will not hold in practice. Other methods assume that mediator levels can be externally manipulated or controlled, so that each individual has a potential outcome at each level of the mediator. When the mediator reflects inherent characteristics of an individual, such as motivation to exercise, models that capture controlled effects may not be appropriate. An alternative formulation in terms of natural direct and indirect effects can avoid both shortcomings. We propose three methods to estimate natural direct and indirect effects: inverse probability weighting (IPW), regression imputation (REG) and augmented inverse probability weighting (AIPW). The natural direct effect measures the intervention effect when the mediator for each subject is fixed at its potential level under no intervention, while the natural indirect effect measures the effect of the intervention through the mediator by contrasting the mediator value under intervention with its value under no intervention. We use baseline covariates to impute the unobserved potential mediator, and a sensitivity parameter to capture the association between the two potential mediators. The unobserved potential outcome is treated as a missing value whose expectation can be consistently estimated under certain assumptions. We study the properties of our methods in simulation studies and illustrate them using an analysis of a recent intervention trial designed to increase physical activity.

1.1 Introduction
In randomized behavioral intervention trials, new interventions are usually tested against a standardized intervention or a no-intervention control group. A specific example is Project STRIDE (Marcus and others 2007), a 4-year randomized trial designed to promote physical activity adoption and short-term maintenance among previously sedentary adults. The primary outcome is minutes per week of moderate-intensity physical activity, measured at baseline, 6 and 12 months by the 7-day Physical Activity Recall (PAR) interview. Participants were randomized to one of three arms: a telephone-based intervention with individualized motivational feedback; a print-based intervention with individualized motivational feedback; and a contact-control delayed-treatment group. Individuals randomized to either intervention arm received individually tailored messages generated by a computer expert system, stage-targeted booklets, and physical activity-related tip sheets. Participants received a total of 14 contacts over the course of 12 months, with contacts more frequent at the beginning of the study. The material delivered to the two treatment arms was matched, but the channel of delivery differed. The control group was mailed health education information on the same schedule as the treatment groups, and its members were informed that they would be able to choose either the print- or the telephone-based intervention for 12 months. To assess the effectiveness of the intervention, the study considered not only physical activity but also potential psychosocial mediators of physical activity behavior change.
Several factors affect whether a person is physically active, and some psychological theories and models (e.g., learning theory, decision-making theory, and social cognitive theory) can be useful when developing strategies to help participants become more physically active (King and others 2002; Prochaska and DiClemente 1983; Marcus and Forsyth 2009). The theory used to design the program suggests which factors might be most important for producing physical activity change. Hence, quantifying the portion of the intervention effect that flows through and around these potential mediators is of great interest: it lets researchers understand the mechanism of the intervention more deeply and use this knowledge to develop more effective interventions. The overall effect of a treatment can be broken down into its direct and indirect effects. The indirect effect is the effect of the intervention on the outcome that passes through the mediator, and the direct effect is the effect that flows around the mediator. If they are additive, their sum is the total effect. We can further distinguish between controlled and natural direct effects (Pearl 2001; Robins and Greenland 1992). The former measures the intervention effect in the population when the mediator is set to a pre-specified level for all individuals; the latter measures the intervention effect when the mediator for each subject is fixed at its potential level under no intervention (its natural level). The most widely used approach to mediation analysis is the regression-based Baron-Kenny method (Baron and Kenny 1986). However, it requires that subjects be randomized to the baseline intervention and, following randomization, to mediator levels within each intervention group; the latter condition is called the sequential ignorability assumption (MacKinnon and others 2002). A fundamental drawback of the Baron-Kenny method is that mediation is a post-randomization event that generally cannot be controlled. Even under a weaker sequential ignorability assumption, in which both intervention and mediator are randomized conditionally on observed covariates, the approach may still be untenable in practice (Lynch and others 2008; Ten Have and others 2007).
When these assumptions are satisfied and the models are correctly specified, the Baron-Kenny method estimates controlled causal effects.
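To make the regression steps of the Baron-Kenny method concrete, here is a minimal plain-Python sketch of the product-of-coefficients calculation for a single mediator without covariates. The helper names (`ols_simple`, `ols_two`, `baron_kenny`) and the simulated data are hypothetical illustrations, not part of the thesis; as noted above, the resulting coefficients target controlled effects and rely on sequential ignorability.

```python
import random


def mean(v):
    return sum(v) / len(v)


def ols_simple(x, y):
    """Intercept and slope of y regressed on a single regressor x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def ols_two(x1, x2, y):
    """Slopes (b1, b2) of y regressed on x1 and x2 with an intercept."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    # Solve the centred 2x2 normal equations by Cramer's rule.
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def baron_kenny(r, m, y):
    """Return total c, a (M on R), b (M in Y model), direct c', indirect a*b."""
    _, c = ols_simple(r, y)        # step 1: Y on R
    _, a = ols_simple(r, m)        # step 2: M on R
    c_prime, b = ols_two(r, m, y)  # step 3: Y on R and M
    return c, a, b, c_prime, a * b


# Hypothetical data: randomized R, continuous M, linear outcome.
rng = random.Random(2012)
n = 500
r = [rng.randint(0, 1) for _ in range(n)]
m = [0.8 * ri + rng.gauss(0, 1) for ri in r]
y = [1.0 * ri + 2.0 * mi + rng.gauss(0, 1) for ri, mi in zip(r, m)]
c, a, b, c_prime, ab = baron_kenny(r, m, y)
```

With ordinary least squares fitted to the same sample, the total effect decomposes exactly as c = c' + a·b, which is why the Baron-Kenny direct and indirect effects are usually reported as c' and a·b.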

To relax the sequential randomization assumption, several alternatives have been proposed. The standard instrumental variables (IV) approach (Angrist and others 1996; Imbens and Rubin 1997) has become very popular for estimating local causal effects such as the complier average causal effect; however, it relies on an exclusion restriction. In the context of mediation analysis, this implies that the intervention affects the outcome only through the mediator, which rules out a direct effect of treatment. Another approach is the structural mean model (SMM) (Ten Have and others 2004, 2007). It allows unknown confounding between the mediator and the outcome, but it does make a structural assumption about the interaction between the baseline intervention and covariates in their effect on the mediator. It assumes that the mediating variable can be manipulated, and therefore estimates controlled causal effects. A third approach is principal stratification (PS) (Frangakis and Rubin 2002), which divides subjects into latent strata according to the joint distribution of the potential mediators under each intervention. Like the SMM, it allows unknown confounding between the mediator and the outcome; unlike the SMM, it does not require the mediating variable to be manipulated to a particular level for all subjects. Correspondingly, natural causal effects can be inferred. However, because the causal effects are defined within each stratum, this method cannot be used to infer mediation at the population level unless extra assumptions are made about direct and indirect effects across strata (VanderWeele 2008). In intervention trials where the potential mediators are psychological variables that cannot be manipulated, even in theory, it is more appropriate to frame the analysis in terms of natural direct and indirect effects. Different assumptions can be used to identify them (Pearl 2001; Robins 2003; van der Laan and Petersen 2004).
In this paper, we propose a semiparametric method to estimate the natural direct and indirect effects at the population level. We first use a model capturing the association between the two potential mediators (Roy and others 2008) to impute the unobserved one from baseline covariate information. We then treat the unobserved potential outcomes as missing data and use the following methods to estimate the natural causal effects: inverse probability weighting (IPW), regression imputation (REG) and augmented inverse probability weighting (AIPW). The remainder of the paper is organized as follows. In Section 1.2, we define potential outcomes and natural direct and indirect effects, and describe key assumptions and methods of estimation. In Section 1.3, we study the bias and efficiency of the proposed methods in a simulation study. An application to Project STRIDE data is given in Section 1.4. Concluding remarks are in Section 1.5.

1.2 Estimation of causal mediation effects
We consider randomized trials with two arms and two mediator levels. Let $R \in \{0, 1\}$ denote a randomization indicator, where $R = 1$ indicates randomization to the treatment group (e.g., telephone- or print-based intervention) and $R = 0$ indicates randomization to the contact-control group. Let $M_r \in \{0, 1\}$ denote the potential mediator level under assignment $r$. Similarly, $Y_{rM_{r'}}$ is the potential outcome (i.e., response to treatment) if the participant is randomized to group $r$ and the mediator is fixed at its value under intervention $r'$, where $r$ and $r'$ may or may not be equal. Each subject therefore has two potential mediator levels, $M_0$ and $M_1$, of which only $M = RM_1 + (1-R)M_0$ is observed. Similarly, each individual has four potential outcomes $\{Y_{0M_0}, Y_{0M_1}, Y_{1M_0}, Y_{1M_1}\}$, but only $Y = RY_{1M_1} + (1-R)Y_{0M_0}$ is observed. In our analysis of the STRIDE data, the natural direct effect is defined as $\mathrm{NDE} = E(Y_{1M_0} - Y_{0M_0})$ and the natural indirect effect as $\mathrm{NIE} = E(Y_{1M_1} - Y_{1M_0})$. When the effects are additive, the total intervention effect is

$$\mathrm{TE} = E(Y_{1M_1} - Y_{0M_0}) = E(Y_{1M_0} - Y_{0M_0}) + E(Y_{1M_1} - Y_{1M_0}) = \mathrm{NDE} + \mathrm{NIE}. \tag{1.1}$$

Each of the potential outcome means in (1.1) is identifiable from the observed data except $E(Y_{1M_0})$. Under randomization, $E(Y_{1M_1})$ is estimated by the sample mean of $Y$ among those randomized to $R = 1$, and $E(Y_{0M_0})$ by the sample mean of $Y$ among those randomized to $R = 0$. Therefore, inference about NDE and NIE requires estimation of $E(Y_{1M_0})$. The potential outcome $Y_{1M_0}$ is not observed for anyone and is regarded as missing data.

1.2.1 Assumptions and joint model of potential mediator
We assume that each individual has a set of baseline covariates $X$. Conceptually, our overall strategy is as follows. We first factor the joint distribution $[Y_{1M_0}, Y_{1M_1}, M_0, M_1 \mid X, R]$ as $[M_0, M_1 \mid X, R]\,[Y_{1M_0}, Y_{1M_1} \mid M_0, M_1, X, R]$. Next, we specify models and assumptions for each component. For those with $R = 1$, we impute $M_0$ from the distribution $[M_0 \mid R = 1, X, M_1]$ implied by our specification of $[M_0, M_1 \mid X, R]$. Then we use regression imputation, inverse probability weighting (IPW) or augmented inverse probability weighting (AIPW) to identify and estimate the mean of $Y_{1M_0}$ based on assumptions about $[Y_{1M_0} \mid Y_{1M_1}, M_0, M_1, X, R]$. Three assumptions are needed for all of our analyses.

Assumption 1.1 (Stable unit treatment value assumption, SUTVA). The potential outcomes for each subject $i$ are unrelated to the treatment status of other individuals. This implies (i) if $r = r'$, then $M_r = M_{r'}$ for subject $i$, and (ii) if $r = r'$ and $m = m'$, then $Y_{rm} = Y_{r'm'}$ for subject $i$.

Assumption 1.2 (Randomization). The treatment assignment is jointly independent of all the potential outcomes and covariates; i.e., $R \perp \{M_0, M_1, Y_{0M_0}, Y_{0M_1}, Y_{1M_0}, Y_{1M_1}\}$.

Assumption 1.3 (Stochastic monotonicity). Conditional on baseline covariates, the mediator is more likely to take the value $M = 1$ on arm $R = 1$ if it would take the value $M = 1$ on arm $R = 0$; i.e., $P(M_1 = 1 \mid M_0 = 1, X) \ge P(M_1 = 1 \mid M_0 = 0, X)$. This assumption is not technically necessary, but it helps constrain the joint distribution of $M_0$ and $M_1$.

Let $\psi_r(X) = P(M_r = 1 \mid X)$. Then, following Roy and others (2008), under Assumption 1.3 the conditional distribution $[M_1 \mid M_0 = 1, X]$ can be written as

$$P(M_1 = 1 \mid M_0 = 1, X = x) = \psi_1(x) + \phi\{U(x) - \psi_1(x)\}, \tag{1.2}$$

where $U(x) = \min\{1, \psi_1(x)/\psi_0(x)\}$ and $0 \le \phi \le 1$ is a sensitivity parameter. If $\phi = 0$, then $P(M_1 = 1 \mid M_0 = 1, X = x) = \psi_1(x)$, i.e., $M_1$ is independent of $M_0$ given $X$; if $\phi = 1$, then $P(M_1 = 1 \mid M_0 = 1, X = x) = U(x)$, the largest probability compatible with the marginal distributions $[M_1 \mid X]$ and $[M_0 \mid X]$. By Bayes' rule,

$$P(M_0 = 1 \mid M_1 = a, X = x) = a\,[\psi_1(x) + \phi\{U(x) - \psi_1(x)\}]\,\frac{\psi_0(x)}{\psi_1(x)} + (1 - a)\,[1 - \psi_1(x) - \phi\{U(x) - \psi_1(x)\}]\,\frac{\psi_0(x)}{1 - \psi_1(x)}. \tag{1.3}$$

We use the following assumption to enable imputation of $M_0$ for those with $R = 1$.

Assumption 1.4. $P(M_r = 1 \mid X) = \psi_r(X) = q_r(X; \lambda_r)$, where the functional form of $q_r$ is known and continuous in $\lambda_r$, but the finite-dimensional parameter $\lambda_r$ is unknown.

Using Assumption 1.4, $M_0$ can be imputed for the subjects in the treatment group as follows:

(a) For $r = 0, 1$, fit the mediator models $\psi_r(X) = P(M_r = 1 \mid X) = q_r(X; \lambda_r)$ to obtain $\hat\lambda_r$.
(b) Fixing the sensitivity parameter $\phi$, draw $M_0$ from the conditional distribution $[M_0 \mid M_1, X; \hat\lambda, \phi]$ given by (1.3).

Following the imputation, we have $\{R, M_0, M_1, Y_{1M_1}\}$ for each subject in the treatment group $R = 1$.

1.2.2 Estimating NDE and NIE
To estimate $E(Y_{1M_0})$, we propose three separate approaches based on the imputed dataset: inverse probability weighting (IPW), regression imputation (REG) and augmented inverse probability weighting (AIPW). Besides the assumptions above, each of these methods relies on two additional assumptions.

Assumption 1.5. $M_0 \perp Y_{1M_1} \mid M_1, X$; that is, conditional on baseline covariates and the mediator level under intervention, $M_0$ is independent of $Y_{1M_1}$.

Assumption 1.6. $E(Y_{1M_0} \mid M_0 = m, X) = E(Y_{1M_1} \mid M_1 = m, X)$; that is, given covariates $X$ and a fixed value $m$ of the mediator, the expected potential outcome is the same regardless of whether that value arises as $M_0$ or as $M_1$.

Method 1: Inverse probability weighting (IPW). This method builds on the IPW methods proposed by Robins and others (1994). The key idea is as follows. For those in the intervention arm with $M_0 = M_1$, we treat $Y_{1M_1}$ as a realization of $Y_{1M_0}$, because in the subset of individuals with $M_0 = M_1$, $Y_{1M_1} = Y_{1M_0}$ on average. We use inverse probability weights to account for the fact that these individuals are not a random subsample of the full data.
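Steps (a) and (b) of the imputation procedure above can be sketched in plain Python using model (1.2) and Bayes' rule (1.3). This is a minimal illustration, assuming the fitted probabilities $\psi_0(X_i)$ and $\psi_1(X_i)$ from step (a) are already available (for example, from logistic regressions) and lie strictly between 0 and 1; the function names and the three-subject example are hypothetical.

```python
import random


def p_m1_given_m0_1(psi0, psi1, phi):
    """Model (1.2): P(M1 = 1 | M0 = 1, X = x), with U(x) = min{1, psi1/psi0}."""
    u = min(1.0, psi1 / psi0)
    return psi1 + phi * (u - psi1)


def p_m0_given_m1(a, psi0, psi1, phi):
    """Equation (1.3): P(M0 = 1 | M1 = a, X = x), obtained by Bayes' rule."""
    q = p_m1_given_m0_1(psi0, psi1, phi)
    if a == 1:
        return q * psi0 / psi1
    return (1.0 - q) * psi0 / (1.0 - psi1)


def impute_m0(m1, psi0, psi1, phi, rng):
    """Step (b): draw M0 for each treated subject given observed M1 and fitted psi_r(X_i)."""
    return [1 if rng.random() < p_m0_given_m1(a, p0, p1, phi) else 0
            for a, p0, p1 in zip(m1, psi0, psi1)]


# Hypothetical fitted probabilities for three treated subjects, with phi = 0.5.
rng = random.Random(0)
m0_draw = impute_m0([1, 0, 1], [0.4, 0.3, 0.6], [0.7, 0.5, 0.8], 0.5, rng)
```

A useful check on (1.3) is that averaging it over $M_1$ recovers the marginal: $\psi_1 P(M_0 = 1 \mid M_1 = 1) + (1 - \psi_1) P(M_0 = 1 \mid M_1 = 0) = \psi_0$ for every $\phi$, so the imputation changes only the association between $M_0$ and $M_1$, not the fitted marginal model for $M_0$.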

Let $\mu_{1M_0}(m) = E(Y_{1M_0} \mid M_0 = m)$. The IPW estimator of $\mu_{1M_0}(m)$ is

$$\hat\mu^{\mathrm{IPW}}_{1M_0}(m) = \frac{\sum_i R_i I(M_{0i} = m) W_{mi} Y_i}{\sum_i R_i I(M_{0i} = m) W_{mi}}, \tag{1.4}$$

where $W_{mi} = W_{mi}(\hat\lambda, \phi) = I(M_{1i} = m) / P(M_1 = m \mid M_0 = m; \hat\lambda, \phi, X_i)$ and $\hat\lambda = (\hat\lambda_1, \hat\lambda_0)$. The denominator of $W_{mi}$ is computed using model (1.2). The estimator of $E(Y_{1M_0})$ is a weighted average of the estimated conditional means $\hat\mu_{1M_0}(0)$ and $\hat\mu_{1M_0}(1)$,

$$\hat E^{\mathrm{IPW}}(Y_{1M_0}) = \hat\mu^{\mathrm{IPW}}_{1M_0}(1)\,\hat P(M_0 = 1) + \hat\mu^{\mathrm{IPW}}_{1M_0}(0)\,\{1 - \hat P(M_0 = 1)\},$$

where $\hat P(M_0 = 1) = \sum_i I(M_{0i} = 1) R_i / \sum_i R_i$. The proof of this estimator's consistency is in the Appendix.

Method 2: Regression imputation (REG). We can also estimate $E(Y_{1M_0})$ using imputation. Let $\tilde Y_{1M_0}$ denote the imputed value of $Y_{1M_0}$. When $M_0 = M_1$, we set $\tilde Y_{1M_0} = Y_{1M_1} = Y$. When $M_0 \ne M_1$, we impute $Y_{1M_0}$ using a regression based on the following assumption.

Assumption 1.7. $E(Y_{1M_1} \mid M_1 = m, X) = g(m, X; \beta)$, where the functional form of $g$ is known and continuous in $\beta$, but the finite-dimensional parameter $\beta$ is unknown.

Hence the imputed value is $\tilde Y_{1M_0} = Y_{1M_1} I(M_0 = M_1) + \hat Y_{1M_0} I(M_0 \ne M_1)$, where

$$\hat Y_{1M_0} = \hat E(Y_{1M_0} \mid M_0, X) = g(M_0, X; \hat\beta). \tag{1.5}$$

The procedure is as follows:
1. Fit the model $E(Y_{1M_1} \mid M_1 = m, X) = g(m, X; \beta)$ to data from the treatment group $R = 1$ to obtain $\hat\beta$.
2. Compute $\hat Y_{1M_0} = g(M_0, X; \hat\beta)$ for those with $R = 1$ and $M_0 \ne M_1$.
3. Set $\tilde Y_{1M_0} = Y_{1M_1} I(M_0 = M_1) + \hat Y_{1M_0} I(M_0 \ne M_1)$.
4. Compute the REG estimator $\hat E^{\mathrm{REG}}(Y_{1M_0}) = \sum_i R_i \tilde Y_{1M_0,i} / \sum_i R_i$.

We show in the Appendix that this estimator is consistent.

Method 3: Augmented inverse probability weighting (AIPW). The IPW estimator uses data only on those with $M_0 = M_1$ and is therefore possibly inefficient. We therefore propose an augmented inverse probability weighting (AIPW) estimator to potentially improve efficiency. Following Tsiatis (2006), the AIPW estimator can be characterized as a combination of the IPW and REG estimators. Specifically, an AIPW estimator of $\mu_{1M_0}(m)$ is

$$\hat\mu^{\mathrm{AIPW}}_{1M_0}(m) = \hat\mu^{\mathrm{IPW}}_{1M_0}(m) + \frac{\sum_i R_i I(M_{0i} = m)(1 - W_{mi})\,\hat Y_{1M_0,i}}{\sum_i R_i I(M_{0i} = m) W_{mi}}, \tag{1.6}$$

where $\hat\mu^{\mathrm{IPW}}_{1M_0}(m)$ is given by (1.4) and $\hat Y_{1M_0,i}$ is the regression imputation in (1.5). The first term involves contributions only from subjects with $M_1 = M_0$ in the treatment group. The second term includes additional contributions from individuals with $M_1 \ne M_0$ in the treatment group.
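The three estimators (1.4) to (1.6) can be sketched in plain Python for a single imputed dataset of treated subjects ($R_i = 1$). This is an illustration under stated assumptions, not the thesis code: the imputed $M_0$, the fitted marginals $\psi_r(X_i)$ and the regression predictions $g(M_{0i}, X_i; \hat\beta)$ are taken as precomputed inputs, and the function names `p_stay` and `estimate_y1m0` are hypothetical. `p_stay` derives $P(M_1 = m \mid M_0 = m, X)$ from model (1.2) and the marginals, since (1.2) gives only the $m = 1$ case directly.

```python
def p_stay(m, psi0, psi1, phi):
    """P(M1 = m | M0 = m, X) implied by model (1.2) and the marginals psi0, psi1."""
    q = psi1 + phi * (min(1.0, psi1 / psi0) - psi1)  # P(M1 = 1 | M0 = 1, X)
    if m == 1:
        return q
    # P(M1 = 1 | M0 = 0, X) = (psi1 - psi0 * q) / (1 - psi0), by total probability
    return 1.0 - (psi1 - psi0 * q) / (1.0 - psi0)


def estimate_y1m0(y, m1, m0, p, yhat):
    """IPW (1.4), REG (1.5) and AIPW (1.6) estimates of E(Y_{1M0}) for treated subjects.

    y    : observed outcomes Y = Y_{1M1}
    m1   : observed mediator M1
    m0   : imputed mediator M0
    p    : P(M1 = m | M0 = m, X_i) evaluated at m = M0_i, e.g. from p_stay
    yhat : regression predictions g(M0_i, X_i; beta_hat)
    """
    n = len(y)
    p_m0 = sum(m0) / n  # estimated P(M0 = 1) in the treatment group
    # W_mi = I(M1_i = m) / P(M1 = m | M0 = m, X_i) for subjects with M0_i = m
    w = [(1.0 if a == b else 0.0) / pi for a, b, pi in zip(m1, m0, p)]
    mu_ipw, mu_aipw = {}, {}
    for m in (0, 1):
        idx = [i for i in range(n) if m0[i] == m]
        den = sum(w[i] for i in idx)  # assumes some subject has M0 = M1 = m
        mu_ipw[m] = sum(w[i] * y[i] for i in idx) / den
        mu_aipw[m] = mu_ipw[m] + sum((1.0 - w[i]) * yhat[i] for i in idx) / den
    ipw = mu_ipw[1] * p_m0 + mu_ipw[0] * (1.0 - p_m0)
    aipw = mu_aipw[1] * p_m0 + mu_aipw[0] * (1.0 - p_m0)
    # REG: keep Y where M0 = M1, otherwise use the regression imputation
    reg = sum(yi if a == b else yh for yi, a, b, yh in zip(y, m1, m0, yhat)) / n
    return ipw, reg, aipw
```

When every treated subject has $M_0 = M_1$ and all weights equal 1, the augmentation term vanishes and all three estimators reduce to the treated-group sample mean; the estimators differ only through how they handle discordant subjects.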

The AIPW estimator of $E(Y_{1M_0})$ is a weighted average of the estimated conditional means $\hat\mu_{1M_0}(0)$ and $\hat\mu_{1M_0}(1)$,

$$\hat E^{\mathrm{AIPW}}(Y_{1M_0}) = \hat\mu^{\mathrm{AIPW}}_{1M_0}(1)\,\hat P(M_0 = 1) + \hat\mu^{\mathrm{AIPW}}_{1M_0}(0)\,\{1 - \hat P(M_0 = 1)\}.$$

Unlike some AIPW estimators, this estimator is not doubly robust, because the consistency of $\hat Y_{1M_0}$ depends on correct specification of $[M_0, M_1 \mid X]$, which is also needed for IPW. Hence this estimator generally will not be consistent when the weight model is misspecified. However, if $[M_0, M_1 \mid X]$ is correctly specified, the AIPW method has advantages over the IPW and REG methods. First, it remains consistent even if the regression model is misspecified, and it is potentially more efficient than IPW. Second, in finite samples, because it borrows information from subjects with $M_0 \ne M_1$, it can reduce some of the instability that results from large weights and causes bias in the IPW method. The proof of the AIPW estimator's consistency is in the Appendix, and an investigation of bias and efficiency is given in Section 1.3.

1.2.3 Inference about natural direct and indirect effects
All three methods above are designed to estimate $E(Y_{1M_0})$, from which estimates of the natural direct and indirect effects are easily obtained; recall that $\hat E(Y_{0M_0}) = \sum_i (1 - R_i) Y_i / n_0$ and $\hat E(Y_{1M_1}) = \sum_i R_i Y_i / n_1$, where $n_0$ and $n_1$ are the numbers of subjects in the control and treatment groups, respectively. Multiple imputation is used to obtain estimates of the causal effects. Let $\hat\theta_k$ denote an estimated parameter from the $k$th imputation. Our point estimate is

$$\bar\theta = \frac{1}{K}\sum_{k=1}^{K} \hat\theta_k, \tag{1.7}$$

and $\mathrm{Var}(\bar\theta)$ is estimated using the bootstrap.

1.3 Simulation Study
We conducted a simulation to investigate the properties of our three estimators of the causal mediation effect: inverse probability weighting (IPW), regression imputation (REG) and augmented inverse probability weighting (AIPW). We generate 1000 random samples of size $n = 500$. Each generated sample has $n_1 = 250$ and $n_0 = 250$ subjects in the treatment and control groups, respectively. Let $x = (x_1, x_2)^\top \sim N(0, I)$ denote baseline covariates, where $I$ is the identity matrix. The potential mediator $M_1$ is generated from a Bernoulli distribution with mean

$$P(M_1 = 1 \mid X = x) = \frac{\exp(-x_1 + x_2)}{1 + \exp(-x_1 + x_2)},$$

and the potential mediator $M_0$ follows a Bernoulli distribution with mean

$$P(M_0 = 1 \mid X = x) = \frac{\exp(0.5 + 2x_1 - 0.5x_2)}{1 + \exp(0.5 + 2x_1 - 0.5x_2)}.$$

To capture the association between $M_1$ and $M_0$, we use the conditional distribution $[M_0 \mid M_1, X, \phi]$ given by (1.3) with $\phi = 0.5$. For the potential outcomes, in order to satisfy Assumption 1.6, $Y_{1M_0}$ and $Y_{1M_1}$ are generated from normal distributions with the same mean but different variances, as follows:

$$\begin{aligned}
Y_{01}(x) &\sim N(2 + 4x_1 + x_2,\ 1), & Y_{00}(x) &\sim N(3 + 2x_1 + x_2,\ 1),\\
Y_{1M_0}(x) \mid M_0 = 0 &\sim N(7 + 6x_1 + 8x_2,\ 1), & Y_{1M_0}(x) \mid M_0 = 1 &\sim N(5 + 4x_1 + 2x_2,\ 1),\\
Y_{1M_1}(x) \mid M_1 = 0 &\sim N(7 + 6x_1 + 8x_2,\ 2), & Y_{1M_1}(x) \mid M_1 = 1 &\sim N(5 + 4x_1 + 2x_2,\ 2).
\end{aligned}$$
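The data-generating process above can be sketched in plain Python. This is an illustration, not the code used for the thesis, and it rests on assumptions labeled here: the second argument of each normal distribution is treated as a variance, the potential outcomes are drawn independently given $(M, X)$ (the text does not specify their joint distribution), and the first half of each sample is assigned to treatment. The function name `simulate` is hypothetical.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def simulate(n=500, phi=0.5, seed=1):
    """One simulated sample; assumes N(mean, variance) and independent outcome draws."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        psi1 = expit(-x1 + x2)                 # P(M1 = 1 | X)
        psi0 = expit(0.5 + 2 * x1 - 0.5 * x2)  # P(M0 = 1 | X)
        m1 = 1 if rng.random() < psi1 else 0
        # Draw M0 | M1, X from (1.3), with U(x) = min{1, psi1/psi0}.
        q = psi1 + phi * (min(1.0, psi1 / psi0) - psi1)
        p0 = q * psi0 / psi1 if m1 == 1 else (1 - q) * psi0 / (1 - psi1)
        m0 = 1 if rng.random() < p0 else 0
        # Outcome means by mediator level; Y_{1M0} and Y_{1M1} share them (Assumption 1.6).
        mu1 = {0: 7 + 6 * x1 + 8 * x2, 1: 5 + 4 * x1 + 2 * x2}
        mu0 = {0: 3 + 2 * x1 + x2, 1: 2 + 4 * x1 + x2}
        y0m0 = rng.gauss(mu0[m0], 1.0)
        y1m0 = rng.gauss(mu1[m0], 1.0)
        y1m1 = rng.gauss(mu1[m1], math.sqrt(2.0))
        r = 1 if i < n // 2 else 0  # 250 treated and 250 controls when n = 500
        rows.append({"x1": x1, "x2": x2, "R": r,
                     "M": m1 if r == 1 else m0,
                     "Y": y1m1 if r == 1 else y0m0,
                     "Y1M0": y1m0, "Y0M0": y0m0})
    return rows


sample = simulate()
```

Each row keeps the unobservable $Y_{1M_0}$ and $Y_{0M_0}$ alongside the observed $\{R, M, Y, X\}$, so the average of $Y_{1M_0} - Y_{0M_0}$ over many replicates can serve as an oracle benchmark for the NDE when checking the estimators.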

The histograms of the potential outcomes in one generated dataset are shown in Figure 1.1.
Figure 1.1: The histograms of potential outcomes in one generated dataset (panels: $Y_{01}$; $Y_{00}$; $Y_{1M_0} \mid M_0 = 0$; $Y_{1M_0} \mid M_0 = 1$; $Y_{1M_1} \mid M_1 = 0$; $Y_{1M_1} \mid M_1 = 1$; vertical axis: Frequency).

Finally, we use the models above to obtain the observed data $\{R, M, Y, X\}$ for each individual. For example, if $M_1 = 1$ and $R = 1$, then $Y \sim [Y_{1M_1} \mid M_1 = 1]$. The data are generated to yield large, skewed weights (up to 50) for the IPW method. Under these models, the true NDE = 2.84. Because our methods target the natural effects, all estimates are compared with the true natural direct effect. In the first part of the simulation study, we examine inference obtained by fixing $\phi = 0.1$, $0.5$ and $0.9$ in the analysis. The working models $\psi_r(X) = q_r(X; \lambda_r)$ for the potential mediators are logistic regression models that are linear in $X$, and the working regression imputation model $g(m, X; \beta)$ is a linear regression of $Y_{1M_1}$ on $M$, $X$ and the interactions between $M$ and $X$. Therefore, all working models in the first simulation are correctly specified. Inferences about NDE are shown in Table 1.1. First, the estimates from each of our methods are consistent with the true value, except that the IPW method exhibits some finite-sample bias due to large weights (the bias disappears when the sample size is increased to $n = 1000$). Second, both the SE and MSE of the AIPW method are smaller than those of the IPW method, which is expected when both models are correctly specified.

In the second simulation study, we correctly specify the working models $[M_0 \mid X]$ and $[M_1 \mid X]$ but misspecify the regression imputation model; specifically, we use a linear regression of $Y_{1M_1}$ on $M$ and $X$ only. Results are shown in Table 1.2. The IPW and AIPW estimators are consistent, but the REG estimator is not. As in the first simulation, the SE and MSE of the AIPW method are smaller than those of the IPW method, but this is not guaranteed in general. To demonstrate that all the proposed methods depend on correct imputation of $M_0$ via (1.3), we conduct a third simulation that correctly specifies the regression imputation model while misspecifying the models of $[M_0 \mid X]$ and $[M_1 \mid X]$ by omitting the covariate $x_1$. The results are shown in Table 1.3. None of the methods performs well, because $M_0$ is imputed incorrectly.

Table 1.1: Simulated estimates of NDE when working models are correctly specified (True NDE = 2.84). Columns: Method (IPW, REG, AIPW), $\phi$, Estimate, SE, Bias/SE, MSE. Note: 1000 simulations with sample size 500; 10 imputations performed.
Table 1.2: Simulated estimates of NDE when models of $[M_0 \mid X]$ and $[M_1 \mid X]$ are correctly specified and the regression imputation model is misspecified (True NDE = 2.84). Columns: Method (IPW, REG, AIPW), $\phi$, Estimate, SE, Bias/SE, MSE. Note: 1000 simulations with sample size 500; 10 imputations performed.
Table 1.3: Simulated estimates of the natural direct effect when models of $[M_0 \mid X]$ and $[M_1 \mid X]$ are misspecified and the regression imputation model is correctly specified (True NDE = 2.84). Columns: Method (IPW, REG, AIPW), $\phi$, Estimate, SE, Bias/SE, MSE. Note: 1000 simulations with sample size 500; 10 imputations performed.
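The Estimate, SE, Bias/SE and MSE columns of Tables 1.1 to 1.3 are not defined explicitly in the text. A common convention, assumed here, is the Monte Carlo mean of the replicate estimates, their empirical standard deviation, the bias divided by that standard deviation, and the mean squared error around the true value. The sketch below computes these for a list of replicate estimates; the function name `summarize` is hypothetical.

```python
import math


def summarize(estimates, truth):
    """Monte Carlo summary of replicate estimates against the true value (assumed conventions)."""
    k = len(estimates)
    est = sum(estimates) / k
    # Empirical standard deviation across replicates (divisor k - 1).
    se = math.sqrt(sum((e - est) ** 2 for e in estimates) / (k - 1))
    bias = est - truth
    mse = sum((e - truth) ** 2 for e in estimates) / k
    return {"Estimate": est, "SE": se, "Bias/SE": bias / se, "MSE": mse}


# Hypothetical replicate NDE estimates compared with the true NDE of 2.84.
table_row = summarize([2.0, 3.0, 4.0], 2.84)
```

Reporting Bias/SE rather than raw bias makes it easy to judge whether a bias, such as the finite-sample IPW bias discussed above, is large relative to the sampling variability.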

The simulation results also show that the estimates are quite stable across values of the sensitivity parameter. This may indicate that the baseline covariates explain a considerable portion of the variability in the potential mediator levels.

1.4 Data Application

Here, we use the STRIDE data to illustrate the application of the methods to an intervention trial. Recall that the study was designed to examine the efficacy of interventions targeting physical activity adoption and maintenance, and that the possible mediators are psychosocial variables. The objective of our analysis is to examine the effect of behavior process, which is hypothesized to mediate the relationship between the intervention and the primary outcome, physical activity behavior at 12 months.

Participants were randomized to three arms: a telephone-based tailored intervention, a print-based tailored intervention, and a contact-control delayed treatment group. We combine the two intervention arms into a single treatment group, so that the numbers of subjects in the treatment and control groups are $n_1 = 159$ and $n_0 = 78$, respectively (total sample size $n = 239$). Individuals in the treatment arm were asked to complete questionnaires, including the psychosocial mediators of change, at months 1, 2, 3, 4, 5, 6, 8, 10, and 12. The control arm was asked to provide this information only at months 1, 6, and 12.

Although there are several psychosocial mediators (behavior process, cognitive process, self-efficacy, decision outcome and outcome expectation), we focus on behavior process in this analysis because it has been shown to be significantly related to increases in physical activity (Marcus and Forsyth 2009). Behavior process was measured by an instrument developed by Marcus and colleagues. Participants completed a 20-item questionnaire rating how often they engaged in several behaviors on a 5-point scale ranging from never (1) to repeatedly (5). These behaviors included substituting alternative behaviors, enlisting social support, rewarding yourself, committing yourself and reminding yourself. Behavior process is measured as a summary score of the questionnaire responses (Marcus and others 2007).

Using the same notation as before, $R = 1$ for those assigned to treatment (phone- or print-based tailored intervention) and $R = 0$ for those in the control group. We set $M = 1$ if behavior process increased at 6 months relative to baseline, and $M = 0$ otherwise. The outcome $Y$ is total weekly minutes of moderate-intensity physical activity at 12 months. The baseline covariates include:

- demographics (age, gender, race, marital status, highest grade completed in school);
- baseline psychosocial variables (self-efficacy, cognitive process and decisional balance);
- weight-related variables (BMI, percent body fat).

These data also had 34 missing outcomes and 38 missing mediator values. To keep matters simple and focus on the methods for mediation, we carry the baseline value forward for missing mediator values and carry the last observed value forward to impute missing physical activity responses. This is not to minimize the importance of missing data, which is in fact the focus of subsequent work.

The marginal potential mediator models were selected by backward selection based on parsimony and model fit. For the intervention group we chose

$$M_1(x) = \text{logit}^{-1}(\lambda_{10} + \lambda_{11}\,\text{school} + \lambda_{12}\,\text{cognitive} + \lambda_{13}\,\text{BMI}),$$

and for the control group

$$M_0(x) = \text{logit}^{-1}(\lambda_{00} + \lambda_{01}\,\text{race} + \lambda_{02}\,\text{married} + \lambda_{03}\,\text{efficacy}).$$

The covariates in these models are defined as follows:

- school is the highest grade the subject completed in school;
- BMI is the subject's BMI at baseline;
- race is an indicator that the subject is Asian;
- married is an indicator that the subject is married;
- cognitive and efficacy are the baseline cognitive process mean score and self-efficacy mean score, each on a 5-point scale.

Neither model showed evidence of lack of fit: Hosmer-Lemeshow goodness-of-fit statistics on 8 degrees of freedom gave p-values of .48 and .60, respectively. The regression imputation model $g(m, X; \beta)$ is specified as linear in the baseline covariates $X$. Following the inference procedure in Section 1.2.3, we perform 10 imputations and 1000 bootstrap resamples.

The total effect of the intervention on physical activity minutes at 12 months is (SE = 19.99). Results for the natural direct and indirect effects are given in Table 1.4 for various values of the sensitivity parameter.

First, all three proposed methods show a statistically significant natural indirect effect based on the Z scores. This suggests that the effect of the tailored intervention is mediated by the increase in behavior process.

Second, the natural direct and indirect effects estimated by IPW, REG and AIPW differ considerably from the results of the Baron-Kenny (BK) method, regardless of whether the BK model adjusts for covariates. This discrepancy is expected because natural and controlled direct effects are not equal unless the controlled direct effect does not depend on the mediator level (van der Laan and Petersen 2004).

Third, the variance of AIPW is close to that of IPW; AIPW likely gains little efficiency over IPW because the sample size is moderate.

Fourth, our estimates are not sensitive to the value of the sensitivity parameter φ.

Table 1.4: Inference about natural effects, carrying forward the most recent behavior process measurement for the missing mediator. Columns: Method, φ, and Estimate, SE and Z score for each of the direct effect and the indirect effect. Methods: IPW, REG, AIPW, Baron-Kenny, and BK (covariate adjusted).

Because skewed weights can affect the IPW estimates, we checked the weights of subjects in the treatment group with the sensitivity parameter fixed at φ = .5. In Figure 1.2, the horizontal axis indexes 5 imputations. The weights are around 1 for the $M_0 = 1$ group and up to 9 for the $M_0 = 0$ group, so there are no very large weights.

Figure 1.2: Boxplots of the weights, depending on $M$.

1.5 Discussion

In behavioral intervention trials, researchers are often interested in identifying potential mediators so that they can understand the intervention more thoroughly and improve it more effectively. The most popular approach to mediation analysis is the regression-based Baron-Kenny method, although several newer methods are becoming more widely used. We have proposed three methods that identify the population-level natural direct and indirect effects up to a sensitivity parameter. In the STRIDE data analysis, the evidence suggests that the effect of the tailored intervention is mediated by the increase in behavior process.

Our methods measure natural causal effects, which are more appropriate than controlled effects in behavioral science:

- First, natural effects do not require that the mediator can be externally manipulated, and psychological variables such as self-efficacy might never change for a specific participant. The natural direct effect measures the intervention effect when each participant's mediator is fixed at its potential level under no intervention. It does not measure the effect when all participants are set to a pre-specified mediator level.
- Second, natural effects allow confounding between the mediator and the outcome, so the mediator does not have to be randomized within each treatment arm.
- In addition, consider the principal stratification (PS) approach, which focuses only on the strata of participants who would have the same mediator level under either intervention group. Compared with PS, our methods can not only estimate causal effects in each stratum but also combine them across strata to give population-averaged estimates.

A potential limitation of our methods is their reliance on assumptions about the joint distribution of the potential mediators and outcomes. Because $Y_{1M_0}$ is never observed for any participant, estimating it requires techniques for incomplete data, and we are unable to check those assumptions. For the joint distribution of the potential mediators, we use a sensitivity parameter to investigate the untestable assumptions about $[M_0, M_1 \mid X]$.
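Because $M_0$ and $M_1$ are never observed together, their joint distribution given X is pinned down only after fixing something the data cannot inform. The sketch below shows one way to do this for binary potential mediators:

1. Treat φ as the correlation between $M_0$ and $M_1$ given X.
2. Build the 2×2 joint table from the two marginal probabilities.
3. Read off $P(M_0 = 1 \mid M_1, X)$, the quantity needed to impute $M_0$ for a treated subject.

This correlation parameterization and the function names are hypothetical; the chapter's own joint model (Section 1.2.1) may parameterize φ differently.

```python
import math
import random


def joint_binary(p0, p1, phi):
    """Joint distribution of (M0, M1) given X.

    p0 = P(M0 = 1 | X) and p1 = P(M1 = 1 | X) come from the marginal mediator
    models; phi is treated here as the correlation between M0 and M1
    (a hypothetical sensitivity parameterization).
    """
    p11 = p0 * p1 + phi * math.sqrt(p0 * (1 - p0) * p1 * (1 - p1))
    table = {
        (1, 1): p11,
        (1, 0): p0 - p11,
        (0, 1): p1 - p11,
        (0, 0): 1 - p0 - p1 + p11,
    }
    if min(table.values()) < 0:
        raise ValueError("phi is outside the range allowed by these marginals")
    return table


def prob_m0_given_m1(p0, p1, phi, m1):
    """P(M0 = 1 | M1 = m1, X): the imputation probability for a treated subject."""
    table = joint_binary(p0, p1, phi)
    return table[(1, m1)] / (table[(0, m1)] + table[(1, m1)])


# Hypothetical fitted marginals for one treated subject with observed M1 = 1.
rng = random.Random(1)
p_impute = prob_m0_given_m1(p0=0.3, p1=0.6, phi=0.5, m1=1)
m0_draw = int(rng.random() < p_impute)   # one imputed value of M0
print(round(p_impute, 4), m0_draw)
```

With φ = 0 the imputation probability reduces to the marginal $P(M_0 = 1 \mid X)$. Larger φ pulls the imputed $M_0$ toward the observed $M_1$.

Either way, every imputed value inherits any error in the marginal model for $M_0$.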
A further limitation is that our methods first impute the potential mediator value under no intervention for the intervention group. If the marginal potential mediator models are far from correctly specified, none of the estimators performs well. The IPW and AIPW estimators are quite promising when the mediator models are correctly specified.

In practice, model selection for the mediator models plays an important role. It is critical for researchers to know which factors are likely to predict the mediator values under treatment and under control, respectively. Once this predictor information is available, different approaches to selecting the mediator models can be applied. In addition, when both the weight and regression imputation models are correct, the REG method gives an unbiased estimator, and AIPW can gain efficiency over IPW in terms of SE and MSE.

Table 1.5: Strata of the binary mediator

Stratum   M_0   M_1
(0,0)     0     0
(0,1)     0     1
(1,0)     1     0
(1,1)     1     1

Although we focus on binary treatments and binary mediators, our methods can be extended to multiple treatments and continuous mediators. Because the performance of our methods depends on whether the parametric models are correctly specified, future research will focus on relaxing the parametric model assumptions in mediation analysis. For example, a binary mediator defines the 4 strata in Table 1.5. After imputing $M_0$, we use Assumption 1.6,

$$E(Y_{1M_1} \mid M_1 = m, X) = E(Y_{1M_0} \mid M_0 = m, X).$$

Its right-hand side can be decomposed over two strata. Suppressing the conditioning on $X$,

$$E(Y_{1M_0} \mid M_0 = m) = E(Y_{1M_0} \mid M_0 = M_1 = m)\,P(M_1 = m \mid M_0 = m) + E(Y_{1M_0} \mid M_0 = m, M_1 = 1 - m)\,P(M_1 = 1 - m \mid M_0 = m).$$

By Assumption 1.6, the left-hand side equals $E(Y_{1M_1} \mid M_1 = m)$ and is therefore identifiable from the observed data. We treat $Y_{1M_1}$ as a realization of $Y_{1M_0}$ for those with $M_1 = M_0$. Taking $m = 1$ therefore identifies $E(Y_{1M_0})$ in stratum (1,1), and the equation can then be solved for $E(Y_{1M_0})$ in stratum (1,0). Similarly, taking $m = 0$ gives $E(Y_{1M_0})$ in stratum (0,1). Once $E(Y_{1M_0})$ and the population proportion of each stratum are estimated, the natural causal effects can be examined.
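The two-strata decomposition can be solved for the mixed stratum's mean once its other ingredients are available. The sketch below does this for three hypothetical inputs:

- the identified conditional mean $E(Y_{1M_0} \mid M_0 = m)$;
- the identified concordant-stratum mean $E(Y_{1M_0} \mid M_0 = M_1 = m)$;
- the probability $P(M_1 = m \mid M_0 = m)$, which in practice comes from the assumed joint distribution of the potential mediators.

It then averages the stratum means over the strata of Table 1.5. All numbers and function names are illustrative, and covariates are suppressed. The stratum probabilities below are consistent with $P(M_0 = 1) = 0.4$, $P(M_1 = 1 \mid M_0 = 1) = 0.8$ and $P(M_1 = 0 \mid M_0 = 0) = 0.5$.

```python
def mixed_stratum_mean(mean_given_m0, mean_concordant, p_concordant):
    """Solve E(Y|M0=m) = mu_cc * p + mu_mixed * (1 - p) for mu_mixed.

    mean_given_m0   : E(Y_{1M0} | M0 = m), identified via Assumption 1.6
    mean_concordant : E(Y_{1M0} | M0 = M1 = m), identified from Y_{1M1}
    p_concordant    : P(M1 = m | M0 = m), from the joint mediator model
    """
    if not 0.0 <= p_concordant < 1.0:
        raise ValueError("need 0 <= P(M1 = m | M0 = m) < 1")
    return (mean_given_m0 - mean_concordant * p_concordant) / (1.0 - p_concordant)


def population_mean(stratum_means, stratum_probs):
    """Average the stratum means of Y_{1M0} over the strata (M0, M1)."""
    if abs(sum(stratum_probs.values()) - 1.0) > 1e-9:
        raise ValueError("stratum probabilities must sum to 1")
    return sum(stratum_means[s] * stratum_probs[s] for s in stratum_probs)


# Hypothetical inputs, keyed by stratum (M0, M1).
means = {
    (1, 1): 110.0,
    (1, 0): mixed_stratum_mean(100.0, 110.0, 0.8),  # m = 1
    (0, 0): 75.0,
    (0, 1): mixed_stratum_mean(70.0, 75.0, 0.5),    # m = 0
}
probs = {(1, 1): 0.32, (1, 0): 0.08, (0, 0): 0.30, (0, 1): 0.30}
print(means[(1, 0)], means[(0, 1)], population_mean(means, probs))
```

As a sanity check, the population average equals $100 \times 0.4 + 70 \times 0.6 = 82$, as the law of total expectation requires. The stratum view adds information only through the stratum-specific means.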

Chapter 2 Causal Inference about Mediation when there are Several Mediators

Abstract

In randomized intervention trials in behavioral science, the typical objective is to understand how an intervention affects a primary outcome through potential mediating variables. Often there is more than one mediation path, and the relations among the potential mediating variables suggest that a multiple mediator model is of more interest than a single mediator model. We develop a model to separately infer the mediation effects of individual variables when there are several potential mediators. A causal model, parameterized in terms of natural direct and indirect effects (Pearl, 2001), is used to encode mediation. To identify the natural indirect effects, we require information about the joint distribution of the potential mediators. Our model identifies this joint distribution given baseline covariates and targeted restrictions on the correlation structure of the potential mediators. Unobserved potential mediators and associated potential outcomes can therefore be imputed under the model, and causal contrasts of interest can be computed in a straightforward manner. We illustrate our methods with an analysis of a recent intervention trial designed to increase physical activity.
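Suppose every subject has observed or imputed values of the three potential outcomes $Y_{0M_0}$, $Y_{1M_0}$ and $Y_{1M_1}$. The natural effects (Pearl, 2001) are then differences of means over the completed data:

- NDE = $E(Y_{1M_0}) - E(Y_{0M_0})$;
- NIE = $E(Y_{1M_1}) - E(Y_{1M_0})$;
- the two add up to the total effect.

The sketch below computes these contrasts in the single mediator case and averages the point estimates over multiple imputations. The array layout and the function name `natural_effects` are assumptions. The chapter's decomposition across several mediators (Appendix B) and its standard errors are not reproduced.

```python
import numpy as np


def natural_effects(y0m0, y1m0, y1m1):
    """Natural direct, indirect and total effects from completed potential outcomes.

    Each argument has shape (n_imputations, n_subjects) and holds observed or
    imputed values of Y_{0M0}, Y_{1M0} and Y_{1M1}, respectively.
    """
    y0m0, y1m0, y1m1 = (np.asarray(a, dtype=float) for a in (y0m0, y1m0, y1m1))
    # Contrast within each completed data set, then average over imputations.
    nde = (y1m0.mean(axis=1) - y0m0.mean(axis=1)).mean()
    nie = (y1m1.mean(axis=1) - y1m0.mean(axis=1)).mean()
    te = (y1m1.mean(axis=1) - y0m0.mean(axis=1)).mean()
    return {"NDE": nde, "NIE": nie, "TE": te}


# Hypothetical completed data: 2 imputations, 2 subjects.
effects = natural_effects(
    y0m0=[[10, 20], [12, 18]],
    y1m0=[[14, 26], [16, 24]],
    y1m1=[[20, 30], [22, 28]],
)
print(effects)
```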

2.1 Introduction

In randomized behavioral intervention studies, investigators usually compare the effect of a new intervention with that of standard therapy. They are also interested in how the intervention operates on a primary outcome through potential mediating variables. The effect of the intervention on the outcome that passes through the mediator is the indirect effect, and the effect that bypasses the mediator is the direct effect. Many single mediator approaches have been proposed, including the Baron-Kenny method (Baron and Kenny 1986), structural mean models (SMM) (Ten Have and others 2004, 2007) and principal stratification (Frangakis and Rubin 2002). However, methods that assess multiple mediators simultaneously have received little attention. A single mediator model does not capture relationships among mediators, and it is of interest to know how multiple mediators perform when tested concurrently within the same model.

A widely used approach to mediation analysis is the regression-based Baron-Kenny method. Several approaches have been proposed for assessing total and specific indirect effects in a multiple mediator context (MacKinnon 2008; Preacher and Hayes 2008). However, they are all extensions of the single mediator framework established by Baron and Kenny, and the assumptions of that framework also apply to the multiple mediator model. It requires that subjects be randomized to the intervention at baseline and, following randomization, to the mediator levels within each intervention group. The latter condition is called the sequential ignorability assumption (MacKinnon and others 2002). The multiple mediator model addresses the limitation of omitted variables by explicitly including additional mediating variables. However, it still requires sequential ignorability given the other mediating variables within each intervention group.
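For contrast, the regression-based multiple mediator model that these extensions build on can be sketched in a few lines. Under linear models $M_j = a_{0j} + a_j R + \epsilon_j$ and $Y = c_0 + c' R + \sum_j b_j M_j + \epsilon$:

- the specific indirect effect through mediator $j$ is the product $a_j b_j$;
- the total indirect effect is $\sum_j a_j b_j$;
- $c'$ is the direct effect.

The simulated coefficients below are hypothetical. The estimates are only meaningful under the sequential ignorability assumption just described.

```python
import numpy as np


def ols(design, y):
    """Least-squares coefficients of y on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(7)
n = 50_000
r = rng.binomial(1, 0.5, size=n).astype(float)      # randomized treatment
m1 = 0.5 + 1.0 * r + rng.normal(size=n)             # hypothetical a_1 = 1.0
m2 = -0.2 + 0.4 * r + rng.normal(size=n)            # hypothetical a_2 = 0.4
y = 1.0 + 0.7 * r + 2.0 * m1 + 1.5 * m2 + rng.normal(size=n)  # c' = 0.7, b = (2.0, 1.5)

ones = np.ones(n)
a1 = ols(np.column_stack([ones, r]), m1)[1]         # mediator models
a2 = ols(np.column_stack([ones, r]), m2)[1]
_, c_prime, b1, b2 = ols(np.column_stack([ones, r, m1, m2]), y)  # outcome model

specific = {"M1": a1 * b1, "M2": a2 * b2}           # specific indirect effects
print("direct c':", c_prime)
print("specific indirect:", specific)
print("total indirect:", sum(specific.values()))
```

Here $c'$ is the coefficient of R with both mediators held fixed. In this linear model without interaction, it is a controlled-direct-effect quantity.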
When these assumptions hold and the models are correctly specified, the Baron-Kenny method estimates the controlled causal effect (Pearl 2001). The controlled direct effect measures the intervention effect in the population when the mediator is set to a pre-specified level for all subjects. A fundamental drawback of the controlled effect is that mediation is a post-randomization event that generally cannot be externally manipulated. In many intervention trials, the potential mediators are psychological or other characteristics that cannot be directly manipulated, even in theory. There it is more natural to decompose the total effect into natural direct and indirect effects.

The natural direct effect measures the intervention effect when each individual's mediator is fixed at its natural level under no intervention. In other words, it is the effect of applying the intervention while blocking the path through the mediator. Because some interventions are designed to target specific mediators, this interpretation is plausible. In the multiple mediator context, it is of great interest not only to determine whether an indirect effect exists, but also to break it down into individual mediating effects attributable to the several potential mediating variables. This helps determine whether an individual potential mediator mediates the effect of the intervention on the outcome when multiple mediators are tested concurrently.

The randomized behavioral intervention study we use in this paper is Project STRIDE (Marcus and others 2007). It is a 4-year randomized trial designed to promote physical activity adoption and short-term maintenance among previously sedentary adults. The primary outcome is minutes per week of moderate-intensity physical activity, measured at baseline, 6 and 12 months using the 7-day Physical Activity Recall interview. Participants were randomized to one of three arms:

- a telephone-based intervention with individualized motivational feedback;
- a print-based intervention with individualized motivational feedback;
- a contact-control delayed treatment group.
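A small worked example shows why the controlled and natural direct effects can disagree. Assume, purely for illustration, a linear outcome model with treatment-by-mediator interaction, $E(Y \mid R = r, M = m) = \theta_0 + \theta_1 r + \theta_2 m + \theta_3 r m$, and no mediator-outcome confounding. Then:

- the controlled direct effect at mediator level m is $\text{CDE}(m) = \theta_1 + \theta_3 m$, which changes with m;
- the natural direct effect keeps each person's own $M_0$, giving $\text{NDE} = \theta_1 + \theta_3 E(M_0)$.

The coefficient values below are hypothetical.

```python
def controlled_direct_effect(theta, m):
    """CDE(m) = theta1 + theta3 * m under the assumed linear outcome model."""
    _, theta1, _, theta3 = theta
    return theta1 + theta3 * m


def natural_direct_effect(theta, mean_m0):
    """NDE = theta1 + theta3 * E(M0): each subject keeps its own M0."""
    _, theta1, _, theta3 = theta
    return theta1 + theta3 * mean_m0


theta = (10.0, 20.0, 15.0, 30.0)   # hypothetical (theta0, theta1, theta2, theta3)
p_m0 = 0.25                        # hypothetical P(M0 = 1) for a binary mediator

print("CDE(0) =", controlled_direct_effect(theta, 0))   # CDE(0) = 20.0
print("CDE(1) =", controlled_direct_effect(theta, 1))   # CDE(1) = 50.0
print("NDE    =", natural_direct_effect(theta, p_m0))   # NDE    = 27.5
```

With $\theta_3 = 0$ all three numbers coincide. That is exactly the case in which the controlled direct effect does not depend on the mediator level.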
Individuals randomized to either intervention arm received individually tailored messages generated by a computer expert system, stage-targeted booklets, and physical activity-related tip sheets. Participants were contacted 14 times over the 12 months, with more frequent contact at the beginning of the study. The material delivered to the two treatment arms was matched, but the channel of delivery differed. The control group was mailed health education information on the same schedule as the treatment groups. Its members were told that they would be able to choose either the print- or the telephone-based intervention for 12 months.

To assess the effectiveness of the intervention, we consider not only the physical activity measure but also potential psychosocial mediators of physical activity behavior change. Several factors affect whether a person is physically active. Psychological theories and models (e.g., learning theory, decision-making theory, and social cognitive theory) can be useful when developing strategies to help participants become more physically active (King and others 2002; Prochaska and DiClemente 1983; Marcus and Forsyth 2009). The theory used to design the program suggests which factors might be most important for producing physical activity change. The following variables are derived from psychological theories and are believed to produce physical activity change (Marcus and Forsyth 2009):

(a) Cognitive processes, which include increasing knowledge, being aware of risks, caring about consequences to others, comprehending benefits, and increasing healthy opportunities.

(b) Behavioral processes, which include substituting alternatives, enlisting social support, rewarding yourself, committing yourself and reminding yourself.

(c) Self-efficacy, which refers to confidence in one's ability to perform specific behaviors in specific situations.

(d) Decisional balance, which refers to the ratio of perceived benefits to barriers of change.


More information

Estimating Post-Treatment Effect Modification With Generalized Structural Mean Models

Estimating Post-Treatment Effect Modification With Generalized Structural Mean Models Estimating Post-Treatment Effect Modification With Generalized Structural Mean Models Alisa Stephens Luke Keele Marshall Joffe December 5, 2013 Abstract In randomized controlled trials, the evaluation

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

arxiv: v1 [stat.me] 8 Jun 2016

arxiv: v1 [stat.me] 8 Jun 2016 Principal Score Methods: Assumptions and Extensions Avi Feller UC Berkeley Fabrizia Mealli Università di Firenze Luke Miratrix Harvard GSE arxiv:1606.02682v1 [stat.me] 8 Jun 2016 June 9, 2016 Abstract

More information

Causal Mediation Analysis in R. Quantitative Methodology and Causal Mechanisms

Causal Mediation Analysis in R. Quantitative Methodology and Causal Mechanisms Causal Mediation Analysis in R Kosuke Imai Princeton University June 18, 2009 Joint work with Luke Keele (Ohio State) Dustin Tingley and Teppei Yamamoto (Princeton) Kosuke Imai (Princeton) Causal Mediation

More information

Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha

Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha January 18, 2010 A2 This appendix has six parts: 1. Proof that ab = c d

More information

Mediation for the 21st Century

Mediation for the 21st Century Mediation for the 21st Century Ross Boylan ross@biostat.ucsf.edu Center for Aids Prevention Studies and Division of Biostatistics University of California, San Francisco Mediation for the 21st Century

More information

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Kosuke Imai Department of Politics Princeton University July 31 2007 Kosuke Imai (Princeton University) Nonignorable

More information

Global Sensitivity Analysis for Repeated Measures Studies with Informative Drop-out: A Semi-Parametric Approach

Global Sensitivity Analysis for Repeated Measures Studies with Informative Drop-out: A Semi-Parametric Approach Global for Repeated Measures Studies with Informative Drop-out: A Semi-Parametric Approach Daniel Aidan McDermott Ivan Diaz Johns Hopkins University Ibrahim Turkoz Janssen Research and Development September

More information

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls under the restrictions of the copyright, in particular

More information

A Sampling of IMPACT Research:

A Sampling of IMPACT Research: A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

An Introduction to Causal Analysis on Observational Data using Propensity Scores

An Introduction to Causal Analysis on Observational Data using Propensity Scores An Introduction to Causal Analysis on Observational Data using Propensity Scores Margie Rosenberg*, PhD, FSA Brian Hartman**, PhD, ASA Shannon Lane* *University of Wisconsin Madison **University of Connecticut

More information

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina Lecture 9: Learning Optimal Dynamic Treatment Regimes Introduction Refresh: Dynamic Treatment Regimes (DTRs) DTRs: sequential decision rules, tailored at each stage by patients time-varying features and

More information

Targeted Maximum Likelihood Estimation in Safety Analysis

Targeted Maximum Likelihood Estimation in Safety Analysis Targeted Maximum Likelihood Estimation in Safety Analysis Sam Lendle 1 Bruce Fireman 2 Mark van der Laan 1 1 UC Berkeley 2 Kaiser Permanente ISPE Advanced Topics Session, Barcelona, August 2012 1 / 35

More information

Longitudinal Nested Compliance Class Model in the Presence of Time-Varying Noncompliance

Longitudinal Nested Compliance Class Model in the Presence of Time-Varying Noncompliance Longitudinal Nested Compliance Class Model in the Presence of Time-Varying Noncompliance Julia Y. Lin Thomas R. Ten Have Michael R. Elliott Julia Y. Lin is a doctoral candidate (E-mail: jlin@cceb.med.upenn.edu),

More information

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies Kosuke Imai Princeton University January 23, 2012 Joint work with L. Keele (Penn State)

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Identification Analysis for Randomized Experiments with Noncompliance and Truncation-by-Death

Identification Analysis for Randomized Experiments with Noncompliance and Truncation-by-Death Identification Analysis for Randomized Experiments with Noncompliance and Truncation-by-Death Kosuke Imai First Draft: January 19, 2007 This Draft: August 24, 2007 Abstract Zhang and Rubin 2003) derives

More information

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies Kosuke Imai Princeton University February 23, 2012 Joint work with L. Keele (Penn State)

More information

Outline

Outline 2559 Outline cvonck@111zeelandnet.nl 1. Review of analysis of variance (ANOVA), simple regression analysis (SRA), and path analysis (PA) 1.1 Similarities and differences between MRA with dummy variables

More information

Propensity Score Analysis with Hierarchical Data

Propensity Score Analysis with Hierarchical Data Propensity Score Analysis with Hierarchical Data Fan Li Alan Zaslavsky Mary Beth Landrum Department of Health Care Policy Harvard Medical School May 19, 2008 Introduction Population-based observational

More information

Growth Mixture Modeling and Causal Inference. Booil Jo Stanford University

Growth Mixture Modeling and Causal Inference. Booil Jo Stanford University Growth Mixture Modeling and Causal Inference Booil Jo Stanford University booil@stanford.edu Conference on Advances in Longitudinal Methods inthe Socialand and Behavioral Sciences June 17 18, 2010 Center

More information

Selection on Observables: Propensity Score Matching.

Selection on Observables: Propensity Score Matching. Selection on Observables: Propensity Score Matching. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 24/10/2017 I. Brunetti Labour Economics in an European Perspective 24/10/2017

More information

Bayesian Inference for Causal Mediation Effects Using Principal Stratification with Dichotomous Mediators and Outcomes

Bayesian Inference for Causal Mediation Effects Using Principal Stratification with Dichotomous Mediators and Outcomes Bayesian Inference for Causal Mediation Effects Using Principal Stratification with Dichotomous Mediators and Outcomes Michael R. Elliott 1,2, Trivellore E. Raghunathan, 1,2, Yun Li 1 1 Department of Biostatistics,

More information

Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility

Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility Stephen Burgess Department of Public Health & Primary Care, University of Cambridge September 6, 014 Short title:

More information

Estimation of Optimal Treatment Regimes Via Machine Learning. Marie Davidian

Estimation of Optimal Treatment Regimes Via Machine Learning. Marie Davidian Estimation of Optimal Treatment Regimes Via Machine Learning Marie Davidian Department of Statistics North Carolina State University Triangle Machine Learning Day April 3, 2018 1/28 Optimal DTRs Via ML

More information

Modeling Log Data from an Intelligent Tutor Experiment

Modeling Log Data from an Intelligent Tutor Experiment Modeling Log Data from an Intelligent Tutor Experiment Adam Sales 1 joint work with John Pane & Asa Wilks College of Education University of Texas, Austin RAND Corporation Pittsburgh, PA & Santa Monica,

More information

Identification, Inference, and Sensitivity Analysis for Causal Mediation Effects

Identification, Inference, and Sensitivity Analysis for Causal Mediation Effects Identification, Inference, and Sensitivity Analysis for Causal Mediation Effects Kosuke Imai Luke Keele Teppei Yamamoto First Draft: November 4, 2008 This Draft: January 15, 2009 Abstract Causal mediation

More information

Variable selection and machine learning methods in causal inference

Variable selection and machine learning methods in causal inference Variable selection and machine learning methods in causal inference Debashis Ghosh Department of Biostatistics and Informatics Colorado School of Public Health Joint work with Yeying Zhu, University of

More information

Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1

Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1 Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1 Gary King GaryKing.org April 13, 2014 1 c Copyright 2014 Gary King, All Rights Reserved. Gary King ()

More information

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London Bayesian methods for missing data: part 1 Key Concepts Nicky Best and Alexina Mason Imperial College London BAYES 2013, May 21-23, Erasmus University Rotterdam Missing Data: Part 1 BAYES2013 1 / 68 Outline

More information

Abstract Title Page. Title: Degenerate Power in Multilevel Mediation: The Non-monotonic Relationship Between Power & Effect Size

Abstract Title Page. Title: Degenerate Power in Multilevel Mediation: The Non-monotonic Relationship Between Power & Effect Size Abstract Title Page Title: Degenerate Power in Multilevel Mediation: The Non-monotonic Relationship Between Power & Effect Size Authors and Affiliations: Ben Kelcey University of Cincinnati SREE Spring

More information

Impact Evaluation of Mindspark Centres

Impact Evaluation of Mindspark Centres Impact Evaluation of Mindspark Centres March 27th, 2014 Executive Summary About Educational Initiatives and Mindspark Educational Initiatives (EI) is a prominent education organization in India with the

More information

Integrated approaches for analysis of cluster randomised trials

Integrated approaches for analysis of cluster randomised trials Integrated approaches for analysis of cluster randomised trials Invited Session 4.1 - Recent developments in CRTs Joint work with L. Turner, F. Li, J. Gallis and D. Murray Mélanie PRAGUE - SCT 2017 - Liverpool

More information

Telescope Matching: A Flexible Approach to Estimating Direct Effects *

Telescope Matching: A Flexible Approach to Estimating Direct Effects * Telescope Matching: A Flexible Approach to Estimating Direct Effects * Matthew Blackwell Anton Strezhnev August 4, 2018 Abstract Estimating the direct effect of a treatment fixing the value of a consequence

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2015 Paper 334 Targeted Estimation and Inference for the Sample Average Treatment Effect Laura B. Balzer

More information

Modern Mediation Analysis Methods in the Social Sciences

Modern Mediation Analysis Methods in the Social Sciences Modern Mediation Analysis Methods in the Social Sciences David P. MacKinnon, Arizona State University Causal Mediation Analysis in Social and Medical Research, Oxford, England July 7, 2014 Introduction

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 260 Collaborative Targeted Maximum Likelihood For Time To Event Data Ori M. Stitelman Mark

More information

Telescope Matching: A Flexible Approach to Estimating Direct Effects

Telescope Matching: A Flexible Approach to Estimating Direct Effects Telescope Matching: A Flexible Approach to Estimating Direct Effects Matthew Blackwell and Anton Strezhnev International Methods Colloquium October 12, 2018 direct effect direct effect effect of treatment

More information

Covariate Balancing Propensity Score for General Treatment Regimes

Covariate Balancing Propensity Score for General Treatment Regimes Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian

More information

Three-Level Modeling for Factorial Experiments With Experimentally Induced Clustering

Three-Level Modeling for Factorial Experiments With Experimentally Induced Clustering Three-Level Modeling for Factorial Experiments With Experimentally Induced Clustering John J. Dziak The Pennsylvania State University Inbal Nahum-Shani The University of Michigan Copyright 016, Penn State.

More information

Identification and Inference in Causal Mediation Analysis

Identification and Inference in Causal Mediation Analysis Identification and Inference in Causal Mediation Analysis Kosuke Imai Luke Keele Teppei Yamamoto Princeton University Ohio State University November 12, 2008 Kosuke Imai (Princeton) Causal Mediation Analysis

More information

Causal Inference. Miguel A. Hernán, James M. Robins. May 19, 2017

Causal Inference. Miguel A. Hernán, James M. Robins. May 19, 2017 Causal Inference Miguel A. Hernán, James M. Robins May 19, 2017 ii Causal Inference Part III Causal inference from complex longitudinal data Chapter 19 TIME-VARYING TREATMENTS So far this book has dealt

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 155 Estimation of Direct and Indirect Causal Effects in Longitudinal Studies Mark J. van

More information

multilevel modeling: concepts, applications and interpretations

multilevel modeling: concepts, applications and interpretations multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Robustness of the Contextual Bandit Algorithm to A Physical Activity Motivation Effect

Robustness of the Contextual Bandit Algorithm to A Physical Activity Motivation Effect Robustness of the Contextual Bandit Algorithm to A Physical Activity Motivation Effect Xige Zhang April 10, 2016 1 Introduction Technological advances in mobile devices have seen a growing popularity in

More information

SEQUENTIAL MULTIPLE ASSIGNMENT RANDOMIZATION TRIALS WITH ENRICHMENT (SMARTER) DESIGN

SEQUENTIAL MULTIPLE ASSIGNMENT RANDOMIZATION TRIALS WITH ENRICHMENT (SMARTER) DESIGN SEQUENTIAL MULTIPLE ASSIGNMENT RANDOMIZATION TRIALS WITH ENRICHMENT (SMARTER) DESIGN Ying Liu Division of Biostatistics, Medical College of Wisconsin Yuanjia Wang Department of Biostatistics & Psychiatry,

More information

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND

More information

Propensity Score Methods, Models and Adjustment

Propensity Score Methods, Models and Adjustment Propensity Score Methods, Models and Adjustment Dr David A. Stephens Department of Mathematics & Statistics McGill University Montreal, QC, Canada. d.stephens@math.mcgill.ca www.math.mcgill.ca/dstephens/siscr2016/

More information

Path Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis

Path Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis Path Analysis PRE 906: Structural Equation Modeling Lecture #5 February 18, 2015 PRE 906, SEM: Lecture 5 - Path Analysis Key Questions for Today s Lecture What distinguishes path models from multivariate

More information

Survivor Bias and Effect Heterogeneity

Survivor Bias and Effect Heterogeneity Survivor Bias and Effect Heterogeneity Anton Strezhnev Draft April 27, 207 Abstract In multi-period studies where an outcome is observed at some initial point in time and at some later follow-up wave,

More information

OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS Outcome regressions and propensity scores

OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS Outcome regressions and propensity scores OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS 776 1 15 Outcome regressions and propensity scores Outcome Regression and Propensity Scores ( 15) Outline 15.1 Outcome regression 15.2 Propensity

More information