Estimating Post-Treatment Effect Modification With Generalized Structural Mean Models

Size: px
Start display at page:

Download "Estimating Post-Treatment Effect Modification With Generalized Structural Mean Models"

Transcription

1 Estimating Post-Treatment Effect Modification With Generalized Structural Mean Models Alisa Stephens Luke Keele Marshall Joffe December 5, 2013 Abstract In randomized controlled trials, the evaluation of an overall treatment effect is often followed by effect modification or subgroup analyses, where the possibility of a different magnitude or direction of effect for varying values of a covariate is explored. While studies of effect modification are typically restricted to pretreatment covariates, longitudinal experimental designs permit the examination of treatment effect modification by intermediate outcomes, where intermediates are measured after treatment but before the final outcome. We present a generalized structural mean model (GSMM) for analyzing treatment effect modification by posttreatment covariates. The model can accommodate posttreatment effect modification with both full compliance and noncompliance to assigned treatment status. The methods are evaluated using a simulation study that demonstrates that our approach retains unbiased estimation of effect modification by intermediate variables which are affected by treatment and also predict outcomes. We illustrate the method using a randomized trial designed to promote re-employment through teaching skills to enhance self-esteem and inoculate job seekers against setbacks in the job search process. Our analysis provides some evidence that the intervention was much less successful among subjects that displayed higher levels of depression at intermediate posttreatment waves of the study. For helpful comments and suggestions, we thank Stijn Vansteelandt, Eric Tchetgen Tchetgen, and Teppei Yamamoto. Department of Epidemiology and Biostatistics, University of Pennsylvania Perelman School of Medicine 624 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104, alisaste@mail.med.upenn.edu Associate Professor, 211 Pond Lab, Penn State University, University Park, PA 16802, ljk20@psu.edu, Phone: Department of Epidemiology and Biostatistics, University of Pennsylvania Perelman School of Medicine 602 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104

2 1 Treatment Effects That Vary With Covariates 1.1 Posttreatment Effect Modification and Adaptive Treatment Strategies In a randomized study of treatment effects, effects may be heterogenous due to effect modification. Effect modification is an interaction between a treatment and a pretreatment covariate such that the average treatment effect varies across values of the prerandomization covariate. While analyses with effect modification by a pretreatment covariate are relatively common, it is also possible for effect modification to occur as a function of a posttreatment covariate. In many randomized studies, data on intermediate covariates, defined as variables measured post-intervention but prior to the study endpoint, are often collected from subjects. That is, after treatment, intermediate measures are collected at time intervals such as six months, one year and 18 months later, whereas the outcome may be measured at two years. In such designs, we may suspect that the treatment effect may vary across levels of a covariate measured after the treatment but before the final outcome. Identification of posttreatment effect modification may be a useful tool in the development of adaptive treatment strategies. Under an adaptive treatment strategy, the treatment level and type are adjusted according to individual level characteristics (Murphy 2005; Lavori and Dawson 2000; Almirall et al. 2012). The design of adaptive treatment strategies requires choosing tailoring variables, variables that are used to decide how to adapt the treatment to specific individuals. Posttreatment effect modification provides one method for identification of tailoring variables. If the effect of a treatment varies across levels of a postrandomization variable, this would suggest that this covariate may be a good tailoring variable. Thus models where posttreatment covariates are allowed to modify causal effect estimates could be used for further actions within a study or to tailor clinical decision-making. Posttreatment effect modification may also be useful for generating hypotheses that can be tested in future trials. Before considering how one might identify and estimate models with posttreatment effect 2

3 modification, it is helpful to consider a motivating example. 1.2 Motivating Example: The Job Search Intervention Study The Job Search Intervention Study (Vinokur et al. 1995) was a randomized trial conducted in the U.S. to study job training effectiveness and the prevention of poor mental health. The goal was to promote re-employment among unemployed workers through training that focused on not only specific career skills but also on teaching active learning skills to enhance self-esteem and inoculate job seekers against setbacks in the employment search process. For the approximately 1,800 unemployed participants that enrolled, the study investigators collected baseline characteristics on education, income, sex, age, occupation, race, risk for failure, level of economic hardship, and depression level which were measured with a subscale of 11 items based on the Hopkins Symptom Checklist. Participants were randomized to either participate in five four-hour sessions in groups of 15 to 20 led by two group leaders or were mailed a booklet that contained information on job-search methods. In the study, the treatment condition was only available to individuals assigned to the intervention. A substantial percentage (46%) of those assigned to the intervention failed to participate in the training seminars. After the training sessions, outcomes were measured at six weeks, six months, and two years. One outcome of interest is a binary indicator for whether subjects were employed 20 or more hours per week at the two year follow-up period. The intent-totreat analysis reveals that the odds ratio of success in the treatment arm as compared to the control arm is 1.43 with a 95% confidence interval (1.06, 1.95). An IV analysis based on the double-logistic GSMM (Vansteelandt and Goetghebeur 2003), which accounts for noncompliance among the treated, shows that the odds ratio of success for participating in the job training seminars versus not participating is 1.73 with corresponding 95% confidence interval (1.11, 2.73). This analysis indicates that for subjects assigned to the intervention the odds of re-employment increased, and for those subjects who complied with the intervention the odds of re-employment increased further. 3

4 In this paper, we investigate the possibility of causal effect modification by a posttreatment covariate in the JOBS II study. Specifically, our aim in this analysis is to assess whether the causal effect of the JOBS II intervention was modified by depression levels at the initial two follow-up periods. Why might we expect the treatment effect to differ at levels of posttreatment covariate in the JOBS II study? While the intervention was designed to decrease depression through teaching coping skills, it also made participants aware of the mental health aspects of the job search process which may alter the effect of the treatment. Specifically, the treatment encouraged participants to avoid depressive feelings in the job search process. However, this awareness combined with early failure at reemployment may only serve to increase feelings of failure and depression, which may then lower the odds of re-employment at later follow-up periods. Under an adaptive treatment strategy, investigators may wish to identify post-randomization factors early in the follow-up period to allow for the tailoring of subsequent interventions. For example, if at an early stage the job training appears to be ineffective among subjects with high levels of posttreatment depression, investigators can design future interventions for this sub-population. Thus identifying posttreatment effect modifiers can be used to alter the course of future treatments. In this paper, we modify generalized structural mean models (GSMMs) for binary outcomes to estimate the posttreatment modification of causal treatment effects, where the causal quantity of interest is the causal odds ratio. GSMMs for binary outcomes are semiparametric two-stage models comprised of a structural model and association model (Vansteelandt and Goetghebeur 2003). The structural model is of primary concern as it is used to estimate the causal effect and, as we demonstrate, the amount of effect modification. The association model, which is a model for expected outcomes given observed exposure and covariates, is needed for estimation of the structural model parameters. Such models are robust to model misspecification through special weight functions in the estimating equations. Our paper has the following structure. Section 2 outlines our notation, and Section 3 4

5 describes the generalized structural mean model (GSMM) for the causal odds ratio and introduces the two parameter GSMM that allows for posttreatment effect modification. In Section 4, we detail the estimation procedure. Section 5 evaluates the properties of the twoparameter logistic GSMM for posttreatment effect modification through a simulation study. Section 6 presents estimates of posttreatment effect modification of causal effects in JOBS II. Section 7 includes discussion and concluding remarks. 2 Notation for Treatment Effects In this section, we begin with a basic causal model before outlining extensions to both posttreatment effect modification and noncompliance. Since the primary outcome of interest in JOBS II is binary, we focus on estimation of the causal odds ratio. 2.1 Simple Treatment Effects In JOBS II, subjects (i = 1,..., n) are randomly assigned to either treatment (R i = 1) or control (R i = 0). Covariates X i = (X 1,..., X k ) are measured at baseline prior to randomization. As stated in Section 1.2, investigators measured a binary outcome Y i, which was an indicator of employment of 20 hours or more per week. One popular way to define causal effects is in terms of counterfactual or potential outcomes (Neyman 1923; Rubin 1978; Holland 1986). Under the potential outcomes framework, we augment observed outcomes with a set of potential outcomes Y i,r, where Y i,0 indicates a treatment-free response that would have been observed if, perhaps counter to fact, subject i had not received treatment. We can define the treatment effect of R i on binary Y i under the following contrast logit(p [Y i R i = 1, X i ]) logit(p [Y i,0 R i = 1, X i ]). Next we parametrize this contrast logit(p [Y i R i = 1, X i ]) logit(p [Y i,0 R i = 1, X i ]) = η s (X i ; ψ 0 ) = exp(ψ 0 ). 5

6 Identification of the causal parameter ψ 0 requires three assumptions. First, there must be independence of the treatment assignment and potential outcomes. In JOBS II, this assumption is justified by the randomized design. Formally, this restriction can be expressed as Y i,r R i (1) Second, we require consistency of the observed outcome Y i for the counterfactual outcome Y i,r under R i = r. We also require that the stable unit treatment values assignment (SUTVA) holds, which implies that a subject s potential outcome is not affected by other subjects exposures (Rubin 1986). Under these assumptions, standard logistic regression conditioning on R i and X i with some minor adjustments may be used to obtain an unbiased estimate of ψ 0 (Freedman 2008). In this model, the log odds of the probability of success for subjects who were assigned to the training program (R i = 1) would increase by ψ 0 relative to subjects who were assigned to control (R i = 0). We can alter the parameterization so that η s (X i ; ψ 0 ) = ψ 01 + ψ 02 X i which allows for the possibility that the causal odds ratio varies over levels of X i. Such a parameterization allows for effect modification by pretreatment covariates. Logistic regression via maximum likelihood may still be used to estimate ψ 0 = (ψ 01, ψ 02 ) when η s (X i ; ψ 0 ) is restricted to pre-treatement covariates. In the evaluation of effect modification by posttreatment covariates, one might be tempted to enter posttreatment variables into η s (X i ; ψ 0 ) and again estimate ψ 0 through logistic regression. Rosenbaum (1984) demonstrates, however, that conditioning on posttreatment variables may result in biased estimates of the causal parameter ψ 0 in standard regression models. Estimation of posttreatment effect modification therefore necessitates the use of methods that properly account for conditioning on posttreatment quantities. In the next section, we define a GSMM for unbiased estimation of the causal odds ratio under effect modification by posttreatment variables. 6

7 3 Causal models for posttreatment effect modification 3.1 Modeling posttreatment effect modification under full compliance To evaluate modification of the causal odds ratio of obtaining employment of 20 hours or more for having received versus not receiving the JOBS training intervention by intermediate variables, we use the GSMM (Vansteelandt and Goetghebeur 2003). For the moment, we consider modeling posttreatment effect modification without accounting for noncompliance. We define potential posttreatment effect modifiers, S i, as the set of intermediate covariates observed after treatment but prior to the outcome, also possibly multivariate. The structural model is defined by logit(p [Y i S i, R i = 1, X i ]) logit(p [Y i,0 S i, R i = 1, X i ]) = η s (S i, X i ; ψ 0 ) = f 1 (ψ 01 ) + f 2 (S i ; ψ 02 ) (2) for arbitary known functions f 1 ( ) and f 2 ( ) up to an unknown p-dimensional parameter ψ 0 = (ψ 01, ψ 02 ), where p = dim(ψ 01 ) + dim(ψ 02 ). For the structural model in (2), when ψ 0 = 0 or R i = 0 we set f 1 (ψ 01 ) = f 2 (S i ; ψ 02 )=0, and for S i = 0 or ψ 02 = 0, we set f 2 (S i ; ψ 02 ) = 0. Equation 2 is an example of a retrospective structural mean model, which is characterized by exposure effects defined conditional on variables observed subsequent treatment. Unlike standard structural mean models (Robins 1994), which follow a decision-theoretic approach of defining causal effects as a function of covariates observed prior to treatment, retrospective structural mean models allow effects to be defined in subsets of the data not identifiable at baseline. For example, Vansteelandt (2010) considered retrospective models for assessing mediation when outcomes are binary and modeled using a logit link. Under our causal model, the second parameter ψ 02 quantifies the differential in the effect 7

8 of intervention R i at varying levels of S i. When ψ 02 = 0, homogeneous causal effects are estimated across values of S i. One model for a binary treatment R i and scalar intermediate variable S i might assume logit(p (Y i = 1 S i, R i = 1, X i )) logit(p (Y i,0 = 1 S i, R i = 1, X i )) = ψ 01 + ψ 02 S i (3) for scalars ψ 01, ψ 02. Under ψ 02 = 0, the causal odds ratio P (Y i,1 = 1 S i, R i = 1, X i ))/P (Y i,1 = 0 S i, R i = 1, X i ) P (Y i,0 = 1 S i, R i = 1, X i ))/P (Y i,0 = 0 S i, R i = 1, X i ) is quantified by exp(ψ 01 ) for all subjects when ψ 02 = 0 or the set of subjects with S i = 0 in the case that ψ For ψ 02 0 and S i 0, the causal odds ratio considering subjects with S i = s is captured by exp(ψ 1 + ψ 2 s), which allows the effect of R i to vary with the observed value s. 3.2 Identifiability We next consider the identifiability of SMMs with posttreatment effect modification. We do not present a theorem, but instead address identifiability under a theorem originally presented by Vansteelandt and Goetghebeur (2004) in the context of Strong Structural Mean Models. The restriction in (1) along with SUTVA and the assumption of consistency is sufficient to nonparametrically identify the effect of treatment in subgroups defined by baseline covariates. For example, consider the following model with a pretreatment binary covariate x i and treatment R i, E(Y i Y i,0 R i, x i ) = ψ 01 R i + ψ 02 R i x i (4) This model is nonparametrically identified given that under ignorability (1) the quantities in E(Y i R i, x i ) are defined in terms of observable quantities and the conditional expectations E(Y i,0 R i, x i ) and E(Y i R i = 0, x i ), are equivalent and defined in terms of observable 8

9 quantities. Our model in (3) is analogous to the following linear model: E(Y i Y i,0 S i, R i, x i ) = ψ 01 R i + ψ 02 R i S i. (5) Nonparametric identification for the above model does not hold since it imposes additional restrictions on the sub-group specific causal effects. There are four such possible effects (one for each subgroup defined by joint levels of S i and x i ),) whereas (5) involves only two parameters. The restrictions in (5) allow us to write x E(Y i Y i,0 R i = 1, x i ) = s P r(s i = s R i = 1, x i )E(Y i Y i,0 S i, R i = 1, x i )) (6) = s π sx (ψ 01 + ψ 02 s) = π 1x (ψ 01 + ψ 02 ) + π 0x (ψ 01 ) = ψ 01 + π 1x ψ 02 where π sx P r(s i = s R i = 1, x i ). Equation (6) involves two equations in two unknown quantities, ψ 01 and ψ 02, and has a unique solution so long as x i predicts the effect of treatment R i on S i (i.e., that π 1x varies with x i ). A model that is subtly different from (4) is E(Y i Y i,0 S i, R i, x i ) = ψ 01 R i + ψ 02 R i x i. (7) Unlike (4), (7) assumes that the effect of R i does not change across the strata of S i. Like (5), (7) allows us to express x in terms of two quantities: x = ψ 01 + ψ 02 x i. Models (5) and (7) are indistinguishable in the data, as each can be used to fit or explain the observed data under ignorability (1) equally well. More complex models such as E(Y i Y i,0 S i, R i, x i ) = ψ 01 R i + ψ 02 R i x i + ψ 03 R i S i (8) 9

10 are not identifiable from these data, as they involve more parameters than the number of restrictions imposed by (1). More generally, nonparametric identifiability does not hold for models like these, as the number of strata defined jointly by baseline covariates x i and posttreatment covariates S i is greater than the number of restrictions imposed by ignorable treatment assignment (1). Identifiability under the model in (8) which is parameterized by ψ = (ψ 01, ψ 02, ψ 03 ) depends on the parameter ψ being of no larger dimension than the number of restrictions imposed by (1). As such, we must restrict some subset of the R i x i interactions to identify effect modification by S i. Thus identifiability holds under a no-interaction assumption between R i and x i. This no-interaction assumption does allow for model-based identification but no set of assumptions allows for nonparametric identification. No-interaction assumptions are often used for identification of causal effects with instrumental variable analysis (Hernán and Robins 2006), in the estimation of direct and indirect effects (Robins and Greenland 1992; Ten Have et al. 2007; Vansteelandt 2010), and for other causal analyses(vansteelandt and Goetghebeur 2004). In many cases no-interaction assumptions are untestable, but in the context here the identifying assumption does have a testable implication. The analyst can restrict the posttreatment interactions to be zero and estimate the model with pretreatment effect modification. If there is considerable treatment heterogeneity, then the no-interaction assumption is implausible. One might argue that absent nonparametric identification, the model should be abandoned. We think that would be premature. The model can still serve a number of useful purposes so long as the estimates are not given a strong causal interpretation. First, posttreatment effect modification can be used for intermediate decision making if the trial is ongoing. Analysts could use the model to identify subgroups were the treatment is particularly ineffective and a new intervention might be implemented. Second, results from an analysis of this form could also be used as a method for hypothesis-generation and the design of future interventions. Third, the model could also be used as a form of sensitivity analysis 10

11 in conjunction with other causal models. 3.3 An Alternative Definition of the Estimand Next, we develop an alternative characterization of S i the posttreatment modifier, which more clearly reveals linkages between our model and the method of principal stratification (PS) (Frangakis and Rubin 2002; Mealli and Rubin 2003). Given that treatment status R i is allowed to affect S i, our causal model might treat it in two different ways. Thus far the casual model describes stratification on an observed posttreatment variable. Alternatively, we might write S i as a potential outcome such that counterfactual value is denoted as S i,r under R i = r. In both cases, treatment effects are defined for subsets of the population using variables that are not observable at the time of treatment assignment (Joffe et al. 2007). First, using potential outcomes for the posttreatment effect modifier alters the causal model little. Under our stated assumptions, the following set of equalities hold. P (Y i = 1 S i, R i = 1, X i ) = P (Y i,1 = 1 S i,1, R i = 1, X i ) = P (Y i,1 = 1 S i,1, X i )) (9) The consistency assumption justifies the first equality, while the second holds due to ignorability in (1). Similarly, due to ignorability, P (Y i,0 = 1 S i,1, R i = 1, X i ) = P (Y i,0 = 1 S i,1, X i ). Under this notation, we would write the causal model as logit{p (Y i,1 = 1 S i,1, X i )} logit{p (Y i,0 = 1 S i,1, X i )} = ψ 01 + ψ 02 S i,1 (10) Under this causal model, we condition on the potential auxiliary S i,1 rather than the observed value of S i, but do so for the population as a whole. For a binary treatment, stratification on an observed posttreatment variable is equivalent to conditioning on a single potential outcome and so is in general a coarsening of the PS estimand logit{p (Y i,1 = 1 S i,1, S i,0, X i )} logit{p (Y i,0 = 1 S i,1, S i,1 X i )}. When S i,0 has the same value for all subjects our estimand is essentially the same as the PS estimand (see Gilbert et al. (2011)). Thus our estimand can 11

12 be characterized as either one articulated under principal stratification or as stratification on an observed auxiliary variable (Joffe et al. 2007). Hereafter, we retain the notation of S i as observed auxiliary variable. 3.4 Noncompliance As we noted in Section 1.2, JOBS II was subject to noncompliance. We first review the double-logistic SMM for settings with noncompliance and then adapt the structural model to allow for posttreatment effect modification of exposure to treatment rather than assignment to treatment. We denote actual exposure to the treatment by A i. As we noted above in JOBS II many subjects did not comply by failing to attend the training sessions. In JOBS II, those assigned to control did not have access to the training sessions. Therefore when R i = 0 then A i = 0. Conversely, if R i = 1, then A i = 1 if subject i complies with treatment assignment and if A i = 0 the subject does not comply with treatment assignment. Therefore, the experimental design justifies what is known as the no-contamination restriction (Cuzick et al. 2007), so that P r(a i = 0 R i = 0) = 1. One causal parameter that may be of interest is the causal effect of A i on the response. To estimate this quantity, we would like to contrast how each subject in the treatment group would respond if he or she fails to attend the training sessions instead of selecting to attend the intervention. We now denote such potential treatment-free outcomes as Y ir,0. Here, we assume the exclusion restriction holds (Angrist et al. 1996) which implies that Y i1,0 is equal to Y i0,0, and we denote Y i0 = Y i1,0 = Y i0,1. Given the exclusion restriction, the contrast of interest compares P (Y i = 0 A i = 1) to P (Y i0 = 1 A i = 0), which is the total effect of A i, self-selection into the job-training sessions, on the outcome re-employment status. We use the term total effect to reflect that the effect of A i may be mediated through S i, since under our structural model S i is not manipulated. Finally, we assume that R i has a nonzero causal effect on A i (Angrist et al. 1996). The double logistic SMM (Vansteelandt and Goetghebeur 2003) models differences in 12

13 P (Y i = 0) to P (Y i0 = 1), through the following model logit{e[y i A i, X i, R i = 1]} logit{e[y i0 A i, X i, R i = 1]} = η s(a i, X i ; ψ 0 ) (11) In this model, the odds ratio of success for subjects who chose to be exposed to the training program (A i = 1) versus not exposed (A i = 0) is determined by ψ 0, depending on the functional form of η s(a i, X i ; ψ 0 ). Here ψ 0 corresponds to the log odds ratio of success for participating versus not participating in the intervention. Allowing for posttreatment effect modification of A i instead of R i requires minimal alteration of the causal model. To allow for posttreatment effect modification of A i, model (2) becomes logit(e[y i S i, A i, R i = 1, X i ]) logit(e[y i,0 S i, A i, R i = 1, X i ]) = f 1 (A; ψ 01 ) + f 2 (A, S i ; ψ 02 ), (12) Interpretation of the model parameters largely parallels the case with full compliance. 3.5 Identifiability Basic issues of identifiability change little. While noncompliance entails additional assumptions for identification, posttreatment effect modification requires a no-interaction assumption in the same form as the case with full compliance. We must also maintain an exclusion restriction for S i as ell. While noncompliance itself requires untestable assumptions for identification, some of those assumptions are justified by the design in JOBS II and the effect of R i on A i is a testable condition. The design by itself does not provide justification for the no-interaction assumption. 13

14 4 Estimation 4.1 Estimation under full compliance The Vansteelandt and Goetghebeur (2003) approach to the estimation of causal effects under generalized structural mean models of binary outcomes using the logit link was developed as a solution to Robins (1999), which showed that the causal odds ratio could not be estimated using the same G-estimation procedure as used for identity and log links. To facilitate the definition of mean treatment-free outcomes to be used in a modified version of G-estimation, the first of two stages defines an association model among subjects receiving intervention. A detailed argument motivating the need for the association model is described in Vansteelandt and Goetghebeur (2003) and largely follows from the noncollapsibility of the logit link. When the associated model is nonsaturated, it can be uncongenial to the logistic SMM (Robins and Rotnitzky 2004). Vansteelandt et al. (2011) argue that the biases from uncongenial estimators are small compared with other assumption failures. Moreover, alternatives are computationally demanding. The double-logistic SMM has similar asymptotic properties to G-estimators when the association model is correctly specified. Under the null hypothesis ψ 0 = 0, one may construct locally robust estimating equations that yield consistent inference under misspecified association models (Vansteelandt and Goetghebeur 2003). For a binary randomization assignment R i the first stage of estimation defines the model logit(e(y i S i, R i = 1, X i ) = η a (S i, X i ; β) (13) for a known function η a and unknown finite-dimensional parameter vector β. The subscript a distinguishes the association model from the structural, causal model. As in generalized linear regression models, components of the the linear predictor η a (S i, R i = 1, X i ; β) may take any functional form, including main effects, interactions, and higher order terms. For subjects randomized to treatment, we define the pseudo treatment free outcome H i (ψ) = 14

15 h(s i, X i, R i ; ψ) by [ ] H i (ψ) = expit η a (S i, R i = 1, X i ; β) [{f 1 (ψ 1 ) + f 2 (S i ; ψ 2 )}. For subjects with R i = 0, the observed outcome Y i equals the treatment free outcome Y i,0 following the consistency assumption. In the control arm, it is therefore unnecessary to estimate H i (ψ) by removing the treatment effect from a model-predicted mean outcome; H i (ψ) is simply set to H i (ψ) = Y i. The second stage of estimation then defines the estimating function U(S i, R i, X i, H i (ψ)) = d(x i, R i ) [H i (ψ) q(x i )], (14) for d(x i, R i ), a p dimensional weight function defined such that its elements d 1 (X i, R i ),... d p (X i, R i ) are non-collinear, and q(x i ), a function of baseline covariates. The causal parameter ψ is n estimated as the solution to U(S i, R i, X i, H i (ψ)) = 0. By the randomization assumption, [ i=1 n ] under the true ψ and β, E U{S i, R i, X i, H i (ψ)} = 0 for U(S i, X i, H i (ψ)) as defined i=1 in (14). The chosen d(x i, R i ) and q(x i ) affect efficiency but not bias in the resulting estimate ˆψ when the association model is correctly specified. Under a misspecified association model, the robust estimating equations referenced above are constructed through strategic selection of d(x i, R i ) and guarantee that type I error is preserved. Term q(x i ) does not affect bias under correct or incorrect specification of the association model. Vansteelandt and d Goetghebeur (2003) recommend the choice d(x i, R i ) = (X i )( 1) R i, with R i P (R=1 X)+(1 R i )(1 P (R i =1 X i )) [ ] d (X i ) = E Hi (ψ) X ψ i, following semiparametric optimality arguments for their GSMM under known β. Under the logit model H i(ψ) ψ evaluated at an initial estimate of ψ, ˆψ b. 4.2 Estimation under non-compliance = H i (ψ)(1 H i (ψ))[r i R i S i ] where H i (ψ) is The two-stage approach may also be used to estimate post-intervention effect modification amid non-compliance. Estimation considers the observed treatment variable A i in the asso- 15

16 ciation model amd predicted treatment-free mean outcomes H i (ψ). The first stage model is defined as logit(e(y i S i, A i, R i = 1, X i ; β r ) = η a (S i, A i, X i ; β), (15) for a known function η a and unknown finite-dimensional parameter vector β, and predicted mean treatment-free outcomes are constructed as H i (ψ) = expit[η a (S i, A i, X i ; β) [{f 1 (A; ψ 1 ) + f 2 (A, S i ; ψ 2 )}]. (16) For subjects randomized to control and hence unable to access treatment, we again set H i (ψ) = Y i. Estimation then proceeds as described in section 4.2 setting n U(S i, A i, R i, X i, H i (ψ)) = i=1 n d(x i, R i ) [H i (ψ) q(x i )] = 0, (17) i=1 noting the dependence of our estimator on A i. 5 Simulation Study A simulation study was conducted to evaluate the proposed estimator for assessing effect modification by posttreatment variables. Each simulated dataset consisted of n=5,000 subjects. In simulation set 1, we generate data under perfect compliance. For each subject, independent baseline covariates X i1 Bernoulli(p = 0.4) and X i2 Bernoulli(p = 0.7) were generated. Binary treatment R i was simulated following an unstratified randomization design, with R i Bernoulli(p = 0.5). The posttreatment variable S i was simulated from the model logit(p (S i = 1 R i, X i )) = γ 0 + γ 1 X i1 + γ 2 X i2 + γ 3 R i, with γ = (logit(0.2), 1.0, 1.0, γ 3 ), with γ 3 =0, 0.3, or 1.2 for S i not associated, weakly associated, or strongly associated with R i. The data design contained binary covariates to ensure congeniality among various stages of conditional mean treatment-free potential outcome models. Outcomes were generated by first defining logit(e[y i,0 X i1, X i2 ]) = ρ X 0 +ρ X 1 X i1 +ρ X 2 X i2 + 16

17 ρ X 3 X i1 X i2, with ρ X = (logit(0.35), 0.8, 0.8, 1.5). To adjust mean treatment-free potential outcomes for S i, we followed a strategy similar to that described in Robins and Scharfstein (1999). We set E[Y i,0 S i, R i, X i ] = logit[logit(e[y i,0 X i1, X i2 ]) + ρ S 1 S i + ρ S 0 j (1 S i )], where ρ S 1 = 0.4, and for given ρ X, γ and values of covariates R, X i1, X i2, ρ S 0 j was the solution to E S [E[Y i,0 S i, R i, X i ]] = E[Y i,0 X i ] for j = 1,..., 8, indexing unique profiles of R, X i1, X i2. Observed outcomes were then simulated as Bernoulli(p Y ), where logit(p y ) = logit(e[y i,0 S i, R i, X i ]) + ψ 1 R i + ψ 2 R i S i for ψ=(0.5,-0.5). In applying the two-stage G- estimation to each simulated dataset, a fully saturated association model was considered E[Y i S i, R i, X i ], followed by a estimation under a misspecified association model containing only main effects of S i, X i1, X i2 and the two-way interaction X i1 X i2. Standard logistic regression was also applied in two forms, both of which included R i, R i S i as covariates. For the application of the GSMM with a saturated association model, the first application of logistic regression was fully saturated for S i, X i1, X i2, whereas the second application excluded all terms containing S i other than the interaction R i S i of interest. For GSMM with a misspecified association model, potentially comparable logistic regression models included 1) R i, R i S i, S i, X i1, X i2, X i1 X i2 and 2) R i, R i S i, X i1, X i2, X i1 X i2. Simulation set 2 considers estimation of posttreatment effect modification in the presence of non-compliance. Variables X i1,x i2,r i, and S i were generated as in simulation set 1. To simulate non-compliance, the data-generating mechanism and estimation procedure were modified in several ways. First, a compliance variable A i was generated following the model logit(p (A i = 1 X i1 )) = logit(0.9) 3X i1. For subjects with R i = 0, we set A i =0 following the setting where subjects randomized to control are unable to access active treatment. Under this design, compliance was approximately 66% among those randomized to treatment. Under ignorable non-compliance, mean treatment-free outcomes were generated identically to simulation set 1, setting E[Y i,0 S i, A i, R i, X i ] = E[Y i,0 S i, R i, X i ]. Under non-ignorable non-compliance E[Y i,0 S i, A i, R i, X i ] was generated by adjusting the conditional treatmentfree mean E[Y i,0 S i, R i, X i ] for compliance A i and then S i while maintaining congeniality 17

18 across conditional means. Observed outcomes Y i under noncompliance were distributed Bernoulli(p y,nc ), where logit(p y,nc ) = logit(e[y i,0 S i, A i, R i, X i ]) + ψ 1 A i + ψ 2 A i S i, also with ψ = (0.5, 0.5). The application of the two-stage GSMM considered the association model fully saturated for S i, A i, X i1, X i2. Two logistic regression models were fit: model 1 was similar to an intent-to-treat analysis with the addition of post-intervention covariates and contained terms R i and R i S i with full saturation for S i, X i1, X i2 ; and model 2 was an astreated approach, including A i and A i S i, also with full saturation for S i, X i1, X i2. All results were based on 1,000 replicated datasets. Tables 1-4 contain detailed results from the simulations across several different scenarios. Table 1 shows the simulation study results under full compliance when saturated association and logistic regression models are applied. Figure 1 graphically compares percent bias between GSMM and logistic regression estimates as shown numerically in Table 1. We only include results for the logistic regression model containing terms involving S i beyond the R i S i interaction of interest, as this was the less biased of the two logistic regression estimators. For settings characterized by S i associated with Y i,0 and S i affected by R i, bias was observed for standard logistic regression coefficients. For S i and R i weakly associated, the bias in logistic regression coefficients was approximately 20%. For a strong association between treatment and the post treatment variable, bias was more severe. Considering the estimated magnitude of effect, logistic regression estimates were 34%-50% biased; bias increased when noting the reversed sign of the estimated effect compared to the true value. When one of the two characterizing dependencies did not hold, logistic regression estimates were unbiased and more efficient than GSMM estimates, which were typically more biased than logistic regression estimates (2-3%) but within reasonable range considering Monte Carlo variation. G-estimation estimators exhibited the most bias compared to logistic regression when treatment predicted the posttreatment variable, but the posttreatment variable was not associated with Y i,0. Under this design, bias of the G-estimation was 5% to 10%, whereas logistic regressions was < 1% biased. Monte Carlo standard deviations 18

19 were much larger for estimates of the two-parameter GSMM compared to standard logistic regression. Table 2 compares GSMM and logistic regression estimates when terms in the logistic association model are misspecified as described in the simulation design. Bias in logistic regression estimates was similar to that observed in scenario 1. Considering the GSMM, the use of robust weights did not protect against bias when the treatment was strongly predictive of the post-intervention variable. For simulation settings with γ 3 = 1.2 and S i associated with the outcome, the magnitude of causal parameters was overestimated by 38-63% and Monte Carlo standard deviations were substantially larger than under the correct model. For a more modest association of the treatment and intermediate variable (γ 3 = 0.3) or estimation when at least one of the characterizing dependencies did not hold, bias and standard errors were similar to that observed under saturated, correct association models. Theoretically, robust weights guarantee protection against inflation of type I error under the null hypothesis of no treatment affect. Moving away from the null value in the parameter space may result in bias in estimation, as was observed in some scenarios. An additional simulation study was conducted to evaluate the behavior of robust weights as the true parameter value moved away from the null. Results are shown in the appendix and illustrate the trend for increased bias as ψ moves further from 0. The behavior of GSMM and logistic regression for estimating posttreatment effect modification under non-compliance is displayed in Tables 3 and 4, with Table 3 featuring ignorable noncompliance and Table 4 demonstrating nonignorable non-compliance. The intent-to-treat style analysis using logistic regression demonstrated considerably greater bias than in the case of full compliance investigated in scenario 1, with at least 28% bias observed in all settings, and up to nearly 200% bias when treatment strongly predicted the intermediate variable. When the intermediate was not affected by treatment or associated with the outcome, estimates were generally attenuated compared to the true value for both ignorable and non-ignorable noncompliance. Moderate bias (9-18%) was observed for the GSMM under 19

20 weak prediction of treatment for the intermediate variable. Additional simulations showed that this bias is removed by considering larger sample sizes or by increasing the predictiveness of baseline covariates for the intermediate variable. Results of these additional simulations are shown in the appendix. The as-treated analytic approach was unbiased and more efficient than the GSMM under ignorable non-compliance when the intermediate was not affected by treatment or not associated with the outcome but exhibited substantial bias under all scenarios considering non-ignorable missingness. 6 Data Analysis We now turn to the analysis of the data from the JOBS II study. We restrict the analysis to the subset of the subjects for which depression levels and the re-employment outcome are fully observed at all follow up periods. We compare the GSMM estimates to estimates from logistic regression. In all cases, we condition on a large set of pre-treatment covariates that were measured in the JOBS II study. We use pre-treatment covariates to specify the association model, which models the observed outcomes. The pre-treatment covariates include binary indicators for seven categories of occupation type, sex, marital status, whether the subject was nonwhite, years of education, income, age, a measure of financial strain, and depression at baseline. We use the same covariates in the specification of the logistic regression model. In the JOBS II study, depression levels were measured six weeks and six months after subjects completed the training sessions which comprised the intervention. We conduct separate analyses for the two intermediate follow-up periods. In the first analysis, the causal effect of being exposed to the treatment is modified by depression levels at six weeks, and in the second analysis the effect of the intervention is modified by depression levels at six months. The two separate analyses allow us to understand whether the magnitude of effect modification varies over time. We re-scaled the depression scale to range from 0 to 100, which aids interpretation. We found that model convergence was somewhat sensitive to specification of the association model. In particular, we found that when we failed to 20

21 condition on depressive symptoms at baseline estimates either became so large as to signal a lack of convergence or convergence failed outright. Richer specifications also did little to aid precision of the model estimates. Table 5 contains estimates for the two causal parameters, ψ 1 and ψ 2, under two-stage G- estimation and logistic regression. The GSMM causal estimates (robust standard errors are in parenthesis with p-values appearing below) for the ψ 2 parameter: (0.05) at the six week follow-up and (0.04) at the six month follow-up. Estimates of the ψ 2 parameter from logistic regression are much smaller in comparison: (0.008) at six weeks and (0.007) at six months. The bias we observe in the simulations when logistic regression is applied appears to be present in this application as well. The separate parameter estimates in Table 5 do not readily convey the conditional nature of the causal effect implied by the model. We next explore in more detail how posttreatment levels of depression modify the effect of the JOBS II intervention. Here, we use the measure of depressive symptoms from the six week follow-up with the parameter estimates from GSMM. We calculate the causal odds ratio and associated confidence interval for the intervention conditional on three levels of the depression scale. Recall that if we ignore treatment effect modification, the estimated causal odds ratio for compliers is 1.73, which suggests the intervention was effective at increasing employment. How does this estimate change if we consider effect modification by depression at six weeks? In the sample, approximately ten percent of subjects recorded no depressive symptoms. The estimated causal odds ratio for subjects with a score of zero is 4.26 with an associated 95% confidence interval of (1.01, 18.06). While one is just excluded from the confidence interval, the width of this interval is quite long, which reflects the fact that the support in the data is relatively low at that level of depression. Next we calculate the causal odds ratio for subjects with a score of seven on the depression scale, which represents the 25th percentile. The causal odds ratio is nearly 50% smaller at 2.84 with a corresponding 95% confidence interval (1.22, 6.58). When depressive symptoms increase to a score of 16, the median of the depression scale, the causal odds ratio 21

22 decreases further to 1.68 with 95% confidence interval (0.96, 2.96). The magnitude of the treatment effect is further reduced to the point that it appears to be completely without effect for those with higher levels of depression at six weeks. We plot the pattern of effect modification in Figure 2. In Figure 2, we also include estimates and corresponding confidence intervals for two levels of depression above the median. Here, the point estimate falls below 1. Given the sensitivity of estimates to association model specification, we caution against using these estimates as conclusive. The estimates do suggest that in the future, an intervention might be developed or applied to target subjects with higher depression levels observed at 6 weeks. Given the observed pattern of effect modification, investigators may decide that an additional intervention is necessary for subjects with greater than median levels of depression. Our extension of SMMs to the estimation of posttreatment effect modification allows for the tailoring of future interventions when it appears that the treatment is performing poorly among some subpopulations. Under the identification assumption, we must assume that the effect of exposure to the training intervention did not vary across levels of baseline covariates. For example, this implies that the effect of the intervention among compliers did not vary across the seven occupation types measured at baseline. Whether such an assumption is realistic depends largely on subject matter knowledge. While such an assumption might appear implausible, the original study did not place any weight on the possibility of interactions between baseline covariates and exposure to the intervention. In fact, the set of baseline covariates is rich enough that there are more possible sub-populations with varying treatment effects than could be fully specified. As such, the original analysis implicitly invoked the restriction needed for identifiability in our analysis. 7 Discussion We have demonstrated using GSMMs to estimate causal effects that may be modified by posttreatment variables. Our work complements existing literature on noncompliance and 22

23 mediation where conditioning occurs on posttreatment variables. It is natural to contrast the model here with these other causal models that condition on posttreatment quantities. One natural comparison is to casual mediation analysis. It would appear that the analysis we have proposed differs substantially from the purpose of a causal mediation analysis. In mediation, the goal is to decompose a treatment effect into direct and indirect components (Imai et al. 2010). The indirect treatment effect is an effect mediated by a third variable which transmits the treatment effect to the outcome. Mediation effects were of key interest in the original analysis (Vinokur et al. 1995). For example, in the original study it was thought that exposure to the training intervention had a direct effect on re-employment but also had an indirect effect through a measure of self-confidence. In contrast, we stipulate only a total effect of the treatment that is conditional on levels of S. The analysis here also differs from mediation in that the causal contrast here involves only the effect of the treatment and does not attempt to parse out the separate pathways by which treatment influences the outcome or the role of the mediator in those pathways, whereas mediation analysis attempts to do those things. The mediation analysis is, in principle, more ambitious, and, thus, potentially less subject to testing. To see this, consider a (structural nested) model for a mediation analysis in which the effect of the treatment and the mediator are both modeled conditional on levels of the treatment A and the mediator S. By modeling the effect of treatment and the mediator jointly conditional on the same things, we introduce even more parameters into our problem. Under the assumption (often warranted in studies where the main intervention is randomly assigned but the mediator is not) that initial randomization holds but sequential ignorability does not, we have the same number of identifying restrictions. In such circumstances, the models of joint effects involved in mediation analysis will be even less subject to testing than our models of effect modification. Our analysis has focused on a binary treatment. For treatments with more than two levels, the analysis may be extended by fitting a separate association model for each level of treatment, and defining H i (ψ) for each subject by subtracting off the parametrized effect of 23

24 the subjects observed treatment according to the proposed structural model. Restrictions on the effects of various treatment levels may be enforced through the parametrization of ψ in the blip down function from each treatment arm. One weakness of this approach is its dependence on the specification of the association model. Although robust weights provide valid testing in the absence of treatment effects, estimation of treatment effects may be subject to bias under the alternative. Moreover, in data analysis, nonconvergence was observed when baseline depression, a covariate that was highly predictive of the intermediate variable 6-week or 6-month depression, was omitted from the auxiliary model. The implication of this for practitioners is that model fitting of the association model should be completed carefully, with careful attention to functional form and the potential presence of interaction. Additional methodology to enhance robustness is one potential area for further research. A second drawback is the increased variability of estimates determined through G- estimation versus standard logistic regression. This additional variability may be explained by a number of factors, including the semiparametric versus parametric nature of the estimator, as well as the two-stage nature of the estimation strategy. Variance of G-estimation estimates are reduced when baseline covariates contain strong predictors of intermediate variables. For studies using these types of analyses, it is therefore beneficial to record the baseline value of potential intermediate modifying variables and other baseline correlates of intermediates when intending to use this methodology. In this context, efficiency is secondary to bias, and G-estimation yields unbiased estimates when intermediates are affected by treatment and predict the outcome, whereas logistic regression does not. Unless it is known that intermediate variables are not affected by treatment, bias considerations support the use of the logistic G-estimation despite its decreased efficiency compared to logistic regression. 24

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016 An Introduction to Causal Mediation Analysis Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016 1 Causality In the applications of statistics, many central questions

More information

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Overview In observational and experimental studies, the goal may be to estimate the effect

More information

Comparison of Three Approaches to Causal Mediation Analysis. Donna L. Coffman David P. MacKinnon Yeying Zhu Debashis Ghosh

Comparison of Three Approaches to Causal Mediation Analysis. Donna L. Coffman David P. MacKinnon Yeying Zhu Debashis Ghosh Comparison of Three Approaches to Causal Mediation Analysis Donna L. Coffman David P. MacKinnon Yeying Zhu Debashis Ghosh Introduction Mediation defined using the potential outcomes framework natural effects

More information

Casual Mediation Analysis

Casual Mediation Analysis Casual Mediation Analysis Tyler J. VanderWeele, Ph.D. Upcoming Seminar: April 21-22, 2017, Philadelphia, Pennsylvania OXFORD UNIVERSITY PRESS Explanation in Causal Inference Methods for Mediation and Interaction

More information

Causal Effect Estimation Under Linear and Log- Linear Structural Nested Mean Models in the Presence of Unmeasured Confounding

Causal Effect Estimation Under Linear and Log- Linear Structural Nested Mean Models in the Presence of Unmeasured Confounding University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Summer 8-13-2010 Causal Effect Estimation Under Linear and Log- Linear Structural Nested Mean Models in the Presence of

More information

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Kosuke Imai Department of Politics Princeton University July 31 2007 Kosuke Imai (Princeton University) Nonignorable

More information

WORKSHOP ON PRINCIPAL STRATIFICATION STANFORD UNIVERSITY, Luke W. Miratrix (Harvard University) Lindsay C. Page (University of Pittsburgh)

WORKSHOP ON PRINCIPAL STRATIFICATION STANFORD UNIVERSITY, Luke W. Miratrix (Harvard University) Lindsay C. Page (University of Pittsburgh) WORKSHOP ON PRINCIPAL STRATIFICATION STANFORD UNIVERSITY, 2016 Luke W. Miratrix (Harvard University) Lindsay C. Page (University of Pittsburgh) Our team! 2 Avi Feller (Berkeley) Jane Furey (Abt Associates)

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work

More information

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects Guanglei Hong University of Chicago, 5736 S. Woodlawn Ave., Chicago, IL 60637 Abstract Decomposing a total causal

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula.

Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula. FACULTY OF PSYCHOLOGY AND EDUCATIONAL SCIENCES Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula. Modern Modeling Methods (M 3 ) Conference Beatrijs Moerkerke

More information

arxiv: v1 [stat.me] 8 Jun 2016

arxiv: v1 [stat.me] 8 Jun 2016 Principal Score Methods: Assumptions and Extensions Avi Feller UC Berkeley Fabrizia Mealli Università di Firenze Luke Miratrix Harvard GSE arxiv:1606.02682v1 [stat.me] 8 Jun 2016 June 9, 2016 Abstract

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility

Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility Stephen Burgess Department of Public Health & Primary Care, University of Cambridge September 6, 014 Short title:

More information

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS Donald B. Rubin Harvard University 1 Oxford Street, 7th Floor Cambridge, MA 02138 USA Tel: 617-495-5496; Fax: 617-496-8057 email: rubin@stat.harvard.edu

More information

Potential Outcomes Model (POM)

Potential Outcomes Model (POM) Potential Outcomes Model (POM) Relationship Between Counterfactual States Causality Empirical Strategies in Labor Economics, Angrist Krueger (1999): The most challenging empirical questions in economics

More information

Causal Mechanisms Short Course Part II:

Causal Mechanisms Short Course Part II: Causal Mechanisms Short Course Part II: Analyzing Mechanisms with Experimental and Observational Data Teppei Yamamoto Massachusetts Institute of Technology March 24, 2012 Frontiers in the Analysis of Causal

More information

Propensity Score Methods for Causal Inference

Propensity Score Methods for Causal Inference John Pura BIOS790 October 2, 2015 Causal inference Philosophical problem, statistical solution Important in various disciplines (e.g. Koch s postulates, Bradford Hill criteria, Granger causality) Good

More information

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census

More information

Growth Mixture Modeling and Causal Inference. Booil Jo Stanford University

Growth Mixture Modeling and Causal Inference. Booil Jo Stanford University Growth Mixture Modeling and Causal Inference Booil Jo Stanford University booil@stanford.edu Conference on Advances in Longitudinal Methods inthe Socialand and Behavioral Sciences June 17 18, 2010 Center

More information

Investigating mediation when counterfactuals are not metaphysical: Does sunlight exposure mediate the effect of eye-glasses on cataracts?

Investigating mediation when counterfactuals are not metaphysical: Does sunlight exposure mediate the effect of eye-glasses on cataracts? Investigating mediation when counterfactuals are not metaphysical: Does sunlight exposure mediate the effect of eye-glasses on cataracts? Brian Egleston Fox Chase Cancer Center Collaborators: Daniel Scharfstein,

More information

Sensitivity checks for the local average treatment effect

Sensitivity checks for the local average treatment effect Sensitivity checks for the local average treatment effect Martin Huber March 13, 2014 University of St. Gallen, Dept. of Economics Abstract: The nonparametric identification of the local average treatment

More information

Selection on Observables: Propensity Score Matching.

Selection on Observables: Propensity Score Matching. Selection on Observables: Propensity Score Matching. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 24/10/2017 I. Brunetti Labour Economics in an European Perspective 24/10/2017

More information

arxiv: v1 [stat.me] 15 May 2011

arxiv: v1 [stat.me] 15 May 2011 Working Paper Propensity Score Analysis with Matching Weights Liang Li, Ph.D. arxiv:1105.2917v1 [stat.me] 15 May 2011 Associate Staff of Biostatistics Department of Quantitative Health Sciences, Cleveland

More information

Causal Mediation Analysis in R. Quantitative Methodology and Causal Mechanisms

Causal Mediation Analysis in R. Quantitative Methodology and Causal Mechanisms Causal Mediation Analysis in R Kosuke Imai Princeton University June 18, 2009 Joint work with Luke Keele (Ohio State) Dustin Tingley and Teppei Yamamoto (Princeton) Kosuke Imai (Princeton) Causal Mediation

More information

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification Todd MacKenzie, PhD Collaborators A. James O Malley Tor Tosteson Therese Stukel 2 Overview 1. Instrumental variable

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Discussion of Identifiability and Estimation of Causal Effects in Randomized. Trials with Noncompliance and Completely Non-ignorable Missing Data

Discussion of Identifiability and Estimation of Causal Effects in Randomized. Trials with Noncompliance and Completely Non-ignorable Missing Data Biometrics 000, 000 000 DOI: 000 000 0000 Discussion of Identifiability and Estimation of Causal Effects in Randomized Trials with Noncompliance and Completely Non-ignorable Missing Data Dylan S. Small

More information

The propensity score with continuous treatments

The propensity score with continuous treatments 7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.

More information

Estimating direct effects in cohort and case-control studies

Estimating direct effects in cohort and case-control studies Estimating direct effects in cohort and case-control studies, Ghent University Direct effects Introduction Motivation The problem of standard approaches Controlled direct effect models In many research

More information

7 Sensitivity Analysis

7 Sensitivity Analysis 7 Sensitivity Analysis A recurrent theme underlying methodology for analysis in the presence of missing data is the need to make assumptions that cannot be verified based on the observed data. If the assumption

More information

Variable selection and machine learning methods in causal inference

Variable selection and machine learning methods in causal inference Variable selection and machine learning methods in causal inference Debashis Ghosh Department of Biostatistics and Informatics Colorado School of Public Health Joint work with Yeying Zhu, University of

More information

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Kosuke Imai Department of Politics Princeton University November 13, 2013 So far, we have essentially assumed

More information

Identification Analysis for Randomized Experiments with Noncompliance and Truncation-by-Death

Identification Analysis for Randomized Experiments with Noncompliance and Truncation-by-Death Identification Analysis for Randomized Experiments with Noncompliance and Truncation-by-Death Kosuke Imai First Draft: January 19, 2007 This Draft: August 24, 2007 Abstract Zhang and Rubin 2003) derives

More information

Geoffrey T. Wodtke. University of Toronto. Daniel Almirall. University of Michigan. Population Studies Center Research Report July 2015

Geoffrey T. Wodtke. University of Toronto. Daniel Almirall. University of Michigan. Population Studies Center Research Report July 2015 Estimating Heterogeneous Causal Effects with Time-Varying Treatments and Time-Varying Effect Moderators: Structural Nested Mean Models and Regression-with-Residuals Geoffrey T. Wodtke University of Toronto

More information

AGEC 661 Note Fourteen

AGEC 661 Note Fourteen AGEC 661 Note Fourteen Ximing Wu 1 Selection bias 1.1 Heckman s two-step model Consider the model in Heckman (1979) Y i = X iβ + ε i, D i = I {Z iγ + η i > 0}. For a random sample from the population,

More information

Identification and Estimation of Causal Mediation Effects with Treatment Noncompliance

Identification and Estimation of Causal Mediation Effects with Treatment Noncompliance Identification and Estimation of Causal Mediation Effects with Treatment Noncompliance Teppei Yamamoto First Draft: May 10, 2013 This Draft: March 26, 2014 Abstract Treatment noncompliance, a common problem

More information

Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability. June 19, 2017

Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability. June 19, 2017 Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability Alessandra Mattei, Federico Ricciardi and Fabrizia Mealli Department of Statistics, Computer Science, Applications, University

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Duke University Medical, Dept. of Biostatistics Joint work

More information

Modeling Log Data from an Intelligent Tutor Experiment

Modeling Log Data from an Intelligent Tutor Experiment Modeling Log Data from an Intelligent Tutor Experiment Adam Sales 1 joint work with John Pane & Asa Wilks College of Education University of Texas, Austin RAND Corporation Pittsburgh, PA & Santa Monica,

More information

Small-sample cluster-robust variance estimators for two-stage least squares models

Small-sample cluster-robust variance estimators for two-stage least squares models Small-sample cluster-robust variance estimators for two-stage least squares models ames E. Pustejovsky The University of Texas at Austin Context In randomized field trials of educational interventions,

More information

Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha

Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha January 18, 2010 A2 This appendix has six parts: 1. Proof that ab = c d

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Bounds on Causal Effects in Three-Arm Trials with Non-compliance. Jing Cheng Dylan Small

Bounds on Causal Effects in Three-Arm Trials with Non-compliance. Jing Cheng Dylan Small Bounds on Causal Effects in Three-Arm Trials with Non-compliance Jing Cheng Dylan Small Department of Biostatistics and Department of Statistics University of Pennsylvania June 20, 2005 A Three-Arm Randomized

More information

Extending causal inferences from a randomized trial to a target population

Extending causal inferences from a randomized trial to a target population Extending causal inferences from a randomized trial to a target population Issa Dahabreh Center for Evidence Synthesis in Health, Brown University issa dahabreh@brown.edu January 16, 2019 Issa Dahabreh

More information

New Developments in Nonresponse Adjustment Methods

New Developments in Nonresponse Adjustment Methods New Developments in Nonresponse Adjustment Methods Fannie Cobben January 23, 2009 1 Introduction In this paper, we describe two relatively new techniques to adjust for (unit) nonresponse bias: The sample

More information

IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL

IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL BRENDAN KLINE AND ELIE TAMER Abstract. Randomized trials (RTs) are used to learn about treatment effects. This paper

More information

arxiv: v3 [stat.me] 20 Feb 2016

arxiv: v3 [stat.me] 20 Feb 2016 Posterior Predictive p-values with Fisher Randomization Tests in Noncompliance Settings: arxiv:1511.00521v3 [stat.me] 20 Feb 2016 Test Statistics vs Discrepancy Variables Laura Forastiere 1, Fabrizia Mealli

More information

Harvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen

Harvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen Harvard University Harvard University Biostatistics Working Paper Series Year 2014 Paper 175 A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome Eric Tchetgen Tchetgen

More information

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina Lecture 9: Learning Optimal Dynamic Treatment Regimes Introduction Refresh: Dynamic Treatment Regimes (DTRs) DTRs: sequential decision rules, tailored at each stage by patients time-varying features and

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2015 Paper 334 Targeted Estimation and Inference for the Sample Average Treatment Effect Laura B. Balzer

More information

Quantitative Economics for the Evaluation of the European Policy

Quantitative Economics for the Evaluation of the European Policy Quantitative Economics for the Evaluation of the European Policy Dipartimento di Economia e Management Irene Brunetti Davide Fiaschi Angela Parenti 1 25th of September, 2017 1 ireneb@ec.unipi.it, davide.fiaschi@unipi.it,

More information

Parametric and Non-Parametric Weighting Methods for Mediation Analysis: An Application to the National Evaluation of Welfare-to-Work Strategies

Parametric and Non-Parametric Weighting Methods for Mediation Analysis: An Application to the National Evaluation of Welfare-to-Work Strategies Parametric and Non-Parametric Weighting Methods for Mediation Analysis: An Application to the National Evaluation of Welfare-to-Work Strategies Guanglei Hong, Jonah Deutsch, Heather Hill University of

More information

The compliance score as a regressor in randomized trials

The compliance score as a regressor in randomized trials Biostatistics (2003), 4, 3,pp. 327 340 Printed in Great Britain The compliance score as a regressor in randomized trials MARSHALL M. JOFFE, THOMAS R. TEN HAVE, COLLEEN BRENSINGER Department of Biostatistics

More information

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls under the restrictions of the copyright, in particular

More information

Estimating the Marginal Odds Ratio in Observational Studies

Estimating the Marginal Odds Ratio in Observational Studies Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios

More information

Journal of Biostatistics and Epidemiology

Journal of Biostatistics and Epidemiology Journal of Biostatistics and Epidemiology Methodology Marginal versus conditional causal effects Kazem Mohammad 1, Seyed Saeed Hashemi-Nazari 2, Nasrin Mansournia 3, Mohammad Ali Mansournia 1* 1 Department

More information

Ratio-of-Mediator-Probability Weighting for Causal Mediation Analysis. in the Presence of Treatment-by-Mediator Interaction

Ratio-of-Mediator-Probability Weighting for Causal Mediation Analysis. in the Presence of Treatment-by-Mediator Interaction Ratio-of-Mediator-Probability Weighting for Causal Mediation Analysis in the Presence of Treatment-by-Mediator Interaction Guanglei Hong Jonah Deutsch Heather D. Hill University of Chicago (This is a working

More information

Propensity Score Matching

Propensity Score Matching Methods James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 Methods 1 Introduction 2 3 4 Introduction Why Match? 5 Definition Methods and In

More information

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015 Introduction to causal identification Nidhiya Menon IGC Summer School, New Delhi, July 2015 Outline 1. Micro-empirical methods 2. Rubin causal model 3. More on Instrumental Variables (IV) Estimating causal

More information

Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case

Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case Maximilian Kasy Department of Economics, Harvard University 1 / 40 Agenda instrumental variables part I Origins of instrumental

More information

Introduction to Statistical Inference

Introduction to Statistical Inference Introduction to Statistical Inference Kosuke Imai Princeton University January 31, 2010 Kosuke Imai (Princeton) Introduction to Statistical Inference January 31, 2010 1 / 21 What is Statistics? Statistics

More information

Unpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies

Unpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies Unpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies Kosuke Imai Princeton University Joint work with Keele (Ohio State), Tingley (Harvard), Yamamoto (Princeton)

More information

Gov 2002: 13. Dynamic Causal Inference

Gov 2002: 13. Dynamic Causal Inference Gov 2002: 13. Dynamic Causal Inference Matthew Blackwell December 19, 2015 1 / 33 1. Time-varying treatments 2. Marginal structural models 2 / 33 1/ Time-varying treatments 3 / 33 Time-varying treatments

More information

Comparative effectiveness of dynamic treatment regimes

Comparative effectiveness of dynamic treatment regimes Comparative effectiveness of dynamic treatment regimes An application of the parametric g- formula Miguel Hernán Departments of Epidemiology and Biostatistics Harvard School of Public Health www.hsph.harvard.edu/causal

More information

Bootstrapping Sensitivity Analysis

Bootstrapping Sensitivity Analysis Bootstrapping Sensitivity Analysis Qingyuan Zhao Department of Statistics, The Wharton School University of Pennsylvania May 23, 2018 @ ACIC Based on: Qingyuan Zhao, Dylan S. Small, and Bhaswar B. Bhattacharya.

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

Estimating the Dynamic Effects of a Job Training Program with M. Program with Multiple Alternatives

Estimating the Dynamic Effects of a Job Training Program with M. Program with Multiple Alternatives Estimating the Dynamic Effects of a Job Training Program with Multiple Alternatives Kai Liu 1, Antonio Dalla-Zuanna 2 1 University of Cambridge 2 Norwegian School of Economics June 19, 2018 Introduction

More information

Identification of Causal Parameters in Randomized Studies with Mediating Variables

Identification of Causal Parameters in Randomized Studies with Mediating Variables 1-1 Identification of Causal Parameters in Randomized Studies with Mediating Variables Michael E. Sobel Columbia University The material in sections 1-3 was presented at the Arizona State University Preventive

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

A Sampling of IMPACT Research:

A Sampling of IMPACT Research: A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Personalized Treatment Selection Based on Randomized Clinical Trials. Tianxi Cai Department of Biostatistics Harvard School of Public Health

Personalized Treatment Selection Based on Randomized Clinical Trials. Tianxi Cai Department of Biostatistics Harvard School of Public Health Personalized Treatment Selection Based on Randomized Clinical Trials Tianxi Cai Department of Biostatistics Harvard School of Public Health Outline Motivation A systematic approach to separating subpopulations

More information

University of Michigan School of Public Health

University of Michigan School of Public Health University of Michigan School of Public Health The University of Michigan Department of Biostatistics Working Paper Series Year 003 Paper Weighting Adustments for Unit Nonresponse with Multiple Outcome

More information

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded 1 Background Latent confounder is common in social and behavioral science in which most of cases the selection mechanism

More information

Ignoring the matching variables in cohort studies - when is it valid, and why?

Ignoring the matching variables in cohort studies - when is it valid, and why? Ignoring the matching variables in cohort studies - when is it valid, and why? Arvid Sjölander Abstract In observational studies of the effect of an exposure on an outcome, the exposure-outcome association

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2011 Paper 288 Targeted Maximum Likelihood Estimation of Natural Direct Effect Wenjing Zheng Mark J.

More information

BIOS 6649: Handout Exercise Solution

BIOS 6649: Handout Exercise Solution BIOS 6649: Handout Exercise Solution NOTE: I encourage you to work together, but the work you submit must be your own. Any plagiarism will result in loss of all marks. This assignment is based on weight-loss

More information

Survivor Bias and Effect Heterogeneity

Survivor Bias and Effect Heterogeneity Survivor Bias and Effect Heterogeneity Anton Strezhnev Draft April 27, 207 Abstract In multi-period studies where an outcome is observed at some initial point in time and at some later follow-up wave,

More information

Causal Inference for Mediation Effects

Causal Inference for Mediation Effects Causal Inference for Mediation Effects by Jing Zhang B.S., University of Science and Technology of China, 2006 M.S., Brown University, 2008 A Dissertation Submitted in Partial Fulfillment of the Requirements

More information

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis.

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis. Estimating the Mean Response of Treatment Duration Regimes in an Observational Study Anastasios A. Tsiatis http://www.stat.ncsu.edu/ tsiatis/ Introduction to Dynamic Treatment Regimes 1 Outline Description

More information

The Impact of Measurement Error on Propensity Score Analysis: An Empirical Investigation of Fallible Covariates

The Impact of Measurement Error on Propensity Score Analysis: An Empirical Investigation of Fallible Covariates The Impact of Measurement Error on Propensity Score Analysis: An Empirical Investigation of Fallible Covariates Eun Sook Kim, Patricia Rodríguez de Gil, Jeffrey D. Kromrey, Rheta E. Lanehart, Aarti Bellara,

More information

Lecture 2: Constant Treatment Strategies. Donglin Zeng, Department of Biostatistics, University of North Carolina

Lecture 2: Constant Treatment Strategies. Donglin Zeng, Department of Biostatistics, University of North Carolina Lecture 2: Constant Treatment Strategies Introduction Motivation We will focus on evaluating constant treatment strategies in this lecture. We will discuss using randomized or observational study for these

More information

Non-parametric Mediation Analysis for direct effect with categorial outcomes

Non-parametric Mediation Analysis for direct effect with categorial outcomes Non-parametric Mediation Analysis for direct effect with categorial outcomes JM GALHARRET, A. PHILIPPE, P ROCHET July 3, 2018 1 Introduction Within the human sciences, mediation designates a particular

More information

Simple Sensitivity Analysis for Differential Measurement Error. By Tyler J. VanderWeele and Yige Li Harvard University, Cambridge, MA, U.S.A.

Simple Sensitivity Analysis for Differential Measurement Error. By Tyler J. VanderWeele and Yige Li Harvard University, Cambridge, MA, U.S.A. Simple Sensitivity Analysis for Differential Measurement Error By Tyler J. VanderWeele and Yige Li Harvard University, Cambridge, MA, U.S.A. Abstract Simple sensitivity analysis results are given for differential

More information

Gov 2002: 4. Observational Studies and Confounding

Gov 2002: 4. Observational Studies and Confounding Gov 2002: 4. Observational Studies and Confounding Matthew Blackwell September 10, 2015 Where are we? Where are we going? Last two weeks: randomized experiments. From here on: observational studies. What

More information

Difference-in-Differences Methods

Difference-in-Differences Methods Difference-in-Differences Methods Teppei Yamamoto Keio University Introduction to Causal Inference Spring 2016 1 Introduction: A Motivating Example 2 Identification 3 Estimation and Inference 4 Diagnostics

More information

SEQUENTIAL MULTIPLE ASSIGNMENT RANDOMIZATION TRIALS WITH ENRICHMENT (SMARTER) DESIGN

SEQUENTIAL MULTIPLE ASSIGNMENT RANDOMIZATION TRIALS WITH ENRICHMENT (SMARTER) DESIGN SEQUENTIAL MULTIPLE ASSIGNMENT RANDOMIZATION TRIALS WITH ENRICHMENT (SMARTER) DESIGN Ying Liu Division of Biostatistics, Medical College of Wisconsin Yuanjia Wang Department of Biostatistics & Psychiatry,

More information

Causal Inference with Big Data Sets

Causal Inference with Big Data Sets Causal Inference with Big Data Sets Marcelo Coca Perraillon University of Colorado AMC November 2016 1 / 1 Outlone Outline Big data Causal inference in economics and statistics Regression discontinuity

More information

Discussion of Papers on the Extensions of Propensity Score

Discussion of Papers on the Extensions of Propensity Score Discussion of Papers on the Extensions of Propensity Score Kosuke Imai Princeton University August 3, 2010 Kosuke Imai (Princeton) Generalized Propensity Score 2010 JSM (Vancouver) 1 / 11 The Theme and

More information

Standardization methods have been used in epidemiology. Marginal Structural Models as a Tool for Standardization ORIGINAL ARTICLE

Standardization methods have been used in epidemiology. Marginal Structural Models as a Tool for Standardization ORIGINAL ARTICLE ORIGINAL ARTICLE Marginal Structural Models as a Tool for Standardization Tosiya Sato and Yutaka Matsuyama Abstract: In this article, we show the general relation between standardization methods and marginal

More information

Statistical Methods for Causal Mediation Analysis

Statistical Methods for Causal Mediation Analysis Statistical Methods for Causal Mediation Analysis The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. Citation Accessed Citable

More information

Covariate Balancing Propensity Score for General Treatment Regimes

Covariate Balancing Propensity Score for General Treatment Regimes Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian

More information

Targeted Maximum Likelihood Estimation in Safety Analysis

Targeted Maximum Likelihood Estimation in Safety Analysis Targeted Maximum Likelihood Estimation in Safety Analysis Sam Lendle 1 Bruce Fireman 2 Mark van der Laan 1 1 UC Berkeley 2 Kaiser Permanente ISPE Advanced Topics Session, Barcelona, August 2012 1 / 35

More information

IsoLATEing: Identifying Heterogeneous Effects of Multiple Treatments

IsoLATEing: Identifying Heterogeneous Effects of Multiple Treatments IsoLATEing: Identifying Heterogeneous Effects of Multiple Treatments Peter Hull December 2014 PRELIMINARY: Please do not cite or distribute without permission. Please see www.mit.edu/~hull/research.html

More information

Impact Evaluation of Mindspark Centres

Impact Evaluation of Mindspark Centres Impact Evaluation of Mindspark Centres March 27th, 2014 Executive Summary About Educational Initiatives and Mindspark Educational Initiatives (EI) is a prominent education organization in India with the

More information

An Introduction to Causal Analysis on Observational Data using Propensity Scores

An Introduction to Causal Analysis on Observational Data using Propensity Scores An Introduction to Causal Analysis on Observational Data using Propensity Scores Margie Rosenberg*, PhD, FSA Brian Hartman**, PhD, ASA Shannon Lane* *University of Wisconsin Madison **University of Connecticut

More information

A Comparison of Methods for Estimating the Causal Effect of a Treatment in Randomized. Clinical Trials Subject to Noncompliance.

A Comparison of Methods for Estimating the Causal Effect of a Treatment in Randomized. Clinical Trials Subject to Noncompliance. Draft June 6, 006 A Comparison of Methods for Estimating the Causal Effect of a Treatment in Randomized Clinical Trials Subject to Noncompliance Roderick Little 1, Qi Long and Xihong Lin 3 Abstract We

More information

Michael Lechner Causal Analysis RDD 2014 page 1. Lecture 7. The Regression Discontinuity Design. RDD fuzzy and sharp

Michael Lechner Causal Analysis RDD 2014 page 1. Lecture 7. The Regression Discontinuity Design. RDD fuzzy and sharp page 1 Lecture 7 The Regression Discontinuity Design fuzzy and sharp page 2 Regression Discontinuity Design () Introduction (1) The design is a quasi-experimental design with the defining characteristic

More information

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai Weighting Methods Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Weighting Methods Stat186/Gov2002 Fall 2018 1 / 13 Motivation Matching methods for improving

More information