
Estimating complex causal effects from incomplete observational data

arXiv:1403.1124v2 [stat.me] 2 Jul 2014

Juha Karvanen
Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä, Finland
juha.t.karvanen@jyu.fi

July 3, 2014

Abstract

Despite the major advances made in causal modeling, causality is still an unfamiliar topic for many statisticians. In this paper, it is demonstrated from beginning to end how causal effects can be estimated from observational data assuming that the causal structure is known. To make the problem more challenging, the causal effects are highly nonlinear and the data are missing at random. The tools used in the estimation include causal models with design, causal calculus, multiple imputation and generalized additive models. The main message is that a trained statistician can estimate causal effects by judiciously combining existing tools.

Keywords: causal estimation, data analysis, missing data, nonlinearity, structural equation model

1 Introduction

During the past two decades, major advances have been made in causal modeling. Despite the progress, many statisticians still hesitate to talk about causality, especially when only observational data are available. Caution with causality is always advisable, but it should not lead to the avoidance of the most important statistical problems of society, that is, predicting the consequences of actions, interventions and policy decisions.

The estimation of causal effects from observational data requires external information on the causal relationships. The dependencies seen in observational data may be a result of confounding instead of causality. Therefore, the observational data alone are not sufficient for the estimation of causal effects but must be accompanied by causal assumptions. It suffices to know the causal relationships qualitatively, i.e. to know whether X affects Y, Y affects X, or X and Y have no effect on each other. Obtaining this kind of information may be very difficult in some situations but can also be straightforward if the directions of the causal relationships can be easily deduced, e.g. from the laws of physics or from the temporal order of the variables.

Structural causal models offer a mathematically well-defined framework for conveying causal assumptions, and causal calculus provides a systematic way to express causal effects in terms of observational probabilities (Pearl, 1995, 2009). The benefits of this framework are numerous: First, the philosophically entangled concept of causality has a clear and practically useful definition. Second, in order to tell whether a causal effect can be identified, it is in general sufficient to specify the causal model non-parametrically, i.e. to specify only the causal structure of the variables. Third, the completeness of causal calculus has been proved (Huang and Valtorta, 2006; Bareinboim and Pearl, 2012a) and algorithms for the identification of causal effects have been derived (Tian and Pearl, 2002; Shpitser and Pearl, 2006a,b; Bareinboim and Pearl, 2012a). Fourth, the framework can be extended to deal with study design, selection bias and missing data (Karvanen, 2014; Bareinboim and Pearl, 2012b; Mohan et al., 2013).

Estimation of nonlinear causal effects from data has in general received relatively little attention in the research on structural causal models. Examples of causal estimation in the literature often use linear Gaussian models or consider causal effects of binary treatments. With linear Gaussian models, causal effects can often be identified directly in closed form using path analysis, as recently recalled by Pearl (2013). However, in practical situations, the assumption that all causal effects are linear is often too restrictive. G-methods (Hernan and Robins, 2015) and targeted maximum likelihood estimation (TMLE) (Van der Laan and Rubin, 2006; Van der Laan and Rose, 2011) are flexible frameworks for causal estimation, but the examples of their use concentrate on situations where the effect of a binary treatment is to be estimated.

This paper aims to narrow the gap between the theory of causal models and practical data analysis by presenting an easy-to-follow example of the estimation of nonlinear causal relationships. As a starting point, it is assumed that only the causal structure can be specified, i.e. the causal effects can be described qualitatively in a nonparametric form but not in a parametric form. The causal relationships are expected to have a complex nonlinear form which cannot be modeled a priori but is assumed to be continuous, differentiable and sufficiently smooth. The key idea is to first use causal calculus to find out the observational probabilities needed to calculate the causal effects and then to estimate these observational probabilities from the data using flexible models from the arsenal of modern data analysis.

The estimated causal effects cannot be expressed in closed form but can be calculated numerically for any input. The approach works if the required observational probabilities can be reliably interpolated or extrapolated from the model fitted to the data. Data that are missing at random create an additional challenge for the estimation. Multiple imputation (Rubin, 1987) of missing data works well with the idea of flexible nonparametric modeling because, in multiple imputation, an incomplete dataset is transformed into multiple complete datasets and the parameters can be estimated separately for each imputed dataset.

The estimation procedure is given in detail in Section 2. An example demonstrating the estimation of complex nonlinear causal effects is presented in Section 3. Conclusions are given in Section 4.

2 Estimation of causal effects

A straightforward approach for the estimation of causal effects is first to define the causal model explicitly in a parametric form, up to some unknown parameters, and then to use the data to estimate these parameters. For instance, in linear Gaussian models, the causal effects are specified by linear models with unknown regression coefficients that can be estimated on the basis of the sample covariance matrix calculated from the data. This approach may turn out to be difficult to implement when the functional forms of the causal relationships are complex or unknown, because it is not clear how the parametric model should be specified and how its parameters could be estimated from the data.

The estimation procedure proposed here utilizes nonparametric modeling of causal effects. It is assumed that the causal structure is known, the causal effects are complex and the data are collected by simple random sampling from the underlying population. A central concept in structural causal models is the do-operator (Pearl, 1995, 2009). The notation p(y | do(X = x)), or shortly p(y | do(x)), refers to the distribution of random variable Y when an action or intervention that sets the value of variable X to x is applied. The rules of causal calculus, also known as do-calculus, give graph-theoretical conditions under which it is allowed to insert or delete observations, to exchange actions and observations, and to insert or delete actions. These rules and standard probability calculus are applied sequentially to present a causal effect in terms of observational probabilities.

The estimation procedure can be presented as follows:

1. Specify the causal structure.
2. Expand the causal structure to describe also the study design and the missing data mechanism.

3. Check that the causal effect(s) of interest can be identified.
4. Apply causal calculus to present the causal effect(s) using observational distributions.
5. Apply multiple imputation to deal with missing data.
6. Fit flexible models for the observational distributions needed to calculate the causal effect.
7. Combine the fitted observational models and use them to predict the causal effect.

Step 1 is easy to carry out because the causal structure is assumed to be known. In Step 2, causal models with design (Karvanen, 2014) can be utilized. In Steps 3 and 4, the rules of causal calculus are applied either intuitively or systematically via the identification algorithms (Tian and Pearl, 2002; Shpitser and Pearl, 2006a,b; Bareinboim and Pearl, 2012a). In Step 5, multiple imputation is applied in the standard way. In Step 6, fitting a model to the data is a statistical problem, and data analysis methods from statistics and machine learning are available for this task. Naturally, validation is needed to avoid over-fitting. The step is repeated for all imputed datasets. In Step 7, the fitted observational models are first combined as the result derived in Step 4 indicates. This usually requires integration over some variables, which can often be carried out by summation over the actual data, as demonstrated in the example in Section 3. Finally, the estimates are combined over the imputed datasets.
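As a rough orientation, the seven steps could be arranged into an R workflow along the following lines. The function and argument names are placeholders introduced here for illustration only and do not refer to any existing package; each step is implemented concretely in Section 3.

# Placeholder outline of the seven-step procedure (illustration, not analysis code).
estimate_causal_effect <- function(data, causal_graph, m = 10) {
  # Steps 1-2: 'causal_graph' encodes the causal structure together with the
  #            study design and the missing data mechanism (a causal model with design).
  # Steps 3-4: check identifiability and express the causal effect as a functional
  #            of observational distributions (do-calculus or an identification algorithm).
  # Step 5:    create m multiply imputed complete datasets.
  # Step 6:    in each imputed dataset, fit flexible models (e.g. GAMs) for the
  #            observational distributions appearing in the functional.
  # Step 7:    combine the fitted models according to the identified functional
  #            and pool the estimates over the m imputations.
}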

3 Example: frontdoor adjustment for incomplete data

The example is chosen to illustrate the challenges of causal estimation. Simulated data are used because this allows comparisons with the true causal model. The causal effects are set to be strongly nonlinear, which means that they cannot be modeled with linear models. The effects of the latent variable are set so that the conditional distributions in the observational data strongly differ from the true unconfounded causal effect. The setup is further complicated by missing data. The aim is to estimate the average causal effect E(Y | do(X = x)) from data generated with the following structural equations:

U ~ N(0, 1),
X̃ ~ Unif(-2, 2),
X = X̃ + U,
Z = 4 φ(X) + ε_Z,    ε_Z ~ N(0, 0.1^2),
Y = φ(Z - 0.5) + 0.3 Z - 0.1 U,

where the nonlinear structural relationships are defined using the density function of the standard normal distribution

φ(x) = (1 / √(2π)) exp(-x^2 / 2).    (1)

The corresponding causal structure is presented in Figure 1. The causal effect of X on Y is mediated through variable Z. In addition, there is an unobserved confounder U which has a causal effect on both X and Y but not on Z. Variables X̃ and ε_Z in the structural equations are included only for notational convenience and are not presented in the graph.

Figure 1: Causal structure for the example. A solid circle indicates that the variable is observed and an open circle indicates that the variable is unobserved.

Pearl (2009) considers the same causal structure with the interpretation that X represents smoking, Z represents the tar deposits in the lungs and Y represents the risk of lung cancer. Differently from Pearl, it is assumed here that all variables are continuous.
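For concreteness, the data-generating mechanism can be reproduced with a few lines of base R. This is a sketch written for this presentation, not code from the paper; the seed is arbitrary and the sample size n = 20000 is the one quoted later in the text.

set.seed(2014)                       # arbitrary seed (assumption)
n     <- 20000
u     <- rnorm(n)                    # unobserved confounder U
x_aux <- runif(n, -2, 2)             # auxiliary variable, not shown in the graph
x     <- x_aux + u
z     <- 4 * dnorm(x) + rnorm(n, sd = 0.1)
y     <- dnorm(z - 0.5) + 0.3 * z - 0.1 * u

# True average causal effect E(Y | do(X = x0)) by simulating the intervention:
# do(X = x0) removes the effect of U on X but leaves its effect on Y intact.
ace_true <- function(x0, m = 1e5) {
  z0 <- 4 * dnorm(x0) + rnorm(m, sd = 0.1)
  mean(dnorm(z0 - 0.5) + 0.3 * z0 - 0.1 * rnorm(m))
}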

The observed data are denoted by variables X*, Z* and Y*. Variables X and Z are incompletely observed, which can be formally defined as

X* = X if M_X = 1, and X* = NA if M_X = 0,
Z* = Z if M_Z = 1, and Z* = NA if M_Z = 0,

where M_X and M_Z are missingness indicators. The response Y is completely observed for the sample. The indicators M_X and M_Z are binary random variables that depend on the other variables as follows:

P(M_X = 1 | do(Y = y)) = Φ(2 - y),    (2)
P(M_Z = 1 | do(Y = y)) = Φ(4y - 1),   (3)

where Φ stands for the cumulative distribution function of the standard normal distribution. The value of X is more likely to be missing when Y is large, and the value of Z is more likely to be missing when Y is small.

A causal model with design that combines the causal model and the missing data mechanism is presented in Figure 2.

Figure 2: Causal model with design for the example studied.

The causal model is the same as in the graph in Figure 1. Subscript i indexes the individuals in a finite, well-defined, closed population Ω = {1, ..., N}. Variable m_Ωi represents an indicator of population membership and is defined as m_Ωi = 1 if i ∈ Ω and m_Ωi = 0 if i ∉ Ω. The data are collected for a sample from population Ω. Indicator variable m_1i has value 1 if individual i is selected to the sample and 0 otherwise. Variables X*_i, Z*_i and Y*_i are recorded and are therefore marked with a solid circle. The underlying variables in the population, X_i, Z_i and Y_i, are not observed and are therefore marked with an open circle. The value of the response Y*_i is observed for the whole sample. The value of the covariate X*_i is observed when M_Xi = 1, where M_Xi depends on Y_i as indicated in equation (2). The value of the mediator Z*_i is observed when M_Zi = 1, where M_Zi depends on Y_i as indicated in equation (3). The input available for the estimation contains only the graph in Figure 2 and the data (x*_i, z*_i, y_i), i = 1, ..., n = 20000. In the data, the value of x*_i is missing in 6% of cases, the value of z*_i is missing in 26% of cases, and both x*_i and z*_i are missing in 1.5% of cases.
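Continuing the simulation sketch above (again an illustration added here, not the paper's code), the missingness mechanism (2)-(3) and the resulting incomplete dataset can be generated as follows:

# M = 1 means the value is observed, M = 0 that it is missing.
m_x <- rbinom(n, 1, pnorm(2 - y))      # equation (2): X missing more often when Y is large
m_z <- rbinom(n, 1, pnorm(4 * y - 1))  # equation (3): Z missing more often when Y is small
x_star <- ifelse(m_x == 1, x, NA)      # observed X*
z_star <- ifelse(m_z == 1, z, NA)      # observed Z*
dat <- data.frame(x = x_star, z = z_star, y = y)
colMeans(is.na(dat))                   # should roughly match the 6% and 26% quoted above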

The joint distribution of Z, X and Y is illustrated in the scatterplots in Figure 3. The most discernible feature of the joint distribution is the strong nonlinear association between the covariate X and the mediator Z. Both X and Z are also associated with the response Y. Figure 4 shows the true causal effect of interest together with the observational data. From the observational distribution p(y | x), one might misjudge that Y decreases as a function of X, but in reality the observed decreasing trend is due to the unobserved confounder U. A particular challenge is that certain combinations of the variable values are not present in the data. For instance, if the value of X is set to 3, the average response is around 0.35, but in the data there are no observations even close to this point.

It is possible to identify the average causal effect E(Y | do(X = x)) from observational data because in the causal structure there exists no bi-directed path (i.e. a path where each edge starts from or ends at a latent node) between X and any of its children (Tian and Pearl, 2002; Pearl, 2009). Here, Z is the only child of X and there are no edges between Z and U. Applying the rules of causal calculus (Pearl, 2009), the causal effect of X on Y can be expressed as

p(y | do(X = x)) = ∫ p(z | X = x) ∫ p(y | X = x', Z = z) p(X = x') dx' dz    (4)

and the average causal effect is obtained as

E(Y | do(X = x)) = ∫ p(z | X = x) ∫ E(Y | X = x', Z = z) p(X = x') dx' dz.    (5)

This result is known as the frontdoor adjustment (Pearl, 1995). The derivation is presented in detail in Pearl (2009, pp. 81–82). Equation (4) means that the problem of estimating the causal effect E(Y | do(X = x)) can be reduced to the problem of estimating the observational distributions p(z | X = x), p(y | X = x, Z = z) and p(X = x').
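As a side note not used in the original analysis, the identification step can also be automated. A possible sketch with the R package causaleffect (Tikka and Karvanen), which implements the identification algorithm of Shpitser and Pearl, is shown below; the latent confounder U is encoded as a bidirected edge between x and y.

library(igraph)
library(causaleffect)
# Graph of Figure 1: x -> z -> y, plus a bidirected edge x <-> y due to U.
g <- graph.formula(x -+ z, z -+ y, x -+ y, y -+ x, simplify = FALSE)
# Mark the last two directed edges as the single bidirected edge x <-> y.
g <- set.edge.attribute(g, name = "description", index = c(3, 4), value = "U")
causal.effect(y = "y", x = "x", G = g, expr = TRUE)
# Should return the frontdoor formula, i.e. equation (4) in summation notation.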

Figure 3: Scatterplot matrix of a subsample of 500 observations.

Figure 4: Average causal effect compared to the average conditional mean. In the upper panel, the true average causal effect (solid line) is presented together with observations (a subsample of 500 observations) and their conditional average (dotted line). In the lower panel, the same curves are presented with data generated from a hypothetical experiment where the values of X are controlled and eight values of the response Y are generated for each x.

Before proceeding with the estimation, the problem of missing data must be addressed. Complete case analysis is expected to lead to biased results because the missingness of X and Z depends on the response Y. Using the d-separation criterion (Pearl, 2009, pp. 16–17), it can be concluded from the causal model with design in Figure 2 that, conditionally on Y, the missingness is independent of the missing value: M_X ⊥ X | Y and M_Z ⊥ Z | Y. As variable Y is fully observed in the sample, the data are missing at random and multiple imputation can be applied. Multiple imputation is not the only way to approach the missing data problem, but it is easy to implement together with nonparametric estimation of causal effects. Due to the nonlinearity of the dependencies, special attention has to be paid to the fit of the imputation model. The MICE algorithm (Van Buuren, 2012) implemented in the R package mice (Van Buuren and Groothuis-Oudshoorn, 2011) is applied in the imputation. The default method of the package, predictive mean matching with linear prediction models, fails because, for each value of Z, the conditional distribution of X is bimodal. As a solution, the imputation model uses |X| and sign(X) instead of X, and generalized additive models (GAM) (Hastie and Tibshirani, 1990) are applied as the prediction model in predictive mean matching. The fit of the imputations is checked by visually comparing the observed and imputed values.

The frontdoor adjustment (4) leads to the estimators

p̂(y | do(X = x)) = (1/n) Σ_{i=1}^{n} p̂(y | X = x_i, Z = z_i),    (6)

Ê(Y | do(X = x)) = (1/n) Σ_{i=1}^{n} Ê(Y | X = x_i, Z = z_i),    (7)

where z_i is a random variable generated from the estimated distribution p̂(Z | X = x). The distributions p̂(Z | X = x) and p̂(Y | X = x, Z = z) are modeled by GAMs with smoothing splines (Wood, 2006). The estimation is carried out using the R package mgcv (Wood, 2014), which includes automatic smoothness selection. The estimation is repeated for ten imputed datasets, and a complete case analysis is carried out as a benchmark. The random variables from the estimated distributions are generated by adding resampled residuals to the estimated expected values.

The estimated models have a good fit with the observational data. The estimated causal effects are presented in Figure 5. Both the average causal effect E(Y | do(X = x)) and the causal distribution p(y | do(X = x)) were estimated without significant bias when x ∈ (-2, 2) and multiple imputation was applied. Complete case analysis systematically over-estimated the average causal effect, which demonstrates the importance of proper handling of missing data. Due to the relatively large size of the data, sampling variability was not an issue here. It can be concluded that the chosen nonparametric strategy was successful in the estimation of the average causal effects.
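A condensed version of this analysis for the simulated data above could look as follows. This is a simplified stand-in written for this presentation rather than the implementation used in the paper: the imputation relies on mice's default predictive mean matching applied to |X| and sign(X) instead of the GAM-based matching described above, and the GAM formulas are assumptions.

library(mice)
library(mgcv)

# Impute |X| and sign(X) rather than X itself, because X given Z is bimodal.
dat$absx <- abs(dat$x)
dat$sgnx <- sign(dat$x)
imp <- mice(dat[, c("absx", "sgnx", "z", "y")], m = 10, printFlag = FALSE)

# Frontdoor estimate of E(Y | do(X = x0)) from one completed dataset, following
# estimator (7): draw Z from the fitted p(Z | X = x0) by adding resampled
# residuals to the predicted mean, then average the predictions of
# E(Y | X = x_i, Z = z_i) over the sample.
ace_hat <- function(d, x0) {
  fit_zx  <- gam(z ~ s(x), data = d)
  fit_yxz <- gam(y ~ s(x) + s(z), data = d)
  z_draw  <- as.numeric(predict(fit_zx, newdata = data.frame(x = x0))) +
    sample(residuals(fit_zx), nrow(d), replace = TRUE)
  mean(predict(fit_yxz, newdata = data.frame(x = d$x, z = z_draw)))
}

x_grid <- seq(-2, 2, by = 0.25)
est <- sapply(seq_len(imp$m), function(i) {
  d   <- complete(imp, i)
  d$x <- d$absx * d$sgnx           # reconstruct X from the imputed components
  sapply(x_grid, function(x0) ace_hat(d, x0))
})
ace_mi <- rowMeans(est)            # estimates pooled over the imputed datasets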

Figure 5: Estimated causal effects compared to the true causal effect. The upper panel shows the average causal effect estimated using multiply imputed data (MI) or complete cases only (CC). The lower panel shows the true and the estimated 5% and 95% percentiles of the causal distribution p(y | do(X = x)).

4 Conclusions

It was shown how causal effects can be estimated by using causal models with design, causal calculus, multiple imputation and generalized additive models. It was demonstrated that the estimation of causal effects does not necessarily require the causal model to be specified parametrically; it suffices to model the observational probability distributions. The causal structure can often be specified even when it is very difficult to forge a suitable parametric model for the causal effects. The data can be modeled with flexible models for which ready-made software implementations exist. The key message can be summarized in the form of the equation

Causal estimation = causal calculus + data analysis.

The applied nonparametric approach requires that the causal effects are sufficiently smooth to be estimated from the data. In the example, the variables were continuous, but the approach works similarly for discrete variables. Large datasets may be required if the causal effects are complicated, the noise level is high or there are unobserved confounders. If the data are missing not at random, external information on the missing data mechanism is needed. The interpretation of the causal estimates requires care. The results should be evaluated with respect to the existing scientific knowledge before any conclusions can be made. The presented example hopefully encourages statisticians to target causal problems. Tools for causal estimation exist and are not too difficult to use.

References

Bareinboim, E. and Pearl, J. (2012a). Causal inference by surrogate experiments: z-identifiability. In de Freitas, N. and Murphy, K., editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, pages 113–120. AUAI Press.

Bareinboim, E. and Pearl, J. (2012b). Controlling selection bias in causal inference. In JMLR Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS), volume 22, pages 100–108.

Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. CRC Press.

Hernan, M. A. and Robins, J. M. (2015). Causal Inference. Chapman & Hall/CRC.

Huang, Y. and Valtorta, M. (2006). Pearl's calculus of intervention is complete. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, pages 217–224. AUAI Press.

Karvanen, J. (2014). Study design in causal models. arXiv:1211.2958.

Mohan, K., Pearl, J., and Tian, J. (2013). Graphical models for inference with missing data. In Proceedings of the Neural Information Processing Systems Conference (NIPS 2013).

Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4):669–710.

Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press, second edition.

Pearl, J. (2013). Linear models: a useful microscope for causal analysis. Journal of Causal Inference, 1(1):155–170.

Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York.

Shpitser, I. and Pearl, J. (2006a). Identification of conditional interventional distributions. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI 2006), pages 437–444. AUAI Press.

Shpitser, I. and Pearl, J. (2006b). Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, pages 1219–1226. AAAI Press.

Tian, J. and Pearl, J. (2002). A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 567–573. AAAI Press/The MIT Press.

Van Buuren, S. (2012). Flexible Imputation of Missing Data. CRC Press.

Van Buuren, S. and Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3):1–67.

Van der Laan, M. and Rose, S. (2011). Targeted Learning: Causal Inference for Observational and Experimental Data. Springer.

Van der Laan, M. and Rubin, D. B. (2006). Targeted maximum likelihood learning. International Journal of Biostatistics, 2:Article 11.

Wood, S. (2006). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC.

Wood, S. (2014). mgcv: Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness estimation and GAMMs by REML/PQL. R package version 1.7-29.