A Decision Theoretic Approach to Causality

Vanessa Didelez, School of Mathematics, University of Bristol (based on joint work with Philip Dawid)
Bordeaux, June 2011

Based on:
- Dawid & Didelez (2010). Identifying the consequences of dynamic treatment strategies: A decision theoretic overview. Statistics Surveys, 4, 184-231.
- Didelez & Dawid (2008). Identifying optimal sequential decisions. 24th UAI, 113-120.
- Didelez, Geneletti & Dawid (2006). Direct and indirect effects of sequential treatments. 22nd UAI, 138-146.
(more references with abstracts, and references therein), all on http://www.maths.bris.ac.uk/ maxvd/

Overview
Part 1: Decision theoretic framework for causal inference
Part 2: Graphical models and influence diagrams
Part 3: Sequential decisions and direct effects

Part 1: Decision theoretic framework for causal inference
- Examples
- Target of inference and assumptions
- Formal frameworks: potential responses vs. decision theory

Examples
Many studies are carried out to inform e.g. public health interventions or doctors' decisions, or to give advice:
- add folic acid to flour to prevent neural tube defects?
- ban smoking in pubs to lower lung cancer risk?
- advice on breast feeding: how long is best and under what conditions?
- HIV patients: start HAART when CD4 counts fall below what threshold?
- hormone replacement therapy (HRT): beneficial or not?
We want these to be well informed, so that decisions are not wrong and interventions are not useless: distinguish association and causation!

Target of Inference & Assumptions
Some issues when using data to inform decisions:
- Target of inference: what possible decisions do we want to compare, and for what population?
- Assumptions: under what assumptions linking the data to the decision problem at hand do our methods give valid conclusions (identifiability)? And can we justify these assumptions?
We need to be clear and explicit about both target and assumptions.

Formal Frameworks
... should allow us to distinguish association and causation.
- Association: observing X predicts / is informative for Y. Events occur together more often than expected under independence. Usual conditional probability notation: $p(Y = y \mid X = x)$, i.e. the probability of Y = y given that we happen to know that X = x has occurred.
- Causation: intervening in X predicts / is informative for Y. We can make the event Y = y more likely by manipulating X. This needs some special notation; conditional probability is not enough!

Potential Responses (PRs) (Rubin, 1974; many others)
Consider a binary treatment $X^i \in \{0,1\}$ for individual i:
- $Y_0^i$ = response if $X^i = 0$
- $Y_1^i$ = response if $X^i = 1$, for the same subject (at the same time)
$\{Y_0^i, Y_1^i\}$ can never be observed together: potential responses.
Once a decision has been made, say $X^i = 1$, then $Y_1^i$ can be observed and $Y_0^i$ is counterfactual.
Note: PRs are only well defined if the manipulation of X is well defined.

Target(s) of Inference
Individual effects: any contrast of $Y_1^i$ and $Y_0^i$, e.g. the individual causal effect (ICE)
$\mathrm{ICE}^i = Y_1^i - Y_0^i$ (or ratio or ...).
Note: inference about the ICE depends on assumptions about the joint distribution of $(Y_0, Y_1)$.
Alternatively: any contrast of the distributions $p(y_1)$ and $p(y_0)$, e.g. the average causal effect (ACE)
$\mathrm{ACE} = E(Y_1) - E(Y_0)$.
Note: this does not depend on the joint distribution of $(Y_0, Y_1)$.
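
The two notes on this slide can be checked on a toy case. The sketch below uses two hypothetical joint distributions for binary $(Y_0, Y_1)$ with identical marginals (the numbers and function names are illustrative, not from the talk): the ACE agrees, while a feature of the ICE distribution, here $P(Y_1 \neq Y_0)$, does not.

```python
from itertools import product

# Two hypothetical joints over (Y0, Y1) with the same marginals:
# P(Y0 = 1) = P(Y1 = 1) = 0.5 in both cases (illustrative numbers).
joint_independent = {(y0, y1): 0.25 for y0, y1 in product((0, 1), repeat=2)}
joint_opposite = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0}


def ace(joint):
    """Average causal effect E(Y1) - E(Y0); depends on the marginals only."""
    return sum(p * (y1 - y0) for (y0, y1), p in joint.items())


def prob_ice_nonzero(joint):
    """P(Y1 != Y0), a feature of the ICE distribution; needs the joint."""
    return sum(p for (y0, y1), p in joint.items() if y1 != y0)


for name, joint in [("independent", joint_independent),
                    ("opposite", joint_opposite)]:
    print(name, "ACE =", ace(joint), "P(ICE != 0) =", prob_ice_nonzero(joint))
```

Since only one of $Y_0^i, Y_1^i$ is ever observed, data cannot distinguish the two joints; this is why statements about the ICE rest on additional assumptions.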

Assumptions
An example of a standard assumption that allows us to identify $p(y_x)$ is random treatment assignment, or no unmeasured confounding: let C be observed pre-treatment covariates and
$X \perp\!\!\!\perp Y_x \mid C$, $x = 1, 0$.
E.g. in an RCT, X is randomised and hence independent of anything pre-treatment.
Then
$p(y \mid X = x, C = c) = p(y_x \mid X = x, C = c) = p(y_x \mid C = c)$
and hence e.g.
$\mathrm{ACE} = \sum_c \{E(Y \mid X = 1, C = c) - E(Y \mid X = 0, C = c)\}\, p(C = c)$.
This also uses consistency: $Y = Y_X$.

Decision Theoretic Approach (Pearl, 1993; Dawid, 2002)
Index distributions by a regime indicator $\sigma_X$:
- $\sigma_X = x \in \mathcal{X}$: set X to x by a specified intervention
- $\sigma_X = \emptyset$: let X arise naturally
where $p(\tilde{x}; \sigma_X = x) = I(\tilde{x} = x)$ and $p(x; \sigma_X = \emptyset)$ is the observational distribution of X.
If $p(y; \sigma_X = x)$ depends on x, we consider X causal for Y.
If $p(y \mid X = x; \sigma_X = \emptyset)$ depends on x, then there is an association, which could also be due to confounding, reverse causation etc.

Target(s) of Inference
In general: contrast the intervention distribution $p(y; \sigma_X = x)$ for different values of x, e.g. the average causal effect (ACE)
$\mathrm{ACE} = E(Y; \sigma_X = 1) - E(Y; \sigma_X = 0)$.
Cannot formulate individual causal parameters!
Could instead consider conditional effects in a sub-population s, e.g.
$\mathrm{ACE}_s = E(Y \mid S = s; \sigma_X = 1) - E(Y \mid S = s; \sigma_X = 0)$.
More generally, let $k(\cdot)$ be a loss function: we want to evaluate $E(k(Y); \sigma_X = x)$ for different x.

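Assumptions
Analogous to no unmeasured confounders is the assumption of sufficient covariates C (Dawid, 2002):
$C \perp\!\!\!\perp \sigma_X$ and $Y \perp\!\!\!\perp \sigma_X \mid (X, C)$.
Then we can identify the intervention distribution
$p(y; \sigma_X = x) = \sum_{\tilde{x}, c} p(y \mid \tilde{x}, c; \sigma_X = x)\, I(\tilde{x} = x)\, p(c; \sigma_X = x) = \sum_c p(y \mid X = x, C = c; \sigma_X = \emptyset)\, p(c; \sigma_X = \emptyset)$.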
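
A minimal numerical sketch of this identification formula, assuming a binary sufficient covariate C and made-up observational quantities (all numbers and names below are hypothetical). It contrasts the adjusted ACE with the naive observational contrast $E(Y \mid X = 1) - E(Y \mid X = 0)$.

```python
# Hypothetical observational quantities for binary C, X, Y (illustrative only).
p_c = {0: 0.6, 1: 0.4}                    # p(c; observational)
p_x1_given_c = {0: 0.2, 1: 0.7}           # p(X = 1 | C = c; observational)
e_y = {(0, 0): 0.1, (1, 0): 0.3,          # E(Y | X = x, C = c), keyed (x, c)
       (0, 1): 0.5, (1, 1): 0.7}


def p_x_given_c(x, c):
    return p_x1_given_c[c] if x == 1 else 1.0 - p_x1_given_c[c]


def mean_under_intervention(x):
    """E(Y; sigma_X = x) = sum_c E(Y | X = x, C = c) p(c), assuming C sufficient."""
    return sum(e_y[(x, c)] * p_c[c] for c in p_c)


def mean_observational(x):
    """E(Y | X = x; observational) = sum_c E(Y | X = x, C = c) p(c | X = x)."""
    p_x = sum(p_x_given_c(x, c) * p_c[c] for c in p_c)
    return sum(e_y[(x, c)] * p_x_given_c(x, c) * p_c[c] / p_x for c in p_c)


ace_adjusted = mean_under_intervention(1) - mean_under_intervention(0)
naive_contrast = mean_observational(1) - mean_observational(0)
print(ace_adjusted, naive_contrast)
```

With these numbers the adjusted ACE is 0.2 while the naive contrast is 0.4; the gap is due entirely to C being associated with both X and Y.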

Outlook (Part 1)
More complex targets:
- effect of treatment on the treated
- complier causal effect (only with potential responses)
- conditional strategies: the decision is made depending on additional observations
- optimal decision
- direct and indirect effects.
Assumptions: most methods of causal inference make some version of the no unmeasured confounders assumption. Exception: instrumental variables (but other assumptions are needed).
Methods: instead of adjusting for covariates, one can use propensity scores or inverse probability weighting. All make the same assumptions! They differ with respect to robustness and efficiency.
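
As a small illustration of the remark that covariate adjustment and inverse probability weighting rest on the same assumptions, the sketch below computes $E(Y; \sigma_X = x)$ both ways from one hypothetical observational joint $p(c, x, y)$ (binary variables, illustrative numbers, C assumed sufficient). With exact probabilities the two coincide; differences only arise in estimation.

```python
from itertools import product


def bern(p1, value):
    """Probability of a binary value given P(value = 1) = p1."""
    return p1 if value == 1 else 1.0 - p1


# Hypothetical ingredients of the observational joint (illustrative numbers).
p_c1 = 0.4                                           # P(C = 1)
p_x1 = {0: 0.2, 1: 0.7}                              # P(X = 1 | C = c)
p_y1 = {(0, 0): 0.1, (1, 0): 0.3,                    # P(Y = 1 | X = x, C = c)
        (0, 1): 0.5, (1, 1): 0.7}                    # keyed (x, c)

joint = {(c, x, y): bern(p_c1, c) * bern(p_x1[c], x) * bern(p_y1[(x, c)], y)
         for c, x, y in product((0, 1), repeat=3)}


def propensity(x, c):
    """p(x | c), recovered from the joint."""
    num = sum(joint[(c, x, y)] for y in (0, 1))
    den = sum(joint[(c, xx, y)] for xx in (0, 1) for y in (0, 1))
    return num / den


def mean_ipw(x):
    """E(Y; sigma_X = x) by weighting each (c, x, y) cell with 1 / p(x | c)."""
    return sum(p * y / propensity(x, c)
               for (c, xx, y), p in joint.items() if xx == x)


def mean_standardised(x):
    """E(Y; sigma_X = x) by adjusting for C: sum_c E(Y | x, c) p(c)."""
    total = 0.0
    for c in (0, 1):
        p_c = sum(joint[(c, xx, y)] for xx in (0, 1) for y in (0, 1))
        p_cx = sum(joint[(c, x, y)] for y in (0, 1))
        total += p_c * joint[(c, x, 1)] / p_cx
    return total


print(mean_ipw(1) - mean_ipw(0), mean_standardised(1) - mean_standardised(0))
```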

Part 2: Graphical models and influence diagrams
- Graphical models
- Influence diagrams
- Representing assumptions
- An example

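Graphical Models
Here: directed acyclic graphs (DAGs), where vertices = variables and no edge = some (conditional) independence.
[Figure: three DAGs on the vertices A, B, C]
- first graph: $A \perp\!\!\!\perp B \mid C$
- second graph: also $A \perp\!\!\!\perp B \mid C$
- third graph: $A \not\perp\!\!\!\perp B \mid C$, but $A \perp\!\!\!\perp B$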

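Graphical Models
Factorisation of the joint density: $p(x) = \prod_i p(x_i \mid x_{\mathrm{pa}(i)})$
[Figure: three DAGs on the vertices A, B, C, each with its factorisation]
- $A \leftarrow C \rightarrow B$: $p(a \mid c)\, p(b \mid c)\, p(c)$
- $A \leftarrow C \leftarrow B$: $p(a \mid c)\, p(b)\, p(c \mid b)$
- $A \rightarrow C \leftarrow B$: $p(a)\, p(b)\, p(c \mid a, b)$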

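Graphical Models
In general: $X_i \perp\!\!\!\perp X_{\mathrm{nd}(i)} \mid X_{\mathrm{pa}(i)}$, where nd(i) are the non-descendants and pa(i) the parents of i.
[Figure: the same three DAGs on the vertices A, B, C]
- first graph: $A \perp\!\!\!\perp B \mid C$
- second graph: also $A \perp\!\!\!\perp B \mid C$
- third graph: $A \perp\!\!\!\perp B$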
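
The collider case can be verified by brute force. The sketch below builds a joint from a factorisation of the form $p(a)\,p(b)\,p(c \mid a, b)$ with hypothetical numbers and checks that A and B are marginally independent but become dependent once C is conditioned on.

```python
from itertools import product


def bern(p1, value):
    """Probability of a binary value given P(value = 1) = p1."""
    return p1 if value == 1 else 1.0 - p1


# Hypothetical parameters for the collider A -> C <- B (illustrative numbers).
p_a1, p_b1 = 0.3, 0.6
p_c1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}   # P(C = 1 | a, b)

joint = {(a, b, c): bern(p_a1, a) * bern(p_b1, b) * bern(p_c1[(a, b)], c)
         for a, b, c in product((0, 1), repeat=3)}


def prob(**fixed):
    """Probability of the event that fixes some of a, b, c to given values."""
    total = 0.0
    for (a, b, c), p in joint.items():
        values = {"a": a, "b": b, "c": c}
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total


# P(A=1, B=1) - P(A=1) P(B=1): zero under marginal independence.
marginal_gap = prob(a=1, b=1) - prob(a=1) * prob(b=1)

# The same contrast within the stratum C = 1.
conditional_gap = (prob(a=1, b=1, c=1) / prob(c=1)
                   - (prob(a=1, c=1) / prob(c=1)) * (prob(b=1, c=1) / prob(c=1)))

print(marginal_gap, conditional_gap)
```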

Influence Diagrams (Dawid 2002, 2003)
Include a decision node / intervention indicator $\sigma_X$.
Example: while the following just means $p(x, y) = p(x)\, p(y \mid x)$ (no restriction)
$X \rightarrow Y$
the following implies $Y \perp\!\!\!\perp \sigma_X \mid X$, or $p(x, y; \sigma_X) = p(y \mid x)\, p(x; \sigma_X)$:
$\sigma_X \rightarrow X \rightarrow Y$

Influence Diagrams (Dawid 2002, 2003)
Example ctd.: while the following just means $p(x, y) = p(y)\, p(x \mid y)$ (no restriction)
$X \leftarrow Y$
the following implies $Y \perp\!\!\!\perp \sigma_X$, or $p(x, y; \sigma_X) = p(y)\, p(x \mid y; \sigma_X)$:
$\sigma_X \rightarrow X \leftarrow Y$

Influence Diagrams (Dawid 2002, 2003)
Example ctd.: while there is no difference between the models
$X \rightarrow Y$ and $X \leftarrow Y$
we can express different assumptions about interventions, e.g. by
$\sigma_X \rightarrow X \rightarrow Y$ and $\sigma_X \rightarrow X \leftarrow Y$
Note: the indicator $\sigma_X$ is not random; it is drawn in a box and always conditioned on.
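
To make the difference concrete, the sketch below takes a single hypothetical observational joint $p(x, y)$ and evaluates $p(Y = 1; \sigma_X = x)$ under each of the two influence diagrams (numbers and function names are illustrative). The observational data are identical; only the assumed response to intervention differs.

```python
# Hypothetical observational joint p(x, y) over binary X and Y.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}


def p_y1_given_x(x):
    """Observational p(Y = 1 | X = x; sigma_X = empty)."""
    return joint[(x, 1)] / (joint[(x, 0)] + joint[(x, 1)])


def p_y1_do_x_forward(x):
    """Diagram sigma_X -> X -> Y: Y is independent of sigma_X given X,
    so p(y; sigma_X = x) = p(y | x)."""
    return p_y1_given_x(x)


def p_y1_do_x_reverse(x):
    """Diagram sigma_X -> X <- Y: Y is independent of sigma_X,
    so p(y; sigma_X = x) = p(y), whatever x is."""
    return joint[(0, 1)] + joint[(1, 1)]


for x in (0, 1):
    print(x, p_y1_given_x(x), p_y1_do_x_forward(x), p_y1_do_x_reverse(x))
```

Under the first diagram X is causal for Y in the sense defined earlier; under the second the association is present but intervening in X changes nothing.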

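Representing Assumptions
Use influence diagrams to represent assumptions about interventions.
E.g. the assumption of sufficient covariates,
$C \perp\!\!\!\perp \sigma_X$ and $Y \perp\!\!\!\perp \sigma_X \mid (X, C)$,
is uniquely represented by the influence diagram
[Figure: influence diagram with nodes $\sigma_X$, C, X, Y]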

Where Do the Graphs Come From?
- A combination of subject matter background knowledge (biology, physics, ...) and e.g. study design and the type of intervention considered;
- testable implications: certain conditional independencies can be tested from data.

Example
Study on the effect of hormone replacement therapy (HRT) on transient cerebral ischemia (TCI) (Pedersen et al., 1997; Didelez et al., 2008)
[Figure: graph with nodes Age, HRT, TCI, Occ, Smo, THist (the label HRT appears twice in the figure)]
Here: Age & Smoking are sufficient covariates for the effect of HRT on TCI.

Outlook (Part 2)
Graphical models / influence diagrams
- are used to facilitate reasoning (remember: justify the absence of further nodes & edges)
- help to formulate background knowledge
- provide graphical rules to check for identifiability
- suggest what variables to adjust for (if at all), and how.
Other applications, for example:
- effect of treatment on the treated
- instrumental variables (Mendelian randomisation)
- data situations with potential for selection bias
- dynamic models, time series / event histories
- and of course Part 3 ...

Part 3: Sequential decisions and direct effects
- (Optimal) sequential decisions
- G-computation
- Simple & extended stability
- Direct causal effects

Sequential Decisions
Examples:
(1) Stroke patients receive regular anticoagulant treatment, which has to be continually monitored and the dosage adjusted depending on blood test results and other health indicators.
(2) HIV patients: have to decide at what point in time to start HAART, depending on the latest CD4 count.
Target: want to find the optimal treatment strategy from (observational?) data.

Sequential Decisions
Questions:
- What assumptions do we need to be able to make inferences about decisions that may depend on the patient's history (conditional interventions / strategies)?
- Are these more restrictive when we want to find an optimal strategy?
- When the assumptions are plausible, how do we evaluate the effect of an (optimal) strategy?

Some Notation
- $A_1, \dots, A_N$: action variables, can be manipulated
- $L_1, \dots, L_N$: covariates, (available) background information
- $Y = L_{N+1}$: response variable
All are measured over time, $L_i$ before $A_i$.
$A_{<i} = (A_1, \dots, A_{i-1})$ is the past up to before i; $A_{\le i}$, $A_{>i}$ etc. similarly.

Strategies
A strategy $s = (s_1, \dots, s_N)$ is a set of functions assigning an action $a_i = s_i(a_{<i}, l_{\le i})$ to each history $(a_{<i}, l_{\le i})$.
(It could be stochastic; then the dependence on $a_{<i}$ is relevant.)
Also called: conditional / dynamic / adaptive strategies.
Indicator:
- $\sigma = \emptyset$: observational regime
- $\sigma = s$, $s \in S$ = set of strategies
Denote by $p(\cdot\,; s) = p(\cdot\,; \sigma = s)$ the distributions under strategy s.

NB: Why Naïve Regression Does Not Work
Naïvely: fit a regression model $p(y \mid a, l)$, e.g. Cox regression?
Example: a possible scenario under the null hypothesis of no causal effect:
[Figure: influence diagram with nodes U, $A_1$, $L_2$, $A_2$, Y and $\sigma$]
But: $Y \not\perp\!\!\!\perp A_1 \mid (A_2, L_2)$. Conditioning on the collider $L_2$ will induce an association between Y and $A_1$!
Regression is not consistent at the null.
However, we need to condition on $L_2$ as a possible confounder for $A_2$.

Evaluation of Strategies
Let $k(\cdot)$ be a loss function. We want to evaluate $E(k(Y); s)$.
Define $f(a_{\le j}, l_{\le i}) := E\{k(Y) \mid a_{\le j}, l_{\le i}; s\}$ for $i = 1, \dots, N$; $j = i-1, i$.
Then obtain $f(\emptyset) = E(k(Y); s)$ from $f(a_{\le N}, l_{\le N})$ iteratively by:
$f(a_{<i}, l_{\le i}) = \sum_{a_i} p(a_i \mid a_{<i}, l_{\le i}; s)\, f(a_{\le i}, l_{\le i})$
$f(a_{<i}, l_{<i}) = \sum_{l_i} p(l_i \mid a_{<i}, l_{<i}; s)\, f(a_{<i}, l_{\le i})$.
Note: well known as extensive form analysis. Also known as G-computation (Robins, 1986).
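
The recursion can be sketched for N = 2 with $L_1 = \emptyset$ and $k(y) = y$. The observational model below is hypothetical (binary variables, made-up numbers, a hidden U affecting $L_2$ and Y, and no causal effect of either action on Y); it is chosen so that the covariate and response distributions given the observed past are the same under observation and intervention, which is what lets the observational conditionals be plugged into the recursion. The sketch also evaluates the naive regression $E(Y \mid a_1, l_2, a_2)$ for comparison.

```python
from itertools import product


def bern(p1, value):
    """Probability of a binary value given P(value = 1) = p1."""
    return p1 if value == 1 else 1.0 - p1


def joint_full(u, a1, l2, a2, y):
    """Hypothetical observational joint including the hidden U.
    A1 is randomised, L2 depends on (A1, U), A2 on L2, and Y on U only."""
    return (bern(0.5, u) * bern(0.5, a1)
            * bern(0.1 + 0.4 * a1 + 0.4 * u, l2)
            * bern(0.2 + 0.6 * l2, a2)
            * bern(0.2 + 0.6 * u, y))


# Observed joint p(a1, l2, a2, y): the hidden U is summed out.
NAMES = ("a1", "l2", "a2", "y")
obs = {key: sum(joint_full(u, *key) for u in (0, 1))
       for key in product((0, 1), repeat=4)}


def p(**fixed):
    """Observational probability of an event fixing some observed variables."""
    return sum(pr for key, pr in obs.items()
               if all(key[NAMES.index(k)] == v for k, v in fixed.items()))


def naive_regression(a1, l2, a2):
    """E(Y | a1, l2, a2) under the observational regime."""
    return p(a1=a1, l2=l2, a2=a2, y=1) / p(a1=a1, l2=l2, a2=a2)


def g_computation(s1, s2):
    """E(Y; s) for a deterministic strategy a1 = s1, a2 = s2(a1, l2)."""
    a1 = s1
    value = 0.0
    for l2 in (0, 1):
        a2 = s2(a1, l2)                              # action fixed by s
        p_l2 = p(a1=a1, l2=l2) / p(a1=a1)            # p(l2 | a1), observational
        value += p_l2 * naive_regression(a1, l2, a2)
    return value


print(g_computation(1, lambda a1, l2: l2), g_computation(0, lambda a1, l2: 1))
print(naive_regression(0, 1, 1), naive_regression(1, 1, 1))
```

In this toy model every strategy yields $E(Y; s) = 0.5$, as it should under the null, whereas the regression of Y on $(a_1, l_2, a_2)$ varies with $a_1$ because $L_2$ is a collider between $A_1$ and U.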

Evaluation of Strategies (ctd.)
In the two recursion steps:
- $p(a_i \mid a_{<i}, l_{\le i}; s)$ is known by s;
- $p(l_i \mid a_{<i}, l_{<i}; s)$ is not (?) known.

Simple Stability (Dawid & Didelez, 2010)
Sufficient for identifiability of any type of strategy is
$p(l_i \mid a_{<i}, l_{<i}; s) = p(l_i \mid a_{<i}, l_{<i}; \emptyset)$ for all $i = 1, \dots, N+1$
or (via the intervention indicator)
$L_i \perp\!\!\!\perp \sigma \mid (A_{<i}, L_{<i})$ for all $i = 1, \dots, N+1$.
Or graphically:
[Figure: influence diagram with nodes $L_1$, $A_1$, $L_2$, $A_2$, Y]

Extended Stability
We might not be able to assess simple stability without taking unobserved variables into account: extend the covariates L to include unobserved / hidden variables $U = (U_1, \dots, U_N)$ and check whether simple stability can be deduced.
Example 1: a particular underlying structure (note: $L_1 = \emptyset$)
[Figure: influence diagram with nodes U, $A_1$, $L_2$, $A_2$, Y and $\sigma$]
Simple stability is violated, as $Y \not\perp\!\!\!\perp \sigma \mid (A_1, A_2, L_2)$.

Examples
Example 2: a different underlying structure
[Figure: influence diagram with nodes U, $A_1$, $L_2$, $A_2$, Y and $\sigma$]
Simple stability is satisfied.

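Examples
Example 3: another different underlying structure
[Figure: influence diagram with nodes $U_1$, $U_2$, $A_1$, $L_2$, $A_2$, Y and $\sigma$]
Simple stability is violated: $L_2 \not\perp\!\!\!\perp \sigma \mid A_1$ and $Y \not\perp\!\!\!\perp \sigma \mid (A_1, A_2, L_2)$.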

Relax Simple Stability?
Specific conditional strategies can be identified under weaker conditions than simple stability. (Pearl & Robins, 1995; Dawid & Didelez, 2010)
However, these weaker conditions cannot be applied in order to identify an optimal strategy. (Dawid & Didelez, 2008)
Heuristically: an optimal strategy has to be allowed to depend on all previously observed information.
Note: necessary and sufficient conditions have been given (in a more restrictive causal framework) for conditional strategies, but not generalised to optimal ones. (Tian, 2008)

Direct Causal Effects (Greenland & Robins, 1999; Pearl, 2001)
Want the direct effect of X on Y not mediated by Z. Examples:
- direct effect of treatment not mediated by mental attitude (no placebo effect);
- direct effect of oral contraception on thrombosis risk not mediated by prevention of pregnancy;
- direct effect of gender on salary not mediated by qualification.
We need to block the effect through Z by fixing it: compare interventions in X while intervening in Z to keep it constant.

Controlled Direct Causal Effects
Direct effects can be formalised in different ways. Most popular: fix Z at z.
Controlled direct effect of X (binary, say) at $\sigma_Z = z$, e.g.
$\mathrm{CDE}_z = E(Y; \sigma_X = 1, \sigma_Z = z) - E(Y; \sigma_X = 0, \sigma_Z = z)$.
Interpretation: force Z = z for everyone and compare different settings of X.
Note: as before, the regression $p(y \mid x, z)$ is not suitable!

General Direct Causal Effects
Standardised Direct Effect (Geneletti, 2006): more generally, let Z arise from the same distribution D under different interventions in X:
$\mathrm{SDE}_D = E(Y; \sigma_X = 1, \sigma_Z = D) - E(Y; \sigma_X = 0, \sigma_Z = D)$
Natural Direct Effect with POs (potential outcomes): $E(Y_{1,Z_0} - Y_0)$
If $D_{W|0} = p(z \mid W; \sigma_X = 0, \sigma_Z = \emptyset)$, then
$\mathrm{NDE} = E(Y; \sigma_X = 1, \sigma_Z = D_{W|0}) - E(Y; \sigma_X = 0, \sigma_Z = D_{W|0})$
Note: only for the NDE (and specific W): total = direct + indirect effect!
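
The closing note can be spelled out in potential-outcome notation. The display below is a sketch under assumptions not stated on the slide: the composition property $Y_x = Y_{x, Z_x}$, and the label NIE (natural indirect effect) for the remaining term.

```latex
\begin{align*}
\underbrace{E(Y_1) - E(Y_0)}_{\text{total effect}}
  &= E(Y_{1,Z_1}) - E(Y_{0,Z_0}) \\
  &= \underbrace{E(Y_{1,Z_1}) - E(Y_{1,Z_0})}_{\text{NIE (assumed label)}}
   + \underbrace{E(Y_{1,Z_0}) - E(Y_{0,Z_0})}_{\text{NDE} \,=\, E(Y_{1,Z_0} - Y_0)}
\end{align*}
```

The step is just adding and subtracting the cross-world term $E(Y_{1,Z_0})$; no analogous telescoping is available for the CDE, since it fixes Z at a single value z rather than at its natural level under $X = 0$.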

Sequential Decisions and Direct Effects (Didelez et al., 2006)
Direct effects are essentially the same as a sequential decision problem: choose $A_1 = X$ and $A_2 = Z$. The same conditions for identifiability and the same methods of evaluation apply.
Conditions weaker than simple stability can be applied, as the intervention to fix Z does not depend on previous observations.
Note: for the NDE we need to obtain $D_{W|0}$, so additional conditions are required. Also, the NDE is not well defined in some situations where the CDE or SDE are.

Example: Double Blind Study
NDE not well defined / not identified: the side effect will depend on the actual treatment, and hence mental attitude will not be generated from the same distribution for treated and untreated!
We cannot speak of a direct effect unless the side effect can also be controlled.
[Figure: graph with nodes X (Treatment), Z (Mental attitude), U (Side effect), Y (Outcome)]

Outlook (Part 3)
In principle, one can use the G-formula, estimating all $p(l_i \mid a_{<i}, l_{<i}; \emptyset)$ from observational data: sensitive to misspecification / null paradox. Also: curse of high dimensions, and dynamic programming becomes infeasible.
1st alternative: inverse probability of treatment weighting (IPTW). Requires models / estimation of $p(a_i \mid a_{<i}, l_{\le i}; \emptyset)$. Recently: advances in using IPTW for finding optimal dynamic treatments. (Orellana, Rotnitzky, Robins, 2011)
2nd alternative: G-estimation, so far only motivated within the PR framework and specific survival models. (Robins, 1992)
Models / methods for finding an optimal treatment strategy still need more testing in practice. (Rosthoj et al., 2006)

Summary
The decision theoretic framework
- brings clarity to causal questions and their underlying assumptions
- and will alert users to cross-world assumptions.