The decision theoretic approach to causal inference OR Rethinking the paradigms of causal modelling

A. P. Dawid (1) and S. Geneletti (2). (1) University of Cambridge, Statistical Laboratory. (2) Imperial College, Department of Epidemiology and Public Health. 04/05/2009

Outline: issues; the simple problem (RCTs); the hard problem (observational studies); the statistical decision-theoretic approach.

Questions: Will aspirin cure my headache? Will it help those who are prescribed it? Did it cure my headache? Did it help those who were prescribed it? Would I still have a headache if I hadn't taken it?

Distinctions (retrospective vs prospective):
Cause of effect vs effect of cause
Counterfactual vs hypothetical
Deterministic vs stochastic
Value vs distribution
Observation (passive) vs intervention (active)

Problems. Before data: meaning, interpretation, inference. What data? Experimental studies (randomisation); observational studies (confounding); dynamic treatment regimes; alternative treatment effects.

Formal frameworks. Maths: potential responses; functional models; conditional independence. Tools: structural equations; path diagrams; directed acyclic graphs.

Which way to go? Are there any differences between the frameworks? What explicit or implicit assumptions do they make? How reasonable are they? How do they assist (or impose on) the way we pose, frame and answer causal queries?

A simple problem: a randomised experiment, with a binary treatment decision variable $T$ and a response random variable $Y$.

Stats (101) model (Fisher): specify the conditional distribution of $Y$ given $T = t$ ($t = 0, 1$), e.g. $Y \mid T = t \sim N(\mu_t, \sigma^2)$. This is sufficient to decide which decision is best. Measure the effect of treatment by estimating $\delta = \mu_1 - \mu_0$. (1)
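A minimal simulation sketch of this model, not taken from the slides: treatment is assigned by a fair coin, $Y \mid T = t$ is drawn from $N(\mu_t, \sigma^2)$, and $\delta$ is estimated by the difference in arm means. The parameter values, the seed and the function names `simulate_rct` and `estimate_delta` are illustrative assumptions.

```python
import random
import statistics


def simulate_rct(n, mu0, mu1, sigma, seed=0):
    """Simulate a randomised trial: T by fair coin, Y | T=t ~ N(mu_t, sigma^2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.randint(0, 1)            # randomised treatment assignment
        mu = mu1 if t == 1 else mu0
        data.append((t, rng.gauss(mu, sigma)))
    return data


def estimate_delta(data):
    """Estimate delta = mu_1 - mu_0 by the difference in arm means."""
    y1 = [y for t, y in data if t == 1]
    y0 = [y for t, y in data if t == 0]
    return statistics.fmean(y1) - statistics.fmean(y0)


# Hypothetical parameter values chosen only for illustration (true delta = 2).
data = simulate_rct(n=20000, mu0=1.0, mu1=3.0, sigma=1.0)
delta_hat = estimate_delta(data)
print(f"estimated delta: {delta_hat:.2f}")
```

With these assumed values the estimate should land close to the true difference of 2.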

Error model: $Y = \mu_T + E_T$, where $E = (E_0, E_1)$ with $E \sim N(0, \Sigma)$. The values of $E$ for any unit stay the same regardless of the $T$ that unit receives. When $E_0 = E_1$, this is a structural equation model.

Potential responses model: imagine there are two $Y$s for each person (corresponding to the treatment $T$): $Y_0$, the response to $T = 0$, and $Y_1$, the response to $T = 1$. These exist independently until the treatment you get reveals one of them, so that $Y = Y_T$; the unrevealed one becomes counterfactual.

Potential responses model: so for any unit there is a pair $\mathbf{Y} = (Y_1, Y_0)$ with some joint distribution. The unit-level (individual) random causal effect (ICA), $Y_1 - Y_0$, is unobservable.

Average causal effect: this is observable. $E(Y_1 - Y_0) = E(Y_1) - E(Y_0) = E(Y \mid T = 1) - E(Y \mid T = 0) = \mu_1 - \mu_0$.
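The chain of equalities can be illustrated with a small simulation, offered as a sketch under assumed distributions rather than anything from the slides. Each unit carries a pair $(Y_0, Y_1)$, randomisation reveals only $Y_T$, and the difference in observed arm means is compared with the average of the (normally hidden) individual effects. The distributions and the seed are hypothetical.

```python
import random
import statistics

rng = random.Random(1)
n = 20000
units = []
for _ in range(n):
    y0 = rng.gauss(1.0, 1.0)             # hypothetical response to T = 0
    y1 = y0 + rng.gauss(2.0, 0.5)        # hypothetical response to T = 1
    t = rng.randint(0, 1)                # randomised assignment
    y_obs = y1 if t == 1 else y0         # Y = Y_T: only one response is revealed
    units.append((y0, y1, t, y_obs))

# Only a simulation can look at both potential responses at once.
true_ace = statistics.fmean([y1 - y0 for y0, y1, _, _ in units])

# What an analyst actually sees: (T, Y) pairs only.
est_ace = (statistics.fmean([y for _, _, t, y in units if t == 1])
           - statistics.fmean([y for _, _, t, y in units if t == 0]))

print(f"ACE from hidden pairs: {true_ace:.2f}, from observed arms: {est_ace:.2f}")
```

The two numbers should agree up to sampling error, even though no individual difference $Y_1 - Y_0$ is ever observed.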

General functional model: $Y = f(T, U)$ (e.g. $U = \mathbf{Y}$). The value of $U$ would stay the same if we were to change $T$ from 0 to 1.

Connections. PR and GFM: any functional model generates a potential responses model via $Y_t = f(t, U)$ (and vice versa, as a PR model is a functional model with $U = (Y_0, Y_1)$). Stat and PR: any PR model generates a statistical model, $\Pr(Y_t) = \Pr(Y \mid T = t)$, and more than one PR model can correspond to the same stats model.
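The two directions of the PR and functional-model correspondence can be sketched in a few lines. The particular function `f` below is a hypothetical choice for illustration; the slides leave $f$ unspecified.

```python
import random


def f(t, u):
    """Hypothetical functional model Y = f(T, U): baseline, treatment shift, unit term."""
    return 1.0 + 2.0 * t + u


def potential_responses(u):
    """Functional model -> PR model: Y_t = f(t, U) for t = 0, 1."""
    return (f(0, u), f(1, u))


def g(t, pair):
    """PR model -> functional model: take U = (Y_0, Y_1) and let g select Y_t."""
    return pair[t]


rng = random.Random(2)
u = rng.gauss(0.0, 1.0)                  # unit characteristic, unchanged by T
y0, y1 = potential_responses(u)

# Both representations give the same response for either treatment.
same = all(g(t, (y0, y1)) == f(t, u) for t in (0, 1))
print(same)
```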

Potential response models: problems? PR model: $Y_t \sim N(\mu_t, \sigma^2)$ ($t = 0, 1$), with $\operatorname{corr}(Y_0, Y_1) = \rho$. Corresponding stats model: $\Pr(Y \mid T = t) = \Phi_{\mu_t, \sigma^2}(Y)$, where $\Phi_{\mu_t, \sigma^2}(\cdot)$ is the cumulative distribution function of $N(\mu_t, \sigma^2)$. NB: $\rho$ does not feature, so it cannot be estimated!

Potential response models: problems? Under the PR model, $\operatorname{var}(Y_1 - Y_0) = 2(1 - \rho)\sigma^2$: we cannot identify the population variation in the ICA. Also $E(Y_1 - Y_0 \mid Y_1 = y_1) = (1 - \rho) y_1 + \rho \mu_1 - \mu_0$: we cannot identify the counterfactual ICA having observed the response to the actual treatment (in this case $T = 1$).
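A simulation sketch of the identifiability problem, under assumed parameter values: two PR models that differ only in $\rho$ produce observed arms with the same mean and spread, while the variance of the individual effect changes as $2(1 - \rho)\sigma^2$. The function name `simulate`, the seed and the values of $\mu_0$, $\mu_1$, $\sigma$ are illustrative assumptions.

```python
import math
import random
import statistics


def simulate(rho, n=20000, mu0=1.0, mu1=3.0, sigma=1.0, seed=3):
    """Draw correlated (Y0, Y1), randomise T, and summarise hidden vs observed data."""
    rng = random.Random(seed)
    effects, arm0, arm1 = [], [], []
    for _ in range(n):
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y0 = mu0 + sigma * z0
        y1 = mu1 + sigma * (rho * z0 + math.sqrt(1.0 - rho ** 2) * z1)
        effects.append(y1 - y0)          # hidden: needs both responses
        if rng.randint(0, 1) == 1:
            arm1.append(y1)              # observed when T = 1
        else:
            arm0.append(y0)              # observed when T = 0
    return {
        "var_effect": statistics.pvariance(effects),
        "arm_means": (statistics.fmean(arm0), statistics.fmean(arm1)),
        "arm_sds": (statistics.pstdev(arm0), statistics.pstdev(arm1)),
    }


low = simulate(rho=0.0)
high = simulate(rho=0.9)
print(low["arm_means"], high["arm_means"])    # observed data: about the same
print(low["var_effect"], high["var_effect"])  # hidden variation: very different
```

With $\sigma = 1$ the hidden variances should be near 2 and 0.2 respectively, yet nothing in the observed arms distinguishes the two models.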

Not so simple problem: observational studies. Treatment taken is associated with the patient's health (e.g. a confounder). What assumptions are required to make causal inferences? When and how can these assumptions be justified? (Diagram with nodes $U$, $T$ and $Y$.)

What are causal inferences? There is general consensus that they are about what happens when we intervene. The big problem is that data are normally observational. The question then is: how do we make inferences about intervention from data that are observational? The different frameworks deal with this in different ways, more or less explicitly.

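Error model, $(T, E)$? $E \sim N(0, \Sigma)$, $T \sim P_T$, $Y = \mu_T + E_T$. (Diagram with nodes $E$, $T$ and $Y$.) No confounding: $T \perp\!\!\!\perp E$ (treatment independent of errors). Otherwise, what is the joint distribution of $T$ and $E$?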

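Potential responses model, $(T, \mathbf{Y})$? $\mathbf{Y} \sim N(\mu, \Sigma)$, $T \sim P_T$, $Y = Y_T$. (Diagram with nodes $\mathbf{Y}$, $T$ and $Y$.) Ignorable treatment assignment: $T \perp\!\!\!\perp \mathbf{Y}$ (treatment independent of PRs). Otherwise, what is the joint distribution of $T$ and $\mathbf{Y}$?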

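General functional model, $(T, U)$? $U \sim P_U$, $T \sim P_T$, $Y = f(T, U)$. (Diagram with nodes $U$, $T$ and $Y$.) No confounding: $T \perp\!\!\!\perp U$ (treatment independent of unit characteristics). Otherwise, what is the joint distribution of $T$ and $U$?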

Potential response models: problems? The value of $\mathbf{Y} = (Y_0, Y_1)$ for any unit is the same in both the experimental and the observational case, as well as for either choice of $T$. So how are we to judge the independence of $\mathbf{Y}$ and $T$? There is no reason to believe that responses are the same under experiment and observation...

Statistical (decision-theoretic) model: make the regime explicit with the variable $F_T$.
$F_T = 1$: $p(T = 1 \mid F_T = 1) = 1$, meaning set treatment 1 (as in an RCT).
$F_T = 0$: $p(T = 0 \mid F_T = 0) = 1$, meaning set treatment 0 (as in an RCT).
$F_T = \emptyset$: $p(T = t \mid F_T = \emptyset) = p_t$, meaning just observe; $T$ arises "naturally" in the observational regime.
Ignorable treatment assignment is $Y \perp\!\!\!\perp F_T \mid T$: simple.
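A minimal sketch of the regime indicator, under assumptions not in the slides: the observational regime is coded as `IDLE` (Python `None`), the natural treatment probability is 0.3, and the response depends on the regime only through $T$, which is exactly the ignorability condition $Y \perp\!\!\!\perp F_T \mid T$ built in by construction. All names and numbers are hypothetical.

```python
import random
import statistics

IDLE = None  # assumed coding of the observational regime ("just observe")


def sample_unit(regime, rng, p_natural=0.3, mu=(1.0, 3.0), sigma=1.0):
    """Draw (T, Y) for one unit under regime F_T in {0, 1, IDLE}."""
    if regime is IDLE:
        t = 1 if rng.random() < p_natural else 0   # T arises "naturally"
    else:
        t = regime                                 # T is set by intervention
    # Y depends on the regime only through T: ignorable assignment by construction.
    y = rng.gauss(mu[t], sigma)
    return t, y


rng = random.Random(4)
n = 20000
mean_y_given_t1 = {}
for regime in (1, IDLE):
    draws = [sample_unit(regime, rng) for _ in range(n)]
    mean_y_given_t1[regime] = statistics.fmean([y for t, y in draws if t == 1])

print(mean_y_given_t1)
```

Because ignorability holds here, the mean response among treated units should be about the same whether treatment was set or merely observed; in real data that equality is an assumption to be argued for, not a given.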

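Influence diagrams: start simply with $T$ and $Y$. Add the regime indicator node $F_T$ (non-random, so drawn in a box). (Diagram with nodes $F_T$, $T$ and $Y$, and an arrow labelled a.) Absence of arrow a means $Y \perp\!\!\!\perp F_T \mid T$.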

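Confounders. Simple case: $Y \perp\!\!\!\perp F_T \mid T$. If arrow a is present then there is often a $U$, an (un)confounder, such that $U \perp\!\!\!\perp F_T$ and $Y \perp\!\!\!\perp F_T \mid (T, U)$: treatment assignment is ignorable conditional on $U$. (Diagram with nodes $F_T$, $T$, $Y$ and $U$, and arrows labelled b and c.) If b is absent ($T \perp\!\!\!\perp U \mid F_T$) or c is absent ($Y \perp\!\!\!\perp U \mid T$), then treatment assignment is marginally ignorable.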
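The effect of conditioning on $U$ can be sketched with simulated observational data. This is an illustration under assumed numbers, not the authors' analysis: a binary $U$ influences both the treatment taken and the response, the naive arm comparison is biased, and averaging the within-stratum differences over the distribution of $U$ (standard stratified adjustment, swapped in here as one way to use conditional ignorability) recovers the assumed effect of 2.

```python
import random
import statistics

rng = random.Random(5)
rows = []
for _ in range(40000):
    u = 1 if rng.random() < 0.5 else 0             # hypothetical binary confounder
    p_treat = 0.8 if u == 1 else 0.2               # U influences treatment taken
    t = 1 if rng.random() < p_treat else 0
    y = rng.gauss(1.0 + 2.0 * t + 3.0 * u, 1.0)    # U also influences response
    rows.append((u, t, y))


def mean_y(t, u=None):
    """Mean response among units with treatment t (optionally within stratum u)."""
    return statistics.fmean(
        [y for (uu, tt, y) in rows if tt == t and (u is None or uu == u)]
    )


# Naive comparison ignores U and mixes the treatment effect with the effect of U.
naive = mean_y(1) - mean_y(0)

# Adjusted comparison: within-stratum differences weighted by p(U = u).
p_u = {u: sum(1 for r in rows if r[0] == u) / len(rows) for u in (0, 1)}
adjusted = sum(p_u[u] * (mean_y(1, u) - mean_y(0, u)) for u in (0, 1))

print(f"naive: {naive:.2f}, adjusted for U: {adjusted:.2f}")
```

Under these assumed values the naive difference should sit near 3.8 while the adjusted one should be near 2.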

Causal model: simply a more ambitious non-causal model, expressing the invariance of certain modular structures across different regimes. E.g. something that behaves in the same way under observational and experimental regimes is a candidate for a stable relationship, i.e. a causal one.

Causal model. For a functional (e.g. PR) model: invariant values of variables and functional relationships; implicit, deterministic. For the statistical model: invariant conditional distributions; explicit, stochastic.

Brief word on estimation. PR model: expectation of responses over those we have already treated. Deals with: what would have happened to Jack, whom we treated, if he had not been treated?

Brief word on estimation. Statistical model: Bayesian predictive expectation of the response for a new patient. Deals with: given that we have observed Jack-like individuals, what decision should we recommend to a new patient exchangeable with Jack? Hence the name "decision theoretic".
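One way to make the predictive-expectation idea concrete is a conjugate normal sketch. Everything specific here is an assumption for illustration: known response variance, an independent normal prior on each $\mu_t$, made-up response values, and the convention that a larger response is better. Under those assumptions the predictive expectation for a new exchangeable patient equals the posterior mean of $\mu_t$.

```python
def predictive_mean(ys, sigma2=1.0, prior_mean=0.0, prior_var=100.0):
    """Posterior (and hence predictive) mean of the response under a normal-normal model.

    Assumes known variance sigma2 and a N(prior_mean, prior_var) prior on mu_t.
    """
    n = len(ys)
    precision = 1.0 / prior_var + n / sigma2
    return (prior_mean / prior_var + sum(ys) / sigma2) / precision


# Hypothetical observed responses of "Jack-like" individuals under each treatment.
responses = {
    0: [1.2, 0.8, 1.1, 0.9],
    1: [2.9, 3.2, 3.1, 2.8],
}

predicted = {t: predictive_mean(ys) for t, ys in responses.items()}

# Decision for the new patient: pick the treatment with the larger predictive mean
# (assumes larger responses are preferable).
best_treatment = max(predicted, key=predicted.get)
print(predicted, best_treatment)
```

The point of the sketch is the shape of the question: it compares predicted responses for a future patient under each decision, and never refers to what would have happened to a patient already treated.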

Advantages: no impossible-to-observe-ables; stochastic, not deterministic, relationships; simple, explicit and testable assumptions; focussed on what is the best decision for the future rather than on what would have happened if...

Issues tackled: compliance; dynamic treatment regimes; alternative treatment measures; direct and indirect effects.
