Introduction to Causal Calculus


Sanna Tyrväinen, University of British Columbia, August 1, 2017

Bayesian network

Bayesian networks are directed acyclic graphs (DAGs) whose nodes represent random variables:

$p(x_1, \dots, x_d) = \prod_{j=1}^{d} p(x_j \mid x_{\mathrm{pa}(j)})$

This assumes that a variable is independent of its earlier non-parents given its parents, that is,

$p(x_j \mid x_1, \dots, x_{j-1}) = p(x_j \mid x_{\mathrm{pa}(j)})$

The graph captures how variables are conditionally dependent. If there are no arrows between two nodes, they are independent:

$p(A, B) = p(A)\, p(B)$

For the classic rain/sprinkler/grass network (rain R, sprinkler S, wet grass G), the joint probability distribution factorizes as

$p(G, S, R) = p(G \mid S, R)\, p(S \mid R)\, p(R)$
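The factorization translates directly into code. Below is a minimal Python sketch of the rain/sprinkler/grass network; all probability values are invented for illustration.

    # Rain/sprinkler/grass CPTs (all numbers hypothetical).
    p_R = {True: 0.2, False: 0.8}                    # p(R)
    p_S_given_R = {True: {True: 0.01, False: 0.99},  # p(S | R), outer key is R
                   False: {True: 0.4, False: 0.6}}
    p_G_given_SR = {                                 # p(G=True | S, R)
        (True, True): 0.99, (True, False): 0.9,
        (False, True): 0.8, (False, False): 0.0,
    }

    def joint(g, s, r):
        """Joint probability via the factorization p(G|S,R) p(S|R) p(R)."""
        p_g = p_G_given_SR[(s, r)] if g else 1.0 - p_G_given_SR[(s, r)]
        return p_g * p_S_given_R[r][s] * p_R[r]

    print(joint(True, False, True))  # p(G=1, S=0, R=1) = 0.8 * 0.99 * 0.2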

Sampling from a DAG

Ancestral sampling (a code sketch follows below):
- Sample x_1 from p(x_1).
- If x_1 is a parent of x_2, sample x_2 from $p(x_2 \mid x_1)$; otherwise, sample x_2 from p(x_2).
- Go through the subsequent j in order, sampling x_j from $p(x_j \mid x_{\mathrm{pa}(j)})$.

Conditional sampling is easy if we condition on the first variables: fix these and run ancestral sampling. It is hard if we condition on the last variables, because conditioning on a descendant makes its ancestors dependent.
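A minimal ancestral-sampling sketch, reusing the hypothetical CPTs from the previous snippet (topological order R, S, G):

    import random

    def sample_sprinkler():
        """Draw one joint sample by visiting nodes in topological order,
        sampling each variable given its already-sampled parents."""
        r = random.random() < p_R[True]
        s = random.random() < p_S_given_R[r][True]
        g = random.random() < p_G_given_SR[(s, r)]
        return g, s, r

    samples = [sample_sprinkler() for _ in range(10_000)]
    print(sum(g for g, _, _ in samples) / len(samples))  # Monte Carlo p(G=1)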

It's hard to separate out causality from correlation

DAGs can be viewed as describing a causal process: the parents cause the children to take different values. The two systems of equations below are algebraically equivalent, and their graphs have the same conditional independences, but the causal stories are not the same; the graphs tell us something useful that the equations miss:

$Y = X + 1,\; Z = 2Y$   versus   $X = Y - 1,\; Y = Z/2$

There is observational data ("seeing") and interventional data ("doing"). Usually the DAG is designed from observational data, but that ignores the possibility of unobserved variables, and without interventional data you cannot distinguish the direction of causality.

The simplest external intervention forces a single variable to take some fixed value; in the graph, we remove the arrows entering that variable (see the sketch below).
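Graph surgery for an intervention can be sketched with networkx (an assumed dependency); the intervention simply deletes the edges entering the target node:

    import networkx as nx

    # Rain -> Sprinkler, Rain -> Grass, Sprinkler -> Grass
    G = nx.DiGraph([("R", "S"), ("R", "G"), ("S", "G")])

    def intervene(graph, node):
        """Return the mutilated graph for do(node): drop all incoming edges."""
        g = graph.copy()
        g.remove_edges_from(list(g.in_edges(node)))
        return g

    G_do_S = intervene(G, "S")
    print(sorted(G_do_S.edges()))  # [('R', 'G'), ('S', 'G')]: R -> S is gone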

D-separation

d-separation is a criterion for deciding, from a causal graph, whether a set A of variables is independent of another set B given a third set C, written $(A \perp B \mid C)$.

A and B are d-separated given C if, for every path P from A to B, at least one of the following holds:
- P includes a chain $a \to m \to b$ whose middle node m is observed (in C),
- P includes a fork $a \leftarrow m \to b$ whose parent node m is observed (in C),
- P includes a v-structure (collider) $a \to m \leftarrow b$ where neither m nor any of its descendants is in C.

A and B are d-separated given C iff the corresponding random variables are conditionally independent: $p(A, B \mid C) = p(A \mid C)\, p(B \mid C)$. If A and B are not d-separated, they are d-connected.
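d-separation can be checked mechanically; networkx ships a test for it (nx.d_separated in older releases, renamed nx.is_d_separator in newer ones; the older name is assumed here). A collider example:

    import networkx as nx

    # Sprinkler -> Grass <- Rain: G is a collider on the only S-R path
    G = nx.DiGraph([("S", "G"), ("R", "G")])

    print(nx.d_separated(G, {"S"}, {"R"}, set()))  # True: marginally independent
    print(nx.d_separated(G, {"S"}, {"R"}, {"G"}))  # False: conditioning on the
                                                   # collider G d-connects S and R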

The Causal Calculus (do-calculus, Pearl's causal calculus, calculus of actions)

In short: a calculus, due to Judea Pearl, for discussing causality in a formal language.

A new operator, do(), marks an action or intervention in the model. In an algebraic model we replace certain functions with a constant X = x; in a graph we remove the edges going into the target of the intervention but preserve the edges going out of it.

The causal calculus distinguishes Bayesian conditioning, $p(y \mid x)$, where x is an observed value, from causal conditioning, $p(y \mid \mathrm{do}(x))$, where an action is taken to force the specific value x. The goal is to derive probabilistic formulas for the effect of interventions in terms of the observed probabilities.

Judea Pearl, Causality: Models, Reasoning, and Inference, Cambridge University Press, 2000.
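The seeing/doing distinction can be made concrete with a tiny pure-confounding model, $U \to X$, $U \to Y$ (a hypothetical model with invented numbers): observing X changes our beliefs about Y, while forcing X does not, because the intervention cuts the edge $U \to X$.

    # Pure confounding: U -> X, U -> Y, no X -> Y edge (numbers hypothetical).
    pU = {0: 0.5, 1: 0.5}
    pX_U = {0: {1: 0.1, 0: 0.9}, 1: {1: 0.9, 0: 0.1}}  # p(x | u)
    pY_U = {0: {1: 0.2, 0: 0.8}, 1: {1: 0.7, 0: 0.3}}  # p(y | u)

    def p_y_given_x(y, x):
        """Bayesian conditioning ("seeing"): marginalize U from the joint."""
        num = sum(pU[u] * pX_U[u][x] * pY_U[u][y] for u in pU)
        den = sum(pU[u] * pX_U[u][x] for u in pU)
        return num / den

    def p_y_do_x(y, x):
        """Causal conditioning ("doing"): cutting U -> X makes X uninformative."""
        return sum(pU[u] * pY_U[u][y] for u in pU)

    print(p_y_given_x(1, 1))  # 0.65: association through the confounder
    print(p_y_do_x(1, 1))     # 0.45: intervening on X leaves Y's marginal alone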

Notation

Let G be a graph, and let W, X, Y, Z be disjoint subsets of the variables. $G_{\overline{X}}$ denotes the perturbed graph in which all edges pointing to X have been deleted, and $G_{\underline{X}}$ denotes the perturbed graph in which all edges pointing from X have been deleted. $Z(W)$ denotes the set of nodes in Z which are not ancestors of W.

(Image: Judea Pearl)

Pearl's 3 rules

Rule 1 (ignoring observations):
$p(y \mid \mathrm{do}(x), z, w) = p(y \mid \mathrm{do}(x), w)$ if $(Y \perp Z \mid X, W)_{G_{\overline{X}}}$

Rule 2 (action/observation exchange, the back-door criterion):
$p(y \mid \mathrm{do}(x), \mathrm{do}(z), w) = p(y \mid \mathrm{do}(x), z, w)$ if $(Y \perp Z \mid X, W)_{G_{\overline{X}\underline{Z}}}$

Rule 3 (ignoring actions/interventions):
$p(y \mid \mathrm{do}(x), \mathrm{do}(z), w) = p(y \mid \mathrm{do}(x), w)$ if $(Y \perp Z \mid X, W)_{G_{\overline{X}\,\overline{Z(W)}}}$

(Notation as on the previous slide; recall that $Z(W)$ is the set of nodes in Z which are not ancestors of W, here taken in $G_{\overline{X}}$.)

Intuition behind Pearl's first rule

With the condition $(Y \perp Z \mid X, W)_{G_{\overline{X}}}$ we have $p(y \mid \mathrm{do}(x), z, w) = p(y \mid \mathrm{do}(x), w)$.

Start with a simple case where there is no W and no X. The condition becomes $(Y \perp Z)_G$, so Y is independent of Z, that is, $p(y \mid z) = p(y)$.

For the second case, assume we have passively observed W but there is no variable X: $(Y \perp Z \mid W)_G$. By the connection between d-separation and conditional independence mentioned earlier, $p(y \mid z, w) = p(y \mid w)$.

For the third case, assume there is no W, but the value of X is set by intervention: $(Y \perp Z \mid X)_{G_{\overline{X}}}$. Applying the same theorem in the mutilated graph gives $p(y \mid z, \mathrm{do}(x)) = p(y \mid \mathrm{do}(x))$.

Combining these cases gives the full rule.

Example: smoking and lung cancer

Randomized controlled trials (RCTs, also called randomized clinical trials):
- The participants in the trial are randomly allocated either to the group receiving the treatment under investigation or to the control group.
- The control group removes the confounding factor of the placebo effect.
- Double-blind studies remove further confounding factors.
- RCTs are sometimes impractical or impossible.

We can try to use the causal calculus to analyze the probability that someone gets cancer (Y) given that they smoke (X), without doing an actual RCT: $p(y \mid \mathrm{do}(x))$. Note: we have no information about the hidden variable that could cause both smoking and cancer.

Example

We cannot usefully apply rule 1, because there are no observations to ignore; we would just get $p(y \mid \mathrm{do}(x)) = p(y \mid \mathrm{do}(x))$.

Try to apply rule 2. We would get $p(y \mid \mathrm{do}(x)) = p(y \mid x)$, that is, the intervention would not matter. Its condition is $(Y \perp X)_{G_{\underline{X}}}$, but Y and X are not d-separated in $G_{\underline{X}}$, because they have a common ancestor (the hidden variable). So rule 2 cannot be applied.

Try to apply rule 3. We would get $p(y \mid \mathrm{do}(x)) = p(y)$, that is, an intervention forcing someone to smoke would have no impact on whether they get cancer. Its condition is $(Y \perp X)_{G_{\overline{X}}}$, but Y and X are not d-separated in $G_{\overline{X}}$, because there is an unblocked path between them. So rule 3 cannot be applied. (Both failing preconditions are checked mechanically in the sketch below.)
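Assuming the standard smoking graph with an explicit hidden confounder ($U \to X$, $U \to Y$, $X \to Z \to Y$, where Z is tar deposits), the failing preconditions can be verified with networkx (again assuming nx.d_separated is available):

    import networkx as nx

    # U -> X, U -> Y (hidden confounder), X -> Z -> Y (tar pathway)
    G = nx.DiGraph([("U", "X"), ("U", "Y"), ("X", "Z"), ("Z", "Y")])

    # Rule 2 precondition: Y d-separated from X with edges *out of* X removed?
    G_under = G.copy(); G_under.remove_edges_from(list(G.out_edges("X")))
    print(nx.d_separated(G_under, {"Y"}, {"X"}, set()))  # False: X <- U -> Y

    # Rule 3 precondition: Y d-separated from X with edges *into* X removed?
    G_over = G.copy(); G_over.remove_edges_from(list(G.in_edges("X")))
    print(nx.d_separated(G_over, {"Y"}, {"X"}, set()))   # False: X -> Z -> Y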

Example

A new attempt:

$p(y \mid \mathrm{do}(x)) = \sum_z p(y \mid z, \mathrm{do}(x))\, p(z \mid \mathrm{do}(x))$
$\qquad = \sum_z p(y \mid z, \mathrm{do}(x))\, p(z \mid x)$  (rule 2: $(Z \perp X)_{G_{\underline{X}}}$)
$\qquad = \sum_z p(y \mid \mathrm{do}(z), \mathrm{do}(x))\, p(z \mid x)$  (rule 2: $(Y \perp Z \mid X)_{G_{\overline{X}\underline{Z}}}$)
$\qquad = \sum_z p(y \mid \mathrm{do}(z))\, p(z \mid x)$  (rule 3: $(Y \perp X \mid Z)_{G_{\overline{Z}\,\overline{X}}}$)

Example

We can apply the same approach to the first term on the right-hand side:

$p(y \mid \mathrm{do}(z)) = \sum_x p(y \mid x, \mathrm{do}(z))\, p(x \mid \mathrm{do}(z))$
$\qquad = \sum_x p(y \mid x, z)\, p(x)$  (rule 2 + rule 3)

Finally we can combine these results:

$p(y \mid \mathrm{do}(x)) = \sum_{z, x'} p(y \mid x', z)\, p(z \mid x)\, p(x')$

We can now compare $p(y)$ and $p(y \mid \mathrm{do}(x))$. The needed probabilities can be estimated directly from observational data: what fraction of smokers have lung cancer, how many of them have tar in their lungs, and so on.
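The final formula (the front-door adjustment) can be verified numerically. Below is a self-contained Python sketch with a hypothetical structural model (hidden U, smoking X, tar Z, cancer Y; all numbers invented): the front-door estimate, computed only from observational quantities with U marginalized out, matches the ground-truth interventional distribution.

    import itertools

    # Structural model for the front-door graph (numbers hypothetical):
    # U -> X, U -> Y (hidden confounding), X -> Z -> Y (tar pathway).
    pU = {0: 0.7, 1: 0.3}
    pX_U = {0: {1: 0.2, 0: 0.8}, 1: {1: 0.8, 0: 0.2}}  # p(x | u)
    pZ_X = {0: {1: 0.1, 0: 0.9}, 1: {1: 0.9, 0: 0.1}}  # p(z | x)
    pY_ZU = {(z, u): {1: p, 0: 1 - p}                  # p(y | z, u)
             for (z, u), p in {(0, 0): 0.05, (0, 1): 0.3,
                               (1, 0): 0.4, (1, 1): 0.8}.items()}

    vals = (0, 1)
    joint = {(u, x, z, y): pU[u] * pX_U[u][x] * pZ_X[x][z] * pY_ZU[(z, u)][y]
             for u, x, z, y in itertools.product(vals, repeat=4)}

    def marg(**fix):
        """Observational probability of an event, with U marginalized out."""
        return sum(pr for (u, x, z, y), pr in joint.items()
                   if all({'x': x, 'z': z, 'y': y}[k] == v for k, v in fix.items()))

    def front_door(y, x):
        """p(y | do(x)) = sum_z p(z | x) sum_x' p(y | x', z) p(x')."""
        return sum((marg(x=x, z=z) / marg(x=x)) *
                   sum((marg(x=xp, z=z, y=y) / marg(x=xp, z=z)) * marg(x=xp)
                       for xp in vals)
                   for z in vals)

    def truth(y, x):
        """Ground truth from the structural model: do(x) cuts U -> X."""
        return sum(pU[u] * pZ_X[x][z] * pY_ZU[(z, u)][y]
                   for u in vals for z in vals)

    print(front_door(1, 1), truth(1, 1))  # both 0.4805: the formula checks out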

Example: end

The analysis would not have worked if the graph had been missing the tar variable Z: there is no general way to compute $p(y \mid \mathrm{do}(x))$ from the observed distributions whenever the causal model contains the subgraph shown in the figure below (a hidden common cause of X and Y together with a direct edge $X \to Y$).

The causal calculus can be used to analyze causality in more complicated (and more unethical) situations than an RCT can handle. It can also be used to test whether unobserved variables have been missed, by removing all do terms from the relation and checking the result against observed data.

Not all models are acyclic. See, for example, Modeling Discrete Interventional Data Using Directed Cyclic Graphical Models (UAI 2009) by Mark Schmidt and Kevin Murphy.

Literature

- Judea Pearl, A Probabilistic Calculus of Actions, UAI 1994
- Tutorial by Michael Nielsen: http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/
- Tutorial in Formalised Thinking: https://formalisedthinking.wordpress.com/2010/08/20/pearls-formalisation-of-causality-sequence-index/
- Judea Pearl, Causality: Models, Reasoning, and Inference, Cambridge University Press, 2000
- Robert Tucci, Introduction to Judea Pearl's Do-Calculus, 2013

Thank you for listening. Questions?