Probabilistic Causal Models: A Short Introduction
Robin J. Evans (www.stat.washington.edu/~rje42)
ACMS Seminar, University of Washington, 24th February 2011

Acknowledgements. This work is joint with Thomas Richardson. Thanks also to James Robins, Antonio Forcina and Tamás Rudas.

Outline: 1. Introduction; 2. Independence; 3. ADMGs; 4. Models and Parametrizations; 5. Conclusions.

The Problem of Inference. www.xkcd.com/552/

Causation vs Association. Some examples: smoking and lung cancer; murder rates and ice cream sales; use of antidepressants and sales of video games.

Randomized Controlled Trials. Scientists, and particularly medical researchers, want to answer causal questions all the time: does diet affect weight (Diet →? Weight)? Solution: take 1,000 people and randomly divide them into two groups, assigning one group to a healthy diet. This breaks any possible confounding or any effect of weight on diet, so if we still measure a correlation, it must be causal (well, maybe). For this reason randomized controlled trials can be thought of as a gold standard for the scientific method (or at least I think so).

Gold Standards are Expensive... Does smoking cause cancer (Smoking →? Cancer)? In this case, forcing people to smoke is unethical. In many examples, randomized trials are (i) too expensive, (ii) impractical, or (iii) unethical. Thus we want to get the same information from observational data.

Technical Points. We will introduce three separate but related concepts. Structural equation models (SEMs): our model of how causality works. Independence: SEMs impose independence constraints, which is how we will test different causal models. Graphs: a coarser model than the SEM, but easier to deal with; they encode only the independence constraints, and allow us to visualize the independences.

How does Causality Work? We need a model of what causality means in the real world. We use structural equation models: let x_1, x_2, ..., x_k be our random variables, and assume x_i = f_i(x_1, ..., x_{i-1}, u_i) for some function f_i and noise term u_i. We allow correlation between pairs of noise terms u_i and u_j. Nature takes previous values and unobserved noise, and determines new values. Example: suppose that x_1 = f_1(u_1), x_2 = f_2(x_1, u_2), x_3 = f_3(u_3), x_4 = f_4(x_2, u_4), with u_2, u_3 correlated and u_3, u_4 correlated. [Graph: 1 → 2 → 4, with 2 ↔ 3 and 3 ↔ 4.]
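
To make this concrete, here is a minimal simulation of the example in Python (not from the talk: the linear functional forms and the Gaussian correlated-noise construction are invented for the sketch).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Noise terms: u2~u3 and u3~u4 correlated, all else independent (joint Gaussian).
cov = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.5, 0.0],
                [0.0, 0.5, 1.0, 0.5],
                [0.0, 0.0, 0.5, 1.0]])
u1, u2, u3, u4 = rng.multivariate_normal(np.zeros(4), cov, size=n).T

# Structural equations: each x_i is a function of earlier x's and its own noise.
x1 = u1                     # f1(u1)
x2 = 0.8 * x1 + u2          # f2(x1, u2)
x3 = u3                     # f3(u3)
x4 = -0.5 * x2 + u4         # f4(x2, u4)
```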

Interventions. This framework allows us to easily incorporate interventions, such as randomized experiments. Suppose we wish to perform an experiment by intervening on x_2: its structural equation x_2 = f_2(x_1, u_2) is replaced by x_2 = x̃_2, a value we choose, while x_1 = f_1(u_1), x_3 = f_3(u_3) and x_4 = f_4(x_2, u_4) are unchanged; only the u_3, u_4 correlation now remains relevant. A randomized controlled trial is one kind of intervention.
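
Continuing the sketch above, an intervention is "surgery" on the program: we overwrite the line defining x_2 and leave everything else alone (x̃_2 written as x2_tilde).

```python
x2_tilde = 1.0                  # the value we fix x2 at

x2 = np.full(n, x2_tilde)       # replaces x2 = f2(x1, u2); x1, x3 unchanged
x4 = -0.5 * x2 + u4             # downstream equation f4 propagates as before
```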

Marginal and Conditional Independence. We say two random variables X and Y are independent if the chance of X taking any particular value is unaffected by the value of Y: P(X = x | Y = y) = P(X = x) for all levels y. We write X ⊥⊥ Y. Example: P(raining | earthquake occurred) = P(raining). We say X and Y are independent conditional upon Z if X is unaffected by Y for each level z of Z: P(X = x | Y = y, Z = z) = P(X = x | Z = z) for all levels y, z. We write X ⊥⊥ Y | Z. Example: P(ground wet | cloudy, rainfall) = P(ground wet | rainfall).
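
A quick, entirely synthetic numerical check of such a statement: generate X and Y that depend only on a common Z, then test X against Y within each stratum of Z (the probabilities below are invented for illustration).

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
n = 50_000

# X <- Z -> Y: X and Y are marginally dependent, but X ind. Y given Z.
z = rng.integers(0, 2, n)
x = (rng.random(n) < np.where(z, 0.8, 0.2)).astype(int)
y = (rng.random(n) < np.where(z, 0.7, 0.3)).astype(int)

for zv in (0, 1):
    mask = z == zv
    table = np.array([[np.sum(mask & (x == i) & (y == j)) for j in (0, 1)]
                      for i in (0, 1)])
    _, pval, _, _ = chi2_contingency(table)
    print(f"Z = {zv}: p-value {pval:.3f}")   # large p-values: no dependence within strata
```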

Using Independence. How are independence and causality related? [Graph: Smoking → Tar → Cancer, with a dashed arrow Smoking → Cancer.] A causal question: does smoking cause cancer other than through putting tar in the lungs? Suppose we only have observational data. How would it look if the dashed arrow were not present? Then the relationship between Smoking and Cancer would be entirely mediated by Tar; thus we would observe that Smoking ⊥⊥ Cancer | Tar.

Using Independence. [Graph: Ice Cream ← Temp. → Murders.] Is temperature the confounder for ice cream sales and murder rates? Then we might expect that Ice Cream ⊥⊥ Murders | Temperature.

Acyclic Directed Mixed Graphs. We work with directed mixed graphs, which have two types of edge: directed (→) and bidirected (↔). [Example: 1 → 2 → 4 with 2 ↔ 3 and 3 ↔ 4.] No directed cycles are allowed (e.g. 1 → 2 → 3 → 1 is forbidden). We call this class Acyclic Directed Mixed Graphs (ADMGs). Without bidirected edges we get a Directed Acyclic Graph (DAG).

m-separation. Two vertices x and y are m-separated by a set Z if all paths from x to y are blocked by Z. A path is blocked if either: at least one collider on the path is not conditioned upon, and nor are any of its descendants; or: at least one non-collider on the path is conditioned upon. m-separation extends to sets X and Y if every x ∈ X and y ∈ Y are m-separated.
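
For DAGs, m-separation reduces to d-separation, which can be checked with the standard "moralize the ancestral subgraph" trick. A minimal sketch (not code from the talk) using networkx, with bidirected edges emulated by explicit latent parents:

```python
import networkx as nx
from itertools import combinations

def d_separated(G, X, Y, Z):
    """Are X and Y d-separated by Z in the DAG G? (Ancestral-moralization method.)"""
    # 1. Restrict to the ancestral subgraph of X, Y and Z.
    keep = set(X) | set(Y) | set(Z)
    for v in list(keep):
        keep |= nx.ancestors(G, v)
    H = G.subgraph(keep)
    # 2. Moralize: marry co-parents, then drop edge directions.
    M = nx.Graph(H.edges())
    M.add_nodes_from(H.nodes())
    for v in H:
        M.add_edges_from(combinations(H.predecessors(v), 2))
    # 3. Remove Z; separation holds iff no undirected path remains.
    M.remove_nodes_from(Z)
    return not any(nx.has_path(M, x, y) for x in X for y in Y)

# The example ADMG, with 2 <-> 3 and 3 <-> 4 replaced by latents u23, u34:
G = nx.DiGraph([(1, 2), (2, 4), ("u23", 2), ("u23", 3), ("u34", 3), ("u34", 4)])
print(d_separated(G, {1}, {4}, {2}))    # True: 1 ind. 4 given 2
print(d_separated(G, {1}, {3}, set())) # True: 1 ind. 3
```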

Global Markov Property. Let P be a distribution over the vertices of G. The global Markov property (GMP) for ADMGs states that: X m-separated from Y by Z ⟹ X ⊥⊥ Y | Z [P]. In fact, these independences hold in the structural equation model. [Graph: 1 → 2 → 4 with 2 ↔ 3 and 3 ↔ 4.] Here 1 ⊥⊥ 4 | 2 and 1 ⊥⊥ 3. ADMGs represent a coarser model class than the corresponding structural equation models; in other words, SEMs may impose additional structure (see appendix).

Markov Equivalence. Two or more different ADMGs can represent the same conditional independences: for example, 1 → 2 → 3, 1 ← 2 ← 3 and 1 ← 2 → 3 all imply exactly 1 ⊥⊥ 3 | 2. We cannot distinguish between these models observationally.

Set Up. We might be interested in some model M_G = {P : P obeys the GMP for G}. For simplicity we assume each variable is binary (two possible outcomes) and the distribution is positive. To fit a model like this to data, we need to characterize it explicitly; that is, we need a smooth, bijective map f : M_G → A ⊆ R^p to some nice set A. This is called a parametrization. There are many different choices, but p is fixed.

Saturated Model. Consider the model on three binary variables with all edges present. This imposes no restrictions of any kind (it is saturated). The model is characterized by the 8 probabilities P_000, P_100, P_010, P_110, P_001, P_101, P_011, P_111, where P_ijk ≡ P(X_1 = i, X_2 = j, X_3 = k). We know they sum to 1, so picking any 7 of these is a parametrization of the model. There are other parametrizations we could have used, but we always have 7 parameters.

Heart Disease Example. Imagine a simple system explaining coronary heart disease (CHD). [Graph: Family Hist. → CHD ← Smoking.] A priori, P(Family History) = 0.4 and P(Smoking) = 0.25, independently; the graph gives us Family History ⊥⊥ Smoking. P(CHD | FH, Smoking) is given by the table:

  FH \ Smoking      0        1
  0                 0.300    0.400
  1                 0.350    0.456

These 6 probabilities are a parametrization of the model.
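
As a sanity check on the parameter count (a small sketch using the numbers on the slide), the six parameters determine the full joint via P(FH, S, CHD) = P(FH) P(S) P(CHD | FH, S):

```python
import numpy as np

p_fh  = np.array([0.6, 0.4])          # P(FH = 0), P(FH = 1)
p_s   = np.array([0.75, 0.25])        # P(S = 0),  P(S = 1)
p_chd = np.array([[0.300, 0.400],     # P(CHD = 1 | FH, S): rows FH, columns S
                  [0.350, 0.456]])

# Joint probability of CHD = 1 with each (FH, S) value, and the marginal risk.
joint_chd1 = p_fh[:, None] * p_s[None, :] * p_chd
print(joint_chd1.sum())               # marginal P(CHD = 1) = 0.3456
```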

Model Selection. Goal: to select the correct ADMG from a data set. A simple approach: fit a series of ADMG models to the data, and see which one fits best according to some score. This requires (i) a parametrization for the model (Richardson, 2009) and (ii) a fitting algorithm (Evans and Richardson, 2010). The parametrization uses probabilities, as on the previous slide. We use the Bayesian information criterion (BIC) to score models: -2 log L + p log n, where p is the number of parameters, n is the sample size and L is the maximized likelihood. Models with more parameters are penalized.
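
A sketch of the scoring step (standard BIC; the log-likelihood values below are invented and would in practice come from the fitting algorithm):

```python
import numpy as np

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p log n (lower is better)."""
    return -2.0 * loglik + n_params * np.log(n_obs)

# Hypothetical comparison of two candidate models on the same data set:
print(bic(loglik=-1380.2, n_params=7, n_obs=1000))   # saturated model
print(bic(loglik=-1381.5, n_params=6, n_obs=1000))   # one-constraint model
```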

Parametrizations. The probability parametrization of the heart disease model gives no natural way to impose "no interaction" between the effects. [Figure: the heart disease graph annotated with parameter values 0.63, 0.88, 0.4, 0.3, 0.25.] Such a constraint doesn't impose any additional independences, so G is still the smallest correct (graphical) model; but the parametrization in Richardson (2009) makes it hard to impose this extra constraint.

Alternative Parametrizations. In Richardson and Evans (2011) we provide a new parametrization which overcomes some of these difficulties; the DAG case was first given by Rudas et al. (2006). In the heart disease example, the new parameters have the form log[P(CHD = 1 | FH = f, S = s) / P(CHD = 0 | FH = f, S = s)], i.e. conditional log-odds. We also get variation independence in some cases, which helps with interpretability as well as some applications.
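
Continuing the heart-disease sketch above, these log-odds parameters are immediate from the conditional table (p_chd as defined earlier):

```python
log_odds = np.log(p_chd / (1 - p_chd))   # one parameter per (FH, S) combination
print(np.round(log_odds, 3))             # e.g. log(0.3/0.7) = -0.847 for FH=0, S=0
```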

Application. Maathuis et al. (2010) looked at gene-expression data in yeast (5,361 genes). 234 observations from interventional experiments (single-gene deletions) were taken to be the truth; observational data came from 63 wild-type cultures. The aim is to recover causal relationships from the observational data alone. The authors use only DAGs (not realistic, but simple and effective): fit the model and look at the largest effects. Of the top 50 effects found, two-thirds were validated in the interventional data. A good way of providing biologists with places to start looking! We would like to extend this to ADMG models.

Summary. There is a close connection between graphical models and causal models. Randomized experiments are the best way of making causal connections, but they are sometimes not feasible. Sometimes a mixture of intuition, prior knowledge and observational data can go a long way. Some parametrizations are better than others.

Thank you!

The Verma Constraint. Consider a sequential clinical trial of a blood pressure drug: [Graph: Dose1 → BP1 → Dose2 → BP2, with hidden confounding BP1 ↔ BP2.] No independences hold. However, there is no arrow between Dose1 and BP2. Can we test for the presence or absence of this arrow?

An Intervention. It is clear that if we could perform an experiment to randomize Dose2, we would see whether Dose1 and BP2 were independent. However, it turns out that we can see the effect of this intervention even without performing it!
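
The constraint can be seen numerically. A hedged sketch (the data-generating process below is invented; it follows the classic Verma example, in which the functional sum over b1 of P(b2 | d1, b1, d2) P(b1 | d1) should not depend on d1 when the Dose1 → BP2 arrow is absent):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Hypothetical DGP: hidden U confounds BP1 and BP2; no Dose1 -> BP2 arrow.
u  = rng.random(n) < 0.5
d1 = rng.random(n) < 0.5
b1 = rng.random(n) < np.where(d1, 0.7, 0.3) * np.where(u, 0.9, 0.6)
d2 = rng.random(n) < np.where(b1, 0.8, 0.2)         # Dose2 depends on BP1 only
b2 = rng.random(n) < 0.2 + 0.5 * d2 + 0.25 * u      # BP2 depends on Dose2, U only

def q(d1v, d2v):
    """Estimate sum_{b1} P(b2 = 1 | d1, b1, d2) P(b1 | d1)."""
    out = 0.0
    for b1v in (False, True):
        stratum = (d1 == d1v) & (b1 == b1v) & (d2 == d2v)
        out += b2[stratum].mean() * (b1[d1 == d1v] == b1v).mean()
    return out

for d2v in (False, True):
    print(q(False, d2v), q(True, d2v))   # each pair should agree up to noise
```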

Instrumental Variables. A well-studied model is the instrumental variables model: [Graph: z → x → y, with x ↔ y.] This model imposes no independence constraints (or Verma constraints), but there is no edge from z to y. However, the SEM imposes certain inequality constraints, such as: P(X = 0, Y = 0 | Z = 0) + P(X = 0, Y = 1 | Z = 1) ≤ 1. These can be used to test the model (i.e. that there is no edge from z to y).
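
A small sketch of checking such constraints (the conditional table is invented for illustration; the slide's inequality is one instance of Pearl's instrumental inequality, which requires, for each x, that the sum over y of max over z of P(X = x, Y = y | Z = z) is at most 1):

```python
import numpy as np

# p[x, y, z] = P(X = x, Y = y | Z = z): hypothetical observed conditionals
p = np.empty((2, 2, 2))
p[:, :, 0] = [[0.4, 0.1], [0.2, 0.3]]   # given Z = 0
p[:, :, 1] = [[0.2, 0.3], [0.1, 0.4]]   # given Z = 1

def iv_inequalities_hold(p, tol=1e-9):
    """Pearl's instrumental inequality: for each x, sum_y max_z p[x, y, z] <= 1."""
    return all(p[x].max(axis=1).sum() <= 1 + tol for x in (0, 1))

print(iv_inequalities_hold(p))          # True: consistent with the IV model
# The slide's special case: P(0, 0 | Z=0) + P(0, 1 | Z=1) <= 1
print(p[0, 0, 0] + p[0, 1, 1] <= 1)
```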