Bayesian networks as causal models. Peter Antal


Bayesian networks as causal models. Peter Antal, antal@mit.bme.hu

- Can we represent (in)dependencies exactly by a BN? From a causal model? Sufficient and necessary?
- Can we interpret edges as causal relations: with no hidden variables? in the presence of hidden variables? with local models as autonomous mechanisms?
- Can we infer the effect of interventions?

In a Bayesian network, any query corresponding to passive observations can be answered: p(Q=q | E=e), the (conditional) probability of Q=q given that E=e. Note that Q can temporally precede E. Example network X → Y. Specification: p(X), p(Y|X). Joint distribution: p(X,Y). Inferences: p(x), p(y), p(y|x), p(x|y).
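As an illustration, here is a minimal Python sketch of the two-variable network above; the numbers in p(X) and p(Y|X) are made up for the example. It builds the joint from the specification and answers the "diagnostic" query p(x | Y=1), even though X precedes Y:

# Minimal two-node network X -> Y with binary variables (illustrative numbers).
p_x = {0: 0.7, 1: 0.3}                       # p(X)
p_y_given_x = {0: {0: 0.9, 1: 0.1},          # p(Y | X=0)
               1: {0: 0.2, 1: 0.8}}          # p(Y | X=1)

# Joint distribution: p(x, y) = p(x) * p(y | x)
p_xy = {(x, y): p_x[x] * p_y_given_x[x][y] for x in (0, 1) for y in (0, 1)}

# Marginal p(Y=1) and "diagnostic" inference p(x | Y=1) by Bayes' rule;
# X precedes Y in the model, yet we can still condition on Y.
p_y1 = sum(p_xy[(x, 1)] for x in (0, 1))
p_x_given_y1 = {x: p_xy[(x, 1)] / p_y1 for x in (0, 1)}
print(p_x_given_y1)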

Perfect intervention: do(X=x), "set X to x". What is the relation of p(Q=q | E=e) and p(Q=q | do(E=e))? In the network X → Y, with specification p(X), p(Y|X) and joint distribution p(X,Y): p(y | X=x) = p(y | do(X=x)), but p(x | Y=y) ≠ p(x | do(Y=y)). What is a formal knowledge representation of a causal model? What is the formal inference method?
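A numeric sketch of this asymmetry, reusing the made-up numbers above: conditioning and intervening coincide downstream of X, but differ upstream of Y, since do(Y=y) cuts the edge X → Y.

# Continuing the X -> Y example: intervening on Y severs Y's dependence on X,
# so p(x | do(Y=1)) is just the prior p(x), while p(x | Y=1) is not.
p_x = {0: 0.7, 1: 0.3}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}

# Downstream of the intervention nothing changes: p(y | do(X=x)) = p(y | X=x).
print(p_y_given_x[1])          # p(Y | X=1) == p(Y | do(X=1))

# Upstream it does: p(x | Y=1) != p(x | do(Y=1)) = p(x).
p_y1 = sum(p_x[x] * p_y_given_x[x][1] for x in (0, 1))
print({x: p_x[x] * p_y_given_x[x][1] / p_y1 for x in (0, 1)})  # p(x | Y=1)
print(p_x)                     # p(x | do(Y=1)): the edge X -> Y is cut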

Imaginary observations and interventions: we observed X=x, but imagine that x' had been observed, denoted X'=x'; we set X=x, but imagine that x' had been set, denoted do(X'=x'). What is the relation of
- Observational: p(Q=q | E=e, X=x')
- Interventional: p(Q=q | E=e, do(X=x'))
- Counterfactual: p(Q'=q' | Q=q, E=e, do(X=x), do(X'=x'))
O: What is the probability that the patient recovers if he takes drug x?
I: What is the probability that the patient recovers if we prescribe* drug x?
C: Given that the patient had not recovered on drug x, what would the probability of recovery have been if we had prescribed* drug x' instead of x?
*: Assume that the patient is fully compliant.
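The counterfactual query can be computed by Pearl's abduction-action-prediction recipe. Below is a minimal, illustrative Python sketch on a toy structural model; the variable names, the mechanism f_y, and all numbers are assumptions for the example, not the slide's model:

# Toy structural causal model: X = drug taken (1 = drug x), Y = recovery,
# U = latent patient type with prior p_u. Structural equation: Y = f_y(X, U).
p_u = {0: 0.6, 1: 0.4}                      # prior over patient types (assumed)
def f_y(x, u):                              # recovery mechanism (assumed)
    return 1 if (x == 1 and u == 1) or (x == 0 and u == 0) else 0

# Observation: the patient took drug x (X=1) and did not recover (Y=0).
# 1. Abduction: update the noise posterior p(u | X=1, Y=0).
post_u = {u: p_u[u] * (f_y(1, u) == 0) for u in p_u}
z = sum(post_u.values())
post_u = {u: w / z for u, w in post_u.items()}

# 2. Action: set do(X'=0), i.e. prescribe the alternative drug instead.
# 3. Prediction: probability of recovery in the modified model.
p_recover_cf = sum(post_u[u] * f_y(0, u) for u in p_u)
print(p_recover_cf)   # counterfactual P(Y'=1 | X=1, Y=0, do(X'=0))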

The domain is defined by the joint distribution P(X_1,..., X_n | structure, parameters).
1. Representation of parameters: a small number of parameters.
2. Representation of independencies: what is relevant for diagnosis.
3. Representation of causal relations: what is the effect of a treatment.
4. Representation of possible worlds: quantitative, qualitative? Passive (observational), active (interventional), imaginary (counterfactual).

- Strong association.
- X temporally precedes Y.
- Plausible explanation, without alternative explanations based on confounding.
- Necessity (generally: if the cause is removed, the effect is decreased; or actually: y would not have occurred with that much probability if x had not been present).
- Sufficiency (generally: if exposure to the cause is increased, the effect is increased; or actually: y would have occurred with larger probability if x had been present).
- An autonomous, transportable mechanism.
The probabilistic definition of causation formalizes many of these, but not, for example, the counterfactual aspects.

I_P(X;Y|Z), or (X ⊥ Y | Z)_P, denotes that X is independent of Y given Z: P(X,Y|z) = P(X|z) P(Y|z) for all z with P(z)>0. (Almost) alternatively, I_P(X;Y|Z) iff P(X|Z,Y) = P(X|Z) for all z,y with P(z,y)>0. Other notation: D_P(X;Y|Z) =def= ¬I_P(X;Y|Z). Contextual independence: independence that holds for some, but not all, z.
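The definition translates directly into a check on a joint probability table. A small Python sketch, assuming the table is a dict keyed by (x, y, z) with the conditioning variable in the last slot:

def is_cond_indep(p_xyz, tol=1e-9):
    """Check I(X;Y|Z) on a joint table p_xyz: {(x, y, z): prob}.
    Uses p(x,y,z) * p(z) == p(x,z) * p(y,z) wherever p(z) > 0."""
    p_z, p_xz, p_yz = {}, {}, {}
    for (x, y, z), pr in p_xyz.items():
        p_z[z] = p_z.get(z, 0.0) + pr
        p_xz[(x, z)] = p_xz.get((x, z), 0.0) + pr
        p_yz[(y, z)] = p_yz.get((y, z), 0.0) + pr
    return all(abs(pr * p_z[z] - p_xz[(x, z)] * p_yz[(y, z)]) <= tol
               for (x, y, z), pr in p_xyz.items() if p_z[z] > 0)

# Example: a Markov chain X -> Y -> Z satisfies I(X;Z|Y); put the
# conditioning variable Y in the last slot of the key.
chain = {(x, z, y): 0.5 * (0.8 if y == x else 0.2) * (0.7 if z == y else 0.3)
         for x in (0, 1) for y in (0, 1) for z in (0, 1)}
print(is_cond_indep(chain))   # True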

The independence map (model) M of a distribution P is the set of the valid independence triplets: M_P = {I_P,1(X_1;Y_1|Z_1),..., I_P,K(X_K;Y_K|Z_K)}. If P(X,Y,Z) is a Markov chain X → Y → Z, then M_P = {D(X;Y), D(Y;Z), I(X;Z|Y)}. Normally/almost always: D(X;Z). Exceptionally: I(X;Z).

If P(Y,X,Z) is a naive Bayesian network X ← Y → Z, then M_P = {D(X;Y), D(Y;Z), I(X;Z|Y)}. Normally/almost always: D(X;Z). Exceptionally: I(X;Z).

A Bayesian network is a directed acyclic graph (DAG): nodes are random variables/domain entities; edges are direct probabilistic dependencies (in the causal interpretation, edges are causal relations); local models are P(X_i | Pa(X_i)). Three interpretations:
1. Causal model.
2. Graphical representation of (in)dependencies: M_P = {I_P,1(X_1;Y_1|Z_1),...}.
3. Concise representation of joint distributions: P(M,O,D,S,T) = P(M) P(O|M) P(D|O,M) P(S|D) P(T|S,M).
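The third interpretation can be made concrete in a few lines of Python: five small local models generate the full joint table. The structure M → O, {M,O} → D, D → S, {S,M} → T is read off the factorization above; the CPT numbers are invented for the example:

from itertools import product

# Concise representation: five local models instead of a full 2^5-entry table.
def pM(m):          return 0.1 if m else 0.9
def pO(o, m):       return (0.7 if o else 0.3) if m else (0.2 if o else 0.8)
def pD(d, o, m):    q = 0.9 if (o or m) else 0.1; return q if d else 1 - q
def pS(s, d):       q = 0.8 if d else 0.3;        return q if s else 1 - q
def pT(t, s, m):    q = 0.6 if (s and m) else 0.2; return q if t else 1 - q

# Joint distribution by the chain-rule factorization of the slide:
# P(M,O,D,S,T) = P(M) P(O|M) P(D|O,M) P(S|D) P(T|S,M)
joint = {v: pM(v[0]) * pO(v[1], v[0]) * pD(v[2], v[1], v[0])
            * pS(v[3], v[2]) * pT(v[4], v[3], v[0])
         for v in product((0, 1), repeat=5)}
assert abs(sum(joint.values()) - 1.0) < 1e-9   # a valid distribution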

I_G(X;Y|Z) denotes that X is d-separated (directed separated) from Y by Z in the directed graph G.
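One standard way to test d-separation is via the ancestral graph: restrict to the ancestors of X ∪ Y ∪ Z, moralize (connect co-parents, drop directions), remove Z, and check ordinary graph separation. A sketch, assuming the DAG is given as a list of (parent, child) pairs:

def d_separated(edges, xs, ys, zs):
    """Test I_G(X;Y|Z) in the DAG `edges` by the ancestral-graph recipe."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # 1. Ancestral subgraph of X, Y, Z.
    anc, stack = set(), list(xs | ys | zs)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents.get(n, ()))
    # 2. Moralize: connect co-parents, then drop edge directions.
    nbr = {n: set() for n in anc}
    for v in anc:
        ps = parents.get(v, set())
        for p in ps:
            nbr[p].add(v); nbr[v].add(p)
        for p in ps:
            for q in ps:
                if p != q:
                    nbr[p].add(q)
    # 3. Remove Z and test undirected reachability from X to Y.
    seen, stack = set(), list(xs - zs)
    while stack:
        n = stack.pop()
        if n in seen or n in zs:
            continue
        seen.add(n)
        stack.extend(nbr[n] - zs)
    return not (seen & ys)

# The v-structure X -> Z <- Y: X and Y are d-separated given {},
# but conditioning on the collider Z connects them.
print(d_separated([('X', 'Z'), ('Y', 'Z')], {'X'}, {'Y'}, set()))   # True
print(d_separated([('X', 'Z'), ('Y', 'Z')], {'X'}, {'Y'}, {'Z'}))   # False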

In a first-order Markov chain X → Y → Z, despite the dependencies X–Y and Y–Z, X and Z can be independent (assuming a non-binary Y).
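A concrete (made-up) instance of such an unfaithful chain: Y has four states, and Z only reads a feature of Y that X does not influence.

from itertools import product

# Chain X -> Y -> Z: X-Y and Y-Z are dependent, yet X and Z are independent.
pX = {0: 0.5, 1: 0.5}
pYgX = {0: {0: 0.5, 1: 0.5, 2: 0.0, 3: 0.0},   # X=0 -> Y in {0, 1}
        1: {0: 0.0, 1: 0.0, 2: 0.5, 3: 0.5}}   # X=1 -> Y in {2, 3}
pZ1gY = {y: (1.0 if y in (1, 2) else 0.0) for y in range(4)}   # P(Z=1 | Y)

for x in (0, 1):   # P(Z=1 | X=x) is 0.5 for both x, so X and Z are independent
    print(sum(pYgX[x][y] * pZ1gY[y] for y in range(4)))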

Pairwise independence does not imply multivariate independence! Example: Y = X XOR Z with X, Z independent fair coins.
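This is easy to verify numerically; a short sketch of the XOR example with fair coins:

from itertools import product

# X, Z independent fair coins, Y = X XOR Z.
joint = {(x, z, x ^ z): 0.25 for x, z in product((0, 1), repeat=2)}
get = lambda x=None, z=None, y=None: sum(
    p for (a, b, c), p in joint.items()
    if (x in (None, a)) and (z in (None, b)) and (y in (None, c)))

# Pairwise independent: p(x, y) = p(x) p(y), likewise for the other pairs ...
print(get(x=1, y=1), get(x=1) * get(y=1))                   # 0.25 == 0.25
# ... but not jointly: p(x, z, y) != p(x) p(z) p(y).
print(get(x=1, z=1, y=0), get(x=1) * get(z=1) * get(y=0))   # 0.25 vs 0.125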

[Figure: a decision tree for P(D | Bleeding, Onset, Regularity, Mutation), splitting on Bleeding (absent/weak/strong), then Onset (early/late) or Regularity (regular/irregular), then Mutation (h.wild/mutated), with conditional probabilities such as P(D | Bleeding=strong) and P(D | absent, late, mutated) at the leaves.]
Decision tree: each internal node represents a (univariate) test; the leaves contain the conditional probabilities given the values along the path. Decision graph: if conditions are equivalent, then subtrees can be merged, e.g. if (Bleeding=absent, Onset=late) ~ (Bleeding=weak, Regularity=irregular). (A.I.: BN homework guide)
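A decision graph for such a local model can be written as ordinary branching code, where merged subtrees become a shared function; the probability values below are invented placeholders:

# P(D=1 | Bleeding, Onset, Regularity, Mutation) as a decision graph.
# The two equivalent conditions of the slide, (Bleeding=absent, Onset=late)
# and (Bleeding=weak, Regularity=irregular), share one merged subtree,
# so they also share the Mutation-split parameters.
def shared_mutation_leaf(mutation):           # the merged subtree
    return 0.6 if mutation == 'mutated' else 0.15

def p_d(bleeding, onset, regularity, mutation):
    if bleeding == 'strong':
        return 0.4                            # P(D | Bleeding=strong)
    if bleeding == 'absent':
        if onset == 'early':
            return 0.05                       # P(D | absent, early)
        return shared_mutation_leaf(mutation) # P(D | absent, late, Mutation)
    # bleeding == 'weak'
    if regularity == 'regular':
        return 0.1                            # P(D | weak, regular)
    return shared_mutation_leaf(mutation)     # P(D | weak, irregular, Mutation)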

For certain distributions, exact representation by a Bayesian network is not possible, e.g.:
1. Intransitive Markov chain: X → Y → Z.
2. Pure multivariate cause: {X,Z} → Y.
3. Diamond structure: P(X,Y,Z,V) with M_P = {D(X;Z), D(X;Y), D(V;X), D(V;Z), I(V;Y|{X,Z}), I(X;Z|{V,Y}), ...}.


Causal models (J. Pearl's analogy): a causal model is like a 3D object; what passive observations give us, P(X_1,...,X_n) with its independence map M_P = {I_P,1(X_1;Y_1|Z_1),..., I_P,K(X_K;Y_K|Z_K)}, is like its 2D projection. Different causal models can have the same independence map! Typically causal models cannot be identified from passive observations; they are observationally equivalent.

Causal models: the Markov chains X_1 → X_2 → X_3 → X_4 and X_1 ← X_2 ← X_3 ← X_4 define the same P(X_1,...) with M_P = {I(X_{i+1}; X_{i-1} | X_i)}, the first-order Markov property. Flow of time?

Three structures share the same ("transitive") independence map M_P = {D(X;Z), D(Z;Y), D(X;Y), I(X;Y|Z)}:
- the chain X → Z → Y, specified by p(X), p(Z|X), p(Y|Z);
- the chain X ← Z ← Y, specified by p(X|Z), p(Z|Y), p(Y);
- the fork X ← Z → Y, specified by p(X|Z), p(Z), p(Y|Z).
The v-structure X → Z ← Y, specified by p(X), p(Z|X,Y), p(Y), has a different ("intransitive") map: M_P = {D(X;Z), D(Y;Z), I(X;Y), D(X;Y|Z)}. Often: present knowledge renders future states conditionally independent (confounding). Ever(?): present knowledge renders past states conditionally independent (backward/atemporal confounding). A numeric check of the chain/fork equivalence follows below.
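A small numeric check (with made-up binary CPTs) that the chain X → Z → Y and the fork X ← Z → Y are observationally equivalent: reparameterizing the chain via Bayes' rule yields an identical joint distribution.

from itertools import product

# Chain X -> Z -> Y with arbitrary binary CPTs (illustrative numbers).
pX = {0: 0.6, 1: 0.4}
pZgX = {(z, x): p for x, ps in {0: (0.7, 0.3), 1: (0.2, 0.8)}.items()
                  for z, p in enumerate(ps)}
pYgZ = {(y, z): p for z, ps in {0: (0.9, 0.1), 1: (0.5, 0.5)}.items()
                  for y, p in enumerate(ps)}
chain = {(x, z, y): pX[x] * pZgX[(z, x)] * pYgZ[(y, z)]
         for x, z, y in product((0, 1), repeat=3)}

# Reparameterize as the fork X <- Z -> Y using Bayes' rule:
pZ = {z: sum(pX[x] * pZgX[(z, x)] for x in (0, 1)) for z in (0, 1)}
pXgZ = {(x, z): pX[x] * pZgX[(z, x)] / pZ[z]
        for x, z in product((0, 1), repeat=2)}
fork = {(x, z, y): pZ[z] * pXgZ[(x, z)] * pYgZ[(y, z)]
        for x, z, y in product((0, 1), repeat=3)}

# Identical joints: the two structures are observationally equivalent.
assert all(abs(chain[k] - fork[k]) < 1e-12 for k in chain)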

Causal model → dependency map: a causal model P(X_1,...,X_n) determines a single dependency map M_P = {I_P,1(X_1;X_2),...}: a one-to-one relation.

Dependency map → causal models: a dependency map M_P = {D_P,1(X_1;X_2),...} corresponds to many causal models (there is a DAG for each variable ordering, i.e. n! DAGs): a one-to-many relation.

A DAG is called a causal structure over a set of variables if each node represents a variable and each edge a direct influence. A causal model is a causal structure extended with local probabilistic models. A causal structure G and a distribution P satisfy the Causal Markov Condition (CMC) if P obeys the local Markov condition w.r.t. G. The distribution P is said to be stable (or faithful) if there exists a DAG, called a perfect map, exactly representing its (in)dependencies (i.e. I_G(X;Y|Z) ⇔ I_P(X;Y|Z) for all X,Y,Z ⊆ V). CMC: sufficiency of G (there is no extra, acausal edge). Faithfulness/stability: necessity of G (there are no extra, parametric independencies).

- (Passive, observational) inference: P(Query | Observations)
- Interventionist inference: P(Query | Observations, Interventions)
- Counterfactual inference: P(Query | Observations, Counterfactual conditionals)

If G is a causal model, then compute p(y | do(X=x)) by:
1. deleting the edges incoming to X,
2. setting X=x,
3. performing standard Bayesian network inference.
[Figure: an example network over Mutation, Subpopulation, Location, E, X, Y and Disease, illustrating the edge deletion.]
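A minimal sketch of the resulting "truncated factorization" on a small confounded model (the structure and numbers are assumptions for illustration, not the slide's example network): deleting the incoming edge of X amounts to dropping its local factor.

# Confounded model C -> X, C -> Y, X -> Y with illustrative numbers:
# p(c, x, y) = p(c) p(x | c) p(y | x, c)
pC = {0: 0.5, 1: 0.5}
pX1gC = {0: 0.2, 1: 0.9}                                        # P(X=1 | c)
pY1gXC = {(0, 0): 0.1, (1, 0): 0.5, (0, 1): 0.4, (1, 1): 0.8}   # P(Y=1 | x, c)

# Graph surgery for do(X=1): delete C -> X, i.e. drop the factor p(x | c)
# and clamp X=1:  p(Y=1 | do(X=1)) = sum_c p(c) p(Y=1 | X=1, c)
p_do = sum(pC[c] * pY1gXC[(1, c)] for c in (0, 1))

# Plain conditioning keeps the factor, so C shifts with the observed X:
#   p(Y=1 | X=1) = sum_c p(c | X=1) p(Y=1 | X=1, c)
pX1 = sum(pC[c] * pX1gC[c] for c in (0, 1))
p_obs = sum(pC[c] * pX1gC[c] / pX1 * pY1gXC[(1, c)] for c in (0, 1))
print(p_do, p_obs)   # 0.65 vs ~0.75: observing != intervening under confounding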

Causal models compatible with an observed association:
- X → Y: X causes Y.
- X ← Y: Y causes X.
- X ← * → Y: there is a common cause (pure confounding).
- X ← *...* → Y: the causal effect of Y on X is confounded by many factors.
From passive observations we only get P(X,Y) with M_P = {D(X;Y)}: X and Y are associated. Reichenbach's Common Cause Principle: a correlation between events X and Y indicates either that X causes Y, or that Y causes X, or that X and Y have a common cause.

Can we learn causal relations from observational data in the presence of confounders? Example: does smoking cause lung cancer, or does a genetic polymorphism* both increase the propensity to smoke and the susceptibility to lung cancer? Automated, tabula rasa causal inference from (passive) observation is possible, i.e. hidden, confounding variables can be excluded in certain structures (e.g. E → X → Y, with a possible hidden confounder between X and Y).

H. Simon: X_i = f_i(X_1,...,X_{i-1}) for i = 1..n. In the linear case the (triangular) system of equations indicates a natural causal ordering (flow of time?). The probabilistic conceptualization is its generalization: P(X_i | X_1,...,X_{i-1}) ~ X_i = f_i(X_1,...,X_{i-1}).
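As a sketch, Simon's scheme is exactly how one samples from a structural model: evaluate each mechanism in the given order on its predecessors plus independent noise (the mechanisms below are invented for illustration):

import random

# Each X_i is an autonomous mechanism f_i of its predecessors plus noise.
def sample():
    u = [random.random() for _ in range(3)]
    x1 = 1 if u[0] < 0.5 else 0                     # X1 = f1(U1)
    x2 = x1 + (1 if u[1] < 0.3 else 0)              # X2 = f2(X1, U2)
    x3 = 1 if x1 + x2 > 1 and u[2] < 0.9 else 0     # X3 = f3(X1, X2, U3)
    return x1, x2, x3

# The probabilistic generalization reads the same ordering as CPTs,
# P(X1) P(X2 | X1) P(X3 | X1, X2) -- sampling in causal order.
print([sample() for _ in range(5)])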

Can we represent (in)dependencies exactly by a BN? Almost always.
Can we interpret edges as causal relations
- with no hidden variables? Compelled edges serve as a filter.
- in the presence of hidden variables? Sometimes, e.g. confounding can be excluded in certain cases.
- in local models as autonomous mechanisms? With a priori knowledge, e.g. the Causal Markov Assumption.
Can we infer the effect of interventions in a causal model? Graph surgery with standard inference in BNs.
Optimal study design to infer the effect of interventions?
- With no hidden variables: yes, in a non-Bayesian framework.
- In the presence of hidden variables: an open issue.
Suggested reading: J. Pearl: Causal inference in statistics, 2009.