Causal Bayesian networks. Peter Antal


Causal Bayesian networks Peter Antal antal@mit.bme.hu A.I. 4/8/2015 1

Overview questions:
- Can we represent (in)dependencies exactly by a BN? From a causal model? Sufficient and necessary?
- Can we interpret edges as causal relations with no hidden variables? In the presence of hidden variables?
- Can we interpret local models as autonomous mechanisms?
- Can we infer the effect of interventions?
- Is there an optimal study design to infer the effect of interventions?

In a Bayesian network, any query corresponding to passive observations can be answered: p(Q=q | E=e), the (conditional) probability of Q=q given that E=e. Note that Q can temporally precede E. Model X → Y. Specification: p(X), p(Y|X). Joint distribution: p(X,Y). Inferences: p(X), p(Y), p(Y|X), p(X|Y).

Perfect intervention: do(X=x), i.e. set X to x. What is the relation of p(Q=q | E=e) and p(Q=q | do(E=e))? Model X → Y. Specification: p(X), p(Y|X). Joint distribution: p(X,Y). Inferences: p(Y | X=x) = p(Y | do(X=x)), but p(X | Y=y) ≠ p(X | do(Y=y)). What is a formal knowledge representation of a causal model? What is the formal inference method?
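The contrast between conditioning and intervening in the two-node model above can be checked numerically. The following sketch uses hypothetical CPT values for binary X and Y:

```python
# Two-node causal model X -> Y with hypothetical CPT values (binary variables).
p_x = {0: 0.7, 1: 0.3}                      # p(X)
p_y_given_x = {0: {0: 0.9, 1: 0.1},         # p(Y | X=0)
               1: {0: 0.2, 1: 0.8}}         # p(Y | X=1)

def p_x_given_y(x, y):
    """Observational p(X=x | Y=y) by Bayes' rule."""
    num = p_x[x] * p_y_given_x[x][y]
    den = sum(p_x[xp] * p_y_given_x[xp][y] for xp in p_x)
    return num / den

def p_x_do_y(x, y):
    """Interventional p(X=x | do(Y=y)): intervening on Y deletes Y's
    incoming edge X -> Y, so X keeps its prior distribution."""
    return p_x[x]

print(p_x_given_y(1, 1))  # ~0.774: observing Y=1 raises belief in X=1
print(p_x_do_y(1, 1))     # 0.3: forcing Y=1 tells us nothing about X
```

As the slide states, inference downstream along an edge agrees with intervention, while inference against the edge does not.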

Imaginary observations and interventions: we observed X=x, but imagine that x' had been observed, denoted X'=x'; we set X=x, but imagine that x' had been set, denoted do(X'=x'). What is the relation of:
Observational: p(Q=q | E=e, X=x')
Interventional: p(Q=q | E=e, do(X=x'))
Counterfactual: p(Q'=q' | Q=q, E=e, do(X=x), do(X'=x'))
O: What is the probability that the patient recovers if he takes drug x?
I: What is the probability that the patient recovers if we prescribe* drug x?
C: Given that the patient did not recover on drug x, what would the probability of recovery have been if we had prescribed* drug x' instead of x?
~C (time-shifted): Given that the patient did not recover on drug x and has not responded well**, what is the probability that the patient will recover if we change the prescribed* drug x to x'?
*: Assume that the patient is fully compliant. **: and is not expected to respond either.

The domain is defined by the joint distribution P(X_1,..., X_n | structure, parameters).
1. Representation of parameters: a small number of parameters.
2. Representation of independencies: what is relevant for diagnosis.
3. Representation of causal relations: what is the effect of a treatment.
4. Representation of possible worlds: quantitative and qualitative; passive (observational), active (interventional), imaginary (counterfactual).

Criteria for causation: strong association; X precedes Y temporally; a plausible explanation without alternative explanations based on confounding; necessity (generally: if the cause is removed, the effect is decreased; counterfactually: y would not have occurred with that much probability if x had not been present); sufficiency (generally: if exposure to the cause is increased, the effect is increased; counterfactually: y would have occurred with larger probability if x had been present); an autonomous, transportable mechanism. The probabilistic definition of causation formalizes many of these aspects, but not, for example, the counterfactual ones.

In the first-order Markov chain X → Y → Z below, despite the dependencies X–Y and Y–Z, X and Z can be independent (assuming a non-binary Y).

Pairwise independence does not imply multivariate independence! Example: X → Y ← Z with Y = X XOR Z.
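The XOR construction can be verified by full enumeration. This sketch assumes X and Z are independent fair coins and Y = X XOR Z:

```python
from itertools import product

# Joint distribution over (X, Z, Y) with Y = X XOR Z; X, Z independent fair coins.
joint = {(x, z, x ^ z): 0.25 for x, z in product((0, 1), repeat=2)}

def marg(i):
    """Marginal of the i-th coordinate (0=X, 1=Z, 2=Y)."""
    p = {}
    for xzy, pr in joint.items():
        p[xzy[i]] = p.get(xzy[i], 0.0) + pr
    return p

def marg2(i, j):
    """Joint marginal of coordinates i and j."""
    p = {}
    for xzy, pr in joint.items():
        key = (xzy[i], xzy[j])
        p[key] = p.get(key, 0.0) + pr
    return p

# Every pair is independent...
for i, j in [(0, 1), (0, 2), (1, 2)]:
    pij, pi, pj = marg2(i, j), marg(i), marg(j)
    assert all(abs(pij[a, b] - pi[a] * pj[b]) < 1e-12 for a, b in pij)

# ...but the three variables are not jointly independent:
# p(X=0, Z=0, Y=0) = 0.25, while the product of marginals is 0.125.
assert abs(joint[(0, 0, 0)] - 0.125) > 0.1
print("pairwise independent, not jointly independent")
```

Any two of the three variables determine the third, which is exactly what joint independence would forbid.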

I_P(X;Y|Z) or (X ⊥ Y | Z)_P denotes that X is independent of Y given Z: P(X;Y|z) = P(Y|z) P(X|z) for all z with P(z)>0. (Almost) alternatively, I_P(X;Y|Z) iff P(X|Z,Y) = P(X|Z) for all z,y with P(z,y)>0. Other notation: D_P(X;Y|Z) =def= ¬I_P(X;Y|Z). Contextual independence: the equality holds for some, but not all, z.

The independence map (model) M of a distribution P is the set of the valid independence triplets: M_P = {I_P,1(X_1;Y_1|Z_1),..., I_P,K(X_K;Y_K|Z_K)}. If P(X,Y,Z) is a Markov chain X → Y → Z, then M_P = {D(X;Y), D(Y;Z), I(X;Z|Y)}. Normally/almost always: D(X;Z). Exceptionally: I(X;Z).

J. Pearl: Probabilistic Reasoning in Intelligent Systems, 1988.

If P(Y,X,Z) is a naive Bayesian network X ← Y → Z, then M_P = {D(X;Y), D(Y;Z), I(X;Z|Y)}. Normally/almost always: D(X;Z). Exceptionally: I(X;Z).

A Bayesian network is a directed acyclic graph (DAG): nodes are random variables/domain entities; edges are direct probabilistic dependencies (in the causal interpretation, causal relations); local models give P(X_i | Pa(X_i)). Three interpretations: 1. Concise representation of joint distributions, e.g. P(M,O,D,S,T) = P(M) P(O|M) P(D|O,M) P(S|D) P(T|S,M). 2. Graphical representation of (in)dependencies: M_P = {I_P,1(X_1;Y_1|Z_1),...}. 3. Causal model.
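Interpretation 1 (concise representation) can be illustrated by building the joint P(M,O,D,S,T) above from hypothetical binary CPTs; the factorized model needs only 13 free parameters versus 31 for the unconstrained joint:

```python
from itertools import product

# Hypothetical CPTs for the factorization
# P(M,O,D,S,T) = P(M) P(O|M) P(D|O,M) P(S|D) P(T|S,M), all variables binary.
pM = {0: 0.8, 1: 0.2}
pO = {m: {0: 0.9 - 0.3 * m, 1: 0.1 + 0.3 * m} for m in (0, 1)}
pD = {(o, m): ({0: 0.7, 1: 0.3} if o == 0
               else {0: 0.4 - 0.2 * m, 1: 0.6 + 0.2 * m})
      for o, m in product((0, 1), repeat=2)}
pS = {d: {0: 0.95 - 0.5 * d, 1: 0.05 + 0.5 * d} for d in (0, 1)}
pT = {(s, m): {0: 0.9 - 0.4 * s - 0.1 * m, 1: 0.1 + 0.4 * s + 0.1 * m}
      for s, m in product((0, 1), repeat=2)}

def joint(m, o, d, s, t):
    """Joint probability from the chain of local models."""
    return pM[m] * pO[m][o] * pD[o, m][d] * pS[d][s] * pT[s, m][t]

total = sum(joint(*v) for v in product((0, 1), repeat=5))
print(total)  # a proper joint: sums to 1
# Parameter count: 1 + 2 + 4 + 2 + 4 = 13 free parameters,
# versus 2**5 - 1 = 31 for the full joint table.
```

The saving grows exponentially with the number of variables when in-degrees stay small.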

I_G(X;Y|Z) denotes that X is d-separated (directed separated) from Y by Z in the directed graph G.

For certain distributions an exact representation by a Bayesian network is not possible, e.g.:
1. Intransitive Markov chain: X → Y → Z.
2. Pure multivariate cause: {X,Z} → Y.
3. Diamond structure: P(X,Y,Z,V) with M_P = {D(X;Z), D(X;Y), D(V;X), D(V;Z), I(V;Y|{X,Z}), I(X;Z|{V,Y}), ...}.

Causal models: X_1 → X_2 → X_3 → X_4 and X_1 ← X_2 ← X_3 ← X_4. A Markov chain P(X_1,...) with M_P = {I(X_{i+1};X_{i-1}|X_i)}: the first-order Markov property. Flow of time?

Transitive (Markov-equivalent) models:
p(X), p(Z|X), p(Y|Z) for X → Z → Y
p(X|Z), p(Z|Y), p(Y) for X ← Z ← Y
p(X|Z), p(Z), p(Y|Z) for X ← Z → Y
with M_P = {D(X;Z), D(Z;Y), D(X;Y), I(X;Y|Z)}.
Intransitive model (v-structure):
p(X), p(Z|X,Y), p(Y) for X → Z ← Y
with M_P = {D(X;Z), D(Y;Z), I(X;Y), D(X;Y|Z)}.
Often: present knowledge renders future states conditionally independent (confounding). Ever(?): present knowledge renders past states conditionally independent (backward/atemporal confounding).
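The conditional dependence induced at a v-structure ("explaining away") can be demonstrated numerically. This sketch assumes a hypothetical mechanism Z = OR(X, Y) with independent fair coins X and Y:

```python
from itertools import product

# v-structure X -> Z <- Y with Z = OR(X, Y); X and Y independent fair coins.
joint = {(x, y, x | y): 0.25 for x, y in product((0, 1), repeat=2)}

def p(pred):
    """Probability of the event described by predicate pred(x, y, z)."""
    return sum(pr for xyz, pr in joint.items() if pred(*xyz))

# Marginal independence I(X;Y): p(X=1, Y=1) = p(X=1) p(Y=1)
assert p(lambda x, y, z: x == 1 and y == 1) == \
       p(lambda x, y, z: x == 1) * p(lambda x, y, z: y == 1)

# Conditional dependence D(X;Y|Z): condition on the collider Z=1.
pz1 = p(lambda x, y, z: z == 1)                       # 3/4
px1_z1 = p(lambda x, y, z: x == 1 and z == 1) / pz1   # 2/3
px1_y1z1 = (p(lambda x, y, z: x == 1 and y == 1 and z == 1)
            / p(lambda x, y, z: y == 1 and z == 1))   # 1/2
# Learning Y=1 "explains away" X as the cause of Z=1:
assert abs(px1_z1 - px1_y1z1) > 0.1
print(px1_z1, px1_y1z1)
```

This is exactly the D(X;Y|Z) entry in the v-structure's independence map above: marginally independent causes become dependent once their common effect is observed.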

Causal models (J. Pearl's analogy: like 3D objects). From passive observations we see P(X_1,..., X_n) and M_P = {I_P,1(X_1;Y_1|Z_1),..., I_P,K(X_K;Y_K|Z_K)}: a 2D projection. Different causal models can have the same independence map! Typically causal models cannot be identified from passive observations; they are observationally equivalent.

Causal model → dependency map: from P(X_1,..., X_n) to M_P = {I_P,1(X_1;X_2),...}. A one-to-one relation.

Causal models (there is a DAG for each ordering, i.e. n! DAGs) → dependency map: from P(X_1,..., X_n) to M_P = {D_P,1(X_1;X_2),...}. A one-to-many relation.

A DAG is called a causal structure over a set of variables if each node represents a variable and each edge a direct influence. A causal model is a causal structure extended with local probabilistic models. A causal structure G and a distribution P satisfy the Causal Markov Condition if P obeys the local Markov condition w.r.t. G. The distribution P is said to be stable (or faithful) if there exists a DAG, called a perfect map, exactly representing its (in)dependencies (i.e. I_G(X;Y|Z) ⇔ I_P(X;Y|Z) for all X,Y,Z ⊆ V). CMC: sufficiency of G (there is no extra, acausal edge). Faithfulness/stability: necessity of G (there is no extra, parametric independency).

(Passive, observational) inference: P(Query | Observations). Interventionist inference: P(Query | Observations, Interventions). Counterfactual inference: P(Query | Observations, Counterfactual conditionals).

If G is a causal model, then compute p(y | do(X=x)) by: 1. deleting the edges incoming to X, 2. setting X=x, 3. performing standard Bayesian network inference. (Slide diagram: does mutation X cause disease Y, with subpopulation and location as possible confounders E?)
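The three-step graph surgery amounts to the truncated factorization. The sketch below uses a hypothetical confounded model Z → X, Z → Y, X → Y (made-up CPT values) and shows that p(y | do(X=x)) and p(y | X=x) differ:

```python
# Graph surgery on the hypothetical model Z -> X, Z -> Y, X -> Y (binary).
# Deleting the edge into X and fixing X=x yields the truncated factorization
# p(y | do(X=x)) = sum_z p(z) p(y|x,z), while the observational quantity is
# p(y | X=x) = sum_z p(z|x) p(y|x,z).
pZ = {0: 0.5, 1: 0.5}
pX_given_Z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
pY_given_XZ = {(x, z): {0: 0.9 - 0.3 * x - 0.4 * z, 1: 0.1 + 0.3 * x + 0.4 * z}
               for x in (0, 1) for z in (0, 1)}

def p_y_do_x(y, x):
    """Interventional: Z keeps its prior because the edge Z -> X is cut."""
    return sum(pZ[z] * pY_given_XZ[x, z][y] for z in pZ)

def p_y_given_x(y, x):
    """Observational: conditioning on X=x also updates the confounder Z."""
    px = sum(pZ[z] * pX_given_Z[z][x] for z in pZ)
    return sum(pZ[z] * pX_given_Z[z][x] * pY_given_XZ[x, z][y] for z in pZ) / px

print(p_y_do_x(1, 1))     # 0.6
print(p_y_given_x(1, 1))  # ~0.756: the association overstates the causal effect
```

Step 3 is carried out here by exact enumeration; in a larger network it would be any standard BN inference algorithm run on the surgically modified graph.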

Causal models compatible with an observed association:
X → Y (X causes Y)
X ← Y (Y causes X)
X ← * → Y (there is a common cause: pure confounding)
X ← * ... * → Y (the causal effect of Y on X is confounded by many factors)
From passive observations: M_P = {D(X;Y)}, P(X,Y): X and Y are associated. Reichenbach's Common Cause Principle: a correlation between events X and Y indicates either that X causes Y, or that Y causes X, or that X and Y have a common cause.

Can we learn causal relations from observational data in the presence of confounders? Example: smoking → increased propensity → lung cancer, versus a genetic polymorphism* causing both smoking and increased susceptibility to lung cancer. Is automated, tabula rasa causal inference from (passive) observation possible, i.e. can hidden, confounding variables be excluded?

H. Simon: X_i = f_i(X_1,..,X_{i-1}) for i=1..n. In the linear case the system of equations has a lower-triangular coefficient matrix, which indicates a natural causal ordering (flow of time?). The probabilistic conceptualization is its generalization: P(X_i | X_1,..,X_{i-1}) ~ X_i = f_i(X_1,..,X_{i-1}).
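A minimal linear instance of this triangular system (hypothetical coefficients) can be solved by following the natural causal order:

```python
# Hypothetical linear instance of Simon's recursive system
# X_i = f_i(X_1,...,X_{i-1}): lower-triangular coefficients,
# solved in the natural causal order.
coef = {1: {}, 2: {1: 0.5}, 3: {1: 1.0, 2: -0.3}, 4: {2: 0.2, 3: 0.7}}

def solve(eps):
    """Solve the triangular system for given exogenous disturbances eps[i]."""
    x = {}
    for i in sorted(coef):  # the triangular structure fixes the order X_1..X_4
        x[i] = sum(w * x[j] for j, w in coef[i].items()) + eps[i]
    return x

x = solve({1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0})
print(x)  # X1=1.0, X2=0.5, X3=0.85, X4=0.695
```

Replacing each deterministic disturbance with a random one turns each equation into a local model P(X_i | X_1,...,X_{i-1}), which is the probabilistic generalization the slide refers to.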

Summary:
Can we represent (in)dependencies exactly by a BN? Almost always.
Can we interpret edges as causal relations with no hidden variables? Compelled edges serve as a filter.
In the presence of hidden variables? Sometimes, e.g. confounding can be excluded in certain cases.
Local models as autonomous mechanisms? A priori knowledge, e.g. the Causal Markov Assumption.
Can we infer the effect of interventions in a causal model? Graph surgery with standard inference in BNs.
Optimal study design to infer the effect of interventions? With no hidden variables: yes, in a non-Bayesian framework. In the presence of hidden variables: an open issue.
Suggested reading: J. Pearl: Causal inference in statistics, 2009.