Graphical models and causality: Directed acyclic graphs (DAGs) and conditional (in)dependence


General overview
- Introduction
- Directed acyclic graphs (DAGs) and conditional independence
- DAGs and causal effects
- Learning DAGs from observational data
- IDA algorithm
- Further problems

Overview: DAGs and conditional (in)dependence
- Directed acyclic graphs
- Factorization of the joint density
- Markov property
- d-separation

Graph terminology
- A graph G = (V, E) consists of vertices (nodes) V and edges E. There is at most one edge between every pair of vertices.
- Two vertices are adjacent if there is an edge between them.
- If all edges are directed (i → j), the graph is called directed.
- A path between i and j is a sequence of distinct vertices (i, ..., j) such that successive vertices are adjacent.
- A directed path from i to j is a path between i and j where all edges point towards j, i.e., i → ⋯ → j. A non-directed path from i to j is a path between i and j that is not directed.
- A cycle is a path (i, j, ..., k) plus an edge between k and i. A directed cycle is a directed path (i, j, ..., k) from i to k, plus an edge k → i.
- A directed acyclic graph (DAG) is a directed graph without directed cycles.

Example
[Figure: DAG on vertices 1, ..., 5 with edges 1 → 2, 1 → 4, 2 → 4, 3 → 4, 4 → 5]
Which of the following are (directed) paths?
- (4,2,1,3): not a path, since 1 and 3 are not adjacent
- (3,4,2,1,4,5): not a path, since the vertices are not distinct
- (3,4,2,1): a path, but not a directed path
- (3,4,5): a directed path from 3 to 5
(1,2,4,1) is a cycle, but not a directed cycle. This graph is a DAG. Adding the edge 5 → 2 yields a directed cycle (2,4,5,2); the graph is then no longer a DAG.
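These path checks can be verified mechanically. Below is a minimal sketch in base R (the helper names are my own): the example DAG is encoded as an adjacency matrix with A[i, j] = 1 iff i → j, and the two path definitions are applied directly.

A <- matrix(0, 5, 5)
A[1, 2] <- A[1, 4] <- A[2, 4] <- A[3, 4] <- A[4, 5] <- 1  # A[i, j] = 1 iff i -> j

adjacent <- function(i, j) A[i, j] == 1 || A[j, i] == 1

is_path <- function(v) {            # distinct vertices, successive pairs adjacent
  !anyDuplicated(v) &&
    all(mapply(adjacent, v[-length(v)], v[-1]))
}
is_directed_path <- function(v) {   # all edges point towards the last vertex
  is_path(v) && all(A[cbind(v[-length(v)], v[-1])] == 1)
}

is_path(c(4, 2, 1, 3))          # FALSE: 1 and 3 are not adjacent
is_path(c(3, 4, 2, 1, 4, 5))    # FALSE: vertex 4 repeats
is_path(c(3, 4, 2, 1))          # TRUE, but ...
is_directed_path(c(3, 4, 2, 1)) # ... FALSE: not all edges point towards 1
is_directed_path(c(3, 4, 5))    # TRUE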

Graph terminology
- If i → j, then i is a parent of j, and j is a child of i.
- If there is a directed path from i to j, then i is an ancestor of j and j is a descendant of i. Each vertex is also an ancestor and a descendant of itself.
- The sets of parents, children, descendants and ancestors of i in G are denoted by pa(i, G), ch(i, G), desc(i, G), an(i, G). We omit G if the graph is clear from the context.
- We write sets of variables in bold face. All definitions are applied disjunctively to sets. Example: pa(S) = ∪_{k ∈ S} pa(k).
- The non-descendants of S are the complement of desc(S): nondesc(S) = V \ desc(S).

Example
[Figure: same DAG as before, edges 1 → 2, 1 → 4, 2 → 4, 3 → 4, 4 → 5]
- 1 is a parent of 4, and 4 is a child of 1
- 1 is an ancestor of 5, and 5 is a descendant of 1
- ch(4) = {5}, desc(4) = {4,5}, pa(4) = {1,2,3}, an(4) = {1,2,3,4}
- pa({2,4}) = {1} ∪ {1,2,3} = {1,2,3}
- desc({2,3}) = {2,4,5} ∪ {3,4,5} = {2,3,4,5}
- nondesc({2,3}) = {1,2,3,4,5} \ {2,3,4,5} = {1}
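The set operations follow the same pattern. A minimal base-R sketch (reusing the matrix A from the previous snippet) reproduces the numbers on this slide:

pa <- function(S) which(rowSums(A[, S, drop = FALSE]) > 0)
ch <- function(S) which(colSums(A[S, , drop = FALSE]) > 0)
desc <- function(S) {               # S plus everything reachable from S
  out <- S
  repeat {
    nxt <- sort(union(out, ch(out)))
    if (length(nxt) == length(out)) return(nxt)
    out <- nxt
  }
}
an <- function(S) {                 # S plus everything with a directed path into S
  out <- S
  repeat {
    nxt <- sort(union(out, pa(out)))
    if (length(nxt) == length(out)) return(nxt)
    out <- nxt
  }
}
nondesc <- function(S) setdiff(1:5, desc(S))

pa(c(2, 4))       # 1 2 3
desc(c(2, 3))     # 2 3 4 5
an(4)             # 1 2 3 4
nondesc(c(2, 3))  # 1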

DAGs and random variables
Each vertex represents a random variable: we use i to denote both the vertex i and the random variable X_i. An edge denotes a relationship between the pair of variables (we will make this more precise later).

Factorization of the joint density
Suppose we have p binary random variables. Then specifying the joint distribution requires a probability table of size 2^p. Can we do this more efficiently?
We always have f(x_1, …, x_p) = f(x_1) f(x_2 | x_1) ⋯ f(x_p | x_1, …, x_{p−1}).
A set of variables pa(j) is said to be the Markovian parents of X_j if it is a minimal subset of {X_1, …, X_{j−1}} such that f(x_j | x_1, …, x_{j−1}) = f(x_j | pa(j)).
Then f(x_1, …, x_p) = ∏_{j=1}^{p} f(x_j | pa(j)), and we can draw a DAG accordingly: each vertex j gets arrows from its Markovian parents pa(j).
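To make the size argument concrete, here is a minimal sketch in base R (all numbers are made up): a joint over three binary variables is built from the chain factorization f(x_1) f(x_2 | x_1) f(x_3 | x_2), which needs 1 + 2 + 2 = 5 free parameters instead of the 2^3 − 1 = 7 of a full table.

f1  <- c(0.6, 0.4)                      # f(x1)
f21 <- rbind(c(0.7, 0.3), c(0.2, 0.8))  # f(x2 | x1), rows indexed by x1
f32 <- rbind(c(0.9, 0.1), c(0.5, 0.5))  # f(x3 | x2), rows indexed by x2

grid  <- expand.grid(x1 = 1:2, x2 = 1:2, x3 = 1:2)
joint <- with(grid, f1[x1] * f21[cbind(x1, x2)] * f32[cbind(x2, x3)])
sum(joint)                              # 1: a proper joint density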

Examples (X 1, X 2, X 3 ) and X 1 X 3 X 2 is only (conditional) independence: f x 1 x 2, x 3 = f x 1 x 2 and f x 3 x 1, x 2 = f(x 3 x 2 ) Then Or Or f x 1, x 2, x 3 = f x 1 f x 2 x 1 f x 3 x 1, x 2 = f x 1 f x 2 x 1 f x 3 x 2 DAG: 1 2 3 f x 3, x 2, x 1 = f x 3 f x 2 x 3 f x 1 x 2, x 3 = f x 3 f x 2 x 3 f x 1 x 2 DAG: 3 2 1 f x 1, x 3, x 2 = f x 1 f x 3 x 1 f x 2 x 1, x 3 DAG: 1 3 2 10
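Continuing the numeric sketch above (same grid and joint), the second factorization can be checked directly: the conditionals are recovered from the joint by marginalization, and the product reproduces the joint because X_1 ⊥ X_3 | X_2 holds by construction in the chain.

f3  <- tapply(joint, grid$x3, sum)                             # f(x3)
f23 <- prop.table(tapply(joint, grid[c("x3", "x2")], sum), 1)  # f(x2 | x3)
f12 <- prop.table(tapply(joint, grid[c("x2", "x1")], sum), 1)  # f(x1 | x2)
alt <- with(grid, f3[x3] * f23[cbind(x3, x2)] * f12[cbind(x2, x1)])
all.equal(unname(alt), joint)                                  # TRUE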

Example
First-order Markov chain: f(x_1, …, x_p) = f(x_1) f(x_2 | x_1) ⋯ f(x_p | x_1, …, x_{p−1}) = f(x_1) f(x_2 | x_1) ⋯ f(x_p | x_{p−1})
DAG: 1 → 2 → ⋯ → p

Bayesian network
A Bayesian network is a pair (G, f), where f factorizes according to G:
f(x_1, …, x_p) = ∏_{j=1}^{p} f(x_j | pa(j, G))
Bayesian networks can be used for:
- Estimation of the joint density from low-order conditional densities (if the parent sets are small)
- Reading off conditional independencies from the DAG
- Probabilistic reasoning (expert systems)
- Causal inference
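As a software sketch of the pair (G, f), assuming the bnlearn R package (this is my illustration, not code from the deck), the five-node example DAG can be encoded and its parent sets read off:

library(bnlearn)                 # assumed to be installed from CRAN
dag <- model2network("[X1][X3][X2|X1][X4|X1:X2:X3][X5|X4]")
parents(dag, "X4")               # "X1" "X2" "X3", matching pa(4) = {1,2,3}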


Reading off conditional independencies: Markov property
Markov model: 1 → 2 → ⋯ → t−1 → t → t+1. The future is independent of the past given the present: X_{t+1} ⊥ {X_1, …, X_{t−1}} | X_t
In Bayesian networks, we have a similar Markov property. Let S be any collection of nodes. Then S is independent of its non-descendants given its parents: S ⊥ nondesc(S) \ pa(S) | pa(S)

Example Markov property
[Figure: DAG with edges smoking → yellow teeth, smoking → tar in lungs, tar in lungs → cancer, asbestos → cancer]
Note: pa(yellow teeth) = {smoking} and cancer ∈ nondesc(yellow teeth).
Hence, yellow teeth ⊥ cancer | smoking in any distribution that factorizes according to this DAG.

Graph terminology
- The skeleton of a graph is the graph obtained by removing all arrowheads.
- A node i is a collider on a path if the path contains → i ← (the arrows collide at i). Otherwise, it is a non-collider on the path.
[Figure: same example DAG as before: 1 → 2, 1 → 4, 2 → 4, 3 → 4, 4 → 5]
Examples:
- 4 is a collider on the path (3,4,1)
- 4 is a non-collider on the path (3,4,5)
Collider status is relative to a path.
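A direct translation of the collider definition, reusing the adjacency matrix A from the earlier sketches:

is_collider <- function(v, k)  # is v[k] a collider on the path v?
  A[v[k - 1], v[k]] == 1 && A[v[k + 1], v[k]] == 1

is_collider(c(3, 4, 1), 2)   # TRUE:  3 -> 4 <- 1
is_collider(c(3, 4, 5), 2)   # FALSE: 3 -> 4 -> 5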

Reading off conditional independencies: d-separation
The Markov property cannot be used to read off arbitrary conditional (in)dependencies. For this we have d-separation.
A path between i and j is blocked by a set S if at least one of the following holds:
- there is a non-collider on the path that is in S; or
- there is a collider on the path such that neither this collider nor any of its descendants is in S.
A path that is not blocked is d-connecting. If all paths between i and j are blocked by S, then i and j are d-separated by S. Otherwise they are d-connected given S.
In any distribution that factorizes according to a DAG: if i and j are d-separated by S in the DAG, then X_i and X_j are conditionally independent given X_S in the distribution.
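For small graphs the definition can be turned into a brute-force d-separation test: enumerate all paths between i and j in the skeleton and check whether each one is blocked. A base-R sketch, reusing A, desc and is_collider from the earlier snippets (production libraries use faster algorithms):

all_paths <- function(i, j, v = i) {   # all paths from i to j (distinct vertices)
  if (i == j) return(list(v))
  nb <- setdiff(which(A[i, ] == 1 | A[, i] == 1), v)
  unlist(lapply(nb, function(k) all_paths(k, j, c(v, k))), recursive = FALSE)
}
blocked <- function(v, S) {            # is the path v blocked by the set S?
  idx <- seq_along(v)[-c(1, length(v))]
  any(vapply(idx, function(k)
    if (is_collider(v, k)) !any(desc(v[k]) %in% S) else v[k] %in% S,
    logical(1)))
}
dsep <- function(i, j, S = integer(0))
  all(vapply(all_paths(i, j), blocked, logical(1), S = S))

dsep(1, 3)          # TRUE:  every path is blocked by the collider 4
dsep(1, 3, S = 4)   # FALSE: conditioning on the collider 4 opens a path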

Examples d-separation
[Figure: same DAG: smoking → yellow teeth, smoking → tar in lungs, tar in lungs → cancer, asbestos → cancer]
Denote d-separation by ⊥_d. Which of the following hold?
- yellow teeth ⊥_d cancer | smoking? Yes
- tar ⊥_d asbestos? Yes
- tar ⊥_d asbestos | cancer? No
- yellow teeth ⊥_d asbestos | cancer? No
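These four answers can be verified with the bnlearn R package (the node names are shorthands I chose; the bnlearn:: prefix avoids a clash with the toy dsep() defined above):

library(bnlearn)
dag <- model2network(
  "[smoking][asbestos][yellow.teeth|smoking][tar|smoking][cancer|tar:asbestos]")
bnlearn::dsep(dag, "yellow.teeth", "cancer", "smoking")   # TRUE
bnlearn::dsep(dag, "tar", "asbestos")                     # TRUE
bnlearn::dsep(dag, "tar", "asbestos", "cancer")           # FALSE
bnlearn::dsep(dag, "yellow.teeth", "asbestos", "cancer")  # FALSE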

Bayesian network (recap)
Of the uses listed above, probabilistic reasoning (expert systems) is illustrated in the accompanying R code, and causal inference is the next topic.