Markov properties for directed graphs

Graphical Models, Lecture 7, Michaelmas Term 2009 November 2, 2009

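Definitions Structural relations among Markov properties Factorization

Let $G = (V, E)$ be a simple undirected graph and $\perp_\sigma$ a (semi)graphoid relation. Say $\perp_\sigma$ satisfies

(P) the pairwise Markov property if $\alpha \not\sim \beta \implies \alpha \perp_\sigma \beta \mid V \setminus \{\alpha, \beta\}$;

(L) the local Markov property if $\forall \alpha \in V: \alpha \perp_\sigma V \setminus \operatorname{cl}(\alpha) \mid \operatorname{bd}(\alpha)$;

(G) the global Markov property if $A \perp_G B \mid S \implies A \perp_\sigma B \mid S$.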

For any semigraphoid it holds that (G) $\implies$ (L) $\implies$ (P). If $\perp_\sigma$ satisfies the graphoid axioms it further holds that (P) $\implies$ (G), so that in the graphoid case (G) $\iff$ (L) $\iff$ (P). The latter holds in particular for probabilistic conditional independence $\perp\!\!\!\perp$ when $f(x) > 0$.

Assume a density $f$ w.r.t. a product measure on $\mathcal{X}$. For $a \subseteq V$, $\psi_a(x)$ denotes a function which depends on $x_a$ only, i.e. $x_a = y_a \implies \psi_a(x) = \psi_a(y)$. We can then write $\psi_a(x) = \psi_a(x_a)$ without ambiguity.

The distribution of $X$ factorizes w.r.t. $G$, or satisfies (F), if

$$f(x) = \prod_{a \in \mathcal{A}} \psi_a(x),$$

where the sets in $\mathcal{A}$ are complete subsets of $G$. Complete subsets of a graph are sets with all elements pairwise neighbours.

Factorization theorem

Let (F) denote the property that $f$ factorizes w.r.t. $G$ and let (G), (L) and (P) denote the Markov properties w.r.t. $\perp\!\!\!\perp$. It then holds that (F) $\implies$ (G) and further: if $f(x) > 0$ for all $x$, (P) $\implies$ (F). Thus in the case of positive density (but typically only then), all the properties coincide: (F) $\iff$ (G) $\iff$ (L) $\iff$ (P).

A directed acyclic graph D over a finite set V is a simple graph with all edges directed and no directed cycles. We use DAG for brevity. Absence of directed cycles means that, following arrows in the graph, it is impossible to return to any point. Graphical models based on DAGs have proved fundamental and useful in a wealth of interesting applications, including expert systems, genetics, complex biomedical statistics, causal analysis, and machine learning.

Example of a directed graphical model

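A semigraphoid relation $\perp_\sigma$ satisfies the local Markov property (L) w.r.t. a directed acyclic graph $D$ if

$$\forall \alpha \in V: \alpha \perp_\sigma \{\operatorname{nd}(\alpha) \setminus \operatorname{pa}(\alpha)\} \mid \operatorname{pa}(\alpha).$$

Here $\operatorname{nd}(\alpha)$ are the non-descendants of $\alpha$ and $\operatorname{pa}(\alpha)$ are its parents. A well-known example is a Markov chain $X_1 \to X_2 \to X_3 \to X_4 \to X_5 \to \cdots \to X_n$, with $X_{i+1} \perp\!\!\!\perp (X_1, \ldots, X_{i-1}) \mid X_i$ for $i = 2, \ldots, n-1$.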

[Figure: the example DAG on vertices 1, ..., 7.]

For example, the local Markov property says

$4 \perp_\sigma \{1, 3, 5, 6\} \mid 2$,
$5 \perp_\sigma \{1, 4\} \mid \{2, 3\}$,
$3 \perp_\sigma \{2, 4\} \mid 1$.
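The local Markov statements for the example can be generated mechanically. The following is a minimal Python sketch, assuming the example DAG has the parent sets implied by the factorization given for the same graph in these notes (the figure itself is not reproduced, so `PARENTS` is an assumption); the function names are illustrative only.

```python
# Parent sets read off the factorization stated for the example DAG
# (an assumption about the figure, which is not available here).
PARENTS = {1: set(), 2: {1}, 3: {1}, 4: {2}, 5: {2, 3}, 6: {3, 5}, 7: {4, 5, 6}}


def descendants(parents, v):
    """All vertices reachable from v by following arrows (v itself excluded)."""
    children = {u: {w for w in parents if u in parents[w]} for u in parents}
    found, stack = set(), [v]
    while stack:
        for c in children[stack.pop()]:
            if c not in found:
                found.add(c)
                stack.append(c)
    return found


def local_markov(parents):
    """Map each vertex a to the pair (nd(a) minus pa(a), pa(a))."""
    out = {}
    for a in parents:
        nd = set(parents) - descendants(parents, a) - {a}
        out[a] = (nd - parents[a], set(parents[a]))
    return out


statements = local_markov(PARENTS)
for a, (rest, pa) in sorted(statements.items()):
    print(f"{a} is independent of {sorted(rest)} given {sorted(pa)}")
```

Under these assumed parent sets, the entries for vertices 4, 5 and 3 agree with the three statements listed above.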

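Suppose the vertices $V$ of a DAG $D$ are well-ordered in the sense that they are linearly ordered in a way which is compatible with $D$, i.e. so that $\alpha \in \operatorname{pa}(\beta) \implies \alpha < \beta$. We then say a semigraphoid relation $\perp_\sigma$ satisfies the ordered Markov property (O) w.r.t. a well-ordered DAG $D$ if

$$\forall \alpha \in V: \alpha \perp_\sigma \{\operatorname{pr}(\alpha) \setminus \operatorname{pa}(\alpha)\} \mid \operatorname{pa}(\alpha).$$

Here $\operatorname{pr}(\alpha)$ are the predecessors of $\alpha$, i.e. those vertices which are before $\alpha$ in the well-ordering.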

[Figure: the example DAG on vertices 1, ..., 7.]

The numbering corresponds to a well-ordering. The ordered Markov property says for example

$4 \perp_\sigma \{1, 3\} \mid 2$,
$5 \perp_\sigma \{1, 4\} \mid \{2, 3\}$,
$3 \perp_\sigma \{2\} \mid 1$.
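The ordered Markov statements differ from the local ones only in replacing non-descendants by predecessors. A minimal Python sketch, again assuming the parent sets implied by the factorization given for the example graph in these notes (`PARENTS` and the function name are assumptions for illustration):

```python
# Parent sets assumed from the factorization stated for the example DAG.
PARENTS = {1: set(), 2: {1}, 3: {1}, 4: {2}, 5: {2, 3}, 6: {3, 5}, 7: {4, 5, 6}}


def ordered_markov(parents, order):
    """Map each vertex a to (pr(a) minus pa(a), pa(a)) for a given well-ordering."""
    out = {}
    for i, a in enumerate(order):
        pr = set(order[:i])  # predecessors of a in the ordering
        if not parents[a] <= pr:
            raise ValueError(f"ordering is not compatible with the DAG at vertex {a}")
        out[a] = (pr - parents[a], set(parents[a]))
    return out


statements = ordered_markov(PARENTS, [1, 2, 3, 4, 5, 6, 7])
for a, (rest, pa) in sorted(statements.items()):
    print(f"{a} is independent of {sorted(rest)} given {sorted(pa)}")
```

The compatibility check raises an error if some parent comes after its child, which is exactly the condition $\alpha \in \operatorname{pa}(\beta) \implies \alpha < \beta$ failing.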

Separation in DAGs

A trail $\tau$ from vertex $\alpha$ to vertex $\beta$ in a DAG $D$ is blocked by $S$ if it contains a vertex $\gamma \in \tau$ such that

either $\gamma \in S$ and edges of $\tau$ do not meet head-to-head at $\gamma$,
or $\gamma$ and all its descendants are not in $S$, and edges of $\tau$ meet head-to-head at $\gamma$.

A trail that is not blocked is active. Two subsets $A$ and $B$ of vertices are d-separated by $S$ if all trails from $A$ to $B$ are blocked by $S$. We write $A \perp_D B \mid S$.

Separation by example

[Figure: two copies of the example DAG on vertices 1, ..., 7.]

For $S = \{5\}$, the trail $(4, 2, 5, 3, 6)$ is active, whereas the trails $(4, 2, 5, 6)$ and $(4, 7, 6)$ are blocked. For $S = \{3, 5\}$, they are all blocked.

Returning to the example

[Figure: the example DAG on vertices 1, ..., 7.]

Hence $4 \perp_D 6 \mid 3, 5$, but it is not true that $4 \perp_D 6 \mid 5$ nor that $4 \perp_D 6$.
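The blocking rule can be checked mechanically on this example. The following is a minimal Python sketch, assuming the parent sets implied by the factorization given for the example graph in these notes, and restricted to two single vertices that are not themselves in $S$. It enumerates only simple (non-self-intersecting) trails, which is a brute-force choice suitable for a small graph; all names are illustrative.

```python
# Parent sets assumed from the factorization stated for the example DAG.
PARENTS = {1: set(), 2: {1}, 3: {1}, 4: {2}, 5: {2, 3}, 6: {3, 5}, 7: {4, 5, 6}}


def children_of(parents):
    return {u: {w for w in parents if u in parents[w]} for u in parents}


def descendants(parents, v):
    ch, found, stack = children_of(parents), set(), [v]
    while stack:
        for c in ch[stack.pop()]:
            if c not in found:
                found.add(c)
                stack.append(c)
    return found


def simple_trails(parents, a, b):
    """Yield every simple trail from a to b, ignoring edge directions."""
    ch = children_of(parents)
    nbrs = {v: parents[v] | ch[v] for v in parents}

    def extend(trail):
        if trail[-1] == b:
            yield tuple(trail)
            return
        for n in nbrs[trail[-1]]:
            if n not in trail:
                yield from extend(trail + [n])

    yield from extend([a])


def is_blocked(parents, trail, S):
    """Apply the two blocking conditions at each interior vertex of the trail."""
    for prev, g, nxt in zip(trail, trail[1:], trail[2:]):
        head_to_head = prev in parents[g] and nxt in parents[g]
        if head_to_head:
            # blocked if neither g nor any of its descendants is in S
            if not (({g} | descendants(parents, g)) & S):
                return True
        elif g in S:
            return True
    return False


def d_separated(parents, a, b, S):
    return all(is_blocked(parents, t, S) for t in simple_trails(parents, a, b))


print(d_separated(PARENTS, 4, 6, {3, 5}))
print(d_separated(PARENTS, 4, 6, {5}))
```

For larger graphs, trail enumeration grows quickly, so a reachability-based or moralization-based test would be the more practical choice.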

Equivalence of Markov properties

A semigraphoid relation $\perp_\sigma$ satisfies the global Markov property (G) w.r.t. $D$ if $A \perp_D B \mid S \implies A \perp_\sigma B \mid S$. It holds for any DAG $D$ and any semigraphoid relation $\perp_\sigma$ that all directed Markov properties are equivalent: (G) $\iff$ (L) $\iff$ (O). There is also a pairwise property (P), but it is less natural than in the undirected case and it is weaker than the others.

A probability distribution $P$ over $\mathcal{X} = \mathcal{X}_V$ factorizes over a DAG $D$ if its density or probability mass function $f$ has the form

$$(F): \quad f(x) = \prod_{v \in V} k_v(x_v \mid x_{\operatorname{pa}(v)}),$$

where $k_v \geq 0$ and $\int_{\mathcal{X}_v} k_v(x_v \mid x_{\operatorname{pa}(v)}) \, \mu_v(dx_v) = 1$. (F) is equivalent to $(F^*)$, where

$$(F^*): \quad f(x) = \prod_{v \in V} f(x_v \mid x_{\operatorname{pa}(v)}),$$

i.e. it follows from (F) that the $k_v$ are in fact conditional densities/pmfs. Proof by induction!

Example of DAG factorization

[Figure: the example DAG on vertices 1, ..., 7.]

The above graph corresponds to the factorization

$$f(x) = f(x_1) f(x_2 \mid x_1) f(x_3 \mid x_1) f(x_4 \mid x_2) f(x_5 \mid x_2, x_3) f(x_6 \mid x_3, x_5) f(x_7 \mid x_4, x_5, x_6).$$
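A small numerical check can make the factorization concrete. The sketch below assumes binary variables and randomly generated kernels (both are assumptions for illustration, not part of the lecture), builds the joint as the product of kernels over the parent sets read off the factorization above, and confirms that it sums to one and that a kernel coincides with the corresponding conditional distribution.

```python
import itertools
import random

# Parent sets read off the factorization above.
PARENTS = {1: set(), 2: {1}, 3: {1}, 4: {2}, 5: {2, 3}, 6: {3, 5}, 7: {4, 5, 6}}
ORDER = sorted(PARENTS)


def random_kernels(parents, seed=0):
    """One kernel per vertex: for each parent configuration, a pmf on {0, 1}."""
    rng = random.Random(seed)
    kernels = {}
    for v, pa in parents.items():
        pa = tuple(sorted(pa))
        table = {}
        for cfg in itertools.product((0, 1), repeat=len(pa)):
            p1 = rng.uniform(0.1, 0.9)
            table[cfg] = (1.0 - p1, p1)
        kernels[v] = (pa, table)
    return kernels


def joint(kernels, x):
    """f(x) as the product of kernels; x maps vertex -> 0 or 1."""
    prob = 1.0
    for v, (pa, table) in kernels.items():
        prob *= table[tuple(x[u] for u in pa)][x[v]]
    return prob


kernels = random_kernels(PARENTS)
states = [dict(zip(ORDER, bits)) for bits in itertools.product((0, 1), repeat=len(ORDER))]
total = sum(joint(kernels, x) for x in states)


def conditional(v, xv, given):
    """P(X_v = xv | given), computed by brute-force summation over the joint."""
    match = [x for x in states if all(x[u] == val for u, val in given.items())]
    den = sum(joint(kernels, x) for x in match)
    num = sum(joint(kernels, x) for x in match if x[v] == xv)
    return num / den


print(total)
print(conditional(5, 1, {2: 0, 3: 1}), kernels[5][1][(0, 1)][1])
```

The two numbers on the last line should agree up to rounding, illustrating that the kernel for vertex 5 is the conditional pmf of $x_5$ given $(x_2, x_3)$.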

Contrast with undirected factorization

[Figure: the undirected version of the example graph on vertices 1, ..., 7.]

Factors $\psi$ are typically not normalized as conditional probabilities:

$$f(x) = \psi_{12}(x_1, x_2) \psi_{13}(x_1, x_3) \psi_{24}(x_2, x_4) \psi_{25}(x_2, x_5) \psi_{356}(x_3, x_5, x_6) \psi_{47}(x_4, x_7) \psi_{567}(x_5, x_6, x_7).$$

Markov properties and factorization

In the directed case it is essentially always true that (F) holds if and only if $P$ satisfies (G), so all directed Markov properties are equivalent to the factorization property: (F) $\iff$ (G) $\iff$ (L) $\iff$ (O).

The moral graph $D^m$ of a DAG $D$ is obtained by adding undirected edges between unmarried parents and subsequently dropping directions, as in the example below:

[Figure: three panels of the example graph on vertices 1, ..., 7.]
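Moralization is short to express in code. A minimal Python sketch, assuming the example DAG has the parent sets implied by the factorization given for it in these notes; edges are represented as two-element frozensets and the function names are illustrative:

```python
from itertools import combinations

# Parent sets assumed from the factorization stated for the example DAG.
PARENTS = {1: set(), 2: {1}, 3: {1}, 4: {2}, 5: {2, 3}, 6: {3, 5}, 7: {4, 5, 6}}


def skeleton(parents):
    """Undirected edges obtained by dropping directions."""
    return {frozenset((u, v)) for v, pa in parents.items() for u in pa}


def moral_graph(parents):
    """Edges of the moral graph: the skeleton plus edges joining co-parents."""
    edges = skeleton(parents)
    for pa in parents.values():
        for u, w in combinations(sorted(pa), 2):
            edges.add(frozenset((u, w)))
    return edges


# Edges that had to be added, i.e. the pairs of unmarried parents.
added = moral_graph(PARENTS) - skeleton(PARENTS)
print(sorted(tuple(sorted(e)) for e in added))
```

Under the assumed parent sets, the added edges are 2-3 (common child 5) and 4-5, 4-6 (common child 7); the pairs 3-5 and 5-6 are already joined in the DAG.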

Undirected factorizations

If $P$ factorizes w.r.t. $D$, it factorizes w.r.t. the moralised graph $D^m$. This is seen directly from the factorization:

$$f(x) = \prod_{v \in V} f(x_v \mid x_{\operatorname{pa}(v)}) = \prod_{v \in V} \psi_{\{v\} \cup \operatorname{pa}(v)}(x),$$

since the sets $\{v\} \cup \operatorname{pa}(v)$ are all complete in $D^m$. Hence if $P$ satisfies any of the directed Markov properties w.r.t. $D$, it satisfies all Markov properties for $D^m$.

Alternative equivalent separation

To resolve a query involving three sets $A$, $B$, $S$:

1. Reduce to the subgraph induced by the ancestral set $D_{\operatorname{An}(A \cup B \cup S)}$ of $A \cup B \cup S$;
2. Moralize to form $(D_{\operatorname{An}(A \cup B \cup S)})^m$;
3. Say that $S$ m-separates $A$ from $B$ and write $A \perp_m B \mid S$ if and only if $S$ separates $A$ from $B$ in this undirected graph.

It then holds that $A \perp_m B \mid S$ if and only if $A \perp_D B \mid S$. The proof in Lauritzen (1996) needs to allow self-intersecting paths to be correct.

Forming ancestral set

[Figure: the subgraph on vertices 1, ..., 6.]

The subgraph induced by all ancestors of nodes involved in the query $4 \perp_m 6 \mid 3, 5$?

Adding links between unmarried parents

[Figure: the subgraph on vertices 1, ..., 6 with an added edge between 2 and 3.]

Adding an undirected edge between 2 and 3, which have common child 5, in the subgraph induced by all ancestors of nodes involved in the query $4 \perp_m 6 \mid 3, 5$?

Dropping directions

[Figure: the undirected moralized subgraph on vertices 1, ..., 6.]

Since $\{3, 5\}$ separates 4 from 6 in this graph, we can conclude that $4 \perp_m 6 \mid 3, 5$.
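The three steps (ancestral set, moralization, undirected separation) can be sketched in Python as follows. This is a minimal illustration, assuming the example DAG has the parent sets implied by the factorization given for it in these notes, and assuming the sets $A$, $B$, $S$ are pairwise disjoint; the function names are hypothetical.

```python
from itertools import combinations

# Parent sets assumed from the factorization stated for the example DAG.
PARENTS = {1: set(), 2: {1}, 3: {1}, 4: {2}, 5: {2, 3}, 6: {3, 5}, 7: {4, 5, 6}}


def ancestral_set(parents, nodes):
    """The given nodes together with all of their ancestors."""
    found, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def m_separated(parents, A, B, S):
    # Step 1: restrict to the ancestral set of A, B and S.
    keep = ancestral_set(parents, A | B | S)
    # Step 2: moralize (marry co-parents, then drop directions).
    nbrs = {v: set() for v in keep}
    for v in keep:
        pa = parents[v]  # every parent is in keep because keep is ancestral
        for u in pa:
            nbrs[u].add(v)
            nbrs[v].add(u)
        for u, w in combinations(sorted(pa), 2):
            nbrs[u].add(w)
            nbrs[w].add(u)
    # Step 3: search outward from A without entering S; B must be unreachable.
    seen, stack = set(A), list(A)
    while stack:
        for n in nbrs[stack.pop()] - S:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return not (seen & B)


print(sorted(ancestral_set(PARENTS, {3, 4, 5, 6})))
print(m_separated(PARENTS, {4}, {6}, {3, 5}))
print(m_separated(PARENTS, {4}, {6}, {5}))
```

Vertex 7 drops out in step 1 because it is not an ancestor of any vertex in the query, which is why the head-to-head meeting at 7 plays no role in the undirected search.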

Two DAGs $D$ and $D'$ are Markov equivalent if the separation relations $\perp_D$ and $\perp_{D'}$ are identical. $D$ and $D'$ are equivalent if and only if:

1. $D$ and $D'$ have the same skeleton (ignoring directions);
2. $D$ and $D'$ have the same unmarried parents.

Contrast with the undirected case, where two undirected graphs are Markov equivalent if and only if they are identical.
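The two conditions translate directly into a check on parent sets. The sketch below is a minimal Python illustration; it interprets "same unmarried parents" as the same pairs of non-adjacent parents together with their common child, which is an assumption about the intended reading, and the three-vertex graphs are illustrative examples not taken from the lecture.

```python
from itertools import combinations


def skeleton(parents):
    """Undirected edges obtained by ignoring directions."""
    return {frozenset((u, v)) for v, pa in parents.items() for u in pa}


def unmarried_parents(parents):
    """Pairs of non-adjacent parents, recorded with their common child."""
    skel = skeleton(parents)
    out = set()
    for v, pa in parents.items():
        for u, w in combinations(sorted(pa), 2):
            if frozenset((u, w)) not in skel:
                out.add((frozenset((u, w)), v))
    return out


def markov_equivalent(p1, p2):
    return (skeleton(p1) == skeleton(p2)
            and unmarried_parents(p1) == unmarried_parents(p2))


# Four DAGs on vertices 1, 2, 3, all with skeleton 1 - 2 - 3.
chain = {1: set(), 2: {1}, 3: {2}}        # 1 -> 2 -> 3
reverse = {1: {2}, 2: {3}, 3: set()}      # 1 <- 2 <- 3
fork = {1: {2}, 2: set(), 3: {2}}         # 1 <- 2 -> 3
collider = {1: set(), 2: {1, 3}, 3: set()}  # 1 -> 2 <- 3

print(markov_equivalent(chain, reverse))
print(markov_equivalent(chain, fork))
print(markov_equivalent(chain, collider))
```

The first three graphs have no unmarried parents and share a skeleton, so they are equivalent to each other; the collider has the unmarried pair 1, 3 with common child 2 and is therefore not equivalent to them.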

Markov equivalence of directed and undirected graphs

A DAG $D$ is Markov equivalent to an undirected graph $G$ if the separation relations $\perp_D$ and $\perp_G$ are identical. This happens if and only if $D$ is perfect and $G = \sigma(D)$. So, these are all equivalent [figure not reproduced] but not equivalent to [figure not reproduced]