Undirected Graphical Models 2: Independencies (Markov Networks and Bayesian Networks)

Independencies (Bayesian networks): use d-separation to read off the independencies in a Bayesian network. Takes a bit of effort!

Independencies (Markov networks): use separation to determine independencies (really easy!).

Formally: let H be a Markov network structure, and let X_1 - ... - X_k be a path in H. Let Z ⊆ X be a set of observed variables. The path X_1 - ... - X_k is active given Z if none of the X_i, i = 1, ..., k, is in Z.

Example: for the chain X_1 - X_2 - X_3, the path is not active if X_2 is in Z; X_2 separates X_1 and X_3.
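The separation test is easy to mechanize. Below is a minimal sketch (not part of the original slides; the adjacency-dict representation and function names are my own) of the two checks just described: whether a given path is active, and whether a set Z separates two sets of nodes in H.

from collections import deque

# Sketch: a Markov network H as an adjacency dict {node: set of neighbours}.
def path_is_active(path, Z):
    # A path X1 - ... - Xk is active given Z iff none of its nodes is observed.
    return not any(node in Z for node in path)

def separated(H, X, Y, Z):
    # sep_H(X; Y | Z): no active path between any node in X and any node in Y.
    # Equivalent test: delete Z from H and check that X cannot reach Y.
    Z = set(Z)
    reachable = set(x for x in X if x not in Z)
    frontier = deque(reachable)
    while frontier:
        u = frontier.popleft()
        for v in H[u]:
            if v not in Z and v not in reachable:
                reachable.add(v)
                frontier.append(v)
    return not (reachable & set(Y))

# The chain X1 - X2 - X3 from the slide:
H = {'X1': {'X2'}, 'X2': {'X1', 'X3'}, 'X3': {'X2'}}
print(path_is_active(['X1', 'X2', 'X3'], Z={'X2'}))   # False: observing X2 blocks the path
print(separated(H, {'X1'}, {'X3'}, {'X2'}))           # True: sep_H(X1; X3 | X2)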

A set of nodes Z separates X and Y in H, denoted sep_H(X; Y | Z), if there is no active path between any node X ∈ X and any node Y ∈ Y given Z.

Separation is monotonic in Z, i.e. if sep_H(X; Y | Z) then sep_H(X; Y | Z') for any Z' ⊇ Z.

We define the global independencies associated with H to be:

I(H) = {(X ⊥ Y | Z) : sep_H(X; Y | Z)}

Example (slide figure): the chain X_1 - X_2 - X_3 - X_4, shown for two different choices of the observed set Z.

Note: we can't encode non-monotonic independence relations with separation in a Markov network (more on this later).

We will show:
Soundness: if a distribution P factorizes over a Markov network H, then separation in H implies conditional independence in P.
Completeness: separation in H finds almost all conditional independencies in P.

Soundness

Theorem 4.1: Let P be a distribution over X, and H a Markov network structure over X. If P is a Gibbs distribution that factorizes over H, then H is an I-map for P.

Informally: if P factorizes over H, then every separation in H corresponds to an independence that holds in P.

Recall: a distribution P_Φ with Φ = {φ_1(D_1), ..., φ_K(D_K)} factorizes over a Markov network H if each D_k (k = 1, ..., K) is a complete subgraph of H and

P_Φ(X_1, ..., X_n) = (1/Z) ∏_{k=1}^{K} φ_k(D_k)
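To make the factorization and Theorem 4.1 concrete, here is a small numerical sketch (my own illustration; the factor values are made up) of a Gibbs distribution over the chain X1 - X2 - X3 with one factor per edge, together with a check that the separation sep_H(X1; X3 | X2) really does correspond to a conditional independence in P.

from itertools import product

# Made-up edge factors for the chain X1 - X2 - X3 (each edge is a complete subgraph of H).
phi_12 = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}   # phi_1(X1, X2)
phi_23 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}   # phi_2(X2, X3)

def unnormalized(x1, x2, x3):
    return phi_12[(x1, x2)] * phi_23[(x2, x3)]

Z = sum(unnormalized(*x) for x in product([0, 1], repeat=3))      # partition function
P = {x: unnormalized(*x) / Z for x in product([0, 1], repeat=3)}  # the Gibbs distribution

def marginal(keep):
    # Marginalize P onto the coordinate positions listed in `keep`.
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p12, p23, p2 = marginal((0, 1)), marginal((1, 2)), marginal((1,))
for x1, x2, x3 in product([0, 1], repeat=3):
    lhs = P[(x1, x2, x3)] / p2[(x2,)]                             # P(X1, X3 | X2)
    rhs = (p12[(x1, x2)] / p2[(x2,)]) * (p23[(x2, x3)] / p2[(x2,)])
    assert abs(lhs - rhs) < 1e-12                                 # (X1 ⊥ X3 | X2) holds, as promised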

Soundness

Proof: Let X, Y, Z be any three disjoint subsets of X such that Z separates X and Y in H. We want to show: P ⊨ (X ⊥ Y | Z).

Case 1: X ∪ Y ∪ Z = X. Because Z separates X from Y, no clique of H contains variables from both X and Y, so every clique D_k involves only variables in X ∪ Z or only variables in Y ∪ Z. Grouping the factors accordingly,

P(X) = (1/Z) [ ∏_{k : D_k ⊆ X ∪ Z} φ_k(D_k) ] [ ∏_{k : D_k ⊆ Y ∪ Z} φ_k(D_k) ] = (1/Z) f(X, Z) g(Y, Z)

(here the Z in 1/Z is the partition function, not the separating set). A joint that factors into a function of (X, Z) times a function of (Y, Z) satisfies (X ⊥ Y | Z). Therefore P ⊨ (X ⊥ Y | Z).

Case 2: X ∪ Y ∪ Z ⊂ X. Let U = X − (X ∪ Y ∪ Z). We can partition U into two disjoint sets U_1 and U_2 such that Z separates X ∪ U_1 from Y ∪ U_2 in H. From Case 1,

P ⊨ (X ∪ U_1 ⊥ Y ∪ U_2 | Z).

Therefore, by the decomposition property, P ⊨ (X ⊥ Y | Z).

Decomposition property: (X ⊥ {Y, W} | Z) ⟹ (X ⊥ Y | Z)
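The key step in Case 1 is that the separating set lets us assign every clique to one side or the other. A tiny sketch (my own illustration, not from the slides) of that grouping for the chain X1 - X2 - X3 - X4:

# Illustrative only: group the cliques of a factorization into the two products
# used in Case 1 of the proof. Graph: X1 - X2 - X3 - X4, cliques = edges.
cliques = [{'X1', 'X2'}, {'X2', 'X3'}, {'X3', 'X4'}]
X, Y, Z = {'X1'}, {'X4'}, {'X2', 'X3'}            # Z separates X from Y here

side_f = [D for D in cliques if D <= X | Z]        # factors collected into f(X, Z)
side_g = [D for D in cliques if D <= Y | Z]        # factors collected into g(Y, Z)

# Because Z separates X and Y, every clique lands on at least one side
# (a clique contained entirely in Z may go to either side, but only one),
# so the full product of factors can be written as f(X, Z) * g(Y, Z).
assert all(D in side_f or D in side_g for D in cliques)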

Soundness

So far, we've shown: if a distribution P factorizes over a Markov network H, then separation in H implies conditional independence in P. Now for the other direction:

Hammersley-Clifford Theorem: Let P be a positive distribution over X, and H a Markov network graph over X. If H is an I-map for P, then P is a Gibbs distribution that factorizes over H. (Proof omitted)

Completeness

Strong version: every pair of nodes X and Y that are not separated in H are dependent in every distribution which factorizes over H. (Not true; a weaker version is needed.)

Theorem 4.3: Let H be a Markov network structure. If X and Y are not separated given Z in H, then X and Y are dependent given Z in some distribution P that factorizes over H. (Proof omitted here)

Thus, for almost all distributions P that factorize over H, we have I(P) = I(H).

We had two definitions of independencies in Bayesian networks:
1. Global independencies: d-separation
2. Local independencies: (X_i ⊥ NonDescendants(X_i) | Parents(X_i))
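As a reminder of how the Bayesian-network local independencies read off a DAG, here is a tiny sketch (my own illustration; the example DAG and helper are made up) that enumerates (X_i ⊥ NonDescendants(X_i) | Parents(X_i)) for a small network.

# Illustrative only: local independencies of the Bayesian network D -> G <- I, G -> L.
parents = {'D': set(), 'I': set(), 'G': {'D', 'I'}, 'L': {'G'}}

def descendants(x):
    # All nodes reachable from x along directed edges.
    children = {u: {v for v, ps in parents.items() if u in ps} for u in parents}
    stack, seen = [x], set()
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

for x in parents:
    nondesc = set(parents) - {x} - descendants(x) - parents[x]
    print(f"({x} ⊥ {sorted(nondesc)} | {sorted(parents[x])})")
# e.g. (I ⊥ ['D'] | []): I and D are marginally independent in this network.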

We can do the same thing with Markov networks:
1. Global independencies: separation
2. Local independencies: a) pairwise independencies, b) local independencies (Markov blanket)

Pairwise independencies

Intuitively: when two variables are not directly connected, we can make them conditionally independent by conditioning on all the other (mediating) variables. Let H be a Markov network. We define the pairwise independencies associated with H to be:

I_p(H) = {(X ⊥ Y | X − {X, Y}) : X - Y ∉ H}

Local independencies (Markov blanket)

Intuitively: block all influences on a node by conditioning on its immediate neighbors (in the slide figure, the grey nodes are the Markov blanket). Formally: for a given graph H, we define the Markov blanket of X in H, denoted MB_H(X), to be the neighbors of X in H. We define the local independencies associated with H to be:

I_l(H) = {(X ⊥ X − {X} − MB_H(X) | MB_H(X)) : X ∈ X}

Note: for general distributions, I_p(H) is strictly weaker than I_l(H), which is strictly weaker than I(H).
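Both local notions are easy to enumerate from a graph. A minimal sketch (my own illustration, using the same adjacency-dict representation as above):

# Illustrative only: pairwise and Markov-blanket independencies of the chain X1 - X2 - X3 - X4.
H = {'X1': {'X2'}, 'X2': {'X1', 'X3'}, 'X3': {'X2', 'X4'}, 'X4': {'X3'}}
nodes = set(H)

# I_p(H): one statement per non-adjacent pair, conditioning on everything else.
I_p = [(x, y, nodes - {x, y})
       for x in sorted(nodes) for y in sorted(nodes)
       if x < y and y not in H[x]]

# I_l(H): one statement per node, conditioning on its Markov blanket MB_H(X) = neighbours of X.
I_l = [(x, nodes - {x} - H[x], set(H[x])) for x in sorted(nodes)]

for x, y, z in I_p:
    print(f"({x} ⊥ {y} | {sorted(z)})")                # e.g. (X1 ⊥ X3 | ['X2', 'X4'])
for x, rest, mb in I_l:
    print(f"({x} ⊥ {sorted(rest)} | {sorted(mb)})")    # e.g. (X1 ⊥ ['X3', 'X4'] | ['X2'])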

Proposition 4.3: For any Markov network H and any distribution P, if P ⊨ I_l(H) then P ⊨ I_p(H).

Proposition 4.4: For any Markov network H and any distribution P, if P ⊨ I(H) then P ⊨ I_l(H).

Note: the converse of both propositions holds only for positive distributions.

Theorem 4.4: Let P be a positive distribution. If P satisfies I_p(H), then P satisfies I(H). (Proof omitted here)

Corollary 4.1: The following three statements are equivalent for a positive distribution P:
1. P ⊨ I_l(H)
2. P ⊨ I_p(H)
3. P ⊨ I(H)

Why only positive distributions P? In a nonpositive distribution we can satisfy one of the weaker properties without satisfying the stronger one. (Counterexamples follow.)

Counterexample #1: Let P' be any distribution over X_1, ..., X_n; let X* = {X_1, ..., X_n, X'_1, ..., X'_n}. Construct a distribution P over X* whose marginal over X_1, ..., X_n is the same as P', and where each X'_i is deterministically equal to X_i. [Note: this determinism is where the nonpositivity comes in.] Let H be the Markov network over X* that contains no edges other than X_i - X'_i (slide figure: two rows of nodes X_1, X_2, X_3 and X'_1, X'_2, X'_3, each node joined only to its copy).
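To see the construction concretely, here is a small numerical sketch (my own illustration, with n = 2 and a particular correlated choice of P'); it anticipates the conclusion drawn on the next slide: the local independencies of H all hold in P, yet H is not an I-map for P.

# Illustration of Counterexample #1 (not from the slides), with n = 2.
# P'(X1, X2): a correlated distribution over two binary variables.
P_prime = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# P over (X1, X2, X1', X2'): the copies are deterministically equal to the originals;
# every other assignment gets probability 0 (this is the nonpositivity).
P = {(x1, x2, x1, x2): p for (x1, x2), p in P_prime.items()}

def prob(event):
    # Total probability of the assignments (x1, x2, x1p, x2p) satisfying `event`.
    return sum(p for x, p in P.items() if event(*x))

# Local independency for X1 in H (edges only X1 - X1', X2 - X2'): (X1 ⊥ {X2, X2'} | X1').
# Spot-check one instantiation: P(X1=1 | X1'=1) equals P(X1=1 | X1'=1, X2=0).
lhs = prob(lambda a, b, ap, bp: a == 1 and ap == 1) / prob(lambda a, b, ap, bp: ap == 1)
rhs = (prob(lambda a, b, ap, bp: a == 1 and ap == 1 and b == 0)
       / prob(lambda a, b, ap, bp: ap == 1 and b == 0))
print(lhs, rhs)          # both 1.0: conditioning on the copy makes X1 deterministic

# But H also asserts ({X1, X1'} ⊥ {X2, X2'}), in particular that X1 and X2 are
# independent, which is false in P (and in P'):
pX1 = prob(lambda a, b, ap, bp: a == 1)
pX2 = prob(lambda a, b, ap, bp: b == 1)
pBoth = prob(lambda a, b, ap, bp: a == 1 and b == 1)
print(pBoth, pX1 * pX2)  # 0.4 vs 0.25: dependent, so H is not an I-map for P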

In P: each variable is fully determined by its copy, so conditioning on its Markov blanket (its single neighbor in H) renders it independent of everything else. Thus P satisfies the local independencies for every node in the network. But H is not an I-map for P: H makes many independence assertions regarding the X_i that do not hold in P' (or in P), e.g. that the X_i are mutually independent.

Counterexample #2: Let P' be any distribution over X_1, ..., X_n. Consider two auxiliary sets of variables Y_1, ..., Y_n and Z_1, ..., Z_n, and define X* = {X_1, ..., X_n, Y_1, ..., Y_n, Z_1, ..., Z_n}. Construct a distribution P whose marginal over X_1, ..., X_n is the same as P', and where each Y_i and each Z_i is deterministically equal to X_i. [Note: this determinism is where the nonpositivity comes in.]

Let H be the empty Markov network over X* (slide figure: the nodes X_1, X_2, Y_1, Y_2, Z_1, Z_2 with no edges at all). P satisfies the pairwise assumptions for every pair of nodes in the network, i.e. (U ⊥ V | X* − {U, V}). Why? E.g. X_1 and Y_1 are rendered independent because X* − {X_1, Y_1} contains Z_1, and knowing Z_1 tells you everything about X_1 and Y_1. Similarly, X_1 and Z_1 are independent given Y_1. Thus P satisfies the pairwise independencies, but not the local or global independencies.
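A concrete sketch of Counterexample #2 (my own illustration, with n = 1, so X* = {X, Y, Z} and Y, Z are deterministic copies of X): every pairwise independency of the empty graph holds, but the local and global independencies do not.

# Illustration of Counterexample #2 (not from the slides), with n = 1.
# X is a fair coin; Y and Z are deterministic copies of X.
P = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}          # assignments to (X, Y, Z); all others have probability 0

def prob(event):
    return sum(p for xyz, p in P.items() if event(*xyz))

# Pairwise independency (X ⊥ Y | Z): given Z, both X and Y are fixed, so it holds.
for z in (0, 1):
    pz   = prob(lambda x, y, zz: zz == z)
    pxz  = prob(lambda x, y, zz: x == 1 and zz == z)
    pyz  = prob(lambda x, y, zz: y == 1 and zz == z)
    pxyz = prob(lambda x, y, zz: x == 1 and y == 1 and zz == z)
    assert abs(pxyz / pz - (pxz / pz) * (pyz / pz)) < 1e-12

# The local independency of the empty graph, (X ⊥ {Y, Z}), fails: X determines Y.
print(prob(lambda x, y, z: x == 1 and y == 1),
      prob(lambda x, y, z: x == 1) * prob(lambda x, y, z: y == 1))
# 0.5 vs 0.25: dependent, so P violates the local (and global) independencies of H.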

How do we encode the independencies in a distribution P using a graph structure? Need to construct a minimal I-map Use either of the following two algorithms: Theorem 4.5: uses pairwise independencies Theorem 4.6: uses Markov Blanket 29 30 Theorem 4.5: Let P be a positive distribution, and let H be defined by introducing an edge {X,Y} for all X,Y for which P = (X Y X {X,Y}). Then the Markov network H is the unique minimal I- map for P. [Proof omitted] Theorem 4.6: Let P be a positive distribution. For each node X, let MB p (X) be a minimal set of nodes U that form a Markov blanket. We define a graph H by introducing an edge {X, Y} for all X and all Y MB p (X). Then the Markov network H is the unique minimal I-map for P. [Proof omitted] 31 32 8

Counterexample for nonpositive distribution: Consider a nonpositive distribution P over four binary variables A, B, C, D Assigns non-zero probability only to cases where all four variables take on exactly the same value Eg. P(A=1,B=1,C=1,D=1) = 0.5 and P(A=0,B=0,C=0,D=0) = 0.5 What happens when we apply the Markov Blanket construction from Theorem 4.6? A C B D The graph to the left is one possibility (note: not an I-map for the distribution) P = (A {C, D} B) and {B} is a legal choice for MB p (A) 33 34 What happens when we apply the pairwise independence construction from Theorem 4.5? A C B D For example, no edge placed between A and B because P = (A B {C, D}) Resulting network is not an I-map of P 35 Does every distribution have a perfect map? No, even for positive distributions Example: Can t use a Markov network to represent a distribution corresponding a Bayesian network v-structure D G I D G I I dependent on D given G. Minimal I-map is the fully connected graph which doesn t capture (I D) Not a perfect map 36 9
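To see the breakdown concretely, here is a small sketch (my own illustration, not from the slides) that applies the pairwise construction of Theorem 4.5 to the all-equal distribution above: every pairwise test succeeds, so no edge is ever added, and the resulting empty graph is clearly not an I-map of P.

from itertools import combinations, product

# The nonpositive distribution from the slides: all four variables always take the same value.
names = ('A', 'B', 'C', 'D')
P = {(0, 0, 0, 0): 0.5, (1, 1, 1, 1): 0.5}

def prob(event):
    return sum(p for x, p in P.items() if event(dict(zip(names, x))))

def independent(x, y, cond):
    # Check (x ⊥ y | cond) by comparing P(x, y | cond) with P(x | cond) P(y | cond)
    # for every assignment to x, y and the conditioning set.
    for vals in product((0, 1), repeat=len(cond) + 2):
        vx, vy, vc = vals[0], vals[1], dict(zip(cond, vals[2:]))
        pc = prob(lambda a: all(a[k] == v for k, v in vc.items()))
        if pc == 0:
            continue                      # conditioning event never happens
        pxyc = prob(lambda a: a[x] == vx and a[y] == vy and all(a[k] == v for k, v in vc.items()))
        pxc = prob(lambda a: a[x] == vx and all(a[k] == v for k, v in vc.items()))
        pyc = prob(lambda a: a[y] == vy and all(a[k] == v for k, v in vc.items()))
        if abs(pxyc / pc - (pxc / pc) * (pyc / pc)) > 1e-12:
            return False
    return True

# Theorem 4.5 construction: add edge {X, Y} exactly when (X ⊥ Y | rest) fails.
edges = [(x, y) for x, y in combinations(names, 2)
         if not independent(x, y, [n for n in names if n not in (x, y)])]
print(edges)   # [] -- the empty graph, which is not an I-map of P (A and B are clearly dependent)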