Graphical Models and Independence Models

Graphical Models and Independence Models. Yunshu Liu, ASPITRG Research Group, 2014-03-04.

References:
[1] Steffen Lauritzen, Graphical Models, Oxford University Press, 1996.
[2] Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag New York, Inc., 2006.
[3] Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012.
[4] Petr Šimeček, Independence Models, Workshop on Uncertainty Processing (WUPES), Mikulov, 2006.
[5] Radim Lněnička and František Matúš, On Gaussian Conditional Independence Structures, Kybernetika, Vol. 43 (2007), No. 3, 327-342.

Outline: Preliminaries on Graphical Models (directed graphical models, undirected graphical models); Independence Models (Gaussian distributional framework, discrete distributional framework).

Outline Preliminaries on Graphical Models Independence Models

Preliminaries on Graphical Models. Definition of graphical models: a graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables. Example: suppose MIT and Stanford accept undergraduate students based only on GPA. Let MIT denote "accepted by MIT" and Stanford denote "accepted by Stanford"; the graph has GPA as a parent of both MIT and Stanford. Given Alice's GPA, GPA_Alice, we have P(MIT | Stanford, GPA_Alice) = P(MIT | GPA_Alice). We say MIT is conditionally independent of Stanford given GPA_Alice, sometimes written with the symbol (MIT ⊥ Stanford | GPA_Alice).

Bayesian networks: directed graphical model. Bayesian networks: a Bayesian network consists of a collection of probability distributions P over x = {x_1, ..., x_K} that factorize over a directed acyclic graph (DAG) in the following way: p(x) = p(x_1, ..., x_K) = ∏_{k=1}^{K} p(x_k | pa_k), where pa_k denotes the set of direct parents of node x_k. Aliases of Bayesian networks: probabilistic directed graphical model (via a directed acyclic graph, DAG), belief network, causal network (directed arrows represent causal relations).
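To make the factorization concrete, here is a minimal sketch for a hypothetical three-variable network x_1 → x_2, x_1 → x_3 with made-up conditional probability tables; the joint distribution is simply the product of the local conditionals.

```python
# Minimal sketch of the Bayesian-network factorization p(x) = prod_k p(x_k | pa_k).
# The three-variable network and the CPT values below are hypothetical illustrations.

# p(x1), p(x2 | x1), p(x3 | x1) for binary variables, stored as plain dicts
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.3, (1, 1): 0.7}  # key: (x2, x1)
p_x3_given_x1 = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.5, (1, 1): 0.5}  # key: (x3, x1)

def joint(x1, x2, x3):
    """Joint probability as the product of the local conditional distributions."""
    return p_x1[x1] * p_x2_given_x1[(x2, x1)] * p_x3_given_x1[(x3, x1)]

# The factors define a proper distribution: the joint sums to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0 (up to floating-point rounding)
```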

Bayesian networks: directed graphical model. Examples of Bayesian networks: consider an arbitrary joint distribution p(x) = p(x_1, x_2, x_3) over three variables. We can write p(x_1, x_2, x_3) = p(x_3 | x_1, x_2) p(x_1, x_2) = p(x_3 | x_1, x_2) p(x_2 | x_1) p(x_1), which can be expressed as the fully connected directed graph with edges x_1 → x_2, x_1 → x_3 and x_2 → x_3.

Bayesian networks: directed graphical model. Examples: similarly, if we change the order of x_1, x_2 and x_3 (i.e. consider all permutations of them), we can express p(x_1, x_2, x_3) in five other ways, for example p(x_1, x_2, x_3) = p(x_1 | x_2, x_3) p(x_2, x_3) = p(x_1 | x_2, x_3) p(x_2 | x_3) p(x_3), which corresponds to the directed graph with edges x_3 → x_2, x_3 → x_1 and x_2 → x_1.

Bayesian networks: directed graphical model. Examples: recall the previous example of how MIT and Stanford accept undergraduate students. If we assign x_1 to GPA, x_2 to "accepted by MIT" and x_3 to "accepted by Stanford", then since p(x_3 | x_1, x_2) = p(x_3 | x_1), we have p(x_1, x_2, x_3) = p(x_3 | x_1, x_2) p(x_2 | x_1) p(x_1) = p(x_3 | x_1) p(x_2 | x_1) p(x_1), which corresponds to the directed graph x_2 (MIT) ← x_1 (GPA) → x_3 (Stanford).

Markov random fields: undirected graphical model. In the undirected case, the probability distribution factorizes according to functions defined on the cliques of the graph. A clique is a subset of nodes in a graph such that there exists a link between all pairs of nodes in the subset. A maximal clique is a clique such that it is not possible to include any other node from the graph in the set without it ceasing to be a clique. Denote a clique by C and the set of variables in that clique by x_C. We can restrict the factors to functions defined on maximal cliques without loss of generality, because every other clique is a subset of some maximal clique: if {x_1, x_2, x_3} is a maximal clique and we define an arbitrary function over this clique, then including another factor defined over a subset of these variables would be redundant. Example (four-node undirected graph): the cliques are {x_1,x_2}, {x_2,x_3}, {x_3,x_4}, {x_2,x_4}, {x_1,x_3}, {x_1,x_2,x_3} and {x_2,x_3,x_4}; the maximal cliques are {x_1,x_2,x_3} and {x_2,x_3,x_4}. The set {x_1,x_4} is not a clique because of the missing link from x_1 to x_4.

Markov random fields: undirected graphical model. Markov random fields: definition. Denote by C a clique, by x_C the set of variables in clique C, and by ψ_C(x_C) a nonnegative potential function associated with clique C. A Markov random field is a collection of distributions that factorize as a product of potential functions ψ_C(x_C) over the maximal cliques of the graph: p(x) = (1/Z) ∏_C ψ_C(x_C), where the normalization constant Z = Σ_x ∏_C ψ_C(x_C) is sometimes called the partition function.
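As a concrete illustration, the following sketch builds this factorization for the four-node example above, with hypothetical potential functions on the two maximal cliques {x_1,x_2,x_3} and {x_2,x_3,x_4}, and computes the partition function by brute-force enumeration over binary states.

```python
import itertools

# Minimal sketch of the MRF factorization p(x) = (1/Z) * prod_C psi_C(x_C)
# over the maximal cliques {x1,x2,x3} and {x2,x3,x4} of the example graph.
# The potential functions below are hypothetical, chosen only to be nonnegative.

def psi_123(x1, x2, x3):
    return 1.0 + 2.0 * (x1 == x2) + 1.5 * (x2 == x3)

def psi_234(x2, x3, x4):
    return 1.0 + 3.0 * (x3 == x4) + 0.5 * (x2 == x4)

states = [0, 1]

# Partition function Z = sum_x prod_C psi_C(x_C)
Z = sum(psi_123(a, b, c) * psi_234(b, c, d)
        for a, b, c, d in itertools.product(states, repeat=4))

def p(x1, x2, x3, x4):
    """Normalized joint probability of one configuration."""
    return psi_123(x1, x2, x3) * psi_234(x2, x3, x4) / Z

print(p(0, 0, 1, 1))
```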

Markov random fields: undirected graphical model. Factorization of undirected graphs. Question: how do we write the joint distribution for the undirected graph x_2 - x_1 - x_3, for which (x_2 ⊥ x_3 | x_1) holds? Answer: p(x) = (1/Z) ψ_12(x_1, x_2) ψ_13(x_1, x_3), where ψ_12(x_1, x_2) and ψ_13(x_1, x_3) are the potential functions and Z is the partition function that makes sure p(x) satisfies the conditions of a probability distribution.

Markov random fields: undirected graphical model. Markov properties. Given an undirected graph G = (V, E) and a set of random variables X = (X_a)_{a ∈ V} indexed by V, we have the following Markov properties. Pairwise Markov property: any two non-adjacent variables are conditionally independent given all other variables: X_a ⊥ X_b | X_{V\{a,b}} if {a, b} ∉ E. Local Markov property: a variable is conditionally independent of all other variables given its neighbors: X_a ⊥ X_{V\(nb(a) ∪ {a})} | X_{nb(a)}, where nb(a) is the set of neighbors of node a. Global Markov property: any two subsets of variables are conditionally independent given a separating subset: X_A ⊥ X_B | X_S, where every path from a node in A to a node in B passes through S (meaning that when we remove all the nodes in S, there is no path connecting any node in A to any node in B).
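The global Markov property reduces to a graph-separation test, which the following sketch implements with a breadth-first search after removing the separating set S; the chain graph used at the end is a hypothetical example.

```python
from collections import deque

# Minimal sketch of the separation test behind the global Markov property:
# A and B are separated by S if every path from A to B passes through S,
# i.e. removing S disconnects A from B. The example graph is hypothetical.

def separated(adj, A, B, S):
    """BFS in the graph with the nodes of S removed; True if no A-B path remains."""
    A, B, S = set(A), set(B), set(S)
    visited = set(A)
    queue = deque(A)
    while queue:
        u = queue.popleft()
        if u in B:
            return False
        for v in adj.get(u, ()):
            if v not in S and v not in visited:
                visited.add(v)
                queue.append(v)
    return True

# A chain graph 1 - 2 - 3 - 4: nodes 1 and 4 are separated by {2} (or {3}).
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(separated(adj, {1}, {4}, {2}))    # True  -> X1 ⊥ X4 | X2
print(separated(adj, {1}, {4}, set()))  # False -> not separated without conditioning
```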

Markov random fields: undirected graphical model. Examples of the Markov properties in an undirected graphical model with nodes 1-7 (figure: a seven-node undirected graph). Pairwise Markov property: (1 ⊥ 7 | 23456), (3 ⊥ 4 | 12567). Local Markov property: (1 ⊥ 4567 | 23), (4 ⊥ 13 | 2567). Global Markov property: (1 ⊥ 67 | 345), (12 ⊥ 67 | 345).

Markov random fields: undirected graphical model. Relationship between the different Markov properties and the factorization property. (F): factorization property; (G): global Markov property; (L): local Markov property; (P): pairwise Markov property. In general (F) ⇒ (G) ⇒ (L) ⇒ (P), and if we assume a strictly positive p(·), then also (P) ⇒ (F), so all four properties are equivalent; this gives us the Hammersley-Clifford theorem.

Markov random fields: undirected graphical model. The Hammersley-Clifford theorem (see Koller and Friedman 2009, p. 131, for a proof). Consider a graph G. For strictly positive p(·), the following Markov property and factorization property are equivalent. Markov property: any two subsets of variables are conditionally independent given a separating subset (X_A ⊥ X_B | X_S), where every path from a node in A to a node in B passes through S. Factorization property: the distribution p factorizes according to G, i.e. it can be expressed as a product of potential functions over the maximal cliques of G.

Markov random fields: undirected graphical model. Example: Gaussian Markov random fields. A multivariate normal distribution forms a Markov random field w.r.t. a graph G = (V, E) if the missing edges correspond to zeros of the concentration matrix (the inverse covariance matrix). Consider x = {x_1, ..., x_K} ~ N(0, Σ) with Σ regular and K = Σ^{-1}. The concentration matrix of the conditional distribution of (x_i, x_j) given x_{V\{i,j}} is K_{{i,j}} = [ k_ii  k_ij ; k_ji  k_jj ]. Hence x_i ⊥ x_j | x_{V\{i,j}} ⟺ k_ij = 0 ⟺ {i, j} ∉ E.

Markov random fields: undirected graphical model. Example: Gaussian Markov random fields. The joint distribution of a Gaussian Markov random field can be factorized as log p(x) = const - (1/2) Σ_i k_ii x_i^2 - Σ_{{i,j} ∈ E} k_ij x_i x_j. The zero entries of K are called structural zeros, since they represent the absent edges of the undirected graph.
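A small numerical sketch of the structural-zero statement, using a hypothetical positive-definite precision matrix for the chain 1 - 2 - 3: the zero entry k_13 encodes x_1 ⊥ x_3 | x_2 even though the corresponding covariance entry is nonzero.

```python
import numpy as np

# Minimal sketch of a Gaussian Markov random field on the chain 1 - 2 - 3:
# the precision (concentration) matrix K has zeros exactly at the missing
# edges, here k_13 = 0. The numbers are hypothetical but K is positive definite.

K = np.array([[ 2.0, -0.8,  0.0],
              [-0.8,  2.0, -0.8],
              [ 0.0, -0.8,  2.0]])

Sigma = np.linalg.inv(K)

# Structural zero k_13 = 0 encodes x1 ⊥ x3 | x2 ...
print(np.isclose(K[0, 2], 0.0))      # True
# ... even though x1 and x3 are marginally correlated (Sigma_13 != 0).
print(np.isclose(Sigma[0, 2], 0.0))  # False
```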

Outline Preliminaries on Graphical Models Independence Models

Gaussian Distributional framework. The multivariate Gaussian (definition from Lauritzen). A d-dimensional random vector X = (X_1, ..., X_d) has a multivariate Gaussian distribution on R^d if there is a vector ξ ∈ R^d and a d × d matrix Σ such that λ^T X ~ N(λ^T ξ, λ^T Σ λ) for all λ ∈ R^d. (1) We write X ~ N_d(ξ, Σ); ξ is the mean vector and Σ the covariance matrix of the distribution. For the definition to be valid we need λ^T Σ λ ≥ 0 for all λ, i.e. Σ must be a positive semidefinite matrix.

Gaussian Distributional framework. Regular Gaussian distribution. If, furthermore, Σ is positive definite, i.e. λ^T Σ λ > 0 for λ ≠ 0, the distribution is called a regular Gaussian distribution and has density f(x | ξ, Σ) = (2π)^{-d/2} (det K)^{1/2} exp( -(x - ξ)^T K (x - ξ) / 2 ), (2) where K = Σ^{-1} is the concentration matrix or precision matrix of the distribution. Note: a positive semidefinite matrix is positive definite if and only if it is invertible; we then also call Σ a regular matrix.
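A quick numerical check of formula (2) with hypothetical values of ξ and Σ, evaluating the density through the precision matrix K and comparing against SciPy's multivariate normal density; the agreement only verifies the formula, nothing more.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Minimal sketch checking formula (2): evaluating the regular Gaussian density
# through the concentration matrix K = Sigma^{-1}. Mean and covariance are
# hypothetical example values.

xi = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
K = np.linalg.inv(Sigma)
d = len(xi)

def density(x):
    quad = (x - xi) @ K @ (x - xi)
    return (2 * np.pi) ** (-d / 2) * np.sqrt(np.linalg.det(K)) * np.exp(-quad / 2)

x = np.array([0.3, 0.7])
print(density(x))
print(multivariate_normal(mean=xi, cov=Sigma).pdf(x))  # same value
```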

Gaussian Distributional framework. Properties of the Gaussian distribution. Given a matrix Σ = (σ_ab)_{a,b ∈ {1,...,d}} and non-empty subsets A, B of {1, ..., d}, denote by Σ_{AB} = (σ_ab)_{a ∈ A, b ∈ B} the submatrix with A-rows and B-columns. Now let A be a subset of {1, ..., d}; then the marginal distribution of ξ_A is also a Gaussian distribution, with covariance matrix Σ_{AA}.

Gaussian Distributional framework. Properties of the Gaussian distribution. Partition {1, ..., d} into A and B, so that A ∪ B = {1, ..., d} and A ∩ B = ∅. The conditional distribution of ξ_A given ξ_B = x_B is a Gaussian distribution with covariance matrix Σ_{A|B} = Σ_{AA} - Σ_{AB} Σ_{BB}^- Σ_{BA}, where Σ_{BB}^- is any generalized inverse of Σ_{BB}, which equals Σ_{BB}^{-1} if Σ_{BB} is regular.

Gaussian Distributional framework. Properties of the Gaussian distribution. Now assume Σ_{BB} is regular. Since ( Σ_{AA}  Σ_{AB} ; Σ_{BA}  Σ_{BB} )^{-1} = ( K_{AA}  K_{AB} ; K_{BA}  K_{BB} ), we have K_{AA}^{-1} = Σ_{AA} - Σ_{AB} Σ_{BB}^{-1} Σ_{BA} (3) and -K_{AA}^{-1} K_{AB} = Σ_{AB} Σ_{BB}^{-1} (4). Thus Σ_{A|B} = K_{AA}^{-1}.

Gaussian Distributional framework. Properties of the Gaussian distribution. Let A and B be a non-trivial partition of {1, ..., d}. The conditional mean is ξ_{A|B} = ξ_A - K_{AA}^{-1} K_{AB} (x_B - ξ_B) = ξ_A + Σ_{AB} Σ_{BB}^{-1} (x_B - ξ_B) (5), and, rearranging (4), K_{AB} = -K_{AA} Σ_{AB} Σ_{BB}^{-1} (6). Thus X_A and X_B are independent if and only if K_{AB} = 0, since by (6) K_{AB} = 0 if and only if Σ_{AB} = 0.
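The identities (3)-(6) can be verified numerically. The sketch below uses a hypothetical 3-dimensional mean and covariance, splits the index set into A and B, checks that the Schur complement Σ_{AA} - Σ_{AB} Σ_{BB}^{-1} Σ_{BA} equals K_{AA}^{-1}, and computes the conditional mean both through Σ and through K.

```python
import numpy as np

# Minimal sketch of the conditional-distribution formulas (3)-(6) with a
# hypothetical 3-dimensional example (A = {0}, B = {1, 2} in 0-based indexing).

xi = np.array([1.0, 0.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
A = [0]
B = [1, 2]

S_AA = Sigma[np.ix_(A, A)]
S_AB = Sigma[np.ix_(A, B)]
S_BA = Sigma[np.ix_(B, A)]
S_BB = Sigma[np.ix_(B, B)]

# Conditional covariance: Sigma_{A|B} = Sigma_AA - Sigma_AB Sigma_BB^{-1} Sigma_BA
Sigma_cond = S_AA - S_AB @ np.linalg.inv(S_BB) @ S_BA

# Same quantity via the concentration matrix: Sigma_{A|B} = K_AA^{-1}
K = np.linalg.inv(Sigma)
K_AA_inv = np.linalg.inv(K[np.ix_(A, A)])
print(np.allclose(Sigma_cond, K_AA_inv))  # True

# Conditional mean for an observed x_B, once via Sigma and once via K, as in (5)
x_B = np.array([0.5, 1.5])
xi_via_Sigma = xi[A] + S_AB @ np.linalg.inv(S_BB) @ (x_B - xi[B])
xi_via_K = xi[A] - K_AA_inv @ K[np.ix_(A, B)] @ (x_B - xi[B])
print(np.allclose(xi_via_Sigma, xi_via_K))  # True
```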

Gaussian Distributional framework. Properties of the Gaussian distribution. Let a and b be distinct elements of {1, ..., d} and C ⊆ {1, ..., d}\{a,b}. If Σ_{CC} > 0, then the random variables ξ_a and ξ_b are independent given ξ_C if and only if det(Σ_{aC,bC}) = 0, where Σ_{aC,bC} is the submatrix with rows indexed by {a} ∪ C and columns indexed by {b} ∪ C. This follows from det Σ_{aC,bC} = 0 ⟺ σ_ab - Σ_{aC} Σ_{CC}^{-1} Σ_{Cb} = 0 ⟺ (Σ_{ab|C})_{ab} = 0 ⟺ ξ_a ⊥ ξ_b | ξ_C.

Gaussian Distributional framework. Independence models. Let D = {1, 2, ..., d} be a finite set and let T_D denote the set of all triples ⟨ab|C⟩ such that ab is an (unordered) pair of distinct elements of D and C ⊆ D\{a,b}. Subsets of T_D are referred to here as formal independence models over D. The independence model I(ξ) induced by a random vector ξ = (ξ_1, ..., ξ_d) is the independence model over D defined as I(ξ) = { ⟨ab|C⟩ ; ξ_a ⊥ ξ_b | ξ_C }. An independence model I is said to be generally/regularly Gaussian representable if there exists a general/regular Gaussian distribution ξ such that I = I(ξ).
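Putting the determinant criterion and the definition of I(ξ) together, the following sketch enumerates the induced independence model of a regular Gaussian distribution from its covariance matrix; the chain-structured precision matrix at the end is a hypothetical example for which the only statement that holds is 1 ⊥ 3 | 2.

```python
import itertools
import numpy as np

# Minimal sketch of the induced independence model I(xi) of a regular Gaussian
# vector, enumerated with the determinant criterion from the previous slide:
# xi_a ⊥ xi_b | xi_C  iff  det(Sigma_{aC,bC}) = 0. The covariance matrix is a
# hypothetical example; variables are labelled 1..d as in the text.

def induced_model(Sigma, tol=1e-9):
    d = Sigma.shape[0]
    D = range(1, d + 1)
    model = []
    for a, b in itertools.combinations(D, 2):
        rest = [v for v in D if v not in (a, b)]
        for r in range(len(rest) + 1):
            for C in itertools.combinations(rest, r):
                rows = [a - 1] + [c - 1 for c in C]
                cols = [b - 1] + [c - 1 for c in C]
                sub = Sigma[np.ix_(rows, cols)]
                if abs(np.linalg.det(sub)) < tol:
                    model.append((a, b, C))
    return model

# Chain-structured precision matrix (1 - 2 - 3), so 1 ⊥ 3 | 2 should be found.
K = np.array([[ 2.0, -0.8,  0.0],
              [-0.8,  2.0, -0.8],
              [ 0.0, -0.8,  2.0]])
Sigma = np.linalg.inv(K)
print(induced_model(Sigma))  # [(1, 3, (2,))]
```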

Gaussian Distributional framework. Independence models: diagrams. Visualisation of an independence model I over D with |D| ≤ 4. Each element of D is plotted as a dot. If ⟨ab|∅⟩ ∈ I then a and b are joined by a line. If ⟨ab|c⟩ ∈ I then we put a line between the dots corresponding to a and b and add a small line in the middle pointing in the direction of c. If both ⟨ab|c⟩ ∈ I and ⟨ab|d⟩ ∈ I, then only one line with two small lines in the middle is plotted. If ⟨ab|cd⟩ ∈ I then a brace between a and b is added. (Figure: diagram of the independence model I = {⟨12|∅⟩, ⟨23|1⟩, ⟨23|4⟩, ⟨34|12⟩, ⟨14|∅⟩, ⟨14|2⟩}.)

Gaussian Distributional framework. Independence models: permutation. Two independence models I and J over D are isomorphic (permutably equivalent) if there exists a permutation π of D such that ⟨ab|C⟩ ∈ I ⟺ ⟨π(a)π(b)|π(C)⟩ ∈ J, where π(Z) stands for {π(z) ; z ∈ Z}. (Figure: example of three isomorphic models.)

Gaussian Distributional framework. Independence models: example. There are 5 regularly Gaussian representable permutation classes of independence models over D = {1, 2, 3}:
I_1 = ∅
I_2 = { ⟨12|∅⟩ }
I_3 = { ⟨12|{3}⟩ }
I_4 = { ⟨12|∅⟩, ⟨12|{3}⟩, ⟨23|∅⟩, ⟨23|{1}⟩ }
I_5 = { ⟨12|∅⟩, ⟨12|{3}⟩, ⟨23|∅⟩, ⟨23|{1}⟩, ⟨13|∅⟩, ⟨13|{2}⟩ }
In addition there are two generally representable permutation classes that are not regularly representable:
I_6 = { ⟨12|{3}⟩, ⟨23|{1}⟩ }
I_7 = { ⟨12|{3}⟩, ⟨23|{1}⟩, ⟨13|{2}⟩ }

Gaussian Distributional framework. Independence models: minors. If I is an independence model over D = {1, ..., d} and E, F are disjoint subsets of D, then the minor I[_E^F is the independence model over D\(E ∪ F) defined by I[_E^F = { ⟨ab|C⟩ ; E ∩ ({a,b} ∪ C) = ∅, ⟨ab|C ∪ F⟩ ∈ I }. Notice that I[_E = { ⟨ab|C⟩ ∈ I ; E ∩ ({a,b} ∪ C) = ∅ }, I[^F = { ⟨ab|C⟩ ; ⟨ab|C ∪ F⟩ ∈ I }, I[_∅^∅ = I, and (I[_{E_1}^{F_1})[_{E_2}^{F_2} = I[_{E_1 ∪ E_2}^{F_1 ∪ F_2}. Property involving minors: if an independence model I is generally/regularly representable, then all its minors are generally/regularly representable.
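A small sketch of the minor operation under the definition above (the exact set conditions in that definition are themselves a reconstruction, so treat them as an assumption): triples ⟨ab|C⟩ are stored as (a, b, frozenset(C)) with a < b, and the ground set D and the example model I are hypothetical.

```python
import itertools

# Minimal sketch of the minor I[_E^F of a formal independence model:
# triples over D \ (E u F) whose conditioning set, enlarged by F, lies in I.

def minor(model, D, E=(), F=()):
    """I[_E^F = { <ab|C> over D\\(E u F) : <ab|C u F> in I }."""
    E, F = frozenset(E), frozenset(F)
    rest = sorted(set(D) - E - F)
    result = set()
    for a, b in itertools.combinations(rest, 2):
        others = [v for v in rest if v not in (a, b)]
        for r in range(len(others) + 1):
            for C in itertools.combinations(others, r):
                if (a, b, frozenset(C) | F) in model:
                    result.add((a, b, frozenset(C)))
    return result

# Hypothetical model over D = {1, 2, 3, 4}
I = {(1, 2, frozenset()), (2, 3, frozenset({1})), (3, 4, frozenset({1, 2}))}
print(minor(I, D={1, 2, 3, 4}, E={4}))  # drops triples that mention 4
print(minor(I, D={1, 2, 3, 4}, F={1}))  # conditions on 1 throughout
```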

Gaussian Distributional framework. Regular Gaussian representability. Every regularly Gaussian representable model must have regularly representable minors. Using this property, among all the independence models over four variables, 58 permutation-equivalence classes with regularly representable minors are found. Of these 58 types, 5 are not regularly representable, which leaves 53 classes of permutation-equivalent regularly representable models (629 independence models counting permutations).

Gaussian Distributional framework. Independence models: regular Gaussian representable models M1-M30 (figure: diagrams of M1-M30, continued on the next slide).

Gaussian Distributional framework. Regular Gaussian representable models M31-M53 (figure: diagrams of M31-M53). M54-M58 are not regularly Gaussian representable.

Gaussian Distributional framework. Generally Gaussian representable models: M1-M53 and M59-M85. In addition to the 53 regularly representable classes, there are 27 generally Gaussian representable independence models (M59-M85) which are not regularly representable.

Graphical Independence Models. Independence models representable by undirected graphs. For 4 variables there are only 11 graphical types of independence models (9 of them non-trivial), whereas there are 80 generally Gaussian representable and 53 regularly Gaussian representable classes of independence models in total. (Figure: diagrams of the 11 undirected graphs on four nodes and their independence models.)

Discrete Distributional framework. Discrete distributions. A random vector ξ = (ξ_1, ..., ξ_d) is called discrete if each ξ_a takes values in a state space X_a with 0 < |X_a| < ∞. In particular, ξ is called binary if |X_a| = 2 for all a. A discrete random vector ξ is called positive if for every x ∈ X = ∏_{a=1}^{d} X_a we have 0 < P(ξ = x) < 1. Conditional independence for discrete variables: for a discrete random vector, the variables ξ_a and ξ_b are independent given ξ_C if and only if for all x_{abC} ∈ X_{abC}: P(ξ_{abC} = x_{abC}) · P(ξ_C = x_C) = P(ξ_{aC} = x_{aC}) · P(ξ_{bC} = x_{bC}).
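This factorized conditional-independence criterion can be checked directly on a joint probability table; the following sketch does so for a hypothetical binary chain x_0 → x_1 → x_2, where x_0 ⊥ x_2 | x_1 holds but x_0 and x_1 are marginally dependent.

```python
import itertools
from collections import defaultdict

# Minimal sketch of the discrete conditional-independence test stated above:
# xi_a ⊥ xi_b | xi_C  iff  P(x_abC) * P(x_C) = P(x_aC) * P(x_bC) for all x_abC.
# The joint table below is a hypothetical binary distribution.

def marginal(joint, idx):
    """Marginal distribution over the coordinates listed in idx."""
    m = defaultdict(float)
    for x, p in joint.items():
        m[tuple(x[i] for i in idx)] += p
    return m

def cond_indep(joint, a, b, C, tol=1e-12):
    C = tuple(C)
    p_abC = marginal(joint, (a, b) + C)
    p_C = marginal(joint, C)
    p_aC = marginal(joint, (a,) + C)
    p_bC = marginal(joint, (b,) + C)
    for (xa, xb, *xC), p in p_abC.items():
        xC = tuple(xC)
        if abs(p * p_C[xC] - p_aC[(xa,) + xC] * p_bC[(xb,) + xC]) > tol:
            return False
    return True

# Build p(x0, x1, x2) = p(x0) p(x1 | x0) p(x2 | x1), so x0 ⊥ x2 | x1 holds.
p0 = {0: 0.6, 1: 0.4}
p1 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}   # key: (x1, x0)
p2 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}   # key: (x2, x1)
joint = {(a, b, c): p0[a] * p1[(b, a)] * p2[(c, b)]
         for a, b, c in itertools.product((0, 1), repeat=3)}

print(cond_indep(joint, 0, 2, (1,)))  # True
print(cond_indep(joint, 0, 1, ()))    # False
```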

Discrete Distributional framework. Discrete representable models. The class of all discrete representable models can be described by the set C of irreducible models, i.e. nontrivial discrete representable models that cannot be written as the intersection of two other discrete representable models. There are 13 types of irreducible models for discrete distributions, and 18478 discrete representable models in total, corresponding to 1098 types.

Discrete Distributional framework. Discrete representable models. As shown in [5], [6] and [7], there are only 13 types of irreducible models (Figure 4: irreducible types) and 18478 discrete representable independence models, corresponding to 1098 types. Note: results on positive discrete representable models are not yet known.

Thanks! Questions!