Message Passing and Junction Tree Algorithms. Kayhan Batmanghelich


Message Passing and Junction Tree Algorithms Kayhan Batmanghelich 1

Review 2

Review 3

Great Ideas in ML: Message Passing. Each soldier receives reports from all branches of the tree: 3 here, 7 here, 1 of me, so 11 here (= 7+3+1). Slides adapted from Matt Gormley (2016), adapted from MacKay (2003) textbook 4

Great Ideas in ML: Message Passing. Each soldier receives reports from all branches of the tree: 3 here and 3 here, so 7 here (= 3+3+1). Slides adapted from Matt Gormley (2016), adapted from MacKay (2003) textbook 5

Great Ideas in ML: Message Passing. Each soldier receives reports from all branches of the tree: 7 here and 3 here, so 11 here (= 7+3+1). Slides adapted from Matt Gormley (2016), adapted from MacKay (2003) textbook 6

Great Ideas in ML: Message Passing. Each soldier receives reports from all branches of the tree: 3 here, 7 here, 3 here. Belief: must be 14 of us. Slides adapted from Matt Gormley (2016), adapted from MacKay (2003) textbook 7

Great Ideas in ML: Message Passing. Each soldier receives reports from all branches of the tree: 3 here, 7 here, 3 here. Belief: must be 14 of us. This wouldn't work correctly with a 'loopy' (cyclic) graph. Slides adapted from Matt Gormley (2016), adapted from MacKay (2003) textbook 8

Review. Message from one clique C1 to another clique C2: multiply all incoming messages with the local factor and sum over the variables that are not shared, e.g. m_e(a,c,d) = Σ_e p(e | c,d) m_g(e) m_f(a,e). [Figure: elimination cliques over variables A-H with messages m_b, m_c, m_d, m_e, m_f, m_g, m_h] Eric Xing @ CMU, 2005-2015 9

Message passing (Belief Propagation) on a singly connected graph 10

Remember this: Factor Graph? [Figure: variables x1-x4 with factors f(x1,x2), f(x1,x3), f(x2,x4)] A factor graph is a graphical model representation that unifies directed and undirected models. It is an undirected bipartite graph with two kinds of nodes: round nodes represent variables, square nodes represent factors, and there is an edge between a variable and every factor that mentions it. We are going to study message passing between these nodes. 11
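
To make the bipartite structure concrete, here is a minimal sketch (mine, not from the slides) of a factor graph as a plain Python data structure: variable nodes carry a domain size, factor nodes carry a numpy table and the variables they mention, and the edges are exactly the (factor, variable) mentions. All names and numbers are illustrative.

    import numpy as np

    # Variables: name -> number of states.
    variables = {"x1": 2, "x2": 2, "x3": 2, "x4": 2}

    # Factors: name -> (scope, table); the table has one axis per scope variable.
    factors = {
        "f12": (["x1", "x2"], np.array([[1.0, 2.0], [3.0, 4.0]])),
        "f13": (["x1", "x3"], np.array([[2.0, 1.0], [1.0, 2.0]])),
        "f24": (["x2", "x4"], np.array([[1.0, 0.5], [0.5, 1.0]])),
    }

    # The bipartite edge set: one edge per (factor, variable) mention.
    edges = [(f, v) for f, (scope, _) in factors.items() for v in scope]
    print(edges)  # [('f12', 'x1'), ('f12', 'x2'), ('f13', 'x1'), ...]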

How General Are Factor Graphs? Factor graphs can be used to describe Markov Random Fields (undirected graphical models), i.e. log-linear models over a tuple of variables, Conditional Random Fields, and Bayesian Networks (directed graphical models). Inference treats all of these interchangeably: convert your model to a factor graph first. Pearl (1988) gave key strategies for exact inference: belief propagation, for inference on acyclic graphs, and the junction tree algorithm, for making any graph acyclic (by merging variables and factors, which blows up the runtime).

Factor Graph Notation. Variables: X_1, ..., X_9. Factors: unary factors ψ_{1}, ψ_{2}, ψ_{3}; pairwise factors ψ_{1,2}, ψ_{2,3}, ψ_{3,4}; and higher-order factors ψ_{1,8,9}, ψ_{2,7,8}, ψ_{3,6,7}. Joint distribution: proportional to the product of all the factors, p(x_1, ..., x_9) ∝ ∏_α ψ_α(x_α). [Figure: factor graph over the sentence "time flies like an arrow"] 13

Factors are Tensors. A unary factor is a vector over the values of one variable, e.g. (v 3, n 4, p 0.1, d 0.1); a binary factor is a matrix, e.g.

         v    n    p    d
    v    1    6    3    4
    n    8    4    2    0.1
    p    1    3    1    3
    d    0.1  8    0    0

and a ternary factor (e.g. over constituent labels s, vp, pp) is a three-way tensor. [Figure: factor graph for "time flies like an arrow" with factor tables attached] Slides adapted from Matt Gormley (2016) 14
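
A small sketch (mine) of the same idea in numpy: unary, binary, and ternary factors are simply arrays of rank 1, 2, and 3, with one axis per variable in the factor's scope. The tag set and numbers echo the slide's tables.

    import numpy as np

    tags = ["v", "n", "p", "d"]

    # Unary factor over one variable: a vector.
    psi_unary = np.array([3.0, 4.0, 0.1, 0.1])

    # Binary factor over two variables: a matrix (rows and columns indexed by tags).
    psi_binary = np.array([[1.0, 6.0, 3.0, 4.0],
                           [8.0, 4.0, 2.0, 0.1],
                           [1.0, 3.0, 1.0, 3.0],
                           [0.1, 8.0, 0.0, 0.0]])

    # Ternary factor over three variables: a rank-3 tensor (placeholder values).
    psi_ternary = np.ones((4, 4, 4))

    print(psi_unary.ndim, psi_binary.ndim, psi_ternary.ndim)  # 1 2 3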

An Inference Example 15

Message Passing in Belief Propagation. One incoming message to variable X says "my other factors think I'm a noun": (v 1, n 6, a 3). The other says "but my other variables and I think you're a verb": (v 6, n 1, a 3). Both of these messages judge the possible values of variable X. Their product is the belief at X, (v 6, n 6, a 9): the product of all messages sent to X. Slides adapted from Matt Gormley (2016) 16

Sum-Product Belief Propagation. Beliefs: variable belief b(x_i) ∝ ∏_{α ∈ N(i)} μ_{α→i}(x_i); factor belief b(x_α) ∝ ψ_α(x_α) ∏_{i ∈ N(α)} μ_{i→α}(x_i). Messages: variable-to-factor message μ_{i→α}(x_i) = ∏_{β ∈ N(i), β ≠ α} μ_{β→i}(x_i); factor-to-variable message μ_{α→i}(x_i) = Σ_{x_α ∖ x_i} ψ_α(x_α) ∏_{j ∈ N(α), j ≠ i} μ_{j→α}(x_j). [Figure: the four update rules illustrated on fragments with ψ_1, ψ_2, ψ_3 and X_1, X_2, X_3] Slides adapted from Matt Gormley (2016) 17

Sum-Product Belief Propagation: Variable Belief. Example: X_1 receives the messages (v 0.1, n 3, p 1), (v 1, n 2, p 2), and (v 4, n 1, p 0) from its neighboring factors ψ_1, ψ_2, ψ_3; its belief is their pointwise product, (v 0.4, n 6, p 0). Slides adapted from Matt Gormley (2016) 18

Sum-Product Belief Propagation: Variable Message. Example: the message from X_1 to ψ_1 is the pointwise product of the messages X_1 received from its other factors, (v 0.1, n 3, p 1) and (v 1, n 2, p 2), giving (v 0.1, n 6, p 2). Slides adapted from Matt Gormley (2016) 19

Sum-Product Belief Propagation: Factor Belief. Example: factor ψ_1 connects X_1 (values v, n) and X_3 (values p, d, n). Incoming messages: from X_1, (v 8, n 0.2); from X_3, (p 4, d 1, n 0). Factor table ψ_1(X_3, X_1):

         v     n
    p    0.1   8
    d    3     0
    n    1     1

The factor belief is the table multiplied pointwise by both incoming messages:

         v     n
    p    3.2   6.4
    d    24    0
    n    0     0

Slides adapted from Matt Gormley (2016) 20
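
A quick sketch (mine) reproducing this factor-belief arithmetic with numpy broadcasting; the numbers are the slide's, the variable names are illustrative.

    import numpy as np

    # Factor table psi_1(X3, X1): rows are X3 in {p, d, n}, columns are X1 in {v, n}.
    psi1 = np.array([[0.1, 8.0],
                     [3.0, 0.0],
                     [1.0, 1.0]])

    msg_from_x1 = np.array([8.0, 0.2])       # message X1 -> psi_1 over {v, n}
    msg_from_x3 = np.array([4.0, 1.0, 0.0])  # message X3 -> psi_1 over {p, d, n}

    # Factor belief: the table times both incoming messages (broadcast over axes).
    belief = psi1 * msg_from_x1[None, :] * msg_from_x3[:, None]
    print(belief)
    # [[ 3.2  6.4]
    #  [24.   0. ]
    #  [ 0.   0. ]]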

Sum-Product Belief Propagation: Factor Belief (continued). The belief at ψ_1 is the table above, over the joint values of X_1 and X_3. Slides adapted from Matt Gormley (2016) 21

Sum-Product Belief Propagation: Factor Message. Example: the message from ψ_1 to X_3 multiplies the factor table by the incoming message from X_1, (v 8, n 0.2), and sums over X_1: p: 0.1·8 + 8·0.2 = 0.8 + 1.6; d: 3·8 + 0·0.2 = 24 + 0; n: 1·8 + 1·0.2 = 8 + 0.2. Slides adapted from Matt Gormley (2016) 22

Sum-Product Belief Propagation: Factor Message. For a binary factor such as ψ_1(X_1, X_3), the factor-to-variable message is a matrix-vector product: the factor table times the vector of incoming messages from the other variable. Slides adapted from Matt Gormley (2016) 23
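
In code, the matrix-vector view of the previous example looks like this (a sketch of mine using the slide's numbers):

    import numpy as np

    # Factor table psi_1(X3, X1) and the incoming message from X1, as before.
    psi1 = np.array([[0.1, 8.0],
                     [3.0, 0.0],
                     [1.0, 1.0]])
    msg_from_x1 = np.array([8.0, 0.2])

    # Message from psi_1 to X3: sum over X1 of psi_1(X3, X1) * mu_{X1 -> psi_1}(X1),
    # which for a binary factor is exactly a matrix-vector product.
    msg_to_x3 = psi1 @ msg_from_x1
    print(msg_to_x3)  # [ 2.4 24.   8.2], i.e. 0.8+1.6, 24+0, 8+0.2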

Summary of the Messages 24

Sum-Product Belief Propagation. Input: a factor graph with no cycles. Output: exact marginals for each variable and factor. Algorithm: 1. Initialize the messages to the uniform distribution. 2. Choose a root node. 3. Send messages from the leaves to the root, then from the root to the leaves. 4. Compute the beliefs (unnormalized marginals). 5. Normalize the beliefs and return the exact marginals. 25
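
Below is a minimal sum-product sketch in Python/numpy (my own, not the lecture's reference code). Instead of an explicit leaf-to-root and root-to-leaf schedule it memoizes the two message types and computes them on demand, which performs the same two passes implicitly on an acyclic factor graph; the toy graph at the bottom is purely illustrative.

    import numpy as np
    from functools import lru_cache

    def sum_product(variables, factors):
        """Exact sum-product BP on an acyclic factor graph.
        variables: dict var_name -> domain size
        factors:   dict fac_name -> (scope, table), one table axis per scope variable
        Returns the normalized marginal of every variable."""
        var_nbrs = {v: [] for v in variables}
        for f, (scope, _) in factors.items():
            for v in scope:
                var_nbrs[v].append(f)

        @lru_cache(maxsize=None)
        def msg_var_to_fac(v, f):
            # Product of messages into v from all of its factors except f.
            m = np.ones(variables[v])
            for g in var_nbrs[v]:
                if g != f:
                    m = m * msg_fac_to_var(g, v)
            return m

        @lru_cache(maxsize=None)
        def msg_fac_to_var(f, v):
            # Multiply the factor table by the messages from its other variables,
            # then sum out every variable except v.
            scope, table = factors[f]
            t = np.array(table, dtype=float)
            for axis, u in enumerate(scope):
                if u != v:
                    shape = [1] * len(scope)
                    shape[axis] = variables[u]
                    t = t * msg_var_to_fac(u, f).reshape(shape)
            keep = scope.index(v)
            return t.sum(axis=tuple(i for i in range(len(scope)) if i != keep))

        marginals = {}
        for v in variables:
            b = np.ones(variables[v])
            for f in var_nbrs[v]:
                b = b * msg_fac_to_var(f, v)
            marginals[v] = b / b.sum()
        return marginals

    # A tiny acyclic example: a chain x1 - f12 - x2 - f23 - x3 with one unary factor.
    variables = {"x1": 2, "x2": 2, "x3": 2}
    factors = {
        "f1":  (["x1"], np.array([1.0, 3.0])),
        "f12": (["x1", "x2"], np.array([[3.0, 1.0], [1.0, 3.0]])),
        "f23": (["x2", "x3"], np.array([[2.0, 1.0], [1.0, 2.0]])),
    }
    print(sum_product(variables, factors))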


(Acyclic) Belief Propagation. In a factor graph with no cycles: 1. Pick any node to serve as the root. 2. Send messages from the leaves to the root. 3. Send messages from the root to the leaves. A node computes an outgoing message along an edge only after it has received incoming messages along all its other edges. [Figure: the "time flies like an arrow" factor graph with variables X_1-X_9 and factors ψ_1-ψ_13] Slides adapted from Matt Gormley (2016) 28

(Acyclic) Belief Propagation: the same procedure, illustrated again on the example factor graph. Slides adapted from Matt Gormley (2016) 29

A note on the implementation: to avoid numerical precision issues, work in log space and pass log messages. Products of messages become sums, and the sum in the factor-to-variable message becomes a log-sum-exp. 30
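
A small sketch (mine) of the log-space version of the factor-to-variable message from the earlier example, using a hand-rolled numerically stable log-sum-exp:

    import numpy as np

    def logsumexp(a, axis):
        """Numerically stable log(sum(exp(a))) along one axis."""
        m = np.max(a, axis=axis, keepdims=True)
        return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

    # Log factor table psi_1(X3, X1); the 1e-300 stands in for a zero entry
    # so that taking the log does not warn.
    log_psi = np.log(np.array([[0.1, 8.0],
                               [3.0, 1e-300],
                               [1.0, 1.0]]))
    log_msg_from_x1 = np.log(np.array([8.0, 0.2]))

    # In log space the product with the incoming message is a sum, and the
    # sum over X1 becomes a log-sum-exp over that axis.
    log_msg_to_x3 = logsumexp(log_psi + log_msg_from_x1[None, :], axis=1)
    print(np.exp(log_msg_to_x3))  # ~[ 2.4 24.   8.2], matching the linear-space result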

How about other queries? (MPA, i.e. the most probable assignment, and conditioning on evidence) 31

Example 32

The Max Product Algorithm 33
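
The max-product algorithm reuses the sum-product machinery, replacing the sum in the factor-to-variable message by a max (and keeping argmax back-pointers to decode the most probable assignment). A sketch of mine for one binary-factor message, with the numbers from the earlier example:

    import numpy as np

    psi1 = np.array([[0.1, 8.0],
                     [3.0, 0.0],
                     [1.0, 1.0]])
    msg_from_x1 = np.array([8.0, 0.2])

    scores = psi1 * msg_from_x1[None, :]
    max_msg_to_x3 = scores.max(axis=1)     # max instead of sum over X1
    back_ptr_x1   = scores.argmax(axis=1)  # best X1 value for each X3 value
    print(max_msg_to_x3)  # [ 1.6 24.   8. ]
    print(back_ptr_x1)    # [1 0 0]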

Can I use BP in a multiply connected graph? 34

Loops: the trouble makers. One needs to account for the fact that, when the graph has loops, its structure changes under variable elimination. The junction tree algorithm deals with this by combining variables into clusters, producing a new singly connected graph whose structure remains singly connected under variable elimination. 35

Clique Graph. Def (Clique Graph): A clique graph consists of a set of potentials φ_1(X^1), ..., φ_n(X^n), each defined on a set of variables X^i. For neighboring cliques on the graph, defined on sets of variables X^i and X^j, the intersection X^s = X^i ∩ X^j is called the separator and has a corresponding potential φ_s(X^s). A clique graph represents the function given by the product of the clique potentials divided by the product of the separator potentials, ∏_c φ_c(X^c) / ∏_s φ_s(X^s). Example 36

Clique Graph (definition repeated). Don't confuse it with a Factor Graph! Example 37

Example: probability density 38

Junction Tree. Idea: form a new representation of the graph in which variables are clustered together, resulting in a graph that is singly connected in the cluster variables. Insight: the distribution can be written as a product of marginal distributions, divided by a product of the marginals on their intersections (the separators). This is not a remedy for the intractability. 39
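
A small illustration of this insight (my example, not from the slides): for a Markov chain A → B → C, so that A is independent of C given B, the joint is the product of the clique marginals over {A,B} and {B,C} divided by the separator marginal over {B}:

    p(a,b,c) = p(a,b)\, p(c \mid b) = \frac{p(a,b)\, p(b,c)}{p(b)}

In junction-tree terms, the cliques are {A,B} and {B,C} and the separator is {B}.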

Let's learn by an example. [Figure: the student network (Coherence, Difficulty, Intelligence, Grades, SAT, Happy, Letter, Job), before and after moralization] Moralization 40

Let's learn by an example. Let's pick an ordering for the variable elimination: C, D, I, H, G, S, L. [Figures, slides 41-46: the moralized student network as C, D, I, ... are eliminated in turn] The rest is obvious.

OK, what have we got so far? [Figure: left, the directed network we started with; right, its moralized and triangulated undirected graph] 47

OK, what have we got so far? Def: An undirected graph is triangulated if every loop of length 4 or more has a chord. An equivalent term is that the graph is decomposable or chordal. From this definition, one may show that an undirected graph is triangulated if and only if its clique graph has a junction tree. [Figure: the directed network we started with, and its moralized and triangulated graph] 48

Let's build the Junction Tree. With the ordering C, D, I, H, G, S, L, the elimination cliques are {C,D}, {D,I,G}, {G,I,S}, {H,G,J}, {G,J,S,L}, {J,S,L}, and {L,J}; neighboring cliques are joined through separators such as {D}, {G,I}, {G,S}, {G,J}, {J,S,L}, and {L,J}. [Figure: the student network and the resulting junction tree] 49

Pass the messages on the JT. With the ordering C, D, I, H, G, S, L, messages are passed along the edges of the junction tree above. Can you figure out the directionality? [Figure: the junction tree with messages along its edges] 50

The message passing protocol: Cluster B is allowed to send a message to a neighbor C only after it has received messages from all neighbors except C. [Figure: the COLLECT phase (messages toward the root) and the DISTRIBUTE phase (messages away from the root)] 51

Message from one Clique to another (The Shafer-Shenoy Algorithm). The message from clique B to a neighboring clique C, evaluated on the separator assignment u, is

    μ_{B→C}(u) = Σ_{v ∈ B∖C} ψ_B(u ∪ v) ∏_{(A,B) ∈ E, A ≠ C} μ_{A→B}(u_A ∪ v_A)

i.e. multiply the potential of B by all incoming messages except the one from C, and sum over the variables of B that are not in the separator B ∩ C. 52
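
As a toy sketch of one such clique-to-clique message (my own setup: binary variables, random clique potentials), here are the first two messages along the junction tree built above:

    import numpy as np

    rng = np.random.default_rng(0)

    # Clique potentials on {C,D} and {D,I,G}; one axis per variable, all binary.
    psi_CD  = rng.random((2, 2))     # psi(C, D)
    psi_DIG = rng.random((2, 2, 2))  # psi(D, I, G)

    # Message {C,D} -> {D,I,G}: sum out C (not in the separator {D});
    # this leaf clique has no other incoming messages to multiply in.
    mu_CD_to_DIG = psi_CD.sum(axis=0)            # a table over D

    # Message {D,I,G} -> {G,I,S}: multiply in the incoming message over the
    # separator {D}, then sum out D (not in the separator {I,G}).
    mu_DIG_to_GIS = np.einsum("dig,d->ig", psi_DIG, mu_CD_to_DIG)
    print(mu_DIG_to_GIS.shape)  # (2, 2): a table over the separator {I, G}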

Formal Algorithm. Moralisation: marry the parents (only for directed distributions). Triangulation: ensure that every loop of length 4 or more has a chord. Junction Tree: form a junction tree from the cliques of the triangulated graph, removing any unnecessary links in a loop on the cluster graph. Algorithmically, this can be achieved by finding a maximum weight spanning tree, with weights given by the number of variables in the separator between cliques. Alternatively, given a clique elimination order (with the lowest cliques eliminated first), one may connect each clique to the single neighboring clique that comes later in the order and contains its remaining variables. Potential Assignment: assign potentials to the junction tree cliques and set the separator potentials to unity. Message Propagation: pass messages on the junction tree (collect toward a root, then distribute back). 53
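
The "maximum weight spanning tree" step can be sketched directly in Python (my own illustrative code, using the elimination cliques from the student-network example): run Kruskal's algorithm over all pairs of intersecting cliques, with edge weight equal to the separator size.

    # Cliques from the student-network example (elimination order C, D, I, H, G, S, L).
    cliques = [frozenset(c) for c in
               [{"C", "D"}, {"D", "I", "G"}, {"G", "I", "S"}, {"H", "G", "J"},
                {"G", "J", "S", "L"}, {"J", "S", "L"}, {"L", "J"}]]

    def max_weight_junction_tree(cliques):
        # Candidate edges: every pair of cliques with a non-empty intersection.
        edges = []
        for i in range(len(cliques)):
            for j in range(i + 1, len(cliques)):
                sep = cliques[i] & cliques[j]
                if sep:
                    edges.append((len(sep), i, j, sep))
        edges.sort(key=lambda e: -e[0])  # heaviest separators first (Kruskal)

        parent = list(range(len(cliques)))
        def find(x):  # union-find: add an edge only if it joins two components
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        tree = []
        for w, i, j, sep in edges:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                tree.append((set(cliques[i]), set(sep), set(cliques[j])))
        return tree

    for ci, sep, cj in max_weight_junction_tree(cliques):
        print(sorted(ci), "--", sorted(sep), "--", sorted(cj))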

Some Facts about BP. BP is exact on trees. If BP converges, it has reached a local minimum of an objective function (the Bethe free energy; Yedidia et al. 2000, Heskes 2002), so it is often a good approximation. If it converges, convergence is fast near the fixed point. Many exciting applications: error-correcting decoding (MacKay, Yedidia, McEliece, Frey), vision (Freeman, Weiss), bioinformatics (Weiss), constraint satisfaction problems (Dechter), game theory (Kearns), and more. 54

Summary of the Network Zoo. UGM and DGM: used to represent a family of probability distributions; arrows and circles have a clear definition. Factor Graph: a way to represent the factorization of both UGMs and DGMs; it is a bipartite graph, more like a data structure; not meant for reading off independencies. Clique graph or Junction Tree: a data structure used for exact inference and message passing; nodes are clusters of variables; not meant for reading off independencies. [Figure: a small factor graph with variables x1-x4 and factors f(x1,x2), f(x1,x3), f(x2,x4)] 55