Message Passing Algorithms and Junction Tree Algorithms


Le Song. Machine Learning II: Advanced Topics, CSE 8803ML, Spring 2012

Inference in Graphical Models

General form of the inference problem: P(X_1, ..., X_n) ∝ ∏_i Ψ(C_i). We want to query a variable Y given evidence e, and we don't care about a set of variables Z. Compute τ(Y, e) = Σ_Z ∏_i Ψ(C_i) using variable elimination, then renormalize to obtain the conditional P(Y | e) = τ(Y, e) / Σ_Y τ(Y, e). Two examples show how to use the graph structure to order the computation, starting with a chain A - B - C - D - E.
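As a concrete illustration of this recipe, here is a minimal sketch that computes τ(Y, e) by brute-force summation over Z and then renormalizes. The three-variable chain and all of its numbers are toy assumptions for the example, not values from the lecture.

```python
import numpy as np

# Toy binary chain A -> B -> C: query Y = C, evidence e = {A = 1}, nuisance Z = {B}.
p_a = np.array([0.6, 0.4])                    # P(A)
p_b_given_a = np.array([[0.7, 0.3],           # P(B | A), rows indexed by A
                        [0.2, 0.8]])
p_c_given_b = np.array([[0.9, 0.1],           # P(C | B), rows indexed by B
                        [0.4, 0.6]])
a_obs = 1                                     # observed evidence A = 1

# tau(C, e) = sum_b P(a_obs) P(b | a_obs) P(c | b)
tau = np.array([sum(p_a[a_obs] * p_b_given_a[a_obs, b] * p_c_given_b[b, c]
                    for b in range(2))
                for c in range(2)])

# Renormalize: P(C | e) = tau(C, e) / sum_C tau(C, e)
print(tau / tau.sum())
```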

Chain: Query E

Nice localization in computation:
P(E) = Σ_d Σ_c Σ_b Σ_a P(a) P(b|a) P(c|b) P(d|c) P(e|d)
P(E) = Σ_d P(e|d) Σ_c P(d|c) ( Σ_b P(c|b) Σ_a P(b|a) P(a) )
Eliminating A produces m_b(B) = Σ_a P(b|a) P(a); eliminating B produces m_c(C) = Σ_b P(c|b) m_b(b); eliminating C produces m_d(D); and finally P(E) = m_e(E).

Chain: Query C

Start elimination away from the query variable:
P(C) = Σ_d Σ_e Σ_b Σ_a P(a) P(b|a) P(c|b) P(d|c) P(e|d)
P(C) = ( Σ_d P(d|c) ( Σ_e P(e|d) ) ) ( Σ_b P(c|b) ( Σ_a P(b|a) P(a) ) )
The right factor builds the forward messages m_b(B), m_c(C); the left factor builds the backward messages m̄_d(D), m̄_c(C); and P(C) = m_c(C) m̄_c(C).

Chain: What if I want to query everybody?

P(A) = P(a) Σ_b P(b|a) ( Σ_c P(c|b) ( Σ_d P(d|c) ( Σ_e P(e|d) ) ) ), built from the backward messages m̄_d(D), m̄_c(C), m̄_b(B), m̄_a(A).
Queries: P(A), P(B), P(C), P(D), P(E). Computational cost: each message costs O(K²); the chain length is L; the cost for each query is about O(LK²); for L queries, the cost is about O(L²K²).

What is shared in these queries?

P(A) = P(a) Σ_b P(b|a) ( Σ_c P(c|b) ( Σ_d P(d|c) ( Σ_e P(e|d) ) ) )  ->  m̄_d, m̄_c, m̄_b, m̄_a
P(E) = Σ_d P(e|d) Σ_c P(d|c) ( Σ_b P(c|b) Σ_a P(b|a) P(a) )  ->  m_b, m_c, m_d, m_e
P(C) = ( Σ_d P(d|c) ( Σ_e P(e|d) ) ) ( Σ_b P(c|b) ( Σ_a P(b|a) P(a) ) )  ->  m_b, m_c and m̄_d, m̄_c
The number of unique messages is 2(L - 1).

Forward-backward algorithm

Compute and cache the 2(L - 1) unique messages.
Forward pass: m_b(B), m_c(C), m_d(D), m_e(E)
Backward pass: m̄_d(D), m̄_c(C), m̄_b(B), m̄_a(A)
At query time, just multiply together the messages from the neighbors, e.g. P(C) = m_c(C) m̄_c(C). For all queries, the total cost is O(2LK²).
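The sketch below implements this two-pass scheme for a chain with randomly generated toy parameters; the unary evidence potentials are an extra assumption, added so that the backward messages are nontrivial.

```python
import numpy as np

L, K = 5, 3                                         # chain length, states per node
rng = np.random.default_rng(0)
prior = rng.dirichlet(np.ones(K))                   # P(X_1)
trans = rng.dirichlet(np.ones(K), size=(L - 1, K))  # trans[t][i] = P(X_{t+1} = . | X_t = i)
evid = rng.random((L, K))                           # unary evidence potentials (assumed)

# Forward pass: fwd[t](x_t) sums out X_1 .. X_{t-1}
fwd = np.zeros((L, K))
fwd[0] = prior * evid[0]
for t in range(1, L):
    fwd[t] = (fwd[t - 1] @ trans[t - 1]) * evid[t]

# Backward pass: bwd[t](x_t) sums out X_{t+1} .. X_L
bwd = np.ones((L, K))
for t in range(L - 2, -1, -1):
    bwd[t] = trans[t] @ (evid[t + 1] * bwd[t + 1])

# Any marginal is now a product of cached messages plus an O(K) renormalization.
marginals = fwd * bwd
marginals /= marginals.sum(axis=1, keepdims=True)
print(marginals)
```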

Example: Variable elimination

Factorization: P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) P(g|e) P(h|e,f). Elimination order H, G, F, E, B, C, D for the query P(A):
P(A) = P(a) Σ_d P(d|a) ( Σ_c ( Σ_b P(b) P(c|b) ) ( Σ_e P(e|c,d) ( Σ_g P(g|e) ) ( Σ_f P(f|a) Σ_h P(h|e,f) ) ) )
Eliminating H, G, F, E, B, C, D creates the messages m_H(E,F), m_G(E), m_F(A,E), m_E(A,C,D), m_B(C), m_C(A,D), m_D(A). Eliminating E multiplies factors over A, C, D, E: 4-way tables are created!

Example: Cliques of size 4 are generated

Messages: m_H(E,F), m_G(E), m_F(A,E), m_E(A,C,D), m_B(C), m_C(A,D), m_D(A). The elimination of E couples A, C, D, and E, so a 4-way table is created.

Example: A different elimination order

Elimination order G, H, F, B, C, D, E for the same query P(A):
P(A) = P(a) Σ_e ( Σ_d P(d|a) Σ_c P(e|c,d) Σ_b P(b) P(c|b) ) ( Σ_f P(f|a) Σ_h P(h|e,f) ) ( Σ_g P(g|e) )
Messages: m_G(E), m_H(E,F), m_F(A,E), m_B(C), m_C(D,E), m_D(A,E), m_E(A). NO 4-way tables!

Example: No cliques of size 4

With the order G, H, F, B, C, D, E, the messages are m_G(E), m_H(E,F), m_F(A,E), m_B(C), m_C(D,E), m_D(A,E), m_E(A): every elimination step touches at most 3 variables.
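To compare the two orders without building numeric tables, the sketch below tracks only the scopes of the factors and reports the size of the clique formed at each elimination step; the helper function is hypothetical, not from the lecture.

```python
def elimination_clique_sizes(scopes, order):
    """scopes: list of factor scopes (tuples of variable names); order: elimination order."""
    factors = [set(s) for s in scopes]
    sizes = []
    for var in order:
        touched = [f for f in factors if var in f]
        clique = set().union(*touched)            # all variables in the product
        sizes.append(len(clique))
        factors = [f for f in factors if var not in f]
        factors.append(clique - {var})            # scope of the new message
    return sizes

# Scopes of the CPTs P(a)P(b)P(c|b)P(d|a)P(e|c,d)P(f|a)P(g|e)P(h|e,f)
scopes = [('a',), ('b',), ('c', 'b'), ('d', 'a'), ('e', 'c', 'd'),
          ('f', 'a'), ('g', 'e'), ('h', 'e', 'f')]

print(elimination_clique_sizes(scopes, list('hgfebcd')))  # bad order: a 4 appears
print(elimination_clique_sizes(scopes, list('ghfbcde')))  # good order: max clique size 3
```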

Any thoughts?

Chains have nice properties: the forward-backward algorithm works, and intermediate results (messages) live along the edges. Can we generalize to other graphs (trees, loopy graphs)? How about undirected trees: is there a forward-backward algorithm? Loopy graphs are more complicated: different elimination orders result in different computational costs. Can we somehow make loopy graphs behave like trees?

Tree Graphical Models

Undirected tree: a unique path between any pair of nodes. Directed tree: all nodes except the root have exactly one parent.

Equivalence of directed and undirected trees

Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it. A directed tree and the corresponding undirected tree make the same conditional independence assertions, and the parameterizations are essentially the same.
Undirected tree: P(X) = (1/Z) ∏_{i ∈ V} Ψ(X_i) ∏_{(i,j) ∈ E} Ψ(X_i, X_j)
Directed tree: P(X) = P(X_r) ∏_{(i,j) ∈ E} P(X_j | X_i)
Equivalence: Ψ(X_r) = P(X_r), Ψ(X_i, X_j) = P(X_j | X_i), Z = 1, and all other Ψ(X_i) = 1.
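A tiny sketch of this mapping on a hypothetical directed chain-tree r -> u -> w with made-up CPTs: the root keeps its prior, each edge potential is the corresponding CPT, the remaining node potentials are 1, and the undirected product then reproduces the directed joint with Z = 1.

```python
import numpy as np

cpts = {'r': np.array([0.3, 0.7]),                       # P(X_r)
        ('r', 'u'): np.array([[0.9, 0.1], [0.5, 0.5]]),  # P(X_u | X_r)
        ('u', 'w'): np.array([[0.2, 0.8], [0.6, 0.4]])}  # P(X_w | X_u)

# The equivalence above: Psi(X_r) = P(X_r), Psi(X_i, X_j) = P(X_j | X_i), rest = 1
node_pot = {'r': cpts['r'], 'u': np.ones(2), 'w': np.ones(2)}
edge_pot = {e: p for e, p in cpts.items() if isinstance(e, tuple)}

# Check: the undirected product reproduces the directed joint, so Z = 1
joint = np.einsum('r,u,w,ru,uw->ruw',
                  node_pot['r'], node_pot['u'], node_pot['w'],
                  edge_pot[('r', 'u')], edge_pot[('u', 'w')])
print(joint.sum())   # 1.0
```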

Message passing on trees

Messages are passed along tree edges. Consider the tree with edges k - j, l - j, j - i, i - f:
P(X_i, X_j, X_k, X_l, X_f) ∝ Ψ(X_i) Ψ(X_j) Ψ(X_k) Ψ(X_l) Ψ(X_f) Ψ(X_i, X_j) Ψ(X_k, X_j) Ψ(X_l, X_j) Ψ(X_i, X_f)
P(X_f) ∝ Ψ(X_f) Σ_{x_i} ( Ψ(X_i) Ψ(X_i, X_f) Σ_{x_j} Ψ(X_j) Ψ(X_i, X_j) ( Σ_{x_k} Ψ(X_k) Ψ(X_k, X_j) ) ( Σ_{x_l} Ψ(X_l) Ψ(X_l, X_j) ) )
The nested sums are exactly the messages m_kj(X_j), m_lj(X_j), m_ji(X_i), m_if(X_f).

Sharing messages on trees

Query f: the messages m_kj(X_j), m_lj(X_j), m_ji(X_i), m_if(X_f) flow toward f.
Query j: the messages m_kj(X_j), m_lj(X_j), m_fi(X_i), m_ij(X_j) flow toward j.
The messages m_kj(X_j) and m_lj(X_j) are shared between the two queries.

Computational cost for all queries

Queries: P(X_k), P(X_l), P(X_j), P(X_i), P(X_f). Doing things separately: each message costs O(K²), the number of edges is L, the cost for each query is about O(LK²), and for L queries the cost is about O(L²K²).

Forward-backward algorithm in trees

Forward: pick one leaf as the root, compute all messages toward it, and cache them. Backward: pick another root, compute all messages, reusing the cached messages wherever possible. E.g., querying j just multiplies the incoming messages m_kj(X_j), m_lj(X_j), m_ij(X_j).

Computational saving for trees

Compute the forward and backward messages for each edge and save them. Each message costs O(K²); with L edges there are 2L unique messages, so the cost for all queries is about O(2LK²), compared to O(L²K²) for doing things separately.

Message passing algorithm

m_ji(X_i) = Σ_{X_j} Ψ(X_i, X_j) Ψ(X_j) ∏_{s ∈ N(j)\i} m_sj(X_j)
Take the product of the incoming messages, multiply by the local potentials, and sum out X_j. Node X_j can send its message to X_i once the incoming messages from all of N(j)\i have arrived.
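A minimal sketch of this update on the tree from the slides (leaves k and l attached to j, then j - i - f), with toy random potentials; message(j, i) computes exactly the sum-product update above, recursing so that each message waits for the incoming messages it needs.

```python
import numpy as np

K = 2
edges = [('k', 'j'), ('l', 'j'), ('j', 'i'), ('i', 'f')]    # the tree from the slides
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

rng = np.random.default_rng(1)
node_pot = {v: rng.random(K) for v in neighbors}            # Psi(X_v), toy values
edge_pot = {}
for u, v in edges:
    edge_pot[(u, v)] = rng.random((K, K))                   # Psi(X_u, X_v)
    edge_pot[(v, u)] = edge_pot[(u, v)].T                   # same potential, other direction

cache = {}

def message(j, i):
    """m_ji(X_i): product of incoming messages at j, times local potential, sum out X_j."""
    if (j, i) not in cache:
        g = node_pot[j].copy()
        for s in neighbors[j] - {i}:
            g *= message(s, j)                              # wait for messages from N(j)\i
        cache[(j, i)] = edge_pot[(j, i)].T @ g              # sum_{x_j} Psi(x_j, x_i) g(x_j)
    return cache[(j, i)]

def marginal(v):
    b = node_pot[v].copy()
    for s in neighbors[v]:
        b *= message(s, v)
    return b / b.sum()

print(marginal('f'), marginal('j'))                         # all queries reuse the cache
```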

Message passing for loopy graphs

Local message passing on trees guarantees the consistency of the local marginals: the P(X_i) computed is the correct one, and the P(X_i, X_j) computed is the correct one. For loopy graphs, there are no consistency guarantees for local message passing.

Message update schedule

Synchronous update: X_j can send a message once the incoming messages from N(j)\i have arrived. Slow; provably correct for trees, and may converge for loopy graphs.
Asynchronous update: X_j can send a message whenever any incoming message from N(j)\i changes. Fast; convergence is not easy to prove, but empirically it often works.
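For loopy graphs a common compromise is the synchronous "flooding" schedule sketched below: every message is recomputed from the previous iteration's messages until nothing changes. This is a generic sketch that assumes the neighbors / node_pot / edge_pot structures of the previous example; the message normalization is added purely for numerical stability.

```python
import numpy as np

def loopy_bp(neighbors, node_pot, edge_pot, K, max_iters=100, tol=1e-8):
    """Synchronous BP: recompute all messages from the previous iteration's values."""
    msgs = {(u, v): np.ones(K) for u in neighbors for v in neighbors[u]}
    for _ in range(max_iters):
        new = {}
        for (j, i) in msgs:
            g = node_pot[j].copy()
            for s in neighbors[j] - {i}:
                g *= msgs[(s, j)]
            m = edge_pot[(j, i)].T @ g
            new[(j, i)] = m / m.sum()          # normalize for numerical stability
        delta = max(np.abs(new[e] - msgs[e]).max() for e in msgs)
        msgs = new
        if delta < tol:                        # exact on trees; only empirical on loopy graphs
            break
    return msgs
```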

How about general graphs?

Trees are nice: we can compute just two messages for each edge, order the computation along the graph, and associate intermediate results with edges. For a general graph this is not so clear: different elimination orders generate different cliques and factor sizes, and computation and intermediate results are not associated with edges, so the local computation view breaks down. Can we make general graphs tree-like?

Clique Graph

A clique graph H for a graph G: each node in H corresponds to a clique in G, each maximal clique in G is a node in H, and each edge S_ij is the set of variables shared by the two nodes C_i and C_j.

Clique Graph: Another example

Can we run message passing on this clique tree? Some of the variables appear in 3 different places, so their copies need not agree.

The junction tree

A junction tree is a clique tree with the running intersection property: if two cliques share certain variables, then these variables appear everywhere on the path between them.

How to obtain a junction tree

Run a maximum spanning tree algorithm on the clique graph, where the weight of an edge is the number of shared variables (the separator size) on that edge.
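A sketch of this construction. The clique list is an assumption: these are the maximal cliques obtained by triangulating the 8-node example with the good elimination order above. Edges of the clique graph are weighted by separator size, and Kruskal's algorithm, taking the heaviest edges first, yields a maximum spanning tree.

```python
from itertools import combinations

# Assumed maximal cliques from triangulating the 8-node example with the good order
cliques = [frozenset(c) for c in ('eg', 'efh', 'aef', 'bc', 'cde', 'ade')]

# Clique-graph edges (nonempty intersection), heaviest separator first
cand = sorted(((c1, c2) for c1, c2 in combinations(cliques, 2) if c1 & c2),
              key=lambda e: -len(e[0] & e[1]))

# Kruskal's algorithm with a tiny union-find gives a *maximum* spanning tree
parent = {c: c for c in cliques}
def find(c):
    while parent[c] != c:
        c = parent[c]
    return c

junction_tree = []
for c1, c2 in cand:
    r1, r2 = find(c1), find(c2)
    if r1 != r2:                       # edge joins two components: keep it
        parent[r1] = r2
        junction_tree.append((c1, c2))

for c1, c2 in junction_tree:
    print(sorted(c1), '--', sorted(c2), 'separator:', sorted(c1 & c2))
```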

Junction tree algorithm for inference

Moralize the graph. Triangulate the graph. Obtain the clique tree. Obtain the junction tree. Run local message passing at the clique level instead of the variable level.
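As a sketch of the first step, moralization takes only a few lines; the parent lists below are those of the 8-node example used earlier.

```python
from itertools import combinations

# Parent lists of the 8-node example: P(a)P(b)P(c|b)P(d|a)P(e|c,d)P(f|a)P(g|e)P(h|e,f)
parents = {'a': [], 'b': [], 'c': ['b'], 'd': ['a'], 'e': ['c', 'd'],
           'f': ['a'], 'g': ['e'], 'h': ['e', 'f']}

moral_edges = set()
for child, ps in parents.items():
    for p in ps:
        moral_edges.add(frozenset((p, child)))   # drop edge directions
    for p1, p2 in combinations(ps, 2):
        moral_edges.add(frozenset((p1, p2)))     # "marry" co-parents
print(sorted(tuple(sorted(e)) for e in moral_edges))
```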