Message Passing Algorithms and Junction Tree Algorithms

Le Song
Machine Learning II: Advanced Topics, CSE 8803ML, Spring 2012

Inference in Graphical Models

General form of the inference problem:
$$P(X_1, \ldots, X_n) \propto \prod_i \Psi(D_i)$$
We want to query the variables $Y$ given evidence $e$, and we don't care about a set of variables $Z$. Compute
$$\tau(Y, e) = \sum_Z \prod_i \Psi(D_i)$$
using variable elimination, then renormalize to obtain the conditional
$$P(Y \mid e) = \frac{\tau(Y, e)}{\sum_Y \tau(Y, e)}$$
Two examples use the graph structure to order the computation. The first is a chain: $A \to B \to C \to D \to E$.
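A minimal sketch of this query, assuming three binary variables `Y`, `Z`, `E` with made-up factor tables (the names `factors` and `query` are hypothetical): it sums out $Z$ by brute force to get $\tau(Y, e)$, then renormalizes over $Y$.

```python
# Hypothetical factors Psi(D_i) over binary variables Y, Z, E.
# Each factor is (scope, table); table maps an assignment tuple to a value.
factors = [
    (("Y",), {(0,): 0.6, (1,): 0.4}),
    (("Y", "Z"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}),
    (("Z", "E"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}),
]


def query(factors, e_value):
    """Return P(Y | E = e_value): sum out Z, then renormalize over Y."""
    tau = {}
    for y in (0, 1):
        total = 0.0
        for z in (0, 1):  # sum over the "don't care" variable Z
            assignment = {"Y": y, "Z": z, "E": e_value}
            prod = 1.0
            for scope, table in factors:
                prod *= table[tuple(assignment[v] for v in scope)]
            total += prod
        tau[y] = total  # tau(Y = y, e)
    norm = sum(tau.values())  # sum over Y of tau(Y, e)
    return {y: t / norm for y, t in tau.items()}


posterior = query(factors, e_value=1)
print(posterior)
```

Here $\tau(Y{=}0, e) = 0.15$ and $\tau(Y{=}1, e) = 0.2$, so the posterior is $(3/7, 4/7)$.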

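Chain: Query E

Nice localization in computation:
$$P(E) = \sum_d \sum_c \sum_b \sum_a P(a)\, P(b \mid a)\, P(c \mid b)\, P(d \mid c)\, P(E \mid d)$$
$$P(E) = \sum_d P(E \mid d) \sum_c P(d \mid c) \sum_b P(c \mid b) \sum_a P(b \mid a)\, P(a)$$
The nested sums are messages:
$$m_{AB}(b) = \sum_a P(b \mid a) P(a), \quad m_{BC}(c) = \sum_b P(c \mid b)\, m_{AB}(b), \quad m_{CD}(d) = \sum_c P(d \mid c)\, m_{BC}(c), \quad m_{DE}(E) = \sum_d P(E \mid d)\, m_{CD}(d)$$
so $P(E) = m_{DE}(E)$.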
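A sketch of this forward elimination, assuming binary variables, a made-up prior `p_a`, and one hypothetical transition table `T` reused on every edge. Each loop step produces one message in $O(K^2)$, and the result is checked against brute-force summation of the joint.

```python
from itertools import product

K = 2                          # states per variable
p_a = [0.3, 0.7]               # hypothetical P(A)
T = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical P(next = y | prev = x) = T[x][y], shared by all edges


def eliminate_toward_end(p_first, T, n_vars):
    """Sum out variables from the start of the chain; return P(last variable).

    Each step computes m(y) = sum_x m(x) * T[x][y], an O(K^2) operation.
    The first step yields m_AB(b), then m_BC(c), m_CD(d), and finally m_DE(E) = P(E).
    """
    msg = p_first
    for _ in range(n_vars - 1):
        msg = [sum(msg[x] * T[x][y] for x in range(K)) for y in range(K)]
    return msg


p_e = eliminate_toward_end(p_a, T, n_vars=5)

# Brute-force check: sum the full joint over a, b, c, d.
brute = [0.0] * K
for a, b, c, d, e in product(range(K), repeat=5):
    brute[e] += p_a[a] * T[a][b] * T[b][c] * T[c][d] * T[d][e]

print(p_e, brute)
```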

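Chain: Query C

Start elimination away from the query variable:
$$P(C) = \sum_d \sum_e \sum_b \sum_a P(a)\, P(b \mid a)\, P(C \mid b)\, P(d \mid C)\, P(e \mid d)$$
$$P(C) = \Big(\sum_d P(d \mid C) \sum_e P(e \mid d)\Big) \Big(\sum_b P(C \mid b) \sum_a P(b \mid a)\, P(a)\Big)$$
with messages $m_{ED}(d) = \sum_e P(e \mid d)$, $m_{DC}(C) = \sum_d P(d \mid C)\, m_{ED}(d)$, $m_{AB}(b) = \sum_a P(b \mid a) P(a)$, and $m_{BC}(C) = \sum_b P(C \mid b)\, m_{AB}(b)$, so
$$P(C) = m_{BC}(C)\, m_{DC}(C)$$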
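The same idea for $P(C)$, sketched with the same kind of made-up parameters (`p_a` and `T` are hypothetical): two messages arrive from the left, two from the right, and their product at $C$ matches brute force.

```python
from itertools import product

K = 2
p_a = [0.3, 0.7]               # hypothetical P(A)
T = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical P(next = y | prev = x) = T[x][y]

# From the left: m_AB(b) = sum_a P(b|a) P(a), then m_BC(C) = sum_b P(C|b) m_AB(b).
m_ab = [sum(p_a[a] * T[a][b] for a in range(K)) for b in range(K)]
m_bc = [sum(T[b][c] * m_ab[b] for b in range(K)) for c in range(K)]

# From the right: m_ED(d) = sum_e P(e|d), then m_DC(C) = sum_d P(d|C) m_ED(d).
m_ed = [sum(T[d][e] for e in range(K)) for d in range(K)]
m_dc = [sum(T[c][d] * m_ed[d] for d in range(K)) for c in range(K)]

# P(C) is the product of the two messages arriving at C.
p_c = [m_bc[c] * m_dc[c] for c in range(K)]

brute = [0.0] * K
for a, b, c, d, e in product(range(K), repeat=5):
    brute[c] += p_a[a] * T[a][b] * T[b][c] * T[c][d] * T[d][e]

print(p_c, brute)
```

Because every row of a conditional probability table sums to one, $m_{ED}$ and $m_{DC}$ are all ones in this directed example; with evidence or undirected potentials they generally are not.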

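Chain: What if I want to query everybody?

For example,
$$P(B) = \Big(\sum_c P(c \mid B) \sum_d P(d \mid c) \sum_e P(e \mid d)\Big) \Big(\sum_a P(a)\, P(B \mid a)\Big)$$
uses the messages $m_{ED}(d)$, $m_{DC}(c)$, $m_{CB}(B)$, and $m_{AB}(B)$. Now query $P(A)$, $P(B)$, $P(C)$, $P(D)$, $P(E)$ in turn. Computational cost, with $K$ states per variable: each message costs $O(K^2)$ and the chain length is $L$, so each query costs about $O(LK^2)$, and $L$ queries cost about $O(L^2K^2)$.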
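The $O(L^2K^2)$ total follows from simple counting: each query eliminates the other $L-1$ variables, one $O(K^2)$ message apiece, and this is repeated for each of the $L$ queries.

$$\underbrace{L}_{\text{queries}} \times \underbrace{(L-1)}_{\text{messages per query}} \times \underbrace{O(K^2)}_{\text{per message}} = O(L^2K^2)$$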

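What is shared in these queries?

$$P(B) = \Big(\sum_c P(c \mid B) \sum_d P(d \mid c) \sum_e P(e \mid d)\Big) \Big(\sum_a P(a)\, P(B \mid a)\Big)$$
$$P(E) = \sum_d P(E \mid d) \sum_c P(d \mid c) \sum_b P(c \mid b) \sum_a P(b \mid a)\, P(a)$$
$$P(C) = \Big(\sum_d P(d \mid C) \sum_e P(e \mid d)\Big) \Big(\sum_b P(C \mid b) \sum_a P(b \mid a)\, P(a)\Big)$$
The same messages (for example $m_{AB}$ and $m_{ED}$) reappear across queries. The number of unique messages is $2(L-1)$.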
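A quick check of the $2(L-1)$ count. The helper `messages_for_query` and the 0-indexed chain are assumptions for illustration: for each query on a chain with $L = 5$ nodes, the sketch lists the directed messages it uses, then collects the distinct ones.

```python
def messages_for_query(q, L):
    """Directed messages (sender, receiver) needed for P(X_q) on a chain 0, 1, ..., L-1."""
    from_left = [(i, i + 1) for i in range(q)]           # flow rightward into q
    from_right = [(i + 1, i) for i in range(q, L - 1)]   # flow leftward into q
    return from_left + from_right


L = 5
per_query = {q: messages_for_query(q, L) for q in range(L)}
unique = set().union(*per_query.values())
total_if_separate = sum(len(msgs) for msgs in per_query.values())

print(len(unique), total_if_separate)  # prints: 8 20
```

Separately, the five queries compute $L(L-1) = 20$ messages, but only $2(L-1) = 8$ of them are distinct.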

Forward-backward algorithm

Compute and cache the $2(L-1)$ unique messages.
Forward pass: $m_{AB}(b)$, $m_{BC}(c)$, $m_{CD}(d)$, $m_{DE}(e)$.
Backward pass: $m_{ED}(d)$, $m_{DC}(c)$, $m_{CB}(b)$, $m_{BA}(a)$.
At query time, just multiply together the messages from the neighbors, e.g. $P(C) = m_{BC}(C)\, m_{DC}(C)$. For all queries, the cost is $O(2LK^2)$.
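A sketch of forward-backward on a chain. To make the backward messages non-trivial, it assumes undirected potentials instead of CPTs: a hypothetical unary potential `psi_first` on the first variable and one hypothetical pairwise table `psi` shared by every edge. The $2(L-1)$ cached messages yield every marginal, checked against brute force.

```python
from itertools import product

K, L = 3, 6
psi_first = [0.5, 0.3, 0.2]        # hypothetical unary potential on the first variable
psi = [[2.0, 1.0, 1.0],            # hypothetical pairwise potential Psi(x_t = i, x_{t+1} = j),
       [1.0, 3.0, 1.0],            # shared by every edge; rows need not sum to one
       [1.0, 1.0, 2.0]]


def normalize(v):
    z = sum(v)
    return [x / z for x in v]


# Forward pass: fwd[0] is the unary potential; fwd[1..L-1] are the L-1 forward messages.
fwd = [psi_first]
for t in range(1, L):
    prev = fwd[-1]
    fwd.append([sum(prev[i] * psi[i][j] for i in range(K)) for j in range(K)])

# Backward pass: bwd[L-1] is all ones; bwd[0..L-2] are the L-1 backward messages.
bwd = [[1.0] * K for _ in range(L)]
for t in range(L - 2, -1, -1):
    nxt = bwd[t + 1]
    bwd[t] = [sum(psi[i][j] * nxt[j] for j in range(K)) for i in range(K)]

# Query time: each marginal multiplies the two cached quantities at that position.
marginals = [normalize([fwd[t][x] * bwd[t][x] for x in range(K)]) for t in range(L)]

# Brute-force check over all K**L assignments.
brute = [[0.0] * K for _ in range(L)]
for xs in product(range(K), repeat=L):
    p = psi_first[xs[0]]
    for t in range(L - 1):
        p *= psi[xs[t]][xs[t + 1]]
    for t in range(L):
        brute[t][xs[t]] += p
brute = [normalize(row) for row in brute]
print(marginals[2], brute[2])
```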

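Example: Variable elimination

Elimination order: H, G, F, E, B, C, D (query $A$).
$$P(A) = P(A) \sum_d P(d \mid A) \sum_c \Big(\sum_b P(b)\, P(c \mid b)\Big) \Big(\sum_e P(e \mid c, d) \Big(\sum_g P(g \mid e)\Big) \Big(\sum_f P(f \mid A) \sum_h P(h \mid e, f)\Big)\Big)$$
The intermediate messages are $m_H(e, f)$, $m_G(e)$, $m_F(A, e)$, $m_E(A, c, d)$, $m_B(c)$, $m_C(A, d)$, and $m_D(A)$. Eliminating $E$ multiplies $P(e \mid c, d)$, $m_G(e)$, and $m_F(A, e)$, so 4-way tables are created!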

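Example: Cliques of size 4 are generated

[Figure: the graph over A to H annotated with the messages $m_H(e, f)$, $m_G(e)$, $m_F(A, e)$, $m_E(A, c, d)$, $m_B(c)$, $m_C(A, d)$, $m_D(A)$; 4-way tables are created.]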

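Example: A different elimination order

Elimination order: B, C, D, H, F, G, E (query $A$).
$$P(A) = P(A) \sum_e \Big(\sum_d P(d \mid A) \sum_c P(e \mid c, d) \sum_b P(b)\, P(c \mid b)\Big) \Big(\sum_f P(f \mid A) \sum_h P(h \mid e, f)\Big) \Big(\sum_g P(g \mid e)\Big)$$
The intermediate messages are $m_B(c)$, $m_C(d, e)$, $m_D(A, e)$, $m_H(e, f)$, $m_F(A, e)$, $m_G(e)$, and $m_E(A)$. NO 4-way tables!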

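Example: No cliques of size 4

[Figure: the same graph annotated with the messages $m_B(c)$, $m_C(d, e)$, $m_D(A, e)$, $m_H(e, f)$, $m_F(A, e)$, $m_G(e)$, $m_E(A)$; no table involves more than 3 variables.]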
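Only the factor scopes matter for this comparison. The sketch below uses a hypothetical helper `table_sizes` that eliminates variables symbolically from $P(a)P(b)P(c \mid b)P(d \mid a)P(e \mid c,d)P(f \mid a)P(g \mid e)P(h \mid e,f)$ and reports how many variables each intermediate table spans. The two orders are read off the nesting of the two expressions for $P(A)$.

```python
# Factor scopes of P(a)P(b)P(c|b)P(d|a)P(e|c,d)P(f|a)P(g|e)P(h|e,f)
SCOPES = [{"A"}, {"B"}, {"B", "C"}, {"A", "D"}, {"C", "D", "E"},
          {"A", "F"}, {"E", "G"}, {"E", "F", "H"}]


def table_sizes(scopes, order):
    """Simulate variable elimination on scopes only.

    For each eliminated variable, return the number of variables in the table
    formed by multiplying every factor that mentions it.
    """
    factors = [set(s) for s in scopes]
    sizes = {}
    for v in order:
        involved = [f for f in factors if v in f]
        factors = [f for f in factors if v not in f]
        joint = set().union(*involved)
        sizes[v] = len(joint)
        factors.append(joint - {v})  # scope of the new message after summing out v
    return sizes


order1 = ["H", "G", "F", "E", "B", "C", "D"]
order2 = ["B", "C", "D", "H", "F", "G", "E"]
s1, s2 = table_sizes(SCOPES, order1), table_sizes(SCOPES, order2)
print(s1, s2)
print(max(s1.values()), max(s2.values()))  # prints: 4 3
```

In the first order the 4-way table appears when $E$ is eliminated; in the second no table exceeds 3 variables.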

Any thoughts?

The chain has nice properties: the forward-backward algorithm works, and intermediate results (messages) live along edges. Can we generalize to other graphs (trees, loopy graphs)? How about undirected trees: is there a forward-backward algorithm? Loopy graphs are more complicated: different elimination orders result in different computational costs. Can we somehow make loopy graphs behave like trees?

Tree Graphical Models

Undirected tree: there is a unique path between any pair of nodes.
Directed tree: all nodes except the root have exactly one parent.

Equivalence of directed and undirected trees

Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it. A directed tree and the corresponding undirected tree make the same conditional independence assertions, and their parameterizations are essentially the same.
Undirected tree:
$$P(X) = \frac{1}{Z} \prod_{i \in V} \Psi(X_i) \prod_{(i,j) \in E} \Psi(X_i, X_j)$$
Directed tree:
$$P(X) = P(X_r) \prod_{(i,j) \in E} P(X_j \mid X_i)$$
Equivalence: $\Psi(X_r) = P(X_r)$, $\Psi(X_i, X_j) = P(X_j \mid X_i)$, $Z = 1$, and $\Psi(X_i) = 1$ for every $i \neq r$.
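A small numerical check of the equivalence, assuming a hypothetical four-node directed tree rooted at node 0 with made-up binary CPTs. Plugging $\Psi(X_r) = P(X_r)$, $\Psi(X_i, X_j) = P(X_j \mid X_i)$ and $\Psi(X_i) = 1$ into the undirected form gives a partition function of one.

```python
from itertools import product

K, n = 2, 4
edges = [(0, 1), (0, 2), (1, 3)]                 # hypothetical directed tree, root r = 0
p_root = [0.4, 0.6]                              # P(X_r)
cpt = {(0, 1): [[0.7, 0.3], [0.1, 0.9]],         # P(X_j = b | X_i = a) = cpt[(i, j)][a][b]
       (0, 2): [[0.5, 0.5], [0.2, 0.8]],
       (1, 3): [[0.6, 0.4], [0.3, 0.7]]}

# Undirected parameterization read off the equivalence above.
unary = {i: [1.0] * K for i in range(n)}         # Psi(X_i) = 1 for i != r
unary[0] = p_root                                # Psi(X_r) = P(X_r)
pairwise = cpt                                   # Psi(X_i, X_j) = P(X_j | X_i)


def unnormalized(x):
    """Product of all unary and pairwise potentials for a full assignment x."""
    value = 1.0
    for i in range(n):
        value *= unary[i][x[i]]
    for i, j in edges:
        value *= pairwise[(i, j)][x[i]][x[j]]
    return value


Z = sum(unnormalized(x) for x in product(range(K), repeat=n))
print(Z)
```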

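Message passing on trees

Messages are passed along tree edges. For the tree with edges $k - j$, $l - j$, $j - i$, $i - f$:
$$P(X_i, X_j, X_k, X_l, X_f) \propto \Psi(X_i)\Psi(X_j)\Psi(X_k)\Psi(X_l)\Psi(X_f)\, \Psi(X_i, X_j)\Psi(X_k, X_j)\Psi(X_l, X_j)\Psi(X_i, X_f)$$
$$P(X_f) \propto \Psi(X_f) \sum_{x_i} \Psi(X_i)\Psi(X_i, X_f) \sum_{x_j} \Psi(X_j)\Psi(X_i, X_j) \Big(\sum_{x_k} \Psi(X_k)\Psi(X_k, X_j)\Big) \Big(\sum_{x_l} \Psi(X_l)\Psi(X_l, X_j)\Big)$$
The two inner sums are the messages $m_{kj}(X_j)$ and $m_{lj}(X_j)$; the sum over $x_j$ gives $m_{ji}(X_i)$, and the sum over $x_i$ gives $m_{if}(X_f)$, so $P(X_f) \propto \Psi(X_f)\, m_{if}(X_f)$.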
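A sketch of this computation with made-up binary potentials (the values in `unary` and `pair` and the helper `message` are hypothetical). It evaluates the four nested sums as messages toward $f$ and compares the result with brute-force summation.

```python
from itertools import product

K = 2
# Hypothetical potentials on the tree with edges k - j, l - j, j - i, i - f.
unary = {"k": [1.0, 2.0], "l": [3.0, 1.0], "j": [1.0, 1.0], "i": [2.0, 1.0], "f": [1.0, 4.0]}
pair = [[2.0, 1.0], [1.0, 3.0]]   # Psi(x_s, x_t) = pair[x_s][x_t], shared by every edge


def message(unary_s, incoming):
    """m_st(x_t) = sum_{x_s} Psi(x_s) Psi(x_s, x_t) * product of messages already at s."""
    out = []
    for xt in range(K):
        total = 0.0
        for xs in range(K):
            term = unary_s[xs] * pair[xs][xt]
            for m in incoming:
                term *= m[xs]
            total += term
        out.append(total)
    return out


m_kj = message(unary["k"], [])            # inner sum over x_k
m_lj = message(unary["l"], [])            # inner sum over x_l
m_ji = message(unary["j"], [m_kj, m_lj])  # sum over x_j
m_if = message(unary["i"], [m_ji])        # sum over x_i
unnorm = [unary["f"][x] * m_if[x] for x in range(K)]
p_f = [u / sum(unnorm) for u in unnorm]

# Brute-force check over all 2**5 assignments.
brute = [0.0] * K
for xk, xl, xj, xi, xf in product(range(K), repeat=5):
    brute[xf] += (unary["k"][xk] * unary["l"][xl] * unary["j"][xj] * unary["i"][xi]
                  * unary["f"][xf] * pair[xk][xj] * pair[xl][xj] * pair[xj][xi] * pair[xi][xf])
z = sum(brute)
brute = [b / z for b in brute]
print(p_f, brute)
```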

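Sharing messages on trees

Query $f$: messages $m_{kj}(X_j)$, $m_{lj}(X_j)$, $m_{ji}(X_i)$, $m_{if}(X_f)$.
Query $j$: messages $m_{kj}(X_j)$, $m_{lj}(X_j)$, $m_{fi}(X_i)$, $m_{ij}(X_j)$.
The messages $m_{kj}$ and $m_{lj}$ are shared by both queries.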

Computational cost for all queries

Query $P(X_k)$, $P(X_l)$, $P(X_j)$, $P(X_i)$, $P(X_f)$. Doing things separately: each message costs $O(K^2)$ and the number of edges is $L$, so each query costs about $O(LK^2)$, and $L$ queries cost about $O(L^2K^2)$.

Forward-backward algorithm in trees

Forward: pick one leaf as the root (here $f$), compute all messages toward it ($m_{kj}$, $m_{lj}$, $m_{ji}$, $m_{if}$), and cache them.
Backward: pick another root (e.g. $k$) and compute all messages toward it, reusing cached ones such as $m_{lj}(X_j)$ and caching the new ones $m_{fi}(X_i)$, $m_{ij}(X_j)$, $m_{jk}(X_k)$.
E.g., query $j$: $P(X_j) \propto \Psi(X_j)\, m_{kj}(X_j)\, m_{lj}(X_j)\, m_{ij}(X_j)$.
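A sketch of the two passes on the example tree, with made-up potentials (`nbrs`, `unary`, `pair` and the helpers `send`, `collect`, `distribute` are hypothetical). Instead of picking a second root, it uses the common collect-then-distribute schedule from a single root $f$: the backward pass sends every message away from $f$, which covers each edge in the reverse direction, so all $2L$ messages end up cached.

```python
from itertools import product

K = 2
nbrs = {"k": ["j"], "l": ["j"], "j": ["k", "l", "i"], "i": ["j", "f"], "f": ["i"]}
unary = {"k": [1.0, 2.0], "l": [3.0, 1.0], "j": [1.0, 1.0], "i": [2.0, 1.0], "f": [1.0, 4.0]}
pair = [[2.0, 1.0], [1.0, 3.0]]   # symmetric, so Psi(x_s, x_t) = Psi(x_t, x_s) on every edge
messages = {}                     # cache: (s, t) -> m_st


def send(s, t):
    """Compute and cache m_st; every m_us with u in N(s) minus t must already be cached."""
    out = []
    for xt in range(K):
        total = 0.0
        for xs in range(K):
            term = unary[s][xs] * pair[xs][xt]
            for u in nbrs[s]:
                if u != t:
                    term *= messages[(u, s)][xs]
            total += term
        out.append(total)
    messages[(s, t)] = out


def collect(node, parent):
    """Forward pass: recurse to the leaves, then send every message toward the root."""
    for child in nbrs[node]:
        if child != parent:
            collect(child, node)
            send(child, node)


def distribute(node, parent):
    """Backward pass: send every message away from the root, reusing the cache."""
    for child in nbrs[node]:
        if child != parent:
            send(node, child)
            distribute(child, node)


def marginal(v):
    unnorm = list(unary[v])
    for u in nbrs[v]:
        unnorm = [unnorm[x] * messages[(u, v)][x] for x in range(K)]
    z = sum(unnorm)
    return [p / z for p in unnorm]


collect("f", None)      # root = leaf f
distribute("f", None)

# Brute-force marginals for comparison.
order = list(nbrs)
tree_edges = [("k", "j"), ("l", "j"), ("j", "i"), ("i", "f")]
brute = {v: [0.0] * K for v in order}
for xs in product(range(K), repeat=len(order)):
    x = dict(zip(order, xs))
    w = 1.0
    for v in order:
        w *= unary[v][x[v]]
    for s, t in tree_edges:
        w *= pair[x[s]][x[t]]
    for v in order:
        brute[v][x[v]] += w
brute = {v: [b / sum(bs) for b in bs] for v, bs in brute.items()}
print(len(messages), marginal("j"), brute["j"])
```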

Computational saving for trees

Compute the forward and backward messages for each edge and save them. Each message costs $O(K^2)$; with $L$ edges there are $2L$ unique messages ($m_{kj}$, $m_{jk}$, $m_{lj}$, $m_{jl}$, $m_{ij}$, $m_{ji}$, $m_{fi}$, $m_{if}$ in the example), so the cost for all queries is about $O(2LK^2)$.

Message passing algorithm

$$m_{ji}(X_i) = \sum_{X_j} \Psi(X_i, X_j)\, \Psi(X_j) \prod_{s \in N(j) \setminus i} m_{sj}(X_j)$$
Take the product of the incoming messages, multiply by the local potentials, and sum out $X_j$. $X_j$ can send a message to $X_i$ once the incoming messages from $N(j) \setminus i$ have arrived.
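A sketch of this update rule driven purely by the "send when ready" condition, on the same example tree with made-up potentials (all names are hypothetical). No root is chosen: in each round, every message whose inputs from $N(j) \setminus i$ have arrived is sent.

```python
from itertools import product

K = 2
nbrs = {"k": ["j"], "l": ["j"], "j": ["k", "l", "i"], "i": ["j", "f"], "f": ["i"]}
unary = {"k": [1.0, 2.0], "l": [3.0, 1.0], "j": [1.0, 1.0], "i": [2.0, 1.0], "f": [1.0, 4.0]}
pair = [[2.0, 1.0], [1.0, 3.0]]   # symmetric Psi(x_i, x_j), shared by every edge


def update(messages, j, i):
    """m_ji(x_i) = sum_{x_j} Psi(x_i, x_j) Psi(x_j) prod_{s in N(j) minus i} m_sj(x_j)."""
    out = []
    for xi in range(K):
        total = 0.0
        for xj in range(K):
            term = pair[xi][xj] * unary[j][xj]      # local potentials
            for s in nbrs[j]:
                if s != i:
                    term *= messages[(s, j)][xj]    # product of incoming messages
            total += term                           # sum out x_j
        out.append(total)
    return out


def run_ready_rule():
    """Send m_ji as soon as every message from N(j) minus i has arrived."""
    messages, rounds = {}, 0
    pending = {(j, i) for j in nbrs for i in nbrs[j]}
    while pending:
        ready = [(j, i) for (j, i) in pending
                 if all((s, j) in messages for s in nbrs[j] if s != i)]
        if not ready:
            raise ValueError("no message can be sent: the graph contains a cycle")
        for j, i in ready:
            messages[(j, i)] = update(messages, j, i)
            pending.discard((j, i))
        rounds += 1
    return messages, rounds


messages, rounds = run_ready_rule()


def marginal(v):
    unnorm = list(unary[v])
    for u in nbrs[v]:
        unnorm = [unnorm[x] * messages[(u, v)][x] for x in range(K)]
    z = sum(unnorm)
    return [p / z for p in unnorm]


# Brute-force marginals for comparison.
order = list(nbrs)
tree_edges = [("k", "j"), ("l", "j"), ("j", "i"), ("i", "f")]
brute = {v: [0.0] * K for v in order}
for xs in product(range(K), repeat=len(order)):
    x = dict(zip(order, xs))
    w = 1.0
    for v in order:
        w *= unary[v][x[v]]
    for s, t in tree_edges:
        w *= pair[x[s]][x[t]]
    for v in order:
        brute[v][x[v]] += w
brute = {v: [b / sum(bs) for b in bs] for v, bs in brute.items()}
print(len(messages), rounds)
```

On this tree the leaves $k$, $l$, $f$ send first, then $j \to i$ and $i \to j$, then the remaining three messages: 8 messages in 3 rounds.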

Message passing for loopy graphs

Local message passing on trees guarantees the consistency of local marginals: the computed $P(X_i)$ is the correct one, and the computed $P(X_i, X_j)$ is the correct one. For loopy graphs, local message passing has no such consistency guarantee.
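A small illustration of the missing guarantee, assuming a hypothetical binary 3-cycle with an attractive coupling on every edge and a field only on node 0. By symmetry the exact marginal of node 0 is just its local potential $(0.6, 0.4)$, but loopy message passing sends node 0's evidence around the loop and back, so its belief comes out more confident than the truth.

```python
from itertools import product

K = 2
a = 3.0
pair = [[a, 1.0], [1.0, a]]                     # hypothetical attractive coupling on every edge
unary = {0: [0.6, 0.4], 1: [1.0, 1.0], 2: [1.0, 1.0]}
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}         # a single loop 0 - 1 - 2 - 0


def normalize(v):
    z = sum(v)
    return [x / z for x in v]


# Loopy message passing with parallel updates: each sweep uses the previous sweep's messages.
messages = {(s, t): [1.0 / K] * K for s in nbrs for t in nbrs[s]}
for _ in range(200):
    new = {}
    for s, t in messages:
        out = []
        for xt in range(K):
            total = 0.0
            for xs in range(K):
                term = unary[s][xs] * pair[xs][xt]
                for u in nbrs[s]:
                    if u != t:
                        term *= messages[(u, s)][xs]
                total += term
            out.append(total)
        new[(s, t)] = normalize(out)
    messages = new

belief0 = normalize([unary[0][x] * messages[(1, 0)][x] * messages[(2, 0)][x] for x in range(K)])

# Exact marginal of node 0 by brute force over the 2**3 assignments.
exact0 = [0.0] * K
for x in product(range(K), repeat=3):
    w = unary[0][x[0]] * unary[1][x[1]] * unary[2][x[2]]
    w *= pair[x[0]][x[1]] * pair[x[1]][x[2]] * pair[x[2]][x[0]]
    exact0[x[0]] += w
exact0 = normalize(exact0)

print(belief0, exact0)
```

The iteration converges here, yet the belief for node 0 overshoots $0.6$; convergence alone does not make loopy beliefs exact.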

Message update schedule

Synchronous update: $X_j$ can send a message to $X_i$ when the incoming messages from $N(j) \setminus i$ have arrived. This is slow; it is provably correct for trees and may converge for loopy graphs.
Asynchronous update: $X_j$ can send a message to $X_i$ whenever any incoming message from $N(j) \setminus i$ changes. This is fast; convergence is not easy to prove, but empirically it often works.
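A sketch of the asynchronous schedule as a work queue, on the same example tree with made-up potentials (all names are hypothetical, and the change threshold `1e-12` is an arbitrary choice). Every message starts uniform and is scheduled once. Whenever a recomputed $m_{st}$ actually changes, the messages $t$ sends to its other neighbors are rescheduled. On a tree the queue drains and the marginals are exact.

```python
from collections import deque
from itertools import product

K = 2
nbrs = {"k": ["j"], "l": ["j"], "j": ["k", "l", "i"], "i": ["j", "f"], "f": ["i"]}
unary = {"k": [1.0, 2.0], "l": [3.0, 1.0], "j": [1.0, 1.0], "i": [2.0, 1.0], "f": [1.0, 4.0]}
pair = [[2.0, 1.0], [1.0, 3.0]]   # symmetric pairwise potential shared by every edge


def normalize(v):
    z = sum(v)
    return [x / z for x in v]


def compute(messages, s, t):
    """Recompute m_st from the current messages into s, normalized to sum to one."""
    out = []
    for xt in range(K):
        total = 0.0
        for xs in range(K):
            term = unary[s][xs] * pair[xs][xt]
            for u in nbrs[s]:
                if u != t:
                    term *= messages[(u, s)][xs]
            total += term
        out.append(total)
    return normalize(out)


messages = {(s, t): [1.0 / K] * K for s in nbrs for t in nbrs[s]}
queue, queued = deque(messages), set(messages)
while queue:
    s, t = queue.popleft()
    queued.discard((s, t))
    new = compute(messages, s, t)
    if max(abs(n - o) for n, o in zip(new, messages[(s, t)])) > 1e-12:
        messages[(s, t)] = new
        # m_st changed, so every message t sends to its other neighbors is rescheduled.
        for u in nbrs[t]:
            if u != s and (t, u) not in queued:
                queue.append((t, u))
                queued.add((t, u))


def marginal(v):
    unnorm = list(unary[v])
    for u in nbrs[v]:
        unnorm = [unnorm[x] * messages[(u, v)][x] for x in range(K)]
    return normalize(unnorm)


# Brute-force marginals for comparison.
order = list(nbrs)
tree_edges = [("k", "j"), ("l", "j"), ("j", "i"), ("i", "f")]
brute = {v: [0.0] * K for v in order}
for xs in product(range(K), repeat=len(order)):
    x = dict(zip(order, xs))
    w = 1.0
    for v in order:
        w *= unary[v][x[v]]
    for s, t in tree_edges:
        w *= pair[x[s]][x[t]]
    for v in order:
        brute[v][x[v]] += w
brute = {v: normalize(bs) for v, bs in brute.items()}
print(marginal("i"), brute["i"])
```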

How about general graphs?

Trees are nice: we can just compute two messages for each edge, order the computation along the graph, and associate intermediate results with edges. General graphs are not so clear: different elimination orders generate different cliques and factor sizes, computation and intermediate results are not associated with edges, and the local computation view is not so clear. Can we make them tree-like?

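Clique Graph

A clique graph for $G$: each node in the clique graph corresponds to a clique in $G$, and each maximal clique in $G$ is a node in the clique graph. Each edge between two nodes $C_i$ and $C_j$ is labeled with their common set of variables, $C_i \cap C_j$.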
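A brute-force sketch of building a clique graph, assuming a small hypothetical graph $G$ on nodes A to E, and assuming (as is usual) that two cliques are joined only when their common set is non-empty. Exhaustive clique enumeration is only reasonable for tiny graphs like this one.

```python
from itertools import combinations

# Hypothetical undirected graph G.
nodes = ["A", "B", "C", "D", "E"]
edges = {frozenset(e) for e in [("A", "B"), ("A", "C"), ("B", "C"),
                                ("B", "D"), ("C", "D"), ("D", "E")]}


def is_clique(vs):
    return all(frozenset((u, v)) in edges for u, v in combinations(vs, 2))


# Enumerate every clique, then keep the maximal ones.
cliques = [frozenset(vs) for r in range(1, len(nodes) + 1)
           for vs in combinations(nodes, r) if is_clique(vs)]
maximal = [c for c in cliques if not any(c < d for d in cliques)]

# Clique graph: one node per maximal clique; an edge for every pair that shares
# variables, labeled with the common set C_i & C_j.
clique_edges = {}
for ci, cj in combinations(maximal, 2):
    common = ci & cj
    if common:
        clique_edges[(ci, cj)] = common

for (ci, cj), common in clique_edges.items():
    print(sorted(ci), sorted(cj), "share", sorted(common))
```

The maximal cliques are $\{A,B,C\}$, $\{B,C,D\}$ and $\{D,E\}$; the first two share $\{B,C\}$, the last two share $\{D\}$.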

Clique Graph: Another example

[Figure: a second graph, its clique graph, and a tree over the cliques.]
Can we run message passing on this tree? Some variables appear in 3 different places.

The junction tree

A junction tree is a clique tree with the running intersection property: if two cliques share certain variables, then these variables appear everywhere on the path between them.
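A sketch of a running-intersection check, using hypothetical cliques $P = \{A,B,C\}$, $Q = \{B,C,D\}$, $R = \{C,D,E\}$ and the helper `has_running_intersection`. It tests, for every variable, whether the cliques containing it form a connected piece of the tree. On a tree this is equivalent to the variable appearing everywhere on the path between any two of those cliques.

```python
from collections import deque


def has_running_intersection(cliques, tree_edges):
    """True if, for every variable, the cliques containing it form a connected subtree."""
    adj = {i: set() for i in range(len(cliques))}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for var in set().union(*cliques):
        holders = {i for i, c in enumerate(cliques) if var in c}
        start = next(iter(holders))
        seen, frontier = {start}, deque([start])
        while frontier:
            i = frontier.popleft()
            for j in adj[i]:
                if j in holders and j not in seen:  # only walk through cliques containing var
                    seen.add(j)
                    frontier.append(j)
        if seen != holders:
            return False
    return True


cliques = [{"A", "B", "C"}, {"B", "C", "D"}, {"C", "D", "E"}]   # hypothetical P, Q, R
good_tree = [(0, 1), (1, 2)]   # P - Q - R
bad_tree = [(0, 2), (2, 1)]    # P - R - Q: B sits in P and Q but not in R
print(has_running_intersection(cliques, good_tree), has_running_intersection(cliques, bad_tree))
```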

How to obtain a junction tree

Run a maximum spanning tree algorithm on the clique graph, where the weight of each edge is the number of variables on that edge (the size of $C_i \cap C_j$).
[Figure: a clique graph and its maximum spanning tree.]
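A sketch of this step using Kruskal's algorithm (one standard choice of maximum spanning tree algorithm; the slides do not name one) with a small union-find. The cliques $\{A,B,C\}$, $\{B,C,D\}$, $\{C,D,E\}$ are hypothetical. The heaviest edges, each sharing 2 variables, are kept, and the weight-1 edge between the first and last clique is dropped.

```python
from itertools import combinations

cliques = [frozenset("ABC"), frozenset("BCD"), frozenset("CDE")]   # hypothetical maximal cliques

# Clique graph: every pair of cliques that share variables, weighted by |C_i & C_j|.
candidates = []
for i, j in combinations(range(len(cliques)), 2):
    common = cliques[i] & cliques[j]
    if common:
        candidates.append((len(common), i, j))

parent = list(range(len(cliques)))   # union-find forest over clique indices


def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x


tree = []
for weight, i, j in sorted(candidates, reverse=True):   # heaviest edges first (Kruskal)
    ri, rj = find(i), find(j)
    if ri != rj:                                        # skip edges that would close a cycle
        parent[ri] = rj
        tree.append((i, j, weight))

print(tree)
```

The resulting chain $\{A,B,C\} - \{B,C,D\} - \{C,D,E\}$ has total weight 4 and satisfies the running intersection property; the lighter alternative through the weight-1 edge would separate the two cliques containing $B$.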

Junction tree algorithm for inference

1. Moralize the graph.
2. Triangulate the graph.
3. Obtain the clique tree.
4. Obtain the junction tree.
5. Run local message passing at the clique level instead.
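A sketch of the first two steps on the eight-variable example $P(a)P(b)P(c \mid b)P(d \mid a)P(e \mid c,d)P(f \mid a)P(g \mid e)P(h \mid e,f)$, assuming triangulation by variable elimination in the order B, C, D, H, F, G, E, A (the helpers `moralize` and `triangulate` are hypothetical). Moralization marries the co-parents $C, D$ and $E, F$. Eliminating in this order adds one fill-in edge $A - E$ and yields maximal cliques of at most 3 variables, ready for the clique-tree and junction-tree steps.

```python
from itertools import combinations

# Parents in the directed example graph.
parents = {"A": [], "B": [], "C": ["B"], "D": ["A"], "E": ["C", "D"],
           "F": ["A"], "G": ["E"], "H": ["E", "F"]}


def moralize(parents):
    """Drop edge directions and connect ("marry") every pair of co-parents."""
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    return adj


def triangulate(adj, order):
    """Eliminate vertices in order; return the fill-in edges and the maximal cliques."""
    adj = {v: set(n) for v, n in adj.items()}   # work on a copy
    fill, cliques = set(), []
    for v in order:
        nb = adj[v]
        cliques.append(frozenset(nb | {v}))     # v plus its remaining neighbors
        for p, q in combinations(sorted(nb), 2):
            if q not in adj[p]:                 # connect the neighbors of v
                fill.add(frozenset((p, q)))
                adj[p].add(q)
                adj[q].add(p)
        for n in nb:
            adj[n].discard(v)
        del adj[v]
    maximal = {c for c in cliques if not any(c < d for d in cliques)}
    return fill, maximal


fill, maximal = triangulate(moralize(parents), ["B", "C", "D", "H", "F", "G", "E", "A"])
print(sorted(sorted(e) for e in fill))
print(sorted(sorted(c) for c in maximal))
```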