Exact Inference: Clique Trees. Sargur Srihari


1 Exact Inference: Clique Trees. Sargur Srihari, srihari@cedar.buffalo.edu

2 Topics
1. Overview
2. Variable Elimination and Clique Trees
3. Message Passing: Sum-Product
   - VE in a Clique Tree
   - Clique-Tree Calibration
4. Message Passing: Belief Update
5. Constructing a Clique Tree

3 Overview
Two methods of inference using a set of factors Φ over variables χ:
1. The variable elimination (VE) algorithm uses the factor representation and local operations instead of generating the entire distribution (see next slide).
2. Clique trees: an alternative implementation of the same insight, using a more global data structure for scheduling the operations.

4 Sum-Product VE
P(C,D,I,G,S,L,J,H) = P(C) P(D|C) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)
                   = φ_C(C) φ_D(D,C) φ_I(I) φ_G(G,I,D) φ_S(S,I) φ_L(L,G) φ_J(J,L,S) φ_H(H,G,J)
Goal: P(J) = Σ_L Σ_S Σ_G Σ_H Σ_I Σ_D Σ_C P(C,D,I,G,S,L,J,H), with elimination ordering C,D,I,H,G,S,L. Each step involves a factor product and a factor marginalization:
1. Eliminating C: ψ_1(C,D) = φ_C(C) φ_D(D,C);  τ_1(D) = Σ_C ψ_1(C,D)
2. Eliminating D: ψ_2(G,I,D) = φ_G(G,I,D) τ_1(D);  τ_2(G,I) = Σ_D ψ_2(G,I,D). Note we already eliminated one factor with D, but introduced τ_1 involving D.
3. Eliminating I: ψ_3(G,I,S) = φ_I(I) φ_S(S,I) τ_2(G,I);  τ_3(G,S) = Σ_I ψ_3(G,I,S)
4. Eliminating H: ψ_4(G,J,H) = φ_H(H,G,J);  τ_4(G,J) = Σ_H ψ_4(G,J,H). Note τ_4(G,J) = 1.
5. Eliminating G: ψ_5(G,J,L,S) = τ_4(G,J) τ_3(G,S) φ_L(L,G);  τ_5(J,L,S) = Σ_G ψ_5(G,J,L,S)
6. Eliminating S: ψ_6(J,L,S) = τ_5(J,L,S) φ_J(J,L,S);  τ_6(J,L) = Σ_S ψ_6(J,L,S)
7. Eliminating L: ψ_7(J,L) = τ_6(J,L);  τ_7(J) = Σ_L ψ_7(J,L)
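The two factor operations above are easy to sketch in code. Below is a minimal NumPy sketch of the first elimination step, representing a factor as a pair (variable list, array with one axis per variable); the CPD values are hypothetical, since the actual Student-network CPDs are not given on the slide.

```python
import numpy as np

def factor_product(f, g):
    """Factor product: union the scopes and multiply by broadcasting."""
    fv, fa = f
    gv, ga = g
    out = fv + [v for v in gv if v not in fv]
    fa2 = fa.reshape(fa.shape + (1,) * (len(out) - len(fv)))  # pad f's axes
    perm = sorted(range(len(gv)), key=lambda k: out.index(gv[k]))
    ga2 = np.transpose(ga, perm).reshape(
        [ga.shape[gv.index(v)] if v in gv else 1 for v in out])
    return out, fa2 * ga2

def sum_out(f, var):
    """Factor marginalization: sum out one variable."""
    fv, fa = f
    return [v for v in fv if v != var], fa.sum(axis=fv.index(var))

# Hypothetical binary CPDs: phi_C(C) = P(C), phi_D(D,C) = P(D|C)
phi_C = (['C'], np.array([0.5, 0.5]))
phi_D = (['D', 'C'], np.array([[0.6, 0.3],
                               [0.4, 0.7]]))
psi_1 = factor_product(phi_C, phi_D)   # psi_1(C,D) = phi_C(C) * phi_D(D,C)
tau_1 = sum_out(psi_1, 'C')            # tau_1(D) = sum_C psi_1(C,D)
print(tau_1)                           # scope ['D'], table [0.45, 0.55]
```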

5 Unnormalized Measure with Factors
1. We deal here with the unnormalized measure P̃_Φ(χ) = Π_{φ_i ∈ Φ} φ_i(D_i).
2. For a BN:
   - without evidence, the factors are CPDs and P̃_Φ(χ) is a normalized distribution;
   - with evidence E = e, the factors are CPDs restricted to e, and P̃_Φ(χ) = P_B(χ, e).
3. For a Gibbs distribution, the factors are potentials and P̃_Φ(χ) is the unnormalized Gibbs measure.
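As a tiny illustration of the evidence case, here is a sketch with a hypothetical factor: restricting the factor to the observed value gives an unnormalized measure, and normalizing it recovers the conditional.

```python
import numpy as np

# Hypothetical factor over (X, E); observe evidence E = e.
phi = np.array([[4.0, 1.0],
                [2.0, 3.0]])
e = 1
p_tilde = phi[:, e]               # factor restricted to e: unnormalized measure
p_cond = p_tilde / p_tilde.sum()  # normalizing recovers P(X | E = e)
print(p_cond)                     # [0.25, 0.75]
```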

6 Marginalizing with the Unnormalized Measure
The unnormalized conditional measure is equivalent to the normalized conditional probability: P̃_Φ(X | Y) = P_Φ(X | Y), since
P̃_Φ(X | Y) = P̃_Φ(X,Y) / P̃_Φ(Y) = Π_{φ_i ∈ Φ} φ_i(D_i) / Σ_X Π_{φ_i ∈ Φ} φ_i(D_i)
and
P_Φ(X | Y) = P_Φ(X,Y) / P_Φ(Y) = (1/Z) Π_{φ_i ∈ Φ} φ_i(D_i) / [(1/Z) Σ_X Π_{φ_i ∈ Φ} φ_i(D_i)],
so the partition function Z cancels.

7 Factor Product
Let X, Y, and Z be three disjoint sets of variables, and let Φ_1(X,Y) and Φ_2(Y,Z) be two factors. The factor product is the mapping Val(X,Y,Z) → R defined by ψ(X,Y,Z) = Φ_1(X,Y) · Φ_2(Y,Z).
An example: Φ_1 has 3 × 2 = 6 entries and Φ_2 has 2 × 2 = 4 entries, while ψ = Φ_1 × Φ_2 has 3 × 2 × 2 = 12 entries.
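The entry counts follow directly from broadcasting the two tables over the union of their scopes; a one-line NumPy check with made-up entries:

```python
import numpy as np

phi1 = np.arange(1, 7, dtype=float).reshape(3, 2)  # Phi_1(X,Y): 6 entries
phi2 = np.arange(1, 5, dtype=float).reshape(2, 2)  # Phi_2(Y,Z): 4 entries
psi = np.einsum('xy,yz->xyz', phi1, phi2)          # psi(X,Y,Z) = Phi_1 * Phi_2
print(psi.shape, psi.size)                         # (3, 2, 2) 12
```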

8 VE and Factor Creation
In variable elimination, each step creates a factor ψ_i by multiplying existing factors. A variable is then eliminated (summed out) to create a factor τ_i, which is in turn used to create another factor.
P(C,D,I,G,S,L,J,H) = P(C) P(D|C) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)
                   = φ_C(C) φ_D(D,C) φ_I(I) φ_G(G,I,D) φ_S(S,I) φ_L(L,G) φ_J(J,L,S) φ_H(H,G,J)
E.g., ψ_1(C,D) = φ_C(C) φ_D(D,C);  τ_1(D) = Σ_C ψ_1(C,D)

9 VE: Alternative View
Alternative view: take ψ_i to be a data structure that takes the messages τ_j generated by other factors ψ_j, and generates a message τ_i used by another factor ψ_l:
ψ_1(C,D) = φ_C(C) φ_D(D,C);  τ_1(D) = Σ_C ψ_1(C,D)   → message τ_1(D)
ψ_2(G,I,D) = φ_G(G,I,D) τ_1(D);  τ_2(G,I) = Σ_D ψ_2(G,I,D)   → message τ_2(G,I)
ψ_3(G,I,S) = φ_I(I) φ_S(S,I) τ_2(G,I);  τ_3(G,S) = Σ_I ψ_3(G,I,S)   → message τ_3(G,S)
ψ_4(G,J,H) = φ_H(H,G,J);  τ_4(G,J) = Σ_H ψ_4(G,J,H)   → message τ_4(G,J)
ψ_5(G,J,L,S) = τ_4(G,J) τ_3(G,S) φ_L(L,G);  τ_5(J,L,S) = Σ_G ψ_5(G,J,L,S)

10 Example of a Cluster Graph
A VE execution defines a cluster graph (a flow chart):
- a cluster for each factor ψ_i;
- an edge between clusters C_i and C_j if the message τ_i, produced by eliminating a variable in ψ_i, is used in the computation of τ_j.
E.g., ψ_1(C,D) = φ_C(C) φ_D(D,C) with τ_1(D) = Σ_C ψ_1(C,D), and ψ_2(G,I,D) = φ_G(G,I,D) τ_1(D) with τ_2(G,I) = Σ_D ψ_2(G,I,D). There is an edge between C_1 and C_2 since the message τ_1(D), produced by eliminating C, is used to compute τ_2(G,I). Arrows indicate the flow of messages: τ_1(D), generated from ψ_1(C,D), participates in the computation of ψ_2.

11 Cluster Graph Definition
A cluster graph U for factors Φ over χ is an undirected graph:
1. Each node i is associated with a subset (cluster) C_i ⊆ χ. The cluster graph is family-preserving: each factor φ must be associated with a cluster C_i, denoted α(φ), such that Scope[φ] ⊆ C_i.
2. Each edge between a pair of clusters C_i and C_j is associated with a sepset S_{i,j} ⊆ C_i ∩ C_j. E.g., the sepset {D} between the clusters {C,D} and {D,I,G}.

12 Cluster Graph is a Directed Tree
In a tree there are no cycles. Directions for the tree are specified by the messages: since an intermediate factor τ_i is used only once, a node never has more than one outgoing link. Such a tree is called a clique tree (or junction tree, or join tree). The root is drawn at the top, the leaves at the bottom.

13 Definition of a Tree
Tree: a graph with only one path between any pair of nodes; such graphs have no loops. In directed graphs, a tree has a single node with no parents, called the root. Converting a directed tree to an undirected one adds no moralization links, since every node has only one parent.
Polytree: a directed graph whose nodes may have more than one parent, but with only one path between any pair of nodes (ignoring arrow direction). Here moralization will add links.
(Figures: an undirected tree, a directed tree, and a directed polytree.)

14 Running Intersection Property
1. Definition: if X ∈ C_i and X ∈ C_j, then X is in every clique on the path between C_i and C_j. (In a clique, every pair of nodes is connected; in a maximal clique, no more nodes can be added.) E.g., in the cluster graph below, G is present in C_2 and C_4, and also in every clique on the path between them.
2. A VE-generated cluster graph satisfies the running intersection property.

15 Clique Tree
1. BN
2. Induced graph
3. Some cliques: {C,D}, {G,I,D}, {G,S,I}, {G,J,S,L}, {H,G,J}
4. A clique tree that satisfies running intersection

16 Clique Tree Definition
A tree T is a clique tree for graph H if:
- Each node in T corresponds to a clique in H, and each maximal clique in H is a node in T.
- Each sepset S_{i,j} separates W_{<(i,j)} and W_{<(j,i)} in H.
E.g., the edge sepset S_{2,3} = {G,I} separates W_{<(2,3)} = {G,I,D} and W_{<(3,2)} = {G,S,I}.

17 Message Passing: Sum-Product
We proceed in the opposite direction from the VE algorithm: starting from a clique tree, how do we perform VE? The clique tree is a very versatile data structure.

18 Variable Elimination in a Clique Tree
The clique tree can be used as guidance for VE: factors are computed in the cliques, and messages are sent along the edges.

19 Variable Elimination in a Clique Tree
A clique tree for the Student network. This tree satisfies:
- the running intersection property: if X ∈ C_i and X ∈ C_j, then X is in every clique on the path between them;
- the family preservation property: each factor is associated with a cluster.

20 Example of VE in a Clique Tree
A clique tree for the Student network; non-maximal cliques C_6 and C_7 are absent. Assign the initial factors (CPDs) to cliques via α. First step: generate the initial set of potentials by multiplying out the factors, e.g., ψ_5(J,L,G,S) = φ_L(L,G) · φ_J(J,L,S). The root is selected to contain the variable J, since we are interested in determining P(J): e.g., C_5.

21 Message Propagation in a Clique Tree
Root = C_5, to compute P(J).
In C_1: eliminate C by computing Σ_C ψ_1(C,D). The resulting factor has scope {D}; we send it as a message δ_{1→2}(D) to C_2.
In C_2: define β_2(G,I,D) = δ_{1→2}(D) · ψ_2(G,I,D). We then eliminate D to get a factor over {G,I}: the resulting factor δ_{2→3}(G,I) is sent to C_3.
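In NumPy these two steps are just a sum and a broadcast multiply; a minimal sketch with hypothetical tables (binary C, D, I and a ternary G):

```python
import numpy as np

rng = np.random.default_rng(0)
psi1 = rng.random((2, 2))     # psi_1(C,D), axes ordered (C, D)
psi2 = rng.random((3, 2, 2))  # psi_2(G,I,D), axes ordered (G, I, D)

delta_12 = psi1.sum(axis=0)   # delta_{1->2}(D) = sum_C psi_1(C,D)
beta2 = psi2 * delta_12       # beta_2(G,I,D): broadcast over trailing D axis
delta_23 = beta2.sum(axis=2)  # delta_{2->3}(G,I) = sum_D beta_2(G,I,D)
print(delta_23.shape)         # (3, 2)
```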

22 Message Propagation in a Clique Tree
With root C_5 we compute P(J); with root C_3 we compute P(G).

23 VE as Clique Tree Message Passing
1. Let T be a clique tree with cliques C_1, ..., C_k.
2. Begin by multiplying the factors assigned to each clique, giving the initial potentials ψ_j(C_j) = Π_{φ: α(φ)=j} φ.
3. Pass messages between neighboring cliques, sending toward the root:
   δ_{i→j} = Σ_{C_i − S_{i,j}} ψ_i · Π_{k ∈ (Nb_i − {j})} δ_{k→i}
4. Message passing culminates at the root node. The result is a factor called the beliefs, denoted β_r(C_r), which is equivalent to P̃_Φ(C_r) = Σ_{χ − C_r} Π_φ φ.

24 Algorithm: Upward Pass of VE in a Clique Tree
Procedure CTree-SP-Upward (
  Φ,    // Set of factors
  T,    // Clique tree over Φ
  α,    // Initial assignment of factors to cliques
  C_r   // Some selected root clique
)
1  Initialize-Cliques
2  while C_r is not ready
3    Let C_i be a ready clique
4    δ_{i→pr(i)}(S_{i,pr(i)}) ← SP-Message(i, pr(i))
5  β_r ← ψ_r · Π_{k ∈ Nb(C_r)} δ_{k→r}
6  return β_r

Procedure Initialize-Cliques ( )
1  for each clique C_i
2    ψ_i(C_i) ← Π_{φ_j : α(φ_j) = i} φ_j

Procedure SP-Message (
  i,    // sending clique
  j     // receiving clique
)
1  ψ(C_i) ← ψ_i · Π_{k ∈ (Nb_i − {j})} δ_{k→i}
2  τ(S_{i,j}) ← Σ_{C_i − S_{i,j}} ψ(C_i)
3  return τ(S_{i,j})
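A runnable sketch of this upward pass, with a factor stored as a (scope tuple, {assignment: value}) pair over binary variables; the clique scopes and potential values in the demo are hypothetical.

```python
from itertools import product

def f_product(f, g):
    """Factor product: union the scopes, multiply matching entries."""
    sf, tf = f
    sg, tg = g
    scope = sf + tuple(v for v in sg if v not in sf)
    table = {}
    for asg in product((0, 1), repeat=len(scope)):
        val = dict(zip(scope, asg))
        table[asg] = (tf[tuple(val[v] for v in sf)] *
                      tg[tuple(val[v] for v in sg)])
    return scope, table

def f_marginalize(f, keep):
    """Sum out every variable not in `keep` (factor marginalization)."""
    sf, tf = f
    scope = tuple(v for v in sf if v in keep)
    table = {}
    for asg, p in tf.items():
        key = tuple(a for v, a in zip(sf, asg) if v in keep)
        table[key] = table.get(key, 0.0) + p
    return scope, table

def sp_message(i, j, tree, psi, sepset, memo=None):
    """SP-Message(i, j): multiply psi_i with the messages from all
    neighbors except j, then marginalize onto the sepset S_{i,j}."""
    memo = {} if memo is None else memo
    if (i, j) not in memo:
        f = psi[i]
        for k in tree[i]:
            if k != j:
                f = f_product(f, sp_message(k, i, tree, psi, sepset, memo))
        memo[(i, j)] = f_marginalize(f, sepset[frozenset((i, j))])
    return memo[(i, j)]

# Two-clique demo: C1 = {C,D}, C2 = {D,G}, sepset {D}; root is C2.
tree = {1: [2], 2: [1]}
psi = {
    1: (('C', 'D'), {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.6, (1, 1): 0.4}),
    2: (('D', 'G'), {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.8}),
}
sepset = {frozenset((1, 2)): ('D',)}
beta_2 = f_product(psi[2], sp_message(1, 2, tree, psi, sepset))
print(beta_2)
```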

25 Clique Tree Calibration
We have seen how to use the same clique tree to compute the probability of any single variable. We now wish to compute the probabilities of a large number of variables: consider the task of computing the posterior distribution over every random variable in the network, as with HMMs with several latent variables.

26 Ready Cliques
C_i is ready to transmit to a neighbor C_j when C_i has received messages from all of its neighbors except C_j. The sum-product belief propagation algorithm uses yet another layer of dynamic programming and is defined asynchronously.

27 Sum-Product Belief Propagation
Algorithm: calibration using sum-product message passing in a clique tree.
Procedure CTree-SP-Calibrate (
  Φ,    // Set of factors
  T     // Clique tree over Φ
)
1  Initialize-Cliques
2  while there exist i, j such that i is ready to transmit to j
3    δ_{i→j}(S_{i,j}) ← SP-Message(i, j)
4  for each clique i
5    β_i ← ψ_i · Π_{k ∈ Nb_i} δ_{k→i}
6  return {β_i}
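Continuing the dict-factor sketch above (f_product, f_marginalize, sp_message, and the demo tree), calibration amounts to sending a message across every edge in both directions and then forming each clique's beliefs:

```python
def calibrate(tree, psi, sepset):
    """Compute beliefs for every clique: its initial potential times the
    incoming message from each neighbor (messages memoized across cliques,
    so both directions of every edge are computed exactly once)."""
    memo = {}
    beliefs = {}
    for i in tree:
        b = psi[i]
        for k in tree[i]:
            b = f_product(b, sp_message(k, i, tree, psi, sepset, memo))
        beliefs[i] = b
    return beliefs

beliefs = calibrate(tree, psi, sepset)  # reusing the demo tree above
```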

28 Result at the End of the Algorithm
The algorithm computes the beliefs of all cliques by multiplying each initial potential with each of its incoming messages. For each clique i, β_i is computed as
β_i(C_i) = Σ_{χ − C_i} P̃_Φ(χ),
which is the unnormalized marginal distribution of the variables in C_i.
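Continuing the sketch, normalizing a calibrated belief therefore yields the posterior marginal over that clique's variables:

```python
def clique_marginal(beliefs, i):
    """Normalize the (unnormalized) belief of clique i to a distribution."""
    scope, table = beliefs[i]
    z = sum(table.values())
    return scope, {asg: v / z for asg, v in table.items()}

print(clique_marginal(beliefs, 2))  # posterior over C2's variables {D, G}
```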

29 Calibration Definition
If X appears in two cliques, they must agree on its marginal. Two adjacent cliques C_i and C_j are said to be calibrated if
Σ_{C_i − S_{i,j}} β_i(C_i) = Σ_{C_j − S_{i,j}} β_j(C_j).
A clique tree T is calibrated if all adjacent pairs of cliques are calibrated.
Terminology:
- Clique beliefs: β_i(C_i)
- Sepset beliefs: μ_{i,j}(S_{i,j}) = Σ_{C_i − S_{i,j}} β_i(C_i) = Σ_{C_j − S_{i,j}} β_j(C_j)
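A quick check of this definition against the sketch above: both cliques' beliefs, marginalized onto their sepset, should give the same sepset belief μ_{i,j}:

```python
def is_calibrated(beliefs, i, j, sepset, tol=1e-9):
    """True if beliefs i and j agree on their sepset marginal mu_{i,j}."""
    s = sepset[frozenset((i, j))]
    mu_i = f_marginalize(beliefs[i], s)[1]
    mu_j = f_marginalize(beliefs[j], s)[1]
    return all(abs(mu_i[a] - mu_j[a]) < tol for a in mu_i)

print(is_calibrated(beliefs, 1, 2, sepset))  # True after calibration
```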

30 Calibrated Tree as a Distribution
A calibrated clique tree is more than a data structure that stores the results of probabilistic inference: it can be viewed as an alternative representation of P_Φ. At convergence of the clique-tree calibration algorithm,
P̃_Φ(χ) = Π_{i ∈ V_T} β_i(C_i) / Π_{(i–j) ∈ E_T} μ_{i,j}(S_{i,j})

31 Misconception Markov Network
A network over A, B, C, D arranged in a square (A–B–C–D–A), with factors given as potentials.
Gibbs distribution:
P(a,b,c,d) = (1/Z) φ_1(a,b) φ_2(b,c) φ_3(c,d) φ_4(d,a),
where Z = Σ_{a,b,c,d} φ_1(a,b) φ_2(b,c) φ_3(c,d) φ_4(d,a) = 7,201,840.
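For reference, Z can be checked by brute-force enumeration. The sketch below assumes the standard Misconception potential tables from Koller & Friedman, which are not reproduced on the slide:

```python
from itertools import product

# Assumed potential tables (Koller & Friedman's Misconception example)
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}    # phi_1(A,B)
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi_2(B,C)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}  # phi_3(C,D)
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi_4(D,A)

Z = sum(phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]
        for a, b, c, d in product((0, 1), repeat=4))
print(Z)  # 7201840, matching the slide
```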

32 Beliefs for the Misconception Example
One clique tree consists of the cliques {A,B,D} and {B,C,D} with sepset {B,D}:
  {A,B,D} — {B,D} — {B,C,D}
The tree is obtained either (i) from VE or (ii) from triangulation (constructing a chordal graph).
(The tables of final clique beliefs β_1(A,B,D), β_2(B,C,D) and sepset beliefs μ_{1,2}(B,D) appear in the original figure.)
The unnormalized measures from the Gibbs distribution and from the clique tree are the same, e.g.
P̃_Φ(a^1, b^0, c^1, d^0) = β_1(a^1, b^0, d^0) β_2(b^0, c^1, d^0) / μ_{1,2}(b^0, d^0) = 100

33 Message Passing: Belief Update
An alternative message passing scheme. It involves operations on the reparameterized distribution in terms of clique beliefs {β_i(C_i)}, i ∈ V_T, and sepset beliefs {μ_{i,j}(S_{i,j})}, (i–j) ∈ E_T.

34 Message Passing with Division
Multiply all the messages in, and then divide the resulting factor by δ_{j→i}:
δ_{i→j} = (Σ_{C_i − S_{i,j}} β_i) / δ_{j→i}

35 Factor Division
The division operation divides two factors entrywise over matching assignments, with the convention 0/0 = 0. (An example of factor division appears in the original figure.)
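A minimal NumPy sketch of factor division under that convention, with hypothetical values; dividing β(X,Y) by a message over Y broadcasts along the X axis:

```python
import numpy as np

def factor_divide(beta, msg):
    """Entrywise factor division with the convention 0/0 = 0."""
    out = np.zeros_like(beta)
    np.divide(beta, msg, out=out, where=(msg != 0))
    return out

beta = np.array([[2.0, 0.0],
                 [6.0, 0.0]])    # beta(X,Y); zeros line up with msg's zeros
msg = np.array([2.0, 0.0])       # delta(Y)
print(factor_divide(beta, msg))  # [[1. 0.] [3. 0.]]
```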

36 Constructing a Clique Tree
Two approaches to constructing a clique tree from a graph:
1. From variable elimination
2. From chordal graphs

37 Clique Tree from VE
An execution of variable elimination can be associated with a cluster graph. This cluster graph satisfies the running intersection property and is hence a clique tree. For the Unambitious Student network, variable elimination with the ordering J,L,S,H,C,D,I,G results in the clique tree shown.

38 Clique Tree from Chordal Graphs
There exists a clique tree for Φ whose cliques are precisely the maximal cliques in the induced graph I_{Φ,≺}. Triangulation: construct a chordal graph subsuming the existing graph.
(Figures: 1. the undirected factor graph; 2. a triangulation; 3. the cluster graph, with edge weights.)

39 Algorithm: Clique Tree from a Chordal Graph
1. Given a set of factors, construct the undirected graph H_Φ.
2. Triangulate H_Φ to construct a chordal graph H*.
3. Find the cliques in H*, and make each one a node in a cluster graph.
4. Run the maximum spanning tree algorithm on the cluster graph (with sepset-size edge weights) to construct a tree.
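The whole pipeline fits in a few lines of networkx (version ≥ 2.4 for complete_to_chordal_graph); a sketch for the Student factor scopes. The triangulation heuristic and clique ordering are whatever the library chooses, so the result is one valid clique tree, not the unique one:

```python
import networkx as nx
from itertools import combinations

# 1. Undirected graph H_Phi: connect every pair of variables sharing a factor
scopes = [("C", "D"), ("G", "I", "D"), ("S", "I"), ("L", "G"),
          ("J", "L", "S"), ("H", "G", "J")]
H = nx.Graph()
for scope in scopes:
    H.add_edges_from(combinations(scope, 2))

# 2. Triangulate to a chordal graph H*
H_star, _ = nx.complete_to_chordal_graph(H)

# 3. One cluster-graph node per maximal clique; edge weights = sepset sizes
cluster = nx.Graph()
cliques = [frozenset(c) for c in nx.find_cliques(H_star)]
cluster.add_nodes_from(cliques)
for ci, cj in combinations(cliques, 2):
    if ci & cj:
        cluster.add_edge(ci, cj, weight=len(ci & cj))

# 4. A maximum spanning tree of the cluster graph is a clique tree
T = nx.maximum_spanning_tree(cluster)
print(sorted(tuple(sorted(c)) for c in T.nodes))
```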
