Machine Learning 4771


Patrick Gray
5 years ago

1 Machine Learning 4771 Instructor: Tony Jebara

2 Topic 18
- The Junction Tree Algorithm
- Collect & Distribute
- Algorithmic Complexity
- ArgMax Junction Tree Algorithm

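3 Review: Junction Tree Algorithm
Send a message from each clique to its separators stating what it thinks the submarginal on the separator is. Normalize each clique by the incoming message from its separators so that it agrees with them.

Setup: $V = \{A,B\}$, $S = \{B\}$, $W = \{B,C\}$.

If the cliques agree, we are done:
$\sum_{V \setminus S} \psi_V = \phi_S = p(S) = \sum_{W \setminus S} \psi_W$

Else, send a message from V to W:
$\phi_S^{*} = \sum_{V \setminus S} \psi_V$,  $\psi_W^{*} = \frac{\phi_S^{*}}{\phi_S}\,\psi_W$,  $\psi_V^{*} = \psi_V$

Then send a message from W to V:
$\phi_S^{**} = \sum_{W \setminus S} \psi_W^{*}$,  $\psi_V^{**} = \frac{\phi_S^{**}}{\phi_S^{*}}\,\psi_V^{*}$,  $\psi_W^{**} = \psi_W^{*}$

Now they agree. Done!
$\sum_{V \setminus S} \psi_V^{**} = \frac{\phi_S^{**}}{\phi_S^{*}} \sum_{V \setminus S} \psi_V^{*} = \phi_S^{**} = \sum_{W \setminus S} \psi_W^{**}$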
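The two-clique update can be checked numerically. The following is a minimal sketch, not the lecture's code: the table values are illustrative assumptions, and the variable names (`psi_V`, `psi_W`, `phi_S`) are chosen here to mirror the symbols above.

```python
import numpy as np

# Illustrative (assumed) potentials: psi_V over (A, B), psi_W over (B, C).
psi_V = np.array([[8.0, 4.0], [3.0, 1.0]])   # rows A, columns B
psi_W = np.array([[2.0, 1.0], [1.0, 3.0]])   # rows B, columns C
phi_S = np.ones(2)                           # separator over B, initialised to 1

# Message V -> W: sum A out of psi_V, then rescale psi_W by the ratio.
phi_S1 = psi_V.sum(axis=0)
psi_W1 = psi_W * (phi_S1 / phi_S)[:, None]

# Message W -> V: sum C out of the updated psi_W, then rescale psi_V.
phi_S2 = psi_W1.sum(axis=1)
psi_V2 = psi_V * (phi_S2 / phi_S1)[None, :]

# Both cliques now give the same table over B as the separator.
agree = (np.allclose(psi_V2.sum(axis=0), phi_S2)
         and np.allclose(psi_W1.sum(axis=1), phi_S2))
```

After the two messages, summing either clique down to B reproduces the separator table, which is the agreement condition stated on the slide.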

4 JTA with many cliques
Problem: what if we have more than two cliques?
Junction tree (a chain): AB - [B] - BC - [C] - CD
1) Update AB & BC
2) Update BC & CD
Problem: AB has not heard about CD! After BC is updated in step 2, it is inconsistent with AB again.
We would need to iterate the pairwise updates many times. This eventually converges to consistent marginals, but it is inefficient. Can we do better?

5 JTA: Collect & Distribute
Use tree recursion rather than iterating messages mindlessly!

initialize(DAG) {
    Pick root
    Set all variables as: $\psi_{C_i} = p(x_i \mid \pi_i)$, $\phi_S = 1$
}
collectEvidence(node) {
    for each child of node {
        update1(node, collectEvidence(child));
    }
    return(node);
}
distributeEvidence(node) {
    for each child of node {
        update2(child, node);
        distributeEvidence(child);
    }
}
update1(node w, node v) {
    $\phi^{*}_{V \cap W} = \sum_{V \setminus (V \cap W)} \psi_V$,  $\psi_W = \frac{\phi^{*}_{V \cap W}}{\phi_{V \cap W}}\,\psi_W$
}
update2(node w, node v) {
    $\phi^{**}_{V \cap W} = \sum_{V \setminus (V \cap W)} \psi_V$,  $\psi_W = \frac{\phi^{**}_{V \cap W}}{\phi^{*}_{V \cap W}}\,\psi_W$
}
normalize() {
    $p(X_C) = \frac{1}{\sum_C \psi^{**}_C}\,\psi^{**}_C$,  $p(X_S) = \frac{1}{\sum_S \phi^{**}_S}\,\phi^{**}_S$
}
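The recursion can be sketched in Python as below. This is one possible rendering under stated assumptions, not the course's reference code: the `Clique` and `Separator` classes, the single `update` function (used for both update1 and update2), the binary variables, and the CPT values for a chain A -> B -> C -> D are all hypothetical choices made for illustration.

```python
import numpy as np

class Clique:
    def __init__(self, names, table):
        self.names = names                       # e.g. "AB": one letter per table axis
        self.table = np.asarray(table, dtype=float)
        self.children = []                       # list of (child clique, separator)

class Separator:
    def __init__(self, names):
        self.names = names
        self.table = np.ones((2,) * len(names))  # phi = 1; binary variables assumed

def update(src, sep, dst):
    """Send one message src -> dst through sep (serves as update1 and update2)."""
    new_phi = np.einsum(f"{src.names}->{sep.names}", src.table)
    ratio = np.divide(new_phi, sep.table,
                      out=np.zeros_like(new_phi), where=sep.table != 0)
    dst.table = np.einsum(f"{dst.names},{sep.names}->{dst.names}", dst.table, ratio)
    sep.table = new_phi

def collect(node):
    # Children report to their parent first, so messages flow leaves -> root.
    for child, sep in node.children:
        update(collect(child), sep, node)
    return node

def distribute(node):
    # The root then pushes its information back out, root -> leaves.
    for child, sep in node.children:
        update(node, sep, child)
        distribute(child)

# Assumed CPTs for the chain A -> B -> C -> D.
p_A = np.array([0.6, 0.4])
p_B_A = np.array([[0.7, 0.3], [0.2, 0.8]])   # rows A, columns B
p_C_B = np.array([[0.9, 0.1], [0.5, 0.5]])   # rows B, columns C
p_D_C = np.array([[0.3, 0.7], [0.6, 0.4]])   # rows C, columns D

ab = Clique("AB", p_A[:, None] * p_B_A)      # root clique holds p(A) p(B|A)
bc = Clique("BC", p_C_B)
cd = Clique("CD", p_D_C)
ab.children.append((bc, Separator("B")))
bc.children.append((cd, Separator("C")))

collect(ab)
distribute(ab)

# Brute-force joint, used only to check the clique tables.
joint = np.einsum("a,ab,bc,cd->abcd", p_A, p_B_A, p_C_B, p_D_C)
```

With no evidence and CPT initialisation the tables already sum to 1 after the two passes, so the normalize step has nothing left to do in this particular example.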

6 Junction Tree Algorithm
JTA: 1) Initialize 2) Collect 3) Distribute 4) Normalize
Note: leaves do not change their ψ during collect.
Note: the first cliques that collect changes are the parents of leaves.
Note: the root does not change its ψ during distribute.

7 Algorithmic Complexity
The 5 steps of JTA are all efficient:
1) Moralization: polynomial in # of nodes
2) Introduce Evidence (fixed or constant): polynomial in # of nodes (convert pdf to slices)
3) Triangulate (Tarjan & Yannakakis 1984): suboptimal = polynomial, optimal = NP-hard
4) Construct Junction Tree (Kruskal): polynomial in # of cliques
5) Junction Tree Algorithm (Init, Collect, Distribute, Normalize): polynomial (linear) in # of cliques, exponential in clique cardinality
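To make "exponential in clique cardinality" concrete, the sketch below counts table entries. The helper name `total_table_entries` and the example clique set (six pairwise cliques over seven binary variables) are assumptions for illustration only.

```python
def total_table_entries(cliques, cardinality):
    """Sum of clique table sizes: each clique needs cardinality ** |clique| entries."""
    return sum(cardinality ** len(clique) for clique in cliques)

# Assumed example: six pairwise cliques over seven binary variables.
cliques = [("x1", "x2"), ("x2", "x3"), ("x3", "x4"),
           ("x3", "x5"), ("x5", "x6"), ("x5", "x7")]

junction_tree_entries = total_table_entries(cliques, 2)   # 6 cliques * 2**2 entries
full_joint_entries = 2 ** 7                               # one table over all variables
```

Storage and message cost grow linearly with the number of cliques but exponentially with the largest clique, which is why a good triangulation matters.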

8 Junction Tree Algorithm
Convert the directed graph to a junction tree.
[Figure: junction tree with cliques x1x2, x2x3, x3x4, x3x5, x5x6, x5x7]
Initialize separators to 1 (and Z = 1) and set clique tables to the CPTs in the directed graph:
$p(X) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2)\,p(x_4 \mid x_3)\,p(x_5 \mid x_3)\,p(x_6 \mid x_5)\,p(x_7 \mid x_5)$
$p(X) = \frac{1}{Z}\,\frac{\prod_C \psi(X_C)}{\prod_S \phi(X_S)} = p(x_1,x_2)\,p(x_3 \mid x_2)\,p(x_4 \mid x_3)\,p(x_5 \mid x_3)\,p(x_6 \mid x_5)\,p(x_7 \mid x_5)$  (with Z = 1 and every φ = 1)
Run Collect, Distribute, Normalize.
Get valid marginals from all ψ, φ tables.

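9 JTA with Extra Evidence
If extra evidence is observed, the tables must be sliced accordingly.
Example: $p(A,B,C,D) = \frac{1}{Z}\,\psi_{AB}\,\psi_{BC}\,\psi_{CD}$ on the junction tree AB - [B] - BC - [C] - CD.
[Tables: ψ_AB (rows A = 0, 1; columns B = 0, 1), ψ_BC (rows B; columns C), ψ_CD (rows C; columns D). Most numeric entries not preserved.]
You are given evidence: A = 0. Replace the ψ_AB table with its A = 0 slice, $\psi_{AB} = (8 \;\; 4)$ over B = 0, 1.
JTA now gives ψ, φ as marginals conditioned on the evidence (ψ below denotes the tables after running JTA):
$p(B \mid A=0) = \frac{\psi_{AB}}{\sum_{B} \psi_{AB}}$,  $p(B,C \mid A=0) = \frac{\psi_{BC}}{\sum_{B,C} \psi_{BC}}$,  $p(C,D \mid A=0) = \frac{\psi_{CD}}{\sum_{C,D} \psi_{CD}}$
All denominators equal the new normalizer Z':
$Z' = p(\text{EVIDENCE}) = \sum_{B} \psi_{AB} = \sum_{B,C} \psi_{BC} = \sum_{C,D} \psi_{CD}$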
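A small numerical sketch of slicing on evidence follows. The CPT values are assumptions (the slide's tables did not survive), the chain A -> B -> C -> D is taken as the underlying directed graph, and CD is chosen as the root here purely for convenience.

```python
import numpy as np

# Assumed CPTs for a chain A -> B -> C -> D with binary variables.
p_A = np.array([0.6, 0.4])
p_B_A = np.array([[0.7, 0.3], [0.2, 0.8]])   # rows A, columns B
p_C_B = np.array([[0.9, 0.1], [0.5, 0.5]])   # rows B, columns C
p_D_C = np.array([[0.3, 0.7], [0.6, 0.4]])   # rows C, columns D

# Evidence A = 0: keep only the A = 0 slice of psi_AB (now a table over B).
psi_AB = (p_A[:, None] * p_B_A)[0]
psi_BC = p_C_B.copy()
psi_CD = p_D_C.copy()

# Collect towards the root CD (separators start at 1, so the first ratio is phi / 1).
phi_B = psi_AB.copy()
psi_BC = psi_BC * phi_B[:, None]
phi_C = psi_BC.sum(axis=0)
psi_CD = psi_CD * phi_C[:, None]

# Distribute back from CD towards AB.
phi_C_new = psi_CD.sum(axis=1)
psi_BC = psi_BC * (phi_C_new / phi_C)[None, :]
phi_B_new = psi_BC.sum(axis=1)
psi_AB = psi_AB * (phi_B_new / phi_B)

# Every table now sums to the same normalizer Z' = p(A = 0).
Z_new = psi_AB.sum()
p_B_given_A0 = psi_AB / Z_new
p_BC_given_A0 = psi_BC / psi_BC.sum()
p_CD_given_A0 = psi_CD / psi_CD.sum()
```

Because the tables were initialised from CPTs, the shared normalizer equals the probability of the evidence, here p(A = 0).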

10 ArgMax Junction Tree Algorithm
We can also use JTA to find the max, not the sum, over the joint, to get the argmax of marginals & conditionals.
Say we have some evidence: $p(X_F, X_E) = p(x_1,\ldots,x_n,\,x_{n+1},\ldots,x_N)$
Most likely (highest p) $X_F$?  $X_F^{*} = \arg\max_{X_F}\, p(X_F, X_E)$
What is the most likely state of a patient with fever & headache?
$p_F^{*} = \max_{x_2,x_3,x_4,x_5} p(x_1{=}1, x_2, x_3, x_4, x_5, x_6{=}1)$
$\quad = \max_{x_2} p(x_2 \mid x_1{=}1)\,p(x_1{=}1)\,\max_{x_3} p(x_3 \mid x_1{=}1)\,\max_{x_4} p(x_4 \mid x_2)\,\max_{x_5} p(x_5 \mid x_3)\,p(x_6{=}1 \mid x_2, x_5)$
Solution: replace sum with max inside the JTA update code:
collect:  $\phi^{*}_{V \cap W} = \max_{V \setminus (V \cap W)} \psi_V$,  $\psi_W = \frac{\phi^{*}_{V \cap W}}{\phi_{V \cap W}}\,\psi_W$
distribute:  $\phi^{**}_{V \cap W} = \max_{V \setminus (V \cap W)} \psi_V$,  $\psi_W = \frac{\phi^{**}_{V \cap W}}{\phi^{*}_{V \cap W}}\,\psi_W$
Final potentials are max marginals:  $\psi^{**}(X_C) = \max_{U \setminus C}\, p(X)$
Highest value in a potential is the most likely state:  $X_C^{*} = \arg\max_{C}\, \psi^{**}(X_C)$
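As a sanity check of the max variant, the sketch below runs the two max-messages on a two-clique tree AB - [B] - BC and compares against brute force. The CPT values and the chain A -> B -> C are assumptions chosen for illustration.

```python
import numpy as np

# Assumed CPTs for a chain A -> B -> C with binary variables.
p_A = np.array([0.6, 0.4])
p_B_A = np.array([[0.7, 0.3], [0.2, 0.8]])   # rows A, columns B
p_C_B = np.array([[0.9, 0.1], [0.5, 0.5]])   # rows B, columns C

psi_AB = p_A[:, None] * p_B_A
psi_BC = p_C_B.copy()
phi_B = np.ones(2)

# Message AB -> BC with max in place of sum.
phi_B1 = psi_AB.max(axis=0)
psi_BC = psi_BC * (phi_B1 / phi_B)[:, None]

# Message BC -> AB, again with max.
phi_B2 = psi_BC.max(axis=1)
psi_AB = psi_AB * (phi_B2 / phi_B1)[None, :]

# Brute-force joint for comparison.
joint = np.einsum("a,ab,bc->abc", p_A, p_B_A, p_C_B)
best_ab = np.unravel_index(psi_AB.argmax(), psi_AB.shape)
best_bc = np.unravel_index(psi_BC.argmax(), psi_BC.shape)
best_joint = np.unravel_index(joint.argmax(), joint.shape)
```

The final clique tables are max-marginals of the joint, so reading off the largest entry of each clique recovers the jointly most likely configuration (assuming the maximum is unique, as it is for these values).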

11 ArgMax Junction Tree Algorithm
Why do I need the ArgMax junction tree algorithm? Can't I just compute marginals using the Sum algorithm and then find the highest value in each marginal? No! Here's a counter-example:
[Tables: joint p(x1, x2) with rows x1 = A, B, C and columns x2 = 0, 1; marginals p(x1) over A, B, C and p(x2) over 0, 1. Numeric entries not preserved.]
The most likely joint state is x1 = C and x2 = 0, but the sub-marginals p(x1) and p(x2) do not reveal this.
The marginals would falsely imply that the most likely state is x1 = A and x2 = 1.
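The slide's numbers are not available, so the sketch below uses a hypothetical joint table (in percent) constructed to show the same effect: the joint mode is (C, 0) while the per-variable marginal modes are A and 1.

```python
import numpy as np

# Hypothetical joint p(x1, x2) in percent; rows x1 in (A, B, C), columns x2 in (0, 1).
joint = np.array([[10, 28],
                  [ 0, 27],
                  [35,  0]])
labels = ["A", "B", "C"]

# Mode of the joint table.
i, j = np.unravel_index(joint.argmax(), joint.shape)
joint_mode = (labels[i], int(j))

# Modes of the two marginals, taken independently.
p_x1 = joint.sum(axis=1)    # marginal over x1
p_x2 = joint.sum(axis=0)    # marginal over x2
marginal_mode = (labels[p_x1.argmax()], int(p_x2.argmax()))
```

Here the configuration suggested by the marginals, (A, 1), has joint probability 28%, lower than the true joint mode (C, 0) at 35%.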

12 Example
Junction tree: AB - [B] - BC. Note that products are element-wise.
Let us send a regular JTA message from AB to BC, starting from
$\psi_{AB} = \begin{pmatrix} 8 & 4 \\ 3 & 1 \end{pmatrix}$ (rows A = 0, 1; columns B = 0, 1),  $\phi_B = (1 \;\; 1)$
$\phi_B^{*} = \sum_A \psi_{AB} = (11 \;\; 5)$,  $\psi_{BC}^{*} = \frac{\phi_B^{*}}{\phi_B}\,\psi_{BC}$
[Table ψ_BC (rows B = 0, 1; columns C = 0, 1): numeric entries not preserved.]
If argmax JTA, just change the separator update to:
$\phi_B^{*} = \max_A \psi_{AB} = \max_A \begin{pmatrix} 8 & 4 \\ 3 & 1 \end{pmatrix} = (8 \;\; 4)$
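The two separator updates in this example can be reproduced in a couple of lines. The sketch assumes the ψ_AB table has rows indexed by A and columns by B, which is how the slide's layout appears to read.

```python
import numpy as np

# psi_AB from the example; rows A = 0, 1 and columns B = 0, 1 (assumed orientation).
psi_AB = np.array([[8, 4],
                   [3, 1]])

phi_B_sum = psi_AB.sum(axis=0)   # regular JTA message: sum over A
phi_B_max = psi_AB.max(axis=0)   # argmax JTA message: max over A
```

Only the reduction over A changes between the two variants; the element-wise rescaling of ψ_BC by the ratio of new to old separator is the same in both.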
