CS281A/Stat241A Lecture 19

Junction Tree Algorithm. Peter Bartlett.

Announcements. My office hours: Tuesday Nov 3 (today), 1-2pm, in 723 Sutardja Dai Hall; Thursday Nov 5, 1-2pm, in 723 Sutardja Dai Hall. Homework 5 is due 5pm Monday, November 16.

Key ideas of this lecture: the Junction Tree Algorithm. (For directed graphical models:) Moralize. Triangulate. Construct a junction tree. Define potentials on maximal cliques. Introduce evidence. Propagate probabilities.

Junction Tree Algorithm. Inference: given a graph $G = (V, E)$, evidence $\bar{x}_E$ for $E \subseteq V$, and a set $F \subseteq V$, compute $p(x_F \mid \bar{x}_E)$.

Junction Tree Algorithm. Elimination: a single set $F$; any $G$. Sum-product: all singleton sets $F$ simultaneously; $G$ a tree. Junction tree: all cliques $F$ simultaneously; any $G$.

Junction Tree Algorithm. 1. (For directed graphical models:) Moralize. 2. Triangulate. 3. Construct a junction tree. 4. Define potentials on maximal cliques. 5. Introduce evidence. 6. Propagate probabilities.

1. Moralize. In a directed graphical model, the local conditionals are functions of a variable and its parents: $p(x_i \mid x_{\pi(i)}) = \psi_{\pi(i) \cup \{i\}}(x_{\pi(i) \cup \{i\}})$. To represent this as an undirected model, the set $\pi(i) \cup \{i\}$ must be a clique. So we consider the moral graph: for each node, all of its parents are connected, and edge directions are dropped.
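To make step 1 concrete, here is a minimal moralization sketch; the representation (a dict mapping each node to its set of parents) is an illustrative assumption, not from the slides.

```python
def moralize(parents):
    """Moral graph of a DAG given as {node: set of parents}."""
    nodes = set(parents)
    for pa in parents.values():
        nodes |= set(pa)
    moral = {v: set() for v in nodes}
    for i, pa in parents.items():
        pa = list(pa)
        for p in pa:                      # keep every parent-child edge, undirected
            moral[i].add(p)
            moral[p].add(i)
        for a in range(len(pa)):          # "marry" the parents so pi(i) U {i} is a clique
            for b in range(a + 1, len(pa)):
                moral[pa[a]].add(pa[b])
                moral[pa[b]].add(pa[a])
    return moral

# Example: the v-structure 1 -> 3 <- 2 moralizes to the triangle on {1, 2, 3}.
print(moralize({1: set(), 2: set(), 3: {1, 2}}))
```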

Junction Tree Algorithm. 1. (For directed graphical models:) Moralize. 2. Triangulate (e.g., run UNDIRECTEDGRAPHELIMINATE). 3. Construct a junction tree. 4. Define potentials on maximal cliques. 5. Introduce evidence. 6. Propagate probabilities.

2. Triangulate: Motivation. Theorem: A graph $G$ is chordal iff it has a junction tree. Recall that all of the following are equivalent: $G$ is chordal; $G$ is decomposable; $G$ is recursively simplicial; $G$ is the fixed point of UNDIRECTEDGRAPHELIMINATE; an oriented version of $G$ has moral graph $G$; $G$ implies the same conditional independences as some directed graph.

Chordal/Triangulated. Definitions: A cycle in a graph $G = (V, E)$ is a vertex sequence $v_1, \ldots, v_n$ with $v_1 = v_n$, all other vertices distinct, and $\{v_i, v_{i+1}\} \in E$ for each $i$. A cycle has a chord if it contains a pair of non-consecutive vertices $v_i, v_j$ with $\{v_i, v_j\} \in E$. A graph is chordal (or triangulated) if every cycle of length at least four has a chord.
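A rough sketch of triangulation by vertex elimination, in the spirit of the UNDIRECTEDGRAPHELIMINATE routine mentioned above; the dict-of-neighbor-sets representation and the function name are assumptions for illustration. Eliminating a vertex connects all of its remaining neighbors, and the fill-in edges added along the way make the graph chordal.

```python
from copy import deepcopy
from itertools import combinations

def triangulate(graph, order):
    """graph: {vertex: set of neighbors}; order: an elimination ordering of the vertices."""
    work = deepcopy(graph)       # working copy that gets eliminated
    filled = deepcopy(graph)     # original graph plus fill-in edges (the triangulation)
    for v in order:
        nbrs = list(work[v])
        for a, b in combinations(nbrs, 2):
            if b not in work[a]:                  # fill-in edge between neighbors of v
                work[a].add(b); work[b].add(a)
                filled[a].add(b); filled[b].add(a)
        for u in nbrs:                            # remove v from the working graph
            work[u].discard(v)
        del work[v]
    return filled

# Example: the chordless 4-cycle 1-2-3-4-1; eliminating vertex 1 adds the chord {2, 4}.
cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(triangulate(cycle4, [1, 2, 3, 4]))
```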

Junction Tree: Definition. A clique tree for a graph $G = (V, E)$ is a tree $T = (V_T, E_T)$ where $V_T$ is a set of cliques of $G$ and $V_T$ contains all maximal cliques of $G$. A junction tree for a graph $G$ is a clique tree $T = (V_T, E_T)$ for $G$ that has the running intersection property: for any cliques $C_1$ and $C_2$ in $V_T$, every clique on the path connecting $C_1$ and $C_2$ contains $C_1 \cap C_2$.
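Below is a small sketch that checks the running intersection property for a candidate clique tree; the list-of-frozensets cliques and index-pair tree edges are assumed representations for illustration.

```python
from itertools import combinations

def tree_path(edges, n, i, j):
    """Vertices on the unique path between nodes i and j of a tree on 0..n-1."""
    adj = {k: [] for k in range(n)}
    for a, b in edges:
        adj[a].append(b); adj[b].append(a)
    parent, stack = {i: None}, [i]
    while stack:                          # DFS from i records a parent pointer per node
        u = stack.pop()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)
    path, u = [], j
    while u is not None:                  # walk back from j to i
        path.append(u)
        u = parent[u]
    return path

def is_junction_tree(cliques, edges):
    """True iff every clique on the path between any two cliques contains their intersection."""
    n = len(cliques)
    return all(cliques[i] & cliques[j] <= cliques[k]
               for i, j in combinations(range(n), 2)
               for k in tree_path(edges, n, i, j))

# Example: the two maximal cliques of the triangulated 4-cycle, joined by one edge.
print(is_junction_tree([frozenset({1, 2, 4}), frozenset({2, 3, 4})], [(0, 1)]))  # True
```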

Junction Tree: Example. (Figure: a triangulated graph with maximal cliques $C_1, \ldots, C_6$, together with a clique tree on $C_1, \ldots, C_6$.)

2. Triangulate. Theorem: A graph $G$ is chordal iff it has a junction tree. Chordal $\Leftrightarrow$ Recursively Simplicial (from before), and we'll show: Recursively Simplicial $\Rightarrow$ Junction Tree $\Rightarrow$ Chordal.

Recursively Simplicial $\Rightarrow$ Junction Tree. Recall: a graph $G$ is recursively simplicial if it contains a simplicial vertex $v$ (one whose neighbors form a clique), and when $v$ is removed, the remaining graph is recursively simplicial. Proof idea, induction step: Consider a recursively simplicial graph $G$ of size $n + 1$. When we remove a simplicial vertex $v$, it leaves a subgraph $G'$ with a junction tree $T$. Let $N$ be a clique in $T$ containing $v$'s neighbors. Let $C$ be the new clique consisting of $v$ and its neighbors. To obtain a junction tree for $G$: if $N$ contains only $v$'s neighbors, replace it with $C$; otherwise, add $C$ with an edge to $N$.

Junction Tree $\Rightarrow$ Chordal. Proof idea, induction step: Consider a junction tree $T$ of size $n + 1$. Consider a leaf $C$ of $T$ and its neighbor $N$ in $T$. Remove $C$ from $T$ and remove the vertices $C \setminus N$ from $V$. The remaining tree is a junction tree for the remaining (chordal) graph. All cycles have a chord, since either: 1. the cycle is completely in the remaining graph, 2. the cycle is completely in $C$, or 3. the cycle passes through $C \setminus N$, $C \cap N$, and $V \setminus C$ (and $C \cap N$ is complete, so the cycle contains a chord).

Junction Tree Algorithm. 1. (For directed graphical models:) Moralize. 2. Triangulate. 3. Construct a junction tree: find a maximal spanning tree. 4. Define potentials on maximal cliques. 5. Introduce evidence. 6. Propagate probabilities.

Junction Tree is a Maximal Spanning Tree. Define: the weight of a clique tree $T = (\mathcal{C}, E)$ is $w(T) = \sum_{(C, C') \in E} |C \cap C'|$. A maximal spanning tree for a clique set $\mathcal{C}$ of a graph is a clique tree $T = (\mathcal{C}, E^*)$ with maximal weight over all clique trees of the form $(\mathcal{C}, E)$.

Junction Tree is a Maximal Spanning Tree. Theorem: Suppose that a graph $G$ with clique set $\mathcal{C}$ has a junction tree. Then a clique tree $(\mathcal{C}, E)$ is a junction tree iff it is a maximal spanning tree for $\mathcal{C}$. And there are efficient greedy algorithms for finding the maximal spanning tree: while the graph is not connected, add the edge with the biggest-weight separating set that does not lead to a cycle.
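The greedy construction can be sketched as a Kruskal-style maximum weight spanning tree over the clique set, with edge weight $|C \cap C'|$; the union-find bookkeeping below is an illustrative choice, not from the slides.

```python
from itertools import combinations

def max_spanning_clique_tree(cliques):
    """cliques: list of frozensets.  Returns tree edges (index pairs) of a maximal
    weight spanning tree, which is a junction tree whenever one exists for the graph."""
    n = len(cliques)
    parent = list(range(n))                      # union-find roots, for cycle detection
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    candidates = sorted(combinations(range(n), 2),
                        key=lambda e: len(cliques[e[0]] & cliques[e[1]]),
                        reverse=True)            # biggest separators first
    edges = []
    for i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:                             # add the edge unless it creates a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges

# Example: cliques {1,2,4}, {2,3,4}, {4,5} are joined through their largest separators.
print(max_spanning_clique_tree([frozenset({1, 2, 4}),
                                frozenset({2, 3, 4}),
                                frozenset({4, 5})]))
```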

Maximal Spanning Tree: Proof. Consider a graph $G$ with vertices $V$. For a clique tree $(\mathcal{C}, E)$, define the separators $\mathcal{S} = \{ C \cap C' : (C, C') \in E \}$. Since $T$ is a tree, for any $k \in V$,
$\sum_{S \in \mathcal{S}} 1[k \in S] \le \sum_{C \in \mathcal{C}} 1[k \in C] - 1,$
with equality iff the subgraph of $T$ on the cliques containing $k$ is a tree.

Maximal Spanning Tree: Proof. Thus,
$w(T) = \sum_{S \in \mathcal{S}} |S| = \sum_{k} \sum_{S \in \mathcal{S}} 1[k \in S] \le \sum_{k} \left( \sum_{C \in \mathcal{C}} 1[k \in C] - 1 \right) = \sum_{C \in \mathcal{C}} |C| - n,$
with equality iff $T$ is a junction tree.

Junction Tree Algorithm. 1. (For directed graphical models:) Moralize. 2. Triangulate. 3. Construct a junction tree. 4. Define potentials on maximal cliques. 5. Introduce evidence. 6. Propagate probabilities.

Define potentials on maximal cliques. To express a product of clique potentials as a product of maximal clique potentials,
$\psi_{S_1}(x_{S_1}) \cdots \psi_{S_N}(x_{S_N}) = \prod_{C \in \mathcal{C}} \psi_C(x_C):$
1. Set all clique potentials to 1. 2. For each potential, incorporate (multiply) it into a clique potential containing its variables.
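A hedged sketch of this step, assuming categorical variables and potentials stored as dicts from assignment tuples (over each clique's sorted variables) to values; this representation is an illustrative choice, not the lecture's.

```python
from itertools import product

def init_clique_potentials(cliques, factors, card):
    """cliques: list of frozensets (maximal cliques); factors: list of (scope, table)
    pairs, where scope is a frozenset and table maps assignments of sorted(scope) to
    values; card: {variable: number of states}."""
    tables = []
    for C in cliques:                            # 1. set all clique potentials to 1
        vars_C = sorted(C)
        tables.append({assign: 1.0
                       for assign in product(*[range(card[v]) for v in vars_C])})
    for scope, table in factors:                 # 2. multiply each factor into a clique
        c = next(i for i, C in enumerate(cliques) if scope <= C)   # a containing clique
        vars_C = sorted(cliques[c])
        idx = [vars_C.index(v) for v in sorted(scope)]
        for assign in tables[c]:
            tables[c][assign] *= table[tuple(assign[i] for i in idx)]
    return tables
```

If no maximal clique contains a factor's scope, the `next(...)` call fails, which is the expected signal that the clique set does not cover the factorization.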

Junction Tree Algorithm. 1. (For directed graphical models:) Moralize. 2. Triangulate. 3. Construct a junction tree. 4. Define potentials on maximal cliques. 5. Introduce evidence. 6. Propagate probabilities.

Introduce Evidence. Two equivalent approaches: 1. Introduce evidence potentials $\delta(x_i, \bar{x}_i)$ for $i \in E$, so that marginalizing fixes $x_E = \bar{x}_E$. 2. Take a slice of each clique potential: $\psi_C(x_C) := \psi_C\bigl(x_{C \cap (V \setminus E)}, \bar{x}_{C \cap E}\bigr)$.
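Continuing the same assumed table representation, a sketch of the evidence step in its delta-potential form; zeroing out entries inconsistent with the observations is equivalent, for later marginalization, to taking the slice.

```python
def introduce_evidence(cliques, tables, evidence):
    """evidence: {variable: observed state}.  Modifies the clique tables in place."""
    for C, table in zip(cliques, tables):
        vars_C = sorted(C)
        for assign in table:
            # multiply by delta(x_i, observed x_i) for every evidence variable i in C
            if any(v in evidence and assign[k] != evidence[v]
                   for k, v in enumerate(vars_C)):
                table[assign] = 0.0
    return tables
```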

Junction Tree Algorithm. 1. (For directed graphical models:) Moralize. 2. Triangulate. 3. Construct a junction tree. 4. Define potentials on maximal cliques. 5. Introduce evidence. 6. Propagate probabilities.

Hugin Algorithm. Add potentials for separator sets. Recall: if $G$ has a junction tree $T = (\mathcal{C}, \mathcal{S})$, then any probability distribution that satisfies the conditional independences implied by $G$ can be factorized as
$p(x) = \frac{\prod_{C \in \mathcal{C}} p(x_C)}{\prod_{S \in \mathcal{S}} p(x_S)},$
where, if $S = \{C_1, C_2\}$, then $x_S$ denotes $x_{C_1 \cap C_2}$.

Hugin Algorithm. Represent $p$ with potentials on separators:
$p(x) = \frac{\prod_{C \in \mathcal{C}} \psi_C(x_C)}{\prod_{S \in \mathcal{S}} \phi_S(x_S)}.$
This can represent any distribution that an undirected graphical model can (since we can set $\phi_S \equiv 1$), but nothing more (since we can incorporate each $\phi_S$ into one of the $\psi_C$'s with $S \subseteq C$).

Hugin Algorithm. 1. Initialize: $\psi_C(x_C)$ = the appropriate clique potential, $\phi_S(x_S) = 1$. 2. Update the potentials so that $p(x)$ is invariant and $\psi_C, \phi_S$ become the marginal distributions.

Algorithm: Messages are passed between cliques in the junction tree. A message is passed from $V$ to an adjacent vertex $W$ once $V$ has received messages from all its other neighbors. The message corresponds to the updates:
$\phi_S^{(1)}(x_S) = \sum_{x_{V \setminus S}} \psi_V(x_V),$
$\psi_W^{(1)}(x_W) = \psi_W(x_W) \, \frac{\phi_S^{(1)}(x_S)}{\phi_S(x_S)},$
$\psi_V^{(1)}(x_V) = \psi_V(x_V),$
where $S = V \cap W$ is the separator.
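A sketch of one such message in code, using the same assumed table representation as the earlier sketches; it implements the three updates $\phi_S^{(1)}, \psi_W^{(1)}, \psi_V^{(1)}$ above.

```python
def pass_message(V, psi_V, W, psi_W, S, phi_S):
    """V, W, S: sorted variable lists with S the separator of V and W; psi_V, psi_W,
    phi_S: tables keyed by assignment tuples.  Returns the updated (phi_S, psi_W);
    psi_V is left unchanged."""
    idx_V = [V.index(v) for v in S]
    idx_W = [W.index(v) for v in S]
    # phi_S^(1)(x_S) = sum over x_{V \ S} of psi_V(x_V)
    phi_new = {s: 0.0 for s in phi_S}
    for assign, val in psi_V.items():
        phi_new[tuple(assign[i] for i in idx_V)] += val
    # psi_W^(1)(x_W) = psi_W(x_W) * phi_S^(1)(x_S) / phi_S(x_S)
    for assign in psi_W:
        s = tuple(assign[i] for i in idx_W)
        psi_W[assign] = psi_W[assign] * phi_new[s] / phi_S[s] if phi_S[s] else 0.0
    return phi_new, psi_W
```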

Analysis: 1. $p(x)$ is invariant under these updates. 2. In a tree, messages can pass in both directions over every edge. 3. The potentials are locally consistent after messages have passed in both directions. 4. In a graph with a junction tree, local consistency implies global consistency.
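Point 2 is commonly realized by a two-pass schedule: collect messages toward an arbitrary root clique, then distribute them back out, so that every edge carries a message in both directions. A small sketch; the edge-list representation and function name are assumptions for illustration.

```python
def two_pass_schedule(edges, n, root=0):
    """edges: undirected tree edges over cliques 0..n-1.  Returns directed messages
    (from_clique, to_clique) in a valid collect-then-distribute order."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b); adj[b].append(a)
    order, parent, stack = [], {root: None}, [root]
    while stack:                              # DFS from the root
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)
    collect = [(u, parent[u]) for u in reversed(order) if parent[u] is not None]
    distribute = [(v, u) for (u, v) in reversed(collect)]
    return collect + distribute

# Example: a chain of three cliques 0 - 1 - 2 rooted at clique 0.
print(two_pass_schedule([(0, 1), (1, 2)], 3))   # [(2, 1), (1, 0), (0, 1), (1, 2)]
```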

Analysis: $p(x) = \frac{\prod_{C \in \mathcal{C}} \psi_C(x_C)}{\prod_{S \in \mathcal{S}} \phi_S(x_S)}$. In this ratio, only $\frac{\psi_V \psi_W}{\phi_S}$ changes, to
$\frac{\psi_V^{(1)} \psi_W^{(1)}}{\phi_S^{(1)}} = \frac{\psi_V \, \psi_W \, \phi_S^{(1)}}{\phi_S^{(1)} \, \phi_S} = \frac{\psi_V \psi_W}{\phi_S}.$

Analysis: 1. $p(x)$ is invariant under these updates. 2. In a tree, messages can pass in both directions over every edge. 3. The potentials are locally consistent after messages have passed in both directions. 4. In a graph with a junction tree, local consistency implies global consistency.

Analysis: Local Consistency. Suppose that messages pass both ways, from $V$ to $W$ and back: 1. A message passes from $V$ to $W$:
$\phi_S^{(1)}(x_S) = \sum_{x_{V \setminus S}} \psi_V(x_V),$
$\psi_W^{(1)}(x_W) = \psi_W(x_W) \, \frac{\phi_S^{(1)}(x_S)}{\phi_S(x_S)},$
$\psi_V^{(1)}(x_V) = \psi_V(x_V).$

Analysis: Local Consistency. 2. Other messages pass to $W$ (through $W$'s other separators, so $\phi_S$ and $\psi_V$ are unchanged):
$\phi_S^{(2)} = \phi_S^{(1)}$, $\psi_V^{(2)} = \psi_V^{(1)}$, and $\psi_W^{(2)}$ is the resulting updated potential on $W$.

Analysis: Local Consistency. 3. A message passes from $W$ to $V$:
$\phi_S^{(3)}(x_S) = \sum_{x_{W \setminus S}} \psi_W^{(2)}(x_W),$
$\psi_V^{(3)}(x_V) = \psi_V^{(2)}(x_V) \, \frac{\phi_S^{(3)}(x_S)}{\phi_S^{(2)}(x_S)},$
$\psi_W^{(3)}(x_W) = \psi_W^{(2)}(x_W).$
Subsequently, $\phi_S, \psi_V, \psi_W$ remain unchanged.

Analysis: Local Consistency. After messages have passed in both directions, these potentials are locally consistent:
$\sum_{x_{V \setminus S}} \psi_V^{(3)}(x_V) = \sum_{x_{W \setminus S}} \psi_W^{(3)}(x_W) = \phi_S^{(3)}(x_S).$
c.f.: $\sum_{x_{V \setminus S}} p(x_V) = \sum_{x_{W \setminus S}} p(x_W) = p(x_S)$.

Analysis: Local Consistency. Proof:
$\sum_{x_{V \setminus S}} \psi_V^{(3)}(x_V) = \sum_{x_{V \setminus S}} \psi_V^{(2)}(x_V) \, \frac{\phi_S^{(3)}(x_S)}{\phi_S^{(2)}(x_S)}$
$= \frac{\phi_S^{(3)}(x_S)}{\phi_S^{(1)}(x_S)} \sum_{x_{V \setminus S}} \psi_V(x_V)$  (since $\psi_V^{(2)} = \psi_V$ and $\phi_S^{(2)} = \phi_S^{(1)}$)
$= \phi_S^{(3)}(x_S)$  (the $V \to W$ message gave $\phi_S^{(1)}(x_S) = \sum_{x_{V \setminus S}} \psi_V(x_V)$)
$= \sum_{x_{W \setminus S}} \psi_W^{(2)}(x_W)$  (definition of the $W \to V$ message)
$= \sum_{x_{W \setminus S}} \psi_W^{(3)}(x_W).$

Analysis: 1. $p(x)$ is invariant under these updates. 2. In a tree, messages can pass in both directions over every edge. 3. The potentials are locally consistent after messages have passed in both directions. 4. In a graph with a junction tree, local consistency implies global consistency.

Analysis: Local implies Global. Local consistency: for all adjacent cliques $V, W$ with separator $S$,
$\sum_{x_{V \setminus S}} \psi_V^{(3)}(x_V) = \sum_{x_{W \setminus S}} \psi_W^{(3)}(x_W) = \phi_S^{(3)}(x_S).$
Global consistency: for all cliques $C$,
$\psi_C(x_C) = \sum_{x_{C^c}} \frac{\prod_{C' \in \mathcal{C}} \psi_{C'}(x_{C'})}{\prod_{S \in \mathcal{S}} \phi_S(x_S)} = p(x_C).$

Local implies Global: Proof. Induction on the number of cliques. Trivial for one clique. Assume true for all junction trees with $n$ cliques. Consider a tree $T_W$ of size $n + 1$. Fix a leaf $B$, attached by separator $R$. Define: $W$ = all variables; $N = B \setminus R$; $V = W \setminus N$; $T_V$ = the junction tree without $B$.

Local implies Global: Proof. The junction tree $T_V$: has only the variables $V$ (from the junction tree property); satisfies local consistency; hence satisfies global consistency for $V$: for all $A \in T_V$,
$\psi_A(x_A) = \sum_{x_{V \setminus A}} \frac{\prod_{C \in \mathcal{C}_V} \psi_C(x_C)}{\prod_{S \in \mathcal{S}_V} \phi_S(x_S)}.$

Local implies Global: Proof. But viewing this $A$ in the larger $T_W$, we have
$\sum_{x_{W \setminus A}} \frac{\prod_{C \in \mathcal{C}_W} \psi_C(x_C)}{\prod_{S \in \mathcal{S}_W} \phi_S(x_S)} = \sum_{x_{V \setminus A}} \frac{\prod_{C \in \mathcal{C}_V} \psi_C(x_C)}{\prod_{S \in \mathcal{S}_V} \phi_S(x_S)} \sum_{x_N} \frac{\psi_B(x_B)}{\phi_R(x_R)}$
$= \sum_{x_{V \setminus A}} \frac{\prod_{C \in \mathcal{C}_V} \psi_C(x_C)}{\prod_{S \in \mathcal{S}_V} \phi_S(x_S)}$  (local consistency: $\sum_{x_N} \psi_B(x_B) = \phi_R(x_R)$)
$= \psi_A(x_A).$

Local implies Global: Proof. And for the new clique $B$: suppose $A$ is the neighbor of $B$ in $T_W$. Then
$\sum_{x_{W \setminus B}} \frac{\prod_{C \in \mathcal{C}_W} \psi_C(x_C)}{\prod_{S \in \mathcal{S}_W} \phi_S(x_S)} = \frac{\psi_B(x_B)}{\phi_R(x_R)} \sum_{x_{A \setminus R}} \sum_{x_{V \setminus A}} \frac{\prod_{C \in \mathcal{C}_V} \psi_C(x_C)}{\prod_{S \in \mathcal{S}_V} \phi_S(x_S)}$
$= \frac{\psi_B(x_B)}{\phi_R(x_R)} \sum_{x_{A \setminus R}} \psi_A(x_A)$
$= \frac{\psi_B(x_B)}{\phi_R(x_R)} \, \phi_R(x_R) = \psi_B(x_B).$

Junction Tree Algorithm. 1. (For directed graphical models:) Moralize. 2. Triangulate. 3. Construct a junction tree. 4. Define potentials on maximal cliques. 5. Introduce evidence. 6. Propagate probabilities.

Junction Tree Algorithm: Computation. The size of the largest maximal clique determines the run-time. If variables are categorical and potentials are represented as tables, marginalizing takes time exponential in the clique size. Finding a triangulation that minimizes the size of the largest clique is NP-hard.
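As an illustrative worked instance of the run-time claim (the specific numbers are assumptions, not from the slides): suppose every variable takes $k$ states.

```latex
% Cost of one marginalization over a clique C (assumption: k states per variable).
\phi_S(x_S) \;=\; \sum_{x_{C \setminus S}} \psi_C(x_C)
\quad\Longrightarrow\quad
\text{cost} \;=\; O\!\bigl(k^{|C|}\bigr),
\qquad \text{e.g. } k = 2,\ |C| = 30:\ 2^{30} \approx 10^9 \text{ table entries.}
```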

Key ideas of this lecture: the Junction Tree Algorithm. (For directed graphical models:) Moralize. Triangulate. Construct a junction tree. Define potentials on maximal cliques. Introduce evidence. Propagate probabilities.