Probability Propagation

Graphical Models, Lectures 9 and 10, Michaelmas Term 2009. November 13, 2009.

Characterizing chordal graphs. The following are equivalent for any undirected graph $G$:
(i) $G$ is chordal;
(ii) $G$ is decomposable;
(iii) all prime components of $G$ are cliques;
(iv) $G$ admits a perfect numbering;
(v) every minimal $(\alpha, \beta)$-separator is complete;
(vi) the cliques of $G$ can be arranged in a junction tree.

Algorithms associated with chordality. Maximum Cardinality Search (MCS) identifies whether a graph is chordal or not. If a graph $G$ is chordal, MCS yields a perfect numbering of its vertices. In addition it finds the cliques of $G$: from an MCS numbering $V = \{1, \dots, |V|\}$, let $B_\lambda = \mathrm{bd}(\lambda) \cap \{1, \dots, \lambda - 1\}$ and $\pi_\lambda = |B_\lambda|$. A ladder vertex is either $\lambda = |V|$ or one with $\pi_{\lambda+1} < \pi_\lambda + 1$. Let $\Lambda$ be the set of ladder vertices. The cliques are $C_\lambda = \{\lambda\} \cup B_\lambda$, $\lambda \in \Lambda$.
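As an illustration (not from the lecture), here is a minimal Python sketch of MCS and of the ladder-vertex clique extraction just described; the graph is an adjacency dict, and the names `mcs_order` and `cliques_from_mcs` are made up for this example.

```python
def mcs_order(adj):
    """Maximum Cardinality Search: repeatedly pick an unnumbered vertex with
    the largest number of already-numbered neighbours."""
    unnumbered = set(adj)
    weight = {v: 0 for v in adj}        # numbered neighbours seen so far
    order = []                          # order[i] is the vertex numbered i + 1
    while unnumbered:
        v = max(unnumbered, key=lambda u: weight[u])
        order.append(v)
        unnumbered.remove(v)
        for w in adj[v]:
            if w in unnumbered:
                weight[w] += 1
    return order

def cliques_from_mcs(adj, order):
    """Cliques C_lambda = {lambda} ∪ B_lambda at the ladder vertices."""
    pos = {v: i for i, v in enumerate(order)}             # 0-based numbering
    B = [set(w for w in adj[v] if pos[w] < pos[v]) for v in order]
    pi = [len(b) for b in B]
    cliques = []
    for i, v in enumerate(order):
        if i == len(order) - 1 or pi[i + 1] < pi[i] + 1:   # ladder vertex
            cliques.append(frozenset(B[i]) | {v})
    return cliques

# Example: two triangles sharing the edge 2-3 (a chordal graph)
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
order = mcs_order(adj)
print(order, cliques_from_mcs(adj, order))   # two cliques of size 3
```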

Junction tree. Let $\mathcal{A}$ be a collection of finite subsets of a set $V$. A junction tree $\mathcal{T}$ of sets in $\mathcal{A}$ is an undirected tree with $\mathcal{A}$ as its vertex set, satisfying the junction tree property: if $A, B \in \mathcal{A}$ and $C$ is on the unique path in $\mathcal{T}$ between $A$ and $B$, it holds that $A \cap B \subseteq C$. If the sets in $\mathcal{A}$ are pairwise incomparable, they can be arranged in a junction tree if and only if $\mathcal{A} = \mathcal{C}$, where $\mathcal{C}$ is the set of cliques of a chordal graph. The junction tree can be constructed directly from the MCS ordering of the cliques $C_\lambda$, $\lambda \in \Lambda$.
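Continuing the illustrative sketch above, a junction tree can be built by attaching each clique, taken in MCS order, to any earlier clique containing its separator; the running intersection property of the MCS ordering guarantees such a clique exists. The representation and names are again assumptions of the sketch, not part of the lecture.

```python
def junction_tree(cliques):
    """Given cliques in MCS order, attach each clique to an earlier clique
    containing its separator with the union of all earlier cliques.
    Returns tree edges (i, j, separator) over clique indices."""
    edges = []
    seen = set()
    for j, Cj in enumerate(cliques):
        if j > 0:
            sep = Cj & seen
            # running intersection: some earlier clique contains sep
            i = next(i for i in range(j) if sep <= cliques[i])
            edges.append((i, j, frozenset(sep)))
        seen |= Cj
    return edges

cliques = [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
print(junction_tree(cliques))   # [(0, 1, frozenset({2, 3}))]
```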

The general problem. Factorizing density on $\mathcal{X} = \prod_{v \in V} \mathcal{X}_v$ with $V$ and $\mathcal{X}_v$ finite:
$$p(x) = \prod_{C \in \mathcal{C}} \phi_C(x).$$
The potentials $\phi_C(x)$ depend on $x_C = (x_v, v \in C)$ only. The basic task is to calculate the marginal (likelihood)
$$p_E(x_E^*) = \sum_{y_{V \setminus E}} p(x_E^*, y_{V \setminus E})$$
for $E \subseteq V$ and fixed $x_E^*$, but the sum has too many terms. A second purpose is to get the prediction $p(x_v \mid x_E^*) = p(x_v, x_E^*)/p(x_E^*)$ for $v \in V$.
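To make the cost concrete, a brute-force computation of the unnormalized marginal for binary variables might look as follows; it enumerates every completion of the evidence, which is exactly the exponential sum that propagation avoids. The `(clique, table)` representation is an assumption of this sketch.

```python
from itertools import product

def brute_force_marginal(V, potentials, evidence):
    """Sum prod_C phi_C(x_C) over all completions of the evidence.
    V: list of binary variables; potentials: list of (clique, table), where
    table maps value tuples (ordered as in clique) to nonnegative numbers;
    evidence: dict {v: value}. Exponential in |V \ E|."""
    hidden = [v for v in V if v not in evidence]
    total = 0.0
    for ys in product([0, 1], repeat=len(hidden)):
        x = {**evidence, **dict(zip(hidden, ys))}
        term = 1.0
        for clique, table in potentials:
            term *= table[tuple(x[v] for v in clique)]
        total += term
    return total

# Tiny example: a chain 1-2-3 with pairwise potentials
pots = [((1, 2), {(0, 0): 2, (0, 1): 1, (1, 0): 1, (1, 1): 2}),
        ((2, 3), {(0, 0): 2, (0, 1): 1, (1, 0): 1, (1, 1): 2})]
print(brute_force_marginal([1, 2, 3], pots, {1: 0}))
```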

Computational structure. The algorithms all arrange the collection of sets $\mathcal{C}$ in a junction tree $\mathcal{T}$. Hence they work only if $\mathcal{C}$ is the set of cliques of a chordal graph $G$. If the initial model is based on a DAG $D$, the first step is to form the moral graph $G = D^m$, exploiting that if $P$ factorizes w.r.t. $D$, it also factorizes w.r.t. $D^m$. If $G$ is not chordal from the outset, triangulation is used to construct a chordal graph $G'$ with $E \subseteq E'$. Again, if $P$ factorizes w.r.t. $G$, it factorizes w.r.t. $G'$. This step is non-trivial, and it is NP-complete to optimize. When this has been done, the computations are executed by message passing.

The computational structure is set up in several steps:
1. Moralisation: constructing $D^m$, exploiting that if $P$ factorizes over $D$, it factorizes over $D^m$.
2. Triangulation: adding edges to find a chordal graph $G'$ with $G \subseteq G'$. This step is non-trivial (NP-complete) to optimize.
3. Constructing the junction tree: using MCS, the cliques of $G'$ are found and arranged in a junction tree.
4. Initialization: assigning potential functions $\phi_C$ to the cliques.
The complete process above is known as compilation. A sketch of the first two steps is given below.
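A rough Python sketch of steps 1 and 2, assuming the DAG is given by a parent map. Any elimination order yields some chordal supergraph; the NP-hard part is choosing a good order (a heuristic such as min-fill would go where `order` is supplied).

```python
def moralise(parents):
    """D^m: marry the parents of each vertex and drop edge directions.
    `parents` maps each vertex to the set of its parents in the DAG D."""
    adj = {v: set() for v in parents}
    for v, pa in parents.items():
        for p in pa:                       # directed edges become undirected
            adj[v].add(p)
            adj[p].add(v)
        for p in pa:                       # marry the parents
            for q in pa:
                if p != q:
                    adj[p].add(q)
    return adj

def triangulate(adj, order):
    """Heuristic triangulation: eliminate vertices in the given order, adding
    fill-in edges among the not-yet-eliminated neighbours of each eliminated
    vertex. Any order yields *a* chordal supergraph; choosing the best order
    is the NP-hard part."""
    adj = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    remaining = set(adj)
    for v in order:
        nbrs = adj[v] & (remaining - {v})
        for p in nbrs:
            for q in nbrs:
                if p != q:
                    adj[p].add(q)
                    adj[q].add(p)
        remaining.remove(v)
    return adj

# v-structure 1 -> 3 <- 2 : moralisation marries 1 and 2
print(moralise({1: set(), 2: set(), 3: {1, 2}}))
```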

Initialization.
1. For every vertex $v \in V$ we find a clique $C(v)$ in the triangulated graph $G'$ which contains $\{v\} \cup \mathrm{pa}(v)$. Such a clique exists because $\{v\} \cup \mathrm{pa}(v)$ is complete in $D^m$ by construction, and hence in $G'$.
2. Define potential functions $\phi_C$ for all cliques $C$ of $G'$ as
$$\phi_C(x) = \prod_{v : C(v) = C} p(x_v \mid x_{\mathrm{pa}(v)}),$$
where the product over an empty index set is set to 1, i.e. $\phi_C \equiv 1$ if no vertex is assigned to $C$.
3. It now holds that $p(x) = \prod_{C \in \mathcal{C}} \phi_C(x)$.
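A small illustrative sketch of the initialization step for binary variables, assuming the conditional tables are supplied as functions; in the example, the product of the clique potentials reproduces the joint distribution of the DAG 1 → 2, as claimed in item 3.

```python
from itertools import product

def initialise(cliques, parents, cpd):
    """phi_C(x_C) = prod over v with C(v)=C of p(x_v | x_pa(v)); phi_C ≡ 1 if
    no vertex is assigned to C. Variables are binary; cpd[v] is a function
    (x_v, x_pa_tuple) -> probability, with pa(v) taken in sorted order.
    Returns {clique: table} with each table keyed by tuples over sorted(C)."""
    assigned = {C: [] for C in cliques}
    for v, pa in parents.items():
        C = next(C for C in cliques if {v} | set(pa) <= C)   # exists by construction
        assigned[C].append(v)
    phi = {}
    for C in cliques:
        vars_C = sorted(C)
        table = {}
        for xs in product([0, 1], repeat=len(vars_C)):
            x = dict(zip(vars_C, xs))
            val = 1.0
            for v in assigned[C]:
                val *= cpd[v](x[v], tuple(x[u] for u in sorted(parents[v])))
            table[xs] = val
        phi[C] = table
    return phi

# DAG 1 -> 2 with P(X1=1)=0.3 and P(X2=1|X1)=0.9 if X1 else 0.2
parents = {1: set(), 2: {1}}
cpd = {1: lambda x, pa: 0.3 if x else 0.7,
       2: lambda x, pa: (0.9 if x else 0.1) if pa[0] else (0.2 if x else 0.8)}
print(initialise([frozenset({1, 2})], parents, cpd))
```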

Overview. This involves the following steps:
1. Incorporating observations: if $X_E = x_E^*$ is observed, we modify the potentials as
$$\phi_C(x_C) \leftarrow \phi_C(x) \prod_{e \in E \cap C} \delta(x_e^*, x_e),$$
with $\delta(u, v) = 1$ if $u = v$ and $\delta(u, v) = 0$ otherwise. Then
$$p(x \mid X_E = x_E^*) = \frac{\prod_{C \in \mathcal{C}} \phi_C(x_C)}{p(x_E^*)}.$$
2. The marginals $p(x_E^*)$ and $p(x_C \mid x_E^*)$ are then calculated by a local message passing algorithm.
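Evidence can be incorporated by zeroing out inconsistent table entries, as in this purely illustrative sketch using the same clique-table representation as above.

```python
def incorporate_evidence(phi, evidence):
    """Multiply each clique table by the indicator of the observation
    X_E = x_E*: entries that disagree with the evidence on E ∩ C are zeroed.
    `phi` maps cliques to tables keyed by tuples over sorted(C)."""
    out = {}
    for C, table in phi.items():
        vars_C = sorted(C)
        obs = {i: evidence[v] for i, v in enumerate(vars_C) if v in evidence}
        out[C] = {xs: (val if all(xs[i] == e for i, e in obs.items()) else 0.0)
                  for xs, val in table.items()}
    return out
```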

Separators. Between any two cliques $C$ and $D$ which are neighbours in the junction tree, their intersection $S = C \cap D$ is called a separator. In fact, the sets $S$ are the minimal separators appearing in any decomposition sequence. We also assign potentials to separators, initially $\phi_S \equiv 1$ for all $S \in \mathcal{S}$, where $\mathcal{S}$ is the set of separators. Finally let
$$\kappa(x) = \frac{\prod_{C \in \mathcal{C}} \phi_C(x_C)}{\prod_{S \in \mathcal{S}} \phi_S(x_S)}, \qquad (1)$$
and now it holds that $p(x \mid x_E^*) = \kappa(x)/p(x_E^*)$. The expression (1) will be invariant under the message passing.

Marginalization. The $A$-marginal of a potential $\phi_B$ for $A \subseteq B$ is
$$\phi_B^{\downarrow A}(x) = \sum_{y_B : y_A = x_A} \phi_B(y).$$
If $\phi_B$ depends on $x$ through $x_B$ only and $B \subseteq V$ is small, the marginal can be computed easily. Marginalization satisfies
Consonance: for subsets $A$ and $B$, $\phi^{\downarrow (A \cap B)} = (\phi^{\downarrow B})^{\downarrow A}$;
Distributivity: if $\phi_C$ depends on $x_C$ only and $C \subseteq B$, then $(\phi \, \phi_C)^{\downarrow B} = (\phi^{\downarrow B}) \, \phi_C$.
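With table potentials the $A$-marginal is just a sum over the dropped coordinates; a sketch, assuming tables keyed by value tuples in a fixed variable order.

```python
def marginalise(table, vars_B, vars_A):
    """A-marginal of a table potential phi_B: sum out the variables in B \ A.
    `table` maps value tuples (ordered as vars_B) to numbers; vars_A ⊆ vars_B."""
    keep = [i for i, v in enumerate(vars_B) if v in vars_A]
    out = {}
    for ys, val in table.items():
        key = tuple(ys[i] for i in keep)
        out[key] = out.get(key, 0.0) + val
    return out

# e.g. marginalising a table over (1, 2) down to variable 1
phi = {(0, 0): 0.56, (0, 1): 0.14, (1, 0): 0.03, (1, 1): 0.27}
print(marginalise(phi, (1, 2), (1,)))   # {(0,): 0.7, (1,): 0.3}
```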

Messages. When $C$ sends a message to $D$, the separator and the receiving clique are updated as
$$\phi_S^* = \phi_C^{\downarrow S}, \qquad \phi_D \leftarrow \phi_D \, \frac{\phi_S^*}{\phi_S},$$
while $\phi_C$ itself is unchanged. The computation is local, involving only variables within the cliques involved.
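A sketch of a single sum-propagation message under the same table representation; `send_message` is an illustrative name, not an API from the lecture.

```python
def send_message(phi_C, vars_C, phi_S, vars_S, phi_D, vars_D):
    """One sum-propagation message C -> D over the separator S = C ∩ D:
    phi_S* = (phi_C)^{down S},  phi_D <- phi_D * phi_S* / phi_S.
    Tables are keyed by value tuples ordered as the given variable lists."""
    # separator update: marginalise phi_C onto S
    keep = [vars_C.index(v) for v in vars_S]
    new_S = {}
    for xs, val in phi_C.items():
        key = tuple(xs[i] for i in keep)
        new_S[key] = new_S.get(key, 0.0) + val
    # clique update: multiply phi_D by the ratio of new to old separator
    idx = [vars_D.index(v) for v in vars_S]
    new_D = {}
    for xs, val in phi_D.items():
        s = tuple(xs[i] for i in idx)
        new_D[xs] = val * (new_S[s] / phi_S[s]) if phi_S[s] else 0.0
    return new_S, new_D
```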

The expression
$$\kappa(x) = \frac{\prod_{C \in \mathcal{C}} \phi_C(x_C)}{\prod_{S \in \mathcal{S}} \phi_S(x_S)}$$
is invariant under the message passing, since $\phi_C \phi_D / \phi_S$ is:
$$\frac{\phi_C \, \bigl(\phi_D \, \phi_C^{\downarrow S} / \phi_S\bigr)}{\phi_C^{\downarrow S}} = \frac{\phi_C \, \phi_D}{\phi_S}.$$
After the message has been sent, $D$ contains the $D$-marginal of $\phi_C \phi_D / \phi_S$. To see this, calculate
$$\left( \frac{\phi_C \, \phi_D}{\phi_S} \right)^{\downarrow D} = \frac{\phi_D}{\phi_S} \, \phi_C^{\downarrow D} = \frac{\phi_D}{\phi_S} \, \phi_C^{\downarrow S}.$$

Second message. If $D$ returns a message to $C$, the separator and $C$ are updated in the same way:
$$\phi_S^{**} = \bigl(\phi_D \, \phi_S^* / \phi_S\bigr)^{\downarrow S}, \qquad \phi_C \leftarrow \phi_C \, \frac{\phi_S^{**}}{\phi_S^*},$$
where $\phi_S^* = \phi_C^{\downarrow S}$ is the separator potential after the first message.

Now all sets contain the relevant marginal of $\phi = \phi_C \phi_D / \phi_S$. The separator contains
$$\left( \frac{\phi_D \, \phi_C^{\downarrow S}}{\phi_S} \right)^{\downarrow S} = \frac{\phi_C^{\downarrow S} \, \phi_D^{\downarrow S}}{\phi_S} = \phi^{\downarrow S}.$$
$C$ contains
$$\phi_C \, \frac{\phi^{\downarrow S}}{\phi_C^{\downarrow S}} = \frac{\phi_C \, \phi_D^{\downarrow S}}{\phi_S} = \left( \frac{\phi_C \, \phi_D}{\phi_S} \right)^{\downarrow C} = \phi^{\downarrow C},$$
since, as before,
$$\left( \frac{\phi_C \, \phi_D}{\phi_S} \right)^{\downarrow C} = \frac{\phi_C}{\phi_S} \, \phi_D^{\downarrow C} = \frac{\phi_C}{\phi_S} \, \phi_D^{\downarrow S}.$$
Further messages between $C$ and $D$ are neutral: nothing will change if a message is repeated.

Two phases.
CollInfo: messages are sent from the leaves towards an arbitrarily chosen root $R$. After CollInfo, the root potential satisfies $\phi_R(x_R) = \kappa^{\downarrow R}(x_R) = p(x_R, x_E^*)$.
DistInfo: messages are sent from the root $R$ towards the leaves. After CollInfo and the subsequent DistInfo, it holds for all $B \in \mathcal{C} \cup \mathcal{S}$ that $\phi_B(x_B) = \kappa^{\downarrow B}(x_B) = p(x_B, x_E^*)$.
Hence $p(x_E^*) = \sum_{x_S} \phi_S(x_S)$ for any $S \in \mathcal{S}$, and $p(x_v \mid x_E^*)$ can readily be computed from any $\phi_S$ with $v \in S$.
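The two phases can be scheduled by two tree traversals. The sketch below assumes binary variables, the table representation used earlier, and the `send_message` helper from the previous sketch; the clique ids, `tree`, and `sep_vars` structures are assumptions of the example.

```python
from itertools import product

def propagate(tree, phis, vars_of, sep_vars, root):
    """CollInfo then DistInfo. `tree[i]` lists neighbouring clique ids,
    `sep_vars[frozenset({i, j})]` the sorted separator variables, `phis[i]`
    a clique table keyed over `vars_of[i]` (binary variables assumed).
    Uses send_message from the sketch above."""
    sep_phi = {e: {xs: 1.0 for xs in product([0, 1], repeat=len(vs))}
               for e, vs in sep_vars.items()}          # phi_S ≡ 1 initially
    def pass_msg(i, j):
        e = frozenset({i, j})
        sep_phi[e], phis[j] = send_message(phis[i], vars_of[i],
                                           sep_phi[e], sep_vars[e],
                                           phis[j], vars_of[j])
    def collect(i, parent):
        for j in tree[i]:
            if j != parent:
                collect(j, i)
                pass_msg(j, i)           # leaves -> root
    def distribute(i, parent):
        for j in tree[i]:
            if j != parent:
                pass_msg(i, j)           # root -> leaves
                distribute(j, i)
    collect(root, None)
    distribute(root, None)
    # if the tables were initialised so that kappa = p(., x_E*),
    # every phi_B now equals p(x_B, x_E*)
    return phis, sep_phi
```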

CollInfo (figure): messages are sent from the leaves towards the root.

DistInfo (figure): after CollInfo, messages are sent from the root towards the leaves.

The correctness of the algorithm is easily established by induction. We have shown on the previous overheads correctness for a junction tree with only two cliques. Now consider a leaf clique $L$ of the junction tree and let $V' = \bigcup_{C \in \mathcal{C} \setminus \{L\}} C$. We can then think of $L$ and $V'$ as forming a junction tree of two cliques with separator $S^* = L \cap C^*$, where $C^*$ is the neighbour of $L$ in the junction tree. After a message has been sent from $L$ to $V'$ in the CollInfo phase, $\phi_{V'}$ is equal to the $V'$-marginal of $\kappa$.

By induction, when all messages have been sent except the one from the neighbour clique $C^*$ to $L$, all cliques other than $L$ contain the relevant marginal of $\kappa$, and
$$\phi_{V'} = \frac{\prod_{C \in \mathcal{C} \setminus \{L\}} \phi_C}{\prod_{S \in \mathcal{S} \setminus \{S^*\}} \phi_S}.$$
Now let $V'$ send its message back to $L$. To do this, it needs to calculate $\phi_{V'}^{\downarrow S^*}$. But since $S^* \subseteq C^*$ and $\phi_{C^*} = \phi_{V'}^{\downarrow C^*}$, we have $\phi_{V'}^{\downarrow S^*} = \phi_{C^*}^{\downarrow S^*}$, and sending a message from $V'$ to $L$ is thus equivalent to sending a message from $C^*$ to $L$. Thus, after this message has been sent, $\phi_L = \kappa^{\downarrow L}$, as desired.

Alternative scheduling of messages. Local control: allow a clique to send a message if and only if it has already received messages from all its other neighbours. Such messages are live. Using this protocol, there will be one clique which first receives messages from all of its neighbours. This is effectively the root $R$ in CollInfo and DistInfo. Additional messages never do any harm (ignoring efficiency issues), as $\kappa$ is invariant under message passing. Exactly two live messages along every branch are needed.

Maximization. Replace the sum-marginal with a max-marginal:
$$\phi_B^{\downarrow A}(x) = \max_{y_B : y_A = x_A} \phi_B(y).$$
It satisfies consonance, $\phi^{\downarrow (A \cap B)} = (\phi^{\downarrow B})^{\downarrow A}$, and distributivity, $(\phi \, \phi_C)^{\downarrow B} = (\phi^{\downarrow B}) \, \phi_C$ if $\phi_C$ depends on $x_C$ only and $C \subseteq B$. CollInfo yields the maximal value of the density $f$; DistInfo yields the configuration with maximum probability. Viterbi decoding for HMMs is a special case. Since (1) remains invariant, one can switch freely between max- and sum-propagation.
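Max-propagation only changes the marginalisation operator: replace the sum in the earlier `marginalise` sketch by a max, and everything else is unchanged. An illustrative sketch (nonnegative potentials assumed):

```python
def max_marginalise(table, vars_B, vars_A):
    """Max-marginal of a table potential: as `marginalise` above, but with the
    sum over the dropped coordinates replaced by a max (potentials >= 0)."""
    keep = [i for i, v in enumerate(vars_B) if v in vars_A]
    out = {}
    for ys, val in table.items():
        key = tuple(ys[i] for i in keep)
        out[key] = max(out.get(key, 0.0), val)
    return out
```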

Random sampling. After CollInfo, the root potential is $\phi_R(x) \propto p(x_R \mid x_E^*)$. Modify DistInfo as follows:
1. Pick a random configuration $\check{x}_R$ from $\phi_R$.
2. Send a message to each neighbour $C$ as $\check{x}_{R \cap C} = \check{x}_S$, where $S = C \cap R$ is the separator.
3. Continue by picking $\check{x}_C$ according to $\phi_C(x_{C \setminus S}, \check{x}_S)$ and send messages further away from the root.
When the sampling stops at the leaves of the junction tree, a configuration $\check{x}$ has been generated from $p(x \mid x_E^*)$. A sketch follows below.
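A sketch of this modified DistInfo as a random draw, assuming CollInfo has already been run (so each clique holds the required conditional weights) and using the clique-table representation from the earlier sketches; all names are illustrative.

```python
import random

def sample_configuration(tree, phis, vars_of, root):
    """Draw x ~ p(x | x_E*) after CollInfo: sample x_R from phi_R, then walk
    away from the root, sampling each clique's remaining variables
    proportionally to phi_C with the separator values already fixed."""
    x = {}
    def sample_clique(i):
        vars_i = vars_of[i]
        # keep only rows consistent with variables sampled so far
        # (by the path property these are exactly the separator variables)
        rows = [(xs, w) for xs, w in phis[i].items()
                if all(x.get(v, xs[k]) == xs[k] for k, v in enumerate(vars_i))]
        total = sum(w for _, w in rows)
        r, acc = random.uniform(0, total), 0.0
        for xs, w in rows:
            acc += w
            if acc >= r:
                break
        x.update(zip(vars_i, xs))
    def walk(i, parent):
        sample_clique(i)
        for j in tree[i]:
            if j != parent:
                walk(j, i)
    walk(root, None)
    return x
```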

Efficient proportional scaling. The scaling operation on $p$,
$$(T_a p)(x) \leftarrow p(x) \, \frac{n_a(x_a)}{n \, p_a(x_a)}, \quad x \in \mathcal{X},$$
is potentially very complex, as it cycles through all $x \in \mathcal{X}$, which is huge if $V$ is large. If we exploit a factorization of $p$ w.r.t. a junction tree $\mathcal{T}$ for a decomposable class $\mathcal{C} \supseteq \mathcal{A}$,
$$p(x) = \frac{\prod_{C \in \mathcal{C}} \phi_C(x_C)}{\prod_{S \in \mathcal{S}} \phi_S(x_S)},$$
we can avoid scaling $p$ and only scale the corresponding factor $\phi_C$ with $a \subseteq C$:
$$(T_a \phi_C)(x_C) \leftarrow \phi_C(x_C) \, \frac{n_a(x_a)}{n \, p_a(x_a)}, \quad x_C \in \mathcal{X}_C,$$
where $p_a$ is calculated by probability propagation.
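One proportional-scaling step applied to a single clique factor might look as follows; the `n_a`, `p_a`, and table representations are assumptions of this sketch, with `p_a` supplied by probability propagation as described above.

```python
def ips_update(phi_C, vars_C, a_vars, n_a, n, p_a):
    """One proportional-scaling step T_a applied to the clique factor phi_C
    with a ⊆ C: multiply each row by n_a(x_a) / (n * p_a(x_a)).
    n_a and p_a are dicts keyed by value tuples over a_vars; n is the
    total count."""
    idx = [vars_C.index(v) for v in a_vars]
    out = {}
    for xs, val in phi_C.items():
        key = tuple(xs[i] for i in idx)
        denom = n * p_a[key]
        out[xs] = val * (n_a[key] / denom) if denom > 0 else 0.0
    return out
```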

The scaling can now be made by changing the $\phi$'s:
$$\phi_B \leftarrow \phi_B \text{ for } B \neq C, \qquad \phi_C \leftarrow T_a \phi_C.$$
This can reduce the complexity considerably. Note that if $a = C$ and $\phi_a = n_a(x_a)$, then $T_a \phi_a = \phi_a$. Hence the explicit formula for the MLE.