Graphical Models. Lecture 12: Belief Update Message Passing. Andrew McCallum


Graphical Models. Lecture 12: Belief Update Message Passing. Andrew McCallum (mccallum@cs.umass.edu). Thanks to Noah Smith and Carlos Guestrin for slide materials.

Today's Plan. Quick review: sum-product message passing (also known as "Shafer-Shenoy"). Today: sum-product-divide message passing (also known as "belief update message passing," "Lauritzen-Spiegelhalter," and "belief propagation"). The two are mathematically equivalent, but carry different intuitions. Moving toward approximate inference.

Quick Review. [Figure: the flu example over variables {H, R, S, A, F} (Headache, Runny Nose, Sinus Infection, Allergy, Flu); the original factors ϕ_F, ϕ_A, ϕ_FAS, ϕ_SR, ϕ_SH are assigned to the cliques SAF, SR, HS of a clique tree, giving initial potentials ν, intermediate factors ψ, and messages τ_S over the sepsets.] Each clique's initial potential is the product of the factors assigned to it: ν_j(C_j) = ∏_{ϕ ∈ Φ_j} ϕ.

Message Passing (One Root). Input: clique tree T, factors Φ, root C_r. For each clique C_i, calculate ν_i. While C_r is still waiting on incoming messages: choose a C_i that has received all of its incoming messages, then calculate and send the message from C_i to C_upstream-neighbor(i): δ_{i→j} = Σ_{C_i \ S_{i,j}} ν_i ∏_{k ∈ Neighbors_i \ {j}} δ_{k→i}. At the root, β_r = ν_r ∏_{k ∈ Neighbors_r} δ_{k→r} = Σ_{X \ C_r} ∏_{ϕ ∈ Φ} ϕ = Z · P(C_r).
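
To make the upward pass concrete, here is a minimal NumPy sketch, assuming binary variables and the three-clique chain C1 = {A, B}, C2 = {B, C}, C3 = {C, D} that appears later in the lecture; the potentials are made-up illustrative numbers, not the lecture's flu example.

```python
# Minimal sketch of the upward pass toward a single root, on the chain
# C1={A,B} -- C2={B,C} -- C3={C,D} with binary variables; the initial
# potentials are illustrative numbers.
import numpy as np

rng = np.random.default_rng(0)
nu1 = rng.random((2, 2))   # nu_1(A, B)
nu2 = rng.random((2, 2))   # nu_2(B, C)
nu3 = rng.random((2, 2))   # nu_3(C, D)

# Root the tree at C2; each leaf sends one message over its sepset.
delta_1_to_2 = nu1.sum(axis=0)   # sum out A, leaving S_{1,2} = {B}
delta_3_to_2 = nu3.sum(axis=1)   # sum out D, leaving S_{2,3} = {C}

# beta_r = nu_r * (product of incoming messages) = Z * P(C_r)
beta2 = nu2 * delta_1_to_2[:, None] * delta_3_to_2[None, :]
print(beta2 / beta2.sum())       # normalized marginal P(B, C)
```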

Different Roots; Same Messages. [Figure: the same clique tree over SR, HS, and SAF shown with different choices of root; whichever clique is the root, the messages δ_{SR→SAF}, δ_{HS→SAF}, δ_{SAF→SR}, δ_{SAF→HS} and the resulting beliefs β_SR, β_HS, β_SAF are the same.]

Sum-Product Message Passing. Each clique tree vertex C_i passes messages to each of its neighbors once it's ready to do so. At the end, for all C_i: β_i = ν_i ∏_{k ∈ Neighbors_i} δ_{k→i}. This is the unnormalized marginal for C_i.

Calibrated Clique Tree. Two adjacent cliques C_i and C_j are calibrated when they agree on their sepset: Σ_{C_i \ S_{i,j}} β_i = Σ_{C_j \ S_{i,j}} β_j = µ_{i,j}(S_{i,j}). [Figure: adjacent cliques C_i and C_j with beliefs β_i and β_j, joined by the sepset S_{i,j} = C_i ∩ C_j carrying the sepset belief µ_{i,j}.]
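
A quick numerical illustration of the definition: a sketch with made-up, already-calibrated beliefs, assuming binary variables and cliques C_i = {A, B}, C_j = {B, C} with sepset {B}.

```python
# Sketch: checking that two adjacent cliques are calibrated, i.e. agree on
# their sepset marginal. Cliques C_i={A,B} and C_j={B,C}, sepset S_{i,j}={B};
# the belief values are illustrative.
import numpy as np

beta_i = np.array([[2.0, 1.0],    # beta_i(A, B)
                   [4.0, 3.0]])
beta_j = np.array([[2.0, 4.0],    # beta_j(B, C)
                   [1.0, 3.0]])

mu_ij = beta_i.sum(axis=0)        # sepset belief mu_{i,j}(B) = [6., 4.]
assert np.allclose(beta_j.sum(axis=1), mu_ij)   # C_j agrees: calibrated
```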

Calibrated Clique Tree as a Distribution. The original (unnormalized) factor model and the calibrated clique tree represent the same (unnormalized) measure: ∏_{ϕ ∈ Φ} ϕ = ∏_{C ∈ Vertices(T)} β_C / ∏_{S ∈ Edges(T)} µ_S. The left-hand side is the unnormalized Gibbs distribution from the original factors Φ; on the right, the β_C are clique beliefs and the µ_S are sepset beliefs.

Inventory of Factors: original factors ϕ; initial potentials ν; messages δ; intermediate factors ψ (no longer explicit); clique beliefs β; sepset beliefs µ.

Inventory of Factors: original factors ϕ; initial potentials ν; messages δ; intermediate factors ψ (no longer explicit); clique beliefs β; sepset beliefs µ. The new algorithm collapses everything into beliefs!

Another Operation: Factor Division. Division is elementwise, with 0 / 0 defined to be 0; a / 0 is undefined when a > 0. Example: ϕ_3(A, B, C) = ϕ_1(A, B, C) / ϕ_2(B, C).

  A B C  ϕ_1(A,B,C)      B C  ϕ_2(B,C)      A B C  ϕ_3(A,B,C)
  0 0 0  7000            0 0  100           0 0 0  70
  0 0 1  30              0 1  1             0 0 1  30
  0 1 0  5               1 0  2             0 1 0  2.5
  0 1 1  500             1 1  100           0 1 1  5
  1 0 0  100                                1 0 0  1
  1 0 1  1                                  1 0 1  1
  1 1 0  10                                 1 1 0  5
  1 1 1  1000                               1 1 1  10
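
A small NumPy sketch of this operation applied to the tables above; the factor_divide helper is hypothetical, written here only to encode the 0/0 = 0 convention.

```python
# Sketch of factor division with the convention 0/0 = 0 (a/0 undefined for
# a > 0), applied to the tables above: phi_3 = phi_1 / phi_2.
import numpy as np

phi1 = np.array([[[7000., 30.], [ 5.,  500.]],
                 [[ 100.,  1.], [10., 1000.]]])   # phi_1[A, B, C]
phi2 = np.array([[100., 1.],
                 [  2., 100.]])                   # phi_2[B, C]

def factor_divide(num, den):
    """Elementwise division; wherever num == den == 0 the result is 0."""
    if np.any((den == 0) & (num != 0)):
        raise ZeroDivisionError("a / 0 with a > 0 is undefined")
    out = np.zeros_like(num)
    np.divide(num, den, out=out, where=(den != 0))
    return out

phi3 = factor_divide(phi1, phi2[None, :, :])      # phi_3[A, B, C]
print(phi3[0, 0, 0], phi3[0, 1, 0])               # 70.0 2.5
```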

Messages. When computing the message δ_{i→j} from i to j, we multiply together all incoming messages to i except the one from j to i, δ_{j→i}. Alternative: multiply in all messages, and divide out the one from j to i: β_i = ν_i ∏_{k ∈ Neighbors_i} δ_{k→i}, and δ_{i→j} = Σ_{C_i \ S_{i,j}} ν_i ∏_{k ∈ Neighbors_i \ {j}} δ_{k→i} = (Σ_{C_i \ S_{i,j}} β_i) / δ_{j→i}.
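
As a sanity check on this equivalence, a tiny NumPy sketch with illustrative numbers: a clique C_i = {A, B} with two neighbors sending messages over sepsets {B} and {A}.

```python
# Sketch: computing delta_{i->j} by leaving out delta_{j->i}, versus computing
# beta_i with everything multiplied in and dividing delta_{j->i} back out.
import numpy as np

rng = np.random.default_rng(2)
nu_i = rng.random((2, 2))        # nu_i(A, B)
delta_j_to_i = rng.random(2)     # over sepset S_{i,j} = {B}
delta_k_to_i = rng.random(2)     # over sepset S_{i,k} = {A}

# Multiply all incoming messages except delta_{j->i}, then sum out A:
direct = (nu_i * delta_k_to_i[:, None]).sum(axis=0)

# Multiply everything into beta_i, sum out A, then divide delta_{j->i} out:
beta_i = nu_i * delta_k_to_i[:, None] * delta_j_to_i[None, :]
via_divide = beta_i.sum(axis=0) / delta_j_to_i

assert np.allclose(direct, via_divide)
```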

Key Idea. We can forget the initial potentials ν. We do not need to calculate the messages δ explicitly. Store a partially calculated β on each vertex and a partially calculated µ on each edge; update whenever new information comes in.

A Single Belief Update. At any point in the algorithm: σ_{i→j} ← Σ_{C_i \ S_{i,j}} β_i; then β_j ← β_j · σ_{i→j} / µ_{i,j}; then µ_{i,j} ← σ_{i→j}.

Belief Update Message Passing. Maintain beliefs at each vertex (β) and edge (µ). Initialize each β_i to ν_i. Initialize each µ_{i,j} to 1. Pass belief update messages: σ_{i→j} ← Σ_{C_i \ S_{i,j}} β_i; β_j ← β_j · σ_{i→j} / µ_{i,j}; µ_{i,j} ← σ_{i→j}.
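
Putting the pieces together, here is a minimal sketch of the whole procedure on the three-clique chain from the next slide (C1 = {A, B}, C2 = {B, C}, C3 = {C, D}, binary variables, made-up initial potentials); the update helper and its axis bookkeeping are mine, not the lecture's notation.

```python
# Sketch of belief update (Lauritzen-Spiegelhalter) message passing on the
# chain C1={A,B} -- C2={B,C} -- C3={C,D}; beta initialized to nu, mu to 1.
import numpy as np

rng = np.random.default_rng(0)
beta = {1: rng.random((2, 2)),   # beta_1(A, B)  (initialized to nu_1)
        2: rng.random((2, 2)),   # beta_2(B, C)
        3: rng.random((2, 2))}   # beta_3(C, D)
mu = {(1, 2): np.ones(2),        # mu_{1,2}(B)
      (2, 3): np.ones(2)}        # mu_{2,3}(C)

def update(i, j, sum_axis, mult_axis):
    """One belief update message from clique i to clique j over their sepset."""
    sep = tuple(sorted((i, j)))
    sigma = beta[i].sum(axis=sum_axis)          # sigma_{i->j} = sum over C_i \ S_{i,j}
    shape = [1, 1]
    shape[mult_axis] = -1
    beta[j] = beta[j] * (sigma / mu[sep]).reshape(shape)   # beta_j *= sigma / mu
    mu[sep] = sigma                             # mu_{i,j} <- sigma_{i->j}

# One upward and one downward pass over the chain calibrates it.
update(1, 2, sum_axis=0, mult_axis=0)   # C1 -> C2 over B
update(3, 2, sum_axis=1, mult_axis=1)   # C3 -> C2 over C
update(2, 1, sum_axis=1, mult_axis=1)   # C2 -> C1 over B
update(2, 3, sum_axis=0, mult_axis=0)   # C2 -> C3 over C

# Calibration: adjacent cliques now agree on their sepset marginals.
assert np.allclose(beta[1].sum(axis=0), beta[2].sum(axis=1))
assert np.allclose(beta[2].sum(axis=0), beta[3].sum(axis=1))
```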

Three-Clique Example. [Figure: chain clique tree C_1 = {A, B} -- µ_{1,2} -- C_2 = {B, C} -- µ_{2,3} -- C_3 = {C, D}, with beliefs β_1, β_2, β_3, to which the belief update σ_{i→j} ← Σ_{C_i \ S_{i,j}} β_i; β_j ← β_j · σ_{i→j} / µ_{i,j}; µ_{i,j} ← σ_{i→j} is applied edge by edge.]

Worries. Does the order of the messages matter? What if we pass the same message twice? What if we pass a message based on partial information?

Claims. At convergence, we will have a calibrated clique tree: Σ_{C_i \ S_{i,j}} β_i = Σ_{C_j \ S_{i,j}} β_j = µ_{i,j}(S_{i,j}). Invariant throughout the algorithm: ∏_{ϕ ∈ Φ} ϕ = ∏_{C ∈ Vertices(T)} ν_C = ∏_{C ∈ Vertices(T)} β_C / ∏_{S ∈ Edges(T)} µ_S.

Equivalence. Sum-product message passing and sum-product-divide message passing lead to the same result: a calibrated clique tree. SumProduct lets you calculate beliefs at the very end. BeliefUpdate has beliefs from the start, and keeps them around the whole time.

Complexity. Linear in the number of cliques and the total size of all factors. Convergence can be accomplished in the same upward/downward passes we used for the earlier version.

Dealing with Evidence

Advantage: Incremental Updates. Naively, if we have evidence, we can alter the initial potentials at the start, then calibrate using message passing. Better: think of evidence as a newly arrived factor into some clique tree node(s).

Example. [Figure: the three-clique chain C_1 = {A, B} -- µ_{1,2} -- C_2 = {B, C} -- µ_{2,3} -- C_3 = {C, D} with beliefs β_1, β_2, β_3; the evidence D = 1 arrives as a new factor at C_3, and belief update messages σ_{i→j} ← Σ_{C_i \ S_{i,j}} β_i; β_j ← β_j · σ_{i→j} / µ_{i,j}; µ_{i,j} ← σ_{i→j} propagate it outward.]

Advantage: Incremental Updates. Naively, if we have evidence, we can alter the initial potentials at the start, then calibrate using message passing. Better: think of evidence as a newly arrived factor into some clique tree node i. Recalibrate: pass messages out from node i. Single pass! Retraction: we can't recover anything multiplied by zero.
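
A sketch of the incremental update on a two-clique fragment C2 = {B, C}, C3 = {C, D} with illustrative, already-calibrated numbers: the evidence D = 1 is multiplied into C3 as an indicator factor, and a single outward message recalibrates the edge.

```python
# Sketch: evidence as a newly arrived factor. Multiply the indicator 1[D=1]
# into the clique containing D, then pass belief update messages outward.
import numpy as np

beta2 = np.array([[1., 3.], [2., 4.]])        # beta_2(B, C)
beta3 = np.array([[2., 1.], [5., 2.]])        # beta_3(C, D)
mu23  = beta3.sum(axis=1)                     # mu_{2,3}(C) = [3., 7.]
assert np.allclose(beta2.sum(axis=0), mu23)   # calibrated before evidence

beta3 = beta3 * np.array([0., 1.])[None, :]   # multiply in the indicator 1[D=1]

# One belief update message C3 -> C2 over the sepset {C}:
sigma = beta3.sum(axis=1)                     # sigma_{3->2}(C)
beta2 = beta2 * (sigma / mu23)[None, :]       # beta_2 *= sigma / mu_{2,3}
mu23  = sigma

print(beta2 / beta2.sum())                    # proportional to P(B, C | D=1)
```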

Queries across Cliques

Advantage: Queries across Cliques. Naively: enforce that all query variables are in some clique. Every query might need its own clique tree! Better: variable elimination in a calibrated clique tree. Bonus: only have to use a subtree that includes all query variables.

Multi-Clique Queries. Find a subtree of T that includes all query variables Q. Call it T' and its scope S. Pick a root node r in T'. Run variable elimination of S \ Q with the factors (for all i in T'): ϕ_i = β_i / µ_{i,upstream(i)} (the root r has no upstream edge, so ϕ_r = β_r).
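
For instance, for the query on the next slide, Z · P(B, D), the subtree {C_2, C_3} suffices. A sketch with illustrative calibrated beliefs (binary variables, the same chain as above):

```python
# Sketch: answer Z * P(B, D) by variable elimination over the subtree {C2, C3}
# of a calibrated chain, using the factors beta_2 (the root) and beta_3 / mu_{2,3}.
import numpy as np

beta2 = np.array([[1., 3.], [2., 4.]])   # beta_2(B, C)
beta3 = np.array([[2., 1.], [5., 2.]])   # beta_3(C, D)
mu23  = np.array([3., 7.])               # mu_{2,3}(C)

phi3 = beta3 / mu23[:, None]                  # beta_3 / mu_{2,3}
ZP_BD = np.einsum('bc,cd->bd', beta2, phi3)   # eliminate C
print(ZP_BD)
```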

Example: Z · P(B, D). [Figure: the chain C_1 = {A, B} -- µ_{1,2} -- C_2 = {B, C} -- µ_{2,3} -- C_3 = {C, D} with beliefs β_1, β_2, β_3 and evidence D = 1 at C_3; the query Z · P(B, D) only needs the subtree {C_2, C_3}.]

Advantage: Multiple Queries. Suppose we want the marginal for every pair of variables X, Y. Naïve: construct a clique tree in which every pair of variables appears together in some clique. (Very bad.) Naïve: run VE n-choose-2 times. Better: dynamic programming.

Dynamic Programming for All Pairs. Construct a table so that A_{i,j} contains U(C_i, C_j) = Z · P(C_i, C_j). Base case: C_i and C_j are neighboring cliques: A_{i,j} = U(C_i, C_j) = U(C_j | C_i) · U(C_i) = (β_j / µ_{i,j}) · β_i. Proceed to more distant pairs recursively. [Figure: cliques C_i, C_l, C_j along a path in the tree.]

Dynamic Programming for All Pairs. C_i and C_j are independent given C_l, and we already have U(C_i, C_l) and U(C_l, C_j). Then A_{i,j} = U(C_i, C_j) = Σ_{C_l \ C_j} U(C_i, C_l) · U(C_j | C_l) = Σ_{C_l \ C_j} A_{i,l} · β_j / µ_{j,l}. [Figure: cliques C_i, C_l, C_j along a path in the tree.]
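
One concrete (and deliberately simplified) reading of this recursion, sketched on the three-clique chain and tracking one variable per clique: the base step gives Z · P(A, C) for the adjacent pair, and the recursive step reuses it to get Z · P(A, D) for the distant pair. The beliefs and sepsets are illustrative numbers standing in for a calibrated tree.

```python
# Sketch of the all-pairs dynamic program on C1={A,B} -- C2={B,C} -- C3={C,D}:
# reuse the partial answer for the nearer pair (A, C) when extending to the
# farther pair (A, D).
import numpy as np

rng = np.random.default_rng(1)
beta1, beta2, beta3 = rng.random((2, 2)), rng.random((2, 2)), rng.random((2, 2))
mu12, mu23 = rng.random(2), rng.random(2)

# Base case (adjacent cliques): Z*P(A, C) = sum_B beta_1(A,B) beta_2(B,C) / mu_{1,2}(B)
ZP_AC = np.einsum('ab,bc,b->ac', beta1, beta2, 1.0 / mu12)

# Recursive step (one clique farther): extend by beta_3 / mu_{2,3}, sum out C.
# Z*P(A, D) = sum_C Z*P(A, C) * beta_3(C, D) / mu_{2,3}(C)
ZP_AD = np.einsum('ac,cd,c->ad', ZP_AC, beta3, 1.0 / mu23)

# Check against brute-force marginalization of the clique tree measure
# beta_1 * beta_2 * beta_3 / (mu_{1,2} * mu_{2,3}).
brute = np.einsum('ab,bc,cd,b,c->ad', beta1, beta2, beta3, 1.0 / mu12, 1.0 / mu23)
assert np.allclose(ZP_AD, brute)
```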

Pros and Cons: Message Passing in Clique Trees. Multiple queries. Incremental updates. The calibration operation has transparent complexity. But: complexity can be high (space!); slower than VE for a single query; local factor structure is lost.

Summary: Message Passing in Clique Trees. How to construct a clique tree (from a VE elimination order or a triangulated chordal graph). Marginal queries for all variables solved in only twice the time of one query! Belief update version: clique potentials are reparameterized so that the clique tree invariant always holds. Runtime is linear in the number of cliques, exponential in the size of the largest clique (number of variables; induced width).