Inference in Graphical Models: Variable Elimination and Message Passing Algorithm


Inference in Graphical Models: Variable Elimination and Message Passing Algorithm. Le Song, Machine Learning II: Advanced Topics, CSE 8803ML, Spring 2012.

Conditional Independence Assumptions. Local Markov Assumption: X ⊥ NonDescendants_X | Pa_X. Global Markov Assumption: A ⊥ C | B whenever sep_G(A; C | B). Relating the model classes: a BN can be converted to an MN by moralization, and an MN to a BN by triangulation; undirected trees and undirected chordal graphs are representable in both.

Distribution Factorization. Bayesian Networks (Directed Graphical Models): if G is an I-map of P, i.e. I_ℓ(G) ⊆ I(P), then P(X_1, ..., X_n) = ∏_{i=1}^n P(X_i | Pa_{X_i}), with conditional probability tables (CPTs) as local factors. Markov Networks (Undirected Graphical Models): for strictly positive P, if G is an I-map of P, i.e. I(G) ⊆ I(P), then P(X_1, ..., X_n) = (1/Z) ∏_{i=1}^m Ψ_i(D_i), where the Ψ_i(D_i) are clique potentials over maximal cliques D_i and Z = Σ_{x_1, ..., x_n} ∏_{i=1}^m Ψ_i(D_i) is the normalization constant (partition function).

Inference in Graphical Models. Graphical models give compact representations of probability distributions P(X_1, ..., X_n) (reducing n-way tables to much smaller tables). How do we answer queries about P? We use "inference" as the name for the process of computing answers to such queries.

Query Type 1: Likelihood. Most queries involve evidence. Evidence e is an assignment of values to a set E of variables, i.e. observations on some of the variables. Without loss of generality, E = {X_{k+1}, ..., X_n}. The simplest query computes the probability of the evidence by summing over the remaining variables: P(e) = Σ_{x_1} ... Σ_{x_k} P(x_1, ..., x_k, e). This is often referred to as computing the likelihood of e.
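To make the likelihood query concrete, here is a minimal sketch (not from the slides): a small joint table stored in numpy, with the hidden variables summed out. The table values and variable names are made up for illustration.

```python
# Minimal sketch: computing the likelihood of evidence P(e) by summing a small
# joint table P(X1, X2, E) over the hidden variables X1, X2.
import numpy as np

# Hypothetical joint over three binary variables, axes ordered (X1, X2, E).
joint = np.array([[[0.10, 0.05],
                   [0.15, 0.10]],
                  [[0.20, 0.05],
                   [0.25, 0.10]]])
assert np.isclose(joint.sum(), 1.0)

e = 1                                  # observed value of the evidence variable E
likelihood = joint[:, :, e].sum()      # P(e) = sum_{x1, x2} P(x1, x2, e)
print(likelihood)                      # 0.30
```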

Query Type 2: Conditional Probability. Often we are interested in the conditional probability distribution of a variable given the evidence: P(X | e) = P(X, e) / P(e) = P(X, e) / Σ_x P(X = x, e). This is also called the a posteriori belief in X given evidence e. We usually query a subset Y of all variables, X = {Y, Z, e}, and do not care about the remaining variables Z: P(Y | e) = Σ_z P(Y, Z = z | e), which takes all possible configurations of Z into account. The process of summing out the unwanted variables Z is called marginalization.

Query Type 2: Conditional Probability, Example. (Figure: the network is split into the evidence variables E, the variables whose conditionals we are interested in, and the remaining variables, which are summed over.)

Applications of a Posteriori Belief. Prediction: what is the probability of an outcome given the starting condition? Here the query node is a descendant of the evidence. Diagnosis: what is the probability of a disease/fault given the symptoms? Here the query node is an ancestor of the evidence. Learning under partial observation: fill in the unobserved variables. Information can flow in either direction, and inference can combine evidence from all parts of the network.

Query Type 3: Most Probable Assignment. We want to find the most probable joint assignment for some variables of interest. Such reasoning is usually performed under given evidence e, ignoring (the values of) the other variables Z. This is also called the maximum a posteriori (MAP) assignment for Y: MAP(Y | e) = argmax_y P(Y | e) = argmax_y Σ_z P(Y, Z = z | e).

Applications of the MAP Assignment. Classification: find the most likely label, given the evidence. Explanation: what is the most likely scenario, given the evidence? Cautionary note: the MAP assignment of a variable depends on its context, i.e. the set of variables being jointly queried. Example: with the joint P(X, Y) given by P(0,0) = 0.35, P(0,1) = 0.05, P(1,0) = 0.3, P(1,1) = 0.3, the MAP of (X, Y) is (0, 0); but the marginal is P(X=0) = 0.4, P(X=1) = 0.6, so the MAP of X alone is 1.
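The cautionary example can be checked directly. The short sketch below (not part of the lecture) reproduces the slide's table in numpy and shows that the joint MAP and the marginal MAP disagree.

```python
# The joint MAP of (X, Y) is (0, 0), but the MAP of X after marginalizing out Y is 1.
import numpy as np

p_xy = np.array([[0.35, 0.05],    # rows: X = 0, 1; columns: Y = 0, 1
                 [0.30, 0.30]])

joint_map = np.unravel_index(p_xy.argmax(), p_xy.shape)
p_x = p_xy.sum(axis=1)            # marginal P(X) = [0.4, 0.6]

print(joint_map)                  # (0, 0)
print(p_x.argmax())               # 1
```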

Complexity of Inference. Computing the a posteriori belief P(X | e) in a GM is NP-hard in general. Hardness implies that we cannot find a general procedure that works efficiently for arbitrary GMs. For particular families of GMs we have provably efficient procedures, e.g. trees. For other families of GMs we need to design efficient approximate inference algorithms, e.g. grids.

Approaches to Inference. Exact inference algorithms: the variable elimination algorithm; message-passing algorithms (sum-product, belief propagation); the junction tree algorithm. Approximate inference algorithms: sampling methods / stochastic simulation; variational algorithms.

Marginalization and Elimination. A metabolic pathway: what is the likelihood that protein E is produced? The model is the chain A → B → C → D → E. Query: P(E) = Σ_d Σ_c Σ_b Σ_a P(a, b, c, d, E). Using the graphical model, we get P(E) = Σ_d Σ_c Σ_b Σ_a P(a) P(b|a) P(c|b) P(d|c) P(E|d). Naïve summation needs to enumerate an exponential number of terms.
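A minimal sketch of this naïve summation, assuming made-up binary CPTs (the tables and the helper `random_cpt` are illustrative, not from the lecture); the point is only that the loop enumerates every joint configuration of the eliminated variables.

```python
# Naive summation for the chain A -> B -> C -> D -> E:
# P(E) = sum_{a,b,c,d} P(a) P(b|a) P(c|b) P(d|c) P(E|d).
import itertools
import numpy as np

rng = np.random.default_rng(0)

def random_cpt():
    """Random conditional table t[child, parent] with columns summing to 1."""
    t = rng.random((2, 2))
    return t / t.sum(axis=0, keepdims=True)

p_a = np.array([0.6, 0.4])
p_b_a, p_c_b, p_d_c, p_e_d = (random_cpt() for _ in range(4))

p_e = np.zeros(2)
for a, b, c, d in itertools.product(range(2), repeat=4):   # 2^4 terms per value of E
    p_e += p_a[a] * p_b_a[b, a] * p_c_b[c, b] * p_d_c[d, c] * p_e_d[:, d]
print(p_e, p_e.sum())   # a valid distribution over E (sums to 1)
```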

Elimination in Chains (A → B → C → D → E). Rearranging the terms and the summations: P(E) = Σ_d Σ_c Σ_b Σ_a P(a) P(b|a) P(c|b) P(d|c) P(E|d) = Σ_d Σ_c Σ_b P(c|b) P(d|c) P(E|d) Σ_a P(a) P(b|a).

Elimination in Chains (cont.). Now we can perform the innermost summation efficiently: P(E) = Σ_d Σ_c Σ_b P(c|b) P(d|c) P(E|d) Σ_a P(a) P(b|a) = Σ_d Σ_c Σ_b P(c|b) P(d|c) P(E|d) P(b). The innermost summation eliminates one variable from the summation at a local cost; it is equivalent to a matrix-vector multiplication, with cost |Val(A)| × |Val(B)|.

Elimination in Chains (cont.). Rearranging and then summing again, we get P(E) = Σ_d Σ_c Σ_b P(c|b) P(d|c) P(E|d) P(b) = Σ_d Σ_c P(d|c) P(E|d) Σ_b P(c|b) P(b) = Σ_d Σ_c P(d|c) P(E|d) P(c). Again this is equivalent to a matrix-vector multiplication, with cost |Val(B)| × |Val(C)|. (The slide's numeric example: with P(C=0|B=0) = 0.15, P(C=0|B=1) = 0.35, P(C=1|B=0) = 0.85, P(C=1|B=1) = 0.65 and P(B=0) = 0.25, P(B=1) = 0.75, this gives P(C=0) = 0.3, P(C=1) = 0.7.)

Elimination in Chains (cont.). Eliminating nodes one by one all the way to the end leaves P(E) = Σ_d P(E|d) P(d). Computational complexity for a chain of length k: each step Ψ(X_i) = Σ_{x_{i-1}} P(X_i | x_{i-1}) P(x_{i-1}) costs O(|Val(X_{i-1})| · |Val(X_i)|) operations, giving O(k n²) in total when each variable has n values. Compare this to naïve summation, Σ_{x_1} ... Σ_{x_{k-1}} P(x_1, ..., X_k), which is O(n^k).
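A small sketch of chain elimination viewed as repeated matrix-vector products, assuming each CPT P(X_i | X_{i-1}) is stored as a matrix indexed [x_i, x_{i-1}]; the function name is hypothetical, and the tiny numeric example reuses the P(B) and P(C|B) values quoted above.

```python
# Each elimination step is a single matrix-vector multiply: O(n^2) per step
# instead of O(n^k) for naive enumeration over the whole chain.
import numpy as np

def eliminate_chain(p_first, cpts):
    """p_first: vector P(X_1); cpts: list of matrices P(X_i | X_{i-1})."""
    belief = p_first
    for cpt in cpts:             # Psi(X_i) = sum_{x_{i-1}} P(X_i | x_{i-1}) P(x_{i-1})
        belief = cpt @ belief
    return belief                # marginal of the last variable in the chain

# Tiny example reusing the slide's numbers for P(B) and P(C | B).
p_b = np.array([0.25, 0.75])
p_c_b = np.array([[0.15, 0.35],
                  [0.85, 0.65]])
print(eliminate_chain(p_b, [p_c_b]))   # [0.3, 0.7]
```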

Undirected Chains (A – B – C – D – E). Rearranging terms and performing local summations: P(E) = Σ_d Σ_c Σ_b Σ_a (1/Z) Ψ(b, a) Ψ(c, b) Ψ(d, c) Ψ(E, d) = (1/Z) Σ_d Σ_c Σ_b Ψ(c, b) Ψ(d, c) Ψ(E, d) Σ_a Ψ(b, a) = (1/Z) Σ_d Σ_c Σ_b Ψ(c, b) Ψ(d, c) Ψ(E, d) Ψ(b).

The Sum-Product Operation. During inference we try to compute an expression in sum-product form: Σ_Z ∏_{Ψ ∈ F} Ψ, where X = {X_1, ..., X_n} is the set of variables, F is a set of factors such that for each Ψ ∈ F, Scope(Ψ) ⊆ X, Y ⊆ X is the set of query variables, and Z = X − Y are the variables to eliminate. The result of eliminating the variables in Z is a factor τ(Y) = Σ_z ∏_{Ψ ∈ F} Ψ. This factor does not necessarily correspond to any probability or conditional probability in the network; the query is answered by normalizing, P(Y) = τ(Y) / Σ_y τ(y).
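As a concrete illustration of the sum-product operation, the sketch below eliminates Z = {B} from two made-up factors using numpy's einsum; the factor values are arbitrary and are not from the lecture.

```python
# Eliminate Z = {B} from Psi1(A, B) and Psi2(B, C), leaving tau(A, C), then normalize.
import numpy as np

psi1 = np.array([[0.5, 1.0],     # Psi1(A, B), axes (A, B)
                 [2.0, 0.5]])
psi2 = np.array([[1.0, 3.0],     # Psi2(B, C), axes (B, C)
                 [2.0, 1.0]])

tau = np.einsum('ab,bc->ac', psi1, psi2)   # tau(A, C) = sum_b Psi1(a, b) Psi2(b, c)
p_ac = tau / tau.sum()                     # P(A, C) = tau / sum_y tau(y)
print(tau)
print(p_ac)
```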

Inference via Variable Elimination: General Idea. Write the query in the form P(X_1, e) = Σ_{x_n} ... Σ_{x_3} Σ_{x_2} ∏_i P(x_i | Pa_{X_i}); the ordering of the sums suggests an elimination order. Then iteratively: move all irrelevant terms outside of the innermost sum; perform the innermost sum, producing a new term; insert the new term into the product. Finally renormalize: P(X_1 | e) = τ(X_1, e) / Σ_{x_1} τ(x_1, e).
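This general idea can be written as a short, self-contained elimination loop. The sketch below is an assumption-laden illustration, not the lecture's code: factors are (variable list, numpy table) pairs, there are at most 26 distinct variable names, and the helper names are hypothetical.

```python
# Generic variable elimination: to eliminate a variable, multiply all factors that
# mention it, sum it out, and put the new factor back into the pool.
import string
import numpy as np

def multiply_and_sum_out(factors, var):
    """Multiply all factors containing `var`, sum out `var`, return the new pool."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    all_vars = sorted({v for vs, _ in touching for v in vs})
    out_vars = [v for v in all_vars if v != var]
    letters = {v: string.ascii_lowercase[i] for i, v in enumerate(all_vars)}
    spec = ','.join(''.join(letters[v] for v in vs) for vs, _ in touching)
    spec += '->' + ''.join(letters[v] for v in out_vars)
    tau = np.einsum(spec, *[t for _, t in touching])
    return rest + [(out_vars, tau)]

def variable_elimination(factors, elim_order):
    for var in elim_order:
        factors = multiply_and_sum_out(factors, var)
    # Multiply whatever is left and renormalize.
    final_vars = sorted({v for vs, _ in factors for v in vs})
    letters = {v: string.ascii_lowercase[i] for i, v in enumerate(final_vars)}
    spec = ','.join(''.join(letters[v] for v in vs) for vs, _ in factors)
    spec += '->' + ''.join(letters[v] for v in final_vars)
    tau = np.einsum(spec, *[t for _, t in factors])
    return final_vars, tau / tau.sum()

# Tiny chain A -> B -> C, query P(C): eliminate A, then B.
p_a = (['A'], np.array([0.3, 0.7]))
p_b_a = (['A', 'B'], np.array([[0.9, 0.1], [0.2, 0.8]]))   # rows: a, cols: b
p_c_b = (['B', 'C'], np.array([[0.5, 0.5], [0.1, 0.9]]))   # rows: b, cols: c
print(variable_elimination([p_a, p_b_a, p_c_b], ['A', 'B']))   # (['C'], [0.264, 0.736])
```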

A more complex network: a food web over variables A, B, C, D, E, F, G, H. What is the probability P(A | H), e.g. that hawks are leaving given that the grass condition is poor?

Example: Variable Elimination. Query: P(A | h); we need to eliminate B, C, D, E, F, G, H. Initial factors: P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) P(g|e) P(h|e,f). Choose an elimination order: H, G, F, E, D, C, B (with the query variable A last). Step 1: Eliminate H by conditioning (fix the evidence node at its observed value): m_h(e, f) = P(H = h | e, f).
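The conditioning step can be illustrated with a tiny sketch: fixing H at its observed value simply slices the CPT P(H | E, F), leaving a factor over (E, F). The table below is made up for illustration.

```python
# Conditioning on evidence H = h turns P(H | E, F) into the factor m_h(E, F).
import numpy as np

p_h_ef = np.array([[[0.9, 0.4],      # P(H=0 | E=e, F=f)
                    [0.7, 0.2]],
                   [[0.1, 0.6],      # P(H=1 | E=e, F=f)
                    [0.3, 0.8]]])    # axes ordered (H, E, F)

h_observed = 1
m_h = p_h_ef[h_observed]             # m_h(e, f) = P(H = 1 | e, f), a 2x2 table over (E, F)
print(m_h)
```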

Example: Variable Elimination. Query: P(A | h); remaining to eliminate: B, C, D, E, F, G. Current factors: P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) P(g|e) m_h(e, f). Step 2: Eliminate G. Compute m_g(e) = Σ_g P(g|e) = 1, giving P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) m_g(e) m_h(e, f) = P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) m_h(e, f).

Example: Variable Elimination. Query: P(A | h); remaining to eliminate: B, C, D, E, F. Current factors: P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) m_h(e, f). Step 3: Eliminate F. Compute m_f(e, a) = Σ_f P(f|a) m_h(e, f), giving P(a) P(b) P(c|b) P(d|a) P(e|c,d) m_f(e, a).

Example: Variable Elimination. Query: P(A | h); remaining to eliminate: B, C, D, E. Current factors: P(a) P(b) P(c|b) P(d|a) P(e|c,d) m_f(e, a). Step 4: Eliminate E. Compute m_e(a, c, d) = Σ_e P(e|c,d) m_f(a, e), giving P(a) P(b) P(c|b) P(d|a) m_e(a, c, d).

Example: Variable Elimination. Query: P(A | h); remaining to eliminate: B, C, D. Current factors: P(a) P(b) P(c|b) P(d|a) m_e(a, c, d). Step 5: Eliminate D. Compute m_d(a, c) = Σ_d P(d|a) m_e(a, c, d), giving P(a) P(b) P(c|b) m_d(a, c).

Example: Variable Elimination. Query: P(A | h); remaining to eliminate: B, C. Current factors: P(a) P(b) P(c|b) m_d(a, c). Step 6: Eliminate C. Compute m_c(a, b) = Σ_c P(c|b) m_d(a, c), giving P(a) P(b) m_c(a, b).

Example: Variable Elimination. Query: P(A | h); remaining to eliminate: B. Current factors: P(a) P(b) m_c(a, b). Step 7: Eliminate B. Compute m_b(a) = Σ_b P(b) m_c(a, b), giving P(a) m_b(a).

Example: Variable Elimination. Query: P(A | h); we only need to renormalize over A. Current factors: P(a) m_b(a). Final step: renormalize. P(a, h) = P(a) m_b(a), and P(h) = Σ_a P(a) m_b(a), so P(a | h) = P(a) m_b(a) / Σ_a P(a) m_b(a).
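The whole food-web elimination can be run end to end. The sketch below mirrors the messages m_h, m_g, m_f, m_e, m_d, m_c, m_b from the slides using einsum; the CPTs are randomly generated placeholders, so only the structure (not the numbers) follows the lecture's example.

```python
# Food-web elimination with order H, G, F, E, D, C, B (query variable A last).
import numpy as np

rng = np.random.default_rng(0)
def cpt(*shape):
    """Random table normalized over its first axis (the child variable)."""
    t = rng.random(shape)
    return t / t.sum(axis=0, keepdims=True)

p_a, p_b = cpt(2), cpt(2)
p_c_b, p_d_a, p_g_e = cpt(2, 2), cpt(2, 2), cpt(2, 2)
p_e_cd, p_f_a, p_h_ef = cpt(2, 2, 2), cpt(2, 2), cpt(2, 2, 2)

h = 1                                                       # observed evidence H = h
m_h = p_h_ef[h]                                             # conditioning: m_h(e, f)
m_g = p_g_e.sum(axis=0)                                     # m_g(e) = sum_g P(g|e) = 1
m_f = np.einsum('fa,ef->ea', p_f_a, m_h)                    # m_f(e, a)
m_e = np.einsum('ecd,ea->acd', p_e_cd, m_f * m_g[:, None])  # m_e(a, c, d)
m_d = np.einsum('da,acd->ac', p_d_a, m_e)                   # m_d(a, c)
m_c = np.einsum('cb,ac->ab', p_c_b, m_d)                    # m_c(a, b)
m_b = np.einsum('b,ab->a', p_b, m_c)                        # m_b(a)

p_a_given_h = p_a * m_b / (p_a * m_b).sum()                 # renormalize over A
print(p_a_given_h)
```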

Complexity of Variable Elimination. Suppose in one elimination step we compute m_x(y_1, ..., y_k) = Σ_x m'_x(x, y_1, ..., y_k), where m'_x(x, y_1, ..., y_k) = ∏_{i=1}^k m_i(x, y_{c_i}). This requires k · |Val(X)| · ∏_i |Val(Y_{c_i})| multiplications (for each value of x, y_1, ..., y_k we do k multiplications) and |Val(X)| · ∏_i |Val(Y_{c_i})| additions (for each value of y_1, ..., y_k we do |Val(X)| additions). The complexity is exponential in the number of variables in the intermediate factor.

From Variable Elimination to Message Passing. Recall that the dependencies induced during marginalization are captured by elimination cliques: each summation eliminates a variable, and the intermediate term it produces corresponds to an elimination clique. Can this lead to a generic inference algorithm?

Tree Graphical Models. Undirected tree: there is a unique path between any pair of nodes. Directed tree: all nodes except the root have exactly one parent.

Equivalence of Directed and Undirected Trees. Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it. A directed tree and the corresponding undirected tree make the same conditional independence assertions, and their parameterizations are essentially the same. Undirected tree: P(X) = (1/Z) ∏_{i ∈ V} Ψ(X_i) ∏_{(i,j) ∈ E} Ψ(X_i, X_j). Directed tree: P(X) = P(X_r) ∏_{(i,j) ∈ E} P(X_j | X_i). Equivalence: Ψ(X_r) = P(X_r), Ψ(X_i, X_j) = P(X_j | X_i), Z = 1, and Ψ(X_i) = 1 for i ≠ r.
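A small numeric check of this equivalence, with made-up tables: setting the potentials from the directed CPTs as above yields partition function Z = 1, so the undirected model reproduces the directed joint without renormalization.

```python
# Chain X_r -> X_1 -> X_2: Psi(X_r) = P(X_r), Psi(X_r, X_1) = P(X_1 | X_r),
# Psi(X_1, X_2) = P(X_2 | X_1), all other node potentials equal to 1.
import numpy as np

psi_r = np.array([0.4, 0.6])                       # = P(X_r)
psi_r1 = np.array([[0.7, 0.3], [0.2, 0.8]])        # = P(X_1 | X_r), rows indexed by x_r
psi_12 = np.array([[0.5, 0.5], [0.9, 0.1]])        # = P(X_2 | X_1), rows indexed by x_1

unnormalized = np.einsum('r,rx,xy->rxy', psi_r, psi_r1, psi_12)
Z = unnormalized.sum()
print(Z)                                           # 1.0: no renormalization needed
```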

From Variable Elimination to Message Passing. Recall the variable elimination algorithm: choose an ordering in which the query node f is the final node; eliminate node i by removing all potentials containing i and taking the sum of their product over x_i; place the resulting factor back into the pool. For a tree graphical model: choose the query node f as the root of the tree and view the tree as directed, with edges pointing towards f. The elimination of each node can then be viewed as message passing directly along the tree branches, rather than on some transformed graph. Thus we can use the tree itself as a data structure for inference.

Message Passing for Trees. Let m_{ji}(X_i) denote the factor resulting from eliminating all variables below j up to node i; it is a function of X_i: m_{ji}(X_i) = Σ_{x_j} Ψ(x_j) Ψ(X_i, x_j) ∏_{k ∈ N(j)\i} m_{kj}(x_j). This is like a message sent from j to i. At the query node f, P(x_f) ∝ Ψ(x_f) ∏_{e ∈ N(f)} m_{ef}(x_f), where m_{ef}(x_f) represents a belief about x_f coming from x_e.
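Finally, a sketch of these tree messages in code: a recursive sum-product pass toward the root on a small made-up tree with binary variables. The tree structure, potentials, and helper names are illustrative assumptions, not the lecture's implementation.

```python
# Sum-product message passing toward a root f on a small undirected tree,
# following m_{ji}(x_i) = sum_{x_j} Psi(x_j) Psi(x_i, x_j) prod_k m_{kj}(x_j).
import numpy as np

# Tree given as child lists, rooted at node 0 = the query node f.
children = {0: [1, 2], 1: [3], 2: [], 3: []}
psi_node = {i: np.ones(2) for i in children}            # Psi(x_i), here uniform
psi_edge = {                                            # Psi(x_parent, x_child)
    (0, 1): np.array([[2.0, 1.0], [1.0, 3.0]]),
    (0, 2): np.array([[1.0, 2.0], [2.0, 1.0]]),
    (1, 3): np.array([[4.0, 1.0], [1.0, 4.0]]),
}

def message(parent, child):
    """m_{child -> parent}(x_parent), computed recursively from the leaves up."""
    incoming = np.ones(2)
    for grandchild in children[child]:
        incoming *= message(child, grandchild)
    # sum over x_child of Psi(x_child) * Psi(x_parent, x_child) * incoming(x_child)
    return psi_edge[(parent, child)] @ (psi_node[child] * incoming)

belief = psi_node[0].copy()
for c in children[0]:
    belief *= message(0, c)
print(belief / belief.sum())        # P(x_f), the marginal at the root
```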