Inference in Graphical Models: Variable Elimination and Message Passing Algorithms
Le Song
Machine Learning II: Advanced Topics, CSE 8803ML, Spring 2012
Conditional Independence Assumptions
Local Markov assumption: X ⊥ Nondescendants_X | Pa_X.
Global Markov assumption: A ⊥ B | C whenever sep_G(A; B | C).
(Diagram: a BN is converted to an MN by moralization; an undirected graph is made chordal by triangulation; undirected trees are already chordal.)
Distribution Factorization
Bayesian networks (directed graphical models). If G is an I-map of P, I(G) ⊆ I(P), then
P(X_1, …, X_n) = ∏_{i=1}^n P(X_i | Pa_{X_i})
The local factors are conditional probability tables (CPTs).
Markov networks (undirected graphical models). For strictly positive P, if G is an I-map, I(G) ⊆ I(P), then
P(X_1, …, X_n) = (1/Z) ∏_{i=1}^m Ψ_i(D_i),  with  Z = Σ_{x_1, x_2, …, x_n} ∏_{i=1}^m Ψ_i(D_i)
Here the Ψ_i are clique potentials over maximal cliques D_i, and Z is the normalization constant (partition function).
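As a minimal sketch of the directed factorization, the following computes the joint of a hypothetical two-node network A → B from its CPTs (the numbers are made up for illustration):

```python
import itertools

# Hypothetical CPTs for a tiny network A -> B; the joint is the product
# P(a) * P(b | a), as in the directed factorization above.
P_A = {0: 0.4, 1: 0.6}                      # P(A)
P_B_given_A = {0: {0: 0.7, 1: 0.3},         # P(B | A = 0)
               1: {0: 0.2, 1: 0.8}}         # P(B | A = 1)

def joint(a, b):
    """Joint probability from the CPT product P(a) * P(b | a)."""
    return P_A[a] * P_B_given_A[a][b]

# The factorized joint is a valid distribution: it sums to 1.
total = sum(joint(a, b) for a, b in itertools.product([0, 1], repeat=2))
```

For larger networks the same product runs over every node's CPT, which is exactly what makes the representation compact relative to a full n-way table.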
Inference in Graphical Models
Graphical models give compact representations of probability distributions P(X_1, …, X_n), reducing n-way tables to much smaller tables.
How do we answer queries about P? We use "inference" as the name for the process of computing answers to such queries.
Query Type 1: Likelihood
Most queries involve evidence. Evidence e is an assignment of values to a set E of variables, i.e., observations on some variables. Without loss of generality, E = {X_{k+1}, …, X_n}.
The simplest query computes the probability of the evidence, summing over the remaining variables:
P(e) = Σ_{x_1} … Σ_{x_k} P(x_1, …, x_k, e)
This is often referred to as computing the likelihood of e.
Query Type 2: Conditional Probability
Often we are interested in the conditional probability distribution of a variable given the evidence:
P(X | e) = P(X, e) / P(e) = P(X, e) / Σ_x P(X = x, e)
This is also called the a posteriori belief in X given evidence e.
We usually query a subset Y of all variables, X = {Y, Z, E}, and do not care about the remaining variables Z:
P(Y | e) = Σ_z P(Y, Z = z | e)
This takes all possible configurations of Z into account; the process of summing out the unwanted variables Z is called marginalization.
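The two query types can be sketched by brute-force enumeration over a small hypothetical joint P(Y, Z, E) on binary variables (the table entries below are made up):

```python
import itertools

# Hypothetical joint distribution over binary (Y, Z, E), as a dictionary.
joint = {(0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.20, (0, 1, 1): 0.05,
         (1, 0, 0): 0.15, (1, 0, 1): 0.10, (1, 1, 0): 0.05, (1, 1, 1): 0.30}

def likelihood(e):
    """Query type 1, P(e): sum the joint over all non-evidence variables."""
    return sum(joint[(y, z, e)] for y, z in itertools.product([0, 1], repeat=2))

def posterior(e):
    """Query type 2, P(Y | e): marginalize out Z, then renormalize by P(e)."""
    p_e = likelihood(e)
    return {y: sum(joint[(y, z, e)] for z in [0, 1]) / p_e for y in [0, 1]}
```

Enumeration is exponential in the number of summed-out variables; the rest of the lecture is about avoiding exactly this cost.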
Query Type 2: Conditional Probability Example
(Figure: two example networks; in each, one set of variables E is summed over while we are interested in the conditionals for the remaining highlighted variables.)
Application of A Posteriori Belief
Prediction: what is the probability of an outcome given the starting condition? The query node is a descendant of the evidence.
Diagnosis: what is the probability of a disease/fault given symptoms? The query node is an ancestor of the evidence.
Learning under partial observations (filling in the unobserved variables).
Information can flow in either direction: inference can combine evidence from all parts of the network.
Query Type 3: Most Probable Assignment
We want to find the most probable joint assignment for some variables of interest. Such reasoning is usually performed under given evidence e, ignoring the values of the other variables Z. This is also called the maximum a posteriori (MAP) assignment for Y:
MAP(Y | e) = argmax_y P(y | e) = argmax_y Σ_z P(y, Z = z | e)
Application of the MAP Assignment
Classification: find the most likely label, given the evidence.
Explanation: what is the most likely scenario, given the evidence?
Cautionary note: the MAP assignment of a variable depends on its context, i.e., the set of variables being jointly queried.
Example: MAP of (X, Y)? (0, 0). MAP of X alone? 1.

  X  Y  P(X, Y)
  0  0  0.35
  0  1  0.05
  1  0  0.30
  1  1  0.30

  X  P(X)
  0  0.40
  1  0.60
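The cautionary note can be checked directly with the table from the slide: the jointly most probable (X, Y) differs from the most probable X under the marginal.

```python
# Joint table P(X, Y) from the slide.
P_XY = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}

# MAP of (X, Y): argmax over the joint table.
map_xy = max(P_XY, key=P_XY.get)            # (0, 0) with probability 0.35

# MAP of X alone: marginalize out Y first, then take the argmax.
P_X = {x: sum(p for (xx, y), p in P_XY.items() if xx == x) for x in [0, 1]}
map_x = max(P_X, key=P_X.get)               # 1, since P(X = 1) = 0.6
```

So marginalizing before maximizing can flip the answer: X = 1 has the larger marginal mass even though the single most likely joint configuration has X = 0.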
Complexity of Inference
Computing the a posteriori belief P(X | e) in a GM is NP-hard in general. Hardness implies we cannot find a general procedure that works efficiently for arbitrary GMs.
For particular families of GMs, we have provably efficient procedures, e.g. trees.
For other families of GMs, we need to design efficient approximate inference algorithms, e.g. grids.
Approaches to Inference
Exact inference algorithms:
  Variable elimination
  Message passing (the sum-product / belief propagation algorithm)
  The junction tree algorithm
Approximate inference algorithms:
  Sampling methods / stochastic simulation
  Variational algorithms
Marginalization and Elimination
A metabolic pathway: what is the likelihood that protein E is produced? Chain A → B → C → D → E.
Query: P(E).
P(E) = Σ_d Σ_c Σ_b Σ_a P(a, b, c, d, E)
Using the graphical model, we get
P(E) = Σ_d Σ_c Σ_b Σ_a P(a) P(b | a) P(c | b) P(d | c) P(E | d)
Naïve summation needs to enumerate an exponential number of terms.
Elimination in Chains
Chain A → B → C → D → E. Rearranging the terms and the summations:
P(E) = Σ_d Σ_c Σ_b Σ_a P(a) P(b | a) P(c | b) P(d | c) P(E | d)
     = Σ_d P(E | d) Σ_c P(d | c) Σ_b P(c | b) Σ_a P(a) P(b | a)
Elimination in Chains (cont.)
Now we can perform the innermost summation efficiently:
P(E) = Σ_d P(E | d) Σ_c P(d | c) Σ_b P(c | b) Σ_a P(a) P(b | a)
     = Σ_d P(E | d) Σ_c P(d | c) Σ_b P(c | b) P(b)
The innermost summation eliminates one variable (A) from the summation at a local cost. It is equivalent to a matrix-vector multiplication: |Val(A)| × |Val(B)| operations.
Elimination in Chains (cont.)
Rearranging and then summing again, we get
P(E) = Σ_d P(E | d) Σ_c P(d | c) Σ_b P(c | b) P(b)
     = Σ_d P(E | d) Σ_c P(d | c) P(c)
For example, with the tables

  P(c | b)  b=0   b=1        P(b)
  c=0       0.15  0.35       b=0  0.25
  c=1       0.85  0.65       b=1  0.75

the product gives P(c = 0) = 0.30 and P(c = 1) = 0.70.
This is again equivalent to a matrix-vector multiplication: |Val(B)| × |Val(C)| operations.
Elimination in Chains (cont.)
Eliminate nodes one by one all the way to the end:
P(E) = Σ_d P(E | d) P(d)
Computational complexity for a chain of length k: each step
Ψ(X_i) = Σ_{x_{i-1}} P(X_i | x_{i-1}) P(x_{i-1})
costs O(|Val(X_{i-1})| × |Val(X_i)|) operations, for O(k n²) in total when each variable takes n values.
Compare to naïve summation Σ_{x_1} … Σ_{x_{k-1}} P(x_1, …, X_k), which is O(n^k).
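The chain elimination above can be sketched as repeated matrix-vector products and compared against naïve enumeration; the CPTs below are randomly generated placeholders for a hypothetical binary chain A → B → C → D → E:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt():
    """Random 2x2 CPT; entry [i, j] = P(child = j | parent = i)."""
    t = rng.random((2, 2))
    return t / t.sum(axis=1, keepdims=True)

prior = np.array([0.3, 0.7])                      # P(A)
cpts = [random_cpt() for _ in range(4)]           # P(B|A), P(C|B), P(D|C), P(E|D)

# Variable elimination: each step is one matrix-vector product, O(n^2).
msg = prior
for cpt in cpts:
    msg = msg @ cpt                               # sums out one variable
p_e_elim = msg                                    # P(E)

# Naive summation over all 2^5 joint configurations, O(n^k).
p_e_naive = np.zeros(2)
for a in range(2):
    for b in range(2):
        for c in range(2):
            for d in range(2):
                for e in range(2):
                    p_e_naive[e] += (prior[a] * cpts[0][a, b] * cpts[1][b, c]
                                     * cpts[2][c, d] * cpts[3][d, e])
```

Both routes give the same P(E); only the operation count differs, which is the entire point of elimination.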
Undirected Chains
Undirected chain on A, B, C, D, E. Rearrange terms and perform the local summation:
P(E) = Σ_d Σ_c Σ_b Σ_a (1/Z) Ψ(b, a) Ψ(c, b) Ψ(d, c) Ψ(E, d)
     = (1/Z) Σ_d Σ_c Σ_b Ψ(c, b) Ψ(d, c) Ψ(E, d) Σ_a Ψ(b, a)
     = (1/Z) Σ_d Σ_c Σ_b Ψ(c, b) Ψ(d, c) Ψ(E, d) Ψ(b)
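The undirected case works the same way, except the partition function Z must also be computed; here is a minimal sketch with hypothetical random pairwise potentials on a binary chain A-B-C-D-E, checked against brute force:

```python
import numpy as np

rng = np.random.default_rng(1)
# Positive pairwise potentials; psis[i][x_prev, x_next] is the potential
# on the i-th edge of the chain.
psis = [rng.random((2, 2)) + 0.1 for _ in range(4)]

# Eliminate A, B, C, D in turn; each step sums out one variable.
msg = np.ones(2)
for psi in psis:
    msg = msg @ psi              # m(x_next) = sum_{x_prev} m(x_prev) psi(x_prev, x_next)
Z = msg.sum()                    # partition function
p_e = msg / Z                    # P(E)

# Brute-force check: unnormalized joint is the product of the potentials.
p_brute = np.zeros(2)
for a in range(2):
    for b in range(2):
        for c in range(2):
            for d in range(2):
                for e in range(2):
                    p_brute[e] += (psis[0][a, b] * psis[1][b, c]
                                   * psis[2][c, d] * psis[3][d, e])
p_brute /= p_brute.sum()
```

Note that the same forward pass that computes the unnormalized marginal also yields Z for free, by summing the final message.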
The Sum-Product Operation
During inference, we try to compute an expression in sum-product form:
Σ_Z ∏_{Ψ ∈ F} Ψ
where X = {X_1, …, X_n} is the set of all variables, F is a set of factors such that Scope(Ψ) ⊆ X for each Ψ ∈ F, Y ⊆ X is the set of query variables, and Z = X − Y are the variables to eliminate.
The result of eliminating the variables in Z is a factor
τ(Y) = Σ_z ∏_{Ψ ∈ F} Ψ
This factor does not necessarily correspond to any probability or conditional probability in the network; the query is recovered by renormalizing, P(Y) = τ(Y) / Σ_y τ(y).
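A generic sum-product step over table factors can be sketched with `numpy.einsum`, which aligns factors on shared variables, multiplies them, and sums out the eliminated ones (the factor names below are hypothetical):

```python
import numpy as np

def sum_product(factors, eliminate):
    """Return tau(Y) = sum_Z prod_F Psi as a (variables, table) pair.

    Each factor is a (variable-list, ndarray) pair, one array axis per
    variable in order.
    """
    all_vars = []
    for vs, _ in factors:
        for v in vs:
            if v not in all_vars:
                all_vars.append(v)
    keep = [v for v in all_vars if v not in eliminate]
    idx = {v: chr(ord('a') + i) for i, v in enumerate(all_vars)}
    spec = ','.join(''.join(idx[v] for v in vs) for vs, _ in factors)
    spec += '->' + ''.join(idx[v] for v in keep)
    return keep, np.einsum(spec, *[t for _, t in factors])

# Hypothetical chain factors P(A) and P(B | A) as tables.
fA = (['A'], np.array([0.4, 0.6]))
fBA = (['A', 'B'], np.array([[0.7, 0.3], [0.2, 0.8]]))
vars_out, tau = sum_product([fA, fBA], eliminate={'A'})
# tau is P(B) = sum_a P(a) P(b | a)
```

As the slide notes, the resulting τ need not be a probability in general; for the two factors here it happens to be the marginal P(B).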
Inference via Variable Elimination
General idea: write the query in the form
P(X_1, e) = Σ_{x_n} … Σ_{x_3} Σ_{x_2} ∏_i P(x_i | Pa_{X_i})
where the sums are ordered to suggest an elimination order. Then iteratively:
  move all irrelevant terms outside of the innermost sum,
  perform the innermost sum, getting a new term,
  insert the new term into the product.
Finally, renormalize:
P(X_1 | e) = τ(X_1, e) / Σ_{x_1} τ(x_1, e)
A More Complex Network: A Food Web
Nodes A, B, C, D, E, F, G, H.
What is the probability P(A | h) that hawks are leaving, given that the grass condition is poor?
Example: Variable Elimination
Query: P(A | h); we need to eliminate B, C, D, E, F, G, H.
Initial factors:
P(a) P(b) P(c | b) P(d | a) P(e | c, d) P(f | a) P(g | e) P(h | e, f)
Choose an elimination order: H, G, F, E, D, C, B.
Step 1: Eliminate H by conditioning (fix the evidence node to its observed value):
m_h(e, f) = P(H = h | e, f)
Example: Variable Elimination
Query: P(A | h); remaining to eliminate: B, C, D, E, F, G.
Current factors: P(a) P(b) P(c | b) P(d | a) P(e | c, d) P(f | a) P(g | e) m_h(e, f)
Step 2: Eliminate G. Compute
m_g(e) = Σ_g P(g | e) = 1
leaving
P(a) P(b) P(c | b) P(d | a) P(e | c, d) P(f | a) m_g(e) m_h(e, f) = P(a) P(b) P(c | b) P(d | a) P(e | c, d) P(f | a) m_h(e, f)
Example: Variable Elimination
Query: P(A | h); remaining to eliminate: B, C, D, E, F.
Current factors: P(a) P(b) P(c | b) P(d | a) P(e | c, d) P(f | a) m_h(e, f)
Step 3: Eliminate F. Compute
m_f(e, a) = Σ_f P(f | a) m_h(e, f)
leaving
P(a) P(b) P(c | b) P(d | a) P(e | c, d) m_f(e, a)
Example: Variable Elimination
Query: P(A | h); remaining to eliminate: B, C, D, E.
Current factors: P(a) P(b) P(c | b) P(d | a) P(e | c, d) m_f(e, a)
Step 4: Eliminate E. Compute
m_e(a, c, d) = Σ_e P(e | c, d) m_f(e, a)
leaving
P(a) P(b) P(c | b) P(d | a) m_e(a, c, d)
Example: Variable Elimination
Query: P(A | h); remaining to eliminate: B, C, D.
Current factors: P(a) P(b) P(c | b) P(d | a) m_e(a, c, d)
Step 5: Eliminate D. Compute
m_d(a, c) = Σ_d P(d | a) m_e(a, c, d)
leaving
P(a) P(b) P(c | b) m_d(a, c)
Example: Variable Elimination
Query: P(A | h); remaining to eliminate: B, C.
Current factors: P(a) P(b) P(c | b) m_d(a, c)
Step 6: Eliminate C. Compute
m_c(a, b) = Σ_c P(c | b) m_d(a, c)
leaving
P(a) P(b) m_c(a, b)
Example: Variable Elimination
Query: P(A | h); remaining to eliminate: B.
Current factors: P(a) P(b) m_c(a, b)
Step 7: Eliminate B. Compute
m_b(a) = Σ_b P(b) m_c(a, b)
leaving
P(a) m_b(a)
Example: Variable Elimination
Query: P(A | h); finally, renormalize over A.
Current factors: P(a) m_b(a)
Step 8: Renormalize. P(a, h) = P(a) m_b(a), so P(h) = Σ_a P(a) m_b(a) and
P(a | h) = P(a) m_b(a) / Σ_a P(a) m_b(a)
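The whole elimination sequence above can be run numerically; the sketch below uses made-up random binary CPTs for the food-web network and verifies the elimination result against brute-force enumeration of the joint:

```python
import itertools

import numpy as np

rng = np.random.default_rng(2)

def cpt(*parent_dims):
    """Random CPT P(child | parents); the last axis is the child."""
    t = rng.random(parent_dims + (2,))
    return t / t.sum(axis=-1, keepdims=True)

Pa, Pb = cpt(), cpt()                                 # P(a), P(b)
Pc_b, Pd_a, Pf_a, Pg_e = cpt(2), cpt(2), cpt(2), cpt(2)
Pe_cd, Ph_ef = cpt(2, 2), cpt(2, 2)
h = 1                                                 # observed evidence H = h

# Elimination order H, G, F, E, D, C, B, as in the slides.
m_h = Ph_ef[..., h]                                   # m_h(e, f) = P(h | e, f)
m_g = Pg_e.sum(axis=-1)                               # m_g(e) = 1
m_f = np.einsum('af,ef->ea', Pf_a, m_h)               # m_f(e, a)
m_e = np.einsum('cde,ea->acd', Pe_cd, m_f)            # m_e(a, c, d)
m_d = np.einsum('ad,acd->ac', Pd_a, m_e)              # m_d(a, c)
m_c = np.einsum('bc,ac->ab', Pc_b, m_d)               # m_c(a, b)
m_b = np.einsum('b,ab->a', Pb, m_c)                   # m_b(a)
p_a_given_h = Pa * m_b / (Pa * m_b).sum()             # renormalize over A

# Brute-force check: enumerate all configurations of the other variables.
brute = np.zeros(2)
for a, b, c, d, e, f, g in itertools.product([0, 1], repeat=7):
    brute[a] += (Pa[a] * Pb[b] * Pc_b[b, c] * Pd_a[a, d] * Pe_cd[c, d, e]
                 * Pf_a[a, f] * Pg_e[e, g] * Ph_ef[e, f, h])
brute /= brute.sum()
```

Each intermediate message has at most three arguments here, so no step ever touches the full 2^8 table; that gap is exactly the complexity saving discussed on the next slide.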
Complexity of Variable Elimination
Suppose in one elimination step we compute
m_x(y_1, …, y_k) = Σ_x m'_x(x, y_1, …, y_k),  where  m'_x(x, y_1, …, y_k) = ∏_{i=1}^k m_i(x, y_{c_i})
This requires
  k · |Val(X)| · ∏_i |Val(Y_{c_i})| multiplications: for each value of x, y_1, …, y_k, we do k multiplications;
  |Val(X)| · ∏_i |Val(Y_{c_i})| additions: for each value of y_1, …, y_k, we do |Val(X)| additions.
The complexity is exponential in the number of variables in the intermediate factor.
From Variable Elimination to Message Passing
Recall that the dependencies induced during marginalization are captured in elimination cliques:
  summation ↔ elimination
  intermediate terms ↔ elimination cliques
Can this lead to a generic inference algorithm?
Tree Graphical Models
Undirected tree: there is a unique path between any pair of nodes.
Directed tree: all nodes except the root have exactly one parent.
Equivalence of Directed and Undirected Trees
Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it.
A directed tree and the corresponding undirected tree make the same conditional independence assertions, and their parameterizations are essentially the same.
Undirected tree: P(X) = (1/Z) ∏_{i ∈ V} Ψ(X_i) ∏_{(i,j) ∈ E} Ψ(X_i, X_j)
Directed tree: P(X) = P(X_r) ∏_{(i,j) ∈ E} P(X_j | X_i)
Equivalence: Ψ(X_r) = P(X_r), Ψ(X_i, X_j) = P(X_j | X_i), Z = 1, and Ψ(X_i) = 1 for i ≠ r.
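The stated equivalence can be verified numerically on a hypothetical 3-node directed tree r → i, r → j with random binary CPTs: using the CPTs directly as edge potentials and the root prior as the only node potential gives the same joint, with Z = 1.

```python
import itertools

import numpy as np

rng = np.random.default_rng(3)
P_r = rng.random(2); P_r /= P_r.sum()                        # P(X_r)
P_i_r = rng.random((2, 2)); P_i_r /= P_i_r.sum(axis=1, keepdims=True)
P_j_r = rng.random((2, 2)); P_j_r /= P_j_r.sum(axis=1, keepdims=True)

# Undirected parameterization per the slide: node potential at the root
# only, edge potentials equal to the CPTs.
psi_r, psi_ri, psi_rj = P_r, P_i_r, P_j_r

Z = 0.0
for xr, xi, xj in itertools.product([0, 1], repeat=3):
    directed = P_r[xr] * P_i_r[xr, xi] * P_j_r[xr, xj]
    undirected = psi_r[xr] * psi_ri[xr, xi] * psi_rj[xr, xj]
    assert np.isclose(directed, undirected)                  # same joint
    Z += undirected
# Z accumulates the total mass of the undirected product; it equals 1.
```

The interesting content is the value of Z: because each CPT row sums to 1, the unnormalized undirected product already sums to 1, so no partition-function computation is needed for tree-structured directed models.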
From Variable Elimination to Message Passing
Recall the variable elimination algorithm:
  choose an ordering in which the query node f is the final node;
  eliminate node i by removing all potentials containing i and taking the sum/product over x_i;
  place the resulting factor back into the pool.
For a tree graphical model:
  choose the query node f as the root of the tree;
  view the tree as a directed tree with edges pointing towards f;
  elimination of each node can then be seen as message passing directly along the tree branches, rather than on some transformed graph.
Thus, we can use the tree itself as a data structure for inference.
Message Passing for Trees
Let m_ji(X_i) denote the factor resulting from eliminating all variables below j, up to i; it is a function of X_i:
m_ji(X_i) = Σ_{x_j} Ψ(x_j) Ψ(X_i, x_j) ∏_{k ∈ N(j)\i} m_kj(x_j)
This is like a message sent from j to i. At the root (query node) f:
P(x_f) ∝ Ψ(x_f) ∏_{e ∈ N(f)} m_ef(x_f)
where m_ef(x_f) represents a belief about x_f coming from neighbor e.
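The message recursion above can be sketched on a small hypothetical undirected tree with binary variables (edges 0-1, 0-2, 2-3; potentials are made-up random tables), computing the marginal at the root and checking it by brute force:

```python
import itertools

import numpy as np

rng = np.random.default_rng(4)
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
psi = {i: rng.random(2) + 0.1 for i in neighbors}            # node potentials
psi2 = {(i, j): rng.random((2, 2)) + 0.1                     # edge potentials
        for i in neighbors for j in neighbors[i] if i < j}

def edge_pot(j, i):
    """Psi(x_j, x_i) with x_j on axis 0, regardless of storage order."""
    return psi2[(j, i)] if (j, i) in psi2 else psi2[(i, j)].T

def message(j, i):
    """m_ji(x_i) = sum_{x_j} psi(x_j) psi(x_j, x_i) prod_{k in N(j)\\i} m_kj(x_j)."""
    incoming = np.ones(2)
    for k in neighbors[j]:
        if k != i:
            incoming *= message(k, j)
    return edge_pot(j, i).T @ (psi[j] * incoming)

# Marginal at the root f = 0: P(x_f) is proportional to psi(x_f) times
# the product of all incoming messages.
f = 0
belief = psi[f] * message(1, f) * message(2, f)
p_root = belief / belief.sum()

# Brute-force check over all joint configurations.
brute = np.zeros(2)
for x in itertools.product([0, 1], repeat=4):
    p = np.prod([psi[i][x[i]] for i in neighbors])
    for (i, j), t in psi2.items():
        p *= t[x[i], x[j]]
    brute[x[f]] += p
brute /= brute.sum()
```

Changing the root f reuses the same `message` recursion, which is what the full sum-product algorithm exploits to get all marginals with only two passes over the tree.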