Variable Elimination (VE) Barak Sternberg


Basic Ideas in VE
Example 1: Let G be a chain Bayesian network: X_1 → X_2 → … → X_{n-1} → X_n. How would one compute P(X_n = k)? Using the CPDs:
P(X_2 = x) = Σ_{x_1 ∈ Val(X_1)} P(X_1 = x_1) P(X_2 = x | X_1 = x_1)
Saving each distribution and reusing it for the next computation (Dynamic Programming).
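
A minimal sketch of this dynamic-programming pass in Python (names such as chain_marginal, prior, and cpds are mine, not from the slides): the prior over X_1 is pushed through one CPD at a time, and each intermediate marginal is reused for the next step.

import numpy as np

def chain_marginal(prior, cpds):
    """Compute P(X_n) for a chain X_1 -> X_2 -> ... -> X_n.

    prior: array of shape (k,), the distribution P(X_1).
    cpds:  list of (k, k) arrays, cpds[i][x_prev, x_next] = P(X_{i+2}=x_next | X_{i+1}=x_prev).
    """
    marginal = prior                      # P(X_1)
    for cpd in cpds:
        # P(X_{i+1} = x) = sum_{x'} P(X_i = x') P(X_{i+1} = x | X_i = x')
        marginal = marginal @ cpd         # one O(k^2) step, reused at the next step
    return marginal                       # P(X_n)

# Example: a 3-state chain of length 4 with random (row-normalized) CPDs.
k, n = 3, 4
rng = np.random.default_rng(0)
prior = np.full(k, 1.0 / k)
cpds = []
for _ in range(n - 1):
    t = rng.random((k, k))
    cpds.append(t / t.sum(axis=1, keepdims=True))  # normalize rows into CPDs
print(chain_marginal(prior, cpds))  # sums to 1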

Basic Ideas in VE
Complexity of Example 1: Assume the value space of each X_i has size k. Computing each distribution P(X_i) costs O(k^2); there are n nodes, so the total is O(n k^2). The naive computation, Σ_{X_1, …, X_{n-1}} P(X_1, …, X_n), costs O(k^{n-1}).
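
To get a feel for the gap (numbers mine, not from the slides): with k = 10 values and n = 20 variables, the staged computation performs about n k^2 = 2,000 operations, while the naive sum ranges over on the order of k^{n-1} = 10^19 terms.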

Basic Ideas in VE
Example 2: Assume the equation
ψ(A, B, C, D) = φ_1(A) φ_2(B, A) φ_3(C, B) φ_4(D, C)
We want to compute ψ(D) := Σ_{A,B,C} ψ(A, B, C, D). This is the elimination of A, B, C from ψ.

Basic Ideas in VE
We can compute directly:
Σ_{A,B,C} φ_1(A) φ_2(B, A) φ_3(C, B) φ_4(D, C)
which costs O(K^3). Or we can use the following observation:
Σ_{A,B,C} φ_1(A) φ_2(B, A) φ_3(C, B) φ_4(D, C) = Σ_C φ_4(D, C) Σ_B φ_3(C, B) Σ_A φ_2(B, A) φ_1(A)

Basic Ideas in VE
We can do it in stages.
Stage 1: define δ_0(A, B) = φ_1(A) φ_2(B, A) and compute δ_1(B) = Σ_A δ_0(A, B).
We are left to compute Σ_{B,C} δ_1(B) φ_3(C, B) φ_4(D, C). Continue the same way.
Complexity: O(3 K^2) = O(K^2).
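
A small sketch of the staged computation for Example 2, assuming the four factors are stored as NumPy arrays indexed as φ_1(A), φ_2(B,A), φ_3(C,B), φ_4(D,C); each stage sums out one variable with an O(K^2) matrix-vector product, and the result is checked against the direct sum.

import numpy as np

K = 4
rng = np.random.default_rng(1)
phi1 = rng.random(K)          # phi_1(A)
phi2 = rng.random((K, K))     # phi_2(B, A)
phi3 = rng.random((K, K))     # phi_3(C, B)
phi4 = rng.random((K, K))     # phi_4(D, C)

# Stage 1: delta_0(A, B) = phi_1(A) phi_2(B, A);  delta_1(B) = sum_A delta_0(A, B)
delta1 = phi2 @ phi1                      # shape (K,), indexed by B
# Stage 2: delta_2(C) = sum_B delta_1(B) phi_3(C, B)
delta2 = phi3 @ delta1                    # shape (K,), indexed by C
# Stage 3: psi(D) = sum_C delta_2(C) phi_4(D, C)
psi_D = phi4 @ delta2                     # shape (K,), indexed by D

# Check against the direct computation of the full sum.
direct = np.einsum('a,ba,cb,dc->d', phi1, phi2, phi3, phi4)
assert np.allclose(psi_D, direct)
print(psi_D)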

Conclusion
Computing expressions from the inside out and caching the inner results prevents recomputation, exactly as in dynamic programming. Pushing the summations inward lets us compute the total function in stages.

Factor Marginalization
Definition: φ(X) is a factor iff φ : Val(X) → ℝ, where X is a set of random variables; we denote Scope(φ) = X.
Definition: Let X be a set of random variables, Y ∉ X a random variable, and φ(X, Y) a factor. The factor marginalization of Y in φ, written Σ_Y φ, is the factor
ψ(X) = Σ_{y ∈ Val(Y)} φ(X, y)
Trivial properties: Σ_X Σ_Y φ = Σ_Y Σ_X φ, and if X ∉ Scope(φ_1) then Σ_X (φ_1 · φ_2) = φ_1 · Σ_X φ_2.
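
A toy sketch of a factor data structure with marginalization and product over named variables (the class layout and names are mine, not the book's); it stores a factor as a mapping from joint assignments to values, which keeps both operations explicit.

from itertools import product

class Factor:
    """A factor phi: Val(X) -> R, stored as {assignment tuple: value}."""
    def __init__(self, scope, card, values):
        self.scope = list(scope)              # ordered variable names
        self.card = dict(card)                # variable -> number of values
        self.values = dict(values)            # tuple of ints (one per scope var) -> float

    def marginalize(self, var):
        """Sum out var: psi(X) = sum_{y in Val(var)} phi(X, y)."""
        i = self.scope.index(var)
        new_scope = self.scope[:i] + self.scope[i+1:]
        new_values = {}
        for assign, val in self.values.items():
            key = assign[:i] + assign[i+1:]
            new_values[key] = new_values.get(key, 0.0) + val
        return Factor(new_scope, self.card, new_values)

    def multiply(self, other):
        """Factor product over the union of the two scopes."""
        new_scope = self.scope + [v for v in other.scope if v not in self.scope]
        card = {**self.card, **other.card}
        new_values = {}
        for assign in product(*(range(card[v]) for v in new_scope)):
            a = dict(zip(new_scope, assign))
            v1 = self.values[tuple(a[v] for v in self.scope)]
            v2 = other.values[tuple(a[v] for v in other.scope)]
            new_values[assign] = v1 * v2
        return Factor(new_scope, card, new_values)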

Algorithm Result
Given a set of factors Φ and variables to eliminate Z, the algorithm returns the factor (function)
Σ_Z Π_{φ ∈ Φ} φ
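
A sketch of the elimination loop built on the Factor class above (again, the names are mine): for each variable Z in the given order, multiply together all factors whose scope contains Z, sum Z out, and put the new factor back; the product of whatever remains is Σ_Z Π_{φ ∈ Φ} φ.

def sum_product_ve(factors, elim_order):
    """Return the product of the remaining factors after eliminating elim_order."""
    factors = list(factors)
    for z in elim_order:
        involved = [f for f in factors if z in f.scope]
        if not involved:
            continue
        factors = [f for f in factors if z not in f.scope]
        psi = involved[0]
        for f in involved[1:]:
            psi = psi.multiply(f)          # psi_i = product of the factors mentioning z
        factors.append(psi.marginalize(z)) # tau_i = sum_z psi_i
    result = factors[0]
    for f in factors[1:]:
        result = result.multiply(f)
    return result

# Example 2 revisited with hypothetical 2-valued variables, just to exercise the code:
# eliminate A, B, C from phi_1(A) phi_2(B,A) phi_3(C,B) phi_4(D,C).
card = {'A': 2, 'B': 2, 'C': 2, 'D': 2}
f1 = Factor(['A'], card, {(0,): 0.3, (1,): 0.7})
f2 = Factor(['B', 'A'], card, {(b, a): 0.25 * (1 + b + a) for b in range(2) for a in range(2)})
f3 = Factor(['C', 'B'], card, {(c, b): 0.8 if c == b else 0.2 for c in range(2) for b in range(2)})
f4 = Factor(['D', 'C'], card, {(d, c): 0.6 if d == c else 0.4 for d in range(2) for c in range(2)})
psi_D = sum_product_ve([f1, f2, f3, f4], ['A', 'B', 'C'])
print(psi_D.scope, psi_D.values)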

3SAT Run Example On Board

Complexity Analysis
Assume we have n random variables and m factors, and we run the algorithm for total variable elimination. For example, in a Bayesian network with n nodes the number of factors is m = n; a Markov network can have m larger than n. At most m + n factors ever appear in the set of factors. Denote the number of entries of ψ_i by N_i. Each factor φ is multiplied at most N_i times when generating its corresponding ψ_i, and is then replaced with the new factor. Thus the maximum number of multiplications is O((m + n) · max_i N_i).

Complexity Analysis
Addition complexity: each entry of ψ_i is added once when generating the new factor τ_i, and at most n factors ψ_i are generated inside the algorithm. Thus the addition complexity is O(n · max_i N_i).
Total complexity: O((m + n) · max_i N_i).

Complexity Analysis
If each generated factor ψ_i has K_i variables and every variable has at most v values, then N_i ≤ v^{K_i}; when each variable has exactly v possible values, N_i = v^{K_i}. Thus the cost depends on all the factors generated, exponentially in the number of variables in their scopes.
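
As a rough illustration (numbers mine, not from the slides): with v = 2 and a largest generated factor over K_i = 20 variables, max_i N_i = 2^20 ≈ 10^6 entries, so for m + n in the hundreds the bound O((m + n) · max_i N_i) is on the order of 10^8 operations.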

Variable Elimination for Conditional Probability

Algorithm Result
Given a set of factors Φ, query variables Y, and evidence E = e, the algorithm returns:
φ(Y) = Σ_Z Π_{φ ∈ Φ} φ[E = e]
α = Σ_{Z,Y} Π_{φ ∈ Φ} φ[E = e]
so that P(Y | E = e) = φ(Y) / α.
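
A sketch of the conditional query on top of the two previous sketches, with a hypothetical helper reduce_evidence for the φ[E = e] reduction (not part of the original slides): reduce every factor by the evidence, run the elimination over the remaining non-query variables, and normalize by α.

def reduce_evidence(factor, evidence):
    """Restrict a factor to assignments consistent with evidence {var: value}."""
    keep_idx = [i for i, v in enumerate(factor.scope) if v not in evidence]
    new_scope = [factor.scope[i] for i in keep_idx]
    new_values = {}
    for assign, val in factor.values.items():
        if all(assign[factor.scope.index(v)] == x
               for v, x in evidence.items() if v in factor.scope):
            new_values[tuple(assign[i] for i in keep_idx)] = val
    return Factor(new_scope, factor.card, new_values)

def conditional_query(factors, query_vars, evidence, elim_order):
    """Return P(query_vars | evidence) as a normalized factor."""
    reduced = [reduce_evidence(f, evidence) for f in factors]
    phi_Y = sum_product_ve(reduced, elim_order)            # sum_Z prod phi[E = e]
    assert set(phi_Y.scope) == set(query_vars)
    alpha = sum(phi_Y.values.values())                     # additionally sums over Y
    return Factor(phi_Y.scope, phi_Y.card,
                  {k: v / alpha for k, v in phi_Y.values.items()})

# Example (reusing f1..f4 from the previous sketch): P(D | A = 1)
# post = conditional_query([f1, f2, f3, f4], query_vars=['D'],
#                          evidence={'A': 1}, elim_order=['B', 'C'])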

Elimination as Graph Transformation
Definition: let Φ be a set of factors; define Scope(Φ) = ∪_{φ ∈ Φ} Scope(φ).
Definition: Let Φ be a set of factors. H_Φ is the undirected graph over the nodes Scope(Φ) in which X_i and X_j share an edge iff there is some φ ∈ Φ such that X_i, X_j ∈ Scope(φ).

Elimination as Graph Transformation
For the set of factors Φ used by the algorithm, construct H_Φ. Eliminating a variable removes its node and the edges incident to it, and adds fill edges connecting all of its remaining neighbors.
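
A sketch of this graph view, assuming the graph is kept as adjacency sets keyed by variable name (the function names are mine): build H_Φ from the factor scopes, then eliminate nodes in a given order, adding fill edges between each eliminated node's remaining neighbors. Recording every edge that ever appears gives the induced graph discussed next.

from itertools import combinations

def moral_graph(scopes):
    """H_Phi: connect every pair of variables that co-occur in some factor scope."""
    adj = {}
    for scope in scopes:
        for v in scope:
            adj.setdefault(v, set())
        for u, w in combinations(scope, 2):
            adj[u].add(w)
            adj[w].add(u)
    return adj

def eliminate(adj, order):
    """Simulate node elimination; return the induced graph (all edges ever present)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    induced = {v: set(nbrs) for v, nbrs in adj.items()}
    for z in order:
        nbrs = adj.pop(z)
        for u, w in combinations(nbrs, 2):       # fill edges: connect z's neighbors
            adj[u].add(w); adj[w].add(u)
            induced[u].add(w); induced[w].add(u)
        for u in nbrs:
            adj[u].discard(z)
    return induced

# Example 2's factors give the chain A - B - C - D; eliminating A, B, C adds no fill edges.
H = moral_graph([('A',), ('B', 'A'), ('C', 'B'), ('D', 'C')])
print(eliminate(H, ['A', 'B', 'C']))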

The Induced Graph
Definition: Let Φ be a set of factors over X = {X_1, …, X_n} and ≺ an elimination ordering over some subset X' ⊆ X. The induced graph L_{Φ,≺} has nodes X; X_i and X_j are connected by an edge if they both appear in the scope of some intermediate factor generated by the VE algorithm using ≺ as the elimination ordering.

The Induced Graph and Maximal Cliques
Theorem: Let L_{Φ,≺} be the induced graph for a set of factors Φ and some elimination ordering ≺. Then:
1. The scope of every factor generated during the variable elimination process is a clique in L_{Φ,≺}.
2. Every maximal clique in L_{Φ,≺} is the scope of some intermediate factor in the computation.

The Induced Graph and Maximal Cliques
Proof:
1. Suppose a factor ψ(X_1, …, X_k) is generated by our algorithm. By the construction of the induced graph, every pair X_i, X_j gets an edge between them, so {X_1, …, X_k} is a clique in L_{Φ,≺}.
2. Let Y = {Y_1, …, Y_k} be a maximal clique, and assume w.l.o.g. that Y_1 is the first of its members eliminated by the algorithm. Since Y is a clique, Y_1 has an edge to every other Y_i at the moment it is eliminated, so eliminating Y_1 generates a factor whose scope contains all of Y. That factor's scope contains only Y: otherwise, by (1), its scope would be a clique strictly containing Y, contradicting Y's maximality.

Conclusions
The maximal factors generated by our algorithm correspond to the maximal cliques in the induced graph, and the choice of elimination ordering determines those maximal cliques.

Switching Lecturers

Elimination Ordering
Definition: the width of the induced graph is the size of its largest clique minus one.
Definition: the induced width of an ordering ≺ is the width of the induced graph obtained by applying the VE algorithm with ≺ as the ordering.
NP-complete problem: given a graph H and a bound K, determine whether there exists an elimination ordering achieving induced width at most K.
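
Under the same adjacency-set representation as the earlier graph sketch, the induced width of an ordering can be read off while simulating the elimination: each eliminated node together with its current neighbors forms a clique, and the width is the largest such neighbor count. A self-contained sketch (names mine):

from itertools import combinations

def induced_width(adj, order):
    """Width of the induced graph for `order`: largest clique created, minus one."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    width = 0
    for z in order:
        nbrs = adj.pop(z)
        width = max(width, len(nbrs))            # clique {z} + nbrs has size len(nbrs) + 1
        for u, w in combinations(nbrs, 2):       # fill edges
            adj[u].add(w); adj[w].add(u)
        for u in nbrs:
            adj[u].discard(z)
    return width

# The chain A - B - C - D has induced width 1 under the ordering A, B, C, D.
H = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'C'}}
print(induced_width(H, ['A', 'B', 'C', 'D']))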

Chordal Graphs
Definition: a chordal graph is an undirected graph in which every minimal loop has size 3.
Theorem: Every induced graph is chordal.
Proof: Assume X_1 - X_2 - … - X_k - X_1 is a minimal loop with k > 3, and assume w.l.o.g. that X_1 was the first of these nodes to be eliminated. Because no edge incident to X_1 is added after X_1 is eliminated, both edges X_1 - X_k and X_1 - X_2 already exist at that stage. Eliminating X_1 therefore adds the fill edge X_2 - X_k, which is a chord of the loop, contradicting its minimality for k > 3.

Chordal Graphs
Theorem: Every chordal graph H is the induced graph of some elimination ordering that adds no new fill edges.
Proof: By induction on the number of nodes. Let H be a chordal graph with n nodes. From Chapter 4.12, every chordal graph has a clique tree. Take a leaf of the clique tree and choose a node that belongs only to that clique (a property of clique trees we won't prove), and eliminate it. Its neighbors all lie within that clique, so no new edges are introduced. The remaining graph is also chordal, with n − 1 nodes; continue in the same way.

Elimination Ordering Heuristic 1

Elimination Ordering Heuristic 1
Theorem: Let H be a chordal graph and let π be the ranking obtained by running Max-Cardinality on H. Then Sum-Product-VE, eliminating variables in order of increasing π, does not introduce any fill edges.
Proof intuition: Consider the clique tree of the chordal graph. Once we have selected a node from some clique, the separator nodes are ranked before the remaining nodes of the cliques connected to it. This ensures that in the ordering obtained, each eliminated node's remaining neighbors are already fully connected, so no fill edges are required.
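
A sketch of Max-Cardinality search under the convention used in the theorem (ranks assigned from |X| down to 1, so elimination proceeds in increasing π); the function name and the tie-breaking are mine.

def max_cardinality(adj):
    """Rank nodes: repeatedly pick the unmarked node with the most marked neighbors.

    Returns a dict pi with ranks |X|, |X|-1, ..., 1; eliminate in increasing pi.
    """
    marked = set()
    pi = {}
    for rank in range(len(adj), 0, -1):
        # choose the unmarked node with the maximum number of marked neighbors
        best = max((v for v in adj if v not in marked),
                   key=lambda v: len(adj[v] & marked))
        pi[best] = rank
        marked.add(best)
    return pi

# On the chordal chain A - B - C - D, eliminating in increasing pi adds no fill edges.
H = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'C'}}
pi = max_cardinality(H)
order = sorted(H, key=lambda v: pi[v])
print(order)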

Elimination Ordering Heuristic 1
Goal: turn any given graph H into a chordal graph of minimum width. This, too, is an NP-hard problem.

Elimination Ordering Heuristic 2
Greedy algorithm with a heuristic cost function; example cost functions follow, with a sketch after the list.

Heuristic cost function examples
Min-neighbors: the cost of a vertex is the number of neighbors it has in the current graph.
Min-weight: the cost of a vertex is the product of the weights (domain cardinalities) of its neighbors.
Min-fill: the cost of a vertex is the number of edges that need to be added to the graph due to its elimination.
Weighted-min-fill: the cost of a vertex is the sum of the weights of the edges that need to be added to the graph due to its elimination, where the weight of an edge is the product of the weights of its constituent vertices.
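
A sketch of the greedy ordering search with a pluggable cost function, showing Min-neighbors and Min-fill (function names are mine); after each choice the node is eliminated from the working graph and fill edges are added, so later costs reflect the current graph.

from itertools import combinations

def min_neighbors(adj, v):
    return len(adj[v])

def min_fill(adj, v):
    # number of edges that must be added between v's neighbors when v is eliminated
    return sum(1 for u, w in combinations(adj[v], 2) if w not in adj[u])

def greedy_ordering(adj, cost=min_fill):
    """Repeatedly eliminate the node with the smallest heuristic cost."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while adj:
        z = min(adj, key=lambda v: cost(adj, v))
        order.append(z)
        nbrs = adj.pop(z)
        for u, w in combinations(nbrs, 2):       # add fill edges
            adj[u].add(w); adj[w].add(u)
        for u in nbrs:
            adj[u].discard(z)
    return order

H = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'C'}}
print(greedy_ordering(H, cost=min_neighbors))
print(greedy_ordering(H, cost=min_fill))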

Heuristics testing results
Overall, Min-Fill and Weighted-Min-Fill achieve the best performance, but they are not universally better. The best answer obtained by the greedy algorithms is generally very good; it is often significantly better than the answer obtained by a deterministic state-of-the-art scheme, and it is usually quite close to the best-known ordering, even when the latter is obtained using much more expensive techniques. Because the computational cost of the heuristic ordering-selection algorithms is usually negligible relative to the running time of the inference itself, we conclude that for large networks it is worthwhile to run several heuristic algorithms in order to find the best ordering obtained by any of them.

Reminder: Clique Tree Definition
