Graphical Models. Lecture 10: Variable Elimination, continued. Andrew McCallum


Graphical Models Lecture 10: Variable Elimination, continued. Andrew McCallum, mccallum@cs.umass.edu. Thanks to Noah Smith and Carlos Guestrin for some slide materials.

Last Time. Probabilistic inference is the goal: P(X | E = e). #P-complete in general. Do it anyway! Variable elimination.

Markov Chain Example: a chain A → B → C → D, with CPTs P(B | A), P(C | B), and P(D | C) (shown on the slide as tables over the values 0 and 1). The marginals are computed by pushing sums along the chain: P(B) = Σ_{a ∈ Val(A)} P(A = a) P(B | A = a), P(C) = Σ_{b ∈ Val(B)} P(B = b) P(C | B = b), and P(D) = Σ_{c ∈ Val(C)} P(C = c) P(D | C = c).
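
A minimal sketch of this chain computation in Python with numpy; the CPT numbers below are made up for illustration and are not the values from the slide:

```python
import numpy as np

# Hypothetical CPTs for binary variables on the chain A -> B -> C -> D.
p_A = np.array([0.6, 0.4])                      # P(A)
p_B_given_A = np.array([[0.9, 0.1],             # P(B | A = 0)
                        [0.2, 0.8]])            # P(B | A = 1)
p_C_given_B = np.array([[0.7, 0.3],
                        [0.4, 0.6]])
p_D_given_C = np.array([[0.5, 0.5],
                        [0.1, 0.9]])

# P(B) = sum_a P(A=a) P(B | A=a), and similarly down the chain.
p_B = p_A @ p_B_given_A
p_C = p_B @ p_C_given_B
p_D = p_C @ p_D_given_C
print(p_B, p_C, p_D)
```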

Last Time. Probabilistic inference is the goal: P(X | E = e). #P-complete in general. Do it anyway! Variable elimination. Work on factors (algebra of factors). Generally: sum-product inference, Σ_Z Π_{φ ∈ Φ} φ.

Products of Factors. Given two factors with different scopes, we can calculate a new factor equal to their product. For example, with ϕ_1(A, B): ϕ_1(0,0) = 30, ϕ_1(0,1) = 5, ϕ_1(1,0) = 1, ϕ_1(1,1) = 10, and ϕ_2(B, C): ϕ_2(0,0) = 100, ϕ_2(0,1) = 1, ϕ_2(1,0) = 1, ϕ_2(1,1) = 100, the product ϕ_3(A, B, C) = ϕ_1(A, B) · ϕ_2(B, C) is: ϕ_3(0,0,0) = 3000, ϕ_3(0,0,1) = 30, ϕ_3(0,1,0) = 5, ϕ_3(0,1,1) = 500, ϕ_3(1,0,0) = 100, ϕ_3(1,0,1) = 1, ϕ_3(1,1,0) = 10, ϕ_3(1,1,1) = 1000.
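
A small dict-based sketch of the factor product, using the numbers from the tables above. The {assignment-tuple: value} representation and the assumption of binary domains are choices made here for illustration, not anything prescribed by the lecture:

```python
from itertools import product

def factor_product(phi1, scope1, phi2, scope2):
    """Multiply two factors stored as {assignment-tuple: value} dicts (binary variables assumed)."""
    scope = list(dict.fromkeys(scope1 + scope2))        # union of scopes, order-preserving
    result = {}
    for vals in product([0, 1], repeat=len(scope)):
        asg = dict(zip(scope, vals))
        v1 = phi1[tuple(asg[x] for x in scope1)]
        v2 = phi2[tuple(asg[x] for x in scope2)]
        result[vals] = v1 * v2
    return scope, result

phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}      # phi1(A, B)
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}    # phi2(B, C)
scope3, phi3 = factor_product(phi1, ["A", "B"], phi2, ["B", "C"])
print(phi3[(0, 0, 0)])   # 3000, matching the table
```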

Factor Marginalization. Given X and Y (with Y ∉ X), we can turn a factor ϕ(X, Y) into a factor ψ(X) via marginalization: ψ(x) = Σ_{y ∈ Val(Y)} ϕ(x, y). For example, take P(C | A, B) with P(C = 0 | A, B) = 0.5, 0.4, 0.2, 0.1 and P(C = 1 | A, B) = 0.5, 0.6, 0.8, 0.9 for (A, B) = (0,0), (0,1), (1,0), (1,1). Summing out B gives ψ(A, C): ψ(0, 0) = 0.5 + 0.4 = 0.9, ψ(0, 1) = 0.5 + 0.6 = 1.1, ψ(1, 0) = 0.2 + 0.1 = 0.3, ψ(1, 1) = 0.8 + 0.9 = 1.7.
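
A matching sketch of marginalization over the same dict representation, reusing phi3 from the previous sketch:

```python
def marginalize(phi, scope, var):
    """Sum a factor phi over one variable, returning the smaller factor."""
    idx = scope.index(var)
    new_scope = scope[:idx] + scope[idx + 1:]
    psi = {}
    for vals, v in phi.items():
        key = vals[:idx] + vals[idx + 1:]
        psi[key] = psi.get(key, 0.0) + v
    return new_scope, psi

# Summing B out of phi3(A, B, C) from the previous sketch:
scope_ac, psi = marginalize(phi3, ["A", "B", "C"], "B")
print(psi[(0, 0)])   # phi3(0,0,0) + phi3(0,1,0) = 3000 + 5 = 3005
```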

Last Time. Probabilistic inference is the goal: P(X | E = e). #P-complete in general. Do it anyway! Variable elimination. Work on factors (algebra of factors). How to eliminate one variable (marginalize a product of factors).

Eliminating One Variable.
Input: set of factors Φ, variable Z to eliminate. Output: new set of factors Ψ.
1. Let Φ′ = {ϕ ∈ Φ : Z ∈ Scope(ϕ)}
2. Let Ψ = {ϕ ∈ Φ : Z ∉ Scope(ϕ)}
3. Let ψ = Σ_Z Π_{ϕ ∈ Φ′} ϕ
4. Return Ψ ∪ {ψ}
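
Putting the two operations together, here is one possible sketch of Eliminate-One, assuming the factor_product and marginalize helpers from the sketches above and factors stored as (scope, table) pairs:

```python
def eliminate_one(factors, Z):
    """factors: list of (scope, table) pairs. Returns the new factor list Psi + {psi}."""
    uses_Z = [f for f in factors if Z in f[0]]
    keeps = [f for f in factors if Z not in f[0]]
    # Multiply together all factors whose scope contains Z...
    scope, table = uses_Z[0]
    for s, t in uses_Z[1:]:
        scope, table = factor_product(table, scope, t, s)
    # ...then sum Z out of the product.
    scope, table = marginalize(table, scope, Z)
    return keeps + [(scope, table)]
```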

Last Time. Probabilistic inference is the goal: P(X | E = e). #P-complete in general. Do it anyway! Variable elimination. Work on factors (algebra of factors). Generally: sum-product inference, Σ_Z Π_{φ ∈ Φ} φ. How to eliminate one variable (marginalize a product of factors). How to eliminate a bunch of variables.

Variable Elimination.
Input: set of factors Φ, ordered list of variables Z to eliminate. Output: new factor ψ.
1. For each Z_i ∈ Z (in order): let Φ = Eliminate-One(Φ, Z_i)
2. Return Π_{ϕ ∈ Φ} ϕ
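
And the outer loop, again only a sketch on top of eliminate_one:

```python
def variable_elimination(factors, order):
    """Eliminate the variables in `order`, then multiply what remains into one factor."""
    for Z in order:
        factors = eliminate_one(factors, Z)
    scope, table = factors[0]
    for s, t in factors[1:]:
        scope, table = factor_product(table, scope, t, s)
    return scope, table
```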

Today: variable elimination for inference (with evidence); complexity analysis of VE; elimination orderings.

Probabilistic Inference. Assume we are given a graphical model. Want: P(X | E = e) = P(X, E = e) / P(E = e), where P(X, E = e) = Σ_{y ∈ Val(Y)} P(X, E = e, Y = y).

Adding Evidence. Conditional distributions are Gibbs; they can be represented as factor graphs! Everything is essentially the same, but we reduce the factors to match the evidence. Previously normalized factors may not be normalized any longer, but this is not a problem. Prune anything not on an active trail to the query variables.

Example. Query: P(Flu | runny nose). The factor graph has variables Flu, All. (allergy), S.I. (sinus infection), R.N. (runny nose), and H. (headache), with factors ϕ_F, ϕ_A, ϕ_FAS, ϕ_SR, and ϕ_SH. Let's reduce to R = true (runny nose): in the factor ϕ_SR(S, R), built from P(R | S), keep only the rows with R = 1 and drop the R column, leaving a reduced factor ϕ_S(S). The factor graph now uses ϕ_S in place of ϕ_SR, and R.N. drops out. (The slides show the P(R | S) table and the reduced tables step by step.)

Example, continued. Query: P(Flu | runny nose). Now run variable elimination all the way down to one factor (for F). H can be pruned for the same reasons as before, which also removes ϕ_SH. That leaves ϕ_F, ϕ_A, ϕ_FAS, and ϕ_S. Eliminate S: multiply ϕ_FAS and ϕ_S and sum out S, giving ψ_FA(F, A). Eliminate A: multiply ψ_FA and ϕ_A and sum out A, giving ψ_F(F). Finally, take the product ϕ_F · ψ_F, a single factor over Flu.

General Recipe: Variable Elimination for Conditional Probabilities.
Input: graphical model, set of query variables Q, evidence E = e. Output: factor ϕ and scalar α.
1. Φ = factors in the model
2. Reduce the factors in Φ by E = e
3. Choose a variable ordering on Z = X \ Q \ E
4. ϕ = Variable-Elimination(Φ, Z)
5. α = Σ_{q ∈ Val(Q)} ϕ(q)
6. Return ϕ, α
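
One way the whole recipe might look in code, assuming the helpers sketched earlier. The reduce step here simply drops assignments inconsistent with the evidence, and this version goes one step further than the recipe by dividing through by α:

```python
def reduce_factor(scope, table, evidence):
    """Keep only entries consistent with evidence = {var: value}, dropping evidence vars."""
    keep = [i for i, x in enumerate(scope) if x not in evidence]
    new_scope = [scope[i] for i in keep]
    new_table = {}
    for vals, v in table.items():
        if all(vals[scope.index(x)] == val for x, val in evidence.items() if x in scope):
            new_table[tuple(vals[i] for i in keep)] = v
    return new_scope, new_table

def conditional_query(factors, query_vars, evidence):
    factors = [reduce_factor(s, t, evidence) for s, t in factors]
    hidden = [x for s, _ in factors for x in s
              if x not in query_vars and x not in evidence]
    scope, table = variable_elimination(factors, list(dict.fromkeys(hidden)))
    alpha = sum(table.values())              # for a Bayes net, alpha = P(E = e)
    return scope, {k: v / alpha for k, v in table.items()}
```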

Note. For Bayesian networks, the final factor will be P(Q, E = e) and the sum α = P(E = e). This equates to a Gibbs distribution with partition function equal to α.

Complexity of Variable Elimination

Complexity of Variable Elimination. n = number of random variables; m = number of factors. In step i, we multiply all factors involving X_i, resulting in ψ_i, and sum out X_i, giving a new factor τ_i. N_i = number of entries in ψ_i; N_max = max_i N_i. If we eliminate everything: m initial factors plus n new ones (the τ_i).

Complexity of Variable Elimination. m + n factors. Each is multiplied once, then removed.

Recall: Eliminating One Variable.
Input: set of factors Φ, variable Z to eliminate. Output: new set of factors Ψ.
1. Let Φ′ = {ϕ ∈ Φ : Z ∈ Scope(ϕ)}
2. Let Ψ = {ϕ ∈ Φ : Z ∉ Scope(ϕ)}
3. Let ψ = Σ_Z Π_{ϕ ∈ Φ′} ϕ
4. Return Ψ ∪ {ψ}

Complexity of Variable Elimination. m + n factors; each is multiplied once into some ψ_i, then removed when τ_i is produced. Multiplying one factor into ψ_i costs at most N_i operations, so there are at most (m + n) · N_max multiplications overall: O(m · N_max). Marginalization (summing) touches each entry in each ψ_i once: N_i additions for X_i, so O(n · N_max) additions overall.

Complexity of Variable Elimination. Overall: O(m · N_max). Bayesian network: m = n. Markov network: m can be larger than n.

Complexity of Variable Elimination. Overall: O(m · N_max). The size N_max of the intermediate factors ψ_i is what makes this blow up: with v values per random variable and k_i variables in the scope of factor ψ_i, N_i = v^{k_i}. But really, how bad is it?
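
For a concrete sense of scale: with binary variables (v = 2), an intermediate factor ψ_i over k_i = 20 variables already has N_i = 2^20 ≈ 10^6 entries, and k_i = 30 gives roughly 10^9. It is the exponent k_i of the largest intermediate factor, not the number of factors m, that dominates the cost.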

Analyzing VE via the Graph. Assume a factor graph representation. One step of VE, on X_i: create a single factor ψ that includes X_i and its neighbors (the Y that share factors with X_i); marginalize X_i out of ψ, giving a new factor τ. If we go back to a Markov network, we have now introduced new edges!

Example: the factor graph, with factors ϕ_F, ϕ_A, ϕ_FAS, ϕ_SR, ϕ_SH over Flu, All., S.I., R.N., H.

Example: the corresponding Markov network over Flu, All., S.I., R.N., H.

Example: eliminate S. Collect every factor touching S into one product, ψ = ϕ_FAS · ϕ_SR · ϕ_SH, then sum S out: τ = Σ_S ψ. The new factor τ has scope {Flu, All., R.N., H.}. Going back to a Markov network and comparing before and after: S.I. and its edges are removed, and fill edges now connect its former neighbors (for example, R.N. and H. become adjacent).

Insight. Each VE step is a transformation on the graph. We've been drawing it on slides this way all along! We can put the full sequence of graphs together into a single structure.

Union of the Graphs. Overlay the undirected graph from each elimination step into a single graph over Flu, All., S.I., R.N., and H.

Induced Graph. Take the union over all of the undirected graphs from each step: the induced graph. (1) The scope of every intermediate factor is a clique in this graph. (2) Every maximal clique in the graph is the scope of some intermediate factor. Important: different ordering → different induced graph.

Proof (1). The scope of every intermediate factor is a clique in the induced graph. Consider ψ(X_1, ..., X_k), an intermediate factor. In the corresponding Markov network, all of the X_i are connected (they share a factor). Hence they form a clique.

Proof (2). Every maximal clique in the induced graph is the scope of some intermediate factor. Consider a maximal clique Y = {Y_1, ..., Y_k}, and let Y_1 be the first of its members to be eliminated, with resulting product-of-factors ψ. All edges incident on Y_1 are introduced before it is eliminated. Y_1 and each Y_i share an edge, so they share a factor that gets multiplied into ψ; so ψ includes all of Y. Any other variable X can't be in the scope of ψ, because then X would be linked to all of Y and Y would not be a maximal clique. (K&F p. 309)

Ordering: {S, ...}. (Figure: the graph over Flu, All., S.I., R.N., H.)

Ordering: {H, R, S, A, F}. Eliminate H, then R, then S, then A, then F; the slides step through the shrinking graph. No fill edges are introduced at any step, so the induced graph equals the original graph.

Induced Width. The number of nodes in the largest clique of the induced graph, minus one. Relative to an ordering! (Figures: the induced graphs from the two orderings above.)

Induced Width. The number of nodes in the largest clique of the induced graph, minus one. Relative to an ordering! Treewidth = the minimum induced width over all possible orderings. This bounds the best performance we can hope for: VE runtime is exponential in the treewidth! Why minus one? (One answer: so that a tree gets treewidth 1.)

Treewidth Examples (figures): among them an n × n grid of variables X_{1,1}, X_{1,2}, ..., X_{n,n}, and a network over variables X, M, Z, Y, N.

Finding Elimination Orderings. NP-complete: is there an elimination ordering such that the induced width is ≤ K? Nonetheless, some convenient cases arise.

You'd like to be able to look at the original graph and easily say something about the difficulty of inference.

Chordal Graphs. An undirected graph whose minimal cycles are no longer than 3. (Figures: three graphs on nodes A, B, C, D; the plain 4-cycle is not chordal, and the two triangulated versions are chordal. Lecture 5, MN/BN.)

Chordal Graphs Induced graphs are always chordal!

Chordal Graphs. Induced graphs are always chordal! Could the 4-cycle with edges A-B, A-C, B-D, C-D be an induced graph? It is not chordal. Lemma: no edges incident on X_i can be added after X_i has been eliminated. When we eliminate C, the edges A-C and C-D must already exist, so after the elimination the fill edge A-D will exist, and the cycle is chorded.

Theorem. Chordal graphs always admit an elimination ordering that doesn't introduce any fill edges into the graph. No fill edges: no blowup. Inference becomes linear in the size of the factors already present!

Clique Tree. Every maximal clique becomes a vertex; the vertices are connected in a tree structure. (Figure: a chordal graph over A, B, C, D, E, F whose maximal cliques ABC, BCD, CDE, DEF form a chain. Lecture 5.)

Clique Tree. For each edge of the clique tree, the intersection of the random variables separates the rest in H: sep_H(A; D | B, C), sep_H(B; E | C, D), sep_H(C; F | D, E). (Lecture 5.)

Clique Tree Does a clique tree exist? Yes, if the undirected graph H is chordal! Lecture 5

Theorem. Chordal graphs always admit an elimination ordering that doesn't introduce any fill edges into the graph. Proof by induction on the number of nodes in the tree: take a leaf C_i of the clique tree and eliminate a variable that appears in C_i but not in C_i's neighbor. No fill edges are introduced, and the graph is still chordal.

Heuristics for Variable Elimination Ordering (other than using a clique tree)

Alternative Ordering Heuristic: Maximum Cardinality.
Start with the undirected graph on X, all nodes unmarked.
For i = |X| down to 1: let Y be the unmarked variable in X with the largest number of marked neighbors; set π(Y) = i; mark Y.
Eliminate using the permutation π, i.e. π maps each variable to an integer; eliminate the variables in order of those integers 1, 2, 3, ...
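
A sketch of this heuristic on an adjacency-set representation of the undirected graph (ties broken arbitrarily; this is just one of several reasonable ways to code it):

```python
def max_cardinality_order(adj):
    """adj: {node: set(neighbors)}. Returns an elimination ordering (list of nodes)."""
    marked = set()
    pi = {}                                   # node -> position, filled from |X| down to 1
    for i in range(len(adj), 0, -1):
        # Unmarked node with the most marked neighbors.
        y = max((x for x in adj if x not in marked),
                key=lambda x: len(adj[x] & marked))
        pi[y] = i
        marked.add(y)
    # Eliminate in increasing order of pi.
    return sorted(pi, key=pi.get)
```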

Alternative Ordering Heuristic: Maximum Cardinality. The Maximum Cardinality permutation will not introduce any fill edges for chordal graphs, and it doesn't need the clique tree. It can also be used on non-chordal graphs! Better ideas exist, though: greedy algorithms that try to add a small number of fill edges at a time.

Bayesian Networks Again. Recall: if an undirected graph H is chordal, then there is a Bayesian network structure G that is a P-map for H. Chordal graphs correspond to Bayesian networks with a polytree structure: at most one trail between any pair of nodes. Polytree = directed graph with at most one undirected path between any two vertices; equivalently, a directed acyclic graph (DAG) with no undirected cycles either.

Example Polytree. (Figure from Wikipedia.)

Inference in Polytrees. Linear in the conditional probability table sizes! Consider the skeleton. Pick a root; form a tree. Work in from the leaves. In the corresponding undirected graph, no fill edges are introduced.

Variable Elimination Summary. In general, requirements exponential in the induced width corresponding to the ordering you choose. It's NP-hard to find the best elimination ordering. If you can avoid fill edges (or big intermediate factors), you can make inference linear in the size of the original factors: chordal graphs, polytrees.