Bayesian Networks. Exact Inference by Variable Elimination. Emma Rollon and Javier Larrosa. Q1-2015-2016


Bayesian Networks: Exact Inference by Variable Elimination. Emma Rollon and Javier Larrosa. Q1-2015-2016 (25 slides).

Recall the most usual queries

Let X = E ∪ Q ∪ Z, where E are the evidence variables (with observed values e), Q are the variables of interest, and Z are the remaining variables.

Probability of Evidence (Q = ∅): Pr(e) = ∑_Z Pr(e, Z)

Prior Marginals (E = ∅): Pr(Q) = ∑_Z Pr(Q, Z)

Posterior Marginals (E ≠ ∅): Pr(Q | e) = ∑_Z Pr(Q, Z | e)

Most Probable Explanation (MPE) (E ≠ ∅, Z = ∅): q* = arg max_Q Pr(Q | e)

Maximum a Posteriori Probability (MAP): q* = arg max_Q Pr(Q | e) = arg max_Q ∑_Z Pr(Q, Z | e)

In this session

We are going to see a generic algorithm for inference (inference = query answering). It is called Variable Elimination because it eliminates, one by one, the variables that are irrelevant for the query. It relies on some basic operations on a class of functions known as factors, and it uses an algorithmic technique called dynamic programming.

We will illustrate it for the computation of prior and posterior marginals,

P(Y) = ∑_Z P(Y, Z)        P(Y | e) = ∑_Z P(Y, Z | e)

It can be easily extended to other queries.

Factors

A factor f is a function over a set of variables {X1, X2, ..., Xk}, called its scope, mapping each instantiation of these variables to a real number.

X Y Z | f(X, Y, Z)
0 0 0 | 5
0 0 1 | 3
0 0 2 | 1
0 1 0 | 5
0 1 1 | 2
0 1 2 | 7
1 0 0 | 1
1 0 1 | 4
1 0 2 | 6
1 1 0 | 2
1 1 1 | 4
1 1 2 | 3

Observation: Conditional Probability Tables (CPTs) are factors.

We define three operations on factors:
- Product: f(X, Y, Z) · g(Q, Y, R)
- Conditioning: f(X, y, Z)
- Marginalization: ∑_Y f(X, Y, Z)

Factors: Product

X Y | f(X, Y)
0 0 | 1
0 1 | 3
1 0 | 2
1 1 | 1

Y Z | g(Y, Z)
0 0 | 4
0 1 | 3
1 0 | 1
1 1 | 2

X Y Z | f(X, Y) · g(Y, Z)
0 0 0 | 1 · 4
0 0 1 | 1 · 3
0 1 0 | 3 · 1
0 1 1 | 3 · 2
1 0 0 | 2 · 4
1 0 1 | 2 · 3
1 1 0 | 1 · 1
1 1 1 | 1 · 2

Factor multiplication is commutative and associative. It is time and space exponential in the number of variables of the resulting factor.

Factors: Conditioning

X Y | f(X, Y)
0 0 | 1
0 1 | 3
1 0 | 2
1 1 | 1

Conditioning on X = 0:

Y | f(0, Y)
0 | 1
1 | 3

Factors: Marginalization

X Y | f(X, Y)
0 0 | 1
0 1 | 3
1 0 | 2
1 1 | 1

Y | ∑_x f(x, Y)
0 | 1 + 2
1 | 3 + 1

Factor marginalization is commutative. It is time exponential in the number of variables of the original factor, and space exponential in the number of variables of the resulting factor.
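The three factor operations fit in a few lines of code. Below is a minimal sketch (the representation, with a factor stored as a pair of a scope tuple and a dict from assignment tuples to numbers, is our own choice, not the slides'), restricted to Boolean domains and using the f and g tables from the product, conditioning, and marginalization examples:

```python
from itertools import product as cartesian

# A factor is (vars, table): vars is a tuple of variable names, and
# table maps each full assignment (a tuple of values) to a real number.

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    (fv, ft), (gv, gt) = f, g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    out = {}
    for assign in cartesian((0, 1), repeat=len(out_vars)):  # Boolean domains
        a = dict(zip(out_vars, assign))
        out[assign] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return out_vars, out

def condition(f, var, value):
    """Restrict factor f to var = value, dropping var from the scope."""
    fv, ft = f
    i = fv.index(var)
    return fv[:i] + fv[i+1:], {k[:i] + k[i+1:]: p
                               for k, p in ft.items() if k[i] == value}

def marginalize(f, var):
    """Sum factor f over all values of var."""
    fv, ft = f
    i = fv.index(var)
    out = {}
    for k, p in ft.items():
        out[k[:i] + k[i+1:]] = out.get(k[:i] + k[i+1:], 0) + p
    return fv[:i] + fv[i+1:], out

# The two factors from the slides:
f = (('X', 'Y'), {(0, 0): 1, (0, 1): 3, (1, 0): 2, (1, 1): 1})
g = (('Y', 'Z'), {(0, 0): 4, (0, 1): 3, (1, 0): 1, (1, 1): 2})

fg = multiply(f, g)        # scope (X, Y, Z); e.g. entry (0,0,0) is 1*4 = 4
f0 = condition(f, 'X', 0)  # f(0, Y): {(0,): 1, (1,): 3}
fm = marginalize(f, 'X')   # sum_x f(x, Y): {(0,): 1+2, (1,): 3+1}
```

The computed values match the slide tables entry by entry, e.g. the product at (X=0, Y=1, Z=1) is 3 · 2 = 6.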

Factors: Example

Given the following BN (figure not included in the transcription), compute: Pr(S | I) · Pr(I), Pr(G | i1, D), ∑_G Pr(L | G), ∑_L Pr(L | G).

The basic property: distributivity

When dealing with integers, we know that:

∑_{i=1}^{n} K · a_i = K · ∑_{i=1}^{n} a_i

The same happens when dealing with factors (e.g., with D(X) = {x0, x1}):

∑_X f(Z) · g(X, Y) = f(Z) · g(x0, Y) + f(Z) · g(x1, Y)
                   = f(Z) · (g(x0, Y) + g(x1, Y))
                   = f(Z) · ∑_X g(X, Y)
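A quick numeric check of this identity, with small made-up integer tables (so the equality is exact): summing X out of f(Z) · g(X, Y) gives the same value as multiplying f(Z) by the X-marginal of g, for every (Y, Z) pair.

```python
# Made-up tables for illustration:
f = {0: 2, 1: 5}                                  # f(Z)
g = {(0, 0): 1, (0, 1): 3, (1, 0): 2, (1, 1): 4}  # g(X, Y), keyed (x, y)

for y in (0, 1):
    for z in (0, 1):
        lhs = sum(f[z] * g[(x, y)] for x in (0, 1))  # sum_X f(Z) * g(X, Y)
        rhs = f[z] * sum(g[(x, y)] for x in (0, 1))  # f(Z) * sum_X g(X, Y)
        assert lhs == rhs
```

This is what lets VE push summations inside products: f(Z) does not mention X, so it can be pulled out of ∑_X.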

VE: prior marginal on a chain

We will illustrate VE in the simplest case (a chain with no evidence):
- Boolean variables Xi
- The BN is a chain: P(X1, X2, X3, X4) = Pr(X1) Pr(X2 | X1) Pr(X3 | X2) Pr(X4 | X3)
- We have no evidence (E = ∅)
- The only variable of interest is X4

Thus, we want to compute:

Pr(X4) = ∑_X1 ∑_X2 ∑_X3 Pr(X1) Pr(X2 | X1) Pr(X3 | X2) Pr(X4 | X3)

VE: prior marginal on a chain

Elimination of X1:

Pr(X4) = ∑_X1 ∑_X2 ∑_X3 Pr(X1) Pr(X2 | X1) Pr(X3 | X2) Pr(X4 | X3)
       = ∑_X2 ∑_X3 Pr(X3 | X2) Pr(X4 | X3) ( ∑_X1 Pr(X1) Pr(X2 | X1) )
       = ∑_X2 ∑_X3 Pr(X3 | X2) Pr(X4 | X3) λ1(X2)

where λ1(X2) = ∑_X1 Pr(X1) Pr(X2 | X1)

VE: prior marginal on a chain

Elimination of X2:

Pr(X4) = ∑_X2 ∑_X3 Pr(X3 | X2) Pr(X4 | X3) λ1(X2)
       = ∑_X3 Pr(X4 | X3) ( ∑_X2 Pr(X3 | X2) λ1(X2) )
       = ∑_X3 Pr(X4 | X3) λ2(X3)

where λ2(X3) = ∑_X2 Pr(X3 | X2) λ1(X2)

VE: prior marginal on a chain

Elimination of X3:

Pr(X4) = ∑_X3 Pr(X4 | X3) λ2(X3) = λ3(X4)

Question: What would be the time and space complexity of this execution?
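The three eliminations above can be run on concrete numbers. The CPTs below are made up for illustration; the code computes Pr(X4) through the successive λ factors and checks the result against brute-force enumeration of the joint.

```python
from itertools import product as cartesian

# Hypothetical CPTs for the Boolean chain X1 -> X2 -> X3 -> X4.
p1 = {0: 0.6, 1: 0.4}                                        # Pr(X1)
p2 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}    # Pr(X2 | X1), keyed (x1, x2)
p3 = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.1}    # Pr(X3 | X2)
p4 = {(0, 0): 0.4, (0, 1): 0.6, (1, 0): 0.25, (1, 1): 0.75}  # Pr(X4 | X3)

# Eliminate X1, X2, X3 in turn, exactly as in the derivation.
lam1 = {x2: sum(p1[x1] * p2[(x1, x2)] for x1 in (0, 1)) for x2 in (0, 1)}
lam2 = {x3: sum(lam1[x2] * p3[(x2, x3)] for x2 in (0, 1)) for x3 in (0, 1)}
lam3 = {x4: sum(lam2[x3] * p4[(x3, x4)] for x3 in (0, 1)) for x4 in (0, 1)}

# Brute-force check: enumerate all 2^4 assignments of the joint.
brute = {0: 0.0, 1: 0.0}
for x1, x2, x3, x4 in cartesian((0, 1), repeat=4):
    brute[x4] += p1[x1] * p2[(x1, x2)] * p3[(x2, x3)] * p4[(x3, x4)]

assert all(abs(lam3[x4] - brute[x4]) < 1e-12 for x4 in (0, 1))
```

Each λ here has a single Boolean variable in its scope, which is why VE on a chain needs only constant-size intermediate tables, while enumeration touches all 2^4 joint entries.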

VE: prior marginal on general networks

Joint probability distribution:

P(C, D, I, G, L, S, H, J) = P(H | G, J) P(J | L, S) P(L | G) P(S | I) P(I) P(G | D, I) P(D | C) P(C)

Goal: P(G, L, S, H, J). Elimination order: C, D, I.

VE: prior marginal on general networks

Elimination of C:

∑_{I,D,C} P(H | G, J) P(J | L, S) P(L | G) P(S | I) P(I) P(G | D, I) P(D | C) P(C)
  = ∑_{I,D} P(H | G, J) P(J | L, S) P(L | G) P(S | I) P(I) P(G | D, I) ( ∑_C P(D | C) P(C) )
  = ∑_{I,D} P(H | G, J) P(J | L, S) P(L | G) P(S | I) P(I) P(G | D, I) λ_C(D)

VE: prior marginal on general networks

Elimination of D:

∑_{I,D} P(H | G, J) P(J | L, S) P(L | G) P(S | I) P(I) P(G | D, I) λ_C(D)
  = ∑_I P(H | G, J) P(J | L, S) P(L | G) P(S | I) P(I) ( ∑_D P(G | D, I) λ_C(D) )
  = ∑_I P(H | G, J) P(J | L, S) P(L | G) P(S | I) P(I) λ_D(G, I)

VE: prior marginal on general networks

Elimination of I:

∑_I P(H | G, J) P(J | L, S) P(L | G) P(S | I) P(I) λ_D(G, I)
  = P(H | G, J) P(J | L, S) P(L | G) ( ∑_I P(S | I) P(I) λ_D(G, I) )
  = P(H | G, J) P(J | L, S) P(L | G) λ_I(S, G)

VE: posterior marginal on general networks

Given the joint probability

P(C, D, I, G, L, S, H, J) = P(H | G, J) P(J | L, S) P(L | G) P(S | I) P(I) P(G | D, I) P(D | C) P(C)

and evidence {I = i, H = h}, we want to compute:

P(J, S, L | i, h) = P(J, S, L, i, h) / P(i, h)
                  = ∑_{C,D,G} P(C, D, i, G, L, S, h, J) / ∑_{J,S,L,C,D,G} P(C, D, i, G, L, S, h, J)

using elimination order C, D, G.

Trick: compute P(J, S, L, i, h) and normalize.

VE: posterior marginal on general networks

Elimination of C:

∑_{C,D,G} P(h | G, J) P(J | L, S) P(L | G) P(S | i) P(i) P(G | D, i) P(D | C) P(C)
  = ∑_{D,G} P(h | G, J) P(J | L, S) P(L | G) P(S | i) P(i) P(G | D, i) ( ∑_C P(D | C) P(C) )
  = ∑_{D,G} P(h | G, J) P(J | L, S) P(L | G) P(S | i) P(i) P(G | D, i) λ_C(D)

VE: posterior marginal on general networks

Elimination of D:

∑_{D,G} P(h | G, J) P(J | L, S) P(L | G) P(S | i) P(i) P(G | D, i) λ_C(D)
  = ∑_G P(h | G, J) P(J | L, S) P(L | G) P(S | i) P(i) ( ∑_D P(G | D, i) λ_C(D) )
  = ∑_G P(h | G, J) P(J | L, S) P(L | G) P(S | i) P(i) λ_D(G)

VE: posterior marginal on general networks

Elimination of G:

∑_G P(h | G, J) P(J | L, S) P(L | G) P(S | i) P(i) λ_D(G)
  = P(J | L, S) P(S | i) P(i) ( ∑_G P(h | G, J) P(L | G) λ_D(G) )
  = P(J | L, S) P(S | i) P(i) λ_G(J, L)

VE: Summary
- Condition all factors on the evidence.
- Eliminate (marginalize out) each non-query variable X:
  - Move the factors mentioning X towards the right.
  - Push the summation over X towards the right.
  - Replace the resulting expression by a λ function.
- Normalize to get a distribution.
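The summary above fits in a short program. Below is a minimal sketch of sum-product VE (the factor representation, the `variable_elimination` helper, and the tiny test network with its CPT numbers are all illustrative, not from the slides), for Boolean variables with factors stored as (scope, table) pairs:

```python
from itertools import product as cartesian

def multiply(f, g):
    (fv, ft), (gv, gt) = f, g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    out = {}
    for assign in cartesian((0, 1), repeat=len(out_vars)):
        a = dict(zip(out_vars, assign))
        out[assign] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return out_vars, out

def condition(f, var, value):
    fv, ft = f
    i = fv.index(var)
    return fv[:i] + fv[i+1:], {k[:i] + k[i+1:]: p
                               for k, p in ft.items() if k[i] == value}

def marginalize(f, var):
    fv, ft = f
    i = fv.index(var)
    out = {}
    for k, p in ft.items():
        out[k[:i] + k[i+1:]] = out.get(k[:i] + k[i+1:], 0.0) + p
    return fv[:i] + fv[i+1:], out

def variable_elimination(factors, evidence, order):
    """Posterior over the variables left after conditioning on the
    evidence and eliminating the variables in the given order."""
    # Step 1: condition all factors on the evidence.
    for var, val in evidence.items():
        factors = [condition(f, var, val) if var in f[0] else f for f in factors]
    # Step 2: eliminate each non-query variable in turn.
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        prod = bucket[0]
        for f in bucket[1:]:
            prod = multiply(prod, f)
        factors.append(marginalize(prod, var))       # the lambda factor
    # Step 3: multiply what remains and normalize.
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    z = sum(result[1].values())
    return result[0], {k: v / z for k, v in result[1].items()}

# Tiny chain A -> B -> C with made-up CPTs; query P(C | A = 1).
pa = (('A',), {(0,): 0.7, (1,): 0.3})
pb = (('A', 'B'), {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9})
pc = (('B', 'C'), {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.7})

vars_, post = variable_elimination([pa, pb, pc], evidence={'A': 1}, order=['B'])
# post[(1,)] is 0.9 * 0.7 + 0.1 * 0.4, i.e. about 0.67
```

Note that the normalization in step 3 also divides out P(e), which is exactly the "compute and normalize" trick used for posterior marginals.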

VE: Exercise

Given the following BN (figure not included in the transcription): eliminate Letter and Difficulty from the network.

VE: Some tricks

Given a BN, a set of query nodes Q, and a set of evidence variables E:
- Pruning edges: after conditioning on the evidence, every edge going out of a node in E can be removed, since the CPTs of its children have been conditioned on the observed value.
- Pruning nodes: we can remove any leaf node (with its CPT) from the network as long as it does not belong to Q ∪ E.

Complexity and Elimination Order

P(C, X1, ..., Xn) = P(C) P(X1 | C) P(X2 | C) ... P(Xn | C)

Eliminate Xn first:

∑_{C,X1,...,Xn-1} P(C) P(X1 | C) ... P(Xn-1 | C) ( ∑_{Xn} P(Xn | C) )
  = ∑_{C,X1,...,Xn-1} P(C) P(X1 | C) ... P(Xn-1 | C) λ_{Xn}(C)

Eliminate C first:

∑_{X1,...,Xn} ( ∑_C P(C) P(X1 | C) ... P(Xn | C) )
  = ∑_{X1,...,Xn} λ_C(X1, X2, ..., Xn)

In the first case every intermediate factor has a single variable in its scope; in the second, λ_C has all n remaining variables, so its table is exponential in n. The elimination order therefore determines the complexity of VE.
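This contrast can be made concrete by tracking only factor scopes, with no probability tables at all. A small sketch (the function and variable names are ours, not the slides') records the largest scope that appears while eliminating under each order, for n = 10:

```python
# Symbolic trace of factor scopes for the network P(C) * prod_i P(Xi | C).
n = 10
scopes = [frozenset({'C'})] + [frozenset({f'X{i}', 'C'}) for i in range(1, n + 1)]

def max_intermediate_scope(scopes, order):
    """Largest scope of any product factor built while eliminating
    the variables in the given order."""
    scopes = list(scopes)
    largest = 0
    for var in order:
        bucket = [s for s in scopes if var in s]
        scopes = [s for s in scopes if var not in s]
        merged = frozenset().union(*bucket)   # scope of the product
        largest = max(largest, len(merged))
        scopes.append(merged - {var})         # scope of the lambda
    return largest

leaves_first = [f'X{i}' for i in range(n, 0, -1)] + ['C']
c_first = ['C'] + [f'X{i}' for i in range(1, n + 1)]

print(max_intermediate_scope(scopes, leaves_first))  # 2: every factor stays tiny
print(max_intermediate_scope(scopes, c_first))       # n + 1: one huge factor
```

Eliminating the leaves first keeps every table over at most two Boolean variables; eliminating C first immediately builds a factor over all n + 1 variables.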