Probability Propagation in Singly Connected Networks

Lecture 4 Probability Propagation in Singly Connected Networks Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 1

Probability Propagation We will now develop a general probability propagation method based on just five operating equations which can be applied independently at any node to process incoming evidence and dispatch messages to neighbours. We will first need to introduce the idea of multiple parents. To keep the notation simple we will restrict the treatment to nodes with at most two parents. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 2

Singly Connected Networks The equations are also limited to singly connected networks, which have at most one path between any two nodes. [Diagrams: a singly connected network and a multiply connected network] Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 3

Multiple Parents Up until now our networks have been trees, but in general they need not be. In particular, we need to cope with the possibility of multiple parents. Multiple parents can be thought of as representing different possible causes of an outcome. In our cat example, the eyes in a picture could have other causes: pictures showing other animals (owls, dogs, etc.), or pictures with features that share the same geometric model (bicycles). Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 4

How to tell an owl from a cat Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 5

How to tell an owl from a cat Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 6

How to tell an owl from a cat Owl Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 7

How to tell an owl from a cat Owl Cat Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 8

How to tell an owl from a cat Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 9

The Owl and the Pussycat The two causes of the E variable (owl and cat) form multiple parents, where W is the owl variable. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 10

Conditional Probabilities with Multiple Parents Conditional probabilities with multiple parents must include all the joint states of the parents. Thus for the E node we have: P(E | C&W). Notice that the conditional probability table is associated with the child node and not with the arcs of the network. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 11

Example Link Matrix Given that node W has states w_1 and w_2, and similarly C has states c_1 and c_2, the conditional probability matrix takes the form:

$$P(E \mid W \& C) = \begin{pmatrix} P(e_1 \mid w_1 \& c_1) & P(e_1 \mid w_1 \& c_2) & P(e_1 \mid w_2 \& c_1) & P(e_1 \mid w_2 \& c_2) \\ P(e_2 \mid w_1 \& c_1) & P(e_2 \mid w_1 \& c_2) & P(e_2 \mid w_2 \& c_1) & P(e_2 \mid w_2 \& c_2) \\ P(e_3 \mid w_1 \& c_1) & P(e_3 \mid w_1 \& c_2) & P(e_3 \mid w_2 \& c_1) & P(e_3 \mid w_2 \& c_2) \end{pmatrix}$$

Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 12
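
As a concrete illustration, the link matrix can be stored as a 3×4 array whose columns correspond to the joint parent states. This is a minimal sketch in Python/NumPy; the numbers are made-up illustrative values and the variable names are my own, not part of the lecture.

```python
import numpy as np

# Link matrix P(E | W&C): rows index the states of E (e1, e2, e3); columns
# index the joint parent states in the order (w1&c1, w1&c2, w2&c1, w2&c2).
# Each column is a conditional distribution over E, so it must sum to 1.
P_E_given_WC = np.array([
    [0.9, 0.7, 0.8, 0.1],   # P(e1 | w&c) for each joint parent state
    [0.1, 0.2, 0.1, 0.3],   # P(e2 | w&c)
    [0.0, 0.1, 0.1, 0.6],   # P(e3 | w&c)
])

assert np.allclose(P_E_given_WC.sum(axis=0), 1.0)
```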

π messages with multiple parents If we wish to calculate the π evidence for eyes, we first calculate the evidence for C, taking into account only the evidence that does not come from E. This is called the π message from C to E. In this case it includes the prior probability of C and the λ evidence from F. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 13

λ and π messages with multiple parents Next we calculate the π message from W to E. In this example there is only the prior evidence for W. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 14

Notation for π messages We write:
π_E(C) for the π message from C to E. It means all the evidence for C excluding the evidence from E.
π_E(W) for the π message from W to E. It means all the evidence for W excluding the evidence from E.
We can also define the π messages using the posterior probability of a node, for example P′(C), as follows:

$$P'(C) = \alpha \, \pi_E(C) \, \lambda_E(C) \qquad \Rightarrow \qquad \pi_E(C) = P'(C)/\lambda_E(C)$$

Remember that it is not necessary to normalise evidence.
Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 15

Finding a joint distribution over the parents Having calculated a π message from each parent, we calculate a joint distribution of π messages over all parents. To do this we assume that π_E(C) and π_E(W) are independent:

$$\pi_E(W \& C) = \pi_E(W) \, \pi_E(C)$$

Remember that this is a scalar equation in the variables C and W, holding for individual states:

$$\pi_E(c_i \& w_j) = \pi_E(c_i) \, \pi_E(w_j)$$

In vector form the joint evidence is:

$$\pi_E(W \& C) = [\pi_E(w_1 \& c_1), \pi_E(w_1 \& c_2), \pi_E(w_2 \& c_1), \pi_E(w_2 \& c_2)] = [\pi_E(w_1)\pi_E(c_1), \pi_E(w_1)\pi_E(c_2), \pi_E(w_2)\pi_E(c_1), \pi_E(w_2)\pi_E(c_2)]$$

Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 16
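
A sketch of the joint π computation, assuming the two π messages are stored as NumPy vectors (the values are illustrative only):

```python
import numpy as np

# pi messages arriving at E from its two parents (made-up illustrative values).
pi_E_W = np.array([0.2, 0.8])     # pi_E(w1), pi_E(w2)
pi_E_C = np.array([0.35, 0.65])   # pi_E(c1), pi_E(c2)

# Outer product gives pi_E(w_j & c_i) = pi_E(w_j) * pi_E(c_i); flattening in
# row-major order yields the ordering (w1&c1, w1&c2, w2&c1, w2&c2) used for
# the link-matrix columns on the previous slide.
pi_E_WC = np.outer(pi_E_W, pi_E_C).ravel()
print(pi_E_WC)   # [0.07 0.13 0.28 0.52]
```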

The independence of π_E(C) and π_E(W) In this simple example C and W have no other path linking them, and hence if we do not consider the evidence from E then π_E(C) and π_E(W) must be independent. If they had, for example, a common parent, then our assumption about the independence of π_E(C) and π_E(W) would no longer hold. We have therefore made an implicit assumption that there are no loops in our network. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 17

Calculating the π evidence for E We can now compute the π evidence for E using the link matrix:

$$\begin{pmatrix} \pi(e_1) \\ \pi(e_2) \\ \pi(e_3) \end{pmatrix} = \begin{pmatrix} P(e_1 \mid w_1 \& c_1) & P(e_1 \mid w_1 \& c_2) & P(e_1 \mid w_2 \& c_1) & P(e_1 \mid w_2 \& c_2) \\ P(e_2 \mid w_1 \& c_1) & P(e_2 \mid w_1 \& c_2) & P(e_2 \mid w_2 \& c_1) & P(e_2 \mid w_2 \& c_2) \\ P(e_3 \mid w_1 \& c_1) & P(e_3 \mid w_1 \& c_2) & P(e_3 \mid w_2 \& c_1) & P(e_3 \mid w_2 \& c_2) \end{pmatrix} \begin{pmatrix} \pi_E(w_1 \& c_1) \\ \pi_E(w_1 \& c_2) \\ \pi_E(w_2 \& c_1) \\ \pi_E(w_2 \& c_2) \end{pmatrix}$$

or in vector form: $\pi(E) = P(E \mid W \& C) \, \pi_E(W \& C)$

Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 18
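
The matrix-vector product above could be computed as follows; this sketch simply reuses the illustrative numbers from the earlier snippets.

```python
import numpy as np

# pi evidence for E: multiply the link matrix by the joint parent pi vector.
P_E_given_WC = np.array([
    [0.9, 0.7, 0.8, 0.1],
    [0.1, 0.2, 0.1, 0.3],
    [0.0, 0.1, 0.1, 0.6],
])
pi_E_WC = np.array([0.07, 0.13, 0.28, 0.52])   # (w1&c1, w1&c2, w2&c1, w2&c2)

pi_E = P_E_given_WC @ pi_E_WC                  # pi(e1), pi(e2), pi(e3)
print(pi_E)                                    # approximately [0.43, 0.217, 0.353]
```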

Posterior probability of E We can now finally compute a probability distribution over the states of E. As before, all we do is multiply the λ and π evidence together and normalise:

$$P'(e_i) = \alpha \, \lambda(e_i) \, \pi(e_i)$$

where α is the normalising constant.
Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 19
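
A minimal sketch of this step, with an assumed (illustrative) λ evidence vector for E:

```python
import numpy as np

# Posterior over the states of E: elementwise product of lambda and pi
# evidence, followed by normalisation (alpha = 1 / sum).
lam_E = np.array([0.8, 0.5, 0.1])      # lambda(e1), lambda(e2), lambda(e3) - illustrative
pi_E  = np.array([0.43, 0.217, 0.353]) # pi evidence from the previous sketch

posterior_E = lam_E * pi_E
posterior_E /= posterior_E.sum()       # normalise so the distribution sums to 1
print(posterior_E)
```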

λ messages with multiple parents A node with multiple parents will send an individual λ message to each. We must compute each one from the joint conditional probability table P(E | C&W). Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 20

Revision on λ messages The λ message from E to C (before introducing the node W) was written as:

$$\lambda_E(c_1) = \lambda(e_1)P(e_1 \mid c_1) + \lambda(e_2)P(e_2 \mid c_1) + \lambda(e_3)P(e_3 \mid c_1)$$
$$\lambda_E(c_2) = \lambda(e_1)P(e_1 \mid c_2) + \lambda(e_2)P(e_2 \mid c_2) + \lambda(e_3)P(e_3 \mid c_2)$$

or more generally:

$$\lambda_E(c_i) = \sum_j P(e_j \mid c_i) \, \lambda(e_j)$$

or in vector form:

$$\lambda_E(C) = \lambda(E) \, P(E \mid C)$$

Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 21

λ messages with joint link matrices We can calculate a joint λ message using the vector formulation:

$$\lambda_E(W \& C) = \lambda(E) \, P(E \mid W \& C)$$

but this gives us a joint message to the parents:

$$\lambda_E(W \& C) = [\lambda_E(w_1 \& c_1), \lambda_E(w_1 \& c_2), \lambda_E(w_2 \& c_1), \lambda_E(w_2 \& c_2)]$$

We use the evidence for W to separate out the evidence for C from the joint λ message:

$$\lambda_E(c_i) = \sum_j \pi_E(w_j) \, \lambda_E(w_j \& c_i)$$

Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 22

Scalar equation for the λ message For any specific π_E(W) we can estimate the link matrix P(E | C) from P(E | W&C). Each element is computed using the scalar equation:

$$P(e_i \mid c_j) = P(e_i \mid c_j \& w_1)\,\pi_E(w_1) + P(e_i \mid c_j \& w_2)\,\pi_E(w_2)$$

or more generally:

$$P(e_i \mid c_j) = \sum_k P(e_i \mid c_j \& w_k) \, \pi_E(w_k)$$

where k ranges over the states of W. The λ message can then be written directly as a single scalar equation:

$$\lambda_E(c_j) = \sum_k \pi_E(w_k) \sum_i P(e_i \mid c_j \& w_k) \, \lambda(e_i)$$

Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 23

Probability Propagation Each node in a Bayesian Network holds six different data items:
1. The posterior probability
2. The link matrix
3. The λ evidence
4. The π evidence
5. The λ messages
6. The π messages
Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 24
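
One possible way to hold these six items per node is sketched below; the class and field names are hypothetical, not part of the lecture.

```python
import numpy as np
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Node:
    """A hypothetical container for the six data items held at each node."""
    name: str
    posterior: np.ndarray        # 1. posterior probability over the node's states
    link_matrix: np.ndarray      # 2. P(node | joint parent states)
    lambda_evidence: np.ndarray  # 3. lambda evidence for this node
    pi_evidence: np.ndarray      # 4. pi evidence for this node
    lambda_messages: Dict[str, np.ndarray] = field(default_factory=dict)  # 5. from children
    pi_messages: Dict[str, np.ndarray] = field(default_factory=dict)      # 6. from parents
```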

Operating Equation 1: the λ message The λ message from C to A in scalar form is given by:

$$\lambda_C(a_i) = \sum_{j=1}^{m} \pi_C(b_j) \sum_{k=1}^{n} P(c_k \mid a_i \& b_j) \, \lambda(c_k)$$

For the case of a single parent this simplifies to:

$$\lambda_C(a_i) = \sum_{k=1}^{n} P(c_k \mid a_i) \, \lambda(c_k)$$

Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 25
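
A sketch of Operating Equation 1, assuming the two-parent link matrix is stored as an array of shape (n_C, n_A, n_B); the function names are my own.

```python
import numpy as np

def lambda_message_to_A(P_C_given_AB, pi_C_B, lam_C):
    """Lambda message from child C to parent A (two-parent case).

    P_C_given_AB : array of shape (n_C, n_A, n_B), P(c_k | a_i & b_j)
    pi_C_B       : pi message from the other parent B to C, length n_B
    lam_C        : lambda evidence for C, length n_C
    """
    # lambda_C(a_i) = sum_j pi_C(b_j) * sum_k P(c_k | a_i & b_j) * lambda(c_k)
    return np.einsum('j,kij,k->i', pi_C_B, P_C_given_AB, lam_C)

def lambda_message_single_parent(P_C_given_A, lam_C):
    """Single-parent simplification: lambda_C(a_i) = sum_k P(c_k | a_i) lambda(c_k)."""
    return lam_C @ P_C_given_A     # P_C_given_A has shape (n_C, n_A)
```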

Operating Equation 1: the λ message For the case of the single parent A we have a very simple matrix form:

$$\lambda_C(A) = \lambda(C) \, P(C \mid A)$$

The matrix form for multiple parents relates to the joint states of the parents:

$$\lambda_C(A \& B) = \lambda(C) \, P(C \mid A \& B)$$

It is necessary to separate the λ evidence for the individual parents with a scalar equation of the form:

$$\lambda_C(a_i) = \sum_j \pi_C(b_j) \, \lambda_C(a_i \& b_j)$$

Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 26

Operating Equation 2: the π message If C is a child of A, the π message from A to C is given by:

$$\pi_C(a_i) = \begin{cases} 1 & \text{if } A \text{ is instantiated for } a_i \\ 0 & \text{if } A \text{ is instantiated, but not for } a_i \\ P'(a_i)/\lambda_C(a_i) & \text{if } A \text{ is not instantiated} \end{cases}$$

The π message to a child contains all the evidence for the parent except that from the child.
Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 27
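
A sketch of Operating Equation 2 as a Python function; the argument names are assumptions of this sketch.

```python
import numpy as np

def pi_message_to_child(posterior, lambda_from_child, instantiated_state=None):
    """Pi message from parent A to one child C.

    posterior          : current posterior P'(A), length n_A
    lambda_from_child  : lambda message previously received from C, length n_A
    instantiated_state : index of A's instantiated state, or None
    """
    if instantiated_state is not None:
        msg = np.zeros(len(posterior))
        msg[instantiated_state] = 1.0   # 1 for the instantiated state, 0 otherwise
        return msg
    # All the evidence for A except that which came from C.
    # (Assumes lambda_from_child has no zero entries.)
    return posterior / lambda_from_child
```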

Operating Equation 3: the λ evidence If C is a node with n children D_1, D_2, ..., D_n, then the λ evidence for C is:

$$\lambda(c_k) = \begin{cases} 1 & \text{if } C \text{ is instantiated for } c_k \\ 0 & \text{if } C \text{ is instantiated, but not for } c_k \\ \prod_i \lambda_{D_i}(c_k) & \text{if } C \text{ is not instantiated} \end{cases}$$

Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 28
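
A sketch of Operating Equation 3; note that for a leaf node the empty product leaves a uniform λ evidence vector.

```python
import numpy as np

def lambda_evidence(lambda_messages, n_states, instantiated_state=None):
    """Lambda evidence for a node C.

    lambda_messages    : iterable of lambda messages from C's children, each length n_states
    instantiated_state : index of C's instantiated state, or None
    """
    if instantiated_state is not None:
        lam = np.zeros(n_states)
        lam[instantiated_state] = 1.0
        return lam
    lam = np.ones(n_states)        # a leaf node (no children) keeps uniform lambda evidence
    for msg in lambda_messages:
        lam *= msg                 # product over the children's lambda messages
    return lam
```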

Operating Equation 4: the π evidence If C is a child of two parents A and B, the π evidence for C is given by:

$$\pi(c_k) = \sum_{i=1}^{l} \sum_{j=1}^{m} P(c_k \mid a_i \& b_j) \, \pi_C(a_i) \, \pi_C(b_j)$$

This can be written in matrix form as follows:

$$\pi(C) = P(C \mid A \& B) \, \pi_C(A \& B) \qquad \text{where} \qquad \pi_C(a_i \& b_j) = \pi_C(a_i)\,\pi_C(b_j)$$

The single parent matrix equation is:

$$\pi(C) = P(C \mid A) \, \pi_C(A)$$

Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 29
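
A sketch of Operating Equation 4 for the two-parent and single-parent forms, with the link-matrix layout assumed as in the earlier sketches.

```python
import numpy as np

def pi_evidence_two_parents(P_C_given_AB, pi_C_A, pi_C_B):
    """Pi evidence for a child C of parents A and B.

    P_C_given_AB   : array of shape (n_C, n_A, n_B), P(c_k | a_i & b_j)
    pi_C_A, pi_C_B : pi messages from A and B to C
    """
    # pi(c_k) = sum_i sum_j P(c_k | a_i & b_j) pi_C(a_i) pi_C(b_j)
    return np.einsum('kij,i,j->k', P_C_given_AB, pi_C_A, pi_C_B)

def pi_evidence_single_parent(P_C_given_A, pi_C_A):
    """Single-parent form: pi(C) = P(C | A) pi_C(A)."""
    return P_C_given_A @ pi_C_A    # P_C_given_A has shape (n_C, n_A)
```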

Operating Equation 5: the posterior probability If C is a variable, the (posterior) probability of C based on the evidence received is written as:

$$P'(c_k) = \alpha \, \lambda(c_k) \, \pi(c_k)$$

where α is chosen to make $\sum_k P'(c_k) = 1$
Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 30

Belief Propagation Probability propagation is a form of belief propagation and is achieved by message passing. New evidence enters a network when a variable is instantiated, i.e. when it receives a new value from the input. When this happens the posterior probabilities of each node in the whole network may need to be re-calculated. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 31

Belief Propagation When the π or λ evidence for a node changes, it must inform some of its parents and children as follows:
For each parent to be updated: update the parent's λ message array, and set a flag to indicate that the parent must be re-calculated.
For each child to be updated: update the π message held for that child, and set a flag to indicate that the child must be re-calculated.
Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 32

Termination The network reaches a steady state when there are no more nodes to be re-calculated. This condition will always be reached in singly connected networks, because λ messages terminate at root nodes and π messages terminate at leaf nodes. Multiply connected networks will not necessarily reach a steady state; propagation in such networks is referred to as loopy belief propagation. We will discuss termination conditions later in the lecture. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 33

Initialisation When a network is initialised no nodes have been instantiated and the only evidence comes from the prior probabilities of the roots:
1. All λ message and evidence values are set to 1.
2. All π messages are set to 1.
3. For all root nodes the π evidence values are set to the prior probabilities, e.g. for all states of R: π(r_i) = P(r_i).
4. Post and propagate the π messages from the root nodes down to the leaf nodes (see downward propagation).
Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 34
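
A minimal initialisation sketch under assumed data structures (maps from node names to state counts, root priors, and children); it only sets up the arrays, with the downward propagation of step 4 left to the procedure described later in the lecture.

```python
import numpy as np

def initialise(n_states, root_priors, children):
    """Initialisation sketch. `n_states` maps node name -> number of states,
    `root_priors` maps root name -> prior distribution, and `children` maps
    node name -> list of child names (all assumed data structures)."""
    lam = {v: np.ones(k) for v, k in n_states.items()}   # 1. lambda evidence set to 1
    pi = {v: np.ones(k) for v, k in n_states.items()}    #    pi evidence, roots overwritten below
    pi_msgs = {(p, c): np.ones(n_states[p])              # 2. pi messages set to 1
               for p, kids in children.items() for c in kids}
    # (lambda messages, not shown here, would likewise start at 1)
    for root, prior in root_priors.items():
        pi[root] = np.asarray(prior, dtype=float)        # 3. root pi evidence = prior probabilities
    # 4. the root pi messages would now be posted and propagated downwards
    return lam, pi, pi_msgs
```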

Upward Propagation If node C receives a λ message from a child:
if C is not instantiated:
1. Compute the new λ(C) value (Op. Eq. 3)
2. Compute the new posterior probability P′(C) (Eq. 5)
3. Post a λ message to all C's parents
4. Post a π message to C's other children
Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 35

Downward Propagation If a variable C receives a π message from one parent:
if C is not instantiated:
1. Compute a new value for the evidence π(C) (Eq. 4)
2. Compute a new value for P′(C) (Eq. 5)
3. Post a π message to each child
if there is λ evidence for C:
1. Post a λ message to the other parents
Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 36

Instantiation If variable C is instantiated in state c_i:
1. Compute λ(C) (Eq. 3)
2. Compute P′(C) (= λ(C)) (Eq. 5)
3. Post a π message (= λ(C)) to each child of C (Eq. 2)
4. Post a λ message to each parent of C
Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 37

Blocked Paths If a node is instantiated it will block some message passing. Instantiating the centre node below prevents any messages passing through it. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 38

Converging Paths However, converging paths are blocked when there is no λ evidence for the child node, but unblocked if the child has λ evidence or is instantiated. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 39

Converging Paths To understand the behaviour of converging paths we look at operating equation 1:

$$\lambda_C(a_i) = \sum_{j=1}^{m} \pi_C(b_j) \sum_{k=1}^{n} P(c_k \mid a_i \& b_j) \, \lambda(c_k)$$

Given that node C has no λ evidence we can write λ(C) = {1, 1, ..., 1}, and substituting this into operating equation 1 we get:

$$\lambda_C(a_i) = \sum_{j=1}^{m} \pi_C(b_j) \sum_{k=1}^{n} P(c_k \mid a_i \& b_j)$$

Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 40

Converging Paths

$$\lambda_C(a_i) = \sum_{j=1}^{m} \pi_C(b_j) \sum_{k=1}^{n} P(c_k \mid a_i \& b_j)$$

The second summation now evaluates to 1 for any value of i and j, since we are summing a probability distribution over the states of C. Thus:

$$\lambda_C(a_i) = \sum_{j=1}^{m} \pi_C(b_j)$$

The sum is independent of i, and hence the value of λ_C(a_i) is the same for all values of i; that is to say, no λ evidence is sent to A.
Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 41
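
A quick numerical check of this argument, with a made-up link matrix whose columns each sum to 1: with uniform λ(C), the λ message sent to A has equal entries, so it carries no evidence.

```python
import numpy as np

# P(c_k | a_i & b_j) stored with shape (n_C, n_A, n_B) = (2, 2, 2); the values
# are illustrative, but each conditional distribution over C sums to 1.
P_C_given_AB = np.array([
    [[0.9, 0.4], [0.3, 0.7]],    # P(c1 | a_i & b_j)
    [[0.1, 0.6], [0.7, 0.3]],    # P(c2 | a_i & b_j)
])
pi_C_B = np.array([0.25, 0.75])  # pi message from the other parent B
lam_C = np.ones(2)               # no lambda evidence for C

# lambda_C(a_i) = sum_j pi_C(b_j) sum_k P(c_k | a_i & b_j) lambda(c_k)
lam_C_to_A = np.einsum('j,kij,k->i', pi_C_B, P_C_given_AB, lam_C)
print(lam_C_to_A)                # both entries equal sum_j pi_C(b_j) = 1.0
```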

Example: the Owl and the Pussy Cat Initialisation: The only evidence in the network is the prior probabilities of the root nodes. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 42

Example: the Owl and the Pussy Cat Initialisation: The priors become π messages and propagate downwards. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 43

Example: the Owl and the Pussy Cat Initialisation: E has no λ evidence so it just sends a π message to its children and the network reaches a steady state. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 44

Example: the Owl and the Pussy Cat Instantiation of S: A new measurement is made and we instantiate S. A λ message propagates upwards. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 45

Example: the Owl and the Pussy Cat Instantiation of S: E calculates its evidence and sends messages everywhere but to S. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 46

Example: the Owl and the Pussy Cat Instantiation of S: W and C recalculate their evidence. W has no messages to send but C will send a π message to F. The network reaches a steady state. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 47

Example: the Owl and the Pussy Cat Instantiation of C: C is now instantiated and sends π messages to its children. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 48

Example: the Owl and the Pussy Cat Instantiation of C: There is λ evidence for E so it will send a λ message to W. The π message sent to S has no effect. It is blocked since S is already instantiated. The network reaches a steady state. Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 49