Bayesian Networks
Machine Learning, Fall 2010
Slides based on material from the Russell and Norvig AI Book, Ch. 14
Administrivia
Bayesian networks:
- The inference problem: given a BN, how to make predictions based on it
- The learning problem: given a data set D, how to learn a BN that gives the best fit to D
No class on Thursday
Office hours: today 12:30-1:30; Wed. cancelled
Bayesian Network Overview
A compact representation of a joint probability distribution over a set of variables X_1, X_2, ..., X_n:
- Directed acyclic graph
- Each node corresponds to one of the variables (attributes) that describe instances
- Edges indicate (causal) relationships
- Each node X_i maintains the conditional probability distribution P(X_i | Parents(X_i))
A joint assignment x_1, x_2, ..., x_n to the variables X_1, X_2, ..., X_n has probability

P(x_1, x_2, ..., x_n) = \prod_{i=1}^{n} P(x_i | Parents(x_i))
Classic BN Example
[Figure: the burglary/earthquake alarm network, from the Russell and Norvig AI Book]
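To make the factorization on the previous slide concrete, here is a minimal Python sketch for this network. The dict-based encoding (`parents`, `cpt`, `joint_prob`) is an illustrative choice, not from the slides; the CPT numbers are the standard ones from the Russell and Norvig example.

```python
# Each variable maps to its parent tuple; each CPT maps a tuple of
# parent values to P(X = True | parent values).
parents = {
    "B": (), "E": (),
    "A": ("B", "E"),
    "J": ("A",), "M": ("A",),
}
cpt = {
    "B": {(): 0.001}, "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}

def joint_prob(assignment):
    """P(x_1, ..., x_n) = product over i of P(x_i | Parents(x_i))."""
    p = 1.0
    for var, value in assignment.items():
        p_true = cpt[var][tuple(assignment[u] for u in parents[var])]
        p *= p_true if value else 1.0 - p_true
    return p

# The book's worked example: P(j, m, a, no burglary, no earthquake)
# = 0.999 * 0.998 * 0.001 * 0.9 * 0.7, roughly 0.000628
print(joint_prob({"B": False, "E": False, "A": True, "J": True, "M": True}))
```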
Conditional Independence in a BN
- X is conditionally independent of its non-descendants given its parents
- X is conditionally independent of all other nodes given its parents, its children, and its children's parents, also known as its Markov blanket
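As a small sketch of the Markov blanket definition, reusing the hypothetical `parents` map from the sketch above (`markov_blanket` is an illustrative helper name):

```python
# Markov blanket of X = parents of X, children of X, and the
# children's other parents. `parents` maps each node to its parent tuple.
def markov_blanket(x, parents):
    children = [v for v, ps in parents.items() if x in ps]
    mb = set(parents[x]) | set(children)
    for c in children:
        mb |= set(parents[c]) - {x}
    return mb

parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
print(markov_blanket("B", parents))  # {'A', 'E'}: child A, plus A's other parent E
```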
Predicting with a BN
Called inference
Given:
- A BN
- Observations for some of the variables
What can be inferred about the unobserved variables?
Probability Cheat Sheet
Before talking about inference, let's review some probability basics.
Given a joint distribution over two Boolean variables, A and B, how can we use it to derive P(A | B)?

P(A | B) = P(A, B) / P(B) = \alpha P(A, B)

To find what \alpha is, simply compute P(A | B) = \alpha P(A, B) and P(\neg A | B) = \alpha P(\neg A, B) and normalize.
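A tiny sketch of the normalization trick, assuming a made-up joint table over (A, B); the numbers are illustrative only:

```python
# Full joint over (A, B); entries sum to 1.
joint = {(True, True): 0.3, (True, False): 0.1,
         (False, True): 0.2, (False, False): 0.4}

unnorm_true  = joint[(True, True)]    # proportional to P(A, B=t)
unnorm_false = joint[(False, True)]   # proportional to P(not A, B=t)
alpha = 1.0 / (unnorm_true + unnorm_false)
print(alpha * unnorm_true, alpha * unnorm_false)  # P(A|B=t)=0.6, P(not A|B=t)=0.4
```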
Cheat Sheet Continued
Given a joint distribution over three Boolean variables A, B, and C, how can we find P(A | B)? Sum C out of the joint:

P(A | B) = \alpha \sum_{c \in Values(C)} P(A, B, C = c)

What if we have A, B, C, and D and want P(A | B)? Sum over joint assignments to the values of C and D.
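A sketch of summing out the extra variable before normalizing; the joint table over (A, B, C) is made up for illustration:

```python
# Full joint over (A, B, C); entries sum to 1.
joint = {(True,  True,  True): 0.15, (True,  True,  False): 0.15,
         (True,  False, True): 0.05, (True,  False, False): 0.15,
         (False, True,  True): 0.05, (False, True,  False): 0.15,
         (False, False, True): 0.10, (False, False, False): 0.20}

# Unnormalized P(A, B=t): sum C out of the joint.
unnorm = {a: sum(joint[(a, True, c)] for c in (True, False))
          for a in (True, False)}
alpha = 1.0 / sum(unnorm.values())
print({a: alpha * p for a, p in unnorm.items()})  # P(A|B=t): {True: 0.6, False: 0.4}
```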
Exact Inference in BNs
Want to answer questions like: what is the probability of one of the variables, given values for some (not necessarily all) of the others?
E.g., in the alarm network, what is P(Burglary | JohnCalls = t, MaryCalls = t)?
According to our cheat sheet, this is

P(B = t | J = t, M = t) = \alpha \sum_{e \in Values(E)} \sum_{a \in Values(A)} P(B = t, J = t, M = t, E = e, A = a)
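A brute-force sketch of this query, reusing the hypothetical dict-based burglary network from the earlier sketch (repeated here so the snippet runs on its own):

```python
from itertools import product

parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
cpt = {"B": {(): 0.001}, "E": {(): 0.002},
       "A": {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001},
       "J": {(True,): 0.90, (False,): 0.05},
       "M": {(True,): 0.70, (False,): 0.01}}

def joint_prob(assignment):
    p = 1.0
    for var, val in assignment.items():
        p_true = cpt[var][tuple(assignment[u] for u in parents[var])]
        p *= p_true if val else 1.0 - p_true
    return p

def enumerate_query(b_val):
    # Sum the joint over all values of the hidden variables E and A.
    return sum(joint_prob({"B": b_val, "J": True, "M": True, "E": e, "A": a})
               for e, a in product([True, False], repeat=2))

unnorm = {b: enumerate_query(b) for b in (True, False)}
alpha = 1.0 / sum(unnorm.values())
print(alpha * unnorm[True])   # roughly 0.284, the book's answer
```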
Variable Elimination
Works by summing out (eliminating) uninteresting variables one at a time, doing so in a smart way to reuse computation
... not going to cover it (or any other exact algorithm) in detail ...
... because in the worst case, exact inference in BNs is an NP-hard problem
Special cases are tractable
Approximate Inference in BNs
Based on sampling
Will use this example from the Russell and Norvig AI Book:
[Figure: the Cloudy/Sprinkler/Rain/WetGrass network]
Direct Sampling
First, assume the values of none of the variables are given as evidence; e.g., want P(WetGrass)
How to generate samples from a BN? Take the generative approach, analogous to how we did it in Naive Bayes
Simple sampling procedure:
- Generate N = large number of samples from the BN
- P(WetGrass) is computed as the number of samples in which WetGrass was t, out of all generated samples
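A minimal sketch of direct (ancestral) sampling for this network: sample each variable in topological order, conditioning on its already-sampled parents. The CPT values are the standard ones from the Russell and Norvig example; `sample_once` is an illustrative name.

```python
import random

def sample_once():
    # Sample parents before children, each from its CPT row.
    c = random.random() < 0.5                      # P(Cloudy) = 0.5
    s = random.random() < (0.1 if c else 0.5)      # P(Sprinkler=t | Cloudy)
    r = random.random() < (0.8 if c else 0.2)      # P(Rain=t | Cloudy)
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.0}[(s, r)]
    w = random.random() < p_w                      # P(WetGrass=t | S, R)
    return c, s, r, w

N = 100_000
hits = sum(1 for _ in range(N) if sample_once()[3])
print(hits / N)  # estimate of P(WetGrass=t); exact value is about 0.647
```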
Rejection Sampling
What if we have observed some of the variables as evidence? I.e., want P(WetGrass | Cloudy)
- Take the same approach as before, but reject samples in which Cloudy is not set to true
- Compute counts based on just the non-rejected samples
Problem: we may end up rejecting most of our samples!
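A sketch of rejection sampling for P(WetGrass | Cloudy = t), reusing the `sample_once` sampler from the previous sketch (repeated so the snippet stands alone):

```python
import random

def sample_once():
    c = random.random() < 0.5
    s = random.random() < (0.1 if c else 0.5)
    r = random.random() < (0.8 if c else 0.2)
    w = random.random() < {(True, True): 0.99, (True, False): 0.90,
                           (False, True): 0.90, (False, False): 0.0}[(s, r)]
    return c, s, r, w

# Keep only samples consistent with the evidence Cloudy = t.
kept = [smp for smp in (sample_once() for _ in range(100_000)) if smp[0]]
print(sum(smp[3] for smp in kept) / len(kept))  # estimate; exact value about 0.745
print(len(kept))  # only about half the samples survive; rarer evidence keeps far fewer
```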
MCMC Sampling
Rather than generating each sample from scratch, why not use the previous sample to generate the next one?
MCMC Basic Idea
- Set observed variables to their observed values
- Set unobserved variables to random values
- Begin sampling: generate a new value for an unobserved variable X, given the currently set values of the variables in its Markov blanket
Sampling Given the Markov Blanket
We only know how to sample a value for X given its parents' values
However, MB(X) includes X's parents, children, and the other parents of its children
So, how to compute P(X | MB(X))?

P(x | MB(X)) = \alpha P(x | Parents(X)) \prod_{Y \in Children(X)} P(y | Parents(Y))
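Putting the MCMC loop and the Markov-blanket formula together, here is a sketch of Gibbs sampling for P(Rain | Sprinkler = t, WetGrass = t) in the example network. The CPT values are again from the Russell and Norvig example; function names are illustrative, and burn-in is skipped for simplicity.

```python
import random

P_S = {True: 0.1, False: 0.5}    # P(Sprinkler=t | Cloudy)
P_R = {True: 0.8, False: 0.2}    # P(Rain=t | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}   # P(WetGrass=t | S, R)

def p_cloudy_given_mb(s, r):
    # P(c | MB) proportional to P(c) * P(s | c) * P(r | c);
    # WetGrass is not in Cloudy's blanket.
    def unnorm(c):
        return 0.5 * (P_S[c] if s else 1 - P_S[c]) * (P_R[c] if r else 1 - P_R[c])
    t, f = unnorm(True), unnorm(False)
    return t / (t + f)

def p_rain_given_mb(c, s, w):
    # P(r | MB) proportional to P(r | c) * P(w | s, r):
    # Rain's parent Cloudy, child WetGrass, and co-parent Sprinkler.
    def unnorm(r):
        pw = P_W[(s, r)]
        return (P_R[c] if r else 1 - P_R[c]) * (pw if w else 1 - pw)
    t, f = unnorm(True), unnorm(False)
    return t / (t + f)

s, w = True, True                                    # evidence, held fixed
c, r = random.random() < 0.5, random.random() < 0.5  # random initial state
rain_count = 0
for _ in range(100_000):
    # Resample each unobserved variable given its Markov blanket.
    c = random.random() < p_cloudy_given_mb(s, r)
    r = random.random() < p_rain_given_mb(c, s, w)
    rain_count += r
print(rain_count / 100_000)  # estimate of P(Rain=t | s, w); exact value about 0.32
```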