Introduction to Bayesian Networks
Ying Wu
Electrical Engineering and Computer Science
Northwestern University, Evanston, IL 60208
http://www.eecs.northwestern.edu/~yingwu
Outline
- Basic Concepts
- Bayesian Networks
- Inference
Example 1: Icy Road (from An Introduction to Bayesian Networks by F. Jensen)
Police Inspector Smith is waiting for Mr Holmes and Dr Watson, who are late for their appointment. Both of them are bad drivers. Smith wonders if the road is icy, as it is snowing. Smith's secretary enters and tells him that Watson has had a car accident. Smith is afraid that Holmes has probably crashed too, as the road is icy. The secretary says the road is salted and not icy at all. Smith decides to wait 10 more minutes for Holmes rather than leaving for lunch.
The Uncertainty Reasoning
The network is H ← I → W, where I: icy road; H: Holmes crashes; W: Watson crashes.
The uncertainty on W influences the uncertainty on H. When Smith is told that Watson has had a car accident, how does he do the reasoning, i.e., p(H | W)? When the secretary tells him the road is not icy, how does he do the reasoning, i.e., p(H | W, I)?
Conditional Independence
Consider the diverging network y1 ← x → y2. When x is not known, y1 and y2 are dependent: p(y1, y2) ≠ p(y1)p(y2); the uncertainties on y1 and y2 influence each other. When x is given, y1 and y2 are independent: p(y1 | x, y2) = p(y1 | x), p(y2 | x, y1) = p(y2 | x), and hence p(y1, y2 | x) = p(y1 | x)p(y2 | x). This is called conditional independence.
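To make this concrete, here is a minimal Python sketch checking both claims numerically (the variable names are ours; the numbers are the icy-road CPTs given later in these slides):

```python
# Diverging network y1 <- x -> y2, with p(x=y) = 0.7,
# p(y_i=y | x=y) = 0.8, p(y_i=y | x=n) = 0.1.
p_x = {"y": 0.7, "n": 0.3}                 # prior p(x)

def p_y(v, x):                             # p(y_i = v | x), same CPT for y1, y2
    p = 0.8 if x == "y" else 0.1
    return p if v == "y" else 1 - p

def joint(x, y1, y2):                      # p(x, y1, y2) = p(x) p(y1|x) p(y2|x)
    return p_x[x] * p_y(y1, x) * p_y(y2, x)

# Marginally dependent: p(y1=y, y2=y) != p(y1=y) p(y2=y).
p_y1y2 = sum(joint(x, "y", "y") for x in p_x)              # 0.451
p_y1 = sum(joint(x, "y", y2) for x in p_x for y2 in "yn")  # 0.59
print(p_y1y2, p_y1 * p_y1)                 # 0.451 vs 0.3481

# Conditionally independent: p(y1, y2 | x) = p(y1 | x) p(y2 | x) for each x.
for x in p_x:
    print(joint(x, "y", "y") / p_x[x], p_y("y", x) * p_y("y", x))
```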
Uncertainty Propagation and Blocking
[Figure: the network y1 ← x → y2, shown twice: uncertainty propagating when x is unobserved, and blocking at x when it is observed.]
The evidence on y2 changes the uncertainty on y2, i.e., p(y2). This change is propagated through the network, i.e., p(x | y2) and p(y1 | y2), until it is blocked at a node that has evidence (i.e., is observed): p(y1 | y2, x) = p(y1 | x).
Example 2: Wet Grass (from An Introduction to Bayesian Networks by F. Jensen)
The network is W ← R → H ← S, where H: Holmes's lawn is wet; W: Watson's lawn is wet; R: rain; S: the sprinkler was on.
Holmes finds that his lawn is wet (H). It may be due to rain (R), or he may have forgotten to turn off the sprinkler (S). What does he do in reasoning? He then notices that his neighbor, Watson, also has a wet lawn (W). What does he do in reasoning?
Conditional Dependency
Consider the converging network x1 → y ← x2. When y is not known, x1 and x2 are independent: p(x1, x2) = p(x1)p(x2). When y is observed and given, x1 and x2 are no longer independent: p(x1, x2 | y) ≠ p(x1 | y)p(x2 | y). This is called conditional dependency.
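A quick numeric illustration of this "explaining away" effect, using the wet-grass CPTs given later in these slides (x1 = R, x2 = S, y = H; the code layout is ours):

```python
# Converging network x1 -> y <- x2: independent a priori, dependent given y.
p_R = {"y": 0.2, "n": 0.8}
p_S = {"y": 0.1, "n": 0.9}
p_H_given = {("y", "y"): 1.0, ("y", "n"): 1.0,  # p(H=y | R, S)
             ("n", "y"): 0.9, ("n", "n"): 0.0}

def joint(r, s, h):   # p(R, S, H) = p(R) p(S) p(H | R, S)
    p_h = p_H_given[(r, s)]
    return p_R[r] * p_S[s] * (p_h if h == "y" else 1 - p_h)

# Condition on H = y and compare p(R, S | H) with p(R | H) p(S | H).
p_Hy = sum(joint(r, s, "y") for r in p_R for s in p_S)
p_RS = joint("y", "y", "y") / p_Hy                   # p(R=y, S=y | H=y)
p_Ry = sum(joint("y", s, "y") for s in p_S) / p_Hy   # p(R=y | H=y)
p_Sy = sum(joint(r, "y", "y") for r in p_R) / p_Hy   # p(S=y | H=y)
print(p_RS, p_Ry * p_Sy)   # 0.074 vs 0.249: dependent given H
```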
Propagation
[Figure: a network in which x1 and x2 converge on y1, and x1 has another child y2.]
When y1 is known, it influences the rest of the network, i.e., p(x1 | y1), p(x2 | y1), p(y2 | y1). When y1 is known, p(x1 | y1) and p(x2 | y1) are dependent. When y2 is then known, it influences x1, which in turn influences x2.
Exercise: Earthquake or Burglary (from An Introduction to Bayesian Networks by F. Jensen)
The network is B → A ← E → R, where A: the burglary alarm has gone off; B: burglary; E: earthquake; R: radio report.
Holmes is told that the burglary alarm in his house has gone off. How does he reason as he rushes back home? The radio then reports a small earthquake. How does this influence his reasoning?
Three Types of Connections
[Figure: three small networks around a connection node E: sequential A → E → B, diverging B ← E → C, and converging A → E ← C.]
Sequential connection: evidence is transmitted unless the state of the connection node is known. Diverging connection: evidence is transmitted unless the connection node is known. Converging connection: evidence is transmitted only if (1) the connection node is known, or (2) its descendants receive evidence.
d-separation
Two nodes are d-separated if, on every path between them, there is an intermediate node such that either (1) its connection is sequential or diverging and its state is known, or (2) its connection is converging and neither it nor its descendants have received evidence. In other words, evidence cannot be transmitted between the two nodes.
Let's examine the following example: [Figure: an example network over nodes A through H.] The neighbors of E are all known. Are E and F d-separated?
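As an illustration, here is a brute-force sketch of this test (all names are ours; practical systems use the Bayes-ball algorithm rather than path enumeration), applied to the earthquake/burglary network from the exercise:

```python
# Enumerate undirected paths and apply the slide's blocking rules.
# Network: B -> A <- E, E -> R (earthquake/burglary exercise).
parents = {"A": {"B", "E"}, "B": set(), "E": set(), "R": {"E"}}
children = {n: {c for c, ps in parents.items() if n in ps} for n in parents}

def descendants(n):
    out = set()
    for c in children[n]:
        out |= {c} | descendants(c)
    return out

def paths(src, dst, visited=()):
    """All undirected paths from src to dst without repeated nodes."""
    if src == dst:
        yield visited + (dst,)
        return
    for nxt in parents[src] | children[src]:
        if nxt not in visited + (src,):
            yield from paths(nxt, dst, visited + (src,))

def d_separated(x, y, evidence):
    for path in paths(x, y):
        blocked = False
        for prev, node, nxt in zip(path, path[1:], path[2:]):
            if prev in parents[node] and nxt in parents[node]:
                # Converging: blocks unless node or a descendant has evidence.
                if node not in evidence and not (descendants(node) & evidence):
                    blocked = True
            elif node in evidence:
                # Sequential or diverging: blocks when the node is known.
                blocked = True
        if not blocked:
            return False
    return True

print(d_separated("B", "R", set()))   # True: the converging node A blocks
print(d_separated("B", "R", {"A"}))   # False: evidence on A opens the path
```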
Markov Blanket
The Markov blanket of a variable A consists of: the parents of A, the children of A, and the variables that share a child with A. If all variables in the Markov blanket of A are known, then A is d-separated from the rest of the network.
Let's see the example again: [Figure: the same example network over nodes A through H.] What is the Markov blanket of E?
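A minimal sketch of computing a Markov blanket from a parents table (the representation and names are ours), again on the earthquake/burglary network:

```python
# Markov blanket = parents + children + co-parents (nodes sharing a child).
parents = {"A": {"B", "E"}, "B": set(), "E": set(), "R": {"E"}}

def markov_blanket(node):
    kids = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in kids for p in parents[c]} - {node}
    return parents[node] | kids | co_parents

print(markov_blanket("E"))   # {'A', 'B', 'R'}: children A, R and co-parent B
```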
Outline
- Basic Concepts
- Bayesian Networks
- Inference
Definition
A Bayesian network consists of the following: a set of random variables (nodes) and a set of directed links representing the conditional probabilities, forming a directed acyclic graph (DAG). Each node A with parents B1, ..., Bn is associated with a conditional probability p(A | B1, ..., Bn).
The Chain Rule
The statistical property of a Bayesian network is completely characterized by the joint distribution of all the nodes; marginals are obtained by integration and Bayes' rule. The nice property of a Bayesian network is the factorization of this large joint distribution. Suppose the BN has X = {x1, ..., xn}; then
p(X) = p(x1, ..., xn) = ∏_{i=1}^{n} p(xi | pa(xi)),
where pa(xi) denotes the parents of xi. This can be proved by induction.
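As an illustration, a minimal sketch of this factorization in code, using the icy-road network I → H, I → W (the CPT numbers are those on the next slide; the representation is ours):

```python
# Each node stores its parent list and a CPT; the joint is the product
# of the local conditionals p(x_i | pa(x_i)).
parents = {"I": (), "H": ("I",), "W": ("I",)}
cpt = {
    "I": lambda v, pa: {"y": 0.7, "n": 0.3}[v],
    "H": lambda v, pa: {("y", "y"): 0.8, ("y", "n"): 0.1,
                        ("n", "y"): 0.2, ("n", "n"): 0.9}[(v, pa[0])],
    "W": lambda v, pa: {("y", "y"): 0.8, ("y", "n"): 0.1,
                        ("n", "y"): 0.2, ("n", "n"): 0.9}[(v, pa[0])],
}

def joint(assignment):
    """p(assignment) as the chain-rule product over all nodes."""
    p = 1.0
    for node, value in assignment.items():
        pa = tuple(assignment[q] for q in parents[node])
        p *= cpt[node](value, pa)
    return p

print(joint({"I": "y", "H": "y", "W": "y"}))   # 0.7 * 0.8 * 0.8 = 0.448
```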
Example 1 revisited
Recall the network H ← I → W (I: icy road; H: Holmes crashes; W: Watson crashes). The priors are p(I = y) = 0.7, p(I = n) = 0.3. The conditional distributions are p(H | I) and p(W | I):

p(H | I)   I=y   I=n
H=y        0.8   0.1
H=n        0.2   0.9

p(W | I)   I=y   I=n
W=y        0.8   0.1
W=n        0.2   0.9
Example 1 revisited
When W is observed, i.e., W = y or W = n, how does this evidence propagate? Let's work out p(I | W = y) and p(H | W = y). When I is observed, i.e., I = y or I = n, how does this evidence propagate?
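A minimal sketch of the propagation asked for above, worked out with Bayes' rule on the CPTs from the previous slide (variable names are ours):

```python
p_I = {"y": 0.7, "n": 0.3}
p_W_given_I = {"y": 0.8, "n": 0.1}   # p(W=y | I)
p_H_given_I = {"y": 0.8, "n": 0.1}   # p(H=y | I)

# p(I | W=y) is proportional to p(W=y | I) p(I).
unnorm = {i: p_W_given_I[i] * p_I[i] for i in p_I}
z = sum(unnorm.values())
p_I_given_Wy = {i: v / z for i, v in unnorm.items()}
print(p_I_given_Wy)   # I=y: 0.56/0.59, about 0.949

# p(H=y | W=y) = sum_I p(H=y | I) p(I | W=y)
p_Hy = sum(p_H_given_I[i] * p_I_given_Wy[i] for i in p_I)
print(p_Hy)           # about 0.764: Watson's crash raises Holmes's risk
```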
Example 2 revisited
The priors are p(R) = (0.2, 0.8), p(S) = (0.1, 0.9). The conditionals are:

p(W | R)   R=y   R=n
W=y        1     0.2
W=n        0     0.8

p(H | R, S), with entries (p(H=y), p(H=n)):
           R=y     R=n
S=y        (1, 0)  (0.9, 0.1)
S=n        (1, 0)  (0, 1)
Outline
- Basic Concepts
- Bayesian Networks
- Inference
Inference in Bayesian Networks
The nodes can be divided into two categories: Xh, which are unknown or hidden, and Xe, which receive evidence and are known. Inference is to estimate the hidden Xh based on the known Xe, i.e., to find p(Xh | Xe). It is simply
p(Xh | Xe) ∝ p(X) = p(Xh, Xe) = p(Xe | Xh)p(Xh)
Once the BN is specified, the information is complete; it is just a matter of marginalization, and the structure of the BN makes this marginalization process easier. We call p(xi | Xe), for xi in Xh, the belief of xi.
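A minimal sketch of inference by brute-force marginalization, reusing the parents table and joint() from the chain-rule sketch above (the infer() helper is ours): enumerate all assignments of the hidden nodes, weight each by the joint, and normalize.

```python
from itertools import product

def infer(query, evidence, nodes=("I", "H", "W"), states="yn"):
    """Return the belief p(query | evidence) as a dict over states."""
    belief = {s: 0.0 for s in states}
    hidden = [n for n in nodes if n not in evidence and n != query]
    for qv in states:
        for hv in product(states, repeat=len(hidden)):
            assignment = dict(evidence, **dict(zip(hidden, hv)), **{query: qv})
            belief[qv] += joint(assignment)   # sum p(Xh, Xe) over hidden values
    z = sum(belief.values())
    return {s: v / z for s, v in belief.items()}

print(infer("H", {"W": "y"}))   # about {'y': 0.764, 'n': 0.236}
```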
Sum-Product
Let's use Example 1: p(H | W) ∝ Σ_I p(I, H, W) = Σ_I p(H | I)p(W | I)p(I). H receives a message propagated from W through I: W gives I the message p(I | W); I combines it with p(I) and p(H | I) (product); I is the node at which all the messages are integrated (sum).
Sum-Product
Let's use Example 2: p(R | W, H) ∝ p(W | R)p(R) Σ_S p(H | R, S)p(S). R receives three things (product): its prior p(R), the message combined at W, and the message combined at S (d-connected). Another one: p(S | W, H) ∝ p(S) Σ_R p(W | R)p(H | R, S)p(R). S receives two things (product): its local prior p(S) and the message combined at R (sum over R).
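A minimal sketch of these two sum-product computations, using the wet-grass CPTs from Example 2 with evidence W = y, H = y (names are ours):

```python
p_R = {"y": 0.2, "n": 0.8}
p_S = {"y": 0.1, "n": 0.9}
p_W_given_R = {"y": 1.0, "n": 0.2}              # p(W=y | R)
p_H_given = {("y", "y"): 1.0, ("y", "n"): 1.0,  # p(H=y | R, S)
             ("n", "y"): 0.9, ("n", "n"): 0.0}

# p(R | W=y, H=y) ~ p(W=y | R) p(R) * sum_S p(H=y | R, S) p(S)
b_R = {r: p_W_given_R[r] * p_R[r] * sum(p_H_given[(r, s)] * p_S[s] for s in p_S)
       for r in p_R}
z = sum(b_R.values())
print({r: v / z for r, v in b_R.items()})   # R=y: about 0.933, the likely cause

# p(S | W=y, H=y) ~ p(S) * sum_R p(W=y | R) p(H=y | R, S) p(R)
b_S = {s: p_S[s] * sum(p_W_given_R[r] * p_H_given[(r, s)] * p_R[r] for r in p_R)
       for s in p_S}
z = sum(b_S.values())
print({s: v / z for s, v in b_S.items()})   # S=y: about 0.160, explained away
```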