Probabilistic Artificial Intelligence — Problem Set 3 — Oct 27, 2017

1. Variable elimination

In this exercise you will use variable elimination to perform inference on a Bayesian network. Consider the network in Figure 1 and its corresponding conditional probability tables (CPTs).

Figure 1: Bayesian network for problem 1 (variables A, B, C, D, E, F; B has parents A and C, D has parent C, E has parent B, and F has parents D and E).

Table 1: CPTs for problem 1.

    P(A = t) = 0.3    (1)
    P(C = t) = 0.6    (2)

(a) P(B | A, C):

    A  C  |  P(B = t)
    f  f  |  0.2
    t  f  |  0.8
    f  t  |  0.3
    t  t  |  0.5

(b) P(D | C):

    C  |  P(D = t)
    f  |  0.9
    t  |  0.75

(c) P(E | B):

    B  |  P(E = t)
    f  |  0.2
    t  |  0.4

(d) P(F | D, E):

    D  E  |  P(F = t)
    f  f  |  0.95
    t  f  |  1
    f  t  |  0
    t  t  |  0.25
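As a sanity check for your hand computations, the network is small enough (six binary variables) that the queries of problem 1 can be verified by brute-force enumeration of the joint distribution. The sketch below assumes the CPT encoding shown in Table 1 above; the helper names `bern`, `joint`, and `query_A` are made up for illustration and are not part of any provided code.

```python
from itertools import product

# CPTs from Table 1 (f = False, t = True).
P_A = {True: 0.3, False: 0.7}
P_C = {True: 0.6, False: 0.4}
P_B = {(False, False): 0.2, (True, False): 0.8,
       (False, True): 0.3, (True, True): 0.5}   # keyed by (A, C)
P_D = {False: 0.9, True: 0.75}                   # keyed by C
P_E = {False: 0.2, True: 0.4}                    # keyed by B
P_F = {(False, False): 0.95, (True, False): 1.0,
       (False, True): 0.0, (True, True): 0.25}   # keyed by (D, E)

def bern(p, x):
    """P(X = x) for a binary X with P(X = True) = p."""
    return p if x else 1.0 - p

def joint(a, b, c, d, e, f):
    """Full joint: P(A) P(C) P(B|A,C) P(D|C) P(E|B) P(F|D,E)."""
    return (bern(P_A[True], a) * bern(P_C[True], c)
            * bern(P_B[(a, c)], b) * bern(P_D[c], d)
            * bern(P_E[b], e) * bern(P_F[(d, e)], f))

def query_A(b, d):
    """P(A = t | B = b, D = d), summing out C, E, F by enumeration."""
    num = den = 0.0
    for a, c, e, f in product([False, True], repeat=4):
        p = joint(a, b, c, d, e, f)
        den += p
        if a:
            num += p
    return num / den
```

For example, `query_A(True, False)` gives the answer to the first query; comparing it against your variable-elimination result is a quick way to catch arithmetic slips in the intermediate factor tables.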
Assuming a query on A with evidence for B and D, i.e. P(A | B, D), use the variable elimination algorithm to answer the following queries. Make explicit the selected ordering of the variables and compute the probability tables of the intermediate factors.

1. P(A = t | B = t, D = f)
2. P(A = f | B = f, D = f)
3. P(A = t | B = t, D = t)

Consider now the ordering C, E, F, D, B, A; use the variable elimination algorithm again and write down the intermediate factors, this time without computing their probability tables. Is this ordering better or worse than the one you used before? Why?

2. Belief propagation

In this exercise, you will implement the belief propagation algorithm for performing inference in Bayesian networks. As you have seen in the class lectures, the algorithm is based on converting the Bayesian network to a factor graph and then passing messages between variable and factor nodes of that graph until convergence. You are provided some skeleton Python code in the .zip file accompanying this document. Take the following steps for this exercise.

1. Install the Python dependencies listed in README.txt, if your system does not already satisfy them. After that, you should be able to run demo.py and produce some plots, albeit wrong ones for now.

2. Implement the missing code in bprop.py marked with TODO. In particular, you have to fill in parts of the two functions that are responsible for sending messages from variable to factor nodes and vice versa, as well as parts of the function that returns the resulting marginal distribution of a variable node after message passing has terminated.

3. Now, set up the full-fledged earthquake network, whose structure was introduced in Problem Set 2 and is shown again in Figure 2. Here is the story behind this network: While Fred is commuting to work, he receives a phone call from his neighbor saying that the burglar alarm in Fred's house is ringing. Upon hearing this, Fred immediately turns around to get back and check his home.
A few minutes on his way back, however, he hears on the radio that there was an earthquake near his home earlier that day. Relieved by the news, he turns around again and continues his way to work.

To build up the conditional probability tables (CPTs) for the network of Figure 2, you may make the following assumptions about the variables involved:

- All variables in the network are binary.
- As can be seen from the network structure, burglaries and earthquakes are assumed to be independent. Furthermore, each of them is assumed to occur with probability 0.1%.
Figure 2: The earthquake network to be implemented (variables: Earthquake, Burglar, Radio, Alarm, Phone).

- The alarm is triggered in the following ways: (1) when a burglar enters the house, the alarm will ring 99% of the time; (2) when an earthquake occurs, there will be a false alarm 1% of the time; (3) the alarm might go off due to other causes (wind, rain, etc.) 0.1% of the time. These three types of causes are assumed to be independent of each other.
- The neighbor is assumed to call only when the alarm is ringing, but only does so 70% of the time when it is actually ringing.
- The radio is assumed to never falsely report an earthquake, but it might fail to report an earthquake that actually happened 50% of the time. (This includes the times that Fred fails to listen to the announcement.)

4. After having set up the network and its CPTs, answer the following questions using your belief propagation implementation:

(a) Before Fred gets the neighbor's call, what is the probability of a burglary having occurred? What is the probability of an earthquake having occurred?
(b) How do these probabilities change after Fred receives the neighbor's phone call?
(c) How do these probabilities change after Fred listens to the news on the radio?

3. Belief propagation on tree factor graphs*

In this exercise we will prove that the belief propagation algorithm converges to the exact marginals after a fixed number of iterations, given that the factor graph is a tree, that is, given that the original Bayesian network is a polytree. We will assume that the factor graph contains no single-variable factors. (You have already seen that if those exist, they can easily be incorporated into multi-variable factors without increasing the complexity of the algorithm.) Since the factor graph is a tree, we will designate
a variable node, say a, as the root of the tree.

Figure 3: An example factor graph and two of its subtrees (variable nodes u, v1, v2, w, r1, r2; factor nodes g, h; the subtrees shown are rooted under v1 and u).

We will consider subtrees T_[r→t], where r and t are adjacent nodes in the factor graph and t is closer to the root than r, which are of two types:

- if r is a factor node (and t a variable node), then T_[r→t] denotes the subtree that has t as its root, contains the whole subtree under r and, additionally, the edge {r, t};
- if r is a variable node (and t a factor node), then T_[r→t] denotes the whole subtree under r, with r as its root.

See Figure 3 for two example subtrees, one of each type. The depth of a tree is defined as the maximum distance between the root and any other node. Note that both types of subtrees T_[r→t] defined above always have depths that are even numbers.

We will use the subscript notation [r→t] to refer to quantities constrained to the subtree T_[r→t]. In particular, we denote by F_[r→t] the set of factors in the subtree and by P_[r→t](x_v) the marginal distribution of v when we only consider the nodes of the subtree. More concretely, if r is a variable node, by the sum rule we get

    P_[r→t](x_r) ∝ Σ_{x_[r→t] \ {x_r}} Π_{f ∈ F_[r→t]} f(x_f),    (1)

where ∝ denotes equality up to a normalization constant. Remember the form of the messages passed between variable and factor nodes at each iteration
of the algorithm:

    µ_{v→f}^{(t+1)}(x_v) := Π_{g ∈ N(v) \ {f}} µ_{g→v}^{(t)}(x_v),    (2)

    µ_{f→v}^{(t+1)}(x_v) := Σ_{x_f \ {x_v}} f(x_f) Π_{w ∈ N(f) \ {v}} µ_{w→f}^{(t)}(x_w).    (3)

We also define the estimated marginal distribution of variable v at iteration t as

    P̂^{(t)}(x_v) ∝ Π_{g ∈ N(v)} µ_{g→v}^{(t)}(x_v).    (4)

Our ultimate goal is to show that the estimated marginals are equal to the true marginals for all variables after a number of iterations. However, we will first consider the rooted version of the factor graph and show that the previous statement holds for the root node a. More concretely, if we denote variable nodes by v and factor nodes by f, we will show using induction that, for all subtrees T_[f→v] of depth τ, it holds that, for all t ≥ τ,

    µ_{f→v}^{(t)}(x_v) ∝ P_[f→v](x_v).    (5)

Figure 4: The three cases considered in the proof by induction: (a) base case; (b) first induction step; (c) second induction step.

(i) Consider the base case of T_[f→v] being a subtree of depth τ = 2 (see Figure 4a). Show that (5) holds in this case for all t ≥ 2.

(ii) Now, assume that (5) holds for all subtrees of depth τ. As a first step, show that for any subtree T_[v→f] of depth τ (see Figure 4b) it holds that, for all t ≥ τ + 1,

    µ_{v→f}^{(t)}(x_v) ∝ P_[v→f](x_v).

(iii) Using the result of the previous step, show that, for any subtree T_[f→v] of depth τ' = τ + 2, (5) holds for all t ≥ τ' (see Figure 4c).
(iv) Show that, if the factor graph rooted at a has depth d, then, for all t ≥ d,

    P̂^{(t)}(x_a) ∝ P(x_a).

(v) To generalize the statement above, assume that the factor graph has diameter D, i.e., the maximum distance between any two nodes in the graph is D. Show that for all t ≥ D the estimated marginal of any variable v is exact, that is,

    P̂^{(t)}(x_v) ∝ P(x_v).
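To see the claim of (iv) and (v) concretely before attempting the proof, here is a small numerical sketch (independent of the provided skeleton code; the chain x1 — f — x2 — g — x3 and its factor tables are made up for illustration). This tree-structured factor graph has diameter D = 4, and synchronous message passing per equations (2)–(4), started from all-ones messages, reproduces the exact marginals after D iterations:

```python
import numpy as np

# A tiny tree factor graph over three binary variables:
#   x1 -- f -- x2 -- g -- x3   (diameter D = 4)
# Factor tables: arbitrary positive values, chosen for illustration.
f = np.array([[1.0, 2.0],
              [3.0, 1.0]])    # f[x1, x2]
g = np.array([[2.0, 1.0],
              [1.0, 4.0]])    # g[x2, x3]

# Exact marginals by brute force over the joint f(x1,x2) g(x2,x3).
joint = np.einsum('ij,jk->ijk', f, g)
joint /= joint.sum()
exact = [joint.sum(axis=(1, 2)), joint.sum(axis=(0, 2)), joint.sum(axis=(0, 1))]

# All messages initialized to 1, as in the algorithm.
msgs = {key: np.ones(2) for key in
        ['x1->f', 'f->x1', 'x2->f', 'f->x2',
         'x2->g', 'g->x2', 'x3->g', 'g->x3']}

def step(m):
    n = {}
    # Eq. (2): variable -> factor, product of the other incoming messages.
    n['x1->f'] = np.ones(2)          # x1 and x3 are leaves: empty product
    n['x3->g'] = np.ones(2)
    n['x2->f'] = m['g->x2'].copy()
    n['x2->g'] = m['f->x2'].copy()
    # Eq. (3): factor -> variable, sum out the factor's other variable.
    n['f->x1'] = f @ m['x2->f']
    n['f->x2'] = f.T @ m['x1->f']
    n['g->x2'] = g @ m['x3->g']
    n['g->x3'] = g.T @ m['x2->g']
    return n

for _ in range(4):                   # t = D = 4 iterations suffice
    msgs = step(msgs)

def marginal(incoming):
    p = np.ones(2)
    for m in incoming:
        p = p * m
    return p / p.sum()               # eq. (4), normalized

est = [marginal([msgs['f->x1']]),
       marginal([msgs['f->x2'], msgs['g->x2']]),
       marginal([msgs['g->x3']])]

for e, p in zip(exact, est):
    assert np.allclose(e, p)
```

Tracing the loop also illustrates the induction structure: the leaf messages are correct from the start (empty products), and correctness propagates one layer toward the interior with each iteration, exactly as steps (i)–(iii) formalize.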