Recall from last time

Lecture 3: Conditional independence and graph structure
COMP-526 Lecture 3, January 8, 2006

- Conditional independencies implied by a belief network
- Independence maps (I-maps)
- Factorization theorem
- The Bayes ball algorithm and d-separation

Bayesian networks are a graphical model representing conditional independence relations:
- The nodes of the graph represent random variables.
- Each node has associated with it a conditional probability distribution for the corresponding r.v., given its parents.
- The lack of edges represents conditional independence assumptions. The presence of edges does not represent dependence.

Example: Bayesian (belief) network

The alarm network: B -> A, E -> A, A -> C, E -> R (in the standard reading, B = burglary, E = earthquake, A = alarm, C = phone call, R = radio report).

- The nodes represent random variables.
- The arcs represent influences.
- At each node, we have a conditional probability distribution (CPD) for the corresponding variable given its parents:

p(B = 1) = 0.01    p(E = 1) = 0.0001

p(A = 1 | B, E):  B = 0, E = 0: 0.001;  B = 0, E = 1: 0.3;  B = 1, E = 0: 0.8;  B = 1, E = 1: 0.95

and similar tables for p(C | A) and p(R | E).

Factorization

Let G be a DAG over variables X_1, ..., X_n. We say that a joint probability distribution p factorizes according to G if p can be expressed as a product:

p(x_1, ..., x_n) = Π_{i=1}^n p(x_i | x_{π_i})

The individual factors p(x_i | x_{π_i}) are called local probabilistic models or conditional probability distributions (CPDs).
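A minimal sketch of this factorization for the alarm network, p(c, a, r, e, b) = p(b) p(e) p(a|b,e) p(c|a) p(r|e). The values used for p(C | A) and p(R | E) are illustrative assumptions, not the lecture's tables:

```python
# Compute the joint distribution of the alarm network as a product of
# local CPDs.  p(C|A) and p(R|E) values below are assumed, not from the
# lecture.

def bernoulli(p1, value):
    """Probability of `value` (0 or 1) when p(1) = p1."""
    return p1 if value == 1 else 1.0 - p1

P_B1 = 0.01                                   # p(B = 1)
P_E1 = 0.0001                                 # p(E = 1)
P_A1 = {(0, 0): 0.001, (0, 1): 0.3,           # p(A = 1 | B, E)
        (1, 0): 0.8, (1, 1): 0.95}
P_C1 = {0: 0.05, 1: 0.65}                     # p(C = 1 | A), assumed
P_R1 = {0: 0.005, 1: 0.7}                     # p(R = 1 | E), assumed

def joint(b, e, a, c, r):
    """p(b, e, a, c, r) as the product of the five local CPDs."""
    return (bernoulli(P_B1, b) * bernoulli(P_E1, e)
            * bernoulli(P_A1[(b, e)], a)
            * bernoulli(P_C1[a], c) * bernoulli(P_R1[e], r))

# Only 1 + 1 + 4 + 2 + 2 = 10 numbers are stored, yet they determine all
# 2^5 joint probabilities, which must sum to 1:
total = sum(joint(b, e, a, c, r)
            for b in (0, 1) for e in (0, 1) for a in (0, 1)
            for c in (0, 1) for r in (0, 1))
print(round(total, 10))  # -> 1.0
```

Note how few numbers are needed compared with the 2^5 - 1 = 31 required to tabulate the joint directly.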
I-maps

A directed acyclic graph (DAG) G whose nodes represent random variables X_1, ..., X_n is an I-map (independence map) of a distribution p if p satisfies the independence assumptions:

X_i ⊥ Nondescendants(X_i) | X_{π_i},  i = 1, ..., n

where π_i are the parents of X_i.

Example

Consider all possible DAG structures over 2 variables. Which graphs are I-maps for the following distribution?

x  y  p(x, y)
0  0  0.08
0  1  0.32
1  0  0.32
1  1  0.28

What about the following distribution?

x  y  p(x, y)
0  0  0.08
0  1  0.12
1  0  0.32
1  1  0.48

Factorization theorem

G is an I-map of p if and only if p factorizes according to G:

p(x_1, ..., x_n) = Π_{i=1}^n p(x_i | x_{π_i}),  ∀ x_i ∈ Ω_i

Proof: (=>; other direction in a bit.) Assume that G is an I-map for p. By the chain rule,

p(x_1, ..., x_n) = Π_{i=1}^n p(x_i | x_1, ..., x_{i-1})

Without loss of generality, we can order the variables x_i according to G (parents before children). From this assumption, π_i ⊆ {1, ..., i-1}. This means that {X_1, ..., X_{i-1}} = X_{π_i} ∪ Z, where Z ⊆ Nondescendants(X_i). Since G is an I-map, we have X_i ⊥ Nondescendants(X_i) | X_{π_i}, so:

p(x_i | x_1, ..., x_{i-1}) = p(x_i | z, x_{π_i}) = p(x_i | x_{π_i})

and the conclusion follows.

Factorization example

For the alarm network (B -> A, E -> A, A -> C, E -> R), the factorization theorem allows us to represent p(c, a, r, e, b) as:

p(c, a, r, e, b) = p(b) p(e) p(a | b, e) p(c | a) p(r | e)

instead of:

p(c, a, r, e, b) = p(b) p(e | b) p(a | e, b) p(c | a, e, b) p(r | a, e, c, b)
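The two-variable questions above can be settled by checking whether p(x, y) = p(x) p(y); a complete graph over two variables is always an I-map, and the empty graph is one exactly when x ⊥ y. A sketch:

```python
# Check whether x ⊥ y for a joint table over two binary variables.
# If so, the empty graph (and hence every graph) is an I-map; if not,
# only the complete graphs x -> y and x <- y are.

def is_independent(p, tol=1e-12):
    """True iff p(x, y) = p(x) p(y) for all entries of the table."""
    px = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}
    py = {y: sum(v for (_, b), v in p.items() if b == y) for y in (0, 1)}
    return all(abs(p[(x, y)] - px[x] * py[y]) <= tol
               for x in (0, 1) for y in (0, 1))

p1 = {(0, 0): 0.08, (0, 1): 0.32, (1, 0): 0.32, (1, 1): 0.28}
p2 = {(0, 0): 0.08, (0, 1): 0.12, (1, 0): 0.32, (1, 1): 0.48}

print(is_independent(p1))  # -> False: only the complete graphs are I-maps
print(is_independent(p2))  # -> True: the empty graph is also an I-map
```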
Complexity of factorized representations

- If k is the maximum number of parents of any node in the graph, and we have binary variables, then every conditional probability distribution will require at most 2^k numbers to specify.
- The whole joint distribution can then be specified with about n 2^k numbers, instead of 2^n.
- The savings are big if the graph is sparse (k << n).

Minimal I-maps

The fact that a DAG G is an I-map for a joint distribution p might not be very useful. E.g., complete DAGs (where all arcs that do not create a cycle are present) are I-maps for any distribution, because they do not imply any independencies.

A DAG G is a minimal I-map of p if:
1. G is an I-map of p, and
2. no proper subgraph G' of G (obtained by removing edges) is an I-map of p.

Constructing minimal I-maps

The factorization theorem suggests an algorithm:
1. Fix an ordering of the variables: X_1, ..., X_n.
2. For each i, select its parents π_i to be a minimal subset of {X_1, ..., X_{i-1}} such that X_i ⊥ ({X_1, ..., X_{i-1}} \ X_{π_i}) | X_{π_i}.

This will yield a minimal I-map.

Non-uniqueness of the minimal I-map

Unfortunately, a distribution can have many minimal I-maps, depending on the variable ordering we choose! The initial choice of variable ordering can have a big impact on the complexity of the minimal I-map. Example: for the alarm network, an ordering that begins with E, B recovers the original sparse graph, while an ordering that ends with E, B yields a much denser minimal I-map.

A good heuristic is to use causality in order to generate an ordering.
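The construction algorithm above can be sketched by brute force for small joint tables: for each variable in the ordering, take a smallest subset of its predecessors that screens it off from the remaining predecessors. The chain distribution used at the end is an illustrative example with assumed CPD values:

```python
# Minimal I-map construction from an ordering, by brute-force
# conditional-independence testing on a joint table.
from itertools import combinations

def marginal(p, keep):
    """Marginalize joint table p (assignment tuple -> prob) onto the
    sorted index list `keep`."""
    out = {}
    for x, pr in p.items():
        key = tuple(x[j] for j in keep)
        out[key] = out.get(key, 0.0) + pr
    return out

def cond_indep(p, i, rest, given, tol=1e-9):
    """True iff X_i ⊥ X_rest | X_given under p, checked via
    p(i, rest, given) * p(given) = p(i, given) * p(rest, given)."""
    given, rest = sorted(given), sorted(rest)
    all_idx = sorted([i] + rest + given)
    ig, rg = sorted([i] + given), sorted(rest + given)
    m_all, m_g = marginal(p, all_idx), marginal(p, given)
    m_ig, m_rg = marginal(p, ig), marginal(p, rg)
    for x, pr in m_all.items():
        d = dict(zip(all_idx, x))
        lhs = pr * m_g[tuple(d[j] for j in given)]
        rhs = (m_ig[tuple(d[j] for j in ig)]
               * m_rg[tuple(d[j] for j in rg)])
        if abs(lhs - rhs) > tol:
            return False
    return True

def minimal_imap(p, order):
    """Step 2 of the algorithm: for each variable, pick a smallest
    subset of its predecessors that screens it off from the rest."""
    parents = {}
    for idx, i in enumerate(order):
        preds = order[:idx]
        done = False
        for size in range(len(preds) + 1):       # smallest sets first
            for s in combinations(preds, size):
                rest = [v for v in preds if v not in s]
                if not rest or cond_indep(p, i, rest, list(s)):
                    parents[i] = set(s)
                    done = True
                    break
            if done:
                break
    return parents

# Illustrative joint for a chain X0 -> X1 -> X2 (assumed CPD numbers):
p = {}
for x0 in (0, 1):
    for x1 in (0, 1):
        for x2 in (0, 1):
            p0 = 0.3 if x0 else 0.7
            p1 = (0.9 if x1 else 0.1) if x0 else (0.2 if x1 else 0.8)
            p2 = (0.7 if x2 else 0.3) if x1 else (0.4 if x2 else 0.6)
            p[(x0, x1, x2)] = p0 * p1 * p2

print(minimal_imap(p, [0, 1, 2]))  # -> {0: set(), 1: {0}, 2: {1}}
```

With the causal ordering [0, 1, 2] the chain structure is recovered exactly; a different ordering can produce a different (and denser) minimal I-map.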
Implied independencies

The fact that a Bayes net is an I-map for a distribution implies a set of conditional independencies that always hold, and allows us to compute joint probabilities (and hence do inference) a lot faster in practice.

In practice, we also have evidence about the values of certain variables. Is there a way to state all the independence relations implied by a Bayes net?

A simple case: indirect connection

X -> Y -> Z

Think of X as the past, Y as the present and Z as the future: this is a simple Markov chain. We interpret the lack of an edge between X and Z as a conditional independence, X ⊥ Z | Y. Is this justified?

Based on the graph structure, we have:

p(x, y, z) = p(x) p(y | x) p(z | y)

Hence:

p(z | x, y) = p(x, y, z) / p(x, y) = [p(x) p(y | x) p(z | y)] / [p(x) p(y | x)] = p(z | y)

Note that the edges that are present do not imply dependence. But the edges that are missing do imply independence.

A more interesting case: common cause

X <- Y -> Z

Again, we interpret the lack of an edge between X and Z as X ⊥ Z | Y. Why is this true?

p(x, y, z) = p(y) p(x | y) p(z | y), so:

p(z | x, y) = p(x, y, z) / p(x, y) = [p(y) p(x | y) p(z | y)] / [p(y) p(x | y)] = p(z | y)

This is a hidden-variable scenario: if Y is unknown, then X and Z could appear to be dependent on each other.
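A numeric sanity check of the derivation above, using assumed CPD values: for a chain X -> Y -> Z, p(z | x, y) should equal p(z | y) for every assignment (the common-cause case can be checked the same way):

```python
# Verify X ⊥ Z | Y for a chain X -> Y -> Z with illustrative CPD values.

PX1 = 0.6                 # p(X = 1), assumed
PY1 = (0.2, 0.8)          # p(Y = 1 | X = 0), p(Y = 1 | X = 1), assumed
PZ1 = (0.3, 0.9)          # p(Z = 1 | Y = 0), p(Z = 1 | Y = 1), assumed

def joint(x, y, z):
    """p(x, y, z) = p(x) p(y | x) p(z | y)."""
    px = PX1 if x else 1 - PX1
    py = PY1[x] if y else 1 - PY1[x]
    pz = PZ1[y] if z else 1 - PZ1[y]
    return px * py * pz

for y in (0, 1):
    # p(z = 1 | y), marginalizing x out
    pz_given_y = (sum(joint(x, y, 1) for x in (0, 1))
                  / sum(joint(x, y, z) for x in (0, 1) for z in (0, 1)))
    for x in (0, 1):
        # p(z = 1 | x, y): the same value, whatever x is
        pz_given_xy = joint(x, y, 1) / sum(joint(x, y, z) for z in (0, 1))
        assert abs(pz_given_xy - pz_given_y) < 1e-12
print("verified: p(z | x, y) = p(z | y), i.e. X ⊥ Z | Y")
```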
The most interesting case: V-structure

X -> Y <- Z

In this case, the lacking edge between X and Z is a statement of marginal independence: X ⊥ Z. But once we know the value of Y, X and Z might depend on each other. E.g., suppose X and Z are independent coin flips, and Y is true if and only if both X and Z come up heads. Note that in this case, X is not independent of Z given Y! This is the case of explaining away.

Bayes ball algorithm

Suppose we want to decide whether X ⊥ Y | Z for a general Bayes net with corresponding graph G.
- We shade all nodes in the evidence set Z.
- We put balls at all the nodes in X, and we let them bounce around the graph according to rules inspired by the three base cases above. Note that the balls can go in any direction along an edge!
- If any ball reaches any node in Y, then the conditional independence assertion X ⊥ Y | Z is not true.

Base rules

- Head-to-tail: middle node unknown, path unblocked; middle node known, path blocked.
- Tail-to-tail: middle node unknown, path unblocked; middle node known, path blocked.
- Head-to-head: middle node unknown, path BLOCKED; middle node known, path UNBLOCKED.

d-separation

Suppose we want to show that a conditional independence relation X ⊥ Y | Z is implied by a DAG G, in which X, Y, Z are non-intersecting sets of nodes. A path is said to be blocked if it includes a node such that either:
1. the arrows in the path do not meet head-to-head at the node, and the node is in the conditioning set Z (this covers the head-to-tail and tail-to-tail cases), or
2. the arrows do meet head-to-head at the node, and neither the node nor its descendants are in Z.

If, given the conditioning set Z, all paths from any node in X to any node in Y are blocked, then X is d-separated (directed-separated) from Y given Z.
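The d-separation definition above translates directly into code for small graphs: enumerate every simple undirected path between the two nodes and test whether each path is blocked. A sketch, exercised on the alarm network:

```python
# Direct implementation of the d-separation definition: every path from
# x to y must be blocked given the evidence set z.

def descendants(dag, node):
    """All nodes reachable from `node` along directed edges."""
    out, stack = set(), [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in out:
                out.add(child)
                stack.append(child)
    return out

def d_separated(dag, x, y, z):
    """dag: dict node -> list of children.  True iff every undirected
    path from x to y is blocked given evidence set z."""
    nodes = set(dag) | {c for cs in dag.values() for c in cs}
    nbrs = {n: set(dag.get(n, ())) |
               {m for m in nodes if n in dag.get(m, ())} for n in nodes}

    def blocked(path):
        for i in range(1, len(path) - 1):
            a, b, c = path[i - 1], path[i], path[i + 1]
            head_to_head = b in dag.get(a, ()) and b in dag.get(c, ())
            if head_to_head:
                # rule 2: collider with no evidence at it or below it
                if b not in z and not (descendants(dag, b) & z):
                    return True
            elif b in z:
                return True   # rule 1: non-collider in the evidence set
        return False

    def paths(cur, path):
        if cur == y:
            yield path
            return
        for n in nbrs[cur]:
            if n not in path:
                yield from paths(n, path + [n])

    return all(blocked(p) for p in paths(x, [x]))

# Alarm network: B -> A <- E, A -> C, E -> R
dag = {'B': ['A'], 'E': ['A', 'R'], 'A': ['C'], 'C': [], 'R': []}
print(d_separated(dag, 'B', 'E', set()))   # -> True: marginally independent
print(d_separated(dag, 'B', 'E', {'A'}))   # -> False: explaining away
print(d_separated(dag, 'B', 'E', {'C'}))   # -> False: C is a descendant of A
```

Note how observing C, a descendant of the collider A, also unblocks the path B -> A <- E, exactly as rule 2 says.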
Example: The alarm network

Exercise: for the alarm network (B -> A <- E, A -> C, E -> R), use the Bayes ball algorithm to decide whether a given conditional independence relation holds.

Important results

Soundness: If a joint distribution p factorizes according to a DAG G, and if X, Y and Z are subsets of nodes such that Z d-separates X and Y in G, then p satisfies X ⊥ Y | Z. This property allows us to prove the reverse direction of the factorization theorem.

Completeness: If Z does not d-separate X and Y in DAG G, then there exists at least one distribution p which factorizes over G and in which X is not independent of Y given Z.
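As one instance of the exercise above, consider whether C ⊥ R | A. The only path, C <- A <- E -> R, is blocked at A (the arrows do not meet head-to-head there, and A is observed), so by soundness the answer is yes. A brute-force numeric confirmation, with illustrative (assumed) values for p(C | A) and p(R | E):

```python
# Confirm C ⊥ R | A in the alarm network by enumeration.  The p(C|A)
# and p(R|E) numbers are assumptions, not the lecture's tables.

def joint(b, e, a, c, r):
    """p(b, e, a, c, r) = p(b) p(e) p(a|b,e) p(c|a) p(r|e)."""
    pb = 0.01 if b else 0.99
    pe = 0.0001 if e else 0.9999
    pa1 = {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.8, (1, 1): 0.95}[(b, e)]
    pc1 = 0.65 if a else 0.05         # assumed p(C = 1 | A)
    pr1 = 0.7 if e else 0.005         # assumed p(R = 1 | E)
    return (pb * pe * (pa1 if a else 1 - pa1)
            * (pc1 if c else 1 - pc1) * (pr1 if r else 1 - pr1))

def prob(**fixed):
    """Marginal probability of the assignment in `fixed`, summing the
    joint over all other variables."""
    total = 0.0
    for b in (0, 1):
        for e in (0, 1):
            for a in (0, 1):
                for c in (0, 1):
                    for r in (0, 1):
                        assign = dict(b=b, e=e, a=a, c=c, r=r)
                        if all(assign[k] == v for k, v in fixed.items()):
                            total += joint(b, e, a, c, r)
    return total

for a in (0, 1):
    for c in (0, 1):
        for r in (0, 1):
            lhs = prob(a=a, c=c, r=r) / prob(a=a)
            rhs = (prob(a=a, c=c) / prob(a=a)) * (prob(a=a, r=r) / prob(a=a))
            assert abs(lhs - rhs) < 1e-9   # p(c, r | a) = p(c | a) p(r | a)
print("confirmed: C ⊥ R | A")
```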