Bayesian Networks. Example of Inference. Another Bayes Net Example (from Andrew Moore) Example Bayes Net. Want to know P(A)? Proceed as follows:


1 Bayesian Networks
A Bayesian network (BN) is a graphical representation of the direct dependencies over a set of variables, together with a set of conditional probability tables (CPTs) quantifying the strength of those influences. Bayes nets are effective tools for representing the world and for making inferences where there is uncertainty.
More formally: a BN over variables {X1, X2, ..., Xn} consists of a DAG (directed acyclic graph) whose nodes are the variables. Each node Xi is associated with a conditional probability table (CPT) that specifies P(Xi | Parents(Xi)) for that Xi.
Example Bayes Net
[The slide shows a five-node, binary-valued network ending in the node H, with one CPT per node: a root with probabilities 0.7 / 0.3, three intermediate nodes with entries (0.8, 0.5), (0.7, 0.0) and (0.2, 0.1) for "true given parent true / parent false", and P(H | parent) = 0.9, P(H | ~parent) = 0.1.]
Note that specifying the joint probability for any assignment of values to these variables requires only 9 parameters (rather than 31): that means inference is linear in the number of variables instead of exponential! Moreover, inference is linear in general whenever the dependence structure is a chain.
Example of Inference
Want to know P(A)? Proceed as follows: sum out the other variables one at a time, working along the chain (a sketch of this idea follows this page).
Another Bayes Net Example (from Andrew Moore)
M: Meredith is running the help session
S: It is sunny out
L: The leader of the help session arrives late
Assume that all TAs may arrive late in bad weather. Some TAs may be more likely to be late than others. These are all terms specified in our local distributions!
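To make the chain-inference point concrete, here is a minimal sketch (my own illustration, not taken from the slides). It reuses the CPT numbers listed above, but the assignment of those CPTs to a chain X1 -> X2 -> X3 -> X4 -> X5 is an assumption, since the node labels in the figure are not reproduced here.

from functools import reduce

# Chain-structured net X1 -> X2 -> X3 -> X4 -> X5, all Boolean.
# prior[t] = P(X1 = t); each cpt maps (child_value, parent_value) -> probability.
prior = {True: 0.7, False: 0.3}
cpts = [
    {(True, True): 0.8, (False, True): 0.2, (True, False): 0.5, (False, False): 0.5},
    {(True, True): 0.7, (False, True): 0.3, (True, False): 0.0, (False, False): 1.0},
    {(True, True): 0.2, (False, True): 0.8, (True, False): 0.1, (False, False): 0.9},
    {(True, True): 0.9, (False, True): 0.1, (True, False): 0.1, (False, False): 0.9},
]

def chain_marginal(prior, cpts):
    """Return P(X_last) by pushing the marginal down the chain: O(n) work."""
    marg = dict(prior)
    for cpt in cpts:
        # marginal of child = sum over parent of P(child | parent) * P(parent)
        marg = {v: sum(cpt[(v, p)] * marg[p] for p in (True, False)) for v in (True, False)}
    return marg

print(chain_marginal(prior, cpts))   # a distribution over the last node, summing to 1

Each step does a constant amount of work per variable, which is the sense in which chain inference is linear.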

2 Another Bayes Net Example (from Andrew Moore)
M: Meredith is running the help session
S: It is sunny out
L: The leader of the help session arrives late
Assume that all TAs may arrive late in bad weather. Some TAs may be more likely to be late than others.
Let's begin by writing down knowledge we feel happy about: P(S | M) = P(S), P(S) = 0.3, P(M) = 0.6. Lateness is not independent of the weather, and not independent of which TA is running the session.
Another Bayes Net Example (from Andrew Moore)
To specify P(L | S = u, M = v) we need values for the 4 cases of u/v = True/False.
Bayes net example
Because of conditional independence, we only need 6 values in the joint (how many would there be otherwise?). Again, conditional independence leads to computational savings!
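To spell out the count (an elaboration of the slide's rhetorical question, not text from the slide): with the structure S -> L <- M and S independent of M, the joint factors as
P(S, M, L) = P(S) P(M) P(L | S, M)
which needs 1 + 1 + 4 = 6 numbers, versus 2^3 - 1 = 7 for an arbitrary joint distribution over three Boolean variables.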

3 Graphically representing the BN
Read the absence of an arrow between S and M to mean "it would not help me predict M if I knew the value of S."
Read the two arrows into L to mean "if I want to know the value of L it may help me to know M and to know S."
Altering the graph
Now let's suppose we have these three events:
M: Meredith is running the help session
L: The leader of the session arrives late
R: The help session concerns Reasoning with Bayes Nets
And we know: Erin has a higher chance of being late than Meredith, and Erin has a higher chance of giving a help session about reasoning with BNs.
Conditional independence
Once you know who the TA is, whether they arrive late does not affect whether the help session concerns Reasoning with Bayes Nets.
What kind of independences exist in our graph?

4 Graphically representing the BN
Let's say we have 5 variables:
M: Meredith is running the help session
L: The leader of the session arrives late
R: The help session concerns Reasoning with Bayes Nets
S: It is sunny out
T: The session starts by 4:00
We know:
T is only directly influenced by L (i.e., T is conditionally independent of R, M, S given L)
L is only directly influenced by M and S (i.e., L is conditionally independent of R given M & S)
R is only directly influenced by M (i.e., R is conditionally independent of L, S given M)
M and S are independent
(The corresponding factorization is spelled out after this page.)
Building a Bayes Net
Step One: add variables.
Step Two: add links. The link structure must be acyclic (the graph is a DAG). Remember that if a node X has parents, you are promising that any non-descendant of X is conditionally independent of X given its parents.
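Putting the stated direct influences together (this factorization is reconstructed from the independence statements above, not copied from a slide):
P(M, S, L, R, T) = P(M) P(S) P(L | M, S) P(R | M) P(T | L)
With Boolean variables this needs 1 + 1 + 4 + 2 + 2 = 10 numbers instead of 2^5 - 1 = 31.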

5 Building a Bayes Net
Step Three: add a probability table for each node. Note that the table for any node X must list P(X | Parents(X)) for each possible combination of parent values.
Each node is conditionally independent of all non-descendants, given its parents. Two unconnected variables may still be correlated. You can deduce many other conditional independence relations from a Bayes Net.
Building a Bayes Net
Note that it is always possible to construct a Bayes Net to represent any distribution over variables X1, X2, ..., Xn, using any ordering of the variables. Take any ordering of the variables (say, the order given). From the chain rule we obtain
P(X1,...,Xn) = P(Xn | X1,...,Xn-1) P(Xn-1 | X1,...,Xn-2) ... P(X1)
Now for each Xi, go through its conditioning set X1,...,Xi-1 and iteratively remove all variables Xj such that Xi is conditionally independent of Xj given the remaining variables. Do this until no more variables can be removed. The final product will specify a Bayes net.
Causal Intuitions
However, some orderings yield BNs with very large parent sets. This requires exponential space, and (as we will see later) exponential time to perform inference. Empirically, and conceptually, a good way to construct a BN is to use an ordering based on causality. This often yields a more natural and compact BN.
Example: Causal Intuitions
Malaria, the flu and a cold all cause aches. So use an ordering in which causes come before effects: Malaria, Flu, Cold, Aches.
P(M,F,C,A) = P(A | M,F,C) P(C | M,F) P(F | M) P(M)
Each of these diseases affects the probability of aches, so the first conditional probability does not change. It is reasonable to assume that these diseases are independent of each other: having or not having one does not change the probability of having the others. So P(C | M,F) = P(C) and P(F | M) = P(F).
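So, with the causal ordering, the factorization and its size are (the parameter count is my own addition, assuming Boolean variables):
P(M,F,C,A) = P(A | M,F,C) P(C) P(F) P(M)
This needs 8 + 1 + 1 + 1 = 11 numbers: one big CPT for Aches plus a single prior for each disease.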

6 Example: Causal Intuitions
This yields a fairly simple Bayes net. We only need one big CPT, involving the family of Aches.
Example: Causal Intuitions
But suppose we build the BN for the distribution using the opposite ordering, i.e., we use the ordering Aches, Cold, Flu, Malaria:
P(A,C,F,M) = P(M | A,C,F) P(F | A,C) P(C | A) P(A)
We can't reduce P(M | A,C,F): the probability of Malaria is clearly affected by knowing Aches. What about knowing Aches and Cold, or Aches and Cold and Flu? The probability of Malaria is affected by both of these additional pieces of knowledge: knowing Cold and Flu lowers the probability that the aches are due to Malaria, since the other conditions explain away the aches!
Example: Causal Intuitions
We obtain a much more complex Bayes net. In fact, we obtain no savings over explicitly representing the full joint distribution (i.e., representing the probability of every atomic event).
Burglary Example
I'm at work, neighbour John calls to say my alarm is ringing, but neighbour Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects "causal" knowledge:
A burglar can set the alarm off
An earthquake can set the alarm off
The alarm can cause Mary to call
The alarm can cause John to call
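Counting parameters for the reversed ordering (again my own arithmetic, assuming Boolean variables) makes the "no savings" point explicit:
P(A,C,F,M) = P(M | A,C,F) P(F | A,C) P(C | A) P(A)
needs 8 + 4 + 2 + 1 = 15 numbers, exactly the 2^4 - 1 = 15 needed to write down the full joint.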

7 Burglary Example
A burglar can set the alarm off
An earthquake can set the alarm off
The alarm can cause Mary to call
The alarm can cause John to call
# of params: 1 + 1 + 4 + 2 + 2 = 10 (vs. 2^5 - 1 = 31 for an explicit joint)
Burglary Example
Suppose we choose the ordering MaryCalls, JohnCalls, Alarm, Burglary, Earthquake?
P(J | M) = P(J)?
Burglary Example
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)?  P(A | J, M) = P(A)?
Burglary Example
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? No  P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)?  P(B | A, J, M) = P(B)?

8 Burglary Example
Suppose we choose the ordering MaryCalls, JohnCalls, Alarm, Burglary, Earthquake?
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? No  P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes  P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)?  P(E | B, A, J, M) = P(E | A, B)?
Burglary Example
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? No  P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes  P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No  P(E | B, A, J, M) = P(E | A, B)? Yes
Burglary Example
Deciding conditional independence is hard in non-causal directions! (Causal models and conditional independence seem hardwired for humans!)
The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.
Inference in Bayes Nets
Given a Bayes net
P(X1, X2, ..., Xn) = P(Xn | Parents(Xn)) * P(Xn-1 | Parents(Xn-1)) * ... * P(X1 | Parents(X1))
and some evidence E = {a set of values for some of the variables}, we want to compute the new probability distribution P(Xk | E). That is, we want to figure out P(Xk = d | E) for every d in Dom[Xk]. This is a posterior probability, meaning that it is conditioned on the evidence.
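As a concrete instance of this inference task, here is a brute-force enumeration sketch for the burglary network. It is my own illustration: the slides give no numbers for this example, so the CPT values below are the standard ones from Russell & Norvig.

from itertools import product

# Alarm-network CPTs (assumed textbook values, not from these slides).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls = true | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as the product of the CPT entries."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

def query_burglary(j, m):
    """P(Burglary | JohnCalls=j, MaryCalls=m) by summing the joint over E and A."""
    post = {b: sum(joint(b, e, a, j, m) for e, a in product((True, False), repeat=2))
            for b in (True, False)}
    z = post[True] + post[False]
    return {b: p / z for b, p in post.items()}

print(query_burglary(j=True, m=False))   # P(Burglary | John called, Mary didn't)

Enumeration like this is exponential in the number of unobserved variables; the rest of the lecture is about doing better.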

9 Inference in Bayes Nets
Other examples of this kind of task: computing the probability of different diseases given symptoms, computing the probability of hail storms given different meteorological evidence, etc. In such cases, getting a good estimate of the probability of the unknown event allows us to respond more effectively (gamble rationally).
Inference in Bayes Nets
In the Alarm example:
P(Burglary, Earthquake, Alarm, JohnCalls, MaryCalls) = P(Earthquake) * P(Burglary) * P(Alarm | Earthquake, Burglary) * P(JohnCalls | Alarm) * P(MaryCalls | Alarm)
We may want to infer things like P(Burglary = true | MaryCalls = false, JohnCalls = true).
Variable Elimination
Making inferences can be computationally hairy. Variable elimination (VE) uses the product decomposition that defines a Bayes Net, together with the summing-out rule, to compute posterior probabilities from information already in the network (the CPTs). VE helps to reduce some of the hair in our computation.
Example (Binary valued Variables)
P(A,B,C,D,E,F,G,H,I,J,K) = P(A) x P(B) x P(C|A) x P(D|A,B) x P(E|C) x P(F|D) x P(G) x P(H|E,F) x P(I|F,G) x P(J|H,I) x P(K|I)

10 Example
P(A,B,C,D,E,F,G,H,I,J,K) = P(A) P(B) P(C|A) P(D|A,B) P(E|C) P(F|D) P(G) P(H|E,F) P(I|F,G) P(J|H,I) P(K|I)
Say that our evidence is {H = true, I = false}, and we want to know P(D | h,-i) (h: H is true, -h: H is false).
First, we write the joint as a sum for each value of D (i.e., D = d and D = -d):
Σ_{A,B,C,E,F,G,J,K} P(A,B,C,d,E,F,h,-i,J,K) = P(d,h,-i)
Σ_{A,B,C,E,F,G,J,K} P(A,B,C,-d,E,F,h,-i,J,K) = P(-d,h,-i)
Now we can compute P(h,-i): P(d,h,-i) + P(-d,h,-i) = P(h,-i).
And finally, P(D | h,-i):
P(d | h,-i) = P(d,h,-i) / P(h,-i)
P(-d | h,-i) = P(-d,h,-i) / P(h,-i)
So very good; it appears we only need to compute P(d,h,-i) and P(-d,h,-i) to obtain the conditional probabilities we want.
Example
We start with P(d,h,-i) = Σ_{A,B,C,E,F,G,J,K} P(A,B,C,d,E,F,h,-i,J,K).
Use the Bayes Net product decomposition to rewrite the summation:
Σ_{A,B,C,E,F,G,J,K} P(A) P(B) P(C|A) P(d|A,B) P(E|C) P(F|d) P(G) P(h|E,F) P(-i|F,G) P(J|h,-i) P(K|-i)
Example
Next, rearrange the summations so that we are never summing over terms that do not depend on the summed variable:
= Σ_A P(A) Σ_B P(B) P(d|A,B) Σ_C P(C|A) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i) Σ_K P(K|-i)

11 Example
Now start computing, from the innermost sum outward:
Σ_A P(A) Σ_B P(B) P(d|A,B) Σ_C P(C|A) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i) Σ_K P(K|-i)
Σ_K P(K|-i) = P(k|-i) + P(-k|-i) = c1
c1 Σ_J P(J|h,-i) = c1 (P(j|h,-i) + P(-j|h,-i)) = c1 c2
Example
c1 c2 Σ_G P(G) P(-i|F,G) = c1 c2 (P(g) P(-i|F,g) + P(-g) P(-i|F,-g))
But P(-i|F,g) depends on the value of F, so this is not a single number!
Example
Hmm. Let's try ordering the summations from front to back instead (expanding the outermost variables first):
Σ_A P(A) Σ_B P(B) P(d|A,B) Σ_C P(C|A) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i) Σ_K P(K|-i)
= P(a) Σ_B P(B) P(d|a,B) Σ_C P(C|a) [remaining sums over E, F, G, J, K]
+ P(-a) Σ_B P(B) P(d|-a,B) Σ_C P(C|-a) [remaining sums over E, F, G, J, K]
Example
= P(a) P(b) P(d|a,b) Σ_C P(C|a) [remaining sums]
+ P(a) P(-b) P(d|a,-b) Σ_C P(C|a) [remaining sums]
+ P(-a) P(b) P(d|-a,b) Σ_C P(C|-a) [remaining sums]
+ P(-a) P(-b) P(d|-a,-b) Σ_C P(C|-a) [remaining sums]

12 Example
Yikes! The size of the sum is doubling as we expand each variable (into its two values v and -v). This approach has exponential complexity. But let's look a bit closer.
Example
= P(a) P(b) P(d|a,b) [Σ_C P(C|a) x remaining sums]
+ P(a) P(-b) P(d|a,-b) [Σ_C P(C|a) x remaining sums]        <- repeated subterm
+ P(-a) P(b) P(d|-a,b) [Σ_C P(C|-a) x remaining sums]
+ P(-a) P(-b) P(d|-a,-b) [Σ_C P(C|-a) x remaining sums]     <- repeated subterm
The bracketed sub-term in the first two lines is identical (it depends only on A = a), and likewise in the last two lines (A = -a).
Example
If we store the value of the sub-terms, we need only compute them once:
= c1 f1 + c2 f1 + c3 f2 + c4 f2
where c1 = P(a)P(b)P(d|a,b), c2 = P(a)P(-b)P(d|a,-b), c3 = P(-a)P(b)P(d|-a,b), c4 = P(-a)P(-b)P(d|-a,-b),
f1 = Σ_C P(C|a) [remaining sums], and f2 = Σ_C P(C|-a) [remaining sums].

13 Example
f1 = Σ_C P(C|a) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i) Σ_K P(K|-i)
= P(c|a) [Σ_E P(E|c) x remaining sums] + P(-c|a) [Σ_E P(E|-c) x remaining sums]
and again repeated sub-terms appear inside: the inner sums over F, G, J and K depend only on the value of E, so the same sub-terms recur across branches.
Dynamic Programming
Within the computation of sub-terms we obtain more repeated smaller sub-terms. The core idea of dynamic programming is to remember all smaller computations so that they can be reused. This can convert an exponential computation into one that takes only polynomial time. Variable elimination is a dynamic programming technique that computes the sum from the bottom up (starting with the smaller sub-terms and working its way up to the bigger terms).
Relevance (we return to this later)
A brief aside: note that in the sum
Σ_A P(A) Σ_B P(B) P(d|A,B) Σ_C P(C|A) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i) Σ_K P(K|-i)
we have that Σ_K P(K|-i) = 1 (why?), thus Σ_J P(J|h,-i) Σ_K P(K|-i) = Σ_J P(J|h,-i). Furthermore Σ_J P(J|h,-i) = 1. So we can, in theory, drop these last two terms from the computation, as J and K are not relevant given our query (D) and our evidence (h and -i).
Variable Elimination (VE)
VE works from the inside of our equation out, summing out K, then J, then G, as we tried to before:
Σ_A P(A) Σ_B P(B) P(d|A,B) Σ_C P(C|A) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i) Σ_K P(K|-i)
When we tried to sum out G, we got here:
c1 c2 Σ_G P(G) P(-i|F,G) = c1 c2 (P(g) P(-i|F,g) + P(-g) P(-i|F,-g))
and we found that this depends on the value of F; it isn't a single number. However, we can still continue with the computation by computing two different numbers, one for each value of F (i.e., for F = f and F = -f).

14 Variable Elimination (VE)
t(f)  = Σ_G P(G) P(-i|f,G)  = P(g) P(-i|f,g)  + P(-g) P(-i|f,-g)
t(-f) = Σ_G P(G) P(-i|-f,G) = P(g) P(-i|-f,g) + P(-g) P(-i|-f,-g)
With these, we can sum out F.
Variable Elimination (VE)
Σ_A P(A) Σ_B P(B) P(d|A,B) Σ_C P(C|A) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i) Σ_K P(K|-i)
c1 c2 Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G)
= c1 c2 (P(f|d) P(h|E,f) (Σ_G P(G) P(-i|f,G)) + P(-f|d) P(h|E,-f) (Σ_G P(G) P(-i|-f,G)))
= c1 c2 Σ_F P(F|d) P(h|E,F) t(F), using t(f) and t(-f).
Variable Elimination (VE)
c1 c2 (P(f|d) P(h|E,f) t(f) + P(-f|d) P(h|E,-f) t(-f))
Now this is a function of E, so we obtain two new numbers:
s(e)  = c1 c2 (P(f|d) P(h|e,f) t(f)  + P(-f|d) P(h|e,-f) t(-f))
s(-e) = c1 c2 (P(f|d) P(h|-e,f) t(f) + P(-f|d) P(h|-e,-f) t(-f))
Variable Elimination (VE)
Σ_A P(A) Σ_B P(B) P(d|A,B) Σ_C P(C|A) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i) Σ_K P(K|-i)
On summing out F we obtain two numbers that represent a function of E; summing out E then gives a function of C, and summing out C a function of A. Finally, we can sum out the remaining variables to obtain the single number we wanted to compute, which is P(d,h,-i).
Now we can repeat the process to compute P(-d,h,-i). Or, instead of doing it twice, we can simply regard D as a variable in the computation. This will result in some computations that depend on the value of D, and we'll obtain different numbers for each value of D. Proceeding in this manner will yield a function of D (i.e., a number for each value of D).

15 Variable Elimination (VE)
In general, at each stage VE computes a table of numbers: one number for each different instantiation of the variables that remain in the sum. The size of these tables is exponential in the number of such variables; e.g., Σ_F P(F|D) P(h|E,F) t(F) depends on the values of D and E, so we obtain |Dom[D]| * |Dom[E]| different numbers in the resulting table.
Factors
We call the tables of values computed by VE factors. Note that the original probabilities that appear in the summation, e.g., P(C|A), are also tables of values (one value for each instantiation of C and A). Thus we also call the original CPTs factors. Each factor is a function of some variables, e.g., P(C|A) = f(A,C): it maps each value of its arguments to a number. A tabular representation of a factor is exponential in the number of variables in the factor.
Operations on Factors
If we examine the summation process we will see that various operations repeatedly occur on factors. Notation: f(X,Y) denotes a factor over the variables X ∪ Y (where X and Y are sets of variables).
The Product of Two Factors
Let f(X,Y) and g(Y,Z) be two factors with the variables Y in common. The product of f and g, denoted h = f x g (or sometimes just h = fg), is defined as h(X,Y,Z) = f(X,Y) x g(Y,Z).
f(A,B):   ab 0.9,  a~b 0.1,  ~ab 0.4,  ~a~b 0.6
g(B,C):   bc 0.7,  b~c 0.3,  ~bc 0.8,  ~b~c 0.2
h(A,B,C): abc 0.63, ab~c 0.27, a~bc 0.08, a~b~c 0.02, ~abc 0.28, ~ab~c 0.12, ~a~bc 0.48, ~a~b~c 0.12

16 Summing a Variable Out of a Factor
Let f(X,Y) be a factor. We can sum out variable X from f to produce a new factor h = Σ_X f, defined as h(Y) = Σ_{x in Dom(X)} f(x,Y).
f(A,B): ab 0.9, a~b 0.1, ~ab 0.4, ~a~b 0.6   ->   h(B) = Σ_A f: b 1.3, ~b 0.7
Restricting a Factor
Let f(X,Y) be a factor. We can restrict factor f to X = a by setting X to the value a and deleting the incompatible elements of f's domain. Define h = f_{X=a} as h(Y) = f(a,Y).
f(A,B): ab 0.9, a~b 0.1, ~ab 0.4, ~a~b 0.6   ->   h(B) = f_{A=a}: b 0.9, ~b 0.1
Variable Elimination: the Algorithm
Given query variable Q, evidence variables E (observed to have values e), and remaining variables Z. Let F be the set of factors in the original CPTs.
1. Replace each factor f in F that mentions a variable in E with its restriction f_{E=e} (this might yield a constant factor).
2. For each Zj in Z — in the order given — eliminate Zj as follows:
(a) Compute the new factor gj = Σ_{Zj} f1 x f2 x ... x fk, where the fi are the factors in F that include Zj.
(b) Remove the factors fi (that mention Zj) from F and add the new factor gj to F.
3. The remaining factors at the end of this process will refer only to the query variable Q. Take their product and normalize to produce P(Q | E).
VE: Example
Factors: f1(A), f2(B), f3(A,B,C), f4(C,D)
Query: P(A)?   Evidence: D = d.   Elimination order: C, B.
Step 1 (Restriction): replace f4(C,D) with f5(C) = f4(C,d).
Step 2 (Eliminate C): compute and add f6(A,B) = Σ_C f5(C) f3(A,B,C) to the list of factors; remove f3(A,B,C) and f5(C).
Step 3 (Eliminate B): compute and add f7(A) = Σ_B f6(A,B) f2(B) to the list of factors; remove f6(A,B) and f2(B).
Step 4 (Normalize the final factors): f7(A), f1(A). The product f1(A) x f7(A) is the (unnormalized) posterior for A, so we normalize: P(A | d) = α f1(A) x f7(A), where α = 1 / Σ_A f1(A) f7(A).
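Here is a minimal executable sketch of the three factor operations just defined (my own illustration of the definitions, using dictionary-based tables; it is not code from the course). A factor is represented as a pair: a tuple of variable names and a table mapping value-tuples to numbers.

from itertools import product

def multiply(f, g):
    """Pointwise product of two factors, over the union of their variables."""
    fv, ft = f
    gv, gt = g
    hv = fv + tuple(v for v in gv if v not in fv)
    ht = {}
    for vals in product([True, False], repeat=len(hv)):
        asg = dict(zip(hv, vals))
        ht[vals] = ft[tuple(asg[v] for v in fv)] * gt[tuple(asg[v] for v in gv)]
    return (hv, ht)

def sum_out(f, var):
    """Sum variable `var` out of factor f."""
    fv, ft = f
    hv = tuple(v for v in fv if v != var)
    ht = {}
    for vals, p in ft.items():
        key = tuple(val for val, name in zip(vals, fv) if name != var)
        ht[key] = ht.get(key, 0.0) + p
    return (hv, ht)

def restrict(f, var, value):
    """Restrict factor f to var = value (dropping var from its scope)."""
    fv, ft = f
    hv = tuple(v for v in fv if v != var)
    ht = {tuple(val for val, name in zip(vals, fv) if name != var): p
          for vals, p in ft.items() if dict(zip(fv, vals))[var] == value}
    return (hv, ht)

# The tables from the slides: f(A,B) and g(B,C).
f = (("A", "B"), {(True, True): 0.9, (True, False): 0.1,
                  (False, True): 0.4, (False, False): 0.6})
g = (("B", "C"), {(True, True): 0.7, (True, False): 0.3,
                  (False, True): 0.8, (False, False): 0.2})
h = multiply(f, g)
print(h[1][(True, True, True)])             # 0.63  (the 'abc' entry above)
print(sum_out(f, "A")[1][(True,)])          # 1.3   (the 'b' entry of Σ_A f)
print(restrict(f, "A", True)[1][(True,)])   # 0.9   (the 'b' entry of f restricted to A=a)

The VE algorithm above is just a loop that repeatedly applies restrict, multiply and sum_out in the chosen elimination order.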

17 Numeric Example
Here's an example with some numbers:
f1(A):   a 0.9,  ~a 0.1
f2(A,B): ab 0.9, a~b 0.1, ~ab 0.4, ~a~b 0.6
f3(B,C): bc 0.7, b~c 0.3, ~bc 0.2, ~b~c 0.8
f4(B) = Σ_A f2(A,B) f1(A):  b 0.85,  ~b 0.15
f5(C) = Σ_B f3(B,C) f4(B):  c 0.625, ~c 0.375
VE: Buckets as a Notational Device
Ordering: C, F, A, B, E, D.
Original factors: f1(A), f2(B), f3(A,B,C), f4(C,D), f5(C,E), f6(D,E,F).
Buckets: 1. C:   2. F:   3. A:   4. B:   5. E:   6. D:
VE: Buckets — place the original factors in the first applicable bucket.
1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(D,E,F)
3. A: f1(A)
4. B: f2(B)
5. E:
6. D:
VE: Eliminate the variables in order, placing each new factor in the first applicable bucket.
1. Eliminate C: Σ_C f3(A,B,C) x f4(C,D) x f5(C,E) = f7(A,B,D,E), which goes into bucket 3 (A).
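Spelling out one entry from each derived factor (my own arithmetic, directly from the tables above):
f4(b) = f2(a,b) f1(a) + f2(~a,b) f1(~a) = 0.9 * 0.9 + 0.4 * 0.1 = 0.85
f5(c) = f3(b,c) f4(b) + f3(~b,c) f4(~b) = 0.7 * 0.85 + 0.2 * 0.15 = 0.625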

18 VE: Eliminate the variables in order, placing each new factor in the first applicable bucket.
Ordering: C, F, A, B, E, D.
1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(D,E,F)
3. A: f1(A), f7(A,B,D,E)
4. B: f2(B)
5. E:
6. D:
2. Eliminate F: Σ_F f6(D,E,F) = f8(D,E), which goes into bucket 5 (E).
3. Eliminate A: Σ_A f1(A) x f7(A,B,D,E) = f9(B,D,E), which goes into bucket 4 (B).
4. Eliminate B: Σ_B f2(B) x f9(B,D,E) = f10(D,E), which goes into bucket 5 (E).
5. Eliminate E: Σ_E f8(D,E) x f10(D,E) = f11(D), which goes into bucket 6 (D).
f11 is the final answer, once we normalize it.

19 Complexity of Variable Elimination
A hypergraph has vertices just like an ordinary graph, but instead of edges between pairs of vertices it contains hyperedges. A hyperedge is a set of vertices, i.e., it can contain more than two vertices.
Complexity of Variable Elimination
The hypergraph of a Bayes net: the vertices are the nodes of the Bayes net, and the hyperedges are the sets of variables appearing in each CPT, i.e., {Xi} ∪ Parents(Xi).
Complexity of Variable Elimination
P(A,B,C,D,E,F) = P(A) P(B) x P(C|A,B) x P(D|C) x P(E|C) x P(F|D,E)
Variable Elimination in the Hypergraph
To eliminate variable Xi in the hypergraph we:
- remove the vertex Xi;
- create a new hyperedge Hi equal to the union of all of the hyperedges that contain Xi, minus Xi;
- remove all of the hyperedges containing Xi from the hypergraph;
- add the new hyperedge Hi to the hypergraph.

20 Complexity of Variable Elimination
[The slide shows the hypergraph of the example network and the result of eliminating one variable, then another.]
Complexity of Variable Elimination
Notice that at the outset of VE we have a set of factors consisting of the reduced (restricted) CPTs. The unassigned variables give the vertices, and the sets of variables that the factors depend on give the hyperedges, of a hypergraph H1.
Variable Elimination
If the first variable we eliminate is X, then we remove all factors containing X (all hyperedges containing X) and add a new factor whose variables are the union of the variables in the factors containing X (that is, we add a hyperedge that is the union of the removed hyperedges minus X).

21 VE Factors
Ordering: C, F, A, B, E, D.
Factors: f1(A), f2(B), f3(A,B,C), f4(C,D), f5(C,E), f6(D,E,F).
VE: Place the original factors in the first applicable bucket.
1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(D,E,F)
3. A: f1(A)
4. B: f2(B)
5. E:
6. D:
VE: Eliminate C, placing the new factor f7(A,B,D,E) in the first applicable bucket (bucket 3, A).
VE: Eliminate F, placing the new factor f8(D,E) in the first applicable bucket (bucket 5, E).

22 VE: Eliminate A, placing the new factor f9(B,D,E) in the first applicable bucket (bucket 4, B).
VE: Eliminate B, placing the new factor f10(D,E) in the first applicable bucket (bucket 5, E).
VE: Eliminate E, placing the new factor f11(D) in the first applicable bucket (bucket 6, D).
Elimination Width
Given an ordering π of the variables and an initial hypergraph H, eliminating these variables yields a sequence of hypergraphs H = H0, H1, H2, ..., Hn, where Hn contains only one vertex (the query variable).
The elimination width of π is the maximum size (number of variables) of any hyperedge in any of the hypergraphs H0, H1, ..., Hn. The elimination width of the previous example was 4 ({A,B,D,E} in H1 and H2).

23 Elimination Width
If the elimination width of an ordering π is k, then the complexity of VE using that ordering is 2^O(k). Elimination width k means that at some stage in the elimination process a factor involving k variables was generated. That factor requires 2^O(k) space to store (so the space complexity of VE is 2^O(k)), and it requires 2^O(k) operations to process (either to compute it in the first place, or to eliminate one of its variables from it), so the time complexity of VE is also 2^O(k). Note that k is the elimination width of this particular ordering.
Tree Width
Given a hypergraph H with vertices {X1, X2, ..., Xn}, the tree width ω of H is the MINIMUM elimination width over the n! different orderings of the Xi, minus 1. Thus VE has best-case complexity 2^O(ω), where ω is the tree width of the initial Bayes net. In the worst case, the tree width is equal to the number of variables.
Different Orderings = Different Elimination Widths
Suppose the query variable is A. Different orderings of the remaining variables for the network shown on the slide give very different elimination widths: one ordering is bad, the other is good.
Tree Width
Exponential in the tree width is the best that VE can do. But finding an ordering that has elimination width equal to the tree width is NP-hard, so in practice there is no point in trying to speed up VE by finding the best possible elimination ordering. Instead, heuristics are used to find orderings with good (low) elimination widths. In practice, this can be very successful: elimination widths can often be relatively small (8-10) even when the network has thousands of variables, so VE can be much more efficient than simply summing the probability of all possible events (which is exponential in the number of variables). Sometimes, however, the tree width is equal to the number of variables.

24 Finding Good Orderings
A polytree is a singly connected Bayes net: in particular, there is only one path between any two nodes. A node can have multiple parents, but there are no cycles (even ignoring edge directions). Good orderings are easy to find for polytrees: at each stage eliminate a singly connected node. Because we have a polytree, we are assured that a singly connected node will exist at each elimination stage, and the size of the factors never increases.
Elimination Ordering: Polytrees
The tree width of a polytree is 1! Eliminating singly connected nodes allows VE to run in time linear in the size of the network. E.g., in the network on the slide, eliminate the outlying nodes and X1, ..., Xk in any order; the result is that no factor is ever larger than the original CPTs. Eliminating the central node before these, however, gives factors that include all of X1, ..., Xk!
Min Fill Heuristic
A fairly effective heuristic is to always eliminate next the variable that creates the smallest-size factor. This is called the min-fill heuristic. (In the slide's example, one candidate node would create a factor of size k+2, another a factor of size 2, and another a factor of size 1.) This heuristic always solves polytrees in linear time. A sketch of the idea in code follows this page.
Relevance
Certain variables have no impact on the query. For example, in the chain network A → B → C, computing P(A) with no evidence requires elimination of B and C. But when you sum out these variables, you compute trivial factors that always evaluate to 1; for example, eliminating C: f4(B) = Σ_C f3(B,C) = Σ_C P(C|B). This is 1 for any value of B (e.g., P(c|b) + P(~c|b) = 1). There is no need to think about B or C for this query.
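A minimal sketch of the min-fill idea (my own illustration, not the course's code): greedily pick, at each step, the variable whose elimination creates the smallest new factor, using the hypergraph view from the previous pages.

def min_fill_ordering(factors, query):
    """factors: list of sets of variable names (the hyperedges / CPT families).
    Returns an elimination ordering chosen greedily to minimize the size of the
    factor created at each step. The query variable is never eliminated."""
    factors = [set(f) for f in factors]
    remaining = set().union(*factors) - {query}
    order = []
    while remaining:
        def new_factor_size(v):
            touched = [f for f in factors if v in f]
            return len(set().union(*touched) - {v}) if touched else 0
        v = min(remaining, key=new_factor_size)
        touched = [f for f in factors if v in f]
        new_edge = set().union(*touched) - {v} if touched else set()
        factors = [f for f in factors if v not in f] + ([new_edge] if new_edge else [])
        order.append(v)
        remaining.remove(v)
    return order

# The running example P(A)P(B)P(C|A,B)P(D|C)P(E|C)P(F|D,E), with query D.
cpt_families = [{"A"}, {"B"}, {"A", "B", "C"}, {"C", "D"}, {"C", "E"}, {"D", "E", "F"}]
print(min_fill_ordering(cpt_families, query="D"))

Note how this avoids eliminating C first, since doing so would immediately create the large {A,B,D,E} factor seen in the bucket example.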

25 Relevance
We can restrict attention to relevant variables. Given query q and evidence E:
- q itself is relevant;
- if any node Z is relevant, its parents are relevant;
- if e in E is a descendant of a relevant node, then e is relevant.
We can restrict our attention to the subnetwork comprising only the relevant variables when evaluating a query P(Q | E).
Relevance: Examples
Query with no evidence: the relevant variables are the query variable and its ancestors.
Query given an observed descendant: the evidence variable, and hence its ancestors (here including G), also become relevant; intuitively, we need to compute the probability of the evidence in order to compute the conditional query.
Query given H = h: the relevant set is again the query variable and its ancestors. The CPTs of variables downstream of the relevant set sum to tables of 1s and drop out, and the remaining factor Σ_G P(G) P(h|G) is a constant: it is at most 1, but it is irrelevant because, once we normalize, it multiplies each value of the query variable equally.
Relevance: Examples
Query of a node given its parent: the algorithm says all variables except H are relevant; but really none are needed except the query and evidence nodes themselves (since the parent cuts off all influence of the others). The algorithm overestimates the relevant set.
Independence in a Bayes Net
Another piece of information we can obtain from a Bayes net is the structure of relationships in the domain. The structure of the BN means that every Xi is conditionally independent of all of its non-descendants given its parents:
P(Xi | S, Par(Xi)) = P(Xi | Par(Xi)) for any subset S of NonDescendants(Xi)

26 More generally...
Conditional independencies can be useful in computation, explanation, and other reasoning related to a BN. How do we determine whether two variables X, Y are independent given a set of evidence variables E? We can use a (simple) graphical property called d-separation.
D-separation: a set of variables E d-separates X and Y if it blocks every undirected path in the BN between X and Y (we'll define "blocks" next). X and Y are conditionally independent given evidence E if E d-separates X and Y. Thus a BN gives us an easy way to tell whether two variables are independent (set E = {}) or conditionally independent given E.
What does it mean to be blocked?
A path is blocked if:
- there exists a variable V on the path such that V is in the evidence set E and the arcs putting V on the path are tail-to-tail (... ← V → ...), or
- there exists a variable V on the path such that V is in the evidence set E and the arcs putting V on the path are tail-to-head (... → V → ...).
What does it mean to be blocked?
If a variable V on the path has its arcs head-to-head (... → V ← ...), the path is still blocked, so long as V is NOT in the evidence set and neither are any of its descendants.

27 D-separation implies conditional independence
Theorem [Verma & Pearl, 1988]: If a set of evidence variables E d-separates X and Z in a Bayesian network's graph, then X is independent of Z given E.
D-Separation: Intuitions
Subway and Therm are dependent; but they are independent given Flu (since Flu blocks the only path between them).
D-Separation: Intuitions
Aches and Fever are dependent; but they are independent given Flu (since Flu blocks the only path). Similarly for Aches and Therm (dependent, but independent given Flu).
D-Separation: Intuitions
Flu and Malaria are independent (given no evidence): Fever blocks the path, since it is not in evidence, nor is its descendant Therm. Flu and Malaria are dependent given Fever (or given Therm): nothing blocks the path now.
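Here is a minimal d-separation test in code (the standard "reachable" procedure from Koller & Friedman; my own illustration, not code from the course), applied to the Flu network used in these intuition slides. The edge directions are assumed to be the causal ones implied above: Subway → Flu, ExoticTrip → Malaria, Flu → Fever ← Malaria, Fever → Therm, Flu → Aches.

from collections import defaultdict, deque

def d_separated(edges, x, y, evidence):
    """True iff x and y are d-separated given `evidence` in the DAG `edges`
    (a list of (parent, child) pairs)."""
    parents, children = defaultdict(set), defaultdict(set)
    for p, c in edges:
        parents[c].add(p)
        children[p].add(c)
    # Phase 1: evidence nodes and their ancestors (needed for the head-to-head rule).
    anc, stack = set(), list(evidence)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])
    # Phase 2: trace active trails starting from x, tracking the travel direction.
    visited, queue = set(), deque([(x, "up")])
    while queue:
        v, d = queue.popleft()
        if (v, d) in visited:
            continue
        visited.add((v, d))
        if v == y and v not in evidence:
            return False                      # found an active trail to y
        if d == "up" and v not in evidence:   # arrived from a child: chain/fork cases
            queue.extend((p, "up") for p in parents[v])
            queue.extend((c, "down") for c in children[v])
        elif d == "down":                     # arrived from a parent
            if v not in evidence:
                queue.extend((c, "down") for c in children[v])
            if v in anc:                      # head-to-head node activated by evidence
                queue.extend((p, "up") for p in parents[v])
    return True

edges = [("Subway", "Flu"), ("ExoticTrip", "Malaria"), ("Flu", "Fever"),
         ("Malaria", "Fever"), ("Fever", "Therm"), ("Flu", "Aches")]
print(d_separated(edges, "Subway", "Therm", set()))           # False: dependent
print(d_separated(edges, "Subway", "Therm", {"Flu"}))         # True: independent given Flu
print(d_separated(edges, "Flu", "Malaria", set()))            # True
print(d_separated(edges, "Flu", "Malaria", {"Therm"}))        # False
print(d_separated(edges, "Subway", "ExoticTrip", {"Therm", "Malaria"}))  # True

The printed results match the (in)dependencies stated on these intuition slides.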

28 D-Separation: Intuitions
Subway and ExoticTrip are independent; they are dependent given Therm; and they are independent given Therm and Malaria. This is for exactly the same reasons as for Flu and Malaria above.
D-Separation Example
[The slide gives a network over several variables (including G and H) and asks, for ten different evidence sets, whether two particular variables are independent given the evidence.]
D-Separation Example
[The answers for the ten cases are: 1. No, 2. No, 3. Yes, 4. Yes, 5. No, 6. Yes, 7. No, 8. Yes, 9. Yes, 10. Yes.]


More information

Probabilistic representation and reasoning

Probabilistic representation and reasoning Probabilistic representation and reasoning Applied artificial intelligence (EDAF70) Lecture 04 2019-02-01 Elin A. Topp Material based on course book, chapter 13, 14.1-3 1 Show time! Two boxes of chocolates,

More information

Probabilistic Reasoning. Kee-Eung Kim KAIST Computer Science

Probabilistic Reasoning. Kee-Eung Kim KAIST Computer Science Probabilistic Reasoning Kee-Eung Kim KAIST Computer Science Outline #1 Acting under uncertainty Probabilities Inference with Probabilities Independence and Bayes Rule Bayesian networks Inference in Bayesian

More information

Bayesian Networks: Representation, Variable Elimination

Bayesian Networks: Representation, Variable Elimination Bayesian Networks: Representation, Variable Elimination CS 6375: Machine Learning Class Notes Instructor: Vibhav Gogate The University of Texas at Dallas We can view a Bayesian network as a compact representation

More information

Chapter 16. Structured Probabilistic Models for Deep Learning

Chapter 16. Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe

More information

Announcements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic

Announcements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic CS 188: Artificial Intelligence Fall 2008 Lecture 16: Bayes Nets III 10/23/2008 Announcements Midterms graded, up on glookup, back Tuesday W4 also graded, back in sections / box Past homeworks in return

More information

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,

More information

Directed Graphical Models or Bayesian Networks

Directed Graphical Models or Bayesian Networks Directed Graphical Models or Bayesian Networks Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Bayesian Networks One of the most exciting recent advancements in statistical AI Compact

More information

COMPSCI 276 Fall 2007

COMPSCI 276 Fall 2007 Exact Inference lgorithms for Probabilistic Reasoning; OMPSI 276 Fall 2007 1 elief Updating Smoking lung ancer ronchitis X-ray Dyspnoea P lung cancer=yes smoking=no, dyspnoea=yes =? 2 Probabilistic Inference

More information

Example. Bayesian networks. Outline. Example. Bayesian networks. Example contd. Topology of network encodes conditional independence assertions:

Example. Bayesian networks. Outline. Example. Bayesian networks. Example contd. Topology of network encodes conditional independence assertions: opology of network encodes conditional independence assertions: ayesian networks Weather Cavity Chapter 4. 3 oothache Catch W eather is independent of the other variables oothache and Catch are conditionally

More information

The Bucket Elimination Algorithm for Belief Net Inference R Greiner October 22, 2006

The Bucket Elimination Algorithm for Belief Net Inference R Greiner October 22, 2006 The Bucket Elimination Algorithm for Belief Net Inference R Greiner October 22, 2006 In general, the input to the BuckElim algorithm is a belief net (structure plus CPtables) here: the Belief Net shown

More information

Probabilistic representation and reasoning

Probabilistic representation and reasoning Probabilistic representation and reasoning Applied artificial intelligence (EDA132) Lecture 09 2017-02-15 Elin A. Topp Material based on course book, chapter 13, 14.1-3 1 Show time! Two boxes of chocolates,

More information

Graphical Models - Part II

Graphical Models - Part II Graphical Models - Part II Bishop PRML Ch. 8 Alireza Ghane Outline Probabilistic Models Bayesian Networks Markov Random Fields Inference Graphical Models Alireza Ghane / Greg Mori 1 Outline Probabilistic

More information

Probabilistic Graphical Models (I)

Probabilistic Graphical Models (I) Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random

More information

p(x) p(x Z) = y p(y X, Z) = αp(x Y, Z)p(Y Z)

p(x) p(x Z) = y p(y X, Z) = αp(x Y, Z)p(Y Z) Graphical Models Foundations of Data Analysis Torsten Möller Möller/Mori 1 Reading Chapter 8 Pattern Recognition and Machine Learning by Bishop some slides from Russell and Norvig AIMA2e Möller/Mori 2

More information

Bayesian Networks 3 D-separation. D-separation

Bayesian Networks 3 D-separation. D-separation ayesian Networks 3 D-separation 1 D-separation iven a graph, we would like to read off independencies The converse is easier to think about: when does an independence statement not hold? g. when can X

More information

Another look at Bayesian. inference

Another look at Bayesian. inference Another look at Bayesian A general scenario: - Query variables: X inference - Evidence (observed) variables and their values: E = e - Unobserved variables: Y Inference problem: answer questions about the

More information

Reasoning under Uncertainty

Reasoning under Uncertainty Reasoning under Uncertainty Uncertainty This material is covered in chapters 13 and 14 of Russell and Norvig3 rd edition. Chapter 13 gives some basic background on probability from the point of view of

More information

Uncertainty. Introduction to Artificial Intelligence CS 151 Lecture 2 April 1, CS151, Spring 2004

Uncertainty. Introduction to Artificial Intelligence CS 151 Lecture 2 April 1, CS151, Spring 2004 Uncertainty Introduction to Artificial Intelligence CS 151 Lecture 2 April 1, 2004 Administration PA 1 will be handed out today. There will be a MATLAB tutorial tomorrow, Friday, April 2 in AP&M 4882 at

More information

Bayesian networks (1) Lirong Xia

Bayesian networks (1) Lirong Xia Bayesian networks (1) Lirong Xia Random variables and joint distributions Ø A random variable is a variable with a domain Random variables: capital letters, e.g. W, D, L values: small letters, e.g. w,

More information

Artificial Intelligence Methods. Inference in Bayesian networks

Artificial Intelligence Methods. Inference in Bayesian networks Artificial Intelligence Methods Inference in Bayesian networks In which we explain how to build network models to reason under uncertainty according to the laws of probability theory. Dr. Igor rajkovski

More information

Reasoning Under Uncertainty

Reasoning Under Uncertainty Reasoning Under Uncertainty Introduction Representing uncertain knowledge: logic and probability (a reminder!) Probabilistic inference using the joint probability distribution Bayesian networks The Importance

More information

Conditional Independence

Conditional Independence Conditional Independence Sargur Srihari srihari@cedar.buffalo.edu 1 Conditional Independence Topics 1. What is Conditional Independence? Factorization of probability distribution into marginals 2. Why

More information

Intelligent Systems: Reasoning and Recognition. Reasoning with Bayesian Networks

Intelligent Systems: Reasoning and Recognition. Reasoning with Bayesian Networks Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2016/2017 Lesson 13 24 march 2017 Reasoning with Bayesian Networks Naïve Bayesian Systems...2 Example

More information