Bayesian Networks. Example of Inference. Another Bayes Net Example (from Andrew Moore).
- Nathaniel Jessie Malone
- 5 years ago
Bayesian Networks

A Bayesian network (BN) is a graphical representation of the direct dependencies over a set of variables, together with a set of conditional probability tables (CPTs) quantifying the strength of those influences. Bayes nets are effective tools to represent the world and make inferences where there is uncertainty.

More formally: a BN over variables {X1, X2, ..., Xn} consists of a DAG (directed acyclic graph) whose nodes are the variables. Each node Xi is associated with a conditional probability table (CPT) that specifies P(Xi | Parents(Xi)).

Example Bayes Net (a chain A → B → C → D → H):
P(A) = 0.7, P(~A) = 0.3
P(B|A) = 0.8, P(~B|A) = 0.2, P(B|~A) = 0.5, P(~B|~A) = 0.5
P(C|B) = 0.7, P(~C|B) = 0.3, P(C|~B) = 0.0, P(~C|~B) = 1.0
P(D|C) = 0.2, P(~D|C) = 0.8, P(D|~C) = 0.1, P(~D|~C) = 0.9
P(H|D) = 0.9, P(~H|D) = 0.1, P(H|~D) = 0.1, P(~H|~D) = 0.9

Note that specifying P(A,B,C,D,H) for any assignment of values to these variables requires only 9 parameters (rather than the 31 needed for the full joint). That means inference is linear in the number of variables instead of exponential! Moreover, inference is linear generally whenever the dependence structure is a chain.

Example of Inference. Want to know P(A)? Proceed as follows: since A is a root node, P(A) is given directly by its CPT; for other queries we sum the joint (the product of the CPTs) over the unobserved variables.

Another Bayes Net Example (from Andrew Moore):
M: Meredith is running the help session
S: It is sunny out
L: The leader of the help session arrives late
Assume that all TAs may arrive late in bad weather, and some TAs may be more likely to be late than others. These are all terms specified in our local distributions!
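The chain example above can be checked numerically. A minimal sketch, assuming the chain structure A → B → C → D → H reconstructed from the CPT pattern: the product of the five CPTs defines the joint, the 2^5 joint entries come from only 9 free parameters, and marginalizing recovers P(A) = 0.7.

```python
from itertools import product

# CPTs for the (assumed) chain A -> B -> C -> D -> H.
# Each table maps (value, parent_value) -> probability; A has no parent.
P_A = {True: 0.7, False: 0.3}
P_B = {(True, True): 0.8, (False, True): 0.2, (True, False): 0.5, (False, False): 0.5}
P_C = {(True, True): 0.7, (False, True): 0.3, (True, False): 0.0, (False, False): 1.0}
P_D = {(True, True): 0.2, (False, True): 0.8, (True, False): 0.1, (False, False): 0.9}
P_H = {(True, True): 0.9, (False, True): 0.1, (True, False): 0.1, (False, False): 0.9}

def joint(a, b, c, d, h):
    # Product decomposition given by the network.
    return P_A[a] * P_B[(b, a)] * P_C[(c, b)] * P_D[(d, c)] * P_H[(h, d)]

total = sum(joint(*vals) for vals in product([True, False], repeat=5))
p_a = sum(joint(True, b, c, d, h) for b, c, d, h in product([True, False], repeat=4))
print(total, p_a)  # joint sums to 1; marginalizing recovers P(A) = 0.7
```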
Another Bayes Net Example, continued (from Andrew Moore):
M: Meredith is running the help session
S: It is sunny out
L: The leader of the help session arrives late
Assume that all TAs may arrive late in bad weather, and some TAs may be more likely to be late than others.

Let's begin by writing down knowledge we feel happy about: P(S|M) = P(S), P(S) = 0.3, P(M) = 0.6. Lateness is not independent of the weather, and not independent of the TA. To specify P(L | S = u, M = v) we need values for the 4 cases of u/v = True/False.

Because of conditional independence, we only need 6 values to specify the joint (how many would there be otherwise?). Again, conditional independence leads to computational savings!
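With the 6 numbers in hand, any query over S, M, L is a short sum. A sketch: P(S) and P(M) are the values given above, but the four P(L|S,M) entries are not specified in these notes, so the values below are purely hypothetical placeholders.

```python
# Marginal P(L) for the S/M/L net: P(L) = sum_S sum_M P(S) P(M) P(L|S,M).
P_S, P_M = 0.3, 0.6
P_L = {(True, True): 0.05, (True, False): 0.1,   # hypothetical P(L | S, M)
       (False, True): 0.1, (False, False): 0.2}  # (not given in the notes)

p_l = sum((P_S if s else 1 - P_S) * (P_M if m else 1 - P_M) * P_L[(s, m)]
          for s in (True, False) for m in (True, False))
print(p_l)  # a weighted average of the four lateness probabilities
```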
Graphically representing the BN

Read the absence of an arrow between S and M to mean "It would not help me predict M if I knew the value of S." Read the two arrows into L to mean "If I want to know the value of L it may help me to know M and to know S."

Altering the graph. Now let's suppose we have these three events:
M: Meredith is running the help session
L: The leader of the session arrives late
R: The help session concerns Reasoning with Bayes Nets
And we know: Erin has a higher chance of being late than Meredith, and Erin has a higher chance of giving a help session about reasoning with BNs.

Conditional independence: once you know who the TA is, whether they arrive late does not affect whether the help session concerns Reasoning with Bayes Nets. What kind of independences exist in our graph?
Graphically representing the BN, continued. Let's say we have 5 variables:
M: Meredith is running the help session
L: The leader of the session arrives late
R: The help session concerns Reasoning with Bayes Nets
S: It is sunny out
T: The session starts by 4:00

We know:
T is only directly influenced by L (i.e., T is conditionally independent of R, M, S given L)
L is only directly influenced by M and S (i.e., L is conditionally independent of R given M & S)
R is only directly influenced by M (i.e., R is conditionally independent of L, S given M)
M and S are independent

Building a Bayes Net
Step One: add variables.
Step Two: add links. The link structure must be acyclic (the graph is a DAG). Remember that if a node X has parents, you are promising that any non-descendant of X is conditionally independent of X given its parents.
Building a Bayes Net, continued.
Step Three: add a probability table for each node. The table for node X must list P(X | Parents(X)) for each possible combination of parent values. Each node is conditionally independent of all non-descendants, given its parents. Two unconnected variables may still be correlated. You can deduce many other conditional independence relations from a Bayes Net.

Note that it is always possible to construct a Bayes Net to represent any distribution over variables X1, X2, ..., Xn, using any ordering of the variables. Take any ordering of the variables (say, the order given). From the chain rule we obtain
P(X1,...,Xn) = P(Xn | X1,...,Xn-1) P(Xn-1 | X1,...,Xn-2) ... P(X1).
Now for each Xi go through its conditioning set X1,...,Xi-1, and iteratively remove all variables Xj such that Xi is conditionally independent of Xj given the remaining variables. Do this until no more variables can be removed. The final product will specify a Bayes net.

Causal Intuitions
However, some orderings yield BNs with very large parent sets. This requires exponential space, and (as we will see later) exponential time to perform inference. Empirically, and conceptually, a good way to construct a BN is to use an ordering based on causality. This often yields a more natural and compact BN.

Example: Causal Intuitions
Malaria, the flu and a cold all cause aches. So use an ordering in which causes come before effects: Malaria, Flu, Cold, Aches.
P(M,F,C,A) = P(A | M,F,C) P(C | M,F) P(F | M) P(M)
Each of these diseases affects the probability of aches, so the first conditional probability does not change. It is reasonable to assume that these diseases are independent of each other: having or not having one does not change the probability of having the others. So P(C | M,F) = P(C) and P(F | M) = P(F).
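The space savings from the causal ordering can be counted directly. A small sketch: for binary variables, a CPT P(X | parents) needs 2^|parents| independent numbers (one per row), so the causal ordering needs 11 numbers while the reverse ordering (discussed next) degenerates to the full joint.

```python
# Parameter counts for the two orderings of the Malaria/Flu/Cold/Aches net.
def cpt_params(parent_sets):
    # one free parameter per CPT row, 2**len(parents) rows per binary node
    return sum(2 ** len(parents) for parents in parent_sets.values())

causal = {"Malaria": [], "Flu": [], "Cold": [],
          "Aches": ["Malaria", "Flu", "Cold"]}
reverse = {"Aches": [], "Cold": ["Aches"], "Flu": ["Aches", "Cold"],
           "Malaria": ["Aches", "Cold", "Flu"]}

print(cpt_params(causal), cpt_params(reverse))  # 11 vs 15 (= 2^4 - 1, the full joint)
```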
Example: Causal Intuitions, continued.
This yields a fairly simple Bayes net. We only need one big CPT, involving the family of Aches.

But suppose we build the BN for the distribution using the opposite ordering, i.e., Aches, Cold, Flu, Malaria:
P(A,C,F,M) = P(M | A,C,F) P(F | A,C) P(C | A) P(A)
We can't reduce P(M | A,C,F). The probability of Malaria is clearly affected by knowing Aches. What about knowing Aches and Cold, or Aches and Cold and Flu? The probability of Malaria is affected by both of these additional pieces of knowledge: knowing Cold and Flu lowers the probability that the aches are related to Malaria, since the other conditions explain away the aches!

We obtain a much more complex Bayes Net. In fact, we obtain no savings over explicitly representing the full joint distribution (i.e., representing the probability of every atomic event).

Burglary Example
I'm at work, neighbour John calls to say my alarm is ringing, but neighbour Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects "causal" knowledge:
A burglar can set the alarm off
An earthquake can set the alarm off
The alarm can cause Mary to call
The alarm can cause John to call
Burglary Example, continued.
With the causal ordering, the number of parameters is 1 + 1 + 4 + 2 + 2 = 10 (vs. 2^5 - 1 = 31 for the full joint).

Suppose instead we choose the ordering MaryCalls, JohnCalls, Alarm, Burglary, Earthquake, and test the required independences:
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? P(B | A, J, M) = P(B)?
Burglary Example, continued. With the ordering MaryCalls, JohnCalls, Alarm, Burglary, Earthquake:
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? No. P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes. P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No. P(E | B, A, J, M) = P(E | A, B)? Yes

Deciding conditional independence is hard in non-causal directions! (Causal models and conditional independence seem hardwired for humans!) The network is also less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

Inference in Bayes Nets
Given a Bayes net
P(X1, X2, ..., Xn) = P(Xn | Parents(Xn)) * P(Xn-1 | Parents(Xn-1)) * ... * P(X1 | Parents(X1))
and some evidence E = {a set of values for some of the variables}, we want to compute the new probability distribution P(Xk | E). That is, we want to figure out P(Xk = d | E) for all d ∈ Dom[Xk]. This is a posterior probability distribution, meaning that it is conditioned on the evidence.
Inference in Bayes Nets, continued.
Other examples include computing the probability of different diseases given symptoms, or the probability of hail storms given different meteorological evidence. In such cases, getting a good estimate of the probability of the unknown event allows us to respond more effectively (gamble rationally).

In the Alarm example:
P(Burglary, Earthquake, Alarm, JohnCalls, MaryCalls) = P(Earthquake) * P(Burglary) * P(Alarm | Earthquake, Burglary) * P(JohnCalls | Alarm) * P(MaryCalls | Alarm)
We may want to infer things like P(Burglary = true | MaryCalls = false, JohnCalls = true).

Variable Elimination
Making inferences can be computationally hairy. Variable elimination (VE) uses the product decomposition that defines a Bayes Net and the summing-out rule to compute posterior probabilities from information already in the network (the CPTs). VE helps to reduce some of the hair in our computation.

Example (Binary valued Variables)
P(A,B,C,D,E,F,G,H,I,J,K) = P(A) x P(B) x P(C|A) x P(D|A,B) x P(E|C) x P(F|D) x P(G) x P(H|E,F) x P(I|F,G) x P(J|H,I) x P(K|I)
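The alarm query above can be answered by brute-force enumeration over the joint. A sketch: the CPT numbers below are the standard textbook values (Russell & Norvig); these notes do not specify them, so treat them as an illustrative assumption.

```python
from itertools import product

# Alarm network CPTs (textbook values, assumed for illustration).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(a | B, E)
P_J = {True: 0.90, False: 0.05}                     # P(j | A)
P_M = {True: 0.70, False: 0.01}                     # P(m | A)

def joint(b, e, a, j, m):
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# P(Burglary = true | JohnCalls = true, MaryCalls = false) by enumeration.
def posterior_burglary(j=True, m=False):
    num = sum(joint(True, e, a, j, m) for e, a in product([True, False], repeat=2))
    den = sum(joint(b, e, a, j, m) for b, e, a in product([True, False], repeat=3))
    return num / den

p = posterior_burglary()
print(p)  # small: one call from John, with Mary silent, is weak evidence
```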
Example
P(A,B,C,D,E,F,G,H,I,J,K) = P(A) P(B) P(C|A) P(D|A,B) P(E|C) P(F|D) P(G) P(H|E,F) P(I|F,G) P(J|H,I) P(K|I)

Say that E (our evidence) = {H = true, I = false}, and we want to know P(D | h, -i) (where h: H is true and -h: H is false).

First, we write a sum for each value of D (i.e., D = d and D = -d):
Σ_{A,B,C,E,F,G,J,K} P(A,B,C,d,E,F,h,-i,J,K) = P(d,h,-i)
Σ_{A,B,C,E,F,G,J,K} P(A,B,C,-d,E,F,h,-i,J,K) = P(-d,h,-i)
Now we can compute P(h,-i): P(d,h,-i) + P(-d,h,-i) = P(h,-i).
And finally:
P(d | h,-i) = P(d,h,-i) / P(h,-i)
P(-d | h,-i) = P(-d,h,-i) / P(h,-i)
So far so good; it appears we only need to compute P(d,h,-i) and P(-d,h,-i) to obtain the conditional probabilities we want.

We start with P(d,h,-i) = Σ_{A,B,C,E,F,G,J,K} P(A,B,C,d,E,F,h,-i,J,K). Use the Bayes Net product decomposition to rewrite the summation:
Σ_{A,B,C,E,F,G,J,K} P(A)P(B)P(C|A)P(d|A,B)P(E|C)P(F|d)P(G)P(h|E,F)P(-i|F,G)P(J|h,-i)P(K|-i)
Next, rearrange the summations so that each sum applies only to the factors that depend on the summed variable:
= Σ_A P(A) Σ_B P(B) P(d|A,B) Σ_C P(C|A) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i) Σ_K P(K|-i)
Now start computing, from the innermost sums outward:
Σ_A P(A) Σ_B P(B) P(d|A,B) Σ_C P(C|A) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i) Σ_K P(K|-i)

Σ_K P(K|-i) = P(k|-i) + P(-k|-i) = c1
c1 Σ_J P(J|h,-i) = c1 (P(j|h,-i) + P(-j|h,-i)) = c1 c2
c1 c2 Σ_G P(G) P(-i|F,G) = c1 c2 (P(g)P(-i|F,g) + P(-g)P(-i|F,-g))
But P(-i|F,G) depends on the value of F, so this is not a single number!

Hmm. Let's try ordering the summations from front to back instead:
Σ_A P(A) Σ_B P(B) P(d|A,B) Σ_C P(C|A) ...
= P(a) Σ_B P(B) P(d|a,B) Σ_C P(C|a) [× the same remaining sums]
+ P(-a) Σ_B P(B) P(d|-a,B) Σ_C P(C|-a) [× the same remaining sums]
= P(a)P(b) P(d|a,b) Σ_C P(C|a) [× the same remaining sums]
+ P(a)P(-b) P(d|a,-b) Σ_C P(C|a) [× the same remaining sums]
+ P(-a)P(b) P(d|-a,b) Σ_C P(C|-a) [× the same remaining sums]
+ P(-a)P(-b) P(d|-a,-b) Σ_C P(C|-a) [× the same remaining sums]
where the remaining sums in each term are Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i) Σ_K P(K|-i).
Yikes! The size of the sum doubles as we expand each variable (into v and -v). This approach has exponential complexity.

But let's look a bit closer. In the four-term expansion above, the first two terms share the repeated sub-term
f1 = Σ_C P(C|a) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i)
and the last two terms share the repeated sub-term
f2 = Σ_C P(C|-a) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i)
If we store the value of these sub-terms, we need only compute them once. The sum becomes
= c1 f1 + c2 f1 + c3 f2 + c4 f2
where c1 = P(a)P(b)P(d|a,b), c2 = P(a)P(-b)P(d|a,-b), c3 = P(-a)P(b)P(d|-a,b), and c4 = P(-a)P(-b)P(d|-a,-b).
Expanding the stored sub-term in the same way:
f1 = Σ_C P(C|a) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i)
= P(c|a) Σ_E P(E|c) [× the same remaining sums] + P(-c|a) Σ_E P(E|-c) [× the same remaining sums]
Within the computation of the sub-terms we obtain more repeated smaller sub-terms (here, everything from Σ_F inward is repeated).

Dynamic Programming
The core idea of dynamic programming is to remember all smaller computations so that they can be reused. This can convert an exponential computation into one that takes only polynomial time. Variable elimination is a dynamic programming technique that computes the sum from the bottom up, starting with the smaller sub-terms and working its way up to the bigger terms.

Relevance (we return to this later)
A brief aside: note that in the sum
Σ_A P(A) Σ_B P(B) P(d|A,B) Σ_C P(C|A) Σ_E P(E|C) Σ_F P(F|d) P(h|E,F) Σ_G P(G) P(-i|F,G) Σ_J P(J|h,-i) Σ_K P(K|-i)
we have that Σ_K P(K|-i) = 1 (why? it sums a conditional distribution over all values of K), and furthermore Σ_J P(J|h,-i) = 1. So we can, in theory, drop these last two terms from the computation: J and K are not relevant given our query (D) and our evidence (h and -i).

Variable Elimination (VE)
VE works from the inside of our equation out, summing out K, then J, then G, as we tried to before. When we tried to sum out G, we got to
c1 c2 Σ_G P(G) P(-i|F,G) = c1 c2 (P(g)P(-i|F,g) + P(-g)P(-i|F,-g))
and found that the result depends on the value of F; it isn't a single number. However, we can still continue with the computation by computing two different numbers, one for each value of F (i.e., for F = f and F = -f).
Variable Elimination (VE), continued.
t(f) = c1 c2 Σ_G P(G) P(-i|f,G) = c1 c2 (P(g)P(-i|f,g) + P(-g)P(-i|f,-g))
t(-f) = c1 c2 Σ_G P(G) P(-i|-f,G) = c1 c2 (P(g)P(-i|-f,g) + P(-g)P(-i|-f,-g))
With these, we can sum out F:
Σ_F P(F|d) P(h|E,F) [c1 c2 Σ_G P(G) P(-i|F,G)] = P(f|d) P(h|E,f) t(f) + P(-f|d) P(h|E,-f) t(-f)
Now this is a function of E, so we obtain two new numbers:
s(e) = P(f|d) P(h|e,f) t(f) + P(-f|d) P(h|e,-f) t(-f)
s(-e) = P(f|d) P(h|-e,f) t(f) + P(-f|d) P(h|-e,-f) t(-f)
On summing out E we obtain two numbers which represent a function of C; summing out C then gives a function of A, as does summing out B. Finally, we can sum out the remaining variables to obtain the single number we wanted to compute, P(d,h,-i).
Now we can repeat the process to compute P(-d,h,-i). Or, instead of doing it twice, we can simply regard D as a variable in the computation. This will result in some computations that depend on the value of D, and we obtain different numbers for each value of D. Proceeding in this manner yields a function of D (i.e., a number for each value of D).
Variable Elimination (VE), continued.
In general, each stage of VE computes a table of numbers: one number for each different instantiation of the variables remaining in the sum. The size of these tables is exponential in the number of variables appearing in the sum: e.g., Σ_F P(F|D) P(h|E,F) t(F) depends on the values of D and E, thus we obtain |Dom[D]| * |Dom[E]| different numbers in the resulting table.

Factors
We call the tables of values computed by VE factors. Note that the original probabilities that appear in the summation, e.g., P(C|A), are also tables of values (one value for each instantiation of C and A). Thus we also call the original CPTs factors. Each factor is a function of some variables, e.g., P(C|A) = f(A,C): it maps each value of its arguments to a number. A tabular representation is exponential in the number of variables in the factor.

Operations on Factors
If we examine the summation process we see that various operations repeatedly occur on factors. Notation: f(X,Y) denotes a factor over the variables X ∪ Y (where X and Y are sets of variables).

The Product of Two Factors
Let f(X,Y) and g(Y,Z) be two factors with variables Y in common. The product of f and g, denoted h = f x g (or sometimes just h = fg), is defined as h(X,Y,Z) = f(X,Y) x g(Y,Z). For example:
f(A,B):   ab 0.9 | a~b 0.1 | ~ab 0.4 | ~a~b 0.6
g(B,C):   bc 0.7 | b~c 0.3 | ~bc 0.8 | ~b~c 0.2
h(A,B,C): abc 0.63 | ab~c 0.27 | a~bc 0.08 | a~b~c 0.02 | ~abc 0.28 | ~ab~c 0.12 | ~a~bc 0.48 | ~a~b~c 0.12
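The product operation above can be sketched directly, representing a factor as a table keyed by assignments; the numbers are the ones from the f(A,B) x g(B,C) example.

```python
from itertools import product

# A factor: a tuple of variable names plus a table keyed by boolean tuples.
def factor(vars_, rows):
    keys = list(product([True, False], repeat=len(vars_)))
    return dict(zip(keys, rows)), tuple(vars_)

f, f_vars = factor(("A", "B"), [0.9, 0.1, 0.4, 0.6])  # f(A,B)
g, g_vars = factor(("B", "C"), [0.7, 0.3, 0.8, 0.2])  # g(B,C)

def multiply(f, f_vars, g, g_vars):
    """Pointwise product h(X,Y,Z) = f(X,Y) * g(Y,Z)."""
    h_vars = f_vars + tuple(v for v in g_vars if v not in f_vars)
    h = {}
    for vals in product([True, False], repeat=len(h_vars)):
        asg = dict(zip(h_vars, vals))
        fv = f[tuple(asg[v] for v in f_vars)]
        gv = g[tuple(asg[v] for v in g_vars)]
        h[vals] = fv * gv
    return h, h_vars

h, h_vars = multiply(f, f_vars, g, g_vars)
print(h[(True, True, True)])  # h(abc) = 0.9 * 0.7 = 0.63
```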
Summing a Variable Out of a Factor
Let f(X,Y) be a factor. We can sum out variable X from f to produce a new factor h = Σ_X f, defined by h(Y) = Σ_{x ∈ Dom(X)} f(x,Y). For example:
f(A,B): ab 0.9 | a~b 0.1 | ~ab 0.4 | ~a~b 0.6
h(B) = Σ_A f: b 1.3 | ~b 0.7

Restricting a Factor
Let f(X,Y) be a factor. We can restrict factor f to X = a by setting X to the value a and deleting the incompatible elements of f's domain. Define h = f_{X=a} as: h(Y) = f(a,Y). For example:
f(A,B): ab 0.9 | a~b 0.1 | ~ab 0.4 | ~a~b 0.6
h(B) = f_{A=a}: b 0.9 | ~b 0.1

Variable Elimination: the Algorithm
Given query variable Q, evidence variables E (observed to have values e), and remaining variables Z. Let F be the set of factors in the original CPTs.
1. Replace each factor f in F that mentions a variable(s) in E with its restriction f_{E=e} (this might yield a constant factor).
2. For each Zj, in the order given, eliminate Zj ∈ Z as follows:
(a) Compute the new factor gj = Σ_{Zj} f1 x f2 x ... x fk, where the fi are the factors in F that include Zj.
(b) Remove the factors fi that mention Zj from F and add the new factor gj to F.
3. The remaining factors at the end of this process refer only to the query variable Q. Take their product and normalize to produce P(Q|E).

VE: Example
Factors: f1(A), f2(B), f3(A,B,C), f4(C,D)
Query: P(A)? Evidence: D = d. Elimination order: C, B
Step 1 (Restriction): replace f4(C,D) with f5(C) = f4(C,d)
Step 2 (Eliminate C): compute and add f6(A,B) = Σ_C f5(C) f3(A,B,C) to the list of factors; remove f3(A,B,C), f5(C)
Step 3 (Eliminate B): compute and add f7(A) = Σ_B f6(A,B) f2(B) to the list of factors; remove f6(A,B), f2(B)
Step 4 (Normalize final factors): f7(A), f1(A). The product f1(A) x f7(A) is the (unnormalized) posterior for A. So we normalize: P(A|d) = α f1(A) x f7(A), where α = 1/Σ_A f1(A)f7(A).
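The algorithm above can be sketched end-to-end on the four-factor example. The CPT numbers below are made up for illustration (the notes give none for this example); the structure matches the factor list f1(A), f2(B), f3(A,B,C), f4(C,D).

```python
from itertools import product

def make(vars_, rows):
    keys = list(product([True, False], repeat=len(vars_)))
    return (tuple(vars_), dict(zip(keys, rows)))

f1 = make("A", [0.3, 0.7])                                  # P(A), hypothetical
f2 = make("B", [0.6, 0.4])                                  # P(B), hypothetical
f3 = make("ABC", [0.9, 0.1, 0.2, 0.8, 0.5, 0.5, 0.1, 0.9])  # P(C|A,B), hypothetical
f4 = make("CD", [0.7, 0.3, 0.2, 0.8])                       # P(D|C), hypothetical

def multiply(f, g):
    (fv, ft), (gv, gt) = f, g
    hv = fv + tuple(v for v in gv if v not in fv)
    ht = {}
    for vals in product([True, False], repeat=len(hv)):
        asg = dict(zip(hv, vals))
        ht[vals] = ft[tuple(asg[v] for v in fv)] * gt[tuple(asg[v] for v in gv)]
    return (hv, ht)

def sum_out(f, var):
    fv, ft = f
    i = fv.index(var)
    ht = {}
    for k, p in ft.items():
        hk = k[:i] + k[i + 1:]
        ht[hk] = ht.get(hk, 0.0) + p
    return (fv[:i] + fv[i + 1:], ht)

def restrict(f, var, val):
    fv, ft = f
    i = fv.index(var)
    ht = {k[:i] + k[i + 1:]: p for k, p in ft.items() if k[i] == val}
    return (fv[:i] + fv[i + 1:], ht)

def ve(factors, query, evidence, order):
    # Step 1: restrict factors mentioning evidence variables.
    for var, val in evidence.items():
        factors = [restrict(f, var, val) if var in f[0] else f for f in factors]
    # Step 2: eliminate each variable in order.
    for z in order:
        rel = [f for f in factors if z in f[0]]
        factors = [f for f in factors if z not in f[0]]
        g = rel[0]
        for f in rel[1:]:
            g = multiply(g, f)
        factors.append(sum_out(g, z))
    # Step 3: multiply the remaining factors over Q and normalize.
    g = factors[0]
    for f in factors[1:]:
        g = multiply(g, f)
    total = sum(g[1].values())
    return {k: p / total for k, p in g[1].items()}

post = ve([f1, f2, f3, f4], "A", {"D": True}, order="CB")
print(post)  # P(A | D = d), one number per value of A, summing to 1
```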
Numeric Example
Here's an example with some numbers:
f1(A):  a 0.9 | ~a 0.1
f2(A,B): ab 0.9 | a~b 0.1 | ~ab 0.4 | ~a~b 0.6
f3(B,C): bc 0.7 | b~c 0.3 | ~bc 0.2 | ~b~c 0.8
f4(B) = Σ_A f2(A,B) f1(A):  b 0.85 | ~b 0.15
f5(C) = Σ_B f3(B,C) f4(B):  c 0.625 | ~c 0.375

VE: Buckets as a Notational Device
Ordering: C, F, A, B, E, D. Original factors: f1(A), f2(B), f3(A,B,C), f4(C,D), f5(C,E), f6(D,E,F).
Place each original factor in the first applicable bucket (the bucket of the first variable in the ordering that the factor mentions):
1. C: f3(A,B,C), f4(C,D), f5(C,E)
2. F: f6(D,E,F)
3. A: f1(A)
4. B: f2(B)
5. E:
6. D:
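The numeric example above is just two multiply-and-sum-out steps, which can be checked directly:

```python
# Checking the numeric example: f4(B) = sum_A f2(A,B) f1(A),
# then f5(C) = sum_B f3(B,C) f4(B).
f1 = {"a": 0.9, "~a": 0.1}
f2 = {("a", "b"): 0.9, ("a", "~b"): 0.1, ("~a", "b"): 0.4, ("~a", "~b"): 0.6}
f3 = {("b", "c"): 0.7, ("b", "~c"): 0.3, ("~b", "c"): 0.2, ("~b", "~c"): 0.8}

f4 = {b: sum(f1[a] * f2[(a, b)] for a in ("a", "~a")) for b in ("b", "~b")}
f5 = {c: sum(f4[b] * f3[(b, c)] for b in ("b", "~b")) for c in ("c", "~c")}
print(f4, f5)  # f4: b 0.85, ~b 0.15; f5: c 0.625, ~c 0.375
```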
VE: Eliminate the variables in order, placing each new factor in the first applicable bucket:
1. Eliminate C: Σ_C f3(A,B,C) x f4(C,D) x f5(C,E) = f7(A,B,D,E), placed in bucket 3 (A)
2. Eliminate F: Σ_F f6(D,E,F) = f8(D,E), placed in bucket 5 (E)
3. Eliminate A: Σ_A f1(A) x f7(A,B,D,E) = f9(B,D,E), placed in bucket 4 (B)
4. Eliminate B: Σ_B f2(B) x f9(B,D,E) = f10(D,E), placed in bucket 5 (E)
5. Eliminate E: Σ_E f8(D,E) x f10(D,E) = f11(D), placed in bucket 6 (D)
f11 is the final answer, once we normalize it.
Complexity of Variable Elimination
A hypergraph has vertices just like an ordinary graph, but instead of edges between pairs of vertices it contains hyperedges. A hyperedge is a set of vertices (i.e., potentially more than two).

Hypergraph of a Bayes Net: the vertices are the nodes of the Bayes net, and the hyperedges are the sets of variables appearing together in each CPT: {Xi} ∪ Parents(Xi). For example, for
P(A,B,C,D,E,F) = P(A) P(B) x P(C|A,B) x P(D|C) x P(E|C) x P(F|D,E)
the hyperedges are {A}, {B}, {A,B,C}, {C,D}, {C,E}, {D,E,F}.

Variable Elimination in the Hypergraph
To eliminate variable Xi in the hypergraph we:
remove the vertex Xi;
create a new hyperedge Hi equal to the union of all of the hyperedges that contain Xi, minus Xi;
remove all of the hyperedges containing Xi from the hypergraph;
add the new hyperedge Hi to the hypergraph.
Complexity of Variable Elimination, continued. (The slides at this point step through eliminating particular variables in the hypergraph pictorially.)

Notice that at the outset of VE we have a set of factors consisting of the reduced (or restricted) CPTs. The unassigned variables form the vertices, and the set of variables each factor depends on forms the hyperedges of a hypergraph H1. If the first variable we eliminate is X, then we remove all factors containing X (all hyperedges containing X) and add a new factor whose variables are the union of the variables in the factors containing X (i.e., we add a hyperedge that is the union of the removed hyperedges minus X).
VE with the hypergraph view: the original factors f1(A), f2(B), f3(A,B,C), f4(C,D), f5(C,E), f6(D,E,F) are placed in buckets under the ordering C, F, A, B, E, D exactly as before. Eliminating C creates f7(A,B,D,E), placed in bucket A; eliminating F creates f8(D,E), placed in bucket E. (The slides show the corresponding hyperedges being merged at each step.)
Eliminating A, B, and E likewise creates f9(B,D,E), f10(D,E), and finally f11(D), completing the bucket example.

Elimination Width
Given an ordering π of the variables and an initial hypergraph H, eliminating the variables yields a sequence of hypergraphs H = H0, H1, H2, ..., Hn, where Hn contains only one vertex (the query variable). The elimination width of π is the maximum size (number of variables) of any hyperedge in any of the hypergraphs H0, H1, ..., Hn. The elimination width of the previous example was 4 ({A,B,D,E} in H1 and H2).
Elimination Width, continued.
If the elimination width of an ordering π is k, then the complexity of VE using that ordering is 2^O(k). Elimination width k means that at some stage in the elimination process a factor involving k variables is generated. That factor requires 2^O(k) space to store (so the space complexity of VE is 2^O(k)), and 2^O(k) operations to process, either to compute in the first place or to eliminate one of its variables (so the time complexity of VE is also 2^O(k)). Note that k is the elimination width of this particular ordering.

Tree Width
Given a hypergraph H with vertices {X1, X2, ..., Xn}, the tree width (ω) of H is the MINIMUM elimination width over the n! different orderings of the Xi, minus 1. Thus VE has best-case complexity of 2^O(ω), where ω is the tree width of the initial Bayes Net. In the worst case, the tree width is equal to the number of variables.

Different Orderings = Different Elimination Widths
Suppose the query variable is A. For the same network, different orderings of the remaining variables can be bad or good (the slides contrast a bad ordering with a good one for a particular network).

Exponential in the tree width is the best that VE can do. But finding an ordering that has elimination width equal to tree width is NP-hard, so in practice there is no point in trying to speed up VE by finding the best possible elimination ordering. Instead, heuristics are used to find orderings with good (low) elimination widths. In practice, this can be very successful: elimination widths can often be relatively small (8-10, even when the network has thousands of variables), so VE can be much more efficient than simply summing the probability of all possible events (which is exponential in the number of variables). Sometimes, however, the tree width is equal to the number of variables.
Finding Good Orderings
A polytree is a singly connected Bayes Net: in particular, there is only one path between any two nodes. A node can have multiple parents, but the underlying undirected graph has no cycles.

Good orderings are easy to find for polytrees: at each stage eliminate a singly connected node. Because we have a polytree, we are assured that a singly connected node will exist at each elimination stage, and the sizes of the factors never increase.

Elimination Ordering: Polytrees
The tree width of a polytree is 1! Eliminating singly connected nodes allows VE to run in time linear in the size of the network. In the slides' example network, eliminating the outlying nodes (such as X1, ..., Xk) first, in any mix, produces no factor larger than the original CPTs, whereas eliminating the central node before them gives factors that include all of its neighbours!

Min Fill Heuristic
A fairly effective heuristic is to always eliminate next the variable that creates the smallest factor (in the slides' example, the three candidate eliminations create factors of size k+2, 2, and 1). This is called the min-fill heuristic. It always solves polytrees in linear time.

Relevance
Certain variables have no impact on the query. For example, in the network A → B → C, computing P(A) with no evidence requires elimination of B and C. But when you sum out these variables, you compute a trivial factor that always evaluates to 1. For example, eliminating C: f4(B) = Σ_C f3(B,C) = Σ_C P(C|B), which is 1 for any value of B (e.g., P(c|b) + P(~c|b) = 1). There is no need to think about B or C for this query.
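The min-fill idea can be sketched as a greedy loop over the hypergraph view from the previous pages: repeatedly eliminate the variable whose elimination creates the smallest new hyperedge (factor). The network below is the six-variable example used earlier; the query variable and tie-breaking order are choices made for illustration.

```python
# Greedy "smallest new factor" ordering on a hypergraph.
def min_fill_order(hyperedges, query):
    edges = [set(e) for e in hyperedges]
    vars_ = set().union(*edges) - {query}
    order = []
    while vars_:
        def new_edge_size(v):
            # size of the hyperedge created by eliminating v
            return len(set().union(*[e for e in edges if v in e]) - {v})
        v = min(sorted(vars_), key=new_edge_size)  # sorted() makes ties deterministic
        merged = set().union(*[e for e in edges if v in e]) - {v}
        edges = [e for e in edges if v not in e] + [merged]
        vars_.remove(v)
        order.append(v)
    return order

# Hyperedges of P(A)P(B)P(C|A,B)P(D|C)P(E|C)P(F|D,E); query D (assumed).
edges = [{"A"}, {"B"}, {"A", "B", "C"}, {"C", "D"}, {"C", "E"}, {"D", "E", "F"}]
print(min_fill_order(edges, "D"))
```

On this network the greedy order never creates a factor larger than 2 variables, improving on the width-4 ordering (C first) analysed in the text.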
Relevance, continued.
We can restrict attention to relevant variables. Given query q and evidence E:
q itself is relevant;
if any node Z is relevant, its parents are relevant;
if e ∈ E is a descendant of a relevant node, then e is relevant.
We can restrict our attention to the subnetwork comprising only the relevant variables when evaluating a query Q.

Relevance: Examples (in the slides' network)
For a query with no evidence, only the query variable and its ancestors are relevant. For a query conditioned on an evidence variable, the evidence variable and hence its ancestors also become relevant: intuitively, we need to compute the probability of the evidence in order to compute the conditional probability. For a query conditioned on h, working through the summation, the sums over the remaining downstream variables reduce to tables of 1s, while a factor such as P(G)P(h|G) that does not mention the query variable is not 1 but is irrelevant anyway: once we normalize, it multiplies each value of the query variable equally.

In another example, the algorithm says all variables except H are relevant, but really only a few matter (the evidence cuts off all influence of the others): the algorithm is overestimating the relevant set.

Independence in a Bayes Net
Another piece of information we can obtain from a Bayes net is the structure of relationships in the domain. The structure of the BN means: every Xi is conditionally independent of all of its non-descendants given its parents:
P(Xi | S ∪ Par(Xi)) = P(Xi | Par(Xi)) for any subset S ⊆ NonDescendants(Xi).
More generally
Conditional independencies can be useful in computation, explanation, and elsewhere. How do we determine if two variables X, Y are independent given a set of evidence variables E? We can use a (simple) graphical property called d-separation.

D-separation: a set of variables E d-separates X and Y if it blocks every undirected path in the BN between X and Y (we'll define "blocks" next). X and Y are conditionally independent given evidence E if E d-separates X and Y. Thus a BN gives us an easy way to tell whether two variables are independent (set E = ∅) or conditionally independent given E.

What does it mean to be blocked?
A path between X and Y is blocked if:
there exists a variable V on the path such that V is in the evidence set and the arcs putting V on the path are tail-to-tail (a common cause: ← V →); or
there exists a variable V on the path such that V is in the evidence set and the arcs putting V on the path are tail-to-head (a chain: → V →); or
(Blocking: Graphical View) there exists a variable V on the path such that the arcs putting V on the path are head-to-head (a collider: → V ←), so long as V is NOT in the evidence set and neither are any of its descendants.
D-separation implies conditional independence
Theorem [Verma & Pearl, 1988]: if a set of evidence variables E d-separates X and Z in a Bayesian network's graph, then X is independent of Z given E.

D-Separation: Intuitions (in the slides' network: Subway → Flu, ExoticTrip → Malaria, Flu → Fever, Malaria → Fever, Flu → Aches, Fever → Therm)
Subway and Therm are dependent, but are independent given Flu (since Flu blocks the only path).
Aches and Fever are dependent, but are independent given Flu (since Flu blocks the only path). Similarly for Aches and Therm (dependent, but independent given Flu).
Flu and Malaria are independent (given no evidence): Fever blocks the path, since it is not in evidence, nor is its descendant Therm. Flu and Malaria are dependent given Fever (or given Therm): nothing blocks the path now.
D-Separation: Intuitions, continued.
Subway and ExoticTrip are independent; they are dependent given Therm; they are independent given Therm and Malaria. This is for exactly the same reasons as for Flu/Malaria above.

D-Separation Example
In the following network (figure in the slides), determine whether the two indicated variables are independent given the evidence:
1. given no evidence? No
2. given {}? No
3. given {G,}? Yes
4. given {G,,H}? Yes
5. given {G,}? No
6. given {,}? Yes
7. given {,,H}? No
8. given {}? Yes
9. given {H,}? Yes
10. given {G,,,H,,,}? Yes
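The blocking rules can be sketched as a small checker. The network is the flu/malaria intuition example, with the edge structure reconstructed above (treat it as an assumption): it enumerates the undirected paths between two nodes and tests each against the three blocking conditions.

```python
# A minimal d-separation checker for small DAGs.
edges = [("Subway", "Flu"), ("ExoticTrip", "Malaria"),
         ("Flu", "Fever"), ("Malaria", "Fever"),
         ("Flu", "Aches"), ("Fever", "Therm")]
children = {}
for u, v in edges:
    children.setdefault(u, set()).add(v)

def descendants(v):
    out, stack = set(), [v]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def paths(x, y, visited):
    # all undirected simple paths from x to y
    if x == y:
        yield [x]
        return
    nbrs = children.get(x, set()) | {u for u, v in edges if v == x}
    for n in nbrs - visited:
        for p in paths(n, y, visited | {n}):
            yield [x] + p

def blocked(path, z):
    for u, v, w in zip(path, path[1:], path[2:]):
        if (u, v) in edges and (w, v) in edges:     # collider u -> v <- w
            if v not in z and not (descendants(v) & z):
                return True
        elif v in z:                                # observed chain or common cause
            return True
    return False

def d_separated(x, y, z):
    return all(blocked(p, set(z)) for p in paths(x, y, {x}))

print(d_separated("Flu", "Malaria", []))         # Fever is an unobserved collider
print(d_separated("Flu", "Malaria", ["Therm"]))  # observing a collider's descendant unblocks
```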