Bayes Nets for representing and reasoning about uncertainty

Size: px

Start display at page:

Download "Bayes Nets for representing and reasoning about uncertainty"

Leona Watson
6 years ago
Views:

1 Bayes Nets for representing and reasoning about uncertainty Note to other teachers and users of these slides. ndrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of ndrew s tutorials: Comments and corrections gratefully received. ndrew W. Moore Professor School of Computer Science Carnegie Mellon University awm@cs.cmu.edu What we ll discuss Recall the numerous and dramatic benefits of Joint Distributions for describing uncertain worlds Reel with terror at the problem with using Joint Distributions Discover how Bayes Net methodology allows us to built Joint Distributions in manageable chunks Discover there s still a lurking problem Start to solve that problem 2

2 Why this matters In ndrew s opinion, the most important technology in the Machine Learning / I field to have emerged in the last years. clean, clear, manageable language and methodology for expressing what you re certain and uncertain about lready, many practical applications in medicine, factories, helpdesks: this problem these symptoms) anomalousness of this observation choosing next diagnostic test these observations 3 Why this matters In ndrew s opinion, the most important technology in the Machine Learning / I field ctive to have Data emerged in the last years. Collection clean, clear, manageable language and methodology for expressing what you re Inference certain and uncertain about lready, many practical applications nomaly in medicine, factories, helpdesks: Detection this problem these symptoms) anomalousness of this observation choosing next diagnostic test these observations 4 2

3 Ways to deal with Uncertainty Three-valued logic: True / False / Maybe Fuzzy logic (truth values between and ) Non-monotonic reasoning (especially focused on Penguin informatics) Dempster-Shafer theory (and an extension known as quasi-bayesian theory) Possibabilistic Logic Probability 5 Discrete Random Variables is a Boolean-valued random variable if denotes an event, and there is some degree of uncertainty as to whether occurs. Examples The US president in 223 will be male You wake up tomorrow with a headache You have Ebola 6 3

4 Probabilities We write ) as the fraction of possible worlds in which is true We could at this point spend 2 hours on the philosophy of this. But we won t. 7 Visualizing Event space of all possible worlds Worlds in which is true ) rea of reddish oval Its area is Worlds in which is False 8 4

5 Interpreting the axioms < ) < True) False) or B) ) + B) - and B) The area of can t get any smaller than nd a zero area would mean no world could ever have true 9 Interpreting the axioms < ) < True) False) or B) ) + B) - and B) The area of can t get any bigger than nd an area of would mean all worlds will have true 5

6 Interpreting the axioms < ) < True) False) or B) ) + B) - and B) B Interpreting the axioms < ) < True) False) or B) ) + B) - and B) B or B) and B) B Simple addition and subtraction 2 6

7 These xioms are Not to be Trifled With There have been attempts to do different methodologies for uncertainty Fuzzy Logic Three-valued logic Dempster-Shafer Non-monotonic reasoning But the axioms of probability are the only system with this property: If you gamble using them you can t be unfairly exploited by an opponent using some other system [di Finetti 93] 3 Theorems from the xioms < ) <, True), False) or B) ) + B) - and B) From these we can prove: not ) ~) -) How? 4 7

8 Side Note I am inflicting these proofs on you for two reasons:. These kind of manipulations will need to be second nature to you if you use probabilistic analytics in depth 2. Suffering is good for you 5 nother important theorem < ) <, True), False) or B) ) + B) - and B) From these we can prove: ) ^ B) + ^ ~B) How? 6 8

9 Conditional Probability B) Fraction of worlds in which B is true that also have true H Have a headache F Coming down with Flu F H) / F) /4 H F) /2 H Headaches are rare and flu is rarer, but if you re coming down with flu there s a 5-5 chance you ll have a headache. 7 Conditional Probability F H F) Fraction of flu-inflicted worlds in which you have a headache H Have a headache F Coming down with Flu H) / F) /4 H F) /2 H #worlds with flu and headache #worlds with flu rea of H and F region rea of F region H ^ F) F) 8 9

10 Definition of Conditional Probability ^ B) B) B) Corollary: The Chain Rule ^ B) B) B) 9 Bayes Rule ^ B) B) B) B ) ) ) This is Bayes Rule Bayes, Thomas (763) n essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:

11 Using Bayes Rule to Gamble R R B B $. R B B The Win envelope has a dollar and four beads in it The Lose envelope has three beads and no money Trivial question: someone draws an envelope at random and offers to sell it to you. How much should you pay? 2 Using Bayes Rule to Gamble $. The Win envelope has a dollar and four beads in it The Lose envelope has three beads and no money Interesting question: before deciding, you are allowed to see one bead drawn from the envelope. Suppose it s black: How much should you pay? Suppose it s red: How much should you pay? 22

12 Calculation $. 23 Multivalued Random Variables Suppose can take on more than 2 values is a random variable with arity k if it can take on exactly one value out of {v,v 2,.. v k } Thus v P v v v ) if i j v ) i j ( 2 k 24 2

13 n easy fact about Multivalued Random Variables: Using the axioms of probability < ) <, True), False) or B) ) + B) - and B) nd assuming that obeys v P v It s easy to prove that v v ) if i v ) i j ( 2 k j v v2 v ) i i v j j ) 25 n easy fact about Multivalued Random Variables: Using the axioms of probability < ) <, True), False) or B) ) + B) - and B) nd assuming that obeys v P v It s easy to prove that v v ) if i v ) i j ( 2 k j v v2 v ) nd thus we can prove k j v j i i v j j ) ) 26 3

14 4 27 nother fact about Multivalued Random Variables: Using the axioms of probability < ) <, True), False) or B) ) + B) - and B) nd assuming that obeys It s easy to prove that j i v v P j i if ) ( ) ( 2 k v v v P ) ( ]) [ ( 2 i j i v j B P v v v B P 28 nother fact about Multivalued Random Variables: Using the axioms of probability < ) <, True), False) or B) ) + B) - and B) nd assuming that obeys It s easy to prove that j i v v P j i if ) ( ) ( 2 k v v v P ) ( ]) [ ( 2 i j i v j B P v v v B P nd thus we can prove ) ( ) ( k j v j B P B P

15 More General Forms of Bayes Rule B ) ) B) B ) ) + B ~ ) ~ ) B X) X) B X) B X) 29 More General Forms of Bayes Rule v i B) n k B v ) v ) B v i ) v ) k i k 3 5

16 Useful Easy-to-prove facts B) + B) n k v k B) 3 The Joint Distribution Recipe for making a joint distribution of M variables: Example: Boolean variables, B, C 32 6

17 The Joint Distribution Recipe for making a joint distribution of M variables:. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2 M rows). Example: Boolean variables, B, C B C 33 The Joint Distribution Recipe for making a joint distribution of M variables:. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2 M rows). 2. For each combination of values, say how probable it is. Example: Boolean variables, B, C B C Prob

18 The Joint Distribution Recipe for making a joint distribution of M variables:. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2 M rows). 2. For each combination of values, say how probable it is. 3. If you subscribe to the axioms of probability, those numbers must sum to. Example: Boolean variables, B, C B.5.25 C.5 Prob C.3 B. 35 Using the Joint Once you have the JD you can ask for the probability of any logical expression involving your attribute E) rows matching E row) 36 8

19 Using the Joint Poor Male).4654 E) rows matching E row) 37 Using the Joint Poor).764 E) rows matching E row) 38 9

20 Inference with the Joint E E 2 ) E E2) E ) 2 rows matching E and E rows matching E row) 2 2 row) 39 Inference with the Joint E E 2 ) E E2) E ) 2 rows matching E and E rows matching E row) 2 2 row) Male Poor).4654 /

21 Joint distributions Good news Once you have a joint distribution, you can ask important questions about stuff that involves a lot of uncertainty Bad news Impossible to create for more than about ten attributes because there are so many numbers needed when you build the damn thing. 4 Using fewer numbers Suppose there are two events: M: Manuela teaches the class (otherwise it s ndrew) S: It is sunny The joint p.d.f. for these events contain four entries. If we want to build the joint p.d.f. we ll have to invent those four numbers. OR WILL WE?? We don t have to specify with bottom level conjunctive events such as ~M^S) IF instead it may sometimes be more convenient for us to specify things like: M), S). But just M) and S) don t derive the joint distribution. So you can t answer all questions. 42 2

22 Using fewer numbers Suppose there are two events: M: Manuela teaches the class (otherwise it s ndrew) S: It is sunny The joint p.d.f. for these events contain four entries. If we want to build the joint p.d.f. we ll have to invent those four numbers. OR WILL WE?? We don t have to specify with bottom level conjunctive events such as ~M^S) IF instead it may sometimes be more convenient for us to specify things like: M), S). But just M) and S) don t derive the joint distribution. So you can t answer all questions. What extra assumption can you make? 43 Independence The sunshine levels do not depend on and do not influence who is teaching. This can be specified very simply: S M) S) This is a powerful statement! It required extra domain knowledge. different kind of knowledge than numerical probabilities. It needed an understanding of causation

23 Independence From S M) S), the rules of probability imply: (can you prove these?) ~S M) ~S) M S) M) M ^ S) M) S) ~M ^ S) ~M) S), (PM^~S) M)~S), ~M^~S) ~M)~S) 45 Independence From S M) S), the rules of probability imply: (can you prove these?) nd in general: ~S M) ~S) Mu ^ Sv) Mu) Sv) M S) M) for each of the four combinations of M ^ S) M) S) utrue/false vtrue/false ~M ^ S) ~M) S), (PM^~S) M)~S), ~M^~S) ~M)~S) 46 23

24 We ve stated: M).6 S).3 S M) S) Independence From these statements, we can derive the full joint pdf. T T F F M T F T F S Prob nd since we now have the joint pdf, we can make any queries we like. 47 more interesting case M : Manuela teaches the class S : It is sunny L : The lecturer arrives slightly late. ssume both lecturers are sometimes delayed by bad weather. ndrew is more likely to arrive late than Manuela

25 more interesting case M : Manuela teaches the class S : It is sunny L : The lecturer arrives slightly late. ssume both lecturers are sometimes delayed by bad weather. ndrew is more likely to arrive late than Manuela. Let s begin with writing down knowledge we re happy about: S M) S), S).3, M).6 Lateness is not independent of the weather and is not independent of the lecturer. 49 more interesting case M : Manuela teaches the class S : It is sunny L : The lecturer arrives slightly late. ssume both lecturers are sometimes delayed by bad weather. ndrew is more likely to arrive late than Manuela. Let s begin with writing down knowledge we re happy about: S M) S), S).3, M).6 Lateness is not independent of the weather and is not independent of the lecturer. We already know the Joint of S and M, so all we need now is L Su, Mv) in the 4 cases of u/v True/False. 5 25

26 more interesting case M : Manuela teaches the class S : It is sunny L : The lecturer arrives slightly late. ssume both lecturers are sometimes delayed by bad weather. ndrew is more likely to arrive late than Manuela. S M) S) S).3 M).6 L M ^ S).5 L M ^ ~S). L ~M ^ S). L ~M ^ ~S).2 Now we can derive a full joint p.d.f. with a mere six numbers instead of seven* *Savings are larger for larger numbers of variables. 5 more interesting case M : Manuela teaches the class S : It is sunny L : The lecturer arrives slightly late. ssume both lecturers are sometimes delayed by bad weather. ndrew is more likely to arrive late than Manuela. S M) S) S).3 M).6 L M ^ S).5 L M ^ ~S). L ~M ^ S). L ~M ^ ~S).2 Question: Express Lx ^ My ^ Sz) in terms that only need the above expressions, where x,y and z may each be True or False

27 bit of notation S M) S) S).3 M).6 L M ^ S).5 L M ^ ~S). L ~M ^ S). L ~M ^ ~S).2 s).3 S M M).6 L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 L 53 S M) S) S).3 M).6 s).3 bit of notation L M ^ S).5 L M ^ ~S) Read. the absence of an arrow L ~M ^ S) between. S and M to mean it L ~M ^ ~S) would.2not help me predict M if I knew the value of S M).6 S M This kind of stuff will be thoroughly formalized later L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 L Read the two arrows into L to mean that if I want to know the value of L it may help me to know M and to know S

28 n even cuter trick Suppose we have these three events: M : Lecture taught by Manuela L : Lecturer arrives late R : Lecture concerns robots Suppose: ndrew has a higher chance of being late than Manuela. ndrew has a higher chance of giving robotics lectures. What kind of independence can we find? How about: L M) L)? R M) R)? L R) L)? 55 Conditional independence Once you know who the lecturer is, then whether they arrive late doesn t affect whether the lecture concerns robots. R M,L) R M) and R ~M,L) R ~M) We express this in the following way: R and L are conditionally independent given M..which is also notated by the following diagram. L M R Given knowledge of M, knowing anything else in the diagram won t help us with L, etc

29 Conditional Independence formalized R and L are conditionally independent given M if for all x,y,z in {T,F}: Rx My ^ Lz) Rx My) More generally: Let S and S2 and S3 be sets of variables. Set-of-variables S and set-of-variables S2 are conditionally independent given S3 if for all assignments of values to the variables in the sets, S s assignments S 2 s assignments & S 3 s assignments) S s assignments S3 s assignments) 57 Example: Shoe-size is conditionally independent of Glove-size given height weight and age R and L are conditionally independent given M if means for all x,y,z in {T,F}: forall s,g,h,w,a Rx My ShoeSizes Heighth,Weightw,gea) ^ Lz) Rx My) ShoeSizes Heighth,Weightw,gea,GloveSizeg) More generally: Let S and S2 and S3 be sets of variables. Set-of-variables S and set-of-variables S2 are conditionally independent given S3 if for all assignments of values to the variables in the sets, S s assignments S 2 s assignments & S 3 s assignments) S s assignments S3 s assignments) 58 29

30 Example: Shoe-size is conditionally independent of Glove-size given height weight and age R and L are conditionally independent given M if does not mean for all x,y,z in {T,F}: forall s,g,h Rx My ^ Lz) ShoeSizes Heighth) Rx My) ShoeSizes Heighth, GloveSizeg) More generally: Let S and S2 and S3 be sets of variables. Set-of-variables S and set-of-variables S2 are conditionally independent given S3 if for all assignments of values to the variables in the sets, S s assignments S 2 s assignments & S 3 s assignments) S s assignments S3 s assignments) 59 Conditional independence L M R We can write down M). nd then, since we know L is only directly influenced by M, we can write down the values of L M) and L ~M) and know we ve fully specified L s behavior. Ditto for R. M).6 L M).85 L ~M).7 R M).3 R ~M).6 R and L conditionally independent given M 6 3

31 Conditional independence M M).6 L L M).85 L ~M).7 R M).3 R ~M).6 R Conditional Independence: R M,L) R M), R ~M,L) R ~M) gain, we can obtain any member of the Joint prob dist that we desire: Lx ^ Ry ^ Mz) 6 ssume five variables T: The lecture started by :35 L: The lecturer arrives late R: The lecture concerns robots M: The lecturer is Manuela S: It is sunny T only directly influenced by L (i.e. T is conditionally independent of R,M,S given L) L only directly influenced by M and S (i.e. L is conditionally independent of R given M & S) R only directly influenced by M (i.e. R is conditionally independent of L,S, given M) M and S are independent 62 3

32 Making a Bayes net T: The lecture started by :35 L: The lecturer arrives late R: The lecture concerns robots M: The lecturer is Manuela S: It is sunny S M L R T Step One: add variables. Just choose the variables you d like to be included in the net. 63 Making a Bayes net T: The lecture started by :35 L: The lecturer arrives late R: The lecture concerns robots M: The lecturer is Manuela S: It is sunny S M L R T Step Two: add links. The link structure must be acyclic. If node X is given parents Q,Q 2,..Q n you are promising that any variable that s a non-descendent of X is conditionally independent of X given {Q,Q 2,..Q n } 64 32

33 Making a Bayes net T: The lecture started by :35 L: The lecturer arrives late R: The lecture concerns robots M: The lecturer is Manuela S: It is sunny s).3 S M M).6 L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 L T T L).3 T ~L).8 R R M).3 R ~M).6 Step Three: add a probability table for each node. The table for node X must list X Parent Values) for each possible combination of parent values 65 Making a Bayes net T: The lecture started by :35 L: The lecturer arrives late R: The lecture concerns robots M: The lecturer is Manuela S: It is sunny s).3 S M M).6 L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 L T T L).3 T ~L).8 R R M).3 R ~M).6 Two unconnected variables may still be correlated Each node is conditionally independent of all nondescendants in the tree, given its parents. You can deduce many other conditional independence relations from a Bayes net. See the next lecture

34 Bayes Nets Formalized Bayes net (also called a belief network) is an augmented directed acyclic graph, represented by the pair V, E where: V is a set of vertices. E is a set of directed edges joining vertices. No loops of any length are allowed. Each vertex in V contains the following information: The name of a random variable probability distribution table indicating how the probability of this variable s values depends on all possible combinations of parental values. 67 Building a Bayes Net. Choose a set of relevant variables. 2. Choose an ordering for them 3. ssume they re called X.. X m (where X is the first in the ordering, X is the second, etc) 4. For i to m:. dd the X i node to the network 2. Set Parents(X i ) to be a minimal subset of {X X i- } such that we have conditional independence of X i and all other members of {X X i- } given Parents(X i ) 3. Define the probability table of X i k ssignments of Parents(X i ) )

35 Example Bayes Net Building Suppose we re building a nuclear power station. There are the following random variables: GRL : Gauge Reads Low. CTL : Core temperature is low. FG : Gauge is faulty. F : larm is faulty S : larm sounds If alarm working properly, the alarm is meant to sound if the gauge stops reading a low temp. If gauge working properly, the gauge is meant to read the temp of the core. 69 Computing a Joint Entry How to compute an entry in a joint distribution? E.G: What is S ^ ~M ^ L ~R ^ T)? s).3 S M M).6 L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 L T T L).3 T ~L).8 R R M).3 R ~M)

36 s).3 L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 Computing with Bayes Net S L T M T L).3 T ~L).8 M).6 R R M).3 R ~M).6 T ^ ~R ^ L ^ ~M ^ S) T ~R ^ L ^ ~M ^ S) * ~R ^ L ^ ~M ^ S) T L) * ~R ^ L ^ ~M ^ S) T L) * ~R L ^ ~M ^ S) * L^~M^S) T L) * ~R ~M) * L^~M^S) T L) * ~R ~M) * L ~M^S)*~M^S) T L) * ~R ~M) * L ~M^S)*~M S)*S) T L) * ~R ~M) * L ~M^S)*~M)*S). 7 The general case X x ^ X 2 x 2 ^.X n- x n- ^ X n x n ) X n x n ^ X n- x n- ^.X 2 x 2 ^ X x ) X n x n X n- x n- ^.X 2 x 2 ^ X x ) * X n- x n- ^. X 2 x 2 ^ X x ) X n x n X n- x n- ^.X 2 x 2 ^ X x ) * X n- x n-. X 2 x 2 ^ X x ) * X n-2 x n-2 ^. X 2 x 2 ^ X x ) : : n P X x X x K X x i n i P ( )(( ) ( ))) i i ( X x ) ssignment s of Parents( X )) i i i i So any entry in joint pdf table can be computed. nd so any conditional probability can be computed. i 72 36

37 Where are we now? We have a methodology for building Bayes nets. We don t require exponential storage to hold our probability table. Only exponential in the maximum number of parents of any node. We can compute probabilities of any given assignment of truth values to the variables. nd we can do it in time linear with the number of nodes. So we can also compute answers to any questions. s).3 L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 E.G. What could we do to compute R T,~S)? S L T M T L).3 T ~L).8 M).6 R R M).3 R ~M).6 73 Where are we now? Step We : Compute have a methodology R ^ T ^ ~S) for building Bayes nets. We don t require exponential storage to hold our probability Step 2: Compute ~R ^ T ^ ~S) table. Only exponential in the maximum number of parents of any node. Step 3: Return We can compute probabilities of any given assignment of truth values to the variables. nd we can do it in time R ^ T ^ ~S) linear with the number of nodes R So we ^ T ^ can ~S)+ also ~R compute ^ T ^ ~S) answers to any questions. s).3 L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 E.G. What could we do to compute R T,~S)? S L T M T L).3 T ~L).8 M).6 R R M).3 R ~M)

38 Where are we now? Step We : Compute have a methodology R ^ T ^ ~S) for building Sum Bayes of all the nets. rows in the Joint that match R ^ T ^ ~S We don t require exponential storage to hold our probability Step 2: Compute ~R ^ T ^ ~S) table. Only exponential in the maximum number of parents of any node. Sum of all the rows in the Joint Step 3: Return We can compute probabilities of any that given match assignment ~R ^ T ^ ~Sof truth values to the variables. nd we can do it in time R ^ T ^ ~S) linear with the number of nodes R So we ^ T ^ can ~S)+ also ~R compute ^ T ^ ~S) answers to any questions. s).3 L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 E.G. What could we do to compute R T,~S)? S L T M T L).3 T ~L).8 M).6 R R M).3 R ~M).6 75 Where are we now? Step We : Compute have a methodology R ^ T ^ ~S) for building Sum Bayes of all the nets. rows in the Joint that match R ^ T ^ ~S We don t require exponential storage to hold our probability Step 2: Compute ~R ^ T ^ ~S) table. Only exponential in the maximum number of parents of any node. Sum of all the rows in the Joint Step 3: Return We can compute probabilities of any that given match assignment ~R ^ T ^ ~Sof truth values to the variables. nd we can do it in time R ^ T ^ ~S) linear with the number of nodes R So we ^ T ^ can ~S)+ also ~R compute ^ T ^ ~S) answers Each to any of these questions. obtained by the computing a joint s).3 S M M).6 probability entry method of the earlier slides L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 L T T L).3 T ~L).8 E.G. What could we do to compute R T,~S)? R R M).3 R ~M).6 4 joint computes 4 joint computes 76 38

39 The good news We can do inference. We can compute any conditional probability: Some variable Some other variable values ) E E 2 ) E E2) E ) 2 joint entries matching E and E joint entries matching E joint entry) joint entry) The good news We can do inference. We can compute any conditional probability: Some variable Some other variable values ) E E 2 ) E E2) E ) 2 joint entries matching E and E joint entries matching E joint entry) joint entry) 2 2 Suppose you have m binary-valued variables in your Bayes Net and expression E 2 mentions k variables. How much work is the above computation? 78 39

40 The sad, bad news Conditional probabilities by enumerating all matching entries in the joint are expensive: Exponential in the number of variables. 79 The sad, bad news Conditional probabilities by enumerating all matching entries in the joint are expensive: Exponential in the number of variables. But perhaps there are faster ways of querying Bayes nets? In fact, if I ever ask you to manually do a Bayes Net inference, you ll find there are often many tricks to save you time. So we ve just got to program our computer to do those tricks too, right? 8 4

41 The sad, bad news Conditional probabilities by enumerating all matching entries in the joint are expensive: Exponential in the number of variables. But perhaps there are faster ways of querying Bayes nets? In fact, if I ever ask you to manually do a Bayes Net inference, you ll find there are often many tricks to save you time. So we ve just got to program our computer to do those tricks too, right? Sadder and worse news: General querying of Bayes nets is NP-complete. 8 Bayes nets inference algorithms poly-tree is a directed acyclic graph in which no two nodes have more than one path between them. S X X M 2 L T poly tree X 3 X 5 R X 4 S X X 2 M L X 3 R X 4 X T 5 Not a poly tree (but still a legal Bayes net) If net is a poly-tree, there is a linear-time algorithm (see a later ndrew lecture). The best general-case algorithms convert a general net to a polytree (often at huge expense) and calls the poly-tree algorithm. nother popular, practical approach (doesn t assume poly-tree): Stochastic Simulation. 82 4

42 Sampling from the Joint Distribution s).3 S M M).6 L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 L T T L).3 T ~L).8 R R M).3 R ~M).6 It s pretty easy to generate a set of variable-assignments at random with the same probability as the underlying joint distribution. How? 83 Sampling from the Joint Distribution s).3 S M M).6 L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 L T T L).3 T ~L).8 R R M).3 R ~M).6 It s pretty easy to generate a set of variable-assignments at random with the same probability as the underlying joint distribution. How? 84 42

43 85 Sampling from the Joint Distribution s).3 S M M).6 L M^S).5 L M^~S). L ~M^S). L ~M^~S).2 L T T L).3 T ~L).8 R R M).3 R ~M).6. Randomly choose S. S True with prob.3 2. Randomly choose M. M True with prob.6 3. Randomly choose L. The probability that L is true depends on the assignments of S and M. E.G. if steps and 2 had produced STrue, MFalse, then probability that L is true is. 4. Randomly choose R. Probability depends on M. 5. Randomly choose T. Probability depends on L 86 43

44 general sampling algorithm Let s generalize the example on the previous slide to a general Bayes Net. s in Slides 6-7, call the variables X.. X n, where Parents(X i ) must be a subset of {X.. X i- }. For i to n:. Find parents, if any, of X i. ssume n(i) parents. Call them X p(i,), X p(i,2), X p(i,n(i)). 2. Recall the values that those parents were randomly given: x p(i,), x p(i,2), x p(i,n(i)). 3. Look up in the lookup-table for: X i True X p(i,) x p(i,),x p(i,2) x p(i,2) X p(i,n(i)) x p(i,n(i)) ) 4. Randomly set x i True according to this probability x, x 2, x n are now a sample from the joint distribution of X, X 2, X n. 87 Stochastic Simulation Example Someone wants to know R True T True ^ S False ) We ll do lots of random samplings and count the number of occurrences of the following: N c : Num. samples in which TTrue and SFalse. N s : Num. samples in which RTrue, TTrue and SFalse. N : Number of random samplings Now if N is big enough: N c /N is a good estimate of TTrue and SFalse). N s /N is a good estimate of RTrue,TTrue, SFalse). R T^~S) R^T^~S)/T^~S), so N s / N c can be a good estimate of R T^~S)

45 General Stochastic Simulation Someone wants to know E E 2 ) We ll do lots of random samplings and count the number of occurrences of the following: N c : Num. samples in which E 2 N s : Num. samples in which E and E 2 N : Number of random samplings Now if N is big enough: N c /N is a good estimate of E 2 ). N s /N is a good estimate of E, E 2 ). E E 2 ) E ^ E 2 )/E 2 ), so N s / N c can be a good estimate of E E 2 ). 89 Likelihood weighting Problem with Stochastic Sampling: With lots of constraints in E, or unlikely events in E, then most of the simulations will be thrown away, (they ll have no effect on Nc, or Ns). Imagine we re part way through our simulation. In E2 we have the constraint Xi v We re just about to generate a value for Xi at random. Given the values assigned to the parents, we see that Xi v parents) p. Now we know that with stochastic sampling: we ll generate Xi v proportion p of the time, and proceed. nd we ll generate a different value proportion -p of the time, and the simulation will be wasted. Instead, always generate Xi v, but weight the answer by weight p to compensate. 9 45

46 Likelihood weighting Set N c :, N s :. Generate a random assignment of all variables that matches E 2. This process returns a weight w. 2. Define w to be the probability that this assignment would have been generated instead of an unmatching assignment during its generation in the original algorithm.fact: w is a product of all likelihood factors involved in the generation. 3. N c : N c + w 4. If our sample matches E then N s : N s + w 5. Go to gain, N s / N c estimates E E 2 ) 9 Case Study I Pathfinder system. (Heckerman 99, Probabilistic Similarity Networks, MIT Press, Cambridge M). Diagnostic system for lymph-node diseases. 6 diseases and symptoms and test-results. 4, probabilities Expert consulted to make net. 8 hours to determine variables. 35 hours for net topology. 4 hours for probability table values. pparently, the experts found it quite easy to invent the causal links and probabilities. Pathfinder is now outperforming the world experts in diagnosis. Being extended to several dozen other medical domains

47 Questions What are the strengths of probabilistic networks compared with propositional logic? What are the weaknesses of probabilistic networks compared with propositional logic? What are the strengths of probabilistic networks compared with predicate logic? What are the weaknesses of probabilistic networks compared with predicate logic? (How) could predicate logic and probabilistic networks be combined? 93 What you should know The meanings and importance of independence and conditional independence. The definition of a Bayes net. Computing probabilities of assignments of variables (i.e. members of the joint p.d.f.) with a Bayes net. The slow (exponential) method for computing arbitrary, conditional probabilities. The stochastic simulation method and likelihood weighting

Bayes Nets for representing and reasoning about uncertainty

Bayes Nets for representing and reasoning about uncertainty Note to other teachers and users of these slides. ndrew would be delighted if you found this source material useful in giing your own lectures.