Probabilistic Reasoning Systems Dr. Richard J. Povinelli Copyright Richard J. Povinelli rev 1.0, 10/7/2001 Page 1
Objectives You should be able to: apply belief networks to model a problem with uncertainty; briefly describe other approaches to modeling uncertainty, such as Dempster-Shafer theory and fuzzy sets/logic.
Outline Conditional independence; Bayesian networks: syntax and semantics; exact inference by enumeration; exact inference by variable elimination; approximate inference by stochastic simulation; approximate inference by Markov chain Monte Carlo.
Independence Two random variables A and B are (absolutely) independent iff P(A | B) = P(A), or equivalently P(A, B) = P(A | B) P(B) = P(A) P(B); for example, A and B are two coin tosses. If n Boolean variables are independent, the full joint is P(X1, …, Xn) = Πi P(Xi), hence can be specified by just n numbers. Absolute independence is a very strong requirement, seldom met.
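The n-numbers claim can be checked directly: under absolute independence the full joint factors into the marginals. A minimal sketch, with three hypothetical marginal values:

```python
from itertools import product

# Marginals P(X_i = true) for three independent Boolean variables.
# These three numbers fully determine all 2^3 joint entries.
p = [0.5, 0.3, 0.9]  # hypothetical example values

def joint(assignment, p):
    """P(x1, ..., xn) = prod_i P(xi) under absolute independence."""
    result = 1.0
    for xi, pi in zip(assignment, p):
        result *= pi if xi else (1.0 - pi)
    return result

# Sanity check: the 2^n reconstructed joint entries sum to 1.
total = sum(joint(a, p) for a in product([True, False], repeat=len(p)))
```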
Conditional independence Consider the dentist problem with three random variables: Toothache, Cavity, Catch (steel probe catches in my tooth). The full joint distribution has 2^3 − 1 = 7 independent entries. If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache: (1) P(Catch | Toothache, Cavity) = P(Catch | Cavity), i.e., Catch is conditionally independent of Toothache given Cavity. The same independence holds if I haven't got a cavity: (2) P(Catch | Toothache, ¬Cavity) = P(Catch | ¬Cavity).
Conditional independence II Equivalent statements to (1): (1a) P(Toothache | Catch, Cavity) = P(Toothache | Cavity). Why? (1b) P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity). Why? The full joint distribution can now be written as P(Toothache, Catch, Cavity) = P(Toothache, Catch | Cavity) P(Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity), i.e., 2 + 2 + 1 = 5 independent numbers (equations 1 and 2 remove 2).
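The factored form can be used to reconstruct all 8 joint entries from the 5 numbers. The slide gives no concrete values, so the sketch below invents hypothetical ones:

```python
from itertools import product

# Five hypothetical independent numbers (not given on the slide):
p_cavity = 0.2                       # P(Cavity)
p_tooth = {True: 0.6, False: 0.1}    # P(Toothache | Cavity), P(Toothache | not Cavity)
p_catch = {True: 0.9, False: 0.2}    # P(Catch | Cavity), P(Catch | not Cavity)

def joint(toothache, catch, cavity):
    """P(Toothache, Catch, Cavity) = P(T|Cav) P(Catch|Cav) P(Cav)."""
    pc = p_cavity if cavity else 1.0 - p_cavity
    pt = p_tooth[cavity] if toothache else 1.0 - p_tooth[cavity]
    pk = p_catch[cavity] if catch else 1.0 - p_catch[cavity]
    return pt * pk * pc

# The reconstructed joint is a valid distribution...
total = sum(joint(t, k, c) for t, k, c in product([True, False], repeat=3))

# ...and satisfies (1): P(Catch | Toothache, Cavity) = P(Catch | Cavity).
p_catch_given_tc = joint(True, True, True) / (joint(True, True, True) + joint(True, False, True))
```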
Conditional independence III Equivalent statements to (1): (1a) P(Toothache | Catch, Cavity) = P(Toothache | Cavity). Why? P(Toothache | Catch, Cavity) = P(Catch | Toothache, Cavity) P(Toothache | Cavity) / P(Catch | Cavity) (Bayes' rule) = P(Catch | Cavity) P(Toothache | Cavity) / P(Catch | Cavity) (from 1) = P(Toothache | Cavity). (1b) P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity). Why? P(Toothache, Catch | Cavity) = P(Toothache | Catch, Cavity) P(Catch | Cavity) (product rule) = P(Toothache | Cavity) P(Catch | Cavity) (from 1a).
Belief networks A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions. Syntax: a set of nodes, one per variable; a directed, acyclic graph (a link means "directly influences"); a conditional distribution for each node given its parents: P(Xi | Parents(Xi)). In the simplest case, the conditional distribution is represented as a conditional probability table (CPT).
Example I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar? Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls. Network topology reflects causal knowledge: Burglary and Earthquake are parents of Alarm; Alarm is the parent of both JohnCalls and MaryCalls. Note: with at most k parents per node, the network needs O(d^k n) numbers vs. O(d^n) for the full joint.
Semantics Global semantics defines the full joint distribution as the product of the local conditional distributions: P(X1, …, Xn) = Π_{i=1..n} P(Xi | Parents(Xi)), e.g., P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) is given by??
Semantics Global semantics defines the full joint distribution as the product of the local conditional distributions: P(X1, …, Xn) = Π_{i=1..n} P(Xi | Parents(Xi)), e.g., P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(¬B) P(¬E) P(A | ¬B, ¬E) P(J | A) P(M | A). Local semantics: each node is conditionally independent of its nondescendants given its parents. Theorem: local semantics ⇔ global semantics.
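The product above can be evaluated numerically. The sketch below uses the CPT values commonly attached to this textbook burglary network; they do not appear on this slide, so treat them as illustrative assumptions:

```python
# Assumed CPT values for the standard textbook burglary network:
P_b, P_e = 0.001, 0.002                              # P(B), P(E)
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A | B, E)
P_j = {True: 0.90, False: 0.05}                      # P(J | A)
P_m = {True: 0.70, False: 0.01}                      # P(M | A)

# P(J and M and A and not B and not E)
#   = P(not B) P(not E) P(A | not B, not E) P(J | A) P(M | A)
prob = (1 - P_b) * (1 - P_e) * P_a[(False, False)] * P_j[True] * P_m[True]
```

With these values the event (alarm, both calls, no burglary, no earthquake) has probability about 0.000628.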
Markov blanket Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents
Constructing belief networks Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics. 1. Choose an ordering of variables X1, …, Xn. 2. For i = 1 to n: add Xi to the network; select parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1). This choice of parents guarantees the global semantics: P(X1, …, Xn) = Π_{i=1..n} P(Xi | X1, …, Xi−1) (chain rule) = Π_{i=1..n} P(Xi | Parents(Xi)) (by construction).
Example Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? [network so far: MaryCalls, JohnCalls]
Example Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? No. P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? [network so far: MaryCalls, JohnCalls, Alarm]
Example Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? No. P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No. P(B | A, J, M) = P(B | A)? P(B | A, J, M) = P(B)? [network so far: MaryCalls, JohnCalls, Alarm, Burglary]
Example Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? No. P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No. P(B | A, J, M) = P(B | A)? Yes. P(B | A, J, M) = P(B)? No. P(E | B, A, J, M) = P(E | A)? P(E | B, A, J, M) = P(E | A, B)? [network so far: MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]
Example Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? No. P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No. P(B | A, J, M) = P(B | A)? Yes. P(B | A, J, M) = P(B)? No. P(E | B, A, J, M) = P(E | A)? No. P(E | B, A, J, M) = P(E | A, B)? Yes. [network: MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]
CAT Belief Network Using the previous example make a better belief network. Why is yours better? Work with a partner for 5 minutes.
Example: Car diagnosis Initial evidence: engine won't start Testable variables (thin ovals), diagnosis variables (thick ovals) Hidden variables (shaded) ensure sparse structure, reduce parameters
Example: Car insurance Predict claim costs (medical, liability, property) given data on application form (other unshaded nodes)
Inference tasks Simple queries: compute the posterior marginal P(Xi | E = e), e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false). Conjunctive queries: P(Xi, Xj | E = e) = P(Xi | E = e) P(Xj | Xi, E = e). Optimal decisions: decision networks include utility information; probabilistic inference is required for P(outcome | action, evidence). Value of information: which evidence to seek next? Sensitivity analysis: which probability values are most critical? Explanation: why do I need a new starter motor?
Inference by enumeration Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation. Simple query on the burglary network: P(B | J = true, M = true) = P(B, J = true, M = true) / P(J = true, M = true) = α P(B, J = true, M = true) = α Σe Σa P(B, e, a, J = true, M = true). Rewrite full joint entries using products of CPT entries: P(B = true | J = true, M = true) = α Σe Σa P(B = true) P(e) P(a | B = true, e) P(J = true | a) P(M = true | a) = α P(B = true) Σe P(e) Σa P(a | B = true, e) P(J = true | a) P(M = true | a).
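A minimal enumeration sketch of this query, again assuming the standard textbook CPT values (they are not printed on the slide):

```python
from itertools import product

# Assumed textbook CPT values for the burglary network:
P_b, P_e = 0.001, 0.002
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A | B, E)
P_j = {True: 0.90, False: 0.05}                      # P(J | A)
P_m = {True: 0.70, False: 0.01}                      # P(M | A)

def joint(b, e, a, j, m):
    """Full joint entry as a product of CPT entries (global semantics)."""
    pr = (P_b if b else 1 - P_b) * (P_e if e else 1 - P_e)
    pr *= P_a[(b, e)] if a else 1 - P_a[(b, e)]
    pr *= P_j[a] if j else 1 - P_j[a]
    pr *= P_m[a] if m else 1 - P_m[a]
    return pr

# P(B | j, m) = alpha * sum_e sum_a P(B, e, a, j, m)
unnorm = {b: sum(joint(b, e, a, True, True)
                 for e, a in product([True, False], repeat=2))
          for b in [True, False]}
alpha = 1.0 / (unnorm[True] + unnorm[False])
posterior = {b: alpha * p for b, p in unnorm.items()}
```

With these numbers the posterior is roughly P(B = true | j, m) ≈ 0.284: the two calls make a burglary far more likely than the 0.001 prior, but it is still not the most probable explanation.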
Inference by variable elimination Enumeration is inefficient: repeated computation, e.g., it computes P(J = true | a) P(M = true | a) for each value of e. Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid recomputation. P(B | J = true, M = true) = α P(B) Σe P(e) Σa P(a | B, e) P(J = true | a) P(M = true | a) = α P(B) Σe P(e) Σa P(a | B, e) P(J = true | a) f_M(a) = α P(B) Σe P(e) Σa P(a | B, e) f_J(a) f_M(a) = α P(B) Σe P(e) Σa f_A(a, b, e) f_J(a) f_M(a) = α P(B) Σe P(e) f_AJM(b, e) (sum out A) = α P(B) f_EAJM(b) (sum out E) = α f_B(b) × f_EAJM(b). (figure: the burglary network — Burglary, Earthquake → Alarm → JohnCalls, MaryCalls)
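The right-to-left elimination can be sketched by materializing the two intermediate factors f_AJM and f_EAJM explicitly; the CPT values are again the assumed textbook numbers:

```python
# Assumed textbook CPT values for the burglary network:
P_b, P_e = 0.001, 0.002
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A | B, E)
P_j = {True: 0.90, False: 0.05}                      # P(J = true | A)
P_m = {True: 0.70, False: 0.01}                      # P(M = true | A)

def f_A(a, b, e):
    """CPT factor P(a | b, e)."""
    return P_a[(b, e)] if a else 1 - P_a[(b, e)]

# Sum out A, storing the intermediate factor f_AJM(b, e).
f_AJM = {(b, e): sum(f_A(a, b, e) * P_j[a] * P_m[a] for a in (True, False))
         for b in (True, False) for e in (True, False)}

# Sum out E, giving f_EAJM(b).
f_EAJM = {b: sum((P_e if e else 1 - P_e) * f_AJM[(b, e)]
                 for e in (True, False))
          for b in (True, False)}

# Multiply by the prior and normalize.
unnorm = {b: (P_b if b else 1 - P_b) * f_EAJM[b] for b in (True, False)}
alpha = 1.0 / sum(unnorm.values())
posterior = {b: alpha * p for b, p in unnorm.items()}
```

Each factor is computed once and reused for every value of e, which is exactly the recomputation that plain enumeration repeats; the answer matches enumeration, P(B = true | j, m) ≈ 0.284.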
Complexity of exact inference Singly connected networks (or polytrees): any two nodes are connected by at most one (undirected) path; time and space cost of variable elimination are O(d^k n). Multiply connected networks: can reduce 3SAT to exact inference => NP-hard; equivalent to counting 3SAT models => #P-complete.
Inference by stochastic simulation Basic idea: 1) draw N samples from a sampling distribution S; 2) compute an approximate posterior probability P̂; 3) show this converges to the true probability P.
Logic Sampling Simulate events to estimate the desired probability. What is P(WetGrass | Cloudy)? (figure: the sprinkler network — Cloudy → Sprinkler, Cloudy → Rain; Sprinkler and Rain → WetGrass)
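A logic-sampling sketch for this query: sample every variable in topological order, then keep only the samples consistent with the evidence Cloudy = true. The CPT values below are the ones usually given for this textbook sprinkler network; the slide's figure is not reproduced here, so treat them as assumptions:

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Assumed sprinkler-network CPTs (standard textbook values):
P_c = 0.5                                             # P(Cloudy)
P_s = {True: 0.10, False: 0.50}                       # P(Sprinkler | Cloudy)
P_r = {True: 0.80, False: 0.20}                       # P(Rain | Cloudy)
P_w = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}     # P(WetGrass | S, R)

def sample():
    """Draw one complete event in topological order (logic sampling)."""
    c = random.random() < P_c
    s = random.random() < P_s[c]
    r = random.random() < P_r[c]
    w = random.random() < P_w[(s, r)]
    return c, s, r, w

# Estimate P(WetGrass | Cloudy): reject samples with Cloudy = false,
# then take the fraction of kept samples with WetGrass = true.
N = 100_000
kept = [w for c, s, r, w in (sample() for _ in range(N)) if c]
estimate = sum(kept) / len(kept)
```

With these CPTs the exact answer is Σ_{s,r} P(s | c) P(r | c) P(w | s, r) ≈ 0.745, and the estimate converges to it as N grows; note that roughly half the samples are wasted by rejection, which is why more targeted schemes (likelihood weighting, MCMC) exist.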
CAT Logic Sampling Using the previous belief network, determine P(Rain | WetGrass, Cloudy). How many simulations did you run? Work with a partner for 5 minutes.