Inference in Bayesian networks
Chapter 14.4–5
Outline
- Exact inference by enumeration
- Approximate inference by stochastic simulation
Inference tasks
Simple queries: compute posterior marginal P(X_i | E = e)
  e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)
Conjunctive queries: P(X_i, X_j | E = e) = P(X_i | E = e) P(X_j | X_i, E = e)
Optimal decisions: decision networks include utility information;
  probabilistic inference required for P(outcome | action, evidence)
Value of information: which evidence to seek next?
Sensitivity analysis: which probability values are most critical?
Explanation: why do I need a new starter motor?
Inference by enumeration
Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation.
Simple query on the burglary network (B, E → A; A → J, M):
  P(B | j, m) = P(B, j, m)/P(j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, e, a, j, m)
Rewrite full joint entries using product of CPT entries:
  P(B | j, m) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)
              = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)
Recursive depth-first enumeration: O(n) space, O(d^n) time
Enumeration algorithm

function Enumeration-Ask(X, e, bn) returns a distribution over X
   inputs: X, the query variable
           e, observed values for variables E
           bn, a Bayesian network with variables {X} ∪ E ∪ Y
   Q(X) ← a distribution over X, initially empty
   for each value x_i of X do
       extend e with value x_i for X
       Q(x_i) ← Enumerate-All(Vars[bn], e)
   return Normalize(Q(X))

function Enumerate-All(vars, e) returns a real number
   if Empty?(vars) then return 1.0
   Y ← First(vars)
   if Y has value y in e
       then return P(y | Pa(Y)) × Enumerate-All(Rest(vars), e)
       else return Σ_y P(y | Pa(Y)) × Enumerate-All(Rest(vars), e_y)
           where e_y is e extended with Y = y
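Below is a minimal executable sketch of this algorithm in Python. The network format (a topologically ordered dict mapping each Boolean variable to (parents, cpt), where the CPT maps parent-value tuples to P(var = true | parents)) and all identifiers are illustrative assumptions rather than the book's code; the burglary CPT numbers are the standard ones from the chapter.

def enumeration_ask(X, e, bn):
    """Posterior distribution P(X | e) by exhaustive enumeration."""
    Q = {x: enumerate_all(list(bn), {**e, X: x}, bn) for x in (True, False)}
    alpha = 1.0 / sum(Q.values())                  # normalization constant
    return {x: alpha * q for x, q in Q.items()}

def enumerate_all(variables, e, bn):
    """Sum of products of CPT entries over all extensions of e."""
    if not variables:
        return 1.0
    Y, rest = variables[0], variables[1:]
    parents, cpt = bn[Y]
    p_true = cpt[tuple(e[p] for p in parents)]     # parents precede Y, so all are in e
    prob = lambda y: p_true if y else 1.0 - p_true
    if Y in e:                                     # evidence (or query) variable: no sum
        return prob(e[Y]) * enumerate_all(rest, e, bn)
    return sum(prob(y) * enumerate_all(rest, {**e, Y: y}, bn) for y in (True, False))

# Burglary network with the chapter's CPT values:
burglary_bn = {
    'B': ((), {(): 0.001}),
    'E': ((), {(): 0.002}),
    'A': (('B', 'E'), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    'J': (('A',), {(True,): 0.90, (False,): 0.05}),
    'M': (('A',), {(True,): 0.70, (False,): 0.01}),
}
print(enumeration_ask('B', {'J': True, 'M': True}, burglary_bn))
# -> roughly {True: 0.284, False: 0.716}

Placing the query value into e before the recursion lets evidence and query variables share one code path, mirroring the pseudocode's "extend e" step.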
Complexity of exact inference
Multiply connected networks:
- can reduce 3SAT to exact inference ⇒ NP-hard
- equivalent to counting 3SAT models ⇒ #P-complete
[Figure: a 3CNF formula encoded as a network. Root variables A, B, C, D, each with prior 0.5; one node per clause (1. A ∨ B ∨ C, 2. C ∨ D ∨ ¬A, 3. B ∨ C ∨ ¬D); a final AND node combining the clause nodes.]
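To make the reduction concrete, here is a sketch reusing enumerate_all and the toy format from above, encoding the figure's formula with deterministic clause and AND nodes; the exact clauses are reconstructed from the garbled figure and should be treated as an assumption. Because every root has prior 0.5, P(AND = true) is exactly the number of satisfying assignments divided by 2^4, so exact inference counts 3SAT models.

def clause_cpt(neg=(False, False, False)):
    # Deterministic CPT for a 3-literal OR clause; neg flags negated literals.
    return {(a, b, c): 1.0 if ((a != neg[0]) or (b != neg[1]) or (c != neg[2])) else 0.0
            for a in (True, False) for b in (True, False) for c in (True, False)}

sat_bn = {
    'A': ((), {(): 0.5}), 'B': ((), {(): 0.5}),
    'C': ((), {(): 0.5}), 'D': ((), {(): 0.5}),
    'C1': (('A', 'B', 'C'), clause_cpt()),                       # A v B v C
    'C2': (('C', 'D', 'A'), clause_cpt(neg=(False, False, True))),  # C v D v ~A
    'C3': (('B', 'C', 'D'), clause_cpt(neg=(False, False, True))),  # B v C v ~D
    'AND': (('C1', 'C2', 'C3'),
            {(a, b, c): 1.0 if (a and b and c) else 0.0
             for a in (True, False) for b in (True, False) for c in (True, False)}),
}
print(enumerate_all(list(sat_bn), {'AND': True}, sat_bn) * 2**4)
# -> 11.0 satisfying assignments for these particular clauses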
Inference by stochastic simulation
Basic idea:
1) Draw N samples from a sampling distribution S
2) Compute an approximate posterior probability P̂
3) Show this converges to the true probability P
Outline:
- Sampling from an empty network
- Rejection sampling: reject samples disagreeing with evidence
- Likelihood weighting: use evidence to weight samples
[Figure: a coin flip with probability 0.5 as the simplest sampling example.]
Sampling from an empty network

function Prior-Sample(bn) returns an event sampled from bn
   inputs: bn, a belief network specifying joint distribution P(X_1, ..., X_n)
   x ← an event with n elements
   for i = 1 to n do
       x_i ← a random sample from P(X_i | parents(X_i))
           given the values of Parents(X_i) in x
   return x
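A matching Python sketch of Prior-Sample in the same toy format, again assuming the dict lists variables in topological order so every parent is sampled before its children:

import random

def prior_sample(bn):
    """Sample one complete event from the network's prior distribution."""
    x = {}
    for X, (parents, cpt) in bn.items():       # topological order
        p_true = cpt[tuple(x[p] for p in parents)]
        x[X] = random.random() < p_true        # x_i ~ P(X_i | parents(X_i))
    return x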
Example
[Seven slides stepping through Prior-Sample on the sprinkler network (Cloudy → Sprinkler and Rain; Sprinkler, Rain → WetGrass), sampling one variable per slide from its CPT. Legible CPT entries: P(C) = 0.5, P(S | C) = 0.10, P(R | C) = 0.80, P(R | ¬C) = 0.20, P(W | S, R) = 0.99.]
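For concreteness, the sprinkler network in the same toy format. The CPT entries that are not legible on these slides (P(S | ¬C) and the remaining P(W | S, R) rows) are filled in with the standard values from the book and should be treated as assumptions:

sprinkler_bn = {
    'Cloudy':    ((), {(): 0.50}),
    'Sprinkler': (('Cloudy',), {(True,): 0.10, (False,): 0.50}),
    'Rain':      (('Cloudy',), {(True,): 0.80, (False,): 0.20}),
    'WetGrass':  (('Sprinkler', 'Rain'),
                  {(True, True): 0.99, (True, False): 0.90,
                   (False, True): 0.90, (False, False): 0.00}),
}
print(prior_sample(sprinkler_bn))
# e.g. {'Cloudy': True, 'Sprinkler': False, 'Rain': True, 'WetGrass': True}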
Sampling from an empty network contd.
Probability that Prior-Sample generates a particular event:
  S_PS(x_1 ... x_n) = Π_{i=1..n} P(x_i | parents(X_i)) = P(x_1 ... x_n)
i.e., the true prior probability
E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)
Let N_PS(x_1 ... x_n) be the number of samples generated for event x_1, ..., x_n
Then we have
  lim_{N→∞} P̂(x_1, ..., x_n) = lim_{N→∞} N_PS(x_1, ..., x_n)/N
                              = S_PS(x_1, ..., x_n)
                              = P(x_1 ... x_n)
That is, estimates derived from Prior-Sample are consistent
Shorthand: P̂(x_1, ..., x_n) ≈ P(x_1 ... x_n)
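A quick empirical check of this consistency claim, reusing prior_sample and sprinkler_bn from the sketches above: the fraction of samples equal to the event (t, f, t, t) should approach 0.324 as N grows.

N = 100_000
target = {'Cloudy': True, 'Sprinkler': False, 'Rain': True, 'WetGrass': True}
hits = sum(prior_sample(sprinkler_bn) == target for _ in range(N))
print(hits / N)   # typically within about 0.005 of 0.324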
Rejection sampling
P̂(X | e) estimated from samples agreeing with e

function Rejection-Sampling(X, e, bn, N) returns an estimate of P(X | e)
   local variables: N, a vector of counts over X, initially zero
   for j = 1 to N do
       x ← Prior-Sample(bn)
       if x is consistent with e then
           N[x] ← N[x] + 1 where x is the value of X in x
   return Normalize(N[X])

E.g., estimate P(Rain | Sprinkler = true) using 100 samples
  27 samples have Sprinkler = true
  Of these, 8 have Rain = true and 19 have Rain = false.
  P̂(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) = ⟨0.296, 0.704⟩
Similar to a basic real-world empirical estimation procedure
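A Python sketch of rejection sampling in the same toy format; the error raised when every sample is rejected foreshadows the cost problem analyzed on the next slide.

def rejection_sampling(X, e, bn, N):
    """Estimate P(X | e) from N prior samples, discarding those inconsistent with e."""
    counts = {True: 0, False: 0}
    for _ in range(N):
        x = prior_sample(bn)
        if all(x[var] == val for var, val in e.items()):   # consistent with e?
            counts[x[X]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("all samples rejected: P(e) too small for this N")
    return {v: c / total for v, c in counts.items()}

print(rejection_sampling('Rain', {'Sprinkler': True}, sprinkler_bn, 10_000))
# -> close to the true posterior {True: 0.3, False: 0.7}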
Analysis of rejection sampling
  P̂(X | e) = α N_PS(X, e)           (algorithm defn.)
            = N_PS(X, e)/N_PS(e)     (normalized by N_PS(e))
            ≈ P(X, e)/P(e)           (property of Prior-Sample)
            = P(X | e)               (defn. of conditional probability)
Hence rejection sampling returns consistent posterior estimates
Problem: hopelessly expensive if P(e) is small
P(e) drops off exponentially with number of evidence variables!
Likelihood weighting
Idea: fix evidence variables, sample only nonevidence variables,
and weight each sample by the likelihood it accords the evidence

function Likelihood-Weighting(X, e, bn, N) returns an estimate of P(X | e)
   local variables: W, a vector of weighted counts over X, initially zero
   for j = 1 to N do
       x, w ← Weighted-Sample(bn)
       W[x] ← W[x] + w where x is the value of X in x
   return Normalize(W[X])

function Weighted-Sample(bn, e) returns an event and a weight
   x ← an event with n elements; w ← 1
   for i = 1 to n do
       if X_i has a value x_i in e
           then w ← w × P(X_i = x_i | parents(X_i))
           else x_i ← a random sample from P(X_i | parents(X_i))
   return x, w
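A Python sketch of both functions in the same toy format. Weighted-Sample walks the variables in topological order, fixing each evidence variable and multiplying its CPT probability into the weight, and sampling nonevidence variables exactly as Prior-Sample does:

def weighted_sample(bn, e):
    """Return an event consistent with e and its likelihood weight."""
    x, w = dict(e), 1.0
    for X, (parents, cpt) in bn.items():           # topological order
        p_true = cpt[tuple(x[p] for p in parents)]
        if X in e:                                 # evidence: fix value, weight by likelihood
            w *= p_true if e[X] else 1.0 - p_true
        else:                                      # nonevidence: sample from the CPT
            x[X] = random.random() < p_true
    return x, w

def likelihood_weighting(X, e, bn, N):
    """Estimate P(X | e) from N weighted samples."""
    W = {True: 0.0, False: 0.0}
    for _ in range(N):
        x, w = weighted_sample(bn, e)
        W[x[X]] += w
    total = sum(W.values())
    return {v: c / total for v, c in W.items()}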
Likelihood weighting example
[Seven slides stepping through Weighted-Sample on the sprinkler network with evidence Sprinkler = true and WetGrass = true. Cloudy is sampled (true) with w = 1.0; at the evidence node Sprinkler the weight becomes w = 1.0 × 0.1; Rain is sampled (true); at WetGrass the weight becomes w = 1.0 × 0.1 × 0.99 = 0.099.]
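Running the traced query with the sketch above (illustrative):

e = {'Sprinkler': True, 'WetGrass': True}
print(likelihood_weighting('Rain', e, sprinkler_bn, 10_000))
# The particular sample traced on the slides (Cloudy = true, Rain = true)
# carries weight P(S=t | C=t) * P(W=t | S=t, R=t) = 0.1 * 0.99 = 0.099.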
Likelihood weighting analysis
Sampling probability for Weighted-Sample is
  S_WS(z, e) = Π_{i=1..l} P(z_i | parents(Z_i))
Note: pays attention to evidence in ancestors only
  ⇒ somewhere in between prior and posterior distribution
Weight for a given sample z, e is
  w(z, e) = Π_{i=1..m} P(e_i | parents(E_i))
Weighted sampling probability is
  S_WS(z, e) w(z, e) = Π_{i=1..l} P(z_i | parents(Z_i)) × Π_{i=1..m} P(e_i | parents(E_i))
                     = P(z, e)   (by standard global semantics of network)
Hence likelihood weighting returns consistent estimates
but performance still degrades with many evidence variables
because a few samples have nearly all the total weight
Summary
Exact inference by enumeration:
- NP-hard on general graphs
Approximate inference by LW:
- LW does poorly when there is lots of (downstream) evidence
- LW generally insensitive to topology
- Convergence can be very slow with probabilities close to 1 or 0
- Can handle arbitrary combinations of discrete and continuous variables