Bayesian Networks. B. Inference / B.3 Approximate Inference / Sampling
Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL)
Institute of Computer Science, University of Hildesheim

Course on Bayesian Networks, winter term 2016/2017
Syllabus

Mon (1)  0. Introduction; A. Fundamentals, A.1 Probability Calculus revisited
Mon (2)  A.2 Separation in Graphs
Mon (3)  A.3 Bayesian and Markov Networks
Mon (4)  (ctd.)
Mon (5)  B. Inference, B.1 Exact Inference
Mon (6)  B.2 Exact Inference / Join Tree
Mon (7)  (ctd.)
Mon (8)  B.3 Approximate Inference / Sampling
Mon (9)  B.4 Approximate Inference / Propagation
Mon (10) C. Learning, C.1 Parameter Learning
Mon (11) C.2 Parameter Learning with Missing Values
Mon (12) C.3 Structure Learning / PC Algorithm
Mon (13) C.4 Structure Learning / Search Methods
Outline

1. Why exact inference may not be good enough
2. Acceptance-Rejection Sampling
3. Importance Sampling
4. Self and Adaptive Importance Sampling
5. Stochastic / Loopy Propagation
1. Why exact inference may not be good enough

bayesian network   # variables   time for exact inference 1)
studfarm           …             … s
Hailfinder         …             … s
Pathfinder         …             … s
Link               …             … s

1) on a 1.6 GHz Pentium-M notebook (resp. on a 2.5 GHz Pentium-IV), though without an optimized implementation and with a very simple triangulation heuristic (minimal degree).
2. Acceptance-Rejection Sampling

Estimating marginals from data

Figure 1: Example data for the dog-problem (cases over the variables family-out, light-on, bowel-problem, dog-out, hear-bark).
Figure 2: Bayesian network for the dog-problem.
Figure 3: Estimating absolute probabilities (root node tables) for family-out: a) counts, b) probabilities.
Estimating marginals from data (ctd.)

Figure 4: Estimating conditional probabilities (inner node tables) for dog-out given family-out and bowel-problem: a) counts, b) absolute probabilities, c) conditional probabilities.
Estimating marginals from data given evidence

If we want to estimate the probabilities for family-out given the evidence that dog-out is 1, we have to
(i) identify all cases that are compatible with the given evidence,
(ii) estimate the target potential p(family-out | dog-out = 1) from these cases.

Figure 5: Accepted and rejected cases for evidence dog-out = 1.
Figure 6: Estimating target potentials given evidence, here p(family-out | dog-out = 1): a) counts, b) probabilities.
Learning and inferencing

Figure 7: Learning models from data for inferencing: from data, learn a) parameters (estimate vertex potentials) and b) graph structure; from the resulting model, infer target potentials.
Sampling and estimating

Figure 7 (ctd.): Learning models from data for inferencing vs. sampling from models and estimating: from a model, generate synthetic data (a sample) and estimate target potentials from it.
Sampling a discrete distribution

Given a discrete distribution, e.g., over the states Pain, Weightloss, Vomiting, Adeno (Figure 8: Example for a discrete distribution): how do we draw samples from this distribution, i.e., generate synthetic data that is distributed according to it?
Sampling a discrete distribution

(i) Fix an enumeration of all states of the distribution p, i.e., a bijection σ : {1, …, |Ω|} → Ω with Ω the set of all states,
(ii) compute the cumulative distribution function in the state index, i.e.,
    cum_{p,σ} : {1, …, |Ω|} → [0, 1],  i ↦ Σ_{j ≤ i} p(σ(j)),
(iii) draw a random real value r uniformly from [0, 1],
(iv) return the state ω = σ(i) for the minimal index i with cum_{p,σ}(i) ≥ r.

Figure 8: Example for a discrete distribution (states Pain, Weightloss, Vomiting, Adeno).
Figure 9: Cumulative distribution function cum_{p,σ}(i) over the index i.
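The four steps above can be sketched as follows; this is a minimal illustration, and the probability values are assumed, since the numbers in Figure 8 are not given here.

```python
import random
import bisect
import itertools

random.seed(0)

def make_sampler(p):
    """Inverse-CDF sampler for a discrete distribution.
    p: dict mapping states to probabilities; the iteration order of the
    dict plays the role of the enumeration sigma."""
    states = list(p.keys())
    # cumulative distribution function cum_{p,sigma} over the state index
    cum = list(itertools.accumulate(p[s] for s in states))
    def sample():
        r = random.random()              # r drawn uniformly from [0, 1)
        i = bisect.bisect_left(cum, r)   # minimal index i with cum[i] >= r
        return states[i]
    return sample

# hypothetical distribution over the four states named on the slide
p = {"Pain": 0.1, "Weightloss": 0.2, "Vomiting": 0.3, "Adeno": 0.4}
draw = make_sampler(p)
samples = [draw() for _ in range(100_000)]
# the empirical frequencies approximate p
```

`bisect_left` performs step (iv) in O(log |Ω|) instead of a linear search over the states.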
Sampling a Bayesian network / naive approach

As a Bayesian network encodes a discrete distribution, we could use the method from the previous slide to draw samples from a Bayesian network:
(i) compute the full JPD table from the Bayesian network,
(ii) draw a sample from the table as on the slide before.

This approach is not sensible, though: we use Bayesian networks precisely so that we do not have to compute the full JPD (as it normally is way too large to handle). How can we make use of the independencies encoded in the Bayesian network structure?
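To see why the full JPD is out of reach, a quick back-of-the-envelope computation (the variable counts are illustrative, not taken from the slide's benchmark networks):

```python
# The full JPD over n binary variables has 2**n entries, so materializing
# the table is hopeless already for moderately sized networks.
sizes = {n: 2 ** n for n in (5, 20, 50, 100)}
for n, size in sizes.items():
    print(f"{n:>3} binary variables -> JPD table with {size} entries")
```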
Sampling a Bayesian network

Idea: sample the variables separately, one at a time.

If we have already sampled X_1, …, X_k and X_{k+1} is a vertex such that desc(X_{k+1}) ∩ {X_1, …, X_k} = ∅, then
    p(x_{k+1} | X_1, …, X_k) = p(x_{k+1} | pa(X_{k+1})),
i.e., we can sample X_{k+1} from its vertex potential given the evidence of its parents (as sampled before).

sample-forward(B := (G := (V, E), (p_v)_{v ∈ V})):
    σ := topological-ordering(G)
    x := 0_V
    for i = 1, …, |σ| do
        v := σ(i)
        q := p_v | x_{pa(v)}
        draw x_v ~ q
    od
    return x

Figure 10: Algorithm for sampling a Bayesian network.
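A minimal sketch of sample-forward for the dog-problem network. The CPT entries marked "slide" appear in the example on the next slide; the remaining entries are assumed for illustration.

```python
import random

# P(child = 1 | parent values); keys are tuples of parent values
cpts = {
    "F": {(): 0.15},                  # p_F = (0.85, 0.15)          (slide)
    "B": {(): 0.20},                  # p_B = (0.8, 0.2)            (slide)
    "L": {(0,): 0.05,                 # p_L(. | F=0) = (0.95, 0.05) (slide)
          (1,): 0.60},                # F=1 row assumed
    "D": {(0, 1): 0.97,               # p_D(. | F=0,B=1)            (slide)
          (0, 0): 0.10, (1, 0): 0.90, (1, 1): 0.99},  # assumed
    "H": {(1,): 0.70,                 # p_H(. | D=1) = (0.3, 0.7)   (slide)
          (0,): 0.01},                # D=0 row assumed
}
parents = {"F": (), "B": (), "L": ("F",), "D": ("F", "B"), "H": ("D",)}
order = ["F", "B", "L", "D", "H"]     # a topological ordering sigma

def sample_forward(rng=random):
    """Sample each variable from its vertex potential, conditioned on
    the already-sampled values of its parents."""
    x = {}
    for v in order:
        pa = tuple(x[u] for u in parents[v])
        x[v] = 1 if rng.random() < cpts[v][pa] else 0
    return x
```

Because the ordering is topological, every variable's parents are instantiated before the variable itself is drawn, which is exactly the condition desc(X_{k+1}) ∩ {X_1, …, X_k} = ∅ above.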
Sampling a Bayesian network / example

Let σ := (F, B, L, D, H).
1. x_F ~ p_F = (0.85, 0.15), say with outcome 0.
2. x_B ~ p_B = (0.8, 0.2), say with outcome 1.
3. x_L ~ p_L(· | F = 0) = (0.95, 0.05), say with outcome 0.
4. x_D ~ p_D(· | F = 0, B = 1) = (0.03, 0.97), say with outcome 1.
5. x_H ~ p_H(· | D = 1) = (0.3, 0.7), say with outcome 1.

The result is x = (0, 1, 0, 1, 1).

Figure 11: Bayesian network for the dog-problem (F = family-out, B = bowel-problem, L = light-on, D = dog-out, H = hear-bark) with its conditional probability tables.
Acceptance-rejection sampling

Inferencing by acceptance-rejection sampling means:
(i) draw a sample from the Bayesian network (without evidence entered),
(ii) drop all data from the sample that are not conformant with the evidence,
(iii) estimate target potentials from the remaining data.

For Bayesian networks, sampling is done by forward sampling. Forward sampling can be stopped as soon as an evidence variable has been instantiated that contradicts the evidence.

infer-acceptance-rejection(B : bayesian network, W : target domain, E : evidence, n : sample size):
    D := (sample-forward(B) | i = 1, …, n)
    return estimate(D, W, E)

estimate(D : data, W : target domain, E : evidence):
    D′ := (d ∈ D | d|_{dom(E)} = val(E))
    return estimate(D′, W)

estimate(D : data, W : target domain):
    q := zero-potential on W
    for d ∈ D do q(d|_W)++ od
    q := q / |D|
    return q

Acceptance-rejection sampling for Bayesian networks is also called logic sampling.

Figure 12: Algorithm for acceptance-rejection sampling [Hen88].
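The scheme can be sketched on a toy two-variable network F → D with assumed CPTs: draw forward samples, keep only those consistent with the evidence D = 1, and estimate the target potential from the accepted cases.

```python
import random

random.seed(0)
p_F1 = 0.15                    # assumed P(F = 1)
p_D1 = {0: 0.10, 1: 0.90}      # assumed P(D = 1 | F)

def sample_forward():
    """Forward sampling without evidence entered."""
    f = 1 if random.random() < p_F1 else 0
    d = 1 if random.random() < p_D1[f] else 0
    return f, d

n = 200_000
# keep only the cases conformant with the evidence D = 1
accepted = [f for f, d in (sample_forward() for _ in range(n)) if d == 1]
est = sum(accepted) / len(accepted)   # estimate of P(F = 1 | D = 1)

# exact value by Bayes' rule, for comparison: ~0.614
exact = p_F1 * 0.9 / (p_F1 * 0.9 + 0.85 * 0.1)
```

Note how many samples are thrown away: only about p(D = 1) = 0.22 of the draws survive the filter, which previews the efficiency problem discussed in the next section.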
3. Importance Sampling

Acceptance rate of acceptance-rejection sampling

How efficient acceptance-rejection sampling is depends on the acceptance rate. Let E be evidence. Then the acceptance rate, i.e., the fraction of samples conformant with E, is p(E), the marginal probability of the evidence. Thus, acceptance-rejection sampling performs poorly if the probability of the evidence is small. In the studfarm example, p(J = aa) is so small that of 2326 sampled cases, on average 2325 are rejected.
Idea of importance sampling

Idea: do not sample the evidence variables, but instantiate them to the values of the evidence.

Instantiating the evidence variables first means we have to sample the other variables from
    p(X_{k+1} | X_1 = x_1, …, X_k = x_k, E_1 = e_1, …, E_m = e_m),
even for a topological ordering of the non-evidential variables.

Problem: if there is an evidence variable that is a descendant of a non-evidential variable X_{k+1} that has to be sampled, then it neither occurs among the parents of X_{k+1} nor is independent from X_{k+1}, and it may open dependency chains to other variables!

Figure 13: If C is evidential and already instantiated, say C = c, then A is dependent on C, so we would have to sample A from p(A | C = c). Even worse, B is dependent on C and A (d-separation), so we would have to sample B from p(B | A = a, C = c). But neither of these cpdfs is known in advance.
Inference from a stochastic point of view

Let V be a set of variables and p a pdf on dom(V). Inferring the marginal on a given set of variables W ⊆ V given evidence E means to compute (p_E)↓W, i.e., for all x ∈ dom(W):

    (p_E)↓W(x) = Σ_{y ∈ dom(V \ W \ dom(E))} p(x, y, e) = Σ_{y ∈ dom(V)} I_{(x,e)}(y) p(y)

with the indicator function
    I_x : dom(V) → {0, 1},  y ↦ 1, if y|_{dom(x)} = x; 0, else.

So we can reformulate the inference problem as the problem of averaging a given random variable f (here: f := I_{(x,e)}) over a given pdf p, i.e., to compute / estimate the mean

    E_p(f) := Σ_{x ∈ dom(p)} f(x) p(x).

Theorem 1 (strong law of large numbers). Let p : Ω → [0, 1] be a pdf, f : Ω → ℝ a random variable with E_p(|f|) < ∞, and X_i ~ f, i ∈ ℕ, independently. Then
    (1/n) Σ_{i=1}^n X_i → E_p(f)  almost surely.
Proof: see, e.g., [Sha03, p. 62].
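The reformulation can be checked on a toy distribution (the pdf below is assumed for illustration): the probability of an event is the mean of its indicator, so averaging indicator values over samples from p estimates the marginal.

```python
import random

random.seed(0)
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs  = [0.3, 0.2, 0.1, 0.4]       # assumed pdf p over pairs (x, y)

def draw():
    """Inverse-CDF draw from p."""
    r, c = random.random(), 0.0
    for s, pr in zip(states, probs):
        c += pr
        if r < c:
            return s
    return states[-1]

# f := indicator of the event {x = 1, y = 1}, so E_p(f) = p(1, 1) = 0.4;
# by the SLLN the sample mean of f converges to this value
n = 100_000
est = sum(1 for _ in range(n) if draw() == (1, 1)) / n
```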
Sampling from the wrong distribution

Inference by sampling applies the SLLN:

    E_p(f) := Σ_{x ∈ dom(p)} f(x) p(x) ≈ (1/n) Σ_{x ~ p} f(x)    (n draws from p).

Now let q be any other pdf with p(x) > 0 ⇒ q(x) > 0. Due to

    Σ_{x ∈ dom(p)} f(x) p(x) = Σ_{x ∈ dom(q)} f(x) (p(x)/q(x)) q(x) =: E_q(f · p/q) ≈ (1/n) Σ_{x ~ q} f(x) p(x)/q(x)    (n draws from q),

we can sample from q instead of from p if we adjust the function values of f accordingly. The pdf q is called importance function, the function w := p/q is called score or case weight.

Often we know the case weight only up to a multiplicative constant, i.e., w′ := c·w with unknown constant c. For a sample x_1, …, x_n ~ q, we then can approximate E_p(f) by

    E_p(f) ≈ (1/n) Σ_{i=1}^n f(x_i) w(x_i) ≈ Σ_{i=1}^n f(x_i) w′(x_i) / Σ_i w′(x_i),

since (1/n) Σ_i w′(x_i) = c (1/n) Σ_i p(x_i)/q(x_i) ≈ c E_q(p/q) = c E_p(1) = c.
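Both estimates above can be sketched on a discrete toy example (the target pdf p, the importance function q, and f are assumed for illustration): sampling from a uniform q, reweighting each draw by w = p/q, and showing that an unknown constant c cancels in the self-normalized form.

```python
import random

random.seed(0)
xs = [0, 1, 2]
p  = [0.1, 0.3, 0.6]            # target pdf
q  = [1/3, 1/3, 1/3]            # importance function (uniform)
f  = lambda x: float(x * x)

# exact value: E_p(f) = 0*0.1 + 1*0.3 + 4*0.6 = 2.7
exact = sum(f(x) * px for x, px in zip(xs, p))

n = 200_000
draws = [random.choice(xs) for _ in range(n)]   # n draws from q
w = lambda x: p[x] / q[x]                       # case weight w = p/q
est = sum(f(x) * w(x) for x in draws) / n       # (1/n) sum f(x) w(x)

# weights known only up to an unknown constant c: self-normalization
c = 7.0
w2 = lambda x: c * w(x)
est_sn = (sum(f(x) * w2(x) for x in draws)
          / sum(w2(x) for x in draws))
```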
Case weight

Back to sampling: from the true distribution p_E vs. from the Bayesian network with pre-instantiated evidence variables (the wrong distribution).

The probability for a sample x from a Bayesian network among samples consistent with a given evidence E is

    p(x | E) = p(x) / p(E) = Π_{v ∈ V} p_v(x_v | x_{pa(v)}) / p(E).

The probability for a sample x from a Bayesian network with pre-instantiated evidence variables is

    q_E(x) = Π_{v ∈ V \ dom(E)} p_v(x_v | x_{pa(v)}).

Thus, the case weight is

    w(x) := p(x | E) / q_E(x) = Π_{v ∈ dom(E)} p_v(x_v | x_{pa(v)}) / p(E).
Likelihood weighting sampling

Inferencing by importance sampling means:
(i) choose a sampling distribution q,
(ii) draw a weighted sample from q,
(iii) estimate target potentials from these sample data.

For Bayesian networks, sampling from Bayesian networks with pre-instantiated evidence variables and the case weight
    w(x) := Π_{v ∈ dom(E)} p_v(x_v | x_{pa(v)})
is called likelihood weighting sampling [FC90, SP90].

infer-likelihood-weighting(B : bayesian network, W : target domain, E : evidence, n : sample size):
    (D, w) := (sample-likelihood-weighting(B, E) | i = 1, …, n)
    return estimate(D, w, W)

sample-likelihood-weighting(B := (G, (p_v)_{v ∈ V_G}), E : evidence):
    σ := topological-ordering(G \ dom(E))
    x := 0_{V_G}
    x|_{dom(E)} := val(E)
    for i = 1, …, |σ| do
        v := σ(i)
        q := p_v | x_{pa(v)}
        draw x_v ~ q
    od
    w(x) := Π_{v ∈ dom(E)} p_v(x_v | x_{pa(v)})
    return (x, w(x))

estimate(D : data, w : case weight, W : target domain):
    q := zero-potential on W
    w_tot := 0
    for d ∈ D do
        q(d|_W) := q(d|_W) + w(d)
        w_tot := w_tot + w(d)
    od
    q := q / w_tot
    return q

Figure 14: Algorithm for inference by likelihood weighting sampling.
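A minimal sketch on the same toy network F → D used earlier (CPTs assumed): the evidence D = 1 is clamped instead of sampled, only F is drawn, and each sample carries the weight p_D(1 | F). Since the weight omits the unknown factor 1/p(E), the estimate is self-normalized by the total weight.

```python
import random

random.seed(0)
p_F1 = 0.15                    # assumed P(F = 1)
p_D1 = {0: 0.10, 1: 0.90}      # assumed P(D = 1 | F)

n = 200_000
num = den = 0.0
for _ in range(n):
    f = 1 if random.random() < p_F1 else 0   # sample the non-evidence variable
    w = p_D1[f]                              # case weight for evidence D = 1
    num += w * f                             # weighted count for F = 1
    den += w                                 # total weight w_tot
est = num / den                              # estimate of P(F = 1 | D = 1)

exact = p_F1 * 0.9 / (p_F1 * 0.9 + 0.85 * 0.1)   # Bayes' rule, ~0.614
```

Unlike acceptance-rejection sampling on the same network, no sample is discarded here: every draw contributes with its weight, which is why likelihood weighting copes far better with low-probability evidence.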
Likelihood weighting sampling / example

Let the evidence be D = 1. Fix σ := (F, B, L, H).
1. x_F ~ p_F = (0.85, 0.15), say with outcome 0.
2. x_B ~ p_B = (0.8, 0.2), say with outcome 1.
3. x_L ~ p_L(· | F = 0) = (0.95, 0.05), say with outcome 0.
4. x_H ~ p_H(· | D = 1) = (0.3, 0.7), say with outcome 1.

The result is x = (0, 1, 0, 1, 1) with case weight
    w(x) = p_D(D = 1 | F = 0, B = 1) = 0.97.

Figure 15: Bayesian network for the dog-problem.
Acceptance-rejection sampling as importance sampling

Acceptance-rejection sampling can be viewed as another instance of importance sampling. Here, the sampling distribution is q := p (i.e., the distribution without evidence entered; the target distribution is p_E!) and the case weight is

    w(x) := I_e(x) := 1, if x|_{dom(E)} = val(E); 0, otherwise.
References

[FC90] R. Fung and K. Chang. Weighting and integrating evidence for stochastic simulation in Bayesian networks. In M. Henrion, R. D. Shachter, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence 5. North-Holland, Amsterdam, 1990.
[Hen88] M. Henrion. Propagation of uncertainty by logic sampling in Bayes networks. In J. F. Lemmer and L. N. Kanal, editors, Uncertainty in Artificial Intelligence 2. North-Holland, Amsterdam, 1988.
[Sha03] Jun Shao. Mathematical Statistics. Springer, 2003.
[SP90] R. D. Shachter and M. Peot. Simulation approaches to general probabilistic inference on belief networks. In M. Henrion, R. D. Shachter, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence 5. North-Holland, Amsterdam, 1990.
Tractable Inference in Hybrid Bayesian Networks with Deterministic Conditionals using Re-approximations Rafael Rumí, Antonio Salmerón Department of Statistics and Applied Mathematics University of Almería,
More informationInference in Graphical Models Variable Elimination and Message Passing Algorithm
Inference in Graphical Models Variable Elimination and Message Passing lgorithm Le Song Machine Learning II: dvanced Topics SE 8803ML, Spring 2012 onditional Independence ssumptions Local Markov ssumption
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationDynamic Importance Sampling in Bayesian Networks Based on Probability Trees
Dynamic Importance Sampling in Bayesian Networks Based on Probability Trees Serafín Moral Dpt. Computer Science and Artificial Intelligence University of Granada Avda. Andalucía 38, 807 Granada, Spain
More informationAn Introduction to Bayesian Machine Learning
1 An Introduction to Bayesian Machine Learning José Miguel Hernández-Lobato Department of Engineering, Cambridge University April 8, 2013 2 What is Machine Learning? The design of computational systems
More informationLecture 8: PGM Inference
15 September 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I 1 Variable elimination Max-product Sum-product 2 LP Relaxations QP Relaxations 3 Marginal and MAP X1 X2 X3 X4
More informationOutline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012
CSE 573: Artificial Intelligence Autumn 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 Outline
More information{ p if x = 1 1 p if x = 0
Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =
More informationLearning Bayesian Networks (part 1) Goals for the lecture
Learning Bayesian Networks (part 1) Mark Craven and David Page Computer Scices 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Some ohe slides in these lectures have been adapted/borrowed from materials
More informationArtificial Intelligence Bayes Nets: Independence
Artificial Intelligence Bayes Nets: Independence Instructors: David Suter and Qince Li Course Delivered @ Harbin Institute of Technology [Many slides adapted from those created by Dan Klein and Pieter
More informationBayesian networks. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018
Bayesian networks CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018 Soleymani Slides have been adopted from Klein and Abdeel, CS188, UC Berkeley. Outline Probability
More informationBayesian belief networks
CS 2001 Lecture 1 Bayesian belief networks Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square 4-8845 Milos research interests Artificial Intelligence Planning, reasoning and optimization in the presence
More informationBayesian networks. Chapter 14, Sections 1 4
Bayesian networks Chapter 14, Sections 1 4 Artificial Intelligence, spring 2013, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 14, Sections 1 4 1 Bayesian networks
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationInference in Bayesian Networks
Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)
More informationModeling and Inference with Relational Dynamic Bayesian Networks Cristina Manfredotti
Modeling and Inference with Relational Dynamic Bayesian Networks Cristina Manfredotti cristina.manfredotti@gmail.com The importance of the context Cristina Manfredotti 2 Complex activity recognition Cristina
More informationFinal Examination CS 540-2: Introduction to Artificial Intelligence
Final Examination CS 540-2: Introduction to Artificial Intelligence May 7, 2017 LAST NAME: SOLUTIONS FIRST NAME: Problem Score Max Score 1 14 2 10 3 6 4 10 5 11 6 9 7 8 9 10 8 12 12 8 Total 100 1 of 11
More informationQuantifying Uncertainty & Probabilistic Reasoning. Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari
Quantifying Uncertainty & Probabilistic Reasoning Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari Outline Previous Implementations What is Uncertainty? Acting Under Uncertainty Rational Decisions Basic
More informationBayes Nets: Independence
Bayes Nets: Independence [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.] Bayes Nets A Bayes
More informationPlanning and Optimal Control
Planning and Optimal Control 1. Markov Models Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany 1 / 22 Syllabus Tue.
More informationInference in Bayesian Networks
Lecture 7 Inference in Bayesian Networks Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Slides by Stuart Russell and Peter Norvig Course Overview Introduction
More informationArtificial Intelligence Methods. Inference in Bayesian networks
Artificial Intelligence Methods Inference in Bayesian networks In which we explain how to build network models to reason under uncertainty according to the laws of probability theory. Dr. Igor rajkovski
More informationProbabilistic Partial Evaluation: Exploiting rule structure in probabilistic inference
Probabilistic Partial Evaluation: Exploiting rule structure in probabilistic inference David Poole University of British Columbia 1 Overview Belief Networks Variable Elimination Algorithm Parent Contexts
More informationCOS402- Artificial Intelligence Fall Lecture 10: Bayesian Networks & Exact Inference
COS402- Artificial Intelligence Fall 2015 Lecture 10: Bayesian Networks & Exact Inference Outline Logical inference and probabilistic inference Independence and conditional independence Bayes Nets Semantics
More informationReasoning Under Uncertainty: Conditioning, Bayes Rule & the Chain Rule
Reasoning Under Uncertainty: Conditioning, Bayes Rule & the Chain Rule Alan Mackworth UBC CS 322 Uncertainty 2 March 13, 2013 Textbook 6.1.3 Lecture Overview Recap: Probability & Possible World Semantics
More informationJunction Tree, BP and Variational Methods
Junction Tree, BP and Variational Methods Adrian Weller MLSALT4 Lecture Feb 21, 2018 With thanks to David Sontag (MIT) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,
More informationGeneralized Evidence Pre-propagated Importance Sampling for Hybrid Bayesian Networks
Generalized Evidence Pre-propagated Importance Sampling for Hybrid Bayesian Networks Changhe Yuan Department of Computer Science and Engineering Mississippi State University Mississippi State, MS 39762
More informationCSEP 573: Artificial Intelligence
CSEP 573: Artificial Intelligence Hidden Markov Models Luke Zettlemoyer Many slides over the course adapted from either Dan Klein, Stuart Russell, Andrew Moore, Ali Farhadi, or Dan Weld 1 Outline Probabilistic
More informationGraphical Models - Part I
Graphical Models - Part I Oliver Schulte - CMPT 726 Bishop PRML Ch. 8, some slides from Russell and Norvig AIMA2e Outline Probabilistic Models Bayesian Networks Markov Random Fields Inference Outline Probabilistic
More informationUncertainty. Logic and Uncertainty. Russell & Norvig. Readings: Chapter 13. One problem with logical-agent approaches: C:145 Artificial
C:145 Artificial Intelligence@ Uncertainty Readings: Chapter 13 Russell & Norvig. Artificial Intelligence p.1/43 Logic and Uncertainty One problem with logical-agent approaches: Agents almost never have
More informationBayesian Networks. Characteristics of Learning BN Models. Bayesian Learning. An Example
Bayesian Networks Characteristics of Learning BN Models (All hail Judea Pearl) (some hail Greg Cooper) Benefits Handle incomplete data Can model causal chains of relationships Combine domain knowledge
More informationReview: Bayesian learning and inference
Review: Bayesian learning and inference Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E Inference problem:
More informationPublished in: Tenth Tbilisi Symposium on Language, Logic and Computation: Gudauri, Georgia, September 2013
UvA-DARE (Digital Academic Repository) Estimating the Impact of Variables in Bayesian Belief Networks van Gosliga, S.P.; Groen, F.C.A. Published in: Tenth Tbilisi Symposium on Language, Logic and Computation:
More informationY. Xiang, Inference with Uncertain Knowledge 1
Inference with Uncertain Knowledge Objectives Why must agent use uncertain knowledge? Fundamentals of Bayesian probability Inference with full joint distributions Inference with Bayes rule Bayesian networks
More informationBayesian networks. Chapter Chapter
Bayesian networks Chapter 14.1 3 Chapter 14.1 3 1 Outline Syntax Semantics Parameterized distributions Chapter 14.1 3 2 Bayesian networks A simple, graphical notation for conditional independence assertions
More informationApproximate Inference
Approximate Inference Simulation has a name: sampling Sampling is a hot topic in machine learning, and it s really simple Basic idea: Draw N samples from a sampling distribution S Compute an approximate
More information