AND/OR Importance Sampling


Vibhav Gogate and Rina Dechter
Department of Information and Computer Science, University of California, Irvine, CA 92697
{vgogate,dechter}@ics.uci.edu

Abstract

The paper introduces AND/OR importance sampling for probabilistic graphical models. In contrast to importance sampling, AND/OR importance sampling caches samples in the AND/OR space and then extracts a new sample mean from the stored samples. We prove that AND/OR importance sampling may have lower variance than importance sampling, thereby providing a theoretical justification for preferring it over importance sampling. Our empirical evaluation demonstrates that AND/OR importance sampling is far more accurate than importance sampling in many cases.

1 Introduction

Many problems in graphical models, such as computing the probability of evidence in Bayesian networks, solution counting in constraint networks and computing the partition function in Markov random fields, are summation problems, defined as a sum of a function over a domain. Because these problems are NP-hard, sampling-based techniques are often used to approximate the sum. The focus of the current paper is on importance sampling.

The main idea in importance sampling [Geweke, 1989, Rubinstein, 1981] is to transform the summation problem into that of computing a weighted average over the domain by using a special distribution called the proposal (or importance) distribution. Importance sampling then generates samples from the proposal distribution and approximates the true average over the domain by an average over the samples, often referred to as the sample average. The sample average is simply the ratio of the sum of the sample weights to the number of samples, and it can be computed in a memory-less fashion since it requires keeping only these two quantities in memory.

The main idea in this paper is to equip importance sampling with memoization or caching in order to exploit conditional independencies that exist in the graphical model. Specifically, we cache the samples on an AND/OR tree or graph [Dechter and Mateescu, 2007] which respects the structure of the graphical model and then compute a new weighted average over that AND/OR structure, yielding, as we show, an unbiased estimator that has smaller variance than the importance sampling estimator. Similar to AND/OR search [Dechter and Mateescu, 2007], our new AND/OR importance sampling scheme recursively combines samples that are cached in independent components, yielding an increase in the effective sample size, which is part of the reason that its estimates have lower variance.

We present a detailed experimental evaluation comparing importance sampling with AND/OR importance sampling on Bayesian network benchmarks. We observe that the latter outperforms the former on most benchmarks, and in some cases quite significantly.

The rest of the paper is organized as follows. In the next section, we describe preliminaries on graphical models, importance sampling and AND/OR search spaces. In sections 3, 4 and 5 we formally describe AND/OR importance sampling and prove that its sample mean has lower variance than conventional importance sampling. Experimental results are described in section 6, and we conclude with a discussion of related work and a summary in section 7.

2 Preliminaries

We represent sets by bold capital letters and members of a set by capital letters. An assignment of a value to a variable is denoted by a small letter, while bold small letters indicate an assignment to a set of variables.

Definition 2.1 (belief networks).
A belief network (BN) is a graphical model R = (X, D, P), where X = {X_1, ..., X_n} is a set of random variables over multi-valued domains D = {D_1, ..., D_n}. Given a directed acyclic graph G over X, P = {P_i}, where P_i = P(X_i | pa(X_i)), are the conditional probability tables (CPTs) associated with each X_i, and pa(X_i) is the set of parents of the variable X_i in G. A belief network represents a probability distribution over X, $P(\mathbf{X}) = \prod_{i=1}^{n} P(X_i \mid pa(X_i))$.
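As a concrete illustration of Definition 2.1, the following minimal Python sketch (not from the paper) represents a two-variable toy belief network as CPT dictionaries and evaluates $P(\mathbf{x}) = \prod_i P(x_i \mid pa(x_i))$ for a complete assignment. The network, variable names and probabilities are invented for illustration.

```python
# Minimal sketch (not from the paper): a toy belief network over binary
# variables A -> B, represented as CPT dictionaries, and evaluation of the
# joint probability P(x) = prod_i P(x_i | pa(x_i)). Names and numbers are
# made up for illustration.
parents = {"A": [], "B": ["A"]}

# CPTs: map (value, parent-assignment tuple) -> probability
cpt = {
    "A": {(0, ()): 0.6, (1, ()): 0.4},
    "B": {(0, (0,)): 0.9, (1, (0,)): 0.1,
          (0, (1,)): 0.3, (1, (1,)): 0.7},
}

def joint_probability(assignment):
    """P(x) = prod_i P(x_i | pa(x_i)) for a complete assignment {var: value}."""
    prob = 1.0
    for var, pa in parents.items():
        pa_vals = tuple(assignment[p] for p in pa)
        prob *= cpt[var][(assignment[var], pa_vals)]
    return prob

print(joint_probability({"A": 1, "B": 0}))   # 0.4 * 0.3 = 0.12
```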

[Figure 1: (a) Bayesian Network, (b) Pseudo-tree, (c) AND/OR tree, (d) AND/OR search graph]

An evidence set E = e is an instantiated subset of variables. The moral graph (or primal graph) of a belief network is the undirected graph obtained by connecting the parents of each node and removing direction.

Definition 2.2 (Probability of Evidence). Given a belief network R and evidence E = e, the probability of evidence P(E = e) is defined as:

$$P(\mathbf{e}) = \sum_{\mathbf{X} \setminus \mathbf{E}} \prod_{j=1}^{n} P(X_j \mid pa(X_j)) \Big|_{\mathbf{E}=\mathbf{e}} \qquad (1)$$

The notation $h(\mathbf{X})|_{\mathbf{E}=\mathbf{e}}$ stands for the function h over X \ E obtained by fixing the assignment E = e.

2.1 AND/OR search spaces

We can compute the probability of evidence by search, by accumulating probabilities over the search space of instantiated variables. In the simplest case, this process defines an OR search tree, whose nodes represent partial variable assignments. This search space does not capture the structure of the underlying graphical model. To remedy this problem, [Dechter and Mateescu, 2007] introduced the notion of AND/OR search space. Given a Bayesian network R = (X, D, P), its AND/OR search space is driven by a pseudo tree, defined below.

Definition 2.3 (Pseudo Tree). Given an undirected graph G = (V, E), a directed rooted tree T = (V, E') defined on all its nodes is called a pseudo tree if any arc of G which is not included in E' is a back-arc, namely it connects a node to an ancestor in T.

Definition 2.4 (Labeled AND/OR tree). Given a graphical model R = (X, D, P), its primal graph G and a backbone pseudo tree T of G, the associated AND/OR search tree has alternating levels of AND and OR nodes. The OR nodes are labeled X_i and correspond to the variables. The AND nodes are labeled <X_i, x_i> and correspond to the value assignments in the domains of the variables. The structure of the AND/OR search tree is based on the underlying backbone tree T. The root of the AND/OR search tree is an OR node labeled by the root of T. Each arc emanating from an OR node to an AND node is associated with a label which can be derived from the CPTs of the Bayesian network [Dechter and Mateescu, 2007]. Each OR node and AND node is also associated with a value that is used for computing the quantity of interest. Semantically, the OR states represent alternative assignments, whereas the AND states represent problem decomposition into independent subproblems, all of which need to be solved. When the pseudo-tree is a chain, the AND/OR search tree coincides with the regular OR search tree. The probability of evidence can be computed from a labeled AND/OR tree by recursively computing the value of all nodes from leaves to the root [Dechter and Mateescu, 2007].

Example 2.5. Figure 1(a) shows a Bayesian network over seven variables with domains {0,1}. F and G are evidence nodes. Figure 1(c) shows the AND/OR search tree for the Bayesian network based on the pseudo-tree in Figure 1(b). Note that because F and G are instantiated, the search space has only 5 variables.

2.2 Computing Probability of Evidence Using Importance Sampling

Importance sampling [Rubinstein, 1981] is a simulation technique commonly used to evaluate the sum $M = \sum_{\mathbf{x}} f(\mathbf{x})$ for some real function f. The idea is to generate samples $\mathbf{x}^1, \ldots, \mathbf{x}^N$ from a proposal distribution Q (satisfying $f(\mathbf{x}) > 0 \Rightarrow Q(\mathbf{x}) > 0$) and then estimate M as follows:

$$M = \sum_{\mathbf{x}} f(\mathbf{x}) = \sum_{\mathbf{x}} \frac{f(\mathbf{x})}{Q(\mathbf{x})}\, Q(\mathbf{x}) = E_Q\!\left[\frac{f(\mathbf{x})}{Q(\mathbf{x})}\right] \qquad (2)$$

$$\widehat{M} = \frac{1}{N} \sum_{i=1}^{N} w(\mathbf{x}^i), \quad \text{where } w(\mathbf{x}^i) = \frac{f(\mathbf{x}^i)}{Q(\mathbf{x}^i)} \qquad (3)$$

w is often referred to as the sample weight. It is known that the expected value $E(\widehat{M}) = M$ [Rubinstein, 1981].
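The following minimal Python sketch (not from the paper) spells out the estimator of Equations 2 and 3 on a toy problem: samples are drawn from an assumed proposal Q over a small discrete domain, and the estimate is just the running sum of sample weights divided by the number of samples, which is why it can be computed in a memory-less fashion. The function f and proposal Q are invented for illustration.

```python
# Minimal sketch (not from the paper) of the estimator in Equations 2-3:
# estimate M = sum_x f(x) by averaging the weights f(x)/Q(x) of samples
# drawn from a proposal Q. f and Q are illustrative toy tables over a
# small discrete domain.
import random

domain = [0, 1, 2, 3]
f = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}          # M = sum(f.values()) = 1.0
Q = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}      # uniform proposal

def importance_sampling_estimate(num_samples, rng=random):
    total_weight = 0.0
    for _ in range(num_samples):
        x = rng.choices(domain, weights=[Q[v] for v in domain])[0]
        total_weight += f[x] / Q[x]            # sample weight w(x) = f(x)/Q(x)
    return total_weight / num_samples          # memory-less: two scalars suffice

print(importance_sampling_estimate(10000))     # close to 1.0
```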
To compute the probability of evidence by importance sampling, we use the substitution:

$$f(\mathbf{x}) = \prod_{j=1}^{n} P(X_j \mid pa(X_j)) \Big|_{\mathbf{E}=\mathbf{e}} \qquad (4)$$

Several choices are available for the proposal distribution Q(x), ranging from the prior distribution, as in likelihood weighting, to more sophisticated alternatives such as IJGP-Sampling [Gogate and Dechter, 2005] and EPIS-BN [Yuan and Druzdzel, 2006], where the output of belief propagation is used to compute the proposal distribution.

As in prior work [Cheng and Druzdzel, 2000], we assume that the proposal distribution is expressed in a factored product form: $Q(\mathbf{X}) = \prod_{i=1}^{n} Q_i(X_i \mid X_1, \ldots, X_{i-1}) = \prod_{i=1}^{n} Q_i(X_i \mid \mathbf{Y}_i)$, where $\mathbf{Y}_i \subseteq \{X_1, \ldots, X_{i-1}\}$, $Q_i(X_i \mid \mathbf{Y}_i) = Q(X_i \mid X_1, \ldots, X_{i-1})$ and $|\mathbf{Y}_i| < c$ for some constant c. We can generate a full sample from Q as follows. For i = 1 to n, sample $X_i = x_i$ from the conditional distribution $Q(X_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1})$ and set $X_i = x_i$.

3 AND/OR importance sampling

We first discuss computing expectation by parts, which forms the backbone of AND/OR importance sampling. We then present the AND/OR importance sampling scheme formally and derive its properties.

3.1 Estimating Expectation by Parts

In Equation 2, the expectation of a multi-variable function is computed by summing over the entire domain. This method is clearly inefficient because it does not take into account the decomposition of the multi-variable function, as we illustrate below.

Consider the tree graphical model given in Figure 2(a). Let A = a and B = b be the evidence variables. Let Q(X, Y, Z) = Q(Z) Q(X | Z) Q(Y | Z) be the proposal distribution. For simplicity, let us assume that f(Z) = P(Z), f(X, Z) = P(X | Z) P(A = a | X) and f(Y, Z) = P(Y | Z) P(B = b | Y). We can express the probability of evidence P(a, b) as:

$$P(a,b) = \sum_{Z}\sum_{X}\sum_{Y} \frac{f(Z)\, f(X,Z)\, f(Y,Z)}{Q(Z)Q(X|Z)Q(Y|Z)}\, Q(Z)Q(X|Z)Q(Y|Z) = E_Q\!\left[\frac{f(Z)\, f(X,Z)\, f(Y,Z)}{Q(Z)Q(X|Z)Q(Y|Z)}\right] \qquad (5)$$

We can decompose the expectation in Equation 5 into smaller components as follows:

$$P(a,b) = \sum_{Z} \frac{f(Z)Q(Z)}{Q(Z)} \left(\sum_{X} \frac{f(X,Z)Q(X|Z)}{Q(X|Z)}\right) \left(\sum_{Y} \frac{f(Y,Z)Q(Y|Z)}{Q(Y|Z)}\right) \qquad (6)$$

The quantities in the two brackets in Equation 6 are, by definition, conditional expectations of a function over X and Y respectively given Z. Therefore, Equation 6 can be written as:

$$P(a,b) = \sum_{Z} \frac{f(Z)}{Q(Z)}\, E\!\left[\frac{f(X,Z)}{Q(X|Z)}\,\Big|\, Z\right] E\!\left[\frac{f(Y,Z)}{Q(Y|Z)}\,\Big|\, Z\right] Q(Z) \qquad (7)$$

By definition, Equation 7 can be written as:

$$P(a,b) = E\!\left[\frac{f(Z)}{Q(Z)}\, E\!\left[\frac{f(X,Z)}{Q(X|Z)}\,\Big|\, Z\right] E\!\left[\frac{f(Y,Z)}{Q(Y|Z)}\,\Big|\, Z\right]\right] \qquad (8)$$

We will refer to equations of the form 8 as expectation by parts, borrowing from similar terms such as integration and summation by parts. If the domain size of all variables is d = 3, for example, computing the expectation using Equation 5 would require summing over d^3 = 3^3 = 27 terms, while computing the same expectation by parts would require summing over d + d + d = 3 + 3 + 3 = 9 terms. Therefore, exactly computing expectation by parts is clearly more efficient.

Importance sampling ignores the decomposition of the expectation while approximating it by the sample average. Our new algorithm estimates the true expectation by decomposing it into several conditional expectations and then approximating each by an appropriate weighted average over the samples. Since computing expectation by parts is less complex than computing the expectation by summing over the domain, we expect that approximating it by parts will be easier as well.

We next illustrate how to estimate expectation by parts on our example Bayesian network given in Figure 2(a). Assume that we are given N samples (z^1, x^1, y^1), ..., (z^N, x^N, y^N) generated from Q, decomposed according to Figure 2(a). For simplicity, let {0,1} be the domain of Z and let Z = 0 and Z = 1 be sampled N_0 and N_1 times respectively. We can approximate $E\!\left[\frac{f(X,Z)}{Q(X|Z)}\,|\,Z\right]$ and $E\!\left[\frac{f(Y,Z)}{Q(Y|Z)}\,|\,Z\right]$ by $g_X(Z=j)$ and $g_Y(Z=j)$ defined below:

$$g_X(Z=j) = \frac{1}{N_j} \sum_{i=1}^{N} \frac{f(x^i, Z=j)\, I(x^i, Z=j)}{Q(x^i \mid Z=j)}, \qquad g_Y(Z=j) = \frac{1}{N_j} \sum_{i=1}^{N} \frac{f(y^i, Z=j)\, I(y^i, Z=j)}{Q(y^i \mid Z=j)} \qquad (9)$$

where $I(x^i, Z=j)$ (or $I(y^i, Z=j)$) is an indicator function which is 1 iff the tuple $(x^i, Z=j)$ (or $(y^i, Z=j)$) is generated in any of the samples and 0 otherwise.
From Equation 8, we can now derive the following unbiased estimator for P(a,b):

$$\widehat{P}(a,b) = \sum_{j=0}^{1} \frac{N_j}{N}\, \frac{f(Z=j)}{Q(Z=j)}\, g_X(Z=j)\, g_Y(Z=j) \qquad (10)$$

Importance sampling, on the other hand, would estimate P(a,b) as follows:

$$\widehat{P}(a,b) = \sum_{j=0}^{1} \frac{f(Z=j)}{Q(Z=j)}\, \frac{1}{N} \sum_{i=1}^{N} \frac{f(x^i, Z=j)\, f(y^i, Z=j)}{Q(x^i \mid Z=j)\, Q(y^i \mid Z=j)}\, I(x^i, y^i, Z=j) \qquad (11)$$

where $I(x^i, y^i, Z=j)$ is an indicator function which is 1 iff the tuple $(x^i, y^i, Z=j)$ is generated in any of the samples and 0 otherwise.
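To make the contrast between Equations 10 and 11 concrete, here is a short Python sketch (not from the paper). It draws samples for the toy model Z -> X, Z -> Y under a uniform proposal and computes both estimates; the tables standing in for f(Z), f(X,Z) and f(Y,Z) are hypothetical numbers, not the CPTs of Figure 2.

```python
# Minimal sketch (not the paper's code) contrasting the plain importance
# sampling estimate of Equation 11 with the estimate by parts of Equation 10
# on the toy model Z -> X, Z -> Y with evidence on the leaves. fZ, fXZ, fYZ
# play the roles of f(Z), f(X,Z) and f(Y,Z); their values are made up.
# The proposal Q is uniform over {0,1} for every variable.
import random
from collections import defaultdict

fZ  = {0: 0.6, 1: 0.4}
fXZ = {(0, 0): 0.2, (1, 0): 0.5, (0, 1): 0.7, (1, 1): 0.1}   # f(x, z)
fYZ = {(0, 0): 0.4, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.6}   # f(y, z)
Qz = Qx = Qy = 0.5                                           # uniform proposal

def draw_samples(n, rng=random):
    return [(rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1))
            for _ in range(n)]                               # (z, x, y)

def plain_is(samples):
    # Equation 11: average the full-sample weights f(z)f(x,z)f(y,z)/Q(z,x,y).
    n = len(samples)
    return sum(fZ[z] * fXZ[(x, z)] * fYZ[(y, z)] / (Qz * Qx * Qy)
               for z, x, y in samples) / n

def by_parts(samples):
    # Equation 10: group samples by z, estimate g_X(z) and g_Y(z) separately
    # (Equation 9), then combine with the frequency N_z / N.
    n = len(samples)
    by_z = defaultdict(list)
    for z, x, y in samples:
        by_z[z].append((x, y))
    est = 0.0
    for z, group in by_z.items():
        n_z = len(group)
        g_x = sum(fXZ[(x, z)] / Qx for x, _ in group) / n_z
        g_y = sum(fYZ[(y, z)] / Qy for _, y in group) / n_z
        est += (n_z / n) * (fZ[z] / Qz) * g_x * g_y
    return est

samples = draw_samples(5000)
print(plain_is(samples), by_parts(samples))   # both estimate P(a, b)
```

Both functions are unbiased estimators of the same quantity; the by-parts version simply reuses every sampled x value against every sampled y value within each group, which is the source of the larger effective sample size discussed next.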

[Figure 2: (a) Bayesian network, its CPTs, proposal distribution and samples; (c) AND/OR sample tree, with the derivation of two arc weights]

Equation 10, which is an unbiased estimator of the expectation by parts given in Equation 8, provides another rationale for preferring it over the usual importance sampling estimator given by Equation 11. In particular, in Equation 10 we estimate two functions, defined over the random variables (X, Z) and (Y, Z) respectively, from the generated samples. In importance sampling, on the other hand, we estimate a function over the joint random variable (X, Y, Z) using the generated samples. Because the samples for (X, Z = z_j) and (Y, Z = z_j) are considered independently in Equation 10, the N_j samples drawn over the joint random variable (X, Y, Z = z_j) in Equation 11 correspond to a larger set of N_j x N_j = N_j^2 virtual samples. We know [Rubinstein, 1981] that the variance (and therefore the mean-squared error) of an unbiased estimator decreases with an increase in the effective sample size. Consequently, our new estimation technique will have lower error than the conventional approach.

In the following subsection, we discuss how the AND/OR structure can be used for estimating expectation by parts, yielding the AND/OR importance sampling scheme.

3.2 Computing the Sample Mean in AND/OR-space

In this subsection, we formalize the ideas of estimating expectation by parts on a general AND/OR tree, starting with some required definitions. We define the notion of an AND/OR sample tree, which is restricted to the generated samples and which will be used to compute the AND/OR sample mean. The labels on this AND/OR tree are set to account for the importance weights.

Definition 3.1 (Arc Labeled AND/OR Sample Tree). Given a graphical model R = (X, D, P), a pseudo-tree T(V, E), a proposal distribution $Q = \prod_{i=1}^{n} Q(X_i \mid Anc(X_i))$ such that Anc(X_i) is a subset of all ancestors of X_i in T, a sequence of assignments (samples) S and a complete AND/OR search tree $\phi_T$, an AND/OR sample tree $S_{AOT}$ is constructed from $\phi_T$ by removing all edges and corresponding nodes which are not in S, i.e. which are not sampled. The arc-label for an OR node X_i to an AND node <X_i = x_i> in $S_{AOT}$ is a pair <w, #> where: $w = \frac{P(X_i = x_i, anc(x_i))}{Q(X_i = x_i \mid anc(x_i))}$ is called the weight of the arc; $anc(x_i)$ is the assignment of values to all variables from the OR node X_i to the root node of $S_{AOT}$ and $P(X_i = x_i, anc(x_i))$ is the product of all functions in R that mention X_i but do not mention any variable ordered below it in T, evaluated at $(X_i = x_i, anc(x_i))$. # is the frequency of the arc, namely, it is equal to the number of times the assignment $(X_i = x_i, anc(x_i))$ is sampled.

Example 3.2. Consider again the Bayesian network given in Figure 2. Assume that the proposal distribution Q(X, Y, Z) is uniform. Figure 2 shows four hypothetical random samples drawn from Q. Figure 2(c) shows the AND/OR sample tree over the four samples. Each arc from an OR node to an AND node in the AND/OR sample tree is labeled with the appropriate frequencies and weights according to Definition 3.1. Figure 2(c) also shows the derivation of the arc-weights for two arcs.

The main virtue of arranging the samples on an AND/OR sample tree is that we can exploit the independencies to define the AND/OR sample mean.

Definition 3.3 (AND/OR Sample Mean). Given an AND/OR sample tree with arcs labeled according to Definition 3.1, the value of a node is defined recursively as follows.
The value of leaf AND nodes is 1 and the value of leaf OR nodes is 0. Let C(n) denote the child nodes of n and v(n) denote the value of node n. If n is an AND node then:

$$v(n) = \prod_{n' \in C(n)} v(n')$$

and if n is an OR node then:

$$v(n) = \frac{\sum_{n' \in C(n)} \#(n,n')\, w(n,n')\, v(n')}{\sum_{n' \in C(n)} \#(n,n')}$$

The AND/OR sample mean is the value of the root node.

We can show that the value of an OR node is equal to an unbiased estimate of the conditional expectation of the variable at the node given an assignment from the root to the parent of the node. Since all variables except the evidence variables are unassigned at the root node, the value of the root node equals the AND/OR sample mean, which is an unbiased estimate of the probability of evidence. Formally,

THEOREM 3.4. The AND/OR sample mean is an unbiased estimate of the probability of evidence.

Example 3.5. The calculations involved in computing the sample mean on the AND/OR sample tree for our example Bayesian network given in Figure 2 are shown in Figure 3. Each AND node and OR node in Figure 3 is marked with a value that is computed recursively using Definition 3.3. The value of the OR nodes X and Y given Z = j, j in {0,1}, is equal to g_X(Z = j) and g_Y(Z = j) respectively, as defined in Equation 9. The value of the root node is equal to the AND/OR sample mean, which is equal to the sample mean computed by parts in Equation 10.

[Figure 3: Computation of the values of OR and AND nodes in an AND/OR sample tree. The value of the root node is equal to the AND/OR sample mean]

Algorithm 1: AND/OR Importance Sampling
Input: an ordering O = (X_1, ..., X_n), a Bayesian network and a proposal distribution Q
Output: Estimate of the Probability of Evidence
1: Generate samples x^1, ..., x^N from Q along O.
2: Build an AND/OR sample tree S_AOT for the samples x^1, ..., x^N along the ordering O.
3: Initialize all labeling functions <w, #> on each arc from an OR-node n to an AND-node n' using Definition 3.1.
4: FOR all leaf nodes i of S_AOT do
5:   IF i is an AND-node, v(i) = 1, ELSE v(i) = 0
6: FOR every node n from the leaves to the root do
7:   Let C(n) denote the child nodes of node n
8:   IF n = <X, x> is an AND node, then v(n) = \prod_{n' \in C(n)} v(n')
9:   ELSE IF n = X is an OR node, then v(n) = [ \sum_{n' \in C(n)} \#(n,n') w(n,n') v(n') ] / [ \sum_{n' \in C(n)} \#(n,n') ]
10: Return v(root node)

We now have the necessary definitions to formally present the AND/OR importance sampling scheme (see Algorithm 1). In Steps 1-3, the algorithm generates samples from Q and stores them on an AND/OR sample tree. The algorithm then computes the AND/OR sample mean over the AND/OR sample tree recursively from the leaves to the root in Steps 4-9. We can show that the value v(n) of a node in the AND/OR sample tree stores the sample average of the subproblem rooted at n, subject to the current variable instantiation along the path from the root to n. If n is the root, then v(n) is the AND/OR sample mean, which is our AND/OR estimator of the probability of evidence.

Finally, we summarize the complexity of computing the AND/OR sample mean in the following theorem:

THEOREM 3.6. Given N samples and n variables (with constant domain size), the time complexity of computing the AND/OR sample mean is O(Nn) (the same as importance sampling) and its space complexity is O(Nn) (the space complexity of importance sampling is constant).
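As a rough illustration of Algorithm 1, the sketch below (not the authors' implementation) computes the AND/OR sample mean directly from the stored samples by applying the recursion of Definition 3.3; the pseudo-tree is passed as a children map, and arc_weight is an assumed callable supplied by the caller that returns the arc weight of Definition 3.1.

```python
# Minimal sketch (not the authors' code) of Algorithm 1 / Definition 3.3:
# OR value = frequency-weighted average of child AND values,
# AND value = product of child OR values over independent branches.
# `children` encodes the pseudo-tree; `arc_weight(var, value, path)` is a
# hypothetical user-supplied callable returning the <w> part of the arc label.

def aosample_mean(root, children, samples, arc_weight):
    """samples: list of dicts {variable: value}; returns the AND/OR sample mean."""

    def or_value(var, path, subset):
        # Group the samples reaching this OR node by their value of `var`.
        groups = {}
        for s in subset:
            groups.setdefault(s[var], []).append(s)
        total = sum(len(g) for g in groups.values())
        acc = 0.0
        for val, group in groups.items():
            w = arc_weight(var, val, path)            # <w, #> label of the arc
            new_path = {**path, var: val}
            and_val = 1.0                             # leaf AND nodes have value 1
            for child in children.get(var, []):       # AND node: product over
                and_val *= or_value(child, new_path, group)   # independent branches
            acc += len(group) * w * and_val
        return acc / total                            # frequency-weighted average

    return or_value(root, {}, samples)
```

For the toy model of Section 3.1, children = {'Z': ['X', 'Y'], 'X': [], 'Y': []}, and an arc_weight that returns f(Z=z)/Q(z) at the root and f(x,z)/Q(x|z), f(y,z)/Q(y|z) below it makes the returned value coincide with Equation 10.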
4 Variance Reduction

In this section, we prove that the AND/OR sample mean may have lower variance than the sample mean computed using importance sampling (Equation 3).

THEOREM 4.1 (Variance Reduction). The variance of the AND/OR sample mean is less than or equal to the variance of the importance sampling sample mean.

Proof. The details of the proof are quite involved and therefore we only provide the intuitions. As noted earlier, the guiding principle of the AND/OR sample mean is to take advantage of conditional independence in the graphical model. Let us assume that we have three random variables X, Y and Z with the following relationship: X and Y are independent of each other given Z (similar to our example Bayesian network). The expression for the variance derived here can be used in an induction step (the induction is carried out over the nodes of the pseudo tree) to prove the theorem. In this case, importance sampling generates N samples ((x^1, y^1, z^1), ..., (x^N, y^N, z^N)) along the order Z, X, Y and estimates the mean as follows:

$$\widehat{\mu}_{IS}(N) = \frac{1}{N} \sum_{i=1}^{N} x^i\, y^i\, z^i \qquad (12)$$

Without loss of generality, let {z_1, z_2} be the domain of Z and let these values be sampled N_1 and N_2 times respectively. We can rewrite Equation 12 as follows:

$$\widehat{\mu}_{IS}(N) = \frac{1}{N} \sum_{j=1}^{2} N_j\, z_j\, \frac{\sum_{i=1}^{N} x^i\, y^i\, I(z_j, x^i, y^i)}{N_j} \qquad (13)$$

where $I(z_j, x^i, y^i)$ is an indicator function which is 1 iff the partial assignment $(z_j, x^i, y^i)$ is generated in any of the samples and 0 otherwise. The AND/OR sample mean is defined as:

$$\widehat{\mu}_{AO}(N) = \frac{1}{N} \sum_{j=1}^{2} N_j\, z_j \left(\frac{\sum_{i=1}^{N} x^i\, I(z_j, x^i)}{N_j}\right) \left(\frac{\sum_{i=1}^{N} y^i\, I(z_j, y^i)}{N_j}\right) \qquad (14)$$

where $I(z_j, x^i)$ (and similarly $I(z_j, y^i)$) is an indicator function which equals 1 when one of the samples contains the tuple $(z_j, x^i)$ (similarly $(z_j, y^i)$) and is 0 otherwise. By simple algebraic manipulations, we can prove that the variance of the estimator $\widehat{\mu}_{IS}(N)$ is given by:

$$Var(\widehat{\mu}_{IS}(N)) = \frac{1}{N} \left( \sum_{j=1}^{2} z_j^2\, Q(z_j) \Big( \mu(X|z_j)^2\, V(Y|z_j) + \mu(Y|z_j)^2\, V(X|z_j) + V(X|z_j)\, V(Y|z_j) \Big) \right) - \frac{\mu^2}{N} \qquad (15)$$

Similarly, the variance of the AND/OR sample mean is given by:

$$Var(\widehat{\mu}_{AO}(N)) = \frac{1}{N} \left( \sum_{j=1}^{2} z_j^2\, Q(z_j) \Big( \mu(X|z_j)^2\, V(Y|z_j) + \mu(Y|z_j)^2\, V(X|z_j) + \frac{V(X|z_j)\, V(Y|z_j)}{N_j} \Big) \right) - \frac{\mu^2}{N} \qquad (16)$$

where $\mu(X|z_j)$ and $V(X|z_j)$ are the conditional mean and variance respectively of X given Z = z_j. Similarly, $\mu(Y|z_j)$ and $V(Y|z_j)$ are the conditional mean and variance respectively of Y given Z = z_j. From Equations 15 and 16, if N_j = 1 for all j, then we can see that $Var(\widehat{\mu}_{AO}(N)) = Var(\widehat{\mu}_{IS}(N))$. However, if N_j > 1, then $Var(\widehat{\mu}_{AO}(N)) < Var(\widehat{\mu}_{IS}(N))$. This proves that the variance of the AND/OR sample mean is less than or equal to the variance of the conventional sample mean in this special case. As noted earlier, using this case in an induction over the nodes of a general pseudo-tree completes the proof.

5 Estimation in AND/OR graphs

Next, we describe a more powerful algorithm for estimating the mean in AND/OR-space by moving from AND/OR trees to AND/OR graphs, as presented in [Dechter and Mateescu, 2007]. An AND/OR tree may contain nodes that root identical subtrees. When such unifiable nodes are merged, the tree becomes a graph and its size becomes smaller. Some unifiable nodes can be identified using contexts, defined below.

Definition 5.1 (Context). Given a belief network and the corresponding AND/OR search tree $S_{AOT}$ relative to a pseudo-tree T, the context of any AND node <X_i, x_i> in $S_{AOT}$, denoted by context(X_i), is defined as the set of ancestors of X_i in T that are connected to X_i and to descendants of X_i.

The context-minimal AND/OR graph is obtained by merging all the context-unifiable AND nodes. The size of the largest context is bounded by the treewidth w* of the pseudo-tree [Dechter and Mateescu, 2007]. Therefore, the time and space complexity of a search algorithm traversing the context-minimal AND/OR graph is O(exp(w*)).

Example 5.2. For illustration, consider the context-minimal graph in Figure 1(d), based on the pseudo-tree in Figure 1(b). Its size is far smaller than that of the AND/OR tree in Figure 1(c) (30 nodes vs. 38 nodes). The contexts of the nodes can be read from the pseudo-tree in Figure 1 as follows: context(A) = {A}, context(B) = {B,A}, context(C) = {C,B,A}, context(D) = {D,C,B} and context(E) = {E,B,A}.

The main idea in AND/OR-graph estimation is to store all samples on an AND/OR graph instead of an AND/OR tree. Similar to an AND/OR sample tree, we can define an identical notion of an AND/OR sample graph.

Definition 5.3 (Arc labeled AND/OR sample graph). Given a complete AND/OR graph $\phi_G$ and a set of samples S, an AND/OR sample graph $S_{AOG}$ is obtained by removing all nodes and arcs not in S from $\phi_G$. The labels on $S_{AOG}$ are set similarly to those of an AND/OR sample tree (see Definition 3.1).

Example 5.4. The bold edges and nodes in Figure 1(c) define an AND/OR sample tree. The bold edges and nodes in Figure 1(d) define an AND/OR sample graph corresponding to the same samples that define the AND/OR sample tree in Figure 1(c).

The algorithm for computing the sample mean on AND/OR sample graphs is identical to the algorithm for AND/OR trees (Steps 4-10 of Algorithm 1). The main reason for moving from trees to graphs is that the variance of the sample mean computed on an AND/OR sample graph can be even smaller than that computed on an AND/OR sample tree. More formally,

THEOREM 5.5. Let $V(\mu_{AOG})$, $V(\mu_{AOT})$ and $V(\mu_{IS})$ be the variance of the AND/OR sample mean on an AND/OR sample graph, the variance of the AND/OR sample mean on an AND/OR sample tree and the variance of the sample mean of importance sampling respectively. Then, given the same set of input samples: $V(\mu_{AOG}) \leq V(\mu_{AOT}) \leq V(\mu_{IS})$.

We omit the proof due to lack of space.
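The sketch below (again not the authors' code, and making the assumption that arc weights depend only on a node's context variables) indicates how the tree recursion can be turned into the graph variant of this section: OR-node values are memoized on (variable, context assignment), so samples that agree on the context are pooled across merged nodes. Here contexts[var] is assumed to hold the ancestor part of the context of Definition 5.1, and children, arc_weight and the sample format are the same hypothetical interfaces as in the earlier tree sketch.

```python
# Minimal sketch (not the authors' code) of AND/OR-graph estimation:
# cache OR-node values by (variable, context assignment) so that samples
# agreeing on the context are pooled, mirroring the merging of unifiable
# nodes. arc_weight is assumed to depend only on context variables.

def aograph_sample_mean(root, children, contexts, samples, arc_weight):
    cache = {}

    def or_value(var, path):
        ctx = tuple(sorted((a, path[a]) for a in contexts[var] if a in path))
        key = (var, ctx)
        if key in cache:
            return cache[key]                       # merged (unified) OR node
        # Pool every sample that agrees with the path on the context variables.
        subset = [s for s in samples if all(s[a] == v for a, v in ctx)]
        groups = {}
        for s in subset:
            groups.setdefault(s[var], []).append(s)
        total = sum(len(g) for g in groups.values())
        acc = 0.0
        for val, group in groups.items():
            w = arc_weight(var, val, path)          # assumed context-determined
            new_path = {**path, var: val}
            and_val = 1.0
            for child in children.get(var, []):
                and_val *= or_value(child, new_path)
            acc += len(group) * w * and_val
        value = acc / total
        cache[key] = value
        return value

    return or_value(root, {})
```

Because each merged OR node sees all samples compatible with its context rather than only those compatible with one full path, the per-node sample counts can only grow, which is the intuition behind Theorem 5.5.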
THEOREM 5.6 (Complexity of computing the AND/OR graph sample mean). Given a graphical model with n variables, a pseudo-tree with treewidth w* and N samples, the time complexity of AND/OR graph sampling is O(N n w*) while its space complexity is O(N n).

6 Experimental Evaluation

6.1 Competing Algorithms

The performance of importance sampling based algorithms is highly dependent on the proposal distribution [Cheng and Druzdzel, 2000]. It was shown that computing the proposal distribution from the output of the Generalized Belief Propagation scheme of Iterative Join Graph Propagation (IJGP) yields better empirical performance than other available choices [Gogate and Dechter, 2005]. Therefore, we use the output of IJGP to compute the proposal distribution Q. The complexity of IJGP is time and space exponential in its i-bound, a parameter that bounds cluster sizes. We use an i-bound of 5 in all our experiments.

We experimented with three sampling algorithms for benchmarks which do not have determinism: (a) pure IJGP-sampling, (b) AND/OR-tree IJGP-sampling and (c) AND/OR-graph IJGP-sampling. Note that the underlying scheme for generating the samples is identical in all the methods. What changes is the method of accumulating the samples and deriving the estimates.

On benchmarks which have zero probabilities or determinism, we use the SampleSearch scheme introduced by [Gogate and Dechter, 2007] to overcome the rejection problem. We experiment with the following versions of SampleSearch on deterministic networks: (a) pure SampleSearch, (b) AND/OR-tree SampleSearch and (c) AND/OR-graph SampleSearch.

6.2 Results

We experimented with three sets of benchmark belief networks: (a) Random networks, (b) Linkage networks and (c) Grid networks. Note that only the linkage and grid networks have zero probabilities, and on these we use SampleSearch. The exact P(e) for most instances is available from the UAI 2006 competition web-site. Our results are presented in Figures 4-6. Each figure shows the approximate probability of evidence as a function of time. The bold line in each figure indicates the exact probability of evidence. The reader can visualize the error from the distance between the approximate curves and the exact line. For lack of space, we show only part of our results. Each figure shows the number of variables n, the maximum domain size d and the number of evidence nodes |E| for the respective benchmark.

Random Networks. From Figures 4(a) and 4(b), we see that AND/OR-graph sampling is better than AND/OR-tree sampling, which in turn is better than pure IJGP-sampling. However, there is not much difference in the error because the proposal distribution seems to be a very good approximation of the posterior.

Grid Networks. All Grid instances have binary nodes and a small number of evidence nodes. From Figures 5(a) and 5(b), we can see that AND/OR-graph SampleSearch and AND/OR-tree SampleSearch are substantially better than pure SampleSearch.

Linkage Networks. The linkage instances are generated by converting a pedigree to a Bayesian network [Fishelson and Geiger, 2003]. These networks are large, with a maximum domain size of 36. Note that it is hard to compute the exact probability of evidence in these networks [Fishelson and Geiger, 2003]. We observe from Figures 6(a)-(d) that AND/OR-graph SampleSearch is substantially more accurate than AND/OR-tree SampleSearch, which in turn is substantially more accurate than pure SampleSearch. Notice the log scale in Figures 6(a)-(d), which means that there is an order-of-magnitude difference between the errors.

Our results suggest that the AND/OR-graph and AND/OR-tree estimators yield far better performance than conventional estimators, especially on problems in which the proposal distribution is a bad approximation of the posterior distribution.

[Figure 4: Random Networks - P(e) as a function of time for IJGP-Sampling, AND/OR-Graph-IJGP-Sampling and AND/OR-Tree-IJGP-Sampling on two random instances]

[Figure 5: Grid Networks - P(e) as a function of time for SampleSearch, AND/OR-Graph-SampleSearch and AND/OR-Tree-SampleSearch on two grid instances]

7 Related Work and Summary

The work presented in this paper is related to the work by [Hernández and Moral, 1995, Kjærulff, 1995, Dawid et al., 1994], who perform sampling-based inference on a junction tree. The main idea in these papers is to perform message passing on a junction tree by substituting messages which are too hard to compute exactly by their sampling-based approximations.

[Figure 6: Linkage Bayesian Networks - log P(e) as a function of time for SampleSearch, AND/OR-Graph-SampleSearch and AND/OR-Tree-SampleSearch on four linkage instances, panels (a)-(d)]

[Kjærulff, 1995, Dawid et al., 1994] use Gibbs sampling, while [Hernández and Moral, 1995] use importance sampling to approximate the messages. Similar to recent work on Rao-Blackwellised sampling such as [Bidyuk and Dechter, 2003, Paskin, 2004, Gogate and Dechter, 2005], variance reduction is achieved in these junction-tree based sampling schemes because of some exact computations, as dictated by the Rao-Blackwell theorem. AND/OR estimation, however, does not require exact computations to achieve variance reduction. In fact, the variance reduction due to Rao-Blackwellisation is orthogonal to the variance reduction achieved by AND/OR estimation, and therefore the two could be combined to achieve more variance reduction. Also, unlike our work, which focuses on the probability of evidence, the focus of these aforementioned papers was on belief updating.

To summarize, the paper introduces a new sampling-based estimation technique called AND/OR importance sampling. The main idea of our new scheme is to derive statistics on the generated samples by using an AND/OR tree or graph that takes advantage of the independencies present in the graphical model. We proved that the sample mean computed on an AND/OR tree or graph may have smaller variance than the sample mean computed using the conventional approach. Our experimental evaluation is preliminary but quite promising, showing that on most instances the AND/OR sample mean has lower error than importance sampling, sometimes by significant margins.

Acknowledgements

This work was supported in part by the NSF (IIS awards) and by an NIH R01-HG grant.

References

[Bidyuk and Dechter, 2003] Bidyuk, B. and Dechter, R. (2003). An empirical study of w-cutset sampling for Bayesian networks. In Proceedings of the 19th Annual Conference on Uncertainty in Artificial Intelligence (UAI-03).

[Cheng and Druzdzel, 2000] Cheng, J. and Druzdzel, M. J. (2000). AIS-BN: An adaptive importance sampling algorithm for evidential reasoning in large Bayesian networks. Journal of Artificial Intelligence Research (JAIR), 13.

[Dawid et al., 1994] Dawid, A. P., Kjaerulff, U., and Lauritzen, S. L. (1994). Hybrid propagation in junction trees. In IPMU 1994.

[Dechter and Mateescu, 2007] Dechter, R. and Mateescu, R. (2007). AND/OR search spaces for graphical models. Artificial Intelligence, 171(2-3):73-106.

[Fishelson and Geiger, 2003] Fishelson, M. and Geiger, D. (2003). Optimizing exact genetic linkage computations. In RECOMB 2003.

[Geweke, 1989] Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration. Econometrica, 57(6).

[Gogate and Dechter, 2005] Gogate, V. and Dechter, R. (2005). Approximate inference algorithms for hybrid Bayesian networks with discrete constraints. In UAI 2005.

[Gogate and Dechter, 2007] Gogate, V. and Dechter, R. (2007). SampleSearch: A scheme that searches for consistent samples. In AISTATS 2007.

[Hernández and Moral, 1995] Hernández, L. D. and Moral, S. (1995). Mixing exact and importance sampling propagation algorithms in dependence graphs. International Journal of Approximate Reasoning.

[Kjærulff, 1995] Kjærulff, U. (1995). HUGS: Combining exact inference and Gibbs sampling in junction trees. In UAI 1995.

[Paskin, 2004] Paskin, M. A. (2004). Sample propagation.
In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA.

[Rubinstein, 1981] Rubinstein, R. Y. (1981). Simulation and the Monte Carlo Method. John Wiley & Sons, Inc., New York, NY, USA.

[Yuan and Druzdzel, 2006] Yuan, C. and Druzdzel, M. J. (2006). Importance sampling algorithms for Bayesian networks: Principles and performance. Mathematical and Computer Modelling.


More information

The Ising model and Markov chain Monte Carlo

The Ising model and Markov chain Monte Carlo The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte

More information

Basic Sampling Methods

Basic Sampling Methods Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution

More information

Hidden Markov Models (recap BNs)

Hidden Markov Models (recap BNs) Probabilistic reasoning over time - Hidden Markov Models (recap BNs) Applied artificial intelligence (EDA132) Lecture 10 2016-02-17 Elin A. Topp Material based on course book, chapter 15 1 A robot s view

More information

Lecture 15. Probabilistic Models on Graph

Lecture 15. Probabilistic Models on Graph Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how

More information

13 : Variational Inference: Loopy Belief Propagation and Mean Field

13 : Variational Inference: Loopy Belief Propagation and Mean Field 10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction

More information

Likelihood Weighting and Importance Sampling

Likelihood Weighting and Importance Sampling Likelihood Weighting and Importance Sampling Sargur Srihari srihari@cedar.buffalo.edu 1 Topics Likelihood Weighting Intuition Importance Sampling Unnormalized Importance Sampling Normalized Importance

More information

Bayesian Networks: Independencies and Inference

Bayesian Networks: Independencies and Inference Bayesian Networks: Independencies and Inference Scott Davies and Andrew Moore Note to other teachers and users of these slides. Andrew and Scott would be delighted if you found this source material useful

More information

EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS

EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS Lecture 16, 6/1/2005 University of Washington, Department of Electrical Engineering Spring 2005 Instructor: Professor Jeff A. Bilmes Uncertainty & Bayesian Networks

More information

Variable Elimination: Algorithm

Variable Elimination: Algorithm Variable Elimination: Algorithm Sargur srihari@cedar.buffalo.edu 1 Topics 1. Types of Inference Algorithms 2. Variable Elimination: the Basic ideas 3. Variable Elimination Sum-Product VE Algorithm Sum-Product

More information

Decision-Theoretic Specification of Credal Networks: A Unified Language for Uncertain Modeling with Sets of Bayesian Networks

Decision-Theoretic Specification of Credal Networks: A Unified Language for Uncertain Modeling with Sets of Bayesian Networks Decision-Theoretic Specification of Credal Networks: A Unified Language for Uncertain Modeling with Sets of Bayesian Networks Alessandro Antonucci Marco Zaffalon IDSIA, Istituto Dalle Molle di Studi sull

More information

On finding minimal w-cutset problem

On finding minimal w-cutset problem On finding minimal w-cutset problem Bozhena Bidyuk and Rina Dechter Information and Computer Science University Of California Irvine Irvine, CA 92697-3425 Abstract The complexity of a reasoning task over

More information

9 Forward-backward algorithm, sum-product on factor graphs

9 Forward-backward algorithm, sum-product on factor graphs Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 9 Forward-backward algorithm, sum-product on factor graphs The previous

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

A Differential Approach to Inference in Bayesian Networks

A Differential Approach to Inference in Bayesian Networks A Differential Approach to Inference in Bayesian Networks Adnan Darwiche Computer Science Department University of California Los Angeles, Ca 90095 darwiche@cs.ucla.edu We present a new approach to inference

More information

Graph structure in polynomial systems: chordal networks

Graph structure in polynomial systems: chordal networks Graph structure in polynomial systems: chordal networks Pablo A. Parrilo Laboratory for Information and Decision Systems Electrical Engineering and Computer Science Massachusetts Institute of Technology

More information

Mixed deterministic and probabilistic networks

Mixed deterministic and probabilistic networks Noname manuscript No. (will be inserted by the editor) Mixed deterministic and probabilistic networks Robert Mateescu Rina echter Received: date / ccepted: date bstract The paper introduces mixed networks,

More information