Gaussian graphical model determination based on birth-death MCMC inference


Abdolreza Mohammadi, Johann Bernoulli Institute, University of Groningen, Netherlands
Ernst C. Wit, Johann Bernoulli Institute, University of Groningen, Netherlands

February 5, 2013

Abstract

We propose an efficient Bayesian methodology for model determination in Gaussian graphical models, for both decomposable and non-decomposable cases. The proposed methodology is a trans-dimensional MCMC approach that makes use of a spatial birth-death process. The birth-death process jumps through all possible graphical models by adding a new edge in a birth event or deleting an edge in a death event. The proposed method is easy to implement and computationally feasible for large graphical models. We illustrate the efficiency of the proposed methodology on simulated and real datasets. In addition, we have implemented the proposed methodology in an R package, called BDgraph, which is freely available online.

Keywords: Bayesian model selection, Gaussian graphical models, non-decomposable graphs, birth-death process, Markov chain Monte Carlo, G-Wishart.

1 Introduction

In archetypal high-dimensional inference problems, a large number of variables is recorded on a relatively small number of observations. Examples of such high-dimensional problems are detecting neurological associations in fMRI data, inferring gene networks from genomic data, and predicting movie preferences from sparse film-rating data. The simplest way to describe these types of multivariate data is by means of a multivariate Gaussian distribution. In high-dimensional problems, either by design or by necessity, it can be of interest to consider multivariate Gaussian distributions with a reduced parameter space. Covariance-selection models, or Gaussian graphical models, offer a potent set of tools for shrinkage and regularization of covariance matrices in these kinds of high-dimensional problems.

Dempster (1972) proposed the covariance-selection method, which reduces the number of parameters in Gaussian graphical models by setting selected elements of the precision matrix to zero. In addition, the dependency patterns among the variables in the model can be summarized visually by means of an undirected graph $G = (V, E)$: each variable is associated with a vertex in $V = \{1, \ldots, p\}$ and the edge set satisfies $E \subseteq V \times V$. If the underlying variables are multivariate normal, the off-diagonal elements of the precision matrix that are unequal to zero indicate the edges that link the vertices. The graphical model is undirected since the precision matrix is symmetric.

Graphs with $p$ nodes have $m = p(p-1)/2$ possible edges. As a result, there are $2^m$ possible graphical models, corresponding to all combinations of individual edges being included in or excluded from the model. For instance, for a graph with $p = 8$, there are more than 250 million structurally different graphical models. In typical high-dimensional problems, such as genetic networks, there are hundreds of nodes. This motivates the development of efficient, scalable search methodologies that are able to move through all possible graphical models in order to infer a model close to the true one, or at least to distinguish a set of true edges from irrelevant ones.

Roverato (2002), Jones et al. (2005) and Lenkoski and Dobra (2011) proposed Bayesian approaches for computing the posterior distribution of the graph based on the G-Wishart prior distribution. The ability to focus on the graph alone allows for the development of various search algorithms that visit the high-probability regions of model space. However, determining the graphical models with the highest posterior probability requires knowledge of the normalizing constants of all possible graphical models, and these normalizing constants are not available analytically unless the graph is decomposable. Such methods are unsuitable as the basis of an MCMC sampling scheme even for moderate $p$, because there is a huge number of possible graphical models, of which only a small fraction are decomposable. Some approaches try to approximate the normalizing constant for non-decomposable graphical models: Roverato (2002) proposed importance sampling, Atay-Kayis and Massam (2005) proposed Monte Carlo sampling, and Lenkoski and Dobra (2011) proposed a Laplace approximation. Nevertheless, approximating the normalizing constant is often the crucial part of the computation.

An alternative is a trans-dimensional MCMC methodology, in which the MCMC algorithm moves through all possible models, not only selecting a best model but also simultaneously estimating the parameters of that model. One special and popular case is the reversible-jump MCMC (RJMCMC) approach proposed by Green (1995). Reversible-jump methods allow for the construction of an ergodic Markov chain with the joint posterior distribution of the parameters and the model as its stationary distribution. Moves between models are achieved by periodically proposing a move to a different model and rejecting it with appropriate probability to ensure that the chain possesses the required stationary distribution. Ideally these proposed moves are designed to have a high probability of acceptance so that the algorithm explores the model space efficiently, though this is not always easy.

Giudici and Green (1999) used this methodology for decomposable Gaussian graphical models, but it works only for low-dimensional models. Dobra et al. (2011) proposed another RJMCMC method based on the Cholesky decomposition of the precision matrix, which requires a computationally intensive matrix completion. Moreover, both methods still require computing the normalizing constant.

Another trans-dimensional MCMC methodology is the birth-death MCMC (BDMCMC) approach, which is based on a continuous-time Markov process. This methodology was developed by Stephens (2000) for use in finite mixture models with variable dimension, following earlier proposals by Preston (1976), Ripley (1977) and Geyer and Møller (1994). In this method, the time between jumps to a larger dimensional model (births) or to a smaller dimensional model (deaths) is taken to be a random variable with a specific rate. The choice of the birth and death rates determines the birth-death process and is made in such a way that its stationary distribution is precisely the posterior distribution of interest. In contrast to the RJMCMC approach, moves between models are always accepted, which can make the BDMCMC approach extremely efficient.

In this paper, we propose a novel Bayesian framework for Gaussian graphical models based on the BDMCMC methodology. In our proposed BDMCMC method, we add or remove an edge via birth and death events, respectively. The birth and death events are modeled as independent Poisson processes; therefore, the time of a birth or death is exponentially distributed. Our proposed methodology applies to general graphical models, that is, both decomposable and non-decomposable models. It can be used for high-dimensional problems, such as graphical models with more than 120 nodes; see example 4.2.

In section 2, after briefly introducing the notation and preliminary background material related to graphical models from a Bayesian point of view, we propose our Bayesian model selection approach for Gaussian graphical models based on the BDMCMC methodology. In addition, for the proposed BDMCMC algorithm, we consider two different death rates, for the low- and high-dimensional cases respectively. Section 3 contains the specific implementation of the proposed method, such as proposing suitable prior distributions, an algorithm for sampling from the precision matrix, and computing the death rates. In section 4 we demonstrate the performance of the proposed methodology on several simulated and real datasets. We conclude the paper with a discussion of various possible extensions of the proposed methodology.

2 Birth-death MCMC inference for Gaussian graphical models

2.1 Bayesian graphical models

We briefly introduce some notation and the structure of undirected Gaussian graphical models. For a comprehensive introduction to Gaussian graphical models see Lauritzen (1996) and Whittaker (1990).

Let $G = (V, E)$ be an undirected graph, where $V = \{1, 2, \ldots, p\}$ is the set of $p$ vertices and $E$ is the edge set. Let $\mathcal{W} = \{(i, j) : i, j \in V, i \leq j\}$, $\bar{V} = \{(i, j) : i \leq j, \text{ such that } i = j \text{ or } (i, j) \in E\}$, and $\bar{E} = \mathcal{W} \setminus \bar{V}$.

We define the Gaussian graphical model with respect to the graph $G$, with zero mean, as
$$\mathcal{M}_G = \{ N_p(0, \Sigma) \mid K = \Sigma^{-1} \in P_G \},$$
where $P_G$ denotes the space of $p \times p$ positive definite matrices with entries $(i, j)$ equal to zero whenever $(i, j) \notin E$, that is,
$$P_G = \{ K \in P \mid k_{ij} = 0 \text{ for } (i, j) \notin E;\ G = (V, E) \},$$
where $K = \{k_{ij}\}$ and $P$ denotes the space of $p \times p$ positive definite matrices. Note that $P_G \subseteq P$.

Let $x = (x_1, \ldots, x_n)$ be an independent and identically distributed sample of size $n$ from the Gaussian graphical model $\mathcal{M}_G$. The likelihood is then
$$p(x \mid K, G) = (2\pi)^{-np/2}\, |K|^{n/2} \exp\left\{ -\tfrac{1}{2} \mathrm{tr}(KS) \right\}, \qquad (1)$$
where $S = x'x$.

In our graphical model we are dealing with two kinds of uncertainty: (a) uncertainty about the structure of the underlying conditional independence graph, and (b) uncertainty about the parameters of the graphical model. Our aim is to propose a Bayesian framework that deals with both of these uncertainties. It is natural to define the joint prior on the graph and the precision matrix via the product rule $p(G, K) = p(G)\,p(K \mid G)$. Therefore, the joint posterior distribution of $(G, K)$ is
$$p(K, G \mid x) \propto p(x \mid K, G)\, p(K, G) \propto p(x \mid K, G)\, p(K \mid G)\, p(G). \qquad (2)$$
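As a small illustration of the likelihood in (1), the following R sketch (ours, not taken from the BDgraph package; the function name loglik_ggm is made up for this example) evaluates its logarithm for a data matrix x and a precision matrix K.

```r
## Minimal sketch (not from the paper's code) of the log of the likelihood (1).
## 'x' is the n x p data matrix and 'K' a p x p precision matrix; S = x'x.
loglik_ggm <- function(x, K) {
  n <- nrow(x)
  p <- ncol(x)
  S <- crossprod(x)                                   # S = x'x
  as.numeric(-n * p / 2 * log(2 * pi) +
             n / 2 * determinant(K, logarithm = TRUE)$modulus -
             0.5 * sum(diag(K %*% S)))                # -(1/2) tr(KS)
}
```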

2.2 Birth-death process for Gaussian graphical models

Here, we propose a continuous-time Markov birth-death process for Gaussian graphical model determination, based on the theory derived by Preston (1976, Section 5). The birth and death events of the edges occur in continuous time with rates determined by the stationary distribution of the process. Let $G = (V, E)$ be the current state of the process. If it is time for the birth of an edge $\xi = (i, j) \in \bar{E}$, the process jumps from the current graph to a new graph $G^{+\xi} = (V, E \cup \{\xi\})$. If, however, it is time for a death, an edge $\xi = (i, j) \in E$ is removed and the process jumps from the current graph to a new graph $G^{-\xi} = (V, E \setminus \{\xi\})$.

Suppose that at time $t$ the process is at state $\mathcal{M}_G$, in which $G = (V, E)$ with precision matrix $K \in P_G$, and let $\Omega = \bigcup_{G \in \mathcal{G}} P_G$, where $\mathcal{G}$ denotes the set of all possible graphical models. We consider the following continuous-time Markov birth-death process on $\Omega$.

Death: When the process is at state $\mathcal{M}_G$, each edge $\xi = (i, j) \in E$ dies independently of the others as a Poisson process with rate $\delta_\xi(K)$. The overall death rate is therefore $\delta(K) = \sum_{\xi \in E} \delta_\xi(K)$. When the death of an edge $\xi = (i, j) \in E$ occurs, the parameter $k_\xi$ dies and the process jumps from $K$ to $K^{-\xi} = K \setminus k_\xi$. We define the matrix $K^{-\xi}$ to be equal to $K$ except for the entries in positions $\{(i, j), (j, i), (j, j)\}$. We put 0 in positions $(i, j)$ and $(j, i)$. To guarantee that the new precision matrix is positive definite, we set the $(j, j)$ entry of $K^{-\xi}$ to $k_{jj} - c + c^*$, where
$$c = K_{j, V \setminus j} \left( K_{V \setminus j, V \setminus j} \right)^{-1} K_{V \setminus j, j}, \qquad c^* = K^{-\xi}_{j, V \setminus j} \left( K^{-\xi}_{V \setminus j, V \setminus j} \right)^{-1} K^{-\xi}_{V \setminus j, j}.$$
The idea of this modification comes from the block Gibbs sampling method; see (8).

Birth: When the process is at state $\mathcal{M}_G$, a new edge $\xi = (i, j) \in \bar{E}$ is born independently of the others as a Poisson process with rate $\beta_\xi(K)$, with respect to the product measure on $\mathbb{R}$. The overall birth rate is therefore $\beta(K) = \sum_{\xi \in \bar{E}} \beta_\xi(K)$. When the birth of an edge $\xi \in \bar{E}$ occurs, the process jumps from $K$ to $K^{+\xi} = K \cup k_\xi$. We define the matrix $K^{+\xi}$ to be equal to $K$ except for the entries in positions $\{(i, j), (j, i), (j, j)\}$. We put the non-zero value $k_\xi$ in positions $(i, j)$ and $(j, i)$, where $k_\xi$ is chosen according to the proposal density $b_\xi(k_\xi; K)$. To guarantee that the new precision matrix is positive definite, we set the $(j, j)$ entry of $K^{+\xi}$ to $k_{jj} - c + c^+$, where
$$c^+ = K^{+\xi}_{j, V \setminus j} \left( K^{+\xi}_{V \setminus j, V \setminus j} \right)^{-1} K^{+\xi}_{V \setminus j, j}.$$

Thus a death decreases the number of parameters by one, while a birth increases the number of parameters by one. In this birth-death process, the time to the next birth or death event is exponentially distributed with mean $1/(\beta(K) + \delta(K))$. As a result, the next event is the death of edge $\xi \in E$ with probability $\delta_\xi(K)/(\beta(K) + \delta(K))$, and the birth of edge $\xi \in \bar{E}$ with probability $\beta_\xi(K)/(\beta(K) + \delta(K))$. A sketch of the positive-definiteness-preserving adjustment used in the death move is given below.
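The following R sketch (ours; kill_edge is a hypothetical helper name, not part of BDgraph) illustrates the death move described above: the edge $\xi = (i, j)$ is removed and the $(j, j)$ entry is shifted by $-c + c^*$ so that the resulting matrix remains positive definite. The adjustment leaves the Schur complement $k_{jj} - c$ unchanged, which is what preserves positive definiteness.

```r
## Rough sketch (ours) of the death move: remove edge (i, j) from the precision
## matrix K and adjust K[j, j] so that the result stays positive definite.
kill_edge <- function(K, i, j) {
  kjj   <- K[j, j]
  c_old <- K[j, -j, drop = FALSE] %*% solve(K[-j, -j]) %*% K[-j, j, drop = FALSE]
  K[i, j] <- 0                      # remove the edge: zero both symmetric entries
  K[j, i] <- 0
  c_new <- K[j, -j, drop = FALSE] %*% solve(K[-j, -j]) %*% K[-j, j, drop = FALSE]
  K[j, j] <- kjj - as.numeric(c_old) + as.numeric(c_new)   # k_jj - c + c*
  K
}
```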

To show that the stationary distribution of this birth-death process is precisely the joint posterior distribution $p(K, G \mid x)$, we require the following sufficient condition on the birth and death rates.

Theorem 2.1. The birth-death process defined above has stationary distribution $p(K, G \mid x)$ if, for each $\xi \in \bar{E}$,
$$\beta_\xi(K)\, b_\xi(k_\xi; K)\, p(G, K \mid x) = \delta_\xi(K^{+\xi})\, p(G^{+\xi}, K^{+\xi} \mid x). \qquad (3)$$

Proof. See the Appendix.

2.3 The proposed BDMCMC algorithm

Here we propose a BDMCMC algorithm based on a specific choice of birth and death rates that satisfies Theorem 2.1. Consider the birth-death process obtained by setting fixed birth rates $\beta_\xi(K) = \beta_0$ for each $\xi \in \bar{E}$, where $\beta_0$ is an arbitrary fixed number in $\mathbb{R}^+$. Then, according to (3), the death rates are
$$\delta_\xi(K) = \beta_0\, \frac{b_\xi(k_\xi; K^{-\xi})\, p(G^{-\xi}, K^{-\xi} \mid x)}{p(G, K \mid x)}, \quad \text{for each } \xi \in E. \qquad (4)$$
Based on these birth and death rates, the BDMCMC algorithm for Gaussian graphical models is as follows.

Algorithm 2.1. BDMCMC algorithm. Starting with an initial graphical model $\mathcal{M}_G$, in which $G = (V, E)$ with precision matrix $K$, iterate the following steps:

1. Set the birth rates $\beta_\xi(K) = \beta_0$ for each edge $\xi \in \bar{E}$.
2. Calculate the total birth rate $\beta(K) = |\bar{E}|\, \beta_0$.
3. Calculate the death rates
$$\delta_\xi(K) = \beta_0\, \frac{b_\xi(k_\xi; K^{-\xi})\, p(G^{-\xi}, K^{-\xi} \mid x)}{p(G, K \mid x)}, \quad \text{for each } \xi \in E.$$
4. Calculate the total death rate $\delta(K) = \sum_{\xi \in E} \delta_\xi(K)$.
5. Calculate the waiting time $\lambda(K) = 1/(\beta(K) + \delta(K))$.
6. Simulate the type of jump, a birth or a death, with respective probabilities
$$p(\text{birth of edge } \xi) = \frac{\beta_\xi(K)}{\beta(K) + \delta(K)}, \ \xi \in \bar{E}, \qquad p(\text{death of edge } \xi) = \frac{\delta_\xi(K)}{\beta(K) + \delta(K)}, \ \xi \in E.$$
7. According to the type of jump, sample from the posterior distribution of the new precision matrix.

For step 7, subsection 3.4 explains how to sample from the posterior distribution of the precision matrix. A skeleton of one sweep of this algorithm is sketched below.
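To make the steps of Algorithm 2.1 concrete, here is a rough R skeleton of a single sweep (ours, not the BDgraph implementation). The helpers death_rate and birth_update are hypothetical placeholders for (4) and for step 7 with the proposal $b_\xi$; edges_in and edges_out are two-column matrices listing the current edges and non-edges, and the graph itself is implied by the zero pattern of K.

```r
## Skeleton of one BDMCMC sweep (our illustration, under the assumptions above).
bdmcmc_step <- function(K, edges_in, edges_out, beta0, death_rate, birth_update) {
  birth <- rep(beta0, nrow(edges_out))                       # step 1: constant birth rates
  death <- apply(edges_in, 1, function(e) death_rate(K, e))  # step 3: death rates (4)
  total <- sum(birth) + sum(death)                           # steps 2 and 4
  wait  <- 1 / total                                         # step 5: mean waiting time
  pick  <- sample(nrow(edges_out) + nrow(edges_in), 1,
                  prob = c(birth, death) / total)            # step 6: type of jump
  if (pick <= nrow(edges_out)) {
    K <- birth_update(K, edges_out[pick, ])                  # birth: add edge, draw k_xi
  } else {
    e <- edges_in[pick - nrow(edges_out), ]
    K <- kill_edge(K, e[1], e[2])                            # death: sketch from Sec. 2.2
  }
  list(K = K, waiting_time = wait)
}
```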

Figure 1: Illustration of sampling from the BDMCMC algorithm in continuous time; the algorithm is sampled at the jump times $\{t_1, t_2, t_3, t_4, \ldots\}$.

2.4 Sampling from the BDMCMC algorithm in continuous time

In the RJMCMC approach, and in other kinds of Metropolis-Hastings algorithms, the output is typically monitored after each iteration. In our continuous-time BDMCMC algorithm there are several choices of sampling scheme. For example, we can sample from the continuous-time Markov process at regular times, as in Stephens (2000). Another way is to sample at each jump to a new state, as we do in this paper; see Figure 1. We then effectively put a weight on each state visited by the algorithm when computing sample means. The weights are equal to the length of the holding time in that state: if the process is in state $\mathcal{M}_G$ with precision matrix $K$, the expected holding time for this state is $\lambda(K) = 1/(\beta(K) + \delta(K))$; see (9). In this way, the variances of estimators built from the sampler output are decreased; for more details see Cappé et al. (2003). A small sketch of this weighting scheme follows.
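A small R sketch (ours) of this weighting scheme: given a list of visited states, each containing its precision matrix and waiting time (for instance as returned by the bdmcmc_step skeleton above), the posterior mean of K is the waiting-time-weighted average of the visited matrices.

```r
## Sketch (ours) of waiting-time-weighted posterior averaging.
## 'samples' is a list with elements of the form list(K = ..., waiting_time = ...).
posterior_mean_K <- function(samples) {
  w  <- sapply(samples, `[[`, "waiting_time")
  Ks <- lapply(samples, `[[`, "K")
  Reduce(`+`, Map(`*`, Ks, w / sum(w)))   # weighted average of precision matrices
}
```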

3 Specific implementation of the BDMCMC algorithm

3.1 Proposed prior distributions

In our proposed Bayesian methodology, we embed the joint inference problem naturally in the structure of a Bayesian hierarchical model. Given a prior $p(G)$ on the graph, we set a prior distribution $p(K \mid G)$ for its precision matrix. For the prior distribution of the graph we propose two different choices. One is the discrete uniform distribution on the graph space $\mathcal{G}$,
$$p(G) = \frac{1}{|\mathcal{G}|}, \quad \text{for each } G \in \mathcal{G},$$
where $\mathcal{G}$ denotes the set of all possible graphical models. A second prior distribution on the graph is a truncated Poisson distribution on the number of edges ($\mathrm{degree}(G) \sim TP(\gamma)$), with the probabilities of the graphs proportional to
$$p(G) \propto \frac{\gamma^{|E|}}{|E|!}, \quad \text{for each } G \in \mathcal{G},$$
where $|E|$ is the number of edges in the graph $G$. For simplicity in computing the death rates we can set $\gamma = \beta_0$, where $\beta_0$ is the birth rate in our BDMCMC algorithm.

For the prior distribution of the precision matrix we use the G-Wishart distribution. The G-Wishart distribution is extremely attractive, since it is the conjugate prior for normally distributed data and it places no probability mass on the entries corresponding to absent edges of the graph. A zero-constrained random matrix $K \in P_G$ has the G-Wishart distribution $W_G(b, D)$ if its density is
$$p(K \mid G) = \frac{1}{I_G(b, D)}\, |K|^{(b-2)/2} \exp\left\{ -\tfrac{1}{2} \mathrm{tr}(DK) \right\},$$
where $b > 2$ is the degrees of freedom, $D$ is a symmetric positive definite matrix, and $I_G(b, D)$ is the normalizing constant,
$$I_G(b, D) = \int_{P_G} |K|^{(b-2)/2} \exp\left\{ -\tfrac{1}{2} \mathrm{tr}(DK) \right\} dK.$$
Hence, conditional on a specific graph and an observed dataset $x$, the posterior distribution of $K$ is
$$p(K \mid x, G) = \frac{1}{I_G(b^*, D^*)}\, |K|^{(b^*-2)/2} \exp\left\{ -\tfrac{1}{2} \mathrm{tr}(D^* K) \right\},$$
where $b^* = b + n$ and $D^* = D + S$. This posterior distribution is again G-Wishart, $W_G(b^*, D^*)$; a short sketch of this conjugate update, together with the truncated Poisson graph prior, is given below.
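The conjugate update and the truncated Poisson graph prior are easy to express in R. The following sketch is our own illustration with made-up function names: it returns the posterior parameters $(b^*, D^*)$ and evaluates $\log p(G)$ up to an additive constant.

```r
## Sketch (ours): posterior parameters of the G-Wishart given data x and prior W_G(b, D).
gwishart_posterior <- function(x, b = 3, D = diag(ncol(x))) {
  list(b_star = b + nrow(x),        # b* = b + n
       D_star = D + crossprod(x))   # D* = D + S, with S = x'x
}

## Truncated-Poisson prior on the number of edges, on the log scale and up to a constant:
## log p(G) = |E| log(gamma) - log(|E|!) + const.
log_prior_graph_tp <- function(n_edges, gamma) {
  n_edges * log(gamma) - lfactorial(n_edges)
}
```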

3.2 Computing the death rates for low-dimensional cases

Using the uniform prior distribution for the graph and the G-Wishart distribution for the precision matrix, we have
$$p(G, K \mid x) \propto p(x \mid K, G)\, p(K \mid G)\, p(G) \propto \frac{1}{I_G(b, D)}\, |K|^{(b^*-2)/2} \exp\left\{ -\tfrac{1}{2} \mathrm{tr}(D^* K) \right\},$$
where $b^* = b + n$ and $D^* = D + S$. As a result, for each $\xi \in E$, the death rate according to (4) is
$$\delta_\xi(K) = \beta_0\, b_\xi(k_\xi; K^{-\xi})\, \frac{p(G^{-\xi}, K^{-\xi} \mid x)}{p(G, K \mid x)}
= \frac{I_G(b, D)}{I_{G^{-\xi}}(b, D)} \left( \frac{|K^{-\xi}|}{|K|} \right)^{(b^*-2)/2} \exp\left\{ -\tfrac{1}{2} \mathrm{tr}\big(D^* (K^{-\xi} - K)\big) \right\} \beta_0\, b_\xi(k_\xi; K^{-\xi}).$$
For the proposal density $b_\xi(k_\xi; K)$ we propose, motivated by the properties of the G-Wishart distribution, the Normal distribution
$$k_\xi \sim N\!\left( -\frac{d_{ij}}{d_{jj}}\, k_{ii},\ \frac{k_{ii}}{d_{jj}} \right), \quad \text{for each } \xi \in \bar{E},$$
where $d_{ij}$ is the $(i, j)$ entry of the matrix $D$.

To compute the above death rates, we need the ratio of normalizing constants of the G-Wishart distribution. There is no direct way to obtain the exact value of this normalizing constant. This is the biggest computational bottleneck, not only in our Bayesian approach, but in the Bayesian Gaussian graphical model literature in general; see Atay-Kayis and Massam (2005), Lenkoski and Dobra (2011), and Dobra et al. (2011). To approximate the normalizing constant of the G-Wishart distribution, Atay-Kayis and Massam (2005) proposed a Monte Carlo method based on the Cholesky decomposition, described below.

Let $G = (V, E)$ be an arbitrary Gaussian graphical model with precision matrix $K \sim W_G(b, D)$. By the Cholesky decomposition we can write $K = T'\Psi'\Psi T$, where $\Psi = (\psi_{ij})_{p \times p}$ and $D^{-1} = T'T$. Then, for $i = 1, \ldots, p$, $\psi_{ii}^2$ has a chi-square distribution with $b + \nu_i$ degrees of freedom and, for $(i, j) \in E$, $\psi_{ij}$ has a standard Normal distribution, all mutually independent. The elements $\psi_{ij}$ with $(i, j) \notin E$ are well-defined functions of the free elements $\Psi^\nu$, namely
$$\psi_{ij} = -\sum_{k=i}^{j-1} \psi_{ik} h_{kj} - \frac{1}{\psi_{ii}} \sum_{r=1}^{i-1} \left( \sum_{l=r}^{i} \psi_{rl} h_{li} \right) \left( \sum_{l=r}^{j} \psi_{rl} h_{lj} \right),$$
where $h_{ij} = t_{ij}/t_{jj}$ and $t_{ij}$ is the $(i, j)$ entry of the matrix $T$. In particular, for $i = 1$ and $(1, j) \notin E$,
$$\psi_{1j} = -\sum_{k=1}^{j-1} \psi_{1k} h_{kj}.$$
According to Atay-Kayis and Massam (2005), the normalizing constant of the G-Wishart distribution $W_G(b, D)$ is
$$I_G(b, D) = C_{b,T}\, E_G[f_T(\Psi^\nu)],$$

where
$$f_T(\Psi^\nu) = \exp\left\{ -\frac{1}{2} \sum_{\xi \in \bar{E}} \psi_\xi^2 \right\}$$
and
$$C_{b,T} = 2^{\,pb/2 + \sum_i \nu_i}\; \pi^{\sum_i \nu_i / 2} \prod_{i=1}^{p} \Gamma\!\left( \frac{b + \nu_i}{2} \right) t_{ii}^{\,b + \tau_i},$$
in which $\nu_i$ is the number of neighbors of node $i$ subsequent to it in the ordering of the vertices and $\tau_i$ is the total number of neighbors of node $i$. We can approximate $E_G[f_T(\Psi^\nu)]$ following Atay-Kayis and Massam (2005), as below.

Algorithm 3.1. Monte Carlo method. Given an arbitrary graph $G = (V, E)$:

1. Sample $\Psi$ following Steps 1, 2, 3 and 4 in Section 4.2 of Atay-Kayis and Massam (2005).
2. Compute $f_T^{(k)}(\Psi^\nu) = \exp\left\{ -\frac{1}{2} \sum_{\xi \in \bar{E}} (\psi_\xi^k)^2 \right\}$, for $k = 1, \ldots, N$ iterations.
3. Compute
$$\hat{E}_G[f_T(\Psi^\nu)] = \frac{1}{N} \sum_{k=1}^{N} \exp\left\{ -\frac{1}{2} \sum_{\xi \in \bar{E}} (\psi_\xi^k)^2 \right\}.$$

With some computation, we can write the ratio of normalizing constants, for each $\xi = (i, j) \in E$, as
$$\frac{I_G(b, D)}{I_{G^{-\xi}}(b, D)} = 2\sqrt{\pi}\; t_{ii} t_{jj}\; \frac{\Gamma\!\left( \frac{b + \nu_i}{2} \right)}{\Gamma\!\left( \frac{b + \nu_i - 1}{2} \right)}\; \frac{E_G[f_T(\Psi^\nu)]}{E_{G^{-\xi}}[f_T(\Psi^\nu)]}. \qquad (5)$$
As a result, we can write the death rates as
$$\delta_\xi(K) = 2\sqrt{\pi}\; t_{ii} t_{jj}\; \frac{\Gamma\!\left( \frac{b + \nu_i}{2} \right)}{\Gamma\!\left( \frac{b + \nu_i - 1}{2} \right)}\; \frac{E_G[f_T(\Psi^\nu)]}{E_{G^{-\xi}}[f_T(\Psi^\nu)]} \left( \frac{|K^{-\xi}|}{|K|} \right)^{(b^*-2)/2} \exp\left\{ -\tfrac{1}{2} \mathrm{tr}\big(D^*(K^{-\xi} - K)\big) \right\} \beta_0\, b_\xi(k_\xi; K^{-\xi}), \qquad (6)$$
in which the ratio of expectations is computed by Algorithm 3.1. We should mention that the BDMCMC algorithm with these death rates is slow; see example 4.1. As a result, it is not suitable for high-dimensional cases.

Figure 2: The ratio of expectations in (5) for graphs with the same structure, plotted against the number of nodes as the dimension $p$ of the graph increases. The plot shows that for high-dimensional cases this ratio of expectations converges to one.

3.3 Computing death rates for high-dimensional cases

Computing the ratio of normalizing constants is the main computational bottleneck, not only in our proposed BDMCMC algorithm but also in other Bayesian approaches in this area. For example, Wang and Li (2012) proposed a double Metropolis-Hastings algorithm to avoid the computationally expensive normalizing constants, which made their algorithm much faster. Nevertheless, in their methodology, instead of computing a ratio of normalizing constants they need to sample the precision matrix, which for high-dimensional cases is time-consuming.

Here we propose a simple way to compute the ratio of normalizing constants which is particularly sensible for high-dimensional cases. The ratio of normalizing constants in (5) includes a ratio of expectations. Fortunately, our results indicate that for high-dimensional cases this ratio of expectations converges to one. To show that this assumption makes sense in the high-dimensional case, we computed this ratio of expectations, according to Algorithm 3.1, for graphs with the same structure while increasing the dimension $p$ of the graph. The result is shown in Figure 2: the ratio of expectations in (5) converges to one for high-dimensional cases. This ratio does not depend on the data. Therefore, for high-dimensional graphs, the death rates become
$$\delta_\xi(K) = 2\sqrt{\pi}\; t_{ii} t_{jj}\; \frac{\Gamma\!\left( \frac{b + \nu_i}{2} \right)}{\Gamma\!\left( \frac{b + \nu_i - 1}{2} \right)} \left( \frac{|K^{-\xi}|}{|K|} \right)^{(b^*-2)/2} \exp\left\{ -\tfrac{1}{2} \mathrm{tr}\big(D^*(K^{-\xi} - K)\big) \right\} \beta_0\, b_\xi(k_\xi; K^{-\xi}). \qquad (7)$$
A rough sketch of how this approximate death rate can be evaluated is given below.
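Under the assumption that the ratio of expectations equals one, the death rate (7) can be evaluated in closed form. The R sketch below is our own illustration, not the BDgraph implementation; in particular, which matrices enter the Normal proposal $b_\xi$ is not fully spelled out above, so using $D^*$ and the diagonal of $K^{-\xi}$ there is our assumption.

```r
## Sketch (ours) of the approximate death rate (7) for edge xi = (i, j).
## bstar = b + n and Dstar = D + S; Tmat satisfies solve(D) = t(Tmat) %*% Tmat;
## nu_i is the number of neighbours of node i that follow i in the vertex
## ordering (in the current graph G); beta0 is the fixed birth rate.
death_rate_highdim <- function(K, i, j, b, bstar, Dstar, Tmat, nu_i, beta0) {
  K_minus  <- kill_edge(K, i, j)                 # death move sketch from Section 2.2
  log_pref <- log(2) + 0.5 * log(pi) + log(Tmat[i, i]) + log(Tmat[j, j]) +
              lgamma((b + nu_i) / 2) - lgamma((b + nu_i - 1) / 2)
  log_det  <- (bstar - 2) / 2 *
    as.numeric(determinant(K_minus, logarithm = TRUE)$modulus -
               determinant(K, logarithm = TRUE)$modulus)
  log_tr   <- -0.5 * sum(diag(Dstar %*% (K_minus - K)))
  ## proposal density b_xi evaluated at the removed element k_ij (assumption:
  ## the Normal proposal of Section 3.2 with D* and the diagonal of K^{-xi})
  prop <- dnorm(K[i, j],
                mean = -Dstar[i, j] / Dstar[j, j] * K_minus[i, i],
                sd   = sqrt(K_minus[i, i] / Dstar[j, j]))
  exp(log_pref + log_det + log_tr) * beta0 * prop
}
```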

In simulation examples 4.1 and 4.2 we show that the BDMCMC algorithm with the above death rates is fast and accurate for high-dimensional cases.

3.4 Sampling from the precision matrix

In our proposed BDMCMC algorithm we need to sample from the conditional distribution of the precision matrix. For sampling the precision matrix under a G-Wishart distribution, several methods have been proposed: block Gibbs sampling (Wang and Li (2012)), a Metropolis-Hastings method (Mitsakakis et al. (2011)), and an accept-reject algorithm (Wang and Carvalho (2010)). Wang and Li (2012) review the existing methods and show that the block Gibbs sampler generally outperforms the other proposed methods. Here we briefly review the block Gibbs sampler.

Let $G = (V, E)$ be an arbitrary graph with precision matrix $K \sim W_G(b, D)$, and let $l \subseteq V$ denote a complete subset of the graph. Roverato (2002, Lemma 1) shows that for any complete subset $l$ of the graph we have
$$K_{l,l} - c \mid K \setminus K_{l,l} \sim W\big(b + p - |l|,\ D_{l,l}\big), \qquad (8)$$
where $c = K_{l, V \setminus l} (K_{V \setminus l, V \setminus l})^{-1} K_{V \setminus l, l}$, $|l|$ is the size of the complete subset $l$, and $W$ denotes a standard Wishart distribution. Following Wang and Li (2012), we can summarize the block Gibbs sampler as follows.

Algorithm 3.2. Block Gibbs sampler. Given an arbitrary graph $G = (V, E)$, construct a sequence of complete subsets $l = \{l_k\}$, with $l_k \subseteq V$, such that every edge of $G$ is contained in at least one $l_k$. Then, cycling over the subsets:

1. Generate $A \sim W\big(b + p - |l|,\ D_{l,l}\big)$.
2. Set $K_{l,l} = A + K_{l, V \setminus l} (K_{V \setminus l, V \setminus l})^{-1} K_{V \setminus l, l}$.

For an arbitrary graph $G$ the choice of complete subsets $l$ may not be unique. Lenkoski and Dobra (2011) proposed the special case where $l$ is the collection of maximal cliques; this requires an algorithm for maximal clique decomposition, which is computationally expensive. Another extreme is proposed by Wang and Li (2012), where $l$ is the collection of edges in $E$; they call this the edgewise block Gibbs sampler.

4 Statistical performance of the proposed methodology

In this section we present the results of analyses of one real and two simulated datasets, covering both high- and low-dimensional cases. All computations have been done with an R package called BDgraph, which is freely available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=BDgraph. All timings were carried out on an Intel(R) Core(TM) i5 CPU 2.67GHz processor. A typical call to the package is sketched below.
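For readers who want to try the method, a typical BDgraph session looks roughly as follows. This is our illustration; the argument names follow recent CRAN releases of the package and may differ slightly from the 2013 version described in this paper.

```r
# install.packages("BDgraph")
library(BDgraph)

set.seed(1)
sim <- bdgraph.sim(n = 100, p = 8, graph = "random")   # simulated data with a known graph
fit <- bdgraph(data = sim$data, iter = 10000, burnin = 5000)

summary(fit)            # most probable graph and posterior summaries
plinks(fit)             # posterior edge-inclusion probabilities, in the spirit of (9)
select(fit, cut = 0.5)  # adjacency matrix of edges with inclusion probability above 0.5
```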

4.1 Simulation example 1: Graph with 8 nodes

We consider a graph with 8 nodes, for which there are more than 250 million possible graphical models. We assume the true graphical model is
$$\mathcal{M}_G = \{ N_8(0, \Sigma) \mid K = \Sigma^{-1} \in P_G \},$$
with a fixed sparse precision matrix $K$ (not shown). We sample from the true graph with $n = 100$. For the prior distribution of the graph we place a uniform distribution, and for the prior distribution of the precision matrix we use the G-Wishart prior $W_G(3, I_8)$.

First, we run the BDMCMC algorithm (Algorithm 2.1) with death rates according to (7), with 5000 burn-in iterations; the run takes 55 seconds. We calculate the posterior edge inclusion probabilities as
$$\hat{p}_\xi = \frac{\sum_{t=1}^{N} I(\xi \in G^{(t)})\, \lambda(K^{(t)})}{\sum_{t=1}^{N} \lambda(K^{(t)})}, \quad \text{for each } \xi \in \mathcal{W}, \qquad (9)$$
where $N$ is the number of iterations, $I(\xi \in G^{(t)})$ is the indicator function, equal to 1 if $\xi \in G^{(t)}$ and zero otherwise, and $\lambda(K^{(t)})$ is the waiting time in the graph $G^{(t)}$ with precision matrix $K^{(t)}$; see Figure 1. Using this formula, we obtain the posterior mean estimate $\hat{p}_\xi$ for every possible edge $\xi = (i, j) \in \mathcal{W}$. Moreover, the posterior probability of the true graph is 0.36, which makes it the most probable graph. With the output of the BDMCMC algorithm we can also estimate the covariance matrix and the precision matrix $\hat{K}$.

Figure 3: (Left) Posterior distribution of the graphs according to their number of edges. (Right) Cumulative occupancy fractions of all possible edges, used to check convergence of the BDMCMC algorithm; the algorithm converges after roughly 4000 iterations.

The left panel of Figure 3 shows the estimated posterior distribution of the graphs according to their number of edges. The figure shows that the posterior probability of most graphs is essentially zero. Furthermore, in terms of the number of edges, the most probable graphs are those with 8 edges, with total posterior probability 0.36; this probability includes that of the true graph, which is 0.35, which is quite reasonable.

In comparison to other Bayesian methodologies in this area, such as RJMCMC, one advantage of our proposed BDMCMC algorithm is its fast convergence. A useful check on convergence is given by plotting the cumulative occupancy fractions of the different edges against the iterations, shown in the right panel of Figure 3. As the figure shows, the BDMCMC algorithm converges after roughly 4000 iterations, so our burn-in of 5000 iterations is more than adequate to achieve stability in the occupancy fractions. A small sketch of this convergence check is given below.
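A sketch (ours) of this convergence diagnostic: for one edge, the cumulative occupancy fraction is the running, waiting-time-weighted proportion of the chain spent in graphs that contain that edge.

```r
## Sketch (ours) of the cumulative occupancy fraction for a single edge.
## 'incl' is a 0/1 vector indicating whether the edge was present at each
## saved state, and 'wt' the corresponding waiting times.
cum_occupancy <- function(incl, wt) {
  cumsum(incl * wt) / cumsum(wt)
}
# plot(cum_occupancy(incl, wt), type = "l", ylim = c(0, 1),
#      xlab = "iteration", ylab = "occupancy fraction")
```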

Effect of sample size. To check the sensitivity of the BDMCMC algorithm to the sample size, we run the algorithm in the same setting but for different numbers of observations. The results are shown in Table 1, which reports, for each sample size $n$: the posterior probability of the true graph; the number of false discoveries (true zeros estimated as non-zero); and the number of false negatives (off-diagonal non-zero elements estimated as zero). The results show that the accuracy of the BDMCMC algorithm depends on the number of observations: as the number of observations increases, the results become more accurate, and for 30 or more observations the algorithm selects the true graph as the best graph.

Table 1: Simulation results for different numbers of observations, for the graph with 8 nodes (columns: $n$, $p(\text{true graph} \mid \text{data})$, false discoveries, false negatives).

Sensitivity to the priors. To evaluate the sensitivity of the BDMCMC algorithm to the prior distributions, we first check the results for different values of $b$, the parameter of the G-Wishart prior $W_G(b, D)$ on the precision matrix. We then evaluate the results for different prior distributions on the graph: in this example we placed a uniform prior on the graph, and for comparison we also check the results under the truncated Poisson prior. Table 2 shows the results for different values of $b$; they indicate that the BDMCMC algorithm is not very sensitive to the value of $b$. Table 3 shows the results for different values of $\beta_0$ (the birth rate, which here is also the rate of the truncated Poisson prior, $\mathrm{degree}(G) \sim TP(\beta_0)$); again the results are hardly sensitive to the value of $\beta_0$.

Table 2: Simulation results for different values of $b$ in $W_G(b, D)$, for the graph with 8 nodes.

Table 3: Simulation results for different values of $\beta_0$ in $\mathrm{degree}(G) \sim TP(\beta_0)$, for the graph with 8 nodes.

In summary, according to the results in Tables 1, 2 and 3, the BDMCMC algorithm improves with sample size but is not very sensitive to the prior distributions.

Table 4: Results of the BDMCMC algorithm with the death rates of subsection 3.2, for different numbers of Monte Carlo iterations, for the graph with 8 nodes (columns: MC iterations, CPU time in minutes, $p(\text{true graph} \mid \text{data})$, false discoveries, false negatives). With 100 or more Monte Carlo iterations the algorithm is accurate, but it is not fast.

Comparison of the two death rates. To compare and check the accuracy of the BDMCMC algorithm under the two proposed death rates, for the low- and high-dimensional cases according to (6) and (7), we also run the BDMCMC algorithm in the same setting with the death rates according to (6). The results are given in Table 4, which shows the CPU time (in minutes) and the accuracy of the BDMCMC algorithm with death rates (6) as a function of the number of Monte Carlo iterations in Algorithm 3.1. The example shows that the results of the BDMCMC algorithm with death rates (7) are almost the same as with death rates (6); the main difference is computation time, as the algorithm with death rates (7) gives almost the same result in less than one minute. Table 4 shows that the BDMCMC algorithm with death rates (6) is not fast, and therefore not suitable for high-dimensional problems. On the other hand, the BDMCMC algorithm with death rates (7) is fast and accurate, so we can use it for high-dimensional problems.

4.2 Simulation example 2: Graph with 120 nodes

To check the accuracy of the BDMCMC algorithm for high-dimensional problems, we consider a sparse circle graph with $p = 120$. We assume the true graphical model is
$$\mathcal{M}_G = \{ N_{120}(0, \Sigma) \mid K = \Sigma^{-1} \in P_G \},$$
in which the elements of the precision matrix are $k_{ii} = 1$, $k_{ij} = 0.5$ for $|i - j| = 1$, $k_{1p} = k_{p1} = 0.4$, and $k_{ij} = 0$ otherwise. We sample from this true graphical model, place a uniform prior on the graph, and use the G-Wishart prior $W_G(3, I_{120})$ for the precision matrix $K$. We run the BDMCMC algorithm with death rates according to (7), with 5000 burn-in iterations. The run takes only 190 minutes, which shows that the algorithm is fast and outperforms other Bayesian approaches in this area. The posterior probability of the true graph is 0.4, which makes it the most probable graphical model. A sketch of a comparable simulation using the BDgraph package follows.
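A comparable high-dimensional experiment can be set up with the BDgraph package along the following lines. This is our own sketch; the sample size and iteration count used in the paper are not repeated here, so the values below are placeholders, and argument names may differ between package versions.

```r
library(BDgraph)

set.seed(1)
sim120 <- bdgraph.sim(n = 1000, p = 120, graph = "circle")   # placeholder sample size
fit120 <- bdgraph(data = sim120$data, iter = 10000, burnin = 5000)
plinks(fit120)   # posterior edge-inclusion probabilities for all 7140 possible edges
```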

Figure 4: Posterior distribution of the graphs according to their number of edges, for the graph with 120 nodes.

The posterior edge inclusion probabilities are calculated for each $\xi \in \mathcal{W}$ according to (9); the lowest inclusion probability among the true edges is 1, while the highest inclusion probability among the excluded edges is small, which is quite reasonable. Figure 4 shows the estimated posterior distribution of the graphs according to their number of edges.

4.3 Real example: Cell signaling data

Here we consider a flow cytometry dataset with 11 proteins from Sachs et al. (2005). Using Bayesian network inference, they fit a directed acyclic graph (DAG) to the data, producing the network shown in the right panel of Figure 5. Friedman et al. (2008) applied the graphical lasso to these data to obtain undirected graphs for different values of the penalty parameter. In our Bayesian approach, we place a uniform prior distribution on the graph and the G-Wishart prior $W_G(3, I_{11})$ on the precision matrix $K$. We run the BDMCMC algorithm with the death rates according to (7), with 5000 burn-in iterations; running the algorithm takes less than 2 minutes. The left panel of Figure 5 shows the graph with the highest posterior probability, which is 0.125; this graph has 31 edges.

Figure 5: (Left) Cell-signaling dataset: the most probable undirected graphical model according to the BDMCMC algorithm. (Right) Result from Sachs et al. (2005): directed graph for the cell-signaling dataset obtained by Bayesian network inference.

Moreover, according to (9), we obtain the posterior mean inclusion probabilities $\hat{p}_\xi$ for all edges $\xi = (i, j) \in \mathcal{W}$.

5 Discussion

In this article we have proposed a Bayesian method for determining Gaussian conditional independence graphs based on birth-death MCMC inference. We derived the conditions under which the balance conditions of the birth-death MCMC methodology hold, and in accordance with those conditions we proposed a convenient BDMCMC algorithm. If we use the exact death rates (6), we show in example 4.1 that the BDMCMC algorithm is accurate but not fast. However, our proposed BDMCMC algorithm with death rates (7) is fast and actually improves for high-dimensional problems: a so-called blessing of dimensionality. Our examples demonstrate that a scalable Bayesian inference methodology exists which, precisely in the case of large graphs, is able to distinguish important edges from irrelevant ones and detect the true model with high accuracy.

The resulting graphical model is reasonably robust to the modelling assumptions and priors used.

A possible disadvantage of our BDMCMC algorithm is that the birth rates are constant across edges, so the algorithm relies on the death rates to converge to the most probable graph. However, this feature has the advantage of allowing fast mixing across the model space. Other Bayesian approaches in this area, such as the RJMCMC of Giudici and Green (1999), mix very slowly, since they randomly pick new edges but only add them if they are consistent with the data. Our algorithm therefore has the advantage of mixing quickly across the full model space, especially for high-dimensional problems.

There are several conceptual extensions. For our BDMCMC methodology we proposed a uniform or a truncated Poisson prior for the conditional independence graph and a G-Wishart prior for the precision matrix. One could also use different prior distributions for both the graph and the precision matrix, as was done in Wong et al. (2003), Chan and Jeliazkov (2009) and Wang and Pillai (2011). Furthermore, our methodology is general for any type of graphical model and does not rely on the normality of the variables; we can therefore use this methodology for other families of graphical models. We hope this work opens a window for new developments in MCMC approaches for efficient inference of general, high-dimensional graphical models.

A Appendix: Proof of Theorem 2.1

Our proof of Theorem 2.1 is based on the theory derived by Preston (1976, Sections 7 and 8). Preston proposed a special birth-death process in which the birth and death rates depend on the position of the individuals in the underlying space. The process evolves by jumps, of which only a finite number can occur in finite time. The jumps are of two types: a birth is the appearance of a single individual, whereas a death is the removal of a single individual. By considering the solution of the backward Kolmogorov equation, Preston (1976, Theorem 7.1) showed that under certain conditions the process exists and is temporally ergodic, that is, there exists a unique stationary distribution. He showed that if the detailed balance conditions hold, the birth-death process converges to its unique stationary distribution, which for our proposed method is the joint posterior distribution of the graph and the precision matrix.

Before deriving the detailed balance conditions for our proposed BDMCMC algorithm, we introduce some notation. Assume the process is at state $\mathcal{M}_G$, in which $G = (V, E)$ with precision matrix $K \in P_G$. The behavior of the process is defined by the birth rates $\beta_\xi(K)$, the death rates $\delta_\xi(K)$, and the birth and death transition kernels $T^G_{\beta_\xi}(K; \cdot)$ and $T^G_{\delta_\xi}(K; \cdot)$.

For each $\xi \in \bar{E}$, $T^G_{\beta_\xi}(K; \cdot)$ denotes the probability that the process jumps from state $\mathcal{M}_G$ to a point in the new state $\mathcal{M}_{G^{+\xi}}$. Hence, if $\mathcal{F} \subseteq P_{G^{+\xi}}$, we have
$$T^G_{\beta_\xi}(K; \mathcal{F}) = \frac{\beta_\xi(K)}{\beta(K)} \int_{k_\xi :\, K \cup k_\xi \in \mathcal{F}} b_\xi(k_\xi; K)\, dk_\xi. \qquad (10)$$
Likewise, for each $\xi \in E$, $T^G_{\delta_\xi}(K; \cdot)$ denotes the probability that the process jumps from state $\mathcal{M}_G$ to a point in the new state $\mathcal{M}_{G^{-\xi}}$. Therefore, if $\mathcal{F} \subseteq P_{G^{-\xi}}$, then
$$T^G_{\delta_\xi}(K; \mathcal{F}) = \sum_{\eta \in E :\, K \setminus k_\eta \in \mathcal{F}} \frac{\delta_\eta(K)}{\delta(K)} = \frac{\delta_\xi(K)}{\delta(K)}\, I(K^{-\xi} \in \mathcal{F}). \qquad (11)$$

In our model, one specific way to satisfy the detailed balance conditions is to match the birth events from the graph $G$ to all possible graphs with one more edge, and the death events from all possible graphs with one more edge back to the graph $G$. This is formalized in the following definition; see also Preston (1976, equations 8.4 and 8.5).

Detailed balance conditions. In our birth-death process, $p(K, G \mid x)$ satisfies the detailed balance conditions if
$$\int_{\mathcal{F}} \beta(K)\, dp(K, G \mid x) = \sum_{\xi \in \bar{E}} \int_{P_{G^{+\xi}}} \delta(K^{+\xi})\, T^{G^{+\xi}}_{\delta_\xi}(K^{+\xi}; \mathcal{F})\, dp(K^{+\xi}, G^{+\xi} \mid x) \qquad (12)$$
and
$$\int_{\mathcal{F}} \delta(K)\, dp(K, G \mid x) = \sum_{\xi \in E} \int_{P_{G^{-\xi}}} \beta(K^{-\xi})\, T^{G^{-\xi}}_{\beta_\xi}(K^{-\xi}; \mathcal{F})\, dp(K^{-\xi}, G^{-\xi} \mid x), \qquad (13)$$
where $\mathcal{F} \subseteq P_G$. The first part (Eq. 12) means that the rate at which the process leaves the current graph through birth events is precisely matched by the rate at which the process enters this graph through all possible death events; the second part (Eq. 13) states the same in the opposite direction.

To prove the first part of the detailed balance conditions (Eq. 12), we have for the left-hand side
$$\begin{aligned}
\mathrm{LHS} &= \int_{\mathcal{F}} \beta(K)\, dp(G, K \mid x) = \int_{P_G} I(K \in \mathcal{F})\, \beta(K)\, dp(G, K \mid x) \\
&= \int_{P_G} I(K \in \mathcal{F}) \sum_{\xi \in \bar{E}} \beta_\xi(K)\, dp(G, K \mid x) = \sum_{\xi \in \bar{E}} \int_{P_G} I(K \in \mathcal{F})\, \beta_\xi(K)\, dp(G, K \mid x) \\
&= \sum_{\xi \in \bar{E}} \int_{P_G} I(K \in \mathcal{F})\, \beta_\xi(K) \left[ \int_{k_\xi} b_\xi(k_\xi \mid K)\, dk_\xi \right] dp(G, K \mid x) \\
&= \sum_{\xi \in \bar{E}} \int_{P_G} \int_{k_\xi} I(K \in \mathcal{F})\, \beta_\xi(K)\, b_\xi(k_\xi \mid K)\, p(G, K \mid x)\, dk_\xi \prod_{\zeta \in \bar{V}} dk_\zeta.
\end{aligned}$$
For the right-hand side, using Eq. (11), we have
$$\begin{aligned}
\mathrm{RHS} &= \sum_{\xi \in \bar{E}} \int_{P_{G^{+\xi}}} \delta(K^{+\xi})\, T^{G^{+\xi}}_{\delta_\xi}(K^{+\xi}; \mathcal{F})\, dp(G^{+\xi}, K^{+\xi} \mid x) \\
&= \sum_{\xi \in \bar{E}} \int_{P_{G^{+\xi}}} I(K \in \mathcal{F})\, \delta_\xi(K^{+\xi})\, dp(G^{+\xi}, K^{+\xi} \mid x) \\
&= \sum_{\xi \in \bar{E}} \int I(K \in \mathcal{F})\, \delta_\xi(K^{+\xi})\, p(G^{+\xi}, K^{+\xi} \mid x)\, dk_\xi \prod_{\zeta \in \bar{V}} dk_\zeta,
\end{aligned}$$
and so LHS = RHS if
$$\beta_\xi(K)\, b_\xi(k_\xi; K)\, p(G, K \mid x) = \delta_\xi(K^{+\xi})\, p(G^{+\xi}, K^{+\xi} \mid x),$$
which is equivalent to the condition in Theorem 2.1. The second part of the detailed balance conditions, Eq. (13), can be shown to hold in the same way.

References

Atay-Kayis, A. and H. Massam (2005). A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models. Biometrika 92(2).

Cappé, O., C. Robert, and T. Rydén (2003). Reversible jump, birth-and-death and more general continuous time Markov chain Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65(3).

Chan, J. and I. Jeliazkov (2009). MCMC estimation of restricted covariance matrices. Journal of Computational and Graphical Statistics 18(2).

Dempster, A. (1972). Covariance selection. Biometrics 28(1).

Dobra, A., A. Lenkoski, and A. Rodriguez (2011). Bayesian inference for general Gaussian graphical models with application to multivariate lattice data. Journal of the American Statistical Association 106(496).

Friedman, J., T. Hastie, and R. Tibshirani (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3).

Geyer, C. and J. Møller (1994). Simulation procedures and likelihood inference for spatial point processes. Scandinavian Journal of Statistics 21(4).

Giudici, P. and P. Green (1999). Decomposable graphical Gaussian model determination. Biometrika 86(4).

Green, P. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82(4).

Jones, B., C. Carvalho, A. Dobra, C. Hans, C. Carter, and M. West (2005). Experiments in stochastic computation for high-dimensional graphical models. Statistical Science 20(4).

Lauritzen, S. (1996). Graphical Models, Volume 17. Oxford University Press, USA.

Lenkoski, A. and A. Dobra (2011). Computational aspects related to inference in Gaussian graphical models with the G-Wishart prior. Journal of Computational and Graphical Statistics 20(1).

Mitsakakis, N., H. Massam, and M. D. Escobar (2011). A Metropolis-Hastings based method for sampling from the G-Wishart distribution in Gaussian graphical models. Electronic Journal of Statistics 5.

Preston, C. J. (1976). Special birth-and-death processes. Bull. Inst. Internat. Statist. 46.

Ripley, B. (1977). Modelling spatial patterns. Journal of the Royal Statistical Society, Series B (Methodological) 39(2).

Roverato, A. (2002). Hyper inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models. Scandinavian Journal of Statistics 29(3).

Sachs, K., O. Perez, D. Pe'er, D. Lauffenburger, and G. Nolan (2005). Causal protein-signaling networks derived from multiparameter single-cell data. Science 308(5721), 523.

Stephens, M. (2000). Bayesian analysis of mixture models with an unknown number of components: an alternative to reversible jump methods. Annals of Statistics 28(1).

Wang, H. and C. Carvalho (2010). Simulation of hyper-inverse Wishart distributions for non-decomposable graphs. Electronic Journal of Statistics 4.

Wang, H. and S. Li (2012). Efficient Gaussian graphical model determination under G-Wishart prior distributions. Electronic Journal of Statistics 6.

Wang, H. and N. Pillai (2011). On a class of shrinkage priors for covariance matrix estimation. arXiv preprint.

Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics, Volume 16. Wiley, New York.

Wong, F., C. Carter, and R. Kohn (2003). Efficient estimation of covariance selection models. Biometrika 90(4).


More information

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

More information

10708 Graphical Models: Homework 2

10708 Graphical Models: Homework 2 10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves

More information

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods Prof. Daniel Cremers 11. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

On Bayesian Computation

On Bayesian Computation On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Markov Networks. l Like Bayes Nets. l Graphical model that describes joint probability distribution using tables (AKA potentials)

Markov Networks. l Like Bayes Nets. l Graphical model that describes joint probability distribution using tables (AKA potentials) Markov Networks l Like Bayes Nets l Graphical model that describes joint probability distribution using tables (AKA potentials) l Nodes are random variables l Labels are outcomes over the variables Markov

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement

More information

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for

More information

Doing Bayesian Integrals

Doing Bayesian Integrals ASTR509-13 Doing Bayesian Integrals The Reverend Thomas Bayes (c.1702 1761) Philosopher, theologian, mathematician Presbyterian (non-conformist) minister Tunbridge Wells, UK Elected FRS, perhaps due to

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:

More information

A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models

A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models arxiv:1811.03735v1 [math.st] 9 Nov 2018 Lu Zhang UCLA Department of Biostatistics Lu.Zhang@ucla.edu Sudipto Banerjee UCLA

More information

MARKOV CHAIN MONTE CARLO

MARKOV CHAIN MONTE CARLO MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Undirected Graphical Models

Undirected Graphical Models Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates

More information

MCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17

MCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17 MCMC for big data Geir Storvik BigInsight lunch - May 2 2018 Geir Storvik MCMC for big data BigInsight lunch - May 2 2018 1 / 17 Outline Why ordinary MCMC is not scalable Different approaches for making

More information

FastGP: an R package for Gaussian processes

FastGP: an R package for Gaussian processes FastGP: an R package for Gaussian processes Giri Gopalan Harvard University Luke Bornn Harvard University Many methodologies involving a Gaussian process rely heavily on computationally expensive functions

More information

The ratio of normalizing constants for Bayesian graphical Gaussian model selection Mohammadi, A.; Massam, Helene; Letac, Gerard

The ratio of normalizing constants for Bayesian graphical Gaussian model selection Mohammadi, A.; Massam, Helene; Letac, Gerard UvA-DARE (Digital Academic Repository The ratio of normalizing constants for Bayesian graphical Gaussian model selection Mohammadi, A.; Massam, Helene; Letac, Gerard Published in: ArXiv e-prints Link to

More information

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Anthony Trubiano April 11th, 2018 1 Introduction Markov Chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability

More information

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to

More information

Appendix: Modeling Approach

Appendix: Modeling Approach AFFECTIVE PRIMACY IN INTRAORGANIZATIONAL TASK NETWORKS Appendix: Modeling Approach There is now a significant and developing literature on Bayesian methods in social network analysis. See, for instance,

More information

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Abhirup Datta 1 Sudipto Banerjee 1 Andrew O. Finley 2 Alan E. Gelfand 3 1 University of Minnesota, Minneapolis,

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Learning in Bayesian Networks

Learning in Bayesian Networks Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

More information

MCMC and Gibbs Sampling. Sargur Srihari

MCMC and Gibbs Sampling. Sargur Srihari MCMC and Gibbs Sampling Sargur srihari@cedar.buffalo.edu 1 Topics 1. Markov Chain Monte Carlo 2. Markov Chains 3. Gibbs Sampling 4. Basic Metropolis Algorithm 5. Metropolis-Hastings Algorithm 6. Slice

More information

Decomposable Graphical Gaussian Models

Decomposable Graphical Gaussian Models CIMPA Summerschool, Hammamet 2011, Tunisia September 12, 2011 Basic algorithm This simple algorithm has complexity O( V + E ): 1. Choose v 0 V arbitrary and let v 0 = 1; 2. When vertices {1, 2,..., j}

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).

More information

A Short Note on Resolving Singularity Problems in Covariance Matrices

A Short Note on Resolving Singularity Problems in Covariance Matrices International Journal of Statistics and Probability; Vol. 1, No. 2; 2012 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education A Short Note on Resolving Singularity Problems

More information

Point spread function reconstruction from the image of a sharp edge

Point spread function reconstruction from the image of a sharp edge DOE/NV/5946--49 Point spread function reconstruction from the image of a sharp edge John Bardsley, Kevin Joyce, Aaron Luttman The University of Montana National Security Technologies LLC Montana Uncertainty

More information

Brief introduction to Markov Chain Monte Carlo

Brief introduction to Markov Chain Monte Carlo Brief introduction to Department of Probability and Mathematical Statistics seminar Stochastic modeling in economics and finance November 7, 2011 Brief introduction to Content 1 and motivation Classical

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

16 : Markov Chain Monte Carlo (MCMC)

16 : Markov Chain Monte Carlo (MCMC) 10-708: Probabilistic Graphical Models 10-708, Spring 2014 16 : Markov Chain Monte Carlo MCMC Lecturer: Matthew Gormley Scribes: Yining Wang, Renato Negrinho 1 Sampling from low-dimensional distributions

More information

Markov-Chain Monte Carlo

Markov-Chain Monte Carlo Markov-Chain Monte Carlo CSE586 Computer Vision II Spring 2010, Penn State Univ. References Recall: Sampling Motivation If we can generate random samples x i from a given distribution P(x), then we can

More information

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

Bayesian inference for general Gaussian graphical models with application to multivariate lattice data

Bayesian inference for general Gaussian graphical models with application to multivariate lattice data Bayesian inference for general Gaussian graphical models with application to multivariate lattice data Adrian Dobra, Alex Lenkoski and Abel odriguez arxiv:1005.4094v1 [stat.me] 21 May 2010 Abstract We

More information

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Genet. Sel. Evol. 33 001) 443 45 443 INRA, EDP Sciences, 001 Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Louis Alberto GARCÍA-CORTÉS a, Daniel SORENSEN b, Note a

More information

Bayesian Graphical Models for Structural Vector AutoregressiveMarch Processes 21, / 1

Bayesian Graphical Models for Structural Vector AutoregressiveMarch Processes 21, / 1 Bayesian Graphical Models for Structural Vector Autoregressive Processes Daniel Ahelegbey, Monica Billio, and Roberto Cassin (2014) March 21, 2015 Bayesian Graphical Models for Structural Vector AutoregressiveMarch

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation

More information

Markov chain Monte Carlo Lecture 9

Markov chain Monte Carlo Lecture 9 Markov chain Monte Carlo Lecture 9 David Sontag New York University Slides adapted from Eric Xing and Qirong Ho (CMU) Limitations of Monte Carlo Direct (unconditional) sampling Hard to get rare events

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

Basic Sampling Methods

Basic Sampling Methods Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution

More information

Markov Networks. l Like Bayes Nets. l Graph model that describes joint probability distribution using tables (AKA potentials)

Markov Networks. l Like Bayes Nets. l Graph model that describes joint probability distribution using tables (AKA potentials) Markov Networks l Like Bayes Nets l Graph model that describes joint probability distribution using tables (AKA potentials) l Nodes are random variables l Labels are outcomes over the variables Markov

More information

Generalized Exponential Random Graph Models: Inference for Weighted Graphs

Generalized Exponential Random Graph Models: Inference for Weighted Graphs Generalized Exponential Random Graph Models: Inference for Weighted Graphs James D. Wilson University of North Carolina at Chapel Hill June 18th, 2015 Political Networks, 2015 James D. Wilson GERGMs for

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Advanced Statistical Modelling

Advanced Statistical Modelling Markov chain Monte Carlo (MCMC) Methods and Their Applications in Bayesian Statistics School of Technology and Business Studies/Statistics Dalarna University Borlänge, Sweden. Feb. 05, 2014. Outlines 1

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information