Marginal likelihood estimation via power posteriors


N. Friel, University of Glasgow, UK
A. N. Pettitt, Queensland University of Technology, Australia

Address for correspondence: Department of Statistics, University of Glasgow, Glasgow, G12 8QQ, UK. nial@stats.gla.ac.uk

Summary. Model choice plays an increasingly important role in Statistics. From a Bayesian perspective a crucial goal is to compute the marginal likelihood of the data for a given model. This however is typically a difficult task since it amounts to integrating over all model parameters. The aim of this paper is to illustrate how this may be achieved using ideas from thermodynamic integration or path sampling. We show how the marginal likelihood can be computed via MCMC methods on modified posterior distributions for each model. This then allows Bayes factors or posterior model probabilities to be calculated. We show that this approach requires very little tuning, and is straightforward to implement. The new method is illustrated in a variety of challenging statistical settings.

Keywords: BAYES FACTOR; REGRESSION; LONGITUDINAL DATA; MULTINOMIAL; HIDDEN MARKOV MODEL; MODEL CHOICE.

1. Introduction

Suppose data y is assumed to have been generated by one of M models indexed by the set {1, 2, ..., M}. An important goal of Bayesian model selection is to calculate p(k|y), the posterior model probability for model k. Here the aim may be to obtain a single most probable model, or indeed a subset of likely models, a posteriori. Alternatively, posterior model probabilities may be synthesised from all competing models to calculate some quantity of interest common to all models, using model averaging (Hoeting et al. 1999). But before we can make inference for p(k|y) we first need to fully specify the posterior distribution over the joint model and parameter space. We denote by θ_k the parameters specific to model k, and by θ the collection of all model parameters. Specifying prior distributions for within-model parameters p(θ_k|k) and for model indicators p(k), together with a likelihood L(y|θ_k, k) within model k, allows Bayesian inference to proceed by examining the posterior distribution

p(\theta_k, k \mid y) \propto L(y \mid \theta_k, k) \, p(\theta_k \mid k) \, p(k).   (1)

Across-model strategies proceed by sampling from this joint posterior distribution of model indicators and parameters. The reversible jump MCMC algorithm (Green 1995) is a popular approach for this situation. Other across-model search strategies include (Godsill 2001) and (Carlin and Chib 1995). By contrast, within-model searches examine the posterior distribution within model k separately for each k. Here the within-model posterior appears as

p(\theta_k \mid y, k) \propto L(y \mid \theta_k, k) \, p(\theta_k \mid k),

where the constant of proportionality, often termed the marginal likelihood or integrated likelihood for model k, is written as

p(y \mid k) = \int_{\theta_k} L(y \mid \theta_k, k) \, p(\theta_k \mid k) \, d\theta_k.   (2)

This is in general a difficult integral to compute, possibly involving high-dimensional within-model parameters θ_k. However, if we were able to do so, then we would readily be able to make statements about posterior model probabilities, using Bayes' theorem,

p(k \mid y) = \frac{p(y \mid k) \, p(k)}{\sum_{k'=1}^{M} p(y \mid k') \, p(k')}.
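To make the role of the marginal likelihood concrete, the following minimal Python sketch (our illustration, not from the paper; the numerical values are purely hypothetical) converts a set of estimated log marginal likelihoods and prior model probabilities into posterior model probabilities, working on the log scale for numerical stability.

    import numpy as np

    def posterior_model_probs(log_marglik, log_prior):
        """Posterior model probabilities p(k|y) from log p(y|k) and log p(k).

        Works on the log scale with a log-sum-exp style normalisation, since
        marginal likelihoods typically differ by many orders of magnitude.
        """
        log_joint = np.asarray(log_marglik) + np.asarray(log_prior)
        log_joint = log_joint - log_joint.max()      # guard against underflow
        probs = np.exp(log_joint)
        return probs / probs.sum()

    # hypothetical estimates for three competing models, uniform model prior
    log_py_k = np.array([-105.3, -98.7, -101.2])
    log_pk = np.log(np.ones(3) / 3)
    print(posterior_model_probs(log_py_k, log_pk))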

The marginal likelihoods can be used to compare two models by computing the Bayes factor,

B_{12} = \frac{p(y \mid k = 1)}{p(y \mid k = 2)},

without the need to specify prior model probabilities. Note that the Bayes factor B_12 gives the evidence provided by the data in favour of model 1 compared to model 2. It can also be seen that

B_{12} = \frac{p(k = 1 \mid y)}{p(k = 2 \mid y)} \, \frac{p(k = 2)}{p(k = 1)}.

In other words, the Bayes factor is the ratio of the posterior odds of model 1 to its prior odds. Note that an improper prior distribution p(θ_k|k) leads necessarily to an improper marginal likelihood, which in turn implies that the Bayes factor is not well defined. To circumvent the difficulty of using improper priors for model comparison, O'Hagan (1995) introduced an approximate method termed the fractional Bayes factor. Here an approximate (proper) marginal likelihood is defined by the ratio

\frac{\int_{\theta_k} L(y \mid \theta_k, k) \, p(\theta_k \mid k) \, d\theta_k}{\int_{\theta_k} \{L(y \mid \theta_k, k)\}^a \, p(\theta_k \mid k) \, d\theta_k},

since any impropriety in the prior for θ_k cancels above and below. Other approaches to this problem include the intrinsic Bayes factor (Berger and Pericchi 1996). In this paper we concentrate on the case where prior distributions are proper.

Various methods have been proposed in the literature to estimate the marginal likelihood (2). For example, Chib (1995) estimates the marginal likelihood p(y|k) using output from a Gibbs sampler for (θ_k|y, k). It relies, however, on a block-updating approach for θ_k, which is clearly not always possible. This work was then extended in (Chib and Jeliazkov 2001), where output from a Metropolis-Hastings algorithm for the posterior (θ_k|y, k) can be used to estimate the marginal likelihood. Annealed importance sampling (Neal 2001) estimates the marginal likelihood using ideas from importance sampling. Here an independent sample from the posterior is generated by defining a sequence of distributions, indexed by a temperature parameter t, from the prior through to the posterior. Importantly, the collection of importance weights can be used to estimate the marginal likelihood.

In this paper we propose a new method to compute the marginal likelihood based on samples from a distribution proportional to the likelihood raised to a power t times the prior, which we term the power posterior. This method was inspired by ideas from path sampling or thermodynamic integration (Gelman and Meng 1998). We find that the marginal likelihood p(y|k) can be expressed as an integral with respect to t, from 0 to 1, of the expected deviance for model k, where the expectation is taken with respect to the power posterior at power t. We argue that this method requires very little tuning or book-keeping. It is easy to implement, requiring only minor modification of computer code which samples from the posterior distribution.

In Section 2 we carry out a review of different approaches to across- and within-model searches. Section 3 introduces the new method for estimating the marginal likelihood, based on sampling from the so-called power posterior distribution. Here we also outline how this method can be implemented in practice, while also giving some guidance as to the sensitivity of the estimate of the marginal likelihood (or Bayes factor) to the diffusivity of the prior on the model parameters. Three illustrations of how the new method performs in practice are given in Section 4. In particular, complex modelling situations are illustrated which preclude other methods of marginal likelihood estimation that rely on block updating of model parameters. We conclude this paper with some final remarks in Section 5.

2. Review of methods to compute Bayes factors

There are numerous methods and techniques available to estimate (2). Generally speaking, two approaches are possible: an across-model search or a within-model search. The former approach, in an MCMC setting, involves generating a single Markov chain which traverses the joint model and parameter space (1). A popular choice is the reversible jump sampler (Green 1995). Godsill (2001) retains aspects of the reversible jump sampler, but considers the case where parameters are shared

between different models, as occurs for example in nested models. Other choices include (Stephens 2000), (Carlin and Chib 1995) and (Dellaportas, Forster and Ntzoufras 2002). Within-model search essentially aims to estimate the marginal likelihood (2) for each model k separately, and then, if desired, uses this information to form Bayes factors; see (Chib 1995) and (Chib and Jeliazkov 2001). Neal (2001) combines aspects of simulated annealing and importance sampling to provide a method of gathering an independent sample from a posterior distribution of interest, but importantly also to estimate the marginal likelihood. Bridge sampling (Meng and Wong 1996) offers the possibility of estimating the Bayes factor by linking the two posterior distributions via a bridge function. Bartolucci and Mira (2003) use this approach, although it is based on an across-model reversible jump sampler. Within-model approaches are disadvantageous when the cardinality of the model space is large. However, as noted by Green (2003), the ideal situation for a within-model approach is one where the models are all reasonably heterogeneous. In effect, this is the case where it is difficult to choose proposal distributions when jumping between models, and indeed the situation where parameters across models of the same dimension have different interpretations. This short review is not intended to be exhaustive. A more complete picture can be found in the substantial reviews (Sisson 2005) and (Han and Carlin 2001).

2.1. Reversible jump MCMC

Reversible jump MCMC (Green 1995) offers the potential to carry out inference for all unknown parameters in the joint model and parameter space (1) in a single logical framework. A crucial innovation in the seminal paper by Green (1995) was to illustrate that detailed balance could be achieved for general state spaces. In particular, this extends the Metropolis-Hastings algorithm to variable-dimension state spaces of the type (θ_k, k). To implement the algorithm, a proposed move from (θ_k, k) to (θ_l, l) proceeds by generating a random vector u from a distribution g and setting (θ_l, l) = f_{kl}((θ_k, k), u). Similarly, a move from (θ_l, l) to (θ_k, k) requires random numbers u' following some distribution g', and setting (θ_k, k) = f_{lk}((θ_l, l), u'), for some deterministic function f_{lk}. However, it is important that the transformation f_{kl} from (θ_k, k) to (θ_l, l) is a bijection with invertible differential. A necessary condition for this to apply is the so-called dimension matching condition, namely

\dim(\theta_k) + \dim(u) = \dim(\theta_l) + \dim(u').

In this case, the probability of accepting such a move is

\min\left\{ 1, \frac{p(\theta_l, l \mid y) \, p(l \to k) \, g'(u')}{p(\theta_k, k \mid y) \, p(k \to l) \, g(u)} \, |J| \right\},

where p(k → l) is the probability of proposing a move from model k to model l, and J is the Jacobian resulting from the transformation from ((θ_k, k), u) to ((θ_l, l), u'). In practice, this may be simplified slightly by not insisting on stochastic moves in both directions, so that, for example, dim(u') = 0, whence the term g'(·) disappears from the numerator above. Finally, for the case of nested models, a possible move type is (θ_{k+1}, k+1) = ((θ_k, k), u), in which case the Jacobian term equals 1.

In some respects RJMCMC is difficult to use in practice. The main drawback would appear to be the problem of model mixing across dimensions. Typically this is a result of the difficulty in choosing a suitable jump proposal distribution. Here it is unclear how to reasonably centre and scale the distribution to increase the chance of the move being accepted. However, recent work by (Brooks et al. 2003) has tackled this problem to some extent.
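As an illustration of the mechanics just described, here is a minimal Python sketch (our illustration, not code from the paper) of a reversible jump sampler for two hypothetical nested models, M1: y_i ~ N(0, 1) versus M2: y_i ~ N(θ, 1) with θ ~ N(0, v) and equal prior model probabilities. The jump from M1 to M2 draws u from a proposal g and sets θ = u; the reverse move is deterministic, so dim(u') = 0 and the Jacobian equals 1.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    y = rng.normal(0.3, 1.0, size=50)      # synthetic data, purely illustrative
    n, ybar, v = len(y), y.mean(), 5.0     # v: prior variance of theta under M2

    def loglik(theta):
        # log L(y | theta); model 1 corresponds to theta fixed at 0
        return stats.norm.logpdf(y, loc=theta, scale=1.0).sum()

    def log_prior_theta(th):
        return stats.norm.logpdf(th, 0.0, np.sqrt(v))

    prop = stats.norm(ybar, 1.0 / np.sqrt(n))     # jump proposal g for u

    k, theta = 1, None
    visits = {1: 0, 2: 0}
    for it in range(20000):
        if k == 1:
            # birth move M1 -> M2: set theta = u, u ~ g; Jacobian = 1
            u = prop.rvs(random_state=rng)
            log_alpha = (loglik(u) + log_prior_theta(u)
                         - loglik(0.0) - prop.logpdf(u))
            if np.log(rng.uniform()) < log_alpha:
                k, theta = 2, u
        else:
            # within-model move: exact Gibbs update of theta under M2
            post_var = 1.0 / (n + 1.0 / v)
            theta = rng.normal(post_var * n * ybar, np.sqrt(post_var))
            # death move M2 -> M1 (deterministic), reverse move uses g
            log_alpha = (loglik(0.0) + prop.logpdf(theta)
                         - loglik(theta) - log_prior_theta(theta))
            if np.log(rng.uniform()) < log_alpha:
                k, theta = 1, None
        visits[k] += 1

    # relative visit frequencies estimate p(k|y) (burn-in ignored for brevity)
    print({m: c / sum(visits.values()) for m, c in visits.items()})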
2.2. Chib's method

An important method of marginal likelihood estimation within each model is that of Chib (1995). This method follows from noticing that, for any parameter configuration θ*, Bayes' rule implies that the marginal likelihood of the data y for model k satisfies

p(y) = \frac{L(y \mid \theta^*) \, p(\theta^*)}{p(\theta^* \mid y)}.

Here, and for the remainder of this article, for ease of notation we remove reference to the model indicator k, except where this is ambiguous. Each factor on the right-hand side above can be calculated immediately, with the exception of the posterior density p(θ*|y). Typically θ* would be chosen as a point of high posterior probability to increase the numerical accuracy of the estimate. Chib illustrated that this density can be estimated via Gibbs sampling, provided θ* can be partitioned into n blocks {θ*_i}, say, where the full conditional of each block is amenable to Gibbs sampling. It is clear that

p(\theta^* \mid y) = p(\theta_1^* \mid y) \prod_{i=2}^{n} p(\theta_i^* \mid \theta_{i-1}^*, \ldots, \theta_1^*, y).

Now each factor p(θ*_j | θ*_{j-1}, ..., θ*_1, y) can be estimated from the Gibbs output by integrating out the parameters θ_{j+1}, ..., θ_n:

\hat{p}(\theta_j^* \mid \theta_{j-1}^*, \ldots, \theta_1^*, y) = \frac{1}{I} \sum_{i=1}^{I} p(\theta_j^* \mid \theta_n^{(i)}, \ldots, \theta_{j+1}^{(i)}, \theta_{j-1}^*, \ldots, \theta_1^*, y),   (3)

where the index i indicates iterations of the Markov chain at stationarity. Further, the normalising constant of each block must be known exactly in order for the full conditional densities to be estimated. Chib and Jeliazkov (2001) extended this methodology to the case where (3) can be estimated using Metropolis-Hastings output, employing an identity based solely on the Metropolis-Hastings acceptance probabilities, which does not require the normalising constant of p(θ|y). However, implementing both methods relies on a judicious partitioning of the parameter θ, in addition to a considerable amount of book-keeping. Clearly both methods increase in computational complexity as the dimension of θ_k increases.
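To make the identity concrete, the following Python sketch (our illustration, not code from the paper) applies Chib's estimator to a hypothetical two-block Gibbs sampler for y_i ~ N(μ, σ²) with priors μ ~ N(m0, v0) and σ² inverse gamma; the hyperparameter values are arbitrary. The factor p(μ*|y) is Rao-Blackwellised over the Gibbs draws of σ², and p(σ²*|μ*, y) is a known inverse gamma density.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    y = rng.normal(1.0, 2.0, size=100)                 # synthetic data
    n, ybar = len(y), y.mean()
    m0, v0, a0, b0 = 0.0, 10.0, 3.0, 2.0               # hypothetical hyperparameters

    def gibbs(iters=5000):
        mu, sig2 = ybar, y.var()
        draws = np.empty((iters, 2))
        for g in range(iters):
            prec = n / sig2 + 1.0 / v0                 # mu | sig2, y is normal
            mean = (n * ybar / sig2 + m0 / v0) / prec
            mu = rng.normal(mean, np.sqrt(1.0 / prec))
            shape = a0 + n / 2.0                       # sig2 | mu, y is inverse gamma
            scale = b0 + 0.5 * np.sum((y - mu) ** 2)
            sig2 = stats.invgamma.rvs(shape, scale=scale, random_state=rng)
            draws[g] = mu, sig2
        return draws

    draws = gibbs()
    mu_star, sig2_star = draws[1000:].mean(axis=0)     # a high-posterior point

    # log p(mu* | y): average the exact conditional density over the sig2 draws
    prec = n / draws[1000:, 1] + 1.0 / v0
    means = (n * ybar / draws[1000:, 1] + m0 / v0) / prec
    log_p_mu = np.log(np.mean(stats.norm.pdf(mu_star, means, np.sqrt(1.0 / prec))))

    # log p(sig2* | mu*, y): available in closed form
    shape = a0 + n / 2.0
    scale = b0 + 0.5 * np.sum((y - mu_star) ** 2)
    log_p_sig2 = stats.invgamma.logpdf(sig2_star, shape, scale=scale)

    log_lik = stats.norm.logpdf(y, mu_star, np.sqrt(sig2_star)).sum()
    log_prior = (stats.norm.logpdf(mu_star, m0, np.sqrt(v0))
                 + stats.invgamma.logpdf(sig2_star, a0, scale=b0))
    print("Chib estimate of log p(y):", log_lik + log_prior - log_p_mu - log_p_sig2)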

2.3. Annealed importance sampling

Estimating the marginal likelihood using ideas from importance sampling is also possible, as illustrated in (Neal 2001). The idea is to define a sequence of distributions, starting from one from which it is possible to gather samples, for example the prior distribution, and ending at the target distribution from which you would like to sample. Neal (2001) defines this sequence as

p_{t_i}(\theta_k \mid y) \propto p(\theta_k)^{1 - t_i} \, p(\theta_k \mid y)^{t_i},

where 0 = t_0 < t_1 < \cdots < t_n = 1. Thus p_{t_0} and p_{t_n} correspond to the prior and posterior distribution respectively. The algorithm begins by sampling a point x_{t_0} from the prior, p_{t_0}. At the ith step, a point x_{t_{i+1}} is generated from p_{t_{i+1}} via a Markov chain transition kernel applied at x_{t_i}, for example via Gibbs or Metropolis-Hastings updating. The final step n yields a point x_{t_n} from the posterior. Repeating this scheme N times yields an independent sample x^{(1)}, ..., x^{(N)} from the posterior. In effect, distribution p_{t_i} is an importance distribution for p_{t_{i+1}}. An important by-product of this scheme is that the collection of importance weights

w^{(i)} = \frac{p_{t_n}(x_{t_{n-1}})}{p_{t_{n-1}}(x_{t_{n-1}})} \times \frac{p_{t_{n-1}}(x_{t_{n-2}})}{p_{t_{n-2}}(x_{t_{n-2}})} \times \cdots \times \frac{p_{t_1}(x_{t_0})}{p_{t_0}(x_{t_0})}

satisfies

p(y) \approx \frac{1}{N} \sum_{i=1}^{N} w^{(i)}.

That is, the marginal likelihood obtains as the average of the importance weights.
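The scheme is easy to sketch for a conjugate toy model, where the resulting estimate can be checked against the exact marginal likelihood. The following Python code (our illustration under those assumptions, not code from the paper) uses the unnormalised intermediate densities p(θ) L(y|θ)^{t_i} and a random-walk Metropolis kernel at each temperature.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    y = rng.normal(0.5, 1.0, size=30)               # synthetic data, y_i ~ N(theta, 1)
    n = len(y)
    m, v = 0.0, 4.0                                 # prior theta ~ N(m, v)

    def loglik(th):
        return stats.norm.logpdf(y, th, 1.0).sum()

    temps = np.linspace(0.0, 1.0, 51)               # 0 = t_0 < ... < t_n = 1

    def ais_weight():
        """One annealing run; returns the log importance weight."""
        x = rng.normal(m, np.sqrt(v))               # x_{t_0} drawn from the prior
        logw = 0.0
        for t_prev, t in zip(temps[:-1], temps[1:]):
            # weight increment: log p_t(x) - log p_{t_prev}(x) = (t - t_prev) log L(y|x)
            logw += (t - t_prev) * loglik(x)
            # move x with a Metropolis kernel targeting p(theta) L(y|theta)^t
            prop = x + rng.normal(0.0, 0.5)
            log_a = (t * loglik(prop) + stats.norm.logpdf(prop, m, np.sqrt(v))
                     - t * loglik(x) - stats.norm.logpdf(x, m, np.sqrt(v)))
            if np.log(rng.uniform()) < log_a:
                x = prop
        return logw

    logw = np.array([ais_weight() for _ in range(200)])
    log_py = np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()

    # exact answer for this conjugate model: y ~ N_n(m 1, I + v J)
    exact = stats.multivariate_normal.logpdf(
        y, mean=np.full(n, m), cov=np.eye(n) + v * np.ones((n, n)))
    print("AIS estimate:", log_py, "  exact:", exact)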

3. Marginal likelihoods and power posteriors

Here we introduce a new approach to estimating the integrated likelihood based on ideas of thermodynamic integration or path sampling (Gelman and Meng 1998). Consider introducing an auxiliary variable (or temperature schedule) T(t), where T : [0, 1] → [0, 1] is defined such that T(0) = 0 and T(1) = 1. For simplicity we assume that T(t) = t. Consider the power posterior defined as

p_t(\theta_k \mid y) \propto \{L(y \mid \theta_k)\}^t \, p(\theta_k).   (4)

Now, define

z(y \mid t) = \int_{\theta_k} \{L(y \mid \theta_k)\}^t \, p(\theta_k) \, d\theta_k.

By construction, z(y|t = 0) is the integral of the prior for θ_k, which equals 1. Further, z(y|t = 1) is the marginal likelihood of the data. Here we assume, of course, that z(y|t) < ∞ for all t ∈ [0, 1]. Now ideas from path sampling (Gelman and Meng 1998) can be used to calculate the integral of interest, z(y|t = 1). The following identity is crucial to the problem in hand:

\log p(y) = \log\left\{ \frac{z(y \mid t = 1)}{z(y \mid t = 0)} \right\} = \int_0^1 E_{\theta_k \mid y, t} \log L(y \mid \theta_k) \, dt.   (5)

Thus the log of the marginal likelihood is obtained by integrating the mean deviance, where the expectation is taken with respect to the power posterior (4) at temperature t, as t moves from 0 to 1. The identity (5) can be derived easily as follows:

\frac{d}{dt} \log z(y \mid t) = \frac{1}{z(y \mid t)} \frac{d}{dt} z(y \mid t)
  = \frac{1}{z(y \mid t)} \int_{\theta_k} \frac{d}{dt} \{L(y \mid \theta_k)\}^t \, p(\theta_k) \, d\theta_k
  = \frac{1}{z(y \mid t)} \int_{\theta_k} \{L(y \mid \theta_k)\}^t \log\{L(y \mid \theta_k)\} \, p(\theta_k) \, d\theta_k
  = \int_{\theta_k} \frac{\{L(y \mid \theta_k)\}^t \, p(\theta_k)}{z(y \mid t)} \log\{L(y \mid \theta_k)\} \, d\theta_k
  = E_{\theta_k \mid y, t} \log\{L(y \mid \theta_k)\}.

Equation (5) now follows by integrating with respect to t. This approach shares some analogies with annealed importance sampling, outlined in Section 2.3; however, here we estimate the marginal likelihood on the log scale, ensuring increased numerical stability. Further, our method estimates log p(y) using expectations, again aiding numerical stability. It is interesting to note that the ratio z(y|t = 1)/z(y|t = a), where 0 < a < 1, is precisely the approximation to the marginal likelihood used to compute the fractional Bayes factor (O'Hagan 1995). In addition, note that the likelihood contribution to the power posterior is generally not a proper likelihood, since it need not hold that ∫ {L(y|θ_k)}^t dy = 1. Finally, in common with simulated annealing and simulated tempering, the effect of the temperature parameter t is to flatten the likelihood contribution in the power posterior, so that it is approximately uniform for values of t close to 0, in which case the power posterior approximates the prior contribution.

Note that path sampling has been employed to compute high-dimensional normalising constants, most notably in the estimation of the parameters of Markov random fields. In this context the technique has been used to calculate normalising constants of model parameters, which are then used as a look-up table in the estimation process; see for example (Green and Richardson 2002) and (Dryden, Scarr and Taylor 2003).

3.1. Estimating the marginal likelihood

Note that the identity for the marginal likelihood (5) can be considered as a double integral, integrating over the power parameter t and the model parameters θ_k. The joint distribution of θ_k and t can be written as

p(\theta_k, t \mid y) = p(\theta_k \mid t, y) \, p(t) = \frac{L(y \mid \theta_k)^t \, p(\theta_k)}{z(y \mid t)} \, p(t).

Now the full conditional distribution of θ_k takes the form

p(\theta_k \mid y, t) \propto \{L(y \mid \theta_k)\}^t \, p(\theta_k),

while if we assume that p(t) ∝ z(y|t), then also

p(t \mid \theta_k, y) \propto \{L(y \mid \theta_k)\}^t \, p(\theta_k).

Now a sample {(θ_k^{(1)}, t_1), ..., (θ_k^{(n)}, t_n)} gathered from p(θ_k, t | y) can be used to calculate (5) by ordering the t_i's and calculating log L(y|θ_k), estimating the integral via quadrature. All of this hinges on the assumption that p(t) ∝ z(y|t). It is our experience that z(y|t) varies by orders of magnitude with t. This is not surprising, since by construction z(y|t = 0) = 1, while the marginal likelihood z(y|t = 1) could be quite large depending on the problem in hand. Thus, using a single chain, values of t close to zero would tend not to be sampled with high frequency, leading to poor estimation of p(y). As a more direct approach we suggest discretising the integral (5) over t ∈ [0, 1], running separate chains for each t and sampling from the power posterior to estimate the mean deviance E_{θ_k|y,t} log L(y|θ_k). Numerical integration over t, using for example a trapezoidal rule, then yields an estimate of the marginal likelihood. For example, choosing a discretisation 0 = t_0 < t_1 < \cdots < t_{n-1} < t_n = 1 leads to the approximation

\log p(y) \approx \sum_{i=0}^{n-1} (t_{i+1} - t_i) \, \frac{E_{\theta_k \mid y, t_{i+1}} \log L(y \mid \theta_k) + E_{\theta_k \mid y, t_i} \log L(y \mid \theta_k)}{2}.   (6)

Note that Monte Carlo standard errors for each E_{θ_k|y,t_i} log L(y|θ_k) can be pieced together to give an overall Monte Carlo standard error for log p(y). It should be apparent that, if it is possible to sample from the posterior distribution of the model parameters, then it should often be possible to sample from the power posterior, and hence to compute the marginal likelihood. In particular, if the likelihood contribution follows an exponential family model, then raising the likelihood to a power t amounts to multiplying the exponent by t. We therefore expect that in many cases, if the posterior is amenable to Gibbs sampling, then so too is the power posterior. In terms of computation, modifying existing MCMC code which samples from the posterior of the model parameters is trivial. Essentially all that is needed is an extra iteration loop over the temperature parameter t, calculating the expected deviance under the power posterior at each temperature.

3.2. Sensitivity of p(y|k) to the prior

It is well understood that the Bayes factor is sensitive to the choice of prior for the model parameters. Here we outline, for a simple example, how this impacts on the estimate of the marginal likelihood via samples from the power posterior, as outlined above. Consider the simple situation where data y = {y_i} are independent and normally distributed with mean θ and unit variance. Assuming θ ~ N(m, v) a priori leads to a power posterior θ | y, t ~ N(m_t, v_t), where

m_t = \frac{n t \bar{y} + m/v}{n t + 1/v}  \quad \text{and} \quad  v_t = \frac{1}{n t + 1/v}.

It is straightforward to show that

E_{\theta \mid y, t} \log L(y \mid \theta) = -\frac{n}{2} \log 2\pi - \frac{1}{2} \sum_{i=1}^{n} (y_i - \bar{y})^2 - \frac{n}{2} \frac{(m - \bar{y})^2}{(v n t + 1)^2} - \frac{n}{2} \frac{1}{n t + 1/v}.   (7)

Recall that the log of the marginal likelihood obtains by integrating (7) with respect to t over t ∈ [0, 1]. Consider the situation when t = 0. In this case the final term on the right-hand side of (7) equals −nv/2. Clearly as v → ∞, E_{θ|y,t=0} log L(y|θ) → −∞ at the same rate. This illustrates the sensitivity of the numerical stability of the estimate of the marginal likelihood to the prior specification. Figure 1 plots E_{θ|y,t} log L(y|θ) against t for v = 10, 5 and 1 (for illustrative purposes the first two terms on the right-hand side take a constant value, with ȳ = 0, m = 0 and n = 10). It is our experience that the behaviour of the mean deviance with t, as outlined in Figure 1, is typical of more complex settings, even when the prior parameters are quite uninformative.
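The behaviour just described is easy to reproduce numerically. The following short Python sketch (our illustration, not code from the paper) evaluates the analytic expected deviance (7) on a grid of temperatures for several prior variances, and checks that its integral over t recovers the exact log marginal likelihood of this conjugate model.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n, m = 10, 0.0
    y = rng.normal(0.0, 1.0, size=n)                 # synthetic data, y_i ~ N(theta, 1)
    ybar, ss = y.mean(), np.sum((y - y.mean()) ** 2)

    def expected_deviance(t, v):
        """Analytic E_{theta|y,t} log L(y|theta) of equation (7)."""
        return (-0.5 * n * np.log(2 * np.pi) - 0.5 * ss
                - 0.5 * n * (m - ybar) ** 2 / (v * n * t + 1.0) ** 2
                - 0.5 * n / (n * t + 1.0 / v))

    t = np.linspace(0.0, 1.0, 2001)
    for v in (10.0, 5.0, 1.0):
        # trapezoidal integration of (7) over t, i.e. identity (5)
        log_py = np.trapz(expected_deviance(t, v), t)
        # exact log marginal likelihood: y ~ N_n(m 1, I + v J)
        exact = stats.multivariate_normal.logpdf(
            y, mean=np.full(n, m), cov=np.eye(n) + v * np.ones((n, n)))
        print(f"v = {v:4.1f}:  integral of (7) = {log_py:8.4f}   exact = {exact:8.4f}")
        # note how steeply (7) drops near t = 0 as v grows
        print(f"           E log L at t = 0: {expected_deviance(0.0, v):.2f}")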
Bearing this in mind, we see that the choice of spacing for the t_i's in (6) is important. For example, a temperature schedule of the type t_i = (a_i)^c, where a_i = i/n gives an equal spacing of points in the interval [0, 1] and c > 1 is a constant, ensures that the t_i's are chosen with high frequency close to t = 0. Prescribing the collection of t_i's in this way should improve the efficiency of the estimate of p(y).
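Putting the pieces together, here is a minimal Python sketch (our illustration, not code from the paper) of the full procedure for the conjugate normal example of Section 3.2: the power posterior is sampled at each temperature on a schedule of the form t_i = (i/n)^c, the mean deviance is estimated at each t_i, and the trapezoidal rule (6) gives the estimate of log p(y). In this toy case the power posterior can be sampled exactly; in general a short MCMC run at each temperature would replace the exact draw.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    n_obs, m, v = 10, 0.0, 5.0
    y = rng.normal(0.0, 1.0, size=n_obs)             # synthetic data, y_i ~ N(theta, 1)
    ybar = y.mean()

    def loglik(draws):
        return stats.norm.logpdf(y[:, None], draws, 1.0).sum(axis=0)

    # temperature schedule t_i = (i/n)^c, clustered near t = 0
    n_temps, c = 20, 5.0
    temps = (np.arange(n_temps + 1) / n_temps) ** c

    mean_dev = np.empty(n_temps + 1)
    for i, t in enumerate(temps):
        # power posterior theta | y, t is N(m_t, v_t) for this conjugate model,
        # so we can draw from it exactly; in general an MCMC sampler goes here
        v_t = 1.0 / (n_obs * t + 1.0 / v)
        m_t = (n_obs * t * ybar + m / v) * v_t
        draws = rng.normal(m_t, np.sqrt(v_t), size=5000)
        mean_dev[i] = loglik(draws).mean()           # estimate of E_{theta|y,t} log L

    # trapezoidal rule (6)
    log_py = np.sum(np.diff(temps) * (mean_dev[1:] + mean_dev[:-1]) / 2.0)

    exact = stats.multivariate_normal.logpdf(
        y, mean=np.full(n_obs, m), cov=np.eye(n_obs) + v * np.ones((n_obs, n_obs)))
    print("power posterior estimate:", log_py, "  exact:", exact)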

Fig. 1. Expected deviance (7) under the distribution θ | y, t, plotted against t for prior variance equal to 10, 5 and 1. As v increases, so too does the rate at which the mean deviance changes with t.

Table 1. Radiata pine dataset: y_i, maximum compression strength parallel to the grain; x_i, density; z_i, resin-adjusted density. [Table entries omitted.]

4. Examples

4.1. Linear regression: non-nested models

The dataset in Table 1 was taken from Williams (1959). The data describe the maximum compression strength parallel to the grain y_i, the density x_i, and the resin-adjusted density z_i for 42 specimens of radiata pine. This dataset has been examined in (Han and Carlin 2001), (Carlin and Chib 1995) and (Bartolucci and Scaccia 2004), where several methods to estimate the Bayes factor between two non-nested competing models were compared. The competing models are as follows:

M1: y_i = α + β(x_i − x̄) + ε_i,  ε_i ~ N(0, σ²),
M2: y_i = γ + δ(z_i − z̄) + η_i,  η_i ~ N(0, τ²).

The following prior specification was used (identical to that in the papers cited immediately above): a N((3000, 185)^T, diag(10^6, 10^4)) prior for (α, β)^T and for (γ, δ)^T. An IG(3, ·) prior was chosen for σ² and τ², where IG(a, b) is an inverse gamma distribution with density f(x) = exp(−1/(bx)) / {Γ(a) b^a x^{a+1}}.
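To illustrate how the power posterior is sampled for a model of this type, the following Python sketch (our construction, not code from the paper) runs a Gibbs sampler for a normal linear regression under the tempered likelihood {L(y|β, σ²)}^t: because the model is an exponential family, tempering simply scales the data contribution by t. Synthetic data stand in for the radiata pine values of Table 1, and the inverse gamma hyperparameter b is set to an arbitrary value; both would be replaced by the real data and the prior of the paper.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)

    # synthetic stand-in for one of the competing models, e.g. M1
    n = 42
    x = rng.normal(0.0, 1.0, size=n)
    y = 3000.0 + 185.0 * x + rng.normal(0.0, 300.0, size=n)
    X = np.column_stack([np.ones(n), x - x.mean()])

    b0 = np.array([3000.0, 185.0])                   # prior mean for (alpha, beta)
    V0inv = np.diag(1.0 / np.array([1e6, 1e4]))      # prior precision
    a_ig, b_ig = 3.0, 1e-5                           # hypothetical IG hyperparameters

    def mean_deviance_at(t, iters=4000, burn=1000):
        """Estimate E_{theta|y,t} log L(y|theta) by Gibbs sampling the power posterior."""
        beta, sig2 = b0.copy(), 300.0 ** 2
        devs = []
        for it in range(iters):
            # beta | sig2, t: tempered likelihood contributes t * X'X / sig2
            prec = t * X.T @ X / sig2 + V0inv
            cov = np.linalg.inv(prec)
            mean = cov @ (t * X.T @ y / sig2 + V0inv @ b0)
            beta = rng.multivariate_normal(mean, cov)
            # sig2 | beta, t: inverse gamma with shape a + t*n/2, scale 1/b + t*SSR/2
            ssr = np.sum((y - X @ beta) ** 2)
            sig2 = stats.invgamma.rvs(a_ig + t * n / 2.0,
                                      scale=1.0 / b_ig + t * ssr / 2.0,
                                      random_state=rng)
            if it >= burn:
                devs.append(stats.norm.logpdf(y, X @ beta, np.sqrt(sig2)).sum())
        return np.mean(devs)

    temps = (np.arange(11) / 10.0) ** 5              # schedule t_i = (i/n)^5
    mean_dev = np.array([mean_deviance_at(t) for t in temps])
    log_py = np.sum(np.diff(temps) * (mean_dev[1:] + mean_dev[:-1]) / 2.0)
    print("estimated log p(y) for this model:", log_py)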

Table 2. Linear regression models: estimates of the mean, standard error and relative error of B_12 using the method of power posteriors and RJMCMC. The RJMCMC entries correspond to prior model probabilities p(k=1) = p(k=2) = 0.5, while RJ corrected corresponds to prior model probabilities strongly weighted towards model 1. Relative errors: power posterior 1.8%, RJMCMC 34.5%, RJ corrected 2.0%. [Remaining table entries omitted.]

Green and O'Hagan (1998) found, for the given prior specification, the value of the Bayes factor B_12 by numerical integration. The aim in this straightforward situation is to see what statistical efficiency can be achieved by using the power posterior method to compute the Bayes factor rather than RJMCMC. Here we calculated each marginal likelihood using a temperature schedule of the type t_i = (a_i)^5, where the a_i correspond to equally spaced points in [0, 1]. In total 30,000 iterations were used to estimate each marginal likelihood. Finally, the algorithm was run for 100 independent chains, each estimating B̂_{12,i} for i = 1, ..., 100. To implement RJMCMC we specified p(k=1) = p(k=2) = 0.5. To allow for a fair comparison, the reversible jump sampler was run for 60,000 iterations. Within-model parameters were updated via Gibbs sampling. Across-model moves were proposed by simply setting (α, β, σ) = (γ, δ, τ), resulting in the Jacobian term taking the value 1. The following quantities appear in Table 2:

\text{Mean} = \frac{1}{100} \sum_{i=1}^{100} \hat{B}_{12,i},
\quad
\text{Standard error} = \left\{ \frac{1}{100} \sum_{i=1}^{100} (\hat{B}_{12,i} - \text{Mean})^2 \right\}^{1/2},
\quad
\text{Rel. Error} = \frac{1}{B_{12}} \left\{ \frac{1}{100} \sum_{i=1}^{100} (\hat{B}_{12,i} - B_{12})^2 \right\}^{1/2}.

As can be seen from the results in Table 2, RJMCMC fared poorly. This is simply because the reversible jump sampler does not mix well and so does not visit model 1 very often, leading to a poor posterior estimate of p(k=1|y). Running the reversible jump sampler with the prior model probability strongly weighted towards model 1 leads to estimates of B_12 with similar efficiency to that of the power posterior method, and the power posterior has marginally smaller relative error than the RJ corrected method.

4.2. Categorical longitudinal models

For this example we revisit an analysis of a categorical longitudinal dataset presented in (Pettitt et al. 2005). This example concerns a large social survey of immigrants to Australia. Data are recorded for each subject on three separate occasions. The response variable of interest is employment status, which comprises 3 categories: employed, unemployed or non-participant. The non-participant category refers to subjects who are, for example, students, retired or pensioners. The response variable is modelled as a multinomial random variable. Here we are concerned with fitting Bayesian hierarchical models to the data, including both fixed and random effects on employment status. Specifically, we assume that the response y_isj of individual i at time s belongs to employment category j with probability p_isj. Note we have used s to index time rather than k as in (Pettitt et al. 2005). Thus y_isj is a binary random variable, and further we assume that y_is = {y_is1, y_is2, y_is3} has a multinomial distribution,

y_is ~ Multinomial(p_is, n_is),

where n_is = Σ_j y_isj = 1. The next level of the hierarchy relates the binary probabilities p_isj to fixed and random covariate effects,

p_{isj} = \frac{\mu_{isj}}{\sum_j \mu_{isj}},  \quad \text{where} \quad  \log(\mu_{isj}) = X_{is}^T \beta_j + \alpha_{ij},

and where X_is is a vector of covariates, β_j are fixed effects, and α_ij is a random effect reflecting time-constant unobserved heterogeneity. Choosing employed as the reference state and setting β_1 and α_i1 to zero allows the model to be identified. In order to maintain invariance with respect to which state is chosen as the reference state, we take the random effect (α_i2, α_i3) to have a multivariate normal distribution with mean 0 and variance-covariance matrix Σ. We write the posterior distribution as

p(\beta, \alpha, \Sigma \mid y) \propto L(y \mid \beta, \alpha, \Sigma) \, p(\alpha \mid \Sigma) \, p(\Sigma) \, p(\beta),

assuming a priori independence between β and Σ. The prior distribution for each β_j was set to a zero-mean normal distribution with variance 6 (except for the parameters corresponding to age 2, which were given more precise priors). The random effects terms (α_i2, α_i3) were assigned a bivariate zero-mean distribution where the elements of the variance-covariance matrix Σ were assigned Σ_11 ~ Uniform(0, 10) and Σ_22 ~ Uniform(0, 10), and the correlation coefficient was assigned a Uniform(−1, 1) prior. Here it is possible to update beliefs about all unknown parameters from their full conditional distributions using Gibbs sampling. Our interest is in calculating marginal likelihoods for

Model k = 1: log(µ_isj) = X_is^T β_j,
Model k = 2: log(µ_isj) = X_is^T β_j + α_ij.

Model 1 is essentially a fixed effects model, where the regression effect β_j, for employment state j, remains constant for each individual at each of the time points s. A random effects term α_ij is, however, included in model 2, accounting for variability between individuals at a given state j. Indeed other plausible models are also possible, modelling, for example, variability over time between individuals; again the reader is referred to (Pettitt et al. 2005) for more details. For this illustration we have used a randomly selected sample of size 1000 from the complete case data (n = 3234) analysed in (Pettitt et al. 2005).

For this example, collecting samples from the power posteriors is possible using the WinBUGS software (Spiegelhalter, Thomas and Best 1998). To implement (6) we chose a temperature schedule t_i = (a_i)^4, where the a_i are equally spaced points in the interval [0.006, 1], together with the end point t = 0. Within each temperature t_i, 2,000 samples were collected from the stationary distributions p_{t_i}(β | y, k=1) and p_{t_i}(β, α, Σ | y, k=2). Table 3 summarises the output.

Table 3. Categorical longitudinal model: expected deviances (and Monte Carlo standard errors) for the power posterior at temperature t_i, for models k = 1, 2. [Table entries omitted.]

Applying the trapezoidal rule (6) yields log p(y | k=2) = −2,358.5, a considerably larger value than the corresponding estimate of log p(y | k=1); the associated Monte Carlo standard errors for the trapezoidal rule are 0.64 for model 1 and 1.34 for model 2. This leads to the strong conclusion that a random effects model is much more probable, which is qualitatively similar to the conclusion presented in (Pettitt et al. 2005).
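The way Table 3 (and later Table 4) is turned into an estimate and a Monte Carlo standard error is simple to spell out. The following Python sketch (our illustration, under the assumption that the per-temperature chains are run independently) applies the trapezoidal rule (6) to a vector of estimated mean deviances and propagates their Monte Carlo standard errors through the trapezoidal weights; the numerical values shown are purely hypothetical.

    import numpy as np

    def trapezoid_log_evidence(t, mean_dev, se_dev):
        """Trapezoidal rule (6) and its Monte Carlo standard error.

        t        : increasing temperatures with t[0] = 0 and t[-1] = 1
        mean_dev : estimated E_{theta|y,t_i} log L(y|theta) at each temperature
        se_dev   : Monte Carlo standard error of each mean deviance estimate
        """
        t, mean_dev, se_dev = map(np.asarray, (t, mean_dev, se_dev))
        # each temperature i enters (6) with weight w_i = (t_{i+1} - t_{i-1}) / 2
        w = np.zeros_like(t)
        w[1:] += np.diff(t) / 2.0
        w[:-1] += np.diff(t) / 2.0
        log_py = np.sum(w * mean_dev)
        # independent chains at each temperature, so variances add
        se = np.sqrt(np.sum((w * se_dev) ** 2))
        return log_py, se

    # hypothetical per-temperature summaries, in the spirit of Table 3
    t = (np.arange(11) / 10.0) ** 4
    mean_dev = np.array([-9500., -4200., -3100., -2800., -2650., -2570.,
                         -2520., -2490., -2470., -2455., -2445.])
    se_dev = np.full(11, 2.0)
    print(trapezoid_log_evidence(t, mean_dev, se_dev))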

4.3. Hidden Markov random field models

Markov random fields (MRFs) are often used to model binary spatially dependent data; the autologistic model (Besag 1974) is a popular choice. Here the joint distribution of x = {x_i : i = 1, 2, ..., n}, taking values in {−1, +1} on a regular lattice, is defined as

p(x \mid \beta) \propto \exp\Big\{ \beta_0 \sum_i x_i + \beta_1 \sum_{i \sim j} x_i x_j \Big\},   (8)

conditional on parameters β = (β_0, β_1). Positive values of β_0 encourage the x_i to take the value +1, while positive values of β_1 encourage homogeneous regions of +1's or −1's. The notation i ∼ j denotes that x_i and x_j are neighbours. For this example we examine two models, defined via their neighbourhood structure:

Model k = 1: a first-order neighbourhood, where each point x_i has as neighbours the four nearest adjacent points.
Model k = 2: a second-order neighbourhood structure, where in addition to the first-order neighbours, the four nearest diagonal points also belong to the neighbourhood.

Both neighbourhood structures are modified along the edges of the lattice. MRF models are difficult to handle in practice, due to the computational burden of calculating the constant of proportionality in (8), c(β) say. A hidden MRF y arises when an MRF x is corrupted by some noise process. The underlying MRF is essentially hidden, and appears as parameters in the model. Typically it is assumed that, conditional on x, the y_i are independent, which gives the likelihood

L(y \mid x, \theta) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta),

for some parameters θ. Once prior distributions p(β) and p(θ) are specified for β and θ respectively, a complete Bayesian analysis proceeds by making inference on the posterior distribution

p(x, \beta, \theta \mid y, k) \propto L(y \mid x, \theta) \, p(x \mid \beta, k) \, p(\beta) \, p(\theta).

It is relatively straightforward to sample from the full conditional distribution of each of x and θ. Sampling from the full conditional distribution of β is more problematic, due to the difficulty of calculating the normalising constant of the MRF, c(β). However, provided the number of rows or columns is not greater than 20, for a reasonable size of the other dimension, the forward recursion method presented in (Reeves and Pettitt 2004) can be used to calculate c(β). For a more complete description of the problem of Bayesian estimation of hidden MRFs, the reader is referred to (Friel et al. 2005).

For this example, gene expression levels were measured for 34 genes in a cluster of 38 neighbouring genes on the Streptomyces coelicolor genome at 10 time points. The cluster of 38 neighbouring genes under study is responsible for the production of calcium-dependent antibiotics. We define the observations on a 38 × 10 regular lattice, where the log expression level y_sg corresponds to the gth gene at time point s. Figure 2 displays the data y, indicating the gene locations for which there is no data. Here we assume that the data y mask an MRF process x, where the states (−1, +1) correspond to up-regulation and down-regulation respectively. We assume that the MRF process follows a first-order neighbourhood structure (k = 1) or a second-order neighbourhood structure (k = 2). Finally, we assume that the distribution of y given x is modelled as independent Gaussian noise with state-specific mean µ(x_sg) and common variance σ², and set θ = (µ, σ). Wit and McClure (2004) show that normality of log expression levels is a reasonable assumption for similar experimental setups.
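For the noise model just described, the full conditional of each latent pixel is available in closed form, which is what makes sampling x straightforward. A minimal Python sketch of the single-site Gibbs update for the hidden autologistic field under a first-order neighbourhood (our illustration, not the paper's code; the parameter values and missing columns are hypothetical, and missing observations are handled by simply dropping the likelihood term) is:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    S, G = 10, 38                                    # lattice of time points by genes
    y = rng.normal(0.0, 1.0, size=(S, G))            # synthetic stand-in for log expressions
    observed = np.ones((S, G), dtype=bool)
    observed[:, [5, 12, 20, 30]] = False             # hypothetical missing gene columns

    beta0, beta1 = 0.0, 0.4                          # autologistic parameters (fixed here)
    mu = {-1: -0.7, +1: 0.7}                         # state-specific means
    sigma = 0.8

    def gibbs_sweep(x):
        """One sweep of single-site Gibbs updates of the hidden MRF x given y."""
        for s in range(S):
            for g in range(G):
                # sum over first-order neighbours; edge sites simply have fewer
                nb = 0.0
                if s > 0:
                    nb += x[s - 1, g]
                if s < S - 1:
                    nb += x[s + 1, g]
                if g > 0:
                    nb += x[s, g - 1]
                if g < G - 1:
                    nb += x[s, g + 1]
                # log full conditional of x_{sg}: prior terms from (8) plus likelihood
                lp = {}
                for state in (-1, +1):
                    lp[state] = beta0 * state + beta1 * state * nb
                    if observed[s, g]:
                        lp[state] += stats.norm.logpdf(y[s, g], mu[state], sigma)
                p_plus = 1.0 / (1.0 + np.exp(lp[-1] - lp[+1]))
                x[s, g] = +1 if rng.uniform() < p_plus else -1
        return x

    x = np.where(rng.uniform(size=(S, G)) < 0.5, 1, -1)
    for _ in range(100):
        x = gibbs_sweep(x)
    print("proportion of sites in the +1 state:", (x == 1).mean())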
It is straightforward to handle the missing data: in the full conditional distribution of the latent process x, the likelihood function needs to be modified slightly to allow for the fact that 4 of the columns of x are not supported by any data. A flat normal prior was chosen for each of the β parameters. The prior distribution for µ was uniform over the set {(µ(−1), µ(+1)) : −2 ≤ µ(−1) ≤ 2, µ(−1) ≤ µ(+1) ≤ 2}. The values −2 and +2 represent approximate minimum and maximum values which are found in similar datasets. Corresponding to these values, a gamma prior with mean 2 and variance 4 was specified for σ. Note that Friel and Wit (2005) present a more complete analysis of a similar dataset.

Fig. 2. Log expression levels of 34 genes on the Streptomyces genome for 10 consecutive time points. The x-axis labels the missing columns.

Here we chose a temperature schedule t_i = (a_i)^4, where the a_i are equally spaced points in the interval [0, 1]. Within each temperature t_i, 5,000 samples were collected from the stationary distribution p_{t_i}(x, β, µ | y, k), for k = 1, 2. Table 4 summarises the output.

Table 4. Hidden Markov random field models: expected deviances (and Monte Carlo standard errors) for the power posterior at temperature t_i, for models k = 1, 2. [Table entries omitted.]

Applying the trapezoidal rule (6) yields log p(y | k=2) = −255.6, which exceeds the corresponding estimate of log p(y | k=1), with small associated Monte Carlo standard errors. Thus the second-order model is deemed more probable a posteriori.

5. Concluding remarks

We have introduced a new method of estimating the marginal likelihood for complex hierarchical Bayesian models which involves a minimal amount of change to commonly used algorithms which compute the posterior distributions of unknown parameters. We have illustrated the technique on three examples. The first was a simple regression example, where the prior model probabilities needed tuning in order for RJMCMC to estimate the Bayes factor well. The second example involved a random effects model for multinomial data and demonstrated the ease of computing the marginal likelihood with a standard software package such as WinBUGS. Here the results demonstrated the overwhelming difference in marginal likelihoods for the two models considered, a similar situation to the first example, where RJMCMC with default equally weighted prior model probabilities would perform very poorly. The third example involved a complex hidden Markov structure and the results demonstrated a difference in terms of marginal likelihood between the two models. In terms of approximating the mean deviance, the second example is far more challenging than the third, with the former displaying characteristics resulting from the use of a vague prior, namely large negative values of the mean deviance for t near 0. However, the Monte Carlo standard errors of the marginal likelihoods are

nevertheless reasonably small for this example. Computation of the marginal likelihood requires a proper prior. The sensitivity of the value of the marginal likelihood to the choice of prior can be readily investigated using our method. Various approaches have been proposed for the case where the prior is improper. As we mentioned above, the fractional Bayes factor is straightforwardly computed as a by-product of the marginal likelihood. For those seeking such approximations our method provides a straightforward solution. Our choice of quadrature rule and use of simulation resources can be improved, but we have not followed up that matter here. Nevertheless, our computational approach provides estimates with tolerably small standard errors. In conclusion, we have illustrated a method of computing the marginal likelihood which is straightforward to implement and can be used for complex models.

Acknowledgements

Both authors were supported by the Australian Research Council. The authors wish to kindly acknowledge Thu Tran for her assistance with computational aspects of this work. Nial Friel wishes to acknowledge the School of Mathematical Sciences, QUT, for its hospitality during June.

References

Bartolucci, F. and A. Mira (2003), Efficient estimate of Bayes factors from reversible jump output. Technical report, Università dell'Insubria, Dipartimento di Economia.

Bartolucci, F. and L. Scaccia (2004), A new approach for estimating the Bayes factor. Technical report, Università di Perugia.

Berger, J. O. and L. R. Pericchi (1996), The intrinsic Bayes factor for linear models. In J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith (eds.), Bayesian Statistics 5, Oxford: Oxford University Press.

Besag, J. E. (1974), Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of the Royal Statistical Society, Series B 36.

Brooks, S. P., P. Giudici and G. O. Roberts (2003), Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions (with discussion). Journal of the Royal Statistical Society, Series B 65(1).

Carlin, B. P. and S. Chib (1995), Bayesian model choice via Markov chain Monte Carlo. Journal of the Royal Statistical Society, Series B 57.

Chib, S. (1995), Marginal likelihood from the Gibbs output. Journal of the American Statistical Association 90.

Chib, S. and I. Jeliazkov (2001), Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association 96.

Dellaportas, P., J. J. Forster and I. Ntzoufras (2002), On Bayesian model and variable selection using MCMC. Statistics and Computing 12.

Dryden, I. L., M. R. Scarr and C. C. Taylor (2003), Bayesian texture segmentation of weed and crop images using reversible jump Markov chain Monte Carlo methods. Applied Statistics 52(1).

Friel, N., A. N. Pettitt, R. Reeves and E. Wit (2005), Bayesian inference in hidden Markov random fields for binary data defined on large lattices. Technical report, University of Glasgow.

Friel, N. and E. Wit (2005), Markov random field model of gene interactions on the M. tuberculosis genome. Technical report, University of Glasgow, Department of Statistics.

Gelman, A. and X.-L. Meng (1998), Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical Science 13.

Godsill, S. J. (2001), On the relationship between Markov chain Monte Carlo methods for model uncertainty. Journal of Computational and Graphical Statistics 10.

Green, P. J. (1995), Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82.

Green, P. J. (2003), Trans-dimensional Markov chain Monte Carlo. In P. J. Green, N. L. Hjort and S. Richardson (eds.), Highly Structured Stochastic Systems, Oxford: Oxford University Press.

Green, P. J. and A. O'Hagan (1998), Model choice with MCMC on product spaces without using pseudo-priors. Technical Report 98-3, University of Nottingham.

Green, P. J. and S. Richardson (2002), Hidden Markov models and disease mapping. Journal of the American Statistical Association 97.

Han, C. and B. P. Carlin (2001), Markov chain Monte Carlo methods for computing Bayes factors: a comparative review. Journal of the American Statistical Association 96(455).

Hoeting, J. A., D. Madigan, A. E. Raftery and C. T. Volinsky (1999), Bayesian model averaging: a tutorial. Statistical Science 14(4).

Meng, X.-L. and W. Wong (1996), Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Statistica Sinica 6.

Neal, R. M. (2001), Annealed importance sampling. Statistics and Computing 11.

O'Hagan, A. (1995), Fractional Bayes factors for model comparison (with discussion). Journal of the Royal Statistical Society, Series B 57.

Pettitt, A. N., T. T. Tran, M. A. Haynes and J. L. Hay (2005), A Bayesian hierarchical model for categorical longitudinal data from a social survey of immigrants. Journal of the Royal Statistical Society, Series A (to appear).

Reeves, R. and A. N. Pettitt (2004), Efficient recursions for general factorisable models. Biometrika 91(3).

Sisson, S. A. (2005), Trans-dimensional Markov chains: a decade of progress and future perspectives. Journal of the American Statistical Association (to appear).

Spiegelhalter, D., A. Thomas and N. Best (1998), WinBUGS: Bayesian inference using Gibbs sampling, Manual version 1.2. Imperial College, London, and Medical Research Council Biostatistics Unit, Cambridge.

Stephens, M. (2000), Bayesian analysis of mixture models with an unknown number of components: an alternative to reversible jump methods. Annals of Statistics 28.

Williams, E. (1959), Regression Analysis. Wiley.

Wit, E. and J. McClure (2004), Statistics for Microarrays: Design, Analysis and Inference. Wiley, Chichester.


An introduction to Sequential Monte Carlo

An introduction to Sequential Monte Carlo An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods

More information

Bayesian Inference: Concept and Practice

Bayesian Inference: Concept and Practice Inference: Concept and Practice fundamentals Johan A. Elkink School of Politics & International Relations University College Dublin 5 June 2017 1 2 3 Bayes theorem In order to estimate the parameters of

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Chapter 12 PAWL-Forced Simulated Tempering

Chapter 12 PAWL-Forced Simulated Tempering Chapter 12 PAWL-Forced Simulated Tempering Luke Bornn Abstract In this short note, we show how the parallel adaptive Wang Landau (PAWL) algorithm of Bornn et al. (J Comput Graph Stat, to appear) can be

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Prediction of Data with help of the Gaussian Process Method

Prediction of Data with help of the Gaussian Process Method of Data with help of the Gaussian Process Method R. Preuss, U. von Toussaint Max-Planck-Institute for Plasma Physics EURATOM Association 878 Garching, Germany March, Abstract The simulation of plasma-wall

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

Inference and estimation in probabilistic time series models

Inference and estimation in probabilistic time series models 1 Inference and estimation in probabilistic time series models David Barber, A Taylan Cemgil and Silvia Chiappa 11 Time series The term time series refers to data that can be represented as a sequence

More information

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check

More information

Coupled Hidden Markov Models: Computational Challenges

Coupled Hidden Markov Models: Computational Challenges .. Coupled Hidden Markov Models: Computational Challenges Louis J. M. Aslett and Chris C. Holmes i-like Research Group University of Oxford Warwick Algorithms Seminar 7 th March 2014 ... Hidden Markov

More information

MCMC: Markov Chain Monte Carlo

MCMC: Markov Chain Monte Carlo I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov

More information

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007 Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.

More information

Posterior Model Probabilities via Path-based Pairwise Priors

Posterior Model Probabilities via Path-based Pairwise Priors Posterior Model Probabilities via Path-based Pairwise Priors James O. Berger 1 Duke University and Statistical and Applied Mathematical Sciences Institute, P.O. Box 14006, RTP, Durham, NC 27709, U.S.A.

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Bayes Factors, posterior predictives, short intro to RJMCMC. Thermodynamic Integration

Bayes Factors, posterior predictives, short intro to RJMCMC. Thermodynamic Integration Bayes Factors, posterior predictives, short intro to RJMCMC Thermodynamic Integration Dave Campbell 2016 Bayesian Statistical Inference P(θ Y ) P(Y θ)π(θ) Once you have posterior samples you can compute

More information

Bayesian modelling. Hans-Peter Helfrich. University of Bonn. Theodor-Brinkmann-Graduate School

Bayesian modelling. Hans-Peter Helfrich. University of Bonn. Theodor-Brinkmann-Graduate School Bayesian modelling Hans-Peter Helfrich University of Bonn Theodor-Brinkmann-Graduate School H.-P. Helfrich (University of Bonn) Bayesian modelling Brinkmann School 1 / 22 Overview 1 Bayesian modelling

More information

Fully Bayesian Spatial Analysis of Homicide Rates.

Fully Bayesian Spatial Analysis of Homicide Rates. Fully Bayesian Spatial Analysis of Homicide Rates. Silvio A. da Silva, Luiz L.M. Melo and Ricardo S. Ehlers Universidade Federal do Paraná, Brazil Abstract Spatial models have been used in many fields

More information

The Bayesian Approach to Multi-equation Econometric Model Estimation

The Bayesian Approach to Multi-equation Econometric Model Estimation Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation

More information

BRIDGE ESTIMATION OF THE PROBABILITY DENSITY AT A POINT

BRIDGE ESTIMATION OF THE PROBABILITY DENSITY AT A POINT Statistica Sinica 14(2004), 603-612 BRIDGE ESTIMATION OF THE PROBABILITY DENSITY AT A POINT Antonietta Mira and Geoff Nicholls University of Insubria and Auckland University Abstract: Bridge estimation,

More information

BAYESIAN ANALYSIS OF ORDER UNCERTAINTY IN ARIMA MODELS

BAYESIAN ANALYSIS OF ORDER UNCERTAINTY IN ARIMA MODELS BAYESIAN ANALYSIS OF ORDER UNCERTAINTY IN ARIMA MODELS BY RICARDO S. EHLERS AND STEPHEN P. BROOKS Federal University of Paraná, Brazil and University of Cambridge, UK Abstract. In this paper we extend

More information

Riemann Manifold Methods in Bayesian Statistics

Riemann Manifold Methods in Bayesian Statistics Ricardo Ehlers ehlers@icmc.usp.br Applied Maths and Stats University of São Paulo, Brazil Working Group in Statistical Learning University College Dublin September 2015 Bayesian inference is based on Bayes

More information

Auxiliary Particle Methods

Auxiliary Particle Methods Auxiliary Particle Methods Perspectives & Applications Adam M. Johansen 1 adam.johansen@bristol.ac.uk Oxford University Man Institute 29th May 2008 1 Collaborators include: Arnaud Doucet, Nick Whiteley

More information

Markov chain Monte Carlo

Markov chain Monte Carlo 1 / 26 Markov chain Monte Carlo Timothy Hanson 1 and Alejandro Jara 2 1 Division of Biostatistics, University of Minnesota, USA 2 Department of Statistics, Universidad de Concepción, Chile IAP-Workshop

More information

Statistical Inference for Stochastic Epidemic Models

Statistical Inference for Stochastic Epidemic Models Statistical Inference for Stochastic Epidemic Models George Streftaris 1 and Gavin J. Gibson 1 1 Department of Actuarial Mathematics & Statistics, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS,

More information

Bayesian Inference for the Multivariate Normal

Bayesian Inference for the Multivariate Normal Bayesian Inference for the Multivariate Normal Will Penny Wellcome Trust Centre for Neuroimaging, University College, London WC1N 3BG, UK. November 28, 2014 Abstract Bayesian inference for the multivariate

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Bayesian Classification and Regression Trees

Bayesian Classification and Regression Trees Bayesian Classification and Regression Trees James Cussens York Centre for Complex Systems Analysis & Dept of Computer Science University of York, UK 1 Outline Problems for Lessons from Bayesian phylogeny

More information

Hierarchical Bayesian approaches for robust inference in ARX models

Hierarchical Bayesian approaches for robust inference in ARX models Hierarchical Bayesian approaches for robust inference in ARX models Johan Dahlin, Fredrik Lindsten, Thomas Bo Schön and Adrian George Wills Linköping University Post Print N.B.: When citing this work,

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

State Space and Hidden Markov Models

State Space and Hidden Markov Models State Space and Hidden Markov Models Kunsch H.R. State Space and Hidden Markov Models. ETH- Zurich Zurich; Aliaksandr Hubin Oslo 2014 Contents 1. Introduction 2. Markov Chains 3. Hidden Markov and State

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

Afternoon Meeting on Bayesian Computation 2018 University of Reading

Afternoon Meeting on Bayesian Computation 2018 University of Reading Gabriele Abbati 1, Alessra Tosi 2, Seth Flaxman 3, Michael A Osborne 1 1 University of Oxford, 2 Mind Foundry Ltd, 3 Imperial College London Afternoon Meeting on Bayesian Computation 2018 University of

More information

MCMC 2: Lecture 2 Coding and output. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham

MCMC 2: Lecture 2 Coding and output. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham MCMC 2: Lecture 2 Coding and output Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham Contents 1. General (Markov) epidemic model 2. Non-Markov epidemic model 3. Debugging

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee September 03 05, 2017 Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles Linear Regression Linear regression is,

More information

Infer relationships among three species: Outgroup:

Infer relationships among three species: Outgroup: Infer relationships among three species: Outgroup: Three possible trees (topologies): A C B A B C Model probability 1.0 Prior distribution Data (observations) probability 1.0 Posterior distribution Bayes

More information

Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models

Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December

More information

A Level-Set Hit-And-Run Sampler for Quasi- Concave Distributions

A Level-Set Hit-And-Run Sampler for Quasi- Concave Distributions University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2014 A Level-Set Hit-And-Run Sampler for Quasi- Concave Distributions Shane T. Jensen University of Pennsylvania Dean

More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

Bayesian Analysis of Order Uncertainty in ARIMA Models

Bayesian Analysis of Order Uncertainty in ARIMA Models Bayesian Analysis of Order Uncertainty in ARIMA Models R.S. Ehlers Federal University of Paraná, Brazil S.P. Brooks University of Cambridge, UK Summary. In this paper we extend the work of Brooks and Ehlers

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Spatial Analysis of Incidence Rates: A Bayesian Approach

Spatial Analysis of Incidence Rates: A Bayesian Approach Spatial Analysis of Incidence Rates: A Bayesian Approach Silvio A. da Silva, Luiz L.M. Melo and Ricardo Ehlers July 2004 Abstract Spatial models have been used in many fields of science where the data

More information