Marginal likelihood estimation via power posteriors
N. Friel, University of Glasgow, UK
A. N. Pettitt, Queensland University of Technology, Australia

Summary. Model choice plays an increasingly important role in statistics. From a Bayesian perspective a crucial goal is to compute the marginal likelihood of the data for a given model. This is, however, typically a difficult task, since it amounts to integrating over all model parameters. The aim of this paper is to illustrate how this may be achieved using ideas from thermodynamic integration or path sampling. We show how the marginal likelihood can be computed via MCMC methods on modified posterior distributions for each model. This then allows Bayes factors or posterior model probabilities to be calculated. We show that this approach requires very little tuning and is straightforward to implement. The new method is illustrated in a variety of challenging statistical settings.

Keywords: BAYES FACTOR; REGRESSION; LONGITUDINAL DATA; MULTINOMIAL; HIDDEN MARKOV MODEL; MODEL CHOICE.

1. Introduction

Suppose data y are assumed to have been generated by one of M models indexed by the set {1, 2, ..., M}. An important goal of Bayesian model selection is to calculate p(k | y), the posterior model probability for model k. Here the aim may be to obtain a single most probable model, or indeed a subset of likely models, a posteriori. Alternatively, posterior model probabilities may be synthesised from all competing models to calculate some quantity of interest common to all models, using model averaging (Hoeting et al. 1999). But before we can make inference for p(k | y) we first need to fully specify the posterior distribution over the joint model and parameter space. We denote by θ_k the parameters specific to model k, and by θ the collection of all model parameters.
Specifying prior distributions for within-model parameters, p(θ_k | k), and for model indicators, p(k), together with a likelihood L(y | θ_k, k) within model k, allows Bayesian inference to proceed by examining the posterior distribution

p(θ_k, k | y) ∝ L(y | θ_k, k) p(θ_k | k) p(k).    (1)

Across-model strategies proceed by sampling from this joint posterior distribution of model indicators and parameters. The reversible jump MCMC algorithm (Green 1995) is a popular approach for this situation. Other across-model search strategies include (Godsill 2001) and (Carlin and Chib 1995). By contrast, within-model searches examine the posterior distribution within model k separately for each k. Here the within-model posterior appears as

p(θ_k | y, k) ∝ L(y | θ_k, k) p(θ_k | k),

where the constant of proportionality, often termed the marginal likelihood or integrated likelihood for model k, is written as

p(y | k) = ∫_{θ_k} L(y | θ_k, k) p(θ_k | k) dθ_k.    (2)

This is in general a difficult integral to compute, possibly involving high dimensional within-model parameters θ_k. However, if we were able to do so, then we would readily be able to make statements about posterior model probabilities, using Bayes theorem:

p(k | y) = p(y | k) p(k) / Σ_{k=1}^{M} p(y | k) p(k).

Address for correspondence: Department of Statistics, University of Glasgow, Glasgow, G12 8QQ, UK. nial@stats.gla.ac.uk
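Once each marginal likelihood has been estimated (typically on the log scale), the Bayes theorem formula above converts the collection of log marginal likelihoods into posterior model probabilities. A minimal sketch, working on the log scale to avoid underflow; the function name and the two example log marginal likelihood values are purely illustrative, not taken from this paper:

```python
import math

def posterior_model_probs(log_ml, prior=None):
    """Posterior model probabilities p(k | y) from log marginal likelihoods.

    log_ml: list of log p(y | k) for each model k.
    prior:  list of prior model probabilities p(k); uniform if omitted.
    """
    M = len(log_ml)
    prior = prior or [1.0 / M] * M
    # Work on the log scale and subtract the maximum for numerical stability
    scores = [lm + math.log(pk) for lm, pk in zip(log_ml, prior)]
    top = max(scores)
    w = [math.exp(s - top) for s in scores]
    total = sum(w)
    return [wi / total for wi in w]

# Hypothetical log marginal likelihoods for two competing models
probs = posterior_model_probs([-1210.3, -1207.9])
```

Subtracting the largest score before exponentiating is the standard log-sum-exp device; it matters because log marginal likelihoods are often large negative numbers whose direct exponentials underflow.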
The marginal likelihoods can be used to compare two models by computing Bayes factors,

B_12 = p(y | k = 1) / p(y | k = 2),

without the need to specify prior model probabilities. Note that the Bayes factor B_12 gives the evidence provided by the data in favour of model 1 compared to model 2. It can also be seen that

B_12 = { p(k = 1 | y) / p(k = 2 | y) } × { p(k = 2) / p(k = 1) }.

In other words, the Bayes factor is the ratio of the posterior odds of model 1 to its prior odds. Note that an improper prior distribution p(θ_k | k) necessarily leads to an improper marginal likelihood, which in turn implies that the Bayes factor is not well defined. To circumvent the difficulty of using improper priors for model comparison, O'Hagan (1995) introduced an approximate method termed the fractional Bayes factor. Here an approximate (proper) marginal likelihood is defined by the ratio

∫_{θ_k} L(y | θ_k, k) p(θ_k | k) dθ_k  /  ∫_{θ_k} {L(y | θ_k, k)}^a p(θ_k | k) dθ_k,

since any impropriety in the prior for θ_k cancels above and below. Other approaches to this problem include the intrinsic Bayes factor (Berger and Pericchi 1996). In this paper we concentrate on the case where prior model distributions are proper.

Various methods have been proposed in the literature to estimate the marginal likelihood (2). For example, Chib (1995) estimates the marginal likelihood p(y | k) using output from a Gibbs sampler for (θ_k | y, k). It relies, however, on a block updating approach for θ_k; clearly this is not always possible. This work was extended in (Chib and Jeliazkov 2001), where output from a Metropolis-Hastings algorithm for the posterior (θ_k | y, k) can be used to estimate the marginal likelihood. Annealed importance sampling (Neal 2001) estimates the marginal likelihood using ideas from importance sampling. Here an independent sample from the posterior is generated by defining a sequence of distributions, indexed by a temperature parameter t, from the prior through to the posterior.
Importantly, however, the collection of importance weights can be used to estimate the marginal likelihood. In this paper we propose a new method to compute the marginal likelihood based on samples from a distribution proportional to the likelihood raised to a power t times the prior, which we term the power posterior. This method was inspired by ideas from path sampling or thermodynamic integration (Gelman and Meng 1998). We find that the marginal likelihood p(y | k) can be expressed as an integral, with respect to t from 0 to 1, of the expected deviance for model k, where the expectation is taken with respect to the power posterior at power t. We argue that this method requires very little tuning or book-keeping. It is easy to implement, requiring only minor modification to computer code which samples from the posterior distribution.

In Section 2 we review different approaches to across- and within-model searches. Section 3 introduces the new method for estimating the marginal likelihood, based on sampling from the so-called power posterior distribution. Here we also outline how this method can be implemented in practice, and give some guidance on the sensitivity of the estimate of the marginal likelihood (or Bayes factor) to the diffuseness of the prior on the model parameters. Three illustrations of how the new method performs in practice are given in Section 4. In particular, complex modelling situations are illustrated which preclude other methods of marginal likelihood estimation that rely on block updating of model parameters. We conclude the paper with some final remarks in Section 5.

2. Review of methods to compute Bayes factors

There are numerous methods and techniques available to estimate (2). Generally speaking two approaches are possible: an across-model search or a within-model search. The former, in an MCMC setting, involves generating a single Markov chain which traverses the joint model and parameter space (1).
A popular choice is the reversible jump sampler (Green 1995). Godsill (2001) retains aspects of the reversible jump sampler, but considers the case where parameters are shared between different models, as occurs for example in nested models. Other choices include (Stephens 2000), (Carlin and Chib 1995) and (Dellaportas, Forster and Ntzoufras 2002). A within-model search essentially aims to estimate the marginal likelihood (2) for each model k separately, and then, if desired, uses this information to form Bayes factors; see (Chib 1995) and (Chib and Jeliazkov 2001). Neal (2001) combines aspects of simulated annealing and importance sampling to provide a method of gathering an independent sample from a posterior distribution of interest, but importantly also to estimate the marginal likelihood. Bridge sampling (Meng and Wong 1996) offers the possibility of estimating the Bayes factor by linking the two posterior distributions with a bridge function. Bartolucci and Mira (2003) use this approach, although it is based on an across-model reversible jump sampler. Within-model approaches are disadvantageous when the cardinality of the model space is large. However, as noted by Green (2003), the ideal situation for a within-model approach is one where the models are all reasonably heterogeneous. In effect, this is the case where it is difficult to choose proposal distributions when jumping between models, and indeed the situation where parameters across models of the same dimension have different interpretations. This short review is not intended to be exhaustive; a more complete picture can be found in the substantial reviews of (Sisson 2005) and (Han and Carlin 2001).

2.1. Reversible jump MCMC

Reversible jump MCMC (Green 1995) offers the potential to carry out inference for all unknown parameters in the joint model and parameter space (1) in a single logical framework. A crucial innovation in the seminal paper by Green (1995) was to illustrate that detailed balance could be achieved for general state spaces.
In particular, this extends the Metropolis-Hastings algorithm to variable dimension state spaces of the type (θ_k, k). To implement the algorithm, a proposed move from (θ_k, k) to (θ_l, l) proceeds by generating a random vector u from a distribution g and setting (θ_l, l) = f_kl((θ_k, k), u). Similarly, a move from (θ_l, l) to (θ_k, k) requires random numbers u′ following some distribution g′, setting (θ_k, k) = f_lk((θ_l, l), u′), for some deterministic function f_lk. It is important that the transformation f_kl from ((θ_k, k), u) to ((θ_l, l), u′) is a bijection with invertible differential. A necessary condition for this to hold is the so-called dimension matching condition,

dim(θ_k) + dim(u) = dim(θ_l) + dim(u′).

In this case, the probability of accepting such a move is

min{ 1, [ p(θ_l, l | y) p(k | l) g′(u′) ] / [ p(θ_k, k | y) p(l | k) g(u) ] × |J| },

where p(l | k) is the probability of proposing a move from model k to model l and J is the Jacobian of the transformation from ((θ_k, k), u) to ((θ_l, l), u′). In practice this may be simplified slightly by not insisting on stochastic moves in both directions, so that, for example, dim(u′) = 0, whence the term g′(·) disappears from the numerator above. Finally, for the case of nested models, a possible move type is (θ_{k+1}, k+1) = ((θ_k, k), u), in which case the Jacobian term equals 1.

In some respects RJMCMC is difficult to use in practice. The main drawback is poor mixing across model dimensions, typically a result of the difficulty in choosing a suitable jump proposal distribution: it is often unclear how to centre and scale the proposal so as to increase the chance of the move being accepted. Recent work by (Brooks et al. 2003) has tackled this problem to some extent.

2.2. Chib's method

An important method of marginal likelihood estimation within each model is that of Chib (1995).
This method follows from noticing that, for any parameter configuration θ*, Bayes rule implies that the marginal likelihood of the data y for model k satisfies

p(y) = L(y | θ*) p(θ*) / p(θ* | y).

Here, and for the remainder of this article, we suppress the model indicator k for ease of notation, except where this would be ambiguous. Each factor on the right-hand side above can be calculated immediately, with the exception of the posterior probability p(θ* | y). Typically θ* would be chosen as a point of high posterior probability, to increase the numerical accuracy of the estimate. Chib illustrated that this probability can be estimated via Gibbs sampling, provided θ* can be partitioned into n blocks {θ*_j}, say, where the full conditional of each block is amenable to Gibbs sampling. It is clear that

p(θ* | y) = p(θ*_1 | y) Π_{j=2}^{n} p(θ*_j | θ*_{j−1}, ..., θ*_1, y).

Each factor p(θ*_j | θ*_{j−1}, ..., θ*_1, y) can be estimated from the Gibbs output by integrating out the parameters θ_{j+1}, ..., θ_n:

p(θ*_j | θ*_{j−1}, ..., θ*_1, y) ≈ (1/I) Σ_{i=1}^{I} p(θ*_j | θ_n^{(i)}, ..., θ_{j+1}^{(i)}, θ*_{j−1}, ..., θ*_1, y),    (3)

where the index i indicates iterations of the Markov chain at stationarity. Further, the normalising constant of each block must be known exactly in order for the full conditional probabilities to be estimated. Chib and Jeliazkov (2001) extended this methodology to the case where (3) can be estimated using Metropolis-Hastings output, employing an identity based solely on the Metropolis-Hastings acceptance probabilities, which does not require the normalising constant of p(θ | y). However, implementing both methods relies on a judicious partitioning of the parameter θ, in addition to a considerable amount of book-keeping. Clearly both methods increase in computational complexity as the dimension of θ_k increases.

2.3. Annealed importance sampling

Estimating the marginal likelihood using ideas from importance sampling is also possible, as illustrated in (Neal 2001). The idea is to define a sequence of distributions, starting from one from which it is possible to gather samples, for example the prior distribution, and ending at the target distribution from which we would like to sample. Neal (2001) defines this sequence as

p_{t_i}(θ_k | y) ∝ p(θ_k)^{1 − t_i} p(θ_k | y)^{t_i},

where 0 = t_0 < t_1 < ... < t_n = 1. Thus p_{t_0} and p_{t_n} correspond to the prior and posterior distributions respectively. The algorithm begins by sampling a point x_{t_0} from the prior, p_{t_0}.
At the ith step, a point x_{t_{i+1}} is generated from p_{t_{i+1}} via a Markov chain transition kernel applied at x_{t_i}, for example via Gibbs or Metropolis-Hastings updating. The final step n yields a point x_{t_n} from the posterior. Repeating this scheme N times yields an independent sample x^{(1)}, ..., x^{(N)} from the posterior. In effect, distribution p_{t_i} is an importance distribution for p_{t_{i+1}}. An important by-product of this scheme is that the collection of importance weights

w^{(i)} = [ p_{t_n}(x_{t_{n−1}}) / p_{t_{n−1}}(x_{t_{n−1}}) ] × [ p_{t_{n−1}}(x_{t_{n−2}}) / p_{t_{n−2}}(x_{t_{n−2}}) ] × ... × [ p_{t_1}(x_{t_0}) / p_{t_0}(x_{t_0}) ],

with each p_t evaluated in unnormalised form, satisfies

p(y) ≈ (1/N) Σ_{i=1}^{N} w^{(i)}.

That is, the marginal likelihood is obtained as the average of the importance weights.

3. Marginal likelihoods and power posteriors

Here we introduce a new approach to estimating the integrated likelihood, based on ideas from thermodynamic integration or path sampling (Gelman and Meng 1998). Consider introducing an auxiliary variable (or temperature schedule) T(t), where T : [0, 1] → [0, 1] is defined such that T(0) = 0 and T(1) = 1. For simplicity we assume that T(t) = t. Consider the power posterior defined as

p_t(θ_k | y) ∝ {L(y | θ_k)}^t p(θ_k).    (4)
Now define

z(y | t) = ∫_{θ_k} {L(y | θ_k)}^t p(θ_k) dθ_k.

By construction, z(y | t = 0) is the integral of the prior for θ_k, which equals 1. Further, z(y | t = 1) is the marginal likelihood of the data. Here we assume, of course, that z(y | t) < ∞ for all t ∈ [0, 1]. Now ideas from path sampling (Gelman and Meng 1998) can be used to calculate the quantity of interest, z(y | t = 1). The following identity is crucial to the problem at hand:

log p(y) = log{ z(y | t = 1) / z(y | t = 0) } = ∫_0^1 E_{θ_k | y, t} [ log L(y | θ_k) ] dt.    (5)

Thus the log of the marginal likelihood is obtained by integrating the expected deviance, where the expectation is taken with respect to the power posterior (4) at temperature t, as t moves from 0 to 1. The identity (5) can be derived easily as follows:

d/dt log z(y | t) = (1 / z(y | t)) d/dt z(y | t)
 = (1 / z(y | t)) ∫_{θ_k} d/dt {L(y | θ_k)}^t p(θ_k) dθ_k
 = (1 / z(y | t)) ∫_{θ_k} {L(y | θ_k)}^t log{L(y | θ_k)} p(θ_k) dθ_k
 = ∫_{θ_k} [ {L(y | θ_k)}^t p(θ_k) / z(y | t) ] log{L(y | θ_k)} dθ_k
 = E_{θ_k | y, t} log L(y | θ_k).

Equation (5) now follows by integrating with respect to t. This approach shares some analogies with annealed importance sampling, outlined in Section 2.3; here, however, we estimate the marginal likelihood on the log scale, ensuring increased numerical stability. Further, our method estimates log p(y) using expectations, again aiding numerical stability. It is interesting to note that the ratio z(y | t = 1)/z(y | t = a), where 0 < a < 1, is precisely the approximation to the marginal likelihood used to compute the fractional Bayes factor (O'Hagan 1995). In addition, note that the likelihood contribution to the power posterior is generally not a proper likelihood, since it need not hold that ∫ {L(y | θ_k)}^t dy = 1.
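The identity above can be checked numerically on a toy model where everything is available in closed form. Assuming y_i ~ N(θ, 1) with θ ~ N(m, v) (standard conjugate normal results, with illustrative data values), the expected log likelihood under the power posterior and the exact marginal likelihood both have closed forms, and integrating the former over t with a fine quadrature recovers the latter:

```python
import math

# Toy conjugate model: y_i ~ N(theta, 1), theta ~ N(m, v); data are illustrative.
y = [1.2, 0.4, -0.3, 0.8, 1.5]
n, m, v = len(y), 0.0, 4.0
ybar = sum(y) / n
ss = sum((yi - ybar) ** 2 for yi in y)

def expected_log_lik(t):
    # Closed form of E_{theta | y, t} log L(y | theta) for this model
    return (-0.5 * n * math.log(2 * math.pi) - 0.5 * ss
            - n * (m - ybar) ** 2 / (2 * (v * n * t + 1) ** 2)
            - n / (2 * (n * t + 1 / v)))

# Fine trapezoidal quadrature of identity (5) over t in [0, 1]
K = 20000
vals = [expected_log_lik(i / K) for i in range(K + 1)]
log_ml = sum((vals[i] + vals[i + 1]) / (2 * K) for i in range(K))

# Exact log marginal likelihood, for comparison
log_ml_exact = (-0.5 * n * math.log(2 * math.pi) - 0.5 * ss
                - 0.5 * math.log(n * v + 1)
                - n * (ybar - m) ** 2 / (2 * (n * v + 1)))
```

With K = 20000 quadrature points the two quantities agree to several decimal places; the expected log likelihood changes rapidly near t = 0, which is why such a fine (or unequally spaced) grid is needed.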
Finally, in common with simulated annealing and simulated tempering, the effect of the temperature parameter t is to flatten the likelihood contribution to the power posterior, so that it is approximately uniform for values of t close to 0, in which case the power posterior approximates the prior. Note that path sampling has been employed to compute high dimensional normalising constants, most notably in the estimation of parameters of Markov random fields. In this context the technique has been used to calculate normalising constants, which are then used as a look-up table in the estimation process; see for example (Green and Richardson 2002) and (Dryden, Scarr and Taylor 2003).

3.1. Estimating the marginal likelihood

Note that the identity (5) for the marginal likelihood can be considered as a double integral, over the power parameter t and the model parameters θ_k. The joint distribution of θ_k and t can be written as

p(θ_k, t | y) = p(θ_k | t, y) p(t) = [ {L(y | θ_k)}^t p(θ_k) / z(y | t) ] p(t).

The full conditional distribution of θ_k is then

p(θ_k | y, t) ∝ {L(y | θ_k)}^t p(θ_k),

while, if we assume that p(t) ∝ z(y | t), then also

p(t | θ_k, y) ∝ {L(y | θ_k)}^t p(θ_k).
Now a sample {(θ_k^{(1)}, t_1), ..., (θ_k^{(n)}, t_n)} gathered from p(θ_k, t | y) can be used to calculate (5) by ordering the t_i's, calculating log L(y | θ_k^{(i)}) at each, and estimating the integral via quadrature. All of this hinges on the assumption that p(t) ∝ z(y | t). It is our experience that z(y | t) varies by orders of magnitude with t. This is not surprising, since by construction z(y | t = 0) = 1, while the marginal likelihood z(y | t = 1) can differ from 1 by many orders of magnitude, depending on the problem at hand. Thus, using a single chain, values of t close to zero would tend not to be sampled with high frequency, leading to poor estimation of p(y). As a more direct approach we suggest discretising the integral (5) over t ∈ [0, 1], running a separate chain for each t, and sampling from the power posterior to estimate the expected deviance, E_{θ_k | y, t} log L(y | θ_k). Numerical integration over t, using for example a trapezoidal rule, then yields an estimate of the marginal likelihood. Choosing a discretisation 0 = t_0 < t_1 < ... < t_{n−1} < t_n = 1 leads to the approximation

log p(y) ≈ Σ_{i=0}^{n−1} (t_{i+1} − t_i) [ E_{θ_k | y, t_{i+1}} log L(y | θ_k) + E_{θ_k | y, t_i} log L(y | θ_k) ] / 2.    (6)

Note that Monte Carlo standard errors for each E_{θ_k | y, t_i} log L(y | θ_k) can be pieced together to give an overall Monte Carlo standard error for log p(y). It should be apparent that if it is possible to sample from the posterior distribution of the model parameters, then it should often be possible to sample from the power posterior, and hence to compute the marginal likelihood. In particular, if the likelihood belongs to an exponential family, then raising the likelihood to a power t amounts to multiplying the exponent by t. We therefore expect that in many cases, if the posterior is amenable to Gibbs sampling, then so too is the power posterior. In terms of computation, modifying existing MCMC code which samples from the posterior distribution is trivial.
Essentially all that is needed is an extra outer loop over the temperature parameter t, calculating the expected deviance under the power posterior at each temperature.

3.2. Sensitivity of p(y | k) to the prior

It is well understood that the Bayes factor is sensitive to the choice of prior for the model parameters. Here we outline, for a simple example, how this impacts on the estimate of the marginal likelihood via samples from the power posterior, as outlined above. Consider the simple situation where the data y = {y_i} are independent and normally distributed with mean θ and unit variance. Assuming θ ~ N(m, v) a priori leads to the power posterior θ | y, t ~ N(m_t, v_t), where

m_t = (ntȳ + m/v) / (nt + 1/v)  and  v_t = 1 / (nt + 1/v).

It is straightforward to show that

E_{θ | y, t} log L(y | θ) = −(n/2) log 2π − (1/2) Σ_{i=1}^{n} (y_i − ȳ)² − n(m − ȳ)² / {2(vnt + 1)²} − n / {2(nt + 1/v)}.    (7)

Recall that the log of the marginal likelihood is obtained by integrating (7) with respect to t over [0, 1]. Consider the situation when t = 0. In this case the final term on the right-hand side of (7) equals −nv/2. Clearly, as v → ∞, E_{θ | y, t=0} log L(y | θ) diverges at the same rate. This illustrates the sensitivity of the numerical stability of the marginal likelihood estimate to the prior specification. Figure 1 plots E_{θ | y, t} log L(y | θ) against t for v = 10, 5, 1 (for illustrative purposes the first two terms on the right-hand side take a constant value; ȳ = 0, m = 0 and n = 10). It is our experience that the behaviour of the expected deviance with t outlined in Figure 1 is typical of more complex settings, even when the prior parameters are quite uninformative. Bearing this in mind, we see that the choice of spacing for the t_i's in (6) is important. For example, a temperature schedule of the type t_i = a_i^c, where the a_i = i/n are equally spaced points in the interval [0, 1] and c > 1 is a constant, ensures that the t_i's are placed with high frequency close to t = 0.
Prescribing the collection of t_i's in this way should improve the efficiency of the estimate of p(y).
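The whole recipe of Section 3.1, with a schedule of this type, can be sketched end to end on the conjugate normal example above, where the power posterior can be sampled exactly and the answer is known in closed form. The model, data values, schedule constants and sample sizes are all illustrative choices, not those used in the paper:

```python
import math, random

# Toy conjugate model: y_i ~ N(theta, 1), theta ~ N(m, v); the power
# posterior p_t(theta | y) is then N(m_t, v_t) and can be sampled directly.
y = [1.2, 0.4, -0.3, 0.8, 1.5]
n, m, v = len(y), 0.0, 4.0
ybar = sum(y) / n

def log_lik(theta):
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (yi - theta) ** 2
               for yi in y)

def mean_deviance(t, draws=4000):
    # Monte Carlo estimate of E_{theta | y, t} log L(y | theta)
    v_t = 1.0 / (n * t + 1.0 / v)
    m_t = (n * t * ybar + m / v) * v_t
    sd = math.sqrt(v_t)
    return sum(log_lik(random.gauss(m_t, sd)) for _ in range(draws)) / draws

random.seed(1)
ts = [(i / 20) ** 5 for i in range(21)]     # schedule clustered near t = 0
dev = [mean_deviance(t) for t in ts]
log_ml = sum((ts[i + 1] - ts[i]) * (dev[i + 1] + dev[i]) / 2   # rule (6)
             for i in range(20))

# Exact answer for this conjugate model, for comparison
ss = sum((yi - ybar) ** 2 for yi in y)
log_ml_exact = (-0.5 * n * math.log(2 * math.pi) - 0.5 * ss
                - 0.5 * math.log(n * v + 1)
                - n * (ybar - m) ** 2 / (2 * (n * v + 1)))
```

In a realistic application the direct draw from N(m_t, v_t) inside `mean_deviance` would be replaced by an MCMC sampler targeting the power posterior; the outer loop over the schedule and the trapezoidal sum are unchanged.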
Fig. 1. Expected deviance (7), under the distribution θ | y, t, plotted against t, for prior variance equal to 10, 5, 1. As v increases, so too does the rate at which the expected deviance changes with t.

Table 1. Radiata pine dataset. y_i: maximum compression strength parallel to the grain; x_i: density; z_i: resin-adjusted density.

4. Examples

4.1. Linear regression: non-nested models

The dataset in Table 1 was taken from Williams (1959). The data describe the maximum compression strength parallel to the grain y_i, the density x_i, and the resin-adjusted density z_i for 42 specimens of radiata pine. This dataset has been examined in (Han and Carlin 2001), (Carlin and Chib 1995) and (Bartolucci and Scaccia 2004), where several methods to estimate the Bayes factor between two non-nested competing models were compared. The competing models are as follows:

M_1 : y_i = α + β(x_i − x̄) + ǫ_i, ǫ_i ~ N(0, σ²);
M_2 : y_i = γ + δ(z_i − z̄) + η_i, η_i ~ N(0, τ²).

The following prior specification was used (identical to that in the papers cited immediately above): N((3000, 185)^T, diag(10^6, 10^4)) for both (α, β)^T and (γ, δ)^T. An IG(3, b) prior, with b as in those papers, was chosen for σ² and τ², where IG(a, b) is the inverse gamma distribution with density f(x) = exp{−1/(bx)} / {Γ(a) b^a x^{a+1}}.
Table 2. Linear regression models. Estimates of means, standard errors and relative errors of B_12 using the method of power posteriors and RJMCMC. The RJMCMC entries correspond to prior model probabilities p(k = 1) = p(k = 2) = 0.5, while RJ corrected corresponds to prior model probabilities strongly weighted towards model 1. Relative errors: power posterior 1.8%; RJMCMC 34.5%; RJ corrected 2.0%.

Green and O'Hagan (1998) found the Bayes factor B_12 for the given prior specification by numerical integration. The aim in this straightforward situation is to see what statistical efficiency can be achieved by using the power posterior method to compute the Bayes factor, compared with RJMCMC. Here we calculated each marginal likelihood using a temperature schedule of the type t_i = a_i^5, where the a_i's correspond to 10 equally spaced points in [0, 1]. In total 30,000 iterations were used to estimate each marginal likelihood. Finally, the algorithm was run for 100 independent chains, each yielding an estimate B̂_12,i, for i = 1, ..., 100. To implement RJMCMC we specified p(k = 1) = p(k = 2) = 0.5. To allow a fair comparison, the reversible jump sampler was run for 60,000 iterations. Within-model parameters were updated via Gibbs sampling. Across-model moves were proposed by simply setting (α, β, σ) = (γ, δ, τ), resulting in the Jacobian term taking the value 1. The following quantities appear in Table 2:

Mean = (1/100) Σ_{i=1}^{100} B̂_12,i,
Standard error = √{ (1/100) Σ_{i=1}^{100} (B̂_12,i − Mean)² },
Rel. error = √{ (1/100) Σ_{i=1}^{100} (B̂_12,i − B_12)² } / B_12.

As can be seen from the results in Table 2, RJMCMC fared poorly. This is simply because the reversible jump sampler does not mix well and so does not visit model 1 very often, leading to a poor posterior estimate of p(k = 1 | y).
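The three summaries defined above are straightforward to compute from a collection of Bayes factor estimates. A minimal sketch; the function name and the example estimates are invented, and `truth` stands for the numerically exact Bayes factor:

```python
import math

def summarise(estimates, truth):
    """Mean, standard error and relative error of a set of Bayes factor
    estimates, as defined in the text; truth is the exact Bayes factor."""
    n = len(estimates)
    mean = sum(estimates) / n
    se = math.sqrt(sum((b - mean) ** 2 for b in estimates) / n)
    rel = math.sqrt(sum((b - truth) ** 2 for b in estimates) / n) / truth
    return mean, se, rel

# Illustrative numbers only
mean, se, rel = summarise([4.1, 3.8, 4.3, 4.0], truth=4.0)
```

Note the relative error is a root mean squared error about the true value, scaled by it, so it penalises both bias and variance of the estimator.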
Running the reversible jump sampler with the prior model probability strongly weighted towards model 1 leads to estimates of B_12 with efficiency similar to that of the power posterior method; the power posterior has a marginally smaller relative error than the RJ corrected method.

4.2. Categorical longitudinal models

For this example we revisit an analysis of a categorical longitudinal dataset presented in (Pettitt et al. 2005). This example concerns a large social survey of immigrants to Australia. Data are recorded for each subject on three separate occasions. The response variable of interest is employment status, which comprises three categories: employed, unemployed or non-participant. The non-participant category refers to subjects who are, for example, students, retired or pensioners. The response is modelled as a multinomial random variable. Here we are concerned with fitting Bayesian hierarchical models to the data, including both fixed and random effects on employment status. Specifically, we assume that the response y_isj of individual i at time s belongs to employment category j with probability p_isj. Note that we use s to index time, rather than k as in (Pettitt et al. 2005). Thus y_isj is a binary random variable, and further we assume that y_is = {y_is1, y_is2, y_is3} has a multinomial distribution

y_is ~ Multinomial(p_is, n_is),

where n_is = Σ_j y_isj = 1.

Table 3. Categorical longitudinal model. Expected deviances (and Monte Carlo standard errors) for the power posterior at temperature t_i, for models k = 1, 2.

The next level of the hierarchy relates the binary probabilities p_isj to fixed and random covariate effects,

p_isj = µ_isj / Σ_j µ_isj,  where  log(µ_isj) = X_is^T β_j + α_ij,

and where X_is is a vector of covariates, β_j are fixed effects, and α_ij is a random effect reflecting time-constant unobserved heterogeneity. Choosing employed as the reference state and setting β_1 and α_i1 to zero allows the model to be identified. In order to maintain invariance with respect to which state is chosen as the reference, we take the random effect (α_i2, α_i3) to have a multivariate normal distribution with mean 0 and variance-covariance matrix Σ. We write the posterior distribution as

p(β, α, Σ | y) ∝ L(y | β, α, Σ) p(α | Σ) p(Σ) p(β),

assuming a priori independence between β and Σ. The prior distribution for each β_j was set to a zero-mean normal distribution with variance 16 (except for the parameters corresponding to age 2, which were given more precise priors). The random effects (α_i2, α_i3) were assigned a bivariate zero-mean normal distribution, where the elements of the variance-covariance matrix Σ were given priors Σ_11 ~ Uniform(0, 10) and Σ_22 ~ Uniform(0, 10), and the correlation coefficient was assigned a Uniform(−1, 1) prior. Here it is possible to update beliefs about all unknown parameters from their full conditional distributions using Gibbs sampling. Our interest is in calculating marginal likelihoods for

Model k = 1: log(µ_isj) = X_is^T β_j;
Model k = 2: log(µ_isj) = X_is^T β_j + α_ij.

Model 1 is essentially a fixed effects model, where the regression effect β_j, for employment state j, remains constant across individuals at each of the time points s.
A random effects term, α_ij, is however included in model 2, accounting for variability between individuals at a given state j. Indeed other plausible models are also possible, modelling for example variability over time between individuals; again the reader is referred to (Pettitt et al. 2005) for more details. For this illustration we used a randomly selected sample of size 1000 from the complete-case data (n = 3234) analysed in (Pettitt et al. 2005). For this example, collecting samples from the power posteriors is possible using the WinBUGS software (Spiegelhalter, Thomas and Best 1998). To implement (6) we chose a temperature schedule t_i = a_i^4, where the a_i's are 10 equally spaced points in the interval [0.006, 1], together with the end point t = 0. Within each temperature t_i, 2,000 samples were collected from the stationary distributions p_{t_i}(β | y, k = 1) and p_{t_i}(β, α, Σ | y, k = 2). Table 3 summarises the output. Applying the trapezoidal rule (6) yields log p(y | k = 1) = −2,… and log p(y | k = 2) = −2,358.5, with associated Monte Carlo standard errors of 0.64 and 1.34 respectively. This leads to the strong conclusion that a random effects model is much more probable, which is qualitatively similar to the conclusion presented in (Pettitt et al. 2005).
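The link used in the model above maps the linear predictors to category probabilities via a softmax, with the reference category's predictor fixed at zero (employed, since β_1 = 0 and α_i1 = 0). A minimal sketch; the covariate and coefficient values are invented for illustration:

```python
import math

def category_probs(eta):
    """Softmax of linear predictors eta_j = X_is' beta_j + alpha_ij,
    i.e. p_j = mu_j / sum_j mu_j with mu_j = exp(eta_j)."""
    mu = [math.exp(e) for e in eta]
    total = sum(mu)
    return [m / total for m in mu]

# eta_1 = 0 for the reference state ("employed"); the other two entries
# are hypothetical predictors for "unemployed" and "non-participant"
p = category_probs([0.0, -0.7, 0.4])
```

Fixing one predictor at zero is what identifies the model: adding a constant to every eta_j leaves the probabilities unchanged, so without a reference category the β_j would only be determined up to such a shift.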
4.3. Hidden Markov random field models

Markov random fields (MRFs) are often used to model binary spatially dependent data; the autologistic model (Besag 1974) is a popular choice. Here the joint distribution of x = {x_i : i = 1, 2, ..., n}, with each x_i taking values in {−1, +1} on a regular lattice, is defined as

p(x | β) ∝ exp{ β_0 Σ_i x_i + β_1 Σ_{i~j} x_i x_j },    (8)

conditional on parameters β = (β_0, β_1). Positive values of β_0 encourage the x_i to take the value +1, while positive values of β_1 encourage homogeneous regions of +1's or −1's. The notation i ~ j denotes that x_i and x_j are neighbours. For this example we examine two models, defined via their neighbourhood structure:

Model k = 1: a first order neighbourhood, where each point x_i has as neighbours the four nearest adjacent points.
Model k = 2: a second order neighbourhood, where, in addition to the first order neighbours, the four nearest diagonal points also belong to the neighbourhood.

Both neighbourhood structures are modified along the edges of the lattice. MRF models are difficult to handle in practice, due to the computational burden of calculating the constant of proportionality in (8), c(β) say. A hidden MRF arises when an MRF x is corrupted by some noise process. The underlying MRF is essentially hidden, and appears as parameters in the model. Typically it is assumed that, conditional on x, the y_i's are independent, which gives the likelihood

L(y | x, θ) = Π_{i=1}^{n} p(y_i | x_i, θ),

for some parameters θ. Once prior distributions p(β) and p(θ) are specified for β and θ respectively, a complete Bayesian analysis proceeds by making inference on the posterior distribution

p(x, β, θ | y, k) ∝ L(y | x, θ) p(x | β, k) p(β) p(θ).

It is relatively straightforward to sample from the full conditional distribution of each of x and θ. Sampling from the full conditional distribution of β is more problematic, due to the difficulty of calculating the normalising constant of the MRF, c(β).
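The reason updating x is straightforward is that the single-site full conditional under (8) depends only on the neighbours of that site, and c(β) cancels: p(x_i = +1 | rest) = 1 / [1 + exp{−2(β_0 + β_1 Σ_{j~i} x_j)}]. A sketch of one Gibbs sweep for a first order neighbourhood with free boundaries; lattice size and parameter values are illustrative:

```python
import math, random

def gibbs_sweep(x, beta0, beta1):
    """One Gibbs sweep over a 2-D lattice of {-1, +1} values under the
    autologistic model (8) with a first order neighbourhood."""
    rows, cols = len(x), len(x[0])
    for i in range(rows):
        for j in range(cols):
            # Sum over the available nearest neighbours (edges have fewer)
            s = sum(x[a][b]
                    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < rows and 0 <= b < cols)
            # p(x_ij = +1 | rest) = 1 / (1 + exp(-2 (beta0 + beta1 * s)))
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * (beta0 + beta1 * s)))
            x[i][j] = 1 if random.random() < p_plus else -1
    return x

random.seed(0)
x = [[random.choice([-1, 1]) for _ in range(8)] for _ in range(8)]
x = gibbs_sweep(x, beta0=0.1, beta1=0.4)
```

The second order neighbourhood of model k = 2 would simply extend the neighbour list with the four diagonal offsets; the form of the conditional probability is unchanged.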
However provided the number of rows or columns is not greater than 20 for a reasonable number of the other dimension, then the forward recursion method presented in (Reeves and Pettitt 2004) can be use to calculate c(β). For a more complete description of the problem of Bayesian estimation of hidden MRFs, the reader is referred to (Friel et al 2005). For this example, gene expression levels were measured for 34 genes in a cluster of 38 neighbouring genes on the Streptomyces coelcicolor genome for 0 time points. The cluster of 38 neighbouring genes under study is responsible for the production of calcium-dependent antibodies We define the observations on a 38 0 regular lattice, where log expression level y sg corresponds to the gth gene at time point s. Figure 2 displays the data y, indicating gene locations for which there is no data. Here we assume that the data y masks a MRF process x, where states (, +) correspond to up-regulation and down-regulation respectively. We assume that the MRF process follows a first order neighbourhood structure (k = ), or a second order neighbourhood structure (k = 2). Finally we assume that the distribution of y given x is modelled as independent Gaussian noise with state specific mean µ(x sg ), and common variance σ and set θ = (µ, σ). Wit and McClure (2004) show that normality of log expression levels is a reasonable assumption for similar experimental setups. It is straightforward to handle the missing data in the full conditional distribution of the latent process x, the likelihood function needs to be modified slightly to allow for the fact that 4 of the columns of x are not supported by any data. A flat normal prior was chosen for each of the β parameters. The prior distribution for µ was distributed uniformly from the set {(µ( ), µ(+)) 2 µ( ) 2, µ( ) µ(+) 2}. The
Fig. 2. Log expression levels of 34 genes on the Streptomyces genome for 10 consecutive time points. The x-axis labels missing columns.

Table 4. Hidden Markov random field models. Expected deviances (and Monte Carlo standard errors) for the power posterior at temperature t_i, for models k = 1, 2.

values −2 and +2 represent approximate minimum and maximum values which are found in similar datasets. Corresponding to these values, a gamma prior with mean 2 and variance 4 was specified for σ. Note that Friel and Wit (2005) present a more complete analysis of a similar dataset. Here we chose a temperature schedule t_i = a_i^4, where the a_i are equally spaced points in the interval [0, 1]. Within each temperature t_i, 5,000 samples were collected from the stationary distribution p_{t_i}(x, β, µ | y, k), for k = 1, 2. Table 4 summarises the output. Applying the trapezoidal rule (6) yields log p(y | k = 1) = and log p(y | k = 2) = −255.6, with associated Monte Carlo standard errors of and 0.000, respectively. Thus the second-order model is deemed more probable a posteriori.

5. Concluding remarks

We have introduced a new method of estimating the marginal likelihood for complex hierarchical Bayesian models which involves a minimal amount of change to commonly used algorithms for computing the posterior distributions of unknown parameters. We have illustrated the technique on three examples. The first was a simple regression example, where the prior model probabilities needed tuning in order to estimate the Bayes factor well. The second involved a random effects model for multinomial data, and demonstrated the ease of computing the marginal likelihood with a standard software package such as WinBUGS.
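As a sketch of the power posterior calculation described above, the toy example below applies the schedule t_i = a_i^4 and the trapezoidal rule to a conjugate normal model in which both the expected log-likelihood under each power posterior and the exact marginal likelihood are available in closed form. In the examples of this paper those expectations are instead estimated by MCMC; the model and all names here are my own illustration.

```python
import numpy as np

# Toy model: y_i ~ N(theta, 1) for i = 1, ..., n, with prior theta ~ N(0, 1).
rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=20)
n, s = y.size, y.sum()

def expected_loglik(t):
    """E[log L(y | theta)] under the power posterior p_t(theta | y), which
    here is N(m_t, v_t) with v_t = 1/(n t + 1) and m_t = t s v_t."""
    v = 1.0 / (n * t + 1.0)
    m = t * s * v
    # E[(y_i - theta)^2] = (y_i - m)^2 + v  when theta ~ N(m, v)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * (((y - m) ** 2).sum() + n * v)

a = np.linspace(0.0, 1.0, 21)   # equally spaced a_i in [0, 1]
t = a ** 4                      # temperature schedule t_i = a_i^4
E = np.array([expected_loglik(ti) for ti in t])
# trapezoidal rule: log p(y) = int_0^1 E_t[log L] dt
#                            ~ sum_i (t_{i+1} - t_i)(E_{i+1} + E_i)/2
log_marg = float(np.sum(np.diff(t) * (E[1:] + E[:-1]) / 2.0))

# Exact log marginal likelihood of the conjugate model, for comparison:
# y ~ N(0, I + 11'), with det(I + 11') = n + 1.
exact = float(-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1.0)
              - 0.5 * ((y ** 2).sum() - s ** 2 / (n + 1.0)))
```

With 21 temperatures the trapezoidal estimate agrees closely with the exact value; the t = a^4 spacing concentrates quadrature points near t = 0, where the expected log-likelihood changes fastest.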
Here the results demonstrated the overwhelming difference in marginal likelihoods for the two models considered, a situation similar to the first example, where RJMCMC with default equally weighted prior model probabilities would perform very poorly. The third example involved a complex hidden Markov structure, and the results demonstrated a difference in marginal likelihood between the two models. In terms of approximating the mean deviance, the second example is far more challenging than the third, with the former displaying characteristics resulting from the use of a vague prior, namely large negative values of the mean deviance for t near 0. However, the Monte Carlo standard errors of the marginal likelihoods are
nevertheless reasonably small for this example.

Computation of the marginal likelihood requires a proper prior. The sensitivity of the value of the marginal likelihood to the choice of prior can be readily investigated using our method. Various approaches have been proposed for the case where the prior is improper. As we mentioned above, the fractional Bayes factor is straightforwardly computed as a by-product of the marginal likelihood. For those seeking such approximations, our method provides a straightforward solution. Our choice of quadrature rule and use of simulation resources could be improved, but we have not pursued that matter here. Nevertheless, our computational approach provides estimates with tolerably small standard errors. In conclusion, we have illustrated a method of computing the marginal likelihood which is straightforward to implement and can be used for complex models.

Acknowledgements

Both authors were supported by the Australian Research Council. The authors wish to kindly acknowledge Thu Tran for her assistance with computational aspects of this work. Nial Friel wishes to acknowledge the School of Mathematical Sciences, QUT for its hospitality during June.

References

Bartolucci, F. and A. Mira (2003), Efficient estimate of Bayes factors from reversible jump output. Technical report, Università dell'Insubria, Dipartimento di Economia.
Bartolucci, F. and L. Scaccia (2004), A new approach for estimating the Bayes factor. Technical report, Università di Perugia.
Berger, J. O. and L. R. Pericchi (1996), The intrinsic Bayes factor for linear models. In J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith (eds.), Bayesian Statistics, vol. 5, Oxford: Oxford University Press.
Besag, J. E. (1974), Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of the Royal Statistical Society, Series B 36.
Brooks, S. P., P. Giudici and G. O.
Roberts (2003), Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions (with discussion). Journal of the Royal Statistical Society, Series B 65(1).
Carlin, B. P. and S. Chib (1995), Bayesian model choice via Markov chain Monte Carlo. Journal of the Royal Statistical Society, Series B 57.
Chib, S. (1995), Marginal likelihood from the Gibbs output. Journal of the American Statistical Association 90.
Chib, S. and I. Jeliazkov (2001), Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association 96.
Dellaportas, P., J. J. Forster and I. Ntzoufras (2002), On Bayesian model and variable selection using MCMC. Statistics and Computing 12, 27–36.
Dryden, I. L., M. R. Scarr and C. C. Taylor (2003), Bayesian texture segmentation of weed and crop images using reversible jump Markov chain Monte Carlo methods. Applied Statistics 52(1).
Friel, N., A. N. Pettitt, R. Reeves and E. Wit (2005), Bayesian inference in hidden Markov random fields for binary data defined on large lattices. Technical report, University of Glasgow.
Friel, N. and E. Wit (2005), Markov random field model of gene interactions on the M. tuberculosis genome. Technical report, University of Glasgow, Department of Statistics.
Gelman, A. and X.-L. Meng (1998), Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical Science 13.
Godsill, S. J. (2001), On the relationship between Markov chain Monte Carlo methods for model uncertainty. Journal of Computational and Graphical Statistics 10.
Green, P. J. (1995), Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82.
Green, P. J. (2003), Trans-dimensional Markov chain Monte Carlo. In P. J. Green, N. L. Hjort and S. Richardson (eds.), Highly Structured Stochastic Systems. Oxford: Oxford University Press.
Green, P. J. and A. O'Hagan (1998), Model choice with MCMC on product spaces without using pseudo-priors. Technical Report 98-3, University of Nottingham.
Green, P. J. and S. Richardson (2002), Hidden Markov models and disease mapping. Journal of the American Statistical Association 97.
Han, C. and B. P. Carlin (2001), Markov chain Monte Carlo methods for computing Bayes factors: a comparative review. Journal of the American Statistical Association 96(455).
Hoeting, J. A., D. Madigan, A. E. Raftery and C. T. Volinsky (1999), Bayesian model averaging: a tutorial. Statistical Science 14(4).
Meng, X.-L. and W. Wong (1996), Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Statistica Sinica 6.
Neal, R. M. (2001), Annealed importance sampling. Statistics and Computing 11.
O'Hagan, A. (1995), Fractional Bayes factors for model comparison (with discussion). Journal of the Royal Statistical Society, Series B 57.
Pettitt, A. N., T. T. Tran, M. A. Haynes and J. L. Hay (2005), A Bayesian hierarchical model for categorical longitudinal data from a social survey of immigrants. Journal of the Royal Statistical Society, Series A (to appear).
Reeves, R. and A. N. Pettitt (2004), Efficient recursions for general factorisable models. Biometrika 91(3).
Sisson, S. A. (2005), Trans-dimensional Markov chains: a decade of progress and future perspectives. Journal of the American Statistical Association (to appear).
Spiegelhalter, D., A. Thomas and N. Best (1998), WinBUGS: Bayesian inference using Gibbs sampling, Manual version 1.2.
Imperial College, London and Medical Research Council Biostatistics Unit, Cambridge.
Stephens, M. (2000), Bayesian analysis of mixture models with an unknown number of components: an alternative to reversible jump methods. Annals of Statistics 28.
Williams, E. (1959), Regression Analysis. Wiley.
Wit, E. and J. McClure (2004), Statistics for Microarrays: Design, Analysis and Inference. Wiley, Chichester.
More informationMCMC 2: Lecture 2 Coding and output. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham
MCMC 2: Lecture 2 Coding and output Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham Contents 1. General (Markov) epidemic model 2. Non-Markov epidemic model 3. Debugging
More informationBayesian Linear Models
Bayesian Linear Models Sudipto Banerjee September 03 05, 2017 Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles Linear Regression Linear regression is,
More informationInfer relationships among three species: Outgroup:
Infer relationships among three species: Outgroup: Three possible trees (topologies): A C B A B C Model probability 1.0 Prior distribution Data (observations) probability 1.0 Posterior distribution Bayes
More informationIntroduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models
Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December
More informationA Level-Set Hit-And-Run Sampler for Quasi- Concave Distributions
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2014 A Level-Set Hit-And-Run Sampler for Quasi- Concave Distributions Shane T. Jensen University of Pennsylvania Dean
More informationLearning Gaussian Process Models from Uncertain Data
Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada
More informationBayesian Analysis of Order Uncertainty in ARIMA Models
Bayesian Analysis of Order Uncertainty in ARIMA Models R.S. Ehlers Federal University of Paraná, Brazil S.P. Brooks University of Cambridge, UK Summary. In this paper we extend the work of Brooks and Ehlers
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationSpatial Analysis of Incidence Rates: A Bayesian Approach
Spatial Analysis of Incidence Rates: A Bayesian Approach Silvio A. da Silva, Luiz L.M. Melo and Ricardo Ehlers July 2004 Abstract Spatial models have been used in many fields of science where the data
More information