The Pennsylvania State University
The Graduate School

RATIO-OF-UNIFORMS MARKOV CHAIN MONTE CARLO FOR GAUSSIAN PROCESS MODELS

A Thesis in Statistics
by Chris Groendyke

© 2008 Chris Groendyke

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science

May 2008
The thesis of Chris Groendyke was reviewed and approved by the following:

Murali Haran, Assistant Professor of Statistics, Thesis Advisor
Donald Richards, Professor of Statistics, Associate Chair of the Department of Statistics
Runze Li, Associate Professor of Statistics, Graduate Program Chair

Signatures are on file in the Graduate School.
Abstract

We develop various Markov chain Monte Carlo (MCMC) methods based on the ratio-of-uniforms (ROU) transformation and show how they can be used in a Bayesian context to simulate from the posterior distribution of linear Gaussian process models. These models are very popular in many disciplines, but are particularly important for modeling spatial data. We show that these algorithms, in spite of requiring no tuning, perform well in practice. We describe how the algorithms can be used in conjunction with some recently developed methods to estimate standard errors of MCMC-based estimates accurately. The estimated standard errors can, in turn, be used to automatically decide when to stop the MCMC runs, thereby providing, in principle, a completely automated MCMC algorithm. We conclude with a study of the properties of these algorithms, using simulated data as well as real data taken from the field of Geosciences.
Table of Contents

List of Figures
List of Tables
Acknowledgments

Chapter 1 Introduction
  1.1 The Gaussian Process Model
  1.2 Bayesian Inference
  1.3 The Need for Automation

Chapter 2 Markov Chain Monte Carlo
  2.1 MCMC Theory
    2.1.1 Markov Chains
  2.2 The Metropolis-Hastings Algorithm
    2.2.1 Variable-at-a-time Metropolis-Hastings
  2.3 Monte Carlo Standard Errors and Stopping Rules
    2.3.1 Effective Sample Size

Chapter 3 Ratio-of-Uniforms Markov Chain Monte Carlo
  3.1 The Ratio-of-Uniforms Transformation
  3.2 Slice Sampling
  3.3 Multivariate Generalizations of the Ratio-of-Uniforms Transformation
  3.4 MCMC Using the Ratio-of-Uniforms Transformation
    3.4.1 Random Walk
    3.4.2 Stepping Out / Doubling
  3.5 Auto-tuning
    3.5.1 Random Walk
    3.5.2 Stepping Out / Doubling
    3.5.3 Starting Values
  3.6 Other Methods Using the ROU Transformation
    3.6.1 Hybrid ROU Approach
    3.6.2 Rejection Sampling in the ROU Region
    3.6.3 Adaptive Rejection Sampling in the ROU Region

Chapter 4 Comparative Study of Algorithms
  4.1 A Simulated Dataset
  4.2 A Geosciences Application

Chapter 5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
    5.2.1 Further Exploration of ROU-MCMC Algorithms
    5.2.2 Hit and Run
    5.2.3 Theoretical Results
    5.2.4 Spatial Generalized Linear Models

Appendix A Derivation of Posterior Distributions
Bibliography
List of Figures

1.1 Spatially correlated data with linear regression fit and kriging
2.1 Realization of a random walk Markov chain
2.2 The Metropolis-Hastings algorithm
2.3 Movement of the random walk Markov chain
2.4 Sample produced by the random walk Markov chain
2.5 The Gibbs algorithm for constructing a bivariate Markov chain
2.6 Movement of the random walk Markov chain using Gibbs updates
2.7 Sample produced by the random walk Markov chain using Gibbs updates
2.8 The IMSE algorithm for computing autocorrelation time
3.1 ROU region corresponding to a univariate Uniform random variable
3.2 Standard Normal ROU example
3.3 ROU region corresponding to a bivariate Normal random variable
3.4 The random walk algorithm for generating a new point in the ROU space
3.5 The coordinate-at-a-time random walk algorithm for generating a new value for the i-th coordinate in the ROU space
3.6 The stepping out procedure for finding an interval (L, R) around the current point η_0 which contains the desired slice
3.7 The doubling procedure for finding an interval (L, R) around the current point η_0 which contains the desired slice
3.8 The procedure for generating a point in the slice from the proposal interval (L, R)
3.9 The doubling procedure for finding a proposal hyper-rectangle around the current point
3.10 The procedure for generating a point in the slice from a given proposal hyper-rectangle
3.11 The tuning procedure for the univariate random walk algorithm
3.12 The tuning procedure for the multivariate random walk algorithm
3.13 Empirical relationship between steps and shrinks for the stepping out procedure
3.14 The tuning procedure for the univariate stepping out algorithm
3.15 Empirical relationship between steps and shrinks for the doubling procedure
3.16 The tuning procedure for the univariate doubling algorithm
4.1 ACF plots for Univariate ROU Stepping Out Algorithm Run on Simulated Data
4.2 ACF plots for Slice Sampler Algorithm Run on Simulated Data
4.3 ACF plots for Multivariate Metropolis-Hastings Algorithm Run on Simulated Data
4.4 Estimated Posterior Densities for Parameter κ for Simulated Data
4.5 Estimated Posterior Densities for Parameter ψ for Simulated Data
4.6 Estimated Posterior Densities for Parameter φ for Simulated Data
4.7 Estimated Posterior Densities for Parameter β for Simulated Data
4.8 ACF plots for Univariate ROU Random Walk Algorithm Run on Geosciences Data
4.9 ACF plots for Slice Sampler Algorithm Run on Geosciences Data
4.10 ACF plots for Multivariate Metropolis-Hastings Algorithm Run on Geosciences Data
4.11 Estimated Posterior Densities for Parameter κ for Geosciences Data
4.12 Estimated Posterior Densities for Parameter ψ for Geosciences Data
4.13 Estimated Posterior Densities for Parameter φ for Geosciences Data
4.14 Estimated Posterior Densities for Parameter β for Geosciences Data
5.1 The Hit and Run procedure for generating a proposal interval (L, R)
List of Tables

2.1 First six trials of the Metropolis-Hastings random walk
2.2 First six trials of the Metropolis-Hastings random walk using Gibbs updates
4.1 Comparison of Algorithms Run on Simulated Data for Parameter κ
4.2 Comparison of Algorithms Run on Simulated Data for Parameter ψ
4.3 Comparison of Algorithms Run on Simulated Data for Parameter φ
4.4 Comparison of Algorithms Run on Simulated Data for Parameter β
4.5 Comparison of Algorithms Run on Geosciences Data for Parameter κ
4.6 Comparison of Algorithms Run on Geosciences Data for Parameter ψ
4.7 Comparison of Algorithms Run on Geosciences Data for Parameter φ
4.8 Comparison of Algorithms Run on Geosciences Data for Parameter β
Acknowledgments

The author is very grateful to Dr. Murali Haran for his guidance and efforts during the course of this research. In addition, the author thanks Klaus Keller and Josh Dorin for providing the Geosciences data used in this study. The author is also grateful to the following people for their helpful conversations and suggestions regarding this effort: K. Sham Bhat, Matthew Tibbits, Muhammad Atiyat, and Scott Roths.
Chapter 1

Introduction

Linear Gaussian process models are very flexible and widely applicable, and they have therefore been used as models for data in a number of disciplines. One of the areas in which these models are commonly used is the modeling of spatially dependent data; in the current study, we apply the linear Gaussian process model in this context. In addition to its applicability to many types of data, the linear Gaussian process model enjoys other significant advantages, notably a number of attractive theoretical properties (Cressie, 1993), some of which are described in Section 1.1.

Our main interest lies in inference for the parameters of this model. One approach would be to use frequentist methods to perform inference on the model parameters; for instance, we might estimate the parameters via Maximum Likelihood Estimation (MLE). Another approach is to use Bayesian inference methods, which have a few notable benefits. First, they allow us to incorporate the uncertainty in our parameter estimates into the predictions we make. They also provide a natural framework for working with hierarchical or multi-level statistical models. Finally, Bayesian inference methods provide the ability to utilize prior information or beliefs about model parameters, if such information is available.

In the Bayesian approach, we assign a prior distribution to each of the model parameters. Inference for each model parameter is then based on its posterior distribution. In the ideal situation, this posterior distribution would be of a known form (or at least an unknown but analytically tractable form), and we would be able to perform inference directly, either using analytical methods or by generating a sample from this posterior distribution.
However, when we are not able to work with a tractable posterior distribution (as is the case in this study), we can instead resort to Markov chain Monte Carlo (MCMC). That is, we run a Markov chain that converges to the desired posterior distribution and base our inference on the sample produced by this chain. Some basic theory relating to Markov chain Monte Carlo methods is covered in Chapter 2.

The use of Markov chain Monte Carlo methods is very common in modern statistics. However, the algorithms used here differ from typical applications of MCMC in that they couple MCMC theory with an auxiliary variable method known as the ratio-of-uniforms (ROU) transformation. Using MCMC methods in conjunction with the ROU transformation (henceforth ROU-MCMC) has been suggested by Tierney (2005) and Karawatzki et al. (2006). These authors discuss various strategies for ROU-MCMC, but apply the algorithms only to relatively simple examples. Here we consider a number of variants of ROU-MCMC in the context of fitting linear Gaussian process models, which can present computational challenges. The specific algorithms used for this study are discussed in Chapter 3.

1.1 The Gaussian Process Model

As noted above, the linear Gaussian process model has been used to model data from a wide spectrum of disciplines. One of the areas in which this model is commonly used is spatial statistics, in particular the study of geostatistical data; this is the context in which we use the model in the present study. In geostatistical data, we work with a response variable Z, which is present over some continuous domain D ⊂ R^p (see Cressie (1993) or Schabenberger and Gotway (2005) for a more detailed discussion). We observe this process at only a finite number of points in D; we denote the points at which the process is observed by s_1, s_2, ..., s_n, so that the response variable at each location s_i is given by Z(s_i). Let Z = (Z(s_1), ..., Z(s_n))^T. Then, if we assume that Z can be described by the linear Gaussian process model, we have

    Z ~ N(µ, Σ(Θ)),    (1.1)

with the mean vector µ given by µ = Xβ, where X is a matrix of covariates and β is the corresponding vector of regression parameters. Under the assumption that the data can be described by this model, the probability density function (pdf) of the data is

    f_Z(z) = (2π)^{-n/2} |Σ(Θ)|^{-1/2} exp( -(1/2) (z - µ)^T Σ(Θ)^{-1} (z - µ) ).    (1.2)

For this study we assume an exponential covariance structure, although it should be noted that other choices for the covariance structure, such as the Matérn, could also be used. In this specification, Θ = (κ, ψ, φ) and Σ(Θ) = ψI + κH(φ), where {H(φ)}_{i,j} = exp( -||s_i - s_j|| / φ ), I is the identity matrix, and ||s_i - s_j|| is the distance between locations i and j. The most common distance metric used in this model is the Euclidean distance, which is the distance measure we use here as well. The basic idea of this model is that observations which are closer together will be more similar to each other (in terms of the values of their response variables) than those with a greater distance between them; the covariance model parameters serve to precisely describe the nature of this relationship.

Also note that the (covariance) parameters of the model above have meaningful physical interpretations, so that inference about the model parameters can yield immediate physical conclusions. In geostatistical terms, the parameter κ represents the sill, the limiting value of the semivariogram at large distances between points. φ denotes the range parameter; the range is the minimum distance required to attain the sill. Finally, the parameter ψ is the nugget, representing the amount of intrinsic variance not due to the distance between points (Schabenberger and Gotway, 2005).

As mentioned above, the linear Gaussian process model also has some theoretical properties that can be beneficial. One of these is that its distribution is completely and uniquely determined by its mean vector and covariance matrix: to fully describe the distribution of a random vector following this model, we need only specify these two quantities. Another desirable property is that, for this model, weak stationarity is both necessary and sufficient for strong stationarity, whereas in general weak stationarity is implied by, but does not imply, strong stationarity (Cressie, 1993). Also, much asymptotic theory is known for Gaussian distributions.

It is also important to account for spatial correlation in data when such correlation exists. Failure to do so can lead to incorrect model assumptions, invalid parameter inference, and poor predicted values. For example, consider the following data set, which consists of 100 one-dimensional points simulated from a linear Gaussian process model. To demonstrate the importance of accounting for spatial dependence in the error structure of the data, we have fit both a standard linear regression model and a linear Gaussian process model to these data. The former model assumes independence between the data points, whereas the latter incorporates spatial dependence. The predicted values are superimposed on the data shown in Figure 1.1. The solid line shows the predicted values based on a standard linear regression.
The dashed line gives predicted values obtained by kriging. (Performing prediction on geostatistical data, as we are doing in this example, is known as kriging (Schabenberger and Gotway, 2005).) We can see immediately that accounting for spatial dependence results in predictions that are much closer to the actual data points.

1.2 Bayesian Inference

In classical frequentist inference, we treat the parameters of interest as fixed but unknown values. We use the data to determine the best estimates of these parameters, using methods such as Maximum Likelihood Estimation (MLE) or the Method of Moments. These methods produce point estimates (perhaps with associated confidence intervals) for the parameters being estimated. Bayesian inference, on the other hand, treats the parameters as random variables rather than fixed, unknown values. To each parameter η, we assign a prior distribution which represents our prior beliefs about this parameter. We then use the data to update our beliefs about the parameter, producing a posterior distribution for η; our inference regarding each parameter is then based on its corresponding posterior distribution. The actual updating of the distributions of the parameters is performed using Bayes' theorem. We denote the prior distribution for the parameter η by π(η), the likelihood function by f(z|η), and the posterior distribution of η by π(η|Z). Then by Bayes' rule, we have
Figure 1.1: Spatially correlated data with linear regression fit and kriging.

    π(η|Z) = f(z|η)π(η) / ∫ f(z|η)π(η) dη    (1.3)
           ∝ f(z|η)π(η).    (1.4)

The denominator of (1.3) is known as the normalizing constant. One beneficial feature of some MCMC methods is that they often do not require us to know (or compute) this normalizing constant; in these cases it is sufficient to evaluate the density kernel given by (1.4).

1.3 The Need for Automation

One practical problem that arises with MCMC algorithms such as the Metropolis-Hastings algorithm is that they often require a substantial amount of tuning by the user. Tuning refers to the repeated adjustment of various auxiliary parameters, often known as tuning parameters. This tuning stage can be expensive in terms of the time and effort required of the user. For example, the standard Metropolis-Hastings algorithm requires the user to specify a proposal distribution for each parameter or block of parameters being updated. In order to increase the efficiency of the algorithm, the user may be required to experiment with both the
form as well as the parameters of these proposal distributions. Worse yet, these adjustments are dataset-specific, meaning that they must be repeated for each different dataset on which the algorithm is used. Using the ratio-of-uniforms transformation in conjunction with MCMC algorithms offers the possibility of automating this tuning process, freeing users of the burden of designing and tuning MCMC algorithms for each new data set.

A related problem, which also requires the intervention of the user, is determining how long to run the Markov chain. Even when we know that a Markov chain will eventually converge to the correct target distribution, there remains the question of how many trials it may take before this convergence can be deemed to have occurred. The user is often forced to rely on ad hoc methods in order to make this judgment. This situation is clearly not ideal; it would be preferable to have clear, theoretically justified rules which tell us how many trials of our algorithms are sufficient. Fortunately, we are able to use recently developed methods on fixed-width MCMC to accurately assess the standard errors of our estimates and to determine stopping rules for our algorithms.

Thus, one advantage of the ROU-MCMC idea is that it offers the potential of producing a completely automated algorithm, that is, an algorithm which requires no user intervention either to tune the algorithm or to decide the length of the chain, but nonetheless retains desirable theoretical and practical properties. We consider the implementation of this idea in the context of linear Gaussian process models, which are very important and popular and hence would benefit greatly from more efficient and/or automated MCMC algorithms. In this paper, we explore several different types of Markov chain Monte Carlo algorithms for sampling from the posterior distribution of a linear Gaussian process model.
We implement some algorithms based on the ratio-of-uniforms transformation, as well as some standard algorithms such as the Metropolis-Hastings algorithm, and compare their performance. We also discuss ideas for automating some of these algorithms: on the front end, by having the algorithm tune itself, and on the back end, by using estimates of Monte Carlo standard errors to determine how long to run the Markov chain. The remainder of the paper is organized as follows. In Chapter 2, we outline some basic theory of Markov chain Monte Carlo methods, the estimation of Monte Carlo standard errors, and how these standard errors can be used to construct stopping rules. In Chapter 3, we introduce the ratio-of-uniforms transformation and discuss how it can be used in conjunction with Markov chain Monte Carlo methods. Chapter 4 compares the performance of the various algorithms on simulated and real data. Finally, Chapter 5 contains the conclusions of this study and ideas for future work.
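To make the covariance specification of Section 1.1 concrete, the sketch below assembles Σ(Θ) = ψI + κH(φ) with {H(φ)}_{i,j} = exp(-||s_i - s_j||/φ) for a handful of one-dimensional locations. This is an illustrative NumPy sketch; the function name `exp_covariance` and the parameter values are our own choices, not part of the thesis.

```python
import numpy as np

def exp_covariance(coords, kappa, psi, phi):
    """Build Sigma(Theta) = psi * I + kappa * H(phi), where
    {H(phi)}_{ij} = exp(-||s_i - s_j|| / phi)."""
    coords = np.asarray(coords, dtype=float)
    # pairwise Euclidean distances between the observation sites
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    H = np.exp(-d / phi)
    return psi * np.eye(len(coords)) + kappa * H

# four sites on a line; kappa (sill), psi (nugget), phi (range parameter)
s = [[0.0], [1.0], [2.0], [5.0]]
Sigma = exp_covariance(s, kappa=2.0, psi=0.5, phi=1.5)
```

Note that the nugget ψ enters only on the diagonal (each diagonal entry is ψ + κ), while off-diagonal entries decay toward zero as the distance between sites grows, exactly the "closer points are more similar" behavior described above.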
Chapter 2

Markov Chain Monte Carlo

2.1 MCMC Theory

Before describing the Markov chain Monte Carlo algorithms used in this study, it is first necessary to briefly discuss some basic theory of Markov chains and how these chains can be used to construct MCMC algorithms. More detailed discussions of MCMC theory can be found in Tierney (1994) and Robert and Casella (2004), while Geyer (1992) contains a discussion of some of the practical aspects of constructing MCMC algorithms.

2.1.1 Markov Chains

A Markov chain is a sequence of random variables {X^(i)}, i ≥ 1, having the property that the distribution of each random variable depends, at most, on the value of the previous random variable. That is, {X^(i)} is a Markov chain if we have

    P(X^(i+1) ∈ A | X^(1), ..., X^(i)) = P(X^(i+1) ∈ A | X^(i))

for any set A (Casella and Berger, 2002), where X^(j) denotes the j-th step of the Markov chain. This property proves very useful in the construction of MCMC algorithms: the lack of dependence on earlier random variables allows us to generate the next value in the chain using only its current value, rather than having to consider all previous values of the sequence.

In the construction of Markov chains, we make use of a transition kernel. The transition kernel specifies the likelihood of the sequence moving from the current value of the random variable to each of the possible values that the next random variable in the sequence could take. It takes the form of a conditional density function, specifying the probability density over all values of the next step in the chain given the current value. Note that for this study, all of the random variables we are studying have continuous distributions. Thus, we will only concern ourselves here with the continuous case, and not explore the theory of Markov chains on discrete
state spaces. Given a transition kernel, we can construct a Markov chain by choosing an initial starting point for the chain, and then using the transition kernel to govern the probabilities of moving to future states.

Example 2.1. As a simple example of constructing a Markov chain, consider a random walk model (Robert and Casella, 2004). For this model, we have the relationship X^(n+1) = X^(n) + ε^(n), where ε^(n) is a random variable whose distribution is independent of the {X^(i)} values. For this example, we assume ε^(n) ~ N(0, 1), so that X^(n+1) ~ N(x^(n), 1), where x^(n) is the realized value of X^(n), i.e., the previous value of the Markov chain. Thus, the transition kernel for this model is given by

    P(X^(n+1) = x | X^(1) = x^(1), ..., X^(n) = x^(n)) = P(X^(n+1) = x | X^(n) = x^(n))
        = (1/√(2π)) exp( -(1/2) (x - x^(n))^2 ).    (2.1)

To complete the specification of the chain, we also need to assign a starting value, that is, a value for X^(1); for this example we set x^(1) = 0. We can now generate each subsequent value of the chain using the transition kernel given in (2.1), conditioning on the current value of the chain. Thus, to generate X^(2) we simulate X^(2) ~ N(x^(1), 1) = N(0, 1). Once we have a value for X^(2) (call it x^(2)), we continue by simulating X^(3) ~ N(x^(2), 1). We can continue to build a Markov chain of any desired length in this manner. A plot of the first 1,000 values of one possible realization of this Markov chain is shown in Figure 2.1.

Generally, when we construct a Markov chain, we hope that it will eventually converge to a particular target distribution. In some circumstances, we can ensure that this occurs through the construction of the Markov chain itself. We now briefly explore some of the conditions necessary for this to take place, starting by defining some properties common to Markov chains.
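The construction in Example 2.1 can be simulated directly; the NumPy sketch below (the function name and seed are our own illustrative choices) generates a realization of the random walk chain, producing each new value from the current one alone, as the Markov property allows:

```python
import numpy as np

def random_walk_chain(n_steps, start=0.0, seed=0):
    """Simulate X^(n+1) = X^(n) + eps^(n) with eps^(n) ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = start
    for n in range(1, n_steps):
        # the next state depends only on the current state
        x[n] = x[n - 1] + rng.normal(0.0, 1.0)
    return x

chain = random_walk_chain(1000)  # first 1,000 values, as in Figure 2.1
```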
The invariant distribution π is the stationary distribution of a Markov chain if

    lim_{n→∞} P(X^(n) ∈ A | X^(1) = x^(1)) = π(A)

for almost all sets A and points x^(1). Now denote the 1-step transition kernel by P and the n-step transition kernel by P^n. That is, given that the chain is currently at x, the conditional probability that the next point will fall within the set A is P(x, A); similarly, the conditional probability that the chain will be at a point in the set A in n steps is P^n(x, A). Then we can say that π is the stationary distribution of a Markov chain with transition kernel P if

    lim_{n→∞} P^n(x, A) = π(A)

for almost all x. This terminology means that the chain is stationary in its distribution, i.e.,
Figure 2.1: Realization of a random walk Markov chain.

X^(i) ~ π implies that X^(i+j) ~ π for all j.

A Markov chain is said to be irreducible if it has positive probability of moving to any set A for which π(A) > 0. Thus, an irreducible Markov chain is one in which all states communicate with one another. This is clearly an important property in the construction of MCMC algorithms: in order to have any chance of fully exploring the state space, the Markov chain must be able to reach all states; that is, it needs to be irreducible.

Another important property of a Markov chain is its period. A Markov chain is periodic if there exist states to which the chain can move only at certain regularly spaced times. For example, if a Markov chain can take on a value in a set A only every fourth step, then this chain is periodic with a period of four. Irreducible Markov chains which are not periodic are known as aperiodic.

A concept which will be important in the discussion of the convergence of Markov chains is recurrence. An irreducible Markov chain {X^(n)} with invariant distribution π is said to be recurrent if, for each set A such that π(A) > 0, the chain started at x satisfies P(X^(n) ∈ A i.o.) = 1 for almost all x and P(X^(n) ∈ A i.o.) > 0 for all x (Tierney, 1994), where "i.o." stands for infinitely often. Intuitively, recurrence means that the expected number of times that the chain will return to any set with positive measure is infinite. A slightly stronger property than recurrence is Harris recurrence: a Markov chain {X^(i)} is called Harris recurrent if P(X^(i) ∈ A i.o.) = 1 for all x. A Harris recurrent chain will return to every set of positive measure infinitely often with probability one. If there is an invariant finite measure for an irreducible Markov chain, then the chain is called positive recurrent. Markov chains which are recurrent, but not positive recurrent, are called null recurrent.
A Markov chain which is positive recurrent and aperiodic is said to be ergodic. Intuitively, an ergodic Markov chain is one whose invariant distribution π is independent of the initial conditions of the chain (Robert and Casella, 2004). Similarly, a Markov chain which is both Harris recurrent and aperiodic is known as a Harris ergodic chain. The conditions which assure the convergence of a Markov chain to the stationary distribution π are given in Theorem 2.1, known as the Ergodic Theorem (a form of the Law of Large Numbers for Markov chains).

Theorem 2.1. If a Markov chain with n-step transition kernel P^n is Harris ergodic and irreducible, then

    lim_{n→∞} ||P^n - π||_TV = 0,

where ||·||_TV denotes the total variation norm, that is, ||f_1 - f_2||_TV = sup_A |f_1(A) - f_2(A)|, with the supremum taken over all measurable sets A.

Proof. See Athreya et al. (1996).

The Ergodic Theorem, while guaranteeing convergence of the Markov chain, unfortunately does not specify the rate of this convergence. In other words, while it assures us that the given Markov chain will indeed eventually converge to π, it gives no indication of how long this convergence might take, nor even an upper bound on this length of time. This is clearly an important point: if our goal in constructing the Markov chain is convergence to a given stationary distribution π, we would like some indication of when this might occur, so that we have an idea of how long to run the chain. To address this issue, we can consider more stringent forms of ergodicity which put bounds on the rate of convergence of a Markov chain to its stationary distribution π. Uniform ergodicity and geometric ergodicity are two such stronger types of ergodicity. Specifically, a Markov chain with invariant distribution π is geometrically ergodic if there is a function M(·) and a constant r, 0 < r < 1, such that

    ||P^n(x, ·) - π(·)||_TV ≤ M(x) r^n

for all x (Tierney, 1994).
Furthermore, the chain is uniformly ergodic if there is a constant M and a constant r, 0 < r < 1, such that

    ||P^n(x, ·) - π(·)||_TV ≤ M r^n

for all x (Tierney, 1994). Clearly, uniform ergodicity is stronger than geometric ergodicity; in fact, the former implies the latter.

Once we have completed running the Markov chain and have the corresponding sample {X^(i)}, we can use this sample to estimate various functions of the random variable. In particular, we estimate E_π(g) (the expectation of the function g with respect to the stationary distribution π) by using the corresponding sample mean ḡ_n, where

    ḡ_n = (1/n) Σ_{i=1}^n g(X^(i)).
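As a sketch of how ḡ_n and an estimate of its Monte Carlo variability can be computed in practice, the following uses the nonoverlapping batch means idea referred to later in this chapter. The function name, batch count, and test data are our own illustrative choices, not the thesis's implementation.

```python
import numpy as np

def batch_means_se(g_vals, n_batches=30):
    """Batch means estimate of the Monte Carlo standard error of
    the sample mean g_bar_n (a sketch; n_batches is illustrative)."""
    n = len(g_vals)
    b = n // n_batches                     # batch length
    trimmed = np.asarray(g_vals[: b * n_batches], dtype=float)
    means = trimmed.reshape(n_batches, b).mean(axis=1)
    # the sample variance of the batch means estimates sigma_g^2 / b,
    # so b * var estimates sigma_g^2
    var_hat = b * means.var(ddof=1)
    return np.sqrt(var_hat / n)

rng = np.random.default_rng(1)
g = rng.normal(size=3000)                  # stand-in for {g(X^(i))}
se = batch_means_se(g)                     # close to 1/sqrt(3000) here
```

For independent draws, as in this stand-in, the estimate should be near σ/√n; for a correlated MCMC sample, the batch means construction inflates the estimate to account for the autocovariance terms in σ_g².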
While ḡ_n will necessarily be an imperfect estimate of E_π(g), under regularity conditions we can quantify this discrepancy via a form of Central Limit Theorem for Markov chains.

Theorem 2.2. Under regularity conditions,

    √n (ḡ_n - E_π(g)) →_d N(0, σ_g²),

where σ_g² = Var_π(g(X^(1))) + 2 Σ_{i=2}^∞ Cov_π(g(X^(1)), g(X^(i))), and the variance and covariance calculations are performed with respect to the distribution π.

Proof. See Tierney (1994) and Nummelin (1984).

Two examples of regularity conditions that will guarantee this Central Limit Theorem are (Roberts and Rosenthal, 2004): (i) {X^(i)} is geometrically ergodic and E_π|g|^{2+δ} < ∞ for some δ > 0, or (ii) {X^(i)} is uniformly ergodic and E_π g² < ∞; we should note, however, that these are not the only such conditions. The importance of establishing this Central Limit Theorem is that it allows us to estimate σ_g², the variability of ḡ_n, so that we can get some idea of the quality of our estimate ḡ_n. Although there are many different methods of estimating σ_g², here we will only consider the batch means method, which is described in a later section.

2.2 The Metropolis-Hastings Algorithm

Perhaps the most commonly used Markov chain Monte Carlo method is the Metropolis-Hastings algorithm. The basic idea of this algorithm is that instead of constructing the Markov chain by directly using the target distribution, the state transitions are guided by a different distribution, known as the proposal distribution. Of course, using transition probabilities from the proposal distribution rather than the target distribution would cause the Markov chain to converge to the wrong stationary distribution. The algorithm adjusts for this by sometimes staying at the current state, rather than moving to the state selected by the proposal distribution. This adjustment ensures that the algorithm does indeed converge to the correct target distribution.
Suppose that our target distribution (the distribution we are interested in sampling from) is π, and that the proposal distribution is q(x, y), or equivalently q(y|x). In both notations, x represents the current value of the Markov chain, whereas y is a possible next value of the chain. If the chain is at the point X^(n) = x, we define the acceptance probability as

    α(x, y) = min{ π(y) q(y, x) / (π(x) q(x, y)), 1 },    (2.2)

unless π(x) q(x, y) = 0, in which case we set α(x, y) = 1. Next we generate a proposal from q(·|x), accept the proposal with probability α(x, y), and reject it otherwise. If we accept the proposal, then this proposal becomes the next point in the Markov chain; if on the other hand we reject the proposal, then the current point is used as the next point in the chain. This algorithm, which
was originally introduced by Metropolis et al. (1953) and later generalized by Hastings (1970), is described in Figure 2.2.

Figure 2.2: The Metropolis-Hastings algorithm.
    Input: x^(n), the current value of the Markov chain;
           q(x, y), the proposal distribution.
    x* ~ q(x^(n), ·)
    a ← α(x^(n), x*)
    V ~ Uniform(0, 1)
    if V < a then x^(n+1) ← x*
    else x^(n+1) ← x^(n)
    Output: x^(n+1), the new value of the Markov chain.

It is common to use a symmetric proposal distribution, so that q(x, y) = q(y, x). In this case, (2.2) reduces to

    α(x, y) = min{ π(y) / π(x), 1 },    (2.3)

which simplifies the calculation of the acceptance probability. This is often referred to as a Metropolis update. Also note that both (2.2) and (2.3) depend on the distribution π(·) only through the ratio π(y)/π(x). It is for this reason that we need only specify the kernel of π(·); the normalizing constants cancel in this expression.

Example 2.2. As an example of the Metropolis-Hastings algorithm, consider the problem of generating a random sample uniformly on a unit circle C centered at the origin. In this case, our target distribution is

    π(x_1, x_2) = (1/area(C)) I((x_1, x_2) ∈ C) = (1/π) I(x_1² + x_2² < 1),    (2.4)

where I(·) denotes the indicator function. For the proposal distribution, we use a two-dimensional Normal distribution whose mean vector is the current point and whose covariance matrix is the identity matrix. Thus, if the Markov chain is currently at X^(n) = (x_1^(n), x_2^(n)), then our proposal distribution is

    q(y_1, y_2 | x_1, x_2) = (1/(2π)) exp( -(1/2) [ (y_1 - x_1)² + (y_2 - x_2)² ] ),

which is a symmetric distribution, enabling us to use the simpler form of the acceptance probability given in (2.3). To initialize the Metropolis-Hastings algorithm, we must choose a starting value for the Markov chain; we will start at the origin, so that X^(1) = (x_1^(1), x_2^(1)) = (0, 0). We then run the algorithm for as many trials as desired.
Note that for this example, (2.3) becomes

α((x_1, x_2), (y_1, y_2)) = min { π(y_1, y_2) / π(x_1, x_2), 1 }
                          = min { (1/π) I(y_1^2 + y_2^2 < 1) / [(1/π) I(x_1^2 + x_2^2 < 1)], 1 }
                          = min { I(y_1^2 + y_2^2 < 1) / I(x_1^2 + x_2^2 < 1), 1 }
                          = min { I(y_1^2 + y_2^2 < 1), 1 }
                          = I(y_1^2 + y_2^2 < 1),    (2.5)

since I(x_1^2 + x_2^2 < 1) = 1: the current point X^(n) = (x_1^(n), x_2^(n)) must lie in the unit circle, because it is the current state of the Markov chain. Now notice that (2.5) will be either 0 or 1, depending on whether the proposed point is in the unit circle. If the proposed point is indeed within the unit circle, the acceptance probability (2.5) is 1, so the proposed point is automatically accepted. If, on the other hand, the proposed point lies outside the unit circle, it will always be rejected. Thus, for this random walk algorithm, determining whether a proposed point should be accepted or rejected reduces to checking whether the proposed point lies within the unit circle, which is a rather simple calculation.

For demonstration purposes, we will run this MCMC algorithm for 100 trials. The results of the first six trials are shown in Table 2.1. The movement of the Markov chain for these six trials is shown in Figure 2.3, along with the boundary of the region C from which we are trying to sample.

Table 2.1: First six trials of the Metropolis-Hastings random walk.

    Trial   Location of Markov chain   Proposed point     Accepted?
    1       (0.000, 0.000)             (-0.140, 0.827)    YES
    2       (-0.140, 0.827)            (0.706, )          YES
    3       (0.706, )                  (0.557, )          NO
    4       (0.706, )                  (0.608, 0.461)     YES
    5       (0.608, 0.461)             (0.256, 0.167)     YES
    6       (0.256, 0.167)             (-0.826, )         NO

A plot of the entire sample of 100 points is shown in Figure 2.4. These points do indeed appear to be distributed uniformly across the unit circle, as we would hope. Note, however, that there are fewer than 100 distinct points on the plot. Some points are duplicates, a result of the trials in which the proposed point fell outside the unit circle and was hence rejected.
Thus, in these trials, the Markov chain remained at its current location rather than moving to a new point. We should also note that this simple example is presented only for demonstration purposes; if we actually wanted to generate a random sample with a bivariate Uniform distribution on the unit circle, there are many algorithms more efficient than the random walk algorithm given in this example. In fact, for this case, it is unlikely that we would use any type of Markov chain algorithm at all, since it is simple to produce an i.i.d. (independent and identically distributed) sample from this distribution directly.

Figure 2.3: Movement of the random walk Markov chain.

Finally, note that in general, most Markov chain Monte Carlo algorithms are run for far more than 100 trials. So few trials will typically not be sufficient to produce a reasonable sample from the target distribution.

Variable-at-a-time Metropolis-Hastings

Variable-at-a-time Metropolis-Hastings algorithms, of which the Gibbs sampler (Gelfand and Smith, 1990) is a special case, can be particularly helpful when we are attempting to construct a multivariate Markov chain. This class of samplers is often beneficial because it allows us to update the variables in the Markov chain individually, rather than all at once. Suppose that we are trying to construct a Markov chain which converges to a stationary distribution π(x_1, x_2). We first let

π_{X_1}(x_1) = ∫ π(x_1, x_2) dx_2   and   π_{X_2}(x_2) = ∫ π(x_1, x_2) dx_1

be the marginal distributions associated with π(x_1, x_2). Then the conditional distributions for the two variables are

π_{X_1|X_2}(x_1 | x_2) = π(x_1, x_2) / π_{X_2}(x_2)   and   π_{X_2|X_1}(x_2 | x_1) = π(x_1, x_2) / π_{X_1}(x_1).

We can then sample x_1 and x_2 individually, each conditional upon the other. That is, we first sample x_1 from π_{X_1|X_2}(x_1 | x_2) and then sample x_2 from π_{X_2|X_1}(x_2 | x_1). Sampling from these conditional distributions (rather than the full joint distribution) can lead to gains in efficiency, especially when the conditional distributions have recognizable forms or are much easier to generate samples from. To produce the sampled points from each of these conditional distributions, we can use univariate Metropolis-Hastings methods, rejection samplers, or, if the conditional distributions have recognized forms, we may be able to sample directly from one or more of them. We need not use the same updating method for each of the variables;
Figure 2.4: Sample produced by the random walk Markov chain.

we can choose any univariate updating scheme that is appropriate for the given variable. This procedure is shown in Figure 2.5 for the case of a bivariate Markov chain, and can easily be extended to Markov chains of any finite dimension.

    Input:  (x_1^(n), x_2^(n)) = current value of the Markov chain
            π_{X_1|X_2}(x_1 | x_2) = conditional distribution of X_1 | X_2
            π_{X_2|X_1}(x_2 | x_1) = conditional distribution of X_2 | X_1
    x_1^(n+1) ~ π_{X_1|X_2}(x_1 | X_2 = x_2^(n))
    x_2^(n+1) ~ π_{X_2|X_1}(x_2 | X_1 = x_1^(n+1))
    Output: (x_1^(n+1), x_2^(n+1)) = new value of the Markov chain

Figure 2.5: The Gibbs algorithm for constructing a bivariate Markov chain.

Example 2.3. As an example, we will use the Gibbs algorithm to sample uniformly from a unit circle C centered at the origin. Note that this is the same target distribution as in the previous example; that is, we will construct a Markov chain that converges to the distribution given in (2.4). Instead of updating both coordinates simultaneously as before, however, the Gibbs algorithm updates the coordinates individually. To do this we need the appropriate conditional distribution for each variable, so we first solve for the marginal distributions:

π_{X_1}(x_1) = ∫ π(x_1, x_2) dx_2
             = ∫ (1/π) I(x_1^2 + x_2^2 < 1) dx_2
             = (1/π) ∫ I(x_2^2 < 1 − x_1^2) dx_2
             = (1/π) ∫_{−√(1 − x_1^2)}^{√(1 − x_1^2)} dx_2
             = (2/π) √(1 − x_1^2) I(−1 < x_1 < 1).

Similarly, we find that

π_{X_2}(x_2) = (2/π) √(1 − x_2^2) I(−1 < x_2 < 1).

Then we can solve for the conditional distributions corresponding to each of these variables:

π_{X_1|X_2}(x_1 | x_2) = π(x_1, x_2) / π_{X_2}(x_2)
                       = (1/π) I(x_1^2 + x_2^2 < 1) / [ (2/π) √(1 − x_2^2) I(−1 < x_2 < 1) ]
                       = [1 / (2 √(1 − x_2^2))] I(−√(1 − x_2^2) < x_1 < √(1 − x_2^2)).

Likewise, we can also see that

π_{X_2|X_1}(x_2 | x_1) = [1 / (2 √(1 − x_1^2))] I(−√(1 − x_1^2) < x_2 < √(1 − x_1^2)).

Inspecting these distributions, we see that, conditional upon the value of the other coordinate, each coordinate has a Uniform distribution, with limits determined by the value of the other coordinate. These limits correspond to the boundary of the unit circle. In this Gibbs sampler, we will update each coordinate via a univariate Metropolis-Hastings step. To do this, we must specify a proposal distribution for each coordinate; we will use a univariate Normal distribution whose mean is the current value of the corresponding coordinate and whose variance is 1. Thus q_{X_1}(y | x_1) is N(x_1, 1) and q_{X_2}(y | x_2) is N(x_2, 1). Now we can calculate the acceptance probabilities for each of the Metropolis-Hastings updates. In both cases the proposal distributions are symmetric, so we can use the simplified version of the acceptance probability given in (2.3):

α_{X_1|X_2}(x, y) = min { π_{X_1|X_2}(y) / π_{X_1|X_2}(x), 1 }
                  = min { [1/(2√(1 − x_2^2))] I(−√(1 − x_2^2) < y < √(1 − x_2^2)) / [1/(2√(1 − x_2^2))] I(−√(1 − x_2^2) < x < √(1 − x_2^2)), 1 }
                  = min { I(−√(1 − x_2^2) < y < √(1 − x_2^2)), 1 }
                  = I(−√(1 − x_2^2) < y < √(1 − x_2^2)).

As in the previous example, this acceptance probability will always be either 0 or 1, depending on whether or not the proposed point lies within the unit circle. If it does, we accept it; if not, we reject it, and this coordinate of the Markov chain remains at its current value. Also note that I(−√(1 − x_2^2) < x < √(1 − x_2^2)) will always be 1, by virtue of the current point lying within the unit circle. Similarly, the acceptance probability for the other coordinate is

α_{X_2|X_1}(x, y) = I(−√(1 − x_1^2) < y < √(1 − x_1^2)).

To complete the specification of this algorithm, we must assign a starting value to the Markov chain. As before, we start the chain at the origin, so that X^(1) = (x_1^(1), x_2^(1)) = (0, 0). We run this Markov chain for 100 trials (i.e., 50 updates of each coordinate). The first six trials are shown in Table 2.2. The movement of the Markov chain for these six trials is shown in Figure 2.6, along with the boundary of the region C from which we are trying to sample.

Table 2.2: First six trials of the Metropolis-Hastings random walk using Gibbs updates.

    Trial   Location of Markov chain   Proposed point     Accepted?
    1       (0.000, 0.000)             (-0.472, 0.000)    YES
    2       (-0.472, 0.000)            (-0.472, 0.402)    YES
    3       (-0.472, 0.402)            (0.364, 0.402)     YES
    4       (0.364, 0.402)             (0.364, )          YES
    5       (0.364, )                  (-0.534, )         YES
    6       (-0.534, )                 (-0.534, )         NO

A plot of the entire sample is shown in Figure 2.7.

Monte Carlo Standard Errors and Stopping Rules

In section 2.1 we mentioned some of the properties of Markov chains, including circumstances under which we can state a Central Limit Theorem (CLT) for Markov chains. If we can estimate σ_g^2, the Central Limit Theorem lets us assess the accuracy of any of the calculations we base on the Markov chain. While there are many different potential methods
Figure 2.6: Movement of the random walk Markov chain using Gibbs updates.

available for calculating σ̂_g^2 (an estimate of σ_g^2), the one we focus on here is the consistent batch means (CBM) method, as described in Jones et al. (2006) and Flegal et al. (2008). Our discussion of this method closely follows the description given by Flegal et al. (2008). To compute σ̂_g^2 from a Markov chain run for n trials using the batch means method, we first split the n trials into batches. In particular, we let a be the number of batches and b the number of trials in each batch, so that n = ab. We then compute the sample mean of each batch as

Ȳ_j = (1/b) Σ_{i=(j−1)b+1}^{jb} g(X_i),   j = 1, ..., a.

Then the estimate of σ_g^2 is given by

σ̂_g^2 = [b / (a − 1)] Σ_{j=1}^{a} (Ȳ_j − ḡ_n)^2.    (2.6)

Note that in general, for arbitrary values of a and b, the estimator defined in (2.6) will not be consistent for σ_g^2. However, there are some choices of a and b that do assure consistency of this estimator; one such choice is to let b = ⌊√n⌋ and a = ⌊n/b⌋. Once we have an estimate of σ_g^2, we can use it to create a confidence interval for E_π g. Specifically, if σ̂_g^2 is the estimator defined in (2.6), then

t_{a−1} √(σ̂_g^2 / n)    (2.7)

is the half-width of an asymptotically valid confidence interval for E_π g, where t_{a−1} is the desired quantile from a t distribution with a − 1 degrees of freedom (Jones et al., 2006).

Figure 2.7: Sample produced by the random walk Markov chain using Gibbs updates.

We can now use σ̂_g^2 to help devise stopping rules for an MCMC simulation. That is, we can use (2.7) to create guidelines which tell us how long we should run a Markov chain in order to produce estimates of a desired accuracy. We must first specify the desired level of accuracy ε for our estimate. To achieve this level of accuracy, we want the half-width given in (2.7) to be no greater than ε. Thus our stopping rule is as follows: at periodic intervals (i.e., every k trials for some pre-specified value of k) we calculate σ̂_g^2 using (2.6). We then stop the simulation only if the associated confidence interval is narrow enough. In particular, we stop if

t_{a_n−1} √(σ̂_g^2 / n) + p(n) ≤ ε,    (2.8)

where p(n) = ε I(n < n*), with n* being a number chosen beforehand. (2.8) makes use of the half-width formula given in (2.7), with the addition of the p(n) term. This term ensures that the simulation is not stopped prematurely due to σ̂_g^2 being a poor estimate of σ_g^2 as a result of a small sample size. Because σ̂_g^2 is a consistent estimator of σ_g^2, this procedure will stop for a sufficiently large value of n.

Effective Sample Size

Ideally, when we produce a sample from a target distribution, we would like the sample to consist of i.i.d. (independent and identically distributed) draws from the distribution. However, draws produced using Markov chains will typically not be independent of each other, though the level of dependence can vary greatly. We need to take the degree of this dependence into account when judging the quality of the samples produced. One way to do this is by examining the sample autocorrelations at various lags. If the samples were truly i.i.d., then the autocorrelation at each lag would be close to 0. Thus, we prefer samples whose autocorrelations decay to 0 quickly, as this indicates less dependent samples. To assess the autocorrelation in a given sample, we must then examine the autocorrelation at each lag, which can be very tedious. (Note that in practice, we typically only look at the autocorrelations for the first j lags, for some moderate value of j, since the autocorrelations at very large lags are typically insignificant.)

Another metric commonly used to assess the level of dependence in a sample is the effective sample size (ESS). A sample (containing some autocorrelation) which has an ESS of m contains as much information as an i.i.d. sample of m draws. We can calculate the ESS of a sample of size N using the formula ESS = N/κ(η), where κ(η) is the autocorrelation time for the parameter η. A standard formula for the autocorrelation time is

κ(η) = Σ_{k=1}^{∞} ρ_k(η),

where ρ_k(η) is the autocorrelation at lag k for the parameter η. Kass et al. (1998) recommend modifying this formula slightly by summing only the first j autocorrelation lags (for some finite j past which the autocorrelations have nearly vanished); here we determine j by using the Initial Monotone Sequence Estimator (IMSE) method (Geyer, 1992). This method is described in Figure 2.8. Note that since we do not know the true autocorrelations for each parameter, in practice we use the corresponding estimates from the sample data, ρ̂_k(η).
    Input:  ρ̂_i(η) = estimated lag-i autocorrelation for η
    Γ_i ← ρ̂_i(η) + ρ̂_{i+1}(η)
    k ← 1
    repeat while Γ_{k+1} > 0 and Γ_k > Γ_{k+1}:
        k ← k + 1
    κ(η) ← Σ_{j=1}^{k−1} ρ̂_j(η)
    Output: κ(η) = autocorrelation time for η

Figure 2.8: The IMSE algorithm for computing autocorrelation time.
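A small Python sketch of this ESS calculation follows. This is our own illustration (the function names are ours), and it adds one guard not in the algorithm above: κ(η) is clamped to at least 1, so the reported ESS never exceeds the actual sample size.

```python
import numpy as np

def autocorrelations(x, max_lag):
    """Sample autocorrelations rho_hat_1, ..., rho_hat_{max_lag}."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

def ess(x, max_lag=200):
    """Effective sample size N / kappa, truncating the autocorrelation sum
    by the monotone-decrease rule: stop increasing k once Gamma_k is no
    longer positive and decreasing."""
    rho = autocorrelations(x, max_lag)
    gamma = rho[:-1] + rho[1:]  # Gamma_i = rho_i + rho_{i+1}
    k = 1
    while k < len(gamma) and gamma[k] > 0 and gamma[k - 1] > gamma[k]:
        k += 1
    kappa = rho[: k - 1].sum()
    kappa = max(kappa, 1.0)  # our guard: report at most N effective draws
    return len(x) / kappa
```

For a strongly autocorrelated chain (e.g., an AR(1) series with coefficient 0.9), `ess` returns a value well below the raw sample size, reflecting the information lost to dependence.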
Chapter 3

Ratio-of-Uniforms Markov Chain Monte Carlo

3.1 The Ratio-of-Uniforms Transformation

The Ratio-of-Uniforms (ROU) transformation, as described by Kinderman and Monahan (1977), is a method for producing a random draw from a given distribution. Rather than sampling from the desired p-dimensional distribution directly, the ratio-of-uniforms method instead generates a draw from a uniform distribution on a particular region in p + 1 dimensions. A transformation is then required to translate this draw back into the original space; this back-transformed draw is a sample from the desired distribution. This is a type of auxiliary variable method: the method introduces an extra variable, with the hope that increasing the dimension of the target distribution (the distribution we are attempting to generate a sample from) will result in a more tractable sampling problem.

The simplest case is that of a univariate target distribution. Consider the problem of sampling from a univariate distribution f. With the ROU method, we would generate a sample from the 2-dimensional region C_f defined by

C_f = {(u, v) : 0 < v < √(f(u/v))}.

After we have obtained the sample {(u^(1), v^(1)), (u^(2), v^(2)), ..., (u^(n), v^(n))}, we apply a transformation to translate this sample into a sample from the desired (1-dimensional) distribution. In particular, consider the transformation (y, z) = (u/v, v). After applying this transformation to the 2-dimensional sample, the marginal distribution of y = u/v has the desired distribution f; we simply ignore the other variable, z = v. This is summarized in Theorem 3.1.

Theorem 3.1. (Kinderman and Monahan, 1977) Let f be a density function for a univariate random variable. Let (U, V) be random variables with a joint Uniform distribution on the region C_f ⊂ R^2, with C_f = {(u, v) : 0 < v < √(f(u/v))}. Then the random variable Y = U/V has distribution f.
Proof. (Kinderman and Monahan, 1977) The bivariate distribution of (u, v) is

g(u, v) = [1 / area(C_f)] I((u, v) ∈ C_f) = [1 / area(C_f)] I(0 < v < √(f(u/v))) = 2 I(0 < v < √(f(u/v))).

(Note that for univariate f, the area of C_f is always 1/2, since this corresponds to the normalizing constant that causes the density to integrate to 1, as required.) The transformation (y, z) = (u/v, v) implies that (u, v) = (yz, z), so the Jacobian of this transformation is

J = (∂u/∂y)(∂v/∂z) − (∂u/∂z)(∂v/∂y) = (z)(1) − (y)(0) = z,

so that |J| = |z| = z (since z = v ≥ 0 by construction). Then the bivariate distribution of (y, z) is

h_{Y,Z}(y, z) = g_{U,V}(u(y, z), v(y, z)) |J| = 2 I(0 < z < √(f(yz/z))) z = 2z I(0 < z < √(f(y))),

so that the marginal distribution of y is

j(y) = ∫ h(y, z) dz = ∫ 2z I(0 < z < √(f(y))) dz = ∫_0^{√(f(y))} 2z dz = [z^2]_0^{√(f(y))} = f(y),

which is the desired target distribution.

One important point concerning this method is that, in order to use the ROU transformation, we do not need to know the normalizing constant. That is, f does not need to be a density; it only needs to be proportional to a density. This property is very useful in cases where the normalizing constant is unknown or intractable.

Example 3.1. As a simple example of this transformation, consider the target distribution X ~ Uniform(1, 2), i.e., f_X(x) = I(x ∈ (1, 2)). Since √(f_X(u/v)) = f_X(u/v) here, we have

C_f = {(u, v) : 0 < v < √(f(u/v))}
    = {(u, v) : 0 < v < I(u/v ∈ (1, 2))}
    = {(u, v) : 0 < v < 1, u/v ∈ (1, 2)}
    = {(u, v) : 0 < v < 1, 1 < u/v < 2}
    = {(u, v) : 0 < v < 1, v < u < 2v},

which is the region bounded by the triangle with vertices at the points (0, 0), (1, 1), and (2, 1) in the U–V plane (see Figure 3.1).
Figure 3.1: ROU region corresponding to a univariate Uniform random variable.

Example 3.2. As an example of generating a sample from a univariate target distribution via the ROU transformation, consider the kernel of a univariate standard Normal distribution (Kinderman and Monahan, 1977). That is, we will generate a sample for the random variable X, where X ~ N(0, 1), i.e., f_X(x) ∝ e^{−x^2/2}. In this case, the 2-dimensional ROU region corresponding to f_X is given by

C_f = {(u, v) : 0 < v < e^{−u^2/(4v^2)}}.

Note that C_f is bounded by the rectangle {(u, v) : −1 ≤ u ≤ 1, 0 ≤ v ≤ 1}. This allows us to sample uniformly on C_f by using a rejection sampler on this bounding rectangle. Once we have the sample of points on this region, we simply let X^(i) = U^(i)/V^(i), i = 1, ..., n, where n is the desired sample size, so that {X^(i)} is the sample from f_X. Figure 3.2 shows the ROU region for this distribution, along with the histogram corresponding to the generated sample.

We can also state a more general version of Theorem 3.1. This generalization of the ROU method, due to Wakefield et al. (1991), enables us to use different power transformations to create ROU regions of varying shapes, possibly yielding more efficient sampling schemes.

Theorem 3.2. (Wakefield et al., 1991) Let f be a density function for a univariate random variable. Let (U, V) be random variables with a joint Uniform distribution on the region C_f ⊂ R^2, with C_f = {(u, v) : 0 < v < [f(u/v^r)]^{1/(r+1)}}. Then the random variable Y = U/V^r has distribution f.
(a) ROU region for the standard Normal sample. (b) Histogram for the ROU standard Normal sample.

Figure 3.2: Standard Normal ROU example.

Proof. The bivariate distribution of (u, v) is

g(u, v) = [1 / area(C_f)] I((u, v) ∈ C_f) = [1 / area(C_f)] I(0 < v < [f(u/v^r)]^{1/(r+1)}) = (r + 1) I(0 < v < [f(u/v^r)]^{1/(r+1)}).

(Note that for univariate f, the area of C_f is always 1/(r + 1), since this corresponds to the normalizing constant that causes the density to integrate to 1, as required.) The transformation (y, z) = (u/v^r, v) implies that (u, v) = (yz^r, z), so the Jacobian of this transformation is

J = (∂u/∂y)(∂v/∂z) − (∂u/∂z)(∂v/∂y) = (z^r)(1) − (ryz^{r−1})(0) = z^r,

so that |J| = |z^r| = z^r (since z = v ≥ 0 by construction). Then the bivariate distribution of (y, z) is

h_{Y,Z}(y, z) = g_{U,V}(u(y, z), v(y, z)) |J| = (r + 1) I(0 < z < [f(yz^r/z^r)]^{1/(r+1)}) z^r = (r + 1) z^r I(0 < z < [f(y)]^{1/(r+1)}),

so that the marginal distribution of y is

j(y) = ∫_0^{[f(y)]^{1/(r+1)}} (r + 1) z^r dz = [z^{r+1}]_0^{[f(y)]^{1/(r+1)}} = f(y),

which is the desired target distribution.
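Returning to Example 3.2, the rejection sampler on the bounding rectangle can be sketched as follows. This is our own illustration, not code from the thesis; NumPy is assumed and the function name is ours.

```python
import numpy as np

rng = np.random.default_rng(7)

def rou_normal(n):
    """Draw n samples from N(0, 1) by the ratio-of-uniforms method.

    (u, v) pairs are drawn uniformly on the rectangle [-1, 1] x (0, 1]
    bounding C_f; pairs inside C_f, i.e. with v < exp(-u^2 / (4 v^2)),
    are kept and transformed via x = u / v.
    """
    out = []
    while len(out) < n:
        u = rng.uniform(-1.0, 1.0, size=n)
        # keep v strictly positive to avoid dividing by zero below
        v = rng.uniform(np.finfo(float).tiny, 1.0, size=n)
        accept = v < np.exp(-(u / v) ** 2 / 4.0)
        out.extend((u[accept] / v[accept]).tolist())
    return np.array(out[:n])

sample = rou_normal(3000)
```

Note that no normalizing constant appears anywhere: the accept test uses only the kernel e^{−x^2/2}, illustrating the point made after Theorem 3.1.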
More informationWinter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo
Winter 2019 Math 106 Topics in Applied Mathematics Data-driven Uncertainty Quantification Yoonsang Lee (yoonsang.lee@dartmouth.edu) Lecture 9: Markov Chain Monte Carlo 9.1 Markov Chain A Markov Chain Monte
More informationBayesian Linear Regression
Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective
More informationModels for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data
Hierarchical models for spatial data Based on the book by Banerjee, Carlin and Gelfand Hierarchical Modeling and Analysis for Spatial Data, 2004. We focus on Chapters 1, 2 and 5. Geo-referenced data arise
More informationSTAT 425: Introduction to Bayesian Analysis
STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte
More informationAdaptive Monte Carlo methods
Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert
More informationControl Variates for Markov Chain Monte Carlo
Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability
More information10. Exchangeability and hierarchical models Objective. Recommended reading
10. Exchangeability and hierarchical models Objective Introduce exchangeability and its relation to Bayesian hierarchical models. Show how to fit such models using fully and empirical Bayesian methods.
More informationHierarchical Modeling for Univariate Spatial Data
Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This
More informationGeometric ergodicity of the Bayesian lasso
Geometric ergodicity of the Bayesian lasso Kshiti Khare and James P. Hobert Department of Statistics University of Florida June 3 Abstract Consider the standard linear model y = X +, where the components
More informationThe Metropolis-Hastings Algorithm. June 8, 2012
The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos Contents Markov Chain Monte Carlo Methods Sampling Rejection Importance Hastings-Metropolis Gibbs Markov Chains
More informationAdvances and Applications in Perfect Sampling
and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC
More informationMetropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9
Metropolis Hastings Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601 Module 9 1 The Metropolis-Hastings algorithm is a general term for a family of Markov chain simulation methods
More informationeqr094: Hierarchical MCMC for Bayesian System Reliability
eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167
More informationIntroduction to Bayesian methods in inverse problems
Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction
More informationBayesian GLMs and Metropolis-Hastings Algorithm
Bayesian GLMs and Metropolis-Hastings Algorithm We have seen that with conjugate or semi-conjugate prior distributions the Gibbs sampler can be used to sample from the posterior distribution. In situations,
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll
More informationBayesian Linear Models
Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public
More informationSemi-Parametric Importance Sampling for Rare-event probability Estimation
Semi-Parametric Importance Sampling for Rare-event probability Estimation Z. I. Botev and P. L Ecuyer IMACS Seminar 2011 Borovets, Bulgaria Semi-Parametric Importance Sampling for Rare-event probability
More informationSampling from complex probability distributions
Sampling from complex probability distributions Louis J. M. Aslett (louis.aslett@durham.ac.uk) Department of Mathematical Sciences Durham University UTOPIAE Training School II 4 July 2017 1/37 Motivation
More information16 : Approximate Inference: Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution
More informationMarkov Chain Monte Carlo in Practice
Markov Chain Monte Carlo in Practice Edited by W.R. Gilks Medical Research Council Biostatistics Unit Cambridge UK S. Richardson French National Institute for Health and Medical Research Vilejuif France
More informationComputer intensive statistical methods
Lecture 13 MCMC, Hybrid chains October 13, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university MH algorithm, Chap:6.3 The metropolis hastings requires three objects, the distribution of
More informationF denotes cumulative density. denotes probability density function; (.)
BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models
More informationBayes: All uncertainty is described using probability.
Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w
More informationSampling Algorithms for Probabilistic Graphical models
Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir
More informationBayesian Linear Models
Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationModeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study
Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study Gunter Spöck, Hannes Kazianka, Jürgen Pilz Department of Statistics, University of Klagenfurt, Austria hannes.kazianka@uni-klu.ac.at
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationLecture 6: Markov Chain Monte Carlo
Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen koskinen@nbi.ku.dk Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline
More informationA Bayesian perspective on GMM and IV
A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all
More informationMetropolis-Hastings Algorithm
Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet
Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 15-7th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Mixture and composition of kernels. Hybrid algorithms. Examples Overview
More informationUniversity of Toronto Department of Statistics
Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704
More informationThe Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations
The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture
More informationPhysics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester
Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation
More informationOverall Objective Priors
Overall Objective Priors Jim Berger, Jose Bernardo and Dongchu Sun Duke University, University of Valencia and University of Missouri Recent advances in statistical inference: theory and case studies University
More informationLECTURE 15 Markov chain Monte Carlo
LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte
More informationPattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods
Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs
More informationKernel adaptive Sequential Monte Carlo
Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline
More informationBayesian data analysis in practice: Three simple examples
Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to
More informationMCMC: Markov Chain Monte Carlo
I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov
More informationChapter 7. Markov chain background. 7.1 Finite state space
Chapter 7 Markov chain background A stochastic process is a family of random variables {X t } indexed by a varaible t which we will think of as time. Time can be discrete or continuous. We will only consider
More informationPractical Bayesian Optimization of Machine Learning. Learning Algorithms
Practical Bayesian Optimization of Machine Learning Algorithms CS 294 University of California, Berkeley Tuesday, April 20, 2016 Motivation Machine Learning Algorithms (MLA s) have hyperparameters that
More informationCTDL-Positive Stable Frailty Model
CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG,UK and 2 Centre of Biostatistics, University of Limerick, Ireland
More information1 Using standard errors when comparing estimated values
MLPR Assignment Part : General comments Below are comments on some recurring issues I came across when marking the second part of the assignment, which I thought it would help to explain in more detail
More informationA short introduction to INLA and R-INLA
A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk
More informationGeometric Ergodicity of a Random-Walk Metorpolis Algorithm via Variable Transformation and Computer Aided Reasoning in Statistics
Geometric Ergodicity of a Random-Walk Metorpolis Algorithm via Variable Transformation and Computer Aided Reasoning in Statistics a dissertation submitted to the faculty of the graduate school of the university
More informationSTA205 Probability: Week 8 R. Wolpert
INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and
More informationLikelihood-free MCMC
Bayesian inference for stable distributions with applications in finance Department of Mathematics University of Leicester September 2, 2011 MSc project final presentation Outline 1 2 3 4 Classical Monte
More informationSAMPLING ALGORITHMS. In general. Inference in Bayesian models
SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be
More information