The Pennsylvania State University
The Graduate School

RATIO-OF-UNIFORMS MARKOV CHAIN MONTE CARLO FOR GAUSSIAN PROCESS MODELS

A Thesis in Statistics
by
Chris Groendyke

© 2008 Chris Groendyke

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science

May 2008

The thesis of Chris Groendyke was reviewed and approved by the following:

Murali Haran
Assistant Professor of Statistics
Thesis Advisor

Donald Richards
Professor of Statistics
Associate Chair of the Department of Statistics

Runze Li
Associate Professor of Statistics
Graduate Program Chair

Signatures are on file in the Graduate School.

Abstract

We develop various Markov chain Monte Carlo (MCMC) methods based on the ratio-of-uniforms (ROU) transformation and show how they can be used in a Bayesian context to simulate from the posterior distribution of linear Gaussian process models. These models are very popular in many disciplines, but are particularly important for modeling spatial data. We show that these algorithms, in spite of requiring no tuning, perform well in practice. We describe how the algorithms can be used in conjunction with some recently developed methods to estimate standard errors of MCMC-based estimates accurately. The estimated standard errors can, in turn, be used to automatically decide when to stop the MCMC runs, thereby providing, in principle, a completely automated MCMC algorithm. We conclude with a study of the properties of these algorithms, using simulated as well as real data taken from the field of geosciences.

Table of Contents

List of Figures
List of Tables
Acknowledgments

Chapter 1  Introduction
    1.1 The Gaussian Process Model
    1.2 Bayesian Inference
    1.3 The Need for Automation

Chapter 2  Markov Chain Monte Carlo
    2.1 MCMC Theory
        2.1.1 Markov Chains
    2.2 The Metropolis-Hastings Algorithm
        2.2.1 Variable-at-a-time Metropolis-Hastings
    2.3 Monte Carlo Standard Errors and Stopping Rules
    2.4 Effective Sample Size

Chapter 3  Ratio-of-Uniforms Markov Chain Monte Carlo
    3.1 The Ratio-of-Uniforms Transformation
    3.2 Slice Sampling
    3.3 Multivariate Generalizations of the Ratio-of-Uniforms Transformation
    3.4 MCMC Using the Ratio-of-Uniforms Transformation
        3.4.1 Random Walk
        3.4.2 Stepping Out / Doubling
    3.5 Auto-tuning
        3.5.1 Random Walk
        3.5.2 Stepping Out / Doubling
        3.5.3 Starting Values
    3.6 Other Methods Using the ROU Transformation
        3.6.1 Hybrid ROU Approach
        3.6.2 Rejection Sampling in the ROU Region
        3.6.3 Adaptive Rejection Sampling in the ROU Region

Chapter 4  Comparative Study of Algorithms
    4.1 A Simulated Dataset
    4.2 A Geosciences Application

Chapter 5  Conclusions and Future Work
    5.1 Conclusions
    5.2 Future Work
        5.2.1 Further Exploration of ROU-MCMC Algorithms
        5.2.2 Hit and Run
        5.2.3 Theoretical Results
        5.2.4 Spatial Generalized Linear Models

Appendix A  Derivation of Posterior Distributions

Bibliography

List of Figures

1.1  Spatially correlated data with linear regression fit and kriging
2.1  Realization of a random walk Markov chain
2.2  The Metropolis-Hastings algorithm
2.3  Movement of the random walk Markov chain
2.4  Sample produced by the random walk Markov chain
2.5  The Gibbs algorithm for constructing a bivariate Markov chain
2.6  Movement of the random walk Markov chain using Gibbs updates
2.7  Sample produced by the random walk Markov chain using Gibbs updates
2.8  The IMSE algorithm for computing autocorrelation time
3.1  ROU region corresponding to univariate Uniform random variable
3.2  Standard Normal ROU Example
3.3  ROU region corresponding to bivariate Normal random variable
3.4  The random walk algorithm for generating a new point in the ROU space
3.5  The coordinate-at-a-time random walk algorithm for generating a new value for the i-th coordinate in the ROU space
3.6  The stepping out procedure for finding an interval (L, R) around the current point η_0 which contains the desired slice
3.7  The doubling procedure for finding an interval (L, R) around the current point η_0 which contains the desired slice
3.8  The procedure for generating a point in the slice from the proposal interval (L, R)
3.9  The doubling procedure for finding a proposal hyper-rectangle around the current point
3.10 The procedure for generating a point in the slice from a given proposal hyper-rectangle
3.11 The tuning procedure for the univariate random walk algorithm
3.12 The tuning procedure for the multivariate random walk algorithm
3.13 Empirical relationship between steps and shrinks for the stepping out procedure
3.14 The tuning procedure for the univariate stepping out algorithm
3.15 Empirical relationship between steps and shrinks for the doubling procedure
3.16 The tuning procedure for the univariate doubling algorithm
4.1  ACF plots for Univariate ROU Stepping Out Algorithm Run on Simulated Data
4.2  ACF plots for Slice Sampler Algorithm Run on Simulated Data
4.3  ACF plots for Multivariate Metropolis-Hastings Algorithm Run on Simulated Data
4.4  Estimated Posterior Densities for Parameter κ for Simulated Data
4.5  Estimated Posterior Densities for Parameter ψ for Simulated Data
4.6  Estimated Posterior Densities for Parameter φ for Simulated Data
4.7  Estimated Posterior Densities for Parameter β for Simulated Data
4.8  ACF plots for Univariate ROU Random Walk Algorithm Run on Geosciences Data
4.9  ACF plots for Slice Sampler Algorithm Run on Geosciences Data
4.10 ACF plots for Multivariate Metropolis-Hastings Algorithm Run on Geosciences Data
4.11 Estimated Posterior Densities for Parameter κ for Geosciences Data
4.12 Estimated Posterior Densities for Parameter ψ for Geosciences Data
4.13 Estimated Posterior Densities for Parameter φ for Geosciences Data
4.14 Estimated Posterior Densities for Parameter β for Geosciences Data
5.1  The Hit and Run procedure for generating a proposal interval (L, R)

List of Tables

2.1  First six trials of the Metropolis-Hastings random walk
2.2  First six trials of the Metropolis-Hastings random walk using Gibbs updates
4.1  Comparison of Algorithms Run on Simulated Data for Parameter κ
4.2  Comparison of Algorithms Run on Simulated Data for Parameter ψ
4.3  Comparison of Algorithms Run on Simulated Data for Parameter φ
4.4  Comparison of Algorithms Run on Simulated Data for Parameter β
4.5  Comparison of Algorithms Run on Geosciences Data for Parameter κ
4.6  Comparison of Algorithms Run on Geosciences Data for Parameter ψ
4.7  Comparison of Algorithms Run on Geosciences Data for Parameter φ
4.8  Comparison of Algorithms Run on Geosciences Data for Parameter β

Acknowledgments

The author is very grateful to Dr. Murali Haran for his guidance and efforts during the course of this research. In addition, the author thanks Klaus Keller and Josh Dorin for providing the Geosciences data used in this study. The author is also grateful to the following people for their helpful conversations and suggestions regarding this effort: K. Sham Bhat, Matthew Tibbits, Muhammad Atiyat, and Scott Roths.

Chapter 1
Introduction

Linear Gaussian process models are very flexible and widely applicable, and they have therefore been used as models for data in a number of disciplines. One of the areas in which these models are most commonly used is the modeling of spatially dependent data, and it is in this context that we apply the linear Gaussian process model in the current study. In addition to its applicability to many types of data, the linear Gaussian process model enjoys other significant advantages, notably a number of attractive theoretical properties (Cressie, 1993), some of which are described in Section 1.1.

Our main interest lies in inference for the parameters of this model. One approach would be to use frequentist methods; for instance, we might estimate the parameters by Maximum Likelihood Estimation (MLE). Another approach is to use Bayesian inference methods, which have a few notable benefits. First, they allow us to incorporate the uncertainty in our parameter estimates into the predictions we make. They also provide a natural framework for working with hierarchical or multi-level statistical models. Finally, Bayesian inference methods provide us with the ability to utilize prior information or beliefs about model parameters, if such information is available.

In the Bayesian approach, we assign prior distributions to each of the model parameters, and inference for each parameter is then based on its posterior distribution. In the ideal situation, this posterior distribution would be of a known form (or at least an unknown but analytically tractable form); we would then be able to perform inference directly, either using analytical methods or possibly by generating a sample from this posterior distribution. However, when we are not able to work with a tractable posterior distribution (as is the case in this study), we can instead resort to Markov chain Monte Carlo (MCMC). That is, we run a Markov chain that converges to the desired posterior distribution, and base our inference on the sample produced by this Markov chain. Some basic theory relating to Markov chain Monte Carlo methods is covered in Chapter 2.

The use of Markov chain Monte Carlo methods is very common in modern statistics.

However, the algorithms used here differ from typical applications of MCMC in that they couple MCMC theory with an auxiliary variable method known as the ratio-of-uniforms (ROU) transformation. Using MCMC methods in conjunction with the ROU transformation (henceforth ROU-MCMC) has been suggested by Tierney (2005) and Karawatzki et al. (2006). These authors discuss various strategies for ROU-MCMC, but only apply the algorithms to relatively simple examples. Here we consider a number of variants of ROU-MCMC in the context of fitting linear Gaussian process models, which can present computational challenges. The specific algorithms used for this study are discussed in Chapter 3.

1.1 The Gaussian Process Model

As noted above, the linear Gaussian process model has been used to model data from a wide spectrum of disciplines. One of the areas in which this model is commonly used is spatial statistics, in particular the study of geostatistical data, and this is the context in which we use the model in the present study. In geostatistical data, we work with a response variable Z, which is present over some continuous domain D ⊆ R^p (see Cressie (1993) or Schabenberger and Gotway (2005) for a more detailed discussion). We observe this process only at a finite number of points in D; we denote the points at which the process is observed by s_1, s_2, ..., s_n, so that the response variable at location s_i is given by Z(s_i). Let Z = (Z(s_1), ..., Z(s_n))^T. Then, if we assume that Z can be described using the linear Gaussian process model, we have

    Z ~ N(µ, Σ(Θ)),    (1.1)

with the mean vector µ given by µ = Xβ, where X is a matrix of covariates and β is the corresponding vector of regression parameters. Under this assumption, the probability density function (pdf) of the data is

    f_Z(z) = (2π)^(-n/2) |Σ(Θ)|^(-1/2) exp( -(1/2) (z - µ)^T Σ(Θ)^(-1) (z - µ) ).    (1.2)

For this study, we assume an exponential covariance structure, although other choices, such as the Matérn, could also be used. In this specification, Θ = (κ, ψ, φ) and Σ(Θ) = ψI + κH(φ), where {H(φ)}_{i,j} = exp(-‖s_i - s_j‖/φ), I is the identity matrix, and ‖s_i - s_j‖ is the distance between locations i and j. The most common distance metric used in this model is the Euclidean distance, which is the distance measure we use here as well. The basic idea of this model is that observations which are closer together will be more similar to each other (in terms of the values of their response variables) than those separated by a greater distance; the covariance model parameters serve to precisely describe the nature of this relationship.
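To make the covariance specification concrete, the following sketch (not part of the original thesis; function and variable names are illustrative) shows one way to assemble Σ(Θ) = ψI + κH(φ) with the exponential correlation function and to evaluate the log of the density in (1.2).

```python
import numpy as np
from scipy.spatial.distance import cdist

def exp_cov(coords, kappa, psi, phi):
    """Sigma(Theta) = psi * I + kappa * H(phi), with {H(phi)}_ij = exp(-||s_i - s_j|| / phi)."""
    d = cdist(coords, coords)              # pairwise Euclidean distances ||s_i - s_j||
    return psi * np.eye(len(coords)) + kappa * np.exp(-d / phi)

def gp_loglik(z, X, beta, kappa, psi, phi, coords):
    """Log of the Gaussian density in (1.2) for the linear Gaussian process model."""
    mu = X @ beta
    Sigma = exp_cov(coords, kappa, psi, phi)
    L = np.linalg.cholesky(Sigma)          # Cholesky factor; the O(n^3) cost per evaluation
    resid = np.linalg.solve(L, z - mu)     # solves L r = (z - mu), so resid @ resid = quadratic form
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    n = len(z)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + resid @ resid)
```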

Also note that the covariance parameters of this model have meaningful physical interpretations, so that inference about the model parameters can yield immediate physical conclusions. In geostatistical terms, the parameter κ represents the sill, the limiting value of the semivariance between two points as the distance between them becomes large. φ denotes the range parameter; the range is the minimum distance at which the sill is (effectively) attained. Finally, the parameter ψ is the nugget, and represents the amount of intrinsic variance not due to the distance between points (Schabenberger and Gotway, 2005).

As mentioned above, the linear Gaussian process model also has some beneficial theoretical properties. One of these is that its distribution is completely and uniquely determined by its mean vector and covariance matrix: in order to fully describe the distribution of a random vector following this model, we need only specify these two quantities. Another desirable property of this model is that weak stationarity is both necessary and sufficient for strong stationarity, whereas in general weak stationarity is only necessary, not sufficient (Cressie, 1993). In addition, a substantial body of asymptotic theory is available for Gaussian distributions.

It is also important to account for spatial correlation in data when such correlation exists. Failure to do so can lead to incorrect model assumptions, invalid parameter inference, and poor predicted values. For example, consider the following data set, which consists of 100 one-dimensional points simulated from a linear Gaussian process model. To demonstrate the importance of accounting for spatial dependence in the error structure of the data, we have fit both a standard linear regression model and a linear Gaussian process model to these data. The former model assumes independence between the data points, whereas the latter incorporates spatial dependence. The predicted values are superimposed on the data shown in Figure 1.1: the solid line shows the predicted values from the standard linear regression, and the dashed line gives predicted values obtained by kriging. (Performing prediction on geostatistical data, as we are doing in this example, is known as kriging (Schabenberger and Gotway, 2005).) We can see immediately that accounting for spatial dependence results in predictions that are much closer to the actual data points.

Figure 1.1: Spatially correlated data with linear regression fit and kriging.

1.2 Bayesian Inference

In classical frequentist inference, we treat the parameters of interest as fixed but unknown values and use the data to determine the best estimates of these parameters, using methods such as Maximum Likelihood Estimation (MLE) or the Method of Moments. These methods produce point estimates (perhaps with associated confidence intervals) for the parameters being estimated. Bayesian inference, on the other hand, treats the parameters as random variables rather than fixed, unknown values. To each parameter η, we assign a prior distribution which represents our prior beliefs about that parameter. We then use the data to update our beliefs about the parameter, producing a posterior distribution for η, and our inference regarding each parameter is based on its corresponding posterior distribution. The updating of the distributions of the parameters is performed using Bayes' theorem. We denote the prior distribution for the parameter η by π(η), the likelihood function by f(Z | η), and the posterior distribution of η by π(η | Z). Then by Bayes' rule, we have

    π(η | Z) = f(Z | η) π(η) / ∫ f(Z | η) π(η) dη    (1.3)
             ∝ f(Z | η) π(η).    (1.4)

The denominator of (1.3) is known as the normalizing constant. One beneficial feature of many MCMC methods is that they do not require us to know (or compute) this normalizing constant; in these cases it is sufficient to work with the density kernel given by (1.4).
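As a concrete illustration of working with the kernel in (1.4), the following sketch (not from the thesis) evaluates an unnormalized log-posterior for the covariance parameters of the model in Section 1.1, reusing the gp_loglik function sketched earlier. The prior below is a placeholder assumption chosen only to keep the example short, and β is held fixed here for the same reason; the thesis's actual prior specification and posterior derivations are not reproduced in this sketch.

```python
import numpy as np

def log_prior(kappa, psi, phi):
    """Illustrative placeholder prior: independent unit-rate exponential priors on the
    positive parameters (not the thesis's prior choice)."""
    if min(kappa, psi, phi) <= 0:
        return -np.inf                      # zero prior mass outside the support
    return -(kappa + psi + phi)             # log of exp(-kappa) * exp(-psi) * exp(-phi)

def log_posterior(theta, z, X, beta, coords):
    """Unnormalized log-posterior: log f(Z | eta) + log pi(eta), cf. (1.4)."""
    kappa, psi, phi = theta
    lp = log_prior(kappa, psi, phi)
    if not np.isfinite(lp):
        return -np.inf
    return lp + gp_loglik(z, X, beta, kappa, psi, phi, coords)
```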

1.3 The Need for Automation

One practical problem that arises with MCMC algorithms such as the Metropolis-Hastings algorithm is that they often require a substantial amount of tuning by the user. Tuning refers to the repeated adjustment of various auxiliary parameters, often known as tuning parameters, and this tuning stage can be expensive in terms of the time and effort required of the user. For example, the standard Metropolis-Hastings algorithm requires the user to specify a proposal distribution for each parameter or block of parameters being updated. To increase the efficiency of the algorithm, the user may need to experiment with both the form and the parameters of these proposal distributions. Worse yet, these adjustments are dataset-specific, meaning that they must be repeated for each new dataset on which the algorithm is used. Using the ratio-of-uniforms transformation in conjunction with MCMC algorithms offers the possibility of automating this tuning process, freeing users of the burden of designing and tuning MCMC algorithms for each new data set.

A related problem, which also requires the intervention of the user, is determining how long to run the Markov chain. Even when we know that a Markov chain will eventually converge to the correct target distribution, there remains the question of how many trials it may take before this convergence can be deemed to have occurred. The user is often forced to rely on ad hoc methods to make this judgment. This situation is clearly not ideal; it would be preferable to have clear, theoretically justified rules which tell us how many trials of our algorithms are sufficient. Fortunately, we are able to use recently developed methods on fixed-width MCMC to accurately assess the standard errors of our estimates and to determine stopping rules for our algorithms.

Thus, one advantage of the ROU-MCMC idea is that it offers the potential of producing a completely automated algorithm, that is, an algorithm which requires no user intervention either to tune the algorithm or to decide the length of the chain, but which nonetheless retains desirable theoretical and practical properties. We consider the implementation of this idea in the context of linear Gaussian process models, which are very important and popular and hence would benefit greatly from more efficient and/or automated MCMC algorithms.

In this thesis, we explore several different types of Markov chain Monte Carlo algorithms for sampling from the posterior distribution of a linear Gaussian process model. We implement some algorithms based on the ratio-of-uniforms transformation, as well as some standard algorithms, such as a standard Metropolis-Hastings algorithm, and compare their performances. We also discuss ideas for the automation of some of these algorithms: both on the front end, by having the algorithm tune itself, and on the back end, by using estimates of Monte Carlo standard errors to determine how long to run the Markov chain.

The remainder of the thesis is organized as follows. In Chapter 2, we outline some basic theory of Markov chain Monte Carlo methods, as well as the estimation of Monte Carlo standard errors and how these standard errors can be used to construct stopping rules. In Chapter 3, we introduce the ratio-of-uniforms transformation and discuss how it can be used in conjunction with Markov chain Monte Carlo methods. Chapter 4 compares the performance of the various algorithms on simulated and real data. Finally, Chapter 5 contains the conclusions of this study and ideas for future work.

Chapter 2
Markov Chain Monte Carlo

2.1 MCMC Theory

Before describing the Markov chain Monte Carlo algorithms used in this study, it is first necessary to briefly review some basic theory of Markov chains and how they can be used to construct MCMC algorithms. More detailed discussions of MCMC theory can be found in Tierney (1994) and Robert and Casella (2004), while Geyer (1992) discusses some of the practical aspects of constructing MCMC algorithms.

2.1.1 Markov Chains

A Markov chain is a sequence of random variables {X^(i)}, i ≥ 1, having the property that the distribution of each random variable depends, at most, on the value of the previous random variable. That is, {X^(i)} is a Markov chain if

    P(X^(i+1) ∈ A | X^(1), ..., X^(i)) = P(X^(i+1) ∈ A | X^(i))

for any set A (Casella and Berger, 2002), where X^(j) denotes the j-th step of the Markov chain. This property proves very useful in the construction of MCMC algorithms; in particular, the lack of dependence on earlier random variables allows us to generate the next value in the chain using only its current value, rather than having to consider all previous values of the sequence.

In the construction of Markov chains, we make use of a transition kernel. The transition kernel specifies the likelihood of the sequence moving from the current value of the random variable to each of the possible values that the next random variable in the sequence could take. It takes the form of a conditional density function, specifying the probability density for all values of the next step in the chain, given the current value of the chain. Note that for this study, all of the random variables we consider have continuous distributions; we therefore concern ourselves only with the continuous case, and do not explore the theory of Markov chains on discrete

state spaces. Given a transition kernel, we can construct a Markov chain by choosing an initial starting point for the chain, and then using the transition kernel to govern the probabilities of moving to future states.

Example 2.1. As a simple example of constructing a Markov chain, consider a random walk model (Robert and Casella, 2004). For this model, we have the relationship X^(n+1) = X^(n) + ε^(n), where ε^(n) is a random variable whose distribution is independent of the {X^(i)} values. For this example, we assume that ε^(n) ~ N(0, 1), so that X^(n+1) ~ N(x^(n), 1), where x^(n) is the realized value of X^(n), i.e., the previous value of the Markov chain. Thus, the transition kernel for this model is given in (2.1):

    P(X^(n+1) = x | X^(1) = x^(1), ..., X^(n) = x^(n)) = P(X^(n+1) = x | X^(n) = x^(n)) = (1/√(2π)) exp( -(x - x^(n))²/2 ).    (2.1)

To complete the specification of the chain, we also need to assign a starting value, that is, a value for X^(1); for this example we set x^(1) = 0. Now we can generate each subsequent value of the chain using the transition kernel given in (2.1), conditioning on the current value of the chain. Thus, to generate X^(2) we simulate from X^(2) ~ N(x^(1), 1) = N(0, 1). Once we have a value for X^(2) (call it x^(2)), we continue by simulating X^(3) ~ N(x^(2), 1). We can continue to build a Markov chain of any desired length in this manner. A plot of the first 1,000 values of one possible realization of this Markov chain is shown in Figure 2.1.

Figure 2.1: Realization of a random walk Markov chain.
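The following short sketch (not part of the original thesis; the function name, seed, and chain length are illustrative) simulates the random walk chain of Example 2.1.

```python
import numpy as np

def simulate_random_walk(n_steps=1000, x1=0.0, seed=0):
    """Simulate Example 2.1: X^(n+1) ~ N(x^(n), 1), started at x^(1) = x1."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = x1
    for i in range(1, n_steps):
        x[i] = rng.normal(loc=x[i - 1], scale=1.0)   # draw from the transition kernel (2.1)
    return x

chain = simulate_random_walk()   # 1,000 values, as plotted in Figure 2.1
```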

Generally, when we construct a Markov chain, we hope that it will eventually converge to a particular target distribution. In some circumstances, we can ensure this through the way the Markov chain is constructed. We now briefly explore some of the conditions needed for this to take place, starting by defining some properties of Markov chains.

The distribution π is the stationary (invariant) distribution of a Markov chain if

    lim_{n→∞} P(X^(n) ∈ A | X^(1) = x^(1)) = π(A)

for almost all sets A and points x^(1). Now denote the 1-step transition kernel by P and the n-step transition kernel by P^n. That is, given that the chain is currently at x, the conditional probability that the next point will fall within the set A is P(x, A); similarly, the conditional probability that the chain will be at a point in the set A after n steps is P^n(x, A). Then we can say that π is the stationary distribution of a Markov chain with transition kernel P if

    lim_{n→∞} P^n(x, A) = π(A)

for almost all x. This terminology means that the chain is stationary in its distribution, i.e., X^(i) ~ π implies that X^(i+j) ~ π for all j.

A Markov chain is said to be irreducible if it has positive probability of moving to any set A for which π(A) > 0. Thus, an irreducible Markov chain is one in which all states communicate with one another. This is clearly an important property in the construction of MCMC algorithms: in order to have any chance of fully exploring the state space, the Markov chain must be able to reach all states, that is, it needs to be irreducible.

Another important property of a Markov chain is its period. A Markov chain is called periodic if there exist states to which the chain can only move at particular, regularly spaced times. For example, if a Markov chain can only take a value in a set A every fourth step, then this chain is periodic with period four. Irreducible Markov chains which are not periodic are known as aperiodic.

A concept which will be important in the discussion of the convergence of Markov chains is recurrence. An irreducible Markov chain {X^(n)} with invariant distribution π is said to be recurrent if, for each set A such that π(A) > 0, we have P(X^(n) ∈ A i.o. | X^(1) = x) = 1 for π-almost all x and P(X^(n) ∈ A i.o. | X^(1) = x) > 0 for all x (Tierney, 1994), where i.o. stands for infinitely often. Intuitively, recurrence means that the expected number of times that the chain returns to any set of positive measure is infinite. A slightly stronger property than recurrence is Harris recurrence: a Markov chain {X^(i)} is called Harris recurrent if P(X^(i) ∈ A i.o. | X^(1) = x) = 1 for all x and every set A of positive measure. A Harris recurrent chain will return to every set of positive measure infinitely often with probability one. If there is an invariant finite measure for an irreducible Markov chain, then the chain is called positive recurrent. Markov chains which are recurrent but not positive recurrent are called null recurrent.

A Markov chain which is positive recurrent and aperiodic is said to be ergodic. Intuitively, an ergodic Markov chain is one whose invariant distribution π is independent of the initial conditions of the chain (Robert and Casella, 2004). Similarly, a Markov chain which is both Harris recurrent and aperiodic is known as a Harris ergodic chain. The conditions which assure the convergence of a Markov chain to the stationary distribution π are given in Theorem 2.1, known as the Ergodic Theorem (a form of the Law of Large Numbers for Markov chains).

Theorem 2.1. If a Markov chain with n-step transition kernel P^n is Harris ergodic and irreducible, then

    lim_{n→∞} ‖P^n - π‖_TV = 0,

where ‖·‖_TV denotes the total variation norm, that is, ‖f_1 - f_2‖_TV = sup_A |f_1(A) - f_2(A)|, where the supremum is taken over all measurable sets A.

Proof. See Athreya et al. (1996).

The Ergodic Theorem, while guaranteeing convergence of the Markov chain, unfortunately does not specify the rate of this convergence. In other words, while it assures us that the given Markov chain will indeed eventually converge to π, it does not give any indication of how long this convergence might take, nor even provide an upper bound on this length of time. Clearly, this is an important point: if our goal in constructing the Markov chain is that it converge to a given stationary distribution π, we would like some indication of when this might occur, so that we have an idea of how long to run the Markov chain. To address this issue, we can consider more stringent forms of ergodicity which bound the rate of convergence of a Markov chain to its stationary distribution π. Uniform ergodicity and geometric ergodicity are two such stronger types of ergodicity. Specifically, a Markov chain with invariant distribution π is geometrically ergodic if there is a function M(·) and a constant r, 0 < r < 1, such that

    ‖P^n(x, ·) - π(·)‖_TV ≤ M(x) r^n for all x (Tierney, 1994).

Furthermore, the chain is uniformly ergodic if there is a constant M and a constant r, 0 < r < 1, such that

    ‖P^n(x, ·) - π(·)‖_TV ≤ M r^n for all x (Tierney, 1994).

Clearly, uniform ergodicity is stronger than geometric ergodicity, and in fact the former implies the latter.

Once we have run the Markov chain and have the corresponding sample {X^(i)}, we can use this sample to estimate expectations of functions of the random variable. In particular, we estimate E_π(g) (the expectation of the function g with respect to the stationary distribution π) by the corresponding sample mean

    ḡ_n = (1/n) Σ_{i=1}^n g(X_i).

While ḡ_n will necessarily be an imperfect estimate of E_π(g), under regularity conditions we can bound this discrepancy via a type of Central Limit Theorem for Markov chains.

Theorem 2.2. Under regularity conditions,

    √n (ḡ_n - E_π(g)) → N(0, σ²_g) in distribution,

where σ²_g = Var_π(g(X_1)) + 2 Σ_{i=2}^∞ Cov_π(g(X_1), g(X_i)), and the variance and covariance calculations are performed with respect to the distribution π.

Proof. See Tierney (1994) and Nummelin (1984).

Two examples of regularity conditions that guarantee this Central Limit Theorem are (Roberts and Rosenthal, 2004): (i) {X_i} is geometrically ergodic and E_π|g|^(2+δ) < ∞ for some δ > 0, or (ii) {X_i} is uniformly ergodic and E_π g² < ∞; we note that these are not the only such conditions. The importance of establishing this Central Limit Theorem is that it allows us to estimate σ²_g, the variability of ḡ_n, so that we can get some idea of the quality of our estimate ḡ_n. Although there are many different methods of estimating σ²_g, here we will only consider the batch means method, which is described in Section 2.3.

2.2 The Metropolis-Hastings Algorithm

Perhaps the most commonly used Markov chain Monte Carlo method is the Metropolis-Hastings algorithm. The basic idea of this algorithm is that instead of constructing the Markov chain directly from the target distribution, the state transitions are guided by a different distribution, known as the proposal distribution. Of course, using transition probabilities from the proposal distribution rather than the target distribution would, by itself, cause the Markov chain to converge to the wrong stationary distribution. The algorithm adjusts for this by sometimes staying at the current state, rather than moving to the state selected by the proposal distribution; this adjustment ensures that the algorithm does indeed converge to the correct target distribution.

Suppose that our target distribution (the distribution we are interested in sampling from) is π, and that the proposal distribution is q(x, y), also written q(y | x). In both notations, x represents the current value of the Markov chain, whereas y is a possible next value of the chain. If the chain is at the point X^(n) = x, we define the acceptance probability as

    α(x, y) = min{ π(y) q(y, x) / [π(x) q(x, y)], 1 },    (2.2)

unless π(x) q(x, y) = 0, in which case we set α(x, y) = 1. We generate a proposal y from q(· | x), accept the proposal with probability α(x, y), and reject it otherwise. If we accept the proposal, then it becomes the next point in the Markov chain; if we reject it, then the current point is used as the next point in the chain. This algorithm, which

was originally introduced by Metropolis et al. (1953) and later generalized by Hastings (1970), is described in Figure 2.2.

    Input:  x^(n) = current value of the Markov chain
            q(x, y) = proposal distribution
        x* ~ q(x^(n), ·)
        a ← α(x^(n), x*)
        V ~ Uniform(0, 1)
        if (V < a) then x^(n+1) ← x*
        else x^(n+1) ← x^(n)
    Output: x^(n+1) = new value of the Markov chain

Figure 2.2: The Metropolis-Hastings algorithm.

It is common to use a symmetric proposal distribution, so that q(x, y) = q(y, x). In this case, (2.2) reduces to

    α(x, y) = min{ π(y) / π(x), 1 },    (2.3)

which simplifies the calculation of the acceptance probability. This is often referred to as a Metropolis update. Also note that both (2.2) and (2.3) depend on the distribution π(·) only through the ratio π(y)/π(x). It is for this reason that we need only specify the kernel of π(·); the normalizing constants cancel in this ratio.
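The following sketch (not from the thesis) is one way to express the update of Figure 2.2 in code, for a symmetric Normal random-walk proposal so that the Metropolis ratio (2.3) applies. It works on the log scale for numerical stability; the function names and proposal scale are illustrative assumptions.

```python
import numpy as np

def metropolis_step(x, log_target, proposal_sd, rng):
    """One Metropolis update (Figure 2.2) with a symmetric Normal random-walk proposal."""
    x_prop = rng.normal(loc=x, scale=proposal_sd)      # draw x* ~ q(x^(n), .)
    log_a = log_target(x_prop) - log_target(x)         # log of pi(y)/pi(x), cf. (2.3)
    if np.log(rng.uniform()) < log_a:                  # accept with probability min(1, ratio)
        return x_prop
    return x

def run_chain(x1, log_target, n_steps, proposal_sd=1.0, seed=0):
    """Run the chain for n_steps trials starting from x1."""
    rng = np.random.default_rng(seed)
    chain = [x1]
    for _ in range(n_steps - 1):
        chain.append(metropolis_step(chain[-1], log_target, proposal_sd, rng))
    return np.array(chain)
```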

Example 2.2. As an example of the Metropolis-Hastings algorithm, consider the problem of generating a random sample uniformly on the unit circle C centered at the origin. In this case, our target distribution is

    π(x_1, x_2) = [1 / area(C)] I((x_1, x_2) ∈ C) = (1/π) I(x_1² + x_2² < 1),    (2.4)

where I(·) denotes the indicator function. For the proposal distribution, we use a two-dimensional Normal distribution whose mean vector is the current point and whose covariance matrix is the identity matrix. Thus, if the Markov chain is currently at X^(n) = (x_1^(n), x_2^(n)), then our proposal distribution is

    q(y_1, y_2 | x_1, x_2) = (1/(2π)) exp( -(1/2) [ (y_1 - x_1)² + (y_2 - x_2)² ] ),

which is a symmetric distribution, enabling us to use the simpler form of the acceptance probability given in (2.3). To initialize the Metropolis-Hastings algorithm, we must choose a starting value for the Markov chain; we will start at the origin, so that X^(1) = (x_1^(1), x_2^(1)) = (0, 0). We then run the algorithm for as many trials as desired. Note that for this example, (2.3) becomes

    α((x_1, x_2), (y_1, y_2)) = min{ π(y_1, y_2) / π(x_1, x_2), 1 }
                              = min{ (1/π) I(y_1² + y_2² < 1) / [(1/π) I(x_1² + x_2² < 1)], 1 }
                              = min{ I(y_1² + y_2² < 1) / I(x_1² + x_2² < 1), 1 }
                              = min{ I(y_1² + y_2² < 1), 1 }
                              = I(y_1² + y_2² < 1),    (2.5)

since I(x_1² + x_2² < 1) equals 1: the current point X^(n) = (x_1^(n), x_2^(n)) lies in the unit circle because it is the current state of the Markov chain. Now notice that (2.5) will be either 0 or 1, depending on whether the proposed point is in the unit circle. If the proposed point is within the unit circle, the acceptance probability (2.5) is 1, so the proposed point is automatically accepted; if it lies outside the unit circle, it is always rejected. Thus, for this random walk algorithm, deciding whether a proposed point should be accepted or rejected reduces to checking whether the proposed point lies within the unit circle, which is a simple calculation.

For demonstration purposes, we run this MCMC algorithm for 100 trials. The results of the first six trials are shown in Table 2.1, and the movement of the Markov chain over these six trials is shown in Figure 2.3, along with the boundary of the region C from which we are trying to sample.

Table 2.1: First six trials of the Metropolis-Hastings random walk.

    Trial   Location of Markov chain   Proposed Point     Point Accepted?
    1       (0.000, 0.000)             (-0.140, 0.827)    YES
    2       (-0.140, 0.827)            (0.706, )          YES
    3       (0.706, )                  (0.557, )          NO
    4       (0.706, )                  (0.608, 0.461)     YES
    5       (0.608, 0.461)             (0.256, 0.167)     YES
    6       (0.256, 0.167)             (-0.826, )         NO

Figure 2.3: Movement of the random walk Markov chain.

A plot of the entire sample of 100 points is shown in Figure 2.4. These points do indeed appear to be distributed uniformly across the unit circle, as we would hope. Note, however, that there are fewer than 100 distinct points on the plot: some points are duplicates, resulting from trials in which the proposed point fell outside the unit circle and was rejected, so that the Markov chain remained at its current location rather than moving to a new point. We should also note that this simple example is presented for demonstration purposes only; if we actually wanted to generate a random sample from the bivariate Uniform distribution on the unit circle, there are many more efficient algorithms than the random walk given here. In fact, for this case, it is unlikely that we would use any type of Markov chain algorithm at all, since it is simple to produce an i.i.d. (independent and

identically distributed) sample from this distribution. Finally, note that in general, most Markov chain Monte Carlo algorithms are run for far more than 100 trials; this few trials will typically not be sufficient to produce a reasonable sample from the target distribution.

Figure 2.4: Sample produced by the random walk Markov chain.
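A compact sketch of Example 2.2 (not from the thesis; the seed and variable names are illustrative). It applies the acceptance rule (2.5) directly: a proposed point is kept only if it falls inside the unit circle.

```python
import numpy as np

rng = np.random.default_rng(1)
chain = [np.zeros(2)]                                # start at the origin, X^(1) = (0, 0)
for _ in range(99):                                  # 100 trials in total, as in Table 2.1
    prop = rng.normal(loc=chain[-1], scale=1.0)      # bivariate Normal proposal, identity covariance
    # Acceptance probability (2.5): accept iff the proposal lies inside the unit circle.
    chain.append(prop if prop[0] ** 2 + prop[1] ** 2 < 1 else chain[-1])
chain = np.array(chain)
```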

2.2.1 Variable-at-a-time Metropolis-Hastings

Variable-at-a-time Metropolis-Hastings algorithms, of which the Gibbs sampler (Gelfand and Smith, 1990) is a special case, can be particularly helpful when we are attempting to construct a multivariate Markov chain. This class of samplers is often beneficial because it allows us to update the variables in the Markov chain individually, rather than having to update all of them at once. Suppose that we are trying to construct a Markov chain which converges to a stationary distribution π(x_1, x_2). We first let

    π_X1(x_1) = ∫ π(x_1, x_2) dx_2   and   π_X2(x_2) = ∫ π(x_1, x_2) dx_1

be the marginal distributions associated with π(x_1, x_2). Then the conditional distributions of the two variables are

    π_{X1|X2}(x_1 | x_2) = π(x_1, x_2) / π_X2(x_2)   and   π_{X2|X1}(x_2 | x_1) = π(x_1, x_2) / π_X1(x_1).

We can then sample x_1 and x_2 individually, each conditional upon the other: we first sample x_1 from π_{X1|X2}(x_1 | x_2) and then sample x_2 from π_{X2|X1}(x_2 | x_1). Sampling from these conditional distributions (rather than the full joint distribution) can lead to gains in efficiency, especially when the conditional distributions have recognizable forms or are much easier to generate samples from. To produce the sampled points from each of these conditional distributions, we can use univariate Metropolis-Hastings methods, rejection samplers, or, if the conditional distributions have recognized forms, direct sampling from one or more of them. We need not use the same updating method for each of the variables; we can choose any univariate updating scheme that is appropriate for the given variable. This procedure is shown in Figure 2.5 for the case of a bivariate Markov chain, and can easily be extended to Markov chains of any finite dimension.

    Input:  (x_1^(n), x_2^(n)) = current value of the Markov chain
            π_{X1|X2}(x_1 | x_2) = conditional distribution of X_1 | X_2
            π_{X2|X1}(x_2 | x_1) = conditional distribution of X_2 | X_1
        x_1^(n+1) ~ π_{X1|X2}(x_1 | X_2 = x_2^(n))
        x_2^(n+1) ~ π_{X2|X1}(x_2 | X_1 = x_1^(n+1))
    Output: (x_1^(n+1), x_2^(n+1)) = new value of the Markov chain

Figure 2.5: The Gibbs algorithm for constructing a bivariate Markov chain.
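A generic sketch of the scheme in Figure 2.5 (not from the thesis; names are illustrative). Each coordinate is updated in turn with its own univariate Metropolis step targeting the corresponding full conditional, which is one of the allowed updating schemes mentioned above.

```python
import numpy as np

def variable_at_a_time_step(x, log_target, proposal_sds, rng):
    """Update each coordinate of x in turn with a univariate Metropolis step.
    log_target is the log-kernel of the joint distribution pi; with the other coordinates
    held fixed, it is proportional to the full conditional of the coordinate being updated,
    so no explicit conditional densities are needed."""
    x = x.copy()
    for i in range(len(x)):
        prop = x.copy()
        prop[i] = rng.normal(loc=x[i], scale=proposal_sds[i])   # univariate Normal proposal
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop                                            # accept the coordinate update
    return x
```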

Example 2.3. As an example, we use the Gibbs algorithm to sample uniformly from the unit circle C centered at the origin. Note that this is the same target distribution as in the previous example: we will construct a Markov chain that converges to the distribution given in (2.4). Instead of updating both coordinates simultaneously as before, however, the Gibbs algorithm updates the coordinates individually. To do this we need the appropriate conditional distribution for each variable, so we first solve for the marginal distributions:

    π_X1(x_1) = ∫ π(x_1, x_2) dx_2
              = ∫ (1/π) I(x_1² + x_2² < 1) dx_2
              = (1/π) ∫ I(x_2² < 1 - x_1²) dx_2
              = (1/π) ∫ from -√(1 - x_1²) to √(1 - x_1²) dx_2
              = (2/π) √(1 - x_1²) I(-1 < x_1 < 1).

Similarly, we find that

    π_X2(x_2) = (2/π) √(1 - x_2²) I(-1 < x_2 < 1).

Then we can solve for the conditional distributions corresponding to each of these variables:

    π_{X1|X2}(x_1 | x_2) = π(x_1, x_2) / π_X2(x_2)
                         = (1/π) I(x_1² + x_2² < 1) / [ (2/π) √(1 - x_2²) I(-1 < x_2 < 1) ]
                         = [1 / (2√(1 - x_2²))] I(-√(1 - x_2²) < x_1 < √(1 - x_2²)).

Likewise,

    π_{X2|X1}(x_2 | x_1) = [1 / (2√(1 - x_1²))] I(-√(1 - x_1²) < x_2 < √(1 - x_1²)).

Inspecting these distributions, we can see that, conditional upon the value of the other coordinate, each coordinate has a uniform distribution, with limits determined by the value of the other coordinate; these limits correspond to the boundary of the unit circle.

In this Gibbs sampler, we update each coordinate via a univariate Metropolis-Hastings step. In order to do this, we must specify a proposal distribution for each coordinate; we use a univariate Normal distribution whose mean is the current value of the corresponding coordinate and whose variance is 1. Thus q_X1(y | x_1) is N(x_1, 1) and q_X2(y | x_2) is N(x_2, 1). Now we can calculate the acceptance probabilities for each of the Metropolis-Hastings updates. In both cases, the proposal distributions are symmetric, so we can use the simplified version of the acceptance probability given in (2.3):

    α_{X1|X2}(x, y) = min{ π_{X1|X2}(y) / π_{X1|X2}(x), 1 }
                    = min{ [1/(2√(1 - x_2²))] I(-√(1 - x_2²) < y < √(1 - x_2²)) / ( [1/(2√(1 - x_2²))] I(-√(1 - x_2²) < x < √(1 - x_2²)) ), 1 }
                    = min{ I(-√(1 - x_2²) < y < √(1 - x_2²)) / I(-√(1 - x_2²) < x < √(1 - x_2²)), 1 }
                    = min{ I(-√(1 - x_2²) < y < √(1 - x_2²)), 1 }
                    = I(-√(1 - x_2²) < y < √(1 - x_2²)).

As was the case in the previous example, this acceptance probability will always be either 0 or 1, depending on whether or not the proposed point lies within the unit circle: if it does, we accept it; if not, we reject it and this coordinate of the Markov chain remains at its current value. Also note that I(-√(1 - x_2²) < x < √(1 - x_2²)) will always equal 1 by virtue of the current point lying within the unit circle. Similarly, the acceptance probability for the other coordinate is

    α_{X2|X1}(x, y) = I(-√(1 - x_1²) < y < √(1 - x_1²)).

To complete the specification of this algorithm, we must assign a starting value to the Markov chain. As before, we start the chain at the origin, so that X^(1) = (x_1^(1), x_2^(1)) = (0, 0), and we run the Markov chain for 100 trials (i.e., 50 updates of each coordinate). The first six trials are shown in Table 2.2, and the movement of the Markov chain over these six trials is shown in Figure 2.6, along with the boundary of the region C from which we are trying to sample.

Table 2.2: First six trials of the Metropolis-Hastings random walk using Gibbs updates.

    Trial   Location of Markov chain   Proposed Point     Point Accepted?
    1       (0.000, 0.000)             (-0.472, 0.000)    YES
    2       (-0.472, 0.000)            (-0.472, 0.402)    YES
    3       (-0.472, 0.402)            (0.364, 0.402)     YES
    4       (0.364, 0.402)             (0.364, )          YES
    5       (0.364, )                  (-0.534, )         YES
    6       (-0.534, )                 (-0.534, )         NO

Figure 2.6: Movement of the random walk Markov chain using Gibbs updates.

A plot of the entire sample is shown in Figure 2.7.

Figure 2.7: Sample produced by the random walk Markov chain using Gibbs updates.
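A sketch of Example 2.3 (not from the thesis; the seed and variable names are illustrative), updating each coordinate with the univariate Metropolis step described above.

```python
import numpy as np

rng = np.random.default_rng(2)
x1, x2 = 0.0, 0.0                 # start at the origin, X^(1) = (0, 0)
chain = [(x1, x2)]
for _ in range(50):               # 50 sweeps = 100 coordinate updates, as in Example 2.3
    # Update x1 | x2: Normal(x1, 1) proposal, accepted iff it stays inside the circle.
    y = rng.normal(loc=x1, scale=1.0)
    if y ** 2 < 1.0 - x2 ** 2:    # acceptance probability alpha_{X1|X2} from the example
        x1 = y
    chain.append((x1, x2))
    # Update x2 | x1 in the same way.
    y = rng.normal(loc=x2, scale=1.0)
    if y ** 2 < 1.0 - x1 ** 2:    # acceptance probability alpha_{X2|X1}
        x2 = y
    chain.append((x1, x2))
```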

2.3 Monte Carlo Standard Errors and Stopping Rules

In Section 2.1 we mentioned some of the properties of Markov chains, including circumstances under which we can state a Central Limit Theorem for Markov chains, which was given in Theorem 2.2. If we can estimate σ²_g, the Central Limit Theorem lets us assess the accuracy of any of the calculations we base on the Markov chain. While there are many potential methods available for calculating σ̂²_g (an estimate of σ²_g), the one we focus on here is the consistent batch means (CBM) method, as described in Jones et al. (2006) and Flegal et al. (2008); our discussion of this method closely follows the description given by Flegal et al. (2008).

Using the batch means method, to compute σ̂²_g based on a Markov chain run for n trials, we first split these n trials into a number of batches. In particular, we let a be the number of batches and b the number of trials in each batch, so that n = ab. We then compute the sample mean of each batch as

    Ȳ_j = (1/b) Σ_{i=(j-1)b+1}^{jb} g(X_i),   j = 1, ..., a.

Then the estimate of σ²_g is given by

    σ̂²_g = [b / (a - 1)] Σ_{j=1}^{a} (Ȳ_j - ḡ_n)².    (2.6)

Note that in general, for arbitrary values of a and b, the estimator defined in (2.6) will not be consistent for σ²_g. However, there are choices of a and b that do assure consistency of this estimator; one such choice is to let b = ⌊√n⌋ and a = ⌊n/b⌋.
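A sketch of the CBM estimator (2.6) (not from the thesis; the function name is illustrative), using the b = ⌊√n⌋ batch-size choice mentioned above. The number of batches a is returned as well, since it is needed later for the t quantile in (2.7).

```python
import numpy as np

def batch_means_variance(g_values):
    """Consistent batch means estimate of sigma^2_g from the chain output g(X_1), ..., g(X_n)."""
    n = len(g_values)
    b = int(np.floor(np.sqrt(n)))        # batch length b = floor(sqrt(n))
    a = n // b                           # number of batches; trailing values beyond a*b are dropped
    batches = np.asarray(g_values[: a * b]).reshape(a, b)
    batch_means = batches.mean(axis=1)   # Y-bar_j for j = 1, ..., a
    g_bar = batches.mean()               # overall mean of the values actually used
    sigma2_hat = b / (a - 1) * np.sum((batch_means - g_bar) ** 2)   # equation (2.6)
    return sigma2_hat, a
```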

Once we have an estimate of σ²_g, we can use it to create a confidence interval for E_π g. Specifically, if σ̂²_g is the estimator defined in (2.6), then

    t_{a-1} √(σ̂²_g / n)    (2.7)

is the half-width of an asymptotically valid confidence interval for E_π g, where t_{a-1} is the desired quantile from a t distribution with a - 1 degrees of freedom (Jones et al., 2006).

We can now use σ̂²_g to help devise stopping rules for an MCMC simulation. That is, we can use (2.7) to create guidelines which tell us how long to run a Markov chain in order to produce estimates of a desired accuracy. We must first specify the desired level of accuracy ε for our estimate; to achieve this level of accuracy, we want the half-width given in (2.7) to be no greater than ε. Thus our stopping rule is as follows: at periodic intervals (i.e., every k trials for some pre-specified value of k) we calculate σ̂²_g using (2.6), and we stop the simulation only if the associated confidence interval is narrow enough. In particular, we stop if

    t_{a_n - 1} √(σ̂²_g / n) + p(n) ≤ ε,    (2.8)

where p(n) = ε I(n < n*), with n* being a number chosen beforehand. (2.8) makes use of the half-width formula given in (2.7), with the addition of the p(n) term; this term ensures that the simulation is not stopped prematurely because σ̂²_g is a poor estimate of σ²_g due to a small sample size. Since σ̂²_g is a consistent estimator of σ²_g, this procedure will stop for a sufficiently large value of n.
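A sketch of the fixed-width stopping check in (2.8) (not from the thesis; the names, confidence level, and minimum run length n* are illustrative), reusing batch_means_variance from the previous sketch.

```python
import numpy as np
from scipy import stats

def half_width(g_values, level=0.95):
    """Half-width (2.7) of the CBM-based confidence interval for E_pi(g)."""
    sigma2_hat, a = batch_means_variance(g_values)
    t_quantile = stats.t.ppf(0.5 + level / 2, df=a - 1)
    return t_quantile * np.sqrt(sigma2_hat / len(g_values))

def should_stop(g_values, eps, n_star):
    """Fixed-width stopping rule (2.8): stop once the padded half-width is at most eps."""
    n = len(g_values)
    p_n = eps if n < n_star else 0.0     # p(n) = eps * I(n < n*), prevents premature stopping
    return half_width(g_values) + p_n <= eps
```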

2.4 Effective Sample Size

Ideally, when we produce a sample from a target distribution, we would like this sample to consist of i.i.d. (independent and identically distributed) draws from the distribution. However, draws produced using Markov chains will typically not be independent of each other, though the level of dependence can vary greatly. We need to take the degree of this dependence into account when judging the quality of the samples produced. One way to do this is by examining the sample autocorrelations at various lags: if the samples were truly i.i.d., then the autocorrelation at each lag would be close to 0, so we prefer samples with autocorrelations that decay to 0 quickly, as this indicates samples which are less dependent. To assess the autocorrelation in a given sample, we must then examine the autocorrelation at each lag, which can be very tedious. (In practice, we typically only look at autocorrelations for the first n lags, for some moderate value of n, since the autocorrelations at very large lags are typically insignificant.)

Another metric that is commonly used to assess the level of dependence in a sample is the effective sample size (ESS). A sample (containing some autocorrelation) which has an ESS of m contains as much information as an i.i.d. sample of m draws. We can calculate the ESS for a sample of size N using the formula ESS = N/κ(η), where κ(η) is the autocorrelation time for the parameter η. A standard formula for the autocorrelation time is

    κ(η) = 1 + 2 Σ_{k=1}^∞ ρ_k(η),

where ρ_k(η) is the autocorrelation at lag k for the parameter η. Kass et al. (1998) recommend modifying this formula slightly by summing only the first j autocorrelation lags (for some finite j past which the autocorrelations have nearly vanished); here we determine j by using the Initial Monotone Sequence Estimator (IMSE) method (Geyer, 1992). This method is described in Figure 2.8. Note that since we do not know the true autocorrelations for each parameter, in practice we use the corresponding estimates from the sample data, ρ̂_k(η).

    Input:  ρ̂_i(η) = estimated lag-i autocorrelation for η
        Γ_i ← ρ̂_i(η) + ρ̂_{i+1}(η)
        k ← 1
        repeat while Γ_{k+1} > 0 and Γ_k > Γ_{k+1}:  k ← k + 1
        κ(η) ← 1 + 2 Σ_{j=1}^{k-1} ρ̂_j(η)
    Output: κ(η) = autocorrelation time for η

Figure 2.8: The IMSE algorithm for computing autocorrelation time.
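A sketch of the ESS computation using the IMSE truncation of Figure 2.8 as reconstructed above (not from the thesis; names and the default maximum lag are illustrative). Sample autocorrelations ρ̂_k are computed directly from the chain output.

```python
import numpy as np

def sample_autocorr(x, max_lag):
    """Sample autocorrelations rho-hat_1, ..., rho-hat_max_lag of the chain output x."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, max_lag + 1)])

def ess(x, max_lag=200):
    """Effective sample size N / kappa, with kappa truncated by the IMSE rule of Figure 2.8."""
    rho = sample_autocorr(x, max_lag)
    gamma = rho[:-1] + rho[1:]                # Gamma_i = rho-hat_i + rho-hat_{i+1}
    k = 1
    while k + 1 < len(gamma) and gamma[k] > 0 and gamma[k - 1] > gamma[k]:
        k += 1                                # advance while the Gamma sequence stays positive and decreasing
    kappa = 1.0 + 2.0 * np.sum(rho[: k - 1])  # kappa = 1 + 2 * sum of the first k-1 autocorrelations
    return len(x) / kappa
```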

Chapter 3
Ratio-of-Uniforms Markov Chain Monte Carlo

3.1 The Ratio-of-Uniforms Transformation

The ratio-of-uniforms (ROU) transformation, as described by Kinderman and Monahan (1977), is a method for producing a random draw from a given distribution. Rather than sampling from the desired p-dimensional distribution directly, the ratio-of-uniforms method instead generates a draw from a uniform distribution on a particular region in p + 1 dimensions. A transformation is then required to translate this draw back into the original space; this back-transformed draw is a sample from the desired distribution. This is a type of auxiliary variable method: it introduces an extra variable, in the hope that increasing the dimension of the target distribution (the distribution we are attempting to generate a sample from) will result in a more tractable sampling problem.

The simplest case is that of a univariate target distribution. Consider the problem of sampling from a univariate distribution f. With the ROU method, we would generate a sample from the 2-dimensional region C_f defined by

    C_f = {(u, v) : 0 < v < √(f(u/v))}.

After we have obtained the sample {(u^(1), v^(1)), (u^(2), v^(2)), ..., (u^(n), v^(n))}, we apply a transformation to translate this sample into a sample from the desired (1-dimensional) distribution. In particular, consider the transformation (y, z) = (u/v, v). After applying this transformation to the 2-dimensional sample, the marginal distribution of y = u/v is the desired distribution f; we simply ignore the other variable, z = v. This is summarized in Theorem 3.1.

Theorem 3.1. (Kinderman and Monahan, 1977) Let f be a density function for a univariate random variable. Let (U, V) be random variables with a joint Uniform distribution on the region C_f ⊂ R², with C_f = {(u, v) : 0 < v < √(f(u/v))}. Then the random variable Y = U/V has distribution f.

Proof. (Kinderman and Monahan, 1977) The bivariate distribution of (u, v) is

    g(u, v) = [1 / area(C_f)] I((u, v) ∈ C_f) = [1 / area(C_f)] I(0 < v < √(f(u/v))) = 2 I(0 < v < √(f(u/v))).

(Note that for a univariate density f, the area of C_f is always 1/2, since this is the normalizing constant that makes the joint density integrate to 1, as required.) The transformation (y, z) = (u/v, v) implies that (u, v) = (yz, z), so the Jacobian of this transformation is

    J = det [ ∂u/∂y  ∂u/∂z ;  ∂v/∂y  ∂v/∂z ] = det [ z  y ;  0  1 ] = z,

and |J| = |z| = z, since z = v ≥ 0 by construction. Then the bivariate distribution of (y, z) is

    h_{Y,Z}(y, z) = g_{U,V}(u(y, z), v(y, z)) |J| = 2z I(0 < z < √(f(y))),

so that the marginal distribution of y is

    j(y) = ∫ h(y, z) dz = ∫_0^{√(f(y))} 2z dz = [z²]_0^{√(f(y))} = f(y),

which is the desired target distribution.

One important point concerning this method is that, in order to use the ROU transformation, we do not need to know the normalizing constant. That is, f does not need to be a density; it need only be proportional to a density. This property is very useful in cases where the normalizing constant is unknown or intractable.

Example 3.1. As a simple example of this transformation, consider a target distribution X ~ Uniform(1, 2), i.e., f_X(x) = I(x ∈ (1, 2)). We have

    C_f = {(u, v) : 0 < v < √(f(u/v))}
        = {(u, v) : 0 < v < √(I(u/v ∈ (1, 2)))}
        = {(u, v) : 0 < v < 1, u/v ∈ (1, 2)}
        = {(u, v) : 0 < v < 1, 1 < u/v < 2}
        = {(u, v) : 0 < v < 1, v < u < 2v},

which is the region bounded by the triangle with vertices at the points (0, 0), (1, 1), and (2, 1) in the (U, V) plane (see Figure 3.1).

Figure 3.1: ROU region corresponding to univariate Uniform random variable.
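A sketch of ROU sampling for Example 3.1 (not from the thesis; names, seed, and sample size are illustrative): draw uniformly from the triangular region C_f by rejection within its bounding rectangle [0, 2] × [0, 1], then transform back via y = u/v.

```python
import numpy as np

rng = np.random.default_rng(3)
samples = []
while len(samples) < 10_000:
    u = rng.uniform(0.0, 2.0)          # bounding rectangle of the triangle C_f
    v = rng.uniform(0.0, 1.0)
    if v < u < 2.0 * v:                # accept iff (u, v) lies in C_f = {0 < v < 1, v < u < 2v}
        samples.append(u / v)          # back-transform: Y = U / V ~ Uniform(1, 2)
samples = np.array(samples)
```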

Example 3.2. As an example of generating a sample from a univariate target distribution via the ROU transformation, consider the kernel of a univariate standard Normal distribution (Kinderman and Monahan, 1977). That is, we will generate a sample of the random variable X, where X ~ N(0, 1), i.e., f_X(x) ∝ e^(-x²/2). In this case, the 2-dimensional ROU region corresponding to f_X is given by

    C_f = {(u, v) : 0 < v < e^(-u²/(4v²))}.

Note that C_f is bounded by the rectangle {(u, v) : -1 ≤ u ≤ 1, 0 ≤ v ≤ 1}. This allows us to sample uniformly on C_f by using a rejection sampler on this bounding rectangle. Once we have the sample of points in this region, we simply let X^(i) = U^(i)/V^(i), i = 1, ..., n, where n is the desired sample size, so that {X^(i)} is the sample from f_X. Figure 3.2 shows the ROU region for this distribution, along with the histogram corresponding to the generated sample.

Figure 3.2: Standard Normal ROU Example. (a) ROU region for the standard Normal sample; (b) histogram of the ROU standard Normal sample.

We can also state a more general version of Theorem 3.1. This generalization of the ROU method, due to Wakefield et al. (1991), enables us to use different power transformations to create ROU regions of varying shapes, possibly yielding more efficient sampling schemes.

Theorem 3.2. (Wakefield et al., 1991) Let f be a density function for a univariate random variable. Let (U, V) be random variables with a joint Uniform distribution on the region C_f ⊂ R², with C_f = {(u, v) : 0 < v < [f(u/v^r)]^(1/(r+1))}. Then the random variable Y = U/V^r has distribution f.

Proof. The bivariate distribution of (u, v) is

    g(u, v) = [1 / area(C_f)] I((u, v) ∈ C_f) = [1 / area(C_f)] I(0 < v < [f(u/v^r)]^(1/(r+1))) = (r + 1) I(0 < v < [f(u/v^r)]^(1/(r+1))).

(Note that for a univariate density f, the area of C_f is always 1/(r + 1), since this is the normalizing constant that makes the joint density integrate to 1, as required.) The transformation (y, z) = (u/v^r, v) implies that (u, v) = (yz^r, z), so the Jacobian of this transformation is

    J = det [ ∂u/∂y  ∂u/∂z ;  ∂v/∂y  ∂v/∂z ] = det [ z^r  r y z^(r-1) ;  0  1 ] = z^r,

and |J| = |z^r| = z^r, since z = v ≥ 0 by construction. Then the bivariate distribution of (y, z) is

    h_{Y,Z}(y, z) = g_{U,V}(u(y, z), v(y, z)) |J| = (r + 1) z^r I(0 < z < [f(y)]^(1/(r+1))),

so that the marginal distribution of y is

    j(y) = ∫ h(y, z) dz = ∫_0^{[f(y)]^(1/(r+1))} (r + 1) z^r dz = [z^(r+1)]_0^{[f(y)]^(1/(r+1))} = f(y),

which is the desired target distribution.
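A sketch of the rejection sampler described in Example 3.2 (not from the thesis; names, seed, and sample size are illustrative): points are proposed uniformly on the bounding rectangle [-1, 1] × [0, 1], kept if they fall in C_f, and transformed back via X = U/V.

```python
import numpy as np

rng = np.random.default_rng(4)
samples = []
while len(samples) < 10_000:
    u = rng.uniform(-1.0, 1.0)                      # bounding rectangle for C_f
    v = rng.uniform(0.0, 1.0)
    if 0.0 < v < np.exp(-u ** 2 / (4.0 * v ** 2)):  # accept iff (u, v) lies in C_f
        samples.append(u / v)                       # X = U / V is a N(0, 1) draw
samples = np.array(samples)
```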


More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov

More information

MARKOV CHAIN MONTE CARLO

MARKOV CHAIN MONTE CARLO MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute

More information

Markov chain Monte Carlo

Markov chain Monte Carlo 1 / 26 Markov chain Monte Carlo Timothy Hanson 1 and Alejandro Jara 2 1 Division of Biostatistics, University of Minnesota, USA 2 Department of Statistics, Universidad de Concepción, Chile IAP-Workshop

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

On the Applicability of Regenerative Simulation in Markov Chain Monte Carlo

On the Applicability of Regenerative Simulation in Markov Chain Monte Carlo On the Applicability of Regenerative Simulation in Markov Chain Monte Carlo James P. Hobert 1, Galin L. Jones 2, Brett Presnell 1, and Jeffrey S. Rosenthal 3 1 Department of Statistics University of Florida

More information

An introduction to adaptive MCMC

An introduction to adaptive MCMC An introduction to adaptive MCMC Gareth Roberts MIRAW Day on Monte Carlo methods March 2011 Mainly joint work with Jeff Rosenthal. http://www2.warwick.ac.uk/fac/sci/statistics/crism/ Conferences and workshops

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 11 Markov Chain Monte Carlo cont. October 6, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university The two stage Gibbs sampler If the conditional distributions are easy to sample

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

The Recycling Gibbs Sampler for Efficient Learning

The Recycling Gibbs Sampler for Efficient Learning The Recycling Gibbs Sampler for Efficient Learning L. Martino, V. Elvira, G. Camps-Valls Universidade de São Paulo, São Carlos (Brazil). Télécom ParisTech, Université Paris-Saclay. (France), Universidad

More information

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Eric Slud, Statistics Program Lecture 1: Metropolis-Hastings Algorithm, plus background in Simulation and Markov Chains. Lecture

More information

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Environmentrics 00, 1 12 DOI: 10.1002/env.XXXX Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Regina Wu a and Cari G. Kaufman a Summary: Fitting a Bayesian model to spatial

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

INTRODUCTION TO BAYESIAN STATISTICS

INTRODUCTION TO BAYESIAN STATISTICS INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types

More information

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at

More information

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 18, 2015 1 / 45 Resources and Attribution Image credits,

More information

Extreme Value Analysis and Spatial Extremes

Extreme Value Analysis and Spatial Extremes Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models

More information

On Reparametrization and the Gibbs Sampler

On Reparametrization and the Gibbs Sampler On Reparametrization and the Gibbs Sampler Jorge Carlos Román Department of Mathematics Vanderbilt University James P. Hobert Department of Statistics University of Florida March 2014 Brett Presnell Department

More information

Applicability of subsampling bootstrap methods in Markov chain Monte Carlo

Applicability of subsampling bootstrap methods in Markov chain Monte Carlo Applicability of subsampling bootstrap methods in Markov chain Monte Carlo James M. Flegal Abstract Markov chain Monte Carlo (MCMC) methods allow exploration of intractable probability distributions by

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture)

18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture) 10-708: Probabilistic Graphical Models 10-708, Spring 2014 18 : Advanced topics in MCMC Lecturer: Eric P. Xing Scribes: Jessica Chemali, Seungwhan Moon 1 Gibbs Sampling (Continued from the last lecture)

More information

Multivariate Slice Sampling. A Thesis. Submitted to the Faculty. Drexel University. Jingjing Lu. in partial fulfillment of the

Multivariate Slice Sampling. A Thesis. Submitted to the Faculty. Drexel University. Jingjing Lu. in partial fulfillment of the Multivariate Slice Sampling A Thesis Submitted to the Faculty of Drexel University by Jingjing Lu in partial fulfillment of the requirements for the degree of Doctor of Philosophy June 2008 c Copyright

More information

Reminder of some Markov Chain properties:

Reminder of some Markov Chain properties: Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent

More information

Some Results on the Ergodicity of Adaptive MCMC Algorithms

Some Results on the Ergodicity of Adaptive MCMC Algorithms Some Results on the Ergodicity of Adaptive MCMC Algorithms Omar Khalil Supervisor: Jeffrey Rosenthal September 2, 2011 1 Contents 1 Andrieu-Moulines 4 2 Roberts-Rosenthal 7 3 Atchadé and Fort 8 4 Relationship

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Winter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo

Winter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo Winter 2019 Math 106 Topics in Applied Mathematics Data-driven Uncertainty Quantification Yoonsang Lee (yoonsang.lee@dartmouth.edu) Lecture 9: Markov Chain Monte Carlo 9.1 Markov Chain A Markov Chain Monte

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data

Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data Hierarchical models for spatial data Based on the book by Banerjee, Carlin and Gelfand Hierarchical Modeling and Analysis for Spatial Data, 2004. We focus on Chapters 1, 2 and 5. Geo-referenced data arise

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information

Adaptive Monte Carlo methods

Adaptive Monte Carlo methods Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

More information

Control Variates for Markov Chain Monte Carlo

Control Variates for Markov Chain Monte Carlo Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability

More information

10. Exchangeability and hierarchical models Objective. Recommended reading

10. Exchangeability and hierarchical models Objective. Recommended reading 10. Exchangeability and hierarchical models Objective Introduce exchangeability and its relation to Bayesian hierarchical models. Show how to fit such models using fully and empirical Bayesian methods.

More information

Hierarchical Modeling for Univariate Spatial Data

Hierarchical Modeling for Univariate Spatial Data Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This

More information

Geometric ergodicity of the Bayesian lasso

Geometric ergodicity of the Bayesian lasso Geometric ergodicity of the Bayesian lasso Kshiti Khare and James P. Hobert Department of Statistics University of Florida June 3 Abstract Consider the standard linear model y = X +, where the components

More information

The Metropolis-Hastings Algorithm. June 8, 2012

The Metropolis-Hastings Algorithm. June 8, 2012 The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos Contents Markov Chain Monte Carlo Methods Sampling Rejection Importance Hastings-Metropolis Gibbs Markov Chains

More information

Advances and Applications in Perfect Sampling

Advances and Applications in Perfect Sampling and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC

More information

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9 Metropolis Hastings Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601 Module 9 1 The Metropolis-Hastings algorithm is a general term for a family of Markov chain simulation methods

More information

eqr094: Hierarchical MCMC for Bayesian System Reliability

eqr094: Hierarchical MCMC for Bayesian System Reliability eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167

More information

Introduction to Bayesian methods in inverse problems

Introduction to Bayesian methods in inverse problems Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction

More information

Bayesian GLMs and Metropolis-Hastings Algorithm

Bayesian GLMs and Metropolis-Hastings Algorithm Bayesian GLMs and Metropolis-Hastings Algorithm We have seen that with conjugate or semi-conjugate prior distributions the Gibbs sampler can be used to sample from the posterior distribution. In situations,

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public

More information

Semi-Parametric Importance Sampling for Rare-event probability Estimation

Semi-Parametric Importance Sampling for Rare-event probability Estimation Semi-Parametric Importance Sampling for Rare-event probability Estimation Z. I. Botev and P. L Ecuyer IMACS Seminar 2011 Borovets, Bulgaria Semi-Parametric Importance Sampling for Rare-event probability

More information

Sampling from complex probability distributions

Sampling from complex probability distributions Sampling from complex probability distributions Louis J. M. Aslett (louis.aslett@durham.ac.uk) Department of Mathematical Sciences Durham University UTOPIAE Training School II 4 July 2017 1/37 Motivation

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Markov Chain Monte Carlo in Practice

Markov Chain Monte Carlo in Practice Markov Chain Monte Carlo in Practice Edited by W.R. Gilks Medical Research Council Biostatistics Unit Cambridge UK S. Richardson French National Institute for Health and Medical Research Vilejuif France

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 13 MCMC, Hybrid chains October 13, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university MH algorithm, Chap:6.3 The metropolis hastings requires three objects, the distribution of

More information

F denotes cumulative density. denotes probability density function; (.)

F denotes cumulative density. denotes probability density function; (.) BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

Sampling Algorithms for Probabilistic Graphical models

Sampling Algorithms for Probabilistic Graphical models Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study

Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study Gunter Spöck, Hannes Kazianka, Jürgen Pilz Department of Statistics, University of Klagenfurt, Austria hannes.kazianka@uni-klu.ac.at

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

Lecture 6: Markov Chain Monte Carlo

Lecture 6: Markov Chain Monte Carlo Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen koskinen@nbi.ku.dk Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline

More information

A Bayesian perspective on GMM and IV

A Bayesian perspective on GMM and IV A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 15-7th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Mixture and composition of kernels. Hybrid algorithms. Examples Overview

More information

University of Toronto Department of Statistics

University of Toronto Department of Statistics Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation

More information

Overall Objective Priors

Overall Objective Priors Overall Objective Priors Jim Berger, Jose Bernardo and Dongchu Sun Duke University, University of Valencia and University of Missouri Recent advances in statistical inference: theory and case studies University

More information

LECTURE 15 Markov chain Monte Carlo

LECTURE 15 Markov chain Monte Carlo LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte

More information

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

Bayesian data analysis in practice: Three simple examples

Bayesian data analysis in practice: Three simple examples Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to

More information

MCMC: Markov Chain Monte Carlo

MCMC: Markov Chain Monte Carlo I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov

More information

Chapter 7. Markov chain background. 7.1 Finite state space

Chapter 7. Markov chain background. 7.1 Finite state space Chapter 7 Markov chain background A stochastic process is a family of random variables {X t } indexed by a varaible t which we will think of as time. Time can be discrete or continuous. We will only consider

More information

Practical Bayesian Optimization of Machine Learning. Learning Algorithms

Practical Bayesian Optimization of Machine Learning. Learning Algorithms Practical Bayesian Optimization of Machine Learning Algorithms CS 294 University of California, Berkeley Tuesday, April 20, 2016 Motivation Machine Learning Algorithms (MLA s) have hyperparameters that

More information

CTDL-Positive Stable Frailty Model

CTDL-Positive Stable Frailty Model CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG,UK and 2 Centre of Biostatistics, University of Limerick, Ireland

More information

1 Using standard errors when comparing estimated values

1 Using standard errors when comparing estimated values MLPR Assignment Part : General comments Below are comments on some recurring issues I came across when marking the second part of the assignment, which I thought it would help to explain in more detail

More information

A short introduction to INLA and R-INLA

A short introduction to INLA and R-INLA A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk

More information

Geometric Ergodicity of a Random-Walk Metorpolis Algorithm via Variable Transformation and Computer Aided Reasoning in Statistics

Geometric Ergodicity of a Random-Walk Metorpolis Algorithm via Variable Transformation and Computer Aided Reasoning in Statistics Geometric Ergodicity of a Random-Walk Metorpolis Algorithm via Variable Transformation and Computer Aided Reasoning in Statistics a dissertation submitted to the faculty of the graduate school of the university

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information

Likelihood-free MCMC

Likelihood-free MCMC Bayesian inference for stable distributions with applications in finance Department of Mathematics University of Leicester September 2, 2011 MSc project final presentation Outline 1 2 3 4 Classical Monte

More information

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

SAMPLING ALGORITHMS. In general. Inference in Bayesian models SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be

More information