Output analysis for Markov chain Monte Carlo simulations


James M. Flegal and Galin L. Jones (October 12, 2009)

1.1 Introduction

In obtaining simulation-based results, it is desirable to use estimation procedures which include a measure of the reliability of the procedure. Our goal is to introduce some of the tools useful for analyzing the output of a Markov chain Monte Carlo (MCMC) simulation which allow us to have confidence in the claims put forward. The following are the main issues we will address: (1) basic graphical assessment of MCMC output; (2) using the output for estimation; (3) assessing the Monte Carlo error of estimation; and (4) terminating the simulation.

Let π be a probability function with support X ⊆ R^d (most often π will be a probability mass function or a probability density function) about which we wish to make an inference. Typically, this inference will be based on some feature θ_π of π.

For example, if g : X → R, a common goal is the calculation of

\[ \theta_\pi = E_\pi g = \int_{\mathsf{X}} g(x)\, \pi(dx). \tag{1.1.1} \]

(The notation in (1.1.1) avoids having separate formulas for the discrete case, where it denotes \(\sum_{x \in \mathsf{X}} g(x)\pi(x)\), and the continuous case, where it denotes \(\int_{\mathsf{X}} g(x)\pi(x)\,dx\).) We might also want quantiles associated with the marginal distributions of π. Indeed, we will typically want several features of π, such as various mean and variance parameters along with quantiles and so on. As a result, θ_π will generally be a p-dimensional vector. Unfortunately, due to the use of complex stochastic models, we very often cannot calculate any of the components of θ_π analytically or even numerically. Thus we are faced with a classical statistical problem: given a probability distribution π, we want to estimate several fixed and unknown features of it.

A brief aside is in order. Users of MCMC are often doing so in order to make an inferential statement about π. That is, the estimation of θ_π is simply an intermediate step towards the ultimate inferential goal. It is an important point that the quality of the estimate of θ_π can have a profound impact on the quality of the resulting inference about π. For now we consider only the problem of estimating an expectation as in (1.1.1) and defer considering what to do when θ_π is not an expectation.

The basic MCMC method entails constructing a Markov chain X = {X_0, X_1, X_2, ...} on X having π as its invariant distribution. (A description of the required regularity conditions is given in Section 1.7.) Then we simulate X for a finite number of steps, say n, and use the observed sample path to estimate E_π g with the sample average

\[ \bar{g}_n := \frac{1}{n} \sum_{i=0}^{n-1} g(X_i). \tag{1.1.2} \]

The use of this estimator is justified through the Markov chain strong law of large numbers (SLLN), a special case of the Birkhoff Ergodic Theorem (Fristedt and Gray, 1997, p. 558): if E_π|g| < ∞, then ḡ_n → E_π g almost surely as n → ∞. From a practical point of view, this means that we can obtain an arbitrarily accurate estimate of E_π g by increasing n, i.e. running the simulation longer.

How long should we run the simulation? There are several methods commonly used to terminate an MCMC simulation. For example, one could stop when the sequence ḡ_n, as a function of n, appears to have stabilized, or n could be chosen based on the variability in ḡ_n observed in some necessarily short preliminary runs.

Convergence diagnostics (Cowles and Carlin, 1996) are also often used for this purpose; for example, see Gelman and Rubin (1992). However, these approaches do not provide a measure of the quality of our estimate ḡ_n. The alternative advocated here (see also Flegal et al., 2008) is that this decision should be based on the size of the Monte Carlo error, ḡ_n − E_π g. Unfortunately, we can never know the Monte Carlo error except in toy problems where we already know E_π g. However, we can obtain its approximate sampling distribution if a Markov chain central limit theorem (CLT) holds:

\[ \sqrt{n}\,(\bar{g}_n - E_\pi g) \xrightarrow{d} N(0, \sigma_g^2) \tag{1.1.3} \]

as n → ∞, where σ²_g ∈ (0, ∞). Due to the correlation present in a Markov chain, σ²_g is generally complicated and, in particular, σ²_g ≠ var_π g. For now, suppose we have an estimator such that σ̂²_n → σ²_g almost surely as n → ∞ (see Section 1.4 for some suitable techniques). This allows one to construct an asymptotically valid confidence interval for E_π g with half-width

\[ t \, \frac{\hat{\sigma}_n}{\sqrt{n}} \]

where t is an appropriate quantile. This interval allows us to describe the confidence in the reported estimate, and one can continue the simulation until the interval estimate is sufficiently narrow. This is a fixed-width approach to terminating the simulation and will be illustrated in Section 1.6. But, more importantly, reporting the Monte Carlo standard error allows an independent reader to assess the Monte Carlo sampling error and independently judge the results being presented.

The rest of this chapter is organized as follows. In Section 1.2 we consider some basic techniques for graphical assessment of MCMC output, then Section 1.3 contains a discussion of various point estimators of θ_π. Next, in Section 1.4 we introduce techniques for constructing interval estimators of θ_π. Section 1.5 considers estimating marginal densities associated with π, and Section 1.6 further considers stopping rules for MCMC simulations. Finally, in Section 1.7 we give conditions for ensuring (1.1.3). The computations presented in our examples were carried out using the R language. See the authors' web pages for the relevant Sweave file from which the reader can reproduce all of our calculations.

1.2 Initial Examination of Output

As a first step it pays to examine the empirical finite-sample properties of the Markov chain being simulated. A few simple graphical methods provide some of the most common ways of initially assessing the output of an MCMC simulation. These include scatterplots, histograms, time series plots, autocorrelation plots and plots of sample means. As an illustration, consider the following toy example, which will be used several times throughout this chapter; a short R sketch of these diagnostics follows the example.

Example 1 (Normal AR(1) Markov chains). Consider the normal AR(1) time series defined by

\[ X_{n+1} = \rho X_n + \epsilon_n \tag{1.2.1} \]

where the ε_n are i.i.d. N(0, 1) and |ρ| < 1. This Markov chain has invariant distribution N(0, 1/(1 − ρ²)). As a simple numerical example, consider simulating the chain (1.2.1) in order to estimate the mean of the invariant distribution, i.e. 0. While this is a toy example, it is quite useful because ρ plays a crucial role in the behavior of this chain.

Figure 1.1 contains plots based on single sample path realizations starting at X_1 = 1 with ρ = 0.5 and ρ = 0.95. In each figure the top plot is a time series plot of the observed sample path. The mean of the target distribution is 0 and the horizontal lines are 2 standard deviations above and below the mean. Comparing the time series plots, it is apparent that while we may be getting a representative sample from the invariant distribution, when ρ = 0.95 the sample is highly correlated. This is also apparent from the autocorrelation (middle) plots in both figures. When ρ = 0.5 the autocorrelation is negligible after about lag 4, but when ρ = 0.95 there is substantial autocorrelation until about lag 30. The impact of this correlation is apparent in the bottom two plots, which show the running estimates of the mean versus iterations in the chain. The true value is displayed as the horizontal line at 0. Clearly, the more correlated sequence requires many more iterations to achieve a reasonable estimate. In fact, it is apparent from these plots that the simulation with ρ = 0.5 has been run too long, while in the ρ = 0.95 case it likely hasn't been run long enough.

In the example, the plots were informative because we were able to draw the horizontal lines depicting the true values. In practically relevant MCMC settings where these would be unavailable, it is hard to know when to trust these plots. Nevertheless, they can still be useful since a Markov chain that is mixing well would tend to have time series and autocorrelation plots that look like Figure 1.1a, while time series and autocorrelation plots like those in Figure 1.1b (or worse) would indicate a potentially problematic simulation.
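The following is a minimal R sketch of the simulation and diagnostics in Example 1. The seed and plot settings are illustrative choices and are not meant to reproduce Figure 1.1 exactly.

```r
# Simulate the AR(1) chain of Example 1 and draw the three diagnostic plots:
# time series, autocorrelation, and running average.
set.seed(1)                 # illustrative seed
n    <- 2000
rho  <- 0.95
x    <- numeric(n)
x[1] <- 1                   # starting value X_1 = 1
for (i in 2:n) x[i] <- rho * x[i - 1] + rnorm(1)

par(mfrow = c(3, 1))
plot.ts(x, main = "Time Series")
abline(h = c(-2, 2) / sqrt(1 - rho^2), lty = 2)   # mean +/- 2 stationary sd's
acf(x, main = "Autocorrelation")
plot(cumsum(x) / seq_len(n), type = "l", ylab = "Running Average")
abline(h = 0)               # true mean of the invariant distribution
```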

[Figure 1.1: Initial output examination for Example 1. Panels: (a) ρ = 0.5, (b) ρ = 0.95. Each panel shows a time series plot, an autocorrelation plot, and a running average plot.]

Also, plots of current parameter estimates (with no reference to a standard error) versus iteration are not as helpful, since they provide little information as to the quality of estimation.

In simulating a d-dimensional Markov chain to simultaneously estimate the p-dimensional vector θ_π of features of π, p can be either greater than or less than d. When either d or p is large, the standard graphical techniques are obviously problematic. That is, even if for each component the time series plot indicates good mixing, we should not necessarily infer that the chain has converged to its joint target distribution. In addition, if either d or p is large, it will be impractical to look at plots of each component.

1.3 Point Estimates of θ_π

In this section, we consider two specific cases of θ_π: estimating an expectation E_π g and estimating a quantile of one of the univariate marginal distributions from π.

1.3.1 Expectations

Suppose θ_π = E_π g and assume E_π|g| < ∞. Recall from Section 1.1 that there is a SLLN, and hence it is natural to use the sample average ḡ_n to estimate θ_π. However, there are two popular alternatives: the first is obtained through the use of burn-in, i.e. discarding an initial portion of the simulation, and the second is obtained through averaging conditional expectations.

Since the simulation is not typically started with a draw from π (otherwise we could just do ordinary Monte Carlo), it follows that marginally each X_i is not distributed according to π and hence E_π g ≠ E[g(X_i)]. Thus ḡ_n is a biased estimator of E_π g. In our setting, we have that X_n converges in distribution to π as n → ∞, so in order to diminish the effect of this initialization bias an alternative estimator is often employed:

\[ \bar{g}_{n,B} = \frac{1}{n} \sum_{i=B}^{n+B-1} g(X_i) \]

where B denotes the burn-in, or amount of simulation discarded. By keeping only the draws obtained after B − 1 we are effectively choosing a new initial distribution that is closer to π. The SLLN still applies to ḡ_{n,B}, since if it holds for any initial distribution it holds for every initial distribution. Of course, one possible (maybe even likely) consequence of using burn-in is that var(ḡ_{n,B}) ≥ var(ḡ_{n+B,0}); that is, the bias decreases but the variance increases for the same total simulation effort. Obviously, this means that using burn-in could result in an estimator having larger mean-squared error than one without burn-in. Moreover, without some theoretical work (Jones and Hobert, 2001; Latuszynski and Niemiro, 2009; Rosenthal, 1995; Rudolf, 2009) it is not clear what value of B should be chosen. Popular approaches to determining B are usually based on so-called convergence diagnostics (Cowles and Carlin, 1996). Unfortunately, there simply is no guarantee that any of these diagnostics will detect a problem with the simulation and, in fact, using them can introduce bias (Cowles et al., 1999).

To motivate the estimator obtained by averaging conditional expectations, suppose the target is a function of two variables, π(x, y), and we are interested in estimating the expectation of a function of only one of the variables, say g(x). Let m_Y(y) and f_{X|Y}(x|y) be the marginal and conditional densities, respectively. Notice

\[ E_\pi g = \int \int g(x) \pi(x, y) \, dy \, dx = \int \left[ \int g(x) f_{X|Y}(x \mid y) \, dx \right] m_Y(y) \, dy \]

so that, by letting h(y) = ∫ g(x) f_{X|Y}(x|y) dx, we can appeal to the SLLN again to see that, as n → ∞,

\[ \bar{h}_n = \frac{1}{n} \sum_{i=0}^{n-1} h(Y_i) = \frac{1}{n} \sum_{i=0}^{n-1} \int g(x) f_{X|Y}(x \mid Y_i) \, dx \to E_\pi g \quad \text{almost surely.} \]

This estimator is conceptually the same as ḡ_n in the sense that both are sample averages over the chain and the Markov chain SLLN applies to both. The estimator h̄_n is often called the Rao-Blackwellised estimator of E_π g (an unfortunate name, since it has nothing to do with the Rao-Blackwell Theorem).

An obvious question is which of ḡ_n, the sample average, or h̄_n, the Rao-Blackwellised estimator, is better? It is obvious that h̄_n will sometimes be impossible to use if f_{X|Y}(x|y) is not available in closed form or if the integral is intractable; hence h̄_n will not be as generally practical as ḡ_n. However, there are settings, such as in data augmentation (Hobert, 2009), where h̄_n is theoretically and empirically superior to ḡ_n; see Liu et al. (1994) and Geyer (1995) for theoretical investigation of these two estimators. We will content ourselves with an empirical comparison in two simple examples.

Example 2. This example is also considered in Hobert (2009). Suppose our goal is to estimate the first two moments of a Student's t distribution with 4 degrees of freedom, having density

\[ m(x) = \frac{3}{8} \left( 1 + \frac{x^2}{4} \right)^{-5/2}. \]

Obviously, there is nothing about this that requires MCMC, since we can easily calculate that E_m X = 0 and E_m X² = 2. Nevertheless, we will use a data augmentation algorithm based on the joint density

\[ \pi(x, y) = \frac{4}{\sqrt{2\pi}} \, y^{3/2} e^{-y(2 + x^2/2)} \]

so that the full conditionals are X | Y = y ~ N(0, y⁻¹) and Y | X = x ~ Gamma(5/2, 2 + x²/2). Consider the Gibbs sampler that updates X then Y, so that a one-step transition looks like (x′, y′) → (x, y′) → (x, y). Suppose we have obtained n observations {(x_i, y_i); i = 0, ..., n − 1} from running the Gibbs sampler. Then the standard sample average estimates of E_m X and E_m X² are

\[ \frac{1}{n} \sum_{i=0}^{n-1} x_i \quad \text{and} \quad \frac{1}{n} \sum_{i=0}^{n-1} x_i^2, \]

respectively.
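A minimal R sketch of this Gibbs sampler and the standard sample-average estimates follows; the seed, run length and starting value are illustrative choices, not prescriptions from the chapter.

```r
# Gibbs sampler for Example 2: update X | Y = y ~ N(0, 1/y), then
# Y | X = x ~ Gamma(5/2, rate = 2 + x^2/2).
set.seed(1)
n <- 2000
x <- y <- numeric(n)
y[1] <- 1                                    # arbitrary starting value
x[1] <- rnorm(1, 0, sqrt(1 / y[1]))
for (i in 2:n) {
  x[i] <- rnorm(1, 0, sqrt(1 / y[i - 1]))                  # draw X given Y
  y[i] <- rgamma(1, shape = 5 / 2, rate = 2 + x[i]^2 / 2)  # draw Y given X
}
mean(x)     # sample-average estimate of E[X]   (truth: 0)
mean(x^2)   # sample-average estimate of E[X^2] (truth: 2)
```

As discussed next, the Rao-Blackwellised alternative simply replaces `mean(x^2)` with `mean(1/y)`, the average of the conditional second moments.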

[Figure 1.2: Estimators of the first two moments for Example 2. Panels: (a) θ_π = E_π X, (b) θ_π = E_π X². The horizontal line denotes the truth, the solid curves are the running sample averages, and the dotted curves are the running RB sample averages.]

Further, the Rao-Blackwellised estimates are easily computed. Since X | Y = y ~ N(0, y⁻¹), the Rao-Blackwellised estimate of E_m X is exactly 0! On the other hand, the RB estimate of E_m X² is

\[ \frac{1}{n} \sum_{i=0}^{n-1} y_i^{-1}. \]

As an illustration of these estimators, we simulated 2000 iterations of the Gibbs sampler and plotted the running values of the estimators in Figure 1.2. It is obvious from these plots that the RB averages are much less variable than the standard sample averages, especially for smaller MCMC sample sizes.

It is not the case that RB estimators are always better than sample means. Whether they are better depends on the expectation being estimated as well as the properties of the MCMC sampler. In fact, Liu et al. (1994) and Geyer (1995) give an example where the RB estimator is provably worse than the sample average, which is illustrated in the following example.

[Figure 1.3: Plots based on a single run of the Gibbs sampler in Example 3 with ρ = 0.6. Panels: (a) θ_π = E_π X, (b) θ_π = E_π(X − 2Y). The horizontal line denotes the truth, the solid curves are the running sample averages, and the dotted curves are the running RB sample averages.]

Example 3. As in Liu et al. (1994) and Geyer (1995), suppose

\[ \begin{pmatrix} X \\ Y \end{pmatrix} \sim N \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right); \]

then X | Y ~ N(ρY, 1 − ρ²) and Y | X ~ N(ρX, 1 − ρ²). Consider a Gibbs sampler that updates X followed by an update of Y, so that a one-step transition looks like (x′, y′) → (x, y′) → (x, y). For estimating θ_π = E_π(X − bY), Geyer (1995) shows that when b = 2 and ρ = 0.6 the sample average has smaller asymptotic variance than the RB estimator. However, for estimating E_π X, the RB estimator will be asymptotically superior. The gain in either case is not always apparent in small Monte Carlo sample sizes when looking at running plots of the estimates; see Figure 1.3.

RB estimators are more general than our presentation suggests. Let h be any function and set

\[ f(x) = E_\pi [\, g(X) \mid h(X) = h(x) \,] \]

so that E_π g = E_π f. Thus, by the Markov chain SLLN, with probability 1 as n → ∞,

\[ \frac{1}{n} \sum_{i=1}^{n} f(X_i) = \frac{1}{n} \sum_{i=1}^{n} E_\pi [\, g(X) \mid h(X) = h(X_i) \,] \to E_\pi g. \]

As long as the conditional distribution of X given h(X) is tractable, RB estimators are available.

1.3.2 Quantiles

It is common to report estimated quantiles in addition to estimated expectations. Actually, what is nearly always reported is not a multivariate quantile but rather a quantile of the univariate marginal distributions associated with π. This is the only setting we consider. Let F be a marginal distribution function associated with π. Then the qth quantile of F is

\[ \phi_q = F^{-1}(q) = \inf \{ x : F(x) \ge q \}, \quad 0 < q < 1. \tag{1.3.1} \]

There are many potential estimates of φ_q (Hyndman and Fan, 1996), but we consider only the inverse of the empirical distribution function from the observed sequence. First define {X_(1), ..., X_(n)} as the order statistics of {X_0, ..., X_{n−1}}; then the estimator of φ_q is given by

\[ \hat{\phi}_{q,n} = X_{(j+1)} \quad \text{where} \quad \frac{j}{n} \le q < \frac{j+1}{n}. \tag{1.3.2} \]

Example 4 (Normal AR(1) Markov chains). Consider again the time series defined at (1.2.1). Our goal in this example is to illustrate estimating the first and third quartiles, denoted Q_1 and Q_3. In this simple example, the true values of Q_1 and Q_3 are ±Φ⁻¹(0.75)/√(1 − ρ²), where Φ is the cumulative distribution function of a standard normal distribution. Using the same realization of the chain as in Example 1, Figures 1.4a and 1.4b show plots of the running quartiles versus iteration when ρ = 0.5 and ρ = 0.95, respectively. It is immediately apparent that obtaining good estimates is more difficult when ρ = 0.95, and hence the simulation should continue. Also, without the horizontal lines, these plots would not be as useful. Recall that a similar conclusion was reached for estimating the mean of the target distribution.
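A short R sketch of the estimator (1.3.2) is given below. The function name is ours, and `x` is assumed to hold an observed sample path such as the AR(1) realization sketched earlier.

```r
# Invert the empirical distribution function: the estimator (1.3.2).
mcmc.quantile <- function(x, q) {
  n <- length(x)
  j <- floor(n * q)     # j satisfies j/n <= q < (j+1)/n
  sort(x)[j + 1]        # the (j+1)st order statistic
}
mcmc.quantile(x, 0.25)  # estimate of Q_1
mcmc.quantile(x, 0.75)  # estimate of Q_3
```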

[Figure 1.4: Plots for the AR(1) model of running Q_1 and Q_3. Panels: (a) ρ = 0.5, (b) ρ = 0.95. The horizontal lines are the true quartiles.]

1.4 Interval Estimates of θ_π

In our examples so far we have known the truth, enabling us to draw the horizontal lines on the plots which allow us to gauge the quality of estimation. Obviously, the true parameter value is unknown in practically relevant settings, and hence the size of the Monte Carlo error is unknown. For this reason, when reporting a point estimate of θ_π one should also include a Monte Carlo standard error (MCSE). In this section we address how to construct asymptotically valid standard errors and interval estimates of θ_π. The method presented will be based on the existence of a Markov chain CLT as at (1.1.3). We defer discussion of the regularity conditions required to ensure a CLT until Section 1.7 and, for now, assume a limiting distribution for the Monte Carlo error holds; see also Geyer (2009) and Hobert (2009).

1.4.1 Expectations

Suppose θ_π = E_π g, which will be estimated with ḡ_n = ḡ_{n,0}, that is, with no burn-in. However, we note that, as with the SLLN, if the CLT holds for any initial distribution then it holds for every initial distribution. Thus the use of burn-in does not affect the existence of a CLT,

but it may affect the quality of the asymptotic approximation. If σ̂²_n is an estimate of σ²_g, then one can form a confidence interval for E_π g with half-width

\[ t \, \frac{\hat{\sigma}_n}{\sqrt{n}} \tag{1.4.1} \]

where t is an appropriate quantile. Thus the difficulty in finding interval estimates is estimating σ²_g, which requires specialized techniques to account for correlation in the underlying chain. We restrict attention to strongly consistent estimators of σ²_g so that the asymptotic validity of (1.4.1) is ensured. The methods satisfying this requirement include various batch means methods, spectral variance methods and regenerative simulation. Other alternatives include the initial sequence methods of Geyer (1992); however, the theoretical properties of Geyer's estimators are not well understood. We will focus on batch means as it is the most generally applicable method; for more on spectral methods see Flegal and Jones (2009a), while Hobert et al. (2002) and Mykland et al. (1995) study regenerative simulation. There are many variants of batch means; here we focus on non-overlapping batch means (BM) and overlapping batch means (OLBM), a simple extension with a smaller asymptotic variance (Flegal and Jones, 2009a; Meketon and Schmeiser, 1984).

Batch Means

For BM the simulation output is broken into a_n equally sized batches of length b_n. If an MCMC sampler is run for a total of n = a_n b_n iterations, then define

\[ \bar{Y}_k := \frac{1}{b_n} \sum_{i=0}^{b_n - 1} g(X_{k b_n + i}) \quad \text{for } k = 0, \ldots, a_n - 1. \]

Further define the batch means estimator of σ²_g as

\[ \hat{\sigma}^2_{BM} = \frac{b_n}{a_n - 1} \sum_{k=0}^{a_n - 1} (\bar{Y}_k - \bar{g}_n)^2. \]

The estimator σ̂²_BM is not generally a consistent estimator of σ²_g (Glynn and Whitt, 1991). Specifically, if the number of batches is fixed, then no matter how large the batch size, the estimator σ̂²_BM does not converge in probability to σ²_g as the Monte Carlo sample size increases. However, roughly speaking, Jones et al. (2006) show that if the Markov chain mixes quickly and both a_n and b_n are allowed to increase as the overall length of the simulation does, then σ̂²_BM is a strongly consistent estimator of σ²_g. It is often convenient to take b_n = ⌊n^ν⌋ for some 0 < ν < 1 and a_n = ⌊n/b_n⌋.

OLBM is a generalization of BM in which we consider the n − b_n + 1 overlapping batches of

length b_n. Indexing the batches by j = 0, ..., n − b_n, define the batch means as

\[ \bar{Y}_j(b_n) := \frac{1}{b_n} \sum_{i=0}^{b_n - 1} g(X_{j+i}), \]

so that σ²_g is estimated with

\[ \hat{\sigma}^2_{OLBM} = \frac{n \, b_n}{(n - b_n)(n - b_n + 1)} \sum_{j=0}^{n - b_n} (\bar{Y}_j(b_n) - \bar{g}_n)^2. \]

Flegal and Jones (2009a) establish strong consistency of the OLBM estimator under similar conditions to those required for BM, e.g. by setting b_n = ⌊n^ν⌋ for some 0 < ν < 3/4. When constructing the interval (1.4.1), t is a quantile from a Student's t distribution with n − b_n degrees of freedom. In both BM and OLBM, the ν values yielding strongly consistent estimators depend on the number of finite moments and the mixing conditions of the Markov chain. These conditions are similar to those required for a Markov chain CLT; see Flegal and Jones (2009a), Jones (2004) and Jones et al. (2006).

Example 5 (Normal AR(1) Markov chains). Recall the AR(1) model defined at (1.2.1). Using the same realization of the chain as in Example 1 (2000 iterations with ρ ∈ {0.5, 0.95} starting from X_1 = 1), we consider estimating the mean of the invariant distribution, i.e. 0. Utilizing OLBM with b_n = ⌊n^{1/2}⌋, we calculated an MCSE and the resulting 95% confidence interval; see Figure 1.5. The top figure is a plot of the running MCSE versus iteration for a single realization, where the dashed line corresponds to ρ = 0.5 and the dotted line corresponds to ρ = 0.95. When ρ is larger it takes longer for the MCSE to stabilize and begin decreasing. After 2000 iterations for ρ = 0.5 we obtained an interval of (−0.119, 0.051), while for ρ = 0.95 the interval is (−1.196, 0.182). Notice that both confidence intervals contain 0 (the true value).

Many of our plots are based on simulating only 2000 iterations. We chose this value strictly for illustration purposes. An obvious question is: has the simulation been run long enough? Another way of asking the same question is: are the interval estimates sufficiently narrow after 2000 iterations? In the ρ = 0.5 case the answer is maybe, while in the ρ = 0.95 case it is clearly no. Consider the interval estimate of the mean with ρ = 0.5, that is, −0.034 ± 0.085 = (−0.119, 0.051). This indicates that we can trust only the leading 0 in the point estimate −0.034, but not the sign of the estimate. If the user is satisfied with this level of precision, then 2000 iterations is sufficient. On the other hand, when ρ = 0.95 our interval estimate is (−1.196, 0.182), which indicates that we cannot trust any of the figures (or the sign) reported in the point estimate.
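A minimal R implementation of the OLBM estimator, under the choice b_n = ⌊√n⌋ used in Example 5, might look as follows; the function name is ours, and the argument is the observed sequence g(X_0), ..., g(X_{n−1}) (for the AR(1) example, g is the identity).

```r
# OLBM estimate of sigma^2_g, plus the MCSE and 95% half-width of (1.4.1).
olbm <- function(g.x) {
  n    <- length(g.x)
  b    <- floor(sqrt(n))                 # batch size b_n
  a    <- n - b + 1                      # number of overlapping batches
  mu   <- mean(g.x)
  cs   <- c(0, cumsum(g.x))              # cumulative sums for fast batch means
  ybar <- (cs[(b + 1):(n + 1)] - cs[1:a]) / b
  sig2 <- n * b * sum((ybar - mu)^2) / ((n - b) * a)
  mcse <- sqrt(sig2 / n)
  list(est = mu, mcse = mcse,
       half.width = qt(0.975, df = n - b) * mcse)
}
olbm(x)   # e.g. the AR(1) chain: estimate of the mean with its MCSE
```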

[Figure 1.5: Plots for the AR(1) model of the running MCSE (top; ρ = 0.5 dashed, ρ = 0.95 dotted) and running averages with confidence intervals calculated via OLBM.]

Recall the RB estimators of Section 1.3.1. It is straightforward to use OLBM to calculate the MCSE for these estimators, since the conditional expectations being averaged define the sequence of batch means Ȳ_j(b_n).

Example 6. Recall Example 3, where we found it difficult to distinguish between using the sample average estimator and the RB estimators. The difference becomes clearer with interval estimators. Figure 1.6 considers estimating E_π X; here the RB estimators have a smaller asymptotic variance than the sample average estimators, as clearly reflected by the narrower interval estimates. On the other hand, Figure 1.7 shows the RB estimators have a larger asymptotic variance than the sample average estimators, also reflected by the wider interval estimates. The practical effect of this is that, depending on which expectation is being estimated, using RB estimators can lead to either more or less simulation effort in order to achieve a sufficiently narrow interval estimate. Notice the additional information available by including the resulting interval estimates.

[Figure 1.6: Both plots are based on a single run of the Gibbs sampler in Example 3 with ρ = 0.6 for estimating E_π X. Panels: (a) sample average estimates, (b) RB estimates. The horizontal line denotes the truth, the solid curves are the running estimates, and the dotted curves are the running pointwise 95% interval estimates.]

[Figure 1.7: Both plots are based on a single run of the Gibbs sampler in Example 3 with ρ = 0.6 for estimating E_π(X − 2Y). Panels: (a) sample average estimates, (b) RB estimates. The horizontal line denotes the truth, the solid curves are the running estimates, and the dotted curves are the running pointwise 95% interval estimates.]

Parallel Chains

To this point, our recipe seems straightforward: given a sampler, pick a starting value and run the simulation for a long time, using the SLLN and the CLT to produce a point estimate and a measure of its uncertainty. A variation that stimulated debate in the early statistics literature on MCMC (see e.g. Gelman and Rubin, 1992; Geyer, 1992), even earlier in the operations research literature (see e.g. Bratley et al., 1987; Kelton and Law, 1984), and continues today (Alexopoulos et al., 2006; Alexopoulos and Goldsman, 2004), instead relies on simulating r independent replications of this recipe. If we suppose that each replication is started from a different starting point but is run for the same length and using the same burn-in, then this procedure would produce r independent estimates of E_π g, namely ḡ_{n,B,1}, ḡ_{n,B,2}, ..., ḡ_{n,B,r}. The resulting estimate of E_π g would then be the grand mean, and our estimate of its performance, i.e. σ²_g, would be the usual sample variance of the ḡ_{n,B,i}.

This approach has some intuitive appeal in that estimation avoids some of the serial correlation inherent in MCMC. Moreover, clearly there is value in trying a variety of initial values for the simulation. It has also been argued that by choosing the r starting points in a widely dispersed manner there is a greater chance of encountering modes that one long run may have missed. Thus, for example, some argue that using independent replications results in superior inferential validity (Gelman and Rubin, 1992, p. 503). However, there is no agreement on this issue; indeed, Bratley et al. (1987, p. 80) are skeptical about "[the] rationale of some proponents" of independent replications.

Notice that the total simulation effort using independent replications is r(n + B). To obtain good estimates of σ²_g will require r to be large, which will require n + B to be small for a given computational effort. If we use the same value of B as we would when using one long run, this means that each ḡ_{n,B,i} will be based on a comparatively small number n of observations. Moreover, since each run will be short, there is a reasonable chance that a given replication will not move far from its starting value. Alexopoulos and Goldsman (2004) have shown that this can result in much poorer estimates (in terms of mean squared error) of E_π g than a single long run. On the other hand, if we can find widely dispersed starting values that are from a distribution very close to π, then independent replications may indeed be superior (Alexopoulos and Goldsman, 2004). This should not be surprising, since draws directly from π are clearly desirable.

1.4.2 Functions of Moments

Suppose now that we are interested in estimating φ(E_π g), where φ is some function. If φ is continuous, then φ(ḡ_n) → φ(E_π g) with probability 1 as n → ∞, which makes estimation of φ(E_π g) straightforward. Again we also want a valid Monte Carlo error, which can be obtained via the delta method (Ferguson, 1996; van der Vaart, 1998). Assuming (1.1.3), the delta method says that if φ is continuously differentiable in a neighborhood of E_π g and

φ′(E_π g) ≠ 0, then as n → ∞,

\[ \sqrt{n}\,(\phi(\bar{g}_n) - \phi(E_\pi g)) \xrightarrow{d} N(0, [\phi'(E_\pi g)]^2 \sigma_g^2). \]

Thus, if our estimator of σ²_g, say σ̂²_n, is strongly consistent, then [φ′(ḡ_n)]² σ̂²_n is strongly consistent for [φ′(E_π g)]² σ²_g.

Example 7. Consider estimating (E_π X)² with (X̄_n)². Let φ(x) = x² and assume φ′(E_π X) ≠ 0 and (1.1.3). Then

\[ \sqrt{n} \left( (\bar{X}_n)^2 - (E_\pi X)^2 \right) \xrightarrow{d} N(0, 4 (E_\pi X)^2 \sigma_x^2), \]

and we can use OLBM to consistently estimate σ²_x with σ̂²_n, which means that 4(X̄_n)² σ̂²_n is a strongly consistent estimator of 4(E_π X)² σ²_x.

From this example we see that the univariate delta method makes it straightforward to handle powers of moments. The multivariate delta method allows us to handle more complicated functions of moments. Let T_n denote a d-dimensional sequence of random vectors and θ a d-dimensional parameter. If, as n → ∞,

\[ \sqrt{n}\,(T_n - \theta) \xrightarrow{d} N(\mu, \Sigma) \]

and φ is continuously differentiable in a neighborhood of θ with φ′(θ) ≠ 0, then as n → ∞,

\[ \sqrt{n}\,(\phi(T_n) - \phi(\theta)) \xrightarrow{d} N(\phi'(\theta)\mu, \, \phi'(\theta) \Sigma \phi'(\theta)^T). \]

Example 8. Consider estimating var_π g = E_π g² − (E_π g)² with, setting h = g²,

\[ \hat{v}_n := \frac{1}{n} \sum_{i=1}^{n} h(X_i) - \left( \frac{1}{n} \sum_{i=1}^{n} g(X_i) \right)^2. \]

Assume

\[ \sqrt{n} \begin{pmatrix} \bar{g}_n - E_\pi g \\ \bar{h}_n - E_\pi g^2 \end{pmatrix} \xrightarrow{d} N \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_g^2 & c \\ c & \sigma_h^2 \end{pmatrix} \right) \]

where c = E_π g³ − E_π g E_π g². Let φ(x, y) = y − x² and assume E_π g ≠ 0; then as n → ∞,

\[ \sqrt{n}\,(\hat{v}_n - \mathrm{var}_\pi g) \xrightarrow{d} N(0, \, 4 (E_\pi g)(\sigma_g^2 E_\pi g - E_\pi g^3 + E_\pi g E_\pi g^2) + \sigma_h^2). \]

Since it is easy to use OLBM to construct strongly consistent estimators of σ²_g and σ²_h, the following formula gives a strongly consistent estimator of the variance in the asymptotic normal distribution for v̂_n:

\[ 4 \bar{g}_n \left( \hat{\sigma}^2_{g,n} \bar{g}_n - \bar{j}_n + \bar{g}_n \bar{h}_n \right) + \hat{\sigma}^2_{h,n}, \quad \text{where } j = g^3. \]
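As a small R sketch of the univariate case (Example 7), one can combine the olbm() function sketched above with the delta method; again, `x` is assumed to hold the observed sample path.

```r
# Delta-method MCSE for estimating (E[X])^2 by (xbar_n)^2 (Example 7).
out  <- olbm(x)                      # xbar_n and its OLBM-based MCSE
xbar <- out$est
mcse <- sqrt(4 * xbar^2) * out$mcse  # |phi'(xbar)| * sigma.hat_x / sqrt(n)
c(estimate = xbar^2, mcse = mcse)
```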

1.4.3 Quantiles

Suppose our goal is to estimate φ_q with φ̂_{q,n}, defined at (1.3.1) and (1.3.2), respectively. We now turn our attention to constructing an interval estimate of φ_q. It is tempting to think that bootstrap methods would be appropriate for this problem. Indeed, there has been a substantial amount of research into bootstrap methods for stationary time series which would be appropriate for MCMC settings (see e.g. Bertail and Clémençon, 2006; Bühlmann, 2002; Datta and McCormick, 1993; Politis, 2003). Unfortunately, our experience has been that these methods are extremely computationally intensive as compared to the method presented below. Moreover, the coverage probabilities of the intervals produced with the bootstrap methods are typically much worse; see Flegal (2008).

As above, we assume the existence of an asymptotic distribution for the Monte Carlo error; that is, there is a constant γ²_q ∈ (0, ∞) such that, as n → ∞,

\[ \sqrt{n}\,(\hat{\phi}_{q,n} - \phi_q) \xrightarrow{d} N(0, \gamma_q^2). \tag{1.4.2} \]

Flegal and Jones (2009b) give conditions under which (1.4.2) obtains, and these are also briefly introduced in Section 1.7. Just as when we were estimating an expectation, we find ourselves in the position of estimating a complicated constant γ²_q. We focus on the use of subsampling methods in this context. However, before we begin, note that this use of the term "subsampling" is quite different from the way it is often used in the context of MCMC, in that we are not deleting any observations of the Markov chain.

Subsampling methods

This section provides a brief overview of subsampling methods in the context of MCMC and illustrates their use for estimating the MCSE of φ̂_{q,n}. While this section focuses on quantiles, subsampling methods apply much more generally; the interested reader is encouraged to consult Politis et al. (1999). In the case of sample averages, as at (1.1.2), the subsampling variance estimator is equivalent to the OLBM estimator of σ²_g.

The main idea of subsampling is similar to OLBM in that we take overlapping batches (or subsamples) of size b_n from the first n observations of the chain {X_0, X_1, ..., X_{n−1}}. There are n − b_n + 1 such subsamples, where typically b_n ≪ n. Let {X_i, ..., X_{i+b_n−1}} be the ith subsample with corresponding order statistics {X*_(1), ..., X*_(b_n)}. Then define the quantile based on the ith subsample as

\[ \phi_i^* = X^*_{(j+1)} \quad \text{where} \quad \frac{j}{b_n} \le q < \frac{j+1}{b_n} \]

for i = 0, ..., n − b_n. The subsampling estimator of γ²_q is then

\[ \hat{\gamma}_q^2 = \frac{b_n}{n - b_n + 1} \sum_{i=0}^{n - b_n} (\phi_i^* - \bar{\phi}^*)^2, \tag{1.4.3} \]

where

\[ \bar{\phi}^* = \frac{1}{n - b_n + 1} \sum_{i=0}^{n - b_n} \phi_i^*. \]

Politis et al. (1999) give conditions ensuring this estimator is strongly consistent; however, their conditions could be difficult to check in practice. Implementation of subsampling requires choosing b_n so that, as n → ∞, we have b_n → ∞ and b_n/n → 0. A natural choice is b_n = ⌊√n⌋.

Example 9 (Normal AR(1) Markov chains). Using the AR(1) model defined at (1.2.1), we again consider estimating the first and third quartiles, denoted Q_1 and Q_3, which equal ±Φ⁻¹(0.75)/√(1 − ρ²). Figure 1.8 shows the output from the same realization of the chain used previously in Example 4, but this time the plot includes an interval estimate of the quartiles. Figure 1.8a shows a plot of the running quartiles versus iteration when ρ = 0.5. In addition, the dashed lines show the 95% confidence interval bounds at each iteration. These intervals were produced with subsampling using b_n = ⌊√n⌋. At around 200 iterations, the MCSE (and hence the interval estimates) seem to stabilize and begin to decrease. At 2000 iterations, the estimates for Q_1 and Q_3 are −0.817 ± 0.105 and 0.778 ± 0.099, respectively. Figure 1.8b shows the same plot when ρ = 0.95. At 2000 iterations, the estimates for Q_1 and Q_3 are −2.74 and 2.35, respectively, with considerably wider interval estimates.
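Before assessing those intervals, here is a direct (and deliberately unoptimized) R sketch of the subsampling estimator (1.4.3), reusing the mcmc.quantile() function sketched in Section 1.3.2 and taking b_n = ⌊√n⌋ as in Example 9; the function name is ours.

```r
# Subsampling estimate of gamma^2_q and a 95% interval for the q-th quantile.
sub.quantile.ci <- function(x, q) {
  n   <- length(x)
  b   <- floor(sqrt(n))                  # subsample size b_n
  idx <- 0:(n - b)                       # subsample starting indices
  phi <- sapply(idx, function(i) mcmc.quantile(x[(i + 1):(i + b)], q))
  gam2 <- b * mean((phi - mean(phi))^2)  # the estimator (1.4.3)
  est  <- mcmc.quantile(x, q)
  c(est   = est,
    lower = est - qnorm(0.975) * sqrt(gam2 / n),
    upper = est + qnorm(0.975) * sqrt(gam2 / n))
}
sub.quantile.ci(x, 0.25)   # Q_1 with a 95% interval
sub.quantile.ci(x, 0.75)   # Q_3 with a 95% interval
```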

[Figure 1.8: Plots for the AR(1) model of running Q_1 and Q_3 estimates with 95% pointwise confidence intervals calculated via subsampling. Panels: (a) ρ = 0.5, (b) ρ = 0.95.]

Are the intervals sufficiently narrow after 2000 iterations? In both cases (ρ = 0.5 and ρ = 0.95) the answer is apparently no. Consider the narrowest interval, which is the estimate for Q_3 with ρ = 0.5, that is, 0.778 ± 0.099 = (0.679, 0.877), which indicates that we can at most trust the sign and the leading 0 of the estimate, 0.778. Note that in a real problem we would not have the horizontal line in the plot depicting the truth.

1.5 Estimating Marginals

A common inferential goal is the production of a plot of a marginal density associated with π. In this section we cover two methods for doing this. We begin with a simple graphical method, then introduce a clever method due to Wei and Tanner (1990) that is reminiscent of the Rao-Blackwellisation methods of Section 1.3.

A histogram approximates the true marginal by the Markov chain SLLN and, because it is so easy to implement, provides the most common method for doing so. Another common approach is to report a nonparametric density estimate, or smoothed histogram. It is conceptually straightforward to construct pointwise interval estimates for the smoothed histogram

using subsampling techniques. However, outside of toy examples, the computational cost can be substantial; see Politis et al. (1999).

Example 10. Suppose Y_i | µ, θ ~ N(µ, θ) independently for i = 1, ..., m, where m ≥ 3, with prior ν(µ, θ) ∝ θ^{−1/2}. The target is the posterior density

\[ \pi(\mu, \theta \mid y) \propto \theta^{-(m+1)/2} e^{-\frac{m}{2\theta}\left(s^2 + (\bar{y} - \mu)^2\right)} \]

where s² is the usual biased sample variance. It is easy to see that µ | θ, y ~ N(ȳ, θ/m) and that θ | µ, y ~ IG((m − 1)/2, m[s² + (ȳ − µ)²]/2). We will consider the Gibbs sampler that updates µ then θ, so that a one-step transition is given by (µ′, θ′) → (µ, θ′) → (µ, θ), and use this sampler to estimate the marginal densities of µ and θ.

Now suppose m = 11, ȳ = 1 and s² = 4. We simulated 2000 realizations of the Gibbs sampler starting from (µ_1, θ_1) = (1, 1). The marginal density plots were created using the default settings of the density function in R and are shown in Figure 1.9, while an estimated bivariate density plot (created using the R functions kde2d and persp) is given in Figure 1.10. It is obvious from these figures that the posterior is simple, so it is no surprise that the Gibbs sampler has been shown to converge in just a few iterations (Jones and Hobert, 2001). Obviously, in more realistic examples where the dimension of the posterior is three or more, it will be difficult to use these techniques to visualize the entire posterior.

A clever technique for estimating a marginal is based on the same idea as RB estimators (Wei and Tanner, 1990). To keep the notation simple, suppose the target is a function of only two variables, π(x, y), and let m_X and m_Y be the associated marginals. Then

\[ m_X(x) = \int \pi(x, y) \, dy = \int f_{X|Y}(x \mid y) \, m_Y(y) \, dy = E_{m_Y} \left[ f_{X|Y}(x \mid Y) \right], \]

suggesting that, by the Markov chain SLLN, we can get a functional approximation to m_X since, as n → ∞,

\[ \frac{1}{n} \sum_{i=0}^{n-1} f_{X|Y}(x \mid Y_i) \to m_X(x). \tag{1.5.1} \]

Of course, just as with RB estimators, this will only be useful when the conditionals are tractable. Note also that it is straightforward to use BM or OLBM to get pointwise confidence intervals for the resulting curve; that is, for each x we can calculate an MCSE of the sample average in (1.5.1).

[Figure 1.9: Histograms for estimating the marginal densities of Example 10. Panels: (a) marginal density of µ, (b) marginal density of θ.]

[Figure 1.10: Estimated bivariate posterior density of Example 10 (axes mu and theta).]

Example 11. Recall the setting of Example 10. We will focus on estimation of the marginal posterior density of µ | y, i.e. π(µ | y). Note that

\[ \pi(\mu \mid y) = \int \pi(\mu \mid \theta, y) \, \pi(\theta \mid y) \, d\theta, \]

so that by the Markov chain SLLN we can estimate π(µ | y) with

\[ \frac{1}{n} \sum_{i=1}^{n} \pi(\mu \mid \theta_i, y), \]

which is straightforward to evaluate since µ | θ_i, y ~ N(ȳ, θ_i/m). Specifically, the resulting marginal estimate is a mixture of N(ȳ, θ_i/m) densities, one for each observed θ_i. Using the same realization of the chain from Example 10, we estimated π(µ | y) using this method. Figure 1.11 shows the results alongside our previous estimates. One can also calculate pointwise 95% confidence intervals using OLBM, which here results in a very small Monte Carlo error (the intervals are therefore not included in the plot). Notice the estimate based on (1.5.1) is a bit smoother than either the histogram or the smoothed histogram estimate, but is otherwise quite similar.

[Figure 1.11: Estimates of the marginal density of µ. The three estimates are based on a histogram, a smoothed marginal density (solid line), and the method of Wei and Tanner (1990) (dashed line).]
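A minimal R sketch of Examples 10 and 11 together follows: run the Gibbs sampler and average the normal conditionals as in (1.5.1). The seed and plotting grid are illustrative choices.

```r
# Gibbs sampler for Example 10 (m = 11, ybar = 1, s^2 = 4), followed by the
# Wei and Tanner estimate of pi(mu | y) from Example 11.
set.seed(1)
m <- 11; ybar <- 1; s2 <- 4
n <- 2000
mu <- theta <- numeric(n)
mu[1] <- 1; theta[1] <- 1                      # starting values (1, 1)
for (i in 2:n) {
  mu[i]    <- rnorm(1, ybar, sqrt(theta[i - 1] / m))    # mu | theta, y
  theta[i] <- 1 / rgamma(1, shape = (m - 1) / 2,        # theta | mu, y is
                         rate = m * (s2 + (ybar - mu[i])^2) / 2)  # inverse gamma
}
grid <- seq(-1, 3, length.out = 200)
dens <- sapply(grid, function(u) mean(dnorm(u, ybar, sqrt(theta / m))))
plot(grid, dens, type = "l", xlab = "mu", ylab = "Density")  # estimate (1.5.1)
```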

1.6 Terminating the Simulation

The most common approach to stopping an MCMC experiment is to simulate a fixed, predetermined path length. That is, the simulation is terminated with a fixed-time rule. Indeed, there are settings where, due to the nearly prohibitive difficulty of the simulation, this is the only reasonable way to plan an experiment; for example, see Caffo et al. (2009). However, this is not the case for the typical MCMC experiment.

As in Section 1.4, inclusion of an MCSE with the resulting point estimate allows one to gauge whether the desired precision has been achieved. After an initial run, if the MCSE is not sufficiently small, then one might increase the simulation length and recalculate an MCSE. That is, the simulation is terminated the first time the MCSE is sufficiently small. Equivalently, the simulation is terminated the first time that the half-width of a confidence interval for θ_π is sufficiently small, and so this is a fixed-width rule. Notice that this is a sequential procedure that will result in a random total simulation effort. There is a substantial amount of research on fixed-width procedures in the MCMC context when θ_π is an expectation; see Flegal et al. (2008), Glynn and Whitt (1992) and Jones et al. (2006) and the references therein, but none that we are aware of when θ_π is not an expectation.

Let σ̂²_n be a consistent estimator of σ²_g from (1.1.3). Formally, given a desired half-width ε, the simulation terminates the first time the following inequality is satisfied:

\[ t \, \frac{\hat{\sigma}_n}{\sqrt{n}} + p(n) \le \epsilon \tag{1.6.1} \]

where t is an appropriate Student's t quantile and p(n) is a positive function such that p(n) = o(n^{−1/2}) as n → ∞. Letting n* be the desired minimum simulation effort, a reasonable default is p(n) = ε[I(n ≤ n*) + n^{−1} I(n > n*)]. Glynn and Whitt (1992) established that the interval at (1.6.1) is asymptotically valid in the sense that the desired coverage probability is obtained as ε → 0. The use of these intervals in MCMC settings has been extensively investigated by Flegal et al. (2008), Flegal and Jones (2009a) and Jones et al. (2006), who show that they work well.

Example 12 (Normal AR(1) Markov chains). Consider the normal AR(1) time series defined at (1.2.1). In Example 5 we simulated 2000 iterations from the chain with ρ = 0.95 starting from X_1 = 1 and found that the resulting 95% confidence interval for the mean of the invariant distribution was centered at 0.27 with a half-width well above 0.1. Suppose we wanted to continue our simulation until we were 95% confident our estimate was within 0.1 of the true value, a fixed-width procedure using OLBM with b_n = ⌊√n⌋. Thus (1.6.1) becomes

\[ t_{n - b_n} \frac{\hat{\sigma}_n}{\sqrt{n}} + 0.1 \left[ I(n \le 1000) + n^{-1} I(n > 1000) \right] \le 0.1. \]

It would be computationally expensive to check this criterion after each iteration, so instead we added 1000 iterations before recalculating the half-width each time. In this case, the simulation terminated after 1.42 × 10⁵ iterations, resulting in an interval estimate of 8.59 × 10⁻³ with a half-width just below the requested 0.1. Notice that this simple example required an exceptionally large simulation effort relative to what is often done in practice in much more complicated settings.
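A sketch of this fixed-width procedure in R, reusing the olbm() function sketched in Section 1.4.1, might look as follows; as in the example, the criterion is checked every 1000 iterations with n* = 1000 and ε = 0.1, and the seed is an illustrative choice.

```r
# Fixed-width stopping rule (1.6.1) for the AR(1) chain with rho = 0.95.
set.seed(1)
eps <- 0.1
rho <- 0.95
x   <- 1                                     # X_1 = 1
repeat {
  last <- x[length(x)]
  new  <- numeric(1000)
  for (i in 1:1000) { last <- rho * last + rnorm(1); new[i] <- last }
  x  <- c(x, new)                            # extend the run by 1000 draws
  n  <- length(x)
  hw <- olbm(x)$half.width                   # t-quantile half-width via OLBM
  pn <- eps * if (n <= 1000) 1 else 1 / n    # p(n) from the example
  if (hw + pn <= eps) break
}
c(iterations = n, estimate = mean(x), half.width = hw)
```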

1.7 Markov Chain Central Limit Theorems

Throughout we have assumed the existence of a Markov chain central limit theorem, as at (1.1.3) and (1.4.2). In this section we provide a brief discussion of the conditions required for these claims; much more detail can be found in Chan and Geyer (1994), Jones (2004), Meyn and Tweedie (1993), Roberts and Rosenthal (2004) and Tierney (1994).

Without stating it, we have assumed throughout this chapter that the Markov chain X is Harris ergodic. To fully explain this condition would require a fair amount of Markov chain theory, and hence we will content ourselves with an informal discussion of this assumption. An interested reader should consult Meyn and Tweedie (1993), Nummelin (1984) or Roberts and Rosenthal (2004). Informally, Harris ergodicity ensures that, for every starting value, if A ⊆ X is such that ∫_A π(dx) > 0, then A will be visited infinitely often as n → ∞. Thus, at least asymptotically, X will thoroughly explore the support X of the target π.

This assumption guarantees a nice form of convergence, which requires a bit of notation to describe. Let the conditional distribution of X_n given X_1 = x be denoted P^n(x, ·); that is, P^n(x, A) = Pr(X_n ∈ A | X_1 = x). Then Harris ergodicity implies that for every starting point x ∈ X,

\[ \| P^n(x, \cdot) - \pi(\cdot) \| \to 0 \quad \text{as } n \to \infty \tag{1.7.1} \]

where ‖·‖ is the total variation norm. (This is stronger than convergence in distribution.)

Hence, as n increases, the marginal distribution of X_n will be closer to that of π. Harris ergodicity alone is not sufficient for the Markov chain SLLN or a CLT. However, if X is Harris ergodic and E_π|g| < ∞, then the SLLN holds: ḡ_n → E_π g with probability 1 as n → ∞. For the CLT we require more; specifically, we will need to know the rate of the convergence in (1.7.1) to say something about the existence of a CLT. Let M(x) be a nonnegative function on X and γ(n) a nonnegative function on Z⁺ such that

\[ \| P^n(x, \cdot) - \pi(\cdot) \| \le M(x) \gamma(n). \tag{1.7.2} \]

When γ(n) = tⁿ for some t < 1, X is geometrically ergodic, while uniform ergodicity means X is geometrically ergodic and M is bounded. These are key sufficient conditions for the existence of an asymptotic distribution of the Monte Carlo error. In particular, a CLT as at (1.1.3) holds if X is geometrically ergodic and E_π|g|^{2+δ} < ∞ for some δ > 0, or if X is uniformly ergodic and E_π g² < ∞. Moreover, Flegal and Jones (2009a) and Jones et al. (2006) establish that when X is geometrically ergodic the batch means and overlapping batch means methods produce strongly consistent estimators of σ²_g from (1.1.3).

Geometric ergodicity is also an important sufficient condition for establishing (1.4.2) when estimating a quantile. Suppose F has derivative f, with f(φ_q) positive and finite. Further assume f′(x) exists and |f′(x)| ≤ B < ∞ for x in some neighborhood of φ_q. If X is geometrically ergodic, then (1.4.2) obtains (Flegal and Jones, 2009b).

In general, establishing (1.7.2) directly is apparently daunting. However, if X is finite (no matter how large), then a Harris ergodic Markov chain is uniformly ergodic. When X is a general space, there are constructive methods which can be used to establish geometric or uniform ergodicity; see Hobert (2009) and Jones and Hobert (2001) for accessible introductions. These techniques have been applied to many MCMC samplers. For example, Metropolis-Hastings samplers with state-independent proposals can be uniformly ergodic (Tierney, 1994), but such situations are uncommon in realistic settings. Standard random walk Metropolis-Hastings chains on R^d, d ≥ 1, cannot be uniformly ergodic but may still be geometrically ergodic; see Mengersen and Tweedie (1996). Other research on establishing convergence rates of Markov chains used in MCMC is given by Geyer (1999), Jarner and Hansen (2000), Meyn and Tweedie (1994) and Neath and Jones (2009), who considered Metropolis-Hastings algorithms, and Hobert and Geyer (1998), Hobert et al. (2002), Johnson and Jones (2008), Jones and Hobert (2004), Marchev and Hobert (2004), Roberts and Polson (1994), Roberts and Rosenthal (1999), Rosenthal (1995, 1996), Roy and Hobert (2007), Tan

and Hobert (2009) and Tierney (1994), who examined Gibbs samplers.

Acknowledgments

This work was supported by an NSF (DMS) grant.


Bibliography

Alexopoulos, C., Andradóttir, S., Argon, N. T., and Goldsman, D. (2006). Replicated batch means variance estimators in the presence of an initial transient. ACM Transactions on Modeling and Computer Simulation, 16.

Alexopoulos, C. and Goldsman, D. (2004). To batch or not to batch? ACM Transactions on Modeling and Computer Simulation, 14(1).

Bertail, P. and Clémençon, S. (2006). Regenerative block-bootstrap for Markov chains. Bernoulli, 12.

Bratley, P., Fox, B. L., and Schrage, L. E. (1987). A Guide to Simulation. Springer-Verlag, New York.

Bühlmann, P. (2002). Bootstraps for time series. Statistical Science, 17.

Caffo, B. S., Peng, R., Dominici, F., Louis, T., and Zeger, S. (2009). Parallel Bayesian MCMC imputation for multiple distributed lag models: A case study in environmental epidemiology (to appear). In Handbook of Markov Chain Monte Carlo. CRC, London.

Chan, K. S. and Geyer, C. J. (1994). Comment on "Markov chains for exploring posterior distributions". The Annals of Statistics, 22.

Cowles, M. K. and Carlin, B. P. (1996). Markov chain Monte Carlo convergence diagnostics: A comparative review. Journal of the American Statistical Association, 91.

Cowles, M. K., Roberts, G. O., and Rosenthal, J. S. (1999). Possible biases induced by MCMC convergence diagnostics. Journal of Statistical Computing and Simulation, 64.

Datta, S. and McCormick, W. P. (1993). Regeneration-based bootstrap for Markov chains. The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 21.

Ferguson, T. S. (1996). A Course in Large Sample Theory. Chapman & Hall / CRC, Boca Raton.

Flegal, J. M. (2008). Monte Carlo standard errors for Markov chain Monte Carlo. PhD thesis, University of Minnesota, School of Statistics.

Flegal, J. M., Haran, M., and Jones, G. L. (2008). Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science, 23.

Flegal, J. M. and Jones, G. L. (2009a). Batch means and spectral variance estimators in Markov chain Monte Carlo. The Annals of Statistics (to appear).

Flegal, J. M. and Jones, G. L. (2009b). Quantile estimation via Markov chain Monte Carlo. Technical report, University of California, Riverside, Department of Statistics.

Fristedt, B. and Gray, L. F. (1997). A Modern Approach to Probability Theory. Birkhäuser Verlag.

Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7.

Geyer, C. J. (1992). Practical Markov chain Monte Carlo (with discussion). Statistical Science, 7.

Geyer, C. J. (1995). Conditioning in Markov chain Monte Carlo. Journal of Computational and Graphical Statistics, 4.

Geyer, C. J. (1999). Likelihood inference for spatial point processes. In Barndorff-Nielsen, O. E., Kendall, W. S., and van Lieshout, M. N. M., editors, Stochastic Geometry: Likelihood and Computation. Chapman & Hall/CRC, Boca Raton.

Geyer, C. J. (2009). Introduction to Markov chain Monte Carlo (to appear). In Handbook of Markov Chain Monte Carlo. CRC, London.

Glynn, P. W. and Whitt, W. (1991). Estimating the asymptotic variance with batch means. Operations Research Letters, 10.

Glynn, P. W. and Whitt, W. (1992). The asymptotic validity of sequential stopping rules for stochastic simulations. The Annals of Applied Probability, 2.

Hobert, J. P. (2009). The data augmentation algorithm: Theory and methodology (to appear). In Handbook of Markov Chain Monte Carlo. CRC, London.


More information

Markov Chain Monte Carlo A Contribution to the Encyclopedia of Environmetrics

Markov Chain Monte Carlo A Contribution to the Encyclopedia of Environmetrics Markov Chain Monte Carlo A Contribution to the Encyclopedia of Environmetrics Galin L. Jones and James P. Hobert Department of Statistics University of Florida May 2000 1 Introduction Realistic statistical

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Sampling Algorithms for Probabilistic Graphical models

Sampling Algorithms for Probabilistic Graphical models Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir

More information

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute

More information

Package mcmcse. July 4, 2017

Package mcmcse. July 4, 2017 Version 1.3-2 Date 2017-07-03 Title Monte Carlo Standard Errors for MCMC Package mcmcse July 4, 2017 Author James M. Flegal , John Hughes , Dootika Vats ,

More information

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at

More information

University of California San Diego and Stanford University and

University of California San Diego and Stanford University and First International Workshop on Functional and Operatorial Statistics. Toulouse, June 19-21, 2008 K-sample Subsampling Dimitris N. olitis andjoseph.romano University of California San Diego and Stanford

More information

Geometric Ergodicity of a Random-Walk Metorpolis Algorithm via Variable Transformation and Computer Aided Reasoning in Statistics

Geometric Ergodicity of a Random-Walk Metorpolis Algorithm via Variable Transformation and Computer Aided Reasoning in Statistics Geometric Ergodicity of a Random-Walk Metorpolis Algorithm via Variable Transformation and Computer Aided Reasoning in Statistics a dissertation submitted to the faculty of the graduate school of the university

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

Lecture 6: Markov Chain Monte Carlo

Lecture 6: Markov Chain Monte Carlo Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen koskinen@nbi.ku.dk Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline

More information

On Markov chain Monte Carlo methods for tall data

On Markov chain Monte Carlo methods for tall data On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational

More information

Sampling from complex probability distributions

Sampling from complex probability distributions Sampling from complex probability distributions Louis J. M. Aslett (louis.aslett@durham.ac.uk) Department of Mathematical Sciences Durham University UTOPIAE Training School II 4 July 2017 1/37 Motivation

More information

Analysis of Polya-Gamma Gibbs sampler for Bayesian logistic analysis of variance

Analysis of Polya-Gamma Gibbs sampler for Bayesian logistic analysis of variance Electronic Journal of Statistics Vol. (207) 326 337 ISSN: 935-7524 DOI: 0.24/7-EJS227 Analysis of Polya-Gamma Gibbs sampler for Bayesian logistic analysis of variance Hee Min Choi Department of Statistics

More information

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for

More information

Stat 451 Lecture Notes Monte Carlo Integration

Stat 451 Lecture Notes Monte Carlo Integration Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

1. INTRODUCTION Propp and Wilson (1996,1998) described a protocol called \coupling from the past" (CFTP) for exact sampling from a distribution using

1. INTRODUCTION Propp and Wilson (1996,1998) described a protocol called \coupling from the past (CFTP) for exact sampling from a distribution using Ecient Use of Exact Samples by Duncan J. Murdoch* and Jerey S. Rosenthal** Abstract Propp and Wilson (1996,1998) described a protocol called coupling from the past (CFTP) for exact sampling from the steady-state

More information

Partially Collapsed Gibbs Samplers: Theory and Methods

Partially Collapsed Gibbs Samplers: Theory and Methods David A. VAN DYK and Taeyoung PARK Partially Collapsed Gibbs Samplers: Theory and Methods Ever-increasing computational power, along with ever more sophisticated statistical computing techniques, is making

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Lecture 8: The Metropolis-Hastings Algorithm

Lecture 8: The Metropolis-Hastings Algorithm 30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

VARIABLE TRANSFORMATION TO OBTAIN GEOMETRIC ERGODICITY IN THE RANDOM-WALK METROPOLIS ALGORITHM

VARIABLE TRANSFORMATION TO OBTAIN GEOMETRIC ERGODICITY IN THE RANDOM-WALK METROPOLIS ALGORITHM Submitted to the Annals of Statistics VARIABLE TRANSFORMATION TO OBTAIN GEOMETRIC ERGODICITY IN THE RANDOM-WALK METROPOLIS ALGORITHM By Leif T. Johnson and Charles J. Geyer Google Inc. and University of

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

16 : Markov Chain Monte Carlo (MCMC)

16 : Markov Chain Monte Carlo (MCMC) 10-708: Probabilistic Graphical Models 10-708, Spring 2014 16 : Markov Chain Monte Carlo MCMC Lecturer: Matthew Gormley Scribes: Yining Wang, Renato Negrinho 1 Sampling from low-dimensional distributions

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

Simulation. Where real stuff starts

Simulation. Where real stuff starts 1 Simulation Where real stuff starts ToC 1. What is a simulation? 2. Accuracy of output 3. Random Number Generators 4. How to sample 5. Monte Carlo 6. Bootstrap 2 1. What is a simulation? 3 What is a simulation?

More information

arxiv: v1 [stat.co] 21 Mar 2014

arxiv: v1 [stat.co] 21 Mar 2014 A practical sequential stopping rule for high-dimensional MCMC and its application to spatial-temporal Bayesian models arxiv:1403.5536v1 [stat.co] 21 Mar 2014 Lei Gong Department of Statistics University

More information

I. Bayesian econometrics

I. Bayesian econometrics I. Bayesian econometrics A. Introduction B. Bayesian inference in the univariate regression model C. Statistical decision theory D. Large sample results E. Diffuse priors F. Numerical Bayesian methods

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Kernel Sequential Monte Carlo

Kernel Sequential Monte Carlo Kernel Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) * equal contribution April 25, 2016 1 / 37 Section

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

The Delta Method and Applications

The Delta Method and Applications Chapter 5 The Delta Method and Applications 5.1 Local linear approximations Suppose that a particular random sequence converges in distribution to a particular constant. The idea of using a first-order

More information

Bridge estimation of the probability density at a point. July 2000, revised September 2003

Bridge estimation of the probability density at a point. July 2000, revised September 2003 Bridge estimation of the probability density at a point Antonietta Mira Department of Economics University of Insubria Via Ravasi 2 21100 Varese, Italy antonietta.mira@uninsubria.it Geoff Nicholls Department

More information

eqr094: Hierarchical MCMC for Bayesian System Reliability

eqr094: Hierarchical MCMC for Bayesian System Reliability eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167

More information

Monte Carlo Integration using Importance Sampling and Gibbs Sampling

Monte Carlo Integration using Importance Sampling and Gibbs Sampling Monte Carlo Integration using Importance Sampling and Gibbs Sampling Wolfgang Hörmann and Josef Leydold Department of Statistics University of Economics and Business Administration Vienna Austria hormannw@boun.edu.tr

More information

Markov Chain Monte Carlo for Linear Mixed Models

Markov Chain Monte Carlo for Linear Mixed Models Markov Chain Monte Carlo for Linear Mixed Models a dissertation submitted to the faculty of the graduate school of the university of minnesota by Felipe Humberto Acosta Archila in partial fulfillment of

More information

Bayesian data analysis in practice: Three simple examples

Bayesian data analysis in practice: Three simple examples Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to

More information

Non-Parametric Bayesian Inference for Controlled Branching Processes Through MCMC Methods

Non-Parametric Bayesian Inference for Controlled Branching Processes Through MCMC Methods Non-Parametric Bayesian Inference for Controlled Branching Processes Through MCMC Methods M. González, R. Martínez, I. del Puerto, A. Ramos Department of Mathematics. University of Extremadura Spanish

More information

Control Variates for Markov Chain Monte Carlo

Control Variates for Markov Chain Monte Carlo Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability

More information

Stat 5102 Notes: Markov Chain Monte Carlo and Bayesian Inference

Stat 5102 Notes: Markov Chain Monte Carlo and Bayesian Inference Stat 5102 Notes: Markov Chain Monte Carlo and Bayesian Inference Charles J. Geyer April 6, 2009 1 The Problem This is an example of an application of Bayes rule that requires some form of computer analysis.

More information

Reminder of some Markov Chain properties:

Reminder of some Markov Chain properties: Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent

More information

Markov Chain Monte Carlo, Numerical Integration

Markov Chain Monte Carlo, Numerical Integration Markov Chain Monte Carlo, Numerical Integration (See Statistics) Trevor Gallen Fall 2015 1 / 1 Agenda Numerical Integration: MCMC methods Estimating Markov Chains Estimating latent variables 2 / 1 Numerical

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Local consistency of Markov chain Monte Carlo methods

Local consistency of Markov chain Monte Carlo methods Ann Inst Stat Math (2014) 66:63 74 DOI 10.1007/s10463-013-0403-3 Local consistency of Markov chain Monte Carlo methods Kengo Kamatani Received: 12 January 2012 / Revised: 8 March 2013 / Published online:

More information

Brief introduction to Markov Chain Monte Carlo

Brief introduction to Markov Chain Monte Carlo Brief introduction to Department of Probability and Mathematical Statistics seminar Stochastic modeling in economics and finance November 7, 2011 Brief introduction to Content 1 and motivation Classical

More information

Fixed-Width Output Analysis for Markov Chain Monte Carlo

Fixed-Width Output Analysis for Markov Chain Monte Carlo Johns Hopkins University, Dept. of Biostatistics Working Papers 2-9-2005 Fixed-Width Output Analysis for Markov Chain Monte Carlo Galin L. Jones School of Statistics, University of Minnesota, galin@stat.umn.edu

More information

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

NEW ESTIMATORS FOR PARALLEL STEADY-STATE SIMULATIONS

NEW ESTIMATORS FOR PARALLEL STEADY-STATE SIMULATIONS roceedings of the 2009 Winter Simulation Conference M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, and R. G. Ingalls, eds. NEW ESTIMATORS FOR ARALLEL STEADY-STATE SIMULATIONS Ming-hua Hsieh Department

More information

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison AN IMPROVED MERGE-SPLIT SAMPLER FOR CONJUGATE DIRICHLET PROCESS MIXTURE MODELS David B. Dahl dbdahl@stat.wisc.edu Department of Statistics, and Department of Biostatistics & Medical Informatics University

More information

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω ECO 513 Spring 2015 TAKEHOME FINAL EXAM (1) Suppose the univariate stochastic process y is ARMA(2,2) of the following form: y t = 1.6974y t 1.9604y t 2 + ε t 1.6628ε t 1 +.9216ε t 2, (1) where ε is i.i.d.

More information

Advanced Statistical Modelling

Advanced Statistical Modelling Markov chain Monte Carlo (MCMC) Methods and Their Applications in Bayesian Statistics School of Technology and Business Studies/Statistics Dalarna University Borlänge, Sweden. Feb. 05, 2014. Outlines 1

More information

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

An introduction to adaptive MCMC

An introduction to adaptive MCMC An introduction to adaptive MCMC Gareth Roberts MIRAW Day on Monte Carlo methods March 2011 Mainly joint work with Jeff Rosenthal. http://www2.warwick.ac.uk/fac/sci/statistics/crism/ Conferences and workshops

More information

Improving the Convergence Properties of the Data Augmentation Algorithm with an Application to Bayesian Mixture Modelling

Improving the Convergence Properties of the Data Augmentation Algorithm with an Application to Bayesian Mixture Modelling Improving the Convergence Properties of the Data Augmentation Algorithm with an Application to Bayesian Mixture Modelling James P. Hobert Department of Statistics University of Florida Vivekananda Roy

More information

Stat 5102 Notes: Markov Chain Monte Carlo and Bayesian Inference

Stat 5102 Notes: Markov Chain Monte Carlo and Bayesian Inference Stat 5102 Notes: Markov Chain Monte Carlo and Bayesian Inference Charles J. Geyer March 30, 2012 1 The Problem This is an example of an application of Bayes rule that requires some form of computer analysis.

More information

Multivariate Output Analysis for Markov Chain Monte Carlo

Multivariate Output Analysis for Markov Chain Monte Carlo Multivariate Output Analysis for Marov Chain Monte Carlo Dootia Vats School of Statistics University of Minnesota vatsx007@umn.edu James M. Flegal Department of Statistics University of California, Riverside

More information

MCMC 2: Lecture 3 SIR models - more topics. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham

MCMC 2: Lecture 3 SIR models - more topics. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham MCMC 2: Lecture 3 SIR models - more topics Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham Contents 1. What can be estimated? 2. Reparameterisation 3. Marginalisation

More information

Bayesian Inference: Concept and Practice

Bayesian Inference: Concept and Practice Inference: Concept and Practice fundamentals Johan A. Elkink School of Politics & International Relations University College Dublin 5 June 2017 1 2 3 Bayes theorem In order to estimate the parameters of

More information

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN Massimo Guidolin Massimo.Guidolin@unibocconi.it Dept. of Finance STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN SECOND PART, LECTURE 2: MODES OF CONVERGENCE AND POINT ESTIMATION Lecture 2:

More information

A Bayesian perspective on GMM and IV

A Bayesian perspective on GMM and IV A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all

More information

A Geometric Interpretation of the Metropolis Hastings Algorithm

A Geometric Interpretation of the Metropolis Hastings Algorithm Statistical Science 2, Vol. 6, No., 5 9 A Geometric Interpretation of the Metropolis Hastings Algorithm Louis J. Billera and Persi Diaconis Abstract. The Metropolis Hastings algorithm transforms a given

More information

The Bayesian Approach to Multi-equation Econometric Model Estimation

The Bayesian Approach to Multi-equation Econometric Model Estimation Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information