BENCHMARK ESTIMATION FOR MARKOV CHAIN MONTE CARLO SAMPLERS


BENCHMARK ESTIMATION FOR MARKOV CHAIN MONTE CARLO SAMPLERS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Subharup Guha, M.Sc.

The Ohio State University
2004

Dissertation Committee:
Steven N. MacEachern, Co-Adviser
Mario Peruggia, Co-Adviser
L. Mark Berliner
Peter F. Craigmile

Department of Statistics

© Copyright by Subharup Guha, 2004

ABSTRACT

While studying various features of the posterior distribution of a vector-valued parameter using an MCMC sample, systematic subsampling of the MCMC output can only lead to poorer estimation. Nevertheless, a 1-in-k subsample is often all that is retained in investigations where intensive computations are involved or where speed is essential. The goal of benchmark estimation is to produce a number of estimates based on the best available information, i.e. the entire MCMC sample, and to use these to improve other estimates made on the basis of the subsample. We take a simple approach by creating a weighted subsample where the weights are quickly obtained as the solution to a system of linear equations. We provide a theoretical basis for the method and illustrate the technique using examples from the literature. For a subsampling rate of 1-in-10, the observed reductions in MSE often exceed 50% for a number of posterior features. Much larger gains are expected for certain complex estimation methods and for the commonly used thinner subsampling rates. Benchmark estimation can be used wherever other fast or efficient estimation strategies, such as importance sampling, already exist; we show how the two strategies can be used in conjunction with each other. We also discuss some asymptotic properties of benchmark estimators that provide insight into the gains associated with the technique. The observed gains closely match the theoretical values predicted by the asymptotics, even for k as small as 10.

Dedicated to my dear parents

ACKNOWLEDGMENTS

I am indebted to Dr. Steve MacEachern for many useful research ideas and for his unstinting help and guidance over the years. I am grateful to Dr. Mario Peruggia for his patience, insight and invaluable support. I am deeply appreciative of Dr. Mark Berliner and Dr. Peter Craigmile for acting as references for my work. Finally, I thank my parents and lovingly dedicate my dissertation to them.

VITA

September 13, ............. Born, Calcutta, India
.......................... Master of Science in Statistics, Indian Institute of Technology, Kanpur, India
1999-present .............. Graduate Teaching Associate, The Ohio State University

PUBLICATIONS

MacEachern, S. N., Peruggia, M. and Guha, S. (2003). Discussion of "A theory of statistical models for Monte Carlo integration" by Kong, McCullagh, Nicolae, Tan and Meng. Journal of the Royal Statistical Society, Series B, 65(3).

FIELDS OF STUDY

Major Field: Statistics

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
   1.1 Monte Carlo Methods
       1.1.1 Rejection Method
       1.1.2 Importance Sampling
       1.1.3 MCMC Estimation
   1.2 Systematic Subsampling
   1.3 Benchmark Estimation for MCMC Samplers
   1.4 An Overview of this Dissertation
2. A Simple Approach to Benchmark Estimation
   2.1 An Improved Subsample Estimator
   2.2 Some Methods of Obtaining Weights
   2.3 Benchmark Estimation and Importance Sampling
3. Theoretical Results
   3.1 Notation
   3.2 Asymptotic Properties of Post-stratification Estimators Based on a Subsample
       Asymptotic A: n is held fixed and k tends to ∞
       Asymptotic B: k is held fixed and n tends to ∞
       Asymptotic C: n and k jointly tend to ∞
4. Illustrations
   4.1 Failure Times of Power Plant Pumps
   4.2 Allometry Example: Brain Masses and Body Weights
5. Conclusions and Future Work

Appendices:

A. Proof of Theorem
B. Results Used to Establish Theorem
   B.1 A New Set of Asymptotic Tools
   B.2 Details of Lemmas
C. Likelihood Function of an Over- or Under-dispersed Model for the Pumps Dataset
D. Allometry Example: R Code for Generating Draws from the Semiparametric Model Posterior

Bibliography

LIST OF TABLES

4.1 Comparison of MSE of the subsample estimators for a 1-in-10 systematic subsample
4.2 Comparison of MSE of the subsample estimators for a 1-in-100 systematic subsample

LIST OF FIGURES

4.1 Trellis plots of the percent reductions in MSE produced by the subsample estimator for all method-by-rate combinations and all features of interest E[g(θ)]
4.2 Level plot of the percent MSE reduction for the weighted subsample estimation of β_2 using the post-stratification weights (iii) and 1-in-10 subsamples
4.3 Plot of B̂_c ± 2 ŜE(B̂_c) versus c
4.4 A comparison of the predictive mass functions of the number of failures of a pump on which there is no data, having an operation time of ... hours
4.5 Importance sampling weights versus β for the target posterior corresponding to the model M
4.6 The percent reductions in MSE for various features of the target posterior, π
4.7 A normal probability plot of the log-transformed, recentered covariates for the Weisberg data
4.8 Percent reductions in variance achieved by post-stratification, relative to subsample-based empirical average estimation, for some of the investigated posterior features
4.9 Percent reductions in variance achieved by post-stratification, relative to subsample-based empirical average estimation, for the remaining six posterior features investigated
4.10 A comparison of the asymptotic and the actual variance reductions for PS-(2). The dashed lines indicate a margin of two standard errors
4.11 A comparison of the asymptotic and the actual variance reductions for PS-(2) for the remaining set of posterior features. The dashed lines indicate a margin of two standard errors
4.12 Estimated asymptotic reductions in variance and squared multiple correlations for different posterior features
4.13 Estimated asymptotic reductions in variance and squared multiple correlations for different posterior features

CHAPTER 1

Introduction

1.1 Monte Carlo Methods

Bayesian methods have long been touted as providing an optimal approach to statistics (Savage, 1954). Bayesian methods have a common foundation with traditional approaches to statistics. Both approaches begin with a description of the outcome of an experiment, x, as a random quantity whose value is influenced by a parameter, θ. The outcome of the experiment is referred to as the data. In small problems, the data will consist of dozens of measurements; in very large problems, the data may come in terabytes. In a simple setting, the parameter may be as direct as the mean of a probability distribution for x; in more complex settings, the parameter may be a vector with hundreds, thousands, or even an infinite number of dimensions. The more complex parameters are used to describe various features of the distribution of x, including the shape of the distribution and the relationship between various components of x.

Bayesian methods depart from traditional statistical methods in how they treat the parameter. Since we use the language of probability to describe our uncertainty about the world, the Bayesian naturally uses this language to describe our uncertainty

about the parameter, θ, as well. Having done so, the formal distinction between parameter and data disappears, and the tools of conditional probability can be used to describe our uncertainty about (or conversely, knowledge about) the parameter after having seen the data. This last probability distribution is referred to as the posterior distribution for θ given x. This distribution summarizes all current knowledge about the parameter, and so is the basis of all inference about the model parameter, θ.

Formally, a Bayesian model consists of the data x, and a parameter θ taking values in a set Θ. The joint probability distribution, denoted by f(x, θ), is specified by the model, usually in terms of the likelihood function f(x | θ) and the prior f(θ). All inference is based on the posterior distribution, denoted by π. The posterior density at a point θ is given by π(θ) = f(x, θ)/m(x), where m(x) = ∫ f(x, θ) µ(dθ) is the marginal likelihood of the data and µ is a σ-finite measure. Interest often focuses on features that can be expressed as a posterior expectation, E[g(θ)], for some function g(·). Written explicitly, E[g(θ)] = ∫ g(θ) f(x, θ) µ(dθ) / m(x). Examples of such posterior features include posterior moments, quantiles, HPD regions and the posterior density itself.

In many Bayesian models, the parameter has a large number of components, and the posterior distribution is complex enough that it cannot be evaluated exactly. Evaluating posterior features by numerical integration is therefore not possible unless the parameter consists of a very small number of components. Widespread use of Bayesian methods was hindered by a lack of computational strategies and power that restricted their use to a small number of stylized problems. A seminal paper (Gelfand and Smith, 1990) advocated the fitting of statistical models

with simulation (or Monte Carlo) methods, rather than through analytic integration, analytic approximation, or numerical integration. Since that time, there has been increased acceptance of the use of simulation-based approximations to fit realistic models. There has also been tremendous growth in the use of Bayesian methods, to the point where several departments have been founded with an exclusively Bayesian approach to statistics.

Monte Carlo methods use a stream of random numbers to generate a sample of parameter vectors drawn from the posterior distribution of θ. The information contained in these parameter vectors is then used to provide estimates of a broad class of features of the posterior distribution. Gilks et al. (1996) provides an excellent overview of these techniques.

Formally, simulation methods use a random sample of N draws, denoted by θ^(1), ..., θ^(N), to estimate posterior features of interest. These sample draws may or may not be independent. If the draws are i.i.d. with a common distribution π_s, referred to as the source distribution, the law of large numbers states that the empirical average, Ê[g(θ)] = (1/N) Σ_{i=1}^N g(θ^(i)), tends almost surely to E_{π_s}[g(θ)]. Therefore, if the source distribution π_s is identical to the posterior distribution π and the sample size N is large, Ê[g(θ)] may be regarded as a reasonable estimate of the posterior feature E[g(θ)].

In practice, it is usually not possible to sample directly from π if the marginal likelihood m(x) cannot be analytically or numerically evaluated. This is the case with most realistic Bayesian models. We could obtain i.i.d. samples from a different source distribution and then weight the samples appropriately to obtain an estimate of E[g(θ)]. Two techniques that use this approach are importance sampling and the rejection method. A drawback of both these methods is that they are not automatic and often require considerable skill to be effectively applied.

1.1.1 Rejection Method

Given a set of i.i.d. draws from a source distribution π_s, the rejection method re-samples from these draws with a probability compensating for the difference between the source and the posterior distribution. The resulting sample of draws is approximately i.i.d. π. The success of the method depends on being able to find a good envelope, π_s, for the posterior. For most Bayesian models, it is difficult to find a reasonably good envelope.

1.1.2 Importance Sampling

Geweke (1989) provides an overview of importance sampling for i.i.d. draws. Let θ^(1), ..., θ^(N) be a sample of i.i.d. draws from the source distribution π_s. It is often convenient to work with a non-normalized version π*_s(θ) of the source density, where π_s(θ) = π*_s(θ)/d and d > 0 is a constant. The non-normalized importance weight of a point θ equals w*(θ) = f(x, θ)/π*_s(θ). On the other hand, the normalized weight, w(θ) = π(θ)/π_s(θ), cannot be computed exactly. The importance sampling estimator of E[g(θ)] is

Ê[g(θ)]_IS = Σ_{i=1}^n w*(θ^(i)) g(θ^(i)) / Σ_{i=1}^n w*(θ^(i)).   (1.1)

Under assumptions about the existence of first moments and about the support of π_s(θ), Theorem 1 of Geweke's paper states that Ê[g(θ)]_IS is a consistent estimator of E[g(θ)]. Under additional assumptions about the existence of appropriate second-order moments, Theorem 2 states that the estimator is asymptotically normal: √n (Ê[g(θ)]_IS − E[g(θ)]) → N(0, σ²_IS), where σ²_IS = E[w(θ) (g(θ) − E[g(θ)])²].
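As a concrete illustration of (1.1), the following minimal R sketch computes a self-normalized importance sampling estimate. The standard normal "posterior" and the Cauchy source are illustrative assumptions, not an example from this chapter.

    # Self-normalized importance sampling, as in (1.1).
    # Illustrative choices: target density proportional to dnorm (the "posterior"),
    # source density dcauchy (thicker-tailed than the target).
    set.seed(1)
    N <- 10000
    theta <- rcauchy(N)                      # i.i.d. draws from the source pi_s
    w.star <- dnorm(theta) / dcauchy(theta)  # non-normalized weights w*(theta)
    g <- theta^2                             # feature of interest: E[theta^2]
    est.IS <- sum(w.star * g) / sum(w.star)  # self-normalized estimate (1.1)
    est.IS                                   # close to 1, the true E[theta^2] under N(0,1)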

The relative numerical efficiency of Ê[g(θ)]_IS is defined as RNE = Var(g(θ))/σ²_IS. The RNE can be estimated consistently based on the importance sample. An RNE much smaller than unity indicates that the estimator Ê[g(θ)]_IS has a low precision. A better importance sampling source should then be considered, because the posterior itself, at least theoretically, is a source corresponding to a substantially better estimator. An RNE greater than unity implies that the estimator Ê[g(θ)]_IS has a smaller asymptotic MSE than the empirical average estimator based on i.i.d. draws from the posterior.

Theorem 3 of Geweke's paper states that π*_s(θ) = |g(θ) − E[g(θ)]| f(x, θ) optimizes the asymptotic variance of Ê[g(θ)]_IS. Importance sampling can therefore be regarded as a variance reduction technique. However, notice that the optimal source depends on g(·). Thus, computational costs may prevent one from sampling from the optimal source corresponding to each investigated feature. Nevertheless, the form of the optimal weight function suggests that a thicker-tailed source than the posterior would produce reasonably precise (although possibly sub-optimal) estimates. Choices for the source include a multivariate normal or t-distribution, with the parameters chosen to match the tail behavior of the posterior.

A useful diagnostic for the effectiveness of importance sampling estimation is a matrix scatterplot of the importance sampling weights versus the coordinates of the sampled points. The existence of high weights for outlying coordinate values would indicate that the importance sampling estimates are imprecise.

A related measure of the effectiveness of importance sampling is the effective sample size. This is defined as follows: given a set of N i.i.d. draws from the source

distribution π_s,

ESS = N / (1 + Var_{π_s}(w(θ))),

where w(·) is the normalized importance sampling weight. The intuitive interpretation of this quantity is that importance sampling estimation with π_s as the source has roughly the same precision as empirical average estimation based on ESS i.i.d. draws sampled from π. An ESS much smaller than N suggests imprecise importance sampling estimation.

The ESS involves essentially the same idea as the RNE. The two measures are equal if w(θ) and g(θ) are uncorrelated under π (Liu, 2001). However, the ESS is easier to estimate and is not tied to any particular posterior feature. The ESS is maximized when π_s ≡ π. Intuitively, the importance sampling source that works best for a wide variety of posterior features is the posterior distribution itself.

To summarize, the source density π_s(θ) should closely resemble the posterior density. This corresponds to Var_{π_s}(w(θ)) being small. The source should also be thicker-tailed than the posterior, so that large importance weights do not occur in the tails. This strategy would produce reasonably accurate importance sampling estimates of most posterior features.

Importance Link Function (ILF) estimation, introduced in MacEachern and Peruggia (2000a), is a sophisticated version of importance sampling. A transformation is applied to the source distribution so that the density of the transformed distribution more closely matches the posterior. Re-centering and re-scaling (applying an affine transformation) often achieves this goal. It is also computationally attractive because the Jacobian is trivially known for an affine transformation. More accurate importance sampling estimates are obtained in this fashion.
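Returning to the effective sample size defined above: since w(θ) differs from w*(θ) only by a multiplicative constant, the ESS can be estimated directly from the non-normalized weights. A minimal R sketch, reusing the hypothetical w.star from the earlier importance sampling illustration:

    # Estimated effective sample size from non-normalized importance weights.
    ess <- function(w.star) {
      w <- w.star / mean(w.star)   # estimates the normalized weight, which has mean 1
      N <- length(w.star)
      N / (1 + var(w))             # ESS = N / (1 + Var(w)); equals N when w is constant
    }
    ess(w.star)                    # a value much smaller than N flags imprecise estimation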

1.1.3 MCMC Estimation

For almost any Bayesian model, Markov Chain Monte Carlo (MCMC) methods can be applied to produce dependent samples approximately distributed as the posterior. Tierney (1994) states the following property of Markov chains: suppose the transition kernel P of the Markov chain is aperiodic, irreducible and has an invariant distribution π. Then π is the unique invariant distribution of P. For any starting value θ_0 belonging to a set having π-probability equal to 1, the n-th iterate of the kernel, P^n(θ_0, ·), converges in total variation distance to the distribution π as n → ∞. This convergence holds for all θ_0 ∈ Θ if the chain P is Harris recurrent.

This result implies that, under mild conditions, a Markov chain having the posterior π as the invariant distribution ultimately produces dependent samples approximately distributed as π. Let θ^(1), ..., θ^(N) represent the MCMC sample from the posterior. As stated in Tierney's paper, ergodic results and central limit theorems also hold for the empirical average estimators of most posterior features. These results provide a theoretical justification for discarding the initial MCMC draws and for using the post burn-in draws to estimate posterior features.

MCMC samples can be produced using the Metropolis-Hastings algorithm or Gibbs sampling, which is a special case of the former method.

Metropolis-Hastings Algorithm. Let Q(y, ·) be a distribution having the density q(y, ·) with respect to the measure µ. Let X_n = y denote the current state of the chain. The chain proposes a candidate state z ∼ Q(y, ·). It moves to the proposed state (i.e. X_{n+1} = z) with probability α(y, z) and rejects the move (i.e. X_{n+1} = y)

with probability 1 − α(y, z), where

α(y, z) = min{ [π(z) q(z, y)] / [π(y) q(y, z)], 1 }  if π(y) q(y, z) > 0,  and  α(y, z) = 1  if π(y) q(y, z) = 0.

Under weak assumptions, it can be shown that this chain has the invariant distribution π. The algorithm depends on π only through the ratio π(z)/π(y), which can be computed irrespective of the value of the marginal likelihood, m(x). In most situations, the Metropolis-Hastings algorithm can therefore be used to generate draws from the posterior.

Special cases of Metropolis-Hastings chains include random walk chains, for which the proposal distribution is of the form q(y, z) = h(z − y). The hit-and-run algorithm also belongs to this class. Independence chains, in which the candidate steps are chosen according to a fixed distribution (i.e. q(y, z) = h(z)), are another special case. These chains are closely related to importance sampling. Section 2.4 of Tierney's paper discusses hybrid strategies like cycles and mixtures. The Gibbs sampler is an example, in which several reducible kernels are cycled in a fixed or random order to obtain an irreducible kernel. A mixture or cycle is uniformly ergodic if one of its components is uniformly ergodic (Gilks et al., 1996). This property can be used to construct faster mixing chains. Mixing is also improved by occasionally restarting the chain by combining it with an independence chain.
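The following R sketch illustrates the algorithm for a random walk chain; the standard normal target and the proposal scale are illustrative assumptions only. Because the acceptance ratio uses only π(z)/π(y), the target density need not be normalized:

    # Random walk Metropolis-Hastings for a univariate target known up to a constant.
    # log.target: log of an unnormalized target density; here a N(0,1) illustration.
    log.target <- function(theta) -theta^2 / 2
    set.seed(1)
    N <- 10000
    theta <- numeric(N)
    theta[1] <- 0
    for (i in 2:N) {
      z <- theta[i - 1] + rnorm(1, sd = 1)          # proposal q(y, z) = h(z - y)
      log.alpha <- log.target(z) - log.target(theta[i - 1])
      theta[i] <- if (log(runif(1)) < log.alpha) z else theta[i - 1]  # accept/reject
    }
    mean(theta^2)  # empirical average estimate of E[theta^2]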

Gibbs Sampling. In this sampling scheme, the elements of the vector of model parameters are grouped into blocks, each consisting of a small number of components. Block sizes are typically 1, although it may be natural (as in a Bayesian random-effects model) to group an entire precision matrix into a single block. Grouping highly correlated components into a single higher-dimensional block may also improve the mixing of the chain.

Gibbs sampling updates the current state one block at a time. In the context of the Metropolis-Hastings algorithm, the proposal distribution is the conditional posterior distribution of the block being updated, given the data x and the current values of the remaining blocks. Candidate steps are always accepted, since α(y, z) is identically equal to 1. The block sizes are chosen to be very small because the full conditionals have to be sampled in a straightforward manner, possibly using numerical integration (or other techniques, like adaptive rejection sampling) when the full conditionals cannot be identified or easily sampled from. Random permutations of the updating order of the blocks are allowed. In fact, not all of the blocks need to be updated in a given cycle. Result GG1 of Gelfand and Smith (1990) states that the chain converges to π, as long as the blocks are updated according to an i.o. (infinitely often) visiting scheme.

Gibbs sampling is widely used because of its ease of implementation. However, these chains may mix slowly compared to more specialized chains. Provided the model follows a standard distribution at each hierarchical stage, the BUGS software can be used to automatically implement Gibbs sampling.
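As a toy illustration of block-at-a-time updating, consider a bivariate normal posterior with correlation ρ, for which both full conditionals are univariate normal. The target and its conditionals here are assumptions for the sketch, not a model from this dissertation:

    # Gibbs sampler for a bivariate normal target with mean 0, unit variances,
    # and correlation rho; each full conditional is N(rho * other, 1 - rho^2).
    set.seed(1)
    rho <- 0.9
    N <- 10000
    theta <- matrix(0, nrow = N, ncol = 2)
    for (i in 2:N) {
      # update block 1 given the current value of block 2
      theta[i, 1] <- rnorm(1, mean = rho * theta[i - 1, 2], sd = sqrt(1 - rho^2))
      # update block 2 given the new value of block 1
      theta[i, 2] <- rnorm(1, mean = rho * theta[i, 1], sd = sqrt(1 - rho^2))
    }
    cor(theta[, 1], theta[, 2])  # close to rho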

Importance Sampling Based on MCMC Draws. After an MCMC sample has been obtained from the posterior distribution, any subsequent change to the prior or the likelihood function of the model results in a different posterior distribution. We refer to the posterior resulting from the subsequent changes as the target posterior. Many applications involve the investigation of thousands of changes to the model, with each change producing a different posterior distribution. Generating a different MCMC sample for each target posterior is usually not cost-effective. Moreover, the target model may involve parameters that do not have an explicitly written standard distribution at each hierarchical stage. Off-the-shelf packages like BUGS then cannot be used to generate MCMC draws easily from the target posterior. Departures from a conjugate hierarchical structure may also result in slower mixing chains.

In such cases, importance sampling provides estimates of the target posterior features if the source posterior density dominates the target posterior density. Let π_s denote the source posterior, and π_t denote the target posterior distribution. Let m_s (m_t) denote the marginal density of the data under the source (target) model. Both m_t and m_s are typically unknown. The non-normalized importance sampling weight w*(θ) equals (f_t(x | θ) f_t(θ)) / (f_s(x | θ) f_s(θ)), where the subscripts t and s stand for target and source, and f_t(θ) and f_s(θ) are the respective priors under the target and the source models.

The importance sampling estimator Ê[g(θ)]_IS defined in (1.1) is consistent, by the ergodic result for empirical averages based on MCMC draws. Analogously to Theorem 2 of Geweke's paper, an application of the delta method and the central limit theorem for geometrically ergodic Markov chains (Geyer, 1992) gives the following result: assume that the Markov chain is geometrically ergodic with invariant distribution π_s. Under mild conditions, the estimator Ê[g(θ)]_IS is asymptotically normal: √n (Ê[g(θ)]_IS − E[g(θ)]) → N(0, σ²_IS) for some σ²_IS > 0. As described in Geyer (1992), the asymptotic variance σ²_IS can be estimated consistently by the batch means method or by window estimation.
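A minimal R sketch of the batch means method just mentioned; the batch count of 50 is an arbitrary illustrative choice, and g denotes the vector of g(θ) evaluations along a single MCMC run:

    # Batch means estimate of the asymptotic variance sigma^2 in
    # sqrt(n) * (g.bar - E[g]) -> N(0, sigma^2), from a single MCMC run.
    batch.means.var <- function(g, n.batches = 50) {
      b <- floor(length(g) / n.batches)       # draws per batch
      g <- g[seq_len(b * n.batches)]          # drop any leftover draws
      means <- colMeans(matrix(g, nrow = b))  # one mean per non-overlapping batch
      b * var(means)                          # variance of batch means, scaled by b
    }
    # e.g. batch.means.var(theta.chain^2) for draws stored in a vector theta.chain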

As with independent draws, a sufficient condition for importance sampling to produce reasonably accurate estimates is that the importance weight function w(θ) is bounded in the tails. Similarly to importance sampling with independent draws, a matrix scatterplot of the importance sampling weights versus the coordinates of the sample may be used to verify this. The ESS may be used because of computational convenience. Its use may be valid, especially if the j-lag covariance Cov_{π_s}(g(θ^(0)), g(θ^(j))) decays fast enough (e.g. if the function g(·) is bounded, in which case the lag covariance decays exponentially in j).

Importance sampling techniques can be used to produce faster mixing chains. As mentioned earlier, cycles or mixtures having a uniformly ergodic component are uniformly ergodic. Combining a uniformly ergodic chain with the regular Gibbs sampler therefore produces a faster mixing chain. As an example of a uniformly ergodic chain, consider an independence chain having a proposal density of h(z) for the candidate state z. Let y be the current state of the chain. Then w(z) = π_t(z)/h(z) is the importance sampling weight of the candidate state z, with the distribution h regarded as the source. The acceptance probability of the chain is α(y, z) = min{w(z)/w(y), 1}. Gilks et al. (1996) state that an independence chain is uniformly ergodic if the weight function is bounded, i.e. if the density h is thicker-tailed than the target posterior. Experience with importance sampling is therefore useful for constructing such uniformly ergodic independence chains.

An interesting feature of the importance link methodology (discussed earlier) is that it produces consistent importance sampling estimates even when the Markov chain is reducible. This contradicts common notions about MCMC methods. For example, consider a reducible chain having two separate components Θ_1 and Θ_2. The

probability of the chain going from one component to the other is zero. Suppose that a (possibly many-to-many) ILF mapping exists from the set Θ_1 onto the set Θ_2. A sample initialized in the set Θ_1 can then be transformed into a sample of points belonging to the set Θ_2. The transformed sample can be used to estimate the feature E[g(θ) I_{Θ_2}(θ)] consistently by importance sampling. The feature E[g(θ)] = E[g(θ) I_{Θ_1}(θ)] + E[g(θ) I_{Θ_2}(θ)] can therefore be consistently estimated.

1.2 Systematic Subsampling

While using an MCMC sample to investigate the posterior distribution of a vector-valued parameter θ, a subsample of the MCMC output is often all that is retained for further investigation of the posterior distribution. Systematic subsampling of the MCMC output is not recommended unless the computational cost of processing the output or of creating the estimator is much greater than the cost of generating the sample. This is convincingly argued in Geyer (1992) as follows: consider the empirical average estimator, Ê_k[g(θ)], based on a systematic 1-in-k subsample of size n. The costs of generating the samples and of creating the estimator are approximately equal to c_1 nk and c_2 n, respectively, for large n. The total cost is therefore approximately (c_1 k + c_2)n. The precision of the estimator is approximately n/σ²_k, where σ²_k is the asymptotic variance of Ê_k[g(θ)]. The effective precision of the estimator, i.e. the precision per unit cost, is therefore (σ²_k (c_1 k + c_2))⁻¹ for large values of n. If the ratio c_2/c_1 is negligible, the limit equals (k σ²_k c_1)⁻¹. As shown in Geyer's paper and also in MacEachern and Berliner (1994), k σ²_k > σ²_1 if k > 1. Any subsampling is therefore bad if this cost structure applies.
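The inequality k σ²_k > σ²_1 is easy to check numerically. The R sketch below uses a stationary AR(1) chain as a stand-in for positively correlated MCMC output (an illustrative assumption); it relies on the standard AR(1) facts that a 1-in-k subsample of such a chain is itself AR(1) with coefficient φ^k, and that the sample mean then has asymptotic variance γ_0 (1 + φ^k)/(1 − φ^k):

    # Check k * sigma_k^2 > sigma_1^2 for an AR(1) chain with autocorrelation phi.
    phi <- 0.9
    gamma0 <- 1 / (1 - phi^2)                 # stationary variance of the chain
    sigma2 <- function(k) gamma0 * (1 + phi^k) / (1 - phi^k)
    k <- c(1, 2, 5, 10, 100)
    cbind(k, k * sigma2(k))                   # increasing in k, as claimed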

Nevertheless, subsampling is often necessary in computationally intensive or real-time, interactive investigations where speed is essential. Examples include expensive plot processing and examination of changes in the prior (sensitivity analysis), likelihood (robustness) or data (case influence). Typically, such studies involve hundreds or thousands of changes to the model, necessitating subsampling. Practical constraints, like the limited disk space available to users of shared computing resources, often make it infeasible to store the entire sample of MCMC draws when the parameter has a large number of components. A subsample is then retained for future investigation of the posterior.

Subsampling may actually result in an increase in the effective precision of estimation if the ratio c_2/c_1 is non-negligible or if the processing cost is non-linear in n. The optimal value of k would then be different from one. As an example of a situation where subsampling is beneficial, consider the estimator for the marginal likelihood introduced in section 5.2 of Kong et al. (2003), whose asymptotic precision is of a higher order than that of Chib (1995). Using the same notation as above, the new likelihood estimator, based on a subsample of size n, has a total cost of c_1 n + c_2 n² and a precision of p_k n². The effective precision is therefore approximately equal to p_k/c_2 for large n. The positive correlation typical of Gibbs samplers implies that setting k > 1 results in an estimator having a higher effective precision. The above argument appears in MacEachern, Peruggia and Guha (2003).

1.3 Benchmark Estimation for MCMC Samplers

This dissertation introduces a subsample-based estimation technique called benchmark estimation. The goal of benchmark estimation is to improve estimation based on a subsample by using some of the discarded information available in the full MCMC sample. The benchmark estimates must be quick and easy to compute. They must also be compatible with quick computations for further, more (computationally) expensive analyses based on the eventual subsample.

Several motivating perspectives are useful to understand and investigate various aspects of benchmark estimation. The point of view of calibration estimation, developed in the sampling literature to improve survey estimates (Deville and Särndal, 1992; Vanderhoeft, 2001), helps to bring all these perspectives together into a unified framework. In calibration estimation, a probability sample from a finite population is used to compute estimates of population quantities of interest. The (regression type) estimators are built as weighted averages of the observations in the sample, with the weights determined so as to satisfy a (vector-valued) calibration equation which forces the resulting estimators to produce exact estimates of known population features. Usually, the constraints imposed by the calibration equation do not determine a unique set of weights. Thus, among the sets of weights satisfying the calibration equation, one chooses the set that yields weights that are as close as possible (with respect to some distance metric) to a fixed set of prespecified (typically uniform) weights.

To cast MCMC benchmark estimation into the framework of calibration estimation, we regard the MCMC output as a finite population and a 1-in-k systematic subsample as a probability sample drawn from the finite population. This systematic

sampling design gives each unit in the population a probability 1/k of being selected, though many joint inclusion probabilities are 0. In this setting, the (vector-valued) benchmark E[h(θ)], for which the subsample estimate is forced to match the full sample estimate, corresponds to the auxiliary information available through the calibration equation. Once the calibration weights have been calculated, they can be used to compute the calibration subsample estimate of any feature E[g(θ)]. As the full MCMC sample size increases, the asymptotic performance of these benchmark estimators matches that of the corresponding calibration estimators. The benchmark estimators introduced in Chapter 2 can be shown to be calibration estimators corresponding to appropriately chosen calibration equations and metrics.

1.4 An Overview of this Dissertation

The rest of this dissertation is organized as follows: Chapter 2 investigates two methods of creating weights: post-stratification and maximum entropy. In their simplest form, post-stratification weights are derived by partitioning the parameter space into a finite number of regions and by forcing the weighted subsample frequencies of each region to match the corresponding raw frequencies for the entire MCMC sample. The weights are taken to be constant over the various elements of the partition and to sum to one.

An improved version of post-stratification (and, in fact, the approach that in our experience has generated the most successful estimators) begins with a representation of an arbitrary function g(θ) as a countable linear combination of basis functions h_j(θ): g(θ) = Σ_{j=1}^∞ c_j h_j(θ). The estimand, E[g(θ)], is expressed as the same linear

combination of integrals of the basis functions, Σ_{j=1}^∞ c_j E[h_j(θ)]. Splitting the infinite series representation of g(θ) into two parts, we have a finite series which may provide a good approximation to g(θ) and an infinite remainder series that fills out the representation of g(θ). Focusing on the finite series, we determine the weights by forcing estimates of E[h_j(θ)] based on the subsample to match those based on the full sample. In addition, we require the weights to be constant over the elements of a suitably chosen partition of the parameter space and to sum to one. This produces a better estimate of E[Σ_{j=1}^m c_j h_j(θ)] than one based on the subsample alone. The improvement carries over to the estimation of E[g(θ)] when the tail of the series is of minor importance. We refer to the finite set of basis functions as the (vector-valued) benchmark function.

In both the basic and improved post-stratification approaches, we specify enough conditions that (for virtually all MCMC samples of reasonable size) there is a unique set of weights satisfying them. Thus, from the point of view of calibration estimation, the choice of the distance metric becomes immaterial, in the sense that any metric would yield identical weights. In this respect, our post-stratification weights arise from a degenerate instance of a problem of calibration estimation. In the case of the maximum entropy weights, however, we do not specify enough benchmark conditions to make the weights unique. Rather, among the sets of weights satisfying an under-determined number of benchmark conditions, we select the set having maximum entropy and this, from the point of view of calibration estimation, is tantamount to choosing a specific distance metric.

As we mentioned earlier, many computationally expensive investigations involve hundreds or thousands of changes to the model. Each change in the prior, likelihood

or the data corresponds to a different posterior distribution. Generating a different MCMC sample for each posterior is usually impossible because of the prohibitive amounts of time and computational effort required. Importance sampling can then be used along with subsampling to estimate quickly various features of interest of the different target posteriors. We show how post-stratification can be used in conjunction with importance sampling to improve future estimation based on the stored subsample. The combination of the techniques makes possible quick and accurate estimation in these situations. Moreover, the cost of combining the two techniques is negligible.

Chapter 3 investigates the large-sample properties of post-stratification estimators under three asymptotic situations:

Case A. The subsampling distance k tends to ∞, with the subsample size n remaining fixed. This asymptotic is motivated by the fact that in situations where subsampling is necessary, the cost of processing the MCMC draws to produce empirical average estimates generally exceeds the cost of generating the draws.

Case B. The subsample size n tends to ∞, with the subsampling distance k remaining fixed. This is motivated by benchmark estimation based on a large number of widely spaced (approximately independent) MCMC draws, where the subsampling distance k is fixed by an initial run of the MCMC algorithm.

Case C. The subsample size n and the subsampling distance k jointly tend to ∞. This asymptotic is also a natural candidate for theoretical exploration. It is motivated by MCMC estimation based on a large subsample of widely spaced and approximately independent draws. Viewing this process as a whole, k and n both tend to ∞ as the computational resources grow.

For Case C, we obtain a general result that applies to subsample-based estimators relying on a combination of importance sampling and post-stratification weights. As a corollary, we obtain expressions for the asymptotic variances of post-stratification estimators that do not rely on importance sampling. Although the asymptotic is technically difficult, a new set of analytical tools, presented as Lemma B.1.1 and its corollaries in the Appendix, allows us to pass in a relatively straightforward manner to an i.i.d. sample of draws. The asymptotic precisions of the combination-weighted estimators are then quantified by a linear projection on the space of the benchmark functions and the strata indicators. The theoretical results presented in Chapter 3 suggest the substantial gains that we see in practice for benchmark estimation.

Chapter 4 illustrates the methodology on examples from the literature. By itself and in conjunction with importance sampling, benchmark estimation results in substantial benefits in the estimation of E[g(θ)] for a variety of functions, g(θ). The extent of the improvement in the estimation of E[g(θ)] for functions that are noticeably different from the benchmarks is striking, even for values of m as small as 3 or 4. As illustrated in Chapter 4 using the data from George, Makov and Smith (1993), the theoretical results are a close match to the actual behavior of the estimators, even for a subsample distance k as small as 10.

Chapter 5 summarizes the dissertation and provides pointers to future research.

Appendix A contains the proofs of the result stated in Chapter 3 describing the behavior of subsample-based post-stratification estimators under Asymptotic B.

Appendix B develops the asymptotic tool used in Chapter 3 to investigate the properties, under Asymptotic C, of subsample estimators based on a combination of

post-stratification and importance sampling weights. It also provides the details of the proofs of related results.

Appendix C derives the closed-form expression for the likelihood function obtained in the pumps example of Chapter 4 by considering over- or under-dispersed waiting times for the basic gamma-Poisson model.

Appendix D contains the R code used to generate MCMC samples for the allometry example of Chapter 4.

CHAPTER 2

A Simple Approach to Benchmark Estimation

2.1 An Improved Subsample Estimator

Let θ ∈ Θ be a vector-valued parameter. Imagine that an MCMC sample is drawn from the posterior distribution of θ. Call the sequence of draws θ^(1), θ^(2), ..., θ^(N). The draws are used to estimate some feature of the posterior distribution. Often these features of interest can be represented as E[g(θ)] for some (possibly vector-valued) function g(θ). The most straightforward estimator for E[g(θ)] is

Ê[g(θ)]_f = (1/N) Σ_{i=1}^N g(θ^(i)),   (2.1)

where the subscript f denotes the full sample estimator. If one selects a systematic 1-in-k subsample of the data, the natural estimator is

Ê[g(θ)]_s = (1/n) Σ_{i=1}^n g(θ^(ki)),   (2.2)

where N = kn and the subscript s denotes the subsample estimator. As mentioned in Chapter 1, this form of subsampling always leads to poorer estimation; the unweighted subsample estimator (2.2) has a larger variance than the full sample estimator (2.1).

We wish to use the information available from the full sample to improve future estimation based on the subsample for a large class of posterior features. For an

appropriately chosen (and possibly vector-valued) function h(θ), we refer to the feature E[h(θ)] as the benchmark. We now create a weighted version of the subsample estimator of E[g(θ)] as follows:

Ê[g(θ)]_w = Σ_{i=1}^n w_i g(θ^(ki)),   (2.3)

where Σ_{i=1}^n w_i = 1. The weights w_i are chosen so that they force the weighted subsample benchmark estimate to equal the full sample estimate:

Ê[h(θ)]_w = Ê[h(θ)]_f.   (2.4)

Thus Ê[h(θ)]_w and Ê[h(θ)]_f have the same distributions, provided the weights can be constructed, and all features of their distributions conditional on this event are the same. For a vector-valued benchmark function, any linear combination of its coordinates results in the same estimate for both the subsample and the full sample, and the estimators have the same distribution. In particular, the two estimators have the same variance, and we have possibly greatly increased precision for our subsample estimator of E[g(θ)].

The connection between a conditionally conjugate structure and linear posterior expectation in exponential families implies that, for many popular models, quantities such as the conditional posterior mean for a case or the conditional posterior predictive mean will be a linear function of hyperparameters. The structure of the hierarchical model enables us to use benchmark functions based on the hyperparameters to create more accurate estimates of these quantities. The reduction in variability when moving from Ê[h(θ)]_s to Ê[h(θ)]_w also appears when examining expectations of functions g(θ) that are similar to h(θ). Functions such as a predictive variance, which depend on first and second moments, will typically be closely related to benchmark functions based on the hyperparameters, and so they will be more accurately estimated with our technique.

The weighted subsample, (w_i, θ^(ki)) for i = 1, 2, ..., n, is now used in place of the unweighted subsample. The weights act exactly as they would if arising from an importance sample, and so we obtain weighted subsample estimates Ê[g(θ)]_w for various features of interest E[g(θ)] of the posterior. Techniques and software developed for importance samples can be used without modification for these weighted samples.
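Once weights are in hand, every weighted subsample estimate has the same form as a normalized importance sampling estimate. A minimal R sketch of the estimator in (2.3), with illustrative names:

    # Weighted subsample estimate (2.3): theta.sub holds the n subsampled draws
    # (one per row) and w the corresponding weights, constructed to sum to one.
    est.w <- function(g, theta.sub, w) {
      sum(w * apply(theta.sub, 1, g))  # sum_i w_i g(theta^(ki)) for scalar-valued g
    }
    # e.g. est.w(function(th) th[1]^2, theta.sub, w) estimates E[theta_1^2]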

2.2 Some Methods of Obtaining Weights

The constraints that Σ_{i=1}^n w_i = 1 and that Ê[h(θ)]_w = Ê[h(θ)]_f will not typically determine the w_i. With a single real benchmark function, we would have only two linear constraints on the w_i. We supplement the constraints with a principle that yields a unique set of weights. The two principles we investigate are motivated by the literatures on survey sampling and information theory.

Weights by Post-stratification. Post-stratification is a standard technique in survey sampling, designed to ensure that a sample matches certain characteristics of a population. The population characteristics are matched by computing a weight for each unit in the sample. Large sample results show that a post-stratified sample behaves almost exactly like a proportionally allocated stratified sample. This type of stratification reduces the variance of estimates as compared to a simple random sample.

In this setting, the full sample plays the role of the population while the subsample plays the role of the sample. Thus, the essence of the technique is to partition the parameter space into (say) m strata, and to assign the same weight to each θ^(ki) in a

stratum. Formally, suppose that {Θ_j}_{j=1}^m is a finite partition of the parameter space Θ. Let I_j(θ) denote the indicator of the set Θ_j, for j = 1, ..., m. The natural application of the post-stratification method takes as the benchmark function the vector of these m indicator functions. That is, h(θ) = (I_1(θ), I_2(θ), ..., I_m(θ)). We assign the same weight to all subsample points belonging to a given stratum. Specifically, for all i such that θ^(ki) ∈ Θ_j, we set w_i = v_j, where, according to (2.4), the values v_j are determined by

Σ_{i=1}^n v_j I_j(θ^(ki)) = (1/N) Σ_{i=1}^N I_j(θ^(i)),   j = 1, ..., m.

The post-stratification weights are then obtained as

v_j = [N⁻¹ Σ_{i=1}^N I_j(θ^(i))] / [Σ_{i=1}^n I_j(θ^(ki))],   (2.5)

provided each of the strata contains at least one subsample point. As in survey sampling, with fairly well chosen strata, the chance that any of the strata is empty of subsample points is negligible. Intuitively, the post-stratification weight v_j is the ratio of the proportion of full sample points in Θ_j to the number of subsample points in Θ_j. We refer to this subsample estimator as the basic post-stratification estimator, Ê[g(θ)]_{w,PS}.
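A minimal R sketch of the basic weights in (2.5), with strata defined by cutting one coordinate of the draws at fixed points; the strata boundaries here are illustrative assumptions:

    # Basic post-stratification weights (2.5). theta.full: all N draws of one
    # coordinate; theta.sub: the n subsampled draws; breaks: strata boundaries.
    ps.weights <- function(theta.full, theta.sub, breaks) {
      s.full <- cut(theta.full, breaks)   # stratum of each full sample draw
      s.sub  <- cut(theta.sub,  breaks)   # stratum of each subsample draw
      # v_j = (proportion of full sample in stratum j) / (subsample count in j)
      v <- (table(s.full) / length(theta.full)) / table(s.sub)
      as.numeric(v)[as.integer(s.sub)]    # w_i = v_j for theta^(ki) in stratum j
    }
    # e.g. w <- ps.weights(theta.full, theta.sub, breaks = c(-Inf, -1, 0, 1, Inf))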

The perspective of a basis expansion of g(θ) provides a more sophisticated use of post-stratification. Instead of using a basis formed of indicator functions (essentially a Haar basis), alternative bases consist of functions other than indicators. An attractive basis, due to its success throughout statistics, is the polynomial basis that generates Taylor series. Assigning equal weight to subsample points within each given post-stratum yields n − m linear constraints on the weights, and forcing the weights to sum to 1 provides one additional constraint. Supplementing these with a further m − 1 linear constraints on the weights (and also with mild conditions on the posterior distribution and simulation method to guarantee uniqueness) defines the weights. These constraints are provided by matching the full sample estimates and the weighted subsample estimates of a vector-valued benchmark function consisting of m − 1 components, denoted by h(θ) = (h_1(θ), ..., h_{m−1}(θ)). The weights are quickly obtained as the solution to a system of m linear equations. This version of post-stratification has proven to be extremely effective in practice.

Let c represent the column vector (1, Ê[h_1(θ)]_f, Ê[h_2(θ)]_f, ..., Ê[h_{m−1}(θ)]_f)'. For t = 1, ..., m − 1 and j = 1, ..., m, let h_{t,j} = Σ_{i=1}^n h_t(θ^(ki)) I_j(θ^(ki)) be the sum of the function h_t(θ) over the subsample points belonging to stratum Θ_j. For j = 1, ..., m, let n_j be the number of subsample points falling in stratum Θ_j. Define the square matrix B as

B = [ n_1/n         ...  n_m/n
      h_{1,1}/n     ...  h_{1,m}/n
      ...
      h_{m−1,1}/n   ...  h_{m−1,m}/n ].   (2.6)

Provided the matrix is invertible, the vector of modified post-stratification weights, v = (v_1, ..., v_m)', is determined as the unique solution to the following system of linear equations:

v = (nB)⁻¹ c.   (2.7)

We refer to an estimator based on these benchmark weights as a modified post-stratification estimator, Ê[g(θ)]_{w2}. The estimator is defined arbitrarily when B is singular. We shall prove later, under mild assumptions, that this is a rare event for n large enough.
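A minimal R sketch of (2.6)-(2.7); the single benchmark function, the two-strata partition that makes B square, and all names are illustrative assumptions:

    # Modified post-stratification weights (2.7): v = (nB)^{-1} c.
    # With a single real benchmark function h, the partition has m = 2 strata,
    # so that B in (2.6) is a square 2 x 2 matrix.
    # theta.full: all N draws; theta.sub: the n subsampled draws;
    # strata: factor with 2 levels giving the stratum of each subsample draw.
    mps.weights <- function(theta.full, theta.sub, h, strata) {
      n <- length(theta.sub)
      c.vec <- c(1, mean(h(theta.full)))                  # c = (1, E-hat[h]_f)'
      B <- rbind(tabulate(strata, 2) / n,                 # row 1: n_j / n
                 tapply(h(theta.sub), strata, sum) / n)   # row 2: h_{1,j} / n
      v <- solve(n * B, c.vec)                            # stratum weights v (2.7)
      v[as.integer(strata)]                               # w_i = v_j on stratum j
    }
    # e.g. w <- mps.weights(th.full, th.sub, h = function(t) t,
    #                       strata = factor(th.sub > median(th.full)))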

Equivalent expressions for the estimator Ê[g(θ)]_{w2} are

Ê[g(θ)]_{w2} = (ḡ_1, ..., ḡ_m) B⁻¹ c = Σ_{j=1}^m (n_j v_j) ḡ_{j,n},   (2.8)

where ḡ_j = (1/n) Σ_{i=1}^n g(θ^(ki)) I_j(θ^(ki)) and ḡ_{j,n} = (1/n_j) Σ_{i=1}^n g(θ^(ki)) I_j(θ^(ki)), for j = 1, ..., m.

Unlike the simple post-stratification weights, some of the weights obtained by the modified version of post-stratification may occasionally be negative. The weighted subsample cannot be interpreted as a probability sample in that case. However, it can be shown that this happens very rarely for large subsample sizes. In the simulation study of Chapter 4, where subsamples of size n = 1000 are generated, the modified post-stratification weights were positive for all 100 independent replications of the chain.

Maximum Entropy Weights. Information theory describes, in various fashions, the amount of information in the data about a parameter or distribution. In a Bayesian context, it is often used to describe subjective information (playing the role of data) in order to elicit a prior distribution. This is accomplished by specifying a number of features of the distribution, typically expectations, as the information about the prior. The prior is then chosen to reflect this information but no more. With entropy defined as the negative of information, the prior which reflects exactly the desired information is the one that maximizes entropy among those priors matching the constraints.

In our setting, we borrow this technique, matching exactly the information in the full sample benchmark estimates, but no more. Let w = (w_1, w_2, ..., w_n) be the n-tuple of weights given in (2.3). Let us denote by Ω the (possibly empty) set of all n-tuples of weights that satisfy (2.4). Thus Ω is the set {w : w_i ≥ 0 for all i, Σ_{i=1}^n w_i = 1, Σ_{i=1}^n w_i h(θ^(ki)) = Ê[h(θ)]_f}.

Definition. The entropy of an n-tuple w belonging to the set Ω is defined as

En(w) = −Σ_{i=1}^n w_i ln w_i,

subject to the convention that 0 ln(0) equals 0.

We observe that for all w belonging to Ω, En(w) ≤ En((1/n, 1/n, ..., 1/n)) = ln(n). Since Ω is closed, there exists an element w* of Ω such that En(w*) = sup_{w ∈ Ω} En(w). These weights w* are called maximum entropy weights, and they exist whenever Ω is non-empty. Finding maximum entropy weights w* is thus equivalent to maximizing En(w) subject to the constraints w_i ≥ 0 for i = 1, ..., n, Σ_{i=1}^n w_i = 1, and Σ_{i=1}^n w_i h(θ^(ki)) = Ê[h(θ)]_f. Since Ω is a closed, convex set and En(w) is a strictly concave function, the maximum entropy weights w* are unique whenever they exist.

For a real benchmark function h(θ) and most subsamples of reasonable size, it can be shown that the maximum entropy weights w* are given by

w*_i = e^{λ_1 + λ_2 h(θ^(ki))},   i = 1, 2, ..., n,   (2.9)

where λ_2 ∈ R satisfies the equation

Σ_{i=1}^n (h(θ^(ki)) − Ê[h(θ)]_f) exp(λ_2 (h(θ^(ki)) − Ê[h(θ)]_f)) = 0.   (2.10)
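Equation (2.10) is a one-dimensional root-finding problem, so the maximum entropy weights are cheap to compute. A minimal R sketch; the search interval for λ_2 is an illustrative assumption:

    # Maximum entropy weights (2.9)-(2.10) for a real benchmark function h.
    # h.sub: h(theta^(ki)) over the subsample; bench.f: full sample estimate E-hat[h]_f.
    maxent.weights <- function(h.sub, bench.f, interval = c(-50, 50)) {
      d <- h.sub - bench.f
      # Left-hand side of (2.10), a strictly increasing function of lambda2.
      lhs <- function(lambda2) sum(d * exp(lambda2 * d))
      lambda2 <- uniroot(lhs, interval)$root
      w <- exp(lambda2 * h.sub)   # proportional to (2.9); exp(lambda1) fixed below
      w / sum(w)                  # normalizing determines lambda1 implicitly
    }
    # e.g. w <- maxent.weights(h(theta.sub), mean(h(theta.full)))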


More information

Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling

Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling 1 / 27 Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 27 Monte Carlo Integration The big question : Evaluate E p(z) [f(z)]

More information

Eco517 Fall 2013 C. Sims MCMC. October 8, 2013

Eco517 Fall 2013 C. Sims MCMC. October 8, 2013 Eco517 Fall 2013 C. Sims MCMC October 8, 2013 c 2013 by Christopher A. Sims. This document may be reproduced for educational and research purposes, so long as the copies contain this notice and are retained

More information

Adaptive Monte Carlo methods

Adaptive Monte Carlo methods Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Peter Beerli October 10, 2005 [this chapter is highly influenced by chapter 1 in Markov chain Monte Carlo in Practice, eds Gilks W. R. et al. Chapman and Hall/CRC, 1996] 1 Short

More information

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An

More information

Markov Chain Monte Carlo in Practice

Markov Chain Monte Carlo in Practice Markov Chain Monte Carlo in Practice Edited by W.R. Gilks Medical Research Council Biostatistics Unit Cambridge UK S. Richardson French National Institute for Health and Medical Research Vilejuif France

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

I. Bayesian econometrics

I. Bayesian econometrics I. Bayesian econometrics A. Introduction B. Bayesian inference in the univariate regression model C. Statistical decision theory D. Large sample results E. Diffuse priors F. Numerical Bayesian methods

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Basic Sampling Methods

Basic Sampling Methods Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).

More information

INTRODUCTION TO MARKOV CHAIN MONTE CARLO

INTRODUCTION TO MARKOV CHAIN MONTE CARLO INTRODUCTION TO MARKOV CHAIN MONTE CARLO 1. Introduction: MCMC In its simplest incarnation, the Monte Carlo method is nothing more than a computerbased exploitation of the Law of Large Numbers to estimate

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

The Bayesian Approach to Multi-equation Econometric Model Estimation

The Bayesian Approach to Multi-equation Econometric Model Estimation Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation

More information

On Bayesian Computation

On Bayesian Computation On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints

More information

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute

More information

Lecture 8: The Metropolis-Hastings Algorithm

Lecture 8: The Metropolis-Hastings Algorithm 30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:

More information

arxiv: v1 [stat.co] 18 Feb 2012

arxiv: v1 [stat.co] 18 Feb 2012 A LEVEL-SET HIT-AND-RUN SAMPLER FOR QUASI-CONCAVE DISTRIBUTIONS Dean Foster and Shane T. Jensen arxiv:1202.4094v1 [stat.co] 18 Feb 2012 Department of Statistics The Wharton School University of Pennsylvania

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Monte Carlo Integration using Importance Sampling and Gibbs Sampling

Monte Carlo Integration using Importance Sampling and Gibbs Sampling Monte Carlo Integration using Importance Sampling and Gibbs Sampling Wolfgang Hörmann and Josef Leydold Department of Statistics University of Economics and Business Administration Vienna Austria hormannw@boun.edu.tr

More information

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation Luke Tierney Department of Statistics & Actuarial Science University of Iowa Basic Ratio of Uniforms Method Introduced by Kinderman and

More information

FAV i R This paper is produced mechanically as part of FAViR. See for more information.

FAV i R This paper is produced mechanically as part of FAViR. See  for more information. Bayesian Claim Severity Part 2 Mixed Exponentials with Trend, Censoring, and Truncation By Benedict Escoto FAV i R This paper is produced mechanically as part of FAViR. See http://www.favir.net for more

More information

Advanced Statistical Modelling

Advanced Statistical Modelling Markov chain Monte Carlo (MCMC) Methods and Their Applications in Bayesian Statistics School of Technology and Business Studies/Statistics Dalarna University Borlänge, Sweden. Feb. 05, 2014. Outlines 1

More information

Markov chain Monte Carlo

Markov chain Monte Carlo 1 / 26 Markov chain Monte Carlo Timothy Hanson 1 and Alejandro Jara 2 1 Division of Biostatistics, University of Minnesota, USA 2 Department of Statistics, Universidad de Concepción, Chile IAP-Workshop

More information

Brief introduction to Markov Chain Monte Carlo

Brief introduction to Markov Chain Monte Carlo Brief introduction to Department of Probability and Mathematical Statistics seminar Stochastic modeling in economics and finance November 7, 2011 Brief introduction to Content 1 and motivation Classical

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

SAMPLING ALGORITHMS. In general. Inference in Bayesian models SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be

More information

MONTE CARLO METHODS. Hedibert Freitas Lopes

MONTE CARLO METHODS. Hedibert Freitas Lopes MONTE CARLO METHODS Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes hlopes@chicagobooth.edu

More information

Advances and Applications in Perfect Sampling

Advances and Applications in Perfect Sampling and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

TEORIA BAYESIANA Ralph S. Silva

TEORIA BAYESIANA Ralph S. Silva TEORIA BAYESIANA Ralph S. Silva Departamento de Métodos Estatísticos Instituto de Matemática Universidade Federal do Rio de Janeiro Sumário Numerical Integration Polynomial quadrature is intended to approximate

More information

ABC methods for phase-type distributions with applications in insurance risk problems

ABC methods for phase-type distributions with applications in insurance risk problems ABC methods for phase-type with applications problems Concepcion Ausin, Department of Statistics, Universidad Carlos III de Madrid Joint work with: Pedro Galeano, Universidad Carlos III de Madrid Simon

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model

More information

EM Algorithm II. September 11, 2018

EM Algorithm II. September 11, 2018 EM Algorithm II September 11, 2018 Review EM 1/27 (Y obs, Y mis ) f (y obs, y mis θ), we observe Y obs but not Y mis Complete-data log likelihood: l C (θ Y obs, Y mis ) = log { f (Y obs, Y mis θ) Observed-data

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

University of Toronto Department of Statistics

University of Toronto Department of Statistics Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704

More information

Monte Carlo methods for sampling-based Stochastic Optimization

Monte Carlo methods for sampling-based Stochastic Optimization Monte Carlo methods for sampling-based Stochastic Optimization Gersende FORT LTCI CNRS & Telecom ParisTech Paris, France Joint works with B. Jourdain, T. Lelièvre, G. Stoltz from ENPC and E. Kuhn from

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative

More information

On Markov chain Monte Carlo methods for tall data

On Markov chain Monte Carlo methods for tall data On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

Statistical Inference for Stochastic Epidemic Models

Statistical Inference for Stochastic Epidemic Models Statistical Inference for Stochastic Epidemic Models George Streftaris 1 and Gavin J. Gibson 1 1 Department of Actuarial Mathematics & Statistics, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS,

More information

Markov Chain Monte Carlo Methods

Markov Chain Monte Carlo Methods Markov Chain Monte Carlo Methods John Geweke University of Iowa, USA 2005 Institute on Computational Economics University of Chicago - Argonne National Laboaratories July 22, 2005 The problem p (θ, ω I)

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US

Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US Gerdie Everaert 1, Lorenzo Pozzi 2, and Ruben Schoonackers 3 1 Ghent University & SHERPPA 2 Erasmus

More information

10. Exchangeability and hierarchical models Objective. Recommended reading

10. Exchangeability and hierarchical models Objective. Recommended reading 10. Exchangeability and hierarchical models Objective Introduce exchangeability and its relation to Bayesian hierarchical models. Show how to fit such models using fully and empirical Bayesian methods.

More information

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 18, 2015 1 / 45 Resources and Attribution Image credits,

More information

16 : Markov Chain Monte Carlo (MCMC)

16 : Markov Chain Monte Carlo (MCMC) 10-708: Probabilistic Graphical Models 10-708, Spring 2014 16 : Markov Chain Monte Carlo MCMC Lecturer: Matthew Gormley Scribes: Yining Wang, Renato Negrinho 1 Sampling from low-dimensional distributions

More information

Sequential Monte Carlo Methods

Sequential Monte Carlo Methods University of Pennsylvania Bradley Visitor Lectures October 23, 2017 Introduction Unfortunately, standard MCMC can be inaccurate, especially in medium and large-scale DSGE models: disentangling importance

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Bayesian Nonparametric Regression for Diabetes Deaths

Bayesian Nonparametric Regression for Diabetes Deaths Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,

More information

Mixture models. Mixture models MCMC approaches Label switching MCMC for variable dimension models. 5 Mixture models

Mixture models. Mixture models MCMC approaches Label switching MCMC for variable dimension models. 5 Mixture models 5 MCMC approaches Label switching MCMC for variable dimension models 291/459 Missing variable models Complexity of a model may originate from the fact that some piece of information is missing Example

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

Introduction to Markov Chain Monte Carlo & Gibbs Sampling

Introduction to Markov Chain Monte Carlo & Gibbs Sampling Introduction to Markov Chain Monte Carlo & Gibbs Sampling Prof. Nicholas Zabaras Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Ithaca, NY 14853-3801 Email: zabaras@cornell.edu

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet. Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Introduction to Markov chain Monte Carlo The Gibbs Sampler Examples Overview of the Lecture

More information