Markov Chain Monte Carlo methods

By Oleg Makhnin

4.1 Introduction

Markov Chain Monte Carlo (MCMC) methods are computational methods developed for Bayesian inference. Bayesian inference deals with parameter

estimation under some prior assumptions. For example, suppose we are estimating some parameter $\theta$. We have some information about $\theta$ expressed as a prior distribution. It is called prior because it describes what we believe before collecting any data.

What happens after you collect your data? The evidence from your data is summarized in the likelihood. This is simply the (joint) density of your observations. However, in likelihood-based and Bayesian inference, it is treated as a function of the (unknown) $\theta$, and the data $y_1, \ldots, y_n$ are treated as given.

The goal of Bayesian inference is to compute the posterior distribution. It is called posterior because it is computed after obtaining the data. It is the conditional distribution of the parameter $\theta$ given the data. We will assume that we deal with continuous quantities, so the formulas below are in terms of densities. By the continuous version of Bayes' formula, the posterior is

$$p(\theta \mid y_1, \ldots, y_n) = \frac{p(\theta, y_1, \ldots, y_n)}{p(y_1, \ldots, y_n)} = \frac{p(y_1, \ldots, y_n \mid \theta)\, p(\theta)}{p(y_1, \ldots, y_n)}$$

or, briefly,

$$\text{posterior} \propto \text{likelihood} \times \text{prior} = p(y_1, \ldots, y_n \mid \theta)\, p(\theta) \tag{1}$$

where the sign $\propto$ is frequently used in Bayesian analysis and reads "proportional to", that is, equal to the quantity described times some proportionality constant. Frequently, this constant is found later from the condition that the posterior density integrates to 1.

Thus, the posterior distribution combines prior information with the new information obtained from the data, and makes a balanced guess about the unknown parameters.

Example 1: Normal/Normal prior and likelihood

Under a simple assumption that both prior and likelihood are Normal, suppose first that we have just one observation $y$. Let the prior be

$$p(\theta) \propto \exp\left[-\frac{(\theta - \mu_0)^2}{2\sigma_0^2}\right], \quad \text{that is, } N(\mu_0, \sigma_0^2),$$

and the likelihood

$$p(y \mid \theta) \propto \exp\left[-\frac{(y - \theta)^2}{2\sigma^2}\right], \quad \text{that is, } N(\theta, \sigma^2).$$

We will assume that $\sigma^2$ is known, so the parameter $\theta$ is the unknown mean we would like to estimate.
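As a rough numerical illustration of the relation posterior ∝ likelihood × prior, the sketch below multiplies the Normal prior and Normal likelihood of Example 1 on a grid and finds the proportionality constant from the condition that the posterior integrates to 1. The input values (y = 2, µ0 = 0, σ0² = 1, σ² = 1), the grid limits and the function names are assumptions chosen for illustration only.

```python
import math

def unnormalized_posterior(theta, y, mu0, var0, var):
    """Prior times likelihood, each known only up to a constant."""
    prior = math.exp(-(theta - mu0) ** 2 / (2.0 * var0))
    likelihood = math.exp(-(y - theta) ** 2 / (2.0 * var))
    return prior * likelihood

def grid_posterior(y, mu0, var0, var, lo=-10.0, hi=10.0, n=4001):
    """Normalize prior * likelihood on an evenly spaced grid (Riemann sum)."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    dens = [unnormalized_posterior(t, y, mu0, var0, var) for t in grid]
    const = sum(dens) * step  # proportionality constant: integral of prior * likelihood
    dens = [d / const for d in dens]
    return grid, dens, step

# Illustrative inputs (assumed): one observation y = 2, prior N(0, 1), sigma^2 = 1.
grid, dens, step = grid_posterior(y=2.0, mu0=0.0, var0=1.0, var=1.0)
post_mean = sum(t * d for t, d in zip(grid, dens)) * step
post_var = sum((t - post_mean) ** 2 * d for t, d in zip(grid, dens)) * step
print(post_mean, post_var)
```

For these inputs the numerical mean and variance should come out close to 1 and 0.5, which are the values the conjugate Normal formulas give. A grid like this is only workable in one or two dimensions, which is part of the motivation for sampling methods.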

Using Eq. (1), we get, after some algebra,

$$p(\theta \mid y) \propto \exp\left[-\frac{(\theta - \mu_p)^2}{2\sigma_p^2}\right]$$

where

$$\mu_p = \mu_0\, \frac{1/\sigma_0^2}{1/\sigma_0^2 + 1/\sigma^2} + y\, \frac{1/\sigma^2}{1/\sigma_0^2 + 1/\sigma^2} \quad \text{and} \quad \sigma_p^2 = \left(\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}\right)^{-1}.$$

Thus, the posterior also has a Normal distribution, $N(\mu_p, \sigma_p^2)$. Note also that $\mu_p$ is the weighted average of the prior mean $\mu_0$ and the observation $y$, where the weights are inversely proportional to the variances.

[Figure 1: Prior (broken lines), likelihood (solid lines) and posterior (thick lines) for normal conjugate prior. Left: $\sigma_0^2 = 0.5$, center: $\sigma_0^2 = 1$, right: $\sigma_0^2 = 2$. Axes: $x$ (horizontal), $p(x)$ (vertical).]

In Fig. 1, three cases are shown, all with the same data $y = 2$, $\sigma^2 = 1$, prior mean $\mu_0 = 0$, and differing only by the value of $\sigma_0^2$. When $\sigma_0^2$ is small, the prior has a larger weight, and thus the posterior mean is closer to the prior mean. When $\sigma_0^2$ is large, the situation is the opposite.

This is a case of the so-called conjugate prior on $\theta$, which is chosen in such a way that the posterior has the same functional form as the prior.

This example is easily generalized to several observations $y_1, \ldots, y_n$. If they are all independent Normal, with the same mean $\theta$ and standard deviation $\sigma$, then we may treat them as a single observation $\bar{y} = \sum y_i / n$ with standard deviation $\sigma/\sqrt{n}$.

Direct computation of posterior densities is impossible for all but the simplest problems. Markov Chain Monte Carlo (MCMC) methods are computational methods developed for Bayesian inference. As with all Monte Carlo methods (see e.g. Tarantola, Chapter 2), the goal of MCMC is to obtain a sample from the probability distribution of interest. However, this sample will not consist of independent

observations (as is the case with classical Monte Carlo), but will rather form a sequence of realizations of a Markov chain. The trick is to set up a Markov chain whose stationary distribution is exactly the posterior distribution we need to sample from.

A Markov chain is a sequence of random variables in which each value, given its immediate predecessor, is independent of the rest of the past. This means that the Markov chain $\{X_t\}$ is defined by its transition probability (or transition kernel in the case of a continuous state space) that describes the conditional probability density

$$q(X_{t+1} \mid X_t), \quad t = 0, 1, 2, 3, \ldots \tag{2}$$

In addition, we will assume that a starting probability density $q(X_0)$ is also known. The procedure of generating such a chain starts with generating (sampling) $X_0$ from this density, then iterating (2) to get the subsequent samples.

Under some assumptions on the transition probability, a Markov chain will converge to its stationary distribution, regardless of the starting value $X_0$. Such Markov chains are called ergodic. In particular, an easy condition to verify is the detailed balance condition

$$\pi(y)\, q(x \mid y) = q(y \mid x)\, \pi(x) \quad \text{for all } x, y,$$

which ensures that $\pi$ is the stationary distribution.

Denote the data briefly as $y$ and the unknown parameters we would like to estimate as $\theta$. Then, the Markov chain takes values in the $\theta$-space. The MCMC methods employ ways to generate Markov chains that have the desired posterior $p(\theta \mid y)$ as their stationary distribution. Two such methods are the Gibbs sampler and the Metropolis-Hastings method.

4.2 Gibbs sampler

One way to form a Markov chain that will converge to the posterior $p(\theta \mid y)$ is to split the values of $\theta$ into blocks of variables, assuming that it is easy to generate samples from these blocks. This is the case when the distributions of the blocks have convenient conjugate forms. Suppose that we split the model into $k$ blocks, $\theta = [\theta_1, \theta_2, \ldots, \theta_k]$. In the simplest case, the blocks are just scalar components of the model $\theta$.

The Gibbs sampler iteratively obtains samples of each $\theta_j$ from its full conditional posterior (FCP), defined as

$$p_j(\theta_j \mid y, \theta_1, \ldots, \theta_{j-1}, \theta_{j+1}, \ldots, \theta_k).$$

The process is done for all $j = 1, \ldots, k$ and then repeated many times until a sample of the desired length from the entire $\theta$ is obtained.
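The sweep over full conditionals can be sketched on a toy target. The example below is not from the text: it assumes a standard bivariate Normal with correlation rho as a stand-in for the posterior, because both of its full conditionals are Normal and easy to sample. The function name, the value rho = 0.8 and the number of early draws discarded are illustrative assumptions.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, start=(0.0, 0.0), seed=0):
    """Gibbs sampler for a standard bivariate Normal with correlation rho.

    Each full conditional is Normal:
        theta1 | theta2 ~ N(rho * theta2, 1 - rho**2), and symmetrically.
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)
    theta1, theta2 = start
    samples = []
    for _ in range(n_samples):
        theta1 = rng.gauss(rho * theta2, cond_sd)  # block 1 given current block 2
        theta2 = rng.gauss(rho * theta1, cond_sd)  # block 2 given updated block 1
        samples.append((theta1, theta2))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
kept = samples[1000:]  # discard early draws (assumed length)

n = len(kept)
mean1 = sum(t1 for t1, _ in kept) / n
mean2 = sum(t2 for _, t2 in kept) / n
var1 = sum((t1 - mean1) ** 2 for t1, _ in kept) / n
var2 = sum((t2 - mean2) ** 2 for _, t2 in kept) / n
cov = sum((t1 - mean1) * (t2 - mean2) for t1, t2 in kept) / n
corr = cov / math.sqrt(var1 * var2)
print(mean1, mean2, corr)
```

The sample means should be near 0 and the sample correlation near the assumed rho. Note that each block is updated using the most recent value of the other block, which is what distinguishes a Gibbs sweep from sampling all conditionals at the old state.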

4.3 Example: Regression with censored data

There are frequently cases in Bayesian inference when the estimates could be easily obtained if only certain hidden variables were known. We will consider one such situation and indicate how to set up the Gibbs sampler.

The censored data situation arises when we do not know the exact value of an observation, but some inequality is available, for example, $y_i^* > c_i$ (here * indicates a censored observation). This frequently happens in survival studies when an item was removed from the study at the time $c_i$ before we had the chance to observe its failure at $y_i^*$.

A similar situation arises in environmental studies with non-detects. A non-detect means that a certain chemical (likely a pollutant) was not detected in the sample. However, we cannot claim with certainty that it is absent, but only that its concentration lies below an estimated threshold. In this case, $y_i^* < c_i$, where $y_i^*$ is the unknown true concentration and $c_i$ is the threshold.

If we treat the missing data as extra parameters in the model, we can sample from their FCP given all the other model parameters. For example, if we fit a regression model with some predictors $x_i$ and errors $\varepsilon_i$,

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n,$$

then the unobserved $y_i^*$ can be sampled from the truncated Normal distribution with the mean $\beta_0 + \beta_1 x_i$, standard deviation $\sigma$ (equal to the standard deviation of the errors $\varepsilon_i$) and the upper threshold $c_i$. Then, in turn, the Gibbs sampler will use the current samples of $y_i^*$, together with the known concentrations $y_j$, to estimate the regression parameters $\beta_0$, $\beta_1$ and $\sigma$. This way, we will get the MCMC samples of both the model parameters and the missing data.

Example: from Helsel (2005), Ch. 14. The data given are TCE (trichloroethylene, a chlorinated hydrocarbon commonly used as an industrial solvent) concentrations (µg/l) in ground waters of Long Island, New York, along with several possible explanatory variables (population density, land use and depth to the water surface). The objective is to determine if concentrations are related to one or more explanatory variables. There are four detection limits, at 1, 2, 4 and 5 µg/l. Out of 247 observations, 194 are classified as non-detects.

We will use $y_i = \ln(\mathrm{TCE})$ and $x_i$ = population density. The data are shown in Fig. 2. What do you think is the direction of the trendline? Is the slope positive or negative?

The parameters $\beta_0$, $\beta_1$ can be estimated using linear regression. However, we will be interested not only in the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$, but in their entire FCP. Fortunately, it is a bivariate Normal, and we have an easy way to generate samples from it.
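The step that imputes a censored value can be sketched as a draw from a Normal distribution truncated at an upper threshold. The sketch below uses the inverse-CDF method from Python's standard library, which is one possible technique and not necessarily the one used for the results in these notes. The function name and all numeric values (beta0, beta1, sigma, x_i, and the demonstration parameters) are hypothetical; only the 5 µg/l detection limit is taken from the example.

```python
import math
import random
from statistics import NormalDist

def sample_upper_truncated_normal(mean, sd, upper, rng):
    """Draw from N(mean, sd^2) restricted to values below `upper`.

    Inverse-CDF method: draw U uniformly on (0, F(upper)) and return F^{-1}(U).
    It loses accuracy when `upper` lies far in the lower tail.
    """
    dist = NormalDist(mean, sd)
    p_upper = dist.cdf(upper)                # probability mass below the threshold
    u = rng.uniform(0.0, p_upper)
    u = min(max(u, 1e-300), 1.0 - 1e-12)     # inv_cdf requires 0 < u < 1
    return min(dist.inv_cdf(u), upper)       # guard against rounding past the threshold

rng = random.Random(0)

# Hypothetical current Gibbs state and one censored observation.
beta0, beta1, sigma = -1.0, 0.1, 1.5         # assumed values, not fitted to any data
x_i = 8.0                                    # assumed predictor value
c_i = math.log(5.0)                          # detection limit of 5 ug/l on the log scale
y_star = sample_upper_truncated_normal(beta0 + beta1 * x_i, sigma, c_i, rng)
print(y_star)

# Many draws from a standard Normal truncated above at 0.5, to check the sampler.
draws = [sample_upper_truncated_normal(0.0, 1.0, 0.5, rng) for _ in range(5000)]
print(sum(draws) / len(draws))
```

Inside a Gibbs sampler this draw would be repeated for every non-detect at each iteration, using the current values of the regression parameters.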

[Figure 2: Censored observations: blank circles, uncensored observations: dark circles. The points are jittered, i.e. have a small amount of noise added to visualize multiple occurrences of the same point. Axes: population density (horizontal), log(conc) (vertical).]

Namely, if $\beta$ are the coefficients from the linear regression $y = G\beta$, where the data have covariance matrix $\sigma^2 I$, then we know that $\beta$ has the distribution (likelihood)

$$N\left(\hat{\beta},\ \sigma^2 (G^T G)^{-1}\right), \quad \text{where } \hat{\beta} = (G^T G)^{-1} G^T y.$$

We can generate a sample from such a distribution by using e.g. the Cholesky decomposition

$$\sigma^2 (G^T G)^{-1} = R^T R$$

and then setting

$$\beta = R^T s + \hat{\beta},$$

where $s$ is a standard Normal vector.

For simplicity, we will assume that $\beta = [\beta_0, \beta_1]$ have a flat prior, that is, $p(\beta) \propto 1$, which corresponds to the situation in which we have no prior information on the $\beta$'s.

Another technicality concerns sampling $\sigma$. This is usually done using an inverse chi-square prior on $\sigma^2$. Under the assumption of normal errors $e_i$, this

turns out to be a conjugate prior, i.e. the posterior distribution of $\sigma^2$ will also be inverse chi-square. (The equation for the inverse chi-square density is given e.g. by Gelman et al. (2003). Of course, a sample value for $\sigma$ can be obtained by taking the square root of $\sigma^2$.) The updating equation is

$$\sigma^2 = \frac{V_0\, df_0 + \sum e_i^2}{\chi^2_{df}}$$

where $e_i = y_i - \beta_0 - \beta_1 x_i$ are the current residuals, $\chi^2_{df}$ is a chi-square random variable with $df = df_0 + n - m$, $m$ is the number of parameters in the regression equation (here $m = 2$), $df_0$ are the prior degrees of freedom, and $V_0$ is the prior variance. Decreasing $df_0$ to 0 leads to using a flat prior on $\sigma$.

4.4 Metropolis-Hastings algorithm

The Metropolis algorithm and its extension, the Metropolis-Hastings algorithm, were developed for sampling from non-standard densities. The idea is as follows. To sample a variable $x$ with some density $q(x)$, generate a proposal value $x^*$ and, at iteration $t + 1$ of the sampler, accept this value, setting $x_{t+1} = x^*$, with the probability

$$p_{acc} = \min\left\{\frac{q(x^*)}{q(x_t)},\ 1\right\}.$$

Otherwise, we keep the old value $x_t$. (Note that since $p_{acc}$ is a ratio of $q$-densities, we only need to know the density $q(x)$ up to a proportionality constant! This makes the Metropolis method particularly attractive for the computation of posterior densities.) In practice, this means generating a Uniform[0, 1] random variable $U$ and setting

$$x_{t+1} = \begin{cases} x^*, & \text{if } U \leq p_{acc} \\ x_t, & \text{if } U > p_{acc} \end{cases} \tag{3}$$

This algorithm is reminiscent of the stochastic search algorithm for maximizing the function $q(x)$, where we move to a new point if that increases the value of $q$, and stay at the old point otherwise. The difference in the Metropolis algorithm is that we also occasionally jump to a point with a lower value of $q$. This, among other things, helps us escape local maxima. (The benefits of both algorithms can be combined in the simulated annealing algorithm, which moves more freely in the beginning and becomes more like stochastic search in the end.)

The practical difficulty lies in generating a suitable proposal value $x^*$. One popular method is the random-walk Metropolis algorithm, for which the proposal value equals $x^* = x_t + h_t$, where $x_t$ is the previous value from the Markov chain, and $h_t$ is chosen randomly from a symmetric distribution, for example, Uniform

on $[-\delta_h, \delta_h]$ or Normal $N(0, \sigma_h^2)$.

The size of the jump $h_t$ may be chosen adaptively. Smaller jumps will result in higher acceptance rates, but will be slower in exploring the model space. Longer jumps will be accepted less frequently, so the chain will tend to get stuck in the same place. Jumps that are either too small or too large will result in increased autocorrelation of your Markov chain, and therefore in the need for longer chains to estimate your parameters with the same precision. Some studies have shown (see e.g. Gelman et al.) that the most efficient acceptance frequency lies between 20% and 50%.

The Metropolis method is simple to use. However, it requires certain assumptions on how the proposal value $x^*$ is picked. For example, the random-walk Metropolis version will not work if the jump $h_t$ has an asymmetric distribution. The generalization, the Metropolis-Hastings method, does not require the proposal distribution to be symmetric, but we will not discuss it here.

4.5 Analyzing the MCMC output

First, we need to realize that, using MCMC, we produce just a sample from the posterior density. Thus, the estimates we obtain from it (e.g. mean, median, credible intervals etc.) will be subject to sampling error. For example, if we took $M$ Monte Carlo samples, then the error in estimating the mean is proportional to $1/\sqrt{M}$.

[Figure 3: Tri-plot for a highly autocorrelated output. Panels: m3 values by sample, histogram (Frequency), and ACF by lag.]

Also, like any iterative method, MCMC methods take some time to converge. However, the convergence is not to any particular number, but to the distribution $\pi$, i.e. the entire range of numbers. To monitor convergence, a simple graphical tool is to produce plots of the MC values and watch them until they appear to settle into a stable range. A more formal method uses multiple Markov chains, started at different values and run in parallel

(see Gelman et al.). After the convergence analysis, we determine a burn-in period, during which the initial values that we collected are discarded.

Another complication is specific to MCMC methods. These methods produce positively correlated samples from the desired distribution. This means, roughly, that the next value of the Markov chain is similar to the previous value. The higher the correlation, the less new information each sample value contains! The amount of correlation is usually monitored using autocorrelation plots. They indicate the amount of correlation between $x_t$ and $x_{t+\ell}$ for various values of the lag $\ell$. We can use these plots to find the value $D$ such that the autocorrelation is negligible for lags $\ell \geq D$. Then, to obtain the final estimates of our parameters, we will thin out our sample, keeping only every $D$th value.

[Figure 4: Tri-plot for a slightly autocorrelated output. Panels: m1 values by sample, histogram (Frequency), and ACF by lag.]

See Fig. 4 above. In practice, we recommend using the tri-plot for each scalar parameter that we fit. The tri-plot includes the time-series plot of the MC values, the histogram of the results (to monitor the posterior distribution), and the autocorrelation plot. See Figures 3 and 4 for examples. In one of them, the burn-in period is clearly seen at the beginning.

Once we have obtained a clean sample of the model parameters, it is straightforward to obtain the estimates. It is important to realize, however, that all of these estimates are subject to sampling error. The MAP (maximum a posteriori) solutions might be difficult to obtain in the absence of density functions to maximize. Thus, for symmetric distributions, we might settle for the posterior means instead. In the case of asymmetric distributions, we might want to use posterior medians. The credible intervals can be easily obtained using the sample percentiles

(quantiles). For example, to obtain the 95% credible interval, we may use the sample 2.5th percentile as the lower bound and the 97.5th percentile as the upper bound.

Bibliography

Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2003), Bayesian Data Analysis, 2nd ed., Chapman & Hall/CRC.

Gilks, W.R., Richardson, S. and Spiegelhalter, D., eds. (1996), Markov Chain Monte Carlo in Practice: Interdisciplinary Statistics, Chapman & Hall/CRC.

Diaconis, P. (2009), The Markov Chain Monte Carlo revolution, Bulletin of the AMS, 46 (2).

Helsel, D.R. (2005), Nondetects and Data Analysis: Statistics for Censored Environmental Data, Wiley.

Tarantola, A. (2005), Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM.
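As a closing illustration, the random-walk Metropolis algorithm and the output analysis of Section 4.5 can be sketched together. The target is the unnormalized posterior of the Normal/Normal example with y = 2, σ² = 1, µ0 = 0, σ0² = 1 (the center case of Fig. 1). The jump half-width delta = 2, the burn-in of 1000 iterations, the 0.05 cutoff for "negligible" autocorrelation and all function names are assumptions made for this sketch. The acceptance test is done on the log scale, which is equivalent to comparing U with min{q(x*)/q(x_t), 1}.

```python
import math
import random

def log_q(theta, y=2.0, mu0=0.0, var0=1.0, var=1.0):
    """Log of prior * likelihood for the Normal/Normal example (constants dropped)."""
    return -(theta - mu0) ** 2 / (2.0 * var0) - (y - theta) ** 2 / (2.0 * var)

def random_walk_metropolis(log_density, x0, delta, n_iter, seed=0):
    """Random-walk Metropolis with Uniform[-delta, delta] jumps."""
    rng = random.Random(seed)
    x = x0
    chain = []
    accepted = 0
    for _ in range(n_iter):
        proposal = x + rng.uniform(-delta, delta)   # symmetric jump h_t
        diff = log_density(proposal) - log_density(x)
        p_acc = math.exp(min(0.0, diff))            # min{q(x*) / q(x_t), 1}
        if rng.random() <= p_acc:
            x = proposal
            accepted += 1
        chain.append(x)                             # a rejected step repeats the old value
    return chain, accepted / n_iter

def autocorrelation(values, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

def percentile(sorted_values, p):
    """Nearest-rank style sample percentile, p in [0, 100]."""
    idx = int(round(p / 100.0 * (len(sorted_values) - 1)))
    return sorted_values[idx]

chain, acc_rate = random_walk_metropolis(log_q, x0=0.0, delta=2.0, n_iter=50000)
burned = chain[1000:]                               # discard the burn-in period

# Smallest lag D at which the autocorrelation drops below the assumed cutoff.
D = next(lag for lag in range(1, 200) if abs(autocorrelation(burned, lag)) < 0.05)
thinned = sorted(burned[::D])                       # keep every D-th value

post_mean = sum(thinned) / len(thinned)
ci = (percentile(thinned, 2.5), percentile(thinned, 97.5))
print(acc_rate, D, post_mean, ci)
```

Because this target is the conjugate case, the output can be checked against the exact posterior N(1, 0.5): the estimated mean should be near 1 and the 95% interval near 1 ± 1.96·√0.5. Changing delta shows the trade-off described in the text between acceptance rate and autocorrelation.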

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (, Zoubin Ghahramni (,

More information

eqr094: Hierarchical MCMC for Bayesian System Reliability

eqr094: Hierarchical MCMC for Bayesian System Reliability eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

ST 740: Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Spatio-temporal precipitation modeling based on time-varying regressions

Spatio-temporal precipitation modeling based on time-varying regressions Spatio-temporal precipitation modeling based on time-varying regressions Oleg Makhnin Department of Mathematics New Mexico Tech Socorro, NM 87801 January 19, 2007 1 Abstract: A time-varying regression

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information

Probabilistic Machine Learning

Probabilistic Machine Learning Probabilistic Machine Learning Bayesian Nets, MCMC, and more Marek Petrik 4/18/2017 Based on: P. Murphy, K. (2012). Machine Learning: A Probabilistic Perspective. Chapter 10. Conditional Independence Independent

More information

Lecture 6: Markov Chain Monte Carlo

Lecture 6: Markov Chain Monte Carlo Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline

More information

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically

More information

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1 Parameter Estimation William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1 Elements of Inference Inference problems contain two indispensable elements: Data

More information

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017 Markov Chain Monte Carlo (MCMC) and Model Evaluation August 15, 2017 Frequentist Linking Frequentist and Bayesian Statistics How can we estimate model parameters and what does it imply? Want to find the

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Bayesian Inference. Chapter 1. Introduction and basic concepts

Bayesian Inference. Chapter 1. Introduction and basic concepts Bayesian Inference Chapter 1. Introduction and basic concepts M. Concepción Ausín Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master

More information

Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions

Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions R U T C O R R E S E A R C H R E P O R T Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions Douglas H. Jones a Mikhail Nediak b RRR 7-2, February, 2! " ##$%#&

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Markov chain Monte Carlo methods in atmospheric remote sensing

Markov chain Monte Carlo methods in atmospheric remote sensing 1 / 45 Markov chain Monte Carlo methods in atmospheric remote sensing Johanna Tamminen ESA Summer School on Earth System Monitoring and Modeling July 3 Aug 11, 212, Frascati July,

More information

Reminder of some Markov Chain properties:

Reminder of some Markov Chain properties: Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent

More information

Bayesian data analysis in practice: Three simple examples

Bayesian data analysis in practice: Three simple examples Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge Ghahramani

More information

MCMC and Gibbs Sampling. Kayhan Batmanghelich

MCMC and Gibbs Sampling. Kayhan Batmanghelich MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction

More information

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

SAMPLING ALGORITHMS. In general. Inference in Bayesian models SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be

More information

SAMSI Astrostatistics Tutorial. More Markov chain Monte Carlo & Demo of Mathematica software

SAMSI Astrostatistics Tutorial. More Markov chain Monte Carlo & Demo of Mathematica software SAMSI Astrostatistics Tutorial More Markov chain Monte Carlo & Demo of Mathematica software Phil Gregory University of British Columbia 26 Bayesian Logical Data Analysis for the Physical Sciences Contents:

More information

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Bayesian Phylogenetics:

Bayesian Phylogenetics: Bayesian Phylogenetics: an introduction Marc A. Suchard UCLA Who is this man? How sure are you? The one true tree? Methods we ve learned so far try to find a single tree that best describes

More information

A Bayesian Approach to Phylogenetics

A Bayesian Approach to Phylogenetics A Bayesian Approach to Phylogenetics Niklas Wahlberg Based largely on slides by Paul Lewis ( An Introduction to Bayesian Phylogenetics Bayesian inference in general Markov chain Monte

More information

Bayesian modelling. Hans-Peter Helfrich. University of Bonn. Theodor-Brinkmann-Graduate School

Bayesian modelling. Hans-Peter Helfrich. University of Bonn. Theodor-Brinkmann-Graduate School Bayesian modelling Hans-Peter Helfrich University of Bonn Theodor-Brinkmann-Graduate School H.-P. Helfrich (University of Bonn) Bayesian modelling Brinkmann School 1 / 22 Overview 1 Bayesian modelling

More information

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9 Metropolis Hastings Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601 Module 9 1 The Metropolis-Hastings algorithm is a general term for a family of Markov chain simulation methods

More information

Tools for Parameter Estimation and Propagation of Uncertainty

Tools for Parameter Estimation and Propagation of Uncertainty Tools for Parameter Estimation and Propagation of Uncertainty Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 Outline Models, parameters, parameter estimation,

More information



More information
