Markov Chain Monte Carlo methods
By Oleg Makhnin

4.1 Introduction

Markov Chain Monte Carlo (MCMC) methods are computational methods developed for Bayesian inference. Bayesian inference deals with parameter
estimation under some prior assumptions. For example, suppose we are estimating some parameter $\theta$. We have some information about $\theta$ expressed as a prior distribution. It's called prior because this is what we believe will happen prior to collecting any data.

What happens after you collect your data? The evidence from your data is summarized in the likelihood. This is simply the (joint) density of your observations. However, in the likelihood and Bayesian inference, it is treated as a function of the (unknown) $\theta$, and the data $y_1, \dots, y_n$ are treated as given.

The goal of Bayesian inference is to compute the posterior distribution. It's called posterior because it is computed after obtaining the data. It's the conditional distribution of the parameter $\theta$ given the data. We will assume that we deal with continuous quantities, so the formulas below are in terms of densities. By the continuous version of Bayes' formula, the posterior is

$$p(\theta \mid y_1, \dots, y_n) = \frac{p(\theta, y_1, \dots, y_n)}{p(y_1, \dots, y_n)} = \frac{p(y_1, \dots, y_n \mid \theta)\, p(\theta)}{p(y_1, \dots, y_n)}$$

or, briefly,

$$\text{posterior} \;\propto\; \text{likelihood} \times \text{prior} \;=\; p(y_1, \dots, y_n \mid \theta)\, p(\theta) \tag{1}$$

where the sign $\propto$ is frequently used in Bayesian analysis and reads "proportional to", that is, equal to the quantity described times some proportionality constant. Frequently, this constant is found later from the condition that the posterior density integrates to 1.

Thus, the posterior distribution combines prior information with the new information obtained from the data, and makes a balanced guess about the unknown parameters.

Example 1: Normal/Normal prior and likelihood

Under a simple assumption that both prior and likelihood are Normal, suppose first that we have just one observation $y$. Let the prior be

$$p(\theta) \propto \exp\left[-\frac{(\theta - \mu_0)^2}{2\sigma_0^2}\right], \quad \text{that is, } N(\mu_0, \sigma_0^2),$$

and the likelihood

$$p(y \mid \theta) \propto \exp\left[-\frac{(y - \theta)^2}{2\sigma^2}\right], \quad \text{that is, } N(\theta, \sigma^2).$$

We will assume that $\sigma^2$ is known and so the parameter $\theta$ is the unknown mean we'd like to estimate.
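The proportionality in Eq. (1) can be checked numerically. The following is a minimal sketch, not part of the original notes: it evaluates the Normal prior and likelihood of Example 1 on a grid of $\theta$ values, multiplies them, and finds the constant from the condition that the posterior integrates to 1. The numerical values of `mu0`, `sigma0_sq`, `y` and `sigma_sq`, as well as the grid limits, are illustrative assumptions.

```python
import numpy as np

# Illustrative values (assumptions chosen only for this sketch)
mu0, sigma0_sq = 0.0, 1.0   # prior mean and prior variance
y, sigma_sq = 2.0, 1.0      # single observation and known data variance

theta = np.linspace(-10.0, 10.0, 4001)   # grid over the unknown parameter
dtheta = theta[1] - theta[0]

# Unnormalized prior and likelihood, both viewed as functions of theta
prior = np.exp(-(theta - mu0) ** 2 / (2.0 * sigma0_sq))
likelihood = np.exp(-(y - theta) ** 2 / (2.0 * sigma_sq))

# posterior is proportional to likelihood * prior;
# the constant follows from requiring the density to integrate to 1
unnormalized = likelihood * prior
posterior = unnormalized / (unnormalized.sum() * dtheta)

post_mean = (theta * posterior).sum() * dtheta
print("integral of posterior:", posterior.sum() * dtheta)
print("posterior mean:", post_mean)
```

Because only the product of likelihood and prior is needed, neither density has to be normalized beforehand; the grid approach, however, is practical only for one or two parameters.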
Using Eq. (1), we get, after some algebra,

$$p(\theta \mid y) \propto \exp\left[-\frac{(\theta - \mu_p)^2}{2\sigma_p^2}\right]$$

where

$$\mu_p = \mu_0\, \frac{1/\sigma_0^2}{1/\sigma_0^2 + 1/\sigma^2} + y\, \frac{1/\sigma^2}{1/\sigma_0^2 + 1/\sigma^2} \quad \text{and} \quad \sigma_p^2 = \left(\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}\right)^{-1}$$

Thus, the posterior also has a Normal distribution, $N(\mu_p, \sigma_p^2)$. Note also that $\mu_p$ is the weighted average of the prior mean $\mu_0$ and the observation $y$, where the weights are inversely proportional to the variances.

[Figure 1: Prior (broken lines), likelihood (solid lines) and posterior (thick lines) for normal conjugate prior, plotted as $p(x)$ against $x$. Left: $\sigma_0^2 = 0.5$, center: $\sigma_0^2 = 1$, right: $\sigma_0^2 = 2$.]

In Fig. 1, three cases are shown, all with the same data $y = 2$, $\sigma^2 = 1$, prior mean $\mu_0 = 0$, and differing only by the value of $\sigma_0^2$. When $\sigma_0^2$ is small, the prior has a larger weight, and thus the posterior mean is closer to the prior mean. When $\sigma_0^2$ is large, the situation is the opposite.

This is a case of the so-called conjugate prior on $\theta$, which is chosen in such a way that the posterior has the same functional form as the prior.

This example is easily generalized to several observations $y_1, \dots, y_n$. If they are all independent Normal, with the same mean $\theta$ and standard deviation $\sigma$, then we may treat them as a single observation $\bar{y} = \sum y_i / n$ with standard deviation $\sigma/\sqrt{n}$.

Direct computation of posterior densities is impossible for all but the simplest problems. Markov Chain Monte Carlo (MCMC) methods are computational methods developed for Bayesian inference. As with all Monte Carlo methods (see e.g. Tarantola, 2005, Chapter 2), the goal of MCMC is to obtain a sample from the probability distribution of interest. However, this sample will not consist of independent
observations (as is the case with classical Monte Carlo), but will rather form a sequence of realizations of a Markov chain. The trick is to set up a Markov chain whose stationary distribution is exactly the posterior distribution we need to sample from.

A Markov chain is a sequence of random variables in which each variable, given its immediate predecessor, is independent of the rest of the past. This means that the Markov chain $\{X_t\}$ is defined by its transition probability (or transition kernel in the case of a continuous state space) that describes the conditional probability density

$$q(x_{t+1} \mid X_t), \qquad t = 0, 1, 2, 3, \dots \tag{2}$$

In addition, we will assume that a starting probability density $q(x_0)$ is also known. The procedure of generating such a chain starts with generating (sampling) $X_0$ from this density, then iterating (2) to get the subsequent samples.

Under some assumptions on the transition probability, a Markov chain will converge to its stationary distribution, regardless of the starting value $X_0$. Such Markov chains are called ergodic. An easy condition to check is the detailed balance condition

$$\pi(y)\, q(x \mid y) = q(y \mid x)\, \pi(x) \quad \text{for all } x, y$$

which ensures that $\pi$ is a stationary distribution of the chain.

Denote the data briefly as $y$ and the unknown parameters we'd like to estimate as $\theta$. Then, the Markov chain takes values in the $\theta$-space. MCMC methods provide ways to generate Markov chains that have the desired posterior $p(\theta \mid y)$ as their stationary distribution. The two methods discussed here are the Gibbs sampler and the Metropolis-Hastings method.

4.2 Gibbs sampler

One way to form a Markov chain that will converge to the posterior $p(\theta \mid y)$ is to split the components of $\theta$ into blocks of variables, assuming that it's easy to generate samples from these blocks. This is the case when the blocks have nice conjugate forms for their distributions. Suppose that we split the model into $k$ blocks, $\theta = [\theta_1, \theta_2, \dots, \theta_k]$. In the simplest case, the blocks are just scalar components of the model $\theta$.
The Gibbs sampler will iteratively obtain samples of $\theta_j$ based on their full conditional posteriors (FCPs), defined as

$$p_j(\theta_j \mid y, \theta_1, \dots, \theta_{j-1}, \theta_{j+1}, \dots, \theta_k).$$

The process is done for all $j = 1, \dots, k$ and then repeated many times until a sample of the desired length from the entire $\theta$ is obtained.
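To make the sweep over full conditionals concrete, here is a minimal sketch for a toy target that is not taken from the notes: a bivariate standard Normal with correlation `rho`, split into $k = 2$ scalar blocks. For this target each full conditional is itself Normal, $\theta_1 \mid \theta_2 \sim N(\rho\,\theta_2,\ 1 - \rho^2)$ and symmetrically for $\theta_2$. The function name `gibbs_bivariate_normal` and the starting point are hypothetical choices for illustration.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate Normal with correlation rho."""
    rng = np.random.default_rng(seed)
    cond_sd = np.sqrt(1.0 - rho ** 2)      # SD of each full conditional
    theta1, theta2 = 0.0, 0.0              # arbitrary starting values
    samples = np.empty((n_samples, 2))
    for t in range(n_samples):
        # Update block 1 from its full conditional given the current block 2
        theta1 = rng.normal(rho * theta2, cond_sd)
        # Update block 2 from its full conditional given the new block 1
        theta2 = rng.normal(rho * theta1, cond_sd)
        samples[t] = (theta1, theta2)
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print("sample means:", samples.mean(axis=0))
print("sample correlation:", np.corrcoef(samples[:, 0], samples[:, 1])[0, 1])
```

Each update uses the most recent value of the other block, which is what makes the sequence a Markov chain rather than a set of independent draws.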
4.3 Example: Regression with censored data

There are frequently cases in Bayesian inference when the estimates could be easily obtained if only certain hidden variables were known. We will consider one such situation and indicate how to set up the Gibbs sampler.

The censored data situation arises when we do not know the exact value of an observation, but some inequality is available, for example, $y_i^* > c_i$ (* here indicates a censored observation). This frequently happens in survival studies when an item was removed from the study at the time $c_i$ before we had the chance to observe its failure at $y_i^*$. A similar situation arises in environmental studies with non-detects. A non-detect means that a certain chemical (likely a pollutant) was not detected in the sample; however, we cannot claim with certainty that it is absent, only that its concentration lies below an estimated threshold. In this case, $y_i^* < c_i$, where $y_i^*$ is the unknown true concentration and $c_i$ is the threshold.

If we treat the missing data as extra parameters in the model, we can sample from their FCP given all the other model parameters. For example, if we fit a regression model with some predictors $x_i$ and errors $\varepsilon_i$,

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n$$

then the unobserved $y_i^*$ can be sampled from the truncated Normal distribution with mean $\beta_0 + \beta_1 x_i$, standard deviation $\sigma$ (equal to the standard deviation of the errors $\varepsilon_i$) and upper threshold $c_i$. Then, in turn, the Gibbs sampler will use the current samples of $y_i^*$, together with the known concentrations $y_j$, to estimate the regression parameters $\beta_0$, $\beta_1$ and $\sigma$. This way, we will get the MCMC samples from both model parameters and the missing data.

Example: from Helsel (2005), Ch. 14. The data given are TCE concentrations (µg/l) in ground waters of Long Island, New York, along with several possible explanatory variables (population density, land use and depth to the water surface).
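The data-augmentation step described above, drawing each non-detect from a Normal distribution truncated above at its threshold, can be sketched with the inverse-CDF method. This is one possible implementation rather than the author's; the helper `sample_censored` and the values of `beta0`, `beta1`, `sigma`, `x_cens` and `c_cens` are hypothetical and serve only to exercise the function.

```python
import numpy as np
from statistics import NormalDist

def sample_censored(mean, sigma, threshold, rng):
    """Draw one value from N(mean, sigma^2) truncated above at `threshold`."""
    std_normal = NormalDist()
    # Probability that the untruncated Normal falls below the threshold
    p_below = std_normal.cdf((threshold - mean) / sigma)
    # Inverse-CDF method: a uniform draw on (0, p_below) maps to a value below the threshold
    u = rng.uniform(0.0, p_below)
    # Keep u strictly inside (0, 1), where inv_cdf is defined
    u = float(np.clip(u, np.finfo(float).tiny, np.nextafter(1.0, 0.0)))
    return mean + sigma * std_normal.inv_cdf(u)

rng = np.random.default_rng(1)

# Hypothetical current parameter values within one Gibbs sweep
beta0, beta1, sigma = 0.5, 0.05, 1.0
x_cens = np.array([1.0, 5.0, 9.0])            # predictors of three non-detects
c_cens = np.log(np.array([1.0, 2.0, 5.0]))    # log detection limits

y_imputed = np.array([
    sample_censored(beta0 + beta1 * xi, sigma, ci, rng)
    for xi, ci in zip(x_cens, c_cens)
])
print(y_imputed)
```

In a full sampler these imputed values would be refreshed at every iteration, using the latest draws of the regression parameters.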
The objective is to determine whether concentrations are related to one or more explanatory variables. There are four detection limits, at 1, 2, 4 and 5 µg/l. Out of 247 observations, 194 are classified as non-detects. We will use $y_i = \ln(\mathrm{TCE})$ and $x_i$ = population density, where TCE is trichloroethylene, a chlorinated hydrocarbon commonly used as an industrial solvent. The data are shown in Fig. 2. What do you think is the direction of the trendline? Is the slope positive or negative?

The parameters $\beta_0$, $\beta_1$ can be estimated using linear regression. However, we will be interested not only in the estimates $\hat\beta_0$ and $\hat\beta_1$, but in their entire FCP. Fortunately, it's a bivariate Normal, and we have an easy way to generate samples from it.
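One common way to draw from a bivariate Normal centered at the least-squares estimate is through a Cholesky factor of its covariance matrix. The sketch below assumes that approach and uses synthetic stand-in data, since the TCE measurements are not reproduced here; the names `G`, `beta_hat`, `cov` and `draw_beta` are illustrative, and `sigma` is treated as known for this single step.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in data (assumption): not the TCE data set
n = 50
x = rng.uniform(0.0, 10.0, size=n)
sigma = 1.0                                    # error SD, held fixed in this step
y = 1.0 + 0.5 * x + rng.normal(0.0, sigma, size=n)

G = np.column_stack([np.ones(n), x])           # design matrix: intercept and slope
GtG = G.T @ G
beta_hat = np.linalg.solve(GtG, G.T @ y)       # least-squares estimate
cov = sigma ** 2 * np.linalg.inv(GtG)          # covariance of the bivariate Normal

# numpy returns a lower-triangular factor L with L @ L.T equal to cov
L = np.linalg.cholesky(cov)

def draw_beta():
    """One draw of [beta0, beta1] from N(beta_hat, cov)."""
    s = rng.standard_normal(2)                 # standard Normal vector
    return beta_hat + L @ s

draws = np.array([draw_beta() for _ in range(5000)])
print("least-squares estimate:", beta_hat)
print("mean of draws:", draws.mean(axis=0))
```

Within the Gibbs sampler, `y` would contain the observed values together with the current imputed values of the censored observations, so `beta_hat` changes from one sweep to the next while `GtG` stays fixed.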
[Figure 2: log(conc) versus population density. Censored observations: blank circles; uncensored observations: dark circles. The points are jittered, i.e. have a small amount of noise added to visualize multiple occurrences of the same point.]

Namely, if $\beta$ are the coefficients from the linear regression $y = G\beta$, where the data have covariance matrix $\sigma^2 I$, then we know that $\beta$ has the distribution (likelihood) $N(\hat\beta, \sigma^2 (G^T G)^{-1})$, where $\hat\beta = (G^T G)^{-1} G^T y$. We can generate a sample from such a distribution by using e.g. the Cholesky decomposition

$$\sigma^2 (G^T G)^{-1} = R^T R$$

and then using

$$\beta = R^T s + \hat\beta$$

where $s$ is a standard Normal vector.

For simplicity, we will assume that $\beta = [\beta_0, \beta_1]$ has a flat prior, that is, $p(\beta) \propto 1$, which corresponds to the situation where we have no prior information on the $\beta$'s.

Another technicality concerns sampling $\sigma$. This is usually done using an inverse chi-square prior on $\sigma^2$. Under the assumption of normal errors $e_i$, this
turns out to be a conjugate prior, i.e. the posterior distribution of $\sigma^2$ will also be inverse chi-square. (The equation for the inverse chi-square density is given e.g. by Gelman et al., 2003. Of course, a sample value for $\sigma$ can be obtained by taking the square root of $\sigma^2$.) The updating equation is

$$\sigma^2 = \frac{V_0\, df_0 + \sum e_i^2}{\chi^2_{df}}$$

where $\chi^2_{df}$ is a chi-square random variable with $df = df_0 + n - m$, with $m$ equal to the number of parameters in the regression equation (here $m = 2$), $df_0$ are the prior degrees of freedom, and $V_0$ is the prior variance. Decreasing $df_0$ to 0 leads to using a flat prior on $\sigma$.

4.4 Metropolis-Hastings algorithm

The Metropolis algorithm and its extension, the Metropolis-Hastings algorithm, were developed for sampling from non-standard densities. The idea is as follows. To sample $x$ from some density $q(x)$, generate a proposal value $x^*$ and, at iteration $t + 1$ of the sampler, accept this value, setting $x_{t+1} = x^*$, with probability

$$p_{acc} = \min\left\{\frac{q(x^*)}{q(x_t)},\ 1\right\}.$$

Otherwise, we keep the old value $x_t$. In practice, this means generating a Uniform[0, 1] random variable $U$ and setting

$$x_{t+1} = \begin{cases} x^*, & \text{if } U \le p_{acc} \\ x_t, & \text{if } U > p_{acc} \end{cases} \tag{3}$$

Note that since $p_{acc}$ is the ratio of $q$-densities, we only need to know the density $q(x)$ up to a proportionality constant! This makes the Metropolis method particularly attractive for computing posterior densities.

This algorithm is reminiscent of the stochastic search algorithm for maximizing the function $q(x)$, where we move to a new point if that increases the value of $q$, and stay at the old point otherwise. The difference in the Metropolis algorithm is that we also occasionally jump to a point with a lower value of $q$. This, among other things, helps us overcome local maxima. (The benefits of both algorithms can be combined in the simulated annealing algorithm, which moves more freely in the beginning and becomes more like stochastic search in the end.)

The practical difficulty lies in generating a suitable proposal value $x^*$. One popular method is the random-walk Metropolis algorithm, for which the proposal value equals $x^* = x_t + h_t$, where $x_t$ is the previous value from the Markov chain, and $h_t$ is chosen randomly from a symmetric distribution, for example, Uniform on $[-\delta_h, \delta_h]$ or Normal $N(0, \sigma_h^2)$. The size of the jump $h_t$ may be chosen adaptively. Smaller jumps will result in higher acceptance rates, but will be slower in exploring the model space. Longer jumps will be less frequently accepted, so the chain will tend to get stuck in the same place. Jumps that are either too small or too large will result in increased autocorrelation of your Markov chain, and therefore the need for longer chains to estimate your parameters precisely. Some studies have shown (see e.g. Gelman et al., 2003) that the most efficient acceptance frequency lies between 20% and 50%.

The Metropolis method is simple to use; however, it requires certain assumptions on how the proposal value $x^*$ is picked. For example, the random-walk Metropolis version will not work if the jump $h_t$ has an asymmetric distribution. The generalization, the Metropolis-Hastings method, does not require the proposal distribution to be symmetric, but we will not discuss it here.

4.5 Analyzing the MCMC output

First, we need to realize that, using MCMC, we produce just a sample from the posterior density. Thus, the estimates we obtain from it (e.g. mean, median, credible intervals etc.) will be subject to sampling error. For example, if we took $M$ Monte Carlo samples, then the error in estimating the mean is proportional to $1/\sqrt{M}$.

[Figure 3: Tri-plot for a highly autocorrelated output. Axis labels: m3 values, sample, Frequency, ACF, Lag.]

Also, like any iterative method, MCMC methods take some time to converge. However, the convergence is not to any particular number, but to the distribution $\pi$, i.e. the entire range of numbers. To monitor convergence, a simple graphical tool is to produce plots of the MC values and watch them until they seem to settle into a stable range. A more scientific method uses multiple Markov chains, started at different values and run in parallel
(see Gelman et al., 2003). After the convergence analysis, we determine a burn-in period, during which the initial values that we collected are discarded.

Another complication is specific to MCMC methods. These methods produce positively correlated samples from the desired distribution. This means, roughly, that the next value of the Markov chain is similar to the previous value. The higher the correlation, the less new information each sample value contains! The amount of correlation is usually monitored using autocorrelation plots. They indicate the amount of correlation between $x_t$ and $x_{t+k}$, for various values of the lag $k$. We can use these plots to find the value $D$ such that the autocorrelation is negligible for $k \ge D$. Then, to obtain the final estimates of our parameters, we will thin out our sample, keeping only every $D$th value.

[Figure 4: Tri-plot for a slightly autocorrelated output. Axis labels: m1 values, sample, Frequency, ACF, Lag.]

See Fig. 4 above. In practice, we recommend using the tri-plot for each scalar parameter that we fit. The tri-plot includes the time-series plot of the MC values, the histogram of the results (to monitor the posterior distribution), and the autocorrelation plot. See Figures 3 and 4 for examples. In Figure 4, the burn-in period is clearly seen in the beginning.

Once we have obtained a clean sample of the model parameters, it is straightforward to obtain the estimates. It is important to realize, however, that all of these estimates are subject to sampling error. The MAP (maximum a posteriori) solutions might be difficult to obtain in the absence of density functions to maximize. Thus, for symmetric distributions, we might settle for the posterior means instead. In the case of asymmetric distributions, we might want to use posterior medians. The credible intervals can be easily obtained using the sample percentiles
(quantiles). For example, to obtain the 95% credible interval, we may use the sample 2.5th percentile as the lower bound and the 97.5th percentile as the upper bound.

Bibliography

Gelman, A., J.B. Carlin, H.S. Stern and D.B. Rubin (2003), Bayesian Data Analysis, 2nd ed., Chapman & Hall/CRC.

Gilks, W.R., S. Richardson and D. Spiegelhalter, eds. (1996), Markov Chain Monte Carlo in Practice: Interdisciplinary Statistics, Chapman & Hall/CRC.

Diaconis, P. (2009), The Markov Chain Monte Carlo revolution, Bulletin of the AMS, 46 (2).

Helsel, D.R. (2005), Nondetects and Data Analysis: Statistics for Censored Environmental Data, Wiley.

Tarantola, A. (2005), Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM.
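As a companion to Sections 4.4 and 4.5, the following minimal sketch runs a random-walk Metropolis sampler and then applies burn-in, thinning and percentile-based credible intervals. It is an illustration under stated assumptions, not the author's code: the target is the unnormalized Normal/Normal posterior of Example 1 with illustrative values ($y = 2$, $\sigma^2 = 1$, $\mu_0 = 0$, $\sigma_0^2 = 1$); the step size `step_sd`, the burn-in length, and the 0.05 cut-off used to call an autocorrelation "negligible" are arbitrary choices. The acceptance test is carried out on the log scale, which is equivalent to comparing $U$ with $p_{acc}$.

```python
import numpy as np

def log_q(theta, y=2.0, sigma_sq=1.0, mu0=0.0, sigma0_sq=1.0):
    """Log of the unnormalized posterior: log-likelihood plus log-prior."""
    return (-(y - theta) ** 2 / (2.0 * sigma_sq)
            - (theta - mu0) ** 2 / (2.0 * sigma0_sq))

def random_walk_metropolis(log_q, x0, n_iter, step_sd, rng):
    """Random-walk Metropolis with symmetric Normal jumps."""
    chain = np.empty(n_iter)
    x, accepted = x0, 0
    for t in range(n_iter):
        proposal = x + rng.normal(0.0, step_sd)          # x* = x_t + h_t
        # Accept with probability min{q(x*)/q(x_t), 1}, checked on the log scale
        if np.log(rng.uniform()) <= log_q(proposal) - log_q(x):
            x, accepted = proposal, accepted + 1
        chain[t] = x                                     # keep the old value if rejected
    return chain, accepted / n_iter

def autocorrelation(chain, max_lag):
    """Sample autocorrelation of the chain for lags 0..max_lag."""
    centered = chain - chain.mean()
    denom = np.dot(centered, centered)
    n = len(chain)
    return np.array([np.dot(centered[:n - k], centered[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(3)
chain, acc_rate = random_walk_metropolis(log_q, x0=10.0, n_iter=50000,
                                         step_sd=1.5, rng=rng)

burn_in = 1000                      # discard the initial transient
kept = chain[burn_in:]

acf = autocorrelation(kept, max_lag=50)
small = np.nonzero(np.abs(acf) < 0.05)[0]
D = int(small[0]) if small.size else 50   # first lag with negligible autocorrelation
thinned = kept[::D]                       # keep every D-th value

lower, upper = np.percentile(thinned, [2.5, 97.5])
print("acceptance rate:", acc_rate)
print("thinning interval D:", D)
print("posterior mean estimate:", thinned.mean())
print("95% credible interval:", (lower, upper))
```

For this particular target the exact posterior is known from Example 1, so the estimates can be compared with the closed-form mean and variance; for a real problem no such check is available, which is why the tri-plot diagnostics matter.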
More informationBayesian Phylogenetics
Bayesian Phylogenetics Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison October 6, 2011 Bayesian Phylogenetics 1 / 27 Who was Bayes? The Reverand Thomas Bayes was born
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationBayesian model selection: methodology, computation and applications
Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program
More informationMarkov Chain Monte Carlo in Practice
Markov Chain Monte Carlo in Practice Edited by W.R. Gilks Medical Research Council Biostatistics Unit Cambridge UK S. Richardson French National Institute for Health and Medical Research Vilejuif France
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationLabor-Supply Shifts and Economic Fluctuations. Technical Appendix
Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January
More informationBayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida
Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:
More informationBayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder
Bayesian inference & Markov chain Monte Carlo Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder Note 2: Paul Lewis has written nice software for demonstrating Markov
More informationStatistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling
1 / 27 Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 27 Monte Carlo Integration The big question : Evaluate E p(z) [f(z)]
More informationMCMC: Markov Chain Monte Carlo
I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationMarkov chain Monte Carlo
1 / 26 Markov chain Monte Carlo Timothy Hanson 1 and Alejandro Jara 2 1 Division of Biostatistics, University of Minnesota, USA 2 Department of Statistics, Universidad de Concepción, Chile IAP-Workshop
More informationBayesian GLMs and Metropolis-Hastings Algorithm
Bayesian GLMs and Metropolis-Hastings Algorithm We have seen that with conjugate or semi-conjugate prior distributions the Gibbs sampler can be used to sample from the posterior distribution. In situations,
More informationI. Bayesian econometrics
I. Bayesian econometrics A. Introduction B. Bayesian inference in the univariate regression model C. Statistical decision theory D. Large sample results E. Diffuse priors F. Numerical Bayesian methods
More informationBayesian Networks in Educational Assessment
Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior
More informationThe Metropolis-Hastings Algorithm. June 8, 2012
The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings
More informationStochastic optimization Markov Chain Monte Carlo
Stochastic optimization Markov Chain Monte Carlo Ethan Fetaya Weizmann Institute of Science 1 Motivation Markov chains Stationary distribution Mixing time 2 Algorithms Metropolis-Hastings Simulated Annealing
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationMarkov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018
Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling
More informationMolecular Epidemiology Workshop: Bayesian Data Analysis
Molecular Epidemiology Workshop: Bayesian Data Analysis Jay Taylor and Ananias Escalante School of Mathematical and Statistical Sciences Center for Evolutionary Medicine and Informatics Arizona State University
More informationMCMC Methods: Gibbs and Metropolis
MCMC Methods: Gibbs and Metropolis Patrick Breheny February 28 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/30 Introduction As we have seen, the ability to sample from the posterior distribution
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More informationBayesian Model Comparison:
Bayesian Model Comparison: Modeling Petrobrás log-returns Hedibert Freitas Lopes February 2014 Log price: y t = log p t Time span: 12/29/2000-12/31/2013 (n = 3268 days) LOG PRICE 1 2 3 4 0 500 1000 1500
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll
More informationINTRODUCTION TO BAYESIAN STATISTICS
INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types
More informationMARKOV CHAIN MONTE CARLO
MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with
More informationBayesian Methods in Multilevel Regression
Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationCalibration of Stochastic Volatility Models using Particle Markov Chain Monte Carlo Methods
Calibration of Stochastic Volatility Models using Particle Markov Chain Monte Carlo Methods Jonas Hallgren 1 1 Department of Mathematics KTH Royal Institute of Technology Stockholm, Sweden BFS 2012 June
More informationPARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation.
PARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation.. Beta Distribution We ll start by learning about the Beta distribution, since we end up using
More informationSTA 294: Stochastic Processes & Bayesian Nonparametrics
MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a
More informationMonte Carlo Methods. Leon Gu CSD, CMU
Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte
More informationSupplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements
Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model
More informationA Beginner s Guide to MCMC
A Beginner s Guide to MCMC David Kipping Sagan Workshop 2016 but first, Sagan workshops, symposiums and fellowships are the bomb how to get the most out of a Sagan workshop, 2009-style lunch with Saganites
More informationA Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait
A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute
More informationBayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007
Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.
More information