Markov Chain Monte Carlo (MCMC)
Dependent Sampling

Suppose we wish to sample from a density $\pi$, and we can evaluate $\pi$ as a function but have no means to directly generate a sample. Rejection sampling can be used if a trial density $f$ can be found such that $\pi/f$ has a reasonable bound. When the dimension is high this is difficult or impossible.

Rejection sampling, the inversion method, and the density transform method all produce independent realizations from $\pi$. If these methods are inefficient or difficult to implement, we can drop independence as a goal, aiming instead to generate a dependent sequence $X_1, X_2, \ldots$ such that the marginal distribution of each $X_i$ is $\pi$. Going a step further, we may allow the marginal distribution $\pi_k$ of $X_k$ to be different from $\pi$, but converge to $\pi$ in some sense: $\pi_k \to \pi$. By relaxing the goal of the problem, it becomes possible to overcome the key problem of rejection sampling: that it does not adapt as information about $\pi$ is built up through repeated evaluations of $\pi$ at the sampled points. A practical framework for constructing dependent sequences satisfying these goals is provided by discrete time Markov chains.

Review of Markov Chains

A discrete time Markov chain is a sequence of random variables $X_0, X_1, \ldots$ such that
$$P(X_{n+1} \in A \mid X_0, \ldots, X_n) = P(X_{n+1} \in A \mid X_n) \equiv P(X_n, A).$$
The function $P$ is called the transition kernel. A probability distribution $\pi$ is called the invariant distribution for $P$ if
$$\pi(A) = \int P(x, A)\,\pi(x)\,dx$$
for all measurable sets $A$, in which case we may write $\pi P = \pi$. More generally this relationship may hold for a non-finite measure $\pi$, in which case $\pi$ is called the invariant measure for $P$.
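For a finite state space the defining relation $\pi P = \pi$ can be checked directly. A minimal numerical sketch (the 3-state transition matrix here is an arbitrary example):

```python
import numpy as np

# Hypothetical 3-state transition matrix: P[i, j] = P(X_{n+1} = j | X_n = i).
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

# The invariant distribution solves pi P = pi, i.e. it is the left
# eigenvector of P with eigenvalue 1, normalized to sum to 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

print(np.allclose(pi @ P, pi))  # True: pi is invariant for P
```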
Suppose that $P$ has a density, denoted $q(x, y)$. This means that
$$P(x, A) = \int_A q(x, y)\,dy.$$
A sufficient condition for $\pi$ to be the invariant distribution for $P$ is detailed balance:
$$\pi(x)\,q(x, y) = \pi(y)\,q(y, x).$$
The proof is a simple application of Fubini's theorem:
$$\int P(x, A)\,\pi(x)\,dx = \int\!\!\int_A \pi(x)\,q(x, y)\,dy\,dx = \int_A\!\int \pi(y)\,q(y, x)\,dx\,dy = \int_A \pi(y)\,dy = \pi(A).$$

The distribution $\pi$ is the equilibrium distribution for $X_0, X_1, \ldots$ if, writing $P^n(X_0, A) \equiv P(X_n \in A \mid X_0)$, it satisfies
$$\lim_{n \to \infty} P^n(x, A) = \pi(A)$$
for all measurable sets $A$ and for $\pi$ almost-every $x$.

Key definitions:

Irreducible: The Markov chain given by transition kernel $P$ with invariant distribution $\pi$ is irreducible if for each measurable set $A$ with $\pi(A) > 0$ and for each $x$, there exists $n$ such that $P^n(x, A) > 0$.

Recurrent: The Markov chain given by transition kernel $P$ with invariant distribution $\pi$ is recurrent if for every measurable set $B$ with $\pi(B) > 0$,
$$P(X_1, X_2, \ldots \in B \text{ i.o.} \mid X_0) > 0 \quad \text{for all } X_0,$$
and
$$P(X_1, X_2, \ldots \in B \text{ i.o.} \mid X_0) = 1$$
for $\pi$ almost-every $X_0$. The Markov chain given by transition kernel $P$ with invariant distribution $\pi$ is Harris recurrent if
$$P(X_1, X_2, \ldots \in B \text{ i.o.} \mid X_0) = 1 \quad \text{for all } X_0.$$

Fact: A Markov transition kernel $P$ that is $\pi$-irreducible and has an invariant measure is recurrent. A recurrent kernel $P$ with a finite invariant measure is positive recurrent; otherwise it is null recurrent.
Periodic: A Markov transition kernel $P$ is periodic if there exists $n \ge 2$ and a sequence of nonempty, disjoint measurable sets $A_1, A_2, \ldots, A_n$ such that $P(x, A_{j+1}) = 1$ for $x \in A_j$ ($j < n$), and $P(x, A_1) = 1$ for $x \in A_n$.

Total variation norm: For any measure $\lambda$, the total variation (TV) norm is
$$\|\lambda\| = \sup_A \lambda(A) - \inf_A \lambda(A),$$
where the supremum and infimum are taken over all $\lambda$-measurable sets. The distance between two probability measures $\pi_1$ and $\pi_2$ can be assessed using $\|\pi_1 - \pi_2\|$.

Theorem: Let $P$ be a Markov transition kernel with invariant distribution $\pi$, and suppose that $P$ is $\pi$-irreducible. Then $P$ is positive recurrent, and $\pi$ is the unique invariant distribution of $P$. If $P$ is aperiodic, then for $\pi$ almost-every $x$,
$$P^n(x, A) - \pi(A) \to 0$$
for all measurable sets $A$ ($P^n$ converges to $\pi$ in the TV norm). If $P$ is Harris recurrent, then the convergence occurs for all $x$.

In words, we can start at essentially any $X_0$, run the chain for a long time, and the final draw has a distribution that is approximately $\pi$. The long time is called the burn-in, and it is generally impossible to know how long this needs to be in order to get a good approximation.

We don't need independence for the most common application of Monte Carlo studies: estimating $E_\pi f$, the mean of a function $f$ with respect to the invariant distribution $\pi$. If the chain is uniformly ergodic, i.e.
$$\sup_x \|P^n(x, \cdot) - \pi\| \le M r^n$$
for some $M \ge 0$ and $r < 1$, there is a central limit theorem: let $\bar f_n = \sum_{j=1}^n f(X_j)/n$ denote the sample mean. Then $\sqrt{n}(\bar f_n - E_\pi f)$ converges weakly to a normal distribution with mean 0.

Methods for generating chains with a given equilibrium distribution

We have a density $\pi$ that we can evaluate but not sample from. How do we construct a Markov chain $P$ with invariant distribution $\pi$?

Gibbs sampling: Suppose the random variable $X = (X_1, \ldots, X_d)$ can be partitioned into blocks, and we can sample from each of the conditional distributions $P(X_i \mid \{X_j,\, j \ne i\})$. These transitions are invariant with respect to $P(X_1, \ldots, X_d)$. Consider the case $X = (X_1, X_2)$, and for a set $A$, let $A_{X_1} = \{X_2 : (X_1, X_2) \in A\}$. Under the proposal distribution $P(X_2 \mid X_1)$:
$$\int P((X_1, X_2), A)\,\pi(X_1, X_2)\,dX_1\,dX_2 = \int \left(\int_{A_{X_1}} \pi(y \mid X_1)\,dy\right) \pi(X_2 \mid X_1)\,\pi(X_1)\,dX_1\,dX_2$$
$$= \int\!\!\int_{A_{X_1}} \pi(y \mid X_1)\,\pi(X_1)\,dy\,dX_1 = \pi(A).$$
The Gibbs chain will in general only be irreducible and aperiodic if we combine all of the conditional distributions in random order.

The Metropolis (or Metropolis-Hastings) algorithm: Suppose we have a transition density $q(x, y)$, meaning $P(X_{n+1} \in A \mid X_n = x) = \int_A q(x, y)\,dy$. Define the Metropolis-Hastings ratio by
$$\alpha(x, y) = \begin{cases} \min\left\{\dfrac{\pi(y)\,q(y, x)}{\pi(x)\,q(x, y)},\; 1\right\} & \pi(x)\,q(x, y) > 0 \\[1ex] 1 & \pi(x)\,q(x, y) = 0. \end{cases}$$
The kernel $q(x, y)$ will roughly play the role that the trial density plays in rejection sampling. Suppose we are at $X_n$. Sample $Y$ from $q(X_n, y)$. With probability $\alpha(X_n, Y)$ set $X_{n+1} = Y$; otherwise set $X_{n+1} = X_n$. The transition kernel can be represented as
$$p(x, y) = q(x, y)\,\alpha(x, y) + r(x)\,\delta_x(y),$$
where $\delta_x$ is the point mass at $x$, and
$$r(x) = 1 - \int q(x, y)\,\alpha(x, y)\,dy$$
is the marginal probability of remaining at $x$. The resulting chain satisfies detailed balance, and hence has invariant distribution $\pi$. Only density evaluations are required, and unknown normalizing constants cancel.

General Metropolis-Hastings Schemes

Gibbs sampling: If $q(x, y) = \pi(y \mid x)$ then $\alpha(x, y) \equiv 1$, so we accept every move. This shows that Gibbs sampling is a special case of the Metropolis-Hastings algorithm.

Random walks: Suppose $q(x, y)$ is a random walk: $q(x, y) = q^*(y - x)$ for some distribution $q^*$. The Metropolis-Hastings ratio is $\min(\pi(y)\,q^*(x - y)/\pi(x)\,q^*(y - x),\, 1)$ (or 1 if $\pi(x)\,q^*(y - x) = 0$). If $q^*$ is symmetric, then the Metropolis-Hastings ratio reduces to $\min(\pi(y)/\pi(x),\, 1)$ (or 1 if $\pi(x) = 0$).
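A minimal sketch of the random walk scheme just described, for a one-dimensional target known only up to a normalizing constant (the bimodal target and the step size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Unnormalized log density: a bimodal sum of two Gaussian kernels.
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def rw_metropolis(log_pi, x0, n_steps, step_sd):
    """Random walk Metropolis with symmetric Gaussian steps, so the
    acceptance probability reduces to min(pi(y)/pi(x), 1)."""
    x = x0
    chain = np.empty(n_steps)
    for n in range(n_steps):
        y = x + step_sd * rng.standard_normal()   # propose y ~ q*(y - x)
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            x = y                                 # accept with probability alpha
        chain[n] = x                              # on rejection, repeat x
    return chain

chain = rw_metropolis(log_target, x0=0.0, n_steps=50_000, step_sd=1.0)
print(chain[1000:].mean())  # drop a burn-in, then estimate E_pi f for f(x) = x
```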
An important practical issue with random walk chains is the scaling of the random walk density $q^*$. If the variance of $q^*$ is too large, most steps will move into the tails of the distribution and will be rejected. On the other hand, if the variance is too small, the autocorrelation in the sequence of accepted draws will be high and convergence to the target distribution will be slow. Roberts and Rosenthal (2001) have shown that an acceptance rate of around 0.23 is optimal when the target density is the product of identical marginal density functions.

Independence chains: If $q(x, y) = q(y)$, so that the proposals are an iid sequence, the Metropolis-Hastings ratio is $\min(\pi(y)\,q(x)/\pi(x)\,q(y),\, 1)$ (or 1 if $\pi(x)\,q(y) = 0$). This is similar to rejection sampling; however the acceptance probabilities are scaled by $q(x)/q(y)$ rather than by $1/c$, and points are repeated rather than rejected if the proposal fails. Two interesting special cases are:

1. $q(x) \equiv 1$: if the proposal distribution is uniform, the scheme is even more similar to rejection sampling, since the Metropolis-Hastings ratio becomes $\min(\pi(y)/\pi(x),\, 1)$.

2. If $q(y) = \pi(y)$, then the Metropolis-Hastings ratio is identically 1, and we are sampling directly from the target density.

Rejection sampling chains: Suppose we want to use rejection sampling from a trial density $f$ to produce realizations from $\pi$. We can use the Metropolis-Hastings algorithm to rescue rejection sampling when we cannot find a good bound for $\pi/f$. We conjecture, but are not certain, that $\pi/f < c$. If we carry out rejection sampling we are actually drawing from the density proportional to $\tilde f = \min(\pi, cf)$. Suppose we use an independence chain generated by $\tilde f$ to drive the Metropolis-Hastings algorithm, so that $q(x, y) \propto \min(\pi(y), cf(y))$. Let $S$ denote the subset of the sample space where the bound $\pi/f < c$ holds. We have four possibilities:

1. $x, y \in S$: $\alpha = 1$.
2. $x \in S$, $y \notin S$: $\alpha = 1$.
3. $x \notin S$, $y \in S$: $\alpha < 1$.
4. $x \notin S$, $y \notin S$: in this case nothing can be said about $\alpha$ in general.

Since $S^c$ is the set of points that will be undersampled using the rejection sampling procedure, it is reasonable that moves from $S$ to $S^c$ are always accepted, while moves from $S^c$ to $S$ always have a nonzero probability of being rejected. The chain dwells in $S^c$ to build up mass that is missing from $\tilde f$.

Examples

Normal mean and variance: Suppose we observe $Y_1, \ldots, Y_n$ iid from the $N(\mu, \sigma^2)$ distribution. If we take flat (improper) priors on $\mu$ and $\sigma^2$, the posterior distribution is
$$P(\mu, \sigma^2 \mid \{Y\}) \propto (\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_i (Y_i - \mu)^2\right).$$
As a function of $\mu$ this is proportional to
$$\exp\left(-\frac{1}{2\sigma^2}\Big(n\mu^2 - 2\mu\sum_i Y_i\Big)\right) \propto \exp\left(-\frac{n}{2\sigma^2}(\mu - \bar Y)^2\right).$$
Thus
$$P(\mu \mid \sigma^2, \{Y\}) = N(\bar Y, \sigma^2/n).$$
To get the conditional distribution of $\sigma^2$ given $\{Y\}$ and $\mu$, make the change of variables $\eta = 1/\sigma^2$, so
$$P(\eta \mid \mu, \{Y\}) \propto \eta^{n/2 - 2} \exp\Big(-\eta \sum_i (Y_i - \mu)^2/2\Big).$$
This is a $\Gamma\big(n/2 - 1,\; 2/\sum_i (Y_i - \mu)^2\big)$ distribution. So the Gibbs sampling algorithm alternates between sampling $\mu$ from a normal distribution and $\eta$ from a gamma distribution.

Grouped binary data: Suppose we observe $m$ groups of Bernoulli trials, where the $i^{th}$ group consists of $n_i$ iid trials with success probability $p_i$. Suppose the $p_i$ are viewed as random effects from a symmetric Beta distribution, which has density proportional to $p^{\alpha - 1}(1 - p)^{\alpha - 1}$. Let $k_i$ be the number of successes observed in the $i^{th}$ group, so
$$P(\{k_i\} \mid \{p_i\})\,P(\{p_i\} \mid \alpha) \propto \prod_i p_i^{k_i + \alpha - 1}(1 - p_i)^{n_i - k_i + \alpha - 1}.$$
To be safe, we will take a nearly-flat proper exponential prior on $\alpha$ with mean $\theta$ (set to an arbitrary large value). Gibbs sampling operates using the conditional distributions $P(\alpha \mid \{p_i\}, \{k_i\})$ and $P(\{p_i\} \mid \alpha, \{k_i\})$. As a function of $\alpha$, the joint density above times the prior is proportional to $\exp\{\alpha(\sum_i \log p_i + \sum_i \log(1 - p_i) - 1/\theta)\}$, so the specific updates are
$$\alpha \sim \text{exponential with mean } 1\Big/\Big(-\sum_i \log p_i - \sum_i \log(1 - p_i) + 1/\theta\Big)$$
and
$$p_i \sim \text{Beta}(k_i + \alpha,\; n_i - k_i + \alpha).$$

Grouped continuous data with right-skewed group means: Suppose we observe data $Y_{ij}$ for $m$ groups with $n_i$ replicates in group $i$, so $i = 1, \ldots, m$ and $j = 1, \ldots, n_i$. Let $\mu_i$ be the population mean for group $i$, and suppose we take the $\mu_i$ to be random effects from an exponential distribution with mean $\lambda$. We will also take $Y_{i1}, \ldots, Y_{in_i} \mid \mu_i$ to be independent and Gaussian with mean $\mu_i$ and variance $\sigma^2$. If we form the improper posterior based on flat priors for $\lambda$ and $\sigma^2$, then integrate out the random effects $\mu_i$, we get
$$P(\lambda, \sigma^2 \mid \{Y\}) \propto \prod_i (\sigma^2)^{-n_i/2}\,\lambda^{-1} \int \exp\left(-\frac{1}{2\sigma^2}(Y_i - \mu 1_{n_i})'(Y_i - \mu 1_{n_i})\right) \exp(-\mu/\lambda)\,d\mu$$
$$= (\sigma^2)^{-n_\cdot/2 + m/2}\,\lambda^{-m} \prod_i \exp\left(-\sigma^{-2}(S_i - M_i^2/n_i)/2 - M_i\lambda^{-1}/n_i + \sigma^2\lambda^{-2}/(2n_i)\right),$$
where $Y_i = (Y_{i1}, \ldots, Y_{in_i})'$, $M_i = \sum_j Y_{ij}$, $S_i = \sum_j Y_{ij}^2$, and $n_\cdot = \sum_i n_i$. This does not have a finite integral, therefore a sampling algorithm applied to this formulation of the problem will not give meaningful results.

For simplicity, we first suppose that $\eta = 1/\lambda$ and $\sigma^2$ have a joint prior density proportional to
$$\exp\left(-\frac{1}{2}\sum_i \frac{1}{n_i}\,\sigma^2\eta^2\right)$$
on $(0, \infty) \times (0, \infty)$. In this case, the posterior on $\eta, \sigma^2$ is proportional to
$$(\sigma^2)^{-n_\cdot/2 + m/2}\,\eta^{m - 2} \exp\left(-\sigma^{-2}\sum_i (S_i - M_i^2/n_i)/2 - \eta\sum_i M_i/n_i\right).$$
Now if we set $\theta = 1/\sigma^2$, we get the posterior expressed in terms of $\eta$ and $\theta$:
$$\theta^{(n_\cdot - m)/2 - 2}\,\eta^{m - 2} \exp\left(-\theta\sum_i (S_i - M_i^2/n_i)/2 - \eta\sum_i M_i/n_i\right).$$
Letting $V_i = S_i/n_i - M_i^2/n_i^2$ (approximately the sample variance of the $Y_{ij}$ in group $i$) and $\bar M_i = M_i/n_i$ (the sample mean of $Y_i$), we get
$$\theta^{(n_\cdot - m)/2 - 2}\,\eta^{m - 2} \exp\left(-\theta\sum_i n_i V_i/2 - \eta\sum_i \bar M_i\right).$$
Thus $\eta$ and $\theta$ have independent Gamma posteriors:
$$\eta \sim \Gamma\Big(m - 1,\; 1\Big/\sum_i \bar M_i\Big), \qquad \theta \sim \Gamma\Big((n_\cdot - m)/2 - 1,\; 2\Big/\sum_i n_i V_i\Big).$$
If we wanted to use a different prior, say with $\lambda$ and $\sigma^2$ independent and exponential with means $\beta_\lambda$ and $\beta_{\sigma^2}$, we could use the Gibbs sampler.

Cauchy regression: Suppose we take the linear regression model $Y_i = \alpha + \beta' X_i + \epsilon_i$, where $Y_i \in \mathbb{R}$ and $X_i \in \mathbb{R}^d$ are observed, and the $\epsilon_i$ have central Cauchy distributions, with density $\pi^{-1}/(1 + \epsilon_i^2)$. The joint distribution of the observed data is
$$\pi(Y_1, \ldots, Y_n \mid X_1, \ldots, X_n) = \prod_i \frac{\pi^{-1}}{1 + (Y_i - \alpha - \beta' X_i)^2}.$$
Suppose we consider the Bayesian model with improper prior on $\theta = (\alpha, \beta)$. We use a Metropolis-Hastings algorithm with a random walk distribution for $q$: $q(\theta, \theta^*) = \phi(\theta^* - \theta)$, where $\phi$ is the bivariate normal distribution with mean 0. Due to the symmetry of the random walk, the Metropolis ratio simplifies to $\alpha(\theta, \theta^*) = \min(\pi(\theta^*)/\pi(\theta),\, 1)$.

Rejection sampling using the wrong bound: Suppose we want to sample from the density $\pi(x) = I(-1 \le x \le 1)\,|x|^{-1/2}/4$. The density is unbounded at $x = 0$, so we cannot use rejection sampling with a bounded trial density. Suppose we generate draws using the rejection sampling algorithm with a uniform trial density on $(-1, 1)$ and $c = 6$. These draws will have density
$$q(x) \propto I(-1 \le x \le 1)\,\min(|x|^{-1/2}/4,\; 3),$$
since here $cf = 6 \cdot \tfrac12 = 3$. Now we use these draws to drive a Metropolis algorithm. On $S = \{x : |x| \ge 1/144\}$ the bound $\pi/f < 6$ is satisfied (where $f$ is the uniform trial density). The Metropolis-Hastings ratio is given by the following:

1. $x, y \in S$: $\alpha = 1$.
2. $x \in S$, $y \notin S$: $\alpha = 1$.
3. $x \notin S$, $y \in S$: $\alpha = 12|x|^{1/2} < 1$.
4. $x \notin S$, $y \notin S$: $\alpha = \min(\pi(y)/\pi(x),\, 1) = \min(\sqrt{|x|/|y|},\, 1)$.
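A sketch of the rejection-driven chain for this example (function names are arbitrary; the chain is exactly the independence sampler described above, with $q \propto \min(\pi, 3)$):

```python
import numpy as np

rng = np.random.default_rng(1)

def pi_density(x):
    # Target pi(x) = |x|^(-1/2)/4 on (-1, 1), unbounded at 0.
    return abs(x) ** -0.5 / 4.0

def q_density(x):
    # Rejection sampling with Uniform(-1, 1) trial (f = 1/2) and c = 6
    # actually draws from q proportional to min(pi, c f) = min(pi, 3).
    return min(pi_density(x), 3.0)

def draw_q():
    # The rejection sampler itself: accept x with probability q(x)/(c f).
    while True:
        x = rng.uniform(-1.0, 1.0)
        if rng.uniform() < q_density(x) / 3.0:
            return x

def rejection_chain(n_steps, x0=0.5):
    # Independence Metropolis-Hastings driven by the rejection draws; the
    # chain dwells near 0 to rebuild the mass missing from q.
    x = x0
    chain = np.empty(n_steps)
    for n in range(n_steps):
        y = draw_q()
        ratio = pi_density(y) * q_density(x) / (pi_density(x) * q_density(y))
        if rng.uniform() < min(ratio, 1.0):
            x = y
        chain[n] = x
    return chain

chain = rejection_chain(20_000)
print(np.mean(np.abs(chain) < 1 / 144))  # fraction of time spent in S^c
```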
[Figure 1: Sample paths for $\alpha$ and $\beta$ for the Cauchy regression model, using random walk input chains with Gaussian steps having mean zero and standard deviation $\sigma = 0.01$ (top), $\sigma = 3$ (middle), and $\sigma = 1$ (bottom).]
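A rough sketch of how sample paths like those in Figure 1 could be generated (the data here are simulated and purely illustrative; the step sizes follow the caption):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: scalar covariate, Cauchy errors.
n = 100
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.standard_cauchy(size=n)

def log_post(theta):
    # Log posterior for theta = (alpha, beta) under a flat prior:
    # the sum of Cauchy log densities of the residuals.
    a, b = theta
    return -np.sum(np.log1p((Y - a - b * X) ** 2))

def rw_chain(step_sd, n_steps=5_000, theta0=(0.0, 0.0)):
    theta = np.asarray(theta0, dtype=float)
    path = np.empty((n_steps, 2))
    for t in range(n_steps):
        prop = theta + step_sd * rng.standard_normal(2)  # Gaussian RW step
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop
        path[t] = theta
    return path

# Step sizes as in Figure 1: too small, too large, and roughly right.
for sd in (0.01, 3.0, 1.0):
    print(sd, rw_chain(sd)[-1])
```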
Gaussian mixture model: Suppose that $Y_1, \ldots, Y_n \in \mathbb{R}^d$ are independent and observed, $\delta_1, \ldots, \delta_n \in \{1, 2, \ldots, K\}$ are iid and unobserved, and $Y_j \mid \delta_j = k$ has density $\phi(Y_j - \mu_k)$, where $\phi(\cdot)$ is the standard multivariate normal density. The parameters in the model are $\mu_1, \ldots, \mu_K$ and $\pi_1, \ldots, \pi_K$, where $\pi_k = P(\delta_j = k)$. We can take flat priors on these parameters, and estimate them through their posterior distribution, which can be simulated using Gibbs sampling. Here is a full set of conditional distributions, where $\mu = [\mu_1, \ldots, \mu_K]$, and similarly for $Y$ and $\delta$.

$$P(\mu \mid \pi, Y, \delta) = P(\mu \mid Y, \delta) \propto P(Y \mid \mu, \delta) = \prod_j \phi(Y_j - \mu_{\delta_j}).$$
This splits into a product of functions of $\mu_1, \ldots, \mu_K$, implying that the $\mu_k$ are independent given $\pi$, $Y$, and $\delta$. The factor that involves $\mu_k$ can be expressed
$$\exp\left(-\frac{1}{2}\Big({-2}\mu_k'\sum_j Y_j I(\delta_j = k) + \|\mu_k\|^2 \sum_j I(\delta_j = k)\Big)\right),$$
which is a multivariate normal density with mean $\sum_j Y_j I(\delta_j = k)\big/\sum_j I(\delta_j = k)$ and variance $1\big/\sum_j I(\delta_j = k)$ (with zero correlation).

$$P(\delta \mid \pi, Y, \mu) \propto P(Y \mid \delta, \mu)\,P(\delta \mid \pi) = \prod_j \phi(Y_j - \mu_{\delta_j})\,\pi_{\delta_j}.$$
This splits into a product of functions of $\delta_1, \ldots, \delta_n$, implying that the $\delta_j$ are independent given $\pi$, $Y$, and $\mu$. To sample $\delta_j$, first compute $q_k = \phi(Y_j - \mu_k)\,\pi_k$ for $k = 1, \ldots, K$, then rescale the $q_k$ so they sum to 1, and generate a multinomial draw according to the probabilities that result.

$$P(\pi \mid Y, \mu, \delta) \propto P(\delta \mid \pi) \propto \prod_k \pi_k^{\sum_j I(\delta_j = k)}.$$
Let $n_k = \sum_j I(\delta_j = k)$, so the posterior density of $\pi$ is proportional to $\prod_k \pi_k^{n_k}$. This is a Dirichlet density, from which samples can be generated by simulating $Z_k \sim \Gamma(n_k + 1, 1)$ and setting $\pi_k = Z_k / \sum_l Z_l$.
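The three conditional updates combine into one Gibbs sweep. A rough sketch under the model's identity-covariance assumption (the function name, data shapes, and the choice to leave empty components unchanged are arbitrary conventions):

```python
import numpy as np

rng = np.random.default_rng(3)

def gibbs_sweep(Y, mu, pi, K):
    """One Gibbs sweep: Y is (n, d) data, mu is (K, d) means, pi is (K,)."""
    n, d = Y.shape

    # Update delta: q_k proportional to phi(Y_j - mu_k) * pi_k, normalized.
    sq = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (n, K)
    logq = -0.5 * sq + np.log(pi)
    q = np.exp(logq - logq.max(axis=1, keepdims=True))
    q /= q.sum(axis=1, keepdims=True)
    delta = np.array([rng.choice(K, p=row) for row in q])

    # Update mu_k: normal with mean the class average and variance 1/n_k;
    # empty components (flat conditional) are simply left unchanged.
    nk = np.bincount(delta, minlength=K)
    for k in range(K):
        if nk[k] > 0:
            mu[k] = Y[delta == k].mean(axis=0) + rng.standard_normal(d) / np.sqrt(nk[k])

    # Update pi: Dirichlet draw via Z_k ~ Gamma(n_k + 1, 1), pi_k = Z_k / sum(Z).
    Z = rng.gamma(nk + 1.0)
    return mu, Z / Z.sum(), delta

# Illustrative usage with arbitrary initial values:
Y = rng.standard_normal((200, 10))
mu, pi = rng.standard_normal((3, 10)), np.ones(3) / 3
for sweep in range(100):
    mu, pi, delta = gibbs_sweep(Y, mu, pi, K=3)
```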
[Figure 2: Gibbs sample paths for the probability parameters $\pi_1, \pi_2, \pi_3$ in the Gaussian mixture model with $K = 3$ components, $d = 10$ dimensions for $Y$, and $n = 200$ observations. The horizontal lines are the true values.]

Spatial processes: Suppose we have a random 0/1 array with entries $Y_{ij} \in \{0, 1\}$, and we specify the conditional probabilities as
$$P(Y_{ij} = 1 \mid Y_{i+1,j}, Y_{i-1,j}, Y_{i,j+1}, Y_{i,j-1}) = \epsilon + \frac{\tau - \epsilon}{4}\,(Y_{i+1,j} + Y_{i-1,j} + Y_{i,j+1} + Y_{i,j-1}),$$
where $0 \le \epsilon \le \tau \le 1$. It is a fact that this family of conditional distributions is associated with a unique joint distribution on the array. The conditional probability of observing a 1 given the neighbors ranges from $\epsilon$ to $\tau$, and increases as more neighbors have 1's. Larger values of $\tau - \epsilon$ produce greater spatial autocorrelation (the tendency of 1's to cluster spatially in the array). We can simulate from this distribution using the Gibbs sampler.
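A sketch of the Gibbs sampler for this array model (treating sites off the edge of the array as 0 is one possible boundary convention, chosen here for simplicity; the array size and parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

def gibbs_spatial(shape=(64, 64), eps=0.05, tau=0.95, n_sweeps=200):
    """Resample each site from its conditional probability
    eps + (tau - eps)/4 * (sum of the four neighbors)."""
    nr, nc = shape
    Y = rng.integers(0, 2, size=shape)   # arbitrary 0/1 starting array
    for _ in range(n_sweeps):
        for i in range(nr):
            for j in range(nc):
                s = ((Y[i - 1, j] if i > 0 else 0)
                     + (Y[i + 1, j] if i < nr - 1 else 0)
                     + (Y[i, j - 1] if j > 0 else 0)
                     + (Y[i, j + 1] if j < nc - 1 else 0))
                Y[i, j] = rng.uniform() < eps + (tau - eps) / 4.0 * s
    return Y

Y = gibbs_spatial()
print(Y.mean())  # overall fraction of 1's in the simulated array
```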
More information