Adaptive Monte Carlo methods
Jean-Michel Marin, Projet Select, INRIA Futurs, Université Paris-Sud

Joint work with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert (Université Paris Dauphine)

Séminaire Montpellier (17 décembre 2006)
Introduction

Let $(\mathsf{X}, \mathcal{B}(\mathsf{X}), \Pi)$ be a probability space.

(A1) $\Pi \ll \mu$ and $\Pi(dx) = \pi(x)\,\mu(dx)$.

(A2) $\pi$ is known only up to a normalizing constant:
$$\pi(x) = \frac{\tilde\pi(x)}{\int \tilde\pi(x)\,\mu(dx)}\,;$$
$\tilde\pi$ is known, but the calculation of $\int \tilde\pi(x)\,\mu(dx) < \infty$ is intractable.
Problem: for some $\Pi$-measurable functions $h$, approximate
$$\Pi(h) = \int h(x)\,\pi(x)\,\mu(dx) = \frac{\int h(x)\,\tilde\pi(x)\,\mu(dx)}{\int \tilde\pi(x)\,\mu(dx)}.$$

(A3) The calculation of $\int h(x)\,\tilde\pi(x)\,\mu(dx)$ is intractable.

More concisely, $X \sim \Pi$ and we would like to approximate (taking $\mu(dx) = dx$)
$$\Pi(h) = \mathbb{E}_\Pi(h(X)) = \frac{\int h(x)\,\tilde\pi(x)\,dx}{\int \tilde\pi(x)\,dx}.$$
Applications in Bayesian inference, where the target distribution is the posterior distribution of the parameter of interest:
$$\pi(\theta \mid x) \propto f(x \mid \theta)\,\pi_1(\theta),$$
where $f(x \mid \theta)$ ($\theta \in \Theta$) is the likelihood and $\pi_1(\theta)$ the prior distribution of $\theta$. A Bayesian estimator of $\theta$ is the posterior mean of $\theta$, that is
$$\mathbb{E}^\pi(\theta \mid x) = \frac{\int \theta\, f(x \mid \theta)\,\pi_1(\theta)\,d\theta}{\int f(x \mid \theta)\,\pi_1(\theta)\,d\theta}.$$
Monte Carlo methods (MC) = generate an iid sample $x_1,\ldots,x_N$ from $\Pi$ and estimate $\mathbb{E}_\Pi(h(X))$ by
$$\hat\Pi_N^{MC}(h) = N^{-1} \sum_{i=1}^N h(x_i).$$

(i) $\hat\Pi_N^{MC}(h) \xrightarrow{a.s.} \mathbb{E}_\Pi(h(X))$;

(ii) if $\mathbb{E}_\Pi(h^2(X)) < \infty$,
$$\sqrt{N}\left(\hat\Pi_N^{MC}(h) - \mathbb{E}_\Pi(h(X))\right) \xrightarrow{\mathcal{L}} \mathcal{N}\!\left(0,\, \mathbb{V}_\Pi(h(X))\right).$$

Often impossible to simulate directly from $\Pi$!
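The plain Monte Carlo estimator above can be sketched in a few lines. As an illustration not taken from the talk, the target is $\mathcal{N}(0,1)$ and $h(x) = x^2$, so the true value $\mathbb{E}_\Pi(h(X)) = 1$ is known:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(h, n):
    """Plain MC: average h over an iid N(0,1) sample of size n."""
    x = rng.standard_normal(n)
    return h(x).mean()

est = mc_estimate(lambda x: x**2, 100_000)
print(est)  # close to E[X^2] = 1, with O(1/sqrt(N)) error
```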
Markov Chain Monte Carlo methods (MCMC) = generate $x^{(1)},\ldots,x^{(T)}$ from a Markov chain $(x^{(t)})_{t \in \mathbb{N}}$ with stationary distribution $\Pi$ and estimate $\mathbb{E}_\Pi(h(X))$ by
$$\hat\Pi_N^{MCMC}(h) = N^{-1} \sum_{t=T-N+1}^{T} h\!\left(x^{(t)}\right).$$

Convergence to the stationary distribution could be very slow!
Metropolis-Hastings algorithms

Metropolis-Hastings algorithms are generic (or off-the-shelf) MCMC algorithms, compared with the Gibbs sampler, in the sense that they can be tuned with a much wider range of possibilities.
If the target distribution has density $\pi$, the generic Metropolis-Hastings algorithm is:

Initialization: choose an arbitrary $x^{(0)}$.

Iteration $t$:
1. Given $x^{(t-1)}$, generate $\tilde x \sim q(x^{(t-1)}, \tilde x)$.
2. Calculate
$$\rho(x^{(t-1)}, \tilde x) = \min\left( \frac{\pi(\tilde x)\,/\,q(x^{(t-1)}, \tilde x)}{\pi(x^{(t-1)})\,/\,q(\tilde x, x^{(t-1)})},\ 1 \right).$$
3. With probability $\rho(x^{(t-1)}, \tilde x)$, accept $\tilde x$ and set $x^{(t)} = \tilde x$; otherwise reject $\tilde x$ and set $x^{(t)} = x^{(t-1)}$.
This algorithm only needs to simulate from $q$, which we can choose almost arbitrarily, as long as $q$ is capable of reaching all areas of positive probability under $\pi$. While the theoretical convergence guarantees are quite general, the choice of $q$ remains paramount in practice.
The random walk sampler

A random walk proposal has a symmetric transition density $q(x, y) = q_{RW}(y - x)$, where $q_{RW}(x) = q_{RW}(-x)$. In this case the acceptance probability $\rho(x, y)$ reduces to the simpler form
$$\rho(x, y) = \min\left( 1,\ \frac{\pi(y)}{\pi(x)} \right).$$
Example: consider the standard normal distribution $\mathcal{N}(0, 1)$ as a target. If we use a random walk Metropolis-Hastings algorithm with a normal random walk, i.e.
$$\tilde x \mid x^{(t-1)} \sim \mathcal{N}(x^{(t-1)}, \sigma^2), \qquad q_{RW}(\tilde x - x^{(t-1)}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2}\,(\tilde x - x^{(t-1)})^2 \right),$$
the performance of the sampler depends on the value of $\sigma^2$.
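A minimal random-walk Metropolis-Hastings sketch of this example (target $\mathcal{N}(0,1)$, Gaussian random walk); the function name and the choice $\sigma = 2$ are illustrative, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

def rw_metropolis(log_target, x0, sigma, n_iter):
    """Random-walk Metropolis: the proposal is symmetric, so the
    acceptance probability reduces to min(1, pi(y)/pi(x))."""
    x = x0
    chain = np.empty(n_iter)
    accepted = 0
    for t in range(n_iter):
        y = x + sigma * rng.standard_normal()
        # accept with probability min(1, pi(y)/pi(x)), computed in log scale
        if np.log(rng.random()) < log_target(y) - log_target(x):
            x, accepted = y, accepted + 1
        chain[t] = x
    return chain, accepted / n_iter

log_pi = lambda x: -0.5 * x**2            # standard normal, up to a constant
chain, acc = rw_metropolis(log_pi, 0.0, sigma=2.0, n_iter=20_000)
print(acc)                                # moderate acceptance for this scale
print(chain[5000:].mean(), chain[5000:].var())  # near 0 and 1
```

A too-small $\sigma$ drives the acceptance rate toward 1 but the chain barely moves; a too-large $\sigma$ drives it toward 0 and the chain sticks, which is what Figures 1 and 2 illustrate.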
Figure 1: (left) $\sigma^2 = 10^{4}$ and (right) $\sigma^2 = 10^{3}$; top: sequence of 10,000 iterations subsampled at every 10th iteration; middle: histogram of the last 2,000 iterations compared with the target density; bottom: empirical autocorrelations.
Figure 2: $\sigma^2 = 2$; top: sequence of 10,000 iterations subsampled at every 10th iteration; middle: histogram of the last 2,000 iterations compared with the target density; bottom: empirical autocorrelations.
Importance sampling

Let $Q$ be a probability distribution on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$. Suppose that $Q(dx) = q(x)\,dx$ and that $q(x) = 0 \Rightarrow \pi(x) = 0$. Then
$$\Pi(h) = \mathbb{E}_\Pi(h(X)) = \int h(x)\,\frac{\pi(x)}{q(x)}\,q(x)\,dx = \mathbb{E}_Q\!\left( \frac{\pi(X)}{q(X)}\, h(X) \right) = Q\!\left( \frac{\pi}{q}\, h \right).$$

= Generate an iid sample $x_1,\ldots,x_N$ from $Q$, called the proposal distribution, and estimate $\Pi(h)$ by
$$\hat\Pi_{Q,N}^{IS}(h) = N^{-1} \sum_{i=1}^N \frac{\pi(x_i)}{q(x_i)}\, h(x_i).$$
(i) $\hat\Pi_{Q,N}^{IS}(h) \xrightarrow{a.s.} \mathbb{E}_\Pi(h(X))$;

(ii) if $\mathbb{E}_Q\!\left( \frac{\pi^2(X)}{q^2(X)}\, h^2(X) \right) < \infty$,
$$\sqrt{N}\left(\hat\Pi_{Q,N}^{IS}(h) - \mathbb{E}_\Pi(h(X))\right) \xrightarrow{\mathcal{L}} \mathcal{N}\!\left( 0,\ \mathbb{V}_Q\!\left( \frac{\pi(X)}{q(X)}\, h(X) \right) \right).$$

For many $h$, a sufficient condition for $\mathbb{E}_Q\!\left( \frac{\pi^2(X)}{q^2(X)}\, h^2(X) \right) < \infty$ is that $\pi/q$ is bounded.

When the normalizing constant of $\Pi$ is unknown, it is not possible to use $\hat\Pi_{Q,N}^{IS}$. It is then natural to use the self-normalized version of the IS estimator,
$$\hat\Pi_{Q,N}^{SNIS}(h) = \left( \sum_{i=1}^N \frac{\tilde\pi(x_i)}{q(x_i)} \right)^{-1} \sum_{i=1}^N \frac{\tilde\pi(x_i)}{q(x_i)}\, h(x_i).$$
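The IS and SNIS estimators can be sketched as follows; the target $\mathcal{N}(0,1)$, the proposal $Q = \mathcal{N}(0, 2^2)$ (heavier tails, so $\pi/q$ is bounded) and the test function $h(x) = x^2$ are illustrative choices, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(2)

pi = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)  # normalised target
pi_tilde = lambda x: np.exp(-0.5 * x**2)                 # unnormalised version
q_sd = 2.0
q = lambda x: np.exp(-0.5 * (x / q_sd)**2) / (q_sd * np.sqrt(2 * np.pi))

n = 200_000
x = q_sd * rng.standard_normal(n)        # iid sample from the proposal Q
h = x**2                                 # E_Pi[h(X)] = 1

is_est = np.mean(pi(x) / q(x) * h)       # plain IS: needs the normalised pi
w = pi_tilde(x) / q(x)                   # only pi_tilde available in practice
snis_est = np.sum(w * h) / np.sum(w)     # self-normalised IS estimator
print(is_est, snis_est)                  # both close to 1
```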
(i) $\hat\Pi_{Q,N}^{SNIS}(h) \xrightarrow{a.s.} \mathbb{E}_\Pi(h(X))$;

(ii) if $\mathbb{E}_Q\!\left( \frac{\pi^2(X)}{q^2(X)}\left(1 + h^2(X)\right) \right) < \infty$,
$$\sqrt{N}\left(\hat\Pi_{Q,N}^{SNIS}(h) - \mathbb{E}_\Pi(h(X))\right) \xrightarrow{\mathcal{L}} \mathcal{N}\!\left( 0,\ \mathbb{V}_Q\!\left( \frac{\pi(X)}{q(X)}\left(h(X) - \Pi(h)\right) \right) \right).$$

The quality of the SNIS approximation depends on the choice of the proposal distribution $Q$.
It is well known that the importance distribution
$$q^\star(x) = |h(x)|\,\pi(x) \Big/ \int |h(y)|\,\pi(y)\,dy$$
minimizes the variance of $\hat\Pi_{Q,N}^{IS}(h)$. It produces a zero-variance estimator when $h$ is either positive or negative (indeed, in both cases, $\hat\Pi_{Q^\star,N}^{IS}(h) = \mathbb{E}_\Pi(h(X))$ exactly). $q^\star$ cannot be used in practice because it depends on the integral $\int |h(y)|\,\pi(y)\,dy$. This result is thus rather understood as providing a goal for choosing an importance function $q$ tailored to the approximation of $\mathbb{E}_\Pi(h(X))$.
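A quick numerical check of the zero-variance property (an illustration, not from the slides): take the target $\mathcal{N}(0,1)$ and $h(x) = |x| \ge 0$; then $q^\star(x) = |x|\,\varphi(x)/\mathbb{E}|X|$, which can be simulated as a Rayleigh(1) magnitude with a uniform random sign, and every weighted term $h(x)\,\pi(x)/q^\star(x)$ equals $\mathbb{E}|X| = \sqrt{2/\pi}$:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 10_000
r = rng.rayleigh(scale=1.0, size=n)        # |x| ~ Rayleigh(1), density x*exp(-x^2/2)
s = rng.choice([-1.0, 1.0], size=n)        # uniform random sign
x = s * r                                  # x ~ q*(x) = |x| phi(x) / E|X|

phi = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
e_abs = np.sqrt(2 / np.pi)                 # E|X| under N(0,1)
q_star = lambda x: np.abs(x) * phi(x) / e_abs

terms = np.abs(x) * phi(x) / q_star(x)     # h(x) * pi(x) / q*(x)
print(terms.mean(), terms.var())           # mean = sqrt(2/pi), variance ~ 0
```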
Similarly,
$$q^\star(x) = |h(x) - \Pi(h)|\,\pi(x) \Big/ \int |h(y) - \mathbb{E}_\Pi(h(X))|\,\pi(y)\,dy$$
minimizes the asymptotic variance of $\hat\Pi_{Q,N}^{SNIS}(h)$. This second optimum is not available either, because it still depends on $\mathbb{E}_\Pi(h(X))$. There is little in the literature besides general recommendations that the support of $q$ should be the support of $|h(x)|\,\pi(x)$ or of $|h(y) - \Pi(h)|\,\pi(y)$, or yet that the tails of $q$ should be at least as thick as those of $|h(x)|\,\pi(x)$.
PMC algorithms

The notion of importance sampling can actually be greatly generalized to encompass much more adaptive and local schemes than previously thought. The extension is to learn from experience, that is, to build an importance sampling function based on the performance of earlier importance sampling proposals. By introducing a temporal dimension to the selection of the importance function, an adaptive perspective can be achieved at little cost, for a potentially large gain in efficiency.
D-kernel PMC algorithm

Let $Q_{i,t}$ be the proposal distribution at iteration $t$ of the algorithm for particle $x_{i,t}$. Obviously, the quasi-total freedom in the construction of the $Q_{i,t}$'s has drawbacks, namely that some proposals do not necessarily lead to improvements in terms of variance reduction. We now restrict the family of proposals from which to select the new $Q_{i,t}$'s to mixtures of fixed proposals.
We assume from now on that we use in parallel $D$ fixed kernels $Q_d(\cdot, \cdot)$ with densities $q_d$, and that the proposal is a mixture of those kernels,
$$q_{i,t}(x) = \sum_{d=1}^D \alpha_d^{t,N}\, q_d(x_{i,t-1}, x), \qquad \sum_d \alpha_d^{t,N} = 1,$$
where the weights $\alpha_d^{t,N} > 0$ can be modified at each iteration. The amount of adaptivity we allow in this version of PMC is thus restricted to a possible modification of the weights $\alpha_d^{t,N}$.
The importance weight associated with this mixture proposal is
$$\pi(\tilde x_{i,t}) \Big/ \sum_{d=1}^D \alpha_d^{t,N}\, q_d(x_{i,t-1}, \tilde x_{i,t}),$$
while simulation from $q_{i,t}$ can be decomposed into the two usual mixture steps: first pick the component $d$, then simulate from the corresponding kernel $Q_d$.
Generic D-kernel PMC algorithm

At time 0, produce the sample $(\tilde x_{i,0})_{1 \le i \le N}$ and set $\alpha_d^{1,N} = 1/D$ ($1 \le d \le D$).

At time $1 \le t \le T$:

a) Conditionally on the $\alpha_d^{t,N}$'s, generate $(K_{i,t})_{1 \le i \le N} \overset{iid}{\sim} \mathcal{M}\!\left(1, (\alpha_d^{t,N})_{1 \le d \le D}\right)$;

b) Conditionally on $(x_{i,t-1}, K_{i,t})_{1 \le i \le N}$, generate independently $(\tilde x_{i,t})_{1 \le i \le N} \sim Q_{K_{i,t}}(x_{i,t-1}, \cdot)$ and set
$$\omega_{i,t} = \pi(\tilde x_{i,t}) \Big/ \sum_{d=1}^D \alpha_d^{t,N}\, q_d(x_{i,t-1}, \tilde x_{i,t});$$

c) Conditionally on $(x_{i,t-1}, K_{i,t}, \tilde x_{i,t})_{1 \le i \le N}$, generate $(J_{i,t})_{1 \le i \le N} \overset{iid}{\sim} \mathcal{M}\!\left(1, (\bar\omega_{i,t})_{1 \le i \le N}\right)$ (with $\bar\omega_{i,t}$ the normalised weights), set $x_{i,t} = \tilde x_{J_{i,t},t}$ and
$$\alpha_d^{t+1,N} = \Psi_d\!\left( (x_{i,t-1}, \tilde x_{i,t}, K_{i,t})_{1 \le i \le N} \right) \quad \text{such that} \quad \sum_{d=1}^D \alpha_d^{t+1,N} = 1.$$
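A compact sketch of one run of this algorithm, assuming $D$ Gaussian random-walk kernels, a standard normal target, and the Rao-Blackwellised weight update $\alpha_d^{t+1,N} = \sum_i \bar\omega_{i,t}\,\mathbb{I}_d(K_{i,t})$ of Theorem 3 below; all function names, scales and sizes are illustrative, not from the talk's implementation:

```python
import numpy as np

rng = np.random.default_rng(4)

def pmc_dkernel(log_pi_tilde, scales, n_particles, n_iter):
    """Generic D-kernel PMC with Gaussian random-walk kernels (a sketch)."""
    D = len(scales)
    alpha = np.full(D, 1.0 / D)                  # alpha_d^{1,N} = 1/D
    x = rng.standard_normal(n_particles)         # initial particle cloud
    for _ in range(n_iter):
        # a) pick a kernel index K_i for each particle
        k = rng.choice(D, size=n_particles, p=alpha)
        # b) propose through the selected kernel ...
        x_new = x + scales[k] * rng.standard_normal(n_particles)
        # ... and weight by pi / (mixture density), then normalise
        q_mix = sum(alpha[d] * np.exp(-0.5 * ((x_new - x) / s) ** 2)
                    / (s * np.sqrt(2 * np.pi)) for d, s in enumerate(scales))
        w = np.exp(log_pi_tilde(x_new)) / q_mix
        w /= w.sum()
        # update the mixture weights: alpha_d <- sum_i w_i 1{K_i = d}
        alpha = np.array([w[k == d].sum() for d in range(D)])
        alpha = np.clip(alpha, 1e-3, None)       # keep every kernel alive
        alpha /= alpha.sum()
        # c) multinomial resampling of the particles
        x = x_new[rng.choice(n_particles, size=n_particles, p=w)]
    return x, alpha

log_pi = lambda x: -0.5 * x**2                   # N(0,1) up to a constant
x, alpha = pmc_dkernel(log_pi, scales=np.array([0.1, 2.0, 10.0]),
                       n_particles=5_000, n_iter=10)
print(alpha)               # adapted mixture weights
print(x.mean(), x.var())   # the resampled cloud approximates N(0, 1)
```

With these scales the weights typically concentrate on the better-calibrated kernel, mirroring the second toy example later in the talk.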
$\Psi_d$ ($1 \le d \le D$) denotes an update function that depends upon the past iteration.

(A1) $\forall d \in \{1,\ldots,D\}$, $\Pi \otimes \Pi\{(x, x') : q_d(x, x') = 0\} = 0$ (the individual kernel importance weights are almost surely finite).
Theorem 1. Under (A1) and (A2), for any function $h \in L^1_\pi$ and for all $t \ge 0$, both the unnormalised and the self-normalized PMC estimators are convergent:
$$\hat\Pi_{t,N}^{PMC}(h) = \frac{1}{N} \sum_{i=1}^N \omega_{i,t}\, h(\tilde x_{i,t}) \xrightarrow[N \to \infty]{P} \mathbb{E}_\Pi(h(X)),$$
$$\hat\Pi_{t,N}^{SNPMC}(h) = \left( \sum_{i=1}^N \omega_{i,t} \right)^{-1} \sum_{i=1}^N \omega_{i,t}\, h(\tilde x_{i,t}) \xrightarrow[N \to \infty]{P} \mathbb{E}_\Pi(h(X)).$$

As noted earlier, the unnormalised PMC estimator can only be used when $\pi$ is completely known.
(A2) $\Pi \otimes \Pi\!\left( (1 + h^2(x'))\, \dfrac{\pi(x')}{q_d(x, x')} \right) < \infty$ for all $d \in \{1,\ldots,D\}$ (integrability condition).

Theorem 2. Under (A1) and (A2), if for all $t \ge 1$ and $1 \le d \le D$, $\alpha_d^{t,N} \xrightarrow[N \to \infty]{P} \alpha_d^t > 0$, then both
$$\sqrt{N}\left( \left(\sum_{i=1}^N \omega_{i,t}\right)^{-1} \sum_{i=1}^N \omega_{i,t}\, h(\tilde x_{i,t}) - \mathbb{E}_\Pi(h(X)) \right) \quad \text{and} \quad \sqrt{N}\left( \frac{1}{N} \sum_{i=1}^N \omega_{i,t}\, h(\tilde x_{i,t}) - \mathbb{E}_\Pi(h(X)) \right)$$
converge in distribution, as $N$ goes to infinity, to centered normal distributions with the variances given below.
$$\sigma_{1,t}^2 = \Pi \otimes \Pi\left( \left( h(x') - \mathbb{E}_\Pi(h(X)) \right)^2 \frac{\pi(x')}{\sum_{d=1}^D \alpha_d^t\, q_d(x, x')} \right)$$
and
$$\sigma_{2,t}^2 = \Pi \otimes \Pi\left( \left( \frac{\pi(x')}{\sum_{d=1}^D \alpha_d^t\, q_d(x, x')}\, h(x') - \mathbb{E}_\Pi(h(X)) \right)^2 \frac{\sum_{d=1}^D \alpha_d^t\, q_d(x, x')}{\pi(x')} \right).$$
A first Kullback-Leibler criterion

Let
$$\mathcal{S} = \left\{ \alpha = (\alpha_1,\ldots,\alpha_D)\ ;\ \forall d \in \{1,\ldots,D\},\ \alpha_d \ge 0 \ \text{and} \ \sum_{d=1}^D \alpha_d = 1 \right\}.$$
For $\alpha \in \mathcal{S}$, let us denote by $KL_1(\alpha)$ the Kullback-Leibler divergence between the mixture and the target distribution $\Pi$:
$$KL_1(\alpha) = \int \log\left( \frac{\pi(x)\,\pi(x')}{\pi(x)\, \sum_{d=1}^D \alpha_d\, q_d(x, x')} \right) \Pi \otimes \Pi(dx, dx').$$

First Kullback-Leibler divergence criterion: the best mixture of transition kernels is the one that minimizes $KL_1(\alpha)$.
Theorem 3. Under (A1) and (A2), for the unnormalised and the self-normalised cases, the updates $\Psi_d$ of the mixture weights given by
$$\alpha_d^{t+1,N} = \sum_{i=1}^N \bar\omega_{i,t}\, \mathbb{I}_d(K_{i,t})$$
(with $\bar\omega_{i,t}$ the normalised importance weights) guarantee a systematic decrease of $KL_1$, a long-term run of the algorithm providing the mixture that is $KL_1$-closest to the target.
A first toy example

Target: $\pi(x) = \tfrac{1}{3} f_{\mathcal{N}(-1,\, 0.1)}(x) + \tfrac{1}{3} f_{\mathcal{N}(0,\, 1)}(x) + \tfrac{1}{3} f_{\mathcal{N}(3,\, 10)}(x)$.

Three proposal distributions: $\mathcal{N}(-1, 0.1)$, $\mathcal{N}(0, 1)$ and $\mathcal{N}(3, 10)$ (simpler than transition kernels).

Use of the Rao-Blackwellised 3-kernel algorithm with $N = 100{,}000$.
Table 1: Evolution of the proposal mixture weights over the PMC iterations.
A second toy example

Target: $\Pi = \mathcal{N}(0, 1)$.

Three Gaussian random walk proposals: $q_1(x, x') = f_{\mathcal{N}(x,\, 0.1)}(x')$, $q_2(x, x') = f_{\mathcal{N}(x,\, 2)}(x')$ and $q_3(x, x') = f_{\mathcal{N}(x,\, 10)}(x')$.

Use of the Rao-Blackwellised 3-kernel algorithm with $N = 100{,}000$.
Table 2: Evolution of the proposal mixture weights over the PMC iterations.
A second Kullback-Leibler criterion in the unnormalised case

For $\alpha \in \mathcal{S}$, let us denote by $KL_2(\alpha)$ the Kullback-Leibler divergence between the mixture and $|h(x)|\,\pi(x)$:
$$KL_2(\alpha) = \int \log\left( \frac{\pi(x)\, |h(x')|\, \pi(x')}{\pi(x)\, \sum_{d=1}^D \alpha_d\, q_d(x, x')} \right) \Pi \otimes \Pi(dx, dx').$$

Second Kullback-Leibler divergence criterion: the best mixture of transition kernels is the one that minimizes $KL_2(\alpha)$.
Theorem 4. Under (A1), for the unnormalised case, the updates $\Psi_d$ of the mixture weights given by
$$\alpha_d^{t+1,N} = \sum_{i=1}^N \omega_{i,t}\, |h(\tilde x_{i,t})|\, \mathbb{I}_d(K_{i,t}) \Big/ \sum_{i=1}^N \omega_{i,t}\, |h(\tilde x_{i,t})|$$
guarantee a systematic decrease of $KL_2$, a long-term run of the algorithm providing the mixture that is $KL_2$-closest to the target.
Asymptotic variance criterion

For $\alpha \in \mathcal{S}$, let us define
$$\sigma_1^2(\alpha) = \Pi \otimes \Pi\left( \left( h(x') - \Pi(h) \right)^2 \frac{\pi(x')}{\sum_{d=1}^D \alpha_d\, q_d(x, x')} \right) \quad \text{(self-normalised case)}$$
and
$$\sigma_2^2(\alpha) = \Pi \otimes \Pi\left( \left( \frac{\pi(x')}{\sum_{d=1}^D \alpha_d\, q_d(x, x')}\, h(x') - \Pi(h) \right)^2 \frac{\sum_{d=1}^D \alpha_d\, q_d(x, x')}{\pi(x')} \right) \quad \text{(unnormalised case)}.$$

Asymptotic variance criterion: the best mixture of transition kernels is the one that minimizes $\sigma_1^2(\alpha)$ or $\sigma_2^2(\alpha)$.
Theorem 5. Under (A1) and (A2), for the unnormalised case, the updates $\Psi_d$ of the mixture weights given by
$$\alpha_d^{t+1,N} = \sum_{i=1}^N \omega_{i,t}^2\, h^2(\tilde x_{i,t})\, \mathbb{I}_d(K_{i,t}) \Big/ \sum_{i=1}^N \omega_{i,t}^2\, h^2(\tilde x_{i,t})$$
guarantee a systematic decrease of $\sigma_2^2$, a long-term run of the algorithm providing the mixture that minimizes $\sigma_2^2$.
Theorem 6. Under (A1) and (A2), for the self-normalised case, the updates $\Psi_d$ of the mixture weights given by
$$\alpha_d^{t+1,N} = \frac{\sum_{i=1}^N \omega_{i,t}^2 \left( h(\tilde x_{i,t}) - \sum_{j=1}^N \bar\omega_{j,t}\, h(\tilde x_{j,t}) \right)^2 \mathbb{I}_d(K_{i,t})}{\sum_{i=1}^N \omega_{i,t}^2 \left( h(\tilde x_{i,t}) - \sum_{j=1}^N \bar\omega_{j,t}\, h(\tilde x_{j,t}) \right)^2}$$
guarantee a systematic decrease of $\sigma_1^2$, a long-term run of the algorithm providing the mixture that minimizes $\sigma_1^2$.
A final example

Target: $\mathcal{N}(0, 1)$ and $h(x) = x$. In this case, it is well known that the optimal importance distribution minimising the variance of the unnormalised importance sampling estimator is
$$g^\star(x) \propto |x| \exp(-x^2/2).$$
We choose $g^\star$ as one of $D = 3$ independent kernels, the other kernels being the $\mathcal{N}(0, 1)$ and the $\mathcal{C}(0, 1)$ (Cauchy) distributions. $N = 100{,}000$ and $T = 20$.
Table: evolution over the iterations $t$ of the PMC estimate $\hat\pi_{t,N}^{PMC}(x)$, the mixture weights $\alpha_1^{t+1,N}$, $\alpha_2^{t+1,N}$, $\alpha_3^{t+1,N}$ and the standard deviation $\sigma_{2,t}$.
Figure 3: Estimation of $\mathbb{E}[X] = 0$ for a normal variate: decrease of the standard deviation to its optimal value.
6 Dependent data The AR(p) model The MA(q) model Hidden Markov models Dependent data Dependent data Huge portion of real-life data involving dependent datapoints Example (Capture-recapture) capture histories
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationGradient-based Monte Carlo sampling methods
Gradient-based Monte Carlo sampling methods Johannes von Lindheim 31. May 016 Abstract Notes for a 90-minute presentation on gradient-based Monte Carlo sampling methods for the Uncertainty Quantification
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.
Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Introduction to Markov chain Monte Carlo The Gibbs Sampler Examples Overview of the Lecture
More informationStatistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling
1 / 27 Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 27 Monte Carlo Integration The big question : Evaluate E p(z) [f(z)]
More informationExpectation Propagation for Approximate Bayesian Inference
Expectation Propagation for Approximate Bayesian Inference José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department February 5, 2007 1/ 24 Bayesian Inference Inference Given
More informationST 740: Markov Chain Monte Carlo
ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:
More informationMarkov Chain Monte Carlo Methods
Markov Chain Monte Carlo Methods John Geweke University of Iowa, USA 2005 Institute on Computational Economics University of Chicago - Argonne National Laboaratories July 22, 2005 The problem p (θ, ω I)
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationMCMC Methods: Gibbs and Metropolis
MCMC Methods: Gibbs and Metropolis Patrick Breheny February 28 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/30 Introduction As we have seen, the ability to sample from the posterior distribution
More informationThe Metropolis-Hastings Algorithm. June 8, 2012
The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings
More informationComputer intensive statistical methods
Lecture 11 Markov Chain Monte Carlo cont. October 6, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university The two stage Gibbs sampler If the conditional distributions are easy to sample
More informationMachine Learning. Probabilistic KNN.
Machine Learning. Mark Girolami girolami@dcs.gla.ac.uk Department of Computing Science University of Glasgow June 21, 2007 p. 1/3 KNN is a remarkably simple algorithm with proven error-rates June 21, 2007
More informationDAG models and Markov Chain Monte Carlo methods a short overview
DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revised on April 24, 2017 Today we are going to learn... 1 Markov Chains
More informationSpatial Statistics Chapter 4 Basics of Bayesian Inference and Computation
Spatial Statistics Chapter 4 Basics of Bayesian Inference and Computation So far we have discussed types of spatial data, some basic modeling frameworks and exploratory techniques. We have not discussed
More informationMarkov Chain Monte Carlo
1 Motivation 1.1 Bayesian Learning Markov Chain Monte Carlo Yale Chang In Bayesian learning, given data X, we make assumptions on the generative process of X by introducing hidden variables Z: p(z): prior
More informationeqr094: Hierarchical MCMC for Bayesian System Reliability
eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167
More informationSC7/SM6 Bayes Methods HT18 Lecturer: Geoff Nicholls Lecture 2: Monte Carlo Methods Notes and Problem sheets are available at http://www.stats.ox.ac.uk/~nicholls/bayesmethods/ and via the MSc weblearn pages.
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll
More informationInference in state-space models with multiple paths from conditional SMC
Inference in state-space models with multiple paths from conditional SMC Sinan Yıldırım (Sabancı) joint work with Christophe Andrieu (Bristol), Arnaud Doucet (Oxford) and Nicolas Chopin (ENSAE) September
More informationAdvances and Applications in Perfect Sampling
and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC
More informationSampling Algorithms for Probabilistic Graphical models
Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Markov chain Monte Carlo (MCMC) Gibbs and Metropolis Hastings Slice sampling Practical details Iain Murray http://iainmurray.net/ Reminder Need to sample large, non-standard distributions:
More informationQuantifying Uncertainty
Sai Ravela M. I. T Last Updated: Spring 2013 1 Markov Chain Monte Carlo Monte Carlo sampling made for large scale problems via Markov Chains Monte Carlo Sampling Rejection Sampling Importance Sampling
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters
More informationMarkov Chain Monte Carlo
Chapter 5 Markov Chain Monte Carlo MCMC is a kind of improvement of the Monte Carlo method By sampling from a Markov chain whose stationary distribution is the desired sampling distributuion, it is possible
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationComputer Practical: Metropolis-Hastings-based MCMC
Computer Practical: Metropolis-Hastings-based MCMC Andrea Arnold and Franz Hamilton North Carolina State University July 30, 2016 A. Arnold / F. Hamilton (NCSU) MH-based MCMC July 30, 2016 1 / 19 Markov
More informationOverlapping block proposals for latent Gaussian Markov random fields
NORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET Overlapping block proposals for latent Gaussian Markov random fields by Ingelin Steinsland and Håvard Rue PREPRINT STATISTICS NO. 8/3 NORWEGIAN UNIVERSITY
More information