MCMC and likelihood-free methods

Christian P. Robert
Université Paris-Dauphine, IUF, & CREST
Université de Besançon, November 22, 2012

Outline
- Computational issues in Bayesian cosmology
- The Metropolis-Hastings Algorithm
- The Gibbs Sampler
- Approximate Bayesian computation

Computational issues in Bayesian cosmology

Statistical problems in cosmology
- Potentially high-dimensional parameter space [not considered here]
- Immensely slow computation of likelihoods, e.g. WMAP, CMB, because of numerically costly spectral transforms [the data is a Fortran program]
- Nonlinear dependence and degeneracies between parameters, introduced by physical constraints or theoretical assumptions

Cosmological data
Posterior distribution of cosmological parameters for recent observational data of CMB anisotropies (differences in temperature between directions) [WMAP], SNIa, and cosmic shear.
Combination of three likelihoods, some of which are available as public (Fortran) code, and of a uniform prior on a hypercube.

Cosmology parameters
Parameters for the cosmology likelihood (C = CMB, S = SNIa, L = lensing)

Symbol | Description | Minimum | Maximum | Experiment
$\Omega_b$ | Baryon density | | | C L
$\Omega_m$ | Total matter density | | | C S L
$w$ | Dark-energy eq. of state | | | C S L
$n_s$ | Primordial spectral index | | | C L
$\Delta^2_R$ | Normalization (large scales) | | | C
$\sigma_8$ | Normalization (small scales) | | | C L
$h$ | Hubble constant | | | C L
$\tau$ | Optical depth | | | C
$M$ | Absolute SNIa magnitude | | | S
$\alpha$ | Colour response | | | S
$\beta$ | Stretch response | | | S
$a$ | Galaxy z-distribution fit | | | L
$b$ | Galaxy z-distribution fit | | | L
$c$ | Galaxy z-distribution fit | | | L

For WMAP5, $\sigma_8$ is a deduced quantity that depends on the other parameters.

Adaptation of importance function
[Benabed et al., MNRAS, 2010]

Estimates
Table of parameter estimates by PMC and MCMC for $\Omega_b$, $\Omega_m$, $\tau$, $w$, $n_s$, $\Delta^2_R$, $h$, $a$, $b$, $c$, $M$, $\alpha$, $\beta$ and $\sigma_8$: means and 68% credible intervals using lensing, SNIa and CMB.

Evidence / marginal likelihood / integrated likelihood
Central quantity of interest in (Bayesian) model choice:
\[ E = \int \pi(x)\,dx = \int \frac{\pi(x)}{q(x)}\,q(x)\,dx, \]
expressed as an expectation under any density $q$ with large enough support.
Importance sampling provides a sample $x_1, \ldots, x_N \sim q$ and the approximation of the above integral
\[ E \approx \frac{1}{N} \sum_{n=1}^N w_n, \]
where the $w_n = \pi(x_n)/q(x_n)$ are the (unnormalised) importance weights.
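
As a rough illustration of the importance sampling estimate of the evidence, the R sketch below uses a toy unnormalised target $\pi(x) = \exp(-x^2/2)$, whose evidence is known to be $\sqrt{2\pi}$, and a Student t proposal $q$ with 3 degrees of freedom. The target, the proposal and all variable names are assumptions made for the example; none of them comes from the cosmology application.

```r
# Toy evidence estimate by importance sampling (illustrative assumptions only).
set.seed(1)
N <- 1e5
unnorm_target <- function(x) exp(-x^2 / 2)   # pi(x), known up to a constant
x <- rt(N, df = 3)                           # sample x_1, ..., x_N from q
w <- unnorm_target(x) / dt(x, df = 3)        # unnormalised importance weights
evidence_hat <- mean(w)                      # estimate of E = integral of pi
ess <- sum(w)^2 / sum(w^2)                   # effective sample size diagnostic
```

The t proposal has heavier tails than the target, so the weights stay bounded; a proposal with lighter tails than $\pi$ could give an estimator with infinite variance.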

Back to cosmology questions
Standard cosmology is successful in explaining recent observations, such as CMB, SNIa, galaxy clustering, cosmic shear, galaxy cluster counts, and Lyα forest clustering.
Flat ΛCDM model with only six free parameters ($\Omega_m$, $\Omega_b$, $h$, $n_s$, $\tau$, $\sigma_8$).
Extensions to ΛCDM may be based on independent evidence (massive neutrinos from oscillation experiments), predicted by compelling hypotheses (primordial gravitational waves from inflation), or reflect ignorance about fundamental physics (dynamical dark energy).
Testing for dark energy, curvature, and inflationary models.

Extended models
Focus on the dark-energy equation-of-state parameter, modeled as
- $w = -1$ (ΛCDM)
- $w = w_0$ (wCDM)
- $w = w_0 + w_1(1 - a)$ (w(z)CDM)
In addition, the curvature parameter $\Omega_K$ for each of the above is either $\Omega_K = 0$ ("flat") or $\Omega_K \neq 0$ ("curved").
This choice represents the simplest models beyond a cosmological constant model able to explain the observed, recent accelerated expansion of the Universe.

Cosmology priors
Prior ranges for dark energy and curvature models. In the case of $w(a)$ models, the prior on $w_1$ depends on $w_0$.

Parameter | Description | Min. | Max.
$\Omega_m$ | Total matter density | |
$\Omega_b$ | Baryon density | |
$h$ | Hubble parameter | |
$\Omega_K$ | Curvature | $-1$ | $1$
$w_0$ | Constant dark-energy par. | $-1$ | $-1/3$
$w_1$ | Linear dark-energy par. | $-1 - w_0$ | $(-1/3 - w_0)/(1 - a_{\mathrm{acc}})$

Results
In most cases, evidence in favour of the standard model, especially when more datasets/experiments are combined.
Largest evidence is $\ln B_{12} = 1.8$, for the w(z)CDM model and CMB alone. This is a case where a large part of the prior range is still allowed by the data, and a region of comparable size is excluded. Hence weak evidence that both $w_0$ and $w_1$ are required, but excluded when adding SNIa and BAO datasets.
Results on the curvature are compatible with current findings: non-flat Universe(s) strongly disfavoured for the three dark-energy cases.

Evidence

Posterior outcome
Posterior on dark-energy parameters $w_0$ and $w_1$ as 68% and 95% credible regions for WMAP (solid blue lines), WMAP+SNIa (dashed green) and WMAP+SNIa+BAO (dotted red curves). Allowed prior range as red straight lines.

The Metropolis-Hastings Algorithm

Monte Carlo basics: General purpose
A major computational issue in Bayesian statistics: given a density $\pi$ known up to a normalizing constant (so that only an unnormalised version $\tilde\pi \propto \pi$ is available), and an integrable function $h$, compute
\[ \Pi(h) = \int h(x)\,\pi(x)\,\mu(dx) = \frac{\int h(x)\,\tilde\pi(x)\,\mu(dx)}{\int \tilde\pi(x)\,\mu(dx)} \]
when $\int h(x)\,\tilde\pi(x)\,\mu(dx)$ is intractable.

Monte Carlo basics: Monte Carlo 101
Generate an iid sample $x_1, \ldots, x_N$ from $\pi$ and estimate $\Pi(h)$ by
\[ \hat\Pi^{MC}_N(h) = N^{-1} \sum_{i=1}^N h(x_i). \]
LLN: $\hat\Pi^{MC}_N(h) \xrightarrow{\text{a.s.}} \Pi(h)$.
CLT: if $\Pi(h^2) = \int h^2(x)\,\pi(x)\,\mu(dx) < \infty$,
\[ \sqrt{N}\left( \hat\Pi^{MC}_N(h) - \Pi(h) \right) \xrightarrow{\mathcal{L}} \mathcal{N}\left( 0,\ \Pi\{[h - \Pi(h)]^2\} \right). \]
Caveat leading to MCMC: it is often impossible or inefficient to simulate directly from $\Pi$.
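
A minimal R sketch of the plain Monte Carlo estimate and of the CLT-based error assessment follows. It assumes a case where direct simulation is possible, namely $\pi = \mathcal{N}(0, 1)$ and $h(x) = x^2$, so that $\Pi(h) = 1$; these choices and the variable names are illustrative assumptions.

```r
# Plain Monte Carlo with a CLT-based confidence interval (toy assumptions).
set.seed(2)
N <- 1e4
h <- function(x) x^2
x <- rnorm(N)                     # iid sample from pi = N(0, 1)
hx <- h(x)
mc_est <- mean(hx)                # Monte Carlo estimate of Pi(h)
mc_se <- sd(hx) / sqrt(N)         # estimated standard error from the CLT
ci <- mc_est + c(-1, 1) * qnorm(0.975) * mc_se   # approximate 95% interval
```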

Monte Carlo methods based on Markov chains: Running Monte Carlo via Markov chains (MCMC)
It is not necessary to use a sample from the distribution $f$ to approximate the integral
\[ I = \int h(x)\,f(x)\,dx. \]
[Notation warning: $\pi$ has turned into $f$!]
We can obtain $X_1, \ldots, X_n \sim f$ (approximately) without directly simulating from $f$, using an ergodic Markov chain with stationary distribution $f$.
(Pictured: Andreï Markov)

Running Monte Carlo via Markov chains (2)
Idea: for an arbitrary starting value $x^{(0)}$, an ergodic chain $(X^{(t)})$ is generated using a transition kernel with stationary distribution $f$.
- An irreducible Markov chain with stationary distribution $f$ is ergodic with limiting distribution $f$ under weak conditions, hence convergence in distribution of $(X^{(t)})$ to a random variable from $f$.
- For $T_0$ large enough, $X^{(T_0)}$ is (approximately) distributed from $f$.
- The Markov sequence is a dependent sample $X^{(T_0)}, X^{(T_0+1)}, \ldots$ generated from $f$.
- Birkhoff's ergodic theorem extends the LLN, which is sufficient for most approximation purposes.
Problem: how can one build a Markov chain with a given stationary distribution?

The Metropolis-Hastings algorithm
Arguments: the algorithm uses the objective (target) density $f$ and a conditional density $q(y|x)$ called the instrumental (or proposal) distribution.
(Pictured: Nicholas Metropolis)

The MH algorithm
Algorithm (Metropolis-Hastings). Given $x^{(t)}$,
1. Generate $Y_t \sim q(y|x^{(t)})$.
2. Take
\[ X^{(t+1)} = \begin{cases} Y_t & \text{with prob. } \rho(x^{(t)}, Y_t), \\ x^{(t)} & \text{with prob. } 1 - \rho(x^{(t)}, Y_t), \end{cases} \]
where
\[ \rho(x, y) = \min\left\{ \frac{f(y)}{f(x)}\,\frac{q(x|y)}{q(y|x)},\ 1 \right\}. \]
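
The two steps of the algorithm can be sketched in R as below. This is only an illustration under stated assumptions: the function name `metropolis_hastings` and its arguments are hypothetical, the target is an unnormalised Gamma(3, 1) density, and the proposal is an independent Exp(1/2) draw. Working with log densities is a choice made here for numerical stability.

```r
# Generic Metropolis-Hastings sketch (hypothetical helper, scalar state).
# log_f: log target, known up to an additive constant
# r_prop(x): draws Y ~ q(. | x); log_q(a, b): log q(a | b)
metropolis_hastings <- function(log_f, r_prop, log_q, x0, n_iter) {
  chain <- numeric(n_iter + 1)
  chain[1] <- x0
  n_acc <- 0
  for (t in seq_len(n_iter)) {
    x <- chain[t]
    y <- r_prop(x)
    # log of f(y) q(x | y) / (f(x) q(y | x))
    log_rho <- log_f(y) - log_f(x) + log_q(x, y) - log_q(y, x)
    if (log(runif(1)) < log_rho) {   # accept with probability min(rho, 1)
      chain[t + 1] <- y
      n_acc <- n_acc + 1
    } else {
      chain[t + 1] <- x              # reject: the chain repeats its value
    }
  }
  list(chain = chain, acc_rate = n_acc / n_iter)
}

set.seed(3)
log_f <- function(x) if (x > 0) 2 * log(x) - x else -Inf  # Gamma(3, 1), unnormalised
r_prop <- function(x) rexp(1, rate = 0.5)                 # proposal ignores x
log_q <- function(y, x) dexp(y, rate = 0.5, log = TRUE)
out <- metropolis_hastings(log_f, r_prop, log_q, x0 = 1, n_iter = 20000)
```

With this target the chain average `mean(out$chain)` should be close to the Gamma(3, 1) mean of 3.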

Features
- Independent of normalizing constants for both $f$ and $q(\cdot|x)$ (i.e., those constants independent of $x$)
- Never moves to values with $f(y) = 0$
- The chain $(x^{(t)})_t$ may take the same value several times in a row, even though $f$ is a density wrt Lebesgue measure
- The sequence $(y_t)_t$ is usually not a Markov chain

Convergence properties
1. The M-H Markov chain is reversible, with invariant/stationary density $f$, since it satisfies the detailed balance condition
\[ f(y)\,K(y, x) = f(x)\,K(x, y). \]
2. As $f$ is a probability measure, the chain is positive recurrent.
3. If
\[ \Pr\left[ \frac{f(Y_t)\,q(X^{(t)}|Y_t)}{f(X^{(t)})\,q(Y_t|X^{(t)})} \geq 1 \right] < 1, \qquad (1) \]
that is, the event $\{X^{(t+1)} = X^{(t)}\}$ is possible, then the chain is aperiodic.
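
The detailed balance condition in item 1 can be checked in one line for the off-diagonal part of the kernel, $K(x, y) = q(y|x)\,\rho(x, y)$ for $y \neq x$ (the rejection mass sits at $y = x$, where the identity is trivial). This is the standard argument, written with the $\rho$ of the MH algorithm:

```latex
\begin{aligned}
f(x)\,q(y|x)\,\rho(x,y)
  &= f(x)\,q(y|x)\,\min\left\{\frac{f(y)\,q(x|y)}{f(x)\,q(y|x)},\ 1\right\} \\
  &= \min\bigl\{ f(y)\,q(x|y),\ f(x)\,q(y|x) \bigr\} \\
  &= f(y)\,q(x|y)\,\rho(y,x).
\end{aligned}
```

The middle expression is symmetric in $(x, y)$, which is exactly what reversibility requires.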

Random-walk Metropolis-Hastings algorithms: Random walk Metropolis-Hastings
Use of a local perturbation as proposal,
\[ Y_t = X^{(t)} + \varepsilon_t, \]
where $\varepsilon_t \sim g$, independent of $X^{(t)}$.
The instrumental density is of the form $g(y - x)$ and the Markov chain is a random walk if we take $g$ to be symmetric, $g(x) = g(-x)$.

Random walk Metropolis-Hastings [code]
Algorithm (Random walk Metropolis). Given $x^{(t)}$,
1. Generate $Y_t \sim g(y - x^{(t)})$.
2. Take
\[ X^{(t+1)} = \begin{cases} Y_t & \text{with prob. } \min\left\{ 1, \dfrac{f(Y_t)}{f(x^{(t)})} \right\}, \\ x^{(t)} & \text{otherwise.} \end{cases} \]
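
A possible R rendering of the random walk Metropolis steps is given below, assuming a Gaussian increment density $g$ and, as a toy target, the standard normal density. The function name `rw_metropolis`, the scale value and the target are assumptions for illustration only.

```r
# Random walk Metropolis sketch with Gaussian increments (hypothetical helper).
rw_metropolis <- function(log_f, x0, scale, n_iter) {
  chain <- numeric(n_iter + 1)
  chain[1] <- x0
  n_acc <- 0
  for (t in seq_len(n_iter)) {
    y <- chain[t] + scale * rnorm(1)           # Y_t = x^(t) + eps_t
    # symmetric g: the ratio reduces to f(y) / f(x)
    if (log(runif(1)) < log_f(y) - log_f(chain[t])) {
      chain[t + 1] <- y
      n_acc <- n_acc + 1
    } else {
      chain[t + 1] <- chain[t]
    }
  }
  list(chain = chain, acc_rate = n_acc / n_iter)
}

set.seed(4)
log_f <- function(x) -x^2 / 2                  # N(0, 1), unnormalised
out <- rw_metropolis(log_f, x0 = 0, scale = 2.4, n_iter = 20000)
```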

Extensions: Langevin algorithms
Proposal based on the Langevin diffusion $L_t$, defined by the stochastic differential equation
\[ dL_t = dB_t + \frac{1}{2} \nabla \log f(L_t)\,dt, \]
where $B_t$ is the standard Brownian motion.
Theorem: the Langevin diffusion is the only non-explosive diffusion which is reversible with respect to $f$.

Discretization
Instead, consider the sequence
\[ x^{(t+1)} = x^{(t)} + \frac{\sigma^2}{2} \nabla \log f(x^{(t)}) + \sigma \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}_p(0, I_p), \]
where $\sigma^2$ corresponds to the discretization step.
Unfortunately, the discretized chain may be transient, for instance when
\[ \lim_{x \to \pm\infty} \left| \sigma^2\,\nabla \log f(x)\,|x|^{-1} \right| > 1. \]

MH correction
Accept the new value $Y_t$ with probability
\[ \frac{f(Y_t)}{f(x^{(t)})} \cdot \frac{\exp\left\{ -\left\| x^{(t)} - Y_t - \frac{\sigma^2}{2} \nabla \log f(Y_t) \right\|^2 / 2\sigma^2 \right\}}{\exp\left\{ -\left\| Y_t - x^{(t)} - \frac{\sigma^2}{2} \nabla \log f(x^{(t)}) \right\|^2 / 2\sigma^2 \right\}} \wedge 1. \]
Choice of the scaling factor $\sigma$: should lead to an acceptance rate of 0.574 to achieve optimal convergence rates (when the components of $x$ are uncorrelated).
[Roberts & Rosenthal, 1998; Girolami & Calderhead, 2011]
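
The corrected Langevin scheme can be sketched in R for a scalar state ($p = 1$) as follows. The helper name `mala`, the standard normal toy target and the value $\sigma = 1$ are assumptions made for the example, not settings from the slides.

```r
# Metropolis-adjusted Langevin sketch in dimension one (hypothetical helper).
mala <- function(log_f, grad_log_f, x0, sigma, n_iter) {
  chain <- numeric(n_iter + 1)
  chain[1] <- x0
  n_acc <- 0
  # log q(to | from), up to a constant shared by both directions
  log_q <- function(to, from) {
    -(to - from - sigma^2 / 2 * grad_log_f(from))^2 / (2 * sigma^2)
  }
  for (t in seq_len(n_iter)) {
    x <- chain[t]
    y <- x + sigma^2 / 2 * grad_log_f(x) + sigma * rnorm(1)  # discretized step
    log_rho <- log_f(y) - log_f(x) + log_q(x, y) - log_q(y, x)
    if (log(runif(1)) < log_rho) {
      chain[t + 1] <- y
      n_acc <- n_acc + 1
    } else {
      chain[t + 1] <- x
    }
  }
  list(chain = chain, acc_rate = n_acc / n_iter)
}

set.seed(5)
out <- mala(log_f = function(x) -x^2 / 2,      # N(0, 1), unnormalised
            grad_log_f = function(x) -x,
            x0 = 0, sigma = 1, n_iter = 20000)
```

In practice $\sigma$ would be tuned by monitoring `out$acc_rate`.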

Optimizing the acceptance rate
Problem of choosing the transition kernel $q$ from a practical point of view. Most common solutions:
(a) a fully automated algorithm like ARMS; [Gilks & Wild, 1992]
(b) an instrumental density $g$ which approximates $f$, such that $f/g$ is bounded for uniform ergodicity to apply;
(c) a random walk.
In both cases (b) and (c), the choice of $g$ is critical.

Case of the random walk
Different approach to acceptance rates: a high acceptance rate does not indicate that the algorithm is moving correctly, since it may indicate that the random walk is moving too slowly on the surface of $f$.
If $x^{(t)}$ and $y_t$ are close, i.e. $f(x^{(t)}) \simeq f(y_t)$, then $y_t$ is accepted with probability
\[ \min\left( \frac{f(y_t)}{f(x^{(t)})},\ 1 \right) \simeq 1. \]
For multimodal densities with well-separated modes, the negative effect of limited moves on the surface of $f$ clearly shows.

Case of the random walk (2)
If the average acceptance rate is low, the successive values of $f(y_t)$ tend to be small compared with $f(x^{(t)})$, which means that the random walk moves quickly on the surface of $f$ since it often reaches the borders of the support of $f$.

Rule of thumb
In small dimensions, aim at an average acceptance rate of 50%. In large dimensions, at an average acceptance rate of 25%.
[Gelman, Gilks and Roberts, 1995]
Warning: rule to be taken with a pinch of salt!
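
To see how the acceptance rate reacts to the proposal scale, the R sketch below runs a Gaussian random walk Metropolis chain on a standard normal target for a few scales. The target, the grid of scales and the helper name `acceptance_rate` are assumptions chosen for illustration.

```r
# Empirical acceptance rate of random walk Metropolis as a function of scale.
acceptance_rate <- function(scale, n_iter = 5000) {
  x <- 0
  n_acc <- 0
  for (t in seq_len(n_iter)) {
    y <- x + scale * rnorm(1)
    # log f(y) - log f(x) for the N(0, 1) target
    if (log(runif(1)) < (x^2 - y^2) / 2) {
      x <- y
      n_acc <- n_acc + 1
    }
  }
  n_acc / n_iter
}

set.seed(6)
scales <- c(0.1, 1, 2.4, 10)
rates <- sapply(scales, acceptance_rate)
```

Small scales accept almost every move but explore slowly, while large scales reject most proposals, which is the trade-off behind the rule of thumb.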

Role of scale
Example (Noisy AR(1)): hidden Markov chain from a regular AR(1) model,
\[ x_{t+1} = \varphi x_t + \epsilon_{t+1}, \qquad \epsilon_t \sim \mathcal{N}(0, \tau^2), \]
and observables
\[ y_t \,|\, x_t \sim \mathcal{N}(x_t^2, \sigma^2). \]
The distribution of $x_t$ given $x_{t-1}$, $x_{t+1}$ and $y_t$ is proportional to
\[ \exp\left\{ -\frac{1}{2\tau^2} \left[ (x_t - \varphi x_{t-1})^2 + (x_{t+1} - \varphi x_t)^2 + \frac{\tau^2}{\sigma^2} (y_t - x_t^2)^2 \right] \right\}. \]

Example (Noisy AR(1) continued)
For a Gaussian random walk with scale $\omega$ small enough, the random walk never jumps to the other mode. But if the scale $\omega$ is sufficiently large, the Markov chain explores both modes and gives a satisfactory approximation of the target distribution.
[Figure: Markov chain based on a random walk with scale $\omega = .1$]
[Figure: Markov chain based on a random walk with scale $\omega = .5$]

The Gibbs Sampler

General principles
A very specific simulation algorithm based on the target distribution $f$:
1. Uses the conditional densities $f_1, \ldots, f_p$ from $f$.
2. Start with the random variable $X = (X_1, \ldots, X_p)$.
3. Simulate from the conditional densities,
\[ X_i \,|\, x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p \sim f_i(x_i \,|\, x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p) \]
for $i = 1, 2, \ldots, p$.

Gibbs code
Algorithm (Gibbs sampler). Given $x^{(t)} = (x_1^{(t)}, \ldots, x_p^{(t)})$, generate
1. $X_1^{(t+1)} \sim f_1(x_1 \,|\, x_2^{(t)}, \ldots, x_p^{(t)})$;
2. $X_2^{(t+1)} \sim f_2(x_2 \,|\, x_1^{(t+1)}, x_3^{(t)}, \ldots, x_p^{(t)})$;
...
p. $X_p^{(t+1)} \sim f_p(x_p \,|\, x_1^{(t+1)}, \ldots, x_{p-1}^{(t+1)})$.
Then $X^{(t+1)} \to X \sim f$.
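
For a concrete instance of the scheme with $p = 2$, the R sketch below assumes a bivariate normal target with zero means, unit variances and correlation $\rho$, whose full conditionals are $\mathcal{N}(\rho x_2, 1 - \rho^2)$ and $\mathcal{N}(\rho x_1, 1 - \rho^2)$. The target and the helper name `gibbs_binorm` are illustrative assumptions.

```r
# Two-component Gibbs sampler for a bivariate normal target (toy example).
gibbs_binorm <- function(rho, n_iter, x0 = c(0, 0)) {
  chain <- matrix(NA_real_, nrow = n_iter, ncol = 2)
  x1 <- x0[1]
  x2 <- x0[2]
  s <- sqrt(1 - rho^2)                        # conditional standard deviation
  for (t in seq_len(n_iter)) {
    x1 <- rnorm(1, mean = rho * x2, sd = s)   # X1 | x2 from the previous sweep
    x2 <- rnorm(1, mean = rho * x1, sd = s)   # X2 | x1 just updated
    chain[t, ] <- c(x1, x2)
  }
  chain
}

set.seed(7)
chain <- gibbs_binorm(rho = 0.6, n_iter = 10000)
```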

Properties
The full conditional densities $f_1, \ldots, f_p$ are the only densities used for simulation. Thus, even in a high-dimensional problem, all of the simulations may be univariate.

Toy example: iid $\mathcal{N}(\mu, \sigma^2)$ variates
When $Y_1, \ldots, Y_n \overset{\text{iid}}{\sim} \mathcal{N}(y \,|\, \mu, \sigma^2)$ with both $\mu$ and $\sigma$ unknown, the posterior in $(\mu, \sigma^2)$ is conjugate outside a standard family.
But...
\[ \mu \,|\, Y_{1:n}, \sigma^2 \sim \mathcal{N}\left( \frac{1}{n} \sum_{i=1}^n Y_i,\ \frac{\sigma^2}{n} \right), \qquad \sigma^2 \,|\, Y_{1:n}, \mu \sim \mathcal{IG}\left( \frac{n}{2} - 1,\ \frac{1}{2} \sum_{i=1}^n (Y_i - \mu)^2 \right), \]
assuming constant (improper) priors on both $\mu$ and $\sigma^2$.
Hence we may use the Gibbs sampler for simulating from the posterior of $(\mu, \sigma^2)$.

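Toy example: R code
Gibbs sampler for the Gaussian posterior

    # y: numeric vector of observations (assumed to be already defined)
    n = length(y); S = sum(y); mu = S/n        # start mu at the sample mean
    for (i in 1:500) {
      S2 = sum((y - mu)^2)
      sigma2 = 1/rgamma(1, n/2 - 1, S2/2)      # sigma^2 | y, mu ~ IG(n/2 - 1, S2/2)
      mu = S/n + sqrt(sigma2/n)*rnorm(1)       # mu | y, sigma^2 ~ N(mean(y), sigma^2/n)
    }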

Example of results with n = 10 observations from the $\mathcal{N}(0, 1)$ distribution
[Figure: Gibbs sampler output after 1, 2, 3, 4, 5, 10, 25, 50, 100 and 500 iterations]

Limitations of the Gibbs sampler
Formally, a special case of a sequence of 1-D M-H kernels, all with acceptance rate uniformly equal to 1.
The Gibbs sampler
1. limits the choice of instrumental distributions;
2. requires some knowledge of $f$;
3. is, by construction, multidimensional;
4. does not apply to problems where the number of parameters varies, as the resulting chain is not irreducible.

A wee problem
[Figure in the $(\mu_1, \mu_2)$ plane: Gibbs started at random; Gibbs stuck at the wrong mode]

Slice sampler as generic Gibbs
If $f(\theta)$ can be written as a product
\[ \prod_{i=1}^k f_i(\theta), \]
it can be completed as
\[ \prod_{i=1}^k \mathbb{I}_{0 \le \omega_i \le f_i(\theta)}, \]
leading to the following Gibbs algorithm:

Slice sampler (code)
Algorithm (Slice sampler). Simulate
1. $\omega_1^{(t+1)} \sim \mathcal{U}_{[0, f_1(\theta^{(t)})]}$;
...
k. $\omega_k^{(t+1)} \sim \mathcal{U}_{[0, f_k(\theta^{(t)})]}$;
k+1. $\theta^{(t+1)} \sim \mathcal{U}_{A^{(t+1)}}$, with
\[ A^{(t+1)} = \{ y;\ f_i(y) \ge \omega_i^{(t+1)},\ i = 1, \ldots, k \}. \]
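
With a single factor ($k = 1$) the slice sampler is short enough to write out in R. The sketch below assumes the target is a $\mathcal{N}(-3, 1)$ density truncated to $[0, 1]$; the truncation interval and the helper name `slice_truncnorm` are assumptions. On $[0, 1]$ this density is decreasing, so the set $A^{(t+1)}$ is the interval $[0, \min(1, -3 + \sqrt{-2 \log \omega})]$.

```r
# Slice sampler sketch for N(-3, 1) restricted to [0, 1] (assumed interval).
slice_truncnorm <- function(n_iter, theta0 = 0.5) {
  f <- function(x) exp(-(x + 3)^2 / 2)        # unnormalised density on [0, 1]
  chain <- numeric(n_iter)
  theta <- theta0
  for (t in seq_len(n_iter)) {
    omega <- runif(1, 0, f(theta))            # omega | theta ~ U[0, f(theta)]
    upper <- min(1, -3 + sqrt(-2 * log(omega)))   # right end of the slice
    theta <- runif(1, 0, upper)               # theta | omega ~ U on the slice
    chain[t] <- theta
  }
  chain
}

set.seed(8)
chain <- slice_truncnorm(10000)
```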

Example of results with a truncated $\mathcal{N}(-3, 1)$ distribution
[Figure, axes x and y: slice sampler output after 2, 3, 4, 5, 10, 50 and 100 iterations]

Approximate Bayesian computation

ABC basics: Regular Bayesian computation issues
Recap: when faced with a non-standard posterior distribution
\[ \pi(\theta|y) \propto \pi(\theta)\,L(\theta|y), \]
the standard solution is to use simulation (Monte Carlo) to produce a sample $\theta_1, \ldots, \theta_T$ from $\pi(\theta|y)$ (or approximately by Markov chain Monte Carlo methods).
[Robert & Casella, 2004]

Intractable likelihoods
Cases when the likelihood function $f(y|\theta)$ is unavailable (in analytic and numerical senses) and when the completion step
\[ f(y|\theta) = \int_{\mathcal{Z}} f(y, z|\theta)\,dz \]
is impossible or too costly because of the dimension of $z$.
MCMC cannot be implemented!

Illustration
Phylogenetic tree: in population genetics, reconstitution of a common ancestor from a sample of genes via a phylogenetic tree that is close to impossible to integrate out [100 processor days with 4 parameters].
[Cornuet et al., 2009, Bioinformatics]

Illustration: demo-genetic inference
Genetic model of evolution from a common ancestor (MRCA), characterized by a set of parameters that cover historical, demographic, and genetic factors.
Dataset of polymorphism (DNA sample) observed at the present time.
Different possible scenarios, with the choice of scenario made by ABC. Scenario 1a is strongly supported over the others, which argues for a common origin of the Pygmy populations of West Africa.
[Verdu et al.]

Illustration: Pygmy population demo-genetics
Pygmy populations: do they have a common origin? When and how did they split from non-Pygmy populations? Were there more recent interactions between Pygmy and non-Pygmy populations?

The ABC method
Bayesian setting: target is $\pi(\theta)\,f(x|\theta)$.
When the likelihood $f(x|\theta)$ is not in closed form, likelihood-free rejection technique:
ABC algorithm. For an observation $y \sim f(y|\theta)$, under the prior $\pi(\theta)$, keep jointly simulating
\[ \theta' \sim \pi(\theta), \qquad z \sim f(z|\theta'), \]
until the auxiliary variable $z$ is equal to the observed value, $z = y$.
[Tavaré et al., 1997]
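
When the data are discrete, the exact-match version can be run as is. The R sketch below assumes a toy setting that is not in the slides: a binomial observation $y = 3$ out of $n = 10$ trials with a uniform prior on $\theta$, for which the exact posterior is Beta($y + 1$, $n - y + 1$) and can serve as a check. The accept step is vectorised over a batch of prior draws rather than written as a loop.

```r
# Exact-match ABC rejection for a binomial toy model (illustrative assumptions).
set.seed(9)
n <- 10
y_obs <- 3
M <- 2e5                                  # number of joint simulations
theta <- runif(M)                         # theta' ~ pi(theta) = U(0, 1)
z <- rbinom(M, size = n, prob = theta)    # z ~ f(z | theta')
accepted <- theta[z == y_obs]             # keep theta' only when z equals y
```

The mean of `accepted` can be compared with the exact posterior mean $(y + 1)/(n + 2)$.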

Why does it work?!
The proof is trivial:
\[ f(\theta_i) \propto \sum_{z \in \mathcal{D}} \pi(\theta_i)\,f(z|\theta_i)\,\mathbb{I}_y(z) \propto \pi(\theta_i)\,f(y|\theta_i) \propto \pi(\theta_i|y). \]
[Accept-Reject 101]

A as approximate
When $y$ is a continuous random variable, the equality $z = y$ is replaced with a tolerance condition,
\[ \rho(y, z) \le \epsilon, \]
where $\rho$ is a distance.
Output distributed from
\[ \pi(\theta)\,P_\theta\{\rho(y, z) < \epsilon\} \propto \pi(\theta \,|\, \rho(y, z) < \epsilon). \]

ABC algorithm
Algorithm 1: Likelihood-free rejection sampler
for i = 1 to N do
  repeat
    generate $\theta'$ from the prior distribution $\pi(\cdot)$
    generate $z$ from the likelihood $f(\cdot|\theta')$
  until $\rho\{\eta(z), \eta(y)\} \le \epsilon$
  set $\theta_i = \theta'$
end for
where $\eta(y)$ defines a (not necessarily sufficient) statistic.
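
Algorithm 1 can be sketched in R on a toy model chosen only for illustration: $n = 20$ observations from $\mathcal{N}(\theta, 1)$, a $\mathcal{U}(-5, 5)$ prior, the sample mean as summary $\eta$, the absolute difference as distance $\rho$, and tolerance $\epsilon = 0.1$. All of these are assumptions. The repeat loop is replaced by an equivalent vectorised filter over a fixed batch of prior draws, so the number of accepted values is random.

```r
# Likelihood-free rejection sampler on a normal-mean toy model (assumptions).
set.seed(10)
n <- 20
y <- rnorm(n, mean = 1)                   # pseudo-observed data
eta_y <- mean(y)                          # summary statistic eta(y)
eps <- 0.1                                # tolerance
M <- 1e5                                  # number of prior draws
theta <- runif(M, -5, 5)                  # theta' ~ pi(.)
# row i holds a dataset z simulated from f(. | theta[i])
z <- matrix(rnorm(M * n, mean = theta), nrow = M)
eta_z <- rowMeans(z)                      # eta(z) for every simulated dataset
keep <- abs(eta_z - eta_y) <= eps         # rho{eta(z), eta(y)} <= eps
abc_sample <- theta[keep]
```

Since the sample mean is sufficient here, `abc_sample` should concentrate around `eta_y` with a spread slightly larger than the exact posterior standard deviation $1/\sqrt{n}$.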

Output
The likelihood-free algorithm samples from the marginal in $z$ of
\[ \pi_\epsilon(\theta, z|y) = \frac{\pi(\theta)\,f(z|\theta)\,\mathbb{I}_{A_{\epsilon,y}}(z)}{\int_{A_{\epsilon,y} \times \Theta} \pi(\theta)\,f(z|\theta)\,dz\,d\theta}, \]
where $A_{\epsilon,y} = \{ z \in \mathcal{D} \,|\, \rho(\eta(z), \eta(y)) < \epsilon \}$.
The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution:
\[ \pi_\epsilon(\theta|y) = \int \pi_\epsilon(\theta, z|y)\,dz \approx \pi(\theta \,|\, \eta(y)). \]

Pima Indian benchmark
Figure: Comparison between density estimates of the marginals on $\beta_1$ (left), $\beta_2$ (center) and $\beta_3$ (right) from ABC rejection samples (red) and MCMC samples (black).

ABC advances
Simulating from the prior is often poor in efficiency. Possible remedies:
- modify the proposal distribution on $\theta$ to increase the density of $x$'s within the vicinity of $y$ [Marjoram et al., 2003; Bortot et al., 2007; Sisson et al., 2007];
- view the problem as a conditional density estimation and develop techniques to allow for larger $\epsilon$ [Beaumont et al., 2002];
- include $\epsilon$ in the inferential framework [ABC$_\mu$] [Ratmann et al., 2009].

ABC-MCMC
Markov chain $(\theta^{(t)})$ created via the transition function
\[ \theta^{(t+1)} = \begin{cases} \theta' \sim K_\omega(\theta'|\theta^{(t)}) & \text{if } x \sim f(x|\theta') \text{ is such that } x = y \text{ and } u \sim \mathcal{U}(0,1) \le \dfrac{\pi(\theta')\,K_\omega(\theta^{(t)}|\theta')}{\pi(\theta^{(t)})\,K_\omega(\theta'|\theta^{(t)})}, \\ \theta^{(t)} & \text{otherwise,} \end{cases} \]
has the posterior $\pi(\theta|y)$ as stationary distribution.
[Marjoram et al., 2003]

ABC-MCMC (2)
Algorithm 2: Likelihood-free MCMC sampler
Use Algorithm 1 to get $(\theta^{(0)}, z^{(0)})$
for t = 1 to N do
  generate $\theta'$ from $K_\omega(\cdot|\theta^{(t-1)})$
  generate $z'$ from the likelihood $f(\cdot|\theta')$
  generate $u$ from $\mathcal{U}_{[0,1]}$
  if $u \le \dfrac{\pi(\theta')\,K_\omega(\theta^{(t-1)}|\theta')}{\pi(\theta^{(t-1)})\,K_\omega(\theta'|\theta^{(t-1)})}\,\mathbb{I}_{A_{\epsilon,y}}(z')$ then
    set $(\theta^{(t)}, z^{(t)}) = (\theta', z')$
  else
    set $(\theta^{(t)}, z^{(t)}) = (\theta^{(t-1)}, z^{(t-1)})$
  end if
end for
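
A hedged R sketch of Algorithm 2 follows, on an assumed toy model: $n = 20$ observations from $\mathcal{N}(\theta, 1)$, a $\mathcal{U}(-5, 5)$ prior, the sample mean as summary statistic, tolerance $\epsilon = 0.2$, and a symmetric Gaussian random walk for $K_\omega$ with scale 0.5. With a symmetric kernel the ratio reduces to the prior ratio. Only $\theta^{(t)}$ is stored, since $z^{(t)}$ is not needed once the indicator has been evaluated.

```r
# Likelihood-free MCMC sketch on a normal-mean toy model (assumptions).
set.seed(11)
n <- 20
y <- rnorm(n, mean = 1)                   # pseudo-observed data
eta_y <- mean(y)
eps <- 0.2
log_prior <- function(theta) dunif(theta, -5, 5, log = TRUE)
simulate_eta <- function(theta) mean(rnorm(n, mean = theta))  # eta(z), z ~ f(. | theta)

# initialisation by the rejection sampler (Algorithm 1)
repeat {
  theta <- runif(1, -5, 5)
  if (abs(simulate_eta(theta) - eta_y) <= eps) break
}

n_iter <- 20000
chain <- numeric(n_iter)
for (t in seq_len(n_iter)) {
  prop <- theta + 0.5 * rnorm(1)                      # theta' ~ K_omega(. | theta)
  in_A <- abs(simulate_eta(prop) - eta_y) <= eps      # indicator of A_{eps, y}
  log_ratio <- log_prior(prop) - log_prior(theta)     # symmetric kernel cancels
  if (in_A && log(runif(1)) <= log_ratio) theta <- prop
  chain[t] <- theta
}
```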

Sequential Monte Carlo
SMC is a simulation technique to approximate a sequence of related probability distributions $\pi_n$, with $\pi_0$ "easy" and $\pi_T$ as target.
Iterated IS as PMC: particles are moved from time $n-1$ to time $n$ via a kernel $K_n$, with use of a sequence of extended targets $\tilde\pi_n$,
\[ \tilde\pi_n(z_{0:n}) = \pi_n(z_n) \prod_{j=0}^{n-1} L_j(z_{j+1}, z_j), \]
where the $L_j$'s are backward Markov kernels [check that $\pi_n(z_n)$ is a marginal].
[Del Moral, Doucet & Jasra, Series B, 2006]

Sequential Monte Carlo (2)
Algorithm 3: SMC sampler [Del Moral, Doucet & Jasra, Series B, 2006]
sample $z_i^{(0)} \sim \gamma_0(x)$ ($i = 1, \ldots, N$)
compute weights $w_i^{(0)} = \pi_0(z_i^{(0)}) / \gamma_0(z_i^{(0)})$
for t = 1 to T do
  if $\mathrm{ESS}(w^{(t-1)}) < N_T$ then
    resample $N$ particles $z^{(t-1)}$ and set weights to 1
  end if
  generate $z_i^{(t)} \sim K_t(z_i^{(t-1)}, \cdot)$ and set weights to
  \[ w_i^{(t)} = w_i^{(t-1)}\,\frac{\pi_t(z_i^{(t)})\,L_{t-1}(z_i^{(t)}, z_i^{(t-1)})}{\pi_{t-1}(z_i^{(t-1)})\,K_t(z_i^{(t-1)}, z_i^{(t)})} \]
end for
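
The SMC sampler can be sketched in R under several assumptions that are not in the slides: a geometric bridge $\pi_t \propto \pi_0^{1-\beta_t} \pi_T^{\beta_t}$ between $\pi_0 = \mathcal{N}(0, 5^2)$ and $\pi_T = \mathcal{N}(3, 1)$, $\gamma_0 = \pi_0$, a resampling threshold of $N/2$, and a kernel $K_t$ made of one random walk Metropolis step invariant for $\pi_t$. With the backward kernel $L_{t-1}(z', z) = \pi_t(z) K_t(z, z') / \pi_t(z')$, the weight update reduces to $\pi_t(z^{(t-1)}) / \pi_{t-1}(z^{(t-1)})$, which is what the code uses.

```r
# SMC sampler sketch with a tempering bridge and MCMC moves (toy assumptions).
set.seed(42)
N <- 2000                                     # number of particles
T_steps <- 20
beta <- seq(0, 1, length.out = T_steps + 1)   # beta[1] = 0 (pi_0), beta[T+1] = 1
log_pi0 <- function(z) dnorm(z, 0, 5, log = TRUE)
log_piT <- function(z) -0.5 * (z - 3)^2       # N(3, 1), unnormalised
log_pi <- function(z, b) (1 - b) * log_pi0(z) + b * log_piT(z)
ess <- function(logw) {
  w <- exp(logw - max(logw))
  sum(w)^2 / sum(w^2)
}

z <- rnorm(N, 0, 5)                           # z_i^(0) ~ gamma_0 = pi_0
logw <- rep(0, N)                             # equal initial weights
for (t in seq_len(T_steps)) {
  if (ess(logw) < N / 2) {                    # resample when the ESS is low
    z <- sample(z, N, replace = TRUE, prob = exp(logw - max(logw)))
    logw <- rep(0, N)
  }
  # incremental weight pi_t / pi_{t-1}, evaluated at the pre-move particles
  logw <- logw + log_pi(z, beta[t + 1]) - log_pi(z, beta[t])
  # move: one random walk Metropolis step leaving pi_t invariant
  prop <- z + rnorm(N)
  acc <- log(runif(N)) < log_pi(prop, beta[t + 1]) - log_pi(z, beta[t + 1])
  z[acc] <- prop[acc]
}
w <- exp(logw - max(logw))
w <- w / sum(w)                               # normalised final weights
smc_mean <- sum(w * z)                        # estimate of the mean under pi_T
```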

ABC-SMC
[Del Moral, Doucet & Jasra, 2009]
True derivation of an SMC-ABC algorithm.
Use of a kernel $K_n$ associated with target $\pi_{\epsilon_n}$ and derivation of the backward kernel
\[ L_{n-1}(z, z') = \frac{\pi_{\epsilon_n}(z')\,K_n(z', z)}{\pi_{\epsilon_n}(z)}. \]
Update of the weights
\[ w_{in} \propto w_{i(n-1)}\,\frac{\sum_{m=1}^M \mathbb{I}_{A_{\epsilon_n}}(x_{in}^m)}{\sum_{m=1}^M \mathbb{I}_{A_{\epsilon_{n-1}}}(x_{i(n-1)}^m)} \]
when $x_{in}^m \sim K(x_{i(n-1)}, \cdot)$.


More information

Lecture 8: The Metropolis-Hastings Algorithm

Lecture 8: The Metropolis-Hastings Algorithm 30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Introduction to CosmoMC

Introduction to CosmoMC Introduction to CosmoMC Part I: Motivation & Basic concepts Institut de Ciències del Cosmos - Universitat de Barcelona Dept. de Física Teórica y del Cosmos, Universidad de Granada, 1-3 Marzo 2016 What

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Likelihood-free MCMC

Likelihood-free MCMC Bayesian inference for stable distributions with applications in finance Department of Mathematics University of Leicester September 2, 2011 MSc project final presentation Outline 1 2 3 4 Classical Monte

More information

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs

More information

Multimodal Nested Sampling

Multimodal Nested Sampling Multimodal Nested Sampling Farhan Feroz Astrophysics Group, Cavendish Lab, Cambridge Inverse Problems & Cosmology Most obvious example: standard CMB data analysis pipeline But many others: object detection,

More information

Control Variates for Markov Chain Monte Carlo

Control Variates for Markov Chain Monte Carlo Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 13 MCMC, Hybrid chains October 13, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university MH algorithm, Chap:6.3 The metropolis hastings requires three objects, the distribution of

More information

MCMC: Markov Chain Monte Carlo

MCMC: Markov Chain Monte Carlo I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov

More information

LECTURE 15 Markov chain Monte Carlo

LECTURE 15 Markov chain Monte Carlo LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte

More information

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

A = {(x, u) : 0 u f(x)},

A = {(x, u) : 0 u f(x)}, Draw x uniformly from the region {x : f(x) u }. Markov Chain Monte Carlo Lecture 5 Slice sampler: Suppose that one is interested in sampling from a density f(x), x X. Recall that sampling x f(x) is equivalent

More information

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to

More information

Hmms with variable dimension structures and extensions

Hmms with variable dimension structures and extensions Hmm days/enst/january 21, 2002 1 Hmms with variable dimension structures and extensions Christian P. Robert Université Paris Dauphine www.ceremade.dauphine.fr/ xian Hmm days/enst/january 21, 2002 2 1 Estimating

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).

More information

MARKOV CHAIN MONTE CARLO

MARKOV CHAIN MONTE CARLO MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Quantifying Uncertainty

Quantifying Uncertainty Sai Ravela M. I. T Last Updated: Spring 2013 1 Markov Chain Monte Carlo Monte Carlo sampling made for large scale problems via Markov Chains Monte Carlo Sampling Rejection Sampling Importance Sampling

More information

Kernel Sequential Monte Carlo

Kernel Sequential Monte Carlo Kernel Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) * equal contribution April 25, 2016 1 / 37 Section

More information

Markov chain Monte Carlo methods in atmospheric remote sensing

Markov chain Monte Carlo methods in atmospheric remote sensing 1 / 45 Markov chain Monte Carlo methods in atmospheric remote sensing Johanna Tamminen johanna.tamminen@fmi.fi ESA Summer School on Earth System Monitoring and Modeling July 3 Aug 11, 212, Frascati July,

More information

Sequential Monte Carlo Methods

Sequential Monte Carlo Methods University of Pennsylvania Bradley Visitor Lectures October 23, 2017 Introduction Unfortunately, standard MCMC can be inaccurate, especially in medium and large-scale DSGE models: disentangling importance

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Likelihood-free inference and approximate Bayesian computation for stochastic modelling

Likelihood-free inference and approximate Bayesian computation for stochastic modelling Likelihood-free inference and approximate Bayesian computation for stochastic modelling Master Thesis April of 2013 September of 2013 Written by Oskar Nilsson Supervised by Umberto Picchini Centre for

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

Markov Chains and MCMC

Markov Chains and MCMC Markov Chains and MCMC CompSci 590.02 Instructor: AshwinMachanavajjhala Lecture 4 : 590.02 Spring 13 1 Recap: Monte Carlo Method If U is a universe of items, and G is a subset satisfying some property,

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

CosmoSIS Webinar Part 1

CosmoSIS Webinar Part 1 CosmoSIS Webinar Part 1 An introduction to cosmological parameter estimation and sampling Presenting for the CosmoSIS team: Elise Jennings (Fermilab, KICP) Joe Zuntz (University of Manchester) Vinicius

More information

Adaptive Population Monte Carlo

Adaptive Population Monte Carlo Adaptive Population Monte Carlo Olivier Cappé Centre Nat. de la Recherche Scientifique & Télécom Paris 46 rue Barrault, 75634 Paris cedex 13, France http://www.tsi.enst.fr/~cappe/ Recent Advances in Monte

More information

Introduction to Markov Chain Monte Carlo & Gibbs Sampling

Introduction to Markov Chain Monte Carlo & Gibbs Sampling Introduction to Markov Chain Monte Carlo & Gibbs Sampling Prof. Nicholas Zabaras Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Ithaca, NY 14853-3801 Email: zabaras@cornell.edu

More information

Approximate Bayesian Computation

Approximate Bayesian Computation Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate

More information

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

Surveying the Characteristics of Population Monte Carlo

Surveying the Characteristics of Population Monte Carlo International Research Journal of Applied and Basic Sciences 2013 Available online at www.irjabs.com ISSN 2251-838X / Vol, 7 (9): 522-527 Science Explorer Publications Surveying the Characteristics of

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Reliable Approximate Bayesian computation (ABC) model choice via random forests

Reliable Approximate Bayesian computation (ABC) model choice via random forests Reliable Approximate Bayesian computation (ABC) model choice via random forests Christian P. Robert Université Paris-Dauphine, Paris & University of Warwick, Coventry Max-Plank-Institut für Physik, October

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 13-28 February 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Limitations of Gibbs sampling. Metropolis-Hastings algorithm. Proof

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 Sequential parallel tempering With the development of science and technology, we more and more need to deal with high dimensional systems. For example, we need to align a group of protein or DNA sequences

More information

Pseudo-marginal MCMC methods for inference in latent variable models

Pseudo-marginal MCMC methods for inference in latent variable models Pseudo-marginal MCMC methods for inference in latent variable models Arnaud Doucet Department of Statistics, Oxford University Joint work with George Deligiannidis (Oxford) & Mike Pitt (Kings) MCQMC, 19/08/2016

More information

The Recycling Gibbs Sampler for Efficient Learning

The Recycling Gibbs Sampler for Efficient Learning The Recycling Gibbs Sampler for Efficient Learning L. Martino, V. Elvira, G. Camps-Valls Universidade de São Paulo, São Carlos (Brazil). Télécom ParisTech, Université Paris-Saclay. (France), Universidad

More information

Monte Carlo in Bayesian Statistics

Monte Carlo in Bayesian Statistics Monte Carlo in Bayesian Statistics Matthew Thomas SAMBa - University of Bath m.l.thomas@bath.ac.uk December 4, 2014 Matthew Thomas (SAMBa) Monte Carlo in Bayesian Statistics December 4, 2014 1 / 16 Overview

More information

Sampling from complex probability distributions

Sampling from complex probability distributions Sampling from complex probability distributions Louis J. M. Aslett (louis.aslett@durham.ac.uk) Department of Mathematical Sciences Durham University UTOPIAE Training School II 4 July 2017 1/37 Motivation

More information

SC7/SM6 Bayes Methods HT18 Lecturer: Geoff Nicholls Lecture 2: Monte Carlo Methods Notes and Problem sheets are available at http://www.stats.ox.ac.uk/~nicholls/bayesmethods/ and via the MSc weblearn pages.

More information

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

MCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17

MCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17 MCMC for big data Geir Storvik BigInsight lunch - May 2 2018 Geir Storvik MCMC for big data BigInsight lunch - May 2 2018 1 / 17 Outline Why ordinary MCMC is not scalable Different approaches for making

More information

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Advances and Applications in Perfect Sampling

Advances and Applications in Perfect Sampling and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 11 Markov Chain Monte Carlo cont. October 6, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university The two stage Gibbs sampler If the conditional distributions are easy to sample

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

Stochastic optimization Markov Chain Monte Carlo

Stochastic optimization Markov Chain Monte Carlo Stochastic optimization Markov Chain Monte Carlo Ethan Fetaya Weizmann Institute of Science 1 Motivation Markov chains Stationary distribution Mixing time 2 Algorithms Metropolis-Hastings Simulated Annealing

More information

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 18, 2015 1 / 45 Resources and Attribution Image credits,

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

SMC 2 : an efficient algorithm for sequential analysis of state-space models

SMC 2 : an efficient algorithm for sequential analysis of state-space models SMC 2 : an efficient algorithm for sequential analysis of state-space models N. CHOPIN 1, P.E. JACOB 2, & O. PAPASPILIOPOULOS 3 1 ENSAE-CREST 2 CREST & Université Paris Dauphine, 3 Universitat Pompeu Fabra

More information

Approximate Bayesian Computation and Particle Filters

Approximate Bayesian Computation and Particle Filters Approximate Bayesian Computation and Particle Filters Dennis Prangle Reading University 5th February 2014 Introduction Talk is mostly a literature review A few comments on my own ongoing research See Jasra

More information

A Review of Pseudo-Marginal Markov Chain Monte Carlo

A Review of Pseudo-Marginal Markov Chain Monte Carlo A Review of Pseudo-Marginal Markov Chain Monte Carlo Discussed by: Yizhe Zhang October 21, 2016 Outline 1 Overview 2 Paper review 3 experiment 4 conclusion Motivation & overview Notation: θ denotes the

More information

Convex Optimization CMU-10725

Convex Optimization CMU-10725 Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state

More information

Proceedings of the 2012 Winter Simulation Conference C. Laroque, J. Himmelspach, R. Pasupathy, O. Rose, and A. M. Uhrmacher, eds.

Proceedings of the 2012 Winter Simulation Conference C. Laroque, J. Himmelspach, R. Pasupathy, O. Rose, and A. M. Uhrmacher, eds. Proceedings of the 2012 Winter Simulation Conference C. Laroque, J. Himmelspach, R. Pasupathy, O. Rose, and A. M. Uhrmacher, eds. OPTIMAL PARALLELIZATION OF A SEQUENTIAL APPROXIMATE BAYESIAN COMPUTATION

More information

MCMC Methods: Gibbs and Metropolis

MCMC Methods: Gibbs and Metropolis MCMC Methods: Gibbs and Metropolis Patrick Breheny February 28 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/30 Introduction As we have seen, the ability to sample from the posterior distribution

More information

Robert Collins CSE586, PSU Intro to Sampling Methods

Robert Collins CSE586, PSU Intro to Sampling Methods Robert Collins Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Robert Collins A Brief Overview of Sampling Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling

More information

An Brief Overview of Particle Filtering

An Brief Overview of Particle Filtering 1 An Brief Overview of Particle Filtering Adam M. Johansen a.m.johansen@warwick.ac.uk www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/ May 11th, 2010 Warwick University Centre for Systems

More information

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods Prof. Daniel Cremers 11. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

Markov Chain Monte Carlo, Numerical Integration

Markov Chain Monte Carlo, Numerical Integration Markov Chain Monte Carlo, Numerical Integration (See Statistics) Trevor Gallen Fall 2015 1 / 1 Agenda Numerical Integration: MCMC methods Estimating Markov Chains Estimating latent variables 2 / 1 Numerical

More information

Generating Random Variables

Generating Random Variables Generating Random Variables Christian Robert Université Paris Dauphine and CREST, INSEE George Casella University of Florida Keywords and Phrases: Random Number Generator, Probability Integral Transform,

More information

arxiv: v1 [stat.me] 30 Sep 2009

arxiv: v1 [stat.me] 30 Sep 2009 Model choice versus model criticism arxiv:0909.5673v1 [stat.me] 30 Sep 2009 Christian P. Robert 1,2, Kerrie Mengersen 3, and Carla Chen 3 1 Université Paris Dauphine, 2 CREST-INSEE, Paris, France, and

More information

INTRODUCTION TO BAYESIAN STATISTICS

INTRODUCTION TO BAYESIAN STATISTICS INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 15-7th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Mixture and composition of kernels. Hybrid algorithms. Examples Overview

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos Contents Markov Chain Monte Carlo Methods Sampling Rejection Importance Hastings-Metropolis Gibbs Markov Chains

More information

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation Luke Tierney Department of Statistics & Actuarial Science University of Iowa Basic Ratio of Uniforms Method Introduced by Kinderman and

More information

Outline. General purpose

Outline. General purpose Outline Population Monte Carlo and adaptive sampling schemes Christian P. Robert Université Paris Dauphine and CREST-INSEE http://www.ceremade.dauphine.fr/~xian 1 2 3 Illustrations 4 Joint work with O.

More information

On Reparametrization and the Gibbs Sampler

On Reparametrization and the Gibbs Sampler On Reparametrization and the Gibbs Sampler Jorge Carlos Román Department of Mathematics Vanderbilt University James P. Hobert Department of Statistics University of Florida March 2014 Brett Presnell Department

More information

An introduction to Sequential Monte Carlo

An introduction to Sequential Monte Carlo An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods

More information

arxiv: v2 [stat.co] 27 May 2011

arxiv: v2 [stat.co] 27 May 2011 Approximate Bayesian Computational methods arxiv:1101.0955v2 [stat.co] 27 May 2011 Jean-Michel Marin Institut de Mathématiques et Modélisation (I3M), Université Montpellier 2, France Pierre Pudlo Institut

More information

PSEUDO-MARGINAL METROPOLIS-HASTINGS APPROACH AND ITS APPLICATION TO BAYESIAN COPULA MODEL

PSEUDO-MARGINAL METROPOLIS-HASTINGS APPROACH AND ITS APPLICATION TO BAYESIAN COPULA MODEL PSEUDO-MARGINAL METROPOLIS-HASTINGS APPROACH AND ITS APPLICATION TO BAYESIAN COPULA MODEL Xuebin Zheng Supervisor: Associate Professor Josef Dick Co-Supervisor: Dr. David Gunawan School of Mathematics

More information

CS281A/Stat241A Lecture 22

CS281A/Stat241A Lecture 22 CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically

More information

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

SAMPLING ALGORITHMS. In general. Inference in Bayesian models SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be

More information

On Markov chain Monte Carlo methods for tall data

On Markov chain Monte Carlo methods for tall data On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational

More information

Monte Carlo Inference Methods

Monte Carlo Inference Methods Monte Carlo Inference Methods Iain Murray University of Edinburgh http://iainmurray.net Monte Carlo and Insomnia Enrico Fermi (1901 1954) took great delight in astonishing his colleagues with his remarkably

More information

13 Notes on Markov Chain Monte Carlo

13 Notes on Markov Chain Monte Carlo 13 Notes on Markov Chain Monte Carlo Markov Chain Monte Carlo is a big, and currently very rapidly developing, subject in statistical computation. Many complex and multivariate types of random data, useful

More information