Manifold Monte Carlo Methods


1 Manifold Monte Carlo Methods
Mark Girolami, Department of Statistical Science, University College London
Joint work with Ben Calderhead
Research Section Ordinary Meeting, The Royal Statistical Society, October 13, 2010

5 Riemann manifold Langevin and Hamiltonian Monte Carlo Methods
Girolami, M. & Calderhead, B., J. R. Statist. Soc. B (2011), 73, Part 2, pp. 123-214
Advanced Monte Carlo methodology founded on geometric principles

14 Talk Outline
Motivation to improve MCMC capability for challenging problems
Stochastic diffusion as adaptive proposal process
Exploring geometric concepts in MCMC methodology
Diffusions across Riemann manifold as proposal mechanism
Deterministic geodesic flows on manifold form basis of MCMC methods
Illustrative examples: Warped Bivariate Gaussian; Gaussian mixture model; Log-Gaussian Cox process
Conclusions

21 Motivation: Simulation Based Inference
The Monte Carlo method employs samples from p(\theta) to obtain the estimate
\int \varphi(\theta)\, p(\theta)\, d\theta = \frac{1}{N} \sum_n \varphi(\theta_n) + O(N^{-1/2})
Draw \theta_n from an ergodic Markov process with stationary distribution p(\theta)
Construct the process in two parts:
Propose a move \theta \to \theta^* with probability p_p(\theta \to \theta^*)
Accept or reject the proposal with probability
a(\theta \to \theta^*) = \min\left[ 1,\ \frac{p(\theta^*)\, p_p(\theta^* \to \theta)}{p(\theta)\, p_p(\theta \to \theta^*)} \right]
Efficiency is dependent on the proposal mechanism p_p(\theta \to \theta^*)
The success of MCMC is reliant upon appropriate proposal design
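The two-part construction above can be sketched as a minimal sampler. This is a random-walk Metropolis variant: the proposal is a symmetric Gaussian, so the proposal densities p_p cancel in the acceptance ratio; the standard-Normal target and step size are illustrative assumptions, not from the talk.

```python
import numpy as np

def metropolis(log_p, theta0, n_samples, step=0.5, rng=None):
    """Random-walk Metropolis: symmetric Gaussian proposal, so the
    proposal densities cancel in the acceptance ratio."""
    rng = np.random.default_rng(rng)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    samples = np.empty((n_samples, theta.size))
    log_p_cur = log_p(theta)
    for n in range(n_samples):
        prop = theta + step * rng.standard_normal(theta.size)
        log_p_prop = log_p(prop)
        # accept with probability min(1, p(prop) / p(theta))
        if np.log(rng.uniform()) < log_p_prop - log_p_cur:
            theta, log_p_cur = prop, log_p_prop
        samples[n] = theta
    return samples

# Illustrative target: standard Normal, so E[theta] ~ 0 and E[theta^2] ~ 1
samples = metropolis(lambda t: -0.5 * np.sum(t**2), [0.0], 20000, rng=0)
```

The Monte Carlo estimates of the first two moments should then be close to 0 and 1, up to the O(N^{-1/2}) error of the slide above.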

28 Adaptive Proposal Distributions: Exploit a Discretised Diffusion
For \theta \in \mathbb{R}^D with density p(\theta) and L(\theta) \equiv \log p(\theta), define the Langevin diffusion
d\theta(t) = \frac{1}{2} \nabla_\theta L(\theta(t))\, dt + db(t)
First-order Euler-Maruyama discrete integration of the diffusion:
\theta(\tau + \epsilon) = \theta(\tau) + \frac{\epsilon^2}{2} \nabla_\theta L(\theta(\tau)) + \epsilon\, z(\tau)
Proposal p_p(\theta \to \theta^*) = \mathcal{N}(\theta^* \mid \mu(\theta, \epsilon), \epsilon^2 I) with \mu(\theta, \epsilon) = \theta + \frac{\epsilon^2}{2} \nabla_\theta L(\theta)
Acceptance probability corrects for the discretisation bias:
a(\theta \to \theta^*) = \min\left[ 1,\ \frac{p(\theta^*)\, p_p(\theta^* \to \theta)}{p(\theta)\, p_p(\theta \to \theta^*)} \right]
An isotropic diffusion is inefficient, so employ pre-conditioning:
\theta^* = \theta + \frac{\epsilon^2}{2} M \nabla_\theta L(\theta) + \epsilon \sqrt{M}\, z
How to set M systematically? Tuning differs in the transient and stationary phases
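The discretised diffusion plus accept/reject step is the Metropolis-adjusted Langevin algorithm (MALA). A minimal sketch, assuming a standard-Normal target for illustration; because the proposal mean depends on the current point, the proposal densities no longer cancel and both enter the acceptance ratio.

```python
import numpy as np

def mala(log_p, grad_log_p, theta0, n_samples, eps=0.5, rng=None):
    """MALA: one Euler-Maruyama step of the Langevin diffusion as
    proposal, corrected by a Metropolis-Hastings accept/reject."""
    rng = np.random.default_rng(rng)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))

    def mu(t):  # proposal mean: theta + (eps^2 / 2) * grad L(theta)
        return t + 0.5 * eps**2 * grad_log_p(t)

    def log_q(a, b):  # log N(a | mu(b), eps^2 I), up to a constant that cancels
        return -np.sum((a - mu(b))**2) / (2 * eps**2)

    samples = np.empty((n_samples, theta.size))
    for n in range(n_samples):
        prop = mu(theta) + eps * rng.standard_normal(theta.size)
        log_a = (log_p(prop) + log_q(theta, prop)
                 - log_p(theta) - log_q(prop, theta))
        if np.log(rng.uniform()) < log_a:
            theta = prop
        samples[n] = theta
    return samples

# Illustrative target: standard Normal, L(theta) = -theta^2/2, grad L = -theta
samples = mala(lambda t: -0.5 * np.sum(t**2), lambda t: -t, [2.0], 20000, rng=1)
```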

36 Geometric Concepts in MCMC
Rao, 1945; Jeffreys, 1948: to leading (second) order,
\int p(y; \theta + \delta\theta) \log \frac{p(y; \theta + \delta\theta)}{p(y; \theta)}\, dy \approx \frac{1}{2}\, \delta\theta^T G(\theta)\, \delta\theta
where
G(\theta) = \mathbb{E}_{y \mid \theta}\left[ \left( \frac{\nabla_\theta\, p(y; \theta)}{p(y; \theta)} \right)^{T} \frac{\nabla_\theta\, p(y; \theta)}{p(y; \theta)} \right]
The Fisher Information G(\theta) is a positive-definite metric defining a Riemann manifold
Non-Euclidean geometry for probabilities: distances, metrics, invariants, curvature, geodesics
Asymptotic statistical analysis: Amari, 1981, 1985, 1990; Murray & Rice, 1993; Critchley et al., 1993; Kass, 1989; Efron, 1975; Dawid, 1975; Lauritzen, 1989
Statistical shape analysis: Kent et al., 1996; Dryden & Mardia, 1998
Can geometric structure be employed in MCMC methodology?

42 Geometric Concepts in MCMC
Tangent space: the local metric is defined by \delta\theta^T G(\theta)\, \delta\theta = \sum_{k,l} g_{kl}\, \delta\theta^k \delta\theta^l
Christoffel symbols characterise the connection on a curved manifold:
\Gamma^i_{kl} = \frac{1}{2} \sum_m g^{im} \left( \frac{\partial g_{mk}}{\partial \theta^l} + \frac{\partial g_{ml}}{\partial \theta^k} - \frac{\partial g_{kl}}{\partial \theta^m} \right)
Geodesics: the shortest path between two points on the manifold, satisfying
\frac{d^2 \theta^i}{dt^2} + \sum_{k,l} \Gamma^i_{kl} \frac{d\theta^k}{dt} \frac{d\theta^l}{dt} = 0
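The Christoffel symbols can be computed numerically for any metric by finite-differencing the formula above. A sketch under stated assumptions: the finite-difference step h and the check against the Normal-model Fisher metric (introduced later in the talk) are mine.

```python
import numpy as np

def christoffel(metric, theta, h=1e-5):
    """Gamma^i_{kl} = 0.5 * sum_m g^{im} (d_l g_mk + d_k g_ml - d_m g_kl),
    with the metric derivatives taken by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    D = theta.size
    Ginv = np.linalg.inv(metric(theta))
    dG = np.empty((D, D, D))  # dG[m] = d G / d theta_m
    for m in range(D):
        e = np.zeros(D)
        e[m] = h
        dG[m] = (metric(theta + e) - metric(theta - e)) / (2 * h)
    Gamma = np.zeros((D, D, D))
    for i in range(D):
        for k in range(D):
            for l in range(D):
                Gamma[i, k, l] = 0.5 * sum(
                    Ginv[i, m] * (dG[l][m, k] + dG[k][m, l] - dG[m][k, l])
                    for m in range(D))
    return Gamma

# Fisher metric of N(mu, sigma): G = diag(1/s^2, 2/s^2), theta = (mu, sigma)
normal_metric = lambda th: np.diag([1 / th[1]**2, 2 / th[1]**2])
Gamma = christoffel(normal_metric, [0.0, 2.0])
# e.g. Gamma^mu_{mu,sigma} = -1/sigma, which is -0.5 at sigma = 2
```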

47 Illustration of Geometric Concepts
Consider the Normal density p(x \mid \mu, \sigma) = \mathcal{N}_x(\mu, \sigma)
The local inner product on the tangent space is defined by the metric tensor, i.e. \delta\theta^T G(\theta)\, \delta\theta, where \theta = (\mu, \sigma)^T
The metric is the Fisher Information
G(\mu, \sigma) = \begin{bmatrix} \sigma^{-2} & 0 \\ 0 & 2\sigma^{-2} \end{bmatrix}
The inner product is \sigma^{-2}(\delta\mu^2 + 2\delta\sigma^2), so the densities \mathcal{N}(0, 1) and \mathcal{N}(1, 1) are further apart than the densities \mathcal{N}(0, 2) and \mathcal{N}(1, 2): the distance is non-Euclidean
The metric tensor for the univariate Normal defines a hyperbolic space
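A small numerical illustration of the non-Euclidean distance claim: the same Euclidean step \delta\mu = 1 has a larger squared local length under the Fisher metric at \sigma = 1 than at \sigma = 2. The values follow directly from the metric above; only the code framing is mine.

```python
import numpy as np

# Fisher metric of the univariate Normal with theta = (mu, sigma)
def fisher_metric(sigma):
    return np.diag([1 / sigma**2, 2 / sigma**2])

def local_sq_length(dtheta, sigma):
    """delta_theta^T G(theta) delta_theta = sigma^-2 (dmu^2 + 2 dsigma^2)."""
    dtheta = np.asarray(dtheta, dtype=float)
    return dtheta @ fisher_metric(sigma) @ dtheta

# Same Euclidean step dmu = 1, at sigma = 1 versus sigma = 2
near = local_sq_length([1.0, 0.0], 1.0)  # N(0,1) -> N(1,1)
far = local_sq_length([1.0, 0.0], 2.0)   # N(0,2) -> N(1,2)
```

Here `near` is 1 and `far` is 0.25: the two unit-variance densities are four times further apart (in squared length) than the two variance-4 densities, exactly as the slide states.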

48 M.C. Escher, Heaven and Hell, 1960

53 Langevin Diffusion on a Riemann Manifold
The discretised Langevin diffusion on the manifold defines the proposal mechanism
\theta^*_d = \theta_d + \frac{\epsilon^2}{2} \left( G^{-1}(\theta)\, \nabla_\theta L(\theta) \right)_d - \epsilon^2 \sum_{i,j=1}^{D} G^{-1}(\theta)_{ij}\, \Gamma^d_{ij} + \epsilon \left( \sqrt{G^{-1}(\theta)}\, z \right)_d
For a manifold with constant curvature the proposal mechanism reduces to
\theta^* = \theta + \frac{\epsilon^2}{2} G^{-1}(\theta)\, \nabla_\theta L(\theta) + \epsilon \sqrt{G^{-1}(\theta)}\, z
Compare the MALA proposal with pre-conditioning: \theta^* = \theta + \frac{\epsilon^2}{2} M \nabla_\theta L(\theta) + \epsilon \sqrt{M}\, z
Proposal and acceptance probability:
p_p(\theta \to \theta^*) = \mathcal{N}(\theta^* \mid \mu(\theta, \epsilon), \epsilon^2 G^{-1}(\theta))
a(\theta \to \theta^*) = \min\left[ 1,\ \frac{p(\theta^*)\, p_p(\theta^* \to \theta)}{p(\theta)\, p_p(\theta \to \theta^*)} \right]
The proposal mechanism diffuses approximately along the manifold
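The constant-curvature reduction above is the simplified manifold MALA; a minimal sketch follows. The correlated-Gaussian target is an illustrative assumption: for a Gaussian with the mean as parameter, the Fisher metric is the constant precision matrix, so the constant-curvature form applies exactly.

```python
import numpy as np

def simplified_mmala(log_p, grad_log_p, G, theta0, n_samples, eps=0.8, rng=None):
    """Simplified mMALA: the metric G(theta) preconditions both drift
    and noise; proposal is N(mu(theta, eps), eps^2 G(theta)^-1)."""
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_samples, theta.size))

    def mu_cov(t):  # proposal mean and covariance at t
        Ginv = np.linalg.inv(G(t))
        return t + 0.5 * eps**2 * Ginv @ grad_log_p(t), eps**2 * Ginv

    def log_q(a, mean, cov):  # log N(a | mean, cov), up to a constant
        d = a - mean
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (logdet + d @ np.linalg.solve(cov, d))

    for n in range(n_samples):
        mean, cov = mu_cov(theta)
        prop = rng.multivariate_normal(mean, cov)
        mean_b, cov_b = mu_cov(prop)
        log_a = (log_p(prop) + log_q(theta, mean_b, cov_b)
                 - log_p(theta) - log_q(prop, mean, cov))
        if np.log(rng.uniform()) < log_a:
            theta = prop
        samples[n] = theta
    return samples

# Strongly correlated Gaussian target; the metric is its precision matrix
P = np.linalg.inv(np.array([[1.0, 0.95], [0.95, 1.0]]))
samples = simplified_mmala(lambda t: -0.5 * t @ P @ t, lambda t: -P @ t,
                           lambda t: P, np.zeros(2), 5000, rng=2)
```

With the metric matched to the target's precision, the sampler moves comfortably along the correlated ridge that defeats an isotropic proposal.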

54 Langevin Diffusion on a Riemann Manifold
[Figure: Langevin diffusion proposal paths across the manifold]

62 Geodesic Flow as a Proposal Mechanism
It is desirable that proposals follow a direct path on the manifold: geodesics
Define geodesic flow on the manifold by solving
\frac{d^2 \theta^i}{dt^2} + \sum_{k,l} \Gamma^i_{kl} \frac{d\theta^k}{dt} \frac{d\theta^l}{dt} = 0
How can this be exploited in the design of a transition operator? A slight detour is needed
First define the log-density under the model as L(\theta)
Introduce an auxiliary variable p \sim \mathcal{N}(0, G(\theta))
The negative joint log-density is
H(\theta, p) = -L(\theta) + \frac{1}{2} \log\left( (2\pi)^D |G(\theta)| \right) + \frac{1}{2} p^T G(\theta)^{-1} p

67 Riemannian Hamiltonian Monte Carlo
The negative joint log-density is a Hamiltonian defined on the Riemann manifold:
H(\theta, p) = \underbrace{-L(\theta) + \frac{1}{2} \log\left( (2\pi)^D |G(\theta)| \right)}_{\text{Potential Energy}} + \underbrace{\frac{1}{2} p^T G(\theta)^{-1} p}_{\text{Kinetic Energy}}
The marginal density follows as required:
\int \frac{\exp\{L(\theta)\}}{\sqrt{(2\pi)^D |G(\theta)|}} \exp\left\{ -\frac{1}{2} p^T G(\theta)^{-1} p \right\} dp = \exp\{L(\theta)\}
Obtain samples from the marginal p(\theta) using a Gibbs sampler for p(\theta, p):
p^{n+1} \mid \theta^n \sim \mathcal{N}(0, G(\theta^n)), \qquad \theta^{n+1} \mid p^{n+1} \sim p(\theta^{n+1} \mid p^{n+1})
Employ Hamiltonian dynamics to propose samples for p(\theta^{n+1} \mid p^{n+1}) (Duane et al., 1987; Neal, 2010):
\frac{d\theta}{dt} = \frac{\partial H(\theta, p)}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H(\theta, p)}{\partial \theta}
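The Gibbs-refresh scheme can be sketched with a fixed identity metric, which is standard HMC with the leapfrog integrator: refresh p, simulate the dynamics, then accept/reject on the change in H. This is only the base scheme RMHMC generalises; the position-dependent G(\theta) requires a generalised (implicit) leapfrog, which is omitted here, and the illustrative target is my assumption.

```python
import numpy as np

def hmc(log_p, grad_log_p, theta0, n_samples, eps=0.1, n_leap=20, rng=None):
    """HMC with a fixed identity metric: Gibbs-refresh the momentum,
    run leapfrog on H(theta, p) = -L(theta) + p.p/2, then accept or
    reject on the change in H to correct integration error."""
    rng = np.random.default_rng(rng)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    samples = np.empty((n_samples, theta.size))
    for n in range(n_samples):
        p = rng.standard_normal(theta.size)  # p ~ N(0, I): Gibbs step
        H0 = -log_p(theta) + 0.5 * p @ p
        t, g = theta.copy(), grad_log_p(theta)
        for _ in range(n_leap):              # leapfrog integration
            p = p + 0.5 * eps * g
            t = t + eps * p
            g = grad_log_p(t)
            p = p + 0.5 * eps * g
        H1 = -log_p(t) + 0.5 * p @ p
        if np.log(rng.uniform()) < H0 - H1:  # accept w.p. min(1, e^{H0 - H1})
            theta = t
        samples[n] = theta
    return samples

# Illustrative target: standard Normal
samples = hmc(lambda t: -0.5 * np.sum(t**2), lambda t: -t, [3.0], 2000, rng=3)
```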

75 Riemannian Manifold Hamiltonian Monte Carlo
Consider the Hamiltonian \tilde{H}(\theta, p) = \frac{1}{2} p^T \tilde{G}(\theta)^{-1} p
Hamiltonians with only a quadratic kinetic-energy term exactly describe geodesic flow on the coordinate space \theta with metric \tilde{G}
However, our Hamiltonian is H(\theta, p) = V(\theta) + \frac{1}{2} p^T G(\theta)^{-1} p
If we define \tilde{G}(\theta) = G(\theta)\,(h - V(\theta)), where h is a constant, then the Maupertuis principle tells us that the Hamiltonian flows for H(\theta, p) and \tilde{H}(\theta, p) are equivalent along the energy level h
The solution of
\frac{d\theta}{dt} = \frac{\partial H(\theta, p)}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H(\theta, p)}{\partial \theta}
is therefore equivalent to the solution of the geodesic equation
\frac{d^2 \theta^i}{dt^2} + \sum_{k,l} \tilde{\Gamma}^i_{kl} \frac{d\theta^k}{dt} \frac{d\theta^l}{dt} = 0
RMHMC proposals are along the manifold geodesics

77 Warped Bivariate Gaussian
p(w_1, w_2 \mid y, \sigma_x, \sigma_y) \propto \prod_{n=1}^{N} \mathcal{N}(y_n \mid w_1 + w_2^2, \sigma_y^2)\, \mathcal{N}(w_1, w_2 \mid 0, \sigma_x^2 I)
[Figure: samples in the (w_1, w_2) plane from HMC (left) and RMHMC (right)]

80 Gaussian Mixture Model
Univariate finite mixture model:
p(x_i \mid \theta) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)
The FI-based metric tensor is non-analytic, so employ the empirical FI:
G(\theta) = \frac{1}{N} S^T S - \frac{1}{N^2} s s^T \ \xrightarrow{N}\ \mathrm{cov}\left( \nabla_\theta l(\theta) \right) = I(\theta)
\frac{\partial G(\theta)}{\partial \theta_d} = \frac{1}{N} \left( \frac{\partial S^T}{\partial \theta_d} S + S^T \frac{\partial S}{\partial \theta_d} \right) - \frac{1}{N^2} \left( \frac{\partial s}{\partial \theta_d} s^T + s \frac{\partial s^T}{\partial \theta_d} \right)
with score matrix S with elements S_{i,d} = \frac{\partial \log p(x_i \mid \theta)}{\partial \theta_d} and s = \sum_{i=1}^{N} S_{i,\cdot}^T
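The empirical FI above is straightforward to compute from the score matrix. A sketch that checks it against the exact Fisher information of a single Normal model (the sample size and parameter values are illustrative assumptions):

```python
import numpy as np

def empirical_fisher(score):
    """Empirical Fisher information from the N x D score matrix S,
    S[i, d] = d log p(x_i | theta) / d theta_d:
    G = S^T S / N - s s^T / N^2 with s the column sum of S,
    i.e. the sample covariance of the per-observation scores."""
    S = np.asarray(score, dtype=float)
    N = S.shape[0]
    s = S.sum(axis=0)
    return S.T @ S / N - np.outer(s, s) / N**2

# Check on N(mu, sigma^2): the scores are ((x - mu)/s^2, -1/s + (x - mu)^2/s^3),
# and G should approach the exact Fisher information diag(1/s^2, 2/s^2).
rng = np.random.default_rng(0)
mu, s = 0.0, 2.0
x = rng.normal(mu, s, size=200000)
S = np.column_stack([(x - mu) / s**2, -1 / s + (x - mu) ** 2 / s**3])
G = empirical_fisher(S)
```

For s = 2 the exact values are 0.25 and 0.5 on the diagonal with zero off-diagonal, which the estimate recovers to sampling accuracy.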

82 Gaussian Mixture Model
Univariate finite mixture model:
p(x \mid \mu, \sigma^2) = 0.7\, \mathcal{N}(x \mid 0, \sigma^2) + 0.3\, \mathcal{N}(x \mid \mu, \sigma^2)
Figure: Arrows correspond to the gradients and ellipses to the inverse metric tensor; dashed lines are isocontours of the joint log-density (axes: \mu against \ln\sigma).

83 Gaussian Mixture Model
Figure: Comparison of MALA (left), mMALA (middle) and simplified mMALA (right) convergence paths and autocorrelation plots. Autocorrelation plots are from the stationary chains, i.e. once the chains have converged to the stationary distribution.

84 Gaussian Mixture Model
Figure: Comparison of HMC (left), RMHMC (middle) and Gibbs (right) convergence paths and autocorrelation plots. Autocorrelation plots are from the stationary chains, i.e. once the chains have converged to the stationary distribution.

88 Log-Gaussian Cox Point Process with Latent Field
The joint density for the Poisson counts and the latent Gaussian field:
p(y, x \mid \mu, \sigma, \beta) \propto \prod_{i,j}^{64} \exp\{ y_{i,j} x_{i,j} - m \exp(x_{i,j}) \} \times \exp\left( -\frac{1}{2} (x - \mu 1)^T \Sigma_\theta^{-1} (x - \mu 1) \right)
Metric tensors:
G(\theta)_{i,j} = \frac{1}{2} \mathrm{trace}\left( \Sigma_\theta^{-1} \frac{\partial \Sigma_\theta}{\partial \theta_i} \Sigma_\theta^{-1} \frac{\partial \Sigma_\theta}{\partial \theta_j} \right)
G(x) = \Lambda + \Sigma_\theta^{-1}, where \Lambda is diagonal with elements m \exp(\mu + (\Sigma_\theta)_{i,i})
The latent-field metric tensor, obtained from the parameter conditional, defines a flat manifold; forming it costs O(N^3)
MALA requires a transformation of the latent field to sample, with separate tuning in the transient and stationary phases of the Markov chain; the manifold methods require no pilot tuning or additional transformations
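The latent-field metric G(x) = \Lambda + \Sigma_\theta^{-1} is constant in x, which is why the latent-field manifold is flat. A minimal sketch of forming it, using a toy 2x2 covariance standing in for the full grid covariance (an assumption for illustration only):

```python
import numpy as np

def lgcp_latent_metric(Sigma, mu, m):
    """Metric tensor for the LGCP latent field: G(x) = Lambda + Sigma^{-1},
    with Lambda diagonal, Lambda_ii = m * exp(mu + Sigma_ii).
    The result does not depend on x, so the manifold is flat."""
    Lam = m * np.exp(mu + np.diag(Sigma))
    return np.diag(Lam) + np.linalg.inv(Sigma)

# Toy 2x2 covariance in place of the full grid covariance (assumption)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
G = lgcp_latent_metric(Sigma, mu=0.0, m=0.25)
```

Since \Lambda has positive entries and \Sigma_\theta^{-1} is positive definite, G is symmetric positive definite as a metric must be; the O(N^3) cost in the slide is the inversion (or factorisation) of \Sigma_\theta.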

89 RMHMC for Log-Gaussian Cox Point Processes
Table: Sampling the latent variables of a Log-Gaussian Cox Process - comparison of sampling methods

Method             | Time   | ESS (Min, Med, Max) | s/min ESS | Rel. Speed
MALA (Transient)   | 31,577 | (3, 8, 50)          | 10,605    | 1
MALA (Stationary)  | 31,118 | (4, 16, 80)         |           |
mMALA              | 634    | (26, 84, 174)       |           |
RMHMC              | 2936   | (1951, 4545, 5000)  |           |

102 Conclusion and Discussion
Geometry of statistical models harnessed in Monte Carlo methods
Diffusions that respect the structure and curvature of the space: Manifold MALA
Geodesic flows on the model manifold: RMHMC, a generalisation of HMC
Assessed on correlated and high-dimensional latent variable models
Promising capability of the methodology
Ongoing development:
Potential bottleneck at metric tensor and Christoffel symbols
Theoretical analysis of convergence
Investigation of alternative manifold structures
Design and effect of the metric
Optimality of Hamiltonian flows as local geodesics
Alternative transition kernels
No silver bullet or cure-all, but a new, powerful methodology for the MC toolkit

103 Funding Acknowledgment
Girolami funded by an EPSRC Advanced Research Fellowship and a BBSRC project grant
Calderhead supported by a Microsoft Research PhD Scholarship
