Manifold Monte Carlo Methods


1 Manifold Monte Carlo Methods
Mark Girolami, Department of Statistical Science, University College London
Joint work with Ben Calderhead
Research Section Ordinary Meeting, The Royal Statistical Society, October 13, 2010

5 Riemann manifold Langevin and Hamiltonian Monte Carlo Methods
Girolami, M. & Calderhead, B., J. R. Statist. Soc. B (2011), 73, Part 2, pp. 123-214
Advanced Monte Carlo methodology founded on geometric principles

14 Talk Outline
Motivation to improve MCMC capability for challenging problems
Stochastic diffusion as adaptive proposal process
Exploring geometric concepts in MCMC methodology
Diffusions across Riemann manifold as proposal mechanism
Deterministic geodesic flows on manifold form basis of MCMC methods
Illustrative examples: Warped Bivariate Gaussian; Gaussian mixture model; Log-Gaussian Cox process
Conclusions

21 Motivation: Simulation Based Inference
The Monte Carlo method employs samples from p(\theta) to obtain the estimate
\int \varphi(\theta)\, p(\theta)\, d\theta = \frac{1}{N} \sum_n \varphi(\theta_n) + O(N^{-1/2})
Draw \theta_n from an ergodic Markov process with stationary distribution p(\theta)
Construct the process in two parts:
Propose a move \theta \to \theta^* with probability p_p(\theta \to \theta^*)
Accept or reject the proposal with probability
a(\theta \to \theta^*) = \min\left[ 1,\ \frac{p(\theta^*)\, p_p(\theta^* \to \theta)}{p(\theta)\, p_p(\theta \to \theta^*)} \right]
Efficiency is dependent on the proposal mechanism p_p(\theta \to \theta^*)
The success of MCMC is reliant upon appropriate proposal design
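The two-part construction above can be sketched as a minimal sampler. This is a random-walk Metropolis variant: the proposal is a symmetric Gaussian, so the proposal densities p_p cancel in the acceptance ratio; the standard-Normal target and step size are illustrative assumptions, not from the talk.

```python
import numpy as np

def metropolis(log_p, theta0, n_samples, step=0.5, rng=None):
    """Random-walk Metropolis: symmetric Gaussian proposal, so the
    proposal densities cancel in the acceptance ratio."""
    rng = np.random.default_rng(rng)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    samples = np.empty((n_samples, theta.size))
    log_p_cur = log_p(theta)
    for n in range(n_samples):
        prop = theta + step * rng.standard_normal(theta.size)
        log_p_prop = log_p(prop)
        # accept with probability min(1, p(prop) / p(theta))
        if np.log(rng.uniform()) < log_p_prop - log_p_cur:
            theta, log_p_cur = prop, log_p_prop
        samples[n] = theta
    return samples

# Illustrative target: standard Normal, so E[theta] ~ 0 and E[theta^2] ~ 1
samples = metropolis(lambda t: -0.5 * np.sum(t**2), [0.0], 20000, rng=0)
```

The Monte Carlo estimates of the first two moments should then be close to 0 and 1, up to the O(N^{-1/2}) error of the slide above.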

28 Adaptive Proposal Distributions: Exploit a Discretised Diffusion
For \theta \in \mathbb{R}^D with density p(\theta) and L(\theta) \equiv \log p(\theta), define the Langevin diffusion
d\theta(t) = \frac{1}{2} \nabla_\theta L(\theta(t))\, dt + db(t)
First-order Euler-Maruyama discrete integration of the diffusion:
\theta(\tau + \epsilon) = \theta(\tau) + \frac{\epsilon^2}{2} \nabla_\theta L(\theta(\tau)) + \epsilon\, z(\tau)
Proposal p_p(\theta \to \theta^*) = \mathcal{N}(\theta^* \mid \mu(\theta, \epsilon), \epsilon^2 I) with \mu(\theta, \epsilon) = \theta + \frac{\epsilon^2}{2} \nabla_\theta L(\theta)
Acceptance probability corrects for the discretisation bias:
a(\theta \to \theta^*) = \min\left[ 1,\ \frac{p(\theta^*)\, p_p(\theta^* \to \theta)}{p(\theta)\, p_p(\theta \to \theta^*)} \right]
An isotropic diffusion is inefficient, so employ pre-conditioning:
\theta^* = \theta + \frac{\epsilon^2}{2} M \nabla_\theta L(\theta) + \epsilon \sqrt{M}\, z
How to set M systematically? Tuning differs in the transient and stationary phases
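The discretised diffusion plus accept/reject step is the Metropolis-adjusted Langevin algorithm (MALA). A minimal sketch, assuming a standard-Normal target for illustration; because the proposal mean depends on the current point, the proposal densities no longer cancel and both enter the acceptance ratio.

```python
import numpy as np

def mala(log_p, grad_log_p, theta0, n_samples, eps=0.5, rng=None):
    """MALA: one Euler-Maruyama step of the Langevin diffusion as
    proposal, corrected by a Metropolis-Hastings accept/reject."""
    rng = np.random.default_rng(rng)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))

    def mu(t):  # proposal mean: theta + (eps^2 / 2) * grad L(theta)
        return t + 0.5 * eps**2 * grad_log_p(t)

    def log_q(a, b):  # log N(a | mu(b), eps^2 I), up to a constant that cancels
        return -np.sum((a - mu(b))**2) / (2 * eps**2)

    samples = np.empty((n_samples, theta.size))
    for n in range(n_samples):
        prop = mu(theta) + eps * rng.standard_normal(theta.size)
        log_a = (log_p(prop) + log_q(theta, prop)
                 - log_p(theta) - log_q(prop, theta))
        if np.log(rng.uniform()) < log_a:
            theta = prop
        samples[n] = theta
    return samples

# Illustrative target: standard Normal, L(theta) = -theta^2/2, grad L = -theta
samples = mala(lambda t: -0.5 * np.sum(t**2), lambda t: -t, [2.0], 20000, rng=1)
```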

36 Geometric Concepts in MCMC
Rao, 1945; Jeffreys, 1948: to leading (second) order,
\int p(y; \theta + \delta\theta) \log \frac{p(y; \theta + \delta\theta)}{p(y; \theta)}\, dy \approx \frac{1}{2}\, \delta\theta^T G(\theta)\, \delta\theta
where
G(\theta) = \mathbb{E}_{y \mid \theta}\left[ \left( \frac{\nabla_\theta\, p(y; \theta)}{p(y; \theta)} \right)^{T} \frac{\nabla_\theta\, p(y; \theta)}{p(y; \theta)} \right]
The Fisher Information G(\theta) is a positive-definite metric defining a Riemann manifold
Non-Euclidean geometry for probabilities: distances, metrics, invariants, curvature, geodesics
Asymptotic statistical analysis: Amari, 1981, 1985, 1990; Murray & Rice, 1993; Critchley et al., 1993; Kass, 1989; Efron, 1975; Dawid, 1975; Lauritzen, 1989
Statistical shape analysis: Kent et al., 1996; Dryden & Mardia, 1998
Can geometric structure be employed in MCMC methodology?

42 Geometric Concepts in MCMC
Tangent space: the local metric is defined by \delta\theta^T G(\theta)\, \delta\theta = \sum_{k,l} g_{kl}\, \delta\theta^k \delta\theta^l
Christoffel symbols characterise the connection on a curved manifold:
\Gamma^i_{kl} = \frac{1}{2} \sum_m g^{im} \left( \frac{\partial g_{mk}}{\partial \theta^l} + \frac{\partial g_{ml}}{\partial \theta^k} - \frac{\partial g_{kl}}{\partial \theta^m} \right)
Geodesics: the shortest path between two points on the manifold, satisfying
\frac{d^2 \theta^i}{dt^2} + \sum_{k,l} \Gamma^i_{kl} \frac{d\theta^k}{dt} \frac{d\theta^l}{dt} = 0
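The Christoffel symbols can be computed numerically for any metric by finite-differencing the formula above. A sketch under stated assumptions: the finite-difference step h and the check against the Normal-model Fisher metric (introduced later in the talk) are mine.

```python
import numpy as np

def christoffel(metric, theta, h=1e-5):
    """Gamma^i_{kl} = 0.5 * sum_m g^{im} (d_l g_mk + d_k g_ml - d_m g_kl),
    with the metric derivatives taken by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    D = theta.size
    Ginv = np.linalg.inv(metric(theta))
    dG = np.empty((D, D, D))  # dG[m] = d G / d theta_m
    for m in range(D):
        e = np.zeros(D)
        e[m] = h
        dG[m] = (metric(theta + e) - metric(theta - e)) / (2 * h)
    Gamma = np.zeros((D, D, D))
    for i in range(D):
        for k in range(D):
            for l in range(D):
                Gamma[i, k, l] = 0.5 * sum(
                    Ginv[i, m] * (dG[l][m, k] + dG[k][m, l] - dG[m][k, l])
                    for m in range(D))
    return Gamma

# Fisher metric of N(mu, sigma): G = diag(1/s^2, 2/s^2), theta = (mu, sigma)
normal_metric = lambda th: np.diag([1 / th[1]**2, 2 / th[1]**2])
Gamma = christoffel(normal_metric, [0.0, 2.0])
# e.g. Gamma^mu_{mu,sigma} = -1/sigma, which is -0.5 at sigma = 2
```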

47 Illustration of Geometric Concepts
Consider the Normal density p(x \mid \mu, \sigma) = \mathcal{N}_x(\mu, \sigma)
The local inner product on the tangent space is defined by the metric tensor, i.e. \delta\theta^T G(\theta)\, \delta\theta, where \theta = (\mu, \sigma)^T
The metric is the Fisher Information
G(\mu, \sigma) = \begin{bmatrix} \sigma^{-2} & 0 \\ 0 & 2\sigma^{-2} \end{bmatrix}
The inner product is \sigma^{-2}(\delta\mu^2 + 2\delta\sigma^2), so the densities \mathcal{N}(0, 1) and \mathcal{N}(1, 1) are further apart than the densities \mathcal{N}(0, 2) and \mathcal{N}(1, 2): the distance is non-Euclidean
The metric tensor for the univariate Normal defines a hyperbolic space
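A small numerical illustration of the non-Euclidean distance claim: the same Euclidean step \delta\mu = 1 has a larger squared local length under the Fisher metric at \sigma = 1 than at \sigma = 2. The values follow directly from the metric above; only the code framing is mine.

```python
import numpy as np

# Fisher metric of the univariate Normal with theta = (mu, sigma)
def fisher_metric(sigma):
    return np.diag([1 / sigma**2, 2 / sigma**2])

def local_sq_length(dtheta, sigma):
    """delta_theta^T G(theta) delta_theta = sigma^-2 (dmu^2 + 2 dsigma^2)."""
    dtheta = np.asarray(dtheta, dtype=float)
    return dtheta @ fisher_metric(sigma) @ dtheta

# Same Euclidean step dmu = 1, at sigma = 1 versus sigma = 2
near = local_sq_length([1.0, 0.0], 1.0)  # N(0,1) -> N(1,1)
far = local_sq_length([1.0, 0.0], 2.0)   # N(0,2) -> N(1,2)
```

Here `near` is 1 and `far` is 0.25: the two unit-variance densities are four times further apart (in squared length) than the two variance-4 densities, exactly as the slide states.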

48 M.C. Escher, Heaven and Hell, 1960

53 Langevin Diffusion on a Riemann Manifold
The discretised Langevin diffusion on the manifold defines the proposal mechanism
\theta^*_d = \theta_d + \frac{\epsilon^2}{2} \left( G^{-1}(\theta)\, \nabla_\theta L(\theta) \right)_d - \epsilon^2 \sum_{i,j=1}^{D} G^{-1}(\theta)_{ij}\, \Gamma^d_{ij} + \epsilon \left( \sqrt{G^{-1}(\theta)}\, z \right)_d
For a manifold with constant curvature the proposal mechanism reduces to
\theta^* = \theta + \frac{\epsilon^2}{2} G^{-1}(\theta)\, \nabla_\theta L(\theta) + \epsilon \sqrt{G^{-1}(\theta)}\, z
Compare the MALA proposal with pre-conditioning: \theta^* = \theta + \frac{\epsilon^2}{2} M \nabla_\theta L(\theta) + \epsilon \sqrt{M}\, z
Proposal and acceptance probability:
p_p(\theta \to \theta^*) = \mathcal{N}(\theta^* \mid \mu(\theta, \epsilon), \epsilon^2 G^{-1}(\theta))
a(\theta \to \theta^*) = \min\left[ 1,\ \frac{p(\theta^*)\, p_p(\theta^* \to \theta)}{p(\theta)\, p_p(\theta \to \theta^*)} \right]
The proposal mechanism diffuses approximately along the manifold
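The constant-curvature reduction above is the simplified manifold MALA; a minimal sketch follows. The correlated-Gaussian target is an illustrative assumption: for a Gaussian with the mean as parameter, the Fisher metric is the constant precision matrix, so the constant-curvature form applies exactly.

```python
import numpy as np

def simplified_mmala(log_p, grad_log_p, G, theta0, n_samples, eps=0.8, rng=None):
    """Simplified mMALA: the metric G(theta) preconditions both drift
    and noise; proposal is N(mu(theta, eps), eps^2 G(theta)^-1)."""
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_samples, theta.size))

    def mu_cov(t):  # proposal mean and covariance at t
        Ginv = np.linalg.inv(G(t))
        return t + 0.5 * eps**2 * Ginv @ grad_log_p(t), eps**2 * Ginv

    def log_q(a, mean, cov):  # log N(a | mean, cov), up to a constant
        d = a - mean
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (logdet + d @ np.linalg.solve(cov, d))

    for n in range(n_samples):
        mean, cov = mu_cov(theta)
        prop = rng.multivariate_normal(mean, cov)
        mean_b, cov_b = mu_cov(prop)
        log_a = (log_p(prop) + log_q(theta, mean_b, cov_b)
                 - log_p(theta) - log_q(prop, mean, cov))
        if np.log(rng.uniform()) < log_a:
            theta = prop
        samples[n] = theta
    return samples

# Strongly correlated Gaussian target; the metric is its precision matrix
P = np.linalg.inv(np.array([[1.0, 0.95], [0.95, 1.0]]))
samples = simplified_mmala(lambda t: -0.5 * t @ P @ t, lambda t: -P @ t,
                           lambda t: P, np.zeros(2), 5000, rng=2)
```

With the metric matched to the target's precision, the sampler moves comfortably along the correlated ridge that defeats an isotropic proposal.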

54 Langevin Diffusion on a Riemann Manifold
[Figure: Langevin diffusion proposal paths across the manifold]

62 Geodesic Flow as a Proposal Mechanism
It is desirable that proposals follow a direct path on the manifold: geodesics
Define geodesic flow on the manifold by solving
\frac{d^2 \theta^i}{dt^2} + \sum_{k,l} \Gamma^i_{kl} \frac{d\theta^k}{dt} \frac{d\theta^l}{dt} = 0
How can this be exploited in the design of a transition operator? A slight detour is needed
First define the log-density under the model as L(\theta)
Introduce an auxiliary variable p \sim \mathcal{N}(0, G(\theta))
The negative joint log-density is
H(\theta, p) = -L(\theta) + \frac{1}{2} \log\left( (2\pi)^D |G(\theta)| \right) + \frac{1}{2} p^T G(\theta)^{-1} p

67 Riemannian Hamiltonian Monte Carlo
The negative joint log-density is a Hamiltonian defined on the Riemann manifold:
H(\theta, p) = \underbrace{-L(\theta) + \frac{1}{2} \log\left( (2\pi)^D |G(\theta)| \right)}_{\text{Potential Energy}} + \underbrace{\frac{1}{2} p^T G(\theta)^{-1} p}_{\text{Kinetic Energy}}
The marginal density follows as required:
\int \frac{\exp\{L(\theta)\}}{\sqrt{(2\pi)^D |G(\theta)|}} \exp\left\{ -\frac{1}{2} p^T G(\theta)^{-1} p \right\} dp = \exp\{L(\theta)\}
Obtain samples from the marginal p(\theta) using a Gibbs sampler for p(\theta, p):
p^{n+1} \mid \theta^n \sim \mathcal{N}(0, G(\theta^n)), \qquad \theta^{n+1} \mid p^{n+1} \sim p(\theta^{n+1} \mid p^{n+1})
Employ Hamiltonian dynamics to propose samples for p(\theta^{n+1} \mid p^{n+1}) (Duane et al., 1987; Neal, 2010):
\frac{d\theta}{dt} = \frac{\partial H(\theta, p)}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H(\theta, p)}{\partial \theta}
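The Gibbs-refresh scheme can be sketched with a fixed identity metric, which is standard HMC with the leapfrog integrator: refresh p, simulate the dynamics, then accept/reject on the change in H. This is only the base scheme RMHMC generalises; the position-dependent G(\theta) requires a generalised (implicit) leapfrog, which is omitted here, and the illustrative target is my assumption.

```python
import numpy as np

def hmc(log_p, grad_log_p, theta0, n_samples, eps=0.1, n_leap=20, rng=None):
    """HMC with a fixed identity metric: Gibbs-refresh the momentum,
    run leapfrog on H(theta, p) = -L(theta) + p.p/2, then accept or
    reject on the change in H to correct integration error."""
    rng = np.random.default_rng(rng)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    samples = np.empty((n_samples, theta.size))
    for n in range(n_samples):
        p = rng.standard_normal(theta.size)  # p ~ N(0, I): Gibbs step
        H0 = -log_p(theta) + 0.5 * p @ p
        t, g = theta.copy(), grad_log_p(theta)
        for _ in range(n_leap):              # leapfrog integration
            p = p + 0.5 * eps * g
            t = t + eps * p
            g = grad_log_p(t)
            p = p + 0.5 * eps * g
        H1 = -log_p(t) + 0.5 * p @ p
        if np.log(rng.uniform()) < H0 - H1:  # accept w.p. min(1, e^{H0 - H1})
            theta = t
        samples[n] = theta
    return samples

# Illustrative target: standard Normal
samples = hmc(lambda t: -0.5 * np.sum(t**2), lambda t: -t, [3.0], 2000, rng=3)
```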

75 Riemannian Manifold Hamiltonian Monte Carlo
Consider the Hamiltonian \tilde{H}(\theta, p) = \frac{1}{2} p^T \tilde{G}(\theta)^{-1} p
Hamiltonians with only a quadratic kinetic-energy term exactly describe geodesic flow on the coordinate space \theta with metric \tilde{G}
However, our Hamiltonian is H(\theta, p) = V(\theta) + \frac{1}{2} p^T G(\theta)^{-1} p
If we define \tilde{G}(\theta) = G(\theta)\,(h - V(\theta)), where h is a constant, then the Maupertuis principle tells us that the Hamiltonian flows for H(\theta, p) and \tilde{H}(\theta, p) are equivalent along the energy level h
The solution of
\frac{d\theta}{dt} = \frac{\partial H(\theta, p)}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H(\theta, p)}{\partial \theta}
is therefore equivalent to the solution of the geodesic equation
\frac{d^2 \theta^i}{dt^2} + \sum_{k,l} \tilde{\Gamma}^i_{kl} \frac{d\theta^k}{dt} \frac{d\theta^l}{dt} = 0
RMHMC proposals are along the manifold geodesics

77 Warped Bivariate Gaussian
p(w_1, w_2 \mid y, \sigma_x, \sigma_y) \propto \prod_{n=1}^{N} \mathcal{N}(y_n \mid w_1 + w_2^2, \sigma_y^2)\, \mathcal{N}(w_1, w_2 \mid 0, \sigma_x^2 I)
[Figure: samples in the (w_1, w_2) plane from HMC (left) and RMHMC (right)]

80 Gaussian Mixture Model
Univariate finite mixture model:
p(x_i \mid \theta) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)
The FI-based metric tensor is non-analytic, so employ the empirical FI:
G(\theta) = \frac{1}{N} S^T S - \frac{1}{N^2} s s^T \ \xrightarrow{N}\ \mathrm{cov}\left( \nabla_\theta l(\theta) \right) = I(\theta)
\frac{\partial G(\theta)}{\partial \theta_d} = \frac{1}{N} \left( \frac{\partial S^T}{\partial \theta_d} S + S^T \frac{\partial S}{\partial \theta_d} \right) - \frac{1}{N^2} \left( \frac{\partial s}{\partial \theta_d} s^T + s \frac{\partial s^T}{\partial \theta_d} \right)
with score matrix S with elements S_{i,d} = \frac{\partial \log p(x_i \mid \theta)}{\partial \theta_d} and s = \sum_{i=1}^{N} S_{i,\cdot}^T
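The empirical FI above is straightforward to compute from the score matrix. A sketch that checks it against the exact Fisher information of a single Normal model (the sample size and parameter values are illustrative assumptions):

```python
import numpy as np

def empirical_fisher(score):
    """Empirical Fisher information from the N x D score matrix S,
    S[i, d] = d log p(x_i | theta) / d theta_d:
    G = S^T S / N - s s^T / N^2 with s the column sum of S,
    i.e. the sample covariance of the per-observation scores."""
    S = np.asarray(score, dtype=float)
    N = S.shape[0]
    s = S.sum(axis=0)
    return S.T @ S / N - np.outer(s, s) / N**2

# Check on N(mu, sigma^2): the scores are ((x - mu)/s^2, -1/s + (x - mu)^2/s^3),
# and G should approach the exact Fisher information diag(1/s^2, 2/s^2).
rng = np.random.default_rng(0)
mu, s = 0.0, 2.0
x = rng.normal(mu, s, size=200000)
S = np.column_stack([(x - mu) / s**2, -1 / s + (x - mu) ** 2 / s**3])
G = empirical_fisher(S)
```

For s = 2 the exact values are 0.25 and 0.5 on the diagonal with zero off-diagonal, which the estimate recovers to sampling accuracy.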

82 Gaussian Mixture Model
Univariate finite mixture model:
p(x \mid \mu, \sigma^2) = 0.7\, \mathcal{N}(x \mid 0, \sigma^2) + 0.3\, \mathcal{N}(x \mid \mu, \sigma^2)
Figure: Arrows correspond to the gradients and ellipses to the inverse metric tensor; dashed lines are isocontours of the joint log-density (axes: \mu against \ln\sigma).

83 Gaussian Mixture Model
Figure: Comparison of MALA (left), mMALA (middle) and simplified mMALA (right) convergence paths and autocorrelation plots. Autocorrelation plots are from the stationary chains, i.e. once the chains have converged to the stationary distribution.

84 Gaussian Mixture Model
Figure: Comparison of HMC (left), RMHMC (middle) and Gibbs (right) convergence paths and autocorrelation plots. Autocorrelation plots are from the stationary chains, i.e. once the chains have converged to the stationary distribution.

88 Log-Gaussian Cox Point Process with Latent Field
The joint density for the Poisson counts and the latent Gaussian field:
p(y, x \mid \mu, \sigma, \beta) \propto \prod_{i,j}^{64} \exp\{ y_{i,j} x_{i,j} - m \exp(x_{i,j}) \} \times \exp\left( -\frac{1}{2} (x - \mu 1)^T \Sigma_\theta^{-1} (x - \mu 1) \right)
Metric tensors:
G(\theta)_{i,j} = \frac{1}{2} \mathrm{trace}\left( \Sigma_\theta^{-1} \frac{\partial \Sigma_\theta}{\partial \theta_i} \Sigma_\theta^{-1} \frac{\partial \Sigma_\theta}{\partial \theta_j} \right)
G(x) = \Lambda + \Sigma_\theta^{-1}, where \Lambda is diagonal with elements m \exp(\mu + (\Sigma_\theta)_{i,i})
The latent-field metric tensor, obtained from the parameter conditional, defines a flat manifold; forming it costs O(N^3)
MALA requires a transformation of the latent field to sample, with separate tuning in the transient and stationary phases of the Markov chain; the manifold methods require no pilot tuning or additional transformations
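The latent-field metric G(x) = \Lambda + \Sigma_\theta^{-1} is constant in x, which is why the latent-field manifold is flat. A minimal sketch of forming it, using a toy 2x2 covariance standing in for the full grid covariance (an assumption for illustration only):

```python
import numpy as np

def lgcp_latent_metric(Sigma, mu, m):
    """Metric tensor for the LGCP latent field: G(x) = Lambda + Sigma^{-1},
    with Lambda diagonal, Lambda_ii = m * exp(mu + Sigma_ii).
    The result does not depend on x, so the manifold is flat."""
    Lam = m * np.exp(mu + np.diag(Sigma))
    return np.diag(Lam) + np.linalg.inv(Sigma)

# Toy 2x2 covariance in place of the full grid covariance (assumption)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
G = lgcp_latent_metric(Sigma, mu=0.0, m=0.25)
```

Since \Lambda has positive entries and \Sigma_\theta^{-1} is positive definite, G is symmetric positive definite as a metric must be; the O(N^3) cost in the slide is the inversion (or factorisation) of \Sigma_\theta.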

89 RMHMC for Log-Gaussian Cox Point Processes
Table: Sampling the latent variables of a Log-Gaussian Cox Process - comparison of sampling methods

Method             | Time   | ESS (Min, Med, Max) | s/min ESS | Rel. Speed
MALA (Transient)   | 31,577 | (3, 8, 50)          | 10,605    | 1
MALA (Stationary)  | 31,118 | (4, 16, 80)         |           |
mMALA              | 634    | (26, 84, 174)       |           |
RMHMC              | 2936   | (1951, 4545, 5000)  |           |

102 Conclusion and Discussion
Geometry of statistical models harnessed in Monte Carlo methods
Diffusions that respect the structure and curvature of the space: Manifold MALA
Geodesic flows on the model manifold: RMHMC, a generalisation of HMC
Assessed on correlated and high-dimensional latent variable models
Promising capability of the methodology
Ongoing development:
Potential bottleneck at metric tensor and Christoffel symbols
Theoretical analysis of convergence
Investigation of alternative manifold structures
Design and effect of the metric
Optimality of Hamiltonian flows as local geodesics
Alternative transition kernels
No silver bullet or cure-all, but a new, powerful methodology for the MC toolkit

103 Funding Acknowledgment
Girolami funded by an EPSRC Advanced Research Fellowship and a BBSRC project grant
Calderhead supported by a Microsoft Research PhD Scholarship
