Zig-Zag Monte Carlo. Delft University of Technology. Joris Bierkens February 7, 2017


1 Zig-Zag Monte Carlo. Joris Bierkens, Delft University of Technology. February 7, 2017.

2 Acknowledgements
Collaborators: Andrew Duncan, Paul Fearnhead, Antonietta Mira, Gareth Roberts.
Financial support.

3 Outline
1 Motivation: Markov Chain Monte Carlo
2 One-dimensional Zig-Zag process
3 Multi-dimensional ZZP
4 Subsampling
5 Doubly intractable likelihood

4 Bayesian inference
In Bayesian inference we typically deal with a posterior density
π(x) = π(x; y) ∝ L(y | x) π₀(x), x ∈ ℝ^d,
where L(y | x) is the likelihood of the data y given the parameter x ∈ ℝ^d, and π₀ is a prior density for x.
Quantities of interest are e.g.
posterior mean ∫ x π(x) dx,
posterior variance ∫ x² π(x) dx − (∫ x π(x) dx)²,
tail probability ∫ 1_{x ≥ c} π(x) dx.
All of these involve integrals of the form ∫ h(x) π(x) dx.

8 Evaluating ∫ h(x)π(x) dx
Possible approaches:
1 Explicit (analytic) integration. Rarely possible.
2 Numerical integration. Curse of dimensionality.
3 Monte Carlo. Draw independent samples X₁, X₂, ... from π and use the law of large numbers:
∫ h(x)π(x) dx = lim_{K→∞} (1/K) Σ_{k=1}^{K} h(X_k).
Requires independent samples from π.
4 Markov Chain Monte Carlo. Construct an ergodic Markov chain (X₁, X₂, ...) with invariant distribution π(x) dx and use Birkhoff's ergodic theorem; the same limit
∫ h(x)π(x) dx = lim_{K→∞} (1/K) Σ_{k=1}^{K} h(X_k)
holds along the chain.
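As an illustration of approach 4, here is a minimal random-walk Metropolis sketch (not part of the talk) targeting N(0, 1); the function and variable names are hypothetical.

```python
import math
import random

def rwm(log_density, x0, n_iter, step, rng):
    """Random-walk Metropolis: symmetric Gaussian proposal, accept/reject."""
    x, lp = x0, log_density(x0)
    chain = []
    for _ in range(n_iter):
        y = x + step * rng.gauss(0.0, 1.0)     # propose
        lpy = log_density(y)
        if math.log(rng.random()) < lpy - lp:  # Metropolis acceptance step
            x, lp = y, lpy
        chain.append(x)
    return chain

rng = random.Random(1)
chain = rwm(lambda x: -0.5 * x * x, 0.0, 50_000, 2.0, rng)  # target N(0, 1)
post_mean = sum(chain) / len(chain)
post_var = sum(c * c for c in chain) / len(chain) - post_mean ** 2
```

The ergodic averages approximate the posterior mean and variance despite the samples being correlated.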

9 One-dimensional Zig-Zag process
Dynamics (continuous time):
Current state (X(t), Θ(t)) ∈ ℝ × {−1, +1}.
Move X(t) in direction Θ(t) = ±1 until a switch occurs.
The switching intensity is λ(X(t), Θ(t)).

13 Relation between switching rate and potential
Generator: Lf(x, θ) = θ df/dx(x, θ) + λ(x, θ)(f(x, −θ) − f(x, θ)), x ∈ ℝ, θ ∈ {−1, +1}.
Potential: U(x) = −log π(x).
π is invariant if and only if λ(x, +1) − λ(x, −1) = U′(x) for all x.
Equivalently, λ(x, θ) = γ(x) + max(0, θU′(x)) with γ(x) ≥ 0.
Example: Gaussian distribution N(0, σ²).
Density π(x) ∝ exp(−x²/(2σ²)).
Potential U(x) = x²/(2σ²).
Derivative U′(x) = x/σ².
Switching rates λ(x, θ) = (θx/σ²)₊ + γ(x).
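For the Gaussian rates above (with γ ≡ 0) the integrated switching rate inverts in closed form, so the Zig-Zag process can be simulated exactly; a sketch with illustrative helper names:

```python
import math
import random

def zigzag_gaussian(sigma, n_switches, rng):
    """Zig-Zag for N(0, sigma^2): lambda(x, th) = (th*x/sigma^2)_+, gamma = 0.
    Along x(t) = x + th*t the integrated rate inverts in closed form."""
    x, th = 0.0, 1.0
    t_int = x_int = x2_int = 0.0          # total time, int x dt, int x^2 dt
    for _ in range(n_switches):
        e = rng.expovariate(1.0)          # Exp(1) threshold
        u = th * x                        # signed position along direction
        # solve int_0^tau (u + s)_+ / sigma^2 ds = e for the switch time
        tau = -u + math.sqrt(max(u, 0.0) ** 2 + 2.0 * sigma ** 2 * e)
        x_new = x + th * tau
        # exact integrals of x and x^2 along the linear segment
        x_int += x * tau + th * tau ** 2 / 2.0
        x2_int += (x_new ** 3 - x ** 3) / (3.0 * th)
        t_int += tau
        x, th = x_new, -th                # switch direction
    return x_int / t_int, x2_int / t_int

rng = random.Random(7)
mean, second_moment = zigzag_gaussian(1.5, 200_000, rng)
```

The time averages converge to the first and second moments of N(0, σ²), here 0 and 2.25.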

24 Proof of invariance of π ∝ exp(−U)
Lf(x, θ) = θ ∂f/∂x(x, θ) + λ(x, θ)(f(x, −θ) − f(x, θ)), with λ(x, +1) − λ(x, −1) = U′(x).
Markov semigroup: P(t)f(x, θ) = E_{x,θ} f(X(t), Θ(t)).
π stationary means that
Σ_{θ=±1} ∫ P(t)f(x, θ) π(x) dx = Σ_{θ=±1} ∫ f(x, θ) π(x) dx for all f ∈ D(L), t ≥ 0.
Differentiating gives the equivalent condition
Σ_{θ=±1} ∫ Lf(x, θ) π(x) dx = 0, f ∈ D(L).
For the switching part,
Σ_{θ=±1} ∫ λ(x, θ)(f(x, −θ) − f(x, θ)) π(x) dx
= ∫ {λ(x, +1)(f(x, −1) − f(x, +1)) + λ(x, −1)(f(x, +1) − f(x, −1))} π(x) dx
= ∫ (f(x, −1) − f(x, +1))(λ(x, +1) − λ(x, −1)) π(x) dx
= ∫ (f(x, −1) − f(x, +1)) U′(x) π(x) dx
= −∫ (f(x, −1) − f(x, +1)) π′(x) dx      (since π′ = −U′π)
= ∫ ∂/∂x (f(x, −1) − f(x, +1)) π(x) dx   (integration by parts)
= −Σ_{θ=±1} ∫ θ ∂f/∂x(x, θ) π(x) dx,
which cancels the transport part of L, so Σ_{θ=±1} ∫ Lf(x, θ) π(x) dx = 0.

25 Use in Monte Carlo
(X(t), Θ(t))_{t≥0} has invariant distribution proportional to π(x). If ergodic,
lim_{T→∞} (1/T) ∫₀^T h(X(s)) ds = ∫ h(x)π(x) dx.
How to use in computations? Either:
Numerically integrate (1/T) ∫₀^T h(X_s) ds for some finite T > 0, or
Define (X₁, X₂, ...) by setting X_k = X(kΔ) for some Δ > 0; use as in traditional MCMC.
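The second option can be sketched as a small interpolation routine over the piecewise-linear path; the function name and the toy path are illustrative:

```python
def discretize(event_times, event_positions, delta, n_samples):
    """Sample X(k*delta) from a piecewise-linear path given by its switch
    times t_0 < t_1 < ... and the positions X(t_j) at those times."""
    samples, j = [], 0
    for k in range(n_samples):
        t = k * delta
        while j + 1 < len(event_times) and event_times[j + 1] <= t:
            j += 1                         # advance to the segment holding t
        t0, t1 = event_times[j], event_times[j + 1]
        x0, x1 = event_positions[j], event_positions[j + 1]
        # linear interpolation on the current segment
        samples.append(x0 + (x1 - x0) * (t - t0) / (t1 - t0))
    return samples

# path: up from 0 to 1 on [0, 1], then down to -1 on [1, 3]
xs = discretize([0.0, 1.0, 3.0], [0.0, 1.0, -1.0], 0.5, 5)
# -> [0.0, 0.5, 1.0, 0.5, 0.0]
```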

26 CLT for the 1D Zig-Zag process [B., Duncan, Limit theorems for the Zig-Zag process, 2016]
X(t) satisfies a Central Limit Theorem (CLT) for an observable h if
(1/√T) ∫₀^T [h(X_s) − E_π h(X)] ds ⇒ N(0, σ_h²) as T → ∞.
Example: unimodal potential/density function. [Figure: a trajectory of X(t) with successive crossing times T_i and excursions S_i.] Say Y_i = ∫_{T_{i−1}}^{T_i} h(X_s) ds. The CLT for the ZZP then follows essentially from the CLT for Σ_{i=1}^{N(t)} Y_i.

27 CLT for the 1D Zig-Zag process [B., Duncan, Limit theorems for the Zig-Zag process, 2016]
General formula for the asymptotic variance:
σ_h² = 2 ∫ (λ(x, +1) + λ(x, −1)) φ′(x)² π(x) dx, where L_Langevin φ = h̄ := h − π(h).
Langevin diffusion: σ_h² = 2 ∫ φ′(x)² π(x) dx.
Cool results:
Computational efficiency of the ZZP is better than IID sampling for the Gaussian (oscillatory ACF).
Student-t distribution with ν degrees of freedom: the Langevin diffusion satisfies a CLT for ν > 2; the Zig-Zag process satisfies a CLT for ν > 1.

32 Multi-dimensional Zig-Zag process
Target π(x) ∝ exp(−U(x)) on ℝ^d. Set of directions θ ∈ {−1, +1}^d.
Switching rates λ_i(x, θ) = (θ_i ∂_i U(x))₊, for i = 1, ..., d.
Cool observation: for a factorized target distribution π(x) = Π_{i=1}^{d} π_i(x_i) with π_i(y) ∝ exp(−U_i(y)), the switching rates reduce to λ_i(x, θ) = (θ_i U_i′(x_i))₊.
Every component of the Zig-Zag process mixes at O(1). Compare to RWM O(d), MALA O(d^{1/3}), HMC O(d^{1/4}).

36 Sampling
λ(x) = max(0, dU/dx), with a computational upper bound Λ(x) ≥ λ(x).
Draw T with P(T ≥ t) = exp(−∫₀^t Λ(X(s)) ds).
Accept T with probability λ(X(T))/Λ(X(T)).
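With a constant bound this scheme is classical Poisson thinning for the first event of an inhomogeneous Poisson process; a generic sketch (illustrative names), verified here on a constant rate:

```python
import random

def first_event_thinning(rate, bound, rng):
    """First arrival of an inhomogeneous Poisson process with intensity
    rate(t) <= bound, via thinning: propose from the homogeneous bound
    process, accept each candidate with probability rate(T)/bound."""
    t = 0.0
    while True:
        t += rng.expovariate(bound)         # candidate event time
        if rng.random() < rate(t) / bound:  # thinning accept step
            return t

rng = random.Random(3)
# constant rate 2.0 bounded by 5.0: the first event is Exp(2), mean 0.5
draws = [first_event_thinning(lambda t: 2.0, 5.0, rng) for _ in range(100_000)]
mean_t = sum(draws) / len(draws)
```

In the Zig-Zag setting, rejected candidates do not switch the direction; the process simply continues along the current segment.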

39 Subsampling
Setting: U = ½(U₁ + U₂), with per-term rates λ₁, λ₂ and a computational upper bound Λ(x) ≥ max(λ₁(x), λ₂(x)).
Draw T with P(T ≥ t) = exp(−∫₀^t Λ(X(s)) ds).
Draw I from {1, 2} uniformly.
Accept T with probability λ_I(X(T))/Λ(X(T)).
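A complete toy run of this scheme, assuming two quadratic "data" terms U_i(x) = (x − y_i)², so that U = ½(U₁ + U₂) and the target is N((y₁+y₂)/2, ½); along a segment each rate is dominated by the linear bound Λ(t) = 2(M + t). All names are illustrative:

```python
import math
import random

def zigzag_subsampled(y1, y2, n_events, rng):
    """Zig-Zag with subsampling for U_i(x) = (x - y_i)^2, U = (U1 + U2)/2,
    i.e. target density exp(-U) = N((y1+y2)/2, 1/2) up to a constant.
    lambda_i(x, th) = (2*th*(x - y_i))_+; along x(t) = x + th*t each rate
    is bounded by Lambda(t) = 2*(M + t) with M = max_i (th*(x - y_i))_+ ."""
    x, th = 0.0, 1.0
    t_int = x_int = x2_int = 0.0
    for _ in range(n_events):
        m = max(0.0, th * (x - y1), th * (x - y2))
        e = rng.expovariate(1.0)
        # invert int_0^T 2*(m + s) ds = 2*m*T + T^2 = e
        tau = -m + math.sqrt(m * m + e)
        x_new = x + th * tau
        # accumulate exact segment integrals for the time averages
        x_int += x * tau + th * tau ** 2 / 2.0
        x2_int += (x_new ** 3 - x ** 3) / (3.0 * th)
        t_int += tau
        x = x_new
        yi = rng.choice((y1, y2))                  # uniform subsample
        lam_i = max(0.0, 2.0 * th * (x - yi))
        if rng.random() < lam_i / (2.0 * (m + tau)):
            th = -th                               # accepted: switch
    mean = x_int / t_int
    return mean, x2_int / t_int - mean ** 2

rng = random.Random(11)
mean, var = zigzag_subsampled(-1.0, 3.0, 400_000, rng)
```

The time averages should recover the posterior mean 1.0 and variance 0.5, even though each switch decision looks at a single data term.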

43 Subsampling
Intractable likelihood, big data: U(x) = (1/n) Σ_{i=1}^{n} U_i(x). If π(x) ∝ Π_{i=1}^{n} f(y_i | x) π₀(x), take
U_i(x) = −log π₀(x) − n log f(y_i | x).
Theorem. With subsampling, the Zig-Zag process has exp(−U) as invariant density.
Proof: the effective switching rate is
λ(x, θ) = (1/n) Σ_{i=1}^{n} λ_i(x, θ) = (1/n) Σ_{i=1}^{n} (θ U_i′(x))₊,
so that
λ(x, +1) − λ(x, −1) = (1/n) Σ_{i=1}^{n} {(U_i′(x))₊ − (−U_i′(x))₊} = (1/n) Σ_{i=1}^{n} U_i′(x) = U′(x).

47 Subsampling: scaling
Without subsampling: O(n) computations per O(1) update.
With naive subsampling: O(1) computations per O(1/n) update.
Subsampling with control variates: O(1) computations per O(1) update: super-efficient.
The control-variates approach depends on posterior contraction and requires finding a point close to the mode: O(n) start-up cost.

48 Control variates
U(x) = (1/n) Σ_{i=1}^{n} U_i(x). Let x* denote (a point close to) the mode of the posterior distribution.
Naive subsampling: λ_i(x, θ) = (θ U_i′(x))₊.
Control variates: λ_i(x, θ) = (θ {U_i′(x) + U′(x*) − U_i′(x*)})₊.
If x is close to the mode then U_i′(x) − U_i′(x*) is small (under assumptions on U), so each λ_i(x, θ) is close to the ideal switching rate (θ U′(x))₊.
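A small sketch of the control-variate rates together with a Lipschitz-type computational bound; the quadratic terms and all names are illustrative (for quadratic U_i the control-variate rate is in fact exactly the ideal rate, so the bound is easy to check):

```python
import random

def cv_rate(i, x, th, xstar, dU, dUi):
    """Control-variate rate (th*(dU(x*) + dUi(i, x) - dUi(i, x*)))_+ ."""
    return max(0.0, th * (dU(xstar) + dUi(i, x) - dUi(i, xstar)))

def cv_bound(x, th, xstar, dU, lip):
    """Bound (th*dU(x*))_+ + L*|x - x*|, valid for every term i when each
    dUi(i, .) is L-Lipschitz."""
    return max(0.0, th * dU(xstar)) + lip * abs(x - xstar)

# toy posterior terms: U_i(x) = n*(x - y_i)^2/2, so U = (1/n) sum_i U_i
ys = [0.5, -1.2, 2.0, 0.3]
n = len(ys)
dUi = lambda i, x: n * (x - ys[i])           # U_i'(x), n-Lipschitz
dU = lambda x: sum(dUi(i, x) for i in range(n)) / n
xstar = sum(ys) / n                          # exact posterior mode here

rng = random.Random(5)
ok = True
for _ in range(1000):                        # spot-check the bound
    x = rng.uniform(-5.0, 5.0)
    for th in (-1.0, 1.0):
        for i in range(n):
            if cv_rate(i, x, th, xstar, dU, dUi) > cv_bound(x, th, xstar, dU, n) + 1e-9:
                ok = False
```

The bound depends on x only through |x − x*|, which is what makes the O(1)-per-update thinning scheme possible.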

49 Experiments with 100 observations. [Figures omitted.]

52 Experiments with 10,000 observations. [Figures omitted.]

55 Scaling in number of observations
[Figure: log₂(ESS / number of epochs) against log₂(number of observations), for Zig-Zag, Zig-Zag with subsampling, Zig-Zag with control variates, and Zig-Zag with a poor computational bound.]

56 Scaling in number of observations
[Figure: log₂(ESS / second) against log₂(number of observations), for Zig-Zag, Zig-Zag with subsampling, Zig-Zag with control variates, and Zig-Zag with a poor computational bound.]

57 Doubly intractable likelihood
In many applications the distribution of interest π has the following form:
π(x; y) = exp(Σ_{i=1}^{d} x_i s_i(y)) / (Z(y) M(x)) · π₀(x), x ∈ ℝ^d,
where y ∈ {0, 1}ⁿ is a fixed observed realization of the forward model,
p(y | x) = exp(Σ_{i=1}^{d} x_i s_i(y)) / M(x),
s_i, i = 1, ..., d, are statistics which characterize the distribution of the forward model, with weights x_1, ..., x_d, and Z(y) is the usual normalization constant.
Computational problem: computation of M(x) is O(2ⁿ):
M(x) = Σ_{y ∈ {0,1}ⁿ} exp(Σ_{i=1}^{d} x_i s_i(y)).

58 Examples of doubly intractable likelihood
p(y | x) = exp(Σ_{i=1}^{d} x_i s_i(y)) / M(x), x ∈ ℝ^d, y ∈ {0, 1}ⁿ.
Ising model (physics, image analysis):
s₁(y) = yᵀ W y, where W is an interaction matrix;
s₂(y) = hᵀ y, where h represents an external magnetic field;
x₁, x₂ serve as inverse temperatures.
Exponential Random Graph Model:
random graphs over k vertices, with n := ½k(k−1) possible edges;
y₁, ..., yₙ indicate the presence of an edge;
s₁(y): number of edges in the random graph;
s₂(y): e.g. number of triangles in the random graph.

61 The Zig-Zag process applied to doubly intractable likelihood
For simplicity, say x ∈ ℝ and ignore the prior distribution:
π(x; y) = exp(x s(y)) / (Z(y) M(x)), M(x) = Σ_{z ∈ {0,1}ⁿ} exp(x s(z)), x ∈ ℝ, y ∈ {0, 1}ⁿ,
so that
U(x) = −log π(x; y) = −x s(y) + log M(x).
For the derivative of U we find
U′(x) = −s(y) + d log M(x)/dx = −s(y) + Σ_{z ∈ {0,1}ⁿ} exp(x s(z)) s(z) / M(x) = −s(y) + E_x[s(Y)],
where Y is a realization of the forward model with parameter x.

62 The Zig-Zag process applied to doubly intractable likelihood
U′(x) = −s(y) + E_x[s(Y)]. Switching rate complexity: O(2ⁿ). For x ∈ ℝ, θ ∈ {−1, +1},
λ(x, θ) = max(θ U′(x), 0) = max(−θ s(y) + θ E_x[s(Y)], 0).
Idea: use an unbiased estimate of E_x[s(Y)].
Crude algorithm for determining the next switch:
1 Determine an upper bound Λ(x) for λ(x, θ).
2 Generate a switching time according to P(T ≥ t) = exp(−∫₀^t Λ(X(r)) dr).
3 Obtain an unbiased estimate Ĝ of dU(x)/dx.
4 Accept the switch with probability max(0, θĜ)/Λ(X(T)); otherwise repeat.
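Steps 1 to 4 can be sketched for a continuous toy target where the true gradient is known, replacing U′(X(T)) by a noisy unbiased estimate Ĝ. Since E(θĜ)₊ − E(−θĜ)₊ = θ E[Ĝ], the effective rates still satisfy the invariance condition, so the invariant density is preserved. The noise model and all names are illustrative:

```python
import math
import random

def zigzag_noisy_gradient(c, n_events, rng):
    """Zig-Zag for N(0, 1) (U'(x) = x), where each switch decision uses a
    noisy unbiased estimate G = U'(X(T)) + Uniform(-c, c).  Along
    x(t) = x + th*t, (th*G)_+ <= (th*x)_+ + t + c, a linear bound rate
    whose integral inverts in closed form."""
    x, th = 0.0, 1.0
    t_int = x_int = x2_int = 0.0
    for _ in range(n_events):
        a = max(0.0, th * x) + c
        e = rng.expovariate(1.0)
        tau = -a + math.sqrt(a * a + 2.0 * e)   # solves a*T + T^2/2 = e
        x_new = x + th * tau
        x_int += x * tau + th * tau ** 2 / 2.0  # exact segment integrals
        x2_int += (x_new ** 3 - x ** 3) / (3.0 * th)
        t_int += tau
        x = x_new
        g = x + rng.uniform(-c, c)              # unbiased gradient estimate
        if rng.random() < max(0.0, th * g) / (a + tau):
            th = -th
    mean = x_int / t_int
    return mean, x2_int / t_int - mean ** 2

rng = random.Random(4)
mean, var = zigzag_noisy_gradient(1.0, 300_000, rng)
```

Despite the noisy switch decisions, the time averages still recover the N(0, 1) moments; the noise only inflates the total switching rate, and hence the asymptotic variance.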

64 Unbiased estimation of E_x[s(Y)]
Two possible approaches:
1 Perfect sampling / coupling from the past (Propp, Wilson, 1996): use Glauber dynamics in an ingenious way to obtain a sample Y which is distributed exactly according to the forward distribution π(· | x). Disadvantages: not applicable to all discrete models; exponentially slow convergence in cold temperature regimes.
2 Unbiased MCMC sampling (Glynn, Rhee, 2014): introduce an ℕ-valued random variable N and define Δ_i := s(Y_i) − s(Ỹ_i), where (Y_i) and (Ỹ_i) are two realizations of Glauber dynamics, correlated in a specific way. Unbiased estimate:
Ĝ = Σ_{i=0}^{N} Δ_i / P(N ≥ i).
Disadvantages: no global upper bound for the estimate; the variance may be extremely large.
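The Glynn-Rhee construction can be illustrated on a deterministic toy sequence Y_i = 1 − 2^{−i} converging to 1 (standing in for the coupled Glauber averages); with P(N ≥ i) = q^i, the weighted telescoping sum is unbiased for the limit. All names are illustrative:

```python
import random

def glynn_rhee(seq, q, rng):
    """Debiased estimator G = sum_{i=0}^{N} D_i / P(N >= i), with
    D_0 = seq(0), D_i = seq(i) - seq(i-1), and N geometric so that
    P(N >= i) = q**i.  E[G] equals lim_i seq(i) when the tails behave."""
    g, i, p = seq(0), 0, 1.0
    while rng.random() < q:        # extend N with probability q
        i += 1
        p *= q                     # p = P(N >= i)
        g += (seq(i) - seq(i - 1)) / p
    return g

seq = lambda i: 1.0 - 0.5 ** i     # biased approximations of the limit 1
rng = random.Random(2)
est = sum(glynn_rhee(seq, 0.6, rng) for _ in range(100_000)) / 100_000
```

Here q = 0.6 keeps the increments 0.5^i / 0.6^i summable, so the estimator has finite variance; choosing q too small would make it blow up, which is exactly the drawback noted on the slide.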

65 Zig-Zag process: conclusions
We can use piecewise deterministic Markov processes for sampling.
An unbiased estimate of the log-density gradient results in the correct invariant distribution.
Significantly better scaling than IID sampling for big data.
Doubly intractable likelihood: work in progress.

66 References
B., Roberts, A piecewise deterministic scaling limit of Lifted Metropolis-Hastings in the Curie-Weiss model, to appear in Annals of Applied Probability, 2015.
B., Fearnhead, Roberts, The Zig-Zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data, 2016.
B., Duncan, Limit theorems for the Zig-Zag process, 2016.
B., Fearnhead, Pollock, Roberts, Piecewise Deterministic Markov Processes for Continuous-Time Monte Carlo.
Thank you!


More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 13-28 February 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Limitations of Gibbs sampling. Metropolis-Hastings algorithm. Proof

More information

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018 Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Pseudo-marginal MCMC methods for inference in latent variable models

Pseudo-marginal MCMC methods for inference in latent variable models Pseudo-marginal MCMC methods for inference in latent variable models Arnaud Doucet Department of Statistics, Oxford University Joint work with George Deligiannidis (Oxford) & Mike Pitt (Kings) MCQMC, 19/08/2016

More information

Stat 451 Lecture Notes Monte Carlo Integration

Stat 451 Lecture Notes Monte Carlo Integration Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Sampling Algorithms for Probabilistic Graphical models

Sampling Algorithms for Probabilistic Graphical models Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir

More information

Quantifying Uncertainty

Quantifying Uncertainty Sai Ravela M. I. T Last Updated: Spring 2013 1 Markov Chain Monte Carlo Monte Carlo sampling made for large scale problems via Markov Chains Monte Carlo Sampling Rejection Sampling Importance Sampling

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

An introduction to Sequential Monte Carlo

An introduction to Sequential Monte Carlo An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 13 MCMC, Hybrid chains October 13, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university MH algorithm, Chap:6.3 The metropolis hastings requires three objects, the distribution of

More information

Asymptotics and Simulation of Heavy-Tailed Processes

Asymptotics and Simulation of Heavy-Tailed Processes Asymptotics and Simulation of Heavy-Tailed Processes Department of Mathematics Stockholm, Sweden Workshop on Heavy-tailed Distributions and Extreme Value Theory ISI Kolkata January 14-17, 2013 Outline

More information

Introduction to MCMC. DB Breakfast 09/30/2011 Guozhang Wang

Introduction to MCMC. DB Breakfast 09/30/2011 Guozhang Wang Introduction to MCMC DB Breakfast 09/30/2011 Guozhang Wang Motivation: Statistical Inference Joint Distribution Sleeps Well Playground Sunny Bike Ride Pleasant dinner Productive day Posterior Estimation

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

Introduction to Rare Event Simulation

Introduction to Rare Event Simulation Introduction to Rare Event Simulation Brown University: Summer School on Rare Event Simulation Jose Blanchet Columbia University. Department of Statistics, Department of IEOR. Blanchet (Columbia) 1 / 31

More information

Introduction to Stochastic Gradient Markov Chain Monte Carlo Methods

Introduction to Stochastic Gradient Markov Chain Monte Carlo Methods Introduction to Stochastic Gradient Markov Chain Monte Carlo Methods Changyou Chen Department of Electrical and Computer Engineering, Duke University cc448@duke.edu Duke-Tsinghua Machine Learning Summer

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Fast Maximum Likelihood estimation via Equilibrium Expectation for Large Network Data

Fast Maximum Likelihood estimation via Equilibrium Expectation for Large Network Data Fast Maximum Likelihood estimation via Equilibrium Expectation for Large Network Data Maksym Byshkin 1, Alex Stivala 4,1, Antonietta Mira 1,3, Garry Robins 2, Alessandro Lomi 1,2 1 Università della Svizzera

More information

An ABC interpretation of the multiple auxiliary variable method

An ABC interpretation of the multiple auxiliary variable method School of Mathematical and Physical Sciences Department of Mathematics and Statistics Preprint MPS-2016-07 27 April 2016 An ABC interpretation of the multiple auxiliary variable method by Dennis Prangle

More information

A = {(x, u) : 0 u f(x)},

A = {(x, u) : 0 u f(x)}, Draw x uniformly from the region {x : f(x) u }. Markov Chain Monte Carlo Lecture 5 Slice sampler: Suppose that one is interested in sampling from a density f(x), x X. Recall that sampling x f(x) is equivalent

More information

Gradient-based Monte Carlo sampling methods

Gradient-based Monte Carlo sampling methods Gradient-based Monte Carlo sampling methods Johannes von Lindheim 31. May 016 Abstract Notes for a 90-minute presentation on gradient-based Monte Carlo sampling methods for the Uncertainty Quantification

More information

Markov Chain Monte Carlo Methods

Markov Chain Monte Carlo Methods Markov Chain Monte Carlo Methods John Geweke University of Iowa, USA 2005 Institute on Computational Economics University of Chicago - Argonne National Laboaratories July 22, 2005 The problem p (θ, ω I)

More information

Learning Energy-Based Models of High-Dimensional Data

Learning Energy-Based Models of High-Dimensional Data Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero www.cs.toronto.edu/~hinton/energybasedmodelsweb.htm Discovering causal structure as a goal

More information

LECTURE 15 Markov chain Monte Carlo

LECTURE 15 Markov chain Monte Carlo LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte

More information

Lecture 8: The Metropolis-Hastings Algorithm

Lecture 8: The Metropolis-Hastings Algorithm 30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory Pseudo-arginal Metropolis-Hastings: a siple explanation and (partial) review of theory Chris Sherlock Motivation Iagine a stochastic process V which arises fro soe distribution with density p(v θ ). Iagine

More information

Kernel Adaptive Metropolis-Hastings

Kernel Adaptive Metropolis-Hastings Kernel Adaptive Metropolis-Hastings Arthur Gretton,?? Gatsby Unit, CSML, University College London NIPS, December 2015 Arthur Gretton (Gatsby Unit, UCL) Kernel Adaptive Metropolis-Hastings 12/12/2015 1

More information

Lecture 8: Bayesian Estimation of Parameters in State Space Models

Lecture 8: Bayesian Estimation of Parameters in State Space Models in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space

More information

1 Geometry of high dimensional probability distributions

1 Geometry of high dimensional probability distributions Hamiltonian Monte Carlo October 20, 2018 Debdeep Pati References: Neal, Radford M. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo 2.11 (2011): 2. Betancourt, Michael. A conceptual

More information

Inference in state-space models with multiple paths from conditional SMC

Inference in state-space models with multiple paths from conditional SMC Inference in state-space models with multiple paths from conditional SMC Sinan Yıldırım (Sabancı) joint work with Christophe Andrieu (Bristol), Arnaud Doucet (Oxford) and Nicolas Chopin (ENSAE) September

More information

Retail Planning in Future Cities A Stochastic Dynamical Singly Constrained Spatial Interaction Model

Retail Planning in Future Cities A Stochastic Dynamical Singly Constrained Spatial Interaction Model Retail Planning in Future Cities A Stochastic Dynamical Singly Constrained Spatial Interaction Model Mark Girolami Department of Mathematics, Imperial College London The Alan Turing Institute Lloyds Register

More information

Stochastic modelling of urban structure

Stochastic modelling of urban structure Stochastic modelling of urban structure Louis Ellam Department of Mathematics, Imperial College London The Alan Turing Institute https://iconicmath.org/ IPAM, UCLA Uncertainty quantification for stochastic

More information

Sampling Methods (11/30/04)

Sampling Methods (11/30/04) CS281A/Stat241A: Statistical Learning Theory Sampling Methods (11/30/04) Lecturer: Michael I. Jordan Scribe: Jaspal S. Sandhu 1 Gibbs Sampling Figure 1: Undirected and directed graphs, respectively, with

More information

ST 740: Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:

More information

arxiv: v1 [stat.co] 23 Nov 2016

arxiv: v1 [stat.co] 23 Nov 2016 Piecewise Deterministic Markov Processes for Continuous-Time Monte Carlo Paul Fearnhead 1,, Joris Bierkens 2, Murray Pollock 2 and Gareth O Roberts 2 1 Department of Mathematics and Statistics, Lancaster

More information

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation

More information

Surveying the Characteristics of Population Monte Carlo

Surveying the Characteristics of Population Monte Carlo International Research Journal of Applied and Basic Sciences 2013 Available online at www.irjabs.com ISSN 2251-838X / Vol, 7 (9): 522-527 Science Explorer Publications Surveying the Characteristics of

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo 1 Motivation 1.1 Bayesian Learning Markov Chain Monte Carlo Yale Chang In Bayesian learning, given data X, we make assumptions on the generative process of X by introducing hidden variables Z: p(z): prior

More information

On Bayesian Computation

On Bayesian Computation On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

Markov Chain Monte Carlo, Numerical Integration

Markov Chain Monte Carlo, Numerical Integration Markov Chain Monte Carlo, Numerical Integration (See Statistics) Trevor Gallen Fall 2015 1 / 1 Agenda Numerical Integration: MCMC methods Estimating Markov Chains Estimating latent variables 2 / 1 Numerical

More information

MCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17

MCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17 MCMC for big data Geir Storvik BigInsight lunch - May 2 2018 Geir Storvik MCMC for big data BigInsight lunch - May 2 2018 1 / 17 Outline Why ordinary MCMC is not scalable Different approaches for making

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Variational Inference via Stochastic Backpropagation

Variational Inference via Stochastic Backpropagation Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation

More information

Bayesian parameter estimation in predictive engineering

Bayesian parameter estimation in predictive engineering Bayesian parameter estimation in predictive engineering Damon McDougall Institute for Computational Engineering and Sciences, UT Austin 14th August 2014 1/27 Motivation Understand physical phenomena Observations

More information

Weak convergence of Markov chain Monte Carlo II

Weak convergence of Markov chain Monte Carlo II Weak convergence of Markov chain Monte Carlo II KAMATANI, Kengo Mar 2011 at Le Mans Background Markov chain Monte Carlo (MCMC) method is widely used in Statistical Science. It is easy to use, but difficult

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Machine Learning. Probabilistic KNN.

Machine Learning. Probabilistic KNN. Machine Learning. Mark Girolami girolami@dcs.gla.ac.uk Department of Computing Science University of Glasgow June 21, 2007 p. 1/3 KNN is a remarkably simple algorithm with proven error-rates June 21, 2007

More information

Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems

Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems John Bardsley, University of Montana Collaborators: H. Haario, J. Kaipio, M. Laine, Y. Marzouk, A. Seppänen, A. Solonen, Z.

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

A stochastic formulation of a dynamical singly constrained spatial interaction model

A stochastic formulation of a dynamical singly constrained spatial interaction model A stochastic formulation of a dynamical singly constrained spatial interaction model Mark Girolami Department of Mathematics, Imperial College London The Alan Turing Institute, British Library Lloyds Register

More information

Patterns of Scalable Bayesian Inference Background (Session 1)

Patterns of Scalable Bayesian Inference Background (Session 1) Patterns of Scalable Bayesian Inference Background (Session 1) Jerónimo Arenas-García Universidad Carlos III de Madrid jeronimo.arenas@gmail.com June 14, 2017 1 / 15 Motivation. Bayesian Learning principles

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

Example: physical systems. If the state space. Example: speech recognition. Context can be. Example: epidemics. Suppose each infected

Example: physical systems. If the state space. Example: speech recognition. Context can be. Example: epidemics. Suppose each infected 4. Markov Chains A discrete time process {X n,n = 0,1,2,...} with discrete state space X n {0,1,2,...} is a Markov chain if it has the Markov property: P[X n+1 =j X n =i,x n 1 =i n 1,...,X 0 =i 0 ] = P[X

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

Markov Chains and MCMC

Markov Chains and MCMC Markov Chains and MCMC CompSci 590.02 Instructor: AshwinMachanavajjhala Lecture 4 : 590.02 Spring 13 1 Recap: Monte Carlo Method If U is a universe of items, and G is a subset satisfying some property,

More information

Introduction to Bayesian methods in inverse problems

Introduction to Bayesian methods in inverse problems Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction

More information

MCMC and Gibbs Sampling. Kayhan Batmanghelich

MCMC and Gibbs Sampling. Kayhan Batmanghelich MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction

More information

Evidence estimation for Markov random fields: a triply intractable problem

Evidence estimation for Markov random fields: a triply intractable problem Evidence estimation for Markov random fields: a triply intractable problem January 7th, 2014 Markov random fields Interacting objects Markov random fields (MRFs) are used for modelling (often large numbers

More information