Zig-Zag Monte Carlo Delft University of Technology Joris Bierkens February 7, 2017 Joris Bierkens (TU Delft) Zig-Zag Monte Carlo February 7, 2017 1 / 33
Acknowledgements
Collaborators: Andrew Duncan, Paul Fearnhead, Antonietta Mira, Gareth Roberts
Financial support
Outline
1. Motivation: Markov Chain Monte Carlo
2. One-dimensional Zig-Zag process
3. Multi-dimensional ZZP
4. Subsampling
5. Doubly intractable likelihood
Bayesian inference
In Bayesian inference we typically deal with a posterior density
$\pi(x) = \pi(x; y) \propto L(y \mid x)\,\pi_0(x)$, $x \in \mathbb{R}^d$,
where $L(y \mid x)$ is the likelihood of the data $y$ given parameter $x \in \mathbb{R}^d$, and $\pi_0$ is a prior density for $x$.
Quantities of interest are e.g.
posterior mean $\int x\,\pi(x)\,dx$,
posterior variance $\int x^2\,\pi(x)\,dx - \left(\int x\,\pi(x)\,dx\right)^2$,
tail probability $\int 1_{\{x \ge c\}}\,\pi(x)\,dx$.
All of these involve integrals of the form $\int h(x)\,\pi(x)\,dx$.
Evaluating $\int h(x)\,\pi(x)\,dx$
Possible approaches:
1. Explicit (analytic) integration. Rarely possible.
2. Numerical integration. Curse of dimensionality.
3. Monte Carlo. Draw independent samples $(X_1, X_2, \dots)$ from $\pi$ and use the law of large numbers. Requires independent samples from $\pi$.
4. Markov Chain Monte Carlo. Construct an ergodic Markov chain $(X_1, X_2, \dots)$ with invariant distribution $\pi(x)\,dx$, use Birkhoff's ergodic theorem.
In both cases, $\int h(x)\,\pi(x)\,dx = \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} h(X_k)$.
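A minimal sketch of approach 3 in Python, estimating a tail probability by plain Monte Carlo. The target $N(0, 1)$, the threshold $c = 1$, the sample size and the name `monte_carlo_estimate` are illustrative assumptions, not taken from the slides.

```python
import math
import random

def monte_carlo_estimate(h, sampler, num_samples, seed=0):
    """Average h over independent draws: (1/K) * sum_k h(X_k)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        total += h(sampler(rng))
    return total / num_samples

# Tail probability P(X >= 1) for X ~ N(0, 1) (assumed example target).
estimate = monte_carlo_estimate(
    h=lambda x: 1.0 if x >= 1.0 else 0.0,
    sampler=lambda rng: rng.gauss(0.0, 1.0),
    num_samples=100_000,
)
# Closed form for comparison: 0.5 * erfc(c / sqrt(2)).
exact = 0.5 * math.erfc(1.0 / math.sqrt(2.0))
```

The same averaging step applies to approach 4, with the independent draws replaced by the states of an ergodic Markov chain.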
One-dimensional Zig-Zag process
Dynamics
Continuous time.
Current state $(X(t), \Theta(t)) \in \mathbb{R} \times \{-1, +1\}$.
Move $X(t)$ in direction $\Theta(t) = \pm 1$ until a switch occurs.
The switching intensity is $\lambda(X(t), \Theta(t))$.
[Figure: sample path of $X(t)$ for $t$ from 0 to 100.]
Relation between switching rate and potential
Generator: $Lf(x, \theta) = \theta \frac{df}{dx}(x, \theta) + \lambda(x, \theta)\,\bigl(f(x, -\theta) - f(x, \theta)\bigr)$, $x \in \mathbb{R}$, $\theta \in \{-1, +1\}$.
Potential: $U(x) = -\log \pi(x)$.
$\pi$ is invariant if and only if $\lambda(x, +1) - \lambda(x, -1) = U'(x)$ for all $x$.
Equivalently, $\lambda(x, \theta) = \gamma(x) + \max(0, \theta U'(x))$, $\gamma(x) \ge 0$.
Example: Gaussian distribution $N(0, \sigma^2)$
Density $\pi(x) \propto \exp(-x^2/(2\sigma^2))$
Potential $U(x) = x^2/(2\sigma^2)$
Derivative $U'(x) = x/\sigma^2$
Switching rates $\lambda(x, \theta) = (\theta x/\sigma^2)^+ + \gamma(x)$
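The Gaussian example can be checked numerically. The sketch below evaluates the switching rate with a constant excess rate and compares $\lambda(x, +1) - \lambda(x, -1)$ with $U'(x)$ at a few points. The function name, the constant choice of $\gamma$ and the test points are assumptions made for illustration.

```python
def gaussian_switching_rate(x, theta, sigma=1.0, gamma=0.0):
    """lambda(x, theta) = gamma + max(0, theta * U'(x)) with U'(x) = x / sigma^2."""
    return gamma + max(0.0, theta * x / sigma**2)

# Assumed illustrative values: sigma = 2, constant excess rate gamma = 0.7.
sigma, gamma = 2.0, 0.7
points = [-2.0, -0.5, 0.0, 1.3]

# Left-hand side of the invariance condition at each point.
differences = [
    gaussian_switching_rate(x, +1, sigma, gamma)
    - gaussian_switching_rate(x, -1, sigma, gamma)
    for x in points
]
# Right-hand side: U'(x) = x / sigma^2.
derivatives = [x / sigma**2 for x in points]
```

The excess rate $\gamma$ cancels in the difference, which is why any non-negative $\gamma(x)$ leaves $\pi$ invariant.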
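Proof of invariance of $\pi \propto \exp(-U)$
$Lf(x, \theta) = \theta \frac{\partial f}{\partial x}(x, \theta) + \lambda(x, \theta)\,\bigl(f(x, -\theta) - f(x, \theta)\bigr)$, with $\lambda(x, +1) - \lambda(x, -1) = U'(x)$.
Markov semigroup: $P(t)f(x, \theta) = E_{x,\theta} f(X(t), \Theta(t))$.
$\pi$ stationary means that
$\sum_{\theta = \pm 1} \int P(t)f(x, \theta)\,\pi(x)\,dx = \sum_{\theta = \pm 1} \int f(x, \theta)\,\pi(x)\,dx$, $f \in D(L)$, $t \ge 0$.
Differentiating gives the equivalent condition:
$\sum_{\theta = \pm 1} \int Lf(x, \theta)\,\pi(x)\,dx = 0$, $f \in D(L)$.
For the switching part of the generator,
$\sum_{\theta = \pm 1} \int \lambda(x, \theta)\,\bigl(f(x, -\theta) - f(x, \theta)\bigr)\,\pi(x)\,dx$
$= \int \bigl\{\lambda(x, +1)\,(f(x, -1) - f(x, +1)) + \lambda(x, -1)\,(f(x, +1) - f(x, -1))\bigr\}\,\pi(x)\,dx$
$= \int (f(x, -1) - f(x, +1))\,(\lambda(x, +1) - \lambda(x, -1))\,\pi(x)\,dx$
$= \int (f(x, -1) - f(x, +1))\,U'(x)\,\pi(x)\,dx$
$= -\int (f(x, -1) - f(x, +1))\,\pi'(x)\,dx$ (since $\pi' = -U'\pi$)
$= \int \bigl(\tfrac{\partial f}{\partial x}(x, -1) - \tfrac{\partial f}{\partial x}(x, +1)\bigr)\,\pi(x)\,dx$ (integration by parts)
$= -\sum_{\theta = \pm 1} \int \theta \frac{\partial f}{\partial x}(x, \theta)\,\pi(x)\,dx$.
The switching term therefore cancels the drift term, so $\sum_{\theta = \pm 1} \int Lf(x, \theta)\,\pi(x)\,dx = 0$.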
Use in Monte Carlo
$(X(t), \Theta(t))_{t \ge 0}$ has invariant distribution proportional to $\pi(x)$. If ergodic,
$\lim_{T \to \infty} \frac{1}{T} \int_0^T h(X(s))\,ds = \int h(x)\,\pi(x)\,dx$.
How to use in computations
Either:
Numerically integrate $\frac{1}{T} \int_0^T h(X_s)\,ds$ for some finite $T > 0$, or
Define $(X_1, X_2, \dots)$ by setting $X_k = X(k\Delta)$ for some $\Delta > 0$; use as in traditional MCMC.
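Both ways of using the trajectory can be sketched for the Gaussian target $N(0, \sigma^2)$ with $\gamma = 0$, where the switching time can be drawn by inverting the integrated rate in closed form. The function names, the grid spacing, the time horizon and the choice $h(x) = x^2$ are assumptions for illustration; this is not presented as the author's implementation.

```python
import math
import random

def simulate_zigzag_gaussian(sigma, total_time, seed=0):
    """Skeleton [(t, x, theta), ...] of a 1D Zig-Zag path targeting N(0, sigma^2)."""
    rng = random.Random(seed)
    t, x, theta = 0.0, 0.0, 1
    skeleton = [(t, x, theta)]
    while True:
        # Inversion: solve int_0^tau max(0, theta*x + s) / sigma^2 ds = E, E ~ Exp(1).
        a = theta * x
        e = rng.expovariate(1.0)
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * sigma**2 * e)
        if t + tau >= total_time:
            # Stop at the time horizon without switching.
            skeleton.append((total_time, x + theta * (total_time - t), theta))
            return skeleton
        t, x, theta = t + tau, x + theta * tau, -theta
        skeleton.append((t, x, theta))

def time_average_of_square(skeleton):
    """(1/T) int_0^T X(s)^2 ds, integrated exactly on each linear segment."""
    total = 0.0
    for (_, x0, theta), (_, x1, _) in zip(skeleton, skeleton[1:]):
        total += theta * (x1**3 - x0**3) / 3.0
    return total / skeleton[-1][0]

def sample_on_grid(skeleton, delta):
    """X_k = X(k * delta) for k = 1, 2, ..., read off the skeleton."""
    samples, i, k = [], 0, 1
    while k * delta <= skeleton[-1][0]:
        s = k * delta
        while skeleton[i + 1][0] < s:
            i += 1
        t0, x0, theta = skeleton[i]
        samples.append(x0 + theta * (s - t0))
        k += 1
    return samples

sigma = 1.5
skeleton = simulate_zigzag_gaussian(sigma, total_time=50_000.0)
continuous_estimate = time_average_of_square(skeleton)   # option 1: integrate the path
grid = sample_on_grid(skeleton, delta=1.0)               # option 2: discrete samples
discrete_estimate = sum(x * x for x in grid) / len(grid)
```

Both estimates should be close to the second moment $\sigma^2$. Because the path is piecewise linear, polynomial observables can be integrated exactly, so option 1 needs no numerical quadrature here.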
CLT for the 1D Zig-Zag process
[B., Duncan, Limit theorems for the Zig-Zag process, 2016]
$X(t)$ satisfies a Central Limit Theorem (CLT) for observable $h$ if
$\frac{1}{\sqrt{T}} \int_0^T \bigl[h(X_s) - E_\pi h(X)\bigr]\,ds \Rightarrow N(0, \sigma_h^2)$ as $T \to \infty$.
Example: unimodal potential/density function
[Figure: trajectory $X(t)$ against $t$, with the times $T_i^{\pm}$ and $S_i^{\pm}$ marked.]
Say $Y_i = \int_{T_{i-1}^+}^{T_i^+} h(X_s)\,ds$.
The CLT for the ZZP follows essentially from the CLT for $\sum_{i=1}^{N(t)} Y_i$.
CLT for the 1D Zig-Zag process
[B., Duncan, Limit theorems for the Zig-Zag process, 2016]
General formula for asymptotic variance:
$\sigma_h^2 = 2 \int (\lambda(x, +1) + \lambda(x, -1))\,\varphi'(x)^2\,\pi(x)\,dx$,
where $L_{\mathrm{Langevin}}\,\varphi = \bar{h} := h - \pi(h)$.
Langevin diffusion: $\sigma_h^2 = 2 \int \varphi'(x)^2\,\pi(x)\,dx$.
Cool results
Computational efficiency for ZZP better than IID sampling for Gaussian (oscillatory ACF).
Student-t distribution, $\nu$ degrees of freedom:
Langevin diffusion satisfies CLT for $\nu > 2$.
Zig-Zag process satisfies CLT for $\nu > 1$.
Multi-dimensional Zig-Zag process
Target $\pi(x) = \exp(-U(x))$ on $\mathbb{R}^d$.
Set of directions $\theta \in \{-1, +1\}^d$.
Switching rates $\lambda_i(x, \theta) = (\theta_i\,\partial_i U(x))^+$, for $i = 1, \dots, d$.
Cool observation
Factorized target distribution $\pi(x) = \prod_{i=1}^d \pi_i(x_i)$ with $\pi_i(y) = \exp(-U_i(y))$.
Switching rates: $\lambda_i(x, \theta) = (\theta_i U_i'(x_i))^+$.
Every component of the Zig-Zag process mixes at $O(1)$.
Compare to RWM $O(d)$, MALA $O(d^{1/3})$, HMC $O(d^{1/4})$.
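For a factorized target the per-coordinate rates decouple. The sketch below simulates the multi-dimensional process for the standard Gaussian product target $N(0, I_d)$: every coordinate proposes its own switching time, and only the earliest one flips. The target, the dimension, the time horizon and the function name are assumptions for illustration.

```python
import math
import random

def zigzag_product_gaussian(dim, total_time, seed=0):
    """Time averages of x_i^2 for a Zig-Zag process targeting N(0, I_dim)."""
    rng = random.Random(seed)
    x, theta, t = [0.0] * dim, [1] * dim, 0.0
    integral_x2 = [0.0] * dim
    while True:
        # One candidate switching time per coordinate. With U_i(y) = y^2 / 2 the
        # rate (theta_i * x_i + s)^+ along the path can be inverted in closed form.
        taus = []
        for xi, th in zip(x, theta):
            a = th * xi
            e = rng.expovariate(1.0)
            taus.append(-a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e))
        i_star = min(range(dim), key=lambda i: taus[i])
        done = t + taus[i_star] >= total_time
        tau = total_time - t if done else taus[i_star]
        # All coordinates move at unit speed for the same duration tau.
        for i in range(dim):
            x_new = x[i] + theta[i] * tau
            integral_x2[i] += theta[i] * (x_new**3 - x[i] ** 3) / 3.0
            x[i] = x_new
        if done:
            return [v / total_time for v in integral_x2]
        theta[i_star] = -theta[i_star]  # only the earliest coordinate flips
        t += tau

second_moments = zigzag_product_gaussian(dim=3, total_time=20_000.0)
```

Redrawing every candidate time after each event is valid by the Markov property, at a cost of $O(d)$ per event. Each entry of `second_moments` should be close to 1.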
Sampling
[Figure: $\frac{dU}{dx}$ against $x$, the switching rate $\lambda(x) = \max\left(0, \frac{dU}{dx}\right)$, and a bounding rate $\Lambda(x)$.]
Draw $T$ with $P(T \ge t) = \exp\left(-\int_0^t \Lambda(X(s))\,ds\right)$.
Accept $T$ with probability $\lambda(X(T)) / \Lambda(X(T))$.
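The draw-and-accept step is Poisson thinning. A minimal sketch, assuming a constant bound $\Lambda \ge |U'(x)|$, which is a special case of the state-dependent $\Lambda(x)$ on the slide. The logistic target, for which $U'(x) = \tanh(x/2)$ and $\Lambda = 1$ works, and all names are assumptions for illustration.

```python
import math
import random

def zigzag_thinning(u_prime, rate_bound, total_time, seed=0):
    """1D Zig-Zag via thinning; returns the time average of X(s)^2 over [0, T]."""
    rng = random.Random(seed)
    t, x, theta = 0.0, 0.0, 1
    integral_x2 = 0.0
    while True:
        # Proposed event time from the bounding rate Lambda (constant here).
        tau = rng.expovariate(rate_bound)
        done = t + tau >= total_time
        if done:
            tau = total_time - t
        # Move at unit speed and integrate x^2 exactly along the segment.
        x_new = x + theta * tau
        integral_x2 += theta * (x_new**3 - x**3) / 3.0
        t, x = t + tau, x_new
        if done:
            return integral_x2 / total_time
        # Accept the switch with probability lambda(x, theta) / Lambda.
        if rng.random() < max(0.0, theta * u_prime(x)) / rate_bound:
            theta = -theta

# Logistic target: pi(x) proportional to exp(-x) / (1 + exp(-x))^2.
second_moment = zigzag_thinning(
    u_prime=lambda x: math.tanh(x / 2.0),
    rate_bound=1.0,
    total_time=200_000.0,
)
logistic_variance = math.pi**2 / 3.0
```

Rejected proposals leave the direction unchanged, so the path continues in a straight line. A looser bound is still correct but wastes more proposals.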
Subsampling
[Figure: $\frac{dU_1}{dx}$, $\frac{dU_2}{dx}$ and $\frac{dU}{dx}$ against $x$, where $U = \frac{1}{2}(U_1 + U_2)$, together with the rates $\lambda_1(x)$, $\lambda_2(x)$ and a bounding rate $\Lambda(x)$.]
Draw $T$ with $P(T \ge t) = \exp\left(-\int_0^t \Lambda(X(s))\,ds\right)$.
Draw $I$ from $\{1, 2\}$ uniformly.
Accept $T$ with probability $\lambda_I(X(T)) / \Lambda(X(T))$.
Subsampling
Intractable likelihood, big data: $U(x) = \frac{1}{n} \sum_{i=1}^n U_i(x)$.
If $\pi(x) \propto \prod_{i=1}^n f(y_i \mid x)\,\pi_0(x)$, take $U_i(x) = -\log \pi_0(x) - n \log f(y_i \mid x)$.
Theorem
With subsampling, the Zig-Zag Process has $\exp(-U)$ as invariant density.
Proof: Effective switching rate is
$\lambda(x, \theta) = \frac{1}{n} \sum_{i=1}^n \lambda_i(x, \theta) = \frac{1}{n} \sum_{i=1}^n (\theta U_i'(x))^+$.
$\lambda(x, +1) - \lambda(x, -1) = \frac{1}{n} \left\{ \sum_{i=1}^n (U_i'(x))^+ - \sum_{i=1}^n (-U_i'(x))^+ \right\}$
$= \frac{1}{n} \sum_{i=1}^n \left\{ (U_i'(x))^+ - (U_i'(x))^- \right\} = \frac{1}{n} \sum_{i=1}^n U_i'(x) = U'(x)$.
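The key step of the proof can be checked numerically at a single point $x$. The values of $U_i'(x)$ below are hypothetical random numbers, and the function name is an assumption for illustration.

```python
import random

def effective_rate(theta, partial_derivatives):
    """(1/n) * sum_i (theta * U_i'(x))^+, the rate seen when I is drawn uniformly."""
    n = len(partial_derivatives)
    return sum(max(0.0, theta * g) for g in partial_derivatives) / n

rng = random.Random(0)
# Hypothetical values of U_1'(x), ..., U_n'(x) at one fixed x.
partial_derivatives = [rng.uniform(-3.0, 3.0) for _ in range(50)]
full_derivative = sum(partial_derivatives) / len(partial_derivatives)

# Invariance condition: lambda(x, +1) - lambda(x, -1) = U'(x).
rate_difference = (
    effective_rate(+1, partial_derivatives)
    - effective_rate(-1, partial_derivatives)
)
# The subsampled rate is never below the ideal rate (U'(x))^+.
excess = effective_rate(+1, partial_derivatives) - max(0.0, full_derivative)
```

The non-negative `excess` plays the role of the excess switching rate $\gamma(x)$: subsampling keeps the invariant density but switches more often than the canonical process.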
Subsampling - scaling
Without subsampling: $O(n)$ computations per $O(1)$ update.
With naive subsampling: $O(1)$ computations per $O(1/n)$ update.
Subsampling with control variates: $O(1)$ computations per $O(1)$ update, which is super-efficient.
The control variates approach depends on posterior contraction and requires finding a point close to the mode: $O(n)$ start-up cost.
Control variates
$U(x) = \frac{1}{n} \sum_{i=1}^n U_i(x)$
Let $x^*$ denote (a point close to) the mode of the posterior distribution.
Naive subsampling: $\lambda_i(x, \theta) = (\theta U_i'(x))^+$.
Control variates: $\lambda_i(x, \theta) = \left(\theta \left\{U_i'(x) + U'(x^*) - U_i'(x^*)\right\}\right)^+$.
If $x$ is close to the mode then $U_i'(x) - U_i'(x^*)$ is small (under assumptions on $U$).
So each $\lambda_i(x, \theta)$ is close to the ideal switching rate $(\theta U'(x))^+$.
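A small numerical comparison of the naive and control-variate gradient estimates. The one-dimensional logistic regression model with a flat prior, the synthetic data, the reference point `x_star` and all function names are assumptions for illustration.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def partial_derivative(i, x, z, y):
    """U_i'(x) = n * (sigmoid(x * z_i) - y_i) * z_i (logistic regression, flat prior)."""
    return len(z) * (sigmoid(x * z[i]) - y[i]) * z[i]

def full_derivative(x, z, y):
    """U'(x) = (1/n) * sum_i U_i'(x)."""
    return sum(partial_derivative(i, x, z, y) for i in range(len(z))) / len(z)

def control_variate_estimate(i, x, x_star, z, y):
    """U_i'(x) - U_i'(x_star) + U'(x_star): unbiased for U'(x) when i is uniform."""
    return (partial_derivative(i, x, z, y)
            - partial_derivative(i, x_star, z, y)
            + full_derivative(x_star, z, y))

# Synthetic data (assumed): covariates z_i ~ N(0, 1), true parameter 1.0.
rng = random.Random(1)
n = 200
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if rng.random() < sigmoid(1.0 * zi) else 0 for zi in z]

x_star, x = 1.0, 1.02   # reference point and a nearby current position
target = full_derivative(x, z, y)
naive = [partial_derivative(i, x, z, y) for i in range(n)]
cv = [control_variate_estimate(i, x, x_star, z, y) for i in range(n)]

# Worst-case deviation of a single-observation estimate from U'(x).
naive_spread = max(abs(g - target) for g in naive)
cv_spread = max(abs(g - target) for g in cv)
cv_mean = sum(cv) / n
```

In practice $U'(x^*)$ would be computed once, which is the $O(n)$ start-up cost; this sketch recomputes it for brevity. The much smaller `cv_spread` is what allows a tight bounding rate.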
[Figures: results for 100 observations (three slides) and for 10,000 observations (three slides).]
Scaling in number of observations
[Figure: $\log_2(\text{ESS} / \text{epochs})$ against $\log_2(\text{number of observations})$ for Zig-Zag, Zig-Zag w/ subsampling, Zig-Zag w/ control variates, and Zig-Zag with poor computational bound.]
[Figure: $\log_2(\text{ESS} / \text{second})$ against $\log_2(\text{number of observations})$ for the same four samplers.]
Doubly intractable likelihood
In many applications, the distribution of interest $\pi$ has the following form:
$\pi(x; y) = \frac{\exp\left(\sum_{i=1}^d x_i s_i(y)\right)}{Z(y)\,M(x)}\,\pi_0(x)$, $x \in \mathbb{R}^d$,
where
$y \in \{0, 1\}^n$ is a fixed observed realization of the forward model $p(y \mid x) = \frac{\exp\left(\sum_{i=1}^d x_i s_i(y)\right)}{M(x)}$,
$s_i$, $i = 1, \dots, d$, are statistics which characterize the distribution of the forward model, with weights $x_1, \dots, x_d$,
$Z(y)$ is the usual normalization constant.
Computational problem: Computation of $M(x)$ is $O(2^n)$:
$M(x) = \sum_{y \in \{0,1\}^n} \exp\left(\sum_{i=1}^d x_i s_i(y)\right)$.
Examples of doubly intractable likelihood
$p(y \mid x) = \frac{\exp\left(\sum_{i=1}^d x_i s_i(y)\right)}{M(x)}$, $x \in \mathbb{R}^d$, $y \in \{0, 1\}^n$.
Ising model (physics, image analysis)
$s_1(y) = y^T W y$, where $W$ is an interaction matrix
$s_2(y) = h^T y$, where $h$ represents an external magnetic field
$x_1, x_2$ serve as inverse temperatures
Exponential Random Graph Model
random graphs over $k$ vertices, with $n := \frac{1}{2} k(k - 1)$ possible edges
$y_1, \dots, y_n$ indicate the presence of an edge
$s_1(y)$: number of edges in the random graph
$s_2(y)$: e.g. number of triangles in the random graph
The Zig-Zag process applied to doubly intractable likelihood
For simplicity, say $x \in \mathbb{R}$ and ignore prior distribution.
$\pi(x; y) = \frac{\exp(x s(y))}{Z(y)\,M(x)}$, $M(x) = \sum_{z \in \{0,1\}^n} \exp(x s(z))$, $x \in \mathbb{R}$, $y \in \{0, 1\}^n$,
so that
$U(x) = -\log \pi(x; y) = -x s(y) + \log M(x) + \log Z(y)$.
For the derivative of $U$ we find
$U'(x) = -s(y) + \frac{d}{dx} \log M(x) = -s(y) + \frac{\sum_{z \in \{0,1\}^n} \exp(x s(z))\,s(z)}{M(x)} = -s(y) + E_x[s(Y)]$,
where $Y$ is a realization of the forward model with parameter $x$.
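For small $n$ the identity $\frac{d}{dx} \log M(x) = E_x[s(Y)]$ can be verified by brute-force enumeration, which also makes the $O(2^n)$ cost visible. The statistic `s` (number of adjacent pairs that are both 1), the values of `x` and `n`, and the observation `y` are hypothetical choices for illustration.

```python
import itertools
import math

def s(z):
    """Hypothetical statistic: number of adjacent pairs that are both 1."""
    return sum(a * b for a, b in zip(z, z[1:]))

def log_m(x, n):
    """log M(x), summing over all 2^n configurations."""
    return math.log(sum(math.exp(x * s(z))
                        for z in itertools.product((0, 1), repeat=n)))

def expected_s(x, n):
    """E_x[s(Y)] under the forward model p(z | x) = exp(x * s(z)) / M(x)."""
    states = list(itertools.product((0, 1), repeat=n))
    weights = [math.exp(x * s(z)) for z in states]
    return sum(w * s(z) for w, z in zip(weights, states)) / sum(weights)

def u_prime(x, y):
    """U'(x) = -s(y) + E_x[s(Y)]."""
    return -s(y) + expected_s(x, len(y))

x, n, h = 0.8, 6, 1e-5
# Central finite difference of log M, to compare with the expectation.
finite_difference = (log_m(x + h, n) - log_m(x - h, n)) / (2 * h)
expectation = expected_s(x, n)

y = (1, 1, 0, 1, 1, 1)   # hypothetical observation with s(y) = 3
gradient = u_prime(x, y)
```

Each evaluation of `expected_s` visits all $2^n$ configurations, which is why the exact switching rate is unavailable for realistic $n$.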
The Zig-Zag process applied to doubly intractable likelihood
$U'(x) = -s(y) + E_x[s(Y)]$.
Switching rate complexity $O(2^n)$.
For $x \in \mathbb{R}$, $\theta \in \{-1, +1\}$,
$\lambda(x, \theta) = \max(\theta U'(x), 0) = \max\left(-\theta s(y) + \theta E_x[s(Y)], 0\right)$.
Idea: Use unbiased estimate of $E_x[s(Y)]$.
Crude algorithm for determining next switch:
1. Determine upper bound $\Lambda(x)$ for $\lambda(x, \theta)$.
2. Generate switching time according to $P(T \ge t) = \exp\left(-\int_0^t \Lambda(X(r))\,dr\right)$.
3. Obtain unbiased estimate $\hat{G}$ of $\frac{d}{dx} U(x)$.
4. Accept switch with probability $\max(0, \theta \hat{G}) / \Lambda(X(T))$, otherwise repeat.
Unbiased estimation of $E_x[s(Y)]$
Two possible approaches:
Perfect sampling, coupling from the past (Propp, Wilson, 1996): use Glauber dynamics in an ingenious way to obtain a sample $Y$ which is distributed exactly according to the forward distribution $\pi(\cdot \mid x)$.
Disadvantages:
Not applicable to all discrete models.
Exponentially slow convergence in cold temperature regimes.
Unbiased MCMC sampling (Glynn, Rhee, 2014): introduce an $\mathbb{N}$-valued random variable $N$. Define $\Delta_i := s(Y_i) - s(\tilde{Y}_i)$ where $(Y_i)$ and $(\tilde{Y}_i)$ are two realizations of Glauber dynamics, correlated in a specific way. Unbiased estimate
$\hat{G} = \sum_{i=0}^{N} \frac{\Delta_i}{P(N \ge i)}$.
Disadvantages: no global upper bound for the estimate; the variance may be extremely large.
Zig-Zag Process
We can use piecewise deterministic Markov processes for sampling.
An unbiased estimate of the log density gradient results in the correct invariant distribution.
Significantly better scaling than IID sampling for big data.
Doubly intractable likelihood: work in progress.
References
B., Roberts, A piecewise deterministic scaling limit of Lifted Metropolis-Hastings in the Curie-Weiss model, to appear in Annals of Applied Probability, 2015, https://arxiv.org/abs/1509.00302
B., Fearnhead, Roberts, The Zig-Zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data, 2016, https://arxiv.org/abs/1607.03188
B., Duncan, Limit theorems for the Zig-Zag process, 2016, https://arxiv.org/abs/1607.08845
B., Fearnhead, Pollock, Roberts, Piecewise Deterministic Markov Processes for Continuous-Time Monte Carlo, https://arxiv.org/abs/1611.07873
Thank you!