Zig-Zag Monte Carlo. Delft University of Technology. Joris Bierkens February 7, 2017


1 Zig-Zag Monte Carlo. Joris Bierkens, Delft University of Technology. February 7, 2017.

2 Acknowledgements
Collaborators: Andrew Duncan, Paul Fearnhead, Antonietta Mira, Gareth Roberts.
Financial support.

3 Outline
1 Motivation: Markov Chain Monte Carlo
2 One-dimensional Zig-Zag process
3 Multi-dimensional ZZP
4 Subsampling
5 Doubly intractable likelihood

4 Bayesian inference
In Bayesian inference we typically deal with a posterior density
π(x) = π(x; y) ∝ L(y | x) π₀(x), x ∈ ℝ^d,
where L(y | x) is the likelihood of the data y given the parameter x ∈ ℝ^d, and π₀ is a prior density for x.
Quantities of interest are e.g.
posterior mean ∫ x π(x) dx,
posterior variance ∫ x² π(x) dx − (∫ x π(x) dx)²,
tail probability ∫ 1_{x ≥ c} π(x) dx.
All of these involve integrals of the form ∫ h(x) π(x) dx.

8 Evaluating ∫ h(x)π(x) dx
Possible approaches:
1 Explicit (analytic) integration. Rarely possible.
2 Numerical integration. Curse of dimensionality.
3 Monte Carlo. Draw independent samples X₁, X₂, ... from π and use the law of large numbers:
∫ h(x)π(x) dx = lim_{K→∞} (1/K) Σ_{k=1}^{K} h(X_k).
Requires independent samples from π.
4 Markov Chain Monte Carlo. Construct an ergodic Markov chain (X₁, X₂, ...) with invariant distribution π(x) dx and use Birkhoff's ergodic theorem; the same limit
∫ h(x)π(x) dx = lim_{K→∞} (1/K) Σ_{k=1}^{K} h(X_k)
holds along the chain.
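As an illustration of approach 4, here is a minimal random-walk Metropolis sketch (not part of the talk) targeting N(0, 1); the function and variable names are hypothetical.

```python
import math
import random

def rwm(log_density, x0, n_iter, step, rng):
    """Random-walk Metropolis: symmetric Gaussian proposal, accept/reject."""
    x, lp = x0, log_density(x0)
    chain = []
    for _ in range(n_iter):
        y = x + step * rng.gauss(0.0, 1.0)     # propose
        lpy = log_density(y)
        if math.log(rng.random()) < lpy - lp:  # Metropolis acceptance step
            x, lp = y, lpy
        chain.append(x)
    return chain

rng = random.Random(1)
chain = rwm(lambda x: -0.5 * x * x, 0.0, 50_000, 2.0, rng)  # target N(0, 1)
post_mean = sum(chain) / len(chain)
post_var = sum(c * c for c in chain) / len(chain) - post_mean ** 2
```

The ergodic averages approximate the posterior mean and variance despite the samples being correlated.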

9 One-dimensional Zig-Zag process
Dynamics (continuous time):
Current state (X(t), Θ(t)) ∈ ℝ × {−1, +1}.
Move X(t) in direction Θ(t) = ±1 until a switch occurs.
The switching intensity is λ(X(t), Θ(t)).

13 Relation between switching rate and potential
Generator: Lf(x, θ) = θ df/dx(x, θ) + λ(x, θ)(f(x, −θ) − f(x, θ)), x ∈ ℝ, θ ∈ {−1, +1}.
Potential: U(x) = −log π(x).
π is invariant if and only if λ(x, +1) − λ(x, −1) = U′(x) for all x.
Equivalently, λ(x, θ) = γ(x) + max(0, θU′(x)) with γ(x) ≥ 0.
Example: Gaussian distribution N(0, σ²).
Density π(x) ∝ exp(−x²/(2σ²)).
Potential U(x) = x²/(2σ²).
Derivative U′(x) = x/σ².
Switching rates λ(x, θ) = (θx/σ²)₊ + γ(x).
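For the Gaussian rates above (with γ ≡ 0) the integrated switching rate inverts in closed form, so the Zig-Zag process can be simulated exactly; a sketch with illustrative helper names:

```python
import math
import random

def zigzag_gaussian(sigma, n_switches, rng):
    """Zig-Zag for N(0, sigma^2): lambda(x, th) = (th*x/sigma^2)_+, gamma = 0.
    Along x(t) = x + th*t the integrated rate inverts in closed form."""
    x, th = 0.0, 1.0
    t_int = x_int = x2_int = 0.0          # total time, int x dt, int x^2 dt
    for _ in range(n_switches):
        e = rng.expovariate(1.0)          # Exp(1) threshold
        u = th * x                        # signed position along direction
        # solve int_0^tau (u + s)_+ / sigma^2 ds = e for the switch time
        tau = -u + math.sqrt(max(u, 0.0) ** 2 + 2.0 * sigma ** 2 * e)
        x_new = x + th * tau
        # exact integrals of x and x^2 along the linear segment
        x_int += x * tau + th * tau ** 2 / 2.0
        x2_int += (x_new ** 3 - x ** 3) / (3.0 * th)
        t_int += tau
        x, th = x_new, -th                # switch direction
    return x_int / t_int, x2_int / t_int

rng = random.Random(7)
mean, second_moment = zigzag_gaussian(1.5, 200_000, rng)
```

The time averages converge to the first and second moments of N(0, σ²), here 0 and 2.25.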

24 Proof of invariance of π ∝ exp(−U)
Lf(x, θ) = θ ∂f/∂x(x, θ) + λ(x, θ)(f(x, −θ) − f(x, θ)), with λ(x, +1) − λ(x, −1) = U′(x).
Markov semigroup: P(t)f(x, θ) = E_{x,θ} f(X(t), Θ(t)).
π stationary means that
Σ_{θ=±1} ∫ P(t)f(x, θ) π(x) dx = Σ_{θ=±1} ∫ f(x, θ) π(x) dx for all f ∈ D(L), t ≥ 0.
Differentiating gives the equivalent condition
Σ_{θ=±1} ∫ Lf(x, θ) π(x) dx = 0, f ∈ D(L).
For the switching part,
Σ_{θ=±1} ∫ λ(x, θ)(f(x, −θ) − f(x, θ)) π(x) dx
= ∫ {λ(x, +1)(f(x, −1) − f(x, +1)) + λ(x, −1)(f(x, +1) − f(x, −1))} π(x) dx
= ∫ (f(x, −1) − f(x, +1))(λ(x, +1) − λ(x, −1)) π(x) dx
= ∫ (f(x, −1) − f(x, +1)) U′(x) π(x) dx
= −∫ (f(x, −1) − f(x, +1)) π′(x) dx      (since π′ = −U′π)
= ∫ ∂/∂x (f(x, −1) − f(x, +1)) π(x) dx   (integration by parts)
= −Σ_{θ=±1} ∫ θ ∂f/∂x(x, θ) π(x) dx,
which cancels the transport part of L, so Σ_{θ=±1} ∫ Lf(x, θ) π(x) dx = 0.

25 Use in Monte Carlo
(X(t), Θ(t))_{t≥0} has invariant distribution proportional to π(x). If ergodic,
lim_{T→∞} (1/T) ∫₀^T h(X(s)) ds = ∫ h(x)π(x) dx.
How to use in computations? Either:
Numerically integrate (1/T) ∫₀^T h(X_s) ds for some finite T > 0, or
Define (X₁, X₂, ...) by setting X_k = X(kΔ) for some Δ > 0; use as in traditional MCMC.
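The second option can be sketched as a small interpolation routine over the piecewise-linear path; the function name and the toy path are illustrative:

```python
def discretize(event_times, event_positions, delta, n_samples):
    """Sample X(k*delta) from a piecewise-linear path given by its switch
    times t_0 < t_1 < ... and the positions X(t_j) at those times."""
    samples, j = [], 0
    for k in range(n_samples):
        t = k * delta
        while j + 1 < len(event_times) and event_times[j + 1] <= t:
            j += 1                         # advance to the segment holding t
        t0, t1 = event_times[j], event_times[j + 1]
        x0, x1 = event_positions[j], event_positions[j + 1]
        # linear interpolation on the current segment
        samples.append(x0 + (x1 - x0) * (t - t0) / (t1 - t0))
    return samples

# path: up from 0 to 1 on [0, 1], then down to -1 on [1, 3]
xs = discretize([0.0, 1.0, 3.0], [0.0, 1.0, -1.0], 0.5, 5)
# -> [0.0, 0.5, 1.0, 0.5, 0.0]
```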

26 CLT for the 1D Zig-Zag process [B., Duncan, Limit theorems for the Zig-Zag process, 2016]
X(t) satisfies a Central Limit Theorem (CLT) for an observable h if
(1/√T) ∫₀^T [h(X_s) − E_π h(X)] ds ⇒ N(0, σ_h²) as T → ∞.
Example: unimodal potential/density function. [Figure: a trajectory of X(t) with successive crossing times T_i and excursions S_i.] Say Y_i = ∫_{T_{i−1}}^{T_i} h(X_s) ds. The CLT for the ZZP then follows essentially from the CLT for Σ_{i=1}^{N(t)} Y_i.

27 CLT for the 1D Zig-Zag process [B., Duncan, Limit theorems for the Zig-Zag process, 2016]
General formula for the asymptotic variance:
σ_h² = 2 ∫ (λ(x, +1) + λ(x, −1)) φ′(x)² π(x) dx, where L_Langevin φ = h̄ := h − π(h).
Langevin diffusion: σ_h² = 2 ∫ φ′(x)² π(x) dx.
Cool results:
Computational efficiency of the ZZP is better than IID sampling for the Gaussian (oscillatory ACF).
Student-t distribution with ν degrees of freedom: the Langevin diffusion satisfies a CLT for ν > 2; the Zig-Zag process satisfies a CLT for ν > 1.

32 Multi-dimensional Zig-Zag process
Target π(x) ∝ exp(−U(x)) on ℝ^d. Set of directions θ ∈ {−1, +1}^d.
Switching rates λ_i(x, θ) = (θ_i ∂_i U(x))₊, for i = 1, ..., d.
Cool observation: for a factorized target distribution π(x) = Π_{i=1}^{d} π_i(x_i) with π_i(y) ∝ exp(−U_i(y)), the switching rates reduce to λ_i(x, θ) = (θ_i U_i′(x_i))₊.
Every component of the Zig-Zag process mixes at O(1). Compare to RWM O(d), MALA O(d^{1/3}), HMC O(d^{1/4}).

36 Sampling
λ(x) = max(0, dU/dx), with a computational upper bound Λ(x) ≥ λ(x).
Draw T with P(T ≥ t) = exp(−∫₀^t Λ(X(s)) ds).
Accept T with probability λ(X(T))/Λ(X(T)).
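With a constant bound this scheme is classical Poisson thinning for the first event of an inhomogeneous Poisson process; a generic sketch (illustrative names), verified here on a constant rate:

```python
import random

def first_event_thinning(rate, bound, rng):
    """First arrival of an inhomogeneous Poisson process with intensity
    rate(t) <= bound, via thinning: propose from the homogeneous bound
    process, accept each candidate with probability rate(T)/bound."""
    t = 0.0
    while True:
        t += rng.expovariate(bound)         # candidate event time
        if rng.random() < rate(t) / bound:  # thinning accept step
            return t

rng = random.Random(3)
# constant rate 2.0 bounded by 5.0: the first event is Exp(2), mean 0.5
draws = [first_event_thinning(lambda t: 2.0, 5.0, rng) for _ in range(100_000)]
mean_t = sum(draws) / len(draws)
```

In the Zig-Zag setting, rejected candidates do not switch the direction; the process simply continues along the current segment.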

39 Subsampling
Setting: U = ½(U₁ + U₂), with per-term rates λ₁, λ₂ and a computational upper bound Λ(x) ≥ max(λ₁(x), λ₂(x)).
Draw T with P(T ≥ t) = exp(−∫₀^t Λ(X(s)) ds).
Draw I from {1, 2} uniformly.
Accept T with probability λ_I(X(T))/Λ(X(T)).
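A complete toy run of this scheme, assuming two quadratic "data" terms U_i(x) = (x − y_i)², so that U = ½(U₁ + U₂) and the target is N((y₁+y₂)/2, ½); along a segment each rate is dominated by the linear bound Λ(t) = 2(M + t). All names are illustrative:

```python
import math
import random

def zigzag_subsampled(y1, y2, n_events, rng):
    """Zig-Zag with subsampling for U_i(x) = (x - y_i)^2, U = (U1 + U2)/2,
    i.e. target density exp(-U) = N((y1+y2)/2, 1/2) up to a constant.
    lambda_i(x, th) = (2*th*(x - y_i))_+; along x(t) = x + th*t each rate
    is bounded by Lambda(t) = 2*(M + t) with M = max_i (th*(x - y_i))_+ ."""
    x, th = 0.0, 1.0
    t_int = x_int = x2_int = 0.0
    for _ in range(n_events):
        m = max(0.0, th * (x - y1), th * (x - y2))
        e = rng.expovariate(1.0)
        # invert int_0^T 2*(m + s) ds = 2*m*T + T^2 = e
        tau = -m + math.sqrt(m * m + e)
        x_new = x + th * tau
        # accumulate exact segment integrals for the time averages
        x_int += x * tau + th * tau ** 2 / 2.0
        x2_int += (x_new ** 3 - x ** 3) / (3.0 * th)
        t_int += tau
        x = x_new
        yi = rng.choice((y1, y2))                  # uniform subsample
        lam_i = max(0.0, 2.0 * th * (x - yi))
        if rng.random() < lam_i / (2.0 * (m + tau)):
            th = -th                               # accepted: switch
    mean = x_int / t_int
    return mean, x2_int / t_int - mean ** 2

rng = random.Random(11)
mean, var = zigzag_subsampled(-1.0, 3.0, 400_000, rng)
```

The time averages should recover the posterior mean 1.0 and variance 0.5, even though each switch decision looks at a single data term.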

43 Subsampling
Intractable likelihood, big data: U(x) = (1/n) Σ_{i=1}^{n} U_i(x). If π(x) ∝ Π_{i=1}^{n} f(y_i | x) π₀(x), take
U_i(x) = −log π₀(x) − n log f(y_i | x).
Theorem. With subsampling, the Zig-Zag process has exp(−U) as invariant density.
Proof: the effective switching rate is
λ(x, θ) = (1/n) Σ_{i=1}^{n} λ_i(x, θ) = (1/n) Σ_{i=1}^{n} (θ U_i′(x))₊,
so that
λ(x, +1) − λ(x, −1) = (1/n) Σ_{i=1}^{n} {(U_i′(x))₊ − (−U_i′(x))₊} = (1/n) Σ_{i=1}^{n} U_i′(x) = U′(x).

47 Subsampling: scaling
Without subsampling: O(n) computations per O(1) update.
With naive subsampling: O(1) computations per O(1/n) update.
Subsampling with control variates: O(1) computations per O(1) update: super-efficient.
The control-variates approach depends on posterior contraction and requires finding a point close to the mode: O(n) start-up cost.

48 Control variates
U(x) = (1/n) Σ_{i=1}^{n} U_i(x). Let x* denote (a point close to) the mode of the posterior distribution.
Naive subsampling: λ_i(x, θ) = (θ U_i′(x))₊.
Control variates: λ_i(x, θ) = (θ {U_i′(x) + U′(x*) − U_i′(x*)})₊.
If x is close to the mode then U_i′(x) − U_i′(x*) is small (under assumptions on U), so each λ_i(x, θ) is close to the ideal switching rate (θ U′(x))₊.
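A small sketch of the control-variate rates together with a Lipschitz-type computational bound; the quadratic terms and all names are illustrative (for quadratic U_i the control-variate rate is in fact exactly the ideal rate, so the bound is easy to check):

```python
import random

def cv_rate(i, x, th, xstar, dU, dUi):
    """Control-variate rate (th*(dU(x*) + dUi(i, x) - dUi(i, x*)))_+ ."""
    return max(0.0, th * (dU(xstar) + dUi(i, x) - dUi(i, xstar)))

def cv_bound(x, th, xstar, dU, lip):
    """Bound (th*dU(x*))_+ + L*|x - x*|, valid for every term i when each
    dUi(i, .) is L-Lipschitz."""
    return max(0.0, th * dU(xstar)) + lip * abs(x - xstar)

# toy posterior terms: U_i(x) = n*(x - y_i)^2/2, so U = (1/n) sum_i U_i
ys = [0.5, -1.2, 2.0, 0.3]
n = len(ys)
dUi = lambda i, x: n * (x - ys[i])           # U_i'(x), n-Lipschitz
dU = lambda x: sum(dUi(i, x) for i in range(n)) / n
xstar = sum(ys) / n                          # exact posterior mode here

rng = random.Random(5)
ok = True
for _ in range(1000):                        # spot-check the bound
    x = rng.uniform(-5.0, 5.0)
    for th in (-1.0, 1.0):
        for i in range(n):
            if cv_rate(i, x, th, xstar, dU, dUi) > cv_bound(x, th, xstar, dU, n) + 1e-9:
                ok = False
```

The bound depends on x only through |x − x*|, which is what makes the O(1)-per-update thinning scheme possible.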

49 Experiments with 100 observations. [Figures omitted.]

52 Experiments with 10,000 observations. [Figures omitted.]

55 Scaling in number of observations
[Figure: log₂(ESS / number of epochs) against log₂(number of observations), for Zig-Zag, Zig-Zag with subsampling, Zig-Zag with control variates, and Zig-Zag with a poor computational bound.]

56 Scaling in number of observations
[Figure: log₂(ESS / second) against log₂(number of observations), for Zig-Zag, Zig-Zag with subsampling, Zig-Zag with control variates, and Zig-Zag with a poor computational bound.]

57 Doubly intractable likelihood
In many applications the distribution of interest π has the following form:
π(x; y) = exp(Σ_{i=1}^{d} x_i s_i(y)) / (Z(y) M(x)) · π₀(x), x ∈ ℝ^d,
where y ∈ {0, 1}ⁿ is a fixed observed realization of the forward model,
p(y | x) = exp(Σ_{i=1}^{d} x_i s_i(y)) / M(x),
s_i, i = 1, ..., d, are statistics which characterize the distribution of the forward model, with weights x_1, ..., x_d, and Z(y) is the usual normalization constant.
Computational problem: computation of M(x) is O(2ⁿ):
M(x) = Σ_{y ∈ {0,1}ⁿ} exp(Σ_{i=1}^{d} x_i s_i(y)).

58 Examples of doubly intractable likelihood
p(y | x) = exp(Σ_{i=1}^{d} x_i s_i(y)) / M(x), x ∈ ℝ^d, y ∈ {0, 1}ⁿ.
Ising model (physics, image analysis):
s₁(y) = yᵀ W y, where W is an interaction matrix;
s₂(y) = hᵀ y, where h represents an external magnetic field;
x₁, x₂ serve as inverse temperatures.
Exponential Random Graph Model:
random graphs over k vertices, with n := ½k(k−1) possible edges;
y₁, ..., yₙ indicate the presence of an edge;
s₁(y): number of edges in the random graph;
s₂(y): e.g. number of triangles in the random graph.

61 The Zig-Zag process applied to doubly intractable likelihood
For simplicity, say x ∈ ℝ and ignore the prior distribution:
π(x; y) = exp(x s(y)) / (Z(y) M(x)), M(x) = Σ_{z ∈ {0,1}ⁿ} exp(x s(z)), x ∈ ℝ, y ∈ {0, 1}ⁿ,
so that
U(x) = −log π(x; y) = −x s(y) + log M(x).
For the derivative of U we find
U′(x) = −s(y) + d log M(x)/dx = −s(y) + Σ_{z ∈ {0,1}ⁿ} exp(x s(z)) s(z) / M(x) = −s(y) + E_x[s(Y)],
where Y is a realization of the forward model with parameter x.

62 The Zig-Zag process applied to doubly intractable likelihood
U′(x) = −s(y) + E_x[s(Y)]. Switching rate complexity: O(2ⁿ). For x ∈ ℝ, θ ∈ {−1, +1},
λ(x, θ) = max(θ U′(x), 0) = max(−θ s(y) + θ E_x[s(Y)], 0).
Idea: use an unbiased estimate of E_x[s(Y)].
Crude algorithm for determining the next switch:
1 Determine an upper bound Λ(x) for λ(x, θ).
2 Generate a switching time according to P(T ≥ t) = exp(−∫₀^t Λ(X(r)) dr).
3 Obtain an unbiased estimate Ĝ of dU(x)/dx.
4 Accept the switch with probability max(0, θĜ)/Λ(X(T)); otherwise repeat.
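Steps 1 to 4 can be sketched for a continuous toy target where the true gradient is known, replacing U′(X(T)) by a noisy unbiased estimate Ĝ. Since E(θĜ)₊ − E(−θĜ)₊ = θ E[Ĝ], the effective rates still satisfy the invariance condition, so the invariant density is preserved. The noise model and all names are illustrative:

```python
import math
import random

def zigzag_noisy_gradient(c, n_events, rng):
    """Zig-Zag for N(0, 1) (U'(x) = x), where each switch decision uses a
    noisy unbiased estimate G = U'(X(T)) + Uniform(-c, c).  Along
    x(t) = x + th*t, (th*G)_+ <= (th*x)_+ + t + c, a linear bound rate
    whose integral inverts in closed form."""
    x, th = 0.0, 1.0
    t_int = x_int = x2_int = 0.0
    for _ in range(n_events):
        a = max(0.0, th * x) + c
        e = rng.expovariate(1.0)
        tau = -a + math.sqrt(a * a + 2.0 * e)   # solves a*T + T^2/2 = e
        x_new = x + th * tau
        x_int += x * tau + th * tau ** 2 / 2.0  # exact segment integrals
        x2_int += (x_new ** 3 - x ** 3) / (3.0 * th)
        t_int += tau
        x = x_new
        g = x + rng.uniform(-c, c)              # unbiased gradient estimate
        if rng.random() < max(0.0, th * g) / (a + tau):
            th = -th
    mean = x_int / t_int
    return mean, x2_int / t_int - mean ** 2

rng = random.Random(4)
mean, var = zigzag_noisy_gradient(1.0, 300_000, rng)
```

Despite the noisy switch decisions, the time averages still recover the N(0, 1) moments; the noise only inflates the total switching rate, and hence the asymptotic variance.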

64 Unbiased estimation of E_x[s(Y)]
Two possible approaches:
1 Perfect sampling / coupling from the past (Propp, Wilson, 1996): use Glauber dynamics in an ingenious way to obtain a sample Y which is distributed exactly according to the forward distribution π(· | x). Disadvantages: not applicable to all discrete models; exponentially slow convergence in cold temperature regimes.
2 Unbiased MCMC sampling (Glynn, Rhee, 2014): introduce an ℕ-valued random variable N and define Δ_i := s(Y_i) − s(Ỹ_i), where (Y_i) and (Ỹ_i) are two realizations of Glauber dynamics, correlated in a specific way. Unbiased estimate:
Ĝ = Σ_{i=0}^{N} Δ_i / P(N ≥ i).
Disadvantages: no global upper bound for the estimate; the variance may be extremely large.
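The Glynn-Rhee construction can be illustrated on a deterministic toy sequence Y_i = 1 − 2^{−i} converging to 1 (standing in for the coupled Glauber averages); with P(N ≥ i) = q^i, the weighted telescoping sum is unbiased for the limit. All names are illustrative:

```python
import random

def glynn_rhee(seq, q, rng):
    """Debiased estimator G = sum_{i=0}^{N} D_i / P(N >= i), with
    D_0 = seq(0), D_i = seq(i) - seq(i-1), and N geometric so that
    P(N >= i) = q**i.  E[G] equals lim_i seq(i) when the tails behave."""
    g, i, p = seq(0), 0, 1.0
    while rng.random() < q:        # extend N with probability q
        i += 1
        p *= q                     # p = P(N >= i)
        g += (seq(i) - seq(i - 1)) / p
    return g

seq = lambda i: 1.0 - 0.5 ** i     # biased approximations of the limit 1
rng = random.Random(2)
est = sum(glynn_rhee(seq, 0.6, rng) for _ in range(100_000)) / 100_000
```

Here q = 0.6 keeps the increments 0.5^i / 0.6^i summable, so the estimator has finite variance; choosing q too small would make it blow up, which is exactly the drawback noted on the slide.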

65 Zig-Zag process: conclusions
We can use piecewise deterministic Markov processes for sampling.
An unbiased estimate of the log-density gradient results in the correct invariant distribution.
Significantly better scaling than IID sampling for big data.
Doubly intractable likelihood: work in progress.

66 References
B., Roberts, A piecewise deterministic scaling limit of Lifted Metropolis-Hastings in the Curie-Weiss model, to appear in Annals of Applied Probability, 2015.
B., Fearnhead, Roberts, The Zig-Zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data, 2016.
B., Duncan, Limit theorems for the Zig-Zag process, 2016.
B., Fearnhead, Pollock, Roberts, Piecewise Deterministic Markov Processes for Continuous-Time Monte Carlo.
Thank you!


More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 13-28 February 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Limitations of Gibbs sampling. Metropolis-Hastings algorithm. Proof

More information

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018 Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Pseudo-marginal MCMC methods for inference in latent variable models

Pseudo-marginal MCMC methods for inference in latent variable models Pseudo-marginal MCMC methods for inference in latent variable models Arnaud Doucet Department of Statistics, Oxford University Joint work with George Deligiannidis (Oxford) & Mike Pitt (Kings) MCQMC, 19/08/2016

More information

Stat 451 Lecture Notes Monte Carlo Integration

Stat 451 Lecture Notes Monte Carlo Integration Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Sampling Algorithms for Probabilistic Graphical models

Sampling Algorithms for Probabilistic Graphical models Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir

More information

Quantifying Uncertainty

Quantifying Uncertainty Sai Ravela M. I. T Last Updated: Spring 2013 1 Markov Chain Monte Carlo Monte Carlo sampling made for large scale problems via Markov Chains Monte Carlo Sampling Rejection Sampling Importance Sampling

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

An introduction to Sequential Monte Carlo

An introduction to Sequential Monte Carlo An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 13 MCMC, Hybrid chains October 13, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university MH algorithm, Chap:6.3 The metropolis hastings requires three objects, the distribution of

More information

Asymptotics and Simulation of Heavy-Tailed Processes

Asymptotics and Simulation of Heavy-Tailed Processes Asymptotics and Simulation of Heavy-Tailed Processes Department of Mathematics Stockholm, Sweden Workshop on Heavy-tailed Distributions and Extreme Value Theory ISI Kolkata January 14-17, 2013 Outline

More information

Introduction to MCMC. DB Breakfast 09/30/2011 Guozhang Wang

Introduction to MCMC. DB Breakfast 09/30/2011 Guozhang Wang Introduction to MCMC DB Breakfast 09/30/2011 Guozhang Wang Motivation: Statistical Inference Joint Distribution Sleeps Well Playground Sunny Bike Ride Pleasant dinner Productive day Posterior Estimation

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

Introduction to Rare Event Simulation

Introduction to Rare Event Simulation Introduction to Rare Event Simulation Brown University: Summer School on Rare Event Simulation Jose Blanchet Columbia University. Department of Statistics, Department of IEOR. Blanchet (Columbia) 1 / 31

More information

Introduction to Stochastic Gradient Markov Chain Monte Carlo Methods

Introduction to Stochastic Gradient Markov Chain Monte Carlo Methods Introduction to Stochastic Gradient Markov Chain Monte Carlo Methods Changyou Chen Department of Electrical and Computer Engineering, Duke University cc448@duke.edu Duke-Tsinghua Machine Learning Summer

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Fast Maximum Likelihood estimation via Equilibrium Expectation for Large Network Data

Fast Maximum Likelihood estimation via Equilibrium Expectation for Large Network Data Fast Maximum Likelihood estimation via Equilibrium Expectation for Large Network Data Maksym Byshkin 1, Alex Stivala 4,1, Antonietta Mira 1,3, Garry Robins 2, Alessandro Lomi 1,2 1 Università della Svizzera

More information

An ABC interpretation of the multiple auxiliary variable method

An ABC interpretation of the multiple auxiliary variable method School of Mathematical and Physical Sciences Department of Mathematics and Statistics Preprint MPS-2016-07 27 April 2016 An ABC interpretation of the multiple auxiliary variable method by Dennis Prangle

More information

A = {(x, u) : 0 u f(x)},

A = {(x, u) : 0 u f(x)}, Draw x uniformly from the region {x : f(x) u }. Markov Chain Monte Carlo Lecture 5 Slice sampler: Suppose that one is interested in sampling from a density f(x), x X. Recall that sampling x f(x) is equivalent

More information

Gradient-based Monte Carlo sampling methods

Gradient-based Monte Carlo sampling methods Gradient-based Monte Carlo sampling methods Johannes von Lindheim 31. May 016 Abstract Notes for a 90-minute presentation on gradient-based Monte Carlo sampling methods for the Uncertainty Quantification

More information

Markov Chain Monte Carlo Methods

Markov Chain Monte Carlo Methods Markov Chain Monte Carlo Methods John Geweke University of Iowa, USA 2005 Institute on Computational Economics University of Chicago - Argonne National Laboaratories July 22, 2005 The problem p (θ, ω I)

More information

Learning Energy-Based Models of High-Dimensional Data

Learning Energy-Based Models of High-Dimensional Data Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero www.cs.toronto.edu/~hinton/energybasedmodelsweb.htm Discovering causal structure as a goal

More information

LECTURE 15 Markov chain Monte Carlo

LECTURE 15 Markov chain Monte Carlo LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte

More information

Lecture 8: The Metropolis-Hastings Algorithm

Lecture 8: The Metropolis-Hastings Algorithm 30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory Pseudo-arginal Metropolis-Hastings: a siple explanation and (partial) review of theory Chris Sherlock Motivation Iagine a stochastic process V which arises fro soe distribution with density p(v θ ). Iagine

More information

Kernel Adaptive Metropolis-Hastings

Kernel Adaptive Metropolis-Hastings Kernel Adaptive Metropolis-Hastings Arthur Gretton,?? Gatsby Unit, CSML, University College London NIPS, December 2015 Arthur Gretton (Gatsby Unit, UCL) Kernel Adaptive Metropolis-Hastings 12/12/2015 1

More information

Lecture 8: Bayesian Estimation of Parameters in State Space Models

Lecture 8: Bayesian Estimation of Parameters in State Space Models in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space

More information

1 Geometry of high dimensional probability distributions

1 Geometry of high dimensional probability distributions Hamiltonian Monte Carlo October 20, 2018 Debdeep Pati References: Neal, Radford M. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo 2.11 (2011): 2. Betancourt, Michael. A conceptual

More information

Inference in state-space models with multiple paths from conditional SMC

Inference in state-space models with multiple paths from conditional SMC Inference in state-space models with multiple paths from conditional SMC Sinan Yıldırım (Sabancı) joint work with Christophe Andrieu (Bristol), Arnaud Doucet (Oxford) and Nicolas Chopin (ENSAE) September

More information

Retail Planning in Future Cities A Stochastic Dynamical Singly Constrained Spatial Interaction Model

Retail Planning in Future Cities A Stochastic Dynamical Singly Constrained Spatial Interaction Model Retail Planning in Future Cities A Stochastic Dynamical Singly Constrained Spatial Interaction Model Mark Girolami Department of Mathematics, Imperial College London The Alan Turing Institute Lloyds Register

More information

Stochastic modelling of urban structure

Stochastic modelling of urban structure Stochastic modelling of urban structure Louis Ellam Department of Mathematics, Imperial College London The Alan Turing Institute https://iconicmath.org/ IPAM, UCLA Uncertainty quantification for stochastic

More information

Sampling Methods (11/30/04)

Sampling Methods (11/30/04) CS281A/Stat241A: Statistical Learning Theory Sampling Methods (11/30/04) Lecturer: Michael I. Jordan Scribe: Jaspal S. Sandhu 1 Gibbs Sampling Figure 1: Undirected and directed graphs, respectively, with

More information

ST 740: Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:

More information

arxiv: v1 [stat.co] 23 Nov 2016

arxiv: v1 [stat.co] 23 Nov 2016 Piecewise Deterministic Markov Processes for Continuous-Time Monte Carlo Paul Fearnhead 1,, Joris Bierkens 2, Murray Pollock 2 and Gareth O Roberts 2 1 Department of Mathematics and Statistics, Lancaster

More information

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation

More information

Surveying the Characteristics of Population Monte Carlo

Surveying the Characteristics of Population Monte Carlo International Research Journal of Applied and Basic Sciences 2013 Available online at www.irjabs.com ISSN 2251-838X / Vol, 7 (9): 522-527 Science Explorer Publications Surveying the Characteristics of

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo 1 Motivation 1.1 Bayesian Learning Markov Chain Monte Carlo Yale Chang In Bayesian learning, given data X, we make assumptions on the generative process of X by introducing hidden variables Z: p(z): prior

More information

On Bayesian Computation

On Bayesian Computation On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

Markov Chain Monte Carlo, Numerical Integration

Markov Chain Monte Carlo, Numerical Integration Markov Chain Monte Carlo, Numerical Integration (See Statistics) Trevor Gallen Fall 2015 1 / 1 Agenda Numerical Integration: MCMC methods Estimating Markov Chains Estimating latent variables 2 / 1 Numerical

More information

MCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17

MCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17 MCMC for big data Geir Storvik BigInsight lunch - May 2 2018 Geir Storvik MCMC for big data BigInsight lunch - May 2 2018 1 / 17 Outline Why ordinary MCMC is not scalable Different approaches for making

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Variational Inference via Stochastic Backpropagation

Variational Inference via Stochastic Backpropagation Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation

More information

Bayesian parameter estimation in predictive engineering

Bayesian parameter estimation in predictive engineering Bayesian parameter estimation in predictive engineering Damon McDougall Institute for Computational Engineering and Sciences, UT Austin 14th August 2014 1/27 Motivation Understand physical phenomena Observations

More information

Weak convergence of Markov chain Monte Carlo II

Weak convergence of Markov chain Monte Carlo II Weak convergence of Markov chain Monte Carlo II KAMATANI, Kengo Mar 2011 at Le Mans Background Markov chain Monte Carlo (MCMC) method is widely used in Statistical Science. It is easy to use, but difficult

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Machine Learning. Probabilistic KNN.

Machine Learning. Probabilistic KNN. Machine Learning. Mark Girolami girolami@dcs.gla.ac.uk Department of Computing Science University of Glasgow June 21, 2007 p. 1/3 KNN is a remarkably simple algorithm with proven error-rates June 21, 2007

More information

Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems

Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems John Bardsley, University of Montana Collaborators: H. Haario, J. Kaipio, M. Laine, Y. Marzouk, A. Seppänen, A. Solonen, Z.

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

A stochastic formulation of a dynamical singly constrained spatial interaction model

A stochastic formulation of a dynamical singly constrained spatial interaction model A stochastic formulation of a dynamical singly constrained spatial interaction model Mark Girolami Department of Mathematics, Imperial College London The Alan Turing Institute, British Library Lloyds Register

More information

Patterns of Scalable Bayesian Inference Background (Session 1)

Patterns of Scalable Bayesian Inference Background (Session 1) Patterns of Scalable Bayesian Inference Background (Session 1) Jerónimo Arenas-García Universidad Carlos III de Madrid jeronimo.arenas@gmail.com June 14, 2017 1 / 15 Motivation. Bayesian Learning principles

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

Example: physical systems. If the state space. Example: speech recognition. Context can be. Example: epidemics. Suppose each infected

Example: physical systems. If the state space. Example: speech recognition. Context can be. Example: epidemics. Suppose each infected 4. Markov Chains A discrete time process {X n,n = 0,1,2,...} with discrete state space X n {0,1,2,...} is a Markov chain if it has the Markov property: P[X n+1 =j X n =i,x n 1 =i n 1,...,X 0 =i 0 ] = P[X

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

Markov Chains and MCMC

Markov Chains and MCMC Markov Chains and MCMC CompSci 590.02 Instructor: AshwinMachanavajjhala Lecture 4 : 590.02 Spring 13 1 Recap: Monte Carlo Method If U is a universe of items, and G is a subset satisfying some property,

More information

Introduction to Bayesian methods in inverse problems

Introduction to Bayesian methods in inverse problems Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction

More information

MCMC and Gibbs Sampling. Kayhan Batmanghelich

MCMC and Gibbs Sampling. Kayhan Batmanghelich MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction

More information

Evidence estimation for Markov random fields: a triply intractable problem

Evidence estimation for Markov random fields: a triply intractable problem Evidence estimation for Markov random fields: a triply intractable problem January 7th, 2014 Markov random fields Interacting objects Markov random fields (MRFs) are used for modelling (often large numbers

More information