A Dirichlet Form approach to MCMC Optimal Scaling


A Dirichlet Form approach to MCMC Optimal Scaling
Giacomo Zanella, Wilfrid S. Kendall, and Mylène Bédard.
Supported by EPSRC Research Grants EP/D002060, EP/K.
LMS Durham Symposium on Stochastic Analysis, 12th July 2017.

Introduction
Outline: Introduction; MCMC and optimal scaling; Dirichlet forms and optimal scaling; Results and methods of proofs; Conclusion

Introduction to Markov chain Monte Carlo (MCMC)

General reference: Brooks et al. (2011), MCMC Handbook.

Suppose x represents an unknown (and therefore random!) parameter and y represents data depending on that parameter, with joint probability density p(x, y). The conditional density is

    p(x | y) = p(x, y) / Z,

where the norming constant Z can be hard to compute! Build a Markov chain with p(x | y) as its equilibrium distribution (no need to know Z), then simulate the Markov chain till approximate equilibrium.

Example: MCMC for Anglo-Saxon statistics

Some historians conjecture that Anglo-Saxon placenames cluster by dissimilar names. Zanella (2015, 2016) uses MCMC: the data provide some support for the conjecture, resulting in a useful clustering.

MCMC and optimal scaling
Outline: Introduction; MCMC and optimal scaling; Dirichlet forms and optimal scaling; Results and methods of proofs; Conclusion

MCMC idea

Goal: estimate E = E_π[h(X)].

Method: simulate an ergodic Markov chain (X_n) with stationary distribution π, and use the empirical estimate

    Ê_n = (1/n) Σ_{k=n₀}^{n₀+n} h(X_k).

(Much easier to apply theory if the chain is reversible.)

Theory: Ê_n → E almost surely.

Varieties of MH-MCMC

Here is the famous Metropolis-Hastings recipe for drawing from a distribution with density f:
Propose Y using conditional density q(y | x);
Accept/Reject the move from X to Y, based on the ratio f(Y) q(X | Y) / (f(X) q(Y | X)).

Options:
1. Independence sampler: proposal q(y | x) = q(y) doesn't depend on x;
2. Random walk (RW MH-MCMC): proposal q(y | x) = q(y − x) behaves as a random walk;
3. MALA MH-MCMC: proposal q(y | x) = q(y − x − λ ∇ log f(x)) drifts towards high target density f.

We shall focus on RW MH-MCMC with Gaussian proposals.

Gaussian RW MH-MCMC

Simple Python code for Gaussian RW MH-MCMC, using normal and exponential from NumPy:
Propose a multivariate Gaussian step;
Test whether to accept the proposal by comparing an exponential random variable with the log MH ratio;
Implement the step if accepted (vector addition).

    while not mcmc.stopped():
        z = normal(0, tau, size=mcmc.dim)          # Gaussian proposal step
        # phi = -log target; accept iff Exp(1) > phi(x + z) - phi(x)
        if exponential() > mcmc.phi(mcmc.x + z) - mcmc.phi(mcmc.x):
            mcmc.x += z                            # accept: move the chain
        mcmc.record_result()

What is the best choice of scale / standard deviation tau?
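The mcmc object on the slide is schematic. A minimal self-contained sketch, assuming the convention φ = −log f (so comparing an Exp(1) variable with the increase in φ reproduces the usual MH accept step), might look like:

```python
import numpy as np

def rw_mh(phi, dim, tau, n_steps, rng):
    """Gaussian random-walk Metropolis for target density proportional to exp(-phi)."""
    x = np.zeros(dim)
    samples = np.empty((n_steps, dim))
    for n in range(n_steps):
        z = rng.normal(0.0, tau, size=dim)      # Gaussian proposal step
        # Accept iff Exp(1) > phi(x+z) - phi(x), i.e. with probability 1 ∧ f(y)/f(x)
        if rng.exponential() > phi(x + z) - phi(x):
            x = x + z
        samples[n] = x
    return samples

# Example: standard Gaussian target, phi(x) = |x|^2 / 2
rng = np.random.default_rng(1)
samples = rw_mh(lambda x: 0.5 * np.dot(x, x), dim=5, tau=0.5, n_steps=20000, rng=rng)
post_burn = samples[1000:]
```

For the standard Gaussian target the post-burn-in samples should have mean near 0 and variance near 1 in each coordinate.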

RW MH-MCMC with Gaussian proposals (smooth target, marginal ∝ exp(−x⁴)). Target is given by 10 i.i.d. coordinates. Scale parameter for proposal: τ = 1 is too large! Acceptance ratio 1.7%.

RW MH-MCMC with Gaussian proposals (smooth target, marginal ∝ exp(−x⁴)). Target is given by 10 i.i.d. coordinates. Scale parameter for proposal: τ = 0.1 is better. Acceptance ratio 76.5%.

RW MH-MCMC with Gaussian proposals (smooth target, marginal ∝ exp(−x⁴)). Target is given by 10 i.i.d. coordinates. Scale parameter for proposal: τ = 0.01 is too small. Acceptance ratio 98.5%.
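The effect of τ on the acceptance rate is easy to reproduce. A rough sketch (the exact percentages on the slides depend on the run, so only the qualitative ordering is checked here):

```python
import numpy as np

def acceptance_rate(tau, dim=10, n_steps=5000, seed=0):
    """Fraction of accepted RW MH-MCMC proposals for a target with marginal ∝ exp(-x^4)."""
    rng = np.random.default_rng(seed)
    phi = lambda x: np.sum(x ** 4)     # -log target, up to the norming constant
    x, accepted = np.zeros(dim), 0
    for _ in range(n_steps):
        z = rng.normal(0.0, tau, size=dim)
        if rng.exponential() > phi(x + z) - phi(x):
            x = x + z
            accepted += 1
    return accepted / n_steps

rates = {tau: acceptance_rate(tau) for tau in (1.0, 0.1, 0.01)}
```

On the slides the three runs gave roughly 1.7%, 76.5%, and 98.5% respectively.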

MCMC Optimal Scaling: classic result (I)

RW MH-MCMC on (R^d, π^d), with π(dx_i) = e^{−φ(x_i)} dx_i and MH acceptance indicator A^(d) = 0 or 1:

    X_0^(d) = (X_1, ..., X_d),  X_i iid π;
    X_1^(d) = (X_1 + A^(d) W_1, ..., X_d + A^(d) W_d),  W_i iid N(0, σ_d²).

Questions: (1) complexity as d → ∞? (2) optimal σ_d?

Theorem (Roberts, Gelman and Gilks, 1997)
Given σ_d² = σ²/d, Lipschitz φ′, and finite E_π[(φ′)⁸], E_π[(φ″)⁴],

    {X^(d)_{⌊td⌋, 1}}_t ⇒ Z,  where dZ_t = s(σ)^{1/2} dB_t − (1/2) s(σ) φ′(Z_t) dt.

Answers: (1) mix in O(d) steps; (2) σ_max = argmax_σ s(σ).

MCMC Optimal Scaling: classic result (II)

Optimization: maximize s(σ)! Given I = E_π[φ′(X)²] and normal CDF Φ,

    s(σ) = σ² · 2Φ(−σ√I/2) = σ² A(σ) = (4/I) (Φ⁻¹(A(σ)/2))² A(σ),

so s(σ) is maximized by choosing the asymptotic acceptance rate

    A(σ_max) = argmax_{A ∈ [0,1]} { (Φ⁻¹(A/2))² A } ≈ 0.234.

Strengths:
Establishes complexity as d → ∞;
Practical information on how to tune the proposal;
Does not depend on φ (CLT-type universality).

Some weaknesses that we will address (there are others):
Convergence of the marginal rather than the joint distribution;
Strong regularity assumptions: Lipschitz φ′, finite E[(φ′)⁸], E[(φ″)⁴].
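The constant 0.234 can be recovered numerically from the last display; a quick sketch using only the standard library (the constant factor 4/I does not affect the argmax, so it is dropped):

```python
from statistics import NormalDist

Phi_inv = NormalDist().inv_cdf

def relative_speed(A):
    """s(sigma) as a function of acceptance rate A, up to the constant factor 4/I."""
    return Phi_inv(A / 2) ** 2 * A

# Grid search over acceptance rates in (0, 1)
grid = [k / 10000 for k in range(1, 10000)]
A_opt = max(grid, key=relative_speed)
```

The maximizer is A_opt ≈ 0.234, the familiar optimal acceptance rate.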

MCMC Optimal Scaling: classic result (III)

There is a wide range of extensions: for example,
Langevin / MALA, for which the magic acceptance probability is 0.574 (Roberts and Rosenthal, 1998);
Non-identically distributed independent target coordinates (Bédard, 2007);
Gibbs random fields (Breyer and Roberts, 2000);
Infinite-dimensional random fields (Mattingly, Pillai and Stuart, 2012);
Markov chains on a hypercube (Roberts, 1998);
Adaptive MCMC: adjust online to optimize acceptance probability (Andrieu and Thoms, 2008; Rosenthal, 2011).

All these build on the s.d.e. approach of Roberts, Gelman and Gilks (1997); hence regularity conditions tend to be severe (but see Durmus et al., 2016).

Dirichlet forms and optimal scaling
Outline: Introduction; MCMC and optimal scaling; Dirichlet forms and optimal scaling; Results and methods of proofs; Conclusion

Dirichlet forms and MCMC 1: definition of Dirichlet form

A (symmetric) Dirichlet form E on a Hilbert space H is a closed bilinear function E(u, v), defined / finite for all u, v ∈ D ⊆ H, which satisfies:
1. D is a dense linear subspace of H;
2. E(u, v) = E(v, u) for u, v ∈ D, so E is symmetric;
3. E(u) = E(u, u) ≥ 0 for u ∈ D;
4. D is a Hilbert space under the ("Sobolev") inner product ⟨u, v⟩ + E(u, v);
5. If u ∈ D then u♯ = (u ∧ 1) ∨ 0 ∈ D, and moreover E(u♯, u♯) ≤ E(u, u).

Relate to a Markov process if (quasi-)regular. Regular Dirichlet form, for locally compact Polish state space E: D ∩ C_0(E) is E₁^{1/2}-dense in D and uniformly dense in C_0(E).

Dirichlet forms and MCMC 2: two examples

1. Dirichlet form obtained from (re-scaled) RW MH-MCMC:

    E_d(h) = (d/2) E[(h(X_1^(d)) − h(X_0^(d)))²].

(E_d can be viewed as the Dirichlet form arising from speeding up the RW MH-MCMC by rate d.)

2. Heuristic infinite-dimensional diffusion limit of this form under scaling:

    E_∞(h) = (s(σ)/2) E_{π^∞}[‖∇h‖²].

Under mild conditions this is: closable, Dirichlet, quasi-regular.

Can we deduce that the RW MH-MCMC scales to look like the infinite-dimensional diffusion, by showing that E_d converges to E_∞?
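The form E_d is directly computable by Monte Carlo, which gives a concrete check on the heuristic limit. A sketch for the standard Gaussian target (so φ(x) = x²/2, I = 1, and for h(x) = x₁ the limit is E_∞(h) = s(σ)/2 with s(σ) = σ² · 2Φ(−σ/2)); the helper name is ours:

```python
import numpy as np
from statistics import NormalDist

def dirichlet_form_estimate(sigma, d, n_samples, seed=0):
    """Monte Carlo estimate of E_d(h) for h(x) = x_1, standard Gaussian target."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_samples, d))                     # X_0 ~ pi^d exactly
    w = rng.normal(scale=sigma / np.sqrt(d), size=(n_samples, d))
    log_ratio = -0.5 * np.sum((x + w) ** 2 - x ** 2, axis=1)
    accept = rng.exponential(size=n_samples) > -log_ratio   # accept w.p. 1 ∧ e^{log ratio}
    # E_d(h) = (d/2) E[(h(X_1) - h(X_0))^2], and h(X_1) - h(X_0) = A * W_1
    return (d / 2) * np.mean(accept * w[:, 0] ** 2)

sigma = 2.38
est = dirichlet_form_estimate(sigma, d=100, n_samples=20000)
limit = 0.5 * sigma ** 2 * 2 * NormalDist().cdf(-sigma / 2)
```

For moderate d the estimate should already be close to the heuristic limit s(σ)/2.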

Useful modes of convergence for Dirichlet forms

1. Gamma-convergence: E_n Γ-converges to E if
(Γ1) E(h) ≤ lim inf_n E_n(h_n) whenever h_n → h ∈ H;
(Γ2) for every h ∈ H there are h_n → h ∈ H such that E(h) ≥ lim sup_n E_n(h_n).

2. Mosco (1994) introduces stronger conditions:
(M1) E(h) ≤ lim inf_n E_n(h_n) whenever h_n → h weakly in H;
(M2) for every h ∈ H there are h_n → h strongly in H such that E(h) ≥ lim sup_n E_n(h_n).

3. Mosco (1994, Theorem 2.4.1, Corollary 2.6.1): conditions (M1) and (M2) imply convergence of the associated resolvent operators, and indeed of the associated semigroups.

4. Sun (1998) gives further conditions which imply weak convergence of the associated processes: these conditions are implied by the existence of a finite constant C such that E_n(h) ≤ C(‖h‖² + E(h)) for all h ∈ H.

Results and methods of proofs
Outline: Introduction; MCMC and optimal scaling; Dirichlet forms and optimal scaling; Results and methods of proofs; Conclusion

Results

Theorem (Zanella, Bédard and WSK, 2016)
Consider the Gaussian RW MH-MCMC based on proposal variance σ²/d with target π^d, where dπ = f dx = e^{−φ} dx. Suppose I = ∫ (φ′)² f dx < ∞ (finite Fisher information), and

    |φ′(x + v) − φ′(x)| < κ max{|v|^γ, |v|^α}

for some κ > 0, 0 < γ < 1, and α > 1. Let E_d be the corresponding Dirichlet form, scaled as above. Then E_d Mosco-converges to E_∞ = E[1 ∧ exp(N(−½σ²I, σ²I))] · E, so the corresponding L² semigroups also converge.

Corollary
Suppose in the above that φ′ is globally Lipschitz. Then the correspondingly scaled processes exhibit weak convergence.

Methods of proof 1: a CLT result

Lemma (A conditional CLT)
Under the conditions of the Corollary, almost surely (in x with invariant measure π^∞) the log Metropolis-Hastings ratio converges weakly (in W) as d → ∞:

    log ∏_{i=1}^d [f(x_i + σW_i/√d) / f(x_i)] = −∑_{i=1}^d (φ(x_i + σW_i/√d) − φ(x_i)) ⇒ N(−½σ²I, σ²I).

We may use this to deduce the asymptotic acceptance rate of the RW MH-MCMC sampler.
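The lemma is easy to check by simulation. A sketch for the standard Gaussian case φ(x) = x²/2 (so I = 1), where the log MH ratio should be approximately N(−½σ², σ²) for large d:

```python
import numpy as np

def log_mh_ratios(sigma, d, n_samples, seed=0):
    """Samples of the log Metropolis-Hastings ratio for a standard Gaussian target."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_samples, d))       # x ~ pi^d
    w = rng.normal(size=(n_samples, d))       # proposal noise
    delta = sigma * w / np.sqrt(d)
    # -sum_i [phi(x_i + delta_i) - phi(x_i)] with phi(x) = x^2 / 2
    return -np.sum(x * delta + 0.5 * delta ** 2, axis=1)

sigma = 1.0
L = log_mh_ratios(sigma, d=200, n_samples=10000)
```

Here the sample mean should be close to −σ²/2 and the sample variance close to σ².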

Key idea for CLT

Use exact Taylor expansion techniques:

    ∑_{i=1}^d (φ(x_i + σW_i/√d) − φ(x_i))
      = ∑_{i=1}^d φ′(x_i) σW_i/√d + ∑_{i=1}^d (σW_i/√d) ∫₀¹ (φ′(x_i + σW_i u/√d) − φ′(x_i)) du.

Condition implicitly on x for the first 2.5 steps:
1. The first summand converges weakly to N(0, σ²I).
2. Decompose the variance of the second summand to deduce
   Var[∑_{i=1}^d (σW_i/√d) ∫₀¹ (φ′(x_i + σW_i u/√d) − φ′(x_i)) du] → 0.
3. Use Hoeffding's inequality then absolute expectations:
   E[∑_{i=1}^d (σW_i/√d) ∫₀¹ (φ′(x_i + σW_i u/√d) − φ′(x_i)) du] → ½σ²I.
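The expansion above is an exact identity, not an approximation, which a short numerical check confirms. A sketch for the test function φ(x) = x⁴ (so φ′(x) = 4x³), with the integral evaluated by a midpoint rule:

```python
def phi(x):
    return x ** 4

def dphi(x):
    return 4 * x ** 3

def taylor_rhs(x, delta, n=20000):
    """phi'(x)*delta + delta * integral_0^1 (phi'(x + u*delta) - phi'(x)) du, midpoint rule."""
    mids = ((k + 0.5) / n for k in range(n))
    integral = sum(dphi(x + u * delta) - dphi(x) for u in mids) / n
    return dphi(x) * delta + delta * integral

x, delta = 1.3, 0.7
lhs = phi(x + delta) - phi(x)    # exact increment
rhs = taylor_rhs(x, delta)       # Taylor identity, numerically integrated
```

Both sides agree to quadrature accuracy, for any smooth φ.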

Methods of proof 2: establishing condition (M2)

For every h ∈ L²(π^∞), find h_n → h (strongly) in L²(π^∞) such that E_∞(h) ≥ lim sup_n E_n(h_n).
1. It is sufficient to consider the case E_∞(h) < ∞.
2. Find a sequence of smooth cylinder functions h_n with compact cylindrical support, such that |E_∞(h) − E_∞(h_n)| ≤ 1/n.
3. Using smoothness etc., E_m(h_n) → E_∞(h_n) as m → ∞.
4. Subsequences...

Methods of proof 3: establishing condition (M1)

If h_n → h weakly in L²(π^∞), show E_∞(h) ≤ lim inf_n E_n(h_n). The detailed stochastic analysis involves:
1. Set Ψ_n(h) = √(n/2) (h(X_0^(n)) − h(X_1^(n))).
2. Integrate against a test function ξ(x_{1:N}, W_{1:N}) I(U < a(x_{1:N}, W_{1:N})), for ξ smooth with compact support and U a Uniform(0, 1) random variable; apply Cauchy-Schwarz.
3. Use integration by parts, careful analysis, and the conditions on φ′.

Doing even better

Durmus et al. (2016) introduce L^p mean differentiability: there is φ′ such that, for some p > 2 and some α > 0,

    φ(x + u) − φ(x) = (φ′(x) + R(x, u)) u,   E[|R(X, u)|^p]^{1/p} = o(|u|^α).

Also I = E[|φ′|²] < ∞. Durmus et al. (2016) obtain optimal scaling results when p > 4 and E[|φ′|⁶] < ∞.

L^p mean differentiability applies straightforwardly to the Zanella, Bédard and WSK (2016) argument mutatis mutandis: the regularity conditions can be weakened even more, at least for vague convergence.

Conclusion
Outline: Introduction; MCMC and optimal scaling; Dirichlet forms and optimal scaling; Results and methods of proofs; Conclusion

Conclusion

The Dirichlet form approach allows significant relaxation of the conditions required for optimal scaling results;
Combine with L^p mean differentiability to obtain further relaxation of the regularity conditions;
Soft argument for the relation (mean + ½ variance = 0) implied by N(−½σ²I, σ²I);
MALA generalization (exercise in progress);
Need to explore development beyond i.i.d. targets; e.g. can regularity be similarly relaxed in more general random field settings?
Apply to discrete Markov chain cases? (cf. Roberts, 1998);
Investigate applications to Adaptive MCMC.

89 6 References: Andrieu, Christophe and Johannes Thoms (2008). A tutorial on adaptive MCMC. In: Statistics and Computing 18.4, pp Bédard, Mylène (2007). Weak convergence of Metropolis algorithms for non-i.i.d. target distributions. In: Annals of Applied Probability 17.4, pp Breyer, L A and Gareth O Roberts (2000). From Metropolis to diffusions : Gibbs states and optimal scaling. In: Stochastic Processes and their Applications 90.2, pp Brooks, Stephen P, Andrew Gelman, Galin L Jones and Xiao-Li Meng (2011). Handbook of Markov Chain Monte Carlo. Boca Raton: Chapman & Hall/CRC, pp. 592+xxv.

90 Durmus, Alain, Sylvain Le Corff, Eric Moulines and Gareth O Roberts (2016). Optimal scaling of the Random Walk Metropolis algorithm under $Lˆp$ mean differentiability. In: arxiv Geyer, Charlie (1999). Likelihood inference for spatial point processes. In: Stochastic Geometry: likelihood and computation. Ed. by Ole E Barndorff-Nielsen, WSK and M N M van Lieshout. Boca Raton: Chapman & Hall/CRC. Chap. 4, pp Hastings, W K (1970). Monte Carlo sampling methods using Markov chains and their applications. In: Biometrika 57, pp Mattingly, Jonathan C., Natesh S. Pillai and Andrew M. Stuart (2012). Diffusion limits of the random walk metropolis algorithm in high dimensions. In: Annals of Applied Probability 22.3, pp

91 8 Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller and Edward Teller (1953). Equation of state calculations by fast computing machines. en. In: Journal Chemical Physics 21.6, pp Mosco, Umberto (1994). Composite media and asymptotic Dirichlet forms. In: Journal of Functional Analysis 123.2, pp Roberts, Gareth O (1998). Optimal Metropolis algorithms for product measures on the vertices of a hypercube. In: Stochastics and Stochastic Reports June 2013, pp Roberts, Gareth O, A Gelman and W Gilks (1997). Weak Convergence and Optimal Scaling of Random Walk Algorithms. In: The Annals of Applied Probability 7.1, pp

92 9 Roberts, Gareth O and Jeffrey S Rosenthal (1998). Optimal scaling of discrete approximations to Langevin diffusions.. In: J. R. Statist. Soc. B 60.1, pp Rosenthal, Jeffrey S (2011). Optimal Proposal Distributions and Adaptive MCMC. In: Handbook of Markov Chain Monte Carlo 1, pp Sun, Wei (1998). Weak convergence of Dirichlet processes. In: Science in China Series A: Mathematics 41.1, pp Thompson, Elizabeth A (2005). MCMC in the analysis of genetic data on pedigree. In: Markov chain Monte Carlo: Innovations and Applications. Ed. by WSK, Faming Liang and Jian-Sheng Wang. Singapore: World Scientific. Chap. 5, pp Zanella, Giacomo (2015). Bayesian Complementary Clustering, MCMC, and Anglo-Saxon Placenames. PhD Thesis. University of Warwick.

93 Zanella, Giacomo (2016). Random Partition Models and Complementary Clustering of Anglo-Saxon Placenames. In: Annals of Applied Statistics 9.4.
Zanella, Giacomo, Mylène Bédard and WSK (2016). A Dirichlet Form approach to MCMC Optimal Scaling. arXiv preprint, 22pp.
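The optimal-scaling papers cited above (notably Roberts, Gelman and Gilks 1997) concern tuning the proposal scale of the random walk Metropolis algorithm. As orientation for readers of these references, here is a minimal sketch of that algorithm; the function `rwm`, the standard-normal target, and all tuning values are illustrative assumptions, not taken from any of the cited works.

```python
import numpy as np

def rwm(log_target, x0, sigma, n_steps, rng=None):
    """Random walk Metropolis with Gaussian proposals of scale sigma."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    lp = log_target(x)
    chain, accepts = [x.copy()], 0
    for _ in range(n_steps):
        prop = x + sigma * rng.standard_normal(x.shape)   # propose a jump
        lp_prop = log_target(prop)
        # Metropolis accept/reject: accept with prob min(1, pi(prop)/pi(x));
        # only the ratio is needed, so the norming constant Z never appears.
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepts += 1
        chain.append(x.copy())
    return np.array(chain), accepts / n_steps

# Standard-normal target in d = 10 dimensions; the classical theory suggests
# proposal scale about 2.38 / sqrt(d), giving acceptance rates near 0.234
# as the dimension d grows.
d = 10
chain, acc_rate = rwm(lambda x: -0.5 * np.dot(x, x), np.zeros(d),
                      2.38 / np.sqrt(d), 5000, rng=1)
```

The returned `acc_rate` lands in the region the optimal-scaling literature predicts, which is how the `2.38 / sqrt(d)` rule is monitored in practice.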

94 Version 1.34 (Wed Jul 12 2017)
================================================
commit d81469c5f14c484aa363616e3d701c6e3fbb1141
Author: Wilfrid Kendall
Made it explicit that $L^p$ mean differentiability still doesn't cover weak convergence without extra regularity: need to beat this!


More information

Mean field simulation for Monte Carlo integration. Part II : Feynman-Kac models. P. Del Moral

Mean field simulation for Monte Carlo integration. Part II : Feynman-Kac models. P. Del Moral Mean field simulation for Monte Carlo integration Part II : Feynman-Kac models P. Del Moral INRIA Bordeaux & Inst. Maths. Bordeaux & CMAP Polytechnique Lectures, INLN CNRS & Nice Sophia Antipolis Univ.

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information