A Dirichlet Form approach to MCMC Optimal Scaling
A Dirichlet Form approach to MCMC Optimal Scaling
Giacomo Zanella, Wilfrid S. Kendall, and Mylène Bédard. Supported by EPSRC Research Grants EP/D002060, EP/K.
LMS Durham Symposium on Stochastic Analysis, 12th July 2017
Outline: Introduction; MCMC and optimal scaling; Dirichlet forms and optimal scaling; Results and methods of proofs; Conclusion
Introduction to Markov chain Monte Carlo (MCMC)
General reference: Brooks et al. (2011) MCMC Handbook.
Suppose x represents an unknown (and therefore random!) parameter, and y represents data depending on the unknown parameter, with joint probability density p(x, y).
Conditional density p(x | y) = p(x, y) / Z, where the norming constant Z can be hard to compute!
Build a Markov chain with this as its equilibrium (no need to know Z), and simulate the Markov chain till approximate equilibrium.
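Why the norming constant causes no trouble can be seen in a toy discrete model (entirely hypothetical: a geometric-style unnormalised prior on x and a Poisson likelihood): Metropolis-Hastings only ever uses ratios of the joint density p(x, y), in which Z cancels. A minimal sketch:

```python
import math

# Toy model (hypothetical): x in {0,...,9}, unnormalised prior 0.5^(x+1),
# likelihood y | x ~ Poisson(x+1); we observe y = 3.
def joint(x, y):
    prior = 0.5 ** (x + 1)
    likelihood = math.exp(-(x + 1)) * (x + 1) ** y / math.factorial(y)
    return prior * likelihood

y_obs = 3
# Norming constant Z: feasible by brute force here, often not in real models.
Z = sum(joint(x, y_obs) for x in range(10))
posterior = [joint(x, y_obs) / Z for x in range(10)]

# Metropolis-Hastings only ever needs ratios of the joint density, so Z cancels:
ratio = joint(4, y_obs) / joint(2, y_obs)
assert abs(ratio - posterior[4] / posterior[2]) < 1e-12
```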
Example: MCMC for Anglo-Saxon statistics
Some historians conjecture that Anglo-Saxon placenames cluster by dissimilar names. Zanella (2015, 2016) uses MCMC: the data provide some support, resulting in useful clustering.
MCMC and optimal scaling
MCMC idea
Goal: estimate E = E_π[h(x)].
Method: simulate an ergodic Markov chain with stationary distribution π and use the empirical estimate, after a burn-in of length n_0,
Ê_n = (1/n) Σ_{m=n_0+1}^{n_0+n} h(x_m).
(Much easier to apply theory if the chain is reversible.)
Theory: Ê_n → E almost surely.
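The ergodic-average recipe can be sketched end-to-end with a Gaussian random-walk MH chain targeting the standard normal, estimating E_π[h(x)] for h(x) = x² (the seed, burn-in, and run length are arbitrary choices, not from the talk):

```python
import random

random.seed(1)
phi = lambda x: 0.5 * x * x   # -log target density: standard normal, up to a constant

# Gaussian random-walk MH chain (reversible and ergodic) targeting the standard normal.
def chain(n_total, tau=1.0):
    x, path = 0.0, []
    for _ in range(n_total):
        y = x + random.gauss(0.0, tau)
        if random.expovariate(1.0) > phi(y) - phi(x):  # accept w.p. min(1, e^{phi(x)-phi(y)})
            x = y
        path.append(x)
    return path

n0, n = 1000, 200_000                      # burn-in n0, then n kept samples
xs = chain(n0 + n)
E_hat = sum(x * x for x in xs[n0:]) / n    # empirical estimate of E_pi[x^2] = 1
```

With this many samples the estimate lands close to the true value E_π[x²] = 1, illustrating Ê_n → E.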
Varieties of MH-MCMC
Here is the famous Metropolis-Hastings recipe for drawing from a distribution with density f:
Propose Y using conditional density q(y | x);
Accept/Reject the move from X to Y, based on the ratio f(Y) q(X | Y) / (f(X) q(Y | X)).
Options:
1. Independence sampler: proposal q(y | x) = q(y) doesn't depend on x;
2. Random walk (RW MH-MCMC): proposal q(y | x) = q(y − x) behaves as a random walk;
3. MALA MH-MCMC: proposal q(y | x) = q(y − x − λ grad log f) drifts towards high target density f.
We shall focus on RW MH-MCMC with Gaussian proposals.
Gaussian RW MH-MCMC
Simple Python code for Gaussian RW MH-MCMC, using normal and exponential from NumPy:
Propose a multivariate Gaussian step;
Test whether to accept the proposal by comparing an Exp(1) random variable with the negated log MH ratio;
Implement the step if accepted (vector addition).

    from numpy.random import normal, exponential

    while not mcmc.stopped():
        z = normal(0, tau, size=mcmc.dim)
        # mcmc.phi is the negated log target density, so this accepts
        # with probability min(1, exp(phi(x) - phi(x + z)))
        if exponential() > mcmc.phi(mcmc.x + z) - mcmc.phi(mcmc.x):
            mcmc.x += z
        mcmc.record_result()

What is the best choice of scale / standard deviation tau?
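The slide's snippet assumes an `mcmc` object carrying the state; a minimal self-contained version (the wrapper class, target, and run length are my own illustrative choices, not from the talk) might look like:

```python
import numpy as np
from numpy.random import normal, exponential, seed

seed(0)  # reproducibility

class MCMC:
    """Minimal (hypothetical) wrapper providing the attributes used on the slide."""
    def __init__(self, phi, dim, n_steps):
        self.phi, self.dim, self.n_steps = phi, dim, n_steps
        self.x = np.zeros(dim)
        self.results = []
    def stopped(self):
        return len(self.results) >= self.n_steps
    def record_result(self):
        self.results.append(self.x.copy())

tau = 0.1
phi = lambda x: np.sum(x ** 4)   # -log target: i.i.d. coordinates, marginal ∝ exp(-x^4)
mcmc = MCMC(phi, dim=10, n_steps=5000)

while not mcmc.stopped():
    z = normal(0, tau, size=mcmc.dim)                           # propose Gaussian step
    if exponential() > mcmc.phi(mcmc.x + z) - mcmc.phi(mcmc.x):
        mcmc.x += z                                             # accept: take the step
    mcmc.record_result()
```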
RW MH-MCMC with Gaussian proposals (smooth target, marginal density ∝ exp(−x⁴)). Target is given by 10 i.i.d. coordinates.
Scale parameter for proposal: τ = 1 is too large! Acceptance ratio 1.7%.
Scale parameter for proposal: τ = 0.1 is better. Acceptance ratio 76.5%.
Scale parameter for proposal: τ = 0.01 is too small. Acceptance ratio 98.5%.
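These acceptance ratios are easy to reproduce empirically; the sketch below (seed and run lengths are arbitrary choices) counts accepted moves for the same exp(−x⁴) target at the three proposal scales:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 20_000
phi = lambda x: np.sum(x ** 4)   # -log target: 10 i.i.d. coordinates, marginal ∝ exp(-x^4)

def acceptance_rate(tau):
    x, accepts = np.zeros(d), 0
    for _ in range(n):
        z = rng.normal(0, tau, size=d)
        if rng.exponential() > phi(x + z) - phi(x):
            x = x + z
            accepts += 1
    return accepts / n

rates = {tau: acceptance_rate(tau) for tau in (1.0, 0.1, 0.01)}
print(rates)   # roughly 2% / 75% / 98%: too large, better, too small
```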
MCMC Optimal Scaling: classic result (I)
RW MH-MCMC on (R^d, π^d), π(dx_i) = e^{−φ(x_i)} dx_i; MH acceptance rule A^{(d)} = 0 or 1.
X^{(d)}_0 = (X_1, ..., X_d), X_i i.i.d. π;
X^{(d)}_1 = (X_1 + A^{(d)} W_1, ..., X_d + A^{(d)} W_d), W_i i.i.d. N(0, σ_d²).
Questions: (1) complexity as d → ∞? (2) optimal σ_d?
Theorem (Roberts, Gelman and Gilks, 1997). Given σ_d² = σ²/d, Lipschitz φ′, and finite E_π[(φ′)⁸], E_π[(φ′′)⁴], the sped-up first coordinate converges:
{X^{(d)}_{⌊td⌋, 1}}_t ⇒ Z, where dZ_t = s(σ)^{1/2} dB_t − (1/2) s(σ) φ′(Z_t) dt.
Answers: (1) mix in O(d) steps; (2) σ_max = arg max_σ s(σ).
MCMC Optimal Scaling: classic result (II)
Optimization: maximize s(σ)! Given I = E_π[φ′(X)²] and normal CDF Φ,
s(σ) = σ² · 2Φ(−σ√I/2) = σ² A(σ) = (4/I) (Φ^{-1}(A(σ)/2))² A(σ),
so s is maximized by choosing the asymptotic acceptance rate
A(σ_max) = arg max_{A ∈ [0,1]} { (Φ^{-1}(A/2))² A } ≈ 0.234.
Strengths:
Establishes complexity as d → ∞;
Practical information on how to tune the proposal;
Does not depend on φ (CLT-type universality).
Some weaknesses that we will address (there are others):
Convergence of marginal rather than joint distribution;
Strong regularity assumptions: Lipschitz φ′, finite E[(φ′)⁸], E[(φ′′)⁴].
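The magic number 0.234 can be recovered numerically from the last display: maximize A · (Φ^{-1}(A/2))² over A (the factor 4/I does not affect the argmax). A quick check using only the standard library:

```python
from statistics import NormalDist

Phi_inv = NormalDist().inv_cdf

# Speed as a function of the asymptotic acceptance rate A, up to the factor 4/I.
def speed(A):
    return (Phi_inv(A / 2) ** 2) * A

grid = [i / 10_000 for i in range(1, 10_000)]   # A in (0, 1)
A_max = max(grid, key=speed)
print(round(A_max, 3))   # ≈ 0.234
```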
MCMC Optimal Scaling: classic result (III)
There is a wide range of extensions: for example,
Langevin / MALA, for which the magic acceptance probability is 0.574 (Roberts and Rosenthal, 1998);
Non-identically distributed independent target coordinates (Bédard, 2007);
Gibbs random fields (Breyer and Roberts, 2000);
Infinite-dimensional random fields (Mattingly, Pillai and Stuart, 2012);
Markov chains on a hypercube (Roberts, 1998);
Adaptive MCMC: adjust online to optimize acceptance probability (Andrieu and Thoms, 2008; Rosenthal, 2011).
All these build on the s.d.e. approach of Roberts, Gelman and Gilks (1997); hence the regularity conditions tend to be severe (but see Durmus et al., 2016).
Dirichlet forms and optimal scaling
Dirichlet forms and MCMC 1: definition of Dirichlet form
A (symmetric) Dirichlet form E on a Hilbert space H is a closed bilinear function E(u, v), defined / finite for any u, v ∈ D ⊆ H, which satisfies:
1. D is a dense linear subspace of H;
2. E(u, v) = E(v, u) for u, v ∈ D, so E is symmetric;
3. E(u) = E(u, u) ≥ 0 for u ∈ D;
4. D is a Hilbert space under the ("Sobolev") inner product ⟨u, v⟩ + E(u, v);
5. If u ∈ D then the unit contraction u# = (u ∧ 1) ∨ 0 ∈ D; moreover E(u#, u#) ≤ E(u, u).
Relate to a Markov process if (quasi-)regular.
Regular Dirichlet form for locally compact Polish E: D ∩ C_0(E) is E_1^{1/2}-dense in D and uniformly dense in C_0(E).
Dirichlet forms and MCMC 2: two examples
1. Dirichlet form obtained from (re-scaled) RW MH-MCMC:
E_d(h) = (d/2) E[ (h(X^{(d)}_1) − h(X^{(d)}_0))² ].
(E_d can be viewed as the Dirichlet form arising from speeding up the RW MH-MCMC by rate d.)
2. Heuristic infinite-dimensional diffusion limit of this form under scaling:
E_∞(h) = (s(σ)/2) E_{π^∞}[ |∇h|² ].
Under mild conditions this is: closable, Dirichlet, quasi-regular.
Can we deduce that the RW MH-MCMC scales to look like the infinite-dimensional diffusion, by showing that E_d converges to E_∞?
Useful modes of convergence for Dirichlet forms
1. Gamma-convergence: E_n Γ-converges to E_∞ if
(Γ1) E_∞(h) ≤ lim inf_n E_n(h_n) whenever h_n → h in H;
(Γ2) for every h ∈ H there are h_n → h in H such that E_∞(h) ≥ lim sup_n E_n(h_n).
2. Mosco (1994) introduces stronger conditions:
(M1) E_∞(h) ≤ lim inf_n E_n(h_n) whenever h_n → h weakly in H;
(M2) for every h ∈ H there are h_n → h strongly in H such that E_∞(h) ≥ lim sup_n E_n(h_n).
3. Mosco (1994, Theorem 2.4.1, Corollary 2.6.1): conditions (M1) and (M2) imply convergence of the associated resolvent operators, and indeed of the associated semigroups.
4. Sun (1998) gives further conditions which imply weak convergence of the associated processes: these conditions are implied by the existence of a finite constant C such that E_n(h) ≤ C(‖h‖² + E(h)) for all h ∈ H.
Results and methods of proofs
Results
Theorem (Zanella, Bédard and WSK, 2016). Consider the Gaussian RW MH-MCMC based on proposal variance σ²/d with target π^d, where dπ = f dx = e^{−φ} dx. Suppose I = ∫ (φ′)² f dx < ∞ (finite Fisher information), and |φ′(x + v) − φ′(x)| ≤ κ max{|v|^γ, |v|^α} for some κ > 0, 0 < γ < 1, and α > 1. Let E_d be the corresponding Dirichlet form, scaled as above. Then E_d Mosco-converges to the limit form E_∞ with speed constant s(σ) = σ² E[ 1 ∧ exp(N(−σ²I/2, σ²I)) ], so the corresponding L² semigroups also converge.
Corollary. Suppose in the above that φ′ is globally Lipschitz. Then the correspondingly scaled processes exhibit weak convergence.
Methods of proof 1: a CLT result
Lemma (A conditional CLT). Under the conditions of the Corollary, almost surely (in x with invariant measure π^∞) the log Metropolis-Hastings ratio converges weakly (in W) as d → ∞:
Σ_{i=1}^d log [ f(x_i + σW_i/√d) / f(x_i) ] = − Σ_{i=1}^d ( φ(x_i + σW_i/√d) − φ(x_i) ) ⇒ N(−σ²I/2, σ²I).
We may use this to deduce the asymptotic acceptance rate of the RW MH-MCMC sampler.
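The lemma is easy to sanity-check numerically in a case where everything is explicit: take φ(x) = x²/2 (standard Gaussian target), so I = E[φ′(X)²] = 1, fix one draw x from π^d, and sample the log MH ratio over W. The dimension, run lengths, and seed below are arbitrary choices:

```python
import random, math

random.seed(0)
d, sigma, reps = 1000, 1.0, 2000
# Standard Gaussian target: phi(x) = x^2 / 2, so I = E[phi'(X)^2] = 1.
x = [random.gauss(0.0, 1.0) for _ in range(d)]   # one draw from pi^d, held fixed

def log_mh_ratio():
    s = 0.0
    for xi in x:
        w = random.gauss(0.0, 1.0)
        yi = xi + sigma * w / math.sqrt(d)
        s += 0.5 * xi * xi - 0.5 * yi * yi       # log f(y_i) - log f(x_i)
    return s

samples = [log_mh_ratio() for _ in range(reps)]
m = sum(samples) / reps
v = sum((s - m) ** 2 for s in samples) / reps
print(round(m, 2), round(v, 2))   # close to (-sigma^2 I / 2, sigma^2 I) = (-0.5, 1.0)
```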
Key idea for CLT
Use exact Taylor expansion techniques:
Σ_{i=1}^d ( φ(x_i + σW_i/√d) − φ(x_i) ) = Σ_{i=1}^d φ′(x_i) σW_i/√d + Σ_{i=1}^d (σW_i/√d) ∫_0^1 ( φ′(x_i + σW_i u/√d) − φ′(x_i) ) du.
Condition implicitly on x for the first 2.5 steps.
1. The first summand converges weakly to N(0, σ²I).
2. Decompose the variance of the second summand to deduce
Var[ Σ_{i=1}^d (σW_i/√d) ∫_0^1 ( φ′(x_i + σW_i u/√d) − φ′(x_i) ) du ] → 0.
3. Use Hoeffding's inequality then absolute expectations:
E[ Σ_{i=1}^d (σW_i/√d) ∫_0^1 ( φ′(x_i + σW_i u/√d) − φ′(x_i) ) du ] → σ²I/2.
Methods of proof 2: establishing condition (M2)
For every h ∈ L²(π^∞), find h_n → h (strongly) in L²(π^∞) such that E_∞(h) ≥ lim sup_n E_n(h_n).
1. Sufficient to consider the case E_∞(h) < ∞.
2. Find a sequence of smooth cylinder functions h_n with compact cylindrical support, such that E_∞(h) ≥ E_∞(h_n) − 1/n.
3. Using smoothness etc., E_m(h_n) → E_∞(h_n) as m → ∞.
4. Subsequences...
Methods of proof 3: establishing condition (M1)
If h_n → h weakly in L²(π^∞), show E_∞(h) ≤ lim inf_n E_n(h_n). The detailed stochastic analysis involves:
1. Set Ψ_n(h) = √(n/2) ( h(X^{(n)}_0) − h(X^{(n)}_1) ).
2. Integrate against a test function ξ(x_{1:N}, W_{1:N}) I(U < a(x_{1:N}, W_{1:N})), for ξ smooth with compact support and U a Uniform(0, 1) random variable; apply Cauchy-Schwarz.
3. Use integration by parts, careful analysis, and the conditions on φ′.
Doing even better
Durmus et al. (2016) introduce L^p mean differentiability: there is φ′ such that, for some p > 2 and some α > 0,
φ(X + u) − φ(X) = ( φ′(X) + R(X, u) ) u, with E[ |R(X, u)|^p ]^{1/p} = o(|u|^α).
Also I = E[(φ′)²] < ∞.
Durmus et al. (2016) obtain optimal scaling results when p > 4 and E[|φ′|⁶] < ∞.
L^p mean differentiability applies straightforwardly to the Zanella, Bédard and WSK (2016) argument mutatis mutandis: the regularity conditions can be weakened even more, at least for vague convergence.
Conclusion
Conclusion
The Dirichlet form approach allows significant relaxation of the conditions required for optimal scaling results;
Combine with L^p mean differentiability to obtain further relaxation of regularity conditions;
Soft argument for "½ · variance + mean = 0", as implied by the limit N(−σ²I/2, σ²I);
MALA generalization (exercise in progress);
Need to explore development beyond i.i.d. targets: e.g. can regularity be similarly relaxed in more general random field settings?
Apply to discrete Markov chain cases? (c.f. Roberts, 1998);
Investigate applications to Adaptive MCMC.
89 6 References: Andrieu, Christophe and Johannes Thoms (2008). A tutorial on adaptive MCMC. In: Statistics and Computing 18.4, pp Bédard, Mylène (2007). Weak convergence of Metropolis algorithms for non-i.i.d. target distributions. In: Annals of Applied Probability 17.4, pp Breyer, L A and Gareth O Roberts (2000). From Metropolis to diffusions : Gibbs states and optimal scaling. In: Stochastic Processes and their Applications 90.2, pp Brooks, Stephen P, Andrew Gelman, Galin L Jones and Xiao-Li Meng (2011). Handbook of Markov Chain Monte Carlo. Boca Raton: Chapman & Hall/CRC, pp. 592+xxv.
90 Durmus, Alain, Sylvain Le Corff, Eric Moulines and Gareth O Roberts (2016). Optimal scaling of the Random Walk Metropolis algorithm under $Lˆp$ mean differentiability. In: arxiv Geyer, Charlie (1999). Likelihood inference for spatial point processes. In: Stochastic Geometry: likelihood and computation. Ed. by Ole E Barndorff-Nielsen, WSK and M N M van Lieshout. Boca Raton: Chapman & Hall/CRC. Chap. 4, pp Hastings, W K (1970). Monte Carlo sampling methods using Markov chains and their applications. In: Biometrika 57, pp Mattingly, Jonathan C., Natesh S. Pillai and Andrew M. Stuart (2012). Diffusion limits of the random walk metropolis algorithm in high dimensions. In: Annals of Applied Probability 22.3, pp
91 8 Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller and Edward Teller (1953). Equation of state calculations by fast computing machines. en. In: Journal Chemical Physics 21.6, pp Mosco, Umberto (1994). Composite media and asymptotic Dirichlet forms. In: Journal of Functional Analysis 123.2, pp Roberts, Gareth O (1998). Optimal Metropolis algorithms for product measures on the vertices of a hypercube. In: Stochastics and Stochastic Reports June 2013, pp Roberts, Gareth O, A Gelman and W Gilks (1997). Weak Convergence and Optimal Scaling of Random Walk Algorithms. In: The Annals of Applied Probability 7.1, pp
92 9 Roberts, Gareth O and Jeffrey S Rosenthal (1998). Optimal scaling of discrete approximations to Langevin diffusions.. In: J. R. Statist. Soc. B 60.1, pp Rosenthal, Jeffrey S (2011). Optimal Proposal Distributions and Adaptive MCMC. In: Handbook of Markov Chain Monte Carlo 1, pp Sun, Wei (1998). Weak convergence of Dirichlet processes. In: Science in China Series A: Mathematics 41.1, pp Thompson, Elizabeth A (2005). MCMC in the analysis of genetic data on pedigree. In: Markov chain Monte Carlo: Innovations and Applications. Ed. by WSK, Faming Liang and Jian-Sheng Wang. Singapore: World Scientific. Chap. 5, pp Zanella, Giacomo (2015). Bayesian Complementary Clustering, MCMC, and Anglo-Saxon Placenames. PhD Thesis. University of Warwick.
93
Zanella, Giacomo (2016). "Random Partition Models and Complementary Clustering of Anglo-Saxon Placenames". Annals of Applied Statistics 9.4.
Zanella, Giacomo, Mylène Bédard and WSK (2016). "A Dirichlet Form approach to MCMC Optimal Scaling". arXiv preprint, 22pp.
94
Version 1.34 (Wed Jul 12 06:27 2017)
================================================
commit d81469c5f14c484aa363616e3d701c6e3fbb1141
Author: Wilfrid Kendall
Made it explicit that $L^p$ mean differentiability still doesn't cover weak without extra regularity: need to beat this!
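The optimal-scaling results cited above (Roberts, Gelman and Gilks 1997) say that for Random Walk Metropolis on a $d$-dimensional product target, the proposal scale should shrink as $\sigma = \ell/\sqrt{d}$, and at the optimal $\ell \approx 2.38$ the acceptance rate tends to about 0.234. A minimal sketch illustrating this, not taken from the slides: the target is a product of standard normals, and the function names (`rwm`, `log_target`) are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

def rwm(log_target, x0, sigma, n_steps):
    """Random Walk Metropolis with spherical Gaussian proposals of scale sigma."""
    x = np.array(x0, dtype=float)
    lp = log_target(x)
    accepted = 0
    for _ in range(n_steps):
        y = x + sigma * rng.standard_normal(x.shape)   # symmetric proposal
        lpy = log_target(y)
        if np.log(rng.uniform()) < lpy - lp:           # Metropolis accept/reject
            x, lp = y, lpy
            accepted += 1
    return x, accepted / n_steps

# Product of d standard normals; proposal scale sigma = l / sqrt(d).
d = 50
l = 2.38  # asymptotically optimal constant for Gaussian product targets
log_target = lambda x: -0.5 * np.dot(x, x)
_, acc = rwm(log_target, np.zeros(d), l / np.sqrt(d), 20000)
print(round(acc, 2))  # empirically close to the asymptotic optimum 0.234
```

At finite $d$ the observed acceptance rate sits somewhat above the asymptotic 0.234; the point of the diffusion-limit (and, in the talk, Dirichlet-form) analyses is to justify this $1/\sqrt{d}$ scaling rigorously.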