A Dirichlet Form approach to MCMC Optimal Scaling


Introduction

Giacomo Zanella, Wilfrid S. Kendall, and Mylène Bédard.
g.zanella@warwick.ac.uk, w.s.kendall@warwick.ac.uk, mylene.bedard@umontreal.ca
Supported by EPSRC Research Grants EP/D002060, EP/K013939.
London Probability Seminar, King's College London, 7th October 2016.

Markov chain Monte Carlo (MCMC) quotes:
- Metropolis et al. (1953), running code on the Los Alamos MANIAC: "a feasible approach to statistical mechanics problems which are as yet not analytically soluble."
- Adrian Smith (circa 1990): "you set the MCMC sampler going, then you go off to the pub."
- Charlie Geyer (circa 1996), on the uses of MCMC: "If you can write down a model, I can do likelihood inference for it, ..., whatever."
- Elizabeth Thompson (circa 2004): "You should only use MCMC when all else has failed."

If you have to use MCMC, how to make it work as well as possible?

Example: MCMC for Anglo-Saxon statistics
Some historians conjecture that Anglo-Saxon placenames cluster by dissimilar names. Zanella (2016, 2015) uses MCMC: data provides some support, resulting in useful clustering.

MCMC idea
Goal: estimate E = E_π[h(X)].
Method: simulate an ergodic Markov chain with stationary distribution π; use the empirical estimate

    Ê_n = (1/n) Σ_{m=n_0+1}^{n_0+n} h(X_m).

(Much easier to apply theory if the chain is reversible.)
Theory: Ê_n → E almost surely.
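As a toy illustration of the ergodic-average estimator Ê_n above (not part of the talk; the two-state chain and all names below are our own hypothetical example), one can check the estimate against a chain whose stationary distribution is known in closed form:

```python
import random

# Toy reversible two-state chain on {0, 1}: P(0->1) = 0.25, P(1->0) = 0.5.
# Detailed balance gives stationary distribution pi = (2/3, 1/3).
def step(x, rng):
    if x == 0:
        return 1 if rng.random() < 0.25 else 0
    return 0 if rng.random() < 0.5 else 1

def mcmc_estimate(h, n, n0, seed=0):
    """Ergodic-average estimate of E_pi[h(X)], discarding n0 burn-in steps."""
    rng = random.Random(seed)
    x, total = 0, 0.0
    for m in range(n0 + n):
        x = step(x, rng)
        if m >= n0:
            total += h(x)
    return total / n

est = mcmc_estimate(h=lambda x: x, n=200_000, n0=1_000)
# E_pi[h] = pi(1) = 1/3, and est should be close to that
```

With h(x) = x the true value is π(1) = 1/3, so the ergodic average converges there almost surely, as the slide states.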

Varieties of MH MCMC
Here is the famous Metropolis-Hastings recipe for drawing from a distribution with density f:
- Propose Y using conditional density q(Y | x);
- Accept/Reject the move from X to Y, based on the ratio f(Y) q(X | Y) / (f(X) q(Y | X)).
Options:
1. Independence sampler: proposal q(y | x) = q(y) doesn't depend on x;
2. Random walk (MHRW) MCMC: proposal q(y | x) = q(y − x) behaves as a random walk;
3. Langevin MH MCMC (or MALA): proposal q(y | x) = q(y − x − λ grad log f) drifts towards high target density f.
We shall focus on MHRW MCMC with Gaussian proposals.

Gaussian MHRW MCMC
Simple Python code for Gaussian MHRW MCMC, using normal and exponential from Numpy:
- Propose a multivariate Gaussian step;
- Test whether to accept the proposal by comparing an exponential random variable with the log MH ratio;
- Implement the step if accepted (vector addition).

    while not mcmc.stopped():
        z = normal(0, tau, size=mcmc.dim)
        if exponential() > mcmc.phi(mcmc.x + z) - mcmc.phi(mcmc.x):
            mcmc.x += z
        mcmc.record_result()

MHRW MCMC with Gaussian proposals (I)
(smooth target, marginal proportional to exp(−x⁴))
Target is given by 10 i.i.d. coordinates. Scale parameter for proposal: τ = 0.01 is too small. Acceptance ratio 98.5%.

MHRW MCMC with Gaussian proposals (II)
(smooth target, marginal proportional to exp(−x⁴))
Target is given by 20 i.i.d. coordinates. Scale parameter for proposal: τ = 0.01 is clearly too small. Acceptance ratio 96.7%.
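The slide's sketch assumes a surrounding mcmc object; a self-contained, runnable version (function and parameter names are our own, with the target π ∝ exp(−Σ x_i⁴) and τ = 0.01 from the slides) might look like this:

```python
import numpy as np

def mhrw(phi, x0, tau, n_steps, seed=None):
    """Gaussian MHRW MCMC for a target density proportional to exp(-phi(x)).

    A proposed step z ~ N(0, tau^2 I) is accepted when an Exp(1) variable
    exceeds phi(x + z) - phi(x), i.e. with probability min(1, e^{-delta}).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    chain = np.empty((n_steps, x.size))
    accepts = 0
    for n in range(n_steps):
        z = rng.normal(0.0, tau, size=x.size)        # Gaussian proposal step
        if rng.exponential() > phi(x + z) - phi(x):  # log MH acceptance test
            x = x + z
            accepts += 1
        chain[n] = x                                 # record current state
    return chain, accepts / n_steps

# Target with i.i.d. marginals proportional to exp(-x^4), dimension 10
phi = lambda x: np.sum(x**4)
chain, acc_rate = mhrw(phi, x0=np.zeros(10), tau=0.01, n_steps=2_000, seed=1)
```

As on the slides, such a tiny proposal scale yields an acceptance rate close to 1, which is the symptom of a badly tuned sampler that the optimal scaling theory below diagnoses.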

MCMC Optimal Scaling: classic result (I)
MHRW MCMC on (R^d, π^d), with π(dx_i) = e^{−φ(x_i)} dx_i and MH acceptance rule A^(d) = 0 or 1:

    X^(d)_0 = (X_1, ..., X_d),   X_i i.i.d. π,
    X^(d)_1 = (X_1 + A^(d) W_1, ..., X_d + A^(d) W_d),   W_i i.i.d. N(0, σ_d²).

Questions: (1) complexity as d → ∞? (2) optimal σ_d?

Theorem (Roberts, Gelman and Gilks, 1997). Given σ_d² = σ²/d, Lipschitz φ′, and finite E_π[(φ′)⁸], E_π[(φ″)⁴]:

    {X^(d)_{⌊td⌋, 1}}_t ⇒ Z,   where dZ_t = s(σ)^{1/2} dB_t − s(σ) φ′(Z_t)/2 dt.

Answers: (1) mix in O(d) steps; (2) σ_max = argmax_σ s(σ).

MCMC Optimal Scaling: classic result (II)
How to maximize s(σ)? Given I = E_π[φ′(X)²] and normal CDF Φ,

    s(σ) = σ² · 2Φ(−σ√I/2) = σ² A(σ) = (4/I) (Φ^{−1}(A(σ)/2))² A(σ),

so σ_max is given by maximizing over the asymptotic acceptance rate:

    A(σ_max) = argmax_{A ∈ [0,1]} { (Φ^{−1}(A/2))² A } ≈ 0.234.

Strengths:
- Establishes complexity as d → ∞;
- Practical information on how to tune the proposal;
- Does not depend on φ (CLT-type universality).
Some weaknesses that we address (there are others):
- Convergence of marginal rather than joint distribution;
- Strong regularity assumptions: Lipschitz g′, finite E[(g′)⁸], E[(g″)⁴].

MCMC Optimal Scaling: classic result (III)
There is a wide range of extensions: for example,
- Langevin / MALA, for which the magic acceptance probability is 0.574 (Roberts and Rosenthal 1998);
- Non-identically distributed independent target coordinates (Bédard 2007);
- Gibbs random fields (Breyer and Roberts 2000);
- Infinite dimensional random fields (Mattingly, Pillai, and Stuart 2012);
- Markov chains on a hypercube (Roberts 1998);
- Adaptive MCMC: adjust online to optimize acceptance probability (Andrieu and Thoms 2008; Rosenthal 2011).
All these build on the s.d.e. approach of Roberts, Gelman, and Gilks (1997); hence regularity conditions tend to be severe (though see Durmus, Le Corff, Moulines, and Roberts 2016).

Dirichlet forms and MCMC 1: Definition of Dirichlet form
A (symmetric) Dirichlet form E on a Hilbert space H is a bilinear function E(u, v), defined for any u, v ∈ D ⊆ H, which satisfies:
1. D is a dense linear subspace of H;
2. E(u, v) = E(v, u) for u, v ∈ D, so E is symmetric;
3. E(u) = E(u, u) ≥ 0 for u ∈ D;
4. D is a Hilbert space under the ("Sobolev") inner product ⟨u, v⟩ + E(u, v);
5. If u ∈ D then u* = (u ∧ 1) ∨ 0 ∈ D, and moreover E(u*, u*) ≤ E(u, u).
Related to Markov processes if (quasi-)regular.
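The optimal-acceptance computation above is easy to reproduce numerically. The following sketch (our own, taking I = 1 without loss of generality) maximizes the diffusion speed s(σ) = 2σ²Φ(−σ√I/2) by grid search and recovers the magic number 0.234:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def diffusion_speed(sigma, I=1.0):
    """s(sigma) = sigma^2 * 2 * Phi(-sigma * sqrt(I) / 2)."""
    return sigma**2 * 2.0 * Phi(-sigma * math.sqrt(I) / 2.0)

# Grid search for the maximizer of s(sigma) with I = 1
sigmas = [0.001 * k for k in range(1, 10_000)]
sigma_max = max(sigmas, key=diffusion_speed)

# Asymptotic acceptance rate at the optimum: A(sigma_max) = 2*Phi(-sigma_max/2)
accept_rate = 2.0 * Phi(-sigma_max / 2.0)
# accept_rate comes out near the magic value 0.234 (and sigma_max near 2.38)
```

Since s depends on the target only through I, and the optimal acceptance rate does not depend on I at all, this single computation exhibits the CLT-type universality noted among the strengths.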

Dirichlet forms and MCMC 2: Two examples
1. Dirichlet form obtained from (re-scaled) MHRW MCMC:

    E^(d)(h) = (d/2) E[(h(X^(d)_1) − h(X^(d)_0))²].

(E^(d) can be viewed as the Dirichlet form arising from speeding up the MHRW MCMC by rate d.)
2. Heuristic infinite-dimensional diffusion limit of this form under scaling:

    E(h) = (s(σ)/2) E_π[‖∇h‖²].

Can we deduce that the MHRW MCMC scales to look like the infinite-dimensional diffusion, by showing that E^(d) converges to E?

Useful modes of convergence for Dirichlet forms
1. Gamma-convergence: E_n Γ-converges to E if
   (Γ1) E(h) ≤ lim inf_n E_n(h_n) whenever h_n → h ∈ H;
   (Γ2) for every h ∈ H there are h_n → h ∈ H such that E(h) ≥ lim sup_n E_n(h_n).
2. Mosco (1994) introduces stronger conditions:
   (M1) E(h) ≤ lim inf_n E_n(h_n) whenever h_n → h weakly in H;
   (M2) for every h ∈ H there are h_n → h strongly in H such that E(h) ≥ lim sup_n E_n(h_n).
3. Mosco (1994, Theorem 2.4.1, Corollary 2.6.1): conditions (M1) and (M2) imply convergence of the associated resolvent operators, and indeed of the associated semigroups.
4. Sun (1998) gives further conditions which imply weak convergence of the associated processes: these conditions are implied by the existence of a finite constant C such that E_n(h) ≤ C(‖h‖² + E(h)) for all h ∈ H.

Results
Theorem (Zanella, Kendall and Bédard, 2016). Consider the Gaussian MHRW MCMC based on proposal variance σ²/d with target π^d, where π(dx) = f dx = e^{−φ} dx. Suppose I = ∫ (φ′)² f dx < ∞ (finite Fisher information), and

    |φ′(x + v) − φ′(x)| < κ max{|v|^γ, |v|^α}

for some κ > 0, 0 < γ < 1, and α > 1. Let E^(d) be the corresponding Dirichlet form, scaled as above. Then E^(d) Mosco-converges to the limit form E, with speed s(σ) = σ² E[1 ∧ exp(N(−σ²I/2, σ²I))], so the corresponding L² semigroups also converge.

Corollary. Suppose in the above that φ′ is globally Lipschitz. Then the correspondingly scaled processes exhibit weak convergence.
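The acceptance-rate expectation E[1 ∧ exp(N(−σ²I/2, σ²I))] appearing in the Theorem has the well-known closed form 2Φ(−σ√I/2), matching the classic s(σ) formula. A quick Monte Carlo check (our own sketch, with I = 1 and the near-optimal σ = 2.38):

```python
import math
import numpy as np

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

sigma, I = 2.38, 1.0
m = sigma**2 * I                                   # variance sigma^2 * I
rng = np.random.default_rng(0)

# Z ~ N(-sigma^2 I / 2, sigma^2 I); estimate E[1 ∧ exp(Z)]
z = rng.normal(-m / 2.0, math.sqrt(m), size=400_000)
estimate = np.minimum(1.0, np.exp(z)).mean()

closed_form = 2.0 * Phi(-sigma * math.sqrt(I) / 2.0)   # ≈ 0.234 at this sigma
```

The two quantities agree to Monte Carlo accuracy, confirming that the speed in the Mosco limit reproduces the classic asymptotic acceptance rate.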
Methods of proof 1: a CLT result
Lemma. Under the conditions of the Theorem, almost surely (in x with invariant measure π) the log Metropolis-Hastings ratio converges weakly (in W) as follows as d → ∞:

    log Π_{i=1}^d [f(x_i + σW_i/√d) / f(x_i)] = −Σ_{i=1}^d (φ(x_i + σW_i/√d) − φ(x_i)) ⇒ N(−σ²I/2, σ²I).

We may use this to deduce the asymptotic acceptance rate of the MHRW MCMC sampler.
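The Lemma can be checked by simulation. Taking a standard Gaussian target φ(x) = x²/2 (our choice, so that I = E[(φ′)²] = 1 and the Lemma's conditions hold), the log MH ratio for large d should be approximately N(−σ²/2, σ²):

```python
import numpy as np

sigma, d, reps = 2.38, 500, 5_000
rng = np.random.default_rng(0)
phi = lambda x: 0.5 * x**2            # standard Gaussian target, I = 1

x = rng.normal(size=(reps, d))        # x_i drawn from the invariant measure pi
w = rng.normal(size=(reps, d))        # proposal noise W_i ~ N(0, 1)
h = sigma * w / np.sqrt(d)            # coordinate-wise increments sigma*W_i/sqrt(d)

# Log MH ratio per replicate: -sum_i (phi(x_i + h_i) - phi(x_i))
log_ratio = -(phi(x + h) - phi(x)).sum(axis=1)

mean, var = log_ratio.mean(), log_ratio.var()
# mean should be near -sigma^2/2 ≈ -2.83 and var near sigma^2 ≈ 5.66
```

The sample mean and variance land close to the Lemma's limiting values, illustrating the characteristic mean-equals-minus-half-variance structure of the limiting Gaussian.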

Key idea for CLT

    Σ_i (φ(x_i + σW_i/√d) − φ(x_i)) = Σ_i φ′(x_i) σW_i/√d + Σ_i (σW_i/√d) ∫₀¹ (φ′(x_i + u σW_i/√d) − φ′(x_i)) du.

Condition implicitly on x for the first 2.5 steps.
1. The first summand converges weakly to N(0, σ²I).
2. Decompose the variance of the second summand to deduce
    Var[Σ_i (σW_i/√d) ∫₀¹ (φ′(x_i + u σW_i/√d) − φ′(x_i)) du] → 0.
3. Use Hoeffding's inequality then absolute expectations:
    E[Σ_i (σW_i/√d) ∫₀¹ (φ′(x_i + u σW_i/√d) − φ′(x_i)) du] → σ²I/2.

Methods of proof 2: establishing condition (M2)
For every h ∈ L²(π), find h_n → h (strongly in L²(π)) such that E(h) ≥ lim sup_n E_n(h_n).
1. It is sufficient to consider the case E(h) < ∞.
2. Find a sequence of smooth cylinder functions h_n with compact cylindrical support, such that |E(h) − E(h_n)| ≤ 1/n.
3. Using smoothness etc., E_m(h_n) → E(h_n) as m → ∞.
4. Subsequences...

Methods of proof 3: establishing condition (M1)
If h_n → h weakly in L²(π), show E(h) ≤ lim inf_n E_n(h_n).
1. Set Ψ_d(h) = √(d/2) (h(X^(d)_0) − h(X^(d)_1)), so that E^(d)(h) = E[Ψ_d(h)²].
2. Integrate against a test function ξ(x_{1:N}, W_{1:N}) I(U < a(x_{1:N}, W_{1:N})), for ξ smooth with compact support and U a Uniform(0, 1) random variable. Apply Cauchy-Schwarz.
3. Use integration by parts, careful analysis, and the conditions on φ′.

Conclusion
- The Dirichlet form approach allows significant relaxation of the conditions required for optimal scaling results.
- Need to explore whether further relaxation can be obtained (almost surely possible).
- Need to explore development beyond i.i.d. targets; e.g. can regularity be similarly relaxed in more general random field settings?
- Can this be applied in discrete Markov chain cases (c.f. Roberts 1998)?
- Investigate applications to Adaptive MCMC.

Andrieu, C. and J. Thoms (2008). A tutorial on adaptive MCMC. Statistics and Computing 18(4), 343–373.
Bédard, M. (2007). Weak convergence of Metropolis algorithms for non-i.i.d. target distributions. Annals of Applied Probability 17(4), 1222–1244.
Breyer, L. A. and G. O. Roberts (2000). From Metropolis to diffusions: Gibbs states and optimal scaling. Stochastic Processes and their Applications 90(2), 181–206.
Durmus, A., S. Le Corff, E. Moulines, and G. O. Roberts (2016). Optimal scaling of the Random Walk Metropolis algorithm under L^p mean differentiability. arXiv 1604.06664.
Geyer, C. (1999). Likelihood inference for spatial point processes. In O. E. Barndorff-Nielsen, WSK, and M. N. M. van Lieshout (Eds.), Stochastic Geometry: Likelihood and Computation, Chapter 4, pp. 79–140. Boca Raton: Chapman & Hall/CRC.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
Mattingly, J. C., N. S. Pillai, and A. M. Stuart (2012). Diffusion limits of the random walk Metropolis algorithm in high dimensions. Annals of Applied Probability 22(3), 881–890.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics 21(6), 1087–1092.
Mosco, U. (1994). Composite media and asymptotic Dirichlet forms. Journal of Functional Analysis 123(2), 368–421.
Roberts, G. O. (1998). Optimal Metropolis algorithms for product measures on the vertices of a hypercube. Stochastics and Stochastic Reports, 37–41.
Roberts, G. O., A. Gelman, and W. Gilks (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability 7(1), 110–120.
Roberts, G. O. and J. S. Rosenthal (1998). Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society B 60(1), 255–268.
Rosenthal, J. S. (2011). Optimal proposal distributions and adaptive MCMC. In Handbook of Markov Chain Monte Carlo, Chapter 4, pp. 93–112.

Sun, W. (1998). Weak convergence of Dirichlet processes. Science in China Series A: Mathematics 41(1), 8–21.
Thompson, E. A. (2005). MCMC in the analysis of genetic data on pedigrees. In WSK, F. Liang, and J.-S. Wang (Eds.), Markov Chain Monte Carlo: Innovations and Applications, Chapter 5, pp. 183–216. Singapore: World Scientific.
Zanella, G. (2015). Bayesian Complementary Clustering, MCMC, and Anglo-Saxon Placenames. PhD thesis, University of Warwick.
Zanella, G. (2016). Random partition models and complementary clustering of Anglo-Saxon placenames. Annals of Applied Statistics 9(4), 1792–1822.
Zanella, G., WSK, and M. Bédard (2016). A Dirichlet Form approach to MCMC Optimal Scaling. arXiv 1606.01528, 22pp.