The parallel replica method for Markov chains

The parallel replica method for Markov chains
David Aristoff (joint work with T. Lelièvre and G. Simpson)
Colorado State University, March 2015

Introduction

Definition. Let (X_n) be a time-homogeneous Markov chain:

P(X_{n+1} ∈ A | X_0, …, X_{n−1}, {X_n = x}) = ∫_A K(x, dy)

Our goal: to efficiently simulate (X_n) when it is metastable.

The state space can be discrete (random walk in random environment, Markov State Models) or continuous (time discretization of molecular dynamics (MD)).

(X_n) can be reversible (MCMC, time discretization of Brownian/Langevin MD) or non-reversible (stochastic MD with flow/external forces).

We are interested in both dynamics and equilibrium!
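Throughout, it helps to have a concrete example in mind. Below is a minimal Python sketch of a toy metastable chain of the reversible/MCMC type mentioned above: a Metropolis walk in a one-dimensional double-well potential. Everything here (the potential V, the parameters beta and dx) is our own illustration, not from the talk; the later sketches reuse metropolis_step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy chain (our own illustration): V is a double-well potential with minima
# at x = -1 and x = +1, so the Metropolis chain below makes long sojourns in
# one well before hopping over the barrier at x = 0 -- metastability.

def V(x):
    return (x**2 - 1.0)**2

def metropolis_step(x, beta=8.0, dx=0.25):
    """One transition of the Metropolis kernel K(x, dy)."""
    y = x + rng.uniform(-dx, dx)
    # Accept with probability min(1, exp(-beta * (V(y) - V(x)))).
    if rng.random() < np.exp(-beta * (V(y) - V(x))):
        return y
    return x
```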

Introduction

Metastability can arise from energetic barriers or from entropic barriers.

[Figure: energetic (left) and entropic (right) barriers for Brownian/Langevin dynamics. In the entropic example, V = 0 in the blue region and V = ∞ in the red region. Here, S is a metastable subset of state space.]

The QSD

Definition. If ν is a probability measure supported in S with

ν(·) = P(X_n ∈ · | X_0 ∼ ν, X_1, …, X_n ∈ S),  n ≥ 1,

then ν is a quasi-stationary distribution (QSD) in S.

If for any µ supported in S and any A ⊆ S,

ν(A) = lim_{n→∞} P(X_n ∈ A | X_0 ∼ µ, X_1, …, X_n ∈ S),

then ν is the unique QSD in S.

Intuitively: (X_n) reaches ν after spending a long time in S without leaving.

Our setting: the time scale to reach ν ≪ the time scale to leave S. We need to understand exit events from S, starting from ν.
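The first characterization suggests a direct (if costly) way to sample ν. Here is a minimal sketch for the toy chain, assuming the metropolis_step defined earlier: run many independent copies and keep those that have not left S after n steps.

```python
# Sketch of sampling the QSD, following the definition above: evolve many
# independent copies of the chain and keep only those that never left S up
# to time n_steps; the surviving empirical distribution approximates nu.

def sample_qsd(in_S, x0, n_chains=10_000, n_steps=200):
    survivors = []
    for _ in range(n_chains):
        x = x0
        for _ in range(n_steps):
            x = metropolis_step(x)
            if not in_S(x):              # this copy left S: discard it
                break
        else:
            survivors.append(x)          # stayed in S for all n_steps
    return np.array(survivors)

# E.g., the QSD in the right well of the toy chain:
#   nu = sample_qsd(lambda x: x > 0.0, 1.0)
```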

The QSD

Let ν be the QSD in S and τ the first exit time from S.

Theorem. If X_0 ∼ ν, then τ is geometrically distributed, and τ and X_τ are independent.

The geometric distribution is

P(τ > n) = (1 − p)^n,  p = P(X_1 ∉ S | X_0 ∼ ν).

It is memoryless: P(τ > n + k | τ > k) = P(τ > n).
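The theorem is easy to probe numerically. A hedged sketch, reusing the toy chain and sample_qsd from above with S = (0, ∞): the empirical tail of τ should match the geometric tail (1 − p)^n.

```python
# Numerical check of the theorem for S = (0, infinity) in the toy chain:
# starting from (approximate) QSD samples, exit times should be geometric.

in_S = lambda x: x > 0.0

def exit_time(x, max_steps=1_000_000):
    for n in range(1, max_steps + 1):
        x = metropolis_step(x)
        if not in_S(x):
            return n                     # first time the chain leaves S
    return max_steps

nu = sample_qsd(in_S, 1.0)
taus = np.array([exit_time(x) for x in rng.choice(nu, size=1_000)])
p_hat = 1.0 / taus.mean()                # geometric law: E[tau] = 1/p
for n in (10, 100, 1_000):
    print(np.mean(taus > n), (1.0 - p_hat)**n)   # should roughly agree
```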

The QSD

[Figure: serial simulation of an exit event from S, starting from ν.]

[Figure: parallel simulation of an exit event from S, starting from ν.]

[Figure: parallel simulation of an exit event from S, with a tie.]

[Figure: parallel simulation of an exit event from S, with a tiebreaker.]

The Parallel Step

Parallel Step. Choose a polling time T_poll, set M = 1, and iterate:
1. Start N iid replicas of (X_n) at independent samples of ν.
2. Evolve the replicas from time (M − 1)·T_poll to time M·T_poll.
3. If no replica leaves S during this time, update M = M + 1 and return to 2. Else, stop and compute an exit time τ_acc and exit point X_acc.

(A sketch implementation follows.)
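A minimal sketch of the Parallel Step for the toy chain, reusing the pieces above. This is our own illustration: the replicas are evolved sequentially here, whereas in practice each runs on its own processor; ties are broken by lowest replica index, matching the τ_acc formula on the later slides.

```python
def parallel_step(nu_samples, in_S, N=10, T_poll=50):
    X = rng.choice(nu_samples, size=N)   # 1. N iid replicas, each ~ nu
    M = 1
    while True:
        exits = {}                       # replica index -> (exit time, point)
        # 2. Evolve each replica from time (M-1)*T_poll to M*T_poll.
        for k in range(N):
            x = X[k]
            for n in range((M - 1) * T_poll + 1, M * T_poll + 1):
                x = metropolis_step(x)
                if not in_S(x):          # replica k leaves S at time n
                    exits[k] = (n, x)
                    break
            X[k] = x
        if not exits:                    # 3. no exits: poll again
            M += 1
            continue
        K = 1 + min(exits)               # accepted replica, 1-based
        tau_K, X_acc = exits[K - 1]
        tau_acc = ((M - 1) * N * T_poll + (K - 1) * T_poll
                   + (tau_K - (M - 1) * T_poll))
        return tau_acc, X_acc            # distributed as (tau, X_tau)

# E.g.: tau_acc, X_acc = parallel_step(sample_qsd(in_S, 1.0), in_S)
```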

The Parallel Step

[Figure, built up over several slides: replicas 1, …, N evolved in parallel over polling intervals of length T_poll. The serial clock runs over 0, N·T_poll, 2N·T_poll, …, so each polling interval advances the simulation clock by N·T_poll, until some replica exits S at accumulated time τ_acc.]

The Parallel Step

[Figure: the replica diagram, with replica K the first to exit, at time τ_K during the M-th polling interval (the red cross).]

τ_acc = sum of the lengths of the blue lines
      = (M − 1)·N·T_poll + (K − 1)·T_poll + [τ_K − (M − 1)·T_poll]

X_acc = position of the K-th replica at the red cross (time τ_K)
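For concreteness, a worked instance of the formula (our own numbers): take N = 3 replicas, T_poll = 10, and suppose replica K = 2 is the first to exit, at time τ_K = 13 during the M = 2nd polling interval. Then

τ_acc = (2 − 1)·3·10 + (2 − 1)·10 + (13 − 10) = 30 + 10 + 3 = 43,

i.e., one full polling interval across all three replicas, plus replica 1's full share of the second interval, plus replica 2's three steps into it.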

The Parallel Step

Parallel Step. Choose a polling time T_poll, set M = 1, and iterate:
1. Start N iid replicas of (X_n) at independent samples of ν.
2. Evolve the replicas from time (M − 1)·T_poll to time M·T_poll.
3. If no replica leaves S during this time, update M = M + 1 and return to 2. Else, record the exit event (τ_acc, X_acc).

Theorem. The Parallel Step is exact: (τ_acc, X_acc) ∼ (τ, X_τ), with τ and X_τ the first exit time and exit point from S, starting at ν.

The Main Algorithm

Let 𝒮 be a collection of disjoint subsets of state space, and let T_corr = T_corr(S) be the time it takes to reach the QSD in S.

Main Algorithm
1. Evolve (X_n) until it spends time T_corr in some set S ∈ 𝒮 without leaving.
2. Run the Parallel Step in S. Then return to 1.

Let Π be the quotient map which mods out state space by 𝒮.

Coarse dynamics. The Main Algorithm produces an approximation (X̃_n) of (Π(X_n)) via X̃_n = Π(X_n) in Step 1 and X̃_n = Π(S) in Step 2. (A sketch of the full loop follows.)
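A sketch of the Main Algorithm's control flow for the toy chain, reusing the pieces above (our own illustration; here the collection 𝒮 is simply the two half-lines around the wells, which happen to partition state space).

```python
def in_set(x):
    return "left" if x < 0.0 else "right"    # the two sets of the collection

def main_algorithm(x, qsd_samples, T_corr=200, n_events=10):
    events = []
    while len(events) < n_events:
        # Step 1: evolve until the chain spends T_corr consecutive steps
        # in one set S of the collection without leaving it.
        S, inside = in_set(x), 0
        while inside < T_corr:
            x = metropolis_step(x)
            if in_set(x) == S:
                inside += 1
            else:
                S, inside = in_set(x), 0
        # Step 2: the chain has (approximately) reached the QSD in S; run
        # the Parallel Step there, then continue from the exit point.
        tau_acc, x = parallel_step(qsd_samples[S],
                                   lambda y, S=S: in_set(y) == S)
        events.append((S, tau_acc, x))
    return events

qsd_samples = {"left":  sample_qsd(lambda x: x < 0.0, -1.0),
               "right": sample_qsd(lambda x: x > 0.0, 1.0)}
# events = main_algorithm(-1.0, qsd_samples)
```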

The Main Algorithm

Main Algorithm
1. Evolve (X_n) until it spends time T_corr in some set S ∈ 𝒮 without leaving.
2. Run the Parallel Step in S. Then return to 1.

Let f be a function on state space and π the equilibrium distribution of (X_n).

Equilibrium averages. The Main Algorithm produces an approximation f_sim/T_sim of ∫ f dπ via

f_sim ← f_sim + f(X_n) in Step 1,
f_sim ← f_sim + f_acc in Step 2,

where f_acc is the sum of the values of f over the replicas in the Parallel Step. In Step 1, T_sim increases by 1 for each time step; in Step 2, T_sim increases by τ_acc. (See the snippet below.)
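In the running sketch, the Step 1 accumulators are two lines wrapped around each chain update (our own illustration; in Step 2 one would instead add f_acc and τ_acc, with f_acc collected inside parallel_step).

```python
def f(x):
    return 1.0 if x > 0.0 else 0.0       # indicator of the right well

f_sim, T_sim, x = 0.0, 0, -1.0
for _ in range(100_000):
    x = metropolis_step(x)               # Step 1 of the Main Algorithm
    f_sim += f(x)                        # accumulate f along the trajectory
    T_sim += 1                           # one unit of simulated time per step
print(f_sim / T_sim)                     # estimate of the pi-average of f
```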

The Main Algorithm

[Figure: the replica diagram again, with replica K exiting at time τ_K.]

τ_acc = sum of the lengths of the blue lines
f_acc = sum of the values of f over the positions of the replicas along the blue lines

The Main Algorithm

Idealization. After spending T_corr consecutive time steps in S, (X_n) exactly reaches the QSD.

Assumption. (X_n) is uniformly ergodic:

lim_{n→∞} sup_x ‖P(X_n ∈ · | X_0 = x) − π‖_TV = 0.

Theorem. Under the Idealization, the Main Algorithm generates exact coarse dynamics: (X̃_n) ∼ (Π(X_n)). Under the Idealization and the Assumption, the Main Algorithm generates exact equilibrium averages:

lim_{T_sim→∞} f_sim/T_sim = ∫ f dπ   a.s.

Numerics

[Figure: a 200 × 200 grid with regions S_1 and S_2 separated by barriers pierced by narrow pathways.]

A random walk with entropic barriers: at each step it moves up, down, left or right at random, unless that move would cross a barrier.
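A sketch of this walk (our own reconstruction from the description above; the figure's exact barrier geometry is not reproduced, so blocked() below uses a stand-in: a 200 × 200 box with one interior wall pierced by a narrow gap).

```python
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]        # up, down, right, left

def blocked(p, q):
    """True if the move p -> q would cross a barrier (stand-in geometry)."""
    x, y = q
    if not (0 <= x < 200 and 0 <= y < 200):       # outer walls
        return True
    crossing_wall = {p[0], x} == {99, 100}        # stepping across x = 99.5
    return crossing_wall and not (95 <= y < 105)  # blocked except at the gap

def walk_step(p):
    dx, dy = MOVES[rng.integers(4)]
    q = (p[0] + dx, p[1] + dy)
    return p if blocked(p, q) else q              # refuse barrier crossings
```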

Numerics

[Figure: equilibrium-average estimates of x, y, and f, and the time speedup, plotted against T_corr(S_1), when N = 100. Here f is the indicator that (x, y) lies in the upper half of the right box.]

Numerics

[Figure: Parallel Step statistics as functions of N (time speedup, decorrelation steps, parallel steps, parallel loops), when T_corr = the largest value from the last slide and T_sim = 5 × 10^9.]

Numerics

[Figure: a piecewise-linear 1D energy landscape over states 0 to 60, with wells S_1, S_2, S_3 and segments of slope 0.05 and 0.10.]

A random walk with energetic barriers: at each step, it moves to the left with probability 1/2 + slope, and to the right with probability 1/2 − slope.
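And a sketch of this one-dimensional walk (our own reconstruction; slope(x) below is a stand-in zig-zag landscape using the two slope values labeled in the figure, not the figure's exact profile).

```python
def slope(x):
    # Stand-in piecewise-linear landscape: alternate the figure's two slopes.
    return 0.05 if (x // 20) % 2 == 0 else -0.10

def energetic_step(x):
    # Move left w.p. 1/2 + slope(x), right w.p. 1/2 - slope(x); reflecting
    # walls keep the walk in {0, ..., 60}.
    if rng.random() < 0.5 + slope(x):
        return max(x - 1, 0)
    return min(x + 1, 60)
```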

Numerics

[Figure: equilibrium-average estimates of x and f, and the time speedup, plotted against T_corr(S_3), when N = 100. Here f is the indicator of the right half of state space.]

Numerics

[Figure: Parallel Step statistics as functions of N (time speedup, decorrelation steps, parallel steps, parallel loops), when T_corr = the largest value from the last slide and T_sim = 10^9.]

Conclusion

Remarks
- Our algorithm applies to any Markov chain.
- It is efficient when, for S ∈ 𝒮, the time scale to reach the QSD in S ≪ the time scale to leave S.
- 𝒮 need not be known a priori: the sets in 𝒮 can be identified on the fly.

Future work
- (X_n) only approximately reaches ν. How does this error propagate?
- When can 𝒮 be efficiently identified on the fly?

Conclusion

Acknowledgements

Collaborators: T. Lelièvre (École des Ponts ParisTech), G. Simpson (Drexel University)

Funding: DOE Award DE-SC0002085 (thanks to Mitch Luskin for the support)

http://www.math.colostate.edu/~aristoff/