Improved diffusion Monte Carlo

Improved diffusion Monte Carlo
Jonathan Weare, University of Chicago
with Martin Hairer (University of Warwick)
October 4, 2012

Diffusion Monte Carlo

The original motivation for DMC was to compute averages with respect to the ground state eigenfunction of

    Hψ = -(1/2)Δψ + uψ.

Notice that if ψ solves the PDE

    ∂_t ψ = -Hψ

then

    ψ(t, x) = Σ_i e^{-λ_i t} φ_i(x).

So if you can compute averages with respect to ψ/∫ψ dx then, for large t, you can approximate averages with respect to φ_0/∫φ_0 dx. We have the Feynman-Kac representation

    ∫ f(x) ψ(t, x) dx = E[f(W(t)) e^{-∫_0^t u(W(s)) ds}],

where W is a Brownian motion.

Using the trapezoidal approximation of the integral:

    E[f(W(t)) e^{-∫_0^t u(W(s)) ds}] ≈ E[f(W(t)) e^{-Σ_{j=1}^{t/dt} (dt/2)(u(W(j dt)) + u(W((j-1) dt)))}].

This is of the general form

    E[f(X(t_k)) e^{-Σ_{j=1}^k v(X(t_{j-1}), X(t_j))}].
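To make the discretization concrete, here is a minimal Monte Carlo sketch of the weighted average above (function names are mine, not from the talk). For constant u the trapezoidal rule is exact, which gives a simple sanity check.

```python
import numpy as np

def feynman_kac_estimate(f, u, t, dt, n_paths, rng):
    """Plain Monte Carlo estimate of E[f(W(t)) exp(-int_0^t u(W(s)) ds)]
    using the trapezoidal approximation of the integral."""
    n_steps = int(round(t / dt))
    x = np.zeros(n_paths)            # W(0) = 0 for every path
    log_weight = np.zeros(n_paths)
    for _ in range(n_steps):
        x_new = x + np.sqrt(dt) * rng.standard_normal(n_paths)
        # v(x, y) = (dt/2)(u(x) + u(y)) accumulates the trapezoidal sum
        log_weight -= 0.5 * dt * (u(x) + u(x_new))
        x = x_new
    return np.mean(f(x) * np.exp(log_weight))

# sanity check with constant u(x) = c: the trapezoidal rule is exact, and
# E[W(t)^2 e^{-ct}] = t e^{-ct} for f(x) = x^2
c, t = 0.5, 1.0
est = feynman_kac_estimate(lambda x: x**2, lambda x: c + 0 * x,
                           t, 0.01, 50_000, np.random.default_rng(0))
print(est, np.exp(-c * t) * t)
```

The point of the general form E[f(X(t_k)) e^{-Σ_j v}] is that the running product of per-step factors e^{-v} is exactly the `log_weight` accumulator above.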

Diffusion Monte Carlo generates an ensemble of points X^i(t) so that

    E[Σ_{i=1}^{N(t_k)} f(X^i(t_k))] = E[f(X(t_k)) e^{-Σ_{j=1}^k v(X(t_{j-1}), X(t_j))}]

for any reasonable observable f, where X(t_0), X(t_1), X(t_2), ... is some underlying process. The ensemble of points evolves in two steps:

1. Evolve each point according to the underlying dynamics for one increment.
2. To incorporate the additional weight factor e^{-v(X(t_{k-1}), X(t_k))}, copy particles with large weight and kill those with low weight.

DMC proceeds from an ensemble {X^i(t_{k-1})} of size N(t_{k-1}) at time t_{k-1} to an ensemble of size N(t_k) at time t_k as follows:

1: for i = 1 : N(t_{k-1})
2:   evolve the sample X^i(t_{k-1}) to time t_k: X^i(t_{k-1}) → X̃^i(t_k)
3:   generate a random integer N^i ≥ 0 with E[N^i] = e^{-v(X^i(t_{k-1}), X̃^i(t_k))}
4:   add N^i copies of X̃^i(t_k) to the time t_k ensemble
5: set N(t_k) = Σ_{i=1}^{N(t_{k-1})} N^i
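The five steps above can be sketched in a few lines (a toy implementation under my own naming; stochastic rounding is one convenient way to realize E[N^i] = e^{-v}):

```python
import numpy as np

def dmc_step(ensemble, evolve, v, rng):
    """One step of naive DMC: evolve each walker, then branch it into
    N_i copies with E[N_i] = exp(-v(x_old, x_new))."""
    new_ensemble = []
    for x in ensemble:
        x_new = evolve(x, rng)
        w = np.exp(-v(x, x_new))
        # stochastic rounding: N_i = floor(w) + Bernoulli(frac(w)), so E[N_i] = w
        n_copies = int(w) + (rng.random() < (w - int(w)))
        new_ensemble.extend([x_new] * n_copies)
    return new_ensemble

# toy check: with v constant the mean population growth factor is exp(-v)
rng = np.random.default_rng(1)
ens = [0.0] * 10_000
ens = dmc_step(ens, lambda x, r: x + 0.1 * r.standard_normal(),
               lambda x, y: -0.3, rng)
print(len(ens) / 10_000)   # should be close to exp(0.3) ≈ 1.35
```

Note that the population size fluctuates from step to step; keeping those fluctuations under control is exactly the issue the ticketed scheme below addresses.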

Over the last 40 years or so the basic DMC idea has spread. For example, in sequential Monte Carlo (e.g. particle filters) one transforms N samples X^i(t_{k-1}), approximately drawn from some density p_{k-1}(x), into N samples X^i(t_k) approximately drawn from

    p_k(y) ∝ e^{-v(y)} ∫ p(y|x) p_{k-1}(x) dx,

where e^{-v(y)} is some weight function (e.g. from data) and p(y|x) is a transition density. This is done by

1. sampling X̃^i(t_k) ~ p(·|X^i(t_{k-1})),
2. weighting each sample by e^{-v(X̃^i(t_k))},
3. resampling from the weighted distribution.

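The sample/weight/resample loop can be sketched generically as follows (names are mine; multinomial resampling is one common choice among several):

```python
import numpy as np

def smc_step(particles, transition_sample, neg_log_weight, rng):
    """One SMC step: propagate X_i ~ p(.|X_i), weight by exp(-v(X_i)),
    and resample N particles from the weighted ensemble."""
    n = len(particles)
    # 1. sample from the transition density
    proposed = np.array([transition_sample(x, rng) for x in particles])
    # 2. weight each sample
    w = np.exp(-neg_log_weight(proposed))
    w /= w.sum()
    # 3. resample (multinomial) from the weighted distribution
    idx = rng.choice(n, size=n, p=w)
    return proposed[idx]

# toy example: N(0,1) prior pushed through weight exp(-(y-1)^2/2);
# the reweighted distribution is N(0.5, 0.5), so the mean should be near 0.5
rng = np.random.default_rng(2)
parts = smc_step(np.zeros(5_000), lambda x, r: x + r.standard_normal(),
                 lambda y: 0.5 * (y - 1.0) ** 2, rng)
print(len(parts), parts.mean())
```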

In the Quantum Monte Carlo application v was specified by the problem. In other applications we may want to choose v to achieve some goal. What if we choose v(x, y) = G(y) - G(x)? Then DMC would compute

    E[Σ_{i=1}^{N(t_k)} f(X^i(t_k))] = e^{G(X(0))} E[f(X(t_k)) e^{-G(X(t_k))}]

or (by redefining f)

    e^{-G(X(0))} E[Σ_{i=1}^{N(t_k)} f(X^i(t_k)) e^{G(X^i(t_k))}] = E[f(X(t_k))].

Recall the branching rule:

3: generate a random integer N^i ≥ 0 with E[N^i] = e^{-(G(X̃^i(t_k)) - G(X^i(t_{k-1})))}
4: add N^i copies of X̃^i(t_k) to the time t_k ensemble

So if G decreases in a step, more copies will be created. By choosing G to be relatively small in a region, we can sample that region more thoroughly.
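The telescoping of the weights when v(x, y) = G(y) - G(x) is easy to verify numerically; a toy sketch with an arbitrary test function G:

```python
import numpy as np

# with v(x, y) = G(y) - G(x), the accumulated weight telescopes:
# exp(-sum_j v(X_{j-1}, X_j)) = exp(G(X_0) - G(X_k))
G = lambda x: np.sin(3 * x) + 0.5 * x**2       # arbitrary test function
rng = np.random.default_rng(5)
X = np.cumsum(rng.standard_normal(11))          # any path X_0, ..., X_10
total = -sum(G(X[j]) - G(X[j - 1]) for j in range(1, 11))
print(np.exp(total), np.exp(G(X[0]) - G(X[-1])))   # equal up to rounding
```

This is why only the endpoints X(0) and X(t_k) appear in the identities above.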

So, for example, suppose the underlying dynamics is a discrete sampling (or discrete approximation) X(k dt) of

    dX(t) = -∇V(X(t)) dt + √(2µ) dW(t)

where the invariant measure ∝ e^{-V/µ} looks like

[Figure: the invariant density e^{-V/µ}, a bimodal profile plotted over [-2, 2]]

Then we might choose

[Figure: (left) one choice of G; (right) another choice of G, both plotted over [-2, 2]]

if (left) we want to compute an average near the low probability saddle or (right) we want to force the system from the left well to the right well.

Unfortunately this won't work. Note that

    G(X(k dt)) - G(X((k-1) dt)) = O(√dt),

so every dt units of time we create or destroy O(√dt) samples; if creation and destruction aren't carefully balanced, expect bad behavior for small dt. As the time step dt is taken smaller and smaller, the ensemble size will either blow up or hit zero in a very short time. Yes... we could do the killing/cloning step only once every O(1/√dt) evolution steps, but if the weights degenerate very quickly, expect bad results. A small modification borrowed from another rare event scheme (RESTART/DPR) developed in the computer science community seems to solve the problem.


In RESTART/DPR:

1. The state space is first partitioned into cells.
2. When an ensemble member crosses from one region to another, a certain number of new ensemble members are born.
3. These new ensemble members are restricted in which cells they are allowed to visit... they are killed if they try to visit a cell that is off limits to them.

We'll give each of our DMC particles X^i(t_k) a ticket ϑ^i(t_k) that tells the trajectory where it's allowed to visit. When e^{-v} falls below the ticket, the particle is killed. The branching step is now

3: if e^{-v(X^i(t_{k-1}), X̃^i(t_k))} ≥ ϑ^i, generate a random integer N^i ≥ 1 with
       E[N^i] = max{1, e^{-v(X^i(t_{k-1}), X̃^i(t_k))}};
   otherwise set N^i = 0
4: add N^i copies of X̃^i(t_k) to the time t_k ensemble
5: for each new copy generate a ticket ϑ ~ U(e^{v(X^i(t_{k-1}), X̃^i(t_k))}, 1);
   tickets of surviving particles are discounted by e^{v(X^i(t_{k-1}), X̃^i(t_k))}

The initial ticket ϑ^i(t_0) is drawn from U(0, 1).
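A sketch of the ticketed branching rule for a single particle (names are mine; the max{1, e^{-v}} offspring rule together with the U(e^{v}, 1) tickets for new copies is what makes the combined kill/branch step unbiased, E[N^i] = e^{-v}):

```python
import numpy as np

def ticketed_branch(x_new, ticket, v_val, rng):
    """Ticketed DMC branching for one particle.
    Returns the list of (state, ticket) pairs it leaves behind."""
    w = np.exp(-v_val)
    if w < ticket:
        return []                      # killed: weight fell below the ticket
    # survivor: E[N] = max(1, w); extra copies appear only when w > 1
    m = max(1.0, w)
    n = int(m) + (rng.random() < (m - int(m)))
    out = [(x_new, ticket / w)]        # surviving ticket discounted by e^{v}
    for _ in range(n - 1):
        out.append((x_new, rng.uniform(1.0 / w, 1.0)))  # fresh tickets on (e^{v}, 1)
    return out

# unbiasedness check: averaged over tickets ~ U(0,1), E[#offspring] = e^{-v}
rng = np.random.default_rng(3)
for v_val in (0.4, -0.4):              # the killing (w<1) and branching (w>1) regimes
    counts = [len(ticketed_branch(0.0, rng.random(), v_val, rng))
              for _ in range(200_000)]
    print(v_val, np.mean(counts), np.exp(-v_val))
```

When w < 1 the particle either survives with exactly one copy (probability w, since the ticket is uniform) or dies, so no population churn occurs at the O(√dt) scale.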

Consider the simple dynamics

    X(k dt) = X((k-1) dt) + √dt ξ_k,   ξ_k ~ N(0, 1),

with G(x) = x.

[Figure: log E[N²] versus log dt for modified DMC and standard DMC]

For the new scheme:

    sup_{dt>0, k dt=t} E[(N(t))^p] < ∞

for all powers p.

As with DMC, the new algorithm produces an unbiased estimate Σ_{i=1}^{N(t_k)} f(X^i(t_k)) of

    E[f(X(t_k)) e^{-Σ_{j=1}^k v(X(t_{j-1}), X(t_j))}].

You get back DMC if you re-draw all tickets from U(0, 1) at each step. This allows us to prove that (basically in any setting):

The new estimator has lower variance than the old estimator. Its expected cost is easily seen to be the same, so the new algorithm is unambiguously superior.

In settings in which DMC fails dramatically, the improved algorithm has a nice continuous time limit. We call the limiting object the Brownian fan. The Brownian fan is constructed recursively, with each generation being a realization of a Poisson point process on an excursion space whose intensity depends on the previous generation. So the new algorithm is better in all settings, and in some cases the improvement is really dramatic.

When using the algorithm with v(x, y) = G(y) - G(x) to compute E[f(X(t_k))], it's natural to ask what the optimal choice of G is. One has to consider the effect of the choice of G on both the workload and the variance of the resulting estimator. We have not pursued this, but we expect that in the small temperature limit an optimal strategy could be identified by following the arguments of Dean and Dupuis for DPR. But these strategies will involve finding, e.g., approximate transition paths. Since you can't expect to do that very accurately, you'd like to see the algorithm behave well with more naive design choices.

A simple rare event example:

    dX(t) = -∇V(X(t)) dt + √(2kT) dW(t)

Starting from the lower well and running for 1 unit of time.
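A discrete approximation X(k dt) of this SDE is just Euler-Maruyama; here is a generic sketch with a stand-in double-well potential V(x) = (x² - 1)², not the potential from the talk:

```python
import numpy as np

def euler_maruyama(grad_V, x0, kT, dt, n_steps, rng):
    """Discrete sampling X(k dt) of dX = -grad V(X) dt + sqrt(2 kT) dW."""
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        x = x - grad_V(x) * dt + np.sqrt(2.0 * kT * dt) * rng.standard_normal(x.shape)
        path.append(x.copy())
    return np.array(path)

# stand-in double-well potential V(x) = (x^2 - 1)^2, minima at x = +/-1
grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
rng = np.random.default_rng(4)
path = euler_maruyama(grad_V, [-1.0], kT=0.05, dt=1e-3, n_steps=1000, rng=rng)
print(path.shape, path[-1])   # at low temperature the path stays near the well at x = -1
```

At small kT an unassisted trajectory essentially never leaves the starting well in 1 unit of time, which is what makes brute-force estimation of the transition probability hopeless.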

We use our modified DMC scheme with

    v(x, y) = G(y) - G(x),   G(x) = λ|x - x_A|,

where x_A is the minimum in the lower basin. We want to compute

    P_{x_A}(X(1) ∈ B),   B = {x : |x - x_B| < 0.25}.

λ is chosen so that the expected number of particles ending in B is close to 1.

kT    λ      estimate          variance × workload    brute force variance
16    5      0.5133            0.3357                 0.2499
8     15     0.2839 × 10⁻¹     0.5519 × 10⁻²          0.2758 × 10⁻¹
4     25.5   0.4813 × 10⁻⁵     0.1521 × 10⁻⁸          0.4813 × 10⁻⁵
2     33     0.1262 × 10⁻¹³    0.2133 × 10⁻²³         0.1262 × 10⁻¹³

A rare event example:

    dX(t) = -∇V(X(t)) dt + √(2µ) dW(t)

Figure: (Left) Initial configuration x_A. (Right) A typical snapshot after the transition.

The 7 discs interact with each other through

    V(x) = Σ_{i<j} 4(|x_i - x_j|^{-12} - |x_i - x_j|^{-6})

Using

    v(x, y) = G(y) - G(x),   G(x) = (λ/µ) min_{2≤i≤7} |x_i - (1/7) Σ_j x_j|,

where x_1 is the disc at the center. We approximate P(X(2) ∈ B), where B is the event that

    min_{2≤i≤7} |x_i - (1/7) Σ_j x_j| < 0.1.

µ      λ     estimate          workload   variance × workload   brute force variance
0.4    1.9   1.125 × 10⁻²      6.451      4.732 × 10⁻³          1.112 × 10⁻²
0.2    1.3   2.340 × 10⁻³      5.818      2.344 × 10⁻⁴          2.335 × 10⁻³
0.1    1     7.723 × 10⁻⁵      6.936      7.473 × 10⁻⁷          7.722 × 10⁻⁵
0.05   0.85  9.290 × 10⁻⁸      15.42      1.002 × 10⁻¹¹         9.290 × 10⁻⁸
0.025  0.8   1.129 × 10⁻¹³     102.4      1.311 × 10⁻²¹         1.129 × 10⁻¹³

λ is adjusted so that the expected number of particles ending in B is close to 1.

A filtering example: The solution to

    dX_1(t) = -10(X_1(t) - X_2(t)) dt + √2 dW_1(t)
    dX_2(t) = (X_1(t)(28 - X_3(t)) - X_2(t)) dt + √2 dW_2(t)
    dX_3(t) = (X_1(t) X_2(t) - (8/3) X_3(t)) dt + √2 dW_3(t)

is hidden from us.

[Figure: a sample trajectory of the hidden three-dimensional process]

We see only

    dH(t) = (X_1(t), X_2(t), X_3(t)) dt + 0.1 dB(t)

[Figure: the three components of the observed increments dH over the time interval [0, 20]]

Our task is to estimate the hidden signal by computing

    E[(X_1(t), X_2(t), X_3(t)) | F_t^H]

After discretizing, a DMC method for sampling from the conditional distribution of the hidden process given the observations becomes

3: if P_i ≥ ϑ^i, generate N^i ≥ 1 with E[N^i] = max{1, P_i}; otherwise N^i = 0
4: add N^i copies of X^i(t_k) to the time t_k ensemble
5: for each new copy generate a ticket ϑ ~ U(P_i^{-1}, 1); discount tickets of surviving particles by P_i^{-1}

The initial ticket ϑ^i(t_0) is drawn from U(0, 1), where now

    P_i = exp(-|X^i(t) dt - dH(t)|² / (0.02 dt)) / [N^{-1} Σ_j exp(-|X^j(t) dt - dH(t)|² / (0.02 dt))]

With 10 particles:

[Figure: the three components of the estimated signal over the time interval [0, 20]]

What if we'd done the same thing without the tickets (i.e. standard DMC)?

[Figure: the three components of the estimated signal over the time interval [0, 20]]