Computing ergodic limits for SDEs

Computing ergodic limits for SDEs M.V. Tretyakov School of Mathematical Sciences, University of Nottingham, UK Talk at the workshop Stochastic numerical algorithms, multiscale modelling and high-dimensional data analytics, ICERM, 21st July 2016

Plan of the talk ϕ erg = 1 t ϕ(x)ρ(x) dx = lim Eϕ(X (t)) = lim ϕ(x (s))ds a.s. t t t 0

Plan of the talk ϕ erg = 1 t ϕ(x)ρ(x) dx = lim Eϕ(X (t)) = lim ϕ(x (s))ds a.s. t t t 0 Introduction Examples of Langevin-type equations and stochastic gradient systems Geometric integrators for Langevin equations and the gradient system Non-Markovian scheme for stochastic gradient systems [Davidchack, Ouldridge&T. J Chem Phys 2015] [Leimkuhler, Matthews,T 2014]

Introduction Hamiltonian H(r, p) canonical ensemble (NVT ) ρ(r, p) exp( βh(r, p)), where β = 1/(k B T ) > 0 is an inverse temperature. ϕ erg = 1 t ϕ(r)ρ(r, p) drdp = lim Eϕ(R(t)) = lim ϕ(r(s))ds a.s. t t t 0

Rigid Body Dynamics Following [Miller III et al J. Chem. Phys., 2002] H(r, p, q, π) = pt p 2m + n j=1 l=1 3 V l (q j, π j ) + U(r, q), (1) where r = (r 1T,..., r nt ) T R 3n, and p = (p 1T,..., p nt ) T R 3n are the center-of-mass coordinates and momenta; q = (q 1T,..., q nt ) T R 4n, q j = (q j 0, qj 1, qj 2, qj 3 )T R 4, q j = 1, are the rotational coordinates in the quaternion representation, π = (π 1T,..., π nt ) T R 4n, are the angular momenta; 3 V l (q, π) = 1 8 l=1 3 l=1 1 I l [ π T S l q ] 2 = 1 8 πt S(q)DS T (q)π, q, π R 4, l = 1, 2, 3, I l the principal moments of inertia and the constant 4-by-4 matrices S l : S 1 q = ( q 1, q 0, q 3, q 2 ) T, S 2 q = ( q 2, q 3, q 0, q 1 ) T, S 3 q = ( q 3, q 2, q 1, q 0 ) T, S 0 = diag(1, 1, 1, 1) T. S(q) = [S 0 q, S 1 q, S 2 q, S 3 q], D = diag(0, 1/I 1, 1/I 2, 1/I 3 ).

Langevin thermostat for Rigid Body Dynamics dr j = Pj m dt, Rj (0) = r j, dp j = f j (R, Q)dt γp j dt + 2mγ β dw j (t), P j (0) = p j, (2) dq j = 1 4 S(Qj )DS T (Q j )Π j dt, Q j (0) = q j, q j = 1, (3) 3 dπ j = 1 4 + l=1 1 I l ( Π j T S l Q j) S l Π j dt + F j (R, Q)dt ΓJ(Q j )Π j dt 2MΓ β 3 l=1 S l Q j dw j l (t), Πj (0) = π j, q j T π j = 0, j = 1,..., n, where f j (r, q) = r j U(r, q) R 3, F j (r, q) = q j U(r, q) T q j S 3, (w T, W T ) T = (w 1 T,..., w n T, W 1 T,..., W n T ) T is a (3n + 3n)-dimensional standard Wiener process; γ 0 and Γ 0 are the friction coefficients for the translational and rotational motions, and 3 J(q) = MS(q)DS T (q)/4, M = 4/ 1/I l. Davidchack, Ouldridge&T. J Chem Phys 2015 l=1

Langevin-type equations The Ito interpretation of the SDEs (2) (3) coincides with its Stratonovich interpretation. The solution of (2) (3) preserves the quaternion length Q j (t) = 1, j = 1,..., n, for all t 0. (4) The solution of (2) (3) automatically preserves the constraint: Q j T (t)π j (t) = 0, j = 1,..., n, for t 0 (5) Assume that the solution X (t) = (R T (t), P T (t), Q T (t), Π T (t)) T of (2) (3) is an ergodic process on D = {x = (r T, p T, q T, π T ) T R 14n : q j = 1, q j T π j = 0, j = 1,..., n}. Then it can be shown that the invariant measure of X (t) is Gibbsian with the density ρ(r, p, q, π) on D: ρ(r, p, q, π) exp( βh(r, p, q, π)) (6) Davidchack, Ouldridge&T. J Chem Phys 2015

Langevin equations and quasi-symplectic integrators dr j = Pj m dt, Rj (0) = r j, dp j = f j (R, Q)dt γp j dt + 2mγ β dw j (t), P j (0) = p j, (9) dq j = 1 4 S(Qj )DS T (Q j )Π j dt, Q j (0) = q j, q j = 1, (10) 3 dπ j = 1 4 + l=1 1 I l ( Π j T S l Q j) S l Π j dt + F j (R, Q)dt ΓJ(Q j )Π j dt 2MΓ β 3 l=1 S l Q j dw j l (t), Πj (0) = π j, q j T π j = 0, j = 1,..., n, Let D 0 R d, d = 14n, be a domain with finite volume. The transformation x = (r, p, q, π) X (t) = X (t; x) = (R(t; x), P(t; x), Q(t; x), Π(t; x)) maps D 0 into the domain D t.

Langevin equations and quasi-symplectic integrators V t = = dx 1... dx d (7) D t D(X 1,..., X d ) D(x 1,..., x d ) dx 1... dx d. D 0 The Jacobian J is equal to J = D(X 1,..., X d ) D(x 1,..., x d ) = exp ( n(3γ + Γ) t). (8) [Milstein&T. IMA J. Numer. Anal. 2003 (also Springer 2004)], [Davidchack, Ouldridge&T. J Chem Phys 2015]

The stochastic gradient system It is easy to verify that exp( βh(r, p, q, π))dpdπ (9) D mom exp( βu(r, q)) =: ρ(r, q), where (r T, q T ) T D = {(r T, q T ) T R 7n : q j = 1} and the domain of conjugate momenta D mom = {(p T, π T ) T R 7n : q T π = 0}. We introduce the gradient system in the form of Stratonovich SDEs: dr = υ m f(r, Q)dt + 2υ dw(t), R(0) = r, (10) mβ dq j = Υ M F j 2Υ 3 (R, Q)dt + S l Q j dw j l (t), (11) Mβ l=1 Q j (0) = q j, q j = 1, j = 1,..., n, where indicates the Stratonovich form of the SDEs, parameters υ > 0 and Υ > 0 control the speed of evolution of the gradient system (10) (11), f = (f 1 T,..., f n T ) T and the rest of the notation is as in (2) (3). [Davidchack, Ouldridge&T. J Chem Phys 2015]

The gradient thermostat for rigid body dynamics This new gradient thermostat possesses the following properties. As in the case of (2) (3), the solution of (10) (11) preserves the quaternion length (4). Assume that the solution X (t) = (R T (t), Q T (t)) T D of (10) (11) is an ergodic process. Then, by the usual means of the stationary Fokker-Planck equation, one can show that its invariant measure is Gibbsian with the density ρ(r, q) from (9).

Langevin integrators Davidchack, Ouldridge&T. J Chem Phys 2015 For simplicity we use a uniform time discretization of a time interval [0, T ] with the step h = T /N. Goal: to construct integrators quasi-symplectic preserve Q j (t k ) = 1, j = 1,..., n, for all t 0 automatically preserve Q j T (t k ) Π j (t k ) = 0, of weak order 2 To this end: j = 1,..., n, for t 0 automatically stochastic numerics+splitting techniques [see e.g. Milstein&T, Springer 2004] the deterministic symplectic integrator from [Miller III et al J. Chem. Phys., 2002]

Langevin A integrator The first integrator is based on splitting the Langevin system in dr j = Pj m dt, Rj (0) = r j, (12) dp j = f j 2mγ (R, Q)dt + β dw j (t), dq j = 1 4 S(Qj )DS T (Q j )Π j dt, (13) 3 dπ j = 1 4 + l=1 1 I l ( Π j T S l Q j) S l Π j dt + F j (R, Q)dt 2MΓ β 3 l=1 S l Q j dw j l (t), j = 1,..., n,

Langevin B integrator 2mγ dp I = γp I dt + β dw(t), dπ j I = ΓJ(q)Π j I dt + 2MΓ 3 S l qdw j l β (t); l=1 (15) dr II = P II m dt, dp II = f(r II, Q II )dt, dq j II = 1 4 S(Qj II )DS T (Q j II )Πj II dt, (16) dπ j II = F j (R II, Q II )dt + 1 3 1 ] [(Π j II 4 I )T S l Q j II S l Π j II dt, j = 1,..., n. l l=1 The SDEs (15) have the exact solution: 2mγ t P I (t) = P I (0) exp( γt) + β 0 Π j I (t) = exp( ΓJ(q)t)Πj I (0) + 2MΓ β exp( γ(t s))dw(s), (17) 3 l=1 t 0 exp( ΓJ(q)(t s))dw j l (s). To construct the method: half a step of (17) + one step of a symplectic method for (16) + half a step of (17).

Langevin C integrator Based on the same spliting (15) and (16) as Langevin B, i.e., the determinisitic Hamiltonian system + OU. To construct the method: half a step of a symplectic method for (16) + step of (17) + half a step of a symplectic method for (16).

Numerical experiments Davidchack, Handel&T. J Chem Phys 2009 and Davidchack, Ouldridge&T. J Chem Phys 2015 Two objectives for the experiments: the dependence of Langevin dynamics properties on the choice of parameters γ and Γ errors of the numerical schemes. TIP4P rigid model of water (Jorgensen et. al J. Chem. Phys. 1983) A h : = EA( X ) = EA(X ) + E A h p + O(h p+1 ) = A 0 + E A h p + O(h p+1 ) p = 2 for Langevin integrators and p = 1 for the gradient thermostat integrator Talay&Tubaro Stoch.Anal.Appl. 1990

Accuracy of integrators Langevin A (left), Langevin B (centre), and Langevin C (right) with γ = 5 ps 1 and Γ = 10 ps 1. Error bars denote estimated 95% confidence intervals.

Geometric integrator for the gradient thermostat The main idea is to rewrite the components Q j of the solution to (10) (11) in the form Q j (t) = exp(y j (t))q j (0) and then solve numerically the SDEs for the 4 4-matrices Y j (t) Lie group methods [Hairer, Lubich, Wanner; Springer, 2002] To this end, we introduce the 4 4 skew-symmetric matrices: F j (r, q) = F j (r, q)q j T q j (F j (r, q)) T, j = 1,..., n. Note that F j (r, q)q j = F j (r, q) under q j = 1 and the equations (11) can be written as dq j = Υ M F j(r, Q)Q j 2Υ 3 dt+ S l Q j dw j l Mβ (t), Qj (0) = q j, q j = 1. One can show that l=1 Y j (t + h) = h Υ M F j(r(t), Q(t)) + 2Υ Mβ 3 ( l=1 W j l + terms of higher order. (18) (t + h) W j l (t) ) S l

Geometric integrator for the gradient thermostat R 0 = r, Q 0 = q, q j = 1, j = 1,..., n, (19) R k+1 = R k + h υ m f(r k, Q k ) + 2υ h mβ ξ k, Y j k = h Υ M F j(r k, Q k ) + 2Υ 3 h η j,l k Mβ S l, Q j k+1 = exp(y j k )Qj k, j = 1,..., n, where ξ k = (ξ 1,k,..., ξ 3n,k ) T and ξ i,k, i = 1,..., 3n, η j,l k, l = 1, 2, 3, j = 1,..., n, are i.i.d. random variables with the same law l=1 P(θ = ±1) = 1/2. (20) Proposition. The numerical scheme (19) (20) for (10) (11) preserves the length of quaternions, i.e., Q j k = 1, j = 1,..., n, for all k, and it is of weak order one. Davidchack, Ouldridge&T. J Chem Phys 2015

Stochastic gradient system dx = a(x)dt + σdw, X(0) = X 0, (21) a(x) := U(x), x R d, (22) σ = 2/β, d = 3n, and w(t) is a standard d-dimensional Wiener process.

Stochastic gradient system dx = a(x)dt + σdw, X(0) = X 0, (21) a(x) := U(x), x R d, (22) σ = 2/β, d = 3n, and w(t) is a standard d-dimensional Wiener process. We use the following notation for the solution of (21): X(t) = X t0,x(t) when X(t 0 ) = x, t t 0, and also we will write X x (t) when t 0 = 0. The process X(t) is exponentially ergodic [Hasminskii 1980] if for any x R d and any function ϕ with a polynomial growth there are C(x) > 0 and λ > 0 such that Eϕ(X x (t)) ϕ erg C(x)e λt, t 0,

Stochastic gradient system The Euler scheme: X k+1 = X k + ha(x k ) + σ hξ k+1, (24) where ξ k = (ξ 1 k,..., ξ d k ) and ξ i k, i = 1,..., d, k = 1,..., are i.i.d. random variables with the law N (0, 1). Heun s scheme: ˆX k+1 = X k + ha(x k ) + σ hξ k+1, X k+1 = X k + h [ a( 2 ˆX ] k+1 ) + a(x k ) + σ hξ k+1. (25) Weak convergence: Eϕ(X (T )) Eϕ(X N ) Kh p (26) N = T /h; Euler p = 1, Heun p = 2 [e.g. Milstein, T., Springer 2004]

Non-Markovian scheme X k+1 = X k + ha(x k ) + σ h 2 (ξ k + ξ k+1 ), (28) where ξ k = (ξ 1 k,..., ξ i k) defined on (Ω, P, F) and ξ i k, i = 1,..., d, k = 1,..., are i.i.d. random variables with the law N (0, 1) [Leimkuhler, Matthews (2013)]

Example Let a(x) = αx with α > 0, then X(t) from (21) is the Ornstein-Uhlenbeck process, which is Gaussian with EX x (t) = xe αt, Cov(X x (s), X x (t)) = σ2 2α (e α(t s) e α(t+s) ) for s t and Var(X x (T )) = σ2 2α (1 e 2αT ). For the Euler scheme (24): EX N = x 0 (1 αh) N = x 0 e αt (1 + O(h)), Var(X N ) = σ2 1 (1 αh) 2N 2α 1 + αh = σ2 2α (1 e 2αT ) σ2 2 h + e 2αT O(h) + O(h 2 ), αh < 1, where O(h p ) Kh with K > 0 independent of T.

Assumptions Assumption 1 There exist c 0 R and c 1 > 0 such that (x, a(x)) c 0 c 1 x 2. Assumption 2 The potential U(x) C 7 (R d ), its first-order derivatives grow not faster than a linear function at infinity and higher derivatives are bounded. The function ϕ(x) C 6 (R d ) and it and its derivatives grow not faster than a polynomial function at infinity. The most restrictive condition in Assumption 2 is the requirement for a(x) = U to be globally Lipschitz: a(x) 2 K(1 + x 2 ), (29) where K > 0 is independent of x R d, which can be relaxed via [Milstein, T. (2005) and (2007): the concept of rejecting exploding trajectories]

Introduce the operator L L := t + L, where L The function L := d i=1 a i (x) x i + σ2 2 d i=1 2 ( x i ) 2. (30) u(t, x) = Eϕ(X t,x (T )) (31) satisfies the Cauchy problem for the backward Kolmogorov equation Lu = 0, (32) u(t, x) = ϕ(x).

Main theorem Theorem (1) Let Assumptions 1-2 hold. Then the scheme (28) is first order weakly convergent and for all sufficiently small h > 0 its error has the form + σ2 2 Eϕ(X x (T )) Eϕ(X N ) = C 0 (T, x)h + C(T, x)h 2, (33) T C 0 (T, x) = E B 0 (t, X x (t))dt, (34) 0 B 0 (t, x) = 1 d 2 d i,j=1 i,j=1 a i (x) 2 u(t, x) x j x i x j + σ2 2 a j (x) ai (x) u(t, x) x j x i d 2 a i (x) u(t, x) ( x j ) 2 x i, i,j=1 C(T, x) K(1 + x κ e λt ), for some K > 0, κ N and λ > 0 independent of h and T.

Corollary Theorem 1: Eϕ(X x (T )) Eϕ(X N ) = C 0 (T, x)h + C(T, x)h 2, T C 0 (T, x) = E B 0 (t, X x (t))dt. 0

Corollary Theorem 1: Eϕ(X x (T )) Eϕ(X N ) = C 0 (T, x)h + C(T, x)h 2, T C 0 (T, x) = E B 0 (t, X x (t))dt. 0 Theorem (2) Let Assumptions 1-2 hold. Then the coefficient C 0 (T, x) from (34) goes to zero as T : C 0 (T, x) K(1 + x κ )e λt (35) for some constants K > 0, κ N and λ > 0, i.e., over a long integration time the scheme (28) is of order two up to exponentially small correction.

Sketch of the proof C 0 (T, x) = = T 0 T 0 T + 0 EB 0 (t, X x (t))dt = T B 0 (t, y)ρ(y)dydt R d R d B 0 (t, y)[p(t, x, y) ρ(y)]dydt, 0 R d B 0 (t, y)p(t, x, y)dydt (36) where p(t, x, y) is the transition density for (21) and ρ(y) is the invariant density.

Discussion 1 We emphasize that the fact that the average of B 0 (t, x) with respect to the invariant measure is equal to zero is the reason why the scheme (28) is second order accurate in approximating ergodic limits. 2 In the case of the Euler scheme (24) we get the same error expansion as (33) for the scheme (28) but with a different B 0 (t, x) = B0 E (t, x) (see [Milstein, T. (2004)]): B0 E (t, x) = 1 d a j u u 2 x j ai x i + σ2 2 +σ 2 d i,j=1 d i,j i,j=1 2 a j u ( x i ) 2 x j + σ2 2 a j 2 u x i x j x i + σ4 6 d a i 3 u i,j=1 x i ( x j ) 2 d 4 u ( x i ) 2 ( x j ) 2. i,j=1 The average of B0 E (t, x) with respect to the invariant measure is not equal to zero and, consequently, the Euler scheme (24) approximates ergodic limits with order one the same order as its weak convergence over a finite time interval.

Discussion 3 Let a one-step weak approximation X t,x (t + h) of the solution X t,x (t + h) of (21) generate a method of order p. The global error of the method: R : = Eϕ(X x (T )) Eϕ( X x (T )) (38) = C 0 (T, x)h p + + C n (T, x)h p+n + O(h p+n+1 ), where n N and the functions C 0 (T, x),..., C n (T, x) are independent of h which can be presented in the form C i (T, x) = T 0 EB i (s, X x (s))ds. One can deduce from the proof of Theorem 2 that if the averages of B i (s, x) 0 i n, with respect to the invariant measure are equal to zero then in the limit of T the scheme has p + n order of accuracy in h.

Numerical experiments Anharmonic scalar model: the one-dimensional potential energy U(x) = cos(x) L 2 error: (ˆρ i ρ i ) 2 i tropy error 10-1 L 2 error uyama 10-2 Euler-Maruyama ethod didate Scheme (1.7) Error 10-3 10-4 Heun s Method Candidate Scheme (1.7) 0.4 0.8 Timestep 0.2 0.4 0.8 Timestep Figure: The error in computed distributions is plotted for each scheme.

Error in finite time 2 10-1 Error at time t 10-2 10-3 10-4 t=1.92 t=3.84 t=5.76 t=7.68 0.16 0.24 0.32 0.48 0.16 0.24 0.32 0.48 0.16 0.24 0.32 0.48 0.16 0.24 0.32 0.48 Timestep Timestep Timestep Timestep 10-1 Error at time t 10-2 10-3 t=0.96 t=2.88 t=4.80 t=6.72 t=8.64 10-4 0.16 0.24 0.32 0.48 0.16 0.24 0.32 0.48 0.16 0.24 0.32 0.48 0.16 0.24 0.32 0.48 0.16 0.24 0.32 0.48 Timestep Timestep Timestep Timestep Timestep 10-1 Error using h = 0.16 L error 10-2 10-3 Euler-Maruyama Heun s Method Candidate Scheme (1.7) 10-4 0 1 2 3 4 5 6 7 8 9 Time Figure: The lower plot shows the error in the distribution after time t, as computed using each scheme at h = 0.16. In the plots at the top, we compare the error growth with respect to stepsize h at multiples of t = 0.96.

Conclusions Two new thermostats, one of Langevin type and one of gradient (Brownian) type, for rigid body dynamics are introduced. As in the deterministic case, it is important to preserve structural properties of stochastic systems for accurate long term simulations. Current work includes stochastic rigid body dynamics with hydrodynamic interactions. Development of more efficient methods for stochastic gradient systems.