arxiv: v2 [math.na] 20 Dec 2016


STATIONARY AVERAGING FOR MULTI-SCALE CONTINUOUS TIME MARKOV CHAINS USING PARALLEL REPLICA DYNAMICS

TING WANG, PETR PLECHÁČ, AND DAVID ARISTOFF

Abstract. We propose two algorithms for simulating continuous time Markov chains in the presence of metastability. We show that the algorithms correctly estimate, under the ergodicity assumption, stationary averages of the process. Both algorithms, based on the idea of the parallel replica method, use parallel computing in order to explore metastable sets more efficiently. The algorithms require no assumptions on the Markov chains beyond ergodicity and the presence of identifiable metastability. In particular, there is no assumption on reversibility. We present error analyses, as well as numerical simulations on multi-scale stochastic reaction network models, in order to demonstrate consistency of the method and its efficiency.

Key words. Markov chains, Monte Carlo, reversibility, stationary distribution, metastability, parallel replica, stochastic reaction networks, multi-scale dynamics, coarse graining

AMS subject classifications. 60J22, 65C05, 65Z05, 82B31, 92E20

1. Introduction. We focus on computing stationary averages of continuous time Markov chains. More precisely, if $\pi$ is the stationary distribution of a continuous time Markov chain (CTMC) and $f$ is a function on the state space, we aim at estimating the average $\pi(f) \equiv E_\pi[f]$ by taking a time average over a long trajectory of the CTMC. There are many methods for computing stationary averages of stochastic processes; however, the vast majority of them rely on reversibility of the process, as in Markov chain Monte Carlo [2]. The computational cost of ergodic (trajectory) averaging becomes prohibitive when convergence to the stationary distribution is slow due to metastability of the dynamics, for example in the presence of rare events or large time scale disparities (multi-scale dynamics) [21].
A possible remedy for this issue is to use parallel computing in order to accelerate sampling of the state space. For instance, the parallel tempering method (also known as replica exchange) [12, 1, 9, 17] has been successfully applied to many problems by simulating multiple replicas of the original system, each replica at a different temperature. However, the method requires time reversibility of the underlying processes, which typically does not hold for processes that model chemical reaction networks or systems with non-equilibrium steady states. In fact, there are not many methods that parallelize Monte Carlo simulation for irreversible processes with metastability, in particular if long-time sampling, such as ergodic averaging, is required.

We present a parallel computing approach for CTMCs without time reversibility. One advantage of the proposed algorithms is that they may be used, in principle, on arbitrary CTMCs. However, gains in efficiency can occur only if the CTMC is metastable. In this contribution we consider only models described by continuous time Markov chains.

As a motivating example we study a multi-scale chemical reaction network model in which molecules of different types react with different rates depending on their concentrations and reaction rate constants. In this model metastability emerges due to the infrequent occurrence of reactions with small rates, which makes the relaxation to the steady state dynamics extremely slow. In the transient regime the finite time distribution can be approximated using the stochastic averaging technique [24, 14], or the tau-leap method [19]. However, the former does not apply to stationary distribution estimation and the latter can still be computationally expensive for long-time simulations. It is thus desirable to have an efficient algorithm for computing the stationary averages. The proposed algorithms therefore provide a new multi-scale simulation method (in particular for estimating stationary averages) for the stochastic reaction networks community.

Author affiliations: University of Delaware, Newark, DE (tingw@udel.edu); University of Delaware, Newark, DE (plechac@math.udel.edu); Colorado State University, Fort Collins, CO 80523 (aristoff@rams.colorado.edu).

The presented approach builds on the parallel replica (ParRep) dynamics introduced in the context of molecular simulations in [23]. The ParRep method used in the context of stochastic differential equations, e.g. Langevin dynamics, was rigorously analysed in [15, 16]. The algorithm we present and analyse builds on the recent work of [1, 2], where the ParRep process was studied for discrete-time Markov chains. In our algorithms, each time the simulation reaches a local equilibrium in a metastable set $W$, $R$ independent replicas of the CTMC are launched inside the set, allowing for parallel simulation of the dynamics at this stage. The main contribution of this work is a procedure for using the replicas in order to efficiently and consistently estimate the exit time and exit state from $W$, along with the contribution to the stationary time average of $f$ from the time spent in $W$. We emphasize that we are able to handle arbitrary functions (or observables) on the state space, not only those that are piece-wise constant, i.e., assuming a single value in each $W$. In the best case, if there are $R$ replicas, then the simulation leaves a metastable set about $R$ times faster compared to a direct serial simulation. The consistency of our algorithms relies on certain properties of the quasi-stationary distributions (QSDs), which are essentially local equilibria associated with the metastable sets. We propose two algorithms for computing $\pi(f)$, called CTMC ParRep and embedded ParRep.
The former uses parallel simulation of the CTMC, while the latter employs parallel simulation of its embedded chain, which is a discrete time Markov chain (DTMC). CTMC ParRep (resp. embedded ParRep) relies on the fact that, starting at the QSD in a metastable set, the first time to leave the set is an exponential (resp. geometric) random variable and independent of the exit state; see Theorem 5 below. The algorithms require some method for identifying metastable sets, though this need not be done a priori; it is sufficient to identify when the CTMC is currently in a metastable set, and when it exits such a set. While both algorithms can be useful for efficient simulation of $\pi(f)$ in the presence of metastability, we expect that embedded ParRep can be significantly more efficient, especially when combined with a certain type of QSD sampling, called Fleming-Viot [3, 4]. Though we focus here on the computation of $\pi(f)$, we note that one of our algorithms, CTMC ParRep, can be used to compute the dynamics of the CTMC on a coarse space in which each metastable set is considered a single (meta-)state. See the discussion below Algorithm 1.

The advantages of the proposed algorithms include: (a) no requirement of time reversibility for the underlying dynamics; (b) they are suitable for long-time sampling; (c) they may be used, in principle, on arbitrary CTMCs in the presence of metastability.

In Section 2, we briefly review CTMCs before defining QSDs and detailing relevant properties thereof. In Section 3, we present CTMC ParRep, and study how the error in the algorithm depends on the quality of QSD sampling. In Section 4, we present embedded ParRep and provide an analogous error analysis. We detail some numerical experiments on a multi-scale chemical reaction network model in Section 5 in order to demonstrate the consistency and accuracy of the algorithms.

2. Background and problem formulation.

2.1. Continuous Time Markov Chains. Throughout this paper, $X(t)$ is an irreducible and positive recurrent continuous time Markov chain (CTMC) with values in a countable set $E$, and $\pi$ denotes the stationary distribution of $X(t)$. We are interested in computing stationary averages $\pi(f)$ for a bounded function $f : E \to \mathbb{R}$ by using the ergodic theorem

$$\lim_{t\to\infty} \frac{1}{t}\int_0^t f(X(s))\,ds = \pi(f), \tag{1}$$

which holds almost surely for any initial distribution of $X(t)$. The jump times $\tau_n$ and holding times $\Delta\tau_n$ for $X(t)$ are defined recursively by

$$\tau_0 = 0, \qquad \tau_n = \inf\{t > \tau_{n-1} : X(t) \neq X(\tau_{n-1})\}, \qquad \Delta\tau_{n-1} = \tau_n - \tau_{n-1} \quad \text{for } n \ge 1.$$

We assume that $X(t)$ is non-explosive, that is, $\lim_{n\to\infty} \tau_n = \infty$ almost surely for every initial distribution of $X(t)$. This precludes the possibility of infinitely many jumps in finite time. We denote by $X_n = X(\tau_n)$ the embedded chain of $X(t)$. It is easy to see that $X_n$ is a discrete time Markov chain (DTMC). Recall that $X(t)$ is completely determined by its infinitesimal generator matrix $Q = \{q(x,y)\}_{x,y\in E}$. We write $q(x) := -q(x,x)$; note that irreducibility implies $q(x) > 0$ for all $x \in E$. It is easy to check that $X_n$ has the transition probability matrix $P = \{p(x,y)\}_{x,y\in E}$ satisfying

$$p(x,y) = \begin{cases} \dfrac{q(x,y)}{q(x)}, & x \neq y, \\ 0, & x = y. \end{cases}$$

We state the following well known fact for later reference.

Lemma 1. For a CTMC $X(t)$ with the corresponding embedded Markov chain $X_n$, the holding times between successive jumps $\Delta\tau_0, \Delta\tau_1, \ldots, \Delta\tau_i, \ldots$ are independent conditioned on the embedded chain $X_n$. Moreover, $\Delta\tau_i \mid \{X_n\}$ is exponentially distributed with rate $q(X_i)$ and hence $E[\Delta\tau_i \mid \{X_n\}] = q(X_i)^{-1}$.

For details on the above facts, see for instance [ ].

2.2. The Quasi-stationary Distribution and Metastability. Below, we write $P$, $E$ for various probabilities and expectations, the precise meaning of which will be clear from context. We use a superscript $P^\xi$, $E^\xi$ to indicate that the initial distribution is $\xi$. When the initial distribution is $\delta_x$, we write $P^x$, $E^x$.
The symbol $\sim$ will indicate equality in probability law. $\mathrm{Re}(\cdot)$ and $|\cdot|$ denote the real part and modulus of a complex number. Our ParRep algorithms rely on certain properties of quasi-stationary distributions, which we now briefly review. Let $W \subset E$ be fixed and consider the first exit time of $X(t)$ from $W$, that is,

$$T = \inf\{t > 0 : X(t) \notin W\}.$$

We consider also the first exit time of $X_n$ from $W$,

$$N = \inf\{n > 0 : X_n \notin W\}.$$
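The construction of Section 2.1 (embedded chain plus exponential holding times, Lemma 1) can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the 3-state generator `Q_demo`, the observable `f_demo`, and all function names are hypothetical choices made here, and NumPy is assumed to be available. It simulates the chain jump by jump and compares the time average in (1) with $\pi(f)$ obtained by solving $\pi Q = 0$.

```python
import numpy as np

def embedded_chain(Q):
    """Return (P, q): embedded-chain transition matrix and exit rates q(x) = -q(x, x)."""
    Q = np.asarray(Q, dtype=float)
    q = -np.diag(Q)
    P = Q / q[:, None]            # p(x, y) = q(x, y) / q(x) for x != y
    np.fill_diagonal(P, 0.0)      # p(x, x) = 0
    return P, q

def ssa_time_average(Q, f, x0, n_jumps, rng):
    """Time average of f along one trajectory, simulated jump by jump."""
    P, q = embedded_chain(Q)
    x, integral, clock = x0, 0.0, 0.0
    for _ in range(n_jumps):
        dt = rng.exponential(1.0 / q[x])   # holding time, exponential with rate q(x)
        integral += f[x] * dt
        clock += dt
        x = rng.choice(len(q), p=P[x])     # next state of the embedded chain
    return integral / clock

def stationary_distribution(Q):
    """Solve pi Q = 0 with sum(pi) = 1 by least squares (finite state space)."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical 3-state generator and observable, for illustration only.
Q_demo = np.array([[-2.0, 1.0, 1.0],
                   [1.0, -3.0, 2.0],
                   [3.0, 1.0, -4.0]])
f_demo = np.array([0.0, 1.0, 2.0])

rng = np.random.default_rng(0)
estimate = ssa_time_average(Q_demo, f_demo, x0=0, n_jumps=50_000, rng=rng)
exact = stationary_distribution(Q_demo) @ f_demo
print(estimate, exact)
```

For a chain without metastability, as here, the two printed numbers should agree closely; the difficulty addressed in this paper arises when the trajectory needed for such agreement becomes prohibitively long.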

A quasi-stationary distribution (QSD) of $X(t)$ in $W$ (or $X_n$ in $W$) is defined as follows.

Definition 2. A probability distribution $\nu$ with support in $W$ is a quasi-stationary distribution for $X(t)$ in $W$ if for each $y \in W$ and $t > 0$,

$$\nu(y) = P^\nu(X(t) = y \mid T > t). \tag{2}$$

Similarly, a probability distribution $\mu$ with support in $W$ is a QSD for $X_n$ in $W$ if for each $y \in W$ and $n > 0$,

$$\mu(y) = P^\mu(X_n = y \mid N > n). \tag{3}$$

Throughout we write $\nu$ for a QSD of the CTMC $X(t)$ and $\mu$ for a QSD of the embedded chain $X_n$. The associated set $W$ will be implicit since no ambiguities should arise. We will write

$$\nu_t(A) = P^x(X(t) \in A \mid T > t) \tag{4}$$

for the distribution of $X(t)$ conditioned on $T > t$, and

$$\mu_n(A) = P^x(X_n \in A \mid N > n) \tag{5}$$

for the distribution of $X_n$ conditioned on $N > n$. Notice we do not make explicit the dependence on the starting point $x$.

We summarize existence, uniqueness, and convergence properties of the QSD in Theorem 3 below (see [6, 22]). In Theorem 3, for simpler presentation, we assume $W$ is finite. That allows us to characterize convergence to the QSD of $X(t)$ and $X_n$ in terms of spectral properties of their generator and transition matrices. We emphasize, however, that finiteness of $W$ is not required for consistency of the algorithms proposed in this paper. Recall that $Q$ is the infinitesimal generator matrix of $X(t)$ and $P$ is the transition probability matrix of the DTMC $X_n$. We denote by $Q_W = \{q_{xy}\}_{x,y\in W}$ and $P_W = \{p_{xy}\}_{x,y\in W}$ the restrictions of $Q$ and $P$ to $W$.

Theorem 3. Let $W$ be finite and nonabsorbing for $X(t)$, and assume $P_W$ is irreducible.
(a) The eigenvalues $\lambda_1, \lambda_2, \ldots$ of $Q_W$ can be ordered so that

$$0 > \lambda_1 > \mathrm{Re}(\lambda_2) \ge \ldots,$$

where $\lambda_1$ has the left eigenvector $\nu$ which is a probability distribution on $W$. Moreover, $\nu$ is the unique quasi-stationary distribution of $X(t)$ in $W$, and for all $x, y \in W$,

$$|\nu_t(y) - \nu(y)| = |P^x(X(t) = y \mid T > t) - \nu(y)| \le C(x)\, e^{-(\lambda_1 - \beta)t}, \tag{6}$$

with $C(x)$ a constant depending on $x$, and $\beta$ any real number satisfying $\mathrm{Re}(\lambda_2) < \beta < \lambda_1$.
(b) Suppose $P_W$ is also aperiodic. Then the eigenvalues $\sigma_1, \sigma_2, \ldots$ of $P_W$ can be ordered so that

$$1 > \sigma_1 > |\sigma_2| \ge \ldots,$$

where $\sigma_1$ has the left eigenvector $\mu$ which is a probability distribution on $W$. Moreover, $\mu$ is the unique quasi-stationary distribution of $X_n$ in $W$ and for all $x, y \in W$,

$$|\mu_n(y) - \mu(y)| = |P^x(X_n = y \mid N > n) - \mu(y)| \le D(x) \left(\frac{\gamma}{\sigma_1}\right)^{n}, \tag{7}$$

with $D(x)$ a constant depending on $x$, and $\gamma$ any real number satisfying $\gamma > |\sigma_2|$.

Proof. We first justify the expression for the eigenvalues. Observe that for $x \neq y \in W$, we have $q(x,y) > 0$ if and only if $p(x,y) > 0$. It follows that $Q_W$ is irreducible if and only if $P_W$ is irreducible; see Definition 2.1 in [22]. Now let $e$ be the all ones column vector, $e(x) = 1$ for $x \in W$. Recall that $q(x,y) \ge 0$ for every $x \neq y \in E$ and $\sum_y q(x,y) = 0$ for every $x \in E$. This implies that $Q_W e \le 0$ component-wise. Since $W$ is non-absorbing, for some $x \in W$ and $y \notin W$ we have $q(x,y) > 0$, and it follows that $\sum_{z\in W} q(x,z) < 0$. This shows that at least one component of $Q_W e$ is strictly negative. The expression for the eigenvalues, and the fact that $\nu$ is signed (hence a probability distribution, after normalization), now follow from Theorem 2.6 of Seneta [22].

To see that $\nu$ is the QSD for $X(t)$ in $W$, we define the stopped process $X^T(t) = X(t \wedge T)$, so that $X(t)$ is absorbed outside $W$. For any $x, z \in E$, let $e_x$ be the column vector with $e_x(z) = 1$ if $x = z$ and $e_x(z) = 0$ otherwise. Finiteness of $W$ ensures that $P^x(X^T(t) = y) = e_x^{\top} e^{Q_W t} e_y$. Thus, for each $y \in W$,

$$P^\nu(X(t) = y,\, T > t) = P^\nu(X^T(t) = y) = \nu\, e^{Q_W t} e_y = e^{\lambda_1 t}\nu(y)$$

and

$$P^\nu(T > t) = P^\nu(X^T(t) \in W) = e^{\lambda_1 t},$$

which leads to $\nu(y) = P^\nu(X(t) = y \mid T > t)$.

Now we turn to the convergence to $\nu$. It follows from Theorem 2.7 in [22] that there is a constant $C(x)$ depending on $x$ such that for any real $\beta$ with $\mathrm{Re}(\lambda_2) < \beta < \lambda_1$,

$$P^x(X(t) = y,\, T > t) = P^x(X^T(t) = y) = C(x) e^{\lambda_1 t}\nu(y) + O(e^{\beta t}) \tag{8}$$

and

$$P^x(T > t) = C(x) e^{\lambda_1 t} + O(e^{\beta t}). \tag{9}$$

Dividing (8) by (9), it follows that

$$|\nu_t(y) - \nu(y)| = |P^x(X(t) = y \mid T > t) - \nu(y)| \le C(x)\, e^{-(\lambda_1 - \beta)t},$$

where $C(x)$ is now a (possibly different) constant depending on $x$. The arguments in (b) are similar, using the Perron-Frobenius theorem (Seneta [22, Theorem 1.1]).

For analogous results on the QSD in more general settings, see [6, Theorem 4.5] for CTMCs and [8, Theorem 1] for DTMCs. We are now ready to define metastability.

Definition 4.
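Let $W$ and $\lambda_i$, $\sigma_i$ be as in Theorem 3.
1. $W$ is metastable for $X(t)$ if $\lambda_1 \approx 0$ and

$$|\lambda_1| \ll |\lambda_1 - \mathrm{Re}(\lambda_2)|. \tag{10}$$

$X(t)$ is metastable if it has at least one metastable set $W$.
2. $W$ is metastable for $X_n$ if $\sigma_1 \approx 1$ and

$$\sigma_1 \gg \frac{|\sigma_2|}{\sigma_1}. \tag{11}$$

$X_n$ is metastable if it has at least one metastable set $W$.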
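For a finite set $W$, the quantities in Theorem 3(a) and condition (10) can be computed directly from $Q_W$. The sketch below is an illustration under stated assumptions rather than part of the original text: the function name `qsd_spectrum`, the $3\times 3$ block `Q_W_demo`, and the value of `eps` are hypothetical, and NumPy is assumed. The QSD is obtained as the normalized left eigenvector for the eigenvalue of largest real part.

```python
import numpy as np

def qsd_spectrum(Q_W):
    """Eigenvalues of Q_W sorted by decreasing real part, and the QSD nu
    (left eigenvector of the leading eigenvalue, normalized to sum to 1)."""
    Q_W = np.asarray(Q_W, dtype=float)
    # Right eigenvectors of the transpose are left eigenvectors of Q_W.
    evals, evecs = np.linalg.eig(Q_W.T)
    order = np.argsort(-evals.real)
    evals, evecs = evals[order], evecs[:, order]
    nu = evecs[:, 0].real
    nu = nu / nu.sum()            # fixes both the scale and the overall sign
    return evals, nu

# Hypothetical restricted generator: two fast states and one slow state that
# leaks out of W at rate eps / 2 (its row sums to -eps / 2 inside W).
eps = 0.01
Q_W_demo = np.array([[-1.0, 0.5, 0.5],
                     [0.5, -1.0, 0.5],
                     [eps / 2, 0.0, -eps]])

evals, nu = qsd_spectrum(Q_W_demo)
lam1, re_lam2 = evals[0].real, evals[1].real
print("lambda_1 =", lam1, " Re(lambda_2) =", re_lam2)
print("QSD nu =", nu)
print("|lambda_1| / |lambda_1 - Re(lambda_2)| =", abs(lam1) / abs(lam1 - re_lam2))
```

A small printed ratio together with $\lambda_1$ close to zero indicates that this hypothetical $W$ satisfies condition (10).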

In light of Theorem 3, Conditions 1-2 in Definition 4 essentially say that the time to leave $W$ is large in an absolute sense, and the time to leave $W$ is large relative to the time to converge to the QSD in $W$. Metastability of the CTMC is not necessarily equivalent to metastability of its underlying embedded chain, as we now show. Consider $X(t)$ with the infinitesimal generator

Q = [4 x 4 matrix not recoverable from the extracted text, zeros and signs were lost; surviving entries by row: 1, 1/2, 1/2; 1/2, 1, 1/2; ɛ/2, ɛ, ɛ/2; 1, 1]

where $\epsilon$ is positive. Then $W = \{1, 2, 3\}$ is metastable for $X(t)$ but not for $X_n$, since

$$\sigma_1 \approx 0.81, \quad |\sigma_2| \approx 1/2, \quad \lambda_1 \approx -\epsilon/2, \quad \mathrm{Re}(\lambda_2) \approx -1/2.$$

Now consider $X(t)$ with the infinitesimal generator

Q = [4 x 4 matrix not recoverable from the extracted text; surviving entries in reading order: ɛ^{-1}, ɛ^{-1}/2, ɛ^{-1}/2, ɛ^{-1}, 1, ɛ^{-1}, 1, ɛ^{-1}, 1, ɛ]

Then $W = \{1, 2, 3\}$ is metastable for $X_n$ but not for $X(t)$, since

$$\sigma_1 \approx 1 - \epsilon/5, \quad |\sigma_2| \approx 1/2, \quad \lambda_1 \approx -1/5, \quad \mathrm{Re}(\lambda_2) \approx -3\epsilon^{-1}/2.$$

Algorithm 1 below requires a collection of metastable sets for $X(t)$, and Algorithm 2 requires a collection of metastable sets for $X_n$. The only assumption we make on these sets is that they are pairwise disjoint. (The sets may be different for the two algorithms, as noted above.) Throughout we write $W$ to denote a generic metastable set. We emphasize that we do not assume the metastable sets form a partition of $E$: the union of the metastable sets may be a proper subset of $E$. Here and below, we assume that each $W$ has a unique QSD and that $\nu_t$ (and $\mu_n$) converge to the QSD in total variation norm, for any starting point $x$.

We conclude this section by mentioning properties of the QSD which are essential for the consistency of our algorithms in Sections 3 and 4 below.

Theorem 5.
1. Suppose $X(0) \sim \nu$. Then $T$ is exponentially distributed with parameter $-\lambda_1$:

$$P^\nu(T > t) = e^{\lambda_1 t}, \quad t > 0,$$

and $T$ and $X(T)$ are independent.
2. Suppose $X_0 \sim \mu$. Then $N$ is geometrically distributed with parameter $1 - \sigma_1$:

$$P^\mu(N > n) = \sigma_1^{n}, \quad n = 1, 2, \ldots,$$

and $N$ and $X_N$ are independent.

Proof. The first parts of 1 and 2 were shown in Theorem 3. For the rest of the proof see [6].

3. The CTMC ParRep Method.

3.1. Formulation of the CTMC Algorithm. In this section, we introduce a method for accelerating the computation of $\pi(f)$, where we recall that $f : E \to \mathbb{R}$ is any bounded function and $\pi$ is the stationary distribution. We call this algorithm CTMC ParRep, for reasons that will be outlined below.

Before we describe CTMC ParRep, we introduce some notation. Throughout, $X^1(t), \ldots, X^R(t)$ will be independent processes with the same law as $X(t)$ and with initial distributions supported in $W$. Recall that the first exit time of $X(t)$ from $W$ is

$$T = \inf\{t > 0 : X(t) \notin W\}.$$

Similarly, for $r = 1, \ldots, R$, we define the first exit time of $X^r(t)$ from $W$ by

$$T^r = \inf\{t > 0 : X^r(t) \notin W\}$$

and the smallest one among them by

$$T^* = \min_r T^r.$$

We denote the index of the replica with the first exit time by $M$, i.e., $M = \arg\min_r T^r$. The quantities $T$, $T^r$, $T^*$ and $M$ depend on $W$, but we do not make this explicit.

We are in the position to present CTMC ParRep in Algorithm 1. In this algorithm, we will need user-chosen parameters $t_c$ associated with each metastable set $W$. Roughly speaking, these parameters correspond to the time for $X(t)$ to converge to the QSD in $W$. The accumulated value $F(f)_{\mathrm{sim}}$ serves as a quantity that approximates the integral $\int_0^{T_{\mathrm{end}}} f(X(s))\,ds$ when the algorithm terminates.

If $X^{\mathrm{par}}(t)$ remains in $W$ for a sufficiently long time (i.e., the decorrelation threshold $t_c$), it is distributed nearly according to the QSD $\nu$ of $X(t)$ in $W$ by Theorem 3. This means that at the end of the decorrelation stage, $X^{\mathrm{par}}(T_{\mathrm{sim}})$ can be considered a sample of $\nu$.

The aim of the dephasing stage is to prepare a sequence of independent initial states with distribution $\nu$. There are several ways of achieving this. Perhaps the simplest is the rejection method. In this procedure, each of the $R$ replicas evolves independently. A parameter $t_p$ similar to the decorrelation threshold $t_c$ is selected. If a replica leaves $W$ before spending a time interval of length $t_p$ in $W$, it restarts in $W$ from the original initial state.
Once all the replicas remain in $W$ for time $t_p$, we stop and take $x_1, \ldots, x_R$ as the final states of all the replicas in the dephasing stage and use them for the subsequent parallel stage. Besides rejection sampling, another method is a Fleming-Viot based particle sampler; see the discussion after Algorithm 2 below.

The acceleration of CTMC ParRep comes from the parallel stage. Recall that if $x_1, \ldots, x_R$ are independent, identically distributed (iid) with the common distribution $\nu$, then $T^1, \ldots, T^R$ are independent exponential random variables with common parameter $-\lambda_1$. Using $T^* = \min_r T^r$, it is then easy to check that $R\,T^*$ has the same distribution as $T^1$. See Lemma 6 below. This means one only needs to wait for $T^*$ instead of $T^1$ to observe an exit from $W$. Note that this is true whether or not $W$ is metastable, so efficiency of the parallel stage does not require metastability. However, the dephasing stage is not efficient if $W$ is not metastable. That is because, in practice, the samples $x_1, \ldots, x_R$ are obtained by simulating trajectories which remain in $W$ for a sufficiently long time $t_p$. Such samples are hard to obtain when the typical time $t_p$ for $x_1, \ldots, x_R$ to reach the QSD in $W$ is not much smaller than the typical time to leave $W$.

Algorithm 1 CTMC ParRep
1: Set a decorrelation threshold $t_c$ for each metastable set $W$. Initialize the simulation time clock $T_{\mathrm{sim}} = 0$ and the accumulated value $F(f)_{\mathrm{sim}} = 0$. We will write $X^{\mathrm{par}}(t)$ for a simulation process that obeys the law of $X(t)$. A complete ParRep cycle consists of three stages.
2: Decorrelation Stage: Starting at $t = T_{\mathrm{sim}}$, evolve $X^{\mathrm{par}}(t)$ until it spends an interval of time length $t_c$ inside the same metastable set $W$. That is, evolve $X^{\mathrm{par}}(t)$ from time $t = T_{\mathrm{sim}}$ until time
$$T_{\mathrm{corr}} = \inf\{t \ge T_{\mathrm{sim}} + t_c : X^{\mathrm{par}}(s) \in W \text{ for all } s \in [t - t_c, t] \text{ for some } W\}.$$
Then update
$$F(f)_{\mathrm{sim}} = F(f)_{\mathrm{sim}} + \int_{T_{\mathrm{sim}}}^{T_{\mathrm{corr}}} f(X^{\mathrm{par}}(t))\,dt,$$
set $T_{\mathrm{sim}} = T_{\mathrm{corr}}$, and proceed to the dephasing stage.
3: Dephasing Stage: Let $W$ be such that $X^{\mathrm{par}}(T_{\mathrm{sim}}) \in W$, that is, $W$ is the metastable set from the end of the last decorrelation stage. Generate $R$ independent samples $x_1, \ldots, x_R$ from $\nu$, the QSD of $X(t)$ in $W$. Then proceed to the parallel stage.
4: Parallel Stage: Start $R$ parallel processes $X^1(t), \ldots, X^R(t)$ at $x_1, \ldots, x_R$, and evolve them from time $t = 0$ until time $T^*$. Then update
$$F(f)_{\mathrm{sim}} = F(f)_{\mathrm{sim}} + \sum_{r=1}^{R} \int_0^{T^*} f(X^r(s))\,ds, \qquad T_{\mathrm{sim}} = T_{\mathrm{sim}} + R\,T^*, \tag{12}$$
set $X^{\mathrm{par}}(T_{\mathrm{sim}}) = X^M(T^*)$, and return to the decorrelation stage.
5: The algorithm is stopped when $T_{\mathrm{sim}}$ reaches a user-chosen terminal time $T_{\mathrm{end}}$. The stationary average $\pi(f)$ is estimated as $\pi(f) \approx F(f)_{\mathrm{sim}} / T_{\mathrm{sim}}$.

To see that each parallel stage has a consistent contribution to the stationary average, we make the following two observations. Suppose that $x_1, \ldots, x_R$ are iid samples from $\nu$.
1. The joint law of $(R\,T^*, X^M(T^*))$ is the same as that of $(T^1, X^1(T^1))$. That is, the joint distribution of the first exit time and the exit state in the parallel stage is independent of the number of replicas.
2. The expected value of $\sum_{r=1}^{R}\int_0^{T^*} f(X^r(s))\,ds$ in (12) is the same as that of $\int_0^{T^1} f(X^1(s))\,ds$. That is, the expected contribution to $F(f)_{\mathrm{sim}}$ from each parallel stage is independent of the number of replicas.
The first observation is a consequence of Theorem 5, and the second will be proved in Theorem 7 below. Consistency of stationary averages follows from points 1-2 above and the law of large numbers. Since there are indefinitely many parallel stages in a given $W$, consistency is ensured as long as the contribution to $F(f)_{\mathrm{sim}}$ from the parallel stage has the correct expected value. See [1] for details and discussion in a related discrete time version of the algorithm under some idealized assumptions.

The CTMC ParRep algorithm suffers from some serious drawbacks. Even if the parallel processors are synchronous, $M$ and $T^*$ may not be known at the wall clock time when the first replica leaves $W$. The reason is that the holding times for a CTMC are random, while the wall clock time for simulating each jump of the CTMC is always roughly the same. We illustrate this problem in Figure 1.

Fig. 1. The parallel stage of the CTMC ParRep algorithm with two replicas. R1 escapes from $W$ at $t = 7$ with 7 transitions while R2 escapes at $t = 8$ but with only 4 transitions. In the parallel stage of the CTMC ParRep algorithm, R2 escapes from $W$ before R1 does in wall clock time, but $T^2 > T^1$. There is no acceleration in this case since the parallel stage does not terminate when R2 escapes.

In the worst possible case, in order to determine $M$ and $T^*$, we must wait for all the replicas to leave $W$. However, one can set a variable $T_{\min}$ to record the current minimum first exit time over all replicas which have left $W$, and terminate any replicas which reach time $T_{\min}$ but have not left $W$, since no replica contributes to the accumulated value past time $T_{\min}$. Since the expected first exit times $E[T^r]$, $r = 1, \ldots, R$, are roughly the same, if the variance in the number of jumps of $X^r(t)$ before time $T^*$ is small for all $r = 1, \ldots, R$, then we can expect that the parallel stage stops after only a few replicas leave $W$. For the same reason, there is another major drawback of CTMC ParRep.
If $f$ takes multiple values in $W$, then the computation of $\sum_{r=1}^{R}\int_0^{T^*} f(X^r(s))\,ds$ in (12) requires storing the entire history of each replica in that parallel stage. Hence, the implementation of CTMC ParRep might be memory demanding unless one is interested in the equilibrium average of a metastable-set invariant function $f$, i.e., if $f(x)$ has only one value in each metastable set $W$. In Section 4 we present another algorithm, called embedded ParRep, which addresses these drawbacks.

3.2. Error Analysis of CTMC ParRep. Here and below we will write $E^{\nu^R}$ for the expectation of $(X^1(t), \ldots, X^R(t))$ starting at $\nu^R$, where

$$\nu^R(x_1, \ldots, x_R) = \prod_{r=1}^{R} \nu(x_r), \qquad x_1, \ldots, x_R \in W.$$

We begin with a simple well known lemma.

Lemma 6. Suppose $T^1, \ldots, T^R$ are iid exponential random variables with parameter $-\lambda_1$. Then $T^* = \min_{1\le r\le R} T^r$ is exponentially distributed with parameter $-R\lambda_1$. In particular, $R\,T^*$ has the same distribution as $T^1$.

We now show that if the dephasing sampling is exact, then the expected contribution to the accumulated value $F(f)_{\mathrm{sim}}$ from the parallel step of Algorithm 1 is exact.
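Lemma 6 is easy to check by simulation. The snippet below is a minimal numerical sketch, assuming NumPy; the values of `rate` (standing in for $-\lambda_1$), `R`, and the sample size are arbitrary illustrative choices. It draws $R$ iid exponential exit times per row and compares the law of $R\,T^*$ with that of a single $T^1$ through the mean and one tail probability.

```python
import numpy as np

rng = np.random.default_rng(1)
rate, R, n_samples = 0.2, 8, 200_000     # rate plays the role of -lambda_1 (hypothetical value)

T = rng.exponential(1.0 / rate, size=(n_samples, R))   # each row holds T^1, ..., T^R
T_star = T.min(axis=1)                                  # T^* = min_r T^r

mean_single = T[:, 0].mean()             # estimates E[T^1] = 1 / rate
mean_scaled = (R * T_star).mean()        # estimates E[R T^*], equal to 1 / rate by Lemma 6
tail_single = (T[:, 0] > 5.0).mean()     # estimates P(T^1 > 5) = exp(-rate * 5)
tail_scaled = (R * T_star > 5.0).mean()  # estimates P(R T^* > 5), the same value by Lemma 6
print(mean_single, mean_scaled, tail_single, tail_scaled)
```

In terms of simulated time, the parallel stage therefore only needs to run until $T^*$, which is on average $R$ times shorter than $T^1$.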

Theorem 7. Suppose in the dephasing step $(x_1, \ldots, x_R) \sim \nu^R$. Then the expected contribution to $F(f)_{\mathrm{sim}}$ from the parallel stage of Algorithm 1 is independent of the number of replicas,

$$E^{\nu^R}\left[\sum_{r=1}^{R}\int_0^{T^*} f(X^r(s))\,ds\right] = E^{\nu}\left[\int_0^{T} f(X(s))\,ds\right] = \nu(f)\,E^{\nu}[T].$$

Proof. First we consider the case with a single replica. We condition on the exit time $T^1$ and write

$$E^{\nu}\left[\int_0^{T^1} f(X^1(s))\,ds\right] = \int_0^\infty E^{\nu}\left[\int_0^{t} f(X^1(s))\,ds \,\Big|\, T^1 = t\right] P^{\nu}(T^1 \in dt).$$

Interchanging the two integrals on the right-hand side leads to

$$\int_0^\infty \int_s^\infty E^{\nu}\left[f(X^1(s)) \mid T^1 = t\right] P^{\nu}(T^1 \in dt)\,ds.$$

Note that the inner integral can be written as $E^{\nu}\left[f(X^1(s))\,1_{T^1 > s}\right]$ and hence

$$E^{\nu}\left[\int_0^{T^1} f(X^1(s))\,ds\right] = \int_0^\infty E^{\nu}\left[f(X^1(s)) \mid T^1 > s\right] P^{\nu}(T^1 > s)\,ds.$$

Owing to the definition of the QSD and the fact that $E^{\nu}[T^1] = \int_0^\infty P^{\nu}(T^1 > s)\,ds$,

$$E^{\nu}\left[\int_0^{T^1} f(X^1(s))\,ds\right] = \nu(f)\,E^{\nu}[T^1].$$

In the case of multiple replicas, similar steps can be used to show that

$$E^{\nu^R}\left[\sum_{r=1}^{R}\int_0^{T^*} f(X^r(s))\,ds\right] = \sum_{r=1}^{R}\int_0^\infty E^{\nu^R}\left[f(X^r(s)) \mid T^* > s\right] P^{\nu^R}(T^* > s)\,ds.$$

Recall that $T^* > s$ if and only if $T^r > s$ for all $r = 1, \ldots, R$. Using this, the fact that $T^1, \ldots, T^R$ are independent, and the definition of the QSD, we get

$$E^{\nu^R}\left[f(X^r(s)) \mid T^* > s\right] = E^{\nu}\left[f(X^r(s)) \mid T^r > s\right] = \nu(f).$$

Thus

$$E^{\nu^R}\left[\sum_{r=1}^{R}\int_0^{T^*} f(X^r(s))\,ds\right] = \nu(f)\,R\int_0^\infty P^{\nu^R}(T^* > s)\,ds = \nu(f)\,R\,E^{\nu^R}[T^*].$$

Finally, the result follows from Lemma 6.

The purpose of CTMC ParRep is to efficiently simulate very long trajectories of a metastable CTMC and estimate the equilibrium average $\pi(f)$. CTMC ParRep can produce accelerated dynamics of the CTMC on a coarse state space where each coarse set corresponds to some $W$; see the discussion after Algorithm 2 below. Our numerical experiments suggest that CTMC ParRep (and also embedded ParRep described below) are consistent for estimating the stationary distribution.
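The parallel stage of Algorithm 1 and the identity in Theorem 7 can be illustrated on a small example. The following is a serial sketch under explicit assumptions, not the authors' implementation: the 3-state generator `Q_demo` (with $W = \{0, 1\}$ and state 2 outside $W$), the observable `f_demo`, and the helper names are hypothetical, the replicas are run one after another rather than in parallel, and dephasing is replaced by exact sampling from the QSD computed as a left eigenvector. Each replica's path is stored so that the integrals in (12) can be truncated at $T^*$, which is the memory cost discussed in Section 3.1.

```python
import numpy as np

def qsd(Q_W):
    """Leading eigenvalue lambda_1 of Q_W and the QSD nu (normalized left eigenvector)."""
    evals, evecs = np.linalg.eig(Q_W.T)
    i = np.argmax(evals.real)
    nu = evecs[:, i].real
    return evals[i].real, nu / nu.sum()

def parallel_stage(Q, n_W, f, nu, R, rng):
    """One parallel stage with W = {0, ..., n_W - 1} and replicas started from nu.
    Returns (sum_r int_0^{T*} f(X^r(s)) ds, T*)."""
    q = -np.diag(Q)
    P = Q / q[:, None]
    np.fill_diagonal(P, 0.0)
    cum = np.cumsum(P, axis=1)                 # row-wise CDFs of the embedded chain
    paths, exit_times = [], []
    for _ in range(R):
        x, t, path = int(rng.choice(n_W, p=nu)), 0.0, []
        while x < n_W:                          # replica is still inside W
            dt = rng.exponential(1.0 / q[x])
            path.append((t, t + dt, x))         # replica sits in state x on [t, t + dt)
            t += dt
            u = rng.random()
            x = min(int(np.searchsorted(cum[x], u, side="right")), len(q) - 1)
        paths.append(path)
        exit_times.append(t)
    T_star = min(exit_times)
    total = 0.0
    for path in paths:                          # integrate each replica only up to T*
        for start, stop, x in path:
            if start >= T_star:
                break
            total += f[x] * (min(stop, T_star) - start)
    return total, T_star

# Hypothetical example: states 0 and 1 form W, state 2 lies outside W.
Q_demo = np.array([[-1.0, 0.9, 0.1],
                   [0.8, -1.0, 0.2],
                   [1.0, 0.0, -1.0]])
f_demo = np.array([1.0, 3.0, 0.0])
n_W, R, n_trials = 2, 4, 3000

lam1, nu = qsd(Q_demo[:n_W, :n_W])
rng = np.random.default_rng(3)
results = np.array([parallel_stage(Q_demo, n_W, f_demo, nu, R, rng) for _ in range(n_trials)])

mean_contribution = results[:, 0].mean()
expected = (nu @ f_demo[:n_W]) / (-lam1)        # nu(f) E^nu[T], with E^nu[T] = -1 / lambda_1
mean_RT = (R * results[:, 1]).mean()
print(mean_contribution, expected, mean_RT, -1.0 / lam1)
```

Changing `R` in this sketch should leave both printed averages unchanged up to Monte Carlo error, in line with Theorem 7 and Lemma 6.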

For CTMC ParRep, we justify this claim in Theorem 8 below, which shows that, starting in some $W$ and waiting until the simulation leaves $W$, the error for a complete ParRep cycle in CTMC ParRep compared to direct (serial) simulation vanishes as $t_c$ increases. See Theorem 12 below for the analogous result on embedded ParRep. We note that the errors from each ParRep cycle produce an error in the estimation (5) of stationary averages that does not disappear as $T_{\mathrm{sim}} \to \infty$. However, we expect that the error vanishes as the thresholds $t_c = t_p \to \infty$. The study of this error is more involved and will be the focus of another work.

Recall we have assumed that $\|\nu_{t_c} - \nu\|_{TV} \to 0$ as $t_c \to \infty$, for every starting point $x \in E$, where $\|\cdot\|_{TV}$ denotes the total variation norm. See for instance Theorem 3 for conditions guaranteeing this convergence.

Theorem 8. Consider CTMC ParRep starting at $x \in W$ in the decorrelation stage. Assume the dephasing stage sampling is exact, that is, $(x_1, \ldots, x_R) \sim \nu^R$. Consider the expected contribution to $F(f)_{\mathrm{sim}}$ until the first time the simulation leaves $W$ (either in the decorrelation or in the parallel stage),

$$F(f)_{\mathrm{sim}} \equiv E^{x}\left[\int_0^{T\wedge t_c} f(X(s))\,ds\right] + E^{x,\nu^R}\left[1_{T > t_c}\sum_{r=1}^{R}\int_0^{T^*} f(X^r(s))\,ds\right],$$

where $E^{x,\nu^R}$ denotes expectation for $(X(t), X^1(t), \ldots, X^R(t))$ with $X(t)$ starting at $x$ and the replicas $(X^1(t), \ldots, X^R(t))$ starting at the initial distribution $\nu^R$. The error compared to direct (serial) simulation satisfies the bound

$$\left| E^{x}\left[\int_0^{T} f(X(s))\,ds\right] - F(f)_{\mathrm{sim}} \right| \le \|f\|_\infty \sup_{x\in W} E^{x}[T]\; \|\nu_{t_c} - \nu\|_{TV}. \tag{13}$$

Proof. We estimate

$$\left| E^{x}\left[\int_0^{T} f(X(s))\,ds\right] - F(f)_{\mathrm{sim}} \right| = \left| E^{x}\left[\int_{T\wedge t_c}^{T} f(X(s))\,ds\right] - E^{x,\nu^R}\left[1_{T > t_c}\sum_{r=1}^{R}\int_0^{T^*} f(X^r(s))\,ds\right] \right|$$

$$= \left| E^{x}\left[\int_{t_c}^{T} f(X(s))\,ds \,\Big|\, T > t_c\right] - E^{x,\nu^R}\left[\sum_{r=1}^{R}\int_0^{T^*} f(X^r(s))\,ds \,\Big|\, T > t_c\right] \right| P^{x}(T > t_c)$$

$$\le \left| E^{x}\left[\int_{t_c}^{T} f(X(s))\,ds \,\Big|\, T > t_c\right] - E^{\nu^R}\left[\sum_{r=1}^{R}\int_0^{T^*} f(X^r(s))\,ds\right] \right|,$$

where we used the fact that $X(t)$ and the replicas $(X^1(t), \ldots, X^R(t))$ are independent. By the Markov property,

$$E^{x}\left[\int_{t_c}^{T} f(X(s))\,ds \,\Big|\, T > t_c\right] = E^{\nu_{t_c}}\left[\int_0^{T} f(X(s))\,ds\right].$$

By Theorem 7,

$$E^{\nu^R}\left[\sum_{r=1}^{R}\int_0^{T^*} f(X^r(s))\,ds\right] = E^{\nu}\left[\int_0^{T} f(X(s))\,ds\right].$$

Combining the above estimates and equalities,

$$\left| E^{x}\left[\int_0^{T} f(X(s))\,ds\right] - F(f)_{\mathrm{sim}} \right| \le \left| E^{\nu_{t_c}}\left[\int_0^{T} f(X(s))\,ds\right] - E^{\nu}\left[\int_0^{T} f(X(s))\,ds\right] \right|$$

$$= \left| \sum_{x\in W} E^{x}\left[\int_0^{T} f(X(s))\,ds\right]\nu_{t_c}(x) - \sum_{x\in W} E^{x}\left[\int_0^{T} f(X(s))\,ds\right]\nu(x) \right| \le \|f\|_\infty \sup_{x\in W} E^{x}[T]\; \|\nu_{t_c} - \nu\|_{TV}.$$

We note that $E^{x}[T]$ is uniformly bounded in $x \in W$ if, for instance, $P_W$ is irreducible and $W$ is finite and non-absorbing for $X(t)$, as in Theorem 3. This uniform boundedness guarantees that the right hand side of (13) vanishes as $t_c \to \infty$.

4. The Embedded ParRep Method.

4.1. Formulation of the Embedded ParRep Algorithm. In this section, we introduce another algorithm for accelerating the computation of $\pi(f)$. The algorithm, called embedded ParRep, circumvents the disadvantages of CTMC ParRep discussed above. As mentioned in the previous section, CTMC ParRep can be slow due to the randomness of the holding times. In the worst case, one has to wait until all replicas leave $W$ in order to determine the first exit time. To circumvent this issue we propose an algorithm based on the embedded chain in which the parallel stage terminates as soon as one of the replicas leaves $W$.

Before we describe embedded ParRep, we introduce some notation. Throughout, $X_n^1, \ldots, X_n^R$ will be independent processes with the same law as $X_n$ and with initial distributions supported in $W$. Moreover, we consider $X_n^1, \ldots, X_n^R$ as the embedded chains of $X^1(t), \ldots, X^R(t)$ defined above, and let $\Delta\tau_n^1, \ldots, \Delta\tau_n^R$ be the corresponding holding times. Recall that the first exit time of $X_n$ from $W$ is

$$N = \inf\{n > 0 : X_n \notin W\}.$$

For $r = 1, \ldots, R$, we define the first exit time of $X_n^r$ from $W$ by

$$N^r = \min\{n \in \mathbb{N} : X_n^r \notin W\}$$

and the smallest among them by

$$N^* = \min\{N^r : r = 1, \ldots, R\}.$$

Note that it is possible that more than one replica leaves $W$ for the first time after $N^*$ transitions. We denote by $K$ the smallest index among these escaped replicas. That is,

$$K = \min\{r = 1, \ldots, R : X^r_{N^*} \notin W\}.$$

It is clear from the above definition that $N^K = N^*$.
Of course $N$, $N^r$, $N^*$ and $K$ depend on $W$, but we do not make this explicit. Here and below we write $E^{\mu^R}$ for the expectation of $(X_n^1, \ldots, X_n^R)$ starting at $\mu^R$, where

$$\mu^R(x_1, \ldots, x_R) = \prod_{r=1}^{R} \mu(x_r), \qquad x_1, \ldots, x_R \in W.$$

We begin by reproducing from [2] Theorems 9 and 10 below, with proofs for completeness.

Theorem 9. Suppose $(X_n^1, \ldots, X_n^R)$ has initial distribution $\mu^R$. Then $R(N^K - 1) + K$ has the same distribution as $N^1$.

Proof. By Theorem 5, $N^1$ is geometrically distributed with parameter $p = P^{\mu}(N^1 = 1) = 1 - \sigma_1$. Note that for any $n \ge 1$ and $k = 1, \ldots, R$, the event $\{N^K = n, K = k\}$ is equivalent to the event

$$\{N^1 > n, \ldots, N^{k-1} > n,\; N^k = n,\; N^{k+1} > n - 1, \ldots, N^R > n - 1\}.$$

Since $X_n^1, \ldots, X_n^R$ are iid and each $N^r$ is geometrically distributed with parameter $p$,

$$P^{\mu^R}(N^K = n, K = k) = (1-p)^{n(k-1)}\,(1-p)^{n-1}p\,(1-p)^{(n-1)(R-k)} = (1-p)^{R(n-1)+k-1}\,p.$$

That is, $R(N^K - 1) + K$ has geometric distribution with parameter $p$.

Theorem 10. Suppose $(X_n^1, \ldots, X_n^R)$ has the initial distribution $\mu^R$. Then $X^K_{N^K}$ is independent of $R(N^K - 1) + K$, and the distribution of $(X^K_{N^K}, R(N^K - 1) + K)$ is the same as that of $(X^1_{N^1}, N^1)$.

Proof. We first prove that $X^K_{N^K}$ is independent of $K$. Since $X_n^1, \ldots, X_n^R$ are iid and $N^k$ is independent of $X^k_{N^k}$ for each $k$, $X^k_{N^k}$ is independent of $N^1, \ldots, N^R$. Note that $K$ is $\sigma(N^1, \ldots, N^R)$-measurable, hence $X^k_{N^k}$ is independent of $K$ for each $k$. Now observe that for any $A \subset E$,

$$P^{\mu^R}(X^K_{N^K} \in A) = \sum_{r=1}^{R} P^{\mu^R}(X^r_{N^r} \in A, K = r) = \sum_{r=1}^{R} P^{\mu^R}(X^1_{N^1} \in A)\,P^{\mu^R}(K = r) = P^{\mu^R}(X^1_{N^1} \in A),$$

that is, $X^K_{N^K}$ and $X^1_{N^1}$ are equally distributed. This implies that $X^K_{N^K}$ is independent of $K$. To see the independence between $X^K_{N^K}$ and $R(N^K - 1) + K$, note that

$$P^{\mu^R}(X^K_{N^K} \in A, N^K = n, K = r) = P^{\mu^R}(X^r_{N^r} \in A, N^r = n, K = r)$$
$$= P^{\mu^R}(X^r_{N^r} \in A, K = r \mid N^r = n)\,P^{\mu^R}(N^r = n)$$
$$= P^{\mu^R}(X^r_{N^r} \in A \mid N^r = n)\,P^{\mu^R}(N^r = n, K = r)$$
$$= P^{\mu^R}(X^r_{N^r} \in A)\,P^{\mu^R}(N^r = n, K = r)$$
$$= P^{\mu^R}(X^K_{N^K} \in A)\,P^{\mu^R}(N^K = n, K = r)$$

for any measurable $A \subset E$, $n \in \mathbb{Z}^+$ and $r = 1, \ldots, R$. Finally, Theorem 9 and the above analysis imply that $(X^K_{N^K}, R(N^K - 1) + K)$ and $(X^1_{N^1}, N^1)$ are equally distributed.

Now we present the embedded ParRep algorithm in Algorithm 2.
In this algorithm we will need user-chosen parameters $n_c$ associated with each metastable set $W$. Roughly, these parameters correspond to the time for $X_n$ to converge to the QSD in $W$. The DTMC $X_n$ and holding times $\tau_n$ are simulated by the stochastic simulation algorithm (SSA), see, for instance, [13], just as in the CTMC ParRep. If $X_n^{\mathrm{par}}$ remains in $W$ for sufficiently long (i.e., for $n_c$ consecutive time steps), it is distributed nearly according to the QSD $\mu$ of $X_n$ in $W$. See Theorem 3. This means that at the end of the decorrelation stage, $X_n^{\mathrm{par}}$ can be considered a sample of $\mu$.
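The decorrelation criterion described above, namely spending $n_c$ consecutive steps in the same metastable set, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the convention that `set_label` returns `None` for states outside every metastable set are hypothetical and not taken from the paper.

```python
from typing import Callable, Hashable, Optional, Sequence


def decorrelation_index(
    states: Sequence,
    set_label: Callable[[object], Optional[Hashable]],
    n_c: int,
) -> Optional[int]:
    """Return the first index n such that states[n - n_c + 1 .. n] all lie
    in the same metastable set, or None if no such index exists.

    set_label(x) is assumed (hypothetically) to return an identifier of the
    metastable set containing x, or None if x lies in no metastable set.
    """
    if n_c < 1:
        raise ValueError("n_c must be at least 1")
    run_length = 0          # number of consecutive steps in the current set
    current_label = None
    for n, x in enumerate(states):
        label = set_label(x)
        if label is not None and label == current_label:
            run_length += 1
        else:
            current_label = label
            run_length = 1 if label is not None else 0
        if run_length >= n_c:
            return n
    return None
```

With indices counted from the start of the decorrelation stage, the returned value plays the role of the stopping index of that stage: it is the first step at which the chain has stayed in one set for $n_c$ steps in a row.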

Algorithm 2 Embedded ParRep

1: Set a decorrelation threshold $n_c$ for each metastable set $W$. Initialize the simulation time clock $N_{\mathrm{sim}} = 0$ and the accumulated value $F(f)_{\mathrm{sim}} = 0$. We will write $X_n^{\mathrm{par}}$ and $\tau_n^{\mathrm{par}}$ for a DTMC and holding time process following the law of the embedded chain and holding times of $X(t)$, respectively. A complete ParRep cycle consists of three stages.

2: Decorrelation Stage: Starting at $n = N_{\mathrm{sim}}$, evolve $X_n^{\mathrm{par}}$ and $\tau_n^{\mathrm{par}}$ until $X_n^{\mathrm{par}}$ spends $n_c$ consecutive time steps inside of the same metastable set $W$. That is, evolve $X_n^{\mathrm{par}}$ and $\tau_n^{\mathrm{par}}$ from time $n = N_{\mathrm{sim}}$ until time
$$N_{\mathrm{corr}} = \inf\{n \ge N_{\mathrm{sim}} + n_c - 1 : X_m^{\mathrm{par}} \in W \text{ for } m \in \{n - n_c + 1, \dots, n\} \text{ for some } W\}.$$
Then update
$$F(f)_{\mathrm{sim}} = F(f)_{\mathrm{sim}} + \sum_{n = N_{\mathrm{sim}}}^{N_{\mathrm{corr}} - 1} f(X_n^{\mathrm{par}})\, \tau_n^{\mathrm{par}},$$
set $N_{\mathrm{sim}} = N_{\mathrm{corr}}$, and proceed to the dephasing stage.

3: Dephasing Stage: Let $W$ be such that $X^{\mathrm{par}}_{N_{\mathrm{sim}}} \in W$, that is, $W$ is the metastable set from the end of the decorrelation stage. Generate $R$ independent samples $x_1, \dots, x_R$ from $\mu$, the QSD of $X_n$ in $W$. Then proceed to the parallel stage.

4: Parallel Stage: Start $R$ parallel processes $X_n^1, \dots, X_n^R$ at $x_1, \dots, x_R$, and evolve them and the corresponding holding times $\tau_n^1, \dots, \tau_n^R$ from time $n = 0$ until time $N^*$. Then update
$$\begin{aligned}
F(f)_{\mathrm{sim}} &= F(f)_{\mathrm{sim}} + \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1}, \\
N_{\mathrm{sim}} &= N_{\mathrm{sim}} + R(N^* - 1) + K,
\end{aligned} \tag{14}$$
set $X^{\mathrm{par}}_{N_{\mathrm{sim}}} = X^K_{N^*}$, and return to the decorrelation stage.

5: The algorithm is stopped when $N_{\mathrm{sim}}$ reaches some user-chosen time $N_{\mathrm{end}}$. The stationary average $\pi(f)$ is estimated as $\pi(f) \approx F(f)_{\mathrm{sim}} / F(1)_{\mathrm{sim}}$.

The aim of the dephasing stage is to prepare a sequence of iid initial states with distribution $\mu$. As with the CTMC ParRep, rejection sampling can be used for the embedded ParRep. However, a more natural and efficient option for the embedded ParRep is a Fleming-Viot based sampling procedure [3, 11]. The procedure can be summarized as follows.
The $R$ replicas $X_n^1, \dots, X_n^R$, starting in $W$, evolve until one or more of them leaves $W$. Then each replica that left $W$ is restarted from the current state of another replica that is currently in $W$ (usually chosen uniformly at random). The procedure stops after the replicas have evolved for $n = n_p$ time steps, where $n_p$ is a parameter similar to $n_c$. (If all the replicas leave $W$ at the same time, the procedure restarts from the beginning.) With this type of sampling, the number of time steps simulated for each replica in the dephasing step is the same. In particular, if the $R$ parallel processors

are synchronous (i.e., if each processor takes the same wall clock time to simulate one time step), then each processor finishes the dephasing step at the same wall clock time.

Fig. 2. The diagram for one parallel stage of the embedded ParRep algorithm with $R$ replicas. Each blue dot represents an exit event along the time line. Both replicas 2 and 3 leave $W$ after $N^* = 6$ transitions (the blue dot with the red x), in which case $K = 2$.

The acceleration of the embedded ParRep comes from the parallel stage. Roughly, we only have to wait $N^*$ time steps instead of $N$ to observe an exit from $W$. The theoretical wall clock time speedup can be approximately a factor of $R$. See Theorem 9. As with the CTMC ParRep, the parallel step does not require metastability for this time speedup, but if $W$ is not metastable, then the dephasing step will not be efficient. See the remarks below Algorithm 1.

Similar to the CTMC ParRep, each parallel stage of the embedded ParRep has a consistent averaged contribution to the stationary average. Suppose that $x_1, \dots, x_R$ are iid samples from $\mu$.

1. The joint law of $(X^K_{N^K}, R(N^K - 1) + K)$ is the same as that of $(X^1_{N^1}, N^1)$. That is, the joint distribution of the first exit time and the exit state for each parallel stage is independent of the number of replicas.

2. The expected value of
$$\sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1}$$
is the same as that of
$$\sum_{n=0}^{N^1-1} f(X_n^1)\, \tau_n^1.$$
Hence the expected contribution to $F(f)_{\mathrm{sim}}$ from each parallel stage is independent of the number of replicas.

See Theorems 10 and 11 for proofs of these statements.

Remark 1 (Parallel implementation and efficiency). We expect that the embedded ParRep is superior to the CTMC ParRep for the following two reasons. First, consider the parallel stages of both algorithms.
In the CTMC ParRep, observing the first exit

event in the parallel stage is not sufficient to determine the stopping time of that stage. But in the embedded ParRep, once any replica leaves $W$, we know $N^*$. Thus the embedded ParRep parallel step terminates once any of the replicas leaves $W$. For this reason we expect the parallel stage of the embedded ParRep to be significantly faster than that of the CTMC ParRep. Second, consider the dephasing stage. For the embedded ParRep, Fleming-Viot sampling is a natural technique because if the processors are synchronous then they all finish the dephasing stage at the same wall clock time, and only the current states of each processor are needed at each time step to decide where to restart replicas which left $W$. For asynchronous processors, one can simply implement a polling time. This is not true, however, for Fleming-Viot sampling with the CTMC ParRep. Indeed, to implement Fleming-Viot sampling with the CTMC ParRep, one would have to store the histories of every replica, and the replicas would finish at potentially very different wall clock times. The rejection method can be slow for both algorithms, particularly when the metastability is weak or when the number of replicas is large.

Error analysis of the embedded ParRep.

Now we are able to show that if the dephasing sampling is exact, then the expected contribution to $F(f)_{\mathrm{sim}}$ from the parallel stage is exact.

Theorem 11. Suppose in the dephasing step $(x_1, \dots, x_R) \sim \mu^R$. Then the expected contribution to $F(f)_{\mathrm{sim}}$ from the parallel stage of Algorithm 2 is the same for every number of replicas:
$$\mathbb{E}^{\mu^R}\left[ \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \right] = \mathbb{E}^{\mu}\left[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \right] = \mu(fq^{-1})\, \mathbb{E}^{\mu}[N]. \tag{15}$$

Proof. We first rewrite
$$\sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} = \sum_{r=1}^{R} \sum_{i=0}^{N^*-1} f(X_i^r)\, \tau_i^r - \sum_{r=K+1}^{R} f(X^r_{N^*-1})\, \tau^r_{N^*-1}.$$
For the first part, we condition on $N^*$ and obtain
$$\mathbb{E}^{\mu^R}\left[ \sum_{r=1}^{R} \sum_{i=0}^{N^*-1} f(X_i^r)\, \tau_i^r \right] = \sum_{r=1}^{R} \sum_{n=1}^{\infty} \sum_{i=0}^{n-1} \mathbb{E}^{\mu^R}\left[ f(X_i^r)\, \tau_i^r\, \mathbf{1}_{N^* = n} \right].$$
Interchanging the iterated summations leads to
$$\sum_{r=1}^{R} \sum_{n=1}^{\infty} \sum_{i=0}^{n-1} \mathbb{E}^{\mu^R}\left[ f(X_i^r)\, \tau_i^r\, \mathbf{1}_{N^* = n} \right] = \sum_{r=1}^{R} \sum_{i=0}^{\infty} \mathbb{E}^{\mu^R}\left[ f(X_i^r)\, \mathbf{1}_{N^* > i}\, \tau_i^r \right].$$
Notice that $N^* > i$ is equivalent to $N^1 > i, \dots, N^R > i$, and that $X_i^r$ and $\tau_i^r$ are independent of $N^s$ for $s \ne r$. Thus
$$\begin{aligned}
\sum_{r=1}^{R} \sum_{i=0}^{\infty} \mathbb{E}^{\mu^R}\left[ f(X_i^r)\, \mathbf{1}_{N^* > i}\, \tau_i^r \right] &= \sum_{r=1}^{R} \sum_{i=0}^{\infty} \mathbb{E}^{\mu^R}\left[ f(X_i^r)\, \tau_i^r \mid N^* > i \right] \mathbb{P}^{\mu^R}(N^* > i) \\
&= \sum_{r=1}^{R} \sum_{i=0}^{\infty} \mathbb{E}^{\mu^R}\left[ f(X_i^r)\, \tau_i^r \mid N^r > i \right] \mathbb{P}^{\mu^R}(N^* > i).
\end{aligned}$$

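Now by Lemma 1 and the definition of the QSD,
$$\begin{aligned}
\mathbb{E}^{\mu}\left[ f(X_i^r)\, \tau_i^r \mid N^r > i \right] &= \mathbb{E}^{\mu}\left[ \mathbb{E}^{\mu}\left[ f(X_i^r)\, \tau_i^r \mid \{X_n^r\}_{n=0,1,\dots} \right] \mid N^r > i \right] \\
&= \mathbb{E}^{\mu}\left[ f(X_i^r)\, \mathbb{E}^{\mu}\left[ \tau_i^r \mid \{X_n^r\}_{n=0,1,\dots} \right] \mid N^r > i \right] \\
&= \mathbb{E}^{\mu}\left[ f(X_i^r)\, q(X_i^r)^{-1} \mid N^r > i \right] = \mu(fq^{-1}).
\end{aligned}$$
Combining the last four equations and using $\sum_{i=0}^{\infty} \mathbb{P}^{\mu^R}(N^* > i) = \mathbb{E}^{\mu^R}[N^*]$ gives
$$\mathbb{E}^{\mu^R}\left[ \sum_{r=1}^{R} \sum_{i=0}^{N^*-1} f(X_i^r)\, \tau_i^r \right] = \mu(fq^{-1})\, R\, \mathbb{E}^{\mu^R}[N^*]. \tag{16}$$
A similar argument can be applied to the second term in the decomposition above. First we condition on $N^*$ and $K$ simultaneously, so that
$$\mathbb{E}^{\mu^R}\left[ \sum_{r=K+1}^{R} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \right] = \sum_{n=1}^{\infty} \sum_{k=1}^{R} \sum_{r=k+1}^{R} \mathbb{E}^{\mu^R}\left[ f(X^r_{n-1})\, \tau^r_{n-1} \mid N^* = n, K = k \right] \mathbb{P}^{\mu^R}(N^* = n, K = k).$$
Interchanging the second and third summations, the right-hand side equals
$$\sum_{n=1}^{\infty} \sum_{r=2}^{R} \sum_{k=1}^{r-1} \mathbb{E}^{\mu^R}\left[ f(X^r_{n-1})\, \tau^r_{n-1} \mid N^* = n, K = k \right] \mathbb{P}^{\mu^R}(N^* = n, K = k).$$
Recall that
$$\{N^* = n, K = k\} = \{N^1 > n, \dots, N^{k-1} > n,\; N^k = n,\; N^{k+1} > n-1, \dots, N^R > n-1\}.$$
Thus, using independence of $X_n^1, \dots, X_n^R$ and the definition of the QSD,
$$\begin{aligned}
&\sum_{n=1}^{\infty} \sum_{r=2}^{R} \sum_{k=1}^{r-1} \mathbb{E}^{\mu^R}\left[ f(X^r_{n-1})\, \tau^r_{n-1} \mid N^* = n, K = k \right] \mathbb{P}^{\mu^R}(N^* = n, K = k) \\
&\quad = \sum_{n=1}^{\infty} \sum_{r=2}^{R} \sum_{k=1}^{r-1} \mathbb{E}^{\mu}\left[ f(X^r_{n-1})\, \tau^r_{n-1} \mid N^r > n-1 \right] \mathbb{P}^{\mu^R}(N^* = n, K = k) \\
&\quad = \mu(fq^{-1}) \sum_{n=1}^{\infty} \sum_{r=2}^{R} \sum_{k=1}^{r-1} \mathbb{P}^{\mu^R}(N^* = n, K = k) \\
&\quad = \mu(fq^{-1})\, (R - \mathbb{E}^{\mu^R}[K]).
\end{aligned}$$
Combining the last three equations leads to
$$\mathbb{E}^{\mu^R}\left[ \sum_{r=K+1}^{R} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \right] = \mu(fq^{-1})\, (R - \mathbb{E}^{\mu^R}[K]). \tag{17}$$
Subtracting (17) from (16), we have
$$\mathbb{E}^{\mu^R}\left[ \sum_{r=1}^{R} \sum_{i=0}^{N^*-1} f(X_i^r)\, \tau_i^r - \sum_{r=K+1}^{R} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \right] = \mu(fq^{-1})\, \mathbb{E}^{\mu^R}[R(N^* - 1) + K].$$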

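Now the result follows since
$$\mu(fq^{-1})\, \mathbb{E}^{\mu^R}[R(N^* - 1) + K] = \mu(fq^{-1})\, \mathbb{E}^{\mu}[N]$$
by Theorem 10. In particular, when $R = 1$ we have $N^* = N$ and $K = 1$, and thus
$$\mathbb{E}^{\mu}\left[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \right] = \mu(fq^{-1})\, \mathbb{E}^{\mu}[N].$$

We now prove an analog of Theorem 8 for the embedded ParRep. Recall we have assumed that $\|\mu_{n_c} - \mu\|_{TV} \to 0$ as $n_c \to \infty$, for every starting point $x \in E$. See for instance Theorem 3 for conditions guaranteeing this convergence.

Theorem 12. Consider the embedded ParRep starting at $x \in W$ in the decorrelation stage. Assume the dephasing stage sampling is exact, that is, $(x_1, \dots, x_R) \sim \mu^R$. Consider the expected contribution to $F(f)_{\mathrm{sim}}$ up until the first time the simulation leaves $W$ (either in the decorrelation stage or in the parallel stage):
$$F(f)_{\mathrm{sim}} := \mathbb{E}^{x}\left[ \sum_{n=0}^{n_c \wedge N - 1} f(X_n)\, \tau_n \right] + \mathbb{E}^{x,\mu^R}\left[ \mathbf{1}_{N > n_c} \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \mathbf{1}_{N > n_c} \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \right],$$
where $\mathbb{E}^{x,\mu^R}$ denotes expectation for $(X_n, X_n^1, \dots, X_n^R)$ with $X_n$ starting at $x$ and the replicas $(X_n^1, \dots, X_n^R)$ starting at the initial distribution $\mu^R$. The error compared to a direct (serial) simulation satisfies the bound
$$\left| \mathbb{E}^{x}\left[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \right] - F(f)_{\mathrm{sim}} \right| \le \|f\|_\infty \sup_{x \in W} \mathbb{E}^{x}[T]\; \|\mu_{n_c} - \mu\|_{TV}. \tag{18}$$

Proof. The proof is similar to that for the CTMC ParRep. Since the replicas are independent of $X_n$ and $\mathbb{P}^x(N > n_c) \le 1$,
$$\begin{aligned}
\left| \mathbb{E}^{x}\left[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \right] - F(f)_{\mathrm{sim}} \right| &= \left| \mathbb{E}^{x}\left[ \sum_{n=n_c \wedge N}^{N-1} f(X_n)\, \tau_n \right] - \mathbb{E}^{x,\mu^R}\left[ \mathbf{1}_{N > n_c} \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \mathbf{1}_{N > n_c} \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \right] \right| \\
&\le \left| \mathbb{E}^{x}\left[ \sum_{n=n_c}^{N-1} f(X_n)\, \tau_n \;\middle|\; N > n_c \right] - \mathbb{E}^{\mu^R}\left[ \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \right] \right|.
\end{aligned}$$
By the Markov property,
$$\mathbb{E}^{x}\left[ \sum_{n=n_c}^{N-1} f(X_n)\, \tau_n \;\middle|\; N > n_c \right] = \mathbb{E}^{\mu_{n_c}}\left[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \right].$$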

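Owing to Theorem 11,
$$\mathbb{E}^{\mu^R}\left[ \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \right] = \mathbb{E}^{\mu}\left[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \right].$$
Therefore
$$\begin{aligned}
\left| \mathbb{E}^{x}\left[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \right] - F(f)_{\mathrm{sim}} \right| &\le \left| \mathbb{E}^{\mu_{n_c}}\left[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \right] - \mathbb{E}^{\mu}\left[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \right] \right| \\
&= \left| \sum_{y \in W} \mu_{n_c}(y)\, \mathbb{E}^{y}\left[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \right] - \sum_{y \in W} \mu(y)\, \mathbb{E}^{y}\left[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \right] \right| \\
&\le \|f\|_\infty \sup_{x \in W} \mathbb{E}^{x}[T]\; \|\mu_{n_c} - \mu\|_{TV},
\end{aligned}$$
with the last inequality coming from the fact that $\mathbb{E}^{x}\left[ \sum_{n=0}^{N-1} \tau_n \right] = \mathbb{E}^{x}[T]$.

5. Numerical Experiments.

We present two numerical examples from stochastic reaction networks in order to demonstrate the consistency and efficiency of the ParRep algorithms.

Reaction networks with linear propensity.

We consider the following stochastic reaction network
$$\emptyset \to A \rightleftharpoons B \to C \to \emptyset \tag{19}$$
taken from [7], where $A$, $B$ and $C$ represent reacting species. The time evolution of the population (the number of molecules of each species) in the reaction network is commonly modeled as a CTMC $X(t) = (X_1(t), X_2(t), X_3(t))$ with state space $E \subset \mathbb{Z}_+^3$. The jump rate of each reaction is governed by the propensity function (intensity) $\lambda_j(x)$, $j = 1, \dots, 5$, such that for all $t > 0$,
$$\lambda_j(x) = \lim_{h \to 0} \frac{\mathbb{P}(X(t+h) = x + \eta_j \mid X(t) = x)}{h},$$
where $\eta_j$ is the state change vector associated with the $j$th reaction. We list the reactions and their corresponding propensity functions and state change vectors in Table 1.

Table 1. Reactions, propensity functions and state change vectors.

Reaction             Propensity function          State change vector
$\emptyset \to A$    $\lambda_1(x) = c_1$         $\eta_1 = (1, 0, 0)$
$A \to B$            $\lambda_2(x) = c_2 x_1$     $\eta_2 = (-1, 1, 0)$
$B \to A$            $\lambda_3(x) = c_3 x_2$     $\eta_3 = (1, -1, 0)$
$B \to C$            $\lambda_4(x) = c_4 x_2$     $\eta_4 = (0, -1, 1)$
$C \to \emptyset$    $\lambda_5(x) = c_5 x_3$     $\eta_5 = (0, 0, -1)$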
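The propensities and state change vectors of Table 1 are all that a direct SSA simulation of network (19) needs. The following is a minimal serial sketch for orientation, not the implementation used in the paper: the function names are hypothetical, the rate constants are passed in as a tuple `c`, and it is assumed that $c_1 > 0$ so that the total jump rate never vanishes.

```python
import random

# State-change vectors eta_1..eta_5 from Table 1, in the order (A, B, C).
ETA = [(1, 0, 0), (-1, 1, 0), (1, -1, 0), (0, -1, 1), (0, 0, -1)]


def propensities(x, c):
    """Propensities lambda_1(x)..lambda_5(x) from Table 1."""
    x1, x2, x3 = x
    c1, c2, c3, c4, c5 = c
    return [c1, c2 * x1, c3 * x2, c4 * x2, c5 * x3]


def ssa_time_average(x0, c, f, t_end, rng):
    """Time average of f along one SSA trajectory on [0, t_end].

    Assumes c[0] > 0, so the total jump rate is always positive.
    """
    x, t, acc = tuple(x0), 0.0, 0.0
    while True:
        rates = propensities(x, c)
        hold = rng.expovariate(sum(rates))      # holding time in state x
        if t + hold >= t_end:
            acc += f(x) * (t_end - t)           # truncate the last holding time
            return acc / t_end
        acc += f(x) * hold
        t += hold
        # Pick the next reaction with probability proportional to its rate.
        j = rng.choices(range(len(ETA)), weights=rates)[0]
        x = tuple(xi + di for xi, di in zip(x, ETA[j]))
```

A call such as `ssa_time_average(x0, c, lambda x: x[0] + x[1], t_end, random.Random(0))` gives a plain ergodic average of an observable; when the time scales are widely separated this serial average converges slowly, which is the cost the ParRep algorithms aim to reduce.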

In this numerical experiment, we take the initial state $x = (5, 1, 1)$ and the rate constants $(c_1, c_2, c_3, c_4, c_5) = (.1, 1, 1, .1, .1)$. With this choice of parameters the timescale separation is about $\epsilon = 10^{-4}$ and hence the process $X(t)$ demonstrates metastability. The reactions $A \to B$ and $B \to A$ occur with a much higher probability than the other reactions, and hence we call $A \to B$ and $B \to A$ fast reactions and the other reactions slow reactions. The occurrence of slow reactions is a rare event. We define the observables $f_1(x) = x_1 + x_2$ and $f_2(x) = x_3$. The collection of sets $\{W_{m,n}\}_{m,n \in \mathbb{Z}_+}$ with
$$W_{m,n} = \{x \in E : f_1(x) = m,\; f_2(x) = n\}$$
forms a full decomposition of the state space $E$. Note that both the total population of species $A$ and $B$ (i.e., $f_1(X(t))$) and the population of species $C$ (i.e., $f_2(X(t))$) remain constant until one of the slow reactions occurs. Hence the typical sojourn time for $X(t)$ in each $W_{m,n}$ is very long compared to the transition time between any two states that are in $W_{m,n}$. In this case, we say $X(t)$ is metastable in $W_{m,n}$. For example, with the initial population $x = (1, 1, 0)$, the states $(1, 1, 0)$, $(2, 0, 0)$ and $(0, 2, 0)$ form a metastable set, since the fast reactions $A \to B$ and $B \to A$ occur with a significantly higher probability than the slow reactions, and only the occurrence of a slow reaction can move the process from one metastable set to another. Since both observables $f_1$ and $f_2$ defined above are invariant in each metastable set, we call them slow observables. In general, an observable $f$ is called a slow observable if it is invariant in each metastable set $W_{m,n}$, i.e., there is a constant $C(m, n)$ such that $f(x) = C(m, n)$ for each $x \in W_{m,n}$. An observable is called a fast observable if it is not slow (e.g., $f(x) = x_1$). Two-scale problems of this kind arise in many fields other than stochastic reaction networks, such as queuing theory and population dynamics.
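The decomposition into the sets $W_{m,n}$ can be made concrete with a short sketch. The helper names below are hypothetical and the reaction labels are plain-text stand-ins for the entries of Table 1. Because the label $(f_1(x), f_2(x))$ is linear in $x$, a reaction keeps the process inside its current set exactly when the label of its state change vector is $(0, 0)$.

```python
# Reactions of Table 1 with their state-change vectors, in the order (A, B, C).
REACTIONS = {
    "0 -> A": (1, 0, 0),
    "A -> B": (-1, 1, 0),
    "B -> A": (1, -1, 0),
    "B -> C": (0, -1, 1),
    "C -> 0": (0, 0, -1),
}


def set_label(x):
    """Label (m, n) = (f_1(x), f_2(x)) of the set W_{m,n} containing x."""
    return (x[0] + x[1], x[2])


def metastable_set(m, n):
    """Enumerate the states of W_{m,n}."""
    return [(x1, m - x1, n) for x1 in range(m + 1)]


def preserves_label(eta):
    """True if adding eta to a state leaves its set label unchanged."""
    return set_label(eta) == (0, 0)


# Reactions that never leave the current set W_{m,n}.
LABEL_PRESERVING = [name for name, eta in REACTIONS.items() if preserves_label(eta)]
```

Here `metastable_set(2, 0)` lists the three states of the example above, and `LABEL_PRESERVING` contains only the two fast reactions, so every exit from a set $W_{m,n}$ is caused by a slow reaction.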
Estimation of the distributions of two-scale processes can be computationally prohibitive due to the insufficient sampling of the rare events. Therefore, it is desirable to apply the two ParRep algorithms proposed in this paper to accelerate the long time simulation and estimate the stationary distribution. We apply both the CTMC ParRep and the embedded ParRep to estimate the stationary averages of the slow observables $f_1$ and $f_2$. The stationary average of the fast observable $f_3(x) = x_1$ is also computed using the embedded ParRep. On the other hand, for the reaction network (19) under consideration, one can calculate the stationary distribution analytically since it only involves mono-molecular reactions. In fact, it can be shown that the stationary distribution is a multivariate Poisson distribution [7], that is,
$$\pi(x_1, x_2, x_3) = \frac{\bar\lambda_1^{x_1}\, \bar\lambda_2^{x_2}\, \bar\lambda_3^{x_3}}{x_1!\, x_2!\, x_3!}\, e^{-(\bar\lambda_1 + \bar\lambda_2 + \bar\lambda_3)}, \tag{20}$$
where
$$\bar\lambda_1 = \frac{c_1 (c_3 + c_4)}{c_2 c_4}, \qquad \bar\lambda_2 = \frac{c_1}{c_4}, \qquad \bar\lambda_3 = \frac{c_1}{c_5}.$$
Hence the exact stationary averages of the slow observables $f_1$ and $f_2$ are $\pi(f_1) = 2.1$ and $\pi(f_2) = 1$, and the exact stationary average of the fast observable $f_3(x) = x_1$ is $1.1$. We use these exact results as a reference for the numerical simulations.
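Formula (20) is easy to evaluate directly, which gives a quick consistency check on the quoted averages. The sketch below uses hypothetical function names and reads the printed rate constants as $(0.1, 1, 1, 0.1, 0.1)$; that reading is an assumption made for the example only.

```python
import math


def poisson_means(c):
    """Means (lambda_bar_1, lambda_bar_2, lambda_bar_3) appearing in (20)."""
    c1, c2, c3, c4, c5 = c
    return (c1 * (c3 + c4) / (c2 * c4), c1 / c4, c1 / c5)


def stationary_pmf(x, c):
    """Product-Poisson stationary probability pi(x_1, x_2, x_3) from (20)."""
    return math.prod(
        lam ** k * math.exp(-lam) / math.factorial(k)
        for lam, k in zip(poisson_means(c), x)
    )


def stationary_averages(c):
    """Return (pi(f_1), pi(f_2), pi(f_3)) for f_1 = x_1 + x_2, f_2 = x_3, f_3 = x_1."""
    lam1, lam2, lam3 = poisson_means(c)
    return (lam1 + lam2, lam3, lam1)


# Assumed reading of the printed rate constants (illustration only).
C_EXAMPLE = (0.1, 1.0, 1.0, 0.1, 0.1)
```

With `C_EXAMPLE`, `stationary_averages` returns values that agree, up to floating point rounding, with the averages 2.1, 1 and 1.1 quoted above. The means also satisfy the stationary balance relations of the linear network, for example $c_1 - c_2 \bar\lambda_1 + c_3 \bar\lambda_2 = 0$.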

Fig. 3. The stationary average of the slow observable $f_1(x) = x_1 + x_2$ computed with the CTMC ParRep (left) and with the embedded ParRep (right), plotted against the number of replicas together with the exact value. The user-specified terminal time is $T_{\mathrm{end}} = 10^4$ in the simulation.

Fig. 4. The stationary average of the slow observable $f_2(x) = x_3$ computed with the CTMC ParRep (left) and with the embedded ParRep (right), plotted against the number of replicas together with the exact value. The user-specified terminal time is $T_{\mathrm{end}} = 10^4$ in the simulation.

Our simulations compare the CTMC ParRep and the embedded ParRep with the Stochastic Simulation Algorithm (SSA) [13]. In Figure 3, we demonstrate the estimation of $\pi(f_1)$ using the CTMC ParRep and the embedded ParRep with various numbers of replicas ($R = 10, 20, \dots, 100$) and with SSA ($R = 1$). Similarly, Figure 4 shows the estimation of $\pi(f_2)$. Note that only the embedded ParRep is used to compute the stationary average of the fast observable $f(x) = x_1$, since the CTMC ParRep is not efficient for fast observables, as we commented at the end of Section 3.1. In these experiments, rejection sampling is used for dephasing, and the decorrelation and dephasing thresholds are taken to be $t_c = t_p = .1$ for the CTMC ParRep and $n_c = n_p = 15$ steps for the embedded ParRep. In Figure 5, the estimation for the fast observable and the speedup are shown. It can be seen that with 10 replicas, the speedup factor is about 4.5 for the CTMC ParRep and 5.5 for the embedded ParRep.

Fig. 5. The stationary average of the fast observable $f_3(x) = x_1$ computed with the embedded ParRep (left) and the speedup comparison between the CTMC ParRep and the embedded ParRep (right), both plotted against the number of replicas. The user-specified terminal time is $T_{\mathrm{end}} = 10^4$ in the simulation.

When the number of replicas increases, the embedded ParRep becomes much more efficient than the CTMC ParRep. However, even the embedded ParRep is far from linear speedup (with 100 replicas, it is about 27 times faster than SSA). This sublinear speedup comes from the fact that when the number of replicas is large, the acceleration is offset by the inefficient rejection sampling based dephasing procedure. We expect that the embedded ParRep would be more efficient if Fleming-Viot particle processes were used for dephasing.

Reaction networks with nonlinear propensity.

In the second example, we focus on the following network from [24],
$$S_1 \rightleftharpoons S_2, \qquad S_1 \rightleftharpoons S_3, \qquad 2S_2 + S_3 \rightleftharpoons 3S_4. \tag{21}$$
The propensity function and state change vector associated with each reaction are shown in Table 2. Note that by the law of mass action, the reactions $2S_2 + S_3 \rightleftharpoons 3S_4$ have nonlinear propensity functions.

Table 2. Reactions, propensity functions and state change vectors.

Reaction                 Propensity function                           State change vector
$S_1 \to S_2$            $\lambda_1(x) = c_1 x_1$                      $\eta_1 = (-1, 1, 0, 0)$
$S_2 \to S_1$            $\lambda_2(x) = c_2 x_2$                      $\eta_2 = (1, -1, 0, 0)$
$S_1 \to S_3$            $\lambda_3(x) = c_3 x_1$                      $\eta_3 = (-1, 0, 1, 0)$
$S_3 \to S_1$            $\lambda_4(x) = c_4 x_3$                      $\eta_4 = (1, 0, -1, 0)$
$2S_2 + S_3 \to 3S_4$    $\lambda_5(x) = c_5 x_2 (x_2 - 1) x_3$        $\eta_5 = (0, -2, -1, 3)$
$3S_4 \to 2S_2 + S_3$    $\lambda_6(x) = c_6 x_4 (x_4 - 1)(x_4 - 2)$   $\eta_6 = (0, 2, 1, -3)$

Throughout this example, we choose the initial state $x = (3, 3, 3, 3)$ and the reaction rate constants $(c_1, c_2, c_3, c_4, c_5, c_6) = (.1, .1, .1, .1, 2, 2)$
