arxiv: v2 [math.na] 20 Dec 2016


STATIONARY AVERAGING FOR MULTI-SCALE CONTINUOUS TIME MARKOV CHAINS USING PARALLEL REPLICA DYNAMICS

TING WANG, PETR PLECHÁČ, AND DAVID ARISTOFF

Abstract. We propose two algorithms for simulating continuous time Markov chains in the presence of metastability. We show that the algorithms correctly estimate, under the ergodicity assumption, stationary averages of the process. Both algorithms, based on the idea of the parallel replica method, use parallel computing in order to explore metastable sets more efficiently. The algorithms require no assumptions on the Markov chains beyond ergodicity and the presence of identifiable metastability. In particular, there is no assumption on reversibility. We present error analyses, as well as numerical simulations on multi-scale stochastic reaction network models, in order to demonstrate consistency of the method and its efficiency.

Key words. Markov chains, Monte Carlo, reversibility, stationary distribution, metastability, parallel replica, stochastic reaction networks, multi-scale dynamics, coarse graining

AMS subject classifications. 60J22, 65C05, 65Z05, 82B31, 92E20

1. Introduction. We focus on computing stationary averages of continuous time Markov chains. More precisely, if $\pi$ is the stationary distribution of a continuous time Markov chain (CTMC) and $f$ is a function on the state space, we aim at estimating the average $\pi(f) \equiv \mathbb{E}_\pi[f]$ by taking a time average over a long trajectory of the CTMC. There are many methods for computing stationary averages of stochastic processes; however, the vast majority of them rely on reversibility of the process, e.g., as in Markov chain Monte Carlo [2]. The computational cost of ergodic (trajectory) averaging becomes prohibitive when the convergence to the stationary distribution is slow due to metastability of the dynamics, for example in the presence of rare events or large time scale disparities (multi-scale dynamics) [21].
A possible remedy for this issue is to use parallel computing in order to accelerate sampling of the state space. For instance, the parallel tempering method (also known as replica exchange) [12, 1, 9, 17] has been successfully applied to many problems by simulating multiple replicas of the original system, each replica at a different temperature. However, the method requires time reversibility of the underlying processes, which typically does not hold for processes that model chemical reaction networks or systems with non-equilibrium steady states. In fact, there are not many methods that parallelize Monte Carlo simulation for irreversible processes with metastability, in particular if long-time sampling, such as ergodic averaging, is required.

We present a parallel computing approach for CTMCs without time reversibility. One advantage of the proposed algorithms is that they may be used, in principle, on arbitrary CTMCs. However, gains in efficiency can occur only if the CTMC is metastable. In this contribution we consider only models described by continuous time Markov chains.

As a motivating example we study a multi-scale chemical reaction network model in which molecules of different types react with different rates depending on their concentrations and reaction rate constants. In this model metastability emerges due to the infrequent occurrence of reactions with small rates, which makes the relaxation to the steady state dynamics extremely slow. In the transient regime the finite time distribution can be approximated using the stochastic averaging technique [24, 14] or the tau-leap method [19]. However, the former does not apply to stationary distribution estimation and the latter can still be computationally expensive for long-time simulations. It is thus desirable to have an efficient algorithm for computing stationary averages. The proposed algorithms thus provide a new multi-scale simulation method (in particular for stationary average estimation) for the stochastic reaction networks community.

Author affiliations: University of Delaware, Newark, DE (tingw@udel.edu); University of Delaware, Newark, DE (plechac@math.udel.edu); Colorado State University, Fort Collins, CO 80523 (aristoff@rams.colorado.edu).

The presented approach builds on the parallel replica (ParRep) dynamics introduced in the context of molecular simulations in [23]. The ParRep method used in the context of stochastic differential equations, e.g., Langevin dynamics, was rigorously analysed in [15, 16]. The algorithm we present and analyse builds on the recent work of [1, 2], where the ParRep process was studied for discrete-time Markov chains. In our algorithms, each time the simulation reaches a local equilibrium in a metastable set $W$, $R$ independent replicas of the CTMC are launched inside the set, allowing for parallel simulation of the dynamics at this stage. The main contribution of this work is a procedure for using the replicas in order to efficiently and consistently estimate the exit time and exit state from $W$, along with the contribution to the stationary time average of $f$ from the time spent in $W$. We emphasize that we are able to handle arbitrary functions (or observables) on the state space, not only those that are piecewise constant, i.e., those assuming a single value in each $W$. In the best case, if there are $R$ replicas, then the simulation leaves a metastable set about $R$ times faster than a direct serial simulation. The consistency of our algorithms relies on certain properties of quasi-stationary distributions (QSDs), which are essentially local equilibria associated with the metastable sets. We propose two algorithms for computing $\pi(f)$, called CTMC ParRep and embedded ParRep.
The former uses parallel simulation of the CTMC, while the latter employs parallel simulation of its embedded chain, which is a discrete time Markov chain (DTMC). CTMC ParRep (resp. embedded ParRep) relies on the fact that, starting at the QSD in a metastable set, the first time to leave the set is an exponential (resp. geometric) random variable and is independent of the exit state; see Theorem 5 below. The algorithms require some method for identifying metastable sets, though this need not be done a priori; it is sufficient to identify when the CTMC is currently in a metastable set, and when it exits such a set. While both algorithms can be useful for efficient simulation of $\pi(f)$ in the presence of metastability, we expect that embedded ParRep can be significantly more efficient, especially when combined with a certain type of QSD sampling, called Fleming-Viot [3, 4].

Though we focus here on the computation of $\pi(f)$, we note that one of our algorithms, CTMC ParRep, can be used to compute the dynamics of the CTMC on a coarse space in which each metastable set is considered a single (meta-)state. See the discussion below Algorithm 1. The advantages of the proposed algorithms include: (a) no requirement of time reversibility for the underlying dynamics; (b) they are suitable for long-time sampling; (c) they may be used, in principle, on arbitrary CTMCs in the presence of metastability.

In Section 2, we briefly review CTMCs before defining QSDs and detailing relevant properties thereof. In Section 3, we present CTMC ParRep and study how the error in the algorithm depends on the quality of QSD sampling. In Section 4, we present embedded ParRep and provide an analogous error analysis. We detail some numerical experiments on a multi-scale chemical reaction network model in Section 5 in order to demonstrate the consistency and accuracy of the algorithms.

2. Background and problem formulation.

2.1. Continuous Time Markov Chains. Throughout this paper, $X(t)$ is an irreducible and positive recurrent continuous time Markov chain (CTMC) with values in a countable set $E$, and $\pi$ denotes the stationary distribution of $X(t)$. We are interested in computing stationary averages $\pi(f)$ for a bounded function $f : E \to \mathbb{R}$ by using the ergodic theorem

\[ \lim_{t\to\infty} \frac{1}{t}\int_0^t f(X(s))\,ds = \pi(f), \tag{1} \]

which holds almost surely for any initial distribution of $X(t)$. The jump times $\tau_n$ and holding times $\Delta\tau_n$ for $X(t)$ are defined recursively by

\[ \tau_0 = 0, \qquad \tau_n = \inf\{t > \tau_{n-1} : X(t) \ne X(\tau_{n-1})\}, \]

and

\[ \Delta\tau_{n-1} = \tau_n - \tau_{n-1} \quad \text{for } n \ge 1. \]

We assume that $X(t)$ is non-explosive, that is, $\lim_{n\to\infty} \tau_n = \infty$ almost surely for every initial distribution of $X(t)$. This precludes the possibility of infinitely many jumps in finite time. We denote by $X_n = X(\tau_n)$ the embedded chain of $X(t)$. It is easy to see that $X_n$ is a discrete time Markov chain (DTMC). Recall that $X(t)$ is completely determined by its infinitesimal generator matrix $Q = \{q(x, y)\}_{x,y\in E}$. We write $q(x) := -q(x, x)$; note that irreducibility implies $q(x) > 0$ for all $x \in E$. It is easy to check that $X_n$ has the transition probability matrix $P = \{p(x, y)\}_{x,y\in E}$ satisfying

\[ p(x, y) = \begin{cases} \dfrac{q(x,y)}{q(x)}, & x \ne y, \\ 0, & x = y. \end{cases} \]

We state the following well known fact for later reference.

Lemma 1. For a CTMC $X(t)$ with the corresponding embedded Markov chain $X_n$, the holding times between successive jumps $\Delta\tau_0, \Delta\tau_1, \ldots, \Delta\tau_i, \ldots$ are independent conditioned on the embedded chain $X_n$. Moreover, $\Delta\tau_i \mid \{X_n\}$ is exponentially distributed with rate $q(X_i)$, and hence $E[\Delta\tau_i \mid \{X_n\}] = q(X_i)^{-1}$.

For details on the above facts, see for instance [ ].

2.2. The Quasi-stationary Distribution and Metastability. Below, we write $P$, $E$ for various probabilities and expectations, the precise meaning of which will be clear from context. We use a superscript, $P^\xi$, $E^\xi$, to indicate that the initial distribution is $\xi$. When the initial distribution is $\delta_x$, we write $P^x$, $E^x$.
The symbol $\sim$ will indicate equality in probability law. $\mathrm{Re}(\cdot)$ and $|\cdot|$ denote the real part and modulus of a complex number. Our ParRep algorithms rely on certain properties of quasi-stationary distributions, which we now briefly review. Let $W \subset E$ be fixed and consider the first exit time of $X(t)$ from $W$, that is,

\[ T = \inf\{t > 0 : X(t) \notin W\}. \]

We consider also the first exit time of $X_n$ from $W$,

\[ N = \inf\{n > 0 : X_n \notin W\}. \]
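The objects introduced so far (the embedded chain, the exponential holding times of Lemma 1, the time average in (1), and the exit time $T$) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names `jump`, `time_average`, and `first_exit` and the two-state generator are hypothetical choices made here.

```python
import numpy as np

def jump(Q, x, rng):
    """Sample one holding time and the next state of the embedded chain."""
    rate = -Q[x, x]                       # q(x) = -q(x, x) > 0
    hold = rng.exponential(1.0 / rate)    # holding time ~ Exp(q(x)), as in Lemma 1
    p = Q[x].copy()
    p[x] = 0.0                            # p(x, y) = q(x, y) / q(x) for y != x
    y = int(rng.choice(len(p), p=p / rate))
    return hold, y

def time_average(Q, x0, t_end, f, rng):
    """Approximate (1/t_end) * integral of f(X(s)) over [0, t_end]."""
    x, t, acc = int(x0), 0.0, 0.0
    while True:
        hold, y = jump(Q, x, rng)
        if t + hold >= t_end:             # truncate the last holding interval
            acc += f(x) * (t_end - t)
            return acc / t_end
        acc += f(x) * hold
        t, x = t + hold, y

def first_exit(Q, x0, W, rng):
    """Return (T, X(T)): the first exit time from W and the exit state."""
    x, t = int(x0), 0.0
    while x in W:
        hold, x = jump(Q, x, rng)
        t += hold
    return t, x

rng = np.random.default_rng(0)
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])  # hypothetical generator, pi = (2/3, 1/3)
avg = time_average(Q, 0, 2000.0, lambda x: float(x == 0), rng)
print(avg)                                # should be close to pi(0) = 2/3
```

For a metastable chain the same serial loop needs a very long `t_end` before the average settles, which is the cost the ParRep algorithms aim to reduce.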

A quasi-stationary distribution (QSD) of $X(t)$ in $W$ (or of $X_n$ in $W$) is defined as follows.

Definition 2. A probability distribution $\nu$ with support in $W$ is a quasi-stationary distribution for $X(t)$ in $W$ if for each $y \in W$ and $t > 0$,

\[ \nu(y) = P^\nu(X(t) = y \mid T > t). \tag{2} \]

Similarly, a probability distribution $\mu$ with support in $W$ is a QSD for $X_n$ in $W$ if for each $y \in W$ and $n > 0$,

\[ \mu(y) = P^\mu(X_n = y \mid N > n). \tag{3} \]

Throughout we write $\nu$ for a QSD of the CTMC $X(t)$ and $\mu$ for a QSD of the embedded chain $X_n$. The associated set $W$ will be implicit since no ambiguities should arise. We will write

\[ \nu_t(A) = P^x(X(t) \in A \mid T > t) \tag{4} \]

for the distribution of $X(t)$ conditioned on $T > t$, and

\[ \mu_n(A) = P^x(X_n \in A \mid N > n) \tag{5} \]

for the distribution of $X_n$ conditioned on $N > n$. Notice that we do not make explicit the dependence on the starting point $x$.

We summarize existence, uniqueness, and convergence properties of the QSD in Theorem 3 below (see [6, 22]). In Theorem 3, for simpler presentation, we assume $W$ is finite. That allows us to characterize convergence to the QSD of $X(t)$ and $X_n$ in terms of spectral properties of their generator and transition matrices. We emphasize, however, that finiteness of $W$ is not required for consistency of the algorithms proposed in this paper. Recall that $Q$ is the infinitesimal generator matrix of $X(t)$ and $P$ is the transition probability matrix of the DTMC $X_n$. We denote by $Q_W = \{q(x,y)\}_{x,y\in W}$ and $P_W = \{p(x,y)\}_{x,y\in W}$ the restrictions of $Q$ and $P$ to $W$.

Theorem 3. Let $W$ be finite and non-absorbing for $X(t)$, and assume $P_W$ is irreducible.

(a) The eigenvalues $\lambda_1, \lambda_2, \ldots$ of $Q_W$ can be ordered so that

\[ 0 > \lambda_1 > \mathrm{Re}(\lambda_2) \ge \ldots, \]

where $\lambda_1$ has the left eigenvector $\nu$, which is a probability distribution on $W$. Moreover, $\nu$ is the unique quasi-stationary distribution of $X(t)$ in $W$, and for all $x, y \in W$,

\[ |\nu_t(y) - \nu(y)| = |P^x(X(t) = y \mid T > t) - \nu(y)| \le C(x)\, e^{-(\lambda_1 - \beta)t}, \tag{6} \]

with $C(x)$ a constant depending on $x$, and $\beta$ any real number satisfying $\mathrm{Re}(\lambda_2) < \beta < \lambda_1$.
(b) Suppose $P_W$ is also aperiodic. Then the eigenvalues $\sigma_1, \sigma_2, \ldots$ of $P_W$ can be ordered so that

\[ 1 > \sigma_1 > |\sigma_2| \ge \ldots, \]

where $\sigma_1$ has the left eigenvector $\mu$, which is a probability distribution on $W$. Moreover, $\mu$ is the unique quasi-stationary distribution of $X_n$ in $W$, and for all $x, y \in W$,

\[ |\mu_n(y) - \mu(y)| = |P^x(X_n = y \mid N > n) - \mu(y)| \le D(x) \left(\frac{\gamma}{\sigma_1}\right)^{n}, \tag{7} \]

with $D(x)$ a constant depending on $x$, and $\gamma$ any real number satisfying $|\sigma_2| < \gamma < \sigma_1$.

Proof. We first justify the expression for the eigenvalues. Observe that for $x \ne y \in W$, we have $q(x, y) > 0$ if and only if $p(x, y) > 0$. It follows that $Q_W$ is irreducible if and only if $P_W$ is irreducible; see Definition 2.1 in [22]. Now let $e$ be the all ones column vector, $e(x) = 1$ for $x \in W$. Recall that $q(x, y) \ge 0$ for every $x \ne y \in E$ and $\sum_y q(x, y) = 0$ for every $x \in E$. This implies that $Q_W e \le 0$ component-wise. Since $W$ is non-absorbing, for some $x \in W$ and $y \notin W$ we have $q(x, y) > 0$, and it follows that $\sum_{z\in W} q(x, z) < 0$. This shows that at least one component of $Q_W e$ is strictly negative. The expression for the eigenvalues, and the fact that the entries of $\nu$ have a single sign (hence $\nu$ is a probability distribution after normalization), now follow from Theorem 2.6 of Seneta [22].

To see that $\nu$ is the QSD for $X(t)$ in $W$, we define the stopped process $X^T(t) = X(t \wedge T)$, so that $X(t)$ is absorbed outside $W$. For any $x, z \in E$, let $e_x$ be the column vector with $e_x(z) = 1$ if $x = z$ and $e_x(z) = 0$ otherwise. Finiteness of $W$ ensures that $P^x(X^T(t) = y) = e_x^{\top} e^{Q_W t} e_y$. Thus, for each $y \in W$,

\[ P^\nu(X(t) = y,\ T > t) = P^\nu(X^T(t) = y) = \nu\, e^{Q_W t} e_y = e^{\lambda_1 t}\nu(y) \]

and

\[ P^\nu(T > t) = P^\nu(X^T(t) \in W) = e^{\lambda_1 t}, \]

which leads to $\nu(y) = P^\nu(X(t) = y \mid T > t)$.

Now we turn to the convergence to $\nu$. It follows from Theorem 2.7 in [22] that there is a constant $C(x)$ depending on $x$ such that for any real $\beta$ with $\mathrm{Re}(\lambda_2) < \beta < \lambda_1$,

\[ P^x(X(t) = y,\ T > t) = P^x(X^T(t) = y) = C(x)\, e^{\lambda_1 t}\nu(y) + O(e^{\beta t}) \tag{8} \]

and

\[ P^x(T > t) = C(x)\, e^{\lambda_1 t} + O(e^{\beta t}). \tag{9} \]

It follows that

\[ |\nu_t(y) - \nu(y)| = |P^x(X(t) = y \mid T > t) - \nu(y)| \le C(x)\, e^{-(\lambda_1 - \beta)t}, \]

where $C(x)$ is now a (possibly different) constant depending on $x$. The arguments in (b) are similar, using the Perron-Frobenius theorem (Seneta [22, Theorem 1.1]).

For analogous results on the QSD in more general settings, see [6, Theorem 4.5] for CTMCs and [8, Theorem 1] for DTMCs. We are now ready to define metastability.

Definition 4. Let $W$ and $\lambda_i$, $\sigma_i$ be as in Theorem 3.

1. $W$ is metastable for $X(t)$ if $|\lambda_1| \approx 0$ and

\[ |\lambda_1| \ll |\lambda_1 - \mathrm{Re}(\lambda_2)|. \tag{10} \]

$X(t)$ is metastable if it has at least one metastable set $W$.

2. $W$ is metastable for $X_n$ if $\sigma_1 \approx 1$ and

\[ 1 - \sigma_1 \ll \sigma_1 - |\sigma_2|. \tag{11} \]

$X_n$ is metastable if it has at least one metastable set $W$.
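For a finite $W$, the quantities in Theorem 3(a) and condition (10) can be computed directly from $Q_W$. The following is a minimal sketch under stated assumptions: the helper `qsd_and_gap` and the four-state generator are hypothetical and chosen here only for illustration (states are indexed from 0), and the QSD is taken as the normalized left eigenvector of $Q_W$ for the eigenvalue with largest real part.

```python
import numpy as np

def qsd_and_gap(Q, W):
    """Return (nu, lam1, re_lam2) for the restriction Q_W of Q to the states in W."""
    Q = np.asarray(Q, dtype=float)
    QW = Q[np.ix_(W, W)]
    vals, vecs = np.linalg.eig(QW.T)       # columns are left eigenvectors of Q_W
    order = np.argsort(-vals.real)         # sort by decreasing real part
    lam1 = vals[order[0]].real
    re_lam2 = vals[order[1]].real
    nu = vecs[:, order[0]].real
    nu = nu / nu.sum()                     # normalize to a probability vector
    return nu, lam1, re_lam2

eps = 1e-3                                 # hypothetical small exit rate
Q = np.array([[-1.0, 0.5, 0.5, 0.0],
              [0.5, -1.0, 0.5, 0.0],
              [0.0, eps / 2, -eps, eps / 2],
              [1.0, 0.0, 0.0, -1.0]])
W = [0, 1, 2]
nu, lam1, re_lam2 = qsd_and_gap(Q, W)
ratio = abs(lam1) / abs(lam1 - re_lam2)    # small ratio <=> condition (10)
print(nu, lam1, re_lam2, ratio)
```

Multiplying $\nu Q_W = \lambda_1 \nu$ by the all ones vector shows that $-\lambda_1$ equals the $\nu$-averaged exit rate from $W$, which gives a cheap sanity check on the computed pair.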

In light of Theorem 3, Conditions 1-2 in Definition 4 essentially say that the time to leave $W$ is large in an absolute sense, and that the time to leave $W$ is large relative to the time to converge to the QSD in $W$.

Metastability of the CTMC is not necessarily equivalent to metastability of its underlying embedded chain, as we now show. Consider $X(t)$ with the infinitesimal generator 1 1/2 1/2 Q = 1/2 1 1/2 ɛ/2 ɛ ɛ/2, 1 1 where $\epsilon$ is positive. Then $W = \{1, 2, 3\}$ is metastable for $X(t)$ but not for $X_n$, since

\[ \sigma_1 \approx 0.81, \quad |\sigma_2| \approx 1/2, \quad \lambda_1 \approx -\epsilon/2, \quad \mathrm{Re}(\lambda_2) \approx -1/2. \]

Now consider $X(t)$ with the infinitesimal generator ɛ 1 ɛ 1 /2 ɛ 1 /2 Q = ɛ 1 1 ɛ 1 1 ɛ 1 1 ɛ Then $W = \{1, 2, 3\}$ is metastable for $X_n$ but not for $X(t)$, since

\[ \sigma_1 \approx 1 - \epsilon/5, \quad |\sigma_2| \approx 1/2, \quad \lambda_1 \approx -1/5, \quad \mathrm{Re}(\lambda_2) \approx -3\epsilon^{-1}/2. \]

Algorithm 1 below requires a collection of metastable sets for $X(t)$, and Algorithm 2 requires a collection of metastable sets for $X_n$. The only assumption we make on these sets is that they are pairwise disjoint. (The sets may be different for the two algorithms, as noted above.) Throughout we write $W$ to denote a generic metastable set. We emphasize that we do not assume the metastable sets form a partition of $E$: the union of the metastable sets may be a proper subset of $E$. Here and below, we assume that each $W$ has a unique QSD and that $\nu_t$ (and $\mu_n$) converge to the QSD in total variation norm, for any starting point $x$.

We conclude this section by mentioning properties of the QSD which are essential for the consistency of our algorithms in Sections 3 and 4 below.

Theorem 5.

1. Suppose $X(0) \sim \nu$. Then $T$ is exponentially distributed with parameter $-\lambda_1$:

\[ P^\nu(T > t) = e^{\lambda_1 t}, \quad t > 0, \]

and $T$ and $X(T)$ are independent.

2. Suppose $X_0 \sim \mu$. Then $N$ is geometrically distributed with parameter $1 - \sigma_1$:

\[ P^\mu(N > n) = \sigma_1^{n}, \quad n = 1, 2, \ldots, \]

and $N$ and $X_N$ are independent.

Proof. The first parts of 1 and 2 were shown in Theorem 3. For the rest of the proof see [6].

3. The CTMC ParRep Method.

3.1. Formulation of the CTMC Algorithm. In this section, we introduce a method for accelerating the computation of $\pi(f)$, where we recall that $f : E \to \mathbb{R}$ is any bounded function and $\pi$ is the stationary distribution. We call this algorithm CTMC ParRep, for reasons that will be outlined below.

Before we describe CTMC ParRep, we introduce some notation. Throughout, $X^1(t), \ldots, X^R(t)$ will be independent processes with the same law as $X(t)$ and with initial distributions supported in $W$. Recall that the first exit time of $X(t)$ from $W$ is

\[ T = \inf\{t > 0 : X(t) \notin W\}. \]

Similarly, for $r = 1, \ldots, R$, we define the first exit time of $X^r(t)$ from $W$ by

\[ T^r = \inf\{t > 0 : X^r(t) \notin W\} \]

and the smallest one among them by

\[ T^* = \min_r T^r. \]

We denote the index of the replica with the first exit time by $M$, i.e.,

\[ M = \arg\min_r T^r. \]

$T$, $T^r$, $T^*$, and $M$ depend on $W$, but we do not make this explicit.

We are now in a position to present CTMC ParRep in Algorithm 1. In this algorithm, we will need user-chosen parameters $t_c$ associated with each metastable set $W$. Roughly speaking, these parameters correspond to the time for $X(t)$ to converge to the QSD in $W$. The accumulated value $F(f)_{\mathrm{sim}}$ serves as a quantity that approximates the integral $\int_0^{T_{\mathrm{end}}} f(X(s))\,ds$ when the algorithm terminates.

If $X^{\mathrm{par}}(t)$ remains in $W$ for a sufficiently long time (i.e., the decorrelation threshold $t_c$), it is distributed nearly according to the QSD $\nu$ of $X(t)$ in $W$ by Theorem 3. This means that at the end of the decorrelation stage, $X^{\mathrm{par}}(T_{\mathrm{sim}})$ can be considered a sample of $\nu$.

The aim of the dephasing stage is to prepare a sequence of independent initial states with distribution $\nu$. There are several ways of achieving this. Perhaps the simplest is the rejection method. In this procedure, each of the $R$ replicas evolves independently. A parameter $t_p$ similar to the decorrelation threshold $t_c$ is selected. If a replica leaves $W$ before spending a time interval of length $t_p$ in $W$, it restarts in $W$ from the original initial state.
Once all the replicas have remained in $W$ for time $t_p$, we stop and take $x_1, \ldots, x_R$ as the final states of the replicas in the dephasing stage and use them for the subsequent parallel stage. Besides rejection sampling, another method is a Fleming-Viot based particle sampler; see the discussion after Algorithm 2 below.

The acceleration of CTMC ParRep comes from the parallel stage. Recall that if $x_1, \ldots, x_R$ are independent, identically distributed (iid) with the common distribution $\nu$, then $T^1, \ldots, T^R$ are independent exponential random variables with common parameter $-\lambda_1$. Using $T^* = \min_r T^r$, it is then easy to check that $R T^*$ has the same distribution as $T^1$. See Lemma 6 below. This means one only needs to wait for $T^*$ instead of $T^1$ to observe an exit from $W$. Note that this is true whether or not $W$ is metastable, so efficiency of the parallel stage does not require metastability. However, the dephasing stage is not efficient if $W$ is not metastable. That is because, in practice, the samples $x_1, \ldots, x_R$ are obtained by simulating trajectories which remain in $W$ for a sufficiently long time $t_p$. Such samples are hard to obtain when the typical time $t_p$ for $x_1, \ldots, x_R$ to reach the QSD in $W$ is not much smaller than the typical time to leave $W$.

Algorithm 1 CTMC ParRep
1: Set a decorrelation threshold $t_c$ for each metastable set $W$. Initialize the simulation time clock $T_{\mathrm{sim}} = 0$ and the accumulated value $F(f)_{\mathrm{sim}} = 0$. We will write $X^{\mathrm{par}}(t)$ for a simulation process that obeys the law of $X(t)$. A complete ParRep cycle consists of three stages.
2: Decorrelation Stage: Starting at $t = T_{\mathrm{sim}}$, evolve $X^{\mathrm{par}}(t)$ until it spends an interval of time length $t_c$ inside the same metastable set $W$. That is, evolve $X^{\mathrm{par}}(t)$ from time $t = T_{\mathrm{sim}}$ until time
\[ T_{\mathrm{corr}} = \inf\{t \ge T_{\mathrm{sim}} + t_c : X^{\mathrm{par}}(s) \in W \text{ for all } s \in [t - t_c, t] \text{ for some } W\}. \]
Then update
\[ F(f)_{\mathrm{sim}} = F(f)_{\mathrm{sim}} + \int_{T_{\mathrm{sim}}}^{T_{\mathrm{corr}}} f(X^{\mathrm{par}}(t))\,dt, \]
set $T_{\mathrm{sim}} = T_{\mathrm{corr}}$, and proceed to the dephasing stage.
3: Dephasing Stage: Let $W$ be such that $X^{\mathrm{par}}(T_{\mathrm{sim}}) \in W$, that is, $W$ is the metastable set from the end of the last decorrelation stage. Generate $R$ independent samples $x_1, \ldots, x_R$ from $\nu$, the QSD of $X(t)$ in $W$. Then proceed to the parallel stage.
4: Parallel Stage: Start $R$ parallel processes $X^1(t), \ldots, X^R(t)$ at $x_1, \ldots, x_R$, and evolve them from time $t = 0$ until time $T^* = \min_r T^r$. Then update
\[ F(f)_{\mathrm{sim}} = F(f)_{\mathrm{sim}} + \sum_{r=1}^{R} \int_0^{T^*} f(X^r(s))\,ds, \qquad T_{\mathrm{sim}} = T_{\mathrm{sim}} + R\,T^*, \tag{12} \]
set $X^{\mathrm{par}}(T_{\mathrm{sim}}) = X^M(T^*)$, and return to the decorrelation stage.
5: The algorithm is stopped when $T_{\mathrm{sim}}$ reaches a user-chosen terminal time $T_{\mathrm{end}}$. The stationary average $\pi(f)$ is estimated as $\pi(f) \approx F(f)_{\mathrm{sim}} / T_{\mathrm{sim}}$.

To see that each parallel stage has a consistent contribution to the stationary average, we make the following two observations. Suppose that $x_1, \ldots, x_R$ are iid samples from $\nu$.

1. The joint law of $(R\,T^*, X^M(T^*))$ is the same as that of $(T^1, X^1(T^1))$. That is, the joint distribution of the first exit time and the exit state in the parallel stage is independent of the number of replicas.

2. The expected value of $\sum_{r=1}^{R} \int_0^{T^*} f(X^r(s))\,ds$ in (12) is the same as that of $\int_0^{T^1} f(X^1(s))\,ds$. That is, the expected contribution to $F(f)_{\mathrm{sim}}$ from each parallel stage is independent of the number of replicas.

The first observation is a consequence of Theorem 5, and the second will be proved in Theorem 7 below. Consistency of stationary averages follows from points 1-2 above and the law of large numbers. Since there are indefinitely many parallel stages in a given $W$, consistency is ensured as long as the contribution to $F(f)_{\mathrm{sim}}$ from the parallel stage has the correct expected value. See [1] for details and discussion in a related discrete time version of the algorithm under some idealized assumptions.

The CTMC ParRep algorithm suffers from some serious drawbacks. Even if the parallel processors are synchronous, $M$ and $T^*$ may not be known at the wall clock time when the first replica leaves $W$. The reason is that the holding times for a CTMC are random, while the wall clock time for simulating each jump of the CTMC is always roughly the same. We illustrate this problem in Figure 1. In the worst possible case, in order to determine $M$ and $T^*$, we must wait for all the replicas to leave $W$. However, one can set a variable $T_{\min}$ to record the current minimum first exit time over all replicas which have left $W$, and terminate any replicas which reach time $T_{\min}$ but have not left $W$, since no replica contributes to the accumulated value past time $T_{\min}$. Since the expected first exit times $E[T^r]$, $r = 1, \ldots, R$, are roughly the same, if the variance in the number of jumps of $X^r(t)$ before time $T^r$ is small for all $r = 1, \ldots, R$, then we can expect that the parallel stage stops after only a few replicas leave $W$.

Fig. 1. The parallel stage of the CTMC ParRep algorithm with two replicas. R1 escapes from $W$ at $t = 7$ with 7 transitions, while R2 escapes at $t = 8$ but with only 4 transitions. In the parallel stage of the CTMC ParRep algorithm, R2 escapes from $W$ before R1 does in wall clock time, but $T^2 > T^1$. There is no acceleration in this case since the parallel stage does not terminate when R2 escapes.

For the same reason, there is another major drawback of CTMC ParRep.
If $f$ takes multiple values in $W$, then the computation of $\sum_{r=1}^{R} \int_0^{T^*} f(X^r(s))\,ds$ in (12) requires storing the entire history of each replica in that parallel stage. Hence, the implementation of CTMC ParRep might be memory demanding unless one is interested in the equilibrium average of a metastable-set invariant function $f$, i.e., unless $f(x)$ takes only one value in each metastable set $W$. In Section 4 we present another algorithm, called embedded ParRep, which addresses these drawbacks.

3.2. Error Analysis of CTMC ParRep. Here and below we will write $E^{\nu^R}$ for the expectation of $(X^1(t), \ldots, X^R(t))$ starting at $\nu^R$, where

\[ \nu^R(x_1, \ldots, x_R) = \prod_{r=1}^{R} \nu(x_r), \quad x_1, \ldots, x_R \in W. \]

We begin with a simple well known lemma.

Lemma 6. Suppose $T^1, \ldots, T^R$ are iid exponential random variables with parameter $-\lambda_1$. Then $T^* = \min_{1\le r\le R} T^r$ is exponentially distributed with parameter $-R\lambda_1$. In particular, $R\,T^*$ has the same distribution as $T^1$.

We now show that if the dephasing sampling is exact, then the expected contribution to the accumulated value $F(f)_{\mathrm{sim}}$ from the parallel step of Algorithm 1 is exact.

Theorem 7. Suppose that in the dephasing step $(x_1, \ldots, x_R) \sim \nu^R$. Then the expected contribution to $F(f)_{\mathrm{sim}}$ from the parallel stage of Algorithm 1 is independent of the number of replicas:

\[ E^{\nu^R}\left[\sum_{r=1}^{R} \int_0^{T^*} f(X^r(s))\,ds\right] = E^{\nu}\left[\int_0^{T} f(X(s))\,ds\right] = \nu(f)\,E^{\nu}[T]. \]

Proof. First we consider the case of a single replica. We condition on the exit time $T^1$ and write

\[ E^{\nu}\left[\int_0^{T^1} f(X^1(s))\,ds\right] = \int_0^\infty E^{\nu}\left[\int_0^{t} f(X^1(s))\,ds \,\Big|\, T^1 = t\right] P^{\nu}(T^1 \in dt). \]

Interchanging the two integrals on the right-hand side leads to

\[ \int_0^\infty \int_s^\infty E^{\nu}\left[f(X^1(s)) \mid T^1 = t\right] P^{\nu}(T^1 \in dt)\,ds. \]

Note that the inner integral can be written as $E^{\nu}\left[f(X^1(s))\,1_{T^1 > s}\right]$, and hence

\[ E^{\nu}\left[\int_0^{T^1} f(X^1(s))\,ds\right] = \int_0^\infty E^{\nu}\left[f(X^1(s)) \mid T^1 > s\right] P^{\nu}(T^1 > s)\,ds. \]

Owing to the definition of the QSD and the fact that $E^{\nu}[T^1] = \int_0^\infty P^{\nu}(T^1 > s)\,ds$,

\[ E^{\nu}\left[\int_0^{T^1} f(X^1(s))\,ds\right] = \nu(f)\,E^{\nu}[T^1]. \]

In the case of multiple replicas, similar steps can be used to show that

\[ E^{\nu^R}\left[\sum_{r=1}^{R} \int_0^{T^*} f(X^r(s))\,ds\right] = \sum_{r=1}^{R} \int_0^\infty E^{\nu^R}\left[f(X^r(s)) \mid T^* > s\right] P^{\nu^R}(T^* > s)\,ds. \]

Recall that $T^* > s$ if and only if $T^r > s$ for all $r = 1, \ldots, R$. Using this, the fact that $T^1, \ldots, T^R$ are independent, and the definition of the QSD, we get

\[ E^{\nu^R}\left[f(X^r(s)) \mid T^* > s\right] = E^{\nu}\left[f(X^r(s)) \mid T^r > s\right] = \nu(f). \]

Thus

\[ E^{\nu^R}\left[\sum_{r=1}^{R} \int_0^{T^*} f(X^r(s))\,ds\right] = \nu(f)\,R \int_0^\infty P^{\nu^R}(T^* > s)\,ds = \nu(f)\,R\,E^{\nu^R}[T^*]. \]

Finally, the result follows from Lemma 6.

The purpose of CTMC ParRep is to efficiently simulate very long trajectories of a metastable CTMC and estimate the equilibrium average $\pi(f)$. CTMC ParRep can produce accelerated dynamics of the CTMC on a coarse state space where each coarse set corresponds to some $W$; see the discussion following Algorithm 2 below. Our numerical experiments suggest that CTMC ParRep (and also embedded ParRep described below) are consistent for estimating the stationary distribution.
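The parallel stage of Algorithm 1 and the identity in Theorem 7 can be illustrated numerically. The sketch below is not the authors' code: the names `run_until_exit`, `integral_up_to`, and `parallel_stage` are hypothetical, the replicas are simulated sequentially rather than on parallel processors, and each replica's full path is stored until $T^*$ is known, which is exactly the memory cost discussed in Section 3.1.

```python
import numpy as np

def run_until_exit(Q, x0, W, rng):
    """Simulate one replica until it leaves W; return jump times and visited states."""
    x, t = int(x0), 0.0
    times, states = [0.0], [x]
    while x in W:
        rate = -Q[x, x]
        t += rng.exponential(1.0 / rate)          # holding time in state x
        p = Q[x].copy()
        p[x] = 0.0
        x = int(rng.choice(len(p), p=p / rate))   # next state of the embedded chain
        times.append(t)
        states.append(x)
    return times, states                          # times[-1] is the exit time T^r

def integral_up_to(times, states, f, t_stop):
    """Integrate f along a piecewise-constant path over [0, t_stop]."""
    total = 0.0
    for k in range(len(times) - 1):
        a, b = times[k], min(times[k + 1], t_stop)
        if b <= a:
            break
        total += f(states[k]) * (b - a)
    return total

def parallel_stage(Q, starts, W, f, rng):
    """Return (contribution to F(f)_sim, R * T_star, exit state X^M(T_star))."""
    paths = [run_until_exit(Q, x, W, rng) for x in starts]
    exits = [times[-1] for times, _ in paths]
    M = int(np.argmin(exits))                     # replica with the first exit time
    t_star = exits[M]
    contrib = sum(integral_up_to(tm, st, f, t_star) for tm, st in paths)
    return contrib, len(starts) * t_star, paths[M][1][-1]

# Hypothetical example: W = {0, 1}, QSD nu = (1/2, 1/2) by symmetry, lambda_1 = -0.1.
Q = np.array([[-1.0, 0.9, 0.1], [0.9, -1.0, 0.1], [0.5, 0.5, -1.0]])
rng = np.random.default_rng(0)
f = lambda x: 1.0 if x == 0 else 3.0
print(parallel_stage(Q, [0, 1, 1, 0], {0, 1}, f, rng))
```

In this example $\nu(f) = 2$ and $E^{\nu}[T] = 10$, so averaging the first return value over many stages started from iid $\nu$-samples should approach 20 for any number of replicas.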

For CTMC ParRep, we justify this claim in Theorem 8 below, which shows that, starting in some $W$ and waiting until the simulation leaves $W$, the error for a complete ParRep cycle in CTMC ParRep compared to direct (serial) simulation vanishes as $t_c$ increases. See Theorem 12 below for the analogous result on embedded ParRep. We note that the errors from each ParRep cycle produce an error in the estimation (5) of stationary averages that does not disappear as $T_{\mathrm{sim}} \to \infty$. However, we expect that the error vanishes as the thresholds $t_c = t_p \to \infty$. Study of this error is more involved and will be the focus of another work.

Recall that we have assumed convergence $\|\nu_{t_c} - \nu\|_{TV} \to 0$ as $t_c \to \infty$, for every starting point $x \in E$, where $\|\cdot\|_{TV}$ denotes the total variation norm. See for instance Theorem 3 for conditions guaranteeing this convergence.

Theorem 8. Consider CTMC ParRep starting at $x \in W$ in the decorrelation stage. Assume the dephasing stage sampling is exact, that is, $(x_1, \ldots, x_R) \sim \nu^R$. Consider the expected contribution to $F(f)_{\mathrm{sim}}$ until the first time the simulation leaves $W$ (either in the decorrelation or in the parallel stage),

\[ F(f)_{\mathrm{sim}} := E^{x}\left[\int_0^{T \wedge t_c} f(X(s))\,ds\right] + E^{x,\nu^R}\left[1_{T > t_c} \sum_{r=1}^{R} \int_0^{T^*} f(X^r(s))\,ds\right], \]

where $E^{x,\nu^R}$ denotes expectation for $(X(t), X^1(t), \ldots, X^R(t))$ with $X(t)$ starting at $x$ and the replicas $(X^1(t), \ldots, X^R(t))$ starting at the initial distribution $\nu^R$. The error compared to direct (serial) simulation satisfies the bound

\[ \left| E^{x}\left[\int_0^{T} f(X(s))\,ds\right] - F(f)_{\mathrm{sim}} \right| \le \|f\|_\infty \sup_{x\in W} E^{x}[T]\; \|\nu_{t_c} - \nu\|_{TV}. \tag{13} \]

Proof. We estimate

\[ \begin{aligned} \left| E^{x}\left[\int_0^{T} f(X(s))\,ds\right] - F(f)_{\mathrm{sim}} \right| &= \left| E^{x}\left[\int_{T \wedge t_c}^{T} f(X(s))\,ds\right] - E^{x,\nu^R}\left[1_{T > t_c} \sum_{r=1}^{R} \int_0^{T^*} f(X^r(s))\,ds\right] \right| \\ &= \left| E^{x}\left[\int_{t_c}^{T} f(X(s))\,ds \,\Big|\, T > t_c\right] - E^{x,\nu^R}\left[\sum_{r=1}^{R} \int_0^{T^*} f(X^r(s))\,ds \,\Big|\, T > t_c\right] \right| P^{x}(T > t_c) \\ &\le \left| E^{x}\left[\int_{t_c}^{T} f(X(s))\,ds \,\Big|\, T > t_c\right] - E^{\nu^R}\left[\sum_{r=1}^{R} \int_0^{T^*} f(X^r(s))\,ds\right] \right|, \end{aligned} \]

where we used the fact that $X(t)$ and the replicas $(X^1(t), \ldots, X^R(t))$ are independent. By the Markov property,

\[ E^{x}\left[\int_{t_c}^{T} f(X(s))\,ds \,\Big|\, T > t_c\right] = E^{\nu_{t_c}}\left[\int_0^{T} f(X(s))\,ds\right]. \]

By Theorem 7,

\[ E^{\nu^R}\left[\sum_{r=1}^{R} \int_0^{T^*} f(X^r(s))\,ds\right] = E^{\nu}\left[\int_0^{T} f(X(s))\,ds\right]. \]

Combining the above estimates and equalities,

\[ \begin{aligned} \left| E^{x}\left[\int_0^{T} f(X(s))\,ds\right] - F(f)_{\mathrm{sim}} \right| &\le \left| E^{\nu_{t_c}}\left[\int_0^{T} f(X(s))\,ds\right] - E^{\nu}\left[\int_0^{T} f(X(s))\,ds\right] \right| \\ &= \left| \sum_{x\in W} E^{x}\left[\int_0^{T} f(X(s))\,ds\right] \nu_{t_c}(x) - \sum_{x\in W} E^{x}\left[\int_0^{T} f(X(s))\,ds\right] \nu(x) \right| \\ &\le \|f\|_\infty \sup_{x\in W} E^{x}[T]\; \|\nu_{t_c} - \nu\|_{TV}. \end{aligned} \]

We note that $E^{x}[T]$ is uniformly bounded in $x \in W$ if, for instance, $P_W$ is irreducible and $W$ is finite and non-absorbing for $X(t)$, as in Theorem 3. This uniform boundedness guarantees that the right hand side of (13) vanishes as $t_c \to \infty$.

4. The Embedded ParRep Method.

4.1. Formulation of the Embedded ParRep Algorithm. In this section, we introduce another algorithm for accelerating the computation of $\pi(f)$. The algorithm, called embedded ParRep, circumvents the disadvantages of CTMC ParRep discussed above. As mentioned in the previous section, CTMC ParRep can be slow due to the randomness of the holding times. In the worst case, one has to wait until all replicas leave $W$ in order to determine the first exit time. To circumvent this issue we propose an algorithm based on the embedded chain in which the parallel stage terminates as soon as one of the replicas leaves $W$.

Before we describe embedded ParRep, we introduce some notation. Throughout, $X_n^1, \ldots, X_n^R$ will be independent processes with the same law as $X_n$ and with initial distributions supported in $W$. Moreover, we consider $X_n^1, \ldots, X_n^R$ as the embedded chains of $X^1(t), \ldots, X^R(t)$ defined above, and let $\Delta\tau_n^1, \ldots, \Delta\tau_n^R$ be the corresponding holding times. Recall that the first exit time of $X_n$ from $W$ is

\[ N = \inf\{n > 0 : X_n \notin W\}. \]

For $r = 1, \ldots, R$, we define the first exit time of $X_n^r$ from $W$ by

\[ N^r = \min\{n \in \mathbb{N} : X_n^r \notin W\} \]

and the smallest among them by

\[ N^* = \min\{N^r : r = 1, \ldots, R\}. \]

Note that it is possible that more than one replica leaves $W$ for the first time after $N^*$ transitions. We denote by $K$ the smallest index among these escaped replicas. That is,

\[ K = \min\{r = 1, \ldots, R : X^r_{N^*} \notin W\}. \]

It is clear from the above definition that $N^K = N^*$.
Of course $N$, $N^r$, $N^*$ and $K$ depend on $W$, but we do not make this explicit. Here and below we write $E^{\mu^R}$ for expectation of $(X_n^1, \ldots, X_n^R)$ starting at $\mu^R$, where

\[ \mu^R(x_1, \ldots, x_R) = \prod_{r=1}^{R} \mu(x_r), \quad x_1, \ldots, x_R \in W. \]

We begin by reproducing from [2] Theorems 9 and 10 below, with proofs for completeness.

Theorem 9. Suppose $(X_n^1, \ldots, X_n^R)$ has initial distribution $\mu^R$. Then $R(N^K - 1) + K$ has the same distribution as $N^1$.

Proof. By Theorem 5, $N^1$ is geometrically distributed with parameter $p = P^{\mu}(N^1 = 1) = 1 - \sigma_1$. Note that for any $n$ and $k = 1, \ldots, R$, the event $\{N^K = n, K = k\}$ is equivalent to the event

\[ \{N^1 > n, \ldots, N^{k-1} > n,\ N^k = n,\ N^{k+1} > n - 1, \ldots, N^R > n - 1\}. \]

Since $X_n^1, \ldots, X_n^R$ are iid and $N^1$ is geometrically distributed with parameter $p$,

\[ P^{\mu^R}(N^K = n, K = k) = (1 - p)^{n(k-1)}\,(1 - p)^{n-1} p\,(1 - p)^{(n-1)(R-k)} = (1 - p)^{R(n-1)+k-1}\,p. \]

That is, $R(N^K - 1) + K$ has the geometric distribution with parameter $p$.

Theorem 10. Suppose $(X_n^1, \ldots, X_n^R)$ has the initial distribution $\mu^R$. Then $X^K_{N^K}$ is independent of $R(N^K - 1) + K$, and the distribution of $(X^K_{N^K}, R(N^K - 1) + K)$ is the same as that of $(X^1_{N^1}, N^1)$.

Proof. We first prove that $X^K_{N^K}$ is independent of $K$. Since $X_n^1, \ldots, X_n^R$ are iid and $N^k$ is independent of $X^k_{N^k}$ for each $k$, $X^k_{N^k}$ is independent of $N^1, \ldots, N^R$. Note that $K \in \sigma(N^1, \ldots, N^R)$; hence $X^k_{N^k}$ is independent of $K$ for each $k$. Now observe that for any $A \subset E$,

\[ P^{\mu^R}(X^K_{N^K} \in A) = \sum_{r=1}^{R} P^{\mu^R}(X^r_{N^r} \in A, K = r) = \sum_{r=1}^{R} P^{\mu^R}(X^1_{N^1} \in A)\,P^{\mu^R}(K = r) = P^{\mu^R}(X^1_{N^1} \in A), \]

that is, $X^K_{N^K}$ and $X^1_{N^1}$ are equally distributed. This implies that $X^K_{N^K}$ is independent of $K$. To see the independence between $X^K_{N^K}$ and $R(N^K - 1) + K$, note that

\[ \begin{aligned} P^{\mu^R}(X^K_{N^K} \in A, N^K = n, K = r) &= P^{\mu^R}(X^r_{N^r} \in A, N^r = n, K = r) \\ &= P^{\mu^R}(X^r_{N^r} \in A, K = r \mid N^r = n)\,P^{\mu^R}(N^r = n) \\ &= P^{\mu^R}(X^r_{N^r} \in A \mid N^r = n)\,P^{\mu^R}(N^r = n, K = r) \\ &= P^{\mu^R}(X^r_{N^r} \in A)\,P^{\mu^R}(N^r = n, K = r) \\ &= P^{\mu^R}(X^K_{N^K} \in A)\,P^{\mu^R}(N^K = n, K = r) \end{aligned} \]

for any measurable $A \subset E$, $n \in \mathbb{Z}^+$ and $r = 1, \ldots, R$. Finally, Theorem 9 and the above analysis imply that $(X^K_{N^K}, R(N^K - 1) + K)$ and $(X^1_{N^1}, N^1)$ are equally distributed.

Now we present the embedded ParRep algorithm in Algorithm 2.
In this algorithm we will need user-chosen parameters $n_c$ associated with each metastable set $W$. Roughly, these parameters correspond to the time for $X_n$ to converge to the QSD in $W$. The DTMC $X_n$ and holding times $\tau_n$ are simulated by the stochastic simulation algorithm (SSA), see, for instance, [13], just as in the CTMC ParRep. If $X_n^{par}$ remains in $W$ for a sufficiently long time (i.e., $n_c$ time steps), it is distributed nearly according to the QSD $\mu$ of $X_n$ in $W$; see Theorem 3. This means that at the end of the decorrelation stage, $X_n^{par}$ can be considered a sample of $\mu$.
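The distributional identity of Theorem 9 can be sanity-checked by simulation before it is used in the algorithm. The sketch below is illustrative only: it assumes each replica's exit time is exactly geometric (the idealized situation of replicas started from the QSD), and the exit probability `p`, the function names and the sample sizes are hypothetical choices, not taken from the paper. It draws $R$ independent exit times, forms $R(N^* - 1) + K$, and estimates its mean, which should be close to $E[N^1] = 1/p$ for every $R$.

```python
import random


def geometric_exit_time(p, rng):
    """Exit time N >= 1 of one replica when each step exits with probability p."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n


def parallel_stage_clock(p, num_replicas, rng):
    """Return R*(N* - 1) + K for R independent replicas with geometric exit times."""
    exit_times = [geometric_exit_time(p, rng) for _ in range(num_replicas)]
    n_star = min(exit_times)            # N*: first exit time over all replicas
    k = exit_times.index(n_star) + 1    # K: smallest (1-based) replica index attaining N*
    return num_replicas * (n_star - 1) + k


def mean_clock(p, num_replicas, num_samples, seed=0):
    """Monte Carlo estimate of E[R*(N* - 1) + K]."""
    rng = random.Random(seed)
    total = sum(parallel_stage_clock(p, num_replicas, rng) for _ in range(num_samples))
    return total / num_samples


if __name__ == "__main__":
    p = 0.05  # illustrative exit probability (an assumption, not a value from the paper)
    for num_replicas in (1, 4, 16):
        # Each estimate should be close to 1/p, whatever the number of replicas.
        print(num_replicas, mean_clock(p, num_replicas, 20_000))
```

Only the mean is compared here; a fuller check would compare the whole empirical distribution of $R(N^* - 1) + K$ against the geometric law.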

Algorithm 2 Embedded ParRep

1: Set a decorrelation threshold $n_c$ for each metastable set $W$. Initialize the simulation time clock $N_{sim} = 0$ and the accumulated value $F(f)_{sim} = 0$. We will write $X_n^{par}$ and $\tau_n^{par}$ for a DTMC and holding time process following the law of the embedded chain and holding times of $X(t)$, respectively. A complete ParRep cycle consists of three stages.

2: Decorrelation Stage: Starting at $n = N_{sim}$, evolve $X_n^{par}$ and $\tau_n^{par}$ until $X_n^{par}$ spends $n_c$ consecutive time steps inside of the same metastable set $W$. That is, evolve $X_n^{par}$ and $\tau_n^{par}$ from time $n = N_{sim}$ until time
\[ N_{corr} = \inf\{ n \ge N_{sim} + n_c - 1 : X_m^{par} \in W \text{ for } m \in \{n - n_c + 1, \ldots, n\} \text{ for some } W \}. \]
Then update
\[ F(f)_{sim} = F(f)_{sim} + \sum_{n = N_{sim}}^{N_{corr} - 1} f(X_n^{par}) \, \tau_n^{par}, \]
set $N_{sim} = N_{corr}$, and proceed to the dephasing stage.

3: Dephasing Stage: Let $W$ be such that $X^{par}_{N_{sim}} \in W$, that is, $W$ is the metastable set from the end of the decorrelation stage. Generate $R$ independent samples $x^1, \ldots, x^R$ from $\mu$, the QSD of $X_n$ in $W$. Then proceed to the parallel stage.

4: Parallel Stage: Start $R$ parallel processes $X_n^1, \ldots, X_n^R$ at $x^1, \ldots, x^R$, and evolve them and the corresponding holding times $\tau_n^1, \ldots, \tau_n^R$ from time $n = 0$ until time $N^*$. Then update
\[ F(f)_{sim} = F(f)_{sim} + \sum_{r=1}^{R} \sum_{k=0}^{N^* - 2} f(X_k^r) \, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^* - 1}) \, \tau^r_{N^* - 1}, \qquad N_{sim} = N_{sim} + R(N^* - 1) + K, \tag{14} \]
set $X^{par}_{N_{sim}} = X^K_{N^*}$, and return to the decorrelation stage.

5: The algorithm is stopped when $N_{sim}$ reaches some user-chosen time $N_{end}$. The stationary average $\pi(f)$ is estimated as $\pi(f) \approx F(f)_{sim} / F(1)_{sim}$.

The aim of the dephasing stage is to prepare a sequence of iid initial states with distribution $\mu$. As with the CTMC ParRep, rejection sampling can be used for the embedded ParRep as well. However, a more natural and efficient option for the embedded ParRep is a Fleming-Viot based sampling procedure [3, 11]. The procedure can be summarized as follows.
The $R$ replicas $X_n^1, \ldots, X_n^R$, starting in $W$, evolve until one or more of them leaves $W$. Then each replica that left $W$ is restarted from the current state of another replica that is currently in $W$ (usually chosen uniformly at random). The procedure stops after the replicas have evolved for $n = n_p$ time steps, where $n_p$ is a parameter similar to $n_c$. (If all the replicas leave $W$ at the same time, the procedure restarts from the beginning.) With this type of sampling, the number of time steps simulated for each replica in the dephasing step is the same. In particular, if the $R$ parallel processors are synchronous (i.e., if each processor takes the same wall clock time to simulate one time step), then each processor finishes the dephasing step at the same wall clock time.

Fig. 2. The diagram for one parallel stage of the embedded ParRep algorithm with $R$ replicas. Each blue dot represents an exit event along the time line. Both replica 2 and 3 leave $W$ after $N^* = 6$ transitions (the blue dot with the red x), in which case $K = 2$.

The acceleration of the embedded ParRep comes from the parallel stage. Roughly, we only have to wait $N^*$ time steps instead of $N$ to observe an exit from $W$. The theoretical wall clock time speedup can be approximately a factor of $R$; see Theorem 9. As with the CTMC ParRep, the parallel step does not require metastability for this time speedup, but if $W$ is not metastable, then the dephasing step will not be efficient. See the remarks below Algorithm 1.

Similar to the CTMC ParRep, each parallel stage of the embedded ParRep has a consistent averaged contribution to the stationary average. Suppose that $x^1, \ldots, x^R$ are iid samples from $\mu$.

1. The joint law of $(X^K_{N^K}, R(N^K - 1) + K)$ is the same as that of $(X^1_{N^1}, N^1)$. That is, the joint distribution of the first exit time and the exit state for each parallel stage is independent of the number of replicas.

2. The expected value of
\[ \sum_{r=1}^{R} \sum_{k=0}^{N^* - 2} f(X_k^r) \, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^* - 1}) \, \tau^r_{N^* - 1} \]
is the same as that of
\[ \sum_{n=0}^{N^1 - 1} f(X_n^1) \, \tau_n^1. \]
Hence the expected contribution to $F(f)_{sim}$ from each parallel stage is independent of the number of replicas. See Theorem 11 below.

See Theorems 10 and 11 for proofs of these statements.

Remark 1 (Parallel implementation and efficiency). We expect that the embedded ParRep is superior to the CTMC ParRep for the following two reasons. First, consider the parallel stages of both algorithms.
In the CTMC ParRep, observing the first exit event in the parallel stage is not sufficient to determine $T^*$. But in the embedded ParRep, once any replica leaves $W$, we know $N^*$. Thus the embedded ParRep parallel step terminates once any of the replicas leaves $W$. For this reason we expect the parallel stage of the embedded ParRep to be significantly faster than that of the CTMC ParRep. Second, consider the dephasing stage. For the embedded ParRep, Fleming-Viot sampling is a natural technique because if the processors are synchronous then they all finish the dephasing stage at the same wall clock time, and only the current states of each processor are needed at each time step to decide where to restart replicas which left $W$. For asynchronous processors, one can simply implement a polling time. This is not true, however, for Fleming-Viot sampling with the CTMC ParRep. Indeed, to implement Fleming-Viot sampling with the CTMC ParRep, one would have to store the histories of every replica, and the replicas would finish at potentially very different wall clock times. The rejection method can be slow for both algorithms, particularly when the metastability is weak or when the number of replicas is large.

Error analysis of the embedded ParRep. Now we are able to show that if the dephasing sampling is exact, then the expected contribution to $F(f)_{sim}$ from the parallel stage is exact.

Theorem 11. Suppose in the dephasing step $(x^1, \ldots, x^R) \sim \mu^R$. Then the expected contribution to $F(f)_{sim}$ from the parallel stage of Algorithm 2 is the same for every number of replicas:
\[ E^{\mu^R}\Big[ \sum_{r=1}^{R} \sum_{k=0}^{N^* - 2} f(X_k^r) \, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^* - 1}) \, \tau^r_{N^* - 1} \Big] = E^{\mu}\Big[ \sum_{n=0}^{N-1} f(X_n) \, \tau_n \Big] = \mu(f q^{-1}) \, E^{\mu}[N]. \tag{15} \]

Proof. We first rewrite
\[ \sum_{r=1}^{R} \sum_{k=0}^{N^* - 2} f(X_k^r) \, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^* - 1}) \, \tau^r_{N^* - 1} = \sum_{r=1}^{R} \sum_{i=0}^{N^* - 1} f(X_i^r) \, \tau_i^r - \sum_{r=K+1}^{R} f(X^r_{N^* - 1}) \, \tau^r_{N^* - 1}. \]
For the first part, we condition on $N^*$ and obtain
\[ E^{\mu^R}\Big[ \sum_{r=1}^{R} \sum_{i=0}^{N^* - 1} f(X_i^r) \, \tau_i^r \Big] = \sum_{n=1}^{\infty} \sum_{r=1}^{R} \sum_{i=0}^{n-1} E^{\mu^R}\big[ f(X_i^r) \, \tau_i^r \, \mathbf{1}_{N^* = n} \big]. \]
Interchanging the iterated summations leads to
\[ \sum_{n=1}^{\infty} \sum_{r=1}^{R} \sum_{i=0}^{n-1} E^{\mu^R}\big[ f(X_i^r) \, \tau_i^r \, \mathbf{1}_{N^* = n} \big] = \sum_{r=1}^{R} \sum_{i=0}^{\infty} E^{\mu^R}\big[ f(X_i^r) \, \mathbf{1}_{N^* > i} \, \tau_i^r \big]. \]
Notice that $N^* > i$ is equivalent to $N^1 > i, \ldots, N^R > i$, and $\tau_i^r$ is independent of $N^s$ for $s \ne r$. Thus
\begin{align*}
\sum_{r=1}^{R} \sum_{i=0}^{\infty} E^{\mu^R}\big[ f(X_i^r) \, \mathbf{1}_{N^* > i} \, \tau_i^r \big]
&= \sum_{r=1}^{R} \sum_{i=0}^{\infty} E^{\mu^R}\big[ f(X_i^r) \, \tau_i^r \mid N^* > i \big] \, P^{\mu^R}(N^* > i) \\
&= \sum_{r=1}^{R} \sum_{i=0}^{\infty} E^{\mu^R}\big[ f(X_i^r) \, \tau_i^r \mid N^r > i \big] \, P^{\mu^R}(N^* > i).
\end{align*}

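Now by Lemma 1 and the definition of the QSD,
\begin{align*}
E^{\mu}\big[ f(X_i^r) \, \tau_i^r \mid N^r > i \big]
&= E^{\mu}\Big[ E^{\mu}\big[ f(X_i^r) \, \tau_i^r \mid \{X_n^r\}_{n=0,1,\ldots} \big] \,\Big|\, N^r > i \Big] \\
&= E^{\mu}\Big[ f(X_i^r) \, E^{\mu}\big[ \tau_i^r \mid \{X_n^r\}_{n=0,1,\ldots} \big] \,\Big|\, N^r > i \Big] \\
&= E^{\mu}\big[ f(X_i^r) \, q(X_i^r)^{-1} \mid N^r > i \big] = \mu(f q^{-1}).
\end{align*}
Combining the last four equations gives
\[ E^{\mu^R}\Big[ \sum_{r=1}^{R} \sum_{i=0}^{N^* - 1} f(X_i^r) \, \tau_i^r \Big] = \mu(f q^{-1}) \, R \, E^{\mu^R}[N^*]. \tag{16} \]
A similar argument can be applied to the second sum in the decomposition at the start of the proof. First we condition on $N^*$ and $K$ simultaneously, so that
\[ E^{\mu^R}\Big[ \sum_{r=K+1}^{R} f(X^r_{N^* - 1}) \, \tau^r_{N^* - 1} \Big] = \sum_{n=1}^{\infty} \sum_{k=1}^{R} \sum_{r=k+1}^{R} E^{\mu^R}\big[ f(X^r_{n-1}) \, \tau^r_{n-1} \mid N^* = n, K = k \big] \, P^{\mu^R}(N^* = n, K = k). \]
Interchanging the second and third summations, the right-hand side equals
\[ \sum_{n=1}^{\infty} \sum_{r=2}^{R} \sum_{k=1}^{r-1} E^{\mu^R}\big[ f(X^r_{n-1}) \, \tau^r_{n-1} \mid N^* = n, K = k \big] \, P^{\mu^R}(N^* = n, K = k). \]
Recall that
\[ \{N^* = n, K = k\} \iff \{N^1 > n, \ldots, N^{k-1} > n, \; N^k = n, \; N^{k+1} > n-1, \ldots, N^R > n-1\}. \]
Thus, using independence of $X_n^1, \ldots, X_n^R$ and the definition of the QSD,
\begin{align*}
&\sum_{n=1}^{\infty} \sum_{r=2}^{R} \sum_{k=1}^{r-1} E^{\mu^R}\big[ f(X^r_{n-1}) \, \tau^r_{n-1} \mid N^* = n, K = k \big] \, P^{\mu^R}(N^* = n, K = k) \\
&\quad = \sum_{n=1}^{\infty} \sum_{r=2}^{R} \sum_{k=1}^{r-1} E^{\mu}\big[ f(X^r_{n-1}) \, \tau^r_{n-1} \mid N^r > n-1 \big] \, P^{\mu^R}(N^* = n, K = k) \\
&\quad = \mu(f q^{-1}) \sum_{n=1}^{\infty} \sum_{r=2}^{R} \sum_{k=1}^{r-1} P^{\mu^R}(N^* = n, K = k) \\
&\quad = \mu(f q^{-1}) \, (R - E^{\mu^R}[K]).
\end{align*}
Combining the last three equations leads to
\[ E^{\mu^R}\Big[ \sum_{r=K+1}^{R} f(X^r_{N^* - 1}) \, \tau^r_{N^* - 1} \Big] = \mu(f q^{-1}) \, (R - E^{\mu^R}[K]). \tag{17} \]
Subtracting (17) from (16), we have
\[ E^{\mu^R}\Big[ \sum_{r=1}^{R} \sum_{i=0}^{N^* - 1} f(X_i^r) \, \tau_i^r - \sum_{r=K+1}^{R} f(X^r_{N^* - 1}) \, \tau^r_{N^* - 1} \Big] = \mu(f q^{-1}) \, E^{\mu^R}[R(N^* - 1) + K]. \]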

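Now the result follows since $\mu(f q^{-1}) \, E^{\mu^R}[R(N^* - 1) + K] = \mu(f q^{-1}) \, E^{\mu}[N]$ by Theorem 10. In particular, when $R = 1$ we have $N^* = N$ and $K = 1$, and thus
\[ E^{\mu}\Big[ \sum_{n=0}^{N-1} f(X_n) \, \tau_n \Big] = \mu(f q^{-1}) \, E^{\mu}[N]. \]

We now prove an analog of Theorem 8 for the embedded ParRep. Recall we have assumed that $\|\mu_{n_c} - \mu\|_{TV} \to 0$ as $n_c \to \infty$, for every starting point $x \in E$. See for instance Theorem 3 for conditions guaranteeing this convergence.

Theorem 12. Consider the embedded ParRep starting at $x \in W$ in the decorrelation stage. Assume the dephasing stage sampling is exact, that is, $(x^1, \ldots, x^R) \sim \mu^R$. Consider the expected contribution to $F(f)_{sim}$ up until the first time the simulation leaves $W$ (either in the decorrelation stage or in the parallel stage):
\[ F(f)_{sim} := E^{x}\Big[ \sum_{n=0}^{(n_c \wedge N) - 1} f(X_n) \, \tau_n \Big] + E^{x, \mu^R}\Big[ \mathbf{1}_{N > n_c} \sum_{r=1}^{R} \sum_{k=0}^{N^* - 2} f(X_k^r) \, \tau_k^r + \mathbf{1}_{N > n_c} \sum_{r=1}^{K} f(X^r_{N^* - 1}) \, \tau^r_{N^* - 1} \Big], \]
where $E^{x, \mu^R}$ denotes expectation for $(X_n, X_n^1, \ldots, X_n^R)$ with $X_n$ starting at $x$ and the replicas $(X_n^1, \ldots, X_n^R)$ starting at the initial distribution $\mu^R$. The error compared to a direct (serial) simulation satisfies the bound
\[ \Big| E^{x}\Big[ \sum_{n=0}^{N-1} f(X_n) \, \tau_n \Big] - F(f)_{sim} \Big| \le \|f\|_{\infty} \, \sup_{x \in W} E^{x}[T] \, \|\mu_{n_c} - \mu\|_{TV}. \tag{18} \]

Proof. The proof is similar to that for the CTMC ParRep:
\begin{align*}
\Big| E^{x}\Big[ \sum_{n=0}^{N-1} f(X_n) \, \tau_n \Big] - F(f)_{sim} \Big|
&= \Big| E^{x}\Big[ \sum_{n = n_c \wedge N}^{N-1} f(X_n) \, \tau_n \Big] - E^{x, \mu^R}\Big[ \mathbf{1}_{N > n_c} \sum_{r=1}^{R} \sum_{k=0}^{N^* - 2} f(X_k^r) \, \tau_k^r + \mathbf{1}_{N > n_c} \sum_{r=1}^{K} f(X^r_{N^* - 1}) \, \tau^r_{N^* - 1} \Big] \Big| \\
&\le \Big| E^{x}\Big[ \sum_{n = n_c}^{N-1} f(X_n) \, \tau_n \,\Big|\, N > n_c \Big] - E^{\mu^R}\Big[ \sum_{r=1}^{R} \sum_{k=0}^{N^* - 2} f(X_k^r) \, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^* - 1}) \, \tau^r_{N^* - 1} \Big] \Big|.
\end{align*}
By the Markov property,
\[ E^{x}\Big[ \sum_{n = n_c}^{N-1} f(X_n) \, \tau_n \,\Big|\, N > n_c \Big] = E^{\mu_{n_c}}\Big[ \sum_{n=0}^{N-1} f(X_n) \, \tau_n \Big]. \]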

Owing to Theorem 11,
\[ E^{\mu^R}\Big[ \sum_{r=1}^{R} \sum_{k=0}^{N^* - 2} f(X_k^r) \, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^* - 1}) \, \tau^r_{N^* - 1} \Big] = E^{\mu}\Big[ \sum_{n=0}^{N-1} f(X_n) \, \tau_n \Big]. \]
Therefore
\begin{align*}
\Big| E^{x}\Big[ \sum_{n=0}^{N-1} f(X_n) \, \tau_n \Big] - F(f)_{sim} \Big|
&\le \Big| E^{\mu_{n_c}}\Big[ \sum_{n=0}^{N-1} f(X_n) \, \tau_n \Big] - E^{\mu}\Big[ \sum_{n=0}^{N-1} f(X_n) \, \tau_n \Big] \Big| \\
&= \Big| \sum_{x \in W} E^{x}\Big[ \sum_{n=0}^{N-1} f(X_n) \, \tau_n \Big] \mu_{n_c}(x) - \sum_{x \in W} E^{x}\Big[ \sum_{n=0}^{N-1} f(X_n) \, \tau_n \Big] \mu(x) \Big| \\
&\le \|f\|_{\infty} \, \sup_{x \in W} E^{x}[T] \, \|\mu_{n_c} - \mu\|_{TV},
\end{align*}
with the last inequality coming from the fact that $E^{x}\big[ \sum_{n=0}^{N-1} \tau_n \big] = E^{x}[T]$.

5. Numerical Experiments. We present two numerical examples from stochastic reaction networks in order to demonstrate the consistency and efficiency of the ParRep algorithms.

Reaction networks with linear propensity. We consider the following stochastic reaction network
\[ \emptyset \to A \rightleftharpoons B \to C \to \emptyset \tag{19} \]
taken from [7], where $A$, $B$ and $C$ represent reacting species. The time evolution of the population (the number of molecules of each species) in the reaction network is commonly modeled as a CTMC $X(t) = (X_1(t), X_2(t), X_3(t))$ with state space $E \subset \mathbb{Z}_+^3$. The jump rate of each reaction is governed by the propensity function (intensity) $\lambda_j(x)$, $j = 1, \ldots, 5$, such that for all $t > 0$,
\[ \lambda_j(x) = \lim_{h \to 0} \frac{P(X(t+h) = x + \eta_j \mid X(t) = x)}{h}, \]
where $\eta_j$ is the state change vector associated with the $j$th reaction. We list the reactions and their corresponding propensity functions and state change vectors in Table 1.

Table 1. Reactions, propensity functions and state change vectors.

Reaction              Propensity function         State change vector
$\emptyset \to A$     $\lambda_1(x) = c_1$        $\eta_1 = (1, 0, 0)$
$A \to B$             $\lambda_2(x) = c_2 x_1$    $\eta_2 = (-1, 1, 0)$
$B \to A$             $\lambda_3(x) = c_3 x_2$    $\eta_3 = (1, -1, 0)$
$B \to C$             $\lambda_4(x) = c_4 x_2$    $\eta_4 = (0, -1, 1)$
$C \to \emptyset$     $\lambda_5(x) = c_5 x_3$    $\eta_5 = (0, 0, -1)$
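The propensities and state change vectors of Table 1 translate directly into an SSA step, which produces both the embedded chain and the holding times used throughout. The following is a minimal serial sketch, not the authors' implementation: the function names are hypothetical, the rate constants passed in are up to the caller, and it assumes $c_1 > 0$ so that the total rate is never zero. The helper `time_average` implements the plain trajectory estimate $\sum_n f(X_n)\tau_n / \sum_n \tau_n$ that ParRep is meant to accelerate.

```python
import random

# State change vectors eta_1..eta_5 from Table 1; a state is x = (x1, x2, x3).
STATE_CHANGES = [(1, 0, 0), (-1, 1, 0), (1, -1, 0), (0, -1, 1), (0, 0, -1)]


def propensities(x, c):
    """Propensities lambda_1(x)..lambda_5(x) from Table 1 for rate constants c."""
    x1, x2, x3 = x
    return [c[0], c[1] * x1, c[2] * x2, c[3] * x2, c[4] * x3]


def ssa_step(x, c, rng):
    """One SSA step from state x: return (next state, holding time spent in x)."""
    lam = propensities(x, c)
    total = sum(lam)                                   # q(x), assumed positive (c[0] > 0)
    holding_time = rng.expovariate(total)              # exponential holding time with rate q(x)
    j = rng.choices(range(len(lam)), weights=lam)[0]   # reaction j with probability lambda_j / q
    next_state = tuple(a + b for a, b in zip(x, STATE_CHANGES[j]))
    return next_state, holding_time


def time_average(f, x0, c, num_steps, seed=0):
    """Serial estimate of pi(f): sum of f(X_n) * tau_n divided by sum of tau_n."""
    rng = random.Random(seed)
    x, weighted_sum, elapsed = x0, 0.0, 0.0
    for _ in range(num_steps):
        next_state, tau = ssa_step(x, c, rng)
        weighted_sum += f(x) * tau
        elapsed += tau
        x = next_state
    return weighted_sum / elapsed
```

For example, `time_average(lambda x: x[2], (0, 0, 0), (1.0, 1.0, 1.0, 1.0, 1.0), 100_000)` estimates the mean population of $C$ for an illustrative, non-metastable choice of rate constants; with strongly separated rates the same call needs far more steps, which is the cost the ParRep algorithms address.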

In this numerical experiment, we take the initial state $x = (5, 1, 1)$ and the rate constants $(c_1, c_2, c_3, c_4, c_5) = (.1, 1, 1, .1, .1)$. With this choice of parameters the timescale separation is about $\epsilon = 10^{-4}$ and hence the process $X(t)$ demonstrates metastability. The reactions $A \to B$ and $B \to A$ occur with a much higher probability than the other reactions, and hence we call $A \to B$ and $B \to A$ fast reactions and the other reactions slow reactions. The occurrence of slow reactions is a rare event. We define the observables $f_1(x) = x_1 + x_2$ and $f_2(x) = x_3$; the collection of sets $\{W_{m,n}\}_{m,n \in \mathbb{Z}_+}$ with
\[ W_{m,n} = \{x \in E : f_1(x) = m, \; f_2(x) = n\} \]
forms a full decomposition of the state space $E$. Note that both the total population of species $A$ and $B$ (i.e., $f_1(X(t))$) and the population of species $C$ (i.e., $f_2(X(t))$) remain constant until one of the slow reactions occurs. Hence the typical sojourn time for $X(t)$ in each $W_{m,n}$ is very long compared to the transition time between any two states that are in $W_{m,n}$. In this case, we say $X(t)$ is metastable in $W_{m,n}$. For example, with the initial population $x = (1, 1, 0)$, the states $(1, 1, 0)$, $(2, 0, 0)$ and $(0, 2, 0)$ form a metastable set, since the fast reactions $A \to B$ and $B \to A$ occur with a significantly higher probability than the slow reactions, and only the occurrence of a slow reaction allows the process to move from one metastable set to another. Since both observables $f_1$ and $f_2$ defined above are invariant in each metastable set, we call them slow observables. In general, an observable $f$ is called a slow observable if it is invariant in each metastable set $W_{m,n}$, i.e., there is a constant $C(m, n)$ such that $f(x) = C(m, n)$ for each $x \in W_{m,n}$. An observable is called a fast observable if it is not slow (e.g., $f(x) = x_1$). Two-scale problems of this kind arise in many fields other than stochastic reaction networks, such as queueing theory and population dynamics.
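Because membership in $W_{m,n}$ is decided by the pair $(f_1(x), f_2(x))$ alone, a Fleming-Viot style dephasing step for the embedded chain needs very little machinery on this network. The sketch below is a hedged illustration rather than the procedure used in the experiments: the function names are hypothetical, all replicas are started from the same state `x0` (an assumption; the text only requires that they start in $W$), and the rate constants in the usage note are illustrative. The returned states are only approximately distributed according to the QSD and are not exactly independent.

```python
import random

# State change vectors for the five reactions of the linear network; x = (x1, x2, x3).
STATE_CHANGES = [(1, 0, 0), (-1, 1, 0), (1, -1, 0), (0, -1, 1), (0, 0, -1)]


def embedded_step(x, c, rng):
    """One step of the embedded chain: reaction j is chosen with probability lambda_j(x) / q(x)."""
    x1, x2, x3 = x
    lam = [c[0], c[1] * x1, c[2] * x2, c[3] * x2, c[4] * x3]
    j = rng.choices(range(len(lam)), weights=lam)[0]
    return tuple(a + b for a, b in zip(x, STATE_CHANGES[j]))


def metastable_label(x):
    """(m, n) such that x lies in W_{m,n}: m = f_1(x) = x1 + x2 and n = f_2(x) = x3."""
    return (x[0] + x[1], x[2])


def fleming_viot_dephase(x0, c, num_replicas, n_p, rng):
    """Return num_replicas states in the metastable set of x0 after n_p Fleming-Viot steps."""
    target = metastable_label(x0)
    while True:
        replicas = [x0] * num_replicas
        for _ in range(n_p):
            moved = [embedded_step(x, c, rng) for x in replicas]
            survivors = [x for x in moved if metastable_label(x) == target]
            if not survivors:
                break  # every replica left W in the same step: restart from the beginning
            # Each replica that left W is restarted from a uniformly chosen survivor.
            replicas = [x if metastable_label(x) == target else rng.choice(survivors)
                        for x in moved]
        else:
            return replicas  # all n_p steps completed without a simultaneous exit
```

For instance, `fleming_viot_dephase((1, 1, 0), (0.1, 10.0, 10.0, 0.1, 0.1), 10, 50, random.Random(0))` returns ten states drawn from $\{(1,1,0), (2,0,0), (0,2,0)\}$. Every replica advances the same number of steps, which is the property that makes this scheme convenient on synchronous processors.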
Estimation of the distributions of two-scale processes can be computationally prohibitive due to insufficient sampling of the rare events. Therefore, it is desirable to apply the two ParRep algorithms proposed in this paper to accelerate the long time simulation and estimate the stationary distribution. We apply both the CTMC ParRep and the embedded ParRep to estimate the stationary averages of the slow observables $f_1$ and $f_2$. The stationary average of the fast observable $f_3(x) = x_1$ is also computed using the embedded ParRep. On the other hand, for the reaction network (19) under consideration, one can calculate the stationary distribution analytically since it only involves mono-molecular reactions. In fact, it can be shown that the stationary distribution is a multivariate Poisson distribution [7], that is,
\[ \pi(x_1, x_2, x_3) = \frac{\bar{\lambda}_1^{x_1} \, \bar{\lambda}_2^{x_2} \, \bar{\lambda}_3^{x_3}}{x_1! \, x_2! \, x_3!} \, e^{-(\bar{\lambda}_1 + \bar{\lambda}_2 + \bar{\lambda}_3)}, \tag{20} \]
where
\[ \bar{\lambda}_1 = \frac{c_1 (c_3 + c_4)}{c_2 c_4}, \qquad \bar{\lambda}_2 = \frac{c_1}{c_4}, \qquad \bar{\lambda}_3 = \frac{c_1}{c_5}. \]
Hence the exact stationary averages of the slow observables $f_1$ and $f_2$ are $\pi(f_1) = 2.1$ and $\pi(f_2) = 1$, and the exact stationary average of the fast observable $f_3(x) = x_1$ is 1.1. We use this exact result as a reference for our numerical simulations.
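The Poisson parameters in (20) are simple rational functions of the rate constants, so the reference values for the observables can be computed in a few lines. The sketch below is illustrative: the function names are hypothetical, and the rate constants in the usage note are placeholders rather than the values used in the experiment. The helper `balance_residuals` checks the parameters against the stationary first-moment equations of the linear network, which the Poisson means must satisfy.

```python
def poisson_means(c):
    """Stationary means of (x1, x2, x3) from (20) for rate constants c = (c1, ..., c5)."""
    c1, c2, c3, c4, c5 = c
    mean_a = c1 * (c3 + c4) / (c2 * c4)
    mean_b = c1 / c4
    mean_c = c1 / c5
    return mean_a, mean_b, mean_c


def exact_averages(c):
    """Exact stationary averages of f1 = x1 + x2, f2 = x3 and f3 = x1."""
    mean_a, mean_b, mean_c = poisson_means(c)
    return {"f1": mean_a + mean_b, "f2": mean_c, "f3": mean_a}


def balance_residuals(c):
    """Residuals of the stationary mean equations for A, B and C; all should vanish."""
    c1, c2, c3, c4, c5 = c
    a, b, cc = poisson_means(c)
    return (
        c1 - c2 * a + c3 * b,      # inflow to A, loss A -> B, gain B -> A
        c2 * a - (c3 + c4) * b,    # gain A -> B, losses B -> A and B -> C
        c4 * b - c5 * cc,          # gain B -> C, outflow of C
    )
```

Calling `exact_averages(c)` with the rate constants of a given run yields the values against which the ParRep estimates of $\pi(f_1)$, $\pi(f_2)$ and $\pi(f_3)$ can be compared.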

[Figure 3: two panels, "CTMC ParRep" and "Embedded ParRep"; vertical axis $\pi(f_1)$, horizontal axis number of replicas; each panel shows the ParRep estimate and the exact value.]

Fig. 3. The stationary average of the slow observable $f_1(x) = x_1 + x_2$ computed with the CTMC ParRep (left) and with the embedded ParRep (right). The user-specified terminal time is $T_{end} = 10^4$ in the simulation.

[Figure 4: two panels, "CTMC ParRep" and "Embedded ParRep"; vertical axis $\pi(f_2)$, horizontal axis number of replicas; each panel shows the ParRep estimate and the exact value.]

Fig. 4. The stationary average of the slow observable $f_2(x) = x_3$ computed with the CTMC ParRep (left) and with the embedded ParRep (right). The user-specified terminal time is $T_{end} = 10^4$ in the simulation.

Our simulations compare the CTMC ParRep and the embedded ParRep with the Stochastic Simulation Algorithm (SSA) [13]. In Figure 3, we show the estimation of $\pi(f_1)$ using the CTMC ParRep and the embedded ParRep with various numbers of replicas (R = 1, 2,, 1) and with SSA ($R = 1$). Similarly, Figure 4 shows the estimation of $\pi(f_2)$. Note that only the embedded ParRep is used to compute the stationary average of the fast variable $f(x) = x_1$, since the CTMC ParRep is not efficient for fast observables, as we commented at the end of Section 3.1. Currently, rejection sampling is used for dephasing, and the decorrelation and dephasing thresholds are taken to be $t_c = t_p = .1$ for the CTMC ParRep and $n_c = n_p = 15$ steps for the embedded ParRep. In Figure 5, the estimation for the fast observable and the speedup are shown. It can be seen that with 1 replicas, the speedup factor is about 4.5 for the CTMC ParRep and 5.5 for the embedded ParRep.

[Figure 5: left panel "Embedded ParRep", vertical axis $\pi(x_1)$, showing the embedded ParRep estimate and the exact value; right panel "CTMC vs. Embedded", vertical axis speedup, showing the CTMC ParRep and the embedded ParRep; horizontal axis number of replicas in both panels.]

Fig. 5. The stationary average of the fast observable $f_3(x) = x_1$ computed with the embedded ParRep (left) and the speedup comparison between the CTMC ParRep and the embedded ParRep (right). The user-specified terminal time is $T_{end} = 10^4$ in the simulation.

When the number of replicas increases, the embedded ParRep becomes much more efficient than the CTMC ParRep. However, even the embedded ParRep is far from linear speedup (with 1 replicas, about 27 times faster than SSA). This sublinear speedup comes from the fact that when the number of replicas is large, the acceleration is offset by the inefficient rejection sampling based dephasing procedure. We expect that the embedded ParRep would be more efficient if Fleming-Viot particle processes were used for dephasing.

Reaction networks with nonlinear propensity. In the second example, we focus on the following network from [24],
\[ S_1 \rightleftharpoons S_2, \qquad S_1 \rightleftharpoons S_3, \qquad 2S_2 + S_3 \rightleftharpoons 3S_4. \tag{21} \]
The propensity function and state change vector associated with each reaction are shown in Table 2. Note that by the law of mass action, the reactions $2S_2 + S_3 \rightleftharpoons 3S_4$ have nonlinear propensity functions.

Table 2. Reactions, propensity functions and state change vectors.

Reaction                  Propensity function                            State change vector
$S_1 \to S_2$             $\lambda_1(x) = c_1 x_1$                       $\eta_1 = (-1, 1, 0, 0)$
$S_2 \to S_1$             $\lambda_2(x) = c_2 x_2$                       $\eta_2 = (1, -1, 0, 0)$
$S_1 \to S_3$             $\lambda_3(x) = c_3 x_1$                       $\eta_3 = (-1, 0, 1, 0)$
$S_3 \to S_1$             $\lambda_4(x) = c_4 x_3$                       $\eta_4 = (1, 0, -1, 0)$
$2S_2 + S_3 \to 3S_4$     $\lambda_5(x) = c_5 x_2 (x_2 - 1) x_3$         $\eta_5 = (0, -2, -1, 3)$
$3S_4 \to 2S_2 + S_3$     $\lambda_6(x) = c_6 x_4 (x_4 - 1)(x_4 - 2)$    $\eta_6 = (0, 2, 1, -3)$

Throughout this example, we choose the initial state $x = (3, 3, 3, 3)$ and the reaction rate constants $(c_1, c_2, c_3, c_4, c_5, c_6) = (.1, .1, .1, .1, 2, 2)$

The parallel replica method for Markov chains

The parallel replica method for Markov chains The parallel replica method for Markov chains David Aristoff (joint work with T Lelièvre and G Simpson) Colorado State University March 2015 D Aristoff (Colorado State University) March 2015 1 / 29 Introduction

More information

Lecture 7. µ(x)f(x). When µ is a probability measure, we say µ is a stationary distribution.

Lecture 7. µ(x)f(x). When µ is a probability measure, we say µ is a stationary distribution. Lecture 7 1 Stationary measures of a Markov chain We now study the long time behavior of a Markov Chain: in particular, the existence and uniqueness of stationary measures, and the convergence of the distribution

More information

Markov processes and queueing networks

Markov processes and queueing networks Inria September 22, 2015 Outline Poisson processes Markov jump processes Some queueing networks The Poisson distribution (Siméon-Denis Poisson, 1781-1840) { } e λ λ n n! As prevalent as Gaussian distribution

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

SMSTC (2007/08) Probability.

SMSTC (2007/08) Probability. SMSTC (27/8) Probability www.smstc.ac.uk Contents 12 Markov chains in continuous time 12 1 12.1 Markov property and the Kolmogorov equations.................... 12 2 12.1.1 Finite state space.................................

More information

A mathematical framework for Exact Milestoning

A mathematical framework for Exact Milestoning A mathematical framework for Exact Milestoning David Aristoff (joint work with Juan M. Bello-Rivas and Ron Elber) Colorado State University July 2015 D. Aristoff (Colorado State University) July 2015 1

More information

Convergence of Feller Processes

Convergence of Feller Processes Chapter 15 Convergence of Feller Processes This chapter looks at the convergence of sequences of Feller processes to a iting process. Section 15.1 lays some ground work concerning weak convergence of processes

More information

INTRODUCTION TO MARKOV CHAIN MONTE CARLO

INTRODUCTION TO MARKOV CHAIN MONTE CARLO INTRODUCTION TO MARKOV CHAIN MONTE CARLO 1. Introduction: MCMC In its simplest incarnation, the Monte Carlo method is nothing more than a computerbased exploitation of the Law of Large Numbers to estimate

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov

More information

MARKOV CHAINS AND HIDDEN MARKOV MODELS

MARKOV CHAINS AND HIDDEN MARKOV MODELS MARKOV CHAINS AND HIDDEN MARKOV MODELS MERYL SEAH Abstract. This is an expository paper outlining the basics of Markov chains. We start the paper by explaining what a finite Markov chain is. Then we describe

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Statistics 150: Spring 2007

Statistics 150: Spring 2007 Statistics 150: Spring 2007 April 23, 2008 0-1 1 Limiting Probabilities If the discrete-time Markov chain with transition probabilities p ij is irreducible and positive recurrent; then the limiting probabilities

More information

Stochastic Processes

Stochastic Processes Stochastic Processes 8.445 MIT, fall 20 Mid Term Exam Solutions October 27, 20 Your Name: Alberto De Sole Exercise Max Grade Grade 5 5 2 5 5 3 5 5 4 5 5 5 5 5 6 5 5 Total 30 30 Problem :. True / False

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Markov Chains and Stochastic Sampling

Markov Chains and Stochastic Sampling Part I Markov Chains and Stochastic Sampling 1 Markov Chains and Random Walks on Graphs 1.1 Structure of Finite Markov Chains We shall only consider Markov chains with a finite, but usually very large,

More information

215 Problem 1. (a) Define the total variation distance µ ν tv for probability distributions µ, ν on a finite set S. Show that

215 Problem 1. (a) Define the total variation distance µ ν tv for probability distributions µ, ν on a finite set S. Show that 15 Problem 1. (a) Define the total variation distance µ ν tv for probability distributions µ, ν on a finite set S. Show that µ ν tv = (1/) x S µ(x) ν(x) = x S(µ(x) ν(x)) + where a + = max(a, 0). Show that

More information

Chapter 7. Markov chain background. 7.1 Finite state space

Chapter 7. Markov chain background. 7.1 Finite state space Chapter 7 Markov chain background A stochastic process is a family of random variables {X t } indexed by a varaible t which we will think of as time. Time can be discrete or continuous. We will only consider

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Stochastic optimization Markov Chain Monte Carlo

Stochastic optimization Markov Chain Monte Carlo Stochastic optimization Markov Chain Monte Carlo Ethan Fetaya Weizmann Institute of Science 1 Motivation Markov chains Stationary distribution Mixing time 2 Algorithms Metropolis-Hastings Simulated Annealing

More information

NUMERICAL ANALYSIS OF PARALLEL REPLICA DYNAMICS

NUMERICAL ANALYSIS OF PARALLEL REPLICA DYNAMICS NUMERICAL ANALYSIS OF PARALLEL REPLICA DYNAMICS GIDEON SIMPSON AND MITCHELL LUSKIN Abstract Parallel replica dynamics is a method for accelerating the computation of processes characterized by a sequence

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning March May, 2013 Schedule Update Introduction 03/13/2015 (10:15-12:15) Sala conferenze MDPs 03/18/2015 (10:15-12:15) Sala conferenze Solving MDPs 03/20/2015 (10:15-12:15) Aula Alpha

More information

STOCHASTIC PROCESSES Basic notions

STOCHASTIC PROCESSES Basic notions J. Virtamo 38.3143 Queueing Theory / Stochastic processes 1 STOCHASTIC PROCESSES Basic notions Often the systems we consider evolve in time and we are interested in their dynamic behaviour, usually involving

More information

Numerical methods in molecular dynamics and multiscale problems

Numerical methods in molecular dynamics and multiscale problems Numerical methods in molecular dynamics and multiscale problems Two examples T. Lelièvre CERMICS - Ecole des Ponts ParisTech & MicMac project-team - INRIA Horizon Maths December 2012 Introduction The aim

More information

Markov Chains, Stochastic Processes, and Matrix Decompositions

Markov Chains, Stochastic Processes, and Matrix Decompositions Markov Chains, Stochastic Processes, and Matrix Decompositions 5 May 2014 Outline 1 Markov Chains Outline 1 Markov Chains 2 Introduction Perron-Frobenius Matrix Decompositions and Markov Chains Spectral

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 218. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

Value and Policy Iteration

Value and Policy Iteration Chapter 7 Value and Policy Iteration 1 For infinite horizon problems, we need to replace our basic computational tool, the DP algorithm, which we used to compute the optimal cost and policy for finite

More information

MARKOV PROCESSES. Valerio Di Valerio

MARKOV PROCESSES. Valerio Di Valerio MARKOV PROCESSES Valerio Di Valerio Stochastic Process Definition: a stochastic process is a collection of random variables {X(t)} indexed by time t T Each X(t) X is a random variable that satisfy some

More information

1 Continuous-time chains, finite state space

1 Continuous-time chains, finite state space Université Paris Diderot 208 Markov chains Exercises 3 Continuous-time chains, finite state space Exercise Consider a continuous-time taking values in {, 2, 3}, with generator 2 2. 2 2 0. Draw the diagramm

More information

Stochastic Processes (Week 6)

Stochastic Processes (Week 6) Stochastic Processes (Week 6) October 30th, 2014 1 Discrete-time Finite Markov Chains 2 Countable Markov Chains 3 Continuous-Time Markov Chains 3.1 Poisson Process 3.2 Finite State Space 3.2.1 Kolmogrov

More information

Lecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M.

Lecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M. Lecture 10 1 Ergodic decomposition of invariant measures Let T : (Ω, F) (Ω, F) be measurable, and let M denote the space of T -invariant probability measures on (Ω, F). Then M is a convex set, although

More information

ELEMENTS OF PROBABILITY THEORY

ELEMENTS OF PROBABILITY THEORY ELEMENTS OF PROBABILITY THEORY Elements of Probability Theory A collection of subsets of a set Ω is called a σ algebra if it contains Ω and is closed under the operations of taking complements and countable

More information

T.8. Perron-Frobenius theory of positive matrices From: H.R. Thieme, Mathematics in Population Biology, Princeton University Press, Princeton 2003

T.8. Perron-Frobenius theory of positive matrices From: H.R. Thieme, Mathematics in Population Biology, Princeton University Press, Princeton 2003 T.8. Perron-Frobenius theory of positive matrices From: H.R. Thieme, Mathematics in Population Biology, Princeton University Press, Princeton 2003 A vector x R n is called positive, symbolically x > 0,

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle   holds various files of this Leiden University dissertation Cover Page The handle http://hdlhandlenet/1887/39637 holds various files of this Leiden University dissertation Author: Smit, Laurens Title: Steady-state analysis of large scale systems : the successive

More information

MATH 56A: STOCHASTIC PROCESSES CHAPTER 2
