arxiv: v2 [math.na] 20 Dec 2016
STATIONARY AVERAGING FOR MULTI-SCALE CONTINUOUS TIME MARKOV CHAINS USING PARALLEL REPLICA DYNAMICS

TING WANG, PETR PLECHÁČ, AND DAVID ARISTOFF

Abstract. We propose two algorithms for simulating continuous time Markov chains in the presence of metastability. We show that the algorithms correctly estimate, under the ergodicity assumption, stationary averages of the process. Both algorithms, based on the idea of the parallel replica method, use parallel computing in order to explore metastable sets more efficiently. The algorithms require no assumptions on the Markov chains beyond ergodicity and the presence of identifiable metastability. In particular, there is no assumption on reversibility. We present error analyses, as well as numerical simulations on multi-scale stochastic reaction network models, in order to demonstrate consistency of the method and its efficiency.

Key words. Markov chains, Monte Carlo, reversibility, stationary distribution, metastability, parallel replica, stochastic reaction networks, multi-scale dynamics, coarse graining

AMS subject classifications. 60J22, 65C05, 65Z05, 82B31, 92E20

1. Introduction. We focus on computing stationary averages of continuous time Markov chains. More precisely, if $\pi$ is the stationary distribution of a continuous time Markov chain (CTMC) and $f$ is a function on the state space, we aim at estimating the average $\pi(f) \equiv E_\pi[f]$ by taking a time average on a long trajectory of the CTMC. There are many methods for computing stationary averages of stochastic processes; however, the vast majority of them rely on reversibility of the process, e.g., as in Markov chain Monte Carlo [2]. The computational cost of ergodic (trajectory) averaging becomes prohibitive when the convergence to the stationary distribution is slow due to metastability of the dynamics, for example in the presence of rare events or large time scale disparities (multi-scale dynamics) [21].
A possible remedy for this issue is to use parallel computing in order to accelerate sampling of the state space. For instance, the parallel tempering method (also known as replica exchange) [12, 1, 9, 17] has been successfully applied to many problems by simulating multiple replicas of the original system, each replica at a different temperature. However, the method requires time reversibility of the underlying processes, which typically does not hold for processes that model chemical reaction networks or systems with non-equilibrium steady states. In fact, there are not many methods that parallelize Monte Carlo simulation for irreversible processes with metastability, in particular if long-time sampling, such as ergodic averaging, is required.

We present a parallel computing approach for CTMCs without time reversibility. One advantage of the proposed algorithms is that they may be used, in principle, on arbitrary CTMCs. However, gains in efficiency can occur only if the CTMC is metastable. In this contribution we consider only models described by continuous time Markov chains.

As a motivating example we study a multi-scale chemical reaction network model in which molecules of different types react with different rates depending on their concentrations and reaction rate constants. In this model metastability emerges due to the infrequent occurrence of reactions with small rates, which makes the relaxation to the steady state dynamics extremely slow. In the transient regime the finite time distribution can be approximated using the stochastic averaging technique [24, 14] or the tau-leap method [19]. However, the former does not apply to stationary distribution estimation, and the latter can still be computationally expensive for long-time simulations. It is thus desirable to have an efficient algorithm for computing stationary averages. The proposed algorithms provide a new multi-scale simulation method (in particular for stationary average estimation) for the stochastic reaction networks community.

Author affiliations: University of Delaware, Newark, DE (tingw@udel.edu); University of Delaware, Newark, DE (plechac@math.udel.edu); Colorado State University, Fort Collins, CO 80523 (aristoff@rams.colorado.edu).

The presented approach builds on the parallel replica (ParRep) dynamics introduced in the context of molecular simulations in [23]. The ParRep method used in the context of stochastic differential equations, e.g. Langevin dynamics, was rigorously analysed in [15, 16]. The algorithm we present and analyse builds on the recent work of [1, 2], where the ParRep process was studied for discrete-time Markov chains. In our algorithms, each time the simulation reaches a local equilibrium in a metastable set $W$, $R$ independent replicas of the CTMC are launched inside the set, allowing for parallel simulation of the dynamics at this stage. The main contribution of this work is a procedure for using the replicas in order to efficiently and consistently estimate the exit time and exit state from $W$, along with the contribution to the stationary time average of $f$ from the time spent in $W$. We emphasize that we are able to handle arbitrary functions (or observables) on the state space, not only those that are piece-wise constant, i.e., assuming a single value in each $W$. In the best case, if there are $R$ replicas, then the simulation leaves a metastable set about $R$ times faster compared to a direct serial simulation. The consistency of our algorithms relies on certain properties of quasi-stationary distributions (QSDs), which are essentially local equilibria associated with the metastable sets. We propose two algorithms for computing $\pi(f)$, called CTMC ParRep and embedded ParRep.
The former uses parallel simulation of the CTMC, while the latter employs parallel simulation of its embedded chain, which is a discrete time Markov chain (DTMC). CTMC ParRep (resp. embedded ParRep) relies on the fact that, starting at the QSD in a metastable set, the first time to leave the set is an exponential (resp. geometric) random variable and independent of the exit state; see Theorem 5 below. The algorithms require some method for identifying metastable sets, though this need not be done a priori; it is sufficient to identify when the CTMC is currently in a metastable set, and when it exits such a set. While both algorithms can be useful for efficient simulation of $\pi(f)$ in the presence of metastability, we expect that embedded ParRep can be significantly more efficient, especially when combined with a certain type of QSD sampling, called Fleming-Viot [3, 4]. Though we focus here on the computation of $\pi(f)$, we note that one of our algorithms, CTMC ParRep, can be used to compute the dynamics of the CTMC on a coarse space in which each metastable set is considered a single (meta-)state. See the discussion below Algorithm 1.

The advantages of the proposed algorithms include: (a) no requirement of time reversibility for the underlying dynamics; (b) they are suitable for long-time sampling; (c) they may be used, in principle, on arbitrary CTMCs in the presence of metastability.

In Section 2, we briefly review CTMCs before defining QSDs and detailing relevant properties thereof. In Section 3, we present CTMC ParRep, and study how the error in the algorithm depends on the quality of QSD sampling. In Section 4, we present embedded ParRep and provide an analogous error analysis. We detail some numerical experiments on a multi-scale chemical reaction network model in Section 5 in order to demonstrate the consistency and accuracy of the algorithms.
2. Background and problem formulation.

2.1. Continuous Time Markov Chains. Throughout this paper, $X(t)$ is an irreducible and positive recurrent continuous time Markov chain (CTMC) with values in a countable set $E$, and $\pi$ denotes the stationary distribution of $X(t)$. We are interested in computing stationary averages $\pi(f)$ for a bounded function $f : E \to \mathbb{R}$ by using the ergodic theorem
\[ \lim_{t\to\infty} \frac{1}{t} \int_0^t f(X(s))\,ds = \pi(f), \tag{1} \]
which holds almost surely for any initial distribution of $X(t)$. The jump times $\tau_n$ and holding times $\Delta\tau_n$ for $X(t)$ are defined recursively by
\[ \tau_0 = 0, \qquad \tau_n = \inf\{t > \tau_{n-1} : X(t) \ne X(\tau_{n-1})\}, \]
and
\[ \Delta\tau_{n-1} = \tau_n - \tau_{n-1} \quad \text{for } n \ge 1. \]
We assume that $X(t)$ is non-explosive, that is, $\lim_{n\to\infty} \tau_n = \infty$ almost surely for every initial distribution of $X(t)$. This precludes the possibility of infinitely many jumps in finite time. We denote by $X_n = X(\tau_n)$ the embedded chain of $X(t)$. It is easy to see that $X_n$ is a discrete time Markov chain (DTMC). Recall that $X(t)$ is completely determined by its infinitesimal generator matrix $Q = \{q(x,y)\}_{x,y\in E}$. We write $q(x) := -q(x,x)$; note that irreducibility implies $q(x) > 0$ for all $x \in E$. It is easy to check that $X_n$ has the transition probability matrix $P = \{p(x,y)\}_{x,y\in E}$ satisfying
\[ p(x,y) = \begin{cases} q(x,y)/q(x), & x \ne y, \\ 0, & x = y. \end{cases} \]
We state the following well known fact for later reference.

Lemma 1. For a CTMC $X(t)$ with the corresponding embedded Markov chain $X_n$, the holding times between successive jumps, $\Delta\tau_0, \Delta\tau_1, \dots, \Delta\tau_i, \dots$, are independent conditioned on the embedded chain $X_n$. Moreover, $\Delta\tau_i \mid \{X_n\}$ is exponentially distributed with rate $q(X_i)$, and hence $E[\Delta\tau_i \mid \{X_n\}] = q(X_i)^{-1}$.

For details on the above facts, see for instance [ ].

2.2. The Quasi-stationary Distribution and Metastability. Below, we write $P$, $E$ for various probabilities and expectations, the precise meaning of which will be clear from context. We use a superscript, $P^\xi$, $E^\xi$, to indicate that the initial distribution is $\xi$. When the initial distribution is $\delta_x$, we write $P^x$, $E^x$.
The symbol $\sim$ will indicate equality in probability law. $\mathrm{Re}(\cdot)$ and $|\cdot|$ denote the real part and modulus of a complex number. Our ParRep algorithms rely on certain properties of quasi-stationary distributions, which we now briefly review. Let $W \subset E$ be fixed and consider the first exit time of $X(t)$ from $W$, that is,
\[ T = \inf\{t > 0 : X(t) \notin W\}. \]
We consider also the first exit time of $X_n$ from $W$,
\[ N = \inf\{n > 0 : X_n \notin W\}. \]
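The objects introduced in Section 2.1 can be illustrated with a short simulation. The following is a minimal sketch, not the authors' code: it builds the embedded transition matrix $P$ from a generator $Q$, draws exponential holding times with rate $q(x)$ as in Lemma 1, and forms the time average on the left-hand side of (1). The function names `embedded_transition_matrix` and `ergodic_average` and the two-state generator are assumptions chosen for illustration only.

```python
import random

def embedded_transition_matrix(Q):
    """p(x, y) = q(x, y) / q(x) for x != y and p(x, x) = 0, with q(x) = -q(x, x)."""
    n = len(Q)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        qx = -Q[x][x]
        for y in range(n):
            if y != x:
                P[x][y] = Q[x][y] / qx
    return P

def ergodic_average(Q, f, x0, t_end, rng):
    """Time average (1/t_end) * integral_0^t_end f(X(s)) ds along one trajectory."""
    P = embedded_transition_matrix(Q)
    states = list(range(len(Q)))
    x, t, acc = x0, 0.0, 0.0
    while True:
        hold = rng.expovariate(-Q[x][x])  # holding time, exponential with rate q(x)
        if t + hold >= t_end:
            acc += f(x) * (t_end - t)     # truncate the last holding time at t_end
            break
        acc += f(x) * hold
        t += hold
        x = rng.choices(states, weights=P[x])[0]  # one step of the embedded chain
    return acc / t_end

# Hypothetical two-state generator; its stationary distribution is (2/3, 1/3).
Q = [[-1.0, 1.0], [2.0, -2.0]]
rng = random.Random(0)
estimate = ergodic_average(Q, lambda x: 1.0 if x == 0 else 0.0, 0, 20000.0, rng)
print(estimate)
```

For a metastable chain the same loop still converges, but the required `t_end` grows with the typical exit times from the metastable sets, which is the cost the ParRep algorithms aim to reduce.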
A quasi-stationary distribution (QSD) of $X(t)$ in $W$ (or $X_n$ in $W$) is defined as follows.

Definition 2. A probability distribution $\nu$ with support in $W$ is a quasi-stationary distribution for $X(t)$ in $W$ if for each $y \in W$ and $t > 0$,
\[ \nu(y) = P^\nu(X(t) = y \mid T > t). \tag{2} \]
Similarly, a probability distribution $\mu$ with support in $W$ is a QSD for $X_n$ in $W$ if for each $y \in W$ and $n > 0$,
\[ \mu(y) = P^\mu(X_n = y \mid N > n). \tag{3} \]

Throughout we write $\nu$ for a QSD of the CTMC $X(t)$ and $\mu$ for a QSD of the embedded chain $X_n$. The associated set $W$ will be implicit since no ambiguities should arise. We will write
\[ \nu_t(A) = P^x(X(t) \in A \mid T > t) \tag{4} \]
for the distribution of $X(t)$ conditioned on $T > t$, and
\[ \mu_n(A) = P^x(X_n \in A \mid N > n) \tag{5} \]
for the distribution of $X_n$ conditioned on $N > n$. Notice we do not make explicit the dependence on the starting point $x$.

We summarize existence, uniqueness, and convergence properties of the QSD in Theorem 3 below (see [6, 22]). In Theorem 3, for simpler presentation, we assume $W$ is finite. That allows us to characterize convergence to the QSD of $X(t)$ and $X_n$ in terms of spectral properties of their generator and transition matrices. We emphasize, however, that finiteness of $W$ is not required for consistency of the algorithms proposed in this paper. Recall that $Q$ is the infinitesimal generator matrix of $X(t)$ and $P$ is the transition probability matrix of the DTMC $X_n$. We denote by $Q_W = \{q(x,y)\}_{x,y\in W}$ and $P_W = \{p(x,y)\}_{x,y\in W}$ the restrictions of $Q$ and $P$ to $W$.

Theorem 3. Let $W$ be finite and nonabsorbing for $X(t)$, and assume $P_W$ is irreducible.
(a) The eigenvalues $\lambda_1, \lambda_2, \dots$ of $Q_W$ can be ordered so that
\[ 0 > \lambda_1 > \mathrm{Re}(\lambda_2) \ge \dots, \]
where $\lambda_1$ has the left eigenvector $\nu$, which is a probability distribution on $W$. Moreover, $\nu$ is the unique quasi-stationary distribution of $X(t)$ in $W$, and for all $x, y \in W$,
\[ |\nu_t(y) - \nu(y)| = |P^x(X(t) = y \mid T > t) - \nu(y)| \le C(x)\, e^{-(\lambda_1 - \beta)t}, \tag{6} \]
with $C(x)$ a constant depending on $x$, and $\beta$ any real number satisfying $\mathrm{Re}(\lambda_2) < \beta < \lambda_1$.
(b) Suppose $P_W$ is also aperiodic. Then the eigenvalues $\sigma_1, \sigma_2, \dots$ of $P_W$ can be ordered so that
\[ 1 > \sigma_1 > |\sigma_2| \ge \dots, \]
where $\sigma_1$ has the left eigenvector $\mu$, which is a probability distribution on $W$. Moreover, $\mu$ is the unique quasi-stationary distribution of $X_n$ in $W$, and for all $x, y \in W$,
\[ |\mu_n(y) - \mu(y)| = |P^x(X_n = y \mid N > n) - \mu(y)| \le D(x) \left(\frac{\gamma}{\sigma_1}\right)^n, \tag{7} \]
with $D(x)$ a constant depending on $x$, and $\gamma$ any real number satisfying $\gamma > |\sigma_2|$.

Proof. We first justify the expression for the eigenvalues. Observe that for $x \ne y \in W$, we have $q(x,y) > 0$ if and only if $p(x,y) > 0$. It follows that $Q_W$ is irreducible if and only if $P_W$ is irreducible; see Definition 2.1 in [22]. Now let $e$ be the all ones column vector, $e(x) = 1$ for $x \in W$. Recall that $q(x,y) \ge 0$ for every $x \ne y \in E$ and $\sum_y q(x,y) = 0$ for every $x \in E$. This implies that $Q_W e \le 0$ component-wise. Since $W$ is non-absorbing, for some $x \in W$ and $y \notin W$ we have $q(x,y) > 0$, and it follows that $\sum_{z\in W} q(x,z) < 0$. This shows that at least one component of $Q_W e$ is strictly negative. The expression for the eigenvalues, and the fact that the entries of $\nu$ have a single sign (hence $\nu$ is a probability distribution, after normalization), now follow from Theorem 2.6 of Seneta [22].

To see that $\nu$ is the QSD for $X(t)$ in $W$, we define the stopped process $X^T(t) = X(t \wedge T)$, so that $X(t)$ is absorbed outside $W$. For any $x, z \in E$, let $e_x$ be the column vector with $e_x(z) = 1$ if $x = z$ and $e_x(z) = 0$ otherwise. Finiteness of $W$ ensures that
\[ P^x(X^T(t) = y) = e_x^{\top} e^{Q_W t} e_y. \]
Thus, for each $y \in W$,
\[ P^\nu(X(t) = y,\ T > t) = P^\nu(X^T(t) = y) = \nu e^{Q_W t} e_y = e^{\lambda_1 t}\nu(y) \]
and
\[ P^\nu(T > t) = P^\nu(X^T(t) \in W) = e^{\lambda_1 t}, \]
which leads to $\nu(y) = P^\nu(X(t) = y \mid T > t)$.

Now we turn to the convergence to $\nu$. It follows from Theorem 2.7 in [22] that there is a constant $C(x)$ depending on $x$ such that for any real $\beta$ with $\mathrm{Re}(\lambda_2) < \beta < \lambda_1$,
\[ P^x(X(t) = y,\ T > t) = P^x(X^T(t) = y) = C(x) e^{\lambda_1 t}\nu(y) + O(e^{\beta t}) \tag{8} \]
and
\[ P^x(T > t) = C(x) e^{\lambda_1 t} + O(e^{\beta t}). \tag{9} \]
It follows that
\[ |\nu_t(y) - \nu(y)| = |P^x(X(t) = y \mid T > t) - \nu(y)| \le C(x)\, e^{-(\lambda_1 - \beta)t}, \]
where $C(x)$ is now a (possibly different) constant depending on $x$. The arguments in (b) are similar, using the Perron-Frobenius theorem (Seneta [22, Theorem 1.1]).

For analogous results on the QSD in more general settings, see [6, Theorem 4.5] for CTMCs and [8, Theorem 1] for DTMCs. We are now ready to define metastability.

Definition 4.
Let W and λ i, σ i be as in heorem W is metastable for X(t) if λ 1 and (1) λ 1 λ 1 Re(λ 2 ). X(t) is metastable if it has at least one metastable set W. 2. W is metastable for X n if σ 1 1 and (11) σ 1 σ 2 σ 1. X n is metastable if it has at least one metastable set W.
In light of Theorem 3, Conditions 1-2 in Definition 4 essentially say that the time to leave $W$ is large in an absolute sense, and that the time to leave $W$ is large relative to the time to converge to the QSD in $W$. Metastability of the CTMC is not necessarily equivalent to metastability of its underlying embedded chain, as we now show. Consider $X(t)$ with the infinitesimal generator Q = 1 1/2 1/2 1/2 1 1/2 ɛ/2 ɛ ɛ/2, 1 1, where $\epsilon$ is positive. Then $W = \{1, 2, 3\}$ is metastable for $X(t)$ but not for $X_n$, since
\[ \sigma_1 \approx 0.81, \quad |\sigma_2| \approx 1/2, \quad \lambda_1 \approx -\epsilon/2, \quad \mathrm{Re}(\lambda_2) \approx -1/2. \]
Now consider $X(t)$ with the infinitesimal generator ɛ 1 ɛ 1 /2 ɛ 1 /2 Q = ɛ 1 1 ɛ 1 1 ɛ 1 1 ɛ. Then $W = \{1, 2, 3\}$ is metastable for $X_n$ but not for $X(t)$, since
\[ \sigma_1 \approx 1 - \epsilon/5, \quad |\sigma_2| \approx 1/2, \quad \lambda_1 \approx -1/5, \quad \mathrm{Re}(\lambda_2) \approx -3\epsilon^{-1}/2. \]

Algorithm 1 below requires a collection of metastable sets for $X(t)$, and Algorithm 2 requires a collection of metastable sets for $X_n$. The only assumption we make on these sets is that they are pairwise disjoint. (The sets may be different for the two algorithms, as noted above.) Throughout we write $W$ to denote a generic metastable set. We emphasize that we do not assume the metastable sets form a partition of $E$: the union of the metastable sets may be a proper subset of $E$. Here and below, we assume that each $W$ has a unique QSD and that $\nu_t$ (and $\mu_n$) converge to the QSD in total variation norm, for any starting point $x$.

We conclude this section by mentioning properties of the QSD which are essential for the consistency of our algorithms in Sections 3 and 4 below.

Theorem 5.
1. Suppose $X(0) \sim \nu$. Then $T$ is exponentially distributed with parameter $-\lambda_1$: $P^\nu(T > t) = e^{\lambda_1 t}$, $t > 0$, and $T$ and $X(T)$ are independent.
2. Suppose $X_0 \sim \mu$. Then $N$ is geometrically distributed with parameter $1 - \sigma_1$: $P^\mu(N > n) = \sigma_1^n$, $n = 1, 2, \dots$, and $N$ and $X_N$ are independent.

Proof. The first part of 1 and 2 was shown in Theorem 3. For the rest of the proof see [6].

3. The CTMC ParRep Method.
3.1. Formulation of the CTMC Algorithm. In this section, we introduce a method for accelerating the computation of $\pi(f)$, where we recall that $f : E \to \mathbb{R}$ is any bounded function and $\pi$ is the stationary distribution. We call this algorithm CTMC ParRep, for reasons that will be outlined below. Before we describe CTMC ParRep, we introduce some notation. Throughout, $X^1(t), \dots, X^R(t)$ will be independent processes with the same law as $X(t)$ and with initial distributions supported in $W$. Recall that the first exit time of $X(t)$ from $W$ is
\[ T = \inf\{t > 0 : X(t) \notin W\}. \]
Similarly, for $r = 1, \dots, R$, we define the first exit time of $X^r(t)$ from $W$ by
\[ T^r = \inf\{t > 0 : X^r(t) \notin W\} \]
and the smallest one among them by
\[ T^* = \min_r T^r. \]
We denote the index of the replica with the first exit time by $M$, i.e.,
\[ M = \arg\min_r T^r. \]
The quantities $T^r$, $T^*$, and $M$ depend on $W$, but we do not make this explicit.

We are now in a position to present CTMC ParRep in Algorithm 1. In this algorithm, we will need user-chosen parameters $t_c$ associated with each metastable set $W$. Roughly speaking, these parameters correspond to the time for $X(t)$ to converge to the QSD in $W$. The accumulated value $F(f)_{sim}$ serves as a quantity that approximates the integral $\int_0^{T_{end}} f(X(s))\,ds$ when the algorithm terminates. If $X^{par}(t)$ remains in $W$ for a sufficiently long time (i.e., the decorrelation threshold $t_c$), it is distributed nearly according to the QSD $\nu$ of $X(t)$ in $W$ by Theorem 3. This means that at the end of the decorrelation stage, $X^{par}(T_{sim})$ can be considered a sample of $\nu$.

The aim of the dephasing stage is to prepare a sequence of independent initial states with distribution $\nu$. There are several ways of achieving this. Perhaps the simplest is the rejection method. In this procedure, each of the $R$ replicas evolves independently. A parameter $t_p$ similar to the decorrelation threshold $t_c$ is selected. If a replica leaves $W$ before spending a time interval of length $t_p$ in $W$, it restarts in $W$ from the original initial state.
Once all the replicas have remained in $W$ for time $t_p$, we stop and take $x_1, \dots, x_R$ as the final states of the replicas in the dephasing stage and use them for the subsequent parallel stage. Besides rejection sampling, another method is a Fleming-Viot based particle sampler; see the discussion after Algorithm 2 below.

The acceleration of CTMC ParRep comes from the parallel stage. Recall that if $x_1, \dots, x_R$ are independent, identically distributed (iid) with the common distribution $\nu$, then $T^1, \dots, T^R$ are independent exponential random variables with common parameter $-\lambda_1$. Using $T^* = \min_r T^r$, it is then easy to check that $RT^*$ has the same distribution as $T^1$. See Lemma 6 below. This means one only needs to wait for $T^*$ instead of $T^1$ to observe an exit from $W$. Note that this is true whether or not $W$ is metastable, so efficiency of the parallel stage does not require metastability. However, the dephasing stage is not efficient if $W$ is not metastable. That is because, in practice, the samples $x_1, \dots, x_R$ are obtained by
simulating trajectories which remain in $W$ for a sufficiently long time $t_p$. Such samples are hard to obtain when the typical time $t_p$ for $x_1, \dots, x_R$ to reach the QSD in $W$ is not much smaller than the typical time to leave $W$.

Algorithm 1 CTMC ParRep
1: Set a decorrelation threshold $t_c$ for each metastable set $W$. Initialize the simulation time clock $T_{sim} = 0$ and the accumulated value $F(f)_{sim} = 0$. We will write $X^{par}(t)$ for a simulation process that obeys the law of $X(t)$. A complete ParRep cycle consists of three stages.
2: Decorrelation Stage: Starting at $t = T_{sim}$, evolve $X^{par}(t)$ until it spends an interval of time length $t_c$ inside the same metastable set $W$. That is, evolve $X^{par}(t)$ from time $t = T_{sim}$ until time
\[ T_{corr} = \inf\{t \ge T_{sim} + t_c : X^{par}(s) \in W \text{ for all } s \in [t - t_c, t] \text{ for some } W\}. \]
Then update
\[ F(f)_{sim} = F(f)_{sim} + \int_{T_{sim}}^{T_{corr}} f(X^{par}(t))\,dt, \]
set $T_{sim} = T_{corr}$, and proceed to the dephasing stage.
3: Dephasing Stage: Let $W$ be such that $X^{par}(T_{sim}) \in W$, that is, $W$ is the metastable set from the end of the last decorrelation stage. Generate $R$ independent samples $x_1, \dots, x_R$ from $\nu$, the QSD of $X(t)$ in $W$. Then proceed to the parallel stage.
4: Parallel Stage: Start $R$ parallel processes $X^1(t), \dots, X^R(t)$ at $x_1, \dots, x_R$, and evolve them from time $t = 0$ until time $T^*$. Then update
\[ F(f)_{sim} = F(f)_{sim} + \sum_{r=1}^R \int_0^{T^*} f(X^r(s))\,ds, \qquad T_{sim} = T_{sim} + RT^*, \tag{12} \]
set $X^{par}(T_{sim}) = X^M(T^*)$, and return to the decorrelation stage.
5: The algorithm is stopped when $T_{sim}$ reaches a user-chosen terminal time $T_{end}$. The stationary average $\pi(f)$ is estimated as $\pi(f) \approx F(f)_{sim}/T_{sim}$.

To see that each parallel stage has a consistent contribution to the stationary average, we make the following two observations. Suppose that $x_1, \dots, x_R$ are iid samples from $\nu$.
1. The joint law of $(RT^*, X^M(T^*))$ is the same as that of $(T^1, X^1(T^1))$. That is, the joint distribution of the first exit time and the exit state in the parallel stage is independent of the number of replicas.
2.
The expected value of $\sum_{r=1}^R \int_0^{T^*} f(X^r(s))\,ds$ in (12) is the same as that of $\int_0^{T^1} f(X^1(s))\,ds$. That is, the expected contribution to $F(f)_{sim}$ from each parallel stage is independent of the number of replicas.
The first observation is a consequence of Theorem 5, and the second will be proved
in Theorem 7 below. Consistency of stationary averages follows from points 1-2 above and the law of large numbers. Since there are indefinitely many parallel stages in a given $W$, consistency is ensured as long as the contribution to $F(f)_{sim}$ from the parallel stage has the correct expected value. See [1] for details and discussion of a related discrete time version of the algorithm under some idealized assumptions.

The CTMC ParRep algorithm suffers from some serious drawbacks. Even if the parallel processors are synchronous, $M$ and $T^*$ may not be known at the wall clock time when the first replica leaves $W$. The reason is that the holding times for a CTMC are random, while the wall clock time for simulating each jump of the CTMC is always roughly the same. We illustrate this problem in Figure 1. In the worst possible case, in order to determine $M$ and $T^*$, we must wait for all the replicas to leave $W$. However, one can set a variable $T_{min}$ to record the current minimum first exit time over all replicas which have left $W$, and terminate any replicas which reach time $T_{min}$ but have not left $W$, since no replica contributes to the accumulated value past time $T_{min}$. Since the expected first exit times $E[T^r]$, $r = 1, \dots, R$, are roughly the same, if the variance in the number of jumps of $X^r(t)$ before time $T^*$ is small for all $r = 1, \dots, R$, then we can expect that the parallel stage stops after only a few replicas leave $W$.

Fig. 1. The parallel stage of the CTMC ParRep algorithm with two replicas. R1 escapes from $W$ at $t = 7$ with 7 transitions while R2 escapes at $t = 8$ but with only 4 transitions. In the parallel stage of the CTMC ParRep algorithm, R2 escapes from $W$ before R1 does, but $T^2 > T^1$. There is no acceleration in this case since the parallel stage does not terminate when R2 escapes.

For the same reason, there is another major drawback of CTMC ParRep.
If $f$ takes multiple values in $W$, then the computation of $\sum_{r=1}^R \int_0^{T^*} f(X^r(s))\,ds$ in (12) requires storing the entire history of each replica in that parallel stage. Hence, the implementation of CTMC ParRep might be memory demanding unless one is interested in the equilibrium average of a metastable-set invariant function $f$, i.e., if $f(x)$ has only one value in each metastable set $W$. In Section 4 we present another algorithm, called embedded ParRep, which addresses these drawbacks.

3.2. Error Analysis of CTMC ParRep. Here and below we will write $E^{\nu^R}$ for the expectation of $(X^1(t), \dots, X^R(t))$ starting at $\nu^R$, where
\[ \nu^R(x_1, \dots, x_R) = \prod_{r=1}^R \nu(x_r), \qquad x_1, \dots, x_R \in W. \]
We begin with a simple well known lemma.

Lemma 6. Suppose $T^1, \dots, T^R$ are iid exponential random variables with parameter $-\lambda_1$. Then $T^* = \min_{1\le r\le R} T^r$ is exponentially distributed with parameter $-R\lambda_1$. In particular, $RT^*$ has the same distribution as $T^1$.

We now show that if the dephasing sampling is exact, then the expected contribution to the accumulated value $F(f)_{sim}$ from the parallel step of Algorithm 1 is exact.
Theorem 7. Suppose that in the dephasing step $(x_1, \dots, x_R) \sim \nu^R$. Then the expected contribution to $F(f)_{sim}$ from the parallel stage of Algorithm 1 is independent of the number of replicas,
\[ E^{\nu^R}\left[\sum_{r=1}^R \int_0^{T^*} f(X^r(s))\,ds\right] = E^\nu\left[\int_0^T f(X(s))\,ds\right] = \nu(f)\,E^\nu[T]. \]

Proof. First we consider the case with a single replica. We condition on the exit time $T^1$ and write
\[ E^\nu\left[\int_0^{T^1} f(X^1(s))\,ds\right] = \int_0^\infty E^\nu\left[\int_0^t f(X^1(s))\,ds \,\Big|\, T^1 = t\right] P^\nu(T^1 \in dt). \]
Interchanging the two integrals on the right-hand side leads to
\[ \int_0^\infty \int_s^\infty E^\nu\left[f(X^1(s)) \mid T^1 = t\right] P^\nu(T^1 \in dt)\,ds. \]
Note that the inner integral can be written as $E^\nu\left[f(X^1(s)) 1_{T^1 > s}\right]$, and hence
\[ E^\nu\left[\int_0^{T^1} f(X^1(s))\,ds\right] = \int_0^\infty E^\nu\left[f(X^1(s)) \mid T^1 > s\right] P^\nu(T^1 > s)\,ds. \]
Owing to the definition of the QSD and the fact that $E^\nu[T^1] = \int_0^\infty P^\nu(T^1 > s)\,ds$,
\[ E^\nu\left[\int_0^{T^1} f(X^1(s))\,ds\right] = \nu(f)\,E^\nu[T^1]. \]
In the case of multiple replicas, similar steps can be used to show that
\[ E^{\nu^R}\left[\sum_{r=1}^R \int_0^{T^*} f(X^r(s))\,ds\right] = \sum_{r=1}^R \int_0^\infty E^{\nu^R}\left[f(X^r(s)) \mid T^* > s\right] P^{\nu^R}(T^* > s)\,ds. \]
Recall that $T^* > s$ if and only if $T^r > s$ for all $r = 1, \dots, R$. Using this, the fact that $X^1, \dots, X^R$ are independent, and the definition of the QSD, we get
\[ E^{\nu^R}\left[f(X^r(s)) \mid T^* > s\right] = E^\nu\left[f(X^r(s)) \mid T^r > s\right] = \nu(f). \]
Thus
\[ E^{\nu^R}\left[\sum_{r=1}^R \int_0^{T^*} f(X^r(s))\,ds\right] = \nu(f)\,R \int_0^\infty P^{\nu^R}(T^* > s)\,ds = \nu(f)\,R\,E^{\nu^R}[T^*]. \]
Finally, the result follows from Lemma 6.

The purpose of CTMC ParRep is to efficiently simulate very long trajectories of a metastable CTMC and estimate the equilibrium average $\pi(f)$. CTMC ParRep can produce accelerated dynamics of the CTMC on a coarse state space where each coarse set corresponds to some $W$; see the discussion following Algorithm 2. Our numerical experiments suggest that CTMC ParRep (and also embedded ParRep, described below) are consistent for estimating the stationary distribution.
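For a finite set $W$, the identity $E^\nu[\int_0^T f(X(s))\,ds] = \nu(f)E^\nu[T]$ in Theorem 7 can be checked with linear algebra alone. The sketch below is an illustration under stated assumptions, not part of the paper: it integrates the stopped-process formula $P^x(X^T(t) = y) = e_x^\top e^{Q_W t} e_y$ from the proof of Theorem 3 over $t$, which gives $E^\xi[\int_0^T f(X(s))\,ds] = \xi(-Q_W)^{-1}f$ for an initial distribution $\xi$ on $W$. The matrix `Q_W`, the observable `f`, and the helper names `qsd_and_rate` and `expected_integral` are hypothetical.

```python
import numpy as np

def qsd_and_rate(Q_W):
    """Left Perron eigenvector of Q_W, normalized to a distribution, and lambda_1."""
    vals, vecs = np.linalg.eig(Q_W.T)      # left eigenvectors of Q_W
    i = np.argmax(vals.real)               # lambda_1 has the largest real part
    nu = np.real(vecs[:, i])
    return nu / nu.sum(), vals[i].real

def expected_integral(Q_W, start_dist, f):
    """E[ integral_0^T f(X(s)) ds ] for X(0) ~ start_dist, via (-Q_W)^{-1} f."""
    return start_dist @ np.linalg.solve(-Q_W, f)

# Hypothetical restricted generator on W = {1, 2, 3}; the third row sums to
# less than zero, so W is non-absorbing, and P_W is irreducible.
Q_W = np.array([[-1.0, 0.5, 0.5],
                [0.5, -1.0, 0.5],
                [0.05, 0.0, -0.1]])
f = np.array([1.0, -2.0, 3.0])

nu, lam1 = qsd_and_rate(Q_W)
lhs = expected_integral(Q_W, nu, f)             # E^nu[ int_0^T f(X(s)) ds ]
mean_exit = expected_integral(Q_W, nu, np.ones(3))  # E^nu[T]
rhs = (nu @ f) * mean_exit                      # nu(f) E^nu[T]
print(lhs, rhs, mean_exit, -1.0 / lam1)
```

Starting from any distribution other than $\nu$, the two sides generally differ, which is the source of the dephasing error studied in Theorem 8.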
For CTMC ParRep, we justify this claim in Theorem 8 below, which shows that, starting in some $W$ and waiting until the simulation leaves $W$, the error for a complete ParRep cycle in CTMC ParRep compared to direct (serial) simulation vanishes as $t_c$ increases. See Theorem 12 below for the analogous result on embedded ParRep. We note that the errors from each ParRep cycle produce an error in the estimation (5) of stationary averages that does not disappear as $T_{sim} \to \infty$. However, we expect that the error vanishes as the thresholds $t_c = t_p \to \infty$. Study of this error is more involved and will be the focus of another work.

Recall we have assumed convergence $\|\nu_{t_c} - \nu\|_{TV} \to 0$ as $t_c \to \infty$, for every starting point $x \in E$, where $\|\cdot\|_{TV}$ denotes the total variation norm. See for instance Theorem 3 for conditions guaranteeing this convergence.

Theorem 8. Consider CTMC ParRep starting at $x \in W$ in the decorrelation stage. Assume the dephasing stage sampling is exact, that is, $(x_1, \dots, x_R) \sim \nu^R$. Consider the expected contribution to $F(f)_{sim}$ until the first time the simulation leaves $W$ (either in the decorrelation or in the parallel stage),
\[ F(f)_{sim} \equiv E^x\left[\int_0^{T \wedge t_c} f(X(s))\,ds\right] + E^{x,\nu^R}\left[1_{T > t_c} \sum_{r=1}^R \int_0^{T^*} f(X^r(s))\,ds\right], \]
where $E^{x,\nu^R}$ denotes expectation for $(X(t), X^1(t), \dots, X^R(t))$ with $X(t)$ starting at $x$ and the replicas $(X^1(t), \dots, X^R(t))$ starting at initial distribution $\nu^R$. The error compared to direct (serial) simulation satisfies the bound
\[ \left|E^x\left[\int_0^T f(X(s))\,ds\right] - F(f)_{sim}\right| \le \|f\|_\infty \sup_{x\in W} E^x[T]\, \|\nu_{t_c} - \nu\|_{TV}. \tag{13} \]

Proof. We estimate
\[ \begin{aligned}
\left|E^x\left[\int_0^T f(X(s))\,ds\right] - F(f)_{sim}\right|
&= \left|E^x\left[1_{T > t_c}\int_{t_c}^T f(X(s))\,ds\right] - E^{x,\nu^R}\left[1_{T > t_c} \sum_{r=1}^R \int_0^{T^*} f(X^r(s))\,ds\right]\right| \\
&= \left|E^x\left[\int_{t_c}^T f(X(s))\,ds \,\Big|\, T > t_c\right] - E^{x,\nu^R}\left[\sum_{r=1}^R \int_0^{T^*} f(X^r(s))\,ds \,\Big|\, T > t_c\right]\right| P^x(T > t_c) \\
&\le \left|E^x\left[\int_{t_c}^T f(X(s))\,ds \,\Big|\, T > t_c\right] - E^{\nu^R}\left[\sum_{r=1}^R \int_0^{T^*} f(X^r(s))\,ds\right]\right|,
\end{aligned} \]
where we used the fact that $X(t)$ and the replicas $(X^1(t), \dots, X^R(t))$ are independent. By the Markov property,
\[ E^x\left[\int_{t_c}^T f(X(s))\,ds \,\Big|\, T > t_c\right] = E^{\nu_{t_c}}\left[\int_0^T f(X(s))\,ds\right]. \]
By Theorem 7,
\[ E^{\nu^R}\left[\sum_{r=1}^R \int_0^{T^*} f(X^r(s))\,ds\right] = E^\nu\left[\int_0^T f(X(s))\,ds\right]. \]
Combining the above estimates and equalities,
\[ \begin{aligned}
\left|E^x\left[\int_0^T f(X(s))\,ds\right] - F(f)_{sim}\right|
&\le \left|E^{\nu_{t_c}}\left[\int_0^T f(X(s))\,ds\right] - E^\nu\left[\int_0^T f(X(s))\,ds\right]\right| \\
&= \left|\sum_{x\in W} E^x\left[\int_0^T f(X(s))\,ds\right]\nu_{t_c}(x) - \sum_{x\in W} E^x\left[\int_0^T f(X(s))\,ds\right]\nu(x)\right| \\
&\le \|f\|_\infty \sup_{x\in W} E^x[T]\, \|\nu_{t_c} - \nu\|_{TV}.
\end{aligned} \]

We note that $E^x[T]$ is uniformly bounded in $x \in W$ if, for instance, $P_W$ is irreducible and $W$ is finite and non-absorbing for $X(t)$, as in Theorem 3. This uniform boundedness guarantees that the right hand side of (13) vanishes as $t_c \to \infty$.

4. The Embedded ParRep Method.

4.1. Formulation of the Embedded ParRep Algorithm. In this section, we introduce another algorithm for accelerating the computation of $\pi(f)$. The algorithm, called embedded ParRep, circumvents the disadvantages of CTMC ParRep discussed above. As mentioned in the previous section, CTMC ParRep can be slow due to the randomness of the holding times. In the worst case, one has to wait until all replicas leave $W$ in order to determine the first exit time. To circumvent this issue we propose an algorithm based on the embedded chain in which the parallel stage terminates as soon as one of the replicas leaves $W$.

Before we describe embedded ParRep, we introduce some notation. Throughout, $X^1_n, \dots, X^R_n$ will be independent processes with the same law as $X_n$ and with initial distributions supported in $W$. Moreover, we consider $X^1_n, \dots, X^R_n$ as the embedded chains of $X^1(t), \dots, X^R(t)$ defined above, and let $\Delta\tau^1_n, \dots, \Delta\tau^R_n$ be the corresponding holding times. Recall that the first exit time of $X_n$ from $W$ is
\[ N = \inf\{n > 0 : X_n \notin W\}. \]
For $r = 1, \dots, R$, we define the first exit time of $X^r_n$ from $W$ by
\[ N^r = \min\{n \in \mathbb{N} : X^r_n \notin W\} \]
and the smallest among them by
\[ N^* = \min\{N^r : r = 1, \dots, R\}. \]
Note that it is possible that more than one replica leaves $W$ for the first time after $N^*$ transitions. We denote by $K$ the smallest index among these escaped replicas. That is,
\[ K = \min\{r \in \{1, \dots, R\} : X^r_{N^*} \notin W\}. \]
It is clear from the above definition that $N^K = N^*$.
Of course $N$, $N^r$, $N^*$ and $K$ depend on $W$, but we do not make this explicit. Here and below we write $E^{\mu^R}$ for the expectation of $(X^1_n, \dots, X^R_n)$ starting at $\mu^R$, where
\[ \mu^R(x_1, \dots, x_R) = \prod_{r=1}^R \mu(x_r), \qquad x_1, \dots, x_R \in W. \]
We begin by reproducing Theorems 9 and 10 below from [2], with proofs for completeness.

Theorem 9. Suppose $(X^1_n, \dots, X^R_n)$ has initial distribution $\mu^R$. Then $R(N^K - 1) + K$ has the same distribution as $N^1$.

Proof. By Theorem 5, $N^1$ is geometrically distributed with parameter $p = P^\mu(N^1 = 1)$, so that $P^\mu(N^1 > n) = (1-p)^n$. Note that for any $n \ge 1$ and $k = 1, \dots, R$, the event $\{N^K = n, K = k\}$ is equivalent to the event
\[ \{N^1 > n, \dots, N^{k-1} > n,\ N^k = n,\ N^{k+1} > n-1, \dots, N^R > n-1\}. \]
Since $X^1_n, \dots, X^R_n$ are iid,
\[ P^{\mu^R}(N^K = n, K = k) = (1-p)^{n(k-1)}\,(1-p)^{n-1}p\,(1-p)^{(n-1)(R-k)} = (1-p)^{R(n-1)+k-1}\,p. \]
That is, $R(N^K - 1) + K$ has a geometric distribution with parameter $p$.

Theorem 10. Suppose $(X^1_n, \dots, X^R_n)$ has the initial distribution $\mu^R$. Then $X^K_{N^K}$ is independent of $R(N^K - 1) + K$, and the distribution of $(X^K_{N^K}, R(N^K - 1) + K)$ is the same as that of $(X^1_{N^1}, N^1)$.

Proof. We first prove that $X^K_{N^K}$ is independent of $K$. Since $X^1_n, \dots, X^R_n$ are iid and $N^k$ is independent of $X^k_{N^k}$ for each $k$, it follows that $X^k_{N^k}$ is independent of $N^1, \dots, N^R$. Note that $K \in \sigma(N^1, \dots, N^R)$, hence $X^k_{N^k}$ is independent of $K$ for each $k$. Now observe that for any $A \subset E$,
\[ P^{\mu^R}(X^K_{N^K} \in A) = \sum_{r=1}^R P^{\mu^R}(X^r_{N^r} \in A, K = r) = \sum_{r=1}^R P^{\mu^R}(X^1_{N^1} \in A)\,P^{\mu^R}(K = r) = P^{\mu^R}(X^1_{N^1} \in A), \]
that is, $X^K_{N^K}$ and $X^1_{N^1}$ are equally distributed. This implies that $X^K_{N^K}$ is independent of $K$. To see the independence between $X^K_{N^K}$ and $R(N^K - 1) + K$, note that
\[ \begin{aligned}
P^{\mu^R}(X^K_{N^K} \in A, N^K = n, K = r) &= P^{\mu^R}(X^r_{N^r} \in A, N^r = n, K = r) \\
&= P^{\mu^R}(X^r_{N^r} \in A, K = r \mid N^r = n)\,P^{\mu^R}(N^r = n) \\
&= P^{\mu^R}(X^r_{N^r} \in A \mid N^r = n)\,P^{\mu^R}(N^r = n, K = r) \\
&= P^{\mu^R}(X^r_{N^r} \in A)\,P^{\mu^R}(N^r = n, K = r) \\
&= P^{\mu^R}(X^K_{N^K} \in A)\,P^{\mu^R}(N^K = n, K = r)
\end{aligned} \]
for any measurable $A \subset E$, $n \in \mathbb{Z}^+$ and $r = 1, \dots, R$. Finally, Theorem 9 and the above analysis imply that $(X^K_{N^K}, R(N^K - 1) + K)$ and $(X^1_{N^1}, N^1)$ are equally distributed.

Now we present the embedded ParRep algorithm in Algorithm 2.
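The distributional claim of Theorem 9 lends itself to a quick Monte Carlo sanity check. The sketch below assumes exact QSD sampling, so that by Theorem 5 the replica exit steps $N^1, \dots, N^R$ are iid geometric random variables; they are therefore drawn directly rather than by simulating a chain. The helper names `geometric`, `combined_exit_step` and `sample_combined`, and the values of `p` and `R`, are illustrative assumptions.

```python
import random

def geometric(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

def combined_exit_step(exit_steps):
    """Return R*(N* - 1) + K for replica exit steps N^1..N^R (K is 1-based)."""
    n_star = min(exit_steps)
    k = exit_steps.index(n_star) + 1  # smallest replica index attaining N*
    return len(exit_steps) * (n_star - 1) + k

def sample_combined(p, n_replicas, n_samples, rng):
    """Draw n_samples independent copies of R*(N* - 1) + K."""
    return [combined_exit_step([geometric(p, rng) for _ in range(n_replicas)])
            for _ in range(n_samples)]

rng = random.Random(1)
p, R = 0.2, 4
samples = sample_combined(p, R, 50_000, rng)
mean = sum(samples) / len(samples)          # compare with 1/p, the mean of N^1
frac_one = samples.count(1) / len(samples)  # compare with p = P(N^1 = 1)
print(mean, 1 / p)
print(frac_one, p)
```

The quantity $R(N^* - 1) + K$ is what Algorithm 2 adds to the step counter $N_{sim}$ after each parallel stage, so this check concerns the time bookkeeping of the parallel stage only.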
In this algorithm we will need user-chosen parameters $n_c$ associated with each metastable set $W$. Roughly, these parameters correspond to the time for $X_n$ to converge to the QSD in $W$. The DTMC $X_n$ and holding times $\tau_n$ are simulated by the stochastic simulation algorithm (SSA), see, for instance, [13], just as in the CTMC ParRep. If $X_n^{par}$ remains in $W$ for a sufficiently long time (i.e., $n_c$ time steps), it is distributed nearly according to the QSD $\mu$ of $X_n$ in $W$. See Theorem 3. This means that at the end of the decorrelation stage, $X_n^{par}$ can be considered a sample of $\mu$.
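As a rough illustration of the decorrelation criterion described above, the following Python sketch runs a chain until it has spent `n_c` consecutive steps in a single metastable set, accumulating `f(X_n) * tau_n` along the way. The names `decorrelation_stage`, `step` and `label` are hypothetical and do not come from the paper: `step(x, rng)` is assumed to return the next embedded-chain state together with the holding time at `x` (for instance from one SSA step), and `label(x)` is assumed to return an identifier of the metastable set containing `x`, or `None` if `x` lies in no metastable set.

```python
import random


def decorrelation_stage(x, f, label, n_c, step, rng):
    """Run the chain from x until it has spent n_c consecutive steps
    in one metastable set.

    Returns (final_state, number_of_steps, accumulated f * holding time).
    The holding time at the final state is not accumulated.
    """
    acc = 0.0
    n = 0
    # Length of the current run of consecutive states in the same set,
    # counting the current state itself.
    run = 1 if label(x) is not None else 0
    while run < n_c:
        x_next, tau = step(x, rng)
        acc += f(x) * tau
        n += 1
        w = label(x_next)
        if w is None:
            run = 0            # outside every metastable set
        elif w == label(x):
            run += 1           # stayed in the same set
        else:
            run = 1            # entered a different set
        x = x_next
    return x, n, acc


if __name__ == "__main__":
    # Toy deterministic chain (illustrative only): states 0, 1, 2, ...
    # grouped into sets of three, with unit holding times.
    state, steps, value = decorrelation_stage(
        1, float, lambda s: s // 3, 3, lambda s, rng: (s + 1, 1.0),
        random.Random(0))
    print(state, steps, value)
```

Passing `step` and `label` as arguments keeps the sketch independent of any particular reaction network or choice of metastable sets.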
Algorithm 2 Embedded ParRep

1: Set a decorrelation threshold $n_c$ for each metastable set $W$. Initialize the simulation time clock $N_{sim} = 0$ and the accumulated value $F(f)_{sim} = 0$. We will write $X_n^{par}$ and $\tau_n^{par}$ for a DTMC and holding time process following the law of the embedded chain and holding times of $X(t)$ respectively. A complete ParRep cycle consists of three stages.

2: Decorrelation Stage: Starting at $n = N_{sim}$, evolve $X_n^{par}$ and $\tau_n^{par}$ until $X_n^{par}$ spends $n_c$ consecutive time steps inside of the same metastable set $W$. That is, evolve $X_n^{par}$ and $\tau_n^{par}$ from time $n = N_{sim}$ until time
\[ N_{corr} = \inf\{n \ge N_{sim} + n_c - 1 : X_m^{par} \in W \text{ for } m \in \{n - n_c + 1, \ldots, n\} \text{ for some } W\}. \]
Then update
\[ F(f)_{sim} = F(f)_{sim} + \sum_{n=N_{sim}}^{N_{corr}-1} f(X_n^{par})\, \tau_n^{par}, \]
set $N_{sim} = N_{corr}$, and proceed to the dephasing stage.

3: Dephasing Stage: Let $W$ be such that $X^{par}_{N_{sim}} \in W$, that is, $W$ is the metastable set from the end of the decorrelation stage. Generate $R$ independent samples $x_1, \ldots, x_R$ from $\mu$, the QSD of $X_n$ in $W$. Then proceed to the parallel stage.

4: Parallel Stage: Start $R$ parallel processes $X_n^1, \ldots, X_n^R$ at $x_1, \ldots, x_R$, and evolve them and the corresponding holding times $\tau_n^1, \ldots, \tau_n^R$ from time $n = 0$ until time $N^*$. Then update
\[ (14) \qquad F(f)_{sim} = F(f)_{sim} + \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1}, \qquad N_{sim} = N_{sim} + R(N^* - 1) + K, \]
set $X^{par}_{N_{sim}} = X^K_{N^*}$, and return to the decorrelation stage.

5: The algorithm is stopped when $N_{sim}$ reaches some user-chosen time $N_{end}$. The stationary average $\pi(f)$ is estimated as $\pi(f) \approx F(f)_{sim} / F(1)_{sim}$.

The aim of the dephasing stage is to prepare a sequence of iid initial states with distribution $\mu$. Like the CTMC ParRep, rejection sampling can be used for the embedded ParRep as well. However, a more natural and efficient option for the embedded ParRep is a Fleming-Viot based sampling procedure [3, 11]. The procedure can be summarized as follows.
The $R$ replicas $X_n^1, \ldots, X_n^R$, starting in $W$, evolve until one or more of them leaves $W$. Then each replica that left $W$ is restarted from the current state of another replica that is currently in $W$ (usually chosen uniformly at random). The procedure stops after the replicas have evolved for $n = n_p$ time steps, where $n_p$ is a parameter similar to $n_c$. (If all the replicas leave $W$ at the same time, the procedure restarts from the beginning.) With this type of sampling, the number of time steps simulated for each replica in the dephasing step is the same. In particular, if the $R$ parallel processors are synchronous (i.e., if each processor takes the same wall clock time to simulate one time step), then each processor finishes the dephasing step at the same wall clock time.

Fig. 2. The diagram for one parallel stage of the embedded ParRep algorithm with $R$ replicas. Each blue dot represents an exit event along the time line. Both replica 2 and 3 leave $W$ after $N^* = 6$ transitions (the blue dot with the red x), in which case $K = 2$.

The acceleration of the embedded ParRep comes from the parallel stage. Roughly, we only have to wait $N^*$ time steps instead of $N$ to observe an exit from $W$. The theoretical wall clock time speedup can be approximately a factor of $R$. See Theorem 9. Like with the CTMC ParRep, the parallel step does not require metastability for this time speedup, but if $W$ is not metastable, then the dephasing step will not be efficient. See the remarks below Algorithm 1.

Similar to the CTMC ParRep, each parallel stage of the embedded ParRep has a consistent averaged contribution to the stationary average. Suppose that $x_1, \ldots, x_R$ are iid samples from $\mu$.

1. The joint law of $(X^K_{N^K}, R(N^K - 1) + K)$ is the same as that of $(X^1_{N^1}, N^1)$. That is, the joint distribution of the first exit time and the exit state for each parallel stage is independent of the number of replicas.

2. The expected value of
\[ \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \]
is the same as that of
\[ \sum_{n=0}^{N^1-1} f(X_n^1)\, \tau_n^1. \]
Hence the expected contribution to $F(f)_{sim}$ from each parallel stage is independent of the number of replicas.

See Theorems 10 and 11 for proofs of these statements.

Remark 1 (Parallel implementation and efficiency). We expect that the embedded ParRep is superior to the CTMC ParRep for the following two reasons. First, consider the parallel stages of both algorithms.
In the CTMC ParRep, observing the first exit event in the parallel stage is not sufficient to determine the first exit time over all replicas. But in the embedded ParRep, once any replica leaves $W$, we know $N^*$. Thus the embedded ParRep parallel step terminates once any of the replicas leaves $W$. For this reason we expect the parallel stage of the embedded ParRep to be significantly faster than that of the CTMC ParRep.

Second, consider the dephasing stage. For the embedded ParRep, Fleming-Viot sampling is a natural technique because if the processors are synchronous then they all finish the dephasing stage at the same wall clock time, and only the current states of each processor are needed at each time step to decide where to restart replicas which left $W$. For asynchronous processors, one can simply implement a polling time. This is not true, however, for Fleming-Viot sampling with the CTMC ParRep. Indeed, to implement Fleming-Viot sampling with the CTMC ParRep, one would have to store the histories of every replica, and the replicas would finish at potentially very different wall clock times. The rejection method can be slow for both algorithms, particularly when the metastability is weak or when the number of replicas is large.

Error analysis of the embedded ParRep. Now we are able to show that if the dephasing sampling is exact, then the expected contribution to $F(f)_{sim}$ from the parallel stage is exact.

Theorem 11. Suppose in the dephasing step $(x_1, \ldots, x_R) \sim \mu^R$. Then the expected contribution to $F(f)_{sim}$ from the parallel stage of Algorithm 2 is the same for every number of replicas:
\[ (15) \qquad \mathbb{E}^{\mu^R}\Big[ \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \Big] = \mathbb{E}^{\mu}\Big[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \Big] = \mu(f q^{-1})\, \mathbb{E}^{\mu}[N]. \]

Proof. We first rewrite
\[ \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} = \sum_{r=1}^{R} \sum_{i=0}^{N^*-1} f(X_i^r)\, \tau_i^r - \sum_{r=K+1}^{R} f(X^r_{N^*-1})\, \tau^r_{N^*-1}. \]
For the first part, we condition on $N^*$ and obtain
\[ \mathbb{E}^{\mu^R}\Big[ \sum_{r=1}^{R} \sum_{i=0}^{N^*-1} f(X_i^r)\, \tau_i^r \Big] = \sum_{n=1}^{\infty} \sum_{r=1}^{R} \sum_{i=0}^{n-1} \mathbb{E}^{\mu^R}\big[ f(X_i^r)\, \tau_i^r\, \mathbf{1}_{\{N^* = n\}} \big]. \]
Interchanging the iterated summations leads to
\[ \sum_{n=1}^{\infty} \sum_{r=1}^{R} \sum_{i=0}^{n-1} \mathbb{E}^{\mu^R}\big[ f(X_i^r)\, \tau_i^r\, \mathbf{1}_{\{N^* = n\}} \big] = \sum_{r=1}^{R} \sum_{i=0}^{\infty} \mathbb{E}^{\mu^R}\big[ f(X_i^r)\, \mathbf{1}_{\{N^* > i\}}\, \tau_i^r \big]. \]
Notice that $N^* > i$ is equivalent to $N^1 > i, \ldots, N^R > i$, and that $\tau_i^r$ is independent of $N^s$ for $s \ne r$. Thus
\[ \begin{aligned}
\sum_{r=1}^{R} \sum_{i=0}^{\infty} \mathbb{E}^{\mu^R}\big[ f(X_i^r)\, \mathbf{1}_{\{N^* > i\}}\, \tau_i^r \big]
&= \sum_{r=1}^{R} \sum_{i=0}^{\infty} \mathbb{E}^{\mu^R}\big[ f(X_i^r)\, \tau_i^r \mid N^* > i \big]\, \mathbb{P}^{\mu^R}(N^* > i) \\
&= \sum_{r=1}^{R} \sum_{i=0}^{\infty} \mathbb{E}^{\mu^R}\big[ f(X_i^r)\, \tau_i^r \mid N^r > i \big]\, \mathbb{P}^{\mu^R}(N^* > i).
\end{aligned} \]
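Now by Lemma 1 and the definition of the QSD,
\[ \begin{aligned}
\mathbb{E}^{\mu}\big[ f(X_i^r)\, \tau_i^r \mid N^r > i \big]
&= \mathbb{E}^{\mu}\big[ \mathbb{E}^{\mu}[ f(X_i^r)\, \tau_i^r \mid \{X_n^r\}_{n=0,1,\ldots} ] \mid N^r > i \big] \\
&= \mathbb{E}^{\mu}\big[ f(X_i^r)\, \mathbb{E}^{\mu}[ \tau_i^r \mid \{X_n^r\}_{n=0,1,\ldots} ] \mid N^r > i \big] \\
&= \mathbb{E}^{\mu}\big[ f(X_i^r)\, q(X_i^r)^{-1} \mid N^r > i \big] = \mu(f q^{-1}).
\end{aligned} \]
Combining the last four equations gives
\[ (16) \qquad \mathbb{E}^{\mu^R}\Big[ \sum_{r=1}^{R} \sum_{i=0}^{N^*-1} f(X_i^r)\, \tau_i^r \Big] = \mu(f q^{-1})\, R\, \mathbb{E}^{\mu^R}[N^*]. \]
A similar argument can be applied to the second term in the decomposition at the start of the proof. First we condition on $N^*$ and $K$ simultaneously such that
\[ \mathbb{E}^{\mu^R}\Big[ \sum_{r=K+1}^{R} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \Big] = \sum_{n=1}^{\infty} \sum_{k=1}^{R} \sum_{r=k+1}^{R} \mathbb{E}^{\mu^R}\big[ f(X^r_{n-1})\, \tau^r_{n-1} \mid N^* = n, K = k \big]\, \mathbb{P}^{\mu^R}(N^* = n, K = k). \]
Interchanging the second and third summations, the right-hand side equals
\[ \sum_{n=1}^{\infty} \sum_{r=2}^{R} \sum_{k=1}^{r-1} \mathbb{E}^{\mu^R}\big[ f(X^r_{n-1})\, \tau^r_{n-1} \mid N^* = n, K = k \big]\, \mathbb{P}^{\mu^R}(N^* = n, K = k). \]
Recall that
\[ \{N^* = n, K = k\} = \{N^1 > n, \ldots, N^{k-1} > n, \; N^k = n, \; N^{k+1} > n-1, \ldots, N^R > n-1\}. \]
Thus, using independence of $X_n^1, \ldots, X_n^R$ and the definition of the QSD,
\[ \begin{aligned}
&\sum_{n=1}^{\infty} \sum_{r=2}^{R} \sum_{k=1}^{r-1} \mathbb{E}^{\mu^R}\big[ f(X^r_{n-1})\, \tau^r_{n-1} \mid N^* = n, K = k \big]\, \mathbb{P}^{\mu^R}(N^* = n, K = k) \\
&\qquad = \sum_{n=1}^{\infty} \sum_{r=2}^{R} \sum_{k=1}^{r-1} \mathbb{E}^{\mu}\big[ f(X^r_{n-1})\, \tau^r_{n-1} \mid N^r > n-1 \big]\, \mathbb{P}^{\mu^R}(N^* = n, K = k) \\
&\qquad = \mu(f q^{-1}) \sum_{n=1}^{\infty} \sum_{r=2}^{R} \sum_{k=1}^{r-1} \mathbb{P}^{\mu^R}(N^* = n, K = k) \\
&\qquad = \mu(f q^{-1})\, (R - \mathbb{E}^{\mu^R}[K]).
\end{aligned} \]
Combining the last three equations leads to
\[ (17) \qquad \mathbb{E}^{\mu^R}\Big[ \sum_{r=K+1}^{R} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \Big] = \mu(f q^{-1})\, (R - \mathbb{E}^{\mu^R}[K]). \]
Subtracting (17) from (16), we have
\[ \mathbb{E}^{\mu^R}\Big[ \sum_{r=1}^{R} \sum_{i=0}^{N^*-1} f(X_i^r)\, \tau_i^r - \sum_{r=K+1}^{R} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \Big] = \mu(f q^{-1})\, \mathbb{E}^{\mu^R}[R(N^* - 1) + K]. \]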
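Now the result follows since $\mu(f q^{-1})\, \mathbb{E}^{\mu^R}[R(N^* - 1) + K] = \mu(f q^{-1})\, \mathbb{E}^{\mu}[N]$ by Theorem 10. In particular, when $R = 1$ we have $N^* = N$ and $K = 1$, and thus
\[ \mathbb{E}^{\mu}\Big[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \Big] = \mu(f q^{-1})\, \mathbb{E}^{\mu}[N]. \]

We now prove an analog of Theorem 8 for the embedded ParRep. Recall we have assumed convergence $\|\mu_{n_c} - \mu\|_{TV} \to 0$ as $n_c \to \infty$, for every starting point $x \in E$. See for instance Theorem 3 for conditions guaranteeing this convergence.

Theorem 12. Consider the embedded ParRep starting at $x \in W$ in the decorrelation stage. Assume the dephasing stage sampling is exact, that is, $(x_1, \ldots, x_R) \sim \mu^R$. Consider the expected contribution to $F(f)_{sim}$ up until the first time the simulation leaves $W$ (either in the decorrelation stage or in the parallel stage):
\[ F(f)_{sim} := \mathbb{E}^{x}\Big[ \sum_{n=0}^{n_c \wedge N - 1} f(X_n)\, \tau_n \Big] + \mathbb{E}^{x,\mu^R}\Big[ \mathbf{1}_{\{N > n_c\}} \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \mathbf{1}_{\{N > n_c\}} \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \Big], \]
where $\mathbb{E}^{x,\mu^R}$ denotes expectation for $(X_n, X_n^1, \ldots, X_n^R)$ with $X_n$ starting at $x$ and the replicas $(X_n^1, \ldots, X_n^R)$ starting at the initial distribution $\mu^R$. The error compared to a direct (serial) simulation satisfies the bound
\[ (18) \qquad \Big| \mathbb{E}^{x}\Big[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \Big] - F(f)_{sim} \Big| \le \|f\|_{\infty} \sup_{x \in W} \mathbb{E}^{x}[T]\, \|\mu_{n_c} - \mu\|_{TV}. \]

Proof. The proof is similar to that for the CTMC ParRep:
\[ \begin{aligned}
\Big| \mathbb{E}^{x}\Big[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \Big] - F(f)_{sim} \Big|
&= \Big| \mathbb{E}^{x}\Big[ \mathbf{1}_{\{N > n_c\}} \sum_{n=n_c}^{N-1} f(X_n)\, \tau_n \Big] - \mathbb{E}^{x,\mu^R}\Big[ \mathbf{1}_{\{N > n_c\}} \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \mathbf{1}_{\{N > n_c\}} \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \Big] \Big| \\
&\le \Big| \mathbb{E}^{x}\Big[ \sum_{n=n_c}^{N-1} f(X_n)\, \tau_n \,\Big|\, N > n_c \Big] - \mathbb{E}^{\mu^R}\Big[ \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \Big] \Big|.
\end{aligned} \]
By the Markov property,
\[ \mathbb{E}^{x}\Big[ \sum_{n=n_c}^{N-1} f(X_n)\, \tau_n \,\Big|\, N > n_c \Big] = \mathbb{E}^{\mu_{n_c}}\Big[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \Big]. \]
Owing to Theorem 11,
\[ \mathbb{E}^{\mu^R}\Big[ \sum_{r=1}^{R} \sum_{k=0}^{N^*-2} f(X_k^r)\, \tau_k^r + \sum_{r=1}^{K} f(X^r_{N^*-1})\, \tau^r_{N^*-1} \Big] = \mathbb{E}^{\mu}\Big[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \Big]. \]
Therefore
\[ \begin{aligned}
\Big| \mathbb{E}^{x}\Big[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \Big] - F(f)_{sim} \Big|
&\le \Big| \mathbb{E}^{\mu_{n_c}}\Big[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \Big] - \mathbb{E}^{\mu}\Big[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \Big] \Big| \\
&= \Big| \sum_{x \in W} \mu_{n_c}(x)\, \mathbb{E}^{x}\Big[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \Big] - \sum_{x \in W} \mu(x)\, \mathbb{E}^{x}\Big[ \sum_{n=0}^{N-1} f(X_n)\, \tau_n \Big] \Big| \\
&\le \|f\|_{\infty} \sup_{x \in W} \mathbb{E}^{x}[T]\, \|\mu_{n_c} - \mu\|_{TV},
\end{aligned} \]
with the last step coming from the fact that $\mathbb{E}^{x}\big[ \sum_{n=0}^{N-1} \tau_n \big] = \mathbb{E}^{x}[T]$.

5. Numerical Experiments. We present two numerical examples from stochastic reaction networks in order to demonstrate the consistency and efficiency of the ParRep algorithms.

5.1. Reaction networks with linear propensity. We consider the following stochastic reaction network
\[ (19) \qquad \emptyset \to A \rightleftharpoons B \to C \to \emptyset \]
taken from [7], where $A$, $B$ and $C$ represent reacting species. The time evolution of the population (the number of molecules of each species) in the reaction network is commonly modeled as a CTMC $X(t) = (X_1(t), X_2(t), X_3(t))$ with state space $E \subset \mathbb{Z}_+^3$. The jump rate of each reaction is governed by the propensity function (intensity) $\lambda_j(x)$, $j = 1, \ldots, 5$, such that for all $t > 0$,
\[ \lambda_j(x) = \lim_{h \to 0} \frac{\mathbb{P}(X(t+h) = x + \eta_j \mid X(t) = x)}{h}, \]
where $\eta_j$ is the state change vector associated with the $j$th reaction. We list the reactions and their corresponding propensity functions and state change vectors in Table 1.

Table 1. Reactions, propensity functions and state change vectors.

Reaction              Propensity function           State change vector
$\emptyset \to A$     $\lambda_1(x) = c_1$          $\eta_1 = (1, 0, 0)$
$A \to B$             $\lambda_2(x) = c_2 x_1$      $\eta_2 = (-1, 1, 0)$
$B \to A$             $\lambda_3(x) = c_3 x_2$      $\eta_3 = (1, -1, 0)$
$B \to C$             $\lambda_4(x) = c_4 x_2$      $\eta_4 = (0, -1, 1)$
$C \to \emptyset$     $\lambda_5(x) = c_5 x_3$      $\eta_5 = (0, 0, -1)$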
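The propensities and state change vectors in Table 1 are enough to simulate this network with the SSA. The following Python sketch is one minimal way to do so, together with a serial time-average estimator of the form $F(f)/F(1)$. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the rate constants `c` are supplied by the caller, and the constants used in the usage example are illustrative values only.

```python
import random

# State change vectors from Table 1 (species order A, B, C).
CHANGES = [(1, 0, 0), (-1, 1, 0), (1, -1, 0), (0, -1, 1), (0, 0, -1)]


def propensities(x, c):
    """Propensity of each of the five reactions in state x = (x1, x2, x3)."""
    x1, x2, x3 = x
    return [c[0], c[1] * x1, c[2] * x2, c[3] * x2, c[4] * x3]


def ssa_step(x, c, rng):
    """One SSA step: return (next_state, holding_time_at_x)."""
    rates = propensities(x, c)
    q = sum(rates)                      # total jump rate out of x
    tau = rng.expovariate(q)            # exponential holding time
    j = rng.choices(range(len(rates)), weights=rates)[0]
    return tuple(a + b for a, b in zip(x, CHANGES[j])), tau


def time_average(f, x0, c, n_steps, seed=0):
    """Estimate the stationary average of f by a holding-time-weighted average."""
    rng = random.Random(seed)
    x, num, den = x0, 0.0, 0.0
    for _ in range(n_steps):
        x_next, tau = ssa_step(x, c, rng)
        num += f(x) * tau               # contribution to F(f)
        den += tau                      # contribution to F(1)
        x = x_next
    return num / den


if __name__ == "__main__":
    # Illustrative rate constants, not the ones used in the experiments.
    c_demo = (1.0, 2.0, 2.0, 1.0, 1.0)
    print(time_average(lambda s: s[2], (0, 0, 0), c_demo, 100_000))
```

Because $\lambda_1 = c_1 > 0$, the total rate is positive in every state, and a reaction that would make a population negative has zero propensity and is never selected.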
In this numerical experiment, we take the initial state $x = (5, 1, 1)$ and the rate constants $(c_1, c_2, c_3, c_4, c_5) = (.1, 1, 1, .1, .1)$. With this choice of parameters the timescale separation is about $\epsilon = 10^{-4}$ and hence the process $X(t)$ demonstrates metastability. The reactions $A \to B$ and $B \to A$ occur with a much higher probability than the other reactions, and hence we call $A \to B$ and $B \to A$ fast reactions and the other reactions slow reactions. The occurrence of slow reactions is a rare event.

We define the observables $f_1(x) = x_1 + x_2$ and $f_2(x) = x_3$. The collection of sets $\{W_{m,n}\}_{m,n \in \mathbb{Z}_+}$ with $W_{m,n} = \{x \in E : f_1(x) = m, f_2(x) = n\}$ forms a full decomposition of the state space $E$. Note that both the total population of species $A$ and $B$ (i.e., $f_1(X(t))$) and the population of species $C$ (i.e., $f_2(X(t))$) remain constant until one of the slow reactions occurs. Hence the typical sojourn time for $X(t)$ in each $W_{m,n}$ is very long compared to the transition time between any two states in $W_{m,n}$. In this case, we say $X(t)$ is metastable in $W_{m,n}$. For example, with the initial population $x = (1, 1, 0)$, the states $(1, 1, 0)$, $(2, 0, 0)$ and $(0, 2, 0)$ form a metastable set, since the fast reactions $A \to B$ and $B \to A$ occur with a significantly higher probability than the slow reactions, and only a slow reaction can move the process from one metastable set to another.

Since both observables $f_1$ and $f_2$ defined above are invariant in each metastable set, we call them slow observables. In general, an observable $f$ is called a slow observable if it is invariant in each metastable set $W_{m,n}$, i.e., there is a constant $C(m, n)$ such that $f(x) = C(m, n)$ for each $x \in W_{m,n}$. An observable is called a fast observable if it is not slow (e.g., $f(x) = x_1$). This kind of two-scale problem arises in many fields other than stochastic reaction networks, such as queuing theory and population dynamics.
Estimation of the distributions of two-scale processes can be computationally prohibitive due to the insufficient sampling of the rare events. Therefore, it is desirable to apply the two ParRep algorithms proposed in this paper to accelerate the long time simulation and estimate the stationary distribution. We apply both the CTMC ParRep and the embedded ParRep to estimate the stationary averages of the slow observables $f_1$ and $f_2$. The stationary average of the fast observable $f_3(x) = x_1$ is also computed using the embedded ParRep.

On the other hand, for the reaction network (19) under consideration, one can calculate the stationary distribution analytically since it only involves mono-molecular reactions. In fact, it can be shown that the stationary distribution is a multivariate Poisson distribution [7], that is,
\[ (20) \qquad \pi(x_1, x_2, x_3) = \frac{\bar\lambda_1^{x_1} \bar\lambda_2^{x_2} \bar\lambda_3^{x_3}}{x_1!\, x_2!\, x_3!}\, e^{-(\bar\lambda_1 + \bar\lambda_2 + \bar\lambda_3)}, \]
where
\[ \bar\lambda_1 = \frac{c_1 (c_3 + c_4)}{c_2 c_4}, \qquad \bar\lambda_2 = \frac{c_1}{c_4}, \qquad \bar\lambda_3 = \frac{c_1}{c_5}. \]
Hence the exact stationary averages of the slow observables $f_1$ and $f_2$ are $\pi(f_1) = 2.1$ and $\pi(f_2) = 1$, and the exact stationary average of the fast observable $f_3(x) = x_1$ is $1.1$. We use this exact result to compare with our result from numerical simulation.
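The closed-form averages above can be checked numerically. The Python sketch below evaluates the Poisson means from the rate constants, the product-form probability mass function, and the residual of the global balance equation at a given state, which should vanish if the product form is stationary. All function names are hypothetical, and the rate constants in the usage example are taken exactly as printed in this section.

```python
import math

# State change vectors of the five reactions (species order A, B, C).
CHANGES = [(1, 0, 0), (-1, 1, 0), (1, -1, 0), (0, -1, 1), (0, 0, -1)]


def propensities(x, c):
    """Propensities of the five reactions in state x."""
    x1, x2, x3 = x
    return [c[0], c[1] * x1, c[2] * x2, c[3] * x2, c[4] * x3]


def poisson_means(c):
    """Means of the three independent Poisson marginals."""
    c1, c2, c3, c4, c5 = c
    return (c1 * (c3 + c4) / (c2 * c4), c1 / c4, c1 / c5)


def stationary_pmf(x, c):
    """Product-form Poisson probability of state x."""
    return math.prod(math.exp(-lam) * lam ** k / math.factorial(k)
                     for lam, k in zip(poisson_means(c), x))


def exact_averages(c):
    """Stationary averages of f1 = x1 + x2, f2 = x3 and f3 = x1."""
    l1, l2, l3 = poisson_means(c)
    return {"f1": l1 + l2, "f2": l3, "f3": l1}


def balance_residual(x, c):
    """Probability inflow minus outflow at x under the product-form law."""
    inflow = 0.0
    for j, eta in enumerate(CHANGES):
        y = tuple(a - b for a, b in zip(x, eta))   # state that jumps into x
        if min(y) >= 0:
            inflow += stationary_pmf(y, c) * propensities(y, c)[j]
    outflow = stationary_pmf(x, c) * sum(propensities(x, c))
    return inflow - outflow


if __name__ == "__main__":
    c_printed = (0.1, 1.0, 1.0, 0.1, 0.1)   # rate constants as printed above
    print(exact_averages(c_printed))
    print(balance_residual((2, 1, 3), c_printed))
```

With the constants as printed, the three averages agree with the quoted values $2.1$, $1$ and $1.1$ up to floating-point rounding.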
Fig. 3. The stationary average of the slow observable $f_1(x) = x_1 + x_2$ computed with the CTMC ParRep (left) and with the embedded ParRep (right), plotted against the number of replicas together with the exact value. The user-specified terminal time is $T_{end} = 10^4$ in the simulation.

Fig. 4. The stationary average of the slow observable $f_2(x) = x_3$ computed with the CTMC ParRep (left) and with the embedded ParRep (right), plotted against the number of replicas together with the exact value. The user-specified terminal time is $T_{end} = 10^4$ in the simulation.

Our simulations compare the CTMC ParRep and the embedded ParRep with the Stochastic Simulation Algorithm (SSA) [13]. In Figure 3, we demonstrate the estimation of $\pi(f_1)$ using the CTMC ParRep and the embedded ParRep with various numbers of replicas ($R = 1, 2, \ldots, 1$) and with SSA ($R = 1$). Similarly, Figure 4 shows the estimation of $\pi(f_2)$. Note that only the embedded ParRep is used to compute the stationary average of the fast variable $f(x) = x_1$, since the CTMC ParRep is not efficient for fast observables, as we commented at the end of Section 3.1. Currently, rejection sampling is used for dephasing, and the decorrelation and dephasing thresholds are taken to be $t_c = t_p = .1$ for the CTMC ParRep and $n_c = n_p = 15$ steps for the embedded ParRep. In Figure 5, the estimation for the fast observable and the speedup are shown. It can be seen that with 1 replicas, the speedup factor is about 4.5 for the CTMC ParRep and 5.5 for the embedded ParRep.
Fig. 5. The stationary average of the fast observable $f_3(x) = x_1$ computed with the embedded ParRep (left) and the speedup comparison between the CTMC ParRep and the embedded ParRep (right), both plotted against the number of replicas. The user-specified terminal time is $T_{end} = 10^4$ in the simulation.

When the number of replicas increases, the embedded ParRep becomes much more efficient than the CTMC ParRep. However, even the embedded ParRep is far away from the linear speedup (with 1 replicas, about 27 times faster than SSA). This sublinear speedup comes from the fact that when the number of replicas is large, the acceleration is offset by the inefficient rejection sampling based dephasing procedure. We expect that the embedded ParRep would be more efficient if Fleming-Viot particle processes are used for dephasing.

5.2. Reaction networks with nonlinear propensity. In the second example, we focus on the following network from [24],
\[ (21) \qquad S_1 \rightleftharpoons S_2, \qquad S_1 \rightleftharpoons S_3, \qquad 2S_2 + S_3 \rightleftharpoons 3S_4. \]
The propensity function and state change vector associated with each reaction are shown in Table 2. Note that by the law of mass action, the reactions $2S_2 + S_3 \rightleftharpoons 3S_4$ have nonlinear propensity functions.

Table 2. Reactions, propensity functions and state change vectors.

Reaction                  Propensity function                          State change vector
$S_1 \to S_2$             $\lambda_1(x) = c_1 x_1$                     $\eta_1 = (-1, 1, 0, 0)$
$S_2 \to S_1$             $\lambda_2(x) = c_2 x_2$                     $\eta_2 = (1, -1, 0, 0)$
$S_1 \to S_3$             $\lambda_3(x) = c_3 x_1$                     $\eta_3 = (-1, 0, 1, 0)$
$S_3 \to S_1$             $\lambda_4(x) = c_4 x_3$                     $\eta_4 = (1, 0, -1, 0)$
$2S_2 + S_3 \to 3S_4$     $\lambda_5(x) = c_5 x_2 (x_2 - 1) x_3$       $\eta_5 = (0, -2, -1, 3)$
$3S_4 \to 2S_2 + S_3$     $\lambda_6(x) = c_6 x_4 (x_4 - 1)(x_4 - 2)$  $\eta_6 = (0, 2, 1, -3)$

Throughout this example, we choose the initial state $x = (3, 3, 3, 3)$ and the reaction rate constants $(c_1, c_2, c_3, c_4, c_5, c_6) = (.1, .1, .1, .1, 2, 2)$.
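The entries of Table 2 translate directly into code. The following Python sketch encodes the six propensities and state change vectors for network (21). It assumes the mass-action form in which the propensity of $3S_4 \to 2S_2 + S_3$ depends on the population $x_4$ of $S_4$; the function names are hypothetical and the rate constants are supplied by the caller. Every state change vector sums to zero, so the total number of molecules is conserved by all six reactions.

```python
# State change vectors for network (21), species order S1, S2, S3, S4.
CHANGES = [
    (-1, 1, 0, 0),    # S1 -> S2
    (1, -1, 0, 0),    # S2 -> S1
    (-1, 0, 1, 0),    # S1 -> S3
    (1, 0, -1, 0),    # S3 -> S1
    (0, -2, -1, 3),   # 2 S2 + S3 -> 3 S4
    (0, 2, 1, -3),    # 3 S4 -> 2 S2 + S3
]


def propensities(x, c):
    """Propensities of the six reactions in state x = (x1, x2, x3, x4)."""
    x1, x2, x3, x4 = x
    return [
        c[0] * x1,
        c[1] * x2,
        c[2] * x1,
        c[3] * x3,
        c[4] * x2 * (x2 - 1) * x3,          # needs two S2 and one S3
        c[5] * x4 * (x4 - 1) * (x4 - 2),    # needs three S4 (assumed mass-action form)
    ]


def apply_reaction(x, j):
    """State reached from x when reaction j (0-based index) fires."""
    return tuple(a + b for a, b in zip(x, CHANGES[j]))


if __name__ == "__main__":
    unit_rates = (1, 1, 1, 1, 1, 1)   # placeholder constants for illustration
    print(propensities((3, 3, 3, 3), unit_rates))
    print(apply_reaction((3, 3, 3, 3), 4))
```

The falling-factorial products vanish whenever too few molecules are present, so a nonlinear reaction is never selected in a state where it would produce a negative population.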
Markov Chains X(t) is a Markov Process if, for arbitrary times t 1 < t 2
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationSelecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden
1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized
More information5. Solving the Bellman Equation
5. Solving the Bellman Equation In the next two lectures, we will look at several methods to solve Bellman s Equation (BE) for the stochastic shortest path problem: Value Iteration, Policy Iteration and
More informationINTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING
INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING ERIC SHANG Abstract. This paper provides an introduction to Markov chains and their basic classifications and interesting properties. After establishing
More informationBisection Ideas in End-Point Conditioned Markov Process Simulation
Bisection Ideas in End-Point Conditioned Markov Process Simulation Søren Asmussen and Asger Hobolth Department of Mathematical Sciences, Aarhus University Ny Munkegade, 8000 Aarhus C, Denmark {asmus,asger}@imf.au.dk
More informationLecture 9 Classification of States
Lecture 9: Classification of States of 27 Course: M32K Intro to Stochastic Processes Term: Fall 204 Instructor: Gordan Zitkovic Lecture 9 Classification of States There will be a lot of definitions and
More informationMarkov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains
Markov Chains A random process X is a family {X t : t T } of random variables indexed by some set T. When T = {0, 1, 2,... } one speaks about a discrete-time process, for T = R or T = [0, ) one has a continuous-time
More informationPersistence and Stationary Distributions of Biochemical Reaction Networks
Persistence and Stationary Distributions of Biochemical Reaction Networks David F. Anderson Department of Mathematics University of Wisconsin - Madison Discrete Models in Systems Biology SAMSI December
More informationPart I Stochastic variables and Markov chains
Part I Stochastic variables and Markov chains Random variables describe the behaviour of a phenomenon independent of any specific sample space Distribution function (cdf, cumulative distribution function)
More informationLecture Notes 7 Random Processes. Markov Processes Markov Chains. Random Processes
Lecture Notes 7 Random Processes Definition IID Processes Bernoulli Process Binomial Counting Process Interarrival Time Process Markov Processes Markov Chains Classification of States Steady State Probabilities
More information8. Statistical Equilibrium and Classification of States: Discrete Time Markov Chains
8. Statistical Equilibrium and Classification of States: Discrete Time Markov Chains 8.1 Review 8.2 Statistical Equilibrium 8.3 Two-State Markov Chain 8.4 Existence of P ( ) 8.5 Classification of States
More informationReflected Brownian Motion
Chapter 6 Reflected Brownian Motion Often we encounter Diffusions in regions with boundary. If the process can reach the boundary from the interior in finite time with positive probability we need to decide
More informationExtreme Value Analysis and Spatial Extremes
Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models
More informationSIMILAR MARKOV CHAINS
SIMILAR MARKOV CHAINS by Phil Pollett The University of Queensland MAIN REFERENCES Convergence of Markov transition probabilities and their spectral properties 1. Vere-Jones, D. Geometric ergodicity in
More informationConsensus on networks
Consensus on networks c A. J. Ganesh, University of Bristol The spread of a rumour is one example of an absorbing Markov process on networks. It was a purely increasing process and so it reached the absorbing
More informationX. Hu, R. Shonkwiler, and M.C. Spruill. School of Mathematics. Georgia Institute of Technology. Atlanta, GA 30332
Approximate Speedup by Independent Identical Processing. Hu, R. Shonkwiler, and M.C. Spruill School of Mathematics Georgia Institute of Technology Atlanta, GA 30332 Running head: Parallel iip Methods Mail
More informationApplied Stochastic Processes
STAT455/855 Fall 26 Applied Stochastic Processes Final Exam, Brief Solutions 1 (15 marks (a (7 marks For 3 j n, starting at the jth best point we condition on the rank R of the point we jump to next By
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More information2.1 Laplacian Variants
-3 MS&E 337: Spectral Graph heory and Algorithmic Applications Spring 2015 Lecturer: Prof. Amin Saberi Lecture 2-3: 4/7/2015 Scribe: Simon Anastasiadis and Nolan Skochdopole Disclaimer: hese notes have
More informationDistributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 55, NO. 9, SEPTEMBER 2010 1987 Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE Abstract
More informationMarkov Chains and Computer Science
A not so Short Introduction Jean-Marc Vincent Laboratoire LIG, projet Inria-Mescal UniversitéJoseph Fourier Jean-Marc.Vincent@imag.fr Spring 2015 1 / 44 Outline 1 Markov Chain History Approaches 2 Formalisation
More informationProofs for Large Sample Properties of Generalized Method of Moments Estimators
Proofs for Large Sample Properties of Generalized Method of Moments Estimators Lars Peter Hansen University of Chicago March 8, 2012 1 Introduction Econometrica did not publish many of the proofs in my
More informationLecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321
Lecture 11: Introduction to Markov Chains Copyright G. Caire (Sample Lectures) 321 Discrete-time random processes A sequence of RVs indexed by a variable n 2 {0, 1, 2,...} forms a discretetime random process
More informationMonte Carlo Methods. Leon Gu CSD, CMU
Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte
More informationMS&E 321 Spring Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10
MS&E 321 Spring 12-13 Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10 Section 3: Regenerative Processes Contents 3.1 Regeneration: The Basic Idea............................... 1 3.2
More informationCDA6530: Performance Models of Computers and Networks. Chapter 3: Review of Practical Stochastic Processes
CDA6530: Performance Models of Computers and Networks Chapter 3: Review of Practical Stochastic Processes Definition Stochastic process X = {X(t), t2 T} is a collection of random variables (rvs); one rv
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationMarkov Chains and MCMC
Markov Chains and MCMC Markov chains Let S = {1, 2,..., N} be a finite set consisting of N states. A Markov chain Y 0, Y 1, Y 2,... is a sequence of random variables, with Y t S for all points in time
More informationLecture 21: Convergence of transformations and generating a random variable
Lecture 21: Convergence of transformations and generating a random variable If Z n converges to Z in some sense, we often need to check whether h(z n ) converges to h(z ) in the same sense. Continuous
More informationprocess on the hierarchical group
Intertwining of Markov processes and the contact process on the hierarchical group April 27, 2010 Outline Intertwining of Markov processes Outline Intertwining of Markov processes First passage times of
More informationLect4: Exact Sampling Techniques and MCMC Convergence Analysis
Lect4: Exact Sampling Techniques and MCMC Convergence Analysis. Exact sampling. Convergence analysis of MCMC. First-hit time analysis for MCMC--ways to analyze the proposals. Outline of the Module Definitions
More informationRecap. Probability, stochastic processes, Markov chains. ELEC-C7210 Modeling and analysis of communication networks
Recap Probability, stochastic processes, Markov chains ELEC-C7210 Modeling and analysis of communication networks 1 Recap: Probability theory important distributions Discrete distributions Geometric distribution
More informationAdvanced sampling. fluids of strongly orientation-dependent interactions (e.g., dipoles, hydrogen bonds)
Advanced sampling ChE210D Today's lecture: methods for facilitating equilibration and sampling in complex, frustrated, or slow-evolving systems Difficult-to-simulate systems Practically speaking, one is
More informationA note on adiabatic theorem for Markov chains
Yevgeniy Kovchegov Abstract We state and prove a version of an adiabatic theorem for Markov chains using well known facts about mixing times. We extend the result to the case of continuous time Markov
More informationA primer on basic probability and Markov chains
A primer on basic probability and Markov chains David Aristo January 26, 2018 Contents 1 Basic probability 2 1.1 Informal ideas and random variables.................... 2 1.2 Probability spaces...............................
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationRandom Times and Their Properties
Chapter 6 Random Times and Their Properties Section 6.1 recalls the definition of a filtration (a growing collection of σ-fields) and of stopping times (basically, measurable random times). Section 6.2
More informationLecture XI. Approximating the Invariant Distribution
Lecture XI Approximating the Invariant Distribution Gianluca Violante New York University Quantitative Macroeconomics G. Violante, Invariant Distribution p. 1 /24 SS Equilibrium in the Aiyagari model G.
More information25.1 Markov Chain Monte Carlo (MCMC)
CS880: Approximations Algorithms Scribe: Dave Andrzejewski Lecturer: Shuchi Chawla Topic: Approx counting/sampling, MCMC methods Date: 4/4/07 The previous lecture showed that, for self-reducible problems,
More informationPROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS
PROBABILITY: LIMIT THEOREMS II, SPRING 15. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please
More information. Find E(V ) and var(v ).
Math 6382/6383: Probability Models and Mathematical Statistics Sample Preliminary Exam Questions 1. A person tosses a fair coin until she obtains 2 heads in a row. She then tosses a fair die the same number
More informationKrzysztof Burdzy Robert Ho lyst Peter March
A FLEMING-VIOT PARTICLE REPRESENTATION OF THE DIRICHLET LAPLACIAN Krzysztof Burdzy Robert Ho lyst Peter March Abstract: We consider a model with a large number N of particles which move according to independent
More informationDiscrete time Markov chains. Discrete Time Markov Chains, Limiting. Limiting Distribution and Classification. Regular Transition Probability Matrices
Discrete time Markov chains Discrete Time Markov Chains, Limiting Distribution and Classification DTU Informatics 02407 Stochastic Processes 3, September 9 207 Today: Discrete time Markov chains - invariant
More information