Computational methods for continuous time Markov chains with applications to biological processes. David F. Anderson, anderson@math.wisc.edu, Department of Mathematics, University of Wisconsin - Madison. Penn State, January 13th, 2012.
Stochastic Models of Biochemical Reaction Systems

The most common stochastic models of biochemical reaction systems are continuous time Markov chains, often called chemical master equation models in the biosciences. Common examples include:
1. Gene regulatory networks.
2. Models of viral infection.
3. General population models (epidemic, predator-prey, etc.).

Path-wise simulation methods include (language in biology = language in math):
1. Gillespie's algorithm = simulate the embedded DTMC.
2. Next reaction method = simulate the random time change representation of Tom Kurtz.
3. First reaction method = simulate using exponential alarm clocks.
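As a minimal sketch of the first of these methods (the interface and names below are illustrative, not from the talk), Gillespie's algorithm simulates the embedded DTMC: wait an exponential holding time with rate λ_0(x) = Σ_k λ_k(x), then fire reaction k with probability λ_k(x)/λ_0(x).

```python
import numpy as np

def gillespie(x0, stoich, propensities, T, rng):
    """One exact path of the CTMC up to time T (Gillespie's direct method).

    x0           : initial species counts, shape (d,)
    stoich       : reaction vectors zeta_k, shape (K, d)
    propensities : function x -> intensities lambda_k(x), shape (K,)
    """
    x = np.asarray(x0, dtype=float).copy()
    t = 0.0
    while True:
        lam = propensities(x)
        lam0 = lam.sum()
        if lam0 == 0.0:                  # absorbing state: no reaction can fire
            return x
        t += rng.exponential(1.0 / lam0)     # exponential holding time
        if t > T:
            return x
        k = rng.choice(len(lam), p=lam / lam0)   # embedded DTMC step
        x += stoich[k]
```

For example, for A + B -> C with mass-action intensity κ x_A x_B, a path started from (5, 3, 0) fires until B is exhausted.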
Stochastic Models of Biochemical Reaction Systems

Path-wise methods can approximate values such as E f(X(t)). For example:
1. Means: f(x) = x_i.
2. Moments/variances: f(x) = x_i^2.
3. Probabilities: f(x) = 1_{x ∈ A}.
They can also compute sensitivities, (d/dκ) E f(X^κ(t)).

Problem: solving using these algorithms can be computationally expensive.
First problem: joint with Des Higham

Our first problem: approximate E f(X(T)) to some desired tolerance ε > 0.

Easy! Simulate the CTMC exactly, generate independent paths X^{[i]}(t), and use the unbiased estimator
μ_n = (1/n) Σ_{i=1}^n f(X^{[i]}(t)),
stopping when the desired confidence interval is ±ε.
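The stopping rule above can be sketched as follows; `sample_f` is a hypothetical black box producing one exact realization of f(X(T)) (e.g. f applied to a Gillespie path), and the confidence interval is the usual normal approximation.

```python
import numpy as np

def crude_mc(sample_f, eps, rng, z=1.96, batch=1000, max_n=10**6):
    """Crude Monte Carlo: average i.i.d. samples of f(X(T)) until the
    approximate 95% confidence half-width drops below eps."""
    vals = []
    half = np.inf
    while len(vals) < max_n:
        vals.extend(sample_f(rng) for _ in range(batch))
        n = len(vals)
        half = z * np.std(vals, ddof=1) / np.sqrt(n)   # CI half-width
        if half <= eps:
            break
    return float(np.mean(vals)), float(half)
```

The sample size this rule settles on is exactly the n = O(ε^{-2}) computed on the next slide.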
What is the computational cost?

Recall
μ_n = (1/n) Σ_{i=1}^n f(X^{[i]}(t)).
Thus Var(μ_n) = O(1/n). So, if we want σ_n = O(ε), we need 1/√n = O(ε), i.e. n = O(ε^{-2}).

If N gives the average cost (number of steps) of a path using the exact algorithm, then
Total computational complexity = (cost per path) × (# paths) = O(N ε^{-2}).
This can be bad if (i) N is large, or (ii) ε is small.
Benefits/drawbacks

Benefits:
1. Easy to implement.
2. The estimator μ_n = (1/n) Σ_{i=1}^n f(X^{[i]}(t)) is unbiased.

Drawbacks:
1. The cost of O(N ε^{-2}) could be prohibitively large.
2. For our models, we often have that N is very large.

We need to develop the model to get better ideas...
Build up the model: random time change representation of Tom Kurtz

Consider the simple system
A + B → C,
where one molecule each of A and B is converted to one molecule of C.

Simple book-keeping: if X(t) = (X_A(t), X_B(t), X_C(t))^T gives the state at time t, then
X(t) = X(0) + R(t) (-1, -1, 1)^T,
where R(t) is the number of times the reaction has occurred by time t and X(0) is the initial condition.
Build up the model: random time change representation of Tom Kurtz

Assuming the intensity (or propensity) of the reaction is κ X_A(s) X_B(s), we can model
R(t) = Y( ∫_0^t κ X_A(s) X_B(s) ds ),
where Y is a unit-rate Poisson process. Hence
X(t) = X(0) + Y( ∫_0^t κ X_A(s) X_B(s) ds ) (-1, -1, 1)^T.
Build up the model: random time change representation of Tom Kurtz

Now consider a network of reactions involving d chemical species S_1, ..., S_d:
Σ_{i=1}^d ν_{ik} S_i → Σ_{i=1}^d ν'_{ik} S_i.
Denote the reaction vector by ζ_k = ν'_k - ν_k. The intensity (or propensity) of the kth reaction is λ_k : Z^d → R. By analogy with before,
X(t) = X(0) + Σ_k Y_k( ∫_0^t λ_k(X(s)) ds ) ζ_k,
where the Y_k are independent, unit-rate Poisson processes.
Example

Consider a model of gene transcription and translation:
G → G + M, rate 25 (transcription)
M → M + P, rate 1000 (translation)
P + P → D, rate 0.001 (dimerization)
M → ∅, rate 0.1 (degradation of mRNA)
P → ∅, rate 1 (degradation of protein)

Then, if X = [X_M, X_P, X_D]^T,
X(t) = X(0) + Y_1(25 t) [1, 0, 0]^T + Y_2( 1000 ∫_0^t X_M(s) ds ) [0, 1, 0]^T
+ Y_3( 0.001 ∫_0^t X_P(s)(X_P(s) - 1) ds ) [0, -2, 1]^T
+ Y_4( 0.1 ∫_0^t X_M(s) ds ) [-1, 0, 0]^T + Y_5( ∫_0^t X_P(s) ds ) [0, -1, 0]^T.
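This network can be encoded as a stoichiometric matrix plus a propensity function, which is all a path simulator needs. A sketch, assuming the rate constants as reconstructed above (25, 1000, 0.001, 0.1, 1) and a single gene copy G = 1 held fixed:

```python
import numpy as np

# State x = [M, P, D]; the single gene copy G = 1 is implicit.
STOICH = np.array([
    [ 1,  0, 0],   # transcription:       G -> G + M
    [ 0,  1, 0],   # translation:         M -> M + P
    [ 0, -2, 1],   # dimerization:        P + P -> D
    [-1,  0, 0],   # mRNA degradation:    M -> 0
    [ 0, -1, 0],   # protein degradation: P -> 0
])

def propensities(x):
    m, p, _ = x
    return np.array([
        25.0,                   # constant transcription (G = 1)
        1000.0 * m,             # proportional to mRNA count
        0.001 * p * (p - 1),    # mass-action rate for P + P
        0.1 * m,
        1.0 * p,
    ])
```

Each row of `STOICH` is one reaction vector ζ_k; each entry of `propensities(x)` is the corresponding intensity λ_k(x).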
Back to our problem

Recall:

Benefits:
1. Easy to implement.
2. The estimator μ_n = (1/n) Σ_{i=1}^n f(X^{[i]}(t)) is unbiased.

Drawbacks:
1. The cost of O(N ε^{-2}) could be prohibitively large.
2. For our models, we often have that N is very large.

Let's try an approximate scheme.
Tau-leaping: Euler's method

Explicit tau-leaping¹, or Euler's method, was first formulated by Dan Gillespie in this setting. Tau-leaping is essentially an Euler approximation of ∫_0^t λ_k(X(s)) ds:

Z(h) = Z(0) + Σ_k Y_k( ∫_0^h λ_k(Z(s)) ds ) ζ_k
≈ Z(0) + Σ_k Y_k( λ_k(Z(0)) h ) ζ_k
=_d Z(0) + Σ_k Poisson( λ_k(Z(0)) h ) ζ_k.

1 D. T. Gillespie, J. Chem. Phys., 115, 1716-1733, 2001.
Euler's method

The path-wise representation of the Z(t) generated by Euler's method is
Z(t) = X(0) + Σ_k Y_k( ∫_0^t λ_k(Z(η(s))) ds ) ζ_k,
where η(s) = t_n if t_n ≤ s < t_{n+1} = t_n + h is the step function giving the left endpoints of the time discretization.
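A minimal tau-leaping sketch of this representation: the step function η is realized by freezing each intensity at the left endpoint of the current step and drawing Poisson firing counts. (The clipping of negative rates is my own guard against the negativity issue discussed later, not part of the method as stated.)

```python
import numpy as np

def tau_leap(x0, stoich, propensities, T, h, rng):
    """Euler tau-leaping: freeze the intensities at the left endpoint of each
    step of length h and fire Poisson(lambda_k(Z(t_n)) * h) reactions."""
    z = np.asarray(x0, dtype=float).copy()
    t = 0.0
    while t < T - 1e-12:
        dt = min(h, T - t)
        lam = np.maximum(propensities(z), 0.0)   # guard: clip negative rates
        counts = rng.poisson(lam * dt)           # firings per channel this step
        z += counts @ stoich
        t += dt
    return z
```

For a constant-rate channel the scheme is exact: an immigration process at rate 5 run to T = 2 has law Poisson(10) regardless of h.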
Return to approximating E f(X(T))

Let Z_L denote an approximate process generated with time discretization step h_L, and let
μ_n = (1/n) Σ_{i=1}^n f(Z_L^{[i]}(t)).
We note
E f(X(t)) - μ_n = [ E f(X(t)) - E f(Z_L(t)) ] + E f(Z_L(t)) - μ_n.
Suppose we have an order one method, so that E f(X(t)) - E f(Z_L(t)) = O(h_L). We then need:
1. h_L = O(ε),
2. n = O(ε^{-2}).
Suppose a path costs O(ε^{-1}) steps. Then
Total computational complexity = (# paths) × (cost per path) = O(ε^{-3}).
Benefits/drawbacks

Benefits:
1. Can drastically lower the computational complexity of a problem if ε^{-1} ≪ N:
CC of the exact method = N ε^{-2}; CC of the approximate method = ε^{-1} · ε^{-2}.

Drawbacks:
1. Convergence results usually give only an order of convergence and cannot give a precise h_L; bias is a problem.
2. Tau-leaping has problems: what happens if you go negative?
3. We have given up an unbiased estimator.
Multi-level Monte Carlo and control variates

Suppose I want E X ≈ (1/n) Σ_{i=1}^n X^{[i]}, but realizations of X are expensive.

Suppose X ≈ Ẑ_L, and Ẑ_L is cheap.

Suppose X and Ẑ_L can be generated simultaneously so that Var(X - Ẑ_L) is small.

Then use
E X = E[X - Ẑ_L] + E Ẑ_L ≈ (1/n_1) Σ_{i=1}^{n_1} (X^{[i]} - Ẑ_L^{[i]}) + (1/n_2) Σ_{i=1}^{n_2} Ẑ_L^{[i]}.

Multi-level Monte Carlo (Mike Giles, Stefan Heinrich): keep going,
E X = E(X - Ẑ_L) + E Ẑ_L = E(X - Ẑ_L) + E(Ẑ_L - Ẑ_{L-1}) + E Ẑ_{L-1} = ···
Multi-level Monte Carlo: an unbiased estimator

In our setting:
E f(X(t)) = E[ f(X(t)) - f(Z_L(t)) ] + Σ_{l=l_0+1}^{L} E[ f(Z_l(t)) - f(Z_{l-1}(t)) ] + E f(Z_{l_0}(t)).

For appropriate choices of n_0, n_l, and n_E, we define estimators for the three terms above via
Q_E := (1/n_E) Σ_{i=1}^{n_E} ( f(X^{[i]}(T)) - f(Z_L^{[i]}(T)) ),
Q_l := (1/n_l) Σ_{i=1}^{n_l} ( f(Z_l^{[i]}(T)) - f(Z_{l-1}^{[i]}(T)) ), for l ∈ {l_0 + 1, ..., L},
Q_0 := (1/n_0) Σ_{i=1}^{n_0} f(Z_{l_0}^{[i]}(T)),
and note that
Q := Q_E + Σ_{l=l_0+1}^{L} Q_l + Q_0
is an unbiased estimator for E f(X(T)).

So what are the coupling and the variance of the estimator?
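Assembling the estimator is mechanical once samplers for the three kinds of terms exist. A sketch with the samplers as hypothetical black boxes (the couplings that make the pairwise samples low-variance are the real content, treated on the following slides):

```python
import numpy as np

def mlmc_estimate(sample_base, sample_level_pair, sample_exact_pair,
                  l0, L, n0, n_l, nE, rng):
    """Assemble the unbiased MLMC estimator Q = Q_E + sum_l Q_l + Q_0.

    sample_base(rng)           -> one sample of f(Z_{l0}(T))
    sample_level_pair(l, rng)  -> one coupled sample of f(Z_l(T)) - f(Z_{l-1}(T))
    sample_exact_pair(rng)     -> one coupled sample of f(X(T)) - f(Z_L(T))
    n_l : dict mapping level l to its number of paths
    """
    Q0 = np.mean([sample_base(rng) for _ in range(n0)])
    Ql = sum(np.mean([sample_level_pair(l, rng) for _ in range(n_l[l])])
             for l in range(l0 + 1, L + 1))
    QE = np.mean([sample_exact_pair(rng) for _ in range(nE)])
    return Q0 + Ql + QE
```

Unbiasedness is inherited term by term: each sampler is an unbiased estimator of its telescoping term, so the sum is unbiased for E f(X(T)).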
How do we generate processes simultaneously?

Suppose I want to generate:
1. a Poisson process with intensity 13.1, and
2. a Poisson process with intensity 13.

We could let Y_1 and Y_2 be independent, unit-rate Poisson processes and set
Z_{13.1}(t) = Y_1(13.1 t), Z_{13}(t) = Y_2(13 t).
Using this representation, the processes are independent and, hence, not coupled. The variance of the difference is large:
Var( Z_{13.1}(t) - Z_{13}(t) ) = Var(Y_1(13.1 t)) + Var(Y_2(13 t)) = 26.1 t.
How do we generate processes simultaneously?

Instead, we could let Y_1 and Y_2 be independent, unit-rate Poisson processes and set
Z_{13.1}(t) = Y_1(13 t) + Y_2(0.1 t), Z_{13}(t) = Y_1(13 t).
The variance of the difference is much smaller:
Var( Z_{13.1}(t) - Z_{13}(t) ) = Var(Y_2(0.1 t)) = 0.1 t.
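At a fixed time t this coupling is trivial to realize, since Y_1(13 t) and Y_2(0.1 t) are just independent Poisson counts; the shared stream cancels exactly in the difference. A sketch (function name is illustrative):

```python
import numpy as np

def coupled_counts(t, rng):
    """Counts at time t of two Poisson processes with rates 13.1 and 13,
    coupled through a shared unit-rate stream:
        Z_13.1(t) = Y1(13 t) + Y2(0.1 t),   Z_13(t) = Y1(13 t),
    so their difference is exactly Y2(0.1 t) ~ Poisson(0.1 t)."""
    y1 = rng.poisson(13.0 * t)   # shared clock, cancels in the difference
    y2 = rng.poisson(0.1 * t)    # thin residual clock
    return y1 + y2, y1
```

Empirically the variance of the difference sits near 0.1 t, versus 26.1 t for the independent construction.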
How do we generate processes simultaneously?

More generally, suppose we want
1. a non-homogeneous Poisson process with intensity f(t), and
2. a non-homogeneous Poisson process with intensity g(t).

We can let Y_1, Y_2, and Y_3 be independent, unit-rate Poisson processes and define
Z_f(t) = Y_1( ∫_0^t f(s) ∧ g(s) ds ) + Y_2( ∫_0^t [f(s) - f(s) ∧ g(s)] ds ),
Z_g(t) = Y_1( ∫_0^t f(s) ∧ g(s) ds ) + Y_3( ∫_0^t [g(s) - f(s) ∧ g(s)] ds ),
where a ∧ b = min{a, b}, and where we are using that, for example,
Y_1( ∫_0^t f(s) ∧ g(s) ds ) + Y_2( ∫_0^t [f(s) - f(s) ∧ g(s)] ds ) =_d Y( ∫_0^t f(s) ds ),
where Y is a unit-rate Poisson process.
Back to our processes

X(t) = X(0) + Σ_k Y_k( ∫_0^t λ_k(X(s)) ds ) ζ_k,
Z(t) = X(0) + Σ_k Y_k( ∫_0^t λ_k(Z(η(s))) ds ) ζ_k.

Now couple:

X(t) = X(0) + Σ_k Y_{k,1}( ∫_0^t λ_k(X(s)) ∧ λ_k(Z_l(η_l(s))) ds ) ζ_k
+ Σ_k Y_{k,2}( ∫_0^t [λ_k(X(s)) - λ_k(X(s)) ∧ λ_k(Z_l(η_l(s)))] ds ) ζ_k,

Z_l(t) = Z_l(0) + Σ_k Y_{k,1}( ∫_0^t λ_k(X(s)) ∧ λ_k(Z_l(η_l(s))) ds ) ζ_k
+ Σ_k Y_{k,3}( ∫_0^t [λ_k(Z_l(η_l(s))) - λ_k(X(s)) ∧ λ_k(Z_l(η_l(s)))] ds ) ζ_k.

The algorithm for simulating this coupling is equivalent to the next reaction method or Gillespie's algorithm.
For the approximate processes:

Z_l(t) = Z_l(0) + Σ_k Y_{k,1}( ∫_0^t λ_k(Z_l(η_l(s))) ∧ λ_k(Z_{l-1}(η_{l-1}(s))) ds ) ζ_k
+ Σ_k Y_{k,2}( ∫_0^t [λ_k(Z_l(η_l(s))) - λ_k(Z_l(η_l(s))) ∧ λ_k(Z_{l-1}(η_{l-1}(s)))] ds ) ζ_k,

Z_{l-1}(t) = Z_{l-1}(0) + Σ_k Y_{k,1}( ∫_0^t λ_k(Z_l(η_l(s))) ∧ λ_k(Z_{l-1}(η_{l-1}(s))) ds ) ζ_k
+ Σ_k Y_{k,3}( ∫_0^t [λ_k(Z_{l-1}(η_{l-1}(s))) - λ_k(Z_l(η_l(s))) ∧ λ_k(Z_{l-1}(η_{l-1}(s)))] ds ) ζ_k.

The algorithm for simulation is equivalent in distribution to τ-leaping.
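For nested grids with h_{l-1} = M h_l, one coupled pair of tau-leap paths can be sketched as below: on each fine step every channel is split into a shared Poisson stream with the minimum of the two intensities plus two residual streams, with the coarse intensity frozen over the enclosing coarse step. (The interface mirrors the earlier single-path discretization; names are illustrative.)

```python
import numpy as np

def coupled_tau_leap(x0, stoich, propensities, T, h_fine, M, rng):
    """One coupled pair (Z_l, Z_{l-1}) of tau-leap paths, h_coarse = M * h_fine.
    Each channel fires via three independent Poisson streams with rates
        min(lam_f, lam_c),  lam_f - min,  lam_c - min,
    where lam_c is held frozen over the enclosing coarse step."""
    zf = np.asarray(x0, dtype=float).copy()   # fine path   Z_l
    zc = zf.copy()                            # coarse path Z_{l-1}
    n_steps = int(round(T / h_fine))
    for step in range(n_steps):
        if step % M == 0:                     # coarse grid point
            lam_c = np.maximum(propensities(zc), 0.0)
        lam_f = np.maximum(propensities(zf), 0.0)
        base = np.minimum(lam_f, lam_c)
        shared  = rng.poisson(base * h_fine)             # drives both paths
        extra_f = rng.poisson((lam_f - base) * h_fine)   # fine path only
        extra_c = rng.poisson((lam_c - base) * h_fine)   # coarse path only
        zf += (shared + extra_f) @ stoich
        zc += (shared + extra_c) @ stoich
    return zf, zc
```

When the two intensities agree (e.g. a constant-rate channel) both residual streams are silent and the paths coincide exactly, which is the mechanism behind the small coupled variance.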
Multi-level Monte Carlo: chemical kinetic setting

We can prove:¹

Theorem (Anderson, Higham 2011). Suppose (X, Z_l) satisfy the coupling. Then
sup_{t≤T} E |X(t) - Z_l(t)|^2 ≤ C_1(T) N^{-ρ} h_l + C_2(T) h_l^2.

Theorem (Anderson, Higham 2011). Suppose (Z_l, Z_{l-1}) satisfy the coupling. Then
sup_{t≤T} E |Z_l(t) - Z_{l-1}(t)|^2 ≤ C_1(T) N^{-ρ} h_l + C_2(T) h_l^2.

1 David F. Anderson and Desmond J. Higham, Multi-level Monte Carlo for stochastically modeled chemical kinetic systems. To appear in SIAM Multiscale Modeling and Simulation. Available at arXiv:1107.2181 and at www.math.wisc.edu/~anderson.
Multi-level Monte Carlo: an unbiased estimator

For well-chosen n_0, n_l, and n_E we have
Var(Q) = Var( Q_E + Σ_{l=l_0+1}^{L} Q_l + Q_0 ) = O(ε^2),
with
Comp. cost = ε^{-2} (N^{-ρ} h_L + h_L^2) N + ε^{-2} h_{l_0}^{-1} + ε^{-2} ln(ε^{-1})^2 N^{-ρ} + ε^{-2} ln(ε^{-1}) (M - 1)^{-1} h_{l_0}^{-1}.
Multi-level Monte Carlo: an unbiased estimator

Some observations:
1. The weak error plays no role in the analysis: we are free to choose h_L.
2. The common problems associated with tau-leaping (e.g. negativity of species numbers) do not matter: just define the process in a sensible way.
3. The method is unbiased.
Example

Consider a model of gene transcription and translation:
G → G + M, rate 25; M → M + P, rate 1000; P + P → D, rate 0.001; M → ∅, rate 0.1; P → ∅, rate 1.

Suppose we:
1. initialize with G = 1, M = 0, P = 0, D = 0,
2. want to estimate the expected number of dimers at time T = 1,
3. to an accuracy of ±1.0 with 95% confidence.
Example

Method: exact algorithm with crude Monte Carlo.

Approximation | # paths | CPU time | # updates
3,714.2 ± 1.0 | 4,740,000 | 149,000 CPU s (41 hours!) | 8.27 × 10^10

Method: Euler tau-leaping with crude Monte Carlo.

Step-size | Approximation | # paths | CPU time | # updates
h = 3^-7 | 3,712.3 ± 1.0 | 4,750,000 | 13,374.6 s | 6.2 × 10^10
h = 3^-6 | 3,707.5 ± 1.0 | 4,750,000 | 6,207.9 s | 2.1 × 10^10
h = 3^-5 | 3,693.4 ± 1.0 | 4,700,000 | 2,803.9 s | 6.9 × 10^9
h = 3^-4 | 3,654.6 ± 1.0 | 4,650,000 | 1,219.0 s | 2.6 × 10^9

Method: unbiased MLMC with l_0 = 2 and with M and L detailed below.

Step-size parameters | Approximation | CPU time | # updates
M = 3, L = 6 | 3,713.9 ± 1.0 | 1,063.3 s | 1.1 × 10^9
M = 3, L = 5 | 3,714.7 ± 1.0 | 1,114.9 s | 9.4 × 10^8
M = 3, L = 4 | 3,714.2 ± 1.0 | 1,656.6 s | 1.0 × 10^9
M = 4, L = 4 | 3,714.2 ± 1.0 | 1,334.8 s | 1.1 × 10^9
M = 4, L = 5 | 3,713.8 ± 1.0 | 1,104.9 s | 1.1 × 10^9

The exact algorithm with crude Monte Carlo demanded 140 times more CPU time than our unbiased MLMC estimator!
Example

Method: exact algorithm with crude Monte Carlo.

Approximation | # paths | CPU time | # updates
3,714.2 ± 1.0 | 4,740,000 | 149,000 CPU s (41 hours!) | 8.27 × 10^10

Unbiased multi-level Monte Carlo with M = 3, L = 5, and l_0 = 2:

Level | # paths | CPU time | Var. of estimator | # updates
(X, Z_{3^-5}) | 3,900 | 279.6 s | 0.0658 | 6.8 × 10^7
(Z_{3^-5}, Z_{3^-4}) | 30,000 | 49.0 s | 0.0217 | 8.8 × 10^7
(Z_{3^-4}, Z_{3^-3}) | 150,000 | 71.7 s | 0.0179 | 1.5 × 10^8
(Z_{3^-3}, Z_{3^-2}) | 510,000 | 112.3 s | 0.0319 | 1.7 × 10^8
Tau-leap with h = 3^-2 | 8,630,000 | 518.4 s | 0.1192 | 4.7 × 10^8
Totals | N.A. | 1,031.0 s | 0.2565 | 9.5 × 10^8
Some conclusions about this method

1. Gillespie's algorithm is by far the most common way to compute expectations:
1.1 Means.
1.2 Variances.
1.3 Probabilities.
2. The new method (MLMC) also performs this task with no bias (it is exact).
3. It will be at worst the same speed as Gillespie (exact algorithm + crude Monte Carlo).
4. It will commonly be many orders of magnitude faster.
5. It is applicable to essentially all continuous time Markov chains:
X(t) = X(0) + Σ_k Y_k( ∫_0^t λ_k(X(s)) ds ) ζ_k.
6. Con: it is substantially harder to implement; good software is needed.
7. It makes no use of any specific structure or scaling in the problem.
Another example: viral infection

Let
1. T = viral template,
2. G = viral genome,
3. S = viral structure,
4. V = virus.

Reactions:
R1: T + stuff → T + G, κ_1 = 1
R2: G → T, κ_2 = 0.025
R3: T + stuff → T + S, κ_3 = 1000
R4: T → ∅, κ_4 = 0.25
R5: S → ∅, κ_5 = 2
R6: G + S → V, κ_6 = 7.5 × 10^-6

R. Srivastava, L. You, J. Summers, and J. Yin, J. Theoret. Biol., 2002.
E. Haseltine and J. Rawlings, J. Chem. Phys., 2002.
K. Ball, T. Kurtz, L. Popovic, and G. Rempala, Annals of Applied Probability, 2006.
W. E, D. Liu, and E. Vanden-Eijnden, J. Comput. Phys., 2006.
Another example: viral infection

Stochastic equations for X = (X_G, X_S, X_T, X_V) are

X_1(t) = X_1(0) + Y_1( ∫_0^t X_3(s) ds ) - Y_2( 0.025 ∫_0^t X_1(s) ds ) - Y_6( 7.5 × 10^-6 ∫_0^t X_1(s) X_2(s) ds ),
X_2(t) = X_2(0) + Y_3( 1000 ∫_0^t X_3(s) ds ) - Y_5( 2 ∫_0^t X_2(s) ds ) - Y_6( 7.5 × 10^-6 ∫_0^t X_1(s) X_2(s) ds ),
X_3(t) = X_3(0) + Y_2( 0.025 ∫_0^t X_1(s) ds ) - Y_4( 0.25 ∫_0^t X_3(s) ds ),
X_4(t) = X_4(0) + Y_6( 7.5 × 10^-6 ∫_0^t X_1(s) X_2(s) ds ).
Another example: viral infection

Reactions:
R1: T + stuff → T + G, κ_1 = 1
R2: G → T, κ_2 = 0.025
R3: T + stuff → T + S, κ_3 = 1000
R4: T → ∅, κ_4 = 0.25
R5: S → ∅, κ_5 = 2
R6: G + S → V, κ_6 = 7.5 × 10^-6

If X_T > 0, reactions 3 and 5 are much faster than the others, and S looks approximately Poisson(500 X_T) distributed.

We can average out the fast reactions to get an approximate process Z(t).
Another example: viral infection

The approximate (averaged) process satisfies

Z_1(t) = X_1(0) + Y_1( ∫_0^t Z_3(s) ds ) - Y_2( 0.025 ∫_0^t Z_1(s) ds ) - Y_6( 3.75 × 10^-3 ∫_0^t Z_1(s) Z_3(s) ds ),
Z_3(t) = X_3(0) + Y_2( 0.025 ∫_0^t Z_1(s) ds ) - Y_4( 0.25 ∫_0^t Z_3(s) ds ),
Z_4(t) = X_4(0) + Y_6( 3.75 × 10^-3 ∫_0^t Z_1(s) Z_3(s) ds ).   (1)

Now use
E f(X(t)) = E[ f(X(t)) - f(Z(t)) ] + E f(Z(t)).
Another example: viral infection

X(t) = X(0)
+ Y_{1,1}( ∫_0^t X_3(s) ∧ Z_3(s) ds ) ζ_1 + Y_{1,2}( ∫_0^t [X_3(s) - X_3(s) ∧ Z_3(s)] ds ) ζ_1
+ Y_{2,1}( 0.025 ∫_0^t X_1(s) ∧ Z_1(s) ds ) ζ_2 + Y_{2,2}( 0.025 ∫_0^t [X_1(s) - X_1(s) ∧ Z_1(s)] ds ) ζ_2
+ Y_3( 1000 ∫_0^t X_3(s) ds ) ζ_3
+ Y_{4,1}( 0.25 ∫_0^t X_3(s) ∧ Z_3(s) ds ) ζ_4 + Y_{4,2}( 0.25 ∫_0^t [X_3(s) - X_3(s) ∧ Z_3(s)] ds ) ζ_4
+ Y_5( 2 ∫_0^t X_2(s) ds ) ζ_5
+ Y_{6,1}( ∫_0^t λ_6(X(s)) ∧ Λ_6(Z(s)) ds ) ζ_6 + Y_{6,2}( ∫_0^t [λ_6(X(s)) - λ_6(X(s)) ∧ Λ_6(Z(s))] ds ) ζ_6,

Z(t) = Z(0)
+ Y_{1,1}( ∫_0^t X_3(s) ∧ Z_3(s) ds ) ζ_1 + Y_{1,3}( ∫_0^t [Z_3(s) - X_3(s) ∧ Z_3(s)] ds ) ζ_1
+ Y_{2,1}( 0.025 ∫_0^t X_1(s) ∧ Z_1(s) ds ) ζ_2 + Y_{2,3}( 0.025 ∫_0^t [Z_1(s) - X_1(s) ∧ Z_1(s)] ds ) ζ_2
+ Y_{4,1}( 0.25 ∫_0^t X_3(s) ∧ Z_3(s) ds ) ζ_4 + Y_{4,3}( 0.25 ∫_0^t [Z_3(s) - X_3(s) ∧ Z_3(s)] ds ) ζ_4
+ Y_{6,1}( ∫_0^t λ_6(X(s)) ∧ Λ_6(Z(s)) ds ) ζ_6 + Y_{6,3}( ∫_0^t [Λ_6(Z(s)) - λ_6(X(s)) ∧ Λ_6(Z(s))] ds ) ζ_6,

where λ_6(x) = 7.5 × 10^-6 x_1 x_2 and Λ_6(z) = 3.75 × 10^-3 z_1 z_3.
Another example: viral infection

Suppose we want E X_virus(20), given T(0) = 10 and all other counts zero.

Method: exact algorithm with crude Monte Carlo.

Approximation | # paths | CPU time | # updates
13.85 ± 0.07 | 75,000 | 24,800 CPU s | 1.45 × 10^10

Method: E f(X(t)) = E[ f(X(t)) - f(Z(t)) ] + E f(Z(t)).

Approximation | CPU time | # updates
13.91 ± 0.07 | 1,118.5 CPU s | 2.41 × 10^8

Exact + crude Monte Carlo used:
1. 60 times more total steps.
2. 22 times more CPU time.
Mathematical Analysis

We had
X(t) = X(0) + Σ_k Y_k( ∫_0^t λ_k(X(s)) ds ) ζ_k,
and we assumed Σ_k λ_k(X(0)) ≈ N ≫ 1.

There are therefore two extreme parameters floating around our models:
1. some parameter N ≫ 1 (inherent to the model), and
2. h, the stepsize (inherent to the approximation).
To quantify errors, we need to account for both.
Mathematical Analysis: scaling in the style of Thomas Kurtz

For each species i, define the normalized abundance X_i^N(t) = N^{-α_i} X_i(t), where α_i is selected so that X_i^N = O(1).

Rate constants κ_k may also vary over several orders of magnitude. We write κ_k = κ'_k N^{β_k}, where the β_k are selected so that κ'_k = O(1).

This eventually leads to the scaled model
X^N(t) = X^N(0) + Σ_k Y_k( N^γ ∫_0^t N^{β_k + α · ν_k - γ} λ_k(X^N(s)) ds ) ζ_k^N.
Results

Let ρ_k satisfy |ζ_k^N| ∝ N^{-ρ_k}, and set ρ = min_k {ρ_k}. Write
X^N(t) = X^N(0) + Σ_k Y_k( N^γ ∫_0^t N^{c_k} λ_k(X^N(s)) ds ) ζ_k^N.

Theorem (A., Higham 2011). Suppose (Z_l^N, Z_{l-1}^N) satisfy the coupling with Z_l^N(0) = Z_{l-1}^N(0). Then
sup_{t≤T} E |Z_l^N(t) - Z_{l-1}^N(t)|^2 ≤ C_1(T, N, γ) N^{-ρ} h_l + C_2(T, N, γ) h_l^2.
Results

Let ρ_k satisfy |ζ_k^N| ∝ N^{-ρ_k}, and set ρ = min_k {ρ_k}. Write
X^N(t) = X^N(0) + Σ_k Y_k( N^γ ∫_0^t N^{c_k} λ_k(X^N(s)) ds ) ζ_k^N.

Theorem (A., Higham 2011). Suppose (X^N, Z_l^N) satisfy the coupling with X^N(0) = Z_l^N(0). Then
sup_{t≤T} E |X^N(t) - Z_l^N(t)|^2 ≤ C_1(T, N, γ) N^{-ρ} h_l + C_2(T, N, γ) h_l^2.
Flavor of Proof

Theorem (A., Higham 2011). Suppose (X^N, Z_l^N) satisfy the coupling with X^N(0) = Z_l^N(0). Then
sup_{t≤T} E |X^N(t) - Z_l^N(t)|^2 ≤ C_1(T, N, γ) N^{-ρ} h_l + C_2(T, N, γ) h_l^2.

The coupled processes are

X^N(t) = X^N(0) + Σ_k Y_{k,1}( N^γ ∫_0^t N^{c_k} [λ_k(X^N(s)) ∧ λ_k(Z_l^N(η_l(s)))] ds ) ζ_k^N
+ Σ_k Y_{k,2}( N^γ ∫_0^t N^{c_k} [λ_k(X^N(s)) - λ_k(X^N(s)) ∧ λ_k(Z_l^N(η_l(s)))] ds ) ζ_k^N,

Z_l^N(t) = Z_l^N(0) + Σ_k Y_{k,1}( N^γ ∫_0^t N^{c_k} [λ_k(X^N(s)) ∧ λ_k(Z_l^N(η_l(s)))] ds ) ζ_k^N
+ Σ_k Y_{k,3}( N^γ ∫_0^t N^{c_k} [λ_k(Z_l^N(η_l(s))) - λ_k(X^N(s)) ∧ λ_k(Z_l^N(η_l(s)))] ds ) ζ_k^N.
Flavor of Proof

So,
X^N(t) - Z_l^N(t) = Σ_k Y_{k,2}( N^γ ∫_0^t N^{c_k} [λ_k(X^N(s)) - λ_k(X^N(s)) ∧ λ_k(Z_l^N(η_l(s)))] ds ) ζ_k^N
- Σ_k Y_{k,3}( N^γ ∫_0^t N^{c_k} [λ_k(Z_l^N(η_l(s))) - λ_k(X^N(s)) ∧ λ_k(Z_l^N(η_l(s)))] ds ) ζ_k^N.

Hence, centering the Poisson processes,
X^N(t) - Z_l^N(t) = M^N(t) + Σ_k ζ_k^N N^γ N^{c_k} ∫_0^t ( λ_k(X^N(s)) - λ_k(Z_l^N(η_l(s))) ) ds,
where M^N is a martingale. Now work.
Next problem: parameter sensitivities

Motivated by Jim Rawlings.

We have
X^θ(t) = X^θ(0) + Σ_k Y_k( ∫_0^t λ_k^θ(X^θ(s)) ds ) ζ_k,
and we define J(θ) = E f(X^θ(t)). We want
J'(θ) = (d/dθ) E f(X^θ(t)).
There are multiple methods; we consider finite differences:
J'(θ) = ( J(θ + ε) - J(θ) ) / ε + O(ε).
Next problem: parameter sensitivities

Noting that
J'(θ) = (d/dθ) E f(X^θ(t)) = ( E f(X^{θ+ε}(t)) - E f(X^θ(t)) ) / ε + O(ε),
the usual finite difference estimator is
D_R(ε) = ε^{-1} ( (1/R) Σ_{i=1}^R f(X^{θ+ε}_{[i]}(t)) - (1/R) Σ_{j=1}^R f(X^θ_{[j]}(t)) ).
If the paths are generated independently, then Var(D_R(ε)) = O(R^{-1} ε^{-2}).
Next problem: parameter sensitivities

Couple the processes:

X^{θ+ε}(t) = X^{θ+ε}(0) + Σ_k Y_{k,1}( ∫_0^t λ_k^{θ+ε}(X^{θ+ε}(s)) ∧ λ_k^θ(X^θ(s)) ds ) ζ_k
+ Σ_k Y_{k,2}( ∫_0^t [λ_k^{θ+ε}(X^{θ+ε}(s)) - λ_k^{θ+ε}(X^{θ+ε}(s)) ∧ λ_k^θ(X^θ(s))] ds ) ζ_k,

X^θ(t) = X^θ(0) + Σ_k Y_{k,1}( ∫_0^t λ_k^{θ+ε}(X^{θ+ε}(s)) ∧ λ_k^θ(X^θ(s)) ds ) ζ_k
+ Σ_k Y_{k,3}( ∫_0^t [λ_k^θ(X^θ(s)) - λ_k^{θ+ε}(X^{θ+ε}(s)) ∧ λ_k^θ(X^θ(s))] ds ) ζ_k.

Use:
D_R(ε) = ε^{-1} (1/R) Σ_{i=1}^R [ f(X^{θ+ε}_{[i]}(t)) - f(X^θ_{[i]}(t)) ].
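This coupled pair can be simulated exactly by treating the three clocks of each reaction as separate channels in an ordinary SSA loop. A sketch under that interpretation (interface and names are mine; `lam(x, theta)` is a hypothetical intensity function):

```python
import numpy as np

def coupled_fd_pair(x0, stoich, lam, theta, eps, T, rng):
    """Exact simulation of the coupled pair (X^{theta+eps}, X^{theta}).
    Each reaction k is driven by three clocks with rates
        min(a_k, b_k),  a_k - min,  b_k - min,
    where a_k = lam_k(xp, theta+eps) and b_k = lam_k(x, theta)."""
    xp = np.asarray(x0, dtype=float).copy()   # perturbed path
    x = xp.copy()                             # nominal path
    t, K = 0.0, len(stoich)
    while True:
        a, b = lam(xp, theta + eps), lam(x, theta)
        m = np.minimum(a, b)
        rates = np.concatenate([m, a - m, b - m])  # shared / xp-only / x-only
        r0 = rates.sum()
        if r0 == 0.0:
            return xp, x
        t += rng.exponential(1.0 / r0)
        if t > T:
            return xp, x
        j = rng.choice(3 * K, p=rates / r0)
        k = j % K
        if j < K:
            xp += stoich[k]; x += stoich[k]    # shared clock: both paths jump
        elif j < 2 * K:
            xp += stoich[k]                    # perturbed path jumps alone
        else:
            x += stoich[k]                     # nominal path jumps alone
```

For a pure immigration process with rate θ, E X^θ(T) = θ T, so the coupled difference quotient should estimate the derivative T; the shared clock keeps the per-path variance small.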
Next problem: parameter sensitivities

Theorem (Anderson, 2011)¹. Suppose (X^{θ+ε}, X^θ) satisfy the coupling. Then, for any T > 0 there is a C_{T,f} > 0 for which
sup_{t≤T} E [ ( f(X^{θ+ε}(t)) - f(X^θ(t)) )^2 ] ≤ C_{T,f} ε.

This lowers the variance of the estimator from O(R^{-1} ε^{-2}) to O(R^{-1} ε^{-1}): lowered by an order of magnitude (in ε).

1 David F. Anderson, An efficient finite difference method for parameter sensitivities of continuous time Markov chains. Submitted. Available at arXiv:1109.2890 and at www.math.wisc.edu/~anderson.
Parameter Sensitivities

G → G + M, rate 2; M → M + P, rate 10; M → ∅, rate k; P → ∅, rate 1.

We want (d/dk) E[ X_protein^k(30) ] at k ≈ 1/4.

Method | # paths | Approximation | # updates | CPU time
Likelihood ratio | 689,600 | -312.1 ± 6.0 | 2.9 × 10^9 | 3,506.6 s
Exact/naive FD | 246,200 | -318.8 ± 6.0 | 2.1 × 10^9 | 3,282.1 s
CRP | 26,320 | -320.7 ± 6.0 | 2.2 × 10^8 | 410.0 s
Coupled | 4,780 | -321.2 ± 6.0 | 2.1 × 10^7 | 35.3 s
Analysis

Theorem. Suppose (X^{θ+ε}, X^θ) satisfy the coupling. Then, for any T > 0 there is a C_{T,f} > 0 for which
E sup_{t≤T} ( f(X^{θ+ε}(t)) - f(X^θ(t)) )^2 ≤ C_{T,f} ε.

Proof: the key observation is that
X^{θ+ε}(t) - X^θ(t) = M^{θ,ε}(t) + ∫_0^t ( F^{θ+ε}(X^{θ+ε}(s)) - F^θ(X^θ(s)) ) ds,
where M^{θ,ε} is a martingale and F^θ(x) = Σ_k λ_k^θ(x) ζ_k is the drift. Now work on the martingale and the absolutely continuous part.
Thanks!

References:
1. David F. Anderson and Desmond J. Higham, Multi-level Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics, to appear in SIAM Multiscale Modeling and Simulation. Available at arXiv:1107.2181 and at www.math.wisc.edu/~anderson.
2. David F. Anderson, An efficient finite difference method for parameter sensitivities of continuous time Markov chains, submitted. Available at arXiv:1109.2890 and at www.math.wisc.edu/~anderson.

Funding: NSF-DMS-1009275.