Stochastic models of biochemical systems

Stochastic models of biochemical systems
David F. Anderson (anderson@math.wisc.edu), Department of Mathematics, University of Wisconsin-Madison
University of Amsterdam, November 14th, 2012

Stochastic models of biochemical systems. Goal: give a broad introduction to stochastic models of biochemical systems, with minimal technical details. Outline:
1. Construct a useful representation for the most common continuous time Markov chain model for population processes.
2. Discuss some computational methods: sensitivity analysis.
3. Discuss various approximate models for these CTMCs.

Example: ODE Lotka-Volterra predator-prey model. Think of A as a prey and B as a predator:
\[ A \xrightarrow{\kappa_1} 2A, \qquad A + B \xrightarrow{\kappa_2} 2B, \qquad B \xrightarrow{\kappa_3} \emptyset, \]
with $\kappa_1 = 2$, $\kappa_2 = 0.02$, $\kappa_3 = 2$.

Deterministic model. Let $x(t) = [\#\text{ prey at } t,\ \#\text{ predators at } t]^T$. Then
\[ \dot x(t) = \kappa_1 x_1(t)\begin{bmatrix}1\\0\end{bmatrix} + \kappa_2 x_1(t)x_2(t)\begin{bmatrix}-1\\1\end{bmatrix} + \kappa_3 x_2(t)\begin{bmatrix}0\\-1\end{bmatrix}, \]
or, in integrated form,
\[ x(t) = x(0) + \kappa_1\int_0^t x_1(s)\,ds\begin{bmatrix}1\\0\end{bmatrix} + \kappa_2\int_0^t x_1(s)x_2(s)\,ds\begin{bmatrix}-1\\1\end{bmatrix} + \kappa_3\int_0^t x_2(s)\,ds\begin{bmatrix}0\\-1\end{bmatrix}. \]

Lotka-Volterra. Think of A as a prey and B as a predator: $A \xrightarrow{\kappa_1} 2A$, $A + B \xrightarrow{\kappa_2} 2B$, $B \xrightarrow{\kappa_3} \emptyset$, with $\kappa_1 = 2$, $\kappa_2 = 0.02$, $\kappa_3 = 2$.

[Figure: prey and predator trajectories of the deterministic model, oscillating between roughly 70 and 150 over the time window shown.]

Biological example: transcription-translation. Gene transcription and translation:
\[ G \xrightarrow{\kappa_1} G + M \ (\text{transcription}), \qquad M \xrightarrow{\kappa_2} M + P \ (\text{translation}), \]
\[ M \xrightarrow{\kappa_3} \emptyset \ (\text{degradation}), \qquad P \xrightarrow{\kappa_4} \emptyset \ (\text{degradation}), \qquad G + P \underset{\kappa_{-5}}{\overset{\kappa_5}{\rightleftarrows}} B \ (\text{binding/unbinding of gene}). \]
Cartoon representation:¹
\[ X_1 \underset{N^{\alpha} q_2}{\overset{N^{\alpha} q_1}{\rightleftarrows}} X_2, \qquad X_1 \xrightarrow{N^{\lambda_1}} X_1 + M, \qquad X_2 \xrightarrow{N^{\lambda_2}} X_2 + M, \qquad M \xrightarrow{\mu} \emptyset. \]
¹ J. Paulsson, Physics of Life Reviews, 2, 2005, 157-175.

Another example: Viral infection. Let: 1. T = viral template; 2. G = viral genome; 3. S = viral structure; 4. V = virus. Reactions:
R1) $T + \text{stuff} \xrightarrow{\kappa_1} T + G$, with $\kappa_1 = 1$
R2) $G \xrightarrow{\kappa_2} T$, with $\kappa_2 = 0.025$
R3) $T + \text{stuff} \xrightarrow{\kappa_3} T + S$, with $\kappa_3 = 1000$
R4) $T \xrightarrow{\kappa_4} \emptyset$, with $\kappa_4 = 0.25$
R5) $S \xrightarrow{\kappa_5} \emptyset$, with $\kappa_5 = 2$
R6) $G + S \xrightarrow{\kappa_6} V$, with $\kappa_6 = 7.5 \times 10^{-6}$
R. Srivastava, L. You, J. Summers, and J. Yin, J. Theoret. Biol., 2002. E. Haseltine and J. Rawlings, J. Chem. Phys., 2002. K. Ball, T. Kurtz, L. Popovic, and G. Rempala, Annals of Applied Probability, 2006. W. E, D. Liu, and E. Vanden-Eijnden, J. Comput. Phys., 2006.

Some examples. E. coli heat shock response model: 9 species, 18 reactions.²
² Hye Won Kang, presentation at SPA in 2007.

Modeling. 1. These models (and much more complicated ones) have historically been modeled predominantly using ODEs. 2. However: 2.1 there are often low numbers of molecules, which makes the timing of reactions more random (less averaging); 2.2 when a reaction occurs, the system jumps to a new state by a non-trivial amount: $\pm 1$. 3. Researchers (mostly) lived with these shortcomings until the late 1990s and early 2000s, when it was shown that ODE models cannot capture important qualitative behavior of certain systems: the λ-phage lysis-lysogeny decision mechanism (Arkin-McAdams 1998); green fluorescent protein. ODEs were often the wrong modeling choice.

Specifying infinitesimal behavior. Q: What is a better modeling choice? It should have: 1. discrete space, since we are counting molecules; and 2. stochastic dynamics. Let's return to the development of ODEs. An ordinary differential equation is specified by describing how a function should vary over a small period of time:
\[ X(t + \Delta t) - X(t) \approx F(X(t))\,\Delta t. \]
A more precise description (consider a telescoping sum):
\[ X(t) = X(0) + \int_0^t F(X(s))\,ds. \]

Infinitesimal behavior for jump processes. We are interested in functions that are piecewise constant and random. Changes, when they occur, won't be small: if reaction $k$ occurs at time $t$,
\[ X(t) - X(t-) = \zeta_k \in \mathbb{Z}^d. \]
What is small? The probability of seeing a jump of a particular size:
\[ P\{X(t + \Delta t) - X(t) = \zeta_k \mid \mathcal{F}_t\} \approx \lambda_{\zeta_k}(t)\,\Delta t. \]
Question: can we specify the $\lambda_{\zeta_k}$ in some way that determines $X$? For the ODE, $F$ depended on $X$. Maybe $\lambda_{\zeta_k}$ should depend on $X$?

Simple model. For example, consider the simple system
\[ A + B \to C, \]
where one molecule each of A and B is being converted to one of C. Intuition for the standard stochastic model:
\[ P\{\text{reaction occurs in } (t, t + \Delta t] \mid \mathcal{F}_t\} \approx \kappa X_A(t) X_B(t)\,\Delta t, \]
where $\kappa$ is a positive constant, the reaction rate constant, and $\mathcal{F}_t$ is all the information pertaining to the process up through time $t$. Can we specify a reasonable model satisfying this assumption?

Background information: the Poisson process. We will view a Poisson process, $Y(\cdot)$, through the lens of an underlying point process. (a) Let $\{e_i\}$ be i.i.d. exponential random variables with parameter one. (b) Now, put points down on a line with spacings equal to the $e_i$. Let $Y_1(t)$ denote the number of points hit by time $t$; in the slide's figure, $Y_1(t) = 6$.

[Figure: points on a line separated by gaps $e_1, e_2, e_3, \dots$; below it, a sample path of a unit-rate ($\lambda = 1$) Poisson process.]

The Poisson process. Let $Y_1$ be a unit-rate Poisson process and define
\[ Y_\lambda(t) \equiv Y_1(\lambda t). \]
Then $Y_\lambda$ is a Poisson process with parameter $\lambda$. Intuition: the Poisson process with rate $\lambda$ is simply the number of points hit (of the unit-rate point process) when we run along the time frame at rate $\lambda$.

[Figure: the same point process; below it, a sample path of a rate $\lambda = 3$ Poisson process.]

The Poisson process. There is no reason $\lambda$ needs to be constant in time, in which case
\[ Y_\lambda(t) \equiv Y\!\left(\int_0^t \lambda(s)\,ds\right) \]
is a non-homogeneous Poisson process with propensity/intensity $\lambda(t)$. Thus
\[ P\{Y_\lambda(t + \Delta t) - Y_\lambda(t) > 0 \mid \mathcal{F}_t\} = 1 - \exp\left\{-\int_t^{t + \Delta t} \lambda(s)\,ds\right\} \approx \lambda(t)\,\Delta t. \]
Points: 1. We have changed time to convert a unit-rate Poisson process to one which has rate (or intensity, or propensity) $\lambda(t)$. 2. We will use similar time changes of unit-rate processes to build the models of interest.
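
The slides contain no code, but the time-change picture is easy to check numerically. Below is a minimal Python sketch (numpy and all names here are my additions, not from the talk): build the unit-rate point process from i.i.d. Exp(1) gaps, then read off $Y_\lambda(t) = Y_1(\lambda t)$ by rescaling the jump times.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_rate_poisson_jumps(T, rng):
    """Jump times of a unit-rate Poisson process on [0, T]:
    cumulative sums of i.i.d. Exp(1) gaps e_i."""
    jumps = []
    t = rng.exponential(1.0)
    while t <= T:
        jumps.append(t)
        t += rng.exponential(1.0)
    return np.array(jumps)

# Y_lambda(t) = Y_1(lambda * t): run along the unit-rate point process
# at speed lambda, so a point at internal time s is hit at real time s / lambda.
lam, T = 3.0, 20.0
internal_jumps = unit_rate_poisson_jumps(lam * T, rng)
real_jumps = internal_jumps / lam

print("Y_1(lam * T) =", len(internal_jumps))   # approximately lam * T = 60
print("Y_lam(T)     =", len(real_jumps))       # same count, by construction
```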

Return to models of interest. Consider the simple system $A + B \to C$, where one molecule each of A and B is being converted to one of C. Intuition for the standard stochastic model:
\[ P\{\text{reaction occurs in } (t, t + \Delta t] \mid \mathcal{F}_t\} \approx \kappa X_A(t) X_B(t)\,\Delta t, \]
where $\kappa$ is a positive constant, the reaction rate constant, and $\mathcal{F}_t$ is all the information pertaining to the process up through time $t$.

Models of interest. $A + B \to C$. Simple book-keeping says: if
\[ X(t) = \begin{bmatrix} X_A(t) \\ X_B(t) \\ X_C(t) \end{bmatrix} \]
gives the state at time $t$, then
\[ X(t) = X(0) + R(t)\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}, \]
where $R(t)$ is the number of times the reaction has occurred by time $t$ and $X(0)$ is the initial condition. Goal: represent $R(t)$ in terms of a Poisson process.

Models of interest. Recall that for $A + B \to C$ our intuition was to specify the infinitesimal behavior
\[ P\{\text{reaction occurs in } (t, t + \Delta t] \mid \mathcal{F}_t\} \approx \kappa X_A(t) X_B(t)\,\Delta t, \]
and that for a counting process with specified intensity $\lambda(t)$ we have
\[ P\{Y_\lambda(t + \Delta t) - Y_\lambda(t) = 1 \mid \mathcal{F}_t\} \approx \lambda(t)\,\Delta t. \]
This suggests we can model
\[ R(t) = Y\!\left(\int_0^t \kappa X_A(s) X_B(s)\,ds\right), \]
where $Y$ is a unit-rate Poisson process. Hence
\[ X(t) = X(0) + Y\!\left(\int_0^t \kappa X_A(s) X_B(s)\,ds\right)\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}. \]
This equation uniquely determines $X$ for all $t$.

Build up the model: random time change representation of Kurtz. Now consider a network of reactions involving $d$ chemical species $S_1, \dots, S_d$:
\[ \sum_{i=1}^d \nu_{ik} S_i \to \sum_{i=1}^d \nu_{ik}' S_i. \]
Denote the reaction vector by $\zeta_k = \nu_k' - \nu_k$, so that if reaction $k$ occurs at time $t$,
\[ X(t) = X(t-) + \zeta_k. \]
The intensity (or propensity) of the $k$th reaction is $\lambda_k : \mathbb{Z}^d \to \mathbb{R}_{\geq 0}$. By analogy with before,
\[ X(t) = X(0) + \sum_k R_k(t)\,\zeta_k, \quad\text{with}\quad X(t) = X(0) + \sum_k Y_k\!\left(\int_0^t \lambda_k(X(s))\,ds\right)\zeta_k, \]
where the $Y_k$ are independent, unit-rate Poisson processes.

Mass-action kinetics. The standard intensity function chosen is mass-action kinetics:
\[ \lambda_k(x) = \kappa_k \prod_i \binom{x_i}{\nu_{ik}} \nu_{ik}! = \kappa_k \prod_i \frac{x_i!}{(x_i - \nu_{ik})!}. \]
Example: if $S_1 \to$ anything, then $\lambda_k(x) = \kappa_k x_1$.
Example: if $S_1 + S_2 \to$ anything, then $\lambda_k(x) = \kappa_k x_1 x_2$.
Example: if $2S_2 \to$ anything, then $\lambda_k(x) = \kappa_k x_2(x_2 - 1)$.
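
As a concrete illustration (a sketch of mine, not from the slides), the falling-factorial product in $\lambda_k$ can be computed directly from the reactant counts $\nu_{ik}$; the three examples above are reproduced below.

```python
import numpy as np

def mass_action_intensity(kappa, nu_react, x):
    """lambda_k(x) = kappa_k * prod_i x_i! / (x_i - nu_ik)!
    nu_react: reactant counts (nu_1k, ..., nu_dk); x: current state."""
    rate = kappa
    for xi, ni in zip(x, nu_react):
        for j in range(ni):            # x_i (x_i - 1) ... (x_i - nu_ik + 1)
            rate *= (xi - j)
    return max(rate, 0.0)

x = np.array([5, 3])
print(mass_action_intensity(2.0, [1, 0], x))   # S1 -> ...   : 2 * 5 = 10
print(mass_action_intensity(2.0, [1, 1], x))   # S1 + S2 -> .: 2 * 5 * 3 = 30
print(mass_action_intensity(2.0, [0, 2], x))   # 2 S2 -> ... : 2 * 3 * 2 = 12
```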

Other ways to understand the model. The infinitesimal generator of a Markov process determines the process:
\[ Af(x) \stackrel{\text{def}}{=} \lim_{h \to 0} \frac{1}{h}\left[\mathbb{E}_x f(X(h)) - f(x)\right] = \lim_{h \to 0} \frac{1}{h}\left[\sum_k (f(x + \zeta_k) - f(x))\,P(R_k(h) = 1) + O(h)\right] \]
\[ = \lim_{h \to 0} \frac{1}{h}\left[\sum_k (f(x + \zeta_k) - f(x))\,\lambda_k(x)\,h + O(h)\right] = \sum_k \lambda_k(x)\,(f(x + \zeta_k) - f(x)). \]

Other ways to understand the model. And we have Dynkin's formula (see Ethier and Kurtz, 1986, Ch. 1):
\[ \mathbb{E} f(X(t)) - f(X(0)) = \mathbb{E}\int_0^t Af(X(s))\,ds. \]
Letting $f(y) = \mathbb{1}_x(y)$ above, so that $\mathbb{E}[f(X(t))] = P\{X(t) = x\} = p_t(x)$, gives the Kolmogorov forward equation (chemical master equation):
\[ p_t'(x) = \sum_k \lambda_k(x - \zeta_k)\,p_t(x - \zeta_k) - p_t(x)\sum_k \lambda_k(x). \]
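
For a small network the forward equation is a linear ODE on the (truncated) state space, $p_t' = p_t Q$, so it can be solved with a matrix exponential. A minimal sketch, assuming scipy and using a birth-death example of my own choosing:

```python
import numpy as np
from scipy.linalg import expm

# Toy birth-death network (my example): 0 -> S at rate kappa1,
# S -> 0 at rate kappa2 * n, with the state space truncated at n_max.
kappa1, kappa2, n_max = 10.0, 1.0, 60
Q = np.zeros((n_max + 1, n_max + 1))
for n in range(n_max + 1):
    if n < n_max:
        Q[n, n + 1] = kappa1           # birth intensity lambda_1(n)
    if n > 0:
        Q[n, n - 1] = kappa2 * n       # death intensity lambda_2(n)
    Q[n, n] = -Q[n].sum()              # generator rows sum to zero

p0 = np.zeros(n_max + 1); p0[0] = 1.0  # start with zero molecules
pt = p0 @ expm(Q * 2.0)                # p_t = p_0 e^{Qt} at t = 2
print("mean copy number:", pt @ np.arange(n_max + 1))
# approaches kappa1 / kappa2 = 10 as t grows
```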

Equivalence of formulations. We now have three ways of making the infinitesimal specification
\[ P\{X(t + \Delta t) - X(t) = \zeta_k \mid \mathcal{F}^X_t\} \approx \lambda_k(X(t))\,\Delta t \]
precise:
1. The stochastic equation: $X(t) = X(0) + \sum_k Y_k\!\left(\int_0^t \lambda_k(X(s))\,ds\right)\zeta_k$.
2. The process is Markov with infinitesimal generator $(Af)(x) = \sum_k \lambda_k(x)(f(x + \zeta_k) - f(x))$.
3. The master (forward) equation for the probability distributions: $p_t'(x) = \sum_k \lambda_k(x - \zeta_k)p_t(x - \zeta_k) - p_t(x)\sum_k \lambda_k(x)$.
Fortunately, if the solution of the stochastic equation doesn't blow up, the three are equivalent. This model is an example of a continuous time Markov chain.

Example: Lotka-Volterra predator-prey model. Think of A as a prey and B as a predator: $A \xrightarrow{\kappa_1} 2A$, $A + B \xrightarrow{\kappa_2} 2B$, $B \xrightarrow{\kappa_3} \emptyset$, with $\kappa_1 = 2$, $\kappa_2 = 0.02$, $\kappa_3 = 2$.

Deterministic model. Let $x(t) = [\#\text{prey}, \#\text{predators}]^T$:
\[ x(t) = x(0) + \kappa_1\int_0^t x_1(s)\,ds\begin{bmatrix}1\\0\end{bmatrix} + \kappa_2\int_0^t x_1(s)x_2(s)\,ds\begin{bmatrix}-1\\1\end{bmatrix} + \kappa_3\int_0^t x_2(s)\,ds\begin{bmatrix}0\\-1\end{bmatrix}. \]

Stochastic model. Let $X(t) = [\#\text{prey}, \#\text{predators}]^T$:
\[ X(t) = X(0) + Y_1\!\left(\kappa_1\int_0^t X_1(s)\,ds\right)\begin{bmatrix}1\\0\end{bmatrix} + Y_2\!\left(\kappa_2\int_0^t X_1(s)X_2(s)\,ds\right)\begin{bmatrix}-1\\1\end{bmatrix} + Y_3\!\left(\kappa_3\int_0^t X_2(s)\,ds\right)\begin{bmatrix}0\\-1\end{bmatrix}. \]

Another example: viral infection. Stochastic equations for $X = (X_G, X_S, X_T, X_V)$:
\[ X_1(t) = X_1(0) + Y_1\!\left(\int_0^t X_3(s)\,ds\right) - Y_2\!\left(0.025\int_0^t X_1(s)\,ds\right) - Y_6\!\left(7.5\times 10^{-6}\int_0^t X_1(s)X_2(s)\,ds\right) \]
\[ X_2(t) = X_2(0) + Y_3\!\left(1000\int_0^t X_3(s)\,ds\right) - Y_5\!\left(2\int_0^t X_2(s)\,ds\right) - Y_6\!\left(7.5\times 10^{-6}\int_0^t X_1(s)X_2(s)\,ds\right) \]
\[ X_3(t) = X_3(0) + Y_2\!\left(0.025\int_0^t X_1(s)\,ds\right) - Y_4\!\left(0.25\int_0^t X_3(s)\,ds\right) \]
\[ X_4(t) = X_4(0) + Y_6\!\left(7.5\times 10^{-6}\int_0^t X_1(s)X_2(s)\,ds\right). \]

Computational methods. These are continuous time Markov chains! Simulation/computation should be easy. The most common simulation methods include: 1. Gillespie's algorithm: answer "where" and "when" independently. 2. The next reaction method of Gibson and Bruck. Each is an example of discrete event simulation.
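
For concreteness, here is a minimal sketch of Gillespie's algorithm (the direct method) applied to the Lotka-Volterra network above; the reactions and rate constants follow the slides, while the implementation details and names are my own.

```python
import numpy as np

rng = np.random.default_rng(1)

# A -> 2A, A + B -> 2B, B -> 0, with kappa = (2, 0.02, 2).
kappa = np.array([2.0, 0.02, 2.0])
zeta = np.array([[1, 0], [-1, 1], [0, -1]])      # reaction vectors

def intensities(x):
    """Mass-action intensities lambda_k(x)."""
    return np.array([kappa[0] * x[0],
                     kappa[1] * x[0] * x[1],
                     kappa[2] * x[1]])

def gillespie(x0, T):
    t, x = 0.0, np.array(x0, dtype=float)
    path = [(t, x.copy())]
    while True:
        lam = intensities(x)
        lam0 = lam.sum()
        if lam0 == 0.0:                          # absorbed: no reaction can fire
            break
        t += rng.exponential(1.0 / lam0)         # "when": Exp(lam0) holding time
        if t > T:
            break
        k = rng.choice(len(lam), p=lam / lam0)   # "where": reaction k w.p. lam_k / lam0
        x += zeta[k]
        path.append((t, x.copy()))
    return path

path = gillespie([100, 100], 10.0)
print("final state (prey, predators):", path[-1][1])
```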

Numerical methods. Each exact method produces sample paths that can approximate values such as (which I will talk about tomorrow at CWI)
\[ \mathbb{E} f(X(t)) \approx \frac{1}{n}\sum_{i=1}^n f(X_{[i]}(t)), \]
for example: 1. means (expected virus yield), 2. variances, 3. probabilities; or sensitivities $\frac{d}{d\theta}\mathbb{E} f(\theta, X^\theta(t))$.
Problem: solving using these algorithms can be computationally expensive: 1. each path may require a significant number of computational steps; 2. we may require a significant number of paths. Solution: we need to use novel stochastic representations to get good methods.

Specific computational problem: gradient estimation/sensitivity analysis. We have
\[ X^\theta(t) = X^\theta(0) + \sum_k Y_k\!\left(\int_0^t \lambda_k(\theta, X^\theta(s))\,ds\right)\zeta_k, \]
with $\theta \in \mathbb{R}^s$, and we define $J(\theta) = \mathbb{E} f(\theta, X^\theta(t))$. We know how to estimate $J(\theta)$ using Monte Carlo. However, what if we want
\[ J'(\theta) = \frac{d}{d\theta}\mathbb{E} f(\theta, X^\theta(t))? \]
That is, we want to know how sensitive our statistic is to perturbations in $\theta$. This tells us, for example: 1. the robustness of the system to perturbations in parameters; 2. which parameters we need to estimate well from data, etc. There are multiple methods. We will consider: finite differences.

Finite differencing. This method is pretty straightforward and is therefore the most used. Simply note that
\[ J'(\theta) = \frac{J(\theta + \epsilon) - J(\theta)}{\epsilon} + O(\epsilon) = \mathbb{E}\left[\frac{f(\theta + \epsilon, X^{\theta+\epsilon}(t)) - f(\theta, X^\theta(t))}{\epsilon}\right] + O(\epsilon). \]
Centered differencing reduces the bias to $O(\epsilon^2)$. The usual finite difference estimator is
\[ D_N(\epsilon) = \frac{1}{N}\sum_{i=1}^N \frac{f(\theta + \epsilon, X^{\theta+\epsilon}_{[i]}(t)) - f(\theta, X^\theta_{[i]}(t))}{\epsilon}. \]
Letting $\delta > 0$ be some desired accuracy (for a confidence interval), we need $N$ so that $\sqrt{\operatorname{Var}(D_N(\epsilon))} \leq \delta$.

Finite differencing. We want $\sqrt{\operatorname{Var}(D_N(\epsilon))} \leq \delta$, with
\[ D_N(\epsilon) = \frac{1}{N}\sum_{i=1}^N \frac{f(\theta + \epsilon, X^{\theta+\epsilon}_{[i]}(t)) - f(\theta, X^\theta_{[i]}(t))}{\epsilon}. \]
If the paths are generated independently, then
\[ \operatorname{Var}(D_N(\epsilon)) = N^{-1}\epsilon^{-2}\operatorname{Var}\!\left(f(\theta + \epsilon, X^{\theta+\epsilon}_{[i]}(t)) - f(\theta, X^\theta_{[i]}(t))\right) = O(N^{-1}\epsilon^{-2}), \]
implying
\[ \frac{1}{\sqrt N\,\epsilon} = O(\delta) \implies N = O(\epsilon^{-2}\delta^{-2}). \]
Terrible: worse than for expectations. How about common random numbers for variance reduction?
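
To see the $O(N^{-1}\epsilon^{-2})$ behavior concretely, here is a sketch with a toy model of my own choosing (not from the slides): $\emptyset \xrightarrow{\theta} M$, $M \xrightarrow{1} \emptyset$, $f(X(T)) = X(T)$, for which $J(\theta) = \theta(1 - e^{-T})$ and so $J'(\theta) = 1 - e^{-T}$ exactly.

```python
import numpy as np

def ssa_birth_death(theta, T, rng):
    """Gillespie path of 0 ->(theta) M, M ->(1) 0, started at X(0) = 0."""
    t, x = 0.0, 0
    while True:
        lam0 = theta + x                         # total intensity
        t += rng.exponential(1.0 / lam0)
        if t > T:
            return x
        x += 1 if rng.random() < theta / lam0 else -1

def D_N_independent(theta, eps, N, T=2.0):
    """Naive forward-difference estimator with independent paths."""
    rng = np.random.default_rng(5)
    vals = [(ssa_birth_death(theta + eps, T, rng)
             - ssa_birth_death(theta, T, rng)) / eps for _ in range(N)]
    return np.mean(vals), np.var(vals)

est, var = D_N_independent(theta=10.0, eps=0.1, N=2000)
print(est, var)   # target J'(theta) = 1 - e^{-2} ~ 0.865; variance is O(eps^-2)
```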

Common random numbers. It's exactly what it sounds like: reuse the random numbers used in the generation of $X^{\theta+\epsilon}_{[i]}(t)$ and $X^\theta_{[i]}(t)$. Why? Because
\[ \operatorname{Var}\!\left(f(\theta+\epsilon, X^{\theta+\epsilon}_{[i]}(t)) - f(\theta, X^\theta_{[i]}(t))\right) = \operatorname{Var}\!\left(f(\theta+\epsilon, X^{\theta+\epsilon}_{[i]}(t))\right) + \operatorname{Var}\!\left(f(\theta, X^\theta_{[i]}(t))\right) - 2\operatorname{Cov}\!\left(f(\theta+\epsilon, X^{\theta+\epsilon}_{[i]}(t)), f(\theta, X^\theta_{[i]}(t))\right). \]
So, if we can couple the random variables, we can get a variance reduction, sometimes a substantial one.

Common random numbers. In the context of Gillespie's algorithm, we simply reuse all the same random numbers (uniforms). This can be achieved simply by setting the seed of the random number generator before generating $X^{\theta+\epsilon}$ and $X^\theta$.
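
A sketch of that seeding trick, reusing ssa_birth_death from the sketch above: resetting the generator to the same seed makes both paths consume the same stream of random numbers, so they move together (at least for a while) and the variance of the difference drops.

```python
import numpy as np

def D_N_crn(theta, eps, N, T=2.0):
    """Forward difference with common random numbers: same seed, two paths."""
    vals = []
    for i in range(N):
        seed = 1000 + i
        x_pert = ssa_birth_death(theta + eps, T, np.random.default_rng(seed))
        x_nom  = ssa_birth_death(theta,       T, np.random.default_rng(seed))
        vals.append((x_pert - x_nom) / eps)
    return np.mean(vals), np.var(vals)

est, var = D_N_crn(theta=10.0, eps=0.1, N=2000)
print(est, var)   # typically a much smaller variance than independent paths
```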

Common random numbers. CRN + Gillespie is a good idea: 1. it costs little in terms of implementation; 2. the variance reduction and gains in efficiency can be huge. Thus, it is probably the most common method used today. But: over time, the processes decouple, often completely. Can we do better?

Coupling. Using common random numbers in the previous fashion is a way of coupling the two processes together. Is there a natural way to couple processes using the random time change? Can we couple the Poisson processes? Answer: yes, in multiple ways. I will show one which works very well.

How do we generate the processes simultaneously? Suppose I want to generate: a Poisson process with intensity 13.1, and a Poisson process with intensity 13. We could let $Y_1$ and $Y_2$ be independent, unit-rate Poisson processes and set
\[ Z_{13.1}(t) = Y_1(13.1\,t), \qquad Z_{13}(t) = Y_2(13\,t). \]
Using this representation, these processes are independent and, hence, not coupled. The variance of the difference is large:
\[ \operatorname{Var}(Z_{13.1}(t) - Z_{13}(t)) = \operatorname{Var}(Y_1(13.1\,t)) + \operatorname{Var}(Y_2(13\,t)) = 26.1\,t. \]

How do we generate the processes simultaneously? Suppose I want to generate: a Poisson process with intensity 13.1, and a Poisson process with intensity 13. We could instead let $Y_1$ and $Y_2$ be independent unit-rate Poisson processes and set
\[ Z_{13.1}(t) = Y_1(13\,t) + Y_2(0.1\,t), \qquad Z_{13}(t) = Y_1(13\,t), \]
using the fact that a sum of homogeneous Poisson processes is again a Poisson process. The variance of the difference is much smaller:
\[ \operatorname{Var}(Z_{13.1}(t) - Z_{13}(t)) = \operatorname{Var}(Y_2(0.1\,t)) = 0.1\,t. \]
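
A quick numerical check of the two constructions (the rates 13.1 and 13 are from the slide; the code is my own): since $Y_1(13\,t)$ appears in both coupled processes, it cancels from the difference.

```python
import numpy as np

rng = np.random.default_rng(2)
t, n = 10.0, 100_000

# Independent construction: Z_13.1(t) = Y_1(13.1 t), Z_13(t) = Y_2(13 t).
d_indep = rng.poisson(13.1 * t, n) - rng.poisson(13.0 * t, n)

# Coupled construction: share Y_1(13 t); only Y_2(0.1 t) differs.
shared = rng.poisson(13.0 * t, n)
d_coupled = (shared + rng.poisson(0.1 * t, n)) - shared

print(d_indep.var())    # ~ 26.1 * t = 261
print(d_coupled.var())  # ~  0.1 * t =   1
```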

How do we generate the processes simultaneously? More generally, suppose we want: 1. a non-homogeneous Poisson process with intensity $f(t)$, and 2. a non-homogeneous Poisson process with intensity $g(t)$. We can let $Y_1$, $Y_2$, and $Y_3$ be independent, unit-rate Poisson processes and define
\[ Z_f(t) = Y_1\!\left(\int_0^t f(s)\wedge g(s)\,ds\right) + Y_2\!\left(\int_0^t f(s) - (f(s)\wedge g(s))\,ds\right), \]
\[ Z_g(t) = Y_1\!\left(\int_0^t f(s)\wedge g(s)\,ds\right) + Y_3\!\left(\int_0^t g(s) - (f(s)\wedge g(s))\,ds\right), \]
where we are using that, for example,
\[ Y_1\!\left(\int_0^t f(s)\wedge g(s)\,ds\right) + Y_2\!\left(\int_0^t f(s) - (f(s)\wedge g(s))\,ds\right) = Y\!\left(\int_0^t f(s)\,ds\right), \]
where $Y$ is a unit-rate Poisson process.

Parameter sensitivities. Couple the processes:
\[ X^{\theta+\epsilon}(t) = X^{\theta+\epsilon}(0) + \sum_k Y_{k,1}\!\left(\int_0^t \lambda_k^{\theta+\epsilon}(X^{\theta+\epsilon}(s)) \wedge \lambda_k^{\theta}(X^{\theta}(s))\,ds\right)\zeta_k + \sum_k Y_{k,2}\!\left(\int_0^t \lambda_k^{\theta+\epsilon}(X^{\theta+\epsilon}(s)) - \lambda_k^{\theta+\epsilon}(X^{\theta+\epsilon}(s)) \wedge \lambda_k^{\theta}(X^{\theta}(s))\,ds\right)\zeta_k, \]
\[ X^{\theta}(t) = X^{\theta}(0) + \sum_k Y_{k,1}\!\left(\int_0^t \lambda_k^{\theta+\epsilon}(X^{\theta+\epsilon}(s)) \wedge \lambda_k^{\theta}(X^{\theta}(s))\,ds\right)\zeta_k + \sum_k Y_{k,3}\!\left(\int_0^t \lambda_k^{\theta}(X^{\theta}(s)) - \lambda_k^{\theta+\epsilon}(X^{\theta+\epsilon}(s)) \wedge \lambda_k^{\theta}(X^{\theta}(s))\,ds\right)\zeta_k. \]
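
The coupled pair $(X^{\theta+\epsilon}, X^\theta)$ is itself a continuous time Markov chain, so it can be simulated Gillespie-style by splitting each reaction $k$ into three channels with rates $\lambda_k^{\theta+\epsilon}\wedge\lambda_k^\theta$ (both processes jump), $\lambda_k^{\theta+\epsilon} - \lambda_k^{\theta+\epsilon}\wedge\lambda_k^\theta$ (only the perturbed process jumps), and $\lambda_k^\theta - \lambda_k^{\theta+\epsilon}\wedge\lambda_k^\theta$ (only the nominal process jumps). A minimal sketch for the birth-death toy model used earlier (my own example, not from the slides):

```python
import numpy as np

def cfd_pair(theta, eps, T, rng):
    """Jointly simulate (X^{theta+eps}, X^theta) for 0 ->(theta) M, M ->(1) 0."""
    t, x_pert, x_nom = 0.0, 0, 0
    while True:
        lam_pert = np.array([theta + eps, float(x_pert)])   # birth, death
        lam_nom  = np.array([theta,       float(x_nom)])
        m = np.minimum(lam_pert, lam_nom)
        # channels: both jump / only perturbed jumps / only nominal jumps
        rates = np.concatenate([m, lam_pert - m, lam_nom - m])
        r0 = rates.sum()
        t += rng.exponential(1.0 / r0)
        if t > T:
            return x_pert, x_nom
        k = rng.choice(6, p=rates / r0)
        jump = 1 if k % 2 == 0 else -1      # even channels are births
        if k < 2:
            x_pert += jump; x_nom += jump   # shared channel: both jump
        elif k < 4:
            x_pert += jump                  # only the perturbed process
        else:
            x_nom += jump                   # only the nominal process

rng = np.random.default_rng(3)
eps = 0.1
vals = []
for _ in range(2000):
    x_pert, x_nom = cfd_pair(10.0, eps, 2.0, rng)
    vals.append((x_pert - x_nom) / eps)
print(np.mean(vals), np.var(vals))  # variance now O(eps^-1), not O(eps^-2)
```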

Parameter sensitivities. Theorem:³ Suppose $(X^{\theta+\epsilon}, X^\theta)$ satisfy the coupling. Then, for any $T > 0$ there is a $C_{T,f} > 0$ for which
\[ \mathbb{E}\left[\sup_{t \leq T}\left(f(\theta+\epsilon, X^{\theta+\epsilon}(t)) - f(\theta, X^\theta(t))\right)^2\right] \leq C_{T,f}\,\epsilon. \]
This lowers the variance of the estimator from $O(N^{-1}\epsilon^{-2})$ to $O(N^{-1}\epsilon^{-1})$: lowered by an order of magnitude (in $\epsilon$). Point: a deeper mathematical understanding led to a better computational method.
³ David F. Anderson, An Efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains, SIAM Journal on Numerical Analysis, Vol. 50, No. 5, 2012.

Analysis. Theorem: Suppose $(X^{\theta+\epsilon}, X^\theta)$ satisfy the coupling. Then, for any $T > 0$ there is a $C_{T,f} > 0$ for which
\[ \mathbb{E}\left[\sup_{t \leq T}\left(f(\theta+\epsilon, X^{\theta+\epsilon}(t)) - f(\theta, X^\theta(t))\right)^2\right] \leq C_{T,f}\,\epsilon. \]
Proof: the key observation is that
\[ X^{\theta+\epsilon}(t) - X^\theta(t) = M^{\theta,\epsilon}(t) + \int_0^t \left(F_{\theta+\epsilon}(X^{\theta+\epsilon}(s)) - F_{\theta}(X^{\theta}(s))\right)ds, \]
where most of the jumps have vanished. Now work on the martingale and absolutely continuous parts.

Example: gene transcription and translation.
\[ G \xrightarrow{2} G + M, \qquad M \xrightarrow{10} M + P, \qquad M \xrightarrow{k} \emptyset, \qquad P \xrightarrow{1} \emptyset. \]
We want $\frac{d}{d\theta}\mathbb{E}\left[X^\theta_{\text{protein}}(30)\right]$ at $\theta = 1/4$, where $\theta = k$ is the mRNA degradation rate.

Method       R        95% CI         # updates    CPU time
Likelihood   689,600  -312.1 ± 6.0   2.9 × 10^9   3,506.6 s
CMC          246,000  -319.3 ± 6.0   2.1 × 10^9   2,364.8 s
CRP/CRN      25,980   -316.7 ± 6.0   2.2 × 10^8   270.9 s
CFD          4,580    -319.9 ± 6.0   2.0 × 10^7   29.2 s

Table: Each finite difference method used $\epsilon = 1/4$. The exact value is $J'(1/4) = -318.073$.

Comparison from 5,000 samples each, with $\epsilon = 1/4$.

[Figure: estimator variance versus time for Coupled Finite Differences, Common Reaction Path, Crude Monte Carlo, and the Girsanov Transformation; the Girsanov panel sits on a much larger vertical scale.]

Example: genetic toggle switch.
\[ \emptyset \underset{\lambda_2(X)}{\overset{\lambda_1(X)}{\rightleftarrows}} X_1, \qquad \emptyset \underset{\lambda_4(X)}{\overset{\lambda_3(X)}{\rightleftarrows}} X_2, \tag{1} \]
with intensity functions
\[ \lambda_1(X(t)) = \frac{\alpha_1}{1 + X_2(t)^\beta}, \quad \lambda_2(X(t)) = X_1(t), \quad \lambda_3(X(t)) = \frac{\alpha_2}{1 + X_1(t)^\gamma}, \quad \lambda_4(X(t)) = X_2(t), \]
and parameter choice $\alpha_1 = 50$, $\alpha_2 = 16$, $\beta = 2.5$, $\gamma = 1$. Begin the process with initial condition $[0, 0]$ and consider the sensitivity of $X_1$ as a function of $\alpha_1$.

Example: genetic toggle switch.

[Figure: Time plot (to $T = 40$) of the variance of the Coupled Finite Difference estimator versus the Common Reaction Path estimator for the model (1). Each plot was generated using 10,000 sample paths. A perturbation of $\epsilon = 1/10$ was used.]

Are these representations only good for simulation? LLN and ODEs (Tom Kurtz, 1970s). Suppose $X^N(t) = O(N)$. Denote concentrations by $\bar X^N(t) = N^{-1} X^N(t) = O(1)$. Under mild assumptions, we have
\[ \lambda_k(X^N(t)) = \lambda_k(N\bar X^N(t)) = N\tilde\lambda_k(\bar X^N(t)). \]
Then
\[ X^N(t) = X^N(0) + \sum_k Y_k\!\left(\int_0^t \lambda_k(X^N(s))\,ds\right)\zeta_k \]
becomes
\[ \bar X^N(t) = \bar X^N(0) + \sum_k N^{-1} Y_k\!\left(N\int_0^t \tilde\lambda_k(\bar X^N(s))\,ds\right)\zeta_k. \]
Using that
\[ \lim_{N\to\infty}\sup_{u \leq U}\left|N^{-1}Y(Nu) - u\right| = 0, \]
we find that $\bar X^N(t)$ converges to the solution of the classical ODE
\[ x(t) = x(0) + \sum_k \int_0^t \tilde\lambda_k(x(s))\,ds\,\zeta_k \equiv x(0) + \int_0^t F(x(s))\,ds. \]

Diffusions? (Argument due to Tom Kurtz.) Suppose $X^N(t) = O(N)$ and denote concentrations by $\bar X^N(t) = N^{-1}X^N(t) = O(1)$, so that
\[ \bar X^N(t) = \bar X^N(0) + \sum_k N^{-1} Y_k\!\left(N\int_0^t \tilde\lambda_k(\bar X^N(s))\,ds\right)\zeta_k. \]
Using that
\[ \frac{1}{\sqrt N}\left(Y_k(Nu) - Nu\right) \approx W_k(u), \]
we find that $\bar X^N(t)$ is well approximated by the chemical Langevin process
\[ X(t) = X(0) + \sum_k \zeta_k \int_0^t \tilde\lambda_k(X(s))\,ds + \frac{1}{\sqrt N}\sum_k \zeta_k \int_0^t \sqrt{\tilde\lambda_k(X(s))}\,dW_k(s), \]
or
\[ dX(t) = F(X(t))\,dt + N^{-1/2}\sum_k \zeta_k \sqrt{\tilde\lambda_k(X(t))}\,dW_k(t). \]
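
An Euler-Maruyama discretization of the chemical Langevin equation is then straightforward. A minimal sketch for the scaled Lotka-Volterra system; the network is from the slides, while $N$, the step size, the scaled rates, and the initial condition are my own choices.

```python
import numpy as np

rng = np.random.default_rng(4)

# Scaled Lotka-Volterra: concentrations c = X / N with N = 100, so the
# scaled bimolecular rate is 0.02 * N = 2.
kappa_tilde = np.array([2.0, 2.0, 2.0])
zeta = np.array([[1, 0], [-1, 1], [0, -1]], dtype=float)
N, h, T = 100, 1e-3, 10.0

def lam_tilde(c):
    """Scaled intensities lambda_k-tilde in concentration variables."""
    return np.array([kappa_tilde[0] * c[0],
                     kappa_tilde[1] * c[0] * c[1],
                     kappa_tilde[2] * c[1]])

c = np.array([1.0, 1.0])                  # X(0) / N = (100, 100) / 100
for _ in range(int(T / h)):
    rates = np.maximum(lam_tilde(c), 0.0) # guard against small negative values
    dW = rng.normal(0.0, np.sqrt(h), size=3)
    c = c + zeta.T @ (rates * h) + (zeta.T @ (np.sqrt(rates) * dW)) / np.sqrt(N)

print("approximate copy numbers at T:", c * N)
```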

Central limit theorem (Kurtz / Van Kampen). Suppose $X^N(t) = O(N)$, denote concentrations by $\bar X^N(t) = N^{-1}X^N(t) = O(1)$, and let $x(t)$ be the ODE solution. Define
\[ U^N(t) = \sqrt N\left(\bar X^N(t) - x(t)\right) = \frac{X^N(t) - Nx(t)}{\sqrt N}. \]
Then, writing $\tilde Y_k(u) = Y_k(u) - u$ for the centered processes,
\[ U^N(t) = \frac{1}{\sqrt N}\sum_k \zeta_k \tilde Y_k\!\left(N\int_0^t \tilde\lambda_k(\bar X^N(s))\,ds\right) + \sqrt N\int_0^t\left(F(\bar X^N(s)) - F(x(s))\right)ds \]
\[ \approx \frac{1}{\sqrt N}\sum_k \zeta_k \tilde Y_k\!\left(N\int_0^t \tilde\lambda_k(\bar X^N(s))\,ds\right) + \int_0^t DF(x(s))\,U^N(s)\,ds. \]
Using the martingale central limit theorem to show that $\frac{1}{\sqrt N}\tilde Y_k(N\cdot) \Rightarrow W_k(\cdot)$, we get $U^N \Rightarrow U$, where
\[ U(t) = \sum_k \zeta_k W_k\!\left(\int_0^t \tilde\lambda_k(x(s))\,ds\right) + \int_0^t DF(x(s))\,U(s)\,ds. \]

Thanks! References:
1. David F. Anderson, An Efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains, SIAM Journal on Numerical Analysis, Vol. 50, No. 5, 2012.
2. David F. Anderson and Thomas G. Kurtz, Continuous Time Markov Chain Models for Chemical Reaction Networks, in Design and Analysis of Biomolecular Circuits, Springer, 2011, Eds. Heinz Koeppl et al.
Funding: NSF-DMS-1009275.