adaptivetau: efficient stochastic simulations in R

Similar documents
Simulating stochastic epidemics

Numerical solution of stochastic epidemiological models

Thursday. Threshold and Sensitivity Analysis

Predator-Prey Population Dynamics

Transmission in finite populations

Stochastic Models. John M. Drake & Pejman Rohani

Analysis of Numerical and Exact solutions of certain SIR and SIS Epidemic models

A Simple Ecological Model

Derivation of Itô SDE and Relationship to ODE and CTMC Models

Spotlight on Modeling: The Possum Plague

Mathematical Modeling and Analysis of Infectious Disease Dynamics

A BINOMIAL MOMENT APPROXIMATION SCHEME FOR EPIDEMIC SPREADING IN NETWORKS

Final Project Descriptions Introduction to Mathematical Biology Professor: Paul J. Atzberger. Project I: Predator-Prey Equations

Fixed Point Analysis of Kermack Mckendrick SIR Model

Comment on A BINOMIAL MOMENT APPROXIMATION SCHEME FOR EPIDEMIC SPREADING IN NETWORKS in U.P.B. Sci. Bull., Series A, Vol. 76, Iss.

Introduction to SEIR Models

Models of Infectious Disease Formal Demography Stanford Summer Short Course James Holland Jones, Instructor. August 15, 2005

Statistical Inference for Stochastic Epidemic Models

LAW OF LARGE NUMBERS FOR THE SIRS EPIDEMIC

Electronic appendices are refereed with the text. However, no attempt has been made to impose a uniform editorial style on the electronic appendices.

MATHEMATICAL MODELS Vol. III - Mathematical Models in Epidemiology - M. G. Roberts, J. A. P. Heesterbeek

STOCHASTIC CHEMICAL KINETICS

dy dt = a by y = a b

Gerardo Zavala. Math 388. Predator-Prey Models

Epidemics in Networks Part 2 Compartmental Disease Models

A NUMERICAL STUDY ON PREDATOR PREY MODEL

Any live cell with less than 2 live neighbours dies. Any live cell with 2 or 3 live neighbours lives on to the next step.

On state-space reduction in multi-strain pathogen models, with an application to antigenic drift in influenza A

PHYSICAL REVIEW LETTERS

Three Disguises of 1 x = e λx

Introduction to Stochastic SIR Model

The Wright-Fisher Model and Genetic Drift

Inference for partially observed stochastic dynamic systems: A new algorithm, its theory and applications

MA 138 Calculus 2 for the Life Sciences Spring 2016 Final Exam May 4, Exam Scores. Question Score Total

Ordinary Differential Equations

SIR Epidemic Model with total Population size

MATH 56A: STOCHASTIC PROCESSES CHAPTER 0

MAS1302 Computational Probability and Statistics

GENETICS. On the Fixation Process of a Beneficial Mutation in a Variable Environment

Supplementary Figures.

Sylvatic Dengue and Animal-to-Human Spillover

A NEW SOLUTION OF SIR MODEL BY USING THE DIFFERENTIAL FRACTIONAL TRANSFORMATION METHOD

Stochastic models in biology and their deterministic analogues

Applications in Biology

The Effects of Varying Parameter Values and Heterogeneity in an Individual-Based Model of Predator-Prey Interaction

The dynamics of disease transmission in a Prey Predator System with harvesting of prey

Fast Probability Generating Function Method for Stochastic Chemical Reaction Networks

Simulation methods for stochastic models in chemistry

University of Leeds. Project in Statistics. Math5004M. Epidemic Modelling. Supervisor: Dr Andrew J Baczkowski. Student: Ross Breckon

Discrete Time Markov Chain of a Dynamical System with a Rest Phase

Markov Chains and Pandemics

Stochastic modelling of epidemic spread

Mathematics, Box F, Brown University, Providence RI 02912, Web site:

We have two possible solutions (intersections of null-clines. dt = bv + muv = g(u, v). du = au nuv = f (u, v),

Partially Observed Stochastic Epidemics

1. Population dynamics of rabbits and foxes

Qualitative Analysis of a Discrete SIR Epidemic Model

The effect of emigration and immigration on the dynamics of a discrete-generation population

Dynamical Systems and Chaos Part II: Biology Applications. Lecture 6: Population dynamics. Ilya Potapov Mathematics Department, TUT Room TD325

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and

Figure The Threshold Theorem of epidemiology

The death of an epidemic

Epidemic Modeling with Contact Heterogeneity and Multiple Routes of Transmission

GEMF: GENERALIZED EPIDEMIC MODELING FRAMEWORK SOFTWARE IN PYTHON

Notes for Math 450 Stochastic Petri nets and reactions

Dynamics of Disease Spread. in a Predator-Prey System

Discrete versus continuous-time models of malaria infections

POMP inference via iterated filtering

On the Interpretation of Delays in Delay Stochastic Simulation of Biological Systems

NATURAL SELECTION FOR WITHIN-GENERATION VARIANCE IN OFFSPRING NUMBER JOHN H. GILLESPIE. Manuscript received September 17, 1973 ABSTRACT

Module 02 Control Systems Preliminaries, Intro to State Space

Brief Glimpse of Agent-Based Modeling

Approximation of epidemic models by diffusion processes and their statistical inferencedes

Predation is.. The eating of live organisms, regardless of their identity

An Introduction to Evolutionary Game Theory: Lecture 2

Markov-modulated interactions in SIR epidemics

Downloaded from:

Eigenvalues and eigenvectors.

Evolutionary dynamics on graphs

A Mathematical Analysis on the Transmission Dynamics of Neisseria gonorrhoeae. Yk j N k j

Global Analysis of an Epidemic Model with Nonmonotone Incidence Rate

Mathematical Epidemiology Lecture 1. Matylda Jabłońska-Sabuka

Lecture 20/Lab 21: Systems of Nonlinear ODEs

Epidemics in Complex Networks and Phase Transitions

Models of Infectious Disease Formal Demography Stanford Spring Workshop in Formal Demography May 2008

Introduction to stochastic multiscale modelling in tumour growth

Introduction to Stochastic Processes with Applications in the Biosciences

Relations in epidemiology-- the need for models

Non-Linear Models Cont d: Infectious Diseases. Non-Linear Models Cont d: Infectious Diseases

Evolutionary Dynamics of Lewis Signaling Games: Signaling Systems vs. Partial Pooling

Topic 2: Review of Probability Theory

19. Genetic Drift. The biological context. There are four basic consequences of genetic drift:

SYMBIOTIC MODELS WITH AN SIR DISEASE

Segregation versus mitotic recombination APPENDIX

4452 Mathematical Modeling Lecture 16: Markov Processes

On the Interpretation of Delays in Delay Stochastic Simulation of Biological Systems

COMPETITION OF FAST AND SLOW MOVERS FOR RENEWABLE AND DIFFUSIVE RESOURCE

Mathematical Model for the Epidemiology of Fowl Pox Infection Transmission That Incorporates Discrete Delay

AARMS Homework Exercises

Simulation of Chemical Reactions

Transcription:

adaptivetau: efficient stochastic simulations in R Philip Johnson Abstract Stochastic processes underlie all of biology, from the large-scale processes of evolution to the fine-scale processes of biochemical interactions. Consequently, the analysis of biological data frequently necessitates the use of Markov models. While these models sometimes yield analytic solutions, simulations can provide intuition when analytic results are intractable. This vignette presents a few examples of adaptivetau applied to biological problems. Introduction Fundamentally, stochastic effects become critical to take into account when the quantities in a system are small enough that extinction becomes probable. This vignette presents three such biological systems: 1. (ecology) Lotka-Volterra predator-prey dynamics predict oscillations in the population sizes of predator and prey species for certain parameters [Lotka, 1920]. However, when the population of either species nears zero, chance effects can lead to extinction of that species and thus a halt to the oscillations. 2. (evolution) When a new mutation arises in a population of size N, it will eventually be either lost (reaching 0% frequency) or fix (reaching 100% frequency). If the allele is selectively neutral, then with probability (N 1)/N, the stochastic process of genetic drift will lead to the new allele disappearing from the population instead of reaching fixation [Kimura, 1957]. 3. (epidemiology) In general, emerging zoonotic infections are poorly adapted to humans and thus unlikely to be transmitted from the initial infected person to significantly more people. However, continued contact between humans and an animal reservoir will give the pathogen 1

more chances to become human adapted, leading to a larger epidemic [Lloyd-Smith et al., 2009]. Crucially, these stochastic processes are all Markovian the future state of the process depends only on the present state (i.e., it is independent of the history). Here, we focus on simulating trajectories from a continuous-time Markov process for which the transition rates are not a function of time (i.e. timehomogeneous). The straightforward (and exact) method of simulating such a process requires many iterations of 1) drawing a random exponentiallydistributed waiting time until the next transition and 2) randomly selecting the identity of the transition, weighted by the relative transition rates. This solution, which is known as the Gillespie algorithm in the chemical physics literature [Gillespie, 1976], becomes impractical as any one transition rate grows large. Instead, models with larger transition rates require an approximate approach that increases simulation speed while still maintaining reasonable accuracy. One way to perform this approximation is to reduce the number of iterations by treating transition rates as constant over time periods for which this approximation leads to little error. The adaptivetau package in R implements both an exact solution and an approximate solution known as the adaptive tau-leaping algorithm [Cao et al., 2007]. Similar functionality exists in the publicly-available GillespieSSA R package [Pineda-Krch, 2008]; however, our new implementation is much faster, due in part to its underlying implementation in C++. Methods IMPORTANT: This section can be skipped if you just want to learn to use the package, but the approximations involved in the adaptive tau-leaping algorithm should be understood before interpreting simulation results! First we define our notation and use it to describe the type of model simulated by this package. Then we provide basic intuition for the adaptive tau-leaping algorithm, leaving a more detailed description to Cao et al. [2007]. Let the state of the Markov process at time t, X(t), be completely described by a vector of n 1 random variables X(t) := [X 1 (t),..., X n (t)], each of which is defined on the non-negative integers. Transitions between states occur according to the instantaneous rate matrix, Q. In general, this matrix will be extremely sparse, so we do not attempt to use it directly. Instead, for each allowable transition, j, we define a rate, λ j, and a vector 2

of n integers, j := [δ j,1,..., δ j,n ], that reflects the change in state if this transition were followed: X(t) + j. An arbitrary number of transitions may exist: j Z. Because we are modeling a time-homogeneous process, a transition rate λ j cannot depend on t, although it may depend on the current state, X(t). For a given state X(t), transitions that would lead to any of the n state variables becoming negative must have rate 0. Thus the stochastic model is completely defined by a vector giving the initial state (X(0)), a set of allowable transitions ({ j }) and a function to calculate transition rates given the state (λ(x)). Given a model, the package simulates a trajectory from time 0 to a usersupplied stopping time, t f. Intuitively, the algorithm identifies time periods during which all transition rates will remain approximately constant and all n state variables will remain greater than zero with probability 1. Then the simulator can leap over such a period of length τ and, for each transition, add the net effect of the Poisson-distributed number of transitions that should have occurred during this period: X(t + τ) X(t) + j y j j where y j Poisson(τλ j ). The challenge arises in handling models for which transition rates frequently change and in balancing efficiency with accuracy when selecting these time periods to leap over. Here we implement the algorithm of Cao et al. [2007]. Example 1: ecology Consider a simple Lotka-Volterra ecological model in which a prey species (X 1 ) interacts with a predator species (X 2 ). Traditionally, this would be written using two differential equations: dx 1 /dt = rx 1 αx 1 X 2 dx 2 /dt = βx 1 X 2 δx 2 However, these equations ignore stochastic variance and the possibility of species extinction. We can reformulate them in a stochastic model with state variables X(t) = [X 1 (t), X 2 (t)] and three transitions: 1. growth of prey: 1 = [+1, 0] with rate λ 1 = rx 1 2. predation: 2 = [ 2, +1] with rate λ 2 = βx 1 X 2 (effectively, α = 2β) 3. death of predators: 3 = [0, 1] with rate λ 3 = δx 2 Now we can write this model in R for use with 3

> library(adaptivetau) First we specify our transitions ( ) in sparse form, whereby we only give non-zero entries in the transition matrix (note we could also supply the entire matrix in matrix form; see the ssa.adaptivetau help page for details): > transitions = list(c(prey = +1), # trans 1: prey grows + c(prey = -2, pred = +1), # trans 2: predation + c(pred = -1)) # trans 3: predator dies Next we write a function to calculate the rates of each transition (λ). The order in which rates are returned by this function must match the order of transitions specified by the matrix above: > lvratef <- function(x, params, t) { + return(c(params$r * x["prey"], # rate of prey growing + params$beta * x["prey"]*x["pred"] * # rate of predation + (x["prey"] >= 2), + params$delta * x["pred"])) # rate of predators dying + } Note that the conditional (x["prey"] >= 2) above ensures that the predation transition can never create a negative number of prey (since 2 are subtracted). Finally we run the simulations setting initial conditions (init.values), parameters (params), and the simulation stop time (tf): > set.seed(4) # set random number generator seed to be reproducible > simresults = ssa.adaptivetau(init.values = c(prey = 1000, pred = 500), + transitions, lvratef, + params = list(r=10, beta=0.01, delta=10), + tf=12) We plot the results in Figure 1 using the following code: > matplot(simresults[,"time"], simresults[,c("prey","pred")], type='l', + xlab='time', ylab='counts (log scale)', log='y') > legend("bottomleft", legend=c("prey", "predator"), lty=1:2, col=1:2) Exact vs approximate algorithms The above code uses the adaptive tau-leaping approximation for simulation. An exact simulation requires more computational time, but captures more fine-scale variance, as you can see by running another simulation using the ssa.exact command: 4

Counts (log scale) 1 5 50 500 5000 prey predator 0 2 4 6 8 10 12 Time Figure 1: Simulated trajectory from the Lotka-Voltera model under the adaptive tau-leaping approximation. > simresults = ssa.exact(init.values = c(prey = 1000, pred = 500), + transitions, lvratef, + params = list(r=10, beta=0.01, delta=10), + tf=12) Alternatively, you can improve the approximation by reducing the ɛ variable in the tau-leaping parameters (tl.params) when calling ssa.adaptivetau (in the earlier example, ɛ takes its default value of 0.05): > simresults = ssa.adaptivetau(init.values = c(prey = 1000, pred = 500), + transitions, lvratef, + params = list(r=10, beta=0.01, delta=10), + tf=12, + tl.params = list(epsilon = 0.005)) Note we have explicitly linked the death of prey to growth in the predator population by placing these two actions in a single transition and incorporating the parameter α into the 2 vector. Alternatively, these actions could have been split into two independent transitions with state change vectors [ 1, 0] and [0, +1] and rates αx 1 X 2 and βx 1 X 2. This formulation would yield slightly increased variance in the stochastic process since these actions are uncoupled. With the parameters and initial conditions used in this example, the deterministic model predicts a steady state with dx 1 /dt = dx 2 /dt = 0, 5

while our stochastic trajectory has a dramatically different outcome with prey becoming extinct around time 11. If we repeat the simulation many times, we find the prey go extinct in approximately 1/3 of the simulations (followed shortly by extinction of the predator) while the predator become extinct in the other 2/3 of the simulations (and prey grow infinitely, revealing a fundamental unrealistic aspect of this model). Example 2: genetic drift Consider a Moran model for the evolution of a population of size N in which we track the number of individuals with a novel mutant allele (X) versus the number of individuals with the ancestral allele (N X). Since the population size is a constant, this model has only one independent variable (X). Under the Moran model, evolution occurs when one individual is chosen to reproduce and, simultaneously, one individual is chosen to die. Thus the model allows only one-step transitions in which X either decreases or increases by one ( 1 = [ 1] and 2 = [+1]). Both transitions occur at the N X N. same rate: λ 1 = λ 2 = X N Although this model has only one variable, we have two critical values for this variable: X = 0 and X = N are both absorbing boundaries. The adaptive tau-leaping approximation is always careful near 0 (the X = 0 boundary), but it is unaware of other critical values. We work around this problem by creating an additional variable for the ancestral allele, such that X 2 = 0 when X 1 = N. > library(adaptivetau) > transitions = cbind(c(-1,+1), # num individuals w/ derived allele decreases + c(+1,-1)) # num individuals w/ derived allele increases > driftratef <- function(x, params, t) { + rep(x[1] * x[2]/sum(x)^2, 2) + } > set.seed(1) # set random number generator seed to be reproducible > r=ssa.adaptivetau(c(500,500), transitions, driftratef, params=null, tf=inf) Simulation results are plotted in Figure 2. The above code also demonstrates an alternative stopping condition for the simulation: when tf=inf, the simulation will run until all transition rates are 0 (in this case, when either allele fixes in the population). 6

Number of derived alleles 0 200 600 1000 0e+00 2e+05 4e+05 6e+05 Time Figure 2: Genetic drift starting at frequency of 50% in a population of size 1000. Example 3: epidemiology We want to model the stuttering transmission of an infectious disease in the initial stages of its movement from a non-human animal reservoir to a human population. Our model is a slight modification on the standard Kermack- McKendrick SIR compartment model: susceptible (S), infected with nonhuman-adapted pathogen (I 1 ), infected with human-adapted pathogen (I 2 ), and recovered (this last class is also immune) (R). First, the differential equation representation: ds/dt = S(z + β 1 I 1 + β 1 µi 1 + β 2 I 2 ) di 1 /dt = S(z + β 1 I 1 ) I 1 (δ 1 + γ 1 ) di 2 /dt = S(β 1 µi 1 + β 2 I 2 ) I 2 (δ 2 + γ 2 ) dr/dt = γ 1 I 1 + γ 2 I 2 where we assume transmissible contact with infected animals occurs at a constant rate z, human-to-human transmissible contact occurs with rates β 1,2, human adaptation occurs with rate µ, death due to infection with rates δ 1,2 and recovery with rates γ 1,2. Now we will make a stochastic version of this model in R: > library(adaptivetau) > init.values = c( 7

+ S = 10^5, # susceptible humans + I1 = 0, # infected humans + I2 = 0, # infected humans + R = 0) # recovered (and immune) humans > transitions = list( + c(s = -1, I1 = +1), # infection (animal-adapted strain) + c(s = -1, I2 = +1), # infection (human-adapted strain) + c(i1 = -1), # death due to infection + c(i2 = -1), + c(i1 = -1, R = +1), # recovery + c(i2 = -1, R = +1) + ) > SIRrateF <- function(x, p, t) { + return(c(x["s"] * (p$zoonotic + p$beta[1]*x["i1"]), # infection rate + x["s"] * (p$beta[2]*x["i2"] + p$mu*p$beta[1]*x["i1"]), + params$delta[1]*x["i1"], # infected death rate + params$delta[2]*x["i2"], + params$gamma[1]*x["i1"], # recovery rate + params$gamma[2]*x["i2"])) + } > params = list(zoonotic=1e-6, beta=c(1e-7, 1e-5), mu=0.1, + delta=c(1e-2,1e-5), gamma=c(0.1, 0.1)) > set.seed(3) # set random number generator seed to be reproducible > r=ssa.adaptivetau(init.values, transitions, SIRrateF, params, tf=1000) Output from this simulation appears in Figure 3. Conclusion While the above examples are simple, the adaptivetau package maintains quick runtimes on models with hundreds of variables and thousands of transitions. Due to the approximations inherent in the adaptive tau-leaping algorithm, the precise runtime depends strongly on the exact nature of the model structure and parameters. However, this hybrid R/C++ implementation appears to be approximately 50 times faster than the pure R implementation in the GillespieSSA package. 8

Individuals (log scale) 1 100 10000 S I 1 I 2 0 200 400 600 800 1000 Time Figure 3: Simulated trajectory from SIIR model of stuttering zoonotic transmission. Zoonotic transmission occurs many times (red dashed line) before the pathogen becomes human-adapted and creates a serious epidemic (blue dotted line). Acknowledgment Funding: This work supported in part by National Science Foundation DBI-0906041 to Philip Johnson and National Institutes of Health R01- AI049334 to Rustom Antia. References Y. Cao, D. T. Gillespie, and L. R. Petzold. Adaptive explicit-implicit tau-leaping method with automatic tau selection. J Chem Phys, 126 (22):224101, 2007. URL http://eutils.ncbi.nlm.nih.gov/entrez/ eutils/elink.fcgi?cmd=prlinks&dbfrom=pubmed&retmode=ref&id= 17581038. Daniel T. Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys, 22(4):403 434, 1976. ISSN 0021-9991. doi: DOI:10.1016/ 0021-9991(76)90041-3. URL http://www.sciencedirect.com/science/ article/b6why-4dd1nc9-cp/2/43ade5f11fb949602b3a2abdbbb29f0e. 9

M. Kimura. Some problems of stochastic processes in genetics. Ann Math Stat, 28(4):882 901, 1957. J. O. Lloyd-Smith, D. George, K. M. Pepin, V. E. Pitzer, J. R. Pulliam, A. P. Dobson, P. J. Hudson, and B. T. Grenfell. Epidemic dynamics at the human-animal interface. Science, 326(5958):1362 7, 2009. URL http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi? cmd=prlinks&dbfrom=pubmed&retmode=ref&id=19965751. A. J. Lotka. Analytical note on certain rhythmic relations in organic systems. Proc Natl Acad Sci U S A, 6(7):410 5, 1920. URL http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi? cmd=prlinks&dbfrom=pubmed&retmode=ref&id=16576509. Mario Pineda-Krch. GillespieSSA: implementing the Gillespie stochastic simulation algorithm in R. Journal of Statistical Software, 25(12):1 18, 4 2008. ISSN 1548-7660. URL http://www.jstatsoft.org/v25/i12. 10