Stochastic Modelling of Biological Processes Lecture Notes


Stochastic Modelling of Biological Processes
Lecture Notes

Ruth Baker
Hilary Term 2018

Abstract

These lecture notes have been written to accompany the Part B course Stochastic Modelling of Biological Processes. They are based heavily on lecture notes originally written by Radek Erban for the Part C course of the same name, which will form part of the planned book R. Erban and S. J. Chapman, Stochastic Modelling of Reaction-Diffusion Processes. A huge thanks to Radek for letting me adapt this material for Part B.

Matlab codes for most of the examples in these notes can be found on the course webpages. Although the final examination does not include computer practicals, you are encouraged to implement the algorithms contained in these notes, as doing so will greatly aid understanding. General suggested reading material is listed on the course webpages, and more specific references can be found at the end of each lecture.

Please email me with any mistakes, however large or small!

Ruth Baker, HT 2018.

Contents

1 Modelling of degradation
    1.1 The chemical master equation
    1.2 Stochastic simulation
    1.3 Connection with the reaction rate equation
    1.4 References and further reading
    1.5 Tasks
    1.6 Example Matlab code
2 Modelling production and degradation
    2.1 The chemical master equation
    2.2 The stationary distribution
    2.3 Stochastic simulation
    2.4 References and further reading
    2.5 Tasks
    2.6 Example Matlab code
3 Modelling general chemical reactions
    3.1 The chemical master equation
    3.2 Stochastic simulation
    3.3 Example
    3.4 References and further reading
    3.5 Tasks
    3.6 Example Matlab code
4 Stochastic versus deterministic modelling
    4.1 Stochastic modelling of dimerization
    4.2 Stochastic focussing
    4.3 References and further reading
    4.4 Tasks
    4.5 Example Matlab code
5 Connection to stochastic differential equations
    5.1 The tau-leap method
    5.2 The chemical Langevin equation
    5.3 References and further reading

    5.4 Tasks
    5.5 Example Matlab code
6 Introduction to stochastic differential equations
    6.1 A computational definition of a stochastic differential equation
    6.2 Example
    6.3 Example
    6.4 Example
    6.5 References and further reading
    6.6 Tasks
    6.7 Example Matlab code
7 The Fokker-Planck equation
    7.1 Derivation of the Fokker-Planck equation
    7.2 The stationary distribution
    7.3 References and further reading
    7.4 Tasks
    7.5 Example Matlab code
8 The backward Kolmogorov equation
    8.1 Derivation of the backward Kolmogorov equation
    8.2 The diffusion coefficient
    8.3 Average switching times
    8.4 Example 3 of Lecture
    8.5 References and further reading
    8.6 Tasks
    8.7 Example Matlab code
9 The chemical Fokker-Planck equation
    9.1 Example: production and degradation
    9.2 The chemical Fokker-Planck equation
    9.3 References and further reading
    9.4 Tasks
    9.5 Example Matlab code
10 A simple model of diffusion
    10.1 A compartment-based approach to diffusion
    10.2 Connection to a macroscale diffusion coefficient
    10.3 Analysis of the variance
    10.4 References and further reading
    10.5 Tasks
    10.6 Example Matlab code

11 The reaction-diffusion master equation
    11.1 A compartment-based model for production, degradation and diffusion
    11.2 A compartment-based model for higher order reactions
    11.3 Choice of compartment size, h
    11.4 Models of pattern formation
    11.5 References and further reading
    11.6 Tasks
12 Diffusion and stochastic differential equations
    12.1 The Fokker-Planck equation and the diffusion equation
    12.2 One-dimensional diffusion and boundary conditions
    12.3 References and further reading
    12.4 Tasks
    12.5 Example Matlab code
13 Molecular approaches to reaction-diffusion
    13.1 Molecular-based model of diffusion, production and degradation
    13.2 Molecular-based approaches for second-order reactions
    13.3 Reaction radius and reaction probability
    13.4 References and further reading
    13.5 Tasks
14 A simple velocity-jump process
    14.1 A simple velocity-jump model in one dimension
    14.2 Analysis of the simple velocity-jump model
    14.3 A simple velocity-jump model with boundary conditions
    14.4 References and further reading
    14.5 Tasks
    14.6 Example Matlab code
15 A more general velocity-jump process
    15.1 Large friction limit
    15.2 Einstein-Smoluchowski relation
    15.3 References and further reading
    15.4 Tasks

Chapter 1: Modelling of degradation

Consider degradation of the chemical species A according to the following reaction:

A --k--> ∅,  (1.1)

where k > 0 is the rate constant of the reaction (units s^-1). It is defined so that k dt is the probability that a given molecule of A degrades during the time interval [t, t + dt), where t is the time and dt is an (infinitesimally) small time step. Denote the number of molecules of A present at time t by A(t). Then we have

P(no reactions in [t, t + dt)) = 1 - A(t)k dt + O(dt^2),  (1.2)
P(one reaction in [t, t + dt)) = A(t)k dt + O(dt^2),  (1.3)
P(more than one reaction in [t, t + dt)) = O(dt^2).  (1.4)

1.1 The chemical master equation

Let p_n(t) denote the probability that there are n molecules of A present in the system at time t. Then

p_n(t + dt) = (1 - kn dt) p_n(t) + k(n + 1) dt p_{n+1}(t) + O(dt^2),  (1.5)

so that, rearranging and taking the limit dt → 0, we have the chemical master equation:

dp_n/dt = k(n + 1) p_{n+1}(t) - kn p_n(t).  (1.6)

Equation (1.6) is a system of ordinary differential equations for the probabilities p_n, where n = 0, 1, 2, ..., N - 1 and A(0) = N is the initial number of A molecules in the system. The equation for p_N is

dp_N/dt = -kN p_N(t)  with  p_N(0) = 1,  so that  p_N(t) = e^{-kNt}.  (1.7)

Inductively, one can show that

p_n(t) = (N choose n) e^{-knt} (1 - e^{-kt})^{N-n},  n = 0, 1, ..., N,  (1.8)

i.e. A(t) ∼ B(N, e^{-kt}). This means that the mean and variance of the number of A molecules present in the system at time t are, respectively, given by

M(t) = N e^{-kt}  and  V(t) = N e^{-kt} (1 - e^{-kt}).  (1.9)

The chemical master equation, (1.6) and (1.7), and its solution, (1.8), enable us to quantify the stochastic fluctuations around the mean.
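The binomial solution (1.8) can be cross-checked by integrating the master equation (1.6)-(1.7) directly. The notes' own codes are Matlab; the sketch below is an illustrative Python alternative (the function name, step size and tolerance are choices of this sketch, not from the notes), using forward Euler on the truncated system of ODEs and comparing the resulting mean with M(t) = N e^{-kt}.

```python
import math

def cme_mean(N=20, k=0.1, t_final=5.0, dt=1e-3):
    """Forward-Euler integration of the degradation master equation
    dp_n/dt = k (n+1) p_{n+1} - k n p_n, with p_N(0) = 1.
    Returns the mean molecule number at t_final."""
    p = [0.0] * (N + 1)
    p[N] = 1.0
    for _ in range(int(t_final / dt)):
        new = p[:]
        for n in range(N + 1):
            inflow = k * (n + 1) * p[n + 1] if n < N else 0.0
            new[n] += dt * (inflow - k * n * p[n])
        p = new
    return sum(n * pn for n, pn in enumerate(p))

mean = cme_mean()
exact = 20 * math.exp(-0.1 * 5.0)   # M(t) = N e^{-kt} from (1.9)
```

With these parameters the numerically computed mean agrees with the analytical expression to well within Euler's discretisation error.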

1.2 Stochastic simulation

We would like to be able to generate individual trajectories from the model using a computational algorithm (this is most useful when we are unable to solve the chemical master equation explicitly). An efficient algorithm involves generating random numbers that represent the waiting times between successive degradation events. In order to do this, we need to understand, for each time t, how to compute the time, t + τ, at which the next molecule degrades. We note that τ is a random variable, so we need to calculate its probability distribution function, and then determine how to draw random variates from this distribution.

1.2.1 Waiting times

Let f(A(t), s) ds denote the probability that, given A(t) molecules of A in the system at time t, the next degradation reaction occurs in the time interval [t + s, t + s + ds), where ds is an (infinitesimally) small time step. For this to happen, we know that there cannot be a reaction in the time interval [t, t + s), and then a reaction must occur in the time interval [t + s, t + s + ds). Hence we can write

f(A(t), s) ds = g(A(t), s) A(t + s) k ds = g(A(t), s) A(t) k ds,  (1.10)

where g(A(t), s) is the probability that no reaction occurs during the time interval [t, t + s) when there are A(t) molecules at time t (so that A(t + s) = A(t)). For any σ > 0, the probability that no reaction happens in the interval [t, t + σ + dσ) is given by

g(A(t), σ + dσ) = g(A(t), σ) [1 - A(t + σ) k dσ] = g(A(t), σ) [1 - A(t) k dσ].  (1.11)

Rearranging and taking the limit as dσ → 0 gives

dg(A(t), σ)/dσ = -kA(t) g(A(t), σ),  so that  g(A(t), σ) = e^{-kA(t)σ},  (1.12)

upon noting that g(A(t), 0) = 1. Substituting into Equation (1.10) gives

f(A(t), s) ds = kA(t) e^{-kA(t)s} ds.  (1.13)

1.2.2 Generating random numbers

To generate stochastic sample paths, we now need a means by which to generate random numbers distributed according to Equation (1.13). We start by considering the function

F(τ) = e^{-kA(t)τ},  (1.14)

which is monotone decreasing for A(t) > 0 and such that F : (0, ∞) → (0, 1). For a, b ∈ (0, 1) with a < b we then have that

P(F(τ) ∈ (a, b)) = P(τ ∈ (F^{-1}(b), F^{-1}(a))),  (1.15)

or, equivalently,

P(F(τ) ∈ (a, b)) = ∫_{F^{-1}(b)}^{F^{-1}(a)} f(A(t), s) ds = ∫_{F^{-1}(b)}^{F^{-1}(a)} kA(t) e^{-kA(t)s} ds = ∫_{F^{-1}(b)}^{F^{-1}(a)} (-dF/ds) ds = -F(F^{-1}(a)) + F(F^{-1}(b)) = b - a.  (1.16)

This means that if τ is a random number distributed according to Equation (1.13), then F(τ) is a random number uniformly distributed in (0, 1). As such, if we can generate a random number, r, uniformly distributed on (0, 1), then we can generate the time of the next reaction by solving

r = F(τ) = e^{-kA(t)τ},  (1.17)

to give

τ = (1/(kA(t))) ln(1/r).  (1.18)

1.2.3 Stochastic simulation algorithm for degradation

The stochastic simulation algorithm for a degradation reaction can then be written:

1. Set t = 0 and A(t) = N.
2. Generate a random number r ∼ U(0, 1) and set τ = (1/(kA(t))) ln(1/r).
3. (a) If t + τ ≤ t_final, set A(t + τ) = A(t) - 1 and t = t + τ. If A(t) > 0, return to Step 2; otherwise exit.
   (b) If t + τ > t_final, set A(t_final) = A(t), t = t_final and exit.

Figure 1.1 shows a number of sample paths generated using this stochastic simulation algorithm.

1.3 Connection with the reaction rate equation

Using the chemical master equation, (1.6) and (1.7), we can show that the mean number of A molecules,

M(t) = Σ_{n=0}^{N} n p_n(t),  (1.19)

satisfies (as we might have expected from consideration of the corresponding deterministic model)

dM/dt = -kM  with  M(0) = N,  (1.20)

which has solution given by (1.9). Evolution of the mean molecule number is plotted in Figure 1.1, alongside a number of sample paths.
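The inverse-transform step (1.17)-(1.18) is easy to test in isolation. The following Python sketch (an illustrative addition; the notes' codes are Matlab) draws many waiting times with the parameter values of Figure 1.1 and checks that the sample mean is close to 1/(kA), the mean of the exponential distribution (1.13).

```python
import math
import random

def next_reaction_time(k, A, rng=random.random):
    """Inverse transform (1.18): tau = ln(1/r)/(k A), with r ~ U(0,1)."""
    return math.log(1.0 / rng()) / (k * A)

random.seed(0)
k, A = 0.1, 20   # parameter values of Figure 1.1
samples = [next_reaction_time(k, A) for _ in range(200000)]
mean_tau = sum(samples) / len(samples)
# f(A, s) = kA e^{-kAs} is exponential with rate kA = 2, so mean_tau ≈ 1/(kA) = 0.5
```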

Figure 1.1: Sample paths from a degradation reaction system. Left: four different sample paths. Right: twenty different sample paths, with the mean (black dashed line). Parameters are A(0) = 20 and k = 0.1 s^-1.

1.4 References and further reading

A practical guide to stochastic simulations of reaction-diffusion processes. R. Erban, S. J. Chapman and P. K. Maini. arXiv (2007).

A rigorous derivation of the chemical master equation. D. T. Gillespie. Physica A 188 (1992).

Exact stochastic simulation of coupled chemical reactions. D. T. Gillespie. J. Phys. Chem. 81 (1977).

1.5 Tasks

Implement a stochastic simulation algorithm that models the degradation reaction (1.1), and plot some sample paths generated using the algorithm and the parameter values in Figure 1.1.

Use your algorithm to estimate the evolution of the mean and variance in the number of A molecules by averaging over a large number of sample paths. By plotting both on the same axes, compare your result with the analytical expressions for the mean and variance given in Equation (1.9).

1.6 Example Matlab code

To simulate sample paths from the degradation model using the Gillespie algorithm.

function degradation_lecture()
clear all; close all;

N=20;        % initial number of A molecules
k=0.1;       % reaction rate
t_final=30;  % final time

% create variables to store the results
no_paths=5;          % number of sample paths
A=cell(no_paths,1);  % cell array to record numbers of A molecules
t=cell(no_paths,1);  % cell array to record reaction times

%%
% analytic expressions for the mean and variance
t_grid=0:0.5:t_final;
M=N*exp(-k*t_grid);
V=N*exp(-k*t_grid).*(1-exp(-k*t_grid));

%%
% use the Gillespie algorithm to generate sample paths
for ii=1:no_paths
    % set and record the initial time and molecule numbers
    jj=1; t{ii}(jj)=0; A{ii}(jj)=N;
    % perform reactions until the final time is reached
    while t{ii}(jj)<t_final
        tau=1/(k*A{ii}(jj))*log(1/rand); % time until the next reaction
        if t{ii}(jj)+tau<=t_final
            % update molecule numbers and time
            A{ii}(jj+1)=A{ii}(jj)-1;
            t{ii}(jj+1)=t{ii}(jj)+tau;
            jj=jj+1;
            if A{ii}(jj)==0 % stop if all molecules of A have decayed
                break;
            end
        else
            A{ii}(jj+1)=A{ii}(jj);
            t{ii}(jj+1)=t_final;
            break;
        end
    end
end

%%
% plot the results
figure(1); clf; hold on; box on
for ii=1:no_paths
    stairs(t{ii},A{ii},'linewidth',1)
end
plot(t_grid,M,'k--','linewidth',2)
axis([0 30 0 N])
set(gca,'xtick',0:10:30)
set(gca,'ytick',0:5:20)
xlabel('time [sec]'); ylabel('number of A molecules')

end

Chapter 2: Modelling production and degradation

Consider the production and degradation of the chemical species A according to the following reactions:

A --k1--> ∅,  (2.1)
∅ --k2--> A,  (2.2)

where k1 > 0 is the rate of degradation (units s^-1) and k2 > 0 is the rate of production of A per unit volume (units m^-3 s^-1). This means that one molecule of A is produced during the time interval [t, t + dt) with probability k2 ν dt, where ν is the volume of the system. As before, we denote the number of molecules of A present at time t by A(t). Then we have

P(no reactions in [t, t + dt)) = 1 - (k1 A(t) + k2 ν) dt + O(dt^2),  (2.3)
P(one A molecule decays in [t, t + dt)) = k1 A(t) dt + O(dt^2),  (2.4)
P(one A molecule produced in [t, t + dt)) = k2 ν dt + O(dt^2),  (2.5)
P(more than one reaction in [t, t + dt)) = O(dt^2).  (2.6)

2.1 The chemical master equation

As before, let p_n(t) denote the probability that there are n molecules of A present in the system at time t. Then, for n > 0, we have

p_n(t + dt) = (1 - k1 n dt - k2 ν dt) p_n(t) + k1 (n + 1) dt p_{n+1}(t) + k2 ν dt p_{n-1}(t) + O(dt^2),  (2.7)

so that, rearranging and taking the limit dt → 0, we arrive at

dp_n/dt = k1 (n + 1) p_{n+1} - k1 n p_n + k2 ν p_{n-1} - k2 ν p_n.  (2.8)

For the case n = 0 we have

dp_0/dt = k1 p_1 - k2 ν p_0.  (2.9)

Equations (2.8) and (2.9) constitute the chemical master equation, a system of ordinary differential equations for the probabilities p_n, where n = 0, 1, 2, ... and A(0) = N is the initial number of A molecules in the system.

2.1.1 Mean and variance of molecule number

We can use the chemical master equation, (2.8) and (2.9), to derive equations for the mean and variance of A(t):

M(t) = Σ_{n=0}^∞ n p_n(t)  and  V(t) = Σ_{n=0}^∞ (n - M(t))^2 p_n(t).  (2.10)

Multiplying the chemical master equation by n and summing over n, we have

d/dt Σ_{n=0}^∞ n p_n = k1 Σ_{n=0}^∞ n(n + 1) p_{n+1} - k1 Σ_{n=0}^∞ n^2 p_n + k2 ν Σ_{n=0}^∞ n p_{n-1} - k2 ν Σ_{n=0}^∞ n p_n,  (2.11)

where we define p_{-1} ≡ 0 to write (2.9) in the same form as (2.8). Changing indices on the right-hand side (e.g. n ± 1 → n) gives

dM/dt = k1 Σ_{n=0}^∞ (n - 1) n p_n - k1 Σ_{n=0}^∞ n^2 p_n + k2 ν Σ_{n=0}^∞ (n + 1) p_n - k2 ν Σ_{n=0}^∞ n p_n
      = -k1 Σ_{n=0}^∞ n p_n + k2 ν Σ_{n=0}^∞ p_n
      = -k1 M + k2 ν,  (2.12)

with

M(t) = (k2 ν / k1)(1 - e^{-k1 t}) + N e^{-k1 t} → k2 ν / k1  as  t → ∞.  (2.13)

To derive an expression for the variance, we multiply the chemical master equation by n^2, sum over n and change indices on the right-hand side to obtain

d/dt Σ_{n=0}^∞ n^2 p_n = k1 Σ_{n=0}^∞ n^2 (n + 1) p_{n+1} - k1 Σ_{n=0}^∞ n^3 p_n + k2 ν Σ_{n=0}^∞ n^2 p_{n-1} - k2 ν Σ_{n=0}^∞ n^2 p_n
                      = k1 Σ_{n=0}^∞ (-2n^2 + n) p_n + k2 ν Σ_{n=0}^∞ (2n + 1) p_n.  (2.14)

Substituting into the expression for V(t), we have

dV/dt = d/dt Σ_{n=0}^∞ n^2 p_n(t) - 2M dM/dt
      = -2k1 (V + M^2) + k1 M + 2 k2 ν M + k2 ν - 2M(-k1 M + k2 ν)
      = -2k1 V + k1 M + k2 ν.  (2.15)

We can then see that the stationary behaviour (t → ∞) is such that

M_s = lim_{t→∞} M(t) = k2 ν / k1  and  V_s = lim_{t→∞} V(t) = k2 ν / k1 = M_s.  (2.16)

Evolution of the mean molecule number towards this steady state is plotted in Figure 2.1.

2.2 The stationary distribution

The variance, V(t), gives us some information about the fluctuations in molecule number. However, we can learn more about the fluctuations about the (quasi-)steady state by considering the stationary distribution:

φ(n) := lim_{t→∞} p_n(t),  n = 0, 1, 2, ...  (2.17)

We can compute φ(n) by considering the steady states of the chemical master equation:

0 = k1 φ(1) - k2 ν φ(0);  (2.18)
0 = k1 (n + 1) φ(n + 1) - k1 n φ(n) + k2 ν φ(n - 1) - k2 ν φ(n);  (2.19)

for n ≥ 1, to arrive at the recursive definition

φ(1) = (k2 ν / k1) φ(0),  (2.20)
φ(n + 1) = (1/(k1 (n + 1))) [k1 n φ(n) + k2 ν φ(n) - k2 ν φ(n - 1)],  n ≥ 1.  (2.21)

We can eliminate the remaining constant, φ(0), by noting that Σ_{n=0}^∞ φ(n) = 1. The stationary distribution is plotted in Figure 2.1.

2.3 Stochastic simulation

As previously, we would like to be able to generate individual sample paths from the model using a computational algorithm. This time we have two decisions: when the next reaction occurs; and which reaction (production or degradation) occurs.

2.3.1 Waiting times

We can use the same principles as before to find the waiting time, τ, until the next reaction: let f(A(t), s) ds denote the probability that, given A(t) molecules of A in the system at time t, the next reaction occurs in the time interval [t + s, t + s + ds), where ds is an (infinitesimally) small time step. For this to happen, we know that there cannot be a reaction in the time interval [t, t + s), and then a reaction must occur in the time interval [t + s, t + s + ds). Hence we can write

f(A(t), s) ds = g(A(t), s) (A(t + s) k1 + k2 ν) ds = g(A(t), s) (A(t) k1 + k2 ν) ds,  (2.22)

where g(A(t), s) is the probability that no reaction occurs during the time interval [t, t + s). For any σ > 0, the probability that no reaction happens in the interval [t, t + σ + dσ) is given by

g(A(t), σ + dσ) = g(A(t), σ) [1 - (A(t + σ) k1 + k2 ν) dσ] = g(A(t), σ) [1 - (A(t) k1 + k2 ν) dσ].  (2.23)

Rearranging and taking the limit as dσ → 0 gives

dg(A(t), σ)/dσ = -a_0(t) g(A(t), σ)  where  a_0(t) = A(t) k1 + k2 ν,  (2.24)

hence

g(A(t), σ) = e^{-a_0(t)σ}.  (2.25)

Substituting into Equation (2.22) gives

f(A(t), s) ds = a_0(t) e^{-a_0(t)s} ds.  (2.26)

2.3.2 Generating random numbers for the waiting time

To generate stochastic sample paths, we now need a means by which to generate random numbers distributed according to Equation (2.26). Following the same arguments as in Section 1.2.2, if we can generate a random number, r_1, uniformly distributed on (0, 1), then we can generate the time of the next reaction by solving

r_1 = F(τ) = e^{-a_0(t)τ},  (2.27)

to give

τ = (1/a_0(t)) ln(1/r_1),  (2.28)

where, as in Equation (2.24), a_0(t) = k1 A(t) + k2 ν.

2.3.3 Choosing which reaction occurs

Whether the reaction that occurs is a production or degradation reaction depends on the relative probabilities of the two reactions:

P(degradation reaction occurs) = k1 A(t) / (k1 A(t) + k2 ν) = k1 A(t) / a_0(t);  (2.29)
P(production reaction occurs) = k2 ν / (k1 A(t) + k2 ν) = k2 ν / a_0(t).  (2.30)

This means that if we can draw a random number, r_2, uniformly distributed on (0, 1), then we can decide which reaction occurs using the following rule:

degradation reaction occurs if r_2 a_0(t) ∈ [0, k1 A(t));  (2.31)
production reaction occurs if r_2 a_0(t) ∈ [k1 A(t), k1 A(t) + k2 ν).  (2.32)

2.3.4 Stochastic simulation algorithm for production and degradation

The stochastic simulation algorithm for the production-degradation system can then be written:

1. Set t = 0 and A(t) = N.
2. Calculate a_0(t).
3. Generate a random number r_1 ∼ U(0, 1) and set τ = (1/a_0(t)) ln(1/r_1).
4. (a) If t + τ ≤ t_final, then generate a random number r_2 ∼ U(0, 1).
      i. If r_2 a_0(t) ∈ [0, k1 A(t)), then set A(t + τ) = A(t) - 1 and t = t + τ.
      ii. If r_2 a_0(t) ∈ [k1 A(t), k1 A(t) + k2 ν), then set A(t + τ) = A(t) + 1 and t = t + τ.
      Return to Step 2.
   (b) If t + τ > t_final, exit.

Five sample paths generated using this algorithm are plotted in Figure 2.1. A large number (10^7) of sample paths generated using the algorithm are also used to approximate the stationary distribution on the right of Figure 2.1.

2.4 References and further reading

A practical guide to stochastic simulations of reaction-diffusion processes. R. Erban, S. J. Chapman and P. K. Maini. arXiv (2007).

A rigorous derivation of the chemical master equation. D. T. Gillespie. Physica A 188 (1992).

Exact stochastic simulation of coupled chemical reactions. D. T. Gillespie. J. Phys. Chem. 81 (1977).
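The moment equations (2.12) and (2.15) can themselves be checked by direct numerical integration. This short Python sketch (an illustrative addition, not from the notes, which use Matlab) integrates the pair of ODEs with the parameter values of Figure 2.1 and confirms the stationary values M_s = V_s = k2ν/k1 = 10.

```python
def moments(N=0, k1=0.1, k2nu=1.0, t_final=200.0, dt=1e-3):
    """Forward-Euler integration of dM/dt = -k1 M + k2nu (2.12)
    and dV/dt = -2 k1 V + k1 M + k2nu (2.15), from M(0) = N, V(0) = 0."""
    M, V = float(N), 0.0
    for _ in range(int(t_final / dt)):
        M, V = (M + dt * (-k1 * M + k2nu),
                V + dt * (-2.0 * k1 * V + k1 * M + k2nu))
    return M, V

M, V = moments()
# both approach the stationary values M_s = V_s = k2nu/k1 = 10
```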

Figure 2.1: Sample paths from a production-degradation reaction system. Left: five different sample paths with the mean (black dashed line). Right: the stationary distribution calculated using both equations (2.20) and (2.21) (orange) and repeated stochastic simulation (grey). Parameters are A(0) = 0, k1 = 0.1 s^-1 and k2 ν = 1.0 s^-1.

2.5 Tasks

Implement a stochastic simulation algorithm that models the production-degradation system (2.1)-(2.2), and plot some sample paths generated using the algorithm and the parameter values in Figure 2.1.

Use your algorithm to estimate the stationary distribution, as in Figure 2.1. In order to make this computationally efficient you can, for example, collect data from a single sample path every second for 10^7 seconds. By plotting both on the same axes, compare your result with the analytical expression specified in (2.20)-(2.21). Note that to do this, you should show inductively that

φ(n) = (C/n!) (k2 ν / k1)^n  with  C = exp(-k2 ν / k1).  (2.33)

2.6 Example Matlab code

To generate sample paths from the production-degradation model using the Gillespie algorithm, and use a long sample path to estimate the stationary distribution.

function production_degradation_lecture()
clear all; close all;

N=0;          % initial number of A molecules
k1=0.1;       % decay rate
k2V=1;        % production rate
t_final=100;  % final time

%%
% create variables to store the results

no_paths=5;
A=cell(no_paths,1); % cell array to record numbers of A molecules
t=cell(no_paths,1); % cell array to record reaction times

% analytic expression for the mean
t_grid=0:0.5:t_final;
M=k2V/k1*(1-exp(-k1*t_grid));

% generate sample paths using the Gillespie algorithm
for i=1:no_paths
    % set and record the initial time and molecule numbers
    j=1; t{i}(j)=0; A{i}(j)=N;
    while t{i}(j)<t_final
        a0=k1*A{i}(j)+k2V;    % calculate a0
        tau=1/a0*log(1/rand); % time until the next reaction
        if t{i}(j)+tau<=t_final
            % update molecule numbers and time
            r2=a0*rand;
            if r2<k1*A{i}(j)
                A{i}(j+1)=A{i}(j)-1;
            else
                A{i}(j+1)=A{i}(j)+1;
            end
            t{i}(j+1)=t{i}(j)+tau;
            j=j+1;
        else
            break;
        end
    end
end

%%
% plot the results
figure(1); clf; hold on; box on
for i=1:no_paths
    stairs(t{i},A{i},'linewidth',1)
end
plot(t_grid,M,'k--','linewidth',2)
axis([0 80 0 20])
set(gca,'xtick',0:20:80)
set(gca,'ytick',0:5:20)
xlabel('time [sec]'); ylabel('number of A molecules')

%%
% estimate the stationary distribution using a very long sample path
path_length=1e7;           % length of the sample path (seconds)
t=0; A=k2V/k1;             % initial value of the path
A_stationary=zeros(101,1); % vector to hold the observed frequencies of A

for i=1:path_length
    % record the value of A every second
    while t<i
        a0=k1*A+k2V;
        tau=1/a0*log(1/rand);
        % update molecule numbers and time
        r2=a0*rand;
        if r2<k1*A
            A=A-1;
        else
            A=A+1;
        end
        t=t+tau;
    end
    if A<=100
        A_stationary(A+1)=A_stationary(A+1)+1;
    end
end

% analytic expression for the stationary distribution
C=exp(-k2V/k1);
n_phi=0:1:25;
phi=C./factorial(n_phi).*(k2V/k1).^n_phi;

% plot the results
figure(2); clf; cla; hold on; box on
bar(0:1:100,A_stationary/path_length,'facecolor',[.7 .7 .7],...
    'LineWidth',0.5,'BarWidth',0.6);
plot(n_phi,phi,'*','color',[0 0 0],'markersize',2)
axis([0 25 0 0.15])
set(gca,'xtick',0:5:25)
set(gca,'ytick',0:0.05:0.15)
set(gca,'yticklabel',arrayfun(@(s)sprintf('%.2f',s),cellfun(@(s)str2num(s),...
    get(gca,'yticklabel')),'uniformoutput',false))
ylabel('frequency'); xlabel('number of A molecules')

end
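As a quick cross-check of the closed form (2.33), the recursion (2.20)-(2.21) can be evaluated numerically and compared against the Poisson distribution directly. A short Python sketch (illustrative, not part of the notes; the truncation point n_max is an arbitrary choice):

```python
import math

def stationary_phi(k1=0.1, k2nu=1.0, n_max=30):
    """Evaluate phi(n) from the recursion (2.20)-(2.21) with phi(0) = 1
    provisionally, then normalise so that sum_n phi(n) = 1."""
    phi = [1.0, (k2nu / k1) * 1.0]                       # (2.20)
    for n in range(1, n_max):
        phi.append((k1 * n * phi[n] + k2nu * phi[n] - k2nu * phi[n - 1])
                   / (k1 * (n + 1)))                      # (2.21)
    total = sum(phi)
    return [p / total for p in phi]

phi = stationary_phi()
lam = 1.0 / 0.1                                           # k2nu/k1 = 10
poisson = [lam ** n / math.factorial(n) for n in range(len(phi))]
s = sum(poisson)
poisson = [q / s for q in poisson]  # Poisson(10), truncated and renormalised
err = max(abs(a - b) for a, b in zip(phi, poisson))
```

The maximum pointwise difference between the recursion and the Poisson form is at the level of floating-point round-off, consistent with (2.33).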

Chapter 3: Modelling general chemical reactions

The reactions we have considered up until now have either been zeroth order (production of A) or first order (degradation of A). We would also like to be able to simulate second order reactions of the form

A + B --k--> C,  (3.1)
A + A --k--> B,  (3.2)

where, in each case, k > 0 is the rate of reaction.

We have to think carefully about the probabilities of each reaction and also how the probability of a reaction occurring during the time interval [t, t + dt) scales with the volume of the system, ν. For example, we would expect one molecule of A and one molecule of B to collide and react twice as often in a system that has volume ν/2 as compared to one with volume ν. This means that, for one molecule of A and one molecule of B, the probability of a reaction occurring in the time interval [t, t + dt) is k dt/ν, and that the rate constant k has units m^3 s^-1. When we have more than one molecule of A and/or B, we need to consider the number of different pairs of A and B molecules. This is equal to the product A(t)B(t), where A(t) and B(t) are the numbers of A and B molecules, respectively. The probability that reaction (3.1) occurs during the time interval [t, t + dt) is therefore A(t)B(t) k dt/ν.

chemical reaction       order    propensity function, a(t)          units of k
∅ --k--> A              zeroth   kν                                 m^-3 s^-1
A --k--> ∅              first    kA(t)                              s^-1
A + B --k--> C          second   kA(t)B(t)/ν                        m^3 s^-1
A + A --k--> B          second   kA(t)(A(t) - 1)/ν                  m^3 s^-1
A + B + C --k--> D      third    kA(t)B(t)C(t)/ν^2                  m^6 s^-1
A + A + B --k--> C      third    kA(t)(A(t) - 1)B(t)/ν^2            m^6 s^-1
A + A + A --k--> B      third    kA(t)(A(t) - 1)(A(t) - 2)/ν^2      m^6 s^-1

Table 3.1: The basic types of reactions with their order, propensity function and the units of the rate constant. For each reaction, the propensity function is defined such that the probability of reaction R_i occurring in the infinitesimally small time interval [t, t + dt) is a_i(t) dt. We can use the propensity functions to understand how the units of the reaction rate constant change for each type of reaction.

Table 3.1 lists the reactions we have discussed thus far, along with their order, propensity function and the units of k. Note that for reaction (3.2) the number of different

pairs of A molecules is

(A(t) choose 2) = A(t)(A(t) - 1)/2,  (3.3)

and that it is common practice to absorb the factor 1/2 into the rate constant, k. Similar choices are made for third and higher order reactions also. Note that this might be different notation to that used in the Part B Further Mathematical Biology course.

3.1 The chemical master equation

To write down the chemical master equation for a general reaction system, we need some additional terminology. We will consider a biochemical network consisting of N species, S_1, ..., S_N, that may be involved in M possible reactions R_1, ..., R_M. The population size of S_i is known as its copy number and is denoted by X_i(t) at time t. The state vector is then defined as

X(t) := (X_1(t), ..., X_N(t))^T.  (3.4)

With every reaction, j, we associate two quantities. The first is the propensity function, a_j(X(t)), which we have already discussed. The second is the stoichiometric, or state-change, vector,

ν_j := (ν_{1j}, ..., ν_{Nj})^T,  (3.5)

where ν_{ij} is the change in species i caused by the firing of reaction j.

As before, we construct the chemical master equation by considering how the probability that the system is in a given state changes through time. Define

P(x, t | x_0, t_0) = P(X(t) = x given X(t_0) = x_0).  (3.6)

Then, by considering the possible changes in species numbers brought about by a single reaction taking place, we have

dP(x, t | x_0, t_0)/dt = Σ_{j=1}^{M} [a_j(x - ν_j) P(x - ν_j, t | x_0, t_0) - a_j(x) P(x, t | x_0, t_0)].  (3.7)

Notes

The chemical master equation in fact constitutes a (possibly infinite) system of ODEs that is closed by specifying an initial condition.

The description of stochastic chemical kinetics used here is a Markov jump process, and the chemical master equation is otherwise known as Kolmogorov's forward equation for the Markov jump process.
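To see the bookkeeping in (3.7) in action, the sketch below (Python, added for illustration; the function name and the test probabilities are arbitrary choices) evaluates the general right-hand side for the one-species production-degradation network of Chapter 2 and checks it term by term against the explicit form (2.8).

```python
def cme_rhs(p, propensities, stoich):
    """Right-hand side of the general CME (3.7) for a one-species network.
    p: dict n -> p_n; propensities: functions a_j(n); stoich: integers nu_j."""
    rhs = {}
    for n in p:
        total = 0.0
        for a, nu in zip(propensities, stoich):
            total += a(n - nu) * p.get(n - nu, 0.0) - a(n) * p[n]
        rhs[n] = total
    return rhs

# production-degradation network of Chapter 2: A -> ∅ with a_1(n) = k1 n
# (nu_1 = -1) and ∅ -> A with a_2(n) = k2 nu (nu_2 = +1)
k1, k2nu = 0.1, 1.0
props = [lambda n: k1 * max(n, 0), lambda n: k2nu]
stoich = [-1, +1]
p = {0: 0.5, 1: 0.3, 2: 0.2}     # an arbitrary test distribution
rhs = cme_rhs(p, props, stoich)
# explicit form (2.8): dp_n/dt = k1 (n+1) p_{n+1} - k1 n p_n + k2nu p_{n-1} - k2nu p_n
direct = {n: k1 * (n + 1) * p.get(n + 1, 0.0) - k1 * n * p[n]
             + k2nu * p.get(n - 1, 0.0) - k2nu * p[n] for n in p}
```

The two expressions agree exactly, since (2.8) is just (3.7) written out for this network.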
3.2 Stochastic simulation

The stochastic simulation algorithms outlined in Lectures 1-2 are special forms of the Gillespie Direct Method stochastic simulation algorithm. We will now generalise it to allow the generation of sample paths from Equation (3.7). Once again, we have two decisions: when the next reaction occurs; and which reaction occurs.

3.2.1 Generating sample paths for the waiting time

Following the same arguments as in Lectures 1-2, for a system in state X(t) at time t, if we can generate a random number, r_1, uniformly distributed on (0, 1), then we can generate the time of the next reaction as

τ = (1/a_0(t)) ln(1/r_1)  where  a_0(t) = Σ_{j=1}^{M} a_j(t),  (3.8)

with a_j(t) the propensity of reaction j and, for brevity, we have dropped the explicit dependence of the a_j upon the state X(t).

3.2.2 Choosing which reaction occurs

Again, the same arguments as in Lectures 1-2 tell us that

P(reaction R_j occurs) = a_j(t) / Σ_{j=1}^{M} a_j(t) = a_j(t)/a_0(t).  (3.9)

This means that if we can draw a random number, r_2, uniformly distributed on (0, 1), then we can decide which reaction occurs using the following rule:

reaction R_j occurs if  Σ_{i=1}^{j-1} a_i(t) ≤ r_2 a_0(t) < Σ_{i=1}^{j} a_i(t).  (3.10)

3.2.3 Gillespie stochastic simulation algorithm

The Gillespie (direct method) stochastic simulation algorithm can then be written:

1. Set t = t_0 and X(t) = X(t_0).
2. Calculate a_j(X(t)) for j = 1, ..., M, and a_0(t).
3. Generate a random number r_1 ∼ U(0, 1) and set τ = (1/a_0(t)) ln(1/r_1).
4. (a) If t + τ ≤ t_final then:
      i. generate a random number r_2 ∼ U(0, 1);
      ii. find j such that Σ_{i=1}^{j-1} a_i(t) ≤ r_2 a_0(t) < Σ_{i=1}^{j} a_i(t).
      Set X(t + τ) = X(t) + ν_j, t = t + τ and return to Step 2.
   (b) If t + τ > t_final, exit.

3.3 Example

Consider two chemical species, A and B, that undergo the following chemical reactions

A + A --k1--> ∅,  (3.11)
A + B --k2--> ∅,  (3.12)
∅ --k3--> A,  (3.13)
∅ --k4--> B,  (3.14)

in a system with volume ν, where k1, k2, k3 and k4 are all positive rate constants.

3.3.1 Chemical master equation

Let

p_{n,m}(t) = P(A(t) = n and B(t) = m | A(t_0) = n_0 and B(t_0) = m_0),  (3.15)

where we adopt the previous convention that A(t) denotes the number of A molecules present at time t, and similarly for B. We then have

dp_{n,m}/dt = (k1/ν)(n + 2)(n + 1) p_{n+2,m} - (k1/ν) n(n - 1) p_{n,m}
            + (k2/ν)(n + 1)(m + 1) p_{n+1,m+1} - (k2/ν) nm p_{n,m}
            + k3 ν p_{n-1,m} - k3 ν p_{n,m} + k4 ν p_{n,m-1} - k4 ν p_{n,m},  (3.16)

for n, m ≥ 0, with the convention, as before, that p_{n,m} ≡ 0 if n < 0 or m < 0. The stationary distribution is then defined as

φ(n, m) = lim_{t→∞} p_{n,m}(t),  (3.17)

and one can also compute the stationary distribution of A only as

φ(n) = Σ_{m=0}^∞ φ(n, m).  (3.18)

The stationary distributions for reaction system (3.11)-(3.14) are plotted in Figure 3.2.

3.3.2 Stochastic simulation

Since reactions (3.11) and (3.12) are second order, we cannot solve Equation (3.16) analytically, nor can we obtain closed evolution equations for the stochastic mean and variance. This means that to make progress one option is to generate statistics for the system using repeated stochastic simulation. For the reaction system (3.11)-(3.14) we need to compute the following reaction propensities and stoichiometric vectors:

a_1(t) = (k1/ν) A(t)(A(t) - 1),  ν_1 = (-2, 0)^T;  (3.19)
a_2(t) = (k2/ν) A(t)B(t),  ν_2 = (-1, -1)^T;  (3.20)
a_3(t) = k3 ν,  ν_3 = (+1, 0)^T;  (3.21)
a_4(t) = k4 ν,  ν_4 = (0, +1)^T;  (3.22)

with a_0(t) = a_1(t) + a_2(t) + a_3(t) + a_4(t). Five different sample paths generated using this algorithm are shown in Figure 3.1.
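Propensities and state-change vectors such as (3.19)-(3.22) slot straight into a generic direct-method routine. Below is a Python sketch of the algorithm of Section 3.2.3 (the notes' codes are Matlab; the function and its sanity check on the Chapter 1 degradation reaction, where the answer is known, are illustrative additions):

```python
import math
import random

def gillespie(x0, propensities, stoich, t_final, rng):
    """Gillespie direct method (Section 3.2.3).
    propensities: list of functions x -> a_j(x); stoich: state-change vectors.
    Returns the state at time t_final."""
    x, t = list(x0), 0.0
    while True:
        a = [f(x) for f in propensities]
        a0 = sum(a)
        if a0 == 0.0:
            return x                       # nothing can fire any more
        tau = math.log(1.0 / rng.random()) / a0
        if t + tau > t_final:
            return x
        t += tau
        r2a0 = rng.random() * a0
        cum, j = 0.0, 0
        # find j with sum_{i<j} a_i <= r2 a0 < sum_{i<=j} a_i, as in (3.10)
        while j < len(a) - 1 and cum + a[j] <= r2a0:
            cum += a[j]
            j += 1
        x = [xi + d for xi, d in zip(x, stoich[j])]

# For system (3.11)-(3.14) one would pass, e.g.,
#   propensities = [lambda x: k1/nu*x[0]*(x[0]-1), lambda x: k2/nu*x[0]*x[1],
#                   lambda x: k3*nu, lambda x: k4*nu]
#   stoich = [(-2, 0), (-1, -1), (+1, 0), (0, +1)]
# Sanity check on degradation (Chapter 1), where the mean is N e^{-kT}:
rng = random.Random(2)
k, N, T = 0.1, 20, 5.0
final = [gillespie([N], [lambda x: k * x[0]], [(-1,)], T, rng)[0]
         for _ in range(5000)]
avg = sum(final) / len(final)
# avg should be close to 20 e^{-0.5} ≈ 12.13
```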

Figure 3.1: Sample paths from system (3.11)-(3.14). Left: numbers of A molecules; right: numbers of B molecules. Numerical solutions of the reaction rate equations (3.28)-(3.29) are indicated by black dashed lines. Parameters are A(0) = 0, B(0) = 0, k1/ν = 0.001 s^-1, k2/ν = 0.01 s^-1, k3 = 1.2 s^-1 and k4 = 1.0 s^-1.

3.3.3 Reaction rate equations

The connection with the reaction rate equations can be found by using Equation (3.16) to derive expressions for the evolution of the mean A and B molecule numbers:

⟨A⟩ = Σ_{n=0}^∞ Σ_{m=0}^∞ n p_{n,m}(t)  and  ⟨B⟩ = Σ_{n=0}^∞ Σ_{m=0}^∞ m p_{n,m}(t).  (3.23)

We have

d⟨A⟩/dt = -(2k1/ν) ⟨A(A - 1)⟩ - (k2/ν) ⟨AB⟩ + k3 ν,  (3.24)
d⟨B⟩/dt = -(k2/ν) ⟨AB⟩ + k4 ν,  (3.25)

where

⟨A^2⟩ = Σ_{n=0}^∞ Σ_{m=0}^∞ n^2 p_{n,m}(t)  and  ⟨AB⟩ = Σ_{n=0}^∞ Σ_{m=0}^∞ nm p_{n,m}(t).  (3.26)

The reaction rate equations can be deduced from equations (3.24)-(3.25) by taking limits as the molecule numbers, A(t) and B(t), and the system size, ν, tend to infinity in such a way that

a(t) = A(t)/ν  and  b(t) = B(t)/ν,  (3.27)

are held constant, and assuming ⟨A^2⟩ = ⟨A⟩^2 and ⟨AB⟩ = ⟨A⟩⟨B⟩, to arrive at

da/dt = -2k1 a^2 - k2 ab + k3,  (3.28)
db/dt = -k2 ab + k4.  (3.29)

Note that this process of taking limits means that the reaction rate equations (3.28)-(3.29) do not exactly describe the mean behaviour of the system. In the next lecture and Problem Sheet 1 we shall explore this in more detail.
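As a sanity check on (3.28)-(3.29): setting the right-hand sides to zero gives ab = k4/k2 and 2 k1 a^2 = k3 - k4, i.e. a_s = sqrt((k3 - k4)/(2 k1)) and b_s = k4/(k2 a_s). The Python sketch below (an illustration; the parameter values are those reconstructed for Figure 3.1, and the step size and final time are arbitrary choices) integrates the reaction rate equations by forward Euler and confirms convergence to this steady state.

```python
import math

def rre_steady(k1=0.001, k2=0.01, k3=1.2, k4=1.0, dt=0.01, t_final=2000.0):
    """Forward Euler for da/dt = -2 k1 a^2 - k2 a b + k3 (3.28)
    and db/dt = -k2 a b + k4 (3.29), from a(0) = b(0) = 0."""
    a, b = 0.0, 0.0
    for _ in range(int(t_final / dt)):
        a, b = (a + dt * (-2.0 * k1 * a * a - k2 * a * b + k3),
                b + dt * (-k2 * a * b + k4))
    return a, b

a_s = math.sqrt((1.2 - 1.0) / (2 * 0.001))   # analytical steady state, = 10
b_s = 1.0 / (0.01 * a_s)                     # = 10
a, b = rre_steady()
```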

Figure 3.2: The stationary distribution for system (3.11)-(3.14) estimated using long time paths generated using the stochastic simulation algorithm. Left: φ(n, m). Right: φ(n), with the mean predicted by the reaction rate equations (3.28)-(3.29) indicated in orange. Parameters are A(0) = 0, B(0) = 0, k1/ν = 0.001 s^-1, k2/ν = 0.01 s^-1, k3 = 1.2 s^-1 and k4 = 1.0 s^-1.

3.4 References and further reading

A practical guide to stochastic simulations of reaction-diffusion processes. R. Erban, S. J. Chapman and P. K. Maini. arXiv (2007).

A rigorous derivation of the chemical master equation. D. T. Gillespie. Physica A 188 (1992).

Exact stochastic simulation of coupled chemical reactions. D. T. Gillespie. J. Phys. Chem. 81 (1977).

3.5 Tasks

Implement a stochastic simulation algorithm that models the two-species system (3.11)-(3.14), and plot some sample paths generated using the algorithm using the parameter values in Figure 3.1.

Use your algorithm to generate a long time sample path to estimate the stationary distributions for both A and B jointly, and for each of A and B individually, as in Figure 3.2.

Derive the reaction rate equations (3.28)-(3.29) from the chemical master equation (3.16) and compare the steady states predicted by the reaction rate equations with the stationary distribution. How do the average values predicted by long time simulation compare with those of the reaction rate equations?

3.6 Example Matlab code

To generate sample paths from the two species model using the Gillespie algorithm.

function example_lecture()
clear all;

24 Chapter 3 Stochastic modelling of biological processes 19 close all; initial A=; % initial number of A molecules initial B=; % initial number of B molecules % set the values of the rate constants k1=.1; k2=.1; k3=1.2; k4=1; t final=12; % final time %% % generate sample paths using the Gillespie algorithm % create data structures to hold the results no repeats=5; A=cell(no repeats,1); B=cell(no repeats,1); t=cell(no repeats,1); for i=1:no repeats % set the initial time and molecule numbers j=1; t{i}(j)=; A{i}(j)=initial A; B{i}(j)=initial B; while t{i}(j)<t final A=k1*A{i}(j)*(A{i}(j)-1)+k2*A{i}(j)*B{i}(j)+k3+k4; % calculate a tau=1/a*log(1/rand); % calculate the time to the next reaction if t{i}(j)+tau<=t final % update molecule numbers and time % figure out which reaction has taken place r2=a*rand; ss=k1*a{i}(j)*(a{i}(j)-1); % check to see if first reaction if r2<ss A{i}(j+1)=A{i}(j)-2; B{i}(j+1)=B{i}(j); else % if not, check to see if second reaction ss=ss+k2*a{i}(j)*b{i}(j); if r2<ss A{i}(j+1)=A{i}(j)-1; B{i}(j+1)=B{i}(j)-1; else % if not, check to see if third reaction ss=ss+k3; if r2<ss A{i}(j+1)=A{i}(j)+1; B{i}(j+1)=B{i}(j); else % if not, then it must be the fourth reaction

25 Chapter 3 Stochastic modelling of biological processes 2 A{i}(j+1)=A{i}(j); B{i}(j+1)=B{i}(j)+1; % update time t{i}(j+1)=t{i}(j)+tau; j=j+1; else break; %% % solve the reaction rate equations using a forward Euler method dt=.1; % time step t grid=:dt:t final; A det=zeros(length(t),1); B det=zeros(length(t),1); A det(1)=initial A; % initial A value B det(1)=initial B; % initial B value for i=1:length(t grid)-1 A det(i+1)=a det(i)+dt*(-2*k1*a det(i)ˆ2-k2*a det(i)*b det(i)+k3); B det(i+1)=b det(i)+dt*(-k2*a det(i)*b det(i)+k4); %% % plot the results figure(1); clf; hold on; box on for i=1:no repeats stairs(t{i},a{i},'linewidth',1) plot(t grid,a det,'k--','linewidth',2) axis([ 1 25]) set(gca,'xtick',:25:1) set(gca,'ytick',:5:25) xlabel('time [sec]'); ylabel('number of A molecules') figure(2); clf; hold on; box on for i=1:no repeats stairs(t{i},b{i},'linewidth',1) plot(t grid,b det,'k--','linewidth',2) axis([ 1 25]) set(gca,'xtick',:25:1) set(gca,'ytick',:5:25)

26 Chapter 3 Stochastic modelling of biological processes 21 xlabel('time [sec]'); ylabel('number of B molecules') To use a long sample path to estimate the stationary distribution. function example stationary density() clear all; close all; % set the parameter values k1=.1; k2=.1; k3=1.2; k4=1; % calculate the deterministic steady state initial A=sqrt((k3-k4)/(2*k1)); initial B=k4/(k2*initial A); %% t=; % round the initial molecule numbers A=round(initial A); B=round(initial B); path length=1e7; % set the path length % create matrices to hold values of A and B AB stationary=zeros(11,11); A stationary=zeros(11,1); A RRE=zeros(11,1); for i=1:path length % record the values of A and B every 1 seconds while t<i A=k1*A*(A-1)+k2*A*B+k3+k4; % calculate a tau=1/a*log(1/rand); % calculate the time until the next reaction % figure out which reaction has taken place r2=a*rand; ss=k1*a*(a-1); % check to see if first reaction if r2<ss A=A-2; B=B; else % if not, check to see if second reaction ss=ss+k2*a*b; if r2<ss A=A-1; B=B-1; else % if not, check to see if second reaction ss=ss+k3;

27 Chapter 3 Stochastic modelling of biological processes 22 if r2<ss A=A+1; B=B; % if not, must be fourth reaction else A=A; B=B+1; t=t+tau; % save the values of A and B if A<=1 && B<=1 A stationary(a+1)=a stationary(a+1)+1; AB stationary(a+1,b+1)=ab stationary(a+1,b+1)+1; A RRE(round(initial A)+1)=A stationary(round(initial A)+1); %% % plot the results % plot the joint stationary distribution of A and B figure(1); clf; hold on; box on imagesc(ab stationary/path length); axis([ 3 3]) colormap gray colormap(flipud(colormap)) plot(1,1,'+','color',[ ]); set(gca,'xtick',:5:3) set(gca,'ytick',:5:3) xlabel('number of A molecules'); ylabel('number of B molecules') % plot the marginal stationary distribution of A figure(2); clf; hold on; box on bar(:1:1,a stationary/path length,'facecolor',[.7.7.7],... 'LineWidth',.5,'BarWidth',.6) bar(:1:1,a RRE/path length,'facecolor',[ ],... 'LineWidth',.5,'BarWidth',.6) axis([ 3.1]) set(gca,'xtick',:5:3) set(gca,'ytick',:.2:.1) set(gca,'yticklabel',arrayfun(@(s)sprintf('%.2f', s),cellfun(@(s)str2num(s),... get(gca,'yticklabel')),'uniformoutput',false)) xlabel('number of A molecules'); ylabel('frequency')
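The Gillespie loop in the Matlab listings above can be summarised compactly in Python (a stand-in for Matlab here; the rate constants are those assumed for Figure 3.1 with ν = 1). Each step draws an exponential waiting time from the total propensity and then selects which of the four channels fires.

```python
import math
import random

def gillespie(A, B, k1, k2, k3, k4, t_final, rng):
    """Simulate one sample path of (3.11)-(3.14); return (A, B) at time t_final."""
    t = 0.0
    while True:
        a = [k1 * A * (A - 1), k2 * A * B, k3, k4]   # propensities of the four channels
        a0 = sum(a)                                  # a0 > 0 always, since k3, k4 > 0
        tau = -math.log(1.0 - rng.random()) / a0     # time until the next reaction
        if t + tau > t_final:
            return A, B
        t += tau
        r = a0 * rng.random()                        # decide which reaction fires
        if r < a[0]:
            A -= 2                  # A + A -> 0
        elif r < a[0] + a[1]:
            A -= 1; B -= 1          # A + B -> 0
        elif r < a[0] + a[1] + a[2]:
            A += 1                  # 0 -> A
        else:
            B += 1                  # 0 -> B

rng = random.Random(0)
A, B = gillespie(0, 0, 0.001, 0.01, 1.2, 1.0, 100.0, rng)
```

Because the bimolecular propensities vanish whenever too few molecules are present, the molecule numbers can never become negative.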

Chapter 4: Stochastic versus deterministic modelling

In the first two lectures, we saw that when only zeroth or first order reactions are present, the reaction rate equations exactly predict the evolution of the mean particle numbers. In this lecture, we shall explore a range of cases in which the predictions of stochastic and deterministic models differ when second or higher order reactions are included.

4.1 Stochastic modelling of dimerization

Consider a system consisting of the following reactions:

A + A →(k1) ∅,   (4.1)

∅ →(k2) A,   (4.2)

where k1 and k2 are positive rate constants. The chemical master equation, describing the evolution of p_n(t) := P(A(t) = n), can be written

dp_n/dt = (k1/ν)(n+2)(n+1) p_{n+2} − (k1/ν) n(n−1) p_n + k2 ν p_{n−1} − k2 ν p_n,   n = 0, 1, 2, ...,   (4.3)

where we define p_{−1} ≡ 0. As dimerization is a second order reaction, we know that we cannot obtain a closed evolution equation for the mean or variance of A(t). Stochastic simulation provides one approach to investigate the dynamics of the system, but if we are interested only in the stationary values, M_s and V_s, and the stationary distribution φ(n), then we can make some progress analytically using probability generating functions.

4.1.1 Probability generating function approach

Define G : [−1, 1] × [0, ∞) → R by

G(x, t) := Σ_{n=0}^∞ x^n p_n(t).   (4.4)

Differentiating G(x, t) with respect to x we have

∂G/∂x = Σ_{n=1}^∞ n x^{n−1} p_n(t),   (4.5)

∂²G/∂x² = Σ_{n=1}^∞ n(n−1) x^{n−2} p_n(t).   (4.6)

Using the definitions provided in Equation (2.1) gives

M(t) = (∂G/∂x)(1, t),   (4.7)

V(t) = (∂²G/∂x²)(1, t) + (∂G/∂x)(1, t) − [(∂G/∂x)(1, t)]².   (4.8)

Chapter 4 Stochastic modelling of biological processes

Using induction, it is also possible to show that

p_n(t) = (1/n!) (∂ⁿG/∂xⁿ)(0, t),   n ≥ 0.   (4.9)

To derive an equation for G(x, t) we multiply the chemical master equation (4.3) by x^n and sum over n to arrive at

Σ_{n=0}^∞ x^n ∂p_n/∂t = (k1/ν) Σ_{n=0}^∞ x^n (n+2)(n+1) p_{n+2} − (k1/ν) Σ_{n=0}^∞ x^n n(n−1) p_n + k2 ν Σ_{n=0}^∞ x^n p_{n−1} − k2 ν Σ_{n=0}^∞ x^n p_n,   (4.10)

and then we substitute using Equations (4.4)-(4.6) to obtain a partial differential equation for G:

∂G/∂t = (k1/ν)(1 − x²) ∂²G/∂x² + k2 ν (x − 1) G.   (4.11)

Together with appropriate boundary and initial conditions, this equation can be solved numerically to give information about M(t), V(t) and p_n(t). Here we will focus on the stationary distribution, and the associated stationary probability generating function

G_s(x) = lim_{t→∞} G(x, t) = Σ_{n=0}^∞ x^n φ(n),   (4.12)

which satisfies the ordinary differential equation

0 = (k1/ν)(1 − x²) d²G_s/dx² + k2 ν (x − 1) G_s,   (4.13)

or, equivalently,

d²G_s/dx² = (k2 ν²/k1) G_s/(1 + x).   (4.14)

The nontrivial solution of this equation is

G_s(x) = C sqrt(1 + x) I_1( 2 sqrt( (k2 ν²/k1)(1 + x) ) ),   (4.15)

where I_1 is the modified Bessel function of the first kind, a solution of the equation

z² I_1''(z) + z I_1'(z) − (z² + 1) I_1(z) = 0.   (4.16)

We can evaluate C by noting that

G_s(1) = Σ_{n=0}^∞ φ(n) = 1   ⟹   C = [ sqrt(2) I_1( 2 sqrt(2 k2 ν²/k1) ) ]⁻¹.   (4.17)

Differentiating Equation (4.15) with respect to x and substituting x = 1, we have

M_s = G_s'(1) = sqrt( k2 ν²/(2 k1) ) I_0( 2 sqrt(2 k2 ν²/k1) ) / I_1( 2 sqrt(2 k2 ν²/k1) ),   (4.18)

and

V_s = k2 ν²/(2 k1) + M_s − M_s²,   (4.19)

φ(n) = C (1/n!) (k2 ν²/k1)^{n/2} I_{n−1}( 2 sqrt(k2 ν²/k1) ),   n = 1, 2, 3, ....   (4.20)
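These formulas can be checked numerically. The sketch below (Python rather than Matlab; ν = 1 and the rate constants assumed for Figure 4.1, k1 = 0.005 and k2 = 1) evaluates the modified Bessel function from its power series, so nothing beyond the standard library is needed; the n = 0 case of the stationary distribution uses the identity I_{−1} = I_1.

```python
import math

def bessel_i(n, z, terms=60):
    """I_n(z) = sum_m (z/2)^(2m+n) / (m! (m+n)!), summed iteratively."""
    term = (z / 2.0) ** n / math.factorial(n)   # m = 0 term
    total = term
    for m in range(1, terms):
        term *= (z / 2.0) ** 2 / (m * (m + n))
        total += term
    return total

k1, k2, nu = 0.005, 1.0, 1.0          # rates assumed for Figure 4.1, nu = 1
beta = k2 * nu * nu / k1              # k2*nu^2/k1 = 200
z = 2.0 * math.sqrt(2.0 * beta)
C = 1.0 / (math.sqrt(2.0) * bessel_i(1, z))     # normalisation constant (4.17)

def phi(n):
    """Stationary distribution (4.20); n = 0 handled via I_{-1} = I_1."""
    return C * beta ** (n / 2.0) / math.factorial(n) * bessel_i(abs(n - 1), 2.0 * math.sqrt(beta))

M_s = math.sqrt(beta / 2.0) * bessel_i(0, z) / bessel_i(1, z)   # stationary mean (4.18)
V_s = beta / 2.0 + M_s - M_s ** 2                               # stationary variance (4.19)
```

Summing phi(n) confirms the normalisation, and the first two moments of phi agree with M_s and V_s; M_s itself comes out close to the value quoted in the comparison of Section 4.1.3.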

Chapter 4 Stochastic modelling of biological processes

4.1.2 Reaction rate equations

The reaction rate equation for system (4.1)-(4.2) is

da/dt = −2 k1 a² + k2,   (4.21)

where a(t) is the concentration of A molecules in the volume ν. We can then compare the output from repeated stochastic simulation of system (4.1)-(4.2) with

dĀ/dt = −(2 k1/ν) Ā² + k2 ν,   (4.22)

where Ā(t) is the mean molecule number predicted by the reaction rate equation at time t.

4.1.3 Comparison of stochastic versus deterministic model

Figure 4.1 shows a comparison of the results predicted by the stochastic and deterministic models. On the left-hand side five sample paths of the stochastic model are plotted, along with the solution of the reaction rate equation, and on the right-hand side the stationary distribution is shown. Equation (4.22) predicts a steady state A population of Ā_s = 10, whereas M_s = 10.13 (to 2 d.p.). Although the difference between the two predictions is not large, we see that in this case the reaction rate equations cannot exactly predict the mean evolution of the system. In general this is true whenever second and higher order reactions are present. In the following sections, and on Problem Sheet 1, we will explore situations where the differences between stochastic and deterministic models are much more significant.

Figure 4.1: Left: sample paths from the dimerization system (4.1)-(4.2), alongside the solution of the reaction rate equation (4.21). Right: the stationary distribution for system (4.1)-(4.2) generated using both long time path simulation (grey) and the analytic formula (4.20) (orange). Parameters are A(0) = 0, k1/ν = 0.005 s⁻¹ and k2 ν = 1.0 s⁻¹.

4.2 Stochastic focussing

Consider the following chemical reactions:

∅ →(k1) C →(k2) B →(k3) ∅,   (4.23)

A + C →(k4) A,   (4.24)

∅ →(k5) A →(k6) ∅,   (4.25)

Chapter 4 Stochastic modelling of biological processes

where k1, ..., k6 are positive rate constants. We will refer to A as the signal and B as the product, and study how changes in the number of signal molecules affect the number of product molecules in the system.

4.2.1 Reaction rate equations

The reaction rate equations for system (4.23)-(4.25) can be written as

da/dt = k5 − k6 a,   (4.26)

db/dt = k2 c − k3 b,   (4.27)

dc/dt = k1 − k2 c − k4 ac,   (4.28)

which means that the mean molecule numbers predicted by the reaction rate equations, Ā(t), B̄(t), C̄(t), are given by

dĀ/dt = k5 ν − k6 Ā,   (4.29)

dB̄/dt = k2 C̄ − k3 B̄,   (4.30)

dC̄/dt = k1 ν − k2 C̄ − (k4/ν) Ā C̄.   (4.31)

The steady states predicted by the reaction rate equations are therefore

Ā_s = k5 ν/k6,   B̄_s = k1 ν k2 k6 / (k3 (k2 k6 + k4 k5)),   C̄_s = k1 ν k6 / (k2 k6 + k4 k5).   (4.32)

4.2.2 Comparison of stochastic versus deterministic model

First, we look at the predictions of the two models using the parameter values

k1 ν = 1 s⁻¹,   k2 = 10 s⁻¹,   k3 = 10⁻⁴ s⁻¹,   k4/ν = 99 s⁻¹,   (4.33)

k5 ν = 10 s⁻¹ for t < 10 min, 5 s⁻¹ for t ≥ 10 min,   k6 = 1 s⁻¹,   (4.34)

with initial conditions A(0) = 0, B(0) = 100, C(0) = 0. In Figure 4.2 we see the comparison between stochastic and deterministic models of system (4.23)-(4.25). The deterministic model predicts that, as k5ν is halved, the number of signal molecules (A) halves and, in response, the number of product molecules (B) doubles. However, simulations of the stochastic model show that the number of product molecules in fact nearly triples, i.e. it is more sensitive to the change in signal than the deterministic model predicts.

Ignoring fluctuations in the signal

We know that the deterministic model correctly predicts the evolution of the stochastic mean number of A molecules. To try to understand the differences between the predictions of the stochastic and deterministic models, we can fix

A(t) = M_A,s = 10 for t < 10 min, 5 for t ≥ 10 min,   (4.35)

Figure 4.2: Five sample paths (numbers of A and B molecules against time) from system (4.23)-(4.25), together with the solution of the reaction rate equations (4.29)-(4.31). Parameter values are as given in (4.33)-(4.34).

and then use a stochastic simulation algorithm to generate sample paths from the reduced system

∅ →(k1) C →(k2) B →(k3) ∅,   (4.36)

C →(k4 A(t)) ∅.   (4.37)

The corresponding reaction rate equations give

dB̄/dt = k2 C̄ − k3 B̄,   (4.38)

dC̄/dt = k1 ν − k2 C̄ − (k4/ν) A(t) C̄.   (4.39)

In each case we use the initial conditions B(0) = 100, C(0) = 0.

In Figure 4.3 we see that the deterministic model correctly predicts the time evolution of the number of B molecules. This is to be expected since there are only zeroth and first order reactions in the reduced system, and it indicates that fluctuations in the number of A molecules are what give rise to the differences between the model predictions.

Figure 4.3: Five sample paths from system (4.36)-(4.37), together with the solution of the reaction rate equations (4.38)-(4.39). Parameter values are as given in (4.33) and (4.35).
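The steady-state formulas (4.32) can be verified directly by integrating the reaction rate equations (4.29)-(4.31). A minimal Python sketch (forward Euler, ν = 1, and illustrative rate constants rather than those of the figures, so that the slow timescale 1/k3 stays short):

```python
# Forward-Euler check of the steady states (4.32), with nu = 1 and
# illustrative rate constants (not those used in Figures 4.2-4.5).
k1, k2, k3, k4, k5, k6 = 1.0, 1.0, 0.1, 9.0, 1.0, 1.0

A, B, C = 0.0, 0.0, 0.0
dt = 1e-3
for _ in range(int(round(200.0 / dt))):
    dA = k5 - k6 * A
    dB = k2 * C - k3 * B
    dC = k1 - k2 * C - k4 * A * C
    A, B, C = A + dt * dA, B + dt * dB, C + dt * dC

A_s = k5 / k6                          # steady state of (4.29)
C_s = k1 * k6 / (k2 * k6 + k4 * k5)    # steady state of (4.31)
B_s = k2 * C_s / k3                    # steady state of (4.30)
```

Since an equilibrium of the ODEs is an exact fixed point of the Euler map, the integrated values converge to the formulas as t grows.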

Chapter 4 Stochastic modelling of biological processes

Looking at the C population

The parameter values listed in Equations (4.33)-(4.34) entail

C̄_s ≈ 10⁻³ for t < 10 min,   2 × 10⁻³ for t ≥ 10 min.   (4.40)

This means that it is not sensible to interpret C̄_s as the number of C molecules present in the system. Instead, the stochastic model predicts that there is either zero or one molecule of C present. This low number of C molecules, and the role of C in the second order reaction, is what causes the large differences in the number of B molecules predicted by the two different models. To further illustrate, we now look at the predictions of the models when the parameter values are

k1 ν = 1 s⁻¹,   k2 = 10⁻³ s⁻¹,   k3 = 10⁻⁴ s⁻¹,   k4/ν = 0.0099 s⁻¹,   (4.41)

k5 ν = 10 s⁻¹ for t < 10 min, 5 s⁻¹ for t ≥ 10 min,   k6 = 1 s⁻¹,   (4.42)

together with initial conditions A(0) = 0, B(0) = 100, C(0) = 0.

Figure 4.4: Five sample paths from system (4.23)-(4.25), together with the solution of the reaction rate equations (4.29)-(4.31). Parameter values are as given in (4.41)-(4.42).

Figure 4.4 shows that, with these parameter values, the predictions of the stochastic and deterministic models are very similar. We note that Ā_s and B̄_s are the same for both sets of parameter values, but in the second case we have

C̄_s = 10 for t < 10 min,   19.8 for t ≥ 10 min.   (4.43)

This means that the number of C molecules is large enough that it can be approximated well by C̄_s. Figure 4.5 shows the time evolution of the number of C molecules for each parameter set and confirms our hypotheses.

Further analysis of the model

We have seen that second (or higher) order reactions, together with low copy numbers, can lead to significant differences between the average behaviour of the stochastic model and that of the deterministic model.

Chapter 4 Stochastic modelling of biological processes

Figure 4.5: Comparison of the C population for system (4.23)-(4.25). Left: parameter values as given in (4.33)-(4.34). Right: parameter values as given in (4.41)-(4.42).

From our work in Lecture 2 we know that if the rate constants k5 and k6 are fixed then we can estimate the distribution of the number of A molecules in the system as

φ(n) = (1/n!) (M_A,s)^n exp(−M_A,s).   (4.44)

Assuming now that the number of A molecules is fixed and equal to n, the probability of there being a C molecule in the system is approximately

k1 ν / (k2 + n k4/ν).   (4.45)

Putting the two together, the average probability of finding a C molecule in the system is

Σ_{n=0}^∞ [k1 ν / (k2 + n k4/ν)] φ(n) = Σ_{n=0}^∞ [k1 ν / (k2 + n k4/ν)] (1/n!) (M_A,s)^n exp(−M_A,s).   (4.46)

From here we can estimate the average number of B molecules in the system as

M_B,s ≈ Σ_{n=0}^∞ [k1 ν k2 / (k3 (k2 + n k4/ν))] (1/n!) (M_A,s)^n exp(−M_A,s).   (4.47)

Using Equation (4.35) we have

M_B,s ≈ 113.1 for t < 10 min,   316.7 for t ≥ 10 min.   (4.48)

Comparison with Figure 4.2 shows this to be a good estimate!

4.3 References and further reading

Stochastic focusing: fluctuation-enhanced sensitivity of intracellular regulation. J. Paulsson, O. G. Berg and M. Ehrenberg. Proc. Natl. Acad. Sci. USA 97 (2000).

4.4 Tasks

Verify that (4.15) is a solution of the generating function equation (4.13).

Complete Problem Sheet 1!

35 Chapter 4 Stochastic modelling of biological processes Example Matlab code To generate sample paths and estimate the stationary distribution of the single species dimerization model using the Gillespie algorithm. function dimerization lecture() clear all; close all; initial A=; % initial number of A molecules k1=.5; % annihilation rate k2=1.; % production rate t final=1; % final time %% no paths=5; % number of sample paths A=cell(no paths,1); t=cell(no paths,1); % use the Gillespie algorithm to generate sample paths for i=1:no paths % set the initial time and molecule numbers j=1; t{i}(j)=; A{i}(j)=initial A; while t{i}(j)<t final A=k1*A{i}(j)*(A{i}(j)-1)+k2; % calculate a tau=1/a*log(1/rand); % calculate the time until the next reaction if t{i}(j)+tau<=t final % update molecule numbers and time r2=a*rand; ss=k1*a{i}(j)*(a{i}(j)-1); if r2<ss A{i}(j+1)=A{i}(j)-2; else A{i}(j+1)=A{i}(j)+1; t{i}(j+1)=t{i}(j)+tau; j=j+1; else break; %% % solve the reaction rate equations using a forward Euler method dt=.1; t grid=:dt:t final;

36 Chapter 4 Stochastic modelling of biological processes 31 A det=zeros(length(t grid),1); A det(1)=initial A; for i=1:length(t grid)-1 A det(i+1)=a det(i)+dt*(-2*k1*a det(i)ˆ2+k2); %% % plot the results figure(1); clf; hold on; box on for i=1:no paths stairs(t{i},a{i},'linewidth',1) plot(t grid,a det,'k--','linewidth',2) axis([ 1 2]) set(gca,'xtick',:25:1) set(gca,'ytick',:5:2) xlabel('time [sec]'); ylabel('number of A molecules') %% % estimate the stationary distribution using a long sample path path length=1e8; % path length A stationary=zeros(11,1); t=; A=round(sqrt(k2/(2*k1))); % initial number of A molecules for i=1:path length % record the number of A molecules every second while t<i A=k1*A*(A-1)+k2; tau=1/a*log(1/rand); r2=a*rand; ss=k1*a*(a-1); if r2<ss A=A-2; else A=A+1; t=t+tau; if A<=1 A stationary(a+1)=a stationary(a+1)+1; % record the mean number of A molecules A mean=sum(a stationary.*(:1:1)')/path length

37 Chapter 4 Stochastic modelling of biological processes 32 %% % calculate the analytic stationary distribution n phi=1:1:11; C=(sqrt(2)*besseli(1,2*sqrt(2*k2/k1)))ˆ(-1); phi=c./factorial(n phi).*(k2/k1).ˆ(n phi/2).*besseli(n phi-1,2*sqrt(k2/k1)); phi=[c phi]; n phi=[ n phi]; %% % save the results (useful if the code takes some time to run) % save dimerization.mat % load the results % load dimerization.mat %% % plot the results figure(2); clf; hold on; box on bar(:1:1,a stationary/path length,'facecolor',[.7.7.7],... 'LineWidth',.2,'BarWidth',.6) plot(n phi,phi,'*','color',[ ],'markersize',1); axis([ ]) set(gca,'xtick',:5:25) set(gca,'ytick',:.5:.15) set(gca,'yticklabel',arrayfun(@(s)sprintf('%.2f', s),cellfun(@(s)str2num(s),... get(gca,'yticklabel')), 'UniformOutput', false)) xlabel('number of A molecules'); ylabel('frequency')

Chapter 5: Connection to stochastic differential equations

In this lecture we will consider successive approximations to the chemical master equation that facilitate both analytical and efficient numerical interrogation of the dynamics of biochemical reaction networks.

5.1 The tau-leap method

Suppose that we are considering the biochemical reaction system defined in Section 3.1, that is, a biochemical network consisting of N species, S1, ..., SN, that may be involved in M possible reactions, R1, ..., RM. We restate the chemical master equation (3.7):

dP(x, t | x0, t0)/dt = Σ_{j=1}^M [ a_j(x − ν_j) P(x − ν_j, t | x0, t0) − a_j(x) P(x, t | x0, t0) ].

The Gillespie stochastic simulation algorithm is termed event-driven as it advances through time one reaction at a time. A means to speed up the generation of sample paths is to leap over an interval of length τ and work out approximately how many reactions of each type fire in this interval. We will now describe how to do this.

With the system in state X at time t, suppose that there exists τ > 0 such that during [t, t + τ) no propensity function, a_j(X), j = 1, ..., M, changes its value significantly. It then follows that, for j = 1, ..., M, the number of times reaction channel R_j fires during [t, t + τ) is (approximately) a Poisson random variable with mean (and variance) a_j(X(t))τ. That is,

number of firings of reaction channel R_j in [t, t + τ) ≈ P_j(a_j(X(t))τ),   j = 1, ..., M,   (5.1)

where the P_j(m_j), j = 1, ..., M, are statistically independent Poisson random variables with mean (and variance) m_j. This means that we can approximately leap the system forward in time by a step τ by taking

X(t + τ) ≈ X(t) + Σ_{j=1}^M P_j(a_j(X(t))τ) ν_j.   (5.2)

5.1.1 The tau-leap approximate stochastic simulation algorithm

Equation (5.2) provides a computational definition for the tau-leap approximate stochastic simulation algorithm:

1. Set t = t0 and X(t) = X(t0). Then, while t < t_final,

2. Calculate a_j(X(t)) for j = 1, ..., M.

3. Generate random numbers R_j ~ P_j(a_j(X(t))τ).

Chapter 5 Stochastic modelling of biological processes

4. Let X(t + τ) = X(t) + Σ_{j=1}^M R_j ν_j, set t = t + τ and return to step 2.

Note that the Matlab function poissrnd(lambda) generates Poisson distributed random variates with mean lambda.

The results of using the tau-leap algorithm to generate sample paths from the degradation reaction (1.1) are shown in Figure 5.1. Four sample paths generated using τ = 0.1 are shown on the left-hand side. On the right-hand side the predicted mean molecule number is plotted for τ = 1.0 and τ = 5.0, alongside the analytical solution. We see that the accuracy of the method decreases as τ increases.

Figure 5.1: Sample paths from a degradation reaction system generated using the tau-leap algorithm. Left: four different sample paths, each generated using τ = 0.1. Right: mean number of A molecules predicted using τ = 1.0 (blue) and τ = 5.0 (orange). In each case 10⁴ sample paths were used to estimate the mean. The exact solution for the mean number of A molecules is plotted in black. Parameters are A(0) = 200 and k = 0.1 s⁻¹.

5.1.2 Connection to the random time-change representation

An alternative way to derive the tau-leap algorithm is to consider a different description of the biochemical reaction system. In constructing the random time-change representation our aim is to be able to write

X(t) = X(0) + Σ_{j=1}^M R_j(t) ν_j,   (5.3)

where R_j(t) is the number of times reaction j fires in the interval (0, t). We assume that two R_j reactions cannot take place at the same time, so that R_j(t) is a counting process, that is, R_j(0) = 0 and R_j is constant except for jumps of plus one. To make progress in understanding how to calculate R_j(t) we will first remind ourselves of the definition of a Poisson process.

A constant rate Poisson process

A Poisson process is a model for a series of random observations occurring in time. Let Y(t) denote the number of observations by time t. Note that, for t < s, Y(s) − Y(t) is the number of observations in the time interval (t, s]. We make the following assumptions about the model.

1. Observations occur one at a time.

Chapter 5 Stochastic modelling of biological processes

2. The numbers of observations in disjoint time intervals are independent random variables, i.e. if t0 < t1 < ... < tm, then Y(t_k) − Y(t_{k−1}), k = 1, ..., m, are independent random variables.

3. The distribution of Y(t + a) − Y(t) does not depend on t.

When these assumptions are satisfied, there is a constant λ > 0 such that, for t < s, Y(s) − Y(t) is Poisson distributed with parameter λ(s − t), that is,

P( Y(s) − Y(t) = k ) = [λ(s − t)]^k exp[−λ(s − t)] / k!.   (5.4)

If λ = 1 then we denote the process Y_1(t) and call it the unit rate Poisson process.

An alternative way to think about the unit rate Poisson process is to let ξ_i, i = 1, 2, ..., be independent, identically distributed exponential random variables with parameter one, and put points down on a line with spacing equal to the ξ_i. Then Y_1(t) is simply the number of points hit when we run along the time frame at rate one. We can then define a Poisson process with parameter a to be Y_a(t) := Y_1(at), where Y_1 is a unit rate Poisson process. This means that the Poisson process with rate a is simply the number of points (of the unit rate Poisson process) hit when we run along the time frame at rate a.

An inhomogeneous Poisson process

There is no reason a needs to be constant in time, in which case we can define

Y_a(t) = Y_1( ∫_0^t a(s) ds ),   (5.5)

to be an inhomogeneous Poisson process.

The random time-change representation

The inhomogeneous Poisson process gives us a natural means by which to write R_j(t):

R_j(t) = Y_1^j( ∫_0^t a_j(X(s−)) ds ),   (5.6)

where Y_1^j is a unit rate Poisson process. We can then write the random time-change representation of the system as

X(t) = X(0) + Σ_{j=1}^M Y_1^j( ∫_0^t a_j(X(s−)) ds ) ν_j.   (5.7)

Connection to the tau-leap algorithm

The tau-leap algorithm can then be deduced by taking a forward Euler approximation of Equation (5.7) with time step τ. Let N_τ = t/τ and denote the approximation to X(t) by Z_τ(t), to give

Z_τ(t) = Z_τ(0) + Σ_{k=1}^{N_τ} Σ_{j=1}^M Y_1^j( a_j(Z_τ((k−1)τ)) τ ) ν_j,   (5.8)

or

Z_τ(t + τ) = Z_τ(t) + Σ_{j=1}^M Y_1^j( a_j(Z_τ(t)) τ ) ν_j.   (5.9)
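For a single reaction channel the update (5.2)/(5.9) is only a few lines of code. The sketch below (Python standing in for Matlab) applies it to the degradation reaction A → ∅ with rate k, using the parameters assumed for Figure 5.1 (A(0) = 200, k = 0.1); the Poisson variates are drawn with Knuth's method so the sketch is self-contained.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's algorithm for a Poisson variate (adequate for modest lam)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def tau_leap_degradation(A0, k, tau, t_final, rng):
    """Tau-leap for A -> 0: one channel, propensity a(A) = k*A, stoichiometry -1."""
    A, t = A0, 0.0
    while t < t_final - 1e-9:
        A -= poisson(k * A * tau, rng)   # firings of the channel in [t, t + tau)
        if A < 0:
            A = 0                        # leaping can overshoot; clamp at zero
        t += tau
    return A

rng = random.Random(1)
paths = [tau_leap_degradation(200, 0.1, 0.5, 30.0, rng) for _ in range(500)]
```

Note that the scheme is biased for finite τ: each leap multiplies the mean by (1 − kτ) rather than exp(−kτ), which is exactly the loss of accuracy with increasing τ seen in Figure 5.1.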

Chapter 5 Stochastic modelling of biological processes

5.2 The chemical Langevin equation

With the system in state X at time t, now suppose that, in addition to being able to find a τ such that during [t, t + τ) no propensity function, a_j(X), j = 1, ..., M, changes its value significantly, this τ is also large enough that the expected number of firings of R_j during [t, t + τ) is much greater than one, that is,

a_j(X(t)) τ ≫ 1.   (5.10)

By the Central Limit Theorem, we know that for large λ the Poisson random variable P(λ) can be approximated using a normal random variable with mean and variance λ, i.e.

P(λ) ≈ N(λ, λ) = λ + sqrt(λ) N(0, 1).   (5.11)

This means that we can write

P_j(a_j(X(t))τ) ≈ a_j(X(t))τ + sqrt(a_j(X(t))τ) N_j(0, 1),   (5.12)

for j = 1, ..., M, and hence further approximate Equation (5.2) as

X(t + τ) ≈ X(t) + Σ_{j=1}^M a_j(X(t))τ ν_j + Σ_{j=1}^M sqrt(a_j(X(t))τ) N_j(0, 1) ν_j.   (5.13)

Whilst we won't concern ourselves with the proof here, it can be shown using the theory of continuous time Markov processes that Equation (5.13) can also be formally written as a stochastic differential equation in white noise form:

dX(t)/dt = Σ_{j=1}^M a_j(X(t)) ν_j + Σ_{j=1}^M sqrt(a_j(X(t))) dW_j ν_j.   (5.14)

The dW_j, j = 1, ..., M, are statistically independent Gaussian white noise processes satisfying

⟨dW_j(t) dW_j'(t')⟩ = δ_{j,j'} δ(t − t'),   (5.15)

where the first δ represents the Kronecker delta and the second the Dirac delta function. In the next lectures, we will spend some time exploring the properties of stochastic differential equations, before returning to discuss their use in modelling biochemical reaction systems.

5.3 References and further reading

Stochastic simulation of chemical kinetics. D. T. Gillespie. Annu. Rev. Phys. Chem. 58 (2007).

Continuous time Markov chain models for chemical reaction networks. D. F. Anderson and T. G. Kurtz. Chapter 1, Design and analysis of biomolecular circuits (2011).

The chemical Langevin equation. D. T. Gillespie. J. Chem. Phys. 113 (2000).

5.4 Tasks

Implement a tau-leaping approximate stochastic simulation algorithm that models the death process of Equation (1.1). How do your results change as you vary the value of τ? Why is this the case?
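For the death process, the chemical Langevin equation (5.14) reads dX/dt = −kX − sqrt(kX) dW, and it can be simulated with the scheme (5.13). A minimal Python sketch (parameters match those assumed for Figure 5.1; the clamp at zero is an ad hoc fix, since the CLE state is real-valued and can dip below zero):

```python
import math
import random

def cle_path(x0, k, dt, t_final, rng):
    """Euler-Maruyama for the CLE of A -> 0 with rate k."""
    x = float(x0)
    for _ in range(int(round(t_final / dt))):
        x += -k * x * dt - math.sqrt(k * x * dt) * rng.gauss(0.0, 1.0)
        if x < 0.0:
            x = 0.0       # keep the (real-valued) state nonnegative
    return x

rng = random.Random(2)
k, x0 = 0.1, 200
final = [cle_path(x0, k, 0.02, 10.0, rng) for _ in range(2000)]
```

Averaging the final values over many paths recovers the exact mean x0·exp(−kt) closely, because the drift of this CLE is linear.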

42 Chapter 5 Stochastic modelling of biological processes Example Matlab code To generate tau-leap sample paths, and compare results as tau is varied. function degradation lecture() clear all; close all; N=2; % initial number of A molecules k=.1; % reaction rate t final=3; % final time % no paths=4; A=cell(no paths,1); t=cell(no paths,1); tau=.1; % value of tau no steps=t final/tau; % number of steps required A=zeros(no paths,no steps+1); A(:,1)=N*ones(no paths,1); for i=1:no steps % generate path by repeatedly drawing Poisson random variates A(:,i+1)=A(:,i)-poissrnd(k*A(:,i)*tau); % if molecule numbers go negative, set to zero neg=find(a(:,i+1)<); A(neg)==; % plot the results figure(1); clf; hold on; box on for i=1:no paths stairs(:tau:t final,a(i,:),'linewidth',1); axis([ 3 N]) set(gca,'xtick',:1:3) set(gca,'ytick',:5:2) xlabel('time [sec]'); ylabel('number of A molecules'); %% % compare results using different values of tau against the analytic result % deterministic solution t grid=:.5:t final; M=N*exp(-k*t grid); no paths=1e4; tau=[1 5]; % values of tau no steps=t final./tau;

43 Chapter 5 Stochastic modelling of biological processes 38 % generate the paths using the tau leap algorithm for j=1:length(tau) A=zeros(no paths,no steps(j)+1); A(:,1)=N*ones(no paths,1); for i=1:no steps(j) A(:,i+1)=A(:,i)-poissrnd(k*A(:,i)*tau(j)); neg=find(a<); A(neg)=; % average the results A mean{j}=mean(a); % plot the results figure(1); clf; hold on; box on plot(:tau(1):t final,a mean{1},'+','markersize',5,'color',[ ]) plot(:tau(2):t final,a mean{2},'b+','markersize',5,'color',[ ]) plot(t grid,m,'k','linewidth',1) axis([ 3 N]) set(gca,'xtick',:1:3) set(gca,'ytick',:5:2) xlabel('time [sec]'); ylabel('mean number of A molecules');

44 Chapter 6: Introduction to stochastic differential equations In this lecture we will give an informal introduction to some simple stochastic differential equations. We will look at a computational definition of a stochastic differential equation and study three simple examples. 6.1 A computational definition of a stochastic differential equation Suppose that x(t) evolves according to dx dt = f(x, t) with x() = x, (6.1) where f : R [, ) R is a given, sufficiently nice function. equation (6.1) can be re-written as The ordinary differential dx = f(x, t)dt, (6.2) to specify the infinitesimal change in x(t), dx(t) = x(t + dt) x(t). This means that we can then write (6.1) as x(t + dt) = x(t) + f(x(t), t)dt. (6.3) Equation (6.3) gives us a simple means to compute an approximate solution to the ordinary differential equation (6.1). Choosing a small time step t then, given x(t), we can write x(t + t) = x(t) + f(x(t), t) t with x() = x. (6.4) This method is called the forward Euler method, and the error of the approximation can be decreased by making the time step, t, smaller. In simple terms, a stochastic differential equation is just an ordinary differential equation with an additional noise term describing stochastic fluctuations. If Equation (6.4) is the computational definition of an ordinary differential equation, then we can write the computational definition of the corresponding stochastic differential equation as X(t + t) = X(t) + f(x(t), t) t + g(x(t), t) tξ with X() = x, (6.5) where g : R [, ) R is the strength of the noise and ξ N (, 1). The stochastic differential equation can be formally written in the form X(t + t) = X(t) + f(x(t), t) t + g(x(t), t)dw with X() = x, (6.6) where dw is the so-called white noise. Note that other noise terms are possible, but here we will only use Gaussian (white) noise. For our purposes, it will be sufficient to know that the meaning of Equation (6.6) is given by the computational definition, (6.5). 
In fact, we will only need to know how to simulate stochastic 39

45 X mean(x) / variance(x) Chapter 6 Stochastic modelling of biological processes 4 differential equations numerically and to use them for the analysis of reaction-diffusion processes. This means that whenever we write a stochastic differential equation in the form (5.14) we can replace the dw by tn (, 1) where t is the time step of the algorithm. Equation (6.5) is often called the Euler-Maruyama method for solving the stochastic differential equation, and one can immediately see how this method relates to the forward Euler method for ordinary differential equations discussed above. 6.2 Example 1 Suppose that both time, t, and the variable X(t) are dimensionless and f(x, t), g(x, t) 1 so that X(t) satisfies the stochastic differential equation X(t + dt) = X(t) + dw, with X() =. (6.7) The corresponding computational definition is X(t + t) = X(t) + tξ, with X() =, (6.8) where t is the small time step and ξ N (, 1). Four sample paths generated using Equation (6.8) to define a stochastic simulation algorithm are shown in Figure time [sec] time [sec] Figure 6.1: Left: four sample paths generated using Equation (6.8). Right: the mean (orange) and variance (blue) estimated from 5 sample paths, together with the analytic results (M(t) = and V (t) = t, black dashed). Parameters are t =.1s. Let M(t) := E[X(t)] and V (t) := Var(X(t)) = E[X(t) 2 ] M(t) 2 where E( ) denotes the average over (infinitely) many realisations of the stochastic simulation algorithm. computational definition, (6.8), we have M(t + t) = E [X(t + t)] [ = E X(t) + ] tξ = E [X(t)] + te [ξ ] Using the = M(t), (6.9)

and

V(t + Δt) = E[X(t + Δt)²] − M(t + Δt)²
 = E[(X(t) + √(Δt) ξ)²] − M(t)²
 = E[X(t)²] + 2√(Δt) E[X(t)] E[ξ] + Δt E[ξ²] − M(t)²
 = E[X(t)²] + Δt − M(t)²
 = V(t) + Δt, (6.10)

where we have used the facts that E[ξ] = 0 and E[ξ²] = 1. The initial conditions are M(0) = 0 and V(0) = 0, so that we have M(t) ≡ 0 and V(t) = t (see Figure 6.1). This means that both M(t) and V(t) are independent of the time step Δt. This is in fact true for any moment, E[X(t)ᵏ] for k = 1, 2, 3, ... (see Problem Sheet 2), and is one of the reasons why we chose the computational definition of dW as √(Δt) ξ.

6.3 Example 2

The function f(x, t) in Equation (6.6) is often called the drift coefficient. We will now extend Example 1 by letting f(x, t) ≡ 1, g(x, t) ≡ 1, so that

X(t + dt) = X(t) + dt + dW, with X(0) = 0, (6.11)

with corresponding computational definition

X(t + Δt) = X(t) + Δt + √(Δt) ξ, with X(0) = 0. (6.12)

Four sample paths generated using Equation (6.12) to define a stochastic simulation algorithm are shown in Figure 6.2.

Figure 6.2: Left: four sample paths generated using Equation (6.12). Right: the mean (orange) and variance (blue) estimated from 500 sample paths, together with the analytic results (M(t) = t and V(t) = t, black dashed). Parameters are Δt = 0.1 s.

Using Equation (6.12) we have

M(t + Δt) = E[X(t + Δt)] = E[X(t) + Δt + √(Δt) ξ] = E[X(t)] + Δt + √(Δt) E[ξ] = M(t) + Δt, (6.13)

so that M(t) = t (see Figure 6.2). This means that, as one might expect, solutions of Equation (6.11) fluctuate around a mean value that depends only on the drift coefficient. Similarly, one can show that V(t) = t (see Figure 6.2).

6.4 Example 3

The final example is motivated by considering the bistability example we looked at in Problem Sheet 1, Question 3. We take

f(x, t) = −k₁x³ + k₂x² − k₃x + k₄, (6.14)
g(x, t) = k₅, (6.15)

where k₁, ..., k₅ are positive constants, to give

X(t + dt) = X(t) + [−k₁X(t)³ + k₂X(t)² − k₃X(t) + k₄] dt + k₅ dW. (6.16)

Figure 6.3: Two sample paths generated by solving Equation (6.16) with X(0) = 0 (left) and X(0) = 500 (right) are shown in blue. The corresponding solution of Equation (6.17) is plotted in orange. Parameters are k₁ = 10⁻³, k₂ = 0.75, k₃ = 165, k₄ = 10⁴, k₅ = 200 and Δt = 0.1.

We note that if k₅ = 0 then Equation (6.16) becomes the ordinary differential equation derived in Question 3 of Problem Sheet 1:

dx/dt = −k₁x³ + k₂x² − k₃x + k₄. (6.17)

Choosing k₁ = 10⁻³, k₂ = 0.75, k₃ = 165 and k₄ = 10⁴ gives three steady states, two stable and one unstable, with

x_s¹ = 100, x_u = 250, x_s² = 400. (6.18)

Two sample paths generated by solving Equation (6.16) are shown in Figure 6.3, along with the corresponding solutions of Equation (6.17).

6.5 References and further reading

Stochastic simulation of chemical kinetics. D. T. Gillespie. Annu. Rev. Phys. Chem. 58 (2007).

The chemical Langevin equation. D. T. Gillespie. J. Chem. Phys. 113 (2000).
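The steady states in (6.18) are the roots of the cubic on the right-hand side of (6.17) and can be checked numerically. A Python sketch (the parameter values are written out in full here, k₁ = 10⁻³, k₂ = 0.75, k₃ = 165, k₄ = 10⁴, since the printed notes have lost trailing zeros in extraction):

```python
import numpy as np

# parameters of Example 3
k1, k2, k3, k4 = 1e-3, 0.75, 165.0, 1e4

# steady states of dx/dt = -k1 x^3 + k2 x^2 - k3 x + k4 are the roots of
# the cubic; numpy.roots takes coefficients in decreasing order of degree
roots = np.sort(np.roots([-k1, k2, -k3, k4]).real)
print(roots)  # two stable states with the unstable state between them
```

The cubic factors exactly as −10⁻³(x − 100)(x − 250)(x − 400), confirming the three steady states.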

6.6 Tasks

Show that V(t) = t for Example 2.

Implement a stochastic simulation algorithm to generate a large number of sample paths from Example 2. Compare the mean and variance of your sample with the theoretical predictions M(t) = t and V(t) = t.

Implement a stochastic simulation algorithm to generate a large number of sample paths from Example 3. Compare your results with the solution of Equation (6.17).

6.7 Example Matlab code

To generate sample paths from Example 1.

function example1_lecture()
clear all; close all;
%%
no_paths=4;            % number of sample paths
t_final=10;            % final time
dt=0.1;                % time step
no_steps=t_final/dt;   % total number of steps
X=zeros(no_paths,no_steps+1);
dW=sqrt(dt)*randn(no_paths,no_steps);  % generate all the noise increments
X(:,2:end)=cumsum(dW,2);  % take cumulative sums to generate the sample paths
% plot the results
figure(1); clf; hold on; box on
for i=1:no_paths
    plot(0:dt:t_final,X(i,:),'LineWidth',1);
end
axis([0 t_final -6 6])
set(gca,'XTick',0:2:10)
set(gca,'YTick',-6:3:6)
xlabel('time [sec]'); ylabel('X')
%%
no_paths=500;          % number of sample paths
t_final=10;            % final time
dt=0.1;                % time step
no_steps=t_final/dt;   % total number of steps
X=zeros(no_paths,no_steps+1);
dW=sqrt(dt)*randn(no_paths,no_steps);  % generate all the noise increments
X(:,2:end)=cumsum(dW,2);  % take cumulative sums to generate the sample paths
% plot the mean and variance of the sample paths, compare to analytic results
figure(2); clf; hold on; box on

plot(0:dt:t_final,mean(X),'LineWidth',1);                   % mean (orange)
plot(0:dt:t_final,zeros(1,no_steps+1),'k--','LineWidth',1)  % M(t) = 0
plot(0:dt:t_final,var(X),'LineWidth',1);                    % variance (blue)
plot(0:dt:t_final,0:dt:t_final,'k--','LineWidth',1)         % V(t) = t
axis([0 t_final 0 10])
set(gca,'XTick',0:2:10)
set(gca,'YTick',0:2:10)
xlabel('time [sec]')
ylabel('mean(X) / variance(X)')

To generate sample paths from Example 2.

function example2_lecture()
clear all; close all;
%%
no_paths=4;            % number of paths
t_final=10;            % final time
dt=0.1;                % time step
no_steps=t_final/dt;   % total number of steps
X=zeros(no_paths,no_steps+1);
dT=dt*ones(no_paths,no_steps);
dW=sqrt(dt)*randn(no_paths,no_steps);  % generate all random increments
X(:,2:end)=cumsum(dT,2)+cumsum(dW,2);  % generate sample paths using cumulative sums
% plot the results
figure(1); clf; hold on; box on
for i=1:no_paths
    plot(0:dt:t_final,X(i,:),'LineWidth',1);
end
axis([0 t_final -1 15])
set(gca,'XTick',0:2:10)
set(gca,'YTick',0:5:15)
xlabel('time [sec]'); ylabel('X')
%%
no_paths=500;
t_final=10;
dt=0.1;
no_steps=t_final/dt;
X=zeros(no_paths,no_steps+1);
dT=dt*ones(no_paths,no_steps);
dW=sqrt(dt)*randn(no_paths,no_steps);
X(:,2:end)=cumsum(dT,2)+cumsum(dW,2);
% plot the mean and the variance of the sample paths

figure(2); clf; hold on; box on
plot(0:dt:t_final,mean(X),'LineWidth',1);            % mean (orange)
plot(0:dt:t_final,var(X),'LineWidth',1);             % variance (blue)
plot(0:dt:t_final,0:dt:t_final,'k--','LineWidth',1)  % M(t) = V(t) = t
axis([0 t_final 0 15])
set(gca,'XTick',0:2:10)
set(gca,'YTick',0:5:15)
xlabel('time [sec]'); ylabel('mean(X) / variance(X)')
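The Matlab listings above can be mirrored in Python. A vectorised sketch for Example 2, checking the predictions M(t) = t and V(t) = t against the sample mean and variance (path counts and step sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
no_paths, t_final, dt = 5000, 10.0, 0.1
no_steps = int(t_final / dt)

# X(t + dt) = X(t) + dt + sqrt(dt) xi; build all paths at once with
# cumulative sums of the increments, as in the Matlab code
dW = np.sqrt(dt) * rng.standard_normal((no_paths, no_steps))
X = np.cumsum(dt + dW, axis=1)
X = np.hstack([np.zeros((no_paths, 1)), X])  # prepend X(0) = 0

t = np.linspace(0.0, t_final, no_steps + 1)
M = X.mean(axis=0)   # should be close to M(t) = t
V = X.var(axis=0)    # should be close to V(t) = t
```

With a few thousand paths the sample mean and variance track the dashed analytic lines of Figure 6.2 closely.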

Chapter 7: The Fokker-Planck equation

In this lecture we will show how to derive the Fokker-Planck equation using our computational definition of a stochastic differential equation. Suppose that X(t) evolves according to the stochastic differential equation (6.6). We define its probability distribution, p(x, t), such that

p(x, t) dx = P(X(t) ∈ [x, x + dx) | X(0) = x₀). (7.1)

Roughly speaking, p(x, t) quantifies the probability of finding a given trajectory of the stochastic differential equation around the point x at time t, given it started at x₀ at time t = 0. For any time t, p(x, t) satisfies the normalisation condition

∫_R p(x, t) dx = 1. (7.2)

We will show that p(x, t) evolves according to the partial differential equation

∂p/∂t (x, t) = ∂²/∂x² [ (g(x, t)²/2) p(x, t) ] − ∂/∂x [ f(x, t) p(x, t) ]. (7.3)

Equation (7.3) is often called the Fokker-Planck equation or the forward Kolmogorov equation.

7.1 Derivation of the Fokker-Planck equation

For s < t let

p(x, t | y, s) dx := P(X(t) ∈ [x, x + dx) | X(s) = y). (7.4)

Then we can write down the Chapman-Kolmogorov equation to describe the value of X at time t + Δt:

p(z, t + Δt | y, s) = ∫_R p(z, t + Δt | x, t) p(x, t | y, s) dx, (7.5)

where s < t and Equation (7.5) is valid for all Δt ≥ 0. To derive the Fokker-Planck equation we will take the limit Δt → 0. However, we first multiply both sides by a smooth test function ϕ(z) and integrate over z to give

∫_R p(z, t + Δt | y, s) ϕ(z) dz = ∫_R [ ∫_R p(z, t + Δt | x, t) ϕ(z) dz ] p(x, t | y, s) dx. (7.6)

We now rename the integration variable z to x on the left-hand side so that

∫_R p(x, t + Δt | y, s) ϕ(x) dx = ∫_R [ ∫_R p(z, t + Δt | x, t) ϕ(z) dz ] p(x, t | y, s) dx. (7.7)

On the right-hand side, we Taylor expand ϕ(z) around the point x, i.e. we write

ϕ(z) = ϕ(x) + (z − x) ϕ′(x) + (1/2)(z − x)² ϕ″(x) + o((z − x)²), (7.8)

so that

∫_R p(x, t + Δt | y, s) ϕ(x) dx
 = ∫_R [ ∫_R p(z, t + Δt | x, t) { ϕ(x) + (z − x) ϕ′(x) + (1/2)(z − x)² ϕ″(x) + o((z − x)²) } dz ] p(x, t | y, s) dx
 = ∫_R [ ϕ(x) ∫_R p(z, t + Δt | x, t) dz
  + ϕ′(x) ∫_R (z − x) p(z, t + Δt | x, t) dz
  + (1/2) ϕ″(x) ∫_R (z − x)² p(z, t + Δt | x, t) dz
  + ∫_R o((z − x)²) p(z, t + Δt | x, t) dz ] p(x, t | y, s) dx. (7.9)

We simplify the right-hand side of Equation (7.9) by considering each term individually. For the first term, we have

∫_R p(z, t + Δt | x, t) dz = 1. (7.10)

We note that the second term can also be written

∫_R (z − x) p(z, t + Δt | x, t) dz = E[X(t + Δt) − x | X(t) = x]. (7.11)

Then we can use the computational definition (6.5) to write

E[X(t + Δt) − x | X(t) = x] = E[f(x, t) Δt + g(x, t) √(Δt) ξ] = f(x, t) Δt + g(x, t) √(Δt) E[ξ] = f(x, t) Δt. (7.12)

Together, this means that

∫_R (z − x) p(z, t + Δt | x, t) dz = f(x, t) Δt. (7.13)

To use the same approach for the third term, we note that

∫_R (z − x)² p(z, t + Δt | x, t) dz = E[(X(t + Δt) − x)² | X(t) = x], (7.14)

and again use the computational definition (6.5) to write

E[(X(t + Δt) − x)² | X(t) = x] = E[(f(x, t) Δt + g(x, t) √(Δt) ξ)²]
 = f(x, t)² (Δt)² + 2 f(x, t) g(x, t) (Δt)^{3/2} E[ξ] + g(x, t)² Δt E[ξ²]
 = g(x, t)² Δt + O((Δt)²). (7.15)

Together, this means that

∫_R (z − x)² p(z, t + Δt | x, t) dz = g(x, t)² Δt + O((Δt)²). (7.16)

Substituting equations (7.10), (7.13) and (7.16) into Equation (7.9) gives

∫_R p(x, t + Δt | y, s) ϕ(x) dx = ∫_R [ ϕ(x) + ϕ′(x) f(x, t) Δt + ϕ″(x) (g(x, t)²/2) Δt ] p(x, t | y, s) dx + O((Δt)²), (7.17)

which can be rearranged to obtain

∫_R [p(x, t + Δt | y, s) − p(x, t | y, s)]/Δt · ϕ(x) dx
 = ∫_R ϕ′(x) f(x, t) p(x, t | y, s) dx + ∫_R ϕ″(x) (g(x, t)²/2) p(x, t | y, s) dx + O(Δt). (7.18)

We can then use integration by parts on the right-hand side to give

∫_R [p(x, t + Δt | y, s) − p(x, t | y, s)]/Δt · ϕ(x) dx
 = −∫_R ϕ(x) ∂/∂x [ f(x, t) p(x, t | y, s) ] dx + ∫_R ϕ(x) ∂²/∂x² [ (g(x, t)²/2) p(x, t | y, s) ] dx + O(Δt). (7.19)

The above expression can now be written as one integral:

∫_R ϕ(x) [ (p(x, t + Δt | y, s) − p(x, t | y, s))/Δt + ∂/∂x ( f(x, t) p(x, t | y, s) ) − ∂²/∂x² ( (g(x, t)²/2) p(x, t | y, s) ) ] dx = O(Δt). (7.20)

Since the test function, ϕ(x), is arbitrary, we can conclude that the term inside the square brackets must be zero, to arrive at

(p(x, t + Δt | y, s) − p(x, t | y, s))/Δt = ∂²/∂x² [ (g(x, t)²/2) p(x, t | y, s) ] − ∂/∂x [ f(x, t) p(x, t | y, s) ] + O(Δt). (7.21)

Taking the limit Δt → 0 we obtain the Fokker-Planck equation:

∂/∂t p(x, t | y, s) = ∂²/∂x² [ (g(x, t)²/2) p(x, t | y, s) ] − ∂/∂x [ f(x, t) p(x, t | y, s) ]. (7.22)

Note that to write the Fokker-Planck equation in the same form as Equation (7.3), we set y = x₀ and s = 0, and note that the function we denoted as p(x, t) should more formally be written as p(x, t | x₀, 0).

7.1.1 Example 1 of Lecture 6

For Example 1 of Lecture 6 we have f(x, t) ≡ 0, g(x, t) ≡ 1 and X(0) = 0. The corresponding Fokker-Planck equation is then

∂/∂t p(x, t) = (1/2) ∂²/∂x² p(x, t) with p(x, 0) = δ(x), (7.23)

and it has solution

p(x, t) = (1/√(2πt)) exp(−x²/(2t)). (7.24)

Figure 7.1 shows a comparison of p(x, 1) with the results of simulating many sample paths using Equation (6.8).

Figure 7.1: Solution of the Fokker-Planck equation, (7.23), given by Equation (7.24) at t = 1 (orange), together with a histogram of sample path positions at t = 1 (grey) generated using 10⁵ sample paths and Δt = 0.1.

7.2 The stationary distribution

If f(x, t) ≡ f(x) and g(x, t) ≡ g(x) then we can evaluate the long-time behaviour of the corresponding stochastic differential equations by considering the stationary distribution:

p_s(x) := lim_{t→∞} p(x, t). (7.25)

This function can be found by solving the stationary problem corresponding to (7.3), which is the ordinary differential equation

d²/dx² [ (g(x)²/2) p_s(x) ] − d/dx [ f(x) p_s(x) ] = 0, (7.26)

with solution

p_s(x) = (C/g(x)²) exp[ ∫₀ˣ 2f(y)/g(y)² dy ]. (7.27)

Using the normalisation condition, Equation (7.2), we have

C = ( ∫_R (1/g(x)²) exp[ ∫₀ˣ 2f(y)/g(y)² dy ] dx )⁻¹. (7.28)

7.2.1 Example 3 of Lecture 6

Recall the stochastic differential equation of Lecture 6, Example 3,

X(t + dt) = X(t) + [−k₁X(t)³ + k₂X(t)² − k₃X(t) + k₄] dt + k₅ dW. (7.29)

Using the method outlined above we can calculate the stationary distribution as

p_s(x) = C̄ exp[ (−3k₁x⁴ + 4k₂x³ − 6k₃x² + 12k₄x) / (6k₅²) ], (7.30)

where

C̄ = ( ∫_R exp[ (−3k₁x⁴ + 4k₂x³ − 6k₃x² + 12k₄x) / (6k₅²) ] dx )⁻¹. (7.31)

Figure 7.2: Stationary distribution for Example 3 of Lecture 6, given by Equation (7.30) (orange), together with an estimate of the stationary distribution generated using stochastic simulation (grey) with 10⁵ sample paths and Δt = 0.1.

7.3 References and further reading

An Introduction to Stochastic Processes with Applications to Biology. L. J. S. Allen.

Stochastic Processes in Physics and Chemistry. N. G. van Kampen.

7.4 Tasks

Reproduce the results shown in Figure 7.1.

Solve Equation (7.26) to show that the stationary distribution is as stated in Equation (7.27), and use the normalisation condition, Equation (7.2), to show that C is as stated in Equation (7.28).

7.5 Example Matlab code

To generate sample paths from Example 1.

function example1_lecture()
clear all; close all;
%%
no_paths=1e5;          % number of sample paths
t_final=1;             % final time
dt=0.1;                % time step
dx=0.25;               % bin width
no_steps=t_final/dt;   % total number of time steps
X=zeros(no_paths,no_steps+1);

dW=sqrt(dt)*randn(no_paths,no_steps);  % generate all random increments
X(:,2:end)=cumsum(dW,2);               % generate sample paths using cumulative sum
bins=-5:dx:5;
SDE_hist=hist(X(:,end),bins)/(no_paths*dx);  % bin the results to plot a histogram
% plot the results
figure(1); clf; hold on; box on
% plot the averaged stochastic results
bar(bins-dx/2,SDE_hist','FaceColor',[.7 .7 .7],'LineWidth',.5,'BarWidth',.6)
% plot the analytic result
plot(bins-dx/2,1/sqrt(2*pi)*exp(-bins.^2/2),'LineWidth',1);
axis([-5 5 0 0.4])
set(gca,'XTick',-4:4:4)
set(gca,'YTick',0:.1:.4)
xlabel('x')
ylabel('p(x,1)')

To generate sample paths from Example 3.

function example3_lecture()
clear all; close all;
t_final=1e6;           % final time
dt=0.1;                % time step
save_step=1;           % save the state every save_step seconds
dx=1.0;                % bin width
no_steps=t_final/dt;   % total number of time steps
no_saves=t_final/save_step+1;  % number of save steps
SDE_hist=zeros(500/dx+1,1);
% parameters
k1=1e-3; k2=0.75; k3=165; k4=1e4; k5=200;
X=100;                 % initial condition
% SDE realisation
dW=sqrt(dt)*randn(no_steps,1);  % generate all random increments
for ii=1:no_steps
    % save the results every second
    if mod((ii-1)*dt,save_step)==0
        jj=round(X/dx)+1;
        SDE_hist(jj)=SDE_hist(jj)+1;
    end
    % update the sample path
    X=X+(-k1*X.^3+k2*X.^2-k3*X+k4)*dt+k5*dW(ii);
end

% analytic expression for the stationary distribution
x=0:1:500;
ps=exp((-3*k1*x.^4+4*k2*x.^3-6*k3*x.^2+12*k4*x)/(6*k5^2));
ps=ps/sum(ps);  % normalise (the constant \bar{C})
% save the results to file
save lec7_example3.mat
%%
% load the results
load lec7_example3.mat
% plot the results
figure(1); clf; hold on; box on
bins=0:dx:500;
bar(bins+dx/2,SDE_hist'/(sum(SDE_hist)*dx),'FaceColor',[.7 .7 .7],...
    'LineWidth',.5,'BarWidth',.6);
plot(x,ps,'LineWidth',1)
axis([0 500 0 0.01])
set(gca,'XTick',0:100:500)
set(gca,'YTick',0:0.005:0.01)
xlabel('x'); ylabel('p_s(x)')
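The stationary distribution (7.30) can be evaluated and normalised numerically without computing C̄ in closed form. A Python sketch (the parameter values, in particular k₅ = 200, are reconstructions consistent with the Matlab codes and the figure scales, since the printed notes have lost trailing zeros):

```python
import numpy as np

# Example 3 parameters (assumed values with trailing zeros restored)
k1, k2, k3, k4, k5 = 1e-3, 0.75, 165.0, 1e4, 200.0

x = np.linspace(0.0, 500.0, 5001)
dx = x[1] - x[0]

# exponent of (7.30); subtract its maximum before exponentiating so the
# unnormalised values stay within floating-point range
log_ps = (-3*k1*x**4 + 4*k2*x**3 - 6*k3*x**2 + 12*k4*x) / (6*k5**2)
ps = np.exp(log_ps - log_ps.max())
ps /= ps.sum() * dx   # impose the normalisation condition (7.2)
```

The resulting density is bimodal, with peaks at the two stable states x = 100 and x = 400 and a trough at the unstable state x = 250, as in Figure 7.2.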

Chapter 8: The backward Kolmogorov equation

Sometimes we might want to understand how the likelihood of ending up in a given state depends on the starting state. This means that, in contrast to Lecture 7, the final position is known while the starting position is undetermined. We would like to be able to understand the evolution of p(x, t | y, s) in terms of the initial time, s, and initial state, y.

8.1 Derivation of the backward Kolmogorov equation

We start by renaming variables in the Chapman-Kolmogorov equation, (7.5), to obtain

p(x, t | y, s − Δs) = ∫_R p(x, t | z, s) p(z, s | y, s − Δs) dz. (8.1)

As in Lecture 7, this equation is valid for any Δs ≥ 0, and we will eventually take the limit Δs → 0. First, we Taylor expand about the point z = y to write

p(x, t | z, s) = p(x, t | y, s) + (z − y) ∂/∂y p(x, t | y, s) + (1/2)(z − y)² ∂²/∂y² p(x, t | y, s) + o((z − y)²), (8.2)

and substitute into the right-hand side of Equation (8.1) so that we have

p(x, t | y, s − Δs) = p(x, t | y, s) ∫_R p(z, s | y, s − Δs) dz
 + ∂/∂y p(x, t | y, s) ∫_R (z − y) p(z, s | y, s − Δs) dz
 + (1/2) ∂²/∂y² p(x, t | y, s) ∫_R (z − y)² p(z, s | y, s − Δs) dz + O((Δs)²). (8.3)

Using Equations (7.10), (7.13) and (7.16) we can then obtain

[p(x, t | y, s − Δs) − p(x, t | y, s)]/Δs = f(y, s) ∂/∂y p(x, t | y, s) + (g(y, s)²/2) ∂²/∂y² p(x, t | y, s) + O(Δs). (8.4)

Taking the limit Δs → 0 gives the so-called backward Kolmogorov equation:

−∂/∂s p(x, t | y, s) = f(y, s) ∂/∂y p(x, t | y, s) + (g(y, s)²/2) ∂²/∂y² p(x, t | y, s). (8.5)

Both the Fokker-Planck equation and the backward Kolmogorov equation provide an exact description of p(x, t | y, s) corresponding to the stochastic differential equation (6.6).

8.2 The diffusion coefficient

We can simplify notation by defining the diffusion coefficient,

d(x, t) = (1/2) g(x, t)². (8.6)

In this case, the Fokker-Planck equation becomes

∂p/∂t (x, t) = ∂²/∂x² [ d(x, t) p(x, t) ] − ∂/∂x [ f(x, t) p(x, t) ], (8.7)

the backward Kolmogorov equation can be written

−∂/∂s p(x, t | y, s) = f(y, s) ∂/∂y p(x, t | y, s) + d(y, s) ∂²/∂y² p(x, t | y, s), (8.8)

and the stationary distribution is

p_s(x) = (C/d(x)) exp[ ∫₀ˣ f(y)/d(y) dy ], (8.9)

where

C = ( ∫_R (1/d(x)) exp[ ∫₀ˣ f(y)/d(y) dy ] dx )⁻¹. (8.10)

8.3 Average switching times

Recall that the stochastic differential equation of Lecture 6, Example 3,

X(t + dt) = X(t) + [−k₁X(t)³ + k₂X(t)² − k₃X(t) + k₄] dt + k₅ dW, (8.11)

with parameters k₁ = 10⁻³, k₂ = 0.75, k₃ = 165 and k₄ = 10⁴, has two favourable states, x_s¹ = 100 and x_s² = 400, and an unfavourable state, x_u = 250. In Figure 6.3 we saw that, most of the time, X(t) stays close to those favourable states and occasionally it switches between them. In this section we will learn how to calculate the average time it takes to switch between the two favourable states.

Since the trajectories jump around each of the favourable states, it is not clear quite how to define mathematically when the system is in a given state (i.e. when it is close to x_s¹ or x_s²). This makes it hard to define when a switch takes place, and so we need to take a different approach. We define τ to be the average time for a trajectory to reach x_u, given that X(0) = x_s¹. If a trajectory reaches x_u there is a 50% chance it will return back to x_s¹ and a 50% chance it will continue on to x_s². The average switching time is therefore 2τ.

8.3.1 Numerical estimation of the average switching time

We can average over a number of trajectories generated using the stochastic simulation algorithm to estimate τ. We start each trajectory at x_s¹ and wait until the trajectory first leaves the interval (−∞, x_u).
For the parameters used in Lecture 6, Example 3, taking Δt = 10⁻³ and generating 10⁵ trajectories, we obtain τ_sim = 64.7.

8.3.2 Analytical expression for the average switching time

We will derive an analytical expression for τ for any stochastic differential equation of the form (6.6) where f(x, t) ≡ f(x) and g(x, t) ≡ g(x). Let

h(y, t) := P( X(t′) ∈ (−∞, x_u) for all t′ ∈ (0, t) | X(0) = y ∈ (−∞, x_u) ). (8.12)

Then

h(y, t) = ∫_{−∞}^{x_u} p(x, t | y, 0) dx, (8.13)

Figure 8.1: The distribution of switching times for Equation (6.16) as estimated using 10⁵ sample paths. Parameters are k₁ = 10⁻³, k₂ = 0.75, k₃ = 165, k₄ = 10⁴, k₅ = 200 and Δt = 10⁻³, and in each case X(0) = x_s¹ = 100.

where p(x, t | y, s) represents the probability that the trajectory remains in (−∞, x_u) and lies in the interval [x, x + dx) at time t, given that it started at y at time s < t. From our results thus far, we know that p satisfies the Fokker-Planck and backward Kolmogorov equations with the boundary conditions

p(x_u, t | y, s) = p(x, t | x_u, s) = 0, (8.14)

so that

p(x, t | y, s) = 0 if y ≥ x_u or x ≥ x_u. (8.15)

Since the coefficients f and g are assumed not to depend on time, we can shift time in the definition of p in Equation (8.13), writing p(x, y, t) for the density at x after elapsed time t having started from y:

h(y, t) = ∫_{−∞}^{x_u} p(x, y, t) dx. (8.16)

Using the same transformation in the backward Kolmogorov equation, (8.8), we obtain

∂/∂t p(x, y, t) = f(y) ∂/∂y p(x, y, t) + d(y) ∂²/∂y² p(x, y, t). (8.17)

Integrating this equation with respect to x, and using Equation (8.16), gives

∂/∂t h(y, t) = f(y) ∂/∂y h(y, t) + d(y) ∂²/∂y² h(y, t). (8.18)

Let τ(y) be the average time for a trajectory, X(t), with X(0) = y, to leave the interval (−∞, x_u). The probability that X first leaves the interval (−∞, x_u) during the time interval [t, t + dt) is then

h(y, t) − h(y, t + dt) ≈ −∂h/∂t (y, t) dt. (8.19)

This means that τ(y) can be computed as

τ(y) = −∫₀^∞ t ∂h/∂t (y, t) dt = ∫₀^∞ h(y, t) dt. (8.20)

Integrating Equation (8.18) with respect to t between 0 and ∞ gives

h(y, ∞) − h(y, 0) = f(y) d/dy ∫₀^∞ h(y, t) dt + d(y) d²/dy² ∫₀^∞ h(y, t) dt. (8.21)

Substituting for τ(y), and using the facts that h(y, ∞) = 0 and h(y, 0) = 1, we obtain

−1 = f(y) dτ/dy + d(y) d²τ/dy² for y ∈ (−∞, x_u). (8.22)

The system is closed by specifying two boundary conditions. First, p(x, t | x_u, s) = 0 implies h(x_u, t) = 0 and so

τ(x_u) = 0. (8.23)

The other boundary condition depends on the problem under consideration. Suppose, for example, that f(y) → ∞ as y → −∞: if we start trajectories further and further to the left, we expect that the exit time will not depend on the starting position. Thus, we impose

dτ/dy (−∞) = 0. (8.24)

Using an integrating factor to integrate (8.22) with respect to y, and using the boundary condition at y = −∞, we have

dτ/dy = −exp[ −∫₀ʸ f(z)/d(z) dz ] ∫_{−∞}^{y} (1/d(z)) exp[ ∫₀ᶻ f(x)/d(x) dx ] dz
 = −(1/(d(y) p_s(y))) ∫_{−∞}^{y} p_s(x) dx, (8.25)

where p_s(x) is the stationary distribution, (8.9). Integrating again with respect to y, and using the boundary condition at y = x_u, gives

τ(y) = ∫_y^{x_u} (1/(d(z) p_s(z))) ∫_{−∞}^{z} p_s(x) dx dz. (8.26)

By definition, we can now compute τ as

τ = τ(x_s¹) = ∫_{x_s¹}^{x_u} (1/(d(z) p_s(z))) ∫_{−∞}^{z} p_s(x) dx dz. (8.27)

8.4 Example 3 of Lecture 6

Recall again the stochastic differential equation of Lecture 6, Example 3,

X(t + dt) = X(t) + [−k₁X(t)³ + k₂X(t)² − k₃X(t) + k₄] dt + k₅ dW. (8.28)

The stationary distribution, p_s(x), for this example of a bistable system is given in equations (7.30)-(7.31). Substituting p_s(x) into Equation (8.27) and evaluating the resulting integrals numerically gives the theoretical value of τ. The mean exit time is plotted as a function of starting position in Figure 8.2. There is an error (approximately 9%) between the theoretical value of τ and the value τ_sim = 64.7 obtained in Section 8.3.1. The main reason for this is that we simulated the stochastic differential equation using a finite time step of Δt = 10⁻³. Decreasing the time step improves the accuracy of the stochastic

Figure 8.2: Left: trajectories of (8.28) computed using Δt = 10⁻³ (blue) and Δt = 10⁻⁵ (orange). The trajectory with the smaller time step leaves the domain (−∞, x_u) whilst the trajectory with the larger time step does not. Right: the mean exit time as a function of starting position, with the analytic estimate, Equation (8.26), shown in orange and the averaged discrete results in blue. Parameters are k₁ = 10⁻³, k₂ = 0.75, k₃ = 165, k₄ = 10⁴, k₅ = 200 and x_u = 250.

simulation algorithm. This is essentially because, even if X(t) < x_u and X(t + Δt) < x_u, there is some probability that the trajectory left the domain (−∞, x_u) during the time interval (t, t + Δt) (see Figure 8.2). To estimate this probability, we suppose that during (t, t + Δt) the particle diffuses only, with diffusion coefficient d = k₅²/2. This is a good approximation close to x_u because we know that the drift coefficient satisfies f(x_u) = 0. In this case, the probability that the trajectory left (−∞, x_u) during the time interval (t, t + Δt) is approximately (see Problem Sheet 2)

P( left (−∞, x_u) during (t, t + Δt) ) ≈ exp[ −(X(t) − x_u)(X(t + Δt) − x_u) / (d Δt) ]. (8.29)

8.5 References and further reading

An Introduction to Stochastic Processes with Applications to Biology. L. J. S. Allen.

Stochastic Processes in Physics and Chemistry. N. G. van Kampen.

8.6 Tasks

Use the computational definition of Equation (8.28) to numerically estimate the switching time τ (recall that this is defined as the average time for a trajectory to reach x_u, given that X(0) = x_s¹).

Evaluate the switching time, τ(y), from Equation (8.26) and compare your result with that estimated using repeated simulation of SDE sample paths. Repeat this exercise, but with a numerical estimate that is corrected using the result in Equation (8.29).

Complete Problem Sheet 2!

8.7 Example Matlab code

To estimate the switching time distribution for Lecture 6, Example 3.

function example3_switching_SDE_lecture()
clear all; close all;
no_paths=1e5;          % number of sample paths
dt=1e-3;               % time step
t_stop=zeros(no_paths,1);
% parameters
k1=1e-3; k2=0.75; k3=165; k4=1e4; k5=200;
X_unstable=250;
% SDE realisations
dW=sqrt(dt)*randn(1e6,1);  % generate initial set of random increments
jj=1;
for ii=1:no_paths
    X=100; tt=0;
    % until x_u is reached, evolve the sample path
    while X<X_unstable
        tt=tt+dt;
        X=X+(-k1*X.^3+k2*X.^2-k3*X+k4)*dt+k5*dW(jj);
        jj=jj+1;
        if jj>1e6  % if more random increments are needed, generate them
            dW=sqrt(dt)*randn(1e6,1);
            jj=1;
        end
    end
    % record the time at which x_u is reached
    t_stop(ii)=tt;
end
% output the mean and variance of the switching time
mean(t_stop)
var(t_stop)/no_paths
%%
% save the results to file
% save example3_switching_pdf.mat
% load the results from file
% load example3_switching_pdf.mat

%%
% create a histogram of switching times
bin_width=5;
t_stop_hist=hist(t_stop,0:bin_width:1000);
% plot the results
figure(1); clf; hold on; box on
stairs(bin_width/2:bin_width:1000+bin_width/2,...
    t_stop_hist'/(sum(t_stop_hist)*bin_width),'LineWidth',1.0);
axis([0 250 0 0.02])
set(gca,'XTick',0:50:250)
set(gca,'YTick',0:0.01:0.02)
xlabel('switching time'); ylabel('frequency')

To compare the analytic estimate for the switching time with that obtained using repeated stochastic simulation.

function example3_switching_analytical_lecture()
clear all; close all;
% parameters
k1=1e-3; k2=0.75; k3=165; k4=1e4; k5=200;
x_unstable=250;
%% analytical calculations
% analytical calculation of the switching time as a function of y
x0=0;        % left side of domain
xN=600;      % right side of domain
delta=1.0;   % step size
% regular mesh for integrals
xseries=x0:delta:xN;
% regular mesh for evaluating tau
tauseries=x0:delta:x_unstable;
N=length(xseries);
tauN=length(tauseries);
p=zeros(1,length(xseries));
% drift and diffusion functions for the SDE
f=@(x)(-k1*x.^3+k2*x.^2-k3*x+k4);
d=k5^2/2;
% compute the stationary distribution

% define the function for the stationary distribution
pfunc=@(x)exp(integral(@(y)(f(y)./d),0,x))/d;
p=arrayfun(pfunc,xseries);
% compute the norm of p
norm_p=trapz(xseries,p);
% normalise p
p=p/norm_p;
% compute the exit time
% compute the function Z(x) = \int_0^x p(y) dy for values x = x0 to xN
Z=cumtrapz(xseries,p);
% compute the function Z(x)/(d(x)*p(x)) for values x = x0 to xN
tauintegrand=Z./(d.*p);
% compute the exit time; integration limits reversed to work well with cumtrapz
tauy=tauintegrand(1:tauN);
tau=cumtrapz(tauseries(end:-1:1),tauy(end:-1:1));
% analytical estimate for the switching time from x_s^1
-tau(151)
%% computational approximation
no_paths=1e4;
dt=1e-3;
t_stop=zeros(no_paths,tauN);
% SDE realisations
dW=sqrt(dt)*randn(1e6,1);
correct=rand(1e6,1);
jj=1; kk=1;
for rr=1:tauN
    rr
    for ii=1:no_paths
        X=tauseries(rr); xold=tauseries(rr); tt=0;
        while X<x_unstable
            tt=tt+dt;
            X=X+(-k1*X.^3+k2*X.^2-k3*X+k4)*dt+k5*dW(jj);
            jj=jj+1;
            if jj>1e6
                dW=sqrt(dt)*randn(1e6,1);
                jj=1;
            end
            % correction probability
            if correct(kk)<exp(-(xold-x_unstable)*(X-x_unstable)/(dt*k5^2/2))
                X=1e10;
            end
            kk=kk+1;
            if kk>1e6
                correct=rand(1e6,1);
                kk=1;
            end

            xold=X;
        end
        t_stop(ii,rr)=tt;
    end
end
average_time=mean(t_stop,1);
var_time=var(t_stop,1)/no_paths;
%%
% save example3_switching_diffy.mat
% load example3_switching_diffy.mat
%%
figure(1); clf; hold on; box on
stairs(tauseries,average_time,'LineWidth',1.0);
plot(tauseries(end:-1:1),-tau,'LineWidth',1.0);
axis([0 250 0 70])
set(gca,'XTick',0:50:250)
set(gca,'YTick',0:10:70)
xlabel('X(0)'); ylabel('mean exit time')
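The double integral in Equation (8.26) is straightforward to evaluate with cumulative quadrature, mirroring the Matlab listing above. A Python sketch (the parameter values, in particular k₅ = 200, are reconstructions with trailing zeros restored; d = k₅²/2):

```python
import numpy as np

# reconstructed Example 3 parameters and constant diffusion coefficient
k1, k2, k3, k4, k5 = 1e-3, 0.75, 165.0, 1e4, 200.0
d = k5**2 / 2
x_u = 250.0   # unstable steady state

x = np.linspace(0.0, x_u, 2501)
dx = x[1] - x[0]

# unnormalised stationary distribution on (0, x_u); the normalisation
# constant cancels between numerator and denominator in (8.26)
f = -k1*x**3 + k2*x**2 - k3*x + k4
log_ps = np.cumsum(f / d) * dx        # \int_0^x f(y)/d dy (Riemann sum)
ps = np.exp(log_ps - log_ps.max())

inner = np.cumsum(ps) * dx            # \int_0^z p_s(x) dx
integrand = inner / (d * ps)

# tau(y) = \int_y^{x_u} [\int_0^z p_s(x) dx] / (d p_s(z)) dz
tau = np.cumsum(integrand[::-1])[::-1] * dx
tau_switch = tau[np.argmin(np.abs(x - 100.0))]  # tau at x_s^1 = 100
```

As expected from (8.23), τ(y) decreases monotonically to zero as y approaches x_u, and τ(x_s¹) comes out on the order of the simulated value τ_sim reported in the text.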

Chapter 9: The chemical Fokker-Planck equation

In Lecture 5, we saw that it was possible to approximate the dynamics of the chemical master equation using the chemical Langevin equation:

X(t + dt) ≈ X(t) + dt Σⱼ₌₁ᴹ aⱼ(X(t)) νⱼ + Σⱼ₌₁ᴹ √(aⱼ(X(t))) dWⱼ νⱼ, (9.1)

where the dWⱼ, j = 1, ..., M, are statistically independent Gaussian white noise processes. The computational definition of the chemical Langevin equation is

X(t + Δt) ≈ X(t) + Δt Σⱼ₌₁ᴹ aⱼ(X(t)) νⱼ + √(Δt) Σⱼ₌₁ᴹ √(aⱼ(X(t))) ξⱼ νⱼ, (9.2)

where ξⱼ ∼ N(0, 1) for j = 1, ..., M. In this lecture, we will learn how to connect the chemical Langevin equation with the chemical Fokker-Planck equation, proceeding by means of example.

9.1 Example: production and degradation

For the production/degradation system considered in Lecture 2,

A →(k₁) ∅, (9.3)
∅ →(k₂) A, (9.4)

the chemical Langevin equation becomes

X(t + dt) = X(t) + [−k₁X(t) + k₂ν] dt − √(k₁X(t)) dW₁ + √(k₂ν) dW₂, (9.5)

with computational definition

X(t + Δt) = X(t) + [−k₁X(t) + k₂ν] Δt − √(k₁X(t)) √(Δt) ξ₁ + √(k₂ν) √(Δt) ξ₂, (9.6)

where ξ₁ and ξ₂ are two random numbers sampled from the unit normal distribution. Equation (9.5) is different from those we have studied in the last three lectures because it has two independent white noises. However, on Problem Sheet 3 we will show that this does not complicate the derivation of the corresponding Fokker-Planck equation. For this example, we have drift and diffusion coefficients

f(x) = −k₁x + k₂ν and d(x) = (1/2)(k₁x + k₂ν), (9.7)

and

∂/∂t p(x, t) = ∂²/∂x² [ d(x) p(x, t) ] − ∂/∂x [ f(x) p(x, t) ]. (9.8)

We are now in a position to apply the theory that we developed over the last three lectures. The stationary distribution is given by

p_s(x) = (C/d(x)) exp[ ∫₀ˣ f(y)/d(y) dy ]
 = (2C/(k₁x + k₂ν)) exp[ ∫₀ˣ 2(−k₁y + k₂ν)/(k₁y + k₂ν) dy ]
 = (2C/(k₁x + k₂ν)) exp[ ∫₀ˣ ( −2 + 4k₂ν/(k₁y + k₂ν) ) dy ]
 = (2C/(k₁x + k₂ν)) exp[ −2x + (4k₂ν/k₁) log(k₁x + k₂ν) ]
 = 2C̄ exp[ −2x + (4k₂ν/k₁ − 1) log(k₁x + k₂ν) ], (9.9)

where the constant arising from the lower limit of the integral has been absorbed into C̄, with

C̄ = ( 2 ∫_R exp[ −2x + (4k₂ν/k₁ − 1) log(k₁x + k₂ν) ] dx )⁻¹. (9.10)

Figure 9.1: Right: comparison of the stationary distribution as estimated using repeated stochastic simulation (grey bars) and analytically using Equation (9.9) (blue). Left: mean exit time as estimated using Equation (9.11) (red) and using repeated stochastic simulation (blue). Parameters are A(0) = 0, k₁ = 0.1 s⁻¹ and k₂ν = 1.0 s⁻¹.

We can look at how well the chemical Fokker-Planck equation performs when it comes to estimating mean transition times. Define τ_SSA(n), n = 1, 2, 3, ..., 18, to be the average time predicted by the Gillespie stochastic simulation algorithm for trajectories to leave the interval (0, 18] when the initial condition is A(0) = n. We can approximate τ_SSA(y) using our estimate derived from the Fokker-Planck equation:

τ_FPE(y) = ∫_y^{19} (1/(d(z) p_s(z))) ∫ᶻ p_s(x) dx dz, (9.11)

where p_s(x) is as given by equations (9.9) and (9.10). We can evaluate the integrals in τ_FPE numerically. The results are shown in Figure 9.1, and demonstrate that Equation (9.11) is a good approximation of τ_SSA(n).

9.2 The chemical Fokker-Planck equation

Now consider a general well-stirred reaction system with N species, S₁, ..., S_N, that may be involved in M possible reactions, R₁, ..., R_M, as in Lectures 3 and 5. Then the chemical Langevin equation is

X(t + dt) ≈ X(t) + dt Σⱼ₌₁ᴹ aⱼ(X(t)) νⱼ + Σⱼ₌₁ᴹ √(aⱼ(X(t))) dWⱼ νⱼ, (9.12)

and the chemical Fokker-Planck equation is

∂/∂t p(x, t) = − Σᵢ₌₁ᴺ ∂/∂xᵢ [ Σⱼ₌₁ᴹ νⱼᵢ aⱼ(x) p(x, t) ]
 + (1/2) Σᵢ₌₁ᴺ ∂²/∂xᵢ² [ Σⱼ₌₁ᴹ νⱼᵢ² aⱼ(x) p(x, t) ]
 + Σᵢ₌₁ᴺ Σₖ₌₁^{i−1} ∂²/∂xᵢ∂xₖ [ Σⱼ₌₁ᴹ νⱼᵢ νⱼₖ aⱼ(x) p(x, t) ]. (9.13)

9.3 References and further reading

An Introduction to Stochastic Processes with Applications to Biology. L. J. S. Allen.

Stochastic Processes in Physics and Chemistry. N. G. van Kampen.

9.4 Tasks

Estimate the stationary distribution for the production/degradation system using repeated stochastic simulation, and compare your result with the corresponding analytical estimate derived in equations (9.9) and (9.10).

Use repeated stochastic simulation to estimate τ_SSA, the average time to leave the interval (0, 18] when the initial condition is A(0) = n. Evaluate the corresponding approximate mean exit time using the Fokker-Planck equation and compare your results.

9.5 Example Matlab code

To estimate the stationary distribution for the production-degradation example.

function production_degradation_stationary_lecture()
clear all; close all;
k1=0.1;      % decay rate
k2V=1;       % production rate
t_final=1;   % final time
%%

% estimate stationary distribution using a long sample path
no_samples=1e7;
A_stationary=zeros(101,1);
t=0; A=k2V/k1;                   % set the initial time and molecule numbers
for i=1:no_samples
    while t<i
        a=k1*A+k2V;              % calculate a0
        tau=1/a*log(1/rand);     % calculate time until next reaction
        % update molecule numbers and time
        r2=a*rand;
        if r2<k1*A
            A=A-1;
        else
            A=A+1;
        end
        t=t+tau;
    end
    if A<=100
        A_stationary(A+1)=A_stationary(A+1)+1;
    end
end

%%
% analytic expression for the stationary distribution
dx=0.1;
x=0:dx:50;
ps=2*exp(-2*x+(4*k2V/k1-1)*log(k1*x+k2V));
ps=ps/(sum(ps)*dx);

%%
% plot results
figure(2); clf; cla; hold on; box on
bar(0:1:100,A_stationary/no_samples,'FaceColor',[.7 .7 .7],...
    'LineWidth',0.5,'BarWidth',0.6);
plot(x,ps,'LineWidth',1.0)
axis([0 25 0 0.15])
set(gca,'XTick',0:5:25)
set(gca,'YTick',0:0.05:0.15)
set(gca,'YTickLabel',arrayfun(@(s)sprintf('%.2f', s),...
    cellfun(@(s)str2num(s),get(gca,'YTickLabel')),'UniformOutput',false))
ylabel('frequency'); xlabel('number of A molecules')

Chapter 10: A simple model of diffusion

In Lectures 1-5 we learnt how to analyse and simulate models of chemical reactions where the systems were all assumed to be well-mixed, that is, the concentrations of reacting species were assumed spatially homogeneous throughout the reaction volume, ν. The goal of this lecture is to begin to extend this approach in a simple way to systems that are not well-mixed. To do this we need a model of diffusion.

10.1 A compartment-based approach to diffusion

Suppose that we want to model the diffusion of chemical species A on the domain [0, L] × [0, h] × [0, h], where L = Kh. To do this we will divide the domain along the x axis into K compartments of length h. We denote the number of A molecules in the ith compartment, x ∈ [(i−1)h, ih), by A_i, i = 1, ..., K. As a result of Brownian motion, the molecules jump between neighbouring compartments. This means that we can model diffusion as the following chain of chemical reactions:

A_1 ⇌ A_2 ⇌ A_3 ⇌ ... ⇌ A_K (each jump at rate d),    (10.1)

where

A_i ⇌ A_{i+1}  means  A_i →^{d} A_{i+1}  and  A_{i+1} →^{d} A_i.    (10.2)

The rate constant, d, has units s^{−1}. This means that the propensity function for a diffusion event from box i to box i+1 is d A_i. The system of chemical reactions (10.1) can be simulated using the Gillespie stochastic simulation algorithm outlined in Lecture 3.

Example

We illustrate this method using an example in which we simulate 10,000 molecules starting from position x = 0.4 mm in the interval x ∈ [0, L], where L = 1 mm. We will take d = 0.16 s^{−1} and K = 40 (so that h = 0.025 mm). Since x = 0.4 mm is on the boundary between the 16th and 17th compartments, we take the initial condition to be A_16(0) = 5000, A_17(0) = 5000 and A_i(0) = 0 for i ≠ 16, 17. The left-hand plot of Figure 10.1 shows six sample paths of individual molecules diffusing in the system, whereas the right-hand plot of Figure 10.1 shows the density profile at time t = 4 minutes.
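The compartment-based SSA for the diffusion chain above can be sketched in a few lines. The following is a minimal, self-contained Python sketch (the course codes are in Matlab); the function name, default arguments and seeding are illustrative choices, not part of the course code.

```python
import math
import random

def diffuse(K=40, d=0.16, n0=10000, t_final=240.0, start=16, seed=1):
    """Gillespie SSA for a chain of K compartments with jump rate d.

    Half of the n0 molecules start in compartment `start`, half in
    `start + 1` (mimicking an initial position on a compartment boundary).
    Returns the vector of compartment occupancies at time t_final.
    """
    rng = random.Random(seed)
    A = [0] * K
    A[start - 1] = n0 // 2
    A[start] = n0 - n0 // 2
    t = 0.0
    while True:
        # total propensity: each molecule jumps at rate d per allowed direction;
        # boundary compartments can only jump inwards
        a0 = d * (2 * sum(A) - A[0] - A[-1])
        t += math.log(1.0 / rng.random()) / a0
        if t > t_final:
            return A
        r = rng.random() * a0
        for i in range(K - 1):          # rightward jumps from boxes 1..K-1
            r -= d * A[i]
            if r < 0:
                A[i] -= 1
                A[i + 1] += 1
                break
        else:
            for i in range(1, K):       # leftward jumps from boxes 2..K
                r -= d * A[i]
                if r < 0:
                    A[i] -= 1
                    A[i - 1] += 1
                    break
```

Since the molecules only hop between boxes, the total count is conserved; averaging many such runs reproduces the behaviour of the diffusion equation derived below.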
10.2 Connection to a macroscale diffusion coefficient

We would now like to think about how to relate the jump rate, d, to a macroscale diffusion coefficient, D. We denote by p(n, t) the joint probability that A_i(t) = n_i, i = 1, ..., K, where n = [n_1, n_2, ..., n_K]. Let us define the operators R_i, L_i : ℕ^K → ℕ^K (where ℕ is the set of non-negative integers) by

R_i : [n_1, ..., n_i, n_{i+1}, ..., n_K] → [n_1, ..., n_i + 1, n_{i+1} − 1, ..., n_K],    (10.3)

for i = 1, ..., K−1, and

L_i : [n_1, ..., n_{i−1}, n_i, ..., n_K] → [n_1, ..., n_{i−1} − 1, n_i + 1, ..., n_K],    (10.4)

for i = 2, ..., K.

Figure 10.1: Left: the paths of six individual molecules. Right: histogram of the number of A molecules in each compartment (grey bars) together with the solution of the diffusion equation (10.12) (blue). For further details see the example in Section 10.1.

Then the (diffusion) chemical master equation, which corresponds to the system of chemical reactions given by Equation (10.1), can be written as follows:

∂p(n,t)/∂t = d \sum_{j=1}^{K−1} { (n_j + 1) p(R_j n, t) − n_j p(n, t) } + d \sum_{j=2}^{K} { (n_j + 1) p(L_j n, t) − n_j p(n, t) }.    (10.5)

The mean is defined as the vector M(t) ≡ [M_1, M_2, ..., M_K], where

M_i(t) = \sum_n n_i p(n, t) ≡ \sum_{n_1=0}^{∞} \sum_{n_2=0}^{∞} ... \sum_{n_K=0}^{∞} n_i p(n, t)    (10.6)

gives the mean number of molecules in the ith compartment, i = 1, 2, ..., K. To derive an evolution equation for the mean vector M(t) we can follow the method from Section 2.1. Multiplying (10.5) by n_i and summing over all the possible values the state vector, n, can take, we obtain (see Problem Sheet 3) a system of equations for the M_i of the form

∂M_i/∂t = d (M_{i+1} − 2M_i + M_{i−1}), for i = 2, ..., K−1,    (10.7)
∂M_1/∂t = d (M_2 − M_1),    (10.8)
∂M_K/∂t = d (M_{K−1} − M_K).    (10.9)

The classical deterministic description of diffusion is written in terms of the concentration a(x, t), which can be approximated as a(x_i, t) ≈ M_i(t)/h, where x_i is the centre of the ith compartment, i = 1, 2, ..., K. Dividing (10.7) by h, we obtain

∂a(x_i, t)/∂t ≈ d ( a(x_i + h, t) − 2a(x_i, t) + a(x_i − h, t) ).    (10.10)

By Taylor expanding the right-hand side we arrive at

∂a(x_i, t)/∂t = d h² ∂²a(x_i, t)/∂x² + O(h⁴).    (10.11)

This means that, in the limit h → 0, the system of equations (10.10) is equivalent to the diffusion equation with D = dh². Since molecules of A cannot move left out of compartment 1 or right out of compartment K, we see from Equations (10.8) and (10.9) that zero-flux boundary conditions are appropriate for the diffusion equation, so that we have

∂a/∂t = D ∂²a/∂x²  for x ∈ (0, L),  with  ∂a/∂x |_{x=0,L} = 0.    (10.12)

The system is closed by specifying appropriate initial conditions. A comparison of the solution of Equation (10.12) with the results from stochastic simulation of the system is shown in Figure 10.1.

10.3 Analysis of the variance

We can extract further information from the (diffusion) chemical master equation by also considering the evolution of the variance vector V(t) ≡ [V_1(t), V_2(t), ..., V_K(t)], where

V_i(t) = \sum_n (n_i − M_i(t))² p(n, t) ≡ \sum_{n_1=0}^{∞} \sum_{n_2=0}^{∞} ... \sum_{n_K=0}^{∞} (n_i − M_i(t))² p(n, t)    (10.13)

gives the variance in the number of A molecules in compartment i. To derive the evolution equation for the vector V(t), we define more generally the covariance matrix {V_ij} by

V_ij = \sum_n n_i n_j p(n, t) − M_i M_j,  for i, j = 1, 2, ..., K.    (10.14)

From Equation (10.14) we see that the variance vector comprises the diagonal entries of this matrix: V_i = V_ii for i = 1, 2, ..., K. Multiplying Equation (10.5) by n_i² and summing over n, we obtain

∂/∂t \sum_n n_i² p(n, t) = d \sum_{j=1}^{K−1} { \sum_n n_i² (n_j + 1) p(R_j n, t) − \sum_n n_i² n_j p(n, t) }
                        + d \sum_{j=2}^{K} { \sum_n n_i² (n_j + 1) p(L_j n, t) − \sum_n n_i² n_j p(n, t) }.    (10.15)

Let us consider the case i ∈ {2, ..., K−1}. We first evaluate the term corresponding to j = i in the first sum on the right-hand side. We have

\sum_n n_i² (n_i + 1) p(R_i n, t) − \sum_n n_i² n_i p(n, t) = \sum_n (n_i − 1)² n_i p(n, t) − \sum_n n_i² n_i p(n, t)
  = \sum_n ( −2n_i² + n_i ) p(n, t)    (10.16)
  = −2V_i − 2M_i² + M_i.
Here we changed indices in the first sum, R_i n → n, and then used Equations (10.6) and (10.14). Similarly, the term corresponding to j = i − 1 in the first sum on the right-hand side of Equation (10.15) can be rewritten as

\sum_n n_i² (n_{i−1} + 1) p(R_{i−1} n, t) − \sum_n n_i² n_{i−1} p(n, t) = \sum_n ( 2 n_i n_{i−1} + n_{i−1} ) p(n, t) = 2V_{i,i−1} + 2M_i M_{i−1} + M_{i−1}.    (10.17)

Other terms, corresponding to j ≠ i, i−1, in the first sum on the right-hand side of Equation (10.15) are equal to zero. The second sum on the right-hand side of Equation (10.15) can be handled analogously to give, finally,

∂/∂t \sum_n n_i² p(n, t) = d { 2V_{i,i−1} + 2M_i M_{i−1} + M_{i−1} − 2V_i − 2M_i² + M_i }
                        + d { 2V_{i,i+1} + 2M_i M_{i+1} + M_{i+1} − 2V_i − 2M_i² + M_i }.    (10.18)

Now using Equation (10.14) and Equation (10.7) on the left-hand side of Equation (10.18), we obtain

∂/∂t \sum_n n_i² p(n, t) = ∂V_i/∂t + 2M_i ∂M_i/∂t = ∂V_i/∂t + d ( 2M_i M_{i+1} + 2M_i M_{i−1} − 4M_i² ).    (10.19)

Substituting this into Equation (10.18), we have

∂V_i/∂t = 2d { V_{i,i+1} + V_{i,i−1} − 2V_i } + d { M_{i+1} + M_{i−1} + 2M_i },    (10.20)

for i = 2, ..., K−1. A similar analysis gives

∂V_1/∂t = 2d { V_{1,2} − V_1 } + d { M_2 + M_1 },    (10.21)
∂V_K/∂t = 2d { V_{K,K−1} − V_K } + d { M_{K−1} + M_K }.    (10.22)

We see that the evolution equation for the variance vector, V(t), depends on the mean, M, the variance, V, and the off-diagonal terms of the covariance matrix, {V_ij}. To get a closed system of equations, we have to derive evolution equations for the V_ij too. This can be done by multiplying (10.5) by n_i n_j, summing over n, and following the same arguments as before.

10.4 References and further reading

A practical guide to stochastic simulations of reaction-diffusion processes. R. Erban, S. J. Chapman and P. K. Maini. arXiv (2007).

10.5 Tasks

Generate a number of sample paths from system (10.1) using the Gillespie algorithm. Show that the averaged discrete dynamics is consistent with the diffusion equation. [Note that to do this you could, for example, use a forward Euler method for the time stepping, with centred finite differences for the second spatial derivative.]

10.6 Example Matlab code

To generate sample paths for a simple model of diffusion.

function example10_sample_paths_lecture()

clear all; close all;

%%
D=0.0001;                 % diffusion coefficient [mm^2/s]
L=1;                      % domain length [mm]
t_final=10*60;            % final time [s]
no_boxes=40;
h=L/no_boxes;             % compartment width
no_realisations=6;
X_initial=0.4;            % initial position [mm]

X=cell(no_realisations,1);
t=cell(no_realisations,1);
for ii=1:no_realisations
    % initial position corresponds to a compartment boundary
    % put half into each neighbouring compartment
    if ii<=no_realisations/2
        X{ii}(1)=X_initial-h/2; t{ii}(1)=0;
    else
        X{ii}(1)=X_initial+h/2; t{ii}(1)=0;
    end
    time=0; kk=1;
    while time<t_final
        r1=rand; r2=rand;
        a=2*D/(h*h);
        time=time+(1/a)*log(1/r1);   % time of the next reaction
        % check to see if left-ward jump
        if r2*a<D/h^2
            X{ii}(kk+1)=X{ii}(kk)-h;
        else
            % if not, must be right-ward jump
            X{ii}(kk+1)=X{ii}(kk)+h;
        end
        % if jumped left out of domain, reflect back into domain
        if X{ii}(kk+1)<0
            X{ii}(kk+1)=h/2;
        end
        % if jumped right out of domain, reflect back into domain
        if X{ii}(kk+1)>L
            X{ii}(kk+1)=L-h/2;
        end
        t{ii}(kk+1)=time;
        kk=kk+1;
    end
end

%%

% plot the results
figure(1); clf; hold on; box on
for ii=1:no_realisations
    t{ii}=t{ii}/60;                  % convert seconds to minutes
    stairs(X{ii},t{ii},'LineWidth',1)
end
axis([0 1 0 10])
set(gca,'XTick',0:0.2:1.0)
set(gca,'YTick',0:2:10)
xlabel('x [mm]')
ylabel('time [min]')
set(gca,'XTickLabel',arrayfun(@(s)sprintf('%.1f', s),...
    cellfun(@(s)str2num(s),get(gca,'XTickLabel')),'UniformOutput',false))

To estimate the distribution of particle positions.

function example10_distribution_lecture()

clear all; close all;

%%
D=0.0001;                 % diffusion coefficient [mm^2/s]
L=1;                      % domain length [mm]
t_final=4*60;             % final time [s]
no_molecules=1e4;         % total number of molecules
no_boxes=40;              % number of compartments
h=L/no_boxes;             % width of compartments
mesh=[h/2:h:L-h/2];
A=zeros(no_boxes,1);
A(16)=no_molecules/2;     % put half the molecules into box 16 initially
A(17)=no_molecules/2;     % put half the molecules into box 17 initially

time=0;
while time<t_final
    r1=rand;
    a=2*D/(h^2)*(no_molecules-A(1)/2-A(no_boxes)/2);  % calculate a0
    time=time+(1/a)*log(1/r1);   % calculate time of next reaction
    ss=0; k=0;
    % decide which reaction occurred
    r2=rand;
    % check to see if molecule jumped to the right
    while ss<=r2*a && k<no_boxes-1
        k=k+1;
        ss=ss+D/h^2*A(k);
    end
    % implement rightwards jump from correct compartment
    if ss>r2*a

        A(k)=A(k)-1;
        A(k+1)=A(k+1)+1;
    else
        % else molecule must have jumped to the left
        k=1;
        while ss<=r2*a && k<no_boxes
            k=k+1;
            ss=ss+D/h^2*A(k);
        end
        % implement leftwards jump from correct compartment
        A(k)=A(k)-1;
        A(k-1)=A(k-1)+1;
    end
end

%%
% solve the diffusion equation numerically using the forward Euler method
dt=0.1;
M=zeros(no_boxes,1);
M(16)=no_molecules/2;
M(17)=no_molecules/2;
hh=dt*D/(h^2);
time_PDE=0;
while time_PDE<=t_final
    M_old=M;
    M(1)=M_old(1)+hh*(M_old(2)-M_old(1));
    M(2:end-1)=M_old(2:end-1)+hh*(M_old(3:end)+M_old(1:end-2)-2*M_old(2:end-1));
    M(end)=M_old(end)+hh*(M_old(end-1)-M_old(end));
    time_PDE=time_PDE+dt;
end

%%
% plot the results
figure(1); clf; hold on; box on
bar(mesh,A,'FaceColor',[.7 .7 .7],'LineWidth',0.5,'BarWidth',0.6)
plot(mesh,M,'LineWidth',1)
axis([0 1 0 500])
set(gca,'XTick',0:0.2:1.0)
set(gca,'YTick',0:100:500)
xlabel('x [mm]')
ylabel('A')
set(gca,'XTickLabel',arrayfun(@(s)sprintf('%.1f', s),...
    cellfun(@(s)str2num(s),get(gca,'XTickLabel')),'UniformOutput',false))

Chapter 11: The reaction-diffusion master equation

Now that we have outlined a simple model for diffusion, we are in a position to write down a model for chemical reactions that take place in a reaction volume that cannot be assumed well-mixed.

11.1 A compartment-based model for production, degradation and diffusion

Suppose that we want to model the production, degradation and diffusion of chemical species A on the domain [0, L] × [0, h] × [0, h], where L = Kh. As in Lecture 10, we divide the domain along the x axis into K compartments of length h, and denote the number of A molecules in the ith compartment, x ∈ [(i−1)h, ih), by A_i, i = 1, ..., K. The reaction-diffusion process we will consider can be described by the following set of chemical reactions:

A_1 ⇌ A_2 ⇌ A_3 ⇌ ... ⇌ A_K (each jump at rate d),    (11.1)
A_i →^{k_1} ∅, for i = 1, 2, ..., K,    (11.2)
∅ →^{k_2 h³} A_i, for i = 1, 2, ..., K/5.    (11.3)

Here, Equation (11.2) describes the decay of A molecules in each compartment at rate k_1 s^{−1}, and Equation (11.3) describes the production of A molecules in each of the first K/5 compartments at rate k_2 h³ s^{−1}.

Figure 11.1: Histogram of the number of A molecules in each compartment (grey bars) together with the solution of the reaction-diffusion equation (11.9) (blue) at (a) t = 10 minutes and (b) t = 30 minutes. Parameters are: L = 1 mm, D = 100 µm² s^{−1}, K = 40 (h = 25 µm), k_1 = 10^{−3} s^{−1} and k_2 = 2 × 10^{−5} µm^{−3} s^{−1}.

This model is relatively easy to simulate using the Gillespie algorithm. Each diffusion reaction has propensity function d A_i(t) s^{−1}, where D = dh², whilst the production reactions have

propensity function k_2 h³ s^{−1} and the decay reactions have propensity function k_1 A_i(t) s^{−1}. Figure 11.1 shows the results from stochastic simulation, starting with no molecules of A in the system.

11.2 The reaction-diffusion master equation

This model can be analysed using the reaction-diffusion master equation. Let p(n, t) = P(A(t) = n | A(0) = A_0), where A = [A_1, A_2, ..., A_K] and n = [n_1, n_2, ..., n_K]. Then the reaction-diffusion master equation for system (11.1)-(11.3) can be written as follows:

∂p(n,t)/∂t = d \sum_{i=1}^{K−1} { (n_i + 1) p(R_i n, t) − n_i p(n, t) } + d \sum_{i=2}^{K} { (n_i + 1) p(L_i n, t) − n_i p(n, t) }
           + k_1 \sum_{i=1}^{K} { (n_i + 1) p(n_1, ..., n_i + 1, ..., n_K, t) − n_i p(n, t) }
           + k_2 h³ \sum_{i=1}^{K/5} { p(n_1, ..., n_i − 1, ..., n_K, t) − p(n, t) }.    (11.4)

Following similar derivations to those previously, we can show that

∂M_1/∂t = d (M_2 − M_1) + k_2 h³ − k_1 M_1,    (11.5)
∂M_i/∂t = d (M_{i+1} − 2M_i + M_{i−1}) + k_2 h³ − k_1 M_i, for i = 2, ..., K/5,    (11.6)
∂M_i/∂t = d (M_{i+1} − 2M_i + M_{i−1}) − k_1 M_i, for i = K/5 + 1, ..., K−1,    (11.7)
∂M_K/∂t = d (M_{K−1} − M_K) − k_1 M_K.    (11.8)

Expanding using Taylor series, we can show that, in the limit h → 0, the concentration of A molecules is given by

∂a/∂t = D ∂²a/∂x² + k_2 χ_{[0, L/5]} − k_1 a,  with  ∂a/∂x |_{x=0,L} = 0.    (11.9)

Again, the system is closed by specifying appropriate initial conditions. Figure 11.1 compares the solution of Equation (11.9) with the results generated using stochastic simulation with the initial condition a(x, 0) = 0.

11.3 A compartment-based model for higher order reactions

We now consider the reaction and diffusion of species A and B on the domain [0, L] × [0, h] × [0, h], where L = Kh, with L = 1 mm and h = 25 µm, and

A + A →^{k_1} ∅,  A + B →^{k_2} ∅,    (11.10)
A →^{k_3} ∅,  B →^{k_4} ∅,  ∅ →^{k_5} A,    (11.11)
∅ →^{k_6} B in the subdomain [3L/5, L] × [0, h] × [0, h].    (11.12)

We model this system using a compartment-based approach: we divide the computational domain into 40 compartments of volume h³ and denote the number of A (B) molecules in compartment i by A_i(t) (B_i(t)) for i = 1, ..., K. Denoting the diffusion coefficients of A and B by D_A and D_B, respectively, the diffusion reactions are

A_1 ⇌ A_2 ⇌ A_3 ⇌ ... ⇌ A_K (each jump at rate d_A),    (11.13)
B_1 ⇌ B_2 ⇌ B_3 ⇌ ... ⇌ B_K (each jump at rate d_B),    (11.14)

where d_A = D_A/h² and d_B = D_B/h².

Second order reactions are implemented in the compartment-based approach by assuming that only molecules that are in the same compartment can react with each other. This means that we have the following chemical reactions:

A_i + A_i →^{k_1} ∅,  A_i + B_i →^{k_2} ∅,  i = 1, 2, ..., K,    (11.15)
A_i →^{k_3} ∅,  B_i →^{k_4} ∅,  ∅ →^{k_5} A_i,  i = 1, 2, ..., K,    (11.16)
∅ →^{k_6} B_i,  i = 3K/5 + 1, ..., K.    (11.17)

Figure 11.2: Left: histogram of the number of A molecules in each compartment (grey bars) together with the solution of Equation (11.18) (blue). Right: histogram of the number of B molecules in each compartment (grey bars) together with the solution of Equation (11.19) (blue). Parameters are: L = 1 mm, D_A = D_B = 1.0 µm² s^{−1}, K = 40 (h = 25 µm), with k_1, k_2 in µm³ s^{−1}, k_3, k_4 in s^{−1}, k_5 = 10^{−7} µm^{−3} s^{−1} and k_6 = 10^{−6} µm^{−3} s^{−1}.

As with the well-stirred case, the introduction of second order reactions means that we cannot write down closed-form solutions for the mean numbers of A and B molecules in the system. To make progress, we can use the Law of Mass Action to write down partial differential equations for a(x, t) ≈ A_i(t) and b(x, t) ≈ B_i(t), the mean numbers of A and B molecules in the compartment containing x ≈ ih:

∂a/∂t = D_A ∂²a/∂x² − (2k_1/h³) a² − (k_2/h³) ab − k_3 a + k_5 h³,    (11.18)
∂b/∂t = D_B ∂²b/∂x² − (k_2/h³) ab − k_4 b + k_6 h³ χ_{[3L/5, L]},    (11.19)

together with zero-flux boundary conditions

∂a/∂x |_{x=0,L} = 0  and  ∂b/∂x |_{x=0,L} = 0.    (11.20)

Figure 11.2 shows both the results from stochastic simulation and the solution of the approximate partial differential equation description of the system.

11.4 Choice of compartment size, h

An important question that was not addressed in the previous sections is: what is the appropriate choice of the compartment size, h? Until now, we have mostly considered linear models, for which we were able to derive exact equations for the mean molecule numbers, for example (11.5)-(11.8), and derive the corresponding deterministic reaction-diffusion partial differential equation (11.9) for the concentration of A by dividing by h³ and taking the limit h → 0. Equation (11.9) can also be viewed as an equation for the probability distribution function of a single molecule. Consequently, for reaction-diffusion systems involving only zero- and first-order chemical reactions we can increase the accuracy of the stochastic simulation algorithm by decreasing h.

The situation is much more delicate when the system involves second or higher order reactions. In this case, although diffusion is modelled more accurately as h is decreased, the reactions might be modelled less accurately, so that we lose accuracy if we choose h too small. We demonstrate this phenomenon using the following illustrative example. We consider chemical species A and B that diffuse in the cubic domain [0, L] × [0, L] × [0, L] and are subject to the following two chemical reactions:

A + B →^{k_1} B,    (11.21)
∅ →^{k_2} A.    (11.22)

Since the number of B molecules is preserved, the dynamics of the model are simple: some molecules of A are produced by the second reaction and some are destroyed by the first reaction.
Thus, after an initial transient, the number of A molecules fluctuates around its equilibrium value.

The well-stirred case

In Lecture 2, we investigated this chemical system under the assumption that the reactor is well-stirred. Then the propensity of the first reaction, (11.21), is α_1(t) = A(t) B k_1/ν, where B is the (constant) number of molecules of B and ν = L³ is the volume of the cubic domain [0, L] × [0, L] × [0, L]. The propensity of the second reaction, (11.22), is α_2(t) = k_2 ν. In particular, the system is equivalent to the production/degradation example (2.1)-(2.2), and we know that the stationary distribution can be written

φ(n) = (1/n!) ( k_2 ν² / (k_1 B) )^n exp[ −k_2 ν² / (k_1 B) ],  n = 0, 1, 2, 3, ...    (11.23)

Including spatial heterogeneity

Our goal is to highlight that a very small compartment size h leads to large computational errors. We divide the cubic domain [0, L] × [0, L] × [0, L] into K³ cubic compartments of volume h³, where K ≥ 1 and h = L/K. To formulate precisely the compartment-based stochastic simulation

algorithm for the illustrative chemical system (11.21)-(11.22) in the reactor [0, L] × [0, L] × [0, L], we denote the compartments by indices from the set

I_all = { (i, j, k) | i, j, k are integers such that 1 ≤ i, j, k ≤ K }.    (11.24)

Let A_ijk(t) (respectively, B_ijk(t)) be the number of molecules of the chemical species A (respectively, B) in the (i, j, k)-th compartment at time t, where (i, j, k) ∈ I_all. Diffusion is modelled as a jump process between neighbouring compartments. Let us define the set of possible directions of jumps

E = { [1, 0, 0], [−1, 0, 0], [0, 1, 0], [0, −1, 0], [0, 0, 1], [0, 0, −1] }.    (11.25)

For every (i, j, k) ∈ I_all, we also define

E_ijk = { e ∈ E | ((i, j, k) + e) ∈ I_all },    (11.26)

i.e. E_ijk is the set of possible directions of jumps from the (i, j, k)-th compartment. For most compartments E_ijk will be the full set of possible jumps E, but for compartments on the boundary the set of jumps is restricted. The notation E_ijk avoids us having to write down separate equations for each boundary compartment.

The main idea of the compartment-based approach is that the small compartments are assumed to be well-mixed, and that only molecules in the same compartment can react according to bimolecular reactions. Thus the compartment-based reaction-diffusion model can be written using the chemical reaction formalism as follows:

A_ijk + B_ijk →^{k_1} B_ijk,  ∅ →^{k_2} A_ijk,  for (i, j, k) ∈ I_all,    (11.27)
A_ijk →^{D_A/h²} A_{ijk+e},  for (i, j, k) ∈ I_all, e ∈ E_ijk,    (11.28)
B_ijk →^{D_B/h²} B_{ijk+e},  for (i, j, k) ∈ I_all, e ∈ E_ijk,    (11.29)

where D_A (respectively, D_B) is the diffusion constant of A (respectively, B). The propensity functions of the reactions (11.27) are

α_{ijk,1}(t) = A_ijk(t) B_ijk(t) k_1 / h³,  α_{ijk,2}(t) = k_2 h³,    (11.30)

where h³ is the volume of the compartment. The reactions (11.28)-(11.29) correspond to diffusive jumps between neighbouring compartments.
The propensity functions of these reactions are A_ijk(t) D_A/h² and B_ijk(t) D_B/h², respectively. The number of molecules of A in the whole container [0, L] × [0, L] × [0, L] is given by

A(t) = \sum_{(i,j,k) ∈ I_all} A_ijk(t).    (11.31)

Let p_n(t) be the probability that A(t) = n, and let φ_K(n) be the stationary distribution

φ_K(n) = lim_{t→∞} p_n(t),    (11.32)

so that φ_K(n) is the probability that there are n molecules of A in the system when it is simulated using K³ compartments, provided that the system is observed for a long time. Since A molecules are produced uniformly in the volume, we do not expect any spatial variation in the probability distribution for the number of A molecules. This means that we expect φ_K(n) = φ(n) for all K.
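This expectation is easy to probe in the well-stirred case K = 1, where the system reduces to a production/degradation process. The following Python sketch (the course codes are in Matlab; the function name, parameter values and seed here are illustrative placeholders, not the values of Figure 11.3) estimates the time-averaged number of A molecules, which should approach the mean k_2 ν² / (k_1 B) of the Poisson distribution (11.23).

```python
import math
import random

def time_avg_A(k1=1.0, k2=0.01, B=10, nu=100.0, t_final=2000.0, seed=3):
    """Well-mixed SSA for A + B -> B (rate k1) and 0 -> A (rate k2).

    Returns the time-averaged copy number of A over [0, t_final].
    """
    rng = random.Random(seed)
    n, t, acc = 0, 0.0, 0.0
    while t < t_final:
        a1 = n * B * k1 / nu        # degradation propensity, Section 11.4
        a2 = k2 * nu                # production propensity
        a0 = a1 + a2
        tau = math.log(1.0 / rng.random()) / a0
        acc += n * min(tau, t_final - t)   # occupancy-weighted accumulation
        t += tau
        if rng.random() * a0 < a1:
            n -= 1
        else:
            n += 1
    return acc / t_final

# for these illustrative values, k2*nu**2/(k1*B) = 0.01*100**2/(1*10) = 10
```

The long-run average fluctuates around the Poisson mean; it is this quantity that shifts spuriously to the right when the same system is simulated with compartments that are too small.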

Results from stochastic simulation

In Figure 11.3, we present the stationary distributions φ_K(n) for increasing values of K. We observe that the peak of φ_K(n) moves to the right as K is increased (i.e. as h is decreased), and that φ_K(n) does not converge to φ_1(n) as h → 0. In fact, φ_K(n) does not converge to any distribution as h → 0; it moves further and further to the right as h is decreased. The shift of φ_K(n) to the right in Figure 11.3 is caused by the bimolecular reaction being lost in the limit h → 0 (i.e. this reaction does not occur as frequently as it should when h is too small). This makes the assessment of the accuracy of computations more challenging than in the deterministic case.

Figure 11.3: Stationary distribution φ_K(n), given by (11.32), for K = 1 (grey bars), K = 2 (blue), K = 20 (orange) and K = 100 (yellow), computed from long-time simulations generated using the Gillespie stochastic simulation algorithm. Parameters are k_1 = 0.2 µm³ s^{−1}, k_2 = 1 µm^{−3} s^{−1}, D_A = D_B = 1 µm² s^{−1} and L = 1 µm, with B held constant.

Conclusions

The compartment-based stochastic simulation algorithm is generally considered valid only for a range of values of h; in particular, h must not be too small. For second-order reactions like (11.21) this constraint is usually stated in the form h ≫ k/(D_A + D_B), where k is the reaction rate constant (of any second or higher order reactions present). To satisfy this condition in our particular example, we could simply choose h = L. However, if the system under consideration has some spatial variation, then we obviously want to choose h small enough to capture the desired spatial resolution. This leads to a restriction on h from above, namely h ≪ L. Thus it is often suggested to choose h small (to satisfy h ≪ L) but not too small (in order to satisfy h ≫ k/(D_A + D_B)).
The optimal choice of h is the subject of current research.

11.5 Models of pattern formation

In this section, we will discuss the stochastic equivalents of two models for spatial patterning. The first is the French flag model, which relates to patterning via concentration gradients, whilst the second is the mechanism of diffusion-driven instability.

The French flag model

Here one assumes that the domain is prepatterned: in our example from Section 11.1, we considered a chemical A that is produced in only part of the domain [0, L], specifically, in

[0, L/5]. We assume that the interval [0, L] describes a layer of cells which are sensitive to the concentration of the chemical A. In particular, we suppose that a cell can have three different fates (e.g. different genes are switched on or off) depending on the concentration of A. The concentration gradient of A can then be used to distinguish three different regions in [0, L]; see Figure 11.4. If the concentration of A is high enough (above a certain threshold), a cell follows the blue program. The white program is followed for medium concentrations of A, and the red program for low concentrations.

Figure 11.4: Left: the deterministic (partial differential equation) version of the French flag model. Right: the corresponding stochastic version.

Diffusion-driven instability

Consider a system of two chemical species A and B in the elongated computational domain [0, L] × [0, h] × [0, h], where L = 1 mm and h = 25 µm, which react according to the Schnakenberg system of chemical reactions (see Sheet 1, Question 4):

2A + B →^{k_1} 3A;  ∅ →^{k_2} A;  A →^{k_3} ∅;  ∅ →^{k_4} B.    (11.33)

If molecules of A and B are well-mixed then, for the parameter values in Figure 11.5, the corresponding deterministic system of ordinary differential equations has one non-negative stable steady state, equal to a_s = 200 and b_s = 75 molecules per compartment volume, h³. Introducing diffusion to the model, one steady state solution of the spatial problem is the constant one, (a_s, b_s), everywhere. However, this solution might not be stable (so might not be seen in reality) if the diffusion constants of A and B differ significantly.
To simulate the reaction-diffusion problem with the Schnakenberg system of chemical reactions (11.33), we follow the compartment-based method of Section 11.1. Starting with a uniform distribution of chemicals, A_i(0) = a_s = 200 and B_i(0) = b_s = 75, i = 1, 2, ..., K, at time t = 0, we plot the numbers of molecules in each compartment at time t = 30 minutes, computed by the Gillespie stochastic simulation algorithm, in Figure 11.5. To demonstrate the idea of patterning, compartments with numbers above the steady state values, a_s or b_s, are plotted in blue and other compartments are plotted in red. We see in Figure 11.5 that the chemical A can clearly be used to divide our computational domain into several regions. There are two and a half blue peaks in this figure. The number of blue peaks depends on the size of the computational domain [0, L], and it is not a unique number in general: the reaction-diffusion system has several favourable states, each with a different number of blue peaks.

Figure 11.5: Turing patterns. Left: numbers of molecules of chemical species A in each compartment at time t = 30 minutes. Right: the same plot for chemical species B. Parameters are k_1/h⁶ = 10^{−6} s^{−1}, k_2 h³ = 1 s^{−1}, k_3 = 0.02 s^{−1}, k_4 h³ = 3 s^{−1}, D_A = 10^{−5} mm² s^{−1} and D_B = 10^{−3} mm² s^{−1}.

11.6 References and further reading

A convergent reaction-diffusion master equation. S. A. Isaacson. J. Chem. Phys. 139 (2013).

11.7 Tasks

Implement a stochastic simulation algorithm to generate a number of sample paths from system (11.1)-(11.3). Compare your results with those generated by solving the corresponding partial differential equation model.
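For the first task, the SSA for system (11.1)-(11.3) combines the diffusion chain with per-compartment decay and localised production. The following compact Python sketch shows one way to organise the three reaction classes (the course codes are in Matlab; the rate values, function name and seed below are illustrative placeholders, not the parameters of Figure 11.1).

```python
import math
import random

def rd_ssa(K=20, d=0.1, k1=0.02, k2h3=0.5, t_final=200.0, seed=7):
    """SSA for system (11.1)-(11.3): diffusion at rate d = D/h^2, decay at
    rate k1 everywhere, production at rate k2*h^3 in the first K/5 boxes.
    Returns compartment occupancies at t_final."""
    rng = random.Random(seed)
    A = [0] * K
    t = 0.0
    while True:
        N = sum(A)
        a_diff = d * (2 * N - A[0] - A[-1])   # all allowed diffusive jumps
        a_dec = k1 * N                        # total decay propensity
        a_prod = k2h3 * (K // 5)              # production in first K/5 boxes
        a0 = a_diff + a_dec + a_prod
        t += math.log(1.0 / rng.random()) / a0
        if t > t_final:
            return A
        r = rng.random() * a0
        if r < a_prod:                        # production, uniform over source boxes
            A[rng.randrange(K // 5)] += 1
        elif r < a_prod + a_dec:              # decay of a uniformly chosen molecule
            r -= a_prod
            for i in range(K):
                r -= k1 * A[i]
                if r < 0:
                    A[i] -= 1
                    break
        else:                                 # diffusive jump
            r -= a_prod + a_dec
            for i in range(K - 1):            # right jumps from boxes 1..K-1
                r -= d * A[i]
                if r < 0:
                    A[i] -= 1
                    A[i + 1] += 1
                    break
            else:
                for i in range(1, K):         # left jumps from boxes 2..K
                    r -= d * A[i]
                    if r < 0:
                        A[i] -= 1
                        A[i - 1] += 1
                        break
```

With production confined to the left fifth of the domain and decay everywhere, the long-time profile decays away from the source, which is the shape the corresponding PDE (11.9) predicts.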

Chapter 12: Diffusion and stochastic differential equations

Consider a typical protein molecule immersed in the aqueous medium of a living cell. As with any small particle, it has a non-zero kinetic energy which is proportional to the absolute temperature. In particular, the protein molecule has a non-zero instantaneous speed. However, it cannot travel far before it bumps into other molecules (e.g. water molecules) in the solution. As a result, the trajectory of the molecule is not straight: it executes a random walk, the well-known Brownian motion. This means that the position of the molecule evolves according to

X(t + dt) = X(t) + \sqrt{2D} dW_x,    (12.1)
Y(t + dt) = Y(t) + \sqrt{2D} dW_y,    (12.2)
Z(t + dt) = Z(t) + \sqrt{2D} dW_z,    (12.3)

where [X(t), Y(t), Z(t)] ∈ ℝ³ is the position of the diffusing molecule at time t, and D is the diffusion constant. We can simulate sample paths from (12.1)-(12.3) by using the computational definition we used in Lecture 6, i.e. we choose a time step Δt and compute the solution iteratively using

X(t + Δt) = X(t) + \sqrt{2D Δt} ξ_x,    (12.4)
Y(t + Δt) = Y(t) + \sqrt{2D Δt} ξ_y,    (12.5)
Z(t + Δt) = Z(t) + \sqrt{2D Δt} ξ_z,    (12.6)

where ξ_x, ξ_y, ξ_z ∼ N(0, 1).

Figure 12.1: Left: six trajectories generated using equations (12.4)-(12.6). All trajectories start at the origin, and the end points are marked with an asterisk. Right: corresponding solution of the diffusion equation (12.13). Parameters are D = 10^{−4} mm² s^{−1} and t = 10 minutes.
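The iteration (12.4)-(12.6) can be written in a few lines of Python (the course codes are in Matlab; the default step size, duration and seeding below are illustrative choices). Averaging the squared displacement over many such paths recovers E|X(t)|² = 6Dt, the three-dimensional analogue of the mean square displacement formula.

```python
import math
import random

def brownian_path(D=1e-4, dt=0.1, t_final=600.0, seed=0):
    """Discretised Brownian motion (12.4)-(12.6) in three dimensions.

    D is in mm^2/s and times are in seconds; the path starts at the origin.
    Returns the list of positions [(x, y, z), ...] at each time step.
    """
    rng = random.Random(seed)
    pos = [0.0, 0.0, 0.0]
    path = [tuple(pos)]
    for _ in range(int(t_final / dt)):
        for k in range(3):
            # each coordinate receives an independent N(0,1) increment,
            # scaled by sqrt(2 D dt) as in (12.4)-(12.6)
            pos[k] += math.sqrt(2.0 * D * dt) * rng.gauss(0.0, 1.0)
        path.append(tuple(pos))
    return path
```

Because each increment is an exact sample of the Brownian transition density, the statistics of the end point do not depend on the choice of Δt, only on the total time simulated.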


More information

Reaction time distributions in chemical kinetics: Oscillations and other weird behaviors

Reaction time distributions in chemical kinetics: Oscillations and other weird behaviors Introduction The algorithm Results Summary Reaction time distributions in chemical kinetics: Oscillations and other weird behaviors Ramon Xulvi-Brunet Escuela Politécnica Nacional Outline Introduction

More information

Stochastic Chemical Kinetics

Stochastic Chemical Kinetics Stochastic Chemical Kinetics Joseph K Scott November 10, 2011 1 Introduction to Stochastic Chemical Kinetics Consider the reaction I + I D The conventional kinetic model for the concentration of I in a

More information

Extending the Tools of Chemical Reaction Engineering to the Molecular Scale

Extending the Tools of Chemical Reaction Engineering to the Molecular Scale Extending the Tools of Chemical Reaction Engineering to the Molecular Scale Multiple-time-scale order reduction for stochastic kinetics James B. Rawlings Department of Chemical and Biological Engineering

More information

Lecture 1: Pragmatic Introduction to Stochastic Differential Equations

Lecture 1: Pragmatic Introduction to Stochastic Differential Equations Lecture 1: Pragmatic Introduction to Stochastic Differential Equations Simo Särkkä Aalto University, Finland (visiting at Oxford University, UK) November 13, 2013 Simo Särkkä (Aalto) Lecture 1: Pragmatic

More information

The concentration of a drug in blood. Exponential decay. Different realizations. Exponential decay with noise. dc(t) dt.

The concentration of a drug in blood. Exponential decay. Different realizations. Exponential decay with noise. dc(t) dt. The concentration of a drug in blood Exponential decay C12 concentration 2 4 6 8 1 C12 concentration 2 4 6 8 1 dc(t) dt = µc(t) C(t) = C()e µt 2 4 6 8 1 12 time in minutes 2 4 6 8 1 12 time in minutes

More information

16. Working with the Langevin and Fokker-Planck equations

16. Working with the Langevin and Fokker-Planck equations 16. Working with the Langevin and Fokker-Planck equations In the preceding Lecture, we have shown that given a Langevin equation (LE), it is possible to write down an equivalent Fokker-Planck equation

More information

Advanced Physical Chemistry CHAPTER 18 ELEMENTARY CHEMICAL KINETICS

Advanced Physical Chemistry CHAPTER 18 ELEMENTARY CHEMICAL KINETICS Experimental Kinetics and Gas Phase Reactions Advanced Physical Chemistry CHAPTER 18 ELEMENTARY CHEMICAL KINETICS Professor Angelo R. Rossi http://homepages.uconn.edu/rossi Department of Chemistry, Room

More information

Derivations for order reduction of the chemical master equation

Derivations for order reduction of the chemical master equation 2 TWMCC Texas-Wisconsin Modeling and Control Consortium 1 Technical report number 2006-02 Derivations for order reduction of the chemical master equation Ethan A. Mastny, Eric L. Haseltine, and James B.

More information

Brownian Motion: Fokker-Planck Equation

Brownian Motion: Fokker-Planck Equation Chapter 7 Brownian Motion: Fokker-Planck Equation The Fokker-Planck equation is the equation governing the time evolution of the probability density of the Brownian particla. It is a second order differential

More information

Lecture 6: Bayesian Inference in SDE Models

Lecture 6: Bayesian Inference in SDE Models Lecture 6: Bayesian Inference in SDE Models Bayesian Filtering and Smoothing Point of View Simo Särkkä Aalto University Simo Särkkä (Aalto) Lecture 6: Bayesian Inference in SDEs 1 / 45 Contents 1 SDEs

More information

SMSTC (2007/08) Probability.

SMSTC (2007/08) Probability. SMSTC (27/8) Probability www.smstc.ac.uk Contents 12 Markov chains in continuous time 12 1 12.1 Markov property and the Kolmogorov equations.................... 12 2 12.1.1 Finite state space.................................

More information

Derivation of Itô SDE and Relationship to ODE and CTMC Models

Derivation of Itô SDE and Relationship to ODE and CTMC Models Derivation of Itô SDE and Relationship to ODE and CTMC Models Biomathematics II April 23, 2015 Linda J. S. Allen Texas Tech University TTU 1 Euler-Maruyama Method for Numerical Solution of an Itô SDE dx(t)

More information

Lecture 7: Simple genetic circuits I

Lecture 7: Simple genetic circuits I Lecture 7: Simple genetic circuits I Paul C Bressloff (Fall 2018) 7.1 Transcription and translation In Fig. 20 we show the two main stages in the expression of a single gene according to the central dogma.

More information

Universal examples. Chapter The Bernoulli process

Universal examples. Chapter The Bernoulli process Chapter 1 Universal examples 1.1 The Bernoulli process First description: Bernoulli random variables Y i for i = 1, 2, 3,... independent with P [Y i = 1] = p and P [Y i = ] = 1 p. Second description: Binomial

More information

Series Solutions. 8.1 Taylor Polynomials

Series Solutions. 8.1 Taylor Polynomials 8 Series Solutions 8.1 Taylor Polynomials Polynomial functions, as we have seen, are well behaved. They are continuous everywhere, and have continuous derivatives of all orders everywhere. It also turns

More information

CDA6530: Performance Models of Computers and Networks. Chapter 3: Review of Practical Stochastic Processes

CDA6530: Performance Models of Computers and Networks. Chapter 3: Review of Practical Stochastic Processes CDA6530: Performance Models of Computers and Networks Chapter 3: Review of Practical Stochastic Processes Definition Stochastic process X = {X(t), t2 T} is a collection of random variables (rvs); one rv

More information

M4A42 APPLIED STOCHASTIC PROCESSES

M4A42 APPLIED STOCHASTIC PROCESSES M4A42 APPLIED STOCHASTIC PROCESSES G.A. Pavliotis Department of Mathematics Imperial College London, UK LECTURE 1 12/10/2009 Lectures: Mondays 09:00-11:00, Huxley 139, Tuesdays 09:00-10:00, Huxley 144.

More information

Lecture Notes 7 Random Processes. Markov Processes Markov Chains. Random Processes

Lecture Notes 7 Random Processes. Markov Processes Markov Chains. Random Processes Lecture Notes 7 Random Processes Definition IID Processes Bernoulli Process Binomial Counting Process Interarrival Time Process Markov Processes Markov Chains Classification of States Steady State Probabilities

More information

Langevin Methods. Burkhard Dünweg Max Planck Institute for Polymer Research Ackermannweg 10 D Mainz Germany

Langevin Methods. Burkhard Dünweg Max Planck Institute for Polymer Research Ackermannweg 10 D Mainz Germany Langevin Methods Burkhard Dünweg Max Planck Institute for Polymer Research Ackermannweg 1 D 55128 Mainz Germany Motivation Original idea: Fast and slow degrees of freedom Example: Brownian motion Replace

More information

6 Continuous-Time Birth and Death Chains

6 Continuous-Time Birth and Death Chains 6 Continuous-Time Birth and Death Chains Angela Peace Biomathematics II MATH 5355 Spring 2017 Lecture notes follow: Allen, Linda JS. An introduction to stochastic processes with applications to biology.

More information

1. Introduction to Chemical Kinetics

1. Introduction to Chemical Kinetics 1. Introduction to Chemical Kinetics objectives of chemical kinetics 1) Determine empirical rate laws H 2 + I 2 2HI How does the concentration of H 2, I 2, and HI change with time? 2) Determine the mechanism

More information

Stochastic process. X, a series of random variables indexed by t

Stochastic process. X, a series of random variables indexed by t Stochastic process X, a series of random variables indexed by t X={X(t), t 0} is a continuous time stochastic process X={X(t), t=0,1, } is a discrete time stochastic process X(t) is the state at time t,

More information

Stochastic Simulation Methods for Solving Systems with Multi-State Species

Stochastic Simulation Methods for Solving Systems with Multi-State Species Stochastic Simulation Methods for Solving Systems with Multi-State Species Zhen Liu Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of

More information

1.3 Forward Kolmogorov equation

1.3 Forward Kolmogorov equation 1.3 Forward Kolmogorov equation Let us again start with the Master equation, for a system where the states can be ordered along a line, such as the previous examples with population size n = 0, 1, 2,.

More information

An Introduction to Stochastic Simulation

An Introduction to Stochastic Simulation Stephen Gilmore Laboratory for Foundations of Computer Science School of Informatics University of Edinburgh PASTA workshop, London, 29th June 2006 Background The modelling of chemical reactions using

More information

1. Stochastic Process

1. Stochastic Process HETERGENEITY IN QUANTITATIVE MACROECONOMICS @ TSE OCTOBER 17, 216 STOCHASTIC CALCULUS BASICS SANG YOON (TIM) LEE Very simple notes (need to add references). It is NOT meant to be a substitute for a real

More information

Chapter 6 - Random Processes

Chapter 6 - Random Processes EE385 Class Notes //04 John Stensby Chapter 6 - Random Processes Recall that a random variable X is a mapping between the sample space S and the extended real line R +. That is, X : S R +. A random process

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 218. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

Stochastic modelling of epidemic spread

Stochastic modelling of epidemic spread Stochastic modelling of epidemic spread Julien Arino Centre for Research on Inner City Health St Michael s Hospital Toronto On leave from Department of Mathematics University of Manitoba Julien Arino@umanitoba.ca

More information

18.175: Lecture 8 Weak laws and moment-generating/characteristic functions

18.175: Lecture 8 Weak laws and moment-generating/characteristic functions 18.175: Lecture 8 Weak laws and moment-generating/characteristic functions Scott Sheffield MIT 18.175 Lecture 8 1 Outline Moment generating functions Weak law of large numbers: Markov/Chebyshev approach

More information

An efficient approach to stochastic optimal control. Bert Kappen SNN Radboud University Nijmegen the Netherlands

An efficient approach to stochastic optimal control. Bert Kappen SNN Radboud University Nijmegen the Netherlands An efficient approach to stochastic optimal control Bert Kappen SNN Radboud University Nijmegen the Netherlands Bert Kappen Examples of control tasks Motor control Bert Kappen Pascal workshop, 27-29 May

More information

Rate Laws. many elementary reactions. The overall stoichiometry of a composite reaction tells us little about the mechanism!

Rate Laws. many elementary reactions. The overall stoichiometry of a composite reaction tells us little about the mechanism! Rate Laws We have seen how to obtain the differential form of rate laws based upon experimental observation. As they involve derivatives, we must integrate the rate equations to obtain the time dependence

More information

Longtime behavior of stochastically modeled biochemical reaction networks

Longtime behavior of stochastically modeled biochemical reaction networks Longtime behavior of stochastically modeled biochemical reaction networks David F. Anderson Department of Mathematics University of Wisconsin - Madison ASU Math Biology Seminar February 17th, 217 Overview

More information

Latent voter model on random regular graphs

Latent voter model on random regular graphs Latent voter model on random regular graphs Shirshendu Chatterjee Cornell University (visiting Duke U.) Work in progress with Rick Durrett April 25, 2011 Outline Definition of voter model and duality with

More information

In terms of measures: Exercise 1. Existence of a Gaussian process: Theorem 2. Remark 3.

In terms of measures: Exercise 1. Existence of a Gaussian process: Theorem 2. Remark 3. 1. GAUSSIAN PROCESSES A Gaussian process on a set T is a collection of random variables X =(X t ) t T on a common probability space such that for any n 1 and any t 1,...,t n T, the vector (X(t 1 ),...,X(t

More information

On a class of stochastic differential equations in a financial network model

On a class of stochastic differential equations in a financial network model 1 On a class of stochastic differential equations in a financial network model Tomoyuki Ichiba Department of Statistics & Applied Probability, Center for Financial Mathematics and Actuarial Research, University

More information

1 R.V k V k 1 / I.k/ here; we ll stimulate the action potential another way.) Note that this further simplifies to. m 3 k h k.

1 R.V k V k 1 / I.k/ here; we ll stimulate the action potential another way.) Note that this further simplifies to. m 3 k h k. 1. The goal of this problem is to simulate a propagating action potential for the Hodgkin-Huxley model and to determine the propagation speed. From the class notes, the discrete version (i.e., after breaking

More information

The Derivative of a Function

The Derivative of a Function The Derivative of a Function James K Peterson Department of Biological Sciences and Department of Mathematical Sciences Clemson University March 1, 2017 Outline A Basic Evolutionary Model The Next Generation

More information

Probability Distributions

Probability Distributions Lecture : Background in Probability Theory Probability Distributions The probability mass function (pmf) or probability density functions (pdf), mean, µ, variance, σ 2, and moment generating function (mgf)

More information

Formal Modeling of Biological Systems with Delays

Formal Modeling of Biological Systems with Delays Universita degli Studi di Pisa Dipartimento di Informatica Dottorato di Ricerca in Informatica Ph.D. Thesis Proposal Formal Modeling of Biological Systems with Delays Giulio Caravagna caravagn@di.unipi.it

More information

2008 Hotelling Lectures

2008 Hotelling Lectures First Prev Next Go To Go Back Full Screen Close Quit 1 28 Hotelling Lectures 1. Stochastic models for chemical reactions 2. Identifying separated time scales in stochastic models of reaction networks 3.

More information

Persistence and Stationary Distributions of Biochemical Reaction Networks

Persistence and Stationary Distributions of Biochemical Reaction Networks Persistence and Stationary Distributions of Biochemical Reaction Networks David F. Anderson Department of Mathematics University of Wisconsin - Madison Discrete Models in Systems Biology SAMSI December

More information

Lecture 4 The stochastic ingredient

Lecture 4 The stochastic ingredient Lecture 4 The stochastic ingredient Luca Bortolussi 1 Alberto Policriti 2 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste Via Valerio 12/a, 34100 Trieste. luca@dmi.units.it

More information

Stochastic Differential Equations.

Stochastic Differential Equations. Chapter 3 Stochastic Differential Equations. 3.1 Existence and Uniqueness. One of the ways of constructing a Diffusion process is to solve the stochastic differential equation dx(t) = σ(t, x(t)) dβ(t)

More information

Brownian Motion and Langevin Equations

Brownian Motion and Langevin Equations 1 Brownian Motion and Langevin Equations 1.1 Langevin Equation and the Fluctuation- Dissipation Theorem The theory of Brownian motion is perhaps the simplest approximate way to treat the dynamics of nonequilibrium

More information

CDA5530: Performance Models of Computers and Networks. Chapter 3: Review of Practical

CDA5530: Performance Models of Computers and Networks. Chapter 3: Review of Practical CDA5530: Performance Models of Computers and Networks Chapter 3: Review of Practical Stochastic Processes Definition Stochastic ti process X = {X(t), t T} is a collection of random variables (rvs); one

More information

Numerical solution of stochastic epidemiological models

Numerical solution of stochastic epidemiological models Numerical solution of stochastic epidemiological models John M. Drake & Pejman Rohani 1 Introduction He we expand our modeling toolkit to include methods for studying stochastic versions of the compartmental

More information

Consensus on networks

Consensus on networks Consensus on networks c A. J. Ganesh, University of Bristol The spread of a rumour is one example of an absorbing Markov process on networks. It was a purely increasing process and so it reached the absorbing

More information

When do diffusion-limited trajectories become memoryless?

When do diffusion-limited trajectories become memoryless? When do diffusion-limited trajectories become memoryless? Maciej Dobrzyński CWI (Center for Mathematics and Computer Science) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands Abstract Stochastic description

More information

Stochastic modelling of epidemic spread

Stochastic modelling of epidemic spread Stochastic modelling of epidemic spread Julien Arino Department of Mathematics University of Manitoba Winnipeg Julien Arino@umanitoba.ca 19 May 2012 1 Introduction 2 Stochastic processes 3 The SIS model

More information

Gaussian processes for inference in stochastic differential equations

Gaussian processes for inference in stochastic differential equations Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017

More information

1.1 Definition of BM and its finite-dimensional distributions

1.1 Definition of BM and its finite-dimensional distributions 1 Brownian motion Brownian motion as a physical phenomenon was discovered by botanist Robert Brown as he observed a chaotic motion of particles suspended in water. The rigorous mathematical model of BM

More information

1 Types of stochastic models

1 Types of stochastic models 1 Types of stochastic models Models so far discussed are all deterministic, meaning that, if the present state were perfectly known, it would be possible to predict exactly all future states. We have seen

More information

CHAPTER 9 LECTURE NOTES

CHAPTER 9 LECTURE NOTES CHAPTER 9 LECTURE NOTES 9.1, 9.2: Rate of a reaction For a general reaction of the type A + 3B 2Y, the rates of consumption of A and B, and the rate of formation of Y are defined as follows: Rate of consumption

More information

Stability of Stochastic Differential Equations

Stability of Stochastic Differential Equations Lyapunov stability theory for ODEs s Stability of Stochastic Differential Equations Part 1: Introduction Department of Mathematics and Statistics University of Strathclyde Glasgow, G1 1XH December 2010

More information

6.2.2 Point processes and counting processes

6.2.2 Point processes and counting processes 56 CHAPTER 6. MASTER EQUATIONS which yield a system of differential equations The solution to the system of equations is d p (t) λp, (6.3) d p k(t) λ (p k p k 1 (t)), (6.4) k 1. (6.5) p n (t) (λt)n e λt

More information

(Infinite) Series Series a n = a 1 + a 2 + a a n +...

(Infinite) Series Series a n = a 1 + a 2 + a a n +... (Infinite) Series Series a n = a 1 + a 2 + a 3 +... + a n +... What does it mean to add infinitely many terms? The sequence of partial sums S 1, S 2, S 3, S 4,...,S n,...,where nx S n = a i = a 1 + a 2

More information

Stochastic Modelling Unit 1: Markov chain models

Stochastic Modelling Unit 1: Markov chain models Stochastic Modelling Unit 1: Markov chain models Russell Gerrard and Douglas Wright Cass Business School, City University, London June 2004 Contents of Unit 1 1 Stochastic Processes 2 Markov Chains 3 Poisson

More information

Simulating stochastic epidemics

Simulating stochastic epidemics Simulating stochastic epidemics John M. Drake & Pejman Rohani 1 Introduction This course will use the R language programming environment for computer modeling. The purpose of this exercise is to introduce

More information

arxiv: v2 [q-bio.qm] 12 Jan 2017

arxiv: v2 [q-bio.qm] 12 Jan 2017 Approximation and inference methods for stochastic biochemical kinetics - a tutorial review arxiv:1608.06582v2 [q-bio.qm] 12 Jan 2017 David Schnoerr 1,2,3, Guido Sanguinetti 2,3, and Ramon Grima 1,3,*

More information

Poisson Jumps in Credit Risk Modeling: a Partial Integro-differential Equation Formulation

Poisson Jumps in Credit Risk Modeling: a Partial Integro-differential Equation Formulation Poisson Jumps in Credit Risk Modeling: a Partial Integro-differential Equation Formulation Jingyi Zhu Department of Mathematics University of Utah zhu@math.utah.edu Collaborator: Marco Avellaneda (Courant

More information

where R = universal gas constant R = PV/nT R = atm L mol R = atm dm 3 mol 1 K 1 R = J mol 1 K 1 (SI unit)

where R = universal gas constant R = PV/nT R = atm L mol R = atm dm 3 mol 1 K 1 R = J mol 1 K 1 (SI unit) Ideal Gas Law PV = nrt where R = universal gas constant R = PV/nT R = 0.0821 atm L mol 1 K 1 R = 0.0821 atm dm 3 mol 1 K 1 R = 8.314 J mol 1 K 1 (SI unit) Standard molar volume = 22.4 L mol 1 at 0 C and

More information

Linear Algebra Section 2.6 : LU Decomposition Section 2.7 : Permutations and transposes Wednesday, February 13th Math 301 Week #4

Linear Algebra Section 2.6 : LU Decomposition Section 2.7 : Permutations and transposes Wednesday, February 13th Math 301 Week #4 Linear Algebra Section. : LU Decomposition Section. : Permutations and transposes Wednesday, February 1th Math 01 Week # 1 The LU Decomposition We learned last time that we can factor a invertible matrix

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 15. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

Introduction. Stochastic Processes. Will Penny. Stochastic Differential Equations. Stochastic Chain Rule. Expectations.

Introduction. Stochastic Processes. Will Penny. Stochastic Differential Equations. Stochastic Chain Rule. Expectations. 19th May 2011 Chain Introduction We will Show the relation between stochastic differential equations, Gaussian processes and methods This gives us a formal way of deriving equations for the activity of

More information

Math 345 Intro to Math Biology Lecture 19: Models of Molecular Events and Biochemistry

Math 345 Intro to Math Biology Lecture 19: Models of Molecular Events and Biochemistry Math 345 Intro to Math Biology Lecture 19: Models of Molecular Events and Biochemistry Junping Shi College of William and Mary, USA Molecular biology and Biochemical kinetics Molecular biology is one of

More information

UNDERSTANDING BOLTZMANN S ANALYSIS VIA. Contents SOLVABLE MODELS

UNDERSTANDING BOLTZMANN S ANALYSIS VIA. Contents SOLVABLE MODELS UNDERSTANDING BOLTZMANN S ANALYSIS VIA Contents SOLVABLE MODELS 1 Kac ring model 2 1.1 Microstates............................ 3 1.2 Macrostates............................ 6 1.3 Boltzmann s entropy.......................

More information

Kinetic Monte Carlo. Heiko Rieger. Theoretical Physics Saarland University Saarbrücken, Germany

Kinetic Monte Carlo. Heiko Rieger. Theoretical Physics Saarland University Saarbrücken, Germany Kinetic Monte Carlo Heiko Rieger Theoretical Physics Saarland University Saarbrücken, Germany DPG school on Efficient Algorithms in Computational Physics, 10.-14.9.2012, Bad Honnef Intro Kinetic Monte

More information

Solving a system of Master Equations for parallel chemical interactions

Solving a system of Master Equations for parallel chemical interactions Technical Report CoSBi 21/2007 Solving a system of Master Equations for parallel chemical interactions Paola Lecca The Microsoft Research - University of Trento Centre for Computational and Systems Biology

More information

Brownian motion and the Central Limit Theorem

Brownian motion and the Central Limit Theorem Brownian motion and the Central Limit Theorem Amir Bar January 4, 3 Based on Shang-Keng Ma, Statistical Mechanics, sections.,.7 and the course s notes section 6. Introduction In this tutorial we shall

More information

TMS165/MSA350 Stochastic Calculus, Lecture on Applications

TMS165/MSA350 Stochastic Calculus, Lecture on Applications TMS165/MSA35 Stochastic Calculus, Lecture on Applications In this lecture we demonstrate how statistical methods such as the maximum likelihood method likelihood ratio estimation can be applied to the

More information

Fokker-Planck Equation with Detailed Balance

Fokker-Planck Equation with Detailed Balance Appendix E Fokker-Planck Equation with Detailed Balance A stochastic process is simply a function of two variables, one is the time, the other is a stochastic variable X, defined by specifying: a: the

More information

MA22S3 Summary Sheet: Ordinary Differential Equations

MA22S3 Summary Sheet: Ordinary Differential Equations MA22S3 Summary Sheet: Ordinary Differential Equations December 14, 2017 Kreyszig s textbook is a suitable guide for this part of the module. Contents 1 Terminology 1 2 First order separable 2 2.1 Separable

More information

Lecture 3: Analysis of Variance II

Lecture 3: Analysis of Variance II Lecture 3: Analysis of Variance II http://www.stats.ox.ac.uk/ winkel/phs.html Dr Matthias Winkel 1 Outline I. A second introduction to two-way ANOVA II. Repeated measures design III. Independent versus

More information

Statistics 150: Spring 2007

Statistics 150: Spring 2007 Statistics 150: Spring 2007 April 23, 2008 0-1 1 Limiting Probabilities If the discrete-time Markov chain with transition probabilities p ij is irreducible and positive recurrent; then the limiting probabilities

More information

Statistics 992 Continuous-time Markov Chains Spring 2004

Statistics 992 Continuous-time Markov Chains Spring 2004 Summary Continuous-time finite-state-space Markov chains are stochastic processes that are widely used to model the process of nucleotide substitution. This chapter aims to present much of the mathematics

More information

Accelerated stochastic simulation of the stiff enzyme-substrate reaction

Accelerated stochastic simulation of the stiff enzyme-substrate reaction THE JOURNAL OF CHEMICAL PHYSICS 123, 144917 2005 Accelerated stochastic simulation of the stiff enzyme-substrate reaction Yang Cao a Department of Computer Science, University of California, Santa Barbara,

More information

Chapter 2 Event-Triggered Sampling

Chapter 2 Event-Triggered Sampling Chapter Event-Triggered Sampling In this chapter, some general ideas and basic results on event-triggered sampling are introduced. The process considered is described by a first-order stochastic differential

More information

Chemical reaction network theory for stochastic and deterministic models of biochemical reaction systems

Chemical reaction network theory for stochastic and deterministic models of biochemical reaction systems Chemical reaction network theory for stochastic and deterministic models of biochemical reaction systems University of Wisconsin at Madison anderson@math.wisc.edu MBI Workshop for Young Researchers in

More information

Linear Equations in Linear Algebra

Linear Equations in Linear Algebra 1 Linear Equations in Linear Algebra 1.1 SYSTEMS OF LINEAR EQUATIONS LINEAR EQUATION x 1,, x n A linear equation in the variables equation that can be written in the form a 1 x 1 + a 2 x 2 + + a n x n

More information

Notes for Math 450 Stochastic Petri nets and reactions

Notes for Math 450 Stochastic Petri nets and reactions Notes for Math 450 Stochastic Petri nets and reactions Renato Feres Petri nets Petri nets are a special class of networks, introduced in 96 by Carl Adam Petri, that provide a convenient language and graphical

More information

On the Interpretation of Delays in Delay Stochastic Simulation of Biological Systems

On the Interpretation of Delays in Delay Stochastic Simulation of Biological Systems On the Interpretation of Delays in Delay Stochastic Simulation of Biological Systems Roberto Barbuti Giulio Caravagna Andrea Maggiolo-Schettini Paolo Milazzo Dipartimento di Informatica, Università di

More information

10.34 Numerical Methods Applied to Chemical Engineering. Quiz 2

10.34 Numerical Methods Applied to Chemical Engineering. Quiz 2 10.34 Numerical Methods Applied to Chemical Engineering Quiz 2 This quiz consists of three problems worth 35, 35, and 30 points respectively. There are 4 pages in this quiz (including this cover page).

More information

Problem Set 5. 1 Waiting times for chemical reactions (8 points)

Problem Set 5. 1 Waiting times for chemical reactions (8 points) Problem Set 5 1 Waiting times for chemical reactions (8 points) In the previous assignment, we saw that for a chemical reaction occurring at rate r, the distribution of waiting times τ between reaction

More information

Engineering Physics 1 Dr. B. K. Patra Department of Physics Indian Institute of Technology-Roorkee

Engineering Physics 1 Dr. B. K. Patra Department of Physics Indian Institute of Technology-Roorkee Engineering Physics 1 Dr. B. K. Patra Department of Physics Indian Institute of Technology-Roorkee Module-05 Lecture-04 Maxwellian Distribution Law of Velocity Part 02 So, we have already told to experiment

More information

A Brief Introduction to Numerical Methods for Differential Equations

A Brief Introduction to Numerical Methods for Differential Equations A Brief Introduction to Numerical Methods for Differential Equations January 10, 2011 This tutorial introduces some basic numerical computation techniques that are useful for the simulation and analysis

More information

Continuous and Discrete random process

Continuous and Discrete random process Continuous and Discrete random and Discrete stochastic es. Continuous stochastic taking values in R. Many real data falls into the continuous category: Meteorological data, molecular motion, traffic data...

More information

16.4. Power Series. Introduction. Prerequisites. Learning Outcomes

16.4. Power Series. Introduction. Prerequisites. Learning Outcomes Power Series 6.4 Introduction In this Section we consider power series. These are examples of infinite series where each term contains a variable, x, raised to a positive integer power. We use the ratio

More information

Lecture 6: Multiple Model Filtering, Particle Filtering and Other Approximations

Lecture 6: Multiple Model Filtering, Particle Filtering and Other Approximations Lecture 6: Multiple Model Filtering, Particle Filtering and Other Approximations Department of Biomedical Engineering and Computational Science Aalto University April 28, 2010 Contents 1 Multiple Model

More information

Assignment 4. u n+1 n(n + 1) i(i + 1) = n n (n + 1)(n + 2) n(n + 2) + 1 = (n + 1)(n + 2) 2 n + 1. u n (n + 1)(n + 2) n(n + 1) = n

Assignment 4. u n+1 n(n + 1) i(i + 1) = n n (n + 1)(n + 2) n(n + 2) + 1 = (n + 1)(n + 2) 2 n + 1. u n (n + 1)(n + 2) n(n + 1) = n Assignment 4 Arfken 5..2 We have the sum Note that the first 4 partial sums are n n(n + ) s 2, s 2 2 3, s 3 3 4, s 4 4 5 so we guess that s n n/(n + ). Proving this by induction, we see it is true for

More information