Stochastic model of mRNA production

We assume that the number of mRNA molecules (m) of a gene can change either through production of an mRNA by transcription of DNA (which occurs at rate α) or through degradation of an mRNA (which occurs at rate β per molecule). We aim to describe the evolution of P(m, t), the probability of having m molecules in a cell at time t.

Derivation of the Chemical Master Equation (CME)

The evolution of m is governed by two processes: production (birth), $m \to m+1$ with propensity α, and degradation (death), $m \to m-1$ with propensity βm. The probability of having m molecules at time t + dt is

$$P(m, t+dt) = P(m+1, t)\,\beta(m+1)\,dt + P(m-1, t)\,\alpha\,dt + P(m, t)\,(1 - \beta m\,dt - \alpha\,dt)$$

This leads to the CME

$$\frac{dP(m,t)}{dt} = \beta(m+1)\,P(m+1,t) + \alpha\,P(m-1,t) - (\beta m + \alpha)\,P(m,t)$$

It is the evolution equation for P(m, t) and is composed of two contributions: the incoming flux, corresponding to transitions to the m-state, and the outgoing flux, corresponding to transitions from the m-state.

Steady-state solution

The steady-state solution is obtained by setting dP/dt = 0 for every m. For m = 0 this gives $P(1) = (\alpha/\beta)P(0)$. For m = 1, this leads to $P(2) = (\alpha/\beta)P(1)/2 = (\alpha/\beta)^2 P(0)/2$. By recursion, we can easily show that

$$P(m) = \frac{P(0)}{m!}\left(\frac{\alpha}{\beta}\right)^m$$

We require the probability over all states to be normalized: $\sum_{m=0}^{\infty} P(m) = 1$. This implies $P(0) = \exp[-\lambda]$ with $\lambda = \alpha/\beta$. Therefore, P(m) is the Poisson distribution of parameter λ:

$$P(m) = \exp[-\lambda]\,\frac{\lambda^m}{m!}$$

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [9]:
from scipy.stats import poisson, norm
lamb = 1
lamb2 = 50
m = np.arange(100)
plt.plot(m, poisson.pmf(m, lamb), 'b')                   # Poisson distribution
plt.plot(m, norm.pdf(m, lamb, np.sqrt(lamb)), '--b')     # Gaussian approximation
plt.plot(m, poisson.pmf(m, lamb2), 'r')                  # Poisson distribution
plt.plot(m, norm.pdf(m, lamb2, np.sqrt(lamb2)), '--r')   # Gaussian approximation

Page 1 of 7, 22/11/2016 11:28
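The recursion used above can be checked numerically: starting from an unnormalized P(0) = 1, repeatedly applying $P(m+1) = \frac{\lambda}{m+1} P(m)$ and normalizing should reproduce the Poisson probabilities. A minimal sketch (the value of λ and the truncation of the state space are illustrative choices, not from the notes):

```python
import numpy as np
from scipy.stats import poisson

lam = 5.0     # lambda = alpha/beta (illustrative value)
mmax = 50     # truncation of the state space; the Poisson tail beyond it is negligible

# Build P(m) by the steady-state recursion P(m+1) = lam/(m+1) * P(m)
P = np.empty(mmax)
P[0] = 1.0
for m in range(mmax - 1):
    P[m + 1] = lam/(m + 1)*P[m]
P /= P.sum()   # normalization fixes P(0) = exp(-lambda)

# Compare with the Poisson distribution of parameter lambda
err = np.max(np.abs(P - poisson.pmf(np.arange(mmax), lam)))
```

The normalization step plays the role of the condition $\sum_m P(m) = 1$ in the derivation.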
For a Poisson distribution, the mean is equal to λ and the variance σ² equals the mean. For large λ, it can be approximated by a Gaussian distribution $\exp[-(m-\lambda)^2/(2\lambda)]/\sqrt{2\pi\lambda}$.

To measure the strength of noise or fluctuations in the system, there are two standard quantities: 1) the coefficient of variation CV, equal to the ratio between the standard deviation and the mean of the corresponding distribution. For the simple mRNA production model: CV = 1/√λ. 2) The Fano factor F, equal to the ratio between the variance and the mean. It assesses the level of noise relative to a Poisson distribution. For the simple mRNA production model: F = 1.

General form of the CME

Assume that the state of the system is described by a vector $\vec{x}$ of dimension p (corresponding to the p chemical species possibly present in the system). Consider a set of processes/reactions, with propensities $w_l(\vec{x})$. For each process, the stoichiometric change is fixed and given by $\vec{n}_l$ (a component of $\vec{n}_l$ can be positive (birth) or negative (death)). Then the CME for $P(\vec{x}, t)$ is

$$\frac{dP(\vec{x},t)}{dt} = \sum_l w_l(\vec{x}-\vec{n}_l)\,P(\vec{x}-\vec{n}_l, t) - \sum_l w_l(\vec{x})\,P(\vec{x}, t)$$

The first sum (incoming flux) runs over transitions to state $\vec{x}$ and the second sum (outgoing flux) over transitions from state $\vec{x}$. Note that the CME can be viewed as a system of linear ordinary differential equations, dP/dt = AP. Therefore, at least in principle, solutions can be expressed in terms of the eigenvectors and eigenvalues of the matrix A. The eigenvector corresponding to the eigenvalue zero is the stationary distribution.

Solution of the CME: generating functions

Not all CMEs are solvable by straightforward iteration as in the previous example. A more generalizable way to solve a CME is to introduce generating functions. Let's illustrate the concept on the simple mRNA production example.
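The remark that the CME is the linear system dP/dt = AP can be made concrete for the birth-death example: build a truncated rate matrix A, extract the eigenvector of eigenvalue zero, and compare it with the Poisson distribution. A sketch with illustrative rates and truncation (both are my choices, not values from the notes):

```python
import numpy as np
from scipy.stats import poisson

alpha, beta = 5.0, 1.0   # illustrative rates, lambda = 5
n = 40                   # truncate the state space to {0, ..., n-1}

# A[i, j] = rate of the transition j -> i; A[j, j] = minus the total rate out of j
A = np.zeros((n, n))
for m in range(n):
    if m + 1 < n:
        A[m, m + 1] = beta*(m + 1)            # gain in m from degradation in m+1
        A[m + 1, m] = alpha                   # gain in m+1 from production in m
    A[m, m] = -(alpha*(m + 1 < n) + beta*m)   # no production out of the last state

vals, vecs = np.linalg.eig(A)
i0 = np.argmin(np.abs(vals))       # the eigenvalue closest to zero
pstat = np.real(vecs[:, i0])
pstat /= pstat.sum()               # normalize the stationary eigenvector
err = np.max(np.abs(pstat - poisson.pmf(np.arange(n), alpha/beta)))
```

The columns of A sum to zero (probability conservation), which guarantees that zero is an exact eigenvalue of the truncated matrix.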
We define the generating function as

$$G(x, t) = \sum_{m=0}^{\infty} P(m, t)\, x^m$$

a power series in a continuous variable x whose coefficients are the probabilities P(m). Note that in the general case of p variables, we can likewise define $G(x_1, \ldots, x_p) = \sum_{m_1 \ldots m_p} P(m_1, \ldots, m_p, t)\, x_1^{m_1} \cdots x_p^{m_p}$. The generating function is the analogue, for a discrete variable, of the Laplace transform ($\int_0^{\infty} f(t) \exp[-st]\,dt$).

The probabilities may be recovered by the inverse transformation

$$P(m, t) = \frac{1}{m!}\,\partial_x^m G(x, t)\Big|_{x=0}$$

Moreover, the k-th moment can be generated by

$$\langle m^k \rangle = \partial_{\log x}^k\, G(x, t)\Big|_{x=1}$$

Also G(1, t) = 1. The utility of the generating function for solving the CME is that it turns an infinite set of ODEs into a single partial differential equation. To obtain it, we multiply each side of the CME by $x^m$ and sum over m. This leads to

$$\frac{dG}{dt} = (x-1)\left(\alpha G - \beta\,\partial_x G\right)$$

The steady-state solution is then given by $\alpha G - \beta\,\partial_x G = 0$, hence $G(x) = G(0)\exp[\lambda x]$ with $G(0) = \exp[-\lambda]$ (since G(1) = 1). Finally, by iteratively differentiating with respect to x, we retrieve the Poisson distribution for P(m).
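These properties can be verified numerically: building G directly from its series definition with Poisson coefficients should reproduce the closed form $\exp[\lambda(x-1)]$, satisfy G(1) = 1, and give ⟨m⟩ = λ through the log-derivative at x = 1. A sketch with an illustrative λ (the helper `G` and the finite-difference step are mine):

```python
import numpy as np
from scipy.stats import poisson

lam = 4.0
mvals = np.arange(200)
pmf = poisson.pmf(mvals, lam)

def G(x):
    # generating function built from its definition, sum_m P(m) x^m
    return np.polyval(pmf[::-1], x)

# G should coincide with the closed form exp(lam*(x - 1))
xs = np.linspace(0.0, 1.2, 7)
err = np.max(np.abs(G(xs) - np.exp(lam*(xs - 1))))

# first moment via the log-derivative: <m> = dG/d(log x) at x = 1,
# estimated here by a central finite difference in s = log x
h = 1e-6
mean_num = (G(np.exp(h)) - G(np.exp(-h)))/(2*h)
```

Truncating the series at m = 200 is harmless here because the Poisson tail is negligible for λ = 4.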
Different levels of approximation

Fokker-Planck approximation

For many larger systems, a full solution of the CME is not possible. In some cases, one can make progress by deriving an approximate equation which is valid when the molecule numbers are large. We will illustrate this on the simple mRNA production example, but the method generalizes easily to more complex models.

The Fokker-Planck (FP) approximation is derived under the assumption that the typical number of mRNA is large (⟨m⟩ >> 1), such that m can be approximated by a continuous variable and a change of ±1 can be treated as a small change. First let's rewrite the CME as

$$\frac{dP}{dt}(m) = [w_+ P](m-1) + [w_- P](m+1) - [(w_+ + w_-)P](m)$$

with $w_+(m) = \alpha$ and $w_-(m) = \beta m$. We can expand the functions $[wP](m \pm 1)$ to second order:

$$[wP](m \pm 1) \approx [wP](m) \pm \partial_m [wP] + \tfrac{1}{2}\,\partial_m^2 [wP]$$

This leads to the Fokker-Planck equation for P:

$$\partial_t P(m) = -\partial_m\left[v(m)P(m)\right] + \tfrac{1}{2}\,\partial_m^2\left[D(m)P(m)\right]$$

with $v(m) = w_+ - w_-$ an effective drift velocity (effective force) that recovers the right-hand side of the law of mass action, and $D(m) = w_+ + w_-$ an effective diffusion constant (effective temperature) that sets the scale of the fluctuations. From the FP equation, multiplying by m and integrating over m, one finds

$$\frac{d\langle m \rangle}{dt} = \langle v \rangle = \alpha - \beta\langle m \rangle$$

We retrieve here the law of mass action for ⟨m⟩. Similarly, for the variance $C = \langle m^2 \rangle - \langle m \rangle^2 = \sigma^2$:

$$\frac{dC}{dt} = 2\left(\langle m v \rangle - \langle m \rangle\langle v \rangle\right) + \langle D \rangle = -2\beta C + \alpha + \beta\langle m \rangle$$

At steady state, ⟨m⟩ = α/β = λ and C = λ (we retrieve that the variance equals the mean). The full steady-state solution is easily obtained by noticing that the FP equation is a continuity equation of the form $\partial_t P = -\partial_m j(m)$, with $j(m) = vP - \partial_m(DP)/2$ the probability current. At steady state, the current is constant (and equal to 0, since $P(m) \to 0$ and $\partial_m P \to 0$ for $m \to \infty$), leading to

$$P(m) = N\left(1 + \frac{m}{\lambda}\right)^{4\lambda - 1}\exp[-2m]$$

with N a normalization constant.
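The two moment equations above form a closed ODE system, so they can be integrated directly and checked against the steady-state prediction ⟨m⟩ = C = λ. A minimal forward-Euler sketch with illustrative rates (the time step and integration horizon are my choices):

```python
import numpy as np

alpha, beta = 10.0, 1.0   # illustrative rates; lambda = alpha/beta = 10
dt = 1e-3                 # Euler time step
mean, C = 0.0, 0.0        # start from an empty cell: m = 0, no spread

# integrate d<m>/dt = alpha - beta*<m> and dC/dt = -2*beta*C + alpha + beta*<m>
# for 20 relaxation times 1/beta
for _ in range(int(20/beta/dt)):
    dmean = alpha - beta*mean
    dC = -2*beta*C + alpha + beta*mean
    mean += dt*dmean
    C += dt*dC
```

After many relaxation times both moments should sit at λ, recovering the Poisson relation variance = mean.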
In [17]:
from scipy.stats import poisson, norm
lamb = 20
mmax = 3*lamb
m = np.arange(mmax)
ma = np.linspace(0, mmax, 256*6, endpoint=True)
p = (1 + ma/lamb)**(4*lamb - 1)*np.exp(-2*ma)
p = p/np.sum(p)/(ma[1] - ma[0])
plt.plot(m, poisson.pmf(m, lamb))   # Poisson distribution
plt.plot(ma, p, 'r')                # solution of the FP equation

The linear noise approximation

The linear noise approximation (LNA) still assumes that the number of molecules is large, but also that the fluctuations around the mean molecule number are small. More precisely, we write $m = \langle m \rangle + \delta m$. For the equation of evolution of the mean (see above),

$$\frac{d\langle m \rangle}{dt} = \langle v(m) \rangle \approx v(\langle m \rangle)$$

This is exactly the law of mass action. For the variance $C = \langle \delta m^2 \rangle$, we have

$$\frac{dC}{dt} = 2\langle \delta m\, v \rangle + \langle D \rangle \approx 2 v'(\langle m \rangle)\,C + D(\langle m \rangle)$$

It can be shown that the LNA is equivalent to approximating the distribution P by a Gaussian of mean ⟨m⟩ and variance C.

The Langevin approximation

We now consider another stochastic approximation: the Langevin equation. The advantage of the Langevin approach is that a great deal can be learned about the process without finding the full distribution. Starting from the FP equation, it can be shown that the equation of evolution for the instantaneous number of molecules inside a cell at a given time t can be written as

$$\frac{dm}{dt} = v(m) + \eta(t)$$

with η(t) a delta-correlated white noise: $\langle \eta(t) \rangle = 0$, $\langle \eta(t)\eta(t') \rangle = D(m)\,\delta(t - t')$. This description is equivalent to the FP equation. Taking the average of the Langevin equation, or multiplying it by m and taking the average, leads to exactly the same equations of evolution for ⟨m⟩ and C as in the FP approximation.

An interesting application of the Langevin approximation is to consider its Fourier transform:

$$i\omega\,\tilde{m}(\omega) = \tilde{v}(\omega) + \tilde{\eta}(\omega)$$

with $\langle \tilde{\eta} \rangle = 0$ and $\langle \tilde{\eta}(\omega)\tilde{\eta}(\omega') \rangle = \frac{D(m)}{2\pi}\,\delta(\omega + \omega')$. If we write $m = \langle m \rangle + \delta m$, it can be shown that $\tilde{C}(\omega) = \int d\omega'\,\langle \widetilde{\delta m}(\omega)\,\widetilde{\delta m}(\omega') \rangle$; this is called the power spectrum, because it tells us which frequency modes contribute most to the form of the noise (an analogy with filters in electronics, for example).
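Before moving to the frequency domain, the Langevin equation itself can be integrated directly. A minimal Euler-Maruyama sketch, assuming illustrative rates and evaluating the multiplicative noise amplitude naively at the current m (a common but not unique convention); the sampled trajectory should recover mean and variance close to λ:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 100.0, 1.0   # large lambda = 100 so the continuous description is good
dt = 1e-3
nsteps = 400_000           # 400 relaxation times 1/beta in total

m = alpha/beta             # start at the steady-state mean
xi = rng.standard_normal(nsteps)
traj = np.empty(nsteps)
for i in range(nsteps):
    D = alpha + beta*m     # D(m) = w+ + w-
    # dm = v(m) dt + sqrt(D dt) * N(0, 1)
    m += dt*(alpha - beta*m) + np.sqrt(D*dt)*xi[i]
    traj[i] = m

mean_est = traj.mean()   # should approach lambda = 100
var_est = traj.var()     # should approach lambda = 100 as well
```

The tolerance on the variance is necessarily loose: successive samples are correlated over the relaxation time 1/β, so the effective number of independent samples is much smaller than `nsteps`.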
For our simple example at steady state, we find $i\omega\,\widetilde{\delta m} = -\beta\,\widetilde{\delta m} + \tilde{\eta}$, so $\widetilde{\delta m} = \tilde{\eta}/(i\omega + \beta)$ and

$$\tilde{C}(\omega) = \frac{1}{2\pi}\,\frac{2\alpha}{\omega^2 + \beta^2}$$

This is equivalent to a low-pass filter, meaning that high-frequency noise is filtered out.
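The low-pass behavior can be made quantitative: the Lorentzian above is flat at low frequency and falls to half its plateau value exactly at the cutoff ω = β, decreasing monotonically beyond. A small check (rates are illustrative, and the helper `spectrum` drops the constant 1/(2π) prefactor):

```python
import numpy as np

alpha, beta = 10.0, 2.0   # illustrative rates

def spectrum(w):
    """Power spectrum of the mRNA fluctuations, up to the 1/(2*pi) prefactor."""
    return 2*alpha/(w**2 + beta**2)

s0 = spectrum(0.0)     # plateau value at zero frequency
s_cut = spectrum(beta) # value at the cutoff frequency w = beta: half the plateau

w = np.linspace(0, 10*beta, 1000)
monotone = np.all(np.diff(spectrum(w)) <= 0)   # the spectrum only decreases with w
```

In the electronics analogy, β plays the role of the filter's corner frequency: fluctuations slower than 1/β pass through, faster ones are damped.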
In [22]:
alpha = 10
beta = 1
w = np.logspace(-2, 1, 256)*beta   # avoid w = 0, where log10 diverges
plt.plot(np.log10(w), np.log10(2*alpha/(w**2 + beta**2)))

Stochastic simulations: the Gillespie algorithm

Solutions of the CME are rarely tractable. Moreover, the previous approximations are sometimes not easy to use, or simply not applicable. However, it is possible to generate exact trajectories, i.e. individual realizations of the biochemical system described by the transition model. Simulations can be performed efficiently with the Gillespie algorithm, which computes the history of the system as a set of state jumps occurring at randomly sampled times. If enough trajectories are computed (enough statistics), the underlying probability distributions or the corresponding moments can easily be estimated.

Let's assume that at time t a reaction has just occurred and the system is in state i; we first ask when the next reaction will occur. We assume that in state i the system can make l transitions (characterized by propensities $w_l(i)$) to different states. We write $w_{tot}(i) = \sum_l w_l$ for the total reaction rate in state i. The probability that the system remains in state i at time t + τ is $p(\tau) = \exp[-w_{tot}\tau]$. The probability that a reaction occurs between t + τ and t + τ + dτ is

$$S(\tau)\,d\tau = p(\tau) - p(\tau + d\tau) = -\frac{dp}{d\tau}\,d\tau = w_{tot}\exp[-w_{tot}\tau]\,d\tau$$

This means that the time of the next reaction is a random variable drawn from the exponential distribution $w_{tot}\exp[-w_{tot}\tau]$.
In [24]:
wtot = 1
tau = np.linspace(0, 10/wtot, 256, endpoint=True)
plt.plot(tau, wtot*np.exp(-wtot*tau))

A standard method to sample a time τ from such a distribution is to draw a random number r uniformly distributed on the interval [0, 1] and then to take

$$\tau = -\frac{\log r}{w_{tot}}$$

Once we have determined the time of the next reaction, we need to decide which reaction occurs. This is the easiest part, since the transition probability of reaction l is proportional to its propensity $w_l$. In practice, we generate a random number r uniformly distributed between 0 and 1 and choose the smallest l such that

$$\sum_{k=1}^{l}\frac{w_k}{w_{tot}} > r$$

These two simple rules lead to a simple algorithm: 1) initialize the system at time t = 0; 2) compute the propensities of all possible transitions; 3) draw τ from the corresponding exponential distribution; 4) select a transition; 5) update the time (t = t + τ) and the system state, then return to step 2 until the desired final time is reached.
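Both sampling rules can be exercised in isolation. A sketch with three hypothetical propensities (the values and sample size are mine): inverse-transform sampling should give waiting times with mean 1/w_tot, and selecting the smallest l whose cumulative propensity exceeds r·w_tot (done here with `np.searchsorted`) should pick each reaction with frequency w_l/w_tot:

```python
import numpy as np

rng = np.random.default_rng(1)
w = np.array([1.0, 3.0, 6.0])   # propensities of three hypothetical reactions
wtot = w.sum()                  # wtot = 10
nsamp = 100_000

# 1) waiting times by inverse transform sampling: tau = -log(r)/wtot
r = rng.random(nsamp)
tau = -np.log(r)/wtot
mean_tau = tau.mean()           # should be close to 1/wtot = 0.1

# 2) reaction choice: smallest l with cumulative propensity > r*wtot
cum = np.cumsum(w)              # [1, 4, 10]
choices = np.searchsorted(cum, rng.random(nsamp)*wtot, side='right')
freq = np.bincount(choices, minlength=3)/nsamp   # should approach w/wtot
```

`searchsorted` on the cumulative propensities is exactly the "smallest l" rule, and scales well when the number of reactions is large.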
In [36]:
# Gillespie algorithm for the simple mRNA production model
def gilles(alpha, beta, tfinal, m0):
    m = m0
    t = 0
    mlist = [m0]
    tlist = [t]
    while (t <= tfinal):
        wtot = alpha + beta*m         # total propensity in the current state
        t = t - np.log(np.random.random_sample())/wtot   # time of the next reaction
        r = np.random.random_sample()
        if r <= (alpha/wtot):         # production with probability alpha/wtot
            m = m + 1
        else:                         # otherwise degradation
            m = m - 1
        mlist.append(m)
        tlist.append(t)
    return np.array(mlist), np.array(tlist)

In [41]:
alpha = 10
beta = 1
tfinal = 100/beta
ml, tl = gilles(alpha, beta, tfinal, 0)
plt.plot(tl, ml)
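With enough trajectories, the simulated statistics should match the analytical steady state: mean ≈ λ and Fano factor ≈ 1. A self-contained sketch that re-implements the same algorithm but returns only the state at a fixed final time, sampled over many independent runs (function name, sample size, and rates are my choices):

```python
import numpy as np

rng = np.random.default_rng(42)

def gillespie_endpoint(alpha, beta, tfinal, m0=0):
    """Return the mRNA copy number of one trajectory at time tfinal."""
    m, t = m0, 0.0
    while True:
        wtot = alpha + beta*m
        t += -np.log(rng.random())/wtot   # exponential waiting time
        if t > tfinal:
            return m                      # state held until tfinal
        if rng.random() <= alpha/wtot:    # production...
            m += 1
        else:                             # ...or degradation
            m -= 1

alpha, beta = 10.0, 1.0   # lambda = 10
tfinal = 10.0/beta        # 10 relaxation times: steady state is reached
samples = np.array([gillespie_endpoint(alpha, beta, tfinal) for _ in range(1000)])

mean_est = samples.mean()                # should be close to lambda
fano_est = samples.var()/samples.mean()  # should be close to 1 (Poisson)
```

Returning m before the jump that overshoots tfinal is important: the state of the system at tfinal is the one held since the last reaction.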