Methods of Data Analysis: Random numbers, Monte Carlo integration, and the Stochastic Simulation Algorithm (SSA / Gillespie)

Week 1

1 Motivation

Random numbers (RNs) are of course only pseudo-random when generated on our computers: they are determined uniquely by the seed value and by the algorithm that produces the next pseudo-random number from the previous one in the sequence. Care must be taken to use a high-quality RNG, and to always know what is going on with the seed (especially if you run multiple jobs on the cluster). RNs are very useful for a number of numerical procedures, three of which we will look at today. Next week, we will also use RNs for Metropolis Monte Carlo simulations.

2 Goals

1. Generating random numbers from a non-uniform distribution by using uniform random numbers (URNs).

2. Monte Carlo integration, especially when the integrals involved are in high dimension.

3. Gillespie SSA simulation: illustrating the relation between a deterministic approximation to an underlying stochastic system and the true stochastic dynamics.

3 Generating random numbers from a given distribution

As a rule, standard RNGs generate random numbers distributed uniformly on the interval [0, 1), but one might require random numbers from some other distribution: e.g., Gaussian (for which much software already provides a generator, for instance Matlab's randn), or some arbitrary distribution for which no generator is provided.

3.1 Transformation method

Suppose you can generate a URN on the unit interval: let x be uniformly distributed, so that p(x) = 1. We would like random numbers y generated according to a distribution p(y) = f(y), with some given f satisfying f(y) ≥ 0 and ∫dy f(y) = 1, and we would like to produce y as a functional transformation of the URNs x. Consequently, the probabilities must transform as:

    p(y) = p(x) |dx/dy|.                                        (1)

As p(x) = 1, this implies a differential equation for f(y), namely f(y) = dx/dy, which is solved by

    x = ∫_{-∞}^{y} f(y') dy' = F(y),                            (2)

or

    y = F^{-1}(x),                                              (3)

where F(y) is the cumulative distribution function (CDF) of the desired distribution f that we want to generate. To be able to generate random numbers according to f quickly, we thus need to be able to evaluate the inverse of the CDF of f quickly; how easy this is depends on the desired distribution. The graphical intuition behind this construction is simple: a particular value of the URN x is drawn in the range of the CDF of f (which is always [0, 1]), and y is then identified as the value that has the drawn total probability weight to its left. Note that this also allows us to generate random numbers over infinite domains, for instance Gaussian random numbers. For details, see Numerical Recipes, Chapter 7.2.
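To make Eq. (3) concrete, here is a minimal Python sketch (an added illustration, not part of the original notes); the target density f(y) = 3y² on [0, 1] is an arbitrary choice, picked because its CDF F(y) = y³ inverts by hand to F^{-1}(x) = x^{1/3}:

    import numpy as np

    rng = np.random.default_rng(seed=1)  # always know what your seed is

    # Target density: f(y) = 3 y^2 on [0, 1] (illustrative choice).
    # CDF: F(y) = y^3, so the inverse CDF is F^{-1}(x) = x^{1/3}.
    x = rng.uniform(0.0, 1.0, size=1_000_000)  # URNs with p(x) = 1 on [0, 1)
    y = x ** (1.0 / 3.0)                       # samples distributed as f(y)

    # Sanity check against the analytic mean of f, which is 3/4:
    print(y.mean())  # ~0.75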

Figure 1: Left: generating points in a sphere by drawing each of the spherical coordinates uniformly on its domain; the points are not uniformly distributed in the volume. Right: the correct prescription, by the transformation method of Eq. (6).

Example: generating random numbers in a sphere. Suppose we want to generate points uniformly distributed in the volume of a sphere with radius R (and V = 4πR³/3), i.e., we would like P(V) = 1/V, such that ∫dV P(V) = 1. Points in a 3D spherical volume are compactly parametrized by spherical coordinates, (r, θ, φ), where r ∈ [0, R], θ ∈ [0, π], φ ∈ [0, 2π); these coordinates map into the Cartesian system via:

    x = r sin(θ) cos(φ)
    y = r sin(θ) sin(φ)
    z = r cos(θ).                                               (4)

It would be incorrect, however, to conclude that generating points uniformly in the volume amounts to generating URNs for each of the three coordinates, r, θ, φ, within their respective domains. This is clearly illustrated in Fig. 1 (left): it leads to points overly concentrated towards the center of the sphere as well as towards its north/south poles. The functional transformation between the spherical volume element and the spherical coordinates is given by the determinant of the Jacobian of the system in Eq. (4), i.e., dV = r² dr d(cos θ) dφ. If, then, we want to draw the spherical coordinates independently from distributions P_r(r), P_θ(θ), and P_φ(φ) that we have yet to identify, we must have:

    P(V) dV = P_r(r) dr P_θ(θ) dθ P_φ(φ) dφ.                    (5)

Given that P(V) = 1/V is uniform and we know how dV transforms, we can insert both quantities into the above equation and recover:

    P_r(r) = 3r²/R³,    P_θ(θ) = (1/2) sin θ,    P_φ(φ) = 1/(2π).

You can verify that these distributions are properly normalized, so that their integrals over the variable domains are 1 (note also that the units of the distribution for r are, correctly, those of inverse length). We next need to compute the CDFs of these distributions and invert them to get the prescription for generating points uniformly in a spherical volume; this gives (where z denotes a fresh URN for each coordinate):

    θ = arccos(1 - 2z),    r = R z^{1/3},    φ = 2πz.           (6)
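A minimal Python sketch of the prescription in Eq. (6) (an added illustration, not part of the original notes):

    import numpy as np

    rng = np.random.default_rng(seed=1)
    R, N = 1.0, 10_000

    # Three independent URNs per point, transformed according to Eq. (6):
    theta = np.arccos(1.0 - 2.0 * rng.uniform(size=N))
    r     = R * rng.uniform(size=N) ** (1.0 / 3.0)
    phi   = 2.0 * np.pi * rng.uniform(size=N)

    # Map to Cartesian coordinates via Eq. (4):
    x = r * np.sin(theta) * np.cos(phi)
    y = r * np.sin(theta) * np.sin(phi)
    z = r * np.cos(theta)

    # Uniform filling implies <r> = integral of r (3r^2/R^3) dr = (3/4) R:
    print(r.mean())  # ~0.75 for R = 1

Replacing the theta and r lines with uniform draws of θ on [0, π] and r on [0, R] reproduces the clustering artifact of Fig. 1 (left).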

3.2 Rejection sampling

Another easy approach to generating samples according to a desired distribution f(y) does so by selectively discarding samples. Here we use the RNG to generate N random-number pairs uniformly on a square [0, A] × [0, B] that fully covers the distribution f(y) on its domain y ∈ [0, A] (the uniform distribution over the square is referred to as the "trial function"), obtaining a sequence (y_i, z_i), for i = 1, ..., N. Take each pair in turn and check whether z_i < f(y_i); if yes, retain y_i, otherwise discard it. The resulting set of y_i is distributed according to f(y). Graphically, the procedure uniformly covers the region under the desired distribution f(y), so the number of retained samples in some small interval [y, y + dy] will be proportional to f(y) dy, which by definition means that the samples are distributed as desired.

You can also think about this as follows. Initially, the trial points are drawn according to P_0(y, z) = (AB)^{-1}, i.e., uniformly on the square that covers the desired distribution. Each such trial point is kept ("accepted") with the acceptance function P_acc = Θ(f(y) - z), where Θ is the Heaviside step function, Θ(x) = 1 for x ≥ 0 and 0 otherwise. The distribution of the y coordinate of the accepted points only is then:

    P(y) = ∫dz P_0(y, z) P_acc / ∫dy ∫dz P_0(y, z) P_acc.       (7)

You can convince yourself that the denominator (the normalizing factor) is just the area under the desired distribution divided by AB, i.e., the fraction of accepted points, and the numerator is the unnormalized distribution of accepted points at y. If you plug in the uniform distribution for P_0 and the acceptance function, you will find that P(y) = f(y), as desired. Generating a single desired random number hence takes two URNs and a single evaluation of f(·).

Rejection sampling is more efficient if you can generate trial points uniformly under a function that covers the desired f(y) as tightly as possible; in other words, the trial function need not be uniform. This is particularly useful for generating random numbers on unbounded domains: the transformation method can generate, e.g., Lorentzian- or Gaussian-distributed numbers, which can be made to cover the desired distribution (e.g., Gamma or Poisson) as trial functions, from which you subsequently discard samples (cf. Numerical Recipes, Chapter 7.3).
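A minimal Python sketch of the uniform-box rejection scheme described above (an added illustration; the target density and the box are arbitrary choices, not from the notes):

    import numpy as np

    rng = np.random.default_rng(seed=1)

    # Target density: f(y) = (3/4)(1 - (y-1)^2) on [0, 2] (illustrative choice);
    # its maximum is 3/4, so the box [0, A] x [0, B] with A = 2, B = 1 covers it.
    A, B = 2.0, 1.0
    f = lambda y: 0.75 * (1.0 - (y - 1.0) ** 2)

    N = 1_000_000
    y = rng.uniform(0.0, A, size=N)   # trial points, uniform on the box
    z = rng.uniform(0.0, B, size=N)
    accepted = y[z < f(y)]

    # The acceptance fraction is (area under f)/(A B) = 1/(A B) = 0.5 here:
    print(accepted.size / N)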

4 Monte Carlo integration

Often, integrals are impossible to evaluate analytically. At the same time, analytically solvable limits may be of insufficient precision, while numerical quadratures quickly become unfeasible for integrals in more than a few dimensions, due to the curse of dimensionality. One can then resort to Monte Carlo integration. The main idea is very similar to the sample-discarding idea described above. Suppose we need to evaluate

    Z = ∫_D f(x) d^D x,                                         (8)

where D is a D-dimensional domain and f is the integrand. Monte Carlo integration consists of drawing T random points x_i, either directly from the domain D if possible, or from some larger domain D' ⊇ D, if the larger domain is easier to sample from (in this case, extend the integrand by defining f(x) = 0 for x outside D; in other words, f is zero outside of the domain of integration). The integral can then be approximated as:

    Z ≈ (V/T) Σ_{i=1}^{T} f(x_i),                               (9)

i.e., as the average of the integrand over the sample points, times the volume V of the domain from which the points are drawn. If this domain covers D tightly, the relative error of such an approximation to the true integral decreases as T^{-1/2}, regardless of the dimensionality of the domain.
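A minimal Python sketch of Eq. (9) (an added illustration, not from the notes): estimating the volume of the unit ball by sampling the enclosing cube, i.e., with an integrand that is 1 inside the ball and 0 outside.

    import numpy as np

    rng = np.random.default_rng(seed=1)

    # Z = volume of the unit ball in 3D, sampled over the cube [-1, 1]^3:
    # f(x) = 1 inside the ball, 0 outside; V = 2^3 = 8 is the cube volume.
    T = 1_000_000
    pts = rng.uniform(-1.0, 1.0, size=(T, 3))
    inside = (pts ** 2).sum(axis=1) < 1.0

    Z = 8.0 * inside.mean()        # Eq. (9): (V / T) * sum_i f(x_i)
    print(Z, 4.0 * np.pi / 3.0)    # estimate vs. exact value, ~4.18879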

5 Stochastic Simulation Algorithm (SSA)

Proposed by Gillespie in a series of papers (e.g., Ref. [1]), this is a way of exactly simulating stochastic chemical reaction processes taking place in a well-mixed solution.

5.1 Introduction and example

Let's start with an example. Suppose we have a chemical system consisting of four reactions (R_1, ..., R_4) among three different chemical species (X, Y, Z):

    R_1 : X + Y → Z
    R_2 : Z → ∅
    R_3 : Y + 2Z → X
    R_4 : ∅ → Y                                                 (10)

The usual way to approach the mathematical modeling of this problem would be to specify the initial concentrations of the three chemical species, c_X, c_Y, c_Z, at time t = 0, specify the reaction rates k_1, ..., k_4 for each of the four reactions, and write down the chemical reaction kinetics, a set of ODEs (also known as mass-action equations):

    dc_X/dt = -k_1 c_X c_Y + k_3 c_Y c_Z²
    dc_Y/dt = -k_1 c_X c_Y - k_3 c_Y c_Z² + k_4
    dc_Z/dt = k_1 c_X c_Y - k_2 c_Z - 2 k_3 c_Y c_Z².           (11)

We could then go on and, say, numerically integrate this system of equations starting from the initial condition.
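As an aside (an added illustration, not in the original notes), the right-hand side of Eqs. (11) translates directly into code that any ODE integrator can use; the rate values below are placeholders:

    import numpy as np

    def mass_action_rhs(c, k):
        """Right-hand side of Eqs. (11); c = (c_X, c_Y, c_Z), k = (k1, ..., k4)."""
        cX, cY, cZ = c
        k1, k2, k3, k4 = k
        dcX = -k1 * cX * cY + k3 * cY * cZ**2
        dcY = -k1 * cX * cY - k3 * cY * cZ**2 + k4
        dcZ =  k1 * cX * cY - k2 * cZ - 2 * k3 * cY * cZ**2
        return np.array([dcX, dcY, dcZ])

    # One small forward-Euler step from placeholder initial concentrations:
    c, k, dt = np.array([1.0, 1.0, 0.0]), (1.0, 0.5, 0.2, 0.1), 1e-3
    print(c + dt * mass_action_rhs(c, k))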

Instead of doing that, let us rather reexamine the assumptions behind chemical kinetics and pose the following questions:

- The actual numbers of molecules in a finite volume are integers (e.g., there are x molecules of type X in a reaction vessel of volume V), while Eqs. (11) model them effectively as concentrations, c_X = x/V. How reasonable is this approximation?

- Even in a well-mixed system, a molecular encounter is stochastic, as are then the reaction times; this means that the evolution of the system is random. Eqs. (11), on the other hand, are deterministic. How do we properly describe the stochasticity?

To address both questions, we can move to a probabilistic description of the time evolution of the reaction system, Eqs. (10), by defining the probability of observing x molecules of type X, y molecules of type Y, and z molecules of type Z at time t, written as P(x, y, z | t), given the initial condition P(x, y, z | t = 0) = δ(x, x_0) δ(y, y_0) δ(z, z_0), where δ(a, b) = 1 if a = b and 0 otherwise. The temporal evolution of P is given by the Master equation, which is a linear equation in P:

    dP(· | t)/dt = L̂ P(· | t),                                  (12)

where L̂ is a linear operator that we can construct from the reaction scheme, Eqs. (10), as we will show next.

But let's first pause and think about the Master equation and P. First, in theory, P is a vector of infinite dimensionality, because the molecule counts can take on any non-negative integer value. Consequently, L̂ is of that dimension squared, since it prescribes the transitions from any values of the molecular counts to any other values. In practice (but not always, e.g., in exponentially growing populations), the numbers will be bounded and the probability will sharply decrease for very high counts, suggesting that P (and thus L̂) can be truncated into a finite-dimensional vector (or matrix). However, already for a few interacting chemical species, even these truncated vectors can be of huge dimensionality, preventing us from even writing them down.

On the other hand, the Master equation contains the full information about chemical reactions in a well-mixed system. For example, the information about all the moments is there, obtainable by simply marginalizing over the distribution:

    c_X(t) = ⟨x(t)⟩/V = (1/V) Σ_{x,y,z} x P(x, y, z | t)        (13)

(although you may look up some caveats of equating the marginals of P with the deterministic mass-action kinetics of Eqs. (11) in Gillespie's original paper). Of course, the Master equation also contains all the information about correlations, etc.

How does one build the operator L̂ from the reaction system? Consider the reaction X + Y → Z. This reaction consumes one X and one Y and produces one Z. The two terms that correspond to it (and to every chemical reaction) in the Master equation are:

    dP(x, y, z | t)/dt = c_1 (x+1)(y+1) P(x+1, y+1, z-1 | t) - c_1 x y P(x, y, z | t) + ...    (14)

This is because there are two ways in which the probability of having (x, y, z) molecules can change (the terms on the right): this probability will increase if the reaction takes place from the state (x+1, y+1, z-1), which happens at a rate proportional to the number of reactant pairs in that state and to the molecular reaction rate c_1; or, this probability will decrease because the reaction takes the system out of the state (x, y, z). In this stochastic interpretation, the c_µ for each reaction µ are the so-called molecular or stochastic reaction rates: probabilities per unit time that a particular molecular configuration for reaction µ will react. So, for our four reactions, the probabilities per unit time a_µ for each to take place in the state (x, y, z) are:

    R_1 : a_1 = c_1 x y                                         (15)
    R_2 : a_2 = c_2 z                                           (16)
    R_3 : a_3 = c_3 y z(z-1)/2                                  (17)
    R_4 : a_4 = c_4.                                            (18)

Note the binomial factor for R_3, which counts in how many ways one can choose its reactants: one molecule of Y and two (identical) molecules of Z. This becomes important at small numbers of molecules: for example, with only one Z molecule available, that reaction cannot take place at all, whereas the ODE system in Eqs. (11) would still assign it a non-zero rate. These factors of order ±1, as well as combinatorial factors (e.g., the factor of 2 for R_3), are what distinguish the stochastic reaction rates, c_µ, from the mass-action constants, k_µ, in Eqs. (11). For further details, see Gillespie's original paper and a classic reference on stochastic reaction kinetics [2].
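The propensities of Eqs. (15)-(18) are equally direct to write as code (an added sketch; the c_µ values are placeholders):

    def propensities(state, c):
        """Propensities a_mu of Eqs. (15)-(18) for the state (x, y, z)."""
        x, y, z = state
        c1, c2, c3, c4 = c
        return [
            c1 * x * y,                 # R_1: X + Y -> Z
            c2 * z,                     # R_2: Z -> 0
            c3 * y * z * (z - 1) / 2,   # R_3: Y + 2Z -> X (binomial factor)
            c4,                         # R_4: 0 -> Y
        ]

    # With a single Z molecule present, R_3 correctly has zero propensity:
    print(propensities((10, 10, 1), (1.0, 1.0, 1.0, 1.0)))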

5.2 SSA Algorithm

The logic behind the algorithm proposed by Gillespie is general and applicable to the simulation of continuous-time, discrete-space Markov processes. For example, in the homework we will use it to simulate a predator-prey system.

Consider a vessel with various chemical species, S_i, i = 1, ..., N. There are X_i(t = 0) molecules of each species at t = 0, where the X_i are non-negative integers. The reaction system is specified by listing the possible reactions R_µ, µ = 1, ..., M, as above, each prescribing (i) the reactants (i.e., a list of chemical species and the integer quantities of how much of each is needed); (ii) the products of the reaction, again a subset of the species with the corresponding integer quantities; and (iii) the intrinsic reaction rate c_µ. We want to simulate exactly various possible time-courses of X_i(t). Note that the time-courses are stochastic, because the exact times and the ordering of the reactions are random, due to the randomness associated with diffusion, which must bring two (or more) reactants into close enough proximity for the reaction to take place. Thus, starting from the same initial condition and using the same reaction scheme, the exact integer trajectories will be different across many runs of the SSA.

A naive way to simulate this process would be as follows: divide time into very small intervals Δt. In every interval (if it is small enough, which means that the product of any propensity and Δt is always much less than 1), the probability that reaction µ happens is given by the product of its rate and the number of reactant combinations, times Δt, i.e., a_µ Δt. Thus, at every time step, you can roll a die using the RNG and determine whether reaction µ should occur: if so, you update the table of chemical species numbers, and repeat. This, however, can be very inefficient, since most of the time no reaction will be drawn.

SSA does away with this by making a clever argument that the time to the next reaction is exponentially distributed. So instead of chunking time into small bins, we draw one exponential random variable (the time-to-next-reaction) and another random variable that tells us which reaction to carry out; we can then execute the reaction and update the table of chemical species numbers. More formally, SSA defines a reaction probability density function, P(τ, µ) dτ, as the probability that, given the state X (a vector of chemical species counts) at time t, the next reaction is µ and happens in the time interval [t + τ, t + τ + dτ]. Gillespie then shows that one can write P(τ, µ) dτ = P_0(τ) a_µ dτ, where P_0(τ) is the probability that no reaction takes place in a time interval of length τ, and a_µ dτ is the probability that the next reaction is µ. One can show that the distribution P_0 of τ is exponential,

    P_0(τ) ∝ exp(-Σ_µ a_µ τ),                                   (19)

by verifying that it satisfies the recursion for P_0(τ), which is easy to write down (can you try? The probability of no reaction until t + dt is the probability of no reaction until t, times the probability that no reaction takes place in dt, which is just 1 minus the probability that any reaction takes place in dt, i.e., 1 - Σ_µ a_µ dt). Note that the a_µ depend on the state of the system, X!

This leads to a very simple simulation scheme:

1. At t = 0, set the system state to the initial condition, X(t = 0); input the stochastic reaction rates c_µ for all reactions µ = 1, ..., M, and the termination time T.

2. Compute the a_µ from the c_µ and the state X; compute also Σ_µ a_µ.

3. Draw two URNs and use them to generate the exponentially distributed time-to-next-reaction, τ (with parameter Σ_µ a_µ), and the reaction index µ, from P(τ, µ).

4. Update the state X(t) into X(t + τ) by executing reaction µ, and advance time t → t + τ.

5. Repeat from step 2; terminate when t ≥ T.
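A compact Python sketch of the scheme above, for the four-reaction example network of Eqs. (10) (an added illustration; the rates and the initial state are placeholders). The exponential time-to-next-reaction is drawn by inversion with parameter Σ_µ a_µ, and the reaction index by a cumulative-sum search over the a_µ:

    import numpy as np

    rng = np.random.default_rng(seed=1)

    # Stoichiometry: change of the state (x, y, z) when each reaction fires.
    nu = np.array([
        [-1, -1, +1],   # R_1: X + Y -> Z
        [ 0,  0, -1],   # R_2: Z -> 0
        [+1, -1, -2],   # R_3: Y + 2Z -> X
        [ 0, +1,  0],   # R_4: 0 -> Y
    ])

    def propensities(s, c):
        x, y, z = s
        return np.array([c[0]*x*y, c[1]*z, c[2]*y*z*(z - 1)/2, c[3]])

    def ssa(s0, c, T):
        """Gillespie SSA: one stochastic trajectory of the network up to time T."""
        t, s = 0.0, np.array(s0)
        times, states = [t], [s.copy()]
        while t < T:
            a = propensities(s, c)
            a0 = a.sum()
            if a0 == 0:                                  # nothing can fire anymore
                break
            t += -np.log(1.0 - rng.uniform()) / a0       # URN #1 -> exponential tau
            mu = np.searchsorted(np.cumsum(a), rng.uniform() * a0)  # URN #2 -> mu
            s = s + nu[mu]                               # execute reaction mu
            times.append(t); states.append(s.copy())
        return np.array(times), np.array(states)

    times, states = ssa(s0=(50, 50, 0), c=(0.01, 0.1, 0.005, 1.0), T=10.0)
    print(times[-1], states[-1])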

Pros and cons of SSA:

+ Exact for well-mixed systems in thermal (though not necessarily chemical) equilibrium (if both equilibria hold, we can sample the stationary statistics of the chemical reactions). No need for a Δt → 0 limit in the numerical simulation.

+ Gives one stochastic trace per run; to estimate low-order moments (e.g., the mean concentration as a function of time), not that many traces usually need to be simulated.

+ It can simulate rare events correctly.

+ Small memory requirement.

- Lots of computation time in stiff systems, where a subset of reactions is much faster than the other reactions (e.g., dimerization/monomerization reactions in the cell). One is tempted to use SSA only for the slow reactions and mass-action kinetics for the fast ones, but this requires a lot of care to do correctly.

- Produces a lot of simulation data (and takes a lot of time) if the reaction system has many species / reactions; it is usually used for small systems.

- If space matters, e.g., if the system is not well-mixed because the diffusion of reactants is slow, completely new effects can emerge and SSA should not be used. Alternative schemes that take space into account are, however, clearly much slower.

6 Study literature

- Wikipedia on Lotka-Volterra: http://en.wikipedia.org/wiki/Lotka-Volterra_equation

- The paper introducing the Gillespie SSA algorithm, Ref. [1].

- Numerical Recipes in C, Chapters 7.2 (random numbers by the transformation method), 7.3 (by the rejection method), and 7.6 (Monte Carlo integration).

- A simple introduction to the Euler method: http://en.wikipedia.org/wiki/Euler_method

7 Homework

1. Drawing random numbers distributed according to the exponential distribution. The uniform RNG generates random numbers on [0, 1) according to P(x) = 1. Using the inversion method, and given that you have access to uniformly distributed random numbers, how can you generate random numbers y that are distributed according to P(y) = α exp(-αy), for y ∈ [0, ∞) and any positive parameter α? Implement this prescription, draw 10⁶ exponentially distributed random numbers with α = 1, and show their properly normalized PDF on a plot with a logarithmic vertical axis. Make sure you understand the Box-Muller method for generating Gaussian-distributed random numbers in Numerical Recipes, Chapter 7.2.

2. Computing the center of mass of a ball with a cylindrical hole. Consider a homogeneous ball with radius R = 1 centered on r_0 = 0. We drill a cylindrical hole into the ball, of radius 0.5, with center line parallel to the z axis and passing through the point (0.5, 0, 0). Compute the center of mass of this object using Monte Carlo sampling. Hint: you can generate 3D random points uniformly in the box [-1, 1]³, discard all points that do not fall into the object, and evaluate the center of mass, ⟨r⟩ = ∫dV r / ∫dV, simply as the average of the coordinates r = (x, y, z) of all the points that fall into the object. To check that you are generating the points correctly, plot the projection of the points onto the (x, y) plane. Report the center-of-mass coordinates. Due to symmetry, the correct answer for two of the center-of-mass coordinates, y and z, has to be 0; you can use this exact result to estimate the error of your Monte Carlo integral. Evaluate the Monte Carlo integral using 10, 100, 1000, ... sampling points (however much is feasible on your machine; in Matlab you should easily get up to 10⁶ in a few seconds), and for each number of sampling points, repeat the estimation 10 or more times; plot the average error in y or z as a function of the number of sampling points, on a log-log plot. How does this error depend on the number of points? If you generate random points uniformly in a sphere and only discard those that fall into the hole, does the integration go slower or faster than if you discard samples uniformly drawn in a box? Time the two implementations and compare.

3. Deterministic Lotka-Volterra system for the dynamics of predators and prey. We are given a system of two ordinary differential equations describing the interactions between the prey (x) and the predators (y):

    dx/dt = λx - δxy                                            (20)
    dy/dt = -εy + βxy                                           (21)

Here, the prey multiplies intrinsically at a rate λ (a first-order "reaction"), but gets predated on in a second-order reaction with coefficient δ by the predators y. Similarly, the predators intrinsically decay with rate ε, but grow in numbers if they catch prey (rate β). This well-known system has two fixed points (points where the time derivatives are 0). One is trivially (x = 0, y = 0); find the second fixed point, (x_0, y_0). It appears as if the system has four free parameters (λ, δ, ε, β), but one can reduce this to a one-parameter system by a proper choice of units. Show that if you measure the predators in units of y_0, the prey in units of x_0, and time in units of t_0 = (λε)^{-1/2}, the system reduces to:

    ω dx/dt = x - xy                                            (22)
    ω^{-1} dy/dt = -y + xy                                      (23)

Remember, in this system x and y are measured in the new units of x_0 and y_0, rather than in absolute numbers. What is the parameter ω in terms of the original parameters of the problem? Plot a phase portrait for ω = 1. A phase portrait is a plane (x, y) where at every point of the plane you draw a little vector pointing in the direction of (dx/dt, dy/dt). What do you think the dynamics will be like? Choose the initial point (t = 0, x = 0.5, y = 0.5) and draw the trajectory until T = 10 in the (x, y) plane; separately, also show x(t) and y(t). You can compute the trajectories either by doing a very simple numerical Euler integration (where you turn the differential Eqs. (23) into difference equations with a very small time step Δt ≪ 1, and step forward into the future; see Study literature), or, if you know how, by more efficient schemes (like Runge-Kutta). You can also use the solvers in Matlab / Mathematica, if you know how.
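A minimal forward-Euler stepping sketch for the dimensionless system, Eqs. (22)-(23) (an added illustration of the scheme just described; the step size is an arbitrary choice):

    import numpy as np

    def lv_euler(x0=0.5, y0=0.5, omega=1.0, T=10.0, dt=1e-4):
        """Forward-Euler integration of Eqs. (22)-(23) with time step dt."""
        n = int(T / dt)
        traj = np.empty((n + 1, 2))
        traj[0] = x0, y0
        for i in range(n):
            x, y = traj[i]
            dx = (x - x * y) / omega      # Eq. (22): omega * dx/dt = x - xy
            dy = (-y + x * y) * omega     # Eq. (23): (1/omega) * dy/dt = -y + xy
            traj[i + 1] = x + dt * dx, y + dt * dy
        return traj

    traj = lv_euler()
    print(traj[-1])   # state at T = 10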

4. Stochastic simulation of the LV system. In reality, the numbers of predators and prey are not continuous variables but discrete quantities, and due to the random nature of the encounters between predators and prey, it can happen that the prey or the predators die out. If the prey die out, the system settles into the (0, 0) attractor; if the predators die out, the prey starts to multiply exponentially. Indeed, the LV system can be viewed as a reaction network with the following reactions, assuming δ = β:

    X → 2X                                                      (24)
    Y → ∅                                                       (25)
    X + Y → 2Y                                                  (26)

(Note that X → 2X stands for one individual of species X becoming two, not for a doubling of the whole population.) This is just one way of representing the same deterministic dynamics (think, for instance, about what we have assumed here regarding how the prey multiply). Implement Gillespie's SSA algorithm for this reaction network. Since the fluctuations matter now (as the system can go extinct), and the fluctuations depend on all the rates, the behavior of this system is no longer determined only by the single effective parameter ω, as before. Select a few sets of parameters for which ω = 1 in the deterministic regime, run the simulation many times with the same initial condition (corresponding to the deterministic (0.5, 0.5)) but different random seeds, and plot the histogram of the times until the predators go extinct. Plot sample trajectories in the (x, y) plane, in the dimensionless units introduced above, for the different parameter sets. Which parameter sets make the average time to extinction longer (and the system thus more similar to the deterministic one, which never goes extinct)?

References

[1] Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81: 2340-2361.

[2] van Kampen NG (2011) Stochastic Processes in Physics and Chemistry (North-Holland).