APM 541: Stochastic Modelling in Biology
Diffusion Processes
Jay Taylor (ASU), Fall 2013

Brownian Motion

Brownian Motion and Random Walks

Let $(\xi_n : n \geq 1)$ be a collection of i.i.d. random variables with distribution $P(\xi_n = 1) = P(\xi_n = -1) = \frac{1}{2}$ and define the discrete-time Markov process $X = (X_t : t \geq 0)$ by setting $X_0 = 0$ and
$$X_t = X_{t-1} + \xi_t = \sum_{s=1}^{t} \xi_s.$$
$X$ is said to be a simple random walk. Since $E[\xi_n] = 0$ and $\mathrm{Var}(\xi_n) = 1$, it follows that
$$E[X_t] = 0, \qquad \mathrm{Var}(X_t) = t$$
for every $t \geq 0$.

[Figure: a single sample path of a simple random walk, plotted as X against time over 100 steps.]

We can even calculate the distribution of the random variables $X_t$ explicitly. Because $X$ increases or decreases by 1 in every time step, $X_t$ is odd whenever $t$ is odd and even whenever $t$ is even. Thus, if $t = 2n$ is even, then for $X_{2n} = 2k$ to be true, the process must have taken $n + k$ steps to the right and $n - k$ steps to the left, giving
$$P(X_{2n} = 2k) = \binom{2n}{n+k} \left(\frac{1}{2}\right)^{2n}.$$
Although this result is exact, it is unwieldy when $t$ is large. In that case, it is more convenient to invoke the central limit theorem, which tells us that $X_t$ is approximately normal with mean 0 and variance $t$, i.e.,
$$P(a \leq X_t \leq b) \approx \int_a^b \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}\, dx.$$
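As a sanity check on the two displays above, here is a short sketch (Python; the function names are ours) comparing the exact binomial probability with the local normal approximation. The $N(0, t)$ density is multiplied by the lattice spacing 2 because $X_t$ lives on the even integers when $t$ is even:

```python
import math

def p_exact(t, k):
    """P(X_t = k) for the simple random walk; nonzero only when t + k is even."""
    if (t + k) % 2 != 0:
        return 0.0
    return math.comb(t, (t + k) // 2) * 0.5 ** t

def p_normal(t, k):
    """Local CLT approximation: N(0, t) density times the lattice spacing 2."""
    return 2.0 / math.sqrt(2 * math.pi * t) * math.exp(-k * k / (2 * t))

exact = p_exact(100, 0)    # binomial formula with 2n = 100, k = 0
approx = p_normal(100, 0)  # normal approximation at the same point
```

Already at $t = 100$ the two values agree to three decimal places.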

In the previous example, we arbitrarily assumed that the random walker took steps of size 1. Let us modify this by introducing a family of processes, say $X^{(\epsilon)} = (X^{(\epsilon)}_t : t \geq 0)$, indexed by a positive real number $\epsilon > 0$, where
$$X^{(\epsilon)}_t = \sum_{s=1}^{t} \xi^{(\epsilon)}_s$$
and $(\xi^{(\epsilon)}_n : n \geq 1)$ is a collection of i.i.d. random variables with
$$P\left(\xi^{(\epsilon)}_n = \sqrt{\epsilon}\right) = P\left(\xi^{(\epsilon)}_n = -\sqrt{\epsilon}\right) = \frac{1}{2}.$$
In other words, the step size is now $\sqrt{\epsilon}$. In this case we have
$$E\left[X^{(\epsilon)}_t\right] = 0, \qquad \mathrm{Var}\left(X^{(\epsilon)}_t\right) = t\epsilon.$$

Although each of the processes $X^{(\epsilon)}$ is a Markov jump process, when $\epsilon > 0$ is small the jumps are small, and so to the naked eye the sample paths may appear to be continuous.

[Figure: sample paths of simple random walks with $\epsilon = 1, 1/4, 1/16, 1/64$, plotted as X against time over 100 steps.]

However, as the figure shows, as $\epsilon$ decreases to 0, not only do the step sizes decrease, but all of the variables $X^{(\epsilon)}_t$ also tend to 0.

To compensate for the shrinking step size, we need to inflate the time scale on which the process is run, i.e., we will rescale time so that the process takes a greater number of steps of smaller size. The right rescaling can be deduced by examining the relationship between the variance of $X^{(\epsilon)}_t$, $\epsilon$, and $t$. Since $\mathrm{Var}(X^{(\epsilon)}_t) = t\epsilon$, if we define a new process $B^{(\epsilon)} = (B^{(\epsilon)}_t : t \geq 0)$ by setting
$$B^{(\epsilon)}_t = X^{(\epsilon)}_{t/\epsilon},$$
then
$$E\left[B^{(\epsilon)}_t\right] = 0, \qquad \mathrm{Var}\left(B^{(\epsilon)}_t\right) = t,$$
for all $\epsilon > 0$ and all $t \geq 0$. With this rescaling of time, $t \mapsto t/\epsilon$, the variance of the variables $B^{(\epsilon)}_t$ remains constant even as $\epsilon$ decreases to 0, and so it is at least plausible that the processes $B^{(\epsilon)}$ might converge to a non-trivial limit as $\epsilon \to 0$.

[Figure: simple random walks with diffusive rescaling, for $\epsilon = 1, 1/4, 1/16, 1/64$, plotted as X against time.]
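The diffusive rescaling can be checked empirically. The sketch below (assuming steps of size $\pm\sqrt{\epsilon}$ as above; names are illustrative) simulates many copies of $B^{(\epsilon)}_1$ for $\epsilon = 1/64$ and confirms that the sample mean and variance are near 0 and $t = 1$:

```python
import math
import random

def rescaled_endpoint(t, eps, rng):
    """B^(eps)_t: a sum of t/eps i.i.d. steps of size +/- sqrt(eps)."""
    n = int(t / eps)
    step = math.sqrt(eps)
    return sum(step if rng.random() < 0.5 else -step for _ in range(n))

rng = random.Random(0)
samples = [rescaled_endpoint(1.0, 1 / 64, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# mean should be near 0 and var near t = 1, independently of eps
```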

Apart from the picture, there are several lines of evidence suggesting that the processes $B^{(\epsilon)}$ do in fact converge to a non-trivial limit as $\epsilon \to 0$. In particular, for each $t \geq 0$, not only are the mean and the variance of the random variables $B^{(\epsilon)}_t$ constant, but in addition the central limit theorem shows that these variables converge in distribution to a normal random variable with mean 0 and variance $t$:
$$B^{(\epsilon)}_t = \sum_{i=1}^{t/\epsilon} \xi^{(\epsilon)}_i \overset{d}{=} \sqrt{t} \left(\frac{1}{\sqrt{t/\epsilon}} \sum_{i=1}^{t/\epsilon} \xi_i\right) \overset{d}{\longrightarrow} N(0, t),$$
as $\epsilon \to 0$. Here we have used the fact that $\xi^{(\epsilon)}_i \overset{d}{=} \sqrt{\epsilon}\,\xi_i$ for every $\epsilon > 0$. Furthermore, since every increment $B^{(\epsilon)}_{t+s} - B^{(\epsilon)}_t$ is a sum of i.i.d. random variables, it follows that these also converge in distribution to normal random variables,
$$B^{(\epsilon)}_{t+s} - B^{(\epsilon)}_t \overset{d}{\longrightarrow} N(0, s)$$
for every $t, s > 0$, again as $\epsilon \to 0$. This suggests that both the marginal distributions and the increments of the limiting process $B$, assuming it exists, are normal random variables.

In fact, the rescaled processes $B^{(\epsilon)}$ do converge to a limit as $\epsilon$ decreases to 0. This limiting process is called Brownian motion and is described in the next definition.

Definition. A real-valued continuous-time stochastic process $B = (B_t : t \geq 0)$ is called a standard one-dimensional Brownian motion if
1. $B_0 = 0$ a.s.;
2. $B$ has independent, stationary increments with normal distribution $B_{t+s} - B_t \sim N(0, s)$;
3. $B$ has continuous sample paths, i.e., $P\left(\forall\, t \geq 0 : \lim_{h \to 0} B_{t+h} = B_t\right) = 1$.
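A minimal simulation of this definition samples the independent increments directly as $N(0, T/n)$ Gaussians (illustrative names):

```python
import math
import random

def brownian_endpoint(T, n, rng):
    """Value B_T of a Brownian path built from n independent N(0, T/n) increments."""
    sd = math.sqrt(T / n)
    b = 0.0
    for _ in range(n):
        b += rng.gauss(0.0, sd)
    return b

rng = random.Random(42)
T = 2.0
endpoints = [brownian_endpoint(T, 100, rng) for _ in range(5000)]
mean = sum(endpoints) / len(endpoints)
var = sum((x - mean) ** 2 for x in endpoints) / len(endpoints)
# By property 2 of the definition, B_T ~ N(0, T)
```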

Historical Remarks

Brownian motion is named after the Scottish botanist Robert Brown, who in 1827 described the random movement of starch and lipid grains in a drop of water observed through a microscope.

Brownian motion (as a stochastic process) was first described by Bachelier in 1900 as a model for the fluctuation of stock prices. Einstein independently formulated a theory of Brownian motion in one of his 1905 papers to explain the apparently random movement of small particles suspended in a fluid.

Wiener (1923) gave the first rigorous proof of the existence of Brownian motion. For this reason, Brownian motion is also known as the Wiener process.

Donsker (1951) gave a formal proof that the rescaled random walks converge in distribution to Brownian motion. This result is known as Donsker's Invariance Principle and holds for a large class of unbiased random walks.

Brownian motion has a number of important and somewhat exotic properties, which are summarized in the following theorem.

Theorem. Let $B = (B_t : t \geq 0)$ be a standard one-dimensional Brownian motion. Then
1. $B$ is a continuous-time Markov process.
2. Brownian paths are almost surely nowhere differentiable, i.e.,
$$P\left(\exists\, t \geq 0 : \lim_{h \to 0} \frac{1}{h}\left(B_{t+h} - B_t\right) \text{ exists}\right) = 0.$$
3. Brownian paths have infinite first variation on any interval $[a, b]$, i.e., for any sequence of partitions $\Pi_n = \{t_1, \dots, t_n\}$ of $[a, b]$ with $t_1 = a$, $t_n = b$, and mesh tending to zero as $n \to \infty$, we have
$$P\left(\lim_{n \to \infty} \sum_{k=1}^{n} \left|B_{t_k} - B_{t_{k-1}}\right| = \infty\right) = 1.$$
In particular, the length of a Brownian path over any interval is almost surely infinite.

Theorem (cont'd).
4. Brownian paths have finite quadratic variation on any interval $[a, b]$, i.e., for any sequence of partitions $\Pi_n = \{t_1, \dots, t_n\}$ of $[a, b]$ with $t_1 = a$, $t_n = b$, and mesh tending to zero as $n \to \infty$, we have
$$P\left(\lim_{n \to \infty} \sum_{k=1}^{n} \left(B_{t_k} - B_{t_{k-1}}\right)^2 = b - a\right) = 1.$$
5. For any $T > 0$, $B$ has uncountably many zeros on $[0, T]$.
6. The scaling property: for any $\gamma > 0$, the process $(\gamma B_{t/\gamma^2} : t \geq 0)$ is a standard Brownian motion.

The theorem tells us that although Brownian paths are continuous, they are locally extremely irregular. Indeed, because Brownian paths effectively look the same on all scales, the macroscopic fluctuations that we see when plotting these paths recur at all scales, no matter how small.
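Property 4 is easy to probe numerically. The following sketch (illustrative names) sums the squared increments of one simulated Brownian path over $[0, 3]$ on a fine partition; by the theorem the result should be close to $b - a = 3$:

```python
import math
import random

def quadratic_variation(a, b, n, rng):
    """Sum of squared increments of a simulated Brownian path over [a, b]."""
    dt = (b - a) / n
    sd = math.sqrt(dt)
    qv, prev = 0.0, 0.0
    for _ in range(n):
        cur = prev + rng.gauss(0.0, sd)
        qv += (cur - prev) ** 2
        prev = cur
    return qv

rng = random.Random(1)
qv = quadratic_variation(0.0, 3.0, 200000, rng)
# qv should be close to b - a = 3 for a fine partition
```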

Diffusion Processes

Diffusion Processes and Stochastic Differential Equations

Brownian motion is a special case of a much more general class of continuous-time Markov processes known as diffusion processes.

Definition. A continuous-time Markov process $X = (X_t : t \geq 0)$ with values in an interval $I = (l, r)$ is said to be a diffusion process if the sample paths of $X$ are almost surely continuous and, for every $x$ in $I$ and every $t \geq 0$, the following limits exist:
$$b(x) = \lim_{h \to 0} \frac{1}{h}\, E\left[X_{t+h} - X_t \mid X_t = x\right]$$
$$a(x) = \lim_{h \to 0} \frac{1}{h}\, E\left[\left(X_{t+h} - X_t\right)^2 \mid X_t = x\right].$$
The functions $a, b : I \to \mathbb{R}$ are called the infinitesimal variance and infinitesimal drift coefficients of $X$, respectively.

Interpretation of the Infinitesimal Drift and Variance

The infinitesimal drift $b(x)$ determines the expected change in a small increment of $X$ starting at $x$:
$$E\left[X_t - X_0 \mid X_0 = x\right] = b(x)t + o(t).$$
The infinitesimal variance $a(x)$ determines the variance of a small increment of $X$ starting at $x$:
$$E\left[\left(X_t - X_0\right)^2 \mid X_0 = x\right] = a(x)t + o(t).$$

Provided that the variance and drift coefficients are differentiable, it can be shown that a diffusion process $X$ with these coefficients exists and that the transition probabilities of this process have a density $p(x, y; t)$:
$$P\left(u \leq X_t \leq v \mid X_0 = x\right) = \int_u^v p(x, y; t)\, dy.$$
In other words, conditional on $X_0 = x$, the marginal distribution of $X_t$ is continuous with density $p(x, y; t)$. Furthermore, this density function is a solution to the following partial differential equations, which are known as the Kolmogorov forward and backward equations for $X$:
$$\text{(KFE)} \qquad \frac{\partial}{\partial t}\, p(x, y; t) = -\frac{\partial}{\partial y}\left[b(y)\, p(x, y; t)\right] + \frac{1}{2}\, \frac{\partial^2}{\partial y^2}\left[a(y)\, p(x, y; t)\right]$$
$$\text{(KBE)} \qquad \frac{\partial}{\partial t}\, p(x, y; t) = b(x)\, \frac{\partial}{\partial x}\, p(x, y; t) + \frac{1}{2}\, a(x)\, \frac{\partial^2}{\partial x^2}\, p(x, y; t)$$
subject to the initial condition $p(x, y; 0) = \delta_x(dy)$.

Example: Since Brownian motion is a diffusion process with drift coefficient $b(x) = 0$ and variance coefficient $a(x) = 1$, the forward equation for Brownian motion is just the heat equation
$$\partial_t\, p(x, y; t) = \frac{1}{2}\, \partial_{yy}\, p(x, y; t).$$
If the initial condition is $p(x, y; 0) = \delta_x(y)$, the unique solution to this equation is the Gaussian density
$$p(x, y; t) = \frac{1}{\sqrt{2\pi t}}\, e^{-(y - x)^2/2t},$$
which confirms that the increment $B_t - B_0$ is normally distributed with mean 0 and variance $t$.
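One can verify numerically, by central finite differences, that this Gaussian density satisfies the heat equation; the evaluation point $(x, y, t) = (0, 0.7, 1.3)$ below is arbitrary:

```python
import math

def p(x, y, t):
    """Brownian transition density: a N(x, t) density in y."""
    return math.exp(-(y - x) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

# Central finite differences at an arbitrary point (x, y, t)
x, y, t, h = 0.0, 0.7, 1.3, 1e-4
dp_dt = (p(x, y, t + h) - p(x, y, t - h)) / (2 * h)
d2p_dy2 = (p(x, y + h, t) - 2 * p(x, y, t) + p(x, y - h, t)) / h ** 2
residual = dp_dt - 0.5 * d2p_dy2  # ~0 if the heat equation holds
```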

Example: A diffusion process $X$ with infinitesimal variance $a(x) = \sigma^2 > 0$ and drift $b(x) = -\gamma x$ is said to be an Ornstein-Uhlenbeck process. In this case, the Kolmogorov forward equation is the PDE
$$\frac{\partial}{\partial t}\, p(x, y; t) = \gamma\, \frac{\partial}{\partial y}\left[y\, p(x, y; t)\right] + \frac{\sigma^2}{2}\, \frac{\partial^2}{\partial y^2}\, p(x, y; t),$$
which has solution
$$p(x, y; t) = \frac{1}{\sqrt{2\pi \sigma^2(t)}}\, \exp\left\{-\frac{\left(y - x e^{-\gamma t}\right)^2}{2 \sigma^2(t)}\right\}, \qquad \text{where } \sigma^2(t) \equiv \frac{\sigma^2}{2\gamma}\left(1 - e^{-2\gamma t}\right).$$
It follows that $X$ has Gaussian increments. The Ornstein-Uhlenbeck process was originally introduced as a model for the motion of a tethered particle subject to thermal noise.

[Figure: sample paths of Brownian motion and of Ornstein-Uhlenbeck processes with $\gamma = \sigma^2 = 1$ and with $\gamma = 1$, $\sigma^2 = 10$, plotted against time.]
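Since the transition density above is Gaussian, the OU process can be simulated exactly on a grid, with no discretization error. The sketch below (illustrative names) composes many exact steps and checks that the endpoint mean matches $x_0 e^{-\gamma t}$, as the one-step law requires:

```python
import math
import random

def ou_step(x, gamma, sigma2, dt, rng):
    """One exact OU transition over dt, sampled from the Gaussian density
    X_{t+dt} | X_t = x ~ N(x*exp(-gamma*dt), sigma2*(1-exp(-2*gamma*dt))/(2*gamma))."""
    m = x * math.exp(-gamma * dt)
    v = sigma2 * (1 - math.exp(-2 * gamma * dt)) / (2 * gamma)
    return rng.gauss(m, math.sqrt(v))

rng = random.Random(11)
x0, gamma, sigma2 = 3.0, 0.5, 2.0
t, n = 4.0, 40
finals = []
for _ in range(4000):
    x = x0
    for _ in range(n):
        x = ou_step(x, gamma, sigma2, t / n, rng)
    finals.append(x)
mean = sum(finals) / len(finals)
# Composing 40 exact steps should reproduce the one-step law: E[X_t] = x0*exp(-gamma*t)
```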

Applications of the Forward and Backward Equations

Stationary Distributions for Diffusion Processes

Recall that a probability distribution $\pi$ is said to be stationary for a continuous-time stochastic process $X$ if, whenever $X_0$ has distribution $\pi$, then $X_t$ has distribution $\pi$ at every time $t \geq 0$. If $X$ is a diffusion process on an interval $(l, r)$ and $\pi$ is a continuous stationary distribution with density $\pi(x)$ on this same interval, then $\pi$ must also be a stationary solution of the Kolmogorov forward equation:
$$0 = \frac{1}{2}\, \frac{\partial^2}{\partial x^2}\left[a(x)\pi(x)\right] - \frac{\partial}{\partial x}\left[b(x)\pi(x)\right],$$
subject to no-flux boundary conditions at $l$ and $r$:
$$\left.\left(b(x)\pi(x) - \frac{1}{2}\, \frac{\partial}{\partial x}\left[a(x)\pi(x)\right]\right)\right|_{x = l,\, r} = 0.$$
These conditions guarantee that no probability mass enters or escapes across the boundary.

Assuming that this problem has a solution, a direct calculation shows that the density of the stationary distribution is
$$\pi(x) = \frac{1}{C\, a(x)}\, \exp\left(2 \int_c^x \frac{b(y)}{a(y)}\, dy\right),$$
where $c$ is an arbitrary number within the interval $(l, r)$ and $C < \infty$ is a normalizing constant, which must be chosen (if possible) so that
$$\int_l^r \pi(x)\, dx = 1.$$
If no such choice of $C$ is possible, then we can conclude that the process has no continuous stationary distribution on $(l, r)$.

Example: The Ornstein-Uhlenbeck process with coefficients $a(x) = \sigma^2$ and $b(x) = -\gamma x$ has a unique stationary distribution on $\mathbb{R}$ provided that the parameters $\gamma$ and $\sigma^2$ are both positive. In this case, the density of the stationary distribution is
$$\pi(x) = \frac{1}{C\sigma^2}\, \exp\left(-\frac{2\gamma}{\sigma^2} \int_0^x y\, dy\right) = \frac{1}{C\sigma^2}\, \exp\left(-\frac{\gamma}{\sigma^2}\, x^2\right).$$
As this is the density of a Gaussian distribution with mean 0 and variance $\sigma^2/2\gamma$, we can forgo the calculation of $C$ and immediately write
$$\pi(x) = \sqrt{\frac{\gamma}{\pi \sigma^2}}\, \exp\left(-\frac{\gamma}{\sigma^2}\, x^2\right).$$
Since this distribution is supported on the entire real line, we can deduce that an OU process visits values that are arbitrarily large and arbitrarily small, i.e., the linear restoring force is not strong enough to prevent the particle from randomly wandering arbitrarily far from 0.
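The general stationary-density formula can be checked by quadrature. Assuming the OU coefficients above (the parameter values are arbitrary), the following sketch normalizes $\pi$ numerically and verifies that its variance is $\sigma^2/2\gamma$:

```python
import math

# Normalize pi(x) ∝ (1/sigma2) * exp(-gamma*x^2/sigma2) by quadrature and
# verify that its variance equals sigma2/(2*gamma).
gamma, sigma2 = 0.8, 1.5

def unnormalized_pi(x):
    return math.exp(-gamma * x * x / sigma2) / sigma2

n, half_width = 20000, 10.0
dx = 2 * half_width / n
xs = [-half_width + i * dx for i in range(n + 1)]
Z = sum(unnormalized_pi(x) for x in xs) * dx          # normalizing constant
var = sum(x * x * unnormalized_pi(x) for x in xs) * dx / Z
# var should be close to sigma2 / (2 * gamma) = 0.9375
```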

Example: If $X$ is Brownian motion, then the Kolmogorov forward equation is just the heat equation, and so any stationary distribution, if it exists, must be a solution to Laplace's equation:
$$\pi''(x) = 0.$$
However, the general solution to this equation is $\pi(x) = a + bx$, and there is no way of choosing $a$ and $b$ so that $\pi(x)$ is integrable over $(-\infty, \infty)$. This is no accident: Brownian motion does not have a (proper) stationary distribution on the real line.

Time Reversal and Detailed Balance

Like birth-death processes, one-dimensional diffusion processes with non-trivial stationary distributions satisfy a detailed-balance condition. Namely, if $X$ is a stationary diffusion process with transition density $p(x, y; t)$ and stationary distribution $\pi(x)\,dx$, then for all $x, y$ and every $t > 0$, we have
$$\pi(x)\, p(x, y; t) = \pi(y)\, p(y, x; t).$$
Furthermore, if we consider the time-reversed process $\hat{X} = (\hat{X}_t : 0 \leq t \leq T)$, where $\hat{X}_t = X_{T-t}$ for some $T > 0$, then $\hat{X}$ is also a stationary diffusion process with the same infinitesimal variance and drift coefficients as $X$. In other words, time reversal does not change any of the statistical properties of a one-dimensional diffusion process. This is essentially a consequence of the topological constraints of life in one dimension and is no longer true (in general) when we pass to two or more dimensions.

Exit Distributions for Diffusion Processes

Exit distributions of one-dimensional diffusion processes are just as easily calculated. Suppose that $X$ is a diffusion process on the interval $(l, r)$ with initial condition $X_0 = x \in (u, v)$, where $l \leq u < v \leq r$. Provided that the infinitesimal variance $a(y) > 0$ is positive everywhere in the interval $(u, v)$, it can be shown that $X$ will eventually exit the interval at some finite time $T_{u,v}$. In this case, it is often of interest to know the distribution of $X$ at the exit time:
$$p(x) \equiv P_x\left(X_{T_{u,v}} = v\right).$$
In other words, $p(x)$ is the probability that the diffusion exits the interval through the right boundary $v$ rather than the left boundary $u$. Equivalently, $p(x)$ is the probability that $X$ hits $v$ before it hits $u$.

It can be shown that $p(x)$ is a stationary solution of the Kolmogorov backward equation, i.e.,
$$\frac{1}{2}\, a(x)\, p''(x) + b(x)\, p'(x) = 0,$$
subject to the boundary conditions $p(u) = 0$ and $p(v) = 1$. This problem can be solved directly and gives
$$p(x) = \frac{\displaystyle\int_u^x \exp\left(-2 \int_c^y \frac{b(z)}{a(z)}\, dz\right) dy}{\displaystyle\int_u^v \exp\left(-2 \int_c^y \frac{b(z)}{a(z)}\, dz\right) dy},$$
where $c$ is an arbitrary number in $(u, v)$.

Example: Suppose that $X$ is a Brownian motion with initial position $X_0 = x \in (u, v)$. Since the infinitesimal variance and drift are simply $a(x) = 1$ and $b(x) = 0$, the probability that $X$ hits $v$ before it hits $u$ is
$$p(x) = \frac{\int_u^x 1\, dy}{\int_u^v 1\, dy} = \frac{x - u}{v - u}.$$
In other words, the probability that Brownian motion exits through one boundary rather than the other is a simple linear function of the distance of the initial value from that boundary. This reflects the isotropy of Brownian motion: the process moves in a completely unbiased fashion, with an equal propensity to increase and decrease.
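A Monte Carlo check of this exit probability (illustrative names; the finite time step introduces a small discretization bias at the boundaries):

```python
import math
import random

def exits_right(x, u, v, dt, rng):
    """Simulate Brownian increments until the path leaves (u, v);
    return True if it exits through the right boundary v."""
    sd = math.sqrt(dt)
    while u < x < v:
        x += rng.gauss(0.0, sd)
    return x >= v

rng = random.Random(3)
u, v, x0 = 0.0, 1.0, 0.3
trials = 4000
prob = sum(exits_right(x0, u, v, 0.0005, rng) for _ in range(trials)) / trials
# prob should be close to (x0 - u) / (v - u) = 0.3
```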

Diffusion Approximations

Diffusion Approximations for Markov Chains

Just as Brownian motion arises as the limit of a sequence of rescaled random walks, so too can other diffusion processes be thought of as approximations to discrete- and continuous-time Markov chains that take frequent jumps of small size. Before stating a theorem that describes how we can identify such approximations, we need to decide what it means for a sequence of stochastic processes to converge to another process. One notion of convergence is described in the following definition.

Definition. For each $n \geq 1$, let $X^{(n)} = (X^{(n)}_t : t \geq 0)$ be a stochastic process with values in $E$, and suppose that $X = (X_t : t \geq 0)$ is also a stochastic process with values in this space. We say that the finite-dimensional distributions of $X^{(n)}$ converge to those of $X$ if for every $m \geq 1$ and every set of $m$ distinct times $0 \leq t_1 < t_2 < \cdots < t_m$, we have
$$\lim_{n \to \infty} E\left[f\left(X^{(n)}_{t_1}, \dots, X^{(n)}_{t_m}\right)\right] = E\left[f\left(X_{t_1}, \dots, X_{t_m}\right)\right]$$
for every bounded, continuous function $f : E^m \to \mathbb{R}$.

The theorem is long enough that I could only fit the necessary conditions onto this slide:

Theorem. For each $N \geq 1$, let $Y^{(N)} = (Y^{(N)}_n : n \geq 0)$ be a DTMC with values in $I_N \subset I = (l, r)$ and initial distribution $\mu_N$. Suppose that the following conditions are satisfied:
1. Each $x \in I$ is the limit of a sequence $x_N \in I_N$.
2. The initial distributions $\mu_N$ converge to a probability distribution $\mu$ on $I$.
3. There is a sequence of positive numbers $\epsilon_N$ tending to zero such that for each $x \in I$ and any sequence $x_N \in I_N$ tending to $x$ we have
$$\text{(a)} \quad b(x) = \lim_{N \to \infty} \frac{1}{\epsilon_N}\, E\left[Y^{(N)}_1 - x_N \,\middle|\, Y^{(N)}_0 = x_N\right]$$
$$\text{(b)} \quad a(x) = \lim_{N \to \infty} \frac{1}{\epsilon_N}\, E\left[\left(Y^{(N)}_1 - x_N\right)^2 \,\middle|\, Y^{(N)}_0 = x_N\right]$$
$$\text{(c)} \quad 0 = \lim_{N \to \infty} \frac{1}{\epsilon_N}\, E\left[\left|Y^{(N)}_1 - x_N\right|^e \,\middle|\, Y^{(N)}_0 = x_N\right] \quad \text{for some } e > 2.$$

Provided that these conditions are satisfied, we can draw the following conclusion.

Theorem (cont.). If $X^{(N)}$ is the continuous-time process defined by piecewise-constant interpolation of $Y^{(N)}$, i.e.,
$$X^{(N)}_t = Y^{(N)}_{\lfloor t/\epsilon_N \rfloor},$$
then the finite-dimensional distributions of $X^{(N)}$ converge to those of the diffusion process $X$ with initial distribution $\mu$ and infinitesimal drift and variance coefficients $b(x)$ and $a(x)$, respectively.

Interpretation: The theorem tells us that we can regard the diffusion process $X$ as an approximation to the Markov chain $Y^{(N)}$ for large values of $N$. In particular, we can use $X$ to find approximate expressions for the stationary distribution and hitting probabilities of $Y^{(N)}$. Furthermore, the accuracy of these approximations will typically be of order $O(\epsilon_N)$, i.e., the smaller the jumps of the chain $Y^{(N)}$, the more accurately it is approximated by a diffusion process.

Example: Recall that a Galton-Watson process with offspring distribution $\nu$ is a discrete-time Markov chain $Z = (Z_n : n \geq 0)$ defined by setting
$$Z_{n+1} = \sum_{k=1}^{Z_n} \xi_{n,k},$$
where the $\xi_{n,k}$, $k, n \geq 0$, are i.i.d. non-negative integer-valued random variables with distribution $\nu$. Now suppose that for each $N \geq 1$, $Z^{(N)} = (Z^{(N)}_n : n \geq 0)$ is a Galton-Watson process with offspring distribution $\nu_N$ and that
$$E\left[\xi^{(N)}_{n,k}\right] = 1 + \frac{\gamma}{N}, \qquad \mathrm{Var}\left(\xi^{(N)}_{n,k}\right) = \sigma^2 < \infty, \qquad E\left[\left(\xi^{(N)}_{n,k}\right)^4\right] = M < \infty.$$
In other words, when $N$ is large and $\gamma \neq 0$, $Z^{(N)}$ is almost a critical branching process.

Define the processes $Y^{(N)}$ by setting $Y^{(N)}_n = \frac{1}{N} Z^{(N)}_n$, i.e., think of $Y^{(N)}_n$ as the density of the population at time $n$. Then, provided that $Nx$ is a non-negative integer, the following identities hold:
$$E\left[Y^{(N)}_1 - x \,\middle|\, Y^{(N)}_0 = x\right] = \frac{1}{N}\, E\left[\sum_{k=1}^{Nx} \left(\xi^{(N)}_{0,k} - 1\right)\right] = \frac{1}{N}\, \gamma x$$
$$E\left[\left(Y^{(N)}_1 - x\right)^2 \,\middle|\, Y^{(N)}_0 = x\right] = \frac{1}{N^2}\, E\left[\left(\sum_{k=1}^{Nx} \left(\xi^{(N)}_{0,k} - 1\right)\right)^2\right] = \frac{1}{N}\, \sigma^2 x + \frac{1}{N^2}\, \gamma^2 x^2.$$

Furthermore, by exploiting the independence of the offspring numbers, we can establish the following bound on the fourth moment of the increment:
$$E\left[\left(Y^{(N)}_1 - x\right)^4 \,\middle|\, Y^{(N)}_0 = x\right] = \frac{1}{N^4}\, E\left[\left(\sum_{k=1}^{Nx} \left(\xi^{(N)}_{0,k} - 1\right)\right)^4\right] = O\left(N^{-2}\right),$$
since expanding the fourth power of the sum and applying the moment conditions on $\nu_N$ shows that the dominant contributions grow like $(Nx)^2$.

Remark: It is often easier to estimate the fourth moment of the increment than its absolute third moment.

It follows that if we take $\epsilon_N = N^{-1}$, then condition (3) of the preceding theorem is satisfied with drift and variance coefficients
$$b(x) = \gamma x, \qquad a(x) = \sigma^2 x.$$
Clearly condition (1) is satisfied as well. Thus, provided that the initial values $Z^{(N)}_0$ are chosen so that the distributions of the variables $N^{-1} Z^{(N)}_0$ converge weakly to a probability distribution $\mu$ on $[0, \infty)$, the finite-dimensional distributions of the sequence of processes $X^{(N)}$ defined by
$$X^{(N)}_t = \frac{1}{N}\, Z^{(N)}_{\lfloor Nt \rfloor}$$
will converge to those of the diffusion process $X$ with the drift and variance coefficients shown above. $X$ is called a continuous-state branching process and is said to be subcritical, critical, or supercritical according to whether $\gamma < 0$, $\gamma = 0$, or $\gamma > 0$, respectively. If $\gamma = 0$, then $X$ is also called the Feller diffusion.
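The convergence can be illustrated by simulation. The sketch below (illustrative names) uses a critical Galton-Watson process with Geometric(1/2) offspring, chosen because it has mean 1 and $\sigma^2 = 2$ and is easy to sample with the standard library, and compares the rescaled mean and variance with the Feller-diffusion predictions $E[X_t] = x_0$ and $\mathrm{Var}(X_t) = \sigma^2 x_0 t$:

```python
import random

def gw_generation(z, rng):
    """One generation of a Galton-Watson process in which each of the z
    individuals has a Geometric(1/2) number of offspring on {0, 1, 2, ...}
    (mean 1, variance sigma^2 = 2), so the process is critical (gamma = 0)."""
    total = 0
    for _ in range(z):
        while rng.random() < 0.5:  # count failures before the first success
            total += 1
    return total

def rescaled_density(N, x0, t, rng):
    """X_t = Z_{floor(Nt)} / N with Z_0 = N * x0."""
    z = int(N * x0)
    for _ in range(int(N * t)):
        z = gw_generation(z, rng)
    return z / N

rng = random.Random(5)
N, x0, t = 100, 1.0, 0.3
samples = [rescaled_density(N, x0, t, rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# Feller diffusion predicts E[X_t] = 1 and Var(X_t) = 2 * 1 * 0.3 = 0.6
```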

Stochastic Differential Equations

Diffusion Processes and Stochastic Differential Equations

Suppose that a system of interest is modeled by an ordinary differential equation of the form
$$\dot{X}_t = b(X_t),$$
subject to some initial condition $X_0 = x_0$. Except in special cases, we cannot solve this equation in closed form and must instead resort to numerical solutions obtained by discretizing the ODE. The simplest such discretization is known as Euler's method and takes the form
$$X_{t+h} = X_t + b(X_t)\, h,$$
where $h$ is chosen as small as possible. This recursion can be solved explicitly and, provided that the function $b(x)$ is sufficiently regular, it can be shown that as $h$ tends to zero, the sequence of solutions obtained from the difference approximation converges to the solution of the original ODE, at least on compact time intervals.

Now let us consider the consequences of adding noise to Euler's method, i.e., suppose that $X^{(h)}$ is a process which solves the following stochastic difference equation:
$$X^{(h)}_{t+h} = X^{(h)}_t + b\left(X^{(h)}_t\right) h + \sqrt{a\left(X^{(h)}_t\right)}\, \xi^{(h)}_t.$$
Here
- $a(x)$ is a non-negative continuous function;
- $(\xi^{(h)}_{nh} : n \geq 0)$ is a sequence of independent, identically-distributed random variables with mean 0 and variance $h$.
In other words, the process $X^{(h)}$ has a tendency to move in the direction prescribed by the vector field $b(x)$, but at each instant it is also subject to a random perturbation which is unbiased (i.e., mean zero) and which has variance of order $h$.

Because noise is injected at every step, it is no longer the case that the solutions $X^{(h)}$ converge to the solution of the original ODE. Instead, if we let $h$ tend to zero, then we can use the preceding theorem to show that $X^{(h)}$ tends to a diffusion process $X$ with infinitesimal variance $a(x)$ and drift $b(x)$. For this reason, diffusion processes are often thought of as solutions of stochastic differential equations, which can be written either in the form
$$dX_t = b(X_t)\, dt + \sqrt{a(X_t)}\, dW_t$$
or, equivalently, as stochastic integral equations
$$X_t = X_0 + \int_0^t b(X_s)\, ds + \int_0^t \sqrt{a(X_s)}\, dW_s.$$
Here $dW_t$ is the stochastic differential of the Wiener process $W = (W_t : t \geq 0)$, which we can think of as an infinitesimal increment $W_{t+\delta t} - W_t$ that is normally distributed with mean 0 and variance $\delta t$.
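The stochastic difference equation above is precisely the Euler-Maruyama scheme for this SDE. A generic sketch (illustrative names), with a sanity check in one case where the scheme is exact:

```python
import math
import random

def euler_maruyama(x0, b, a, T, n, rng):
    """Euler discretization of dX = b(X) dt + sqrt(a(X)) dW on [0, T]."""
    dt = T / n
    x = x0
    for _ in range(n):
        x += b(x) * dt + math.sqrt(max(a(x), 0.0) * dt) * rng.gauss(0.0, 1.0)
    return x

# Sanity check: with b = 0 and a = 1 the scheme reproduces standard
# Brownian motion exactly, so X_1 ~ N(0, 1).
rng = random.Random(2)
xs = [euler_maruyama(0.0, lambda x: 0.0, lambda x: 1.0, 1.0, 100, rng)
      for _ in range(5000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
```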

If $X = (X_t : t \geq 0)$ is a stochastic process with continuous sample paths and $W$ is a Wiener process, then the Ito stochastic integral of $X$ with respect to $W$ is defined by taking the following limit:
$$\int_0^t X_s\, dW_s = \lim_{h \to 0} \sum_n X_{s_n}\left(W_{s_{n+1}} - W_{s_n}\right),$$
where $0 = s_0 < s_1 < \cdots < s_N = t$ is any partition of $[0, t]$ with step sizes less than $h$. Under suitable conditions on $X$, it can be shown that this limit exists in probability and that
$$E\left[\int_0^t X_s\, dW_s\right] = 0, \qquad \mathrm{Var}\left(\int_0^t X_s\, dW_s\right) = \int_0^t E\left[X_s^2\right] ds.$$

Because of the extreme irregularity of the sample paths of Brownian motion, Ito integrals do not obey the ordinary rules of calculus. In particular, the usual change-of-variables formula does not generally apply to stochastic integrals. For example, if $W$ is a Wiener process, then
$$\int_0^t W_s\, dW_s = \frac{1}{2} W_t^2 - \frac{1}{2} t.$$
To see where the anomalous second term comes from, partition $[0, t]$ into $N$ subintervals of length $t/N$ and consider the difference approximation
$$\sum_{n=1}^{N} W_{t_n}\left(W_{t_{n+1}} - W_{t_n}\right) = \frac{1}{2} \sum_n \left(W_{t_{n+1}} + W_{t_n}\right)\left(W_{t_{n+1}} - W_{t_n}\right) - \frac{1}{2} \sum_n \left(W_{t_{n+1}} - W_{t_n}\right)^2.$$

The first term on the right-hand side telescopes:
$$\frac{1}{2} \sum_n \left(W_{t_{n+1}} + W_{t_n}\right)\left(W_{t_{n+1}} - W_{t_n}\right) = \frac{1}{2} \sum_n \left(W_{t_{n+1}}^2 - W_{t_n}^2\right) = \frac{1}{2}\left(W_t^2 - W_0^2\right).$$
Likewise, the second term can be written as
$$\frac{1}{2} \sum_{n=1}^{N} \left(W_{t_{n+1}} - W_{t_n}\right)^2 = \frac{1}{2} \cdot \frac{1}{N} \sum_{n=1}^{N} \Delta^{(N)}_n,$$
where $\Delta^{(N)}_n = N\left(W_{t_{n+1}} - W_{t_n}\right)^2$ is a random variable with mean
$$E\left[\Delta^{(N)}_n\right] = N\, \mathrm{Var}\left(W_{t_{n+1}} - W_{t_n}\right) = N \cdot \frac{t}{N} = t.$$

Moreover, since the variables $\Delta_n^{(N)}$, $n = 1, \ldots, N$, are independent, the second term involves an average of i.i.d. random variables, each with mean $t$, and by the strong law of large numbers this average converges to $t$ almost surely:
$$\lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^N \Delta_n^{(N)} = t \quad \text{(a.s.)}$$
This is the source of the second term in the stochastic integral $\int W_s\,dW_s$.
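This convergence of the quadratic variation to $t$ can be seen directly in simulation. A small sketch (illustrative, not from the slides):

```python
import math
import random

rng = random.Random(41)

def quadratic_variation(t, N, rng):
    """Sum of squared Wiener increments over a partition of [0, t] into N pieces.
    Each increment is N(0, t/N), so the sum has mean t and variance 2 t^2 / N."""
    dt = t / N
    return sum(rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(N))

# On a fine partition the sum of squared increments is close to t = 2.0
qv_fine = quadratic_variation(2.0, 100_000, rng)
```

For a differentiable path the same sum would be of order $t^2/N \to 0$; the fact that it converges to $t > 0$ instead is exactly the irregularity of Brownian paths invoked two slides back.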

The calculation on the preceding slides is a special case of a more general result known as Ito's formula. Suppose that $X = (X_t : t \geq 0)$ is a solution to the stochastic differential equation
$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t.$$
If $f(x)$ is a twice differentiable function, then the process $Y = (Y_t : t \geq 0)$ defined by setting $Y_t = f(X_t)$ is also a diffusion process and is a solution to the following SDE:
$$df(X_t) = \left(b(X_t)\frac{\partial f}{\partial x}(X_t) + \frac{1}{2}\sigma^2(X_t)\frac{\partial^2 f}{\partial x^2}(X_t)\right)dt + \sigma(X_t)\frac{\partial f}{\partial x}(X_t)\,dW_t.$$
The additional second term in the drift, $\frac{1}{2}\sigma^2 \frac{\partial^2 f}{\partial x^2}$, is generated by the correlations between the fluctuations of the processes $W$ and $X$.
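As a worked check (this example is an addition, though standard), apply Ito's formula to $f(x) = x^2$ with $X = W$, so that $b \equiv 0$ and $\sigma \equiv 1$:
$$d\left(W_t^2\right) = \left(0 \cdot 2W_t + \frac{1}{2}\cdot 1 \cdot 2\right)dt + 2W_t\,dW_t = dt + 2W_t\,dW_t.$$
Integrating from $0$ to $t$ gives $W_t^2 = t + 2\int_0^t W_s\,dW_s$, which is exactly the identity $\int_0^t W_s\,dW_s = \frac{1}{2}W_t^2 - \frac{1}{2}t$ derived on the preceding slides.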

Population Genetics

Diffusion Processes in Population Genetics

We revisit the Wright-Fisher model, now including both mutation and selection:

- The population size is constant, with $N$ haploid individuals.
- Generations are non-overlapping.
- The population contains two alleles, $A_1$ and $A_2$, with relative fitnesses $1 + s$ and $1$. The selection coefficient $s = s(p)$ may depend on the frequency $p$ of the $A_1$ allele.
- The parents of the $N$ individuals alive in generation $t + 1$ are chosen at random and with replacement, but each $A_1$-type individual is $(1 + s)$-times more likely to be chosen than an $A_2$-type individual (fecundity selection).
- At birth, $A_1$ mutates to $A_2$ with probability $\mu_2$, while $A_2$ mutates to $A_1$ with probability $\mu_1$.

If $p_t^{(N)}$ denotes the frequency of $A_1$ present in the $t$'th generation, then the process $p^{(N)} = (p_t^{(N)} : t \geq 0)$ is a discrete-time Markov chain with transition probabilities
$$p_{ij} = P\left(p_{t+1}^{(N)} = \frac{j}{N}\,\Big|\,p_t^{(N)} = \frac{i}{N}\right) = \binom{N}{j}\left(p_i^*\right)^j\left(1 - p_i^*\right)^{N-j},$$
where
$$p_i^* = (1 - \mu_2)\,p_i\left(\frac{1 + s(p_i)}{1 + p_i\,s(p_i)}\right) + \mu_1\,(1 - p_i)\left(\frac{1}{1 + p_i\,s(p_i)}\right)$$
and $p_i = i/N$. The quantity $p_i^*$ is equal to the sum of the probability that an $A_1$-type individual is chosen as the parent and gives birth to a non-mutant offspring and the probability that an $A_2$-type individual is chosen as the parent but gives birth to a mutant offspring.
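A one-generation update of this chain is straightforward to simulate. The sketch below is illustrative and assumes a constant selection coefficient $s$ rather than the general $s(p)$; the next generation is drawn as a Binomial$(N, p_i^*)$ sample.

```python
import random

def next_generation(i, N, s, mu1, mu2, rng):
    """One Wright-Fisher step: compute p*_i (fecundity selection plus mutation
    at birth), then sample the number of A_1 offspring as Binomial(N, p*_i)."""
    p = i / N
    p_star = ((1 - mu2) * p * (1 + s) + mu1 * (1 - p)) / (1 + p * s)
    # Binomial(N, p*) draw using N independent Bernoulli trials
    return sum(1 for _ in range(N) if rng.random() < p_star)

rng = random.Random(7)
i = 50                       # start with A_1 at frequency 1/2 in a population of 100
for _ in range(100):
    i = next_generation(i, N=100, s=0.01, mu1=1e-3, mu2=1e-3, rng=rng)
```

Without mutation the states $i = 0$ and $i = N$ are absorbing, since $p^* = 0$ and $p^* = 1$ there; this foreshadows the fixation analysis at the end of the lecture.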

Although our description of the transition matrix is explicit, we have little hope of directly analyzing this model, especially when $N$ is large. Instead, we seek a diffusion approximation. We begin by calculating the first two moments of the change in the frequency of $A_1$ over a single generation:
$$E\left[p_1^{(N)} - p\,\big|\,p_0^{(N)} = p\right] = p^* - p$$
$$E\left[\left(p_1^{(N)} - p\right)^2\,\big|\,p_0^{(N)} = p\right] = \left(p^* - p\right)^2 + \frac{p^*(1 - p^*)}{N},$$
where
$$p^* = \frac{p(1 - \mu_2)(1 + s(p)) + (1 - p)\mu_1}{1 + p\,s(p)}.$$
For a non-trivial diffusion approximation to exist in the limit as $N \to \infty$, these must both tend to zero at the same rate. In this case, that rate is set by the binomial sampling variance, which is necessarily of order $1/N$.

To achieve a comparable rate in the first moment, we will assume that the two mutation rates and the selection coefficient are of order $1/N$:
$$\mu_i = \frac{\theta_i}{N}, \qquad s(p) = \frac{\sigma(p)}{N},$$
where the quantities $\theta_i$ and $\sigma(p)$ do not depend on $N$. When $N$ is large, this is tantamount to assuming that both mutation and selection are weak.

With these assumptions, we have
$$E\left[p_1^{(N)} - p\,\big|\,p_0^{(N)} = p\right] = \frac{1}{N}\left[\theta_1 - (\theta_1 + \theta_2)p + \sigma(p)p(1 - p)\right] + O\left(N^{-2}\right)$$
$$E\left[\left(p_1^{(N)} - p\right)^2\,\big|\,p_0^{(N)} = p\right] = \frac{p(1 - p)}{N} + O\left(N^{-2}\right)$$
$$E\left[\left(p_1^{(N)} - p\right)^4\,\big|\,p_0^{(N)} = p\right] = O\left(N^{-2}\right).$$
The third identity uses the fact that if $X \sim \text{Binomial}(N, p)$, then
$$E\left[(X - Np)^4\right] = Np(1 - p)\left[3p^2(2 - N) + 3p(N - 2) + 1\right] = O\left(N^2\right).$$

It follows that the rescaled processes $(p_{\lfloor Nt \rfloor}^{(N)} : t \geq 0)$ converge in distribution to a diffusion process $(p(t) : t \geq 0)$ on $[0, 1]$ with the following infinitesimal mean and variance coefficients:
$$b(p) = \theta_1 - (\theta_1 + \theta_2)p + \sigma(p)p(1 - p), \qquad a(p) = p(1 - p).$$
This process is commonly known as the Wright-Fisher diffusion.

Since this diffusion should only take values in $[0, 1]$ ($p(t)$ is the frequency of $A_1$), we should check that it cannot escape this interval. This follows from two observations:

- The variance $a(p)$ vanishes when $p = 0$ or $p = 1$, which means that the process cannot randomly fluctuate across these boundaries.
- At the boundaries, the drift $b(p)$ points into the interior of the interval: $b(0) = \theta_1 \geq 0$ and $b(1) = -\theta_2 \leq 0$.
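The Wright-Fisher diffusion can itself be simulated with the same Euler-style discretization discussed earlier in the lecture. This is a rough sketch, not from the slides; in particular, the clipping to $[0, 1]$ is a crude fix for discretization overshoot at the boundaries, where the exact diffusion needs no such correction.

```python
import math
import random

def wf_diffusion_step(p, theta1, theta2, sigma, h, rng):
    """One Euler step of the Wright-Fisher diffusion, clipped to [0, 1]."""
    b = theta1 - (theta1 + theta2) * p + sigma * p * (1 - p)   # drift b(p)
    a = p * (1 - p)                                            # infinitesimal variance a(p)
    p_new = p + b * h + math.sqrt(max(a, 0.0) * h) * rng.gauss(0.0, 1.0)
    return min(max(p_new, 0.0), 1.0)   # crude boundary handling for the discretization

rng = random.Random(61)
p = 0.5
for _ in range(10_000):                # run to diffusion time 10
    p = wf_diffusion_step(p, theta1=0.5, theta2=0.5, sigma=1.0, h=1e-3, rng=rng)
```

With both $\theta_i > 0$ the path keeps re-entering the interior, consistent with the boundary analysis above.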

Before examining how the diffusion approximation can be used, we need to address the relationship between the original model (with fixed mutation rates and selection coefficient), the sequence of rescaled models, and the diffusion approximation. The sequence of rescaled models serves an auxiliary function: it shows how the parameters in the original model should be related for the diffusion approximation to be useful. In this example, we can express the parameters appearing in the diffusion approximation in terms of the parameters of the original model:
$$\theta_i = N\mu_i, \qquad \sigma(p) = Ns(p).$$
The approximation will be more accurate for larger values of $N$ and smaller values of $\mu_i$ and $s(p)$.

Stationary Distribution of Allele Frequencies

If both mutation rates $\theta_1$ and $\theta_2$ are positive, then the Wright-Fisher diffusion has a unique stationary distribution on $[0, 1]$ with density
$$\pi(p) = \frac{1}{C}\,p^{2\theta_1 - 1}(1 - p)^{2\theta_2 - 1}e^{2\sigma(p)p},$$
where $C$ is the normalizing constant. In the neutral setting ($\sigma(p) \equiv 0$), this reduces to the density of a Beta distribution with parameters $2\theta_1$ and $2\theta_2$:
$$\pi(p) = \frac{1}{B(2\theta_1, 2\theta_2)}\,p^{2\theta_1 - 1}(1 - p)^{2\theta_2 - 1} = \frac{1}{B(2N\mu_1, 2N\mu_2)}\,p^{2N\mu_1 - 1}(1 - p)^{2N\mu_2 - 1}.$$
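A quick numerical sanity check on the neutral case (a sketch, not from the slides): the Beta$(2\theta_1, 2\theta_2)$ density should integrate to 1, and its mean should be the standard Beta mean $\theta_1/(\theta_1 + \theta_2)$. The midpoint rule below uses the parameter choice $\theta_1 = 1$, $\theta_2 = 3$ purely as an example.

```python
import math

def neutral_density(p, theta1, theta2):
    """Beta(2*theta1, 2*theta2) density: the neutral stationary law of the WF diffusion."""
    B = math.gamma(2 * theta1) * math.gamma(2 * theta2) / math.gamma(2 * (theta1 + theta2))
    return p ** (2 * theta1 - 1) * (1 - p) ** (2 * theta2 - 1) / B

# Midpoint rule on [0, 1]: total mass should be 1, mean should be theta1/(theta1+theta2)
n = 20_000
xs = [(k + 0.5) / n for k in range(n)]
mass = sum(neutral_density(x, 1.0, 3.0) for x in xs) / n
mean = sum(x * neutral_density(x, 1.0, 3.0) for x in xs) / n   # expect 1/(1+3) = 0.25
```

Here `math.gamma` supplies the Beta-function normalizer $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$.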

The neutral stationary distribution reflects the competing effects of genetic drift, which eliminates variation, and mutation, which generates variation.

[Figure: neutral stationary densities plotted against $p$, for $2N\mu = 0.1$ and $2N\mu = 10$.]

When $2N\mu_1, 2N\mu_2 > 1$, mutation dominates drift and the stationary distribution is peaked about its mean (both alleles are common). When $2N\mu_1, 2N\mu_2 < 1$, drift dominates mutation and the stationary distribution is bimodal, with peaks at the boundaries (one allele is common and one rare).

With selection and mutation, the density of the stationary distribution is
$$\pi(p) = \frac{1}{C}\,p^{2\theta_1 - 1}(1 - p)^{2\theta_2 - 1}e^{2\sigma p}.$$
Purifying selection has two consequences:

- It shifts the stationary distribution in the direction of the favored allele.
- It tends to reduce the amount of variation present at the selected locus.

[Figure: stationary densities plotted against $p$ under purifying selection, panels labeled $2Ns = -1$ and $2Ns = -2$.]

Genetic variation is often summarized by a statistic called the heterozygosity ($H$) or nucleotide diversity ($\pi$):
$$H = P\{\text{a random sample of two individuals contains two different alleles}\} = \int_0^1 2p(1 - p)\pi(p)\,dp.$$
The figure below shows that directional selection reduces heterozygosity.

[Figure: equilibrium heterozygosity $H$ plotted against $Ns$, decreasing from about 0.10 at $Ns = 0$ toward 0 as $Ns$ increases to 5.]
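In the neutral case the integral for $H$ can be evaluated in closed form using standard Beta moments (this closed form is an addition to the slides, though routine): with $a = 2\theta_1$ and $b = 2\theta_2$, one finds $E[2p(1-p)] = 2ab/((a+b)(a+b+1))$.

```python
def neutral_heterozygosity(theta1, theta2):
    """Expected heterozygosity E[2p(1-p)] under the neutral stationary law
    Beta(2*theta1, 2*theta2), via the standard Beta moment E[p(1-p)]."""
    a, b = 2.0 * theta1, 2.0 * theta2
    return 2.0 * a * b / ((a + b) * (a + b + 1.0))

H_weak = neutral_heterozygosity(0.05, 0.05)    # weak mutation: drift dominates, H is small
H_strong = neutral_heterozygosity(5.0, 5.0)    # strong mutation: H approaches 1/2
```

This matches the figure's qualitative message: diversity is low when drift dominates and approaches its maximum (1/2 for two equally mutating alleles) when mutation dominates.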

Selective Constraint and Polymorphism

One prediction of this theory is that sites that are under purifying selection should be less variable than neutrally evolving sites. The degeneracy of the genetic code illustrates this effect. Amino acids are encoded by triplets of DNA bases called codons. There are $64 = 4^3$ different codons, but only 20 amino acids, so on average there are about 3 different codons per amino acid. It follows that there are two kinds of mutations in coding DNA:

(i) A non-synonymous mutation is one that changes the encoded amino acid.
(ii) A synonymous mutation is one that changes the DNA sequence but not the encoded amino acid.

For example, the mutation TTT → TTC is synonymous, since both codons encode phenylalanine (F). The mutation TTT → TTA is non-synonymous because the amino acid changes from phenylalanine (F) to leucine (L).
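The classification in these two examples can be encoded directly. This tiny sketch includes only the codons used above; a full standard-code table has 64 entries.

```python
# Partial codon table (standard genetic code, single-letter amino acid symbols):
# TTT/TTC encode phenylalanine (F); TTA/TTG encode leucine (L).
CODON = {"TTT": "F", "TTC": "F", "TTA": "L", "TTG": "L"}

def mutation_type(codon_from, codon_to):
    """Classify a codon change as synonymous or non-synonymous."""
    return "synonymous" if CODON[codon_from] == CODON[codon_to] else "non-synonymous"
```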

Purifying Selection and Polymorphism in Coding Regions

Prediction: If synonymous mutations are generally under weaker purifying selection than non-synonymous mutations, then we would expect synonymous diversity to be greater than non-synonymous diversity. This is what is seen:

                    syn (H)    non-syn (H)    ratio (syn/non-syn)
D. melanogaster     0.0054     0.00038        14.2
H. sapiens (US)     0.0005     0.0001         5

Fixation Probabilities

If mutation is neglected ($\mu_1 = \mu_2 = 0$), then $p = 0$ and $p = 1$ are both absorbing states, and it can be shown that one of the two alleles will become fixed in the population in finite time. If the selection coefficient is constant, $\sigma(p) \equiv \sigma = Ns$, then the fixation probability of allele $A_1$ is given by
$$u(p) = \frac{1 - e^{-2\sigma p}}{1 - e^{-2\sigma}} = \frac{1 - e^{-2Nsp}}{1 - e^{-2Ns}}.$$
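This formula is simple to implement. The sketch below (not from the slides) also handles the neutral limit: expanding the exponentials as $s \to 0$ gives $u(p) \to p$, a standard fact added here for numerical robustness.

```python
import math

def fixation_probability(p, N, s):
    """u(p) = (1 - exp(-2Nsp)) / (1 - exp(-2Ns)); reduces to u(p) = p as s -> 0."""
    sigma = N * s
    if abs(sigma) < 1e-12:
        return p          # neutral case: fixation probability equals initial frequency
    return (1.0 - math.exp(-2.0 * sigma * p)) / (1.0 - math.exp(-2.0 * sigma))

u_neutral = fixation_probability(0.3, 1000, 0.0)
u_selected = fixation_probability(0.3, 1000, 0.001)   # 2Ns = 2 favors allele A_1
```

Even a modest scaled advantage ($2Ns = 2$) pushes the fixation probability well above the neutral value.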

The most important case is when a single copy of a new allele is introduced into a population, either by mutation or immigration. Then the initial frequency is $p = 1/N$ and the fixation probability of the new allele is
$$u\left(\frac{1}{N}\right) = \frac{1 - e^{-2s}}{1 - e^{-2Ns}} \approx \begin{cases} 2s & \text{if } N^{-1} \ll s \ll 1 \\ 2|s|\,e^{-2N|s|} & \text{if } -1 \ll s \ll -N^{-1}. \end{cases}$$
In particular, this shows that:

- Novel beneficial mutations are likely to be lost from a population.
- Deleterious mutations can be fixed, but only if $N|s|$ is not too large.
- Selection is dominated by genetic drift when $|s| < 1/N$.

Key result: Selection is more effective in larger populations.
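The two asymptotic regimes are easy to verify numerically. The parameter values below are illustrative choices, not from the slides.

```python
import math

def u_new_mutant(N, s):
    """Fixation probability of a single new mutant: u(1/N) = (1-e^{-2s})/(1-e^{-2Ns})."""
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-2.0 * N * s))

# Beneficial regime (1/N << s << 1): u(1/N) is approximately 2s
u_b = u_new_mutant(10_000, 0.01)
approx_beneficial = 2 * 0.01

# Deleterious regime (-1 << s << -1/N): u(1/N) is approximately 2|s| e^{-2N|s|}
s, N = -0.001, 2_000
u_d = u_new_mutant(N, s)
approx_deleterious = 2 * abs(s) * math.exp(-2 * N * abs(s))
```

Note how small both probabilities are: even a 1% beneficial mutant fixes only about 2% of the time, and the deleterious fixation probability decays exponentially in $N|s|$.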

Fixation Probabilities of New Mutants

[Figure: fixation probability of a single new mutant plotted against population size $N$ (10 to 10,000, log-log scale), for selection coefficients $s = 0.01$, $s = 0.001$, $s = 0$, $s = -0.001$, and $s = -0.002$.]

Hypothesis: In general, it is thought that non-synonymous mutations are more likely to be deleterious than synonymous mutations because they can change protein structure and function.

Prediction: If true, then synonymous substitution rates should be higher than non-synonymous substitution rates. This is, in fact, what is observed:

                 syn (yr^-1)      non-syn (yr^-1)   ratio (syn/non-syn)
influenza A      13.1 x 10^-3     3.5 x 10^-3       3.8
HIV-1            9.7 x 10^-3      1.7 x 10^-3       5.7
Hepatitis B      4.6 x 10^-5      1.5 x 10^-5       3.1
Drosophila       15.6 x 10^-9     1.9 x 10^-9       8.2
human-rodent     3.51 x 10^-9     0.74 x 10^-9      4.7

The following plot shows synonymous and non-synonymous substitution rates estimated from comparisons of human and rodent genes. In every case the non-synonymous substitution rate is less than the synonymous substitution rate.

[Figure: Substitution Rates Estimated from Human-Rodent Divergence. Non-synonymous rate plotted against synonymous rate (both x 10^-9), with all points below the diagonal. Average substitution rates: synonymous 3.51, non-synonymous 0.74. Source: Li (1997).]