
Parameter Estimation in Stochastic Chemical Kinetic Models

by

Rishi Srivastava

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Chemical Engineering) at the University of Wisconsin-Madison, 2012

Date of final oral examination: 9/20/12

The dissertation is approved by the following members of the Final Oral Committee:
James B. Rawlings, Professor, Chemical and Biological Engineering
John Yin, Professor, Chemical and Biological Engineering
Michael D. Graham, Professor, Chemical and Biological Engineering
Jennifer L. Reed, Assistant Professor, Chemical and Biological Engineering
David F. Anderson, Assistant Professor, Mathematics

Copyright by Rishi Srivastava 2012. All Rights Reserved.

To my mom Manju


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

1 Introduction
2 The stochastic chemical kinetic model
   2.1 The master equation
   2.2 Exact simulation of stochastic chemical kinetic models: Stochastic simulation algorithm (SSA)
   2.3 Application of SSA
   2.4 Approximate simulation of stochastic chemical kinetic models
      2.4.1 τ-leap and Langevin approximation
      2.4.2 Reaction equilibrium approximation
      2.4.3 Quasi-steady state approximation
   2.5 Summary
3 Parameter estimation in stochastic chemical kinetic models
   3.1 Introduction
   3.2 The negative log likelihood minimization problem
   3.3 The algorithm for the negative log likelihood minimization problem
      3.3.1 The quadratic model
      3.3.2 The notion of adequacy
      3.3.3 Obtaining number of SSA simulations N and smoothing/noise parameter R
      3.3.4 The UOBYQA-Fit algorithm
   3.4 Estimation of confidence regions
      3.4.1 The confidence region estimation algorithm
      3.4.2 The verification of confidence region algorithm
   3.5 Application: RNA dynamics in Escherichia coli
   3.6 Conclusions
4 New methods to obtain sensitivities of stochastic chemical kinetic models
   4.1 Introduction
   4.2 The estimators
   4.3 Examples
      4.3.1 Sensitivity of expected value of population of a species in a reaction network
      4.3.2 Sensitivity of the negative log likelihood function
      4.3.3 Sensitivity of a rare state probability
      4.3.4 Sensitivity of a fast fluctuating species
   4.4 Conclusions
5 Model reduction using the Stochastic Quasi-Steady-State Assumption
   5.1 Introduction
   5.2 Results
      5.2.1 Pap operon regulation
      5.2.2 Biochemical oscillator
      5.2.3 Fast fluctuation
   5.3 Conclusions
6 Conclusions and Future work

APPENDICES
Appendix A: Proof of exact likelihood of the experimental data
Appendix B: Supplementary information for Chapter 4
Appendix C: Supporting Information for Chapter 5


LIST OF TABLES

3.1 System and optimization parameters
3.2 Parameter estimates and confidence regions
4.1 Parameter values for example 4.3.1
4.2 Parameter value for example 4.3.2
4.3 Reaction stoichiometry and reaction rates for example 4.3.3
4.4 Parameter values for example 4.3.4
5.1 Reaction stoichiometry and reaction rates for pap operon regulation
5.2 Parameters for the biochemical oscillator
5.3 Comparison of the full, sQSSA reduced and dQC reduced models
5.4 Initial population and reaction rate constants for the fast fluctuation example
B.1 Experimental data for example 4.3.2, Sensitivity of negative log likelihood function


LIST OF FIGURES

2.1 Multiple SSA simulations and generation of histogram from them
2.2 Evolving probability density function
3.1 10 replicates of mRNA vs time experimental data
3.2 An ellipse and its bounding box
3.3 Verification of confidence region using bootstrapping
4.1 A typical simulation of the network involving reactions (4.1) and (4.2)
4.2 Comparison of CRN and CFD estimators
4.3 Experimental data for example 4.3.2
4.4 Convergence of sensitivity estimate from CRN estimator
4.5 Convergence of sensitivity estimate from CFD estimator
4.6 Comparison of convergences of CRN and CFD estimators
4.7 Schematic diagram of the Pap regulatory network
4.8 Reduced system in the slow time scale regime
4.9 Estimated sensitivity from the CRN, CFD and SRN estimators
4.10 A typical SSA simulation of the network of reactions (4.33)-(4.35)
4.11 Comparison of standard deviations of CRN and CFD estimators
5.1 Schematic diagram of the Pap regulatory network
5.2 Reduced system in the slow time scale regime
5.3 Comparison of full model and sQSSA slow time scale reduced model at t = 10 s
5.4 A comparison of the full model and the sQSSA reduced model of pap operon regulation
5.5 Stochastic simulation of the biochemical oscillator
5.6 Sensitivity of the full and sQSSA reduced models to parameters
5.7 Comparison of the full model simulation and the sQSSA-Ω simulation of the fast fluctuation example
5.8 The step size and the frequency of the fluctuation of SSA simulation versus time
5.9 The noise in full SSA and hybrid SSA-Ω for the rapidly increasing species
5.10 Comparison of probability density of C

ABSTRACT

Recent years have seen increasing popularity of stochastic chemical kinetic models due to their ability to explain and model several critical biological phenomena. Developments in high resolution fluorescence microscopy have enabled researchers to obtain protein and mRNA data at the single cell level. The availability of these data, along with the knowledge that the system is governed by a stochastic chemical kinetic model, leads to the problem of parameter estimation. This thesis develops a new method of parameter estimation for stochastic chemical kinetic models. There are three components of the new method. First, we propose a new expression for the likelihood of the experimental data. Second, we use sample path optimization along with UOBYQA-Fit, a variant of Powell's unconstrained optimization by quadratic approximation, for optimization. Third, we use a variant of Efron's percentile bootstrapping method to estimate the confidence regions for the parameter estimates. We apply the parameter estimation method to an RNA dynamics model of E. coli. We test the parameter estimates obtained and the confidence regions in this model. The testing demonstrates the efficiency, reliability and accuracy of the new method.

Chapter 1

Introduction

The traditional approach of modeling biological systems deterministically has seen difficulties in explaining increasing evidence of stochastic phenomena [61, 101, 107]. A deterministic model can only simulate one pathway of evolution for a particular network. However, there is usually noise in biological networks due to the presence of several components at small copy numbers. This noise can lead to phenomena usually inexplicable by a deterministic model. A stochastic model which mimics the typical noise and average behavior of these biological systems is imperative for a successful explanation of several biological phenomena. Elowitz et al. [26], in their experimental work on Escherichia coli (E. coli), have shown that the intrinsic stochastic nature of the gene expression process and cell to cell differences contribute significantly to explaining the overall variation in gene expression. This work reinforces the idea that identical cells under similar conditions (infection by genetically identical viruses under identical environmental conditions) can exhibit different expression of the same gene. Arkin et al. [5], in their work on phage λ (a virus that infects a bacterium) infected E. coli cells, have shown that a stochastic model can predict the number of lytic (destroying the cell) and lysogenic (staying dormant) types of infection. Their work suggests that stochastic fluctuations in the concentrations of two regulatory proteins acting at low concentrations can produce pathways that may ultimately lead to phenotype bifurcation. Work on HIV [107] has shown that viruses adopt stochastic networks, whereby a probabilistic pathway for active infection or latent infection may be chosen to optimize the chances of viral escape from cellular defenses. Intrinsic noise, with its ability to drive processes of significant biological interest, and stochastic chemical kinetic models, with their ability to explain intrinsic noise, have seen significant attention recently [5, 7, 26, 57, 72, 77, 85]. Intrinsic noise manifests itself when similar cells give rise to markedly

different behaviors [29, 111]. A stochastic chemical kinetic model, known in mathematics and probability theory as a continuous time discrete jump Markov chain, models such similar cells as realizations of a Markov chain. One useful tool for understanding these models is the chemical master equation, which describes the evolution of the probability density of the system. The solution of the master equation is computationally tractable only for simple systems. Rather, approximation techniques such as finite state projections [74] or the stochastic simulation algorithm (SSA) [33, 34] are employed to reconstruct a system's probability distribution and statistics (usually the mean and variance). Applying these techniques to solve models of biological processes leads to significant improvements in our understanding of intrinsic noise and its effect on cellular behavior. With the advent of fluorescence microscopy [66], which can give high-throughput data for gene expression, data at the protein level have become available. Usually proteins in cells are present at much higher copy numbers than mRNAs. Recently mRNA data [40, 41, 68] have also been quantified using fluorescence microscopy. The availability of mRNA level data, along with stochastic chemical kinetic models for the experimental systems, makes parameter identification of intracellular stochastic models possible for the first time. While the motivation of the parameter estimation method described in this thesis comes from biology, the method is general and is applicable to any scenario where a continuous time discrete jump Markov chain is the underlying model.

Overview of Thesis

Chapter 2, The stochastic chemical kinetic model: In the following chapter, we introduce the stochastic chemical kinetic modeling framework. We define the governing equation, the method to simulate it exactly, and approximate methods to simulate a stochastic chemical kinetic model.

Chapter 3, Parameter estimation in stochastic chemical kinetic models: This chapter is the main contribution of this thesis. We develop a novel method of parameter estimation in stochastic chemical kinetic models.

Chapter 4, New methods to obtain sensitivities of stochastic chemical kinetic models: Sensitivities are a powerful tool both in parameter estimation and in model analysis. This chapter compares various methods of sensitivity estimation in stochastic chemical kinetic models.

Chapter 5, Model reduction using the Stochastic Quasi-Steady-State Assumption: Reducing a stochastic chemical kinetic model can ease parameter estimation, both because the reduced model has fewer and better constrained parameters and because the reduced model simulates faster. This chapter presents applications of the stochastic Quasi-Steady-State Assumption and a new model reduction method for certain types of models.

Chapter 6, Conclusions and Future Work: We end with a summary of the contributions of this thesis and give recommendations for further work.

Chapter 2

The stochastic chemical kinetic model

2.1 The master equation

The dynamic state of a well stirred mixture of N ≥ 1 species {S_1, S_2, ..., S_N} under the influence of M ≥ 1 reaction channels {R_1, R_2, ..., R_M} can be specified as X(t) = [X_1(t), X_2(t), ..., X_N(t)], in which X_i(t) is the population of species i at time t, i ∈ {1, 2, ..., N}. A probability density P(x, t) for the random variable X can be defined as the probability of the system being in state x at time t, where we note that even though time t is a continuous variable, the random variable X takes on finitely many or countably infinitely many values. The underlying governing equation for P(x, t) is a linear first order differential equation known as the master equation, which can be written as

$$\frac{dP(x,t)}{dt} = \sum_{j=1}^{M} \left[ k_j\, a_j(x-\nu_j)\, P(x-\nu_j, t) - k_j\, a_j(x)\, P(x, t) \right] \qquad (2.1)$$

In equation (2.1), ν_j is the stoichiometric vector of the j-th reaction of the reaction network, k_j is its rate constant, and a_j is its propensity function. If we had a method whereby we could solve explicitly for the time evolution of P(x, t) for all x in the state space of the biological network, we would be done. However, solving the system of ODEs in equation (2.1), either analytically or numerically, is challenging even for moderately large state-space systems. There are several methods to solve the master equation approximately [9, 17, 27, 52, 60, 74]. However, the quality of the approximation obtained by these methods depends both upon the size of the state space and upon a good intuition about where the probability density is concentrated. Instead of solving for the time evolution of the probabilities of the different states explicitly, there is a method [32] available to generate realizations of this system. This method is known as the Gillespie direct

method or the stochastic simulation algorithm (SSA), and it is usually the preferred way to simulate complex biological networks.

2.2 Exact simulation of stochastic chemical kinetic models: Stochastic simulation algorithm (SSA)

The stochastic simulation algorithm (SSA), or Gillespie's direct method, is the most widely used method for the simulation of stochastic chemical kinetic models. The SSA is a method to obtain exact samples from the master equation. The algorithm is as follows:

Input: Initial population vector x_0 ∈ R^n, stoichiometric matrix ν ∈ R^{m×n}, in which m is the number of reactions, vector of rate constants k, end time t_end

Output: Vector of time points, T, corresponding to the jump times, and matrix X, consisting of the species populations at the jump times in T

1. Initialize t = 0, x = x_0, T(1) = 0, X(:,1) = x_0, l = 1

2. While t ≤ t_end

   (a) For each i = 1 to m, calculate the propensity a(i); for an elementary first order channel this is a(i) = k(i) x_{r(i)}, the rate constant times the population of the reactant of reaction i. Calculate r_tot = Σ_{i=1}^{m} a(i)

   (b) Generate two uniform random numbers u_1, u_2 and set τ = −log(u_1)/r_tot

   (c) Find the reaction index α such that Σ_{i=1}^{α−1} a(i) < u_2 r_tot ≤ Σ_{i=1}^{α} a(i)

   (d) Update the state using x ← x + ν(α, :)

   (e) t ← t + τ

   (f) l ← l + 1

   (g) Store the results: T(l) = t, X(:,l) = x

   (h) Go to step 2 to check the while condition

3. Return T and X
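To make the pseudocode above concrete, the following is a minimal Python/NumPy sketch of the direct method. The function name ssa_direct and the mass-action propensity convention are our own illustrative choices, not notation from the thesis.

```python
import numpy as np

def ssa_direct(x0, nu, propensities, k, t_end, rng=None):
    """Gillespie direct method (SSA).

    x0           -- initial population vector, shape (n,)
    nu           -- stoichiometric matrix, shape (m, n); row j is the
                    population change caused by one firing of reaction j
    propensities -- function (x, k) -> length-m array of propensities
    k            -- vector of rate constants
    t_end        -- simulation end time
    Returns jump times T and the states X (one row per jump).
    """
    rng = rng or np.random.default_rng()
    t, x = 0.0, np.array(x0, dtype=float)
    T, X = [t], [x.copy()]
    while t <= t_end:
        a = propensities(x, k)
        r_tot = a.sum()
        if r_tot == 0.0:                  # no reaction can fire
            break
        u1, u2 = rng.random(2)
        tau = -np.log(u1) / r_tot         # exponential waiting time, step (b)
        alpha = np.searchsorted(np.cumsum(a), u2 * r_tot)  # step (c)
        x += nu[alpha]                    # step (d)
        t += tau
        T.append(t)
        X.append(x.copy())
    return np.array(T), np.array(X)

# Usage for the reversible reaction of section 2.3:
# A -> B with rate constant k1, B -> A with rate constant k2
nu = np.array([[-1, 1], [1, -1]])
prop = lambda x, k: np.array([k[0] * x[0], k[1] * x[1]])
T, X = ssa_direct([100, 0], nu, prop, k=[2.0, 1.0], t_end=4.0)
```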

2.3 Application of SSA

Consider the following reversible reaction:

A → B, with rate constant k_1    (2.2)

B → A, with rate constant k_2    (2.3)

with n_{A0} = 100, n_{B0} = 0 and k_1 = 2, k_2 = 1. Figure 2.1(a) shows 3 different runs of the SSA algorithm described in section 2.2. Figure 2.1(b) shows the sample probability density of the system using 5000 SSA simulations at time t = 2, and illustrates the usefulness of the SSA in reconstructing the probability density function described by the master equation.

Figure 2.1: (a) Multiple SSA simulations of system (2.2)-(2.3). (b) Histogram at t = 2 sec, for 5000 SSA simulations

In fact, as the number of SSA simulations increases, the SSA-reconstructed probability density function approaches the true probability density function given by the master equation. For the same set of reactions (2.2) and (2.3), writing the master equation gives us 101 coupled ODEs in 101 states. Figure 2.2 shows the probability density, obtained by solving the master equation (2.1), as a function of the number of A molecules and time.

Figure 2.2: Evolving probability density function

At time t = 0, all the probability resides in the state n_A = 100. As time progresses, the probability starts to diffuse to other states. Finally, at time t = 4, the probability distribution is fairly symmetric, with its maximum around n_A = 50.
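Because n_A + n_B = 100 is conserved in system (2.2)-(2.3), the state space collapses to n_A ∈ {0, 1, ..., 100} and the master equation (2.1) becomes the 101 coupled ODEs mentioned above, which can be integrated directly. The following sketch does this with SciPy; the helper name cme_rhs is ours.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Master equation for A <-> B with n_A + n_B = 100, k1 = 2, k2 = 1.
# State index i corresponds to n_A = i; P[i](t) is its probability.
k1, k2, n_tot = 2.0, 1.0, 100

def cme_rhs(t, P):
    dP = np.zeros_like(P)
    for nA in range(n_tot + 1):
        a1 = k1 * nA               # propensity of A -> B in state n_A
        a2 = k2 * (n_tot - nA)     # propensity of B -> A in state n_A
        dP[nA] -= (a1 + a2) * P[nA]    # probability flowing out of n_A
        if nA > 0:
            dP[nA - 1] += a1 * P[nA]   # A -> B moves n_A down by one
        if nA < n_tot:
            dP[nA + 1] += a2 * P[nA]   # B -> A moves n_A up by one
    return dP

P0 = np.zeros(n_tot + 1)
P0[n_tot] = 1.0                    # all probability at n_A = 100 at t = 0
sol = solve_ivp(cme_rhs, (0.0, 4.0), P0, t_eval=np.linspace(0.0, 4.0, 81))
```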

2.4 Approximate simulation of stochastic chemical kinetic models

The original stochastic simulation algorithm (SSA) proposed by Gillespie [32] tracks all the microscopic reaction events individually and generates realizations of the master equation exactly. Tracking individual reaction events microscopically is both the SSA's strength and its weakness. The strength is that one gets exact realizations of the underlying micro-physical phenomena of the individual reactions; the weakness is that it becomes computationally intensive for complex biological networks. There have been several attempts to decrease the computational burden of the SSA. One improvement of Gillespie's first reaction method was proposed by Gibson and Bruck [30]. Their method exploits the memorylessness property of exponential random variables and efficient data structures, such as an indexed priority queue and dependency graphs, to perform the SSA. Next we discuss three classes of model reduction normally found in the stochastic modeling literature. We briefly discuss each and describe their relevance to the biological systems of interest here.

2.4.1 τ-leap and Langevin approximation

To simulate multiple reaction events in a single step, Gillespie [35] has proposed the τ-leap method. The τ-leap method essentially relies on a large population of all species and a specific selection method for τ. The time interval τ is chosen in such a way that within this interval the number of times each reaction channel fires is large, yet none of the reaction propensities change appreciably. Under these conditions the number of times each reaction channel fires is given by a Poisson random variable. However, the τ selection method of [35] can lead to negative populations of species. Many researchers [1, 3, 11, 12, 37, 58, 59, 63, 86] have proposed theoretical analyses and improvements of the τ-leap method that improve the efficiency and address the negative population problem of the original τ-leap method [35]. An extension of the τ-leap method, where under suitable conditions one can approximate the number of times each reaction fires with a normally distributed random variable instead of a Poisson random variable, gives rise to the well known Langevin approach. However, it should be noted that both the τ-leap and the Langevin approach as described by Gillespie [35] rely on a large population of every species, which is rarely the case in the biological systems of interest here.
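A minimal sketch of the basic explicit τ-leap update is shown below, assuming a fixed, user-chosen leap size τ rather than Gillespie's adaptive selection rule. Clipping negative populations to zero, done here only to keep the sketch short, is exactly the failure mode that the improved methods cited above handle more carefully.

```python
import numpy as np

def tau_leap(x0, nu, propensities, k, t_end, tau, rng=None):
    """Explicit tau-leaping: over each interval of length tau, fire channel
    j a Poisson(a_j(x) * tau) number of times, with propensities frozen."""
    rng = rng or np.random.default_rng()
    t, x = 0.0, np.array(x0, dtype=float)
    T, X = [t], [x.copy()]
    while t < t_end:
        a = propensities(x, k)
        n_fires = rng.poisson(a * tau)       # firings of each channel
        x = np.maximum(x + n_fires @ nu, 0)  # crude guard against negatives
        t += tau
        T.append(t)
        X.append(x.copy())
    return np.array(T), np.array(X)
```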

2.4.2 Reaction equilibrium approximation

A common challenge in simulation with Gillespie's SSA arises from the presence of both fast and slow reactions in the same reaction network. This problem is commonly known as the reaction equilibrium problem. A fast reaction fires many times within a fast time scale, whereas a slow reaction fires rarely on this time scale. A full SSA on such systems spends the majority of its run time simulating these fast reactions, which are usually transitory in nature and are not of any particular interest for the evolution of the system on the slow time scale. There have been efforts that separate the fast and slow subsystems to reduce the simulation burden of the full SSA [10, 20-22, 50, 51]. Mastny [70] describes an issue in the slow scale SSA approach adopted by Cao et al. [10] and how one can circumvent this problem. Several papers have addressed networks with both fast and slow reactions [43, 92-94]. However, most of the systems of interest here do not have a clear cut partitioning among the reactions, and many of the reactions in the network involve species at low copy numbers. As shown by Griffith et al. [44], it is incorrect to put fast reactions involving species at low copy numbers in a fast reaction subset.

22 10 species is Markovian seems to be rather ad-hoc and it is unclear if this approximation is indeed true in general. Mastny et al. [71] have recently treated QSSA in stochastic settings by application of singular perturbation analysis. They have shown that in the limit of a certain parameter approaching to zero, their reduced model indeed converges to the full model. Appearance of the Quasi-Steady-State(QSS) is also not common in the systems of interest here. Even when the system does have a QSS, identification of the QSS species is not straight forward. 2.5 Summary In this chapter we introduced the master equation, the probability evolution equation for stochastic chemical kinetic models. We described the SSA, a method to generate exact samples from the master equation. Finally, we described approximate methods to simulate stochastic chemical kinetic models and the their relevance to the systems of interest here.

Chapter 3

Parameter estimation in stochastic chemical kinetic models

3.1 Introduction

Unlike ordinary differential equation based models, for which fairly robust and efficient parameter estimation techniques exist, there are not many such techniques for stochastic chemical kinetic models. Boys et al. [8] propose generating many samples of the full master equation consistent with the given measurement. They then use Markov chain Monte Carlo to obtain the posterior distribution of the parameters. The first step is computationally intractable for the models of interest here. Golightly et al. [42] use the Fokker-Planck approximation of the master equation. This diffusion approximation is not generally applicable in stochastic chemical kinetics. Tian et al. [102] express the likelihood p(y|θ) as a product of transition densities, p(y|θ) = Π_{i=1}^{n} p(y_{i+1}|y_i, θ). Each p(y_{i+1}|y_i, θ) is evaluated using 5000 SSA simulations. A genetic algorithm is used to maximize p(y|θ). This procedure is computationally inefficient because 5000 SSA simulations are used for each transition. Reinker et al. [89] calculate the likelihood analytically using an artificial maximum number of reactions that can occur within a given time interval. They use a quasi-Newton method to maximize the likelihood. The assumption about the maximum number of reactions is unrealistic. Both Sisson et al. [99] and Toni et al. [103] use an approximate Bayesian computation approach, but their approach requires the use of summary statistics and a distance metric. It is difficult to extend their approach to the stochastic chemical kinetic setting. Poovathingal et al. [81] propose to evaluate the likelihood using the solution to the master equation. Their proposed function is not the likelihood, but some other merit function. They estimate the solution of the master equation by SSA simulations. This is computationally intensive and requires a binning strategy. They use directed evolution to optimize. Henderson et al. [53] replace the stochastic chemical kinetic model with a statistical model, and use the statistical model for obtaining parameter estimates. The replacement with the statistical model limits their method to the cases where

such a replacement is accurate and possible. Wang et al. [110] propose a stochastic gradient descent method that requires generating many samples of the master equation consistent with the measurement. They then use reversible jump Markov chain Monte Carlo to obtain the posterior distribution of the parameters. The first step is computationally intractable for the models of interest here.

We present here a novel method of parameter estimation in stochastic chemical kinetic models. As noted by Wets [19, 108], the method developed here employs a tight integration of statistical sampling and mathematical optimization. The method is based on constructing a new expression for the likelihood of the data. The likelihood expression is in the form of an expectation of a smooth function of the data, the model parameters, and a smoothing parameter. We then use the sample path method [28, 46, 47, 55, 56, 78, 79, 96-98] to approximate the likelihood, which is in the form of an expected value, by a sample average. The basic idea behind the sample path method is that in calculating the objective function, i.e. the likelihood, we use common random numbers. Since the gradient and Hessian of the likelihood are hard to estimate, both because of the computation required and the inaccuracy of the estimates, we pursue a derivative free optimization method, UOBYQA [83]. For some other derivative free methods, see references [13, 14, 69]. We adapt UOBYQA to UOBYQA-Fit, which suits the optimization problem of interest here. Unlike Deng and Ferris [16], who assume that the objective function values at the points of the fitting set are components of a multivariate normal to obtain the number of sample paths for different iterations, we keep the number of sample paths fixed.

Point estimates of the parameters alone are not useful when one is dealing with a stochastic model. A confidence region around the point estimate tells us the precision of the point estimate and how much information is contained in the experimental data. We will discuss in a subsequent section that neither the finite sample distribution nor the asymptotic distribution of the parameter estimates is obtainable. Bootstrap estimation of confidence regions becomes handy when one does not have much information about the distribution of the point estimates. We use a variant of Efron's percentile method [23-25] to estimate the parameter confidence region. Along with estimation of the confidence region, we test the quality of this confidence region. The successful testing of the confidence region is a reliable indicator that UOBYQA-Fit, along with the variant of the percentile method, solves the parameter estimation problem reliably and accurately.

This chapter is arranged as follows. In section 3.2, we provide the expression for the likelihood of the data and describe the optimization problem. Section 3.3 describes the UOBYQA-Fit algorithm and provides the pseudo-code for it. Section 3.4 describes the bootstrapping confidence estimation algorithm and provides the pseudo-code for both the confidence region estimation algorithm and the verification of the confidence region algorithm. An application of the technique described in this chapter is given in section 3.5. Finally, section 3.6 discusses the conclusions of this chapter and summarizes the contributions.

3.2 The negative log likelihood minimization problem

To obtain the parameter estimates, we pursue likelihood maximization. The likelihoods of the data are poorly conditioned numbers, and to improve the conditioning we pursue an equivalent problem to likelihood maximization, the minimization of the negative log likelihood. We show in Appendix A that the exact likelihood of the experimental data, composed of m replicates y = {y_1, y_2, ..., y_m}, is given by

$$L(y \mid \theta) = \lim_{R \to 0} \frac{1}{(2\pi)^{mn_d/2}\, |R|^{m/2}} \prod_{j=1}^{m} E_\theta\!\left[ e^{-\frac{1}{2}(y_j - x)' R^{-1} (y_j - x)} \right] \qquad (3.1)$$

in which x is the random vector of the populations of the species coming from the SSA simulation of the model, m is the number of replicates, n_d is the number of sample points in each replicate, R is an n_d × n_d positive definite matrix representing both measurement noise and smoothing, and θ is the vector of parameters that we are trying to estimate. For any positive definite value of R, the right hand side of equation (3.1) before taking the limit is an approximation to the exact likelihood. For such a value of R ≠ 0, using the sample mean as the estimator for the expectation in equation (3.1), we obtain an estimator of L(y|θ)

$$\hat{L}(y \mid \theta, N) = \frac{1}{(2\pi)^{mn_d/2}\, |R|^{m/2}\, N^m} \prod_{j=1}^{m} \sum_{i=1}^{N} e^{-\frac{1}{2}(y_j - x_i(\theta))' R^{-1} (y_j - x_i(\theta))} \qquad (3.2)$$

As N → ∞ and R → 0, the estimated likelihood L̂(y|θ, N) approaches the exact likelihood L(y|θ). The estimate of the negative log likelihood, which we use as the objective function for minimization, is

$$\hat{\phi}(\theta, N) = -\log \hat{L}(y \mid \theta, N) = -\sum_{j=1}^{m} \log \left[ \frac{1}{N (2\pi)^{n_d/2} |R|^{1/2}} \sum_{i=1}^{N} e^{-\frac{1}{2}(y_j - x_i(\theta))' R^{-1} (y_j - x_i(\theta))} \right] \qquad (3.3)$$
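The sketch below evaluates the estimator (3.3) for the common case R = αI, working in log space with a log-sum-exp guard against underflow; the guard is our addition, motivated by the conditioning remark above. Here x_samples[i] holds the model output of the i-th SSA sample path at the n_d measurement times.

```python
import numpy as np
from scipy.special import logsumexp

def neg_log_likelihood(y, x_samples, alpha):
    """phi_hat of equation (3.3) with R = alpha * I.

    y         -- experimental data, shape (m, n_d): m replicates
    x_samples -- SSA model outputs, shape (N, n_d): N sample paths
    alpha     -- scalar smoothing/noise parameter
    """
    m, n_d = y.shape
    N = x_samples.shape[0]
    # log Gaussian kernel for every (replicate j, sample path i) pair
    sq = ((y[:, None, :] - x_samples[None, :, :]) ** 2).sum(axis=2)  # (m, N)
    log_kernels = -0.5 * sq / alpha
    log_norm = -np.log(N) - 0.5 * n_d * np.log(2.0 * np.pi * alpha)
    # phi_hat = -sum_j log[ normalizer * sum_i kernel_ij ]
    return -np.sum(log_norm + logsumexp(log_kernels, axis=1))
```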

The parameter estimates are then given by the minimization of the negative log likelihood

$$\hat{\theta} = \arg\min_{\theta}\, \hat{\phi}(\theta, N) \qquad (3.4)$$

3.3 The algorithm for the negative log likelihood minimization problem

We rewrite equation (3.4) to denote the dependence on the randomness ω, which itself comprises all the random numbers used to generate the N SSA sample paths:

$$\hat{\theta}(N, \omega) = \arg\min_{\theta}\, \hat{\phi}(\theta, N, \omega) \qquad (3.5)$$

To use sample path optimization, we freeze the N streams of random numbers and use these streams repeatedly to calculate the value of φ̂(θ, N, ω) for different values of θ. The freezing of the random numbers means that the problem (3.5) becomes a deterministic optimization problem. Furthermore, we define θ to be the logarithms of the rate constants to keep the deterministic optimization problem unconstrained. We develop UOBYQA-Fit, a variant of Powell's UOBYQA [83], for solving the sample path optimization problem (3.5). The lack of good derivative estimates for the objective function φ̂(θ, N, ω) makes derivative free methods like UOBYQA attractive here. Like UOBYQA, UOBYQA-Fit is a model based approach which constructs a series of quadratic models approximating the objective function. Like UOBYQA, UOBYQA-Fit uses a trust region framework [73, 76]. However, unlike UOBYQA, which does exact interpolation to obtain the quadratic model, UOBYQA-Fit fits the quadratic model to the sample points. As described in section 3.3.2, the use of fitting instead of exact interpolation enables UOBYQA-Fit to have a fairly mild adequacy criterion for the fitting points.

3.3.1 The quadratic model

At every iteration of the algorithm, UOBYQA-Fit fits a quadratic model

$$Q(\theta) = c + g'(\theta - \theta_k) + \frac{1}{2}(\theta - \theta_k)' G (\theta - \theta_k) \qquad (3.6)$$

to an adequate set of sample points S_k = {z_1, z_2, ..., z_L}. The value of the objective function is obtained using (3.3) along with N SSA simulations. We describe the notion of adequacy and how we obtain the number N in sections 3.3.2 and 3.3.3, respectively. The fitting procedure solves the following

minimization problem:

$$\{c_k, g_k, G_k\} = \arg\min_{c, g, G} \sum_{i=1}^{L} \left( Q(z_i) - \hat{\phi}(z_i, N, \omega) \right)^2 \qquad (3.7)$$

The point θ_k is the center of the trust region, c_k is a scalar, g_k is a vector in R^n, and G_k is a symmetric matrix in R^{n×n}. The quadratic model is expected to approximate the function φ̂ around θ_k. The number of points in the fitting set S_k is chosen to be double the number of points that determine a unique quadratic, i.e.

$$L = (n+1)(n+2) \qquad (3.8)$$

The fitted quadratic that is obtained after solving minimization problem (3.7) is

$$Q_k(\theta) = c_k + g_k'(\theta - \theta_k) + \frac{1}{2}(\theta - \theta_k)' G_k (\theta - \theta_k) \qquad (3.9)$$

3.3.2 The notion of adequacy

Unlike UOBYQA, which requires that no nonzero quadratic function vanish at all the interpolation points, we impose an adequacy criterion that is relatively mild. The points in the fitting set must neither be too close to each other, nor too far away from the current iterate θ_k. We ensure that any two points in the fitting set are no closer than δ_k/L, where closeness is measured by the Euclidean distance. We also ensure that ||z_i − θ_k|| ≤ 2δ_k for all z_i ∈ S_k. The adequacy criterion ensures that the fitted quadratic model Q_k is a good local approximation to the function φ̂ in the neighborhood of θ_k.
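Since Q is linear in the coefficients (c, g, G), the fit (3.7) is an ordinary linear least-squares problem. A sketch, with our own helper name fit_quadratic:

```python
import numpy as np

def fit_quadratic(Z, phi_vals, theta_k):
    """Least-squares fit of Q(theta) = c + g'd + 0.5 d'G d, d = theta - theta_k,
    to objective values phi_vals at the fitting points Z (one point per row)."""
    D = Z - theta_k                        # displacements from trust center
    n = D.shape[1]
    cols = [np.ones(len(D))]               # basis column for c
    cols += [D[:, i] for i in range(n)]    # basis columns for g
    for i in range(n):                     # basis columns for symmetric G
        for j in range(i, n):
            fac = 0.5 if i == j else 1.0   # off-diagonals appear twice in d'Gd
            cols.append(fac * D[:, i] * D[:, j])
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, phi_vals, rcond=None)
    c, g = coef[0], coef[1:n + 1]
    G = np.zeros((n, n))
    idx = n + 1
    for i in range(n):
        for j in range(i, n):
            G[i, j] = G[j, i] = coef[idx]
            idx += 1
    return c, g, G
```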

3.3.3 Obtaining number of SSA simulations N and smoothing/noise parameter R

As is evident from equation (3.3), the estimation of the negative log likelihood requires the values of N and R. In the absence of any external information about measurement noise, we use

$$R = \alpha I_{n_d \times n_d} \qquad (3.10)$$

in which the value of α depends upon N through

$$\alpha = \frac{c}{N} \qquad (3.11)$$

The value of c is a problem dependent parameter. To obtain N, we run the UOBYQA-Fit algorithm described in the next section with an increasing sequence of values N = N_i, i ∈ {1, 2, 3, ...}, to obtain a sequence of optimal parameter estimates θ_i. We choose the smallest N = N_i such that ||θ_i − θ_{i−1}|| ≤ ε, in which ε is a small number. With c and the obtained N, we use equation (3.11) to get the value of α. Together, α, which determines R through equation (3.10), and N, the number of SSA simulations, are the two parameters required for the estimation of φ̂ in equation (3.3).

3.3.4 The UOBYQA-Fit algorithm

In this section we present the core algorithm, which inherits several basic features of the UOBYQA algorithm. For complete details of UOBYQA, see [83]. Starting the algorithm requires a starting point θ_0 and an initial trust region radius δ_0. At iteration k, we fit the quadratic model (3.6) as described in section 3.3.1. As in a classic trust region method, a new promising point is obtained by solving the sub-problem

$$\min_{s \in R^n} Q_k(\theta_k + s), \quad \text{subject to } \|s\| \le \delta_k \qquad (3.12)$$

The new promising point θ_k + s* is accepted if the degree of agreement

$$\rho_k = \frac{\hat{\phi}(\theta_k) - \hat{\phi}(\theta_k + s^*)}{Q_k(\theta_k) - Q_k(\theta_k + s^*)} \qquad (3.13)$$

is large enough; otherwise the next iterate is θ_{k+1} = θ_k. If ρ_k is large enough, which indicates a good match between the quadratic model Q_k and the function φ̂, the point θ_k + s* is set as iterate k + 1 and is put inside the fitting set S_{k+1}.

UOBYQA-Fit algorithm

Input

Objective function φ̂, which depends on the model, the experimental data, N, and R

Trust region parameters 0 ≤ η_a ≤ η_0 ≤ η_1, 0 ≤ γ_0 ≤ γ_1 ≤ 1 ≤ γ_2, δ_0, δ_end

Starting point θ_0

Output

Parameter estimate θ* and the objective function value φ* at the parameter estimate θ*

1. Generate the initial fitting set S_1. Using equation (3.3), evaluate φ̂ for each point in the fitting set S_1. The first iterate is the point θ_1 ∈ S_1 that minimizes φ̂ over the points in the set S_1

2. For iteration k = 1, 2, ...

   (a) Construct a quadratic model of the form (3.9) which fits the points in S_k by solving the optimization problem (3.7)

   (b) Solve the trust region problem (3.12). Evaluate φ̂ at the new point θ_k + s*, and compute the agreement ratio ρ_k defined in equation (3.13)

   (c) If ρ_k ≥ η_1, increase the trust radius using δ_k ← δ_k (1 + γ_2)/2; otherwise, decrease the trust radius using δ_k ← δ_k (γ_0 + γ_1)/2

   (d) If ρ_k ≥ η_0, accept the point θ_k + s* as the next iterate, θ_{k+1} ← θ_k + s*; otherwise, set the next iterate to the current iterate, θ_{k+1} ← θ_k

   (e) If ρ_k ≤ η_a, improve the quality of the fitting points in S_k as described in section 3.3.2

   (f) Check whether any of the termination criteria is satisfied; otherwise repeat the loop. The termination criteria include hitting the limit on the number of φ̂ evaluations and δ_k ≤ δ_end

3. Evaluate and return the final solution point θ* and the value of the objective function φ̂ at θ*
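The following is a compressed, runnable skeleton of the iteration above, reusing fit_quadratic from section 3.3.1. Two simplifications are ours: the trust region subproblem (3.12) is solved with a crude Cauchy step, and the adequacy maintenance of steps 1 and 2(e) is replaced by uniform resampling of the fitting set. It is a sketch of the control flow, not the thesis implementation.

```python
import numpy as np

def cauchy_step(g, G, delta):
    """Stand-in for subproblem (3.12): minimize Q along -g within radius delta."""
    gn = np.linalg.norm(g)
    if gn == 0.0:
        return np.zeros_like(g)
    gGg = g @ G @ g
    t = delta / gn if gGg <= 0.0 else min(gn ** 2 / gGg, delta / gn)
    return -t * g

def uobyqa_fit_loop(phi, theta0, delta0=1.0, delta_end=1e-4, max_iter=200,
                    eta0=0.1, eta1=0.7, gamma0=0.5, gamma1=0.9, gamma2=2.0,
                    rng=None):
    """Trust-region skeleton of UOBYQA-Fit (steps 2a-2d and 2f).
    phi(theta) is the sample-average objective with frozen random streams."""
    rng = rng or np.random.default_rng()
    theta = np.asarray(theta0, dtype=float)
    n = theta.size
    L = (n + 1) * (n + 2)                       # fitting set size, eq. (3.8)
    delta = delta0
    for _ in range(max_iter):
        if delta <= delta_end:                  # termination criterion (2f)
            break
        Z = theta + delta * rng.uniform(-1, 1, size=(L, n))   # fitting set
        c, g, G = fit_quadratic(Z, np.array([phi(z) for z in Z]), theta)
        s = cauchy_step(g, G, delta)            # step 2b
        pred = -(g @ s + 0.5 * s @ G @ s)       # Q_k(theta_k) - Q_k(theta_k + s)
        rho = (phi(theta) - phi(theta + s)) / pred if pred > 0 else -np.inf
        delta *= (1 + gamma2) / 2 if rho >= eta1 else (gamma0 + gamma1) / 2
        if rho >= eta0:                         # step 2d: accept the step
            theta = theta + s
    return theta, phi(theta)
```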

3.4 Estimation of confidence regions

Bootstrapping is the method of choice for estimating confidence regions when we do not have any information about the distribution of the sample statistics [6, 18, 23, 49]. The parameter estimates obtained for given experimental data y, consisting of m replicates, depend upon the experimental data y. We do not have any specific information about the distribution of y, implying that the distribution of the sample statistic, i.e. the parameter estimate θ̂, is also unavailable to us. To generate a large population of replicates from the experimental data y, we make c_y copies of y, giving a bootstrapped population of replicates Y. From the bootstrapped population of replicates, we pick m replicates by random sampling with replacement. This random sampling with replacement gives us one bootstrapped experimental data set. We repeat this random sampling with replacement N_B times to obtain N_B sets of bootstrapped experimental data, each consisting of m replicates. Next we present the algorithm to estimate the parameter confidence region, which is an ellipsoid, given the experimental data y and the confidence level α.

3.4.1 The confidence region estimation algorithm

Input

Experimental data y

Number of copies of experimental data, c_y

Number of bootstrapped experimental data sets, N_B

The level of confidence for parameter estimates, α

Output

Center of the confidence region, θ_c

G and b_α that characterize the confidence region (θ − θ_c)′ G (θ − θ_c) ≤ b_α

Bounding box half edge lengths l

1. Make c_y copies of the experimental data y to generate the bootstrapped population of replicates Y

2. Generate N_B bootstrapped experimental data sets from Y using random sampling with replacement

3. For each bootstrapped experimental data set, run the UOBYQA-Fit algorithm of section 3.3.4 to obtain a bootstrapped parameter estimate

4. Make a grid in the parameter space centered around the median of the bootstrapped parameter estimates obtained in step 3

5. Evaluate the objective function φ̂ at each point of the grid of step 4

6. Fit a quadratic of the form (θ − θ_c)′ G (θ − θ_c) + d to the values of the objective function obtained in step 5 at the grid points. θ_c is the median of the bootstrapped parameter estimates of step 3

7. For each bootstrapped parameter estimate obtained in step 3, evaluate f(θ) = (θ − θ_c)′ G (θ − θ_c)

8. Sort the f(θ) values obtained in step 7 in ascending order. Assign b_α the 100α percentile point of these sorted f(θ) values

9. The vector of bounding box half edge lengths is given by l = (b_α diag(G⁻¹))^{1/2}, elementwise

10. Return θ_c, G, b_α, l
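A sketch of steps 2-3 and 7-9 is given below; the grid-based ellipsoid fit of steps 4-6 is assumed to have produced G and θ_c already, and estimate is a hypothetical wrapper around a UOBYQA-Fit run. Making c_y copies and sampling from the pooled population, as in steps 1-2, is approximated here by resampling the m replicates with replacement directly.

```python
import numpy as np

def bootstrap_confidence_region(y, estimate, G, theta_c, N_B, alpha, rng=None):
    """Bootstrapped b_alpha and bounding box for the ellipsoid
    (theta - theta_c)' G (theta - theta_c) <= b_alpha.

    y        -- experimental data, shape (m, n_d): m replicates
    estimate -- hypothetical wrapper: data set -> parameter estimate
    """
    rng = rng or np.random.default_rng()
    m = y.shape[0]
    # steps 2-3: resample m replicates with replacement, re-estimate N_B times
    thetas = np.array([estimate(y[rng.integers(0, m, size=m)])
                       for _ in range(N_B)])
    d = thetas - theta_c
    f = np.einsum('bi,ij,bj->b', d, G, d)    # step 7: f = (t-tc)' G (t-tc)
    b_alpha = np.quantile(f, alpha)          # step 8: 100*alpha percentile
    l = np.sqrt(b_alpha * np.diag(np.linalg.inv(G)))  # step 9: half lengths
    return b_alpha, l
```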

Next we check the accuracy of the obtained confidence region. We generate several sets of experimental data, and obtain parameter estimates for each experimental data set. We obtain the confidence region using just the first experimental data set. We verify the accuracy of this confidence region by checking how many of the parameter estimates fall inside it. Ideally, the fraction of points inside the confidence region should be equal to the confidence level α. We are now ready to present the verification procedure for the confidence region algorithm.

3.4.2 The verification of confidence region algorithm

Input

N_e sets of experimental data. Each experimental data set consists of m replicates

C, a vector of confidence levels

Output

Vector I with elements I_i such that for each confidence level α_i ∈ C, I_i is the number of points inside the α_i confidence hyper-ellipsoid

1. For each experimental data set, obtain a parameter estimate using the UOBYQA-Fit algorithm

2. For i = 1, 2, ... to the number of confidence levels

   (a) Call the confidence region estimation algorithm of section 3.4.1 with confidence level α_i and the first experimental data set

   (b) For each parameter estimate θ̂ obtained in step 1, check whether

   $$(\hat{\theta} - \theta_c)' G (\hat{\theta} - \theta_c) \le b_{\alpha_i} \qquad (3.14)$$

   (c) Assign I_i the number of parameter estimates from step 1 that satisfy (3.14)

3. Return the vector I

3.5 Application: RNA dynamics in Escherichia coli

Golding et al. [41] developed a fluorescence microscopy method to quantify molecular levels of mRNAs in individual E. coli cells. The method is based on amplification of a fluorescent protein having the capability to bind to a reporter RNA. To obtain the number of mRNA molecules, the fluorescence flux produced in the cells is compared with the fluorescence produced by a single mRNA molecule. The mRNA signal was shown to rise until 80 minutes and then plateau. They fit the experimental data with a mass action kinetic model given by reactions (3.15)-(3.17).

DNA_S → DNA_A, with rate constant k_1    (3.15)

DNA_A → DNA_S, with rate constant k_2    (3.16)

DNA_A → DNA_A + RNA, with rate constant k_3    (3.17)
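Simulated replicates from model (3.15)-(3.17) can be generated by reusing the ssa_direct sketch from section 2.2, as below. The species ordering, the initial condition of a single silenced DNA copy, and the rate constant values are illustrative assumptions of this sketch, not the settings of Table 3.1.

```python
import numpy as np

# Species ordering: [DNA_S, DNA_A, RNA]; reactions (3.15)-(3.17)
nu = np.array([[-1, 1, 0],     # DNA_S -> DNA_A            (k1)
               [1, -1, 0],     # DNA_A -> DNA_S            (k2)
               [0, 0, 1]])     # DNA_A -> DNA_A + RNA      (k3)
prop = lambda x, k: np.array([k[0] * x[0], k[1] * x[1], k[2] * x[1]])

def sample_path_on_grid(k, t_grid, x0=(1, 0, 0), rng=None):
    """One simulated mRNA replicate: run the SSA and read the RNA count
    at the measurement times by zero-order hold between jumps."""
    T, X = ssa_direct(x0, nu, prop, k, t_end=t_grid[-1], rng=rng)
    idx = np.searchsorted(T, t_grid, side='right') - 1
    return X[idx, 2]           # RNA population at each measurement time

# m = 100 replicates sampled every 0.5 minutes (rate values are placeholders)
t_grid = np.arange(0.0, 80.5, 0.5)
y = np.array([sample_path_on_grid(k=[0.01, 0.01, 0.1], t_grid=t_grid)
              for _ in range(100)])
```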

Since our aim here is to test the algorithm developed, we test it with simulated experimental data, assuming model (3.15)-(3.17) as the true system. The amount and quality of experimental data are important for obtaining good parameter estimates and tight confidence regions. In Figure 3.1 we show 10 replicates of the experiment. The tremendous replicate to replicate variability is an indicator of a lack of enough experimental data. We use m = 100 replicates as our experimental data set. For each replicate we use a sampling time of 0.5 minutes, which is the same as the successive image time of the experimental protocol.

Figure 3.1: 10 replicates of mRNA vs time experimental data

We apply the UOBYQA-Fit algorithm described in section 3.3.4 to obtain the point estimates of the parameters. The parameters used in the implementation of the algorithm are listed in Table 3.1. Unlike Poovathingal and Gunawan [81], we provide a confidence region using the method described in section 3.4. In 2-D the confidence region for the parameters is an ellipse and the bounding box is the rectangle aligned with the coordinate axes and tangent to the ellipse. Figure 3.2 shows an ellipse with its rectangular bounding box.

Table 3.1: System and optimization parameters (m, N, θ, θ_0, δ_0, δ_end)

Figure 3.2: An ellipse and its bounding box

In 3-D the confidence region for the parameters is an ellipsoid and the bounding box is the cuboid aligned with the coordinate axes and tangent to the ellipsoid. The parameter estimates obtained and the corresponding confidence region, both the bounding box and the extreme points of the ellipsoid, are listed in Table 3.2. The ratio of the volumes of the 95% confidence region ellipsoid and its bounding box indicates the significant stretch in the confidence region when we put the bounding box around the 95% confidence region ellipsoid.

Table 3.2: Parameter estimates and confidence regions (parameter estimates, 95% bounding box, and the 6 extreme points of the ellipsoid). The true parameter values are [ ]

Next we verify whether the large confidence region depicted in Table 3.2 is due to a lack of information in the experimental data or is an artifact of our confidence region estimation algorithm. We generate 500 sets of experimental data, where each experimental data set consists of 100 replicates. We use the verification of the confidence region algorithm of section 3.4.2 to obtain the number of parameter estimates inside different confidence level ellipsoids.

Figure 3.3: Verification of confidence region using bootstrapping

Figure 3.3 shows the number of these points that are inside the α level confidence region as a function of α. For small values of α, the number of points inside the α level confidence ellipsoid is close to the expected number. This figure illustrates the quality of the confidence region generated by the confidence region estimation algorithm. Therefore the confidence region algorithm generates reliable confidence regions, and the large confidence region depicted in Table 3.2 is due to the lack of information in the experimental data.

3.6 Conclusions

In this chapter we presented a new method of parameter estimation in stochastic chemical kinetic models. The method is based upon a negative log likelihood minimization approach in which the likelihood expression has several nice properties. The likelihood expression is in the form of the expectation of a function of the data, the parameters, and a smoothing parameter. The estimation of this likelihood is possible even with just one SSA simulation. We described a procedure to obtain the number of SSA simulations, N, that gives reliable parameter estimates. We gave a heuristic expression for the connection between the smoothing parameter, R, and the number of SSA simulations, N. Equipped with the likelihood expression, N, and R, we used the sample path method to estimate the negative log likelihood. To minimize this negative log likelihood, we developed a derivative free optimization method, UOBYQA-Fit, a variant of Powell's UOBYQA algorithm. To estimate the confidence region, we developed a variant of Efron's percentile method. We tested the obtained parameter estimates and the confidence regions by generating several experimental data sets. The tests indicate that both the optimization and the confidence estimation algorithms produce reliable parameter estimates and confidence regions, respectively.

Chapter 4

New methods to obtain sensitivities of stochastic chemical kinetic models

4.1 Introduction

The stochastic chemical kinetic models that describe the biological systems of interest here depend on parameters whose values are often unknown and can change due to changes in the environment. Sensitivities quantify the dependence of the system's output on changes in the model parameters. Sensitivity analysis is useful in determining the parameters to which the system output is most responsive, in assessing robustness of the system to extreme circumstances or unusual environmental conditions, and in identifying rate limiting pathways as candidates for drug delivery. However, one of the most important applications of sensitivities is in parameter estimation. Sensitivities provide a way to approximate the Hessian of the objective function through the Gauss-Newton approximation [88, p. 535]. Unbiased methods of sensitivity estimation include the likelihood ratio gradient method [39, 75] and the infinitesimal perturbation method based on the Girsanov transform [80, 106]. The canonical convergence rate, which is a measure of how fast the estimator error converges to a standard normal distribution, of both the likelihood ratio gradient and the infinitesimal perturbation analysis estimators is O(N^{-1/2}) [38], in which N is the number of estimator simulations. The unbiasedness of the likelihood ratio gradient method comes at the cost of high variance of the estimator if there are several reaction events in the estimation of the output of interest. Girsanov transform based methods, on the other hand, have high estimator variance when there are a small number of reaction events in the estimation of the output of interest. Komorowski et al. [64] use a linear noise approximation of stochastic chemical kinetic models for sensitivity analysis. However, use of the linear noise

approximation limits their analysis to stochastic differential equation models only. Gunawan et al. [45] compare the sensitivity of the mean with the sensitivity of the entire distribution. They explain why the sensitivity of the mean can be inadequate in determining the sensitivity of stochastic chemical kinetic models. Despite being easier to implement and intuitive to understand, finite difference based methods produce biased sensitivity estimates. However, implemented with consideration of the trade-off between the statistical error of the estimator and its bias, finite difference based methods can have a canonical convergence rate close to the best possible rate of O(N^{-1/2}) [38]. In fact, L'Ecuyer and Perron [67] show that for many practical cases of interest, infinitesimal perturbation analysis and finite difference with common random numbers have the same canonical convergence rate. Several different finite difference estimators have been proposed [2, 38, 87]. Anderson [2] proposes a new estimator, coupled finite difference (CFD), using a single Markov chain for the nominal and perturbed processes. The CFD estimator incorporates a tight coupling between the nominal and perturbed processes, thereby producing a significant reduction in the estimator variance [2]. In this chapter, we show the superiority of CFD over CRN in the estimation of sensitivities. We do not discuss the independent random number estimator [87], also known as the Crude Monte Carlo estimator [2], because either the CRN or the CFD estimator usually has a variance several orders of magnitude smaller than this estimator. We calculate sensitivity estimates of four different quantities of interest. In example one, the quantity of interest is the expected value of a species. Example two looks at the likelihood of experimental data. Example three looks at the probability of a rare state. Example four looks at the expected value of a fast fluctuating species. This chapter is arranged as follows. Section 4.2 defines the estimators that are used in the subsequent examples. Section 4.3 shows the results we obtain from the four examples. Finally, section 4.4 discusses the conclusions of this chapter and summarizes the contributions.

4.2 The estimators

Common random numbers (CRN) [38, 87]: A single simulation of the CRN estimator gives two coupled SSA simulations: the first uses the rate parameter k and randomness ω; the second uses the perturbed rate parameter k + ε and the same randomness ω.
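A sketch of the CRN estimator for the sensitivity of an expected output: seed the generator identically for the nominal and perturbed runs so that both share the randomness ω, then average the pathwise finite differences. The wrapper name f_output is an assumption of the sketch.

```python
import numpy as np

def crn_sensitivity(f_output, k, eps, i, N, seed0=0):
    """CRN estimate of d E[f] / d k_i by a forward finite difference.

    f_output -- function (k, rng) -> scalar output of one SSA simulation,
                e.g. a species population at the final time
    k        -- nominal rate parameters; eps -- perturbation of k[i]
    Returns the estimate and its standard error over N simulation pairs.
    """
    k_pert = np.array(k, dtype=float)
    k_pert[i] += eps
    diffs = np.empty(N)
    for r in range(N):
        rng_nom = np.random.default_rng(seed0 + r)  # common random numbers:
        rng_per = np.random.default_rng(seed0 + r)  # identical streams
        diffs[r] = (f_output(k_pert, rng_per) - f_output(k, rng_nom)) / eps
    return diffs.mean(), diffs.std(ddof=1) / np.sqrt(N)
```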


More information

6 Reinforcement Learning

6 Reinforcement Learning 6 Reinforcement Learning As discussed above, a basic form of supervised learning is function approximation, relating input vectors to output vectors, or, more generally, finding density functions p(y,

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Principles of Statistical Inference Recap of statistical models Statistical inference (frequentist) Parametric vs. semiparametric

More information

Multifidelity Approaches to Approximate Bayesian Computation

Multifidelity Approaches to Approximate Bayesian Computation Multifidelity Approaches to Approximate Bayesian Computation Thomas P. Prescott Wolfson Centre for Mathematical Biology University of Oxford Banff International Research Station 11th 16th November 2018

More information

Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms

Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Mostafa D. Awheda Department of Systems and Computer Engineering Carleton University Ottawa, Canada KS 5B6 Email: mawheda@sce.carleton.ca

More information

Simulating stochastic epidemics

Simulating stochastic epidemics Simulating stochastic epidemics John M. Drake & Pejman Rohani 1 Introduction This course will use the R language programming environment for computer modeling. The purpose of this exercise is to introduce

More information

Kinetic Monte Carlo. Heiko Rieger. Theoretical Physics Saarland University Saarbrücken, Germany

Kinetic Monte Carlo. Heiko Rieger. Theoretical Physics Saarland University Saarbrücken, Germany Kinetic Monte Carlo Heiko Rieger Theoretical Physics Saarland University Saarbrücken, Germany DPG school on Efficient Algorithms in Computational Physics, 10.-14.9.2012, Bad Honnef Intro Kinetic Monte

More information

Computational Systems Biology Exam

Computational Systems Biology Exam Computational Systems Biology Exam Dr. Jürgen Pahle Aleksandr Andreychenko, M.Sc. 31 July, 2012 Name Matriculation Number Do not open this exam booklet before we ask you to. Do read this page carefully.

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Statistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That

Statistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That Statistics Lecture 2 August 7, 2000 Frank Porter Caltech The plan for these lectures: The Fundamentals; Point Estimation Maximum Likelihood, Least Squares and All That What is a Confidence Interval? Interval

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Brief contents. Chapter 1 Virus Dynamics 33. Chapter 2 Physics and Biology 52. Randomness in Biology. Chapter 3 Discrete Randomness 59

Brief contents. Chapter 1 Virus Dynamics 33. Chapter 2 Physics and Biology 52. Randomness in Biology. Chapter 3 Discrete Randomness 59 Brief contents I First Steps Chapter 1 Virus Dynamics 33 Chapter 2 Physics and Biology 52 II Randomness in Biology Chapter 3 Discrete Randomness 59 Chapter 4 Some Useful Discrete Distributions 96 Chapter

More information

Lecture 7: Simple genetic circuits I

Lecture 7: Simple genetic circuits I Lecture 7: Simple genetic circuits I Paul C Bressloff (Fall 2018) 7.1 Transcription and translation In Fig. 20 we show the two main stages in the expression of a single gene according to the central dogma.

More information

2008 Hotelling Lectures

2008 Hotelling Lectures First Prev Next Go To Go Back Full Screen Close Quit 1 28 Hotelling Lectures 1. Stochastic models for chemical reactions 2. Identifying separated time scales in stochastic models of reaction networks 3.

More information

Experimental designs for multiple responses with different models

Experimental designs for multiple responses with different models Graduate Theses and Dissertations Graduate College 2015 Experimental designs for multiple responses with different models Wilmina Mary Marget Iowa State University Follow this and additional works at:

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

Applying Bayesian Estimation to Noisy Simulation Optimization

Applying Bayesian Estimation to Noisy Simulation Optimization Applying Bayesian Estimation to Noisy Simulation Optimization Geng Deng Michael C. Ferris University of Wisconsin-Madison INFORMS Annual Meeting Pittsburgh 2006 Simulation-based optimization problem Computer

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Stochastic Chemical Kinetics

Stochastic Chemical Kinetics Stochastic Chemical Kinetics Joseph K Scott November 10, 2011 1 Introduction to Stochastic Chemical Kinetics Consider the reaction I + I D The conventional kinetic model for the concentration of I in a

More information

VCMC: Variational Consensus Monte Carlo

VCMC: Variational Consensus Monte Carlo VCMC: Variational Consensus Monte Carlo Maxim Rabinovich, Elaine Angelino, Michael I. Jordan Berkeley Vision and Learning Center September 22, 2015 probabilistic models! sky fog bridge water grass object

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

State Estimation using Moving Horizon Estimation and Particle Filtering

State Estimation using Moving Horizon Estimation and Particle Filtering State Estimation using Moving Horizon Estimation and Particle Filtering James B. Rawlings Department of Chemical and Biological Engineering UW Math Probability Seminar Spring 2009 Rawlings MHE & PF 1 /

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

Optimization Tools in an Uncertain Environment

Optimization Tools in an Uncertain Environment Optimization Tools in an Uncertain Environment Michael C. Ferris University of Wisconsin, Madison Uncertainty Workshop, Chicago: July 21, 2008 Michael Ferris (University of Wisconsin) Stochastic optimization

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

Lecture 4 The stochastic ingredient

Lecture 4 The stochastic ingredient Lecture 4 The stochastic ingredient Luca Bortolussi 1 Alberto Policriti 2 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste Via Valerio 12/a, 34100 Trieste. luca@dmi.units.it

More information

Stochastic model of mrna production

Stochastic model of mrna production Stochastic model of mrna production We assume that the number of mrna (m) of a gene can change either due to the production of a mrna by transcription of DNA (which occurs at a rate α) or due to degradation

More information

Bayesian parameter inference for stochastic biochemical network models using particle MCMC

Bayesian parameter inference for stochastic biochemical network models using particle MCMC Bayesian parameter inference for stochastic biochemical network models using particle MCMC Andrew Golightly Darren J. Wilkinson May 16, 2011 Abstract Computational systems biology is concerned with the

More information

Accelerated Block-Coordinate Relaxation for Regularized Optimization

Accelerated Block-Coordinate Relaxation for Regularized Optimization Accelerated Block-Coordinate Relaxation for Regularized Optimization Stephen J. Wright Computer Sciences University of Wisconsin, Madison October 09, 2012 Problem descriptions Consider where f is smooth

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Computational Modelling in Systems and Synthetic Biology

Computational Modelling in Systems and Synthetic Biology Computational Modelling in Systems and Synthetic Biology Fran Romero Dpt Computer Science and Artificial Intelligence University of Seville fran@us.es www.cs.us.es/~fran Models are Formal Statements of

More information

Learning Energy-Based Models of High-Dimensional Data

Learning Energy-Based Models of High-Dimensional Data Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero www.cs.toronto.edu/~hinton/energybasedmodelsweb.htm Discovering causal structure as a goal

More information

13 : Variational Inference: Loopy Belief Propagation and Mean Field

13 : Variational Inference: Loopy Belief Propagation and Mean Field 10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction

More information

Simulated Annealing for Constrained Global Optimization

Simulated Annealing for Constrained Global Optimization Monte Carlo Methods for Computation and Optimization Final Presentation Simulated Annealing for Constrained Global Optimization H. Edwin Romeijn & Robert L.Smith (1994) Presented by Ariel Schwartz Objective

More information

Extending the multi-level method for the simulation of stochastic biological systems

Extending the multi-level method for the simulation of stochastic biological systems Extending the multi-level method for the simulation of stochastic biological systems Christopher Lester Ruth E. Baker Michael B. Giles Christian A. Yates 29 February 216 Abstract The multi-level method

More information

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation

More information

Modeling and Systems Analysis of Gene Regulatory Networks

Modeling and Systems Analysis of Gene Regulatory Networks Modeling and Systems Analysis of Gene Regulatory Networks Mustafa Khammash Center for Control Dynamical-Systems and Computations University of California, Santa Barbara Outline Deterministic A case study:

More information

Multimodal Nested Sampling

Multimodal Nested Sampling Multimodal Nested Sampling Farhan Feroz Astrophysics Group, Cavendish Lab, Cambridge Inverse Problems & Cosmology Most obvious example: standard CMB data analysis pipeline But many others: object detection,

More information

Uncertainty quantification for Wavefield Reconstruction Inversion

Uncertainty quantification for Wavefield Reconstruction Inversion Uncertainty quantification for Wavefield Reconstruction Inversion Zhilong Fang *, Chia Ying Lee, Curt Da Silva *, Felix J. Herrmann *, and Rachel Kuske * Seismic Laboratory for Imaging and Modeling (SLIM),

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

State Estimation of Linear and Nonlinear Dynamic Systems

State Estimation of Linear and Nonlinear Dynamic Systems State Estimation of Linear and Nonlinear Dynamic Systems Part I: Linear Systems with Gaussian Noise James B. Rawlings and Fernando V. Lima Department of Chemical and Biological Engineering University of

More information

Monte Carlo. Lecture 15 4/9/18. Harvard SEAS AP 275 Atomistic Modeling of Materials Boris Kozinsky

Monte Carlo. Lecture 15 4/9/18. Harvard SEAS AP 275 Atomistic Modeling of Materials Boris Kozinsky Monte Carlo Lecture 15 4/9/18 1 Sampling with dynamics In Molecular Dynamics we simulate evolution of a system over time according to Newton s equations, conserving energy Averages (thermodynamic properties)

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Modelling in Systems Biology

Modelling in Systems Biology Modelling in Systems Biology Maria Grazia Vigliotti thanks to my students Anton Stefanek, Ahmed Guecioueur Imperial College Formal representation of chemical reactions precise qualitative and quantitative

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

On Markov chain Monte Carlo methods for tall data

On Markov chain Monte Carlo methods for tall data On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational

More information

Introduction Probabilistic Programming ProPPA Inference Results Conclusions. Embedding Machine Learning in Stochastic Process Algebra.

Introduction Probabilistic Programming ProPPA Inference Results Conclusions. Embedding Machine Learning in Stochastic Process Algebra. Embedding Machine Learning in Stochastic Process Algebra Jane Hillston Joint work with Anastasis Georgoulas and Guido Sanguinetti, School of Informatics, University of Edinburgh 16th August 2017 quan col....

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Lecture 3 - Linear and Logistic Regression

Lecture 3 - Linear and Logistic Regression 3 - Linear and Logistic Regression-1 Machine Learning Course Lecture 3 - Linear and Logistic Regression Lecturer: Haim Permuter Scribe: Ziv Aharoni Throughout this lecture we talk about how to use regression

More information

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs

More information

Monte-Carlo Methods and Stochastic Processes

Monte-Carlo Methods and Stochastic Processes Monte-Carlo Methods and Stochastic Processes From Linear to Non-Linear EMMANUEL GOBET ECOLE POLYTECHNIQUE - UNIVERSITY PARIS-SACLAY CMAP, PALAISEAU CEDEX, FRANCE CRC Press Taylor & Francis Group 6000 Broken

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

CS 450 Numerical Analysis. Chapter 5: Nonlinear Equations

CS 450 Numerical Analysis. Chapter 5: Nonlinear Equations Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80

More information

Proxel-Based Simulation of Stochastic Petri Nets Containing Immediate Transitions

Proxel-Based Simulation of Stochastic Petri Nets Containing Immediate Transitions Electronic Notes in Theoretical Computer Science Vol. 85 No. 4 (2003) URL: http://www.elsevier.nl/locate/entsc/volume85.html Proxel-Based Simulation of Stochastic Petri Nets Containing Immediate Transitions

More information

Stochastic Simulation.

Stochastic Simulation. Stochastic Simulation. (and Gillespie s algorithm) Alberto Policriti Dipartimento di Matematica e Informatica Istituto di Genomica Applicata A. Policriti Stochastic Simulation 1/20 Quote of the day D.T.

More information

Curvature measures for generalized linear models

Curvature measures for generalized linear models University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 1999 Curvature measures for generalized linear models Bernard A.

More information

The Adaptive Explicit-Implicit Tau-Leaping Method with Automatic Tau Selection

The Adaptive Explicit-Implicit Tau-Leaping Method with Automatic Tau Selection The Adaptive Explicit-Implicit Tau-Leaping Method with Automatic Tau Selection Yang Cao Department of Computer Science, 660 McBryde Hall, Virginia Tech, Blacksburg, VA 24061 Daniel T. Gillespie Dan T.

More information

HOMEWORK #4: LOGISTIC REGRESSION

HOMEWORK #4: LOGISTIC REGRESSION HOMEWORK #4: LOGISTIC REGRESSION Probabilistic Learning: Theory and Algorithms CS 274A, Winter 2019 Due: 11am Monday, February 25th, 2019 Submit scan of plots/written responses to Gradebook; submit your

More information

Nested Stochastic Simulation Algorithm for Chemical Kinetic Systems with Disparate Rates. Abstract

Nested Stochastic Simulation Algorithm for Chemical Kinetic Systems with Disparate Rates. Abstract Nested Stochastic Simulation Algorithm for Chemical Kinetic Systems with Disparate Rates Weinan E Department of Mathematics and PACM, Princeton University, Princeton, NJ 08544, USA Di Liu and Eric Vanden-Eijnden

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

ABC methods for phase-type distributions with applications in insurance risk problems

ABC methods for phase-type distributions with applications in insurance risk problems ABC methods for phase-type with applications problems Concepcion Ausin, Department of Statistics, Universidad Carlos III de Madrid Joint work with: Pedro Galeano, Universidad Carlos III de Madrid Simon

More information

Sequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007

Sequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007 Sequential Monte Carlo and Particle Filtering Frank Wood Gatsby, November 2007 Importance Sampling Recall: Let s say that we want to compute some expectation (integral) E p [f] = p(x)f(x)dx and we remember

More information

Combining multiple surrogate models to accelerate failure probability estimation with expensive high-fidelity models

Combining multiple surrogate models to accelerate failure probability estimation with expensive high-fidelity models Combining multiple surrogate models to accelerate failure probability estimation with expensive high-fidelity models Benjamin Peherstorfer a,, Boris Kramer a, Karen Willcox a a Department of Aeronautics

More information