Automated Modeling of Stochastic Reactions with Large Measurement Time-Gaps

Michael Schmidt
Computational Synthesis Lab
Cornell University
Ithaca, NY 14853
mds47@cornell.edu

Hod Lipson
Computational Synthesis Lab
Cornell University
Ithaca, NY 14853
hod.lipson@cornell.edu

ABSTRACT
Many systems, particularly in biology and chemistry, involve the interaction of discrete quantities, such as individual elements or molecules. When the total number of elements in the system is low, the impact of individual reactions becomes non-negligible and modeling requires the simulation of exact sequences of reactions. In this paper, we introduce an algorithm that can infer an exact stochastic reaction model based on sparse measurements of an evolving system of discrete quantities. The algorithm is based on simulating a candidate model to maximize the likelihood of the data. When the likelihood is too small to provide a search gradient, the algorithm uses the distance of the data to the model's estimated distribution. Results show that this method infers stochastic models reliably both with short time gaps between measurements of the system, and with long time gaps where the system state has evolved qualitatively far between each measurement. Furthermore, the proposed metric outperforms optimizing on the likelihood or distance components alone. Measurements of the novelty, age, and bloat of the search suggest that this algorithm scales well to increasingly complex systems.

Categories and Subject Descriptors
I.6.5 [Simulation and Modeling]: Model Development - Modeling methodologies.

General Terms
Algorithms, Design, Performance.

Keywords
Stochastic modeling, stochastic simulation.

1. INTRODUCTION
Stochastic systems pervade nearly all areas of science, from quantum properties of atomic particles, to chemical reactions in a chemical bath, to fluctuations in populations or ecosystems. All stochastic systems are at least partially random, making them difficult to model dynamically or deterministically.
Instead, Monte Carlo methods are often employed to simulate and analyze their behavior. A particularly important Monte Carlo method was developed by Dan Gillespie in 1977 to model chemical reaction kinetics [1]. The Gillespie algorithm performs an exact and statistically correct simulation of a stochastic system based on a set of discrete chemical reactions, reaction coefficients, and initial conditions. The Gillespie algorithm has been used extensively in systems biology, and in similar domains.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. GECCO'11, July 12-16, 2011, Dublin, Ireland. Copyright 2011 ACM 978-1-4503-0557-0/11/07...$10.00.

Traditionally, the set of reactions that model a stochastic system must be developed and theorized manually by experts. In this paper we introduce an evolutionary algorithm that automatically hypothesizes the reactions and reaction rates taking place in a system simply by analyzing raw experimental data, even with large time gaps between observations (see Figure 1). The proposed method searches over a space of reactions in order to find the maximum likelihood model that agrees with the experimental observations.

The key challenge in searching over stochastic models is the computational cost of estimating likelihood values from a model and maintaining a search gradient. Except for the most trivial systems, the probability density of a set of stochastic reactions cannot be solved analytically over time. Instead, the model can be simulated (or sampled) repeatedly. However, efficient sampling methods fail over large time spans [2], making it difficult to estimate distribution tails. The proposed method overcomes this difficulty by using a two-component optimization metric.
The metric attempts to maximize the log-likelihood of the data given a candidate model. However, if the likelihood is too small to provide a gradient for the search, the criterion changes to the distance of each data point to the estimated probability density of the candidate model. In effect, this distance component allows even extremely inaccurate models to improve despite having zero likelihood. Once models get close enough to the data, where their likelihoods can be estimated accurately through sampling, the metric switches to maximizing the likelihood. This metric also reduces the computational complexity, as the accuracy of estimating the tails of distributions is less important. The algorithm can thereby use fewer samples (fewer simulations of a candidate model) and still estimate a useful likelihood gradient.

2. BACKGROUND
Here we introduce important concepts in stochastic simulation algorithms, density estimation, and evolutionary algorithms.
[Figure 1. Left: "Periodic Samples of Stochastic System" (y versus x). Right: "Maximum Likelihood Stochastic Model": x → 2x (rate 10); x + y → 2y (rate 0.1); y → ø (rate 10).]
Figure 1. Overview of the modeling problem. A stochastic system evolves an exact behavior over time, shown in blue. Periodically, the state of the system can be measured (shown as red dots), giving a sample of the exact time evolution of the system. The task is to infer a maximum likelihood stochastic model (right) for this system from these periodic measurements. Actual data and solution shown.

2.1 Stochastic Simulation Algorithms
The exact stochastic simulation algorithm was first developed in [3] and later applied to chemical kinetics in [1]. The method makes few assumptions about the system except that the environment is well mixed. The basic algorithm involves two steps: (1) sampling the time delay until the next reaction occurs, and (2) sampling which of the possible reactions occurs. Each of these samples depends on the number of molecules in the current state. When there are a large number of molecules, the time until the next reaction can be extremely small. The counts of each species also influence which reaction is more likely to occur. The system is simulated by repeatedly applying reactions and incrementing time by the sampled amount, resulting in a random-walk, time-series trajectory. See [1] for more details.

The exact simulation of the Gillespie algorithm becomes critically important when the number of molecules is sufficiently small. In this case, single reactions can significantly impact reaction propensities and future states (e.g. reaching a terminating state). When the number of molecules is exceedingly large, the system dynamics are approximately deterministic because large numbers of reactions tend to average out random fluctuations. The exactness of the Gillespie algorithm does come at a computational cost, and several methods have been proposed to improve its performance while still preserving exactness where necessary.
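The two-step loop described above can be sketched with Gillespie's direct method. This is a minimal illustration under stated assumptions, not the authors' simulator (the paper's own simulations use tau-leaping, described next): propensities follow mass action, counting distinct reactant combinations, and the helper names `propensities` and `gillespie_direct` are hypothetical. The usage lines simulate the Lotka-Volterra reactions of Section 4 over one short time gap.

```python
import math
import numpy as np

def propensities(state, inputs, rates):
    """Mass-action propensities: a_j = c_j * prod_k C(x_k, inputs[j][k])."""
    a = np.empty(len(rates))
    for j, c in enumerate(rates):
        h = 1
        for x_k, r_k in zip(state, inputs[j]):
            h *= math.comb(int(x_k), int(r_k))  # distinct reactant combinations
        a[j] = c * h
    return a

def gillespie_direct(state, inputs, outputs, rates, t_end, rng):
    """Exact simulation from `state` for t_end time units; returns the final state."""
    state = np.array(state, dtype=np.int64)
    change = np.asarray(outputs) - np.asarray(inputs)  # net change per reaction
    t = 0.0
    while True:
        a = propensities(state, inputs, rates)
        a0 = a.sum()
        if a0 <= 0:
            return state  # no reaction can fire (absorbing state)
        t += rng.exponential(1.0 / a0)  # step 1: waiting time ~ Exp(a0)
        if t > t_end:
            return state  # next reaction would fall after t_end
        j = rng.choice(len(a), p=a / a0)  # step 2: pick reaction j w.p. a_j / a0
        state += change[j]

# Lotka-Volterra reactions from Section 4 (species order: x, y)
inputs = np.array([[1, 0], [1, 1], [0, 1]])
outputs = np.array([[2, 0], [0, 2], [0, 0]])
rates = np.array([10.0, 0.1, 10.0])
rng = np.random.default_rng(0)
final = gillespie_direct([1000, 1000], inputs, outputs, rates, t_end=0.002, rng=rng)
```

Because a reaction can only fire when every reactant count is at least its input coefficient, the state never goes negative; running the same call many times from one measurement yields the sample cloud used for density estimation.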
For our simulations, we use the modified Poisson tau-leaping procedure that ensures that at most one critical reaction occurs per leap [4]. Tau-leaping speeds up the stochastic simulation by estimating the number of reactions occurring during a time period tau. The value of tau is chosen such that the change in reaction propensities during tau remains below a chosen tolerance. When the tau leap is not large enough to provide a useful speed-up, the algorithm defaults to an exact simulation.

2.2 Kernel Density Estimation
In order to calculate the likelihood of the data given a candidate model, we need to estimate the probability density of the model at each data point. There are many ways to estimate probability densities. A simple method is to use a histogram. The histogram divides all samples (in our case, counts of molecules after simulating a model) into a number of bins. The density is then the bin frequency divided by the bin width. Several methods exist for choosing optimal bin widths and positions [5]. A major drawback of binned histograms, however, is that they are locally flat everywhere. In other words, they have no local gradient that is amenable to optimization.

An alternative to a histogram, and the method used in our experiments, is kernel density estimation [6, 7]. Kernel density estimation is a non-parametric method to estimate probability density functions. It sums a series of kernel functions that are centered on each sample. We used a Gaussian kernel function, meaning each sample contributed a Gaussian density around its sample value. Choosing a uniform kernel, for example, would produce a result similar to a binned histogram. The Gaussian kernel produces smooth density estimates, useful for optimizing; however, we still need to specify a bandwidth. The bandwidth is analogous to the bin width in a binned histogram. Variable kernel bandwidth selection is the technique of selecting a different bandwidth for each sample [8]. Variable bandwidths allow the kernels to be narrow in high-density regions, capturing fine detail of the distribution, and wide in less certain low-density areas.
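Variable-bandwidth kernel density estimation can be sketched in a few lines of NumPy. Assumptions: isotropic Gaussian kernels; per-sample bandwidths from Abramson's square-root law [9], the rule this paper adopts; and a fixed-bandwidth Gaussian pilot with a Scott-style global bandwidth, swapped in for the MISE-optimized histogram pilot the paper uses. The names `variable_kde` and `_gauss_kde` are hypothetical.

```python
import numpy as np

def _gauss_kde(points, centers, h):
    """Average of isotropic Gaussian kernels at `centers`, per-center bandwidths h."""
    d = centers.shape[1]
    sq = np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=2)  # (m, n)
    norm = (2.0 * np.pi) ** (d / 2.0) * h[None, :] ** d
    return np.mean(np.exp(-0.5 * sq / h[None, :] ** 2) / norm, axis=1)

def variable_kde(samples, query):
    """Density of `samples` (n, d) evaluated at `query` points (m, d)."""
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    query = np.asarray(query, dtype=float).reshape(-1, d)
    # Global pilot bandwidth: Scott-style rule (an assumption, not the paper's pilot).
    sigma = samples.std(axis=0, ddof=1).mean() if n > 1 else 0.0
    h0 = (sigma if sigma > 0 else 1.0) * n ** (-1.0 / (d + 4))
    pilot = _gauss_kde(samples, samples, np.full(n, h0))
    # Square-root law: h_i = h0 * sqrt(g / f(x_i)), g = geometric mean of the pilot.
    g = np.exp(np.mean(np.log(pilot)))
    h = h0 * np.sqrt(g / pilot)
    return _gauss_kde(query, samples, h)
```

Each sample's kernel widens where the pilot density is low, so sparse tails are smoothed while dense regions keep their detail, and the estimate still integrates to one because every kernel is normalized.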
[Figure 2. Top: reactions 1 through m, each converting integer multiples of the species x_1 ... x_n (inputs) into integer multiples of the same species (outputs) at rate c_j. Bottom: the encoding, an m × n matrix of integer input coefficients, a vector of real-valued rates c_1 ... c_m, and an m × n matrix of integer output coefficients.]
Figure 2. The encoding of a solution representing a stochastic model of discrete reactions. A series of chemical reactions (top) is represented by corresponding integer coefficients and real-valued rate constants for each reaction (bottom).

In our experiments, we used the square-root law [9] for selecting the bandwidths per sample. This technique requires an initial estimate of the density; here, we used an ordinary histogram with optimized bins chosen by minimizing the mean integrated squared error (MISE) [5]. The final result is a smooth, continuous estimate of the probability density that captures both sharp and diffuse features in the distribution.

3. ALGORITHM
The proposed method for inferring a maximum likelihood stochastic model uses an evolutionary algorithm to search for sets of reaction channels and rates that match the data. In this section, we describe the evolutionary encoding of candidate models in the search, and the fitness function.

3.1 Encoding
The stochastic model consists of a series of reactions. Each reaction specifies an integer number for each input species, an integer number for each output species, and a real-valued number for the reaction rate. If a reaction does not use an input, its input value is 0; likewise for outputs. We use a fixed maximum number of reactions in our experiments. Candidate models can opt to use fewer reactions than the maximum by setting the reaction rate to 0, or by setting the inputs and outputs to 0.

Figure 2 summarizes our encoding of a stochastic model. It consists of a matrix of integer-valued input coefficients for each reaction, a vector of real-valued rate coefficients for each reaction, and a matrix of integer-valued output coefficients for each reaction.
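The three-part encoding of Figure 2 maps directly onto arrays. A minimal sketch, assuming NumPy; the class `ReactionModel` and its methods are hypothetical names. It encodes the Lotka-Volterra target of Section 4 (two species, three reactions) and derives each reaction's net effect on the state as its output row minus its input row.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ReactionModel:
    inputs: np.ndarray   # (m, n) integer reactant coefficients
    rates: np.ndarray    # (m,) real-valued rate constants
    outputs: np.ndarray  # (m, n) integer product coefficients

    def net_change(self) -> np.ndarray:
        """State change applied when each reaction fires: outputs - inputs."""
        return self.outputs - self.inputs

    def active(self) -> np.ndarray:
        """A reaction is unused if its rate is 0 or all its coefficients are 0."""
        nonzero = (self.inputs != 0).any(axis=1) | (self.outputs != 0).any(axis=1)
        return (self.rates != 0) & nonzero

# Lotka-Volterra target (species order: x, y)
lv = ReactionModel(
    inputs=np.array([[1, 0], [1, 1], [0, 1]]),
    rates=np.array([10.0, 0.1, 10.0]),
    outputs=np.array([[2, 0], [0, 2], [0, 0]]),
)
print(lv.net_change().tolist())  # [[1, 0], [-1, 1], [0, -1]]
```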
A random encoding is produced by filling each matrix with random integers, normally distributed with zero mean and standard deviation of 1, and filling the rate vector with random positive real values, normally distributed with zero mean and standard deviation of 1. The mutation operator works by randomizing each individual element with a fixed point-mutation probability. The crossover operation recombines two parent encodings to form a new offspring. We use random single-point crossover on the reactions; for example, copying the first n reactions (inputs, outputs, and rate) from the first parent, and the remaining reactions from the second parent. The complexity of the encoding is defined as the sum of all integer-valued reaction coefficients on both the inputs and outputs of the reactions.

3.2 Likelihood Estimate
Our goal is to find a maximum likelihood model. We cannot estimate the likelihood of a model explicitly; however, we can estimate the likelihood of seeing the experimental data given a specific model. This gives a measure of how well a particular model agrees with the data. In other words, we are trying to maximize the following expression:

$$\mathrm{Likelihood} = \prod_{i=1}^{n} P(x_i \mid M = m)$$

Here, n is the number of data points (measurements of the system state), x_i is a particular data point, m is a particular model, and P is the probability density of the model m at data point i. Rather than working directly with probabilities, it is numerically more stable to work with the log of probabilities, or the log-likelihood:

$$\text{Log-Likelihood} = \sum_{i=1}^{n} \log P(x_i \mid M = m)$$
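The initialization, mutation, crossover, and complexity definitions above can be sketched as follows, with a model stored as an `(inputs, rates, outputs)` tuple of NumPy arrays. Assumptions not stated in the paper: sampled integers are `|round(N(0, 1))|` so coefficients stay non-negative, rates are `|N(0, 1)|`, a mutated element is redrawn from the same distribution, and the crossover point is uniform over 0..m. All function names are hypothetical.

```python
import numpy as np

def random_model(m, n, rng):
    """Random encoding (inputs, rates, outputs) for m reactions over n species."""
    # Assumption: |round(N(0, 1))| keeps coefficients non-negative integers.
    inputs = np.abs(np.rint(rng.normal(0.0, 1.0, (m, n)))).astype(int)
    outputs = np.abs(np.rint(rng.normal(0.0, 1.0, (m, n)))).astype(int)
    rates = np.abs(rng.normal(0.0, 1.0, m))  # positive real-valued rates
    return inputs, rates, outputs

def mutate(model, p, rng):
    """Point mutation: re-randomize each element independently with probability p."""
    inputs, rates, outputs = (a.copy() for a in model)
    fresh = random_model(*inputs.shape, rng)
    for arr, new in zip((inputs, rates, outputs), fresh):
        mask = rng.random(arr.shape) < p
        arr[mask] = new[mask]
    return inputs, rates, outputs

def crossover(a, b, rng):
    """Single-point crossover on reactions: first k reactions from a, rest from b."""
    k = rng.integers(0, len(a[1]) + 1)
    return tuple(np.concatenate([pa[:k], pb[k:]]) for pa, pb in zip(a, b))

def complexity(model):
    """Sum of all integer input and output coefficients."""
    inputs, _, outputs = model
    return int(inputs.sum() + outputs.sum())
```

Crossover keeps each reaction's inputs, rate, and outputs together, so a useful reaction is inherited intact rather than split across parents.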
Figure 3. Comparing a candidate model with the experimental data. The left pane shows the hypothetical exact behavior of a system in blue, and two known measurements of the system as red dots. The candidate model is simulated multiple times, starting from the first measurement and running for the time gap between the two measurements, in order to estimate a probability distribution of the model (right). The state of the second measurement is then compared with this distribution to evaluate how well the model reproduces the measurement.

To evaluate the likelihood, we need to estimate the value of P(x_i | M = m). We do this by sampling the model m; that is, simulating the model over the time span from the previous data point i - 1 to the current data point i. Figure 3 visualizes the simulation process. The candidate model is simulated, starting from the previous state, until the time reaches that of the current state. Each simulation outcome is then added to the kernel density estimator, described above, to estimate the probability density P. The log of the density at each state x_i of the system is then added to the cumulative log-likelihood value.

3.3 Fitness Function
Ultimately we want to maximize the likelihood of a candidate model, but since we can only approximate the density function, most random models will tend to have zero likelihood and no gradient to optimize on, because we cannot accurately estimate the tails of the probability density function. Our solution to this problem is to use a two-component fitness metric. The two components are:
1. The log-likelihood, as usual, and
2. The distance of the data point to the median value of the estimated distribution.

When a model has near-zero likelihood (e.g. lower than ε = 10^-6 in our experiments), we subtract the distance of the data point to the median value of the distribution. Otherwise, the fitness is equal to the log-likelihood. This fitness metric is summarized in Figure 3. By adding the log-likelihood component to the distance component, the fitness function remains monotonically increasing for improving models.
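One reading of the two-component fitness, as a sketch: each measurement contributes its log-density when the estimated density is at least ε; otherwise it contributes log ε minus the distance from the measurement to the median of the simulated end states, so a model that clears the threshold always scores higher than one that does not. The paper specifies only the threshold and the subtracted median distance; the Euclidean distance, the component-wise median, the log ε offset, and the `simulate` and `density` callbacks are assumptions, and all names are hypothetical.

```python
import numpy as np

EPS = 1e-6  # likelihood threshold used in the paper

def point_fitness(x, samples, density, eps=EPS):
    """Fitness contribution of one measurement x given simulated end states.

    x:       observed state, shape (d,)
    samples: simulated end states of the candidate model, shape (k, d)
    density: callable(samples, x) -> estimated density at x (e.g. a KDE)
    """
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    p = density(samples, x)
    if p >= eps:
        return float(np.log(p))  # likelihood is usable: plain log-likelihood
    # Near-zero likelihood: fall back on the distance to the sample median.
    dist = np.linalg.norm(x - np.median(samples, axis=0))
    return float(np.log(eps) - dist)

def fitness(data, simulate, density, eps=EPS):
    """Sum over (previous_state, observed_state, dt) measurement pairs.

    simulate: hypothetical callable(previous_state, dt) -> (k, d) end states.
    """
    return sum(point_fitness(x1, simulate(x0, dt), density, eps)
               for x0, x1, dt in data)
```

Below the threshold, moving the model's median toward a data point strictly raises its score, which supplies the gradient that the zero-likelihood region otherwise lacks.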
This allows initially poor random models to move their distributions close enough to the data points that their density estimates can be used to maximize the likelihood.

4. EXPERIMENTS
We perform proof-of-concept experiments on the basic Lotka-Volterra model [10, 11]. The target reactions for this system are shown below:

$$x \xrightarrow{10} 2x, \qquad x + y \xrightarrow{0.1} 2y, \qquad y \xrightarrow{10} \emptyset$$

The Lotka-Volterra reactions model a predator-prey system. In the first reaction, prey (represented by x) grow exponentially. In the second reaction, prey may meet predators (represented by y), causing the prey to die and the predators to increase in number. Finally, in the last reaction, predators can die out.

We generated data sets of 10 pairs of measurements of the Lotka-Volterra system. Each pair consists of a random initial condition, followed by a measurement after simulating for a fixed time duration. In our experiments, we compare two types of data sets: those with short time gaps, where measurements are made in short succession (time steps of 0.002), and those with long time gaps (time steps of 0.1), where the state of the system changes dramatically between measurements. An example of the long-time-gap data set is shown in Figure 3 (left), where each green arrow is a pair of measurements.
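Measurement pairs like these could be generated with an exact simulation specialized to the three Lotka-Volterra reactions, as sketched below. Assumptions: Gillespie's direct method stands in for the paper's tau-leaping simulator, and initial conditions are uniform integers in [600, 1400] for both species, since the paper does not state the range. The names `simulate_lv` and `make_pairs` are hypothetical.

```python
import numpy as np

RATES = (10.0, 0.1, 10.0)                      # x -> 2x, x + y -> 2y, y -> ø
CHANGE = np.array([[1, 0], [-1, 1], [0, -1]])  # net state change per reaction

def simulate_lv(state, dt, rng, rates=RATES):
    """Exact (direct-method) simulation of Lotka-Volterra for time dt."""
    x, y = state
    t = 0.0
    while True:
        a = np.array([rates[0] * x, rates[1] * x * y, rates[2] * y], dtype=float)
        a0 = a.sum()
        if a0 == 0:
            return int(x), int(y)  # both species extinct: nothing can fire
        t += rng.exponential(1.0 / a0)
        if t > dt:
            return int(x), int(y)
        dx, dy = CHANGE[rng.choice(3, p=a / a0)]
        x, y = x + dx, y + dy

def make_pairs(n_pairs, dt, rng, low=600, high=1400):
    """(initial state, state measured dt later) pairs; the range is an assumption."""
    pairs = []
    for _ in range(n_pairs):
        x0 = tuple(int(v) for v in rng.integers(low, high + 1, size=2))
        pairs.append((x0, simulate_lv(x0, dt, rng)))
    return pairs

rng = np.random.default_rng(0)
short_gaps = make_pairs(10, 0.002, rng)  # short-time-gap data set
```

A long-gap data set would use `dt = 0.1`; with counts near 1000 that means roughly ten thousand reaction events per pair, so a pure-Python loop is slow but workable.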
In the evolutionary algorithm we use a population size of 30, a crossover probability of 50%, and a mutation probability of 15%. We allow a maximum of 3 reactions in each model. In estimating the model density for a data point, we sample 100 independent simulations. We track various statistics of the best solution throughout each trial, including fitness on training and test data sets. We terminate all trial runs after 300 iterations (generations) of the evolutionary algorithm.

We repeated multiple trials of the evolutionary algorithm using three different fitness metrics:
1. Log-likelihood only
2. Median distance only
3. The proposed distance and log-likelihood metric

This allows us to evaluate the strengths and weaknesses of each component in the proposed metric.

Figure 4. The search performance of the three compared fitness metrics. The top panes show performance when data points appear in rapid succession with short gaps in time. The bottom panes show performance when there are long gaps of time between data points. The left panes show the likelihood score of the best model during the search. The right panes show the percent of runs that identified the exact solution for the amount of computational effort. Error bars indicate the standard error.
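The settings above can be gathered into a configuration and a small generational loop. This is a sketch, not the paper's implementation: the selection scheme is not specified in the text, so binary tournament selection with single-elite survival is assumed, the 15% mutation probability is passed to the caller's operator as a per-element rate, and the operators (for example, those of Section 3.1) receive Python's `random.Random` as their generator. `CONFIG` and `evolve` are hypothetical names.

```python
import random

CONFIG = {
    "population_size": 30,
    "crossover_prob": 0.5,
    "mutation_prob": 0.15,
    "max_reactions": 3,
    "simulations_per_point": 100,
    "generations": 300,
}

def evolve(random_genome, mutate, crossover, fitness, cfg=CONFIG, seed=0):
    """Generational EA; returns (best_genome, best_fitness)."""
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(cfg["population_size"])]
    scores = [fitness(g) for g in pop]

    def pick():  # binary tournament selection (assumed)
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if scores[i] >= scores[j] else pop[j]

    for _ in range(cfg["generations"]):
        best = max(range(len(pop)), key=scores.__getitem__)
        children = [pop[best]]  # elitism (assumed): carry over the current best
        while len(children) < len(pop):
            child = pick()
            if rng.random() < cfg["crossover_prob"]:
                child = crossover(child, pick(), rng)
            children.append(mutate(child, cfg["mutation_prob"], rng))
        pop, scores = children, [fitness(g) for g in children]
    best = max(range(len(pop)), key=scores.__getitem__)
    return pop[best], scores[best]
```

Keeping the best model unchanged each generation makes the best fitness non-decreasing, which matches how the best-so-far likelihood curves of Figure 4 are reported.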
5. RESULTS
The first result is that the evolutionary algorithm is able to find the maximum likelihood model with all three compared fitness metrics. For the short-time-gap data set, Figure 4 (top) shows that all three metrics reach approximately 90% convergence to the exact known model. Both the likelihood and hybrid metrics reach 100% convergence after 100 generations. In terms of computation time, each generation took approximately 1 minute. Most of the computational cost lies in simulating candidate models to estimate their probability densities at each data point.

On the data set with large time gaps, Figure 4 (bottom) shows greater differentiation between the three metrics. The two-component metric reaches the highest-likelihood models and the highest convergence, followed by the likelihood-only metric. The distance-only metric performs the worst.

Interestingly, when the time gaps are short, the performance of the two-component metric and the likelihood metric is approximately the same. This indicates that with short time gaps, the probability density of random candidate models is more likely to provide a useful search gradient, because data points are close to their initial conditions. Here, there is no benefit to using the extra distance component in the fitness metric. However, the distance component appears to be crucial when the data set has large time gaps (Figure 4). Here, the two-component metric outperforms the other metrics. Also interesting is that the distance metric alone performs very poorly. This metric allows models to center their distributions on the data, but does not optimize the likelihood, making it inadequate on its own.

In Figure 5 we compare the relationship between the log-likelihood score and the distance metric. The distance is correlated with the log-likelihood, but imperfectly. There is a large vertical variance in the log-likelihood for a fixed distance, indicating that the log-likelihood metric is inaccurate, or at least unstable, at the tails of the model probability distribution.
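The trait analysis that follows relies on two simple measures: novelty, the average distance summed over reaction coefficients from a candidate to its nearest neighbors in the population [13], and bloat, the ratio of a candidate's complexity to the target's. A minimal sketch; the L1 distance over a flattened coefficient vector and k = 3 neighbors are assumptions, since the paper does not state them, and the function names are hypothetical.

```python
import numpy as np

def novelty(candidate, population, k=3):
    """Mean L1 distance from `candidate` to its k nearest neighbors.

    candidate:  flattened coefficient vector, shape (p,)
    population: other members' flattened vectors, shape (N, p),
                excluding the candidate itself
    """
    c = np.asarray(candidate, dtype=float)
    pop = np.asarray(population, dtype=float)
    dists = np.abs(pop - c).sum(axis=1)  # distance summed over coefficients
    k = min(k, len(dists))
    return float(np.sort(dists)[:k].mean())

def bloat(model_complexity, target_complexity):
    """Bloat ratio; 1.0 means the same complexity as the target (no bloat)."""
    return model_complexity / target_complexity
```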
Finally, we collected various traits of the best solution for each algorithm during each search, shown in Figure 6. The first observation is that the genotypic age [12] of the best solution (measured in generations) is, on average, roughly equal to the total number of generations. This indicates that the evolutionary search is not being trapped by local optima; otherwise the best solutions would appear younger, as younger solutions would replace solutions stuck in local optima. Interestingly, the distance metric algorithm tended to have the highest ages, suggesting that it avoided local optima most, perhaps by identifying an attracting region for the global optimum most reliably.

The novelty of the best solution over time, shown in Figure 6, indicates that the populations are initially very diverse before converging onto optima, but no clear difference between the compared metrics is apparent. Novelty [13] is defined as the average distance, summed over the reaction coefficients, of a candidate solution to its nearest neighbors in the current population.

In terms of bloat [14], the algorithm starts off with a low bloat ratio after random initialization. The bloat tends to increase quickly, and then falls toward a ratio of 1 (no bloat) as the best solution converges to the target (Figure 6). The distance-only metric tended to reach higher bloat, which may reflect that it was less likely to converge to the target.

Figure 5. The relationship between the distance metric of a model and its corresponding likelihood given the experimental data. Each point in the plot is a random candidate model during the likelihood search.

Figure 6. Traits of the best model over time during the evolutionary search. The top left plot shows the genotypic age of the best solution (the number of generations any part of the solution has existed in the population). The top right shows the novelty of the best solution (how different it is from the rest of the population). The bottom pane shows the bloat of the best solution (the ratio of its complexity to the target solution's complexity). Error bars indicate the standard error.

One final observation is that for the traits in Figure 6, there appears to be very little difference between the likelihood metric and the two-component metric. The key difference is only in the overall performance (Figure 4). This suggests that the role of the distance component is to help models move toward the data so that the likelihood component can be used, and that it does not impact other aspects of the population or evolutionary algorithm.

6. CONCLUSIONS
In this paper we introduced an automated algorithm for identifying stochastic reaction models. The proposed method used an evolutionary algorithm to identify a maximum likelihood set of reactions and reaction coefficients. Instead of only optimizing likelihood, the proposed algorithm used a two-component fitness metric that optimized the distance of a candidate model's distribution from the data point when the likelihood was too small to provide an accurate search gradient.

The experiments indicate that the likelihood metric alone performs well on data sets with short time gaps. However, when the data set contained large time gaps, where the state of the system evolved far from the local behavior, the two-component fitness metric performed best, finding the exact target solution faster and more reliably. Observations on the age, novelty, and bloat of the best solution indicate that the algorithm avoids local optima, and could scale well to systems of increasing complexity.
7. ACKNOWLEDGMENTS
This work was supported in part by NIH NIDA grant RC2 DA028981, NSF CDI Grant ECCS 0941561, and DTRA grant HDTRA 1-09-1-0013. The content of this paper is solely the responsibility of the authors and does not necessarily represent the official views of the sponsoring organizations.

8. REFERENCES
[1] D. T. Gillespie, "Exact Stochastic Simulation of Coupled Chemical Reactions," The Journal of Physical Chemistry, vol. 81, pp. 2340-2361, 1977.
[2] D. T. Gillespie, "Stochastic simulation of chemical kinetics," Annu. Rev. Phys. Chem., vol. 58, pp. 35-55, 2007.
[3] J. L. Doob, "Markoff chains - denumerable case," Trans. Amer. Math. Soc., vol. 58, pp. 455-473, 1945.
[4] Y. Cao, D. T. Gillespie, and L. R. Petzold, "Avoiding negative populations in explicit Poisson tau-leaping," J. Chem. Phys., vol. 123, p. 054104, Aug. 1, 2005.
[5] S. Hideaki and S. Shigeru, "A Method for Selecting the Bin Size of a Time Histogram," Neural Comput., vol. 19, pp. 1503-1527, 2007.
[6] M. Rosenblatt, "Remarks on some nonparametric estimates of a density function," Ann. Math. Statist., vol. 27, pp. 832-837, 1956.
[7] E. Parzen, "On Estimation of a Probability Density Function and Mode," The Annals of Mathematical Statistics, vol. 33, pp. 1065-1076, 1962.
[8] G. Terrell and D. Scott, "Variable kernel density estimation," The Annals of Statistics, vol. 20, pp. 1236-1265, 1992.
[9] I. S. Abramson, "On bandwidth variation in kernel estimates - a square root law," Ann. Statist., vol. 10, pp. 1217-1223, 1982.
[10] A. J. Lotka, Elements of Physical Biology. Baltimore: Williams & Wilkins Co., 1925.
[11] V. Volterra, "Variazioni e fluttuazioni del numero d'individui in specie animali conviventi," Mem. R. Accad. Naz. dei Lincei, vol. 2, 1926.
[12] G. S. Hornby, "ALPS: the age-layered population structure for reducing the problem of premature convergence," in GECCO 2006: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, vol. 1, ACM SIGEVO (formerly ISGEC), 2006, pp. 815-822.
[13] J. Lehman and K. O. Stanley, "Abandoning Objectives: Evolution through the Search for Novelty Alone," Evol. Comput., p. 24, Sep. 24, 2010.
[14] W. Banzhaf and W. B. Langdon, "Some considerations on the reason for bloat," in Genetic Programming and Evolvable Machines, vol. 3, 2002, pp. 81-91.