Modelling the Randomness in Biological Systems

Ole Schulz-Trieglaff

Master of Science
School of Informatics
University of Edinburgh
2005

Abstract

This dissertation deals with the modelling of biological processes using Stochastic Petri Nets (SPNs). Petri Nets are a formalism from Computer Science (Petri 1962) and have been used for many years to model systems such as computer networks. Recently, they have also been applied to biological processes such as genetic networks (Goss & Peccoud 1998). The outcome of this dissertation is twofold. First, a software framework was implemented that allows the user to create an SPN model in a graphical editor. This software is called PNK 2e and can be used to simulate the behaviour of the net using the infrastructure of the Systems Biology Workbench, a collection of simulation and analysis tools tailored to biological applications. PNK 2e is based on the Petri Net Kernel (PNK), an Open Source project developed at the Humboldt University in Berlin, Germany. PNK 2e is also available under an Open Source licence. It can import Petri Net models from different XML description formats. The graphical representation of biological models offered by SPNs is very intuitive. Furthermore, an SPN can be simulated using algorithms that are commonly used in Systems and Theoretical Biology. PNK 2e was the topic of a scientific poster at the BioSysBio conference 2005 in Edinburgh. It is available online [1] and has been announced in a forum dealing with Systems Biology software. In the second part of this dissertation, PNK 2e was used to simulate genetic oscillators, networks of genes and proteins that exhibit oscillations with a period close to 24 hours. These systems are thought to represent the molecular basis of the internal clock of many organisms. Two oscillators of very different architecture (Gonze, Halloy & Goldbeter 2002, Vilar, Kueh, Barkai & Leibler 2002) were simulated with different numbers of molecules. Their behaviour at low molecule numbers is of special interest, since circadian clocks are known to work reliably with only very few molecules. The results support previous findings (Barkai & Leibler 2000) but also provide new insights into the design features of biomolecular clocks. One particular architecture was found to be actually driven by fluctuations in the molecular populations. This noise is usually considered a source of disturbance, but in this case it is essential for the functioning of the clock. This architecture is also remarkably robust to mutations of key genes and to changes of the rate constants in the model.

[1] trieglaf/pnk2e

Acknowledgements

I am deeply indebted to my supervisor Prof. Gordon Plotkin for his invaluable advice during my time in Edinburgh and for reviewing this dissertation. I would also like to thank Prof. Andrew Millar for his introduction to circadian clocks and many helpful ideas. Many thanks to Lucia Castellanos and Malcolm Leiva Gebhard, who reviewed parts of this dissertation and gave much advice and much appreciated support. Thanks to Stephen Ramsey (Institute for Systems Biology, Seattle), Frank Bergmann (Keck Graduate Institute, Claremont) and Michael Weber (German Aerospace Center) for making their software available for use in this project. Funding was provided by the Student Awards Agency for Scotland and the Landesregierung des Saarlandes (Government of the state of Saarland, Germany).

Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Ole Schulz-Trieglaff)

Table of Contents

1 Introduction
   Motivation
   Scope of this Dissertation
   Organisation
2 Background
   Introduction
   Petri Net theory
   Untimed Petri Nets
   Stochastic Petri Nets and Markov Processes
   Representing Biological Processes with Petri Nets
   Kinetics of chemical reactions
   Deterministic Kinetics
   Stochastic Kinetics
   Simulation Algorithms
   The Gillespie Algorithm
   The Gibson-Bruck Algorithm
3 Background on Stochastic Models and Tools in Biology
   Previous Work on Stochastic Models
   A Stochastic Model of Pathway Bifurcation in Phage λ
   Stochastic analysis of Biological models with Petri Nets
   Analysis of the E. coli Stress Circuit with Stochastic Nets
   Review of other Petri Net Tools
   Conclusions
4 Technical Methodology
   The Petri Net Kernel
   Design and Concepts
   The Extended Kernel (PNK 2e)
   The Systems Biology Workbench
   Stochastic Simulations with Dizzy
   Conclusions
5 Experiments
   The Volterra-Lotka Reactions
   Results and Discussion
   Stochastic models of circadian rhythms
   The delay-based Model
   The hysteresis-based Model
   Synchronising several oscillating cells
   Discussion
   Conclusions
6 Concluding remarks and Observations
   Unsolved Problems
   Suggestions for Future Work
A User guide to PNK 2e
B Delay-based Oscillatory Network
C Genetic Oscillator based on Hysteresis
D Glossary of biological terms
Bibliography

List of Figures

2.1 Firing of transitions in a Petri Net
2.2 A Petri Net Model of Gene Expression
2.3 Example of Michaelis-Menten kinetics
Simplified model of the E. coli stress circuit
Overview of the extended Petri Net Kernel
Broker architecture of the SBW
Petri Net representation of the Lotka-Volterra Reactions
Stochastic simulation of the Lotka-Volterra Reactions
Core model for circadian rhythms based on delay
Delay-based circadian clock: stochastic and deterministic simulation
Stochastic simulation with changing numbers of molecules
The hysteresis-based model
Hysteresis-based circadian clock: stochastic and deterministic simulation
Hysteresis-based circadian clock: simulation with low degradation rate of repressor
Stochastic simulation with changing numbers of molecules (hysteresis)
Simulation of the effects of gene duplication
Simulation of both circadian clocks with low rate constants
Synchronisation of several cells
Robustness of the oscillations in both models measured by half-life of autocorrelation
A.1 Screenshot of PNK 2e
A.2 Screenshot of PNK 2e
B.1 SPN representation of the circadian clock in Neurospora
C.1 SPN representation of the hysteresis-based model

Chapter 1 Introduction

In recent years, the new scientific field of Systems Biology has aroused much interest. Systems Biology applies methods from Mathematics, Computer Science and Physics to improve our understanding of how biological systems function. It tries to understand the cell in an integrated manner, obtaining an understanding of its individual components by examining their relationships and relating them to a global view of the cell. An important aspect of Systems Biology is the search for representations of cellular processes that can be used to compute their future behaviour. This idea is not new, and simple models of biochemical reactions have been studied for a long time. But the availability of high-throughput experiments and recent advances in computational methods and computer power have greatly increased their potential. However, the search continues for modelling techniques that are able to capture the full complexity of the cell and its components. The first goal of this M.Sc. thesis was to develop a software tool that facilitates the modelling of biological systems with Stochastic Petri Nets. During the preparation of this project, its scope was extended: in addition, we provide a comparison of two general models representing genetic oscillators. These oscillators are assumed to represent the basis of the circadian rhythm that is observed in many organisms. The experiments give an example of a successful application of PNK 2e, the software that was written during the first part of this dissertation. In addition, they provide insight into possible architectures for more detailed and realistic models of circadian clocks.

Project Objectives

The tasks of this dissertation can be summarized as follows:
- Development of a software platform for experiments with Stochastic Petri Nets. In contrast to software that is already available, this platform should be tailored to biological applications; for example, it should support data exchange formats that are commonly used in the scientific community.
- Application of this software to model an exemplary biological system, evaluation of the usefulness of Stochastic Petri Nets for this task, and validation of the software by comparing the experimental results to findings published in the literature.
- Further experiments to compare two models describing circadian clocks and to examine how their behaviour reacts to changes in the parameters and structure of the model.

1.1 Motivation

Petri Net software tools

Currently there is a wealth of software environments available that can be used to create representations of chemical reactions and to compute their future behaviour. In the early stage of this M.Sc. dissertation, it was planned to develop SPN software entirely from scratch. Then it was discovered that several other tools with similar functionality already exist. However, none of these tools was both freely available and of use in biological applications. Therefore it was decided to develop a new Petri Net tool tailored to the modelling of biological processes. Several Petri Net tools were found on the Internet that had been developed during M.Sc. projects at other universities. It was estimated that the development of a new tool from scratch would take almost all of the three months available, leaving no time to model interesting systems and to conduct detailed experiments. That is why we decided to use the infrastructure of existing Open Source projects in this dissertation. Several software projects seemed suitable for this task.
In the beginning, it was planned to extend the software PIPE (Platform Independent Petri Net Editor), but this turned out to be far more difficult than expected. PIPE supports only non-stochastic Petri Nets, and extending it with another net type would have been very laborious. As a consequence, we decided to use the Petri Net Kernel instead, together with the Systems Biology Workbench, a collection of analysis tools for Systems Biology, to simulate the behaviour of the net. Both software tools are published under Open Source licences, and their authors were very helpful and provided much advice. The Petri Net Kernel (PNK) was developed at the Humboldt University in Berlin, Germany, and the Systems Biology Workbench is a collaborative project between several institutes, among them the California Institute of Technology, the Keck Graduate Institute in Claremont and the Institute for Systems Biology in Seattle.

Circadian Oscillators

In the second part of this M.Sc. project, we validate the PNK 2e software by modelling so-called circadian oscillators. Many organisms are known to have an internal clock that has a significant influence on their behaviour and life cycle. Circadian oscillators, or oscillatory networks, are believed to be the source of these clocks at the molecular level. They consist of a small set of genes and proteins that interact with each other. At least one of these proteins exhibits rhythmic activity, i.e. its concentration in the cell oscillates with a period of about 24 hours. This clock protein is assumed to regulate the activity of other genes and to drive the circadian rhythm of the organism. Understanding exactly how these circadian rhythms are created is very important, for instance in order to stimulate the growth of food plants. Nevertheless, the exact behaviour of all components in a typical clock is not yet understood, and these details are often very difficult to examine in biological experiments. We therefore decided to model two competing architectures for a genetic oscillator with Stochastic Petri Nets and to simulate them under different conditions. As a first step, we tried to recreate findings published elsewhere (Gonze, Halloy & Goldbeter 2002, Vilar et al. 2002) in order to validate our approach. We also extended the experiments done by others and tried to gain new insights into possible models for circadian clocks. A group of experimental biologists at the School of Biosciences in the University of Edinburgh works on mathematical models of circadian clocks. This group is led by Prof. Andrew Millar, who provided much helpful advice and valuable hints for the experiments in this dissertation.

1.2 Scope of this Dissertation

As remarked above, there are already many software tools that can be used to model biological systems. However, (Stochastic) Petri Nets provide a more formal view of the modelling problem. Their theory is well researched, and efficient algorithms exist not only to simulate their dynamic behaviour but also to examine their structural properties. The software presented in this dissertation makes use of other Open Source projects and relies on common data exchange formats. This is particularly important since many tools merely repeat work that has already been done elsewhere. In addition, many of these tools use their own formats to store their data and are thus not compatible with other software. In the experimental part, we focus on stochastic simulations with different numbers of molecules and examine the influence of changes in the rate constants. This is important for two reasons. First, genetic circuits must function reliably under a variety of conditions, even if only very few instances of the key proteins involved are present. Second, the reaction rates can be influenced by changes in the temperature or nutrition of the organism, and the behaviour of a genetic oscillator should not be affected by moderate changes in these constants.

1.3 Organisation

This dissertation does not assume that the reader has a background in chemical kinetics or Stochastic Petri Nets; therefore one chapter is dedicated to explaining the most relevant issues, including a brief introduction to Markov processes, which are closely related to SPNs. Basic knowledge of biology, such as the regulation of genes and the synthesis of proteins, is assumed. A glossary of the most important biological terms used in this dissertation is given in Appendix D. The organisation of this dissertation is as follows.
Chapter Two provides an introduction to the theory of Petri Nets. Stochastic Petri Nets are introduced and their relationship to stochastic kinetics is explained. We also compare the deterministic and the stochastic assumptions in chemical kinetics. The most common algorithms for the stochastic simulation of chemical reactions are presented, and we discuss their individual advantages and how they can be used to simulate Stochastic Petri Nets.

Chapter Three provides an overview of stochastic models and tools in Biology. We present three related efforts on the stochastic modelling of biological processes, aiming to give an overview of previous work and to discuss its relationship to this dissertation. The second part of the chapter introduces two software tools that are similar to the software developed during this dissertation; an overview of their functionality is given and they are compared to our software, PNK 2e. Chapter Four describes the Petri Net Kernel in detail. We enumerate the extensions that we made and explain why they were necessary. Technical difficulties that were encountered are also discussed in this chapter. We also give details of other software that was used, such as the Systems Biology Workbench and the description language SBML. Chapter Five describes the experiments that were performed to validate the new version of the Petri Net Kernel. We give a brief introduction to circadian clocks and successfully replicate the results of other authors. Furthermore, an experimental comparison of two competing models for circadian oscillators is given that underlines the usefulness of PNK 2e for experiments in Systems Biology. Chapter Six contains an evaluation of this dissertation, summarizes the conclusions and gives indications for future work.

Chapter 2 Background

2.1 Introduction

This chapter lays the theoretical foundations of this project. A short overview of Petri Net theory is given and some of its properties are introduced. There are many extensions to this basic theoretical concept; however, only Stochastic Petri Nets and their relation to Markov processes will be discussed, since this is the model used in this work. Next, an introduction to mathematical modelling in Biology is given, and the two main competing approaches, deterministic models based on differential equations and stochastic simulations, are compared. The stochastic approach is usually based on the assumption that the behaviour of the system satisfies the Markov property. Therefore an outline of the relationship between Stochastic Petri Nets and Markovian random processes is also provided. One of the advantages of Stochastic Petri Nets is that they can be simulated using efficient algorithms. Two of these algorithms are therefore introduced in the last section; they are implemented in the Systems Biology Workbench and were used in our experiments.

2.2 Petri Net theory

Petri Nets provide a graphical notation for the formal description of the dynamic behaviour of systems. Although Petri Nets have been used for the qualitative modelling of computer systems and communication networks since the 1960s (Petri 1962), their use as a paradigm for quantitative modelling started only about twenty years ago. Untimed Petri Nets (Place-Transition Nets) are
introduced first. Following that, Stochastic Petri Nets and their relation to Markov processes are described.

Untimed Petri Nets

A Petri Net (PN) is a directed bipartite graph with two sets of nodes, transitions and places. Transitions are drawn as bars and places as circles. Places can contain tokens, which are drawn as black dots. The state of a PN is given by the number of tokens in each place and is called a marking. The initial placement of tokens is called the initial marking and represents the starting state of the net. Transitions and places are connected by directed arcs. The input places of a transition are the places with an arc pointing from the place to the transition; its output places are the places with an arc pointing from the transition to the place. An arc can carry an inscription, its multiplicity. Tokens move through the net according to firing rules associated with the transitions. A transition is enabled if each of its input places contains at least as many tokens as the multiplicity of the connecting arc. An enabled transition fires (is executed) by removing from each input place as many tokens as the inscription of the arc from that place to the transition, and by adding to each output place as many tokens as the inscription of the arc from the transition to that place.

[Figure 2.1: Example of the execution of a transition in a Petri Net. (a) Petri Net before firing; (b) Petri Net after firing.]

Starting from the initial marking and following the firing rules, we can progress through all possible
states of the net. This procedure is called the token game. The set of all possible states of a net, given a certain initial marking, is called the reachability set with respect to this initial marking. Different initial markings may give rise to different reachability sets; for this reason, the initial marking is an important part of the model. The token game gives rise to the reachability graph, which contains all markings encountered during the token game as nodes and the transitions between these markings as arcs.

Stochastic Petri Nets and Markov Processes

Stochastic Petri Nets (SPNs) are a popular extension of basic Petri Net theory. In an SPN, each transition fires after an exponentially distributed delay. In other words, each transition has an associated firing rate $\mu$, which is the parameter of a (negative) exponential distribution with density function

$$ f(x) = \mu e^{-\mu x}, \qquad x \ge 0. $$

This distribution has the memoryless property: if $X$ is an exponentially distributed random variable, then

$$ P(X > t + s \mid X > t) = P(X > s) \qquad \text{for all } t > 0,\ s > 0. $$

Proof: since the event $X > t + s$ is contained in the event $X > t$,

$$ P(X > t + s \mid X > t) = \frac{P(X > t + s)}{P(X > t)} = \frac{e^{-\mu(t+s)}}{e^{-\mu t}} = e^{-\mu s} = P(X > s). $$

The exponential distribution is often used to model waiting times until some event occurs. In an SPN, such an event is the execution of the next transition. In this context, knowing that $t$ time units have already elapsed without the execution of a transition does not give us any additional information about when the next execution will occur. In other words, if no change of state has happened by time $t$, the distribution of the remaining sojourn time in the current state is the same as if no time had passed. If one or several transitions are enabled in some state of the SPN, a delay is sampled for each of them from its exponential distribution, and the transition with the smallest delay is executed first.
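The firing rule and the token game described above can be sketched in a few lines of Python. This is an illustration only, not code from PNK 2e: it assumes a net is encoded by two hypothetical dictionaries, `pre` and `post`, mapping each transition to the multiplicities of its input and output arcs (keyed by place index), and it enumerates the reachability set by breadth-first search, which terminates only when that set is finite.

```python
from collections import deque


def enabled(marking, pre_t):
    """A transition is enabled if every input place holds at least as many
    tokens as the multiplicity of the connecting arc."""
    return all(marking[p] >= w for p, w in pre_t.items())


def fire(marking, pre_t, post_t):
    """Fire an enabled transition: remove tokens from its input places and
    add tokens to its output places, according to the arc multiplicities."""
    m = list(marking)
    for p, w in pre_t.items():
        m[p] -= w
    for p, w in post_t.items():
        m[p] += w
    return tuple(m)


def reachability_graph(m0, pre, post):
    """Play the token game from the initial marking m0 (breadth-first search).

    Returns the reachability set and the arcs (m, t, m2) of the reachability
    graph. Terminates only if the reachability set is finite."""
    seen = {m0}
    arcs = []
    queue = deque([m0])
    while queue:
        m = queue.popleft()
        for t in pre:
            if enabled(m, pre[t]):
                m2 = fire(m, pre[t], post[t])
                arcs.append((m, t, m2))
                if m2 not in seen:
                    seen.add(m2)
                    queue.append(m2)
    return seen, arcs


# Illustrative net with places 0 = A, 1 = B, 2 = C and two transitions:
# "bind" (A + B -> C) and "unbind" (C -> A + B).
pre = {"bind": {0: 1, 1: 1}, "unbind": {2: 1}}
post = {"bind": {2: 1}, "unbind": {0: 1, 1: 1}}
states, arcs = reachability_graph((2, 1, 0), pre, post)
print(sorted(states))  # [(1, 0, 1), (2, 1, 0)]
```

With the initial marking (2, 1, 0), only two markings are reachable, because "bind" consumes the single B token and "unbind" restores it; the two arcs of the reachability graph connect these markings in both directions.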

In recent years, there has been much interest in the application of Markov processes to Biology, for instance in modelling the evolution of genome sequences. In fact, an SPN is nothing other than a representation of a continuous-time Markov process, and its reachability graph is simply the state-transition diagram of that process. To underline this fact, we give a brief definition of a Markov process and sketch the relationship.

Markov Processes

A family $X = \{X(t)\}_{t \in T}$ of random variables $X(t)$ is called a stochastic process; it is a discrete-time process if $T = \mathbb{N}$ and a continuous-time process if $T = [0, \infty)$. The state space $S$ of the process is the set of all possible values that $X(t)$ can assume (Ross 1996). The processes underlying SPNs are continuous-time processes whose states are the markings of the net. A Markov process is a stochastic process $X(t)$ that has the Markov or memoryless property: given the value of $X(t)$ at some time $t$, future values $X(s)$ for $s > t$ do not depend on the past history $X(u)$ for $u < t$. Formally, for all $t_1 < \dots < t_n < t_{n+1}$,

$$ P(X(t_{n+1}) = x_{n+1} \mid X(t_n) = x_n, \ldots, X(t_1) = x_1) = P(X(t_{n+1}) = x_{n+1} \mid X(t_n) = x_n). $$

This memoryless property means that once the process has arrived in a particular state, its future behaviour is the same regardless of how it arrived there. Markov processes are popular since the underlying theory is relatively simple. A process can be visualised by its state-transition diagram, which contains the states of the process together with the connecting transitions. If we are in a state $s \in S$ of a Markov process, the Markov property implies that the time until the next change of state is independent of how long ago the previous change of state occurred. In other words, the waiting or sojourn time in a state is memoryless. The only continuous probability distribution with this property is the exponential distribution, and therefore the waiting times until a change of state in a continuous-time Markov process are exponentially distributed. Each transition in a Markov process therefore has a rate, which is the parameter of an exponential distribution.
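The race between enabled transitions described for SPNs, together with the exponential waiting times of a Markov process, implies two facts that can be checked by simulation: the time until the first firing is exponentially distributed with a rate equal to the sum of the enabled rates, and each transition fires first with probability proportional to its rate. The sketch below is a minimal illustration using only the Python standard library; the transition names and rates are made-up values, and `random.expovariate` takes the rate (not the mean) as its argument.

```python
import random


def race(rates, rng):
    """Sample an exponential delay for every enabled transition and
    return (winner, delay) for the transition that fires first."""
    delays = {t: rng.expovariate(mu) for t, mu in rates.items()}
    winner = min(delays, key=delays.get)
    return winner, delays[winner]


def estimate(rates, n=100_000, seed=1):
    """Monte Carlo estimate of how often each transition wins the race
    and of the mean sojourn time in the current state."""
    rng = random.Random(seed)
    wins = {t: 0 for t in rates}
    total_time = 0.0
    for _ in range(n):
        t, d = race(rates, rng)
        wins[t] += 1
        total_time += d
    return {t: w / n for t, w in wins.items()}, total_time / n


rates = {"t1": 1.0, "t2": 3.0}  # illustrative firing rates
freq, mean_sojourn = estimate(rates)
# Theory: P(t1 first) = 1/4, P(t2 first) = 3/4, mean sojourn = 1/(1 + 3) = 0.25
print(freq, mean_sojourn)
```

Sampling one delay per enabled transition and taking the minimum is the same idea as Gillespie's "first reaction" method; the estimated frequencies and mean sojourn time should lie close to the theoretical values in the comment.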
As mentioned above, an SPN gives rise to a Markov process: the reachability graph of the SPN is isomorphic to the state-transition diagram of the Markov process. This means that we can analyse the behaviour of this process and thereby obtain new knowledge about the dynamics of the Petri Net model.

Steady State Distribution of a Markov Process

An important property of a Markov process is its behaviour over a long period of time. Under certain conditions, the process settles into a regular or steady-state behaviour. This does not
mean that the process has stopped making transitions, but that the probability distribution over its states no longer changes. We denote the probability that the Markov process is in state $x_k$ at time $t$ by $\pi_t(x_k)$. The steady state has been reached when this probability no longer depends on time. We then denote the steady-state distribution by $\pi$, so that $\pi(x_k)$ is the probability that the model is in state $x_k$ once the steady state has been reached.

Theorem (Ross 1996): A steady-state distribution $\{\pi(x_k), x_k \in S\}$ exists (and is unique) for every Markov process with the following properties:
- its transition rates are time-homogeneous, i.e. they do not depend on the time at which we observe the process;
- it has a finite number of states;
- it is irreducible, i.e. every state in $S$ can be reached from every other state by following the transitions of the process.

The steady-state distribution can be calculated from the so-called global balance equations, which state that in equilibrium the probability flow out of each state equals the probability flow into it. Writing $q(x_j, x_k)$ for the rate of the transition from $x_j$ to $x_k$, they read

$$ \pi(x_k) \sum_{j \neq k} q(x_k, x_j) = \sum_{j \neq k} \pi(x_j)\, q(x_j, x_k) \quad \text{for all } x_k \in S, \qquad \sum_{k} \pi(x_k) = 1. $$

This is a system of linear equations that can be solved with standard algorithms. If we derive the underlying Markov process of the Petri Net by computing its reachability graph, we can therefore compute the steady state of the Markov process. In practice, however, not every SPN satisfies these conditions (its state space may, for example, be infinite), and even when a steady state exists, computing it may be too expensive if the number of states is very large. This makes observations obtained from stochastic simulations all the more valuable.

Representing Biological Processes with Petri Nets

The first research involving the application of Petri Nets to biological models was conducted by Reddy, Liebman & Mavrovouniotis (1993). They modelled the combined glycolytic and pentose phosphate pathway in the erythrocyte with a Petri Net and used Petri Net theory to analyse qualitative properties of this pathway.
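Returning briefly to the steady-state distribution discussed above: for a small, finite and irreducible Markov process, the global balance equations can be solved directly. The following Python sketch is illustrative only; it assumes the process is given as a hypothetical dictionary `rates` mapping pairs of numbered states `(i, j)` to transition rates (for an SPN these would be read off the reachability graph), replaces one redundant balance equation by the normalisation condition, and uses exact rational arithmetic from the standard `fractions` module. For large state spaces, a sparse iterative solver would be more appropriate.

```python
from fractions import Fraction


def steady_state(n, rates):
    """Solve the global balance equations (pi Q = 0, sum(pi) = 1) for a
    finite, irreducible continuous-time Markov process with states 0..n-1.
    `rates` maps (i, j) to the transition rate from state i to state j."""
    # Generator matrix Q: off-diagonal rates, diagonal = minus total outflow.
    Q = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), r in rates.items():
        Q[i][j] += Fraction(r)
        Q[i][i] -= Fraction(r)
    # pi Q = 0 is the linear system Q^T pi^T = 0. One equation is redundant,
    # so the last one is replaced by the normalisation condition.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


# Illustrative two-state process: 0 -> 1 with rate 2, 1 -> 0 with rate 1.
print(steady_state(2, {(0, 1): 2, (1, 0): 1}))  # [Fraction(1, 3), Fraction(2, 3)]
```

For this two-state example, balance requires $2\,\pi(0) = 1\,\pi(1)$, which together with normalisation gives $\pi = (1/3, 2/3)$, matching the output.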

[Figure 2.2: A Petri Net model of gene expression. This is an abstract model of gene expression and protein synthesis. The gene becomes active with rate λ. The protein is synthesized with rate v and degraded with rate δ. Example taken from Goss & Peccoud (1998).]

In general, it is easy to represent chemical reactions with a Petri Net. We think of places as chemical species and of transitions as reactions occurring between these species. The multiplicity of an arc is given by the stoichiometric coefficient of the species involved in the reaction. Tokens usually represent single molecules but can also stand for a fixed amount of molecules such as a mole (approximately $6.022 \times 10^{23}$ molecules), a common base unit in Chemistry. In recent years, several publications have used Stochastic Petri Nets to model coupled chemical reactions (Goss & Peccoud 1998, Srivastava, Peterson & Bentley 2001). This is because, when the participating chemical species occur only at very low concentrations, a stochastic model of the chemical kinetics is more accurate than a deterministic one. In this context, we assume that each reaction occurs with a certain probability and that the firing rates in the net are given by the stochastic rate laws. Details of this stochastic assumption in chemical kinetics are given in the next section.

2.3 Kinetics of chemical reactions

The next two sections give an overview of chemical kinetics. The basics of the deterministic formulation are outlined and compared to the stochastic approach.

Deterministic Kinetics

Chemical kinetics is concerned with the time evolution of a reaction system. Classical or deterministic kinetics is expressed in terms of the concentrations of the chemicals, which vary continuously as the reactions progress. We assume that the rate of a reaction follows the law of mass action, which means that the rate is proportional to the concentration of each reactant raised to the power of its stoichiometric coefficient (Cox & Nelson 2004). For reactants with coefficient one, the rate is thus simply proportional to the product of the reactant concentrations. As an example, consider the second-order reaction

$$ A + B \xrightarrow{k} C $$

The rate of change of C, $d[C]/dt$, is given by $k[A][B]$, where $[A]$ and $[B]$ denote the concentrations of A and B respectively and $k$ is the rate constant. Thus the rate of change of C is modelled by the differential equation

$$ \frac{d[C]}{dt} = k[A][B]. $$

The rate constant $k$ needs to be specified, as well as the initial concentrations of A and B. This procedure can of course be extended to several coupled reactions, which give rise to a set of coupled differential equations. These equations can be used to compute the time evolution of the reactions, either by solving them analytically (which is rarely possible) or by numerical integration. The idea behind the deterministic formulation of chemical kinetics is that, even if single molecules move randomly, the overall behaviour of a large group of molecules follows a pattern that can be modelled deterministically. Even if a set of differential equations cannot be solved analytically, it is often possible to determine characteristics of its steady-state behaviour by setting the right-hand sides of the equations to zero and solving for the concentrations of the reactants.
However, knowing the steady states of a system does not tell us whether a particular set of initial conditions will lead to one of them, nor how that state will be reached. An example is given by our experiments with the Lotka-Volterra reactions presented in section 5.1.
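To illustrate the numerical integration of mass-action equations, the sketch below integrates the reaction A + B → C, i.e. $d[A]/dt = d[B]/dt = -k[A][B]$ and $d[C]/dt = k[A][B]$, with the classical fourth-order Runge-Kutta method. It is a minimal example under stated assumptions: the rate constant and initial concentrations are arbitrary illustrative values, and a fixed step size is used. When $[A]_0 = [B]_0$, the exact solution $[A](t) = [A]_0 / (1 + k[A]_0 t)$ is available, which makes the result easy to check; in practice a library integrator with adaptive step size control would normally be used.

```python
def mass_action_rhs(y, k):
    """Right-hand side of the mass-action ODEs for A + B -> C:
    d[A]/dt = d[B]/dt = -k[A][B], d[C]/dt = k[A][B]."""
    a, b, c = y
    v = k * a * b  # reaction rate k[A][B]
    return (-v, -v, v)


def rk4(f, y0, k, t_end, steps):
    """Integrate dy/dt = f(y, k) from t = 0 to t_end with the classical
    fourth-order Runge-Kutta method and a fixed step size."""
    h = t_end / steps
    y = tuple(y0)
    for _ in range(steps):
        s1 = f(y, k)
        s2 = f(tuple(yi + h / 2 * d for yi, d in zip(y, s1)), k)
        s3 = f(tuple(yi + h / 2 * d for yi, d in zip(y, s2)), k)
        s4 = f(tuple(yi + h * d for yi, d in zip(y, s3)), k)
        y = tuple(yi + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                  for yi, d1, d2, d3, d4 in zip(y, s1, s2, s3, s4))
    return y


k, a0 = 0.5, 1.0                 # illustrative rate constant and [A]0 = [B]0
a, b, c = rk4(mass_action_rhs, (a0, a0, 0.0), k, t_end=4.0, steps=1000)
exact = a0 / (1 + k * a0 * 4.0)  # analytic solution for [A]0 = [B]0
print(a, exact)                  # both approximately 0.3333
```

Because every reaction that consumes A produces C, the sum $[A] + [C]$ is conserved; Runge-Kutta methods preserve such linear invariants, so it stays at its initial value up to rounding error.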

[Figure 2.3: Typical time course of a reaction following Michaelis-Menten kinetics (plot of the concentration of product against time). The rate of synthesis of the product P increases rapidly but levels off after some time, due to saturation of the enzyme.]

Michaelis-Menten kinetics

Michaelis-Menten kinetics is a special case of deterministic kinetics, named after Leonor Michaelis (1875-1949) and Maud Menten (1879-1960). Since these kinetics occur several times in the experimental section of this work, a brief explanation is given here. We consider a set of reactions in which a substrate S is converted into a product P only in the presence of an enzyme E:

$$ S + E \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} ES \qquad (2.1) $$

$$ ES \xrightarrow{k_2} P + E \qquad (2.2) $$

We assume that the reactions follow mass-action kinetics. The forward reaction of (2.1) has the rate constant $k_1$, the backward reaction $k_{-1}$, and reaction (2.2) the constant $k_2$. These reactions can be modelled by a set of coupled differential equations (Cox & Nelson 2004). Under suitable assumptions, most importantly that the concentration of the complex ES quickly reaches a quasi-steady state, it can be shown that these equations reduce to a single equation expressing the rate of change of the product $[P]$ in terms of the Michaelis-Menten constant

$$ K_M = \frac{k_{-1} + k_2}{k_1} $$

and the maximum rate $V_{\max} = k_2 [E_0]$ of the reaction, where $[E_0]$ is the total concentration of the enzyme E. The Michaelis-Menten equation is then

$$ \frac{d[P]}{dt} = \frac{V_{\max}[S]}{K_M + [S]}. $$

Both constants, $K_M$ and $V_{\max}$, can be determined experimentally. $V_{\max}$ is obtained by increasing the substrate concentration until the rate no longer increases, and $K_M$ equals the substrate concentration at which $d[P]/dt = V_{\max}/2$ (set $K_M = [S]$ in the equation above).
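The Michaelis-Menten relations above can be written as two small Python functions. This is a sketch with hypothetical function names and illustrative parameter values, not code from PNK 2e or SBML. It computes $K_M$ and $V_{\max}$ from the elementary constants, checks the half-maximal rate at $[S] = K_M$, and shows that two different sets of elementary constants can yield exactly the same pair $(K_M, V_{\max})$, which is the source of the ambiguity in the stochastic decomposition discussed next.

```python
def michaelis_menten_constants(k1, k_minus1, k2, e0):
    """Return (K_M, V_max) for S + E <-> ES -> P + E, given the elementary
    rate constants and the total enzyme concentration e0."""
    km = (k_minus1 + k2) / k1
    vmax = k2 * e0
    return km, vmax


def mm_rate(s, km, vmax):
    """Michaelis-Menten rate d[P]/dt = V_max [S] / (K_M + [S])."""
    return vmax * s / (km + s)


km, vmax = michaelis_menten_constants(k1=10.0, k_minus1=5.0, k2=5.0, e0=0.1)
print(km, vmax)               # 1.0 0.5
print(mm_rate(km, km, vmax))  # 0.25, i.e. V_max / 2 at [S] = K_M

# Different elementary constants, identical deterministic parameters:
print(michaelis_menten_constants(k1=20.0, k_minus1=15.0, k2=5.0, e0=0.1))  # (1.0, 0.5)
```

For very large substrate concentrations, `mm_rate` approaches `vmax`, reflecting saturation of the enzyme.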

This approximation is also part of the SBML description language (section 4.2) and occurs several times in the deterministic model of the biomolecular clock in Neurospora (section 5.2). In order to simulate Michaelis-Menten kinetics stochastically, the equation is usually decomposed again into its elementary steps. The problem is that this decomposition is ambiguous: the deterministic model specifies only $K_M$ and $V_{\max}$, which do not determine the elementary rate constants $k_1$, $k_{-1}$ and $k_2$ uniquely.

Stochastic Kinetics

As outlined before, classical mass-action kinetics assumes that the behaviour of a large number of molecules follows deterministic patterns. The reaction constants are regarded as rates, and the species concentrations are represented by continuous, single-valued functions of time. In many cases, random fluctuations and correlations do not play a significant role in the behaviour of a system, and this assumption is adequate. Nevertheless, there are many examples for which this approach turned out not to be correct (Arkin, Ross & McAdams 1998). We start this section by outlining the central assumptions of the stochastic approach to chemical kinetics. We then describe how this approach relates to the deterministic model and how stochastic rate constants can be converted into deterministic ones and vice versa. In a stochastic context, the reaction constants are viewed as reaction probabilities per unit time. The temporal behaviour of the system is modelled as a Markovian random walk on the space of the molecular populations of the species. It has been proved that the stochastic formulation reduces to the deterministic formulation in the thermodynamic limit, i.e. when the numbers of molecules and the volume approach infinity while the concentrations remain constant (Kurtz 1971). We consider a set of $n$ chemical species $X_1, X_2, \ldots, X_n$ and a set of $m$ reactions $R_1, R_2, \ldots, R_m$.
If the container in which the reactions take place is well stirred and in thermal equilibrium, it can be shown that the probability per unit time that two molecules $X_i$ and $X_j$, $i, j \in \{1, \dots, n\}$, collide is constant (Gillespie 1977). Each reaction $R_i$ can therefore be characterized by a single constant $c_i$, defined such that $c_i\,dt$ is the average probability that a particular combination of $R_i$ reactant molecules will react according to $R_i$ in the next infinitesimal time interval $dt$. As an example, let us consider again a simple second-order reaction:

$$A + B \rightarrow C \qquad (2.3)$$

The rate constant for this reaction determines the probability that a given pair of molecules A and B reacts to produce C. Since there are $A \cdot B$ different combinations of molecules of this type, the probability that this reaction will occur somewhere inside the container in the next infinitesimal time interval $dt$ is $A B\, c_i\, dt$, where A and B denote the numbers of molecules of species A and B and $c_i$ is the reaction constant defined above. If the reaction had been of the form $A \rightarrow B$, this probability would have been $A\, c_i\, dt$, and for $2A \rightarrow B$ it would have been $\frac{A(A-1)}{2}\, c_i\, dt \approx \frac{A^2}{2}\, c_i\, dt$.

We now examine the relationship between the reaction parameter $c_i$ and the deterministic rate constant $k_i$. This is important because much of the literature on biochemical rate constants takes a deterministic point of view. Furthermore, if we want to compare deterministic and stochastic formulations of the same model, we need to be able to convert the deterministic constants into their stochastic counterparts. Returning to reaction 2.3, $A B\, c_i\, dt$ gives the probability that this reaction will occur somewhere in the container in the next time interval $dt$. Dividing by $dt$ and by the volume V gives the average reaction rate per unit volume, $A B\, c_i / V$. This is already close to the deterministic reaction rate, which is also defined as an average reaction rate per unit volume. However, the deterministic rate is expressed in terms of the concentrations $[A] = A/V$ and $[B] = B/V$ rather than numbers of molecules. Substituting $A = [A]\,V$ and $B = [B]\,V$ into the stochastic rate gives $[A][B]\, V c_i$. Since the deterministic rate law is $k_i [A][B]$, we conclude that $k_i = V c_i$ for a bimolecular reaction of the form 2.3. For a reaction $2A \rightarrow B$, we would have obtained $k_i = V c_i / 2$.
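For a monomolecular reaction such as $A \rightarrow B$, $k_i$ and $c_i$ are equal. (Here concentrations are measured in molecules per unit volume; if they are given in molar units, V has to be replaced by $N_A V$, where $N_A$ is Avogadro's constant.) In general, $c_i$ and $k_i$ differ only by a power of the volume and a combinatorial factor, so parameters can be converted between the deterministic and stochastic approaches and the two formulations can be compared directly.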
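The conversions derived above follow one pattern, which the Python sketch below makes explicit. It assumes that concentrations are counted in molecules per unit volume and that the deterministic rate law is $k \prod_s [X_s]^{\nu_s}$ reaction events per unit volume and time, the convention that yields $k_i = V c_i$ and $k_i = V c_i / 2$ above. Under these assumptions $k = c\, V^{m-1} / \prod_s \nu_s!$, where m is the total number of reactant molecules. All function names are hypothetical.

```python
from math import comb, factorial


def combinations(counts, reactants):
    """Number h of distinct reactant combinations.

    counts: molecule numbers, e.g. {"A": 10, "B": 5}
    reactants: molecules consumed per reaction, e.g. {"A": 2} for 2A -> B
    """
    h = 1
    for species, nu in reactants.items():
        h *= comb(counts[species], nu)   # comb(A, 2) == A * (A - 1) / 2
    return h


def conversion_factor(reactants, volume):
    """Factor f with k = f * c, namely V**(order - 1) / prod(nu!)."""
    order = sum(reactants.values())
    denominator = 1
    for nu in reactants.values():
        denominator *= factorial(nu)
    return volume ** (order - 1) / denominator


def deterministic_constant(c, reactants, volume):
    """Deterministic rate constant k from the stochastic constant c."""
    return c * conversion_factor(reactants, volume)


def stochastic_constant(k, reactants, volume):
    """Stochastic constant c from the deterministic rate constant k."""
    return k / conversion_factor(reactants, volume)


if __name__ == "__main__":
    V = 4.0
    print(deterministic_constant(0.5, {"A": 1, "B": 1}, V))  # 2.0 (k = V c)
    print(deterministic_constant(1.0, {"A": 2}, V))          # 2.0 (k = V c / 2)
    print(deterministic_constant(3.0, {"A": 1}, V))          # 3.0 (k = c)
    print(combinations({"A": 10, "B": 5}, {"A": 1, "B": 1}))  # 50
```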

Simulation Algorithms

Starting from the stochastic approach to chemical kinetics described in the previous section, two well-known stochastic simulation algorithms will be introduced: the First Reaction Method developed by Gillespie (1976) and the Next Reaction Method by Gibson & Bruck (2000). Both algorithms are exact procedures for numerically simulating the time evolution of a well-stirred chemically reacting system, and both are used in the experimental section to simulate the dynamic behaviour of a Stochastic Petri Net model. The Gillespie algorithm is described in more detail, explaining its relationship to a Stochastic Petri Net and drawing a comparison to the Next Reaction Method.

The Gillespie Algorithm

The Gillespie algorithm is the most popular algorithm for the stochastic simulation of coupled chemical reactions. It was proposed by Gillespie (1976) and comes in two different versions: the First Reaction Method and the Direct Method. Following the structure of Gillespie's paper, the theoretical concept underlying the algorithm will be introduced first. Then the traditional master-equation approach and the simulation approach will be compared. In a deterministic setting, we assume that the time evolution of a chemically reacting system is continuous and deterministic. This is clearly not exact, since the molecular population levels can only change by discrete integer amounts. In addition, the time evolution is usually not a deterministic process but is governed by the random movements of individual molecules. The Gillespie algorithm is useful if we want to simulate reactions with very few molecules involved, which is the case for many regulatory networks. Furthermore, a stochastic approach is appropriate for systems that exhibit unstable behaviour. In this case, even small fluctuations in the molecular populations can drive the system out of its current state.
This causes drastic changes in the system that could not be predicted by a deterministic formulation. Before Gillespie developed his algorithm, the traditional approach was the master-equation approach. In this approach, the state of the system, i.e. the combination of molecular populations, is treated as a random variable. The master equation, a differential form of the Chapman-Kolmogorov equation, is a system of coupled differential equations that describes how the probability of each possible state changes over time. It is possible to write down and solve the master equation for a system, which would give us complete knowledge of the system's dynamics. However, this is only possible for simple systems with very few states; for larger systems this approach becomes intractable. As outlined above, the Gillespie algorithm follows the assumptions of stochastic kinetics. We assume that for each reaction $R_\mu$ a stochastic reaction constant $c_\mu$ exists that gives the probability per unit time that a particular combination of molecules will react according to $R_\mu$. This assumption requires that the system is kept well mixed, either by direct stirring or simply by requiring that nonreactive molecular collisions occur much more frequently than reactive molecular collisions (Gillespie 1977). This is the fundamental hypothesis of the Gillespie algorithm. The algorithm generates a single sample trajectory of the chemical process, which can be interpreted as a random walk through the space of possible states. At each time step, the system is in exactly one state, defined by the molecular populations in the system. The Gillespie algorithm then picks a reaction and executes it according to a probability distribution chosen such that the probability of generating a trajectory is the same as the probability the master equation assigns to it. By generating many trajectories and averaging their results, we can estimate any quantity of interest, such as the average number of molecules of a species at some time t. Gillespie (1976) proposed two methods for simulating the trajectories. The Direct Method explicitly calculates which reaction occurs next and when it occurs. The First Reaction Method generates, for each reaction $\mu$, a time $\tau_\mu$ at which it would occur and then executes the reaction that occurs first. We will now describe both methods.

The Direct Method

This method relies on the probability density $P(\mu, \tau)$ that the next reaction is $\mu$ and that it occurs at time $t + \tau$, given the state at time t.
We already introduced the stochastic reaction constant $c_\mu$, which gives the probability per unit time that a particular combination of molecules will react according to reaction $R_\mu$. Let $h_\mu$ be the number of distinct combinations of $R_\mu$ reactant molecules in a given state. For a bimolecular reaction $X + Y \rightarrow Z$, $h_\mu = XY$; for a reaction of the form $2X \rightarrow Z$, $h_\mu = X(X-1)/2$; and so on. We can then define $a_\mu\,dt = h_\mu c_\mu\,dt$ as the probability that reaction $R_\mu$ will occur in the next infinitesimal interval $dt$, given the state $(X_1, X_2, \dots, X_n)$ at time t. It can be shown (Gillespie 1976) that

$$P(\mu, \tau)\,d\tau = a_\mu \exp\Big(-\tau \sum_j a_j\Big)\,d\tau$$

This equation can be used to compute the next reaction directly. Integrating $P(\mu, \tau)$ over all $\tau$ from 0 to $\infty$ yields

$$P(\text{Reaction} = \mu) = \frac{a_\mu}{\sum_j a_j}$$

In a similar way, we obtain the distribution of the waiting time until the next reaction by summing $P(\mu, \tau)$ over all $\mu$:

$$P(\tau)\,d\tau = \Big(\sum_j a_j\Big) \exp\Big(-\tau \sum_j a_j\Big)\,d\tau$$

This simply means that the waiting time until the next reaction is exponentially distributed with rate parameter $\sum_j a_j$. These two distributions give rise to the Direct Method:

1. Set the initial numbers of molecules and set t to 0.
2. Calculate $a_\mu$ for all reactions $\mu$.
3. Choose a reaction $\mu$ according to the distribution $P(\text{Reaction} = \mu)$.
4. Choose $\tau$ according to $P(\tau)$.
5. Execute reaction $\mu$ by changing the numbers of molecules accordingly. Update the time to $t + \tau$.
6. Go to step 2.

This algorithm needs two random numbers per iteration. It takes time proportional to the number of reactions to update the $a_\mu$ values, since they depend on the current state of the system, and again time proportional to the number of reactions to calculate $\sum_j a_j$.

The First Reaction Method

This algorithm computes a putative time $\tau_\mu$ for each reaction $\mu$, i.e. the time at which the reaction would occur if no other reaction occurred first. The reaction with the smallest $\tau_\mu$ is then executed.

1. Set the initial numbers of molecules and set t to 0.
2. Calculate $a_\mu$ for all reactions $\mu$.
3. For each reaction $\mu$, compute a delay $\tau_\mu$ according to an exponential distribution with parameter $a_\mu$.

4. Let $\mu$ be the reaction with the smallest delay $\tau_\mu$.
5. Execute reaction $\mu$ by changing the numbers of molecules accordingly. Update the time to $t + \tau_\mu$.
6. Go to step 2.

These two algorithms seem very different, but it can be proved that $\tau$ and $\mu$ are chosen according to the same probability distribution, so both approaches are equivalent (Gillespie 1976). The First Reaction Method draws one random number per reaction, i.e. as many random numbers per iteration as there are reactions. It needs time proportional to the number of reactions to update the $a_\mu$ values and time proportional to the number of reactions to identify the reaction with the smallest putative time.

The Gillespie algorithm and Stochastic Petri Nets

One can already anticipate a close relationship between the First Reaction Method and the simulation of a Stochastic Petri Net. The Gillespie algorithm fulfils the Markov property, since the transition probabilities to each new state depend only on the current state. If an SPN is chosen to represent a set of coupled reactions and its transition rates are chosen according to the mass-action rate laws, then its behaviour can be simulated with the First Reaction Method. The SPN can be seen as a direct graphical representation of the underlying Markov process. In some cases the execution of a reaction affects other reactions as well, for instance if some reactions share the same reactants. An SPN gives some information about these dependencies, since reactions with the same reactants are represented by transitions with the same input places. The Gillespie algorithm, however, does not make use of this information. In the next section, an algorithm is introduced that takes these dependencies into account.

The Gibson-Bruck Algorithm

This algorithm is also called the Next Reaction Method (Gibson & Bruck 2000) and is an improvement of the First Reaction Method described in the previous sections.
The Gillespie algorithm in both variants was extremely successful and is still commonly used. Its main disadvantage is the amount of computational effort that is needed to conduct simulations with many molecular species and many reactions.
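The computational effort mentioned here is visible in a minimal Python sketch of both Gillespie variants. The reaction encoding (a tuple of reactant counts, state change and stochastic constant $c_\mu$) and all function names are assumptions made for this illustration, not part of any particular tool. Every iteration recomputes all propensities $a_\mu = h_\mu c_\mu$; the Direct Method then draws two random numbers, while the First Reaction Method draws one exponential delay per reaction.

```python
import math
import random

# Hypothetical encoding, one tuple per reaction: (reactants, state change, c).
# A + B -> C is ({"A": 1, "B": 1}, {"A": -1, "B": -1, "C": 1}, 0.1).


def propensities(state, reactions):
    """a_mu = c_mu * h_mu for every reaction; O(M) work per call."""
    result = []
    for reactants, _change, c in reactions:
        h = 1
        for species, nu in reactants.items():
            h *= math.comb(state[species], nu)
        result.append(c * h)
    return result


def direct_method_step(state, reactions, rng):
    """Direct Method: two random numbers per iteration."""
    a = propensities(state, reactions)
    a0 = sum(a)
    if a0 == 0:
        return None                      # no reaction can occur any more
    tau = rng.expovariate(a0)            # waiting time ~ Exp(sum_j a_j)
    threshold = rng.random() * a0        # P(Reaction = mu) = a_mu / a0
    cumulative = 0.0
    for mu, a_mu in enumerate(a):
        cumulative += a_mu
        if threshold < cumulative:
            return mu, tau
    # Guard against rounding: fall back to the last reaction that can fire.
    return max(mu for mu, a_mu in enumerate(a) if a_mu > 0), tau


def first_reaction_step(state, reactions, rng):
    """First Reaction Method: one exponential delay per reaction."""
    a = propensities(state, reactions)
    delays = [rng.expovariate(a_mu) if a_mu > 0 else math.inf for a_mu in a]
    mu = min(range(len(delays)), key=delays.__getitem__)
    if math.isinf(delays[mu]):
        return None
    return mu, delays[mu]


def simulate(state, reactions, t_end, step, seed=0):
    """One trajectory, run until t_end or until no reaction can fire."""
    rng = random.Random(seed)
    state, t = dict(state), 0.0
    trajectory = [(t, dict(state))]
    while True:
        event = step(state, reactions, rng)
        if event is None or t + event[1] > t_end:
            break
        mu, tau = event
        t += tau
        for species, delta in reactions[mu][1].items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


if __name__ == "__main__":
    reactions = [({"A": 1, "B": 1}, {"A": -1, "B": -1, "C": 1}, 0.1)]
    start = {"A": 10, "B": 5, "C": 0}
    for step in (direct_method_step, first_reaction_step):
        t, final = simulate(start, reactions, math.inf, step)[-1]
        print(step.__name__, final)  # final state {'A': 5, 'B': 0, 'C': 5}
```

The final state of this toy example does not depend on the random numbers, because the reaction simply runs until B is used up; only the reaction times differ between runs.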

The Next Reaction Method addresses this problem by reducing the running time while remaining exact. In our experiments, the models turned out not to be large enough for the choice of algorithm to make a substantial difference in the running time of a single run. Nevertheless, this algorithm was used to obtain results averaged over a large number of simulation runs, where the savings accumulate and do make a difference. A careful look at the Gillespie algorithm shows that it does more work than necessary: it recomputes the rate and delay of every reaction even when they have not changed. In contrast, the approach of Gibson and Bruck stores both values for each reaction in an efficient data structure and recomputes them only when necessary. Usually it is not allowed to simply re-use random numbers such as the delay of each reaction. Here it is legitimate because the delays follow an exponential distribution, which is memoryless. The Next Reaction Method proceeds as follows (the $\tau_i$ are absolute putative firing times):

1. Set the initial numbers of molecules, set the time t = 0, calculate the stochastic rate $a_i$ for each reaction i and generate a putative time $\tau_i$ for each reaction by drawing a delay from an exponential distribution with parameter $a_i$.
2. Let j be the index of the smallest $\tau_i$.
3. Execute reaction j by changing the numbers of molecules accordingly and set $t = \tau_j$.
4. Update $a_j$ according to the new state and compute a new putative time $\tau_j$ for reaction j, i.e. t plus a new exponential delay with parameter $a_j$.
5. For each other reaction i whose rate $a_i$ is affected by the execution of reaction j:
   (a) Update the rate $a_i$, but keep the old value as $a_{i,\text{old}}$.
   (b) Set $\tau_i = (a_{i,\text{old}} / a_{i,\text{new}})\,(\tau_i - t) + t$ (Gibson & Bruck 2000).
   (c) Discard the old rate $a_{i,\text{old}}$.
6. Go to step 2.

The Next Reaction Method needs to know which $a_i$ are affected by the execution of a reaction. Gibson and Bruck achieve this with a data structure called a dependency graph. This graph has a node for each reaction.
A directed arc points from node i to node j if the rate of reaction j is affected by the execution of reaction i. The graph can be constructed automatically before the

algorithm starts, by searching for species that are reactants or products of one reaction and reactants of another. After the execution of a reaction i, the algorithm determines the rates that need to be changed by examining the children of node i in the graph. To obtain the reaction with the smallest putative time efficiently, Gibson and Bruck propose an indexed priority queue: a binary heap of the putative times together with an index that locates each reaction in the heap, so that the minimum can be read off in constant time and each update takes logarithmic time. The Next Reaction Method fits a Stochastic Petri Net more naturally, since it operates on a more local level. The graph of an SPN is similar to the dependency graph proposed by Gibson and Bruck. For a reaction represented by a transition in the SPN, the places connected to this transition are the molecular species affected by the reaction. These species might be reactants of other reactions, whose rates then have to be updated. This is an interesting relationship, but since the Systems Biology Workbench (Hucka, Finney, Sauro, Bolouri, Doyle & Kitano 2002) already offers efficient implementations of the Gibson-Bruck algorithm, it was not exploited here. The Gibson-Bruck algorithm needs only one new random number per iteration and recomputes reaction rates and putative times only when necessary. It is among the fastest exact simulation algorithms, but since it is harder to implement, the Gillespie algorithm is still the most commonly used approach.
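The steps of the Next Reaction Method can be sketched in Python as follows. This is a self-contained illustration under stated assumptions, not the Systems Biology Workbench implementation: the reaction encoding (reactant counts, state change, stochastic constant) and the function names are hypothetical. One substitution should be named plainly: instead of Gibson and Bruck's indexed priority queue, the sketch uses Python's `heapq` module with lazy deletion, where outdated entries stay in the heap and are skipped when popped. Putative times are stored as absolute times, so step 5(b) is the rescaling `(a_old / a_new) * (tau_i - t) + t`. When a reaction's rate was zero before an update, the sketch simply draws a fresh exponential time instead of rescaling; this is still exact but costs an extra random number.

```python
import heapq
import math
import random


def propensity(state, reactants, c):
    """a = c * h, with h the number of distinct reactant combinations."""
    h = 1
    for species, nu in reactants.items():
        h *= math.comb(state[species], nu)
    return c * h


def dependency_graph(reactions):
    """Edge i -> j if a species whose count reaction i changes is a reactant of j."""
    graph = []
    for _reactants_i, change_i, _c in reactions:
        changed = {s for s, delta in change_i.items() if delta != 0}
        graph.append([j for j, (reactants_j, _, _) in enumerate(reactions)
                      if changed.intersection(reactants_j)])
    return graph


def next_reaction_method(state, reactions, t_end, seed=0):
    rng = random.Random(seed)
    state, t = dict(state), 0.0
    deps = dependency_graph(reactions)

    def fresh_time(a_i):
        # Absolute putative time: current time plus an Exp(a_i) delay.
        return t + rng.expovariate(a_i) if a_i > 0 else math.inf

    a = [propensity(state, r, c) for r, _, c in reactions]
    tau = [fresh_time(a_i) for a_i in a]                  # step 1
    heap = [(tau_i, i) for i, tau_i in enumerate(tau)]
    heapq.heapify(heap)
    trajectory = [(t, dict(state))]

    while heap:
        tau_j, j = heapq.heappop(heap)                    # step 2
        if tau_j != tau[j]:
            continue                                      # outdated entry
        if math.isinf(tau_j) or tau_j > t_end:
            break                                         # nothing can fire
        t = tau_j                                         # step 3
        for species, delta in reactions[j][1].items():
            state[species] += delta
        trajectory.append((t, dict(state)))

        # Step 4: new rate and a new random delay for the reaction that fired.
        a[j] = propensity(state, reactions[j][0], reactions[j][2])
        tau[j] = fresh_time(a[j])
        heapq.heappush(heap, (tau[j], j))

        # Step 5: only reactions that depend on j are touched.
        for i in deps[j]:
            if i == j:
                continue
            a_old = a[i]
            a[i] = propensity(state, reactions[i][0], reactions[i][2])
            if a[i] == 0:
                new_tau = math.inf
            elif a_old == 0:
                new_tau = fresh_time(a[i])                # no old delay to reuse
            else:
                new_tau = (a_old / a[i]) * (tau[i] - t) + t   # rescale old delay
            if new_tau != tau[i]:
                tau[i] = new_tau
                heapq.heappush(heap, (new_tau, i))
    return trajectory


if __name__ == "__main__":
    reactions = [
        ({"A": 1, "B": 1}, {"A": -1, "B": -1, "C": 1}, 0.1),   # A + B -> C
        ({"C": 1}, {"C": -1, "D": 1}, 0.5),                     # C -> D
    ]
    print(dependency_graph(reactions))  # [[0, 1], [1]]
    start = {"A": 10, "B": 5, "C": 0, "D": 0}
    print(next_reaction_method(start, reactions, math.inf)[-1][1])
```

With lazy deletion the heap can grow beyond one entry per reaction; an indexed priority queue as proposed by Gibson and Bruck avoids this by updating each reaction's entry in place.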

Chapter 3 Background on Stochastic Models and Tools in Biology

After laying the theoretical foundations of this project in the last chapter, we now review some previous work on stochastic modelling of biological phenomena. Perhaps surprisingly, random fluctuations, or noise, play a very important role in the regulation of genes and can in turn lead to random behaviour at the level of metabolic or regulatory pathways. One of the most famous examples is the phage λ decision circuit (Ptashne 1992). Arkin et al. (1998) developed a stochastic model of this gene network that was able to explain the apparently random decision between the lysogenic and the lytic cycle. A brief description of this model, which was one of the first and most comprehensive stochastic models of a gene network, is given, together with a review of two other publications that used the same modelling formalism as this dissertation, Stochastic Petri Nets. Their approaches are compared with the findings of this dissertation. To our knowledge, these two publications are the first scientific projects in which Stochastic Petri Nets were used to model the dynamic behaviour of biomolecular interactions. The last section of this chapter is dedicated to the technical methodology of this project. Two software tools that can be used to edit and simulate Petri Nets are presented, and a comparison is drawn between these tools and the Petri Net Kernel, the tool that was used and extended in this study.

3.1 Previous Work on Stochastic Models

A Stochastic Model of Pathway Bifurcation in Phage λ

The following simplified description of the pathway bifurcation in phage λ follows the paper by Arkin et al. (1998). They were the first to present such a detailed stochastic model of a regulatory gene network, and they showed that random fluctuations in the molecular populations can have drastic consequences for the organism as a whole. We provide a short description of the model and draw a comparison to our work. In reality, the decision circuit is much more complex and involves various other factors, but this description focuses on the most important ones.

The model

Phage λ is a bacteriophage, a parasite that attacks E. coli bacteria. The phage attaches its tail to the surface of the bacterium and injects its chromosome into it. After this, the infected bacterium can follow one of two paths. In some cases the phage chromosome replicates and new phages are produced in the host cell; according to Ptashne (1992), it takes about 45 min until the infected bacterium is destroyed (lysis) and about 100 new phages are released. In other cases the phage chromosome enters the lysogenic state, integrates into the host DNA and is replicated together with the bacterium. The phage chromosome therefore stays dormant, and only certain events, for instance ultraviolet irradiation, can lead to lysis of the bacterium. Apart from this so-called stress-induced decision between the two states, it was also observed that the switch between them can occur at random. Whether an infected E. coli cell enters the lytic or the lysogenic pathway is mainly controlled by two proteins: Cro and CI. Cro starts the lytic cycle. If it is expressed constantly over a longer period of time, the chromosome of the λ phage is replicated and lysis of the bacterium is inevitable.
On the other hand, the CI protein controls the lysogenic pathway. If it is expressed constantly, other phage genes are suppressed, the phage enters a dormant state and the lysogenic cycle starts. However, CI is usually not expressed after infection with the phage, but its expression can be induced by another protein, CII. This protein is usually degraded very quickly, so that CI is not expressed. But if a fourth protein, CIII, is expressed at the same time, the degradation of CII is slowed down and CII molecules remain available long enough to induce the expression of CI, which leads to the lysogenic state.

To conclude, if the proteins CII and CIII are expressed shortly after infection by a λ phage, CI is activated, the expression of Cro is suppressed and the bacterium enters the lysogenic pathway. If not, CI is not expressed and lysis is started.

Results

Arkin et al. (1998) were able to show that the lysis-lysogeny decision is indeed influenced by random bursts in protein production. In all cases, a burst in the concentration of the CII protein occurred after infection with the phage. But only in the lysogenic case was this burst accompanied, by chance, by a burst of CIII production. CIII could then stabilize CII, which in turn led to an increase of CI and entry into the lysogenic state. In contrast, in the lytic-fated case no CIII production occurred, so the unprotected CII was rapidly degraded and did not activate CI expression. Without expression of CI, Cro production continued and lysis ensued. These results show that random fluctuations in the production of a single protein can determine the fate of a simple organism. This work was the first comprehensive stochastic model of a regulatory network, and it highlighted the need for stochastic models to capture the full complexity of biology. Although the authors did not use Petri Nets, they could have done so, since they simulated the system with the Gillespie algorithm. This corresponds to the execution of the corresponding Stochastic Petri Net, as described in chapter 2.

Stochastic analysis of Biological models with Petri Nets

The first attempts to model biological systems with Petri Nets focussed on static properties of the model. Reddy et al. (1993) were the first to use Petri Nets in this way, modelling the combined glycolytic and pentose phosphate pathway of the erythrocyte. Goss & Peccoud (1998) used Stochastic Petri Nets (SPNs) for the first time to model the dynamics of molecular interactions.
They were the first to recognize the advantages offered by this representation. For this reason, a brief summary of their work and a comparison to our approach are given here.

The model

Goss & Peccoud (1998) explain the terminology of Stochastic Petri Nets and illustrate how SPNs can be used to model molecular interactions. Furthermore, they present a stochastic model of ColE1 plasmid replication and compare its simulation results to the deterministic solution.

They introduced the very intuitive mapping of molecular species to places and reactions to transitions in the Petri Net. The same representation was used in the experimental section of this work. Nevertheless, it is sometimes more convenient to interpret a place as a particular state of the system rather than as a species. For instance, in the model of a biological clock (section 5.2), a repressor protein binds to the promoter region of a gene. This binding reaction is represented by a transition whose input set consists of the protein place and the gene place. The output place of this transition represents the repressed gene with the bound protein. This place represents the state "inhibited gene" rather than a molecular species, since gene and protein remain two separate molecules. Goss & Peccoud (1998) simulate the Petri Net with the UltraSAN software (Deavours, II, Qureshi, Sanders & van Moorsel 1995). This simulation is similar to the Next Reaction Method (Gibson & Bruck 2000). In addition, they derive an analytic solution by solving the associated Markov process for its steady state. They briefly mention that Petri Net theory can also be used to examine structural properties of the net. We tried this as well, but in our case the results were not very informative. It is possible, for instance, to examine the Petri Net for place invariants. These invariants are, informally speaking, sets of places whose total number of tokens does not change during the simulation of the net. Not surprisingly, the places that represent an enzyme and the enzyme-substrate complex form part of an invariant, since the total number of enzyme molecules, free or bound, is always constant.

Results

To summarize, Goss & Peccoud (1998) were the first to apply SPNs to model chemical reactions. Their work is the foundation of this project, even if their objective was slightly different.
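The enzyme invariant mentioned above can be checked mechanically. Formally, a place invariant is a weight vector y over the places with $y^T C = 0$, where C is the incidence matrix holding, for each place and transition, the tokens produced minus the tokens consumed; for any such y, the weighted token sum is unchanged by every firing. The Python sketch below uses hypothetical place and transition names for the enzyme reactions 2.1 and 2.2. It verifies that E + ES is an invariant and that S + ES + P (conservation of substrate material) is one as well. It only checks a given vector and does not compute all invariants of a net.

```python
# Hypothetical SPN for S + E <-> ES -> P + E: each transition maps to
# (tokens consumed from its input places, tokens produced in its output places).
TRANSITIONS = {
    "bind":    ({"S": 1, "E": 1}, {"ES": 1}),
    "unbind":  ({"ES": 1}, {"S": 1, "E": 1}),
    "convert": ({"ES": 1}, {"P": 1, "E": 1}),
}


def incidence_matrix(places, transitions):
    """C[p][t] = tokens produced in place p by t minus tokens consumed."""
    return [[post.get(p, 0) - pre.get(p, 0) for pre, post in transitions.values()]
            for p in places]


def is_place_invariant(weights, places, transitions):
    """True if y^T C = 0, i.e. every firing keeps sum_p y_p * m(p) constant."""
    C = incidence_matrix(places, transitions)
    return all(sum(weights[row] * C[row][col] for row in range(len(places))) == 0
               for col in range(len(transitions)))


if __name__ == "__main__":
    places = ["S", "E", "ES", "P"]
    print(is_place_invariant([0, 1, 1, 0], places, TRANSITIONS))  # True: E + ES
    print(is_place_invariant([1, 0, 1, 1], places, TRANSITIONS))  # True: S + ES + P
    print(is_place_invariant([0, 1, 0, 0], places, TRANSITIONS))  # False: E alone
```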
They were interested in analytical and simulation results for the Markov process represented by the Petri Net. However, as already mentioned in chapter 2, obtaining the steady-state distribution of the Markov process is not always possible. Goss & Peccoud (1998) demonstrated that the state space of an SPN can be restricted by simply enforcing an upper bound on the number of tokens in each place; it is important to find a reasonable value for this bound. We did not follow this approach in our experiments, since the steady-state behaviour of the systems we considered is unlikely to give interesting results: the models of circadian clocks presented in chapter 5 exhibit oscillatory behaviour, and their steady-state distribution would simply average over the oscillations. In addition, simulation results are easier and faster to obtain. Therefore

we focus more on the evaluation of the results obtained from the stochastic simulation. Goss & Peccoud (1998) compared analytical and deterministic results with the simulated behaviour of the net. We also compared deterministic and stochastic solutions, but, following the work of Gonze, Halloy & Goldbeter (2002), we were additionally interested in the behaviour of the model with different numbers of molecules involved. The behaviour of the stochastic model under different conditions was not a topic in the work by Goss & Peccoud (1998).

Analysis of the E. coli Stress Circuit with Stochastic Nets

Figure 3.1: (a) Model of the E. coli stress circuit; (b) time course of σ32 concentration. (a) An overview of the simplified E. coli stress circuit (Srivastava et al. 2001). The σ32 protein is synthesized and forms a holoenzyme together with the RNA polymerase (Eσ32). This complex binds to the promoter regions of several chaperone proteins and proteases that degrade misfolded proteins. Some of the chaperones can also bind to Eσ32, serve as a reservoir of the sigma factor and lead to its degradation. (b) Ethanol stress response with and without σ32 mRNA antisense. The results of the stochastic simulation (thick solid line) and the standard deviation (thin dotted lines) are compared with experimental data (points with error bars). The antisense mRNA binds to the σ32 mRNA and was included to see how the model behaves if σ32 synthesis is inhibited.

Srivastava et al. (2001) developed an SPN model of the E. coli stress circuit and used it to characterize the behaviour of the bacterium under stress, i.e. when exposed to heat, ethanol, heavy metals etc. σ32 is a protein that regulates the expression of other proteins in response to external stress. Under stress, the rate of synthesis and the stability of σ32 increase.
In this case, the production of other proteins such as chaperones (proteins that assist other proteins in achieving proper

folding, which is affected by stress) or proteases (enzymes that degrade other proteins) increases.

Results

The problem is that many details of the σ32-mediated pathways are still not understood. The authors validated their model by comparing its simulated behaviour to experimental results, but it is not certain whether all parts of the model, such as the exact binding sequence of the different chaperones, are correct. Nevertheless, they were able to reproduce experimental results and to gain new insights into the behaviour of the stress circuit. One of these results was that the σ32 response is mainly controlled by the rate of mRNA translation and that large quantities of the protein are bound to chaperones under non-stress conditions. This allows the cell to react rapidly to external stress by releasing these proteins instead of waiting until new ones are produced. The authors also underlined that a stochastic formulation can be used to estimate the variance in the data, information that cannot be obtained from a deterministic model. Their approach was similar to ours, since they created a simplified model of a biological system and used simulation results to understand it better. They faced the same dilemma as we did: many details of the real biological system are not understood, so one has to generalize and make assumptions about some parameters. On the other hand, our experiments deal with rather general features such as the architecture of the system, its resistance to noise etc., whereas Srivastava et al. (2001) were interested in very specific features of the σ32 pathways, such as the partitioning of σ32 within the cell and the time evolution of its concentration.

3.2 Review of other Petri Net Tools

An Open Source software tool was used to edit and simulate the Petri Net models in this project.
Developing new software completely from scratch would have taken too much of the three months allocated to this project. Hence we decided to search for suitable existing software and to extend it with the features needed. It turned out that many available software tools have some of the required features. There are some very good Petri Net editors such as GreatSPN (Chiola, Franceschinis, Gaeta & Ribaudo 1995) or UltraSAN (Deavours et al. 1995). GreatSPN supports Generalized Stochastic Petri Nets (Marsan, Balbo, Conte, Donatelli & Franceschinis 1995), Stochastic Petri Nets with

immediate transitions and inhibitor arcs. UltraSAN started as a tool for Stochastic Petri Nets but now offers many extensions to this theory, such as arcs associated with a certain function and various analytical solvers at the level of the Markov process. Both are general-purpose tools with no specific application domain, which means that they do not offer any features specific to biological applications. Nevertheless, they are widely used, and UltraSAN has also been used to model regulatory networks in E. coli (Goss & Peccoud 1998). The problem is that they often have very restrictive licences and, even though they are developed by research groups at universities, the developers do not support any extensions of their software. We therefore searched for alternatives. A great help was the online archive of Petri Net tools at the University of Hamburg, Germany. In the remainder of this chapter, we review some of the Petri Net software tools that are available online and justify the decision to use the Petri Net Kernel (Kindler & Weber 2001).

Requirements for this project

The software should fulfil the following requirements: a graphical interface that is easy to use; platform independence (which usually means that it is written in the programming language Java); support for stochastic simulation; means to analyse the results of the simulation; and a modular structure, so that we could implement new features if necessary. Most of the Petri Net tools at Sourceforge and at the Petri Net world were tested for these abilities. As an example, we present two of these tools and the extent to which they fulfil these requirements.

Cell Illustrator (Genomic Object Net)

The basis of this software project was an extension of discrete Petri Net theory by continuous features (Matsuno, Doi, Nagasaki & Miyano 2000). This can be useful since protein concentrations vary continuously but are coupled with discrete switches (e.g. protein production is switched on or off depending on the expression levels of some genes). Based on this observation, Matsuno, Tanaka, Aoshima, Doi, Matsui & Miyano (2003) developed the theory of Hybrid Functional Petri Nets (HFPN). In general, a hybrid Petri Net contains two kinds of places and transitions: discrete and continuous places, and discrete and continuous transitions. Discrete places and discrete transitions are the same as in the discrete Petri Net model. In contrast, a continuous place holds a nonnegative real number as its content. A continuous transition fires continuously, and its firing speed is a function of the values in the places. Finally, a hybrid functional Petri Net has discrete and continuous input and output arcs, but also test input arcs. A test arc can be directed from a place of any kind to a transition of any kind and does not consume the content of its source place. The test arc only inhibits its target transition if its source place contains at least as many tokens as given by its transition. The simulation software Genomic Object Net (GON) implements these Hybrid Functional Petri Nets. Matsuno et al. (2003) claim that with GON a computational model can be constructed directly from a map of a biological pathway taken from the literature. GON uses a system of differential equations to simulate the pathway; the parameters of the reactions have to be determined by experiments or found in the literature. A trial version of GON was installed and tested. A hybrid net is clearly well suited to modelling the switch behaviour that may occur in biology.
The software is very easy to use and gives a very professional impression. On the other hand, it does not implement any stochastic simulation features, so it represents a very different modelling approach. One of the largest drawbacks of GON is that the latest version has become commercial. The company Gene Networks International now sells this software under the name Cell Illustrator. Even academic users have to pay a considerable sum for the full version, but a trial version with limited functionality is available. Cell Illustrator is only available for Windows. It is very easy to use but cannot be extended and

does not offer any stochastic simulation features. It was therefore not suitable for this project.

PIPE - Platform Independent Petri Net Editor

In contrast to GON, this tool is a classical editor for Petri Nets. It was developed during an M.Sc. group project at Imperial College London in 2003 (Bloom 2003). After the end of that project, PIPE was extended further and is now available as an Open Source project in the Sourceforge online repository. The software was written with the aim of providing an easy-to-use application. The authors also wanted to ensure that the program is extensible by giving it a modular structure: new modules can easily be added to extend the functionality of the program. So far, several analysis modules have been developed, such as modules for the identification of invariants and for state space analysis. In the beginning, PIPE seemed to be an ideal tool for this project. It is published under an Open Source licence, written in Java and very intuitive to use. However, it does not support Stochastic Petri Nets and hence offers no stochastic simulation. At first we planned to implement these features ourselves and received considerable support from the current maintainer, James Bloom. Unfortunately, the features needed turned out to be far more difficult to implement than expected, and after some days it was decided to use another software tool that was easier to extend. The problem was that PIPE was written in such a way that new analysis modules could be added easily, but it was difficult to introduce new types of Petri Nets. In its current version, PIPE only supports non-stochastic Place-Transition Nets and is therefore not suitable for this project.

3.3 Conclusions

The idea that a stochastic simulation of chemical reactions can be more accurate and natural from a physical perspective is not new.
However, the fact that variations at the molecular level can have a substantial influence on high-level pathways and even on the phenotype of an organism was demonstrated only a few years ago (Arkin et al. 1998). In this project we were able to draw on the experience and work of others, even though our approach differs slightly from the work published so far.

From a software engineering perspective, we faced the difficult task of finding a software tool that fulfilled most of our requirements but was also modular, so that it could easily be extended if needed. There is a clear tendency for every research group to write its own tools, mainly because self-written tools appear to be more trustworthy. In the near future it may become even more important to integrate different tools and their abilities in order to avoid wasting time and resources. For this project, it turned out that, even though many tools are available, none fulfilled all requirements. We had the option either to write our own tool or to extend an existing one. Writing useful software from scratch takes a considerable amount of time, probably more than the three months that were allocated for this project, and we therefore decided to use an existing tool. This also gave us enough time to put more effort into the experimental and modelling part of the dissertation.

40 Chapter 4 Technical Methodology

The first half of this chapter describes version 2 of the Petri Net Kernel (PNK) (Kindler & Weber 2001) and the extensions made during this project. We start by giving a general overview of the architecture and functions of the Kernel. After this, the extended software as it was used in this dissertation is presented. Documentation on how the Kernel can be extended to support other types of Petri Nets is given in the remainder of the first section. In its current version, the Kernel can only be used to visualize and edit Petri Nets, not to simulate their behaviour. An extension of PNK was therefore developed that uses the infrastructure of the Systems Biology Workbench (Hucka et al. 2002) to simulate the Markov process represented by the Petri Net. We call this extended version PNK 2e (extended Petri Net Kernel 2). The second half of this chapter is dedicated to the Systems Biology Workbench (Hucka et al. 2002). It gives an overview of this project and its components, as well as of the XML-based language SBML (Finney & Hucka 2003), which is used as the workbench's data exchange format.

4.1 The Petri Net Kernel

The Petri Net Kernel (PNK) was developed by the Theory of Programming research group at the Humboldt University of Berlin, Germany. It is intended to provide an infrastructure for the development of Petri Net tools rather than to be a tool in its own right. The current version is 2.2 and is written in Java. It is available at

In its standard version PNK supports Place-Transition Nets and Coloured Petri Nets, i.e. Petri Nets in which the tokens belong to different classes and are distinguishable. It also offers a rudimentary graphical editor that can be used to create and modify Petri Nets. The rationale for the development of PNK was that implementing a new Petri Net tool can be very time-consuming. Most of the implementation effort for a new Petri Net tool is spent on almost the same functionality: a graphical editor, a visualization device, functions to save and load the net, etc. This effort is spent over and over again when new tools are developed by different groups. The aim of the PNK project was to provide a common infrastructure for Petri Net tools and to avoid repeating the same implementation effort. The PNK project itself has now finished, but a new and improved version is currently being developed at the University of Paderborn, Germany. Some tools have been developed using PNK, such as PNVis, which supports the 3D visualization of Petri Nets, and the Petri Net Cube, a tool that implements parametric Petri Nets.

Design and Concepts

In the following, an outline of the architecture of PNK and its modules is given. New Petri Net types can be added to the Kernel by implementing classes for places, transitions and arcs. Each of these classes can have labels or extensions, which are also Java classes. For example, a place has an extension for its current marking, and a transition has another extension representing the rate of the exponential distribution from which its delay is sampled. The net as a whole can also have one or several extensions, for instance its name and additional information about the net type. The dependencies between classes and their labels are defined in an XML file called netTypeSpecification.
An example of a definition file for a Stochastic Petri Net is given on the next page. This concept of a net type is very general. Its advantage is its close relationship to the Petri Net Markup Language (Weber & Kindler 2002), a common description language for exchanging Petri Nets between different software tools, which the Kernel also uses to save nets created by the user. However, this markup language defines only the syntax, i.e. which labels and combinations of labels are allowed for a certain net type. In order to provide the semantics, methods can be defined for each label. For instance, the label representing the marking of a place has methods that

define the addition and subtraction of tokens. These methods are used by the simulator to move tokens through the net. Each net also has an associated instance of the class FiringRule, which defines how transitions are executed and how conflicts between transitions are resolved. In an SPN, the class StochasticRule specifies that the delays in the net follow an exponential distribution and that, in case of a conflict, the transition with the smallest delay is executed first.

Application Modules and I/O Modules

Each instance of a net type that is loaded into the editor is checked against its definition. Furthermore, the Petri Net Kernel provides a net type interface, which can be used to derive new types. It basically states that each label and node in a net must be defined by a Java class. Once a new net type has been defined that fulfils the requirements of the net type interface, an instance of the net can be loaded into the Kernel. PNK also contains templates, the so-called I/O modules, that define how a Petri Net can be saved to and retrieved from a file. The original Kernel provides a module that saves a net as a PNML (Petri Net Markup Language) file; in this project, support for SBML and CMDL was added. SBML is used in the Systems Biology Workbench to exchange models between different components, and CMDL is a simplified description language for chemical reactions supported by the stochastic simulator Dizzy (Ramsey, Orrell & Bolouri 2005). The aim of the PNK project was to provide a common infrastructure for a variety of Petri Net tools by offering templates for tasks that occur repeatedly. The architecture of the Petri Net Kernel consists of five parts corresponding to the different steps encountered during the development of a Petri Net tool: The net interface provides methods to access and to modify a net. It is also used to synchronize the activity of different applications accessing the same net.
The net dialog interface provides means to visualize information within a net and to interact with the end user. It is used by applications that require input from the user. The net type interface, already mentioned above, states how to define the labels and the firing rule of a net type. It is used to check the validity of a net type. The application interface is a template for the definition of a new application based on PNK. The related InOut interface defines the minimal functionality an I/O module has to offer.
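The conflict resolution attributed to the stochastic firing rule above (exponentially distributed delays, the smallest delay fires) is often called a race policy. The sketch below illustrates it in a few lines. It is not the Kernel's StochasticRule class. The method name and the representation of the enabled transitions as an array of rates (with 0 meaning "disabled") are assumptions made for illustration.

```java
import java.util.Random;

/** Sketch of the race policy of a stochastic firing rule (hypothetical API). */
public class RaceSketch {

    /** Outcome of one race: index of the winning transition and its delay. */
    public static final class Winner {
        public final int index;
        public final double delay;
        Winner(int index, double delay) { this.index = index; this.delay = delay; }
    }

    /** Samples an exponential delay for every enabled transition
     *  (rates[i] > 0) and returns the one with the smallest delay,
     *  or null if no transition is enabled. */
    public static Winner race(double[] rates, Random rng) {
        int best = -1;
        double bestDelay = Double.POSITIVE_INFINITY;
        for (int i = 0; i < rates.length; i++) {
            if (rates[i] <= 0) continue; // disabled transition
            // Inverse-transform sampling of Exp(rate); 1 - u lies in (0, 1].
            double delay = -Math.log(1.0 - rng.nextDouble()) / rates[i];
            if (delay < bestDelay) {
                bestDelay = delay;
                best = i;
            }
        }
        return best < 0 ? null : new Winner(best, bestDelay);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] rates = {1.0, 3.0};
        int[] wins = new int[2];
        for (int n = 0; n < 100000; n++) wins[race(rates, rng).index]++;
        // Transition 1 should win roughly 3 / (1 + 3) = 75% of the races.
        System.out.println(wins[1] / 100000.0);
    }
}
```

After the winner fires, the race is repeated from the new marking. Because exponential delays are memoryless, resampling all delays after each firing gives the same process as keeping the residual delays of the losers.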

New tools and their relation to existing net types can be defined in an XML file, the tool specification. This file contains an entry for each application developed using the infrastructure of the Kernel. Each of these application entries has a sub-entry for each net type the application supports. For instance, a stochastic simulator only supports Stochastic Petri Nets, and therefore its entry in the tool specification has only one sub-entry, for Stochastic Petri Nets. These interfaces are discussed in more detail in the remainder of this section. We do not aim to give a complete review of all methods and classes, but rather a general idea of the available functions and how they were used in this dissertation.

The Net Interface

This interface consists of a collection of Java classes that represent a Petri Net together with its extensions. The basic classes are Net, Place, Transition, Arc and Extension. Each class provides methods for accessing and modifying the corresponding net and its elements. For example, for an instance net of class Net, the method call net.getPlaces() returns a list of all its places. Likewise, there are methods that return the incoming and outgoing arcs of a transition or of a place, and methods to access the label of an element. For all these methods, the PNK takes care of keeping the net consistent, i.e. it synchronizes the state of the net between the different applications accessing it. The Java class Application Control, which is described at the end of this section, keeps track of changes made to the net by different tools and guarantees the consistency of the net in all situations.

Definition of Petri Net types

As outlined before, a net type in PNK is defined by a collection of classes for places, transitions, arcs and their extensions. The dependencies between these classes are given in an XML file.
The net type interface checks a net type definition for its validity. In addition, the declaration of a new Petri Net type also requires the definition of its firing rule, i.e. how its transitions are executed. When the Kernel was extended to support Stochastic Petri Nets, classes representing stochastic transitions were implemented. Each of these transitions has a rate, which is a label whose class is derived from the superclass Extension. This superclass provides a set of basic methods that need to be provided by all

extensions, such as a method to convert the internal value of the extension into a string. A concrete instance of a class stores all of its extensions in a hash table, with the name of each extension as the key and the extension itself as the value. The net type itself is specified in an XML file, which enumerates all possible labels for each kind of Petri Net element. The following listing shows an example of a Petri Net type specification, simplified for the sake of clarity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE netTypeSpecification SYSTEM "netTypeSpecification.dtd">
<netTypeSpecification name="StochasticNet">
  <extendable class=".kernel.Net">
    <extension name="firingRule" class="StochasticNetRule"/>
  </extendable>
  <extendable class="Place">
    <extension name="marking" class="NaturalNumber"/>
    <extension name="initialMarking" class="NaturalNumber"/>
  </extendable>
  <extendable class="Arc">
    <extension name="inscription" class="NaturalNumber1"/>
  </extendable>
  <extendable class="Transition">
    <extension name="rate" class="DoubleValue"/>
  </extendable>
</netTypeSpecification>
```

The file contains an XML header and states the name of the Petri Net type. It then introduces the two types of nodes of the SPN, places and transitions, as well as its arcs, together with their extensions. The net itself has the extension firingRule, and a place has an initialMarking and a current marking. The distinction between the initial and the current marking is useful to track the time evolution of the number of tokens in a place.
Each stochastic transition has a rate, which is the parameter of an exponential distribution, and each arc in the net has an inscription. It is important to distinguish between the net interface and the net type interface. The net interface, as described in the previous section, offers the basic classes needed for the implementation of a new Petri Net, such as places and transitions. The net type interface, in contrast, is used to check the validity of a new net type. One requirement is that each component of the net defined in the type specification file is also represented by a Java class.
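The relationship between a net type specification and the extensions stored in each element can be illustrated with a small sketch. It models an element that keeps its extensions in a hash map keyed by extension name, together with a check, in the spirit of the net type interface, that rejects labels not declared for the element's class. All class and method names are hypothetical and are not the Kernel's API. The declared extension names are copied from the StochasticNet specification listed above. The code requires Java 9 or later for Map.of and Set.of.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: elements store named extensions in a hash map, and a
 *  type specification lists which extension names each element class allows. */
public class NetTypeSketch {

    /** Allowed extension names per element class, as in the StochasticNet
     *  specification (firingRule, marking, initialMarking, inscription, rate). */
    static final Map<String, Set<String>> SPEC = Map.of(
            "Net", Set.of("firingRule"),
            "Place", Set.of("marking", "initialMarking"),
            "Arc", Set.of("inscription"),
            "Transition", Set.of("rate"));

    /** A net element whose extensions are stored by name (key) and value. */
    public static final class Element {
        private final String kind;
        private final Map<String, Object> extensions = new HashMap<>();

        public Element(String kind) { this.kind = kind; }

        /** Adds or replaces an extension, rejecting undeclared names. */
        public void setExtension(String name, Object value) {
            Set<String> allowed = SPEC.getOrDefault(kind, Set.of());
            if (!allowed.contains(name)) {
                throw new IllegalArgumentException(
                        kind + " does not declare extension '" + name + "'");
            }
            extensions.put(name, value);
        }

        public Object getExtension(String name) { return extensions.get(name); }
    }

    public static void main(String[] args) {
        Element place = new Element("Place");
        place.setExtension("initialMarking", 5);
        place.setExtension("marking", 5); // current marking starts at the initial one
        Element transition = new Element("Transition");
        transition.setExtension("rate", 0.5);
        System.out.println(place.getExtension("marking") + " " + transition.getExtension("rate"));
    }
}
```

Keeping the allowed names in one table mirrors the idea that the XML file, not the Java code, decides which labels a net type has.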

The Dialog Interface

This interface is needed by applications that require an action by the user or want to present the results of some computation to the user. It offers functions to display textual information in the net, to highlight a set of nodes or to request a decision from the user. We used it in the Simulator application, which plays the token game by moving the tokens through the net. During the simulation, enabled transitions are highlighted and the flow of tokens is illustrated by changing the node labels. In a net type that does not define how conflicts between simultaneously enabled transitions are resolved, the user can be asked to decide which transition to execute by clicking on the corresponding node. By default, if an application requests a dialog, the request is passed to the editor, which then displays a dialog to the user, colours the nodes, etc. It is also possible to implement one's own dialogs.

Creating new Applications with PNK

This section briefly summarizes how a new Petri Net application can be created using the infrastructure of PNK. A new application is a small module in a Petri Net tool that implements some functionality; the editor and the stochastic simulator, for example, are both applications. Technically, an application is a class derived from the PNK class MetaApplication or MetaBigApplication. In the simplest case, an application module implements only a single method, run(). The current net can be accessed with the method getNet(). A more complex application can have arbitrarily many methods. In that case, however, the application must implement a method getMenus(), which provides the PNK with the necessary information on the available methods, so that the PNK can offer corresponding menus to the end user and start a method on user request. There are also two more specialized interfaces, NetObserver and ApplicationNetDialog.
The NetObserver interface allows an application to keep track of changes that are made to the net. The ApplicationNetDialog interface can be used to display graphical information screens or questions to the user. The editor is an example of an application that implements both interfaces. Input/Output modules are classes derived from the class InOut and are similar to the aforementioned applications. They have to implement two methods, load() and save(), that load a net from or save it to a given URL. Support for PNML is implemented in the class PNMLInOut.

Definition of new Tools

Again we have to clarify an important point. In the framework of the Petri Net Kernel, an application is a small module that is part of a larger program or tool. A tool is a collection of at least one new net type and at least one application. The relationships between net types and applications are defined in an XML file, the tool type definition. This file contains an entry for each application, and each of these entries has a sub-entry for each net type the application supports. The following is part of the tool definition for the extended Petri Net Kernel that was developed in this dissertation.

```xml
<nettype id="n10" typeSpecification="file:netTypeSpecifications/StochasticNet.xml"/>
<nettype id="n11" typeSpecification="file:netTypeSpecifications/GSPN.xml"/>
<application id="a1" mainclass="de.huberlin.informatik.pnk.app.StochasticSimulator">
  <allowedNettypes>
    <ntref ref="n10"/>
    <ntref ref="n11"/>
  </allowedNettypes>
</application>
```

The file specifies references to the definitions of the Stochastic Net and the Generalized Stochastic Net. The next lines refer to the Stochastic Simulator (token game), which is available for Stochastic Nets and Generalized Stochastic Nets. Additional attributes can be defined for an application, such as the maximum number of instances and a default net type.

The Application Control

Finally, the Java class Application Control is the mediator between the different applications in a Petri Net tool.
When the Kernel is started, this class loads all applications and net types as given by the tool definition. The application control also coordinates the interaction between the different applications and starts an application on request by the user.

The Extended Kernel (PNK 2e)

This section summarizes the extensions implemented for the Petri Net Kernel. The Kernel was developed to provide an infrastructure for new Petri Net tools and has a modular structure. It

was therefore very suitable for this project. Developing new Petri Net software from scratch takes a considerable amount of time; the idea behind this project was to extend an existing software tool and to spend more time on the application of the software. It turned out that this approach also has some shortcomings. The Petri Net Kernel is a mature software tool, but it still contains some bugs, which surfaced during this work. Despite its modularity, more changes had to be made than expected, and these changes took more time than anticipated. Nonetheless, the Kernel turned out to be useful, and we received many helpful ideas and support from its developers. PNK 2e is available online 1 and has been registered in a database of Systems Biology software 2. The remainder of this section presents the changes that were made to the Petri Net Kernel.

Figure 4.1: Overview of the extended Petri Net Kernel. The interface to the INA software was developed at the Humboldt University of Berlin but is also available in the extended version. The simulation interface to the Systems Biology Workbench was developed in this work. For a description see below.

Print / Export

A function was implemented that allows the user of PNK to print a Petri Net or to export it to a Postscript file. This function relies on the methods provided by the Java API. The plots of Petri Nets in this report were created with this function.

1 trieglaf/pnk2e 2

Stochastic Petri Nets / Generalized Petri Nets

Classes for Stochastic Petri Nets, such as delayed transitions and a stochastic firing rule, were implemented. In addition, Java classes were developed that define the graphical representation of immediate transitions and inhibitor arcs in the editor. Generalized Stochastic Petri Nets (GSPNs) are an extension of Stochastic Petri Nets: in addition to delayed transitions, a GSPN also comprises immediate transitions and inhibitor arcs. An immediate transition, if enabled, fires before all delayed transitions. If several immediate transitions are enabled at the same time, the transition to execute is chosen at random. An inhibitor arc disables its target transition if its input place contains at least as many tokens as the multiplicity of the arc (one by default); inhibitor arcs may have multiplicities in the same way as ordinary arcs. Inhibitor arcs can be used to model decisions that are assumed to take no time. They can also be used to represent control actions that are necessary to ensure the correct behaviour of the model. Inhibitor arcs result in a smaller state space of the underlying Markov process, since they can be used to exclude certain states. We also implemented support for GSPNs since they seemed to be a useful extension of the standard SPN theory. However, this formalism does not seem to be necessary for biological applications, at least not for the examples used here. Furthermore, their simulation is more difficult and the correspondence to a Markov process is less obvious. For these reasons this approach was not pursued further.

Token Game for Stochastic Nets

If we start from an initial marking and execute enabled transitions according to the firing rule, we call this simulation the token game. The token game that was implemented is essentially a stochastic simulation of the Markov process, run at much lower speed: every time a transition is executed, it is highlighted for some milliseconds.
The movement of the tokens through the net is visualized. The dialog and observer interfaces of the Petri Net Kernel make it easy to implement such a token game simulator. The token game is useful for getting an idea of the general behaviour of the net, but for larger networks it takes a lot of time, and one should resort to a standard stochastic simulation.
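The GSPN rules described above (inhibitor arcs, priority of immediate transitions, random choice among enabled immediate transitions) can be sketched as a single selection step of a token game. The following code is a sketch under stated assumptions, not the PNK 2e implementation. The representation of arcs as weight arrays and all names are hypothetical. The random choice among immediate transitions is assumed to be uniform, since the text only says "at random". Delayed transitions are resolved with the exponential race of an ordinary SPN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch of one firing step in a GSPN (hypothetical representation). */
public class GspnSketch {

    /** A transition with input weights, inhibitor multiplicities (0 = no
     *  inhibitor arc), output weights, and an immediate/delayed flag. */
    public static final class Transition {
        final int[] in, inhibit, out;
        final boolean immediate;
        final double rate; // used only for delayed transitions

        public Transition(int[] in, int[] inhibit, int[] out,
                          boolean immediate, double rate) {
            this.in = in; this.inhibit = inhibit; this.out = out;
            this.immediate = immediate; this.rate = rate;
        }
    }

    /** Enabled: every input place holds enough tokens, and every place with
     *  an inhibitor arc holds fewer tokens than the arc's multiplicity. */
    static boolean enabled(Transition t, int[] m) {
        for (int p = 0; p < m.length; p++) {
            if (m[p] < t.in[p]) return false;
            if (t.inhibit[p] > 0 && m[p] >= t.inhibit[p]) return false;
        }
        return true;
    }

    /** Index of the transition to fire, or -1 if none is enabled. Enabled
     *  immediate transitions take priority (one is chosen uniformly at
     *  random); otherwise the enabled delayed transitions race. */
    public static int select(Transition[] ts, int[] m, Random rng) {
        List<Integer> immediate = new ArrayList<>();
        for (int i = 0; i < ts.length; i++)
            if (ts[i].immediate && enabled(ts[i], m)) immediate.add(i);
        if (!immediate.isEmpty())
            return immediate.get(rng.nextInt(immediate.size()));

        int best = -1;
        double bestDelay = Double.POSITIVE_INFINITY;
        for (int i = 0; i < ts.length; i++) {
            if (ts[i].immediate || !enabled(ts[i], m)) continue;
            double delay = -Math.log(1.0 - rng.nextDouble()) / ts[i].rate;
            if (delay < bestDelay) { bestDelay = delay; best = i; }
        }
        return best;
    }

    /** Fires t: removes input tokens and adds output tokens. Inhibitor
     *  arcs only test the marking and never move tokens. */
    public static void fire(Transition t, int[] m) {
        for (int p = 0; p < m.length; p++) m[p] += t.out[p] - t.in[p];
    }
}
```

A token game would repeatedly call select, highlight the chosen transition, call fire and stop when select returns -1.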

SBML and CMDL support

SBML and CMDL are data exchange formats that are used in the Systems Biology Workbench. Before a Petri Net can be simulated in the workbench, it needs to be translated into one of these formats. We give details of both languages in the next section. As outlined above, new input and output classes can be derived from the InOut class provided by the Kernel. We developed two new classes: SBMLInOut, which translates a Petri Net into SBML, and CMDLInOut, which translates it into CMDL. In addition, the extended Kernel can import Stochastic Petri Nets from SBML. This is particularly useful since many published models of chemical reactions are available in this format. There are also efforts to provide a collection of annotated SBML models online 3, and these models can now be imported into the Kernel. The problem is that SBML does not contain any layout information: when a Petri Net representation is created from an SBML file, PNK 2e places all nodes of the net in a single cluster. The Kernel contains a simple layout algorithm, implemented by the developers at the Humboldt University, that rearranges the nodes and tries to create a clearer layout of the graph. In our experience, this works only for small and simple Petri Nets; the algorithmic drawing of complex graphs is a topic of current research.

Interface to Systems Biology Workbench

The Systems Biology Workbench (SBW) includes an interface that allows other applications to use its infrastructure. Such an application only needs to use the API offered by the SBW to call the so-called Broker and can then access any of its services. A new application was developed that derives from the class MetaApplication, translates the Petri Net into SBML or CMDL and calls the simulation service of the SBW. This service can then be used to simulate the net either stochastically or deterministically.
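A translation in the spirit of CMDLInOut can be sketched by emitting one statement per place (its initial marking) and one per transition (a named reaction with reactants, products and rate). The statement syntax used here is our understanding of Dizzy's CMDL and should be checked against the Dizzy documentation: `A = 100;` for an initial population and `r1, A + B -> C, 1.0;` for a reaction. Writing an arc weight by repeating the species name is also an assumption. The data structures are hypothetical and are not those of PNK 2e.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of exporting a stochastic Petri net as CMDL text. */
public class CmdlExportSketch {

    /** A transition with input/output place names mapped to arc weights. */
    public static final class Transition {
        final String name;
        final Map<String, Integer> inputs, outputs;
        final double rate;

        public Transition(String name, Map<String, Integer> inputs,
                          Map<String, Integer> outputs, double rate) {
            this.name = name; this.inputs = inputs;
            this.outputs = outputs; this.rate = rate;
        }
    }

    /** Writes e.g. "A + A + B": each place repeated as often as its arc
     *  weight. Empty sides are rejected because their syntax is not assumed. */
    static String side(Map<String, Integer> arcs) {
        if (arcs.isEmpty())
            throw new IllegalArgumentException("empty side not handled in this sketch");
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Integer> e : arcs.entrySet())
            for (int k = 0; k < e.getValue(); k++) terms.add(e.getKey());
        return String.join(" + ", terms);
    }

    /** One line per place (initial marking), then one line per transition. */
    public static String export(Map<String, Integer> initialMarking,
                                List<Transition> transitions) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> p : initialMarking.entrySet())
            sb.append(p.getKey()).append(" = ").append(p.getValue()).append(";\n");
        for (Transition t : transitions)
            sb.append(t.name).append(", ").append(side(t.inputs))
              .append(" -> ").append(side(t.outputs))
              .append(", ").append(t.rate).append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> marking = new LinkedHashMap<>(); // keeps place order
        marking.put("X", 100);
        marking.put("Y", 0);
        Transition t1 = new Transition("t1", Map.of("X", 1), Map.of("Y", 1), 0.5);
        System.out.print(export(marking, List.of(t1)));
        // X = 100;
        // Y = 0;
        // t1, X -> Y, 0.5;
    }
}
```

A LinkedHashMap is used for the marking so that the places appear in a stable order. For transitions with several input or output places, an ordered map would likewise keep the generated reactions reproducible.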
Details of the SBW simulation service are given in section 4.3.

3 Biomodels database (developed by the European Bioinformatics Institute, UK)

Simplified Petri Nets for Biological Reactions

After the first few experiments with the extended Petri Net Kernel, it turned out that Petri Nets representing networks of chemical reactions can become very large and complex. For this reason, two

simplifications for reactions that occur very frequently in biological models were introduced.

[Figure: reversible transition n1 connecting the places X and Y]

The first simplification is a new type of Petri Net transition that represents a reversible chemical reaction X ⇌ Y. It is defined in the Java class ReversibleTransition and has two associated rates instead of one. Before this simplified model can be simulated, the reversible transition has to be decomposed into its forward and backward reactions; the first rate is assigned to the forward and the second to the backward reaction.

[Figure: transition t1 converting X into Y, with an enzyme arc from the place Enzyme]

The second simplification represents the reaction X → Y catalysed by an enzyme (indicated by the place with the inscription Enzyme). The fact that the enzyme catalyses this reaction is indicated by the black box at the end of the arc that connects the enzyme place with the transition. This arc is implemented in the class EnzymeArc and has three associated rates r_1, r_2 and r_3. In order to simulate this reaction stochastically, the Kernel decomposes it into three separate steps:

$$X + \text{Enzyme} \xrightarrow{r_1} C_m \tag{4.1}$$

$$C_m \xrightarrow{r_2} X + \text{Enzyme} \tag{4.2}$$

$$C_m \xrightarrow{r_3} Y + \text{Enzyme} \tag{4.3}$$

This set of reactions represents the synthesis of an intermediate complex C_m (4.1), the dissociation of this complex into the reactant X and the enzyme (4.2), and the production of the product Y (4.3). Each of the rates in class EnzymeArc is associated with one reaction step. More complex reactions with several reactants and/or products are also possible.

4.2 The Systems Biology Workbench

This section gives an outline of the Systems Biology Workbench, the framework used to perform the simulations in this dissertation. The ERATO Systems Biology Workbench (SBW) Project was originally funded by a grant from the Japan Science and Technology Corporation (Hucka et al. 2002). The aim of this project was to build a software infrastructure that allows simulation and analysis programs for Systems Biology to share resources.
Financial support currently comes from the U.S. Department of Energy. The principal investigators of the project are based at the Keck Graduate Institute in Claremont, the California Institute of Technology and the Institute for Systems Biology in Seattle.

The Systems Biology Workbench was used to simulate the time evolution of Stochastic Petri Nets. As outlined in chapter 2, an SPN maps directly onto the Gillespie or Gibson-Bruck algorithm and can be simulated with either of them, since all of these stochastic models are based on a Markov process. The next sections describe the architecture of the SBW, give an overview of its API and describe its simulation service in more detail.

Overview

The current version of the SBW is Due to large-scale architectural changes, this version is only available for the Windows operating system. All parts are available as open source under the GNU LGPL. Software developers often duplicate each other's effort when implementing different packages. Many research groups write their own programs that fulfil their very specific needs and reflect the particular expertise and preferences of the group. This results in many small software projects, each with niche strengths that are different from, but complementary to, the strengths of other tools. On the other hand, since certain basic functions are needed by all programs (data input/output, visualisation of results etc.), developers often have to re-implement general functionality in their tools. There is currently no software that can answer all the questions of the Systems Biology community. Many researchers use a variety of tools at the same time to look at their problems from different perspectives, so software that eases the exchange of information between these tools will become even more important. The Systems Biology Workbench tries to address this issue by offering a common infrastructure for software tools in the field of Systems or Theoretical Biology. The approach is very similar to that of the Petri Net Kernel, which provides an infrastructure for Petri Net tools in order to avoid a waste of resources. We therefore decided to combine these two tools, which are so similar in their approach.
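As stated at the start of this section, an SPN maps directly onto Gillespie's algorithm. The following minimal sketch makes that correspondence concrete by implementing the direct method for a net given by input and output weight matrices. Each step works as follows:

- The propensity of a transition is its rate multiplied by the number of ways its input tokens can be drawn from the marking. This is the usual mass-action convention, assumed here rather than taken from the text.
- The waiting time until the next firing is exponential with the total propensity as its rate.
- The transition to fire is chosen with probability proportional to its propensity.

All names are hypothetical. This is not SBW or Dizzy code, and the Gibson-Bruck variant is not shown.

```java
import java.util.Random;

/** Sketch of Gillespie's direct method for a stochastic Petri net whose
 *  arcs are given as weight matrices in[t][p] and out[t][p]. */
public class GillespieSketch {

    /** Binomial coefficient C(n, k) as a double; 0 if n < k. */
    static double choose(int n, int k) {
        if (n < k) return 0.0;
        double c = 1.0;
        for (int i = 0; i < k; i++) c = c * (n - i) / (i + 1);
        return c;
    }

    /** Mass-action propensity: the rate times the number of ways the
     *  input tokens of the transition can be drawn from the marking. */
    static double propensity(double rate, int[] in, int[] m) {
        double a = rate;
        for (int p = 0; p < m.length; p++) a *= choose(m[p], in[p]);
        return a;
    }

    /** Runs the direct method until time tEnd or until no transition is
     *  enabled; updates the marking m in place and returns it. */
    public static int[] simulate(int[][] in, int[][] out, double[] rates,
                                 int[] m, double tEnd, Random rng) {
        double t = 0.0;
        double[] a = new double[rates.length];
        while (true) {
            double total = 0.0;
            for (int j = 0; j < rates.length; j++) {
                a[j] = propensity(rates[j], in[j], m);
                total += a[j];
            }
            if (total == 0.0) break; // dead marking: nothing can fire
            // Time to the next firing is exponential with rate 'total'.
            t += -Math.log(1.0 - rng.nextDouble()) / total;
            if (t > tEnd) break;
            // Pick transition j with probability a[j] / total.
            double r = rng.nextDouble() * total;
            int j = 0;
            double cum = a[0];
            while (cum <= r && j < a.length - 1) cum += a[++j];
            while (a[j] == 0.0) j--; // guard against rounding at the end
            for (int p = 0; p < m.length; p++) m[p] += out[j][p] - in[j][p];
        }
        return m;
    }

    public static void main(String[] args) {
        // Reversible isomerisation A <-> B: tokens move, but A + B stays 100.
        int[][] in = {{1, 0}, {0, 1}};
        int[][] out = {{0, 1}, {1, 0}};
        int[] m = simulate(in, out, new double[] {1.0, 0.5},
                           new int[] {100, 0}, 10.0, new Random(7));
        System.out.println(m[0] + " A, " + m[1] + " B");
    }
}
```

Each loop iteration corresponds to one firing of a transition in the Petri Net. The firing rule is therefore the same as in the token game, only without the visualisation.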

The SBW Broker

The SBW distribution consists of the SBW Broker and several small modules that illustrate the use of the workbench. Among these modules are Jarnac, a deterministic simulator, and a bifurcation analysis tool. Furthermore, there are many programs that have been developed independently of the SBW but support its communication protocols. They are SBW-enabled and can communicate through the SBW. Among these programs is Dizzy, a stochastic simulator that implements the Gillespie algorithm.

Figure 4.2: The broker architecture of the SBW. Modules written in Java, Python and C++ communicate with the SBW Broker through the SBW Java, Python and C++ interfaces, respectively. Gray areas indicate SBW components. Adapted from (Hucka et al. 2002).

The centrepiece of the SBW is the SBW Broker, a background program that is started automatically when a module of the SBW needs it. The Broker maintains a list of all registered modules; a program needs to be registered with the Broker only once and can then be started on demand.

Architecture of the SBW

The communication between the different components of the SBW is realized by a message-passing architecture, in which messages are exchanged as structured data bundles. All interactions are defined at a very high level, and the whole framework is independent of any programming language. The Broker itself is written in C++, but modules can be written in any language as long as they can send, receive and process messages according to the conventions of the SBW. A module can implement one or more services. Services are interfaces to the functions of the module; they are made visible to other modules and can be executed through the Broker. Each service belongs to a hierarchically organized category. This organization allows other applications to list all services that belong to a certain category without knowing the names of the individual services.
As an example, the SBW module that we developed for the Petri Net Kernel simply lists all simulation services that are known to the Broker and the user can decide which service to execute.
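The category mechanism can be illustrated with a small Java sketch. Everything in it is hypothetical: the class `ServiceRegistry`, its methods and the slash-separated category paths such as "Analysis/Simulation" are our own illustration, not the SBW Broker's implementation or API. It only shows how listing by category, including subcategories, lets a client find services without knowing their names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory service registry with hierarchical categories.
 * NOT the SBW API; it only illustrates listing services by category.
 */
final class ServiceRegistry {
    // service name -> category path, e.g. "Dizzy" -> "Analysis/Simulation"
    private final Map<String, String> categories = new TreeMap<>();

    void register(String serviceName, String categoryPath) {
        categories.put(serviceName, categoryPath);
    }

    /** Returns all services in the given category or in any of its subcategories, sorted by name. */
    List<String> listServices(String category) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : categories.entrySet()) {
            String path = e.getValue();
            // exact match, or the category is a proper ancestor of the path
            if (path.equals(category) || path.startsWith(category + "/")) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

A client such as the PNK 2e module could then call `listServices("Analysis/Simulation")` to offer only simulators, or `listServices("Analysis")` to include export modules as well.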

The following listing is taken from PNK 2e and illustrates the use of the Java API provided by the Systems Biology Workbench:

```java
SBW.connect(); // open connection to the SBW Broker

// list: the GUI list (e.g. a JList) in which the user selected a service
ServiceDescriptor selection = (ServiceDescriptor) list.getSelectedValue();

// ask the Broker for a running instance of the selected service
Service service = selection.getServiceInModuleInstance();
// obtain an object for the service, typed by the Analysis interface
Analysis analysis = (Analysis) service.getServiceObject(Analysis.class);

analysis.doAnalysis(sbml); // variable sbml stores the SBML representation
```

The second statement retrieves the service chosen by the user; the description of this service is encapsulated in the class ServiceDescriptor. Then an instance of the service is requested from the SBW. Simulation services belong to a subcategory of the more general Analysis category. We retrieve an instance of this category and perform the simulation with a call of the method doAnalysis. This code is deliberately general: our intention was to offer the user a choice not only among services of the very specific Simulation category but also among those of the larger Analysis category, which also contains deterministic simulators and export modules.

Visualisation and Analysis of Results

The Systems Biology Workbench contains some modules that can perform a rudimentary analysis of the results. The graphs in the experimental section of this work were produced by exporting the simulation results to Matlab, which is possible using the Matlab export module of the SBW.

The Systems Biology Markup Language

The Systems Biology Markup Language (SBML) (Finney & Hucka 2003) complements the Systems Biology Workbench described in the last section. Most of the modules contained in the SBW require the model to be described in this language.
SBML is a standard description language for the representation of models of biological networks. Mathematical models of a variety of systems described in SBML are available at … . This web page also contains an SBML test suite that allows users to verify their own models and eases the development of their own applications. Packages such as libsbml are available and can be used to parse SBML files and to map them into an application's own internal representation. Although SBML (currently implemented in XML) aims at machine readability, it can also be read by eye.

Currently SBML exists in two versions, Level 1 and Level 2. Level 2 is the newest release, but Level 1 is still widely supported. Conversion between the two levels is possible, but Level 2 contains some improvements that cannot be translated into Level 1. However, most of the available tools, including the Systems Biology Workbench, only support Level 1. We will therefore restrict this overview to the features available in Level 1. An SBML model consists of the following entities:

Unit Definition determines the unit of the participating species. For stochastic simulations, this has to be the number of molecules. A deterministic model requires concentrations, which can be calculated from the volume of the container and the number of molecules.

Compartment is a finite container in which the reactions take place, for example a biological cell or a cellular compartment such as the nucleus. This definition also contains the volume of the container.

Species are the types of molecules involved in the model. Each species has an initial amount, given in a unit as defined above, and the compartment in which it is located.

Reaction is a statement describing a transformation, transport or binding process that can change the amount of one or more species. Each reaction has an associated rate law that describes its kinetics.

Parameters are variables in the model. They can relate to a single reaction only (such as rate constants) or act on a global level.

Rules are a general concept that can be used to express relations between the amounts of species, constraints on the rate law of a reaction, etc.
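The conversion between molecule numbers and concentrations mentioned under Unit Definition follows from Avogadro's constant: n molecules in a volume V correspond to a concentration c = n / (N_A V). The Java sketch below assumes volumes in litres and concentrations in nanomolar (the unit used in chapter 5); the class and method names are hypothetical and not part of SBML or libsbml.

```java
/** Hypothetical helper converting between molecule counts and nanomolar concentrations. */
final class UnitConversion {
    static final double AVOGADRO = 6.02214076e23; // molecules per mole

    /** Concentration in nM of the given number of molecules in a volume given in litres. */
    static double toNanoMolar(double molecules, double volumeLitres) {
        double molar = molecules / (AVOGADRO * volumeLitres); // mol per litre
        return molar * 1e9;
    }

    /** Number of molecules corresponding to a concentration in nM in a volume given in litres. */
    static double toMolecules(double nanoMolar, double volumeLitres) {
        return nanoMolar * 1e-9 * AVOGADRO * volumeLitres;
    }
}
```

For example, one molecule in a volume of 10^-15 litres (1 fl) corresponds to about 1.66 nM.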
The SBML language definition is available online at … and gives further details about the different entities in an SBML document as well as some examples. Here we only show a small example of a reaction and how it can be represented in SBML. Consider the reaction

$$X + 2Y \xrightarrow{k_1} Z$$

which is a hypothetical synthesis of a molecule Z from the reactants X and Y. The reaction occurs with rate constant k1. A representation of this reaction in SBML Level 1 would look like this:

```xml
<listOfReactions>
  <reaction name="reaction1" reversible="false">
    <listOfReactants>
      <speciesReference species="X" stoichiometry="1"/>
      <speciesReference species="Y" stoichiometry="2"/>
    </listOfReactants>
    <listOfProducts>
      <speciesReference species="Z" stoichiometry="1"/>
    </listOfProducts>
    <kineticLaw formula="k1 * X * Y">
      <listOfParameters>
        <parameter name="k1" value="5"/>
      </listOfParameters>
    </kineticLaw>
  </reaction>
</listOfReactions>
...
```

The XML header and the declarations of units and compartments were omitted for clarity. SBML is clearly a useful invention. The problem so far is that not all tools support all features of the language. For instance, the stochastic simulator Dizzy does not recognize the attribute reversible of a reaction. As a result, a reversible reaction has to be represented by two separate reactions, one for the forward and one for the backward direction. There is also no consensus yet about the representation of enzymes or inhibitors, but this issue is supposed to be solved in future releases of SBML.

Mapping of the Petri Net to SBML

With the information given in the last section, it is relatively easy to see how a Petri Net can be stored in SBML: places are stored as species and transitions as reactions. The Java class SBML InOut implements this mapping. This class uses the aforementioned library libsbml provided by the developers of SBML. This library simplifies the export to SBML, since the programmer does not have to write the XML file directly but instead creates a set of Java objects that represent the document. The library handles all read and write access at the level of the XML file and is very fast.
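The place-to-species and transition-to-reaction mapping can be sketched in a few lines of Java. This is an illustration under our own assumptions, not the code of the SBML InOut class: it writes the XML by hand instead of going through libsbml, represents a transition by two hypothetical maps from place names to arc weights, and uses the arc weights as stoichiometries. Kinetic laws, species and compartments are left out.

```java
import java.util.Map;

/**
 * Hypothetical sketch: one Petri Net transition becomes one SBML Level 1 reaction.
 * Input places become reactants, output places products, arc weights stoichiometries.
 * The XML is built by hand for illustration; PNK 2e uses libsbml instead.
 */
final class TransitionToSbml {

    static String reaction(String name, Map<String, Integer> inputArcs, Map<String, Integer> outputArcs) {
        StringBuilder sb = new StringBuilder();
        sb.append("<reaction name=\"").append(name).append("\" reversible=\"false\">\n");
        appendReferences(sb, "listOfReactants", inputArcs);
        appendReferences(sb, "listOfProducts", outputArcs);
        sb.append("</reaction>\n");
        return sb.toString();
    }

    // One speciesReference per arc: the place name is the species, the arc weight the stoichiometry.
    private static void appendReferences(StringBuilder sb, String listName, Map<String, Integer> arcs) {
        sb.append("  <").append(listName).append(">\n");
        for (Map.Entry<String, Integer> arc : arcs.entrySet()) {
            sb.append("    <speciesReference species=\"").append(arc.getKey())
              .append("\" stoichiometry=\"").append(arc.getValue()).append("\"/>\n");
        }
        sb.append("  </").append(listName).append(">\n");
    }
}
```

Called with the input arcs {X=1, Y=2} and the output arc {Z=1}, it reproduces the reactant and product lists of the SBML example above. Passing a LinkedHashMap keeps the species in the order of the arcs.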

During the course of this dissertation, the same approach of mapping Petri Nets to SBML was published elsewhere (Shaw, Koelmans, Steggles & Wipat 2004). However, we developed our own implementation and did not use any other software besides the libsbml library. Only one issue needs to be clarified: how to map the tokens in the Petri Net to the units in the SBML model. In general, a stochastic simulation requires the units to be given in terms of numbers of molecules, but the literature contains several approaches. The most common one is to think of one token as one molecule, but there are also studies in which one token represents a fixed number of molecules (Srivastava et al. 2001, Arkin et al. 1998). In this implementation and all subsequent experiments, one token represents one molecule. If the user wants to obtain deterministic information about the model, the concentration is computed from the number of tokens and the volume of the compartment.

4.3 Stochastic Simulations with Dizzy

The software Dizzy (Ramsey et al. 2005) is developed at the Institute for Systems Biology in Seattle. It implements the Gillespie algorithm and its improvement by Gibson and Bruck. Dizzy also contains a tool for the numerical integration of differential equations and can import models from SBML. The user can plot the results or write them to a file for further processing by other tools. The program is written in Java and published as open-source software. Dizzy was used to perform the simulations in the next chapter. There are several tools with similar functionality that are supported by the SBW; we chose Dizzy because it has a very convenient graphical interface. In addition, it offers a so-called programmatic interface.
This interface can be used by Java programs to call the algorithms implemented by Dizzy directly, without the SBW, simply by importing the Dizzy libraries. This was useful because the first release of the SBW was not very stable: the connection to the Broker was frequently interrupted, which aborted running simulations. We reported these problems to the developers of the workbench, but it took them some time to correct this bug. In the meantime, Dizzy could be used directly through its programmatic interface. Later we switched back to SBW-mediated communication, since communication between the Petri Net Kernel and Dizzy through the workbench is more flexible and allows other tools to be included in the analysis as well.
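Dizzy's implementation is not shown here, but the way an SPN is executed by the Gillespie algorithm (chapter 2) can be summarised in a short Java sketch of the direct method. The sketch is ours, not Dizzy's programmatic interface: the net is given as hypothetical incidence matrices `pre` and `post` (arc weights from places to transitions and back), and transition rates are assumed to follow mass-action kinetics, i.e. the stochastic rate constant multiplied by the number of distinct combinations of input tokens.

```java
import java.util.Random;

/**
 * Minimal sketch of Gillespie's direct method for a Stochastic Petri Net.
 * pre[j][p] / post[j][p]: tokens transition j consumes from / produces in place p.
 * c[j]: stochastic rate constant of transition j. Not Dizzy's code or API.
 */
final class GillespieDirect {
    private final int[][] pre;
    private final int[][] post;
    private final double[] c;
    private final Random rng;

    GillespieDirect(int[][] pre, int[][] post, double[] c, long seed) {
        this.pre = pre;
        this.post = post;
        this.c = c;
        this.rng = new Random(seed);
    }

    /** Mass-action propensity: c_j times the number of ways to choose the input tokens. */
    double propensity(int j, long[] marking) {
        double a = c[j];
        for (int p = 0; p < marking.length; p++) {
            int w = pre[j][p];
            if (marking[p] < w) {
                return 0.0; // transition not enabled
            }
            for (int k = 0; k < w; k++) {
                a *= (double) (marking[p] - k) / (k + 1); // accumulates C(marking[p], w)
            }
        }
        return a;
    }

    /** Fires transitions until time tEnd or until none is enabled; returns the number of firings. */
    long run(long[] marking, double tEnd) {
        int n = c.length;
        double[] a = new double[n];
        double t = 0.0;
        long firings = 0;
        while (true) {
            double a0 = 0.0;
            for (int j = 0; j < n; j++) {
                a[j] = propensity(j, marking);
                a0 += a[j];
            }
            if (a0 == 0.0) {
                return firings; // dead marking: no transition can fire
            }
            // waiting time until the next firing is exponentially distributed with rate a0
            t += -Math.log(1.0 - rng.nextDouble()) / a0;
            if (t > tEnd) {
                return firings;
            }
            // choose transition j with probability a[j] / a0
            double r = rng.nextDouble() * a0;
            int j = 0;
            double cumulative = a[0];
            while (cumulative <= r && j < n - 1) {
                j++;
                cumulative += a[j];
            }
            while (a[j] == 0.0) {
                j--; // guard against round-off selecting a disabled transition
            }
            for (int p = 0; p < marking.length; p++) {
                marking[p] += post[j][p] - pre[j][p]; // fire transition j
            }
            firings++;
        }
    }
}
```

For the Lotka-Volterra net of Section 5.1, one would use the places (X, Y), pre = {{0,1},{1,1},{1,0}}, post = {{0,2},{2,0},{0,0}} and c = {10, 0.01, 10}. The direct method recomputes all propensities after every firing; the Gibson-Bruck improvement mentioned above avoids exactly this cost for large nets.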

Dizzy can also read CMDL, a model description language that is simpler than SBML and only suitable for small models. An export module for CMDL was written for the Petri Net Kernel, since CMDL is easier to read and to debug when the model is small. However, SBML is much more commonly used, and therefore no examples of CMDL are given.

4.4 Conclusions

In this chapter, the technical methodology used in this project has been described. The Petri Net Kernel and the extensions made to it during this dissertation were detailed, and a survey of the Systems Biology Workbench was given. Both projects are similar in their approach and have proven useful during this dissertation. There seems to be a clear trend in academic software engineering towards frameworks that integrate different tools and avoid duplicated effort. We decided to extend existing tools instead of writing new software in order to have more time for the experimental part of this project, and this approach was successful. On the other hand, it turned out that one has to spend considerable time understanding programs written by other people. This is particularly important if one wants to change the software or implement new features. Furthermore, bugs in such software are often very difficult to locate, and one usually depends on the help of its developers; this is not the case with self-written software. During this dissertation, considerable help was received from the developers of the Systems Biology Workbench, especially Frank Bergmann at the Keck Graduate Institute in Claremont, USA, and from Stephen Ramsey, who develops the simulation software Dizzy. Both responded quickly to any questions and were eager to extend their software and to provide additional documentation. Reported bugs were often corrected within a few days.
The friendly and open attitude that is characteristic of the Open Source community greatly facilitated the work on this dissertation. The extension to the Petri Net Kernel that was developed during this dissertation is available online at trieglaf/pnk2e/. This web page also provides detailed documentation of the software, generated with the javadoc tool. Appendix A contains a user guide to PNK 2e, which includes screenshots and a short introduction. This document is also available online at the aforementioned web page.

Chapter 5 Experiments

This chapter describes the experiments conducted with the extended Petri Net Kernel. We start with a simulation of the Volterra-Lotka reactions. This well-known model has been used extensively as an example of a system whose behaviour cannot be predicted accurately by deterministic simulations (Gillespie 1976). In this dissertation, it serves as a first example of an application of the Petri Net Kernel. It highlights the usefulness of Stochastic Petri Nets for modelling biological reactions and, since its simulation results are well documented, it is used to validate our approach. The second section deals with a more comprehensive example, a stochastic simulation of a regulatory gene network that exhibits circadian rhythms. Many organisms are known to exhibit a circadian rhythm at the level of genes and proteins that follows the daily cycle of day and night. However, the details of the molecular mechanisms on which these rhythms rely are not yet understood. Two competing models and their dynamic behaviour are compared. These models differ strongly in their architecture and in the assumptions they make. Consequently, they also behave very differently when simulated stochastically under changing conditions. This chapter concludes with some experiments dealing with the synchronisation of several biomolecular clocks.

The Volterra-Lotka Reactions

The so-called Lotka-Volterra reactions are described by a set of three coupled, autocatalytic reactions:

$$Y \xrightarrow{c_1} 2Y \tag{5.1}$$
$$Y + X \xrightarrow{c_2} 2X \tag{5.2}$$
$$X \xrightarrow{c_3} \emptyset \tag{5.3}$$

This model can also be seen as a simple description of a predator-prey ecosystem. In this case, Y represents the prey, which reproduces itself in reaction 5.1. Reaction 5.2 describes how X, the predator species, reproduces by feeding on the prey. Reaction 5.3 models the demise of X through natural causes. These reactions have been modelled extensively using different deterministic and stochastic approaches. The corresponding reaction-rate equations are given by

$$\frac{dY}{dt} = c_1 Y - c_2 X Y \tag{5.4}$$
$$\frac{dX}{dt} = c_2 X Y - c_3 X \tag{5.5}$$

The nontrivial steady state of this system is given by dY/dt = dX/dt = 0, and one can show that it is characterized by

$$Y^* = \frac{c_3}{c_2}, \qquad X^* = \frac{c_1}{c_2}$$

For X = Y = 1000, c1 = c3 = 10 and c2 = 0.01, a deterministic approach predicts that this situation will persist indefinitely (Gillespie 1976). Figure 5.1(b) shows the result of a numerical integration of equations 5.4 and 5.5 with these initial conditions: the number of molecules indeed remains constant over time. But this is not what one would expect from a good model of a predator-prey ecosystem. It has also been shown that this system exhibits oscillatory behaviour if modelled stochastically. The system was simulated by mapping the Stochastic Petri Net to its SBML representation using PNK 2e.

Figure 5.1: (a) Petri Net representation: the SPN representation of the Lotka-Volterra reactions, with the places prey and predator and the transitions reproduction of prey, consumption of prey and predator death. (b) Deterministic solution: the deterministic solution of the reaction-rate equations with X = Y = 1000, c1 = c3 = 10 and c2 = 0.01 (number of prey and predator molecules vs. time). The plot was obtained by numerical integration of the differential equations 5.4 and 5.5.

This SBML representation was used to simulate the dynamic behaviour of the model with the Gillespie algorithm. Figure 5.2(a) shows the result of this simulation with the expected oscillations. We conducted our experiments with initial conditions that match the steady state. Further experiments have shown that the behaviour of the system is heavily influenced by its initial conditions, and a steady state is not always reached. Nonetheless, the stochastic and deterministic formulations come to different results in many cases (data not shown). We decided to present simulations with the same parameters as Gillespie (1976) because this example is very illustrative.

Results and Discussion

It is not difficult to see why the Lotka-Volterra model exhibits oscillatory behaviour. If one examines plot 5.2(a) carefully, one can see that each rise in the prey population is followed by an increase of the predator population and a subsequent decrease in the prey population. If the prey population increases, the amount of food available to the predators rises as well. This leads to an increase of the predator population, followed by a decrease in the prey population. The resulting food shortage for the predators leads to a decline in their population. This permits the prey population to increase again, and the cycle repeats.
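Figure 5.1(b) was obtained by numerical integration of equations 5.4 and 5.5, but the integrator is not named in the text. The Java sketch below uses the classical fourth-order Runge-Kutta method (our choice) and also evaluates the quantity H(X, Y) = c2 X - c1 ln X + c2 Y - c3 ln Y. Differentiating H along solutions of 5.4 and 5.5 gives (c2 Y - c3)[(c2 X - c1) + (c1 - c2 X)] = 0, so every deterministic orbit is a level curve of H; this is the neutral stability that Gillespie (1976) uses to explain Figure 5.2(b). The class name and the step size are hypothetical.

```java
/**
 * Sketch: classical RK4 integration of the reaction-rate equations 5.4/5.5
 * (state ordered {X, Y} = {predator, prey}) and the conserved quantity
 * H = c2*X - c1*ln X + c2*Y - c3*ln Y of the deterministic model.
 */
final class LotkaVolterraOde {
    private final double c1, c2, c3;

    LotkaVolterraOde(double c1, double c2, double c3) {
        this.c1 = c1;
        this.c2 = c2;
        this.c3 = c3;
    }

    /** Right-hand side {dX/dt, dY/dt} of equations 5.5 and 5.4. */
    double[] rhs(double x, double y) {
        return new double[] { c2 * x * y - c3 * x, c1 * y - c2 * x * y };
    }

    /** One classical fourth-order Runge-Kutta step of size h. */
    double[] step(double[] s, double h) {
        double[] k1 = rhs(s[0], s[1]);
        double[] k2 = rhs(s[0] + h / 2 * k1[0], s[1] + h / 2 * k1[1]);
        double[] k3 = rhs(s[0] + h / 2 * k2[0], s[1] + h / 2 * k2[1]);
        double[] k4 = rhs(s[0] + h * k3[0], s[1] + h * k3[1]);
        return new double[] {
            s[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            s[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        };
    }

    /** Integrates from (x0, y0) for the given number of steps of size h; returns {X, Y}. */
    double[] integrate(double x0, double y0, double h, int steps) {
        double[] s = { x0, y0 };
        for (int i = 0; i < steps; i++) {
            s = step(s, h);
        }
        return s;
    }

    /** Conserved quantity of the deterministic model; constant along each orbit. */
    double invariant(double x, double y) {
        return c2 * x - c1 * Math.log(x) + c2 * y - c3 * Math.log(y);
    }
}
```

Started at X = Y = 1000 with c1 = c3 = 10 and c2 = 0.01, the right-hand side is exactly zero and the state does not move, as in Figure 5.1(b). Started elsewhere, the state circles the steady state while H stays constant up to integration error; the stochastic simulation lacks this invariant, which is why fluctuations can push it from orbit to orbit.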

Figure 5.2: Stochastic simulation of the Lotka-Volterra reactions with the Gillespie algorithm. (a) Stochastic simulation (30 time steps): number of predator and prey molecules vs. time. (b) Number of Y molecules vs. number of X molecules. The number of X and Y molecules oscillates between 150 and 2600.

Gillespie (1976) gives a more formal analysis of the simulation results. If one solves the differential equations 5.4 and 5.5 for an arbitrary initial state (X0, Y0), the solution is an orbit in the (X, Y) plane that passes through the initial state. One can show that this solution is only neutrally stable in a mathematical sense: if the system is perturbed by random fluctuations, it is driven off this orbit onto a neighbouring solution orbit that passes through the perturbed state. Figure 5.2(b) illustrates this behaviour. It shows the number of Y (prey) versus the number of X (predator) molecules. The system passes through several neutrally stable, concentric solution orbits, and fluctuations can drive it either outward or inward onto a new orbit. Sooner or later, a random fluctuation will drive the system onto one of the two axes in Figure 5.2(b); that is, either the prey or the predator population will die out. If the prey dies out first, the predator population will die out as well. If the predator population dies out first, the prey population will increase indefinitely. The Volterra-Lotka reactions are of course a very simple example. Gillespie (1976) used them to underline that there are (bio-)chemical systems whose behaviour cannot be reliably predicted with deterministic approaches, because a deterministic formulation does not take into account the random fluctuations occurring at the microscopic level.
In contrast, stochastic simulation is a much more natural framework and gives results that are closer to intuition. In this work, we use this example again to show that the Petri Net formalism is able to represent chemical reactions in a natural way. The SPN representation of the reactions is given in Figure 5.1(a) and is easy to understand. Furthermore, if the transition rates in the Stochastic Petri Net are chosen according to mass-action kinetics (Cox & Nelson 2004), the net can be simulated efficiently with the Gillespie algorithm, as done in our experiments.

5.2 Stochastic models of circadian rhythms

The delay-based Model

The daily alternation of day and night affects nearly all life-forms. Many organisms have evolved rhythmic responses that follow this day-night cycle. These responses range from behaviour (sleep-wake cycles, feeding rhythms) to molecular rhythms (e.g. gene expression and enzyme-activity rhythms). This section deals with a core molecular model capable of generating circadian rhythms in Neurospora crassa, a red bread mould. This mould is often used as a model organism since it is easy to grow and its genome is simple and fully sequenced. The model itself represents a rhythmic response at the level of gene expression (i.e. the response consists of oscillations in the concentration of a regulatory protein with a period close to 24 h). One has to keep in mind that this model is fairly general. It does not aim at capturing all the details of a real cell. It is also not specific to Neurospora but represents an architecture that is thought to be representative of biomolecular clocks in simple organisms. For example, Gonze, Halloy & Goldbeter (2002) state that, with small modifications, this system is equivalent to the biomolecular clock in Drosophila melanogaster, the fruit fly. There are several models of genetic oscillators, and almost all of them are based on a negative feedback loop in which a regulatory protein inhibits its own expression. The model we examine was first published by Leloup, Gonze & Goldbeter (1999). This deterministic description was later transformed into a stochastic model by Barkai & Leibler (2000). The simulation of this model revealed very noisy behaviour and no stable oscillations.
They concluded that the model is wrong, since it only exhibits circadian rhythms when solved deterministically, whereas the deterministic formulation is assumed to be inadequate because the regulatory protein occurs in very few copies. Two years later, the same model was simulated again with the same algorithm but with different rate constants for the protein-DNA interactions (Gonze, Halloy & Goldbeter 2002). These kinetic constants are thought to be critical for the noise resistance of the system. Under these conditions the model showed stable oscillations with a period of about 24 hours.

Gonze, Halloy & Goldbeter (2002) argue that the kinetic rate constants used by Barkai & Leibler (2000) in their stochastic model were too small. Using different parameters, they were able to produce stable oscillations in a stochastic simulation even with very few molecules. In the experiments that follow, we use the same kinetic constants and experimental setup as Gonze, Halloy & Goldbeter (2002). It is, however, not clear which parameters are actually correct, since no experimental data are available so far.

Figure 5.3: Core model for circadian rhythms based on delay. This plot is taken from Gonze, Halloy & Goldbeter (2002). The model incorporates gene transcription as well as transport, degradation and translation of mRNA (M_P). The clock protein (P_0) synthesized from the mRNA is reversibly phosphorylated, successively, into the forms P_1 and P_2. P_2 is either degraded or transported into the nucleus (P_N), where it exerts a negative feedback on its own gene. The inhibition is cooperative (see the explanation in the text below). This general model accounts for circadian oscillations in Neurospora as well as in Drosophila.

The model

An overview of the model is given in Figure 5.3. Essentially, a protein is phosphorylated in two steps, enters the nucleus and inhibits its own synthesis. If timed correctly, this negative feedback loop leads to oscillations in the concentration of the protein with a period close to 24 hours. The phosphorylation introduces a delay between the translation of the mRNA and the entry of the protein into the nucleus. Its role in the biological clock is not yet clear (Barkai & Leibler 2000), and there are theoretical models of circadian oscillations that do without it (Gonze, Halloy & Gaspard 2002). Up to four proteins must bind successively to the gene promoter to repress transcription. The resulting inhibition of the gene is cooperative.
This means that each bound protein facilitates the binding of the next one, i.e. the rate constants of the subsequent binding reactions are increased. This phenomenon occurs very frequently in nature. As with the phosphorylation delay, cooperativity is not required for the oscillations to take place, but a cooperative inhibition increases their robustness. The experiments of Gonze, Halloy & Goldbeter (2002) revealed that the highest robustness is achieved with three proteins binding to the gene. They understand robustness as an informal measure of the regularity of the oscillations. Whenever we use the term robustness in the remainder of this dissertation, we also refer to the regularity of the oscillations, which is determined by the deviation in their period and amplitude.

The differential equations that correspond to the individual reaction steps in the model are given in Appendix B. From this deterministic model, a description of the detailed reaction steps was derived: reactions following Michaelis-Menten kinetics are decomposed into elementary steps. The Petri Net representation of the reaction steps is large and is therefore included in Appendix B. Reaction probabilities not present in the deterministic model were taken from the literature. As outlined before, the aim of the stochastic formulation was to show that this model is adequate, a fact that was questioned in a previous study (Barkai & Leibler 2000).

The volume of the system, Ω, was changed systematically in order to modulate the number of molecules in the model. To understand why an increase of the volume results in a larger number of molecules, one needs to recall how the deterministic rate constants are converted into stochastic reaction constants (see the corresponding section in chapter 2). Intuitively, an increase in the size of the container should result in dilution and fewer molecules. But this is not the case in this scenario.
Since the deterministic constants are expressed in terms of concentrations of the molecular species, we multiply these constants by the volume of the system and obtain constants in terms of numbers of molecules. The deterministic constants are fixed, and therefore an increase in the volume of the system increases the magnitude of the reaction probabilities and effectively the number of molecules involved. Gonze, Halloy & Goldbeter (2002) call Ω the size of the system rather than its volume in order to avoid confusion. The problem with this approach is that scaling all reaction probabilities in this way would also increase the number of genes in the model. This is not realistic, and therefore all reaction constants involving the gene promoter G are scaled by Ω so that its number stays equal to one. For details see Table B.2 in the appendix. The experiments were started by creating a Petri Net representation of this model. The Petri Net is then translated into SBML and its behaviour simulated with the Gillespie algorithm. This algorithm is implemented in the Systems Biology Workbench (Hucka et al. 2002).

We first tried to reproduce the results published by Gonze, Halloy & Goldbeter (2002) and performed simulations with different values of Ω. The aim was to check whether the stochastic simulations produce results similar to those obtained with the deterministic model. Estimates of the period of the oscillations were obtained with the Matlab Signal Processing Toolbox. The Fourier transform was applied to the time course of the protein population; it decomposes a (noisy) signal into a linear combination of sine and cosine functions. The power spectrum, or spectral density, was computed and used to estimate the strength of the different frequencies that make up the signal, in this case the time evolution of the number of proteins. The period of the signal was obtained from the dominant frequency.

Figure 5.4 panels: first row, deterministic simulation (concentration [nM] of mRNA, nuclear protein and all protein vs. time [h]; nuclear protein concentration P_N [nM] vs. mRNA concentration M_P [nM]); second row, stochastic simulation with Ω = 500 (number of mRNA or protein molecules vs. time [h]; nuclear protein molecules P_N vs. mRNA molecules M_P).

Figure 5.4: Delay-based circadian clock: stochastic and deterministic simulation. The first row gives the results obtained in the absence of noise. These curves are generated by numerical integration of the kinetic equations given in Appendix B. The oscillations of mRNA (M_P), nuclear (P_N) and total clock protein (P_t) correspond to the evolution towards a limit cycle, shown as a projection onto the (M_P, P_N) plane. The results in the second row are obtained by stochastic simulation of the chemical reactions corresponding to the deterministic model. The number of mRNA molecules oscillates between a few and about 1000, whereas nuclear and total clock protein oscillate in the ranges of … and …, respectively.
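The period estimates above were computed with Matlab's Signal Processing Toolbox. As an illustration of the idea only, the following Java sketch computes the power spectrum of a uniformly sampled, mean-removed series with a naive discrete Fourier transform and converts the strongest non-zero frequency bin into a period. The class and method names are hypothetical, and the O(n²) transform is chosen for clarity rather than speed (an FFT would be used in practice).

```java
/**
 * Sketch of period estimation from a power spectrum (not the Matlab toolbox).
 * Naive O(n^2) discrete Fourier transform of a uniformly sampled series.
 */
final class PeriodEstimator {

    /** Power |F_k|^2 of the mean-removed signal for frequency bins k = 0..n/2. */
    static double[] powerSpectrum(double[] x) {
        int n = x.length;
        double mean = 0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;
        double[] power = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0;
            double im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += (x[t] - mean) * Math.cos(angle);
                im -= (x[t] - mean) * Math.sin(angle);
            }
            power[k] = re * re + im * im;
        }
        return power;
    }

    /** Period (in the time unit of dt) of the strongest non-zero frequency bin; needs n >= 2. */
    static double dominantPeriod(double[] x, double dt) {
        double[] p = powerSpectrum(x);
        int best = 1;
        for (int k = 2; k < p.length; k++) {
            if (p[k] > p[best]) {
                best = k;
            }
        }
        // bin k corresponds to frequency k / (n * dt), i.e. period n * dt / k
        return x.length * dt / best;
    }
}
```

Gillespie trajectories have irregular event times, so they would first have to be sampled on a uniform grid (for example by holding the last value between events). The resolution of the estimate is limited by the length of the series, since bin k corresponds to the period n·dt/k.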

Figure 5.5 panels (one row each for Ω = 100, 50 and 10): number of mRNA or protein molecules vs. time [h]; nuclear protein molecules P_N vs. mRNA molecules M_P; sample autocorrelation vs. time lag (h).

Figure 5.5: Effect of the number of molecules on the robustness of the oscillations. The plots show the results of stochastic simulations with Ω decreasing from 100 to 50 and 10. The left plot in each row shows the oscillations of mRNA, nuclear protein and all proteins during a simulation time of 484 hours. The middle plot shows the corresponding limit cycle and the right plot the autocorrelation function, which was computed for time lags from 0 to 480. For Ω = 50 and 100, one can still observe robust circadian oscillations. Only for Ω = 10 do the oscillations become very noisy, as underlined by the rapid decrease of the autocorrelation function. The simulations were conducted with the Gillespie algorithm; the plots of the autocorrelation function were created with Matlab.
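The autocorrelation R(τ) = E[(X_i - µ)(X_{i+τ} - µ)] / σ² plotted in the right-hand column of Figure 5.5 can be estimated in a few lines. The Matlab routine actually used is not shown in the text, so the following Java sketch is our own; it assumes a uniformly sampled, non-constant series and uses the common biased estimator that divides by n rather than by n - τ.

```java
/**
 * Sketch of the sample autocorrelation R(tau) = E[(X_i - mu)(X_{i+tau} - mu)] / sigma^2
 * for a uniformly sampled series (biased estimator: normalised by n, not n - tau).
 */
final class Autocorrelation {

    /** Returns R(0..maxLag); assumes maxLag < x.length and a non-constant series (sigma^2 > 0). */
    static double[] compute(double[] x, int maxLag) {
        int n = x.length;
        double mean = 0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;
        double variance = 0;
        for (double v : x) {
            variance += (v - mean) * (v - mean);
        }
        variance /= n;
        double[] r = new double[maxLag + 1];
        for (int lag = 0; lag <= maxLag; lag++) {
            double sum = 0;
            for (int i = 0; i + lag < n; i++) {
                sum += (x[i] - mean) * (x[i + lag] - mean);
            }
            r[lag] = sum / (n * variance);
        }
        return r;
    }
}
```

With this normalisation even a perfectly periodic signal decays slowly, roughly as (n - τ)/n at multiples of the period, so only a decay faster than that indicates a loss of periodicity such as phase diffusion.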

Experiments and Results

The aim of this first experiment was to check whether, for sufficiently large numbers of molecules, a stochastic simulation produces results similar to those of the deterministic model. A simulation of the SPN representing the model was performed with the Gillespie algorithm (Figure 5.4). The left plot in each row shows the oscillations of mRNA (M_P), nuclear protein (P_N) and the sum of all protein molecules obtained with the deterministic model (first row) and with the stochastic simulation (second row). These oscillations evolve towards a limit cycle, which is shown in the second plot of each row. The stochastic simulation was conducted for Ω = 500. For this value of Ω, the oscillations of the stochastic model are quite stable, with a period of 24.5 hours and a standard deviation of 1.1 hours. Gonze, Halloy & Goldbeter (2002) obtained similar values. The number of mRNA molecules varies in the range of … and the number of proteins in the range of … (nuclear form) and … (all proteins). The deterministic model shows stable oscillations with a period of 23.8 hours. In this model, the concentration of mRNA varies in the range of 0 to 2 nM, whereas the protein concentrations change in a range of … nM (P_N) and … nM (all proteins). It seems that the molecular noise, which is considered only by the stochastic model, merely changes the amplitude of the oscillations and not their period. The next experiments dealt with the influence of decreasing numbers of molecules on the results of the stochastic simulation. Stochastic simulations were performed for Ω = 10, 50 and 100. The results in Figure 5.5 reveal that stable oscillations occur with Ω = 100 (first row) and Ω = 50 (second row). With these parameters, the number of mRNA molecules oscillates in the range of … (Ω = 100) and … (Ω = 50). For smaller numbers of molecules, however, the circadian rhythms are increasingly overlaid by noise.
The bottom row shows the result of a simulation with Ω = 10. The number of mRNA molecules varies from 0 to 200, and the number of proteins from 0 to 200 (P_N) and from 50 to 400 (all proteins). The limit cycle is no longer visible and the circadian oscillations have become very noisy. These observations are supported by the time evolution of the autocorrelation function, which is given in the third column of each row. This function measures the degree of periodicity of a time series. It is the correlation of a discrete process with a time-shifted version of itself for a time lag τ, and is defined by:

$$R_f(\tau) = \frac{E\left[(X_i - \mu)(X_{i+\tau} - \mu)\right]}{\sigma^2}$$

where E is the expected value, µ the mean and σ² the variance of the process. For a deterministic periodic time series, the autocorrelation function itself oscillates without decaying and returns to 1 at every multiple of the period. In the presence of noise, it decays towards zero, and the more regular the oscillations, the more slowly it decays. This loss of correlation is due to phase diffusion: in the presence of noise, the phase of free-running oscillations drifts randomly until it eventually covers the whole range of possible values over a period (Gonze, Halloy & Goldbeter 2002). If many molecules are present in the system, the autocorrelation decreases slowly, as can be observed for Ω = 100. If noise starts to obliterate the oscillatory behaviour, the autocorrelation function decreases more rapidly, as can be seen for lower numbers of molecules (Ω = 50 and 10). So far, we have merely repeated experiments that were published elsewhere (Gonze, Halloy & Goldbeter 2002). We may now hope that our approach of modelling the reaction steps with an SPN is valid, and apply this method and the experimental setup to other models. This has not yet been done.

The hysteresis-based Model

As mentioned at the beginning of this chapter, the validity of the delay-based model has been questioned (Barkai & Leibler 2000). In this part of the dissertation, we present a different model based on hysteresis (Vilar et al. 2002). In general, hysteresis describes a memory or lagging effect: the state of a system depends on its history. In contrast to the previous model, this genetic oscillator consists of two different components, an activator and a repressor protein. The expression of the activator leads to a delayed expression of the repressor. The repressor inhibits the synthesis of the activator protein and is the source of the oscillations.
The delay-based and the hysteresis-based models are very different, but both are general models that try to explain how circadian oscillations might be generated at the molecular level. A short comparison of both models has already been made by Barkai & Leibler (2000). Recently, it has been suggested that the rate of the binding reaction between protein and DNA has a significant influence on the noise resistance of the oscillations (Forger & Peskin 2005). We therefore repeat previous experiments with faster rate constants for these binding reactions, which are assumed to increase the noise resistance. We also change the size of the hysteresis-based model in the same fashion as for the delay-based model in the last section, which was not part of the experiments of Barkai & Leibler (2000).

The Model

Figure 5.6: The hysteresis-based model (illustration taken from Barkai & Leibler (2000)). This model consists of two active components, an activator (A) and a repressor (R). A stimulates the synthesis of R. If the concentration of R rises, A is degraded quickly. After the slow degradation of R, A is synthesized again.

Figure 5.6 above illustrates the second oscillatory network examined in this section. The model also includes the degradation of both messenger RNAs synthesized from the genes P_A and P_R. The activator protein A binds to the promoter region of its own gene, which increases its transcription rate. This type of feedback loop increases the noise resistance of the oscillator and seems to be a common feature of several competing models (Barkai & Leibler 2000). Protein A also binds to the promoter of gene P_R and induces its expression. This gene is transcribed into mRNA, and the resulting protein R binds to protein A, forming the degradation complex C (not shown in the simplified illustration above), in which A is degraded; that is, the degradation complex C decays into R. The cycle is completed by the degradation of the repressor R and the subsequent re-expression of the activator. For a detailed description of all reaction steps in the model, see Appendix C. As for the previous model, this oscillator is assumed to represent the core architecture of a biomolecular clock. It is not specific to any organism, but it contains features, such as a positive feedback loop and two competing components, that have been found experimentally in a variety of organisms, from cyanobacteria to mammals (Dunlap 1999). The dynamics of this system are captured by a set of differential equations given in Appendix C. These equations have been decomposed into elementary reaction steps in the same way as for the delay-based model. The reaction probabilities are given in Table C.1, and the SPN representation of the reactions in Figure C.1 (Appendix C).
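To make the SPN view of these elementary steps concrete, the following is a minimal Python sketch of a stochastic Petri net: places hold token counts (molecule numbers), and each transition has input and output arc weights and a mass-action propensity. It encodes only the two steps of the degradation complex described above, A + R → C and C → R. The class, the combinatorial propensity convention and the rate constants are illustrative assumptions; they are not the PNK 2e data model or the values of Table C.1.

```python
from dataclasses import dataclass, field
from math import comb


@dataclass
class Transition:
    name: str
    rate: float                                   # stochastic rate constant
    inputs: dict = field(default_factory=dict)    # place -> arc weight (consumed)
    outputs: dict = field(default_factory=dict)   # place -> arc weight (produced)

    def enabled(self, marking):
        """A transition is enabled if every input place holds enough tokens."""
        return all(marking[p] >= w for p, w in self.inputs.items())

    def propensity(self, marking):
        """Mass-action propensity: rate times the number of distinct ways of
        choosing the required input tokens (one common convention)."""
        if not self.enabled(marking):
            return 0.0
        a = self.rate
        for p, w in self.inputs.items():
            a *= comb(marking[p], w)
        return a

    def fire(self, marking):
        """Return the new marking after one firing (the argument is not modified)."""
        if not self.enabled(marking):
            raise ValueError(f"transition {self.name} is not enabled")
        new = dict(marking)
        for p, w in self.inputs.items():
            new[p] -= w
        for p, w in self.outputs.items():
            new[p] = new.get(p, 0) + w
        return new


# Hypothetical rate constants, for illustration only.
complexation = Transition("A + R -> C", rate=2.0, inputs={"A": 1, "R": 1}, outputs={"C": 1})
complex_decay = Transition("C -> R", rate=1.0, inputs={"C": 1}, outputs={"R": 1})

if __name__ == "__main__":
    m = {"A": 3, "R": 2, "C": 0}
    print(complexation.propensity(m))  # 12.0  (2.0 * 3 * 2)
    m = complexation.fire(m)           # A and R consumed, C produced
    print(m)                           # {'A': 2, 'R': 1, 'C': 1}
    m = complex_decay.fire(m)          # C decays into R; A is lost
    print(m)                           # {'A': 2, 'R': 2, 'C': 0}
```

Keeping the marking as a plain dictionary makes each firing a pure function of the old marking, which matches the graphical intuition of tokens moving along arcs.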

[Figure 5.7 plots: (a) deterministic simulation and (b) its limit cycle; (c) stochastic simulation and (d) its limit cycle. Axes: number of molecules of C and R vs. time (h); R vs. C.]
Figure 5.7: Hysteresis-based circadian clock: stochastic and deterministic simulation. Oscillations in the repressor protein (R) and the degradation complex (C), obtained by numerical simulation of the deterministic (a) and stochastic (c) descriptions of the model.

Experiments and Results

The delay-based model examined in the last section reveals stable oscillations at high numbers of molecules, i.e. if the size of the system is increased to Ω = 500. On the other hand, if Ω is decreased to 10, oscillations can still be perceived, but with a very irregular period and amplitude. The delay-based and the hysteresis-based models have already been compared (Barkai & Leibler 2000), but with slower rate constants and without changing the system size. We present a more thorough comparison, with faster rates for the protein-DNA binding reactions and with varying numbers of protein and mRNA molecules in the model. All stochastic simulations were started with one copy each of the repressor and activator genes and no mRNA or protein molecules. To begin, we compare the stochastic formulation of the new hysteresis-based model with its deterministic formulation. Figure 5.7 shows the results of this experiment. Following the work of Vilar et al. (2002), we show the time course of the repressor protein (R) and of the degradation complex (C), which consists of activator and repressor protein. The deterministic formulation and the stochastic simulation are in good agreement and correspond to the results obtained by Vilar et al. (2002). The degradation complex is formed as soon as activator and repressor proteins are available. It decays into R, and therefore a peak in the concentration of C is followed by a peak in the concentration of R. Later on, R is degraded and the cycle starts again. This experiment was performed with Ω = 1, as done by Vilar et al. (2002). With these parameter settings, the stochastic simulation changes the amplitude of the oscillations, but the period remains remarkably stable. These experiments have already been performed by Barkai & Leibler (2000), who, in contrast to us, used slow rate constants for the binding reaction between protein and DNA. Whereas the noise resistance of the delay-based model is greatly improved by these higher rate constants, we cannot observe any change in the behaviour of the hysteresis-based model. As in the experiments of the last section, we now examine the behaviour of the biomolecular clock for different numbers of molecules. The delay-based model exhibited very noisy oscillations at low values of Ω; as Ω is increased, the behaviour of the stochastic model approaches the time course of the deterministic simulation. That random fluctuations at the molecular level are averaged out if enough molecules are involved has been proven formally (Kurtz 1971), and the behaviour of the delay-based model is consistent with this. The hysteresis-based model, in contrast, behaves very differently. For Ω = 10 and 50, its oscillations are very stable (third and second rows of Figure 5.9).
But if the size of the system is increased further, the oscillations stop completely (first row of Figure 5.9). For this model, higher numbers of molecules do not seem to improve the oscillatory behaviour. A limited amount of noise seems to have a positive influence on the oscillator, whereas for higher numbers of molecules the system seems to approach a steady state. These results are consistent with the findings of Vilar et al. (2002), who performed a theoretical analysis of a simplified version of the hysteresis-based oscillator and showed that molecular fluctuations can actually enhance the oscillator. Essentially, small perturbations can drive the system out of a stable state and initiate a new phase. In a deterministic setting, these perturbations are not considered, and the system remains in the stable state once it has arrived there. This behaviour was observed for a particularly low value of the degradation rate of the repressor protein R (δ_R = 0.05 h⁻¹).

[Figure 5.8 plots: (a) deterministic and (b) stochastic simulation. Axes: number of repressor molecules vs. time (h).]
Figure 5.8: Time evolution of the repressor protein (R) for the deterministic (a) and stochastic (b) formulation of the model. Parameter values are as given in Appendix C, except for the degradation rate of R, which is now δ_R = 0.05 h⁻¹.

Figure 5.8 shows the results of a deterministic and a stochastic simulation with these parameters. It is not completely clear whether a similar situation arises if the size of the system is increased, as in our experiments. The simulation shows that the abundance of the key proteins A and R oscillates between 0 and several thousand molecules. Vilar et al. (2002) observed a similar behaviour for Ω = 1. The fact that very few instances of the key proteins are present only during a very short time interval might be the reason for the noise resistance of the system at low values of Ω. On the other hand, the perturbations that necessarily occur at these low abundances might be just enough to drive the system out of the steady state and into the next period. The simulation with Ω = 100 reveals that the numbers of both key proteins (data for A not shown) do not decrease to zero but oscillate around 1000 molecules. This might be why the system approaches a steady state: the influence of fluctuations in the molecular populations is too low to drive the system out of its stable state and initiate a new oscillation. It is very difficult to obtain coherent conclusions about a complex nonlinear system from observations alone. To our knowledge, most theoretical results about the behaviour of such a model under different parameters were obtained only for simplified versions of the model.
In these analyses, various assumptions about the model were made, such as a steady state for some molecular species, in the hope that the real and the simplified model behave similarly over a wide range of conditions. Further advances in theory might be required to obtain new insights. Nevertheless, our simulations have shown that the hysteresis-based model behaves unexpectedly when the size of the system is increased.
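The averaging result attributed earlier to Kurtz (1971) can be stated compactly; this is a standard textbook formulation given as background, not part of the dissertation's own analysis. Let X_Ω count molecules, let reaction j change X_Ω by the vector ν_j and fire with propensity Ω·β_j(X_Ω/Ω). Under mild regularity conditions (for example, locally Lipschitz β_j), the scaled process X_Ω/Ω converges to the solution x(t) of the rate equations on every finite time interval, and an accompanying central limit theorem puts the fluctuations around x(t) at order Ω^(-1/2). The result presupposes that every propensity scales with Ω, which does not hold for species kept at a fixed copy number, such as a single gene. The one-species birth–death example at the end has a Poisson steady state with mean kΩ/γ, which gives the stated relative fluctuation.

```latex
\begin{gathered}
\frac{X_\Omega(0)}{\Omega} \to x_0
\quad\Longrightarrow\quad
\sup_{0 \le t \le T} \left| \frac{X_\Omega(t)}{\Omega} - x(t) \right|
\xrightarrow[\Omega \to \infty]{} 0 \ \text{in probability, for every } T > 0,
\\
\text{where} \quad \frac{dx}{dt} = \sum_j \nu_j \, \beta_j(x), \qquad x(0) = x_0 .
\\
\text{Example: } \emptyset \xrightarrow{k\Omega} X, \quad X \xrightarrow{\gamma X} \emptyset
\quad\Longrightarrow\quad
\frac{\operatorname{sd}(X_\Omega)}{\operatorname{E}[X_\Omega]}
= \left( \frac{k\Omega}{\gamma} \right)^{-1/2} \ \text{at steady state.}
\end{gathered}
```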

[Figure 5.9 plots: stochastic simulation, limit cycle and autocorrelation for Ω = 100, 50 and 10. Axes: number of molecules of C and R vs. time (h); R vs. C; sample autocorrelation vs. time lag (h).]
Figure 5.9: Stochastic simulation with changing numbers of molecules (hysteresis-based model). The plots show the results of stochastic simulations with Ω changing from 100 to 50 and 10. In each row, the left plot shows the oscillations of the repressor (R) and the degradation complex (C) during a simulation time of 200 hours, the middle plot shows the limit cycle, and the right plot shows the time evolution of the autocorrelation function.
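The autocorrelation panels of Figures 5.5 and 5.9 show the sample version of R_f(τ) defined earlier in this chapter, and Figure 5.13 summarises such curves by a half-life. The dissertation does not state how the half-life was read off (the plots were made with Matlab), so the Python sketch below is only one plausible reading, under assumptions: it computes the normalised sample autocorrelation of a uniformly sampled series, and takes the half-life to be the lag of the first positive local maximum of the autocorrelation that lies below 0.5. The function names and this half-life definition are hypothetical.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Normalised sample autocorrelation for lags 0..max_lag, following
    R(tau) = E[(X_i - mu)(X_{i+tau} - mu)] / sigma^2 with the sample mean and
    variance (the usual biased estimator, dividing by N at every lag)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    var = np.dot(d, d) / n
    return np.array([np.dot(d[: n - k], d[k:]) / (n * var) for k in range(max_lag + 1)])


def acf_half_life(acf, dt):
    """Hypothetical robustness measure: lag (in time units) of the first
    positive local maximum of the autocorrelation whose height is below 0.5.
    Returns None if the peaks never drop below 0.5 within the computed lags."""
    for k in range(1, len(acf) - 1):
        is_peak = acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]
        if is_peak and 0.0 < acf[k] < 0.5:
            return k * dt
    return None


if __name__ == "__main__":
    t = np.arange(2400.0)                      # hourly samples over 100 days
    clean = np.sin(2 * np.pi * t / 24.0)       # noise-free 24 h rhythm
    acf = sample_acf(clean, max_lag=480)
    print(round(acf[24], 3), acf_half_life(acf, dt=1.0))  # 0.99 None
```

Using the peaks rather than the raw curve matters because an oscillating autocorrelation drops below 0.5 within a quarter period even for a perfectly regular rhythm.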

Simulating the effects of gene duplication

[Figure 5.10 plots: (a) delay-based model, stochastic simulation (mRNA, P_N); (b) hysteresis-based model, stochastic simulation with Ω = 1 (C, R). Axes: number of molecules vs. time (h).]
Figure 5.10: Both models are simulated with a second copy of the clock gene (the activator gene in the case of the hysteresis-based model). In both models, the transcription rate of the gene copy is increased by a factor of 10. Both models are simulated with a value of Ω for which they should exhibit stable oscillations (100 for the delay-based and 1 for the hysteresis-based model).

Gene duplication is thought to play a major role in evolution. It can happen when an error occurs during DNA replication and a copy of a functional gene is inserted into a different part of the DNA. This copy might be identical to the original gene or mutated. If both copies are functional, one of the genes might mutate later on and acquire a different function, since it is no longer required for the survival of the organism. But both genes can also remain active during the further evolution of the species (Cox & Nelson 2004). The effect of gene duplication on a genetic oscillator has already been examined by Forger & Peskin (2005). They increased the number of genes in their stochastic model of a circadian clock in mammals and found that the robustness of the oscillations improves if more genes are present in the model, measuring robustness in terms of the deviation of the period over many runs. Forger & Peskin (2005) argue that a low number of genes always leads to some residual stochasticity in the model. Even in the limit of a large volume, when all other reactions can be modelled deterministically, the reactions involving the gene and its promoter still occur stochastically, since the number of genes does not grow with the volume.
This might also explain why faster binding rates between protein and DNA reduce the stochasticity of the model: the randomness of these reactions is averaged out if they occur on a very fast time scale (Gonze, Halloy & Goldbeter 2002).
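The averaging argument can be made quantitative for a single promoter. Let g(t) ∈ {0, 1} indicate whether the promoter is bound, and assume, as a simplification that is not part of either clock model, that it switches like a two-state Markov process with an effective binding rate k_on (the binding rate constant times a regulator abundance held fixed) and a dissociation rate k_off. In the stationary regime, the bound fraction p, the correlation time τ_c and the variance of the occupancy averaged over a time window of length T follow from the exponential autocovariance of this process:

```latex
\begin{aligned}
p &= \frac{k_{\mathrm{on}}}{k_{\mathrm{on}} + k_{\mathrm{off}}}, \qquad
\tau_c = \frac{1}{k_{\mathrm{on}} + k_{\mathrm{off}}}, \qquad
\operatorname{Cov}\big(g(t),\, g(t+\tau)\big) = p(1-p)\, e^{-|\tau|/\tau_c}, \\
\operatorname{Var}\!\left[ \frac{1}{T} \int_0^T g(t)\, dt \right]
&= \frac{2}{T^2} \int_0^T (T-\tau)\, p(1-p)\, e^{-\tau/\tau_c}\, d\tau
 = \frac{2\, p(1-p)}{T^2} \Big[ \tau_c T - \tau_c^2 \big( 1 - e^{-T/\tau_c} \big) \Big] \\
&\approx \frac{2\, p(1-p)\, \tau_c}{T} \qquad (T \gg \tau_c).
\end{aligned}
```

Multiplying k_on and k_off by the same factor s leaves p unchanged but divides τ_c, and with it the variance of the time-averaged occupancy for T ≫ τ_c, by s. This is one way to read the statement that fast binding reactions average out, although it ignores the feedback between promoter state and protein levels.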

Gene duplication might also involve the mutation of the duplicated gene, which was not considered in the simulations of Forger & Peskin (2005). We simulate gene duplication by introducing a second, mutated gene in both models. A mutation can have several effects on a gene. It might lead to a dysfunctional protein product, but it is also possible that the promoter region is modified so that the gene is transcribed at a much higher rate than the original. We modelled the mutation by increasing the transcription rate of the gene copy by a factor of 10. Figure 5.10 shows the results of this experiment. The hysteresis-based model is apparently not severely perturbed by the modification. The amplitude of the oscillations decreases, and fewer proteins are produced than in the model with only one gene (a maximum of 2043 compared to 2495). It seems that a higher overall transcription rate of the activator gene leads to faster expression of the repressor protein and in turn to faster degradation of the activator protein, which might explain the damped amplitudes. Nevertheless, the genetic circuit still exhibits regular oscillations. In contrast, the delay-based model is strongly affected by the introduction of a second mutated gene: it no longer oscillates, and the number of clock proteins increases instead. A maximum of […] was observed during a simulation time of 200 hours. We thus tested the behaviour of both genetic oscillators when a second gene with an increased transcription rate is introduced. This modification can be interpreted as a simulation of a duplication of the clock gene; however, the main intention of this experiment was to examine the behaviour of both models if key genes are copied and mutated. It is questionable whether the modification we introduced is a good model of a real gene duplication. Nevertheless, the experiment showed that the hysteresis-based model is less susceptible to structural modifications.
This is a characteristic that is likely to be favoured by evolution, since gene networks that are easily disrupted by mutations might die out quickly. The ability to function reliably even if key components are mutated is probably necessary for the circadian clock to be successfully embedded within the cell.

Influence of changes in the Protein-DNA binding rates

In a previous study (Barkai & Leibler 2000), the delay-based model of a biomolecular clock (Leloup et al. 1999) was criticised because it exhibits very unstable and noisy oscillations when simulated stochastically. Later, Gonze, Halloy & Goldbeter (2002) repeated this stochastic simulation with different rate constants. These experiments revealed that the delay-based oscillator is able to oscillate reliably if the rates of the reactions between the clock protein and its own gene are set to very high values. It is now believed that high rate constants in these reactions are crucial for the robustness of the oscillations in many models of circadian clocks (Forger & Peskin 2005). We used the same values for the binding reactions in our experiments for both models, hysteresis- and delay-based. To complete our comparison of both models, we briefly repeat the simulations conducted by Barkai & Leibler (2000) with low rate constants.

[Figure 5.11 plots: stochastic simulations of (a) the delay-based model (mRNA, P_N) and (b) the hysteresis-based model (C, R). Axes: number of molecules vs. time (h).]
Figure 5.11: Simulation of both circadian clocks with low rate constants. Stochastic simulation of both models with low rate constants for the binding reactions between DNA and proteins. The rates were set to 50 (binding) and 10 h⁻¹ (dissociation).

Our findings match the results obtained by Barkai & Leibler (2000). Whereas the hysteresis-based model exhibits stable and pronounced oscillations even with low rate constants, the delay-based model still oscillates, but in a very noisy manner, and no period even close to 24 hours is visible. The problem here is that the true rates of the protein-DNA binding reactions are not known. Even though high rate constants have been found to be crucial for the robustness of the model, it is not clear whether these rate constants reflect reality, since there are currently no experimental data on the kinetics of these reactions. However, the results can be seen as an indication of the soundness of both models, since real circadian clocks have to function reliably when their parameters change due to external influences, such as changes in temperature or in the current state of the organism (hunger, stress, etc.).

Synchronising several oscillating cells

The two models presented in this chapter are oscillators with a very general structure that do not contain features specific to any organism. Their behaviour is somewhat contradictory, and under certain conditions they show only noisy oscillations or none at all. However, the ability to create stable circadian oscillations under a large variety of external conditions is thought to be a key feature of biological clocks (Barkai & Leibler 2000). Changes in transcription and translation rates may arise from variations in nutrition, growth conditions or temperature. It seems reasonable that evolution favours designs of cellular systems that function reliably despite global changes in their environment. Several factors might help biomolecular clocks to achieve this, and entrainment by daylight is certainly one of them. In the biomolecular clock of Neurospora, light enhances the degradation of the clock protein and thereby exerts a periodic forcing of the clock that was found to improve its noise resistance (Gonze, Halloy & Goldbeter 2002). Light can be seen as a kind of external synchronisation. There are also theories of internal synchronisation, i.e. synchronisation between different cells such that the oscillations of a group of cells are more stable than those of a single cell (Forger & Peskin 2005).

Experiments and Results

This section presents a short experiment on a possible mechanism of synchronisation between different clocks. One possibility is simply to average the oscillations of several cells. However, we would expect noisy oscillations averaged over a large number of cells simply to cancel out, because each oscillation is shifted differently. To cope with this problem, a single run of the delay-based model was observed at Ω = 50.
After this system had left the transient phase and settled into a more or less stable limit cycle, the number of molecules of each species was recorded and used as the initial state for a new run of 100 cells, whose individual oscillations were again averaged. The rationale of this approach is that all cells start from a common initial state, so that the initial phase shift between their oscillations is zero. The results of these experiments are shown in Figure 5.12.

In both cases, the experiments were not very successful. In fact, the oscillations averaged over several cells are even weaker than the oscillations of the single cell. The autocorrelation function decays very fast and no limit cycle is visible. However, the very first oscillations are clearly pronounced in each experiment; later in the simulation, the oscillations become noisier and their average converges towards a straight line. What can be concluded from these experiments? First of all, there is almost certainly some kind of synchronisation of oscillating cells (Forger & Peskin 2005). But given the results of these experiments, and given that cells are clearly separated compartments, it does not appear reasonable simply to average the oscillations of several noisy cells. It might be possible to improve the results of these experiments by combining external and internal synchronisation. We could introduce a factor that simulates the influence of daylight, for instance by changing the degradation rate of the clock protein in the delay-based model every 12 hours, and then average the oscillations of several cells. However, this was not possible in our experimental setup: once the SPN model is created, it is simulated in a single run, and the Systems Biology Workbench offers no way of changing model parameters during a simulation. Recent findings suggest that several noisy oscillatory cells are synchronised by a messenger substance (Gonze, Bernard, Waltermann, Kramer & Herzel 2005). For example, cells in the suprachiasmatic nucleus of the hypothalamus, which is assumed to be the circadian pacemaker in mammals, exhibit oscillations with free-running periods when examined in isolation, whereas the suprachiasmatic nucleus as a whole exhibits regular oscillations with a stable period close to 24 hours.
In a different study, a model of a cell population in the hypothalamus was developed, and it was shown that this population can be synchronised by introducing a global variable representing a neurotransmitter that directly influences the transcription rate of the clock gene (Gonze et al. 2005). However, the details of the real synchronisation of oscillating cells in mammals are still unknown: it is not known which messenger substance actually enforces the synchronisation, nor how this substance interacts with the molecular clocks in the individual cells. Our results suggest that there must be some form of global synchronisation to ensure a stable circadian rhythm at the tissue level, since simply averaging single oscillators does not improve the stability of the circadian rhythm.
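Why averaging fails even from a common initial state can be illustrated with a stylised phase-diffusion model rather than with the SPN itself. In the Python sketch below, each cell is reduced to a hypothetical signal cos(ωt + φ_i(t)) whose phase φ_i performs an independent Gaussian random walk with diffusion coefficient D, and all phases start at zero, as in the second experiment. Since φ_i(t) is then normally distributed with variance D·t, the population average behaves like cos(ωt)·e^(−Dt/2): its oscillations fade towards a flat line however well the cells were initially aligned. The function name and the values of the period, D and the number of cells are assumptions chosen for illustration.

```python
import numpy as np


def averaged_oscillation(n_cells, period, D, t_max, dt, seed=0):
    """Average of n_cells phase-diffusing oscillators cos(omega*t + phi_i(t)).

    Every phase starts at 0 (common initial state) and then performs an
    independent Gaussian random walk with variance D*dt per time step.
    Returns the time grid, the population average and the theoretical
    envelope exp(-D*t/2) of that average.
    """
    rng = np.random.default_rng(seed)
    omega = 2.0 * np.pi / period
    t = np.arange(0.0, t_max, dt)
    steps = rng.normal(0.0, np.sqrt(D * dt), size=(n_cells, t.size - 1))
    phi = np.hstack([np.zeros((n_cells, 1)), np.cumsum(steps, axis=1)])
    mean_signal = np.cos(omega * t + phi).mean(axis=0)  # average over cells
    return t, mean_signal, np.exp(-D * t / 2.0)


if __name__ == "__main__":
    t, avg, env = averaged_oscillation(n_cells=100, period=24.0, D=0.02, t_max=400.0, dt=0.1)
    early = np.abs(avg[t < 24.0]).max()
    late = np.abs(avg[t >= 300.0]).max()
    print(f"max |average| in first day: {early:.2f}, after 300 h: {late:.2f}")
```

With 100 cells, the residual fluctuations of the average are of order (0.5/100)^(1/2) ≈ 0.07 once the phases have spread out, so the late average is dominated by noise rather than by a rhythm.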

[Figure 5.12 plots, one row per experiment at Ω = 50: time course of mRNA, P_N and all proteins; P_N vs. mRNA; sample autocorrelation vs. lag.]
Figure 5.12: Synchronisation of several cells. The first row represents an experiment in which the molecule numbers were simply averaged over 100 runs. The second row shows the results of the second experiment, in which the initial numbers of all molecular species were set to a value on the limit cycle and the results were again averaged over 100 runs. The last row shows a simulation of a single cell at the same system size (Ω = 50). The first plot in each row gives the time evolution of mRNA, nuclear protein (P_N) and the whole protein population, the second plot shows mRNA versus nuclear protein abundance, and the third shows the time course of the autocorrelation function.

[Figure 5.13 plots: half-life of the autocorrelation vs. Ω for (a) the delay-based model (Gonze et al. 2002) and (b) the hysteresis-based model (Barkai and Leibler 2000).]
Figure 5.13: Robustness of the oscillations in both models, measured by the half-life of the autocorrelation. The half-life of the autocorrelation is plotted against Ω, the size of the system.

5.4 Discussion

The primary intention of these experiments was to show a practical application of the extended Petri Net Kernel and to give a comprehensive example that underlines the need for stochastic models in biology. But our objective was also to present a more detailed comparison of two stochastic models of circadian clocks and to give new insights into the architecture of the true clock. The first aim has been achieved. The Petri Net Kernel in its extended version has proven useful in these experiments. The Petri Net representation is an easily understood visualisation of the reaction steps, and it can be simulated efficiently using the Systems Biology Workbench. One of the biggest advantages of the Kernel is that it does not require any knowledge of programming languages to create a model: the user only needs to create a network of nodes and arcs in a graphical editor with an intuitive interface. Furthermore, the experimental results published by Gonze, Halloy & Goldbeter (2002) and Vilar et al. (2002) have been successfully recreated. When it comes to the second objective, the comparison of two clock architectures, the results are more difficult to evaluate. We were able to recreate simulation results for the delay-based model that were published by Gonze, Halloy & Goldbeter (2002). Furthermore, we extended this approach of simulating a stochastic model with changing system size to the hysteresis-based model.
Although both models have already been compared by Barkai & Leibler (2000), the size of the system was not changed in their study. We also simulated the effects of gene duplication and mutation in both models and investigated the synchronisation of several noisy oscillators. In addition, our simulations support the finding of Barkai & Leibler (2000) that the delay-based model is susceptible to changes in the protein-DNA binding reactions. In general, the hysteresis-based model seems to be less susceptible to changes in its rate constants and is hardly affected by duplication and mutation of its key gene, whereas the delay-based model is severely affected by both modifications. Moreover, if the size of the system is small and the degradation rate of the repressor protein is low, the hysteresis-based model is enhanced by fluctuations at the molecular level. Changes in transcription and translation rates may arise in a real cell, and gene duplication is a common event in the evolution of simple organisms. Robustness in the presence of noise is also an important factor, since biological experiments have shown that the circadian clock in Neurospora works reliably when the numbers of key proteins are of the order of 20 molecules (Merrow, Garceau & Dunlap 1997). According to Barkai & Leibler (2000), the ability to resist such uncertainties was probably one of the decisive factors in the evolution of circadian clocks and should be reflected in the underlying oscillation mechanism. From this perspective, the hysteresis-based model seems to be more sound. On the other hand, we were able to repeat experiments showing that the delay-based model (Gonze, Halloy & Goldbeter 2002) approaches the oscillatory behaviour of its deterministic formulation as the size of the system is increased: its oscillations become more robust and exhibit a stable period of about 24 hours. The hysteresis-based model behaves in a somewhat contradictory way. We chose a parameter setting with high degradation rates of the repressor protein and Ω = 1, for which stochastic and deterministic simulations both revealed robust oscillations.
But when we increased Ω in the same way as for the delay-based model, the hysteresis-based model no longer oscillated. Overall, we cannot draw final conclusions about the validity of either model, given the limited time, the generality of both architectures and the fact that the true rate constants of many reactions are unknown. The delay-based model does not capture all details of the circadian clock in Neurospora or Drosophila. The hysteresis-based model is not specific to any organism but contains components that have been found in several genetic oscillators. It seems to us that the hysteresis-based model is more sound, since it is less susceptible to changes in its environment and to mutations. Its characteristic features, positive feedback loops and two active proteins, could serve as starting points for the construction of better models.

Chapter 6 Conclusions

6.1 Concluding Remarks and Observations

The outcome of this dissertation is twofold. First, software for the modelling and simulation of biological processes with Stochastic Petri Nets was created. Second, this software was used to model genetic circuits that are of current scientific interest. The implementation of the experimental framework took about one month. A new version of the Petri Net Kernel, called PNK 2e, was developed. This version can import Petri Nets from SBML. Petri Nets can also be created using a graphical editor, and their behaviour can be simulated in the Systems Biology Workbench, either stochastically or deterministically. An SPN can be written to either SBML or CMDL, two description languages for biological models. The aim was to keep the software as simple to use as possible: no programming experience is required to create a model and to simulate it. PNK 2e was presented during the poster session at the BioSysBio conference 2005 in Edinburgh. Since Stochastic Petri Nets provide an intuitive representation of stochastic models that are commonly used in the Systems Biology community, the tool attracted some interest and we received encouragement for our work. Furthermore, PNK 2e has been announced on the webpage and mailing list of the SBML project. It is available on the internet (trieglaf/pnk2e), together with a manual and a step-by-step user guide.

We hoped that the use of freely available software would save us some time, and this expectation was met. On the other hand, we found that depending on the work and good will of others can be difficult. During the first part of this project, some bugs were found in the Systems Biology Workbench. It was difficult to correct the errors without a deep knowledge of the program, so we contacted the developers of the workbench and asked for help. Fortunately, they were very helpful and corrected the problem within days, but this is certainly not always the case.

In summary, the software aroused interest at its first presentation because Markov processes and graphical models are approaches with which biologists are very familiar. The need for stochastic models is widely accepted, but software tools that are truly user-friendly and offer all the functionality biologists need are still rare.

When it comes to the experimental part, conclusions are more difficult to draw. We were able to reproduce the simulation results of others, and PNK 2e proved its usefulness during these experiments. In addition, we compared two competing architectures for circadian clocks. A comparison at this level of detail has not been done before. We could also present some results on hypothetical models of the synchronisation of different cells. We compared a model based on delay induced by phosphorylation of the clock protein with a model based on hysteresis, or lag, caused by slow degradation of a repressor protein. The delay-based model is very sensitive to slow rate constants in the DNA-protein binding reactions and to duplication of the clock protein, and its oscillations become very noisy if few molecules are involved. By contrast, the hysteresis-based model seems to be enhanced by noise. For some parameter settings, this model oscillates only in the stochastic simulation, while the deterministic solution, which does not take noise into account, settles into a steady state. However, if we start from parameter settings that lead to oscillations in both stochastic and deterministic simulations and then further increase the size of the system, the genetic circuit no longer oscillates.
This behaviour contradicts our expectations but might be due to the lack of detail in the models. Both models capture only core features of real circadian clocks, so we can only draw general conclusions about possible architectures. It is known that evolution favours designs that are robust to noise and work well under a variety of external influences. Our experiments showed that the hysteresis-based model is less susceptible to changes in the rate constants; from this perspective, one might prefer this model. It also seems to be enhanced by molecular fluctuations, which are known to be an important factor in the cell (Arkin et al. 1998). On the other hand, our experiments revealed that the hysteresis-based model does not oscillate if the system contains many molecules. This contradicts common expectations, since a model would be expected to function better as the size of the system increases.

In contrast, the delay-based model oscillates independently of the system size. However, this model seems to be heavily affected by changes in its rate constants, especially changes in the binding rates between protein and DNA. We are sure that further advances in both computational and experimental sciences are necessary before we can draw final conclusions about the architecture of biomolecular clocks. Nonetheless, both fields are developing at a fast pace, and we hope these advances will be made soon.

6.2 Unsolved Problems

We were able to obtain some interesting results, but it is certainly difficult to deliver a coherent piece of work within three months. The software, PNK 2e, has scope for further improvement. For example, it would be very convenient if the editor could assign several rate constants to a stochastic transition. Each rate could be associated with a different environment, such as system size or temperature. The user could then choose a set of rates for an experiment and easily compare simulation results for different settings. The editor itself could also be extended: it might be useful to have the option to merge different nets or to use hierarchical nets, e.g. nets that contain subnets in a transition.

Concerning the experiments on stochastic models of genetic oscillators, there is certainly a lot of work to be done. First of all, the search for rate constants that truly reflect the real speed of the reactions is still continuing. Furthermore, many details of real circadian clocks have yet to be uncovered experimentally. We also need new formal methods that are able to analyse the complex behaviour of nonlinear systems. There is also a lack of methods that can adequately capture the reliability of the oscillations.
Mere visual inspection of the oscillations is not sufficient, and the autocorrelation function is often misleading, since it always decays for noisy oscillations.

6.3 Suggestions for Future Work

The advantage of Petri Nets compared to other modelling formalisms used in Biology is that their theory has been researched for decades. There exist algorithms that can be used not only to examine their dynamic behaviour but also to search for structural properties and to derive steady state information by algebraic means. However, in this dissertation, we focussed on results from stochastic and deterministic simulations, because for the models examined here, structural analysis did not yield very interesting results and steady state information could not be obtained. Yet this is where the true advantage of Petri Nets lies. This potential will be fully exploited once the models become complex enough for a topological analysis to give more interesting results. So far, it can at least be used to check whether the system fulfils the assumptions made, such as invariants on the number of enzyme molecules.

An algebraic computation of the steady state distribution requires an upper bound on the number of states in the Markov process. This can be enforced by simply limiting the state space, but the threshold has to be chosen carefully. The efficient derivation of the steady state distribution is also an interesting topic, and there is certainly scope for future research.

Some attempts have been made in this dissertation to simplify the representation of biological reactions with Petri Nets. The problem is that the net becomes very large as the complexity of the reactions increases. Further attempts could be made to develop new representations that maintain the advantages of Petri Nets while capturing the complexity inherent in biological processes more easily.

It might also be interesting to develop new simulation algorithms that make use of the information captured by the net. As outlined in chapter 2, some useful information is lost when the net is translated into SBML, e.g. information about dependencies among the reactions that can be used to perform efficient simulations. Even if the use of other software can save a lot of time and effort, it is also an advantage to use one's own implementations, which can be adapted more easily.
For instance, if we could stop the simulation at a time of our choice, we could simulate the influence of daylight by changing the rate constants during the simulation.
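In one's own implementation, such a change is straightforward with the Gillespie direct method, because the rate constants stay fixed between firings. The sketch below is a hypothetical illustration, not part of PNK 2e or the SBW: it runs the direct method with a first set of rate constants up to a chosen switching time `t_switch`, then continues from the same marking with a second set (standing in for a light-dependent change). Discarding the pending firing at `t_switch` and restarting is exact, because the exponential waiting times are memoryless. Propensities use the combinatorial mass-action form c·C(n, w) for an input arc of weight w; the net, rates and function names are assumptions made for the example.

```python
import math
import random

# A net is a list of transitions, each given as (pre-weights, post-weights)
# mapping place names to arc weights. Rate constants are kept separately so
# that they can be swapped during a run.

def propensity(c, pre, marking):
    """Mass-action propensity: c times the number of ways to pick the input tokens."""
    a = c
    for place, w in pre.items():
        a *= math.comb(marking[place], w)
    return a


def gillespie(net, rates, marking, t, t_end, rng, trace):
    """Gillespie direct method from time t until t_end.

    Updates `marking` in place and appends (time, marking copy) after each firing."""
    while True:
        a = [propensity(c, pre, marking) for c, (pre, _) in zip(rates, net)]
        a0 = sum(a)
        if a0 == 0.0:
            return  # no transition is enabled: the marking stays frozen
        t += rng.expovariate(a0)  # exponential waiting time with total rate a0
        if t >= t_end:
            return  # the next firing would fall after the stopping time
        j = rng.choices(range(len(net)), weights=a)[0]  # pick j with prob a[j]/a0
        pre, post = net[j]
        for place, w in pre.items():
            marking[place] -= w
        for place, w in post.items():
            marking[place] += w
        trace.append((t, dict(marking)))


def simulate_day_night(net, dark_rates, light_rates, m0, t_switch, t_end, seed=0):
    """Run with dark_rates until t_switch, then continue with light_rates."""
    rng = random.Random(seed)
    marking = dict(m0)
    trace = [(0.0, dict(marking))]
    gillespie(net, dark_rates, marking, 0.0, t_switch, rng, trace)
    gillespie(net, light_rates, marking, t_switch, t_end, rng, trace)
    return trace


# Hypothetical example: protein P is produced at rate 10 and degraded at
# rate 0.1 in the dark, but ten times faster (1.0) once the light is on.
net = [({}, {"P": 1}), ({"P": 1}, {})]
trace = simulate_day_night(net, [10.0, 0.1], [10.0, 1.0], {"P": 0},
                           t_switch=100.0, t_end=200.0, seed=1)
print(len(trace), trace[-1])
```

In this example the mean copy number of P drops from about 100 to about 10 after the switch; the same pattern extends to several switching times by calling `gillespie` once per interval.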

Appendix A User guide to PNK 2e

This is the manual for PNK 2e, a program developed using the Petri Net Kernel (PNK) version 2.2. The Petri Net Kernel is a framework for the development of Petri Net tools. It was developed at the Humboldt University of Berlin, Germany. Its extended version, PNK 2e, was developed by Ole Schulz-Trieglaff during his M.Sc. dissertation at the University of Edinburgh, UK.

PNK 2e features Stochastic Petri Nets, a modelling formalism that stems from Computer Science. Stochastic Petri Nets (SPNs) are closely related to Markov Jump Processes. Their behaviour can be simulated using the Gillespie Algorithm and its improved versions (Gibson-Bruck, Tau Leap). PNK 2e extends the PNK with features for the modelling of biological processes; the name PNK 2e stands for extended Petri Net Kernel version 2. The software can create a Petri Net representation of a model described in SBML (Systems Biology Markup Language). The net is drawn using a simple algorithm implemented by Alexander Gruenewald, Humboldt University of Berlin. The dynamic behaviour of the Petri Net can be simulated using the Systems Biology Workbench. To achieve this, PNK 2e translates the net back into its SBML description and passes this description automatically to the Workbench. Alternatively, a Petri Net can be created using the graphical editor of the Kernel.

Licence Agreement

PNK 2e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the license. PNK 2e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY. You are NOT ALLOWED to CHANGE THE ORIGINAL COPYRIGHT NOTICE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with PNK 2e; if not see

Quick start

This is a brief introduction to PNK 2e. The software requires the Java version The archive PNK2e.zip can be downloaded from trieglaf/pnk2e and contains all necessary files. The Systems Biology Workbench is required to perform the simulation of the Petri Net and is available at

Download and run the software

When the archive PNK2e.zip is extracted, a new directory PNK2e should be created in the current directory. This directory contains several libraries in .jar format, the file PNK2e.jar, which is the program itself, and several subdirectories:

samplenets - contains some example nets
nettypespecifications - contains examples of net type specifications
toolspecification - contains some tool specification examples

If anything goes wrong, first check that you have the correct version of Java installed by executing java -version. Then check that all necessary libraries are in the same directory as the .jar file. PNK 2e needs at least the libraries jaxp.jar, crimson.jar, SBWcore.jar and SBMLreader.jar. The remaining libraries are needed only for the translation of Petri Nets into CMDL.

Open, edit and save a Petri Net

The software can be started by double-clicking on the file PNK2e.jar (Windows) or by executing java -jar PNK2e.jar (Linux and other operating systems). The main menu of PNK 2e should appear. By clicking on the File menu, the user can open a file and load the net into the Kernel (see Figure A.1). The editor opens automatically and displays the net.
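The Gibson-Bruck method mentioned above owes much of its speed to a dependency graph: after a transition fires, only the propensities that can actually change are recomputed. For a mass-action SPN this graph can be read directly off the net structure. The sketch below assumes a hypothetical representation in which each transition maps to its input (pre) and output (post) arc weights; transition u depends on transition t if firing t changes the token count of an input place of u. In the Gibson-Bruck method the fired transition always draws a new putative firing time anyway, so the graph mainly serves to limit updates of the other transitions.

```python
def changed_places(pre: dict[str, int], post: dict[str, int]) -> set[str]:
    """Places whose token count changes when the transition fires."""
    return {p for p in set(pre) | set(post) if pre.get(p, 0) != post.get(p, 0)}


def dependency_graph(net: dict[str, tuple[dict[str, int], dict[str, int]]]) -> dict[str, set[str]]:
    """For each transition t, the transitions whose propensity may change when t fires.

    Under mass-action kinetics a propensity depends only on the marking of the
    transition's input places, so u depends on t if t changes one of them."""
    graph = {}
    for t, (pre_t, post_t) in net.items():
        affected = changed_places(pre_t, post_t)
        graph[t] = {u for u, (pre_u, _) in net.items() if affected & set(pre_u)}
    return graph


# Hypothetical gene-expression net: G acts as a catalyst (consumed and restored).
net = {
    "transcribe": ({"G": 1}, {"G": 1, "M": 1}),
    "translate": ({"M": 1}, {"M": 1, "P": 1}),
    "degrade_M": ({"M": 1}, {}),
    "degrade_P": ({"P": 1}, {}),
}
for t, deps in dependency_graph(net).items():
    print(t, "->", sorted(deps))
```

Because the gene G is restored by transcription, firing "transcribe" does not affect its own propensity; only the transitions that read M need updating.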

Alternatively, the user can select New in the menu Open of the main menu to create a new Petri Net. PNK 2e can edit Stochastic Petri Nets, Biological Nets (SPNs with simplifications for biological reactions) and Generalized Stochastic Nets (SPNs with inhibitor arcs and immediate transitions). Once a net type has been chosen, the main menu changes to the corresponding editor menu. This menu lets the user draw places and transitions by choosing the type of node to be drawn and clicking in the editor pane. Arcs are drawn by first clicking on the source node and then on the target node. PNK 2e also contains a function that arranges a net automatically; it is called DoNetLayout and is available in the main menu.

Figure A.1: PNK 2e after loading the SPN representation of a genetic oscillator.

Simulating a net

PNK 2e can simulate a Stochastic Net with or without the simplifications developed for biological reactions. This simulation takes the form of a token game: each transition that fires is highlighted in colour for a few milliseconds, and the flow of tokens through the net is visualised. It is available in the main menu under stochastic simulation. This is a correct simulation of the net and gives a good idea of its dynamics. However, it is very slow and therefore not well suited to large nets. Furthermore, no data are collected during the simulation run. If a more detailed analysis of the simulation is required, the user can choose the entry ConnectToSBW in the main menu. PNK 2e then opens a window with a list of all services in the Systems Biology Workbench that are available on this computer.
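The token game follows the standard Petri Net firing rule. A minimal sketch, assuming a hypothetical representation of transitions by their input, output and inhibitor arc weights (this is not PNK 2e's internal data structure): a transition is enabled when every input place holds at least as many tokens as the arc weight and, for Generalized Stochastic Nets, every place attached by an inhibitor arc holds fewer tokens than that arc's weight; firing removes tokens from the input places and adds them to the output places. Immediate transitions and firing delays are not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    """A transition with input, output and inhibitor arcs (place name -> weight)."""
    name: str
    pre: dict[str, int] = field(default_factory=dict)
    post: dict[str, int] = field(default_factory=dict)
    inhibitors: dict[str, int] = field(default_factory=dict)


def enabled(t: Transition, marking: dict[str, int]) -> bool:
    """Standard enabling rule, extended with inhibitor arcs; missing places hold 0 tokens."""
    has_tokens = all(marking.get(p, 0) >= w for p, w in t.pre.items())
    # An inhibitor arc of weight w blocks t once its place holds w or more tokens.
    not_blocked = all(marking.get(p, 0) < w for p, w in t.inhibitors.items())
    return has_tokens and not_blocked


def fire(t: Transition, marking: dict[str, int]) -> dict[str, int]:
    """Return the marking reached by firing t; the given marking is not modified."""
    if not enabled(t, marking):
        raise ValueError(f"transition {t.name} is not enabled")
    new = dict(marking)
    for p, w in t.pre.items():
        new[p] = new.get(p, 0) - w
    for p, w in t.post.items():
        new[p] = new.get(p, 0) + w
    return new


# Hypothetical example: gene G is transcribed into mRNA M unless repressor R is present.
transcribe = Transition("transcribe", pre={"G": 1}, post={"G": 1, "M": 1},
                        inhibitors={"R": 1})
print(fire(transcribe, {"G": 1, "M": 0, "R": 0}))  # {'G': 1, 'M': 1, 'R': 0}
print(enabled(transcribe, {"G": 1, "M": 0, "R": 1}))  # False
```

Returning a new marking instead of mutating the old one keeps each step of a token game easy to record or undo.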

For a description of the SBW and instructions on how to install new modules and services, see the SBW manual, which is available at

We recommend installing the simulator Dizzy in addition to the workbench, because this software offers several simulation algorithms, both stochastic and deterministic, and works very well with PNK 2e. However, any simulation software that is compatible with the Systems Biology Workbench and implements the Gillespie algorithm or one of its improved versions can be used.

Figure A.2: The simulation interface of PNK 2e.

If Dizzy is installed, the list of SBW services should contain an entry Dizzy simulation service. Clicking on this entry opens the Dizzy simulation window (see Figure A.2). The window contains a list of all available simulation algorithms, and the start and end time of the simulation can be chosen. For a stochastic simulation, the user can also choose to average the result over several runs. If a deterministic simulation is chosen, the user must also set the step size and the maximum relative and absolute error.
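The relative and absolute error settings control how an adaptive ODE solver chooses its step size. As an illustration only (Dizzy's deterministic solvers may work differently), the sketch below integrates the mass-action rate equations dx/dt = C·v(x) of a net with an embedded Euler/Heun pair. The difference between the first- and second-order steps estimates the local error; a step is accepted when that estimate is below atol + rtol·|x| for every species, and the step size is then enlarged or reduced. The net representation, function names and default tolerances are assumptions.

```python
import math

# A net is a list of transitions (k, pre, post); pre/post map a species index to
# its arc weight and k is the macroscopic mass-action rate constant.

def rate_equations(net, x):
    """Right-hand side dx/dt = C v(x) under mass-action kinetics."""
    dx = [0.0] * len(x)
    for k, pre, post in net:
        v = k
        for i, w in pre.items():
            v *= x[i] ** w
        for i, w in pre.items():
            dx[i] -= w * v  # tokens consumed
        for i, w in post.items():
            dx[i] += w * v  # tokens produced
    return dx


def integrate(net, x0, t_end, h=0.1, rtol=1e-6, atol=1e-9):
    """Adaptive Euler/Heun integration from t = 0 to t_end; returns x(t_end)."""
    t, x = 0.0, list(x0)
    while t < t_end:
        h = min(h, t_end - t)
        k1 = rate_equations(net, x)
        x_euler = [xi + h * a for xi, a in zip(x, k1)]  # first order
        k2 = rate_equations(net, x_euler)
        x_heun = [xi + 0.5 * h * (a + b) for xi, a, b in zip(x, k1, k2)]  # second order
        # Scaled local error: a value <= 1.0 means every species meets its tolerance.
        err = max(abs(e - y) / (atol + rtol * max(abs(xi), abs(y)))
                  for xi, e, y in zip(x, x_euler, x_heun))
        if err <= 1.0:
            t, x = t + h, x_heun  # accept the step, keep the more accurate value
        # The estimate is O(h^2), hence the square root; factors are clipped to [0.2, 5].
        factor = 5.0 if err == 0.0 else min(5.0, max(0.2, 0.9 / math.sqrt(err)))
        h *= factor
    return x


# Hypothetical example: protein produced at rate 1.0 and degraded at rate 0.1.
net = [(1.0, {}, {0: 1}), (0.1, {0: 1}, {})]
print(integrate(net, [0.0], t_end=10.0))  # close to 10 * (1 - exp(-1))
```

Tighter tolerances force smaller steps and a slower but more accurate solution; the absolute error matters mainly for species whose concentration is close to zero.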
