Modeling Biomolecular Networks in Cells


Luonan Chen · Ruiqi Wang · Chunguang Li · Kazuyuki Aihara

Modeling Biomolecular Networks in Cells

Structures and Dynamics

Springer

Prof. Luonan Chen
Shanghai Institutes for Biological Sciences
Chinese Academy of Sciences
Shanghai, China

Prof. Ruiqi Wang
Institute of Systems Biology
Shanghai University
Shanghai, China

Prof. Chunguang Li
Department of Information Science and Electronic Engineering
Zhejiang University
Hangzhou, China

Prof. Kazuyuki Aihara
University of Tokyo
Tokyo, Japan

ISBN e-ISBN DOI
Springer London Dordrecht Heidelberg New York

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Control Number:

Springer-Verlag London Limited 2010

MATLAB and Simulink are registered trademarks of The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA, U.S.A.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use. The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Cover design: estudiocalamar, Figueres/Berlin

Printed on acid-free paper

Springer is part of Springer Science+Business Media

This book is dedicated to our colleagues and our families

Preface

One of the major challenges for post-genomic biology is to understand how genes, proteins, and small molecules interact to form cellular systems. It has been recognized that a complicated living organism cannot be completely understood by merely analyzing individual components, and that the interactions between these components, i.e., biomolecular networks, in terms of structures and dynamics are ultimately responsible for an organism's form and functions. To elucidate the essential principles or fundamental mechanisms of cellular systems, the study of the structures and dynamics of biomolecular networks in cells is attracting increasing attention from the biology, mathematics, and engineering communities. In particular, biomolecular networks generate many complicated but interesting phenomena due to their specific structures and nonlinear dynamics, such as switching behavior and collective rhythms, which are ubiquitous in living organisms. This book covers modeling biomolecular networks and analyzing their nonlinear dynamics in a comprehensive manner, especially stressing the viewpoints of systems and engineering. Attention is focused on deriving general theoretical results and revealing the essential principles of biological systems on the basis of nonlinear dynamical and control theories.
In particular, we describe how to model a general molecular network in a single cell; how to construct a molecular network with specific functions or structures, such as gene switching networks and gene oscillating networks in individual cells at the molecular level; how to model a general multicellular system taking into account external fluctuations and intercellular coupling of signal molecules; how to design a synthetic molecular network from the viewpoint of forward engineering; and how to analyze and further control the nonlinear phenomena of living organisms at the molecular level, such as switching behavior, cooperative dynamics, and synchronization of biological oscillators in multicellular systems.

This book is intended for upper-level undergraduate students, graduate students, and researchers with a computational or theoretical background in mathematics, engineering, computer science, and biology, in both academia and industry, e.g., in the fields of systems biology, bioinformatics, and synthetic biology. The book assumes little knowledge of molecular biology, with each chapter covering the necessary material. Biologists will find the book useful if they have a strong computational background or training in systems biology or computational biology. Readers are assumed to have undergraduate-level backgrounds in mathematics, engineering, and basic biology. This book introduces readers to the challenges in the life sciences, from the understanding of individual molecules to system-level analysis, and from static interactions between molecules to dynamical networks, in the hope that readers will build on them to make new discoveries of their own. Designing and constructing synthetic molecular networks from the perspectives of both synthetic biology and engineering are also described in the book. Unlike traditional books on systems biology and bioinformatics, this book aims to show engineers and biologists the essentials of biomolecular networks, with emphasis on structures and dynamics, by presenting cutting-edge research topics and methodologies that will be vital for their future careers.

The contents of this book are mainly based on collaborative studies and discussions with many researchers. Collectively and individually, we express our gratitude to these people for their collaboration. In particular, the authors thank Yong Wang, Xing-Ming Zhao, Rui-Sheng Wang, Tetsuya J. Kobayashi, Zhi-Ping Liu, Tianshou Zhou, Zhujun Jing, Mario di Bernardo, Masaaki Takada, Rui Liu, and Shinji Hara for their cooperation and valuable comments in bringing this book to completion.
The studies forming the basis of this book were partially supported by the ERATO Aihara Complexity Modeling Project, Japan Science and Technology Agency, Japan; FIRST, Aihara Innovative Mathematical Modeling Project, JSPS; a Grant-in-Aid for Scientific Research on Priority Areas from MEXT of Japan; the Chief Scientist Program of Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, under Grant 2009CSP002; the National Natural Science Foundation of China under Youth Research Grants , , and ; the Shanghai Pujiang Program; the Distinguished Youth Foundation of Sichuan Province under Grant 07ZQ ; the Program for New Century Excellent Talents in University under Grant NCET ; and the JSPS-NSFC Collaboration Project.

Shanghai
Shanghai
Hangzhou
Tokyo
June 2010

Luonan Chen
Ruiqi Wang
Chunguang Li
Kazuyuki Aihara

Contents

1 Introduction
   Biological Processes and Networks in Cellular Systems
      Gene Regulation: Gene Regulatory Networks
      Signal Transduction: Signal Transduction Networks
      Protein Interactions: Protein Interaction Networks
      Metabolism: Metabolic Networks
      Cell Cycles and Cellular Rhythms: Nonlinear Network Dynamics
   A Primer to Networks
      Basic Concepts of Networks
      Topological Properties of Networks
   A Primer to Dynamics
      Dynamics and Collective Behavior
      System States
   Structures and Functions
      Cellular Noise
      Time Delays
      Multiple Time Scales
      Robustness and Sensitivity
   Network Systems Biology and Synthetic Systems Biology
   Outline of the Book

2 Dynamical Representations of Molecular Networks
   Biochemical Reactions
   Molecular Networks
   Graphical Representation
      Example of Interaction Graphs
      Example of Incidence Graphs
      Example of Species-reaction Graphs
   Biochemical Kinetics
   Stochastic Representation

      Master Equations for a General Molecular Network
      Stochastic Simulation
      Analysis of Sensitivity and Robustness of Master Equations
      Langevin Equations
      Fokker-Planck Equations
      Cumulant Equations
   Deterministic Representation
      Basic Kinetics
      Deterministic Representation of a General Molecular System
      Michaelis-Menten and Hill Equations
      Total Quasi-steady-state Approximation
      Deriving Rate Equations
      Modeling Transcription and Translation Processes

3 Hybrid Representation and Reducing Molecular Networks
   Decomposition of Biomolecular Networks
   Approximation of Continuous Variables in Molecular Networks
      Gaussian Approximation in Molecular Networks
      Deterministic Approximation in Molecular Networks
      Prefactor Approximation of Deterministic Representation
   Stochastic Simulation of Hybrid Systems
   Stochastic versus Deterministic Representation

4 Deterministic Structures of Biomolecular Networks
   A General Structure of Molecular Networks
      Basic Definitions
      A General Structure for Gene Regulatory Networks
   Gene Regulatory Networks with Cell Cycles
      Gene Regulatory Networks for Eukaryotes
      Gene Regulatory Networks for Prokaryotes
   Interaction Graphs and Logic Gates
      Interaction Graphs and Types of Interactions
      Logic Gates

5 Qualitative Analysis of Deterministic Dynamical Networks
   Stability Analysis
   Bifurcation Analysis
   Examples for Analyzing Stability and Bifurcations
      A Simplified Gene Network
      A Two-gene Network
      A Three-gene Network
   Robustness and Sensitivity Analysis

      Robustness Measures
      Sensitivity Analysis
   Control Analysis
      Control Coefficients of Metabolic Systems
      Metabolic Control Theorems
   Monotone Dynamical Systems
      Notation
      Decomposition of Monotone Systems
   Stability Analysis of Genetic Networks in Lur'e Form
      A Genetic Network Model
      Stability Analysis of Genetic Networks Without Noise
   Stochastic Stability of Gene Regulatory Networks
      Mean-square Stability
      Stochastic Stability with Disturbance Attenuation
      Examples

6 Design of Synthetic Switching Networks
   Types of Switches
   Simple Switching Networks
      Bistability in a Single Gene Network
      The Toggle Switch
      The MAPK Cascade Model
   Design of Switching Networks with Positive Loops
   Detection of Multistability
   Enzyme-driven Switching Networks

7 Design of Synthetic Oscillating Networks
   Simple Oscillatory Networks
      Delayed Autoinhibition Networks
      Goldbeter's Models
      Relaxation Oscillators
      Stochastic Oscillators
   Design of Oscillating Networks with Negative Loops
      Theoretical Model of Cyclic Feedback Networks
      A Special Cyclic Feedback Network
      A General Cyclic Feedback Network
   Construction of Oscillators by Non-monotone Dynamical Systems
   Design of Molecular Oscillators with Hybrid Networks: General Formalism

8 Multicellular Networks and Synchronization
   A General Multicellular Network for Deterministic Models
   Deterministic Synchronization of Cellular Oscillators
      Complete Synchronization
      Other Types of Synchronization
   Spontaneous Synchronization of Deterministic Models
   Entrained Synchronization for Deterministic Models
   Noise-driven Synchronization for Stochastic Models Without Coupling
   A General Multicellular Network for Stochastic Models with Coupling
      A Model
      Example of a Gene Regulatory Network
      Theoretical Analysis
      Algorithm for Stochastic Simulation
      Numerical Simulation
   Deterministic Synchronization of Genetic Networks in Lur'e Form
   Stochastic Synchronization of Genetic Networks in Lur'e Form
   Transient Resetting for Synchronization Without Coupling

References

Index

1 Introduction

Modern molecular biology has led to remarkable progress in understanding individual cellular components. One of the next main challenges is to elucidate, at a system level, the biological networks comprising the components revealed by reductionism in molecular biology. Because the behavior of living organisms can seldom be attributed to individual components, one has to assemble the components into networks in order to understand various complex biological processes. Many scientists are becoming increasingly interested in system-level understanding of living organisms, especially topics related to the topological structures, system dynamics, and biological functions of various molecular and cellular networks. Biological functions are carried out through the interactions and robust regulation of thousands of cellular components, such as genes, ribonucleic acids (RNAs), proteins, and metabolites, in a concurrent manner. Due to the large number of components involved in a molecular network, it is almost impossible to intuitively understand how the network executes various complex cellular functions. The understanding of complex biological systems requires the integration of experimental and theoretical research. Therefore, mathematical modeling is a prerequisite for revealing the biological implications of molecular networks, including gene regulatory networks, transcription regulatory networks, RNA interaction networks, protein–RNA interaction networks, protein interaction networks, metabolic networks, and signal transduction networks. Using an appropriate model, qualitative or quantitative analysis can be conducted to gain deep insight into the essential mechanisms of various biological functions and processes, which is crucial for experimentally verifiable predictions and further successful advancement of biological science.
Both the structures and the dynamics underlie the functionality of molecular networks, ranging from transcriptional regulation to cell signaling. Network structures can be inferred based on topological features of the networks. For example, network modules can be identified on the basis of topological distances, and network motifs can be detected by their recurrent topological patterns. Different dynamics may correspond to different functions of a

specified molecular network. For example, periodic oscillations in nonlinear dynamics correspond to various biological rhythmic phenomena, with periods ranging from seconds to years, while multistability corresponds to the capacity of cellular systems to achieve multiple distinct stable steady states in response to a set of external stimuli. Moreover, some relationship may exist between network structures and system dynamics; for example, network topology sometimes determines network dynamics (Muller et al. 2008), and network dynamics analysis may also reveal topological changes (Luscombe et al. 2004). Technological innovations in theoretical and computational methods may significantly advance our understanding of the functionality of molecular networks. A cellular system comprises signaling, metabolic, and regulatory processes in a hierarchical structure. Indeed, a living organism can be viewed as a huge biochemical reaction network, with each chemical (e.g., an RNA, protein, or metabolite) as a node and each reaction as an edge, which is constantly affected by both internal and external stochastic fluctuations. Thousands of components or chemicals interact with each other and participate in these complicated nonlinear processes. From the theoretical viewpoint, besides significant time delays and specific diffusion processes, there are three major difficulties in modeling such a complicated system: (1) nonlinearity, (2) large scale, and (3) stochasticity. Therefore, it is necessary to model a biological system both by exploiting its special properties and by developing special theories to make such a molecular network tractable for theoretical analysis.
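The stochasticity mentioned above can be made concrete with a minimal exact stochastic simulation (Gillespie's algorithm) of a birth-death process for a single chemical species. This is only an illustrative sketch; the rate constants are hypothetical, chosen so the analytic stationary mean is easy to check:

```python
import random

def gillespie_birth_death(k=10.0, gamma=1.0, t_end=2000.0, seed=1):
    """Exact SSA for a single species: production at constant rate k,
    first-order degradation at rate gamma * n (hypothetical rates)."""
    random.seed(seed)
    t, n = 0.0, 0
    occupancy = 0.0  # time-weighted copy number, for the time average
    while t < t_end:
        a_birth, a_death = k, gamma * n
        a_total = a_birth + a_death
        tau = random.expovariate(a_total)     # waiting time to next event
        occupancy += n * min(tau, t_end - t)  # credit the dwell time at level n
        t += tau
        if random.random() * a_total < a_birth:
            n += 1  # production event
        else:
            n -= 1  # degradation event
    return occupancy / t_end

mean_n = gillespie_birth_death()
# The time-averaged copy number fluctuates around the analytic
# stationary mean k / gamma = 10 (the stationary law is Poisson).
```

Even this simplest reaction scheme never settles to a fixed copy number; individual trajectories fluctuate around the deterministic steady state, which is why master-equation and simulation approaches are needed alongside rate equations.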
In this book, we present and further propose various mathematical theories to model and analyze a variety of molecular networks, including stochastic processes (e.g., master equations), nonlinear dynamics (e.g., monotone dynamical systems), and control theory (e.g., Lur'e systems and linear matrix inequalities (LMIs)), because of their universality and ability to mimic cellular networks comprising many dynamically interacting components. In particular, we aim at providing a general framework for modeling and analyzing dynamical networks at the molecular level from the viewpoints of systems and engineering. Three graph representations, namely the interaction graph, the incidence graph, and the species-reaction graph, are also adopted in the book to efficiently model biomolecular network structures. In this chapter, we provide a brief introduction to essential regulation processes in cellular systems and to basic mathematical concepts in networks and modeling. For more detailed and systematic knowledge of these areas, readers can refer to related specialized books.

1.1 Biological Processes and Networks in Cellular Systems

A living organism can be regarded as a huge nonlinear biochemical reaction system which can be represented by the interactions of biomolecules, including genes, RNAs, proteins, and metabolites, thereby forming various types of biomolecular networks (Chen et al. 2009). These complex networks are indispensable in cellular systems and play fundamental and essential roles in giving rise to life and maintaining homeostasis in living organisms. In particular, biological processes are governed by complex networks ranging from gene regulation to signal transduction. These processes need to be modeled at the molecular level to accurately reflect their essential properties. In this section, we provide a brief review of the best-understood processes. Note that this is a very general and brief introduction intended mainly for mathematicians and computer scientists who are not familiar with molecular biology. Biology-oriented researchers can skip the details in this section.

1.1.1 Gene Regulation: Gene Regulatory Networks

Genes are the fundamental units of biology. Genes encode proteins, which carry out various functions required for the maintenance of life. Gene regulation is a complex process that begins with the deoxyribonucleic acid (DNA) sequence for a given gene. The process from DNA through many intermediates to functional proteins involves transcription, translation, transport, degradation, biochemical modification, and many other mechanisms. The central dogma of molecular biology, i.e., DNA encodes RNA, which in turn encodes protein molecules, as shown in Figure 1.1, provides a framework for understanding the flow of sequence information from DNA via RNAs to proteins. As indicated in Figure 1.1, there are three main processes in the central dogma of molecular biology: transcription, translation, and replication.
Figure 1.1 The central dogma of molecular biology: DNA is transcribed into RNA, RNA is translated into protein, DNA is copied by replication, and both RNA and protein are subject to degradation

Transcription of a gene is the process by which RNA polymerase (RNAP) produces messenger RNA (mRNA) that corresponds to the gene's coding sequence, as shown in Figure 1.2. The gene consists of a coding region and a regulatory region. The coding region is the part of the gene that encodes a certain protein. The regulatory region is the part of the DNA that contributes to the regulation of the gene. In particular, it contains binding sites for proteins known as transcriptional factors (TFs). The binding sites are also called a

promoter of a particular gene, i.e., the regulatory region preceding the gene. In eukaryotes, every gene has its own promoter, whereas in prokaryotes, a group of genes called an operon can be transcribed as a single mRNA molecule and hence is regulated by a single promoter. The TFs act by binding to the DNA, directly or in a complex with other TFs or cofactors, to regulate the rate at which a specific target gene is read. When bound to the DNA, the TFs change the probability per unit time that an RNAP binds the promoter and thus affect the rate at which the RNAP initiates transcription. Once bound to the DNA, the TFs or their complexes recruit or allow RNAP to bind to a specific site at the promoter. RNAP forms a transcriptional complex which separates the two strands of the DNA in a step-wise manner and transcribes the coding region into mRNA. The TFs can act as activators or repressors, depending on whether the transcriptional rate is increased or decreased. The transcription process is one of the most important and essential processes for gene activities, and it is generally nonlinear with large stochastic fluctuations. Clearly, the regulation of the transcriptional process is facilitated mainly by pairwise interactions between TFs and DNA. Therefore, a transcription regulatory network is mainly formed by TF–DNA interactions. Besides TFs, there are many other proteins, known as transcriptional cofactors or co-regulators, which do not bind to the DNA themselves but bind to TFs to regulate the transcription process by linking TFs and RNAP. In addition, many epigenetic factors, such as DNA methylation and histone modification, also affect the transcriptional process. For both prokaryotes and eukaryotes, translation occurs in the cytosol. Between transcription and translation, in eukaryotes, mRNA must be translocated from the nucleus to the cytosol, where it binds to ribosomes.
During translation, a ribosome moves along the mRNA, three bases at a time, and each three-base combination, or codon, is translated into one of the 20 amino acids. The function of the ribosome is to copy the one-dimensional structure of mRNA into a one-dimensional sequence of amino acids, which folds into a three-dimensional protein structure, thereby facilitating different kinds of functions. The replication of DNA is the basis for biological inheritance and is a fundamental process occurring in all living organisms to copy their DNA. During replication, each strand of the original double-stranded DNA molecule serves as a template for reproduction of the complementary strand. Hence, following DNA replication, two identical DNA molecules are produced from a single double-stranded DNA molecule. Cellular proofreading and error-checking mechanisms ensure nearly perfect fidelity for DNA replication. Degradation of mRNA and proteins constantly occurs via the cellular machinery. Specifically, mRNA is degraded by a ribonuclease which competes with ribosomes to bind to mRNA. If a ribosome binds, the mRNA will be translated; otherwise, the mRNA will be degraded. On the other hand, proteins are degraded by cellular machineries such as proteasomes activated by ubiquitin tagging. The process is regulated by some specific enzymes.

Figure 1.2 Processes involved in the transcription regulation of a gene: (a) binding sites of TFs; (b) formation of a transcriptional complex; (c) RNAP binding; and (d) transcription initiation by RNAP

In addition to the above processes, many other processes also play important roles in gene regulation. In prokaryotes, the coding region is contiguous, while in eukaryotes the coding region is typically split up into several parts. Each of these coding region parts is called an exon, and the parts between the exons are called introns. In eukaryotes, the introns need to be removed, i.e., spliced out. In some cases, alternative splicing occurs, which allows a cell to edit an mRNA molecule in different ways to produce many different proteins from the same gene. Other processes such as diffusion, cell growth, microRNA (or non-coding RNA) regulation (Barrandon et al. 2008; Keene 2007), and electrical properties may also affect gene regulation. For instance, it has been found that microRNAs may play critical roles in the regulation of gene expression, e.g., in cell fate decisions (Johnston et al. 2005), circadian rhythms (Nandi et al. 2009), cancer network regulation (Aguda et al. 2008), and robustness (Tsang et al. 2007).
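The transcription, translation, and degradation processes described above are often summarized by a pair of deterministic rate equations for mRNA and protein concentrations. The following is a minimal sketch, not a model from this book: all parameter values are hypothetical, and repression by a TF is modeled with a Hill function:

```python
# Hypothetical parameters: alpha (max transcription rate), beta (translation
# rate per mRNA), gamma_m / gamma_p (mRNA / protein degradation rates).

def hill_repression(tf, K=1.0, n=2):
    """Fraction of time the promoter is free of a repressing TF (Hill form)."""
    return 1.0 / (1.0 + (tf / K) ** n)

def simulate(alpha=2.0, beta=5.0, gamma_m=1.0, gamma_p=0.5,
             tf=0.0, t_end=50.0, dt=0.001):
    """Forward-Euler integration of
       dm/dt = alpha * hill_repression(tf) - gamma_m * m  (transcription, decay)
       dp/dt = beta * m - gamma_p * p                     (translation, decay)"""
    m = p = 0.0
    for _ in range(int(t_end / dt)):
        dm = alpha * hill_repression(tf) - gamma_m * m
        dp = beta * m - gamma_p * p
        m += dm * dt
        p += dp * dt
    return m, p

m_ss, p_ss = simulate()
# Without repressor (tf = 0) the model settles near the steady state
# m* = alpha / gamma_m = 2 and p* = alpha * beta / (gamma_m * gamma_p) = 20.
```

Raising the repressor level `tf` lowers the transcription term through the Hill function and hence the steady-state protein level, which is the basic mechanism by which one gene's product regulates another's expression.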
A quantitative comparison of non-coding-RNA-based and protein-based gene regulation and a review of the quantitative roles of non-coding RNAs can be found in (Mehta et al. 2008) and (Shimoni et al. 2007), respectively. In addition to genetic factors (e.g., single nucleotide polymorphisms (SNPs) or TFs), it has recently been found that epigenetic factors, such as DNA/histone methylation, histone acetylation, post-transcriptional genomic silencing in

plants, genomic imprinting in mammals, and double-stranded RNA (dsRNA) or RNA interference (RNAi), also play important roles in the regulation of genes (Steensel 2005). Therefore, a real gene regulatory network should include both genetic and epigenetic regulatory factors in order to build a realistic model for understanding the essential mechanisms of cellular systems. As mentioned above, a TF may act as an activator or repressor to influence the expression of other genes. The TF in turn is also a gene product. Such regulatory mechanisms form complex networks of transcriptional and regulatory interactions. For example, gene A can activate gene B and gene C but repress gene D, which in turn activates A, etc. It is believed that the complex behavior of living cells is created not only by the properties of individual components but also by the manner in which they are connected. In general, all processes in Figure 1.1 are nonlinear and result from a variety of pairwise interactions (e.g., TF–DNA, microRNA–mRNA, RNA–RNA, RNA–protein, and protein–protein) with large stochastic fluctuations. Therefore, a living organism can be considered to be formed by a huge nonlinear biochemical reaction network or molecular network, which generates the complicated phenomena of life.

1.1.2 Signal Transduction: Signal Transduction Networks

Cells do not live in isolation. They sense, transmit, and process signals originating in other cells and their environment. As a result, a wealth of sensor systems allows cells to monitor their external and internal states. The sensing of external stimuli at the cell membrane demands the transfer of a signal to the place of action, i.e., signal transduction. Typical signals are hormones, pheromones, heat, cold, light, osmotic pressure, and the appearance or concentration change of substances such as glucose, K+, Ca2+, or cAMP.
Such a process in a cell is carried out via receptors or membrane proteins, which work as the interface of cells to their external environment and can bind specific ligands. The monitoring of external conditions is important for securing survival and communicating with other cells. Correct signal processing is necessary to ensure the optimal response to the external and internal states and to facilitate the triggering of various biological responses. Receptors can be roughly divided into two major classes: intracellular receptors and cell-surface receptors. Accordingly, cells have developed two modes of importing a signal. First, the stimulus may penetrate the cell membrane and bind to a respective receptor in the cell interior. Another possibility is that the signal is perceived by a transmembrane receptor. Once the extracellular signaling molecules, e.g., the ligands, bind to the receptors, arrays of intracellular proteins form signal transduction pathways which facilitate signal transmission from the extracellular compartment to the nucleus by a cascade of biochemical reactions and thereby trigger various biological responses, including the transcription of genes, protein–protein interactions, and metabolism, as shown in Figure 1.3.

Figure 1.3 Simplified scheme for general signal transduction (from Saez-Rodriguez et al. 2004)

The external signal is sensed at a specific point and is propagated to modulate the activities of other components or processes. The target may be either enzymes or DNA; in particular, enzymes can be modified, for example by phosphorylation, so that their catalytic activities are increased or decreased in response to the extracellular signal. On the other hand, for DNA, the signal transduction process targets TFs, which are proteins regulating gene expression. In the simplest case, the signaling system consists of two components: a sensor that detects environmental changes and a regulatory element that influences the transcription of selected genes. In addition to the sensing and transduction of a signal, the term signal transduction traditionally includes the processing of signals. Information in cellular signaling processes is generally transferred by modifications of proteins leading to changes in their activities, such as phosphorylation, which produces a conformational change in a protein and alters its activity. The activation and inactivation of a protein is the result of signal transduction processes. Besides, the interactions between proteins are also important for signal transduction; e.g., signals from the exterior of a cell are mediated to the inside of that cell by a series of protein–protein interactions (PPIs) of the signaling molecules, which play a fundamental role in many biological processes and in many diseases.
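The phosphorylation-based transfer of information described above can be sketched as a single covalent modification cycle: a kinase, driven by a signal S, phosphorylates a target protein, while a phosphatase removes the phosphate. This is a generic mass-action sketch, not a model from this book; the function names and rate constants are hypothetical:

```python
def phospho_fraction(signal, k_kin=1.0, k_phos=1.0):
    """Steady-state phosphorylated fraction Xp / X_T of a modification cycle
    with mass-action kinetics (hypothetical rate constants):
        dXp/dt = k_kin * S * (X_T - Xp) - k_phos * Xp = 0."""
    return k_kin * signal / (k_kin * signal + k_phos)

# A stronger extracellular signal S drives more of the target protein
# into its phosphorylated (active) state.
weak   = phospho_fraction(signal=0.1)   # ~0.09
strong = phospho_fraction(signal=10.0)  # ~0.91
```

Chaining several such cycles, with the active form of one layer acting as the kinase of the next, gives the cascade structure (e.g., of MAPK pathways) that can sharpen a graded input into an almost switch-like output.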

Many signaling pathways have been extensively investigated, e.g., the general mechanism of G protein signaling pathways, mitogen-activated protein kinase (MAPK) cascades, and Jak-Stat pathways. The dynamics of signaling pathways, such as the propagation of noise and stochastic fluctuations, the role of various feedbacks, the origin of multistability and oscillations, and the consequences of multiple phosphorylations, have been the subject of active research. Unlike the homogeneous components in a protein interaction network, a signal transduction network is a heterogeneous one, with components comprising not only proteins but also metabolites and small molecules.

1.1.3 Protein Interactions: Protein Interaction Networks

A typical mechanism used to transfer information or signals in a cell is physical interactions between proteins. Actually, PPIs are of central importance for virtually every process in a living cell and can be viewed as an essential part of the signal transduction process. Many proteins involved in signaling processes contain amino acid sequences, known as domains, which can bind to the domains in other proteins, an interaction called a domain–domain interaction (DDI), leading to the association of molecules. Figure 1.4 shows schematically an example of a protein interaction due to the interactions of two domain pairs. Hence, one of the major molecular networks, a protein interaction network or PPI network, is formed by such PPIs. Analyzing and further unraveling PPI networks and interactions will not only facilitate a better understanding of complex cellular processes but also enable the drawing of inferences regarding the functions of proteins. Biochemically, PPIs involve not only the direct-contact association of protein molecules but also long-range interactions through the electrolyte, an aqueous solution medium surrounding neighboring hydrated proteins, over distances from less than one nanometer to several tens of nanometers.
With respect to the types of protein interactions, proteins may interact over a longer duration and integrate into a protein complex, which is called a permanent interaction (e.g., ribosomes and polymerases), or they may interact briefly with other proteins just to modify them, which is called a transient interaction, thereby forming a functional module; e.g., a protein kinase will add a phosphate to a target protein, or a ligand will bind a receptor. This modification of proteins can itself change PPIs. In addition, a protein may also interact with another protein for transportation, e.g., from the cytoplasm to the nucleus or vice versa. Information about these interactions improves our understanding of diseases and can provide the basis for developing new therapeutic approaches.

1.1.4 Metabolism: Metabolic Networks

All the processes that occur within a living cell are ultimately driven by energy. Green plants and some bacteria obtain energy directly from sunlight. Other

organisms utilize compounds made using sunlight and break them down to release energy through a process called catabolism, as shown in Figure 1.5.

Figure 1.4 Domain-based protein interactions. There are two domains D1 and D2 for Protein 1, and three domains D3, D4, and D5 for Protein 2. Domain pairs D1-D3 and D2-D4 interact, which facilitates the interaction between Protein 1 and Protein 2.

The most common method of breaking down these food compounds is to oxidize them, that is, to burn them but in a well-controlled way. The energy trapped in energy currencies can then be used for the regeneration, repair, and homeostatic processes termed anabolism. Metabolism is the general term for these two kinds of reactions: catabolism and anabolism. Catabolism is the set of metabolic pathways that break down molecules into smaller units, releasing energy, while anabolism is the set of metabolic pathways that construct molecules from smaller units, generally consuming energy. Metabolism is a highly organized process and generally involves thousands of reactions that are catalyzed by enzymes. The major components in a metabolic network are enzymes and substrates. Therefore, a metabolic network can be considered a result of enzyme-substrate interactions and is determined by the set of catalyzing enzymes, the possible metabolic fluxes, and the intrinsic modes of regulation. When

modeling a metabolic system, the concentrations of the molecules and their rates of change are of special interest: e.g., enzyme kinetics, which is used to investigate the dynamical properties of metabolic networks, and metabolic control analysis (MCA), which quantifies the effect of perturbations in the network.

Figure 1.5 Schematic representation of metabolism. Catabolic reactions transfer energy from complex molecules and polymers (such as glycogen, proteins, and triglycerides) to ATP, releasing heat, while anabolic reactions transfer energy from ATP to build complex molecules from simple molecules and monomers (such as glucose, amino acids, glycerol, and fatty acids).

On the other hand, the network features of metabolism are studied with stoichiometric analysis considering the balance of compound production and degradation, e.g., flux balance analysis (FBA) or MCA. Metabolic networks are one of many types of molecular networks, which exist at different levels, e.g., interlocked genetic regulatory networks, transcription regulatory networks, protein interaction networks, signal transduction networks, and metabolic networks, although these have been introduced independently. For example, a signaling network is triggered by the presence of extracellular stimuli and often results in the activation of TFs, which function in gene regulatory networks by regulating the transcription of associated genes and the synthesis of various proteins used in protein interaction networks and metabolic networks. Metabolism is responsible for the production of energy needed for all biological processes.
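As a minimal illustration of the enzyme kinetics mentioned above, the Michaelis-Menten rate law v = Vmax * S / (Km + S) can be computed directly; the parameter values used below are arbitrary, not taken from any real enzyme.

```python
# Michaelis-Menten rate law: v = Vmax * S / (Km + S), where S is the
# substrate concentration, Vmax the maximal rate, and Km the substrate
# concentration at which the rate is half-maximal.
def michaelis_menten_rate(s, vmax, km):
    return vmax * s / (km + s)
```

At S = Km the rate equals Vmax/2 (the defining property of Km), and as S grows the rate saturates toward Vmax.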
The interconnection of different kinds of networks affects highly integrated intracellular processes. Note that besides the molecular networks, there are many other types of networks widely studied in computational biology, such as disease networks, functional linkage networks, and structural similarity networks (Chen et al. 2009).

Cell Cycles and Cellular Rhythms: Nonlinear Network Dynamics

The cell cycle or cell-division cycle is the series of events that allows the division or duplication of cells. Such a phenomenon results from dynamical interactions among the related biomolecules, i.e., the corresponding molecular network, which can be represented as a nonlinear dynamical system. A typical cycle for both eukaryotic and prokaryotic cells is growth and division. Growth implies the formation of new molecules in a cell and the associated increase in its mass and volume, while division means the separation of two almost equally sized daughter cells, which is usually a much faster process than growth. While preparing for cell division, the genome together with associated proteins must be duplicated with extraordinary precision. A eukaryotic cell cycle comprises four stages: the G1 (gap) phase, in which the size of the cell is increased by constantly producing RNA and synthesizing protein; the S (synthesis) phase, in which DNA synthesis and duplication occur; the G2 (gap) phase, in which the cell continues to produce new proteins and grows in size; and the M (mitosis) phase, in which chromosomes segregate and cell division occurs. In particular, the genome is kept constant in the G1, G2, and M phases, but duplicated in the S phase, which lasts a shorter time than the cell volume growth process and much longer than the cell division time, as shown in Figure 1.6. Mammalian cells require around h to complete one cell cycle, whereas bacteria may divide every min, and yeast cells or other protozoans may require 6-8 h.
Since the cell volume and DNA content must increase by a factor of 2 between successive divisions to ensure that the mass of the two daughter cells remains nearly equal to that of the mother cell, the concentrations or the numbers of molecules in the cell inevitably depend on the dynamics of the cell cycle, which in turn has a significant effect on the dynamics of molecular networks owing to such dynamical changes during the cell cycle. The cell cycle is the vital process by which a single-celled fertilized egg develops into a mature organism, as well as the process by which hair, skin, blood cells, and some internal organs are renewed. Major events of the cell cycle, i.e., DNA synthesis, mitosis, and cell division, are regulated by a complex network of regulatory proteins known as the cell cycle control system. The core of this system is an ordered series of biochemical switches that control the main events of the cycle. The control system monitors the conditions inside and outside of the cell. When the system malfunctions, excessive cell division can result in cancers. In the past decade, many researchers have developed various models for the cell cycle and its control system, which have greatly improved our understanding of cell cycle dynamics (Battogtokh et al. 2006, Chen et al. 2000, Tyson et al. 2001, Tyson and Novak 2001, Tyson et al. 2002). Besides the cell cycle, rhythmic phenomena exist at all levels in living organisms, with periods ranging from less than a second to years, which may allow living organisms to adapt their behaviors to a periodically varying environment.
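Such rhythms are commonly modeled as limit-cycle oscillations of a nonlinear feedback system. A classic minimal sketch (not a model from this book) is the Goodwin oscillator, a three-stage negative feedback loop in which the end product represses the first production step through a steep Hill function; the parameter values below are illustrative.

```python
# Goodwin-type oscillator: z represses the production of x, and
# x -> y -> z is a linear cascade; all decay rates are set to 1.
# With these symmetric rates a Hill coefficient n > 8 is required for
# sustained oscillations; a and n here are illustrative choices.
def simulate_goodwin(a=50.0, n=10, dt=0.005, steps=40000):
    x = y = z = 0.1
    zs = []
    for _ in range(steps):
        dx = a / (1.0 + z ** n) - x   # Hill-repressed production of x
        dy = x - y                    # intermediate stage
        dz = y - z                    # end product closing the loop
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        zs.append(z)
    return zs
```

After transients, the end product z settles onto a periodic orbit rather than a steady state, which is the qualitative signature of a limit-cycle clock.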

Figure 1.6 Schematic representation of the cell cycle. To produce two similar daughter cells, the complete DNA must be duplicated. DNA replication occurs during the S phase. At the end of the S phase, each chromosome consists of a pair of sister chromatids held together by tethering proteins. After a gap (G2 phase), the cell enters mitosis (M phase), when the replicated chromosomes are aligned on the metaphase spindle, with sister chromatids attached by microtubules to opposite poles of the spindle. Finally, the tethering proteins are removed so that the sister chromatids can be segregated to opposite sides of the cell (anaphase). Shortly thereafter, the cell divides to produce two daughter cells in G1 phase (from Collins et al. 1997).

In particular, a circadian rhythm is a roughly 24-h cycle in the biochemical, physiological, or behavioral processes of living beings, including plants, animals, fungi, and cyanobacteria. Recently, it was also found that there is a functional pathway linking cell division and the circadian clock in an experiment on regenerating murine liver. Circadian rhythms are driven by biological clocks which are endogenously generated and can be entrained by external cues called Zeitgebers. The primary cue is daylight, but other factors, e.g., temperature, also affect the rhythms. For instance, there is some evidence showing that liver cells appear to respond to feeding rather than to light. These rhythms allow organisms to anticipate and prepare for precise and regular environmental changes. The primary circadian clock or master clock in mammals is located in the suprachiasmatic nucleus, a pair of distinct groups of neurons located in the hypothalamus. More-or-less independent circadian rhythms are also found in many organs and cells in the body outside the suprachiasmatic nuclei in mammals. These clocks, called peripheral oscillators, are found in

the oesophagus, lung, liver, pancreas, spleen, thymus, and skin. Generally, the rhythms are linked to the light-dark cycle. Animals, including humans, kept in total darkness for extended periods eventually function with a free-running rhythm. Each day, their sleep cycle is pushed back or forward, depending on whether their endogenous period is longer or shorter than 24 h. The rhythms are reset each day by environmental cues, i.e., Zeitgebers. Every rhythmic phenomenon in living organisms is believed to result from the internal nonlinear interactions of molecules or molecular networks, which are coupled with external stimuli and are usually represented as a nonlinear dynamical system.

1.2 A Primer to Networks

The quantifiable tools of network theory and nonlinear dynamics theory offer unexplored possibilities to understand the structures and dynamics of molecular networks. Network theory has been applied to many disciplines, including particle physics, computer science, biology, economics, operations research, and sociology. There are many types of networks, examples of which include logistical networks, the world wide web, gene regulatory networks, metabolic networks, social networks, and epistemological networks. With the recent explosion of publicly available high-throughput biological data, the analysis of molecular networks has gained significant interest. The study of molecular networks now covers not only local patterns in the network, e.g., network hubs, network motifs, pathways, feedback loops, modules, and communities, but also global features of the networks, such as feedback structures, scale-free distributions, small-world properties, and robustness. Here, we introduce some basic concepts of network theory that allow us to characterize different molecular networks, e.g., gene regulatory networks, transcription regulatory networks, protein interaction networks, metabolic networks, and signal transduction networks.
A network is composed of a set of vertices or nodes connected by edges. For example, in a transcription regulatory network, the nodes are genes, and edges represent transcriptional regulation of one gene by the protein products of other genes. On the other hand, in a protein network, a node is a protein, and an edge represents a physical or genetic interaction between two proteins. If each edge has a specific direction, then the network is called a directed graph; otherwise, it is called an undirected graph. Generally, transcriptional regulatory networks and metabolic networks are modeled as directed graphs, whereas signal transduction networks can be represented as directed graphs or as hybrid graphs with both directed and undirected edges, depending on the reactions and interactions in the network (e.g., PPIs). For instance, a transcription regulatory network (or transcriptional regulatory network) is a directed graph because if gene A regulates gene B, then there is a natural direction associated with the edge between the corresponding nodes, starting

at A and finishing at B. A directed graph can also include so-called self-loops, i.e., edges from a vertex to itself. On the other hand, PPI networks describe physical interactions between proteins in an organism, and there is no direction associated with the interactions in such networks. Hence, they are typically modeled as undirected graphs. The number of vertices n in a directed or undirected graph is the size or order of the graph. Generally, there is a weight coefficient, a positive or negative number, on each edge to represent the strength of the interaction. A network can be represented as a static graph or as a dynamical system, depending on the setting of the problem.

Basic Concepts of Networks

A set of vertices connected by edges is only the simplest type of network or graph. There are many ways in which a network may be more complex than this structure. For example, there may be more than one type of vertex, more than one type of edge, or both in a network. Consider the example of a gene regulatory network, in which the vertices represent genes (mRNAs) or proteins, and edges represent transcription, translation, or other kinds of gene regulation. The edges can carry weights representing, say, how strongly a TF directly controls the transcription rate of its target gene. The edges can be of two types. Activation or positive regulation, with a positive weight on the edge, occurs when increasing the concentration or number in one vertex enhances the concentration or number in another vertex. Otherwise, the regulation is negative, or called repression, with a negative weight on the edge. Similarly to edges, a node can also carry a weight to quantitatively or qualitatively represent its contribution to a certain property. As described above, a molecular network, formed by a variety of pairwise interactions between molecules, can be directed, with each interaction acting in only one direction.
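The notions above (directed edges, signed weights, self-loops, degrees) can be sketched as an edge-to-weight mapping; positive weights denote activation and negative weights repression. The gene names and values are hypothetical.

```python
# A directed, signed, weighted graph stored as an edge -> weight map.
# Positive weights mean activation, negative weights repression; the
# (geneB, geneB) entry is a self-loop. Gene names are hypothetical.
edges = {
    ("geneA", "geneB"): 1.2,    # A activates B
    ("geneB", "geneA"): -0.8,   # B represses A (a negative feedback loop)
    ("geneB", "geneB"): -0.5,   # B represses itself (self-loop)
}

def successors(node):
    """Nodes directly regulated by the given node."""
    return {v for (u, v) in edges if u == node}

def in_degree(node):
    """Number of edges pointing into the given node."""
    return sum(1 for (_, v) in edges if v == node)
```

For an undirected PPI network, the same structure would store each interaction once with an unordered node pair instead of an ordered one.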
A gene regulatory network is directed because each message is transmitted in only one direction. Directed networks can be either cyclic, indicating that they contain closed loops of edges, or acyclic, indicating that they do not. In directed networks, feedback plays an important role in understanding many aspects of networks, such as stability and robustness. Feedback can be defined as the ability of a system to adjust its output in response to monitoring itself. Taking a gene regulation process as an example, a protein A may inhibit or enhance transcription of a gene which encodes some other protein B, while B may in turn influence the production of A, thereby forming a closed feedback loop. Feedback is a ubiquitous control mechanism in molecular networks. It has become clear that the principles of feedback, in both positive and negative loops, are indeed used to produce signaling properties. With negative feedback, the response counteracts the effect of the stimulus; therefore, it can be used to create homeostasis, where the steady-state concentration of the response is confined to a narrow window for a broad range of signal strengths (Tyson et al. 2003). Negative feedback can also be used to create an oscillatory response

such as circadian rhythms and to suppress noise (Kim et al. 2006). Positive feedback, in contrast, amplifies the initial conditions, making amplification, decision-making, and memory possible. For instance, with positive feedback, a slightly hot temperature would lead to maximum heating and a slightly cold temperature would lead to maximum cooling, which eventually would result in two stable states: very hot and very cold. Therefore, positive feedback can be used to create a switch, in which the cellular response changes abruptly as the signal magnitude crosses a critical value. Moreover, such a feature can also be used as a sensor to detect small perturbations. Molecular networks may have multiple positive and negative feedback loops, particularly in the form of coupled positive and negative feedback loops, which have been proposed as a basis for rapidly turning on a reaction in response to a proper stimulus, robustly maintaining its status, and immediately turning off the reaction when the stimulus disappears. Therefore, coupled positive and negative feedback loops form essential signal transduction motifs in cellular signaling systems (Kim et al. 2006). It has also been revealed that a signaling system with multiple feedback loops is more robust than one with a single feedback loop (Venkatesh et al. 2004). Excellent reviews on the roles of positive and negative feedback in cellular systems can be found in (Tyson et al. 2003, Mitrophanov and Groisman 2008, Ferrell 2002, Freeman 2000). Molecular networks also evolve over time, with vertices or edges appearing or disappearing (i.e., dynamics of structure evolution), or with values defined on those vertices and edges changing (i.e., dynamics of state evolution). Nodes are added to a gene regulatory network when genes duplicate. However, duplicated genes immediately change their interactions and thereby rapidly specialize their interacting partners.
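A one-variable sketch shows how positive feedback creates such a switch: with a sigmoidal self-activation term, trajectories starting below an unstable threshold decay to the off state, while those above it converge to a high on state. The rate law and parameters below are illustrative, not drawn from this book.

```python
# Bistability from positive feedback:
#   dx/dt = k * x^2 / (K^2 + x^2) - d * x
# i.e., sigmoidal self-activation plus first-order decay. With k=4,
# K=1, d=1 there are two stable states, x = 0 and x = 2 + sqrt(3),
# separated by an unstable threshold at x = 2 - sqrt(3).
def steady_state(x0, k=4.0, K=1.0, d=1.0, dt=0.01, steps=5000):
    x = x0
    for _ in range(steps):  # forward Euler integration to steady state
        x += dt * (k * x * x / (K * K + x * x) - d * x)
    return x
```

Starting at x = 0.1 (below the threshold, about 0.27) the system switches off, while starting at x = 1.0 it latches into the high state, the memory-like behavior described above.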
This results in a link dynamics model which explains network evolution through interaction loss and preferential interaction gain (Wagner 2003). Generally, core components of a network tend to be conserved, whereas components at the periphery or false interactions do not. Although conservation of network nodes and edges is extremely valuable for mapping conserved interactions and common features among organisms, it is likely that many regulatory interactions are not conserved, thereby contributing to species diversity and allowing organisms to occupy distinct ecological niches (Zhu et al. 2007).

Topological Properties of Networks

Cellular functions cannot be attributed to isolated components. Rather, they arise from characteristics of molecular networks, which represent connections between cellular components. A molecular network is a typical complex network, which is generally characterized not only by global topological properties such as small-world character and scale-free distribution (Albert et al. 2002, Barabasi and Oltvai 2004, Watts et al. 1998) but also by local patterns or structures such as motifs and modules (Alon 2006), as shown in Figure 1.7.

In other words, unlike random networks, molecular networks contain characteristic topological patterns that enable their functionality. A node with a high degree, i.e., one connecting to many other nodes, is called a hub, which is considered to play a crucial role in the qualitative behavior of the network. Simple patterns of interconnection occurring in complex networks significantly more frequently than in randomized networks are defined as network motifs (Milo et al. 2002); being basic building blocks of molecular networks, they imply structural design principles.

Figure 1.7 Ingredients of molecular networks: molecules, interactions, local structures, and networks, where local structures include hubs, pathways, feedback loops, modules, communities, network motifs, and subnetworks. Clearly, the basic units or components in a network are individual molecules, which affect each other through their local or pairwise interactions. A chain or cascade of those local interactions is a linear pathway or local structure which transforms local perturbations into a functional response. All of these linear pathways or local structures are assembled into a global molecular network which eventually generates global behaviors and is responsible for complicated life in a living organism (Chen et al. 2009).

Some functionally related components often interact with one another, forming modules in molecular networks (Hartwell et al. 1999, Qi and Ge 2006). Motifs represent recurrent topological patterns, while modules are bigger building blocks that can carry out certain cellular functions such as signal transmission. Modules may retain motifs as their structural components and maintain certain properties such as robustness to environmental perturbations and evolutionary conservation, because such a modular design can prevent damage from spreading limitlessly and can ease the evolutionary upgrading of some components.
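As a concrete example of motif detection, the feed-forward loop (edges X to Y, Y to Z, and the shortcut X to Z), one of the motifs reported by Milo et al. (2002) for transcriptional networks, can be counted by brute force in a small directed graph; this naive cubic-time scan is only an illustrative sketch, not a practical motif-finding algorithm.

```python
# Brute-force count of feed-forward loops: three distinct nodes x, y, z
# with edges x->y, y->z, and the shortcut x->z all present.
def count_ffls(edges):
    """edges: a set of (source, target) pairs of a directed graph."""
    nodes = {n for e in edges for n in e}
    return sum(
        1
        for x in nodes for y in nodes for z in nodes
        if len({x, y, z}) == 3
        and (x, y) in edges and (y, z) in edges and (x, z) in edges
    )
```

Note that a three-node cycle contains the same number of edges but no feed-forward loop, which is exactly why motif counts are compared against randomized networks with the same degree structure.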
In addition to hubs, motifs and modules can be found in PPI, metabolic, and transcriptional networks. For the PPI and metabolic networks, modules can be defined as subnetworks whose component entities are more likely to be connected to each other than to entities outside the subnetworks, while for transcriptional networks, modules can be defined as sets of genes controlled by the same set of TFs under certain conditions. The modular organization of molecular networks provides us with testable hypotheses that yield biological

insights such as functional annotation and key regulatory information, because components in a given module are hypothesized to be functionally coherent. The module structures can also supply key regulatory information (Qi and Ge 2006). In addition to motifs and modules, other topological and statistical properties of networks, such as degree distribution, centrality, clustering coefficient, and average path length, are important factors for revealing the complex nature of molecular networks.

1.3 A Primer to Dynamics

Having described some cellular processes and network basics, we proceed to describe basic concepts of modeling and dynamics. The enterprise of modeling research requires both breadth and depth in the understanding of various aspects of molecular networks, including biological, computational, mathematical, and even engineering issues. When modeling a specific problem of a system, appropriate models must be selected in order to reflect the essential properties of the system, because different models may highlight different aspects of the same system. Further, even after an appropriate model has been selected, there are still many different aspects which need to be understood. This section introduces some fundamental concepts and aspects of modeling a molecular network as a dynamical system. Different approaches for modeling molecular networks are being investigated. For instance, various kinds of network models, such as Bayesian networks, Boolean networks, and linear feedback networks, have been adopted to infer the static or dynamic structure of regulatory networks from experimental data. There are also other mathematical models which have attempted to capture more complex phenomena like spatiotemporal fluctuations and diffusion. All these methods can be categorized as static or dynamic, discrete or continuous, qualitative or quantitative (Jong 2002).
Clearly, these studies help us to qualitatively or quantitatively understand the structures and functions of various molecular networks and their regulatory mechanisms in living cells.

Dynamics and Collective Behavior

Dynamics exist in living organisms at all levels. From both theoretical and experimental viewpoints, it is a big challenge to model, analyze, and further predict various kinds of dynamical behavior in biological systems. One of the best studied dynamical phenomena thus far is circadian oscillation, which is assumed to be produced by limit cycle oscillators arising at the molecular level from gene regulatory feedback loops. With rapid advances in mathematics and experiments concerning the underlying regulatory mechanisms, more sophisticated theoretical models and general techniques are increasingly required to elucidate various features of dynamical behavior at a system-wide level.
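A standard minimal model for system-wide collective dynamics of coupled units (not a model from this book) is the Kuramoto phase-oscillator system: above a critical coupling strength, oscillators with similar intrinsic frequencies phase-lock, as measured by the order parameter r. All parameter values below are illustrative.

```python
import math
import random

def order_parameter(phases):
    """Kuramoto order parameter r in [0, 1]; r near 1 means synchrony."""
    n = len(phases)
    re = sum(math.cos(p) for p in phases) / n
    im = sum(math.sin(p) for p in phases) / n
    return math.hypot(re, im)

def simulate_kuramoto(coupling, n=20, steps=3000, dt=0.01, seed=1):
    """Forward-Euler simulation of n all-to-all coupled phase oscillators."""
    rng = random.Random(seed)
    freqs = [1.0 + 0.05 * rng.gauss(0.0, 1.0) for _ in range(n)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    for _ in range(steps):
        # mean-field coupling term for each oscillator
        fields = [sum(math.sin(q - p) for q in phases) / n for p in phases]
        phases = [p + dt * (w + coupling * f)
                  for p, w, f in zip(phases, freqs, fields)]
    return order_parameter(phases)
```

With strong coupling the final order parameter approaches 1, whereas uncoupled oscillators with random initial phases remain unsynchronized; analogous mean-field ideas underlie models of coupled cellular oscillators.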

Another common dynamical phenomenon in biology is collective behavior, i.e., well-coordinated responses resulting from an integrated exchange of information by cell communication in both prokaryotes and eukaryotes. Collective behavior is widespread in living organisms. The ability of cells to cooperate or communicate is an absolute prerequisite to ensure appropriate and robust coordination of cell activities at all levels of organisms under an uncertain environment. Understanding the mechanism of cooperative behavior (such as chemotaxis and quorum sensing) at the molecular level is an essential topic in systems biology, which requires both mathematical and biological knowledge and insight. Generally, for a coupled system, cooperative behavior such as intercellular communication is accomplished by transmitting individual cell reactions via intercellular signals to neighboring cells and further integrating them to generate a global cellular response at the level of molecules, tissues, organs, and bodies. Recently, many studies have indicated that an uncoupled system may also realize collective behavior or synchrony provided that all subsystems or individual components are subjected to a common fluctuating environment (Teramae and Tanaka 2004).

System States

An important notion in modeling a dynamical molecular network is the system state. It is a snapshot of the network at a given time and contains sufficient information to predict the behavior of the system at all future instances. A system state is described by a set of variables that must be monitored in a model. Different representations of the state can be used in different modeling approaches. The states can be discrete or continuous, deterministic or stochastic. For instance, in a Boolean network model for a gene regulatory network, each gene is assumed to be in one of two states: expressed or not expressed.
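A Boolean network model of this kind can be sketched in a few lines; the three-gene update rules below are hypothetical, chosen only to illustrate synchronous state updates and the resulting discrete, deterministic trajectory.

```python
# Synchronous Boolean network: 1 = expressed, 0 = not expressed.
# Hypothetical three-gene rules: C represses A, A activates B, and
# C is expressed only when both A and B are expressed.
def step(state):
    a, b, c = state["A"], state["B"], state["C"]
    return {"A": 1 - c, "B": a, "C": a & b}

def trajectory(state, steps):
    """List of states obtained by repeatedly applying the update rule."""
    states = [state]
    for _ in range(steps):
        states.append(step(states[-1]))
    return states
```

Because the state space is finite and the update is deterministic, every trajectory eventually enters a fixed point or a cycle; these rules send the state (A, B, C) = (1, 0, 0) around a cycle of length five.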
Similarly, each protein can be assumed to be in either an active or an inactive state (Li et al. 2004). For simplicity, the expressed and not expressed states are denoted as 1 and 0, respectively. The state of the model is simply a list of which genes are expressed and which are not; therefore, the system states are discrete and deterministic in a Boolean network model. In a differential equation model for a genetic regulatory network, the state, a list of concentrations of its cellular components, is continuous and deterministic. On the other hand, in a corresponding stochastic model, such as the master equation, the state is a list of the current numbers of molecules of all species, and therefore the state is discrete and stochastic. With the approximation of discrete variables by continuous variables (e.g., concentrations), a stochastic model can also be continuous and stochastic, as in stochastic differential equations.

Structures and Functions

The analysis of topological structures and functions in molecular networks is an important issue in systems biology. Increasing numbers of scientists have

been attracted to such research topics. Therefore, it is essential to establish theoretical methodologies and computational techniques to enable the understanding of the way components dynamically interact to form molecular networks which facilitate sophisticated biological functions such as robustness and adaptation. Inferring a network structure is a complicated problem which generally cannot be solved automatically simply on the basis of some principles or universal rules by referring to experimental data. To identify a network structure, two kinds of approaches, i.e., bottom-up (knowledge-driven) and top-down (data-driven) approaches, can be utilized on the basis of the available experimental data. The bottom-up approach involves the construction of a molecular network by compiling independent experimental data, mostly through literature searches, database queries such as to the Kyoto Encyclopedia of Genes and Genomes (KEGG), and some specific experiments designed to obtain data regarding very specific aspects of the network. This approach is suitable when most of the molecular mechanisms and their regulatory relationships are relatively well understood. On the other hand, the top-down approach involves the utilization of high-throughput data obtained via new measurement technologies, e.g., microarrays for gene expression and mass spectrometry for protein expression. Although the top-down approach does not require prior knowledge, its drawback is the computational cost. Hence, when prior knowledge is abundantly available, a hybrid method that combines the bottom-up and top-down approaches is preferred in order to reduce the search space of possible network structures. The theory of complex networks is a powerful tool to elucidate the structures and functions of molecular networks because of its universality and ability to mimic systems of many interacting parts.
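Topological quantities such as node degrees and hubs are straightforward to compute from an edge list; in a scale-free network the degree distribution follows a power law, so a few hub nodes carry most of the connections. The sketch below uses an illustrative star-shaped network, not real interaction data.

```python
from collections import Counter

# Node degrees of an undirected network given as an edge list; the node
# with the highest degree is a hub. Node names are illustrative.
def degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def top_hub(edges):
    """The most connected node of the network."""
    return degrees(edges).most_common(1)[0][0]
```

Removing a randomly chosen node of such a network usually deletes a low-degree node, whereas removing the top hub disconnects it, the robustness-fragility trade-off of scale-free topologies discussed below.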
Global and local properties such as degree distribution, motifs and modules, and hierarchies play important roles in understanding the functional implications of molecular networks. For example, metabolic networks have been found to have small-world and scale-free properties (Barabasi and Oltvai 2004, Jeong et al. 2000). Such properties can not only ensure network robustness against random failures of nodes but also guarantee efficient transport and flow processing by avoiding congestion. Many protein interaction networks also have the scale-free distribution, which confers tolerance to random errors coupled with fragility against the removal of the most connected nodes. Irrespective of these advancements, revealing the structures and functions of various complex molecular networks is still a key challenge in the field of systems biology. Besides theoretical results on complex networks, nonlinear dynamical theory and control theory are also widely used in analyzing the structures and functions of networks, e.g., stability theory, bifurcation theory, Lur'e systems, and linear matrix inequalities (LMIs). For example, the dynamics of cell cycle regulation has been analyzed by using bifurcation theory, which reveals how the generic properties of a dynamical molecular network depend on parameter perturbations (Tyson et al. 2002). On the basis of theoretical analysis, bifurcations have been shown to

underlie cell cycle transitions. A network with only positive feedback loops can be used to construct a molecular switch because the dynamics of such a network can be ensured to converge only to stable equilibria on the basis of the theory of monotone dynamical systems (Kobayashi et al. 2003). On the other hand, a network with interlocked positive and negative feedback loops can be used to construct a circadian clock (Glossop et al. 1999), which ensures a stable periodic oscillation.

Cellular Noise

Cellular processes at the molecular level are inherently stochastic. The origin of stochasticity in a cell can be attributed to random transitions among discrete biochemical states, which are the source of inherent fluctuations in the cell. There are two sources of noise. First, the inherent stochasticity in biochemical processes such as binding, transcription, and translation generates intrinsic noise due to random encounters, whose relative magnitude is proportional to the inverse of the system size. Second, variations in the amounts or states of cellular components due to discrete numbers or the external environment generate extrinsic noise. Such noise processes are believed to play especially important roles when species are present at low copy numbers. Systematic treatment of noise is essential for understanding biologically relevant system properties. It has become clear that the role of noise in cellular functions may be complex and cannot always be treated as a small perturbation to the deterministic behavior. Moreover, stochastic fluctuations may play constructive roles and are not always the cause of a systematic worsening of system properties. For instance, molecular fluctuations may enhance the sensitivity of intracellular regulation (Paulsson et al. 2000), induce new phenomena such as noise-based switches and amplifiers for gene expression (Hasty et al.
2000), and mediate collective behavior or stochastic synchronization (Chen et al. 2005). When species are present at low copy numbers, the stochastic description is more reasonable, although it may be solvable neither analytically nor with high computational efficiency. On the other hand, when the species numbers are high and the system is operating far from its critical points, the deterministic description is more reasonable because of its simple representation and high computational efficiency. Recently, it has been shown that noise is generated at the microscopic level of discrete variables and transmitted to the mesoscopic level of continuous variables (Crudu et al. 2009).

Time Delays

Biological regulation typically occurs via direct or indirect interactions between cellular components. During the regulation, there are always time delays associated with the biosynthesis and transport of regulatory molecules to reach

the site of action. Delayed feedback is ubiquitous in many cellular systems, such as the regulatory networks of circadian rhythms. Time delays are usually introduced into cellular models to represent the time taken for transcription, translation, phosphorylation, protein degradation, translocation, and posttranslational modification, and they may significantly influence the stability and dynamics of the overall system, especially in eukaryotes (Chen and Aihara 2002a). Quantitative measurements of these delays are few; see (Audibert et al. 2002) for recent progress in measuring RNA splicing delays. Few studies have focused on signal transduction delays. Time delays may also play important roles in many other respects, such as entrainment and the occurrence of certain physiological disorders like sleep phase syndromes (Sriram et al. 2006). Beyond their roles in deterministic systems, time delays can also cause new phenomena in stochastic systems, such as delay-induced stochastic oscillations (Bratsun et al. 2005). It has been shown that when the time delays in biochemical reactions are on the order of the other significant time scales characterizing the cellular system, or longer, and the feedback loops associated with these delays are strong, the delays can be crucial for describing transient processes; this implies that when delay times are significant, both analytical and numerical modeling should take the effects of time delays into account. Several useful algorithms and computational tools have been developed for this purpose. For instance, a generalized Gillespie algorithm that accounts for the non-Markovian properties of random biochemical events with delays has been widely used to simulate the dynamical behavior of molecular networks (Bratsun et al.
2005).

Multiple Time Scales

Each cell can be viewed as an integrated device made of several thousand types of interacting cellular components. Explicitly considering all the components of individual cells is unrealistic from the modeling, analysis, and computation viewpoints. However, the many different time scales that characterize various cellular processes can be exploited to reduce the complexity of the mathematical models. Generally, in gene regulatory networks, the transcription and translation processes evolve on a time scale that is much slower than that of other biochemical reactions, such as phosphorylation, dimerization, or binding reactions of proteins. For instance, the time to transcribe a DNA sequence and to translate an mRNA into a protein in Escherichia coli is about one and two minutes, respectively, whereas the time for a TF to bind to a DNA promoter in Escherichia coli is about one second. The wide range of time scales on which cellular processes occur is also an important feature of metabolism. Some modifications happen within seconds, while other processes take minutes, hours, or even longer. For example, the Michaelis–Menten rate equations assume that enzyme production and degradation occur on a much longer time scale than the catalyzed reaction. In addition, protein production and

degradation occur on a different time scale than signal transduction. Dynamics are generally intertwined between gene regulatory, metabolic, and signal transduction networks, and different time scales also coexist within these networks. For example, enzymatic reactions occur on the order of milliseconds, while gene regulatory events occur on the order of minutes or hours. Because of the existence of multiple time scales in a molecular network, we need to consider which time scale is relevant for the specific problem under study. When modeling, we can choose an appropriate time scale while neglecting faster or slower processes in order to reduce the system, provided that the simplified system is guaranteed to behave similarly to the original one. For example, for a process that occurs within seconds or less, the details of binding and unbinding and of protein transitions between active and inactive states have to be modeled, whereas at longer time scales these can be treated as being at quasi-steady states. At very long time scales, processes such as cell division, which can be ignored at shorter time scales, may become very important. An example of neglecting slow processes in metabolic modeling is the assumption that enzyme concentrations are constant: their production and degradation are considered to be much slower than the reactions that they catalyze (Ciliberto et al. 2007). Similarly, an example of neglecting fast processes in modeling a gene regulatory network is the assumption that the fast reactions rapidly approach quasi-steady states; in mathematical terms, the differential equations for the concentration changes of the fast variables can then be replaced by the corresponding algebraic equations (Wang et al. 2004).

Robustness and Sensitivity

Robustness, which characterizes the ability to maintain performance in the face of perturbations and uncertainty, is one of the essential features of cellular systems (Stelling et al. 2004a).
For example, it has been shown that the yeast cell cycle network is robust with respect to small perturbations of the network (Li et al. 2004). It is essential for cells to protect their genetic information and their mode of living against perturbations. Robustness in biological systems is often achieved by a high degree of complexity involving feedback, modularity, redundancy, and structural stability (Kitano 2002). The phenomenological properties exhibited by robust systems can be classified into three areas: adaptation, which denotes the ability to cope with environmental changes; parameter insensitivity, which indicates a system's relative insensitivity to specific kinetic parameters; and graceful degradation, which reflects the characteristically slow degradation of a system's functions after damage, rather than catastrophic failure (Kitano 2002). Global properties such as scale-free network structures are also helpful for robustness against random failure of molecules (Barabasi and Oltvai 2004). Different robustness measures have been proposed, such as the degree of robustness (DOR), which measures the minimal distance from a reference point in the parameter space to a bifurcation point, multiparametric robustness

34 1.4 Network Systems Biology and Synthetic Systems Biology 23 by structural singular values (Ma and Iglesias 2002), robustness with respect to random multiple parameter variation by defining total parameter variation (Bluthgen and Herzel 2003), and Monte Carlo-based robustness measure (Eissing et al. 2005), in which random parameter sets are drawn from predefined ranges and the relative frequency of occurrence provides an estimate of the volume in the parameter space allowing bistable behavior. However, cellular systems are able to adapt to changes, sense and process internal and external signals, and react precisely depending on the type or strength of a perturbation. Sensitivity or fragility characterizes the ability of living organisms to adequately react to a certain stimulus. Moreover, robustness is usually quantified by calculating sensitivity such as period and amplitude sensitivity for quantifying robustness of circadian rhythms. Relatively small sensitivity indicates high robustness. Understanding the mechanism behind robustness and sensitivity is particularly important because it provides in-depth understanding on how the system not only maintains its functional properties against various disturbances but also adapts to environmental changes and reacts precisely. Due to the requirement of both robustness and sensitivity, there must exist a tradeoff between them. Some of the mechanisms for such tradeoff have been revealed; for example, module-based analysis of robustness tradeoffs in the heat shock response system (Kurata et al. 2006) and steady-state analysis of yeast cell polarization (Chou et al. 2008). 1.4 Network Systems Biology and Synthetic Systems Biology Systems biology is an emergent area arising recently in biology that focuses on the systematic study of complex interactions in biological systems by integrating biology, mathematics, chemistry, physics, informatics, engineering, and other fields (Kitano 2002,Chen et al. 2009). 
It has been found that a complicated living organism cannot be completely understood by merely analyzing individual components, such as genes and metabolites; it is the interactions among components, or the networks, that are responsible for the functions and behavior of biological systems. Faced with the challenge of understanding the complexity of living organisms, instead of analyzing individual components or interactions with the so-called reductionist approach, systems biology studies an organism by considering all components and interactions together, treating the organism as a dynamical, interacting network of genes, proteins, and biochemical reactions that together give rise to life (Barabasi and Oltvai 2004, Chen et al. 2009). In recent years, with the rapid progress of various measurement technologies and experimental methods, many high-throughput technologies have been developed for systematically studying interactions or networks of molecules, such as microarrays, next-generation sequencing, the two-hybrid assay, co-immunoprecipitation,

and the ChIP-chip approach, which can be used to screen for PPIs or to infer gene regulatory networks. With the increasing accumulation of data from high-throughput technologies, molecular networks and their dynamics have been studied extensively for various aspects of living organisms. These studies help biologists not only to understand complicated biochemical phenomena but also to elucidate the essential principles and fundamental mechanisms of cellular systems at a system-wide level. One of the biggest challenges in network systems biology is to build a complete, high-resolution description of molecular topography and to connect molecular interactions or molecular networks with physiological responses. By studying the relationships and interactions between the various parts of a biological system, e.g., regulation modules, functional pathways, organelles, cells, physiological systems, and organisms, we aim eventually to develop an understandable model or molecular network of the whole system, which is key both for understanding life and for applications in human medicine, in particular from the theoretical and engineering perspectives. Figure 1.8 schematically shows the major research topics of network systems biology. Closely related to systems biology, synthetic biology is a new area of research that combines biological science and engineering in order to design and build novel artificial biological functions and systems, and it requires the techniques of systems biology. In other words, synthetic biology, in particular synthetic systems biology, involves the design and construction of new biological parts, devices, and systems, or the redesign of natural biological systems, for useful purposes. Research in synthetic biology aims at combining knowledge from various disciplines, including molecular biology, engineering, and mathematics, to design functional networks and implement new cellular behavior.
Recent progress in genetic engineering has made the design and implementation of artificial synthetic gene networks realistic from both theoretical and experimental viewpoints. Indeed, starting from theoretical predictions, several simple gene networks have been constructed experimentally, e.g., the genetic toggle switch (Gardner et al. 2000) and the repressilator (Elowitz and Leibler 2000). Such simple models clearly represent a first step towards logical cellular control by manipulating and monitoring biological processes at the DNA level; they not only can be used as building blocks to synthesize artificial biological systems but also have great potential for biotechnological and therapeutic applications. To demonstrate the similarities and differences, Table 1.1 gives a simple comparison between the silicon digital computing system and the synthetic biological system.

1.5 Outline of the Book

This book provides new theoretical tools and computational models for modeling and analyzing dynamical networks at the molecular level, from systems and engineering viewpoints. In particular, we provide a general theoretical

Figure 1.8 Major research topics of network systems biology. Omics data include high-throughput data from genomics, transcriptomics, proteomics, metabolomics, and phenomics.

Table 1.1 Silicon computing system and synthetic biological system

                    In-silico computing system                            Synthetic biological system
Building blocks     silicon switch, clock, silicon sensor                 gene switch, gene oscillator, gene sensor
Hardware            gate, memory, CPU, computer                           bio-gate, bio-sensor, bio-memory, bio-computer
Software (codes)    C++, Fortran, Basic                                   genetic code (A, C, G, T)
Networks            computing system                                      living organism
Applications        computation, control, artificial intelligence, etc.   bio-tech, logical cellular control, drug, etc.

framework to model building blocks of biomolecular systems and to analyze nonlinear dynamical phenomena, which readers can apply to design function-oriented molecular networks and to solve novel biological problems on the basis of their own knowledge and skills. Specifically, the new features of the book include:

1. modeling a general molecular network with either time-invariant or time-varying parameters;
2. nonlinear analysis of molecular networks, such as stability and bifurcation analysis;
3. designing synthetic switching networks;
4. designing synthetic oscillating networks;
5. a quantitative simulation scheme for molecular networks with stochastic fluctuations;
6. synchronizing bio-oscillators without noise;
7. synchronizing bio-oscillators with noise;
8. noise-induced collective behavior of molecular networks;
9. graphic representations for molecular networks, i.e., the interaction graph, the incidence graph, and the species-reaction (SR) graph.

The engineering, networks, and dynamics approaches of this book have major strengths. For instance, many engineering areas are featured in this book, including forward engineering design, signal processing, and control systems. The material is designed to match ideas that engineering students are familiar with, and these approaches are demonstrated and explored within the context of the functionality of living systems. On the other hand, the nonlinear dynamical analysis of molecular networks regarding collective behavior and switching dynamics, which are ubiquitous in living organisms, is presented in a comprehensive and systematic manner. The topic is treated in depth and is related to other emerging areas, such as network motifs, molecular communication, and stochastic resonance in living organisms.
The intended readers are systems biology and computational biology specialists in academia and industry (including pharmaceuticals), engineers, postgraduates, molecular biologists who rely on computers, and mathematical scientists with interests in biology. In addition, three graphic representations are introduced in this book to analyze molecular networks, i.e., the interaction graph, the incidence graph, and the SR graph; Figure 1.9 shows the application of these graphic representations to cellular systems. Table 1.2 lists the major theoretical tools, i.e., monotone dynamical systems, Lur'e systems, and LMI techniques, which are adopted to analyze the dynamical behavior of cellular systems, in particular the stability, bifurcation, and synchronization of molecular networks. The book includes new developments from cutting-edge research topics and methodologies in the area of molecular network design and nonlinear analysis, which are difficult to cover fully owing to a dearth of experts in the related fields. However, this is an area where applied mathematicians and engineers can make a big impact. The main contents of the book are as follows:

Figure 1.9 Graphic representations for analyzing molecular networks. The interaction graph emphasizes regulations, which can be direct or indirect interactions. The incidence graph is similar to the interaction graph but stresses the relation between input and output. In contrast, the SR graph mainly describes the direct interactions or chemical reactions. A solid line means a directly induced relation between two graphs, and a dotted line implies an indirectly induced relation between two graphs.

Table 1.2 Special theoretical tools adopted in the book, i.e., monotone dynamical systems, Lur'e systems, the LMI technique, etc.

Special theoretical tools     Major topics
Monotone dynamical systems    switching behavior, oscillating behavior, multistability
Master equations              stochastic simulation, stochastic synchronization
Cumulant equations            stability, bifurcation
LMI                           stability analysis, synchronization
Lur'e systems                 gene regulation, stochastic stability

1. Chapter 1 covers the problems and topics of biomolecular networks from both biological and theoretical viewpoints to provide the context and an impetus for the following chapters; this chapter also provides the fundamental concepts of molecular biology and network theory used in the book, together with perspectives and challenges on modeling and analyzing molecular networks in cellular systems.

2. In Chapter 2, we present a mathematical description of stochastic molecular networks in a single cell in a multiscale manner. Specifically, we show how master equations, stochastic differential equations, cumulant equations, and then deterministic equations are obtained to model cellular dynamics at the molecular level, depending on the required accuracy. There are three representations of molecular networks, i.e., the stochastic, deterministic, and hybrid representations. Special structures and properties of biochemical reactions are exploited through monotone dynamical systems to reduce the complexity of cellular systems.

3. Chapter 3 provides a general framework for representing a deterministic molecular network with either time-invariant or time-varying structures, the latter to account for the cell division process.

4. In Chapter 4, we discuss qualitative analysis of molecular networks, including stability, bifurcation, sensitivity, and robustness analysis.

5. Chapter 5 describes qualitative analysis of genetic networks based on the Lur'e model, using LMI techniques and Lyapunov functions.

6. In Chapter 6, we illustrate how to construct or design a synthetic molecular network based on interaction graphs and SR graphs, in particular a gene regulatory network with a specific function: switching dynamics in feedback or interlocked feedback networks (i.e., positive-loop networks), by exploiting the special structures of biological systems, e.g., the monotone dynamics of biochemical reactions.
On the basis of synthetic biology, detailed examples of gene regulation are also provided for such networks in bacteria.

7. In Chapter 7, we illustrate how to construct or design a gene regulatory network with rhythmic dynamics in feedback or interlocked feedback networks (i.e., negative-loop networks and hybrid networks), by exploiting the special structures of biological systems, e.g., the monotone dynamics of biochemical reactions. Incidence graphs are also adopted to transform a non-monotone system into a discrete map, which can be analyzed in a much simpler way. On the basis of synthetic systems biology, detailed examples of gene regulation are also provided for such synthetic networks in bacteria.

8. Chapter 8 describes a general model for coupled molecular networks in a multicellular system, i.e., we formulate a molecular network with coupling in stochastic and deterministic forms in a multicell system, which is similar to star-like coupling but has a different structure. Theoretical results on the synchronization of multiple bio-oscillators without noise

and with noise, obtained using nonlinear dynamical theory and control theory, are described. As an illustrative example, a synthetic multicellular system is constructed to show how synchronization is achieved and how the dynamics of individual cells are controlled.

The ingredient flow of this book is shown in Figure 1.10, and the major topics and theoretical methods are also summarized in Figure 1.11.

Figure 1.10 Content flow in the book: modeling molecular networks (Chapters 2 and 3); qualitatively analyzing molecular networks (Chapters 4 and 5); quantitatively simulating molecular networks (Chapter 2); designing molecular networks (Chapters 6 and 7); applying to multi-cellular systems and collective dynamics (Chapter 8).

Figure 1.11 Major topics and theoretical methods in the book.

2 Dynamical Representations of Molecular Networks

A living cell can be viewed as a huge dynamical system, or molecular network, with stochastic fluctuations at the molecular level, in which cellular components interact dynamically, both temporally and spatially. Dynamical representations of the molecular network are necessary for an accurate understanding of the temporal and spatial evolution of a cellular system and can also provide deep insight into the dynamical interactions among cellular components. This chapter describes a general theoretical framework for modeling molecular networks that takes into account discrete state transitions and stochastic fluctuations. In particular, we provide both stochastic and deterministic formulations for modeling networks of biochemical reactions in a cell.

2.1 Biochemical Reactions

Biochemical reactions, which specify how the states of a system change and how fast those changes occur, are central to modeling molecular networks. Many complex networks can be expressed by such reactions, either qualitatively or quantitatively. Biochemical reactions are so fundamental that the same list of biochemical reactions can lead to different models, e.g., a graphical representation, a stochastic process, or a deterministic model. In this sense, representing processes by biochemical reactions is more basic than using a mathematical model of either stochastic or deterministic dynamics. Writing down a list of biochemical reactions corresponding to a cellular system or molecular network, together with the rate of every reaction and the initial condition of each species, is a powerful and flexible way to specify the system. However, the reactions themselves specify only the qualitative structure of the system and must be augmented with additional tools or formalisms before they can be used to analyze and simulate the dynamics of the network and to make predictions.
One such tool is a dynamical representation of the cellular system or molecular network.
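To make this concrete, the sketch below shows how a reaction list, together with rates and an initial condition, can be augmented with one such formalism, here a stochastic simulation using Gillespie's direct method. This is an illustrative sketch in standard-library Python, not code from the book; the species names, rate values, and function names are all hypothetical.

```python
import random

# A reaction list: (reactants, products, rate constant), where reactants and
# products map species names to stoichiometries. Rates are illustrative.
reactions = [
    ({}, {"mRNA": 1}, 0.5),                    # transcription: 0 -> mRNA
    ({"mRNA": 1}, {"mRNA": 1, "P": 1}, 2.0),   # translation: mRNA -> mRNA + P
    ({"mRNA": 1}, {}, 0.1),                    # mRNA degradation
    ({"P": 1}, {}, 0.05),                      # protein degradation
]

def propensity(state, reactants, k):
    """Stochastic propensity: k times the number of reactant combinations."""
    a = k
    for species, r in reactants.items():
        n = state[species]
        for j in range(r):          # n * (n - 1) * ... for r identical copies
            a *= max(n - j, 0)
    return a

def gillespie(state, reactions, t_end, seed=0):
    """Gillespie's direct method: one exact sample path of the reaction system."""
    rng = random.Random(seed)
    t = 0.0
    while t < t_end:
        props = [propensity(state, re, k) for re, _, k in reactions]
        total = sum(props)
        if total == 0:
            break                   # no reaction can fire any more
        t += rng.expovariate(total)         # exponential waiting time
        pick = rng.uniform(0.0, total)      # choose a reaction proportionally
        for a, (re, pr, _) in zip(props, reactions):
            pick -= a
            if pick <= 0:
                for s, r in re.items():     # consume reactants
                    state[s] -= r
                for s, p in pr.items():     # produce products
                    state[s] += p
                break
    return state

state = gillespie({"mRNA": 0, "P": 0}, reactions, t_end=50.0)
```

The same reaction list could instead be mapped to deterministic rate equations, which is exactly the sense in which the list of reactions is more basic than any single mathematical model derived from it.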

In this section, we introduce some basic biochemical reactions involved in cellular systems. A cellular system consists of a network of coupled biochemical reactions. These reactions can be transcription, translation, dimerization, protein or mRNA degradation, enzyme-catalyzed reactions, transportation, diffusion, binding or unbinding, DNA or histone methylation, and histone acetylation or phosphorylation. These biochemical reactions constitute various biomolecular networks, e.g., metabolic, genetic, and signaling networks. In an elementary biochemical reaction, one or more biochemical species react directly to form products in a single reaction step with a single transition state. More complicated reactions can be decomposed into sequences of elementary reactions. A general elementary biochemical reaction can be represented as follows:

r_1 R_1 + r_2 R_2 + ... + r_m R_m --k--> p_1 P_1 + p_2 P_2 + ... + p_n P_n,   (2.1)

where m is the number of reactants and n is the number of products. The terms to the left of the arrow are called reactants, and those on the right are called products. Thus, R_i is the ith reactant and P_j is the jth product; r_i and p_j are the numbers of molecules of reactant R_i consumed and of product P_j produced in one reaction step, respectively; and k is a positive number representing the rate of the reaction. The coefficients r_i and p_j are known as stoichiometries and are generally small positive integers. The reactants and products in a cell are DNA, RNA, proteins, or other chemicals. The total reaction order is defined as the sum of the r_i, i.e., sum_{i=1}^{m} r_i. The general reaction (2.1) can express any biochemical reaction in a cell. However, in order to clearly represent concrete biochemical processes, we describe typical reactions in cellular systems in sequence. First, we consider the dimerization of a protein P with reaction rate k_d:

2P --k_d--> P_2,   (2.2)

where 2P means P + P.
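Under deterministic mass-action kinetics, the rate of the general reaction (2.1) is v = k * [R_1]^{r_1} * ... * [R_m]^{r_m}, and the total order is r_1 + ... + r_m. The short sketch below illustrates both quantities for the dimerization (2.2); the function names and numerical values are illustrative assumptions, not taken from the book.

```python
def reaction_order(reactants):
    """Total reaction order: the sum of the reactant stoichiometries r_i."""
    return sum(reactants.values())

def mass_action_rate(k, reactants, conc):
    """Deterministic mass-action rate v = k * prod_i [R_i]**r_i for reaction (2.1)."""
    v = k
    for species, r in reactants.items():
        v *= conc[species] ** r
    return v

# Dimerization 2P -> P2 from (2.2): a single reactant with stoichiometry 2,
# hence a second-order reaction with rate v = k_d * [P]**2.
dimerization = {"P": 2}
order = reaction_order(dimerization)                  # 2
v = mass_action_rate(0.1, dimerization, {"P": 4.0})   # 0.1 * 4.0**2 = 1.6
```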
Reaction (2.2) describes a protein–protein interaction that results in a homodimer P_2 composed of two protein monomers P. The reaction has one reactant P and one product P_2, with stoichiometries of 2 and 1, respectively; stoichiometries of 1 are not usually written explicitly. Similarly, the reaction for the dissociation of the dimer P_2 with reaction rate k_{-d} is written as follows:

P_2 --k_{-d}--> 2P.   (2.3)

A reaction that can happen in both directions is termed reversible. Reversible reactions are quite common in biological systems and can be written briefly as a single reaction equation. For instance, (2.2)-(2.3) can be represented as

2P <==> P_2   (rates k_d, k_{-d})   (2.4)

or sometimes as

2P <==> P_2   (equilibrium constant K_eq).   (2.5)

Here, K_eq = k_d / k_{-d} is called the equilibrium constant. It is important to remember that the notation for a reversible reaction is simply a convenient way of representing two separate reactions. When modeling, we still need to decompose a reversible reaction, or more complex reactions, into a sequence of elementary reactions in order to analyze and simulate the dynamics. Before introducing different approaches for modeling systems of coupled biochemical reactions in the next section, we first describe in detail some basic biochemical processes and show how their essential features can be captured with fairly simple systems of coupled biochemical reactions. Generally, a complex molecular network intertwines various processes, including gene regulation, protein interactions, metabolism, and signal transduction. Reactions within each process are relatively strongly correlated, whereas reactions between these processes are relatively independent; therefore, when modeling, we only need to couple the reactions involved in the system or process of interest. Transcription is a key cellular process for gene regulation, and control of transcription is a fundamental regulatory mechanism in a living organism. The crucial stage of the transcription process is the binding of an RNAP to the promoter of a gene, regulated by various TFs, to initiate transcription and produce mRNA. An RNAP is an enzyme that copies the genetic sequence of a gene and synthesizes the mRNA by attaching to the DNA strand.
A TF, or sequence-specific DNA binding factor, is a protein that binds to specific DNA sequences and thereby controls the transcription of genetic information from DNA to RNA by promoting or blocking the recruitment of RNAP to specific genes, acting as an activator or a repressor. A particular feature of TFs is that they contain one or more DNA binding domains, which attach to specific DNA sequences adjacent to the genes that they regulate. Other proteins or chemicals without DNA binding domains also play crucial roles in gene regulation, e.g., by binding to a TF to form a transcriptional complex; these are called cofactors. In this book, we consider a TF to be one protein or one complex unless otherwise specified. Consider the case of one TF P, which can be a monomer or a complex, and one free DNA binding site D in the promoter region of a gene that produces mRNA. The transcription process, as a system of coupled biochemical reactions, can be represented as follows:

Figure 2.1 Transcription regulation with two TFs (P and Q) and two binding sites on the promoter of a gene. There are eight cases for the transcription regulation: (a) no binding, (b) RNAP binding, (c) P bound to its site, (d) Q bound to its site, (e) Q bound to its site with RNAP binding, (f) P bound to its site with RNAP binding, (g) P and Q bound to their sites with RNAP binding, and (h) P and Q bound to their sites. In four of these cases, i.e., (b), (e), (f), and (g), RNAP binds to the promoter to produce mRNA.

P + D <==> P·D   (rates k_a, k_{-a}),   (2.6)
D --k_d--> mRNA + D,   (2.7)
P·D --k_dc--> mRNA + P·D,   (2.8)

where k_a, k_{-a}, k_d, and k_dc are the rate constants of the respective reactions, and P·D is a protein-DNA complex denoting TF P bound to the binding site D in the promoter. Note that, to simplify the expression, we do not explicitly include RNAP in the model, although RNAP is necessary for initiating the transcription process. Transcription reactions (2.7) and (2.8) indicate

that the transcription rates for producing mRNA, k_d without the TF and k_dc with the TF, are different. If the TF is an activator that enhances transcription (i.e., the TF recruits RNAP to bind to the promoter and initiate transcription), k_dc will clearly be larger than k_d. On the other hand, if the TF is a repressor that inhibits transcription (i.e., the TF prevents RNAP from binding to the promoter and initiating transcription), k_dc will be smaller than k_d. Unlike the single binding reaction (2.6), the transcription reactions (2.7) and (2.8) are actually chains of reactions, owing to the synthesis of an RNA sequence from a number of nucleotides, and are generally slow. Sometimes there are multiple TFs that bind to different binding sites to regulate the expression of the same gene. Such a case can be modeled in a similar way. For example, consider the case where two TFs, P and Q, can bind to two respective binding sites on the DNA, as shown in Figure 2.1. Using similar notation, with P·D denoting P bound to the binding site D, and P·Q·D denoting Q bound to P·D or P bound to Q·D, such processes can be formulated by the following set of reactions:

P + D <==> P·D   (rates k_1, k_{-1}),   (2.9)
Q + D <==> Q·D   (rates k_2, k_{-2}),   (2.10)
P + Q·D <==> P·Q·D   (rates k_3, k_{-3}),   (2.11)
Q + P·D <==> P·Q·D   (rates k_4, k_{-4}),   (2.12)
D --k_5--> mRNA + D,   (2.13)
P·D --k_6--> mRNA + P·D,   (2.14)
Q·D --k_7--> mRNA + Q·D,   (2.15)
P·Q·D --k_8--> mRNA + P·Q·D.   (2.16)

Clearly, the transcription reactions (2.13), (2.14), (2.15), and (2.16) correspond to cases (b), (f), (e), and (g) of Figure 2.1, respectively. Transcription of a gene occurs when RNAP is bound to the promoter of the DNA, regulated by the TF P (or multiple TFs). Thus, each DNA binding site is either free, D, or bound, P·D, resulting in a conservation equation

D + P·D = n_x   (2.17)

for the system (2.7)-(2.8), or

D + P·D + Q·D + P·Q·D = n_x   (2.18)

for the system (2.13)-(2.16), where n_x is the total number of binding sites of the promoter. Note that we also use the same symbols D, P, Q, Q·D, P·D, and

P·Q·D in (2.17)-(2.18) to represent the concentrations of the respective molecules.

Similarly to transcription, translation is another complicated process, sometimes involving several hundred reactions to produce a single protein from a single mRNA. The key stages of the translation process are the binding of a ribosome to the mRNA, the translation of the mRNA into a polypeptide chain, and the folding of the chain into a functional protein. For instance, the production of a protein and of a complex of multiple proteins can be formulated as a set of reactions as follows:

    mRNA  →  P + mRNA    (rate k_p),              (2.19)
    P + P  ⇌  P_2        (rates k_b, k_{-b}),     (2.20)
    P_2 + P_2  ⇌  P_4    (rates k_c, k_{-c}),     (2.21)

where P, P_2, and P_4 are a protein monomer, a dimer, and a tetramer, respectively. Similarly to the transcription reactions, the translation process (2.19) is also a chain of reactions. Note that we do not model every aspect of the translation and protein-complex production processes, but express the prominent stages of the translation and the features of interest. In particular, we do not consider the binding of ribosomes to the mRNA or the folding of the polypeptide chain into a functional protein.

Degradation of mRNA and protein can be modeled in a similar way:

    mRNA  →  0    (rate d_m),    (2.22)
    P  →  0       (rate d_p),    (2.23)

which mean that the mRNA or protein is transformed or degraded into nothing, where d_m and d_p are the degradation rates. In fact, mRNA and protein degradation is a rather complex process.

Note that the transcription processes (2.7)-(2.8) and (2.13)-(2.16), the translation process (2.19), and the degradation processes (2.22)-(2.23) are all chains of reactions, which are generally much slower than the binding and unbinding processes (2.9)-(2.12) and (2.20)-(2.21).

The transportation process is also important, especially in eukaryotes. Due to the existence of a nucleus, mRNA must be transported out of the cell nucleus before the translation process occurs.
In contrast, proteins (e.g., TFs) must be transported into the nucleus from the cytoplasm to regulate gene expression. Transduction has also been shown to be important, especially in higher eukaryotes and multicellular organisms, because individual cells need to communicate with each other using chemical signaling molecules, which can dissolve in the cytosol and diffuse between individual cells and their extracellular medium. By appropriately modeling this process, it might become possible to elucidate the fundamental mechanisms underlying many biological phenomena, e.g., collective behavior through intercellular signaling and

signaling pathways. A model for this process can be as simple as

    I  →  A    (rate k),    (2.24)

where, for the transportation process, I denotes an mRNA or a protein within the nucleus and A corresponds to the mRNA or the protein in the cytoplasm, or vice versa. On the other hand, for the transduction process, I and A denote intracellular and extracellular signaling molecules, respectively. Such a reaction can also be used to model the transition between different states of a protein, such as its inactive and active states.

Metabolism is a general term for catabolic and anabolic reactions. It is a highly organized process and often involves thousands of reactions that are catalyzed by enzymes. Phosphorylation in signaling cascades is also an enzyme kinetic reaction, with the kinase facilitating the phosphorylation of a substrate. Consider the simple enzyme-catalyzed biochemical reactions

    E + S  ⇌  ES  →  E + P    (rates k_1, k_{-1}; k_2).    (2.25)

The reactions comprise a reversible formation of an enzyme-substrate complex ES from the free enzyme E and the substrate S, and an irreversible release of the product P. Generally, the duration of the state ES is very short. In the reactions, the enzyme is neither produced nor consumed; therefore, its total concentration remains constant. It may be free or involved in the complex. Note that in this book, without confusion, we use both XY and X·Y to express the binding or complex of X and Y.

Besides the reactions mentioned above, there are many other reactions involved in cellular systems that we have omitted, such as post-transcriptional regulation, post-translational modification, and microRNA regulation. Although involved in different processes, many reactions can be represented in similar forms. For example, the reaction form for the formation of a heterodimer is the same as that for the formation of an enzyme-substrate complex in the enzyme-catalyzed reaction.
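The constancy of the total enzyme in (2.25) can be checked numerically. The following is a minimal sketch, with illustrative rate constants and initial concentrations (all assumptions), integrating the mass-action equations for E + S ⇌ ES → E + P by forward Euler:

```python
# Mass-action ODEs for E + S <-> ES -> E + P, integrated with forward Euler.
# Rate constants and initial concentrations are illustrative assumptions.
k1, km1, k2 = 10.0, 1.0, 5.0
E, S, ES, P = 1.0, 10.0, 0.0, 0.0
dt = 1e-4

for _ in range(100000):              # integrate up to t = 10
    v_bind = k1 * E * S - km1 * ES   # net formation of the complex ES
    v_cat = k2 * ES                  # irreversible release of the product
    E += dt * (-v_bind + v_cat)
    S += dt * (-v_bind)
    ES += dt * (v_bind - v_cat)
    P += dt * v_cat

print(E + ES)   # total enzyme, free plus complexed, stays at its initial value
```

The increments for E and ES cancel term by term, so E + ES is conserved up to floating-point error, mirroring the statement above that the enzyme is neither produced nor consumed.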
Moreover, small RNA-mediated post-transcriptional regulation can be described similarly to the formation of a heterodimer, because the small RNA (sRNA) itself is consumed via the interaction between sRNAs and their mRNA targets.

Although there are many kinds of biochemical processes and reactions, after they are decomposed into elementary reactions, most of them can be classified into monomolecular reactions (first-order reactions), bimolecular reactions (second-order reactions), or trimolecular reactions (third-order reactions) according to the number of molecules involved. A monomolecular reaction has the form of degradation (2.22)-(2.23) or transportation and transduction (2.24). When the reactant concentration in a monomolecular reaction is considered to be constant, the first-order reaction becomes a zero-order one. Bimolecular reactions are probably the most common reactions in cellular systems, e.g., (2.9)-(2.12) and (2.20)-(2.21). There are two different

ways by which two reactants combine to form one product. One is the binding of two identical molecules, such as the formation of a homodimer, and the other is the binding of two different molecules to form a heterodimer, e.g., the formation of an enzyme-substrate complex, the binding of a TF to a promoter, and the binding of a small RNA to its mRNA target. Trimolecular reactions or higher-order reactions are rare because the probability of the simultaneous collision of three or more molecules is very small. A more general reaction can be represented as (2.1), which is a reaction of order Σ_{i=1}^{m} r_i.

2.2 Molecular Networks

Although biochemical reactions are fundamental to understanding a cellular system at the molecular level due to the detailed information that these reactions provide, how these reactions form various molecular networks to facilitate specific cellular functions is unclear. Hence, it is important to elucidate not only the function of each individual reaction but also that of the associated molecular network as a whole. Note that we particularly study biological networks at the molecular level; therefore, a biological network simply means a biomolecular network or a molecular network in this book. Actually, a living organism can be viewed as a huge biochemical reaction network, which is nonlinear with both intrinsic and extrinsic stochastic fluctuations. A molecular network is assumed to organize a list of biochemical reactions in an accurate, complete, and comprehensive manner. Modeling and analyzing molecular networks may not only lead to the elucidation of the essential mechanism of how biochemical reactions generate particular cellular functions but also reveal the regulatory roles of individual reactions in the network functions.
Many types of molecular networks, such as gene regulatory networks, transcription regulatory networks, protein interaction networks, metabolic networks, and signal transduction networks, exist in a cell (Cao and Liang 2008, Chen et al. 2009, Jeong et al. 2000, Tyson et al. 2001). However, few such networks are known in their complete structures, even in the simplest bacteria. Even less is known about how the networks interact at different levels in a cell, and how to predict the complete state description of a eukaryotic cell or a bacterial organism at a given point in the future. In this sense, the study of molecular networks from a systems biology perspective is still in its infancy. Next, we show how a molecular network is represented by a graph, a stochastic system, a deterministic system, or a hybrid system, depending on the requirements for modeling and analyzing a specific cellular system.

2.3 Graphical Representation

One way to understand biochemical reactions is via graphical representation, which can provide more information and is easier to analyze than the reaction list, especially for cases with a large number of reactions. For example, the global properties of molecular networks, such as degree distribution and betweenness centrality, can be more easily obtained from graphical representations than from the reaction list. Moreover, many theoretical results, e.g., from graph theory and complex network theory, can be adopted to investigate these networks in order to obtain qualitative or quantitative predictions. In this book, we use three graphical representations to describe biochemical reactions or molecular interactions, depending on the models: interaction graphs, incidence graphs, and species-reaction graphs. Detailed descriptions of the three representations are provided in Chapter 6.

Example of Interaction Graphs

First, consider the simple enzyme kinetic reaction (2.25) as an example of the interaction graph. The graphical representation is shown in Figure 2.2. Such a diagram represents the relationship or interactions between the components in a biomolecular system and is known as an interaction graph. As described in detail in Chapters 3 and 6, in an interaction graph, each node represents the concentration or the number of a chemical. Each edge or connection can be a linear or nonlinear function of the connected node. A positive value of an edge, e.g., A → B, implies that chemical A enhances the synthesis of chemical B, while a negative value implies that chemical A represses the synthesis of chemical B, which is also represented by A ⊣ B in contrast to A → B.

Consider another example of a gene regulatory system with two genes in a eukaryotic cell. The biochemical reactions are shown below, where D_x and D_y denote the promoter regions of genes x and y, respectively.
The multimerization (2.26), (2.30), (2.31); transportation (from cytoplasm to nucleus) (2.27), (2.32); and binding reactions of TFs to promoter regions (2.28), (2.29), (2.33) are described as follows:

    P_x + P_x  ⇌  P_2x                       (rates k_1, k_{-1}),    (2.26)
    P_2x  ⇌  P'_2x                           (rates k_2, k_{-2}),    (2.27)
    P'_2x + D_y  ⇌  P'_2x·D_y                (rates k_3, k_{-3}),    (2.28)
    P'_2x + P'_2x·D_y  ⇌  P'_2x·P'_2x·D_y    (rates k_4, k_{-4}),    (2.29)

    P_y + P_y  ⇌  P_2y           (rates k_5, k_{-5}),    (2.30)
    P_2y + P_2y  ⇌  P_4y         (rates k_6, k_{-6}),    (2.31)
    P_4y  ⇌  P'_4y               (rates k_7, k_{-7}),    (2.32)
    P'_4y + D_x  ⇌  P'_4y·D_x    (rates k_8, k_{-8}),    (2.33)

where P_2x, P_4y and P'_2x, P'_4y denote protein dimers and tetramers in the cytoplasm and the nucleus, respectively. P'_2x·D_y, P'_4y·D_x, and P'_2x·P'_2x·D_y are all complexes. All these reactions are generally fast and occur within less than a few seconds.

Figure 2.2 Graphical representation of the enzyme kinetic reaction via an interaction graph. All interactions or edges are positive

The reactions for the transcription of mRNAs (m_x, m_y), translation of proteins (P_x, P_y), and degradation of proteins and mRNAs are represented as follows:

    D_x  →  m_x + D_x                          (rate k_mx0),    (2.34)
    P'_4y·D_x  →  m_x + P'_4y·D_x              (rate k_mx1),    (2.35)
    m_x  →  P_x + m_x                          (rate k_Px),     (2.36)
    D_y  →  m_y + D_y                          (rate k_my0),    (2.37)
    P'_2x·D_y  →  m_y + P'_2x·D_y              (rate k_my1),    (2.38)
    P'_2x·P'_2x·D_y  →  m_y + P'_2x·P'_2x·D_y  (rate k_my2),    (2.39)

    m_y  →  P_y + m_y    (rate k_Py),    (2.40)
    m_x  →  0            (rate d_mx),    (2.41)
    P_x  →  0            (rate d_Px),    (2.42)
    m_y  →  0            (rate d_my),    (2.43)
    P_y  →  0            (rate d_Py).    (2.44)

Figure 2.3 Schematic illustration of the graphical representation of a gene network by an interaction graph. ⊣ means a negative regulation, i.e., repression, and → implies a positive regulation, i.e., activation

Transcription and translation reactions are chains of reactions, which are considerably slower than the other reactions and generally require minutes or more to complete. The schematic representation of a gene network or gene regulatory network as an interaction graph is shown in Figure 2.3, where ⊣ means a negative regulation, i.e., repression, and → implies a positive regulation, i.e., activation. In this book, when there is no specific note on an edge of a network or a graph, a positive regulation (or interaction) and a negative regulation (or interaction) are represented by → and ⊣, respectively. However, the regulation between two chemicals is positive or negative if the edge of the regulation is specifically indicated by + or −. The regulatory interactions between all the components are explicitly expressed in the associated diagram. When there are multiple feedback loops in a molecular network, it becomes difficult to gain direct insight into the regulatory relations from the reactions alone. Even for this simple network, the interactions from P_2y to P_2x are not straightforward. However, the repression relation from P_2y to P_2x can easily be obtained from the graphical representation, as schematically illustrated in Figure 2.3.
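Reading a net regulation off an interaction graph amounts to multiplying edge signs along a path. The sketch below uses a hypothetical signed edge list loosely inspired by Figure 2.3 (the node names and sign assignments are illustrative assumptions, not the book's exact network):

```python
# Net regulatory sign along a path of an interaction graph.
# Edge signs are illustrative assumptions: +1 marks activation (->),
# -1 marks repression (-|).
edges = {
    ("P2y", "P4y"): +1,   # assumed: dimers combine into the tetramer
    ("P4y", "mx"): -1,    # assumed: the nuclear tetramer represses m_x
    ("mx", "P2x"): +1,    # m_x is translated into P_x, which dimerizes
}

def path_sign(path):
    """Multiply edge signs along a path: +1 => net activation, -1 => net repression."""
    sign = 1
    for a, b in zip(path, path[1:]):
        sign *= edges[(a, b)]
    return sign

print(path_sign(["P2y", "P4y", "mx", "P2x"]))  # -1: net repression of P2x by P2y
```

A single repressive edge on an otherwise activating path makes the net effect repressive, which is how a relation like the one from P_2y to P_2x can be read directly off the graph.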

Example of Incidence Graphs

To analyze biomolecular networks with control inputs, it is convenient to use incidence graphs, which are the same as interaction graphs except for two additional input and output nodes. Hence, compared with an interaction graph with n nodes, an incidence graph has n+2 nodes. All definitions in an incidence graph are the same as those of the corresponding interaction graph. When the input and output nodes are viewed as ordinary nodes, it becomes an interaction graph. A detailed description of the incidence graph is provided in Chapter 6. For the example of (2.25), assuming the substrate S and the product P to be the input and output of the system, respectively, the incidence graph can simply be represented by Figure 2.4, which has two additional nodes (input and output nodes) compared with the interaction graph of Figure 2.2. The incidence graph is used to study the relation between input and output.

Figure 2.4 Graphical representation of the enzyme kinetic reaction by an incidence graph. All the definitions are the same as those of Figure 2.2, except the input and output nodes. I and O represent the input and output nodes, respectively

Example of Species-reaction Graphs

In addition to interaction graphs and incidence graphs for the analysis of gene regulatory networks, SR graphs are used mainly in metabolic networks. An SR graph is composed of chemicals and reactions and is a general technique for studying the relationship between reaction network structures and the capacity for multistability. Unlike interaction graphs and incidence graphs, which are directed graphs, an SR graph is an undirected graph. A detailed description of incidence graphs and SR graphs is provided in Chapter 6. For the example of the enzyme reaction (2.25), the SR graph is illustrated in Figure 2.5, where each circle represents a chemical (or a species) and each box stands for a reaction.
Each edge or arc is labeled with the name of the complex in which the species appears. An SR graph is mainly used to study the multistability of a molecular network.

Figure 2.5 Graphical representation of an enzyme kinetic reaction by an SR graph

2.4 Biochemical Kinetics

Molecular networks can be modeled and analyzed as stochastic or deterministic systems. Different approaches may highlight different aspects of the same list of reactions. Moreover, even for a given model, many different aspects can be revealed by theoretical and numerical analysis of the network. In dynamical modeling, the reaction rates that describe the change of the participating species over time need to be defined, e.g., in terms of the number or the concentration of each molecule. In order to obtain the temporal behavior of the species, the functional relation of the change in the number or the concentration needs to be specified, depending on the approach chosen. Specifically, for each reaction, we identify its rate constant, which depends on temperature and specifies the amount of time that the reaction takes, and an associated rate law, which specifies the amount of state change or the probability that the reaction occurs in a small time interval. For example, consider a simple reaction of the form

    A  →  B    (rate k).    (2.45)

This reaction means that A is transformed to B at a rate of k. In the deterministic formulation, the reaction specifies the state changes in which the

concentration of A decreases concomitantly with the increase in the concentration of B. The amount of state change in a small time interval dt is given by k[A]dt, where [A] is the concentration of A. On the other hand, in the stochastic formulation, the reaction specifies the state changes in which the total number of A decreases by one and the total number of B increases by one. The probability of this reaction is given by k{#A}dt, where {#A} is the number of A present. Note that in this book, [A] represents the concentration of species A, but sometimes A also stands for the concentration of species A to simplify the representation. For the same reason, we also occasionally drop # to represent the number of species A directly by using A instead.

When modeling biochemical kinetics, both the stochastic and deterministic approaches follow the mass action law, which states that the reaction rate is proportional to the probability of a collision of the reactants. This probability is in turn proportional to the concentrations of the reactants to the power of the molecularity, i.e., the number in which they enter the specific reaction. There are some minor differences between stochastic and deterministic rate constants. In order to conduct a stochastic simulation, the deterministic rate constants must be converted in an appropriate way into stochastic rate constants. Next, we will provide a general mathematical framework of both stochastic and deterministic approaches for modeling molecular networks. In particular, we will show how the master equations, Langevin (stochastic differential) equations, cumulant equations, and then deterministic differential equations can be obtained to model biochemical kinetics, and further provide a brief comparison among them.
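The two rate laws for the reaction (2.45) can be contrasted in a few lines of code. This is a minimal sketch; the rate constant, time step, and initial amounts are illustrative assumptions:

```python
import random

# Deterministic vs. stochastic bookkeeping for the reaction A -> B of (2.45).
k, dt = 0.5, 1e-3

# Deterministic: concentrations change continuously by k*[A]*dt.
A_conc, B_conc = 2.0, 0.0
dA = k * A_conc * dt
A_conc, B_conc = A_conc - dA, B_conc + dA

# Stochastic: one discrete A -> B event occurs in [t, t+dt)
# with probability k*{#A}*dt; the molecule counts change by one.
rng = random.Random(0)
A_num, B_num = 100, 0
if rng.random() < k * A_num * dt:
    A_num, B_num = A_num - 1, B_num + 1

print(A_conc, A_num + B_num)
```

In both bookkeepings the total amount is conserved; the difference is that the deterministic state is a continuous concentration while the stochastic state is an integer count that jumps by one molecule per event.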
2.5 Stochastic Representation

A cellular system is well regulated at the molecular level but is inherently noisy: transcriptional control, alternative splicing, translation, diffusion, and chemical modification reactions all involve stochastic fluctuations. Such stochastic noise may not only affect the dynamics of biological systems but may also be exploited by living organisms to actively facilitate certain functions. From an evolutionary viewpoint, noise is also assumed to be used for cellular and population variability control. Due to the very low copy numbers of many species in living cells, the origins of stochasticity can be traced to the random transitions among discrete chemical states, which implies that a model of a molecular network should be able to represent this discrete nature of small numbers both qualitatively and quantitatively. To capture the discrete and stochastic nature of biochemical kinetics with low concentrations or small numbers of molecules, a stochastic modeling formulation can be adopted to describe such a biological system. In stochastic modeling, one can estimate the time of each reaction from statistical properties. For example, stochastic simulation provides the time of a reaction from

a probability distribution. The result of a simulation run is one possible realization of the temporal evolution of the system. The stochastic framework considers the exact numbers of molecules present, which are discrete quantities. Such a strategy may identify how many molecules of each component are present in the system. The stochastic framework grasps the essence of the stochastic collisions of biochemical components, i.e., the components change discretely, but which change occurs and when it occurs are probabilistic. Actually, a cellular system at the molecular level can be considered to be governed by stochastic processes with many random events, or by a continuous-time, discrete-space Markov chain. Consider, for example, a DNA binding protein P, whose binding to and unbinding from DNA follow the elementary reactions

    P + DNA  →  P·DNA    (rate k_+),    (2.46)
    P·DNA  →  P + DNA    (rate k_-).    (2.47)

This list of reactions involves chemicals of three types: P, DNA, and P·DNA, which is a complex of P and DNA. There are multiple possible reactions involved in (2.46)-(2.47), and the states change discretely when any one of the reactions occurs. Assume that the system state at time t is ({#P}, {#DNA}, {#P·DNA}). After a short time dt, the state will change to ({#P − 1}, {#DNA − 1}, {#P·DNA + 1}) with probability k_+{#P}{#DNA}dt, to ({#P + 1}, {#DNA + 1}, {#P·DNA − 1}) with probability k_-{#P·DNA}dt, or stay in the same state with probability 1 − k_+{#P}{#DNA}dt − k_-{#P·DNA}dt, where {#A} indicates the number of molecules of A (Gibson and Mjolsness 2001).

In stochastic modeling, one can also deal with the probabilities rather than the numbers of molecules. In other words, the state or variable is the probability distribution over all configurations. A configuration is a list of the numbers of molecules.
The dynamics of the probabilities obey a master equation, a linear differential equation which describes the temporal evolution of the probability distribution (Van Kampen 1992) for the discrete transitions of molecules.

Master Equations for a General Molecular Network

The master-equation description accounts for the probabilistic nature of cellular biochemical processes and can be viewed as a continuous-time, discrete-space Markov model. It describes the time evolution of the probability of having a certain number of molecules, and its result is usually taken as a gold standard for numerical simulation in computational biology due to its detailed representation and also due to the lack of experimental data. In the master equation, reaction rates are transformed into probability transition rates. Suppose that the number of molecules of reactant X can be any non-negative integer {#X} = n. Let P_n(t) denote the probability that there are

n molecules of X at time t. Then, we want to know the temporal evolution of the probability, i.e., the law which governs its development. In other words, we want to express P_n(t + dt) in terms of P_m(t) for all m, i.e., to express the probability of having n molecules at time t + dt in terms of the probabilities of all possible values m of the number of molecules at time t. The occurrence of the event {#X(t + dt)} = n can be thought of as the occurrence of the mutually exclusive events ({#X(t + dt)} = n, {#X(t)} = m) for all possible m. Taking the conditional probabilities into account, we have

    P({#X(t + dt)} = n) = Σ_m P({#X(t + dt)} = n, {#X(t)} = m)
                        = Σ_m P_{m,n} P({#X(t)} = m),              (2.48)

where P_{m,n} = P({#X(t + dt)} = n | {#X(t)} = m) is the transition probability of changing from m to n molecules in the time interval dt, for m, n = 0, 1, 2, .... The summation in (2.48) is over all possible m. The stochastic representation (2.48) obeys a Markov process because the transition probability neither explicitly depends upon the time at which the transition occurs, nor does it depend on the path along which the change occurred. Taking the limit as dt → 0 in the difference P({#X(t + dt)} = n) − P({#X(t)} = n) leads to a differential equation in the probabilities, i.e., the master equation.

An Illustrative Example of the Master Equation

In order to clearly explain the master equation, consider the reactions (2.46)-(2.47). Let X = (X_1, X_2, X_3) = ({#P}, {#DNA}, {#P·DNA}) be the numbers of the three species. Define P(X; t) to be the probability function for the state ({#P}, {#DNA}, {#P·DNA}) = X at time t. The probability of being in state X = (X_1, X_2, X_3) at time t + dt is composed of the sum of terms which describe all possible previous states multiplied by their respective transition probabilities.
Then, the probability at t + dt is given by

    P(X; t + dt) = P_{(X_1+1, X_2+1, X_3−1), (X_1, X_2, X_3)} P(X_1+1, X_2+1, X_3−1; t)
                 + P_{(X_1−1, X_2−1, X_3+1), (X_1, X_2, X_3)} P(X_1−1, X_2−1, X_3+1; t)
                 − P_{(X_1, X_2, X_3), (X_1−1, X_2−1, X_3+1)} P(X_1, X_2, X_3; t)
                 − P_{(X_1, X_2, X_3), (X_1+1, X_2+1, X_3−1)} P(X_1, X_2, X_3; t)
                 + P(X_1, X_2, X_3; t).                                          (2.49)

The transition probabilities are assumed to be proportional to the numbers of molecules and to the time interval dt:

    P_{(X_1+1, X_2+1, X_3−1), (X_1, X_2, X_3)} = k_+ (X_1+1)(X_2+1) dt,    (2.50)
    P_{(X_1−1, X_2−1, X_3+1), (X_1, X_2, X_3)} = k_- (X_3+1) dt,           (2.51)
    P_{(X_1, X_2, X_3), (X_1−1, X_2−1, X_3+1)} = k_+ X_1 X_2 dt,           (2.52)
    P_{(X_1, X_2, X_3), (X_1+1, X_2+1, X_3−1)} = k_- X_3 dt.               (2.53)

Substituting (2.50)-(2.53) into (2.49), taking P(X_1, X_2, X_3; t) to the left-hand side, dividing by dt, and then considering the limit as dt → 0, we obtain a differential equation in the probability as follows:

    dP(X; t)/dt = lim_{dt→0} [P(X_1, X_2, X_3; t + dt) − P(X_1, X_2, X_3; t)] / dt
                = k_+ (X_1+1)(X_2+1) P(X_1+1, X_2+1, X_3−1; t)
                + k_- (X_3+1) P(X_1−1, X_2−1, X_3+1; t)
                − (k_+ X_1 X_2 + k_- X_3) P(X_1, X_2, X_3; t).                (2.54)

An equation of the type (2.54) is called a master equation; it describes a continuous-time, discrete-space Markov chain because the probability of the next transition depends only on the current state, and not on the history of states. The first two terms are the gain of state X = (X_1, X_2, X_3) due to transitions from other states, and the last term (i.e., −(k_+ X_1 X_2 + k_- X_3) P(X_1, X_2, X_3; t)) is the loss due to transitions from X = (X_1, X_2, X_3) to other states.

A General Molecular Network

Consider a general molecular network with N molecular species {S_1, ..., S_N} that react through M reaction channels {R_1, ..., R_M}. Let X = (X_1, ..., X_N) be the state of the molecules at time t, i.e., X_i is the number of molecules of the ith species at time t. Define P(X; t) to be the probability function for the state X at time t. Here, we occasionally drop t to simplify the expression in the following, e.g., X = X(t), without confusion. Then, the temporal evolution of the system is governed by a set of transitions X' → X and X → X'.
The probability P(X; t) that the system is in state X at time t evolves according to the master equation (Van Kampen 1992)

    ∂P(X; t)/∂t = Σ_{X'} { P_{X'X} P(X'; t) − P_{XX'} P(X; t) },    (2.55)

where P_{X'X} is the transition probability from state X' to state X, and P_{XX'} is the transition probability from state X to state X'. The master equation is nothing but a gain-loss equation for the probability distribution P(X; t), or a stochastic birth-and-death process. The summation in (2.55) is over all possible states X'. The first term is the gain of state X due to transitions from other states X' to X, and the second term is the loss due to transitions from X to other states.

Define a_j(X)dt to be the probability, given X, that one R_j reaction occurs in the next infinitesimal time interval [t, t + dt), and ν_{ji} to be the change in the number of S_i molecules produced by one R_j reaction (j = 1, ..., M and i = 1, ..., N). The master equation (2.55) for a general molecular network can then be rewritten as

    ∂P(X; t)/∂t = Σ_{j=1}^{M} { a_j(X − ν_j) P(X − ν_j; t) − a_j(X) P(X; t) },    (2.56)

where ν_j = (ν_{j1}, ..., ν_{jN}). Note that the state X is discrete but the probability distribution P is continuous. Generally, the master equation is not analytically or numerically solvable in any but the simplest cases. Therefore, one has to resort to Monte Carlo types of simulation that produce a random walk through the possible states of the system. In other words, instead of calculating the probability distribution P(X; t), these approaches simulate the time evolution of a particular trajectory, starting at an initial state X_0. Various methods have been developed, such as the Gillespie stochastic simulation algorithm (SSA) (Gillespie 1976, Gillespie 1977). Recently, more computationally efficient algorithms have been developed, such as the Gibson-Bruck algorithm (Gibson and Bruck 2000), approximate simulation strategies (Gillespie 2001, Gillespie and Petzold 2003), hybrid simulation strategies (Kiehl et al. 2004, Rao and Arkin 2003, Puchalka and Kierzek 2004), and parallel simulation strategies (Chen et al. 2005). However, making stochastic simulation algorithms more efficient and applying them to larger models are still the subject of active research. For example, the perfect sampling algorithm was proposed to recast Gillespie's SSA in the light of Markov chain Monte Carlo methods and combine it with the dominated coupling-from-the-past algorithm in order to provide guaranteed sampling from the stationary distribution (Hemberg and Barahona 2007).
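For a small state space, the master equation (2.54) for the binding example can also be integrated directly. In the sketch below, binding conserves both totals, so the state reduces to the number of complexes c, with X_1 = n_P − c, X_2 = n_D − c, X_3 = c; the totals and rate constants are illustrative assumptions:

```python
# Forward-Euler integration of the master equation (2.54) for P + DNA <-> P.DNA,
# parameterized by the number of complexes c.  Totals and rates are assumptions.
kp, km = 0.1, 1.0           # k_plus and k_minus
nP, nD = 5, 3               # total P molecules and DNA binding sites
P = [1.0] + [0.0] * nD      # start with no complexes: P(c = 0) = 1

def step(P, dt):
    """One Euler step of dP_c/dt from (2.54), restricted to the variable c."""
    Q = P[:]
    for c in range(nD + 1):
        gain, loss = 0.0, km * c * P[c]
        if c > 0:
            gain += kp * (nP - c + 1) * (nD - c + 1) * P[c - 1]   # binding into c
        if c < nD:
            gain += km * (c + 1) * P[c + 1]                       # unbinding into c
            loss += kp * (nP - c) * (nD - c) * P[c]               # binding out of c
        Q[c] += dt * (gain - loss)
    return Q

for _ in range(10000):
    P = step(P, 1e-3)

print(sum(P))   # total probability is conserved
```

Every loss term reappears as a gain term of a neighboring state, so the total probability stays at one up to floating-point error, which is a useful sanity check on any hand-derived master equation.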
Equilibrium Probability Distribution

In the stochastic framework, it is difficult to define an equilibrium configuration in terms of the numbers of molecules as in a deterministic framework, since any reaction which changes the numbers of molecules takes place with some probability. In deterministic differential equations, an equilibrium means that there is no net change in the numbers or concentrations of molecules. The concept of the equilibrium suffices for differential equation models, but not for stochastic models. In contrast, for stochastic models, there is an equilibrium probability distribution, i.e., a set of probabilities that the system has certain numbers of molecules, even though the numbers of molecules may change. Generally, it is difficult, if not impossible, to obtain the equilibrium probability distribution of a master equation analytically, although some statistical quantities such as means and variances can be obtained from it. However, for some

simple cases, it can be obtained by recursion relations. For example, consider a set of coupled reactions called the Lotka-Volterra system,

    X  →  2X        (rate k_1),    (2.57)
    X + Y  →  2Y    (rate β),      (2.58)
    Y  →  0         (rate k_2).    (2.59)

The time evolution of the joint probability P(X, Y; t) obeys the following master equation:

    ∂P(X, Y; t)/∂t = k_1 (X−1) P(X−1, Y; t) − k_1 X P(X, Y; t)
                   + k_2 (Y+1) P(X, Y+1; t) − k_2 Y P(X, Y; t)
                   + β (X+1)(Y−1) P(X+1, Y−1; t) − β X Y P(X, Y; t).    (2.60)

To obtain its equilibrium distribution, we define

    A(X; t) = Σ_{Y=0}^{∞} P(X, Y; t)  and  B(Y; t) = Σ_{X=0}^{∞} P(X, Y; t),    (2.61)

where A(X; t) is the marginal probability of X at time t, irrespective of the number of Y. The probability B(Y; t) is defined analogously. Supposing that there is an equilibrium probability distribution, one obtains

    Ȧ(X; t) = 0  for X = 0, 1, 2, ...,    (2.62)
    Ḃ(Y; t) = 0  for Y = 0, 1, 2, ...,    (2.63)

where Ȧ = dA/dt and Ḃ = dB/dt. Then, using recursion relations, we can obtain the equilibrium distribution

    P(X, Y; t) = 0  for all X = 0, 1, ... and Y = 1, 2, ....    (2.64)

To derive (2.64) in more detail, on the basis of the definition of B(Y; t) in (2.61), by summing (2.60) over all X, we have

    Ḃ(Y; t) = k_2 (Y+1) B(Y+1; t) − k_2 Y B(Y; t)
            − β Y Σ_{X=0}^{∞} X P(X, Y; t) + β (Y−1) Σ_{X=0}^{∞} X P(X, Y−1; t).    (2.65)

Then, we have

    0 = Ḃ(0; t) = k_2 (0+1) B(0+1; t) − k_2 · 0 · B(0; t)
      + β (0−1) Σ_{X=0}^{∞} X P(X, 0−1; t) − β · 0 · Σ_{X=0}^{∞} X P(X, 0; t)
      = k_2 B(1; t).                                              (2.66)

Note that P(X, −1; t) = 0 because all X, Y are non-negative numbers. Hence, we obtain

    B(1; t) = Σ_{X=0}^{∞} P(X, 1; t) = 0.    (2.67)

Because all the probabilities are non-negative, we have

    P(X, 1; t) = 0  for X = 0, 1, 2, ....    (2.68)

Proceeding with this process, we can obtain (2.64). Similarly, by using

    Ȧ(X; t) = k_1 (X−1) A(X−1; t) − k_1 X A(X; t)
            − β X Σ_{Y=0}^{∞} Y P(X, Y; t) + β (X+1) Σ_{Y=0}^{∞} Y P(X+1, Y; t),    (2.69)

we obtain

    P(X, Y; t) = 0  for all X = 1, 2, ... and Y = 0, 1, ....    (2.70)

Combining (2.64) and (2.70), it follows that the only non-zero probability left in the equilibrium distribution is

    P(0, 0; t) = 1.    (2.71)

Therefore, the stochastic Lotka-Volterra system does not have an equilibrium distribution other than the one in which both species X and Y are extinct (Reddy 1975), i.e., X = Y = 0. However, on the basis of (2.57)-(2.59), the deterministic Lotka-Volterra system can be written as

    dX/dt = k_1 X − β X Y,    (2.72)
    dY/dt = β X Y − k_2 Y.    (2.73)

In this book, we occasionally use Ẋ to represent dX/dt. Clearly, this system has an equilibrium X̄ = k_2/β and Ȳ = k_1/β, which is generally non-zero. The difference arises from the different possibilities for the identification of birth and death terms in the deterministic equation. Another example where the equilibrium probability distribution can be solved analytically is the model for the signaling cycle (Levine et al. 2007).
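The contrast between the two descriptions is easy to verify for the deterministic side. A minimal sketch with illustrative rate constants (the values are assumptions) confirms that (2.72)-(2.73) vanish at X = k_2/β, Y = k_1/β:

```python
# Right-hand side of the deterministic Lotka-Volterra equations (2.72)-(2.73).
# Rate constants are illustrative assumptions.
k1, k2, beta = 1.0, 0.5, 0.2

def rhs(X, Y):
    dX = k1 * X - beta * X * Y
    dY = beta * X * Y - k2 * Y
    return dX, dY

print(rhs(k2 / beta, k1 / beta))   # both components are (approximately) zero
```

The deterministic model therefore sits at a non-trivial coexistence equilibrium, whereas, as shown above, the only equilibrium probability distribution of the stochastic model places all mass on extinction.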

Stochastic Simulation

Although the analytical solution of the master equation is rarely available, the density function can be constructed numerically using the SSA (Gillespie 1976). Generally, the SSA first constructs numerical realizations of X(t) and then averages the results of many realizations. Specifically, we compute the reaction probability density function P(Δt, μ | X; t), which is the joint probability density function of two random variables, the time to the next reaction Δt and the index of the next reaction μ, given X. The reaction probability density function for the general molecular network (2.56) takes the form

P(Δt, μ | X; t) = a_μ(X) exp(−a_0(X)Δt)  (Δt ≥ 0; μ = 1, ..., M),  (2.74)

with

a_0(X) = Σ_{j=1}^M a_j(X).  (2.75)

The reaction probability density function provides the basis for the SSA. According to the joint density function (2.74), the next reaction and the time of its occurrence can be generated through the direct method as follows. Draw two random numbers r_1 and r_2 from a uniform distribution on the unit interval [0, 1). The time to the next reaction Δt and the index of the next reaction μ, given X, can then be taken as follows:

Δt = (1/a_0(X)) ln(1/r_1),  (2.76)
μ = the smallest integer satisfying Σ_{j=1}^μ a_j(X) > r_2 a_0(X).  (2.77)

The Gillespie direct method for exact simulation of the master equation (2.56) is shown in Table 2.1. Despite its simplicity of expression, the computational cost of the Gillespie algorithm increases drastically with the number of reactions and the number of molecules. The increase in computational cost is primarily due to the generation of the two random numbers and the enumeration of reactions to determine the next reaction. For example, when the number of molecules is equal to 10^6, the time step Δt will become excessively small, i.e., on the order of 10^{−6}, which means that more time steps or iterations are needed.
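The direct method of (2.76)–(2.77) fits in a few lines of code. The following generic Python sketch (illustrative; the function name `ssa_direct` and the demonstration system are ours, not from the text) takes the propensity functions and stoichiometry as inputs and is demonstrated on a simple birth–death process whose stationary mean is known:

```python
import math
import random

def ssa_direct(x0, stoich, propensities, t_max, seed=1):
    """Gillespie direct method (Table 2.1): draw Delta-t from (2.76)
    and the reaction index mu from (2.77), then fire reaction mu."""
    rng = random.Random(seed)
    x, t = list(x0), 0.0
    while t < t_max:
        a = [f(x) for f in propensities]      # propensities a_j(X)
        a0 = sum(a)                           # a_0(X) as in (2.75)
        if a0 == 0.0:                         # absorbing state: stop
            break
        # (2.76); using 1 - r1 keeps the log argument in (0, 1]
        t += math.log(1.0 / (1.0 - rng.random())) / a0
        # (2.77): smallest mu with cumulative propensity > r2 * a0
        target, acc = rng.random() * a0, 0.0
        for mu, a_mu in enumerate(a):
            acc += a_mu
            if acc > target:
                break
        for i, nu in enumerate(stoich[mu]):   # apply state change
            x[i] += nu
    return x, t

# Demonstration on a simple birth-death process (an illustrative
# system, not one of the book's examples): 0 -> X at rate k and
# X -> 0 at rate gamma*X; the stationary mean is k/gamma.
k, gamma = 10.0, 1.0
x, t = ssa_direct([0], [[+1], [-1]],
                  [lambda x: k, lambda x: gamma * x[0]], t_max=50.0)
```

Averaging the endpoint over many realizations, as the text describes, recovers the stationary mean k/γ.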
For illustrative purposes, we use the example for the Gillespie algorithm presented in (Gillespie 2001). There are three molecular species (S_1, S_2, S_3) and four reaction channels as follows:

j = 1:  S_1 --c_1--> 0,  (2.78)
j = 2:  S_1 + S_1 --c_2--> S_2,  (2.79)
j = 3:  S_2 --c_3--> S_1 + S_1,  (2.80)
j = 4:  S_2 --c_4--> S_3.  (2.81)

Table 2.1 Gillespie direct method for exact simulation of (2.56) (Gillespie 1977)

Step 1. Initialization: set t = 0 and fix the initial numbers of molecules X_0.
Step 2. Calculate the propensity functions a_j, j = 1, ..., M.
Step 3. Generate two random numbers r_1 and r_2 in [0, 1).
Step 4. Determine Δt and μ according to (2.76)–(2.77).
Step 5. Execute reaction μ and advance time by Δt, i.e., t ← t + Δt. If t reaches T_max, terminate the computation. Otherwise, go to Step 2.

For these reactions, the reaction propensities, which describe the probability of one molecule colliding with another, take the forms

a_1 = c_1 X_1,  (2.82)
a_2 = c_2 X_1(X_1 − 1)/2,  (2.83)
a_3 = c_3 X_2,  (2.84)
a_4 = c_4 X_2,  (2.85)

where X_1 and X_2 are the numbers of molecules of S_1 and S_2, respectively. The simulation results are shown in Figure 2.6 when using the rate constant values

c_1 = 1, c_2 = 0.002, c_3 = 0.5, c_4 = 0.04  (2.86)

and the initial conditions

X_1 = 10^5, X_2 = X_3 = 0,  (2.87)

where X_3 is the number of molecules of S_3.

Time delay exists in many cellular processes, and in a mathematical model the effect of the delay may be drastic. When delay times are significant, both analytical and numerical modeling should take their crucial influences into account. To explore the combined effects of time delay and intrinsic noise on cellular dynamics, a modified Gillespie method was developed (Bratsun et al. 2005), which allows the incorporation of delayed reactions and accounts for the non-Markovian properties of random biochemical events with time delay. The formal steps of the modified method are shown in Table 2.2. To explain the modified method more clearly, we use a simple model presented in (Bratsun et al. 2005) as an example. Suppose that the protein can exist both in the form of monomers X_1 and dimers X_2, and that protein production occurs with a time lag τ and can only occur if the operator D is unoccupied at time t.
The reaction channels are as follows:

Figure 2.6 Stochastic simulation of the system (2.78)–(2.81) for the rate constants (2.86) and the initial conditions (2.87) (from (Gillespie 2001))

D_0 --k_0--> D_0 + X_1  (delayed by τ),  (2.88)
X_1 + X_1 --k_1--> X_2,  (2.89)
X_2 --k_{−1}--> X_1 + X_1,  (2.90)
D_0 + X_2 --k_2--> D_1,  (2.91)
D_1 --k_{−2}--> D_0 + X_2,  (2.92)
X_1 --d--> 0,  (2.93)

Table 2.2 Modified Gillespie method to incorporate delayed reactions (Bratsun et al. 2005)

Step 1. Initialization: set t = 0 and fix the initial numbers of molecules X_0 and the reaction counter i.
Step 2. Calculate the propensity functions a_j, j = 1, ..., M.
Step 3. Generate two random numbers r_1 and r_2 in [0, 1).
Step 4. Compute the time interval Δt until the next reaction according to (2.76).
Step 5. Check whether there are delayed reactions scheduled within the time interval [t, t + Δt]. If yes, time t advances to the time t_d = t + τ of the next scheduled delayed reaction, the states X are updated according to the delayed reaction channel, and the counter is increased, i ← i + 1. Proceed to Step 2. Otherwise, go to Step 6.
Step 6. Find the channel of the next reaction μ according to (2.77).
Step 7. If the selected reaction μ is not delayed, update X by executing reaction μ, update time t ← t + Δt, and increase the counter, i ← i + 1. If the selected reaction is delayed, the update is postponed until time t_d. If t reaches T_max, terminate the computation. Otherwise, go to Step 2.

Here, D_0 and D_1 are the unoccupied operator and the operator occupied by the repressor X_2, respectively. Utilizing the inherent separation of time scales and eliminating the fast variables X_2 and D_1 corresponding to reactions (2.89)–(2.92), the deterministic equation for the number of monomers X_1 takes the form

(1 + 4εX_1 + 4εδX_1/(1 + εδX_1²)²) dX_1/dt = k_0/(1 + εδX_1²(t − τ)) − d X_1,  (2.94)

where ε = k_1/k_{−1} and δ = k_2/k_{−2}. (2.94) is an approximation of the original system using the quasi-steady-state equilibrium of (2.89)–(2.92), but clearly, deterministic simulation of (2.94) is considerably simpler. The stochastic simulation results are shown in Figure 2.7. See (Bratsun et al. 2005) for more details on the modified Gillespie method.
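The bookkeeping of Table 2.2 can be illustrated on a stripped-down delayed birth–death process rather than the full operator model (2.88)–(2.93). In this Python sketch (illustrative; the function name `delayed_ssa` and all parameter values are ours), a production event initiated at time t only updates the state at t + τ, while degradation is executed immediately:

```python
import heapq
import math
import random

def delayed_ssa(k0, d, tau, t_max, seed=1):
    """Modified Gillespie method (Table 2.2) for the toy system
    0 -> X at rate k0, completed only after a delay tau, and
    X -> 0 at rate d*X, executed immediately."""
    rng = random.Random(seed)
    x, t = 0, 0.0
    pending = []              # completion times of delayed events
    while t < t_max:
        a0 = k0 + d * x       # total propensity
        dt = math.log(1.0 / (1.0 - rng.random())) / a0
        # Step 5: does a scheduled delayed reaction fire first?
        if pending and pending[0] <= t + dt:
            t = heapq.heappop(pending)      # advance to t_d
            x += 1                          # delayed production completes
            continue                        # back to Step 2
        t += dt
        # Steps 6-7: choose the channel
        if rng.random() * a0 < k0:
            heapq.heappush(pending, t + tau)  # schedule completion
        else:
            x -= 1                            # immediate degradation
    return x

x_end = delayed_ssa(k0=5.0, d=0.5, tau=2.0, t_max=100.0)
```

For this linear toy system the delay shifts the transient but leaves the stationary mean at k_0/d, which an average over realizations confirms.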
Although the equilibrium probability distribution of the master equation (2.55) is difficult to derive analytically, it can be calculated numerically on the basis of the following algebraic equation:

Σ_{X'} { P_{X'X} P(X'; t) − P_{XX'} P(X; t) } = 0.  (2.95)

For instance, by enumerating all possible (feasible) configurations or states of X and then substituting them into (2.95), we have simultaneous equations

for all feasible states. Then, exploiting the sparse structure of these linear equations, e.g., using the Arnoldi method (Cao and Liang 2008), we can estimate the equilibrium probability distribution P for all states or configurations of X.

The exact stochastic simulation algorithms are computationally expensive and thereby realistic only for the computation of a small-scale system. To simulate a large-scale biomolecular system, approximation schemes are usually adopted, e.g., the finite state projection approach, the parallel computation scheme, the adaptive time-step procedure, and the model reduction (or aggregation) method based on time-scale separation or continuous-discrete variable separation, by exploiting the special dynamical or topological properties of the system concerned.

Figure 2.7 Stochastic simulation of the system (2.88)–(2.93). The parameter values are ε = 0.1, δ = 0.2, k_1 = 100, k_{−1} = 1000, k_2 = 200, k_{−2} = 1000, d = 4, τ = 20, and k_0 = 20 (a) and k_0 = 70 (b) (from (Bratsun et al. 2005))
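For a finite (or truncated) state space, (2.95) together with normalization is just a solvable linear system. The Python sketch below (illustrative; the birth–death example, truncation level N, and all values are ours) builds the transition-rate matrix of the process 0 → X (rate k), X → 0 (rate gX) on states 0..N and solves the balance equations by Gaussian elimination, recovering the Poisson stationary distribution with mean k/g:

```python
import math

# Truncated master equation for the birth-death process
# 0 -> X (rate k), X -> 0 (rate g*X), on states X = 0..N.
# (Illustrative example; truncating at N is the finite state
# projection idea mentioned in the text.)
k, g, N = 10.0, 1.0, 60

# Rate matrix Q with Q[i][j] = rate of jumping j -> i (i != j)
# and Q[j][j] = minus the total outflow rate of state j.
Q = [[0.0] * (N + 1) for _ in range(N + 1)]
for n in range(N + 1):
    if n < N:
        Q[n + 1][n] += k          # birth n -> n+1
        Q[n][n] -= k
    if n > 0:
        Q[n - 1][n] += g * n      # death n -> n-1
        Q[n][n] -= g * n

# Stationary condition (2.95): sum_j Q[i][j]*pi[j] = 0 for each i,
# with one balance equation replaced by sum_j pi[j] = 1.
A = [row[:] for row in Q]
A[N] = [1.0] * (N + 1)            # normalization row
b = [0.0] * N + [1.0]

# Plain Gaussian elimination with partial pivoting.
for col in range(N + 1):
    piv = max(range(col, N + 1), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, N + 1):
        f = A[r][col] / A[col][col]
        for c in range(col, N + 1):
            A[r][c] -= f * A[col][c]
        b[r] -= f * b[col]
pi = [0.0] * (N + 1)
for r in range(N, -1, -1):
    s = sum(A[r][c] * pi[c] for c in range(r + 1, N + 1))
    pi[r] = (b[r] - s) / A[r][r]

# The exact answer is Poisson with mean k/g.
poisson = [math.exp(-k / g) * (k / g) ** n / math.factorial(n)
           for n in range(N + 1)]
```

For larger state spaces the dense elimination used here is exactly what sparse iterative solvers such as the Arnoldi method replace.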

Analysis of Sensitivity and Robustness of Master Equations

Sensitivity characterizes the ability of living organisms to react adequately to certain stimuli; it quantifies the dependence of system behavior on parameters. In deterministic modeling, robustness is usually quantified by calculating sensitivities, e.g., period and amplitude sensitivities in quantifying the robustness of circadian rhythms. Sensitivity analysis of stochastic systems has recently become popular due to its relevance to the simulation of cellular processes. Using an analogue of classical sensitivity analysis, parameter sensitivity can also be applied to discrete stochastic dynamical systems. In stochastic systems, the state is a probability distribution, and the parameters affect it indirectly through a master equation. The parameter sensitivity applied to the probability distribution is given by (Gunawan et al. 2005)

S_j(X; t) = ∂P(X(p); t)/∂p_j,  (2.96)

where p denotes the parameter vector. In traditional sensitivity analysis, one implicit assumption is that the outputs are continuous with respect to the parameters. However, in stochastic systems, the output takes random values with certain probabilities. A sensitivity measure for discrete stochastic systems is defined by (Gunawan et al. 2005)

s_j(t) = E[ |∂P(X; t)/∂p_j| ] = ∫ |∂P(X; t)/∂p_j| P(X; t) dX,  (2.97)

where E[·] denotes the expected value over X. The absolute value is used here because only the magnitude of the sensitivity coefficient is of concern. Unlike the sensitivity coefficient of a single system state with respect to a parameter, the dependence of the system state X on the parameters p is implicitly assumed. The evolution of the sensitivity coefficients can be derived from (2.55) by taking derivatives with respect to the parameters:

∂S_j(X; t)/∂t = Σ_{X'} { P_{X'X} S_j(X'; t) − P_{XX'} S_j(X; t) }
  + Σ_{X'} { (∂P_{X'X}/∂p_j) P(X'; t) − (∂P_{XX'}/∂p_j) P(X; t) }.
(2.98)

The evolution of the sensitivity coefficients (2.98) should be solved simultaneously with the respective master equation (2.55). Generally, an analytical solution is difficult to construct, and the SSA cannot be directly applied to solve the sensitivity equations. Techniques for estimating the sensitivity coefficients have therefore been developed on the basis of a black-box approach such as finite differences. Specifically, the probability density function is constructed through a cumulative distribution function obtained from the SSA. The cumulative distribution function is given by

F(X; t) = ∫_{−∞}^X P(X̃; t) dX̃,  (2.99)

from which the density function P(X; t) can be obtained as follows:

P(X; t) = dF(X; t)/dX.  (2.100)

Then, the sensitivity coefficients can be computed numerically according to a difference approximation as follows:

S_j(X; t) = ∂P(X; t)/∂p_j
  = [P(X(p_j + Δp_j); t) − P(X(p_j − Δp_j); t)] / (2Δp_j)
  = [F(P(X(p_j + Δp_j)); t) − 2F(P(X(p_j)); t) + F(P(X(p_j − Δp_j)); t)] / (Δp_j)²,  (2.101)

where Δp_j is the perturbation of p_j, which should be small enough to minimize the truncation error but large enough for the effect of the parameter changes to be predicted accurately above the level of internal noise (Gunawan et al. 2005). Generally, this balance in choosing the step sizes in the parameters may hinder the application of finite differences to the probability density function. Recently, spectral methods for parameter sensitivity in stochastic dynamical systems were developed (Kim et al. 2007). The authors used spectral polynomial chaos expansions to represent the statistics of the system dynamics as polynomial functions of the system parameters. Such an approach allows an accurate and robust evaluation of sensitivities, even in the case of large-magnitude parametric perturbations. In addition, it enables studies of the predictability of the system given uncertainty, variability, or external noise in the model parameters, and estimation of the corresponding uncertainty of the predicted output state statistics.

Langevin Equations

The Langevin equational description of a cellular system explicitly incorporates the effects of noise. It takes the form of continuous differential equations augmented with additive or multiplicative stochastic terms, called stochastic differential equations. Due to the explicit incorporation of noise, the Langevin approach is ideal for describing the constructive effects of stochastic fluctuations in cellular systems.
Two typical examples of Langevin equations, with additive and multiplicative stochastic terms, respectively, take the forms

Ẋ = f(X) + Γ(t)  (2.102)

and

Ẋ = h(X) + g(X)Γ(t),  (2.103)

where Γ(t) can be a Gaussian white noise or another fluctuating random term (Hasty et al. 2000). In (2.102), the noise term does not depend on the current state, whereas in (2.103) it does. Here, g(X) is a deterministic function of X, which determines the scaling of the noise. Note that unlike in a master equation, the state X in the Langevin equations takes continuous values, and hence the Langevin equations can be viewed as an approximation to the corresponding master equation, i.e., they approximate the discrete state variable X of the master equation by the continuous variable X of (2.103). In order to describe the fluctuations when g(X) and Γ are not explicitly given, one can generally proceed by the following steps (Van Kampen 1992):

1. Write the deterministic macroscopic equations of the system.
2. Add the Langevin force, i.e., the second term of (2.102) or (2.103).
3. Adjust the specified constants related to the Langevin force so that the stationary solution reproduces the correct mean-square fluctuations as known from statistical mechanics or other considerations.

The master equation and the various stochastic simulation algorithms account for the stochastic and discrete nature of biochemical reactions, and they have been widely used to investigate the properties and effects of internal noise as the gold standard of numerical computation. However, the computational efficiency rapidly degrades as the complexity of a system increases. In addition, a master equation cannot provide clear perspectives on the origins and magnitude of the internal noise.
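The additive-noise form (2.102) can be integrated numerically with the standard Euler–Maruyama scheme, which advances X by f(X)Δt plus a Gaussian increment whose variance grows linearly with Δt. The sketch below is illustrative: the linear drift f(X) = −kX and the noise intensity D (in the δ-correlation convention ⟨Γ(t)Γ(t′)⟩ = Dδ(t − t′) used in this section) are chosen for the example, and the known stationary variance D/(2k) of this linear Langevin equation is checked:

```python
import math
import random

# Euler-Maruyama integration of the additive-noise Langevin
# equation (2.102) with the illustrative linear drift
# f(X) = -k*X and <Gamma(t)Gamma(t')> = D*delta(t - t').
# For this choice the stationary variance is D/(2*k).
k, D = 1.0, 0.2
dt, n_steps = 1e-3, 200_000
rng = random.Random(7)

X, samples = 0.0, []
for step in range(n_steps):
    # drift term f(X)*dt plus Gaussian increment of variance D*dt
    X += -k * X * dt + math.sqrt(D * dt) * rng.gauss(0.0, 1.0)
    if step > n_steps // 2:          # discard the transient
        samples.append(X)

var = sum(x * x for x in samples) / len(samples)  # mean is zero
```

The sampled variance fluctuates around D/(2k) = 0.1; longer runs or averages over realizations tighten the estimate.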
Starting from a master equation, when a system possesses a macroscopically infinitesimal time scale, in the sense that during any time increment dt on that scale all the reaction channels fire much more than once and yet none of the propensity functions change appreciably, its dynamics can be well approximated by Langevin equations (Gillespie 2000). In particular, when the number of each species is large (e.g., over hundreds in a cell), Langevin equations can describe the dynamics of cellular systems well. The Langevin equations of a general molecular network can be derived straightforwardly from (2.56) and take the form

dX_i(t)/dt = Σ_{j=1}^M ν_ji a_j(X(t)) + Σ_{j=1}^M ν_ji √(a_j(X(t))) Γ_j(t),  (2.104)

where all parameters and variables are defined the same as those in (2.56). The Γ_j(t) are temporally uncorrelated, statistically independent Gaussian white noise signals, and are formally defined by

Γ_j(t) = lim_{dt→0} N(0, 1/dt),  (2.105)

where N(m, σ²) denotes a normal random variable with mean m and variance σ². Together with the properties of temporal and statistical independence, this definition implies

⟨Γ_j(t)Γ_{j'}(t')⟩ = δ_{jj'} δ(t − t'),  (2.106)

where the first and second delta functions are Kronecker's function and Dirac's function, respectively, i.e., δ_{jj'} = 1 if j = j' and 0 otherwise, and δ(t − t') = 1 if t = t' and 0 otherwise. In the following, we occasionally use δ(j, j') to represent δ_{jj'}. Although the Langevin approach is an approximate analysis due to the continuous approximation of the discrete state X, which loses validity when the number of molecules is small, it can often be solved with much greater analytical ease than other representations, e.g., the master equational approach. For example, on the basis of the transfer function around the feedback loop and the equivalent noise bandwidth of a system, a frequency-domain technique for the analysis of intrinsic noise in negatively auto-regulated gene circuits has been proposed (Simpson Siegal-Gaskins 2003). Working with Langevin equations is beneficial especially when one can clearly see how the internal noise involved in biochemical reactions is related to the parameter values, the system size, and the state variables that evolve with time. Because internal noise is inherent in biochemical reactions and cannot be switched off, the Langevin equations have been extensively used in modeling intrinsic noise in cellular systems to study its constructive roles, such as internal noise stochastic resonance (INSR) (Hou and Xin 2003) and coherence resonance (Steuer et al. 2003). As an example, let us take a biochemical clock model, which incorporates three species: mRNA (M) and clock proteins in the cytosol (P_C) and in the nucleus (P_N) (Hou and Xin 2003).
The deterministic equations that describe the evolution of the three species are

d[M]/dt = ν_s K_1^n/(K_1^n + [P_N]^n) − ν_m [M]/(k_m + [M]) ≡ a_1 − a_2,  (2.107)
d[P_C]/dt = k_s [M] − ν_d [P_C]/(k_d + [P_C]) − k_1[P_C] + k_2[P_N] ≡ a_3 − a_4 − a_5 + a_6,  (2.108)
d[P_N]/dt = k_1[P_C] − k_2[P_N] ≡ a_5 − a_6.  (2.109)

According to (2.104), the Langevin equations for this system can be written directly as

Figure 2.8 Bifurcation diagrams for the deterministic equations (2.107)–(2.109) and the chemical Langevin equations (CLE) (2.110)–(2.112) (from (Hou and Xin 2003))

d[M]/dt = (a_1 − a_2) + (1/√V)[√a_1 Γ_1(t) − √a_2 Γ_2(t)],  (2.110)
d[P_C]/dt = (a_3 − a_4 − a_5 + a_6) + (1/√V)[√a_3 Γ_3(t) − √a_4 Γ_4(t) − √a_5 Γ_5(t) + √a_6 Γ_6(t)],  (2.111)
d[P_N]/dt = (a_5 − a_6) + (1/√V)[√a_5 Γ_5(t) − √a_6 Γ_6(t)],  (2.112)

where Γ_i(t), i = 1, ..., 6, are Gaussian white noise signals with ⟨Γ_i(t)⟩ = 0 and ⟨Γ_i(t)Γ_j(t')⟩ = δ_ij δ(t − t'), and V is the system size, e.g., the cell volume. When the second terms in the brackets on the right side of (2.110)–(2.112) are removed, the equations are equivalent to the deterministic ones (2.107)–(2.109); the second terms therefore denote the internal noise. Clearly, the magnitude of the internal noise scales as 1/√V and depends not only on the control parameters but also on the system states, i.e., the concentrations [M], [P_C], and [P_N]. Due to the explicit expression of the noise in the Langevin equations, the constructive roles of intrinsic and extrinsic noise can be studied relatively easily. For example, as shown in Figure 2.8, when the system size is very large and the deterministic kinetics applies, the system does not sustain oscillations if the control parameter is less than the threshold. On the other hand, when the system size is small, stochastic oscillations occur in the Langevin equations in such a parameter region. Such stochastic oscillations are distinct from random

noise in that there is a clear peak in their power spectra. These oscillations are induced by the internal noise. In addition, the best performance at an optimal noise level demonstrates the occurrence of INSR (Hou and Xin 2003).

Figure 2.9 Dependence of the effective SNR on the system size at ν_s = 0.25 (from (Hou and Xin 2003))

To measure the relative performance of the stochastic oscillations quantitatively, a signal-to-noise ratio (SNR) can be defined, i.e., β = S/N, where S and N are the signal and noise strength, respectively. The dependence of the SNR on the system size V is plotted in Figure 2.9. There is a clear maximum for a system size V ≈ 10^4, which demonstrates the existence of a resonance region. Since this resonance effect is purely induced by the internal noise, it is simply called INSR. See (Hou and Xin 2003) for an algorithm on how to calculate the effective SNR.

Recently, an increasing amount of experimental and theoretical evidence has indicated that noise can play a very important role in cellular systems. For example, noise-based switches and amplifiers were studied for gene expression in a single network derived from the bacteriophage λ (Hasty et al. 2000). Fluctuation-enhanced sensitivity of intracellular regulation in a single cell has also been reported (Paulsson et al. 2000). Internal noise can induce circadian oscillations, and the performance of the noise-induced circadian oscillation reaches a maximum with variation of the internal noise level, indicating the occurrence of INSR (Hou and Xin 2003, Li and Lang 2008). When combined with delays, a noisy system may maintain oscillations for parameter values at which its corresponding deterministic system reaches a steady-state level (Bratsun et al. 2005, Lewis 2003). Noise-induced collective behavior was also discovered

for multicellular systems (Chen et al. 2005, Zhou et al. 2005). All these phenomena show that noise can add potential richness to cellular dynamics.

Fokker–Planck Equations

Fokker–Planck equations are a special type of master equation, and they are often used as an approximation to the actual master equations or as a model for more general Markov processes. By the Taylor expansion of a_j(X(t) − ν_j)P(X(t) − ν_j) to order two in (2.56), we obtain the Fokker–Planck equation for the general molecular network as follows:

∂P(X; t)/∂t = − Σ_{i=1}^N ∂/∂X_i [ Σ_{j=1}^M ν_ji a_j(X) P(X; t) ]
  + (1/2) Σ_{i=1}^N ∂²/∂X_i² [ Σ_{j=1}^M ν_ji² a_j(X) P(X; t) ],  (2.113)

where all parameters and variables are defined as in (2.56). This equation can be rewritten in the following more compact form:

∂P(X; t)/∂t = − Σ_{i=1}^N ∂/∂X_i [A_i(X)P(X; t)] + (1/2) Σ_{i=1}^N ∂²/∂X_i² [D_i(X)P(X; t)],  (2.114)

or in a vector form

∂P(X; t)/∂t = − (∂/∂X)[A(X)P(X; t)] + (1/2)(∂²/∂X²)[B(X)P(X; t)],  (2.115)

where

A(X) = (A_1, ..., A_N),  (2.116)
D(X) = (D_1, ..., D_N),  (2.117)
B(X) = [B_ij]_{N×M},  (2.118)
A_i(X) = Σ_{j=1}^M ν_ji a_j(X),  (2.119)
B_ij(X) = ν_ji a_j^{1/2}(X),  (2.120)
D_i(X) = Σ_{j=1}^M B_ij²(X).  (2.121)

Note that the master equation is expanded with respect to the variables X and the discrete jumps ν_ji. If the variables are transformed as x = X/V with the discrete jumps as ν_ji/V, where V is the system size (e.g., the cellular volume), the master equation can be expanded with respect to x and ν_ji/V by the Taylor series, and such an approximation is called the Kramers–Moyal expansion. The first-order term of the Taylor or Kramers–Moyal expansion of the master equation gives the deterministic kinetics of the molecular network, while the second-order term represents the Langevin dynamics. It is clear that the Fokker–Planck equation (2.113) is a continuous approximation in the discrete state X of the master equation (2.56), and it can be proved to be equivalent to the Langevin equations (2.104).

The Fokker–Planck equation is beneficial in the sense that some theoretical analysis can be conducted. For example, for one variable, the equilibrium probability distribution can be obtained from (2.113) as

P_eq(X) = (C/B(X)) exp[ 2 ∫_0^X (A(X')/B(X')) dX' ]  and  d⟨X⟩/dt = ⟨A(X)⟩,  (2.122)

where C is a constant and ⟨·⟩ is the mean over all X. Note that P_eq is the equilibrium probability distribution in the completely stochastic case, not an equilibrium in the deterministic one. On the basis of the equilibrium probability distribution, the equilibrium mean concentration or number can be obtained. The Fokker–Planck equation is an approximation to, but similar to, the master equation in that it describes the evolution of the probability distribution of the state X(t). One main difference between them concerns the representation of the species: in the Fokker–Planck equation the description is continuous, while in the master equation the representation is discrete. If the biochemical system contains only a few molecules, the discrete representation is more accurate than the continuous one.
As an example, consider the effect of a randomly varying external field or environment on the biochemical reactions. We will show how to transform a Langevin equation into an equivalent Fokker–Planck equation, or transform a master equation approximately into a Langevin equation. Reconsider the example (2.102) with additive noise Γ(t), which is a rapidly fluctuating random term with zero mean, ⟨Γ(t)⟩ = 0, and is δ-correlated, i.e., ⟨Γ(t)Γ(t')⟩ = Dδ(t − t'), with D proportional to the strength of the perturbation. Introducing the probability distribution P(X; t), i.e., the probability of the system having a state of concentration X at time t, the corresponding Fokker–Planck equation for P(X; t) can be constructed as follows:

∂P(X; t)/∂t = − ∂/∂X [f(X)P(X; t)] + (D/2) ∂²P(X; t)/∂X² = − ∂J(X; t)/∂X,  (2.123)

where J(X; t) is the following probability flux:

J(X; t) = f(X)P(X; t) − (D/2) ∂P(X; t)/∂X.  (2.124)

Compared with (2.115), clearly A(X) = f(X) and B(X) = D in (2.124). At the equilibrium distribution, we have J(X; t) = 0. Then, integrating (2.124) over X, the equilibrium distribution for one-dimensional X is given analytically as

P_eq(X) = K e^{(2/D)φ(X)}  (2.125)

with

φ(X) = ∫^X f(X') dX',  (2.126)

i.e., φ(X) can be viewed as an energy landscape, where K is a normalization constant determined by requiring the integral of P_eq(X) over all X to be unity (Hasty et al. 2000). Using (2.125), the equilibrium mean value is given by

⟨X⟩_eq = ∫_0^∞ X P_eq(X) dX.  (2.127)

For multiplicative noise signals described by (2.103), the equilibrium probability distribution is obtained by transforming (2.103) into an equivalent Fokker–Planck equation

∂P(X; t)/∂t = − ∂/∂X [ (h(X) + (D/2) g(X)g'(X)) P(X; t) ] + (D/2) ∂²/∂X² [ g²(X)P(X; t) ],  (2.128)

where the prime denotes the derivative of g(X) with respect to X. Compared with (2.115), A(X) = h(X) + Dg(X)g'(X)/2 and B(X) = Dg²(X). The equilibrium probability distribution for one-dimensional X can be obtained similarly as

P_eq(X) = L e^{(2/D)φ_m(X)}  (2.129)

with

φ_m(X) = ∫^X ( h(X') + (D/2) g(X')g'(X') ) dX',  (2.130)

where the function φ_m(X) can also be viewed as a potential (Hasty et al. 2000), and L is a normalization constant. Clearly, from the master equation (2.56), we can derive the corresponding Langevin equations (2.104) and Fokker–Planck equation (2.115) for a general molecular network. Note that the Fokker–Planck equation (2.115) and the Langevin equations (2.104) are equivalent.
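Equations (2.125)–(2.127) can be evaluated numerically for a concrete drift. The sketch below is illustrative: with the drift f(X) = a − bX (our choice, so that φ(X) = aX − bX²/2 from (2.126)), it tabulates P_eq on a grid, normalizes it by the trapezoidal rule, and evaluates the equilibrium mean (2.127), which for this linear drift and small D lies essentially at a/b:

```python
import math

# Equilibrium distribution (2.125)-(2.127) of the additive-noise
# Langevin equation with the illustrative drift f(X) = a - b*X,
# i.e., potential phi(X) = a*X - b*X**2/2 from (2.126).
a, b, D = 2.0, 1.0, 0.1

def phi(X):
    return a * X - 0.5 * b * X ** 2

# Evaluate exp((2/D)*phi) on a grid and normalize by trapezoids.
h = 0.001
xs = [i * h for i in range(8001)]                   # X in [0, 8]
w = [math.exp((2.0 / D) * phi(X)) for X in xs]
Z = sum(0.5 * (w[i] + w[i + 1]) * h for i in range(len(xs) - 1))
P = [wi / Z for wi in w]                            # P_eq(X), K = 1/Z

# Equilibrium mean (2.127): integral of X * P_eq(X) dX.
mean = sum(0.5 * (xs[i] * P[i] + xs[i + 1] * P[i + 1]) * h
           for i in range(len(xs) - 1))
```

For this linear drift, P_eq is a Gaussian of mean a/b and variance D/(2b), so the computed mean provides a direct check of the quadrature.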

Cumulant Equations

It can be proven that when all the state variables X follow Gaussian distributions, the Langevin equations (2.104) or the Fokker–Planck equation (2.113) can be equivalently expressed by the evolution equations of the first and second cumulants, which means that we can actually examine the dynamics by deterministic cumulant equations instead of the complicated stochastic dynamics such as the master equation and the Langevin equations. Next, we first describe a procedure to derive cumulant equations up to any order for a general cellular system, and then describe a closed-form expression of the cumulant equations for a system in which all variables follow Gaussian distributions. From the master equation (2.56) for a general molecular network, define

K_i(X(t)) = Σ_{j=1}^M ν_ji a_j(X(t)),  (2.131)
K_ik(X(t)) = Σ_{j=1}^M ν_ji ν_jk a_j(X(t))  (2.132)

for i = 1, ..., N and k = 1, ..., N. Let the concentration of X(t) be x(t), i.e., x(t) = X(t)/V, where V is the system size or the individual cellular volume. By using the concentrations and letting

f_i(x(t)) = K_i(Vx(t))/V,  K_ik(x(t)) = K_ik(Vx(t))/V,  (2.133)

the Langevin equations (2.104) with molecular numbers as states can be rewritten into the following form with concentrations as states (Van Kampen 1992, Chen et al. 2005):

dx_i(t)/dt = f_i(x(t)) + ξ_i(t),  (2.134)

where the ξ_i are Gaussian white noise signals with zero means and intracellular covariances K_ik(x(t)), i.e., ⟨ξ_i(t)⟩ = 0 and ⟨ξ_i(t)ξ_k(t')⟩ = K_ik(x(t))δ(t − t'), where K_ik is an N × N matrix. ⟨·⟩ is the mean, i.e., the integration over the probability distribution:

⟨f(x)⟩ = ∫_x f(x)P(Vx; t) dx,  (2.135)

where X = Vx. Clearly, the Langevin equations (2.104) and (2.134) are equivalent but with different variables, and both are derived from (2.56). The first and second cumulants of any two random variables x_i and x_j are simply their means and covariances, i.e., ⟨x_i⟩, ⟨x_j⟩, and ⟨x_i x_j⟩ − ⟨x_i⟩⟨x_j⟩.
Letting

g(x(t), s) = Π_{i=1}^N x_i^{s_i}(t),  (2.136)

then for each integer-valued vector s = (s_1, ..., s_N), the moment evolution equations corresponding to (2.134) are given as follows (Kawai et al. 2004, Wojtkiewicz et al. 1996):

d⟨g(x(t), s)⟩/dt = Σ_{i=1}^N ⟨ (∂g/∂x_i) f_i ⟩ + (1/2) Σ_{i=1}^N Σ_{k=1}^N ⟨ K_ik ∂²g/(∂x_i ∂x_k) ⟩,  (2.137)

which can be directly used to derive the cumulant evolution equations. Therefore, from (2.134) and (2.137), we can derive the cumulant equations up to any order for a general molecular network, which, however, may not have a closed form. On the other hand, if each variable x_i in a system follows a Gaussian distribution, we can show that the system can be expressed exactly by the first- and second-order cumulant equations in a closed form. For such a system, all odd central moments vanish and any even central moment can be expressed as products of the second central moments. For instance,

⟨x_i x_j x_k x_l⟩_c = ⟨x_i x_j⟩_c ⟨x_k x_l⟩_c + ⟨x_i x_k⟩_c ⟨x_j x_l⟩_c + ⟨x_i x_l⟩_c ⟨x_j x_k⟩_c,  (2.138)
⟨x_i x_i x_i x_j x_j x_k⟩_c = 6⟨x_i x_i⟩_c ⟨x_i x_j⟩_c ⟨x_j x_k⟩_c + 6⟨x_i x_j⟩_c² ⟨x_i x_k⟩_c + 3⟨x_i x_i⟩_c ⟨x_i x_k⟩_c ⟨x_j x_j⟩_c,  (2.139)

where all the moments are central moments (Chen et al. 2005), e.g., ⟨x_i x_j⟩_c = ⟨(x_i − ⟨x_i⟩)(x_j − ⟨x_j⟩)⟩ = ⟨x_i x_j⟩ − ⟨x_i⟩⟨x_j⟩. Note that cumulants are identical to central moments for the first, second, and third orders, and any differentiable function f(x) can be expanded around ⟨x⟩ by central moments; i.e., for a Gaussian distribution of x, since all odd central moments are zero, ⟨f(x)⟩ can be expanded as

⟨f(x)⟩ = f(⟨x⟩) + (1/2!) (∂²f(⟨x⟩)/∂x²) ⟨xx⟩_c + (1/4!) (∂⁴f(⟨x⟩)/∂x⁴) ⟨xxxx⟩_c + ⋯.  (2.140)

The cumulant evolution equations of (2.134) with a closed-form expression can be derived in terms of the first-order cumulants m_i and the second-order cumulants c_ik as follows:

dm_i(t)/dt = F_i(m(t), c(t)),  (2.141)
dc_ik(t)/dt = G_ik(m(t), c(t)),  (2.142)

with

F_i(m(t), c(t)) = ⟨f_i(x(t))⟩,  (2.143)
G_ik(m(t), c(t)) = ⟨(x_i(t) − m_i(t))f_k(x(t))⟩ + ⟨(x_k(t) − m_k(t))f_i(x(t))⟩ + ⟨K_ik(x(t))⟩,  (2.144)

where i, k = 1, ..., N. In other words, a molecular network can be expressed exactly by the first-order and second-order cumulant equations. Here, the first cumulants or means of x are m = (m_1, ..., m_N) with elements m_i = ⟨x_i⟩, and the second cumulants or covariances of x are c = [c_ik]_{N×N} with elements c_ik = ⟨(x_i − m_i)(x_k − m_k)⟩. (2.141) is obtained by directly integrating (2.134) over all x_i, whereas (2.142) is derived by integrating d(x_i − m_i)(x_k − m_k)/dt over all x_i and x_k with the substitution of (2.134), m_i, and m_k. All F_i and G_ik are the results of these integrations. The vector m clearly has N elements. On the other hand, the number of non-zero elements of the covariance matrix c is at most N(N + 1)/2, but more than N, because any two molecules in each cell are generally not independent of each other. A detailed example that shows how to derive the explicit expressions of the Langevin equations (2.134) and the cumulant equations (2.141)–(2.142) for a gene regulatory network can be found in (Chen et al. 2005). The system (2.141)–(2.142) is a closed-form expression in m and c, and hence can be viewed as a deterministic representation of a cellular system, although there are higher-order statistical terms, i.e., the covariances c_ik in addition to the means m_i. By examining the deterministic cumulant equations, we can approximately determine the qualitative dynamics of the original stochastic system described by the master or Langevin equations, e.g., stochastic stability and stochastic synchronization from the viewpoint of the probability distribution. For instance, if each x_i follows a Gaussian distribution, the switching dynamics, oscillatory behavior, or synchrony in (2.141)–(2.142) correspond to those in (2.134), because the original stochastic dynamics of x can be entirely reconstructed from the deterministic means m and covariances c (Chen et al. 2005).
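For a linear system the closure (2.141)–(2.142) is exact and easy to integrate. The sketch below is illustrative: it applies the construction (2.131)–(2.134) to birth–death kinetics in concentration units, with drift f(x) = k − gx and intracellular covariance K(x) = (k + gx)/V (our example, not from the text), and evolves the mean m and variance c to their stationary values m* = k/g and c* = k/(gV), i.e., the Poisson relation Var(X) = V²c* = Vm* = ⟨X⟩ in molecular numbers:

```python
# Cumulant equations (2.141)-(2.142) for illustrative linear
# birth-death kinetics in concentration units:
#   f(x) = k - g*x             drift, as in (2.134)
#   K(x) = (k + g*x)/V         noise covariance from (2.132)-(2.133)
# For linear f the Gaussian closure is exact and gives
#   dm/dt = k - g*m
#   dc/dt = -2*g*c + (k + g*m)/V
k, g, V = 10.0, 1.0, 100.0

m, c, dt = 0.0, 0.0, 1e-3
for _ in range(20_000):                 # forward Euler to t = 20
    dm = k - g * m
    dc = -2.0 * g * c + (k + g * m) / V
    m += dm * dt
    c += dc * dt

# Stationary predictions: m* = k/g and c* = k/(g*V); in molecular
# numbers X = V*x this gives Var(X) = V**2 * c* = V*m* = <X>, the
# Poisson relation expected for this birth-death process.
```

Note that these are ordinary deterministic ODEs: the entire stochastic description has been reduced to the evolution of m and c, exactly as the text describes.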
The fundamental importance of intrinsic and extrinsic noises within molecular networks has been appreciated within both mathematical and biological communities with increasing interest in recent years. These studies include techniques to predict (Mettetal et al. 2006), estimate (Orrell et al. 2005), and control (Orrell and Bolouri 2004) stochastic noise, effects of negative and positive feedback on intrinsic and external noise (Yi 2004), noise-induced qualitative change in dynamics (Hasty et al. 2000, Chen et al. 2005, J. Wang et al. 2007), enhancement of cellular memory by reducing stochastic transitions (Acar et al. 2005), robust properties with respect to noise (Gonze et al. 2002a), and some other stochastic cellular models to quantitatively or qualitatively investigate different roles played by noise (Tian and Burrage 2006, Tsimring et al. 2006). Elucidating constructive roles of stochastic noise and developing new techniques to deal with these stochastic fluctuations will remain an important and attractive subject of active research.

2.6 Deterministic Representation

There are, generally, two mathematical forms for modeling dynamical molecular networks: one is a stochastic formulation that explicitly includes the discrete and probabilistic changes in reactant molecule numbers as reactions occur, and the other is a deterministic formulation with reactant concentrations varying continuously in time, governed by a system of rate equations. The advantage of the deterministic representation is that qualitative behavior can be obtained relatively easily by using theoretical methods from complex networks, nonlinear dynamics, and control. Therefore, with the further development of sophisticated analytical and computational techniques, the deterministic representation is expected not only to be a powerful tool for providing testable predictions and foundations for designing synthetic molecular networks and controlling cellular processes but also to have great potential for biotechnological and therapeutic applications.

2.6.1 Basic Kinetics

In the deterministic formulation, a cellular system or molecular network is considered to be a series of elementary biochemical reactions, whose kinetics can be described by rate equations according to the mass action law. Consider the simplest monomolecular reaction, which is called the first-order reaction due to its single reactant:

$$A \xrightarrow{k} A^*. \quad (2.145)$$

It states that the species $A$ is converted to species $A^*$ with a rate constant $k$. As the reaction occurs, $[A]$ decreases and $[A^*]$ increases. In this book, $[X]$ represents the concentration of species $X$, but we also directly use $X$ to represent its concentration if no confusion arises. Many biochemical reactions, such as transcription from DNA to an mRNA, translation from an mRNA to a protein, and transportation of a protein from the cytoplasm into the nucleus, can be written in such a form.
The reaction rate depends not on the products but only on the reactants, and it is a positive quantity identical for all participating species in an elementary reaction. In particular, for the reaction (2.145), the rate equation has the form

$$r = \frac{d[A^*]}{dt} = -\frac{d[A]}{dt} = k[A]. \quad (2.146)$$

Bimolecular reactions are probably the most common reactions occurring in cellular systems. The two molecules can be the same or different. If the reactants are two different species, the reaction takes the following form, which is called the second-order reaction due to the presence of two reactants:

$$A + B \xrightarrow{k} AB. \quad (2.147)$$

Many biochemical reactions, e.g., binding of a TF to DNA, formation of a heterodimer through binding of two different proteins, and conversion of a substrate into a complex catalyzed by an enzyme, take such a form. The reaction rate is

$$r = \frac{d[AB]}{dt} = -\frac{d[A]}{dt} = -\frac{d[B]}{dt} = k[A][B]. \quad (2.148)$$

When both reactants are the same species, the reaction becomes

$$2A \xrightarrow{k} A_2. \quad (2.149)$$

Formation of a homodimer through binding of two monomers and formation of a tetramer through binding of two identical homodimers are of such types. The reaction rate can be written as

$$r = \frac{d[A_2]}{dt} = -\frac{1}{2}\frac{d[A]}{dt} = k[A]^2. \quad (2.150)$$

The prefactor $1/2$ ensures that one gets the same rate for the reactants and products. DNA is a relatively stable molecule; on the other hand, mRNAs and proteins are constantly degraded by the cellular machinery and recycled. The degradation of mRNAs and proteins takes the simple form of a first-order reaction:

$$A \xrightarrow{k} \emptyset, \quad (2.151)$$

where $A$ can be an mRNA or a protein. The reaction rate can be written as

$$r = -\frac{d[A]}{dt} = k[A]. \quad (2.152)$$

Besides monomolecular and bimolecular reactions, there are other, more complex reactions, such as reversible reactions, parallel reactions, and consecutive reactions, each of which can be decomposed into multiple elementary reactions. Of course, there are also some types of reactions which occur rarely, such as trimolecular reactions. We now develop a formalism for a general biochemical reaction. A general elementary biochemical reaction takes the form (see (2.1))

$$r_1R_1 + r_2R_2 + \cdots + r_mR_m \xrightarrow{k} p_1P_1 + p_2P_2 + \cdots + p_nP_n, \quad (2.153)$$

where $m$ is the number of reactants and $n$ is the number of products. With $m$ reactants, (2.153) is the $\sum_{i=1}^m r_i$-order reaction due to its $\sum_{i=1}^m r_i$ reactant molecules. $R_i$ is the $i$th reactant and $P_j$ is the $j$th product. $r_i$ and $p_j$ are the numbers of molecules of reactant $R_i$ consumed and product $P_j$ produced in a reaction step, respectively. The coefficients $r_i$ and $p_j$ are known as stoichiometries.
The reaction rate based on the mass action law is

$$r = -\frac{1}{r_i}\frac{d[R_i]}{dt} = \frac{1}{p_j}\frac{d[P_j]}{dt} = k\prod_{l=1}^{m}[R_l]^{r_l}, \quad (2.154)$$

where $i = 1, \ldots, m$ and $j = 1, \ldots, n$. When more than one reaction is considered, the differential equation for a certain chemical species is simply the sum of the contributions of each reaction. For example, based on rate equations, the reactions

$$A + B \xrightarrow{k_1} AB, \quad (2.155)$$
$$AB \xrightarrow{k_{-1}} A + B \quad (2.156)$$

lead to a system of differential equations

$$\frac{d[AB]}{dt} = k_1[A][B] - k_{-1}[AB], \quad (2.157)$$
$$\frac{d[A]}{dt} = \frac{d[B]}{dt} = k_{-1}[AB] - k_1[A][B]. \quad (2.158)$$

The rate of change of each species is simply the sum of the rates of the two reactions. Generally, a cellular system consists of a network of coupled biochemical reactions. These reactions may be of the types of transcription, translation, dimerization, protein or mRNA degradation, enzyme-catalyzed reactions, transportation, and transduction. Such reactions constitute various metabolic, genetic, and signaling networks. Moreover, one species may participate in multiple reactions, and one species can be a reactant in one reaction and a product in another. Thus, the change of one species is actually the sum of the changes in all reactions that it participates in.

2.6.2 Deterministic Representation of a General Molecular System

Recalling the general molecular network represented by the master equation (2.56), we can also derive the deterministic representation in the form of differential equations with the numbers of molecules as variables by simply eliminating the noise terms of (2.104) as

$$\frac{dX_i(t)}{dt} = \sum_{j=1}^{M}\nu_{ji}a_j(X(t)), \quad (2.159)$$

where all variables and parameters are defined as in (2.104). The same expression but with the concentrations of molecules as variables can be obtained from (2.134) by eliminating $\xi_i(t)$. Actually, the deterministic representation (2.159) for a general molecular network is identical to (2.141) when $X_i/v = m_i$ and all $c_{ij} = 0$.
In other words, when we ignore the stochastic noise (or let the means represent whole dynamical states), (2.159) approximately captures the main dynamical behavior of the general molecular network described by (2.56).
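As a concrete instance of such deterministic rate equations, the reversible binding pair (2.155)-(2.156) with its mass-action system (2.157)-(2.158) can be integrated numerically. The sketch below uses forward Euler in pure Python; the rate constants and initial concentrations are illustrative, not taken from the text.

```python
# Forward-Euler integration of the reversible binding system (2.157)-(2.158):
#   d[AB]/dt = k1*[A][B] - k_1*[AB],   d[A]/dt = d[B]/dt = -d[AB]/dt.
k1, k_1 = 1.0, 0.5          # illustrative forward/backward rate constants
A, B, AB = 2.0, 1.0, 0.0    # illustrative initial concentrations

dt = 1e-3
for _ in range(40000):      # integrate to t = 40, long past equilibration
    r = k1*A*B - k_1*AB     # net rate of the binding reaction
    A, B, AB = A - dt*r, B - dt*r, AB + dt*r
```

Two properties of the rate-equation formulation are visible at the end of the run: the conservation of total $A$ and total $B$ (each step changes $A$, $B$, and $AB$ by the same $\pm r\,dt$), and the equilibrium condition $k_1[A][B] = k_{-1}[AB]$.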

2.6.3 Michaelis-Menten and Hill Equations

By exploiting multiple time scales, e.g., fast-slow dynamics, the reaction system (2.159) can be approximately expressed in a simpler form. A commonly used reaction model for enzymatic reactions is the Michaelis-Menten (MM) equation, which is an approximation of the original dynamics. In this reaction mechanism it is assumed that the enzyme is neither consumed nor produced; therefore, the total concentration of the enzyme remains constant. The enzyme interacts directly only with the substrate to form an enzyme-substrate complex, which leads to synthesis of the product and release of the enzyme:

$$E + S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} ES \xrightarrow{k_2} E + P, \quad (2.160)$$

where $E$, $S$, and $P$ are the enzyme, substrate, and product, respectively. The rate equations are now a system of differential equations:

$$\frac{d[S]}{dt} = -k_1[E][S] + k_{-1}[ES], \quad (2.161)$$
$$\frac{d[E]}{dt} = -k_1[E][S] + (k_{-1}+k_2)[ES], \quad (2.162)$$
$$\frac{d[ES]}{dt} = k_1[E][S] - (k_{-1}+k_2)[ES], \quad (2.163)$$
$$\frac{d[P]}{dt} = k_2[ES]. \quad (2.164)$$

The quasi-steady-state assumption for the enzyme-substrate complex $ES$ due to the fast reaction, i.e., $d[ES]/dt = 0$, leads to

$$[ES] = \frac{[E][S]}{K_M} \quad (2.165)$$

with the MM constant

$$K_M = \frac{k_{-1}+k_2}{k_1}. \quad (2.166)$$

Combining the quasi-steady-state complex concentration and the conservation law for the enzyme, $[E]_T = [E] + [ES]$, with the total enzyme concentration $[E]_T$, we obtain

$$[ES] = \frac{[E]_T[S]}{K_M + [S]}. \quad (2.167)$$

This leads to the well-known MM equation

$$\frac{d[P]}{dt} = \frac{V_{max}[S]}{K_M + [S]}, \quad (2.168)$$

where $V_{max} = k_2[E]_T$ is the maximum reaction rate, and $[E]_T$ is assumed to be constant. In a molecular network, when there are no cooperative interactions among molecules, such as dimerization, binding of TFs and their cofactors, and binding of TFs to an operator site of a promoter on DNA, a cellular process (e.g., gene regulation) can be described approximately by MM-type equations. Generally, gene expression can be regulated at multiple levels involving interactions among inducers (or cofactors), repressors, activators, and operator sites (or binding sites) on DNA. On one hand, a repressor (or an activator) protein, which is called a TF, binds to an operator site at the beginning of a gene to prevent (or promote) RNAP attaching to the DNA to synthesize mRNA. On the other hand, an inducer, which is called a cofactor, binds to a repressor (or activator), causing it to change shape and preventing (or enhancing) its binding to the DNA strand. Therefore, the interactions among TFs, cofactors, and operator sites lead to two categories of transcriptional regulation: activation and repression. Take the binding of a repressor protein to an inducer as an example. The repressor protein $P$ binds to a small inducer $I$ to form a complex $PI$ by

$$P + I \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} PI. \quad (2.169)$$

The repressor is therefore found in either free ($P$) or bound ($PI$) form. Assume that the amount of $P$ is relatively stable in contrast to the small inducer $I$; i.e., the conservation law states that the total concentration of the repressor protein remains constant at $[P]_T$:

$$[P]_T = [P] + [PI]. \quad (2.170)$$

The kinetic rate equation is

$$\frac{d[PI]}{dt} = k_1[P][I] - k_{-1}[PI]. \quad (2.171)$$

A system is said to be at equilibrium when its state ceases to change. Assume that the system has reached its equilibrium, that is, there are no further net changes.
Thus,

$$\frac{d[PI]}{dt} = k_1[P][I] - k_{-1}[PI] = 0, \quad (2.172)$$

which leads to

$$K_{eq}[PI] = [P][I], \quad (2.173)$$

where $K_{eq} = k_{-1}/k_1$ is called the equilibrium constant. Combining this with the conservation of the total repressor concentration, we derive the MM equation as follows:

$$[PI] = \frac{[P]_T[I]}{[I] + K_{eq}}. \quad (2.174)$$

The MM function has three notable features (Alon 2006): 1. it reaches saturation at high $[I]$; 2. it has a regime where $[PI]$ increases linearly with $[I]$ when $[I] \ll K_{eq}$; 3. the fraction of bound protein reaches 50% when $[I] = K_{eq}$. Another commonly used function is the Hill-type equation, which can be viewed as an extension of the MM function. As is well known, many molecules or TFs have several identical subunits or are composed of several identical proteins, e.g., in the form of dimers or tetramers. The Hill function is often used to describe cooperative interactions among such molecules and can be derived as follows. Assume that the repressor protein $P$ has $n$ active binding sites. When $n$ identical inducer monomers $I$ bind to the protein $P$, the protein can be either bound to $n$ molecules of $I$, described by the complex $PI_n$, or unbound, denoted by $P$. The reaction takes the form

$$P + nI \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} PI_n. \quad (2.175)$$

Assume that the total concentration of bound and unbound $P$ is $[P]_T$, which remains constant. The conservation law takes the form

$$[P]_T = [P] + [PI_n]. \quad (2.176)$$

The kinetic rate equation for such a reaction is

$$\frac{d[PI_n]}{dt} = k_1[P][I]^n - k_{-1}[PI_n], \quad (2.177)$$

which reaches an equilibrium usually within milliseconds and therefore leads to a steady-state approximation, i.e., $k_1[P][I]^n = k_{-1}[PI_n]$. Combining this with the conservation law (2.176), we derive the Hill equation as follows:

$$[PI_n] = \frac{[P]_T[I]^n}{K_{eq}^n + [I]^n} \equiv f([I]), \quad (2.178)$$

where $K_{eq}^n = k_{-1}/k_1$. Similarly, we can derive the MM function and the Hill function for the binding of TF proteins to operator sites based on the conservation of operator sites. The parameter $K_{eq}$ is termed the activation coefficient. Half-maximal binding $[P]_T/2$ is reached at $[I] = K_{eq}$, and the limiting value, i.e., maximal binding $[P]_T$, is reached at a sufficiently high level of $[I]$.
This saturation of the Hill function is fundamentally due to the physical restriction on the number of binding sites of the promoter. The parameter $n$ is termed the Hill coefficient. The larger $n$ is, the steeper the Hill curve (Alon 2006). When $n = 1$, we obtain the MM equation.
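The listed properties of the MM and Hill functions can be verified directly in code. The sketch below transcribes (2.178) and the repressor form into Python; the values of $[P]_T$ and $K_{eq}$ are arbitrary.

```python
def hill_bound(I, P_T, K_eq, n):
    """Bound-complex concentration [PI_n] from the activating Hill form (2.178)."""
    return P_T * I**n / (K_eq**n + I**n)

def hill_unbound(I, P_T, K_eq, n):
    """Unbound repressor [P] = [P]_T - [PI_n], i.e., the decreasing Hill form."""
    return P_T / (1.0 + (I/K_eq)**n)
```

Checks worth running: half-maximal binding at $[I] = K_{eq}$ for any $n$; saturation at $[P]_T$ for large $[I]$; reduction to the MM function (2.174) at $n = 1$; and bound plus unbound forms summing to $[P]_T$ by conservation (2.176).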

The Hill equation (2.178) is a monotonically increasing function of $[I]$; therefore, it describes the case where $I$ acts as an enhancer. For the repressor case, the Hill function takes the form

$$f([I]) = \frac{[P]_T}{1 + ([I]/K_{eq})^n}, \quad (2.179)$$

which is a decreasing function of $[I]$. The MM and Hill equations are commonly used in modeling molecular networks. For example, when studying network topologies that can achieve adaptation, i.e., the ability to reset signaling systems after responding to stimuli, MM functions were used directly to model enzymatic regulatory networks, as shown in Figure 2.10. It was shown that only two major core topologies achieve robust adaptation: a negative feedback loop with a buffering node and an incoherent feedforward loop with a proportioner node (Ma et al. 2009). Moreover, using Hill functions instead to model the enzymatic regulatory networks does not significantly alter these results (Ma et al. 2009). When modeling gene networks (or genetic networks), MM and Hill kinetics are generally used for transcription rates, and mass action kinetics are used for all other rates, e.g., mRNA and protein degradation, translation, and complex formation and dissociation (Mirsky et al. 2009).

Figure 2.10 Illustration of using MM functions to model an enzymatic regulatory network (from (Ma et al. 2009))

For transcriptional processes, the mathematical model of the toggle switch (Gardner et al. 2000) is another example demonstrating how to use the MM and Hill equations to model a molecular network. Specifically, let us consider a case where each of two proteins negatively regulates the synthesis of the other, as depicted in Figure 2.11 (Gardner et al. 2000). The dynamics of the toggle switch can be described by the dimensionless model

$$\frac{du}{dt} = \frac{\alpha_1}{1+v^\beta} - u, \quad (2.180)$$
$$\frac{dv}{dt} = \frac{\alpha_2}{1+u^\gamma} - v, \quad (2.181)$$

Figure 2.11 The toggle switch is composed of two co-repressive genes. The constitutive promoter 1 drives the expression of the lacI gene, which produces the Lac repressor tetramer. The Lac repressor tetramer binds the lac operator sites adjacent to promoter 2, thereby blocking transcription of cI. The constitutive promoter 2 drives the expression of the cI gene, which produces the λ-repressor dimer. The λ-repressor dimer cooperatively binds to the operator sites within promoter 1, which prevents transcription of lacI (from (Gardner et al. 2000))

where $u$ is the concentration of repressor 1; $v$ is the concentration of repressor 2; $\alpha_1$ and $\alpha_2$ are the effective rates of synthesis of repressor 1 and repressor 2, respectively; and $\beta$ and $\gamma$ are the cooperativity parameters of repression of promoter 2 and promoter 1, respectively. The first terms on the right-hand sides of (2.180)-(2.181) represent production of the two repressors due to transcription. The system equations preserve the two most fundamental aspects of the network: mutual repression of constitutively transcribed genes and degradation/dilution of the repressors, described by the first and second terms, respectively, in each equation (Gardner et al. 2000).

2.6.4 Total Quasi-steady-state Approximation

Although the MM approximation is widely used to study enzyme kinetics, it is not applicable to the signaling cycle, because the MM approximation assumes a much lower concentration of the enzyme than of the substrate, which is not generally valid in signaling pathways. For example, substrates and enzymes of MAPK pathways are usually present at comparable concentrations in Saccharomyces cerevisiae and Xenopus oocyte cells (Ferrell 1996).
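The validity regime of the MM reduction can be probed numerically by integrating the full mass-action system (2.161)-(2.164) and comparing the instantaneous production rate $k_2[ES]$ with the MM rate (2.168). A minimal sketch follows, with illustrative rate constants and $[E]_T \ll [S]$, the regime where the approximation should hold; at comparable enzyme and substrate concentrations the discrepancy grows, motivating the tQSSA below.

```python
# Full mass-action system (2.161)-(2.164) integrated by forward Euler,
# then compared against the MM rate (2.168) at the current substrate level.
k1, k_1, k2 = 10.0, 1.0, 1.0    # illustrative rate constants
E_T = 0.1                       # total enzyme, much less than the substrate
S, E, ES, P = 2.0, E_T, 0.0, 0.0

dt = 1e-4
for _ in range(20000):          # integrate to t = 2, past the fast transient
    dS  = -k1*E*S + k_1*ES
    dE  = -k1*E*S + (k_1 + k2)*ES
    dES =  k1*E*S - (k_1 + k2)*ES
    dP  =  k2*ES
    S, E, ES, P = S + dt*dS, E + dt*dE, ES + dt*dES, P + dt*dP

K_M  = (k_1 + k2)/k1            # MM constant (2.166)
Vmax = k2*E_T                   # maximum rate
mm_rate   = Vmax*S/(K_M + S)    # MM approximation (2.168)
full_rate = k2*ES               # exact instantaneous rate d[P]/dt
```

After the transient, the full model satisfies enzyme conservation $[E] + [ES] = [E]_T$ and tracks the MM rate to within a relative error of order $[E]_T/(K_M + [S])$.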
Recently, the total quasi-steady-state approximation (tQSSA), which is valid more generally, was proposed and applied to model biomolecular networks of coupled enzymatic reactions, including interconnected phosphorylation cycles (Ciliberto et al. 2007). The tQSSA for the irreversible (Tzafriri 2003) and reversible MM schemes (Tzafriri and Edelman 2004) has been developed both for the limit of low enzyme concentrations and for high enzyme concentrations. The reduced tQSSA representation accurately reproduces the behavior predicted by detailed mass action kinetics models. Consider the signaling cycle in signal transduction consisting of a substrate protein that can be in one of two states, either active $A$ or inactive $I$ (Gomez

et al. 2007). The signaling cycle can be modeled by two enzymatic reactions, which have the form of MM schemes similar to (2.160):

$$I + E_1 \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} IE_1 \xrightarrow{k_2} A + E_1, \quad (2.182)$$
$$A + E_2 \underset{k_{-3}}{\overset{k_3}{\rightleftharpoons}} AE_2 \xrightarrow{k_4} I + E_2, \quad (2.183)$$

where $E_1$ is the kinase and $E_2$ is the phosphatase. Considering the six chemical species, i.e., $I$, $E_1$, $IE_1$, $A$, $E_2$, and $AE_2$, and the three conservation relations, i.e., that the kinase, the phosphatase, and the substrate protein are each conserved, we obtain a system of three variables according to the mass action law:

$$\frac{d\bar A}{dt} = k_2[IE_1] - k_4[AE_2], \quad (2.184)$$
$$\frac{d[IE_1]}{dt} = k_1\big[(\bar S - \bar A - [IE_1])(\bar E_1 - [IE_1]) - K_1[IE_1]\big], \quad (2.185)$$
$$\frac{d[AE_2]}{dt} = k_3\big[(\bar A - [AE_2])(\bar E_2 - [AE_2]) - K_2[AE_2]\big], \quad (2.186)$$

where

$$K_1 = \frac{k_2 + k_{-1}}{k_1} \quad \text{and} \quad K_2 = \frac{k_4 + k_{-3}}{k_3} \quad (2.187)$$

are the MM constants for the kinase and the phosphatase, respectively. $\bar S$ stands for the total concentration of the substrate protein in both active and inactive forms, i.e., $\bar S = \bar I + \bar A$. $X$ denotes the concentration of an unbound chemical species, and $\bar X$ denotes the total concentration of its bound and unbound forms, i.e., $\bar A = A + AE_2$, $\bar E_1 = E_1 + IE_1$, and $\bar E_2 = E_2 + AE_2$ (Gomez et al. 2007). Assuming that the complexes $IE_1$ and $AE_2$ have faster dynamics than the total active protein $\bar A$ and that they are always at equilibrium with respect to the active substrate protein, the following algebraic expressions are obtained:

$$[IE_1] = \frac{K_1 + \bar E_1 + \bar S - \bar A}{2}\Big(1 - \sqrt{1 - 4r_1}\Big), \quad (2.188)$$
$$[AE_2] = \frac{K_2 + \bar E_2 + \bar A}{2}\Big(1 - \sqrt{1 - 4r_2}\Big), \quad (2.189)$$

where

$$r_1 = \frac{\bar E_1(\bar S - \bar A)}{(K_1 + \bar E_1 + \bar S - \bar A)^2}, \quad (2.190)$$
$$r_2 = \frac{\bar E_2\bar A}{(K_2 + \bar E_2 + \bar A)^2}. \quad (2.191)$$

Approximating to first order in $r_1$ and $r_2$ yields the expressions

$$[IE_1] = \frac{\bar E_1(\bar S - \bar A)}{K_1 + \bar E_1 + \bar S - \bar A}, \quad (2.192)$$
$$[AE_2] = \frac{\bar E_2\bar A}{K_2 + \bar E_2 + \bar A}. \quad (2.193)$$

Inserting (2.192)-(2.193) into (2.184), we obtain the time evolution of the total active protein as follows:

$$\frac{d\bar A}{dt} = k_2\frac{\bar E_1(\bar S - \bar A)}{K_1 + \bar E_1 + \bar S - \bar A} - k_4\frac{\bar E_2\bar A}{K_2 + \bar E_2 + \bar A}. \quad (2.194)$$

2.6.5 Deriving Rate Equations

Now, we present a procedure for deriving the rate equations of a cellular system. The procedure is summarized as follows:

1. Draw an illustration of all reactions to be considered. It contains all the related reactants and products.
2. Introduce the respective concentrations of substances as system states.
3. For each substance $S$, according to the mass action law, sum up the rates of all reactions that it participates in, i.e., reactions which transform other substances into $S$ and reactions which transform $S$ into other substances.
4. Simplify the system or network by utilizing the inherent separation of time scales, i.e., by allowing fast reactions to be at their respective equilibria.
5. Further simplify the system by conservation laws, e.g., total enzyme concentrations and the total number of binding sites of each promoter remain constant.

Consider the example of the toggle switch. To show how to obtain (2.180)-(2.181), we write the biochemical reactions governing the process in detail. Generally, biochemical reactions are divided into two categories: fast and slow reactions. The fast reactions, which are mainly phosphorylation, multimerization, and binding processes, have rate constants on the order of seconds and are assumed to be in equilibrium with respect to the slow reactions. Letting $R_1$, $P_1$, $R_2$, $P_2$, $R_1^\gamma$, and $R_2^\beta$ denote repressor 1, promoter 1 (or binding site 1), repressor 2, promoter 2 (or binding site 2), the repressor 1 multimer (or $\gamma$-mer), and the repressor 2 multimer (or $\beta$-mer), respectively, in Figure 2.11, we can write the biochemical reactions as follows:

$$\gamma R_1 \overset{K_3}{\rightleftharpoons} R_1^\gamma, \quad (2.195)$$
$$\beta R_2 \overset{K_4}{\rightleftharpoons} R_2^\beta, \quad (2.196)$$
$$P_1 + R_1^\gamma \overset{K_1}{\rightleftharpoons} P_1R_1^\gamma, \quad (2.197)$$
$$P_2 + R_2^\beta \overset{K_2}{\rightleftharpoons} P_2R_2^\beta, \quad (2.198)$$

where $K_i = k_i/k_{-i}$ are the equilibrium constants. All these reactions generally evolve considerably fast, e.g., in less than seconds, and reach equilibrium quickly compared with the transcription and translation reactions, which usually require more than a few minutes. On the other hand, the slow irreversible reactions, such as transcription of mRNAs, translation of proteins, and degradation of proteins and mRNAs, evolve on a time scale that is much slower than those of the fast reactions (2.195)-(2.198). The slow reactions are mainly the synthesis of the two repressors and their degradation:

$$P_2 \xrightarrow{k_1} P_2 + nR_1, \quad (2.199)$$
$$P_2R_2^\beta \xrightarrow{\bar k_1} P_2R_2^\beta + nR_1, \quad (2.200)$$
$$P_1 \xrightarrow{k_2} P_1 + mR_2, \quad (2.201)$$
$$P_1R_1^\gamma \xrightarrow{\bar k_2} P_1R_1^\gamma + mR_2, \quad (2.202)$$
$$R_1 \xrightarrow{d_1} \emptyset, \quad (2.203)$$
$$R_2 \xrightarrow{d_2} \emptyset, \quad (2.204)$$

where $n$ and $m$ denote the numbers of the two repressors produced per mRNA transcript, respectively. Here, for the sake of simplicity, we assume that $\bar k_1 = \bar k_2 = 0$ due to the strong repression by $R_1$ and $R_2$. There are also conservation conditions for the total binding sites of the two promoters, i.e., $[P_1] + [P_1R_1^\gamma] = [P_2] + [P_2R_2^\beta] = [P_T]$, where $[P_T]$ is the concentration of the genes cI and lacI and is assumed to be the same for the two genes. By setting all of (2.195)-(2.198) at their respective equilibria due to their fast reactions, the synthesis rates of the two repressors are

$$V_1 = k_1n[P_2] = \frac{k_1n[P_2][P_T]}{[P_2] + [P_2R_2^\beta]} = \frac{k_1n[P_T]}{1 + K_2K_4[R_2]^\beta}, \quad (2.205)$$
$$V_2 = k_2m[P_1] = \frac{k_2m[P_1][P_T]}{[P_1] + [P_1R_1^\gamma]} = \frac{k_2m[P_T]}{1 + K_1K_3[R_1]^\gamma}. \quad (2.206)$$

Assuming the same degradation rates $d_1 = d_2 = \delta$ leads to

90 d[r 1 ] dt d[r 2 ] dt = = 2.6 Deterministic Representation 79 k 1n[P T ] 1+K 2 K 4 [R 2 ] β δ[r 1], (2.207) k 2m[P T ] 1+K 1 K 3 [R 1 ] γ δ[r 2]. (2.208) We next eliminate some of the parameters by rescaling the repressors [R 1 ], [R 2 ], and time. The dimensionless variables are defined as follows: u =(K 1 K 3 ) 1/γ [R 1 ], (2.209) v =(K 2 K 4 ) 1/β [R 2 ], (2.210) τ = tδ. (2.211) By substituting (2.209) (2.211) into (2.207) (2.208), we finally obtain (2.180) (2.181) with cooperativity parameters β and γ, respectively, and effective rates α 1 = k 1n[P T ] δ(k 1 K 3 ) γ, (2.212) α 2 = k 2m[P T ] δ(k 2 K 3 ) β. (2.213) Many other detailed examples, e.g., autoregulation of λ repressor expression (Hasty et al. 2000) and a synthetic multicellular gene network (Chen et al. 2005), can also be found in related literature. Details on differential equation models will be discussed in the succeeding chapters. In addition to the above procedure, as indicated in (2.159), we can also directly derive deterministic rate equations from a general molecular network (2.56) Modeling Transcription and Translation Processes Transcription and translation are fundamental processes in cellular systems. In particular, transcription is a key process in gene regulation, which results in the synthesis of RNA under the direction of DNA, i.e., transcribing a gene or a DNA nucleotide sequence into a RNA sequence or mrna, which is enzymatically copied by RNAP and is regulated by various TFs. When two or more TFs bind to the promoter of a gene, similar biochemical reactions shown in the previous section for a transcription process can be derived but with higher-order terms. Generally, to model gene regulation, transcription and translation processes are considered as slow reactions, whereas other biochemical reactions are viewed as fast reactions, which are therefore all reduced to equilibrium based on the quasi-steady-state assumption. 
Consider the system of (2.9)-(2.16) with the conservation condition (2.18), shown in Figure 2.1, as an example. By the quasi-steady-state assumption for (2.9)-(2.12), i.e., $d[PD]/dt = d[QD]/dt = d[PQD]/dt = 0$ with the conservation condition (2.18), we can derive the following equations for the configurations:

$$[D] = \frac{n_x}{1 + K_1[P] + K_2[Q] + K_{PQ}}, \quad (2.214)$$
$$[PD] = K_1[P][D] = \frac{K_1n_x[P]}{1 + K_1[P] + K_2[Q] + K_{PQ}}, \quad (2.215)$$
$$[QD] = K_2[Q][D] = \frac{K_2n_x[Q]}{1 + K_1[P] + K_2[Q] + K_{PQ}}, \quad (2.216)$$
$$[PQD] = (K_3 + K_4)[P][Q][D] = \frac{(K_3 + K_4)n_x[P][Q]}{1 + K_1[P] + K_2[Q] + K_{PQ}}, \quad (2.217)$$

where $K_1 = k_1/k_{-1}$, $K_2 = k_2/k_{-2}$, $K_3 = k_3K_2/(k_{-3} + k_{-4})$, $K_4 = k_4K_1/(k_{-3} + k_{-4})$, and $K_{PQ} = (K_3 + K_4)[P][Q]$. Generally, once RNAP binds to the DNA strand, transcription is initiated to produce mRNA. Let $P(RNAP_i)$ be the probability that RNAP is bound to the DNA of gene $i$. Then, for a molecular network with $n$ genes, the probability has the following form for $i = 1, \ldots, n$:

$$P(RNAP_i) = \sum_{c_i\in C} P(c_i)P(RNAP_i\,|\,c_i), \quad (2.218)$$

where $C$ is the set of all possible configurations of TFs with RNAP on the DNA, $P(c_i)$ is the probability distribution over TF configurations on the DNA, and $P(RNAP_i\,|\,c_i)$ is the probability (or affinity) that RNAP is bound given the constellation of TFs bound to the $i$th gene in configuration $c_i \in C$. Here, we assume that the gene expression level, i.e., the concentration of mRNA, is proportional to the probability that RNAP occupies the gene's regulatory sequence. That is, the transcription and translation processes for the $i$th ($i = 1, \ldots, n$) gene can be expressed as

$$\frac{d[mRNA_i]}{dt} = k_{R_i}P(RNAP_i) - d_{R_i}[mRNA_i], \quad (2.219)$$
$$\frac{d[Protein_i]}{dt} = k_{P_i}[mRNA_i] - d_{P_i}[Protein_i], \quad (2.220)$$

where $k_{R_i}$ and $k_{P_i}$ denote the synthesis rates of the mRNA and the protein, respectively, and $d_{R_i}$ and $d_{P_i}$ represent the degradation rates of the mRNA and the protein, respectively. Equations (2.219)-(2.220) represent a gene regulatory network. For the case of Figure 2.1 with one gene, there are eight configurations of TFs and RNAP, i.e., Figure 2.1 (a)-(h) for $c_i$, i.e., gene $i$, but $P(RNAP\,|\,c_i)$ is non-zero only for the cases of Figure 2.1 (b) and (e)-(g).
Assume that the affinities or probabilities of RNAP bound to gene $i$ are $P(RNAP\,|\,(b)) = w_D$, $P(RNAP\,|\,(e)) = w_{PD}$, $P(RNAP\,|\,(f)) = w_{QD}$, and $P(RNAP\,|\,(g)) = w_{PQD}$ for the configurations of Figure 2.1 (b) and (e)-(g), respectively, where we drop the subscript $i$ because there is only one gene. Therefore, according to (2.218), we have

$$P(RNAP_i) = P(b)P(RNAP\,|\,(b)) + P(e)P(RNAP\,|\,(e)) + P(f)P(RNAP\,|\,(f)) + P(g)P(RNAP\,|\,(g))$$
$$= \frac{w_Dn_x + w_{PD}K_1n_x[P] + w_{QD}K_2n_x[Q] + w_{PQD}(K_3+K_4)n_x[P][Q]}{1 + K_1[P] + K_2[Q] + (K_3+K_4)[P][Q]}, \quad (2.221)$$

where $P(b) = [D]$, $P(e) = [PD]$, $P(f) = [QD]$, and $P(g) = [PQD]$ are given in (2.214)-(2.217). Therefore, substituting (2.221) into (2.219), we can obtain the local gene regulatory network with (2.220) for gene $i$. In the same way, constructing equations for all studied genes, we can derive the gene regulatory network of the system, where the interactions among the proteins, i.e., $Protein_i$, can be obtained from the related protein interaction network or signaling pathway. According to thermodynamic equilibrium, we can also derive the equation for $P(RNAP_i)$ in the same form. Specifically, we assume that the transcription and translation rates are much slower than other biochemical reactions such as binding and dimerization. Therefore, the system is in a state of thermodynamic equilibrium such that each configuration is achieved with a probability proportional to its weight under the Boltzmann distribution. If for the $i$th gene there are $m_i$ TFs, among which the $j$th TF has $s_{ij}$ binding sites, then

$$P(RNAP_i) = \frac{\sum_{q_{i1}=0}^{s_{i1}}\cdots\sum_{q_{im_i}=0}^{s_{im_i}} w_1[TF]^Q}{1 + \sum_{q_{i1}=0}^{s_{i1}}\cdots\sum_{q_{im_i}=0}^{s_{im_i}} w_0[TF]^Q}, \quad (2.222)$$

where $w_1 = w(1, q_{i1}, \ldots, q_{im_i})$ and $w_0 = w(q_0, q_{i1}, \ldots, q_{im_i})$ are parameters related to the thermodynamic coefficients (or affinities) of the system, and $[TF]^Q = [TF_{i1}]^{q_{i1}}\cdots[TF_{im_i}]^{q_{im_i}}$. $[TF_{ij}]$ denotes the concentration of the $j$th TF for gene $i$. Therefore, substituting (2.222) into (2.219), we obtain the nonlinear dynamical equations of the gene regulatory network (2.219)-(2.220). Clearly, (2.219)-(2.220) with (2.222) are a general representation of a transcription regulatory network (Chen et al. 2009), which is applicable to a large class of systems.
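The occupancy formula (2.221) is a direct transcription of the configuration probabilities (2.214)-(2.217) into a weighted sum, and can be sketched in a few lines of code. The function below is an illustration, not from the text; `K34` stands for $K_3 + K_4$. A useful sanity check: with every affinity $w$ set to 1 and $n_x = 1$, the four weighted configuration probabilities sum to the total promoter concentration.

```python
def p_rnap(P, Q, n_x, K1, K2, K34, w_D, w_PD, w_QD, w_PQD):
    """RNAP-binding probability (2.221); K34 stands for K3 + K4."""
    Z = 1.0 + K1*P + K2*Q + K34*P*Q   # common denominator of (2.214)-(2.217)
    D   = n_x/Z                       # free promoter [D]          (2.214)
    PD  = K1*n_x*P/Z                  # P-bound promoter [PD]      (2.215)
    QD  = K2*n_x*Q/Z                  # Q-bound promoter [QD]      (2.216)
    PQD = K34*n_x*P*Q/Z               # doubly bound [PQD]         (2.217)
    return w_D*D + w_PD*PD + w_QD*QD + w_PQD*PQD
```

Increasing any single affinity $w$, e.g., the activator-bound weight $w_{PD}$, increases the occupancy monotonically, which is the mechanism by which a TF configuration promotes or represses transcription in (2.219).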
For example, by fitting the parameters $w$, $q$, $k$, and $d$, we can obtain the transcription regulatory relations for any molecular network. As a special example, assume that all TFs independently bind to available distinct sites in a gene's promoter region without interacting with each other, i.e., there is no cooperativity between the TFs. Then, the molecular network with $n$ genes can be mathematically expressed as

$$\frac{d[mRNA_i]}{dt} = d_{i0} - d_i[mRNA_i] + d_i\prod_{j\in R_i^+}\big(1 - \rho([TF_{ij}], s_{ij}, \theta_{ij})\big)\prod_{j\in R_i^-}\rho([TF_{ij}], s_{ij}, \theta_{ij}), \quad (2.223)$$

and (2.220), where $\rho([TF], s, \theta) = 1/(1 + \theta[TF])^s$. Let $R_i$ be the regulator or TF set of the $i$th gene. Then, $R_i^+$ is the set of activators in $R_i$, and $R_i^-$ is the set of inhibitors in $R_i$; hence, $R_i = R_i^+ \cup R_i^-$. $s_{ij}$ represents the number of binding sites for the $j$th TF of gene $i$, and $\theta_{ij}$ represents the affinity constant of the $j$th TF for its binding site on gene $i$. $d_{i0}$ is the transcription initiation rate of the $i$th gene, and $d_i$ is the degradation rate of the $i$th mRNA. Due to the assumption of independence between the TFs, there is a considerably smaller set of parameters in the above expression compared with that in (2.222).

2.7 Hybrid Representation and Reducing Molecular Networks

A living organism is inherently a discrete and stochastic system due to the random birth and death of molecules in cells. Explicitly considering all variables and chemical reactions in a cell is unrealistic for a molecular network, in particular for a gene regulatory network, from the modeling, analysis, and computation viewpoints. However, in a cell, many different time scales characterize the gene regulatory processes, and these can be exploited to reduce the complexity of the mathematical models. For instance, the transcription and translation processes generally evolve on a time scale that is much slower than those of phosphorylation, dimerization, and binding reactions. Moreover, in biological systems there are many subsystems, such as gene regulatory networks, protein interaction networks, and metabolic networks, which dynamically interact with each other but are relatively independent. In this section, we exploit such properties to simplify a complicated molecular network to a hybrid system, or even a deterministic system, by developing several models, which can be applied to the quantitative simulation of a large cellular system.
Note that there are two different meanings of "hybrid system": one is a system with both discrete and continuous variables, and the other is a system with both stochastic and deterministic dynamics, i.e., a stochastic hybrid process. Hybrid simplification approximates some or all discrete variables by continuous ones, and therefore can drastically reduce computational time by eliminating the simulated discrete events, because the computational complexity depends on the number of discrete jumps. Besides the hybrid simplification based on the central limit theorem, other simplification schemes, such as the averaging approximation and the stochastic quasi-steady-state approximation, can also be adopted to reduce the complexity of complicated molecular networks.

2.7.1 Decomposition of Biomolecular Networks

Consider a general molecular network containing m chemical reactions among n molecular species. Let z = (z_1, ..., z_n) be the state of the species, i.e., z_i is the number of the ith species at time t, which is a non-negative integer. Define P(z; t)

as the probability of the state z at time t. Then, the dynamics of the system are described by the master equation with initial state z_0 at t = 0, as indicated in (2.56):

  ∂P(z; t)/∂t = Σ_{k=1}^{m} [w_k(z − r_k) P(z − r_k; t) − w_k(z) P(z; t)],   (2.224)

where r_k = (r_{k,1}, ..., r_{k,n}) is an integer vector for the change of state, i.e., r_{k,j} is the change in the number of the jth molecule by the kth reaction, and w_k(z) is the transition rate (≥ 0) from state z to state z + r_k by the kth reaction. Notice that z − r_k should be non-negative although r_{k,j} can be negative. Note also that (2.224) is the same as (2.56), although different symbols are used. In theory, (2.224) provides complete information on the system behavior, but only a few simple cases are amenable to exact solutions due to the large number of variables. Next, we exploit the fast and slow reactions in cellular systems to simplify the master equation.

Figure 2.12 Decomposition of a biomolecular network

Although the dynamics of different biological processes, such as gene regulation and protein interactions, are intertwined, interactions within each biological process are generally more active than those between processes. This property can be exploited to decompose a biomolecular network, in particular a system including both a gene regulatory network and a protein interaction network. As shown schematically in Figure 2.12, we define a gene regulatory network (gene network) and a protein interaction network (protein network) in the following manner. A gene network is mainly composed of transcription, translation, splicing, and degradation reactions, with the total numbers of direct gene products, such as proteins and RNAs, as variables.

A protein network is composed of all other chemical reactions, such as phosphorylation, dimerization, binding, enzyme reactions, and chemical modifications, with the numbers of free chemical species as variables. A gene network involves gene regulation, and its dynamics are generally slow in contrast to the relatively fast reactions in a protein network. In the gene network, rather than the free monomers, we adopt the total numbers of gene products as variables, which include not only the free monomers but also those in all complexes, such as dimers and other multimers. Note that the degradation or depletion reactions of direct gene products (chemical monomers) are also included in the gene network. With such definitions, the gene-protein network can generally be decomposed into a gene regulatory network and a protein interaction network.

For a general cellular system, there are m reactions in total with n variables in total. Without loss of generality, assume that the first m_0 reactions are in the protein network. Rearrange the state variables as z = (x, y) and r_k = (φ_k, θ_k), with x = (x_1, ..., x_{n_x}), y = (y_1, ..., y_{n_y}), and n_x + n_y = n, where x_i is the number of molecules synthesized in a fast chemical reaction, and y_i is the total number of mRNAs produced by a transcription reaction, or the total number of proteins produced by a translation reaction, in the gene network. Notice that y_i for proteins is the total number, including those in dimers and other complexes. Next, we suppress the explicit time dependence of P(z) for readability. Define the marginal function P(y) = Σ_x P(x, y), where the integration over x is simply a summation over all discrete x; P(x) is defined similarly. Then, the joint probability function is written in terms of the marginal and conditional probabilities as

  P(x, y) = P(x|y)P(y) = P(y|x)P(x).   (2.225)

Assume that the concentrations of all housekeeping molecules, such as RNAP and ribosomes, are constant, although we could also treat them as variables in our model. We use a simple example to demonstrate the theoretical results. A simple biomolecular network with a single gene is shown in Figure 2.13. A gene is transcribed into mRNAs, which are further translated into protein monomers. The protein monomers then dimerize and act as TFs to regulate the gene activity by binding to the promoter sites. Let the numbers of free protein monomers and free binding sites of DNA be p and d, respectively, and let the numbers of the mRNA, the protein dimers, and the protein-DNA complexes be y_1, x_1, and x_2, respectively. Then, the total numbers of the protein and the DNA are y_2 = p + 2x_1 + 2x_2 and u = d + x_2. Note that using the total numbers as slow variables (e.g., y_2) is more effective than using the numbers of free molecules (e.g., p), although it increases the complexity of the formulation. The protein dimerization and DNA binding reactions are fast dynamics

Figure 2.13 A simple molecular network with one gene. The numbers of free protein monomers and free binding sites of DNA are p and d, respectively, whereas the total numbers of protein and DNA are y_2 = p + 2x_1 + 2x_2 and u = d + x_2, respectively

  p + p ⇌ x_1  (forward rate k_1, reverse rate k_{−1});   d + x_1 ⇌ x_2  (forward rate k_2, reverse rate k_{−2}).   (2.226)

Clearly, there are four fast reactions, including both forward and reverse reactions. On the other hand, the transcription, translation, and degradation reactions are considered as slow dynamics:

  d →(αk_3) y_1 + d;   x_2 →(k_3) y_1 + x_2,   (2.227)
  y_1 →(k_4) p + y_1;   y_1 →(d_1) ∅;   p →(d_2) ∅.   (2.228)

The transcription reactions occur at different rates with and without the binding of the dimer x_2 to the promoter, where α can be a repression (α < 1) or an activation (α > 1) coefficient. There are five slow reactions, represented by (2.227) and (2.228). Let z = (x, y) = (x_1, x_2, y_1, y_2), where x = (x_1, x_2) and y = (y_1, y_2). Therefore, n = 4, m = 9, m_0 = 4, n_x = 2, and n_y = 2. The terms for k = 1, ..., 4 in Table 2.3 correspond to the four reactions in (2.226), whereas the terms for k = 5, ..., 9 are derived from the five reactions in (2.227) and (2.228). Then, the master equation (2.224) for the simple molecular network is obtained directly from Table 2.3. Clearly, φ_k for k = m_0 + 1, ..., m and θ_k for k = 1, ..., m_0 are zero vectors. The total numbers of direct gene products (i.e., y_1 and y_2) are affected only by the transcription, translation, and degradation reactions, which are all in the gene network, whereas the other chemical numbers vary only in the protein network, although there exist interactions between the gene and protein networks. Then, the protein network can be described as

  ∂P(x|y)/∂t = Σ_{k=1}^{m_0} [w_k(x − φ_k, y) P(x − φ_k|y) − w_k(x, y) P(x|y)].   (2.229)

Table 2.3 r_k and w_k for the simple molecular network, where V stands for the cell volume and r_k = (φ_k, θ_k) = (φ_{k,1}, φ_{k,2}, θ_{k,1}, θ_{k,2}). Strictly speaking, w_1 should be w_1 = k_1 p(p − 1)/V rather than k_1 p²/V. u = d + x_2 is the total number of binding sites for the gene, which is a constant. θ_k for all k = 1, ..., 4 and φ_k for all k = 5, ..., 9 are zero vectors

  k | r_k = (φ_k, θ_k) | w_k(x, y)
  1 | ( 1,  0,  0,  0) | k_1 p²/V = k_1 (y_2 − 2x_1 − 2x_2)²/V
  2 | (−1,  0,  0,  0) | k_{−1} x_1
  3 | (−1,  1,  0,  0) | k_2 x_1 d/V = k_2 x_1 (u − x_2)/V
  4 | ( 1, −1,  0,  0) | k_{−2} x_2
  5 | ( 0,  0,  1,  0) | αk_3 d = αk_3 (u − x_2)
  6 | ( 0,  0,  1,  0) | k_3 x_2
  7 | ( 0,  0,  0,  1) | k_4 y_1
  8 | ( 0,  0, −1,  0) | d_1 y_1
  9 | ( 0,  0,  0, −1) | d_2 p = d_2 (y_2 − 2x_1 − 2x_2)

Substituting (2.225) into (2.224) and summing over all x, we have the following evolution equation of the marginal functions for the gene network:

  ∂P(y)/∂t = Σ_{k=m_0+1}^{m} [w̄_k(y − θ_k) P(y − θ_k) − w̄_k(y) P(y)],   (2.230)

where

  w̄_k(y) = Σ_x w_k(x, y) P(x|y)   (2.231)

is an average value conditional on y. w̄_k(y) can be expressed in terms of conditional moments or cumulants of x because w_k(x, y) is generally a polynomial in x and y. According to (2.225), we can clearly obtain the dynamics of the biomolecular network from (2.229) and (2.230), which are much simpler than the original (2.224) and thus constitute a reduced biomolecular network. From the viewpoint of computational complexity, the decoupled master equations (2.229)-(2.230) require considerably less CPU time than the original master equation (2.224).

2.7.2 Approximation of Continuous Variables in Molecular Networks

When we approximate x as continuous variables, (2.229) can be expressed approximately by the Fokker-Planck equation

  ∂P(x|y)/∂t = −(∂/∂x)[A(x)P(x|y)] + (1/2)(∂²/∂x²)[D(x)P(x|y)],   (2.232)

where

  A(x) = (A_1, ..., A_{n_x}),   (2.233)
  D(x) = (D_1, ..., D_{n_x}),   (2.234)
  B(x) = [B_{ij}]_{n_x × m_0},   (2.235)

with

  A_i(x) = Σ_{j=1}^{m_0} φ_{ji} w_j(x, y),   (2.236)
  B_{ij}(x) = φ_{ji} w_j^{1/2}(x, y),   (2.237)
  D_i(x) = Σ_{j=1}^{m_0} B_{ij}²(x).   (2.238)

Therefore, we have (2.232) and (2.230) to represent the original (2.224). Generally, compared with the master equation (2.229), P(x|y) can be calculated more efficiently from (2.232) for a given y, thereby yielding the w̄_k of (2.231) efficiently. Note that y is still considered discrete even when solved by the reduced master equation (2.230). Clearly, (2.229) and (2.232) constitute a hybrid system with both discrete and continuous variables.

2.7.3 Gaussian Approximation in Molecular Networks

When focusing on the gene network, we can easily examine the dynamics if the conditional moments or cumulants of x are provided, according to (2.230). In other words, provided that the necessary moments or cumulants in w̄_k(y) and w̄_k(y − θ_k) are available, the dynamics of the gene network simply follow (2.230) with a much smaller number of variables. The required moments or cumulants can be calculated from (2.229) by multiplying by x_i for appropriate integer i and summing over all x. Next, we approximate the model (2.224) with (2.230) by assuming that all variables approximately follow a Gaussian distribution. Let N be the conditional mean vector with elements N_i = ⟨x_i|y⟩, and let M be the conditional correlation matrix with elements M_ij = ⟨(x_i − N_i)(x_j − N_j)⟩ for x. Then, from (2.231), by expanding w_k at N, the transition rate w̄_k(y) of (2.230) is given as

  w̄_k(y) = Σ_x [w_k(N, y) + (∂w_k(N, y)/∂x)(x − N) + ⋯] P(x|y)
          = w_k(N, y) + (1/2) Σ_{i,j} (∂²w_k(N, y)/∂x_i∂x_j) M_ij + ⋯ ≡ h_k(N, M, y),   (2.239)

which requires the conditional moments of x. Notice that all odd central moments vanish and the even central moments can be expressed in terms of the second central moment M for the Gaussian distribution. Therefore, w̄_k(y) can be expressed by N and M together with y, i.e., h_k(N, M, y), and we only need to derive the means and correlations of x when assuming the Gaussian distribution, as indicated in (2.239). On the other hand, by multiplying (2.229) by x_i and summing over x, we have the following evolution equation for the means:

  dN_i/dt = Σ_{k=1}^{m_0} φ_{ki} Σ_x w_k(x − φ_k, y) P(x − φ_k|y)
          = Σ_{k=1}^{m_0} φ_{ki} Σ_x w_k(x, y) P(x|y)
          = Σ_{k=1}^{m_0} φ_{ki} w̄_k(y) ≡ f_i(N, M, y),   (2.240)

where w_k(x, y) is expanded at N in (2.240). According to (2.239), it is obvious that f_i(N, M, y) = Σ_{k=1}^{m_0} φ_{ki} h_k(N, M, y). Similarly, we can derive the evolution equations for the correlations by multiplying (2.229) by (x_i − N_i)(x_j − N_j) and summing over x, as follows:

  dM_ij/dt = Σ_{k=1}^{m_0} Σ_x [φ_{ki}φ_{kj} + φ_{ki}(x_j − N_j) + φ_{kj}(x_i − N_i)] w_k(x, y) P(x|y)
           = Σ_{k=1}^{m_0} [φ_{ki}φ_{kj} w̄_k(y) + φ_{ki} (∂w_k(N, y)/∂x) M_j + φ_{kj} (∂w_k(N, y)/∂x) M_i + ⋯]
           ≡ g_ij(N, M, y),   (2.241)

where M_i is the ith column of M. Note that M is a symmetric matrix, i.e., M_ij = M_ji. Therefore, the evolution equations can be expressed in terms of N and M for given y, as indicated in (2.240) and (2.241). For the simple molecular network shown in Figure 2.13, we can clearly derive from (2.239) the w̄_k(y) shown in Table 2.4, where V is the cellular volume. The evolution equations for the means and correlations can easily be derived from (2.240) and (2.241). Therefore, the master equation (2.224) is reduced to a reduced master equation (2.230) together with a set of ordinary differential equations (ODEs) (2.240) and (2.241), which can be simulated easily from the numerical computation viewpoint, e.g., using the Gillespie algorithm (Chen et al. 2005), due to the small numbers of variables and reactions in the reduced master equation.
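For reference, the reduced master equation can be sampled exactly with Gillespie's direct method. The following is a minimal, self-contained sketch (not the book's code) for an arbitrary master equation of the form (2.224); the birth-death rates d0 and d1 in the usage example are hypothetical:

```python
import math
import random

def gillespie_ssa(x0, propensities, stoich, t_max, seed=0):
    """Gillespie's direct method: exact sampling of a master equation.

    x0           -- initial copy-number vector (list of ints)
    propensities -- function state -> list of transition rates w_k(z)
    stoich       -- list of state-change vectors r_k
    """
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    trajectory = [(t, tuple(x))]
    while t < t_max:
        w = propensities(x)
        w0 = sum(w)
        if w0 == 0.0:
            break                                  # no reaction can fire
        # waiting time ~ Exp(w0); reaction k fires with probability w_k / w0
        t += -math.log(1.0 - rng.random()) / w0
        r, k, acc = rng.random() * w0, 0, w[0]
        while acc < r:
            k += 1
            acc += w[k]
        x = [xi + rij for xi, rij in zip(x, stoich[k])]
        trajectory.append((t, tuple(x)))
    return trajectory

# Toy birth-death model of a single mRNA species (hypothetical rates):
#   transcription 0 -> mRNA at rate d0, degradation mRNA -> 0 at rate d1*[mRNA]
d0, d1 = 5.0, 0.1
traj = gillespie_ssa([0],
                     lambda x: [d0, d1 * x[0]],
                     [[+1], [-1]],
                     t_max=200.0)
```

For the toy model, the trajectory fluctuates around the stationary mean d0/d1 after an initial transient, which is the behavior the deterministic reduction is meant to capture on average.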
Therefore, we have a decomposed system that is composed of a master equation (2.230) for the gene network and time-varying ODEs (2.240) and

(2.241) for the protein network. For an extreme case, assuming that the dynamics of the protein network are much faster than those of the gene network, the moments can be calculated numerically or even analytically by setting (2.240) to zero. Note that we do not require the quasi-steady-state equilibrium of the fast and slow dynamics in this section for the Gaussian approximation. In other words, with the assumption of the Gaussian distribution, (2.230) with (2.240)-(2.241) is an exact representation of the original master equation (2.224).

Table 2.4 Each term for the simple molecular network

  w̄_1(y) = k_1 (y_2 − 2N_1 − 2N_2)²/V + 4k_1 (2M_12 + M_11 + M_22)/V
  w̄_2(y) = k_{−1} N_1
  w̄_3(y) = k_2 N_1 (u − N_2)/V − k_2 M_12 /V
  w̄_4(y) = k_{−2} N_2
  w̄_5(y) = αk_3 (u − N_2)
  w̄_6(y) = k_3 N_2
  w̄_7(y) = k_4 y_1
  w̄_8(y) = d_1 y_1
  w̄_9(y) = d_2 (y_2 − 2N_1 − 2N_2)
  f_1 = w̄_1(y) − w̄_2(y) − w̄_3(y) + w̄_4(y)
  f_2 = w̄_3(y) − w̄_4(y)
  g_11 = w̄_1(y) + w̄_2(y) + w̄_3(y) + w̄_4(y) − 8k_1(y_2 − 2N_1 − 2N_2)(M_11 + M_12)/V
         − 2k_{−1}M_11 − 2k_2(u − N_2)M_11/V + 2k_2 N_1 M_12/V + 2k_{−2} M_12
  g_22 = w̄_3(y) + w̄_4(y) + 2k_2(u − N_2)M_12/V − 2k_2 N_1 M_22/V − 2k_{−2} M_22
  g_12 = −w̄_3(y) − w̄_4(y) − k_2(u − N_2)M_12/V + k_2 N_1 M_22/V + k_{−2} M_22
         + k_2(u − N_2)M_11/V − k_2 N_1 M_12/V − k_{−2} M_12

2.7.4 Deterministic Approximation in Molecular Networks

In this section, we further approximate the model (2.230) with (2.240) and (2.241) by a deterministic scheme. The dynamics of the protein network are relatively fast, and the number of variables is usually substantially larger than that of the gene network. Let V_p be the size of the protein system, e.g., the cellular volume. Since the stochastic deviation is approximately proportional to 1/√V_p, we assume that the noise of the protein network is almost averaged out. In other words, we assume that the reactions in the protein network are deterministic.
Therefore, when y is given, according to (2.240), ⟨x|y⟩ = x and the protein network for x is a deterministic system, i.e.,

  ẋ_i = Σ_{k=1}^{m_0} φ_{k,i} w_k(x, y),   (2.242)

where y is given. For given y and x(0), we have

  ψ̇_i(y) = Σ_{k=1}^{m_0} φ_{k,i} w_k(ψ(y), y),  with ψ(y, 0) = x(0),   (2.243)

where ψ is the flow of the dynamics. (2.243) is a time-varying ODE due to the time-dependent y(t). Hence, the conditional probability of x is

  P(x|y) = δ(x − ψ(y))   (2.244)

and

  P(x, y) = P(x|y)P(y) = δ(x − ψ(y))P(y).   (2.245)

Therefore, we have the following master equation (2.246) with the ODE (2.243), which together form a hybrid system with both stochastic and deterministic processes:

  ∂P(y)/∂t = Σ_{k=m_0+1}^{m} [w_k(ψ(y − θ_k), y − θ_k) P(y − θ_k) − w_k(ψ(y), y) P(y)].   (2.246)

For the simple molecular network, the master equation is (2.246) with m_0 = 4 and m = 9, and the ODEs are

  dψ_2/dt = w_1(ψ, y) − w_2(ψ, y) − w_3(ψ, y) + w_4(ψ, y),   (2.247)
  dψ_3/dt = w_3(ψ, y) − w_4(ψ, y),   (2.248)

where x_1(t) = ψ_2(t) and x_2(t) = ψ_3(t). Moreover, approximating (2.246) by the Fokker-Planck equation, we can equivalently transform (2.246) into the form of the Langevin equations

  dy_i/dt = K_i(y) + η_i,   (2.249)

where the η_i are Gaussian noise signals with zero mean ⟨η_i(t)⟩ = 0 and covariances ⟨η_i(t)η_j(t′)⟩ = K_ij δ(t − t′). K_i and K_ij are defined by

  K_i(y) = Σ_{k=m_0+1}^{m} θ_{k,i} w_k,   (2.250)
  K_ij(y) = Σ_{k=m_0+1}^{m} θ_{k,i} θ_{k,j} w_k,   (2.251)

for all i and j. When there is no noise, it is easy to check that dy_i/dt = K_i(y) is identical to the rate equation of the deterministic system.
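A Langevin system of the form (2.249) can be integrated with the Euler-Maruyama scheme. The sketch below is illustrative only: the scalar drift K(y) = a − b y and noise intensity K_11(y) = a + b y mirror (2.250)-(2.251) for hypothetical reactions ∅ → Y (rate a) and Y → ∅ (rate b y); a and b are assumed values.

```python
import math
import random

def euler_maruyama(y0, drift, diff, dt, n_steps, seed=0):
    """Integrate dy/dt = K(y) + eta(t), <eta(t)eta(t')> = D(y) delta(t-t').

    drift -- K(y);  diff -- D(y), the (scalar) noise intensity.
    """
    rng = random.Random(seed)
    y, path = y0, [y0]
    for _ in range(n_steps):
        # Ito step: deterministic drift plus a sqrt(D*dt) Gaussian kick
        y += drift(y) * dt + math.sqrt(max(diff(y), 0.0) * dt) * rng.gauss(0.0, 1.0)
        y = max(y, 0.0)            # copy numbers cannot go negative
        path.append(y)
    return path

# Birth-death example (hypothetical rates): K(y) = a - b*y, D(y) = a + b*y
a, b = 5.0, 0.1
path = euler_maruyama(0.0,
                      lambda y: a - b * y,
                      lambda y: a + b * y,
                      dt=0.01, n_steps=20000)
```

Without the noise term the same loop reduces to forward Euler for the deterministic rate equation dy/dt = K(y), consistent with the remark above.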

Thus, for the simple molecular network, we have the corresponding Langevin equations of (2.249) as follows:

  dy_1/dt = K_1(ψ, y) + η_1,   (2.252)
  dy_2/dt = K_2(ψ, y) + η_2,   (2.253)

where the η_i are Gaussian noise signals with zero mean ⟨η_i(t)⟩ = 0 and covariances ⟨η_i(t)η_j(t′)⟩ = K_ij δ(t − t′), with K_1(ψ, y) = w_5 + w_6 − w_8, K_2(ψ, y) = w_7 − w_9, K_11(ψ, y) = w_5 + w_6 + w_8, K_22(ψ, y) = w_7 + w_9, and K_12 = K_21 = 0. Hence, the gene-protein network can be simplified as (2.247)-(2.248) and (2.252)-(2.253).

As another decomposition method for the general molecular network (2.224), we can also partition the variables z as z = (x, y), where x represents the species present in large numbers and y represents the species present in small numbers. Then, by the partial Taylor expansion or the Kramers-Moyal expansion with respect to x (Crudu et al. 2009), we can express the master equation approximately with x as continuous variables while keeping y as discrete variables. In this way, the computational efficiency can be significantly improved due to the reduced number of discrete variables, which follow the master equation.

2.7.5 Prefactor Approximation of Deterministic Representation

A continuous approximation scheme, which reduces the number of dimensions of a system while predicting the dynamics of the entire system more accurately than the classic quasi-steady-state (QSS) approximation, was developed for the deterministic model (Kepler and Elston 2001, Bundschuh et al. 2003, Bennett et al. 2007). By correctly applying multiple time scale analysis, it was found that the resulting reduced systems are similar to the QSS approximations, but with a prefactor in front of the time derivatives of the concentrations. Consider the example of the synthetic repressilator shown in Figure 2.14 (Elowitz and Leibler 2000). The reactions in the repressilator are as follows (Bennett et al.
2007):

  x_i + x_i ⇌ y_i  (forward rate κ_+, reverse rate κ_−),   (2.254)
  d_{i,0} + y_k ⇌ d_{r,i}  (forward rate k_+, reverse rate k_−),   (2.255)
  d_{i,0} →(α) d_{i,0} + m_i,   (2.256)
  m_i →(σ) m_i + x_i,   (2.257)
  m_i →(γ_m) ∅,   (2.258)
  x_i →(γ_p) ∅,   (2.259)

where d_{i,0} and d_{r,i} are the free and repressed promoter sites, respectively. If the promoter of gene G_i is free, it can transcribe its associated mRNA, m_i, which in turn can be translated into its associated protein.

Figure 2.14 Schematic representation of the repressilator. Gene G_1 produces protein x_1, whose dimer y_1 inhibits transcription of gene G_2. Similarly, the protein dimer y_2 represses the gene G_3, whose protein dimer y_3 represses transcription of G_1 (from (Bennett et al. 2007))

According to the law of mass action, the differential equations are given by

  ẋ_i = −2κ_+ x_i² + 2κ_− y_i + σ m_i − γ_p x_i,   (2.260)
  ẏ_i = κ_+ x_i² − κ_− y_i − k_+ y_i d_{0,j} + k_− d_{r,j},   (2.261)
  ḋ_{i,0} = −k_+ d_{i,0} y_k + k_− d_{r,i},   (2.262)
  ḋ_{r,i} = k_+ d_{i,0} y_k − k_− d_{r,i},   (2.263)
  ṁ_i = α d_{i,0} − γ_m m_i,   (2.264)

where i ∈ {1, 2, 3}, j ∈ {2, 3, 1}, and k ∈ {3, 1, 2}. Assume that the dimerization and dissociation processes are faster than the other processes and that these reactions are in equilibrium. Solving the resulting algebraic equations, we obtain

  y_i = c_p x_i²,   (2.265)
  d_{i,0} = d/(1 + c_d c_p x_k²),   (2.266)
  d_{r,i} = d c_d c_p x_k²/(1 + c_d c_p x_k²),   (2.267)

where d = d_{0,i} + d_{r,i}, c_p = κ_+/κ_−, and c_d = k_+/k_−. Substituting these expressions into (2.260)-(2.261), we obtain the following differential equations:

  ṅ_i = (∂n_i/∂x_i) ẋ_i = p(x_i) ẋ_i,   (2.268)
  ṁ_i = αd/(1 + c_d c_p x_k²) − γ_m m_i,   (2.269)

where n_i = x_i + 2y_i + 2d_{r,j} = x_i + 2c_p x_i² + 2c_d c_p d x_i²(1 + c_d c_p x_i²)^{−1} and

  p(x_i) = 1 + 4c_p x_i + 4c_d c_p d x_i/(1 + c_d c_p x_i²)².   (2.270)

Clearly, n_i is the total number or concentration of protein i, including those in the complexes. Defining the dimensionless variables by rescaling γ_m t → t, √(c_d c_p) x_i → x_i, and (σ √(c_d c_p)/(γ_m β)) m_i → m_i, we obtain the system

  p(x_i) ẋ_i = β(m_i − x_i),   (2.271)
  ṁ_i = κ d̃/(1 + x_k²) − m_i,   (2.272)

where β = γ_p/γ_m, κ = ασ/(γ_m γ_p), d̃ = c_d c_p d, and p(x_i) ẋ_i = ṅ_i. Note that p(x_i) in (2.271) is the rescaled one, i.e.,

  p(x_i) = 1 + 4r x_i + 4d̃ x_i/(1 + x_i²)²,   (2.273)

where r = c_p/c_d. The difference between (2.271)-(2.272) and the model under the QSS assumption as used in (Elowitz and Leibler 2000) is the existence of the prefactor p(x_i). This difference results from different reduction approaches, i.e., treating x_i and n_i = x_i + 2y_i + 2d_{r,j} as the slow variables under the QSS and the prefactor approximation, respectively. It is true that x_i depends on the slow reactions, but it also depends on the fast reactions. Therefore, x_i is not a slow variable, but a mixture of both slow and fast variables. Here, the true slow variable is n_i, representing the total concentration of protein molecules. It has been shown that the prefactor approximation predicts the dynamics of the entire system more accurately than the classic QSS approximation, including both the transient dynamics and other properties such as relaxation times of equilibria and the periods and amplitudes of oscillations.
For example, a comparison of the periods and amplitudes predicted by the entire system (2.260)-(2.264), the prefactor approximation (2.271)-(2.272), and the QSS approximation (i.e., p(x_i) = 1 in (2.271)-(2.272)) is shown in Figure 2.15. It can be seen that the prefactor approximation predicts the dynamics of the entire system more accurately than the QSS approximation with respect to the estimation of both the periods and the amplitudes. See (Bennett et al. 2007) for more theoretical analysis and examples.
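The two reduced models are easy to compare numerically. The sketch below integrates (2.271)-(2.272) with and without the prefactor (2.273) using forward Euler; the parameter values and the initial condition are illustrative assumptions, not taken from (Bennett et al. 2007):

```python
def repressilator_rhs(state, beta, kappa, dtil, r, prefactor=True):
    """Reduced repressilator (2.271)-(2.272); gene i is repressed by the
    protein of gene i-1 (mod 3). state = (x1, x2, x3, m1, m2, m3)."""
    x, m = state[:3], state[3:]
    dx, dm = [0.0] * 3, [0.0] * 3
    for i in range(3):
        k = (i - 1) % 3                          # index of the repressing protein
        if prefactor:                            # prefactor p(x_i) from (2.273)
            p = 1.0 + 4.0 * r * x[i] + 4.0 * dtil * x[i] / (1.0 + x[i] ** 2) ** 2
        else:                                    # QSS limit: p(x_i) = 1
            p = 1.0
        dx[i] = beta * (m[i] - x[i]) / p
        dm[i] = kappa * dtil / (1.0 + x[k] ** 2) - m[i]
    return dx + dm

def euler(rhs, state, dt, n_steps):
    """Plain forward-Euler integration, sufficient for this illustration."""
    for _ in range(n_steps):
        deriv = rhs(state)
        state = [s + dt * d for s, d in zip(state, deriv)]
    return state

# Assumed dimensionless parameters and an asymmetric initial condition:
params = dict(beta=6.0, kappa=33.3, dtil=0.05, r=1.0)
s0 = [0.1, 0.2, 0.3, 0.5, 0.1, 0.2]
s_pre = euler(lambda s: repressilator_rhs(s, prefactor=True, **params), s0, 0.01, 500)
s_qss = euler(lambda s: repressilator_rhs(s, prefactor=False, **params), s0, 0.01, 500)
```

Because the prefactor rescales only the time derivative, the two reductions share the same fixed points but differ along transients, which is exactly where the discrepancy in periods and amplitudes arises.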

Figure 2.15 A comparison of the periods (a) and the amplitudes (b) of the oscillations as functions of γ_p for the entire system (black curves), the prefactor approximation (circles), and the QSS approximation (dashed curves). The parameters are γ_p = 6, γ_m = 1, κ_+ = k_+ = 5, κ_− = k_− = 100, α = 10, σ = 20, and d = 20 (from (Bennett et al. 2007))

2.7.6 Stochastic Simulation of Hybrid Systems

Define (x, y) and (X, Y) to be the concentrations and the numbers of molecules, respectively. Reconsider the general system (2.224) with z = (x, y) and r_k = (φ_k, θ_k), where x = (x_1, ..., x_{n_x}), y = (y_1, ..., y_{n_y}), and n_x + n_y = n, and where x_i and y_i are the concentrations of molecules, i.e., the numbers divided by the system size or volume V. Then, the dynamics of the system are described by the master equation

  ∂P(x, y; t)/∂t = Σ_{k=1}^{m} [w_k(x − φ_k/V, y − θ_k/V) P(x − φ_k/V, y − θ_k/V; t) − w_k(x, y) P(x, y; t)],   (2.274)

where (φ_k, θ_k) is a vector for the change of the state, i.e., r_{k,j} is the change in the number of the jth molecule by the kth reaction, and w_k(x, y) is the transition rate (≥ 0) from state (x, y) to state (x + φ_k/V, y + θ_k/V) by the kth reaction. Note that (x, y) represent concentrations in (2.274), in contrast to the numbers z in (2.224).

Hybrid System with Deterministic Process

Assuming that the number of X is much larger than that of Y, we can approximate X by continuous variables x, i.e., x = X/V, while keeping y = Y/V as discrete variables. Therefore, by the partial Kramers-Moyal expansion of (2.274) with respect to x and φ_k/V up to the first order (i.e., the zeroth order and the first order), we have the following hybrid representation

  ∂P(x, y; t)/∂t = Σ_{k=1}^{m} [w_k(x, y − θ_k/V) P(x, y − θ_k/V; t) − w_k(x, y) P(x, y; t)]
   − Σ_{j=1}^{n_x} (∂/∂x_j)[(Σ_{k=1}^{m} φ_{kj} W_k(x, y)) P(x, y; t)] + O(1/V),   (2.275)

where w_k = W_k V are the rates of the reactions, which are proportional to the volume V based on the law of mass action, and O(1/V) implies that the order of the term is equal to or higher than 1/V. Clearly, the Taylor zeroth-order term, i.e., the first term Σ_{k=1}^{m}[·] in (2.275), is the discrete dynamics or the master equation for the discrete variables y, whereas the Taylor first-order term, i.e., the second term Σ_{j=1}^{n_x} ∂/∂x_j[·] in (2.275), is the deterministic kinetic dynamics, corresponding to the Langevin equation for the continuous variables x. The Taylor second-order term, i.e., O(1/V), corresponds to the diffusion process and approaches zero as V → ∞. Therefore, (2.275) is a hybrid system with both discrete and continuous dynamics, or with both stochastic and deterministic processes. Specifically, the stochastic system for the discrete variables Y is

  ∂P(x, y; t)/∂t = Σ_{k=1}^{m} [w_k(x, y − θ_k/V) P(x, y − θ_k/V; t) − w_k(x, y) P(x, y; t)],   (2.276)

and the deterministic system with the continuous variables x is, for j = 1, ..., n_x,

  dx_j(t)/dt = Σ_{k=1}^{m} φ_{kj} W_k(x(t), y).   (2.277)

Let δ(θ_k) = 0 if θ_k is a zero vector; otherwise, δ(θ_k) = 1. Then, defining the jump intensity w_0 = Σ_{k=1}^{m} w_k(x, y)δ(θ_k), we have the algorithm of stochastic simulation based on the piecewise deterministic Markov process (PDMP) (Davis 1993, Zeiser et al. 2008, Crudu et al. 2009), shown in Table 2.5. Clearly, the continuous variables x are governed by the deterministic system and change continuously during each time interval [t_i, t_i + Δt_i), while the discrete variables y remain constant. Therefore, x are piecewise continuous variables and may change discretely at t_i + Δt_i.
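As a concrete illustration of the PDMP scheme of Table 2.5, the following sketch simulates a hypothetical two-state (telegraph) gene: the promoter state y ∈ {0, 1} is the discrete variable with assumed switching rates k_on and k_off, and the protein concentration x flows deterministically between jumps according to dx/dt = s·y − d·x (s and d are also assumed values):

```python
import math
import random

def pdmp_telegraph(t_max=50.0, k_on=0.5, k_off=0.3, s=10.0, d=1.0,
                   dt=1e-3, seed=0):
    """PDMP simulation in the spirit of Table 2.5 for a two-state gene."""
    rng = random.Random(seed)
    t, x, y = 0.0, 0.0, 0
    log_q, r1 = 0.0, rng.random()      # q(t) tracked in log space (no underflow)
    samples = [(t, x, y)]
    while t < t_max:
        w0 = k_on if y == 0 else k_off # total jump intensity w0(x, y)
        x += dt * (s * y - d * x)      # Step 4: deterministic flow for x
        log_q -= dt * w0               # dq/dt = -w0 q  =>  d(log q)/dt = -w0
        t += dt
        if math.exp(log_q) <= r1:      # stopping condition q = r1: a jump fires
            y = 1 - y                  # Step 6: execute the switching reaction
            log_q, r1 = 0.0, rng.random()  # Step 3: reset q and draw a fresh r1
        samples.append((t, x, y))
    return samples

samples = pdmp_telegraph()
```

The resulting x(t) is piecewise smooth, relaxing toward s/d while the gene is on and decaying toward 0 while it is off, with the kinks occurring exactly at the random jump times of y.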
On the other hand, y evolve discretely with stochastic motion punctuated by a sequence of random waiting times Δt_i, according to the master equation (2.276). Such dynamics are shown schematically in Figure 2.16.

Hybrid System with Diffusion Process

Moreover, we can expand (2.274) to the second order in 1/V to consider the diffusion effect, i.e.,

Table 2.5 Algorithm of stochastic simulation for (2.275) based on PDMP

Step 1. Initialization: set t_0 = 0 and fix the initial numbers of molecules (X_0/V, Y_0/V).
Step 2. Calculate the propensity functions w_k, k = 1, ..., m.
Step 3. Generate a random number r_1 uniformly distributed in [0, 1).
Step 4. Integrate the following differential equations between t_i and t_i + Δt_i with the stopping condition q(t_i + Δt_i) = r_1, which yields Δt_i:
  dx_j(t)/dt = Σ_{k=1}^{m} φ_{kj} W_k(x(t), y) for j = 1, ..., n_x,
  dq(t)/dt = −w_0(x(t), y) q(t), with x(t_i) = x_i, q(t_i) = 1.
Step 5. Generate a second random number r_2 uniformly distributed in [0, 1). Choose μ_i as the smallest integer satisfying Σ_{j=1}^{μ_i} w_j(x, y) > r_2 w_0(x, y).
Step 6. Execute the reaction μ_i, i.e., update (x, y). If t_i > T_max, terminate the computation; otherwise, go to Step 2.

  ∂P(x, y; t)/∂t = Σ_{k=1}^{m} [w_k(x, y − θ_k/V) P(x, y − θ_k/V; t)   (2.278)
   − w_k(x, y) P(x, y; t)]   (2.279)
   − Σ_{k=1}^{m} Σ_{j=1}^{n_x} (∂/∂x_j)[g_{jk}(x, y; t) P(x, y; t)]   (2.280)
   + Σ_{j=1}^{n_x} Σ_{l=1}^{n_x} (∂²/∂x_j∂x_l)[(Σ_{k=1}^{m} (φ_{kj} φ_{kl}/(2V)) W_k(x, y)) P(x, y; t)]   (2.281)
   + O(1/V²),   (2.282)

Figure 2.16 Schematic illustration of a piecewise deterministic Markov process

where

  g_{jk}(x, y; t) = φ_{kj} W_k(x, y) − Σ_{l=1}^{n_y} [φ_{kj} θ_{kl}/(V P(x, y; t))] ∂[W_k(x, y) P(x, y; t)]/∂y_l
                  = φ_{kj} W_k(x, y) − Σ_{l=1}^{n_y} (φ_{kj} θ_{kl}/V) [W_k(x, y) ∂ln P(x, y; t)/∂y_l + ∂W_k(x, y)/∂y_l].   (2.283)

Therefore, (2.280)-(2.281) can also be expressed by Langevin equations instead of the differential equations (2.277), i.e., stochastic differential equations with the continuous variables x, for j = 1, ..., n_x:

  dx_j(t)/dt = Σ_{k=1}^{m} g_{jk}(x(t), y; t) + Σ_{k=1}^{m} (φ_{kj}/√V) √(W_k(x(t), y)) Γ_k(t),   (2.284)

where Γ_k(t) is defined in (2.105). For this case, the hybrid system is the combination of the discrete stochastic system (2.276) and the continuous stochastic system (2.284), which can be simulated by the algorithm shown in Table 2.6, analogous to Table 2.5. In Table 2.6, the V_j(t) are independent one-dimensional Wiener processes, and the stochastic differential equations can be calculated by Itô integration. Clearly, P(x(t), y; t) appears in (2.284) through g_{jk} and must be estimated during the integration. There are many ways to approximate P(x(t), y; t), such as the finite state projection approach, a Gaussian distribution assumption for the continuous variables, or the equilibrium probability distribution. Here, we consider a scheme to approximate ∂ln P(x, y; t)/∂y_l. Since y corresponds to discrete variables which are expected to change the

dynamics in a slow manner in contrast to the continuous variables x, we assume ∂P(x(t), y; t)/∂y_l ≈ 0, or ∂ln P(x(t), y; t)/∂y_l ≈ 0. Specifically, we have

  g_{jk}(x, y; t) = φ_{kj} W_k(x, y) − Σ_{l=1}^{n_y} (φ_{kj} θ_{kl}/V) ∂W_k(x, y)/∂y_l.   (2.285)

Table 2.6 Algorithm of stochastic simulation for (2.278)-(2.282) based on PDMP

Step 1. Initialization: set t_0 = 0 and fix the initial numbers of molecules (X_0/V, Y_0/V).
Step 2. Calculate the propensity functions w_k, k = 1, ..., m.
Step 3. Generate a random number r_1 uniformly distributed in [0, 1).
Step 4. Integrate the following stochastic differential equations between t_i and t_i + Δt_i with the stopping condition q(t_i + Δt_i) = r_1, which yields Δt_i:
  dx_j(t) = Σ_{k=1}^{m} g_{jk}(x(t), y; t) dt + Σ_{k=1}^{m} (φ_{kj}/√V) √(W_k(x(t), y)) dV_k(t) for j = 1, ..., n_x,
  dq(t) = −w_0(x(t), y) q(t) dt, with x(t_i) = x_i, q(t_i) = 1.
Step 5. Generate a second random number r_2 uniformly distributed in [0, 1). Choose μ_i as the smallest integer satisfying Σ_{j=1}^{μ_i} w_j(x, y) > r_2 w_0(x, y).
Step 6. Execute the reaction μ_i, i.e., update (x, y). If t_i > T_max, terminate the computation; otherwise, go to Step 2.

2.8 Stochastic versus Deterministic Representation

Stochastic and deterministic approaches both have advantages and disadvantages. The advantage of stochastic approaches is that they exactly capture the discrete and stochastic nature of cellular systems. At low concentrations of molecules, molecular fluctuations are likely to have a marked impact on the system dynamics. The predictions of deterministic and stochastic models for circadian rhythms show that robust circadian oscillations can be observed even when the maximum numbers of mRNA and protein molecules are of the order of tens and hundreds (Gonze et al. 2002b). To assess the effects of molecular noise exactly, it is necessary to resort to a stochastic approach.

However, almost all analytical methods available for deterministic approaches are no longer applicable, and stochastic simulations are time-consuming; the computational efficiency rapidly degrades as the complexity of a system increases. One should therefore use stochastic approaches only if they are absolutely necessary. Deterministic approaches, which neglect the discrete and stochastic nature, have received considerable attention due to their simplicity for qualitative and quantitative studies. For deterministic formalisms, there are rich techniques, e.g., structural analysis, cellular control analysis, frequency analysis, and bifurcation analysis, that can be used to qualitatively or quantitatively analyze the system dynamics. In many cases, the stochastic and deterministic descriptions of a system coincide in the sense that the mean behavior of the system can be accurately captured by the deterministic description. For example, sustained oscillations corresponding to a limit cycle in a deterministic circadian rhythm model can also be obtained in the corresponding stochastic description, i.e., the stochastic oscillations fluctuate around the deterministic limit cycle (Gonze et al. 2006).

Figure 2.17 Formalisms to model molecular networks

Generally, when a deterministic system is operating near a critical point, the stochastic and deterministic processes may be substantially different. In this case, noise can induce new phenomena or qualitative changes. For example, for some parameters, the stochastic and deterministic processes in a circadian rhythm model coincide, i.e., they both exhibit periodic oscillations

with only differences in their periods and fluctuations in terms of concentrations. However, for other parameters, the deterministic description results in a stable equilibrium, while stable oscillations persist in the stochastic description (Vilar et al. 2002). Other noise-induced phenomena, such as noise-based switches and amplifiers for gene expression (Hasty et al. 2000) and fluctuation-enhanced sensitivity of intracellular regulation (Paulsson et al. 2000) in a single cell, have also been reported. Therefore, when species are present at low copy numbers, the stochastic description is more reasonable, although it is neither analytically solvable nor computationally efficient. On the other hand, when the species numbers are high and the system is operating far from its critical points, the deterministic description is more reasonable due to its simple representation and high computational efficiency. The features of various formalisms to model molecular networks are briefly summarized in Figure 2.17. Depending on the requirement for accuracy, we can choose different modeling approaches for quantitative simulation and qualitative analysis of cellular dynamics.
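The trade-off described here can be seen on the simplest birth–death system, ∅ → X at rate k and X → ∅ at rate d per molecule, where the deterministic steady state k/d can be compared against an exact Gillespie simulation (parameter values are illustrative):

```python
import random

# Birth-death process: synthesis at rate k, degradation at rate d per molecule.
# Deterministic model: dx/dt = k - d*x, with steady state x* = k/d.
k, d = 50.0, 1.0
x_det = k / d                      # deterministic steady state

# Gillespie SSA for the same system (exact stochastic simulation)
random.seed(1)
x, t, t_end = 0, 0.0, 200.0
samples = []
while t < t_end:
    a1, a2 = k, d * x              # propensities of birth and death
    a0 = a1 + a2
    t += random.expovariate(a0)    # waiting time to the next event
    if random.random() * a0 < a1:
        x += 1
    else:
        x -= 1
    if t > 50.0:                   # discard the transient, then record
        samples.append(x)

x_mean = sum(samples) / len(samples)
print(x_det, round(x_mean, 1))     # stochastic mean fluctuates around 50
```

At k/d = 50 molecules the relative fluctuations (of order 1/√50) are already visible; at very low copy numbers the deterministic mean becomes a poor summary, which is the regime the text refers to.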

3 Deterministic Structures of Biomolecular Networks

One of the ultimate goals in molecular biology is to understand the physiology of living cells in terms of the information that is encoded in the genome of a cell. The central dogma of molecular biology, i.e., DNA encodes RNA which in turn produces protein molecules, provides a framework for understanding the flow of information from the DNA through the RNA to the protein molecules. Individual molecules, such as proteins, perform various functions in complex molecular networks and play key roles in most cellular processes. For example, a protein may affect the production rates of other proteins or itself by transcriptional regulation when acting as a transcription factor. Therefore, understanding how genes, proteins, and small molecules dynamically interact to form molecular networks which realize sophisticated biological functions has become one of the major challenges for post-genomic biology. Recent advances in genomic science have made the quantitative analysis of molecular interactions, e.g., PPIs and DNA–protein interactions, possible because of the progress in experimental and measurement techniques, unlike conventional qualitative studies. Nonlinear phenomena in cellular dynamics, such as biochemical oscillations and gene expression multistability, have been extensively investigated through various mathematical models, in particular for molecular networks in simple organisms. Mathematical models can provide testable quantitative predictions despite the complexity of the networks. In addition, general regulatory principles can be found through them so as to allow us to manipulate and monitor various biological processes at the molecular level, which has great potential for biotechnological and therapeutic applications.
A biomolecular network can be expressed as a set of vertices representing cellular elements, e.g., genes, proteins, metabolites, and complexes, connected by edges which represent the relations between pairs of elements such as biochemical reactions and intermolecular interactions. Networks enable representation and characterization of biological processes such as signaling, metabolic, and regulatory processes. For example, a metabolic network can be viewed as a directed graph where each vertex represents a metabolite and every edge

represents a biochemical reaction that transforms one metabolite into another. On the other hand, in a protein–protein interaction network, or protein interaction network, a vertex represents a protein and an edge represents a pairwise interaction between two proteins; that is, two proteins are connected if they interact with each other. A representative gene regulatory network with two genes lac and ci is shown in Figure 3.1, which is actually a schematic representation of a gene oscillator (Hasty et al. 2002a, Hasty et al. 2002b). The two genes synthesize their mRNAs and subsequently proteins X and Y, which in turn activate and repress the two genes. In this directed network, interactions include transcription, translation, and protein–DNA binding. In this chapter we introduce some basic concepts in modeling biomolecular networks to help readers understand the related problems discussed in the later chapters, and provide a general structure for molecular networks in the deterministic form. In particular, we provide examples of gene regulatory networks. Other kinds of networks can be discussed similarly.

Figure 3.1 Schematic representation of a gene oscillator (from (Hasty et al. 2002b))

Cellular regulation is highly integrated and consists of signaling, metabolic, and regulatory processes, although various analyses often focus independently on one or some of these processes. For example, signaling cascades are triggered by the presence of extracellular stimuli and often result in the activation of targets, which may be either enzymes within the cytoplasm or transcription factors. The transcription factors, which regulate transcription of associated genes and the synthesis of various proteins involved in signal transduction and metabolism, function in transcription regulatory networks. The enzymes may be modified so that their catalytic activities are increased or decreased in response to extracellular signals.
Consequently, biomolecular networks are highly integrated.

Understanding how various cellular processes are controlled and what their general structure is remains one of the major challenges in systems biology research. Traditionally, research has focused on the characterization of individual components and then their interactions. However, almost all cellular functions cannot be attributed to isolated components. Rather, they are associated with characteristic molecular networks. The structure and dynamics of molecular networks have been the subject of active research using computational modeling coupled with various experimental techniques and methodologies. Theoretical and computational studies of dynamics in molecular networks span a broad range of topics, such as the dynamics of various regulatory processes in transcriptional, signaling, and metabolic networks, as shown in Figure 3.2.

Figure 3.2 An overview of network structure and dynamics in molecular networks (biochemical interactions in signaling, regulatory, and metabolic networks)

3.1 A General Structure of Molecular Networks

All living organisms consist of one or more cells, which are the basic structural units of an organism. A cell is an integrated device comprising several thousand or more types of genes, proteins, and other molecules. The set of nodes representing these biochemical components and the set of directed or undirected edges representing the interactions between them constitute a molecular network, whose dynamical properties cannot be understood from individual components alone. Complex molecular networks can perform various

specific functions, including DNA replication, translation, conversion of glucose to pyruvate, and cell cycle regulation. Physiological functions of cells and organisms actually arise from the coordinated or integrated functions of multiple molecular networks. Recent advances in mathematical modeling in biology have demonstrated that molecular networks can be well described by mathematical models. These models shed light on the design principles of molecular networks with specified functions and allow making non-trivial predictions, some of which have been verified experimentally. Consequently, to understand how these networks are built, what their general structure is, and how they function, one must develop a conceptual framework, i.e., a precise mathematical description, which can be used to describe and analyze these networks. An appropriate mathematical model can allow qualitative or even quantitative predictions in order to provide guidelines for conducting experiments. Advanced computing devices combined with improved numerical techniques have made it possible to simulate and analyze the dynamical properties of various molecular networks. To build and analyze theoretical models or structures of the various complex molecular networks shown in Figure 3.2, the bottom-up approach can be used: propose a hypothetical network of biochemical reactions among the component species, formulate a set of dynamical equations which describe the temporal and spatial evolution of the network, analyze the equations qualitatively or quantitatively, compare the behavior of the networks with that of living cells, and, consequently, better understand the underlying molecular basis of cell physiology. In principle, the governing equations for any chemical reaction network can be formulated by the mass action law.
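As a minimal illustration of deriving governing equations from the mass action law, consider the reversible reaction A + B ⇌ C with assumed rate constants kf and kr:

```python
# Mass-action kinetics for A + B <-> C with rate constants kf, kr:
# d[A]/dt = d[B]/dt = -kf*[A][B] + kr*[C],  d[C]/dt = +kf*[A][B] - kr*[C]
def rhs(a, b, c, kf=1.0, kr=0.5):
    flux = kf * a * b - kr * c         # net forward reaction flux
    return -flux, -flux, flux

# Forward-Euler integration toward the equilibrium kf*[A][B] = kr*[C]
a, b, c, dt = 1.0, 1.0, 0.0, 1e-3
for _ in range(200_000):               # integrate to t = 200
    da, db, dc = rhs(a, b, c)
    a, b, c = a + da * dt, b + db * dt, c + dc * dt
print(round(a, 3), round(c, 3))        # -> 0.5 0.5
```

The totals A + C and B + C are conserved exactly by this scheme, since the three rates sum appropriately to zero; the equilibrium here satisfies kf·a² = kr·c with a + c = 1, giving a = c = 0.5.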
Therefore, the differential-equation formulation, which models concentrations of cellular components by time-dependent variables, has been widely used to analyze various molecular networks. For instance, regulatory interactions take the form of differential relations among the concentration variables. One example is the ODE models based on rate equations of such forms. Next, we define a general molecular network on the basis of differential equations in a mathematical manner.

3.1.1 Basic Definitions

Let R_+ be the set of non-negative real numbers. Assume that a molecular network is composed of n biochemical components, which can be proteins, mRNAs, chemical complexes, different states of the same protein, or proteins at different locations in a cell. The network can be represented by a functional differential equation (FDE)

dx(t)/dt = f(x_t), (3.1)

where x(t) = (x_1(t), ..., x_n(t)) ∈ X ⊆ R_+^n is the vector of concentrations of all components at time t ∈ R. Let C_+ ≡ C([−r, 0], R_+^n), where C([−r, 0], R_+^n) is the space of continuous maps from [−r, 0] into R_+^n. x_t ∈ C_+ is defined by x_t ≡ x(t + θ), −r ≤ θ ≤ 0. The reaction rates f = (f_1, ..., f_n): C_+ → R_+^n are continuously differentiable and map a bounded subset of C_+ to a bounded subset of R_+^n. Note that the reaction rates f include both synthesis and degradation rates of the components. Let X̂ ⊆ C_+ be the induced space of X on [−r, 0] into R_+^n, i.e., φ ∈ X̂ means φ(θ) ∈ X for −r ≤ θ ≤ 0. A special form of (3.1), which is widely used, is differential equations with discrete delays, represented by

dx_i(t)/dt = f_i(x_1(t − τ_i1), ..., x_n(t − τ_in)) ≡ f_i(x_τi), (3.2)

where τ_ij (i, j = 1, ..., n) denotes the time delay from component j to component i, and x_τi = (x_1(t − τ_i1), ..., x_n(t − τ_in)). These delays arise from the time required to complete transcription, translation, and diffusion to the places where the RNAs or proteins can act. In this book, (3.2) is adopted in most cases to simplify the descriptions, although most theoretical results related to (3.2) also hold for (3.1). When all delays are set to zero, (3.2) takes the form of an ODE. Time delays are often involved in gene regulatory networks. For example, a delay can represent the time taken for a protein to repress or activate the production of its own or other proteins, including the time for translation and processing steps such as multiple phosphorylation, nuclear entry, and complex formation. Few studies have focused on metabolic delays because most metabolic reactions are fast. The temporal behavior of a metabolic network, consisting of n metabolites and r reactions, can often be described by a set of differential equations

dS(t)/dt = N ν(S(t), p), (3.3)

where S denotes the n-dimensional vector of biochemical reactants, and N denotes the n × r stoichiometric matrix.
Clearly, time delays x_τ are all zero in this system. The stoichiometric matrix N contains important information about the structure of the metabolic network. The r-dimensional vector ν(S, p) consists of reaction rates, which depend on the substrate concentrations S as well as a set of parameters p, e.g., enzyme activities. The reaction rates can be determined on the basis of the enzyme dynamics, which obey the mass action law. The description of the metabolic system (3.3) consists of a vector S = (S_1, S_2, ..., S_n)^T of concentration values, a vector ν = (ν_1, ν_2, ..., ν_r)^T of reaction rates or fluxes, a parameter vector p = (p_1, p_2, ..., p_m)^T, and the stoichiometric matrix N. The equation describes the rate of concentration change of each metabolite, including the consumption and production of the metabolite.
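The role of N can be checked numerically on the small chain network of Figure 3.3 discussed next, whose entries are read off from its rate equations (3.5)–(3.8):

```python
# Stoichiometric matrix for the Figure 3.3 example (rows: S1..S4,
# columns: v1..v4), read off from dS/dt = N v:
N = [[ 1, -1,  0,  0],   # S1' = v1 - v2
     [ 0,  1, -1,  0],   # S2' = v2 - v3
     [ 0, -1,  0,  1],   # S3' = v4 - v2
     [ 0,  1,  0, -1]]   # S4' = v2 - v4

def dS(v):
    """Evaluate dS/dt = N v for a flux vector v."""
    return [sum(N[i][j] * v[j] for j in range(4)) for i in range(4)]

# A left null vector c (with c N = 0) yields a conserved quantity c . S.
c = [0, 0, 1, 1]          # corresponds to S3 + S4
cN = [sum(c[i] * N[i][j] for i in range(4)) for j in range(4)]
print(cN)                 # -> [0, 0, 0, 0], so S3 + S4 is constant
```

In general the conserved moieties are exactly the left null space of N, which for larger networks would be computed with linear algebra routines rather than by inspection.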

Take the simple metabolic network with four chemicals S_1, ..., S_4 and fluxes ν_1, ..., ν_4 shown in Figure 3.3 as an example. The stoichiometric matrix takes the form

N = [ 1 −1  0  0
      0  1 −1  0
      0 −1  0  1
      0  1  0 −1 ].  (3.4)

Figure 3.3 A simple metabolic system

The system dynamics is described by a set of ODEs

Ṡ_1 = ν_1 − ν_2, (3.5)
Ṡ_2 = ν_2 − ν_3, (3.6)
Ṡ_3 = ν_4 − ν_2, (3.7)
Ṡ_4 = ν_2 − ν_4, (3.8)

where Ṡ_i = dS_i(t)/dt. The conservation condition is that S_3 + S_4 is constant. Generally, there are some equations for conserved quantities, i.e., the sum of two or more metabolites is a conserved quantity. These conservation conditions can be used to simplify the system, e.g., the fourth equation can be eliminated on the basis of the conservation condition. Clearly, (3.5)–(3.8) form a system that is linear in the fluxes ν_i, although the fluxes themselves can generally be nonlinear functions. A function x(t; φ) ∈ R_+^n is said to be a solution of (3.1) if it satisfies (3.1) for all t ≥ t_0 with x(t_0 + θ; φ) = φ(θ), −r ≤ θ ≤ 0, where φ ∈ C_+ is a given initial function. To emphasize the initial function, we define x_t(φ) ≡ x(t + θ; φ) with x_t0(φ) = x(t_0 + θ; φ) = φ(θ), −r ≤ θ ≤ 0. For (3.1), orbits, equilibria, periodic orbits, and omega and alpha limit sets are defined in the following ways. Let x̂ be the constant function equal to x for all values of its argument, i.e., x̂(θ) = x, where −r ≤ θ ≤ 0. In other words, x̂ is the natural inclusion from x ∈ R_+^n to x̂ ∈ C_+ by x̂(θ) = x with −r ≤ θ ≤ 0.

Definition 3.1. The set of equilibria for (3.1) is defined by

E ≡ {φ ∈ C_+ : φ = x̂ for some x̂ ∈ R_+^n satisfying f(x̂) = 0}. (3.9)

Definition 3.2. The orbit of (3.1) for the initial condition φ ∈ C_+ is

O_+(φ) ≡ {x_t(φ) : t ≥ t_0}. (3.10)

Definition 3.3. The omega limit set is defined by

ω(φ) ≡ ∩_{s≥0} cl{x_t(φ) : t ≥ s}, (3.11)

whereas the alpha limit set is defined by

α(φ) ≡ ∩_{s≤0} cl{x_t(φ) : t ≤ s}, (3.12)

where cl denotes the closure.

Definition 3.4. The orbit O_+(φ) is said to be a T-periodic orbit if x_{T+t}(φ) = x_t(φ) for all t, with minimal T > 0.

3.1.2 A General Structure for Gene Regulatory Networks

Next, we consider an example of a general gene regulatory network, which emphasizes the structure of feedback effects on transcription, splicing, and translation processes (Chen and Aihara 2002a, Wang et al. 2008). The structure of the network is shown schematically in Figure 3.4. In the network, each node represents one gene with its products (the mRNA and the protein) and the relationships between them, i.e., the transcription, translation, and splicing processes. As shown in Figure 3.4 (a), for any single Node i, there are generally many inputs of proteins, i.e., p_1(t − τ_pi1), ..., p_n(t − τ_pin), which come from its own or other nodes with time delays τ_pi1, ..., τ_pin, respectively. τ_pij ∈ R_+ is a time delay from p_j to m_i, i.e., from protein j to mRNA i, mainly due to the slow transcription process. The regulation and interactions of the inputs on the gene or mRNA i are represented as a nonlinear function r_i(p_1(t − τ_pi1), ..., p_n(t − τ_pin)) to express the activation and repression effects of the individual proteins. However, there is only one output, i.e., the protein p_i(t), from any single Node i, which may activate or repress its own gene or other genes with time delays τ_pji. The mRNA i and protein i degrade with degradation rates d_mi and d_pi, respectively. Couplings and interactions of many such nodes constitute a gene regulatory network, as shown in Figure 3.4 (b).
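The node dynamics just described, delayed protein inputs regulating transcription followed by linear translation and degradation, can be integrated with a simple Euler scheme with a stored history. The two-gene wiring and all parameters below are illustrative, not taken from the book:

```python
# Euler scheme for a two-gene loop: protein 1 activates gene 2, while
# protein 2 represses gene 1 (parameters and delays are illustrative).
dt, tau = 0.01, 1.0
lag = int(tau / dt)                  # delay expressed in Euler steps
d_m, d_p, beta = 1.0, 1.0, 2.0
m = [[0.1, 0.0]]                     # history of (m1, m2)
p = [[0.0, 0.0]]                     # history of (p1, p2)

def r(p_del):
    """Delayed regulation functions (assumed Hill-type nonlinearities)."""
    p1, p2 = p_del
    return [4.0 / (1.0 + p2 ** 2),                 # gene 1 repressed by p2
            4.0 * p1 ** 2 / (1.0 + p1 ** 2)]       # gene 2 activated by p1

for step in range(20_000):
    md = m[max(0, step - lag)]       # delayed mRNA state m(t - tau_m)
    pd = p[max(0, step - lag)]       # delayed protein state p(t - tau_p)
    reg = r(pd)
    m.append([m[-1][i] + dt * (-d_m * m[-1][i] + reg[i]) for i in range(2)])
    p.append([p[-1][i] + dt * (-d_p * p[-1][i] + beta * md[i]) for i in range(2)])
print([round(v, 2) for v in p[-1]])
```

Storing the full trajectory and indexing `lag` steps back is the standard way to handle the constant initial history φ(θ) on [−r, 0] required by the definitions above.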
The differential equations of the gene regulatory network can be mathematically represented as (Chen and Aihara 2002a, Wang et al. 2008)

ṁ_i(t) = −d_mi m_i(t) + r_i(p(t − τ_pi)), (3.13)
ṗ_i(t) = −d_pi p_i(t) + s_i(m_i(t − τ_mi)), (3.14)

where m_i, p_i ∈ R (i = 1, ..., n) represent the concentrations or numbers of mRNAs and proteins with degradation rates d_mi and d_pi, respectively. The regulatory functions r_i(p(t − τ_pi)) = r_i(p_1(t − τ_pi1), ..., p_n(t − τ_pin)) and s_i(m_i(t − τ_mi))

are generally nonlinear, with τ_mi ∈ R_+ and τ_pij ∈ R_+ representing time delays for mRNA i and protein i, respectively. τ_mi is a time delay from m_i to p_i, i.e., from mRNA i to protein i, mainly due to the slow translation process.

Figure 3.4 Illustration of a single node and a gene regulatory network: (a) the detailed structure of Node i, with delayed protein inputs r_i(p_1(t − τ_pi1), ..., p_n(t − τ_pin)), transcription of mRNA m_i(t), translation s_i(m_i(t − τ_mi)), and degradation rates d_mi and d_pi; (b) a gene regulatory network composed of n nodes with many feedback loops due to the regulations and interactions among them (from (Wang et al. 2008))

If the detailed binding information among proteins (i.e., TFs) and DNA (i.e., promoters) is available, r_i can be derived analytically and directly from P(RNAP_i) of (2.222). However, such detailed knowledge of the TFs and promoters is rarely available. Hence, r_i and s_i are treated as general nonlinear functions describing the transcription and translation processes in gene regulatory networks. Some choices of r_i(x) and s_i(x) are sigmoid functions such as α tanh(x_i/ε) or x_i^k/(α x_i^k + β) with parameters α and β, which show switch-like phenomena, where ε is a positive parameter and k is the Hill coefficient denoting the degree of cooperativity. As one approximation scheme, an integration model is represented as follows:

r_i(p(t − τ_pi)) = α_i tanh(Σ_{j=1}^{n} w_ij p_j(t − τ_pij)/ε), (3.15)
s_i(m_i(t − τ_mi)) = β_i m_i(t − τ_mi), (3.16)

where w_ij represents the regulation rate from protein j to mRNA i, and β_i is the linear synthesis rate of protein i. Clearly, all inputs from other genes are linearly added with weights w_ij and then their total effect is nonlinearly transformed, whereas the synthesis of protein i is assumed to depend approximately linearly on the concentration of mRNA i (Chen and Aihara 2002a). Note that in molecular regulation there may exist several different co-regulation mechanisms, which can be described by OR gate logic and AND gate logic. The above case corresponds to the OR gate logic, i.e., any of the inputs is sufficient to activate or repress mRNA i. In addition, SUM (summation) and PROD (product) forms can also be used to model the regulatory function r_i.

3.2 Gene Regulatory Networks with Cell Cycles

The cell cycle may significantly change the dynamics of a biomolecular network both qualitatively and quantitatively. The cycle of most eukaryotes is composed of four stages: the G1 (gap) phase, in which the size of the cell increases by constantly producing RNAs and synthesizing proteins; the S (synthesis) phase, in which DNA synthesis and duplication occur; the G2 (gap) phase, in which the cell continues to produce new proteins and grows in size; and the M (mitosis) phase, in which chromosomes segregate and cell division takes place. In particular, the genome is constantly maintained in the G1, G2, and M phases, but duplicated in the S phase, which is shorter than the cell volume growth process and much longer than the cell division instant.
The time period of a cell cycle in most mammalian cells is on the order of hours, whereas bacteria by contrast may divide within tens of minutes, and yeast cells or other protozoans may take 6–8 h. Since the cell volume and DNA number must increase by a factor of 2 between successive divisions in order to ensure that the mass of the two daughter cells remains nearly equal to that of the mother cell, the concentrations or numbers of molecules inevitably depend on the dynamics of the cell cycle, which in turn significantly affects the dynamics of gene regulatory networks through the fluctuations the cell cycle induces. Let m_i and p_i represent the numbers of mRNAs and proteins for gene i. To describe the nonlinear dynamics of gene regulatory networks with the cell cycle taken into account, (3.13)–(3.14) can be rewritten as (Chen et al. 2004)

ṁ_i(t) = −d_mi m_i(t) + N_i u(t) r_i(p(t − τ_pi)/v(t)), (3.17)
ṗ_i(t) = −d_pi p_i(t) + s_i(m_i(t − τ_mi)), (3.18)

where N_i is a positive scalar which represents the number of gene i at the beginning of the cell growth phase, while u(t) ∈ R is the DNA number factor, so that N_i u(t) is the number of gene i at time t, with 1 ≤ u(t) ≤ 2. Define the cell volume factor as v(t) = V(t)/V_0 ∈ R, where V(t) is the host cell volume at time t and V_0 is the host cell volume at the beginning of its growth phase, so that 1 ≤ v(t) ≤ 2. For the period from the beginning of cell growth to the cell division, (3.17) represents the transcription reaction whereas (3.18) describes the translation process. ṁ_i(t) = dm_i(t)/dt and ṗ_i(t) = dp_i(t)/dt hold for all t except the division instants, at which the volume and the numbers of chemicals all halve, i.e., v(t) → v(t)/2, u(t) → u(t)/2, m_i(t) → m_i(t)/2, and p_i(t) → p_i(t)/2 at each division instant t (Chen et al. 2004). In a eukaryotic cell, there is usually one copy of each gene at the beginning of a cell growth phase, i.e., N_i = 1. However, for bacteria, there may exist multiple DNA plasmids per cell, e.g., as many as 100 plasmids, which implies N_i = 1–100 for a host cell. For the sake of simplicity, it is assumed that genes or DNAs, including plasmid DNAs, are duplicated before cell division. Assuming the period of a cell division cycle to be τ, the following piecewise function can be used to describe v(t):

v(t) = { e^{(t/τ − k) ln 2},  kτ ≤ t < (k + 1)τ,
       { 1,                   t = (k + 1)τ,       (3.19)

where k = 0, 1, 2, .... The cell volume factor v(t) exponentially increases from 1 to 2 during each cell cycle and is reset to 1 after each cell division. On the other hand, since DNA is duplicated much faster than the cell volume grows, the following sigmoidal function is adopted to describe approximately the DNA number factor u(t):

u(t) = { a g(t/τ − k) + b,  kτ ≤ t < (k + 1)τ,
       { 1,                 t = (k + 1)τ,         (3.20)

where k = 0, 1, 2, ...
and g(t) = 1/(1 + e^{−γ(t − t_d)}) is a sigmoid centered at t_d with 0 ≤ t_d ≤ 1. The constants a = 1/(g(1) − g(0)) and b = 1 − g(0)/(g(1) − g(0)) are chosen to ensure that u(t) = 1 just after and u(t) = 2 just before each division, respectively. Clearly, the DNA number factor u(t) is mostly constant, except near (k + t_d)τ, where u(t) rapidly increases from 1 to 2; it is reset to 1 after each cell division. The period near (k + t_d)τ corresponds to the S phase of a eukaryotic cell. The temporal changes of the cell volume factor v(t) and the DNA number factor u(t) are shown in Figure 3.5, where the DNA duplication occurs around (k + t_d)τ for each cell cycle period τ with τ = 1.
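The factors v(t) and u(t) are easy to evaluate directly; a minimal sketch, with the offset b fixed by the normalization u = 1 just after and u = 2 just before division:

```python
import math

tau, gamma, t_d = 1.0, 50.0, 0.6      # values used in Figure 3.5

def v(t):
    """Cell volume factor, eq (3.19): doubles over each cycle."""
    k = int(t // tau)
    return 2.0 ** (t / tau - k)

def g(s):
    """Sigmoid centered at t_d, steepness gamma."""
    return 1.0 / (1.0 + math.exp(-gamma * (s - t_d)))

a = 1.0 / (g(1.0) - g(0.0))
b = 1.0 - g(0.0) / (g(1.0) - g(0.0))  # so that u = 1 after, 2 before division

def u(t):
    """DNA number factor, eq (3.20)."""
    k = int(t // tau)
    return a * g(t / tau - k) + b

print(round(u(0.0), 3), round(u(0.999), 3), round(v(0.5), 3))
```

With γ = 50 the sigmoid is nearly a step at t_d = 0.6, reproducing the rapid S-phase duplication seen in Figure 3.5.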

Figure 3.5 The cell volume factor v(t) and the DNA number factor u(t) at t_d = 0.6 and γ = 50 (cell division period τ = 1) (from (Chen et al. 2004))

Define [m_i](t) = m_i(t)/v(t) and [p_i](t) = p_i(t)/v(t) as the relative concentrations of mRNAs and proteins. Then, by differentiating m_i(t) = v(t)[m_i](t) and p_i(t) = v(t)[p_i](t) with respect to t, and further substituting (3.19)–(3.20) into (3.17)–(3.18), with v(t) → v(t)/2, u(t) → u(t)/2, m_i(t) → m_i(t)/2, and p_i(t) → p_i(t)/2 at each division instant, a model of the gene regulatory network in terms of relative concentrations takes the form

[ṁ_i](t) = −(d_mi + v̄)[m_i](t) + N_i (u(t)/v(t)) r_i([p](t − τ_pi)), (3.21)
[ṗ_i](t) = −(d_pi + v̄)[p_i](t) + s_i([m_i](t − τ_mi)), (3.22)

where v̄ = v̇(t)/v(t) = (ln 2)/τ when v(t) of (3.19) is adopted. Note that when there are no cell division cycle dynamics, the terms v̄[m_i](t) and v̄[p_i](t) in (3.21)–(3.22) disappear and u(t) = v(t) = 1. The gene regulatory network with the cell cycle, (3.21)–(3.22), is a non-autonomous system due to the presence of u(t)/v(t), which generally is a periodic function. Therefore, the cell cycle may significantly change the dynamics of biomolecular networks. In particular, when a gene network is near a stability boundary, the cell cycle, acting as a degradation factor, may significantly change the dynamics both qualitatively and quantitatively. For example, a cell division cycle can be viewed as an external periodic force on the inherent autonomous dynamics of genetic networks. Depending on the frequencies and coupling of the external and internal oscillations, there may exist periodic, quasi-periodic, resonant, and even chaotic dynamics generated by synchronization of the two oscillators (Chen et al. 2004). As a cell grows, the DNA or gene numbers can

be assumed to change rapidly or smoothly, depending on the cell type and the initial DNA or gene numbers. It has also been shown that the cell cycle may play significant roles in gene regulation due to the nonlinear relation among the cell volume, the DNA number, and the gene regulatory network, although gene expression is usually tightly controlled by TFs (Chen et al. 2004). Consider two situations for changes in the numbers of genes. The first one is rapid change, i.e., u(t) is constant during cell growth, except in the S phase in which u(t) is doubled, but immediately halves after division in each daughter cell, as indicated in (3.20). Such a case corresponds to the situation of a eukaryotic cell. It may also hold for a system with chromosomal genes in a prokaryotic cell. On the other hand, the second one is smooth change, i.e., u(t) increases proportionally with the cell volume growth until it is doubled at the division instant, but immediately halves after division in each daughter cell, i.e., u(t)/v(t) = 1. Such a case can be considered as an approximation to a prokaryotic cell with a large number of plasmids, i.e., there are many copies of a gene in a cell.

3.2.1 Gene Regulatory Networks for Eukaryotes

First consider the dynamics with rapid changes of gene numbers. Such a case corresponds to the situation of a eukaryotic cell, where there are usually one or a few copies of a gene in a cell. Therefore, by (3.17)–(3.20), with m → m/2, p → p/2, v → v/2, and u → u/2 at the division instant, we can describe the dynamics in terms of the chemical numbers and v, u by impulsive differential equations (IDEs) as follows:

ṁ = N u f(p/v) − K_m m − (m/2) Σ_{k=1}^{∞} δ(t − kτ), (3.23)
ṗ = S_p m − K_p p − (p/2) Σ_{k=1}^{∞} δ(t − kτ), (3.24)
v̇ = v̄ v − (v/2) Σ_{k=1}^{∞} δ(t − kτ), (3.25)
u̇ = (γ/τ)(u − b) − (γ/(aτ))(u − b)² − (u/2) Σ_{k=1}^{∞} δ(t − kτ), (3.26)

where the impulse function δ is defined by δ(t) = 0 for t ≠ 0 and ∫_{−∞}^{+∞} δ(t) dt = 1. Note that v(0) = u(0) = 1.
Due to the last impulsive terms of (3.23)–(3.26), the values of m(t), p(t), v(t), and u(t) all halve at t = kτ. Clearly, (3.23)–(3.26) are not ODEs but IDEs with a periodic impulsive force. The effects of a cell division cycle on the chemical numbers consist of two parts: the time-varying factors (v(t), u(t)) that modulate the synthesis of mRNAs, and the impulsive terms δ(t − kτ) that enhance the degradation or dilution of each chemical.
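A minimal one-gene version of the impulsive system can be stepped with forward Euler, halving all quantities at each division instant. The parameter values and the repressive synthesis function below are assumed for illustration:

```python
# One-gene sketch of the impulsive system (3.23)-(3.26): chemical
# numbers grow between divisions and halve at t = k*tau.
# Parameters are illustrative, not taken from the book.
tau, N, Km, Kp, Sp = 1.0, 10.0, 1.0, 1.0, 2.0
f = lambda c: 1.0 / (1.0 + c ** 2)       # assumed repressive synthesis rate

dt = 1e-3
m = p = 0.0
v = u = 1.0
for step in range(1, int(5 * tau / dt) + 1):
    m += dt * (N * u * f(p / v) - Km * m)
    p += dt * (Sp * m - Kp * p)
    v *= 2.0 ** (dt / tau)               # exponential volume growth, eq (3.19)
    u = v                                # smooth gene-number change (u/v = 1)
    if step % int(tau / dt) == 0:        # division instant: everything halves
        m, p, v, u = m / 2, p / 2, v / 2, u / 2
print(round(m, 2), round(p, 2), round(v, 3))
```

After each full cycle the volume factor returns to 1 (growth by a factor of 2 followed by halving), while m and p settle onto a sawtooth-like periodic orbit rather than a fixed point.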

Consider the stability of a periodic oscillation in (3.23)–(3.26). For the sake of simplicity, (3.23)–(3.26) can be summarized as

Ẋ(t) = F(X(t)) − (1/2) Σ_{k=1}^{∞} δ(t − kτ) X(t), (3.27)

where X = (m, p, v, u) and F = F(m, p, v, u) = (N u f(p/v) − K_m m, S_p m − K_p p, v̄v, (γ/τ)(u − b) − (γ/(aτ))(u − b)²). Let φ(t; X(kτ)) denote the flow of the vector field F starting from φ(0; X(kτ)) = X(kτ) at t = 0, i.e.,

dφ(t; X(kτ))/dt = F(φ(t; X(kτ))), (3.28)

and define ψ(t) as a fundamental solution satisfying

∂ψ(t)/∂t = [∂F(φ(t; X(kτ)))/∂X] ψ(t) (3.29)

with ψ(0) = I, where ψ ∈ R^{(2n+2)×(2n+2)} and I is the identity matrix. According to (3.28), integrating (3.27) from kτ⁺ to t for kτ < t < (k + 1)τ yields

X(t) − X(kτ⁺) = ∫_{kτ⁺}^{t} F(X(t′)) dt′ = ∫_{0}^{t−kτ} F(φ(t′; X(kτ))) dt′ = φ(t − kτ; X(kτ)) − φ(0; X(kτ)). (3.30)

Note that X(kτ⁺) = φ(0; X(kτ)), and the integration range is changed for φ(t; X(kτ)) in (3.30) because its initial state starts from X(kτ). On the other hand, in the same way, by integrating (3.27) from kτ⁺ to (k + 1)τ, we have

X((k + 1)τ) − X(kτ⁺) = ∫_{0}^{τ} F(φ(t; X(kτ))) dt − (1/2) ∫_{0}^{τ} δ(t − τ) φ(t; X(kτ)) dt = φ(τ; X(kτ)) − φ(0; X(kτ)) − (1/2) φ(τ; X(kτ)). (3.31)

Therefore, by using the flow φ of the autonomous system and from (3.30)–(3.31), the orbit of the non-autonomous system (3.27) with k = 0, 1, 2, ... can be expressed as

X(t) = φ(t − kτ; X(kτ)), kτ ≤ t < (k + 1)τ, (3.32)
X((k + 1)τ) = (1/2) φ(τ; X(kτ)), t = (k + 1)τ. (3.33)

Unlike the continuous dynamics of the concentrations, the chemical number X(t) is continuous at t = kτ from the right side, but generally discontinuous at t = kτ from the left side.
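The map X((k+1)τ) = φ(τ; X(kτ))/2 of (3.33) can be made concrete for a scalar linear component, where the flow φ has a closed form; the birth–degradation example below is illustrative:

```python
import math

# Poincare map (3.33) for a scalar linear system m' = alpha - K*m with
# halving at each division (illustrative case with a closed-form flow).
alpha, K, tau = 10.0, 1.0, 1.0

def phi(T, m0):
    """Flow of the vector field F for time T from initial state m0."""
    return alpha / K + (m0 - alpha / K) * math.exp(-K * T)

def P(m0):
    """Poincare map: X -> phi(tau; X) / 2."""
    return 0.5 * phi(tau, m0)

m = 0.0
for _ in range(100):                 # iterate toward the period-tau solution
    m = P(m)

# Analytic fixed point of m = phi(tau; m)/2:
m_star = (alpha / K) * (1 - math.exp(-K * tau)) / (2 - math.exp(-K * tau))
print(round(m, 6), round(m_star, 6))
# |dP/dm| = exp(-K*tau)/2 < 1, so this periodic orbit is asymptotically stable
```

Here the Jacobian of the map is exp(−Kτ)/2, the scalar analogue of J = ψ(τ)/2 in (3.35), and it is always below 1 in magnitude, so the iteration converges geometrically to the periodic solution.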

It is evident that (3.33) is a Poincaré map of (3.27). Thus, the existence of a period-τ solution of (3.27) is equivalent to the existence of a real solution of the algebraic equation

X(kτ) = (1/2) φ(τ; X(kτ)). (3.34)

Note that φ is not the flow of the right-hand side of (3.27) but the flow of the vector field F of (3.28). According to (3.33), the stability of the period-τ solution depends on the eigenvalues of the Jacobian matrix at X(kτ):

J = (1/2) ∂φ(τ; X(kτ))/∂X(kτ) ≡ (1/2) ψ(τ). (3.35)

From dynamical system theory, if the absolute values of the eigenvalues of J are all less than 1, the periodic solution is asymptotically stable. In a similar manner, we can derive the existence and stability conditions for any period-kτ solution.

3.2.2 Gene Regulatory Networks for Prokaryotes

Next, we consider the dynamics with smooth changes of gene numbers, i.e., u(t) increases proportionally with the cell volume, and u(t)/v(t) = 1 approximately holds. Such an assumption is actually valid only when N is sufficiently large, e.g., for a large number of plasmids in a bacterial cell. Otherwise, u(t) should be considered a time-varying factor whose value changes rapidly, as in the first situation. Therefore, by (3.21)–(3.22), we can describe the dynamics in terms of relative concentrations as follows:

[ṁ] = N f([p]) − (K_m + v̄)[m], (3.36)
[ṗ] = S_p [m] − (K_p + v̄)[p], (3.37)

which are actually autonomous ODEs. The effect of a cell division cycle on the relative concentrations is an additional degradation rate v̄, which implies that a cell cycle mainly enhances the dilution of chemicals in terms of concentrations, or affects gene regulation by acting as a degradation factor. In a real cell, there actually exist many perturbations, such as noise, around u(t)/v(t) = 1, which prevent the perfect tuning of the DNA duplication with the cell size or even the equal distribution of DNAs between daughter cells.
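A sketch of this stability analysis for (3.36)–(3.37), with an assumed repressive Hill function f and illustrative parameters: the equilibrium is located by bisection and the eigenvalues of the Jacobian (3.38) are then checked:

```python
import math

# Equilibrium and local stability of (3.36)-(3.37) with a repressive
# Hill function f; all parameter values are illustrative.
N, Sp, Km, Kp = 10.0, 1.0, 1.0, 1.0
vbar = math.log(2.0)                 # cell-cycle dilution, ln(2)/tau, tau = 1
f = lambda p: 1.0 / (1.0 + p ** 2)
df = lambda p: -2.0 * p / (1.0 + p ** 2) ** 2

# At equilibrium: [m] = N f(p)/(Km+vbar) and [p] = Sp [m]/(Kp+vbar); solve for p.
g = lambda p: Sp * N * f(p) / ((Km + vbar) * (Kp + vbar)) - p
lo, hi = 0.0, 100.0
for _ in range(200):                 # bisection (g is strictly decreasing)
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
p_eq = lo
m_eq = N * f(p_eq) / (Km + vbar)

# Jacobian (3.38): [[-(Km+vbar), N f'(p)], [Sp, -(Kp+vbar)]]
a11, a12, a21, a22 = -(Km + vbar), N * df(p_eq), Sp, -(Kp + vbar)
tr, det = a11 + a22, a11 * a22 - a12 * a21
disc = tr * tr - 4 * det
re_parts = [tr / 2, tr / 2] if disc < 0 else \
           [(tr + math.sqrt(disc)) / 2, (tr - math.sqrt(disc)) / 2]
print(round(p_eq, 4), [round(r, 3) for r in re_parts])
```

For a repressive f, df < 0, so det > 0 and tr < 0: both eigenvalues have negative real parts, and the equilibrium of the relative concentrations (hence the corresponding periodic solution of m and p) is locally stable.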
For the case of deterministic perturbations, i.e., u(t)/v(t) = 1 + σ, where σ is a small real number, the stability analysis of equilibria is relatively easy and can be carried out by perturbing N in (3.36), since N u(t)/v(t) = N + σN according to (3.21)–(3.22). Local stability of the dynamics of (3.36)–(3.37) at an equilibrium point for [m] and [p] can be analyzed directly by investigating the eigenvalues of the Jacobian matrix J of (3.36)–(3.37), i.e.,

J = [ −(K_m + v)   N df([p])/d[p]
      S_p          −(K_p + v)     ].  (3.38)

Note that the stability of [m] and [p] at an equilibrium point is identical to that of m and p for the corresponding periodic solution, according to the definition of [m] and [p], i.e., [m_i](t) = m_i(t)/v(t) and [p_i](t) = p_i(t)/v(t). By including nonlinear terms, we can also analyze local bifurcations (Kuznetsov 1995).

The effects of stochastic perturbations on cellular dynamics can be examined by using a model with small stochastic noise as follows. Let u(t)/v(t) = 1 + ση(t), where σ is a small real number corresponding to the deviation of the small noise, and η(t) is Gaussian noise with zero mean ⟨η(t)⟩ = 0 and variance ⟨η(t)η(t′)⟩ = δ(t − t′). Then, (3.36)–(3.37) become

d[m]/dt = N f([p]) − (K_m + v)[m] + N f([p]) σ η(t),  (3.39)
d[p]/dt = S_p [m] − (K_p + v)[p].  (3.40)

Clearly, fluctuations caused by such noise mainly influence the system dynamics through the transcription process, due to (3.39).

Consider the example of a synthetic genetic network shown in Figure 3.6 with genes lac, tetr, and ci and promoters P_LtetO1 and PRM. All three genes are well-characterized prokaryotic transcriptional regulators, which can be found in the bacterium E. coli and λ phage. The protein Lac forms a tetramer to inhibit the gene tetr with promoter PRM, and the protein CI as a dimer activates the gene tetr, while the protein TetR forms a homodimer to repress both the lac and ci genes with promoter P_LtetO1. All three genes can be engineered on plasmids and then cloned into multiple copies, e.g., by polymerase chain reaction (PCR). The engineered plasmids are further assumed to grow in E. coli. Let x, y, and z be the numbers of protein monomers TetR, Lac, and CI, respectively. Define x₂ to be the number of protein dimers TetR₂. Let d and d_x2 be the number of free DNA and the number of TetR₂–DNA complexes, i.e., O_R bound by a protein dimer TetR₂, respectively.
Then, to regulate gene ci, the multimerization and binding reactions of tetr can be written as the following equilibrium reactions:

x + x ⇌ x₂  (forward rate k₁, backward rate k₋₁),  (3.41)
d + x₂ ⇌ d_x2  (forward rate k₂, backward rate k₋₂).  (3.42)

The corresponding slow dynamics, i.e., the transcription and translation processes of ci, are

d → d + m_z  (rate β_mz),  (3.43)
m_z → m_z + z  (rate s_z),  (3.44)

Figure 3.6 A three-gene model of a synthetic gene regulatory network, where the protein Lac forms a tetramer to inhibit the gene tetr, the protein CI enhances the gene tetr as a dimer, and the protein TetR forms a dimer to repress both gene lac and gene ci. PRM is a mutated promoter of P_RM and has two binding sites, O_R1 and O_R2, for the protein dimer CI₂ and one binding site, O_R3, for the protein tetramer Lac₄. The affinities of CI₂ for PRM satisfy O_R1 > O_R2. The binding effects of CI₂ at O_R1 and O_R2 on transcription from PRM are neutral and positive, respectively, in contrast to the negative binding effect of Lac₄ at O_R3. On the other hand, there is one binding site, O_R, for the protein dimer TetR₂, which represses transcription from the promoter P_LtetO1 (from (Chen et al. 2004))

with the degradation processes of m_z and z given by

m_z → 0  (rate k_mz),  (3.45)
z → 0  (rate k_z).  (3.46)

According to the mass action law, the dynamics of reactions (3.41)–(3.42) can be described by the following differential equations:

dx₂/dt = V (k₁ x²/V² − k₋₁ x₂/V),  (3.47)
d(d_x2)/dt = V (k₂ x₂ d/V² − k₋₂ d_x2/V).  (3.48)

The equilibrium states of the fast dynamics (3.41)–(3.42) can be written as the algebraic equations x₂ = k₁x²/(k₋₁V) = c₁x²/V and d_x2 = k₂x₂d/(k₋₂V) = σ₃x²d/V², where σ₃ = c₁c₂ and c_i = k_i/(k₋ᵢV). Such algebraic equations imply that the numbers of chemicals synthesized in the fast dynamics are inversely proportional to the cell volume. Let the copy number of plasmids with gene ci be n_z u(t). Then, we have the conservation condition n_z u(t) = d + d_x2, which leads to d = n_z u(t)/(1 + σ₃x²/v²). Therefore, by substituting the equilibrium states of the fast dynamics, the slow dynamics (3.43)–(3.46) for the mRNA and the synthesized protein, representing the transcription and translation processes of gene ci, are

dm_z/dt = β_mz d − k_mz m_z = β_mz n_z u(t)/(1 + σ₃x²/v²) − k_mz m_z,  (3.49)
dz/dt = s_z m_z − k_z z,  (3.50)

where the synthesis rate of m_z is β_mz d due to the repressive effect of TetR on the binding site O_R. At division instants, x → x/2, y → y/2, z → z/2, v → v/2, and u → u/2.

Similarly, we can obtain the dynamics for regulating genes lac and tetr. By defining the relative concentrations of the proteins as [x] = x/v, [y] = y/v, and [z] = z/v, the dynamical system of the three-gene network can be summarized in terms of the relative concentrations in the following closed form:

d[m_x]/dt = β_mx n_x (u(t)/v(t)) f_x([y], [z]) − (k_mx + v)[m_x],  (3.51)
d[x]/dt = s_x [m_x] − (k_x + v)[x],  (3.52)
d[m_y]/dt = β_my n_y (u(t)/v(t)) f_y([x]) − (k_my + v)[m_y],  (3.53)
d[y]/dt = s_y [m_y] − (k_y + v)[y],  (3.54)
d[m_z]/dt = β_mz n_z (u(t)/v(t)) f_z([x]) − (k_mz + v)[m_z],  (3.55)
d[z]/dt = s_z [m_z] − (k_z + v)[z],  (3.56)

where f_x([y], [z]) = (1 + c[z]² + ασ₁c²[z]⁴)/((1 + c[z]² + σ₁c²[z]⁴)(1 + σ₄[y]⁴)), f_y([x]) = 1/(1 + σ₂[x]²), and f_z([x]) = 1/(1 + σ₃[x]²).

On the basis of the above theoretical framework, the nonlinear dynamics of gene regulatory networks can be analyzed with consideration of the cell division cycle and the duplication process of DNA. In particular, for synthetic switches and oscillators, the cell cycle, acting as a degradation factor, may significantly affect cellular dynamics both qualitatively and quantitatively, as follows: For a gene switch (or genetic switch), the bistable region may disappear due to the cell cycle even though the autonomous system has a bistable region, and vice versa. For a gene oscillator (or genetic oscillator), the cell division cycle functions as an external force that entrains or synchronizes the natural oscillation.
Usually, a cell cycle entrains the system to tend to a limit cycle, but depending on the natural oscillation period or the network structure, there may exist quasi-periodic, resonant, and even chaotic dynamics stimulated by the cell cycle. A gene network (or genetic network) in vivo in a cell and an artificial genetic network in vitro in a cell-free system actually correspond to our model with and without a cell division cycle, respectively. Therefore, such analysis

with and without a cell division cycle may provide a theoretical basis to quantitatively predict the essential dynamics and to successfully carry experiments over from in vitro to in vivo. See (Chen et al. 2004) for more details on the synthetic network models and simulation results.

3.3 Interaction Graphs and Logic Gates

Interaction Graphs and Types of Interactions

We use three types of graphs to represent deterministic structures of molecular networks, one of which is the interaction graph. Each edge in a biomolecular network represented by an interaction graph corresponds to an interaction between two components. An interaction, or more exactly a pairwise interaction, can be of two types in a directed molecular network: activation and repression. Activation from B to A occurs when the synthesis rate of A increases as the concentration or number of B increases. For example, an activator B can increase the transcription rate of its target gene A; therefore, the interaction from the activator B to the target gene A is activation. On the other hand, repression from B to A occurs when the synthesis rate of A decreases as the concentration or number of B increases. For instance, when a repressor B binds to the promoter of gene A, it can reduce the transcription rate of its target gene A. The types of interaction can be defined more generally as follows in a molecular network with n chemicals, i.e., (3.1) or (3.2). Suppose that the concentration of the jth component at time t − τ_ij affects the synthesis rate of the ith component at time t, where i, j ∈ {1,..., n}. If the synthesis rate of the ith component at time t increases (or decreases) as the concentration of the jth component at t − τ_ij increases, the type of interaction from the jth component to the ith component is called positive (or negative), and we set s_ij = 1 (or s_ij = −1).
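The sign s_ij can be estimated directly from a rate function by a finite-difference approximation of the partial derivative. The sketch below uses hypothetical Hill-repression and Michaelis–Menten-activation rate laws with made-up parameter values to illustrate the idea:

```python
def interaction_sign(f, x, j, eps=1e-6):
    """Estimate s_ij = sign of the partial derivative of f w.r.t. x_j
    at the state x, using a central difference."""
    up, dn = list(x), list(x)
    up[j] += eps
    dn[j] -= eps
    d = f(up) - f(dn)
    return (d > 0) - (d < 0)

# Hypothetical rate laws: Hill-type repression and Michaelis-Menten activation.
V, K, n = 2.0, 1.0, 2
repress  = lambda x: V / (K + x[0]**n)      # synthesis repressed by x_0
activate = lambda x: V * x[0] / (K + x[0])  # synthesis activated by x_0

s_repress = interaction_sign(repress, [1.5], 0)    # repression -> -1
s_activate = interaction_sign(activate, [1.5], 0)  # activation -> +1
```

Because both rate laws are monotone in x_0, the numerically estimated sign is the same at any positive concentration.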
If the synthesis rate of the ith component at t is never affected by a change in the concentration of the jth component, i.e., there is no direct interaction between them, we set s_ij = 0. For (3.1) or (3.2), mathematically, s_ij = 1, s_ij = −1, or s_ij = 0 means

s_ij = 1 if ∂f_i(x(t))/∂x_j(t) > 0,  (3.57)
s_ij = 0 if ∂f_i(x(t))/∂x_j(t) = 0,  (3.58)
s_ij = −1 if ∂f_i(x(t))/∂x_j(t) < 0,  (3.59)

for all x(t) ∈ X. Thus, s_ij = 1 (or s_ij = −1) means that the jth component affects the ith component positively (or negatively) with time delay τ_ij, where

x_τi = (x₁(t − τ_i1),..., x_n(t − τ_in)), and τ_ij is the time delay from chemical j to chemical i. For instance, in the equation

dP(t)/dt = f(S(t − τ_s)) = V_s/(K_M + S(t − τ_s)ⁿ),  (3.60)

we have s_PS = −1 because ∂f(S(t))/∂S(t) < 0 for all S(t). On the other hand, in the equation

dP(t)/dt = f(S(t)) = V_s S(t)/(K_M + S(t)),  (3.61)

we have s_PS = 1 because ∂f(S)/∂S > 0 for all S. Subsequently, we describe the definition of interaction graphs (Kobayashi et al. 2003), which enables an intuitive understanding of the relations among the components. An interaction graph, IG(F), of the biomolecular network defined by (3.1) or (3.2) is a directed graph whose nodes represent the individual components and whose edges e_ij represent the interactions between node i and node j. When s_ij ≠ 0, the graph has an edge e_ij directed from node j to node i. A representative interaction graph is shown in Figure 3.7.

Figure 3.7 A representative interaction graph with feedback loops. Signs + and − on an edge indicate s = 1 and −1, respectively. A feedback loop designated by a solid curve (or dashed curve) is a positive (or negative) feedback loop. In this graph, there is a negative feedback loop composed of the first, second, fourth, fifth, and sixth nodes; a positive feedback loop composed of the first, second, third, and sixth nodes; and a positive self-feedback loop at the fifth node

The types of feedback loops are qualitative characteristics of biomolecular networks. If there is a path from the ith node of an interaction graph to itself,

p(i, i) = (i = p₁ → p₂ → ⋯ → p_{l−1} → p_l = i), then this path is said to be a feedback loop; furthermore, it becomes a self-feedback loop when l = 2, where p_m denotes the mth node in the path. In addition, this feedback loop is said to be positive (or negative) if ∏_{m=1}^{l−1} s_{p_{m+1} p_m} = 1 (or −1). By using positive and negative feedback loops, we can obtain many features of a molecular network. For instance, if all feedback loops are positive, the molecular network has very simple steady-state behavior, which can be exploited for designing switching networks, as stated in the succeeding chapters. On the other hand, if some of the feedback loops are negative, a periodic oscillation may appear in the molecular network, and this mechanism is adopted to design oscillating networks. The dynamics of biomolecular networks are generally complicated due to various types of nonlinear interactions among the components. When the details of the interactions between any two components in a network are known, one can, depending on the required accuracy, use stochastic or deterministic equations to model its dynamics at the molecular level on the basis of the mass action law. Such a technique can also be used to construct synthetic molecular networks from known biological materials, which is a main goal of synthetic biology. The details of the functional regulatory relationship between two components are, however, generally unknown. Therefore, when modeling such a relationship irrespective of particular parameter values, some approximations are usually adopted. Assume that for each reaction in a biomolecular network there exists a rate function, which may include kinetic parameters. The rate function is usually an MM or Hill type function and is monotone in its variables. For example, consider two interacting components x and y.
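The sign of a feedback loop, defined above as the product of the edge signs along the path, can be computed mechanically. The three-node sign matrix below is a hypothetical example, not one from the text:

```python
def loop_sign(S, path):
    """Sign of a feedback loop p_1 -> p_2 -> ... -> p_l = p_1,
    computed as the product of edge signs S[p_{m+1}][p_m]."""
    sign = 1
    for a, b in zip(path, path[1:]):
        sign *= S[b][a]   # edge from node a to node b carries sign S[b][a]
    return sign

# Hypothetical 3-node network: 0 activates 1, 1 activates 2, 2 represses 0.
S = [[0, 0, -1],
     [1, 0,  0],
     [0, 1,  0]]
negative_loop = loop_sign(S, [0, 1, 2, 0])   # one repression -> negative loop
```

An odd number of negative edges along the cycle yields −1 (a negative loop, favoring oscillation), while an even number yields +1 (a positive loop, favoring switching).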
If x activates y, then using the Hill function, the dynamics can be described as follows:

dy/dt = v_x (x/k_xy)ⁿ/(1 + (x/k_xy)ⁿ) − k_d y + k_b,  (3.62)

where x = x(t) and y = y(t). Clearly, the Hill function can be viewed as a special form of the general polynomial function (2.222). On the other hand, if x inhibits y, the dynamics can be described by

dy/dt = v_x/(1 + (x/k_xy)ⁿ) − k_d y + k_b.  (3.63)

Here, v_x represents the regulatory effect of x, the Hill coefficient n indicates the sensitivity of y with respect to x, k_xy denotes the threshold of x that induces a significant response of y, k_d is the degradation rate, and k_b is the basal synthesis rate (Kim et al. 2008). Taking v_x, k_xy, and k_d to be unity leads to the simpler forms

dy/dt = xⁿ/(1 + xⁿ) − y + k_b  (3.64)

and

dy/dt = 1/(1 + xⁿ) − y + k_b  (3.65)

of (3.62) and (3.63), respectively.

Logic Gates

Generally, a component may have one or more regulators. For example, a gene can be regulated by multiple transcription factors. Several regulators are combined through a logic block, which merges different regulations into one by the continuous analogue of the Boolean AND or OR gate logic. Different co-regulation mechanisms may produce different dynamics, and even for a given logic gate there exist different co-regulation mechanisms. For example, there are competitive and non-competitive binding mechanisms for the OR and AND gate logic, as shown in Chapter 2. Here, we consider the case of two regulators based on the Hill function; the case of more regulators can be discussed similarly. Let us first consider the case where both regulators j and k are repressors. If the simultaneous binding of the two regulators is required to achieve transcriptional repression of gene i, the co-regulation function, or synthesis function, can be modeled as follows:

f_i^S(p_j, p_k) = 1/(1 + p_jⁿ p_k^m),  (3.66)

where n and m are the Hill coefficients of proteins j and k, respectively, and the superscript S stands for simultaneous. In other words, the regulatory function f_i^S is chosen as the algebraic equivalent of the Boolean AND function for repressors. On the other hand, if the binding of either of the two repressors is sufficient to inhibit the gene expression, the co-regulation can be expressed as

f_i^I(p_j, p_k) = 1/(1 + p_jⁿ + p_k^m),  (3.67)

where the superscript I stands for independent, and the regulatory function f_i^I is chosen as the algebraic equivalent of the Boolean OR function for repressors. The co-regulatory functions of Boolean AND and OR gates for activators can be defined similarly.
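A minimal sketch of the AND-like (3.66) and OR-like (3.67) co-regulation functions for two repressors, with illustrative Hill coefficients and concentration values, makes the qualitative difference between the gates concrete:

```python
# Co-regulation of gene i by two repressors j, k with Hill coefficients n, m.
def f_S(pj, pk, n=2, m=2):
    """AND-like gate (3.66): repression requires simultaneous binding."""
    return 1.0 / (1.0 + pj**n * pk**m)

def f_I(pj, pk, n=2, m=2):
    """OR-like gate (3.67): either repressor alone suffices."""
    return 1.0 / (1.0 + pj**n + pk**m)

# With only one repressor present, the AND-like gate stays fully active,
# while the OR-like gate is already repressed.
and_one = f_S(2.0, 0.0)   # = 1.0
or_one  = f_I(2.0, 0.0)   # = 1/(1 + 4) = 0.2
```

This reproduces the Boolean intuition in a continuous form: the AND gate needs both inputs high to repress, whereas the OR gate responds to either input alone.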
Next, if one regulator, say j, is an inhibitor and the other, say k, is an activator, the co-regulation function can be modeled as follows:

f_i^M(p_j, p_k) = (1 + p_k^m)/(1 + p_jⁿ + p_k^m),  (3.68)

where the superscript M stands for mixed (Goh et al. 2008). Equations (3.66)–(3.68) are defined on the basis of the simpler Hill functions (3.64)–(3.65). Co-regulation functions corresponding to (3.62)–(3.63) for different Boolean logics can be

similarly defined (Kim et al. 2008). For example, if both x and z activate y, the resulting dynamics can be

dy/dt = v_x ((x/k_xy)ⁿ + (z/k_zy)^m)/(1 + (x/k_xy)ⁿ + (z/k_zy)^m) − k_d y + k_b,  (3.69)

which is clearly a special form of the general transcription regulatory function (2.222). These co-regulation mechanisms can be defined similarly for the case of MM type rate functions.

Consider the example of the repressilator, a synthetic gene regulatory network composed of the genes ci, tetr, and laci (Elowitz and Leibler 2000). Its basic kinetics is described by

dm_i/dt = −m_i + α/(1 + p_jⁿ) + α₀,  (3.70)
dp_i/dt = −β(p_i − m_i),  (3.71)

where i = (laci, tetr, ci) and j = (ci, laci, tetr), respectively. The variables m_i = m_i(t) and p_i = p_i(t) are the concentrations of the mRNAs and their protein products, respectively. The parameter α₀ is the basal synthesis rate, α + α₀ is the maximum synthesis rate in the absence of repressors, β is the ratio of the decay rates, and n is the Hill coefficient.

Figure 3.8 Eight possible configurations of the repressilator with a new component N4 forming a coupled feedback structure. Open circles represent the elements of the original three-node repressilator, and solid circles denote the newly introduced elements. Arrows denote activation, and blunted lines indicate repression (from (Goh et al. 2008))

To investigate the dynamical consequences of extension through the interlocking of elementary cellular networks, Goh et al. studied sustained oscillations in the repressilator extended with a new component (Goh et al. 2008). The new component interacts with two existing nodes to form an additional feedback loop. The four-node system is modeled as

dm_i/dt = −m_i + α f_i(p_j, p_k) + α₀,  (3.72)
dp_i/dt = −β(p_i − m_i),  (3.73)

where i = N1, N2, N3, N4, and N4 is a new node in addition to the three nodes of the original repressilator. According to the co-regulation among the fourth node and the other two existing nodes, the function f_i can take one of the forms f_i^S, f_i^I, and f_i^M. From in silico analysis, it can be shown that the capability of sustained oscillation depends on the topology of the extended system, and the stability of sustained oscillation under the extension also depends on the coupling topology. Clearly, all eight possible configurations have at least one negative feedback loop. Coherent coupling, i.e., Figure 3.8 (a, d, f, g), where the new feedback loop contains an odd number of inhibitory interactions, and homogeneous regulation, i.e., Figure 3.8 (d, g), where the element regulating two targets has the same sign of regulation (N2 in Figure 3.8 (d) and N1 in Figure 3.8 (g)), favor sustained oscillations (Goh et al. 2008). A detailed analysis of the dynamical and topological features of such networks will be provided in the succeeding chapters, in particular for switching and oscillating networks.
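The elementary repressilator kinetics (3.70)–(3.71) can be simulated directly. The sketch below uses forward Euler with parameter values of the order reported by Elowitz and Leibler (2000), chosen here purely for illustration:

```python
import numpy as np

# Forward-Euler simulation of the dimensionless repressilator (3.70)-(3.71).
# alpha, alpha0, beta, n are illustrative values of the order used in the
# original repressilator study.
alpha, alpha0, beta, n = 216.0, 0.216, 5.0, 2.0

m = np.array([1.0, 1.2, 1.5])   # mRNA of laci, tetr, ci
p = np.array([2.0, 1.0, 3.0])   # corresponding proteins
dt, steps = 0.01, 10000         # integrate to t = 100
peak = 0.0
for _ in range(steps):
    # laci is repressed by CI, tetr by LacI, ci by TetR: j = (ci, laci, tetr)
    rep = p[[2, 0, 1]]
    dm = -m + alpha / (1.0 + rep**n) + alpha0
    dp = -beta * (p - m)
    m, p = m + dt * dm, p + dt * dp
    peak = max(peak, m.max())
```

With parameters in this regime the cyclic repression drives large mRNA excursions, the signature of the repressilator's limit-cycle oscillation.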

4 Qualitative Analysis of Deterministic Dynamical Networks

Given a biomolecular network, one can characterize its system behavior by applying various analytical methods, such as stability and bifurcation analysis, robustness and sensitivity analysis, topological analysis, and control analysis. These techniques can provide qualitative and quantitative insights into the system behavior of various networks. Such information is useful for revealing the design principles of biomolecular networks with specified functions, such as switching and oscillation, and for implementing synthetic biomolecular networks.

4.1 Stability Analysis

Consider system (3.2) with delays,

dx_i(t)/dt = f_i(x_τi), i = 1,..., n,  (4.1)

or, in vector form,

dx(t)/dt = f(x_τ),  (4.2)

where there are n × n delays τ_ij for i, j = 1,..., n, and τ_ij is the time delay from x_j to x_i. The inherent nonlinearity in (4.1) often precludes the analytical investigation of its dynamics, but we can gain insight into its behavior by linearizing it around some point, generally an equilibrium. If a system is at an equilibrium, it stays there as long as there is no external perturbation. Depending on the system behavior after a perturbation, an equilibrium is stable if the system returns to this state, or unstable if the system leaves this state after the perturbation. An equilibrium is asymptotically stable if it is stable and the trajectories from nearby initial conditions (a set called the basin of attraction) tend to this state for

t → ∞. Local stability describes the behavior after small perturbations, and global stability the behavior after any perturbation. Let us first consider local stability at an equilibrium. Assume f(x̄) ≡ 0, i.e., the constant function x ≡ x̄ is an equilibrium solution of (4.1). Extracting the linear part from (4.1), it can be rewritten as

dx/dt = L(x_τ) + f_h(x_τ),  (4.3)

where L(·) is a linear functional and f_h(u) = o(‖u‖) (namely, o(‖u‖)/‖u‖ → 0 as ‖u‖ → 0) is the nonlinear term of order higher than ‖u‖. If all roots λ_k of the characteristic equation

det[λI − P(λ)] = 0,  (4.4)

with

P(λ) = L(x_τ → e^{−λτ}),  (4.5)

have negative real parts Re λ_k < 0, then the equilibrium is asymptotically stable, where L(x_τ → e^{−λτ}) represents the replacement of all x(t − τ_ij) in L(x_τ) by e^{−λτ_ij}, i.e., x(t − τ_ij) → e^{−λτ_ij} in L(x_τ) for i, j = 1,..., n. If Re λ_k > 0 for at least one k, the equilibrium is unstable. In the case where at least one real part Re λ_k = 0 and all other Re λ_k < 0, the local stability cannot be determined by the characteristic equation but depends on the higher-order term f_h(x_τ). Such a case is the main focus of bifurcation analysis. Here, I is the n × n identity matrix.

Local stability depends on the characteristic values. Global stability, on the other hand, is widely analyzed using Lyapunov functions because of the simplicity and generality of the method. An equilibrium is globally stable if the trajectories from all initial conditions approach it as t → ∞. Assume x̄ to be an equilibrium of (4.1); its stability can be tested with the Lyapunov–Krasovskii functional method as follows:

1. Transfer the equilibrium to the origin by the coordinate transformation x̂ = x − x̄.
2. Find a Lyapunov–Krasovskii functional V(x̂_τ, t) with the following property: V(x̂_τ, t) is positive definite, i.e., V(x̂_τ, t) = 0 for x̂_τ = 0 and V(x̂_τ, t) > 0 for x̂_τ ≠ 0.
3. Calculate the time derivative of V(x̂_τ, t).
The equilibrium x̂ = 0 is stable if the time derivative of V(x̂_τ, t) has no positive values in a certain region around this state. The equilibrium is asymptotically stable if the time derivative of V(x̂_τ, t) is negative definite in this region, i.e., dV(x̂_τ, t)/dt = 0 for x̂_τ = 0 and dV(x̂_τ, t)/dt < 0 for x̂_τ ≠ 0. If the

region is the whole state space, the equilibrium is globally asymptotically stable.

Consider a scalar delay differential equation with constant coefficients a, b and constant delay h ≥ 0,

ẋ(t) = −ax(t) − bx(t − h), t ≥ 0.  (4.6)

Clearly, x = 0 is the equilibrium. Construct a Lyapunov–Krasovskii functional as follows:

V(x_h, t) = x²(t) + |b| ∫_{t−h}^{t} x²(s) ds.  (4.7)

Then,

V̇(x_h, t) = −2ax²(t) − 2bx(t)x(t − h) + |b|[x²(t) − x²(t − h)]
           ≤ −2ax²(t) + |b|x²(t) + |b|x²(t − h) + |b|[x²(t) − x²(t − h)]
           = −2(a − |b|)x²(t).  (4.8)

As a result, we obtain V̇(x_h, t) ≤ −2(a − |b|)x²(t). This gives the condition a > |b| for global asymptotic stability, which is independent of the value of the delay h.

Consider the general model of gene regulatory networks (3.13)–(3.14) for the analysis of local stability. Assume that (m̄, p̄) is an equilibrium and consider its local stability. By linearizing r = (r₁,..., r_n) and s = (s₁,..., s_n) and by substituting m(t) = a e^{λτ_m} and p(t) = b e^{λτ_p} into (3.13)–(3.14), we obtain the characteristic equation at the equilibrium (m̄, p̄) as follows:

det( [ λI  0        −  [ −D_m            J_r E^{−λτ_p}
       0   λI ]          J_s E^{−λτ_m}   −D_p          ] ) = 0,  (4.9)

where D_m = diag(d_m1,..., d_mn) and D_p = diag(d_p1,..., d_pn), and

J_r = dr(p)/dp and J_s = ds(m)/dm  (4.10)

evaluated at (m̄, p̄) are n × n matrices; I is the n × n identity matrix. Note that J_s is a diagonal matrix because s_i depends only on m_i. Here E^{−λτ_p} = diag(e^{−λτ_p1},..., e^{−λτ_pn}) and E^{−λτ_m} = diag(e^{−λτ_m1},..., e^{−λτ_mn}), where we assume τ_pi = τ_pij for i, j = 1,..., n for simplicity. Multiplying (4.9) by diag(e^{λτ_p}, e^{λτ_m}) and using Schur's theorem, we obtain

det((λI_n + D_m)(λI_n + D_p)E^{λτ̄} − J_s J_r) = 0,  (4.11)

where τ̄ = (τ_m1 + τ_p1,..., τ_mn + τ_pn). The local stability of (m̄, p̄) depends on the characteristic roots of (4.11). In particular, if all characteristic roots of (4.11) have negative real parts, the equilibrium (m̄, p̄) will be asymptotically

stable. If, on the other hand, there exists a root with a positive real part, it will be unstable.

Now, consider the global stability of the equilibrium (m̄, p̄) for (3.13)–(3.14) by assuming that r can be expressed in Lur'e form in p and that s is a linear function of m. Under simple transformations, (3.13)–(3.14) can be rewritten in the following Lur'e form:

ẋ(t) = Ax(t) + G f(y(t − τ₁(t))),  (4.12)
ẏ(t) = Cy(t) + Dx(t − τ₂(t)),  (4.13)

where τ₁(t) > 0 and τ₂(t) > 0 are inter- and intra-node time-varying delays. We assume that τ̇₁(t) ≤ d₁ < 1 and τ̇₂(t) ≤ d₂ < 1. On the basis of the Lyapunov method and the LMI technique, the global asymptotic stability of the equilibrium is established as follows (Li et al. 2006a): suppose there exist matrices P₁₁, P₂₂, P₁₂, Q > 0, R > 0, and Λ = diag(λ₁,..., λ_n) > 0 such that the following LMIs hold:

M₂ < 0 and P = [ P₁₁   P₁₂
                 P₁₂ᵀ  P₂₂ ] > 0;  (4.14)

then the unique equilibrium of the genetic network (4.12)–(4.13) is globally asymptotically stable, where M₂ is the following symmetric matrix (∗ denotes the transposed symmetric blocks):

M₂ = [ 2P₁₁A + R   P₁₂C + AP₁₂   P₁₂D        0        P₁₁G
       ∗           2P₂₂C         P₂₂D        kΛ       P₁₂ᵀG
       ∗           ∗             −(1−d₂)R    0        0
       ∗           ∗             ∗           Q − 2Λ   0
       ∗           ∗             ∗           ∗        −(1−d₁)Q ].  (4.15)

This result can be proven by constructing the Lyapunov–Krasovskii functional

V(x(t), y(t), t) = [x(t); y(t)]ᵀ P [x(t); y(t)] + ∫_{t−τ₁(t)}^{t} fᵀ(y(μ)) Q f(y(μ)) dμ + ∫_{t−τ₂(t)}^{t} xᵀ(μ) R x(μ) dμ  (4.16)

and showing that V̇(x(t), y(t), t) < 0 holds. These conditions guarantee the global stability of the equilibrium and can easily be verified using software tools, e.g., the MATLAB® LMI Toolbox. Moreover, this numerical approach can be used not only to analyze and understand various gene regulation mechanisms in living organisms but also to design synthetic biomolecular networks in the framework of synthetic biology and to analyze, for example, oscillatory phenomena such as the synchronization of gene oscillators.
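The delay-independent condition a > |b| obtained for the scalar delay equation (4.6) can be checked numerically with a forward-Euler scheme and a history buffer. The parameter values below are illustrative:

```python
# Forward-Euler simulation of the scalar delay equation (4.6),
# x'(t) = -a x(t) - b x(t - h), to check the delay-independent stability
# condition a > |b| derived from the Lyapunov-Krasovskii functional (4.7).
a, b, h = 2.0, 1.0, 1.0          # a > |b|, so the origin should attract
dt = 0.01
lag = int(h / dt)                # number of steps spanning the delay h
hist = [1.0] * (lag + 1)         # constant history x(t) = 1 for t <= 0
for _ in range(3000):            # integrate to t = 30
    x, x_del = hist[-1], hist[-1 - lag]
    hist.append(x + dt * (-a * x - b * x_del))
final = abs(hist[-1])            # should have decayed to nearly zero
```

Rerunning with a < |b| is a useful contrast: the guarantee no longer applies, and for suitable delays the solution can grow.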

The type of stability discussed above is Lyapunov stability. Another type of stability is structural stability, for which the qualitative behavior of the trajectories is unaffected by continuously differentiable small perturbations. Examples of such qualitative properties are hyperbolic invariant sets, such as hyperbolic equilibria and periodic orbits. In contrast to Lyapunov stability, which considers perturbations of the initial conditions for a fixed system, structural stability deals with perturbations of the system itself. Structural stability plays important roles in robustness and development in biology (Kitano 2002). For example, the fundamental properties of the lambda phage fate decision circuit are not affected even if the sequence of OR binding sites is altered (Kitano 2002). Lambda phage exploits multiple feedback mechanisms, rather than specific parametric features of elements such as binding sites, to stabilize the committed state and to enable the switching of its pathways.

4.2 Bifurcation Analysis

Bifurcation analysis focuses on qualitative changes in system behavior in response to parameter changes. It is performed by varying single or multiple parameters until a qualitative change in the dynamics occurs. The value at which this change occurs is called the bifurcation value; there, the real part of at least one root λ_k of the characteristic equation is zero. Bifurcation analysis can help to provide comprehensive and predictive information for understanding gene expression patterns, regulatory pathways, and the functions of various biomolecular networks. To show explicitly the dependence of the system dynamics on the parameters, (4.1) is rewritten as

dx(t)/dt = f(x_τ; α),  (4.17)

where α is a vector of system parameters. Assume that f(0; α) ≡ 0, i.e., the zero function corresponding to the equilibrium is a solution of (4.17) for all parameter values α.
The characteristic equation of the linear approximation to (4.1) has the form of (4.4). In general, bifurcations can be divided into two principal classes, local bifurcations and global bifurcations. Local bifurcations can be analyzed entirely through changes in the local stability properties of equilibria, periodic orbits, or other invariant sets as parameters cross critical thresholds, whereas global bifurcations often occur when larger invariant sets of the system collide with each other or with equilibria of the system. Global bifurcations cannot be detected by stability analysis of the equilibria alone. Varying the system parameters can create or destroy equilibrium solutions, and the properties of these equilibria can change. At bifurcation points, a stable equilibrium may lose its stability, or vice versa. A stable equilibrium

may also bifurcate into other equilibria or none at all. Local bifurcation of equilibria can be used to analyze such phenomena. For local bifurcations, there are two generic (codimension-one) bifurcation types for a general continuous-time nonlinear dynamical system, namely steady-state bifurcations and Hopf bifurcations, at which the real part of a root of the characteristic equation is zero. Most notably:

1. if a root is equal to zero, the bifurcation is a steady-state bifurcation;
2. if two roots are nonzero but form a pair of purely imaginary complex conjugate numbers, the bifurcation is called a Hopf bifurcation.

Clearly, the simplest and most commonly occurring bifurcations, associated with the appearance of one characteristic value λ₁ = 0, are the steady-state bifurcations, which include saddle-node bifurcations, transcritical bifurcations, and pitchfork bifurcations. In fact, by center manifold theory and an appropriate transformation, a nonlinear system can be qualitatively reduced to a codimension-one system near the bifurcation point. Therefore, we have the following qualitative descriptions of each steady-state bifurcation.

1. The saddle-node bifurcation, with the prototype function or normal form ẋ(t) = α − x²(t): a saddle-node bifurcation occurs when there are two curves of stable and unstable equilibria on one side of the bifurcation point in the bifurcation diagram and no curves of equilibria on the other side. At the bifurcation point, the stable and unstable equilibria coalesce and form a saddle-node equilibrium.
2.
The transcritical bifurcation, with the prototype function or normal form ẋ(t) = αx(t) − x²(t): a transcritical bifurcation occurs when two curves of stable and unstable equilibria exist on either side of the bifurcation point, both curves intersect at the bifurcation point, and the stability of each equilibrium along a curve changes on passing through the bifurcation point in the bifurcation diagram.
3. The pitchfork bifurcation, with the prototype function or normal form ẋ(t) = αx(t) ± x³(t), where the negative and positive signs correspond to a supercritical and a subcritical bifurcation, respectively. In the supercritical case, the pitchfork bifurcation occurs when three curves of equilibria intersect at the bifurcation point and only one middle curve c exists on both sides of the bifurcation point in the bifurcation diagram; the other curves lie entirely on one side of the bifurcation point and have a stability type opposite to that of the curve c.

Next, we consider the Hopf bifurcation. In addition to equilibrium solutions, another widely observed phenomenon is oscillatory behavior, which is also common in biological systems. The cause of an oscillation may differ; it can be either externally imposed or internally generated. An internally caused stable oscillation can be found if the system has a limit cycle in the state space. A limit cycle is an isolated closed trajectory. All trajectories in its

vicinity wind towards or away from the limit cycle as t → ∞, depending on the stability of the limit cycle. Hopf bifurcations can lead to stable or unstable oscillations. A Hopf bifurcation occurs when there is a pair of purely imaginary roots. Assume that there exists α₀ such that for α < α₀, all roots λ_k of the characteristic equation (4.4) belong to the open left half of the complex plane, whereas at α = α₀,

λ₁,₂|_{α=α₀} = ±iω₀ (ω₀ > 0),  (4.18)
d Re λ₁,₂(α)/dα|_{α=α₀} > 0, Re λ_j|_{α=α₀} < 0 (for all j > 2).  (4.19)

Under these conditions, if α increases and passes through the value α₀, then the stable equilibrium becomes unstable, i.e., α = α₀ is the bifurcation value. When α increases through α₀, a periodic solution bifurcates from the equilibrium. Such a solution is stable if it arises for α > α₀ and unstable if it arises for α < α₀. When a stable cycle emerges, the bifurcation is a supercritical Hopf bifurcation; otherwise, it is subcritical.

Local bifurcations have been widely used in the analysis of biomolecular networks. For example, saddle-node bifurcations and pitchfork bifurcations correspond to progression and decision differentiation, respectively (Guantes and Poyatos 2008), and Hopf bifurcations are often used to detect the occurrence of oscillations such as circadian rhythms (Goldbeter 1995, Leloup and Goldbeter 1998, Leloup and Goldbeter 2003). For an extensive treatment of bifurcation analysis, refer to (Guckenheimer and Holmes 1983, Kuznetsov 1995, Kolmanovskii and Myshkis 1999).

Figure 4.1 Bifurcation diagram for the wild-type cell cycle (from (Tyson et al. 2003))

A typical example of bifurcation analysis in biomolecular networks is the cell cycle, as shown in Figure 4.1. The cell cycle can be viewed as a sequence

of bifurcations. A small newborn cell is attracted to the stable G1 steady-state. As it grows, it eventually passes the saddle-node bifurcation (SN3) where the G1 steady-state disappears, and the cell makes an irreversible transition into the S/G2 stage. It stays in the S/G2 stage until it grows so large that the S/G2 steady-state disappears through an infinite-period bifurcation (SN/IP), where a stable steady-state gives way to a large-amplitude periodic solution whose period of oscillation is very long. Cyclin-B-dependent kinase activity soars, driving the cell into mitosis, and then plummets as cyclin B is degraded by APC-Cdc20. The drop in Cdk1-cyclin B activity is the signal for the cell to divide, halving the cell size, and the system returns to its starting point (Tyson et al. 2003). Bifurcation analysis has also been widely used for analyzing multistability (Angeli et al. 2004), resonance in oscillatory cellular processes (Hasty et al. 2002a), production of circadian oscillations and coexistence of multiple attractors (Goldbeter 1997), single-parameter robustness (Ma and Iglesias 2002), and synchronization of coupled oscillators (Gonze et al. 2005, Chen et al. 2005).

Many software tools have been developed to study bifurcations and the associated dynamics, such as switching and oscillatory dynamics, in biomolecular networks. These software tools, tutorial manuals, and test models can be freely downloaded, e.g., XPPAUT at bard/xpp/xpp.html, the MATLAB package MatCont at and DDE-BIFTOOL at twr/research/software /delay/ddebiftool.shtml. Recently, the software BunKi, an integrated environment dedicated to bifurcation analysis, was developed. Its highlight feature is a user-friendly analysis environment.
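The kind of one-parameter sweep these packages automate can be sketched by hand for the transcritical normal form ẋ(t) = αx(t) − x²(t) introduced above. The sketch below (a minimal illustration, not tied to any of the tools named here) classifies the two equilibrium branches x* = 0 and x* = α by the sign of f'(x*) = α − 2x* and confirms the exchange of stability at α = 0:

```python
# Sweep the bifurcation parameter alpha for the transcritical normal form
# dx/dt = f(x) = alpha*x - x**2, whose equilibria are x* = 0 and x* = alpha.
# An equilibrium is locally stable iff f'(x*) = alpha - 2*x* < 0.

def equilibria_with_stability(alpha):
    """Return [(x_star, 'stable' or 'unstable'), ...] for the normal form."""
    branches = []
    for x_star in (0.0, alpha):
        slope = alpha - 2.0 * x_star          # f'(x*) decides stability
        branches.append((x_star, "stable" if slope < 0 else "unstable"))
    return branches

diagram = {alpha: equilibria_with_stability(alpha)
           for alpha in (-1.0, -0.5, 0.5, 1.0)}

# Below the bifurcation point the trivial branch is stable ...
assert diagram[-1.0][0] == (0.0, "stable")
assert diagram[-1.0][1] == (-1.0, "unstable")
# ... and above it the two branches have exchanged stability.
assert diagram[1.0][0] == (0.0, "unstable")
assert diagram[1.0][1] == (1.0, "stable")
```

The same sweep-and-classify loop, applied to a model's full Jacobian, is what continuation packages perform along an equilibrium branch.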
The software is freely available at

4.3 Examples for Analyzing Stability and Bifurcations

In this section, we present some simple examples to analyze the stability and bifurcations of molecular networks.

4.3.1 A Simplified Gene Network

We first study the qualitative behavior of the general gene regulatory networks (3.13)–(3.14) under the following simplification.

Assumption. Assume that the total delay time τ of the transcription and translation processes for each gene product has the same value, i.e., τ ≡ τm1 + τp1 = ··· = τmn + τpn. Assume also that all mRNAs and all proteins have the same degradation rates km and kp, respectively, i.e., km ≡ km1 = ··· = kmn and kp ≡ kp1 = ··· = kpn.

Hence, under this assumption, (4.11) yields a much simpler representation of the characteristic equation, namely the transcendental equation

det((λ + km)(λ + kp)e^{λτ} In − J) = 0.  (4.20)

Thus, we have the following theorem.

Theorem 4.1. Suppose that the Assumption holds. If λ is a root of (4.20), then there is an eigenvalue γi (i = 1, ..., n) of the matrix J = JsJr for which

γi = (λ + km)(λ + kp)e^{λτ}.  (4.21)

On the other hand, if γi ∈ C is an eigenvalue of the matrix JsJr, then any solution λ of (4.21) is a characteristic root of (4.20).

Note that in general γi is a complex number. The proof of the theorem is straightforward from (4.20), which is equivalent to the n equations (4.21) for the n eigenvalues of J, respectively. For instance, suppose that (γ1, ..., γn) are the n eigenvalues of the matrix JsJr ∈ R^{n×n} at an equilibrium (m̄, p̄). Then, the local stability depends on the characteristic roots λ of the n equations (4.21). If all the roots of (4.21) for i = 1, ..., n have negative real parts, then the equilibrium of (3.13)–(3.14) is asymptotically stable; if there exists a root with a positive real part for some i, then the equilibrium of (3.13)–(3.14) is unstable.

Let ψ(λ) = (λ + km)(λ + kp), where km and kp are non-negative numbers. Then, we have the following theorem.

Theorem 4.2. Suppose that the Assumption holds. Assume that each eigenvalue of JsJr is expressed as γi ∈ C for i = 1, ..., n. All roots of (4.21) have negative real parts at (m̄, p̄) if and only if every γi (i = 1, ..., n) lies inside the region, including the origin, bounded by the arcs of the spirals

ψ(jω)e^{jωτ},  (4.22)

where j = √−1 and ω ∈ R.

Notice that the spirals ψ(jω)e^{jωτ} can be drawn in the following way. As shown in Figure 4.2 (a), first draw ψ(jω) in the complex plane with the x axis and the y axis as x(ω) = Re[ψ(jω)] and y(ω) = Im[ψ(jω)], which is clearly a parabolic curve.
Then ψ(jω)e^{jωτ} is obtained by rotating each point of ψ(jω) by the angle ωτ, as shown in Figure 4.2 (b); the resulting region includes the origin, i.e., the zero point of the complex plane (Hori et al. 2010). The region including the origin, bounded by the arcs of the spirals ψ(jω)e^{jωτ}, is shown in Figure 4.2 (b). If any square root of any eigenvalue of JsJr moves from the inside region to the outside region and crosses the spirals indicated in Theorem 4.2, then a bifurcation occurs that destabilizes the system. From Theorem 4.2, we have two limiting cases, i.e., τ = 0 and τ → ∞.

Figure 4.2 The stability region in the complex plane for the simplified gene network: (a) stability region without time delay; (b) stability region with time delay
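For τ = 0 with equal rates km = kp = k, (4.21) reduces to (λ + k)² = γi, so its roots are λ = −k ± √γi. The sketch below numerically confirms that both roots lie in the open left half-plane exactly when k² > (|γi| + Re[γi])/2, the delay-free condition derived below:

```python
import cmath

def stable_roots(gamma, k):
    """True iff both roots lambda = -k +/- sqrt(gamma) of (lambda + k)^2 = gamma
    lie in the open left half-plane (the tau = 0 case of (4.21) with km = kp = k)."""
    root = cmath.sqrt(gamma)                  # principal square root, Re >= 0
    return max((-k + root).real, (-k - root).real) < 0

def criterion(gamma, k):
    """The closed-form condition k^2 > (|gamma| + Re[gamma]) / 2."""
    return k * k > (abs(gamma) + gamma.real) / 2.0

# The two tests agree on a grid of complex eigenvalues gamma and rates k.
for k in (0.5, 1.0, 2.0):
    for re in (-3.0, -1.0, 0.0, 1.0, 3.0):
        for im in (-2.0, 0.0, 2.0):
            gamma = complex(re, im)
            assert stable_roots(gamma, k) == criterion(gamma, k)
```

The equivalence follows from the identity Re[√γ]² = (|γ| + Re[γ])/2 for the principal square root.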

Corollary 4.3. Suppose that the Assumption holds and that k ≡ kp = km. Assume that γi (i = 1, ..., n) are the eigenvalues of JsJr. Then, at (m̄, p̄),

1. all roots of (4.21) have negative real parts for τ = 0 if and only if k² > (|γi| + Re[γi])/2 for all i = 1, ..., n, and
2. all roots of (4.21) have negative real parts for all non-negative τ if and only if k² > |γi| for all i = 1, ..., n.

Proof of Corollary 4.3. Let γi = Ri e^{jθi}, where |γi| = Ri ≥ 0 and 2π > θi ≥ 0. According to Theorem 4.2, when τ = 0, the necessary and sufficient condition for all roots of (4.21) to have negative real parts is

Re[±√Ri e^{jθi/2}] < k.  (4.23)

Therefore, by noting |γi| = Ri, we have k² > Ri cos²(θi/2) = (|γi| + Re[γi])/2, which proves condition 1 of this corollary. In the same manner, we can show condition 2 of Corollary 4.3.

From Corollary 4.3, there clearly exists a critical τ̂ ∈ R+ such that all roots of (4.20) have negative real parts for τ ∈ [0, τ̂) and at least one root of (4.20) has a positive real part for τ > τ̂, provided that

max{(|γ1| + Re[γ1])/2, ..., (|γn| + Re[γn])/2} < k² < max{|γ1|, ..., |γn|}.  (4.24)

When JsJr has only real eigenvalues, we can also obtain the conditions for Hopf bifurcations by taking τ as a bifurcation parameter (Chen and Aihara 2002a).

4.3.2 A Two-gene Network

The second example is a simplified two-gene model

ṗ(t) = −kp p(t) + k1/(q(t) + k2),  (4.25)
ε q̇(t) = −kq q(t) + [q²(t)/(q²(t) + k4)] p(t) + k3,  (4.26)

where p(t) and q(t) are the protein products of genes P and Q, respectively. The protein q(t) enhances transcription of itself but represses that of gene P, whereas the protein p(t) is an activator of gene Q. Figure 4.3 schematically illustrates the two-gene network. To minimize the number of variables, we adopt only the q(t) and p(t) concentrations without explicitly expressing the mRNAs or other related chemicals.
Here kp and kq/ε are the degradation rates of the two proteins, whereas k1 is the transcription and translation rate for gene P. k2 is the Michaelis–Menten (MM) constant. k3 and

Figure 4.3 A two-gene model of the genetic regulatory system with an autoregulatory feedback loop (from (Chen and Aihara 2002b))

k4 are lumped parameters that describe the effects of binding or multimerization of proteins, phosphorylation, and other similar phenomena. ε is a small positive real number expressing the difference of time scales between q(t) and p(t).

Assume that E = (p̄, q̄) is an equilibrium of (4.25)–(4.26). The Jacobian matrix J of (4.25)–(4.26) at E is

J = (1/ε) [ −εkp   −εk1/(q̄ + k2)² ; q̄²/(q̄² + k4)   Jq ],  (4.27)

where

Jq = −kq + 2k4 q̄ p̄/(q̄² + k4)².  (4.28)

For (4.25)–(4.26), the following results hold (Chen and Aihara 2002b).

Theorem 4.4. Assume that (4.25)–(4.26) have only one equilibrium E and that ε is sufficiently small. If Jq > 0, then (4.25)–(4.26) have a periodic solution around E. On the other hand, if Jq < 0, then (4.25)–(4.26) have a stable equilibrium at E.

The stability of the unique equilibrium can be obtained by proving that all eigenvalues of J have negative real parts if Jq < 0. On the other hand, the existence of a periodic solution around E can be proven by showing that there exists a trapping region for (4.25)–(4.26) and there is only one unstable

equilibrium E within it. According to the Poincaré–Bendixson theorem, a periodic solution then exists.

One of the key factors affecting the dynamics of gene–protein networks is time delay, which usually exists in transcription, translation, and translocation processes and may significantly influence the stability of the overall system, particularly in a eukaryotic cell. Next, we assume that there is a time delay τ ≥ 0 only for the slow variable p and that the time delay for the fast variable q is small enough to be ignored. Then (4.25)–(4.26) become

ṗ(t) = −kp p(t) + k1/(q(t) + k2),  (4.29)
ε q̇(t) = −kq q(t) + [q²(t)/(q²(t) + k4)] p(t − τ) + k3.  (4.30)

The characteristic equation of (4.29)–(4.30) at the unique equilibrium E = (p̄, q̄) takes the form

λ² + (a/ε)λ + b/ε + (c/ε)e^{−λτ} = 0,  (4.31)

where a = εkp − Jq, b = −Jq kp, and c = k1 q̄²/[(q̄² + k4)(q̄ + k2)²] at E = (p̄, q̄). The roots λ of the transcendental equation (4.31) determine the stability of the equilibrium E. If the real parts of all the roots are negative, E is an asymptotically stable equilibrium and there is no oscillation. On the other hand, if there exists a root with a positive real part, an oscillation exists. In other words, a bifurcation occurs when λ = jv is a root of (4.31), namely

b − εv² + c cos(vτ) = 0,  (4.32)
av − c sin(vτ) = 0.  (4.33)

For τ > 0 and a ≠ 0, by eliminating the sin(vτ) and cos(vτ) terms, we have

v⁴ + (ā/ε²)v² + b̄/ε² = 0,  (4.34)

where ā = a² − 2bε and b̄ = b² − c². When ε is sufficiently small and |c| > |b|, the real solutions of (4.34) can be written as

v = ±[−ā/(2ε²) + (ā²/(4ε⁴) − b̄/ε²)^{1/2}]^{1/2} = ±[−ā/(2ε²) + (ā/(2ε²) − b̄/ā + O(ε))]^{1/2}.  (4.35)

Therefore, the critical values of v and τk are

v = ±[(c² − b²)/a² + O(ε)]^{1/2} = ±√(c² − b²)/a + O(ε)

and

τk = (1/v)[2kπ + arccos((εv² − b)/c)] = (a/√(c² − b²))[2kπ + arccos(−b/c)] + O(ε),  (4.36)

where k = 0, 1, 2, ... and the range of arccos is [0, π]. When |c| < |b|, it is clear that (4.34) has no real solution, which means that no Hopf bifurcation can occur for any τ.

Figure 4.4 Limit cycles (protein ln(q) versus protein p) obtained at τ = 0 and τ = 0.5 for ε = 0.01. The point E is the equilibrium of the overall system (4.29)–(4.30) (from (Chen and Aihara 2002b))

On the other hand, from (4.31), we have

(dλ/dτ)^{−1} = (2ελ + a)e^{λτ}/(cλ) − τ/λ.  (4.37)

Since ∂(Re λ)/∂τ = |λ|² Re[(dλ/dτ)^{−1}], by substituting (4.32)–(4.33) into the real part of (4.37) at λ = jv, we obtain

∂(Re λ)/∂τ |λ=jv = v²(2ε²v² − 2εb + a²)/c² = v²a²/c² + O(ε).  (4.38)

Thus, when ε is sufficiently small, ∂(Re λ)/∂τ |λ=jv > 0, which implies that the real part of any λ moves to the right half-plane as τ increases when λ

is on the imaginary axis. In other words, the time delay can only destabilize the equilibrium E. Summarizing the above discussion, the following theorem is obtained (Chen and Aihara 2002b).

Figure 4.5 Time evolutions of q(t) and p(t) for ε = 0.01 with τ = 0 and τ = 0.5 (from (Chen and Aihara 2002b))

Theorem 4.5. Assume that (4.29)–(4.30) have only one equilibrium E = (p̄, q̄) and that ε is sufficiently small. If Jq > 0, then (4.29)–(4.30) have an oscillatory solution around E for any non-negative τ. If Jq < 0 and |c| > |b|, then (4.29)–(4.30) have an oscillatory solution around E when τ > τ0, where τ0 is the first bifurcation value in (4.36) at k = 0. If Jq < 0 and |c| < |b|, then (4.29)–(4.30) have an asymptotically stable equilibrium at E for any non-negative τ.

For kp = 1, kq = 1, k1 = 15, k2 = 0.2, k3 = 0.1, and k4 = 10, it can be estimated that there is only one equilibrium (p̄, q̄) = (7.6831, ). It is unstable at τ = 0 and τ = 0.5, independently of ε, and Jq = 0.53 > 0. The phase portrait and time evolutions based on numerical simulation with ε = 0.01 are shown in Figure 4.4 and Figure 4.5, respectively.

4.3.3 A Three-gene Network

Next, we examine the biologically plausible three-gene model shown in Figure 4.6, where proteins p1 and p3 form a heterodimer to inhibit gene 2, whereas

protein p2 forms a homodimer to activate gene 3 and inhibit gene 1 (Chen and Aihara 2002b). Assume that the production of proteins p1 and p2 is much faster than that of protein p3:

ε ṗ1(t) = k1/(1 + a1 p2²(t)) − d1 p1(t) + b1,  (4.39)
ε ṗ2(t) = k2/(1 + a2 p1(t) p3(t − τ)) − d2 p2(t) + b2,  (4.40)
ṗ3(t) = k3 p2²(t)/(1 + a3 p2²(t)) − d3 p3(t) + b3,  (4.41)

where ε = 0.01, d1 = d2 = d3 = 0.04, b1 = b2 = b3 = 0.004, k1 = 4, k2 = 1, k3 = 0.08, a1 = 1, a2 = 1/16, and a3 = 0.05. All variables are positive, and τ is the time delay.

Figure 4.6 A three-gene network (from (Chen and Aihara 2002b))

Assume τ = 0. From (4.39)–(4.40), we obtain the nullcline, or the equilibrium locus, of the fast system:

p3 = d1(1 + a1 p2²)(k2 + b2 − d2 p2) / [a2(d2 p2 − b2)(k1 + b1(1 + a1 p2²))].  (4.42)

On the other hand, the nullcline of the slow system can be derived from (4.41):

p3 = k3 p2²/[d3(1 + a3 p2²)] + b3/d3.  (4.43)

Figure 4.7 shows the limit cycle as well as the nullclines of both the fast system and the slow system obtained by numerical calculation. The relaxation oscillation is mainly due to the time-scale difference and the hysteresis of the slow manifold. As with the previous example, we can obtain the time evolutions and the effects of the time delay, which are similar to those of the two-gene model.
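The intersection of the two nullclines gives the equilibrium E of the full system at τ = 0. A small numerical check (a sketch using only the parameter values listed above; the bisection bracket [2, 5] for p2 is an assumption read off from the nullcline shapes) locates the intersection and verifies that all three right-hand sides of (4.39)–(4.41) vanish there:

```python
# Parameters of the three-gene model (4.39)-(4.41) at tau = 0.
eps = 0.01
d1 = d2 = d3 = 0.04
b1 = b2 = b3 = 0.004
k1, k2, k3 = 4.0, 1.0, 0.08
a1, a2, a3 = 1.0, 1.0 / 16.0, 0.05

def fast_nullcline(p2):        # (4.42): p3 on the fast-system nullcline
    num = d1 * (1 + a1 * p2**2) * (k2 + b2 - d2 * p2)
    den = a2 * (d2 * p2 - b2) * (k1 + b1 * (1 + a1 * p2**2))
    return num / den

def slow_nullcline(p2):        # (4.43): p3 on the slow-system nullcline
    return k3 * p2**2 / (d3 * (1 + a3 * p2**2)) + b3 / d3

# Bisection on the difference of the two nullclines (bracket is an assumption).
lo, hi = 2.0, 5.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if (fast_nullcline(lo) - slow_nullcline(lo)) * \
       (fast_nullcline(mid) - slow_nullcline(mid)) > 0:
        lo = mid
    else:
        hi = mid
p2_eq = 0.5 * (lo + hi)
p3_eq = slow_nullcline(p2_eq)
p1_eq = (k1 / (1 + a1 * p2_eq**2) + b1) / d1   # from dp1/dt = 0 in (4.39)

# All three right-hand sides of (4.39)-(4.41) vanish at the intersection.
f1 = (k1 / (1 + a1 * p2_eq**2) - d1 * p1_eq + b1) / eps
f2 = (k2 / (1 + a2 * p1_eq * p3_eq) - d2 * p2_eq + b2) / eps
f3 = k3 * p2_eq**2 / (1 + a3 * p2_eq**2) - d3 * p3_eq + b3
assert max(abs(f1), abs(f2), abs(f3)) < 1e-6
assert 2.0 < p2_eq < 3.0       # the intersection lies between 2 and 3
```

The located point is the equilibrium E marked in Figure 4.7; its instability under the fast–slow dynamics is what produces the relaxation oscillation.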

Figure 4.7 A limit cycle of the three-gene model with ε = 0.01 and τ = 0. The two curves are the nullclines of the fast and slow systems, (4.42) and (4.43), respectively. The point E is the equilibrium of the overall system (4.39)–(4.41) (from (Chen and Aihara 2002b))

4.4 Robustness and Sensitivity Analysis

In contrast to bifurcation analysis, sensitivity analysis provides a quantitative measure of the dependence of system behavior on parameters. Complex mathematical models of biomolecular networks are increasingly being used as predictive tools and as aids in understanding the system behavior underlying observed biological phenomena. The parameters appearing in these models, which may include rate constants, initial conditions, operating conditions, and thermodynamic constants, are seldom known to high precision. Quantifying the roles of the parameters in model prediction is the traditional realm of sensitivity analysis. Most research studies are directed at the conceptualization and implementation of numerical techniques for determining the parameter sensitivities in various mathematical models, including those with stochastic characteristics and non-constant parameters.

Robustness and sensitivity are two sides of the same coin. In addition to some direct robustness measures, sensitivity analysis has also been widely used to quantify robustness with respect to different parameter perturbations. In a complex system with a large number of parameters, the system behavior may be robust to changes in some parameters but sensitive to changes in others. Accurate identification of the underlying mechanisms for

such robustness is a challenging problem. The system behavior depends on both the parameters and the system architecture. Some analyses show that the tradeoff between robustness and fragility is largely determined by the regulatory structures (Stelling et al. 2004b). Robustness and sensitivity analysis provides a method to characterize the impact of both parameters and system structures. It may also serve as a guide not only in designing synthetic biomolecular networks with specified functions and high robustness but also in realizing desired system behavior experimentally. For example, sensitivity analysis can be used to optimize genetic networks (Feng et al. 2004) and to provide guidance on which proteins are likely to be the most significant as drug targets (Ihekwaba et al. 2004). Different parameters may have different influences on the system dynamics, and the degree of the influence can be quantified by analyzing robustness and parameter sensitivity.

Next, we consider systems of differential equations; robustness and sensitivity analysis of the general system (4.1) can be treated similarly. Consider a system described by the ODEs

dx(t)/dt = f(x(t); p),  (4.44)

where x ∈ R^n denotes the state, p ∈ R^m denotes the parameters, and f : R^n × R^m → R^n consists of functions of the state and parameters. We will analyze this system for robustness and sensitivity.

4.4.1 Robustness Measures

Different robustness measures have been proposed, such as single-parameter robustness (Ma and Iglesias 2002), multiparametric robustness (Ma and Iglesias 2002, Bluthgen and Herzel 2003), and Monte Carlo-based robustness (Eissing et al. 2005). The single-parameter robustness measures the minimal distance from a reference point in the parameter space to a bifurcation point. For each parameter pi, bifurcation values can be obtained by using bifurcation analysis tools.
Suppose that some kind of bifurcation, e.g., a Hopf bifurcation, occurs at the values p̲i and p̂i. Both the size of the interval (p̲i, p̂i) and the proximity of the nominal parameter value to either boundary are measures of the robustness of the system. Therefore, the degree of robustness (DOR) for each parameter pi can be defined as

DORi = 1 − max{p̲i/pi, pi/p̂i},  (4.45)

where pi is the parameter value at the reference point or in the reference system. According to the definition, it is straightforward to see that the value is always between zero and one. A robustness measure of 0 indicates that the parameter value is exactly at a bifurcation point, i.e., extreme parameter sensitivity, whereas a robustness measure close to 1 implies large insensitivity

or high robustness. Single-parameter robustness has been adopted to quantify the robustness of an oscillatory network, where stable oscillations exist for the range (p̲i, p̂i) (Ma and Iglesias 2002), and of a bistable network, where bistability is generated in the range (p̲i, p̂i) by saddle-node or transcritical bifurcations (Eissing et al. 2007). Note that single-parameter robustness depends strongly on the reference parameter set.

Multiparametric robustness with respect to random parameter variations measures the robustness against global variability caused by environmental or cell-to-cell variations. An ensemble of altered systems is obtained from a reference system by random modifications of its parameters. Each alteration of the reference system is characterized by the total parameter variation, pT, which is defined as

pT = Σ_{i=1}^{m} |log10(p̂i/pi)|,  (4.46)

where p̂i and pi are the parameters of the altered and reference systems, respectively. The total parameter variation pT can be interpreted as the total order of magnitude of the parameter variation. Such an approach has been applied to analyze the robustness of bacterial chemotaxis (Barkai and Leibler 1997) and intracellular signaling cascades (Bluthgen and Herzel 2003). Similarly, a Monte Carlo-based approach for evaluating the robustness of bistable systems was proposed and applied to apoptosis signaling (Eissing et al. 2005, Eissing et al. 2007). In this approach, random parameter sets are drawn from predefined ranges such that each parameter is uniformly distributed on a logarithmic scale, and the relative frequency of occurrence of bistability provides an estimate of the corresponding volume in the parameter space, which can be used as a robustness measure. Such an approach is also applicable to oscillatory systems, although, as far as the authors know, there is no related work to date. Some other multiparametric robustness measures have also been proposed.
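As a worked example of (4.45) and (4.46) (all numbers are hypothetical, chosen only for illustration; the absolute-value convention in (4.46) is assumed): a parameter with nominal value 2 whose bifurcations occur at 1 and at 8 has DOR = 1 − max{1/2, 2/8} = 0.5:

```python
import math

def degree_of_robustness(p, p_low, p_hat):
    """DOR of (4.45): distance of the nominal value p from the nearest
    bifurcation value, p_low < p < p_hat, normalized to [0, 1]."""
    return 1.0 - max(p_low / p, p / p_hat)

def total_parameter_variation(p_hat, p_ref):
    """Total parameter variation of (4.46): summed orders of magnitude
    between an altered parameter set p_hat and the reference set p_ref."""
    return sum(abs(math.log10(ph / pr)) for ph, pr in zip(p_hat, p_ref))

# Hypothetical numbers: nominal p = 2, bifurcations at p = 1 and p = 8.
assert degree_of_robustness(2.0, 1.0, 8.0) == 0.5
# At a bifurcation point the DOR collapses to 0 (extreme sensitivity).
assert degree_of_robustness(1.0, 1.0, 8.0) == 0.0
# A 10-fold change in one parameter and a 100-fold change in another
# amount to three orders of magnitude of total variation.
assert abs(total_parameter_variation([10.0, 0.01], [1.0, 1.0]) - 3.0) < 1e-12
```

The ratio form of (4.45) makes the DOR dimensionless, so parameters with very different scales can be compared directly.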
For example, a tool based on the structural singular value was used to quantify the robust stability of limit cycles in an oscillatory biomolecular network (Ma and Iglesias 2002).

4.4.2 Sensitivity Analysis

Assuming that the solution of (4.44) exists, the sensitivity matrix of the system, S(t), which describes how variations in the parameters near a point p0 in the parameter space influence the system trajectories, is defined as

S(t) = (∂x/∂p)|_{x=x(t;p0), p=p0} = {sij},  (4.47)

where S(t) is composed of the individual sensitivities sij of each state variable to each parameter (Zak et al. 2005). The sensitivity matrix S(t) can be estimated by finite differences as follows. For a single parameter pj,

(∂x(t)/∂pj)|_{x=x(t;p0)} ≈ [x(t; pj + Δpj) − x(t; pj)]/Δpj.  (4.48)

It is computationally tedious and frequently inaccurate to estimate the sensitivities using (4.48), and this may lead to numerical instabilities. An alternative approach is to differentiate (4.44) with respect to the parameters p, giving

dS(t)/dt = A(t, p0)S(t) + B(t, p0),  (4.49)

where A(t, p0) and B(t, p0) are the Jacobian matrices of f with respect to x and p, respectively, i.e.,

A(t, p0) = (∂f/∂x)|_{x=x(t;p0), p=p0},  (4.50)
B(t, p0) = (∂f/∂p)|_{x=x(t;p0), p=p0}.  (4.51)

The sensitivity matrix S(t) is a solution of the linear time-varying system defined by (4.49). By symbolically calculating the state and parameter Jacobian matrices (A and B, respectively), it is possible to integrate x(t) and S(t) simultaneously, given the nominal parameter values p0 and the initial conditions x0 for the state (Zak et al. 2005).

A state-based sensitivity measure describes the change in the state with respect to changes in the parameters. Periodic oscillations are very common in biomolecular networks. For an oscillatory network, a global indicator of robustness should capture aspects including the shape, phase, period, and amplitude of an oscillation. The overall state sensitivity is defined by So(t) = (So1(t), ..., Som(t))^T, where Soj(t) (j = 1, ..., m) is determined by summation over the discrete times t0, ..., t_{nt} and normalization to the relative (log-gain) sensitivity as follows:

Soj(t) = (1/n) pj ( Σ_{k=1}^{nt} Σ_{i=1}^{n} [ (1/xi(tk, t0)) ∂xi(tk, t0)/∂pj ]² )^{1/2}.  (4.52)

The overall state sensitivity is normalized with respect to the numbers of state variables and parameters in order to enable comparison between models (Stelling et al. 2004b). In addition to the state sensitivity, period and amplitude sensitivities are also quantities of primary interest in oscillatory networks.
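The two routes to S(t), the finite-difference estimate (4.48) and the sensitivity ODE (4.49), can be compared on a scalar model with a known solution. For dx/dt = −p x with x(0) = x0, the exact sensitivity is ∂x/∂p = −t x0 e^{−pt}; the sketch below (the model and step sizes are illustrative choices, not from the text) checks both routes against it:

```python
import math

def x_exact(t, p, x0=1.0):
    """Closed-form solution of dx/dt = -p*x."""
    return x0 * math.exp(-p * t)

p0, t_end, x0 = 0.7, 1.0, 1.0
s_exact = -t_end * x0 * math.exp(-p0 * t_end)      # analytic dx/dp at t_end

# Route 1: central finite difference, cf. (4.48).
h = 1e-6
s_fd = (x_exact(t_end, p0 + h) - x_exact(t_end, p0 - h)) / (2 * h)

# Route 2: integrate the sensitivity ODE (4.49) alongside the state;
# here A = df/dx = -p and B = df/dp = -x, so dS/dt = -p*S - x.
dt = 1e-4
x, s_ode = x0, 0.0                                  # S(0) = 0
for _ in range(int(t_end / dt)):
    x, s_ode = x + dt * (-p0 * x), s_ode + dt * (-p0 * s_ode - x)

assert abs(s_fd - s_exact) < 1e-8
assert abs(s_ode - s_exact) < 1e-3
```

The forward-Euler step is kept deliberately simple; in practice the state and sensitivity equations are integrated together with a stiff ODE solver.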
The period sensitivity Sτ captures the change of the period τ upon changes in the parameters p:

Sτ = (∂τ/∂p)|_{p=p0}.  (4.53)

The relative period sensitivity (Wolf et al. 2005) can be similarly quantified by

Sj = (Δτ/τ)/(Δpj/pj).  (4.54)

The period sensitivity measure

σ = (1/m) Σ_{j=1}^{m} Sj²  (4.55)

is introduced to quantify the overall sensitivity of the period against changes in the parameters (Wolf et al. 2005). Using this period sensitivity measure to compare different oscillatory models, it has been shown that the sensitivity depends on the oscillatory mechanism rather than on the details of the model description. It has also been shown that systems with negative feedback are more robust than the corresponding systems with positive feedback, and that an increase in the length of the reaction chain under regulation can lead to a decrease in sensitivity (Wolf et al. 2005). The amplitude sensitivity is similarly described by

S_{Ai} = (∂Ai/∂p)|_{p=p0},  (4.56)

where Ai is the amplitude of the ith state variable. In addition to the above sensitivity quantities, other quantities, e.g., the sensitivity measure for discrete stochastic systems (Gunawan et al. 2005, Kim et al. 2007) and the phase sensitivity (Bagheri et al. 2007, Taylor et al. 2008), can also be defined to quantify various sensitivity performances. Moreover, various algorithms for solving sensitivity-related problems have been developed, e.g., the singular value decomposition approach to period sensitivity (Zak et al. 2005) and Green's function method for phase sensitivity (Taylor et al. 2008).

4.5 Control Analysis

4.5.1 Control Coefficients of Metabolic Systems

In this section, we focus on the flux control of metabolic systems. Metabolic control analysis (MCA) was formulated originally for extensive metabolic networks but can be extended to any problem that considers the transformations of elements or, more generally, the fluxes of any elements.
It is a powerful quantitative and qualitative framework for studying not only the relationship between the properties of steady states and the properties of individual reactions, but also the control and regulation of biomolecular networks, e.g., metabolic, signaling, and genetic pathways. It quantifies both the control that molecular processes exert on system properties and how system properties result from interactions between individual components. It can also identify the

relative importance of each reaction for setting particular system properties. A more recent study has shown that MCA can be mapped directly onto classical control theory and that the two are equivalent.

The effect of a change in a process or quantity P on a system property S is expressed in terms of the coefficient

C_P^S = (P/S)(ΔS/ΔP)|_{ΔP→0}.  (4.57)

The prefactor P/S is a normalization factor that makes the coefficient independent of units and of the magnitudes of P and S. For example, when P is the concentration of an enzyme and S is the metabolic flux, (4.57) gives the ratio of the fractional change in flux ΔS/S to the fractional change in the enzyme concentration ΔP/P. For an infinitesimal change in P, i.e., ΔP → 0, (4.57) can be written as

C_P^S = (P/S)(∂S/∂P),  (4.58)

which can be further simplified to the logarithmic control coefficient

C_P^S = ∂ln S/∂ln P.  (4.59)

Depending on the quantities that are varied, this general coefficient specializes to various coefficients, i.e., elasticity coefficients, control coefficients, and response coefficients. These coefficients can be divided into two distinct types: local and global coefficients. Elasticity coefficients are local coefficients pertaining to individual reactions; they can be calculated in any given state. Control coefficients and response coefficients, on the other hand, are global quantities.

An elasticity coefficient quantifies the sensitivity of a reaction rate to a change in a concentration or a parameter. It measures the direct effect of a specific change of a concentration or a parameter on a reaction velocity, while the rest of the network is kept fixed. The sensitivity of the reaction rate vk to a change in the concentration of a metabolite Si is calculated by the ε-elasticity

ε_{Si}^{vk} = (Si/vk)(∂vk/∂Si).  (4.60)

Any sufficiently small perturbation of an individual reaction rate by a parameter change p → p + Δp, i.e., vk → vk + Δvk, drives the system from an old steady state to a new one with J′ = J + ΔJ and S′ = S + ΔS, where S(p) and J = v(S(p), p) denote the steady-state concentrations and the steady-state fluxes, respectively. In general, a flux is mathematically defined as the amount that flows through a unit area per unit time. In particular, metabolic flux refers to the rate of flow of metabolites along a metabolic pathway, or even through a single enzyme.
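For instance, for an irreversible Michaelis–Menten rate v = Vmax S/(Km + S), (4.60) evaluates analytically to ε_S^v = Km/(Km + S); the sketch below (parameter values are illustrative) confirms this against a finite-difference estimate:

```python
def mm_rate(s, vmax=2.0, km=0.5):
    """Irreversible Michaelis-Menten rate v = Vmax*S/(Km+S)."""
    return vmax * s / (km + s)

def elasticity(rate, s, h=1e-7):
    """Scaled sensitivity (S/v)(dv/dS) of (4.60), via central differences."""
    dv_ds = (rate(s * (1 + h)) - rate(s * (1 - h))) / (2 * s * h)
    return s / rate(s) * dv_ds

km, s = 0.5, 1.5
analytic = km / (km + s)                    # = 0.25 for these values
numeric = elasticity(lambda x: mm_rate(x, km=km), s)
assert abs(numeric - analytic) < 1e-6
# The elasticity falls from 1 (first-order regime, S << Km)
# toward 0 (saturation, S >> Km).
assert abs(elasticity(mm_rate, 1e-6) - 1.0) < 1e-3
assert elasticity(mm_rate, 100.0) < 0.01
```

Because the elasticity is scaled by S/v, it reads directly as "percent change in rate per percent change in concentration", independent of units.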

The calculation of the flux depends on a number of factors, including enzyme concentrations; concentrations of precursors, products, and intermediate metabolites; post-translational modification of enzymes; and the presence of metabolic activators or repressors. MCA and flux balance analysis (FBA) provide frameworks for understanding metabolic fluxes and their constraints. The flux and concentration control coefficients are defined as

C_{vk}^{Jj} = (vk/Jj)(∂Jj/∂vk)  (4.61)

and

C_{vk}^{Si} = (vk/Si)(∂Si/∂vk),  (4.62)

respectively, i.e., they quantify the control that a certain reaction vk exerts on the steady-state flux Jj and on the steady-state concentration Si. In addition to the elasticity and control coefficients, a third type of coefficient, the response coefficients, which express the direct dependence of the steady-state fluxes and concentrations on parameters, are similarly defined as

R_{pm}^{Jj} = (pm/Jj)(∂Jj/∂pm) and R_{pm}^{Si} = (pm/Si)(∂Si/∂pm).  (4.63)

4.5.2 Metabolic Control Theorems

A set of theorems has been developed for MCA. The first theorem, the summation theorem, makes a statement about the total control over a flux or a steady-state concentration. The second theorem, the connectivity theorem, relates the control coefficients to the elasticity coefficients. Both types of theorems, along with the dependency information encoded in the stoichiometric matrix, contain enough information to calculate all control coefficients as functions of the elasticities.

The summation theorem states that metabolic fluxes are system properties and that the control of metabolic fluxes is shared by all reactions in the system.
When a single reaction changes its control over a metabolic flux, this is compensated by changes in the control of the same flux by all other reactions:

Σ_{k=1}^{r} C_{vk}^{Jj} = 1,  (4.64)

where r is the number of reactions, i.e., the flux control coefficients of a metabolic network for one steady-state flux sum up to 1. This means that all enzymatic reactions share the control over this flux. For the concentration control coefficients, we have

Σ_{k=1}^{r} C_{vk}^{Si} = 0,  (4.65)

which means that the concentration control coefficients of a metabolic network for one steady-state concentration are balanced. It follows from these summation theorems that the control coefficients are not independent of each other: if, for example, one coefficient increases, one or more of the other coefficients have to decrease. Thus, control coefficients are system properties and are defined in the context and constraints of the system. Note that the flux summation theorem does not restrict the flux control coefficients to the interval [0, 1]; some coefficients may be negative or exceed unity.

On the other hand, the connectivity theorems characterize specific relations between elasticities and control coefficients. They are useful because they highlight the close relationship between the kinetic properties of individual reactions and the system properties of a pathway. Two basic sets of theorems exist, one for fluxes and another for concentrations. The first connectivity theorem relates the flux control coefficients and the ε-elasticities in a summation over products of both coefficients:

Σ_{k=1}^{r} C_{vk}^{Jj} ε_{Si}^{vk} = 0.  (4.66)

Note that the summation runs over all rates vk for a given metabolite concentration Si and flux Jj. An analogous theorem holds for concentrations:

Σ_{k=1}^{r} C_{vk}^{Si} ε_{Si}^{vk} = −1.  (4.67)

It connects the concentration control coefficients C_{vk}^{Si} to the ε-elasticities ε_{Si}^{vk}. With these metabolic control theorems, we are able to investigate metabolic pathways or networks through their global and local properties; in addition, we can determine control coefficients from elasticity coefficients. During the past decades, MCA, which was originally proposed by Kacser and Burns (Kacser and Burns 1973) and by Heinrich and Rapoport (Heinrich and Rapoport 1974), has been used extensively for analyzing the regulatory behavior of various metabolic pathways.
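The summation theorems (4.64)–(4.65) and connectivity theorems (4.66)–(4.67) can be verified numerically on the smallest possible pathway: two reactions X0 → S → P with rates v1 = e1(k1 X0 − k2 S) and v2 = e2 k3 S, where the enzyme levels e1 and e2 scale the two rates (the kinetics and all numbers are an illustrative toy model, not taken from the text):

```python
# Toy two-step pathway X0 -> S -> P with rates v1 = e1*(k1*X0 - k2*S),
# v2 = e2*k3*S; the enzyme levels e = (e1, e2) scale the two reactions.
k1, k2, k3, X0 = 2.0, 1.0, 3.0, 1.0

def steady_state(e):
    """Solve e1*(k1*X0 - k2*S) = e2*k3*S for the concentration S and flux J."""
    e1, e2 = e
    s = e1 * k1 * X0 / (e1 * k2 + e2 * k3)
    return s, e2 * k3 * s

def scaled_derivative(fun, e, i, h=1e-6):
    """Central-difference estimate of (e_i / fun(e)) * d fun / d e_i."""
    up, dn = list(e), list(e)
    up[i] *= 1 + h
    dn[i] *= 1 - h
    return (fun(up) - fun(dn)) / (2 * h) / fun(e)

S_of = lambda e: steady_state(e)[0]
J_of = lambda e: steady_state(e)[1]

e = (1.0, 1.0)
CJ = [scaled_derivative(J_of, e, i) for i in (0, 1)]   # flux control coeffs
CS = [scaled_derivative(S_of, e, i) for i in (0, 1)]   # concentration control
# Elasticities of both rates w.r.t. S at the steady state, cf. (4.60).
s, _ = steady_state(e)
eps1 = s / (k1 * X0 - k2 * s) * (-k2)   # v1 decreases with S
eps2 = 1.0                              # v2 is linear in S

assert abs(sum(CJ) - 1.0) < 1e-6                      # summation theorem (4.64)
assert abs(sum(CS) - 0.0) < 1e-6                      # summation theorem (4.65)
assert abs(CJ[0] * eps1 + CJ[1] * eps2) < 1e-6        # connectivity (4.66)
assert abs(CS[0] * eps1 + CS[1] * eps2 + 1.0) < 1e-6  # connectivity (4.67)
```

For this linear pathway the coefficients are also available in closed form (e.g., C^J for the first reaction is e2 k3/(e1 k2 + e2 k3) = 0.75 at e = (1, 1)), so the finite-difference values can be cross-checked by hand.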
Reviews on MCA can be found in (Fell 1992, Wildermuth 2000, Moreno et al. 2008) and also in (Klipp et al. 2005).

4.6 Monotone Dynamical Systems

Notation

In biomolecular networks, consistent interactions are common, i.e., the synthesis rate of the ith component at time t, namely, f_i(x_τ) in (4.1), monotonically

increases (or decreases) as the concentration of the jth component at time t − τ_ij monotonically increases. Here, τ_ij is the time delay from node j to node i. Mathematically, the types of such interactions in the interaction graph can be defined as follows. There is a positive interaction from the jth node to the ith node if

∂f_i(x)/∂x_j > 0 for all x ∈ X, (4.68)

and we set s_ij = 1. There is a negative interaction from the jth node to the ith node if

∂f_i(x)/∂x_j < 0 for all x ∈ X, (4.69)

and we set s_ij = −1. There is no interaction from the jth node to the ith node if

∂f_i(x)/∂x_j = 0 for all x ∈ X, (4.70)

and we set s_ij = 0. Here X is the feasible region of the variables. Thus, s_ij = 1 (or −1) means that the jth component affects the ith component positively (or negatively) with time delay τ_ij. Most elementary biochemical reactions satisfy this monotone condition. In this section, we assume that such consistent interactions, i.e., s_ij = 1, −1, or 0, hold for all of the systems under study. Next, we describe paths and loops in a network or an interaction graph. A path in a molecular network is a sequence of vertices such that from each of its vertices or nodes there is an edge or an interaction to the next vertex in the sequence. A finite path always has a first vertex, called its start vertex, and a last vertex, called its end vertex. Both are called end or terminal vertices of the path; the other vertices in the path are internal vertices. A cycle or a loop is a path whose start and end vertices are the same. Note that the choice of the start vertex of a loop is arbitrary. A loop is called a self-feedback or simply a self-loop if an edge connects a vertex to itself.
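The sign pattern s_ij of (4.68)–(4.70) can be estimated directly from a model's vector field by sampling partial derivatives over the feasible region. A minimal sketch, assuming a hypothetical two-gene system (protein 2 represses gene 1, protein 1 activates gene 2, both degrade linearly) that is not an example from the text:

```python
# Numerically classifying interactions as positive (+1), negative (-1), or
# absent (0) by sampling the partial derivatives df_i/dx_j over a grid in X.
# The two-gene vector field below is an illustrative assumption.
import itertools

def f(x):
    x1, x2 = x
    return [1.0 / (1.0 + x2 ** 2) - x1,        # x2 represses x1
            x1 ** 2 / (1.0 + x1 ** 2) - x2]    # x1 activates x2

def sign_matrix(f, n, grid, h=1e-6, tol=1e-9):
    """s[i][j] = +1, -1, or 0 according to the sign of df_i/dx_j on the grid."""
    signs = [[set() for _ in range(n)] for _ in range(n)]
    for point in itertools.product(grid, repeat=n):
        for j in range(n):
            xp = list(point)
            xp[j] += h                           # forward difference in x_j
            d = [(fp - f0) / h for fp, f0 in zip(f(xp), f(point))]
            for i in range(n):
                if d[i] > tol:
                    signs[i][j].add(1)
                elif d[i] < -tol:
                    signs[i][j].add(-1)
    # A consistent sign over all samples gives s_ij; mixed signs (an
    # inconsistent interaction) and everywhere-zero derivatives both map to 0.
    return [[s.pop() if len(s) == 1 else 0 for s in row] for row in signs]

s = sign_matrix(f, 2, [0.25, 0.5, 1.0, 2.0])
print(s)  # [[-1, -1], [1, -1]]: self-degradation is negative, x2 -| x1, x1 -> x2
```

Sampling can of course only certify consistency on the chosen grid, not on all of X, so this is a diagnostic rather than a proof.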
For a molecular network defined as an interaction graph, there are two types of loops or cycles that characterize the dynamics of the system, i.e., positive loops (or positive feedback loops) and negative loops (or negative feedback loops).

If there are an even (or zero) number of negative interactions in a closed loop, the loop is called positive. If there are an odd number of negative interactions in a closed loop, the loop is called negative. Clearly, the product of s_ij over all interactions in a positive loop, i.e., ∏ s_ij taken over all j → i interactions in the loop, is +1, whereas the product of s_ij over all interactions in a negative loop is −1. Consider the following input/output (I/O) system:

dx(t)/dt = f(x(t), u(t)), y(t) = h(x(t)), (4.71)

in which the state x(t) evolves on some subset X ⊆ R^n, and the input and output values u(t) and y(t) belong to subsets U ⊆ R^m and Y ⊆ R^p, respectively. The maps f : X × U → R^n and h : X → Y are assumed to be continuously differentiable. An input is a signal u : [0, ∞) → U that is locally essentially compact. By an ordered Euclidean space, we mean a Euclidean space R^q, for some positive integer q, together with an order induced by a positivity cone K; that is, K ⊆ R^q is a non-empty, closed, convex, pointed (K ∩ −K = {0}), and solid (K has a non-empty interior) cone. The order is then defined as follows: x_2 ⪯ x_1 means that x_1 − x_2 ∈ K; x_2 ≺ x_1 means that x_2 ⪯ x_1 and x_2 ≠ x_1; and x_2 ≪ x_1 means that x_1 − x_2 ∈ int(K). Denote the solution of (4.71) with initial value x(t_0) = x_0 and input u by φ(t; x_0, u). Given orders on X, U, and Y, a monotone I/O system with respect to these orders is a system (4.71) in which h is assumed to be a monotone map, solutions do not blow up in finite time (therefore x(t) and y(t) are defined for all t ≥ 0), and for all initial values x_1 ∈ X and x_2 ∈ X and all inputs u_1 and u_2 taking values in U, the following property holds:

x_1 ⪯ x_2 & u_1 ⪯ u_2 ⟹ φ(t; x_1, u_1) ⪯ φ(t; x_2, u_2) for all t ≥ 0.
(4.72)

Furthermore, a monotone system is strongly monotone if the following stronger property holds:

x_1 ≺ x_2 & u_1 ⪯ u_2 ⟹ φ(t; x_1, u_1) ≪ φ(t; x_2, u_2) for all t > 0. (4.73)

The monotone I/O system can be viewed as a nonlinear dynamical system with control variables u(t). Hence, the monotone dynamical system for (4.1) can be defined similarly by eliminating the control variable u(t). Clearly, a monotone dynamical system or a monotone I/O system is an order-preserving system, i.e., its solutions preserve the order. A monotone dynamical system is also called a cooperative system. In contrast, a competitive system is defined

based on the corresponding monotone dynamical system; namely, dx(t)/dt = f(x_τ) is called a competitive system if dx(t)/dt = −f(x_τ) is a cooperative system. A dynamical system whose structure is represented by an interaction graph is monotone if and only if every loop in the interaction graph has an even (including zero) number of negative interactions. In other words, a molecular network is a monotone dynamical system if every loop in the network is positive. We will show such a theoretical result in Chapters 6 and 7. Examples of monotone and non-monotone dynamical networks are shown in Figure 4.8.

Figure 4.8 Examples of monotone and non-monotone dynamical networks: (a) an example of a non-monotone dynamical network; (b) an example of a monotone dynamical network. Arrows and bar heads indicate positive and negative interactions, respectively

Although consistent interactions are common, monotonicity is still a very strong assumption to satisfy; it is usually satisfied only on subsystems or modules of a given network. Given the interaction graph of a biomolecular network with monotonicity, it has been proven that its trajectories can only converge to stable equilibria for almost all initial states. In other words, there are neither stable oscillations nor other dynamical attractors, such as quasi-periodic and strange attractors, in monotone dynamical systems (Kobayashi et al. 2003, Angeli and Sontag 2004a). However, it is very difficult to determine the number of equilibria and their stability when there is no information other than monotonicity.

Decomposition of Monotone Systems

Depending on its structure, a molecular network may be decomposed into two monotone dynamical systems (Angeli et al. 2004). Take the simple monotone nonlinear system shown in Figure 4.9 (a) as an example. The system is described by the following ODEs

ẋ_1 = f_1(x_2) − d_1 x_1, (4.74)
ẋ_2 = f_2(x_3) − d_2 x_2, (4.75)
⋮
ẋ_n = f_n(x_1) − d_n x_n, (4.76)

where df_i/dx_{i+1} > 0 for i = 1, ..., n − 1 and df_n/dx_1 > 0.

Figure 4.9 Decomposition of a closed feedback network into an open network with an input and an output: (a) a closed monotone cyclic network; (b) decomposition of (a) into an open network with input u and output y = x_1

Replacing x_1 in f_n(x_1) of (4.76) by an input u forms a new parameterized system

ẋ_1 = f_1(x_2) − d_1 x_1, (4.77)
ẋ_2 = f_2(x_3) − d_2 x_2, (4.78)
⋮
ẋ_n = f_n(u) − d_n x_n, (4.79)

with output y = h(x) = x_1, as shown in Figure 4.9 (b). By setting u(t) = y(t) = h(x(t)), (4.77)–(4.79) return to the original system (4.74)–(4.76). Such a decomposition technique can be used to detect multistability and bifurcations in a large class of biological monotone systems, i.e., to derive information such as the number of equilibria and their stability properties (Angeli et al. 2004, Angeli and Sontag 2004a). A system is said to admit a non-degenerate input-to-state (I/S) static characteristic k_X(·) : U → X if for each constant input u ∈ U there exists a unique globally asymptotically stable equilibrium k_X(u) and det D_x f(k_X(u), u) ≠ 0. For systems with a non-degenerate I/S characteristic, the input/output (I/O) static characteristic is defined as the composition k_Y = h ∘ k_X. For example, the I/O characteristic of (4.77)–(4.79) is

k_Y(u) = (f_1/d_1) ∘ (f_2/d_2) ∘ ⋯ ∘ (f_n/d_n)(u). (4.80)
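The composition in (4.80), and the fixed points k_Y(u) = u whose role is made precise by Theorem 4.6 below, can be computed numerically. A sketch with assumed activating Hill-type rates f_i and unit degradation rates (illustrative choices, not an example from the text):

```python
# I/O static characteristic of the opened cyclic system (4.77)-(4.79) per
# (4.80), and its fixed points k_Y(u) = u, which correspond to candidate
# equilibria of the closed loop. Rates and parameters are assumptions.

def hill_act(x, a=2.0, m=4, b=1.0):
    return a * x ** m / (b + x ** m)

def k_Y(u, f=(hill_act, hill_act, hill_act), d=(1.0, 1.0, 1.0)):
    """Compose f_n/d_n, ..., f_1/d_1 from the input upward."""
    x = u
    for fi, di in zip(reversed(f), reversed(d)):   # f_n first, f_1 last
        x = fi(x) / di
    return x

def fixed_points(g, lo=0.0, hi=3.0, steps=3000, tol=1e-10):
    """Bracket sign changes of g(u) - u on a grid, then bisect each bracket."""
    us = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    roots = []
    for a, b in zip(us, us[1:]):
        fa, fb = g(a) - a, g(b) - b
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            while b - a > tol:
                mid = 0.5 * (a + b)
                if (g(a) - a) * (g(mid) - mid) <= 0.0:
                    b = mid
                else:
                    a = mid
            roots.append(0.5 * (a + b))
    return roots

print(fixed_points(k_Y))   # three fixed points: u ≈ 0, 1, 1.839
```

Since each f_i here is increasing, k_Y is increasing and the three fixed points alternate stable/unstable/stable, matching the schematic of Figure 4.10.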

A map k : U → U is said to have non-degenerate fixed points if, for all u ∈ U with k(u) = u, k′(u) exists and k′(u) ≠ 1. Here ∘ represents the operation of function composition, i.e., (g ∘ f)(x) = g(f(x)) for functions f and g. The fixed points of the I/O characteristic play a central role in determining the number of equilibria and their stability properties; specifically, the following theorem applies (Angeli et al. 2004, Angeli and Sontag 2004a).

Theorem 4.6. Consider a monotone single-input single-output (SISO) (m = p = 1, with the standard order) system endowed with non-degenerate I/S and I/O static characteristics:

ẋ = f(x, u), (4.81)
y = h(x). (4.82)

Consider the unitary positive feedback interconnection u = y. Then the equilibria are in one-to-one correspondence with the fixed points of the I/O characteristic. Moreover, if k_Y has non-degenerate fixed points, the closed-loop system is strongly monotone, and all trajectories are bounded, then, for almost all initial conditions, solutions converge to the set of equilibria of (4.81)–(4.82) corresponding to inputs for which k′_Y(u) < 1.

Figure 4.10 Schematic diagram of Theorem 4.6: the I/O characteristic k_Y and the line y = u intersect at the points I, II, and III

The schematic diagram of Theorem 4.6 is shown in Figure 4.10. According to the theorem, the points III and I are stable and II is unstable. Consider the following simple example described in (Angeli and Sontag 2004a):

ẋ_1 = α_1/(1 + x_2^β) − x_1, (4.83)
ẋ_2 = α_2/(1 + x_1^γ) − x_2, (4.84)

164 154 4 Qualitative Analysis of Deterministic Dynamical Networks where α 1, α 2, β, andγ are positive parameters. It is the unitary feedback closure of ẋ 1 = α 1 1+u β x 1, (4.85) ẋ 2 = α 2 1+x γ 1 x 2, (4.86) y = x 2. (4.87) It is easy to verify that (4.85) (4.87) are a monotone dynamical system due to the positive loop. The system is endowed with the static I/S characteristic ) k X (u) =. (4.88) ( α 1 1+u β α 2 (1+u β ) γ (1+u β ) γ +α γ 1 The I/O static characteristic and the phase portrait for α 1 =1.3, α 2 =1, β =3,andγ = 10 are shown in Figure ( a ) ( b ) Figure 4.11 I/O characteristic and phase portrait of (4.83) (4.84). The horizontal axis is u in (a) and x 1 in (b). The vertical axis is x 2 (from (Angeli and Sontag 2004a)) As indicated above, monotonicity is a very strong assumption, which is generally satisfied only by subsystems of a given network because of the existence of negative loops. Therefore, the general results on monotone systems are limited to a small class of biomolecular systems. However, a non-monotone system can often be decomposed into two or more monotone subsystems or modules and the results for monotone systems can be applied to these monotone subsystems. Then the properties of the non-monotone system can be obtained by combining the properties the monotone subsystems. Consider the stability of the SISO monotone dynamical systems connected in feedback, as shown in Figure The resulting system may not be a monotone dynamical system. For such a system, a small gain theorem for the

165 4.6 Monotone Dynamical Systems 155 feedback interconnection of a system with a monotonically increasing input output static gain and a system with a monotonically decreasing input output gain is obtained as follows (Angeli and Sontag 2003). Theorem 4.7. Consider the following interconnection of two SISO dynamical systems: Σ 1 : ẋ = f x (x, w), y = h x (x), (4.89) Σ 2 : ż = f z (z, y), w = h z (z), (4.90) with U x = Y z and U z = Y x, where U and Y denote the input and output sets. Suppose that 1. the first subnetwork Σ 1 is monotone when its input w, as well as output y, is ordered according to the standard order induced by the positive real semi-axis; 2. the second subnetwork Σ 2 is monotone when its input y is ordered according to the standard order induced by the positive real semi-axis and its output w is ordered by the negative real semi-axis; 3. the respective static input-state characteristics k x ( ) and k z ( ) exist, and therefore, the static input output characteristics K y ( ) and K w ( ) also exist, and are monotonically increasing and decreasing respectively; 4. every solution of the feedback closure for (4.89) (4.90), i.e., the network Σ defined by (4.92), is bounded. Then, (4.92) has the globally attractive equilibrium provided that the following scalar discrete-time dynamical system, evolving in U x, has a unique globally attractive fixed point. w k+1 =(K w K y )(w k ) (4.91) Figure 4.12 Feedback closure of the two SISO monotone subsystems Σ 1 and Σ 2, i.e., the first and second equations of the network Σ defined by (4.92) Here, the feedback closure network of the two subnetworks Σ i (i =1,2) has the form

Σ : ẋ = f_x(x, h_z(z)), ż = f_z(z, h_x(x)). (4.92)

Figure 4.13 Two types of input–output characteristics in the (w, y) plane: (a) convergence to a stable fixed point; (b) convergence to a stable periodic orbit (from (Wang et al. 2006a))

The graphical interpretation of (4.91) in Theorem 4.7 is shown in Figure 4.13 (a). In addition to converging to a stable fixed point, (4.91) may also converge to a stable periodic orbit, as shown in Figure 4.13 (b). When an oscillation occurs in (4.91), stable oscillations may also emerge in (4.92) when appropriate time delays are introduced. See (Wang et al. 2006a, Wang et al. 2007, Angeli and Sontag 2004b) for more details. In particular, for a class of cyclic delay systems, the following general result was obtained (Enciso 2004):

Theorem 4.8. Consider the cyclic nonlinear system

ẋ_i = g_i(x_{i+1}) − d_i x_i, i = 1, ..., n − 1, (4.93)
ẋ_n = g_n(x_1(t − τ)) − d_n x_n, (4.94)

where

δ_i g′_i(x) ≥ 0, δ_i ∈ {−1, 1}, δ_1 δ_2 ⋯ δ_n = −1, (4.95)

and each g_i(x) has the Hill function form, i.e.,

g_i(x) = a x^m/(b + x^m) + c or g_i(x) = a/(b + x^m) + c. (4.96)

Then exactly one of the following statements holds:

1. |g′(ū)| ≤ 1, the discrete-time system (4.97) is globally attracted to a unique fixed point, and the continuous-time system (4.93)–(4.94) is also globally attracted to a unique equilibrium, for all values of the delay τ.

2. |g′(ū)| > 1, the discrete-time system (4.97) has non-constant periodic solutions, and the continuous-time system (4.93)–(4.94) has non-constant periodic solutions for some values of τ.

Here ū is the unique fixed point of the one-dimensional map

u_{k+1} = g(u_k) (4.97)

with

g(u) = (1/(d_1 d_2 ⋯ d_n)) (g_1 ∘ g_2 ∘ ⋯ ∘ g_n)(u). (4.98)

The assumption δ_1 δ_2 ⋯ δ_n = −1 means that the system is subject to a negative feedback loop. In the positive feedback case, i.e., δ_1 δ_2 ⋯ δ_n = 1, system (4.93)–(4.94) falls into the framework of positive feedback loop systems and is monotone. A large number of results are known for this case, perhaps the most important of which is that the generic solution converges towards an equilibrium (Kobayashi et al. 2003, Angeli and Sontag 2004a). The same result holds if, instead of assuming the Hill function form, every nonlinear function g_i(x) is assumed to be of one of the forms ±a tan⁻¹(bx) or ±a tanh(bx), which are often used in neural networks. An example is shown in Figures 4.14 and 4.15.

Figure 4.14 The first case of Theorem 4.8: (a) system (4.93)–(4.94) is globally attracted to a unique equilibrium; (b) system (4.97) is globally attracted to a unique fixed point; (c) the induced decreasing function g(x) and the increasing function g²(x) = g(g(x)). The parameter values and the functions are n = 3, g_1 = g_2 = g_3 = tan⁻¹(x), d_1 = 0.11, d_2 = 2.5, d_3 = 4, and τ = 80. Here, |g′(ū)| = 1/1.1 < 1 (from (Enciso 2004))

Figure 4.15 The second case of Theorem 4.8: (a) system (4.93)–(4.94) converges to non-constant periodic solutions; (b) system (4.97) converges to periodic 2-cycles; (c) the induced decreasing function g(x) and the increasing function g²(x) = g(g(x)). In this case, |g′(ū)| = 1/0.9 > 1 and g²(x) = x has several solutions. The parameter values and the functions are the same as those used in Figure 4.14, except that d_1 = 0.09 (from (Enciso 2004))
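The dichotomy of Theorem 4.8 can be reproduced by iterating the map (4.97) directly. In the sketch below the negative loop is realized by an overall minus sign on a triple tan⁻¹ composition; that sign placement is our assumption, while the degradation products d_1 d_2 d_3 = 1.1 and 0.9 match the values used in Figures 4.14 and 4.15:

```python
# Iterating the one-dimensional map (4.97) for a negative feedback loop of
# three tan^-1 nonlinearities. The induced map is
#   g(u) = -atan(atan(atan(u))) / (d1*d2*d3);
# the overall minus sign (assumed here) makes g decreasing, as a negative
# loop requires.
import math

def g(u, d_prod):
    return -math.atan(math.atan(math.atan(u))) / d_prod

def iterate(d_prod, u0=0.5, n=5000):
    u = u0
    orbit = []
    for _ in range(n):
        u = g(u, d_prod)
        orbit.append(u)
    return orbit

# Case 1 of Theorem 4.8: d1*d2*d3 = 1.1, so |g'(0)| = 1/1.1 < 1 and the
# orbit converges to the unique fixed point u = 0.
tail1 = iterate(1.1)[-4:]
print(tail1)            # all entries near 0

# Case 2: d1*d2*d3 = 0.9, so |g'(0)| = 1/0.9 > 1; the fixed point is
# unstable and the orbit settles onto a nontrivial 2-cycle.
tail2 = iterate(0.9)[-4:]
print(tail2)            # alternates between +p and -p
```

As the theorem predicts, a 2-cycle of the discrete map signals possible sustained oscillations of the delayed continuous-time system (4.93)–(4.94) for suitable τ.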

5 Stability Analysis of Genetic Networks in Lur'e Form

In this chapter, we present a gene regulatory network model with a special structure described by differential equations and study its stability. As mentioned in previous chapters, a biomolecular system is generally characterized by significant time delays in gene regulation, particularly in the transcription, translation, diffusion, and translocation processes. All cellular components also exhibit intracellular noise, due to random births and deaths of individual molecules, and extracellular noise, due to environmental fluctuations. Such time delays and stochastic noise may affect the dynamics of the entire biomolecular system both qualitatively and quantitatively. In this chapter, in addition to the basic case, we also consider genetic networks with time delays and stochastic perturbations.

5.1 A Genetic Network Model

To evaluate the role of negative feedback in the stability of genetic networks, a simple genetic network model was studied in (Becskei and Serrano 2000). In (Chen and Aihara 2002a), the authors presented a general gene regulatory network model as follows:

ṁ_i(t) = −a_i m_i(t) + b_i(p_1(t), p_2(t), ..., p_n(t)), (5.1)
ṗ_i(t) = −c_i p_i(t) + d_i m_i(t), (5.2)

where m_i(t) ∈ R and p_i(t) ∈ R (i = 1, 2, ..., n) are the concentrations of the mRNA and the protein of the ith node. Note that (5.1)–(5.2) are the same as (3.13)–(3.14), but the translation process (5.2) is expressed in a simple linear form. In this network, there is one output but multiple inputs for a single node or gene: a directed edge is linked from node j to node i if the TF or protein j regulates gene i. In (5.1)–(5.2), a_i and c_i are the degradation rates of the mRNA and the protein, d_i is a constant, and b_i is the regulatory function of the ith gene, which is a general nonlinear function of the variables,

p_1(t), ..., p_n(t). Based on (4.4) (Chen and Aihara 2002a), the stability of (5.1)–(5.2) was studied by using local stability analysis and characteristic equation analysis. Although the method of characteristic equation analysis can provide an accurate local stability region, it is difficult to verify, especially for large-scale genetic networks with time delays. As is well known, genetic networks (or gene networks) are usually large-scale even in a simple organism, and gene activity is tightly controlled in a cell. The gene regulation function b_i plays a key role in the nonlinear dynamics. In general, the form of b_i may be very complicated, depending on all the biochemical reactions involved in the regulation. Typical regulatory logics include AND-like gates and OR-like gates (Yuh et al. 1998, Buchler et al. 2003, Setty et al. 2003). Here, we present a model of genetic networks in which different TFs act additively to regulate the same gene (Li et al. 2006a). In other words, the regulatory function is of the form b_i = ∑_j b_ij(p_j(t)), which is called a SUM regulatory logic (Yuh et al. 1998, Kalir et al. 2005). The function b_ij(p_j(t)) is usually expressed as a monotonic function of the Hill form as follows:

b_ij(p_j(t)) = α_ij (p_j(t)/β)^H / (1 + (p_j(t)/β)^H), if TF j is an activator of gene i,
b_ij(p_j(t)) = α_ij · 1/(1 + (p_j(t)/β)^H), if TF j is a repressor of gene i, (5.3)

where H is the Hill coefficient, β is a positive constant, and α_ij is the dimensionless transcriptional rate of TF j to gene i, which is a bounded constant. Note that

1/(1 + (p_j(t)/β)^H) = 1 − (p_j(t)/β)^H / (1 + (p_j(t)/β)^H).

Therefore, (5.1)–(5.2) can be rewritten as follows:
ṁ_i(t) = −a_i m_i(t) + ∑_j G_ij g(p_j(t)) + l_i, (5.5)
ṗ_i(t) = −c_i p_i(t) + d_i m_i(t), (5.6)

where

g(x) = (x/β)^H / (1 + (x/β)^H) (5.7)

is a monotonically increasing function and G = (G_ij) ∈ R^{n×n} is the coupling matrix of the genetic network, defined as follows: if there is no link from node j to node i, G_ij = 0; if TF j is an activator of gene i, G_ij = α_ij; and if TF j is a repressor of gene i, G_ij = −α_ij. Thus, the matrix G = (G_ij) defines the coupling topology, directions, and transcriptional rates of the genetic network. l_i is defined as a basal rate

l_i = ∑_{j ∈ V_i1} α_ij, (5.8)

in which V_i1 is the set of all the repressors of gene i. In compact matrix form, (5.5)–(5.6) can be rewritten as follows:

ṁ(t) = Am(t) + G g(p(t)) + l, (5.9)
ṗ(t) = Cp(t) + Dm(t), (5.10)

where m(t) = [m_1(t), ..., m_n(t)]^T, p(t) = [p_1(t), ..., p_n(t)]^T, l = [l_1, ..., l_n]^T, A = diag(−a_1, ..., −a_n), C = diag(−c_1, ..., −c_n), D = diag(d_1, ..., d_n), and g(p(t)) = [g_1(p_1(t)), ..., g_n(p_n(t))]^T. Note that in (5.9)–(5.10) we can include multiple nonlinear vector regulatory functions, but for simplicity we consider only one here. (m*, p*) is said to be an equilibrium of (5.9)–(5.10) if it satisfies Am* + G g(p*) + l = 0 and Cp* + Dm* = 0. For convenience, we will always shift the equilibrium (m*, p*) to the origin by letting x(t) = m(t) − m* and y(t) = p(t) − p*. Thus, we have

ẋ(t) = Ax(t) + G f(y(t)), (5.11)
ẏ(t) = Cy(t) + Dx(t), (5.12)

where f(y(t)) = g(y(t) + p*) − g(p*). Since g is a monotonically increasing nonlinear function with saturation, for all a, b ∈ R with a ≠ b, it satisfies

0 ≤ (g(a) − g(b)) / (a − b) ≤ k. (5.13)

When g is a differentiable function, the above inequality is equivalent to 0 ≤ dg(a)/da ≤ k. From the relationship between f(·) and g(·), f(·) satisfies the sector condition 0 ≤ f(a)/a ≤ k, or equivalently,

f(a)(f(a) − ka) ≤ 0. (5.14)

Note that a Lur'e system is a linear dynamical system interconnected by feedback with a static nonlinearity f(·) that satisfies a sector condition such as (5.14) (Vidyasagar 1993). Hence, the genetic network (5.11)–(5.12) can be seen as a type of Lur'e system and can be investigated by using the fruitful theory of Lur'e systems. Next, let us return to the motivations for presenting this genetic network model. When modeling genetic networks (or many other systems), we must accept that there are no unique, exact mathematical descriptions of processes in nature.
We have to search for approximations that capture all aspects of interest as accurately as feasible and at the same time allow us to gain insight from their analysis (Voit 2000). The primary bases for proposing this model are mainly due to the following facts (Li et al. 2006a, Li et al. 2007a). 1. Such a SUM logic does exist in many natural genetic networks.

172 162 5 Stability Analysis of Genetic Networks in Lur e Form 2. Such a genetic network with SUM regulatory function can be implemented experimentally, which is an advantage from the viewpoint of synthetic biology. 3. In literature, there exist many well-known genetic systems that can be described in (or reformed into) the form of the model. 4. Even if the regulatory functions in real genetic networks are not exactly of the form in the model, the model can serve as a good approximation of the real regulatory networks. 5. It has been shown that the model is extremely suited to be analyzed in the framework of control theory. In the following sections, we will analyze the stability of the genetic network model. Although the above system (5.11) (5.12) is developed by analyzing genetic networks (5.1) (5.2), the above derivation is also applicable to genetic networks with time delays and stochastic perturbations. When analyzing the stability of an equilibrium, it is equivalent to studying systems (5.5) (5.6) and (5.11) (5.12). In the following sections we study system (5.11) (5.12) directly. The stability analysis of the genetic networks is based on the Lyapunov method and the Lur e system approach, and the results are presented in the form of LMIs (Boyd et al. 1994), which are easy to be verified by convex optimization techniques, e.g., the interior point method (Boyd et al. 1994), or by using software packages, e.g., the MATLAB R LMI Toolbox. 5.2 Stability Analysis of Genetic Networks Without Noise In this section, we analyze the global stability of the genetic network (5.11) (5.12) by using the Lyapunov function method. The main results are based on (Li et al. 2006a). The sufficient condition is summarized in the following theorem. Theorem 5.1. 
If there exist matrices P 11, P 22, P 12, and Λ = diag(λ 1,..., λ n ) > 0, such that the following LMIs hold: 2P 11 A + P 12 D + DP12 T DP 22 + AP 12 + P 12 C P 11 G M 1 = P 22 D + P12A T + CP12 T 2P 22 C P12G T + kλ < 0, G T P 11 G T P 12 + kλ 2Λ [ ] P11 P P = 12 P12 T > 0, (5.15) P 22 then the origin of the genetic network (5.11) (5.12) is the unique equilibrium point and is globally asymptotic stable. Proof (Li et al. 2006a): Consider the following Lyapunov function

173 5.2 Stability Analysis of Genetic Networks Without Noise 163 V (x(t),y(t)) = [ ] T x(t) P y(t) [ ] x(t). (5.16) y(t) By calculating the time derivative of V along (5.11) (5.12), we obtain V (x(t),y(t)) = 2x T (t)p 11 Ax(t)+2y T (t)p12ax(t)+2x T T (t)p 11 Gf(y(t)) +2y T (t)p12gf(y(t))+2x T T (t)p 12 Dx(t) +2y T (t)p 22 Dx(t)+2x T (t)p 12 Cy(t)+2y T (t)p 22 Cy(t) 2x T (t)p 11 Ax(t)+2y T (t)p12ax(t)+2x T T (t)p 11 Gf(y(t)) +2y T (t)p12gf(y(t))+2x T T (t)p 12 Dx(t) +2y T (t)p 22 Dx(t)+2x T (t)p 12 Cy(t)+2y T (t)p 22 Cy(t) n 2 λ i f(y i (t))[f(y i (t)) ky i (t)] i=1 = ξ T (t)m 1 ξ(t) 0, (5.17) where ξ(t) =[x T (t),y T (t),f T (y(t))] T. From the above analysis, we know that V (t) = 0 if and only if both x(t) =0andy(t) = 0, and for all the other (x(t),y(t)), V (t) < 0. Hence, the origin of the genetic network (5.11) (5.12) is globally asymptotic stable. Under condition (5.15), the uniqueness of the equilibrium can be proven by using the contradiction method (see, for example, (Arik 2002)). In fact, if there is another equilibrium point ( m, p) that is different from (m,p ), we can also shift the equilibrium ( m, p) to the origin, and by the same analysis as above, it is easy to show that the origin is also globally asymptotic stable under the condition (5.15). Note that the condition (5.15) is independent of the equilibrium. Hence, we have more than one globally asymptotic stable equilibrium, which is impossible. Therefore, (5.9) (5.10) has a unique equilibrium, in other words, under condition (5.15) the origin is the unique equilibrium (5.11) (5.12). In (5.15), constant matrices A, C, D, andg and constant k are from the model (5.11) (5.12), and all the components of the symmetric matrices M 1 and P in (5.15) are linear functions of the matrix variables P 11, P 22, P 12, and Λ, hence the conditions in (5.15) are LMIs, which is easy to be verified numerically. In genetic networks, time delays generally exist in transcription, translation, and translocation processes. 
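Before turning to delays, the undelayed model (5.5)–(5.6) can be simulated directly to observe the kind of global convergence that Theorem 5.1 certifies. A forward-Euler sketch for an assumed two-gene negative feedback loop (protein 1 activates gene 2, protein 2 represses gene 1; all parameter values are illustrative, not from the text):

```python
# Forward-Euler simulation of the SUM-logic network (5.5)-(5.6). Because the
# assumed loop is negative with small loop gain, trajectories started from
# different initial states settle onto the same equilibrium.

def g(x, beta=1.0, H=2):
    x = max(x, 0.0)
    return (x / beta) ** H / (1.0 + (x / beta) ** H)

G = [[0.0, -1.0],   # protein 2 represses gene 1 (alpha_12 = 1)
     [1.0,  0.0]]   # protein 1 activates gene 2 (alpha_21 = 1)
l = [1.0, 0.0]      # basal rate l_1 = alpha_12 per (5.8)
a = c = d = [1.0, 1.0]

def rhs(m, p):
    dm = [-a[i] * m[i] + sum(G[i][j] * g(p[j]) for j in range(2)) + l[i]
          for i in range(2)]
    dp = [-c[i] * p[i] + d[i] * m[i] for i in range(2)]
    return dm, dp

def simulate(m, p, T=200.0, dt=0.01):
    for _ in range(int(T / dt)):
        dm, dp = rhs(m, p)
        m = [m[i] + dt * dm[i] for i in range(2)]
        p = [p[i] + dt * dp[i] for i in range(2)]
    return m, p

m1, p1 = simulate([0.0, 0.0], [0.0, 0.0])
m2, p2 = simulate([2.0, 2.0], [2.0, 2.0])
print(p1, p2)   # both runs reach the same steady state
```

A simulation of course only samples a few trajectories; the LMI conditions of Theorem 5.1 are what certify stability for all initial states at once.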
Next, we consider the genetic network with time delays, described below:

ẋ(t) = Ax(t) + G f(y(t − τ_1(t))), (5.18)
ẏ(t) = Cy(t) + Dx(t − τ_2(t)), (5.19)

where τ_1(t) > 0 and τ_2(t) > 0 are inter- and intra-node time-varying delays. We assume that τ̇_1(t) ≤ d_1 < 1 and τ̇_2(t) ≤ d_2 < 1. This model can be derived

174 164 5 Stability Analysis of Genetic Networks in Lur e Form from a delayed analogue of system (5.1) (5.2) by using the same manipulation as that in the above section. The theoretical result is summarized in the following theorem (Li et al. 2006a): Theorem 5.2. If there exist matrices P 11, P 22, P 12, Q > 0, R > 0, and Λ = diag(λ 1,..., λ n ) > 0, such that the following LMIs hold 2P 11 A + R P 12 C + AP 12 P 12 D 0 P 11 G P12A T + CP12 T 2P 22 C P 22 D kλ P12G T M 2 = DP12 T DP 22 (1 d 2 )R kλ 0 Q 2Λ 0 < 0, G T P 11 G T P (1 d 1 )Q [ ] P11 P P = 12 P12 T > 0, (5.20) P 22 then the origin of the genetic network (5.18) (5.19) is the unique equilibrium point and is globally asymptotic stable. Proof (Li et al. 2006a): Construct a Lyapunov Krasovskii functional as follows: [ ] T [ ] t x(t) x(t) V (x(t),y(t),t)= P + f T (y(μ))qf(y(μ))dμ y(t) y(t) t τ 1 (t) + t t τ 2(t) x(μ)rx(μ)dμ. (5.21) Calculating the time derivative of V (x(t),y(t),t), we have V (x(t),y(t),t)=2x T (t)p 11 Ax(t)+2y T (t)p12ax(t) T +2x T (t)p 11 Gf(y(t τ 1 (t))) + 2y T P12Gf(y(t T τ 1 (t)) +2x T (t)p 12 Cy(t)+2y T (t)p 22 Cy(t) +2x T (t)p 12 Dx(t τ 2 (t))+2y T (t)p 22 Dx(t τ 2 (t)) +f T (y(t))qf(y(t)) (1 τ 1 (t))f T (y(t τ 1 (t)))qf(y(t τ 1 (t))) + x T (t)rx(t) (1 τ 2 (t))x T (t τ 2 (t))rx(t τ 2 (t)). (5.22) Considering τ 1 (t) d 1 < 1, τ 2 (t) d 2 < 1, and we have n 2 λ i f(y i (t))[f(y i (t)) ky i (t)] 0, (5.23) i=1 V (x(t),y(t),t) ξ T (t)m 2 ξ(t) < 0, (5.24) where ξ(t) =[x T (t),y T (t),x T (t τ 2 (t)),f T (y(t)),f T (y(t τ 1 (t)))] T.Itfollows from the Lyapunov Krasovskii Theorem (Kolmanovskii and Myshkis 1999)

175 5.3 Stochastic Stability of Gene Regulatory Networks 165 that the delayed genetic network (5.18) (5.19) is globally asymptotic stable. By a proof similar to that of Theorem 5.1, the origin can be proven to be the unique equilibrium. 5.3 Stochastic Stability of Gene Regulatory Networks Gene regulation is an intrinsically noisy process, which is subject to intracellular and extracellular noise perturbations and environmental fluctuations. Such noise should affect the dynamics of genetic networks. Since we know very little about how noise acts on genetic networks, one of the simplest ways to incorporate random effects is to assume that certain fluctuations randomly perturb the genetic network in an additive manner. In this section, we study genetic networks with random perturbations based on stochastic differential equation models. This section is mainly based on parts of (Li et al. 2006a) and (Li et al. 2007a). In the following, λmax(p ) denotes the maximum eigenvalue of a square matrix P, L 2 [0, ) is the space of square-integrable vector functions over [0, ), stands for the Euclidean vector norm, and 2 stands for the usual L 2 [0, ) norm Mean-square Stability We consider genetic networks with noise perturbations of the following form ẋ(t) =Ax(t)+Gf(y(t)) + σ(y(t))n(t), (5.25) ẏ(t) =Cy(t)+Dx(t), (5.26) where n(t) =[n 1 (t),..., n l (t)] T with n i (t) as a scalar white Gaussian noise process with zero mean, and n i (t) is independent of n j (t) for all i j. σ(y(t)) R n l is called the noise intensity matrix. Here we only consider the noise perturbations from regulations or inter-nodes perturbations. Since the nodes of the genetic network model communicate via variable y(t), we assume that the noise intensity matrix is a function of y(t), which acts on the dynamics of x(t). As in many studies of stochastic dynamical systems, e.g., (Kolmanovskii and Myshkis 1999, W. Chen et al. 
2005), we assume that σ(y(t)) can be estimated by trace[σ(y(t))σ T (y(t))] y T (t)hy(t), H 0. (5.27) Recall that the time derivative of a Wiener process is a white noise process (Arnold 1974). We have dw(t) =n(t)dt, where w(t) isanl-dimensional Wiener process. Hence, (5.25) (5.26) can be rewritten as the following stochastic differential equations (SDEs):

176 166 5 Stability Analysis of Genetic Networks in Lur e Form dx(t) =[Ax(t)+Gf(y(t))]dt + σ(y(t))dw(t), (5.28) dy(t) =[Cy(t)+Dx(t)]dt. (5.29) For this genetic network model, we have the following stability theorem (Li et al. 2006a). Theorem 5.3. If there exist matrices P 11, P 22, P 12, and Λ = diag(λ 1,..., λ n ) > 0, and constant ρ>0, such that the following LMIs hold: 2P 11 A + P 12 D + DP12 T DP 22 + AP 12 + P 12 C P 11 G M 3 = P 22 D + P12A T + CP12 T 2P 22 C + ρh P12G T + kλ < 0, G T P 11 G T P 12 + kλ 2Λ [ ] P11 P P = 12 P12 T > 0, P P 11 ρi, (5.30) 22 then the genetic network (5.25) (5.26) is asymptotically stable in mean square. Proof (Li et al. 2006a): Consider the same Lyapunov function as that in the proof of Theorem 5.1. By Itǒ s formula (Arnold 1974), we obtain the following stochastic differential: dv (x(t),y(t)) = LV (x(t),y(t))dt +2[x T (t)p 11 + y T (t)p12]σ(y(t))dw(t), T (5.31) where L is the diffusion operator and LV (x(t),y(t)) = 2x T (t)p 11 Ax(t)+2y T (t)p12ax(t)+2x T T (t)p 11 Gf(y(t)) +2y T (t)p12gf(y(t))+2x T T (t)p 12 Dx(t) +2y T (t)p 22 Dx(t)+2x T (t)p 12 Cy(t)+2y T (t)p 22 Cy(t) +trace(σ(y(t))σ T (y(t))p 11 ). (5.32) By using (5.27) and (5.30), we have (W. Chen et al. 2005) trace(σ(y(t))σ T (y(t))p 11 ) λmax(p 11 )trace(σ(y(t))σ T (y(t))) ρy T (t)hy(t). (5.33) By considering 2 n i=1 λ if(y i (t))[f(y i (t)) ky i (t)] 0, we obtain LV (x(t),y(t)) 2x T (t)p 11 Ax(t)+2y T (t)p12ax(t)+2x T T (t)p 11 Gf(y(t)) +2y T (t)p12gf(y(t))+2x T T (t)p 12 Dx(t) +2y T (t)p 22 Dx(t)+2x T (t)p 12 Cy(t)+2y T (t)p 22 Cy(t) n 2 λ i f(y i (t))[f(y i (t)) ky i (t)] + ρy T (t)hy(t) i=1 = ξ T (t)m 3 ξ(t), (5.34)

5.3 Stochastic Stability of Gene Regulatory Networks

where ξ(t) = [x^T(t), y^T(t), f^T(y(t))]^T. Therefore, it follows from M3 < 0 that E[dV(x(t), y(t))] = E[LV(x(t), y(t))dt] < 0, where E is the mathematical expectation operator. It is then easy to show that the genetic network (5.25)–(5.26) is asymptotically stable in mean square.

Next, we go further and consider genetic networks with both time delays and noise perturbations. The networks are represented in the following form:

ẋ(t) = Ax(t) + Gf(y(t − τ1(t))) + σ(y(t), y(t − τ1(t)))n(t), (5.35)
ẏ(t) = Cy(t) + Dx(t − τ2(t)), (5.36)

which can also be rewritten as the following SDEs:

dx(t) = [Ax(t) + Gf(y(t − τ1(t)))]dt + σ(y(t), y(t − τ1(t)))dw(t), (5.37)
dy(t) = [Cy(t) + Dx(t − τ2(t))]dt. (5.38)

We assume that the noise intensity matrix σ(y(t), y(t − τ1(t))) can be estimated by

trace[σ(y(t), y(t − τ1(t)))σ^T(y(t), y(t − τ1(t)))] ≤ y^T(t)H1y(t) + y^T(t − τ1(t))H2y(t − τ1(t)), (5.39)

with H1 ≥ 0 and H2 ≥ 0. For this genetic network model, the main result is summarized in the following theorem (Li et al. 2006a).

Theorem 5.4. If there exist matrices P11, P22, P12, Q > 0, R > 0, S > 0, and Λ = diag(λ1, ..., λn) > 0, and a constant ρ > 0, such that the following LMIs hold:

M4 = [ 2P11A + R            P12C + A^T P12     P12D         0       P11G
       P12^T A + C^T P12^T  2P22C + S + ρH1    P22D         kΛ      P12^T G
       D^T P12^T            D^T P22            −(1 − d2)R   0       0
       0                    kΛ                 0            Q − 2Λ  0
       G^T P11              G^T P12            0            0       −(1 − d1)Q ] < 0,

P = [ P11    P12
      P12^T  P22 ] > 0,  P11 ≤ ρI,  ρH2 − (1 − d1)S < 0, (5.40)

then the genetic network (5.35)–(5.36) is asymptotically stable in mean square.

Proof: Consider a Lyapunov–Krasovskii functional as follows:

V(x(t), y(t), t) = [x^T(t), y^T(t)] P [x^T(t), y^T(t)]^T + ∫_{t−τ1(t)}^{t} [f^T(y(μ))Qf(y(μ)) + y^T(μ)Sy(μ)]dμ + ∫_{t−τ2(t)}^{t} x^T(μ)Rx(μ)dμ. (5.41)

By Itô's formula (Arnold 1974), we obtain the following stochastic differential:

dV(x(t), y(t), t) = LV(x(t), y(t), t)dt + 2[x^T(t)P11 + y^T(t)P12^T]σ(y(t), y(t − τ1(t)))dw(t), (5.42)

where

LV(x(t), y(t), t) = 2x^T(t)P11Ax(t) + 2y^T(t)P12^T Ax(t) + 2x^T(t)P11Gf(y(t − τ1(t))) + 2y^T(t)P12^T Gf(y(t − τ1(t))) + 2x^T(t)P12Cy(t) + 2y^T(t)P22Cy(t) + 2x^T(t)P12Dx(t − τ2(t)) + 2y^T(t)P22Dx(t − τ2(t)) + f^T(y(t))Qf(y(t)) − (1 − τ̇1(t))f^T(y(t − τ1(t)))Qf(y(t − τ1(t))) + y^T(t)Sy(t) − (1 − τ̇1(t))y^T(t − τ1(t))Sy(t − τ1(t)) + x^T(t)Rx(t) − (1 − τ̇2(t))x^T(t − τ2(t))Rx(t − τ2(t)) + trace(σ(y(t), y(t − τ1(t)))σ^T(y(t), y(t − τ1(t)))P11).

By (5.39) and (5.40), we have (W. Chen et al. 2005)

trace(σ(y(t), y(t − τ1(t)))σ^T(y(t), y(t − τ1(t)))P11) ≤ λ_max(P11)trace(σ(y(t), y(t − τ1(t)))σ^T(y(t), y(t − τ1(t)))) ≤ ρ[y^T(t)H1y(t) + y^T(t − τ1(t))H2y(t − τ1(t))].

Hence, using τ̇1(t) ≤ d1, τ̇2(t) ≤ d2, and the sector condition −2 Σ_{i=1}^n λi f(yi(t))[f(yi(t)) − k yi(t)] ≥ 0, we have

LV(x(t), y(t), t) ≤ 2x^T(t)P11Ax(t) + 2y^T(t)P12^T Ax(t) + 2x^T(t)P11Gf(y(t − τ1(t))) + 2y^T(t)P12^T Gf(y(t − τ1(t))) + 2x^T(t)P12Cy(t) + 2y^T(t)P22Cy(t) + 2x^T(t)P12Dx(t − τ2(t)) + 2y^T(t)P22Dx(t − τ2(t)) + f^T(y(t))Qf(y(t)) − (1 − d1)f^T(y(t − τ1(t)))Qf(y(t − τ1(t))) + y^T(t)Sy(t) − (1 − d1)y^T(t − τ1(t))Sy(t − τ1(t)) + x^T(t)Rx(t) − (1 − d2)x^T(t − τ2(t))Rx(t − τ2(t)) − 2f^T(y(t))Λf(y(t)) + 2kf^T(y(t))Λy(t) + ρy^T(t)H1y(t) + ρy^T(t − τ1(t))H2y(t − τ1(t)) = ξ^T(t)M4ξ(t) + y^T(t − τ1(t))[ρH2 − (1 − d1)S]y(t − τ1(t)),

where ξ(t) = [x^T(t), y^T(t), x^T(t − τ2(t)), f^T(y(t)), f^T(y(t − τ1(t)))]^T. Since M4 < 0 and ρH2 − (1 − d1)S < 0, we obtain

E[dV(x(t), y(t), t)] = E[LV(x(t), y(t), t)dt] < 0 (5.43)

for all x(t) and y(t) except x(t) = y(t) = 0. Therefore, the genetic network (5.35)–(5.36) is asymptotically stable in mean square.

Stochastic Stability with Disturbance Attenuation

In the above analysis, we established the mean-square asymptotic stability of the genetic network model. The definition of mean-square asymptotic stability (Arnold 1974) is rather restrictive: it requires that

lim_{t→∞} E‖z(t)‖² = 0, (5.44)

where z(t) = [x(t)^T, y(t)^T]^T. If the noise perturbations do not vanish at the steady state, it is highly unlikely that the network will achieve mean-square asymptotic stability. Numerous experimental results also indicate that real biological systems generally cannot achieve mean-square asymptotic stability and that small fluctuations around the steady states generally occur. In this section, we study a more realistic stochastic genetic network model as follows:

dx(t) = [Ax(t) + Gf(y(t))]dt + σ(x(t), y(t))dw1(t) + v(t)dw2(t), (5.45)
dy(t) = [Cy(t) + Dx(t)]dt. (5.46)

For simplicity and convenience, we let σ(x(t), y(t)) ∈ R^n and v(t) ∈ R^n belong to L2[0, ∞). Here, w1(t) and w2(t) are two independent one-dimensional Wiener processes. As we will see in the following analysis, the results are independent of the form of v(t); they hold no matter what v(t) is and no matter where it is introduced. We assume that the perturbation terms can represent perturbations from all sources; even if another noise perturbation term has to be added to (5.46), the procedure of analysis does not change significantly. For (5.45)–(5.46), when v(t) does not vanish at steady states, the network cannot achieve mean-square asymptotic stability. We therefore give another definition below.

Definition 5.5.
For a given scalar γ > 0, the network (5.45)–(5.46) is said to be stochastically stable with disturbance attenuation γ if, when v(t) = 0, the network is asymptotically stable in mean square, and, under zero initial conditions,

‖z(t)‖_E2 < γ‖v(t)‖_2 (5.47)

for all non-zero v(t), where

‖z(t)‖_E2 = ( E( ∫_0^∞ ‖z(t)‖² dt ) )^{1/2} (5.48)

for z(t) = [x(t)^T, y(t)^T]^T (Xu and Chen 2002).

Similarly, we also assume that σ(x(t), y(t)) can be estimated by

σ^T(x(t), y(t))σ(x(t), y(t)) ≤ x^T(t)H1x(t) + y^T(t)H2y(t) (5.49)

with H1 ≥ 0 and H2 ≥ 0. For this genetic network model, we have the following stability theorem (Li et al. 2007a):

Theorem 5.6. Given a scalar γ > 0, if there exist matrices P11, P22, P12, and Λ = diag(λ1, ..., λn) > 0, and a constant ρ > 0, such that the following LMIs hold:

M5 = [ (1,1)     (1,2)          P11G
       (1,2)^T   (2,2)          P12^T G + kΛ
       G^T P11   G^T P12 + kΛ   −2Λ ] < 0,

P = [ P11    P12
      P12^T  P22 ] > 0,  P11 ≤ ρI, (5.50)

where (1,1) = 2P11A + P12D + D^T P12^T + ρH1 + (ρ/γ²)I, (1,2) = D^T P22 + A^T P12 + P12C, and (2,2) = 2P22C + ρH2 + (ρ/γ²)I, then the genetic network (5.45)–(5.46) is stochastically stable with disturbance attenuation γ.

Proof (Li et al. 2007a): Consider the following Lyapunov function:

V(x(t), y(t)) = [x^T(t), y^T(t)] P [x^T(t), y^T(t)]^T. (5.51)

By Itô's formula, we obtain the following stochastic differential:

dV(x(t), y(t)) = LV(x(t), y(t))dt + 2[x^T(t)P11 + y^T(t)P12^T][σ(x(t), y(t))dw1(t) + v(t)dw2(t)], (5.52)

where L is again the diffusion operator. Assuming that the model (5.45)–(5.46) has zero initial conditions (according to Definition 5.5), we can derive

E(V(x(t), y(t))) = E( ∫_0^t LV(x(s), y(s))ds ). (5.53)

For γ > 0, we define

J(t) = E{ ∫_0^t [x^T(s)x(s) + y^T(s)y(s) − γ²v^T(s)v(s)]ds }. (5.54)

Then, from (5.53) and (5.54), it is easy to show that for any α > 0,

J(t) ≤ E( ∫_0^t S1(s)ds ), (5.55)

where S1(s) = αLV(x(s), y(s)) + x^T(s)x(s) + y^T(s)y(s) − γ²v^T(s)v(s). Letting α = γ²/ρ, we obtain

S1(t) = (γ²/ρ)LV(x(t), y(t)) + x^T(t)x(t) + y^T(t)y(t) − γ²v^T(t)v(t)
      = (γ²/ρ)[2x^T(t)P11Ax(t) + 2y^T(t)P12^T Ax(t) + 2x^T(t)P11Gf(y(t)) + 2y^T(t)P12^T Gf(y(t)) + 2x^T(t)P12Dx(t) + 2y^T(t)P22Dx(t) + 2x^T(t)P12Cy(t) + 2y^T(t)P22Cy(t) + (ρ/γ²)[x^T(t)x(t) + y^T(t)y(t)] + trace(σ(x(t), y(t))σ^T(x(t), y(t))P11) − ρv^T(t)v(t) + trace(v(t)v^T(t)P11)]. (5.56)

Considering P11 ≤ ρI and −2 Σ_{i=1}^n λi f(yi(t))[f(yi(t)) − k yi(t)] ≥ 0, and also using (5.49), we obtain

S1(t) ≤ (γ²/ρ)[2x^T(t)P11Ax(t) + 2y^T(t)P12^T Ax(t) + 2x^T(t)P11Gf(y(t)) + 2y^T(t)P12^T Gf(y(t)) + 2x^T(t)P12Dx(t) + 2y^T(t)P22Dx(t) + 2x^T(t)P12Cy(t) + 2y^T(t)P22Cy(t) − 2 Σ_{i=1}^n λi f(yi(t))[f(yi(t)) − k yi(t)] + ρx^T(t)H1x(t) + ρy^T(t)H2y(t) + (ρ/γ²)[x^T(t)x(t) + y^T(t)y(t)]] = (γ²/ρ)ξ^T(t)M5ξ(t), (5.57)

where ξ(t) = [x^T(t), y^T(t), f^T(y(t))]^T. In view of the LMIs (5.50) and (5.55), we obtain

J(t) ≤ (γ²/ρ)E[ ∫_0^t ξ^T(s)M5ξ(s)ds ] < 0 (5.58)

for all x(t) and y(t) except for x(t) = y(t) = f(y(t)) = 0. Then (5.47) follows immediately from (5.54) and (5.58). It is easy to show that the genetic network (5.45)–(5.46) is stochastically stable with disturbance attenuation γ.

Next, we consider genetic networks with both time delays and noise perturbations of the following form:

dx(t) = [Ax(t) + Gf(y(t − τ1(t)))]dt + σ(x(t), x(t − τ2(t)), y(t), y(t − τ1(t)))dw1(t) + v(t)dw2(t), (5.59)
dy(t) = [Cy(t) + Dx(t − τ2(t))]dt, (5.60)

where w1(t) and w2(t) are defined in the same manner as those in the above section, and τ1(t) > 0 and τ2(t) > 0 are time-varying delays. We assume that τ̇1(t) ≤ d1 < 1 and τ̇2(t) ≤ d2 < 1. We also assume that the noise intensity matrix σ(x(t), x(t − τ2(t)), y(t), y(t − τ1(t))) can be estimated by

σ^Tσ ≤ x^T(t)H1x(t) + x^T(t − τ2(t))H2x(t − τ2(t)) + y^T(t)H3y(t) + y^T(t − τ1(t))H4y(t − τ1(t)), (5.61)

with H1 ≥ 0, H2 ≥ 0, H3 ≥ 0, and H4 ≥ 0. For this network, the main result is summarized in the following theorem (Li et al. 2007a).

Theorem 5.7. Given a scalar γ > 0, if there exist matrices P11, P22, P12, Q > 0, R > 0, S > 0, Λ = diag(λ1, ..., λn) > 0, and a constant ρ > 0, such that the following LMIs hold:

M6 = [ (1,1)                P12C + A^T P12   P12D     0       P11G
       P12^T A + C^T P12^T  (2,2)            P22D     kΛ      P12^T G
       D^T P12^T            D^T P22          (3,3)    0       0
       0                    kΛ               0        Q − 2Λ  0
       G^T P11              G^T P12          0        0       −(1 − d1)Q ] < 0,

P = [ P11    P12
      P12^T  P22 ] > 0,  P11 ≤ ρI,  ρH4 − (1 − d1)S < 0, (5.62)

where (1,1) = 2P11A + R + ρH1 + (ρ/γ²)I, (2,2) = 2P22C + S + ρH3 + (ρ/γ²)I, and (3,3) = −(1 − d2)R + ρH2, then the genetic network (5.59)–(5.60) is stochastically stable with disturbance attenuation γ.

Proof (Li et al. 2007a): Consider a Lyapunov–Krasovskii functional as follows (Kolmanovskii and Myshkis 1999):

V(x(t), y(t), t) = [x^T(t), y^T(t)] P [x^T(t), y^T(t)]^T + ∫_{t−τ1(t)}^{t} [f^T(y(μ))Qf(y(μ)) + y^T(μ)Sy(μ)]dμ + ∫_{t−τ2(t)}^{t} x^T(μ)Rx(μ)dμ. (5.63)

By using Itô's formula, we obtain the stochastic differential

dV(x(t), y(t), t) = LV(x(t), y(t), t)dt + 2[x^T(t)P11 + y^T(t)P12^T][σ(x(t), x(t − τ2(t)), y(t), y(t − τ1(t)))dw1(t) + v(t)dw2(t)]. (5.64)

Following a procedure similar to that in the proof of Theorem 5.6, we obtain

J(t) ≤ E( ∫_0^t S2(s)ds ), (5.65)

where S2(s) = (γ²/ρ)LV(x(s), y(s), s) + x^T(s)x(s) + y^T(s)y(s) − γ²v^T(s)v(s). Using the above Lyapunov–Krasovskii functional, we have

S2(t) = (γ²/ρ)[2x^T(t)P11Ax(t) + 2y^T(t)P12^T Ax(t) + 2x^T(t)P11Gf(y(t − τ1(t))) + 2y^T(t)P12^T Gf(y(t − τ1(t))) + 2x^T(t)P12Cy(t) + 2y^T(t)P22Cy(t) + 2x^T(t)P12Dx(t − τ2(t)) + 2y^T(t)P22Dx(t − τ2(t)) + f^T(y(t))Qf(y(t)) − (1 − τ̇1(t))f^T(y(t − τ1(t)))Qf(y(t − τ1(t))) + y^T(t)Sy(t) − (1 − τ̇1(t))y^T(t − τ1(t))Sy(t − τ1(t)) + x^T(t)Rx(t) − (1 − τ̇2(t))x^T(t − τ2(t))Rx(t − τ2(t)) + (ρ/γ²)x^T(t)x(t) + (ρ/γ²)y^T(t)y(t) − ρv^T(t)v(t) + trace(σσ^T P11) + trace(v(t)v^T(t)P11)]

≤ (γ²/ρ)[2x^T(t)P11Ax(t) + 2y^T(t)P12^T Ax(t) + 2x^T(t)P11Gf(y(t − τ1(t))) + 2y^T(t)P12^T Gf(y(t − τ1(t))) + 2x^T(t)P12Cy(t) + 2y^T(t)P22Cy(t) + 2x^T(t)P12Dx(t − τ2(t)) + 2y^T(t)P22Dx(t − τ2(t)) + f^T(y(t))Qf(y(t)) − (1 − d1)f^T(y(t − τ1(t)))Qf(y(t − τ1(t))) + y^T(t)Sy(t) − (1 − d1)y^T(t − τ1(t))Sy(t − τ1(t)) + x^T(t)Rx(t) − (1 − d2)x^T(t − τ2(t))Rx(t − τ2(t)) + (ρ/γ²)x^T(t)x(t) + (ρ/γ²)y^T(t)y(t) + ρx^T(t)H1x(t) + ρx^T(t − τ2(t))H2x(t − τ2(t)) + ρy^T(t)H3y(t) + ρy^T(t − τ1(t))H4y(t − τ1(t)) − 2 Σ_{i=1}^n λi f(yi(t))[f(yi(t)) − k yi(t)]]. (5.66)

In the above analysis, we have used the sector condition as well as the inequality (5.61). Letting ξ(t) = [x^T(t), y^T(t), x^T(t − τ2(t)), f^T(y(t)), f^T(y(t − τ1(t)))]^T, we have

S2(t) ≤ (γ²/ρ)[ξ^T(t)M6ξ(t) + y^T(t − τ1(t))(ρH4 − (1 − d1)S)y(t − τ1(t))]. (5.67)

Since M6 < 0 and ρH4 − (1 − d1)S < 0, we obtain

J(t) ≤ E[ ∫_0^t S2(s)ds ] < 0. (5.68)

Inequality (5.47) then follows easily. Therefore, the genetic network (5.59)–(5.60) is stochastically stable with disturbance attenuation γ. As mentioned in the Discussion section of (Li et al. 2006a), the model can be generalized in many ways, and the analysis procedures would not add significant difficulty.

5.4 Examples

In this section, we present three examples to show the effectiveness and correctness of the theoretical results. In order to demonstrate the evaluation of the theoretical results in detail, we consider a genetic network of small size (but with a complex coupling topology) with five nodes, as shown in Figure 5.1 (Li et al. 2006a). Figure 5.1 shows the interaction graph of a gene regulatory network, where each ellipse represents a node and the lines represent regulatory links, in which → and ⊣ denote activation and repression, respectively. We assume that the dimensionless transcriptional rates are all 0.5. According to the definition of links in Section 5.1, we can obtain the coupling matrix G of this network, given in (5.69), and l = 0.5·[1, 1, 0, 1, 0]^T in (5.9). In the following three examples, all the networks have this topology.

Figure 5.1 The interaction graph of a genetic network model, where ⊣ represents repression and → represents activation (from (Li et al. 2006a))

Example 5.1 We consider the genetic network in Figure 5.1 with time delays of the form (5.18)–(5.19). Let A = C = diag(−1, −1, −1, −1, −1),

D = diag(0.8, 0.8, 0.8, 0.8, 0.8), and f(x) = x²/(1 + x²); in other words, the Hill coefficient H is 2. It is easy to show that the maximum value of the derivative of f(x) is less than k = 0.65. Assuming that the time delays are τ1(t) = 1 + 0.1 sin(t) and τ2(t) = 1 + 0.1 sin(t), we have d1 = d2 = 0.1 < 1. The unique equilibrium of this network is m = [0.4302, . . .]^T and p = [0.3459, . . .]^T. We first shift the equilibrium to the origin. According to Theorem 5.2, if the LMIs (5.20) hold, then the genetic network is globally asymptotically stable. By using the MATLAB LMI Toolbox, we can easily obtain feasible solutions of the LMIs (5.20). Thus, the network is globally asymptotically stable. The trajectories of the variables x(t) are shown in Figure 5.2, which indicates that the network considered in this example is indeed stable.

Figure 5.2 Trajectories of x(t) in the genetic network with time delays (from (Li et al. 2006a))
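Since the entries of the coupling matrix in (5.69) are not reproduced above, the following sketch simulates a delayed network of the same form with a hypothetical five-node coupling matrix G (an assumption made purely for illustration); a simple Euler scheme with a stored history handles the time-varying delay:

```python
import numpy as np

# Hypothetical coupling matrix (the printed G in (5.69) is not reproduced
# here): +0.5 denotes activation, -0.5 repression, 0 no link.
Gm = 0.5 * np.array([[ 0, -1,  0,  1,  0],
                     [ 1,  0, -1,  0,  0],
                     [ 0,  1,  0,  0, -1],
                     [-1,  0,  0,  0,  1],
                     [ 0,  0,  1, -1,  0]], dtype=float)
l = 0.5 * np.array([1, 1, 0, 1, 0], dtype=float)  # basal transcription rates
b = 0.8                                           # A = C = -I, D = 0.8 I
f = lambda y: y**2 / (1 + y**2)                   # Hill function, H = 2

dt, T = 0.01, 80.0
n = int(T / dt)
x = np.zeros((n + 1, 5)); y = np.zeros((n + 1, 5))
x[0] = y[0] = 0.3                                 # constant initial history

tau = lambda t: 1.0 + 0.1 * np.sin(t)             # time-varying delay
for i in range(n):
    j = max(0, i - int(round(tau(i * dt) / dt)))  # grid index of t - tau(t)
    x[i + 1] = x[i] + (-x[i] + Gm @ f(y[j]) + l) * dt
    y[i + 1] = y[i] + (-y[i] + b * x[j]) * dt

drift = float(np.abs(x[-1] - x[-200]).max())      # change over the last 2 time units
print(x[-1].round(4), drift)
```

Because the loop gain (Lipschitz constant 0.65 of f, times 0.8, times the row sums of this G) is below one, the trajectories settle to a unique equilibrium regardless of the delays, mirroring the behavior shown in Figure 5.2.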

Example 5.2 In this example, let us consider the genetic network in Figure 5.1 without time delays but with stochastic perturbations of the form (5.25)–(5.26). Let D = diag(1, 1, 1, 1, 1), let n(t) be a scalar white Gaussian noise process with zero mean, and let σ(y(t)) = [σ1(y(t)), ..., σ5(y(t))]^T with σi(y(t)) = Σ_{j=1}^5 yj(t) for all i. The other parameters are the same as those in Example 5.1. The unique equilibrium in the absence of perturbation is m = [0.3955, . . .]^T, p = [0.3905, . . .]^T. First of all, we again shift the equilibrium to the origin. By using the MATLAB LMI Toolbox, we can easily find feasible solutions of the LMIs (5.30) in Theorem 5.3, which indicates that the network with stochastic perturbations is asymptotically stable in mean square. We show the trajectories of the variables x(t) in Figure 5.3, which indicates that the network of this example is indeed stable in mean square.

Figure 5.3 Trajectories of x(t) in the genetic network with stochastic perturbations (from (Li et al. 2006a))

Figure 5.4 Trajectories of x(t) in the genetic network with both time delays and stochastic perturbations (from (Li et al. 2007a))

Example 5.3 In this example, we consider a genetic network of the form (5.59)–(5.60). We set the noise intensity as σ(x(t), y(t − τ1(t))) = [σ1(x(t), y(t − τ1(t))), ..., σ5(x(t), y(t − τ1(t)))]^T with σi(x(t), y(t − τ1(t))) = 0.05[xi(t) + Σ_{j=1}^5 yj(t − τ1(t))] for all i. From σ(x(t), y(t − τ1(t))) we can easily obtain the matrices Hi, i = 1, 2, 3, 4. Since the theoretical result is independent of the form of v(t), we let v(t) = [0.05, 0.05, 0.05, 0.05, 0.05]^T. The other parameters are the same as those in Example 5.1. By applying Theorem 5.7 and using the MATLAB LMI Toolbox, we can easily obtain feasible solutions of the LMIs (5.62) for γ ≥ 4.5. The trajectories of the variables x(t) are shown in Figure 5.4. Since v(t) is invariant with time, we can easily confirm that the right-hand side of

(5.47) remains below this bound. Thus, the network is indeed stochastically stable with disturbance attenuation γ = 4.5.
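The attenuation inequality (5.47) can also be illustrated numerically on a stripped-down scalar instance of (5.45)–(5.46) in which G and σ are set to zero, so that only the disturbance channel v(t)dw2(t) acts on dx = −x dt + v dw2. The values v = 0.1, γ = 0.8, and T = 10 below are illustrative assumptions, not values from the examples above (for this scalar system the true L2 gain is 1/√2 ≈ 0.71, so γ = 0.8 should be achievable):

```python
import numpy as np

rng = np.random.default_rng(1)
dt, T, paths = 0.01, 10.0, 2000
v, gamma = 0.1, 0.8
x = np.zeros(paths)                    # zero initial condition (Definition 5.5)
lhs_sq = 0.0                           # Monte Carlo estimate of E int x^2 dt

for _ in range(int(T / dt)):
    lhs_sq += float(np.mean(x**2)) * dt
    x = x - x * dt + v * rng.normal(0.0, np.sqrt(dt), paths)

rhs_sq = gamma**2 * v**2 * T           # gamma^2 * int v^T v dt for constant v
print(lhs_sq, rhs_sq)
```

For this toy system the exact value of E ∫_0^T x² dt is v²(T − (1 − e^{−2T})/2)/2 ≈ 0.0475, comfortably below γ²v²T = 0.064, and the Monte Carlo estimate reproduces this margin.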

6 Design of Synthetic Switching Networks

One of the major challenges in post-genomic biology is to understand how genes, proteins, and small molecules dynamically interact to form molecular networks that facilitate sophisticated biological functions. From the engineering viewpoint, there are two major approaches to clarifying network structures and their functions in a cellular system. One is called reverse engineering, in which biomolecular networks are developed by conducting biological experiments on specific species or organisms, collecting data by applying high-throughput technologies, and finally inferring the related network structure based on computational and analytical theory; in other words, the process of reverse engineering runs from data or experiments to networks. The other approach is called forward engineering, in which a biomolecular device or a simple network with a specific function is first designed; this artificial part is then integrated into a living organism to grow, and finally data are collected to confirm its functioning; in other words, the process of forward engineering runs from networks to data. An ability to rationally design complex networks from the bottom up can also help realize useful quantitative model systems for gaining a deeper appreciation of the principles governing the functional characteristics of complex biological systems. In this chapter, we focus on methods for designing biomolecular networks, which clearly belong to the category of forward engineering. In fact, this topic is closely related to synthetic biology, a new and rapidly emerging discipline aimed at the design and construction of new biological devices and systems, and at the re-design of existing or natural biological systems, for desired purposes.
Although the basic concepts of designing biomolecular networks and controlling cellular systems at the DNA level have been in existence for sixty years, it is recent advances in genetic engineering that have made both theoretical design and experimental implementation realistic. Progress in the theory of networks and dynamics provides mathematical frameworks for designing biologically viable biomolecular networks with specified functions. One such function, which exists ubiquitously in cellular systems, is multistability, i.e., the capacity to achieve multiple alternative internal states in response to different stimuli. Multistability is a defining characteristic of a switch. Cells can switch between multiple internal states to accommodate environmental and intercellular conditions. It is becoming increasingly clear that such multiple discrete and alternative stable states are generated by regulatory interactions among cellular components. Such a capacity has been found in both synthetic and natural biomolecular networks, including gene regulatory networks (Gardner et al. 2000), signal transduction networks (Markevich et al. 2004, Rodriguez et al. 2008), and metabolic networks (Ozbudak et al. 2004). Multistability has fundamental biological significance, notably in cell differentiation (Suel et al. 2006, Becskei et al. 2001), cell fate decision (Xiong and Ferrell Jr 2003), adaptive response to environmental changes (Kashiwagi et al. 2006), regulation of cell-cycle oscillations during mitosis (Pomerening et al. 2003), and so on. Many research studies have indicated that the multistability of these systems is attributable to positive feedback loops in their regulatory networks.

In the modern information technology age, switch-like structures play important roles. For instance, the silicon switch is an essential building block of a computer: by connecting a few silicon switches we can construct a simple gate, and by using additional silicon switches we can develop a logic circuit. When a network is connected by a large number of silicon switches, we can construct a high-capacity memory, a powerful CPU, or even a sophisticated computer. Analogously to information science, a living organism is also assumed to be synthesized from basic building blocks, including gene switches, gene sensors, and gene oscillators, which are the main topics of this book. By connecting a few such building blocks, we can synthesize a module with a more elaborate function.
Hence, it is important to gain a deep understanding of these building blocks. Responses of a complex biomolecular network are often understood to be facilitated by various switches. For example, a cell or an organism often switches off one gene or gene group but simultaneously switches on a different gene or group of genes to respond to environmental changes. Recently, it has been revealed that many genetic switches can turn therapeutic genes on and off; such switches make it possible to regulate the levels of compounds like insulin in diabetics. Therefore, designing and constructing such a building block represents a first step towards cellular control by monitoring and manipulating biological processes at the DNA level. This process can be used not only for building modules to synthesize artificial biological systems but also has great potential for biotechnological and therapeutic applications. The perspectives, as well as simple comparisons between silicon computing systems and synthetic biological systems, are listed in Table 1.1. Here, we will briefly introduce some basic concepts of switching networks and then present a general theoretical framework for designing switching networks from the viewpoints of engineering and synthetic biology.

6.1 Types of Switches

Bistable switches are among the most extensively studied building blocks in biomolecular networks (Ferrell 2002). Much theoretical and experimental research has been carried out to elucidate the structures and functions of various switches in cellular systems. A bistable system is like a toggle that can reside in only one of two stable states. Two simple mechanisms that can exhibit bistability, through a positive or a double-negative feedback, are shown in Figure 6.1. The synthetic toggle switch (Gardner et al. 2000) takes the form shown in Figure 6.1 (a), and the bistable model of the natural lactose-utilization network (Ozbudak et al. 2004) takes the form shown in Figure 6.1 (b).

Figure 6.1 Two simple networks that can exhibit bistability. (a) A double-negative feedback loop, where protein A inhibits B and protein B inhibits A. Thus there can be a stable steady state in which A is on and B is off, or one in which B is on and A is off, but there cannot be a stable steady state in which both A and B are on or off. (b) A positive feedback loop in which A activates B and B activates A. There can be a stable steady state in which both A and B are off, or one in which both A and B are on, but not one in which A is on and B is off or vice versa. Both types of circuit exhibit persistent, self-perpetuating responses long after the triggering stimulus is removed (from (Ferrell 2002))

Switching networks can be grouped into two distinct categories: reversible and irreversible switches. One can classify switches by plotting their signal-response curves, which display the qualitative changes in the steady states as the input signal or a system parameter is smoothly varied. Consider the phosphorylation and dephosphorylation reactions shown in Figure 6.2 (a). The dynamics of these reactions is governed by Michaelis–Menten (MM) kinetics as follows (Tyson et al. 2003):

dR_P/dt = k1·S·(R_T − R_P)/(K_m1 + R_T − R_P) − k2·R_P/(K_m2 + R_P), (6.1)

where S is the signal strength and R_P is the concentration of the phosphorylated form. R is the concentration of the dephosphorylated form, and R_T is the total concentration of the molecule, i.e., R_T = R + R_P. The steady-state concentration of the phosphorylated form is shown in Figure 6.2 (b). The mechanism for creating a switch-like signal-response curve is called zero-order ultrasensitivity, because when the signal strength S is close to the threshold, small signal changes result in large changes in the steady-state response. Ultrasensitivity is the opposite of homeostasis, where the steady-state concentration of the response is confined to a narrow window for a broad range of signal strengths. The steady-state response R_P increases continuously with signal strength, which is called graded: a slightly stronger signal results in a slightly stronger response. Reversible implies that if the signal strength is changed from S_initial to S_final, the response at S_final is the same irrespective of whether the signal strength is being increased (S_initial < S_final) or decreased (S_initial > S_final). Clearly, the switch in Figure 6.2 is reversible because of the characteristic shown in Figure 6.2 (b). Such a switch is also called a gate, where S is taken as an input and R_P is taken as an output. Note that Figure 6.2 (b) is a map on a parameter space in which S is a parameter and R_P is a state variable. Clearly, there is only one stable equilibrium for each specific value of the parameter S.

Now, we consider an example of an irreversible switch, namely a mutual activation network, as shown in Figure 6.3 (a), whose dynamics is governed by (Tyson et al. 2003)

dR/dt = k0·E_P(R) + k1·S − k2·R, (6.2)

where

E_P(R) = 2k3·R·J4 / ( K_JR + ( K_JR² − 4(k4 − k3·R)·k3·R·J4 )^{1/2} ), (6.3)

with K_JR = k4 − k3·R + k4·J3 + k3·R·J4.
In the network, R activates protein E by phosphorylating E into its active form E_P, and E_P in turn enhances the synthesis of R, resulting in a positive feedback loop. In contrast to the graded and reversible sigmoidal signal response, the response of such a mutual activation network may create a discontinuous switch. As the signal magnitude S goes over a critical value S_crit, the response changes abruptly and irreversibly, as shown in Figure 6.3 (b). In other words, as the signal strength S increases, the response is low until S exceeds the critical intensity S_crit, at which point the response increases discontinuously to a high value. Then, if S decreases, the response stays high; i.e., the switch is irreversible (Tyson et al. 2003). Clearly, there are two stable equilibria for each specific value of the parameter S in the interval 0 ≤ S ≤ S_crit, as shown by the solid lines in Figure 6.3 (b).
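Returning to the reversible element (6.1), its zero-order ultrasensitivity can be checked numerically: since the steady-state balance is monotone in R_P, a simple bisection with the Figure 6.2 parameter values (k1 = k2 = 1, R_T = 1, K_m1 = K_m2 = 0.05) exposes the sharp transition near the threshold S = 1:

```python
import numpy as np

k1 = k2 = 1.0; RT = 1.0; Km1 = Km2 = 0.05   # parameter values from Figure 6.2

def balance(RP, S):
    # phosphorylation rate minus dephosphorylation rate at state RP, eq. (6.1)
    return k1 * S * (RT - RP) / (Km1 + RT - RP) - k2 * RP / (Km2 + RP)

def steady_state(S, tol=1e-10):
    # balance() is positive at RP = 0 and negative at RP = RT, and is
    # monotone decreasing in RP, so bisection finds the unique root.
    lo, hi = 0.0, RT
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if balance(mid, S) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for S in (0.8, 1.0, 1.2):
    print(S, round(steady_state(S), 3))
```

With these small Michaelis constants a 20% change of S on either side of the threshold moves the steady-state R_P from well below 0.2 to above 0.75 of R_T, i.e., the response is switch-like yet still reversible and graded through the transition.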

Figure 6.2 Sigmoidal signal-response element: a reversible switch or a gate. (a) The network for phosphorylation and dephosphorylation reactions. (b) The sigmoidal response of the steady-state concentration of the phosphorylated form. The parameter values are k1 = k2 = 1, R_T = 1, and K_m1 = K_m2 = 0.05 (from (Tyson et al. 2003))

Figure 6.3 An irreversible switch: a one-way switch. (a) The mutual activation network. (b) The signal-response curve. The parameter values are k0 = 0.4, k1 = 0.01, k2 = k3 = 1, k4 = 0.2, and J3 = J4 = 0.05 (from (Tyson et al. 2003))
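The bistability of the mutual activation element (6.2)–(6.3) can be made concrete by counting the steady states on a fine grid of R values; the Figure 6.3 parameter values are used below, and the degradation term of (6.2) is taken as k2·R (any constant factor multiplying R is absorbed into k2, an assumption made for this sketch):

```python
import numpy as np

k0, k1, k2, k3, k4 = 0.4, 0.01, 1.0, 1.0, 0.2   # Figure 6.3 parameter values
J3 = J4 = 0.05

def EP(R):
    # Goldbeter-Koshland expression (6.3) for the active fraction of E
    K = k4 - k3 * R + k4 * J3 + k3 * R * J4
    return 2 * k3 * R * J4 / (K + np.sqrt(K**2 - 4 * (k4 - k3 * R) * k3 * R * J4))

def n_steady_states(S):
    # count sign changes of the right-hand side of (6.2) along a grid of R
    R = np.linspace(1e-4, 1.5, 15000)
    g = k0 * EP(R) + k1 * S - k2 * R
    return int(np.count_nonzero(np.diff(np.sign(g))))

print(n_steady_states(2.0), n_steady_states(20.0))
```

For a moderate signal the scan finds three crossings (two stable states separated by an unstable one), while for a large signal only the high state survives, which is exactly the discontinuous transition sketched in Figure 6.3 (b).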

Another example of an irreversible switch is the mutual inhibition network shown in Figure 6.4, where R represses E and E in turn inhibits R, thereby forming a positive feedback loop. Its dynamics is governed by (Tyson et al. 2003)

dR/dt = k0 + k1·S − k2·R − k2′·E_P(R)·R, (6.4)

where

E_P(R) = 2k3·J4 / ( K_RJ + ( K_RJ² − 4(k4·R − k3)·k3·J4 )^{1/2} ), (6.5)

with K_RJ = k4·R − k3 + k4·R·J3 + k3·J4.

In general, there are two kinds of discontinuous responses: a one-way switch and a toggle switch, as shown in Figure 6.3 (b) and Figure 6.4 (b), respectively. One-way switches presumably play major roles in developmental processes characterized by a point of no return, as shown in Figure 6.3 (b); apoptosis is an example of a one-way switch. On the other hand, in a toggle switch, if S is sufficiently decreased, the switch can go back to the off-state, as shown in Figure 6.4 (b). In the case of a single parameter, the discontinuous toggle switch is often referred to as hysteresis or multistability. Examples include the lac operon in bacteria and cell-cycle transitions driven by hysteresis (Sha et al. 2003, Pomerening et al. 2003, Han et al. 2005). A hysteresis model based on a positive feedback loop has also been constructed in a synthetic mammalian genetic network (Kramer and Fussenegger 2005). A comprehensive introduction to making continuous processes discontinuous and reversible processes irreversible by using graphical displays of the rate equations can be found in (Ferrell and Xiong 2001). In contrast to the case illustrated in Figure 6.2, in such a case there are two stable equilibria in the state space for a specific parameter S. Different definitions are given for reversible and irreversible switches, e.g., in (Paladugu et al. 2006), where toggle switches are actually defined to be reversible because it is possible to switch between the two stable states simply by changing the parameter values.
This definition is based on bifurcation diagrams. When the two saddle-node bifurcation points appear in the positive region of a parameter, the switch will usually be reversible. On the contrary, if one of the two bifurcation points appears in the negative region or at a biologically infeasible parameter value, it is impossible to switch back to the other steady state because of physical or biological constraints, in which case the switch will be irreversible. It has been shown that cellular systems use bistability as a means to achieve irreversibility (Ferrell 2002). In this chapter, we mainly consider the dynamics and structures of irreversible switches with multiple steady states, or equilibria, i.e., switches with characteristics similar to those illustrated in Figures 6.3 and 6.4.
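The hysteresis of the toggle element (6.4)–(6.5) can be demonstrated by a quasi-static sweep of the signal with the Figure 6.4 parameter values, reading the two rate constants as k2 = 0.1 and k2′ = 0.5; the signal range and relaxation times below are illustrative choices:

```python
import numpy as np

k0, k1, k2, k2p, k3, k4 = 0.0, 0.05, 0.1, 0.5, 1.0, 0.2  # Figure 6.4 values
J3 = J4 = 0.05

def EP(R):
    # Goldbeter-Koshland form (6.5): active E as a function of R
    K = k4 * R - k3 + k4 * R * J3 + k3 * J4
    return 2 * k3 * J4 / (K + np.sqrt(K**2 - 4 * (k4 * R - k3) * k3 * J4))

def relax(R, S, T=200.0, dt=0.05):
    # integrate dR/dt = k0 + k1 S - k2 R - k2' EP(R) R to (near) steady state
    for _ in range(int(T / dt)):
        R += (k0 + k1 * S - k2 * R - k2p * EP(R) * R) * dt
    return R

S_grid = np.linspace(0.0, 80.0, 81)
R, up, down = 0.0, [], []
for S in S_grid:                 # sweep the signal upward
    R = relax(R, S)
    up.append(R)
for S in S_grid[::-1]:           # sweep back down from the high state
    R = relax(R, S)
    down.append(R)
down = down[::-1]                # align with S_grid

print(up[30], down[30])          # two different states at the same S = 30
```

On the way up the system is still on the low branch at S = 30; on the way down it is still on the high branch at the same signal, i.e., the steady state depends on the history of S, which is the hysteresis of Figure 6.4 (b).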

Figure 6.4 Another irreversible switch: a toggle switch. (a) The mutual inhibition network. (b) The corresponding signal-response curve. The parameter values are k0 = 0, k1 = 0.05, k2 = 0.1, k2′ = 0.5, k3 = 1, k4 = 0.2, and J3 = J4 = 0.05 (from (Tyson et al. 2003))

6.2 Simple Switching Networks

Bistability in a Single Gene Network

A simple quantitative model describing the regulation of the P_RM operator region of λ phage was developed in (Hasty et al. 2001). The system is a DNA plasmid consisting of the promoter region and the cI gene. The promoter region contains three operator sites known as OR1, OR2, and OR3. The gene cI expresses its protein CI, which in turn dimerizes and binds to the DNA as a TF. The binding can take place at any of the three binding sites. The CI dimer first binds to OR1, then to OR2, and finally to OR3, according to their binding affinities. Positive feedback arises because downstream transcription is enhanced by binding at OR2, while binding at OR3 represses transcription and constitutes a negative feedback loop (Hasty et al. 2001, Isaacs et al. 2003). However, there is almost no effect on transcription when the CI dimer binds only to OR1, although OR1 has the highest binding affinity among the three operator sites. Figure 6.5 schematically shows the binding effects and priorities of the CI dimers for the operator sites of the P_RM promoter. The operator sites can be fully occupied by three dimers, or equivalently six monomers in total. The binding reactions are fast and are assumed to be in equilibrium with respect to the other reactions. Letting X1, X2, D, and Di denote the repressor (CI), the repressor dimer, the DNA promoter site, and the dimer binding to

6 Design of Synthetic Switching Networks

Figure 6.5 The binding effects of the CI dimers on the operator sites of the P_RM promoter with priorities OR_1 (0) > OR_2 (+) > OR_3 (−)

the OR_i site, respectively, the reactions take the following form:

X_1 + X_1 \overset{K_1}{\rightleftharpoons} X_2, (6.6)
D + X_2 \overset{K_2}{\rightleftharpoons} D_1, (6.7)
D_1 + X_2 \overset{K_3}{\rightleftharpoons} D_2 D_1, (6.8)
D_2 D_1 + X_2 \overset{K_4}{\rightleftharpoons} D_3 D_2 D_1, (6.9)

where K_3 = σ_1 K_2 and K_4 = σ_2 K_2; thus σ_1 and σ_2 represent the binding affinities of OR_2 and OR_3 relative to the dimer–OR_1 affinity. The slow irreversible reactions are the transcription and degradation processes. If no repressor is bound to the operator region, or a single repressor dimer is bound only to OR_1, transcription proceeds at the normal rate. If, however, a repressor dimer is bound to OR_2, transcription is enhanced. Moreover, when a CI dimer is also bound to OR_3, transcription is completely repressed or terminated. The reactions governing these processes are

D \xrightarrow{k_t} D + n X_1, (6.10)
D_1 \xrightarrow{k_t} D_1 + n X_1, (6.11)
D_2 D_1 \xrightarrow{\alpha k_t} D_2 D_1 + n X_1, (6.12)
D_3 D_2 D_1 \xrightarrow{0} D_3 D_2 D_1 + n X_1, (6.13)
X_1 \xrightarrow{k_x} \emptyset, (6.14)

where α > 1 is the degree to which transcription is enhanced by dimer occupation of OR_2, and n is the number of proteins per mRNA transcript. Note that transcription is assumed to be completely inhibited when a CI dimer is bound to OR_3, and thus we set the rate in (6.13) to zero.

Figure 6.6 The steady-state concentration of the repressor as a function of the parameter γ (from (Hasty et al. 2001))

Under the QSS assumption, by applying the conservation law for the DNA promoter sites and using rescaled dimensionless variables (Hasty et al. 2001), we obtain the following rate equation describing the evolution of the CI monomer concentration:

\frac{dx}{dt} = \frac{m(1 + x^2 + \alpha\sigma_1 x^4)}{1 + x^2 + \sigma_1 x^4 + \sigma_1\sigma_2 x^6} - \gamma x, (6.15)

where x is the concentration of the CI monomer. The first term on the right-hand side of (6.15) represents the production of the repressor CI by transcription. Equation (6.15) contains only even powers of x because of the dimerization of CI and the subsequent binding to the promoter region. Recall that three dimers, or equivalently six monomers, can fully occupy the operator sites of the promoter; they correspond to the term x^6 in (6.15). The system (6.15) has the capacity to create bistability. The steady-state concentration of the repressor as a function of the parameter γ is shown in Figure 6.6. The bistability arises as a consequence of the competition between the production of x (along with dimerization) and its degradation.

6.2.2 The Toggle Switch

A synthetic toggle switch was constructed on plasmids (Gardner et al. 2000). The network was designed from two promoters and their repressors, where
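The bistability of (6.15) can be checked numerically by locating the steady states dx/dt = 0 for a given γ. The sketch below uses illustrative parameter values (m = 1, α = 11, σ₁ = 2, σ₂ = 0.08; not necessarily those behind Figure 6.6) and counts roots by sign changes on a grid:

```python
# Steady states of the rescaled lambda-phage repressor model, Eq. (6.15):
#   dx/dt = m(1 + x^2 + a*s1*x^4)/(1 + x^2 + s1*x^4 + s1*s2*x^6) - g*x
# Parameter values are illustrative choices, not those used for Figure 6.6.

def rhs(x, m=1.0, a=11.0, s1=2.0, s2=0.08, g=4.5):
    num = m * (1 + x**2 + a * s1 * x**4)
    den = 1 + x**2 + s1 * x**4 + s1 * s2 * x**6
    return num / den - g * x

def steady_states(g, n=4000, xmax=5.0):
    """Locate steady states via sign changes of dx/dt, refined by bisection."""
    xs = [xmax * i / n for i in range(1, n + 1)]
    roots = []
    for lo, hi in zip(xs, xs[1:]):
        if rhs(lo, g=g) * rhs(hi, g=g) < 0:
            for _ in range(60):               # bisection refinement
                mid = 0.5 * (lo + hi)
                if rhs(lo, g=g) * rhs(mid, g=g) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots

if __name__ == "__main__":
    for g in (2.0, 4.5, 8.0):
        print("gamma =", g, "-> steady states:", steady_states(g))
```

For a degradation rate γ inside the bistable window the scan finds three steady states (the outer two stable, the middle one unstable); outside the window it finds a single one, mirroring the hysteresis curve of Figure 6.6.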

each promoter can be inhibited by the repressor transcribed by the other promoter. The model can be expressed by (2.180)–(2.181). Intuitively, one might anticipate two possible equilibria. Because LacI production is repressed by the CI protein, an initially high concentration of CI would lead to a state with high CI and low LacI concentrations. Conversely, because CI production is repressed by LacI, if LacI is initially present at a high concentration, a second stable state would entail high LacI and low CI concentrations. One counter-intuitive observation is that not all parameter combinations produce bistability. The bistable and monostable situations, the bifurcation set, and the effects of the cooperativity parameters β and γ on the bifurcations are shown in Figure 6.7. The design of an operating toggle switch thus depends on the choice of parameters that lead to bistability. The design criteria include the use of strong and balanced constitutive promoters, effective transcriptional repression, the formation of protein multimers, and suitable protein degradation rates. Reliable toggling between the stable states can be induced experimentally through the transient introduction of either a chemical or a thermal stimulus (Gardner et al. 2000).

6.2.3 The MAPK Cascade Model

Now, we consider a higher-dimensional system, the MAPK cascade. The key features of the cascade are shown schematically in Figure 6.8. Active Mos (x) activates MEK through phosphorylation of two residues, i.e., conversion of unphosphorylated y_1 to monophosphorylated y_2 and then to bisphosphorylated y_3. Similarly, active MEK (y_3) phosphorylates p42 MAPK (z_1) at two residues, resulting in monophosphorylated z_2 and then bisphosphorylated z_3. Active p42 MAPK (z_3) in turn promotes Mos synthesis, closing the positive feedback loop (Angeli et al. 2004).
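The two anticipated equilibria can be reproduced numerically. The sketch below integrates the standard dimensionless toggle-switch model, du/dt = α₁/(1 + v^β) − u, dv/dt = α₂/(1 + u^γ) − v (cf. (2.180)–(2.181)); the parameter values are illustrative choices inside the bistable region, not those of the experimental construct:

```python
# Illustrative Euler simulation of the genetic toggle switch of Gardner et al.
# in standard dimensionless form (cf. (2.180)-(2.181)):
#   du/dt = a1/(1 + v**beta) - u,   dv/dt = a2/(1 + u**gamma) - v
# Parameters are illustrative values chosen inside the bistable region.

def simulate(u, v, a1=3.0, a2=3.0, beta=2.0, gamma=2.0, dt=0.01, T=50.0):
    """Integrate the toggle switch from (u, v) and return the final state."""
    for _ in range(int(T / dt)):
        du = a1 / (1 + v**beta) - u
        dv = a2 / (1 + u**gamma) - v
        u, v = u + dt * du, v + dt * dv
    return u, v

if __name__ == "__main__":
    print(simulate(2.0, 0.1))   # biased initial condition -> high-u state
    print(simulate(0.1, 2.0))   # biased initial condition -> high-v state
```

Two initial conditions biased toward opposite repressors settle onto the two different stable states, the numerical analogue of the high-CI/low-LacI and high-LacI/low-CI states described above.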
In terms of Michaelis–Menten (MM) rate expressions, the equations are given by

\dot{x} = -\frac{V_2 x}{K_2 + x} + V_0 z_3 + V_1, (6.16)
\dot{y}_1 = \frac{V_6 y_2}{K_6 + y_2} - \frac{V_3 x y_1}{K_3 + y_1}, (6.17)
\dot{y}_2 = \frac{V_3 x y_1}{K_3 + y_1} + \frac{V_5 y_3}{K_5 + y_3} - \frac{V_4 x y_2}{K_4 + y_2} - \frac{V_6 y_2}{K_6 + y_2}, (6.18)
\dot{y}_3 = \frac{V_4 x y_2}{K_4 + y_2} - \frac{V_5 y_3}{K_5 + y_3}, (6.19)
\dot{z}_1 = \frac{V_{10} z_2}{K_{10} + z_2} - \frac{V_7 y_3 z_1}{K_7 + z_1}, (6.20)
\dot{z}_2 = \frac{V_7 y_3 z_1}{K_7 + z_1} + \frac{V_9 z_3}{K_9 + z_3} - \frac{V_8 y_3 z_2}{K_8 + z_2} - \frac{V_{10} z_2}{K_{10} + z_2}, (6.21)
\dot{z}_3 = \frac{V_8 y_3 z_2}{K_8 + z_2} - \frac{V_9 z_3}{K_9 + z_3}. (6.22)
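As a structural sanity check, the sketch below integrates (6.16)–(6.22) by the Euler method with placeholder rate constants (the Xenopus-fitted values are given in (Angeli et al. 2004), so no bistability is claimed here). Because the y- and z-subsystems only interconvert phosphoforms, the totals y₁ + y₂ + y₃ and z₁ + z₂ + z₃ are conserved, which the run verifies:

```python
# Euler integration of the Mos/MEK/p42 MAPK cascade, Eqs. (6.16)-(6.22).
# All V_i and K_i below are placeholder values (the fitted Xenopus values are
# in Angeli et al. 2004); the run illustrates the structure and checks the
# built-in conservation of total MEK (y) and total MAPK (z).

def mm(V, K, s):
    """Michaelis-Menten flux V*s/(K + s)."""
    return V * s / (K + s)

def step(state, dt, V, K):
    x, y1, y2, y3, z1, z2, z3 = state
    f_y12 = x * mm(V[3], K[3], y1)      # y1 -> y2, catalyzed by Mos
    f_y23 = x * mm(V[4], K[4], y2)      # y2 -> y3
    f_y32 = mm(V[5], K[5], y3)          # y3 -> y2 (phosphatase)
    f_y21 = mm(V[6], K[6], y2)          # y2 -> y1
    f_z12 = y3 * mm(V[7], K[7], z1)     # z1 -> z2, catalyzed by active MEK
    f_z23 = y3 * mm(V[8], K[8], z2)     # z2 -> z3
    f_z32 = mm(V[9], K[9], z3)          # z3 -> z2
    f_z21 = mm(V[10], K[10], z2)        # z2 -> z1
    dx = -mm(V[2], K[2], x) + V[0] * z3 + V[1]   # Mos turnover + feedback
    return (x + dt * dx,
            y1 + dt * (f_y21 - f_y12),
            y2 + dt * (f_y12 + f_y32 - f_y23 - f_y21),
            y3 + dt * (f_y23 - f_y32),
            z1 + dt * (f_z21 - f_z12),
            z2 + dt * (f_z12 + f_z32 - f_z23 - f_z21),
            z3 + dt * (f_z23 - f_z32))

if __name__ == "__main__":
    V = {i: 1.0 for i in range(11)}
    V[2] = 4.0                          # degradation strong enough to balance
    K = {i: 1.0 for i in range(11)}
    s = (0.1, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
    for _ in range(5000):
        s = step(s, 0.01, V, K)
    print(s)
```

The conserved totals are exactly the property exploited later when such cascades are analyzed via monotone open-loop characteristics.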

Figure 6.7 Dynamical analysis of the toggle switch: (a) bistability with balanced promoter strengths; (b) monostability with imbalanced promoter strengths; (c) the bistable region. The lines mark the boundary between the bistable and monostable regions. The slopes of the bifurcation lines are determined by the exponents β and γ for large α_1 and α_2. (d) Reducing the cooperativity of repression (β and γ) reduces the size of the bistable region. Bifurcation lines are illustrated for three different values of β and γ. The bistable region lies inside each pair of curves (from (Gardner et al. 2000))

For appropriate parameter values, this model has two stable equilibria, which work as a switch. Details on how to detect the bistability and determine the parameter values, which are chosen to reproduce the experimentally determined abundances and kinetic data in Xenopus oocytes, are given in (Angeli et al. 2004). Excellent reviews of other switches can be found in (Ferrell 2002, Wolf and Arkin 2003). The response of individual reactions or biomolecular interactions with respect to the concentrations of the participating components is continuous and usually graded; however, the combination of individual reactions can give rise to sharp, switch-like behavior. Switching behavior can even be realized

Figure 6.8 Schematic depiction of the Mos-MEK-p42 MAPK cascade (from (Angeli et al. 2004))

by a single function with highly cooperative behavior, as demonstrated by (6.15); examples of such functions are Hill-type functions with large exponents. The simple switching networks above indicate that network structures, cooperative interactions, feedback loops, and parameter values all play key roles in realizing switching behavior. A systems approach offers a better way to understand how the complexity of switches arises from the network structure and feedback loops.

6.3 Design of Switching Networks with Positive Loops

The switching mechanisms discussed so far include cross-repressive feedback with cooperativity, e.g., the toggle switch (Gardner et al. 2000), cooperative autoactivation of gene expression (Hasty et al. 2001), and more complex positive feedback systems, e.g., the MAPK cascade (Angeli et al. 2004). The requirements for multistability include some sort of feedback and some type of nonlinearity within the feedback circuits. In addition, the two arms of the feedback loop must be properly balanced for the circuit to exhibit bistability, as shown in Figure 6.7. If either arm is too strong or too weak, the circuit will be monostable rather than bistable (Ferrell 2002). Bistability can arise in circuits with positive loops (i.e., loops containing an even number of negative interactions), whereas a circuit with a negative loop (an odd number of negative interactions) is expected to exhibit different properties.

Here, we describe a general theoretical framework for constructing a switching network with positive feedback loops (Kobayashi et al. 2003); such a network can act as a toggle switch with multiple switching states. The construction method is based on the theory of monotone dynamical systems (Smith 1995). The mathematical description of a biomolecular network comprises a system of differential equations derived by considering the synthesis and degradation of the individual components. Assuming that there are n biochemical components in a network, i.e., proteins, mRNAs, and small molecules, we can generally describe a biomolecular model as follows:

\dot{x} = f(x_\tau, p) - Dx, (6.23)

where x(t) ∈ R_+^n represents the concentrations of all components at time t ∈ R_+, and R_+ is the set of non-negative real numbers. x_\tau = x(t - \tau) ∈ X ⊂ R_+^n is the vector of concentrations at time t − τ. When emphasizing the dependence of a solution on the initial data φ ∈ C_+ = C([−τ, 0], R_+^n), we write x(t, φ) or x_t(φ). D = diag(d_1, ..., d_n) is an n × n diagonal matrix whose n positive diagonal entries represent the degradation rates of the individual components, and f = (f_1, ..., f_n) : C_+ → R_+^n, with f_i denoting the synthesis rate of the ith component. In addition, we define N = {1, ..., n}, and τ_ij is the time delay from node j to node i. A general design procedure based on monotone dynamical systems and positive feedback networks (PFNs) was developed in (Kobayashi et al. 2003, Smith 1995); this procedure guarantees stable switching states without any non-equilibrium dynamics, thereby making theoretical analysis and design tractable even for large-scale biological systems with time delays. We first define interactions, feedback loops, and interaction graphs (IGs), and then define PFNs by applying the theoretical results on monotone dynamical systems.

Definition 6.1 (Types of interactions).
Suppose that the concentration of the jth component affects the synthesis rate of the ith component, with i ≠ j. Express f_i as f_i(x_\tau) = f_i(x_1(t - \tau_{i1}), ..., x_n(t - \tau_{in})), with f_i(x) = f_i(x_1(t), ..., x_n(t)), and define the type of interaction between the ith and the jth components, s_ij, as follows:

s_{ij} = \begin{cases} +1 & \text{if } \partial f_i(x)/\partial x_j > 0 \text{ for all } x \in X, \\ -1 & \text{if } \partial f_i(x)/\partial x_j < 0 \text{ for all } x \in X, \\ \;\;0 & \text{if } \partial f_i(x)/\partial x_j = 0 \text{ for all } x \in X. \end{cases} (6.24)

If s_ij = 1 (or −1), the jth component affects the ith component positively (or negatively) with time delay τ_ij, namely a positive (or negative) interaction.
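Definition 6.1 can be probed numerically by sampling the sign of ∂f_i/∂x_j over the state space with finite differences. This is only a heuristic (sampling can suggest but not prove the "for all x" condition); the two Hill-type examples below are the ones discussed after the definition:

```python
# Heuristic check of Definition 6.1: classify the interaction sign s_ij by
# sampling the partial derivative of f_i w.r.t. x_j over a box in state space.
# Sampling cannot prove monotonicity; it only suggests a classification.
import random

def interaction_sign(f_i, j, n_vars=2, samples=200, h=1e-6, box=(0.0, 10.0)):
    random.seed(0)                       # reproducible sample points
    signs = set()
    for _ in range(samples):
        x = [random.uniform(*box) for _ in range(n_vars)]
        xp = list(x)
        xp[j] += h                       # forward finite difference in x_j
        d = f_i(xp) - f_i(x)
        if d > 0:
            signs.add(+1)
        elif d < 0:
            signs.add(-1)
    if signs == {+1}:
        return +1
    if signs == {-1}:
        return -1
    if not signs:
        return 0
    return None                          # mixed signs: no well-defined type

if __name__ == "__main__":
    act = lambda x: x[1] / (1 + x[1])    # Hill-type activation by x_2
    rep = lambda x: 1 / (1 + x[0])       # Hill-type repression by x_1
    print(interaction_sign(act, 1), interaction_sign(rep, 0))
```

Such a check is convenient when the synthesis rates are given only as black-box functions rather than closed-form expressions.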

For example, s_ij = 1 for f_i(x_\tau) = x_j(t - \tau_{ij})/(1 + x_j(t - \tau_{ij})), and s_ij = −1 for f_i(x_\tau) = 1/(1 + x_j(t - \tau_{ij})). If s_ij = s_ji = 0 for all x ∈ X, there is no interaction between the ith and the jth components. Next, we define the interaction graph of the model (6.23). This not only enables us to understand the relations among the components intuitively but also gives an intuitive interpretation of the theoretical results.

Definition 6.2 (Interaction graph). An interaction graph, IG(f), of a biomolecular network defined by (6.23) is a directed graph whose nodes represent the individual components of the network and whose edges, with additional parameter sets, represent the interactions between the nodes. When s_ij ≠ 0 and τ_ij ≥ 0, that is, when the jth component affects the synthesis rate of the ith component with time delay τ_ij, the graph has an edge, e_ij, directed from the jth node to the ith node, with the additional parameter set (s_ij, τ_ij). It should be noted that an edge from the jth node to the ith node is subscripted oppositely to the convention in graph theory. In other words, an edge e_ij in an interaction graph of (6.23) is an edge from the jth node to the ith node, which is related to the derivative of f_i with respect to x_j, i.e., ∂f_i(x_\tau)/∂x_j(t - \tau_{ij}) or ∂f_i(x)/∂x_j.

Definition 6.3 (Feedback loops and their types). A path from the ith node to itself in an interaction graph, i.e., p(i, i) = (i = p_1 → p_2 → ... → p_l = i), is said to be a feedback loop or cycle; it is a self-feedback loop when l is 2. Here, p_k denotes the kth node in the path p(i, i). A feedback loop is said to be positive (or negative) if \prod_{m=1}^{l-1} s_{p_{m+1} p_m} = 1 (or −1). The network (6.23) is called a positive feedback network if its interaction graph IG(f) has only positive feedback loops, self-feedback loops excepted. Intuitive descriptions of interactions, paths, and loops of an interaction graph are also given in Chapter 3.
An example of a PFN, expressed by an interaction graph with only positive feedback loops, is shown in Figure 6.9. It is worth noting that a positive feedback loop may include negative interaction edges. Further, for two arbitrary nodes i and j in an interaction graph, there may exist multiple paths from the ith node to the jth node. Although different paths from the ith to the jth node may have different signs in a general interaction graph, all loops must have the same sign in a PFN, i.e., all loops must be positive. For example, there are three positive loops from node 1 to itself in Figure 6.9, i.e., 1 → 2 → 1, 1 → 2 → 4 → 5 → 1, and 1 → 2 → 4 → 6 → 3 → 1. There is no restriction on the self-feedback loops in a PFN, which may be either positive or negative. The associated ODEs of (6.23) are obtained by ignoring all time delays, i.e., setting τ_ij = 0 for all i and j:

\dot{x} = f(x, p) - Dx. (6.25)

The following theorems are derived based on monotone dynamical systems (Smith 1995, Kobayashi et al. 2003).
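Whether a given signed interaction graph qualifies as a PFN can be checked mechanically: enumerate the simple cycles and test that each product of edge signs equals +1 (Definition 6.3), ignoring self-feedback loops. A minimal sketch, with made-up example graphs (not the graph of Figure 6.9):

```python
# PFN check per Definition 6.3: every simple cycle of length >= 2 must have
# edge-sign product +1; self-feedback loops are exempt. Edges are stored as
# (source, target) -> sign; note the text's e_ij runs from node j to node i.

def simple_cycle_signs(edges):
    """Return the sign product of every simple cycle (self-loops excluded)."""
    adj = {}
    for (src, dst), s in edges.items():
        adj.setdefault(src, []).append((dst, s))
    signs = []

    def dfs(start, node, sign, visited):
        for nxt, s in adj.get(node, []):
            if nxt == start and node != start:
                signs.append(sign * s)           # cycle closed back at start
            elif nxt > start and nxt not in visited:
                dfs(start, nxt, sign * s, visited | {nxt})

    for v in sorted(adj):       # each cycle counted once, at its least node
        dfs(v, v, +1, {v})
    return signs

def is_pfn(edges):
    return all(s == +1 for s in simple_cycle_signs(edges))

if __name__ == "__main__":
    # Positive 3-cycle built from two negative interactions (plus an ignored
    # negative self-loop), versus a negative 2-cycle.
    g1 = {(1, 2): +1, (2, 3): -1, (3, 1): -1, (1, 1): -1}
    g2 = {(1, 2): +1, (2, 1): -1}
    print(is_pfn(g1), is_pfn(g2))
```

The first graph is a PFN even though it contains negative edges; the second is not, since its single loop has sign −1.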

Figure 6.9 An example of a PFN. The signs + and − on an edge indicate s = 1 and s = −1, respectively. There are three loops, i.e., the loop through nodes (1, 2, 1), the loop through nodes (1, 2, 4, 5, 1), and the loop through nodes (1, 2, 4, 6, 3, 1), all of which are positive (from (Wang et al. 2008))

Theorem 6.4. Suppose that a biomolecular network has only positive feedback loops except self-feedback loops and that its dynamics is described by (6.23). Then, for almost all initial conditions φ ∈ C_+, the solution x_t(φ) converges to an equilibrium.

This theorem indicates that a biomolecular network with only positive feedback loops has no dynamical attractors other than equilibria. When designing a switching network, it is important to ensure that the designed switch does not show any dynamical oscillations but instead converges asymptotically to stable equilibria. However, it is generally not easy to guarantee such stable or convergent behavior, even for a small biomolecular network with only a few components and no time delays, because of the nonlinearity of the system. As indicated by Theorem 6.4, if we design a switching network with only positive feedback loops, the network is guaranteed to converge to stable equilibria in spite of the nonlinearity, size, and delays of the network. Such a property significantly reduces the complexity of designing and analyzing switching networks. It should be noted that this theorem does not exclude the existence of unstable non-equilibrium solutions such as unstable limit cycles. However, such unstable non-equilibrium solutions cannot usually be observed because of intracellular noise. In this sense, the theorem asserts that a biomolecular network composed of only positive feedback loops inevitably converges to stable equilibria.
In addition, it is worth noting that this theorem can be extended not only to networks with multiple time delays but also to some networks with non-positive feedback loops.

Theorem 6.5. Suppose that a biomolecular network has only positive feedback loops except self-feedback loops, that its dynamics is described by (6.23), and that there are no time delays in the self-feedback loops. Then, the time delays have no effect on the stabilities of the equilibria.

Note that (6.23) and (6.25) have identical equilibria. Theorem 6.5 indicates that the time delays have no effect on the stabilities of the equilibria. In other words, instead of the complicated FDEs (6.23), we can use the associated ODEs obtained by setting all τ = 0 in (6.23), i.e., (6.25), to design and analyze switching networks with only positive feedback loops. Note that the self-feedback loops must not contain any delays. By using the ODEs instead of the FDEs, we significantly reduce the complexity of the problem and enable the design of large-scale complex biomolecular networks. The theorem allows us to examine the equilibria and their stability using the much simpler associated ODEs; however, it is still difficult to analyze the nonlinear ODEs, especially high-dimensional ones. To cope with this problem, a reduction method was developed to further simplify the complex ODEs to lower-dimensional ones with the same equilibria and stability as the original system (Kobayashi et al. 2003). The reduction of dimensionality is carried out by considering pseudo-steady-states and making some assumptions on the feedback loops. This reduction process differs from the conventional methods that reduce complexity by exploiting multiple time scales and keeping only the slow variables; in other words, no approximations or assumptions about time scales are involved. Note that the transient dynamics may be quite different even though (6.23) and (6.25) have the same equilibria with the same stability. Next, we consider only the ODEs (6.25).

Theorem 6.6. Consider (6.25) and its interaction graph IG(f).
Assume that the ith node does not have any self-feedback loop, i.e., there is no edge e_ii. By removing \dot{x}_i = f_i(x) - d_i x_i and substituting x_i = f_i(x)/d_i into the remaining equations in (6.25), we obtain the (n−1)-dimensional differential equations

\dot{x}' = f'(x') - D'x', (6.26)

where

x' = (x_1, ..., x_{i-1}, x_{i+1}, ..., x_n), (6.27)
f' = (f_1, ..., f_{i-1}, f_{i+1}, ..., f_n), (6.28)
D' = diag(d_1, ..., d_{i-1}, d_{i+1}, ..., d_n). (6.29)

Then, there is a one-to-one correspondence between the equilibria of (6.26) and those of (6.25). In addition, their stabilities are also the same.

Theorem 6.6 provides a procedure to reduce the dimensionality of a biomolecular network with only positive feedback loops. According to this theorem, the
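Theorem 6.6 can be illustrated on a two-gene mutual-repression loop (a positive loop built from two negative interactions). Node 2 has no self-feedback, so it is eliminated by substituting x₂ = f₂(x₁)/d₂; the roots of the reduced one-dimensional system then lift back to equilibria of the full two-dimensional system. The functions and parameters below are illustrative choices:

```python
# Sketch of the Theorem 6.6 reduction on a two-gene positive-feedback loop:
#   x1' = f1(x2) - d1*x1,  x2' = f2(x1) - d2*x2   (two negative interactions)
# Node 2 has no self-feedback, so substitute x2 = f2(x1)/d2. Illustrative
# Hill-type synthesis rates; not a model from the text.

def f1(x2, a=3.0):                    # synthesis of x1, repressed by x2
    return a / (1 + x2**2)

def f2(x1, a=3.0):                    # synthesis of x2, repressed by x1
    return a / (1 + x1**2)

D1 = D2 = 1.0

def reduced_rhs(x1):
    """Right-hand side of the reduced 1-D system, Eq. (6.26) for this example."""
    return f1(f2(x1) / D2) - D1 * x1

def roots(g, xmax=5.0, n=5000):
    """Roots of g on (0, xmax] via grid sign changes plus bisection."""
    xs = [xmax * i / n for i in range(1, n + 1)]
    out = []
    for lo, hi in zip(xs, xs[1:]):
        if g(lo) * g(hi) < 0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                lo, hi = (lo, mid) if g(lo) * g(mid) <= 0 else (mid, hi)
            out.append(0.5 * (lo + hi))
    return out

if __name__ == "__main__":
    for x1 in roots(reduced_rhs):
        x2 = f2(x1) / D2              # lift back to the full system
        print(x1, x2, f1(x2) - D1 * x1, f2(x1) - D2 * x2)
```

Each root of the reduced system yields a point (x₁, f₂(x₁)/d₂) at which both right-hand sides of the full system vanish, exhibiting the one-to-one correspondence of equilibria claimed by the theorem.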

associated ODEs can be reduced stepwise to a lower-dimensional network until all the remaining nodes in the interaction graph of the reduced network have self-feedback edges. In other words, according to Theorem 6.6, all nodes without any self-feedback loop can be eliminated one by one, as illustrated in Figure 6.10.

Figure 6.10 Schematic diagram of the reduction procedure. The signs accompanying the arrows indicate the types of interactions: (a) the case in which both signs s_12 and s_21 are positive; (b) the case in which both signs are negative (from (Kobayashi et al. 2003))

The reduction procedure is represented by the following operations on the interaction graph. First, we choose any node without self-feedback loops as the target node. In Figures 6.10 (a)–(b), the 2nd node is chosen as the target node. Then, for each node from which an edge leaves towards the target node, we create new edges from that node to all nodes to which an edge enters from the target node. The sign of each new edge is the same as that of the path connecting the same two nodes through the target node in the original graph. In Figure 6.10 (a) and Figure

6.10 (b), there are two edges entering the removed 2nd node, from the 1st and 5th nodes, and three edges leaving the 2nd node, towards the 1st, 6th, and 7th nodes. Thus, new edges are created from the 1st node to the 1st, 6th, and 7th nodes, and from the 5th node to the 1st, 6th, and 7th nodes. Here, the edge that begins and terminates at the 1st node is a positive self-feedback loop. In Figure 6.10 (a), because the sign of the edge from the 1st node to the 2nd node in the original graph is positive, the new edges from the 1st node to the 1st, 6th, and 7th nodes are positive, positive, and negative, respectively. On the other hand, in Figure 6.10 (b), because the sign of the edge from the 1st node to the 2nd node is negative, the new edges from the 1st node to the 1st, 6th, and 7th nodes are positive, negative, and positive, respectively. Continuing this process until all the remaining nodes in the interaction graph of the reduced network have self-feedback edges, we eventually obtain low-dimensional ODEs that are easier to analyze than the original high-dimensional ones. By such a procedure, a biomolecular network with only positive feedback loops can be reduced to a minimal model in terms of the number of nodes; the maximal number of nodes in this minimal model is the number of loops in the original network. It should be noted that the associated ODEs and the nodes in the minimal network can differ depending on the reduction procedure. For example, if we choose the 1st node as the first target node, we obtain a different minimal network with different associated ODEs. However, according to Theorem 6.6, different reduction procedures have no effect on the equilibria and their stabilities, although they can result in different reduced systems with different transient dynamics.
A detailed reduction procedure is shown in Figure 6.11. By applying this procedure, a node of IG(f) without any self-feedback loop can be eliminated, and the edges entering and leaving this node are merged. We finally obtain lower-dimensional ODEs and a corresponding interaction graph with fewer nodes than the original graph. For instance, the four-node network in Figure 6.11 is eventually reduced to a one-node network with two positive self-feedback loops; the network obtained is a minimal network of the four-node network. Theorems 6.4–6.6 are important for the design of switching networks and indicate that a PFN is ideally suited to realizing a switching system. These theoretical results also demonstrate that a PFN, or a switching network satisfying the conditions of Theorems 6.4–6.6, is robust to some uncertainties, e.g., time delays and perturbations, because the stability of the equilibria is not qualitatively affected by delays and there are no attractors other than the stable equilibria that represent the switching states. The above results show how to reduce the dimensionality of a biomolecular network to simplify the analysis and computation of the associated ODEs. However, when we design a switching network, it is convenient to start with a minimal network satisfying all the requirements of PFNs and then to extend it to a biologically plausible network of higher dimension. In

Figure 6.11 Schematic illustration of the reduction procedure. The original network with four components is reduced stepwise to a minimal network with one component and two self-feedback loops. First, the 4th node is removed, and the edges e_43 and e_14 are merged. Then, the 2nd and the 3rd nodes are successively removed. Finally, we obtain the minimal network with only the 1st node and two positive self-feedback loops (from (Kobayashi et al. 2003))

other words, we need to reverse the previous procedure by increasing the dimension of the network. The following theorem shows how to extend a switching network while preserving its equilibria and their stabilities.

Theorem 6.7. Let a transformation from (6.26) to (6.25) be

x_i = f_i(x')/d_i \;\longmapsto\; \dot{x}_i = f_i(x) - d_i x_i. (6.30)

Assume that the networks described by (6.25) and (6.26) have only positive feedback loops except self-feedback loops, and that the orbits of (6.25) and (6.26) have compact closure in their state spaces. Then, (6.25) and (6.26) have the same equilibria with identical stabilities.

The proofs of the theorems above require several conditions to be satisfied; in fact, these conditions can always be satisfied in biological systems (Kobayashi et al. 2003). Based on Theorem 6.7, the procedure to design a switching network by extending a minimal network is as follows:

1. Design a minimal switching network satisfying the requirements for configuration, equilibria, and their stabilities, even if such a network itself may not be plausible from a biological viewpoint.

2. Extend the minimal network by successively adding nodes, one at a time, that satisfy the assumptions required for the minimal network, in order to make the network more plausible and easier to implement experimentally. According to Theorem 6.7, the extended network preserves the static properties of the system in terms of equilibria and their stabilities.

Figure 6.12 Schematic diagram of the extension procedure. The original minimal network with only two nodes is extended by adding nodes to obtain a biologically plausible network. First, the 3rd and the 4th nodes are added; then the 5th and the 6th nodes are added (from (Kobayashi et al. 2003))

The above procedure is illustrated schematically in Figure 6.12 and can be viewed as the reverse of the reduction procedure. Starting with an abstract minimal switching network, we obtain a biologically plausible network by adding nodes and edges to the interaction graph. Note that we do not need to introduce time delays, because the systems with and without them have identical equilibria and stabilities. To demonstrate the above procedure, a genetic switch is designed as follows. First, an abstract minimal switching network with two nodes and three positive feedback loops is constructed, as shown in Figure 6.13 (a). Simple algebraic analysis shows that it can have three or four equilibria. Starting from the minimal network and applying the extension procedure, a realistic network with three nodes can be constructed, as shown in Figure 6.13 (b). Next, three different proteins are selected to represent the three nodes, as shown in Figure 6.13 (c). The extended network is now composed of only three proteins. We extend it further by incorporating the corresponding mRNAs and eventually obtain a biologically plausible network, as shown in Figure 6.13 (d).

(a) (b) (c) (d) Figure 6.13 Synthetic genetic switching network designed by the proposed procedure. (a) An abstract minimal network with two components and three feedback loops. Each node has a positive self-feedback loop, and the interactions among the nodes form a positive feedback loop. (b) An extension of (a); the 3rd node is added in order to replace the positive self-feedback loop of the 1st node in (a) with mutual negative interactions between the 1st and the 3rd nodes. (c) A realization of the extended network (b); the proteins LacI, CI, and TetR are adopted to represent the 1st, 2nd, and 3rd nodes in (b), respectively. The broken line indicates the feedback loop formed by LacI and TetR, which realizes a toggle switch. The bold line indicates a self-feedback loop of CI. (d) A further extension of (c); the extension includes the mRNAs corresponding to the proteins (from (Kobayashi et al. 2003))

The implementation of Figure 6.13 (d) is shown in Figure 6.14, where the genes lacI, tetR, and cI and the promoters P_LtetO-1, P_trc-2, and P_RM are adopted. The genes lacI and tetR with the promoters P_LtetO-1 and P_trc-2 are artificially engineered and are used to construct a two-state toggle switch (Gardner et al. 2000). On the other hand, the wild-type P_RM promoter has three binding sites, i.e., OR_1, OR_2, and OR_3. In the model, the binding site OR_3 of the P_RM promoter is assumed to be artificially altered or mutated so that CI proteins cannot bind to it, as shown in Figure 6.15. With such a mutated P_RM, the

transcription rate is monotonic in the CI concentration; thus, the conditions for monotone dynamical systems are satisfied.

Figure 6.14 A model for the implementation of the switching network with two nodes and three feedback loops (Figure 6.13), which includes the genes lacI, tetR, and cI and the promoters P_LtetO-1, P_trc-2, and P_RM; the mRNAs of lacI, tetR, and cI are omitted for simplicity. The signs indicate the types of interactions among the proteins LacI, TetR, and CI. (tetR-1, tetR-2) and (cI-1, cI-2) are identical to the tetR and cI genes but have different promoters and ribosome binding sites (from (Kobayashi et al. 2003))

The detailed eight-dimensional functional differential equations for the real network shown in Figure 6.14 can be found in (Kobayashi et al. 2003). By applying the reduction procedure, they can be reduced to two-dimensional ODEs while preserving the equilibria and their stabilities. Numerical simulations show that the switching network can have three or four stable equilibria, depending on the parameter values. Figure 6.16 shows a case in which the network has four stable equilibria, namely, (OFF, OFF), (ON, OFF), (OFF, ON), and (ON, ON); this represents a four-state switch. Note that operons are used to wire the individual genes into a network, as shown in Figure 6.14. An operon is made up of several structural genes arranged under a common promoter and regulated by a common operator. Operons exist primarily in prokaryotes, although they also exist in some eukaryotes, including nematodes. Therefore, when tuning the parameters, we need to consider the inefficiency of polycistronic transcription for the second gene downstream of the promoter (which may be as low as 1/100 of that of the first gene). Clearly, the two-state toggle switch is also embedded in this

Figure 6.15 The mutated P_RM promoter and its binding sites, with binding priorities OR_1 (0) > OR_2 (+). The binding site OR_3 of the P_RM promoter is mutated, and hence CI proteins cannot bind to it

four-state switch. In fact, we can easily show that the toggle switch, as well as the other switches described in this chapter, satisfies the conditions of Theorems 6.4–6.7 and is therefore robust to uncertain delays and perturbations. See (Kobayashi et al. 2003) for more details on the proofs of the theorems, the differential equations, and the parameter values.

6.4 Detection of Multistability

Biomolecular networks with only positive feedback loops have no dynamical attractors such as oscillations, which makes them suitable as model switching networks. However, detecting the multistability of such networks is not a trivial problem because of their nonlinearity. Recently, a simple graphical method was developed (Angeli et al. 2004, Angeli and Sontag 2004a). For networks with an arbitrary number of nodes and only positive feedback loops, the stability properties can be deduced mathematically by an open-loop approach. When the open-loop network is monotone and possesses a sigmoidal input/output characteristic, the network is guaranteed to be bistable for some range of feedback strengths. Before introducing the general theoretical framework, we first present a simple network with two proteins, the Cdc2-cyclin B complex and Wee1, and a mutually inhibitory feedback loop, as shown in Figure 6.17 (a), to show how to analyze its dynamical behavior. The two mutually inhibitory proteins form a positive feedback loop. The equations for this model are as follows (Angeli et al. 2004):

Figure 6.16 The switching network has four stable equilibria. The broken and solid lines are the nullclines of the reduced two-dimensional ODEs (not shown) (from (Kobayashi et al. 2003))

\dot{x}_1 = \alpha_1(1 - x_1) - \frac{\beta_1 x_1 (\nu y_1)^{\gamma_1}}{K_1 + (\nu y_1)^{\gamma_1}}, (6.31)
\dot{y}_1 = \alpha_2(1 - y_1) - \frac{\beta_2 y_1 x_1^{\gamma_2}}{K_2 + x_1^{\gamma_2}}, (6.32)

where α_{1,2} and β_{1,2} are rate constants, K_{1,2} are the MM constants, γ_{1,2} are the Hill coefficients, and ν is a coefficient that indicates the strength of the influence of Wee1 on Cdc2-cyclin B. x_1 and y_1 represent the concentrations of Cdc2-cyclin B and Wee1, respectively. Clearly, the system (6.31)–(6.32) is a monotone dynamical system and also a PFN satisfying the conditions of Theorems 6.4–6.6; therefore, there are no dynamical attractors except stable equilibria. When appropriate parameter values are chosen, the system exhibits two stable equilibria, as shown by the two small circles in Figure 6.17 (b). The approach is based on considering (6.31)–(6.32) to be the feedback closure of the open-loop system

\dot{x}_1 = \alpha_1(1 - x_1) - \frac{\beta_1 x_1 \omega^{\gamma_1}}{K_1 + \omega^{\gamma_1}}, (6.33)
\dot{y}_1 = \alpha_2(1 - y_1) - \frac{\beta_2 y_1 x_1^{\gamma_2}}{K_2 + x_1^{\gamma_2}}, (6.34)
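The bistability of (6.31)–(6.32) can be reproduced by direct integration, using the parameter values quoted in Figure 6.17 and feedback strength ν = 1, which lies inside the bistable window reported in Figure 6.18:

```python
# Euler simulation of the Cdc2-cyclin B / Wee1 model, Eqs. (6.31)-(6.32),
# with the parameter values quoted in the Figure 6.17 caption and nu = 1.
# Two initial conditions settle onto the two stable equilibria.

A1 = A2 = 1.0
B1, B2 = 200.0, 10.0
G1 = G2 = 4.0
K1, K2 = 4.0, 1.0

def simulate(x, y, nu=1.0, dt=0.005, T=100.0):
    """Integrate (6.31)-(6.32) from (x1, y1) = (x, y); return final state."""
    for _ in range(int(T / dt)):
        dx = A1 * (1 - x) - B1 * x * (nu * y)**G1 / (K1 + (nu * y)**G1)
        dy = A2 * (1 - y) - B2 * y * x**G2 / (K2 + x**G2)
        x, y = x + dt * dx, y + dt * dy
    return x, y

if __name__ == "__main__":
    print(simulate(0.99, 0.01))   # settles at high Cdc2, low Wee1
    print(simulate(0.01, 0.99))   # settles at low Cdc2, high Wee1
```

The two final states are the two small circles of Figure 6.17 (b): one with active Cdc2-cyclin B and inactive Wee1, and one with the roles reversed.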

Figure 6.17 Network with two nodes and a mutually inhibitory feedback loop: (a) schematic description of the network; (b) phase plane with bistability. Parameter values are α_1 = α_2 = 1, β_1 = 200, β_2 = 10, γ_1 = γ_2 = 4, K_1 = 4, and K_2 = 1 (from (Angeli et al. 2004))

where ω is an input. Let η = y_1 be the output of (6.33)-(6.34) with respect to the input ω. By breaking the feedback loop at the step of the inhibition of Cdc2 (x_1) by Wee1 (y_1) and by considering the effect of Wee1 (y_1) on Cdc2 (x_1) as an input signal ω (see Figures 6.18 (a)-(c)), the system behavior can easily be analyzed. For the open-loop system, the input ω can be considered to be a parameter rather than a state variable, as in the full original system. Therefore, the behavior of the output as a function of the input can be obtained. Subsequently, by letting η = ω/ν, the original system can be recovered and its behavior can also be determined. Simple algebraic analysis shows that the open-loop system (6.33)-(6.34) has a monostable steady state for any constant input ω, and thus the system has a well-defined steady-state input/output characteristic. In fact, for any input ω, the steady-state input/output characteristic of (6.33)-(6.34), i.e., y_1 = η = k_η(ω), can easily be obtained as follows:

k_\eta(\omega) = \eta = \frac{\alpha_2 \left( K_2 + \left( \alpha_1 (K_1 + \omega^{\gamma_1}) / (\alpha_1 K_1 + \alpha_1 \omega^{\gamma_1} + \beta_1 \omega^{\gamma_1}) \right)^{\gamma_2} \right)}{\alpha_2 K_2 + \alpha_\beta \left( \alpha_1 (K_1 + \omega^{\gamma_1}) / (\alpha_1 K_1 + \alpha_1 \omega^{\gamma_1} + \beta_1 \omega^{\gamma_1}) \right)^{\gamma_2}},  (6.35)

where α_β = α_2 + β_2. This function has a single value for every ω, i.e., it is a one-to-one mapping, as shown in Figure 6.18 (d); therefore, the open-loop system has a well-defined steady-state input/output characteristic. The equilibria can be detected by plotting together the characteristic k_η, which represents the steady-state output η as a function of the constant input ω, and the diagonal η = ω. Algebraically, this amounts
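The bistability of the closed-loop system can be checked by direct simulation. The sketch below (not from the book) integrates (6.31)-(6.32) with the parameter values quoted in Figure 6.17; taking ν = 1 is an assumption for illustration, and forward Euler with a small step is used for simplicity.

```python
# Minimal numerical sketch of the Cdc2-cyclin B/Wee1 model (6.31)-(6.32).
# Parameter values follow Figure 6.17; nu = 1 is an illustrative assumption.

def simulate(x0, y0, nu=1.0, dt=1e-3, t_end=100.0):
    a1 = a2 = 1.0
    b1, b2 = 200.0, 10.0
    g1 = g2 = 4
    K1, K2 = 4.0, 1.0
    x, y = x0, y0
    for _ in range(int(t_end / dt)):
        u = (nu * y) ** g1
        dx = a1 * (1 - x) - b1 * x * u / (K1 + u)
        dy = a2 * (1 - y) - b2 * y * x ** g2 / (K2 + x ** g2)
        x, y = x + dt * dx, y + dt * dy
    return x, y

# Two initial conditions converge to two distinct stable equilibria:
hi_cdc2 = simulate(1.0, 0.0)   # starts with active Cdc2 and no Wee1
hi_wee1 = simulate(0.0, 1.0)   # starts with active Wee1 and no Cdc2
```

Depending on the initial condition, the trajectory settles either at the high-Cdc2/low-Wee1 state or at the low-Cdc2/high-Wee1 state, the two small circles of Figure 6.17 (b).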

Figure 6.18 Mathematical analysis of the Cdc2-cyclin B/Wee1 system carried out by breaking the feedback loop. Schematic change of a feedback system before (a) and after (b) breaking the feedback loop, where ω is the input of the open-loop system and η is the output. (c) The incidence graph. (d) The steady-state input/output characteristic curve k_η as a function of ω with the same parameter values as those in Figure 6.17. The solid line represents η as a function of ω for unitary feedback, i.e., η = ω. There are three intersection points (I, II, and III), which represent two stable equilibria (I and III) and an unstable one (II). The solid line and the dashed lines are η = ω/ν for different values of ν. The dashed lines represent the two tangent lines of the characteristic curve at ν ≈ 0.83 and ν ≈ 1.8, which are the two bifurcation values. When ν is between the two bifurcation values, the system shows bistability. (e) Bifurcation diagram showing bistability when the feedback strength ν is between 0.83 and 1.8 (from (Angeli et al. 2004))

to searching for the fixed points of the mapping k_η. In other words, the steady-state input/output characteristic represents the equilibrium curve of the open system as a function of ω. By letting the output be the input, i.e., η = ω, we close the open system and go back to the original one. The intersection points of the two curves, i.e., k_η(ω) and η = ω, are exactly the equilibria of the closed system. We find that there are three intersection points between these graphs, which we refer to as points I, II, and III, as shown in Figure 6.18 (d).
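The graphical test can also be carried out numerically. The sketch below (not from the book) evaluates the characteristic (6.35) with the Figure 6.17 parameter values, locates the intersections with the diagonal η = ω by a sign-change scan, and estimates the slope of k_η at each one; the grid resolution is an assumption.

```python
# Sketch of the graphical test: evaluate k_eta(omega) from (6.35) with the
# Figure 6.17 parameter values and intersect it with the diagonal eta = omega
# (unitary feedback, nu = 1).
a1 = a2 = 1.0
b1, b2 = 200.0, 10.0
g1 = g2 = 4
K1, K2 = 4.0, 1.0
ab = a2 + b2                       # alpha_beta in (6.35)

def k_eta(w):
    # inner expression: steady-state x1 of (6.33) for constant input w
    x = a1 * (K1 + w ** g1) / (a1 * K1 + a1 * w ** g1 + b1 * w ** g1)
    return a2 * (K2 + x ** g2) / (a2 * K2 + ab * x ** g2)

# scan for sign changes of k_eta(omega) - omega on a fine grid
N = 12000
grid = [1.2 * i / N for i in range(N + 1)]
f = [k_eta(w) - w for w in grid]
crossings = [0.5 * (grid[i] + grid[i + 1])
             for i in range(N) if f[i] > 0 >= f[i + 1] or f[i] < 0 <= f[i + 1]]

# the slope of k_eta at each crossing decides stability:
# slope < 1 -> stable, slope > 1 -> unstable
h = 1e-5
slopes = [(k_eta(w + h) - k_eta(w - h)) / (2 * h) for w in crossings]
```

The scan finds the three intersection points I, II, and III; the slope is below unity at the outer two and above unity at the middle one, matching the stability assignment discussed next.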

The stability can be determined from the slope of the curve k_η at each equilibrium according to Theorem 4.6. When the slope is greater than unity, the equilibrium is unstable. On the contrary, when the slope is less than unity, the equilibrium is stable. We see from the picture that this slope is less than 1 at I and III and greater than 1 at II. Therefore, the system (6.31)-(6.32) has two stable equilibria, I and III, and an unstable equilibrium, II. Now, we describe the general theoretical framework, which holds for a general class of positive feedback biomolecular networks. The approach can be directly applied to detect multistability and bifurcations even for large-scale networks. The results in (Kobayashi et al. 2003) indicate that time delays have no effect on the stability or the number of equilibria in PFNs. Therefore, we only consider the following ODEs without delays,

\dot{x} = f(x, \omega),  (6.36)

which describe the evolution of a set of variables x(t) = (x_1(t), ..., x_n(t)) with an external input ω; ω is generally a scalar, although it is possible to extend this to vector inputs (Angeli and Sontag 2003, Angeli and Sontag 2004a). The output η = h(x) is a function of the state variables. The functions f and h are assumed to be differentiable in all of their arguments; by the implicit function theorem, at an equilibrium, i.e., f(x̄(ω), ω) = 0, x̄(ω) can be derived as a function of ω provided the Jacobian f_x is nonsingular at the equilibrium. To describe the methodology, we introduce an incidence graph, which is similar to the interaction graph except for the presence of two extra input and output nodes. An incidence graph has n + 2 nodes, labeled ω, η, and x_i, i = 1, ..., n. When the input and output nodes are considered equivalent to the others, it becomes an interaction graph. The definition of the sign corresponding to a path in an incidence graph is analogous to that in an interaction graph.
For example, consider the following system:

\dot{x}_1 = -x_1 + \omega,  (6.37)
\dot{x}_2 = x_1 - x_2 + x_3 \pm \omega,  (6.38)
\dot{x}_3 = x_1 + x_2 - x_3.  (6.39)

Its output is η = x_3 x_1. Its incidence graph is shown in Figure 6.19 (a). Two critical necessary conditions must be satisfied before adopting the framework: positive monotonicity and a well-defined steady-state input/output characteristic. Here, monotonicity implies that there are no possible negative feedback loops in the incidence graph or the interaction graph, even when the system is closed under positive feedback. On the other hand, a well-defined input/output characteristic implies that the open-loop system has a monostable steady-state response to constant inputs. The property of monotonicity amounts to checking that the following conditions are satisfied.

1. Every loop in the incidence graph, either directed or not, is positive.

Figure 6.19 Example of an incidence graph: (a) the incidence graph of (6.37)-(6.39); (b) a cascade of subsystems (from (Angeli et al. 2004))

2. All paths from the input node to the output node are positive.
3. There is a directed path from the input node to each node.
4. There is a directed path from each node to the output node.

Note that conditions 1 and 2 are equivalent to the requirement that every possible loop is positive, even after closing under any positive feedback (because a positive path under any positive feedback forms a positive loop). Conditions 3 and 4 are technical conditions needed for irreducibility. When an incidence graph is irreducible, it cannot be divided into two or more subnetworks. In fact, if an interaction graph can be divided into several irreducible subnetworks that satisfy conditions 1 and 2, we can examine each one individually by applying the approach to each irreducible subnetwork. A well-defined steady-state input/output characteristic implies that for each constant input ω(t) ≡ ω̄, there exists a globally and asymptotically stable equilibrium x̄ for (6.36). We say that (6.36) has a static input-state characteristic

k_\eta(\cdot) : W \to X  (6.40)

if for each constant input ω(t) ≡ ω̄ ∈ W there exists a globally and asymptotically stable equilibrium x̄(ω̄) = k_η(ω̄) with f(x̄(ω̄), ω̄) = 0, where W and X are the input space and the state space, respectively. We also define the static input/output characteristic as η̄ = k_y(ω̄) := h(k_η(ω̄)) at the equilibria, provided that the input-state characteristic exists. Then, with η̄ = h(k_η(ω̄)) and η = ω, the analysis of the equilibria and their stability becomes relatively simple. Note that x̄ = k_η(ω̄) is derived from f(x̄(ω̄), ω̄) = 0 at the equilibrium.
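The four monotonicity conditions can be checked mechanically on a sign-labeled incidence graph. The helper below is a hypothetical sketch (not from the book): condition 1, that every loop, directed or not, is positive, is the classical balance property of a signed graph and is testable with a parity union-find; conditions 3 and 4 are directed reachability checks; and condition 2 follows from the parity labels once the graph is balanced, because in a balanced connected graph every path between two fixed nodes has the same sign. The node names `"w"` and `"h"` for the input and output are assumptions.

```python
# Hypothetical helper: check the four monotonicity conditions on a
# sign-labeled incidence graph given as (source, target, sign) edges.

def is_monotone_incidence(nodes, edges, inp="w", out="h"):
    parent = {v: v for v in nodes}
    parity = {v: 0 for v in nodes}   # parity (sign) of the path to the root

    def find(v):
        if parent[v] == v:
            return v, 0
        root, p = find(parent[v])
        parent[v], parity[v] = root, (parity[v] + p) % 2
        return parent[v], parity[v]

    # condition 1: balanced signed graph (a negative edge has parity 1)
    for u, v, sign in edges:
        p_edge = 0 if sign > 0 else 1
        ru, pu = find(u)
        rv, pv = find(v)
        if ru == rv:
            if (pu + pv) % 2 != p_edge:
                return False          # a negative (odd) loop exists
        else:
            parent[ru] = rv
            parity[ru] = (pu + pv + p_edge) % 2

    # conditions 3 and 4: reachability from the input / to the output
    succ = {v: [] for v in nodes}
    pred = {v: [] for v in nodes}
    for u, v, _ in edges:
        succ[u].append(v)
        pred[v].append(u)

    def reach(start, adj):
        seen, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    if reach(inp, succ) != set(nodes) or reach(out, pred) != set(nodes):
        return False

    # condition 2: equal parity of input and output means every
    # input-to-output path is positive
    return find(inp)[1] == find(out)[1]
```

For the Cdc2/Wee1 incidence graph (ω inhibits x_1, x_1 inhibits y_1, y_1 is the output), the two negative arcs compose to a positive input-output path, so the check succeeds.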

A very useful fact for the verification of these two critical conditions is that they always hold for cascades of systems. Cascades are systems composed of subsystems in which the output of each subsystem is an input to the next subsystem, as shown in Figure 6.19 (b). Now, we apply the method to a higher-dimensional system, i.e., a three-tier MAPK cascade based on the Mos/MEK1/p42 MAPK cascade present in Xenopus oocytes, as shown in Figure 6.20 (a). The system can be described by the differential equations

\dot{x} = -\frac{V_2 x}{K_2 + x} + V_0 z_3 + V_1,  (6.41)
\dot{y}_1 = \frac{V_6 (1200 - y_1 - y_3)}{K_6 + (1200 - y_1 - y_3)} - \frac{V_3 x y_1}{K_3 + y_1},  (6.42)
\dot{y}_3 = \frac{V_4 x (1200 - y_1 - y_3)}{K_4 + (1200 - y_1 - y_3)} - \frac{V_5 y_3}{K_5 + y_3},  (6.43)
\dot{z}_1 = \frac{V_{10} (300 - z_1 - z_3)}{K_{10} + (300 - z_1 - z_3)} - \frac{V_7 y_3 z_1}{K_7 + z_1},  (6.44)
\dot{z}_3 = \frac{V_8 y_3 (300 - z_1 - z_3)}{K_8 + (300 - z_1 - z_3)} - \frac{V_9 z_3}{K_9 + z_3}.  (6.45)

We show how the approach can be applied to detect multistability. The first step is to view the system as a cascade of three modular subsystems: the one-dimensional x (Mos) subsystem, whose input is ω and output is η = x; the two-dimensional y_1-y_3 (MEK) subsystem, whose input is ω = x and output is η = y_3; and the two-dimensional z_1-z_3 (MAPK) subsystem, whose input is ω = y_3 and output is η = z_3, as shown in Figure 6.20 (c). It is straightforward to see that the Mos subsystem has a well-defined I/O characteristic, and the same can be proven for the MEK and MAPK subsystems. Therefore, the entire cascade has a well-defined I/O characteristic. Next, monotonicity needs to be verified. This is trivial for all the subsystems, and because each subsystem in the cascade is monotone, the entire cascade is guaranteed to be monotone. Thus, the framework can be applied to the system described by (6.41)-(6.45). The closed-loop system is obtained by setting ω = z_3 in the one-dimensional x (Mos) equation; the open-loop system is obtained by taking the input to be ω = νz_3.
The global stability behavior of the entire five-dimensional system can be deduced from a plot of the characteristic z_3 = k_η(ω) and the line ω = νz_3. Under unitary feedback, ν = 1, the system has three equilibria, as shown in Figures 6.20 (d) and (f). The theoretical framework shows that the middle equilibrium is unstable and that the high and low equilibria are stable. Moreover, every trajectory, except for the unstable equilibrium itself and a zero-measure separatrix corresponding to the stable manifold of the unstable equilibrium, necessarily converges to one of the two stable equilibria. The experimental demonstration of a sigmoidal response of MAPK to Mos based on actual data is shown in Figure 6.20 (e).
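The key structural fact exploited here, that the steady-state characteristic of a cascade is the composition of the subsystem characteristics, can be illustrated numerically. The sketch below uses hypothetical one-dimensional tiers with made-up parameters (not the MAPK rate constants of (6.41)-(6.45), which are given only in the source paper's figure): each tier obeys u' = a w (1 - u) - b u with constant input w, whose characteristic is u* = a w / (a w + b).

```python
# Illustrative sketch with assumed tiers: the steady-state I/O characteristic
# of a cascade equals the composition of the tier characteristics.

def tier_steady_state(w, a, b, dt=1e-3, t_end=60.0):
    """Relax u' = a*w*(1 - u) - b*u to steady state for constant input w."""
    u = 0.0
    for _ in range(int(t_end / dt)):
        u += dt * (a * w * (1 - u) - b * u)
    return u

def tier_characteristic(w, a, b):
    """Analytic steady state of the same tier."""
    return a * w / (a * w + b)

tiers = [(2.0, 1.0), (3.0, 0.5), (1.5, 0.4)]   # (a, b) per tier, assumed

def cascade_numeric(w):
    for a, b in tiers:
        w = tier_steady_state(w, a, b)          # feed each output forward
    return w

def cascade_analytic(w):
    for a, b in tiers:
        w = tier_characteristic(w, a, b)        # compose the characteristics
    return w
```

Because each tier characteristic is increasing, the composed cascade characteristic is increasing as well, which is exactly the monotonicity property the framework requires.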

Figure 6.20 Bistability in a MAPK cascade. Schematic depiction of the Mos-MEK-p42 MAPK cascade, which is a linear cascade of protein kinases embedded in a positive feedback loop (a), together with the corresponding open-loop system (b). (c) Incidence graphs of the three subsystems. (d) The steady-state I/O characteristic (k_η as a function of ω) for the MAPK cascade, plotted together with the diagonal that represents η as a function of ω with unity feedback. (e) Experimental demonstration of a sigmoidal response of MAPK to Mos. (f) The bifurcation diagram showing the stable on-state (the upper curve), the stable off-state (the lower curve), and the unstable threshold (the middle curve) as a function of feedback strength ν (from (Angeli et al. 2004))

6.5 Enzyme-driven Switching Networks

Although PFNs are general network structures, they are mainly used to design and model gene regulatory networks, as indicated in the preceding sections. In contrast, in this section, we discuss switching networks at the level of proteins and metabolites. In addition to using PFNs to construct switching networks, other techniques have also been developed. In this section, we provide a rigorous conceptual basis for understanding the relationship between the structures of mass-action biochemical reaction networks and their capacity to exhibit bistability (Craciun et al. 2006). Before introducing this relationship, we first consider a motivating example. The conversion of two substrates S_1 and S_2 into the product P is catalyzed by an enzyme E with three intermediate complexes ES_1, ES_2, and ES_1S_2. The kinetic mechanism can be represented as

E + S_1 \underset{k_2}{\overset{k_1}{\rightleftharpoons}} ES_1,  (6.46)
E + S_2 \underset{k_4}{\overset{k_3}{\rightleftharpoons}} ES_2,  (6.47)
S_2 + ES_1 \underset{k_6}{\overset{k_5}{\rightleftharpoons}} ES_1S_2 \underset{k_8}{\overset{k_7}{\rightleftharpoons}} S_1 + ES_2,  (6.48)
ES_1S_2 \overset{k_9}{\longrightarrow} E + P,  (6.49)

where k_i (i = 1, ..., 9) denote the rate constants. The corresponding system of coupled differential equations according to the mass-action law is

\dot{c}_E = -k_1 c_E c_{S_1} + k_2 c_{ES_1} - k_3 c_E c_{S_2} + k_4 c_{ES_2} + k_9 c_{ES_1S_2},  (6.50)
\dot{c}_{S_1} = -k_1 c_E c_{S_1} + k_2 c_{ES_1} - k_8 c_{S_1} c_{ES_2} + k_7 c_{ES_1S_2} + F_{S_1} - d_{S_1} c_{S_1},  (6.51)
\dot{c}_{S_2} = -k_3 c_E c_{S_2} + k_4 c_{ES_2} - k_5 c_{S_2} c_{ES_1} + k_6 c_{ES_1S_2} + F_{S_2} - d_{S_2} c_{S_2},  (6.52)
\dot{c}_{ES_1} = k_1 c_E c_{S_1} - k_2 c_{ES_1} - k_5 c_{ES_1} c_{S_2} + k_6 c_{ES_1S_2},  (6.53)
\dot{c}_{ES_2} = k_3 c_E c_{S_2} - k_4 c_{ES_2} - k_8 c_{ES_2} c_{S_1} + k_7 c_{ES_1S_2},  (6.54)
\dot{c}_{ES_1S_2} = k_5 c_{S_2} c_{ES_1} + k_8 c_{S_1} c_{ES_2} - (k_6 + k_7 + k_9) c_{ES_1S_2},  (6.55)
\dot{c}_P = k_9 c_{ES_1S_2} - d_P c_P,  (6.56)

where d_{S_1}, d_{S_2}, and d_P are the degradation rates, F_{S_1} and F_{S_2} are the supply rates of the substrates, and c_i represents the concentration of chemical species i. There are appropriate combinations of parameter values, i.e., appropriate rate constants, mass transfer coefficients, and substrate supply rates, for which the system (6.50)-(6.56) shows bistability. There are two stable equilibria, one characterized by a higher productivity of P and the other by a substantially lower one. The trajectories converge to one of the two equilibria depending on the initial conditions. Switching between the two equilibria would result, for example, from a signal in the form of a temporary disturbance in a substrate supply rate (Craciun et al. 2006), as shown in Figure 6.21. In fact, we can easily show that the network described by (6.46)-(6.49) or (6.50)-(6.56) is also a PFN that satisfies the conditions of the theorems given earlier in this chapter. This example shows that the capacity for bistability is already present in the biochemical reactions. Some networks can give rise to bistability, while others cannot show bistability for any set of parameter values.
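As a quick structural check of the mass-action equations (6.50)-(6.56), the sketch below (not from the book) encodes the right-hand side and verifies a property that holds for any rate constants: the total enzyme c_E + c_ES1 + c_ES2 + c_ES1S2 is conserved, since the enzyme is neither supplied nor degraded. The rate constants used here are arbitrary placeholders, not the published values.

```python
# Sketch of the mass-action right-hand side (6.50)-(6.56), with placeholder
# rate constants, used to check that total enzyme is conserved.

def rhs(c, k, F_S1=2500.0, F_S2=1500.0, d_S1=1.0, d_S2=1.0, d_P=1.0):
    E, S1, S2, ES1, ES2, ES1S2, P = c
    return [
        -k[1]*E*S1 + k[2]*ES1 - k[3]*E*S2 + k[4]*ES2 + k[9]*ES1S2,          # E
        -k[1]*E*S1 + k[2]*ES1 - k[8]*S1*ES2 + k[7]*ES1S2 + F_S1 - d_S1*S1,  # S1
        -k[3]*E*S2 + k[4]*ES2 - k[5]*S2*ES1 + k[6]*ES1S2 + F_S2 - d_S2*S2,  # S2
        k[1]*E*S1 - k[2]*ES1 - k[5]*ES1*S2 + k[6]*ES1S2,                    # ES1
        k[3]*E*S2 - k[4]*ES2 - k[8]*ES2*S1 + k[7]*ES1S2,                    # ES2
        k[5]*S2*ES1 + k[8]*S1*ES2 - (k[6] + k[7] + k[9])*ES1S2,             # ES1S2
        k[9]*ES1S2 - d_P*P,                                                 # P
    ]

k = {i: float(i) for i in range(1, 10)}     # placeholder rate constants
c = [0.3, 2.0, 1.5, 0.2, 0.1, 0.05, 0.4]    # arbitrary positive state
dc = rhs(c, k)
enzyme_balance = dc[0] + dc[3] + dc[4] + dc[5]   # d(E + ES1 + ES2 + ES1S2)/dt
```

The balance is exactly zero term by term, which is a useful sanity check when transcribing mass-action models of this size.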
The capacity for bistability refers to the existence of combinations of parameter values for which the governing equations admit at least two distinct stable equilibria. On the basis of the species-reaction (SR) graph, a general technique for determining the relationship between reaction network structures and the capacity for bistability was developed (Craciun et al. 2006). Before describing the technique, some terminology needs to be introduced. The first term is the SR graph, which is constructed as follows: symbols in circles represent species, and symbols in boxes represent reactions. Reversible reaction pairs and irreversible reactions are included in the same box (see Figure 6.22). If a species appears within a reaction, then an arc is drawn from

Figure 6.21 Simulations of (6.50)-(6.56) show that trajectories converge to one of the two stable equilibria. The parameter values are k_1 = 93.43, k_2 = 2539, k_3 = 481.6, k_4 = 1183, k_5 = 1556, k_6 = , k_7 = , k_8 = 1689, k_9 = 85842, d_{S_1} = d_{S_2} = d_P = 1, F_{S_1} = 2500, and F_{S_2} = 1500 (from (Craciun et al. 2006))

the species symbol to the reaction symbol, and the arc is labeled with the name of the complex in which the species appears. For example, assume that species A appears within the reaction A + B ⇌ F. Then an arc is drawn from A to the reaction A + B ⇌ F and labeled with the complex A + B. The SR graph is completed once all the species nodes are connected to the reaction nodes in the manner described above. If a species appears in both complexes of a reaction, as in A + B ⇌ 2A, then two arcs are drawn between A and the reaction, each labeled by a different complex, i.e., A + B and A + A. An example of an SR graph and its corresponding reactions is depicted in Figure 6.22. The second term is the complex pair. A complex pair in an SR graph refers to a pair of arcs that are adjacent to the same reaction symbol and carry the same complex label. For example, the two arcs labeled C + E in Figure 6.22 constitute a complex pair because they are adjacent to the same reaction symbol and carry the same complex label, C + E. There are a total of four complex pairs in Figure 6.22, i.e., C + E, C + G, C + D, and A + B. Note that each complex pair consists of two arcs. Next, we introduce the types of cycles, which are qualitative characteristics of a reaction network. A cycle is defined similarly to a feedback loop in the interaction graph and the incidence graph, except that an SR graph has no directions; it is a closed path in which no arc or vertex is traversed twice. For example, there are

Figure 6.22 An example of an SR graph and its corresponding reactions; cycles 1 and 2 split the C + D complex pair (redrawn from (Craciun et al. 2006))

three cycles in Figure 6.22: the cycles labeled cycle 1 and cycle 2, and a third unlabeled cycle A-B-D-C-A (i.e., cycle 3). Three kinds of cycles need to be distinguished: namely, odd-cycles, even-cycles, and 1-cycles. These classifications are not mutually exclusive; a cycle can, for example, be both an odd-cycle and a 1-cycle. An odd- (or even-) cycle is a cycle containing an odd (or even) number of complex pairs. Therefore, all three cycles in Figure 6.22 are odd-cycles. In particular, there is only one complex pair, {A + B, A + B}, for cycle 1; only one complex pair, {C + D, C + D}, for cycle 2; and only one complex pair, {A + B, A + B}, for cycle 3. A 1-cycle in an SR graph is a cycle such that the stoichiometric coefficient associated with each of its arcs is one. Clearly, all three cycles in Figure 6.22 are 1-cycles. Finally, we say that two cycles split a complex pair if both arcs of the complex pair are among the arcs of the two cycles and one of the arcs is contained in just one of the cycles. In Figure 6.22, cycles 1 and 2 split the C + D complex pair: both arcs are among the arcs of the two cycles, but cycle 1 contains only one of the arcs labeled C + D. The large outer cycle (i.e., cycle 3) and cycle 1 also split the complex pair C + D, as do the large outer cycle and cycle 2. A technique based on a graph-theoretical method was developed in order to discriminate complex reaction networks that can admit multiple equilibria from those that cannot (Craciun et al. 2006). It has been shown that if
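The cycle classifications above are purely combinatorial, so they are easy to automate. The sketch below is a hypothetical helper (not from the book): each arc of a cycle is represented as a `(reaction_id, complex_label, stoichiometric_coefficient)` triple, and a complex pair is two arcs that meet at the same reaction node with the same complex label. The arc list for `cycle1` is an assumed rendition of cycle 1 of Figure 6.22, with made-up reaction identifiers.

```python
# Hypothetical helper: classify a cycle in an SR graph as odd/even/1-cycle.
from collections import Counter

def count_complex_pairs(cycle):
    # arcs sharing a reaction node and a complex label pair up
    tally = Counter((rxn, label) for rxn, label, _ in cycle)
    return sum(n // 2 for n in tally.values())

def is_odd_cycle(cycle):
    return count_complex_pairs(cycle) % 2 == 1

def is_one_cycle(cycle):
    return all(coef == 1 for _, _, coef in cycle)

# assumed arc list sketching cycle 1 of Figure 6.22: the two arcs labeled
# "A+B" meet at the same reaction box and form one complex pair
cycle1 = [("R1", "A+B", 1), ("R1", "A+B", 1),
          ("R2", "C+D", 1), ("R3", "C+G", 1)]
```

With this representation, `cycle1` has exactly one complex pair, so it is an odd-cycle, and all coefficients are one, so it is also a 1-cycle, as the text states.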

Table 6.1 Some networks and their capacity for bistability (from (Craciun et al. 2006)); the columns list the entry number, the network, remarks, and whether the network has the capacity for bistability

the SR graph satisfies certain conditions, the differential equations corresponding to the network cannot admit multiple equilibria, no matter what values the parameters take. Because these conditions are not stringent, they amount to powerful necessary conditions that a network must satisfy if it is to have the capacity to produce multiple equilibria.

Theorem 6.8. Consider a reaction network whose SR graph has both of the following properties.

1. Each cycle in the graph is a 1-cycle, an odd-cycle, or both.
2. No complex pair is split by two even-cycles.

For such a reaction network, the corresponding differential equations cannot admit more than one positive equilibrium, no matter what values the parameters take.

Figure 6.23 The SR graph for entry 1 of Table 6.1

Although the theorem does not provide sufficient conditions for the capacity for multistability, it provides a necessary condition for bistability. In particular, when every stoichiometric coefficient is one, which is very common in biochemical systems, two even-cycles must split a complex pair in order to generate multiple equilibria. According to Theorem 6.8, the networks shown in the first three entries of Table 6.1 cannot admit more than one positive equilibrium, no matter what values are assigned to their parameters. Clearly, there is no bistability for the system of Figure 6.22 according to Theorem 6.8, because there are only three odd-cycles. For entry 1 of Table 6.1, the SR graph is shown in Figure 6.23; there is only one 1-cycle (containing no complex pair), and therefore the network satisfies the conditions of Theorem 6.8. See (Craciun et al. 2006) for a more detailed analysis of the other networks shown in Table 6.1. Take dihydrofolate reductase (DHFR), a crucial enzyme along the pathway for thymine synthesis, as an example. DHFR promotes the overall

Figure 6.24 Reactions and rate constants for DHFR catalysis: E, DHFR; H2F, dihydrofolate; H4F, tetrahydrofolate; NH, NADPH; N, NADP+; and EX, X bound to DHFR (from (Craciun et al. 2006))

reaction, as shown in Figure 6.24. Its SR graph and its bistability are shown in Figure 6.25 and Figure 6.26, respectively. In addition to the techniques mentioned above, i.e., the interaction graph and SR graph techniques, there are other techniques that can be used to determine whether a given network has the capacity to exhibit multiple equilibria, e.g., the Thomas conjecture (Thomas 1981), the Kaufman multistability conditions (Kaufman et al. 2007), the injective polynomial function approach (Craciun and Feinberg 2005, Craciun and Feinberg 2006), and the chemical reaction network theory implemented in the Chemical Reaction Network Toolbox (Siegal-Gaskins et al. 2009). Interested readers may refer to (Thomas 1981, Kaufman et al. 2007, Craciun and Feinberg 2005, Craciun and Feinberg 2006) for more details on the theories and examples. All the above approaches have been compared and applied to one-component and two-component subnetworks (Siegal-Gaskins et al. 2009). It was demonstrated that different methods have their own merits and drawbacks, which suggests that a given technique may be of limited use in the analysis of certain kinds of networks. For example, the Thomas conjecture (Thomas 1981), which states that a necessary condition for the existence of multiple equilibria is the presence of a positive loop in the interaction graph, provides only a necessary condition and does not preclude the existence of multiple equilibria for any PFN.

Figure 6.25 The SR graph corresponding to the reactions shown in Figure 6.24, with cycles 1 and 2 and the split complex pair H4F + EN indicated (redrawn from (Craciun et al. 2006))

However, not all PFNs have multiple equilibria, and even for the same network, different sets of parameters may yield a different number of equilibria. The situation is similar for the Kaufman multistability condition, which states that multistationarity requires either the presence of a variable nucleus or the presence of two nuclei of opposite signs; this is also only a necessary condition (Kaufman et al. 2007). Here, a nucleus is a union of one or more disjoint loops that includes all vertices of an interaction graph. There are many pioneering works in the area of detecting multiple equilibria of a molecular network. Craciun et al. modeled a cascade of chemical reactions in a chemical network based on the law of mass action (Craciun et al. 2006). If the details of each reaction are available, one can express such a system exactly as a chemical network based on the rates of the reactions given by the law of mass action. But for an enzyme-catalyzed metabolic reaction, the details of the intermediate process are generally unknown, which limits the application of such a theoretical result. On the other hand, for an enzyme-catalyzed metabolic reaction, one can formulate the biochemical reactions based on the MM kinetics or the Hill kinetics even without identifying the details of the intermediate process, thereby avoiding the difficulty of the previous modeling approaches. Based on the Hill kinetics, by exploiting the network structures of
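The Thomas necessary condition is also simple to test mechanically on a small interaction graph. The sketch below is a hypothetical helper (not from the book): it enumerates directed cycles by depth-first search over a signed edge list and reports whether any cycle has a positive product of edge signs; this brute-force enumeration is practical only for small graphs.

```python
# Hypothetical helper: check the Thomas necessary condition, i.e., whether
# the interaction graph contains a positive directed loop.

def has_positive_loop(edges, nodes):
    adj = {v: [] for v in nodes}
    for u, v, s in edges:
        adj[u].append((v, s))

    def dfs(start, v, sign, visited):
        # walk every simple path out of `start`, tracking the sign product
        for w, s in adj[v]:
            if w == start and sign * s > 0:
                return True               # closed a positive loop
            if w != start and w not in visited:
                if dfs(start, w, sign * s, visited | {w}):
                    return True
        return False

    return any(dfs(v, v, 1, {v}) for v in nodes)

# toggle switch: mutual inhibition, i.e., two negative edges = positive loop
toggle = [("u", "v", -1), ("v", "u", -1)]
# repressilator-like ring: three negative edges = negative loop
repressilator = [("a", "b", -1), ("b", "c", -1), ("c", "a", -1)]
```

The mutual-inhibition toggle passes the test (its loop sign is positive, consistent with it being a PFN), while a three-repressor ring does not, illustrating why the condition is necessary but far from sufficient.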

Figure 6.26 Bistability and switch-like behavior in the DHFR experiment determined from measured rate constants (from (Craciun et al. 2006))

enzyme-driven reactions, a module-based approach has also recently been developed to analyze the multistability of metabolic networks (Lei et al. 2010); it first decomposes a general metabolic network into four types of elementary modules according to the number of substrates and products, and then derives sufficient conditions for the monostability of the metabolic network.

7 Design of Synthetic Oscillating Networks

Rhythmic behavior represents one of the most striking dynamical phenomena in biological systems. Biological rhythms, including neural, cardiac, glycolytic, mitotic, hormonal, and circadian rhythms, as well as rhythms in ecology and epidemiology, with periods ranging from seconds to years, play important roles in many processes (Goldbeter 1997). Such dynamical phenomena arise from the interplay of cellular components and are typically generated by negative feedback loops (Dunlap 1999). From both theoretical and experimental viewpoints, it is still a great challenge to model, analyze, and further predict oscillatory phenomena in various living organisms. Oscillations, particularly periodic oscillations, are widely used in engineering control systems as central clocks to synchronize various elements with periodic behavior. Many multicellular organisms also adopt variations of cellular clocks to coordinate their behavior over the course of the day/night cycle. Models and theoretical approaches are essential for gaining an understanding of the principles underlying these rhythmic or oscillating processes. The most widely studied models of rhythmic phenomena are circadian oscillators (Dunlap 1999), cell cycle oscillators (Tyson et al. 2001, Tyson and Novak 2001), and glycolytic oscillators (Wolf et al. 2000). It has been shown that using simplified systems and focusing on general mechanisms is useful for obtaining a fundamental understanding of the oscillatory mechanisms. Theoretical studies have yielded models for cellular oscillators and even helped predict oscillatory behavior. Many models of sustained biological oscillators use three components, where X enhances Y and Y enhances R, but R represses X, thereby forming a negative feedback loop. The effect of the variable R can also be accompanied by a time delay in the feedback loop (Tyson et al. 2003).
The simplest case of a feedback oscillator is represented by a single gene, its corresponding mRNA X, and its product, i.e., the protein Y. If the protein inhibits the transcription of the mRNA, the gene expression can be oscillatory, provided that the time between the beginning of transcription and the end of translation is represented by a time delay; the system is then a self-inhibitory, or auto-repressed, system with a time delay.

This mechanism has been proposed to serve as the basis for the mechanism of the circadian rhythm (Schepper et al. 1999) and the zebrafish somitogenesis oscillator (Lewis 2003). In all these models, time delays are essential to the generation of the oscillations, although a simple negative feedback loop without a time delay can also generate oscillations. A network forming a cellular oscillator in living organisms typically involves more components than just a protein and its mRNA. These oscillators function faithfully under different environmental conditions. For example, the well-known Goodwin oscillator (Goodwin 1965), a three-dimensional model composed of X, a clock mRNA, Y, a clock protein, and Z, an inhibitor of transcription, describes an oscillatory negative feedback regulation of a translated protein Y that inhibits its own transcription through the inhibitor Z. Subsequently, many other more complex oscillators were proposed, e.g., Goldbeter's models (Goldbeter 1995, Leloup and Goldbeter 1998, Leloup and Goldbeter 2003). In addition to the models mentioned above, some synthetic oscillators have also been constructed, e.g., the repressilator (Elowitz and Leibler 2000), the gene-metabolic oscillator (Fung et al. 2005), and the robust and tunable synthetic gene oscillator (Stricker et al. 2008). Some of them have been implemented experimentally. Excellent reviews can be found elsewhere, e.g., in papers (Dunlap 1999, Goldbeter 2002, Pedersen et al. 2005, Kruse and Jülicher 2005, Hasty et al. 2002b) and books (Goldbeter 1997, Segel 1984). In general, because of complicated nonlinearities, it is difficult to guarantee that a biomolecular system will converge to a limit cycle, or a sustained oscillation, even for a simply structured dynamical system.
Therefore, many important physiological factors, e.g., time delays and multiple time scales, are often simply ignored in order to reduce the dimensionality and complexity of the systems. It is well known, however, that such factors may play important roles in the dynamics of biomolecular systems. With rapid advances in mathematics and in experiments concerning the underlying regulatory mechanisms, more theoretical results and general techniques have been obtained to elucidate oscillatory behavior. In this chapter, we aim to provide a general framework for designing and analyzing oscillating networks by exploiting special dynamical features of biomolecular networks and applying recent theoretical results on monotone dynamical systems. In contrast to the preceding chapter, in which the asymptotic properties of positive feedback networks (PFNs) were exploited to construct switching networks, in this chapter a class of negative feedback networks, namely, cyclic feedback networks (CFNs), is used to construct oscillating networks.

7.1 Simple Oscillatory Networks

Cellular oscillations are typically generated by negative feedback loops. To obtain an insight into how to design an oscillatory network, we first consider several simple oscillatory networks.

7.1.1 Delayed Autoinhibition Networks

An important consideration in modeling oscillatory networks is the fact that individual processes need a certain amount of time to be completed. For example, mature mRNA is not immediately available shortly after the initiation of transcription. One oscillator mechanism involves a negative feedback loop in which the responding protein directly binds to the regulatory DNA of its own gene to inhibit its own transcription, thus forming a direct autoinhibition with transcriptional delays, as depicted in Figure 7.1.

Figure 7.1 An oscillatory network based on a direct autoinhibition with delays τ_p and τ_m for the her1 protein and mRNA, respectively. The protein product acts as a homodimer to inhibit the expression of gene her1

For this simple autoregulatory network, let m and p denote the numbers or concentrations of the mRNA and protein molecules, respectively. The dynamics of the autoregulatory network can then be assumed to obey the following differential equations (Lewis 2003):

\frac{dm(t)}{dt} = f(p(t - \tau_m)) - d_m m(t),  (7.1)
\frac{dp(t)}{dt} = a m(t - \tau_p) - d_p p(t),  (7.2)

where the constants d_p and d_m are the degradation rates of the protein and mRNA molecules, respectively, a is the production rate of the protein molecules, and f(p) is the production rate of the mRNA molecules, which is assumed to be a decreasing Hill function of the protein concentration p. τ_m and τ_p are the time delays involved in the synthesis of the mRNA and protein molecules. The amount of regulatory protein p influences the transcription rate, but a significant time, τ_m, elapses between the initiation of transcription and the arrival of the mature mRNA molecules at the cytoplasm. Thus, the rate

of increase of the number of mature mRNA molecules at any instant reflects the value of p at a time that is earlier by an amount τ_m. Similarly, there is a delay, τ_p, between the initiation of translation and the emergence of complete functional protein molecules. The decreasing Hill function f(p) takes the form

f(p) = k / (1 + p^n/p_0^n)  (7.3)

with Hill coefficient n, where the constants k and p_0 represent the action of an inhibitory protein that acts as a dimer, i.e., n = 2. The behavior is qualitatively similar for cases in which n > 1. It can be proved from Bendixson's negative criterion that it is impossible for (7.1)-(7.2) to generate sustained oscillations when the two delays are set to zero. To generate sustained oscillations, three conditions must be satisfied: (1) ak/(d_p d_m) > 2p_0; (2) d_p ≥ 1.7/T; and (3) d_m ≥ 1.7/T, with total time delay T = τ_p + τ_m (Lewis 2003). Sustained oscillations are shown in Figure 7.2. The oscillator based on autoinhibition with time delays possesses the remarkable property that the period is mainly determined by the total time delay T, and the values of the other parameters can change by orders of magnitude with very little effect on the period. See (Lewis 2003) for more details.

Figure 7.2 Sustained oscillations generated by (7.1)-(7.2). The parameter values are a = 4.5, b = c = 0.23 (i.e., d_p = d_m = 0.23 min^-1), k = 33, and p_0 = 40, which are chosen so that the predicted period is close to the observed period. The delays τ_p and τ_m are estimated according to the rate at which RNAP II moves along DNA and the time needed for splicing out introns and transferring mature mRNAs from the nucleus to the cytosol. The estimated delays are τ_p ≈ 2.8 min and τ_m ≈ 31.5 min for her1 (from (Lewis 2003))

In (7.2), the production rate of the protein is linear.
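As a concrete illustration, the delay system (7.1)-(7.2) can be integrated with a simple forward-Euler scheme that keeps history buffers for the delayed terms. The sketch below is our own illustrative code, not from Lewis (2003); it uses the parameter values quoted with Figure 7.2, reading the b = c = 0.23 there as d_p = d_m = 0.23 min^-1 (an assumption about Lewis's notation).

```python
import numpy as np

def simulate_delayed_autoinhibition(a=4.5, d_m=0.23, d_p=0.23, k=33.0,
                                    p0=40.0, n=2, tau_m=31.5, tau_p=2.8,
                                    dt=0.01, t_end=600.0):
    """Forward-Euler integration of (7.1)-(7.2); time in minutes.

    The delayed values p(t - tau_m) and m(t - tau_p) are read from the
    stored trajectory, with a zero history before t = 0."""
    steps = int(t_end / dt)
    lag_m, lag_p = int(tau_m / dt), int(tau_p / dt)
    m = np.zeros(steps + 1)
    p = np.zeros(steps + 1)
    for i in range(steps):
        p_del = p[i - lag_m] if i >= lag_m else 0.0
        m_del = m[i - lag_p] if i >= lag_p else 0.0
        f = k / (1.0 + (p_del / p0) ** n)               # decreasing Hill function (7.3)
        m[i + 1] = m[i] + dt * (f - d_m * m[i])         # (7.1)
        p[i + 1] = p[i] + dt * (a * m_del - d_p * p[i]) # (7.2)
    return m, p

m, p = simulate_delayed_autoinhibition()
```

With these values the oscillation conditions above are satisfied (ak/(d_p d_m) ≈ 2807 > 2p_0 = 80, and 1.7/T ≈ 0.05 < 0.23), and the trajectory settles onto large-amplitude oscillations whose period is set mainly by T = τ_m + τ_p.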
In contrast, another oscillatory model based on a similar mechanism was proposed in which the time delay, the nonlinearity in protein production, and the cooperativity in the negative feedback are all necessary to generate circadian oscillations (Schepper et al. 1999). The mathematical model for the intracellular circadian rhythm generator is likewise based on negative feedback regulation by the protein product on the transcriptional rate of its gene. The model includes the production and

degradation of mRNA and protein molecules, along with negative feedback of the protein molecules upon mRNA production, as shown in Figure 7.3.

Figure 7.3 An example of autoinhibition networks. (a) Schematic representation of the circadian rhythm generator. The generator involves the autoinhibition of the protein at the translational or transcriptional level and post-translational processing such as phosphorylation, dimerization, and transport. Protein denotes effective protein molecules in the molecular state that are capable of inhibiting mRNA production and expressing the circadian rhythm. (b) Model interpretation of (a), emphasizing the delay τ and the nonlinearity in protein production, the nonlinear negative feedback, as well as the production and degradation of mRNA and protein molecules (from (Schepper et al. 1999))

It is assumed that the protein production and the negative feedback are nonlinear processes, whereas the total time involved in protein production and subsequent processing is represented by a single delay. The nonlinearities and the delay are critical to generate oscillations. On the basis of the above assumptions, the model is defined as follows:

dM(t)/dt = r_M / (1 + (P(t)/k)^n) - q_M M(t),  (7.4)
dP(t)/dt = r_P M^m(t - τ) - q_P P(t),  (7.5)
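The pair (7.4)-(7.5) can be integrated in the same way as the previous model; the sketch below is illustrative code (not from Schepper et al. (1999)), using the parameter values quoted with Figure 7.4 and an assumed small positive initial condition.

```python
import numpy as np

def simulate_circadian(r_M=1.0, r_P=1.0, q_M=0.21, q_P=0.21, n=2, m_exp=3.0,
                       tau=4.0, k=1.0, dt=0.005, t_end=480.0):
    """Forward-Euler integration of (7.4)-(7.5); time in hours.

    m_exp is the protein-production exponent m of (7.5); the delayed
    mRNA value M(t - tau) is read from the stored trajectory."""
    steps = int(t_end / dt)
    lag = int(tau / dt)
    M = np.zeros(steps + 1)
    P = np.zeros(steps + 1)
    M[0] = P[0] = 0.1                 # assumed initial condition
    for i in range(steps):
        M_del = M[i - lag] if i >= lag else M[0]
        M[i + 1] = M[i] + dt * (r_M / (1.0 + (P[i] / k) ** n) - q_M * M[i])
        P[i + 1] = P[i] + dt * (r_P * M_del ** m_exp - q_P * P[i])
    return M, P

M, P = simulate_circadian()
```

Consistent with the assumptions stated above, both the nonlinearity m > 1 and the delay τ matter here: removing them suppresses the sustained rhythm.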

where M and P denote the relative concentrations of mRNA and effective protein molecules, respectively, r_M is the scaled mRNA production rate constant, r_P is the protein production rate constant, q_M and q_P are the mRNA and protein degradation rates, respectively, n is the Hill coefficient, the exponent m denotes the nonlinearity in protein production, the delay τ is the total duration of protein production from mRNAs, and k represents a scaling constant. Sustained oscillations generated by (7.4)-(7.5) are shown in Figure 7.4.

Figure 7.4 Sustained oscillations generated by (7.4)-(7.5) at r_M = 1.0 h^-1, r_P = 1.0 h^-1, q_M = 0.21 h^-1, q_P = 0.21 h^-1, n = 2, m = 3.0, τ = 4.0 h, and k = 1. The protein and mRNA concentrations are shown by the continuous and dashed lines, respectively (from (Schepper et al. 1999))

The simple autoinhibition networks with negative feedback on gene expression are relatively easy to analyze theoretically. Many other oscillators based on such a mechanism have also been developed, such as the Gro/TLE1-Hes1 repression model (Bernard et al. 2006), the discrete stochastic Hes1 delay model (Barrio et al. 2006), and others (Monk 2003). One disadvantage of these oscillators is that not all of the processes, e.g., phosphorylation, can be specified explicitly. Therefore, they may be too general to account for various aspects of dynamics such as entrainment and robustness. The negative feedback on gene expression has subsequently been used to analyze various periodic phenomena in many biomolecular networks (Goldbeter 1995, Leloup and Goldbeter 1998, Leloup and Goldbeter 2003).

7.1.2 Goldbeter's Models

In Drosophila, the negative autoregulatory feedback established by the period (per) gene is at the heart of the circadian oscillator, as shown in Figure 7.5.

The gene per is first expressed in the nucleus and transcribed into mRNA. The per mRNA is then transported into the cytosol, where it is translated into the protein P_0 and degraded. The protein undergoes reversible phosphorylation at multiple residues twice, from P_0 into P_1 and from P_1 into P_2. The fully phosphorylated form of the protein is transported into the nucleus in a reversible manner. The nuclear form of the protein, P_N, represses the transcription of the gene per, resulting in a negative autoregulatory feedback. The phosphorylation, dephosphorylation, and degradation steps are assumed to obey Michaelian kinetics. The repression is supposed to be cooperative according to the Hill kinetics (Goldbeter 1995, Gonze et al. 2004).

Figure 7.5 The Goldbeter minimal model for circadian oscillations in Drosophila; the model is based on negative autoregulation of the per gene by its protein product PER (from (Goldbeter 1995))

The alternating protein production, gene repression, and protein degradation may lead to sustained circadian oscillations even in continuous darkness. The temporal variation of the concentrations of mRNA (M) and of the various protein forms (P_0, P_1, P_2, P_N) is governed by the differential equations

dM/dt = ν_s K_I^n/(K_I^n + P_N^n) - ν_m M/(K_m + M),  (7.6)
dP_0/dt = k_s M - V_1 P_0/(K_1 + P_0) + V_2 P_1/(K_2 + P_1),  (7.7)
dP_1/dt = V_1 P_0/(K_1 + P_0) - V_2 P_1/(K_2 + P_1) - V_3 P_1/(K_3 + P_1) + V_4 P_2/(K_4 + P_2),  (7.8)
dP_2/dt = V_3 P_1/(K_3 + P_1) - V_4 P_2/(K_4 + P_2) - k_1 P_2 + k_2 P_N - ν_d P_2/(K_d + P_2),  (7.9)
dP_N/dt = k_1 P_2 - k_2 P_N,  (7.10)

where per mRNA is accumulated at a maximum rate of ν_s and is degraded by an enzyme at a maximum rate of ν_m with Michaelis (MM) constant K_m. The synthesis rate of the PER protein from M is characterized by a first-order rate constant

k_s. The parameters V_i and K_i (i = 1, ..., 4) denote the maximum rates and Michaelis constants of the kinases and phosphatases involved in the reversible phosphorylation of P_0 into P_1 and of P_1 into P_2, respectively. The fully phosphorylated form P_2 is degraded with a maximum rate ν_d and MM constant K_d; it is also transported into the nucleus with first-order rate constant k_1 and returns to the cytosol with first-order rate constant k_2. The nuclear repressor P_N inhibits mRNA synthesis in a Hill-type manner with coefficient n. Figure 7.6 shows an example of the sustained oscillations in (7.6)-(7.10).

Figure 7.6 The sustained oscillations generated by the minimal model with ν_s = 0.76 μM h^-1, ν_m = 0.65 μM h^-1, K_m = 0.5 μM, k_s = 0.38 h^-1, ν_d = 0.95 μM h^-1, k_1 = 1.9 h^-1, k_2 = 1.3 h^-1, K_I = 1 μM, K_d = 0.2 μM, n = 4, K_1 = K_2 = K_3 = K_4 = 2 μM, V_1 = 3.2 μM h^-1, V_2 = 1.58 μM h^-1, V_3 = 5 μM h^-1, and V_4 = 2.5 μM h^-1 (from (Goldbeter 1995))

The minimal model described by (7.6)-(7.10) can be decomposed into 30 elementary steps (Gonze et al. 2004). The stochastic simulations show that the dynamical behavior predicted by the deterministic equations remains valid as long as the maximum numbers of mRNA and protein molecules involved in the circadian oscillator are of the order of a few tens and hundreds, respectively (Gonze et al. 2004, Gonze et al. 2002a). Because of the presence of intrinsic noise, the trajectory in the phase space transforms into a cloud of stochastic fluctuations around the deterministic limit cycle. Thus, the stochastic and deterministic descriptions are equivalent in the sense that the mean behavior of the stochastic system can be captured by the deterministic description, as shown in Figure 7.7. Only when the maximum numbers of molecules of mRNA and protein become smaller than a few tens does the noise begin to obliterate the circadian rhythm.
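The deterministic predictions can be reproduced by integrating (7.6)-(7.10) directly. The sketch below is illustrative code (not Goldbeter's), using the Figure 7.6 parameter values, concentrations in μM, time in hours, and an arbitrary initial condition.

```python
import numpy as np

def goldbeter_minimal(t_end=240.0, dt=0.01):
    """Forward-Euler integration of the minimal PER model (7.6)-(7.10);
    returns the mRNA trajectory M(t) sampled every dt hours."""
    v_s, v_m, K_m, k_s, v_d = 0.76, 0.65, 0.5, 0.38, 0.95
    k1, k2, K_I, K_d, n = 1.9, 1.3, 1.0, 0.2, 4
    K1 = K2 = K3 = K4 = 2.0
    V1, V2, V3, V4 = 3.2, 1.58, 5.0, 2.5
    steps = int(t_end / dt)
    M, P0, P1, P2, PN = 0.5, 0.25, 0.25, 0.25, 0.25   # arbitrary start
    Ms = np.empty(steps)
    for i in range(steps):
        dM = v_s * K_I**n / (K_I**n + PN**n) - v_m * M / (K_m + M)
        dP0 = k_s * M - V1 * P0 / (K1 + P0) + V2 * P1 / (K2 + P1)
        dP1 = (V1 * P0 / (K1 + P0) - V2 * P1 / (K2 + P1)
               - V3 * P1 / (K3 + P1) + V4 * P2 / (K4 + P2))
        dP2 = (V3 * P1 / (K3 + P1) - V4 * P2 / (K4 + P2)
               - k1 * P2 + k2 * PN - v_d * P2 / (K_d + P2))
        dPN = k1 * P2 - k2 * PN
        M, P0, P1, P2, PN = (M + dt * dM, P0 + dt * dP0, P1 + dt * dP1,
                             P2 + dt * dP2, PN + dt * dPN)
        Ms[i] = M
    return Ms

Ms = goldbeter_minimal()
```

Estimating the period from the inter-peak intervals of M in the second half of the run gives a value close to the circadian period of Figure 7.6.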
In Drosophila, a more complex extended model, based on the minimal model and involving both the period (per) and timeless (tim) genes, was subsequently established, as shown in Figure 7.8 (Goldbeter 2002). The model describes

both branches of negative feedback.

Figure 7.7 Comparison of sustained circadian oscillations and limit cycles predicted by the deterministic and stochastic descriptions. (a) The limit cycle and (b) the oscillations obtained in the deterministic model. (c) The limit cycle and (d) the oscillations obtained in the stochastic simulation by using the Gillespie method. The variables are expressed in terms of concentrations and numbers of molecules in the deterministic and stochastic simulations, respectively (from (Gonze et al. 2004))

After expression of both genes, their respective proteins PER and TIM are phosphorylated at multiple residues. The heterodimer PER-TIM acts as a transcriptional repressor for both genes. The rate of TIM degradation is induced by light, which enables entrainment with the environment. See (Leloup and Goldbeter 1998, Leloup et al. 1999, Leloup and Goldbeter 1999) for more details on the kinetic equations and simulation results. The spontaneous and entrained oscillations are shown in Figure 7.9. The existence of the light-dark (LD) cycle is reflected in the model by the periodic, square-wave variation of the parameter ν_dT, which represents the maximum rate of TIM degradation. It has been shown that the minimal model based on a single negative autoregulatory feedback loop is sufficient for robust oscillations to emerge (Gonze et al. 2002a). The additional branch enhances the robustness of the system under conditions of continuous darkness, or free running in darkness. In other words, the dual feedback structure contributes to robust fine-tuning of the clock in the case of single-parameter perturbations (Stelling et al. 2004b).

Figure 7.8 The extended network involving negative regulation of the genes per and tim. The box delimits the nucleus. The negative feedback is exerted by the nuclear PER-TIM complex on per and tim transcription (from (Goldbeter 2002))

7.1.3 Relaxation Oscillators

Oscillators with interlinked positive and negative feedback loops appear frequently in biological systems. Such oscillators may have advantages over pure negative feedback loops in some contexts, e.g., greater robustness to parameter changes and noise (Barkai and Leibler 2000, Vilar et al. 2002). Interlinked positive and negative feedback can produce relaxation oscillations that exhibit rapid transitions followed by durations of slow change. In this section, we present a model for producing a relaxation oscillation; the oscillation is produced by virtue of the competition between a strong self-activating gene A that activates a repressor R and the repression of A by the repressor R (Barkai and Leibler 2000). The implementation of such a hysteresis-based activator-repressor relaxation oscillator was also proposed in (Hasty et al. 2001), as shown in Figure 7.10. The relaxation oscillator is constructed as follows: first, both the repressor CI (X) and the activator RcsA (Y) are placed under the control of the same promoter P_RM, so that the functional form of the production term f(x) is identical for both proteins. Second, the network is constructed from two plasmids, one for the repressor and one for the activator, and the number of plasmids per cell is controlled for each type. Finally, the interaction of RcsA and CI leads to the degradation of CI (Hasty et al. 2001). Defining the concentrations as the variables, i.e., x = [X] and y = [Y], the equations governing this network are

Figure 7.9 Effects of asymmetrical conditions (i.e., different parameter values in the two branches) and of entrainment by an LD cycle. Panels (a)-(c) on the left refer to the case of continuous darkness, whereas panels (d)-(f) on the right pertain to entrainment by a 12:12 LD cycle (from (Leloup and Goldbeter 1998))

dx/dt = m_x f(x) - γ_x x - γ_xy x y,  (7.11)
dy/dt = m_y f(x) - γ_y y,  (7.12)

where

f(x) = (1 + x^2 + ασ_1 x^4) / (1 + x^2 + σ_1 x^4 + σ_1 σ_2 x^6);  (7.13)

here, m_x and m_y are the plasmid copy numbers for the two species, i.e., the numbers of plasmids per cell. The P_RM promoter and its binding affinities are shown in Figure 6.5. The production term f(x) can be obtained from the regulation of the P_RM operator region of λ phage. The system is a DNA plasmid consisting of the promoter region and the ci gene. The promoter region contains three operator sites known as OR_1, OR_2, and OR_3, as shown in Figure 6.5. The gene ci expresses the repressor CI, which in turn dimerizes and binds to the DNA as a TF. The binding can take place at one of the three binding sites. The binding affinities are such that, typically, the binding proceeds sequentially: the dimer first binds to OR_1, then to OR_2, and finally to OR_3. The biochemical reactions include both fast and slow reactions. Letting X, X_2, and D denote the repressor, the repressor dimer, and the DNA promoter site, respectively, we can write the fast reactions as follows:

X + X ⇌ X_2,  K_1,  (7.14)
D + X_2 ⇌ D_1,  K_2,  (7.15)
D_1 + X_2 ⇌ D_2 D_1,  K_3,  (7.16)
D_2 D_1 + X_2 ⇌ D_3 D_2 D_1,  K_4,  (7.17)

where D_i denotes the dimer bound to the OR_i site, and K_i = k_i/k_-i are the corresponding equilibrium constants. Let K_3 = σ_1 K_2 and K_4 = σ_2 K_2; thus, σ_1 and σ_2 represent the binding affinities relative to the dimer-OR_1 affinity. Based on singular perturbation theory, the fast reactions rapidly converge to quasi-equilibrium states. The reactions governing the slow processes are as follows:

D + RNAP →(k_t) D + RNAP + nX,  (7.18)
D_1 + RNAP →(k_t) D_1 + RNAP + nX,  (7.19)
D_2 D_1 + RNAP →(αk_t) D_2 D_1 + RNAP + nX,  (7.20)
D_3 D_2 D_1 + RNAP →(0) D_3 D_2 D_1 + RNAP + nX,  (7.21)
X →(k_x) ∅,  (7.22)

where RNAP denotes the RNA polymerase, n is the number of repressor proteins per mRNA transcript, the constant attached to each arrow is the corresponding reaction rate, and α > 1 is the degree to which the transcription is enhanced by dimer occupation of OR_2.
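The form of (7.13) can be checked against this reaction scheme directly: at quasi-equilibrium of (7.14)-(7.17), and in dimensionless units with K_1 K_2 scaled to one, the promoter states D, D_1, D_2D_1, and D_3D_2D_1 carry the statistical weights 1, x^2, σ_1 x^4, and σ_1 σ_2 x^6, while (7.18)-(7.21) assign them the relative transcription activities 1, 1, α, and 0. The sketch below (our own illustrative code) verifies that the activity-weighted average over these states reproduces (7.13).

```python
import numpy as np

def f_from_weights(x, alpha=11.0, sigma1=2.0, sigma2=0.08):
    # Quasi-equilibrium occupancy weights of D, D1, D2D1, D3D2D1
    w = np.array([1.0, x**2, sigma1 * x**4, sigma1 * sigma2 * x**6])
    activity = np.array([1.0, 1.0, alpha, 0.0])   # relative rates in (7.18)-(7.21)
    return np.dot(activity, w) / w.sum()

def f_closed_form(x, alpha=11.0, sigma1=2.0, sigma2=0.08):
    # Equation (7.13)
    return ((1 + x**2 + alpha * sigma1 * x**4)
            / (1 + x**2 + sigma1 * x**4 + sigma1 * sigma2 * x**6))

xs = np.linspace(0.0, 5.0, 101)
diff = max(abs(f_from_weights(x) - f_closed_form(x)) for x in xs)
print(diff)   # numerically zero
```

The default parameter values are those quoted with Figure 7.10, but the identity holds for any α, σ_1, σ_2.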
Assuming that the state D_3 D_2 D_1 completely terminates transcription, the transcription rate of (7.21) is set to zero. The protein multimers and other complexes can be eliminated by utilizing the inherent separation of time scales (i.e., by setting the fast reactions to their equilibrium states). This allows algebraic substitution and yields the equation

Figure 7.10 The relaxation oscillator. (a) The schematic of the network. The P_RM promoter is used on two plasmids to control the production of the repressor CI (X) and of RcsA (Y). After dimerization, the repressor acts to turn on both plasmids through its interaction at P_RM. As its promoter is activated, the RcsA concentration increases, leading to an induced reduction of CI. (b) A simulation of the oscillations, which arise from the RcsA-induced degradation of the repressor. The parameter values are m_x = 10, m_y = 1, γ_x = 0.1, γ_y = 0.01, γ_xy = 0.1, σ_1 = 2, σ_2 = 0.08, and α = 11 (from (Hasty et al. 2001))

dx/dt = m_x f(x) - γx,  (7.23)

where γ = k_x/(d_T n k_t p_0 √(K_1 K_2)) and d_T is the total concentration of DNA promoter sites, which is kept constant, i.e.,

m_x d_T = d_0 (1 + K_1 K_2 x^2 + σ_1 (K_1 K_2)^2 x^4 + σ_1 σ_2 (K_1 K_2)^3 x^6).  (7.24)

The dimensionless variables are defined by t̃ = t k_t p_0 d_T n √(K_1 K_2) and x̃ = √(K_1 K_2) x; for simplicity, we still write x and t in (7.23) in place of x̃ and t̃. Combining (7.23) with the fact that the interaction of RcsA and CI leads to the degradation of CI, we can finally obtain (7.11). Similarly, (7.12) can also be obtained. The network shown in Figure 7.10 is a hysteresis oscillator based on interlinked positive and negative feedback. It is based on hysteresis; in other words, its construction is based on a two-way discontinuous switch, as shown in Figure 6.6. The positive feedback gives bistability, while the interlinked negative feedback drives the hysteresis, i.e., drives the bistable system back and forth

between its two steady-state regimes. First, consider y to be the signal and x to be the response, and plot the steady states as a function of y. We obtain an S-shaped signal-response curve, indicating that the network functions as a toggle switch (i.e., replace γ with y in Figure 6.6). For intermediate values of y, the network is bistable. Conversely, plotting y (response) as a function of x, we obtain a simple linear response curve. These curves are referred to as the x-nullcline and the y-nullcline, respectively. The intersection point of the two curves represents a steady state, or an equilibrium, of the full system, but the system does not settle in this steady state because it is unstable. Instead, the system oscillates in a closed orbit around the steady state. Some other relaxation oscillators have also been constructed (Tyson et al. 2003, Chen and Aihara 2002b, Guantes and Poyatos 2006, Kurosawa et al. 2006). Relaxation oscillators have been shown to have better noise-rejection properties (Barkai and Leibler 2000) and to facilitate synchronization (McMillen et al. 2002).

7.1.4 Stochastic Oscillators

The constructive roles of noise and disorder in nonlinear biomolecular systems have been extensively studied. The best-known example of such a role is the phenomenon of noise-induced oscillations. In addition to oscillations generated by various deterministic mechanisms such as delayed negative feedback, internal rhythms can also be generated by intrinsic or extrinsic noise, even though the corresponding deterministic systems are in steady states. Such a phenomenon is known as noise-induced coherent motion, which has become an active topic of research since it has a variety of applications. We call such noise-induced oscillators stochastic oscillators.
Consider the genetic oscillator shown in Figure 7.11. The deterministic dynamics of the model is given by the following rate equations:

dD_A/dt = θ_A D_A′ - γ_A D_A A,  (7.25)
dD_R/dt = θ_R D_R′ - γ_R D_R A,  (7.26)
dD_A′/dt = γ_A D_A A - θ_A D_A′,  (7.27)

dD_R′/dt = γ_R D_R A - θ_R D_R′,  (7.28)
dM_A/dt = α_A′ D_A′ + α_A D_A - δ_MA M_A,  (7.29)
dA/dt = β_A M_A + θ_A D_A′ + θ_R D_R′ - A(γ_A D_A + γ_R D_R + γ_C R + δ_A),  (7.30)
dM_R/dt = α_R′ D_R′ + α_R D_R - δ_MR M_R,  (7.31)
dR/dt = β_R M_R - γ_C A R + δ_A C - δ_R R,  (7.32)
dC/dt = γ_C A R - δ_A C,  (7.33)

where D_A′ and D_A denote the numbers of activator genes in which A is and is not bound to the promoter, respectively; D_R′ and D_R refer similarly to the repressor promoters. M_A and M_R denote the numbers of mRNAs of A and R. A and R are the numbers of activator and repressor proteins, and C corresponds to the inactivated complex formed by A and R. The constants α and α′ denote the basal and activated rates of transcription; β, the rates of translation; δ, the rates of spontaneous degradation; γ, the rates of binding of A to the other components; and θ, the rates of unbinding of A from those components. The results obtained by the deterministic and stochastic analyses are shown in Figure 7.12. The deterministic result was obtained by numerical integration of (7.25)-(7.33), whereas the stochastic result was obtained by the Gillespie stochastic algorithm. It can be seen that some specific parameter values that yield a stable steady state in the deterministic case can produce sustained oscillations in the stochastic case. Therefore, the presence of noise not only changes the behavior of the system by adding more disorder but can also lead to marked qualitative differences (Vilar et al. 2002). To measure the temporal coherence of noise-induced oscillations, we introduce an index, called the signal-to-noise ratio (SNR), defined as (Zhou et al. 2008)

SNR = ⟨T_k⟩_t / √(Var(T_k)),  (7.34)

where T_k = τ_{k+1} - τ_k (here, τ_k is the time at which the kth firing of the noise-induced oscillator occurs) represents the kth pulse duration and

⟨·⟩_t denotes the average over time. A plot of the SNR versus the noise intensity reveals non-monotonic behavior, which is a signature of stochastic or coherence resonance. The noise intensity at which the SNR attains its maximum gives the amount of noise that can be introduced into the system to play the best constructive role (Hou and Xin 2003, Li and Lang 2008).

Figure 7.11 A genetic oscillator (from (Vilar et al. 2002))

7.2 Design of Oscillating Networks with Negative Loops

The relationship between network topology and functionality is an important research topic because the topology of a network plays an important role in determining its functions. For example, networks with only positive feedback loops have no dynamical attractors except stable equilibria (Kobayashi et al. 2003), as explained in Chapter 6. A system-level understanding of topological structures and biological functions requires a set of principles and methodologies that link the behavior of molecules to network characteristics and functions. The aim of this section is to introduce a general framework to design and analyze oscillatory networks by using cyclic feedback networks, which are also closely related to synthetic biology. Many oscillatory networks fall within the scope of such network structures, e.g., the Goodwin model (Goodwin 1965), Goldbeter's single-loop model (Goldbeter

Figure 7.12 Time evolution of R for (a) the deterministic equations (7.25)-(7.33) and (b) the stochastic version of the model. The parameter values are α_A = 50 h^-1, α_A′ = 500 h^-1, α_R = 0.01 h^-1, α_R′ = 50 h^-1, β_A = 50 h^-1, β_R = 5 h^-1, δ_MA = 10 h^-1, δ_MR = 0.5 h^-1, δ_A = 1 h^-1, δ_R = 0.05 h^-1, γ_A = 1 mol^-1 h^-1, γ_R = 1 mol^-1 h^-1, γ_C = 2 mol^-1 h^-1, θ_A = 50 h^-1, and θ_R = 100 h^-1 (from (Vilar et al. 2002))

1995), and the synthetic oscillator known as the repressilator (Elowitz and Leibler 2000). One desirable property of a cyclic network is that its ω-limit sets are composed of only periodic orbits and equilibria. Such a property drastically reduces the difficulty of the theoretical analysis and design of oscillators. Negative cyclic networks satisfying certain conditions have no stable equilibria but do have stable periodic oscillations. In other words, the asymptotic dynamics of such networks obey the Poincaré-Bendixson theorem, even though they are high-dimensional systems. Such a property is clearly ideal for designing or modeling cellular oscillators. Next, we describe the theoretical model of cyclic feedback networks and then present its general representation obtained by relaxing several constraints.

7.2.1 Theoretical Model of Cyclic Feedback Networks

A network with cyclic feedback loops, also known as a cyclic feedback network (CFN), can be represented generally by functional differential equations as follows (Mallet-Paret and Sell 1996a, Mallet-Paret and Sell 1996b):

dx_i(t)/dt = f_i(t, x_{i-1}(t - α_i), x_i(t), x_{i+1}(t - β_i)),  0 ≤ i ≤ n,  (7.36)

where the index i is taken mod (n + 1), x_i ∈ R_+, f_i: R_+^4 → R, α_i ∈ R_+, and β_i ∈ R_+. Clearly, there are n + 1 nodes in the network. Relations are imposed on the time delays, i.e.,

α_i = β_{i-1} for 1 ≤ i ≤ n,  (7.37)

which means that either α_i = β_{i-1} = 0, or α_i > 0 with no edge from node i to node i - 1, so that f_{i-1} is independent of the variable x_i. Feedback conditions are also imposed on the nonlinearities f_i in (7.36) as follows:

f_i(t, u, 0, v) ≥ 0 if δ_i^- u ≥ 0 and δ_i^+ v ≥ 0,
f_i(t, u, 0, v) ≤ 0 if δ_i^- u ≤ 0 and δ_i^+ v ≤ 0,  (7.38)

where δ_i^± ∈ {-1, 0, 1} are constants satisfying

δ_i^- = δ_{i-1}^+,  (7.39)

which implies that either δ_i^- = δ_{i-1}^+ = 0, in which case f_i is independent of the variable x_{i-1}, or δ_i^- = δ_{i-1}^+ ≠ 0. A further assumption is that f_0 is independent of its second argument,

f_0(t, u, w, v) = f_0(t, w, v).  (7.40)

Although (7.36) has multiple time delays, it can be reduced to the canonical form by using the transformation

y_i(t) = σ_i x_i(rt - γ_i), 0 ≤ i ≤ n,  (7.41)

where

σ_i = (sgn r)^i ∏_{j=0}^{i-1} δ_j^+,  r = ∑_{i=0}^{n} β_i,  γ_i = ∑_{j=0}^{i-1} β_j,  (7.42)

with σ_0 = 1 and γ_0 = 0, and by setting

δ = (sgn r)^{n+1} ∏_{j=0}^{n} δ_j^+.  (7.43)

The canonical form of a cyclic network can be written as follows:

ẏ_0(t) = g_0(y_0(t), y_1(t)),  (7.44)
ẏ_i(t) = g_i(y_{i-1}(t), y_i(t), y_{i+1}(t)),  1 ≤ i ≤ n - 1,  (7.45)
ẏ_n(t) = g_n(y_{n-1}(t), y_n(t), y_0(t - 1)),  (7.46)

where the functions g_i are given by

g_i(t, u, w, v) = σ_i r f_i(rt - γ_i, σ_{i-1} u, σ_i w, σ_{i+1} v)  (7.47)

with

g_0(t, 0, v) ≥ 0 if v ≥ 0, and ≤ 0 if v ≤ 0,  (7.48)
g_i(t, u, 0, v) ≥ 0 if u ≥ 0 and v ≥ 0, and ≤ 0 if u ≤ 0 and v ≤ 0,  (7.49)
g_n(t, u, 0, v) ≥ 0 if u ≥ 0 and δv ≥ 0, and ≤ 0 if u ≤ 0 and δv ≤ 0.  (7.50)

To model a cellular oscillator by a cyclic network, we need to describe the network structure through the interaction graphs introduced in Chapters 2, 3, and 6. Let s_ij = -1, 0, 1 represent a negative interaction, no interaction, and a positive interaction from node j to node i, respectively; s_ij and τ_ij here carry the same meanings as elsewhere in this book. For any two nodes i and i + 1 (1 ≤ i ≤ n - 1) of a CFN, if s_{i,i+1} = 0, then s_{i+1,i} is non-zero and the interaction between these two nodes is said to be a one-directional interaction. If both s_{i,i+1} and s_{i+1,i} are non-zero, the interaction between these two nodes is said to be a bi-directional interaction.

7.2.2 A Special Cyclic Feedback Network

According to the mathematical expression of cyclic feedback systems, the corresponding molecular network can be obtained as follows (Wang et al. 2004, Wang et al. 2005).

Assumption 7.2.1 For i = 1, ..., n - 1,

1. the reaction rate f_i of the ith component of the network may depend on the (i - 1)th, the (i + 1)th, and the ith components;
2. if the interaction from the ith component to the (i + 1)th component is positive (or negative), then the interaction from the (i + 1)th component to the ith component is non-negative (or non-positive);
3. the last reaction rate f_n depends on the (n - 1)th and the nth chemical components.

In addition, if both s_{i,i+1} and s_{i+1,i} are non-zero, then the delays τ_{i,i+1} = τ_{i+1,i} = 0; otherwise, τ_{i+1,i} can be any finite non-negative real number. Moreover, τ_ii = 0 for 0 ≤ i ≤ n. Here, τ_ij represents the time delay from node j to node i. Note that the 0th and the (n + 1)th nodes represent the nth node and the 1st node, respectively.

Therefore, a CFN satisfying Assumption 7.2.1 takes the following form:

ẋ_1(t) = f_1(x_2(t - τ_12), x_1(t), x_n(t - τ_1n)),  (7.51)
ẋ_i(t) = f_i(x_{i+1}(t - τ_{i,i+1}), x_i(t), x_{i-1}(t - τ_{i,i-1})),  2 ≤ i ≤ n - 1,  (7.52)
ẋ_n(t) = f_n(x_n(t), x_{n-1}(t - τ_{n,n-1})),  (7.53)

where τ_{i+1,i} = τ_{i,i+1} = 0 if the reaction rates f_i and f_{i+1} depend on x_{i+1} and x_i, respectively, i.e., if both s_{i,i+1} and s_{i+1,i} are non-zero; this implies that the interaction between the two nodes i and i + 1 is bi-directional. Otherwise, if the interaction between them is one-directional, i.e., s_{i,i+1} = 0, then τ_{i+1,i} is a non-negative finite real number. In addition, all self-feedbacks have no time delays, i.e., τ_ii = 0 for all 0 ≤ i ≤ n. Assumption 7.2.1 also requires that

(∂f_i(x_{i+1}, x_i, x_{i-1})/∂x_{i+1}) (∂f_{i+1}(x_{i+2}, x_{i+1}, x_i)/∂x_i) ≥ 0,  (7.54)

where ∂f_{i+1}(x_{i+2}, x_{i+1}, x_i)/∂x_i ≠ 0 for 1 ≤ i ≤ n and all x, which indicates that for any two neighboring components i and i + 1, the interaction from the (i + 1)th component to the ith component has the same sign as that from the ith component to the (i + 1)th component or is zero. In other words, if s_{i+1,i} = 1 (or -1), then s_{i,i+1} = 1 (or -1) or s_{i,i+1} = 0. Because of the one-directional interaction from the nth component to the 1st component, we have s_{n1} = 0. In addition, there is a further restriction on the model, stated in the following Assumption 7.2.2, which will be relaxed in generalized CFNs.

Assumption 7.2.2 Any two neighboring interactions, i.e., the interaction from node i - 1 to node i and the interaction from node i to node i + 1, have opposite signs for i ≠ n.

Assumption 7.2.2 determines the signs of the neighboring interactions, except for those around the nth chemical component, i.e., s_{i+1,i} s_{i,i-1} = -1 for i ≠ n. Such an assumption on the network structure limits the application of cyclic networks. For example, Goldbeter's single-loop model does not satisfy this assumption.
In the next section, we will extend the CFNs by eliminating Assumption 7.2.2. According to Assumption 7.2.1, it is clear that the interaction between node 1 and node n is one-directional, i.e., s_{n1} = 0; thus, Assumption 7.2.1 requires that there be at least one one-directional interaction between nodes in a cyclic network. Figure 7.13 schematically illustrates an example of the basic structure with five nodes. The signs + and - on the edges indicate s = 1 and s = -1, respectively. Note that s can be zero for the interactions represented by the dotted lines, whereas s must be nonzero for the interactions represented by the solid lines.

Figure 7.13 An interaction graph of a CFN with a negative largest loop. A feedback represented by a solid line cannot be zero, whereas a feedback represented by a dashed line can be zero (i.e., the interaction can be eliminated). Each node may have a linear or nonlinear, positive or negative self-feedback loop, which is not shown. The interactions between any two neighboring nodes have opposite signs, except in the case of node 5

Although there are multiple time delays in (7.51)-(7.53), we can actually reduce all the time delays equivalently to a single total time delay τ by using the transformation

y_i(t) = x_i(t - γ_i)  (7.55)

for 1 ≤ i ≤ n, where

γ_i = ∑_{j=i+1}^{n} τ_{j,j-1}  (7.56)

for 1 ≤ i ≤ n - 1, with γ_n = 0. It is easy to show that by the transformation (7.55), (7.51)-(7.53) can be equivalently transformed into the canonical form of the CFN with only one total time delay:

ẏ_1(t) = f_1(y_2(t), y_1(t), y_n(t - τ)),  (7.57)
ẏ_i(t) = f_i(y_{i+1}(t), y_i(t), y_{i-1}(t)),  2 ≤ i ≤ n - 1,  (7.58)
ẏ_n(t) = f_n(y_n(t), y_{n-1}(t)),  (7.59)

where the total time delay is τ = ∑_{i=1}^{n} τ_{i+1,i}, with τ_{n+1,n} = τ_{1n}. In contrast to (7.51)-(7.53), the canonical cyclic network (7.57)-(7.59) is much easier to analyze since it involves only a single delay. There are only phase differences between x_i and y_i according to the transformation (7.55). When using (7.57)-(7.59) to model cellular oscillators, the time delays can be regarded as a simplified representation of the accumulated time consumed mainly in transcription, translation, signal transduction, translocation, and diffusion processes. For simplicity, only the case of one delay for each component is considered, although multiple direct interactions from the jth component to the ith component with different delays may exist.
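As a simple instance of the canonical form (7.57)-(7.59), consider a hypothetical three-node ring in which each component represses the next through decreasing Hill kinetics and the single lumped delay τ sits on the loop-closing edge; with three repressive edges the largest loop has sign (-1)^3 = -1, i.e., the ring is a negative CFN. All parameter values in this sketch are illustrative, not taken from the text.

```python
import numpy as np

def repressor_ring(k=50.0, hill=3, d=1.0, tau=1.0, dt=0.005, t_end=150.0):
    """Euler integration of a three-node negative CFN in the canonical
    single-delay form: y1 is driven by y3(t - tau), y2 by y1, y3 by y2,
    every interaction repressive."""
    steps = int(t_end / dt)
    lag = int(tau / dt)
    y = np.zeros((steps + 1, 3))
    y[0] = [1.0, 0.2, 0.1]                      # asymmetric initial state
    rep = lambda u: k / (1.0 + u ** hill)       # decreasing Hill kinetics
    for i in range(steps):
        y3_del = y[i - lag, 2] if i >= lag else y[0, 2]
        dy1 = rep(y3_del) - d * y[i, 0]
        dy2 = rep(y[i, 0]) - d * y[i, 1]
        dy3 = rep(y[i, 1]) - d * y[i, 2]
        y[i + 1] = y[i] + dt * np.array([dy1, dy2, dy3])
    return y

traj = repressor_ring()
```

With these values the unique equilibrium is unstable, and, consistent with the theory of monotone cyclic feedback systems, the trajectory converges to a periodic orbit rather than to the equilibrium.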
For any two nodes with bidirectional interactions, the time delays between them must be zero, according to the assumptions above. For a one-directional interaction between any two nodes, the delay can be any non-negative finite real number.

7 Design of Synthetic Oscillating Networks

In addition to the feedback loops connecting two neighboring nodes, which are all positive, there is a unique largest loop that connects all nodes, as shown in Figure 7.13. A cyclic network with a positive largest loop falls into the scope of cooperative dynamical systems, i.e., special structures of PFNs, which have been thoroughly investigated for both FDEs and ODEs. Cooperative systems exhibit very regular behavior; e.g., typical solutions tend to equilibria in the case of autonomous systems, or to periodic solutions in the case of non-autonomous periodic systems. In other words, there are no stable periodic solutions for autonomous monotone dynamical systems with only positive feedback loops (Smith 1995, Kobayashi et al. 2003). When the largest loop is negative, the network (7.51)–(7.53) is said to be a negative cyclic feedback network. Therefore, to ensure the existence of periodic solutions, the following important assumption should be made.

Assumption 7.2.3. The feedback loop connecting all nodes, i.e., the largest loop, is negative.

A CFN with a negative largest loop is said to be a negative cyclic network. An example of a negative CFN is shown in Figure 7.13. The largest loop, i.e., $1 \to 2 \to 3 \to 4 \to 5 \to 1$, is a negative feedback loop because $s_{21}s_{32}s_{43}s_{54}s_{15} = -1$. According to Assumptions 7.2.1–7.2.3, it is clear that a negative cyclic network requires that

1. the largest loop be negative, while all self-feedback loops can have arbitrary signs;
2. all loops, excluding the self-feedback loops and the largest loop, be positive;
3. except for the nth node, the interactions between neighboring nodes be of opposite signs; e.g., if $s_{21}$ is positive, then $s_{32}$ must be negative, as shown in Figure 7.13.

The time delays do not change the location of an equilibrium of (7.51)–(7.53) but can change its stability. If all the eigenvalues of the characteristic equation have negative real parts, then the equilibrium is stable and there is no oscillation near the equilibrium.
On the other hand, when a parameter value, e.g., τ, changes, if any complex eigenvalue crosses the imaginary axis, then a stable equilibrium loses its stability with the appearance of an oscillation through a typical Hopf bifurcation. Let $\bar x = (\bar x_1, \dots, \bar x_n)$ be an equilibrium of (7.51)–(7.53). Define

$$A(\lambda) = \begin{pmatrix}
f_{11} & f_{12}e^{-\tau_{12}\lambda} & 0 & \cdots & f_{1n}e^{-\tau_{1n}\lambda} \\
f_{21}e^{-\tau_{21}\lambda} & f_{22} & f_{23}e^{-\tau_{23}\lambda} & \cdots & 0 \\
0 & f_{32}e^{-\tau_{32}\lambda} & f_{33} & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & f_{n,n-1}e^{-\tau_{n,n-1}\lambda} & f_{nn}
\end{pmatrix}, \quad (7.60)$$

where

$$f_{ij} = \left.\frac{\partial f_i}{\partial x_j}\right|_{x=\bar x} \quad (7.61)$$

for $1 \le i, j \le n$. Clearly, $A(0)$ is the Jacobian matrix of $f = (f_1, \dots, f_n)$ with respect to x. Then, the characteristic equation evaluated at the equilibrium $\bar x$, i.e.,

$$\det(\lambda I - A(\lambda)) = 0, \quad (7.62)$$

has the form

$$b_n\lambda^n + b_{n-1}\lambda^{n-1} + \cdots + b_0 + (-1)^{n+1} B e^{-\lambda\tau} = 0, \quad (7.63)$$

where I is the $n \times n$ identity matrix, $B = f_{1n}\prod_{i=1}^{n-1} f_{i+1,i}$, $b_n = (-1)^n$, and $\tau = \sum_i \tau_{i+1,i}$. B represents the total feedback strength. Notice that the $b_j$ for $j = 0, \dots, n-1$ are functions of $f_{k,k+1}f_{k+1,k}$ for $1 \le k \le n$ and of $f_{ii}$ for $1 \le i \le n$, except $f_{1n}$; this implies that if $f_{k,k+1}$ is zero, all effects of the interactions between nodes k and k+1 on the $b_j$ disappear, but their effects on B remain. On the basis of monotone dynamical system theory and a discrete Lyapunov functional, Mallet-Paret and Sell (Mallet-Paret and Sell 1996a, 1996b) obtained a Morse decomposition and established that a Poincaré–Bendixson type theorem holds for (7.51)–(7.53) when Assumptions 7.2.1–7.2.3 are satisfied. Let the natural phase space for (7.51)–(7.53) be $C(K)$, where

$$K = [0, \tau_{21}] \times [0, \tau_{32}] \times \cdots \times [0, \tau_{n,n-1}] \times [0, \tau_{1n}]. \quad (7.64)$$

Theorem 7.1. (Poincaré–Bendixson type theorem) Consider the differentiable system (7.51)–(7.53). Assume that Assumptions 7.2.1–7.2.3 and (7.54) hold. Let $x(t)$ be a solution of (7.51)–(7.53) on some time interval $[t_0, \infty)$. Let $\omega(x) \subset C(K)$ denote the omega limit set of this solution in the phase space $C(K)$. Then, either

1. $\omega(x)$ is a single non-constant periodic orbit; or
2. for each solution $u(t)$ of (7.51)–(7.53) in $\omega(x)$, i.e., for solutions with $u(t) \in \omega(x)$ for all $t \in \mathbb R$, we have

$$\alpha(u) \cup \omega(u) \subseteq E, \quad (7.65)$$

where $\alpha(u)$ and $\omega(u)$ denote the alpha and omega limit sets of u, respectively, and $E \subset C(K)$ denotes the set of equilibria.
This theorem does not provide sufficient conditions for the existence of periodic solutions, but it indicates that the omega limit sets of cyclic feedback systems are composed of only periodic orbits and equilibria, which is a desirable property for modeling cellular oscillators if it can be shown that there is no stable equilibrium. Take the total delay τ as a parameter and assume that when $\tau = 0$ there is at least one pair of complex eigenvalues for the characteristic equation (7.63). By changing τ, the asymptotic stability of $\bar x$ can be detected from the roots of (7.63).
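The algebraic structure of (7.62)–(7.63) can be made concrete numerically. For $n = 3$ with a pure one-directional loop ($f_{13}, f_{21}, f_{32}$ nonzero, $f_{12} = f_{23} = 0$), expanding the determinant gives $\det(\lambda I - A(\lambda)) = \prod_i(\lambda - f_{ii}) - B e^{-\lambda\tau}$, which has the same roots as (7.63). The sketch below checks this identity at an arbitrary complex test point; all numerical values ($f_{ij}$, delays, test point) are hypothetical illustrations, not taken from any model in the text.

```python
# Numerical check of the characteristic equation (7.62)-(7.63) for n = 3
# with a pure one-directional loop. All parameter values are hypothetical.
import cmath

f11, f22, f33 = -1.0, -0.8, -1.2     # self-feedback terms f_ii
f13, f21, f32 = -0.9, 1.1, -0.7      # loop couplings
t13, t21, t32 = 0.5, 0.3, 0.2        # delays on the loop edges
B = f13 * f21 * f32                  # total feedback strength
tau = t13 + t21 + t32                # total delay

def det3(m):
    """Determinant of a 3x3 complex matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

lam = 0.3 + 0.7j                     # arbitrary test point in the complex plane
M = [[lam - f11, 0.0, -f13 * cmath.exp(-t13 * lam)],
     [-f21 * cmath.exp(-t21 * lam), lam - f22, 0.0],
     [0.0, -f32 * cmath.exp(-t32 * lam), lam - f33]]
lhs = det3(M)                                                  # det(lambda*I - A(lambda))
rhs = (lam - f11) * (lam - f22) * (lam - f33) - B * cmath.exp(-tau * lam)
print(abs(lhs - rhs))                # difference at rounding-error level
```

The agreement is exact up to floating-point error, illustrating how the delays along the loop collapse into the single factor $e^{-\lambda\tau}$ with $\tau$ the total loop delay.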

Since the solution exists and is bounded, the omega limit set is non-empty, which means that all asymptotic solutions are periodic orbits provided that there is no stable equilibrium. Therefore, the basic idea is to destabilize all equilibria, which can be carried out mainly by linear analysis in most cases. For instance, one way to generate a global periodic oscillation is to identify all equilibria and then make all of them unstable by tuning the related parameters. Next, we first state theoretical results for the local analysis with respect to initial values, which guarantee that a negative cyclic network converges to a local periodic orbit when τ is tuned as a parameter (Wang et al. 2005).

Theorem 7.2. Suppose that Assumptions 7.2.1–7.2.3 hold for (7.51)–(7.53), that the following equation has a nonzero real root $\bar v$ at $\bar x$:

$$\Big(\sum_{i\in I_e} (-1)^{i/2} b_i v^i\Big)^2 + \Big(\sum_{i\in I_o} (-1)^{(i-1)/2} b_i v^i\Big)^2 - B^2 = 0, \quad (7.66)$$

and further that

$$\frac{1}{B^2 v^2}\Big[\sum_{k\in I_o} (-1)^{(k-1)/2} b_k v^k \sum_{i\in I_o} (-1)^{(i-1)/2} i\, b_i v^i + \sum_{k\in I_e} (-1)^{k/2} b_k v^k \sum_{i\in I_e,\, i\ge 2} (-1)^{(i-2)/2} i\, b_i v^i\Big] \neq 0, \quad (7.67)$$

where $I_e = \{i : \mathrm{mod}(i,2)=0,\ 0\le i\le n\}$, $I_o = \{i : \mathrm{mod}(i,2)=1,\ 0\le i\le n\}$, $\bar x$ is stable with at least a pair of complex eigenvalues at $\tau = 0$, and $\mathrm{mod}(x, y)$ denotes the remainder after x is divided by y. Then, there exists $\bar\tau$ such that

$$\bar\tau = \frac{1}{\bar v}\arccos\Big[\frac{(-1)^n \sum_{i\in I_e} (-1)^{i/2} b_i \bar v^i}{B}\Big], \quad (7.68)$$

where the range of arccos is $[0, \pi]$, and (7.51)–(7.53) will converge to a stable periodic orbit when τ is near $\bar\tau$ and $\tau > \bar\tau$.

This theorem indicates that if there exists $\bar\tau$ at which $\lambda = j\bar v$ with $j=\sqrt{-1}$ is a root of (7.63), and the derivative of the real part of the eigenvalues at $\tau=\bar\tau$ is not zero, then (7.51)–(7.53) will converge to a stable periodic orbit when τ is near $\bar\tau$ and $\tau>\bar\tau$, for any initial conditions near $\bar x$ except $\bar x$ itself. Therefore, a negative cyclic network can have periodic orbits, which can be proved by the Hopf bifurcation theorem for FDEs.
Based on the local convergence result of Theorem 7.2, global convergence conditions for non-trivial periodic orbits can be derived as follows (Wang et al. 2005):

Theorem 7.3. Assume that Assumptions 7.2.1–7.2.3 hold for (7.51)–(7.53) and that the feedback for the total one-directional interaction is sufficiently strong,

i.e., $\frac{\partial f_1}{\partial x_n}\prod_i \frac{\partial f_{i+1}}{\partial x_i}$, taken over those i with $\frac{\partial f_i}{\partial x_{i+1}}=0$, is sufficiently large at any equilibrium; then, there exists $\bar\tau$ such that

$$\bar\tau = \frac{1}{\bar v}\arccos\Big[\frac{(-1)^n \sum_{i\in I_e} (-1)^{i/2} b_i \bar v^i}{B}\Big], \quad (7.69)$$

where the range of arccos is $[0, \pi]$, and (7.51)–(7.53) will converge to a stable periodic orbit for almost all initial conditions when $\tau > \bar\tau$.

The product $\frac{\partial f_1}{\partial x_n}\prod_i \frac{\partial f_{i+1}}{\partial x_i}$ over all those i with $\frac{\partial f_i}{\partial x_{i+1}} = 0$ represents the total strength of the one-directional interaction, i.e., the product of those interactions $\partial f_{i+1}/\partial x_i$ with $s_{i,i+1} = 0$ but $s_{i+1,i} \neq 0$. The theorem can be proven by showing that all equilibria are unstable because of the existence of an eigenvalue with a positive real part, and that the real part never returns to the negative half-plane for any $\tau > \bar\tau$. Therefore, there are no stable equilibria but only stable periodic orbits, because of the existence of the non-empty omega limit set. Although the conditions appear quite stringent as expressions, it is significant that certain common mechanisms in cellular systems actually satisfy them.

Theorem 7.4. Assume that Assumptions 7.2.1–7.2.3 hold for (7.51)–(7.53). If $\det(A(0)) < 0$ at all equilibria, then for almost all initial conditions, (7.51)–(7.53) will converge to a stable periodic solution, where $A(\lambda)$ is defined by (7.60).

The virtue of this theorem is its strength: it ensures that (7.51)–(7.53) converges to a stable periodic orbit regardless of any non-negative time delays. The theorem can be proven by showing that all equilibria are unstable for any non-negative delays (Wang et al. 2005). It is generally not easy to guarantee stable behavior, such as equilibria and periodic orbits, even for a small network with a few components, due to the nonlinearity of the system. Theorem 7.3 implies that if the feedback for the total one-directional interaction is sufficiently strong, a stable periodic orbit exists.
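The determinant condition of Theorem 7.4 can be checked mechanically. For the symmetric three-node repressor loop $\dot x_i = 1/(1+x_j^2) - x_i$ with $(i,j)$ cyclic over three nodes (this is the example (7.70) discussed below), $A(0)$ has $-1$ on the diagonal and the coupling $c = -2\bar x/(1+\bar x^2)^2$ in the cyclic positions, so $\det(A(0)) = -1 + c^3 < 0$ since $c < 0$. A minimal numerical sketch:

```python
# Check det(A(0)) < 0 for a symmetric three-node repressor loop
# x_i' = 1/(1 + x_j^2) - x_i; the symmetric equilibrium solves x = 1/(1 + x^2).

def g(x):                     # fixed-point residual for the symmetric equilibrium
    return 1.0 / (1.0 + x * x) - x

# Bisection: g(0) = 1 > 0, g(1) = -0.5 < 0, and g is strictly decreasing.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if g(mid) > 0:
        lo = mid
    else:
        hi = mid
xbar = 0.5 * (lo + hi)        # equilibrium, roughly 0.6823

def coupling(x):              # off-diagonal Jacobian entry d/dx [1/(1 + x^2)]
    return -2.0 * x / (1.0 + x * x) ** 2

# For this cyclic structure, det(A(0)) = -1 + c^3 (diagonal entries -1,
# couplings c in the three cyclic positions, equal at the symmetric equilibrium).
c = coupling(xbar)
det_A0 = -1.0 + c ** 3
print(xbar, det_A0)           # det(A(0)) is negative, as Theorem 7.4 requires
```

Since every coupling $-2x_j/(1+x_j^2)^2$ is negative for $x_j > 0$, the product $c_1 c_2 c_3$ is negative at any positive point, so $\det(A(0)) < -1$ holds for all $x > 0$, not only at the equilibrium.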
In other words, the omega limit set is non-empty and includes only periodic orbits. On the other hand, Theorem 7.4 indicates that when the determinant of the Jacobian matrix is negative for all equilibria, the system (7.51) (7.53) will converge to periodic orbits from almost all initial conditions and with any non-negative time delay. When all the feedback loops of a system are positive, its orbits have a strong tendency to converge to equilibria. However, for negative cyclic feedback networks, when the conditions of Theorem 7.3 or Theorem 7.4 hold, only stable periodic orbits exist and constitute the omega limit sets, which is quite different from those of positive feedback networks. Therefore, if conditions of Theorem 7.3 or Theorem 7.4 are satisfied, (7.51) (7.53) will inevitably converge to stable periodic orbits. In other words, negative CFNs have ideal properties for constructing oscillatory networks and therefore can be used to

model and design cellular oscillators. Although Theorem 7.2 is a local convergence theorem, it becomes a global convergence result for almost all initial conditions when there is at most one equilibrium, as stated in Corollary 7.5.

Corollary 7.5. Assume that all conditions of Theorem 7.2 hold. If $\det(A(0)) \neq 0$ for all x in a convex set X, then when τ is near $\bar\tau$ and $\tau > \bar\tau$, (7.51)–(7.53) will converge to a stable periodic orbit for almost all initial conditions.

By showing that there is at most one unstable equilibrium, we can prove the corollary. As a simple example, we can verify that the following system satisfies the conditions of Corollary 7.5:

$$\dot x_i(t) = \frac{1}{1 + x_j^2(t-\tau_j)} - x_i(t), \quad (7.70)$$

where i and j take the following three pairs of values: $(i=1, j=2)$, $(i=2, j=3)$, and $(i=3, j=1)$. It is clear that $\det(A(0)) \neq 0$, or more exactly $\det(A(0)) < 0$, for all $x > 0$.

A General Cyclic Feedback Network

Negative CFNs can be used for modeling and designing cellular oscillators when the feedback for the total one-directional interaction is strong enough. However, the special feedback structure in Assumption 7.2.2 requires that the interactions be opposite for neighboring nodes, except for the last one; this is difficult to satisfy for many cellular systems and therefore may limit the potential applications. In fact, such a restriction can be eliminated by a coordinate transformation. In other words, the original CFNs with a special feedback structure can be extended to general ones with any type of interaction between neighboring nodes. Moreover, since there is no limitation on the dimensionality of the general cyclic feedback networks, a cellular oscillator can be modeled and designed even as a large-scale system.

Choose any node i and change the types or signs of all interactions associated with it. Denote the system obtained under such a transformation as

$$\dot y(t) = g(y_\tau), \quad (7.71)$$

where the transformation P for $x = Py$ is defined by the diagonal matrix

$$P = \begin{pmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_n \end{pmatrix} \quad (7.72)$$

with $\sigma_i = -1$ and $\sigma_j = 1$ for all j with $j \neq i$. By substituting $x = Py$ into (7.71), we get

$$\dot x(t) = f(x_\tau) \equiv P g(P x_\tau), \quad (7.73)$$

where $P = P^{-1}$ is used. It can easily be proven that (7.73) is qualitatively equivalent to (7.71), since P is a reversible, one-to-one map. Therefore, the following theorem can be obtained (Wang et al. 2005).

Theorem 7.6. Assume that the above assumptions hold for the CFN (7.51)–(7.53). The transformation (7.72), which changes the signs of all interactions connected to any node i ($1 \le i \le n$), does not change its dynamical properties.

Theorem 7.6 implies that the dynamics of (7.71), in which the signs of all the interactions connected to any node i are changed, is qualitatively equivalent to that of (7.73). Moreover, it is also easy to show that such a transformation does not change the type of any feedback loop, which implies that a negative cyclic network remains a negative cyclic network under this transformation. A simple case of the transformation procedure is shown in Figures 7.13 and 7.14 (a). The difference between them is in the types or signs of all interactions connected to node 2. The two cases in Figure 7.14 are dynamically equivalent to the case in Figure 7.13.

Figure 7.14 A simple example of the transformation procedure. All these networks have dynamical properties that are qualitatively equivalent to those of Figure 7.13. (a) The signs of all interactions connected to node 2 are changed. (b) On the basis of (a), the signs of all interactions connected to node 3 are changed.

An important property of the transformation is that any combination of interactions can be obtained under such consecutive transformations, which actually do not change the type of any loop. By performing such a transformation for each node, we can obtain different cyclic networks with different combinations of interactions, which are all qualitatively equivalent. Therefore,

Assumption 7.2.2 can be eliminated to obtain more general cyclic networks by the reversible, one-to-one map.

Corollary 7.7. Assume that Assumptions 7.2.1 and 7.2.3, excluding Assumption 7.2.2, are satisfied; then, Theorems 7.2–7.4 and Corollary 7.5 still hold for the CFN (7.51)–(7.53).

Figure 7.15 The Goldbeter single-loop model with a time delay to represent the slow diffusion, transportation, and/or signal transduction processes of molecules between the nucleus and the cytosol. It is a general negative cyclic network (from (Chen and Wang 2006)).

In contrast to the restricted structure of the special CFNs, Corollary 7.7 indicates that general negative cyclic networks with any type of interaction at each node can be used to design cellular oscillators. Figures 7.13 and 7.14 show different network structures, which have qualitatively equivalent dynamical properties. An example of the generalized cyclic network is Goldbeter's single-loop model, as shown in Figure 7.15. See (Wang et al. 2005) for all proofs of the theorems and numerical results for the delayed Goldbeter single-loop model and the synthetic repressilator.

7.3 Construction of Oscillators by Non-monotone Dynamical Systems

Cellular functions, such as circadian rhythms, are carried out by interlocked feedback networks, which are made up of many interacting molecules or modules. Understanding how the networks work requires combining phenomenological analysis with molecular and modular studies. It is thus important to

consider both the functions and structures of each module and then elucidate the complex sets of molecules that interact to form functional networks. Therefore, the general principles that govern the structure and behavior of the interlocked networks may be discovered by understanding each module and the biochemical connectivity among the modules. In this section, we show how monotone modules with simple dynamics can be used to construct non-monotone interlocked feedback networks functioning as cellular oscillators.

A dynamical system in the standard sense of control theory, with inputs and outputs, is written as

$$\dot x = f(x, u), \quad y = h(x), \quad (7.74)$$

with a state space X, an input set U, and an output set Y. A dynamical system is said to be monotone if the following property holds with respect to the orders (see the partial order or vector order (4.72)) on states and inputs (Smith 1995, Angeli and Sontag 2003):

$$\xi_1 \preceq \xi_2 \;\&\; u_1 \preceq u_2 \;\Rightarrow\; x(t; u_1, \xi_1) \preceq x(t; u_2, \xi_2) \text{ for all } t \ge 0, \quad (7.75)$$

where $x(t; u, \xi) \in X$ denotes the solution at time t with initial condition ξ and input $u(\cdot)$, and $u_1 \preceq u_2$ means that $u_1(t) \preceq u_2(t)$ for all t. In a monotone control system, a larger input and/or a larger initial condition produces a larger output; this is very common in cellular systems. For example, a high mRNA concentration results in a high synthesis rate of its corresponding protein. Monotone systems are one of the most important classes of dynamical systems in theoretical biology (Sontag 2004). Monotone systems with inputs and outputs are important for understanding the interactions between cellular components. Such systems allow the application of the rich theory developed for classical monotone systems, e.g., theoretical results for PFNs that guarantee convergence of trajectories to equilibria (Kobayashi et al. 2003, Angeli et al. 2004).
The monotone control system (7.74) is said to be endowed with a static input-state characteristic

$$k_x(\cdot) : U \to X \quad (7.76)$$

if for each constant input $u(t) \equiv \bar u$ there exists a globally asymptotically stable equilibrium $\bar x = k_x(\bar u)$. The static input/output characteristic is defined as $K_y(\bar u) := \bar y = h(k_x(\bar u))$, provided that the input-state characteristic exists and h is continuous, as illustrated in Figure 7.16 (Angeli and Sontag 2003, Wang et al. 2006a). The existence of an input-output characteristic implies the uniqueness of the equilibrium; this can be confirmed as follows: view $\bar u \equiv u(t)$ as a parameter and (7.74) as a feedback closure of an open-loop system with an input and an output; if the open-loop system exhibits a linear response, a Michaelian response, or any response that lacks an inflection point for each $\bar u$, the open-loop system is guaranteed to be monostable. By setting the input and output

variables to be equal, and thus recovering the original system, it will also be monostable (Angeli et al. 2004).

Figure 7.16 The static input-output characteristic of (7.74): the input-state characteristic $\bar x = k_x(\bar u)$ and the input-output characteristic $K_y(\bar u) = \bar y = h(\bar x) = h(k_x(\bar u))$.

To understand how a large-scale network that is not necessarily monotone is built and how it works, one must develop a precise mathematical description of the network and some intuition about its dynamical properties. Complex networks can often be constructed from simple modules, i.e., sets of interacting components that carry out specific tasks and can be connected together. These simple modules can be used to construct an interlocked feedback network with specific functions, such as cellular oscillators. A monotone module can be represented in the form of the control system (7.74) with the monotonicity condition (7.75). Consider two monotone modules with inputs and outputs, which can be either scalars or vectors:

$$\Sigma_1 : \dot x = f_x(x, w), \quad y = h_x(x), \quad (7.77)$$
$$\Sigma_2 : \dot z = f_z(z, y), \quad w = h_z(z), \quad (7.78)$$

with $U_x = Y_z$ and $U_z = Y_x$, where U and Y denote the input and output sets, under the following important assumptions:

1. the module $\Sigma_1$ is monotone when its input w as well as its output y is ordered according to the standard order induced by the positive real semi-axis;
2. the module $\Sigma_2$ is monotone when its input y is ordered according to the standard order induced by the positive real semi-axis and its output w is ordered by the opposite order;
3. the static input-state characteristics $k_x(\cdot)$ and $k_z(\cdot)$ and the static input-output characteristics $K_y(\cdot)$ and $K_w(\cdot)$ exist and are monotonically increasing and decreasing, respectively;
4. every solution of the feedback closure of (7.77)–(7.78), i.e., the network Σ defined by (7.79), is bounded.
The first assumption implies that for Σ 1, increasing w will cause an increase in y. This assumption can be satisfied if the x-subsystem, i.e., (7.77), is monotone according to (7.75) and h is a monotonically increasing function

of x. The second assumption implies that for $\Sigma_2$, increasing y will cause a decrease in w. In other words, the static input-output characteristics $K_y(\cdot)$ and $K_w(\cdot)$ are monotonically increasing and decreasing, respectively. The feedback closure network of the two subnetworks $\Sigma_i$ ($i = 1, 2$) is shown in Figure 4.12 and has the form

$$\Sigma : \begin{cases} \dot x = f_x(x, h_z(z)), \\ \dot z = f_z(z, h_x(x)). \end{cases} \quad (7.79)$$

The first two conditions imply that the network Σ, which need not be monotone, can be decomposed into two open-loop modules $\Sigma_1$ and $\Sigma_2$ with opposite monotonicity, as shown in Figure 7.17. In other words, a non-monotone network functioning as an oscillator can be constructed and designed by integrating two monotone modules $\Sigma_1$ and $\Sigma_2$ under delayed negative feedback. The third condition implies that for each constant input and any initial condition, each module will converge asymptotically to a global equilibrium. Although this condition is not trivial to prove rigorously, even for a system of differential equations describing a relatively simple signaling network, it may seem evident from the viewpoint of biochemistry. In addition, it is worth noting that the boundedness of trajectories is generally satisfied in biochemical models because of the conservation of mass and other constraints in a cell.

Figure 7.17 Schematic description of I/O monotone modules (7.77)–(7.78) under negative feedback. (a) The incidence graph of $\Sigma_1$. (b) The incidence graph of $\Sigma_2$. (c) The feedback closure of $\Sigma_1$ and $\Sigma_2$, i.e., the incidence graph of Σ (from (Wang et al. 2006a)).

Oscillatory behavior is a strongly nonlinear phenomenon, and thus linear stability theory generally does not work. Although bifurcation analysis is a powerful tool for investigating oscillatory behavior, it can only reveal its local existence and does not provide any qualitative insight into the source of the oscillations. A simple approach was therefore established that can identify the regulation mechanism underlying oscillatory behavior by linear stability analysis alone; it relates the oscillatory behavior of a network to the destabilization of a steady-state in a simple discrete map (Wang et al. 2006a, Angeli and Sontag 2004b). When the source of the instability of the steady-state, rather than the oscillatory behavior itself, is examined, linear stability analysis and feedback control theory can be employed.

The presence of time delays is an inevitable feature of biomolecular systems, and time delays generally enrich the possible dynamics and increase the mathematical complexity. By considering the delays in the input and output variables, the correspondence between (7.79) and a discrete map defined by (4.91), i.e.,

$$w_{k+1} = (K_w \circ K_y)(w_k), \quad (7.80)$$

evolving in $U_x$, can be established. In other words, the map preserves the qualitative characteristics of the network Σ. The non-monotone network Σ is thus composed of the two open-loop monotone modules (7.77) and (7.78) combined with the connectivity (7.80) between them. It has been shown that (7.79) has a globally attractive equilibrium provided that (7.80) has a unique globally attractive steady-state (see Theorem 4.7 in Chapter 4). This provides a sufficient condition for the global asymptotic stability of an equilibrium, and hence its violation is a necessary condition for the existence of periodic solutions in (7.79).
It can be shown that the attractors of (7.80) are composed of only periodic orbits or steady-states, as shown in Figure 4.13, and the correspondence between the attractors of (7.79) and (7.80) can be further established, which makes the map ideal as an indicator of when a specific oscillation will occur in (7.79) (Wang et al. 2006a). More precisely, oscillatory behavior in (7.79) can be related to the properties of the two open-loop modules by determining the effect of the interactions on the destabilization of the steady-state in (7.80). The oscillatory behavior in (7.79), which is a strongly nonlinear phenomenon, can thus be traced to the instability of the steady-state in (7.80), which can easily be determined by linear stability analysis. According to Lyapunov stability theory, the stability of the unique steady-state in (7.80) can in most cases be analyzed from the linearization at the steady-state. Thus, linear system theory can, in principle, predict the mechanisms giving rise to destabilization of the unique steady-state, i.e., the emergence of oscillatory behavior in (7.80), because its attractors are composed of only periodic orbits or steady-states. Using the correspondence between the continuous and discrete systems, the oscillatory behaviors in (7.79) can also be obtained. This property holds if there are no additional bifurcation points between the original bifurcation point and the considered point

in the parameter space. Otherwise, one can choose and change one or more parameters to move the system closer to the original bifurcation point prior to the analysis. Accordingly, the mechanisms causing destabilization of the steady-state in (7.80) can be analyzed; i.e., it will be unstable if at least one of the eigenvalues of A has modulus greater than 1, where $A = \partial(K_w \circ K_y)/\partial w|_{w_e}$ and $w_e$ is the unique steady-state of (7.80).

Figure 7.18 Schematic depiction of the destabilization mechanism (from (Wang et al. 2006a)).

The destabilization of the steady-state in (7.80) can be realized as follows. For any chosen parameter p, denoting the w-coordinate of the steady-state by $w_e(p)$, one can always choose an initial input $w_0(p)$ with $0 < w_0(p) - w_e(p) < \epsilon$, where ε is sufficiently small. When $w_2(p) > w_0(p)$, $w_e$ will be unstable and (7.79) will be oscillatory for appropriate delays; otherwise, it will be stable, and so will be the equilibrium of (7.79). Moreover, when $w_2(p) < w_0(p)$, one can adjust p until $w_2(p) > w_0(p)$ holds; then an oscillation will occur in (7.80) and in (7.79), as shown in Figure 7.18. Therefore, this technique is useful for constructing a cellular oscillator.

The technique can also be used to control the amplitude of an oscillation. For $w_0(p) - w_e(p) = \alpha > 0$, when $w_2(p) > w_0(p)$, the amplitude of the obtained oscillation will be larger than α; otherwise, the map either oscillates with an amplitude smaller than α or converges to a steady-state. The same results hold for (7.79) with appropriate delays. In addition to indicating oscillations, an amplitude robust to changes in the delays can also be obtained from (7.80). By understanding the properties of each module and the interactions within the modules that can induce the destabilization in (7.80), a cellular oscillator with networks interlocked by two I/O monotone modules under delayed negative feedback can be constructed.
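The $w_0/w_2$ test above can be sketched numerically for a map of the kind used later in this section: $F = K_w \circ K_y$ built from decreasing Hill functions, with parameters matching the illustrative values of Figure 7.20 ($n = 2$, $\alpha_0 = 1$, $\alpha_2 = \alpha_3 = 2.5$) and $p = \alpha_1$ the tuned parameter. The perturbation size and the bisection bracket are arbitrary choices.

```python
# Sketch of the w0/w2 destabilization test for the composed map F = K_w o K_y
# built from Hill functions (illustrative parameters; p = alpha1 is tuned).

def F(w, alpha1, n=2, alpha0=1.0, alpha2=2.5, alpha3=2.5):
    p3 = alpha3 / (alpha0 * (1.0 + w ** n))       # input-state step of module 1
    y = alpha2 / (alpha0 * (1.0 + p3 ** n))       # K_y(w)
    return alpha1 / (alpha0 * (1.0 + y ** n))     # K_w(y)

def w2_exceeds_w0(alpha1, eps=1e-3):
    # Find the unique steady-state w_e by bisection: F is decreasing,
    # so w - F(w) is strictly increasing and has a single root in [0, 10].
    lo, hi = 0.0, 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if mid - F(mid, alpha1) < 0:
            lo = mid
        else:
            hi = mid
    we = 0.5 * (lo + hi)
    w0 = we + eps                                 # perturbed initial input
    w2 = F(F(w0, alpha1), alpha1)                 # two iterations of (7.80)
    return w2 > w0                                # True signals destabilization

print(w2_exceeds_w0(1.5), w2_exceeds_w0(2.5))
```

Two iterations are compared because F is decreasing: a single step moves the iterate to the other side of $w_e$, while $w_2$ lies on the same side as $w_0$, so $w_2 > w_0$ directly reads off whether $|F'(w_e)|^2 > 1$.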
Many well-known models can fall

into the category of interlocked feedback networks, e.g., Goldbeter's minimal model (Goldbeter 1995, Angeli and Sontag 2004b), the repressilator (Elowitz and Leibler 2000, Wang et al. 2006a), and Goldbeter's dual-loop model (Leloup and Goldbeter 1998, Wang et al. 2007); these models can thus be analyzed by using such a technique.

Consider the repressilator, a synthetic oscillator with the genes ci, tetR, and lacI, as an example. The construction of the repressilator and the emergence of oscillatory behavior will be illustrated stepwise, although the repressilator can also be analyzed by the general CFNs. The module $\Sigma_1$ is constructed as follows: the first repressor protein, LacI from E. coli, inhibits the transcription of the second repressor gene, tetR from the tetracycline-resistance transposon Tn10, whose protein product in turn inhibits the expression of the third gene, ci from λ phage. Moreover, there is an input variable w and an output variable y. The regulations from the input w to the first repressor protein LacI and from the protein CI to the output y are assumed to be positive so that the monotonicity conditions are satisfied. The incidence graph of $\Sigma_1$ with input w and output y is shown in Figure 7.19 (a). The module $\Sigma_1$ can be described by

$$\dot p_1 = \beta w - \beta p_1, \quad (7.81)$$
$$\dot m_3 = \frac{\alpha_3}{1 + p_1^n} - \alpha_0 m_3, \quad (7.82)$$
$$\dot p_3 = \beta m_3 - \beta p_3, \quad (7.83)$$
$$\dot m_2 = \frac{\alpha_2}{1 + p_3^n} - \alpha_0 m_2, \quad (7.84)$$
$$\dot p_2 = \beta m_2 - \beta p_2, \quad (7.85)$$

with input w and output $y = p_2$, where $p_1$, $p_3$, and $p_2$ denote the proteins LacI, TetR, and CI, respectively, and $m_3$ and $m_2$ denote the mRNAs of the genes tetR and ci, respectively. It is easy to show that for any positive parameters β and $\alpha_i$ ($i = 0, 2, 3$) and any constant input $\bar w$, there is a unique globally asymptotically stable equilibrium $(\bar w, \bar m_3, \bar p_3, \bar m_2, \bar p_2)$ with negative eigenvalues $(-\beta, -\beta, -\beta, -\alpha_0, -\alpha_0)$, where $\bar m_3 = \bar p_3 = \alpha_3/(\alpha_0(1+\bar w^n))$ and $\bar m_2 = \bar p_2 = \alpha_2/(\alpha_0(1+(\bar p_3)^n))$.
Hence, its static input-output characteristic can be defined as

$$K_y(\bar w) := \alpha_2/(\alpha_0(1 + (\bar p_3)^n)). \quad (7.86)$$

The module $\Sigma_2$ is composed of only the gene lacI, with inhibitory input y and activating output w. It is described by the scalar differential equation

$$\dot m_1 = \frac{\alpha_1}{1 + y^n} - \alpha_0 m_1 \quad (7.87)$$

with input y and output $w = m_1$. Its monotonicity and the existence of an equilibrium are clear. Its incidence graph is shown in Figure 7.19 (b). Its static input-output characteristic is defined as
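As a sanity check on (7.86), the open-loop module $\Sigma_1$ can be integrated numerically to its steady state under a constant input and compared against the closed-form characteristic. A minimal forward-Euler sketch, with illustrative parameter values ($n = 2$, $\beta = 2$, $\alpha_0 = 1$, $\alpha_2 = \alpha_3 = 2.5$, $\bar w = 1$; the step size and horizon are arbitrary choices):

```python
# Euler integration of the open-loop module (7.81)-(7.85) under constant
# input, compared with the static characteristic K_y of (7.86).

n, beta, a0, a2, a3 = 2, 2.0, 1.0, 2.5, 2.5
wbar = 1.0                                        # constant input

def K_y(w):
    p3 = a3 / (a0 * (1.0 + w ** n))               # equilibrium m3 = p3
    return a2 / (a0 * (1.0 + p3 ** n))            # equilibrium output p2

p1 = m3 = p3 = m2 = p2 = 0.0                      # start from rest
dt = 0.01
for _ in range(int(60.0 / dt)):                   # integrate to t = 60
    dp1 = beta * wbar - beta * p1
    dm3 = a3 / (1.0 + p1 ** n) - a0 * m3
    dp3 = beta * m3 - beta * p3
    dm2 = a2 / (1.0 + p3 ** n) - a0 * m2
    dp2 = beta * m2 - beta * p2
    p1 += dt * dp1; m3 += dt * dm3; p3 += dt * dp3
    m2 += dt * dm2; p2 += dt * dp2

print(round(p2, 4), round(K_y(wbar), 4))          # prints the same value twice (≈ 0.9756)
```

Because the module is a stable cascade (all eigenvalues are $-\beta$ or $-\alpha_0$), the integrated output $p_2$ settles on $K_y(\bar w)$ for any initial condition, which is exactly the content of the static input-output characteristic.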

Figure 7.19 The construction of the repressilator: (a) the monotone open-loop module $\Sigma_1$; (b) the monotone open-loop module $\Sigma_2$; (c) the repressilator with three mRNAs $m_i$ and three proteins $p_i$ ($i = 1, 2, 3$). The arrows and bar heads indicate positive and negative regulation, respectively. Time delays $\tau_{p_j}$ and $\tau_{m_i}$ are omitted (from (Wang et al. 2006a)).

$$K_w(\bar y) := \alpha_1/(\alpha_0(1 + \bar y^n)). \quad (7.88)$$

According to the regulation between the components of the two modules, i.e., the repressor protein CI inhibits the transcription of the repressor gene lacI, which drives the synthesis of the repressor protein LacI, the repressilator can be constructed by combining (7.81)–(7.85) and (7.87) and closing the feedback loop, as shown in Figure 7.19 (c). The repressilator Σ without delays can thus be described as

$$\dot m_i = \frac{\alpha_i}{1 + p_j^n} - \alpha_0 m_i, \quad (7.89)$$
$$\dot p_i = \beta m_i - \beta p_i, \quad (7.90)$$

where i and j take the following three pairs of values: $(i=1, j=2)$, $(i=2, j=3)$, and $(i=3, j=1)$. By introducing delays in the input and output components to represent the slow processes of transcription, translation, and transportation of the molecules between the nucleus and the cytoplasm, and by using the transformation developed in (Wang et al. 2005), the correspondence between the repressilator described by the delayed differential equations

$$\dot m_i(t) = \frac{\alpha_i}{1 + p_j(t - \tau_{p_j})^n} - \alpha_0 m_i(t), \quad (7.91)$$
$$\dot p_i(t) = \beta m_i(t - \tau_{m_i}) - \beta p_i(t), \quad (7.92)$$

and the discrete map described by (7.80) can be established, according to $K_y$ and $K_w$ defined by (7.86) and (7.88), respectively. The total time delay is defined as $\tau = \sum_{i=1}^{3} \tau_{m_i} + \sum_{i=1}^{3} \tau_{p_i}$. It is worth noting that the repressilator with delays introduced only in the input and output components is qualitatively equivalent to that with delays introduced in all components, due to its special structure and the transformation developed in (Wang et al. 2005).

Two scenarios of the input-output characteristics in the (w, y) plane are illustrated in Figure 7.20 to show the convergence of (7.80) to a steady-state and to a periodic orbit, respectively. According to the correspondence, an oscillation will also occur with appropriate delays in (7.91)–(7.92). The oscillations corresponding to Figure 7.20 (b), with the amplitude $y_A - y_D$ for the protein TetR, at different time delays are shown in Figure 7.21 (a). Therefore, (7.80) can be used not only to indicate the occurrence of oscillations but also to detect the amplitudes that are robust to variation in the delays. The robustness of the amplitude against variation in the delays for different α is shown in Figure 7.21 (b). When τ is small, either no oscillation occurs or the amplitude of the oscillations is small. The amplitude of the oscillations increases with increasing τ. Eventually, the amplitude remains almost constant and is robust to variations in τ. Such robust amplitudes are obtained from the iterations of equilibria in the two modules, where the equilibria are robust to variations in the respective delays (Kobayashi et al. 2003). Oscillations with other amplitudes are therefore sensitive to variations in τ and thus show poor robustness.

Figure 7.20 The two different asymptotic states in (7.80).
Convergence to an asymptotically stable steady-state at α 1 =1.5 (a) and to an asymptotically stable periodic orbit at α 1 =2.5 (b). Other parameters are n =2,β =2,α 2 = α 3 =2.5, and α 0 = 1 (from (Wang et al. 2006a))
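The convergence of the iterated input–output map (7.80) can be checked numerically. The sketch below is a minimal reading of the two module characteristics, assuming that Σ_2 is a single repression stage whose steady state is K_w from (7.88) and that Σ_1 is the composition of the two remaining repression stages; the function names `hill_repression`, `composed_map`, and `iterate_map` are ours, not the book's.

```python
def hill_repression(alpha, x, n=2, alpha0=1.0):
    """Steady-state output of one repression stage: alpha / (alpha0 * (1 + x^n))."""
    return alpha / (alpha0 * (1.0 + x ** n))

def composed_map(w, alpha1, alpha2=2.5, alpha3=2.5):
    """One round trip through the loop: w -> K_y(w) -> K_w(K_y(w))."""
    y = hill_repression(alpha3, hill_repression(alpha2, w))  # assumed two-stage K_y
    return hill_repression(alpha1, y)                        # K_w from (7.88)

def iterate_map(alpha1, w0=1.0, steps=1000):
    w = w0
    for _ in range(steps):
        w = composed_map(w, alpha1)
    return w

# alpha1 = 1.5: the iterates contract onto a fixed point (steady state).
w_fix = iterate_map(1.5, steps=5000)
# alpha1 = 2.5: the iterates settle on a period-2 orbit, signaling oscillation.
w_osc = iterate_map(2.5)
```

With α_1 = 2.5 the iterates alternate between two well-separated values (essentially 0.5 and 2.0 for these parameters), mirroring the periodic orbit of Figure 7.20 (b); with α_1 = 1.5 the map contracts onto a single point, but with a multiplier close to one, which is why more iterations are needed to determine convergence in that case.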

Figure 7.21 Oscillations with different delays. (a) Different oscillations with the same amplitude but different periods for different delays at α = 2.5. (b) Bifurcation diagrams with τ as a bifurcation parameter at α = 2.5 (dotted) and α = 3.5 (solid), respectively, where the maximum and minimum peak values of protein TetR are shown (from (Wang et al. 2006a))

The correspondence of bifurcation diagrams showing the maximum and minimum peak values of protein TetR as a function of α = α_i (i = 1, 2, 3) for (7.80) and (7.91)–(7.92) at τ = 30, 60, and 90 min is demonstrated in Figure 7.22 (a). The bifurcation diagrams and the maximum and minimum peak values are identical for both cases. Therefore, the dynamical behavior of the repressilator with an appropriate delay is determined by (7.80). Although the delay is important for producing oscillations, it has little effect on the bifurcation diagrams. In other words, the delay affects only the periods, not the amplitude. Because of the opposite monotonicity of the two modules, both of them have only one equilibrium. The destabilization of the steady state in (7.80) indicates a stable oscillation in (7.91)–(7.92) for an appropriate τ. Note that when τ is small, (7.79) and (7.80) may have different bifurcation diagrams; in this case, both the amplitude and the period are sensitive to delay variations.

The regulation mechanism can also be understood from (7.80) so as to control the system behavior. The parameter values in one module are kept constant, and the effects of parameter variations in the other module are discussed; when the parameters in both modules vary, a similar discussion can be made. The regulation mechanism is illustrated in Figure 7.22 (b) to show why the different dynamics of Figure 7.20 can emerge, where α_1 in Σ_2 is chosen as a parameter; a change in α_1 does not affect the dynamics of Σ_1.
Two different K_w^{−1} curves, denoted by K_wP^{−1} and K_wQ^{−1}, are represented by solid and dashed lines in Figure 7.22 (b), respectively, at α_1P = 2.5 and α_1Q = 1.5. For the same initial input w_0, two identical outputs S_y and T_y from Σ_1 with S_y = T_y, and two different w_1P = S_w and w_1Q = T_w with S_w > T_w, are obtained at α_1P and α_1Q. Finally, two different w_2P = P_w and w_2Q = Q_w with P_w > w_0 and Q_w < w_0 are derived at α_1P and α_1Q based on the monotonicity of K_y and K_w^{−1}. Here, w_2 > w_0 implies that (7.80) and (7.91)–(7.92) with

an appropriate delay τ will converge to a periodic orbit. Although w_2 < w_0 does not necessarily mean that the map will converge to a steady state, this is true when 0 < |w_0 − w_e| < ε is satisfied with sufficiently small ε. In other words, by choosing w_0 with 0 < |w_0 − w_e| < ε and increasing α_1 until w_2 > w_0, the repressilator will become oscillatory with an appropriate delay.

Figure 7.22 The bifurcation diagrams and the regulation mechanism. (a) The bifurcation diagrams of (7.80) and (7.91)–(7.92) with α = α_1 = α_2 = α_3 as a bifurcation parameter for different τ. (b) The regulation mechanism can be derived from the input–output characteristics and can thereby be used to detect oscillations and control the system dynamics. For α_1 = 1.5 and α_1 = 2.5, we obtain two identical K_y curves, two different K_w^{−1} curves denoted by K_wQ^{−1} and K_wP^{−1}, and two different w_2, namely w_2Q and w_2P with w_2Q < w_0 < w_2P. Therefore, an oscillation must occur at α_1 = 2.5 for an appropriate τ. On the other hand, for α_1 = 1.5, more iterations are needed to determine the convergence (from (Wang et al. 2006a))

Although the approach above is mainly applicable to the case of a single input and a single output, extension to cases of multiple inputs and outputs is possible (Wang et al. 2007). Consider Goldbeter's dual-loop model as an example. As shown in Figure 7.8, the first module is the mRNA subsystem, described by

dM_P/dt = v_sP K_IP^n/(K_IP^n + y^n) − v_mP M_P/(K_mP + M_P) − k_d M_P, (7.93)
dM_T/dt = v_sT K_IT^n/(K_IT^n + y^n) − v_mT M_T/(K_mT + M_T) − k_d M_T, (7.94)

with input y and output w = (w^(1), w^(2)) = (h_M1(M_P), h_M2(M_T)) = (M_P, M_T). The second module is the protein subsystem, described by

dP_0/dt = k_sP w^(1) − V_1P P_0/(K_1P + P_0) + V_2P P_1/(K_2P + P_1) − k_d P_0,
dP_1/dt = V_1P P_0/(K_1P + P_0) − V_2P P_1/(K_2P + P_1) − V_3P P_1/(K_3P + P_1) + V_4P P_2/(K_4P + P_2) − k_d P_1,
dP_2/dt = V_3P P_1/(K_3P + P_1) − V_4P P_2/(K_4P + P_2) − k_3 P_2 T_2 + k_4 C − v_dP P_2/(K_dP + P_2) − k_d P_2,
dT_0/dt = k_sT w^(2) − V_1T T_0/(K_1T + T_0) + V_2T T_1/(K_2T + T_1) − k_d T_0,
dT_1/dt = V_1T T_0/(K_1T + T_0) − V_2T T_1/(K_2T + T_1) − V_3T T_1/(K_3T + T_1) + V_4T T_2/(K_4T + T_2) − k_d T_1,
dT_2/dt = V_3T T_1/(K_3T + T_1) − V_4T T_2/(K_4T + T_2) − k_3 P_2 T_2 + k_4 C − v_dT T_2/(K_dT + T_2) − k_d T_2,
dC/dt = k_3 P_2 T_2 − k_4 C − k_1 C + k_2 C_N − k_dC C,
dC_N/dt = k_1 C − k_2 C_N − k_dN C_N,

with input w = (w^(1), w^(2)) and output y = h_P(C_N) = C_N. We focus on the core delayed negative feedback loop established by per and tim; i.e., when the two modules are closed by delayed network feedback, the closed-loop system takes the following form:

dM_P/dt = v_sP K_IP^n/(K_IP^n + C_N^n(t − τ)) − v_mP M_P/(K_mP + M_P) − k_d M_P,
dP_0/dt = k_sP M_P − V_1P P_0/(K_1P + P_0) + V_2P P_1/(K_2P + P_1) − k_d P_0,
dP_1/dt = V_1P P_0/(K_1P + P_0) − V_2P P_1/(K_2P + P_1) − V_3P P_1/(K_3P + P_1) + V_4P P_2/(K_4P + P_2) − k_d P_1,
dP_2/dt = V_3P P_1/(K_3P + P_1) − V_4P P_2/(K_4P + P_2) − k_3 P_2 T_2 + k_4 C − v_dP P_2/(K_dP + P_2) − k_d P_2,
dM_T/dt = v_sT K_IT^n/(K_IT^n + C_N^n(t − τ)) − v_mT M_T/(K_mT + M_T) − k_d M_T,
dT_0/dt = k_sT M_T − V_1T T_0/(K_1T + T_0) + V_2T T_1/(K_2T + T_1) − k_d T_0,
dT_1/dt = V_1T T_0/(K_1T + T_0) − V_2T T_1/(K_2T + T_1) − V_3T T_1/(K_3T + T_1) + V_4T T_2/(K_4T + T_2) − k_d T_1,
dT_2/dt = V_3T T_1/(K_3T + T_1) − V_4T T_2/(K_4T + T_2) − k_3 P_2 T_2 + k_4 C − v_dT T_2/(K_dT + T_2) − k_d T_2,
dC/dt = k_3 P_2 T_2 − k_4 C − k_1 C + k_2 C_N − k_dC C,
dC_N/dt = k_1 C − k_2 C_N − k_dN C_N.
The delayed dual-loop model shows some ideal properties: when delays are introduced in the negative feedback, the coexistence of an equilibrium and a periodic oscillation, the coexistence of two periodic oscillations, and the coexistence of a periodic oscillation and chaos all disappear, and only a periodic oscillation can exist, as shown in Figure 7.23. Some other properties, e.g., the correspondence of bifurcation diagrams between the continuous system and the discrete map and the robustness of the amplitude to delays, can also be found. See (Wang et al. 2007) for more details.

7.4 Design of Molecular Oscillators with Hybrid Networks: General Formalism

As shown in previous chapters, a general procedure for the design of PFNs guarantees stable switching states without any non-equilibrium dynamics, thereby making theoretical analysis and design of switching networks tractable even for large-scale systems with time delays. Meanwhile, a CFN satisfying certain conditions can converge to periodic oscillations (Wang et al. 2005,

Figure 7.23 The global oscillations induced by delayed negative feedback, shown along with the bifurcation diagrams. (a)–(c) Transition from the coexistence of an equilibrium and a periodic oscillation, the coexistence of two periodic oscillations, and the coexistence of a periodic oscillation and chaos, respectively, to global oscillations. (d) Correspondence of bifurcation diagrams between the continuous system and the discrete map (from (Wang et al. 2006a))

Chen and Wang 2006). Although the original CFNs have been extended to more general ones, the specific structures still considerably limit their applications. Explicitly considering all components and biochemical reactions in a biomolecular network is unrealistic from the viewpoint of modeling, analysis, and computation. However, different time scales characterize the various cellular regulatory processes, and this can be exploited to reduce the complexity of mathematical models (Chen and Aihara 2002b, Hasty et al. 2002a, Ciliberto et al. 2007). For example, the transcription and translation processes in genetic networks generally evolve on a time scale that is much slower than that of the phosphorylation, dimerization, and binding reactions of TFs in protein networks. In addition, although the dynamics is intertwined between gene networks, signal transduction networks, and metabolic networks, interactions within each network are generally more active than those between them, or

they are relatively independent. Such properties can also be exploited to simplify a biomolecular network, provided the behavior of the simplified network is guaranteed to be qualitatively and quantitatively identical to that of the original network.

According to the convergence properties of PFNs and CFNs and the multiple time scales of the different processes, a methodology to construct and analyze cellular oscillators with time delays was developed (Wang et al. 2004). A multiple time scale network (MTN) is composed of a series of CFNs and multiple PFNs. The PFNs are mainly constituted by fast reactions, whereas the CFNs consist of slow reactions. According to the different convergence properties of positive and cyclic feedback networks, it can be proven that an MTN satisfying certain conditions has no stable equilibria but has stable periodic oscillations, depending on the total time delay of the CFN, even though it has a complicated network structure including both positive and negative feedback loops. Such a property is clearly ideal for designing and modeling biological oscillators. Since there is little restriction on the network structure of an MTN, it can be used in several applications for the modeling, analysis, and design of cellular oscillators.

A basic MTN consists of a fast PFN and a slow CFN. Assume that there are m fast variables y = (y_1, ..., y_m) ∈ R_+^m and p slow variables x = (x_1, ..., x_p) ∈ R_+^p, representing the concentrations of chemical components at time t ∈ R, where p ≥ 2. Then, (3.1) can be rewritten as

ẋ(t) = f(x_τx, y_τy), (7.95)
εẏ(t) = g(x_τx, y_τy), (7.96)

where ε is a small positive real parameter, x_τx = x(t − τ_x), and y_τy = y(t − τ_y). The system (7.95)–(7.96) is called a singularly perturbed system and is also known as a fast–slow system with slow x and fast y. Such multiple time scale properties are found in many biochemical systems, especially gene regulatory and metabolic systems.
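The singular perturbation structure of (7.95)–(7.96) can be illustrated with a toy fast–slow pair. The sketch below is not one of the book's networks; it is a hypothetical one-slow/one-fast example in which the fast variable y relaxes to a unique stable quasi-steady state h(x) = x/(1 + x²) (the analogue of the PFN convergence property), so that the reduced system in the sense of (7.99) becomes ẋ = −x + h(x).

```python
EPS = 0.01  # time-scale separation parameter (epsilon)

def simulate_full(x0=2.0, y0=0.0, dt=1e-3, t_end=5.0):
    """Explicit Euler on the pair: x' = -x + y, eps * y' = -(y - x/(1+x^2))."""
    x, y = x0, y0
    for _ in range(int(t_end / dt)):
        # simultaneous update: right-hand sides use the old (x, y)
        x, y = (x + dt * (-x + y),
                y + dt * (-(y - x / (1.0 + x * x)) / EPS))
    return x, y

def simulate_reduced(x0=2.0, dt=1e-3, t_end=5.0):
    """Euler on the reduced system x' = -x + h(x), obtained by setting eps = 0."""
    x = x0
    for _ in range(int(t_end / dt)):
        x = x + dt * (-x + x / (1.0 + x * x))
    return x
```

After a fast initial transient of duration O(ε), the full trajectory is slaved to the slow manifold y ≈ h(x), and the slow variable agrees with the reduced system to O(ε); this is exactly the behavior the implicit-function-theorem argument below formalizes.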
Assume that (7.96) is a PFN for a fixed x_τx, that (7.95) has a CFN structure except for those parts interacting with y_τy, and that there are two neighboring variables in x_τx affecting y_τy, i.e., the PFN. This implies that all loops in (7.96) are positive for a fixed x_τx and that (7.95) has the structure or partial structure of a cyclic network, except for the parts interacting with y_τy. Figure 7.24 shows a schematic of an example of an MTN, where all PFNs evolve on a much faster time scale than the other components, and C_i is a CFN or its partial structure. Note that all C_i's have to be connected in series, whereas the PFNs can be connected in any form, e.g., in series, in parallel, or in hybrid forms. Moreover, owing to the difference in time scales, time delays in the different subnetworks have different effects on the dynamical properties. A PFN is robust to time delays, while time delays in a CFN may significantly affect the dynamics of the network. Such properties actually hold whenever different time scales are present. In other words, we do not need to consider the time delays in the fast positive feedback subnetworks when analyzing and designing molecular oscillators, e.g., gene oscillators, although they may influence the transient dynamics.

When ε = 0, (7.95)–(7.96) degenerate to a set of only p functional differential equations, i.e., (7.95) with the following constraint:

0 = g(x_τx, y_τy). (7.97)

According to the convergence properties of PFNs, for a fixed x_τx, (7.96) converges to a stable equilibrium E_0 = {y_0(x_τx)}. Let K denote the set of solutions of (7.97). Since (7.96) is a PFN that is irreducible, ∂g/∂y is negative definite in K, and hence det(∂g/∂y) ≠ 0, or rank(∂g/∂y) = m, at the point E_0 of K. By the implicit function theorem, there exist neighborhoods A_o of x_E0 and B_o of y_E0 and a unique smooth mapping h: A_o → B_o such that g(x_τx, h(x_τx)) = 0 for all x_τx ∈ A_o. Therefore, locally around (x_E0, y_E0), the degenerate system (7.95) and (7.97) is equivalent to a p-dimensional FDE defined on the graph of the mapping h, i.e., on the set

S = {(x_τx, y_τy) ∈ A_o × B_o : y_τy = h(x_τx)} (7.98)

and represented by the equation

ẋ(t) = f(x_τx, h(x_τx)) ≡ f̂(x_τx). (7.99)

This system is called a reduced system. Without loss of generality, for the MTN described by (7.95)–(7.96), we assume that the (p−1)th and the pth nodes are two neighboring slow chemical components, which connect with the fast chemical components. We also assume that the reduced network defined by (7.99) is a CFN.

Theorem 7.8. An orbitally and asymptotically stable periodic solution x = Φ(t) of (7.99) is stable under persistent perturbations. Moreover, under the assumptions that (7.96) is a PFN for a fixed x_τx and that the reduced network defined by (7.99) is a CFN, for a sufficiently small ε, x = Φ(t) is a stable periodic solution of (7.95)–(7.96).
The reduction from an MTN to a CFN can be carried out as follows. The protein multimers and their complexes can be eliminated by utilizing the inherent separation of time scales, because the multimerization processes are governed by rate constants that are extremely fast with respect to cellular growth and transcription. In other words, the fast reactions are assumed to converge to their equilibria rapidly, and thus all fast variables can be eliminated from the MTN. The reduced MTN has the structure of a CFN. According to the conditions for a CFN to converge to periodic oscillations, oscillatory behavior in the MTN can be approximately analyzed. Although it is generally very difficult to guarantee asymptotic behavior such as equilibria and periodic orbits even for a small network, because of the nonlinearity of the system, it becomes much easier to analyze the dynamical properties if the

Figure 7.24 Schematic illustration of a multiple time scale network (MTN) and of the reduced MTN (a CFN). The PFNs (PFN-1 to PFN-5) evolve on a much faster time scale than the other components, which evolve on a slower time scale and are denoted C_i. Each C_i is a CFN or its partial structure, and all C_i's are connected in series. After all the PFNs are eliminated, the reduced network is a CFN (from (Chen and Wang 2006))

Figure 7.25 Schematic illustration of a gene regulatory network. Proteins p_x (CI) and p_y (Lac) and mRNAs m_x (ci) and m_y (lac) constitute a slow CFN. The other chemical species, such as the CI dimer, the Lac dimer, and the Lac tetramer, constitute two fast PFNs (from (Wang et al. 2004))

reduced MTN has the same structure as that of a CFN, obtained by eliminating all fast PFNs from the original MTN. See (Wang et al. 2004) for more details on the theoretical analysis.

Consider as an example a simple two-gene network with genes ci and lac under the control of the promoters P_L lacO1 and PRM*, respectively. It consists of two fast PFNs and one slow CFN, as shown in Figure 7.25. The two genes are both well-characterized transcriptional regulators, which can be found in the bacterium E. coli and in λ phage. Assume that the network is implemented in a eukaryotic cell, e.g., yeast, so as to examine the effect of time delays on oscillatory behavior. The mRNA of gene ci (m_x) is translated to protein CI (p_x) in the cytoplasm, which in turn forms a homodimer p_2x and is transported or diffuses into the nucleus in the form p*_2x to enhance the expression of gene lac by binding to the two operator sites of the promoter PRM*. On the other hand, the mRNA of gene lac (m_y) is translated to protein Lac (p_y), which forms a homodimer p_2y and, further, a tetramer p_4y in the cytoplasm. When moved to the nucleus, the tetramer takes the form p*_4y, which represses the expression of gene ci by binding to the operator site of the promoter P_L lacO1. The promoter P_L lacO1 has one binding site OR for the Lac tetramer, whereas the promoter PRM* has two binding sites OR_1 and OR_2 for the CI dimer, with binding occurring preferentially first on OR_1 and then on OR_2. Note that PRM* is a mutant promoter obtained from P_RM and has no binding site for the Lac tetramer. In contrast to the case of prokaryotes, there are time delays (τ_mx, τ_my, τ_px, τ_py) because of the transportation and diffusion of mRNAs and TFs between the nucleus and the cytoplasm, which may significantly affect the dynamics of the system.
We define the following species in terms of their concentrations: m_x, mRNA of ci; p_x, CI protein; p_2x, CI dimer in the cytoplasm; p*_2x, CI dimer in the nucleus; D_y, the free DNA binding (operator) site in the promoter PRM*; p*_2x D_y, the CI dimer bound to operator site OR_1 of the promoter PRM*; p*_2x p*_2x D_y, CI dimers bound to both OR_1 and OR_2 of the promoter PRM*; m_y, mRNA of lac; p_y, Lac protein; p_2y, Lac dimer; p_4y, Lac tetramer in the cytoplasm; p*_4y, Lac tetramer in the nucleus; D_x, the free DNA binding site in the promoter P_L lacO1; p*_4y D_x, the Lac tetramer bound to the operator site OR of the promoter P_L lacO1. (The asterisk denotes the nuclear form of a species.)

The fast reactions are mainly the multimerization and binding reactions of the protein network. As indicated in Figure 7.25, the fast reactions for CI constitute a PFN:

p_x + p_x ⇌ p_2x (forward rate k_1, backward rate k_−1), (7.100)
p_2x ⇌ p*_2x (k_2, k_−2), (7.101)
p*_2x + D_y ⇌ p*_2x D_y (k_3, k_−3), (7.102)
p*_2x + p*_2x D_y ⇌ p*_2x p*_2x D_y (k_4, k_−4). (7.103)

The fast reactions for Lac also constitute a PFN:

p_y + p_y ⇌ p_2y (k_5, k_−5), (7.104)
p_2y + p_2y ⇌ p_4y (k_6, k_−6), (7.105)
p_4y ⇌ p*_4y (k_7, k_−7), (7.106)
p*_4y + D_x ⇌ p*_4y D_x (k_8, k_−8). (7.107)

On the other hand, the slow reactions involve the transcription of the mRNAs, the translation of the proteins, and the degradation of the proteins and mRNAs. The slow reactions for CI are

m_x → p_x + m_x (rate constant k_px), (7.108)
D_y → m_y + D_y (k_my0), (7.109)
p*_2x D_y → m_y + p*_2x D_y (k_my1), (7.110)
p*_2x p*_2x D_y → m_y + p*_2x p*_2x D_y (k_my2), (7.111)
m_x → 0 (d_mx), (7.112)
p_x → 0 (d_px). (7.113)

The slow reactions for Lac are

m_y → p_y + m_y (k_py), (7.114)
D_x → m_x + D_x (k_mx0), (7.115)
p*_4y D_x → m_x + p*_4y D_x (k_mx1), (7.116)
m_y → 0 (d_my), (7.117)
p_y → 0 (d_py). (7.118)

There are also conservation conditions for the total binding sites of the two promoters, i.e., D_y + p*_2x D_y + p*_2x p*_2x D_y = n_y and D_x + p*_4y D_x = n_x, where n_x and n_y are the concentrations of the genes ci and lac, respectively.

For convenience, m_x is denoted by X_1, p_x by X_2, m_y by X_3, p_y by X_4, p_2x by Y_1, p*_2x by Y_2, p*_2x D_y by Y_3, p*_2x p*_2x D_y by Y_4, p_2y by Y_5, p_4y by Y_6, p*_4y by Y_7, and p*_4y D_x by Y_8. The time evolution of the twelve-variable model is then governed by the following functional differential equations, in which all parameters and concentrations are defined with respect to the total cell volume:

dX_1/dt = k_mx0 (n_x − Y_8) + k_mx1 Y_8 − d_mx X_1, (7.119)
dX_2/dt = k_px X_1(t − τ_mx) + 2k_−1 Y_1 − 2k_1 X_2² − d_px X_2, (7.120)
dX_3/dt = k_my0 (n_y − Y_3 − Y_4) + k_my1 Y_3 + k_my2 Y_4 − d_my X_3, (7.121)
dX_4/dt = k_py X_3(t − τ_my) − 2k_5 X_4² + 2k_−5 Y_5 − d_py X_4, (7.122)
dY_1/dt = k_1 X_2² + k_−2 Y_2(t − τ_px) − k_−1 Y_1 − k_2 Y_1, (7.123)
dY_2/dt = k_2 Y_1(t − τ_px) − k_−2 Y_2 + k_−3 Y_3 − k_3 (n_y − Y_3 − Y_4) Y_2 + k_−4 Y_4 − k_4 Y_2 Y_3, (7.124)
dY_3/dt = k_3 (n_y − Y_3 − Y_4) Y_2 + k_−4 Y_4 − k_4 Y_2 Y_3 − k_−3 Y_3, (7.125)
dY_4/dt = k_4 Y_2 Y_3 − k_−4 Y_4, (7.126)
dY_5/dt = k_5 X_4² − k_−5 Y_5 − 2k_6 Y_5² + 2k_−6 Y_6, (7.127)
dY_6/dt = k_6 Y_5² − k_−6 Y_6 + k_−7 Y_7(t − τ_py) − k_7 Y_6, (7.128)
dY_7/dt = k_7 Y_6(t − τ_py) − k_−7 Y_7 − k_8 Y_7 (n_x − Y_8) + k_−8 Y_8, (7.129)
dY_8/dt = k_8 (n_x − Y_8) Y_7 − k_−8 Y_8, (7.130)

where the Y_i are fast variables and ε is not explicitly written in (7.119)–(7.130). It is easy to check that the two groups of fast reactions form two PFNs for fixed slow variables. By assuming that the fast reactions converge to their equilibria rapidly, all fast variables can be eliminated. To demonstrate the example clearly, we derive the reduced MTN explicitly, although this is not necessary in general. In particular, by setting dY_i/dt = 0 in (7.123)–(7.130), we eliminate the fast variables as follows: Y_1 = Y_2 = K_1 X_2², Y_3 = n_y K_3 K_1 X_2²/(1 + K_3 K_1 X_2² + K_4 K_3 K_1² X_2⁴), Y_4 = n_y K_4 K_3 K_1² X_2⁴/(1 + K_3 K_1 X_2² + K_4 K_3 K_1² X_2⁴), Y_5 = K_5 X_4², Y_6 = Y_7 = K_6 K_5² X_4⁴, and Y_8 = n_x K_8 K_6 K_5² X_4⁴/(1 + K_8 K_6 K_5² X_4⁴), where K_i = k_i/k_−i (i = 1, ..., 8) and K_2 = K_7 = 1. Then, we obtain the reduced equations

dx_1/dt = (1/r_s)( k̄_mx1 n_x x_4⁴(t)/(1 + x_4⁴(t)) − d_mx x_1(t)/K_a + k_mx0 n_x ), (7.131)
dx_2/dt = (1/r_s)( k_px K_b x_1(t − τ′_mx)/K_a² − d_px x_2(t)/K_a ), (7.132)
dx_3/dt = (r/r_s)( (k̄_my1 n_y x_2²(t) + k̄_my2 σ n_y x_2⁴(t))/(1 + x_2²(t) + σ x_2⁴(t)) − d_my x_3(t)/K_b + k_my0 n_y ), (7.133)
dx_4/dt = (1/r_s)( k_py x_3(t − τ′_my)/K_b − d_py x_4(t)/K_a ). (7.134)

The dimensionless variables are scaled as follows: x_1 ← (K_8 K_6 K_5²)^{1/4} X_1, x_2 ← (K_1 K_3)^{1/2} X_2, x_3 ← (K_1 K_3)^{1/2} X_3, x_4 ← (K_8 K_6 K_5²)^{1/4} X_4, t′ ← r_s K_a t, τ′_mx ← r_s K_a τ_mx, and τ′_px ← r_s K_a τ_px, where r_s = n_x k_px r k_mx1/d_mx, r = K_b/K_a, K_a = (K_8 K_6 K_5²)^{1/4}, K_b = (K_1 K_3)^{1/2}, σ = K_4/K_3, k̄_mx1 = k_mx1 − k_mx0, k̄_my1 = k_my1 − k_my0, and k̄_my2 = k_my2 − k_my0. The reduced network described by (7.131)–(7.134) is shown in Figure 7.26.

Figure 7.26 The reduced MTN with proteins p_x (CI) and p_y (Lac) and mRNAs m_x (ci) and m_y (lac). The self-feedback loops are omitted

It is clear that when k̄_mx1 < 0, k̄_my1 > 0, and k̄_my2 > 0, (7.131)–(7.134) is a CFN with a negative cyclic feedback loop. By using a functional transformation, i.e.,

x_1(t − τ) → x′_1(t′), (7.135)
x_2(t − τ′_my) → x′_2(t′), (7.136)
x_3(t − τ′_my) → x′_3(t′), (7.137)
x_4(t) → x′_4(t′), (7.138)

we can equivalently collect all time delays into a single time delay τ = τ′_mx + τ′_my for (7.131)–(7.134), i.e.,

dx′_1/dt′ = (1/r_s)( k̄_mx1 n_x x′_4⁴(t′ − τ)/(1 + x′_4⁴(t′ − τ)) − d_mx x′_1(t′)/K_a + k_mx0 n_x ), (7.139)
dx′_2/dt′ = (1/r_s)( k_px K_b x′_1(t′)/K_a² − d_px x′_2(t′)/K_a ), (7.140)
dx′_3/dt′ = (r/r_s)( (k̄_my1 n_y x′_2²(t′) + k̄_my2 σ n_y x′_2⁴(t′))/(1 + x′_2²(t′) + σ x′_2⁴(t′)) − d_my x′_3(t′)/K_b + k_my0 n_y ), (7.141)
dx′_4/dt′ = (1/r_s)( k_py x′_3(t′)/K_b − d_py x′_4(t′)/K_a ). (7.142)

Note that τ does not include τ_px and τ_py, which are eliminated along with the fast PFNs. The parameter values are k_mx1 = 0.2 min⁻¹, K_8 = M⁻¹, n_x = 1 nM, n_y = 1 nM, K_6 = 10⁷ M⁻¹, K_5 = 10⁸ M⁻¹, k_mx0 = 3 min⁻¹, k_px = 4 min⁻¹, k_my1 = 3 min⁻¹, k_my2 = 12 min⁻¹, K_1 = M⁻¹, K_3 = M⁻¹, d_my = 5 min⁻¹, k_my0 = 2 min⁻¹, k_py = 1 min⁻¹, d_py = 2 min⁻¹, and σ = 2. According to the above parameters, the variables are scaled as X_1 (nM) ← 0.8 x_1, X_2 (nM) ← 8 x_2, X_3 (nM) ← 8 x_3, X_4 (nM) ← 0.8 x_4, and t (min) ← t′/1.37. Note that τ is also a time delay scaled by r_s K_a.

Figure 7.27 The sustained oscillations generated by the reduced network shown in Figure 7.26 (from (Wang et al. 2004))

According to the reduction process, the complex network can easily be reduced to a simple network, as shown in Figure 7.26; this network is a negative cyclic network. The reduced network consists of only four components and is relatively easy to analyze. Moreover, according to the theoretical analysis, the reduced network quantitatively maintains the dynamical properties of the original network. The sustained oscillations in the reduced network are shown in Figure 7.27. Because the fast reactions, acting as perturbations, do not change the period and amplitude over a long time span, the limit cycle oscillations represent a particularly stable mode of periodic behavior. Such stability is consistent with the robust nature of circadian clocks, which have to maintain their amplitude and period in a changing environment.

8 Multicellular Networks and Synchronization

In higher eukaryotes and multicellular organisms, intercellular communication has been shown to be very important. The biosignals received by individual cells, whether originating from other cells or from some change in the organism's physical and chemical surroundings, take various forms. Cells can sense and respond to electromagnetic signals, such as light, and to mechanical signals, such as direct contact. Individual cells usually communicate with each other using chemical signal molecules, which can dissolve in the cytosol and diffuse freely between individual cells and their extracellular medium. The signals sent and received by cells throughout their existence may be essential for the harmonious development of tissues, organs, and bodies. They may also influence the movements, information processing, and behavior of individual cells. Normal cellular function requires precise coordination of the emission and reception of signals, and dysfunction is often associated with pathological conditions. The mechanism by which cells produce, release, and then detect and respond to signals is an important aspect of intercellular communication. Besides the signals themselves, the extracellular environment of a multicellular network is also important, because individual cells must sense, respond, and adapt to modifications of their environment.

The first recognized diffusible signaling mechanism described in living organisms was autoinduction, a term reflecting the observation that the bacteria themselves were the source of the signal (Novick 2003). Through the diffusive process of the signal molecules, e.g., the autoinducer (AI), all cells are coupled, and a multicellular system is formed. The coupling is composed of three main stages: the production, release, and subsequent detection of the signal molecules.
Complex patterned structures in multicellular organisms, various kinds of social behavior, and cellular differentiation in bacteria can be attributed to intercellular communication, e.g., quorum sensing (Weiss and Knight 2000). Generally, intercellular communication is accomplished by transmitting one or more intercellular signal molecules, such as acyl-homoserine lactones, hormones, growth factors, and neurotransmitters, to neighboring cells and further integrating the signals to generate a global cellular response at the level of molecules, cells, tissues, organs, and bodies. In the detection and response processes, a signal molecule binds to a receptor protein. The activated protein acts as a TF or an enzyme, thereby triggering specific cellular activities. The ability of cells to communicate is an absolute requisite for ensuring collective behavior, such as synchronization, under an uncertain environment. Depending on the nature of the signals, distinct pathways can be used to enter individual cells. For example, hydrophobic compounds such as steroid hormones can pass through the lipid bilayer of the cells and eventually combine with receptors, which are known to be TFs regulating gene expression. Signals also diffuse through ion channels, which allow ions such as sodium, potassium, and calcium to translocate across the membrane (Höfer 1999, Koenigsberger et al. 2004). Besides communication through signaling molecules, we show that a multicellular system can be synchronized by common perturbations of the environment even without any signaling molecules between cells.

8.1 A General Multicellular Network for Deterministic Models

Collective behavior is a phenomenon whereby two or more cells adjust their motions to a common behavior due to coupling or forcing. Many researchers have studied this phenomenon experimentally, numerically, or theoretically (McMillen et al. 2002, Chen et al. 2005, Zhou et al. 2005, Yamaguchi et al. 2005, Gonze et al. 2005, Teramae and Tanaka 2004, Höfer 1999, Zhou et al. 2008, Garcia-Ojalvo et al. 2004, Kuznetsov and Kopell 2004). Collective behavior is essential for cellular organization and information processing. Quorum-sensing bacteria have revealed a widespread mechanism of collective gene expression.
By monitoring the signal molecules produced, individual bacteria can regulate their expression of group-beneficial phenotypes that guarantee an effective group outcome. Once a particular density threshold is reached, cooperative behavior is established. To describe and analyze collective behavior, a general model based on the intercellular communication mechanism by which cells produce, release, and then detect and respond to signals, as shown in Figure 8.1, was constructed (Wang et al. 2008). The model can be represented as

ẋ_i = −d_i^x(x_i) + f_i(x_i) + r_i(x_i, s_i), (8.1)
ṡ_i = −d_i^s(s_i) + p_i(x_i, s_i) + c_i(s_i, s_e), (8.2)
ṡ_e = −d_e(s_e) + c_e(s_i, s_e), (8.3)

where x_i(t) ∈ R_+^m (i = 1, ..., n) indicates the concentrations of all intracellular components of the ith cell, except the signal molecules, with degradation d_i^x(x_i). s_i(t) and s_e(t) ∈ R_+^p are the concentrations of the intracellular and extracellular signal molecules, with degradation d_i^s(s_i) and d_e(s_e), respectively.

277 8.1 A General Multicellular Network for Deterministic Models 269 The three degradation terms in (8.1) (8.3) can be either linear or nonlinear. The dynamics of an isolated cell is represented as ẋ i (t) =f i (x i (t)) d x i (x i). The term p i (x i,s i ) represents the synthesis of the signal molecules, and the term r i (x i,s i ) shows how individual cells detect and respond to the signal molecules. The coupling terms c i (s i,s e )andc e (s i,s e ) show how the signal molecules are released and diffused across cell membranes. For different i; f i, r i, d x i, ds i, p i,andc i may be the same or different, depending on whether the intrinsic and extrinsic noise and cell variances are considered. Unlike many models of coupled networks, any two cells in (8.1) (8.3) are not directly coupled but interact indirectly through a diffusive and mixing process through a common extracellular environment, which is more plausible biologically. E x t r a c e l l u l a r m e d i u m A I L u x I L u x R L u x R G e n e s C e l l m e m b r a n e Figure 8.1 Schematic representation of the quorum-sensing mechanism. The Luxtype protein catalyzes the synthesis of the signal molecule autoinducer (AI). The LuxR-type protein binds to the AI and controls the expression of target genes. (from (Wang et al. 2008)) Beginning with an initial isolated network in individual cells suggested by the knowledge of regulatory mechanisms and various modeling techniques, a specific multicellular network comprising some cells can be constructed after the intercellular coupling is included. The dynamics of individual cells may be switching or oscillatory. Besides the deterministic expression (8.1) (8.3), stochastic formulations presented in previous chapters can also be used to model multicellular networks, especially when the effects of stochastic fluctuations on the collective behavior of cells should be considered. 
For this, the coupling reactions can be expressed approximately by biochemical reactions in the form of (2.24), but in a reversible form because of the free diffusion of the signal molecules between the intracellular cytosol and the extracellular medium (Chen et al. 2005). When the QSS approximation assumption is made, i.e., ṡ e (t) =0,the extracellular concentrations of the signal molecules s e can be approximated

by −d_e(s_e) + c_e(s_i, s_e) = 0, or

s_e = h(s_i). (8.4)

Then, (8.1)–(8.3) becomes

ẋ_i = −d_i^x(x_i) + f_i(x_i) + r_i(x_i, s_i), (8.5)
ṡ_i = −d_i^s(s_i) + p_i(x_i, s_i) + c_i(s_i, h(s_i)). (8.6)

Generally, the coupling between individual cells or between subsystems and the environment is nonlinear, i.e., r_i, c_i, and c_e are nonlinear functions. When the diffusion process (8.2)–(8.3) takes the linear form, i.e., the linearly coupled model

ṡ_i = −d_i^s(s_i) + p_i(x_i, s_i) + η_int(s_e − s_i), (8.7)
ṡ_e = −d_e s_e + η_ext Σ_{j=1}^{n} (s_j − s_e), (8.8)

where s_e is assumed to degrade linearly, the approximated extracellular concentration s_e has the form (McMillen et al. 2002, Garcia-Ojalvo et al. 2004)

s_e = (n η_ext/(d_e + n η_ext)) (1/n) Σ_{j=1}^{n} s_j ≡ Q s̄, (8.9)

where η_int and η_ext are p × p diagonal matrices representing the coupling strength from cell i to the environment and from the environment to cell j, respectively, and s̄ indicates the average over all cells. When the degradation of s_e is not considered, (8.9) becomes the mean field used in (Gonze et al. 2005):

s_e = s̄ = (1/n) Σ_{j=1}^{n} s_j. (8.10)

Assuming linear degradation of s_e and using (8.7) and (8.9), (8.1)–(8.3) take the form

ẋ_i = −d_i^x(x_i) + f_i(x_i) + r_i(x_i, s_i), (8.11)
ṡ_i = −d_i^s(s_i) + p_i(x_i, s_i) + η_int(Q s̄ − s_i) = −d_i^s(s_i) + p_i(x_i, s_i) + Σ_{j=1}^{n} b_ij Γ s_j, (8.12)

where Γ = ηQ is a diagonal matrix that indicates the linkage of the variables to the coupled system, and B = [b_ij] is an n × n coupling matrix. For the symmetrical coupling of (8.11)–(8.12),

$$b_{ij} = \begin{cases} \dfrac{1}{n} & \text{if } i \neq j, \\[4pt] \dfrac{1}{n} & \text{if } i = j \text{ and } Q_i = 0, \\[4pt] \dfrac{1}{n} - \dfrac{1}{Q_i} & \text{if } i = j \text{ and } Q_i \neq 0. \end{cases} \qquad (8.13)$$

For diffusive coupling, i.e., $\sum_{j=1}^{n} b_{ij} = 0$, we have $Q_i = 1$ for all $i = 1, \ldots, n$, whereas when some $Q_i < 1$, non-diffusive coupling occurs (Wang and Chen 2005). Such a multicellular network may show rich dynamics such as synchronization (McMillen et al. 2002, Garcia-Ojalvo et al. 2004), multistability, clustering, and partial synchronization (Zhou et al. 2008, Ullner et al. 2007, Ullner et al. 2008). We mainly focus on synchronization in this chapter.

8.2 Deterministic Synchronization of Cellular Oscillators

Synchronization and possible effects of coupling on synchronization through intercellular signaling in a population of cellular oscillators have been investigated intensively in recent decades because of their biological importance and potential applications (McMillen et al. 2002, Glass 2001, Garcia-Ojalvo et al. 2004, Yamaguchi et al. 2005, Zhou et al. 2008). Synchronization in multicellular systems has received considerable interest both biologically and theoretically. Intercellular signaling has been shown to be essential for coordinated responses resulting from an integrated exchange of information in both prokaryotes and eukaryotes.

Complete Synchronization

The coupled dynamical system (8.11)–(8.12) is said to achieve complete synchronization if

$$u_1(t) = u_2(t) = \cdots = u_n(t) \equiv \phi(t), \quad \text{as } t \to \infty. \qquad (8.14)$$

Here, $u_i(t) = (x_i(t), s_i(t))$, and $\phi(t)$ can be an equilibrium, a periodic orbit, or even a non-periodic orbit such as a chaotic orbit. The synchronization manifold is defined as the hyperplane

$$\Lambda = \{u_1, u_2, \ldots, u_n \in \mathbb{R}^{m+p} \mid u_i = u_j;\ i, j = 1, 2, \ldots, n\}. \qquad (8.15)$$

When the synchronized state is not an equilibrium, complete synchronization is generally expected only for a coupled network with identical subsystems or subnetworks.
For a linearly coupled model, two cases for the coupling matrix $B$, i.e., diffusive coupling and non-diffusive coupling, are considered. Diffusive coupling means

$$\sum_{j=1}^{n} b_{ij} = 0, \quad i = 1, \ldots, n. \qquad (8.16)$$
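As a numerical sanity check, the sketch below (the function name is ours) builds the coupling matrix $B$ of (8.13) for the diffusive case $Q_i = 1$ and confirms the zero row sums of (8.16); it also exhibits the eigenvalue structure stated later in (8.19): one zero eigenvalue and all others strictly negative.

```python
import numpy as np

def coupling_matrix(n, Q):
    """Coupling matrix B = [b_ij] of (8.13): off-diagonal entries 1/n;
    diagonal entries 1/n - 1/Q_i (taken as 1/n when Q_i = 0)."""
    B = np.full((n, n), 1.0 / n)
    for i in range(n):
        if Q[i] != 0:
            B[i, i] = 1.0 / n - 1.0 / Q[i]
    return B

n = 5
B = coupling_matrix(n, np.ones(n))             # Q_i = 1: diffusive coupling
row_sums = B.sum(axis=1)                       # (8.16): every row sums to zero
eigs = np.sort(np.linalg.eigvalsh(B))[::-1]    # one zero eigenvalue, rest negative
```

For $Q_i < 1$ the row sums become $1 - 1/Q_i \neq 0$, i.e., the non-diffusive case.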

Many studies have particularly examined the synchronization problem of diffusively coupled networks, such as master stability (Pecora and Carroll 1998), global synchronization of coupled neural networks (Cheng et al. 2004), and many other phenomena (Pikovsky et al. 2001). The diffusive coupling condition (8.16) ensures that the synchronization manifold is an invariant manifold of an individual network, namely,

$$\dot{x}_i = -d_i^x(x_i) + f_i(x_i) + r_i(x_i, s_i), \qquad (8.17)$$
$$\dot{s}_i = -d_i^s(s_i) + p_i(x_i, s_i). \qquad (8.18)$$

Note that the coupling matrix $B$ is an irreducible matrix. Furthermore, it can be shown that zero is an eigenvalue of $B$ with multiplicity 1 and that all other eigenvalues of $B$ are strictly negative, i.e.,

$$\lambda_1 = 0 \quad \text{and} \quad 0 > \lambda_2 \geq \cdots \geq \lambda_n. \qquad (8.19)$$

On the other hand, non-diffusive coupling for a linearly coupled model means

$$\sum_{j=1}^{n} b_{ij} \neq 0, \quad \text{for some } i \in \{1, \ldots, n\}. \qquad (8.20)$$

In this case, the synchronization manifold of (8.11)–(8.12) is not an invariant manifold of (8.17)–(8.18). For non-diffusive coupling, few results have been reported on the characterization of network synchronization because of the difficulties in identifying the synchronization state and analyzing its stability. Moreover, the coupling matrix $B$ may have entirely different properties. To deal with such a situation, rewrite $b_{ij}$ as

$$b_{ij} = \hat{b}_{ij} + \tilde{b}_{ij}, \qquad (8.21)$$

where

$$\sum_{j=1}^{n} \tilde{b}_{ij} = 0 \qquad (8.22)$$

for $i = 1, \ldots, n$. Then (8.11)–(8.12) can be rewritten as

$$\dot{x}_i = -d_i^x(x_i) + f_i(x_i) + r_i(x_i, s_i), \qquad (8.23)$$
$$\dot{s}_i = -d_i^s(s_i) + \tilde{p}_i(x_i, s_i) + \sum_{j=1}^{n} \tilde{b}_{ij}\Gamma s_j, \qquad (8.24)$$

where

$$\tilde{p}_i(x_i, s_i) = p_i(x_i, s_i) + \sum_{j=1}^{n} \hat{b}_{ij}\Gamma s_j(t). \qquad (8.25)$$

Instead of the original individual system (8.17)–(8.18), we discuss an auxiliary individual system, i.e.,

$$\dot{x}_i = -d_i^x(x_i) + f_i(x_i) + r_i(x_i, s_i), \qquad (8.26)$$
$$\dot{s}_i = -d_i^s(s_i) + \tilde{p}_i(x_i, s_i), \qquad (8.27)$$

which has the properties of the diffusive coupling. In other words, the synchronization state $\phi(t)$ is not a solution of the original individual system (8.17)–(8.18), but a solution of the auxiliary individual system (8.26)–(8.27), due to the non-diffusive coupling condition (8.20). It is important to indicate that the dynamics of the original individual system and that of the auxiliary one may differ entirely. For instance, the original individual system may converge to an equilibrium, while the auxiliary one may converge to a periodic or even a chaotic attractor, which means that a periodically oscillatory synchronization can be realized even if the original individual system itself is neither oscillatory nor chaotic. For example, synchronized hysteresis-based oscillators can be obtained by coupling a population of toggle switches via the quorum-sensing mechanism (Kuznetsov and Kopell 2004). Using the auxiliary individual system, for the case of linear degradation, i.e., $d_i^x(x_i) = d_i^x x_i$ and $d_i^s(s_i) = d_i^s s_i$, a sufficient condition on the stability of synchronization was obtained (Wang and Chen 2005).

Theorem 8.1. For (8.23)–(8.24), assume $F = (f, \tilde{p})$ to be globally Lipschitz continuous, i.e., there exist constants $L_i$ such that

$$|F_i(u_1) - F_i(u_2)| \leq L_i \|u_1 - u_2\|, \quad i = 1, \ldots, m + p, \qquad (8.28)$$

holds for any two different $u_1$ and $u_2 \in \mathbb{R}^{m+p}$, where $\|u\|$ is a vector norm defined for a vector $u = (u_1, \ldots, u_n)^T$ by $\|u\| = \sqrt{\sum_{k} u_k^2}$.
Let the eigenvalues of the coupling matrix $\tilde{B} = [\tilde{b}_{ij}]$ be ordered as follows:

$$0 = \lambda_1 > \lambda_2 \geq \lambda_3 \geq \cdots \geq \lambda_n, \qquad (8.29)$$

with

$$\lambda(\gamma_i) = \begin{cases} \lambda_2, & \text{if } \gamma_i > 0, \\ 0, & \text{if } \gamma_i = 0, \\ \lambda_n, & \text{if } \gamma_i < 0, \end{cases} \qquad (8.30)$$

such that for all $i = 1, 2, \ldots, m + p$,

$$L_i - d_i + \gamma_i \lambda(\gamma_i) < 0, \qquad (8.31)$$

where $d_i = d_i^x$ if $i \leq m$, and otherwise $d_i = d_{i-m}^s$. Then the dynamical system (8.23)–(8.24) achieves strong synchronization, or complete (identical) synchronization; i.e., there is an invariant diagonal hyperplane $\Lambda$ defined by (8.15), which is not only attractive but also Lyapunov stable.

Clearly, complete synchronization for (8.1)–(8.3) means that, when $t \to \infty$, the dynamics asymptotically converges to the synchronization manifold (8.15).
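The sufficient condition of Theorem 8.1 is straightforward to evaluate numerically once the Lipschitz constants, degradation rates, coupling gains $\gamma_i$, and the extreme eigenvalues $\lambda_2$, $\lambda_n$ are known. The sketch below, with made-up constants, simply checks (8.30)–(8.31) component by component.

```python
def synchronization_condition(L, d, gamma, lam2, lam_n):
    """Check the sufficient condition (8.31): L_i - d_i + gamma_i * lam(gamma_i) < 0,
    with lam(gamma_i) chosen by (8.30): lam2 if gamma_i > 0, 0 if gamma_i = 0,
    lam_n if gamma_i < 0."""
    ok = []
    for Li, di, gi in zip(L, d, gamma):
        lam = lam2 if gi > 0 else (lam_n if gi < 0 else 0.0)
        ok.append(Li - di + gi * lam < 0)
    return all(ok)

# Hypothetical numbers: Lipschitz constants L, linear degradation rates d,
# coupling gains gamma, and the second/last eigenvalues of the coupling matrix.
holds = synchronization_condition(L=[0.5, 0.8], d=[1.0, 0.2], gamma=[0.0, 1.0],
                                  lam2=-1.0, lam_n=-2.0)
```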

Other Types of Synchronization

Besides complete synchronization, synchronization is quite a rich phenomenon with a variety of other forms, e.g., phase synchronization, partial synchronization, and generalized synchronization. Generalized synchronization occurs when there are functions $\phi_1, \ldots, \phi_n$ such that $\phi_1(u_1(t)) = \phi_2(u_2(t)) = \cdots = \phi_n(u_n(t))$ holds for the coupled system (8.1)–(8.3) after a transitory evolution, i.e., $t \to \infty$, from appropriate initial conditions, where $u_i(t) = (x_i(t), s_i(t))$. Clearly, if the subsystems or oscillators are mutually coupled, $\phi_i$ is an invertible function. Otherwise, e.g., if there is a drive-response configuration among the subsystems, $\phi_i$ may not be an invertible function. Complete synchronization is a particular case of generalized synchronization where $\phi_1(u_1(t)) = \phi_2(u_2(t)) = \cdots = \phi_n(u_n(t)) = \phi(t)$ (Rulkov et al. 1995).

Partial synchronization occurs when a part of the subsystems, e.g., the first $m$ ($m \leq n$) subsystems of (8.1)–(8.3), asymptotically evolve in the sense of generalized synchronization (Inoue et al. 1998). In other words, there are functions $\phi_1, \ldots, \phi_m$ such that $\phi_1(u_1(t)) = \phi_2(u_2(t)) = \cdots = \phi_m(u_m(t))$ holds for the coupled system (8.1)–(8.3) after a transitory evolution from appropriate initial conditions. For a coupled system, there may exist multiple groups separately synchronized with different periods, which is also called clustering synchronization.

Phase synchronization occurs when, in the synchronized state, the amplitudes of the subsystems or oscillators remain unsynchronized, but their phases evolve in synchrony with phase differences kept constant (Rosenblum et al. 1996). In particular, phase synchronization is in-phase synchronization, or sometimes bulk synchronization, if all the synchronized states have the same phase.
For a dynamical system, there are many ways to define a phase of a periodic, quasi-periodic, or even chaotic trajectory. Actually, the synchronized states for all these forms of synchronization can be asymptotically stable. This means that once the synchronized state has been reached, the effect of a small perturbation that destroys synchronization is rapidly damped, and the synchronization is re-established. For stochastic synchronization, the definition will be provided in the following sections. The various forms of synchronization mentioned above can also be classified as spontaneous or entrained synchronization, depending on whether an external stimulus is needed. Next, we introduce some elementary methods by which synchronization can be obtained. For entrained synchronization, the external stimulus can be deterministic or stochastic. Here, the synchronization induced by a deterministic stimulus, e.g., the light-dark cycle or periodic external forcing, is referred to as entrained synchronization. The synchronization induced by stochastic effects such as intrinsic or extrinsic noise will be called

noise-induced or noise-driven synchronization. All these kinds of synchronization are generalized synchronization in either deterministic or stochastic form.

8.3 Spontaneous Synchronization of Deterministic Models

Some detailed synthetic multicellular network models have been proposed to study spontaneous synchronization, i.e., synchronization without any extracellular stimulus (McMillen et al. 2002, Garcia-Ojalvo et al. 2004, Kuznetsov and Kopell 2004, Gonze et al. 2005). Two of them are the coupled genetic relaxation oscillators and the repressilators with intercellular signaling molecules, as shown in Figure 8.2. Both of them fall into the scope of the general model (8.1)–(8.3).

Figure 8.2 Coupled genetic relaxation oscillators and repressilators with intercellular signaling molecules. (from (Zhou et al. 2008))

The mathematical equations corresponding to Figure 8.2 (a) and (b) can be described by

$$\frac{dx_i}{dt} = -\beta_1 x_i + \alpha_1 \frac{1 + \rho x_i^n}{1 + x_i^n} - \gamma x_i y_i + \mu \frac{1 + \rho_1 s_i^2}{1 + s_i^2}, \qquad (8.32)$$
$$\frac{dy_i}{dt} = -\beta_2 y_i + \alpha_2 \frac{1 + \rho x_i^n}{1 + x_i^n}, \qquad (8.33)$$
$$\frac{dl_i}{dt} = -\beta_3 l_i + \alpha_3 \frac{1 + \rho x_i^n}{1 + x_i^n}, \qquad (8.34)$$
$$\frac{ds_i}{dt} = -\beta_4 s_i + \alpha_4 l_i + \eta_{int}(s_e - s_i), \qquad (8.35)$$

and

$$\frac{dx_i}{dt} = -\beta_1 x_i + \frac{\alpha_1}{1 + z_i^n} + \gamma_1, \qquad (8.36)$$
$$\frac{dy_i}{dt} = -\beta_2 y_i + \frac{\alpha_2}{1 + x_i^n} + \gamma_2, \qquad (8.37)$$
$$\frac{dz_i}{dt} = -\beta_3 z_i + \frac{\alpha_3}{1 + y_i^n} + \gamma_3 + \mu \frac{s_i}{1 + s_i}, \qquad (8.38)$$
$$\frac{ds_i}{dt} = -\beta_4 s_i + \alpha_4 x_i + \eta_{int}(s_e - s_i), \qquad (8.39)$$

respectively, where the AI ($s_i$) activation is considered to follow standard Michaelis-Menten kinetics, and $\mu$ is the maximal contribution to the lacI transcription in the presence of saturating amounts of AI. The final equation in each model describes the dynamical evolution of the intracellular AI concentration, which is affected by degradation, synthesis, and diffusion toward/from the intercellular medium. The dynamics of the signaling molecule AI in the extracellular environment can be described as

$$\frac{ds_e}{dt} = -\beta_e s_e + \eta_{ext} \sum_{i=1}^{n} (s_i - s_e), \qquad (8.40)$$

where $\eta_{ext} = \delta / V_{ext}$ with $V_{ext}$ being the total extracellular volume, $\frac{1}{n}\sum_{i=1}^{n} s_i$ indicates the average AI concentration inside individual cells, and $\beta_e$ is the degradation rate of AI in the extracellular environment, which is assumed to be a homogeneous culture.

For the coupled genetic relaxation oscillators (8.32)–(8.35) and (8.40), rapid synchronization can be achieved by the intercellular coupling mechanism even for initially randomly distributed phases. The mechanism underlying such a phenomenon is that the cells initially in the high-$x$ state produce sufficient AI to saturate the levels of $s$ in the cells initially in the low-$x$ state, thereby strengthening the coupling, causing rapid transitions, and quickly condensing the range of phases in the population (McMillen et al. 2002). The rapid synchronization is shown in Figure 8.3. Unlike the coupled genetic relaxation oscillators, (8.36)–(8.39) and (8.40) form a model of coupled sinusoidal-type oscillators. It has been shown that synchronization can also be obtained even when there is a relatively broad distribution in the frequencies of individual oscillators due to various noise signals (Garcia-Ojalvo et al. 2004).
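The coupled repressilator model can be integrated directly. Below is a minimal forward-Euler sketch of (8.36)–(8.40); the kinetic parameters follow the repressilator settings given in the caption of Figure 8.5 where available, while the extracellular parameters `eta_ext` and `be` are assumed values chosen only for illustration.

```python
import numpy as np

def simulate_repressilators(n_cells=4, T=200.0, dt=0.005, seed=0):
    """Forward-Euler sketch of the coupled repressilators (8.36)-(8.40)."""
    rng = np.random.default_rng(seed)
    a1 = a2 = a3 = 216.0
    b1 = b2 = b3 = 1.0
    a4, b4, mu, eta_int, nH, g = 0.01, 1.0, 10.0, 1.0, 3, 0.5
    eta_ext, be = 2.0, 2.0          # assumed extracellular coupling/degradation
    x, y, z, s = (rng.uniform(0, 5, n_cells) for _ in range(4))
    se = 0.0                        # extracellular AI concentration
    for _ in range(int(T / dt)):
        dx = -b1 * x + a1 / (1 + z**nH) + g
        dy = -b2 * y + a2 / (1 + x**nH) + g
        dz = -b3 * z + a3 / (1 + y**nH) + g + mu * s / (1 + s)
        ds = -b4 * s + a4 * x + eta_int * (se - s)
        dse = -be * se + eta_ext * np.sum(s - se)
        x, y, z, s, se = x + dt*dx, y + dt*dy, z + dt*dz, s + dt*ds, se + dt*dse
    return x, y, z, s, se

x, y, z, s, se = simulate_repressilators()
```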
Diffusion of the signaling molecules across the cell membrane facilitates intercellular coupling. As the cell density increases, partial frequency locking occurs, and finally perfect locking is achieved, i.e., complete synchronization can be observed when the cell density is large enough (Garcia-Ojalvo et al. 2004). To characterize the transition to synchronization, a quantity $R$ which changes abruptly at the transition point is defined as

$$R = \frac{\langle M^2\rangle - \langle M\rangle^2}{\frac{1}{N}\sum_{i=1}^{N} (\langle y_i^2\rangle - \langle y_i\rangle^2)} = \frac{\mathrm{Var}_t(M)}{\mathrm{Mean}_i(\mathrm{Var}_t(y_i))}, \qquad (8.41)$$

Figure 8.3 Rapid synchronization for the coupled genetic relaxation oscillators. (from (McMillen et al. 2002))

where $M(t) = (1/N)\sum_{i=1}^{N} y_i(t)$ is the average signal, $\langle \cdot \rangle$ denotes the time average, and the dynamics of $y_i$ is defined by (8.33) or (8.37). In the unsynchronized regime, $R \approx 0$, whereas $R \approx 1$ in the synchronized state. Synchronization occurs when the coupling strength is strong enough, i.e., $R \to 1$ as $Q \to 1$, as shown in Figure 8.4, where $Q$ is defined as

$$Q = \frac{\delta n / V_{ext}}{\beta_e + \delta n / V_{ext}}, \qquad (8.42)$$

which is approximately linearly proportional to the cell density if $\delta n / V_{ext}$ is sufficiently smaller than the extracellular AI degradation rate $\beta_e$ (Garcia-Ojalvo et al. 2004). To describe the degree of synchronization across a population of cells, a similar order parameter $R$ is defined as

$$R = \left| \frac{1}{n} \sum_{k=1}^{n} e^{i\phi_k} \right|, \qquad (8.43)$$

where $i = \sqrt{-1}$, and $\phi_k$ stands for the phase of the $k$th oscillator. Then, $R = 0$ corresponds to unsynchronization, whereas $R = 1$ corresponds to complete synchronization. The dependence of the amplitude and the period of the synchronized entire system on $Q$ is shown in Figure 8.5 for the coupled relaxation oscillators and repressilators. An assembly of relaxation oscillators can maintain a more stable period and amplitude over a wider region of the parameter $Q$

than repressilators. Furthermore, the amplitude and period abruptly decrease when the cell density approaches the limit $Q = 1$. In other words, the cell density can cause a rapid damping of the period and amplitude of coupled genetic relaxation oscillators when it exceeds a threshold (Zhou et al. 2008).

Figure 8.4 Synchronization transition of the coupled repressilators for increasing Q. (from (Garcia-Ojalvo et al. 2004))

Spontaneous synchronization of coupled circadian oscillators in the suprachiasmatic nucleus was also investigated based on a similar mechanism. The coupling mechanism through the global level of neurotransmitter concentrations is effective in synchronizing a population of oscillators. Moreover, it has also been shown that the phases of individual cells are governed by their intrinsic periods, and efficient synchronization can be achieved when the average neurotransmitter concentration dampens individual oscillators (Gonze et al. 2005). It is worth noting that synchronization can also be established even though cell-to-cell variation exists, as observed in the coupled repressilators (Garcia-Ojalvo et al. 2004) and circadian oscillators (Gonze et al. 2005). When considering such variation, stochastic parameters are used. For example, in both coupled repressilators and coupled circadian oscillators, one or more parameters can be selected randomly from a Gaussian distribution. Because not all of the oscillators are identical, perfect synchronization cannot be achieved, and phase differences between some oscillators still persist in the synchronized state, i.e., phase synchronization.
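Both synchronization measures (8.41) and (8.43) are easy to compute from time series. The sketch below (function names and synthetic test signals are ours) only serves to show that identical signals give $R \approx 1$ while evenly spread phases give $R \approx 0$.

```python
import numpy as np

def order_parameter_R(Y):
    """Synchronization measure (8.41): Var_t of the mean signal M(t)
    over the mean across cells of Var_t(y_i). Y has shape (n_times, n_cells)."""
    M = Y.mean(axis=1)
    return M.var() / Y.var(axis=0).mean()

def kuramoto_R(phases):
    """Phase order parameter (8.43): |mean of exp(i*phi_k)|."""
    return np.abs(np.exp(1j * np.asarray(phases)).mean())

t = np.linspace(0, 20, 2000)
sync = np.column_stack([np.sin(t)] * 10)     # identical signals: R near 1
spread = np.column_stack([np.sin(t + p)      # evenly spread phases: R near 0
                          for p in np.linspace(0, 2*np.pi, 10, endpoint=False)])
R_sync, R_spread = order_parameter_R(sync), order_parameter_R(spread)
```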

Figure 8.5 The effect of parameter Q on the amplitude and the period of oscillations: (a) and (c) for the coupled relaxation oscillators; (b) and (d) for the coupled repressilators. Parameter values for the coupled relaxation oscillators are $n = 100$, $\alpha_1 = 10$, $\alpha_2 = 1$, $\beta_1 = \beta_2 = 0.5$, $\alpha_3 = 50$, $\beta_3 = 25$, $\alpha_4 = \beta_4 = 0.4$, $\rho = 200$, $n = 4$, $\mu = 10$, $\eta_{int} = 120$, $\rho_1 = 10$, and $\gamma = 6$. Parameter values for the repressilators are $n = 100$, $\alpha_1 = \alpha_2 = \alpha_3 = 216$, $\beta_1 = \beta_2 = \beta_3 = 1$, $\alpha_4 = 0.01$, $\beta_4 = 1$, $\mu = 10$, $\eta_{int} = 1$, $n = 3$, and $\gamma_1 = \gamma_2 = \gamma_3 = 0.5$. (from (Zhou et al. 2008))

8.4 Entrained Synchronization for Deterministic Models

Besides spontaneous synchronization, where no external forcing is required to induce synchronization, it is well known that external forcing can play an important role in the synchronization of cellular oscillators. For example, in the natural environment, circadian clocks are subjected to the alternation of light and dark every day. This external cycle entrains coupled oscillators precisely to a 24h period (Gonze et al. 2005, Beersma et al. 1999). Generally, cellular oscillators can be synchronized by an appropriate external or internal stimulus, i.e., entrained synchronization. Examples of entrained synchronization have been widely demonstrated, for instance, synchronization of electronic genetic networks by external forcing, i.e., an external voltage (Wagemakers et al. 2006), synchronization induced by periodic stimulation in squid giant axons (Aihara et al. 1984, Matsumoto et al. 1987, Kaplan et al. 1996),

and synchronization induced by periodic impulses (Zhou et al. 2008, Wang et al. 2006b) and light-dark cycles (Gonze et al. 2005). Note that although there is cell-to-cell variation, i.e., the period of each oscillator varies slightly, a light-dark cycle can still induce a systematic change in their periods through the light intensity. All oscillators are synchronized, leading to a single resulting period, i.e., a precisely 24h period, which is identical for all oscillators. Consider the example of the coupled circadian oscillators. The light-dark cycle is simulated by using a square-wave function for the light term, $L$, which switches, e.g., from $L = 0$ in the dark phase to $L = 0.01$ in the light phase. Such forcing entrains the circadian oscillators to a 24h period, as shown in Figure 8.6.

Figure 8.6 Entrainment of 10,000 coupled circadian oscillators by a light-dark cycle. The light-dark cycle is described by a square-wave forcing: L = 0 in the dark phases and L = 0.01 in the light phases. (a) Distribution of individual periods. (b) Oscillations of 10 oscillators chosen randomly among the 10,000 oscillators. (c) Distribution of the periods in the coupled system. (d) The oscillation of the mean field. In (b) and (d), the white and black bars indicate the light and dark phases, respectively. (from (Gonze et al. 2005))

In addition to the natural light-dark cycle, which entrains coupled oscillators precisely to a 24h period, there are some other types of external

forcing, which can be used as artificial control strategies in experiments and, possibly, in further clinical applications to control desynchronization and pathological rhythms, especially when the necessary synchronization cannot be achieved spontaneously. Since individual oscillators interact with each other in a fluctuating external environment, disruption of the rhythmic processes beyond normal limits, emergence of abnormal rhythms, and large fluctuations in the external environment are often associated with the loss of synchronization. Moreover, diseases can also lead to transitions from synchronization to desynchronization. External forcing can thus be used as an artificial control strategy to compensate for coupling inefficiency and further induce entrained synchronization by increasing the production, release, and detection of the signaling molecules.

Generally, external forcing can be imposed on one or more components in individual oscillators (Gonze et al. 2005). Moreover, external forcing can be imposed on the signaling molecules (Zhou et al. 2008, Wang et al. 2006b). The commonly used extracellular medium and the diffusive process provide an artificial control strategy, i.e., introduction of the diffusive signaling molecules into the extracellular medium at fixed instants. The impulsive control system can be represented as (8.1)–(8.2) and

$$\dot{s}_e(t) = -d_e(s_e(t)) + c_e(s_i(t), s_e(t)) + I_{ext}(t), \qquad (8.44)$$

where $I_{ext}(t)$ takes the form

$$I_{ext}(t) = \sigma \sum_{k=1}^{\infty} \delta(t - t_k) \qquad (8.45)$$

with $t_k = kT$ ($k \in \mathbb{Z}^+ = \{1, 2, \ldots\}$). More specifically, it takes the form

$$\dot{x}_i(t) = -d_i^x(x_i(t)) + f_i(x_i(t)) + r_i(x_i(t), s_i(t)), \qquad (8.46)$$
$$\dot{s}_i(t) = -d_i^s(s_i(t)) + p_i(x_i(t), s_i(t)) + c_i(s_i(t), s_e(t)), \qquad (8.47)$$
$$\dot{s}_e(t) = -d_e(s_e(t)) + c_e(s_i(t), s_e(t)), \quad t \neq kT, \qquad (8.48)$$
$$\Delta s_e(t) = \sigma, \quad t = kT, \qquad (8.49)$$

where $\Delta s_e(t) = s_e(t^+) - s_e(t) = \sigma$ is the impulsive input with constant or random injection amounts of the signals into the common extracellular medium at the control instants $t = kT$, with impulsive input period $T$. Clearly, $\Delta s_e(t)$ represents a sudden change of signals in the extracellular medium at the instants $t = kT$. The dynamics of (8.46)–(8.49) is determined by the original system (8.1)–(8.3), the injection amount $\sigma$, which can take a deterministic or stochastic value, and the injection period $T$. In the case $\sigma = 0$, (8.46)–(8.49) reverts to (8.1)–(8.3).

The impulsive control strategy has two beneficial effects. One is that it can compensate for coupling inefficiency when spontaneous synchronization cannot be achieved, i.e., the external forcing can be used as a coupling amplifier so as

to induce controlled synchronization. Moreover, the amount and period of the external forcing are independent of the state variables; therefore, when periodic external forcing is used to increase its effectiveness at control, we need not measure the system states at the control instants, which makes the method biologically plausible and easy to implement in experiments and possibly in medical treatments. The other is that it can reduce the noisiness of the multicellular network, effectively transforming an ensemble of noisy clocks into a very reliable collective oscillator. The findings demonstrate an efficient way to synchronize multicellular networks by an artificial control strategy and also provide a powerful mechanism for noise resistance (Wang et al. 2006b).

Generally, external forcing can profoundly affect the dynamics of the signaling molecules in the extracellular medium, thereby affecting the dynamics of individual cells through the diffusive process. In other words, the dynamics of the impulsive control system may strongly depend on the frequency and amount of external forcing. Consider the coupled repressilators with periodic injection of the signaling molecule AI into the common extracellular medium. The schematic of the coupled repressilators with impulsive control is shown in Figure 8.7. It is found that external forcing can indeed entrain the intrinsic rhythms or induce collective rhythms although the natural periods of individual oscillators are broadly distributed.

Figure 8.7 Schematic of the coupled repressilators with the quorum-sensing mechanism and periodic external forcing for impulsive control. (from (Wang et al.
2006b))

The impulsive control system can easily be synchronized, especially when the impulsive period $T$ is close to the mean period of the oscillators. In other words, when the impulsive period $T$ is close to the mean period of the oscillators, a relatively small amount of impulsive control is required to induce synchronization. In addition, the higher the impulsive amount of the external input, the larger the impulsive period range for synchronization. In the presence of appropriate impulsive control, when the impulsive period is close to the mean period of all oscillators, the characteristic oscillations of the controlled synchronization do not change qualitatively, except for the dynamics of AI in the intracellular and extracellular medium, which is changed qualitatively by the external forcing. On the other hand, for the same impulsive period, the minimum impulsive amount required to synchronize a population of oscillators decreases with increasing coupling strength, which means that for a specific impulsive period and a relatively larger coupling strength, a smaller impulsive amount is required to compensate for coupling inefficiency. Therefore, even if the coupling itself is insufficient to induce spontaneous synchronization, it still plays an important role in the entrained synchronization. In other words, the entrained synchronization is induced by the coupling along with the external forcing, rather than by the external forcing alone.

Because the rhythms actually arise from a stochastic cellular mechanism interacting with a fluctuating environment, a strictly constant impulsive amount is not plausible. It has been shown that under conditions of periodic input with a random impulsive amount $\sigma$, collective dynamics can also emerge. In such a case, $\sigma$ can be chosen randomly from an interval $[\sigma_a, \sigma_b]$, and a population of noisy oscillators can be entrained by a periodic source of the coupling substance to yield sustained oscillations with an irregular waveform and a stable period, which is determined by the relative magnitude of the impulsive period. Unlike the stability of the period, both the irregularity of the oscillators induced by the intrinsic and extrinsic noise and the random impulsive amounts render the amplitude irregular. External stimuli can also induce rich dynamics such as the Arnold tongue and resonance (Zhou et al.
2008), and oscillation death and chaos (Wang et al. 2006b). For example, for the coupled repressilators (8.36)–(8.39) and (8.44), the Arnold tongue and resonance regions as functions of the impulsive amount $\sigma$ and period $T$ are shown in Figure 8.8. Besides the periodic impulse, other forms of external forcing can also be used, for example, a sinusoidal stimulus. It has been shown that when the sinusoidal stimulus $I_{ext}(t) = \lambda + \sigma \cos(\omega t)$ is used, periodic intermittent synchronization can be observed due to the time-varying strength.

8.5 Noise-driven Synchronization for Stochastic Models Without Coupling

When examined carefully, biological oscillators are rarely strictly periodic but rather fluctuate irregularly with time. The fluctuations arise from the combined influences of intracellular or intrinsic noise due to the intrinsically stochastic nature of the biochemical reactions involved, i.e., the random transitions among discrete chemical states such as random births and deaths of individual molecules; extracellular noise owing to environmental perturbations or stochastic variations of externally set parameters; and biological variance,

Figure 8.8 External forcing induced Arnold tongues and resonance regions. Five different dynamical regions are labeled A–E. In region A, the coupled oscillators can display rich dynamics, in particular, including oscillation death. In the dominant region B, the Arnold tongue is found around the natural period. Within this resonance region, the period of oscillation is entrained to the external period. The regions C, D, and E show 3:2, 2:1, and 3:1 resonance regions, respectively. (from (Zhou et al. 2008))

i.e., different properties among cells. The continual interaction between the environmental fluctuations and intracellular feedback mechanisms renders the separation of these effects impossible. In most cases, additive and multiplicative stochastic terms, or random parameters, can be used to simulate stochastic fluctuations. Such stochastic noise may not only affect the biological activities of both individual cells and the entire multicellular system but also may be exploited by living organisms to positively facilitate certain collective behavior.

By using the phase reduction method (Kuramoto 1984) and analytically computing the Lyapunov exponent, Teramae and Tanaka (Teramae and Tanaka 2004) showed that uncoupled limit-cycle oscillators can be in-phase synchronized by common weak additive noise regardless of their intrinsic properties and initial conditions. The population of $n$ oscillators driven by common additive noise is described as

$$\dot{x}_i(t) = f(x_i) + \xi(t), \qquad (8.50)$$

where $\xi(t)$ is a vector of Gaussian white noise. The elements of the vector are normalized as $\langle \xi_i(t)\rangle = 0$ and $\langle \xi_i(t)\xi_j(s)\rangle = 2D_{ij}\delta(t - s)$, where $D = \{D_{ij}\}$ is the variance matrix of the noise components with $i, j = 1, \ldots, n$. Assume that the unperturbed system has a limit cycle, and consider the common noise as a weak perturbation to the deterministic oscillators.
Based on the Kuramoto model (Kuramoto 1984), one would attempt to obtain a phase variable $\phi(x)$ such that

$$\frac{d\phi}{dt} = \omega, \qquad (8.51)$$

where $\omega$ is the intrinsic frequency of the unperturbed oscillators. Then, the equation for the phase becomes

$$\frac{d\phi}{dt} = \frac{\partial \phi}{\partial x} f(x) + \frac{\partial \phi}{\partial x} \xi = \omega + \frac{\partial \phi}{\partial x} \xi. \qquad (8.52)$$

Because all the oscillators are identical and there is no coupling among them, we can study the phase synchronization of the entire population in a reduced system of two oscillators. This can be converted into the equation

$$\frac{d\phi_i}{dt} = \omega + Z(\phi_i)\xi, \quad i = 1, 2, \qquad (8.53)$$

where $\phi_i$ is the phase variable of the $i$th oscillator, $Z$ is the phase-dependent sensitivity defined as $Z(\phi) = \mathrm{grad}_x \phi |_{x = x_0(\phi)}$, and $x_0(\phi)$ is the unperturbed limit cycle solution. The phase equation obtained by using Itô's formula is described by

$$\frac{d\phi_i}{dt} = \omega + Z'(\phi_i)^T D Z(\phi_i) + Z(\phi_i)\xi, \qquad (8.54)$$

where the prime denotes differentiation with respect to $\phi_i$. To prove that the synchronizing solution $\phi_1(t) = \phi_2(t)$ is stable, one needs to calculate the Lyapunov exponent $\lambda$ of the solution. Define the phase difference between the two oscillators as $\psi = \phi_2 - \phi_1$. The linearization of (8.54) with respect to $\psi$ yields

$$\frac{d\psi}{dt} = \Big[\big(Z'(\phi_i)^T D Z(\phi_i)\big)' + Z'(\phi_i)\xi\Big]\psi, \qquad (8.55)$$

where $\phi_i$ obeys (8.54). By proving the Lyapunov exponent to be negative, i.e.,

$$\lambda = -\frac{1}{2\pi} \int_0^{2\pi} Z'^T D Z' \, d\phi \leq 0, \qquad (8.56)$$

it can be shown that phase synchronization induced by common weak noise is stable in an arbitrary oscillator system, regardless of the detailed oscillatory dynamics. How weak the noise can be, however, is difficult to determine because it is not a quantitative condition, and too strong noise may also decrease the synchronization probability. For coupled oscillators, on the other hand, the situation may be different due to the combined influences of the intercellular coupling and the common noise.
In other words, the coupled oscillators can be more easily synchronized because both the noise and the intercellular coupling may play active roles in inducing synchronization; that is, the effects of the common noise can be more easily imposed on individual cells through the coupling, i.e., the diffusive signaling molecules.
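The mechanism behind (8.53)–(8.56) can be illustrated with a minimal Euler–Maruyama sketch: two identical, uncoupled phase oscillators driven by the same noise realization, with an assumed sensitivity $Z(\phi) = \sin\phi$ (for which the exponent (8.56) evaluates to $-D/2 < 0$), so the phase difference should contract.

```python
import numpy as np

def common_noise_phase_sync(omega=1.0, D=0.1, T=1000.0, dt=0.01, seed=1):
    """Two uncoupled phase oscillators (8.53) driven by one common noise
    realization, with an assumed sensitivity Z(phi) = sin(phi).
    Returns the final wrapped phase difference |psi|."""
    rng = np.random.default_rng(seed)
    phi1, phi2 = 0.0, 2.0                 # well-separated initial phases
    for _ in range(int(T / dt)):
        dW = np.sqrt(2 * D * dt) * rng.standard_normal()   # shared increment
        phi1 += omega * dt + np.sin(phi1) * dW
        phi2 += omega * dt + np.sin(phi2) * dW
    return abs((phi2 - phi1 + np.pi) % (2 * np.pi) - np.pi)

psi_final = common_noise_phase_sync()     # expected to be close to zero
```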

8.6 A General Multicellular Network for Stochastic Models with Coupling

A Model

A general method of modeling multicellular networks with stochastic fluctuations, i.e., proceeding from the master equation to the Langevin equations and then to the cumulant evolution equations, including the derivation of intracellular and extracellular noise from the biochemical reactions involved, was developed for synchronization in multicellular networks (Chen et al. 2005, Zhou et al. 2005). Sufficient conditions were also provided based on the global Hopf bifurcation theory (Alexander et al. 1978, Alexander et al. 1986). The results are obtained under the assumption that the cell number is sufficiently large that the system with Gaussian approximation can be expressed by the cumulant equations. Therefore, the existence of periodic solutions in the cumulant equations implies that the original $n$ cells show bulk synchronization (Chen et al. 2005).

As described in Chapter 2, for a general molecular network in a single cell or a subsystem with $m$ molecular species $\{S_1, \ldots, S_m\}$ that react through $M$ reaction channels $\{R_1, \ldots, R_M\}$, we can express the $M$ intracellular biochemical reactions by the master equation (2.55). Depending on the required accuracy and available computational power, the master equation can be approximated by the Langevin equation (2.104), the Fokker-Planck equation (2.113), and further by the cumulant equation (2.142).
For a multicellular system with coupling, the master equation includes not only the M intracellular biochemical reactions but also the diffusion reactions, which can be expressed approximately by the following reversible biochemical reaction:

X_i ⇌ Y_i, with forward rate d_ii and backward rate d_ii v̄/V̄, (8.57)

where X_i is the number of the intracellular signaling molecules, d_ii is the diffusion rate of X_i between the cell and the extracellular environment, Y_i is the total number of X_i in the extracellular environment, v̄ = vA, and V̄ = V A, with v, V, and A being the individual cell volume, the total culture or environmental volume, and the Avogadro number, respectively. Moreover, the effects of extracellular noise with Gaussian distribution are also assumed to be associated with the diffusion process of X_i and are incorporated in the master equation; they can be equivalently described in the form of the following reversible reaction:

X_i ⇌ Y_i, with forward rate σ_ii² v̄/(2x_i) and backward rate σ_ii² v̄/(2y_i), (8.58)

where σ_ii² is the extracellular noise intensity, or the variance, which affects the cell dynamics through the signaling molecules X_i and Y_i. Note that when the

master equations are used to model multicellular systems, both the probability function P(X; t) and the transition rates P_{XX′} and P_{X′X} are also considered as functions of the environmental variables Y_i if the coupling is included; d_ii = 0 if the ith molecule is not a coupling variable. By adding (8.57)-(8.58) to the master equation (2.55), we can analyze and simulate the dynamical behavior of multicellular networks, with the consideration of both the diffusion process and the extracellular noise, by the algorithms of stochastic simulation described in Chapter 2. We assume that the system is homogeneous owing to the free diffusion and transportation processes of the signaling molecules between individual cells and the environment, which means that the signaling molecules are randomly and uniformly distributed throughout the environment. When there is a sufficiently large number of cells, i.e., n → ∞, the concentration of Y approaches the average or mean-field concentration of X, i.e., Y(t)/V̄ = N(t − τ) ≡ ⟨X(t − τ)⟩/v̄, which represents the time-delayed feedback effects; N(t) is the mean value of the concentration X(t)/v̄. Hence, the master equation (2.55) with (8.57)-(8.58) represents a general multicellular network, which can be numerically simulated by the modified Gillespie algorithm with time-delay effects and a parallel computation scheme (Chen et al. 2005, Wang et al. 2008). When the coupling by the diffusion reactions is considered in the approximation of the master equation, the Langevin equations for the coupled multicellular networks become

dx_i(t)/dt = f_i(x(t)) + d_ii(y_i(t) − x_i(t)) + ξ_i(t), (8.59)

where the ξ_i(t) are Gaussian white noise signals with zero means ⟨ξ_i(t)⟩ = 0 and covariances ⟨ξ_i(t)ξ_j(t′)⟩ = (K_ij(x(t)) + d_ij(y_i(t) + x_j(t)) + σ_ij²)δ(t − t′). Note that d_ij = σ_ij = 0 for i ≠ j, and d_ii = 0 if the variable x_i(t) is not a coupling variable.
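Langevin systems of the form (8.59) can be integrated with a straightforward Euler-Maruyama scheme. The sketch below is illustrative only: it uses a hypothetical two-variable limit-cycle drift f and made-up noise and coupling values rather than anything from the text, and it replaces the environmental variable y by the instantaneous mean field of the cells.

```python
import numpy as np

def euler_maruyama_coupled(f, x0, d, sigma_common, sigma_intra, T, dt, seed=0):
    """Simulate n replicas of dx/dt = f(x) + d*(y - x) + xi, where y is the
    mean-field (environmental) variable and xi combines intracellular noise
    (independent per cell) and extracellular noise (common to all cells)."""
    rng = np.random.default_rng(seed)
    n, m = x0.shape
    x = x0.copy()
    steps = int(round(T / dt))
    traj = np.empty((steps, n, m))
    for k in range(steps):
        y = x.mean(axis=0)                       # mean-field approximation of Y
        common = sigma_common * rng.normal(size=m) * np.sqrt(dt)
        for i in range(n):
            intra = sigma_intra * rng.normal(size=m) * np.sqrt(dt)
            x[i] = x[i] + (f(x[i]) + d * (y - x[i])) * dt + common + intra
        traj[k] = x
    return traj

# Hypothetical 2-variable limit-cycle drift (a normal form, not the luxI/luxR
# kinetics of the text):
def f(x):
    r2 = x[0]**2 + x[1]**2
    return np.array([x[0] - x[1] - r2 * x[0], x[0] + x[1] - r2 * x[1]])

x0 = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
traj = euler_maruyama_coupled(f, x0, d=0.5, sigma_common=0.05,
                              sigma_intra=0.01, T=20.0, dt=0.01)
```

The common-noise term is drawn once per step and added to every cell, while the intracellular term is drawn independently per cell; this is the distinction between the two noise sources that the following sections analyze.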
When the system is sufficiently large, we assume that the stochastic variables obey a Gaussian distribution. Then, it can be proved that (8.59) is equivalently expressed by the first and second cumulant evolution equations, which means that we can actually examine the dynamics by deterministic cumulants instead of the complicated stochastic variables. Let us denote the first cumulant, or mean, of x by N with elements N_i = ⟨x_i⟩, and the second cumulant, or covariance, of x by M with elements M_ij = ⟨(x_i − N_i)(x_j − N_j)⟩. Then, by integrating over all x, the cumulant evolution equations can be obtained as follows:

dN_i(t)/dt = F_i(N(t), M(t)) + d_ii(N_i(t − τ_i) − N_i(t)), (8.60)

dM_ij(t)/dt = G_ij(N(t), M(t)) − (d_ii + d_jj)M_ij(t) + d_ij(N_i(t − τ_i) + N_j(t)), (8.61)

where i, j = 1, ..., m, F_i(N(t), M(t)) = ⟨f_i(x(t))⟩, and G_ij(N(t), M(t)) = ⟨(x_i(t) − N_i(t))f_j(x(t))⟩ + ⟨(x_j(t) − N_j(t))f_i(x(t))⟩ + ⟨K_ij(x(t))⟩ + σ_ij². The vector

N(t) clearly has m elements. On the other hand, the number of non-zero elements of the covariance matrix M(t) is at most m(m + 1)/2, but more than m. Note that the element d_ii is zero if x_i is not a coupling variable with the environment. Because not all molecules among cells are coupled with the environment, many of the d_ii (i = 1, ..., m) are generally zero. Clearly, (8.61) mainly represents the effect of noise. Although the beneficial roles of noise in synchronization have been extensively studied, a complete understanding of their origin, and of how to exploit them in order to regulate cellular functions, requires further investigation. The likely mechanism for synchronizing a population of interacting oscillators by common extracellular noise is that the noise is shared by all cells and can be effectively exerted on each cell through the intercellular coupling, i.e., the diffusive process of the signaling molecules. Moreover, the intracellular and extracellular noise may play different roles in inducing synchronization. Since the intracellular noise signals of each cell are independent, they generally tend to disturb synchronization among cells. The extracellular noise, however, is nearly common to all cells due to the shared extracellular environment, and it facilitates the synchronization of the dynamics of all cells by exerting the same fluctuations on each cell via signaling molecules.

Example of a Gene Regulatory Network

Consider the example of a coupled genetic network, i.e., a two-gene model which uses luxI and luxR with the promoter P_lacLux0, adopted in (Chen et al. 2005), as shown in Figure 8.9. The genes luxI and luxR, which coordinate the behavior of bacteria, such as quorum sensing, were initially discovered in the marine bacterium Vibrio fischeri. They are constructed as an operon under the control of the promoter P_lacLux0.
Cell-to-cell coupling is accomplished by diffusing a small signaling molecule into the extracellular environment, i.e., AI, which plays a major role in the cell-to-cell communication. The protein LuxI is an AI synthase that produces AI. Both proteins LuxR and AI are first dimerized and then form a complex, i.e., a hetero-tetramer, which inhibits the activity of the promoter P lac Lux0. As a signaling molecule, AI freely diffuses into the environment to exchange information with other cells, and then enters individual cells to alter gene expression. Let AI 2 and LuxR 2 indicate AI and LuxR dimers, and AL and ALD represent AI 2 LuxR 2 and AI 2 LuxR 2 -DNA complexes, respectively. Then, the autoinducer synthesis, the multimerization reactions of proteins, and the binding reaction on the regulatory region of DNAs in individual cells are described as follows:

Figure 8.9 A two-gene model of a gene regulatory network. Gene luxR produces the protein LuxR, which is dimerized. Protein LuxI synthesizes AI, which forms a dimer and further a hetero-tetramer by binding to a LuxR dimer. The AI-LuxR tetramer binds to the promoter P_lacLux0 to inhibit the transcription of the genes luxR and luxI. Cell communication or synchronization is accomplished by diffusing AIs to the extracellular environment, which further enter the cells as signaling molecules to regulate gene expression. (from (Chen et al. 2005))

LuxI → LuxI + AI, with rate k_a, (8.62)
AI + AI ⇌ AI_2, with forward rate k_1 and backward rate k_{-1}, (8.63)
LuxR + LuxR ⇌ LuxR_2, with rates k_2 and k_{-2}, (8.64)
AI_2 + LuxR_2 ⇌ AL, with rates k_3 and k_{-3}, (8.65)
AL + DNA ⇌ ALD, with rates k_4 and k_{-4}. (8.66)

Let the copy number of plasmids with the operon luxI and luxR be n_D. Then a conservation condition for DNA binding sites can be obtained; i.e., the total number of free DNAs and ALDs should be equal to n_D. On the other hand, the reactions involving transcription, translation, and degradation in a cell are expressed as

DNA → mRNA_LuxI + mRNA_LuxR + DNA, with rate k_m, (8.67)
ALD → mRNA_LuxI + mRNA_LuxR + ALD, with rate αk_m, (8.68)
mRNA_LuxI → LuxI + mRNA_LuxI, with rate k_pI, (8.69)
mRNA_LuxR → LuxR + mRNA_LuxR, with rate k_pR, (8.70)

where 0 < α < 1 is a repression coefficient. As shown in (8.67)-(8.68), mRNA_LuxI and mRNA_LuxR are produced by the same reactions due to the operon. The molecules LuxI, LuxR, mRNA_LuxI, mRNA_LuxR, and AI degrade at rates e_i, e_r, e_mi, e_mr, and e_a, respectively. Denote the numbers of LuxI, LuxR, AL, ALD, AI_2, LuxR_2, mRNA_LuxI, mRNA_LuxR, and AI as R_1, R_2, R_3, R_4, R_5, R_6, R_7, R_8, and R_9, respectively. Then, we can derive the master equation for the gene network shown in Figure 8.9. For convenience, we define the following molecules: X_1, LuxI; X_2, LuxR; X_3, AL; X_4, ALD; X_5, AI_2; X_6, LuxR_2; X_7, mRNA_LuxI; X_8, mRNA_LuxR; X_9, AI; and Y, AI in the environment. Define n_D as the total number of DNAs, and n_DNA as the free DNA number. Then, by the conservation condition, we have n_DNA + X_4 = n_D. From (2.55) and (8.57)-(8.58), the transition rates and the states corresponding to reactions (8.62)-(8.70) and (8.57)-(8.58) are listed in Table 8.1, where the last two rows represent the diffusion process and the extracellular noise effect between each cell and the environment for AI, according to (8.57)-(8.58). In Table 8.1, the volume factors v̄ and V̄ multiply some of the w_k to convert concentrations to numbers of molecules, because the reactions with rates k_1-k_4 are second-order reactions whose rate constants are defined not by the numbers but by the concentrations in the given data.
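This unit bookkeeping for the second-order rates can be made concrete as follows (the numbers are hypothetical, not the data of the text): with a rate constant k given in concentration units and molecule counts X, Y in a cell of volume factor v̄, the propensity is k·X·Y/v̄ for a heterodimerization and k·X·(X − 1)/v̄ for a homodimerization such as (8.63)-(8.64).

```python
def second_order_propensity(k, X, v_bar, Y=None):
    """Propensity (reactions per unit time) of a bimolecular reaction whose
    rate constant k is defined on concentrations, with molecule numbers X
    (and Y) in a volume factor v_bar = cell volume * Avogadro's number."""
    if Y is None:
        # homodimerization X + X -> X2: X*(X-1) ordered pairs, scaled by 1/v_bar
        return k * X * (X - 1) / v_bar
    # heterodimerization X + Y -> complex
    return k * X * Y / v_bar

w_dimer = second_order_propensity(0.5, 100, v_bar=2.0)       # 0.5*100*99/2 = 2475.0
w_hetero = second_order_propensity(1.0, 10, v_bar=5.0, Y=4)  # 1*10*4/5 = 8.0
```

First-order rates (degradation, translation) need no volume factor, which is why only the k_1-k_4 rows of Table 8.1 carry v̄.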
Then, by appropriate approximations to the master equation as described in Chapter 2, the Langevin equations of (8.59) for a single cell, in terms of concentrations, can be obtained:

dx_1(t)/dt = −e_i x_1 + k_pI x_7 + ξ_1,
dx_2(t)/dt = −2k_2 x_2(x_2 − 1/v̄) + 2k_{-2} x_6 + k_pR x_8 − e_r x_2 + ξ_2,
dx_3(t)/dt = k_3 x_5 x_6 − x_3(k_{-3} + k_4(n_D/v̄ − x_4)) + k_{-4} x_4 + ξ_3,
dx_4(t)/dt = k_4 x_3(n_D/v̄ − x_4) − k_{-4} x_4 + ξ_4,
dx_5(t)/dt = k_1 x_9(x_9 − 1/v̄) − k_{-1} x_5 − k_3 x_5 x_6 + k_{-3} x_3 + ξ_5,
dx_6(t)/dt = k_2 x_2(x_2 − 1/v̄) − k_{-2} x_6 − k_3 x_5 x_6 + k_{-3} x_3 + ξ_6,
dx_7(t)/dt = k_m(n_D/v̄ − x_4) + αk_m x_4 − e_mi x_7 + ξ_7,
dx_8(t)/dt = k_m(n_D/v̄ − x_4) + αk_m x_4 − e_mr x_8 + ξ_8,
dx_9(t)/dt = −2k_1 x_9(x_9 − 1/v̄) + 2k_{-1} x_5 + k_a x_1 − e_a x_9 + d(x̄_9(t − τ) − x_9(t)) + ξ_9,

where ⟨ξ_i(t)ξ_j(t′)⟩ = K_ij δ(t − t′) for i ≠ 9 and j ≠ 9, with K_ij = K_ji, and

⟨ξ_9(t)ξ_9(t′)⟩ = (K_99 + d(x̄_9(t − τ) + x_9(t)) + σ²)δ(t − t′),

Table 8.1 Transition rates and state changes (on X_1, ..., X_9)

k | state change θ_k | transition rate w_k
1 | X_9 → X_9 + 1 | k_a X_1(t)
2 | X_9 → X_9 − 2, X_5 → X_5 + 1 | k_1 X_9(t)(X_9(t) − 1)/v̄
3 | X_5 → X_5 − 1, X_9 → X_9 + 2 | k_{-1} X_5(t)
4 | X_2 → X_2 − 2, X_6 → X_6 + 1 | k_2 X_2(t)(X_2(t) − 1)/v̄
5 | X_6 → X_6 − 1, X_2 → X_2 + 2 | k_{-2} X_6(t)
6 | X_5 → X_5 − 1, X_6 → X_6 − 1, X_3 → X_3 + 1 | k_3 X_5(t)X_6(t)/v̄
7 | X_3 → X_3 − 1, X_5 → X_5 + 1, X_6 → X_6 + 1 | k_{-3} X_3(t)
8 | X_3 → X_3 − 1, X_4 → X_4 + 1 | k_4 X_3(t)(n_D − X_4(t))/v̄
9 | X_4 → X_4 − 1, X_3 → X_3 + 1 | k_{-4} X_4(t)
10 | X_7 → X_7 + 1, X_8 → X_8 + 1 | k_m(n_D − X_4(t))
11 | X_7 → X_7 + 1, X_8 → X_8 + 1 | αk_m X_4(t)
12 | X_1 → X_1 + 1 | k_pI X_7(t)
13 | X_2 → X_2 + 1 | k_pR X_8(t)
14 | X_7 → X_7 − 1 | e_mi X_7(t)
15 | X_8 → X_8 − 1 | e_mr X_8(t)
16 | X_1 → X_1 − 1 | e_i X_1(t)
17 | X_2 → X_2 − 1 | e_r X_2(t)
18 | X_9 → X_9 − 1 | e_a X_9(t)
19 | X_9 → X_9 + 1 | d Y_9(t)v̄/V̄ + σ²v̄/2
20 | X_9 → X_9 − 1 | d X_9(t) + σ²v̄/2

If n → ∞, then Y_9(t) = X̄_9(t − τ)V̄/v̄. The transition rate w_k(X(t)) = 0 if w_k(X(t)) < 0 or if w_k(X(t)) has a variable X_i(t) satisfying X_i(t) + θ_{k,i} < 0, due to the non-negativity of w_k and X(t). Moreover, w_k(X(t)) = 0 if w_k(X(t)) has a term n_D − X_4(t) satisfying either n_D − X_4(t) < 0 or n_D − (X_4(t) + θ_{k,4}) < 0, due to the conservation condition of the DNA number.

K_11(t) = e_i x_1 + k_pI x_7,
K_22(t) = 4k_2 x_2(x_2 − 1/v̄) + 4k_{-2} x_6 + k_pR x_8 + e_r x_2,
K_33(t) = k_3 x_5 x_6 + x_3(k_{-3} + k_4(n_D/v̄ − x_4)) + k_{-4} x_4,
K_44(t) = k_4 x_3(n_D/v̄ − x_4) + k_{-4} x_4 + σ²,
K_55(t) = k_1 x_9(x_9 − 1/v̄) + k_{-1} x_5 + k_3 x_5 x_6 + k_{-3} x_3 + σ²,
K_66(t) = k_2 x_2(x_2 − 1/v̄) + k_{-2} x_6 + k_3 x_5 x_6 + k_{-3} x_3,
K_77(t) = k_m(n_D/v̄ − x_4) + αk_m x_4 + e_mi x_7,
K_88(t) = k_m(n_D/v̄ − x_4) + αk_m x_4 + e_mr x_8,

K_99(t) = 4k_1 x_9(x_9 − 1/v̄) + 4k_{-1} x_5 + k_a x_1 + d(x̄_9(t − τ) + x_9(t)) + e_a x_9 + σ²,
K_26(t) = −2k_2 x_2(x_2 − 1/v̄) − 2k_{-2} x_6,
K_34(t) = −k_4 x_3(n_D/v̄ − x_4) − k_{-4} x_4,
K_35(t) = −k_3 x_5 x_6 − k_{-3} x_3,
K_36(t) = −k_3 x_5 x_6 − k_{-3} x_3,
K_56(t) = k_3 x_5 x_6 + k_{-3} x_3,
K_59(t) = −2k_1 x_9(x_9 − 1/v̄) − 2k_{-1} x_5,

where x_i represents the concentration of R_i, i.e., x_i = R_i/v̄, and the other K_ij = 0. Note that if a term in f_i or K_ij is negative, then the corresponding term is set to zero, due to the constraints on w_k in the master equation. The intracellular noises ξ_i are derived directly from the master equation by the second-order approximation in θ_{k,i}, and they are additive and white, with identical and independent distributions for each cell. Theoretically, when the individual jumps, i.e., the changes θ_{k,i} of the number X_i(t), are small, such an approximation approaches an accurate result; i.e., the additive white noise signals are an adequate representation of the fluctuations in a cell. Otherwise, the Ω-expansion technique or another approximation method should be adopted to approximate the master equation more accurately. In the numerical examples, all the jumps θ_{k,i} are 1 or 2 in magnitude, which is small compared with X_i(t), but they may still introduce errors into the simulation. Define N_i to be the first cumulant, or mean value, of x_i in the cell, and M_ij to be the second cumulant, or covariance, of x_i and x_j. Then, according to (8.61) and the Gaussian distribution approximation of Chapter 2, we have the evolution equations for the first and second cumulants, or the means and the covariances, as follows:

dN_1/dt = −e_i N_1 + k_pI N_7,
dN_2/dt = −2k_2(N_2² + M_22) + 2k_{-2}N_6 + k_pR N_8 + (2k_2/v̄ − e_r)N_2,
dN_3/dt = k_3(N_5N_6 + M_56) − N_3(k_{-3} + k_4(n_D/v̄ − N_4)) + k_4 M_34 + k_{-4}N_4,
dN_4/dt = k_4 N_3(n_D/v̄ − N_4) − k_4 M_34 − k_{-4}N_4,
dN_5/dt = k_1(N_9² + M_99) − (k_1/v̄)N_9 − k_{-1}N_5 − k_3(N_5N_6 + M_56) + k_{-3}N_3,
dN_6/dt = k_2(N_2² + M_22) − (k_2/v̄)N_2 − k_{-2}N_6 − k_3(N_5N_6 + M_56) + k_{-3}N_3,
dN_7/dt = k_m(n_D/v̄ − N_4) + αk_m N_4 − e_mi N_7,
dN_8/dt = k_m(n_D/v̄ − N_4) + αk_m N_4 − e_mr N_8,
dN_9/dt = −2k_1(N_9² + M_99) + (2k_1/v̄)N_9 + 2k_{-1}N_5 + k_a N_1 − e_a N_9 + d(N_9(t − τ) − N_9(t)),

(1/2)dM_11/dt = (1/2)(e_i N_1 + k_pI N_7) − e_i M_11,

(1/2)dM_22/dt = 2k_2(N_2² + M_22) + 2k_{-2}N_6 + (1/2)k_pR N_8 + ((1/2)e_r − 2k_2/v̄)N_2 − (4k_2N_2 + e_r − 2k_2/v̄)M_22 + 2k_{-2}M_26,

(1/2)dM_33/dt = (1/2)[k_3(N_5N_6 + M_56) + N_3(k_{-3} + k_4(n_D/v̄ − N_4)) − k_4 M_34 + k_{-4}N_4] + k_3N_6 M_35 + k_3N_5 M_36 − (k_{-3} + k_4(n_D/v̄ − N_4))M_33 + (k_4N_3 + k_{-4})M_34,

(1/2)dM_44/dt = (1/2)[k_4N_3(n_D/v̄ − N_4) − k_4M_34 + k_{-4}N_4 + σ²] + k_4(n_D/v̄ − N_4)M_34 − (k_4N_3 + k_{-4})M_44,

(1/2)dM_55/dt = (1/2)[k_1(N_9² + M_99) − (k_1/v̄)N_9 + k_{-1}N_5 + k_3(N_5N_6 + M_56) + k_{-3}N_3 + σ²] − (k_{-1} + k_3N_6)M_55 + k_{-3}M_35 − k_3N_5M_56 + (2k_1N_9 − k_1/v̄)M_59,

(1/2)dM_66/dt = (1/2)[k_2(N_2² + M_22) − (k_2/v̄)N_2 + k_{-2}N_6 + k_3(N_5N_6 + M_56) + k_{-3}N_3] − (k_{-2} + k_3N_5)M_66 + (2k_2N_2 − k_2/v̄)M_26 + k_{-3}M_36 − k_3N_6M_56,

(1/2)dM_77/dt = (1/2)[k_m(n_D/v̄ − N_4) + αk_m N_4 + e_mi N_7] − e_mi M_77,

(1/2)dM_88/dt = (1/2)[k_m(n_D/v̄ − N_4) + αk_m N_4 + e_mr N_8] − e_mr M_88,

(1/2)dM_99/dt = (1/2)[4k_1(N_9² + M_99) − (4k_1/v̄)N_9 + 4k_{-1}N_5 + k_a N_1 + e_a N_9 + d(N_9(t − τ) + N_9) + σ²] − (d + e_a + 4k_1N_9 − 2k_1/v̄)M_99 + k_a M_19 + 2k_{-1}M_59,

dM_26/dt = −2k_2(N_2² + M_22) + (2k_2/v̄)N_2 − 2k_{-2}N_6 + (2k_2N_2 − k_2/v̄)M_22 + 2k_{-2}M_66 − (k_{-2} + e_r + 4k_2N_2 + k_3N_5 − 2k_2/v̄)M_26,

dM_34/dt = k_4(N_3N_4 + M_34) − (k_4 n_D/v̄)N_3 − k_{-4}N_4 + k_4(n_D/v̄ − N_4)M_33 + (k_4N_3 + k_{-4})M_44 − (k_{-3} + k_{-4} + k_4N_3 + k_4(n_D/v̄ − N_4))M_34,

dM_35/dt = −k_3(N_5N_6 + M_56) − k_{-3}N_3 + k_{-3}M_33 + k_3N_6M_55 − (k_{-1} + k_{-3} + k_3N_6 + k_4(n_D/v̄ − N_4))M_35 + k_3N_5(M_56 − M_36),

dM_36/dt = −k_3(N_5N_6 + M_56) − k_{-3}N_3 + k_{-3}M_33 + k_3N_5M_66 − k_3N_6M_35 − (k_{-2} + k_{-3} + k_3N_5 + k_4(n_D/v̄ − N_4))M_36 + k_3N_6M_56,

dM_56/dt = k_3(N_5N_6 + M_56) + k_{-3}N_3 − k_3N_6M_55 − k_3N_5M_66 + k_{-3}(M_35 + M_36) − (k_{-1} + k_{-2} + k_3N_6 + k_3N_5)M_56,

dM_59/dt = −2k_1(N_9² + M_99) + (2k_1/v̄)N_9 − 2k_{-1}N_5 + 2k_{-1}M_55 + (2k_1N_9 − k_1/v̄)M_99 − (d + k_{-1} + e_a + k_3N_6 + 4k_1N_9 − 2k_1/v̄)M_59.
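Once written down, cumulant systems of this kind are integrated as ordinary ODEs. As a minimal, checkable sketch, consider a linear birth-death process (births at rate b, deaths at rate a, not the nine-species model above), for which the mean/variance equations are exact and the stationary distribution is Poisson with mean and variance both b/a:

```python
import numpy as np

def cumulant_odes(state, b, a):
    """First/second cumulant equations of a linear birth-death process:
    dN/dt = b - a*N,   dM/dt = -2*a*M + b + a*N  (exact for linear kinetics)."""
    N, M = state
    return np.array([b - a * N, -2.0 * a * M + b + a * N])

def integrate(b=10.0, a=1.0, T=20.0, dt=1e-3):
    s = np.array([0.0, 0.0])           # start from N = M = 0
    for _ in range(int(round(T / dt))):
        s = s + dt * cumulant_odes(s, b, a)
    return s

N, M = integrate()
# stationary values: N = b/a = 10 (mean), M = b/a = 10 (Poisson variance)
```

For nonlinear kinetics such as the model above, the same integration applies, but the equations are only a Gaussian closure rather than exact.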

Theoretical Analysis

Hopf Bifurcation of the Evolution Equations

We first analyze the deterministic model, i.e., the cumulant equations, to derive general conditions for the Hopf bifurcation, under which the system (8.60)-(8.61) with time delays will converge to a non-trivial periodic solution. Let (N̄, M̄) be an equilibrium of (8.60)-(8.61), and let p be the number of non-zero elements in the covariance vector M. Denote the functions F_i(N(t), M(t)), G_ij(N(t), M(t)) − (d_ii + d_jj)M_ij(t), and d_ij(N_i(t − τ_i) + N_j(t)) by an m × 1 vector function F(N(t), M(t)), a p × 1 vector function G(N(t), M(t)), and a p × 1 vector function U(N(t − τ), N(t)), respectively. Define

A(λ) = [ ∂F/∂N + P   ∂F/∂M ; ∂G/∂N + Q   ∂G/∂M ], (8.71)

where P = diag(d_11(e^{−λτ_1} − 1), ..., d_mm(e^{−λτ_m} − 1)) is an m × m diagonal matrix, and Q = e^{−λt}(U_{N_1}, ..., U_{N_m}) is a p × m matrix, where U_{N_i} denotes the p × 1 vector function U(N(t − τ), N(t)) in which N_j(t − τ_j) and N_j(t) for j = 1, ..., m are replaced by zeros if j ≠ i, and replaced by e^{λ(t−τ_i)} and e^{λt}, respectively, if j = i. Then, the characteristic equation of (2.141)-(2.142) evaluated at the equilibrium (N̄, M̄) is

det(λI − A(λ)) = 0, (8.72)

where I is the (m + p) × (m + p) identity matrix. Note that A also includes the noise deviation σ through G. For any parameter α in (8.60)-(8.61), such as the coupling coefficients, time delays, and noise deviations, we have the following theorem:

Theorem 8.2. Suppose that the functions F, G, and U depend sufficiently smoothly on the parameter α, and that there is an α_0 such that for α < α_0 all roots λ_k, k = 1, 2, ..., m + p, of the characteristic equation belong to the open left half-plane, whereas for α = α_0,

1. λ_{1,2}|_{α=α_0} = ±iω_0, ω_0 > 0;
2. d Re λ_{1,2}(α)/dα|_{α=α_0} > 0, and Re λ_j|_{α=α_0} < 0 (j > 2).

Then a periodic solution of the system (8.60)-(8.61) arises near the solution (N, M) = (N̄, M̄), and this solution is stable if it arises for α > α_0 and unstable in the opposite case.
Under these conditions, if α increases and passes through the value α_0, the stable equilibrium becomes unstable; i.e., α = α_0 is the critical value of the bifurcation. When α passes through α_0 in one of the two directions, a periodic solution bifurcates from the equilibrium. Such a solution is stable if it arises for α > α_0 and unstable in the opposite case.
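Characteristic equations like (8.72) mix polynomial and exponential terms in λ, but for a scalar caricature the Hopf point can be found in closed form and then verified numerically. Assume, purely for illustration, a single delayed negative feedback dN/dt = −aN(t − τ): its characteristic equation is λ + a e^{−λτ} = 0, and the first pure-imaginary root λ = ia appears at the critical delay τ = π/(2a).

```python
import numpy as np

def critical_delay(a):
    """Hopf threshold of dN/dt = -a*N(t - tau): the characteristic equation
    lambda + a*exp(-lambda*tau) = 0 first admits a pure-imaginary root
    lambda = i*a at tau = pi/(2*a)."""
    return np.pi / (2.0 * a)

a = 1.0
tau = critical_delay(a)
# substitute lambda = i*a back into the characteristic equation
residual = 1j * a + a * np.exp(-1j * a * tau)   # vanishes at the Hopf point
```

The same bookkeeping (find τ or σ where a root crosses the imaginary axis, check the transversality condition) is what Theorem 8.2 formalizes for the full (m + p)-dimensional system.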

A Sufficient Condition for Synchronization

When the number of cells n is sufficiently large, we assume that the system can be expressed by the deterministic dynamics (8.60)-(8.61) under the Gaussian approximation. To consider the interconnection of cells directly, we replace N_i(t − τ) in (8.60)-(8.61) by Σ_{k=1}^n N_i^k(t − τ)/n, owing to y_i in (8.59), where the superscript k indicates cell k. For such a case, the existence of periodic solutions in the system (8.60)-(8.61) implies that the original n cells show bulk synchronization. However, partial oscillations among cells are generally expected. For generality, we divide the n cells into ñ (ñ ≤ n) different sets or groups, each set or group containing a fraction W^k, k = 1, 2, ..., ñ, of the n oscillators, with

Σ_{k=1}^{ñ} W^k = 1, (8.73)

where the superscript k of W^k indicates the group or set k. W^k is a non-negative scalar, and W^k n is an integer representing the number of cells in the kth group. Because all the cells in each set are equivalent, we further use R_i(t − τ) ≡ Σ_{k=1}^{ñ} W^k N_i^k(t − τ) to replace Σ_{k=1}^n N_i^k(t − τ)/n. Thus, (8.60)-(8.61) are rewritten as follows:

dN_i^k(t)/dt = F_i(N^k(t), M^k(t)) + d_ii(R_i(t − τ) − N_i^k(t)), (8.74)

dM_ij^k(t)/dt = G_ij(N^k(t), M^k(t)) − (d_ii + d_jj)M_ij^k(t) + d_ij(R_i(t − τ) + N_j^k(t)), (8.75)

where 1 ≤ k ≤ ñ, d_ij = 0 if i ≠ j, and M_ij^k(t) = M_ji^k(t). Note that bulk oscillation, or in-phase synchronization of these ñ groups, corresponds to ñ = 1. This implies that the bulk oscillation is a special case of the solutions of (8.74)-(8.75). However, phase-locked oscillations (i.e., phase synchronization) among cells are generally expected.
For clarity, (8.74)-(8.75) are rewritten as

dN_i^k(t)/dt = F_i(N^k(t), M^k(t)) + d_ii(R_i(t − τ) − N_i^k(t)), (8.76)

dM_ii^k(t)/dt = G_ii(N^k(t), M^k(t)) − 2d_ii M_ii^k(t) + 2d_ii N_i^k(t) + d_ii[R_i(t − τ) − N_i^k(t)], (8.77)

dM_ij^k(t)/dt = G_ij(N^k(t), M^k(t)) − (d_ii + d_jj)M_ij^k(t), i ≠ j. (8.78)

By introducing a vector-valued variable Z combining the variables N and M, (8.76)-(8.78) are further rewritten in compact form as follows:

d/dt (z_1^k(t); z_2^k(t); z_3^k(t)) = H(Z^k(t); μ) + ( D Σ_{l=1}^{ñ} W^l z_1^l(t − τ) − D z_1^k(t) ; D Σ_{l=1}^{ñ} W^l z_1^l(t − τ) − D z_1^k(t) ; O ), (8.79)

where

D = diag(d_11, ..., d_mm), (8.80)

Z^k(t) = (z_1^k(t); z_2^k(t); z_3^k(t)), (8.81)

with z_1^k(t) ∈ R^m, z_2^k(t) ∈ R^m, and z_3^k(t) ∈ R^{m(m−1)/2}, representing N_i^k, M_ii^k, and M_ij^k, respectively, and μ is a parameter (see below). Note that there are only m(m − 1)/2 independent variables in z_3^k, owing to M_ij^k = M_ji^k. Equation (8.79) may be regarded as a system of ñ identical groups coupled in a linear way with time delays. Each group will be considered as a system with m(m + 3)/2 distinct deterministic variables (summing all variables of z_1^k, z_2^k, and z_3^k for each k), which are governed by the dynamical equation in the following vector form:

dZ/dt = H(Z; μ). (8.82)

Suppose that its steady state satisfies H(Z̄; μ) = 0. Then a steady state of the coupled system (8.79) is

Ū = (Z̄; Z̄; ...; Z̄). (8.83)

We now study synchronization solutions of (8.79), i.e., phase-locked solutions with non-zero phase differences. The mathematical analysis of mutual synchronization is still a challenging problem. The pioneering work in this area is due to Winfree (Winfree 1967, Winfree 1980, Winfree 1987) and Kuramoto (Kuramoto 1984), who simplified the problem by assuming that the oscillators are strongly attracted to limit cycles in the phase space, so that the amplitude variations can be neglected and only phase variations need to be considered. Winfree and Kuramoto discovered that mutual synchronization is a cooperative phenomenon, a temporal analogue of the phase transitions encountered in statistical physics. Now, suppose that the system (8.79) has a periodic solution of the form

Z^j(t) = P(t − α_j T) = (P_1(t − α_j T); P_2(t − α_j T); P_3(t − α_j T)), 1 ≤ j ≤ ñ, (8.84)

where P(t) is a non-trivial vector-valued function with least period T > 0. Let α_1 = 0 without loss of generality. Such a solution is called a phase-locked

solution of (8.79). Essentially, the oscillation in each group is described by the function P(t). Different groups, however, may be out of phase, with phase difference Tβ_j ≡ T(α_{j+1} − α_j). Here and henceforth, we index the groups of cells by j mod(ñ). When (8.84) is a solution of (8.79), certain compatibility conditions must hold. To derive those conditions, consider the behavior of the jth and the (j + 1)th variables at times t and t + β_j T, respectively. From (8.84) and (8.79), for 2 ≤ j ≤ ñ,

dP_1(t − α_j T)/dt = D[Σ_{k=1}^{ñ} W^k P_1(t − τ − (α_k + α_j)T) − P_1(t − α_j T)] + H(P(t − α_j T); μ) (8.85)

and

dP_1(t − α_j T)/dt = D[Σ_{k=1}^{ñ} W^k P_1(t − τ − α_k T) − P_1(t − α_j T)] + H(P(t − α_j T); μ). (8.86)

Subtracting the two equations, we have

D Σ_{l=1}^{ñ} W^l [P_1(t − τ − (α_l + α_j)T) − P_1(t − τ − α_l T)] = 0. (8.87)

Let

P_1(t) = Σ_{k=−∞}^{∞} γ_k e^{2πikt/T} (8.88)

be the Fourier expansion of P_1(t), where i = √−1. Then γ_{−k} = γ̄_k and γ_k = (1/T) ∫_0^T P_1(t)e^{−2πikt/T} dt. Substituting (8.88) into (8.87) and using orthogonality, we find that the basic compatibility conditions are

det[ D (Σ_{l=1}^{ñ} W^l e^{2πikα_l})(e^{2πikα_j} − 1) ] = 0 (8.89)

for all k for which γ_k ≠ 0 and for 2 ≤ j ≤ ñ, where det[·] denotes the determinant. Note that (8.89) has the following trivial solution for all D and arbitrary W^k: α_j = 0, 1 ≤ j ≤ ñ, which corresponds to the in-phase synchronization solution of (8.79). We are more interested in non-trivial solutions. For this, assume

det(D) = Π_{l=1}^{m} d_ll ≠ 0. (8.90)

Then, when (8.90) holds, (8.89) reduces to

Σ_{l=1}^{ñ} W^l e^{2πikα_l} = 0. (8.91)

One solution of (8.91) is

α_j = (j − 1)/ñ, 1 ≤ j ≤ ñ, (8.92)

W^j = 1/ñ, 1 ≤ j ≤ ñ. (8.93)

Clearly, the solution corresponding to such phases has uniform phase differences. An interesting phenomenon arises in the case that n identical cells are coupled in a ring, in which each cell is connected to its nearest neighbors, as depicted in (Alexander et al. 1978, Alexander et al. 1986). In such a case, we have ñ = n. Next, we examine the existence conditions of such a periodic solution with period T for (8.79), which are used to describe a synchronization mechanism through cell-to-cell communication. We show that the conditions required are straightforward and easy to verify for any particular example. These conditions depend strongly on the coupling, the time delays, the variances of the noise, and the kinetics. For the system (8.79), consider the problem of finding a phase-locked solution of the form (8.84). By (8.84) and (8.79), we have

d/dt (P_1(t); P_2(t); P_3(t)) = ( D Σ_{l=1}^{ñ} W^l P_1(t − τ − α_l T) − D P_1(t) ; D Σ_{l=1}^{ñ} W^l P_1(t − τ − α_l T) − D P_1(t) ; O ) + H(P(t); μ) (8.94)

and the oscillation in the jth (2 ≤ j ≤ ñ) group is given by

Z^j(t) = P(t − α_j T). (8.95)

Thus, the existence of a synchronous solution to (8.79) is converted to finding a periodic solution of the system (8.94). According to the global Hopf bifurcation theorem (Alexander et al. 1978, Alexander et al. 1986), we only need to examine some algebraic conditions. To be specific, let t′ = ω_0 t, where ω_0 = 2π/T. Then, (8.94) can be rewritten as

ω_0 d/dt′ (P_1(t′); P_2(t′); P_3(t′)) = ( D Σ_{l=1}^{ñ} W^l P_1(t′ − 2πτ/T − 2πα_l) − D P_1(t′) ; D Σ_{l=1}^{ñ} W^l P_1(t′ − 2πτ/T − 2πα_l) − D P_1(t′) ; O ) + H(P(t′); μ). (8.96)

Considering the linearization of (8.96) evaluated at Ū, we have

ω_0 d/dt (P_1(t); P_2(t); P_3(t)) = ( D Σ_{l=1}^{ñ} W^l P_1(t − 2πτ/T − 2πα_l) − D P_1(t) ; D Σ_{l=1}^{ñ} W^l P_1(t − 2πτ/T − 2πα_l) − D P_1(t) ; O ) + A P(t), (8.97)

where A = ∂H(Ū; μ)/∂P. Let

B_k(μ) = ( D(Σ_{l=1}^{ñ} W^l e^{−2πik(τ/T + α_l)} − 1)  O  O ; D(Σ_{l=1}^{ñ} W^l e^{−2πik(τ/T + α_l)} − 1)  O  O ; O  O  O ) + A (8.98)

for k = 0, ±1, ±2, .... In addition, because our main interest is in the effects of noise on synchronized oscillations, we set μ = σ. Then we finally obtain the following theorem giving the existence conditions of phase-locked synchronous solutions (Alexander et al. 1978, Alexander et al. 1986).

Theorem 8.3. Suppose that the function H is differentiable with respect to its arguments, and that W^k = 1/ñ for 1 ≤ k ≤ ñ. If for some μ = μ_0 and α_j = (j − 1)/ñ (mod 1) the following conditions are satisfied:

1. B_0(μ_0) of (8.98) is non-singular;
2. B_1(μ_0) of (8.98) has a simple purely imaginary eigenvalue iω_0 with corresponding left and right eigenvectors V_L and V_R, respectively;
3. ikω_0 is not an eigenvalue of B_k(μ_0) for k ≥ 2;
4. Re(V_L (dA(μ_0)/dμ) V_R) ≠ 0, where Re is the operator taking the real part of a complex number;

then there is a global branch of 2π-periodic solutions of (8.97) bifurcating from (Ū, μ_0, ω_0), or equivalently, the original coupled system (8.79) has a phase-locked synchronous solution with a uniform phase difference.

If the conditions in Theorem 8.3 are satisfied, then the system (8.79) definitely has a synchronous solution, and the corresponding synchronization mechanism is based on the global Hopf bifurcation. Such conditions are easy to use and enable us to predict, for a given set of parameter values, whether or not the intercellular coupling and the noise can synchronize the cells.

Algorithm for Stochastic Simulation

Based on the direct Gillespie method (Gillespie 1976), we give a detailed algorithm for the stochastic simulation of the master equation (2.55), where Y(t) ≈ X̄(t − τ)V̄/v̄ with time delays.
Let the superscript j in the algorithm indicate the jth cell, and assume there is one time delay τ, although multiple delays can be incorporated into the algorithm in a similar manner.

1. Initialization: input the cell number n, the stop time t_stop, and the initial states X^j(0) = (X_1^j, ..., X_m^j) of the jth cell for j = 1, ..., n. Let X̄(r) = Σ_{j=1}^n X^j(0)/n for all −τ ≤ r ≤ 0, and set the evolution time t^j = 0.

2. Parallel computation for each cell: if t^{MX} − t^{mx} ≤ τ, proceed with the parallel computation for each cell, i.e., j = 1, ..., n. Otherwise, choose only the mx-th cell, i.e., j = mx, to proceed with the following computation, where MX and mx are the cells with the maximal and the minimal current evolution times among {t^1, ..., t^n}, respectively.

a) Mean-field variables: compute X̄(t^j − τ) = Σ_{j=1}^n X^j(t^j − τ)/n, where X^j(t^j − τ) is the latest value of X^j at t^j − τ. That is, if t^j − τ > 0, then X^j(t^j − τ) = X^j(t_1^j) for two consecutive updating times t_1^j and t_2^j of the jth cell with t_1^j ≤ t^j − τ < t_2^j; otherwise, X^j(t^j − τ) = X^j(0).

b) Propensities: compute w_i(X^j(t^j)) for i = 1, ..., n_0, according to the state X^j(t^j) and X̄(t^j − τ).

c) Uniform random numbers: draw two uniform random numbers r_1^j and r_2^j ∈ [0, 1).

d) Time interval Δτ^j: compute the time Δτ^j = −(ln r_1^j)/Σ_{i=1}^{n_0} w_i until the next reaction.

e) Next reaction μ^j: find the next reaction μ^j by taking μ^j to be the integer satisfying

Σ_{i=1}^{μ^j − 1} w_i < r_2^j Σ_{i=1}^{n_0} w_i ≤ Σ_{i=1}^{μ^j} w_i. (8.99)

f) Update the time t^j ← t^j + Δτ^j and the state X^j ← X^j + θ_{μ^j}, according to the μ^j-th reaction.

3. Termination check: if min{t^1, ..., t^n} > t_stop, then terminate the computation; otherwise, go to step 2.

The Gillespie algorithm is considered the standard one for the stochastic simulation of biochemical systems. In particular, the algorithm generates an ensemble of sample trajectories of the system with the correct statistics for a set of biochemical reactions, which asymptotically converge to the solution of the corresponding master equation. Clearly, the algorithm requires storing the sampling times and the states in the time interval [t^j − τ, t^j], due to the time delays.

Numerical Simulation

The noise, the coupling, and the time delays affect the dynamics of the individual oscillators and can lead to cooperative behavior.
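The delayed direct-method loop described in the algorithm above can be sketched in a few lines. The version below is a deliberately stripped-down, single-cell, single-species stand-in (a hypothetical birth-death process whose birth propensity is repressed by the state one delay τ in the past, not the luxI/luxR network), but it shows the two ingredients the algorithm emphasizes: storing the trajectory history and looking up the delayed state when computing propensities.

```python
import numpy as np

def delayed_gillespie(b, a, tau, x0, t_stop, seed=0):
    """Direct Gillespie simulation with a delayed propensity:
    birth at rate b/(1 + x(t - tau)), death at rate a*x(t).
    times/states store the full history so the delayed state can be
    looked up, as the delayed algorithm requires."""
    rng = np.random.default_rng(seed)
    times, states = [0.0], [x0]
    t, x = 0.0, x0

    def delayed_state(s):
        if s <= 0.0:
            return x0                                  # pre-history: initial state
        i = np.searchsorted(times, s, side="right") - 1
        return states[i]                               # latest recorded state at time s

    while t < t_stop:
        w1 = b / (1.0 + delayed_state(t - tau))        # delayed birth propensity
        w2 = a * x                                     # death propensity
        w = w1 + w2
        t += -np.log(1.0 - rng.random()) / w           # exponential waiting time
        x = x + 1 if rng.random() * w < w1 else x - 1  # pick reaction as in (8.99)
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

times, states = delayed_gillespie(b=50.0, a=1.0, tau=2.0, x0=10, t_stop=50.0)
```

In the multicellular algorithm, the same lookup is applied per cell to the stored mean-field trajectory X̄, and history outside [t^j − τ, t^j] can be discarded.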
The effects of the extracellular noise intensity on cooperative behavior among cells are first considered. The bifurcation diagram of the AI mean value as a function of the noise deviation is indicated in Figure 8.10 (a), which shows that the noise can actually induce oscillations by a Hopf bifurcation. Clearly, without extracellular noise, the evolution equations for cumulants and covariances converge to a stable equilibrium or a stationary probability distribution with constant means

and covariances, which corresponds to asynchronously fluctuating behavior of the cells. As the noise intensity increases, the equilibrium becomes unstable and the multicellular system becomes oscillatory, which implies that the synchronized oscillation is induced by extracellular noise. In other words, for such a system, the noise provides extra dynamics, i.e., the dynamics of the second cumulants, originating from fluctuations or other energy sources beyond the coupled system, which induces cooperative oscillatory behavior among cells. The cooperative behavior, which is also confirmed by simulation of the stochastic master equation, is shown in Figure 8.10 (b). Such behavior corresponds to in-phase synchronization induced by extracellular noise. Clearly, all cells oscillate almost in the same way, with synchronous fluctuations. However, in the absence of extracellular noise, the mean values of the AIs evolve to a steady state.

Figure 8.10 Noise-driven synchronization. (a) The bifurcation diagram of the cumulant equations. The two curves show the stable steady state of the AI concentration, as well as the maximum and minimum concentrations of AI in the course of oscillation. (b) The cooperative behavior of the AI concentrations of three cells with random initial phases, which indicates cooperative behavior, or nearly synchronous periodic oscillations, induced by extracellular noise (from (Chen et al. 2005))

Generally, the coupling enhances the synchronization among cells, whereas the time delay τ and the extracellular noise σ tend to induce oscillatory behavior in each cell. When there is no time delay, or the time delay is small, the evolution equations for the cumulants and covariances converge to a stable steady state, or a stationary probability distribution with constant means and covariances, for which each cell does not oscillate, as shown in Figure 8.11 (a).
However, with a large time delay, the multicellular system becomes synchronously oscillatory. The extracellular noise σ has an oscillation-inducing effect similar to that of the time delay. On the other hand, the oscillatory and steady-state regions as functions of the noise deviation σ and the coupling strength d are shown in Figure 8.11 (b). Clearly, the oscillatory region emerges as the noise deviation increases and also depends on the coupling strength. In particular, for small noise, the multicellular system converges to a steady state, which is a trivial synchronous state. Therefore, for such a system, the coupling and the time delay, as well as the noise, can significantly affect the cooperative dynamics.

Figure 8.11 Bifurcation diagrams of the mean AI value for the evolution equations. (a) Bifurcation diagram with respect to the time delay. (b) Oscillatory and steady-state regions in the plane of the noise deviation σ and the coupling strength d. The oscillatory region emerges with increasing noise deviation σ (from (Chen et al. 2005)).

The cooperative dynamics induced by the time delay is confirmed in Figure 8.12 (a), whereas the convergence of the cumulants to a steady state for a small time delay is shown in Figure 8.12 (b), by both the evolution equations and the master equations. On the other hand, the effect of coupling on cooperative behavior is demonstrated in Figure 8.11 (b). We can see that all cells oscillate almost synchronously for a large coupling d. However, when the coupling d is reduced to a small value, the multicellular system still oscillates, but in an asynchronous manner (not shown). These facts imply that the coupling generally enhances the cooperative dynamics by controlling the stability of synchronization. All these figures are obtained with randomly distributed initial phases. In other words, the simulation for all cells starts from asynchronous initial conditions, to demonstrate the effect of active synchronization.
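The delay-induced onset of oscillations described above can be illustrated with a minimal numerical sketch. The following is not the AI model of Chen et al. but an illustrative single-gene delayed negative-feedback loop, x'(t) = 1/(1 + x(t - τ)^6) - 0.5 x(t), integrated by the forward-Euler method with a ring buffer holding the delayed state; the feedback function and all parameter values are assumptions chosen for illustration. Below a critical delay the trajectory settles to a steady state, while for a large delay a sustained oscillation appears.

```python
def simulate_delayed_repression(tau, t_end=300.0, dt=0.01, n=6, k=0.5):
    """Forward-Euler integration of the illustrative delayed feedback model
    x'(t) = 1/(1 + x(t - tau)^n) - k*x(t), with constant history x = 0.5."""
    m = max(1, int(round(tau / dt)))  # delay expressed in integration steps
    buf = [0.5] * m                   # ring buffer of the last m values of x
    x = 0.5
    trace = []
    for i in range(int(t_end / dt)):
        x_delayed = buf[i % m]        # value stored m steps (= tau) ago
        buf[i % m] = x                # overwrite the oldest entry with x(t)
        x = x + dt * (1.0 / (1.0 + x_delayed ** n) - k * x)
        trace.append(x)
    return trace

def tail_amplitude(trace):
    """Peak-to-peak amplitude over the second half of the trajectory."""
    tail = trace[len(trace) // 2:]
    return max(tail) - min(tail)
```

Comparing a small delay (tau = 0.5) with a large one (tau = 4.0) reproduces the qualitative dichotomy discussed above: a vanishing long-time amplitude below the critical delay and a finite one beyond it.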
These results indicate that when the system is oscillatory, the cooperative behavior among cells can be observed clearly from the evolution equations, as shown in Figure 8.10 (b). Figure 8.12 (a) shows good agreement between stochastic

and deterministic approaches, except for quantitative differences in the peak levels. Such differences possibly arise from the small number of cells in the simulation and are expected to be reduced further when more cells are considered. Noise promotes oscillations by introducing extra dynamics, which originate from random fluctuations, i.e., the second cumulants. Under certain conditions, such extra dynamics may play a crucial role as an energy source that excites cooperative behavior or leads to ordered states among cells. Indeed, it is well known that noise plays a significant role in the stochastic resonance of non-autonomous systems with periodic or non-periodic driving forces and in the coherence resonance of autonomous systems. In particular, coherence resonance phenomena show that noise can enhance the temporal regularity of a dynamical system. Many theoretical and experimental works, including studies of coupled coherence-resonance oscillators and coupled chaotic systems, have analyzed various kinds of resonance behavior. All these studies indicate that noise can play a stabilizing role in coupled oscillators and maps, and tends to drive stochastic systems toward regular dynamics. Intracellular and extracellular noises play different roles in cell communication. Since the intracellular noises of individual cells are independent, they generally tend to disturb cooperative behavior among cells. In contrast, since the extracellular noise is nearly common to all cells owing to the shared environment, it tends to synchronize the cells by exerting the same fluctuations on each of them through the signal molecules. The effects of coupling on cooperative behavior across a population of noisy oscillators have not been studied extensively, and the roles of noise are not yet well understood.
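The contrast between independent (intracellular) and common (extracellular) noise can be sketched with a toy ensemble of limit-cycle cells; this is an illustrative stand-in, not the quorum-sensing model analyzed in this chapter, and the Stuart-Landau dynamics and all parameters are assumptions. Each cell is integrated by the Euler-Maruyama method, driven either by one shared noise realization or by independent realizations, and the final phase coherence is quantified by the Kuramoto order parameter R (R = 1 for full phase synchronization).

```python
import math
import random

def simulate_cells(common, n=20, omega=2.0, sigma=0.45, dt=0.01,
                   t_end=300.0, seed=7):
    """Euler-Maruyama integration of n Stuart-Landau (limit-cycle) cells,
    driven by one shared noise realization (common=True, mimicking
    extracellular noise) or by independent realizations (common=False,
    mimicking intracellular noise).  Returns the Kuramoto order parameter
    R of the final phases."""
    rng = random.Random(seed)
    theta0 = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    xs = [math.cos(t) for t in theta0]   # start on the unit limit cycle
    ys = [math.sin(t) for t in theta0]   # with random initial phases
    sq = math.sqrt(dt)
    for _ in range(int(t_end / dt)):
        if common:
            wx, wy = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            nx, ny = [wx] * n, [wy] * n  # the same kick for every cell
        else:
            nx = [rng.gauss(0.0, 1.0) for _ in range(n)]
            ny = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for i in range(n):
            x, y = xs[i], ys[i]
            r2 = x * x + y * y
            xs[i] = x + dt * ((1.0 - r2) * x - omega * y) + sigma * sq * nx[i]
            ys[i] = y + dt * ((1.0 - r2) * y + omega * x) + sigma * sq * ny[i]
    # Kuramoto order parameter of the final phases
    cr = sum(x / math.hypot(x, y) for x, y in zip(xs, ys)) / n
    ci = sum(y / math.hypot(x, y) for x, y in zip(xs, ys)) / n
    return math.hypot(cr, ci)
```

Running the two cases shows the asymmetry discussed above: the common noise realization pulls the phases of initially asynchronous cells together, whereas independent realizations leave the population incoherent.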
The results show that noise and time delays can induce cooperative behavior, or order, which seems to contradict our intuition based on the usual negative connotations of the words noise and randomness. The theoretical and numerical results suggest that such an essential and constructive role played by noise and coupling may enable living organisms to organize their various apparatuses harmoniously and to accomplish mutual communication actively (Springer and Paulsson 2006, Zhou et al. 2005).

8.7 Deterministic Synchronization of Genetic Networks in Lur'e Form

As described in Chapter 5, a gene regulatory system in a single cell can be represented in the Lur'e form (5.11) and (5.12); therefore, we can study the synchronization of multicellular networks by considering coupled genetic oscillators in the Lur'e form. In this section and the next, we present results in this framework. The main contents of this section are based on (Li et al. 2006b). More generally, we can study a genetic oscillator model in the Lur'e form with multiple nonlinear vector regulatory functions as follows:

Figure 8.12 Dynamics of AIs under the effects of the time delay by simulation of the stochastic master equation. The AI concentrations generated by the stochastic simulations are denoted by the dots, while the mean value generated by the evolution equations is denoted by the bold line. (a) Cooperative behavior induced by the time delay (a nearly synchronous periodic oscillation). (b) Convergence to a steady state, where the dots correspond to the result of the stochastic simulation and the bold line to that of the evolution equations (from (Chen et al. 2005)).

$$\dot{y}(t) = A y(t) + \sum_{i=1}^{l} B_i f_i(y(t)), \qquad (8.100)$$

where $y(t) \in R^n$ represents the concentrations of proteins, RNAs, and biochemical complexes, and $f_i(y) = [f_{i1}(y_1(t)), \dots, f_{in}(y_n(t))]^T$ with $f_{ij}(y_j(t))$ a monotonically increasing or decreasing regulatory function, usually of the MM or Hill form. $A$ and $B_i$ are matrices in $R^{n \times n}$. Many well-known genetic system models can be represented in (or rewritten into) this form, such as the Goodwin model (Goodwin 1965), the repressilator (Elowitz and Leibler 2000), the toggle switch (Gardner et al. 2000), and the circadian oscillators (Goldbeter 1995). In synthetic biology, genetic oscillators of this form can be implemented experimentally (Kalir et al. 2005). To make the method more understandable and to avoid unnecessarily complicated notation, we consider the following simplified model, in which there are only one increasing and one decreasing nonlinear term in each equation of the individual genetic oscillator:

$$\dot{y}(t) = A y(t) + B_1 f_1(y(t)) + B_2 f_2(y(t)), \qquad (8.101)$$

where $A y(t)$ includes the degradation terms and all the other linear terms in the genetic oscillator, and $f_1(y(t)) = [f_{11}(y_1(t)), \dots, f_{1n}(y_n(t))]^T$ with $f_{1j}(y_j(t))$ a monotonically increasing nonlinear regulatory function of the Hill form.
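As a minimal computational sketch of the Lur'e form (8.101), consider the reduced (protein-only) repressilator mentioned above, written as $\dot{y} = Ay + B_2 f_2(y)$ with $B_1 = 0$ (the regulation is purely repressive), a diagonal degradation matrix $A$, a cyclic matrix $B_2$ routing each repressive Hill term to its target gene, and decreasing Hill functions in $f_2$. The parameter values (maximal rate 10, Hill coefficient 4, unit degradation) are illustrative assumptions chosen so that the network oscillates; they are not taken from the cited models.

```python
def hill_dec(x, beta=10.0, n=4):
    """Monotonically decreasing Hill regulatory function (repression)."""
    return beta / (1.0 + x ** n)

# Lur'e form (8.101) for a reduced three-gene repressilator:
# ydot = A y + B1 f1(y) + B2 f2(y), here with B1 = 0.
A = [[-1.0, 0.0, 0.0],
     [0.0, -1.0, 0.0],
     [0.0, 0.0, -1.0]]
B2 = [[0.0, 0.0, 1.0],   # gene 1 is repressed by protein 3
      [1.0, 0.0, 0.0],   # gene 2 is repressed by protein 1
      [0.0, 1.0, 0.0]]   # gene 3 is repressed by protein 2

def f2(y):
    """Vector of decreasing regulatory functions, one per state variable."""
    return [hill_dec(v) for v in y]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

def step(y, dt):
    """One forward-Euler step of ydot = A y + B2 f2(y)."""
    lin = matvec(A, y)
    non = matvec(B2, f2(y))
    return [yi + dt * (li + ni) for yi, li, ni in zip(y, lin, non)]

def simulate(y0, dt=0.005, t_end=100.0):
    """Integrate from y0 and return the trajectory of the first protein."""
    y = list(y0)
    trace = []
    for _ in range(int(t_end / dt)):
        y = step(y, dt)
        trace.append(y[0])
    return trace
```

Starting from an asymmetric initial condition such as [0.5, 1.0, 2.0], the trajectory settles onto a sustained limit-cycle oscillation, since the chosen Hill steepness makes the symmetric equilibrium unstable (the cyclic-feedback secant condition is violated).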


Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype Lecture Series 7 From DNA to Protein: Genotype to Phenotype Reading Assignments Read Chapter 7 From DNA to Protein A. Genes and the Synthesis of Polypeptides Genes are made up of DNA and are expressed

More information

6A Genes and Cell Division

6A Genes and Cell Division genetics: the study of heredity Life Science Chapter 6 Cell Division 6A Genes and Cell Division gene: contain the cell s blueprints (the information needed to build the cell and cell products) a discrete

More information

AP Biology Curriculum Framework

AP Biology Curriculum Framework AP Biology Curriculum Framework This chart correlates the College Board s Advanced Placement Biology Curriculum Framework to the corresponding chapters and Key Concept numbers in Campbell BIOLOGY IN FOCUS,

More information

Essential knowledge 1.A.2: Natural selection

Essential knowledge 1.A.2: Natural selection Appendix C AP Biology Concepts at a Glance Big Idea 1: The process of evolution drives the diversity and unity of life. Enduring understanding 1.A: Change in the genetic makeup of a population over time

More information

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11 The Eukaryotic Genome and Its Expression Lecture Series 11 The Eukaryotic Genome and Its Expression A. The Eukaryotic Genome B. Repetitive Sequences (rem: teleomeres) C. The Structures of Protein-Coding

More information

Answer Key. Cell Growth and Division

Answer Key. Cell Growth and Division Cell Growth and Division Answer Key SECTION 1. THE CELL CYCLE Cell Cycle: (1) Gap1 (G 1): cells grow, carry out normal functions, and copy their organelles. (2) Synthesis (S): cells replicate DNA. (3)

More information

Cell Division: the process of copying and dividing entire cells The cell grows, prepares for division, and then divides to form new daughter cells.

Cell Division: the process of copying and dividing entire cells The cell grows, prepares for division, and then divides to form new daughter cells. Mitosis & Meiosis SC.912.L.16.17 Compare and contrast mitosis and meiosis and relate to the processes of sexual and asexual reproduction and their consequences for genetic variation. 1. Students will describe

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

GACE Biology Assessment Test I (026) Curriculum Crosswalk

GACE Biology Assessment Test I (026) Curriculum Crosswalk Subarea I. Cell Biology: Cell Structure and Function (50%) Objective 1: Understands the basic biochemistry and metabolism of living organisms A. Understands the chemical structures and properties of biologically

More information

Meiosis, Sexual Reproduction, & Genetic Variability

Meiosis, Sexual Reproduction, & Genetic Variability Meiosis, Sexual Reproduction, & Genetic Variability Teachers Guide NARRATION FOR MEIOSIS, SEXUAL REPRODUCTION, AND GENETIC VARIABILITY Since the members of no species, even California redwoods or giant

More information

Honors Biology Fall Final Exam Study Guide

Honors Biology Fall Final Exam Study Guide Honors Biology Fall Final Exam Study Guide Helpful Information: Exam has 100 multiple choice questions. Be ready with pencils and a four-function calculator on the day of the test. Review ALL vocabulary,

More information

Map of AP-Aligned Bio-Rad Kits with Learning Objectives

Map of AP-Aligned Bio-Rad Kits with Learning Objectives Map of AP-Aligned Bio-Rad Kits with Learning Objectives Cover more than one AP Biology Big Idea with these AP-aligned Bio-Rad kits. Big Idea 1 Big Idea 2 Big Idea 3 Big Idea 4 ThINQ! pglo Transformation

More information

Genetic transcription and regulation

Genetic transcription and regulation Genetic transcription and regulation Central dogma of biology DNA codes for DNA DNA codes for RNA RNA codes for proteins not surprisingly, many points for regulation of the process DNA codes for DNA replication

More information

Host-Pathogen Interaction. PN Sharma Department of Plant Pathology CSK HPKV, Palampur

Host-Pathogen Interaction. PN Sharma Department of Plant Pathology CSK HPKV, Palampur Host-Pathogen Interaction PN Sharma Department of Plant Pathology CSK HPKV, Palampur-176062 PATHOGEN DEFENCE IN PLANTS A BIOLOGICAL AND MOLECULAR VIEW Two types of plant resistance response to potential

More information

98 Washington State K-12 Science Learning Standards Version 1.2

98 Washington State K-12 Science Learning Standards Version 1.2 EALR 4: Big Idea: Core Content: Life Science Structures and Functions of Living Organisms (LS1) Processes Within Cells In prior grades students learned that all living systems are composed of cells which

More information

Dynamics of the Mixed Feedback Loop Integrated with MicroRNA

Dynamics of the Mixed Feedback Loop Integrated with MicroRNA The Second International Symposium on Optimization and Systems Biology (OSB 08) Lijiang, China, October 31 November 3, 2008 Copyright 2008 ORSC & APORC, pp. 174 181 Dynamics of the Mixed Feedback Loop

More information

13.4 Gene Regulation and Expression

13.4 Gene Regulation and Expression 13.4 Gene Regulation and Expression Lesson Objectives Describe gene regulation in prokaryotes. Explain how most eukaryotic genes are regulated. Relate gene regulation to development in multicellular organisms.

More information

A synthetic oscillatory network of transcriptional regulators

A synthetic oscillatory network of transcriptional regulators A synthetic oscillatory network of transcriptional regulators Michael B. Elowitz & Stanislas Leibler, Nature, 403, 2000 igem Team Heidelberg 2008 Journal Club Andreas Kühne Introduction Networks of interacting

More information

Networks in systems biology

Networks in systems biology Networks in systems biology Matthew Macauley Department of Mathematical Sciences Clemson University http://www.math.clemson.edu/~macaule/ Math 4500, Spring 2017 M. Macauley (Clemson) Networks in systems

More information

Chapter 18 Regulation of Gene Expression

Chapter 18 Regulation of Gene Expression Chapter 18 Regulation of Gene Expression Differential gene expression Every somatic cell in an individual organism contains the same genetic information and replicated from the same original fertilized

More information

STAAR Biology Assessment

STAAR Biology Assessment STAAR Biology Assessment Reporting Category 1: Cell Structure and Function The student will demonstrate an understanding of biomolecules as building blocks of cells, and that cells are the basic unit of

More information

Valley Central School District 944 State Route 17K Montgomery, NY Telephone Number: (845) ext Fax Number: (845)

Valley Central School District 944 State Route 17K Montgomery, NY Telephone Number: (845) ext Fax Number: (845) Valley Central School District 944 State Route 17K Montgomery, NY 12549 Telephone Number: (845)457-2400 ext. 18121 Fax Number: (845)457-4254 Advance Placement Biology Presented to the Board of Education

More information

Chapter 6: Cell Growth and Reproduction Lesson 6.1: The Cell Cycle and Mitosis

Chapter 6: Cell Growth and Reproduction Lesson 6.1: The Cell Cycle and Mitosis Chapter 6: Cell Growth and Reproduction Lesson 6.1: The Cell Cycle and Mitosis No matter the type of cell, all cells come from preexisting cells through the process of cell division. The cell may be the

More information

Chapter 12. Genes: Expression and Regulation

Chapter 12. Genes: Expression and Regulation Chapter 12 Genes: Expression and Regulation 1 DNA Transcription or RNA Synthesis produces three types of RNA trna carries amino acids during protein synthesis rrna component of ribosomes mrna directs protein

More information