DYNAMIC MODELING OF BIOLOGICAL AND PHYSICAL SYSTEMS


The Pennsylvania State University
The Graduate School
Eberly College of Science

DYNAMIC MODELING OF BIOLOGICAL AND PHYSICAL SYSTEMS

A Dissertation in Mathematics by Assieh Saadatpour Moghaddam

2012 Assieh Saadatpour Moghaddam

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

August 2012

The dissertation of Assieh Saadatpour Moghaddam was reviewed and approved* by the following:

Mark Levi, Professor of Mathematics, Dissertation Co-Adviser, Co-Chair of Committee
Réka Albert, Professor of Physics and Biology, Dissertation Co-Adviser, Co-Chair of Committee
Andrew Belmonte, Associate Professor of Mathematics
Timothy Reluga, Assistant Professor of Mathematics and Biology
John Fricks, Associate Professor of Statistics
Svetlana Katok, Professor of Mathematics, Chair of Graduate Program

*Signatures are on file in the Graduate School.

Abstract

Given the complexity and interactive nature of many biological and physical systems, constructing informative and coherent network models of these systems and subsequently developing efficient approaches to analyze the models is of utmost importance. The combination of network modeling and dynamic analysis enables one to investigate the behavior of the underlying system as a whole and to make experimentally testable predictions about less-understood aspects of the processes involved. This dissertation reports on a combination of theoretical and computational approaches for network-based dynamic analysis of several highly interactive biological and physical systems. Various dynamic modeling approaches, ranging from Boolean to continuous models, are employed to carry out a systematic analysis of the long-term behavior (attractors) of the respective systems.

First, we employ a Boolean dynamic framework to model two biological systems: the abscisic acid (ABA) signal transduction network in plants and the T-LGL leukemia signaling network in humans. Given the relatively large number of components in these networks, we develop a network reduction technique leading to a significant decrease in the computational burden associated with the state space analysis of Boolean models while preserving essential dynamical features. For the ABA system, we utilize a synchronous and three different asynchronous Boolean dynamic methods and compare the attractors of the system and their basins of attraction for both unperturbed and perturbed systems. For the T-LGL signaling network, the best-performing asynchronous Boolean dynamic method identified in our first study is used to determine the disease states of the components of the system and to propose several novel candidate therapeutic targets.

Next, we apply a Boolean-continuous hybrid (piecewise linear) dynamic formalism to model a pathogen-immune system interaction network, and present the results of a comparative study of the dynamic characteristics of Boolean and hybrid models. Finally, we rely on continuous dynamic modeling to prove the existence of traveling wave solutions in a better-characterized physical system, namely, a chain of coupled pendula in the presence of damping and forcing. Overall, the theoretical and computational approaches developed in this dissertation provide a bird's-eye view of the avenues available for model-driven analysis of complex biological and physical systems.

Table of Contents

List of Figures
List of Tables
Acknowledgements

Chapter 1. Introduction
    Boolean dynamic modeling of signal transduction networks
    Reconstructing the network
    Identifying Boolean functions
    Implementing time
    Analyzing the dynamics of the system
    Validating the reconstructed model
    Studying the robustness of the reconstructed model
    Using the model to make new predictions

Chapter 2. Attractor Analysis of Asynchronous Boolean Models of Signal Transduction Networks
    Introduction
    Methods
    Network reduction
    Identification of attractors
    Results
    Synchronous model
    Random order asynchronous (ROA) model
    General asynchronous (GA) model
    Deterministic asynchronous (DA) model
    Node perturbations
    Discussion and conclusion

Chapter 3. Boolean Dynamic Modeling of a T Cell Survival Network Identifies Novel Candidate Therapeutic Targets for Large Granular Lymphocyte Leukemia
    Introduction
    3.2. Methods
    Results
    Network simplification and dynamic analysis
    Experimental validation of the T-LGL steady state
    Node perturbations
    Discussion and conclusion

Chapter 4. Piecewise Linear Differential Equation (Hybrid) Models of Biological Regulatory Networks
    Introduction
    A hybrid model of the pathogen-immune system interactions
    Network modeling
    Parameter analysis
    Comparison of the dynamic properties of the hybrid and asynchronous Boolean models
    Discussion and conclusion

Chapter 5. Traveling Waves in Chains of Pendula
    Introduction
    Results
    Proofs
    Discussion and conclusion

Appendix A. Supporting Information for Chapter
Appendix B. Supporting Information for Chapter
Appendix C. Supporting Information for Chapter
References

List of Figures

1.1 Graphical representation and Boolean functions for the network given in Example 1.1.
1.2 State transition graph of the network given in Example 1.1 obtained by the synchronous update method.
1.3 State transition graph of the network given in Example 1.1 obtained by the random order asynchronous update method.
1.4 State transition graph of the network given in Example 1.1 obtained by the general asynchronous update method.
1.5 State transition graph of the network given in Example 1.1 obtained by the deterministic asynchronous update method.
2.1 The ABA signal transduction network as synthesized in [23].
2.2 The 13-node sub-network of the ABA signal transduction network.
2.3 The 3-node sub-network of the ABA signaling network with the corresponding state transition graph obtained from the ROA model.
2.4 The 8-node sub-network of the ABA signaling network.
2.5 State transition graph of the 3-node sub-network given in Figure 2.3(a) obtained from the GA model.
2.6 State transition graph of the 3-node sub-network for the DA model with the time units given in Proposition
2.7 Reduced sub-networks of the ABA signaling network upon knocking out the node $\mathrm{pH}_c$.
3.1 The T-LGL leukemia signaling network.
3.2 Reduced sub-networks of the T-LGL leukemia signaling network.
3.3 The state transition graph corresponding to the two oscillatory nodes, CTLA4 and TCR.
3.4 State transition graph of the 6-node sub-network represented in Figure 3.2(b).
3.5 Probabilities of reaching the normal and T-LGL fixed points when both are reachable.
4.1 Network model of immunological steps and processes activated upon invasion by B. bronchiseptica.
4.2 Illustration of the effect of parameter correlations in an example where node A activates node B.
4.3 The 3-node T-LGL sub-network and the corresponding piecewise linear differential equations.
4.4 State transition graph of the hybrid model for the 3-node T-LGL sub-network given in Figure 4.3(a).
4.5 State transition graphs of the hybrid and general asynchronous models for the 3-node T-LGL sub-network given in Figure 4.3(a).
5.1 An infinite chain of pendula with a common axis and nearest-neighbor torsional coupling.
5.2 A traveling wave in a chain of pendula with nearest-neighbor torsional coupling, in the presence of forcing and damping.
5.3 Traveling wave viewed as the motion of a particle in a potential field.
5.4 The chain of pendula sags under the gravitational potential.
5.5 Schematic representation of an invariant circle in $\mathbb{R}^{2N}$.
A.1 The 15-node sub-network of the ABA signaling network obtained from S1P or PA perturbation.

List of Tables

1.1 Components and the causal relationships between them incorporated in Example 1.1 [12].
2.1 Boolean rules governing the state of the 13-node sub-network depicted in Figure
2.2 The limit cycles observed in the synchronous model of the 11-node sub-network given in Figure
2.3 Alternative statement of the Boolean rules governing the state of the synchronous model of the 11-node sub-network given in Figure 2.2 as a function of $\mathrm{Ca}^{2+}_c$.
2.4 The expected number of time steps for absorbing into the fixed point when the Markov chain corresponding to the 3-node sub-network given in Figure 2.3(a) starts from the transient states in the ROA model.
2.5 Boolean rules governing the state of the 8-node sub-network represented in Figure
2.6 The expected number of time steps for absorbing into the fixed point when the Markov chain corresponding to the 3-node sub-network given in Figure 2.3(a) starts from the transient states in the GA model.
2.7 State of the 3-node sub-network at different time steps starting from the initial state 100 obtained from the DA model with the time units given in Proposition
2.8 Boolean rules governing the state of the nodes in the 19-node sub-network represented in Figure 2.7(a).
2.9 Boolean rules governing the state of the nodes in the 7-node sub-network represented in Figure 2.7(b).
2.10 The limit cycle observed in the synchronous model for the sub-network given in Figure 2.7(b).
3.1 Boolean rules governing the nodes' states in the 6-node sub-network represented in Figure 3.2(b).
3.2 A summary of the dynamic analysis results of the T-LGL survival signaling network.
4.1 Parameters in the hybrid model of the pathogen-immune system interaction network and their ranges [115].
A.1 State of the 3-node sub-network given in Figure 2.3(a) in different time steps starting from initial state 100 obtained by the DA model with the time units given in Proposition 2.2.
B.1 Boolean rules governing the state of the T-LGL signaling network depicted in Figure 3.1.
B.2 The full names of components in the T-LGL signaling network corresponding to the abbreviated node labels used in Figure 3.1.
B.3 Boolean rules governing the state of the 18-node sub-network depicted in Figure 3.2(a).
C.1 Differential equations governing the nodes' states in Figure 4.1 [115].
C.2 Statistically significant correlations (p < 0.05) among the threshold and decay parameters of different nodes of the network given in Figure 4.1 [115].

Acknowledgements

My utmost gratitude goes to my co-advisors, Prof. Réka Albert and Prof. Mark Levi, for their guidance, scientific insights, encouragement, and continuous support during the course of my research work at Penn State. It has been a truly rewarding experience working with them. I would also like to thank the other members of my doctoral committee, Prof. Andrew Belmonte, Prof. Timothy Reluga, and Prof. John Fricks, for their valuable suggestions and comments. Special thanks are extended to my collaborators, Prof. István Albert, Dr. Rui-Sheng Wang, and Dr. Juilee Thakar, for their contribution and insightful suggestions. I am also thankful to the former and current members of Prof. Réka Albert's lab, especially Dr. Ranran Zhang and Dr. Song Li, for enlightening discussions.

I would like to express my heart-felt gratitude to my husband, Ali, whose endless love and consistent encouragement empowers me to reach levels I always dreamed of. I also extend the deepest gratitude to my little bundle of joy, whom my husband and I are expecting in a few days, for being a great source of inspiration during the writing of this dissertation. Last but not least, my sincerest thanks go to my parents for whom the words are simply not enough to describe their love and continuous support during my life and in all my endeavors.

Chapter 1. Introduction

Network modeling provides a powerful tool for analyzing many complex systems, including molecular and cellular level biological systems, by integrating the existing knowledge into a coherent representation [1]. In this representation, the components of a system of interest are represented by nodes and the interactions among the nodes are described by edges. This type of abstraction can provide a foundation for developing computational approaches that are capable of predicting the dynamic behavior of a system in different conditions.

Dynamic models describe the evolution of a system over time. In dynamic models, the nodes of the network are characterized by states (representing, e.g., concentration or activity), and the states of the nodes change in time according to the interactions encapsulated in the network. Dynamic modeling approaches are divided into two major categories, continuous and discrete, according to the description of the nodes' states. Continuous models, usually formulated as a set of differential equations, are the most appropriate strategy to capture the dynamical nature of biological systems. However, the use of these models is hampered by the scarcity of the kinetic details of the interactions in all but a handful of extensively studied systems [2,3,4]. On the other hand, discrete models, such as finite state logical models [5,6], Boolean models [7,8], and Petri nets [9,10], provide a qualitative description that requires few or no parameters. The class of piecewise linear differential equation (hybrid) models bridges the gap between continuous and discrete models by characterizing each node by two variables, a continuous concentration and a discrete activity [11]. These models meld the logical description of the regulatory relationships with a linear concentration decay.

Which model to select depends on the level of quantitative detail in the available experimental data: continuous models can be employed when sufficient kinetic information is available, discrete models are best suited for less characterized systems with no kinetic details, and hybrid models can be used when partial information on the kinetic parameters is available.

In this dissertation, we employ Boolean and hybrid dynamic modeling approaches to analyze several incompletely characterized biological regulatory networks and utilize a continuous dynamic framework to study a physical system that is better characterized than the biological systems. More precisely, in Chapters 2 and 3, we study Boolean models of two biological systems, namely, the abscisic acid signal transduction network in plants and the T cell survival signaling network in humans. In Chapter 4, we focus on hybrid dynamic modeling of the interaction network between mammalian host immune components and respiratory bacteria. In addition, this chapter includes the results of a comparative study of the dynamic characteristics of Boolean and hybrid models. Finally, in Chapter 5, we use a continuous dynamic approach for modeling a physical system, namely, a lattice of coupled pendula in the presence of damping and forcing.

As Boolean modeling is the main focus of most chapters of this dissertation, in the following we describe the fundamental steps necessary to construct and analyze Boolean dynamic models of biological regulatory networks with a focus on signal transduction networks.
The background information on hybrid and continuous models is described in the Introduction of Chapters 4 and 5, respectively.

Boolean dynamic modeling of signal transduction networks

Part of the material presented in this section has been previously published in modified form by Springer [12] and some other parts have been submitted for publication [13].

Recent years have witnessed a growing interest in the study of signal transduction pathways due to their pivotal role in adapting to various environmental conditions. The process of sensing a signal in the extracellular environment, transducing it, and delivering it to its targets is carried out through a cascade of interactions [14]. A wealth of experimental data characterizing various aspects of signaling pathways has provided the basis for network reconstructions to visualize and better investigate the properties of these interacting pathways.

Network reconstruction of cellular signaling pathways involves the representation of proteins, secondary messengers, and small molecules as nodes and the interactions among these components as edges. This graphical representation, denoted by $G = (V, E)$ where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of nodes (vertices) and $E$ is the set of edges, usually contains directed edges, where the direction of each edge implies the regulation of the downstream node by the upstream node. In addition, any edge can be characterized by a positive or negative sign denoting activation or inhibition, respectively. The source nodes (i.e., nodes with no incoming edges) of this graph, if they exist, represent external inputs (signals), and one or more nodes, usually sink nodes (i.e., nodes with no outgoing edges), are customarily designated as outputs of the network.

Boolean models belong to the class of discrete dynamic models in which each node of the network is characterized by an ON (1) or OFF (0) state [7,8]. This indicates, for example, that a gene is expressed or not expressed, a transcription factor is active or inactive, or a molecule's concentration is above or below a certain threshold. The future state of each node $v_i$ is determined by the current states of the nodes regulating it according to a Boolean function $f_i : \{0,1\}^{m_i} \to \{0,1\}$, where $m_i$ is the number of regulators of $v_i$. Each Boolean function (rule) represents the regulatory relationships between the components and is usually expressed via the Boolean operators AND, OR and NOT.
Boolean dynamic modeling of signal transduction networks consists of the following main steps:

1- Reconstructing the network
2- Identifying Boolean functions
3- Implementing time
4- Analyzing the dynamics of the system
5- Validating the reconstructed model
6- Studying the robustness of the reconstructed model
7- Using the model to make new predictions

In the following, we explain these steps in detail and illustrate them through an example.

Reconstructing the network

The first step towards modeling signal transduction networks within a Boolean dynamic framework is to reconstruct the network by synthesizing all the relevant information about the system of interest. This is usually done through an extensive literature search. Experimental evidence can provide information about both the components of the system and the regulatory relationships among them [15]. Examples of the experimental evidence leading to identification of the components of a signaling system (nodes of the network) include experiments demonstrating that the activity or concentration of a protein changes once the respective input signal is induced or when a known component of the signaling system is knocked out. Alternatively, experiments showing that the output of the signaling system is altered upon a mutation of a gene suggest that the gene product is involved in the process, and experiments indicating that the artificial over-expression of an intermediary node affects the output of the signal transduction process imply that the node can be considered as a candidate component.
There are three major types of causal experimental evidence from which information on the interactions between the components (edges of the network) can be extracted [15]:

(i) biochemical evidence of transcription factor-gene interactions, enzymatic activity, or protein-protein interactions, which indicates a direct interaction between two components;
(ii) genetic evidence of the effect of the mutation of a particular component on another component; this evidence leads to an indirect causal relationship between the two components; or
(iii) pharmacological evidence of the effect of exogenous treatment of a particular component on another component; this evidence also leads to an indirect causal relationship between the two components.

The integration of the indirect causal evidence is often challenging and non-intuitive, as each such apparently pairwise relationship may in fact reflect a set of adjacent edges (a path) in the network, and it may involve other, known or unknown nodes. Fortunately, algorithmic approaches exist and have been implemented in the software package NET-SYNTHESIS [16], a regulatory network inference and simplification tool, which generates the sparsest network consistent with the given causal evidence. In some cases, one may need to perform manual curation of the output of the software to find the most realistic network corresponding to the available experimental observations [17].

After the nodes and edges of the network have been identified, one can assemble a network corresponding to the regulatory system of interest to get a big picture of the whole system and its interactions. Several software packages are available for network visualization and analysis, including the yEd Graph Editor [18], Graphviz [19], Cytoscape [20,21], and Pajek [22]. It is usually useful to summarize the collected evidence in a table that lists all the nodes that are incorporated into the signaling process as well as the regulatory relationships between the nodes. The example below illustrates this first step.

Example 1.1. Let us consider a signaling pathway with four components A, B, C, and D in which A is the network's input node (signal) and D is the output node. Information regarding the causal relationships is summarized in Table 1.1.

Table 1.1. Components and the causal relationships between them incorporated in Example 1.1 [12].

Components    Causal relationships
A             A activates B; A inhibits C
B             B activates C; B activates D
C             C inhibits B; C activates D
D

The network representation of this simple example is given in Figure 1.1(a), where the directed edge → or ⊣ denotes activation or inhibition of the downstream node by the upstream node, respectively. The Boolean functions governing the nodes' states in this network are listed in Figure 1.1(b). We elaborate more on the Boolean rules in the next section.
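The contents of Table 1.1 map directly onto a signed directed graph. The following Python sketch is only an illustration of that representation: the names `EDGES`, `regulators`, `sources`, and `sinks` are hypothetical and do not come from the original work. It stores each causal relationship as a signed edge and recovers the input (source) and output (sink) nodes from the edge list.

```python
# Signed directed edges taken from Table 1.1: (regulator, target, sign),
# with +1 for activation and -1 for inhibition.
# All identifiers here are illustrative, not part of the original work.
EDGES = [
    ("A", "B", +1), ("A", "C", -1),
    ("B", "C", +1), ("B", "D", +1),
    ("C", "B", -1), ("C", "D", +1),
]
NODES = ["A", "B", "C", "D"]


def regulators(node):
    """Return {regulator: sign} for all edges pointing into `node`."""
    return {src: sign for src, dst, sign in EDGES if dst == node}


def sources():
    """Nodes with no incoming edges (external inputs, i.e. signals)."""
    return [v for v in NODES if not regulators(v)]


def sinks():
    """Nodes with no outgoing edges (candidate outputs)."""
    return [v for v in NODES if all(src != v for src, _, _ in EDGES)]


print(sources())        # ['A']
print(sinks())          # ['D']
print(regulators("B"))  # {'A': 1, 'C': -1}
```

A plain edge list is enough for a four-node example; for larger networks, one of the graph packages cited above would be the more natural choice.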

[Figure 1.1, panel (a): network diagram of Example 1.1. Panel (b): table of Boolean rules, reproduced below.]

Node    Boolean rule
A
B       B* = A AND (NOT C)
C       C* = (NOT A) OR B
D       D* = B AND C

Figure 1.1. Graphical representation and Boolean functions for the network given in Example 1.1. (a) Graphical representation of the network given in Example 1.1 with the causal relationships listed in Table 1.1. A is the network's input node, B and C are intermediary nodes, and D is the output node. The directed edge → or ⊣ represents activation or inhibition of the downstream node by the upstream node, respectively. (b) Boolean rules governing the nodes' states in the network given in Example 1.1. The asterisk signifies the future state of a node, and for simplicity the state of each node is represented by its label [12].

Additional information can also be incorporated in the network representation [17,23]. For example, if there is experimental evidence for the genetic interaction of two proteins but no information on their physical interaction, a putative intermediary node can be added between the two proteins. In addition, both nodes and edges of the network may be color coded to describe different functionalities or locations of the components or to represent regulatory relationships between different types of biological entities. Conceptual nodes that represent a phenomenon rather than a physical component may be included in the network as well.

The network representation of a signal transduction process is a static description of the system. Graph theoretical analysis of this network using measures such as node degree, distance, betweenness centrality, and clustering coefficient [1] can shed light on the topological organization and function of the underlying biological system.

Identifying Boolean functions

Once the network backbone is synthesized, the next step is to identify the Boolean functions governing the state changes of the nodes in the network.
The Boolean function for a given node is determined by the nature of the interactions between that node and the input nodes directly interacting with it and is formulated via the Boolean operators AND, OR, and NOT. If a node has only one regulator, then a single variable, usually denoted by the label of the regulator, appears in its Boolean rule. This variable is combined with a NOT operator if the regulator is an inhibitor. When a node has multiple regulators, the OR operator is used if any of the regulators can activate the node, and the AND operator is used if co-expression of all of them is required for successful activation of the node. For example, in the case of node B in Figure 1.1 the rule B* = A AND (NOT C) expresses that both the presence of A and the absence of C are required for the activation of B.

When constructing the Boolean rules, one may encounter difficulties in deciding whether to use an AND or OR operator in a rule wherein a node is controlled by more than one regulator. In this situation, one should refer to the relevant experimental evidence. For example, node D in Figure 1.1 is regulated by B and C. How can one determine, in a real situation, if activation of both B and C, or only one of them, is required for the activation of D? If there is experimental evidence that knocking out either B or C leads to the absence of D, then AND should be used. Conversely, if there is evidence that only simultaneous knockout of B and C would inactivate D, then OR should be used. When no such information is available, the OR operator may be used as a default, and the model can be updated once additional information is obtained. Alternatively, one can also employ probabilistic Boolean networks [24,25], which allow incorporating uncertainty in the rules by assigning different Boolean rules to a node, each with a certain probability of being selected.

Implementing time

In order to convert a static network representation into a dynamic model, one first needs to decide how to implement time. In Boolean models, time is an implicit variable and can be implemented via synchronous or asynchronous update algorithms.
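Before turning to the update algorithms, the knockout criterion for choosing between AND and OR in the rule for node D can be checked mechanically. The Python sketch below is a hypothetical illustration (the function names are invented for this example): it writes both candidate rules for D as functions of B and C and asks whether knocking out a single regulator, with the other one ON, already switches D OFF.

```python
from itertools import product


def d_and(b, c):
    """Candidate rule D* = B AND C."""
    return int(b and c)


def d_or(b, c):
    """Candidate rule D* = B OR C."""
    return int(b or c)


def truth_table(rule):
    """Enumerate the rule over all four (B, C) input combinations."""
    return {(b, c): rule(b, c) for b, c in product((0, 1), repeat=2)}


def single_knockout_inactivates(rule):
    """True if knocking out B alone or C alone (other regulator ON) turns D OFF."""
    return rule(0, 1) == 0 and rule(1, 0) == 0


print(truth_table(d_and))                  # {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
print(single_knockout_inactivates(d_and))  # True: consistent with single-knockout evidence
print(single_knockout_inactivates(d_or))   # False: only the double knockout inactivates D
```

Under these assumptions, experimental evidence that a single knockout abolishes D matches the AND rule, whereas evidence that only the double knockout does so matches the OR rule, as described in the text.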
Synchronous models assume similar timescales for all the processes involved in a system, and are implemented by updating all the nodes' states simultaneously [7]. More precisely, the state of node $v_i$ at time step $t+1$, denoted by $x_i(t+1)$, is determined based on the state of its regulators at the $t$-th time step:

$$x_i(t+1) = f_i\big(x_{i_1}(t), x_{i_2}(t), \ldots, x_{i_{m_i}}(t)\big), \qquad (1.1)$$

where $f_i$ is the Boolean rule for node $v_i$ and $x_{i_1}, \ldots, x_{i_{m_i}}$ are the states of its regulators. Although the simplicity and computational efficiency of synchronous models are attractive, they cannot adequately describe biological systems that include processes at multiple levels, e.g. mRNA, protein, and post-translational levels, because the timescales of these processes range from fractions of a second to hours [26]. To overcome this limitation, asynchronous models have been developed wherein the nodes' states are updated based on their individual timescales [8]. Various asynchronous algorithms have been proposed so far, including the random order asynchronous [27,28], general asynchronous [28], and deterministic asynchronous [29] algorithms.

In the random order asynchronous algorithm, at each time step (or round of update), a random permutation of the nodes is selected and the nodes' states are updated in that order [27]. In this case, $x_i(t+1)$ is determined according to the most recently updated state of the regulators of node $v_i$:

$$x_i(t+1) = f_i\big(x_{i_1}(\tau_{i_1}), \ldots, x_{i_{m_i}}(\tau_{i_{m_i}})\big), \qquad (1.2)$$

where

$$\tau_{i_j} = \begin{cases} t & \text{if } v_{i_j} \text{ is updated after } v_i \text{ in the selected permutation,} \\ t+1 & \text{if } v_{i_j} \text{ is updated before } v_i \text{ in the selected permutation.} \end{cases}$$

In the general asynchronous framework, at each time step, the state of a randomly selected node is updated [28,30]. Thus, in this approach, it is quite possible that a node chosen in the current time step will be selected again in the subsequent time step.
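The three update schemes introduced so far differ only in which nodes are updated and which regulator states they see. The Python sketch below illustrates this on the rules of Figure 1.1(b); it is a minimal illustration rather than the implementation used in this work, and the function names and the optional `order` and `node` arguments (which make the random choices reproducible) are assumptions introduced here.

```python
import random

NODES = ("A", "B", "C", "D")

# Boolean rules of Figure 1.1(b); the input node A keeps its current state.
RULES = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"] and not s["C"],
    "C": lambda s: (not s["A"]) or s["B"],
    "D": lambda s: s["B"] and s["C"],
}


def synchronous_step(state):
    """Eq. (1.1): every node reads the states at time t."""
    s = dict(zip(NODES, state))
    return tuple(int(RULES[v](s)) for v in NODES)


def random_order_step(state, order=None, rng=random):
    """Eq. (1.2): nodes are updated one by one in a (random) permutation."""
    s = dict(zip(NODES, state))
    if order is None:
        order = list(NODES)
        rng.shuffle(order)
    for v in order:
        # Regulators updated earlier in this round are read at their new value.
        s[v] = int(RULES[v](s))
    return tuple(s[v] for v in NODES)


def general_async_step(state, node=None, rng=random):
    """General asynchronous update: a single (randomly chosen) node is updated."""
    s = dict(zip(NODES, state))
    if node is None:
        node = rng.choice(NODES)
    s[node] = int(RULES[node](s))
    return tuple(s[v] for v in NODES)


start = (1, 0, 1, 0)  # states of A, B, C, D
print(synchronous_step(start))                                  # (1, 0, 0, 0)
print(random_order_step(start, order=["C", "B", "A", "D"]))     # (1, 1, 0, 0)
print(general_async_step((0, 0, 0, 1), node="D"))               # (0, 0, 0, 0)
```

Passing an explicit `order` or `node` fixes the otherwise random choice, which is convenient when reproducing a single trajectory by hand.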
In the deterministic asynchronous framework, each node $v_i$ has a pre-selected time unit $\gamma_i$ (either based on a priori knowledge or randomly chosen from a uniform distribution), and its state is updated at positive multiples of that unit [29]:

$$x_i(t+1) = \begin{cases} f_i\big(x_{i_1}(t), \ldots, x_{i_{m_i}}(t)\big) & \text{if } t+1 = k\gamma_i \text{ for some } k \in \mathbb{N}, \\ x_i(t) & \text{otherwise.} \end{cases} \qquad (1.3)$$

As in this method there is a deterministic updating order that depends on the time units, it is possible that the state of one node is updated several times while that of another is updated only once.

Analyzing the dynamics of the system

By updating the nodes' states according to the synchronous or asynchronous algorithms, one can obtain the state of the whole system at each time step, which is expressed by a vector whose $i$-th component represents the state of node $v_i$ at that time step. We note that the Boolean model of a network with $n$ nodes has a total of $2^n$ states. These states and the allowed transitions among them form the state transition graph of the system. Starting from an initial state in the state transition graph and iteratively updating the state of the nodes, the system follows a trajectory of states and eventually converges to an attractor.

Attractors, which describe the long-time behavior of a system, fall into two groups: fixed points (steady states), wherein the state of the system does not change, and complex attractors, wherein the system oscillates among a set of states. As fixed points of a system are time independent, they are the same for both synchronous and asynchronous models. To obtain the fixed points, one can remove the time dependency from the Boolean rules and solve the resulting set of equations. Complex attractors of synchronous and deterministic asynchronous models correspond to limit cycles (i.e., a set of states that are repeated regularly). The length (period) of a limit cycle is the number of states in the cycle. In random asynchronous models, including random order and general asynchronous, the system may oscillate irregularly among a set of states to form the so-called loose attractors [28]. A loose attractor is a strongly connected component (SCC) of the state transition graph whose out-component (i.e., the set of states outside the SCC that can be reached from it) is empty. For each attractor, the set of states that can reach that attractor forms its basin of attraction.

Let us find all possible attractors for the simple signaling network in Example 1.1 using the update methods introduced in the section on implementing time. We start by illustrating the synchronous update method. Since the network has four nodes, a total of $2^4 = 16$ initial conditions are possible.
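With only 16 states, the attractors of Example 1.1 can also be found by exhaustive enumeration, which is a useful cross-check on hand calculations. The Python sketch below is one possible way to do this under stated assumptions, not the procedure used in this work: it treats an attractor as a terminal strongly connected component of the state transition graph, finds it by a simple reachability test that is only practical for very small networks, and uses hypothetical function names throughout.

```python
from itertools import product

NODES = ("A", "B", "C", "D")

# Boolean rules of Figure 1.1(b); the input node A keeps its current state.
RULES = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"] and not s["C"],
    "C": lambda s: (not s["A"]) or s["B"],
    "D": lambda s: s["B"] and s["C"],
}


def sync_successors(state):
    """Synchronous update: exactly one successor per state."""
    s = dict(zip(NODES, state))
    return {tuple(int(RULES[v](s)) for v in NODES)}


def general_async_successors(state):
    """General asynchronous update: one successor per choice of updated node."""
    result = set()
    for v in NODES:
        s = dict(zip(NODES, state))
        s[v] = int(RULES[v](s))
        result.add(tuple(s[u] for u in NODES))
    return result


def reachable(state, successors):
    """All states reachable from `state` (including itself) by depth-first search."""
    seen, stack = {state}, [state]
    while stack:
        for nxt in successors(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)


def attractors(successors):
    """Terminal SCCs: sets of states that can all reach each other and nothing else."""
    states = list(product((0, 1), repeat=len(NODES)))
    reach = {s: reachable(s, successors) for s in states}
    # A state lies in an attractor iff every state it can reach can reach it back.
    return {reach[s] for s in states if all(s in reach[t] for t in reach[s])}


for name, succ in [("synchronous", sync_successors),
                   ("general asynchronous", general_async_successors)]:
    for att in sorted(attractors(succ), key=len):
        print(name, len(att), sorted(att))
```

For this network the sketch reports the fixed point (0,0,1,0) under both schemes, a four-state limit cycle in the synchronous case, and an eight-state attractor (all states with A ON) in the general asynchronous case.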
Let us set the initial condition of the network at A 0 =C 0 =1 (ON) and B 0 =D 0 =0 (OFF), which we denote by the vector (1,0,1,0). To obtain the state of the system at the first time step using the synchronous method, one needs to plug in the initial states of the nodes in the corresponding Boolean functions given in Figure 1.1(b). This results in A 1 =1 (since nothing affects the signal), B 1 = 0 (since NOT C 0 =0), C 1 =0 (since both B 0 =0 and NOT A 0 =0), and D 1 =0 (since B 0 =0). Therefore, the state of the system at t=1 will be (1,0,0,0). Following this procedure leads to a repeating sequence of four states, i.e. (1,1,0,0), (1,1,1,0), (1,0,1,1), and (1,0,0,0). This is a limit cycle of length 9

20 four which has the initial state (1,0,1,0) in its basin of attraction. Starting from another initial condition, where A 0 =C 0 =D 0 =0 and B 0 =1, leads to the system s state (0,0,1,0) which remains unchanged with further updates. Thus, the initial state (0,1,0,0) is in the basin of attraction of the fixed-point attractor (0,0,1,0). Synthesizing all state trajectories leads to a state transition graph depicted in Figure 1.2. The binary digits represent the state of the nodes A, B, C, and D in order from left to right. The four gray states on the left form the limit cycle of the system, which has the four white states in its basin of attraction. The fixed point, which is represented with a gray color and a self-loop on it, has seven states plus itself in its basin of attraction. Figure 1.2. State transition graph of the network given in Example 1.1 obtained by the synchronous update method. The binary digits from left to right represent the state of the nodes A, B, C, and D, respectively. The four states represented with a gray color on the left form a limit cycle with four white states in its basin of attraction. The gray state with a self-loop on the right is the fixed point of the system which has seven other states in its basin of attraction [12]. We now consider the random order asynchronous updating scheme. To compute the system s state at t=1 starting from the initial condition A 0 =C 0 =1 (ON) and B 0 =D 0 =0 (OFF) and the update order C-B-A-D, one needs to plug in the latest updated state of the nodes in the corresponding Boolean functions given in Figure 1.1(b). Based on the given update order, first C is updated and becomes C 1 =(NOT A 0 ) OR B 0 =0, next B is updated and becomes B 1 =A 0 AND (NOT C 1 )=1, then A is updated and becomes A 1 =1 (since nothing affects it), and finally D is updated and becomes D 1 = B 1 AND C 1 =0. Therefore, 10

the system's state at t=1 is (1,1,0,0). This state is part of a loose attractor, which is depicted with a gray color on the left in Figure 1.3. Starting from the initial condition A_0 = B_0 = C_0 = 0 (OFF) and D_0 = 1 (ON) and any random update order chosen from all 4! = 24 possible permutations of four nodes, the system reaches a fixed point which is identical to the fixed point of the synchronous model.

Figure 1.3. State transition graph of the network given in Example 1.1 obtained by the random order asynchronous update method. The binary digits from left to right represent the state of the nodes A, B, C, and D, respectively. The seven states represented with a gray color on the left form a loose attractor with a white state in its basin of attraction. The gray state with a self-loop on the right is the fixed point of the system, which has seven other states in its basin of attraction [12].

Comparing the state transition graphs for the synchronous and random order asynchronous methods (Figures 1.2 and 1.3), we find that the fixed point is the same in both cases, as expected. This is explained by the fact that fixed points are update independent and can be obtained by taking away the time dependencies of the Boolean functions and solving the resulting set of equations. We also observe that the state transition graph for the random order asynchronous method is denser than that of the synchronous method because each state can have multiple outgoing edges in the random order asynchronous method, while in the synchronous method only one edge goes out of each state.

By using the general asynchronous method, the eight states in which the signal is ON form a loose attractor of the system, depicted by a gray color on the left in Figure 1.4. In this method, at each initial condition, there are four possibilities of choosing a node

randomly. For example, starting from the initial condition A_0 = B_0 = C_0 = 0 (OFF) and D_0 = 1 (ON) and selecting node D for update, the system reaches the state (0,0,0,0). Then from this state and updating node C, the system will converge to the only fixed point of the system, i.e., the state (0,0,1,0). As can be seen in Figure 1.4, there are more self-loops in the state transition graph of the general asynchronous model since it is quite possible that updating the state of a particular node does not change the state of the system.

Figure 1.4. State transition graph of the network given in Example 1.1 obtained by the general asynchronous update method. The binary digits from left to right represent the state of the nodes A, B, C, and D, respectively. The eight states represented with a gray color on the left form a loose attractor. The gray state on the right is the fixed point of the system, which has seven other states in its basin of attraction [12].

Finally, we illustrate the deterministic asynchronous method with the preselected time units t_A = 1, t_B = 2, t_C = 3, and t_D = 6. This choice of time units implies that node A is updated at all time instants (but its state will not change because it has no input node), node B is updated at multiples of two, node C is updated at multiples of three, and finally node D is updated at multiples of six. For example, starting from the initial condition A_0 = B_0 = D_0 = 1 (ON) and C_0 = 0 (OFF), at t=1 only A is updated but its state will be unchanged, and as a result a self-loop appears at the (1,1,0,1) state. At t=2, nodes A and B are updated but their states remain unchanged (because nothing affects A, and B_2 = A_1 AND (NOT C_1) = 1). At t=3, nodes A and C are updated. While node A remains unchanged, node C is turned ON (because C_3 = (NOT A_2) OR B_2 = 1). At t=6, all the nodes are updated, and after traversing through some transient states the system eventually

reaches a limit cycle of length four. The state transition graph corresponding to the deterministic asynchronous model is depicted in Figure 1.5. The limit cycle is different from that of the synchronous method, but the fixed point is the same as that of the other updating schemes. It should be noted that a change in the time units may result in observing different attractors. Thus this method is suitable for modeling signaling networks for which the time units are known beforehand.

Figure 1.5. State transition graph of the network given in Example 1.1 obtained by the deterministic asynchronous update method. The binary digits from left to right represent the state of the nodes A, B, C, and D, respectively. The four states represented with a gray color on the left form a limit cycle that can be reached from the four white states. The gray state on the right is the fixed point of the system, which has seven other states in its basin of attraction [12].

In the particular example that we considered here, oscillatory behavior was obtained with both synchronous and asynchronous updating schemes. It is known that synchronous models may generate some artifacts, such as spurious limit cycles [31,32]. As a result, it is possible that in some cases oscillations observed in the synchronous model disappear in asynchronous models.

Several software resources can be employed for simulation and dynamic analysis of Boolean models, such as BooleanNet [33], BoolNet [34], GINsim [35], CellNetAnalyzer [36,37], SimBoolNet [38], and ADAM [39]. The state transition graphs in the figures of this section were generated using BooleanNet.
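The synchronous and random order asynchronous calculations worked out by hand for Example 1.1 can be reproduced with a few lines of plain Python. The following is a minimal sketch, not the BooleanNet code used for the figures: the rules are transcribed from the worked example in the text (A* = A, B* = A AND (NOT C), C* = (NOT A) OR B, D* = B AND C), and the function names `synchronous_step`, `ordered_round`, and `synchronous_attractors` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

NODES = ("A", "B", "C", "D")

# Boolean rules of Example 1.1 as read from the worked example in the text.
RULES = {
    "A": lambda s: s["A"],                  # the signal keeps its state
    "B": lambda s: s["A"] and not s["C"],
    "C": lambda s: (not s["A"]) or s["B"],
    "D": lambda s: s["B"] and s["C"],
}


def synchronous_step(state):
    """Update all nodes simultaneously from the previous state."""
    s = dict(zip(NODES, state))
    return tuple(int(RULES[n](s)) for n in NODES)


def ordered_round(state, order):
    """One round of update in a given node order; each node sees the
    latest states of the nodes updated before it in the same round."""
    s = dict(zip(NODES, state))
    for n in order:
        s[n] = int(RULES[n](s))
    return tuple(s[n] for n in NODES)


def synchronous_attractors():
    """Map each synchronous attractor (a frozenset of states) to the
    number of initial states that end up in it (attractor included)."""
    basins = {}
    for state in product((0, 1), repeat=len(NODES)):
        seen = []
        while state not in seen:
            seen.append(state)
            state = synchronous_step(state)
        attractor = frozenset(seen[seen.index(state):])
        basins[attractor] = basins.get(attractor, 0) + 1
    return basins


if __name__ == "__main__":
    print(synchronous_step((1, 0, 1, 0)))       # first synchronous step
    print(ordered_round((1, 0, 1, 0), "CBAD"))  # update order C-B-A-D
    for attractor, size in synchronous_attractors().items():
        print(sorted(attractor), "basin size:", size)
```

Enumerating all 16 initial states is feasible only because the example is tiny; the same brute-force idea scales as 2^n in the number of nodes.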

Validating the reconstructed model

To assess the validity of a reconstructed model, one needs to check whether the known experimental observations can be replicated by the dynamic analysis. If there is experimental evidence for a certain behavior of a system that cannot be reproduced by the dynamic model, then the network model and/or the Boolean rules should be revised. For instance, the model needs to be refined when there is experimental evidence for an ON state of the output node under a certain initial condition while the model shows the opposite, or when the dynamic model indicates that there is no fixed point for the system while in reality the system eventually approaches a steady state. To this end, one can, for example, change the rules with uncertainties in them (e.g., those for which it was not clear whether to use an OR or AND operator) or incorporate possible additional nodes/edges into the network.

Studying the robustness of the reconstructed model

In order to assess the robustness of the reconstructed model, one can perform a perturbation analysis on the network structure and Boolean rules, or on the nodes' states. For the former, the robustness can be examined through, for example, randomly adding an edge between two components or interchanging OR and AND operators in a rule. For the latter, one can study the effect of knockout or over-expression mutations by fixing the state of a node at 0 or 1, respectively.

Let us elaborate more on node perturbations using the simple network given in Example 1.1 and considering the synchronous and the first two asynchronous methods. We omit the deterministic asynchronous updating scheme since the results would be highly dependent on the time units. As we have seen in section 1.1.4, when the signal node A is OFF, the system will always converge to the fixed point (0,0,1,0), in which the output node D stabilizes in the OFF state. This implies that activation of the signal is required for observing the output of the signaling process. On the other hand, when the signal is constitutively ON, the state transition graph possesses a complex attractor (either a limit cycle or a loose attractor) that contains sustained oscillations of the output node. The effects of perturbations of intermediary nodes in Example 1.1 are the same for the synchronous, random order asynchronous, and general asynchronous update methods.

Knocking out the intermediary node B leads to two fixed points, (0,0,1,0) and (1,0,0,0), where in both cases the output node settles into the OFF state. Conversely, over-expression of B results in two other fixed points, namely (0,1,1,1) and (1,1,1,1), where the output node is stabilized in the ON state. As a result, when the signal is absent, the over-expression of node B would be sufficient for activation of the output in the long-term behavior. When node C is knocked out, the system converges to either the (0,0,0,0) or the (1,1,0,0) fixed point, whereas with its over-expression the system reaches the (0,0,1,0) or the (1,0,1,0) fixed point. Therefore, neither knockout nor over-expression of node C suffices for the ON state of the output node D. A model is not expected to be robust to every possible change, but fulfilling some degree of robustness reflects the adaptability of the underlying biological system under different conditions.

Using the model to make new predictions

Once the reconstructed model has been shown to reproduce known biological observations, it can be used to draw new implications and make predictions about the underlying system. For example, performing knockout and over-expression analysis, as described in the previous step, can reveal the importance of certain components of the system or predict the phenotypic traits for system perturbations. The new predictions can guide future experimental studies, thus leading to further model refinements. In Chapters 2 and 3, we show how Boolean dynamic modeling can lead to concrete biological predictions in the case of the abscisic acid signal transduction network in plants as well as the T cell survival signaling network in humans.
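The knockout and over-expression analysis of Example 1.1 described above can be sketched by clamping a node at 0 or 1 and solving the time-independent Boolean equations by enumeration. This is a minimal illustration under the assumption that the rules are those of the worked example (A* = A, B* = A AND (NOT C), C* = (NOT A) OR B, D* = B AND C); the helper name `fixed_points` and its `clamped` argument are hypothetical. Because fixed points are update independent, the result applies to the synchronous and asynchronous schemes alike.

```python
from itertools import product

NODES = ("A", "B", "C", "D")

# Boolean rules of Example 1.1 as read from the worked example in the text.
RULES = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"] and not s["C"],
    "C": lambda s: (not s["A"]) or s["B"],
    "D": lambda s: s["B"] and s["C"],
}


def fixed_points(clamped=None):
    """Return all fixed points, in (A, B, C, D) order, when the nodes in
    `clamped` are held at the given values (0 = knockout,
    1 = over-expression). With no clamping, the wild-type fixed points
    are returned."""
    clamped = clamped or {}
    found = []
    for state in product((0, 1), repeat=len(NODES)):
        s = dict(zip(NODES, state))
        # Skip states that contradict the imposed perturbation.
        if any(s[n] != v for n, v in clamped.items()):
            continue
        nxt = {n: clamped[n] if n in clamped else int(RULES[n](s))
               for n in NODES}
        if nxt == s:
            found.append(state)
    return found


if __name__ == "__main__":
    print("wild type:        ", fixed_points())
    print("B knockout:       ", fixed_points({"B": 0}))
    print("B over-expression:", fixed_points({"B": 1}))
    print("C knockout:       ", fixed_points({"C": 0}))
    print("C over-expression:", fixed_points({"C": 1}))
```

The sketch only lists fixed points; detecting complex attractors of a perturbed system would additionally require exploring the state transition graph.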

Attractor Analysis of Asynchronous Boolean Models of Signal Transduction Networks

This chapter has been previously published in modified form in the Journal of Theoretical Biology [40].

Introduction

Prior work on the dynamics of Boolean networks, including analysis of the state space attractors and the basin of attraction of each attractor, has mainly focused on synchronous update of the nodes' states [30,31,41,42,43,44]. In contrast, most current studies on asynchronous Boolean models have focused on finding the fixed points of the system or on identifying the fixed points reachable from the nominal (wild-type) initial condition [27]. Very few studies have aimed to identify complex attractors [30,31]. Thus, the investigation of all possible attractors and their basins of attraction for a system under different updating schemes is still an open question. Here, we perform a comprehensive study of all attractors of a biological system, considering every possible initial state, using a synchronous and the three asynchronous Boolean models described in Section 1.1.3.

The biological system that we choose for this analysis is the signal transduction network corresponding to drought response in plants. Plants take up carbon dioxide for

photosynthesis and lose water by transpiration through microscopic pores called stomata. The surrounding pair of so-called guard cells plays a crucial role in controlling the stomatal size. During drought, the plant hormone abscisic acid (ABA) is synthesized to promote stomatal closure, thus cutting down evaporation from the interior of the plant [23]. In addition to the importance of stomata in the control of the plant water balance, the study of stomatal opening and closing in response to various stimuli like ABA is of particular interest to biologists, as it can be readily used to test different hypotheses for mechanisms affecting signal transduction [45].

A comprehensive reconstruction of the signaling network responsible for ABA-induced stomatal closure was previously performed by Li et al. [23]. The 54 nodes of the network include proteins, ion channels and secondary messengers, and a few conceptual nodes such as plasma membrane depolarization and stomatal closure. The edges of this network, the vast majority of them directed, represent protein-protein interactions, chemical reactions, and indirect regulatory relationships between two nodes. Li et al. [23] also developed a Boolean model for the ABA signaling network and performed random order asynchronous simulations, focusing on the behavior of a single node, representing stomatal closure.

One of the interesting features of the ABA signal transduction network is the possible existence of Ca2+-driven oscillations in a subset of the nodes. Among the many signaling mechanisms associated with Ca2+ [46], calcium oscillations and waves have been frequently observed in muscle cells and neurons [47], mammalian embryonic development [48,49], as well as plant cells [50]. While cytosolic Ca2+ (i.e., the calcium ion in the cytosol, denoted by Ca2+c) fluctuations have been observed in stomatal guard cells, there are indications that Ca2+c oscillation may be more important for the maintenance of closure than for the induction of closure [51] and that the induction of closure might only depend on the first, transient Ca2+c elevation. The model of Li et al. [23] predicted that when Ca2+c elevation occurs, stomatal closure is triggered; however, neither Ca2+c oscillation nor Ca2+c elevation is required for ABA-induced stomatal closure in the model. Nevertheless, Li et al. [23] do conclude that Ca2+c modulation confers an essential redundancy to the network, as Ca2+c elevation becomes required for engendering stomatal closure when pHc changes, K+ efflux, or the S1P-PA pathway are perturbed.

There has been significant interest in relating the structure of regulatory networks, specifically the existence of positive and negative feedback loops, to their dynamical properties [8]. In a graph-theoretical sense, a feedback loop is a directed cycle whose sign depends upon the parity of the number of negative interactions in the cycle. A positive/negative feedback loop has an even/odd number of negative interactions. It has been conjectured that the existence of positive feedback loops in the regulatory network is necessary for the existence of several stable fixed points in the dynamics (multistability), whereas negative feedback loops are required for observing sustained oscillations [52]. There have been several efforts towards proving these hypotheses in both a continuous framework [53,54] and a discrete framework [55,56]. As the ABA signal transduction network contains both positive and negative feedback loops, it has the potential for both multistability and sustained oscillations.

In this work, we perform a systematic study of the long-term dynamic behavior supported by the ABA-induced closure model developed by Li et al. [23], with a special focus on the possibility of long-term oscillations. To this end, we employ synchronous as well as the three asynchronous updating techniques described in Section 1.1.3, i.e., random order asynchronous (ROA), general asynchronous (GA), and deterministic asynchronous (DA) methods. Our analysis reveals a significant degree of dependence of the model's behavior on the manner of update, as an overlapping but non-identical set of attractors emerged for the various updating schemes. This diversity in dynamical behavior is also observed when exploring the impact of deleting (knocking out) biologically significant nodes. For instance, disrupting certain nodes leads to a fixed point or sustained oscillations (depending on the initial condition) in the synchronous model, while the same disruption modeled with the ROA or GA methods produces the same fixed point but no oscillations. This study provides a comprehensive comparison of, and novel insights into, the dynamical features of various Boolean models by capturing the diverse behaviors of the ABA signaling network in different conditions. Furthermore, the results obtained for the ABA system can enable experimentalists to rapidly rank and test different hypotheses about stomatal closure. The combination of methods presented here can be readily applied to other biological networks as well.
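The sign convention for feedback loops stated in this introduction (even number of negative interactions: positive loop; odd number: negative loop) can be made concrete with a short sketch. The edge signs below are read off the Boolean rules of Example 1.1 in Chapter 1 (an input appearing under NOT is taken as inhibitory); the function name `loop_sign` and the dictionary layout are hypothetical choices made only for this illustration.

```python
def loop_sign(cycle, edge_sign):
    """Return +1 for a positive feedback loop and -1 for a negative one.

    `cycle` lists the nodes of a directed cycle in order; the edge from
    the last node back to the first closes the loop. The sign is the
    product of the edge signs, so it is -1 exactly when the number of
    inhibitory edges is odd.
    """
    sign = 1
    for src, dst in zip(cycle, cycle[1:] + cycle[:1]):
        sign *= edge_sign[(src, dst)]
    return sign


# Edge signs of Example 1.1: +1 = activation, -1 = inhibition.
EXAMPLE_1_1_EDGES = {
    ("A", "B"): +1,  # B* = A AND (NOT C)
    ("C", "B"): -1,
    ("A", "C"): -1,  # C* = (NOT A) OR B
    ("B", "C"): +1,
    ("B", "D"): +1,  # D* = B AND C
    ("C", "D"): +1,
}

if __name__ == "__main__":
    # The only directed cycle of Example 1.1: B activates C, C inhibits B.
    print(loop_sign(["B", "C"], EXAMPLE_1_1_EDGES))
```

The single loop of Example 1.1 contains one inhibitory edge and is therefore negative, in line with the oscillations observed for that example when the signal is ON.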

2.2. Methods

In this work, we employ synchronous as well as the three asynchronous updating methods described in Section 1.1.3, namely, random order asynchronous (ROA), general asynchronous (GA), and deterministic asynchronous (DA) methods, to analyze the attractors of the ABA signal transduction network and their basins of attraction. Recall that in the ROA method, at each round of update, a random ordered sequence of the nodes is selected and the nodes' states are updated in that order. In the GA method, at each time step, the state of a randomly selected node is updated, and in the DA method nodes are updated only at multiples of their corresponding pre-selected time units. To reduce the computational burden associated with the large state space of the system, we propose the following network reduction method.

Network reduction

The exponential dependence of the size of the state space of Boolean models on the number of nodes poses a substantial obstacle to mapping the state transitions of even relatively small networks. This calls for developing efficient network reduction approaches. Recent efforts towards addressing this challenge consist of iteratively removing single nodes that do not regulate their own function and simplifying the redundant transfer functions [57,58]. Naldi et al. [57] proved that this approach preserves the fixed points of the system and that for each complex attractor in the original asynchronous model there is at least one complex attractor in the reduced model (i.e., network reduction may create spurious oscillations). Here we propose and implement a two-step reduction method by which the state space of Boolean models can be reduced without significant loss of information. Most Boolean networks, and especially Boolean models of signal transduction networks with a sustained signal, contain nodes whose state stabilizes in an attracting state after a transient period, whether or not the system as a whole has a fixed-point attractor. Our reduction method (i) pinpoints and eliminates these stabilized nodes and (ii) iteratively removes a simple mediator node (e.g., a node that has one incoming edge and one outgoing edge, but without a self-loop) and connects its input(s) to its target(s). We note that we use the second step in asynchronous Boolean models. Our simplification method

shares similarities with the method proposed in [57,58], with the difference that we only remove stabilized nodes (which have the same state on every attractor) and simple mediator nodes rather than eliminating each node without a self-loop. Thus their proof regarding the preservation of the steady states by the reduction method holds true in our case. We also note that the first step of our reduction method is similar to the logical steady state analysis implemented in the software tool CellNetAnalyzer [36,37]. In this work, we employ our proposed network reduction method for the analysis of the ABA signal transduction network and verify by numerical simulations that it preserves the attractors of this system.

Identification of attractors

It should be noted that the fixed points of a Boolean network are the same for both synchronous and asynchronous methods. In order to obtain the fixed points of a system, one can solve the set of Boolean equations independent of time. To this end, we first fix the state of the source nodes. We then determine the nodes whose rules depend on the source nodes and which will either stabilize in an attracting state after a time delay or have rules that can be simplified significantly by plugging in the state of the source nodes. Iteratively inserting the states of stabilized nodes into the rules (i.e., employing the first step of our reduction method) will result in either the fixed point(s) of the system, or the partial fixed point(s) and a remaining set of equations to be solved. In the latter case, if the remaining set of equations is too large to obtain its fixed point(s) analytically, we take advantage of the second step of our reduction method to simplify the resulting network and to determine a simpler set of Boolean rules. By solving this simpler set of equations (or performing numerical simulations, if necessary) and plugging the solutions into the original rules, we can then find the states of the removed nodes and determine the attractors of the whole system accordingly. The computer simulations for this study have been implemented in Python using the open source software packages BooleanNet [33] and NetworkX [59].

Results

The experimental results concerning ABA-induced stomatal closure have been compiled into a network by Li et al. [23] (see Figure 2.1). This network contains 54

nodes, of which 39 have their state regulated by other nodes [23]. ABA serves as the input node of this network, whereas Closure is considered as the output node. The edges represent interactions between two nodes, where an arrowhead/short segment at the end of an edge denotes activation/inhibition. Furthermore, a dynamic model of the process using the ROA Boolean approach was developed in [23]. The output of the model was chosen as the percentage of simulations, involving different initial conditions for nodes other than ABA and different update orders, that at a given time step attained the ON state for the node Closure (called the percentage of closure). This study also revealed that certain components of the system, such as Ca2+c, show oscillatory behavior [23].

Here we use the same Boolean rules as in [23]. We assume that the unregulated nodes included in the Boolean rules, such as ABA, are in the ON state. Based on this assumption, and using the first step of our reduction method, the Boolean functions of many nodes can be simplified. As a result, the majority of the regulated nodes of the network (26 out of 39) stabilize in an ON or OFF state within eight time steps with the synchronous or ROA method. In particular, the node Closure stabilizes in the ON state. Subsequently, we concentrate on the dynamics of the sub-network consisting of only those nodes with oscillatory behavior, and use the steady states of the stationary nodes in the Boolean rules governing the state transitions of the oscillating nodes. Li et al. [23] observed sustained fluctuations during the period of their study, which consisted of ten time steps (rounds of update) in the ROA model. Here we aim to determine whether the oscillatory behavior is sustained in the long-time behavior and whether it constitutes an attractor of the state space of the system for the ROA, GA, and DA models.

The sub-network obtained after safely removing the stabilized nodes is illustrated in Figure 2.2. The Boolean rules governing the states of the nodes in this 13-node sub-network are given in Table 2.1. Although this simplified network is considerably smaller than the original network, it is still computationally taxing to map the state trajectories of the network, especially for the asynchronous models, due to the combinatorial nature of the underlying problem. Therefore, we iteratively simplified the 13-node network into smaller sub-networks to obtain a general insight into the dynamics of the system. As a first simplification, we removed the nodes KEV and KAP, which

Figure 2.1. The ABA signal transduction network as synthesized in [23]. The functions of the nodes are color coded in this figure: enzymes are shown in red, signal transduction proteins are green, membrane transport-related nodes are blue, and secondary messengers and small molecules are orange. Small black filled circles represent putative intermediaries of indirect regulatory interactions. Arrowheads represent activation and short perpendicular bars indicate inhibition. Nodes involved in the same metabolic pathway or protein complex are bordered by a gray box. The full names of selected network components corresponding to each abbreviated node label are: ABA, abscisic acid; ABI1, protein phosphatase 2C ABI1; AGB1, heterotrimeric G protein subunit; AnionEM, anion efflux at the plasma membrane; CaIM, Ca2+ influx across the plasma membrane; Ca2+ ATPase, Ca2+ ATPases and Ca2+/H+ antiporters responsible for Ca2+ efflux from the cytosol; Ca2+c, cytosolic Ca2+ increase; cADPR, cyclic ADP-ribose; cGMP, cyclic GMP; CIS, Ca2+ influx to the cytosol from intracellular stores; Depolar, plasma membrane depolarization; GC, guanyl cyclase; GCR1, putative G protein coupled receptor; H+ ATPase, H+ ATPase at the plasma membrane; InsP3, inositol-1,4,5-trisphosphate; KEV, K+ efflux from the vacuole to the cytosol; KOUT, K+ efflux through slowly activating outwardly rectifying K+ channels at the plasma membrane; NOS, nitric oxide synthase; NO, nitric oxide; PA, phosphatidic acid; PLC, phospholipase C; S1P, sphingosine-1-phosphate. This figure and its caption have been reproduced from [23].

have no regulatory effects on the other nodes of the sub-network. From here on, we refer to the 13-node sub-network without these two nodes as the 11-node sub-network. In the following sections, we discuss the results of the synchronous and three asynchronous models for this sub-network and its variants.

Figure 2.2. The 13-node sub-network of the ABA signal transduction network. This sub-network is obtained by removing the nodes that stabilize within eight time steps. The node labels are the same as in Figure 2.1. The 11-node sub-network is derived from this network by removing the nodes KEV and KAP.

Table 2.1. Boolean rules governing the state of the 13-node sub-network depicted in Figure 2.2. For simplicity, the nodes' states are represented by the node names. The asterisk indicates the future state of the marked node.

Node           Boolean rule
NOS            NOS* = Ca2+c
NO             NO* = NOS
GC             GC* = NO
ADPRc          ADPRc* = NO
cADPR          cADPR* = ADPRc
cGMP           cGMP* = GC
PLC            PLC* = Ca2+c
InsP3          InsP3* = PLC
CIS            CIS* = (cGMP AND cADPR) OR InsP3
Ca2+ ATPase    Ca2+ ATPase* = Ca2+c
Ca2+c          Ca2+c* = CIS AND (NOT Ca2+ ATPase)
KAP            KAP* = NOT Ca2+c
KEV            KEV* = Ca2+c

Synchronous model

We found by simulation that with the synchronous update method, the 11-node sub-network possesses three attractors: a fixed point in which all 11 nodes are in the OFF state (we call such a fixed point a null fixed point) and two distinct limit cycles of period four, as given in Table 2.2. The fixed point is reachable from 27 (~1%) out of 2048 possible initial conditions. The basin of attraction of the first limit cycle contains 426 states (~21% of all the initial states), whereas that of the second cycle comprises 1595 initial states (~78% of all the initial states). The two limit cycles contain markedly different levels of node activation, as the frequency of the ON state for each node in the second limit cycle is double the corresponding frequency in the first one (see Table 2.2). Interestingly, we observe that the size of the basin of attraction of the three attractors is an increasing function of the frequency of the ON states in the attractor.

Table 2.2. The limit cycles observed in the synchronous model of the 11-node sub-network given in Figure 2.2.
(a) The first limit cycle
NOS  NO  GC  ADPRc  cADPR  cGMP  PLC  InsP3  CIS  Ca2+ ATPase  Ca2+c
(b) The second limit cycle
NOS  NO  GC  ADPRc  cADPR  cGMP  PLC  InsP3  CIS  Ca2+ ATPase  Ca2+c

In addition to the numerical simulations, we also performed a theoretical analysis using scalar equations to determine the attractors of the synchronous model. Scalar equations and reduced scalar equations have been previously found to be useful in acquiring information about the cyclic and transient structure of synchronous Boolean networks [42,60]. As defined by Heidel et al. [42], a scalar equation is an ordinary recurrence equation for a particular node of a Boolean network. However, sometimes

such equations may take an impractically complex form. To tackle this difficulty, a simpler form called the reduced scalar equation [60] can be derived by further iterating the original recurrence relationship. In order to obtain the scalar equations for the 11-node sub-network, we first restate all the Boolean rules given in Table 2.1 as a function of Ca2+c (see Table 2.3). The reduced scalar equation corresponding to the node Ca2+c can then be expressed as follows (see Appendix A for details):

Ca2+c(t+8) = Ca2+c(t+4).    (2.1)

Similar equations can be obtained for other nodes of the sub-network. Equation (2.1) provides immediate information about the cyclic and transient structure of the network. Considering equation (2.1) and noting the fact that the Boolean functions of all other nodes depend on Ca2+c, the only possible limit cycles of the sub-network are of length one, two, or four. We find that limit cycles of length two are not possible. To demonstrate this, let us assume that Ca2+c admits a cycle of length two. Then for sufficiently large t, we have Ca2+c(t) = Ca2+c(t+2k) for any positive integer k. But then the Boolean function of the node Ca2+c (given in Table 2.3) can be simplified as Ca2+c(t) = Ca2+c(t) AND (NOT Ca2+c(t)). The only solution of this equation implies that Ca2+c stabilizes in the OFF state, but that gives the fixed point of the network, not a cycle of length two. Therefore, the sub-network can only have fixed points or cycles of length four, confirming the results obtained by simulations.

Table 2.3. Alternative statement of the Boolean rules governing the state of the synchronous model of the 11-node sub-network given in Figure 2.2 as a function of Ca2+c.

Node           Boolean rule
NOS            NOS(t+1) = Ca2+c(t)
NO             NO(t+2) = Ca2+c(t)
GC             GC(t+3) = Ca2+c(t)
ADPRc          ADPRc(t+3) = Ca2+c(t)
cADPR          cADPR(t+4) = Ca2+c(t)
cGMP           cGMP(t+4) = Ca2+c(t)
PLC            PLC(t+1) = Ca2+c(t)
InsP3          InsP3(t+2) = Ca2+c(t)
CIS            CIS(t+5) = Ca2+c(t) OR Ca2+c(t+2)
Ca2+ ATPase    Ca2+ ATPase(t+1) = Ca2+c(t)
Ca2+c          Ca2+c(t+6) = (Ca2+c(t) OR Ca2+c(t+2)) AND (NOT Ca2+c(t+4))
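The synchronous attractors of the 11-node sub-network and the Ca2+c recurrence of Table 2.3 can be checked by brute force over all 2048 initial states. The following is a minimal sketch in plain Python rather than the BooleanNet code used in this study; it assumes the rules exactly as listed in Table 2.1 (without KAP and KEV), and the names `step`, `attractor_of`, `basin_sizes`, and `recurrence_holds` are hypothetical helpers.

```python
from itertools import product

# State order: NOS, NO, GC, ADPRc, cADPR, cGMP, PLC, InsP3, CIS,
# Ca2+ ATPase, Ca2+c (the last entry of each state tuple is Ca2+c).
N_NODES = 11


def step(s):
    """One synchronous update using the rules of Table 2.1."""
    nos, no, gc, adprc, cadpr, cgmp, plc, insp3, cis, atpase, ca = s
    return (
        ca,                      # NOS* = Ca2+c
        nos,                     # NO* = NOS
        no,                      # GC* = NO
        no,                      # ADPRc* = NO
        adprc,                   # cADPR* = ADPRc
        gc,                      # cGMP* = GC
        ca,                      # PLC* = Ca2+c
        plc,                     # InsP3* = PLC
        (cgmp & cadpr) | insp3,  # CIS* = (cGMP AND cADPR) OR InsP3
        ca,                      # Ca2+ ATPase* = Ca2+c
        cis & (1 - atpase),      # Ca2+c* = CIS AND (NOT Ca2+ ATPase)
    )


def attractor_of(state):
    """Follow the trajectory until a state repeats; return the cycle."""
    position, trajectory = {}, []
    while state not in position:
        position[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return frozenset(trajectory[position[state]:])


def basin_sizes():
    """Map each attractor to the number of initial states reaching it."""
    sizes = {}
    for state in product((0, 1), repeat=N_NODES):
        attractor = attractor_of(state)
        sizes[attractor] = sizes.get(attractor, 0) + 1
    return sizes


def recurrence_holds(state, horizon=20):
    """Check Ca(t+6) = (Ca(t) OR Ca(t+2)) AND (NOT Ca(t+4)) along the
    trajectory that starts from `state`."""
    ca = []
    for _ in range(horizon):
        ca.append(state[-1])
        state = step(state)
    return all(ca[t + 6] == ((ca[t] | ca[t + 2]) & (1 - ca[t + 4]))
               for t in range(horizon - 6))


if __name__ == "__main__":
    for attractor, size in sorted(basin_sizes().items(), key=lambda kv: kv[1]):
        print("cycle length", len(attractor), "basin size", size)
```

Exhaustive enumeration is practical here because the state space has only 2^11 states; the scalar-equation argument in the text reaches the same conclusions without simulation.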

A similar argument can be used to prove that the sub-network possesses a single fixed point, namely the null fixed point. To obtain the cycles of length four, we should have Ca2+c(t+4) = Ca2+c(t) for large enough t, which simplifies the Boolean function of the node Ca2+c (given in Table 2.3) as follows:

Ca2+c(t+2) = Ca2+c(t+2) AND (NOT Ca2+c(t)).    (2.2)

Based on this equation, the state 1 for Ca2+c at time instant t implies state 0 for this node at time instant t+2. Therefore the state of Ca2+c is on one of the orbits (0, 0, 0, 0), (1, 0, 0, 0), or (1, 1, 0, 0). The first orbit implies that Ca2+c stabilizes in the OFF state, which gives the null fixed point found before. Considering the last two orbits and using the Boolean functions indicated in Table 2.3, we obtain the limit cycles of length four obtained by simulations (see Table 2.2). Furthermore, using equation (2.1) and similar equations for other nodes in the 11-node sub-network, the longest possible transient trajectory of the sub-network is found to be of length nine. In other words, given any point in the state space of the sub-network, after at most nine iterations it will reach either the null fixed point or one of the two limit cycles. Numerical simulations of this sub-network indicate that the actual longest transient trajectory is of length seven. In effect, although the phase of certain nodes follows Ca2+c with a delay during the cycle, their transient is shorter than the sum of the transient of Ca2+c and the corresponding delay. We also verified analytically that the basin of attraction of the fixed point contains 27 states.

It should be noted that adding back the two nodes KEV and KAP (see Figure 2.2) does not change the attractors of the network. More precisely, the system still possesses a fixed point and two limit cycles of length four, in which the states of the nodes KEV and KAP are determined by Ca2+c. Considering the attractors for the 13-node sub-network together with those nodes that were already considered to be in their steady state, we conclude that the synchronous model of the whole ABA signal transduction network eventually settles into either a fixed point or one of the two limit cycles of length four.

Random order asynchronous (ROA) model

To be able to handle the additional state transitions possible in asynchronous models, we further simplified the 11-node sub-network using the second step of our reduction method. This sub-network has three different positive feedback loops involving Ca2+c and

CIS, as well as a negative feedback loop made up of Ca2+c and Ca2+ATPase. Thus the essence of this network is captured by only three nodes, Ca2+c, Ca2+ATPase and CIS, forming a coupled positive and negative feedback loop. Figure 2.3(a) shows this 3-node sub-network with the corresponding Boolean rules. We have analytically determined the state transition graph of this sub-network (see Figure 2.3(b)). The digits of the binary numbers in this figure indicate the states of the nodes in the order CIS, Ca2+c, and Ca2+ATPase. For example, the binary sequence 001 represents CIS = OFF, Ca2+c = OFF, and Ca2+ATPase = ON. As can be seen in Figure 2.3(b), the system has a fixed point to which all the states converge. This null fixed point, in which all three nodes are in the OFF state, can also be obtained by solving the time-independent Boolean equations.

Figure 2.3. The 3-node sub-network of the ABA signaling network with the corresponding state transition graph obtained from the ROA model. (a) The 3-node sub-network and the Boolean rules governing the state of the nodes in the sub-network. (b) The state transition graph of the 3-node network given in (a) obtained from the ROA model. The binary digits from left to right represent the state of the nodes CIS, Ca2+c, and Ca2+ATPase, respectively. A directed edge between two states indicates that the second state can be obtained from the first after a round of update of the nodes' states. The four nodes whose symbols have a gray background form the strongly connected component of the state transition graph.

We determined that the transition matrix corresponding to the state space of the 3-node sub-network is as follows:

\[
P_{\mathrm{ROA}} =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 1/6 & 0 & 0 & 0 & 1/3 & 0 & 1/6 \\
1/3 & 1/6 & 0 & 0 & 1/6 & 1/3 & 0 & 0 \\
1/2 & 0 & 0 & 0 & 0 & 0 & 1/6 & 1/3 \\
5/6 & 0 & 0 & 0 & 0 & 0 & 1/6 & 0 \\
0 & 1/6 & 0 & 0 & 0 & 1/3 & 0 & 1/2 \\
1/3 & 1/6 & 0 & 0 & 1/6 & 1/3 & 0 & 0
\end{pmatrix}
\]

where each entry of the matrix, p_ij, denotes the probability of going from state i to state j for 0 ≤ i, j ≤ 7 (the numbers 0 to 7 are the decimal representation of the binary numbers given in Figure 2.3(b)). In other words, the probabilities p_ij are the fractions of the total number of update orders (i.e., permutations) causing a change of state from i to j in one step. It should be noted that the matrix P_ROA is a stochastic matrix, because 0 ≤ p_ij ≤ 1 for 0 ≤ i, j ≤ 7, and all rows sum up to one. In the sense of a Markov chain, the fixed point of the system is an absorbing state, i.e., a state left unchanged in all permutations. All other states are transient states because there is a positive probability that the chain does not return to these states after leaving them. Hence the vector (1, 0, 0, 0, 0, 0, 0, 0) is the stationary distribution of the chain, where each component denotes the long-term probability of being in state i.

The strongly connected component (SCC) of the state space in Figure 2.3(b) contains half of the total states, namely states 100, 101, 110, and 111 (see symbols with a gray background in Figure 2.3(b)). However, every state, including those in the SCC, is connected to the fixed point, implying that every state trajectory will ultimately converge to the fixed point. An immediate calculation, using the transition matrix P_ROA, shows that the average probability of reaching the fixed point in one step is larger than the average probability of remaining in the SCC for one step. We also estimated the expected times for absorption into the fixed point [61] when the chain starts from each of the transient states (see Table 2.4). As given in this table, the absorption time for state 001 is less than that for other states since it can reach the fixed point in only one step (see Figure 2.3(b)).
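The construction of the ROA transition matrix and of the absorption times can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three Boolean rules below are not copied from Figure 2.3(a) but inferred from the reduced rules quoted in Tables 2.5 and 2.9 (CIS* = Ca2+c, Ca2+c* = CIS AND (NOT Ca2+ATPase), Ca2+ATPase* = Ca2+c), and all function names are hypothetical. Each of the six update orders is applied sequentially and weighted by 1/6; the absorption times then solve (I - Q) t = 1, where Q is the transient part of the matrix.

```python
from fractions import Fraction
from itertools import permutations

# State order: (CIS, Ca2+c, Ca2+ATPase). The rules are an assumption of this
# sketch, inferred from the reduced rules in Tables 2.5 and 2.9.
def update(state, node):
    """Return the state after updating one node with its Boolean rule."""
    c, a, p = state
    target = (a, int(c and not p), a)  # CIS*, Ca2+c*, Ca2+ATPase*
    return state[:node] + (target[node],) + state[node + 1:]

def to_state(i):
    """Decimal index 0..7 -> binary state tuple."""
    return ((i >> 2) & 1, (i >> 1) & 1, i & 1)

def to_index(s):
    return 4 * s[0] + 2 * s[1] + s[2]

def roa_matrix():
    """One ROA round: every node updated once, in a uniformly random order."""
    orders = list(permutations(range(3)))
    P = [[Fraction(0)] * 8 for _ in range(8)]
    for i in range(8):
        for order in orders:
            s = to_state(i)
            for node in order:
                s = update(s, node)
            P[i][to_index(s)] += Fraction(1, len(orders))
    return P

def absorption_times(P):
    """Expected steps to reach state 0 from states 1..7 (Gauss-Jordan on I - Q)."""
    n = len(P) - 1
    A = [[Fraction(int(r == c)) - P[r + 1][c + 1] for c in range(n)] + [Fraction(1)]
         for r in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pivot = A[col][col]
        A[col] = [x / pivot for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]

P_ROA = roa_matrix()
TIMES = absorption_times(P_ROA)  # TIMES[k] belongs to state k + 1
```

Exact fractions are used instead of floating point so that the entries can be compared directly with the matrix above; for example, `P_ROA[5][0]` is the probability 5/6 of going from state 101 to the fixed point in one round.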
In addition, we can see that state 101 in the SCC has a shorter absorption time compared to the other states in the SCC because with a probability of 5/6 it reaches the fixed point

in one step. In summary, the 3-node sub-network does not support sustained oscillations as an attracting behavior.

Table 2.4. The expected number of time steps for absorbing into the fixed point when the Markov chain corresponding to the 3-node sub-network given in Figure 2.3(a) starts from the transient states in the ROA model.
State Absorption time

In the next step, we considered a larger sub-network to see whether the same behavior is observed. As can be seen in Figure 2.2, in the 11-node sub-network there are two different paths from NO to CIS with the same length. The two paths are conditional on each other, as both cGMP and cADPR need to be ON in order to activate CIS; thus, for simplification, we collapsed these two equivalent paths into one. We next removed the node cGMP from the network, which has both an in-degree (number of edges entering a node) and an out-degree (number of edges coming out of a node) of one and acts as a delay between the two nodes GC and CIS. The resulting 8-node sub-network, which has two positive feedback loops of length four and five, and one negative feedback loop of length two, is depicted in Figure 2.4. The Boolean rules governing the state of the nodes of this sub-network are given in Table 2.5. Since in this case there are two positive feedback loops competing with one negative feedback loop, a loose attractor may be possible.

Figure 2.4. The 8-node sub-network of the ABA signaling network. This sub-network is obtained by collapsing the two paths from NO to CIS and eliminating the node cGMP in the 11-node sub-network (see Figure 2.2).

Table 2.5. Boolean rules governing the state of the 8-node sub-network represented in Figure 2.4.

Node          Boolean rule
NOS           NOS* = Ca2+c
NO            NO* = NOS
GC            GC* = NO
PLC           PLC* = Ca2+c
InsP3         InsP3* = PLC
CIS           CIS* = GC OR InsP3
Ca2+ATPase    Ca2+ATPase* = Ca2+c
Ca2+c         Ca2+c* = CIS AND (NOT Ca2+ATPase)

We numerically probed the dynamics of this sub-network and found that the majority of the nodes of the state space (237 out of 256) are part of a giant SCC. The average in-degree of the SCC is equal to 32 and its average out-degree is 31, thus the SCC is densely connected. The fixed point of the system is the null fixed point, in which all the nodes are in the OFF state. The large in-degree of the fixed point (242) shows that almost all the nodes of the state space can reach the fixed point in one step. By extension of the results obtained for the 8-node sub-network, we can conclude that for the 11-node sub-network the only attractor of the system under the ROA update is the null fixed point. We also investigated whether small changes to the Boolean rules or to the topology of the 8-node sub-network can strengthen the strongly connected component of the state transition graph (see Appendix A for details) and found that no such changes can strengthen it to the point of becoming a loose attractor.

In summary, based on the results we discussed for the simplified sub-networks, we can conclude that the ROA model of the ABA signal transduction network possesses only a fixed point, which is identical to the fixed point of the synchronous model. It is important to notice the difference between the basins of attraction of the attractors for the synchronous and ROA approaches. In the synchronous method the limit cycles had the majority of the states in their basins of attraction, whereas in the asynchronous case all the states are in the basin of attraction of the fixed point.
This result is consistent with previous reports that synchronous Boolean models can have spurious oscillations [31,32]. For example, a comparison of synchronous Boolean and continuous models of gene networks has shown that many of the periodic oscillations observed in the Boolean model were not present in the continuous model [32].
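The numerical probing of the 8-node sub-network under ROA updating can be sketched as follows. This is a minimal illustration, not the procedure actually used in this study: the node order, the function names, and the search strategy are assumptions of the sketch. Instead of enumerating all 8! update orders, it explores pairs (set of nodes already updated, current state), which yields the same set of one-round successors. Only the rules are taken from Table 2.5.

```python
from itertools import product

# Node order is an arbitrary choice for this sketch; the rules follow Table 2.5.
NOS, NO, GC, PLC, INSP3, CIS, ATPASE, CA = range(8)
RULES = [
    lambda s: s[CA],                     # NOS* = Ca2+c
    lambda s: s[NOS],                    # NO* = NOS
    lambda s: s[NO],                     # GC* = NO
    lambda s: s[CA],                     # PLC* = Ca2+c
    lambda s: s[PLC],                    # InsP3* = PLC
    lambda s: s[GC] | s[INSP3],          # CIS* = GC OR InsP3
    lambda s: s[CA],                     # Ca2+ATPase* = Ca2+c
    lambda s: s[CIS] & (1 - s[ATPASE]),  # Ca2+c* = CIS AND (NOT Ca2+ATPase)
]
N = len(RULES)
ZERO = (0,) * N  # the null fixed point

def update(state, i):
    """Return the state after updating node i only."""
    return state[:i] + (RULES[i](state),) + state[i + 1:]

def roa_successors(state):
    """All states reachable after one round in which every node is updated
    exactly once, over all possible update orders."""
    full = (1 << N) - 1
    seen = {(0, state)}
    stack = [(0, state)]
    result = set()
    while stack:
        mask, s = stack.pop()
        if mask == full:
            result.add(s)
            continue
        for i in range(N):
            if not mask & (1 << i):
                nxt = (mask | (1 << i), update(s, i))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return result

def in_degree(target):
    """Number of states (the target itself included) with an edge into target."""
    return sum(1 for s in product((0, 1), repeat=N) if target in roa_successors(s))
```

Calling `in_degree(ZERO)` scans all 256 states and can be compared with the in-degree of the fixed point reported above; whether a self-loop is counted there is not stated, so a difference of one would not necessarily indicate a disagreement.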

General asynchronous (GA) model

For this asynchronous model, we analytically determined the state transition graph corresponding to the simplified 3-node sub-network given in Figure 2.3(a). As represented in Figure 2.5, the state 000 serves as the fixed point and the SCC of the state transition graph contains the same four states as that of the ROA model. However, there are some important differences between the transition graphs of the GA and ROA models. First, the GA graph (Figure 2.5) has fewer edges than the ROA graph (Figure 2.3(b)). This is explained by the fact that only one node can change state during a GA update, compared to all three potentially changing states during an ROA round of update; thus the number of states adjacent to a given state is smaller in the GA model than in the ROA model. Second, there are more self-loops in Figure 2.5 than in Figure 2.3(b), since it is quite possible in the GA model that updating the state of a particular node does not change the state of the system. Finally, in Figure 2.3(b) almost all the states can reach the fixed point in one step, which is not the case in the GA model.

Figure 2.5. State transition graph of the 3-node sub-network given in Figure 2.3(a) obtained from the GA model. The binary digits from left to right represent the state of the nodes CIS, Ca2+c, and Ca2+ATPase, respectively. A directed edge between two states indicates that the second state can be obtained from the first after a time step. The four nodes whose symbols have a gray background form the strongly connected component of the state transition graph.

The transition matrix corresponding to the GA graph is as follows:

\[
P_{\mathrm{GA}} =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 2/3 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 0 & 1/3 & 0 & 0 & 1/3 & 0 \\
0 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 1/3 \\
1/3 & 0 & 0 & 0 & 1/3 & 0 & 1/3 & 0 \\
0 & 1/3 & 0 & 0 & 1/3 & 1/3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 2/3 & 1/3 \\
0 & 0 & 0 & 0 & 0 & 1/3 & 0 & 2/3
\end{pmatrix}
\]

Based on the transition matrix we estimated the expected number of time steps for absorption into the fixed point from each of the transient states (see Table 2.6). A comparison of the absorption times for the ROA and GA models (Tables 2.4 and 2.6) shows that all the states have longer absorption times in the GA model than in the ROA model, in agreement with the higher number of self-loops in the GA graph.

Table 2.6. The expected number of time steps for absorbing into the fixed point when the Markov chain corresponding to the 3-node sub-network given in Figure 2.3(a) starts from the transient states in the GA model.
State Absorption time

The ROA and GA state transition graphs are related by a variant of transitive closure: the ROA state transition graph contains an edge for every path of length at most three (generally, of length at most n) that represents the update of different network nodes in the GA state transition graph. Since transitive closure and its opposite, transitive reduction, do not change the strongly connected components and their in- and out-components (the in-component of an SCC is the set of states that can reach the SCC, whereas the out-component of an SCC is the set of states that can be reached from the SCC), we expected that the similarity of the ROA and GA models is preserved for the 8-node sub-network shown in Figure 2.4 as well. Indeed, we found that the 8-node sub-network exhibits a single fixed point, namely the null (all-off) state, similar to the ROA model. However, the in-degree of the fixed point is 9 in the GA model, which is remarkably smaller than that obtained with the ROA model (242). Furthermore, we identified a large SCC containing 238 nodes of the state space (out of 256), with average in- and out-degrees of 5 and 4, respectively. The results for changing the Boolean rules or

the topology of the 8-node sub-network obtained from these two updating schemes were almost the same, and the only difference was related to the average in- and out-degrees (see Appendix A for details). This agreement between the two methods goes beyond the previously observed equality of the expected value of the time taken between two consecutive updates of a node [62]. We conclude that the GA model of the ABA signal transduction network possesses only a fixed point, which is identical to the fixed point of the synchronous and of the ROA model.

Deterministic asynchronous (DA) model

We applied this method of update to the 3-node sub-network represented in Figure 2.3(a) with many different choices of time units. In all cases, the sub-network possesses the null fixed point. When the time unit for the node CIS is the largest, interesting additional behaviors are observed. Specifically, in some cases, the four states that formed the SCC of the transition graph in the ROA and GA models now form a limit cycle. The length of the observed limit cycles can vary depending on the frequency of the four states in the limit cycles. As the additional nodes in the 8-node sub-network only affect the effective time units between the three main nodes, there are no dynamical behaviors that are possible in the 8-node sub-network but not in the 3-node sub-network. Thus in the following we only consider the 3-node sub-network.

Let t_i be the time unit associated with node v_i, where v_i is one of the three nodes of the sub-network. In the following, we prove that under certain conditions, limit cycles for the DA model exist and we express the length of the limit cycles in terms of t_i.

Proposition 2.1. If t_CIS = 2 t_Ca2+c = 4k+2, where k is a positive integer number, and t_Ca2+ATPase = 2, then the transition graph of the 3-node sub-network has a limit cycle of length 2k+2.

Proof.
First note that if CIS is in the OFF state initially, then the condition on the time units implies that the node Ca2+c will be turned OFF and therefore the system eventually settles into the null fixed point. Thus, a necessary condition for the basin of attraction of a cycle is that the node CIS must be in the ON state initially. Assuming that CIS is ON initially, we need to consider two cases:

Case 1. The node Ca2+c is in the OFF state initially, i.e., the initial state of the system is either 100 or 101 (the order of the nodes in these states is CIS, Ca2+c, and Ca2+ATPase). Without loss of generality, assume that the initial state is 100. A similar argument with minor changes holds for the case where the system starts from the state 101. The system stays at the state 100 at all even time instants, in which Ca2+ATPase is updated, until the time point t = 2k+1, when Ca2+c is updated and the state of the system changes to 110. Hence, the number of time steps the system stays in the state 100 (without considering the initial state) is [(2k+1)/2] = k, where [x] denotes the greatest integer less than or equal to x. At t = 2k+2, the state of the system changes from 110 to 111 because Ca2+ATPase is updated. The system's state remains unchanged until t = 4k+2 = t_CIS, when all the nodes are updated, causing a change of state to 101. Note that the number of time steps that the system visits the state 111, before it reaches the state 101, is again k. Then at t = 4k+4, Ca2+ATPase is updated and the system returns to the state 100 and the above scenario repeats. Therefore, the length of the limit cycle is k+1+k+1 = 2k+2. Table 2.7 summarizes this argument.

Case 2. The node Ca2+c is in the ON state initially, i.e., the initial state of the system is either 110 or 111. Without loss of generality, we can assume that the initial state is 110. At t = 2, the node Ca2+ATPase is updated and the state of the system changes to 111 and it stays there until the time point t = 2k+1, when Ca2+c is updated, and the system's state changes to 101. At t = 2k+2, the state of the system changes from 101 to 100 due to a change in the state of Ca2+ATPase. The system's state remains unchanged until t = 4k+2 = t_CIS, when all the nodes are updated, causing a change of state to 010.
Since CIS is turned OFF, after some time the node Ca2+c will be turned OFF and hence the network ultimately settles into the null fixed point.

Table 2.7. State of the 3-node sub-network at different time steps starting from the initial state 100 obtained from the DA model with the time units given in Proposition 2.1.

Node          t=0   t=2   t=2k   t=2k+1   t=2k+2   t=4k+1   t=4k+2   t=4k+4
CIS            1     1     1       1        1        1        1        1
Ca2+c          0     0     0       1        1        1        0        0
Ca2+ATPase     0     0     0       0        1        1        1        0
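The two cases of the proof can be replayed numerically. The sketch below is an illustration under stated assumptions rather than the original implementation: it takes the time units read off from the proof (CIS updated every 4k+2 time steps, Ca2+c every 2k+1, Ca2+ATPase every 2), assumes that nodes due at the same time instant are updated synchronously, and uses 3-node rules inferred from Tables 2.5 and 2.9. The names `step`, `da_trajectory`, and `limit_cycle` are hypothetical.

```python
from functools import reduce
from math import gcd

# State order: (CIS, Ca2+c, Ca2+ATPase). Rules assumed for this sketch:
# CIS* = Ca2+c, Ca2+c* = CIS AND (NOT Ca2+ATPase), Ca2+ATPase* = Ca2+c.
def step(state, updated):
    """Synchronously update the nodes whose indices are in `updated`."""
    c, a, p = state
    target = (a, int(c and not p), a)
    return tuple(target[i] if i in updated else state[i] for i in range(3))

def da_trajectory(state, units, n_periods=4):
    """States recorded after each time instant at which some node is updated.
    Node i is updated at every multiple of units[i]."""
    period = reduce(lambda x, y: x * y // gcd(x, y), units)  # lcm of the units
    visited = []
    for t in range(1, n_periods * period + 1):
        updated = {i for i, u in enumerate(units) if t % u == 0}
        if updated:
            state = step(state, updated)
            visited.append(state)
    return visited, period

def limit_cycle(state, units):
    """Sequence of states over the last full period of the update schedule."""
    visited, period = da_trajectory(state, units)
    events = sum(1 for t in range(1, period + 1) if any(t % u == 0 for u in units))
    return visited[-events:]

def units_for(k):
    """Time units (t_CIS, t_Ca2+c, t_Ca2+ATPase) as read from the proof."""
    return (4 * k + 2, 2 * k + 1, 2)
```

For example, `limit_cycle((1, 0, 0), units_for(1))` replays Case 1 with k = 1, and `limit_cycle((1, 1, 0), units_for(1))` replays Case 2. Four periods of the schedule are simulated before the last period is read off, which is enough for the 3-node network to settle.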

In conclusion, the limit cycle is formed by the states 100, 110, 111, and 101, and the basin of attraction of the limit cycle contains the states 100 and 101. Importantly, the states 111 and 110, although part of the limit cycle, are not part of its basin of attraction, since when they are used as initial conditions, the system will converge to the null fixed point. The other four states are also part of the basin of attraction of the fixed point. The state transition graph corresponding to the 3-node sub-network with the time units specified in Proposition 2.1 is represented in Figure 2.6. The state transition shown as a dash-dotted edge from 100 to 110 is only possible when the system starts from initial states 100 or 101. Starting from the other two states in the cycle, the system will converge to the fixed point through the dash-dotted edge from 100 to 010.

Figure 2.6. State transition graph of the 3-node sub-network for the DA model with the time units given in Proposition 2.1. The four nodes whose symbols have a gray background form the strongly connected component of the state transition graph. The dash-dotted edges appear only if the system starts from specific initial states: starting from initial state 100 or 101, the system remains in the cycle through the edge from 100 to 110. However, if the system starts from initial state 110 or 111, it reaches the fixed point through the edge from 100 to 010.

Comparing Figures 2.3(b) and 2.6, we observe that the transition graph of the 3-node sub-network for the ROA model is denser than that for the DA model. This is because there are multiple edges coming out of each state in Figure 2.3(b) corresponding to different orders of update of the three nodes, while for Figure 2.6 there are only three states that have more than one outgoing edge.
The self-loops present in Figure 2.6 but absent from Figure 2.3(b) are due to updates in which the state of Ca2+ATPase is updated but does not change; there are no updates of all three nodes that would leave the system's state unchanged unless that state is the null fixed point. There is no direct edge from state 101 to 100 in Figure 2.3(b), while there is such an edge in Figure 2.6. This is because

going from state 101 to 100 requires that the only updated node is Ca2+ATPase, which is impossible in the ROA model but possible in the DA model. Notably, the edges among the nodes in the SCC are the same for the DA and GA models; however, the transition graph obtained from the GA model (Figure 2.5) is denser and has more self-loops than the state transition graph of the DA model.

We identified four more types of limit cycles. The proofs of the following propositions are given in Appendix A.

Proposition 2.2. If, where k is a positive integer number, and, then the transition graph of the 3-node sub-network has a limit cycle of length.

The state transition graph corresponding to this proposition has the same cycle as the state transition graph of Proposition 2.1, and the limit cycle contains the states on this cycle. The basin of attraction of the limit cycle contains the states 100 and 101 as in Proposition 2.1. The self-loops appear when Ca2+ATPase is updated but its state does not change.

Proposition 2.3. If, where k is a positive integer number, and, then the transition graph of the 3-node sub-network has a limit cycle of length.

Proposition 2.4. If, where k is a positive integer number, and, then the transition graph of the 3-node sub-network has a limit cycle of length.

The transition graph obtained from the conditions mentioned in the last two propositions has the same cycle as in Figure 2.6, except that the self-loops are on the states 101 and 110, where the node Ca2+c is updated. The tail going out of the cycle to the fixed point is different from Figure 2.6. The limit cycle contains the same states as in Propositions 2.1 and 2.2, but the basin of attraction of the limit cycle differs in one state, containing the states 101 and

Proposition 2.5. If and t_CIS = 4k, where k is a positive integer, then the transition graph of the 3-node sub-network has a limit cycle of length four.

The transition graph corresponding to Proposition 2.5 has the same cycle as in Figure 2.6. There is no self-loop in this case as the length of the cycle is four, and the tail going out of the cycle to the fixed point differs from that represented in Figure 2.6. The basin of attraction of the limit cycle contains the states 101 and 111.

Comparing Propositions 2.1 to 2.5, we find that the corresponding limit cycles share a common sequence of node updates. For each state transition along the cycle, there is one node whose update is required in order for the state transition to occur, and there may be other nodes that can also be updated but do not change state. The nodes that must be updated are: Ca2+ATPase for 101 going into 100, Ca2+c for 100 going into 110, Ca2+ATPase for 110 going into 111, and Ca2+c for 111 going into 101. The additional nodes that can be updated are: Ca2+c for 101 going into 100, Ca2+ATPase for 100 going into 110, Ca2+c and CIS for 110 going into 111, and Ca2+ATPase and CIS for 111 going into 101. This suggests that any other time units that lead to the required updates and all or part of the allowed additional updates will lead to a limit cycle.

Node perturbations

Li et al. [23] reported that perturbations in S1P, PA, pHc or four other nodes lead to a reduced closure probability due to fluctuations in the state of the node Closure. Upon knocking out the nodes AnionEM, Depolar, or Actin, the system exhibits insensitivity to ABA, i.e., a zero probability for closure. Disruption of ABI1 or Ca2+ATPase results in ABA hypersensitivity, whereas perturbation in Ca2+c or two other nodes leads to ABA hyposensitivity, meaning that the curve giving the time course of reaching 100% of closed stomata is above (for hypersensitivity), or below (for hyposensitivity) that in the wild type [23].
We further investigate the effect of disrupting (setting in the OFF state) S1P, PA, AnionEM, ABI1, Ca2+c, and pHc on the attractors of the network. A detailed description of the results for the first five disruptions is given in Appendix A. In the following, we present the pHc disruption and summarize the results. With perturbation in pHc, 20 nodes stabilize at early steps, leaving a fluctuating sub-network of 19 nodes (see Figure 2.7(a) with the Boolean rules in Table 2.8). Solving the

Boolean equations given in Table 2.8 (independently of time), we found that this sub-network does not have a fixed point, implying that it must have at least one limit cycle for the synchronous method and a loose attractor for the ROA or GA method. Considering the high number of nodes, which makes it computationally intractable to map the state transition graph, we simplified the sub-network by collapsing a number of redundant paths as shown in Figure 2.7(b). We collapsed the 11-node sub-graph common to Figures 2.2 and 2.7(a) into the simple 3-node sub-graph of Figure 2.3(a). Then, we merged the parallel positive paths between Ca2+c and Depolar into a single edge, and eliminated three nodes. We also consolidated the two positive paths from Depolar to Closure into one, and removed the node KAP from the network. The Boolean functions governing the state of the resulting 7-node sub-network (represented in Figure 2.7(b)) are listed in Table 2.9. We confirmed by solving the Boolean equations given in Table 2.9 that this simplified sub-network maintained the original's lack of a fixed point.

Figure 2.7. Reduced sub-networks of the ABA signaling network upon knocking out the node pHc. (a) The 19-node sub-network resulting from pHc perturbation after removing the nodes that stabilized in an ON or OFF state. (b) The 7-node sub-network obtained from the network given in (a) after eliminating some of the redundant nodes and edges.

Numerical simulations revealed that with synchronous updating the transition graph of the 7-node sub-network has a limit cycle of length five (see Table 2.10) having all the states in its basin of attraction. For the ROA (or the GA) method, the transition graph of the sub-network given in Figure 2.7(b) has a loose attractor containing the majority of the states (120 out of 128 states), whereas the remaining eight states, which are in the in-component of the SCC, can reach the SCC in one step. The only difference in the transition graphs obtained from the ROA and GA methods is that the average in- and out-degrees are smaller in the GA model (where both are equal to 4) than in the ROA model (where the average in-degree is 25 and the average out-degree is 23). Note that here the whole state space is in the basin of attraction of the loose attractor and the eight aforementioned states are Garden-of-Eden states (not reachable from any other state). Thus this case corresponds to sustained oscillations in the sub-network, including sustained oscillations of Ca2+c, which had only a fixed point in the OFF state in the unperturbed case, and sustained oscillations of Closure, which stabilized in the ON state in the unperturbed case.

Table 2.8. Boolean rules governing the state of the nodes in the 19-node sub-network represented in Figure 2.7(a).

Node          Boolean rule
NOS           NOS* = Ca2+c
NO            NO* = NOS
GC            GC* = NO
ADPRc         ADPRc* = NO
cADPR         cADPR* = ADPRc
cGMP          cGMP* = GC
PLC           PLC* = Ca2+c
InsP3         InsP3* = PLC
CIS           CIS* = (cGMP AND cADPR) OR InsP3
Ca2+ATPase    Ca2+ATPase* = Ca2+c
Ca2+c         Ca2+c* = (CaIM OR CIS) AND (NOT Ca2+ATPase)
AnionEM       AnionEM* = Ca2+c
H+ATPase      H+ATPase* = NOT Ca2+c
Depolar       Depolar* = KEV OR AnionEM OR (NOT H+ATPase) OR (NOT KOUT) OR Ca2+c
CaIM          CaIM* = NOT Depolar
KOUT          KOUT* = Depolar
KAP           KAP* = Depolar
KEV           KEV* = Ca2+c
Closure       Closure* = (KOUT OR KAP) AND AnionEM
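The synchronous behavior of the reduced 7-node sub-network can be checked with a short simulation. The sketch below is illustrative only: it encodes the rules listed in Table 2.9, while the state ordering and the function names are choices made for this example. It iterates the synchronous map from a given initial state until a state repeats and returns the repeating part of the trajectory.

```python
from itertools import product

# State order used in this sketch (an arbitrary choice):
# (CIS, Ca2+ATPase, Ca2+c, Depolar, CaIM, KOUT, Closure)
def sync_step(s):
    """Apply all seven rules of Table 2.9 simultaneously."""
    cis, atp, ca, dep, caim, kout, clo = s
    return (
        ca,                              # CIS* = Ca2+c
        ca,                              # Ca2+ATPase* = Ca2+c
        int((caim or cis) and not atp),  # Ca2+c* = (CaIM OR CIS) AND (NOT Ca2+ATPase)
        int((not kout) or ca),           # Depolar* = (NOT KOUT) OR Ca2+c
        int(not dep),                    # CaIM* = NOT Depolar
        dep,                             # KOUT* = Depolar
        int(kout and ca),                # Closure* = KOUT AND Ca2+c
    )

def attractor_from(s):
    """Follow the synchronous trajectory from s and return the attractor it reaches."""
    first_seen = {}
    path = []
    while s not in first_seen:
        first_seen[s] = len(path)
        path.append(s)
        s = sync_step(s)
    return path[first_seen[s]:]

def fixed_points():
    """All states left unchanged by the synchronous map."""
    return [s for s in product((0, 1), repeat=7) if sync_step(s) == s]
```

`attractor_from((0,) * 7)` returns the cycle reached from the all-OFF state, and `fixed_points()` lists the states left unchanged by the map, which can be compared with the absence of a fixed point stated above.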

Table 2.9. Boolean rules governing the state of the nodes in the 7-node sub-network represented in Figure 2.7(b).

Node          Boolean rule
CIS           CIS* = Ca2+c
Ca2+ATPase    Ca2+ATPase* = Ca2+c
Ca2+c         Ca2+c* = (CaIM OR CIS) AND (NOT Ca2+ATPase)
Depolar       Depolar* = (NOT KOUT) OR Ca2+c
CaIM          CaIM* = NOT Depolar
KOUT          KOUT* = Depolar
Closure       Closure* = KOUT AND Ca2+c

Table 2.10. The limit cycle observed in the synchronous model for the sub-network given in Figure 2.7(b).
CIS  Ca2+c  Ca2+ATPase  CaIM  Closure  Depolar  KOUT

When focusing on the node Ca2+c, which is the key driver of the sub-network according to the scalar equation analysis, the difference between the 3-node sub-network and the current network is an additional negative feedback on Ca2+c mediated by the node Depolar. This negative feedback may be more effective than the negative feedback mediated by Ca2+ATPase, because of the OR rule governing the activation of the node Depolar. We tested whether weakening the negative feedback changes the dynamics, and found, both numerically and by solving the Boolean equations, that the necessary condition for having a fixed point is a change of the OR operation in the Boolean function of the node Depolar to AND, together with changing the inhibitory effect of Depolar on CaIM into an activation. Note that the fixed point of the whole network obtained after these rule changes is not the same as that of the wild-type network and, in particular, Closure stabilizes in the OFF state.

In summary, synchronous or DA models of the perturbed networks exhibit both limit cycles and fixed points, except for the knockout of Ca2+c that results in the wild-type fixed point only and the knockout of pHc that leads only to a limit cycle. With the ROA or GA updating, the knockouts identified as leading to insensitivity by Li et al. [23] lead

to a fixed point in which half of the nodes, including Closure, are OFF, as expected. Most of the knockouts identified as yielding reduced sensitivity will lead to a fixed point in which Closure stabilizes in the OFF state. The fixed points obtained by knocking out PA, S1P, or AnionEM differ in the state of at most seven nodes. This suggests that PA and S1P knockouts that were previously reported as leading to reduced sensitivity [23] can be better categorized as leading to insensitivity. An interesting exception is the perturbation of pHc that leads to sustained oscillations for 19 nodes including Closure. The knockouts in the hypo/hyper sensitivity cases still have the wild-type fixed point with Closure stabilizing in the ON state.

Discussion and conclusion

Here we provided an investigation of the long-term dynamics of the ABA signal transduction network by applying one synchronous and three asynchronous updating methods, namely random order asynchronous, general asynchronous, and deterministic asynchronous methods. We performed a comprehensive study of all attractors of the system considering every possible initial state, changes in timing, as well as perturbations in the nodes and regulatory functions of the system. Due to the complexity of the network, we proposed a network reduction technique to simplify the network while preserving its essential dynamical properties. This reduction method eased attractor analysis of the system. For example, it allowed us to determine the attractors of the synchronous model analytically by using the reduced scalar equation technique proposed by Farrow et al. [60].
Although reduced scalar equations were shown to be useful for small biological networks such as our simplified system, it should be noted that they cannot be applied to large-scale networks, as the identification of the number of fixed points for monotone Boolean networks as well as the determination of the existence of fixed points for general Boolean networks have been proven to be strong NP-complete problems [63].

Our proposed network reduction technique supports and complements a recently proposed network reduction method by Veliz-Cuba [58]. This method contains two steps: (1) simplifying Boolean functions using Boolean algebra, and deleting the resulting non-functional edges, (2) sequentially deleting the nodes with no self-loops [58]. The first

step is similar to the first step of our method; however, we consider not only simplifications induced by redundant Boolean functions (which are a modeling artifact and may be rare) but also by stable signals in the biological system (which are system-driven and quite common). The second step is similar to our collapsing of nodes, and it is very effective when one is only interested in fixed points; however, it may lead to drastic information reduction. Indeed, fixing the unregulated nodes in the ON state and applying the Veliz-Cuba method to the whole ABA network leads to a single-node network with the Boolean function Ca2+c* = 0. As a result, the reduced network has only one fixed point, implying, based on Veliz-Cuba's results, that the original network must have a unique fixed point, which confirms our result. However, the reduced network is not informative regarding the possibility of complex attractors. Applying this reduction method to the networks obtained after pHc knockout results in a 3-node network with the following Boolean rules:

Ca2+c* = NOT Depolar AND NOT Ca2+c
Depolar* = NOT Depolar OR Ca2+c
Closure* = Depolar AND Ca2+c

It can be easily seen that this reduced network does not have any fixed points, implying that the original system affected by pHc knockout does not have any fixed points either, which again confirms our result.

In another study [64], a network reduction method based on the removal of frozen nodes (stable variables) and network leaves (i.e., nodes with out-degree = 0) was used to simplify random Boolean networks. In this method, first a random sampling method of the initial states was used to determine a subset of the attractors, and then a minimum set of frozen nodes was found by identifying the nodes whose state was the same in all the attractors. This method is not useful for reducing the 13-node sub-network as the minimum set of frozen nodes for this network is empty.
The method may be used for reduction of the 39-node full network to the 13-node sub-network, but it would be an inefficient alternative to what we have done. A recently introduced asynchronous Boolean model reduction method [65] consists of simplifying the model's state transition graph into a directed acyclic graph of its strongly connected components, and then identifying the subset of interactions that are

53 operational in each strongly connected component of the state transition graph. This method is applicable to networks of eight to fifteen nodes, thus its combination with our elimination of stabilized nodes may be very informative. Markov chains have been previously employed in the study of probabilistic Boolean networks by incorporating uncertainty in the Boolean rules governing the state of the nodes [24,66,67]. In this study, we have provided an added insight into how Markov chain techniques can be used to analyze the dynamic behavior of systems whose uncertainty is not in the logic functions controlling the states of the nodes but in the timing of the interactions [27,65]. Particularly, we showed that the transition graph corresponding to the random order or general asynchronous models is in fact a Markov chain, thereby the absorption times to the fixed point can be obtained. This work provides further insights into the role of the Ca 2+ c oscillations. We found that the Ca 2+ c oscillations eventually disappear in the wild-type system, unless strict constraints regarding the timing of certain processes and the initial state of the system are satisfied. For example, we found that in the deterministic asynchronous model of the 3- node reduced network, only timing values that correspond to a restricted update sequence, i.e. Ca 2+ ATPase, Ca 2+ c, Ca 2+ ATPase, Ca 2+ c and a few variants of it, and only two of eight possible initial conditions, lead to a limit cycle. Assuming that all node or node combination updates have equal probability, the estimated incidence of cycleinducing update sequences is 64 of 2401, or 2.7%. Currently there is insufficient experimental information to gauge whether these constraints are biologically satisfied or not. 
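The figure of 64 in 2401 can be reproduced arithmetically under one plausible reading, which is an assumption on our part rather than something stated in the text: each update event involves one of the $2^3 - 1 = 7$ non-empty combinations of the three nodes, and an update sequence consists of four such events (matching the four-step sequence quoted above).

```latex
\[
  \left(2^{3}-1\right)^{4} = 7^{4} = 2401,
  \qquad
  \frac{64}{2401} \approx 0.0267 \approx 2.7\%.
\]
```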
However, the requirement for a longer time scale for CIS than for the other two nodes is supported by the fact that the positive feedback loops (which involve Ca²⁺c and CIS) are longer than the negative feedback loop (between Ca²⁺c and Ca²⁺ ATPase) in the 11-node sub-network (Figure 2.2). Interestingly, we found that the system converges to a loose attractor when the node pHc is perturbed. In this case the Ca²⁺c-driven oscillations do play an observable beneficial role by leading to an at least fluctuating stomatal closure.

Our analysis refines previous results on the relative ranking of node disruptions in the ABA system. We find that from the standpoint of the long-term behavior there are only three categories of responses: knockouts leading to the wild-type fixed point (which represent 75% of all knockouts), knockouts leading to the null fixed point (22.5% of all knockouts), and a single knockout leading to sustained fluctuations.

Our study provides a roadmap to compare and verify the existence of different types of attractors resulting from Boolean models. The important steps along the road are simplification of the network by elimination of frozen nodes and collapse of intermediary nodes, and analysis of the state transition graphs that correspond to synchronous as well as stochastic and deterministic asynchronous models. For the ABA signaling system investigated here we found both agreement (a common fixed point) and disagreement in the possibility of limit cycles among the models. We also found update-dependent diversity in the basins of attraction of the attractors, and in the case of the DA model we surprisingly found some states that are part of an attractor but not part of its basin of attraction.

The results presented in this study support the necessity of using asynchronous update in Boolean models of biological systems. The two periodic attractors whose basins of attraction dominated the synchronous model were nonexistent or had a much reduced basin in asynchronous models. This observation is consistent with previous reports [31,32] that synchronous models possess spurious cycles. While there is increasing evidence for the necessity of relaxing the assumption of synchronicity, what kind of asynchronous implementation to use is an open question. The most definitive information, experimental data on timing and kinetics, is rarely available. It is thus very important to study what conclusions are robust to changes in the implementation of time. Our results suggest that sustained oscillations are much less prevalent than asymptotic behavior in real biological systems having a variety of time scales.
This supports the prior findings that in sequential deterministic asynchronous models of networks without negative loops (self-inhibition) all dynamical cycles can be destroyed by a change in the update method [68] and that cycles are not likely to be observed in asynchronous cellular automata models [62]. However, we also find that focusing solely on the fixed points is not sufficient to capture all dynamical aspects of a system either, as complex attractors are possible both for deterministic and stochastic asynchronous models. At present, no asynchronous method has been demonstrated to be optimal: deterministic methods require knowledge of the nodes' time units, while stochastic methods have a large number of possible state transitions. In cases when no biological timing information is available, the GA method may be practically preferable to the ROA method, as it is computationally more efficient and the state transition graph of the ROA method can be obtained from the state transition graph of the GA method. For biological systems with some, but insufficient, information on time scales, the middle road of imposing restrictions on the order of update (e.g., by always updating a certain node before another [27,69]) or on the probability of a node's update [65] may be most practical.

Boolean Dynamic Modeling of a T Cell Survival Network Identifies Novel Candidate Therapeutic Targets for Large Granular Lymphocyte Leukemia

This chapter has been previously published in modified form in PLoS Computational Biology [70].

Introduction

Living cells perceive and respond to environmental perturbations in order to maintain their functional capabilities, such as growth, survival, and apoptosis. This process is carried out through a cascade of interactions forming complex signaling networks. Dysregulation (abnormal expression or activity) of some components in these signaling networks affects the efficacy of signal transduction and may eventually trigger a transition from the normal physiological state to a dysfunctional system [71] manifested as diseases such as diabetes [72,73], developmental disorders [74], autoimmunity [75] and cancer [74,76]. For example, the blood cancer T-cell large granular lymphocyte (T-LGL) leukemia exhibits an abnormal proliferation of mature cytotoxic T lymphocytes (CTLs). Normal CTLs are generated to eliminate cells infected by a virus, but unlike normal CTLs, which undergo activation-induced cell death after they successfully fight the virus, leukemic T-LGL cells remain long-term competent [77,78]. The cause of this abnormal behavior has been identified as dysregulation of a few components of the signal transduction network responsible for activation-induced cell death in T cells [79].

A Boolean network model of T cell survival signaling in the context of T-LGL leukemia was previously constructed by Zhang et al. [17] through an extensive literature search. This network consists of 60 components, including proteins, mRNAs, and small molecules (see Figure 3.1). The main input to the network is Stimuli, which represents virus or antigen stimulation, and the main output node is Apoptosis, which denotes programmed cell death. Based on a random order asynchronous Boolean dynamic model of the assembled network, Zhang et al. identified a minimal number of dysregulations that can cause the T-LGL survival state, namely overabundance or overactivity of the proteins platelet-derived growth factor (PDGF) and interleukin 15 (IL15). Zhang et al. carried out a preliminary analysis of the network's dynamics by performing numerical simulations starting from one specific initial condition (corresponding to resting T cells receiving antigen stimulation and over-abundance of the two proteins PDGF and IL15). Once the known deregulations in T-LGL leukemia were reproduced, each of these deregulations was interrupted individually, by setting the node's status to the opposite state, to predict key mediators of the disease. Yet, a complete dynamic analysis of the system, including identification of the attractors (e.g., steady states) of the system and their corresponding basins of attraction, as well as a thorough perturbation analysis of the system considering all possible initial states, is lacking. Performing this analysis can provide deeper insights into unknown aspects of T-LGL leukemia.

Stuck-at-ON/OFF fault is a very common dysregulation of biomolecules in various cancers [80].
For example, stuck-at-ON (constitutive activation) of the RAS protein in the mitogen-activated protein kinase pathways leads to aberrant cell proliferation and cancer [80,81]. Thus identifying components whose stuck-at values result in the clearance, or alternatively, the persistence of a disease is extremely beneficial for the design of intervention strategies. As there is no known curative therapy for T-LGL leukemia, identification of potential therapeutic targets is of utmost importance [82].

In this work, we carry out a detailed analysis of the T-LGL signaling network by considering all possible initial states to probe the long-term behavior of the underlying disease. We employ an asynchronous Boolean dynamic framework and a network reduction method, which we previously proposed [40], as described in Section 2.2.1, to identify the attractors of the system and analyze their basins of attraction. This analysis allows us to confirm or predict the T-LGL states of 54 components of the network. The predicted state of one of the components (SMAD) is validated by new wet-bench experiments performed by our experimental collaborators. We then perform node perturbation analysis using a Boolean dynamic approach to study to what extent each component contributes to T-LGL leukemia. We identify 19 key components whose disruption can reverse the abnormal state of the signaling network, thereby uncovering potential therapeutic targets for this disease.

Methods

As we mentioned in Chapter 2, our comparative study of three different asynchronous methods applied to the same biological system suggested that the general asynchronous (GA) method, wherein a randomly selected node is updated at each time step, is the most efficient and informative asynchronous Boolean updating strategy [40]. This is because deterministic asynchronous [40] or autonomous [83] Boolean models require kinetic or timing knowledge, which is usually missing, and random order asynchronous models [27] are not computationally efficient compared to the GA models. In addition, the superiority of the GA approach has been corroborated by other researchers [28] and the method has been used in other studies as well [30,31]. We thus chose to employ the GA method in this work, and we implemented it using the open-source software library BooleanNet [33].
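The GA update rule itself is simple to state in code. The sketch below is plain Python and does not use or reproduce the BooleanNet API; the two-node mutual-inhibition network, the rule dictionary, and all function names are hypothetical and serve only to illustrate "update one randomly selected node per time step".

```python
import random

# Hypothetical toy network (mutual inhibition); NOT part of the T-LGL model.
RULES = {
    "X": lambda s: not s["Y"],
    "Y": lambda s: not s["X"],
}

def ga_step(state, rng):
    """General asynchronous update: pick one node at random and apply its rule."""
    node = rng.choice(sorted(RULES))
    new_state = dict(state)
    new_state[node] = RULES[node](state)
    return new_state

def is_fixed_point(state):
    """A state is a fixed point if no node changes when its rule is applied."""
    return all(RULES[n](state) == state[n] for n in RULES)

def simulate(state, rng, max_steps=100):
    """Iterate GA updates until a fixed point is reached or max_steps elapse."""
    for _ in range(max_steps):
        if is_fixed_point(state):
            break
        state = ga_step(state, rng)
    return state

rng = random.Random(0)
start = {"X": False, "Y": False}
finals = {tuple(sorted(simulate(start, rng).items())) for _ in range(50)}
print(finals)
```

Starting from the same initial state, different runs can end in different fixed points, depending on which node happens to be updated first.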
It is important to note that the stochasticity inherent to this method may cause each state to have multiple successors, and thus the basins of attraction of different attractors may overlap. For systems with multiple fixed-point attractors, the absorption probabilities to each fixed point can be computed through the analysis of the Markov chain and transition matrix associated with the state transition graph of the system [61]. Given a fixed point, node perturbations can be performed by reversing the state of the nodes, i.e., by knocking out the nodes that stabilize in an ON state in the fixed point or over-expressing the ones that stabilize in an OFF state.

In this work, we employ the reduction method that we proposed in our previous work [40] (see Section 2.2.1 for details) to simplify the T-LGL leukemia signal transduction network synthesized by Zhang et al. [17], thereby facilitating its dynamical analysis. Attractor analysis of the system is then performed as explained in Section 2.2.2.

Results

Network simplification and dynamic analysis

The T-LGL signaling network reconstructed by Zhang et al. [17] contains 60 nodes and 142 regulatory edges. Zhang et al. used a two-step process: they first synthesized a network containing 128 nodes and 287 edges by extensive literature search, then simplified it with the software NET-SYNTHESIS [16], which constructs the sparsest network that maintains all of the causal (upstream-downstream) effects incorporated in a redundant starting network. In this study, we work with the 60-node T-LGL signaling network reported in [17], which is redrawn in Figure 3.1. The Boolean rules for the components of the network were constructed in [17] by synthesizing experimental observations and, for convenience, are given in Table B.1 (see Appendix B) as well. The descriptions of the node names and abbreviations are provided in Table B.2 (see Appendix B). To reduce the computational burden associated with the large state space (more than 10¹⁸ states for 60 nodes), we simplified the T-LGL network using the reduction method proposed in [40] (see Section 2.2.1 for details). We fixed the six source nodes in the states given in [17], i.e., Stimuli, IL15, and PDGF were fixed at ON and Stimuli2, CD45, and TAX were fixed at OFF. We used the Boolean rules constructed in [17], with one notable difference.
The Boolean rules for all the nodes in [17], except Apoptosis, contain the expression AND NOT Apoptosis, meaning that if Apoptosis is ON, the cell dies and correspondingly all other nodes are turned OFF. To focus on the trajectory leading to the initial turning on of the Apoptosis node, we removed the AND NOT Apoptosis from all the logical rules. This allows us to determine the stationary states of the nodes in a live cell. We determined which nodes' states stabilize using the first step of our simplification method (see Section 2.2.1). Our analysis revealed that 36 nodes of the network stabilize in either an ON or OFF state. In particular, Proliferation and Cytoskeleton signaling, two output nodes of the network, stabilize in the OFF and ON state, respectively. Low proliferation in leukemic LGL has been observed experimentally [84], which supports our finding of a long-term OFF state for this output node. The ON state of Cytoskeleton signaling may not be biologically relevant, as this node represents the ability of T cells to attach and move, which is expected to be reduced in leukemic T-LGL compared to normal T cells. The nodes whose stabilized states cannot be readily obtained by inspection of their Boolean rules form the sub-network represented in Figure 3.2(a). The Boolean rules of these nodes are listed in Table B.3 (see Appendix B), wherein we put back the AND NOT Apoptosis expression into the rules.

Figure 3.1. The T-LGL leukemia signaling network. The shape of the nodes indicates the cellular location: rectangular indicates intracellular components, ellipse indicates extracellular components, and diamond indicates receptors. Node colors reflect the current knowledge on the state of these nodes in leukemic cells: highly active components in T-LGL are shown in red, inhibited nodes are shown in green, nodes that have been suggested to be deregulated are in blue, and the state of white nodes is unknown. Conceptual nodes (Stimuli, Stimuli2, P2, Cytoskeleton signaling, Proliferation, and Apoptosis) are represented by yellow hexagons. An arrowhead or a short perpendicular bar at the end of an edge indicates activation or inhibition, respectively. The inhibitory edges from Apoptosis to other nodes are not shown. The full names of the node labels are given in Table B.2. This figure and its caption have been adapted from [17].

Figure 3.2. Reduced sub-networks of the T-LGL leukemia signaling network. The full names of the nodes can be found in Table B.2. An arrowhead or a short perpendicular bar at the end of an edge indicates activation or inhibition, respectively. The inhibitory edges from Apoptosis to other nodes are not shown. (a) The 18-node sub-network. This sub-network is obtained by removing the nodes that stabilize in the ON or OFF state upon fixing the state of the source nodes. (b) The 6-node sub-network. This sub-network is obtained by removing the top sub-graph of the sub-network in (a) and merging simple mediator nodes in the bottom sub-graph.

Next, we identified the attractors (long-term behavior) of the sub-network represented in Figure 3.2(a) (based on the method described in Section 2.2.2). We found that upon activation of Apoptosis all other nodes stabilize at OFF, forming the normal fixed point of the system, which represents the normal behavior of programmed cell death. When Apoptosis is stabilized at OFF, the two nodes in the top sub-graph oscillate while all the nodes in the bottom sub-graph are stabilized at either ON or OFF. As shown in Figure 3.3, the state space of the two oscillatory nodes, TCR and CTLA4, forms a complex attractor in which the average fraction of ON states for either node is 0.5. Given that these two nodes have no effect on any other node under the conditions studied here (i.e., stable states of the source nodes), their behavior can be separated from the rest of the network.

Figure 3.3. The state transition graph corresponding to the two oscillatory nodes, CTLA4 and TCR. In this graph the left binary digit of the node identifier indicates the state of CTLA4 and the right digit represents the state of TCR. The directed edges represent state transitions allowed by updating a single node's state; self-loops appear when a node is updated but its state does not change.

The bottom sub-graph in Figure 3.2(a) exhibits the normal fixed point, as well as two T-LGL (disease) fixed points in which Apoptosis is OFF. The only difference between the two T-LGL fixed points is that the node P2 is ON in one fixed point and OFF in the other, which was expected due to the presence of a self-loop on P2 in Figure 3.2(a). P2 is a virtual node introduced to mediate the inhibition of interferon-γ translation in the case of sustained activity of the interferon-γ protein (IFNG in Figure 3.2(a)). The node IFNG is also inhibited by the node SMAD, which stabilizes in the ON state in both T-LGL fixed points. Therefore IFNG stabilizes at OFF, irrespective of the state of P2, as supported by experimental evidence [85].
Thus the biological difference between the two fixed points is essentially a memory effect, i.e., the ON state of P2 indicates that IFNG was transiently ON before stabilizing in the OFF state. In the two T-LGL fixed points for the bottom sub-graph of Figure 3.2(a), the nodes sFas, GPCR, S1P, SMAD, MCL1, FLIP, and IAP are ON and the other nodes are OFF. We found by numerical simulations using the GA method that out of 65,536 total states in the state transition graph, 53% are in the exclusive basin of attraction of the normal fixed point, 0.24% are in the exclusive basin of attraction of the T-LGL fixed point wherein P2 is ON, and 0.03% are in the exclusive basin of attraction of the T-LGL fixed point wherein P2 is OFF. Interestingly, there is a significant overlap among the basins of attraction of all three fixed points. The large basin of attraction of the normal fixed point is partly due to the fact that all the states having Apoptosis in the ON state (that is, half of the total number of states) belong to the exclusive basin of the normal fixed point. These states are not biologically relevant initial conditions, but they represent potential intermediary states toward programmed cell death and as such they need to be included in the state transition graph.

Since the state transition graph of the bottom sub-graph given in Figure 3.2(a) is too large to represent and to further analyze (e.g., to obtain the probabilities of reaching each of the fixed points), we applied the second step of the network reduction method proposed in [40]. This step preserves the fixed points of the system (see Section 2.2.1), and since the only attractors of this sub-graph are fixed points, the state space of the reduced network is expected to reflect the properties of the full state space. Correspondingly, the nodes having in-degree and out-degree of one (or less) in the sub-graph in Figure 3.2(a), such as sFas, MCL1, IAP, GPCR, SMAD, and CREB, can be safely removed without losing any significant information, as such nodes at most introduce a delay in the signal propagation.
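The effect of removing a simple mediator node can be illustrated on a small example. The sketch below is not the implementation used in [40]; it is a minimal illustration in which the four-node network, the rule strings, and the helper names are all hypothetical. The mediator M (one input, one output) is removed by substituting its rule into the rule of its target, and the fixed points before and after the substitution are compared.

```python
import re
from itertools import product

# Hypothetical toy network: A -> M -> B, B -| C, C -| A. M is a simple mediator.
rules = {"A": "not C", "M": "A", "B": "M", "C": "not B"}

def fixed_points(rule_set):
    """Enumerate all states that every rule maps onto itself."""
    nodes = sorted(rule_set)
    found = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if all(eval(rule_set[n], {"__builtins__": {}}, state) == state[n] for n in nodes):
            found.append(state)
    return found

def collapse(rule_set, mediator):
    """Remove `mediator` by substituting its rule wherever it is used as an input."""
    expr = rule_set[mediator]
    pattern = rf"\b{re.escape(mediator)}\b"
    return {
        node: re.sub(pattern, f"({expr})", rule)
        for node, rule in rule_set.items()
        if node != mediator
    }

reduced = collapse(rules, "M")
original_fps = fixed_points(rules)
reduced_fps = fixed_points(reduced)
print(reduced)
print(original_fps)
print(reduced_fps)
```

In this toy case the two fixed points of the original network, restricted to the remaining nodes, coincide with the two fixed points of the reduced network, which is the property the text relies on.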
In addition, we note that although the node P2 has a self-loop and generates a new T-LGL fixed point as described before, it can also be removed from the network since the two fixed points differ only in the state of P2 and thus correspond to biologically equivalent disease states. We revisit this node when enumerating the attractors of the original network. In the resulting simplified network, the nodes BID, Caspase, and IFNG would also have in-degree and out-degree of one (or less) and thus can be safely removed as well. This reduction procedure results in a simple sub-network represented in Figure 3.2(b) with the Boolean rules given in Table 3.1. Our attractor analysis revealed that this sub-network has two fixed points, namely 000001 and 110000 (the digits from left to right represent the state of the nodes in the order as listed from top to bottom in Table 3.1). The first fixed point represents the normal state, that is, the apoptosis of CTL cells. Note that the OFF state of other nodes in this fixed point was expected because of the presence of AND NOT Apoptosis in all the Boolean rules. The second fixed point is the T-LGL (disease) one, as Apoptosis is stabilized in the OFF state. We note that the sub-network depicted in Figure 3.2(b) contains a backbone of activations from Fas to Apoptosis and two nodes (S1P and FLIP) which both have a mutual inhibitory relationship with the backbone. If activation reaches Apoptosis, the system converges to the normal fixed point. In the T-LGL fixed point, on the other hand, the backbone is inactive while S1P and FLIP are active.

Table 3.1. Boolean rules governing the nodes' states in the 6-node sub-network represented in Figure 3.2(b). For simplicity, the nodes' states are represented by the node names. The asterisk indicates the future state of the marked node.

Node | Boolean rule
S1P | S1P* = NOT (Ceramide OR Apoptosis)
FLIP | FLIP* = NOT (DISC OR Apoptosis)
Fas | Fas* = NOT (S1P OR Apoptosis)
Ceramide | Ceramide* = Fas AND NOT (S1P OR Apoptosis)
DISC | DISC* = (Ceramide OR (Fas AND NOT FLIP)) AND NOT Apoptosis
Apoptosis | Apoptosis* = DISC OR Apoptosis

We found by simulations that for the simplified network of Figure 3.2(b), 56% of the states of the state transition graph (represented in Figure 3.4) are in the exclusive basin of attraction of the normal fixed point, while 5% of the states form the exclusive basin of attraction of the T-LGL fixed point. Again, the half of the state space that has the ON state of Apoptosis belongs to the exclusive basin of attraction of the normal fixed point. Notably, there is a significant overlap between the basins of attraction of the two fixed points, which is illustrated by a gray color in Figure 3.4. The probabilities of reaching each of the two fixed points starting from these gray-colored states, found by analysis of the corresponding Markov chain (see Methods), are given in Figure 3.5.
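The fixed points and basin sizes of the 6-node sub-network can be recomputed directly from the rules in Table 3.1. The following is a minimal sketch in plain Python, not the BooleanNet-based code used in the study; it assumes that a state belongs to the basin of a fixed point whenever that fixed point is reachable through some sequence of single-node updates, and all function names are illustrative.

```python
from itertools import product

# Node order follows Table 3.1: S1P, FLIP, Fas, Ceramide, DISC, Apoptosis.
def rules(state):
    """Return the next value of every node, transcribed from Table 3.1."""
    s1p, flip, fas, cer, disc, apo = state
    return (
        not (cer or apo),                         # S1P*
        not (disc or apo),                        # FLIP*
        not (s1p or apo),                         # Fas*
        fas and not (s1p or apo),                 # Ceramide*
        (cer or (fas and not flip)) and not apo,  # DISC*
        disc or apo,                              # Apoptosis*
    )

def successors(state):
    """States obtained by updating exactly one node (self-loops included)."""
    nxt = rules(state)
    return {state[:i] + (nxt[i],) + state[i + 1:] for i in range(6)}

def reachable(start):
    """All states reachable from `start` in the asynchronous transition graph."""
    seen, stack = {start}, [start]
    while stack:
        for t in successors(stack.pop()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def label(state):
    return "".join("1" if v else "0" for v in state)

states = list(product([False, True], repeat=6))
fixed_points = [s for s in states if rules(s) == s]
normal = next(s for s in fixed_points if s[5])      # Apoptosis ON
tlgl = next(s for s in fixed_points if not s[5])    # Apoptosis OFF

counts = {"normal only": 0, "T-LGL only": 0, "both": 0}
for s in states:
    r = reachable(s)
    if normal in r and tlgl in r:
        counts["both"] += 1
    elif normal in r:
        counts["normal only"] += 1
    elif tlgl in r:
        counts["T-LGL only"] += 1

print([label(s) for s in fixed_points])
print(counts)
```

Under this reachability definition, 36 of the 64 states (about 56%) can reach only the normal fixed point and 3 states (about 5%) can reach only the T-LGL fixed point, in line with the percentages reported in the text.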
As this figure shows, for the majority of cases the probability of reaching the normal fixed point is higher than that of reaching the T-LGL fixed point. The three states whose probabilities to reach the T-LGL fixed point are greater than or equal to 0.7 are one step away either from the T-LGL fixed point or from the states in its exclusive basin of attraction. In two of them, the backbone of the network in Figure 3.2(b) is inactive, and in the third one the backbone is partially inactive and most likely will remain inactive due to the ON state of S1P (one of the two nodes having mutual inhibition with the backbone).

Figure 3.4. State transition graph of the 6-node sub-network represented in Figure 3.2(b). It contains 64 states, of which the state shown with a dark blue symbol is the normal fixed point and the state shown in red is the T-LGL fixed point. States denoted by light blue symbols are uniquely in the basin of attraction of the normal fixed point, whereas the states in pink can only reach the T-LGL fixed point. Gray states, on the other hand, can lead to either fixed point.

Figure 3.5. Probabilities of reaching the normal and T-LGL fixed points when both are reachable. These probabilities are computed starting from the states that are shared by both basins of attraction (see gray-colored states illustrated in Figure 3.4).
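Absorption probabilities of the kind summarized in Figure 3.5 can be obtained by solving a linear system on the state transition graph. The sketch below is one way to set this up for the rules of Table 3.1. It assumes, as in the GA method, that each of the six nodes is selected for update with probability 1/6 at every step; the exact-arithmetic solver and all names are illustrative choices, not the procedure of [61].

```python
from fractions import Fraction
from itertools import product

# Node order follows Table 3.1: S1P, FLIP, Fas, Ceramide, DISC, Apoptosis.
def rules(state):
    s1p, flip, fas, cer, disc, apo = state
    return (
        not (cer or apo),
        not (disc or apo),
        not (s1p or apo),
        fas and not (s1p or apo),
        (cer or (fas and not flip)) and not apo,
        disc or apo,
    )

def update(state, i):
    """State after updating node i only."""
    return state[:i] + (rules(state)[i],) + state[i + 1:]

def solve(matrix, rhs):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]

states = list(product([False, True], repeat=6))
index = {s: k for k, s in enumerate(states)}
fixed_points = [s for s in states if rules(s) == s]
tlgl = next(s for s in fixed_points if not s[5])  # fixed point with Apoptosis OFF

# Unknown p[s] = probability of ending in the T-LGL fixed point when starting at s.
# Fixed points: p = 1 (T-LGL) or 0 (normal). Other states: p[s] = mean of p over
# the six single-node updates of s.
n = len(states)
A = [[Fraction(0)] * n for _ in range(n)]
b = [Fraction(0)] * n
for s, k in index.items():
    A[k][k] += 1
    if s in fixed_points:
        b[k] = Fraction(int(s == tlgl))
    else:
        for i in range(6):
            A[k][index[update(s, i)]] -= Fraction(1, 6)

p_tlgl = dict(zip(states, solve(A, b)))
mixed = [s for s in states if 0 < p_tlgl[s] < 1]
print(len(mixed))
for s in mixed:
    print("".join("1" if v else "0" for v in s), float(p_tlgl[s]))
```

States with probability strictly between 0 and 1 are those from which both fixed points are reachable, i.e., the gray states of Figure 3.4; the probability of reaching the normal fixed point from each of them is one minus the printed value.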

Based on the sub-network analysis and considering the states of the nodes that stabilized at the beginning according to the first step of our reduction method, we conclude that the whole T-LGL network has three attractors, namely the normal fixed point wherein Apoptosis is ON and all other nodes are OFF, representing the normal physiological state, and two T-LGL attractors in which all nodes except two, i.e., TCR and CTLA4, are in a steady state, representing the disease state. These T-LGL attractors are given in the second column of Table 3.2, which presents the predicted T-LGL states of 54 components of the network (all but the six source nodes whose states are indicated at the beginning of the Results section). We note that the two T-LGL attractors essentially represent the same disease state since they only differ in the state of the virtual node P2. Moreover, this disease state can be considered as a fixed point since only two nodes oscillate in the T-LGL attractors. For this reason we will refer to this state as the T-LGL fixed point. It is expected that the basins of attraction of the fixed points have similar features as those of the simplified networks.

Experimental validation of the T-LGL steady state

Experimental evidence exists for the deregulated states of 36 (67%) components out of the 54 predicted T-LGL states, as summarized in the third column of Table 3.2. For example, the stable ON state of MEK, ERK, JAK, and STAT3 indicates that the MAPK and JAK-STAT pathways are activated. The OFF state of BID is corroborated by recent evidence that it is down-regulated both in natural killer (NK) and in T cell LGL leukemia [86]. In addition, the node RAS was found to be constitutively active in NK-LGL leukemia [87], which indirectly supports our result on the predicted ON state of this node.
For three other components, namely GPCR, DISC, and IFNG, which were classified as being deregulated without clear evidence of either up-regulation or down-regulation in [17], we found that they eventually stabilize at ON, OFF, and OFF, respectively. The OFF state of IFNG and DISC is indeed supported by experimental evidence [85,88]. In the second column of Table 3.2, we indicated with an asterisk the stabilized state of 17 components that were experimentally undocumented before and thus are predictions of our steady state analysis (P2 was not included as it is a virtual node). We note that ten of these cases were also predicted in [17] by simulations.

The predicted T-LGL states of these 17 components can guide targeted experimental follow-up studies. As an example of this approach, our experimental collaborators tested and validated the predicted over-activity of the node SMAD (see [70] for details).

Table 3.2. A summary of the dynamic analysis results of the T-LGL survival signaling network. The first two columns from the left list the components of the network (except for the six source nodes) and their T-LGL states. The nodes' states marked with an asterisk were not documented experimentally in T-LGL before and were predicted by our steady state analysis. The references for the nodes' states documented before are given in the third column. The fixed point(s) obtained after each of the nodes' states is reversed is given in the fourth column, while the size of the exclusive basin of attraction of the normal fixed point, expressed as a percentage of the whole relevant state space, is indicated in the fifth column. The reference of the perturbation cases for which experimental evidence exists is given in the last column. The first 19 nodes in the first column are potential therapeutic targets for T-LGL leukemia.

Node | T-LGL state | Ref. | Fixed point the disruption leads to | Size of exclusive basin of normal fixed point | Ref.
DISC | OFF | [88] | Normal | 100% | [88]
Ceramide | OFF | [89] | Normal | 100% | [89]
Caspase | OFF | [88] | Normal | 100% |
SPHK1 | ON | [82] | Normal | 100% | [17]
S1P | ON | [82] | Normal | 100% | [82]
PDGFR | ON | [90] | Normal | 100% | [17]
GAP | OFF* | | Normal | 100% |
RAS | ON* | | Normal | 100% | [87]¹
MEK | ON | [90] | Normal | 100% | [87]¹
ERK | ON | [90,91] | Normal | 100% | [87]¹
IL2RBT | ON | [92] | Normal | 100% |
IL2RB | ON | [92] | Normal | 100% |
STAT3 | ON | [93] | Normal | 100% | [93]
BID | OFF | [86] | Normal | 100% |
MCL1 | ON | [93] | Normal | 100% | [93]
SOCS | OFF* | | Both | 81% |
JAK | ON | [93] | Both | 81% | [93]
PI3K | ON | [91] | Both | 75% | [91]
NFκB | ON | [17] | Both | 75% | [17]
Fas | OFF | [89] | Both | 72% |
sFas | ON | [94] | Both | 72% |
TBET | ON | [17] | Both | 63% |
RANTES | ON | [85] | Both | 63% |
PLCG1 | ON* | | Both | 63% |
FLIP | ON | [88] | Both | 56% |
IL2 | OFF | [95] | Both | 56% |
IAP | ON* | | Both | 56% |
TNF | ON* | | Both | 56% |
BclxL | OFF | [93] | Both | 56% |
GZMB | ON | [96] | Both | 56% |
IL2RA | OFF | [95] | Both | 56% |
NFAT | ON* | | Both | 56% |
GRB2 | ON* | | Both | 56% |
IFNGT | ON | [85,95] | Both | 56% |
TRADD | OFF* | | Both | 56% |
ZAP70 | OFF* | | Both | 56% |
LCK | ON | [91] | Both | 56% |
FYN | ON* | | Both | 56% |
IFNG | OFF | [85] | Both | 56% |
SMAD | ON* | [70] | Both | 56% |
GPCR | ON | [82,97] | Both | 56% |
TPL2 | ON | [98] | Both | 56% |
A20 | ON | [82] | Both | 56% |
IL2RAT | OFF | [95] | Both | 56% |
CREB | OFF* | | Both | 56% |
P27 | ON* | | Both | 56% |
P2 | ON/OFF | | Both | 56% |
FasT | ON | [89] | T-LGL | 0% |
FasL | ON | [89] | T-LGL | 0% |
Cytoskeleton signaling | ON* | | | |
Proliferation | OFF | [84] | | |
Apoptosis | OFF | [99] | | |
TCR | Oscillate* | | | |
CTLA4 | Oscillate* | | | |

¹ Evidence in NK-LGL leukemia.

Node perturbations

A question of immense biological importance is which manipulations of the T-LGL network can result in consistent activation-induced cell death and the elimination of the dysregulated (diseased) behavior. We can rephrase and specify this question as which node perturbations (knockouts or constitutive activations) lead to a system that has only the normal fixed point. These perturbations can serve as candidates for potential therapeutic interventions. To this end, we performed node perturbation analysis using dynamical and structural methods. In the following, only the results of the dynamic analysis are presented. For details on the structural analysis, which was performed by my colleague, Dr. Rui-Sheng Wang, one can refer to [70].

To identify manipulations of the T-LGL network leading to the existence of only the normal fixed point, we first considered the following scenario. We assumed that the T-LGL network is the simplified network given in Figure 3.2(b). We examined the following dynamic perturbation approaches as potential interventions propelling the system into the normal fixed point. In the first two approaches, it is assumed that the T-LGL fixed point has been already reached (i.e., the disease has already developed), and in the last approach, all possible initial conditions are considered.

1- Reverse the state of one node at a time in the T-LGL fixed point for only the first time step, and keep updating the system. This intervention may be accomplished by a pharmacological intervention on a T-LGL cell.
2- Reverse the state of one node in the T-LGL fixed point permanently and continue updating other nodes. This intervention may be accomplished by genetic engineering of a T-LGL cell.
3- Considering all possible initial states, fix the state of one node in the opposite of its T-LGL state and keep updating other nodes. This intervention may be accomplished by genetic engineering of a population of CTLs.

For the first perturbation approach, we found that only the trivial case of flipping the state of Apoptosis to ON leads exclusively to the normal fixed point. Using the second perturbation approach, we observed that fixing S1P at OFF or Apoptosis at ON eliminates the T-LGL fixed point. In addition, fixing either Ceramide or DISC at ON results in a new fixed point which is similar to the normal fixed point of the unperturbed system, with the only difference that the disrupted node's state is fixed at ON as long as the cell is alive. Using the last perturbation approach, we found a result identical to that of the second approach, indicating that the nodes S1P, Ceramide, and DISC are candidate therapeutic targets for the simplified sub-network. Experiments also confirm that Ceramide and DISC can serve as therapeutic targets [88,89]. We note that the third approach is superior to the second in that it provides additional information on the size of the basin of attraction of each fixed point. For example, we observed that in the case of over-expression of Fas, the exclusive basin of attraction of the normal fixed point increases significantly to 72% of the states.
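The third perturbation approach can be sketched for the 6-node sub-network as follows. This is an illustrative reimplementation in plain Python based on the rules of Table 3.1, not the code used in the study; the function `perturb` and its return values are hypothetical names. One node is clamped at a chosen value, all states of the remaining five nodes are enumerated, and each state is classified by the fixed points it can reach through single-node updates.

```python
from itertools import product

NODES = ["S1P", "FLIP", "Fas", "Ceramide", "DISC", "Apoptosis"]  # order of Table 3.1

def rules(state):
    s1p, flip, fas, cer, disc, apo = state
    return (
        not (cer or apo),
        not (disc or apo),
        not (s1p or apo),
        fas and not (s1p or apo),
        (cer or (fas and not flip)) and not apo,
        disc or apo,
    )

def perturb(node, value):
    """Clamp `node` at `value` and analyze the remaining five nodes.

    Returns the set of fixed points of the perturbed system and the fraction of
    states from which only fixed points with Apoptosis ON can be reached.
    """
    k = NODES.index(node)
    free = [i for i in range(6) if i != k]
    states = [s for s in product([False, True], repeat=6) if s[k] == value]
    fixed = {s for s in states if all(rules(s)[i] == s[i] for i in free)}

    def successors(s):
        nxt = rules(s)
        return {s[:i] + (nxt[i],) + s[i + 1:] for i in free}

    def reachable_fixed(start):
        seen, stack = {start}, [start]
        while stack:
            for t in successors(stack.pop()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen & fixed

    normal_only = 0
    for s in states:
        targets = reachable_fixed(s)
        if targets and all(f[5] for f in targets):
            normal_only += 1
    return fixed, normal_only / len(states)

fixed_fas, frac_fas = perturb("Fas", True)      # over-expression of Fas
fixed_s1p, frac_s1p = perturb("S1P", False)     # knockout of S1P
print(len(fixed_fas), frac_fas)
print(len(fixed_s1p), frac_s1p)
```

With Fas clamped at ON, both a normal-like and a T-LGL-like fixed point remain, and 23 of the 32 states (about 72%) can reach only the former; with S1P clamped at OFF, the only remaining fixed point has Apoptosis ON.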
This suggests that although both fixed points are still reachable, the normal fixed point is more likely to be reached. This analysis revealed that the last approach leads to more detailed results than the first two approaches.

Next we turned our attention to the effects of node disruptions on the whole network to make biologically testable predictions about the occurrence of the disease state under different conditions. To this end, we followed the third approach delineated above. More precisely, for each node disruption, we fixed the state of that node in the opposite of its stabilized state in the T-LGL fixed point given in Table 3.2 (i.e., we knocked out the nodes that stabilize in the ON state in the T-LGL fixed point and over-expressed the ones that stabilize in the OFF state) and considered all possible initial states for the remaining nodes (except for the six source nodes). Of the 60 nodes of the network, six are source nodes, three are output nodes, and two (CTLA4 and TCR) have oscillatory behavior in the T-LGL attractor. For each of the remaining nodes, we fixed the state of that node in the opposite of its T-LGL state, initiated the six source nodes as in the unperturbed case, and identified the stabilized nodes using the first step of our reduction method (see Section 2.2.1). We then simplified the network of non-stabilized nodes according to the second step of our reduction method (see Section 2.2.1) and obtained all possible fixed points by solving the corresponding set of Boolean equations. For some cases we needed to construct the full state transition graphs because of the possibility of oscillation (e.g., when the two oscillatory nodes, CTLA4 and TCR, were connected to other nodes in the simplified network and there was a possibility of propagating the oscillation to other nodes in the T-LGL state). We found that in the case of perturbation of TBET, PI3K, NFκB, JAK, or SOCS, five additional nodes of the network connected to CTLA4 and TCR, namely LCK, FYN, Cytoskeleton signaling, ZAP70, and GRB2, oscillate as well. Also, for the knockout of FYN, only two of these additional nodes, i.e., LCK and ZAP70, oscillate. In addition, in the case of perturbation of TBET, JAK, SOCS, or IL2, the node IL2RA shows oscillatory behavior in the T-LGL state.
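The third perturbation approach can be illustrated on a toy example. The following is a minimal Python sketch, not the T-LGL model itself: the three-node network (S, D, Apo), its Boolean rules, and all function names are hypothetical and were chosen only so that the unperturbed system has one normal-like and one disease-like fixed point. The sketch fixes one node, enumerates all initial states of the remaining nodes under general asynchronous update, and reports the fixed points together with the fraction of states from which only one fixed point is reachable (its exclusive basin). It assumes that all attractors are fixed points, which holds for this toy network.

```python
from itertools import product

# Hypothetical toy network (NOT the T-LGL network): S = survival signal,
# D = death signal, Apo = apoptosis. The rules are assumptions for illustration.
NODES = ("S", "D", "Apo")
RULES = {
    "S": lambda s: (not s["D"]) and (not s["Apo"]),
    "D": lambda s: (not s["S"]) and (not s["Apo"]),
    "Apo": lambda s: s["D"] or s["Apo"],  # self-loop keeps Apo ON once reached
}


def successors(state, fixed):
    """States reached by updating exactly one non-fixed node that changes."""
    values = dict(zip(NODES, state))
    result = set()
    for i, node in enumerate(NODES):
        if node in fixed:
            continue  # perturbed nodes are never updated
        new_value = int(RULES[node](values))
        if new_value != state[i]:
            result.add(state[:i] + (new_value,) + state[i + 1:])
    return result


def analyze(fixed=None):
    """Return (fixed_points, exclusive_basin_fractions) for a perturbation."""
    fixed = fixed or {}
    states = [
        st for st in product((0, 1), repeat=len(NODES))
        if all(st[NODES.index(n)] == v for n, v in fixed.items())
    ]
    graph = {st: successors(st, fixed) for st in states}
    fixed_points = [st for st in states if not graph[st]]

    def reachable_fixed_points(start):
        # Depth-first search over the asynchronous state transition graph.
        seen, stack = {start}, [start]
        while stack:
            for nxt in graph[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return {st for st in seen if not graph[st]}

    reach = {st: reachable_fixed_points(st) for st in states}
    exclusive = {
        fp: sum(reach[st] == {fp} for st in states) / len(states)
        for fp in fixed_points
    }
    return fixed_points, exclusive


if __name__ == "__main__":
    print("unperturbed:", analyze())
    print("S knocked out:", analyze({"S": 0}))
    print("D knocked out:", analyze({"D": 0}))
```

In this toy network, knocking out S leaves only the normal-like fixed point, whereas knocking out D keeps both fixed points and only changes the sizes of their exclusive basins, mirroring the two kinds of outcome distinguished in the text.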
In general, two types of fixed points were observed: the normal fixed point, with Apoptosis being ON and all other nodes being OFF, and similar-to-T-LGL fixed points, with Apoptosis being OFF and the state of some nodes being different from the wild-type T-LGL fixed point due to the disruption imposed on the network. We still consider these latter fixed points as the T-LGL fixed point. A summary of the node disruption results, including the fixed point(s) obtained after the disruption as well as the size of the exclusive basin of attraction of the normal fixed point in the respective reduced model, is given in the fourth and fifth columns of Table 3.2. Our results indicate that disruption of any of the first 15 nodes in Table 3.2 leads to the disappearance of the T-LGL fixed point (i.e., of the disease state). These nodes are thus predicted candidate therapeutic targets. For example, our results suggest that knockout of STAT3 or over-expression of Ceramide in deregulated CTLs restores their activation-induced cell death. We found for the knockout of either FasT or FasL that the normal fixed point and the 50% of the state transition graph that includes the ON state of Apoptosis are separated from the rest of the state space and are thus not accessible from the biologically relevant initial conditions. Therefore, the T-LGL fixed point is the only biologically relevant outcome in this case. For this reason, the size of the basin of attraction of the normal fixed point was indicated as 0% in Table 3.2. Notably, these nodes can serve as candidates for engineering of long-lived T cells, which are necessary for the delivery of virus and cancer vaccines. The remaining node disruptions still retain both disease and normal fixed points.

There is corroborating literature evidence for several of the therapeutic targets predicted by our analysis. For example, it was found experimentally that STAT3 knockdown by using siRNA or down-regulation of MCL1 through inhibiting STAT3 induces apoptosis in leukemic T-LGL [93]. Furthermore, in vitro Ceramide treatment induces apoptosis in leukemic T-LGL [89]. It was also found that treatment with IL2 and TCR stimulation facilitates Fas-mediated apoptosis via induction of DISC formation [88]. In addition, SPHK1 inhibition by using chemical inhibitors significantly induces apoptosis in leukemic T-LGL [17]. These experimental results validate that perturbation of these nodes results in the normal fixed point as mentioned in Table 3.2. Moreover, it was reported in [87] that inhibition of RAS through introducing a dominant negative form of RAS, or inhibition of MEK or ERK through chemical inhibitors, induces apoptosis in leukemic NK-LGL, which indirectly supports our results on these three nodes.
For the cases where both fixed points are still reachable, our analysis of the relative size of the basins of attraction (i.e., percentage of the whole relevant state space) of the fixed points and the probabilities of reaching the fixed points indicated that in most of these cases the trends are similar to the wild-type model, e.g., the size of the exclusive basin of attraction of the normal fixed point is 56%, the same as that for the unperturbed system. In a few cases, however, including JAK, PI3K, or NFκB knockout as well as SOCS over-expression, the exclusive basin of attraction of the normal fixed point increased significantly (to 75% or more). Thus, these nodes can also be considered as potential therapeutic targets. Interestingly, for three cases, namely JAK, PI3K, and NFκB, experimental data also suggest that the balance between the incidence of the two fixed points is shifted in the manipulated system compared to the original one. For example, inhibition of JAK [93], PI3K [91], or NFκB [17] through chemical inhibitors induces apoptosis in leukemic T-LGL. In summary, our analysis leads to the novel predictions that Caspase, GAP, BID, or SOCS over-expression as well as RAS, MEK, ERK, IL2RBT, or IL2RB knockout can lead to apoptosis of T-LGL cells.

Discussion and conclusion

Here we presented a comprehensive analysis of the T-LGL survival signaling network to unravel the unknown facets of this disease. By using a reduction technique, we first identified the fixed points of the system, namely the normal and T-LGL fixed points, which represent the healthy and disease states, respectively. This analysis identified the T-LGL states of 54 components of the network, out of which 36 (67%) are corroborated by previous experimental evidence and the rest are novel predictions. These new predictions include RAS, PLCG1, IAP, TNF, NFAT, GRB2, FYN, SMAD, P27, and Cytoskeleton signaling, which are predicted to stabilize at ON in T-LGL leukemia, and GAP, SOCS, TRADD, ZAP70, and CREB, which are predicted to stabilize at OFF. In addition, we found that the node P2 can stabilize in either the ON or OFF state, whereas two nodes, TCR and CTLA4, oscillate. These predicted T-LGL states provide valuable guidance for targeted experimental follow-up studies of T-LGL leukemia.
Among the predicted states, the ON state of Cytoskeleton signaling may not be biologically relevant, as this node represents the ability of T cells to attach and move, which is expected to be reduced in leukemic T-LGL compared to normal T cells. This discrepancy may be due to the fact that the network contains insufficient detail regarding the regulation of the cytoskeleton, as there is only one node, FYN, upstream of Cytoskeleton signaling in the network. While the network is able to successfully capture survival signaling without necessarily capturing the cytoskeleton signaling, this discrepancy suggests that follow-up experimental studies should be conducted to determine the relationship between cytoskeleton signaling and survival signaling in the T-LGL network. We note that in the case of perturbation of TBET, PI3K, NFκB, JAK, or SOCS, the node Cytoskeleton signaling exhibits oscillatory behavior induced by oscillations in TCR. At present it is not known whether this predicted behavior is relevant.

Using the general asynchronous (GA) Boolean dynamic approach, we analyzed the basins of attraction of the fixed points. We found that the basin of attraction of the normal fixed point is larger than that of the T-LGL fixed point. The trajectories starting from each initial state toward the T-LGL fixed point (Figure 3.4) may be indicative of the accumulating deregulations that lead to the disease-associated stable survival state. Although the fixed points, being time independent, are the same for all update methods or implementations of time, the update method may affect the structure of the state transition graph of the system and the basins of attraction of the fixed points. We note that the GA method assumes that each node has an equal chance of being updated. If quantitative or kinetic information becomes available in this system, unequal probabilities may be implemented by grouping the nodes into several priority classes and assigning a weight to each class, where higher weights indicate more probable transitions [65]. Incorporating such information into the state space may prune the allowed trajectories and give further insights into the accumulation of deregulations.

We took one step further by performing a dynamic perturbation analysis to identify the interventions leading to the disappearance of the disease fixed point. We note that our study has a dramatically larger scope than the previous key mediator analysis of Zhang et al. [17].
For the dynamical analysis, we employed the GA approach instead of the random order asynchronous method and considered all possible initial conditions as opposed to performing numerical simulations using a specific initial condition. Zhang et al. only focused on the node Apoptosis, and identified as key mediators the nodes whose altered state increases the frequency of the ON state of Apoptosis. An increase in the Apoptosis ON state does not necessarily imply that apoptosis is the only possible final outcome of the system. In this work, after finding the fixed points, which completely describe the state of the whole system, we performed dynamic perturbation analysis by fixing the state of each node to its opposite state in the T-LGL fixed point and determining which fixed points were obtained and what their basins of attraction were. This way we were able to identify and distinguish the key mediators whose altered state completely eliminates the leukemic outcome, and those whose altered state reduces the basin of attraction of the leukemic outcome. Moreover, numerical simulations, as done in [17], may not be able to thoroughly sample different timings. In this study, using a reduction technique, we found the cases when timing does not matter with certainty (where there is only one fixed point), and also the cases in which timing and initial conditions may matter (where there are two reachable fixed points). Our analysis led to the identification of 19 therapeutic targets (the first 19 nodes in the first column of Table 3.2), 53% of which are supported by direct experimental evidence and 15% of which are supported by indirect evidence.

Multi-stability (having multiple steady states) is an intrinsic dynamic property of many disease networks [100,101], which is related to the presence of feedback loops (see Section 2.1) in the network. It was conjectured that the presence of positive feedback loops in the network is necessary for multi-stability, whereas the existence of negative feedback loops is required for having sustained oscillations [52]. From a biological point of view, the former dynamical property is associated with multiple cell types after differentiation, while the latter is related to stable periodic behaviors such as circadian rhythms [102]. We note that the T-LGL signaling network consists of both positive and negative feedbacks and thus has a potential for both multi-stability and oscillations. Indeed, the negative feedback in the top sub-graph of Figure 3.2(a) causes the complex attractor shown in Figure 3.3.
In contrast, the negative feedback on the node P2 of the bottom sub-graph is counteracted by the positive self-loop on the same node; thus no complex attractor is possible for the bottom sub-graph of Figure 3.2(a). The two mutual inhibition-type positive feedback loops present in the bottom sub-graph and the self-loop on P2 generate the three fixed points, while the positive self-loop on Apoptosis maintains the normal fixed point once Apoptosis is turned ON.

Negative feedback loops can be a source of oscillations [103], homeostasis [103], or excitation-adaptation behavior [104]. In particular, when the activation is slower than the inhibitory interaction in the negative feedback, it can lead to sustained oscillations [103]. In the T-LGL network, the negative feedback loop between the T cell receptor TCR and CTLA4 modulates stimulus-induced activation of the receptor in such a way that CTLA4 is indirectly activated after prolonged TCR activation, whereas the inhibition of TCR by CTLA4 is a direct interaction [105]. That is, activation is slower than inhibition in the negative feedback, and thus an oscillatory behavior reminiscent of that obtained by our asynchronous Boolean model would also be observed in continuous modeling frameworks. Although no time measurements of the T cell receptor activity in T-LGL exist, it has been reported that there is variability for TCR activation in different patients ([84] and unpublished observation by our experimental collaborator, Prof. Thomas P. Loughran), supporting the absence of a steady state behavior.

Our study revealed that Boolean dynamic modeling is a powerful approach for identifying therapeutic targets of a disease. This approach yields a comprehensive picture of the state transition graph, including all possible fixed points of the system, their corresponding basins of attraction, as well as the relative frequency of trajectories leading to each fixed point. We demonstrated that the limitations related to the vast state space of large networks can be overcome by judicious use of the network reduction technique that we developed in our previous study [40]. Overall, the analysis presented in this study opens a promising avenue to predict dysregulated components and identify potential therapeutic targets, and it is versatile enough to be successfully applied to a large variety of signal transduction and regulatory networks related to diseases.
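The link between feedback sign and attractor type discussed above can be checked on two-node caricatures. The sketch below is an illustration under stated assumptions, not the rules of the T-LGL model: the two loops, the variable names, and the helper functions are hypothetical. It enumerates fixed points by solving x_i = f_i(x) directly and follows single-node (asynchronous) updates of a negative feedback loop in which one node activates the other and is inhibited by it, as in the TCR and CTLA4 pair.

```python
from itertools import product


def fixed_points(rules):
    """Solve x_i = f_i(x) by enumeration; independent of the update scheme."""
    return [
        s for s in product((0, 1), repeat=len(rules))
        if all(int(rule(s)) == s[i] for i, rule in enumerate(rules))
    ]


def async_successors(state, rules):
    """States reached by updating a single node whose value changes."""
    result = set()
    for i, rule in enumerate(rules):
        new_value = int(rule(state))
        if new_value != state[i]:
            result.add(state[:i] + (new_value,) + state[i + 1:])
    return result


# Assumed two-node caricatures; state = (node0, node1).
# Negative loop: node0 is inhibited by node1, node1 is activated by node0.
negative_loop = [lambda s: not s[1], lambda s: s[0]]
# Positive loop made of two inhibitions (mutual inhibition).
mutual_inhibition = [lambda s: not s[1], lambda s: not s[0]]

if __name__ == "__main__":
    print(fixed_points(negative_loop))      # no state satisfies both rules
    print(fixed_points(mutual_inhibition))  # two fixed points
    state = (0, 0)
    for _ in range(4):  # each state of the negative loop has one successor
        (state,) = async_successors(state, negative_loop)
        print(state)
```

The negative loop has no fixed point and its four states form a single cycle, whereas the mutual-inhibition loop is bistable; this matches the roles the text assigns to negative and positive feedback.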

Piecewise Linear Differential Equation (Hybrid) Models of Biological Regulatory Networks

4.1. Introduction

The class of piecewise linear differential equation (hybrid) models, which bridges the gap between discrete and continuous models, melds the logical description of the regulatory relationships with a linear concentration decay. These models were originally proposed by L. Glass [11] to provide a coarse-grained description of gene regulatory networks, and since then their properties have been studied in the literature [106,107,108,109,110,111,112,113]. In these models, each node $v_i$ is characterized by two variables: a continuous variable, $\hat{x}_i$, which denotes the concentration of that component, and a discrete variable, $x_i$, that accounts for its activity. At each time instant $t$, the discrete variable is defined by the continuous variable according to a threshold parameter $\theta_i$ as follows:

\[
x_i(t) = \begin{cases} 1, & \hat{x}_i(t) > \theta_i \\ 0, & \hat{x}_i(t) < \theta_i \end{cases} \tag{4.1}
\]

The time evolution of the continuous variable is then described by the following piecewise linear differential equation:

\[
\frac{d\hat{x}_i}{dt} = \alpha_i f_i(x_{i1}, \ldots, x_{im_i}) - \beta_i \hat{x}_i, \tag{4.2}
\]

where $f_i$ is the Boolean function for node $v_i$ with $m_i$ regulators, and $\alpha_i$ and $\beta_i$ are synthesis and decay parameters. It is usually assumed that. An alternative is to scale the continuous variable by i i in (4.2) and thus to consider. We note that in a more general form, one can consider $p_i$ threshold parameters for a node if it regulates $p_i$ downstream nodes.

In order to analyze the dynamics of a given system under the hybrid framework, the state space of the system is partitioned into different domains bounded by threshold hyperplanes. The domains where no variable takes a threshold value are called regulatory domains, and the ones where at least one variable has a threshold value (i.e., the threshold hyperplanes and their intersections) are referred to as switching domains [106]. On the regulatory domains, differential equation (4.2) has a unique solution, which can be obtained analytically. However, this equation is not defined on the switching domains. The Filippov approach [114] was adapted to define the solutions on the switching domains by extending the piecewise linear differential equation into a differential inclusion [112].

In the following, we first describe a hybrid model of the interaction network between mammalian host immune components and respiratory bacteria. This model has many parameters, and numerous correlations among parameter values are identified. My contribution in this project was to provide an interpretation of the parameter correlations through performing theoretical analysis of a toy network.
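As a worked illustration of the analytic solution mentioned above, consider a regulatory domain in which the Boolean function of node $v_i$ takes a constant value $f_i \in \{0, 1\}$. Writing $\alpha_i$ and $\beta_i$ for the synthesis and decay parameters (these symbol names are an assumption made for this sketch), equation (4.2) is a linear first-order equation whose solution, for as long as the trajectory stays inside that domain, would read:

```latex
\hat{x}_i(t) = \frac{\alpha_i}{\beta_i} f_i
  + \left( \hat{x}_i(t_0) - \frac{\alpha_i}{\beta_i} f_i \right) e^{-\beta_i (t - t_0)},
  \qquad t \ge t_0 .
```

Under these assumptions the trajectory relaxes exponentially toward the value $\alpha_i f_i / \beta_i$, and the expression stops applying once any variable crosses its threshold and the system enters a different domain.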
Next, we present a hybrid model of a simplified version of the T-LGL leukemia signaling network with the aim of discovering the relation between the state trajectories in the hybrid and asynchronous Boolean models and determining which properties of the hybrid models may not be captured by the asynchronous Boolean models.

A hybrid model of the pathogen-immune system interactions

This section has been previously published in modified form in the Journal of the Royal Society Interface [115]. Dr. Juilee Thakar served as a collaborator in this project. Subsection is a brief summary of her work. Figures 4.1 and 4.2 were generated by her as well.

Network modeling

The dynamic interplay between bacteria and host immune system can lead to recovery, persistent disease, or death of the host [69]. Network modeling can aid in understanding the regulation of immune responses, which is a complex system of mechanisms, by integrating the behavior of various components into a coherent representation. In this work, we study the network model of interactions between a mammalian immune system and Bordetella bronchiseptica bacteria (see Figure 4.1) during its infection of the lower respiratory system, specifically the lungs. The nodes of the network represent bacteria and the components of the immune system, and the edges denote interactions and processes. Similar to any other regulatory network, there are two types of edges: activation and inhibition, represented by an arrowhead or a blunt segment at the end of the edge, respectively. The edges directed towards another edge represent the regulatory relationships that modulate either a process or an unspecified mediator of a process. The network was first assembled by Thakar et al. [69] using experimental literature and further refined in [115] by adding several new nodes and edges to the network as well as identifying two different compartments. Compartment I in Figure 4.1 represents the site of infection, i.e., the lungs, and compartment II corresponds to the sites of T0 cell and B cell activation, e.g., the lymph nodes.

An asynchronous Boolean framework was employed to model the original network reconstructed in [69], which reproduced the basic features of infection such as clearance or persistence of bacteria. In addition, that study revealed that the decay and/or desensitization of immune components is crucial for their dynamics [69]. To further parameterize the model and to obtain more quantitative agreement with the available experimental data, we employed a piecewise linear differential equation (hybrid) method.
The 34 equations modeling the time evolution of the continuous variables in the system are given in Table C.1 (see Appendix C). Except for a few components that were modeled by a continuous equation (such as bacteria), all components have a threshold and a decay parameter. For simplicity, the synthesis parameters were considered to equal one. Since signaling molecules called cytokines are expected to be produced faster than other components in the system, their equations were augmented by a scaling factor as well. The available time course data for two components of the system, the cytokines IL10 and IFNγ, and bacterial numbers were used to constrain the ranges of parameter values, which are listed in Table 4.1.

Figure 4.1. Network model of immunological steps and processes activated upon invasion by B. bronchiseptica. Network nodes denote bacteria and components of the immune system, and edges represent interactions and processes. An edge with an arrowhead or a blunt segment at the end represents activation or inhibition, respectively. The gray box separates the interactions taking place in compartment II (the site of T and B cell activation) from compartment I (the site of infection). Nodes that can appear in both compartments appear as two nodes distinguished by extension I or II to their names [115].

Table 4.1. Parameters in the hybrid model of the pathogen-immune system interaction network and their ranges [115].

Parameter | Range
Decay parameter |
Activity threshold |
Scaling factor for cytokines | 1 to 5
n (Hill coefficient) | 1 ≤ n ≤ 5
H (Hill constant) | 0.01 ≤ H ≤ 1
r, rc (random variables) | 0 ≤ rc − r and rc + r ≤ 1
x_max = 1/(decay parameter) (maximum concentration) | 1 ≤ x_max ≤ 20

The model qualitatively reproduces the experimental growth curves of wild-type (WT) and mutant bacteria (defective in type III secretion system) that were previously observed in [116] (see [115] for details). For example, there is a faster decrease in the bacterial numbers in the mutant bacteria, and the wild-type bacteria are cleared more slowly than the mutant bacteria. This indicates that the dynamic model accurately represents the host's immune response.

Parameter analysis

In total, the model contains 34 equations and 75 parameters. The values of the parameters were selected uniformly at random from the ranges given in Table 4.1. The constraints obtained from the experimental data were used to select sets of successful parameter values. Although such sets were rare and found approximately once in 50,000 simulations, we sampled the parameter space until we found 30 successful sets. The selected parameter sets outline the biologically acceptable parameter space.

Performing a correlation analysis on the successful parameter sets revealed a significant degree of interdependence between parameter values. For example, by analyzing the correlations of the and parameters between different nodes we found 42 (10%) significant - correlations and 35 (8%) significant - correlations. Furthermore, we identified 79 (9%) significant correlations between of one node and of another.

To better understand the effect of parameters and parameter correlations on the activity of the nodes, we constructed a simple illustrative example consisting of two nodes, a source node A and a target node B. We separately considered the cases when A activates or inhibits B. We described the dynamics of the nodes by the hybrid formalism. We hypothesized that the parameter correlations are meant to optimize the effectiveness of the regulatory activity of A. Node A is described by and node B is given by (when A activates B) or (when A inhibits B) in the illustration.
The initial condition of A is, while in case of activation and (i.e., the maximal value) in case of inhibition. The values of the discrete variables of A and B are determined by comparing their continuous variables to the thresholds $\theta_A$ and $\theta_B$, respectively. The output variable is the activity of node B, defined as the integral of its discrete variable over a duration T,, which is equal to the time interval during which. By solving the above differential equations, we find that the discrete variable of node B switches on in case of activation at time and in case of inhibition it switches off at time. Since is increasing, there is a single switch in B. According to our hypothesis we aim to find the parameter correlations that maximize the activity of B in the activation interaction and minimize the activity of B in the inhibition interaction. This is equivalent to minimizing both (so B switches on the earliest possible) and (so B switches off the earliest possible).

The effect of parameter correlations on the activity of B is illustrated in Figure 4.2. In an activating interaction, an increase in the threshold $\theta_A$ induces an increase of $t_B$, but $t_B$ can be reduced again if the threshold $\theta_B$ decreases. Thus pairing an increase in $\theta_A$ with a decrease of $\theta_B$ (i.e., a negative $\theta_A$ - $\theta_B$ correlation) may have a stabilizing effect on the activity of the target node. Varying a single parameter while keeping the other three constant leads to a monotonic increase or decrease in the activity of node B, indicating that high or low activity would only be possible in the low or high limits of the biological parameter range. This scenario is not supported by the relatively equal distribution of parameters. In contrast, we find that correlated variation of two parameters, while keeping the other two constant, can lead to an extreme (maximum or minimum) activity at intermediate parameter values, and to a consistently high activity (in case of activation) or low activity (in case of inhibition) over a significant parameter range. We analytically solve representative cases when A - B, A - B, A - B, B - A, A - A and B - B are positively or negatively correlated.
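The dependence of the switching time of B on the two thresholds can be sketched numerically. The following Python example rests on assumptions that are not taken from the text, because the toy-model equations are not legible here: both continuous variables start at zero, both synthesis parameters equal one, A is produced from time zero, and B is produced only while the discrete variable of A is ON. The function names, parameter names, and the threshold value 0.1 used in the third case are hypothetical.

```python
import math


def switch_on_time(theta, decay=1.0, start=0.0):
    """Time at which x' = 1 - decay * x, with x(start) = 0, reaches theta."""
    if decay * theta >= 1.0:
        raise ValueError("threshold is never reached: theta >= 1/decay")
    # x(t) = (1 - exp(-decay * (t - start))) / decay, solved for x(t) = theta
    return start - math.log(1.0 - decay * theta) / decay


def t_b_activation(theta_a, theta_b, decay_a=1.0, decay_b=1.0):
    """Switch-on time of B when A activates B (assumed toy model)."""
    t_a = switch_on_time(theta_a, decay_a)        # A turns ON at t_a
    return switch_on_time(theta_b, decay_b, t_a)  # B accumulates from t_a on


if __name__ == "__main__":
    # Unit decay rates, as in the caption of Figure 4.2.
    print("theta_A = theta_B = 0.2:    ", t_b_activation(0.2, 0.2))
    print("theta_A = 0.4, theta_B = 0.2:", t_b_activation(0.4, 0.2))
    # 0.1 is a hypothetical value chosen only to be smaller than 0.2.
    print("theta_A = 0.4, theta_B = 0.1:", t_b_activation(0.4, 0.1))
```

Under these assumptions, raising the threshold of A delays the switch of B, and lowering the threshold of B partly compensates for that delay, which is the qualitative effect of a negative correlation between the two thresholds described above.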
As an example of positive correlation between A and B (or A and B ), we assumed B = A (or B = A ). As an example of negative correlation we assumed B = 0.5 A (or B = 1 A ), so that the whole range of parameters is scanned. In the case of positive correlation of A and B, B and A, A and A, or B and B, we supposed that A = 2 B, B = 2 A, A = 2 A, or B = 2 B, respectively. Finally, for the negative correlation of these parameters we assumed A = 1 2 B, B =1 2 A, A =1 2 A, or B =1 2 B.

Figure 4.2. Illustration of the effect of parameter correlations in an example where node A activates node B. Both nodes are described by the hybrid framework and characterized by variable activation thresholds and decay rates equal to unity. The discrete variable of node A (B) is turned on at time point $t_A$ ($t_B$) when the continuous variable associated with A (B) becomes greater than the activation threshold $\theta_A$ ($\theta_B$). The figure shows the continuous variables associated with A and B and indicates the value of $t_B$ for three combinations of $\theta_A$ and $\theta_B$: (a) $\theta_A = \theta_B = 0.2$, (b) $\theta_A = 0.4$ and $\theta_B = 0.2$, (c) $\theta_A = 0.4$ and a smaller $\theta_B$. $t_B$ increases in (b) compared to (a) because of the increase in $\theta_A$. However, the increase of $t_B$ can be curtailed if the increase of $\theta_A$ is coupled with a decrease of $\theta_B$ in (c) [115].

Four types of correlations, namely negative A - B and A - B correlations and positive A - B and A - B correlations, maximize the activity of the target node in an activating interaction. Similarly, four types of correlations, namely positive A - B and A - B correlations and negative A - B and A - B correlations, minimize the activity in an inhibitory interaction. We assessed if the minima of and for these conditions are attained in the correct parameter range by plugging in the real values of the parameters in our 30 successful parameter combinations. For each observed correlation and for each of the 30 parameter combinations, we calculated the value of the relevant expressions
