Modeling of Topological Effects in Biological Networks

Internship report

Modeling of Topological Effects in Biological Networks

Vincent Picard
École Normale Supérieure de Cachan (Rennes)
Department of Computer Science and Telecommunications
Magistère of Computer Science and Telecommunications
University of Rennes 1, Rennes, France

Supervised by
Professor Olli Yli-Harja, Head of Computational Systems Biology Research Group, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
and
Juha Kesseli, Computational Systems Biology Research Group, Department of Signal Processing, Tampere University of Technology, Tampere, Finland

Internship location: Tampere University of Technology, Tampere, Finland
Internship dates: from 1st June 2010 to 31st August 2010
Oral examination date: 10th September 2010


Abstract

Boolean networks have been introduced as a model of gene regulatory networks. Using the so-called annealed approximation, the dynamics of Boolean networks with random topology can be analyzed. However, biological networks are not expected to have random topology. Instead, they have been shown to contain small local structures, called motifs, that occur more often than expected by chance. In this work, we use partially annealed Boolean networks to model networks with local structures. Using finite Markov chain theory, we prove the convergence of the model to a unique stationary distribution that can be computed easily. We also study the robustness of networks with local structures by extending the definition of the sensitivity of the classical annealed approximation. The theoretical results and simulations show that local topological structures can have an important stabilizing effect on the dynamics of the system.

Keywords: Boolean, network, annealing, steady state, perturbation

Vincent Picard (vincentpicard@ens-cachanfr)

I Introduction

The study of biological networks is a major concern of computational biology. Genetic regulatory networks act as biochemical computers that dynamically change the behavior of cells in response to their environment by controlling the level of expression of each gene. Boolean networks were introduced by Stuart A. Kauffman as a mathematical framework for the study of gene regulatory networks. Of particular interest is the study of perturbations. In stable networks, small perturbations tend to die out, whereas in chaotic networks they tend to spread over the whole network. Recent work has given evidence that biological networks may be at the edge of chaos.

Using what is called the annealed approximation, the dynamical behavior of Boolean networks with random local topology can be analyzed. However, biological networks are not expected to have random topology. Indeed, they have been shown to contain small local structures, called motifs, that occur more often than expected by chance. In this work, we propose to use a partially annealed approximation to analyze the dynamics of Boolean networks with local topological structures.

In section II, we provide basic knowledge about Boolean networks and the annealed approximation. In section III, we apply finite Markov chain theory to analyze the steady state distributions of partially annealed Boolean networks. We study the propagation of perturbations in networks with local structures in section IV. We propose an extension of the network sensitivity, used to characterize the stability of annealed networks, to study the effect of local structures on stability.

II Boolean networks

Boolean networks have been introduced by Kauffman [1, 2, 3, 4] to model genetic regulatory networks. In living organisms, genes have different expression levels from one cell to another. Experiments have given ample justification for modeling the activity of a given gene using a binary value representing whether the gene is expressed or not [5]. An expressed gene leads to the production of proteins, which can act to regulate other genes.

A Boolean network is a directed graph where nodes represent genes and edges correspond to biochemical interactions between genes. For instance, two edges from A to C and from B to C mean that A and B jointly act on C. Nodes are given Boolean values $x_1, \dots, x_N$ that represent gene states, and Boolean functions $f_1, \dots, f_N$ that determine the nature of the genetic interactions. The value of gene $x_i$ at time $t + 1$ is determined (cf. Fig. 1) by the values of the genes $x_{j_1}, \dots, x_{j_{k_i}}$ that act on it at time $t$ by means of the following equation:

$x_i(t + 1) = f_i(x_{j_1}(t), \dots, x_{j_{k_i}}(t))$ (1)

[Figure 1: Modeling of gene interactions]

In this report, we assume that the updates are synchronous, i.e. genes update their states simultaneously. The state of the network at time $t$ is the vector of the states of all its nodes. Hence, the state of the network at time $t + 1$ is determined by equation (1).

Understanding the dynamical behavior of Boolean networks is a major topic of research [6]. Of particular interest is the study of attractors, which are sets of states in which the system eventually ends up. Another major concern is the propagation of perturbations. In

ordered networks, a small perturbation tends to die out, whereas in chaotic networks it tends to spread over the entire network. At the border between order and chaos lies criticality, in which the average size of small perturbations is constant in time. Recent work has given evidence that real genetic regulatory networks may have critical dynamics [7].

II.1 Ensemble approach

Studying a particular Boolean network is very difficult. For instance, it is impossible in practice to enumerate the $2^N$ possible states of a Boolean network of $N$ nodes when $N$ is large. Also, very often, interactions between genes are not completely known or the actors of a biological mechanism are not properly identified. For instance, epigenetic regulation is known to exist but its mechanism is unknown. Hence, it is a difficult task to establish an accurate Boolean network to model a given biological mechanism. Instead, we can study the general properties of a class of Boolean networks and consider our particular network to be an instance of this class. This is called the ensemble approach [8].

Kauffman's nets are an important example of the ensemble approach. Kauffman introduced so-called NK Boolean networks, a class of random Boolean networks of $N$ nodes [1, 2, 3]. Each node has $K$ inputs that are chosen randomly among the $N$ nodes. In addition, the Boolean functions are chosen randomly according to a given random function distribution with bias $\rho$.

II.2 Annealed Boolean networks

The annealed approximation has been used to study the dynamical behavior of an ensemble of Boolean networks [9]. It consists in choosing randomly the inputs of each node at every time step (cf. Fig. 2). In doing so, we assume networks to have random topology and we remove all effects of local topological structures. Annealed Boolean networks allow the study of the dynamics of large random networks with no local structure [10].

[Figure 2: An annealed Boolean network: the inputs are chosen randomly at each time step (columns show times $t$, $t + 1$, $t + 2$)]

Symmetries, due to the annealed approximation, make all nodes equivalent. Hence, the proportion $b$ of nodes with value 1 is sufficient to describe the state of the system. Of particular interest is the study of the bias map [11]. This is the mapping $b(t + 1) = g(b(t))$ that gives the proportion of nodes with value 1 at time $t + 1$ given the proportion of nodes with value 1 at time $t$. The bias map describes the dynamics of the system and especially the stationary points and their stability.

As an example, a bias map of an annealed Boolean network (simulated with Matlab) can be seen in Fig. 3. From this figure, it is easy to see the existence of a unique stationary distribution $b^*$ corresponding to the unique fixed point and satisfying the equation $b^* = g(b^*)$. This distribution is stable since $|g'(b^*)| < 1$.

[Figure 3: An example of bias map (horizontal axis: $b(t)$; vertical axis: $b(t+1)$; curves: bias map and identity map)]
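The fixed-point analysis of the bias map can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the function distribution (an equal mixture of two-input NAND and OR functions) is not the one used for Fig. 3, and the helper names `bias_map`, `fixed_point` and `slope` are hypothetical. For this mixture the bias map is $g(b) = 1/2 + b - b^2$, so the fixed point is $b^* = \sqrt{1/2}$ and $g'(b^*) = 1 - 2b^*$, whose absolute value is below 1.

```python
from itertools import product


def bias_map(b, functions):
    """Return g(b) for a weighted mixture of Boolean functions.

    `functions` is a list of (weight, K, f) triples, where f maps a tuple of
    K bits to 0 or 1. Inputs are assumed independent with P[input = 1] = b,
    as in the annealed approximation.
    """
    total = 0.0
    for weight, k, f in functions:
        for x in product((0, 1), repeat=k):
            ones = sum(x)
            p_x = b ** ones * (1.0 - b) ** (k - ones)
            total += weight * f(x) * p_x
    return total


def fixed_point(functions, b0=0.3, steps=200):
    """Iterate b <- g(b) from b0; this converges only to a stable fixed point."""
    b = b0
    for _ in range(steps):
        b = bias_map(b, functions)
    return b


def slope(b, functions, h=1e-6):
    """Central finite-difference estimate of g'(b)."""
    return (bias_map(b + h, functions) - bias_map(b - h, functions)) / (2 * h)


# Illustrative (assumed) distribution: half NAND, half OR, both with K = 2.
MIXTURE = [
    (0.5, 2, lambda x: 1 - (x[0] & x[1])),  # NAND
    (0.5, 2, lambda x: x[0] | x[1]),        # OR
]

b_star = fixed_point(MIXTURE)
g_prime = slope(b_star, MIXTURE)
print(b_star, g_prime)
```

Plain iteration is used here because it mirrors the dynamics $b(t + 1) = g(b(t))$ directly; an unstable fixed point would require a root finder instead.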

[Figure 4: Feedforward loops on nodes X, Y and Z: (a) coherent type 1, (b) incoherent type 1. The two arrow styles denote activating and inhibiting effects.]

II.3 Topological effects

Alon observed that biological networks contain some small local structures, called motifs, that occur more often than expected by chance [12]. Motifs have been shown to have important biological functions such as introducing delays or detecting signal persistence [12]. Motifs can be considered as building blocks for complex regulatory networks. Fig. 4 presents two examples of classic feedforward loop motifs.

Motifs have an important effect on the dynamics of the system. We simulated in Matlab a random Boolean network of 1200 nodes that contains 200 coherent type 1 feedforward loops (C1-FFL). The model of the C1-FFL is shown in Fig. 6. Then, we randomly changed the inputs of each node. We did this five times to obtain five different networks (Networks 1 to 5) which have exactly the same Boolean functions as the original network but random local topology. Fig. 5 presents the average proportion of nodes with value one at time $t + 1$, $t + 2$ and $t + 3$ for a given proportion of nodes with value one at time $t$. Because the values of nodes are randomly distributed at time $t$ and all networks have the same functions, we do not see any difference after one time step. However, after two or three time steps, we observe that motifs have important effects on the dynamics of the system.

[Figure 5: Proportion of nodes with value 1 after 3 steps: (a) $b(t+1)$, (b) $b(t+2)$, (c) $b(t+3)$, each plotted against $b(t)$ for the original network, Networks 1 to 5 and the identity map. The original network contains C1-FFL loops. Networks 1 to 5 have random local topology.]

We observed that the dynamical behavior of networks with local structures is qualitatively different from that of networks with random topology. However, the classical annealed approximation does not take into account the local structures and assumes a random topology. This limits the relevance of many results, since biological networks are not expected to have random local topology.

II.4 Dealing with local structures

Previous work has been done to deal with the effect of local structures. A major concern is to compute the correlation between the values of nodes that belong to the same structure. Harju et al. use a partially annealed approach to model networks with local structures [13]. However, the results only apply to tree-like local structures with no loops. Kesseli et al. slightly modify the classical annealed Boolean network model by adding parameters that introduce correlation between the inputs of each node [14]. The model allows simple computation of the bias map. The results show that correlation can have drastic effects on the dynamics of the system. However, adjusting the parameters to model a Boolean network with a given proportion of motifs is a major issue.

[Figure 6: The example case Boolean network modeling a C1-FFL motif: inputs $X_1$ and $X_2$, with X computed by OR, Y by ID and Z by AND]

Table 1: Transition matrix of the C1-FFL motif ($u = p_{00}$ and $\bar{u} = 1 - u$, written ū in the table). Rows give the current state, columns the next state.

       000  001  010  011  100  101  110  111
000     u    0    0    0    ū    0    0    0
001     u    0    0    0    ū    0    0    0
010     u    0    0    0    ū    0    0    0
011     u    0    0    0    ū    0    0    0
100     0    0    u    0    0    0    ū    0
101     0    0    u    0    0    0    ū    0
110     0    0    0    u    0    0    0    ū
111     0    0    0    u    0    0    0    ū

III Steady state analysis

In this section we address the problem of determining the steady state of random networks containing local structures. We have shown in the previous section that the classical approach, which consists in computing the fixed points of the bias map, fails to predict the steady state when local topological structures exist. In our approach, we apply finite Markov chain theory [15] to a partially annealed Boolean network model. In this model, the dynamical behavior of a motif is represented by a transition matrix. The eigenvectors of the transition matrix allow us to compute the joint steady state distribution of the nodes belonging to the motif. The reader should refer to Appendix A for all technical details, definitions, theorems and proofs. Here, we only give the basic ideas of the theory and deal with the example of the C1-FFL motif.

III.1 Example of the C1-FFL motif

Let us consider the example of the C1-FFL motif. We model the C1-FFL motif using the Boolean network
presented in Fig. 6. The motif is composed of three random variables X, Y and Z which can take value zero or one at time $t$. The updates are synchronous. The inputs of the motif at time $t$ are represented by a random vector $(X_1(t), X_2(t))$.

III.1.1 Isolated motif

First, we study the behavior of this motif without considering any network context. We choose random inputs for the motif according to a distribution $P = (p_{00}, p_{01}, p_{10}, p_{11})$ and we focus on the time evolution of the state of the motif $\sigma(t) = (X, Y, Z)(t)$. We call $\Sigma = \{000, 001, 010, 011, 100, 101, 110, 111\}$ the set of all possible states. Assuming that $(X_1, X_2)(t)$ and $\sigma(t)$ are independent, $(\sigma(t))_{t \ge 0}$ is a finite Markov chain. We call $q_{ijk}(t) = P[\sigma(t) = (i, j, k)]$ the probability that the motif is in state $(i, j, k)$ at time $t$, and $Q(t) = (q_{000}, q_{001}, q_{010}, q_{011}, q_{100}, q_{101}, q_{110}, q_{111})(t)$.

The dynamics of a finite Markov chain is completely determined by its transition matrix. Indeed, the dynamics of the motif is determined by the equation

$Q(t + 1) = Q(t) M$, (2)

where $M$ is the transition matrix. The transition matrix $M = (M_{\sigma_a \sigma_b})$ gives the probability of reaching the state $\sigma_b$ at time $t + 1$ given that the motif is in state $\sigma_a$ at time $t$. These conditional probabilities are determined by the input distribution and the motif. Table 1 represents the transition matrix of the C1-FFL motif.

Stationary distribution

In our example, the characteristic polynomial of the transition matrix is

$\chi = X^7 (X - 1)$, so 1 is an eigenvalue of the matrix and the dimension of the associated eigenspace is 1. An eigenvector associated with the eigenvalue 1 is, up to a scalar factor and after simplification with $u + \bar{u} = 1$,

$V = \left(1,\ 0,\ \bar{u},\ \frac{\bar{u}^2}{u},\ \frac{\bar{u}}{u},\ 0,\ \frac{\bar{u}^2}{u},\ \frac{\bar{u}^3}{u^2}\right)$ (3)

By normalizing this vector, we get the unique stationary distribution of the Markov chain

$Q^*(u) = \frac{V}{\|V\|_1}$ (4)

$Q^*(u) = \left(u^3 + u^2\bar{u},\ 0,\ u^2\bar{u},\ u\bar{u}^2,\ u\bar{u}^2 + u^2\bar{u},\ 0,\ u\bar{u}^2,\ \bar{u}^3\right)$ (5)

The distribution $Q^*$ is the unique distribution that satisfies the equation $Q^* = Q^* M$.

Convergence toward a steady state

By iterating equation (2) we can compute the distribution $Q(t)$ given the initial distribution $Q(0)$:

$Q(t) = Q(0) M^t$ (6)

The Cayley-Hamilton theorem tells us that $M^8 = M^7$, so we have $Q(t) = Q(0) M^7$ for all $t \ge 7$. Hence, whatever the initial distribution $Q(0)$ is, $Q(t)$ converges toward a steady state distribution. Because the limiting distribution is also a stationary distribution, the only possible limiting distribution is $Q^*$.

By a simple application of finite Markov chain theory, we have shown that it is possible to derive analytical equations for the unique steady state joint distribution. This steady state distribution gives the probability that a motif is in a given state as $t \to +\infty$.

III.1.2 Partially annealed model

Until now, we have considered the steady state of a single isolated motif determined by the input distribution.

[Figure 7: Simplified model of partial annealed network: each motif input $I_1$, $I_2$ takes the value of a background node (bias $\rho$) with probability $1 - \alpha$, or the value of motif node X, Y or Z with probability $\alpha/3$ each]

Now we consider the case of a network that contains a certain proportion of motifs. We model this network using partial annealing. We consider that the network contains $N$ background nodes that do not belong to motifs and $M$ C1-FFL motifs. Partial annealing consists in choosing randomly the inputs of the background nodes and the inputs of the motifs at each time step. The linkage inside motifs is not modified. In doing so, we take into account the effect of the local structures. The Boolean functions of the background nodes are chosen from a background random
function distribution $F$ in which functions have $K$ inputs and a bias $\rho$. The bias $\rho$ means that for all input vectors $x$ and all $f \in F$, $P[f(x) = 1] = \rho$. We call $m$ the number of nodes in one motif and $\alpha = \frac{mM}{N + mM}$ the proportion of nodes that belong to a motif.

In this more complicated context, the inputs of the motifs are determined by the states of the other motifs and the states of the background nodes. We address this issue by assuming that the motifs are independent. This approximation allows us to extend the model of an isolated motif to compute the steady state of the partially annealed network. We consider the inputs to be independent, each taking the value of a background node with probability $1 - \alpha$ and the value of one of the motif nodes with probability $\alpha$ (cf. Fig. 7). We also approximate the behavior of the background nodes by assuming that their value is 1 with probability $\rho$ at each time step.

Let us go back to the example of the C1-FFL motif. Using the simplified model of a partially annealed network, the probability $u_i$ that the inputs have value 00 given that the current motif state contains $i$ nodes with

value 1 is

$u_i = \left((1 - \alpha)(1 - \rho) + \alpha \, \frac{3 - i}{3}\right)^2$ (7)

Hence, this slightly changes the transition matrix of the Markov chain. The new transition matrix is given in Table 2.

Table 2: New transition matrix of the C1-FFL motif ($u_i$ as in equation (7) and $\bar{u}_i = 1 - u_i$, written u0 to u3 and ū0 to ū3 in the table). Rows give the current state, columns the next state.

       000  001  010  011  100  101  110  111
000     u0   0    0    0    ū0   0    0    0
001     u1   0    0    0    ū1   0    0    0
010     u1   0    0    0    ū1   0    0    0
011     u2   0    0    0    ū2   0    0    0
100     0    0    u1   0    0    0    ū1   0
101     0    0    u2   0    0    0    ū2   0
110     0    0    0    u2   0    0    0    ū2
111     0    0    0    u3   0    0    0    ū3

It is now possible to do the same calculations as before to derive analytical equations for the steady state distribution. We have proved under certain assumptions that the system converges toward a unique stationary distribution using finite Markov chain theory arguments (cf. section III.2 and Appendix A).

III.2 General case

Here we discuss the assumptions on motifs that made the previous calculations possible. In the previous section, we applied finite Markov chain theory to compute steady state distributions. Finite Markov chains always have a stationary distribution; however, they may have infinitely many. Moreover, even when there exists a unique stationary distribution, the convergence of all initial distributions to the stationary distribution does not always occur. However, under some assumptions on the state space structure of the motif, the results hold.

III.2.1 State space structure

The state space structure of a motif is a directed graph where nodes represent the states of the motif and edges correspond to the possibility of an update from one state to another. For instance, an edge $\sigma_a \to \sigma_b$ means that there exists an input vector for which the motif updates its value from state $\sigma_a$ to state $\sigma_b$. Fig. 8 presents the state space structure of the C1-FFL motif.

[Figure 8: State space structure of the C1-FFL motif. Colors represent communicating classes. Red states form the unique recurrent communicating class.]

III.2.2 Uniqueness

In finite Markov chain theory, the strongly connected components of the
state space structure are called communicating classes. A communicating class is said to be recurrent if it is impossible to leave the communicating class. If the motif contains a unique recurrent communicating class, then there exists a unique stationary distribution for both the isolated motif model and the partially annealed Boolean network model. Otherwise, there are infinitely many stationary distributions.

III.2.3 Convergence

When there exists a unique stationary distribution, the convergence of any initial distribution depends on the period of the unique recurrent communicating class. The period of a communicating class is by definition the greatest common divisor of all the cycle lengths inside the communicating class. If the period is 1, the communicating class is said to be aperiodic. A recurrent aperiodic communicating class is said to be ergodic. If there is a unique ergodic communicating class, then we have both uniqueness of the stationary distribution and convergence toward it.

Most of the time, motifs have a unique recurrent communicating class that contains a state with a self-loop, so that the communicating class is aperiodic. Hence, our application of Markov chain theory is possible for almost all motifs.

III.3 Results

We simulated both a partially annealed Boolean network and a quenched Boolean network. By quenched network we mean the same network in which no annealing is done. Both networks have 2400 nodes and contain 400 C1-FFL motifs. We computed the motif state distribution of the simulated networks after 10000 updates. We compared these results with the analytical equations given by the Markov chain theory. We first computed the empirical proportion of nodes with value 1 in the steady state and used the result to derive the stationary distribution given by the study of the isolated motif. Then, we calculated the steady state distribution given by our simplified model of a partially annealed network with $\alpha = 1/2$. The results are presented in Fig. 9.

[Figure 9: Steady state joint distribution for the example case of C1-FFL motif. Values of $q_{000}$ to $q_{111}$ for four series: simulation of the quenched model, simulation of the partial annealed model, theory of the partial annealed model, theory of the isolated motif model.]

We see that the isolated motif theory fails to give an accurate steady state distribution. This is mainly due to the fact that the model does not take into account the proportion of motifs in the network. However, it can still be used as a first order approximation. On the other hand, the analytical equations of our simplified model of partially annealed Boolean networks fit well with the experimental results. Small differences, observed between the analytical equations and the simulated partially annealed network, are mainly due to the low number of motifs. Also, we notice a small difference between the analytical equations and the simulated quenched network. This is mainly due to a finite network size effect. Indeed, annealed networks approximate quenched networks when the network size tends to infinity.

IV Propagation of perturbations

In this section, we address the problem of measuring the
robustness of Boolean networks with local structures. We still consider a partially annealed Boolean network model since we already have a theoretical framework to compute the steady state distribution. Measuring robustness is usually done by studying the propagation of small perturbations in the network. In ordered networks, small perturbations tend to die out. In chaotic networks, perturbations propagate to the entire network. Critical networks are at the edge of chaos: small perturbations do not die out but their average size remains constant in time. There is evidence that life is at the edge of chaos, so it is important to understand the effect of local structures on the stability of the network. In this work we show that C1-FFL motifs increase the stability of the networks.

IV.1 Order parameter of annealed Boolean networks

In classic annealed Boolean networks, propagation of perturbations is measured by the average sensitivity $\lambda$ of the function distribution $F$. The average sensitivity, also called the order parameter, is the average number of perturbed nodes at time $t + 1$ when we flip the value of a randomly chosen node at time $t$. If $\lambda < 1$ the network is stable. If $\lambda = 1$ the network is critical. If $\lambda > 1$ the network is chaotic.

Here we present the properties of Boolean functions that allow us to compute $\lambda$. We call $K_f$ the number of inputs of function $f$ and $B = \{0, 1\}$. The influence of the $i$th input of a function $f$ is the probability that the output of the function is perturbed when we perturb the $i$th input:

$I_i = \sum_{x \in B^{K_f}} [f(x) \oplus f(x \oplus e_i)] \, P(x)$, (8)

where $\oplus$ is the bitwise exclusive OR operator, $e_i$ the unit vector defined by $(e_i)_k = 1$ if $k = i$ and $(e_i)_k = 0$ if $k \neq i$, and $P(x)$ is the probability that the input $x$ occurs. In a Boolean network, this corresponds to the probability that a given edge propagates a perturbation

(cf. Fig. 10).

[Figure 10: Influences of the input edges]

The average influence of a function $f$ is

$I = \frac{1}{K_f} \sum_{i=1}^{K_f} \sum_{x \in B^{K_f}} [f(x) \oplus f(x \oplus e_i)] \, P(x)$, (9)

and the average influence of a function distribution $F$ is

$I = \mathbb{E}_{f \in F} \left[ \frac{1}{K_f} \sum_{i=1}^{K_f} \sum_{x \in B^{K_f}} [f(x) \oplus f(x \oplus e_i)] \, P(x) \right]$ (10)

The sensitivity $\lambda$ of a Boolean function $f$ is the sum of all the influences of its inputs

$\lambda = \sum_{i=1}^{K_f} \sum_{x \in B^{K_f}} [f(x) \oplus f(x \oplus e_i)] \, P(x)$, (11)

and the average sensitivity of a Boolean function distribution is

$\lambda = \mathbb{E}_{f \in F} \left[ \sum_{i=1}^{K_f} \sum_{x \in B^{K_f}} [f(x) \oplus f(x \oplus e_i)] \, P(x) \right]$ (12)

For instance, it has been shown that the average sensitivity of a random function distribution with $K$ inputs and bias $\rho$ is $\lambda = 2K\rho(1 - \rho)$. So, for a random function distribution with bias $\rho = 1/2$, networks with $K > 2$ are chaotic, networks with $K = 2$ are critical and networks with $K < 2$ are stable.

Most of the time, we want to study the propagation of perturbations when the network has already reached the steady state. We call $I^*$ and $\lambda^*$ the influences and sensitivities when they are computed in the steady state and with randomly chosen input nodes. For example, in the partially annealed case $P(x)$ is given by

$P(x) = \prod_{j=1}^{K_f} u^{x_j} (1 - u)^{1 - x_j}$, (13)

where $u$ is the proportion of nodes with value 1 in the steady state

$u = (1 - \alpha)\rho + \frac{\alpha}{m} \sum_{i=1}^{m} \sum_{\sigma \in \Sigma,\ \sigma_i = 1} q^*_\sigma$ (14)

IV.2 Dependency trees

We have shown in section II.3 that local structures have important effects on the dynamical behavior of the networks. This effect is only visible after several time steps, because the data processing of local structures needs multiple time steps to affect the dynamics. For instance, in the example of a C1-FFL motif, we observe that the state of the motif at time $t$ is totally determined by the input values at time $t - 1$, $t - 2$ and $t - 3$. Hence, we could say that the C1-FFL motif has a "response time" of 3. However, the average sensitivity $\lambda$ does not take into account this multiple time step process. Indeed, it only indicates whether a perturbation at time $t$ survives at time $t + 1$. In order to take into account the
correlation effect of local structures, we need to study the survival of a perturbation over multiple time steps.

In an annealed Boolean network, we define the dependency tree of depth $k$ of a given node. It is a tree of depth $k$ in which the given node is the root. The children of each node are the inputs selected by the partial annealing process at the previous time step. Fig. 11 shows an example of such a tree. Sometimes, this structure is not a tree but a directed acyclic graph, since it is possible that a particular node is selected twice during the annealing process. However, annealed networks are an approximation of the dynamical behavior of large networks. Hence, the probability that a node is selected twice can be neglected.

A dependency tree can be considered as a Boolean function. We define the sensitivity $\lambda$ of a dependency tree as the sensitivity of the resulting Boolean function. The influence of the $i$th leaf of the tree can be computed by multiplying all the influences of the edges from the leaf to the root. We can define $\lambda$ by induction. If the depth of the tree is 1, then $\lambda$ is the sensitivity of the root function. If the depth of the tree is greater than 1, then we can compute by induction the sensitivities $\lambda_1, \dots, \lambda_p$ of the root children, and define the sensitivity as $I_1 \lambda_1 + \dots + I_p \lambda_p$, where $I_i$ is the influence of the $i$th input of the root function. For instance, the sensitivity of the tree represented in Fig. 12 is

$\lambda = I_1 I_3 + I_1 I_4 + I_1 I_5 + I_2 I_6 + I_2 I_7$ (15)

In this work, we consider that $I = I^*$, the influence computed in the steady state with randomly chosen input nodes, for the calculation of tree sensitivities. However, to be more accurate, local bias effects should be taken into consideration, especially when tree nodes belong to motifs. To take into account the local bias effect, we should compute $P(x)$

by considering that the input nodes are not necessarily randomly chosen.

[Figure 12: Considering dependency trees as Boolean functions: a root function $f$ with input influences $I_1$, $I_2$, whose children $f_1$ and $f_2$ have input influences $I_3$, $I_4$, $I_5$ and $I_6$, $I_7$ over the leaves $x_1, \dots, x_5$]

IV.3 Order parameters of partially annealed networks

As shown in Fig. 11, the presence of the local structures constrains the dependency trees. When no local structures exist, the probability of a given dependency tree depends only on the background function distribution $F$. When local structures exist, the dependency tree distribution is changed depending on the proportion of motifs $\alpha$. We call $T(k, \alpha)$ the distribution of dependency trees of depth $k$. We define the $k$-sensitivity of the partially annealed Boolean network as

$\lambda^{(k)}(\alpha) = \mathbb{E}_{t \in T(k, \alpha)} [\lambda(t)]$ (16)

[Figure 11: An example of a dependency tree of depth 3 in a partial annealed network with C1-FFL motifs, shown over times $t - 3$, $t - 2$, $t - 1$ and $t$. Black arrows represent the edges that belong to a C1-FFL motif. Red arrows represent one possible dependency tree of the second node Y.]

$\lambda^{(k)}$ is similar to the average sensitivity $\lambda$ of classic annealed Boolean networks but takes $k$ time steps into account. The 1-sensitivity $\lambda^{(1)}$ is the average sensitivity of the dependency trees of depth 1. So, it is also the average sensitivity of the function distribution $F(\alpha)$ resulting from the mixture of motifs and background nodes:

$\lambda^{(1)}(\alpha) = \lambda(F(\alpha))$ (17)

When $\lambda^{(1)} < 1$ the Boolean function distribution of the network is stable. In the classic annealed Boolean network approach, dependency trees have no constraints and $\lambda^{(k)} = \lambda^k$. When local structures exist in a partially annealed approximation, $\lambda^{(k)} \neq \lambda^k$, so the classic order parameter is not sufficient to characterize the network stability over multiple time steps.
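The inductive definition of the sensitivity of a dependency tree can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree encoding (a list of `(influence, subtree)` pairs), the helper names `tree_sensitivity` and `uniform_tree`, and the numerical influence values are all hypothetical, and the sketch does not sample trees from the distribution $T(k, \alpha)$.

```python
def tree_sensitivity(tree):
    """Sensitivity of a dependency tree, defined by induction on the depth.

    A tree is a list of (influence, subtree) pairs, one per input edge of the
    root function. A subtree equal to None marks a leaf.
    """
    total = 0.0
    for influence, subtree in tree:
        if subtree is None:
            # Depth 1: the sensitivity is the sum of the root influences.
            total += influence
        else:
            # Depth > 1: I_1 * lambda_1 + ... + I_p * lambda_p.
            total += influence * tree_sensitivity(subtree)
    return total


def uniform_tree(k_inputs, influence, depth):
    """Unconstrained tree: every node has k_inputs edges of equal influence."""
    if depth == 1:
        return [(influence, None)] * k_inputs
    child = uniform_tree(k_inputs, influence, depth - 1)
    return [(influence, child)] * k_inputs


# Tree with the shape of Fig. 12, with all influences set to 0.5 (assumed).
FIG12_TREE = [
    (0.5, [(0.5, None), (0.5, None), (0.5, None)]),  # I1 * (I3 + I4 + I5)
    (0.5, [(0.5, None), (0.5, None)]),               # I2 * (I6 + I7)
]

fig12_lambda = tree_sensitivity(FIG12_TREE)

# Without constraints, a depth-k tree has sensitivity lambda ** k.
lambda_1 = tree_sensitivity(uniform_tree(4, 0.5, 1))
lambda_3 = tree_sensitivity(uniform_tree(4, 0.5, 3))
print(fig12_lambda, lambda_1, lambda_3)
```

With $K = 4$ and every influence equal to 0.5, the depth-1 sensitivity is 2 and the depth-3 sensitivity is $2^3 = 8$, which matches the relation $\lambda^{(k)} = \lambda^k$ for unconstrained trees. Averaging `tree_sensitivity` over trees constrained by motifs is what makes $\lambda^{(k)}$ deviate from that relation.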

[Figure 13: Order parameters $\lambda^{(k)}(\alpha)$ ($k = 1, \dots, 4$) of a partial annealed network. Curves: Lambda 1 to Lambda 4.]

[Figure 14: Effect of topology on perturbation size. Horizontal axis: alpha; vertical axis: ratio of perturbed nodes; curves: with local structures and without local structures.]

IV.4 Results

We wrote a small OCaml program to generate a population of 10000 dependency trees according to the tree distribution $T(k, \alpha)$ ($k = 1, \dots, 4$) and computed the sensitivities $\lambda^{(1)}$, $\lambda^{(2)}$, $\lambda^{(3)}$, $\lambda^{(4)}$. The partially annealed network consists of a mixture of background nodes with a random function distribution ($K = 4$, $\rho = 0.5$) and C1-FFL motifs. The results are presented in Fig. 13.

The results show that C1-FFL motifs have a stabilizing effect. The first effect is due to the changes in the function distribution. When $0.7 < \alpha$, $\lambda^{(1)} < 1$, so the function distribution alone makes the network stable. When $0.55 < \alpha < 0.7$, we have $\lambda^{(1)} > 1$, so the function distribution should make the network chaotic. However, because $\lambda^{(2)}, \lambda^{(3)}, \lambda^{(4)} < 1$, the network remains stable. This increased area of stability is not only due to the changes in the function distribution but is a consequence of the local topology of the network. We also notice that the extended area of stability between $\lambda^{(3)} = 1$ and $\lambda^{(4)} = 1$ is small. This confirms the idea that the effect of the C1-FFL motif is mainly visible within 3 time steps.

To confirm the existence of this increased area of stability due to local topology, we simulated the propagation of perturbations with Matlab. We consider two annealed Boolean networks. Network 1 is a partially annealed network with a proportion $\alpha$ of motifs. Network 2 is an annealed network with the same function distribution as Network 1. At a given time we perturb a randomly chosen node in both networks, and measure the average number of perturbed nodes after a long execution. The results in Fig. 14 confirm the increased stability area due to local topology.

V Conclusions

This work is divided
into two parts. In the first part, we propose a solid mathematical framework based on partial annealing and Markov chains to study the dynamics of Boolean networks with local structures. We are proud to propose a simple and efficient model in which the convergence to a steady state can be theoretically proved and the parameters of the stationary distribution can be easily computed using simple linear algebra methods. The model succeeds in predicting the joint stationary distribution of simulated partially annealed networks. It also gives an approximation of the stationary distribution of quenched Boolean networks with local structures.

The second part of the work concerns the propagation of perturbations in networks with local structures. We provide evidence that local topological effects need multiple time steps to affect the dynamics. Hence, we extend the definition of sensitivity of classical annealed networks to take this long-time effect into account. The theoretical results predict an increased stability of networks with C1-FFL motifs, and simulations confirmed these results.

A major concern in modeling is to find a fair compromise between accuracy and simplicity. Accurate models provide good predictions but are more difficult to build and to analyze. On the other hand, simple models are easy to use but fail to predict subtle effects of the system. By using a partially annealed approximation, we believe we added just enough complexity to the classical annealed approximation to model the effects of local structures. Our model remains simple enough to be analyzed and its predictions are confirmed by simulations.

We believe this work can be easily extended to take into account mixtures of background nodes and different types of motifs. Taking into account the proportion of different kinds of motifs is of great interest if we want to model realistic biological networks. Further work will also be needed to understand and analyze the effect of the proportions of different motifs on the dependency tree distribution. This would improve our understanding of the effect of local structures on the stability of networks.

Acknowledgements

I would like to thank everyone in the CSB research group for having been so welcoming toward me. In particular, I am grateful to Professor Olli Yli-Harja for having accepted this internship. This work would not have been possible without the invaluable guidance and support of Juha Kesseli. I am very grateful to the many people of the Symbiose team who helped me find my internship, and I would especially like to show my gratitude to Jérémie Bourdon, who made his support available in a number of ways. Last but not least, I owe my deepest gratitude to my family and my friends.

References

[1] Stuart A. Kauffman. Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology, 22(3):437-467, 1969.
[2] Stuart A. Kauffman. Homeostasis and differentiation in random genetic control networks. Nature, 224:177-178, 1969.
[3] Stuart A. Kauffman. The large scale structure and dynamics of gene control circuits: An ensemble approach. Journal of Theoretical Biology, 44(1):167-190, 1974.
[4] Stuart A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, USA, 1st edition, June 1993.
[5] Stefan Bornholdt. Boolean network models of cellular regulation: prospects and limitations. Journal of the Royal Society Interface, 5(supp1):S85-S94, 2008.
[6] Maximino Aldana, Susan Coppersmith, and Leo P. Kadanoff. Boolean dynamics with random couplings. In Springer Applied Mathematical Sciences Series. Springer, New York, 2003.
[7] Ilya Shmulevich, Stuart A. Kauffman, and Maximino Aldana. Eukaryotic cells are dynamically ordered or critical but not chaotic. Proceedings of the National Academy of Sciences of the United States of America, 102(38):13439-13444, September 2005.
[8] Stuart A. Kauffman. The ensemble approach to understand genetic regulatory networks. Physica A, 340:733-740, 2004.
[9] Yves Pomeau and Bernard Derrida. Random networks of automata: a simple annealed approximation. Europhysics Letters, 1:45-49, 1986.
[10] Juha Kesseli, Pauli Rämö, and Olli Yli-Harja. Iterated maps for annealed Boolean networks. Phys. Rev. E, 74(4):046104, Oct. 2006.
[11] Pauli Rämö, Juha Kesseli, and Olli Yli-Harja. Stability of functions in gene regulatory networks. Chaos, 15:034101, 2005.
[12] Uri Alon. Network motifs: theory and experimental approaches. Nature Reviews Genetics, 8:450-461, 2007.
[13] Manu Harju, Juha Kesseli, and Olli Yli-Harja. Partial annealing and local structures in Boolean networks. In Workshop on Computational Systems Biology, Leipzig, Germany, 2008.
[14] Juha Kesseli and Olli Yli-Harja. Introducing correlations into mean-field Boolean network models. In Workshop on Computational Systems Biology, Luxembourg, 2010.
[15] John G. Kemeny and J. Laurie Snell. Finite Markov Chains. Springer-Verlag, July 1976.

Appendix

A General theory of Boolean motifs

In this section we give a formal definition of motifs and study their long-time behavior. Of particular interest is the structure of the state space, since it determines the existence and uniqueness of a stationary distribution and the convergence to this stationary distribution. Under weak conditions, motifs have a unique steady-state distribution, which can be derived from the eigenvectors of a matrix depending on the motif.

Definition 1 A Boolean motif is a 5-tuple $(N, I, a, (f_X)_{X \in N}, (d_X)_{X \in N})$ where:
- $N$ is a finite set of nodes, also called outputs,
- $I$ is a finite set of inputs,
- $a : N \to \mathbb{N}$ is the arity function, which gives the number of inputs of each node,
- $(f_X)_{X \in N}$ are the node update functions: $f_X : \mathbb{B}^{a(X)} \to \mathbb{B}$,
- $(d_X)_{X \in N}$ are the node domains: $d_X \in (N \cup I)^{a(X)}$.

Definition 2 A motif state (also called state) is a function $\sigma : N \to \mathbb{B}$. An input state (also called input) is a function $\lambda : I \to \mathbb{B}$. We denote by $\Sigma$ the finite set of all motif states and by $\Lambda$ the finite set of all input states.

Definition 3 We define the operator that gives the Boolean values of a given vector of nodes or inputs according to the given node state and input state:
$$(\lambda, \sigma)(X_1, \dots, X_k) = (b_1, \dots, b_k), \quad \text{where } b_i = \begin{cases} \lambda(X_i) & \text{if } X_i \in I, \\ \sigma(X_i) & \text{if } X_i \in N. \end{cases}$$

When a motif is in a given state with a given input state, it updates its current state to a unique state.

Definition 4 The update function of a motif is the function $F : \Lambda \times \Sigma \to \Sigma$ where
$$\forall X \in N, \quad F(\lambda, \sigma)(X) = f_X\big((\lambda, \sigma)(d_X)\big). \tag{18}$$

A.1 Classification of states

When a motif is in a given state, it can update its state to a limited number of other states. This defines a structure over the space of states which can be represented by a directed graph. Here we give definitions to describe this structure. The vocabulary is similar to the classification of states of a Markov chain, since we will show that the two notions are strongly connected.

Definition 5
- A state $\sigma_b$ is immediately accessible from a state $\sigma_a$ if $\exists \lambda \in \Lambda$, $F(\lambda, \sigma_a) = \sigma_b$. We write $\sigma_a \to \sigma_b$.
- A state $\sigma_b$ is accessible from a state $\sigma_a$ in $k$ steps if there are states $\sigma_0, \dots, \sigma_k$ such that $\sigma_0 = \sigma_a$, $\sigma_k = \sigma_b$ and $\sigma_0 \to \cdots \to \sigma_k$. We write $\sigma_a \xrightarrow{k} \sigma_b$.

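- A state $\sigma_b$ is accessible from a state $\sigma_a$ if there exists $k \ge 0$ such that $\sigma_a \xrightarrow{k} \sigma_b$. We write $\sigma_a \rightsquigarrow \sigma_b$.
- A state $\sigma_b$ communicates with a state $\sigma_a$ if $\sigma_a \rightsquigarrow \sigma_b$ and $\sigma_b \rightsquigarrow \sigma_a$. We write $\sigma_a \leftrightarrow \sigma_b$.

Definition 6 The relation $\leftrightarrow$ is an equivalence relation on $\Sigma$. The equivalence classes of this relation are called communicating classes. A motif with a single communicating class is said to be irreducible.

A.1.1 Recurrence and transience

Definition 7 A state $\sigma_a$ is transient if there exists a state $\sigma_b$ such that $\sigma_a \rightsquigarrow \sigma_b$ and $\sigma_b \not\rightsquigarrow \sigma_a$. A state that is not transient is called recurrent.

Proposition 1 If $\sigma_a \leftrightarrow \sigma_b$ then $\sigma_a$ and $\sigma_b$ are both transient or both recurrent. A class is transient when all its states are transient, and recurrent when all its states are recurrent.

Proof If $\sigma_b$ is transient, then there exists $\sigma_c$ such that $\sigma_b \rightsquigarrow \sigma_c$ and $\sigma_c \not\rightsquigarrow \sigma_b$. Therefore, we have $\sigma_a \rightsquigarrow \sigma_c$ by transitivity of $\rightsquigarrow$. Having $\sigma_c \rightsquigarrow \sigma_a$ is impossible, because we would have $\sigma_c \rightsquigarrow \sigma_a \rightsquigarrow \sigma_b$, so $\sigma_a$ is transient. Using the symmetry of $\leftrightarrow$, we show in the same way that if $\sigma_a$ is transient then $\sigma_b$ is transient.

A.1.2 Periodicity

Definition 8 The period of a state $\sigma$, written $\operatorname{per}(\sigma)$, is
$$\operatorname{per}(\sigma) = \gcd\{L > 0 \mid \sigma \xrightarrow{L} \sigma\} \tag{19}$$
with the convention $\operatorname{per}(\sigma) = +\infty$ when there is no $L > 0$ such that $\sigma \xrightarrow{L} \sigma$. When $\operatorname{per}(\sigma) = 1$, the state is said to be aperiodic, otherwise it is periodic.

Proposition 2 If $\sigma_a \leftrightarrow \sigma_b$ then $\operatorname{per}(\sigma_a) = \operatorname{per}(\sigma_b)$, so we can define the period of a communicating class.

Proof If $\operatorname{per}(\sigma_a) = +\infty$ then it is impossible to have $L > 0$ such that $\sigma_b \xrightarrow{L} \sigma_b$, because we would have $\sigma_a \rightsquigarrow \sigma_b \xrightarrow{L} \sigma_b \rightsquigarrow \sigma_a$. So $\operatorname{per}(\sigma_b) = +\infty$. By symmetry, $\operatorname{per}(\sigma_b) = +\infty$ also implies $\operatorname{per}(\sigma_a) = +\infty$.

Otherwise, there exist $a$ and $b$ such that $\sigma_a \xrightarrow{a} \sigma_b$ and $\sigma_b \xrightarrow{b} \sigma_a$. Because $\sigma_a \xrightarrow{a+b} \sigma_a$, we have $\operatorname{per}(\sigma_a) \mid (a + b)$. For all $L > 0$ such that $\sigma_b \xrightarrow{L} \sigma_b$, we have $\sigma_a \xrightarrow{a} \sigma_b \xrightarrow{L} \sigma_b \xrightarrow{b} \sigma_a$, so $\operatorname{per}(\sigma_a) \mid (a + L + b)$. Hence $\operatorname{per}(\sigma_a) \mid L$ for all $L > 0$ such that $\sigma_b \xrightarrow{L} \sigma_b$, so $\operatorname{per}(\sigma_a) \mid \operatorname{per}(\sigma_b)$. By symmetry, we also have $\operatorname{per}(\sigma_b) \mid \operatorname{per}(\sigma_a)$. Hence $\operatorname{per}(\sigma_a) = \operatorname{per}(\sigma_b)$.

Definition 9 A communicating class is said to be ergodic if it is aperiodic and recurrent. In the irreducible case, the motif is said to be ergodic.

Ergodicity is the key property for uniqueness of, and convergence to, a steady-state distribution.

A.1.3 Attractor states

Definition 10 A communicating class $C$ is said to have an attractor state $\sigma_\lambda$ for the input $\lambda$ if, for all $\sigma_0 \in C$, the sequence defined by $\sigma_{t+1} = F(\lambda, \sigma_t)$ satisfies the property
$$\exists t_0, \ \forall t \ge t_0, \quad \sigma_t = \sigma_\lambda.$$

Proposition 3 A communicating class that contains an attractor state is aperiodic.

Proof By definition, we have $\sigma_\lambda \to \sigma_\lambda$, so $\operatorname{per}(\sigma_\lambda) = 1$, and by Proposition 2 the whole class has period 1.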
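The classification of states described in Section A.1 can be checked mechanically on a small example. The following Python sketch is only an illustration under stated assumptions: the two-node, two-input motif and its rule `update` are hypothetical and are not taken from this report. It enumerates the immediate-accessibility relation and derives communicating classes, recurrence and periods from it.

```python
from itertools import product
from math import gcd

# Hypothetical toy motif (an assumption for illustration, not a motif from the report):
# nodes (x, y), inputs (u, v), update x' = u OR v, y' = x AND y.
STATES = list(product((0, 1), repeat=2))
INPUTS = list(product((0, 1), repeat=2))


def update(lam, sigma):
    """Motif update function F(lambda, sigma)."""
    u, v = lam
    x, y = sigma
    return (u | v, x & y)


def successors(sigma):
    """States immediately accessible from sigma (for at least one input state)."""
    return {update(lam, sigma) for lam in INPUTS}


def reachable(sigma):
    """States accessible from sigma in k >= 0 steps."""
    seen, stack = {sigma}, [sigma]
    while stack:
        for nxt in successors(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


REACH = {s: reachable(s) for s in STATES}


def communicating_class(sigma):
    """States accessible from sigma and from which sigma is accessible."""
    return frozenset(t for t in REACH[sigma] if sigma in REACH[t])


def is_recurrent(sigma):
    """Recurrent: sigma is accessible from every state accessible from sigma."""
    return all(sigma in REACH[t] for t in REACH[sigma])


def period(sigma):
    """gcd of the lengths L > 0 of walks from sigma back to sigma (None if none exist).

    In a graph with n states, walk lengths up to 3n are enough to determine this gcd.
    """
    g, frontier = 0, {sigma}
    for length in range(1, 3 * len(STATES) + 1):
        frontier = {t for s in frontier for t in successors(s)}
        if sigma in frontier:
            g = gcd(g, length)
    return g or None


for s in STATES:
    print(s, sorted(communicating_class(s)), is_recurrent(s), period(s))
```

On this toy motif, the states (0, 0) and (1, 0) form the only recurrent class, while (0, 1) and (1, 1) are transient. `period` returns `None` where the text uses the convention of an infinite period.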

A.2 Long-time behavior for constant input distribution

From now on, we focus on the long-time behavior of a single motif. We add probabilities to the model by defining an initial distribution on the states and show that the succession of states, when the input distribution is constant, is a finite Markov chain. Under some assumptions, the distribution converges to a unique limiting distribution that can be easily computed. For further information about finite Markov chains, the reader may refer to [15].

Definition 11 An input state distribution is a function $\nu : \Lambda \to [0, 1]$ such that $\sum_{\lambda \in \Lambda} \nu(\lambda) = 1$. It is said to be positive if $\forall \lambda \in \Lambda$, $\nu(\lambda) > 0$. A motif state distribution is a function $\mu : \Sigma \to [0, 1]$ such that $\sum_{\sigma \in \Sigma} \mu(\sigma) = 1$. It is said to be positive if $\forall \sigma \in \Sigma$, $\mu(\sigma) > 0$.

Lemma 1 If $(X(t))_{t \ge 0}$ and $(Y(t))_{t \ge 0}$ are sequences of random variables with $X(t) \in \mathcal{X}$ and $Y(t) \in \mathcal{Y}$ for all $t \ge 0$ such that
- there exists a measurable function $f$ satisfying $\forall t \ge 0$, $X(t+1) = f(X(t), Y(t))$,
- $(Y(t))_{t \ge 0}$ are independent and identically distributed,
- $\forall t \ge 0$, $Y(t)$ is independent of $(X(t), \dots, X(0))$,
then $(X(t))_{t \ge 0}$ is a homogeneous Markov chain and
$$P[X(1) = j \mid X(0) = i] = \sum_{y \in \mathcal{Y},\, f(i,y) = j} P[Y(0) = y].$$

Proof We prove the Markov property:
$$\forall t > 0, \quad P[X(t+1) = j \mid X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0] = P[X(t+1) = j \mid X(t) = i], \tag{20}$$
when $P[X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0] > 0$. Let us call $A$ the left-hand side of the equality and $B$ the right-hand side. We have
$$\begin{aligned}
A &= \frac{P[X(t+1) = j, X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0]}{P[X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0]} \\
&= \frac{\sum_{y \in \mathcal{Y}} P[Y(t) = y, X(t+1) = j, X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0]}{P[X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0]} \\
&= \frac{\sum_{y \in \mathcal{Y}} P[Y(t) = y, f(X(t), Y(t)) = j, X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0]}{P[X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0]} \\
&= \frac{\sum_{y \in \mathcal{Y}} P[Y(t) = y, f(i, y) = j, X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0]}{P[X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0]} \\
&= \frac{\sum_{y \in \mathcal{Y},\, f(i,y) = j} P[Y(t) = y, X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0]}{P[X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0]} \\
&= \frac{\sum_{y \in \mathcal{Y},\, f(i,y) = j} P[Y(t) = y] \, P[X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0]}{P[X(t) = i, X(t-1) = i_{t-1}, \dots, X(0) = i_0]} && \text{(independence)} \\
&= \sum_{y \in \mathcal{Y},\, f(i,y) = j} P[Y(t) = y] \\
&= \sum_{y \in \mathcal{Y},\, f(i,y) = j} P[Y(0) = y] && \text{(identical distribution)}
\end{aligned}$$

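Using the same calculation for $B$, we have:
$$\begin{aligned}
B &= \frac{P[X(t+1) = j, X(t) = i]}{P[X(t) = i]} \\
&= \frac{\sum_{y \in \mathcal{Y}} P[Y(t) = y, X(t+1) = j, X(t) = i]}{P[X(t) = i]} \\
&= \frac{\sum_{y \in \mathcal{Y}} P[Y(t) = y, f(X(t), Y(t)) = j, X(t) = i]}{P[X(t) = i]} \\
&= \frac{\sum_{y \in \mathcal{Y}} P[Y(t) = y, f(i, y) = j, X(t) = i]}{P[X(t) = i]} \\
&= \frac{\sum_{y \in \mathcal{Y},\, f(i,y) = j} P[Y(t) = y, X(t) = i]}{P[X(t) = i]} \\
&= \frac{\sum_{y \in \mathcal{Y},\, f(i,y) = j} P[Y(t) = y] \, P[X(t) = i]}{P[X(t) = i]} && \text{(independence)} \\
&= \sum_{y \in \mathcal{Y},\, f(i,y) = j} P[Y(t) = y] \\
&= \sum_{y \in \mathcal{Y},\, f(i,y) = j} P[Y(0) = y] && \text{(identical distribution)}
\end{aligned}$$
Hence,
$$A = B = \sum_{y \in \mathcal{Y},\, f(i,y) = j} P[Y(0) = y].$$
Since this value does not depend on $t$, the chain is homogeneous.

Theorem 1 Let us consider a motif $M$ and $F$ its update function, $(X(t))_{t \ge 0}$ a sequence of random motif states, $(Y(t))_{t \ge 0}$ a sequence of independent random input states and $\nu$ an input state distribution such that
- $\forall t \ge 0$, $X(t+1) = F(Y(t), X(t))$,
- $\forall t \ge 0$, $\forall \lambda \in \Lambda$, $P[Y(t) = \lambda] = \nu(\lambda)$,
- $\forall t \ge 0$, $(X(0), \dots, X(t))$ is independent of $Y(t)$.
Then $(X(t))_{t \ge 0}$ is a time-homogeneous finite-state Markov chain, the transition matrix of which is
$$P_{\sigma_1 \sigma_2} = \sum_{\lambda \in \Lambda,\, F(\lambda, \sigma_1) = \sigma_2} \nu(\lambda). \tag{21}$$

Proof This is a direct application of Lemma 1.

Theorem 2 If, in addition to the assumptions of Theorem 1, $\nu$ is a positive input state distribution, then the communicating classes of the motif and the communicating classes of $(X(t))_{t \ge 0}$ are the same. Moreover, the properties of periodicity, recurrence, transience, and ergodicity are the same.

Proof It is sufficient to prove that the motif state space structure and the Markov chain state space structure are the same:
$$\sigma_a \to \sigma_b \iff P_{\sigma_a \sigma_b} = P[X(1) = \sigma_b \mid X(0) = \sigma_a] > 0.$$
If $\sigma_a \to \sigma_b$ then there exists an input state $y$ such that $F(y, \sigma_a) = \sigma_b$. Hence,
$$P_{\sigma_a \sigma_b} = \sum_{\lambda \in \Lambda,\, F(\lambda, \sigma_a) = \sigma_b} \nu(\lambda) \ge \nu(y) > 0.$$
Conversely, if $P_{\sigma_a \sigma_b} = \sum_{\lambda \in \Lambda,\, F(\lambda, \sigma_a) = \sigma_b} \nu(\lambda) > 0$ then the sum has at least one term. So there exists $y \in \Lambda$ such that $F(y, \sigma_a) = \sigma_b$, that is to say $\sigma_a \to \sigma_b$.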
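The transition matrix of Theorem 1 can be built directly from the update function and the input state distribution. The sketch below is an illustration only: the toy motif `update` and the uniform distribution `NU` are hypothetical choices, not taken from this report. It computes the matrix with exact fractions and checks the equivalence used in the proof of Theorem 2.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy motif and input distribution (assumptions for illustration only).
STATES = list(product((0, 1), repeat=2))      # motif states (x, y)
INPUTS = list(product((0, 1), repeat=2))      # input states (u, v)
NU = {lam: Fraction(1, 4) for lam in INPUTS}  # a positive input state distribution


def update(lam, sigma):
    """Motif update function F(lambda, sigma): x' = u OR v, y' = x AND y."""
    u, v = lam
    x, y = sigma
    return (u | v, x & y)


def transition_matrix(nu):
    """P[s1][s2] = sum of nu(lam) over the input states lam with F(lam, s1) = s2."""
    P = {s1: {s2: Fraction(0) for s2 in STATES} for s1 in STATES}
    for s1 in STATES:
        for lam, prob in nu.items():
            P[s1][update(lam, s1)] += prob
    return P


def immediately_accessible(s1, s2):
    """True when some input state sends s1 to s2 in one update."""
    return any(update(lam, s1) == s2 for lam in INPUTS)


P = transition_matrix(NU)
for s1 in STATES:
    print(s1, [str(P[s1][s2]) for s2 in STATES])
```

Three of the four input states give `u | v == 1`, so their probabilities add up in the same matrix entry, which is what the sum over input states in the transition matrix expresses. Each row sums to 1, and an entry is positive exactly when the target state is immediately accessible.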

Definition 12 Let us consider a motif $M$ with a constant input state distribution $\nu$ and $(X(t))_{t \ge 0}$ the associated Markov chain. A motif state distribution $\pi$ is said to be stationary if
$$\forall \sigma \in \Sigma, \quad \pi(\sigma) = \sum_{s \in \Sigma} \pi(s) P_{s\sigma}. \tag{22}$$
We will also write this condition in matrix form:
$$\pi = \pi P. \tag{23}$$

If $\pi$ is a stationary distribution and there exists $t_0 \ge 0$ such that $\forall \sigma \in \Sigma$, $P[X(t_0) = \sigma] = \pi(\sigma)$, then $\forall t \ge t_0$, $\forall \sigma \in \Sigma$, $P[X(t) = \sigma] = \pi(\sigma)$. Hence the long-time behavior of the motif is known. We now study the conditions for existence and uniqueness of a stationary distribution, as well as conditions for convergence to the stationary distribution.

Theorem 3 For every motif $M$ and every constant input state distribution, there exists a stationary distribution $\pi$.

Proof Because $P$ is a stochastic matrix, there always exists a non-zero eigenvector $X$ with non-negative entries such that $X = XP$. If we normalize this eigenvector,
$$\pi = \frac{X}{\lVert X \rVert_1},$$
we obtain a stationary distribution.

Theorem 4 For every motif $M$ and every constant positive input distribution, exactly one of the following cases holds:
- $M$ has a unique recurrent communicating class and a unique stationary distribution,
- $M$ has more than one recurrent communicating class and an infinite number of stationary distributions.

Proof Finite Markov chain theory tells us that there always exists a recurrent communicating class. A finite Markov chain with a unique recurrent communicating class has a unique stationary distribution. When there are two recurrent communicating classes, we can compute two stationary distributions $\pi_1$ and $\pi_2$ for these two classes, and for all $x \in [0, 1]$, $x\pi_1 + (1 - x)\pi_2$ is a stationary distribution for the whole chain.

Proposition 4 For every irreducible motif $M$ and every constant positive input distribution, there is a unique stationary distribution.

Proof This is a direct consequence of the previous theorem.

Theorem 5 Let us consider a motif $M$ with a unique recurrent communicating class $C$, a constant positive input distribution and $\pi$ the unique associated stationary distribution. Then, if $C$ is aperiodic, we have for every initial distribution $\mu_0$ ($\forall \sigma \in \Sigma$, $P[X(0) = \sigma] = \mu_0(\sigma)$):
$$\forall \sigma \in \Sigma, \quad \lim_{t \to +\infty} P[X(t) = \sigma] = \pi(\sigma). \tag{24}$$

Proof This is an application of the ergodic theorem for Markov chains, which says that the distribution of an ergodic Markov chain (i.e. aperiodic and positive recurrent) converges to the stationary distribution.

A.3 Partially annealed network model

Until now, we have studied the long-time behavior of a single motif with a constant input distribution. However, in a network context there are several motifs, and the inputs of each motif depend on the states of the other nodes in the network. We now consider a partially annealed network model that contains several motifs and some other background nodes. The inputs of motifs and background nodes are chosen randomly at every time step. The Boolean functions of the background nodes are selected according to a random function distribution with bias $\rho$. Hence, partial annealing is an extension of the classical annealed approximation that takes some local structures into account.
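Before moving on to networks, the single-motif convergence result of Section A.2 can be illustrated numerically. The following sketch makes several assumptions: the toy motif `update` and the uniform input distribution `NU` are hypothetical, and repeated multiplication by the transition matrix stands in for an eigenvector computation. It iterates the state distribution from two different initial distributions and returns the limit each one reaches.

```python
from itertools import product

# Hypothetical toy motif and input distribution (assumptions for illustration only).
STATES = list(product((0, 1), repeat=2))
INPUTS = list(product((0, 1), repeat=2))
NU = {lam: 0.25 for lam in INPUTS}


def update(lam, sigma):
    """Motif update function F(lambda, sigma): x' = u OR v, y' = x AND y."""
    u, v = lam
    x, y = sigma
    return (u | v, x & y)


def transition_matrix(nu):
    """P[s][t] = probability of moving from state s to state t in one update."""
    P = {s: {t: 0.0 for t in STATES} for s in STATES}
    for s in STATES:
        for lam, prob in nu.items():
            P[s][update(lam, s)] += prob
    return P


def step(mu, P):
    """One time step of the state distribution: mu -> mu P."""
    return {t: sum(mu[s] * P[s][t] for s in STATES) for t in STATES}


def limiting_distribution(mu0, P, tol=1e-12, max_iter=10_000):
    """Iterate mu -> mu P until two successive distributions differ by less than tol."""
    mu = dict(mu0)
    for _ in range(max_iter):
        nxt = step(mu, P)
        if max(abs(nxt[s] - mu[s]) for s in STATES) < tol:
            return nxt
        mu = nxt
    raise RuntimeError("no convergence (the recurrent class may be periodic)")


P = transition_matrix(NU)
start_a = {s: 1.0 if s == (1, 1) else 0.0 for s in STATES}  # all mass on a transient state
start_b = {s: 0.25 for s in STATES}                         # uniform initial distribution
pi_a = limiting_distribution(start_a, P)
pi_b = limiting_distribution(start_b, P)
print(pi_a)
print(pi_b)
```

This toy motif has a single recurrent class, and that class is aperiodic, so both runs are expected to approach the same distribution, with the transient states carrying vanishing probability. For a periodic recurrent class the iteration need not settle, which is why the function raises an error instead of returning a value.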

Definition 13 A partially annealed network (PAN) is an annealed Boolean network that contains fixed motifs. It is a pair $((M_k)_k, B)$ where
- $(M_k)_{1 \le k \le m}$ are $m$ motifs of the same type but with inputs $I_k$ and nodes $N_k$,
- $B = \{B_1, \dots, B_q\}$ is a finite set of background nodes.
We call $q$ the number of nodes in $B$ and we assume $q \ge 2$.

Definition 14 A PAN state $\sigma = (\sigma_1, \dots, \sigma_m)$ is an $m$-tuple where $\sigma_k : N_k \to \mathbb{B}$ is the state of the $k$th motif. $\Sigma^m$ is the set of all states.

The connections are shuffled at each update and the values of the background nodes can be considered as randomly chosen binary values with bias $\rho$, so the network has different possible configurations at each time step.

Definition 15 A PAN configuration is a pair $\gamma = (s, r)$ where:
- $s$ is a shuffle map $s : \bigcup_k I_k \to \left(\bigcup_k N_k\right) \cup B$ that represents the randomly chosen inputs,
- $r \in \mathbb{B}^q$ is a Boolean vector representing the randomly chosen values of the background nodes.
We write $S$ for the set of all shuffle maps and $\Gamma = S \times \mathbb{B}^q$ for the finite set of possible configurations.

Given a state of the network and a configuration, the network updates its state to a unique state.

Definition 16 We define the update function of the PAN, $G : \Gamma \times \Sigma^m \to \Sigma^m$, by
$$G((s, r), (\sigma_1, \dots, \sigma_m)) = (\sigma'_1, \dots, \sigma'_m) \tag{25}$$
where $\sigma'_i = F_i(\lambda_i, \sigma_i)$ and, for $x \in I_i$,
$$\lambda_i(x) = \begin{cases} \sigma_k(s(x)) & \text{if } s(x) \in N_k, \\ r_k & \text{if } s(x) = B_k. \end{cases}$$

A.3.1 State space structure

The update function of a PAN defines a structure over the PAN state space in the same way as motif update functions define structures over motif state spaces.

Definition 17 A state $\sigma_b$ is immediately accessible from a state $\sigma_a$ if there exists a configuration $\gamma$ such that $G(\gamma, \sigma_a) = \sigma_b$, and we write $\sigma_a \to \sigma_b$. We define $\xrightarrow{k}$, $\rightsquigarrow$ and $\leftrightarrow$ in the same way as for motif state space structures.

The following propositions show the relationship between the state space structure of a motif and the state space structure of a PAN that contains this motif.

Proposition 5 Let $\sigma_a = (\sigma_{a1}, \dots, \sigma_{am})$ and $\sigma_b = (\sigma_{b1}, \dots, \sigma_{bm})$. We have
$$(\sigma_{a1}, \dots, \sigma_{am}) \to (\sigma_{b1}, \dots, \sigma_{bm}) \tag{26}$$
if and only if
$$\forall i \in \{1, \dots, m\}, \quad \sigma_{ai} \to \sigma_{bi}. \tag{27}$$

Proof If $\sigma_a \to \sigma_b$, there exists a configuration $\gamma$ such that $G(\gamma, \sigma_a) = \sigma_b$. By definition of $G$, for all $i$ there exists $\lambda_i$ such that $\sigma_{bi} = F_i(\lambda_i, \sigma_{ai})$, so $\sigma_{ai} \to \sigma_{bi}$.

Conversely, we build a suitable configuration $\gamma = (s, r)$ such that $G(\gamma, \sigma_a) = \sigma_b$. We set $r = (0, 1, 0, \dots, 0)$ (the only important thing is to have both possible values). For each $i$, there exists $\lambda_i$ such that $\sigma_{bi} = F_i(\lambda_i, \sigma_{ai})$. If we set the shuffle map to be
$$s(x) = \begin{cases} B_1 & \text{if } x \in I_k \text{ and } \lambda_k(x) = 0, \\ B_2 & \text{if } x \in I_k \text{ and } \lambda_k(x) = 1, \end{cases}$$
then we have $G((s, r), \sigma_a) = \sigma_b$.
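The construction used in this proof can be replayed on a small example. The sketch below is a hedged illustration: the motif type `motif_update`, the dictionary encoding of states and the tagged tuples used for shuffle-map targets are all hypothetical conventions, and background nodes are indexed from 0 here whereas the text numbers them from 1. It implements a PAN update in the spirit of Definition 16 and builds the configuration that forces each motif to see a chosen input state.

```python
# Hypothetical toy motif type (an assumption for illustration): nodes X, Y and inputs u, v.
INPUT_NAMES = ("u", "v")


def motif_update(lam, sigma):
    """Motif update function F(lambda, sigma) shared by all motifs of the PAN."""
    return {"X": lam["u"] | lam["v"], "Y": sigma["X"] & sigma["Y"]}


def pan_update(shuffle, r, state):
    """PAN update G((s, r), (sigma_1, ..., sigma_m)).

    shuffle maps each input (k, name) either to ("node", j, node_name), a node of motif j,
    or to ("bg", j), the background node with index j; r holds the background values.
    """
    new_state = []
    for k, sigma in enumerate(state):
        lam = {}
        for name in INPUT_NAMES:
            target = shuffle[(k, name)]
            if target[0] == "node":
                _, j, node_name = target
                lam[name] = state[j][node_name]
            else:
                _, j = target
                lam[name] = r[j]
        new_state.append(motif_update(lam, sigma))
    return tuple(new_state)


def configuration_for(lams, q):
    """Build (s, r) as in the proof: background node 0 holds 0, background node 1 holds 1,
    and every input that should read bit b is wired to background node b."""
    r = (0, 1) + (0,) * (q - 2)
    shuffle = {
        (k, name): ("bg", bit)
        for k, lam in enumerate(lams)
        for name, bit in lam.items()
    }
    return shuffle, r


state_a = ({"X": 1, "Y": 1}, {"X": 0, "Y": 1})
lams = ({"u": 0, "v": 0}, {"u": 1, "v": 0})  # input states chosen motif by motif
state_b = tuple(motif_update(lam, sigma) for lam, sigma in zip(lams, state_a))
shuffle, r = configuration_for(lams, q=3)
print(pan_update(shuffle, r, state_a) == state_b)
```

The requirement of at least two background nodes shows up in `configuration_for`: one background node must hold 0 and another must hold 1 so that any input state can be realised.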