Computational Methods in Systems and Synthetic Biology

Computational Methods in Systems and Synthetic Biology François Fages EPI Contraintes, Inria Paris-Rocquencourt, France 1 / 62

Need for Abstractions in Systems Biology Models are built in Systems Biology with two contradictory perspectives : 1) Models for representing knowledge : the more concrete the better detailed mechanistic reaction models (SBML), gene ontologies, protein functions, protein interactions, structures... 2) Models for making predictions : the more abstract the better. schematic reaction models (SBML), variable elimination, approximations, stationary states, influence graph... These perspectives can be reconciled by organizing models and formalisms in abstraction hierarchies. To understand a system is not to know everything about it but to know abstraction levels that are sufficient for answering questions about it 4 / 62

Overview of the Lecture Abstract interpretation in systems biology 1 Theory of abstract interpretation, domain of reaction models 2 Hierarchy of semantics: domains of stochastic, discrete and boolean traces 3 Analyses by type checking/type inference: dimensions, protein functions, influence graphs, compartment topology F. Fages, S. Soliman, Abstract Interpretation in Systems Biology, CMSB 06, Theoretical Computer Science 2008. 5 / 62

Theory of Abstract Interpretation: Domains Simple algebraic theory of abstraction introduced by [Cousot Cousot 77] to reason about programs. In this setting, a (computation) domain is a lattice D(,,,, ) which formalizes the set of possible executions. Often just a power-set P(S)(,, S,, ) ordered by set inclusion. Example Domain of sets of integers P(Z)(,, S,, ). Example Domain of sets of signs P({+, })(,, S,, ). 7 / 62

Syntactical Domain of Reaction Models Given a finite set M of molecule names, the universe of all possible reactions is the set R = {e for S=>S e is a kinetic expression, S and S are finite multisets over M}. Definition The syntactic domain of reaction models over M is C R = (P(R), ). = is the empty model, = {R} is the universal model. Note: the set of possible (stochastic, discrete Petri net, Boolean) execution traces increases with the number of reactions in the model. Note: the differential semantics is deterministic (produces only one trace) 10 / 62

Theory of Abstract Interpretation: Abstractions Definition A Galois connection C α γ A between two lattices C and A is defined by an abstraction function α : C A and a concretization function γ : A C which are monotonic: c, d C c C d α(c) A α(d), a, b A a A b γ(a) C γ(b), γ α is extensive (represents the information lost by abstraction) c C c C γ α(c), α γ is contracting: a A α γ(a) A a. If γ α is the identity, the abstraction α loses no information, and C and A are isomorphic from the information standpoint (although α may be not onto and γ not one-to-one). 14 / 62

Properties of Galois Connections Let a = {b b a} and a = {b a b}. 1 α, γ are adjoint functors: c C, a A : c C γ(a) α(c) A a. 15 / 62

Properties of Galois Connections Let a = {b b a} and a = {b a b}. 1 α, γ are adjoint functors: c C, a A : c C γ(a) α(c) A a. 2 γ(a) = max α 1 ( a) = α 1 ( a) 3 α(c) = min γ 1 ( c) = γ 1 ( c) 16 / 62

Properties of Galois Connections Let a = {b b a} and a = {b a b}. 1 α, γ are adjoint functors: c C, a A : c C γ(a) α(c) A a. 2 γ(a) = max α 1 ( a) = α 1 ( a) 3 α(c) = min γ 1 ( c) = γ 1 ( c) 4 γ α is the identity iff γ is onto iff α is one-to-one. 5 α preserves, and γ preserves ; It is equivalent in the definition of Galois connections to replace the conditions of extensitivity and contraction by adjointness 1, or by condition 2 which also entails the monotonicity of γ. 18 / 62

Pointwise Galois Connections between Powersets Lemma Let C and A be two sets, and α : P(C) P(A) be a function such that α(c) = e c α({e}). Then the function γ(a) = α 1 ( a) forms a Galois connection P(C) α γ P(A) between (P(C), ) and (P(A), ). Proof. α is monotonic since c c implies e c α({e}) e c α({e }) Let c = γ(a) = α 1 ( a), to apply the previous proposition we need to show c α 1 ( a), i.e. α(c) a. We have α(c) = e c α({e}) = e α 1 ( a) α({e}). 23 / 62

1. Stochastic Trace Semantics of Reaction Models For a given volume V k of the location where the compound x k resides, a concentration C k for a molecule is translated into a number of molecules N k = C k V k N A, where N A is Avogadro s number. Definition Let a discrete state be a vector of integers of dimension M. Definition The universe S of stochastic transitions is the set of triplets (S i, S j, τ) where S i and S j are discrete states and τ R +. The domain of stochastic transitions is D S = (P(S), ). Remark: Discrete states and solutions in reaction rules have the same mathematical structure (multisets) and can both be represented by M -dimensional vectors of integers. 26 / 62

Stochastic Trace Semantics of Reaction Models Let us assume a given Biocham model. In a given state S, the kinetic expression e r of a reaction rule r evaluates in a (positive) reaction weight τ r (S) R +. The weight τ i,j of a transition from state S i to state S j is the sum of reaction weights of the rules in the model that transform S i in S j : τ i,j = τ r (S i ) r M,S i r S j The probability of transition from state S i to S j is τ ij p ij = (S i,s k,τ ik ) s τ ik This defines a Markov chain that can be simulated using Gillespie algorithm. The stochastic semantics of a Biocham model is the set of triplets (S i, S j, τ i,j ) 27 / 62

Reaction Domain Stochastic Trace Domain Proposition Let α RS : C R D S be the function associating to a reaction model the state transition graph labelled with the τ i,j s. Let γ RS (s) = α 1 RS ( s). C α RS R γrs D S is a Galois connection. Remarks: α RS is not one-to-one. For instance, the reaction models m1 = { e for A => B} and m2 = m1 { e for 2*A => A+B} have the same set of stochastic transitions. γ α is thus not the identity, the information lost by the stochastic abstraction is the elimination of redundant rules in the reaction model. α RS is neither onto 28 / 62

2. Petri Net Trace Semantics of Reaction Models Definition The universe D of discrete (Petri net) transitions is the set of pairs of discrete states. The domain of discrete transitions is D D = (P(D), ). Remark: The discrete semantics is the classical Petri net semantics of reaction models [RML93ismb,SHK06bmcbi,Chaouiya07bi,GHL07cmsb]. Classical Petri net analysis tools can be used for the analysis of reaction models at this abstraction level. For instance, the elementary mode analysis of metabolic networks [SPM02bi] has been shown in [ZS03insilicobio] to be equivalent to the classical analysis of Petri nets by T-invariants. 29 / 62

Petri Net Trace Semantics of Reaction Models Proposition Let α SD : D S D D be the function associating to a set of stochastic transitions the discrete transitions obtained by projection on the two first components, and γ SD (d) = α 1 SD ( d). D α SD S γsd D D is a Galois connection. Remarks: α SD is onto but not one-to-one as the transition rates are simply forgotten. 30 / 62

3. Boolean Trace Abstraction Semantics Let a boolean state be a vector of booleans of dimension M indicating the presence of each molecule in the state. Definition The universe B of boolean transitions is the set of pairs of boolean states. The domain of boolean transitions is D B = (P(B), ). 31 / 62

Boolean Trace Abstraction Semantics of Reaction Models Let a boolean state be a vector of booleans of dimension M indicating the presence of each molecule in the state. Definition The universe B of boolean transitions is the set of pairs of boolean states. The domain of boolean transitions is D B = (P(B), ). Let α N B : N M B M be the zero/non-zero abstraction from the integers to the booleans, and its pointwise extension from discrete states to boolean states. Proposition Let α DB : D D D B be the set extension of α N B. Let γ DB (b) = α 1 DB ( b). D α DB D γdb D B is a Galois connection. Remark: α DB is onto but not one-to-one 32 / 62

4. Boolean Semantics of Biocham Given a reaction model R, let us denote by S BB the set of boolean transitions obtained by considering all pssible consumption of reactants. For instance, the rule A+B=>C+D gives rise to four boolean transitions: A B A B C D A B A B C D A B A B C D A B A B C D Remark: Biocham Boolean semantics differs from Boolean Petri nets, Pathway Logic, etc. where complete consumption of the reactants is always assumed. 33 / 62

Over-approximation Theorem of Biocham Boolean Semantics Theorem For any reaction model R, α DB (α SD (α RS (R))) S BB. 34 / 62

Over-approximation Theorem of Biocham Boolean Semantics Theorem For any reaction model R, α DB (α SD (α RS (R))) S BB. Proof. Since all our abstractions are defined pointwise, it is enough to prove it for only one rule in R. Let us consider e for S=>S. α RS (R) = {(S i, S j, e) S i S, S j = S i S + S } α SD (α RS (R)) = {(S i, S j ) S i S, S j = S i S + S } α DB (α SD (α RS (R))) = {(S i, S j ) S i S, S j = S i S + S, S i = α N B (S i ), S j = α N B (S j )}. S BB = {(T, T ) T α N B (S), α N B (S ) (T α N B (S)) T α N B (T ) α N B (S )} 38 / 62

Type Checking and Type Inference by Abstract Interpretation 1 Type Checking and Type Inference 2 Domain of Protein Functions 3 Domain of Protein Influences 4 Domain of Compartment Neighborhoods 40 / 62

Type Checking/Inference by Abstract Interpretation A type system A for a concrete domain C is a Galois connection C α A. 41 / 62

Type Checking/Inference by Abstract Interpretation A type system A for a concrete domain C is a Galois connection C α A. The type inference problem is INPUT a concrete element x C (e.g. a reaction model) UTPUT its typing α(x) (e.g. the protein functions of the model). The type checking problem is, INPUT x C (e.g. a reaction model) and a typing y A (e.g. a set of protein functions), UTPUT determine whether x C γ(y) (i.e. whether the reactions are compatible with the protein functions) or equivalently α(x) A y (the typing contains the inferred types) 43 / 62

Type Checking/Inference of Protein Functions A F = P({kinase(A) A M} {phosphatase(a) A M}) The typing of reactions by protein functions is defined by α F α F (A =[B]=> C) = {kinase(b)} if C is strictly more phosphorylated than A α F (A =[B]=> C) = {phosphatase(b)} if C is strictly less phosphorylated α F (A + B => A-B, A-B => C + B) = { kinase(b)} if C is strictly more phosphorylated than A α F (A + B => A-B, A-B => C + B) = { phosphatase(b)} if C is strictly less phosphorylated than A 45 / 62

More Precise Protein Function Typing In SBML : no typing possible as there is no syntax for phosphorylation In BIOCHAM : typing is possible but the syntax does not distinguish between phosphorylation, acetylation etc. More precise protein function types: τ ::= kinase phosphatase kinase(τ) phosphatase(τ) T where T denotes some basic types of proteins, with the subtyping relations kinase(τ) kinase and phosphatase(τ) phosphotase. 47 / 62

Evaluation Results in BIOCHAM MAPK model [Levchenko et al. 00] the kinase function of RAFK, RAF~{p1} and MEK~{p1,p2} is inferred; the phosphatase function of RAFPH, MEKPH and MAPKPH is inferred; the kinase function of MAPK~{p1,p2} is not visible and not inferred. Model of the mammalian cell cycle control after [Kohn 99] 165 proteins and genes, 500 variables and 800 rules. Type inference in < 1sec CPU : No compound is both a kinase and a phosphatase; cdc25a and cdc25c are the only phosphatases found together with the deacetylase HDAC1. The cdk are inferred to be kinases only in complexes with cyclins; the acetylases pcaf, p300 are identified to kinases. 48 / 62

Type Checking/Inference of Location Neighborhood Abstract domain A N = P({neighbors(A, B) A, B M}). α RN (e for A 1 + +A m =>A m+1 + +A n ) = {neighbors(a i, A j ) 1 i, j n}) {neighbors(a i, C) 1 i n, C e}). Proposition α RN can be computed in O(n) time where n is the number of reaction rules. 49 / 62

Type Checking/Inference of Location Neighborhood SBML models http://www.biomodels.net 13 over 50 models have compartments and 7 use the outside attribute BIOMD39.xml: neighbor(cytosol,reticulum), neighbor(cytosol, mitochondria) inferred and checked consistent with the outside attributes. BIOMD45.xml: neighbor(cytosol,extracellular), neighbor(cytosol, vesicula1), neighbor(cytosol, vesicula1) inferred and checked consistent with the outside attributes. BIOCHAM model p53-mdm2 : neighbor(cytosol,nucleus) inferred volume ratio not systematically used in the published model [Ciliberto 05] 50 / 62

Cell Grid Inferred in a Square 6x6 Delta-Notch Model (if [D::c21]+[D::c23]+[D::c12]+[D::c32] < 0.2 then 0 else ka,ma(kd)) for (if [D::c22]+[D::c24]+[D::c13]+[D::c33] < 0.2 then 0 else ka,ma(kd)) for (if [D::c23]+[D::c25]+[D::c14]+[D::c34] < 0.2 then 0 else ka,ma(kd)) for (if [D::c24]+[D::c26]+[D::c15]+[D::c35] < 0.2 then 0 else ka,ma(kd)) for... 51 / 62

Type Checking/Inference of Influence Graphs A I = P({A B + A, B M} {A B A, B M}). The influence graph of a reaction model is defined by α RI : C R A I α RI (x) = {A B (e i for S i S i ) x, l i (A) > 0 and r i (B) l i (B) < 0} α RI ({A + B => C}) = { {A + B (e i for S i S i ) x, l i (A) > 0 and r i (B) l i (B) > 0} α RI ({A = [C] => B}) = { α RI ({A = [B] => }) = { α RI ({ = [B] => A}) = { A B, A A, B A, B B, A + C, B + C} C A, A A, A + B, C + B} B A, A A} B + A} 53 / 62

Type Checking/Inference of Influence Graphs A I = P({A B + A, B M} {A B A, B M}). The influence graph of a reaction model is defined by α RI : C R A I α RI (x) = {A B (e i for S i S i ) x, l i (A) > 0 and r i (B) l i (B) < 0} Proposition {A + B (e i for S i S i ) x, l i (A) > 0 and r i (B) l i (B) > 0} α RI can be computed in O(n) time where n is the number of rules. Proposition Let γ RI (f ) = α RI 1 ( f ), C R α RI γri A I is a Galois connection. 54 / 62

MAPK model: Reaction Graph α Influence Graph Thomas s conditions for multistationarity and oscillations apply here : 55 / 62

P53-Mdm2: Reaction Graph α Influence Graph p53 rule_11 rule_13 rule_2 Mdm2::c rule_14 rule_12 Mdm2~{p}::c rule_15 rule_16 rule_18 rule_9 Mdm2::n DNAdam rule_17 rule_19 rule_3 rule_20 rule_10 p53~{u} rule_6 rule_5 rule_4 p53~{uu} rule_8 rule_7 Inhitions hidden in the kinetic expressions are missed 56 / 62

Influence Graph Abstraction from the Differential Semantics Let us denote by β the mapping from C R to D J that extracts x k and hence the Jacobian from the kinetic expressions in the reaction rules. Definition The differential influence graph abstraction α J I : D J A I is the function α J I (x) = {A B + x B / x A > 0 in some point of the phase space} {A B x B / x A < 0 in some point of the phase space} defined purely from the kinetic expressions... compatibility with the rules? 57 / 62

Increasing Kinetics Definition A kinetic expression e i is increasing w.r.t. a reaction model x iff for all molecules x k we have 1 for all points of the phase space e i / x k 0 2 if there exists a point in the phase space s.t. e i / x k > 0 then l i (x k ) > 0 The model x will be said to have increasing kinetics if each of its reaction rules has a increasing kinetic expression. The mass action law kinetics, e i = k Πx i l i, are increasing Hill s kinetics (and Michaelis-Menten kinetics when n = 1) e i = V m x s n /(K m n + x s n ) are also increasing. 58 / 62

Comparison to the Syntactical Influence Graph Theorem (Over-approximation) For any reaction model x with increasing kinetics, α J I β(x) α RI (x). Proof. If (A + B) α J I β(x) then Ḃ/ A > 0. Hence there exists a term (r i (B) l i (B)) e i in the ODE with e i / A of the same sign as r i (B) l i (B). Let us suppose that r i (B) l i (B) > 0 then e i / A > 0 and since e i is increasing we get that l i (A) > 0 and thus that (A + B) α RI (x). If on the contrary r i (B) l i (B) < 0 then e i / A < 0, impossible. If (A B) α J I β(x) then Ḃ/ A < 0. Hence there exists a term (r i (B) l i (B)) e i with e i / A of sign opposite to that of r i (B) l i (B). Let us suppose that r i (B) l i (B) > 0 then e i / A < 0, impossible. If on the contrary r i (B) l i (B) < 0 then 59 / 62

Comparison to the Syntactical Influence Graph Even with mass action law kinetics, there is no equality between α J I β and α RI. 60 / 62

Comparison to the Syntactical Influence Graph Even with mass action law kinetics, there is no equality between α J I β and α RI. For instance let x be the following model : k 1 A for A => B k 2 A for = [A] => A We have α RI (x) = {A + B, A + A, A A}, however Ȧ = (k 2 k 1 ) A, hence Ȧ/ A can be made always positive, negative or null resulting in the absence from of A A or A + A or both from α J I β(x). 61 / 62

Strongly Increasing Kinetics Definition A kinetic expression e i is strongly increasing w.r.t. a reaction model x iff for all molecules x k we have 1 for all points of the phase space e i / x k 0 2 there exists a point in the phase space s.t. e i / x k > 0 iff l i (x k ) > 0 Theorem If x has strongly increasing kinetics and the syntactical influence graph contains no both positive and negative pair, then α RI (x) = α J I β(x). Corollary Under these conditions, the (global) differential influence graph of a reaction model is independent of the kinetics! 62 / 62