How to Build a Living Cell in Software or Can we computerize a bacterium? Tom Henzinger IST Austria
Turing Test for E. coli Fictional ultra-high resolution video showing molecular processes inside the cell Computer animation of molecular processes inside the cell Biologist cannot distinguish.
Turing Test for E. coli Fictional ultra-high resolution video showing molecular processes inside the cell Computer animation of molecular processes inside the cell Biologist cannot distinguish.
Turing Test for E. coli Fictional ultra-high resolution video showing molecular processes inside the cell Computer animation of molecular processes inside the cell Such software must represent all actors and their interaction in real time (think video game ).
Cell cycle regulation of NOTCH signaling Nusser-Stein et al. 2012
Wind Turbine Hydraulic Variable Pitch System, Yang et al. Energies 2011
Which questions about the diagram can be answered with Yes or No?
The Cell as Reactive System [D. Harel] Reacts to inputs: -physical (light, mechanical, electrical) -chemical (molecular receptors, channels) By producing observable outputs: -growth, division, death -movement -signaling Generated over time by an internal dynamics: -metabolism -cell fate -cell cycle
P 1 (x,y) P 2 (z) y = f 1 (x,y) dx/dt = f 2 (x,y) until f 3 (x,y) = 0 x z 1/3 2/3 f 4 (x,z) < 0 f 4 (x,z) 0 Reactive model
P 1 (x,y) P 2 (z) y = f 1 (x,y) dx/dt = f 2 (x,y) until f 3 (x,y) = 0 x z 1/3 2/3 f 4 (x,z) < 0 f 4 (x,z) 0 Distributed state
P 1 (x,y) P 2 (z) y = f 1 (x,y) dx/dt = f 2 (x,y) until f 3 (x,y) = 0 x z 1/3 2/3 f 4 (x,z) < 0 f 4 (x,z) 0 Discrete state change (Events)
P 1 (x,y) P 2 (z) y = f 1 (x,y) dx/dt = f 2 (x,y) until f 3 (x,y) = 0 x z 1/3 2/3 f 4 (x,z) < 0 f 4 (x,z) 0 Continuous state change (Time)
P 1 (x,y) P 2 (z) y = f 1 (x,y) dx/dt = f 2 (x,y) until f 3 (x,y) = 0 x z 1/3 2/3 f 4 (x,z) < 0 f 4 (x,z) 0 Decisions
P 1 (x,y) P 2 (z) y = f 1 (x,y) dx/dt = f 2 (x,y) until f 3 (x,y) = 0 x z 1/3 2/3 f 4 (x,z) < 0 f 4 (x,z) 0 Randomness
P 1 (x,y) P 2 (z) y = f 1 (x,y) dx/dt = f 2 (x,y) until f 3 (x,y) = 0 x z 1/3 2/3 f 4 (x,z) < 0 f 4 (x,z) 0 Interaction
P 1 (x,y) P 2 (z) y = f 1 (x,y) dx/dt = f 2 (x,y) until f 3 (x,y) = 0 x z 1/3 2/3 f 4 (x,z) < 0 f 4 (x,z) 0 Iteration
x v z,u P 1 (x,y) z P 2 (z) z u P 4 (u) P 3 (x) P 4 (x,y,z) P 5 (x,y,z,u) Hierarchy
x v z,u P 1 (x,y) z P 2 (z) z u P 4 (u) P 3 (x) P 4 (x,y,z) P 5 (x,y,z,u) Openness
Reactive Models are NOT particularly good for -modeling space (proximity, polarity) -approximate analysis
Standard Dynamic Models used in Molecular Biology 1 Markovian Population Models for chemical reaction systems (continuous time, no decisions, no hierarchy) 2 Boolean/Qualitative Networks for activation/repression systems (discrete time, no decisions, no hierarchy)
Chemical Reaction System (Phage λ)
Gene Activation/Repression System Rafiq K et al. Development 2012; 139:579-590.
Continuous time, stochastic Discrete time, deterministic No decisions, no hierarchy
Markovian Population Models
Markovian Population Models +
Markovian Population Models +
Markovian Population Models State: ( 8, 6, 6 )
Markovian Population Models State: ( 9, 7, 5 ) Transition: State: ( 8, 6, 6 )
State Transition Systems 9, 7, 5 8, 6, 6 deterministic
State Transition Systems 9, 7, 5 8, 6, 6 deterministic
State Transition Systems 9, 7, 5 8, 6, 6 deterministic 0.8 0.2 discrete stochastic 9, 7, 5 8, 6, 6 time 0 1 0
State Transition Systems 9, 7, 5 8, 6, 6 deterministic 0.8 0.2 discrete stochastic 9, 7, 5 8, 6, 6 time 1 0.8 0.2
State Transition Systems 0.5 9, 7, 5 8, 6, 6 1 0 continuous stochastic time 0 exit rate 0.5 expected residence time 2
State Transition Systems 0.5 9, 7, 5 8, 6, 6 0.6 0.4 continuous stochastic time 1 exit rate 0.5 expected residence time 2
Markovian Population Models + 0.2 0.1 Syntax: stoichiometric equations (finite object) 0.5 0.6 12.6 9.6 9, 7, 5 8, 6, 6 Semantics: continuous-time Markov chain (infinite object)
Continuous-Time Markov Chain (CTMC) 0.3 0.4 0.5 12.6 9, 7, 3 8, 6, 4 12.6 9, 7, 4 8, 6, 5 12.6 9, 7, 5 8, 6, 6 9.6 7.0 7, 5, 5 0.4 0.5 0.6 9.6 7.0 7, 5, 6 0.5 0.6 0.7 9.6 7.0 7, 5, 7
Chemist s syntax: + 0.2 0.1 Physicist s syntax (chemical master equation): δp t (x) / δt = Σ i: x 2 Hi α i (u i -1 (x)) p t (u i -1 (x)) - Σ i: x 2 Gi α i (x) p t (x)
Chemist s syntax: + 0.2 0.1 Physicist s syntax (chemical master equation): δp t (x) / δt = Σ i: x 2 Hi α i (u i -1 (x)) p t (u i -1 (x)) - Σ i: x 2 Gi α i (x) p t (x) Computer scientist s syntax (stochastic reactive module): [] x 1 1 Æ x 2 1-0.2 x 1 x 2 -> x 1 :=x 1-1; x 2 :=x 2-1; x 3 :=x 3 +1 [] x 3 1-0.1 x 3 -> x 3 :=x 3-1
Syntax Matters 1. Expressiveness 0 2. Succinctness IIII IIII IIII 3. Operations +1 addition multiplication XIV x XXXIV vs. 14 x 34 = 42 56 476
Executable Biology Machine model (rather than equational model): [] x 1 1 Æ x 2 1-0.2 x 1 x 2 -> x 1 :=x 1-1; x 2 :=x 2-1; x 3 :=x 3 +1 [] x 3 1-0.1 x 3 -> x 3 :=x 3-1 -can be executed (not simulated ) -can be composed, encapsulated, and abstracted -can be dynamically reconfigured -can be formally verified (if you are lucky) Transient behavior (rather than steady state) of interest.
Four Executions for Phage Genetic Switch
Probability Mass Propagation for Phage Genetic Switch t = 5000 t = 30000 t = 15000 t = 50000
Mass Propagation vs. Execution Program verification vs. Program testing
Phage λ Model Desired precision: 3 x 10-6 Probability mass propagation: 55 min runtime (adaptive uniformization with sliding-window abstraction) [Didier, H., Mateescu, Wolf 2011] Execution (β = 0.95): 67 h runtime (3 x 10 8 runs)
Main Problem with Biochemical Reaction Systems Well-stirred mixture (gas or fluid) Crowded cytosol [T. Vickers] -small numbers of important molecules -locality, gradients matter
Gene Activation/Repression Systems Rafiq K et al. Development 2012; 139:579-590.
Gene A Gene B Gene C DNA
Gene A Gene B Gene C DNA inactive inactive Promoter Protein code active GENE REGULATION
Gene A Gene B Gene C DNA inactive inactive Promoter Protein code active Transcription Protein molecules Function GENE REGULATION
Gene A Gene B Gene C DNA inactive inactive Promoter Protein code active Transcription Activation Protein molecules GENE REGULATION
Gene A Gene B Gene C DNA inactive inactive Nucleotides Promoter active Transcription Protein code A T C G C C Activation Protein molecules GENE REGULATION
Gene A Gene B Gene C DNA inactive inactive Nucleotides Promoter active Transcription Protein code A T C G C C Activation Mutation Protein molecules A T C G A C GENE REGULATION
3 WAGNER MODEL 3,2 A activate B 2,1 weight, threshold repress 2,4 C 1 input
3 WAGNER MODEL 3,2 A activate ABC 000 state B 2,1 weight, threshold repress 2,4 C 1 input
3 WAGNER MODEL 3,2 A 3-0 2 ABC 000 B 2,1 100 0 < 1 2,4 C 1 + 0-0 < 4 1
3 WAGNER MODEL 3,2 A 3-0 2 ABC 000 B 3 1 2,1 100 111 2,4 C 1 + 3-0 4 1
3 WAGNER MODEL 3,2 A 3 2 < 2 ABC 000 B 3 1 2,1 100 111 2,4 C 1 + 3 2 < 4 010 1 deterministic transition system
3 WAGNER MODEL 2 3,2 A 4,2 A mutate B 2,1 B 1,1 p 2,4 C 2,4 C 1 1
MUTATION PROBABILITIES 1. Each nucleotide (A, T, C, or G) mutates to another nucleotide with a given probability π/3. 2. The weight of a gene g decreases with the number of mutated nucleotides in the promoter region: if g has a promoter region of length n and k nucleotides are mutated, then w(g) decreases to w(g) (1 k/n).
EVOLVING GENE REGULATORY NETWORK: DISCRETE-TIME MARKOV CHAIN (DTMC) p 00 dead 1 p 20 p 23 w 0 weight function p 02 w 2 p 20 p 01 p 03 w 1 p 23 w 3
PHENOTYPE = TEMPORAL PROPERTY φ Oscillation Bistability ((A ) : A) Æ (: A ) A)) ((A Æ : B ) (A Æ : B)) Æ (: A Æ B ) (: A Æ B))) atomic propositions = genes Elowitz & Leibler, Nature 2000. Milo et al., Science 2002.
p 00 DTMC dead 1 p 20 p 23 w 0 ² φ p 02 w 2 ² :φ p 20 p 01 p 03 w 1 p 23 w 3
STATIONARY LIMIT DISTRIBUTION w 0 ² :φ w 1 ² φ w 2 ² φ w 3 ² φ dead dead Limit probability of truth of φ
COMPUTING THE ANSWER Problem: 1. Repeated Execution The number of weight functions w grows exponentially with the number of genes.
COMPUTING THE ANSWER Problem: Solution: 1. Repeated Execution The number of weight functions w grows exponentially with the number of genes. 2. Parametric Model Checking [Giacobbe, Guet, Gupta, H., Paixao, Petrov 2015]
PARAMETRIC MODEL CHECKING 1. Keep inputs, weights, and thresholds as symbolic values ( parameters ).
EXAMPLE: TRIPLE REPRESSOR i A w A,t A A i B B w B,t B w c,t c C i C
PARAMETRIC MODEL CHECKING 1. Keep inputs, weights, and thresholds as symbolic values ( parameters ). 2. Use SMT solving to compute the constraints on the parameter values such that an LTL formula φ is satisfied.
EXAMPLE: TRIPLE REPRESSOR i A Oscillation: w A,t A A φ = Æ g 2 {A,B,C} ((g ) : g) Æ (: g ) g)) i B B w B,t B w c,t c C φ satisfied iff i A > t A Æ i B > t B Æ i C > t C Æ i B - w A < t B Æ i C w B < t C Æ i A w C < t A i C
LIMITATIONS OF THE MODEL -(over?)simplification of timing -(over?)simplification of quantitative measures (weights)
The Essence of Computer Science 1. Algorithms 2. Machines/languages: towers of abstraction Programming language Processor Circuit Transistor
Tower of Bio-Abstractions Population Organism Organ Cell Network Molecule
Are there useful macros to structure the hairball? Population Organism Organ Cell Network We are looking for the organizing principle: the programming language, only then for the program! Molecule