Modeling and Systems Analysis of Gene Regulatory Networks

Modeling and Systems Analysis of Gene Regulatory Networks Mustafa Khammash Center for Control Dynamical-Systems and Computations University of California, Santa Barbara

Outline Deterministic A case study: the bacterial heat shock response A systems analysis of the regulation strategy What s behind the complexity? Robustness, performance, and optimality Reconciling known components with experimental data Stochastic Why are stochastic models needed? A case study: the Pap pili stochastic switch The stochastic formulation of chemical kinetics Gillespie s SSA A New Approach: the Finite State Projection

Loss of Protein Function Network failure Cell Unfolded Proteins Aggregates Death Folded Proteins Temp cell Temp environ

Function of Heat-Shock Proteins I. Protein Folding DnaK GroEL/ GroES Unfolded/partially folded Proteins II. Protein Degradation Proteases Proteins Aggregates Amino Acids Folded Proteins

Synthesis of Heat Shock Proteins RNA Polymerase + 32 factor start DNA end hsp1 hsp2 promoter terminator Heat-Shock Proteins mrna ribosomes

Regulation of HSP Tight regulation of 32 at 3 levels Synthesis Activity Stability Feedforward Feedback

I. Regulation of HSP Synthesis At low temperature, mrna has a secondary structure 32 mrna Translation Heat 32 32 mrna

II. Regulation of 32 Activity: A Feedback Mechanism RNAP hsp1 hsp2 Transcription & Translation Chaperones Heat 32

III. Regulation of 32 Degradation FtsH degrades sigma-32 only when bound to cheperones RNAP hsp1 hsp2 Proteases FtsH Chaperones Transcription & Translation Heat FtsH FtsH FtsH 32

Regulation of E. coli s Heat-Shock Response Heat rpoh gene Transcription Degradation - Activity - 32 Translation Feedforward hsp1 hsp2 Heat Feedback DnaK GroEL GroES HslVU FtsH Transcription & Translation Chaperones Proteases

Protein Synthesis Mathematical Model

Binding Equations Mass Balance Equations

Full Model Simulations 30000 350 300 SIGMA32 TOTAL 250 200 150 100 50 0 375 400 425 450 TIME 30 o 42 o 27000 24000 DNAK TOTAL 21000 18000 15000 12000 375 400 425 450 TIME 30 o 42 o 0.14 2E+06 0.12 SIGMA32:RNAP 0.1 0.08 0.06 0.04 0.02 FOLDED PROTEINS 1.8E+06 1.6E+06 1.4E+06 0 375 400 425 450 TIME 1.2E+06 380 390 400 410 420 430 440 450 TIME

Analysis of the HS System What are the advantages of the different control strategies? Disturbance feedforward Sequestration feedback loop Degradation feedback loop High synthesis/degradation rates of sigma-32 Fast response Robustness, efficiency Fast response, noise suppression Fast response Tools Dynamic analysis and simulations Sensitivity/robustness analysis Sum-of-Squares tools Optimal control

Effect of Sequestration Feedback Loop 10 1 Sensitivity of DnaK to model parameters 10 0 Sensitivity of DnaK 10-1 10-2 10-3 K-s32/dnaK K-s32:dnaK/D transcription rate No feedback loop Wild type 10-4 10-5 100 200 300 400 500 600 700 800 Time (min) 30 o 42 o

Role of the degradation feedback loop (cont.) 16000 Wild type Constitutive degradation 14000 DNAK 12000 10000 8000 6000 4000 0 100 200 300 400 500 Time

600 450 Total 32 24000 DNAK 300 150 0 8 8000 06 1E+ WildType No Feedforward Low S32 Flux Constitutive 32 Deg No DnaK interaction 6 4 Free 32 Unfolded 2 0 50 60 70 Time 0 50 60 70 Time

Is the WT Response optimal? Observation: A hypothetical heat-shock response system maybe devised to achieve a very small number of unfolded proteins minimal complexity (no feedback necessary) E.g. over-expressing chaperones & eliminating feedback However chaperone over-expression involves a high metabolic cost is toxic to the cell The existing design appears to achieve a tradeoff: good folding achieved with minimal number of chaperones How optimal is the WT design?

Cost function: For a given >0, solve: J The optimal solution yields a point in R 2 : p opt t 1 2 ( ) = [ chaperones] dt + t 0 t 1 t1 opt = 2 ( ) : [ chaperones ] dt, [ P t0 t0 { p ( ) : > 0} min J ( ) st.. x& = F( x, y, ) 0 = Gxy (,, ) opt The curve gives the set of Pareto optimal designs t 1 t 0 opt un [ P ] 2 un ] 2 dt dt opt { p ( ) : > 0} t 1 t0 [ 2 P un ] dt Non-optimal unachievable t 2 [ chaperones] dt 1 t 0

Pareto Optimal Design of the Heat Shock System 100 Pareto Optimal curve t 1 t 0 [ 2 P un ] dt Cost of unfolded proteins 80 60 40 various nonoptimal values of parameters 20 0 Wild type heat shock 10 11 12 Cost of chaperones t 1 t 0 [ chaperones] 2 dt

9600 Constitutive Degradation of S32 Deterministic Wild Type S32 Chaperone level 600 8400 450 10 0 Wild Type No Feedforward 8000 0 25 50 Low Flux 75 100 Constitutive degradation of S32 Time 300 150 9200 8800 The architecture of Sensitivity the heat of DnaK shock to model system parameters is a necessary Sensitivity of DnaK No DnaK 10-1 interaction 0-10 0 10 20 30 40 50 Time 10 1 outcome of robustness and performance requirements 10-2 10-3 10-4 Level of unfolded proteins (Uf) 1E+06 800000 600000 400000 10-5 100 200 300 400 500 600 700 800 Time (min) 200000 Wild type No DnaK interaction No feedforward Low S32 flux Constitutive degradation of S32 0-10 0 10 20 30 40 50 Time

Molecular Biology and Phenotype A dynamic model is to be viewed as a tool for capturing biological knowledge about known components and their frequently complicated interactions. Question: Are known components and their interactions consistent with experimental observation? Two scenarios: Yes. A dynamic model capturing known players and their interactions can robustly reproduce the data. No. A proof is generated showing that with the existing components it is impossible to reproduce the data.

Level of sigma32 600 500 400 300 200 100 Induction phase Case Study: Origin of Adaptation Phase Adaptation phase 0 390 400 410 420 430 440 Time i A hypothesis If induction phase is caused by increased sigma32 translation, then surely the adptation phase is caused by a shut off in translation. Several experiments to search for hypothesized shut-off mechanism But Dynamic simulations of the model based on known components always exhibit an adaptation phase. No need for a shut off mechanism! Feedback through the FtsH loop is responsible for such adaptation Translation of sigma32 does not shut off at higher temperatures. Dynamic interplay between antagonistic pathways controlling the s32 level in E. coli, Morita et. al, PNAS, 2000.

Missing Players? Question: Is a model without the FtsH feedback loop consistent with the data?

Model Invalidation Given: Proposed state-space model with constraints on parameters Measurement data Provide (if possible) a proof that the measurements are inconsistent with the model.

Model Invalidation using SOStools model measurements

Deficient model (no FtsH feedback) Measured data with uncertainty A barrier function B(x,p,t) was constructed using SOStools A model w/o FtsH cannot possibly produce measured data

Stochastic Modeling In the HS system, the total number of sigma-32 molecules per cell is very small (~30 per cell) Does it make sense to treat these quantities as concentrations? Deterministic models describe macroscopic behavior, but there are many examples when deterministic models are not adequate

Outline Why are stochastic models needed? Randomness and variability Motivating examples Stochastic Chemical Kinetics The Chemical Master Equation The Linear Noise Approximation Monte Carlo methods: Gillespie s Stochastic Simulation Algorithm A New Approach: The Finite State Projection Method (FSP) A case study: the Pap pili stochastic switch

Why Are Stochastic Models Needed? Much of the mathematical modeling of gene networks represents gene expression deterministically Considerable experimental evidence indicates that significant stochastic fluctuations are present

Deterministic vs. Stochastic Deterministic model Protein mrna Stochastic model DNA...

Fluctuations at Small Copy Numbers Protein (protein) mrna (mrna) DNA Mean equals deterministic steady-state Coefficient of variation of grows as 1/sqrt(mean) For large mean, fluctuations are small (relative to the mean)

Intrinsic Variability in Gene Expression Protein mrna Source of variability at cellular level. Small # of molecules Random events DNA DNA DNA Cell 1 Cell 2 Cell 3 Or T=1 T=2 T=3 Intrinsic noise Noise propagate through the network Cannot be captured by deterministic models

Deterministic Models May Not Capture the Stochastic Mean Number of Molecues of P stochastic deterministic Johan Paulsson, Otto G. Berg, and Måns Ehrenberg, PNAS 2000 Time Stochastic mean value different from deterministic steady state Noise enhances signal!

Noise induced oscillations Circadian rhythm 2500 deterministic stochastic A R 2000 C 1500 A + R R 1000 A a A A ' a A A a R A ' a R D A D R 500 Vilar, Kueh, Barkai, Leibler, PNAS 2002 0 0 50 100 150 200 250 300 350 400 Time (hours) Oscillations disappear from deterministic model after a small reduction in deg. of repressor (Coherence resonance) Regularity of noise induced oscillations can be manipulated by tuning the level of noise [El-Samad, Khammash]

Stochastic Switches The Lysis-lysogony switch 120 100 80 CI 60 Equilibrium Points 40 20 0 0 20 40 60 80 100 120 Time

Stochastic Model: Pap Pili Phase Variation System Pili are hair-like structures on a bacterium Pili enable uropathogenic E. coli to attach to the upper urinary tract; plays an essential role in the pathogenesis of UTIs Pili expression switches ON or OFF such that bacterial cells are either piliated or unpiliated Piliation is controlled by a stochastic switch that involves molecular events

A Simplified Pap Switch Model local regulatory protein Lrp Site 1 Site 2 One gene g Lrp can (un)bind either or both of two binding sites A (un)binding reaction is a random event

Lrp State g 1 R 1 R 2 State g 2

Lrp State g 1 R 1 R 2 R 3 R 4 State g 2 State g 3

Lrp State g 1 R 1 R 2 R 3 R 4 State g 2 State g 3 R 5 R 6 R 7 R 8 State g 4

Lrp State g 1 PapI R 1 R 2 R 3 R 4 State g 2 State g 3 R 5 R 6 R 7 R 8 State g 4

Lrp OFF State g 1 PapI R 1 R 2 R 3 R 4 State g 2 ON R 5 R 6 R 7 R 8 State g 3 OFF State g 4 OFF

Lrp OFF State g 1 PapI R 1 R 2 R 3 R 4 State g 2 ON R 5 R 6 R 7 R 8 State g 3 OFF Piliation takes place if gene is ON at a specific time: T State g 4 OFF

Identical genotype results in different phenotype gene gene. gene gene Piliated Unpiliated

Lrp PapI R 1 R 2 R 3 R 4 OFF What is the probability of being in State g 2 at time T? State g 1 State g 2 ON Piliation takes place if gene is ON at a specific time: T R 5 R 6 R 7 R 8 State g 4 OFF State g 3 OFF

The Stochastic Formulation of Chemical Kinetics

The Chemical Master Equation The Chemical Master Equation (CME):

The Stochastic Simulation Algorithm (SSA)

The Stochastic Simulation Algorithm (SSA) P x (,μ) 1 3 2... μ M d

# of specie 2 An SSA run # of specie 1

A New Approach: The Finite State Projection (FSP) Algorithm

The Chemical Master Equation (CME): The states of the chemical system can be enumerated: The probability density state vector is defined by: The evolution of the probability density state vector is governed by

The probability density vector evolves on an infinite integer lattice

A finite subset is appropriately chosen

A finite subset is appropriately chosen The remaining (infinite) states are projected onto a single state (red).

The projected probability density state vector evolves on a finite set. The projected system can be solved exactly!

Finite Projection Bounds

The Finite State Projection (FSP) Algorithm

FSP Applied to the Pap System Lrp PapI State g 2 R 1 R2 R 3 R 4 State g 1 State g 3 R 5 R 6 R 7 R 8 State g 4 10 reactions An infinite number of possible states:

Parameters and initial conditions

Results and Comparisons Unlike Monte-Carlo methods (such as the SSA), the FSP directly approximates the solution of the CME.

FSP Results (cont.) PapI Probability Density Probability Density Guaranteed error bound: 10-6 PapI Concentration

FSP Results (cont.) Joint Probability Density g1 g2 Probability Density g3 g4 tf=10s PapI concentration Guaranteed error bound: 10-6

The Systems Paradigm Stochastic Deterministic Master Equation Langevin Fokker- Planck Continuous Boolean Hybrid Simplified & Toy Models Structure Model Reduction Detailed Mechanistic Models Experiments Validation Sensitivity Analysis Monte Carlo Simulation (SSA, ) Simulation/ Analysis Tools Linear Noise Approximation Stochastic Stability Moment Closure method New! Finite State Projection (FSP) Algorithm Reverse Engineering / Design Principles Noise Exploitation Noise Attenuation Performance Efficiency & Optimality Robustness

Stochastic Stiffness in the Heat Shock Response Fast and Frequent reactions 70 RNAP rpoh free: 0-1 mol./cell mrna total: 30-100 mol./cell DnaK total: 10000 mol./cell folded Heat unfolded DNAK Fast, rare and important reaction RNAP RNAP ftsh DNAK Lon aggregate degradation Slow reactions DnaKFtsH Lon

Conclusions Systems Biology is concerned with the quantitative analysis of networks of dynamically interacting biological components, with the goal of reverse engineering these networks to understand how they robustly achieve biological function. Control strategies abound at all levels of organization Systems notions are required for a deeper understanding of biological function. Many similarities with engineering systems.

Conclusions Low copy numbers of important cellular components give rise to stochasticity in gene expression. This in turn may result in cell to cell variations. Organisms use stochasticity to their advantage. Stochastic modeling and computation is an emerging area of research in Systems Biology. New tools are being developed, e.g. FSP, SSA. More are needed. Many challenges and opportunities for control and system theorists!

Acknowledgments Heat Shock: Hana El-Samad (UCSF), Hiro Kurata (Kyushu), John Doyle (Caltech), Carol Gross (UCSF) Finite State Projection: Brian Munsky (UCSB) Pap Switch: Brian Munsky (UCSB), David Low (UCSB), Aaron Hernday(UCSB)