Gene Expression as a Stochastic Process: From Gene Number Distributions to Protein Statistics and Back

Gene Expression as a Stochastic Process: From Gene Number Distributions to Protein Statistics and Back June 19, 2007

Outline

1. Motivation & Basics
2. A Stochastic Approach to Gene Expression
3. Application to Experimental Data
4. Summary & Outlook

Gene Copy Number and Transfection

A big hope of gene therapy is to treat diseases with artificial viruses that bring genes (coding for beneficial proteins) into the cell.

Bad treatment: heterogeneous distribution of plasmids. Many cells get no plasmids, a few cells get many plasmids.

Good treatment: homogeneous distribution of plasmids. Most cells get a small number of plasmids.

The Central Dogma of Biology

After import of the genetic material, genes are expressed by the cellular machinery via transcription and translation. Each reaction is an inherently stochastic process, and thus a spread in protein numbers is found after gene expression.

Intrinsic and Extrinsic Noise

In biological systems noise arises from two sources:

1. Intrinsic noise, due to the probabilistic nature of chemical reactions. It can be treated by means of probability calculus: master equation, Fokker-Planck equation, simulations.
2. Extrinsic noise, due to variations in rate constants (different cell volume, temperature, cell cycle state, number of enzymes, etc.). Its nature and strength are usually unknown.

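Deterministic Approach

Assume the gene number $D$ is fixed:

\[ \frac{\partial R(t)}{\partial t} = \lambda_1 D - \delta_1 R(t), \qquad \frac{\partial P(t)}{\partial t} = \lambda_2 R(t) - \delta_2 P(t) \]

These equations can be solved successively:

\[ R(t) = D\,\frac{\lambda_1}{\delta_1}\left(1 - e^{-\delta_1 t}\right) \]

The expression for $P(t)$ is more complicated, but one finds

\[ P(t \to \infty) = D\,C \]

with the expression factor $C := \frac{\lambda_1 \lambda_2}{\delta_1 \delta_2}$, which gives the number of proteins per gene.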
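The deterministic picture can be checked numerically. The following is a minimal sketch, not part of the original slides: it integrates the two rate equations with a simple forward-Euler step and compares the result with the closed-form $R(t)$ and the limit $P(t \to \infty) = D\,C$. The parameter values, the initial condition $R(0) = P(0) = 0$, and the function names are illustrative assumptions.

```python
import math

def mrna_closed_form(t, D, lam1, del1):
    """Closed-form R(t) = D*lam1/del1 * (1 - exp(-del1*t)), assuming R(0) = 0."""
    return D * lam1 / del1 * (1.0 - math.exp(-del1 * t))

def integrate_rate_equations(D, lam1, del1, lam2, del2, t_end, dt=1e-3):
    """Forward-Euler integration of dR/dt and dP/dt, starting from R = P = 0."""
    R, P = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dR = lam1 * D - del1 * R      # transcription minus mRNA degradation
        dP = lam2 * R - del2 * P      # translation minus protein degradation
        R += dt * dR
        P += dt * dP
    return R, P

# Illustrative (assumed) parameter values, not taken from the slides.
D, lam1, del1, lam2, del2 = 3, 2.0, 1.0, 5.0, 0.5
C = lam1 * lam2 / (del1 * del2)       # expression factor: proteins per gene

R_short, _ = integrate_rate_equations(D, lam1, del1, lam2, del2, t_end=1.0)
R_long, P_long = integrate_rate_equations(D, lam1, del1, lam2, del2, t_end=40.0)

print("R(1):  numeric", R_short, " closed form", mrna_closed_form(1.0, D, lam1, del1))
print("P(40): numeric", P_long, " D*C", D * C)
```

Forward Euler is used only because it keeps the sketch dependency-free; any standard ODE solver would serve equally well.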

Stochasticity - The Master Equation

However, transcription, translation and degradation are stochastic processes.

Probabilistic approach: master equation. We have a 2d state space; each state is characterized by $R$ and $P$. Usually we would need to deal with the joint probability $p_{R,P}$. Instead we split up the problem into two master equations:

\[ \frac{\partial p_R}{\partial t} = \lambda_1 D\, p_{R-1} + \delta_1 (R+1)\, p_{R+1} - (\lambda_1 D + \delta_1 R)\, p_R \]

\[ \frac{\partial p_P}{\partial t} = \lambda_2 R(t)\, p_{P-1} + \delta_2 (P+1)\, p_{P+1} - (\lambda_2 R(t) + \delta_2 P)\, p_P \]

The first equation is decoupled from the second and can be solved exactly, while the second one is trickier.
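The four reactions behind these master equations can also be simulated directly. Below is a minimal sketch of a Gillespie-type stochastic simulation (a technique the slides mention only generically under "simulations"); the parameter values, the ensemble size, and the function name `gillespie_two_stage` are illustrative assumptions. It compares ensemble means with the deterministic steady-state values.

```python
import random

def gillespie_two_stage(D, lam1, del1, lam2, del2, t_end, rng):
    """Simulate one cell from R = P = 0 and return (R, P) at time t_end."""
    t, R, P = 0.0, 0, 0
    while True:
        # Propensities: transcription, mRNA decay, translation, protein decay.
        rates = (lam1 * D, del1 * R, lam2 * R, del2 * P)
        total = sum(rates)
        if total == 0.0:              # nothing can happen (e.g. D = 0)
            return R, P
        t += rng.expovariate(total)   # exponential waiting time to next event
        if t > t_end:
            return R, P
        u = rng.random() * total      # pick a reaction proportional to its rate
        if u < rates[0]:
            R += 1
        elif u < rates[0] + rates[1]:
            R -= 1
        elif u < rates[0] + rates[1] + rates[2]:
            P += 1
        else:
            P -= 1

# Illustrative (assumed) parameters; 400 independent cells, fixed seed.
rng = random.Random(0)
D, lam1, del1, lam2, del2 = 1, 2.0, 1.0, 5.0, 0.5
samples = [gillespie_two_stage(D, lam1, del1, lam2, del2, 20.0, rng)
           for _ in range(400)]
mean_R = sum(r for r, _ in samples) / len(samples)
mean_P = sum(p for _, p in samples) / len(samples)

print("mean R:", mean_R, " deterministic:", D * lam1 / del1)
print("mean P:", mean_P, " deterministic:", D * lam1 * lam2 / (del1 * del2))
```

The ensemble means should scatter around the deterministic values; how closely they agree depends on the number of simulated cells.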

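mRNA Distribution

The solution to

\[ \frac{\partial p_R}{\partial t} = \lambda_1 D\, p_{R-1} + \delta_1 (R+1)\, p_{R+1} - (\lambda_1 D + \delta_1 R)\, p_R \]

is given by a Poisson distribution

\[ p_R(t) = \frac{\mu_1(t)^R}{R!}\, e^{-\mu_1(t)}, \]

where

\[ \mu_1(t) = D\,\frac{\lambda_1}{\delta_1}\left(1 - e^{-\delta_1 t}\right) \]

is the mean mRNA number, as also given by the deterministic rate equations.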
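That the time-dependent Poissonian really solves the mRNA master equation can be verified numerically. The sketch below, with assumed illustrative parameter values, evaluates the right-hand side of the master equation for the Poisson ansatz and compares it with a central finite-difference estimate of $\partial p_R / \partial t$.

```python
import math

def mu1(t, D, lam1, del1):
    """Mean mRNA number mu_1(t) = D*lam1/del1 * (1 - exp(-del1*t))."""
    return D * lam1 / del1 * (1.0 - math.exp(-del1 * t))

def poisson_pmf(k, mu):
    """Poisson probability of k for mean mu > 0 (zero for k < 0)."""
    if k < 0:
        return 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def master_rhs(R, t, D, lam1, del1):
    """Right-hand side of the mRNA master equation for the Poisson ansatz."""
    m = mu1(t, D, lam1, del1)
    gain = lam1 * D * poisson_pmf(R - 1, m) + del1 * (R + 1) * poisson_pmf(R + 1, m)
    loss = (lam1 * D + del1 * R) * poisson_pmf(R, m)
    return gain - loss

def time_derivative(R, t, D, lam1, del1, h=1e-5):
    """Central finite-difference estimate of d p_R / dt."""
    up = poisson_pmf(R, mu1(t + h, D, lam1, del1))
    down = poisson_pmf(R, mu1(t - h, D, lam1, del1))
    return (up - down) / (2.0 * h)

# Illustrative (assumed) parameters and time point.
D, lam1, del1, t = 2, 3.0, 1.0, 0.7
max_residual = max(
    abs(master_rhs(R, t, D, lam1, del1) - time_derivative(R, t, D, lam1, del1))
    for R in range(30)
)
print("largest mismatch between both sides:", max_residual)
```

The mismatch is limited only by the finite-difference step, which is what one expects if the ansatz is an exact solution.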

Interlude: The Poisson Distribution

Some properties:

- One-parametric distribution, i.e. the mean $\langle X \rangle$ fully determines the distribution.
- The mean is equal to the variance: $\langle X \rangle = \mathrm{var}(X)$.
- For large mean, by the central limit theorem, a Poissonian is well approximated by a Gaussian.

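Protein Distribution

\[ \frac{\partial p_P}{\partial t} = \lambda_2 R(t)\, p_{P-1} + \delta_2 (P+1)\, p_{P+1} - (\lambda_2 R(t) + \delta_2 P)\, p_P \]

is analogous to the master equation for $p_R$, apart from the random variable $R(t)$ taking the place of $D$. For a given realization of $R(t)$, the solution is yet again a Poisson distribution:

\[ p_P(t) = \frac{\mu_2(t)^P}{P!}\, e^{-\mu_2(t)} \]

Now the mean is a functional of $R(t)$:

\[ \mu_2[R(t)] = \lambda_2 \left( \int_0^t R(t')\, e^{\delta_2 t'}\, dt' \right) e^{-\delta_2 t} \approx \frac{\lambda_2 \int_0^t R(t')\, e^{\delta_2 t'}\, dt'}{\delta_2 \int_0^t e^{\delta_2 t'}\, dt'} \qquad (t \gg 1/\delta_2) \]

The two expressions differ only by the factor $1 - e^{-\delta_2 t}$, which approaches 1 for $t \gg 1/\delta_2$.

This is a weighted temporal average of $R(t)$, where the weighting function is $\exp(\delta_2 t')$. The recent past has the most weight!

Problem: every cell has a different realization of $R(t)$, so $\mu_2$ is different for every cell!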
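The weighted-average reading of $\mu_2$ can be illustrated numerically. In the following sketch the mRNA history `step_history` (a jump from 4 to 10 copies at $t = 30$), the rate values, and the function names are all hypothetical choices made for illustration; both forms of $\mu_2$ are evaluated with the trapezoidal rule.

```python
import math

def mu2_direct(R_of_t, lam2, del2, t, n=20000):
    """lam2 * exp(-del2*t) * integral_0^t R(t') exp(del2*t') dt' (trapezoid)."""
    h = t / n
    total = 0.0
    for i in range(n + 1):
        tp = i * h
        end_weight = 0.5 if i in (0, n) else 1.0
        # exp(del2*(tp - t)) folds in the prefactor and avoids overflow
        total += end_weight * R_of_t(tp) * math.exp(del2 * (tp - t))
    return lam2 * h * total

def mu2_weighted_average(R_of_t, lam2, del2, t, n=20000):
    """(lam2/del2) times the exp(del2*t')-weighted time average of R(t')."""
    h = t / n
    num, den = 0.0, 0.0
    for i in range(n + 1):
        tp = i * h
        w = (0.5 if i in (0, n) else 1.0) * math.exp(del2 * (tp - t))
        num += w * R_of_t(tp)
        den += w
    return lam2 / del2 * num / den

def step_history(tp):
    """Hypothetical mRNA history: 4 copies until t' = 30, then 10 copies."""
    return 4.0 if tp < 30.0 else 10.0

lam2, del2, t_now = 5.0, 0.5, 40.0   # assumed values; t_now >> 1/del2
direct = mu2_direct(step_history, lam2, del2, t_now)
averaged = mu2_weighted_average(step_history, lam2, del2, t_now)

print("direct form:          ", direct)
print("weighted-average form:", averaged)
print("lam2/del2 * recent R: ", lam2 / del2 * 10.0)
```

Although the cell spent three quarters of its history at 4 mRNA copies, the result lies close to the value set by the recent level of 10 copies, which is the "recent past has the most weight" statement in numbers.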

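Separation of Time Scales: 1) mRNA kinetics $\ll 1/\delta_2$

$R(t)$ changes rapidly compared to the lifetime $1/\delta_2$ of the proteins, i.e. $R(t)$ fully explores its distribution while the proteins in each cell only see the average $\langle R(t) \rangle = \mu_1$. For $t \gg 1/\delta_2$:

\[ \mu_2(t) = \frac{\lambda_2 \int_0^t R(t')\, e^{\delta_2 t'}\, dt'}{\delta_2 \int_0^t e^{\delta_2 t'}\, dt'} \approx \frac{\lambda_2 \int_0^t \mu_1(t')\, e^{\delta_2 t'}\, dt'}{\delta_2 \int_0^t e^{\delta_2 t'}\, dt'} \xrightarrow{\; t \to \infty \;} \underbrace{\frac{\lambda_1 \lambda_2}{\delta_1 \delta_2}}_{:= C}\, D \]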

Separation of Time Scales: 2) mRNA kinetics $\gg 1/\delta_2$

$R(t)$ changes sluggishly, while the proteins follow that signal and equilibrate to the new steady state, forgetting the past very fast. The mean of $P$ is determined only by the recent past of $R(t)$, which can be assumed to be constant in that period. For cells which presently have $R$ mRNAs, the proteins have a Poisson distribution with mean

\[ \mu_2(t) = \frac{\lambda_2 \int_0^t R\, e^{\delta_2 t'}\, dt'}{\delta_2 \int_0^t e^{\delta_2 t'}\, dt'} = \frac{\lambda_2}{\delta_2}\, R. \]

For the whole population we have to sum over all possible states of $R$, each weighted with its probability:

\[ p_P = \sum_{R=0}^{\infty} p_R\, \frac{\left(\frac{\lambda_2}{\delta_2} R\right)^P}{P!}\, e^{-\frac{\lambda_2}{\delta_2} R} \]

A superposition of Poissonians!

Separation of Time Scales: 2) mRNA kinetics $\gg 1/\delta_2$

Examples: the distribution of mRNA is still visible in the distribution of proteins.

Note: for $R = 0$ the Poissonian for $P$ collapses to a peak at $P = 0$ with height $p_{R=0}$.
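A superposition of Poissonians of this kind is easy to tabulate. The sketch below assumes, for illustration only, a stationary Poisson mRNA distribution with mean `mu1 = 2` and a ratio `b = lam2/del2 = 30`; the truncation limits and function names are likewise assumptions. It shows the peak at $P = 0$ and the separate peak caused by cells with $R = 1$.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability; for mu = 0 the distribution is a point mass at k = 0."""
    if mu == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def protein_distribution(mu1, b, P_max, R_max=60):
    """p_P = sum_R p_R * Poisson(P; b*R), with p_R Poisson(mu1) and b = lam2/del2."""
    p_R = [poisson_pmf(R, mu1) for R in range(R_max + 1)]
    return [
        sum(p_R[R] * poisson_pmf(P, b * R) for R in range(R_max + 1))
        for P in range(P_max + 1)
    ]

# Illustrative (assumed) values: few mRNAs, many proteins per mRNA.
mu1, b = 2.0, 30.0
p_P = protein_distribution(mu1, b, P_max=600)
mean_P = sum(P * p for P, p in enumerate(p_P))

print("normalization:", sum(p_P))
print("p_P(0):", p_P[0], " p_R(0):", math.exp(-mu1))
print("mean P:", mean_P, " b*mu1:", b * mu1)
print("peak near P = b:", p_P[30], " valley at P = b/2:", p_P[15])
```

With these values the peaks belonging to $R = 0, 1, 2, \dots$ are clearly separated, so the mRNA distribution can be read off the protein distribution.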

Random Number of Genes

Upon viral infection, transfection, or generally in bacteria carrying plasmids or minichromosomes, the number of genes varies from individual to individual. Thus $D$ is no longer constant, but itself a random variable, subject to a distribution $p_D$. In general, to find the protein distribution $p_P^{\mathrm{tot}}$ for the whole population we have to sum over the protein distributions $p_P(D)$ of the subpopulations with gene copy number $D$ according to their respective probabilities:

\[ p_P^{\mathrm{tot}} = \sum_{D=0}^{\infty} p_D\, p_P(D) \]

Since this expression cannot, in general, be determined explicitly, we stick to the biologically relevant case mRNA kinetics $\ll 1/\delta_2$, as discussed above. Again we find a sum of Poissonians, now with $\mu_2 = DC$:

\[ p_P = \sum_{D=0}^{\infty} p_D\, \frac{\mu_2^P}{P!}\, e^{-\mu_2} = \sum_{D=0}^{\infty} p_D\, \frac{(DC)^P}{P!}\, e^{-DC} \]

Note: in the opposite case (mRNA kinetics $\gg 1/\delta_2$) we would have a superposition of superpositions of Poissonians...

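Random Number of Genes

\[ p_P = \sum_{D=0}^{\infty} p_D\, \frac{(DC)^P}{P!}\, e^{-DC} \]

Why is this interesting? Because of the properties of the Poisson distribution, and because often $C \gg 1$!

1. For $C \gg 1$ the Poissonians have a large mean, so they can be approximated by Gaussians!
2. The distance between the means of two adjacent Poissonians is $C$, while their respective widths go like $\sigma = \sqrt{DC}$. Hence there is significant overlap only for $D > \frac{(C-1)^2}{4C}$. This bound follows from requiring that the widths of the peaks at $D$ and $D+1$ add up to more than their separation, $\sqrt{DC} + \sqrt{(D+1)C} > C$.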

From the Protein Distribution to Copy Number Statistics

Examples

As long as the separation of the Gauss peaks is much greater than their widths, one can even approximate them by a sum of delta peaks (discretized approximation):

\[ p_P = \sum_{D} p_D\, \delta_{P,\,D \cdot C}, \qquad D \in \mathbb{N}_0 \]

Approximation                          Mean ⟨P⟩   Variance σ²(P)
Sum of Poissonians                     500        5.05 × 10^4
Sum of Gaussians                       500        5.05 × 10^4
Sum of Gaussians with η_ext = 0.1      500        5.35 × 10^4
Sum of δ-peaks                         500        5.00 × 10^4
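The slides do not state which parameters produced these numbers. The following sketch shows one set of assumptions that is consistent with them: a Poisson copy-number distribution with $\langle D \rangle = 5$ and $C = 100$, both inferred here and not given in the source. The treatment of extrinsic noise (each Gaussian gets an extra relative width $\eta_{ext}$) is likewise only a guess that happens to match the third row.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability of k for mean mu > 0."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

# ASSUMED values (not stated in the slides): C = 100, Poisson p_D with mean 5.
C, mean_D, D_max = 100.0, 5.0, 80
p_D = [poisson_pmf(D, mean_D) for D in range(D_max + 1)]

def mixture_moments(component_var):
    """Mean and variance of sum_D p_D * (peak with mean D*C, variance component_var(D))."""
    mean = sum(p * D * C for D, p in enumerate(p_D))
    second = sum(p * (component_var(D) + (D * C) ** 2) for D, p in enumerate(p_D))
    return mean, second - mean ** 2

eta_ext = 0.1  # relative extrinsic width; this interpretation is an assumption
rows = {
    "Sum of Poissonians / Gaussians": mixture_moments(lambda D: D * C),
    "Sum of Gaussians, eta_ext = 0.1": mixture_moments(
        lambda D: D * C + (eta_ext * D * C) ** 2),
    "Sum of delta peaks": mixture_moments(lambda D: 0.0),
}
for name, (mean, var) in rows.items():
    print(f"{name:34s} mean = {mean:7.1f}   variance = {var:9.1f}")
```

Under these assumptions the Poissonian and Gaussian sums share the same first two moments by construction, and the delta-peak approximation only drops the within-peak variance $C \langle D \rangle$, which is small compared with $C^2\,\mathrm{var}(D)$ when $C \gg 1$.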

Single Cell Protein Measurements

Single cell studies make it possible to obtain the protein numbers of individual cells (e.g. by use of GFP and derivatives), but the gene number distribution cannot be measured directly, and sometimes the rate constants and the expression factor are unknown. In these cases the above theory can be applied, if $C \gg 1$ and mRNA kinetics $\ll 1/\delta_2$:

1. Compute the mean $\langle P \rangle$ and the variance $\mathrm{var}(P)$ of the measured protein numbers.
2. Use the discretized approximation: mean and variance are homogeneous functions of $C$ of degree 1 and 2, respectively, so that $C = \frac{\mathrm{var}(P)}{\langle P \rangle}$. (With $\langle P \rangle = C \langle D \rangle$ and $\mathrm{var}(P) = C^2\, \mathrm{var}(D)$, this ratio equals $C$ when $\mathrm{var}(D) = \langle D \rangle$, as for a Poisson-distributed copy number.)
3. Compute the mean gene copy number $\langle D \rangle = \frac{\langle P \rangle}{C}$.
4. If the gene copy number distribution is Poisson (meaningful for transfection), then we know everything about it!
5. From the resulting $p_D$ we can compute the theoretical $p_P$ and compare it to the measured protein distribution as a check for consistency.
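Steps 1 to 4 can be sketched in a few lines. The function name `infer_copy_number_statistics` is hypothetical, and the input moments are the illustrative values $\langle P \rangle = 500$ with variances $5.00 \times 10^4$ and $5.05 \times 10^4$ from the example table, not experimental data. The sketch assumes a Poisson-distributed copy number throughout.

```python
import math

def infer_copy_number_statistics(mean_P, var_P):
    """Discretized approximation with Poisson-distributed gene copy number D.

    Returns (C, mean_D, p_D0): expression factor, mean copy number, and the
    fraction of cells without any gene copy, p_{D=0} = exp(-mean_D).
    """
    C = var_P / mean_P            # step 2: C = var(P) / <P>
    mean_D = mean_P / C           # step 3: <D> = <P> / C
    p_D0 = math.exp(-mean_D)      # step 4: Poisson p_D is fixed by its mean
    return C, mean_D, p_D0

def poisson_copy_number_pmf(D, mean_D):
    """p_D for a Poisson copy-number distribution with mean mean_D > 0."""
    return math.exp(D * math.log(mean_D) - mean_D - math.lgamma(D + 1))

# Moments of a sum of delta peaks: the procedure is exact in this limit.
C_delta, D_delta, p0_delta = infer_copy_number_statistics(500.0, 5.00e4)
# Moments of a sum of Poissonians: the within-peak variance biases C slightly.
C_pois, D_pois, p0_pois = infer_copy_number_statistics(500.0, 5.05e4)

print("delta peaks: C =", C_delta, " <D> =", D_delta, " p_D=0 =", p0_delta)
print("Poissonians: C =", C_pois, " <D> =", D_pois, " p_D=0 =", p0_pois)
```

For a sum of Poissonians the estimate is $\mathrm{var}(P)/\langle P \rangle = C + 1$ rather than $C$, a relative error of order $1/C$, which is why the procedure requires $C \gg 1$.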

Results

Non-fluorescent cells allow for independent measurement. Strong noise and bias to the left call for improved experiments and data analysis.

                                                                   C from        C from        C from
                                                                   p_{D=0}       p_{D=0}       ⟨P⟩
Sample         p_{D=0}   ⟨P⟩           σ²(P)          ⟨D⟩          and ⟨P⟩       and σ²(P)     and σ²(P)
PEI synch.     0.4       4.46 × 10^6   9.44 × 10^12   1.38         3.49 × 10^6   2.46 × 10^6   3.24 × 10^6
PEI asynch.    0.23      2.56 × 10^6   5.84 × 10^12   1.29         2.25 × 10^6   1.26 × 10^6   1.99 × 10^6
Lipo synch.    0.3       5.91 × 10^6   1.65 × 10^13   1.38         4.97 × 10^6   2.54 × 10^6   4.29 × 10^6
Lipo asynch.   0.3       3.75 × 10^6   1.20 × 10^13   1.29         3.15 × 10^6   2.16 × 10^6   2.90 × 10^6

Summary:

- Distributions give us information about the underlying processes.
- The expression factor $C := \frac{\lambda_1 \lambda_2}{\delta_1 \delta_2}$ can be obtained from the protein distribution, yielding a functional relationship between the rates.
- The mean number of genes $\langle D \rangle$ and even the distribution of genes can be computed.
- The transfection process can be tested for quality.

Outlook:

- Incorporate promoter activity, poly-A mRNA degradation, etc. into the analysis.
- Check derived results by tuning rates: modification of the promoter sequence, destabilizing proteins, mutations in the gene's open reading frames...
- Improve the experimental setup, better data analysis, reduce extrinsic noise.

Thanks for your attention!