Large data sets and complex models: A view from Systems Biology
Bärbel Finkenstädt, Department of Statistics, University of Warwick
OxWaSP workshop, October 5
Joint work with
Mathematical and Statistical Modelling: Sylvia Calderazzo, Simone Tiberi, Kirsty Hey, Hiroshi Momiji, Dafyd Jenkins, George Minas, David Rand (University of Warwick)
Experimental Collaboration:
PRL expression in pituitary gland (Julien Davis, Mike White, University of Manchester)
Chronotherapy and Cancer Research Unit (Francis Lévi and group), Warwick Medical School and INSERM, Paris
Molecular Neurobiology of circadian timing (Michael Hastings and group), MRC Laboratory of Molecular Biology, Cambridge
PRESTA Consortium (University of Warwick)
Funding: BBSRC, EPSRC, MRC, EU (BIOSIM), Wellcome
Gene Expression
Standard Model of Gene Expression (Single Gene)
[Diagram: DNA is transcribed to mRNA at rate $\beta(t)$; mRNA is translated to protein at rate $\alpha$; mRNA degrades at rate $\delta_m$ and protein degrades at rate $\delta_p$.]
Standard Model of Gene Expression (Single Gene, ODE Version)
\[ \frac{dm(t)}{dt} = \beta(t) - \delta_m m(t), \qquad \frac{dp(t)}{dt} = \alpha m(t) - \delta_p p(t) \]
where
$m(t)$: concentration of mRNA
$p(t)$: concentration of protein
$\beta(t)$: transcription rate of mRNA
$\alpha$: translation rate
$\delta_m, \delta_p$: degradation rates of mRNA and protein
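The ODE version can be explored numerically. The following is a minimal sketch, not taken from the talk: it integrates the two equations with a classical fourth-order Runge-Kutta step. The function name `simulate_ode`, the step size and the parameter values in the usage note are illustrative assumptions.

```python
def simulate_ode(beta, alpha, delta_m, delta_p, m0, p0, t_end, dt=0.01):
    """Integrate dm/dt = beta(t) - delta_m*m and dp/dt = alpha*m - delta_p*p.

    beta is a function of time (the transcription rate); the other
    parameters are constants. Returns a list of (t, m, p) tuples.
    """
    def rhs(t, m, p):
        # Right-hand side of the standard model of gene expression.
        return beta(t) - delta_m * m, alpha * m - delta_p * p

    n_steps = int(round(t_end / dt))
    t, m, p = 0.0, m0, p0
    path = [(t, m, p)]
    for _ in range(n_steps):
        # Classical RK4 stages.
        k1m, k1p = rhs(t, m, p)
        k2m, k2p = rhs(t + dt / 2, m + dt / 2 * k1m, p + dt / 2 * k1p)
        k3m, k3p = rhs(t + dt / 2, m + dt / 2 * k2m, p + dt / 2 * k2p)
        k4m, k4p = rhs(t + dt, m + dt * k3m, p + dt * k3p)
        m += dt / 6 * (k1m + 2 * k2m + 2 * k3m + k4m)
        p += dt / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        t += dt
        path.append((t, m, p))
    return path
```

With a constant transcription rate, for example `simulate_ode(lambda t: 10.0, 2.0, 1.0, 0.5, 0.0, 0.0, 40.0)`, the trajectory should settle at the steady state $m^* = \beta/\delta_m$ and $p^* = \alpha m^*/\delta_p$, which gives a quick sanity check on the implementation.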
Standard Model of Gene Expression (Stochastic Version, Single Gene)
State vector: $X(t) = (X_m(t), X_p(t))^T$
4 reactions (transcription, degradation of mRNA, translation, degradation of protein) with state changes
\[ v_1 = (1, 0)^T, \quad v_2 = (-1, 0)^T, \quad v_3 = (0, 1)^T, \quad v_4 = (0, -1)^T \]
A reaction of type $j$ changes $X(t)$ to $X(t) + v_j$. Each reaction occurs at a rate $w_j(X(t))$.
Event                    Effect                                        Transition rate
Transcription            $(X_m, X_p) \to (X_m + 1, X_p)$               $w_1 = \beta(t)$
Degradation of mRNA      $(X_m, X_p) \to (X_m - 1, X_p)$               $w_2 = \delta_m X_m(t)$
Translation              $(X_m, X_p) \to (X_m, X_p + 1)$               $w_3 = \alpha X_m(t)$
Degradation of protein   $(X_m, X_p) \to (X_m, X_p - 1)$               $w_4 = \delta_p X_p(t)$
Table 1: Summary of reactions in the standard model of gene expression

Reaction networks constitute continuous-time Markov jump processes and thus satisfy the Chapman-Kolmogorov equation, whose forward form is known as the master equation (ME). The ME describes the evolution of the probability $P(X_m = n_1, X_p = n_2; t)$. Although an exact numerical simulation algorithm is available (Gillespie, 1977), the ME is rarely tractable, and hence an explicit formula for the exact likelihood is not available for parameter inference.
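The exact simulation algorithm mentioned above can be sketched for this four-reaction system. This is an illustrative implementation of Gillespie's direct method, assuming a constant transcription rate `beta` (a time-varying $\beta(t)$ would need an extra device such as thinning, which is not shown). The function name `gillespie` and all parameter values are assumptions made for the example.

```python
import random

def gillespie(beta, alpha, delta_m, delta_p, x_m=0, x_p=0, t_end=10.0, seed=0):
    """Simulate the jump process (X_m, X_p) up to time t_end.

    Returns a list of (time, X_m, X_p) recorded at every reaction event.
    Assumes a constant transcription rate beta.
    """
    rng = random.Random(seed)
    # State changes v_1..v_4: transcription, mRNA degradation,
    # translation, protein degradation.
    changes = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    t = 0.0
    path = [(t, x_m, x_p)]
    while True:
        # Transition rates w_1..w_4 evaluated at the current state.
        rates = [beta, delta_m * x_m, alpha * x_m, delta_p * x_p]
        total = sum(rates)
        if total <= 0.0:
            break  # no reaction can fire any more
        # The waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick reaction j with probability rates[j] / total.
        u = rng.random() * total
        j, acc = 0, rates[0]
        while u >= acc and j < 3:
            j += 1
            acc += rates[j]
        x_m += changes[j][0]
        x_p += changes[j][1]
        path.append((t, x_m, x_p))
    return path
```

For constant rates the mRNA count is a birth-death process whose long-run mean is $\beta/\delta_m$, so the time average of $X_m$ over a long run gives a rough check of the simulator.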
Gene Networks
Figure: A model for the mammalian circadian clock. Relógio A, Westermark PO, Wallach T, Schellenberg K, Kramer A, et al. (2011) Tuning the Mammalian Circadian Clock: Robust Synergy of Two Loops. PLoS Comput Biol 7(12): e1002309. doi:10.1371/journal.pcbi.1002309
[Equations (1) to (13) of the circadian clock model, one ODE per component: CLOCK/BMAL (1), Rev-Erb (2), Ror (3), REV-ERB_C (4), ROR_C (5), REV-ERB_N (6), ROR_N (7), Bmal (8), BMAL_C (9), BMAL_N (10), Per (11), Cry (12), CRY_C (13)]
Gene Expression Data
Plant Microarray time series (PRESTA project)
[Figure: four panels of normalised expression (Norm. exp.) against timescale (hours)]
Searching for transcription factors (TFs) of gene regulation from microarray data
Scenario: Have (replicate) time series microarray gene expression data across various experiments and a set of candidate parents from Y1H experiments
Modeling:
Transcription $\beta(t)$ of a gene (child gene) is regulated by transcriptional activators and/or inhibitors. These are protein products, called transcription factors (TFs), of other genes (parent genes).
We wish to draw inference about the regulation of mRNA transcription of a child gene by an unknown subset $\Gamma = \{\gamma_1, \gamma_2, \ldots, \gamma_{|\Gamma|}\}$ of the candidate parents $G = \{g_1, g_2, \ldots, g_G\}$.
Switches occur at times where the expression $P_\gamma(t)$ of a parent $\gamma \in \Gamma$ crosses a threshold level due to an increase or decrease.
Define the activation function $\phi(t) = (\phi_1(t), \phi_2(t), \ldots, \phi_{|\Gamma|}(t))$.
The set of switch times of the child's transcription is the set of time points $\{s_1, s_2, \ldots, s_n\}$ at which at least one parent crosses its threshold.
The total time interval is split into subintervals $[0, s_1], (s_1, s_2], \ldots, (s_n, l]$; two subintervals have the same transcription rate if all parent genes are in the same state on both.
For a given activation function $\phi(t)$ we then have a set of parental states $b_j$, $j = 1, 2, \ldots, \kappa$, each observed in the union $I_{b_j}$ of (at least one of) the above subintervals, with corresponding transcription rates $\beta_{b_j}$, $j = 1, 2, \ldots, \kappa$.
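The construction of switch times and parental states can be illustrated on a discrete time grid. This is a sketch under simplifying assumptions that are not from the talk: each parent is "on" when its expression exceeds a fixed threshold, and a switch time is approximated by the first grid point at which the joint state differs from the previous one. The names `activation_states` and `intervals_by_state` are hypothetical.

```python
def activation_states(times, parents, thresholds):
    """Return (names, states, switch_times) for thresholded parent profiles.

    parents maps a parent name to its expression values on `times`;
    thresholds maps a parent name to its threshold level.
    states[i] is a tuple of 0/1 indicators, one per parent, at times[i].
    """
    names = sorted(parents)
    states = [
        tuple(int(parents[g][i] > thresholds[g]) for g in names)
        for i in range(len(times))
    ]
    # A switch occurs wherever at least one parent changes its indicator.
    switch_times = [
        times[i] for i in range(1, len(times)) if states[i] != states[i - 1]
    ]
    return names, states, switch_times

def intervals_by_state(times, states):
    """Group runs of identical states and collect them per parental state."""
    groups = {}
    start = 0
    for i in range(1, len(times) + 1):
        if i == len(times) or states[i] != states[start]:
            # The run from index start to i-1 shares one parental state.
            groups.setdefault(states[start], []).append(
                (times[start], times[i - 1])
            )
            start = i
    return groups
```

Each key of the returned dictionary plays the role of a parental state $b_j$, and its list of runs plays the role of the union $I_{b_j}$; one transcription rate would then be attached to each key.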
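\[ \frac{dM(t)}{dt} = \begin{cases} \beta_{b_1} - \delta_M M(t), & \text{for } t \in I_{b_1} \\ \beta_{b_2} - \delta_M M(t), & \text{for } t \in I_{b_2} \\ \quad \vdots & \quad \vdots \\ \beta_{b_\kappa} - \delta_M M(t), & \text{for } t \in I_{b_\kappa} \end{cases} \]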
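On each subinterval the ODE is linear with a constant transcription rate, so it has the closed-form solution $M(t) = \beta/\delta_M + (M(s) - \beta/\delta_M) e^{-\delta_M (t - s)}$ for $t$ in a subinterval starting at $s$. The sketch below chains these solutions across subintervals. The function name `solve_piecewise` and the argument layout are assumptions made for illustration.

```python
import math

def solve_piecewise(m0, delta_m, breakpoints, betas, t_query):
    """Evaluate M(t) for the piecewise-linear ODE at the times in t_query.

    breakpoints = [0, s_1, ..., s_n, l] and betas[i] is the transcription
    rate on the subinterval from breakpoints[i] to breakpoints[i + 1].
    Query times are assumed to lie in [0, l].
    """
    values = []
    for t in t_query:
        m, t_start = m0, breakpoints[0]
        for i, beta in enumerate(betas):
            t_stop = min(t, breakpoints[i + 1])
            if t_stop > t_start:
                # Relax exponentially towards the local equilibrium beta/delta_m.
                eq = beta / delta_m
                m = eq + (m - eq) * math.exp(-delta_m * (t_stop - t_start))
                t_start = t_stop
            if t <= breakpoints[i + 1]:
                break
        values.append(m)
    return values
```

Because the solution is linear in the rates `betas` for a fixed degradation rate, fitting the rates to data reduces to a least squares problem, which is one way to read the remark that a weighted least squares solution makes the algorithm fast.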
Statistical Implementation:
Bayesian approach: RJMCMC to infer plausible models for parents
Likelihood based on normal error with inhomogeneous variance
Weighted least squares solution to the piecewise linear ODE makes the algorithm fast
Delayed RNA as proxy for unobserved TFs
Single cell imaging data
Reporter Gene constructs
[Diagram: regulation acts on DNA; the gene is transcribed to mRNA and translated to protein, and the reporter is transcribed to reporter mRNA and translated to reporter protein; each mRNA and protein species degrades; the measurements are light intensity from the reporter protein.]
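Modelling single cell data
The class of state space models provides a unifying framework for modelling stochastic reaction networks (SRNs):
\[ Y_t \sim g(y_t \mid x_t, \theta), \qquad X_t \sim h(x_t \mid x_{t-1}, \theta) \]
[Diagram: observations $Y_0, Y_1, Y_2, \ldots, Y_T$ above the latent states $X_0, X_1, X_2, \ldots, X_T$]
$h$ is the transition density of the approximating SRN
$g$ is the density of the measurement process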
Approximations
ODE (neglects intrinsic noise)
Chemical Langevin Equation (usually intractable)
Linear Noise Approximation (linearization of the ME, tractable)
Other approximations?
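Parameter Inference
The data likelihood is given by the marginal density
\[ f(y \mid \theta) = \int f(y, x \mid \theta) \, dx \]
where the integrand can be factorized as
\[ f(y, x \mid \theta) = h(x_0 \mid \theta) \, g(y_0 \mid x_0, \theta) \prod_{t=1}^{T} h(x_t \mid x_{t-1}, \theta) \, g(y_t \mid x_t, \theta) \]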
Parameter Inference
Under the LNA (with Gaussian measurement error):
The integral can be evaluated explicitly using Kalman methodology.
Inference can be achieved by sampling from the posterior $f(\theta \mid y)$.
Under BDA (for example): use a 2-step Gibbs sampler:
1. Sample the parameter vector from $f(\theta \mid x, y)$
2. Sample the latent states from the filtering density $f(x \mid y, \theta)$
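To illustrate how Kalman methodology evaluates the marginal likelihood explicitly, here is a minimal sketch for a scalar linear Gaussian state space model. It is a simplified stand-in, not the LNA itself: the transition $x_t = a x_{t-1} + c + \text{noise}$ with constant coefficients, the identity observation map, and the function name `kalman_loglik` are all assumptions for the example, whereas the LNA would supply time-dependent transition moments.

```python
import math

def kalman_loglik(y, a, c, q, r, mu0, p0):
    """Return log f(y | theta) for the scalar model

        x_0 ~ N(mu0, p0),
        x_t = a * x_{t-1} + c + N(0, q),   t >= 1,
        y_t = x_t + N(0, r),               t >= 0.
    """
    mean, var = mu0, p0  # predictive mean and variance of the latent state
    loglik = 0.0
    for t, obs in enumerate(y):
        if t > 0:
            # Prediction step: push the filtered moments through the transition.
            mean = a * mean + c
            var = a * a * var + q
        # Likelihood contribution of y_t given all earlier observations.
        s = var + r
        resid = obs - mean
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + resid * resid / s)
        # Update step: condition the latent state on y_t.
        gain = var / s
        mean += gain * resid
        var *= 1.0 - gain
    return loglik
```

The returned value is the log of the integral over the latent states, accumulated one observation at a time, so it can be plugged into a sampler that targets the posterior of the parameters without ever sampling the latent path.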
[Figure: two panels plotted against Time (hours)]
Challenges for statisticians in Systems Biology and Systems Medicine
Complex and large networks of genes and their products
Data from many sources:
    Replicates, and if so what type?
    Many cells (Bayesian Hierarchical Modeling)
    Various labs (Prior Distributions)
    Destructive sampling
Different types of experiments imply different modeling approaches:
    Information about intrinsic noise? Extrinsic noise?
    Destructive sampling?
    Reporter gene constructs? Which reporter gene?
    Type of camera used for imaging
Not all state variables are observed. Can distributed delay functions be used as a proxy?
Parameter identifiability
Spatio-temporal modelling and inference
Collaboration between experimentalists and mathematicians/statisticians
Research Questions
Mammalian circadian oscillator under diseases and treatments. Exploitation for chronotherapy
Development of signal processing tools for circadian time series to be used in forecasting systems for patient care at home
Development of spatio-temporal models to investigate how cells synchronize in the SCN
[Figure: photon time series, panel labelled "(shBmal)", with pump implantation marked; x-axis: day after tumour inoculation]
[Figure: CT49: KI/KI Per::luc male mouse (B); traces: Activity and Photon; annotations: pump implantation, DEN (5 mg/g/j, ip); x-axis: Day]
[Figure: ZCM PICADO; temperature recordings over time; series: sensor temperature (Temp capteur) and collector temperature (Temp collecteur)]