Large data sets and complex models: A view from Systems Biology

Similar documents
Intrinsic Noise in Nonlinear Gene Regulation Inference

Modelling gene expression dynamics with Gaussian processes

F denotes cumulative density. denotes probability density function; (.)

Probabilistic Graphical Models

2. Mathematical descriptions. (i) the master equation (ii) Langevin theory. 3. Single cell measurements

Why Should I Care About. Stochastic Hybrid Systems?

Trends in Scientific Investigation

7.32/7.81J/8.591J: Systems Biology. Fall Exam #1

Bioinformatics 2 - Lecture 4

Cybergenetics: Control theory for living cells

Langevin Monte Carlo Filtering for Target Tracking

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data

McGill University. Department of Epidemiology and Biostatistics. Bayesian Analysis for the Health Sciences. Course EPIB-675.

Part 3: Introduction to Master Equation and Complex Initial Conditions in Lattice Microbes

1 Introduction. Maria Luisa Guerriero. Jane Hillston

State Space and Hidden Markov Models

56:198:582 Biological Networks Lecture 8

CS-E5880 Modeling biological networks Gene regulatory networks

Bayesian Approach 2. CSC412 Probabilistic Learning & Reasoning

Network Biology-part II

Expression arrays, normalization, and error models

CS Lecture 18. Expectation Maximization

Time Delay Estimation: Microlensing

Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning

Markov Chain Monte Carlo Algorithms for Gaussian Processes

Gene Expression as a Stochastic Process: From Gene Number Distributions to Protein Statistics and Back

Lecture 6: Multiple Model Filtering, Particle Filtering and Other Approximations

Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US

Dynamical-Systems Perspective to Stem-cell Biology: Relevance of oscillatory gene expression dynamics and cell-cell interaction

Modeling and Systems Analysis of Gene Regulatory Networks

scrna-seq Differential expression analysis methods Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden October 2017

Recursive Noise Adaptive Kalman Filtering by Variational Bayesian Approximations

Markov Chain Monte Carlo Methods for Stochastic Optimization

Gaussian Mixture Model

RAO-BLACKWELLIZED PARTICLE FILTER FOR MARKOV MODULATED NONLINEARDYNAMIC SYSTEMS

BIOMED Programming & Modeling for BME Final Exam, , Instructor: Ahmet Sacan

Written Exam 15 December Course name: Introduction to Systems Biology Course no

A Simple Protein Synthesis Model

Introduction to Bioinformatics

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chemistry Chapter 26

Genomic Data Assimilation for Estimating Hybrid Functional Petri Net from Time-Course Gene Expression Data

Linear Dynamical Systems

Modelling in Biology

A Synthetic Oscillatory Network of Transcriptional Regulators

Gene Autoregulation via Intronic micrornas and its Functions

For final project discussion every afternoon Mark and I will be available

Lecture 6: Bayesian Inference in SDE Models

Discovering molecular pathways from protein interaction and ge

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA

Molecular Biology: from sequence analysis to signal processing. University of Sao Paulo. Junior Barrera

State Estimation and Prediction in a Class of Stochastic Hybrid Systems

Approximate inference for stochastic dynamics in large biological networks

MCMC: Markov Chain Monte Carlo

Bayesian Sparse Correlated Factor Analysis

Numerical Solutions of ODEs by Gaussian (Kalman) Filtering

Genetic transcription and regulation

Causal Discovery by Computer

Expectation propagation for signal detection in flat-fading channels

Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis

the noisy gene Biology of the Universidad Autónoma de Madrid Jan 2008 Juan F. Poyatos Spanish National Biotechnology Centre (CNB)

Control Theory in Physics and other Fields of Science

Contents. Part I: Fundamentals of Bayesian Inference 1

Bayesian inference for nonlinear multivariate diffusion processes (and other Markov processes, and their application to systems biology)

Networks in systems biology

Particle Filtering Approaches for Dynamic Stochastic Optimization

Statistical inference of the time-varying structure of gene-regulation networks

eqr094: Hierarchical MCMC for Bayesian System Reliability

Parameterized Joint Densities with Gaussian Mixture Marginals and their Potential Use in Nonlinear Robust Estimation

Lecture 7: Simple genetic circuits I

A synthetic oscillatory network of transcriptional regulators

Bioinformatics 3. V18 Kinetic Motifs. Fri, Jan 8, 2016

Bioinformatics 3! V20 Kinetic Motifs" Mon, Jan 13, 2014"

Matteo Figliuzzi

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Modelling Transcriptional Regulation with Gaussian Processes

A STATE ESTIMATOR FOR NONLINEAR STOCHASTIC SYSTEMS BASED ON DIRAC MIXTURE APPROXIMATIONS

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji

Simultaneous state and input estimation with partial information on the inputs

Identifying Bio-markers for EcoArray

Markov chain optimisation for energy systems (MC-ES)

Bayesian Linear Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Lecture 8: Bayesian Estimation of Parameters in State Space Models

Need for Sampling in Machine Learning. Sargur Srihari

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Chapter 9: Bayesian Methods. CS 336/436 Gregory D. Hager

L'horloge circadienne, le cycle cellulaire et leurs interactions. Madalena CHAVES

Bayesian Prediction of Code Output. ASA Albuquerque Chapter Short Course October 2014

BIOLOGY IB DIPLOMA PROGRAM PARENTS ORIENTATION DAY

Mixed Membership Matrix Factorization

Systems biology and complexity research

State Space Representation of Gaussian Processes

ASSESSMENT OF REGRESSION METHODS FOR INFERENCE OF REGULATORY NETWORKS INVOLVED IN CIRCADIAN REGULATION

Biomolecular Feedback Systems

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Geert Geeven. April 14, 2010

10-701/15-781, Machine Learning: Homework 4

Roles of Diagrams in Computational Modeling of Mechanisms

Gaussian Process Approximations of Stochastic Differential Equations

Transcription:

Large data sets and comple models: A view from Systems Biology Bärbel Finenstädt, Department of Statistics, University of Warwic OWaSP worshop October 5

Joint wor with Mathematical and Statistical Modelling: Sylvia Calderao, Simone Tiberi, Kirsty Hey, Hiroshi Momiji, Dafyd Jenins, George Minas, David Rand (University of Warwic) Eperimental Collaboration: PRL epression in pituitary gland (Julien Davis, Mie White, University of Manchester) Chronotherapy and Cancer Research Unit (Francis Lévi and group), Warwic Medical School and INSERM, Paris Molecular Neurobiology of circadian timing (Michael Hastings and group), MRC Laboratory of Molecular Biology, Cambridge PRESTA Consortium (University of Warwic) Funding: BBSRC, EPSRC, MRC, EU (BIOSIM), Wellcome

Gene Epression

Standard Model of Gene Epression (Single Gene) DNA$ Transcrip3on$ Transla3on$ mrna$ Protein$ β(t)$ α Degrada3on$ δ $ Degrada3on$ δ $

Standard Model of Gene Epression (Single Gene, ODE Version) dm(t) dt = (t) mm(t), dp(t) dt = m(t) pp(t) where m(t) : concentration of mrna p(t) : concentration of protein (t) : transcription rate of mrna m, p : degradation rate of mrna, protein

Standard Model of gene epression (Stochastic version, single gene) X(t) = Xm (t) X p (t) 4 reactions (transcription, degradation mrna, translation, degradation protein) v =,v =,v 3 =,v 4 = A reaction of type j changes X(t) tox(t)v j. Each reaction occurs at a rate w j (X(t)).

Event E ect Transition Rate Transcription (X m,x p )! (X m,x p ) w = (t) Degradation of mrna (X m,x p )! (X m,x p ) w = m X m (t) Translation (X m,x p )! (X m,x p ) w 3 = X m (t) Degradation of protein (X m,x p )! (X m,x p ) w 4 = p X p (t) Table : Summary of reactions in the standard model of gene epression Reaction networs constitute continuous time Marov jump processes and thus satisfy the Chapman-Kolmogorov equation for which one can obtain the forward form nown as the master equation (ME) describing the evolution of the probability P (X m = n,x p = n ; t). Although an eact numerical simulation algorithm is provided (Gillespie,977) the ME is rarely tractable and hence an eplicit formula for the eact lielihood is not available for parameter inference.

Gene Networs

Figure. A model for the mammalian circadian cloc. Relógio A, Westermar PO, Wallach T, Schellenberg K, Kramer A, et al. () Tuning the Mammalian Circadian Cloc: Robust Synergy of Two Loops. PLoS Comput Biol 7(): e39. doi:.37/journal.pcbi.39 http://7...:88/ploscompbiol/article?id=info:doi/.37/journal.pcbi.39

CLOCK/BMAL 7 d d f dt d = () Rev-Erb 3 3 3 3 3 3 3 ma 3 y d PC g V dt dy y v t v t w i v t # $ & ' # $ & ' # $ & ' # $ & ' = () Ror 4 4 4 4 4 4 4 ma 4 y d PC h V dt dy y p t p t q i p t # $ & ' # $ & ' # $ & ' # $ & ' = (3) REV-ERB C 6 6 ) 3 3 ( 6 6 6 3 d i y y dt d p = (4) ROR C 7 7 ) 4 4 ( 7 7 7 4 d i y y dt d p = (5) REV-ERB N 5 6 5 5 6 d i dt d = (6) ROR N 6 7 6 6 7 d i dt d = (7) Bmal 5 6 5 6 5 5 5 5 5 ma 5 y d i V dt dy y n t m i n t # $ & ' # $ & ' # $ & ' = (8) BMAL C 8 8 ) 5 5 ( 8 8 8 5 d i y y dt d p = (9) BMAL N 7 7 8 7 7 8 d f d i dt d = () Per ma y d PC a V dt dy y b t b t c i b t # $ & ' # $ & ' # $ & ' # $ & ' = () Cry 5 ma y d PC d V dt dy y f i e t e t f i e t # $ & ' # $ & ' # $ & ' # $ & ' # $ & ' = () CRY C 3 5 4 ) ( 4 5 5 4 d f f d d y y dt d p = (3)

Gene Epression Data

Plant Microarray time series (PRESTA project) Norm. ep. 4 4 4 4 Norm. ep. 4 4 4 4 Norm. ep. 4 4 4 4 Norm. ep. 4 Timescale (hours) 4 Timescale (hours) 4 Timescale (hours) 4 Timescale (hours)

Searching for transcription factors (TFs) of gene regulation from microarray data Scenario: Have (replicate) time series microarray gene epression data across various eperiments and a set of candidate parents from YH eperiments

Modeling: Transcription β(t) of a gene (Child gene) is regulated by transcriptional activators and/or inhibitors. These are protein products, called Transcription Factors (TFs), of other genes (Parent genes). Wish to draw inference about regulation of mrna transcription of a child gene by an unnown subset = {,,..., } of the candidate parents G = {g,g,...,g G } Switches occur at times where the epression, Pγ (t) of a parent γ Γ crosses a threshold level due to an increase or decrease.

Define activation function (t) =( (t), (t),..., (t)) The set of switch times of the child s transcription is the set of time-points {s,s,...,s } at which at least one parent crosses its threshold. The total time interval split into subintervals [,s ], (s,s ],...,(s,l] where the transcription rates of the subintervals are the same if the epression of all parent genes remains at the same state For a given activation function (t) we then have a set of parental states, b j,j =,,...,apple observed in the union I bj of (at least one of) the above subintervals with the corresponding transcription rates bj,j =,,...,apple

dm dt = 8 b M M(t), for t I b >< b M M(t), for t I b...... >: bapple M M(t), for t I. bapple

Statistical Implementation: Bayesian approach: RJMCMC to infer plausible models for parents Lielihood based on normal error with inhomogeneous variance Weighted Least Squares solution to piecewise linear ODE maes algorithm fast Delayed RNA as proy for unobserved TF s

Single cell imaging data

Reporter Gene constructs ; ; degradation degradation mrna Protein regulation translation DNA transcription translation measurements reporter mrna reporter Protein Light Intensity degradation degradation ; ;

Modelling single cell data The class of state space models provides a unifying framewor for modelling SRNs Y t g(y t t, ) X t h( t ) Y Y Y... Y T X X X... X T h is the transition density of the approimating SRN g is the density of the measurement process

Approimations ODE (neglects intrinsic noise) Chemical Langevin Equation (usually intractable) Linear Noise Approimation (lineariation of the ME, tractable) Other approimations?

Parameter Inference The data lielihood is given by the marginal density f(y ) = Z f(y, )d where the integrand can be factoried as f(y, ) =h( )g(y, ) TY h( t t, )g(y t t, ) t=

Parameter Inference Under the LNA (with Gaussian measurement error): Interval can be evaluated eplicitly using Kalman methodology. Inference can be achieved by sampling from the posterior f( y) Under BDA (for eample) : use -step Gibbs sampler:. Sample the parameter vector from f(, y). Sample the latent states from the filtering density f( y, )

4 4 5 3 5 3 5 3 4 Time (hours) 5 3 4 Time (hours)

Challenges for statisticians in Systems Biology and Systems Medicine Comple and large networs of genes and their products Data from many sources: Replicates and if so what type? Many cells (Bayesian Hierarchical Modeling) Various labs (Prior Distributions) Destructive Sampling. Different Types of Eperiments imply different modeling approaches Information about intrinsic noise? Etrinsic noise? Destructive sampling? Reporter gene constructs? Which reporter gene? Type of camera used for imaging

Not all state variables are observed Can use distributed delay functions as a proy? Parameter Identifiability Spatio-temporal modelling and inference Collaboration between eperimentalists and mathematician/statisticians

Research Questions Mammalian Circadian Oscillator under diseases and treatments. Eploitation for Chronotherapy Development of signal processing tools for circadian time series to be used in forecasting systems for patient care at home Development of Spatio-temporal models to investigate how cells synchronie in the SCN

#B*(shBmal)* Photon* Implanta(on* pompe* *******************************3**********4**********5**********6**********7***********8*********9**********************************3********4********5*******6** ******************************3**********4**********5**********6*********7**********8**********9*********************************3********4********5********6** Jouraprèsinocula.ontumorale

CT49&:&KI/KI&Per::luc&male&mouse&(B)& Implanta(on* pump* DEN*(5*mg/g/j,*ip)* Ac)vity& Photon& 7&&&&&&&&&&&8&&&&&&&&&&9&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&3&&&&&&&&&&4&&&&&&&&&&5&&&&&&&&&&6&&&&&&&&&&7&&&&&&&&&&8&&&&&&&&&&9&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&3& 7&&&&&&&&&&8&&&&&&&&&&9&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&3&&&&&&&&&&4&&&&&&&&&&5&&&&&&&&&&6&&&&&&&&&7&&&&&&&&&&8&&&&&&&&&&9&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&3& Day&

45 4 35 3 5 5 5 ZCM$PICADO$ 9::3 :3:3 :4:3 3:36:3 5:8:3 6:4:3 8::3 9:44:3 :6:3 :48:3 ::3 :5:3 3:4:3 4:56:3 6:8:3 8::3 9:3:3 :4:3 :36:3 4:8:3 5:4:3 7::3 8:44:3 :6:3 :48:3 3::3 :5:3 :4:3 3:56:3 5:8:3 7::3 8:3:3 :4:3 :36:3 3:8:3 4:4:3 6::3 7:44:3 9:6:3 :48:3 ::3 3:5:3 :4:3 :56:3 4:8:3 6::3 7:3:3 9:4:3 :36:3 :8:3 3:4:3 5::3 6:44:3 8:6:3 9:48:3 ::3 :5:3 :4:3 :56:3 3:8:3 5::3 6:3:3 8:4:3 38# 36# 34# 3# 3# 8# 6# 4# # # :# :# :# :# :# :# :# :# :# :# :# Temp#capteur# Temp#collecteur#

Figure. A model for the mammalian circadian cloc. Relógio A, Westermar PO, Wallach T, Schellenberg K, Kramer A, et al. () Tuning the Mammalian Circadian Cloc: Robust Synergy of Two Loops. PLoS Comput Biol 7(): e39. doi:.37/journal.pcbi.39 http://7...:88/ploscompbiol/article?id=info:doi/.37/journal.pcbi.39