Modelling Transcriptional Regulation with Gaussian Processes


Modelling Transcriptional Regulation with Gaussian Processes. Neil Lawrence, School of Computer Science, University of Manchester. Joint work with Magnus Rattray and Guido Sanguinetti. 8th March 2007.

Outline: 1. Application. 2. Methodology Overview. 3. Linear Response with MLP Kernel. 4. Non-linear Responses.

Online Resources. All source code and slides are available online. This talk is available from my home page (see the talks link on the side). Scripts are available in the gpsim toolbox: http://www.cs.man.ac.uk/~neill/gpsim/. MATLAB commands used for the examples are given in typewriter font.

Framework: Latent Functions. Many interaction networks have latent functions. We assume a Gaussian process (GP) prior distribution for the latent function. Gaussian processes are probabilistic models for functions; see O'Hagan [1978, 1992] and Rasmussen and Williams [2006]. Our Approach: 1. Take a differential equation model for the system. 2. Derive the GP covariance jointly for the observed and latent functions. 3. Maximise the likelihood with respect to the parameters (most of which are physically meaningful).

This Talk. 1. Introduce Gaussian processes for dealing with latent functions in transcription networks. 2. Show how, in a linear response model, the latent function can be handled analytically. 3. Discuss extensions to systems with non-linear responses.

p53 Inference [Barenco et al., 2006]. The data consist of T measurements of mRNA expression level for N different genes. We relate the gene expression, x_j(t), to the transcription factor concentration (TFC), f(t), by

dx_j(t)/dt = B_j + S_j f(t) - D_j x_j(t),   (1)

where B_j is the basal transcription rate of gene j, S_j is the sensitivity of gene j and D_j is the decay rate of its mRNA. The dependence of the mRNA transcription rate on the TF is linear.

Linear Response Solution: Solve for the TFC. Equation (1) can be solved to recover

x_j(t) = B_j/D_j + S_j exp(-D_j t) ∫_0^t f(u) exp(D_j u) du.   (2)

If we model f(t) as a GP then, since (2) only involves linear operations, x_j(t) is also a GP.
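As a concrete illustration, this is a minimal sketch that evaluates equation (2) numerically for an invented TFC and invented parameter values; it is not part of the gpsim toolbox.

```matlab
% Minimal sketch: evaluate equation (2) for an assumed toy TFC and parameters.
B = 0.5; S = 1.0; D = 0.8;                  % assumed basal rate, sensitivity, decay
f = @(u) exp(-(u - 4).^2);                  % invented transcription factor concentration
t = linspace(0, 12, 200);
x = zeros(size(t));
for i = 1:numel(t)
    x(i) = B/D + S * exp(-D * t(i)) * integral(@(u) f(u) .* exp(D * u), 0, t(i));
end
plot(t, x); xlabel('t'); ylabel('x_j(t)');  % mRNA profile driven by the toy TFC
```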

Gaussian Processes: GP Advantages. GPs allow for inference of continuous profiles, accounting naturally for temporal structure. GPs allow joint estimation of mRNA concentrations and production rates (derivative observations). GPs deal consistently with the uncertainty inherent in the measurements. GPs outstrip MCMC for computational efficiency. Note: GPs have previously been proposed for solving differential equations [Graepel, 2003] and for dynamical systems [Murray-Smith and Pearlmutter].

Gaussian Processes. A Gaussian process is governed by a mean function and a covariance function,

p(f) = N(f | f̄, K),   k_ij = k(t_i, t_j),   f_i = f(t_i).

The mean function is often taken to be zero. The covariance function can be any positive definite function, e.g. the RBF covariance.

Covariance Functions: Visualisation of the RBF Covariance. RBF kernel function:

k(t, t') = α exp(-(t - t')² / l²).

The covariance matrix is built using the time inputs to the function, t. For the example above it is based on Euclidean distance. The covariance function is also known as a kernel. (Figure: visualisation of the resulting covariance matrix.)
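A minimal sketch of building an RBF covariance matrix and sampling from the corresponding GP prior; the parameter values are assumptions for illustration (the slides use the demcovfuncsample script instead).

```matlab
% Minimal sketch: RBF covariance matrix over time inputs and prior samples.
t = linspace(0, 12, 100)';                     % time inputs
alpha = 1; l = 1;                              % assumed variance and length scale
K = alpha * exp(-(t - t.').^2 / l^2);          % RBF covariance matrix
L = chol(K + 1e-8 * eye(numel(t)), 'lower');   % jitter for numerical stability
f = L * randn(numel(t), 3);                    % three samples from the GP prior
plot(t, f); xlabel('t'); ylabel('f(t)');
```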

Covariance Samples (demcovfuncsample): a sample from the prior. Figure: RBF kernel (first setting of γ and α).

Covariance Samples (demcovfuncsample): a sample from the prior. Figure: RBF kernel (second setting of l and α).

Covariance Samples (demcovfuncsample): a sample from the prior. Figure: RBF kernel with a shorter length scale (l = 0.3).

Different Covariance Functions: MLP Kernel Function.

k(t, t') = α asin( (w t t' + b) / √((w t² + b + 1)(w t'² + b + 1)) ).

This gives a non-stationary covariance matrix [Williams, 1997], derived from a multi-layer perceptron (MLP). (Figure: visualisation of the resulting covariance matrix.)
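A minimal sketch of the MLP (arcsine) covariance and samples from its prior; the parameter values are assumptions, not necessarily those used on the slides.

```matlab
% Minimal sketch: MLP (arcsine) covariance of Williams [1997] and prior samples.
t = linspace(-2, 2, 100)';
alpha = 8; w = 100; b = 100;                % assumed variance, weight and bias parameters
d = w * t.^2 + b + 1;
K = alpha * asin((w * (t * t.') + b) ./ sqrt(d * d.'));   % MLP covariance matrix
f = chol(K + 1e-8 * eye(numel(t)), 'lower') * randn(numel(t), 3);
plot(t, f); xlabel('t');                    % non-stationary prior samples
```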

Covariance Samples (demcovfuncsample): samples from the prior. Figure: MLP kernel with α = 8 (first setting of w and b).

Covariance Samples (demcovfuncsample): samples from the prior. Figure: MLP kernel with α = 8 (second setting of w and b).

Prior to Posterior: Prediction with GPs. GPs provide a probabilistic prior over functions. By combining the prior with data we obtain a posterior over functions; this is obtained by combining a covariance function with data. Toy example: regression with GPs (a minimal sketch follows).
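A minimal sketch of GP regression along the lines of the demregression demo (but not the toolbox code itself); the data and hyperparameters are invented toys.

```matlab
% Minimal sketch: from GP prior to posterior with a handful of observations.
t = [1; 3; 5; 7; 9]; y = sin(t);               % toy observations
ts = linspace(0, 10, 200)';                    % test inputs
alpha = 1; l = 1; sigma2 = 0.01;               % assumed hyperparameters
k = @(a, b) alpha * exp(-(a - b.').^2 / l^2);  % RBF covariance function
Kxx = k(t, t) + sigma2 * eye(numel(t));        % training covariance plus noise
Ksx = k(ts, t);                                % test/train cross covariance
mu  = Ksx * (Kxx \ y);                         % posterior mean
C   = k(ts, ts) - Ksx * (Kxx \ Ksx.');         % posterior covariance
sd  = sqrt(max(diag(C), 0));
plot(ts, mu, 'b', ts, mu + 2*sd, 'b--', ts, mu - 2*sd, 'b--', t, y, 'rx');
```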

Gaussian Process Regression (demregression). Figure (shown as a sequence of build slides): going from prior to posterior as data points are added.

Covariance of the Latent Function: Prior Distribution for the TFC. We assume that the TF concentration is a Gaussian process, with an RBF covariance function,

p(f) = N(f | 0, K),   k(t, t') = exp(-(t - t')² / l²).

Computation of the Joint Covariance. We rewrite the solution of the differential equation as

x_j(t) = B_j/D_j + L_j[f](t),   where   L_j[f](t) = S_j exp(-D_j t) ∫_0^t f(u) exp(D_j u) du   (3)

is a linear operator.

Induced Covariance: the Genes' Covariance. The new covariance function is then given by

cov(L_j[f](t), L_k[f](t')) = L_j L_k[k_ff](t, t'),

or more explicitly

k_{x_j x_k}(t, t') = S_j S_k exp(-D_j t - D_k t') ∫_0^t exp(D_j u) ∫_0^{t'} exp(D_k u') k_ff(u, u') du' du.

With the RBF covariance these integrals are tractable.

Covariance for the Transcription Model. With an RBF kernel function for f(t) and

x_i(t) = B_i/D_i + S_i exp(-D_i t) ∫_0^t f(u) exp(D_i u) du,

we obtain the joint distribution for x_1(t), x_2(t) and f(t). (Figure: visualisation of the joint covariance matrix over f(t), x_1(t) and x_2(t) for a particular choice of D_1, S_1, D_2 and S_2.)

Joint Sampling of x_1(t), x_2(t) and f(t) (gpsimtest). Figure (shown for several independent draws): Left: joint samples from the transcription covariance; blue: f(t), cyan: x_1(t), red: x_2(t). Right: numerical solution for f(t) of the differential equation, recovered from x_1(t) and x_2(t) (blue and cyan), with the true f(t) included for comparison.

Noise Corruption: Estimate the Underlying Noise. Allow the mRNA abundance of each gene at each time point to be corrupted by noise: for observations at t_i, i = 1, ..., T,

y_j(t_i) = x_j(t_i) + ε_j(t_i),   (4)

with ε_j(t_i) ~ N(0, σ²_ji). The noise levels are estimated using probe-level processing techniques for Affymetrix microarrays (e.g. mmgMOS [Liu et al., 2005]). The covariance of the noisy process is then K_yy = Σ + K_xx, with Σ = diag(σ²_11, ..., σ²_1T, ..., σ²_N1, ..., σ²_NT).
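A minimal sketch of forming the noisy-process covariance K_yy = Σ + K_xx; here K_xx is just a stand-in RBF covariance and the per-observation variances are invented, standing in for the mmgMOS estimates.

```matlab
% Minimal sketch: add per-observation noise variances to the process covariance.
t = (0:2:12)';                               % observation times
Kxx = exp(-(t - t.').^2 / 4);                % stand-in covariance of the clean profile
sigma2 = 0.05 + 0.1 * rand(numel(t), 1);     % assumed per-point noise variances
Kyy = Kxx + diag(sigma2);                    % covariance of the noisy observations
```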

Artificial Data: Results from an Artificial Data Set. We used a known TFC and derived six mRNA profiles. The known TFC is composed of three Gaussian basis functions, and the mRNA profiles were derived analytically. Fourteen subsamples were taken and corrupted by noise. These data were then used to: learn the decays, sensitivities and basal transcription rates; and infer a posterior distribution over the missing TFC.

Artificial Data Results (demtoyproblem). Figure (shown as a sequence of build slides): Left: the TFC, f(t), which drives the system. Middle: five gene mRNA concentration profiles, each obtained using a different parameter set {B_i, S_i, D_i} (lines), along with the noise-corrupted data. Right: the inferred TFC (with error bars).

Results: Linear System. A recently published biological data set was studied using a linear response model by Barenco et al. [2006]. The study focused on the tumour suppressor protein p53. mRNA abundance was measured for five targets: DDB2, p21, SESN1/hPA26, BIK and TNFRSF10b. Quadratic interpolation was used for the mRNA production rates to obtain gradients. They used MCMC sampling to obtain estimates of the model parameters B_j, S_j, D_j and f(t).

Linear Response Analysis: Experimental Setup. We analysed the data using the linear response model. The raw data were processed using the mmgMOS model of Liu et al. [2005], which provides a variance as well as an expression level. We present the posterior distribution over the TFC and the results of inference on the values of the hyperparameters B_j, S_j and D_j. Samples from the posterior distribution were obtained using hybrid Monte Carlo (see e.g. Neal, 1996).

Linear Response Results (dembarenco). Figure: Predicted protein concentration for p53. The solid line is the mean and the dashed lines are 95% credibility intervals. The prediction of Barenco et al. [2006] was pointwise and is shown as crosses.

Results: Estimation of Equation Parameters (dembarenco). Figure: Basal transcription rates for DDB2, hPA26, TNFRSF10b, p21 and BIK. Our results (black) compared with Barenco et al. [2006] (white).

Results: Estimation of Equation Parameters (dembarenco). Figure: Sensitivities for DDB2, hPA26, TNFRSF10b, p21 and BIK. Our results (black) compared with Barenco et al. [2006] (white).

Results: Estimation of Equation Parameters (dembarenco). Figure: Decays for DDB2, hPA26, TNFRSF10b, p21 and BIK. Our results (black) compared with Barenco et al. [2006] (white).

Linear Response Discussion: GP Results. Note the oscillatory behaviour, a possible artifact of the RBF covariance [see Rasmussen and Williams, 2006]. The results are in good accordance with those obtained by Barenco et al. Differences in the estimates of the basal transcription rates are probably due to: the different methods used for probe-level processing of the microarray data; and our failure to constrain f(0) = 0. Our results take about 3 minutes to produce; Barenco et al. required millions of iterations of Monte Carlo.

Non-linear Responses: A More Realistic Response. All the quantities in equation (1) are positive, but direct samples from a GP will not be. Linear models do not account for saturation. Solution: model the response using a positive non-linear function.

Formalism: Non-linear Response. Introduce a non-linearity g(·), parameterised by θ_j:

dx_j/dt = B_j + g(f(t), θ_j) - D_j x_j,

x_j(t) = B_j/D_j + exp(-D_j t) ∫_0^t g(f(u), θ_j) exp(D_j u) du.

The induced distribution of x_j(t) is no longer a GP. We derive the functional gradient and learn a MAP solution for f(t), and also compute the Hessian so that we can approximate the marginal likelihood.

Example: Linear Response Using Non-RBF Kernels. Start by taking g(·) to be linear. This provides a sanity check and allows arbitrary covariance functions. It also avoids the double numerical integral that would normally be required.

Response Results (dembarencomap). Figure: Left: results with an RBF prior on f; Right: results with an MLP prior on f.

Non-linear Response Analysis: Non-linear Responses. We consider: an exponential response model, exp(f) (which constrains protein concentrations to be positive); a log(1 + exp(f)) response model; and a saturating response, 3/(1 + exp(-f)). The inferred MAP solutions for the latent function f are plotted below.

exp(·) Response Results (dembarencomap). Figure: Left: results using a squared exponential prior covariance on f; Right: results using an MLP prior covariance on f.

log(1 + exp(f)) Response Results (dembarencomap). Figure: Left: results using a squared exponential prior covariance on f; Right: results using an MLP prior covariance on f.

3/(1 + exp(-f)) Response Results (dembarencomap). Figure: Left: results using a squared exponential prior covariance on f; Right: results using an MLP prior covariance on f.

Discussion. We have described how GPs can be used in modelling the dynamics of a simple regulatory network motif. Our approach has advantages over standard parametric approaches: there is no need to restrict the inference to the observed time points, and the temporal continuity of the inferred functions is accounted for naturally. GPs allow us to handle uncertainty in a natural way. MCMC parameter estimation in a discretised model can be computationally expensive; parameter estimation can be achieved easily in our framework by type II maximum likelihood or by using efficient hybrid Monte Carlo sampling techniques. All code is online at http://www.cs.man.ac.uk/~neill/gpsim/.

Future Directions: What Next? This is still a very simple modelling situation. We are ignoring transcriptional delays. Here we have a single transcription factor: our ultimate goal is to describe regulatory pathways with more genes. All these issues can be dealt with in the general framework we have described, but we need to overcome the greater computational difficulties.

Acknowledgements: Data and Support. We thank Martino Barenco for useful discussions and for providing the data. We gratefully acknowledge support from BBSRC Grant No. BBS/B/76X, "Improved processing of microarray data with probabilistic models".

Covariance Result. For the RBF covariance on f, the induced covariance between genes j and k is

k_{x_j x_k}(t, t') = S_j S_k (√π l / 2) [h_{kj}(t, t') + h_{jk}(t', t)],

where

h_{kj}(t, t') = exp(γ_k²)/(D_j + D_k) { exp[-D_k(t - t')] [erf((t - t')/l - γ_k) + erf(t'/l + γ_k)] - exp[-(D_k t + D_j t')] [erf(t/l - γ_k) + erf(γ_k)] }.

Here γ_k = D_k l / 2.
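A minimal sketch implementing the covariance above for two genes, with assumed parameter values; this is not the gpsim toolbox implementation.

```matlab
% Minimal sketch: induced gene-gene covariance k_{x_j x_k}(t, t') for the RBF kernel.
l = 1; S = [1.0 0.8]; D = [0.5 0.9];            % assumed length scale, sensitivities, decays
gam = @(k) D(k) * l / 2;
h = @(k, j, t, tp) exp(gam(k)^2) / (D(j) + D(k)) * ...
    (exp(-D(k) * (t - tp)) .* (erf((t - tp)/l - gam(k)) + erf(tp/l + gam(k))) ...
   - exp(-(D(k) * t + D(j) * tp)) .* (erf(t/l - gam(k)) + erf(gam(k))));
kxx = @(j, k, t, tp) S(j) * S(k) * sqrt(pi) * l / 2 * (h(k, j, t, tp) + h(j, k, tp, t));
kxx(1, 2, 2.0, 3.0)                             % covariance of gene 1 at t = 2 with gene 2 at t' = 3
```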

Cross Covariance: Correlation of x_j(t) and f(t'). We also need the cross-covariance terms between x_j(t) and f(t'), obtained as

k_{x_j f}(t, t') = S_j exp(-D_j t) ∫_0^t exp(D_j u) k_ff(u, t') du.   (5)

For the RBF covariance we have

k_{x_j f}(t, t') = (√π l S_j / 2) exp(γ_j²) exp[-D_j(t - t')] [erf((t - t')/l - γ_j) + erf(t'/l + γ_j)].

Posterior for f: Prediction for the TFC. Standard Gaussian process regression techniques [see e.g. Rasmussen and Williams, 2006] yield

⟨f⟩_post = K_fx K_xx⁻¹ x,
K_ff^post = K_ff - K_fx K_xx⁻¹ K_xf.

The model parameters B_j, D_j and S_j are estimated by type II maximum likelihood on log p(x) = log N(x | 0, K_xx).
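A minimal sketch of the posterior mean over the TFC for a single gene, using the covariance formulae above with assumed hyperparameters and invented toy data; this is not the gpsim toolbox code.

```matlab
% Minimal sketch: posterior mean of f(t) from one gene's (basal-corrected) mRNA profile.
l = 1; S = 1; D = 0.5; g = D * l / 2;                 % assumed hyperparameters
h = @(t, tp) exp(g^2) / (2 * D) * ...
    (exp(-D * (t - tp)) .* (erf((t - tp)/l - g) + erf(tp/l + g)) ...
   - exp(-D * (t + tp)) .* (erf(t/l - g) + erf(g)));
kxx = @(t, tp) S^2 * sqrt(pi) * l / 2 * (h(t, tp) + h(tp, t));
kxf = @(t, tp) sqrt(pi) * l * S / 2 * exp(g^2) * exp(-D * (t - tp)) .* ...
    (erf((t - tp)/l - g) + erf(tp/l + g));
tx = (1:2:11)'; x = [0.2; 0.6; 1.1; 1.3; 1.0; 0.7];   % invented mRNA observations
tf = linspace(0, 12, 100)';                           % grid for the latent function
Kxx = kxx(tx, tx.') + 0.01 * eye(numel(tx));          % gene covariance plus noise
Kfx = kxf(tx.', tf);                                  % cross covariance K_fx
fmean = Kfx * (Kxx \ x);                              % posterior mean of the TFC
plot(tf, fmean); xlabel('t'); ylabel('f(t)');
```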

Implementation: Riemann Quadrature. The implementation requires a discretised time. We compute the gradient and Hessian on a grid and integrate them by approximate Riemann quadrature. We choose a uniform grid {t_p}, p = 1, ..., M, so that Δ = t_p - t_{p-1} is constant. The vector f = {f_p}, p = 1, ..., M, is the function f at the grid points. Then

I(t) = ∫_0^t f(u) exp(D_j u) du ≈ Δ Σ_{p: t_p ≤ t} f(t_p) exp(D_j t_p).
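A minimal sketch of this quadrature for the non-linear response integral, with an invented latent function and an assumed decay rate.

```matlab
% Minimal sketch: Riemann quadrature of the response integral on a uniform grid.
M = 200; tgrid = linspace(0, 12, M)';       % uniform time grid
Delta = tgrid(2) - tgrid(1);
Dj = 0.8;                                   % assumed decay rate
f = sin(tgrid) + 1;                         % stand-in latent function values on the grid
gf = log(1 + exp(f));                       % one of the non-linear responses
I = Delta * cumsum(gf .* exp(Dj * tgrid));  % cumulative sum gives I(t_p) for every grid point
x = exp(-Dj * tgrid) .* I;                  % response contribution to x_j(t) on the grid
plot(tgrid, x);
```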

Log Likelihood: Functional Gradient. Given noise-corrupted data y_j(t_i), the log-likelihood is

log p(y | f, θ) = -(1/2) Σ_{i=1}^T Σ_{j=1}^N [ (x_j(t_i) - y_j(t_i))² / σ²_ji + log σ²_ji ] - (NT/2) log(2π).

The functional derivative of the log-likelihood with respect to f is

δ log p(y | f) / δf(t) = -Σ_{i=1}^T Σ_{j=1}^N Θ(t_i - t) (x_j(t_i) - y_j(t_i)) / σ²_ji · g'(f(t)) e^{-D_j(t_i - t)},

where Θ(x) is the Heaviside step function.

Log Likelihood: Functional Hessian. Given noise-corrupted data y_j(t_i), the log-likelihood is as above. The negative Hessian of the log-likelihood with respect to f is

w(t, t') = Σ_{i=1}^T Σ_{j=1}^N Θ(t_i - t) δ(t - t') (x_j(t_i) - y_j(t_i)) / σ²_ji · g''(f(t)) e^{-D_j(t_i - t)}
         + Σ_{i=1}^T Σ_{j=1}^N Θ(t_i - t) Θ(t_i - t') (1/σ²_ji) g'(f(t)) g'(f(t')) e^{-D_j(2t_i - t - t')},

with g'(f) = ∂g/∂f and g''(f) = ∂²g/∂f².

Implementation II: Combine with the Prior. Combine these with the prior to compute the gradient and Hessian of the log posterior, Ψ(f) = log p(y | f) + log p(f) [see Rasmussen and Williams, 2006, chapter 3]:

∇Ψ(f) = ∇ log p(y | f) - K⁻¹ f,   (6)
∇∇Ψ(f) = -(W + K⁻¹),

where K is the prior covariance evaluated at the grid points. We use these to find a MAP solution, f̂, via Newton's algorithm. The Laplace approximation to the marginal likelihood is then

log p(y) ≈ log p(y | f̂) - (1/2) f̂ᵀ K⁻¹ f̂ - (1/2) log |I + KW|.   (7)
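A minimal sketch of the Newton iteration and Laplace approximation above. The helpers loglik, loglik_grad and loglik_neghess are hypothetical placeholders for code that evaluates log p(y | f), its functional gradient and W on the grid for the chosen response g; the prior covariance K is a stand-in RBF matrix on an assumed grid.

```matlab
% Minimal sketch: Newton's method for the MAP latent function and the Laplace
% approximation to the marginal likelihood (equations (6) and (7)).
M = 200; tgrid = linspace(0, 12, M)';
K = exp(-(tgrid - tgrid.').^2) + 1e-6 * eye(M);  % stand-in prior covariance on the grid
f = zeros(M, 1);                                 % start from the prior mean
for it = 1:100
    grad  = loglik_grad(f) - K \ f;              % gradient of Psi(f), equation (6)
    W     = loglik_neghess(f);                   % negative Hessian of the log-likelihood
    f_new = f + (W + inv(K)) \ grad;             % Newton step
    if norm(f_new - f) < 1e-6, f = f_new; break; end
    f = f_new;
end
fhat = f;
logZ = loglik(fhat) - 0.5 * fhat' * (K \ fhat) ...
     - 0.5 * log(det(eye(M) + K * W));           % Laplace approximation, equation (7)
```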

References

M. Barenco, D. Tomescu, D. Brewer, R. Callard, J. Stark, and M. Hubank. Ranked prediction of p53 targets using hidden variable dynamic modeling. Genome Biology, 7(3):R25, 2006.

T. Graepel. Solving noisy linear operator equations by Gaussian processes: Application to ordinary and partial differential equations. In T. Fawcett and N. Mishra, editors, Proceedings of the International Conference on Machine Learning. AAAI Press, 2003.

X. Liu, M. Milo, N. D. Lawrence, and M. Rattray. A tractable probabilistic model for Affymetrix probe-level analysis across multiple chips. Bioinformatics, 21(18):3637-3644, 2005.

R. Murray-Smith and B. A. Pearlmutter. Transformations of Gaussian process priors.

R. M. Neal. Bayesian Learning for Neural Networks. Lecture Notes in Statistics 118. Springer, 1996.

A. O'Hagan. Curve fitting and optimal design for prediction. Journal of the Royal Statistical Society B, 40:1-42, 1978.

A. O'Hagan. Some Bayesian numerical analysis. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 4, pages 345-363, Valencia, 1992. Oxford University Press.

C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.

C. K. I. Williams. Computing with infinite networks. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9. MIT Press, Cambridge, MA, 1997.