Markov Chain Monte Carlo Algorithms for Gaussian Processes

1 Markov Chain Monte Carlo Algorithms for Gaussian Processes. Michalis K. Titsias, Neil Lawrence and Magnus Rattray, School of Computer Science, University of Manchester, June 2008

2 Outline
- Gaussian processes
- Sampling algorithms for Gaussian process models: sampling from the prior, Gibbs sampling schemes, sampling using control variables
- Applications: demonstration on regression/classification, transcriptional regulation
- Summary/future work

3 Gaussian Processes
A Gaussian process (GP) is a distribution over a real-valued function f(x). It is defined by a mean function µ(x) = E[f(x)] and a covariance or kernel function k(x_n, x_m) = E[f(x_n) f(x_m)].
E.g. this can be the RBF (or squared exponential) kernel k(x_n, x_m) = α exp(−||x_n − x_m||² / (2ℓ²))
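As a concrete illustration, here is a minimal sketch (not the authors' code) of the squared-exponential kernel matrix, assuming the reconstructed form k(x_n, x_m) = α exp(−||x_n − x_m||²/(2ℓ²)); the function name and parameters are illustrative.

```python
import numpy as np

def rbf_kernel(X1, X2, alpha=1.0, lengthscale=1.0):
    """Squared-exponential kernel matrix between two sets of inputs (rows are points)."""
    X1 = np.atleast_2d(X1).reshape(len(X1), -1)
    X2 = np.atleast_2d(X2).reshape(len(X2), -1)
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return alpha * np.exp(-0.5 * sq_dists / lengthscale ** 2)
```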

4 Gaussian Processes
We evaluate the function at a set of inputs (x_i)_{i=1}^N: f_i = f(x_i). The Gaussian process then reduces to a multivariate Gaussian distribution over f = (f_1, ..., f_N):
p(f) = N(f | 0, K) = (2π)^{-N/2} |K|^{-1/2} exp(−(1/2) fᵀ K⁻¹ f)
where the covariance matrix K is defined by the kernel function. Strictly, p(f) is a conditional distribution; a more precise notation is p(f | X).
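A short sketch of drawing a function from the zero-mean prior p(f) = N(0, K) above, using a jittered Cholesky factor for numerical stability; rbf_kernel refers to the illustrative helper sketched earlier.

```python
import numpy as np

X = np.linspace(0.0, 1.0, 200)                      # input grid
K = rbf_kernel(X, X, alpha=1.0, lengthscale=0.1)    # prior covariance from the kernel
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(X)))   # jitter keeps K positive definite
f_prior_sample = L @ np.random.randn(len(X))        # f ~ N(0, K)
```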

5 Gaussian Processes for Bayesian Learning
Many problems involve inference over unobserved/latent functions. A Gaussian process can place a prior on a latent function. Bayesian inference:
- Data y = (y_i)_{i=1}^N, associated with inputs (x_i)_{i=1}^N
- Likelihood model p(y | f)
- GP prior p(f) for the latent function f
- Bayes' rule: p(f | y) ∝ p(y | f) p(f), i.e. Posterior ∝ Likelihood × Prior
For regression, where the likelihood is Gaussian, this computation is analytically tractable.
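Since the Gaussian-likelihood case is analytic, the following sketch computes the posterior over f at the training inputs, under the assumption y = f + ε with ε ~ N(0, σ²I); names and the noise model are illustrative, not taken from the slides.

```python
import numpy as np

def gp_regression_posterior(K, y, sigma2):
    """Posterior mean and covariance of f given y, for a zero-mean GP prior with covariance K."""
    A = K + sigma2 * np.eye(len(y))       # Cov[y] = K + sigma2 * I
    mean = K @ np.linalg.solve(A, y)      # E[f | y]
    cov = K - K @ np.linalg.solve(A, K)   # Cov[f | y]
    return mean, cov
```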

6 Gaussian Processes for Bayesian Regression
[Figure: data and the GP prior (RBF kernel function), and the posterior GP.]

7 Gaussian Processes for Non-Gaussian Likelihoods
When the likelihood p(y | f) is non-Gaussian, computations are analytically intractable. Non-Gaussian likelihoods arise in classification problems, spatio-temporal models and geostatistics, and non-linear differential equations with latent functions, so approximations need to be considered. MCMC is a powerful framework that offers:
- arbitrarily precise approximation in the limit of long runs
- general applicability (independent of the functional form of the likelihood)

8 MCMC for Gaussian Processes
The Metropolis-Hastings (MH) algorithm:
- Initialize f^(0)
- Form a Markov chain: use a proposal distribution Q(f^(t+1) | f^(t)) and accept with probability
  min(1, [p(y | f^(t+1)) p(f^(t+1)) Q(f^(t) | f^(t+1))] / [p(y | f^(t)) p(f^(t)) Q(f^(t+1) | f^(t))])
The posterior is highly correlated and f is high dimensional. How do we choose the proposal Q(f^(t+1) | f^(t))?

9 MCMC for Gaussian Processes
Use the GP prior as the proposal distribution:
- Proposal: Q(f^(t+1) | f^(t)) = p(f^(t+1))
- MH probability: min(1, p(y | f^(t+1)) / p(y | f^(t)))
Nice property: the prior samples functions with the appropriate smoothness requirement.
Bad property: we get an almost zero acceptance rate; the chain will get stuck in the same state for thousands of iterations.
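A minimal sketch of the whole-function MH step above with the GP prior as proposal. Because Q(f' | f) = p(f'), the prior terms cancel and only the likelihood ratio remains; log_lik stands for any log p(y | f) and, like the Cholesky factor L of the prior covariance, is an assumption of this sketch.

```python
import numpy as np

def mh_step_prior_proposal(f, log_lik, L, rng=np.random.default_rng()):
    """One MH step using the GP prior as the proposal distribution."""
    f_prop = L @ rng.standard_normal(len(f))     # f' ~ p(f), independent of the current state
    log_ratio = log_lik(f_prop) - log_lik(f)     # log of p(y | f') / p(y | f)
    if np.log(rng.random()) < log_ratio:
        return f_prop, True                      # accept
    return f, False                              # reject: the chain stays put
```

In practice this makes the bad property easy to observe: as the number of function values grows, a fresh prior draw almost never fits the data and log_ratio becomes very negative.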

10 MCMC for Gaussian Processes
Use Gibbs sampling:
- Proposal: iteratively sample from the conditional posterior p(f_i | f_{-i}, y), where f_{-i} = f \ f_i
Nice property: all samples are accepted and the prior smoothness requirement is satisfied.
Bad property: the Markov chain will move extremely slowly for densely sampled functions:
- the variance of p(f_i | f_{-i}, y) is smaller than or equal to the variance of the conditional prior p(f_i | f_{-i})
- but p(f_i | f_{-i}) may already have a tiny variance

11 Gibbs-like Schemes
Gibbs-like algorithm: instead of p(f_i | f_{-i}, y), use the conditional prior p(f_i | f_{-i}) as the proposal and accept with the MH step (this has been used in geostatistics; Diggle and Tawn, 1998). The Gibbs-like algorithm is still inefficient for sampling highly correlated functions.
Block or region sampling:
- cluster the function values f into regions/blocks {f_k}_{k=1}^M
- sample each block f_k from the conditional GP prior p(f_k^(t+1) | f_{-k}^(t)), where f_{-k} = f \ f_k, and accept with the MH step
This scheme can work better, but it does not solve the problem of sampling highly correlated functions, since the variance of the proposal can be very small at the boundaries between regions.
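The Gibbs-like and block schemes both use the conditional GP prior as a proposal. Below is a sketch (illustrative names, not the authors' code) of computing p(f_k | f_{-k}) by standard Gaussian conditioning on the joint prior N(0, K).

```python
import numpy as np

def conditional_prior(K, idx_k, f, jitter=1e-8):
    """Mean and covariance of the block f[idx_k] under p(f_k | f_{-k}) for a N(0, K) prior."""
    idx_rest = np.setdiff1d(np.arange(len(f)), idx_k)
    K_kk = K[np.ix_(idx_k, idx_k)]
    K_kr = K[np.ix_(idx_k, idx_rest)]
    K_rr = K[np.ix_(idx_rest, idx_rest)] + jitter * np.eye(len(idx_rest))
    mean = K_kr @ np.linalg.solve(K_rr, f[idx_rest])   # K_kr K_rr^{-1} f_{-k}
    cov = K_kk - K_kr @ np.linalg.solve(K_rr, K_kr.T)  # K_kk - K_kr K_rr^{-1} K_rk
    return mean, cov
```

Near a region boundary the term K_kr K_rr^{-1} K_rk explains almost all of K_kk, which is exactly why the proposal variance collapses there.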

12 Gibbs-like Schemes
[Figure: region sampling with 4 regions; a subset of the proposals is shown.] Note that the variance of the conditional priors is small close to the boundaries between regions.

13 Sampling Using Control Variables
Let f_c be a set of auxiliary function values; we call them control variables. The control variables provide a low-dimensional representation of f (analogously to the inducing/active variables in sparse GP models).
Using f_c, we can write the posterior as
p(f | y) = ∫ p(f | f_c, y) p(f_c | y) df_c
When f_c is highly informative about f, i.e. p(f | f_c, y) ≈ p(f | f_c), we can approximately sample from p(f | y):
- sample the control variables from p(f_c | y)
- generate f from the conditional prior p(f | f_c)

14 Sampling Using Control Variables
Idea: sample the control variables from p(f_c | y) and generate f from the conditional prior p(f | f_c).
Make this an MH algorithm: we only need to specify the proposal q(f_c^(t+1) | f_c^(t)) that will mimic sampling from p(f_c | y). The whole proposal is
Q(f^(t+1), f_c^(t+1) | f^(t), f_c^(t)) = p(f^(t+1) | f_c^(t+1)) q(f_c^(t+1) | f_c^(t))
Each pair (f^(t+1), f_c^(t+1)) is accepted using the MH step with acceptance ratio
A = [p(y | f^(t+1)) p(f_c^(t+1)) q(f_c^(t) | f_c^(t+1))] / [p(y | f^(t)) p(f_c^(t)) q(f_c^(t+1) | f_c^(t))]

15 Sampling Using Control Variables: Specification of q(f_c^(t+1) | f_c^(t))
q(f_c^(t+1) | f_c^(t)) must mimic sampling from p(f_c | y). The control points are meant to be almost independent, so Gibbs can be efficient: sample each f_{c_i} from the conditional posterior p(f_{c_i} | f_{c_{-i}}, y). Unfortunately, computing p(f_{c_i} | f_{c_{-i}}, y) is intractable.
But we can use the Gibbs-like algorithm. Iterate between the different control variables i:
- sample f_{c_i}^(t+1) from p(f_{c_i}^(t+1) | f_{c_{-i}}^(t)) and f^(t+1) from p(f^(t+1) | f_{c_i}^(t+1), f_{c_{-i}}^(t)); accept with the MH step
The proposal for f is the leave-one-out conditional prior
p(f^(t+1) | f_{c_{-i}}^(t)) = ∫ p(f^(t+1) | f_{c_i}^(t+1), f_{c_{-i}}^(t)) p(f_{c_i}^(t+1) | f_{c_{-i}}^(t)) df_{c_i}^(t+1)
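A sketch of one sweep of the scheme above, assuming each control variable is proposed from its conditional prior p(f_{c_i} | f_{c_{-i}}); the prior terms then cancel in the MH ratio and only the likelihood ratio p(y | f') / p(y | f) remains. conditional_prior is the helper sketched earlier, and sample_f_given_fc and log_lik are assumed, illustrative callables.

```python
import numpy as np

def control_variable_sweep(f, f_c, K_cc, sample_f_given_fc, log_lik, rng=np.random.default_rng()):
    """One pass over the M control variables (illustrative names, not the authors' code).
    K_cc is the prior covariance of f_c; sample_f_given_fc(fc) must return a draw
    from the conditional prior p(f | f_c = fc)."""
    for i in range(len(f_c)):
        mean_i, cov_i = conditional_prior(K_cc, np.array([i]), f_c)   # p(f_ci | f_c_{-i})
        f_c_prop = f_c.copy()
        f_c_prop[i] = mean_i.item() + np.sqrt(cov_i.item()) * rng.standard_normal()
        f_prop = sample_f_given_fc(f_c_prop)                          # f' ~ p(f | f_c')
        # with the conditional prior as proposal, the MH ratio reduces to p(y|f') / p(y|f)
        if np.log(rng.random()) < log_lik(f_prop) - log_lik(f):
            f, f_c = f_prop, f_c_prop
    return f, f_c
```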

16 Sampling Using Control Variables: Demonstration
[Figure: data, the current f^(t) (red line) and the current control variables f_c^(t) (red circles).]

17 Sampling Using Control Variables: Demonstration
[Figure: first control variable; the proposal p(f_{c_1}^(t+1) | f_{c_{-1}}^(t)) (green bar).]

18 Sampling Using Control Variables: Demonstration
[Figure: first control variable; the proposed f_{c_1}^(t+1) (magenta diamond).]

19 Sampling Using Control Variables: Demonstration
[Figure: first control variable; the proposed function f^(t+1) (blue line).]

20 Sampling Using Control Variables: Demonstration
[Figure: first control variable; the shaded area is the overall effective proposal p(f^(t+1) | f_{c_{-1}}^(t)).]

21-29 Sampling Using Control Variables: Demonstration
[Figures: the sampler iterates between the control variables, one per slide.] Iterating between the control variables allows f to be drawn with considerable variance everywhere in the input space.

30 Sampling Using Control Variables: Input Control Locations
To apply the algorithm, we need to select the number M of control variables and their input locations X_c. Choose X_c using a PCA-like approach: knowledge of f_c must determine f with small error. Given f_c, the prediction of f is K_{f,c} K_{c,c}^{-1} f_c, so we minimize the averaged error
G(X_c) = ∫∫ ||f − K_{f,c} K_{c,c}^{-1} f_c||² p(f | f_c) p(f_c) df df_c = Tr(K_{f,f} − K_{f,c} K_{c,c}^{-1} K_{f,c}ᵀ)
Minimize G(X_c) w.r.t. X_c using gradient-based optimization.
Note: G(X_c) is the total variance of the conditional prior p(f | f_c).
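A sketch of evaluating the objective above, G(X_c) = Tr(K_ff − K_fc K_cc^{-1} K_fcᵀ), using the illustrative rbf_kernel helper from earlier; the slides minimise G with gradient-based optimization, which in this sketch could be done with finite differences or an autodiff library.

```python
import numpy as np

def control_objective(X, X_c, alpha=1.0, lengthscale=1.0, jitter=1e-8):
    """Total variance of the conditional prior p(f | f_c) as a function of the control inputs X_c."""
    K_ff = rbf_kernel(X, X, alpha, lengthscale)
    K_fc = rbf_kernel(X, X_c, alpha, lengthscale)
    K_cc = rbf_kernel(X_c, X_c, alpha, lengthscale) + jitter * np.eye(len(X_c))
    return np.trace(K_ff - K_fc @ np.linalg.solve(K_cc, K_fc.T))
```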

31 Sampling Using Control Points: Choice of M
To find the number M of control variables:
- Minimize G(X_c), incrementally adding control variables until G(X_c) becomes smaller than a certain percentage of the total variance of p(f) (5% was used in all our experiments)
- Start the simulation and observe the acceptance rate of the chain; keep adding control variables until the acceptance rate becomes larger than 25% (following standard heuristics; Gelman, Carlin, Stern and Rubin, 2004)

32 Sampling Using Control Variables: The G(X_c) Function
The minimization of G places the control inputs close to the clusters of the input data, in a way that takes the kernel function into account.

33 Applications: Demonstration on Regression
Regression: compare Gibbs sampling, local region sampling and control variables (randomly chosen GP functions of varying input dimension d, with a fixed number N of training points).
[Figure: KL divergence between the true and empirical distributions against input dimension for Gibbs, region and control sampling; number of control variables and correlation coefficient against dimension for the control scheme.]
Note: the number of control variables increases as the function values become more independent, which is very intuitive.

34 Applications: Classification
Classification: Wisconsin Breast Cancer (WBC) and Pima Indians Diabetes (PID) datasets. Hyperparameters were fixed to those obtained by Expectation Propagation.
Figure: log-likelihood against MCMC iterations for Gibbs (left) and control (middle) on the WBC dataset; (right) test errors (grey bars) and average negative log-likelihoods (black bars) on WBC (left) and PID (right) for Gibbs, control and EP.

35 Applications: Transcriptional Regulation
Data: gene expression levels y = (y_jt) of N genes at T time points.
Goal: we suspect/know that a certain protein regulates these genes (i.e. it is a transcription factor, TF) and we wish to model this relationship.
Model: use a differential equation (Barenco et al., 2006; Rogers et al., 2007; Lawrence et al., 2007):
dy_j(t)/dt = B_j + S_j g(f(t)) − D_j y_j(t)
where t is time, y_j(t) is the expression of the jth gene, f(t) is the concentration of the transcription factor protein, D_j is the decay rate, B_j is the basal rate and S_j is the sensitivity.

36 Transcriptional Regulation Using Gaussian Processes
Solve the equation:
y_j(t) = B_j/D_j + A_j exp(−D_j t) + S_j exp(−D_j t) ∫_0^t g(f(u)) exp(D_j u) du
Apply numerical integration using a very dense grid (u_i)_{i=1}^P with f = (f(u_i))_{i=1}^P:
y_j(t) ≈ B_j/D_j + A_j exp(−D_j t) + S_j exp(−D_j t) Σ_{p=1}^{P_t} w_p g(f_p) exp(D_j u_p)
Assuming Gaussian noise for the observed gene expressions {y_jt}, the ODE defines the likelihood p(y | f).
Bayesian inference: assume a GP prior for the transcription factor f and apply MCMC to infer (f, {A_j, B_j, D_j, S_j}_{j=1}^N). f is inferred in a continuous manner (P ≫ T).
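A sketch of the discretised solution above for a single gene, using trapezoidal quadrature weights w_p on a dense grid u; the particular quadrature rule and the default response function g are assumptions of this sketch, not details given in the slides.

```python
import numpy as np

def gene_expression(t, u, f, B, S, D, A, g=np.exp):
    """Approximate y_j(t) = B/D + A e^{-Dt} + S e^{-Dt} * int_0^t g(f(s)) e^{Ds} ds
    on the dense grid u (P points) carrying the latent TF values f."""
    mask = u <= t
    u_t, f_t = u[mask], f[mask]
    if u_t.size < 2:
        return B / D + A * np.exp(-D * t)
    du = np.diff(u_t)
    w = np.concatenate(([du[0] / 2], (du[:-1] + du[1:]) / 2, [du[-1] / 2]))  # trapezoid weights w_p
    integral = np.sum(w * g(f_t) * np.exp(D * u_t))
    return B / D + A * np.exp(-D * t) + S * np.exp(-D * t) * integral
```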

37 Results on E. coli Data: Rogers, Khanin and Girolami (2007)
One transcription factor (lexA) that acts as a repressor. We consider the Michaelis-Menten kinetic equation
dy_j(t)/dt = B_j + S_j / (exp(f(t)) + γ_j) − D_j y_j(t)
We have 14 genes (5 kinetic parameters each). Gene expressions are available at T = 6 time points. The TF concentration f is discretized using a dense grid of points.
MCMC details: 6 control points are used; the running time was 5 hours for 5 × 10^5 sampling iterations plus burn-in.
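For concreteness, the two Michaelis-Menten response functions used in this dataset and the one of Barenco et al. below could be coded as follows (a sketch; as in the slides, f enters through exp(f), which keeps the inferred TF concentration positive). Either function can be passed as the g argument of the gene_expression sketch above.

```python
import numpy as np

def g_repressor(f, gamma):
    """Michaelis-Menten response for a repressing TF (the lexA model)."""
    return 1.0 / (np.exp(f) + gamma)

def g_activator(f, gamma):
    """Michaelis-Menten response for an activating TF (the p53 model)."""
    return np.exp(f) / (np.exp(f) + gamma)
```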

38 Results on E. coli Data: Predicted Gene Expressions
[Figure: predicted expression profiles for the dinF, dinI, lexA, recA, recN and ruvA genes.]

39 Results on E. coli Data: Predicted Gene Expressions
[Figure: predicted expression profiles for the ruvB, sbmC, sulA, umuC, umuD and uvrB genes.]

40 Results on E. coli Data: Predicted Gene Expressions
[Figure: predicted expression profiles for the yebG and yjiW genes.]

41 Results on E. coli Data: Protein Concentration
[Figure: inferred protein concentration.]

42 Results on E. coli Data: Kinetic Parameters
[Figure: basal rates, decay rates, sensitivities and gamma parameters for the 14 genes (dinF, dinI, lexA, recA, recN, ruvA, ruvB, sbmC, sulA, umuC, umuD, uvrB, yebG, yjiW).]

43 Results on E. coli Data: Confidence Intervals for the Kinetic Parameters
[Figure: confidence intervals for the basal rates, decay rates, sensitivities and gamma parameters of the 14 genes.]

44 Data Used by Barenco et al. (2006)
One transcription factor (p53) that acts as an activator. We consider the Michaelis-Menten kinetic equation
dy_j(t)/dt = B_j + S_j exp(f(t)) / (exp(f(t)) + γ_j) − D_j y_j(t)
We have 5 genes. Gene expressions are available at T = 7 time points and there are 3 replicas of the time series data. The TF concentration f is discretized using a dense grid of points.
MCMC details: 7 control points are used; the running time was 4 hours for 5 × 10^5 iterations plus burn-in.

45 Data Used by Barenco et al. (2006): Predicted Gene Expressions for the First Replica
[Figure: predicted expression profiles for the DDB2, BIK, TNFRSF10b, CIp1/p21 and p26 sesn1 genes in the first replica.]

46 Data Used by Barenco et al. (2006): Protein Concentrations
[Figure: inferred p53 protein concentration for each of the three replicas, together with the inferred protein under the linear model; Barenco et al.'s predictions are shown as crosses.]

47 Data Used by Barenco et al. (2006): Kinetic Parameters
[Figure: basal rates, decay rates, sensitivities and gamma parameters for the DDB2, p26 sesn1, TNFRSF10b, CIp1/p21 and BIK genes.]
Our results (grey) compared with Barenco et al. (2006) (black). Note that Barenco et al. use a linear model.

48 Summary/Future Work
Summary:
- A new MCMC algorithm for Gaussian processes using control variables
- It is generally applicable
Future work:
- Deal with large systems of ODEs for the transcriptional regulation application
- Consider applications in geostatistics
- Use the G(X_c) function to learn sparse GP models in an unsupervised fashion, without the outputs y being involved
