Markov Chain Monte Carlo Algorithms for Gaussian Processes
1 Markov Chain Monte Carlo Algorithms for Gaussian Processes Michalis K. Titsias, Neil Lawrence and Magnus Rattray School of Computer Science University of Manchester June 2008
2 Outline Gaussian Processes Sampling algorithms for Gaussian Process Models Sampling from the prior Gibbs sampling schemes Sampling using control variables Applications Demonstration on regression/classification Transcriptional regulation Summary/Future work
3 Gaussian Processes A Gaussian process (GP) is a distribution over a real-valued function f(x). It is defined by a mean function µ(x) = E[f(x)] and a covariance or kernel function k(x_n, x_m) = E[f(x_n) f(x_m)]. E.g. this can be the RBF (or squared exponential) kernel k(x_n, x_m) = α exp(−‖x_n − x_m‖² / (2ℓ²))
4 Gaussian Processes We evaluate a function at a set of inputs (x_i)_{i=1}^N: f_i = f(x_i). A Gaussian process then reduces to a multivariate Gaussian distribution over f = (f_i)_{i=1}^N:
p(f) = N(f | 0, K) = (2π)^{-N/2} |K|^{-1/2} exp(-(1/2) f^T K^{-1} f)
where the covariance K is defined by the kernel function. p(f) is a conditional distribution (a precise notation is p(f | X))
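As a sketch of the finite-dimensional view above, the snippet below builds K from an RBF kernel on a grid of inputs and draws one function from p(f) = N(0, K). The hyperparameter values (α = 1, ℓ = 0.3) and the jitter term are illustrative assumptions, not values from the talk.

```python
import numpy as np

def rbf_kernel(X1, X2, alpha=1.0, ell=0.3):
    # k(x_n, x_m) = alpha * exp(-(x_n - x_m)^2 / (2 ell^2))
    sq = (X1[:, None] - X2[None, :]) ** 2
    return alpha * np.exp(-sq / (2.0 * ell ** 2))

rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 50)               # inputs x_1, ..., x_N
K = rbf_kernel(X, X) + 1e-6 * np.eye(50)    # small jitter for numerical stability
L = np.linalg.cholesky(K)
f = L @ rng.standard_normal(50)             # one sample from p(f) = N(0, K)
```

The Cholesky factor turns N independent standard normals into one correlated function draw, which is the standard way to sample a GP prior at a finite set of inputs.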
5 Gaussian Processes for Bayesian learning Many problems involve inference over unobserved/latent functions. A Gaussian process can place a prior on a latent function. Bayesian inference: Data y = (y_i)_{i=1}^N (associated with inputs (x_i)_{i=1}^N) Likelihood model p(y | f) GP prior p(f) for the latent function f Bayes rule: p(f | y) ∝ p(y | f) p(f) (Posterior ∝ Likelihood × Prior) For regression, where the likelihood is Gaussian, this computation is analytically tractable
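For the Gaussian-likelihood case just mentioned, the posterior has a closed form, and a minimal numpy sketch can make that concrete (the kernel, lengthscale and noise values here are illustrative assumptions):

```python
import numpy as np

def gp_posterior(X, y, Xs, ell=0.3, noise=0.1):
    # Analytic GP regression posterior for a Gaussian likelihood:
    #   mean = K_*f (K_ff + sigma^2 I)^{-1} y
    #   cov  = K_** - K_*f (K_ff + sigma^2 I)^{-1} K_f*
    def rbf(A, B):
        return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell ** 2))
    Kff = rbf(X, X) + noise ** 2 * np.eye(len(X))
    Ksf = rbf(Xs, X)
    mean = Ksf @ np.linalg.solve(Kff, y)
    cov = rbf(Xs, Xs) - Ksf @ np.linalg.solve(Kff, Ksf.T)
    return mean, cov

X = np.array([0.1, 0.4, 0.7])
y = np.sin(6 * X)
mean, cov = gp_posterior(X, y, X, noise=1e-3)
# With near-zero noise the posterior mean interpolates the training targets
```

This closed form is exactly what becomes unavailable once the likelihood is non-Gaussian, which is what motivates the MCMC schemes in the rest of the talk.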
6 Gaussian Processes for Bayesian Regression [Figures: data with the GP prior (RBF kernel function); the posterior GP]
7 Gaussian Processes for non-Gaussian Likelihoods When the likelihood p(y | f) is non-Gaussian, computations are analytically intractable. Non-Gaussian likelihoods arise in: Classification problems Spatio-temporal models and geostatistics Non-linear differential equations with latent functions Approximations need to be considered. MCMC is a powerful framework that offers: Arbitrarily precise approximation in the limit of long runs General applicability (independent of the functional form of the likelihood)
8 MCMC for Gaussian Processes The Metropolis-Hastings (MH) algorithm: Initialize f^(0). Form a Markov chain: use a proposal distribution Q(f^(t+1) | f^(t)) and accept with the MH step
min(1, [p(y | f^(t+1)) p(f^(t+1)) Q(f^(t) | f^(t+1))] / [p(y | f^(t)) p(f^(t)) Q(f^(t+1) | f^(t))])
The posterior is highly correlated and f is high dimensional. How do we choose the proposal Q(f^(t+1) | f^(t))?
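The MH acceptance step above can be written down directly; working in log space avoids numerical overflow in the likelihood ratios. A minimal sketch (function and argument names are my own):

```python
import numpy as np

def mh_accept(log_lik_new, log_prior_new, log_lik_old, log_prior_old,
              log_q_old_given_new, log_q_new_given_old, rng):
    # log of the MH ratio A = [p(y|f')p(f')Q(f|f')] / [p(y|f)p(f)Q(f'|f)]
    log_A = (log_lik_new + log_prior_new + log_q_old_given_new) \
        - (log_lik_old + log_prior_old + log_q_new_given_old)
    # accept with probability min(1, A)
    return np.log(rng.uniform()) < min(0.0, log_A)

rng = np.random.default_rng(0)
print(mh_accept(-1.0, 0.0, -2.0, 0.0, 0.0, 0.0, rng))   # ratio > 1: always accepted
```

All the samplers in this talk are instances of this step with different choices of Q.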
9 MCMC for Gaussian Processes Use the GP prior as the proposal distribution. Proposal: Q(f^(t+1) | f^(t)) = p(f^(t+1)). MH probability: min(1, p(y | f^(t+1)) / p(y | f^(t))). Nice property: the prior samples functions with the appropriate smoothing requirement. Bad property: we get an almost zero acceptance rate; the chain will get stuck in the same state for thousands of iterations
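The near-zero acceptance rate is easy to reproduce on a toy problem: with independent prior draws as proposals, the acceptance probability is just the likelihood ratio, and for a densely observed function that ratio is almost always astronomically small. A small synthetic experiment (all settings are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
X = np.linspace(0.0, 1.0, N)
sq = (X[:, None] - X[None, :]) ** 2
K = np.exp(-sq / (2 * 0.1 ** 2)) + 1e-6 * np.eye(N)
L = np.linalg.cholesky(K)
f_true = L @ rng.standard_normal(N)
y = f_true + 0.1 * rng.standard_normal(N)      # synthetic regression data

def log_lik(f):
    return -0.5 * np.sum((y - f) ** 2) / 0.1 ** 2

# MH with the prior as proposal: the ratio reduces to p(y|f') / p(y|f)
f = L @ rng.standard_normal(N)
accepts, T = 0, 2000
for _ in range(T):
    f_prop = L @ rng.standard_normal(N)        # independent draw from the GP prior
    if np.log(rng.uniform()) < log_lik(f_prop) - log_lik(f):
        f, accepts = f_prop, accepts + 1
```

The chain only moves when a fresh prior draw happens to beat the current state, which becomes rarer and rarer, exactly the pathology described on the slide.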
10 MCMC for Gaussian Processes Use Gibbs sampling. Proposal: iteratively sample from the conditional posterior p(f_i | f_{-i}, y), where f_{-i} = f \ f_i. Nice property: all samples are accepted and the prior smoothing requirement is satisfied. Bad property: the Markov chain will move extremely slowly for densely sampled functions: the variance of p(f_i | f_{-i}, y) is smaller than or equal to the variance of the conditional prior p(f_i | f_{-i}), but p(f_i | f_{-i}) may already have a tiny variance
11 Gibbs-like schemes Gibbs-like algorithm: instead of p(f_i | f_{-i}, y), use the conditional prior p(f_i | f_{-i}) and accept with the MH step (this has been used in geostatistics; Diggle and Tawn, 1998). The Gibbs-like algorithm is still inefficient for sampling highly correlated functions. Block or region sampling: cluster the function values f into regions/blocks {f_k}_{k=1}^M. Sample each block f_k from the conditional GP prior p(f_k^(t+1) | f_{-k}^(t)), where f_{-k} = f \ f_k, and accept with the MH step. This scheme can work better, but it does not solve the problem of sampling highly correlated functions, since the variance of the proposal can be very small at the boundaries between regions
12 Gibbs-like schemes Region sampling with 4 regions (the proposals are shown below). Note that the variance of the conditional priors is small close to the boundaries between regions
13 Sampling using control variables Let f_c be a set of auxiliary function values. We call them control variables. The control variables provide a low-dimensional representation of f (analogously to the inducing/active variables in sparse GP models). Using f_c, we can write the posterior
p(f | y) = ∫ p(f | f_c, y) p(f_c | y) df_c
When f_c is highly informative about f, i.e. p(f | f_c, y) ≈ p(f | f_c), we can approximately sample from p(f | y): Sample the control variables from p(f_c | y). Generate f from the conditional prior p(f | f_c)
14 Sampling using control variables Idea: sample the control variables from p(f_c | y) and generate f from the conditional prior p(f | f_c). Make this a MH algorithm: we only need to specify the proposal q(f_c^(t+1) | f_c^(t)) that will mimic sampling from p(f_c | y). The whole proposal is
Q(f^(t+1), f_c^(t+1) | f^(t), f_c^(t)) = p(f^(t+1) | f_c^(t+1)) q(f_c^(t+1) | f_c^(t))
Each (f^(t+1), f_c^(t+1)) is accepted using the MH step
A = [p(y | f^(t+1)) p(f_c^(t+1)) q(f_c^(t) | f_c^(t+1))] / [p(y | f^(t)) p(f_c^(t)) q(f_c^(t+1) | f_c^(t))]
15 Sampling using control variables: Specification of q(f_c^(t+1) | f_c^(t)) q(f_c^(t+1) | f_c^(t)) must mimic sampling from p(f_c | y). The control points are meant to be almost independent, thus Gibbs can be efficient: sample each f_{c_i} from the conditional posterior p(f_{c_i} | f_{c_{-i}}, y). Unfortunately, computing p(f_{c_i} | f_{c_{-i}}, y) is intractable. But we can use the Gibbs-like algorithm: iterate between the control variables i: sample f_{c_i}^(t+1) from p(f_{c_i}^(t+1) | f_{c_{-i}}^(t)) and f^(t+1) from p(f^(t+1) | f_{c_i}^(t+1), f_{c_{-i}}^(t)). Accept with the MH step. The proposal for f is the leave-one-out conditional prior
p(f^(t+1) | f_{c_{-i}}^(t)) = ∫ p(f^(t+1) | f_{c_i}^(t+1), f_{c_{-i}}^(t)) p(f_{c_i}^(t+1) | f_{c_{-i}}^(t)) df_{c_i}^(t+1)
16 Sampling using control variables: Demonstration Data, current f^(t) (red line) and current control variables f_c^(t) (red circles)
17 Sampling using control variables: Demonstration First control variable: the proposal p(f_c^(t+1) | f_c^(t)) (green bar)
18 Sampling using control variables: Demonstration First control variable: the proposed f_c^(t+1) (magenta diamond)
19 Sampling using control variables: Demonstration First control variable: the proposed function f^(t+1) (blue line)
20 Sampling using control variables: Demonstration First control variable: the shaded area is the overall effective proposal p(f^(t+1) | f_c^(t))
21 Sampling using control variables: Demonstration Iteration between control variables: Allows f to be drawn with considerable variance everywhere in the input space
30 Sampling using control variables: Input control locations To apply the algorithm, we need to select the number M of control variables and their input locations X_c. Choose X_c using a PCA-like approach: knowledge of f_c must determine f with small error. Given f_c, the prediction of f is K_{f,c} K_{c,c}^{-1} f_c. Minimize the averaged error
G(X_c) = ∫ ‖f − K_{f,c} K_{c,c}^{-1} f_c‖² p(f | f_c) p(f_c) df df_c = Tr(K_{f,f} − K_{f,c} K_{c,c}^{-1} K_{f,c}^T)
Minimize G(X_c) w.r.t. X_c using gradient-based optimization. Note: G(X_c) is the total variance of the conditional prior p(f | f_c)
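The trace form of G(X_c) is cheap to evaluate, as a sketch shows (kernel and lengthscale are illustrative; a gradient-based optimiser over X_c is omitted here):

```python
import numpy as np

def rbf(A, B, ell=0.2):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell ** 2))

def G(Xc, X):
    # Total variance of the conditional prior p(f | f_c):
    #   G(X_c) = Tr(K_ff - K_fc K_cc^{-1} K_fc^T)
    Kcc = rbf(Xc, Xc) + 1e-8 * np.eye(len(Xc))
    Kfc = rbf(X, Xc)
    return np.trace(rbf(X, X) - Kfc @ np.linalg.solve(Kcc, Kfc.T))

X = np.linspace(0.0, 1.0, 100)
# Spreading five controls over the input range explains f far better
# than a single control, so G drops sharply.
print(G(np.array([0.5]), X), G(np.linspace(0.0, 1.0, 5), X))
```

Since G(X_c) is differentiable in the control locations (through the kernel), it can be minimised with any standard gradient-based optimiser.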
31 Sampling using control points: Choice of M To find the number M of control variables: Minimize G(X_c) by incrementally adding control variables until G(X_c) becomes smaller than a certain percentage of the total variance of p(f) (5% was used in all our experiments). Start the simulation and observe the acceptance rate of the chain. Keep adding control variables until the acceptance rate becomes larger than 25% (following standard heuristics; Gelman, Carlin, Stern and Rubin, 2004)
32 Sampling using control variables: the G(X_c) function The minimization of G places the control inputs close to the clusters of the input data, in such a way that the kernel function is taken into account
33 Applications: Demonstration on regression Regression: compare Gibbs, local region sampling and control variables in regression (randomly chosen GP functions of varying input dimension d, with a fixed number N of training points). [Figures: KL divergence between the true and empirical posteriors for Gibbs, region and control sampling against input dimension; number of control variables and correlation coefficient against input dimension] Note: the number of control variables increases as the function values become more independent... this is very intuitive
34 Applications: Classification Classification: Wisconsin Breast Cancer (WBC) and Pima Indians Diabetes (PID). Hyperparameters fixed to those obtained by Expectation-Propagation. Figure: Log-likelihood traces against MCMC iterations for Gibbs (left) and control (middle) on the WBC dataset. (right) shows the test errors (grey bars) and the average negative log likelihoods (black bars) on the WBC (left) and PID (right)
35 Applications: Transcriptional regulation Data: gene expression levels y = (y_jt) of N genes at T times. Goal: we suspect/know that a certain protein regulates these genes (i.e. it is a transcription factor (TF)) and we wish to model this relationship. Model: use a differential equation (Barenco et al., 2006; Rogers et al., 2007; Lawrence et al., 2007)
dy_j(t)/dt = B_j + S_j g(f(t)) − D_j y_j(t)
where t is time, y_j(t) the expression of the jth gene, f(t) the concentration of the transcription factor protein, D_j the decay rate, B_j the basal rate and S_j the sensitivity
36 Transcriptional regulation using Gaussian processes Solve the equation:
y_j(t) = B_j/D_j + A_j exp(−D_j t) + S_j exp(−D_j t) ∫_0^t g(f(u)) exp(D_j u) du
Apply numerical integration using a very dense grid (u_p)_{p=1}^P and f = (f(u_p))_{p=1}^P:
y_j(t) ≈ B_j/D_j + A_j exp(−D_j t) + S_j exp(−D_j t) Σ_{p=1}^{P_t} w_p g(f_p) exp(D_j u_p)
Assuming Gaussian noise for the observed gene expressions {y_jt}, the ODE defines the likelihood p(y | f). Bayesian inference: assume a GP prior for the transcription factor f and apply MCMC to infer (f, {A_j, B_j, D_j, S_j}_{j=1}^N). f is inferred in a continuous manner (P ≫ T)
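The quadrature approximation above is easy to sketch with trapezoidal weights w_p on a uniform dense grid. The parameter values below are purely illustrative, and the sanity check uses the fact that for constant g(f) = 1 the exact solution tends to the steady state (B + S)/D:

```python
import numpy as np

def predicted_expression(f, u, t, B, S, D, A=0.0, g=np.exp):
    # y_j(t) ~= B/D + A e^{-Dt} + S e^{-Dt} * sum_p w_p g(f_p) e^{D u_p}
    # with trapezoidal quadrature weights w_p on the uniform dense grid u.
    mask = u <= t
    h = u[1] - u[0]
    integrand = g(f[mask]) * np.exp(D * u[mask])
    integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return B / D + A * np.exp(-D * t) + S * np.exp(-D * t) * integral

u = np.linspace(0.0, 10.0, 500)     # dense integration grid
f = np.zeros_like(u)                # constant TF concentration: g(f) = exp(0) = 1
y10 = predicted_expression(f, u, t=10.0, B=0.5, S=1.0, D=0.8)
# with g(f) = 1 the exact solution approaches (B + S)/D = 1.875 for large t
print(y10)
```

In the full model, f on the grid is a draw from the GP prior (updated by the control-variable sampler), and this map from f to predicted expressions defines the Gaussian likelihood p(y | f).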
37 Results in E. coli data: Rogers, Khanin and Girolami (2007) One transcription factor (lexA) that acts as a repressor. We consider the Michaelis-Menten kinetic equation
dy_j(t)/dt = B_j + S_j / (exp(f(t)) + γ_j) − D_j y_j(t)
We have 14 genes (5 kinetic parameters each). Gene expressions are available for T = 6 time slots. The TF (f) is discretized using a dense grid of points. MCMC details: 6 control points are used. Running time was 5 hours for 5 × 10^5 sampling iterations plus burn-in
38 Results in E. coli data: Predicted gene expressions [Figures: predicted expression profiles for the genes dinF, dinI, lexA, recA, recN and ruvA]
39 Results in E. coli data: Predicted gene expressions [Figures: predicted expression profiles for the genes ruvB, sbmC, sulA, umuC, umuD and uvrB]
40 Results in E. coli data: Predicted gene expressions [Figures: predicted expression profiles for the genes yebG and yjiW]
41 Results in E. coli data: Protein concentration [Figure: inferred protein concentration of the transcription factor]
42 Results in E. coli data: Kinetic parameters [Figures: inferred basal rates, decay rates, sensitivities and gamma parameters for each of the 14 genes]
43 Results in E. coli data: Confidence intervals for the kinetic parameters [Figures: confidence intervals for the basal rates, decay rates, sensitivities and gamma parameters of each of the 14 genes]
44 Data used by Barenco et al. (2006) One transcription factor (p53) that acts as an activator. We consider the Michaelis-Menten kinetic equation
dy_j(t)/dt = B_j + S_j exp(f(t)) / (exp(f(t)) + γ_j) − D_j y_j(t)
We have 5 genes. Gene expressions are available for T = 7 times and there are 3 replicas of the time series data. The TF (f) is discretized using a dense grid of points. MCMC details: 7 control points are used. Running time was 4 hours for 5 × 10^5 sampling iterations plus burn-in
45 Data used by Barenco et al. (2006): Predicted gene expressions for the 1st replica [Figures: predicted expression profiles for the DDB2, BIK, TNFRSF10b, CIp1/p21 and p26 sesn1 genes in the first replica]
46 Data used by Barenco et al. (2006): Protein concentrations [Figures: inferred p53 protein concentration in each replica (top); inferred protein under the linear model, with the Barenco et al. predictions shown as crosses (bottom)]
47 Data used by Barenco et al. (2006): Kinetic parameters [Figures: basal rates, decay rates, sensitivities and gamma parameters for the DDB2, p26 sesn1, TNFRSF10b, CIp1/p21 and BIK genes] Our results (grey) compared with Barenco et al. (2006) (black). Note that Barenco et al. use a linear model
48 Summary/Future work Summary: A new MCMC algorithm for Gaussian processes using control variables. It is generally applicable. Future work: Deal with large systems of ODEs for the transcriptional regulation application. Consider applications in geostatistics. Use the G(X_c) function to learn sparse GP models in an unsupervised fashion, without the outputs y being involved
State Space Gaussian Processes with Non-Gaussian Likelihoods Hannes Nickisch 1 Arno Solin 2 Alexander Grigorievskiy 2,3 1 Philips Research, 2 Aalto University, 3 Silo.AI ICML2018 July 13, 2018 Outline
More informationPattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods
Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs
More informationNonparmeteric Bayes & Gaussian Processes. Baback Moghaddam Machine Learning Group
Nonparmeteric Bayes & Gaussian Processes Baback Moghaddam baback@jpl.nasa.gov Machine Learning Group Outline Bayesian Inference Hierarchical Models Model Selection Parametric vs. Nonparametric Gaussian
More informationMCMC and Gibbs Sampling. Kayhan Batmanghelich
MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationProbabilistic Models for Learning Data Representations. Andreas Damianou
Probabilistic Models for Learning Data Representations Andreas Damianou Department of Computer Science, University of Sheffield, UK IBM Research, Nairobi, Kenya, 23/06/2015 Sheffield SITraN Outline Part
More information(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis
Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals
More informationBayesian Nonparametric Regression for Diabetes Deaths
Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,
More informationLikelihood NIPS July 30, Gaussian Process Regression with Student-t. Likelihood. Jarno Vanhatalo, Pasi Jylanki and Aki Vehtari NIPS-2009
with with July 30, 2010 with 1 2 3 Representation Representation for Distribution Inference for the Augmented Model 4 Approximate Laplacian Approximation Introduction to Laplacian Approximation Laplacian
More informationGaussian Processes in Machine Learning
Gaussian Processes in Machine Learning November 17, 2011 CharmGil Hong Agenda Motivation GP : How does it make sense? Prior : Defining a GP More about Mean and Covariance Functions Posterior : Conditioning
More informationMarkov Chain Monte Carlo
1 Motivation 1.1 Bayesian Learning Markov Chain Monte Carlo Yale Chang In Bayesian learning, given data X, we make assumptions on the generative process of X by introducing hidden variables Z: p(z): prior
More informationApproximate inference in Energy-Based Models
CSC 2535: 2013 Lecture 3b Approximate inference in Energy-Based Models Geoffrey Hinton Two types of density model Stochastic generative model using directed acyclic graph (e.g. Bayes Net) Energy-based
More informationThe Recycling Gibbs Sampler for Efficient Learning
The Recycling Gibbs Sampler for Efficient Learning L. Martino, V. Elvira, G. Camps-Valls Universidade de São Paulo, São Carlos (Brazil). Télécom ParisTech, Université Paris-Saclay. (France), Universidad
More informationA Process over all Stationary Covariance Kernels
A Process over all Stationary Covariance Kernels Andrew Gordon Wilson June 9, 0 Abstract I define a process over all stationary covariance kernels. I show how one might be able to perform inference that
More informationAn introduction to Sequential Monte Carlo
An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods
More information16 : Approximate Inference: Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationGibbs Sampling for the Probit Regression Model with Gaussian Markov Random Field Latent Variables
Gibbs Sampling for the Probit Regression Model with Gaussian Markov Random Field Latent Variables Mohammad Emtiyaz Khan Department of Computer Science University of British Columbia May 8, 27 Abstract
More informationMachine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart
Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural
More informationDisease mapping with Gaussian processes
Liverpool, UK, 4 5 November 3 Aki Vehtari Department of Biomedical Engineering and Computational Science (BECS) Outline Example: Alcohol related deaths in Finland Spatial priors and benefits of GP prior
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 13: Learning in Gaussian Graphical Models, Non-Gaussian Inference, Monte Carlo Methods Some figures
More information19 : Slice Sampling and HMC
10-708: Probabilistic Graphical Models 10-708, Spring 2018 19 : Slice Sampling and HMC Lecturer: Kayhan Batmanghelich Scribes: Boxiang Lyu 1 MCMC (Auxiliary Variables Methods) In inference, we are often
More informationProbabilistic Graphical Models
10-708 Probabilistic Graphical Models Homework 3 (v1.1.0) Due Apr 14, 7:00 PM Rules: 1. Homework is due on the due date at 7:00 PM. The homework should be submitted via Gradescope. Solution to each problem
More informationAfternoon Meeting on Bayesian Computation 2018 University of Reading
Gabriele Abbati 1, Alessra Tosi 2, Seth Flaxman 3, Michael A Osborne 1 1 University of Oxford, 2 Mind Foundry Ltd, 3 Imperial College London Afternoon Meeting on Bayesian Computation 2018 University of
More informationGWAS V: Gaussian processes
GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework 3 Due Nov 12, 10.30 am Rules 1. Homework is due on the due date at 10.30 am. Please hand over your homework at the beginning of class. Please see
More informationBayesian Sparse Correlated Factor Analysis
Bayesian Sparse Correlated Factor Analysis 1 Abstract In this paper, we propose a new sparse correlated factor model under a Bayesian framework that intended to model transcription factor regulation in
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationWhere now? Machine Learning and Bayesian Inference
Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone etension 67 Email: sbh@clcamacuk wwwclcamacuk/ sbh/ Where now? There are some simple take-home messages from
More informationResults: MCMC Dancers, q=10, n=500
Motivation Sampling Methods for Bayesian Inference How to track many INTERACTING targets? A Tutorial Frank Dellaert Results: MCMC Dancers, q=10, n=500 1 Probabilistic Topological Maps Results Real-Time
More information