Non-Gaussian likelihoods for Gaussian Processes


1 Non-Gaussian likelihoods for Gaussian Processes. Alan Saul, University of Sheffield.

2 Outline: Motivation; Laplace approximation; KL method; Expectation Propagation; Comparing approximations.

3 GP regression. Model the observations, y, as a distorted version of the process, f: y_i ~ N(f(x_i), σ²). f is a non-linear function; in our case we assume it is latent, and it is assigned a Gaussian process prior. [Figure: realizations of f(x), observations, and 95% credible intervals for p(y*|y).]

4 GP regression setting. So far we have assumed that the latent values, f, have been corrupted by Gaussian noise. Everything remains analytically tractable. Gaussian prior: f ~ GP(0, K_ff), so p(f) = N(f | 0, K_ff). Gaussian likelihood: p(y | f) = N(y | f, σ²I) = ∏_{i=1}^n p(y_i | f_i). Gaussian posterior: p(f | y) ∝ N(y | f, σ²I) N(f | 0, K_ff).
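
In the conjugate case this posterior is available in closed form. A minimal numpy sketch, where the RBF kernel, lengthscale, noise level, and toy data are all illustrative assumptions:

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

# Toy inputs and observations (assumed for illustration).
rng = np.random.default_rng(0)
X = np.linspace(0, 5, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
sigma2 = 0.1 ** 2                              # Gaussian noise variance

Kff = rbf(X, X)
# Closed-form posterior p(f | y) = N(mu, C):
#   mu = Kff (Kff + sigma2 I)^{-1} y
#   C  = Kff - Kff (Kff + sigma2 I)^{-1} Kff
A = np.linalg.solve(Kff + sigma2 * np.eye(len(X)), np.eye(len(X)))
mu = Kff @ A @ y
C = Kff - Kff @ A @ Kff
```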

5 Likelihood. p(y | f) is the probability of the observed data given the latent function values f. It can also be seen as the likelihood that the latent function values, f, would give rise to the observed data, y. So far we have assumed that the distortion of the underlying latent function, f, that gives rise to the observed data, y, is independent and Gaussian. This is often not the case: count data, binary data, etc.

6 Binary example. Binary outcomes for y_i, with y_i ∈ {0, 1}. Model the probability of y_i = 1 with a transformation of a GP. The probability of a 1 must lie between 0 and 1, so use a squashing transformation, λ(f_i) = Φ(f_i), where Φ is the standard normal CDF. Then y_i = 1 with probability λ(f_i), and y_i = 0 with probability 1 − λ(f_i). [Figure: realizations of Φ(f(x)), realizations of f(x), and observations.]
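
A small generative sketch of this model: draw f from a GP prior, squash it through Φ, and sample Bernoulli outcomes. The kernel and inputs are assumptions for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 50)
# Squared-exponential prior covariance (assumed kernel; jitter for stability).
Kff = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2) + 1e-8 * np.eye(len(X))
f = rng.multivariate_normal(np.zeros(len(X)), Kff)   # one realization of f

p_one = norm.cdf(f)            # lambda(f_i) = Phi(f_i), squashed into (0, 1)
y = rng.binomial(1, p_one)     # y_i = 1 with probability Phi(f_i), else 0
```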

7 Count data example. Non-negative, discrete values only for y_i: y_i ∈ ℕ. Model the rate (intensity), λ, of events with a transformation of a Gaussian process. The rate parameter must remain positive, so use a transformation that maintains positivity, λ(f_i) = exp(f_i) or λ(f_i) = f_i². Then y_i ~ Poisson(y_i | λ_i = λ(f_i)), with Poisson(y_i | λ_i) = λ_i^{y_i} e^{−λ_i} / y_i!. [Figure: realizations of exp(f(x)), observations, and 95% credible intervals for p(y*|y).]
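
The analogous generative sketch for count data, using the exp link; kernel and inputs are again illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 50)
# Squared-exponential prior covariance (assumed kernel; jitter for stability).
Kff = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2) + 1e-8 * np.eye(len(X))
f = rng.multivariate_normal(np.zeros(len(X)), Kff)   # one realization of f

rate = np.exp(f)               # lambda(f_i) = exp(f_i) keeps the rate positive
y = rng.poisson(rate)          # y_i ~ Poisson(y_i | lambda(f_i))
```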

8 Non-Gaussian posteriors. Exact computation of the posterior is no longer analytically tractable, because the Gaussian process prior is not conjugate to the non-Gaussian likelihood p(y | f):

p(f | y) = p(f) ∏_{i=1}^n p(y_i | f_i) / ∫ p(f) ∏_{i=1}^n p(y_i | f_i) df.

Various methods exist to make a Gaussian approximation, q(f) ≈ p(f | y). This allows simple predictions:

p(f* | y) = ∫ p(f* | f) p(f | y) df ≈ ∫ p(f* | f) q(f) df.
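
Because q(f) is Gaussian and p(f* | f) is a Gaussian conditional under the GP prior, the approximate predictive ∫ p(f* | f) q(f) df is itself Gaussian. A sketch under that assumption, taking the kernel matrices and a fitted q(f) = N(m, C) as given inputs:

```python
import numpy as np

def predict_latent(Kff, Ksf, Kss, m, C):
    """Return mean and covariance of int p(f* | f) q(f) df for Gaussian q(f) = N(m, C).

    Kff: train/train covariance, Ksf: test/train covariance, Kss: test/test covariance.
    """
    A = np.linalg.solve(Kff, Ksf.T).T           # A = Ksf Kff^{-1}
    mean_star = A @ m                           # predictive mean of the latent f*
    cov_star = Kss - A @ Ksf.T + A @ C @ A.T    # prior conditional + posterior uncertainty
    return mean_star, cov_star
```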

9 Laplace approximation. Find the mode of the true log posterior via Newton's method. Use a second-order Taylor expansion around this modal value, i.e. obtain the curvature at this point. Form a Gaussian approximation by setting the mean equal to the posterior mode, f̂, and matching the curvature:

p(f | y) ≈ q(f | µ, C) = N(f | f̂, (K_ff^{-1} + W)^{-1}), where W = −∇∇ log p(y | f̂) is the negative Hessian of the log likelihood at f̂.

For factorizing likelihoods (most), W is diagonal.
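
A minimal sketch of the Newton iteration for finding f̂ and the Laplace covariance, assuming a Poisson likelihood with exp link so that the gradient and W have simple forms; this is the naive (non Cholesky-stabilized) version:

```python
import numpy as np

def laplace_approx_poisson(Kff, y, max_iter=100, tol=1e-8):
    """Laplace approximation for a GP with Poisson likelihood and exp link (a sketch).

    Maximizes log p(y|f) - 0.5 f^T Kff^{-1} f by Newton's method, then returns the
    mode fhat and covariance (Kff^{-1} + W)^{-1} with W = -d^2 log p(y|f)/df^2 at fhat.
    """
    n = len(y)
    f = np.zeros(n)
    for _ in range(max_iter):
        grad = y - np.exp(f)                 # d log p(y|f) / df for Poisson + exp link
        W = np.diag(np.exp(f))               # negative Hessian of log p(y|f), diagonal
        # Newton step: f_new = (Kff^{-1} + W)^{-1} (W f + grad) = Kff (I + W Kff)^{-1} (W f + grad)
        f_new = Kff @ np.linalg.solve(np.eye(n) + W @ Kff, W @ f + grad)
        if np.max(np.abs(f_new - f)) < tol:
            f = f_new
            break
        f = f_new
    W = np.diag(np.exp(f))
    C = np.linalg.inv(np.linalg.inv(Kff) + W)   # Laplace covariance at the mode
    return f, C
```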

10–20 Visualization of Laplace. [Figure sequence: the prior, likelihood, and (non-Gaussian) posterior over f are plotted and the approximation is built up step by step; the curvature is evaluated at the posterior mode and the resulting Laplace (Gaussian) approximation is overlaid on the posterior.]

21 KL-method. Make a Gaussian approximation, q(f) = N(f | µ, C), as similar as possible to the true posterior, p(f | y). Treat µ and C as variational parameters, affecting the quality of the approximation. Define a divergence measure between the two distributions, the KL divergence KL(q(f) ‖ p(f | y)). Minimize this divergence between the two distributions (Nickisch and Rasmussen, 2008).

22 KL divergence. General for any two distributions q(x) and p(x). KL(q(x) ‖ p(x)) is the average additional amount of information required to specify the value of x as a result of using an approximate distribution q(x) instead of the true distribution, p(x):

KL(q(x) ‖ p(x)) = ⟨log q(x) − log p(x)⟩_{q(x)} = ∫ q(x) log [q(x) / p(x)] dx.

It is always zero or positive, and it is not symmetric. Let's look at how it changes in response to changes in the approximating distribution.
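
The sweeps on the following slides use the closed-form KL between univariate Gaussians, which also makes the asymmetry easy to check numerically; the particular parameter values below are illustrative:

```python
import numpy as np

def kl_gauss(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for univariate Gaussians."""
    return 0.5 * (np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

print(kl_gauss(0.0, 1.0, 0.0, 1.0))   # 0.0: identical distributions
print(kl_gauss(2.0, 1.0, 0.0, 1.0))   # grows as the means move apart
print(kl_gauss(0.0, 0.3, 0.0, 1.0),   # asymmetry: these two values differ
      kl_gauss(0.0, 1.0, 0.0, 0.3))
```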

23–27 KL varying mean. [Figure sequence: q(x) = N(µ, 1.0) with µ varied across several values against a fixed unit-variance p(x); the right panel traces KL[q(x) ‖ p(x)] as a function of µ.]

28–32 KL varying variance. [Figure sequence: q(x) = N(µ, σ²) with σ² varied from 0.3 to 2.0 against a fixed unit-variance p(x); the right panel traces KL[q(x) ‖ p(x)] as a function of σ².]

33 KL-method derivation. Assume a Gaussian approximate posterior, q(f) = N(f | µ, C). The true posterior, using Bayes' rule, is p(f | y) = p(y | f) p(f) / p(y). We cannot compute the KL divergence directly, as we cannot compute the true posterior, p(f | y):

KL(q(f) ‖ p(f | y)) = ⟨log q(f) − log p(f | y)⟩_{q(f)}
= ⟨log q(f) − log p(y | f) − log p(f) + log p(y)⟩_{q(f)}
= KL(q(f) ‖ p(f)) − ⟨log p(y | f)⟩_{q(f)} + log p(y).

Rearranging: log p(y) = ⟨log p(y | f)⟩_{q(f)} − KL(q(f) ‖ p(f)) + KL(q(f) ‖ p(f | y)).

34 KL-method derivation (continued).

log p(y) = ⟨log p(y | f)⟩_{q(f)} − KL(q(f) ‖ p(f)) + KL(q(f) ‖ p(f | y)) ≥ ⟨log p(y | f)⟩_{q(f)} − KL(q(f) ‖ p(f)).

The tractable terms give a lower bound on log p(y), since KL(q(f) ‖ p(f | y)) is always positive. Adjust the variational parameters µ and C to make the tractable terms as large as possible, and thereby make KL(q(f) ‖ p(f | y)) as small as possible. With a factorizing likelihood, ⟨log p(y | f)⟩_{q(f)} can be computed as a series of n one-dimensional integrals. In practice, the number of variational parameters can be reduced by reparameterizing C = (K_ff^{-1} + 2Λ)^{-1} with Λ diagonal, noting that the likelihood term of the bound involves C only through its diagonal.
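
One way to evaluate those one-dimensional integrals is Gauss-Hermite quadrature. A sketch for a Poisson likelihood with exp link (the likelihood choice and quadrature order are assumptions; other factorizing likelihoods would only change the log_lik line):

```python
import numpy as np
from scipy.special import gammaln

def expected_log_lik_poisson(y, means, variances, order=20):
    """Approximate sum_i E_{q(f_i)}[log p(y_i | f_i)] by Gauss-Hermite quadrature,
    with q(f_i) = N(means[i], variances[i]) and p(y_i | f_i) = Poisson(y_i | exp(f_i))."""
    nodes, weights = np.polynomial.hermite.hermgauss(order)
    # Change of variables f = mean + sqrt(2 var) * node turns each Gaussian expectation
    # into a weighted sum over the quadrature nodes.
    f = means[:, None] + np.sqrt(2.0 * variances[:, None]) * nodes[None, :]
    log_lik = y[:, None] * f - np.exp(f) - gammaln(y[:, None] + 1.0)
    return np.sum(log_lik @ weights) / np.sqrt(np.pi)
```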

35 Expectation Propagation.

p(f | y) ∝ p(f) ∏_{i=1}^n p(y_i | f_i)
q(f | y) = (1 / Z_EP) p(f) ∏_{i=1}^n t_i(f_i | Z̃_i, µ̃_i, σ̃²_i) = N(f | µ, Σ), where t_i(f_i | Z̃_i, µ̃_i, σ̃²_i) = Z̃_i N(f_i | µ̃_i, σ̃²_i).

The individual likelihood terms, p(y_i | f_i), are replaced by independent local likelihoods, t_i. An iterative algorithm is used to update the t_i's.

36 Expectation Propagation.
1. From the current posterior, q(f | y), leave out one of the local likelihoods, t_i, then marginalize out f_{j≠i}, giving rise to the cavity distribution, q_{−i}(f_i).
2. Combine the cavity distribution, q_{−i}(f_i), with the exact likelihood contribution, p(y_i | f_i), giving a non-Gaussian, un-normalized distribution, q̂(f_i) ∝ p(y_i | f_i) q_{−i}(f_i).
3. Choose an un-normalized Gaussian approximation to this distribution, Ẑ_i N(f_i | µ̂_i, σ̂²_i), by finding the moments of q̂(f_i).
4. Replace the parameters of t_i with those that produce the same moments as this approximation.
5. Choose another i and start again. Repeat to convergence.

37 Expectation Propagation - in math.
Step 1. First choose a marginal, i, to focus on. The current approximation is q(f | y) ∝ p(f) ∏_{j=1}^n t_j(f_j); removing site i gives p(f) ∏_{j≠i} t_j(f_j) = p(f) ∏_{j=1}^n t_j(f_j) / t_i(f_i), and marginalizing gives the cavity q_{−i}(f_i) ∝ ∫ p(f) ∏_{j≠i} t_j(f_j) df_{j≠i}.
Step 2. p(y_i | f_i) q_{−i}(f_i) ∝ q̂(f_i), with normalizer Ẑ_i.
Step 3. q̂(f_i) ≈ N(f_i | µ̂_i, σ̂²_i).
Step 4. Compute the parameters of t_i(f_i | Z̃_i, µ̃_i, σ̃²_i) making the moments of q(f_i) match those of Ẑ_i N(f_i | µ̂_i, σ̂²_i).
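
A sketch of one moment-matching site update for a probit likelihood, using the ±1 label convention and the standard closed-form moments of Φ times a Gaussian; the cavity parameters are assumed to have already been computed from the current q(f | y):

```python
import numpy as np
from scipy.stats import norm

def ep_site_update_probit(y_i, cav_mu, cav_var):
    """One EP site update for p(y_i | f_i) = Phi(y_i f_i), y_i in {-1, +1} (a sketch).

    Returns the new site natural parameters (precision, precision * mean)."""
    z = y_i * cav_mu / np.sqrt(1.0 + cav_var)
    ratio = norm.pdf(z) / norm.cdf(z)
    # Steps 2-3: moments of qhat(f_i) proportional to Phi(y_i f_i) N(f_i | cav_mu, cav_var).
    mu_hat = cav_mu + y_i * cav_var * ratio / np.sqrt(1.0 + cav_var)
    var_hat = cav_var - cav_var ** 2 * ratio * (z + ratio) / (1.0 + cav_var)
    # Step 4: site parameters chosen so that cavity * site reproduces these moments.
    site_precision = 1.0 / var_hat - 1.0 / cav_var
    site_precision_mean = mu_hat / var_hat - cav_mu / cav_var
    return site_precision, site_precision_mean
```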

38 Comparing posterior approximations. [Figure: two panels over (f_1, f_2), the prior p(f_1, f_2) and the likelihood p(y = 1 | f_1, f_2).] Gaussian prior between two function values {f_1, f_2}, at inputs {x_1, x_2} respectively. Bernoulli likelihood, with y_1 = 1 and y_2 = 1.

39 Comparing posterior approximations. [Figure: two panels over (f_1, f_2), the true posterior and the Laplace approximation.] The true posterior is non-Gaussian. Laplace approximates it with a Gaussian centred at the mode of the posterior.

40 Comparing posterior approximations. [Figure: two panels over (f_1, f_2), the true posterior and the KL approximation.] The true posterior is non-Gaussian. The KL method approximates it with the Gaussian that has minimal KL divergence, KL(q(f) ‖ p(f | y)). This leads to distributions that avoid regions in which p(f | y) is small: there is a large penalty for assigning density where there is none.

41 Comparing posterior approximations. [Figure: two panels over (f_1, f_2), the true posterior and the EP approximation.] The true posterior is non-Gaussian. EP tends to put density where p(f | y) is large, and cares less about assigning density where there is none. This contrasts with the KL method.

42 Comparing posterior marginal approximations. [Figure: marginals for f_2 compared; legend: true posterior, Laplace, EP, KL.] Laplace: poor approximation. KL: avoids assigning density to areas where there is none, at the expense of areas where there is some (right tail). EP: assigns density to areas with density, at the expense of areas where there is none (left tail).

43 Pros - Cons - When - Laplace. Laplace approximation. Pros: very fast. Cons: poor approximation if the mode does not describe the posterior well, for example with a Bernoulli (probit) likelihood. When: when the posterior is well characterized by its mode, for example Poisson.

44 Pros - Cons - When - KL. KL method. Pros: principled, in that we are directly optimizing a measure of divergence between the approximation and the true distribution. Can be relatively quick, and lends itself to sparse approximations (Hensman et al., 2015). Cons: requires factorizing likelihoods to avoid an n-dimensional integral. When: when the Laplace approximation is poor or the likelihood is not Bernoulli.

45 Pros - Cons - When - EP. EP method. Pros: very effective for certain likelihoods (classification). Cons: slow, though possible to extend to the sparse case. Convergence issues for certain likelihoods. Must be able to match moments. When: binary data (Nickisch and Rasmussen, 2008; Kuß, 2006), perhaps with a truncated likelihood (censored data) (Vanhatalo et al., 2015).

46 Pros - Cons - When - MCMC. MCMC methods. Pros: in the theoretical limit, gives the true distribution. Cons: can be slow when the data is large. When: if time is not an issue, but exact accuracy is. If you are unsure whether a different approximation is appropriate, it can be used as a ground truth.

47 Questions Thanks for listening. Any questions?

48 References
Hensman, J., Matthews, A. G. D. G., and Ghahramani, Z. (2015). Scalable variational Gaussian process classification. In 18th International Conference on Artificial Intelligence and Statistics, pages 1-9, San Diego, California, USA.
Kuß, M. (2006). Gaussian Process Models for Robust Regression, Classification, and Reinforcement Learning. PhD thesis, TU Darmstadt.
Nickisch, H. and Rasmussen, C. E. (2008). Approximations for binary Gaussian process classification. Journal of Machine Learning Research, 9.
Vanhatalo, J., Riihimäki, J., Hartikainen, J., Jylänki, P., Tolvanen, V., and Vehtari, A. (2015). GPstuff.
