Non-Gaussian likelihoods for Gaussian Processes
1 Non-Gaussian likelihoods for Gaussian Processes. Alan Saul, University of Sheffield.
2 Outline: Motivation; Laplace approximation; KL method; Expectation Propagation; Comparing approximations.
3 GP regression. Model the observations, $y$, as a distorted version of the process, $f$: $y_i \sim \mathcal{N}(f(x_i), \sigma^2)$. $f$ is a non-linear function; in our case we assume it is latent, and it is assigned a Gaussian process prior. [Figure: realizations of $f(x)$, observations, and 95% credible intervals for $p(y_* \mid y)$.]
4 GP regression setting. So far we have assumed that the latent values, $\mathbf{f}$, have been corrupted by Gaussian noise. Everything remains analytically tractable.
Gaussian prior: $f \sim \mathcal{GP}(0, K_{ff})$, giving $p(\mathbf{f}) = \mathcal{N}(\mathbf{f} \mid 0, K_{ff})$ at the training inputs.
Gaussian likelihood: $p(\mathbf{y} \mid \mathbf{f}) = \mathcal{N}(\mathbf{y} \mid \mathbf{f}, \sigma^2 I) = \prod_{i=1}^{n} p(y_i \mid f_i)$.
Gaussian posterior: $p(\mathbf{f} \mid \mathbf{y}) \propto \mathcal{N}(\mathbf{y} \mid \mathbf{f}, \sigma^2 I)\, \mathcal{N}(\mathbf{f} \mid 0, K_{ff})$.
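To make the conjugate case concrete, here is a minimal numpy sketch of that closed-form posterior at the training inputs; the kernel, data, and hyperparameter values are illustrative choices, not taken from the talk.

```python
import numpy as np

def rbf_kernel(X1, X2, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sqdist = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)   # noisy observations of a latent f

sigma2 = 0.1 ** 2                                # Gaussian noise variance
Kff = rbf_kernel(X, X)
# Conjugacy gives the posterior over f at the training inputs in closed form:
#   p(f | y) = N( Kff (Kff + sigma2 I)^-1 y,  Kff - Kff (Kff + sigma2 I)^-1 Kff )
A = np.linalg.solve(Kff + sigma2 * np.eye(len(X)), np.eye(len(X)))
post_mean = Kff @ A @ y
post_cov = Kff - Kff @ A @ Kff
```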
5 Likelihood. $p(\mathbf{y} \mid \mathbf{f})$ is the probability of the observed data given the latent function values $\mathbf{f}$. It can also be seen as the likelihood that the latent function values, $\mathbf{f}$, would give rise to some observed data, $\mathbf{y}$. So far we have assumed that the distortion of the underlying latent function, $f$, that gives rise to the observed data is independent and Gaussian. This is often not the case: count data, binary data, etc.
6 Binary example. Binary outcomes for $y_i$: $y_i \in \{0, 1\}$. Model the probability of $y_i = 1$ with a transformation of a GP. The probability of a 1 must lie between 0 and 1, so use a squashing transformation, $\lambda(f_i) = \Phi(f_i)$:
$y_i = 1$ with probability $\lambda(f_i)$, and $y_i = 0$ with probability $1 - \lambda(f_i)$.
[Figure: realizations of $f(x)$, realizations of $\Phi(f(x))$, and observations.]
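As a quick illustration of the squashing step, a sketch that draws one latent function from the GP prior and pushes it through the probit link; it reuses the illustrative rbf_kernel from the sketch above, and all values are made up.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
X = np.linspace(0, 5, 50)
Kff = rbf_kernel(X, X) + 1e-8 * np.eye(50)             # jitter for a stable Cholesky
f = np.linalg.cholesky(Kff) @ rng.standard_normal(50)  # one draw of f from the prior
p_one = norm.cdf(f)               # lambda(f_i) = Phi(f_i), squashed into (0, 1)
y = rng.binomial(1, p_one)        # y_i = 1 with probability Phi(f_i)
```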
7 Count data example. Non-negative, discrete values only for $y_i$: $y_i \in \mathbb{N}$. Model the rate or intensity, $\lambda$, of events with a transformation of a Gaussian process. The rate parameter must remain positive, so use a transformation that maintains positivity, $\lambda(f_i) = \exp(f_i)$ or $\lambda(f_i) = f_i^2$:
$y_i \sim \text{Poisson}(y_i \mid \lambda_i = \lambda(f_i)), \qquad \text{Poisson}(y_i \mid \lambda_i) = \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!}$
[Figure: realizations of $\exp(f(x))$, observations, and 95% credible intervals for $p(y_* \mid y)$.]
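The same construction with the exp link gives counts; again a hedged sketch reusing the illustrative rbf_kernel from above.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(0, 5, 50)
Kff = rbf_kernel(X, X) + 1e-8 * np.eye(50)
f = np.linalg.cholesky(Kff) @ rng.standard_normal(50)
rate = np.exp(f)            # lambda(f_i) = exp(f_i) keeps the rate positive
y = rng.poisson(rate)       # y_i ~ Poisson(lambda(f_i)): non-negative integer counts
```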
8 Non-Gaussian posteriors. Exact computation of the posterior is no longer analytically tractable, because the Gaussian process prior is not conjugate to the non-Gaussian likelihood, $p(\mathbf{y} \mid \mathbf{f})$:
$p(\mathbf{f} \mid \mathbf{y}) = \frac{p(\mathbf{f}) \prod_{i=1}^{n} p(y_i \mid f_i)}{\int p(\mathbf{f}) \prod_{i=1}^{n} p(y_i \mid f_i)\, d\mathbf{f}}$
Various methods make a Gaussian approximation, $q(\mathbf{f}) \approx p(\mathbf{f} \mid \mathbf{y})$, which allows simple predictions:
$p(f_* \mid \mathbf{y}) = \int p(f_* \mid \mathbf{f})\, p(\mathbf{f} \mid \mathbf{y})\, d\mathbf{f} \approx \int p(f_* \mid \mathbf{f})\, q(\mathbf{f})\, d\mathbf{f}$
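Once a Gaussian $q(\mathbf{f}) = \mathcal{N}(\mu, C)$ is available, the prediction integral above stays Gaussian. A sketch of the standard algebra; the function and its signature are my own, and it inverts $K_{ff}$ directly for clarity rather than using Cholesky solves.

```python
import numpy as np

def predict_latent(Xs, X, mu, C, kern, jitter=1e-8):
    """Mean and covariance of p(f*|y) ~= integral of p(f*|f) q(f) df, q(f) = N(mu, C)."""
    Kff = kern(X, X) + jitter * np.eye(len(X))
    Ksf = kern(Xs, X)
    Kss = kern(Xs, Xs)
    Kinv = np.linalg.inv(Kff)
    mean = Ksf @ Kinv @ mu
    # Prior predictive cov, minus what conditioning on f removes, plus q's own uncertainty:
    cov = Kss - Ksf @ Kinv @ Ksf.T + Ksf @ Kinv @ C @ Kinv @ Ksf.T
    return mean, cov
```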
9 Laplace approximation. Find the mode of the true log posterior via Newton's method. Use a second-order Taylor expansion around this modal value, i.e. obtain the curvature at this point. Form a Gaussian approximation by setting the mean equal to the posterior mode, $\hat{\mathbf{f}}$, and matching the curvature:
$p(\mathbf{f} \mid \mathbf{y}) \approx q(\mathbf{f} \mid \mu, C) = \mathcal{N}\big(\mathbf{f} \mid \hat{\mathbf{f}}, (K_{ff}^{-1} + W)^{-1}\big), \qquad W = -\frac{d^2 \log p(\mathbf{y} \mid \hat{\mathbf{f}})}{d\hat{\mathbf{f}}^2}$
For factorizing likelihoods (most), $W$ is diagonal.
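A minimal sketch of the Newton iteration and resulting Laplace approximation, written in the standard numerically stable form with $B = I + W^{1/2} K W^{1/2}$; the helper names and the Poisson derivatives at the end are illustrative assumptions, not the talk's code.

```python
import numpy as np

def laplace_approx(K, y, grad_ll, hess_ll, n_iter=20):
    """Newton's method for the mode of log p(y|f) + log p(f), with p(f) = N(0, K),
    then the Gaussian approximation q(f) = N(f_hat, (K^-1 + W)^-1)."""
    f = np.zeros(len(y))
    for _ in range(n_iter):
        W = -hess_ll(f, y)          # diagonal, since the likelihood factorizes
        sW = np.sqrt(W)
        b = W * f + grad_ll(f, y)
        B = np.eye(len(y)) + sW[:, None] * K * sW[None, :]
        # Newton step f <- (K^-1 + W)^-1 b, computed without inverting K:
        f = K @ (b - sW * np.linalg.solve(B, sW * (K @ b)))
    W = -hess_ll(f, y)
    cov = np.linalg.inv(np.linalg.inv(K) + np.diag(W))
    return f, cov

# e.g. Poisson likelihood with exp link: log p(y_i|f_i) = y_i f_i - e^{f_i} - log(y_i!)
grad = lambda f, y: y - np.exp(f)   # d/df of log p(y|f)
hess = lambda f, y: -np.exp(f)      # d^2/df^2 of log p(y|f), always negative here
```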
10-20 Visualization of Laplace. [Figure sequence over $f$: prior, likelihood, and posterior densities; the curvature is evaluated at the posterior mode, and the resulting Laplace approximation is overlaid on the true posterior.]
21 KL-method. Make a Gaussian approximation, $q(\mathbf{f}) = \mathcal{N}(\mathbf{f} \mid \mu, C)$, as similar as possible to the true posterior, $p(\mathbf{f} \mid \mathbf{y})$. Treat $\mu$ and $C$ as variational parameters, affecting the quality of the approximation. Define a divergence measure between the two distributions, the KL divergence, $\mathrm{KL}\big(q(\mathbf{f})\,\|\,p(\mathbf{f} \mid \mathbf{y})\big)$, and minimize this divergence between the two distributions (Nickisch and Rasmussen, 2008).
22 KL divergence. General for any two distributions $q(x)$ and $p(x)$. $\mathrm{KL}\big(q(x)\,\|\,p(x)\big)$ is the average additional amount of information required to specify the value of $x$ as a result of using an approximating distribution $q(x)$ instead of the true distribution, $p(x)$:
$\mathrm{KL}\big(q(x)\,\|\,p(x)\big) = \left\langle \log \frac{q(x)}{p(x)} \right\rangle_{q(x)}$
Always zero or positive, and not symmetric. Let's look at how it changes in response to changes in the approximating distribution.
23-27 KL varying mean. [Figure sequence: $q(x) \sim \mathcal{N}(\mu, 1.0)$ against $p(x) \sim \mathcal{N}(0, 1.0)$ as $\mu$ sweeps from $-2.0$ to $2.0$, with $\mathrm{KL}[q(x)\,\|\,p(x)]$ traced as a function of $\mu$.]
28-32 KL varying variance. [Figure sequence: $q(x) \sim \mathcal{N}(0, \sigma^2)$ against $p(x) \sim \mathcal{N}(0, 1.0)$ as $\sigma^2$ sweeps from $0.3$ to $2.0$, with $\mathrm{KL}[q(x)\,\|\,p(x)]$ traced as a function of $\sigma^2$.]
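The closed-form KL between univariate Gaussians makes these two sweeps easy to reproduce; a small sketch, where the values in the comments come from the formula rather than from any figure in the talk.

```python
import numpy as np

def kl_gauss(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for univariate Gaussians."""
    return 0.5 * (np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

kl_gauss(2.0, 1.0, 0.0, 1.0)   # = 2.0   : grows as the means separate
kl_gauss(0.0, 0.3, 0.0, 1.0)   # ~ 0.252 : q narrower than p
kl_gauss(0.0, 1.0, 0.0, 0.3)   # ~ 0.565 : not symmetric, KL(q||p) != KL(p||q)
```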
33 KL-method derivation. Assume a Gaussian approximate posterior, $q(\mathbf{f}) = \mathcal{N}(\mathbf{f} \mid \mu, C)$. The true posterior, by Bayes' rule, is $p(\mathbf{f} \mid \mathbf{y}) = \frac{p(\mathbf{y} \mid \mathbf{f})\, p(\mathbf{f})}{p(\mathbf{y})}$. We cannot compute the KL divergence directly, as we cannot compute the true posterior, $p(\mathbf{f} \mid \mathbf{y})$:
$\mathrm{KL}\big(q(\mathbf{f})\,\|\,p(\mathbf{f} \mid \mathbf{y})\big) = \left\langle \log \frac{q(\mathbf{f})}{p(\mathbf{f} \mid \mathbf{y})} \right\rangle_{q(\mathbf{f})} = \left\langle \log \frac{q(\mathbf{f})}{p(\mathbf{f})} - \log p(\mathbf{y} \mid \mathbf{f}) + \log p(\mathbf{y}) \right\rangle_{q(\mathbf{f})} = \mathrm{KL}\big(q(\mathbf{f})\,\|\,p(\mathbf{f})\big) - \big\langle \log p(\mathbf{y} \mid \mathbf{f}) \big\rangle_{q(\mathbf{f})} + \log p(\mathbf{y})$
Rearranging: $\log p(\mathbf{y}) = \big\langle \log p(\mathbf{y} \mid \mathbf{f}) \big\rangle_{q(\mathbf{f})} - \mathrm{KL}\big(q(\mathbf{f})\,\|\,p(\mathbf{f})\big) + \mathrm{KL}\big(q(\mathbf{f})\,\|\,p(\mathbf{f} \mid \mathbf{y})\big)$
34 KL-method derivation.
$\log p(\mathbf{y}) = \big\langle \log p(\mathbf{y} \mid \mathbf{f}) \big\rangle_{q(\mathbf{f})} - \mathrm{KL}\big(q(\mathbf{f})\,\|\,p(\mathbf{f})\big) + \mathrm{KL}\big(q(\mathbf{f})\,\|\,p(\mathbf{f} \mid \mathbf{y})\big) \geq \big\langle \log p(\mathbf{y} \mid \mathbf{f}) \big\rangle_{q(\mathbf{f})} - \mathrm{KL}\big(q(\mathbf{f})\,\|\,p(\mathbf{f})\big)$
The tractable terms give a lower bound on $\log p(\mathbf{y})$, since $\mathrm{KL}\big(q(\mathbf{f})\,\|\,p(\mathbf{f} \mid \mathbf{y})\big)$ is always non-negative. Adjust the variational parameters $\mu$ and $C$ to make the tractable terms as large as possible, and thus $\mathrm{KL}\big(q(\mathbf{f})\,\|\,p(\mathbf{f} \mid \mathbf{y})\big)$ as small as possible. With a factorizing likelihood, $\big\langle \log p(\mathbf{y} \mid \mathbf{f}) \big\rangle_{q(\mathbf{f})}$ reduces to a series of $n$ one-dimensional integrals (a sketch follows below). In practice, the number of variational parameters can be reduced by reparameterizing $C = (K_{ff}^{-1} + 2\Lambda)^{-1}$ with $\Lambda$ diagonal, noting that the bound is constant in the off-diagonal terms of $C$.
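As a sketch of those $n$ one-dimensional integrals, each $\langle \log p(y_i \mid f_i)\rangle_{q(f_i)}$ can be approximated with Gauss-Hermite quadrature; the function and the Poisson example below are illustrative assumptions, not the talk's implementation.

```python
import numpy as np

def expected_log_lik(mu, var, y, log_lik, n_quad=20):
    """<log p(y_i|f_i)>_{q(f_i)} for each i, with q(f_i) = N(mu_i, var_i),
    by Gauss-Hermite quadrature: one 1-D integral per data point."""
    x, w = np.polynomial.hermite_e.hermegauss(n_quad)       # weight exp(-x^2/2)
    fs = mu[:, None] + np.sqrt(var)[:, None] * x[None, :]   # nodes mapped to q(f_i)
    return (log_lik(y[:, None], fs) @ w) / np.sqrt(2 * np.pi)

# e.g. Poisson with exp link, dropping the constant -log(y_i!) term:
poisson_ll = lambda y, f: y * f - np.exp(f)
```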
35 Expectation Propagation.
$p(\mathbf{f} \mid \mathbf{y}) \propto p(\mathbf{f}) \prod_{i=1}^{n} p(y_i \mid f_i), \qquad q(\mathbf{f} \mid \mathbf{y}) = \frac{1}{Z_{ep}}\, p(\mathbf{f}) \prod_{i=1}^{n} t_i(f_i \mid \tilde{Z}_i, \tilde{\mu}_i, \tilde{\sigma}_i^2) = \mathcal{N}(\mathbf{f} \mid \mu, \Sigma), \qquad t_i(f_i \mid \tilde{Z}_i, \tilde{\mu}_i, \tilde{\sigma}_i^2) = \tilde{Z}_i\, \mathcal{N}(f_i \mid \tilde{\mu}_i, \tilde{\sigma}_i^2)$
Individual likelihood terms, $p(y_i \mid f_i)$, are replaced by independent local likelihoods, $t_i$. An iterative algorithm updates the $t_i$'s.
36 Expectation Propagation.
1. From the current posterior, $q(\mathbf{f} \mid \mathbf{y})$, leave out one of the local likelihoods, $t_i$, then marginalize out $f_{j \neq i}$, giving rise to the cavity distribution, $q_{-i}(f_i)$.
2. Combine the cavity distribution, $q_{-i}(f_i)$, with the exact likelihood contribution, $p(y_i \mid f_i)$, giving a non-Gaussian un-normalized distribution, $\hat{q}(f_i) \propto p(y_i \mid f_i)\, q_{-i}(f_i)$.
3. Choose an un-normalized Gaussian approximation to this distribution, $\hat{Z}_i\, \mathcal{N}(f_i \mid \hat{\mu}_i, \hat{\sigma}_i^2)$, by finding the moments of $\hat{q}(f_i)$.
4. Replace the parameters of $t_i$ with those that produce the same moments as this approximation.
5. Choose another $i$ and start again. Repeat to convergence.
37 Expectation Propagation, in math.
Step 1: first choose a marginal, $i$, to focus on, then form the cavity:
$q(\mathbf{f} \mid \mathbf{y}) \propto p(\mathbf{f}) \prod_{j=1}^{n} t_j(f_j) = p(\mathbf{f}) \Big(\prod_{j \neq i} t_j(f_j)\Big)\, t_i(f_i), \qquad q_{-i}(f_i) \propto \int p(\mathbf{f}) \prod_{j \neq i} t_j(f_j)\, d\mathbf{f}_{j \neq i}$
Step 2: $\hat{q}(f_i) \propto p(y_i \mid f_i)\, q_{-i}(f_i)$.
Step 3: $\hat{q}(f_i) \approx \hat{Z}_i\, \mathcal{N}(f_i \mid \hat{\mu}_i, \hat{\sigma}_i^2)$.
Step 4: compute the parameters of $t_i(f_i \mid \tilde{Z}_i, \tilde{\mu}_i, \tilde{\sigma}_i^2)$ making the moments of $q(f_i)$ match those of $\hat{Z}_i\, \mathcal{N}(f_i \mid \hat{\mu}_i, \hat{\sigma}_i^2)$ (a sketch of steps 2-4 for a probit likelihood follows).
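For a probit likelihood with $y_i \in \{-1, +1\}$, the tilted moments in steps 2-3 happen to be available in closed form; a sketch under that assumption (other likelihoods would compute these moments by one-dimensional quadrature).

```python
import numpy as np
from scipy.stats import norm

def probit_tilted_moments(y, mu_cav, var_cav):
    """Zeroth, first, and second moments of q_hat(f) ~ Phi(y f) N(f | mu_cav, var_cav)."""
    z = y * mu_cav / np.sqrt(1.0 + var_cav)
    Z_hat = norm.cdf(z)                      # normalizer of the tilted distribution
    ratio = norm.pdf(z) / Z_hat
    mu_hat = mu_cav + y * var_cav * ratio / np.sqrt(1.0 + var_cav)
    var_hat = var_cav - var_cav ** 2 * ratio * (z + ratio) / (1.0 + var_cav)
    return Z_hat, mu_hat, var_hat

# Step 4 then divides the cavity back out, most easily in natural parameters:
#   1 / sigma2_site         = 1 / var_hat     - 1 / var_cav
#   mu_site / sigma2_site   = mu_hat / var_hat - mu_cav / var_cav
```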
38 Comparing posterior approximations. [Figure: contour plots over $(f_1, f_2)$ of the prior $p(f_1, f_2)$ and the likelihood $p(\mathbf{y} = 1 \mid f_1, f_2)$.] Gaussian prior between two function values, $\{f_1, f_2\}$, at $\{x_1, x_2\}$ respectively. Bernoulli likelihood, with $y_1 = 1$ and $y_2 = 1$.
39 Comparing posterior approximations. [Figure: contour plots over $(f_1, f_2)$ of the true posterior and the Laplace approximation.] The true posterior is non-Gaussian. Laplace approximates it with a Gaussian centred at the mode of the posterior.
40 Comparing posterior approximations. [Figure: contour plots over $(f_1, f_2)$ of the true posterior and the KL approximation.] The true posterior is non-Gaussian. The KL method approximates it with the Gaussian that has minimal KL divergence, $\mathrm{KL}\big(q(\mathbf{f})\,\|\,p(\mathbf{f} \mid \mathbf{y})\big)$. This leads to distributions that avoid regions in which $p(\mathbf{f} \mid \mathbf{y})$ is small: there is a large penalty for assigning density where there is none.
41 Comparing posterior approximations. [Figure: contour plots over $(f_1, f_2)$ of the true posterior and the EP approximation.] The true posterior is non-Gaussian. EP tends to put density where $p(\mathbf{f} \mid \mathbf{y})$ is large, and cares less about assigning density where there is none. This contrasts with the KL method.
42 Comparing posterior marginal approximations. [Figure: marginals for $f_2$ compared across the true posterior, Laplace, EP, and KL.] Laplace: poor approximation. KL: avoids assigning density to areas where there is none, at the expense of areas where there is some (right tail). EP: assigns density to areas with density, at the expense of areas where there is none (left tail).
43 Pros - Cons - When - Laplace. Laplace approximation. Pros: very fast. Cons: poor approximation if the mode does not describe the posterior well, for example with a Bernoulli (probit) likelihood. When: the posterior is well characterized by its mode, for example Poisson.
44 Pros - Cons - When - KL. KL method. Pros: principled, in that we are directly optimizing a measure of divergence between the approximation and the true distribution; can be relatively quick, and lends itself to sparse approximations (Hensman et al., 2015). Cons: requires factorizing likelihoods to avoid an $n$-dimensional integral. When: the Laplace approximation is poor, or the likelihood is not Bernoulli.
45 Pros - Cons - When - EP. EP method. Pros: very effective for certain likelihoods (classification). Cons: slow, though possible to extend to the sparse case; convergence issues for certain likelihoods; must be able to match moments. When: binary data (Nickisch and Rasmussen, 2008; Kuß, 2006), perhaps with a truncated likelihood (censored data) (Vanhatalo et al., 2015).
46 Pros - Cons - When - MCMC. MCMC methods. Pros: in the theoretical limit, gives the true distribution. Cons: can be slow when the data is large. When: time is not an issue, but exact accuracy is; if you are unsure whether a different approximation is appropriate, it can be used as a ground truth.
47 Questions Thanks for listening. Any questions?
48 References
Hensman, J., Matthews, A. G. d. G., and Ghahramani, Z. (2015). Scalable variational Gaussian process classification. In 18th International Conference on Artificial Intelligence and Statistics, pages 1-9, San Diego, California, USA.
Kuß, M. (2006). Gaussian Process Models for Robust Regression, Classification, and Reinforcement Learning. PhD thesis, TU Darmstadt.
Nickisch, H. and Rasmussen, C. E. (2008). Approximations for binary Gaussian process classification. Journal of Machine Learning Research, 9.
Vanhatalo, J., Riihimäki, J., Hartikainen, J., Jylänki, P., Tolvanen, V., and Vehtari, A. (2015). GPstuff.