Sparse Approximations for Non-Conjugate Gaussian Process Regression


Thang Bui and Richard Turner
Computational and Biological Learning Lab, Department of Engineering, University of Cambridge
November 11, 2014

Abstract

Notes: this report only shows some preliminary work on scaling Gaussian process models that use non-Gaussian likelihoods. As papers on a similar idea have recently been arXived [1, 2], this report will stay as is; please consult those two papers for a proper discussion and experiments.

1 Introduction

Gaussian processes (GPs) are an important tool for probabilistic modelling in both supervised and unsupervised settings. Typically, GPs are used as priors over unknown continuous functions which govern the trajectory of the observed variables. When the likelihood term is conjugate to this prior, e.g. a Gaussian likelihood, the posterior distribution is analytically tractable. It is, however, not tractable for non-conjugate likelihoods, such as a Poisson likelihood in count regression or a logistic likelihood in binary classification. Approximate inference schemes such as the Laplace approximation or Expectation Propagation can be applied in these cases [3]. Critically, these GP-based models suffer from a high computational complexity, typically O(N^3) and O(N^2) for learning and prediction respectively. When dealing with large datasets, sparse approximation techniques that reduce the computational demand are preferred. However, most of these approximations were developed for regression problems with a Gaussian likelihood, and hence often cannot be applied in a straightforward way to problems with other forms of likelihood. This report introduces a variational inference formulation that

1. uses inducing points to augment the model, as in current approximations for the Gaussian likelihood,
2. uses the variational free energy approach as a basis to find a generic lower bound for many non-Gaussian likelihoods,
3. can perform joint optimisation of variational parameters and model hyperparameters,
4. has a small computational complexity.

2 Review of GPs for regression

p(f) = N(f; 0, K_ff)                                                        (1)
p(y | f) = ∏_{n=1}^N p(y_n | f_n)                                           (2)

Type            Distribution           p(y_n | f_n)
Continuous      Gaussian               p(y_n | f_n) = N(y_n; f_n, σ_n^2)
Binary          Bernoulli logistic     p(y_n = 1 | f_n) = σ(f_n)
Categorical     Multinomial logistic   p(y_n = k | f_n) = exp(f_{n,k} − lse(f_n))
Ordinal         Cumulative logistic    p(y_n ≤ k | f_n) = σ(φ_k − f_n)
Count           Poisson                p(y_n = k | f_n) = exp(k f_n − exp(f_n)) / k!
Continuous      Laplace                p(y_n | f_n) = 1/(2b) exp(−|y_n − f_n| / b)
Positive, real  Exponential            p(y_n | f_n) = exp(−y_n / exp(f_n)) / exp(f_n)

Table 1: Types of observed variables, possible likelihood functions and their forms; lse(·) denotes the log-sum-exp function.
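To make the rows of Table 1 concrete before they are used in Section 3.3, here is a minimal sketch of three of the log-likelihoods as NumPy functions. The function names and the NumPy/SciPy choices are ours; the report itself contains no code.

```python
import numpy as np
from scipy.special import gammaln  # log(k!) = gammaln(k + 1)

# Illustrative helpers for three rows of Table 1; the names are ours.
def log_lik_gaussian(y, f, sigma2):
    """Continuous data: log N(y; f, sigma2)."""
    return -0.5 * np.log(2.0 * np.pi * sigma2) - 0.5 * (y - f) ** 2 / sigma2

def log_lik_bernoulli_logistic(y, f):
    """Binary data, y in {0, 1}: y f - log(1 + exp(f)), computed stably."""
    return y * f - np.logaddexp(0.0, f)

def log_lik_poisson(y, f):
    """Count data, exponential reverse-link: y f - exp(f) - log(y!)."""
    return y * f - np.exp(f) - gammaln(y + 1.0)
```

The stable np.logaddexp(0, f) form of log(1 + exp(f)) is the quantity that the piecewise bounds of Section 3.3.2 approximate.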

3 Variational inference and learning

3.1 Formulation

Augment the model [4–7]. Choose a variational distribution for the joint, q(f, u) = q(f | u) q(u), with q(f | u) = p(f | u) [6]. It can be shown that the lower bound using the augmented variational distribution is smaller than the bound obtained if there were no auxiliary variables u [7]. The log marginal likelihood is

L = log p(y)                                                                (3)
  = log ∫ du df p(f, u) p(y | f)                                            (4)
  = log ∫ du df q(f, u) [p(f, u) p(y | f) / q(f, u)]                        (5)
  ≥ ∫ du df q(f, u) log [p(f, u) p(y | f) / q(f, u)]                        (6)
  = ∫ du df q(f, u) log [p(f | u) p(u) p(y | f) / (p(f | u) q(u))]          (7)
  = ∫ du q(u) log [p(u) / q(u)] + ∫ du df q(f, u) log p(y | f) = F.         (8)

The first term in the lower bound above is the negative KL divergence between the distributions q(u) and p(u). The second term can be expressed as a sum of N terms, F_2 = Σ_{n=1}^N F_{2,n}, where

F_{2,n} = ∫ du q(u) ∫ df_n p(f_n | u) log p(y_n | f_n)                      (9)
        = ∫ df_n log p(y_n | f_n) ∫ du q(u) p(f_n | u).                     (10)

Therefore, the lower bound to the log marginal likelihood can be expressed as

F = −KL(q(u) || p(u)) + Σ_{n=1}^N ∫ df_n q(f_n) log p(y_n | f_n),           (11)

where q(f_n) = ∫ du q(u) p(f_n | u). In the next sections, we discuss the choice of the variational distribution q(u) such that the KL term and q(f_n) can be computed analytically, and how to find tight or exact bounds on the expectation of the local likelihood under q(f_n). We discuss several likelihood classes for which this approximation can be used.

3.2 Choice of variational distribution

If we choose a Gaussian variational distribution, q(u) = N(u; m, C), the first term in the lower bound, the negative KL divergence, can be computed exactly,

F_1 = −KL(q(u) || p(u)) = −(1/2) [ tr(K_uu^{-1} C) + (m − μ_u)^T K_uu^{-1} (m − μ_u) − M − log(det C / det K_uu) ],   (12)

where p(u) = N(u; μ_u, K_uu) is the prior on the inducing variables. Typically μ_u = 0, which leads to a trivial simplification of the above equation,

F_1 = −KL(q(u) || p(u)) = −(1/2) [ tr(K_uu^{-1} C) + m^T K_uu^{-1} m − M − log(det C / det K_uu) ].   (13)
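The KL term in (13) is typically evaluated through Cholesky factorisations. A minimal sketch, assuming a zero prior mean; the helper name and the jitter term are ours:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def neg_kl_gaussian(m, C, Kuu, jitter=1e-6):
    """F_1 = -KL(N(m, C) || N(0, Kuu)) as in (13).

    Sketch only: the helper name and the jitter term are ours.
    """
    M = Kuu.shape[0]
    Kc = cho_factor(Kuu + jitter * np.eye(M), lower=True)
    Lc = np.linalg.cholesky(C + jitter * np.eye(M))
    trace_term = np.trace(cho_solve(Kc, C))           # tr(Kuu^{-1} C)
    quad_term = m @ cho_solve(Kc, m)                  # m^T Kuu^{-1} m
    logdet_C = 2.0 * np.sum(np.log(np.diag(Lc)))
    logdet_K = 2.0 * np.sum(np.log(np.diag(Kc[0])))
    return -0.5 * (trace_term + quad_term - M - logdet_C + logdet_K)
```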

The distribution q(u) is parameterised by its mean m and covariance matrix C, which means there are M^2 + M parameters in total. For a large M this may not be desirable and could lead to problems with the optimisation of the lower bound. TODO: 1. Is there any transformation we could do here to reduce the number of parameters? 2. Could use a sparse precision parameterisation as in [8].

Choosing a single Gaussian distribution to represent the posterior of u may be too coarse, and hence a multimodal distribution such as a mixture of Gaussian distributions may be a better choice for a multimodal posterior [9, 10] (TODO: this also significantly reduces the number of variational parameters). We could use a mixture of (uniformly) weighted Gaussians with isotropic covariances,

q(u) = (1/K) Σ_{k=1}^K N(u; m_k, σ_k^2 I).                                   (14)

However, this distribution leads to an intractability in computing the KL term, as it involves the log of a sum of distributions. Fortunately, we can bound this using Jensen's inequality [9, 11] as follows,

F_1 = −KL(q(u) || p(u))                                                      (15)
    = −∫ du q(u) log [q(u) / p(u)]                                           (16)
    = −(1/K) Σ_{k=1}^K ∫ du N(u; m_k, σ_k^2 I) log q(u) + (1/K) Σ_{k=1}^K ∫ du N(u; m_k, σ_k^2 I) log p(u)   (17)
    ≥ −(1/K) Σ_{k=1}^K log ∫ du q(u) N(u; m_k, σ_k^2 I) + (1/K) Σ_{k=1}^K ∫ du N(u; m_k, σ_k^2 I) log p(u),   (18)

where we have used the inequality to bound the first (entropy) term in the equations above. TODO: R. Turner's comment: there has been limited success using mixtures of Gaussians to parameterise the variational distribution, perhaps because this bound is not tight!

The first term in the bound above involves a convolution of two Gaussian distributions and hence

F_{1a} = −(1/K) Σ_{k=1}^K log ∫ du q(u) N(u; m_k, σ_k^2 I)                   (19)
       = −(1/K) Σ_{k=1}^K log [ (1/K) Σ_{k'=1}^K N(m_k; m_{k'}, (σ_k^2 + σ_{k'}^2) I) ].   (20)

The second term follows from the assumed form of the variational distribution,

F_{1b} = (1/K) Σ_{k=1}^K [ −(M/2) log(2π) − (1/2) log det K_uu − (1/2) (μ_u − m_k)^T K_uu^{-1} (μ_u − m_k) − (1/2) tr(K_uu^{-1} C_k) ],   (21)

where C_k = σ_k^2 I. The expressions presented above are tractable to compute, and their gradients with respect to the variational parameters {m_k, σ_k}_{k=1}^K can be computed in closed form. Note that when K = 1 the bound above is exact, and we obtain the result for the single Gaussian case presented at the beginning of this section. In what follows, we present results for the single Gaussian case, but the derivation for the mixture is straightforward.

3.3 Choice of local bounds

We have discussed the choice of q(u) and how to deal with the first term of the bound F. In this section, we provide a treatment of the second term,

F_2 = Σ_{n=1}^N ∫ df_n q(f_n) log p(y_n | f_n),                              (22)

where

q(f_n) = ∫ du q(u) p(f_n | u)                                                (23)
       = ∫ du N(u; m, C) N(f_n; a_n u, b_n)                                  (24)
       = N(f_n; a_n m, b_n + a_n C a_n^T),                                   (25)

with a_n = K_{f_n u} K_uu^{-1} and b_n = k_{f_n f_n} − K_{f_n u} K_uu^{-1} K_{u f_n}. Let μ_n = a_n m and v_n = b_n + a_n C a_n^T. We note that F_2 involves one-dimensional integrals, namely expectations of the local likelihood terms with respect to the distributions q(f_n). Since q(f_n) is a Gaussian distribution, each local integral can be computed exactly or can be bounded, so that a tractable lower bound on F_2 can be used in place of the original intractable integrals.

3.3.1 Poisson likelihood for count data regression

Consider

p(y_n | f_n) = exp(y_n f_n − exp(f_n)) / y_n!,                               (26)

which means

log p(y_n | f_n) = y_n f_n − exp(f_n) − log y_n!.                            (27)

Therefore,

F_{2,n} = ∫ df_n q(f_n) log p(y_n | f_n)                                     (28)
        = ∫ df_n N(f_n; μ_n, v_n) [y_n f_n − exp(f_n) − log y_n!]            (29)
        = y_n μ_n − exp(μ_n + v_n / 2) − log y_n!                            (30)
        = y_n a_n m − exp(a_n m + (b_n + a_n C a_n^T) / 2) − log y_n!.       (31)

In words, F_2 can be computed analytically when the likelihood is a Poisson distribution. We have used the exponential reverse-link function here; see [2] for a treatment of a different reverse-link function.
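A sketch combining (25) and (30): form the marginal means and variances of q(f_n) from the kernel matrices, then sum the closed-form Poisson expectations. The function name, the jitter term, and the assumption that K_fu, K_uu and diag(K_ff) are precomputed are ours:

```python
import numpy as np
from scipy.special import gammaln

def poisson_expected_loglik(y, Kfu, Kuu, kff_diag, m, C, jitter=1e-6):
    """Sum_n E_{q(f_n)}[log p(y_n | f_n)] for the Poisson case, eq. (30).

    Sketch only: the name is ours, and Kfu (N x M), Kuu (M x M) and
    kff_diag (N,) are assumed to be precomputed kernel matrices.
    """
    M = Kuu.shape[0]
    A = np.linalg.solve(Kuu + jitter * np.eye(M), Kfu.T).T  # row n is a_n = K_{f_n u} Kuu^{-1}
    mu = A @ m                                              # mu_n = a_n m
    b = kff_diag - np.sum(A * Kfu, axis=1)                  # b_n = k_{f_n f_n} - a_n K_{u f_n}
    v = b + np.sum((A @ C) * A, axis=1)                     # v_n = b_n + a_n C a_n^T
    return np.sum(y * mu - np.exp(mu + 0.5 * v) - gammaln(y + 1.0))
```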

3.3.2 Bernoulli logistic likelihood for logit regression or binary classification

Bounds on the logistic log-partition function can be found using Taylor approximations [12] or the Fenchel inequality [13]. These bounds have been used extensively, but it has recently been shown that they are not tight, especially when converted back to the logistic function [14]. Motivated by the performance of the piecewise linear/quadratic bound in [14, 15], we apply this technique to the Bernoulli logistic and ordinal regression problems, in this section and the next respectively.

Consider the logistic likelihood for binary classification or logit regression,

p(y_n = 1 | f_n) = exp(f_n) / (1 + exp(f_n)),   so   p(y_n | f_n) = exp(y_n f_n) / (1 + exp(f_n)) for y_n ∈ {0, 1}.   (32)

The log-likelihood is

log p(y_n | f_n) = y_n f_n − log(1 + exp(f_n)).                              (33)

Using the piecewise bound in [14], we obtain

F_{2,n} = ∫ df_n q(f_n) [y_n f_n − log(1 + exp(f_n))]                        (34)
        ≥ y_n μ_n − ∫ df_n q(f_n) bound(log(1 + exp(f_n)))                   (35)
        = y_n μ_n − Σ_{r=1}^R ∫_{l_r}^{h_r} df_n q(f_n) (a_{n,r} f_n^2 + b_{n,r} f_n + c_{n,r})   (36)
        = y_n μ_n − Σ_{r=1}^R [ a_{n,r} E_{l_r}^{h_r}[f_n^2 | μ_n, v_n] + b_{n,r} E_{l_r}^{h_r}[f_n | μ_n, v_n] + c_{n,r} E_{l_r}^{h_r}[f_n^0 | μ_n, v_n] ],   (37)

where E_l^h[f^m | μ, σ^2] = ∫_l^h df N(f; μ, σ^2) f^m. Importantly, the gradients of the expectations above with respect to the mean and variance {μ_n, v_n} of the integrating distribution can be computed, which means the gradients with respect to the variational parameters and the inducing inputs x_u can also be computed. The forms of the truncated Gaussian moments can be found in, e.g., [16].

3.3.3 Cumulative logistic likelihood for ordinal regression

The following is an extension of [16, Ch. 6]. Consider the cumulative logistic likelihood,

p(y_n ≤ k | f_n) = σ(φ_k − f_n),                                             (38)

which means

p(y_n = k | f_n) = σ(φ_k − f_n) − σ(φ_{k−1} − f_n)                            (39)
                = exp(f_n) (exp(−φ_{k−1}) − exp(−φ_k)) / ( [1 + exp(−(φ_k − f_n))] [1 + exp(−(φ_{k−1} − f_n))] ),   (40)

hence its log,

log p(y_n = k | f_n) = f_n + log(exp(−φ_{k−1}) − exp(−φ_k)) − llp(f_n − φ_k) − llp(f_n − φ_{k−1}),   (41)

where llp(x) = log(1 + exp(x)). Similar to the previous section, we can obtain a bound on the llp function using R linear/quadratic pieces. The expectations of these pieces under a one-dimensional Gaussian, and their corresponding gradients, can be obtained as in the sections above.
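The truncated Gaussian moments E_l^h[f^m | μ, σ^2] for m = 0, 1, 2 appearing in (37), and again for the llp pieces above, have standard closed forms. A sketch for scalar inputs (the function name is ours; the infinite end-pieces of the piecewise bound are guarded explicitly):

```python
import numpy as np
from scipy.stats import norm

def truncated_gaussian_moments(l, h, mu, sigma2):
    """E_l^h[f^m | mu, sigma2] = int_l^h N(f; mu, sigma2) f^m df for m = 0, 1, 2.

    Sketch of the standard truncated-moment formulas for scalar l, h, mu, sigma2;
    the name is ours.
    """
    s = np.sqrt(sigma2)
    a, b = (l - mu) / s, (h - mu) / s
    Z = norm.cdf(b) - norm.cdf(a)                      # E[f^0 . 1{l < f < h}]
    pa, pb = norm.pdf(a), norm.pdf(b)
    d = pa - pb
    # x * pdf(x) -> 0 as |x| -> inf; guard the infinite end pieces of the bound.
    apa = a * pa if np.isfinite(a) else 0.0
    bpb = b * pb if np.isfinite(b) else 0.0
    m1 = mu * Z + s * d                                # E[f   . 1{l < f < h}]
    m2 = (mu**2 + sigma2) * Z + 2.0 * mu * s * d + sigma2 * (apa - bpb)
    return Z, m1, m2
```

Each piece r then contributes a_{n,r} m2 + b_{n,r} m1 + c_{n,r} Z with (l, h) = (l_r, h_r).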

3.3.4 Laplace likelihood

Consider the Laplace likelihood,

p(y_n | f_n) = 1/(2b) exp(−|y_n − f_n| / b).                                 (42)

Taking the log gives

log p(y_n | f_n) = −log(2b) − |y_n − f_n| / b                                (43)
                 = −log(2b) − (y_n − f_n)/b   if y_n ≥ f_n,
                   −log(2b) − (f_n − y_n)/b   if y_n < f_n.                  (44)

Hence the integral of log p(y_n | f_n) against q(f_n) is analytically tractable and involves evaluating the Gaussian cumulative distribution function... TODO.

3.3.5 Exponential likelihood

Consider the exponential likelihood,

p(y_n | f_n) = exp(−y_n / exp(f_n)) / exp(f_n).                              (45)

Taking the log gives

log p(y_n | f_n) = −y_n exp(−f_n) − f_n,                                     (46)

which means the integral will be tractable, as in the case of the Poisson likelihood with the exponential inverse-link function above.

3.3.6 Multinomial logistic likelihood for categorical regression or multiclass classification

Consider the likelihood for multinomial logistic regression,

p(y_n = k | f_n) = exp(f_{n,k} − lse(f_n)).                                  (47)

TODO: [15–18]

3.3.7 Other likelihood classes that cannot be analytically bounded

If a bound on the log-likelihood term g(f_n) = log p(y_n | f_n) cannot be found analytically, we can use stochastic optimisation with unbiased estimates of F_{2,n,k} and of its gradients with respect to the mean and variance of the distribution q_k(f_n). That is,

F_{2,n,k} ≈ (1/T) Σ_{t=1}^T g(f_t),   f_t ~ q_k(f_n).                        (48)

Furthermore, if q_k(f_n) is parameterised by its mean μ_{n,k} and variance v_{n,k}, the gradient of the expectation can be expressed as an expectation of a gradient. This is due to Price's theorem [19] and Bonnet's theorem [20], which have recently been employed for deep models [8]. That is,

d F_{2,n,k} / d μ_{n,k} = d E_{q_k(f_n)}[g(f_n)] / d μ_{n,k} = E_{q_k(f_n)}[ d g(f_n) / d f_n ],        (49)
d F_{2,n,k} / d v_{n,k} = d E_{q_k(f_n)}[g(f_n)] / d v_{n,k} = (1/2) E_{q_k(f_n)}[ d^2 g(f_n) / d f_n^2 ].   (50)

These identities mean that the gradients can also be estimated by Monte Carlo summation. The condition for the above theorems is that log p(y_n | f_n) needs to be twice differentiable with respect to f_n. Two potential issues with this approach are:

- Samples drawn from q_k(f_n) may be uninformative, as they can fall into regions in which g(f_n) is very small.
- The variance of the unbiased estimates may be large. TODO: variance reduction techniques? Rao-Blackwellisation, change of variables? Read [8, 21].

However, the problem may not be as bad as it sounds, as the expectation above is only one-dimensional, and therefore the number of samples needed is much smaller than in similar approaches in high dimensions [8, 22].
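A sketch of the Monte Carlo estimators (48)–(50), assuming the caller supplies g = log p(y_n | ·) and its first two derivatives; the function name and signature are ours:

```python
import numpy as np

def mc_local_bound_and_grads(g, dg, d2g, mu, v, T=200, rng=None):
    """Monte Carlo estimates of F_{2,n,k} and its gradients, eqs (48)-(50).

    Sketch only: g, dg, d2g are log p(y_n | f) and its first and second
    derivatives in f; the name and signature are ours.
    """
    rng = np.random.default_rng() if rng is None else rng
    f = mu + np.sqrt(v) * rng.standard_normal(T)  # f_t ~ q_k(f_n) = N(mu, v)
    F_hat = np.mean(g(f))                         # (48)
    dF_dmu = np.mean(dg(f))                       # (49), Bonnet's theorem
    dF_dv = 0.5 * np.mean(d2g(f))                 # (50), Price's theorem
    return F_hat, dF_dmu, dF_dv
```

For the Bernoulli logistic likelihood, for example, g(f) = y f − log(1 + exp(f)), g'(f) = y − σ(f) and g''(f) = −σ(f)(1 − σ(f)).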

3.4 Complexity

The computational complexity is O(N M^2).
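To illustrate how the bound might be optimised jointly over the variational parameters (point 3 of the Introduction) at this O(N M^2) per-evaluation cost, here is a hedged sketch. The parameter packing, the Cholesky parameterisation C = L L^T, and the assumed assemble_F callback (which would combine the KL term of Section 3.2 with the local terms of Section 3.3) are all ours, not the report's:

```python
import numpy as np
from scipy.optimize import minimize

def pack(m, L):
    """Stack the mean and the lower triangle of the Cholesky factor of C."""
    M = len(m)
    return np.concatenate([m, L[np.tril_indices(M)]])

def unpack(theta, M):
    """Inverse of pack."""
    m = theta[:M]
    L = np.zeros((M, M))
    L[np.tril_indices(M)] = theta[M:]
    return m, L

def negative_bound(theta, M, assemble_F):
    """-F(m, C) with C = L L^T; assemble_F is an assumed callback that combines
    the KL term (Section 3.2) and the local expectations (Section 3.3)."""
    m, L = unpack(theta, M)
    return -assemble_F(m, L @ L.T)

# Hypothetical usage (assemble_F is not defined here):
# theta0 = pack(np.zeros(M), np.eye(M).ravel()[np.newaxis] if False else np.eye(M))
# result = minimize(negative_bound, pack(np.zeros(M), np.eye(M)),
#                   args=(M, assemble_F), method="L-BFGS-B")
```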

4 Can this be applied to non-conjugate GPLVMs?

The answer is no if you are looking for an analytic solution, because of the non-linear dependencies on X in p(f | U); but stochastic approximation could be used. Again, getting this to work in high dimensions could be challenging.

References

[1] J. Hensman, A. Matthews, and Z. Ghahramani, "Scalable variational Gaussian process classification", 2014.
[2] C. Lloyd, T. Gunter, M. A. Osborne, and S. J. Roberts, "Variational inference for Gaussian process modulated Poisson processes", 2014.
[3] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2006.
[4] J. Quiñonero-Candela and C. E. Rasmussen, "A unifying view of sparse approximate Gaussian process regression", The Journal of Machine Learning Research, vol. 6, pp. 1939–1959, 2005.
[5] E. Snelson and Z. Ghahramani, "Sparse Gaussian processes using pseudo-inputs", in Advances in Neural Information Processing Systems, MIT Press, 2006.
[6] M. K. Titsias, "Variational learning of inducing variables in sparse Gaussian processes", in International Conference on Artificial Intelligence and Statistics, 2009.
[7] T. Salimans, "Markov chain Monte Carlo and variational inference: Bridging the gap", 2014.
[8] D. J. Rezende, S. Mohamed, and D. Wierstra, "Stochastic backpropagation and approximate inference in deep generative models", in Proceedings of the 31st International Conference on Machine Learning, 2014.
[9] S. Gershman, M. Hoffman, and D. M. Blei, "Nonparametric variational inference", in Proceedings of the 29th International Conference on Machine Learning, 2012.
[10] T. Nguyen and E. Bonilla, "Automated variational inference for Gaussian process models", in Advances in Neural Information Processing Systems (N. Lawrence and C. Cortes, eds.), 2014.
[11] M. Huber, T. Bailey, H. Durrant-Whyte, and U. Hanebeck, "On entropy approximation for Gaussian mixture random vectors", in IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Aug 2008.
[12] D. Böhning, "Multinomial logistic regression algorithm", Annals of the Institute of Statistical Mathematics, vol. 44, no. 1, 1992.
[13] T. S. Jaakkola and M. I. Jordan, "Bayesian parameter estimation via variational methods", Statistics and Computing, vol. 10, no. 1, pp. 25–37, 2000.
[14] B. M. Marlin, M. E. Khan, and K. P. Murphy, "Piecewise bounds for estimating Bernoulli-logistic latent Gaussian models", in Proceedings of the 28th International Conference on Machine Learning, 2011.
[15] M. E. Khan, S. Mohamed, and K. P. Murphy, "Fast Bayesian inference for non-conjugate Gaussian process regression", in Advances in Neural Information Processing Systems, 2012.
[16] M. E. Khan, Variational Learning for Latent Gaussian Models of Discrete Data. PhD thesis, The University of British Columbia, 2012.
[17] G. Bouchard, "Efficient bounds for the softmax function and applications to inference in hybrid models", in NIPS 2007 Workshop on Approximate Bayesian Inference in Continuous/Hybrid Systems, 2007.
[18] D. Blei and J. Lafferty, "Correlated topic models", in Advances in Neural Information Processing Systems, vol. 18, p. 147, 2006.
[19] R. Price, "A useful theorem for nonlinear devices having Gaussian inputs", IRE Transactions on Information Theory, vol. 4, pp. 69–72, June 1958.
[20] G. Bonnet, "Transformations des signaux aléatoires à travers les systèmes non linéaires sans mémoire", Annales des Télécommunications, vol. 19, no. 9–10, pp. 203–220, 1964.
[21] R. Ranganath, S. Gerrish, and D. Blei, "Black box variational inference", in Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, 2014.
[22] M. Titsias and M. Lázaro-Gredilla, "Doubly stochastic variational Bayes for non-conjugate inference", in Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014.
