INFINITE MIXTURES OF MULTIVARIATE GAUSSIAN PROCESSES

SHILIANG SUN
Department of Computer Science and Technology, East China Normal University
500 Dongchuan Road, Shanghai 200241, China

Abstract: This paper presents a new model called infinite mixtures of multivariate Gaussian processes, which can be used to learn vector-valued functions and applied to multitask learning. As an extension of the single multivariate Gaussian process, the mixture model has the advantages of modeling multimodal data and alleviating the cubic computational complexity of the multivariate Gaussian process. A Dirichlet process prior is adopted to allow the (possibly infinite) number of mixture components to be automatically inferred from the training data, and Markov chain Monte Carlo sampling techniques are used for parameter and latent variable inference. Preliminary experimental results on multivariate regression show the feasibility of the proposed model.

Keywords: Gaussian process; Dirichlet process; Markov chain Monte Carlo; Multitask learning; Vector-valued function; Regression.

1. Introduction

Gaussian processes provide a principled probabilistic approach to pattern recognition and machine learning. Formally, a Gaussian process is a collection of random variables such that any finite number of them obey a joint Gaussian prior distribution. As a Bayesian nonparametric model, the Gaussian process proves to be very powerful for general function learning problems such as regression and classification [1, 2].

Recently, motivated by the need to learn vector-valued functions and to perform multitask learning, research on multivariate or multi-output Gaussian processes has attracted a lot of attention. By learning multiple related tasks jointly, the common knowledge underlying different tasks can be shared, and thus a performance gain is likely to be obtained [3]. Representative works on multivariate Gaussian processes include the methods given in [4, 5, 6].

However, it is well known that Gaussian processes suffer from two important limitations [2, 7]. First, limited by the inherent unimodality of Gaussian distributions, Gaussian processes cannot characterize multimodal data, which are prevalent in practice. Second, they are computationally infeasible for big data, since inference requires the inversion of an N × N or an NM × NM covariance matrix for a single-variate or a multivariate Gaussian process, respectively, where N is the number of training examples and M is the output dimensionality.

These two limitations can be greatly alleviated by making use of mixtures of Gaussian processes [8], where multiple Gaussian processes jointly explain the data and each example belongs to only one Gaussian process component. For mixtures of Gaussian processes, infinite mixtures based on Dirichlet processes [9] are prevailing because they permit the number of components to be inferred directly from the data and thus bypass the difficult model selection problem over the component number. For single-variate or single-output Gaussian processes, several variants and implementations of infinite mixtures already exist and have brought great success in data modeling and prediction applications [2, 7, 10]. However, no extension of multivariate Gaussian processes to mixture models has been presented yet. Here, we fill this gap by proposing an infinite mixture model of multivariate Gaussian processes.
It should be noted that the implementation of this infinite model is quite challenging, because multivariate Gaussian processes are considerably more complicated than single-variate Gaussian processes. The rest of this paper is organized as follows. After presenting the new infinite mixture model in Section 2, we show how hidden-variable inference and prediction are performed in Section 3 and Section 4, respectively. Then, we report experimental results on multivariate regression in Section 5. Finally, concluding remarks and future work directions are given in Section 6.

2. The proposed model

The graphical model for the proposed infinite mixture of multivariate Gaussian processes (IMMGP) on the observed training data D = {x_i, y_i}_{i=1}^{N} is depicted in Figure 1. The observation likelihood for our IMMGP is

    p(\{x_i, y_i\} | \Theta) = p(Z | \Theta) \prod_r p(\{y_i : z_i = r\} | \{x_i : z_i = r\}, \Theta) \, p(\{x_i : z_i = r\} | \Theta)
                             = p(Z | \Theta) \prod_r p(\{y_i : z_i = r\} | \{x_i : z_i = r\}, \Theta) \prod_{j=1}^{N_r} p(x_{rj} | \mu_r, R_r).    (1)

[Figure 1. The graphical model for IMMGP.]

In the graphical model, r indexes the rth Gaussian process component in the mixture, whose number can be infinitely large if enough data are provided. N_r is the number of examples belonging to the rth component. D and M are the dimensions of the input and output spaces, respectively. The set {α, {μ_r}, {R_r}, {σ_r0}, {K_r}, {w_rd}, {σ_rl}} includes all random parameters, which is denoted here by Θ. The latent variables are z_i (i = 1, ..., N) and F_r (r = 1, 2, ...), where F_r can be removed from the graphical model by integration if we directly consider a distribution over {Y_r}. Denote the set of latent indicators by Z, that is, Z = {z_i}_{i=1}^{N}. Since F_r is for illustrative purposes only, the latent indicators Z and the random parameters Θ constitute the total hidden variables. The circles in the left and right columns of the graphical model indicate the hyperparameters, whose values are usually found by maximum likelihood estimation or designated manually if people have a strong belief about them.

2.1. Distributions for hidden variables

α is the concentration parameter of the Dirichlet process, which controls the prior probability of assigning an example to a new mixture component and thus influences the total number of components in the mixture model. A gamma distribution G(α | a_0, b_0) is used; we adopt the parameterization of the gamma distribution given in [11]. Given α and {z_i}_{i=1}^{n}, the distribution of z_{n+1} is easy to obtain with the Chinese restaurant process metaphor [9].

The distribution over the input space for a mixture component is given by a Gaussian distribution with a full covariance,

    p(x | z = r, \mu_r, R_r) = N(x | \mu_r, R_r^{-1}),    (2)

where R_r is the precision (inverse covariance) matrix. This input model is often flexible enough to provide good performance, though one can consider adopting mixtures of Gaussian distributions to model the input space. The parameters μ_r and R_r are further given a Gaussian prior and a Wishart prior, respectively,

    \mu_r \sim N(\mu_0, R_0^{-1}), \qquad R_r \sim W(W_0, \nu_0).    (3)

The parameterization of the Wishart distribution is the same as that in [11].

A Gaussian process prior is placed over the latent functions {f_rl}_{l=1}^{M} for component r in our model. Assuming the Gaussian processes have zero mean, we set

    E\big(f_{rl}(x) f_{rk}(x')\big) = \sigma_{r0} K_r(l, k) \, k_r(x, x'), \qquad y_{rl}(x) \sim N(f_{rl}(x), \sigma_{rl}),    (4)

where the scaling parameter σ_r0 > 0, K_r is a positive semi-definite matrix that specifies the inter-task similarities, k_r(·, ·) is a covariance function over inputs, and σ_rl is the noise variance for the lth output of the rth component. The prior of the M × M positive semi-definite matrix K_r is given by a Wishart distribution W(W_1, ν_1). σ_r0 and σ_rl are given gamma priors G(σ_r0 | a_1, b_1) and G(σ_rl | a_2, b_2), respectively. We set

    k_r(x, x') = \exp\Big(-\frac{1}{2} \sum_{d=1}^{D} w_{rd}^2 (x_d - x'_d)^2\Big),    (5)

where w_rd obeys a log-normal distribution, ln w_{rd} \sim N(\mu_1, r_1), with mean μ_1 and variance r_1. The whole setup for a single Gaussian process component differs substantially from that in [4].
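
To make the covariance specification in (4) and (5) concrete, here is a minimal NumPy sketch of the weighted squared-exponential input kernel and the induced cross-covariance between two latent function values. The function and variable names (input_kernel, latent_cross_cov, K_task, w_r, sigma_r0) are illustrative and do not come from the paper.

```python
import numpy as np

def input_kernel(X1, X2, w_r):
    """Weighted squared-exponential kernel k_r(x, x') of Eq. (5).

    X1: (n1, D), X2: (n2, D), w_r: (D,) positive weights w_{rd}.
    Returns the (n1, n2) Gram matrix with entries
    exp(-0.5 * sum_d w_{rd}^2 (x_d - x'_d)^2).
    """
    diff = X1[:, None, :] - X2[None, :, :]          # (n1, n2, D)
    return np.exp(-0.5 * np.sum((w_r ** 2) * diff ** 2, axis=-1))

def latent_cross_cov(x, x_prime, l, k, sigma_r0, K_task, w_r):
    """E[f_rl(x) f_rk(x')] = sigma_r0 * K_r(l, k) * k_r(x, x') of Eq. (4)."""
    kxx = input_kernel(x[None, :], x_prime[None, :], w_r)[0, 0]
    return sigma_r0 * K_task[l, k] * kxx

# Toy usage with D = 2 inputs and M = 2 outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
w_r = np.array([1.0, 0.5])
K_task = np.array([[1.0, 0.3], [0.3, 1.0]])         # inter-task similarity K_r
print(latent_cross_cov(X[0], X[1], 0, 1, sigma_r0=1.0, K_task=K_task, w_r=w_r))
```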

3. Inference

Since exact inference for the distribution p(Z, Θ | D) is infeasible, in this paper we use Markov chain Monte Carlo sampling techniques to obtain L samples {Z_j, Θ_j}_{j=1}^{L} that approximate p(Z, Θ | D). In particular, Gibbs sampling is adopted to represent the posterior of the hidden variables. First of all, we initialize all the variables in {Z, Θ} by sampling them from their priors. Then the variables are updated using the following steps.

(1) Update the indicator variables {z_i}_{i=1}^{N} one by one, by cycling through the training data.
(2) Update the input and output space Gaussian process parameters {{μ_r}, {R_r}, {σ_r0}, {K_r}, {w_rd}, {σ_rl}} for each Gaussian process component in turn.
(3) Update the Dirichlet process concentration parameter α.

These three steps constitute one Gibbs sampling sweep over all hidden variables, and they are repeated until the Markov chain has adequate samples. Note that samples in the burn-in stage should be removed from the Markov chain and are not used for approximating the posterior distribution. In the following subsections, we provide the specific sampling method and formulations involved for each update.

3.1. Updating indicator variables

Let Z_{-i} = Z \ z_i = {z_1, ..., z_{i-1}, z_{i+1}, ..., z_N} and D_{-i} = D \ {x_i, y_i}. To sample z_i, we need the following posterior conditional distribution

    p(z_i | Z_{-i}, \Theta, D) \propto p(z_i | Z_{-i}, \Theta) \, p(D | z_i, Z_{-i}, \Theta)
                               \propto p(z_i | Z_{-i}, \Theta) \, p\big(y_i | \{y_j : j \neq i, z_j = z_i\}, \{x_j : z_j = z_i\}, \Theta\big) \, p(x_i | \mu_{z_i}, R_{z_i}),    (6)

where we have used a clear decomposition between the joint distributions of {x_i, y_i} and D_{-i}. It is not difficult to calculate the three terms in the last line of (6). However, the computation of p(y_i | {y_j : j ≠ i, z_j = z_i}, {x_j : z_j = z_i}, Θ) may be made more efficient if some approximation scheme or acceleration method is adopted. In addition, for exploring new experts, we simply sample the parameters once from the prior, use them for the new expert, and then calculate (6), following [10, 12]. The indicator variable update method is algorithm 8 from [13] with the auxiliary component parameter m = 1.

3.2. Updating input space component parameters

The input space parameters μ_r and R_r can be sampled directly because their posterior conditional distributions have a simple form as a result of using conjugate priors:

    p(\mu_r | Z, \Theta \backslash \mu_r, D) = p(\mu_r | \{x_{rj}\}_{j=1}^{N_r}, R_r)
        \propto p(\mu_r) \, p(\{x_{rj}\}_{j=1}^{N_r} | \mu_r, R_r)
        \propto |R_0|^{1/2} \exp\{-\tfrac{1}{2}(\mu_r - \mu_0)^\top R_0 (\mu_r - \mu_0)\} \prod_j |R_r|^{1/2} \exp\{-\tfrac{1}{2}(x_{rj} - \mu_r)^\top R_r (x_{rj} - \mu_r)\}
        \propto \exp\{-\tfrac{1}{2}[\mu_r^\top R_0 \mu_r - 2\mu_r^\top R_0 \mu_0 + \sum_j (\mu_r^\top R_r \mu_r - 2\mu_r^\top R_r x_{rj})]\},    (7)

and therefore

    p(\mu_r | Z, \Theta \backslash \mu_r, D) = N\big((R_0 + N_r R_r)^{-1}(R_0 \mu_0 + R_r \sum_j x_{rj}), \; (R_0 + N_r R_r)^{-1}\big).

Similarly,

    p(R_r | Z, \Theta \backslash R_r, D) = p(R_r | \{x_{rj}\}_{j=1}^{N_r}, \mu_r)
        \propto p(R_r) \, p(\{x_{rj}\}_{j=1}^{N_r} | \mu_r, R_r)
        \propto |R_r|^{(\nu_0 - D - 1)/2} \exp\{-\tfrac{1}{2}\mathrm{Tr}(W_0^{-1} R_r)\} \prod_j |R_r|^{1/2} \exp\{-\tfrac{1}{2}(x_{rj} - \mu_r)^\top R_r (x_{rj} - \mu_r)\}
        \propto |R_r|^{(\nu_0 + N_r - D - 1)/2} \exp\{-\tfrac{1}{2}\mathrm{Tr}\big((W_0^{-1} + \sum_j (x_{rj} - \mu_r)(x_{rj} - \mu_r)^\top) R_r\big)\},

and thus

    p(R_r | Z, \Theta \backslash R_r, D) = W\Big(\big(W_0^{-1} + \sum_j (x_{rj} - \mu_r)(x_{rj} - \mu_r)^\top\big)^{-1}, \; \nu_0 + N_r\Big).
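
Because conjugate priors are used, the conditional posteriors above are a Gaussian and a Wishart that can be sampled directly. A minimal sketch of one such Gibbs step might look as follows (NumPy/SciPy; the function name and argument layout are assumptions rather than code from the paper, and D ≥ 2 is assumed).

```python
import numpy as np
from scipy.stats import wishart

def gibbs_update_input_params(X_r, R_r, mu0, R0, W0, nu0, rng):
    """One Gibbs step for the input-space parameters of component r.

    X_r : (N_r, D) inputs currently assigned to component r.
    R_r, R0 : precision matrices; W0, nu0 : Wishart scale / degrees of freedom.
    Returns newly sampled (mu_r, R_r) from their conditional posteriors.
    """
    N_r, D = X_r.shape

    # mu_r | rest ~ N((R0 + N_r R_r)^{-1} (R0 mu0 + R_r sum_j x_rj), (R0 + N_r R_r)^{-1})
    prec = R0 + N_r * R_r
    cov = np.linalg.inv(prec)
    mean = cov @ (R0 @ mu0 + R_r @ X_r.sum(axis=0))
    mu_r = rng.multivariate_normal(mean, cov)

    # R_r | rest ~ W((W0^{-1} + sum_j (x_rj - mu_r)(x_rj - mu_r)^T)^{-1}, nu0 + N_r)
    S = (X_r - mu_r).T @ (X_r - mu_r)
    scale = np.linalg.inv(np.linalg.inv(W0) + S)
    R_r = wishart(df=nu0 + N_r, scale=scale).rvs(random_state=rng)

    return mu_r, R_r
```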

4 and thus p(r r, Θ\R r, D) ( (W = W 0 + (x rj µ r )(x rj µ r ) ) ), ν0 + N r. j 3.3. Updating output space component parameters Note that Y r = {y i : i N, z i = r} and Y r = N r. In this subsection, we denote its N r elements by {y j r }Nr j= which correspond to {x j r} Nr j=. Define the complete M outputs in the rth GP as y r = (y r,..., y Nr r, y r2,..., y Nr r2,..., y rm,..., y Nr rm ), (8) where y j rl is the observation for the lth output on the jth input. According to the Gaussian process assumption given in (4), the observation y r follows a Gaussian distribution y r N (0, Σ), Σ = σ r0 K r K x r + D r I, (9) where denotes the Kronecker product, Kr x is the N r N r covariance matrix between inputs with Kr x (i, j) = k r (x i r, x j r), D r is an M M diagonal matrix with D r (i, i) = σ ri, I is an N r N r identity matrix, and therefore the size of Σ is MN r MN r. The predictive distribution for the interested variable f on a new input x which belongs to the rth component is N (K Σ y r, K K Σ K ), (0) where K(M MN = σ r) r0k r k x r, K = σ r0 K r, and k x r is a N r row vector with the ith element being k r (x, x i r). Hence, the expected output on x is K Σ y r. Note that the calculation of Σ is a source for approximation to speed up training. However, this problem is easier than the original single GP model since we already reduced the inversion from an MN MN matrix to several MN r MN r matrices. We use hybrid Monte Carlo [4] to update σ r0, and the basic Metropolis-Hastings algorithm to update K r, {w rd }, and {σ rl } with the corresponding proposal distributions being their priors. Below we give the posteriors of the output space parameters and when necessary provide some useful technical details. We have p(σ r0, Θ\σ r0, D) p(σ r0 )p(y r {x j r} Nr σ a r0 exp( b σ r0 ) = exp { Σ /2 exp( 2 y r Σ y r ) [ ( a ) ln σ r0 + b σ r0 + 2 ln Σ + ]} 2 y r Σ y r, () and thus the potential energy E(σ r0 ) = ( a ) ln σ r0 + b σ r0 + 2 ln Σ + 2 y r Σ y r. The gradient de(σ r0 )/dσ r0 is needed in order to use hybrid Monte Carlo, which is given by de(σ r0 ) dσ r0 = a σ r0 + b + 2 Tr[(Σ Σ y r y r Σ )(K r K x r )]. p(k r, Θ\K r, D) p(k r )p(y r {x j r} Nr K r (ν M )/2 exp{ 2 Tr(W K r)} Σ exp( /2 2 y r Σ y r ) { = exp [ (M + ν ) ln K r + Tr(W 2 K r) + ln Σ + y r Σ y r ]}. (2) p(w rd, Θ\w rd, D) p(w rd )p(y r {x j r} Nr w rd exp{ (ln w rd µ ) 2 } 2r Σ exp( /2 2 y r Σ y r ) { [ = exp ln w rd + (ln w rd µ ) 2 + 2r 2 ln Σ + ]} 2 y r Σ y r. (3) p(σ rl, Θ\σ rl, D) p(σ rl )p(y r {x j r} Nr σ a2 rl exp( b 2 σ rl ) = exp { Σ /2 exp( 2 y r Σ y r ) [ ( a 2 ) ln σ rl + b 2 σ rl + 2 ln Σ + ]} 2 y r Σ y r. (4) 3.4. Updating the concentration parameter α The basic Metropolis-Hastings algorithm is used to update α. Let c N be the number of distinct values in

3.4. Updating the concentration parameter α

The basic Metropolis-Hastings algorithm is used to update α. Let c ≤ N be the number of distinct values in {z_1, ..., z_N}. It is clear from [15] that

    p(c | \alpha, N) = \beta_c^N \, \alpha^c \, \frac{\Gamma(\alpha)}{\Gamma(N + \alpha)},    (15)

where the coefficient β_c^N is the absolute value of a Stirling number of the first kind, and Γ(·) is the gamma function. With (15) as the likelihood, the posterior of α is

    p(\alpha | c, N) \propto p(\alpha) \, p(c | \alpha, N) \propto p(\alpha) \, \alpha^c \, \frac{\Gamma(\alpha)}{\Gamma(N + \alpha)}.    (16)

Since the gamma prior is used, it follows that

    p(\alpha | c, N) \propto \alpha^{c + a_0 - 1} \exp(-b_0 \alpha) \, \frac{\Gamma(\alpha)}{\Gamma(N + \alpha)}.    (17)

4. Prediction

The graphical model for prediction is shown in Figure 2.

[Figure 2. The graphical model for prediction on a new input x_*.]

The predictive distribution for the output of a new test input x_* is

    p(f_* | x_*, D) = \sum_{z_*} \int p(f_*, z_*, Z, \Theta | x_*, D) \, d\Theta
        = \sum_{z_*} \int p(z_*, Z, \Theta | x_*, D) \, p(f_* | z_*, Z, \Theta, x_*, D) \, d\Theta
        = \sum_{z_*} \int p(z_* | Z, \Theta, x_*) \, p(Z, \Theta | x_*, D) \, p(f_* | z_*, Z, \Theta, x_*, D) \, d\Theta
        \approx \sum_{z_*} \int p(z_* | x_*, Z, \Theta) \, p(Z, \Theta | D) \, p(f_* | z_*, Z, \Theta, x_*, D) \, d\Theta
        = \int \Big[\sum_{z_*} p(z_* | x_*, Z, \Theta) \, p(f_* | z_*, Z, \Theta, x_*, D)\Big] p(Z, \Theta | D) \, d\Theta,    (18)

where we have made use of the conditional independence p(z_* | Z, Θ, x_*, D) = p(z_* | x_*, Z, Θ) and the reasonable approximation p(Z, Θ | x_*, D) ≈ p(Z, Θ | D). With the Markov chain Monte Carlo samples {Z_i, Θ_i}_{i=1}^{L} approximating the above summation and integration over Z and Θ, it follows that

    p(f_* | x_*, D) \approx \frac{1}{L} \sum_{i=1}^{L} \sum_{z_*} p(z_* | x_*, Z_i, \Theta_i) \, p(f_* | z_*, Z_i, \Theta_i, x_*, D).

Therefore, the prediction for f_* is

    \hat{f}_* = \frac{1}{L} \sum_{i=1}^{L} \Big[\sum_{z_*} p(z_* | x_*, Z_i, \Theta_i) \, E(f_* | z_*, Z_i, \Theta_i, x_*, D)\Big],

where the expectation involved is simple to calculate since p(f_* | z_*, Z_i, Θ_i, x_*, D) is a Gaussian distribution, and z_* either takes values from Z_i or differs from all of them, with the corresponding parameters sampled from the priors. The computation of p(z_* | x_*, Z_i, Θ_i) is given as follows:

    p(z_* | x_*, Z_i, \Theta_i) = \frac{p(z_* | Z_i, \Theta_i) \, p(x_* | z_*, Z_i, \Theta_i)}{p(x_* | Z_i, \Theta_i)}
        = \frac{p(z_* | Z_i, \Theta_i) \, p(x_* | z_*, Z_i, \Theta_i)}{\sum_{z_*} p(z_* | Z_i, \Theta_i) \, p(x_* | z_*, Z_i, \Theta_i)}
        = \frac{p(z_* | Z_i, \Theta_i) \, p(x_* | z_*, \Theta_i)}{\sum_{z_*} p(z_* | Z_i, \Theta_i) \, p(x_* | z_*, \Theta_i)},    (19)

where the last equality follows from conditional independence. If z_* = r ∈ Z_i, then p(z_* = r | Z_i, Θ_i) = N_{ir} / (α + N) with N_{ir} = #{z : z ∈ Z_i, z = r}, and p(x_* | z_*, Θ_i) = p(x_* | μ_r, R_r). If z_* ∉ Z_i, then p(z_* | Z_i, Θ_i) = α / (α + N) and

    p(x_* | z_*, \Theta_i) = \int p(x_* | \mu, R) \, p(\mu | \mu_0, R_0) \, p(R | W_0, \nu_0) \, d\mu \, dR.

Unfortunately, this integral is not analytically tractable. A Monte Carlo estimate obtained by sampling μ and R from the priors can be used to reach an approximation. Note that, if z_* ∉ Z_i, then E(f_* | z_*, Z_i, Θ_i, x_*, D) = 0 as a result of the zero-mean Gaussian process priors. Otherwise, E(f_* | z_*, Z_i, Θ_i, x_*, D) can be calculated using standard Gaussian process regression formulations.
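
As a concrete illustration of the prediction rule (18)-(19), the sketch below computes, for a single posterior sample, the mixing weights over components and the weighted predictive mean. The names z_counts, comp_params, comp_pred_means, and p_x_new are illustrative assumptions; p_x_new stands for the Monte Carlo estimate of p(x_* | new component) described above and is assumed to be computed separately.

```python
import numpy as np

def component_weights(x_star, z_counts, comp_params, alpha, N, p_x_new):
    """Mixture weights p(z_* | x_*, Z_i, Theta_i) of Eq. (19) for one MCMC sample.

    z_counts    : dict mapping component r -> N_ir (its occupancy in Z_i)
    comp_params : dict mapping component r -> (mu_r, R_r) input-space parameters
    p_x_new     : Monte Carlo estimate of p(x_* | new component)
    Returns a dict mapping each r (and 'new') to its normalized weight.
    """
    def gauss_density(x, mu, R):
        # N(x | mu, R^{-1}) evaluated with precision matrix R.
        d = x - mu
        return np.sqrt(np.linalg.det(R) / (2.0 * np.pi) ** len(x)) * np.exp(-0.5 * d @ R @ d)

    unnorm = {r: (n / (alpha + N)) * gauss_density(x_star, *comp_params[r])
              for r, n in z_counts.items()}
    unnorm['new'] = (alpha / (alpha + N)) * p_x_new
    total = sum(unnorm.values())
    return {r: v / total for r, v in unnorm.items()}

def predict_one_sample(x_star, weights, comp_pred_means, M):
    """hat f_* for one sample: weights times the per-component predictive means
    K_* Sigma^{-1} y_r of Eq. (10); a new (empty) component contributes its
    zero prior mean, so it adds nothing to the sum."""
    f_hat = np.zeros(M)
    for r, w in weights.items():
        if r != 'new':
            f_hat += w * comp_pred_means[r](x_star)   # callable returning an (M,) mean
    return f_hat
```

Averaging predict_one_sample over the L retained posterior samples then gives the final prediction for f_*.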

5. Experiment

To evaluate the proposed infinite mixture model and the inference and prediction methods described above, we perform multivariate regression on a synthetic data set. The data set includes 500 examples generated by ancestral sampling from the infinite mixture model (a generator sketch is given at the end of this section). The dimensions of the input and output spaces are both set to two. From the whole data set, 400 examples are randomly selected as training data and the remaining 100 examples serve as test data.

5.1. Hyperparameter setting

The hyperparameters for generating the data are set as follows: a_0 = 1, b_0 = 1, μ_0 = 0, R_0 = I/10, W_0 = I/(10D), ν_0 = D, a_1 = 1, b_1 = 1, W_1 = I/M, ν_1 = M, μ_1 = 0, r_1 = 0.01, a_2 = 0.1, and b_2 = 1. The same hyperparameters are used for inference except μ_0, R_0 and W_0: μ_0 and R_0 are set to the mean μ_x and the inverse covariance R_x of the training data, respectively, and W_0 is set to R_x / D.

5.2. Prediction performance

By Markov chain Monte Carlo sampling, we obtain 4000 samples, of which only the last 2000 are retained for prediction. For comparison purposes, the MTLNN approach (multitask learning neural networks without ensemble learning) [16] is adopted. Table 1 reports the root mean squared error (RMSE) on the test data for our IMMGP model and the MTLNN approach. IMMGP only considers the existing Gaussian process components reflected by the samples, while IMMGP2 also considers choosing a new component. The results indicate that IMMGP outperforms MTLNN and that the difference between IMMGP and IMMGP2 is very small.

[Table 1. Prediction errors (RMSE) of different methods: MTLNN, IMMGP, IMMGP2.]
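
For completeness, here is a simplified sketch of how ancestral sampling from the model might be implemented to generate such synthetic data. The function name sample_immgp_data and the hyperparameter dictionary hp are assumptions, μ_0 must be supplied as a length-D vector, D ≥ 2 and M ≥ 2 are assumed for the Wishart draws, and numerical safeguards are omitted.

```python
import numpy as np
from scipy.stats import wishart

def sample_immgp_data(N, D, M, hp, rng):
    """Ancestral sampling from the IMMGP model (a simplified sketch; hp is a
    dict of the hyperparameters of Section 5.1, e.g. hp['a0'], hp['W0'])."""
    # 1) Concentration parameter and Chinese-restaurant-process assignments.
    alpha = rng.gamma(hp['a0'], 1.0 / hp['b0'])
    z = np.zeros(N, dtype=int)
    counts = []
    for i in range(N):
        probs = np.array(counts + [alpha], dtype=float)
        z[i] = rng.choice(len(probs), p=probs / probs.sum())
        if z[i] == len(counts):
            counts.append(0)
        counts[z[i]] += 1

    X, Y = np.zeros((N, D)), np.zeros((N, M))
    for r in range(len(counts)):
        idx = np.where(z == r)[0]
        n_r = len(idx)
        # 2) Component parameters drawn from their priors.
        R_r = wishart(df=hp['nu0'], scale=hp['W0']).rvs(random_state=rng)
        mu_r = rng.multivariate_normal(hp['mu0'], np.linalg.inv(hp['R0']))
        sigma_r0 = rng.gamma(hp['a1'], 1.0 / hp['b1'])
        K_r = wishart(df=hp['nu1'], scale=hp['W1']).rvs(random_state=rng)
        w_r = rng.lognormal(hp['mu1'], np.sqrt(hp['r1']), size=D)
        noise = rng.gamma(hp['a2'], 1.0 / hp['b2'], size=M)
        # 3) Inputs, then stacked outputs with Sigma = sigma_r0 K_r (x) K_r^x + D_r (x) I.
        X[idx] = rng.multivariate_normal(mu_r, np.linalg.inv(R_r), size=n_r)
        diff = X[idx][:, None, :] - X[idx][None, :, :]
        Kx = np.exp(-0.5 * np.sum((w_r ** 2) * diff ** 2, axis=-1))
        Sigma = sigma_r0 * np.kron(K_r, Kx) + np.kron(np.diag(noise), np.eye(n_r))
        y_stacked = rng.multivariate_normal(np.zeros(M * n_r), Sigma)
        Y[idx] = y_stacked.reshape(M, n_r).T          # rows: examples, cols: outputs
    return X, Y, z
```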

6. Conclusion

In this paper, we have presented a new model called infinite mixtures of multivariate Gaussian processes and applied it to multivariate regression with good performance. Interesting future directions include applying this model to large-scale data, adapting it to classification problems, and devising fast deterministic approximate inference techniques.

References

[1] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, MIT Press, Cambridge, MA, 2006.
[2] S. Sun and X. Xu, Variational inference for infinite mixtures of Gaussian processes with applications to traffic flow prediction, IEEE Transactions on Intelligent Transportation Systems, Vol. 12, No. 2, 2011.
[3] Y. Ji and S. Sun, Multitask multiclass support vector machines: Model and experiments, Pattern Recognition, Vol. 46, No. 3, 2013.
[4] E. Bonilla, K. Chai, and C. Williams, Multi-task Gaussian process prediction, Advances in Neural Information Processing Systems, Vol. 20, 2008.
[5] C. Yuan, Conditional multi-output regression, Proceedings of the International Joint Conference on Neural Networks, 2011.
[6] M. Alvarez and N. Lawrence, Computationally efficient convolved multiple output Gaussian processes, Journal of Machine Learning Research, Vol. 12, 2011.
[7] C. Rasmussen and Z. Ghahramani, Infinite mixtures of Gaussian process experts, Advances in Neural Information Processing Systems, Vol. 14, 2002.
[8] V. Tresp, Mixtures of Gaussian processes, Advances in Neural Information Processing Systems, Vol. 13, 2001.
[9] Y. Teh, Dirichlet processes, in Encyclopedia of Machine Learning, Springer-Verlag, Berlin, Germany, 2010.
[10] E. Meeds and S. Osindero, An alternative infinite mixture of Gaussian process experts, Advances in Neural Information Processing Systems, Vol. 18, 2006.
[11] C. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag, New York, 2006.
[12] C. Rasmussen, The infinite Gaussian mixture model, Advances in Neural Information Processing Systems, Vol. 12, 2000.
[13] R. Neal, Markov chain sampling methods for Dirichlet process mixture models, Technical Report 9815, Department of Statistics, University of Toronto, 1998.
[14] R. Neal, Probabilistic inference using Markov chain Monte Carlo methods, Technical Report CRG-TR-93-1, University of Toronto, 1993.
[15] C. Antoniak, Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems, Annals of Statistics, Vol. 2, No. 6, 1974.
[16] S. Sun, Traffic flow forecasting based on multitask ensemble learning, Proceedings of the ACM SIGEVO World Summit on Genetic and Evolutionary Computation, 2009.
