Sequential Inference for Deep Gaussian Process


1 Sequential Inference for Deep Gaussian Process

Yali Wang (Chinese Academy of Sciences), Marcus Brubaker (University of Toronto), Brahim Chaib-draa (Laval University), Raquel Urtasun (University of Toronto)
AISTATS 2016
Presented by Lei Yu, Sept 9, 2016

2 Highlight

1. An efficient sequential inference framework for the Deep Gaussian Process (DGP) is proposed.
2. Two DGP extensions: heteroscedasticity (non-stationary noise) and multi-task learning.

3 Motivation

GPs cannot handle non-stationary functions, e.g. functions with sudden jumps.
Variants of GP that address non-stationarity: warped Gaussian processes [1], product models [2], and Deep GPs.
Inference for DGPs: variational approximations are commonly used; the matrix inversion involved in inference leads to $O(N^3)$ complexity.

4 Sequential Inference for Deep GP

5 Deep Gaussian Process

The output is assumed to be multi-dimensional, $y \in \mathbb{R}^{D_y}$, and to be generated as follows:
$$h_0 = x, \qquad h_i = f_i(h_{i-1}) + v_i, \quad i = 1, \dots, L, \qquad y = f_y(h_L) + v_y,$$
with $f_i \sim \mathcal{GP}(0, k_{\theta_i})$ and $f_y \sim \mathcal{GP}(0, k_{\theta_y})$.
Figure 1: The graphical model for DGP.
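As a concrete illustration of this generative process, here is a minimal Python sketch that draws one realization from an L-layer DGP prior on a fixed set of inputs. The squared-exponential kernel, layer widths and noise level below are illustrative assumptions, not the settings used in the paper.

import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-vector inputs A (N,d) and B (M,d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sample_dgp(x, n_layers=2, hidden_dim=2, out_dim=1, noise_std=0.05, seed=0):
    """Draw one sample y = f_y(h_L) + v_y with h_i = f_i(h_{i-1}) + v_i and h_0 = x."""
    rng = np.random.default_rng(seed)
    h = x  # h_0 = x, shape (N, D_in)
    for _ in range(n_layers):
        K = rbf_kernel(h, h) + 1e-8 * np.eye(len(h))        # jitter for stability
        L = np.linalg.cholesky(K)
        f = L @ rng.standard_normal((len(h), hidden_dim))   # GP draw, one column per latent dim
        h = f + noise_std * rng.standard_normal(f.shape)    # add layer noise v_i
    K = rbf_kernel(h, h) + 1e-8 * np.eye(len(h))
    L = np.linalg.cholesky(K)
    y = L @ rng.standard_normal((len(h), out_dim))
    return y + noise_std * rng.standard_normal(y.shape)     # observation noise v_y

x = np.linspace(-3, 3, 100)[:, None]
y = sample_dgp(x)   # one realization of a 2-layer DGP prior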

6 Sequential Inference for Deep GP

Sequential inference means that, for each training pair $(x^n, y^n)$, the inference of latent variables is composed of two steps:
State Estimation: sequential Monte Carlo [4] for the latent states $S^n = \{h^n_{1:L}\}$.
Model Updating: the sparse online GP (GP_so) [3] or the Kernel Recursive Least-Squares method [5] for the model parameters $M = \{M_1, \dots, M_L, M_y\}$, with $M^{n-1} = \{X^{n-1}, \mu^{n-1}, \Sigma^{n-1}, Q^{n-1}\}$: the active training set $X$, the posterior over the function values at $X$, $\mathcal{N}(\mu, \Sigma)$, and the inverse covariance matrix $Q$ of $X$.

7 State Estimation

The latent states $S^n$ are estimated via the posterior expectation
$$\mathbb{E}[S^n] = \int S^n \, p(S^n \mid x^n, y^n, M^{n-1}, \Theta_{dgp}) \, dS^n,$$
which is intractable due to the deep architecture. Sequential Monte Carlo: for $k = 1, \dots, N_p$:
1. Sampling: draw the $k$-th sample (particle) of $S^n$ from the current DGP model,
$$h^n_1(k) \sim p(h^n_1 \mid x^n, M^{n-1}_1, \Theta_{dgp}), \qquad h^n_i(k) \sim p(h^n_i \mid h^n_{i-1}(k), M^{n-1}_i, \Theta_{dgp}), \quad i = 2, \dots, L.$$
2. Weighting:
$$\hat h^n_i = \sum_{k=1}^{N_p} \hat w^n(k)\, h^n_i(k), \qquad \hat w^n(k) = \frac{w^n(k)}{\sum_{k=1}^{N_p} w^n(k)},$$
where the weight $w^n(k)$ is determined by the likelihood in the observation layer,
$$w^n(k) = p(y^n \mid h^n_L(k), M^{n-1}_y, \Theta_{dgp}).$$
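A minimal Python sketch of this particle-based state estimation step. It assumes each layer's current model is exposed through a hypothetical layer_predict[i] callable returning a predictive mean and variance, and an obs_loglik callable for the observation layer; these interfaces are illustrative stand-ins for the per-layer models M_i, not the authors' code.

import numpy as np

def estimate_states(x_n, y_n, layer_predict, obs_loglik, n_particles=100, seed=0):
    """Particle estimate of the latent states for one training pair (x_n, y_n)."""
    rng = np.random.default_rng(seed)
    n_layers = len(layer_predict)
    all_particles = []                       # all_particles[k] = [h_1(k), ..., h_L(k)]
    for _ in range(n_particles):
        h, layers_k = x_n, []
        for i in range(n_layers):
            mean, var = layer_predict[i](h)  # predictive of layer i's current GP model
            h = mean + np.sqrt(var) * rng.standard_normal(np.shape(mean))
            layers_k.append(h)
        all_particles.append(layers_k)
    # weight each particle by the observation-layer likelihood p(y_n | h_L(k))
    logw = np.array([obs_loglik(y_n, lk[-1]) for lk in all_particles])
    w = np.exp(logw - logw.max())
    w /= w.sum()                             # normalized weights w_hat(k)
    # layer-wise weighted means give the state estimates h_hat_i
    h_hat = [sum(w[k] * all_particles[k][i] for k in range(n_particles))
             for i in range(n_layers)]
    return h_hat, w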

8 Model Updating

Once the states of the $i$-th layer are ready, the input-output training pair $(h^n_{i-1}, h^n_i)$ for a single GP is available, and an online update from $M^{n-1}_i$ to $M^n_i$ is required (the layer index is omitted below):
$$M^{n-1} = \{X^{n-1}, \mu^{n-1}, \Sigma^{n-1}, Q^{n-1}\} \;\longrightarrow\; M^n = \{X^n, \mu^n, \Sigma^n, Q^n\}.$$
The task is fulfilled in two steps:
1. updating the posterior distribution $p(f^n \mid X^n)$ based on $p(f^{n-1} \mid X^{n-1}) = \mathcal{N}(f^{n-1} \mid \mu^{n-1}, \Sigma^{n-1})$;
2. updating the active set of training data $X$.
The Kernel Recursive Least-Squares method [5] is exploited.

9 KRLS: Updating the posterior (1)

When the $n$-th data point is available, update $X^n_l = X^{n-1}_l \cup (h^n_{l-1}, h^n_l)$. Then, by Bayes' rule,
$$p(f^n \mid X^n) = p(f^{n-1}, f_n \mid X^{n-1}, h^n) = \frac{p(h^n \mid f_n)\, p(f_n \mid f^{n-1})\, p(f^{n-1} \mid X^{n-1})}{p(h^n \mid X^{n-1})}.$$
Observation model, prior GP and conditioned GP prior:
$$p(h^n \mid f_n) = \mathcal{N}(h^n \mid f_n, \sigma^2), \qquad p(f^{n-1} \mid X^{n-1}) = \mathcal{N}(f^{n-1} \mid \mu^{n-1}, \Sigma^{n-1}), \qquad p(f_n \mid f^{n-1}) = \mathcal{N}(f_n \mid q_n^T f^{n-1}, \gamma_n^2).$$
Evidence:
$$p(h^n \mid X^{n-1}) = \int p(h^n \mid f_n)\, p(f_n \mid f^{n-1})\, p(f^{n-1} \mid X^{n-1})\, df^{n-1}\, df_n = \mathcal{N}\big(h^n \mid q_n^T \mu^{n-1}, \underbrace{\sigma^2 + \gamma_n^2 + q_n^T \Sigma^{n-1} q_n}_{\eta_n^2}\big).$$

10 KRLS: Updating the posterior (2)

$$p(f^n \mid X^n) = \mathcal{N}(f^n \mid \mu^n, \Sigma^n) \qquad (1)$$
where
$$\mu^n = \begin{bmatrix} \mu^{n-1} \\ q_n^T \mu^{n-1} \end{bmatrix} + \frac{h^n - q_n^T \mu^{n-1}}{\eta_n^2}\, \psi_n, \qquad
\Sigma^n = \begin{bmatrix} \Sigma^{n-1} & \Sigma^{n-1} q_n \\ q_n^T \Sigma^{n-1} & \gamma_n^2 + q_n^T \Sigma^{n-1} q_n \end{bmatrix} - \frac{1}{\eta_n^2}\, \psi_n \psi_n^T,$$
$$Q^n = \begin{bmatrix} Q^{n-1} & 0 \\ 0^T & 0 \end{bmatrix} + \frac{1}{\gamma_n^2} \begin{bmatrix} q_n \\ -1 \end{bmatrix} \begin{bmatrix} q_n \\ -1 \end{bmatrix}^T,$$
$$q_n = Q^{n-1} k_{n-1}(h^n), \qquad \gamma_n^2 = k(h^n, h^n) - k_{n-1}^T(h^n)\, q_n, \qquad \psi_n = \begin{bmatrix} \Sigma^{n-1} q_n \\ \gamma_n^2 + q_n^T \Sigma^{n-1} q_n \end{bmatrix}.$$
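The update equations of slides 9-10 translate almost directly into code. Below is a minimal NumPy sketch of one KRLS posterior update; the variable names (mu, Sigma, Q, k_vec, kappa) mirror $\mu^{n-1}$, $\Sigma^{n-1}$, $Q^{n-1}$, $k_{n-1}(h^n)$ and $k(h^n, h^n)$, and the numerical safeguards a production implementation would need (e.g. when $\gamma_n^2 \approx 0$) are omitted.

import numpy as np

def krls_update(mu, Sigma, Q, k_vec, kappa, h_obs, sigma2):
    """One KRLS posterior update: grow the basis by the new point.

    mu, Sigma : posterior mean/covariance over function values at the active set.
    Q         : inverse kernel matrix of the active set.
    k_vec     : kernel evaluations between the active set and the new input.
    kappa     : kernel value at the new input; h_obs : observed output; sigma2 : noise variance.
    """
    q = Q @ k_vec
    gamma2 = kappa - k_vec @ q                        # prior conditional variance gamma_n^2
    s = Sigma @ q
    eta2 = sigma2 + gamma2 + q @ s                    # evidence variance eta_n^2
    psi = np.append(s, gamma2 + q @ s)                # psi_n
    mu_new = np.append(mu, q @ mu) + (h_obs - q @ mu) / eta2 * psi
    Sigma_new = np.block([[Sigma, s[:, None]],
                          [s[None, :], np.array([[gamma2 + q @ s]])]]) \
                - np.outer(psi, psi) / eta2
    m = len(mu)
    Q_new = np.zeros((m + 1, m + 1))
    Q_new[:m, :m] = Q
    ext = np.append(q, -1.0)
    Q_new += np.outer(ext, ext) / gamma2              # rank-one growth of the inverse kernel
    return mu_new, Sigma_new, Q_new, gamma2

Returning gamma2 lets the caller apply the rule on slide 11: if it is (numerically) zero, the newly added basis is redundant and can be dropped without information loss.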

11 KRLS: Updating the active dataset without information loss

Consider the posterior of $f^n$ given the data set $X^n$:
$$p(f^n \mid X^n) = p(f^{n-1}, f_n \mid X^n) = p(f^{n-1} \mid X^n)\, \underbrace{p(f_n \mid f^{n-1}, X^n)}_{\text{related to } f_n}.$$
The basis $f_n$ can be removed when it is non-informative, i.e.
$$p(f_n \mid f^{n-1}, X^n) = \frac{p(y^n \mid f_n)}{p(y^n \mid f^{n-1})}\, p(f_n \mid f^{n-1}) = p(f_n \mid f^{n-1}),$$
with $p(y^n \mid f_n) = \mathcal{N}(f_n, \sigma^2 I)$ and $p(y^n \mid f^{n-1}) = \mathcal{N}(q_n^T f^{n-1}, (\sigma^2 + \gamma_n^2) I)$; the second equality is verified when $\gamma_n^2 = 0$ and $f_n = q_n^T f^{n-1}$.
If $\gamma_n^2 = 0$, the basis $f_n$ can therefore be removed without any loss of information, and the active set is updated as $X^n_l = X^n_l \setminus (h^n_{l-1}, h^n_l)$.

12 KRLS: Updating the active dataset with information loss

Once the number of bases in the dictionary grows larger than a predefined budget $N_{AC}$, one basis must be removed. Removing the $i$-th basis $f^n_i$ affects the approximation, which is measured by the KL divergence
$$\mathrm{KL}\big(p(f^n \mid X^n) \,\big\|\, p([f^{n-1}]_{\neg i} \mid X^n)\, p(f^n_i \mid [f^{n-1}]_{\neg i})\big).$$
For the exact distribution, the mean of $f^n$ is $\mu^n$; for the approximate distribution, the mean of $[f^{n-1}]_{\neg i}$ remains unchanged, i.e. $[\mu^n]_{\neg i}$, while the mean of $f^n_i$ becomes $q_n^T [\mu^n]_{\neg i}$. The basis to remove is selected via
$$E_i = \mu^n_i - q_n^T [\mu^n]_{\neg i} = \frac{[Q^n \mu^n]_i}{Q^n_{i,i}}, \qquad k = \arg\min_i E_i^2.$$

13 KRLS: Re-updating after removing a basis

The model parameters are updated simply via
$$\mu^n \leftarrow [\mu^n]_{\neg k}, \qquad \Sigma^n \leftarrow [\Sigma^n]_{\neg k, \neg k}, \qquad Q^n \leftarrow [Q^n]_{\neg k, \neg k} - \frac{[Q^n]_{\neg k, k}\, [Q^n]_{\neg k, k}^T}{[Q^n]_{k,k}},$$
and the active set as $X^n_l = X^n_l \setminus (h^k_{l-1}, h^k_l)$.
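A short NumPy sketch of the budget-enforcement step from slides 12-13: score each basis by $E_i$, remove the minimizer, and patch $\mu$, $\Sigma$ and $Q$ with the stated corrections. This is an illustrative reading of the equations, not the authors' implementation.

import numpy as np

def prune_basis(mu, Sigma, Q):
    """Remove the least informative basis once the budget N_AC is exceeded."""
    E = (Q @ mu) / np.diag(Q)                  # E_i = [Q mu]_i / Q_{ii}
    k = int(np.argmin(E ** 2))                 # basis whose removal loses the least
    keep = np.arange(len(mu)) != k
    mu_new = mu[keep]
    Sigma_new = Sigma[np.ix_(keep, keep)]
    q_col = Q[keep, k]
    # rank-one correction so Q_new stays the inverse kernel of the reduced active set
    Q_new = Q[np.ix_(keep, keep)] - np.outer(q_col, q_col) / Q[k, k]
    return mu_new, Sigma_new, Q_new, k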

14 Overall algorithm

Algorithm 1 Sequential inference for DGP
1: Input:
2: The n-th data pair: $(x^n, y^n)$
3: The DGP model parameters at the (n-1)-th step: $M^{n-1} = \{M_1^{n-1}, \dots, M_L^{n-1}, M_y^{n-1}\}$
4: Output:
5: Estimates of the latent state at the n-th step: $\hat S^n = \{\hat h^n_{1:L}\}$
6: DGP model parameters at the n-th step: $M^n = \{M_1^n, \dots, M_L^n, M_y^n\}$
7: State Estimation:
8: Draw $N_p$ samples of $S^n$ from (7) & (8)
9: Weight the samples by (9) & (11)
10: Estimate $\hat S^n$ by (10)
11: Model Update:
12: Update $M^{n-1}$ to $M^n$ by using GP_so with $(x^n, \hat h^n_1), \dots, (\hat h^n_{L-1}, \hat h^n_L), (\hat h^n_L, y^n)$

The algorithm assumes an online setting where the data are presented for learning in sequence. For the n-th training datum, the complexity (including sampling and model updating) is $O(N_{AC}^2)$; the overall complexity is $O(N_{AC}^2 N)$, which is much less than that of the full GP, $O(N^3)$.
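Putting the pieces together, one pass of Algorithm 1 for a single training pair might look like the sketch below. It reuses the estimate_states sketch from the state-estimation slide and assumes hypothetical per-layer online GP objects exposing predict, loglik and update methods; the actual GP_so/KRLS bookkeeping sits behind those calls.

def sequential_dgp_step(x_n, y_n, models, n_particles=100):
    """One pass of Algorithm 1 for a single training pair (illustrative sketch).

    models : list of L latent-layer objects plus one observation-layer object,
             each a hypothetical online GP with .predict / .loglik / .update.
    """
    # State estimation: particle estimates of the latent layers under the current models
    h_hat, _ = estimate_states(
        x_n, y_n,
        layer_predict=[m.predict for m in models[:-1]],
        obs_loglik=models[-1].loglik,
        n_particles=n_particles)
    # Model update: feed each layer's (input, estimated output) pair to its online GP
    inputs = [x_n] + h_hat                 # inputs to layers 1..L and to the observation layer
    targets = h_hat + [y_n]                # targets of layers 1..L and the observation layer
    for m, a, b in zip(models, inputs, targets):
        m.update(a, b)
    return models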

15 Prediction and Hyperparameter Learning

Given a trained DGP model $M = \{M_1, \dots, M_L, M_y\}$ and a new input $x^*$, particles are propagated through the layers:
$$h_1(k) \sim p(h_1 \mid x^*, M_1, \Theta_{dgp}), \qquad h_i(k) \sim p(h_i \mid h_{i-1}(k), M_i, \Theta_{dgp}).$$
The predictive distribution of $y^*$ is a Gaussian mixture,
$$p(y^* \mid x^*) \approx \frac{1}{N_p} \sum_{k=1}^{N_p} p(y^* \mid h_L(k), M_y, \Theta_{dgp}).$$
The hyperparameters $\Theta_{dgp}$ for each layer can be learned by minimizing the negative log likelihood.
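A minimal sketch of prediction as described here: propagate particles through the trained layers and average the observation-layer predictive means, which is the mean of the Gaussian mixture above. The layer_predict and obs_predict callables are hypothetical per-layer model interfaces, as in the earlier sketches.

import numpy as np

def dgp_predict(x_star, layer_predict, obs_predict, n_particles=100, seed=0):
    """Mixture-mean point prediction at a new input x_star."""
    rng = np.random.default_rng(seed)
    means = []
    for _ in range(n_particles):
        h = x_star
        for predict in layer_predict:
            m, v = predict(h)
            h = m + np.sqrt(v) * rng.standard_normal(np.shape(m))  # sample h_i(k)
        m_y, _ = obs_predict(h)
        means.append(m_y)               # mean of the k-th mixture component
    y_hat = np.mean(means, axis=0)      # (1/N_p) sum of component means
    return y_hat, means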

16 Sequential DGP Extensions

17 Heteroscedastic Learning

Homoscedastic noise, i.e. $\sigma_y$ is a constant: $y = f_y(h_L) + v_y$.
Heteroscedastic noise, i.e. $\sigma_y$ depends on the input: two functions are placed on top of $h_L$,
$$s_\alpha = f_\alpha(h_L) + v_\alpha, \qquad s_\beta = f_\beta(h_L) + v_\beta, \qquad p(y \mid s_\alpha, s_\beta) = \mathcal{N}(y \mid s_\alpha, \exp(s_\beta)).$$
Figure 2: Heteroscedastic extension of DGP.
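In the sequential inference scheme, the only change this extension requires is the particle weight, which now uses the input-dependent noise. A minimal sketch of that log-weight, assuming a diagonal Gaussian observation model:

import numpy as np

def hetero_logweight(y, s_alpha, s_beta):
    """Log-weight of a particle under p(y | s_alpha, s_beta) = N(y; s_alpha, exp(s_beta))."""
    var = np.exp(s_beta)
    return -0.5 * np.sum((y - s_alpha) ** 2 / var + np.log(2.0 * np.pi * var))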

18 Multi-Task Learning

The deep network structure of DGP allows correlations between outputs to be represented by sharing the latent layers. Multi-task learning is straightforward in the sequential inference scheme: the unnormalized weight is computed by using only the observed dimensions of the output, $y^n_{obs}$,
$$w^n(k) = p(y^n_{obs} \mid h^n_L(k), M^{n-1}_y, \Theta_{dgp}).$$
Missing dimensions are estimated using the same procedure as the latent states, and the model parameters $M^n_y$ are estimated using both observed and estimated outputs.
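A minimal sketch of the partially observed weight computation, assuming an independent Gaussian observation model per output dimension; the boolean mask marking which dimensions of $y^n$ are observed is an illustrative device, not notation from the paper.

import numpy as np

def partial_logweight(y, mask, mean_y, var_y):
    """Particle log-weight using only the observed output dimensions (mask == True)."""
    diff2 = (y[mask] - mean_y[mask]) ** 2
    return -0.5 * np.sum(diff2 / var_y[mask] + np.log(2.0 * np.pi * var_y[mask]))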

19 Further possible extensions

A novel and efficient inference framework for DGP is proposed. Further possible extensions might be:
1. exploiting deep learning techniques, such as ReLU activation functions and dropout, inside the framework;
2. sequential DGP with other network structures, e.g. CNNs, RNNs, LSTMs.

20 Experiments

21 Experiments

ARD kernel:
$$k(x, x') = \sigma_{ker}^2 \exp\Big(-0.5 \sum_{i=1}^d c_i (x_i - x'_i)^2\Big).$$
Evaluation: root mean squared error (RMSE), mean negative log probability (MNLP), and average training time (seconds).
Abbreviations: sequential DGP - DGP_seq; heteroscedastic sequential DGP - HDGP_seq; variational inference DGP - DGP_var; sparse online GP - GP_so.
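For reference, a small NumPy implementation of this ARD kernel, with one precision c_i per input dimension, vectorized over two sets of row-vector inputs (an illustrative helper, not the authors' code):

import numpy as np

def ard_kernel(X, Z, c, sigma2_ker=1.0):
    """k(x, x') = sigma2_ker * exp(-0.5 * sum_i c_i (x_i - x'_i)^2) for X (N,d), Z (M,d)."""
    diff2 = (X[:, None, :] - Z[None, :, :]) ** 2          # (N, M, d) squared differences
    return sigma2_ker * np.exp(-0.5 * np.tensordot(diff2, np.asarray(c), axes=([2], [0])))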

22 DGP as Deep Dynamical Prior: unsupervised learning

The input is the time step and the output is the multi-dimensional observation.
Datasets (time sequences): motion, flu and stock; training sizes 1500/300/1000 for motion/flu/stock.
Settings: layers L = 2, latent dimension D = 5, number of samples N_p = 100, budget size N_AC = 100/100/200 respectively for the three datasets.

23 Performance w.r.t. Latent Layer Number L

24 Performance w.r.t. Layer Dimensionality D

25 Performance w.r.t. Number of Particles N p

26 Performance w.r.t. Size of Active Set N AC

27 Qualitative Results on the motion dataset

Figure 3: DGP as Deep Dynamical Prior (the motion dataset). Particle estimation of y.
Figure 5: DGP as Deep Dynamical Prior (the motion dataset). Particle estimation of y at times 700, 1200, 1570, 2070, where the number of particles N_p is 100.

28 Dimensionality Reduction

Low-dimensional latent representations on the Frey face dataset: input and output are the noisy face images. Compared settings: our DGP_seq (L=1 D=2; L=1 D=3; L=2 D=2) vs DGP_var (L=1 D=2; L=1 D=3; L=2 D=2), at noise levels std=0.1 and std=0.3.
Figure 8: Dimensionality Reduction for Image Reconstruction (Face Data): the reconstruction error (RMSE) as a function of the number of episodes. Note that there is no sequential inference & hyperparameter learning episode in DGP_var; the results of DGP_var are lines. The error bar is mean±standard deviation.
Figure 9: Dimensionality Reduction for Image Reconstruction (Face Data): the reconstruction image comparison at the time steps 50:50:1900, showing the ground truth, the noisy observation (std=0.1), DGP_var (L=1, D=3), our DGP_seq (1st Ep, L=1, D=3), and our DGP_seq (5th Ep, L=1, D=3).

29 Heteroscedastic Learning Result on Motorcycle

Dynamical prior estimation on the Motorcycle dataset: 94 time/acceleration pairs.
Figure 10: Heteroscedastic GP Modeling (posterior of the Motorcycle data). In plot (a), the observations are black crosses, the means of our HGP_seq / GP are pink / blue lines, and the 95% confidence intervals of our HGP_seq / GP are pink / blue dashed lines; plots (b) and (c) show further components of our HGP_seq.
Table 4: Comparison with state-of-the-art heteroscedastic GPs (GP, NGP, WGP, MLHGP, VHGP, our HGP_seq).

30 Missing Data in Multi-task Learning (1)

Dynamical prior estimation on the motion, flu and stock datasets, where a small portion of the observations is missing. Table 1 reports RMSE and MNLP for GP_so, our DGP_seq and our HDGP_seq on the three datasets.

31 Missing Data in Multi-task Learning (2)

Dynamical prior estimation on the Jura dataset, with a 2D location as input and the measurements of cadmium, nickel and zinc concentrations as output. The total number of data pairs is 359, of which 100 measurements of cadmium are missing.
Table 5: Comparison with state-of-the-art multi-task GPs (the Jura dataset), reporting MAE and training time for GP, CMOGP, SLFM, GPRN, GPRN-NPV1 and our DGP_seq; the results of GP, CMOGP, SLFM and GPRN are from [12]. Abbreviations: MAE - mean absolute error; CMOGP - convolved multiple outputs GP; SLFM - semiparametric latent factor model; GPRN - GP regression networks; GPRN-NPV1 - GPRN with nonparametric variational inference (mode 1).

32 DGP for Supervised Learning (1)

Regression on the Parkinsons telemonitoring dataset, a 6-month biomedical voice recording from 42 people with early-stage Parkinson's disease, with 5875 input/output pairs for training and 875 for testing.
Parameter setting: L = 2, D = 2, N_p = 500, N_AC = 100.
Table 2: DGP as Regressor (Parkinsons Data): prediction error and training efficiency for GP_so, DGP_var and our DGP_seq (RMSE 9.33, 9.26 and 8.86, respectively).

33 DGP for Supervised Learning (2)

Classification on the Tumor gene expression dataset and the MNIST digits dataset.
Parameter settings: Tumor: L = 3, D = 12, N_p = 200, N_AC = 144; MNIST: L = 1, D = 400, N_p = 100, N_AC = 1000.
Table 3: Classification accuracy and training time for GP_so, DGP_var, DVI-GPLVM and our DGP_seq on Tumor and MNIST. For MNIST, the results of DVI-GPLVM are from [7] (distributed implementation using an unreported number of cores); DVI-GPLVM denotes GPLVM with distributed variational inference.

34 References

[1] M. Lázaro-Gredilla, "Bayesian warped Gaussian processes," in Advances in Neural Information Processing Systems, 2012.
[2] R. P. Adams and O. Stegle, "Gaussian process product models for nonparametric nonstationarity," in Proceedings of the 25th International Conference on Machine Learning, 2008.
[3] L. Csató and M. Opper, "Sparse on-line Gaussian processes," Neural Computation, vol. 14, no. 3, 2002.
[4] O. Cappé, S. J. Godsill, and E. Moulines, "An overview of existing methods and recent advances in sequential Monte Carlo," Proceedings of the IEEE, vol. 95, no. 5, 2007.
[5] S. Van Vaerenbergh, M. Lázaro-Gredilla, and I. Santamaría, "Kernel recursive least-squares tracker for time-varying regression," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 8, 2012.
