Sequential Inference for Deep Gaussian Process
1 Sequential Inference for Deep Gaussian Process
Yali Wang, Marcus Brubaker, Brahim Chaib-draa, Raquel Urtasun
Chinese Academy of Sciences / University of Toronto / Laval University
AISTATS 2016
Presented by Lei Yu, Sept 9, 2016
2 Highlight
1. An efficient sequential inference framework for the Deep Gaussian Process (DGP) is proposed.
2. Two DGP extensions:
   - heteroscedasticity (non-stationary noise)
   - multi-task learning
3 Motivation
GPs cannot handle non-stationary functions, e.g. functions with sudden jumps.
Variants of GP that address non-stationarity:
  - warped Gaussian processes [1]
  - product models [2]
  - Deep GPs
Inference for DGPs: variational approximations are commonly used, and the matrix inversion involved in inference leads to O(N^3) complexity.
4 Sequential Inference for Deep GP
5 Deep Gaussian Process
The output is assumed to be multi-dimensional, y ∈ R^{D_y}, and to be generated as follows:
  h_0 = x
  h_i = f_i(h_{i-1}) + v_i,  i = 1, ..., L
  y = f_y(h_L) + v_y
with f_i ~ GP(0, k_{θ_i}) and f_y ~ GP(0, k_{θ_y}).
Figure 1: The graphical model for DGP.
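The generative process above can be simulated directly. Below is a minimal numpy sketch that draws one sample from a DGP prior at a fixed set of inputs, assuming a squared-exponential kernel per layer; the function names, jitter and noise level are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel matrix between the rows of A (N x d) and B (M x d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sample_dgp(x, layer_dims, noise_std=0.1, seed=0):
    # Draw one joint sample of h_1, ..., h_L and y from the DGP prior at inputs x.
    rng = np.random.default_rng(seed)
    h, samples = x, []
    for dim in layer_dims:                                  # hidden layers first, output last
        K = rbf_kernel(h, h) + 1e-8 * np.eye(len(h))        # jitter for numerical stability
        chol = np.linalg.cholesky(K)
        f = chol @ rng.standard_normal((len(h), dim))       # f_i ~ GP(0, k)
        h = f + noise_std * rng.standard_normal(f.shape)    # h_i = f_i(h_{i-1}) + v_i
        samples.append(h)
    return samples                                          # [h_1, ..., h_L, y]

x = np.linspace(-2.0, 2.0, 50)[:, None]
h1, h2, y = sample_dgp(x, layer_dims=[1, 1, 1])             # L = 2 hidden layers plus output
```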
6 Sequential Inference for Deep GP
Sequential inference means that, for each training pair (x^n, y^n), the inference of the latent variables consists of two steps:
  - State estimation: sequential Monte Carlo [3] for the latent states S^n = {h^n_{1:L}}.
  - Model updating: the sparse online GP (GP_so) [4] or the Kernel Recursive Least-Squares (KRLS) method [5] for the model parameters M = {M_1, ..., M_L, M_y}, with
      M^{n-1} ≜ {X^{n-1}, μ^{n-1}, Σ^{n-1}, Q^{n-1}}
    where X is the active training set, Q is the inverse covariance matrix of X, and N(μ, Σ) is the posterior over the function values at X.
7 State Estimation
The latent states S^n are estimated via the posterior expectation
  E[S^n] = ∫ S^n p(S^n | x^n, y^n, M^{n-1}, Θ_dgp) dS^n
which is intractable due to the deep architecture.
Sequential Monte Carlo: for k = 1, ..., N_p
  1. Sampling: draw the k-th sample (particle) of S^n from the current DGP model
       h^n_1(k) ~ p(h^n_1 | x^n, M^{n-1}_1, Θ_dgp)
       h^n_i(k) ~ p(h^n_i | h^n_{i-1}(k), M^{n-1}_i, Θ_dgp),  i = 2, ..., L
  2. Weighting:
       ĥ^n_i = Σ_{k=1}^{N_p} w^n(k) h^n_i(k),  with  w^n(k) = ŵ^n(k) / Σ_{k=1}^{N_p} ŵ^n(k)
     where the unnormalized weight ŵ^n(k) is determined by the likelihood in the observation layer:
       ŵ^n(k) = p(y^n | h^n_L(k), M^{n-1}_y, Θ_dgp)
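A compact sketch of one such SMC pass for a single training pair is given below; the two callables stand in for the layer-wise GP predictives of the current model M^{n-1} and are assumptions of this sketch, not part of the paper.

```python
import numpy as np

def smc_state_estimation(x_n, sample_layer, obs_loglik, L, n_particles=100):
    # One SMC pass for a single training pair (x^n, y^n).
    #   sample_layer(i, h_prev): one sample of h_i given h_{i-1} under M_i^{n-1} (assumed callable)
    #   obs_loglik(h_L):         log p(y^n | h_L, M_y^{n-1})                     (assumed callable)
    particles, logw = [], np.empty(n_particles)
    for k in range(n_particles):
        h, states = x_n, []
        for i in range(1, L + 1):
            h = sample_layer(i, h)                   # forward-sample h_i^n(k)
            states.append(h)
        particles.append(states)
        logw[k] = obs_loglik(states[-1])             # weight by the observation-layer likelihood
    w = np.exp(logw - logw.max())
    w /= w.sum()                                     # normalized importance weights w^n(k)
    # Weighted estimate of each latent layer: hhat_i^n = sum_k w^n(k) h_i^n(k)
    estimates = [sum(w[k] * particles[k][i] for k in range(n_particles)) for i in range(L)]
    return estimates, w
```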
8 Model Updating
Once the states of the i-th layer are available, the input-output training pair (h^n_{i-1}, h^n_i) for a single GP is obtained, and an online update from M^{n-1}_i to M^n_i is required (the layer index is omitted below):
  M^{n-1} ≜ {X^{n-1}, μ^{n-1}, Σ^{n-1}, Q^{n-1}}  →  M^n ≜ {X^n, μ^n, Σ^n, Q^n}
The task is fulfilled in two steps:
  1. Update the posterior distribution p(f^n | X^n) based on p(f^{n-1} | X^{n-1}) = N(f^{n-1} | μ^{n-1}, Σ^{n-1}).
  2. Update the active set of training data X.
The Kernel Recursive Least-Squares (KRLS) method is exploited [5].
9 KRLS: Updating posterior (1)
When the n-th data point is available, update X^n_l = X^{n-1}_l ∪ (h^n_{l-1}, h^n_l). Then, according to Bayes' rule:
  p(f^n | X^n) = p(f^{n-1}, f̃^n | X^{n-1}, h^n) = p(h^n | f̃^n) p(f̃^n | f^{n-1}) p(f^{n-1} | X^{n-1}) / p(h^n | X^{n-1})
Observation model, GP prior and conditioned GP prior:
  p(h^n | f̃^n) = N(h^n | f̃^n, σ^2)
  p(f^{n-1} | X^{n-1}) = N(f^{n-1} | μ^{n-1}, Σ^{n-1})
  p(f̃^n | f^{n-1}) = N(f̃^n | q_n^T f^{n-1}, γ_n^2)
Evidence:
  p(h^n | X^{n-1}) = ∫∫ p(h^n | f̃^n) p(f̃^n | f^{n-1}) p(f^{n-1} | X^{n-1}) df^{n-1} df̃^n
                   = N(h^n | q_n^T μ^{n-1}, σ^2 + γ_n^2 + q_n^T Σ^{n-1} q_n),   with η_n^2 ≜ σ^2 + γ_n^2 + q_n^T Σ^{n-1} q_n
10 KRLS: Updating posterior (2)
  p(f^n | X^n) = N(f^n | μ^n, Σ^n)    (1)
where
  μ^n = [μ^{n-1}; q_n^T μ^{n-1}] + ((h^n - q_n^T μ^{n-1}) / η_n^2) ψ_n
  Σ^n = [Σ^{n-1}, Σ^{n-1} q_n; q_n^T Σ^{n-1}, γ_n^2 + q_n^T Σ^{n-1} q_n] - (1 / η_n^2) ψ_n ψ_n^T
  Q^n = [Q^{n-1}, 0; 0^T, 0] + (1 / γ_n^2) [q_n; -1] [q_n; -1]^T
with
  q_n = Q^{n-1} k_{n-1}(h^n)
  γ_n^2 = k(h^n, h^n) - k_{n-1}^T(h^n) q_n
  ψ_n = [Σ^{n-1} q_n; γ_n^2 + q_n^T Σ^{n-1} q_n]
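A numpy sketch of this growing step for a single scalar-output layer is shown below, assuming a kernel function and the stored quadruple (X, μ, Σ, Q); it follows the equations above but is an illustrative reconstruction, not the authors' code.

```python
import numpy as np

def krls_grow(X, mu, Sigma, Q, h_in, h_out, kernel, sigma2):
    # Grow the posterior N(mu, Sigma) and the inverse kernel matrix Q with one new
    # input h_in (vector) and noisy scalar target h_out, following the update above.
    k_vec = kernel(X, h_in[None, :]).ravel()               # k_{n-1}(h^n)
    k_new = kernel(h_in[None, :], h_in[None, :]).item()    # k(h^n, h^n)
    q = Q @ k_vec
    gamma2 = max(k_new - k_vec @ q, 1e-12)                 # prior variance of the new basis
    psi = np.append(Sigma @ q, gamma2 + q @ Sigma @ q)
    eta2 = sigma2 + psi[-1]                                # predictive variance of h_out
    pred = q @ mu                                          # predictive mean q_n^T mu^{n-1}
    mu_new = np.append(mu, pred) + (h_out - pred) / eta2 * psi
    Sigma_new = np.block([[Sigma, (Sigma @ q)[:, None]],
                          [(Sigma @ q)[None, :], np.array([[psi[-1]]])]]) - np.outer(psi, psi) / eta2
    v = np.append(q, -1.0)                                 # block-inverse rank-one update of Q
    Q_new = np.block([[Q, np.zeros((len(q), 1))],
                      [np.zeros((1, len(q))), np.zeros((1, 1))]]) + np.outer(v, v) / gamma2
    return np.vstack([X, h_in]), mu_new, Sigma_new, Q_new
```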
11 KRLS: Updating active dataset without info loss
Consider the posterior of f^n given the data set X^n:
  p(f^n | X^n) = p(f^{n-1}, f̃^n | X^n) = p(f^{n-1} | X^n) p(f̃^n | f^{n-1}, X^n)
Only the second factor involves f̃^n, so f̃^n can be removed if it is non-informative, i.e. if p(f̃^n | f^{n-1}, X^n) = p(f̃^n | f^{n-1}).
By Bayes' rule,
  p(f̃^n | f^{n-1}, X^n) = p(y^n | f̃^n) p(f̃^n | f^{n-1}) / p(y^n | f^{n-1})
with p(y^n | f̃^n) = N(f̃^n, σ^2 I) and p(y^n | f^{n-1}) = N(q_n^T f^{n-1}, (σ^2 + γ_n^2) I);
the required equality holds when γ_n^2 = 0 and f̃^n = q_n^T f^{n-1}.
Hence, if γ_n^2 = 0, the basis f̃^n can be removed without any loss of information, and the active set is updated as X^n_l = X^n_l \ (h^n_{l-1}, h^n_l).
12 KRLS: Updating active dataset with info loss
Once the number of bases in the dictionary grows beyond a predefined budget N_AC, one basis must be removed. Removing the i-th basis f^n_i affects the approximation, which is measured by the KL divergence
  KL( p(f^n | X^n) || p([f^{n-1}]_{¬i} | X^n) p(f̃^n_i | [f^{n-1}]_{¬i}) )
For the exact distribution, the mean of f^n is μ^n; for the approximate distribution, the mean of [f^{n-1}]_{¬i} remains unchanged, i.e. [μ^n]_{¬i}, while the mean of f^n_i becomes q_n^T [μ^n]_{¬i}.
The basis to remove is selected via
  E_i = μ^n_i - q_n^T [μ^n]_{¬i} = [Q^n μ^n]_i / Q^n_{i,i},   k = arg min_i (E_i^2)
13 KRLS: Re-Updating after Removing Basis
The model parameters are then updated simply via
  μ^n ← [μ^n]_{¬k}
  Σ^n ← [Σ^n]_{¬k,¬k}
  Q^n ← [Q^n]_{¬k,¬k} - [Q^n]_{¬k,k} [Q^n]^T_{¬k,k} / [Q^n]_{k,k}
and the active set is updated as X^n_l = X^n_l \ (h^k_{l-1}, h^k_l).
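The selection and removal steps of the two slides above can be sketched together in a few lines of numpy; the score E_i and the downdates mirror the formulas, but the function and variable names are illustrative assumptions.

```python
import numpy as np

def krls_prune(X, mu, Sigma, Q):
    # Remove the least informative basis once the dictionary exceeds the budget N_AC,
    # using the squared error measure E_i = [Q mu]_i / Q_ii, then downdate mu, Sigma, Q.
    scores = (Q @ mu) / np.diag(Q)
    k = int(np.argmin(scores ** 2))                        # basis causing the smallest loss
    keep = np.arange(len(mu)) != k
    q_col = Q[keep, k]
    Q_new = Q[np.ix_(keep, keep)] - np.outer(q_col, q_col) / Q[k, k]
    return X[keep], mu[keep], Sigma[np.ix_(keep, keep)], Q_new, k
```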
14 Overall algorithm
Algorithm 1 Sequential inference for DGP
Input:
  The n-th data pair: (x^n, y^n)
  The DGP model parameters at the (n-1)-th step: M^{n-1} = {M^{n-1}_1, ..., M^{n-1}_L, M^{n-1}_y}
Output:
  Estimates of the latent states at the n-th step: Ŝ^n = {ĥ^n_{1:L}}
  The DGP model parameters at the n-th step: M^n = {M^n_1, ..., M^n_L, M^n_y}
State Estimation:
  Draw N_p samples of S^n from (7) & (8)
  Weight the samples by (9) & (11)
  Estimate Ŝ^n by (10)
Model Update:
  Update M^{n-1} to M^n by using GP_so with (x^n, ĥ^n_1), ..., (ĥ^n_{L-1}, ĥ^n_L), (ĥ^n_L, y^n)
For the n-th training data point, the complexity (including sampling and model updating) is O(N_AC^2). The overall complexity is O(N_AC^2 N), which is much less than the O(N^3) of a full GP.
15 Prediction and Hyperparameter Learning
Given a trained DGP model M = {M_1, ..., M_L, M_y} and a new input x*, particles are propagated through the layers:
  h_1(k) ~ p(h_1 | x*, M_1, Θ_dgp)
  h_i(k) ~ p(h_i | h_{i-1}(k), M_i, Θ_dgp),  i = 2, ..., L
The predictive distribution of y is a Gaussian mixture:
  p(y) ≈ (1 / N_p) Σ_{k=1}^{N_p} p(y | h_L(k), M_y, Θ_dgp)
The hyperparameters Θ_dgp of each layer can be learned by minimizing the negative log likelihood.
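A sketch of the prediction step, again with the layer-wise predictives abstracted as callables (an assumption of the sketch, not the paper's API); it returns the mixture mean and the total variance of the equally weighted Gaussian mixture.

```python
import numpy as np

def dgp_predict(x_star, sample_layer, predict_output, L, n_particles=100):
    # Propagate particles through the trained layers and average the output-layer predictions.
    #   predict_output(h_L) -> (mean, var) of p(y | h_L, M_y)   (assumed callable)
    means, variances = [], []
    for _ in range(n_particles):
        h = x_star
        for i in range(1, L + 1):
            h = sample_layer(i, h)
        m, v = predict_output(h)
        means.append(m)
        variances.append(v)
    means, variances = np.array(means), np.array(variances)
    y_hat = means.mean(axis=0)                              # mixture mean
    y_var = variances.mean(axis=0) + means.var(axis=0)      # law of total variance for the mixture
    return y_hat, y_var
```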
16 Sequential DGP Extensions
17 Heteroscedastic Learning
Homoscedastic noise, i.e. σ_y is a constant:
  y = f_y(h_L) + v_y
Heteroscedastic noise, i.e. σ_y depends on the input:
  s_α = f_α(h_L) + v_α,   s_β = f_β(h_L) + v_β
  p(y | s_α, s_β) = N(y | s_α, exp(s_β))
Figure 2: Heteroscedastic extension of DGP.
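Within the sequential scheme, the main change this makes to the particle weighting step is the observation likelihood; a small sketch of the log weight under the assumed diagonal heteroscedastic Gaussian is given below (names are illustrative).

```python
import numpy as np

def hetero_log_weight(y_n, s_alpha, s_beta):
    # Log particle weight under the heteroscedastic likelihood N(y | s_alpha, exp(s_beta)),
    # summed over output dimensions (weighting step only; inputs are per-particle samples).
    var = np.exp(s_beta)
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (y_n - s_alpha) ** 2 / var)
```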
18 Multi-Task Learning
The deep network structure of DGP allows correlations between outputs to be represented by sharing the latent layers.
Multi-task learning is straightforward in the sequential inference scheme:
  - The unnormalized weight is computed using only the observed dimensions y^n_obs of the output (see the sketch after this slide):
      ŵ^n(k) = p(y^n_obs | h^n_L(k), M^{n-1}_y, Θ_dgp)
  - Missing dimensions are estimated using the same procedure as the latent states.
  - The model parameters M^n_y are estimated using both the observed and the estimated outputs.
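A sketch of that masked weighting step, assuming a diagonal Gaussian observation layer; the boolean mask and the names are assumptions of the sketch.

```python
import numpy as np

def masked_log_weight(y_n, mean, var, observed):
    # Log particle weight using only the observed output dimensions (boolean mask `observed`);
    # missing dimensions simply do not contribute to the weight.
    m = observed
    return -0.5 * np.sum(np.log(2.0 * np.pi * var[m]) + (y_n[m] - mean[m]) ** 2 / var[m])
```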
19 Further possible extensions
A novel and efficient inference framework for DGP is proposed. Further possible extensions might be:
  1. Exploiting deep learning techniques, such as ReLU activation functions and dropout, inside the framework.
  2. Sequential DGP with other network structures, e.g. CNNs, RNNs, LSTMs.
20 Experiments
21 Experiments
ARD kernel (a numpy sketch follows after this slide):
  k(x, x') = σ_ker^2 exp(-0.5 Σ_{i=1}^d c_i (x_i - x'_i)^2)
Evaluation:
  - Root mean squared error (RMSE)
  - Mean negative log probability (MNLP)
  - Average training time (seconds)
Abbreviations:
  - Sequential DGP: DGP_seq
  - Heteroscedastic sequential DGP: HDGP_seq
  - Variational inference DGP: DGP_var
  - Sparse online GP: GP_so
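For completeness, a small numpy version of the ARD kernel above (an illustrative sketch; c holds the per-dimension relevance weights c_i).

```python
import numpy as np

def ard_kernel(X1, X2, sigma_ker2=1.0, c=None):
    # ARD squared-exponential kernel: k(x, x') = sigma_ker^2 exp(-0.5 * sum_i c_i (x_i - x'_i)^2).
    if c is None:
        c = np.ones(X1.shape[1])
    d2 = (((X1[:, None, :] - X2[None, :, :]) ** 2) * c).sum(-1)
    return sigma_ker2 * np.exp(-0.5 * d2)
```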
22 DGP as Deep Dynamical Prior: unsupervised learning
The input is the time step and the output is the multi-dimensional observation.
Datasets (time sequences): motion, flu and stock.
  - Training size: 1500 / 300 / 1000 for the motion / flu / stock datasets
  - Layers: L = 2
  - Latent dimension: D = 5
  - Number of samples: N_p = 100
  - Budget size: N_AC = 100 / 100 / 200 respectively for the three datasets
23 Performance w.r.t. Latent Layer Number L
24 Performance w.r.t. Layer Dimensionality D
25 Performance w.r.t. Number of Particles N p
26 Performance w.r.t. Size of Active Set N AC
27 Qualitative Results on the motion dataset
Figure 3: DGP as Deep Dynamical Prior (the motion dataset). Particle estimation of y at time steps 700, 1200, 1570 and 2070, where the number of particles N_p is 100.
28 Dimensionality Reduction
Low-dimensional latent representations on the Frey face dataset: input and output are the noisy face images (noise std = 0.1 and 0.3). Compared models: our DGP_seq and DGP_var with (L=1, D=2), (L=1, D=3) and (L=2, D=2).
Figure 8: Dimensionality Reduction for Image Reconstruction (Face Data): the reconstruction error (RMSE) as a function of the number of episodes. Note that there is no sequential inference & hyperparameter learning episode in DGP_var; the results of DGP_var are shown as lines. The error bars are mean ± standard deviation.
Figure 9: Dimensionality Reduction for Image Reconstruction (Face Data): the reconstruction image comparison (ground truth, noisy observation (std=0.1), DGP_var (L=1, D=3), our DGP_seq at the 1st and 5th episodes) at the time steps 50:50:1900.
29 Heteroscedastic Learning Result on Motorcycle
Dynamical prior estimation on the Motorcycle dataset: 94 time/acceleration pairs.
Figure 10: Heteroscedastic GP Modeling (Posterior of the Motorcycle Data). In plot (a), the observations are black crosses, the means of our HGP_seq / GP are pink / blue lines, and the 95% confidence intervals of our HGP_seq / GP are pink / blue dashed lines.
Table 4: Comparison with state-of-the-art heteroscedastic GPs: GP, NGP, WGP, MLHGP, VHGP, our HGP_seq.
30 Missing Data in Multi-task Learning (1)
Dynamical prior estimation on the motion, flu and stock datasets (a small portion of the observations is missing).
Table 1: Missing Data in Multi-Task Learning: RMSE(%) of our DGP_seq / our HDGP_seq is 1.0 / 1.1 on motion, 4.0 / 3.0 on flu, and 6.4 / 6.6 on stock.
31 Missing Data in Multi-task Learning (2)
Dynamical prior estimation on the Jura dataset, with input the 2D location and output the measurements of cadmium, nickel and zinc concentrations. The total number of data pairs is 359, where 100 measurements of cadmium are missing.
Table 5: Comparison with state-of-the-art multi-task GPs (the Jura dataset): MAE and training time for GP, CMOGP, SLFM, GPRN, GPRN-NPV1 and our DGP_seq. The results of GP, CMOGP, SLFM, GPRN are from [12].
Abbreviations: MAE - mean absolute error; CMOGP - convolved multiple outputs GP; SLFM - semiparametric latent factor model; GPRN - GP regression networks; GPRN-NPV1 - GPRN with nonparametric variational inference (mode 1).
32 DGP for Supervised Learning (1)
Regression on the Parkinsons Telemonitoring dataset, a 6-month biomedical voice recording from 42 people with early-stage Parkinson's disease, with a total of 5875 input/output pairs for training and 875 for testing.
Parameter setting: L = 2, D = 2, N_p = 500, N_AC = 100.
Table 2: DGP as Regressor (Parkinsons Data): prediction error and training efficiency for GP_so, DGP_var and our DGP_seq; RMSE 9.33, 9.26 and 8.86 respectively.
33 DGP for Supervised Learning (2)
Classification on the Tumor gene expression dataset and the MNIST digits dataset.
  - Tumor: L = 3, D = 12, N_p = 200, N_AC = 144
  - MNIST: L = 1, D = 400, N_p = 100, N_AC = 1000
Table 3: Classification accuracy and training time for GP_so, DGP_var, DVI-GPLVM and our DGP_seq on Tumor and MNIST. For MNIST, the results of DVI-GPLVM are from [7]. (*) Distributed implementation using an unreported number of cores.
DVI-GPLVM - GPLVM with distributed variational inference.
34 References
[1] M. Lázaro-Gredilla, "Bayesian warped Gaussian processes," in Advances in Neural Information Processing Systems, 2012.
[2] R. P. Adams and O. Stegle, "Gaussian process product models for nonparametric nonstationarity," in Proceedings of the 25th International Conference on Machine Learning, 2008.
[3] O. Cappé, S. J. Godsill, and E. Moulines, "An overview of existing methods and recent advances in sequential Monte Carlo," Proceedings of the IEEE, vol. 95, no. 5.
[4] L. Csató and M. Opper, "Sparse on-line Gaussian processes," Neural Computation, vol. 14, no. 3.
[5] S. Van Vaerenbergh, M. Lázaro-Gredilla, and I. Santamaría, "Kernel recursive least-squares tracker for time-varying regression," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 8, 2012.