Doubly Stochastic Inference for Deep Gaussian Processes. Hugh Salimbeni, Department of Computing, Imperial College London


1 Doubly Stochastic Inference for Deep Gaussian Processes. Hugh Salimbeni, Department of Computing, Imperial College London. 29/5/2017

2-6 Motivation
- DGPs promise much, but are difficult to train
- Fully factorized VI doesn't work well
- We seek a variational approach that works and scales
- Other recently proposed schemes [1, 2, 5] make additional approximations and require more machinery than VI

7 Talk outline
1. Summary: Model, Inference, Results
2. Details: Model, Inference, Results
3. Questions

8-10 Model
We use the standard DGP model, with one addition: we include a linear (identity) mean function for all the internal layers (1D example in [4]).

11-15 Inference
- We use the model conditioned on the inducing points as a conditional variational posterior
- We impose Gaussians on the inducing points (independent between layers but full rank within layers)
- We use sampling to deal with the intractable expectation
- We never compute N x N matrices (we make no additional simplifications to the variational posterior)

16-21 Results
- We show significant improvement over single layer models on large (~10^6) and massive (~10^9) data
- Big jump in improvement over a single layer GP with 5x the number of inducing points
- On small data we never do worse than the single layer model, and often better
- We can get 98.1% on MNIST with only 100 inducing points
- We surpass all permutation invariant methods on rectangles-images (designed to test deep vs shallow architectures)
- Identical model/inference hyperparameters for all our models

22-24 Details: The Model
We use the standard DGP model, with a linear mean function for all the internal layers:
- If dimensions agree we use the identity, otherwise PCA (a sketch of this choice follows below)
- Sensible alternative: initialize the latents to the identity (but the linear mean function works better)
- Not so sensible alternative: random initialization. Doesn't work well (the posterior is (very) multimodal)
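A minimal sketch of that mean-function choice, assuming a NumPy setting; the function name and PCA construction are illustrative assumptions, not the talk's code. The mean is the identity when the layer's input and output dimensions agree, and otherwise a fixed PCA projection of the layer inputs.

```python
import numpy as np

def make_linear_mean_function(dim_in, dim_out, X=None):
    """Linear mean function for an internal DGP layer (illustrative sketch).

    If the dimensions agree, use the identity; otherwise project with the top
    principal directions of the layer inputs X (an N x dim_in array).
    """
    if dim_in == dim_out:
        W = np.eye(dim_in)
    else:
        Xc = X - X.mean(axis=0)                       # centre the inputs
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        W = Vt[:dim_out].T                            # dim_in x dim_out PCA projection
    return lambda H: H @ W                            # m(H) = H W
```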

25 The DGP: Graphical Model
(Figure: graphical model of a three-layer DGP. Inputs X and inducing inputs Z^0 feed the first-layer function f^1 with inducing outputs u^1, producing hidden values h^1; h^1 and Z^1 feed f^2 and u^2, producing h^2; h^2 and Z^2 feed f^3 and u^3, which produce y.)

26 The DGP: Density
p(y, \{h^l, f^l, u^l\}_{l=1}^L) = \underbrace{\prod_{i=1}^N p(y_i \mid f_i^L)}_{\text{likelihood}} \; \underbrace{\prod_{l=1}^L p(h^l \mid f^l)\, p(f^l \mid u^l; h^{l-1}, Z^{l-1})\, p(u^l; Z^{l-1})}_{\text{DGP prior}}
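To make the density concrete, here is a hedged sketch of ancestral sampling from this prior: each layer is a GP draw evaluated at the noisy outputs of the layer below, with the identity mean on a layer whenever the dimensions allow it. The kernel choice, noise level, and function names are illustrative assumptions, not the talk's code.

```python
import numpy as np

def rbf(A, B, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def sample_dgp_prior(X, layer_dims, noise_std=0.01, jitter=1e-6):
    """Draw one ancestral sample of h^1, ..., h^L from the DGP prior at inputs X."""
    H = X
    for dim in layer_dims:
        K = rbf(H, H) + jitter * np.eye(len(H))       # prior covariance at current inputs
        F = np.linalg.cholesky(K) @ np.random.randn(len(H), dim)
        if H.shape[1] == dim:
            F = F + H                                  # identity mean function on this layer
        H = F + noise_std * np.random.randn(*F.shape)  # h^l = f^l + Gaussian noise
    return H
```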

27 Factorised Variational Posterior
(Figure: the graphical model with the fully factorised posterior overlaid. Each layer's inducing outputs get an independent Gaussian, q(u^l) = N(u^l | m^l, S^l), and each hidden value gets an independent Gaussian, q(h^l) = \prod_i N(h_i^l \mid \mu_i^l, (\sigma_i^l)^2).)

28 Our Variational Posterior
(Figure: the graphical model with our posterior overlaid. Gaussians q(u^l) = N(u^l | m^l, S^l) are placed on the inducing outputs only; the f^l and h^l keep their conditional dependence on the layer below, exactly as in the model.)

29-32 Recap: GPs for Big Data [3]
q(f, u) = p(f \mid u; X, Z)\, N(u \mid m, S)
Marginalise u from the variational posterior:
\int p(f \mid u; X, Z)\, N(u \mid m, S)\, du = N(f \mid \tilde{\mu}, \tilde{S}) =: q(f \mid m, S; X, Z)   (1)
Define the following mean and covariance functions:
\mu_{m,Z}(x_i) = m(x_i) + a(x_i)^T (m - m(Z)),
S_{S,Z}(x_i, x_j) = k(x_i, x_j) - a(x_i)^T (k(Z, Z) - S)\, a(x_j),
where a(x_i) = k(x_i, Z)\, k(Z, Z)^{-1}.
With these functions, [\tilde{\mu}]_i = \mu_{m,Z}(x_i) and [\tilde{S}]_{ij} = S_{S,Z}(x_i, x_j).

33 Recap: GPs for Big Data [3], cont.
Key idea: the f_i marginals of q(f, u) = p(f \mid u; X, Z)\, N(u \mid m, S) depend only on the inputs x_i (and the variational parameters).
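These marginals are cheap to compute. A hedged NumPy sketch for a single-output layer, following Eq. (1) and the definitions above; the helper names `kern` and `mean_fn` are assumptions, not a particular library's API.

```python
import numpy as np

def svgp_marginals(Xnew, Z, m, S, kern, mean_fn):
    """Per-point mean and variance of q(f | m, S; Xnew, Z).

    `kern(A, B)` returns the kernel matrix between the rows of A and B, and
    `mean_fn(A)` the prior mean at A; both are illustrative, assumed helpers.
    """
    Kzz = kern(Z, Z) + 1e-6 * np.eye(len(Z))           # jittered k(Z, Z)
    Kxz = kern(Xnew, Z)
    A = np.linalg.solve(Kzz, Kxz.T).T                  # rows are a(x_i)^T = k(x_i, Z) k(Z, Z)^{-1}
    mu = mean_fn(Xnew) + A @ (m - mean_fn(Z))          # mu_{m,Z}(x_i)
    kxx = np.diag(kern(Xnew, Xnew))
    var = kxx - np.einsum('nm,mk,nk->n', A, Kzz - S, A)  # S_{S,Z}(x_i, x_i)
    return mu, var
```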

34 Our Variational Posterior
(Figure repeated from slide 28.)

35-37 Our approach
- We can marginalise all the u^l from our posterior
- The result for the l-th layer is q(f^l \mid m^l, S^l; h^{l-1}, Z^{l-1})
- The fully coupled (both between and within layers) variational posterior is
\prod_{l=1}^L p(h^l \mid f^l)\, q(f^l \mid m^l, S^l; h^{l-1}, Z^{l-1})

38 But what about the i-th marginals?
Since at each layer the i-th marginal depends only on the i-th component of the layer below, we have
q(\{f_i^l, h_i^l\}_{l=1}^L) = \prod_{l=1}^L p(h_i^l \mid f_i^l)\, q(f_i^l \mid m^l, S^l; h_i^{l-1}, Z^{l-1})   (2)

39 The lower bound
Since our variational posterior matches the model everywhere except at the inducing points, the bound is
\mathcal{L} = \sum_i \mathbb{E}_{q(\{f_i^l, h_i^l\}_{l=1}^L)} \big[\log p(y_i \mid f_i^L)\big] - \sum_{l=1}^L \mathrm{KL}\big(q(u^l) \,\|\, p(u^l)\big)
The analytic marginalisation over all the inner layers, q(f_i^L), is intractable, but we can draw samples ancestrally.

40-43 Sampling from the variational posterior
- Each layer is Gaussian, given the layer below
- We draw samples using unit Gaussians \epsilon \sim N(0, 1) at each layer
- For \hat{f}_i^l, take the mean and variance from q(f_i^l \mid m^l, S^l; \hat{h}_i^{l-1}, Z^{l-1}):
\hat{f}_i^l = \mu_{m^l, Z^{l-1}}(\hat{h}_i^{l-1}) + \epsilon \sqrt{S_{S^l, Z^{l-1}}(\hat{h}_i^{l-1}, \hat{h}_i^{l-1})}
where for the first layer \hat{h}_i^0 = x_i
- Just add noise to get the \hat{h}_i^l
- The whole sampling process is differentiable wrt the variational parameters (sketched below)
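A hedged sketch of that ancestral sampling, assuming each layer exposes a `marginals(H)` method returning the per-point mean and variance of q(f^l | m^l, S^l; h^{l-1}, Z^{l-1}) (for example, built from the `svgp_marginals` sketch above). The interface is ours, not the authors' API.

```python
import numpy as np

def sample_through_layers(X, layers, rng=np.random):
    """Propagate each input through the layers with the reparameterisation trick.

    At every layer draw eps ~ N(0, 1) and set f_hat = mu + eps * sqrt(var), so
    the whole sample is differentiable wrt the variational parameters.
    """
    H = X
    for layer in layers:
        mu, var = layer.marginals(H)      # per-point means and variances, shape (N, D_l)
        eps = rng.standard_normal(mu.shape)
        H = mu + eps * np.sqrt(var)       # \hat f^l, fed to the layer above
    return H                              # sampled f^L at the inputs
```

The layer noise (the \hat{h}^l step) can be added in the same way, by inflating `var` with the layer's noise variance before drawing.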

44 The second source of stochasticity
- We sample the bound in minibatches (see the sketch below)
- Linear scaling in N
- Can be used when only streaming is possible (50 GB datasets)
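Putting the two sources of stochasticity together gives an unbiased estimate of the bound. A hedged sketch, reusing `sample_through_layers` from above and assuming hypothetical helpers `log_lik` for log p(y_i | f_i^L) and `kl_term` for the summed KL divergences.

```python
def elbo_estimate(X_batch, y_batch, N_total, layers, log_lik, kl_term, n_samples=1):
    """Doubly stochastic estimate of the lower bound on one minibatch.

    Source 1: Monte Carlo samples of the layer outputs (reparameterised).
    Source 2: the minibatch, with the likelihood term rescaled by N / |batch|.
    """
    batch_size = len(X_batch)
    lik = 0.0
    for _ in range(n_samples):
        f_top = sample_through_layers(X_batch, layers)
        lik += log_lik(y_batch, f_top).sum() / n_samples
    return (N_total / batch_size) * lik - kl_term(layers)
```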

45-48 Inference recap
- We use the full model as a variational posterior, conditioned on the inducing points
- We use Gaussians for the inducing points
- The lower bound requires only the posterior marginals
- We estimate the bound by Monte Carlo, using samples from the posterior marginals

49 Results (1): UCI
(Results figure on the UCI regression benchmarks; not reproduced in this transcription.)

50 Code Demo
master/demos/demo_regression_uci.ipynb

51 Results (2): Large and Massive Data
(Table of test RMSE comparing SGP, SGP 500, and DGP 2-5 on the year, airline (N = 700K), and taxi (N = 1B) datasets; the numerical values are not reproduced in this transcription.)

52 Thanks for listening. Questions?

53 References
[1] T. D. Bui, D. Hernández-Lobato, Y. Li, J. M. Hernández-Lobato, and R. E. Turner. Deep Gaussian Processes for Regression using Approximate Expectation Propagation. ICML, 2016.
[2] K. Cutajar, E. V. Bonilla, P. Michiardi, and M. Filippone. Practical Learning of Deep Gaussian Processes via Random Fourier Features. arXiv preprint, 2016.
[3] J. Hensman, N. Fusi, and N. D. Lawrence. Gaussian Processes for Big Data. Uncertainty in Artificial Intelligence, 2013.
[4] M. Lázaro-Gredilla. Bayesian Warped Gaussian Processes. Advances in Neural Information Processing Systems, 2012.
[5] Y. Wang, M. Brubaker, B. Chaib-draa, and R. Urtasun. Sequential Inference for Deep Gaussian Process. Artificial Intelligence and Statistics, 2016.
