Doubly Stochastic Inference for Deep Gaussian Processes. Hugh Salimbeni, Department of Computing, Imperial College London.
Doubly Stochastic Inference for Deep Gaussian Processes

Hugh Salimbeni, Department of Computing, Imperial College London

29/5/2017
Motivation

DGPs promise much, but are difficult to train.
Fully factorized VI doesn't work well.
We seek a variational approach that works and scales.
Other recently proposed schemes [1, 2, 5] make additional approximations and require more machinery than VI.

Doubly Stochastic Inference for DGPs. Hugh Salimbeni. Berlin, 29/5/2017
Talk outline

1. Summary: Model, Inference, Results
2. Details: Model, Inference, Results
3. Questions
Model

We use the standard DGP model, with one addition: we include a linear (identity) mean function for all the internal layers (1D example in [4]).
Inference

We use the model conditioned on the inducing points as a conditional variational posterior.
We impose Gaussians on the inducing points (independent between layers but full rank within layers).
We use sampling to deal with the intractable expectation.
We never compute N × N matrices (and we make no additional simplifications to the variational posterior).
Results

We show significant improvement over single-layer models on large (~10^6) and massive (~10^9) data.
Big jump in improvement over a single-layer GP with 5× the number of inducing points.
On small data we never do worse than the single-layer model, and often better.
We can get 98.1% on MNIST with only 100 inducing points.
We surpass all permutation-invariant methods on rectangles-images (designed to test deep vs shallow architectures).
Identical model/inference hyperparameters for all our models.
Details: The Model

We use the standard DGP model, with a linear mean function for all the internal layers: if the dimensions agree we use the identity, otherwise PCA.
Sensible alternative: initialize the latents to the identity (but the linear mean function works better).
Not-so-sensible alternative: random initialization. This doesn't work well (the posterior is (very) multimodal).
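The identity-or-PCA choice above can be sketched in a few lines. This is a hypothetical helper (not from the talk's codebase), assuming the layer width shrinks so that the top principal components of the layer inputs supply the linear map:

```python
import numpy as np

def linear_mean_weights(d_in, d_out, X=None):
    """Weights W for an internal-layer mean function m(h) = h @ W.

    Identity when the dimensions agree; otherwise (assuming d_out < d_in)
    the top d_out PCA directions of the layer inputs X.
    """
    if d_in == d_out:
        return np.eye(d_in)
    Xc = X - X.mean(axis=0)                       # centre before PCA
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d_out].T                           # shape (d_in, d_out)
```

The PCA branch returns orthonormal columns, so the mean function is a pure projection of the layer below.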
The DGP: Graphical Model

[Figure: the DGP as a chain X → f^1 (inducing inputs Z^0, outputs u^1) → h^1 → f^2 (Z^1, u^2) → h^2 → f^3 (Z^2, u^3) → y]
The DGP: Density

$$p\big(y, \{h^l, f^l, u^l\}_{l=1}^{L}\big) = \underbrace{\prod_{i=1}^{N} p(y_i \mid f_i^L)}_{\text{likelihood}} \times \underbrace{\prod_{l=1}^{L} p(h^l \mid f^l)\, p(f^l \mid u^l;\, h^{l-1}, Z^{l-1})\, p(u^l;\, Z^{l-1})}_{\text{DGP prior}}$$
Factorised Variational Posterior

[Figure: the graphical model with its edges severed. Each layer's inducing outputs get a Gaussian $q(u^l) = \mathcal{N}(u^l \mid m^l, S^l)$, and the hidden layers are fully factorised: $q(h^l) = \prod_i \mathcal{N}(h_i^l \mid \mu_i^l, (\sigma_i^l)^2)$]
Our Variational Posterior

[Figure: the same model, but only the inducing outputs get Gaussians $q(u^l) = \mathcal{N}(u^l \mid m^l, S^l)$; the conditional structure through $f^l$ and $h^l$ is retained from the model]
Recap: GPs for Big Data [3]

$$q(f, u) = p(f \mid u;\, X, Z)\, \mathcal{N}(u \mid m, S)$$

Marginalise $u$ from the variational posterior:

$$\int p(f \mid u;\, X, Z)\, \mathcal{N}(u \mid m, S)\, du = \mathcal{N}(f \mid \mu, \Sigma) =: q(f \mid m, S;\, X, Z) \quad (1)$$

Define the following mean and covariance functions:

$$\mu_{m,Z}(x_i) = m(x_i) + a(x_i)^T \big(m - m(Z)\big),$$
$$\Sigma_{S,Z}(x_i, x_j) = k(x_i, x_j) - a(x_i)^T \big(k(Z, Z) - S\big)\, a(x_j),$$

where $a(x_i) = k(x_i, Z)\, k(Z, Z)^{-1}$.

With these functions, $[\mu]_i = \mu_{m,Z}(x_i)$ and $[\Sigma]_{ij} = \Sigma_{S,Z}(x_i, x_j)$.
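These two functions translate directly into code. A minimal NumPy sketch (an illustration, not the talk's implementation), assuming an RBF kernel and a zero prior mean function:

```python
import numpy as np

def rbf(A, B, variance=1.0, lengthscale=1.0):
    # Squared-exponential kernel matrix k(A, B).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def q_marginals(X, Z, m, S):
    """Mean and covariance of q(f | m, S; X, Z), zero prior mean function."""
    Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))   # jitter for numerical stability
    A = np.linalg.solve(Kzz, rbf(Z, X)).T     # rows are a(x_i)^T = k(x_i,Z) Kzz^{-1}
    mu = A @ m                                # mu_{m,Z}(x_i) with m(.) = 0
    Sigma = rbf(X, X) - A @ (Kzz - S) @ A.T   # Sigma_{S,Z}(x_i, x_j)
    return mu, Sigma
```

A useful sanity check: setting $S = k(Z,Z)$ and $m = 0$ recovers the prior mean and covariance exactly.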
Recap: GPs for Big Data [3], cont.

Key idea: the $f_i$ marginals of $q(f, u) = p(f \mid u;\, X, Z)\, \mathcal{N}(u \mid m, S)$ depend only on the inputs $x_i$ (and the variational parameters).
Our approach

We can marginalise all the $u^l$ from our posterior. The result for the $l$th layer is $q(f^l \mid m^l, S^l;\, h^{l-1}, Z^{l-1})$.

The fully coupled (both between and within layers) variational posterior is

$$\prod_{l=1}^{L} p(h^l \mid f^l)\, q(f^l \mid m^l, S^l;\, h^{l-1}, Z^{l-1})$$
But what about the $i$th marginals?

Since at each layer the $i$th marginal depends only on the $i$th component of the layer below, we have

$$q\big(\{f_i^l, h_i^l\}_{l=1}^{L}\big) = \prod_{l=1}^{L} p(h_i^l \mid f_i^l)\, q(f_i^l \mid m^l, S^l;\, h_i^{l-1}, Z^{l-1}) \quad (2)$$
The lower bound

Since our variational posterior matches the model everywhere except the inducing points, the bound is:

$$\mathcal{L} = \sum_{i=1}^{N} \mathbb{E}_{q(\{f_i^l, h_i^l\}_{l=1}^{L})} \big[\log p(y_i \mid f_i^L)\big] - \sum_{l=1}^{L} \mathrm{KL}\big(q(u^l)\, \|\, p(u^l)\big)$$

The analytic marginalisation of all the inner layers, $q(f_i^L)$, is intractable, but we can draw samples ancestrally.
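The KL term is between two multivariate Gaussians per layer, $q(u^l) = \mathcal{N}(m^l, S^l)$ and the prior over the inducing outputs, so it is available in closed form. A small sketch, assuming a zero prior mean (illustrative, not the talk's code):

```python
import numpy as np

def gauss_kl(m, S, K):
    """Closed-form KL( N(m, S) || N(0, K) )."""
    M = len(m)
    Kinv = np.linalg.inv(K)
    return 0.5 * (np.trace(Kinv @ S) + m @ Kinv @ m - M
                  + np.linalg.slogdet(K)[1] - np.linalg.slogdet(S)[1])
```

When the variational distribution equals the prior the KL is zero, which makes a convenient unit test.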
Sampling from the variational posterior

Each layer is Gaussian, given the layer below.
We draw samples using unit Gaussians $\epsilon \sim \mathcal{N}(0, 1)$ at each layer.
For $\hat{f}_i^l$, use the mean and variance from $q(f_i^l \mid m^l, S^l;\, \hat{h}_i^{l-1}, Z^{l-1})$:

$$\hat{f}_i^l = \mu_{m^l, Z^{l-1}}\big(\hat{h}_i^{l-1}\big) + \epsilon\, \sqrt{\Sigma_{S^l, Z^{l-1}}\big(\hat{h}_i^{l-1}, \hat{h}_i^{l-1}\big)}$$

where for the first layer $\hat{h}_i^0 = x_i$. Just add noise for the $\hat{h}_i^l$.
The whole sampling process is differentiable with respect to the variational parameters.
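A minimal sketch of one ancestral pass (1-dimensional layers, identity mean function, unit RBF kernel; the names, shapes, and variational parameter values here are illustrative assumptions, not the talk's code):

```python
import numpy as np

def rbf(A, B):
    # Unit-variance squared-exponential kernel matrix.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

def sample_layer(H, Z, m, S, rng):
    """One reparameterised sample of f^l at the inputs H (identity mean)."""
    Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
    A = np.linalg.solve(Kzz, rbf(Z, H)).T                 # rows a(h_i)^T
    mu = H[:, 0] + A @ (m - Z[:, 0])                      # identity mean function
    var = 1.0 - np.einsum('nm,mk,nk->n', A, Kzz - S, A)   # marginal variances
    eps = rng.standard_normal(len(H))                     # epsilon ~ N(0, 1)
    return mu + eps * np.sqrt(np.maximum(var, 1e-12))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 1))
Z = np.linspace(-2, 2, 5)[:, None]
h = X                                                     # hat h^0 = x
for l in range(3):                                        # three GP layers
    f = sample_layer(h, Z, m=np.zeros(5), S=0.1 * np.eye(5), rng=rng)
    h = f[:, None]                                        # noise-free hidden layer
```

Because the randomness enters only through $\epsilon$, gradients flow through the sample to $m^l$, $S^l$, and $Z^{l-1}$ when the same computation is expressed in an autodiff framework.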
The second source of stochasticity

We sample the bound in minibatches.
Linear scaling in N.
Can be used when only streaming is possible (50GB datasets).
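The minibatch estimate of the data-fit sum stays unbiased because the batch sum is rescaled by $N/|B|$; a quick numerical check of that claim (the numbers here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, B = 1000, 32
per_point = rng.normal(size=N)      # stand-in for the E[log p(y_i | f_i^L)] terms
full_sum = per_point.sum()

def minibatch_estimate():
    # Unbiased estimator: rescale the batch sum by N / |B|.
    batch = rng.choice(N, size=B, replace=False)
    return (N / B) * per_point[batch].sum()

estimates = np.array([minibatch_estimate() for _ in range(50000)])
```

Averaged over many draws, the estimates concentrate on the full-data sum, which is what makes stochastic optimisation of the bound valid.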
Inference recap

We use the full model as a variational posterior, conditioned on the inducing points.
We use Gaussians for the inducing points.
The lower bound requires only the posterior marginals.
We can take samples from the posterior marginals, giving a Monte Carlo estimate of the bound.
Results (1): UCI
Code Demo

master/demos/demo_regression_uci.ipynb
Results (2): Large and Massive Data

[Table: test RMSE for SGP, SGP 500, and DGP 2–5 on the year, airline (N = 700K), and taxi (N = 1B) datasets; the numeric values were not recovered in transcription]
Thanks for listening. Questions?
References

[1] T. D. Bui, D. Hernández-Lobato, Y. Li, J. M. Hernández-Lobato, and R. E. Turner. Deep Gaussian Processes for Regression using Approximate Expectation Propagation. ICML, 2016.
[2] K. Cutajar, E. V. Bonilla, P. Michiardi, and M. Filippone. Practical Learning of Deep Gaussian Processes via Random Fourier Features. arXiv preprint, 2016.
[3] J. Hensman, N. Fusi, and N. D. Lawrence. Gaussian Processes for Big Data. Uncertainty in Artificial Intelligence, 2013.
[4] M. Lázaro-Gredilla. Bayesian Warped Gaussian Processes. Advances in Neural Information Processing Systems, 2012.
[5] Y. Wang, M. Brubaker, B. Chaib-draa, and R. Urtasun. Sequential Inference for Deep Gaussian Process. Artificial Intelligence and Statistics, 2016.
More informationMarkov Chain Monte Carlo
1 Motivation 1.1 Bayesian Learning Markov Chain Monte Carlo Yale Chang In Bayesian learning, given data X, we make assumptions on the generative process of X by introducing hidden variables Z: p(z): prior
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationComputer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression
Group Prof. Daniel Cremers 4. Gaussian Processes - Regression Definition (Rep.) Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
More informationDeep Gaussian Processes
Deep Gaussian Processes Neil D. Lawrence 8th April 2015 Mascot Num 2015 Outline Introduction Deep Gaussian Process Models Variational Methods Composition of GPs Results Outline Introduction Deep Gaussian
More informationPractical Bayesian Optimization of Machine Learning. Learning Algorithms
Practical Bayesian Optimization of Machine Learning Algorithms CS 294 University of California, Berkeley Tuesday, April 20, 2016 Motivation Machine Learning Algorithms (MLA s) have hyperparameters that
More informationUnderstanding Covariance Estimates in Expectation Propagation
Understanding Covariance Estimates in Expectation Propagation William Stephenson Department of EECS Massachusetts Institute of Technology Cambridge, MA 019 wtstephe@csail.mit.edu Tamara Broderick Department
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationProbabilistic numerics for deep learning
Presenter: Shijia Wang Department of Engineering Science, University of Oxford rning (RLSS) Summer School, Montreal 2017 Outline 1 Introduction Probabilistic Numerics 2 Components Probabilistic modeling
More informationBayesian Calibration of Simulators with Structured Discretization Uncertainty
Bayesian Calibration of Simulators with Structured Discretization Uncertainty Oksana A. Chkrebtii Department of Statistics, The Ohio State University Joint work with Matthew T. Pratola (Statistics, The
More informationAnalytic Long-Term Forecasting with Periodic Gaussian Processes
Nooshin Haji Ghassemi School of Computing Blekinge Institute of Technology Sweden Marc Peter Deisenroth Department of Computing Imperial College London United Kingdom Department of Computer Science TU
More informationApproximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA)
Approximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA) Willem van den Boom Department of Statistics and Applied Probability National University
More informationVariational Inference for Monte Carlo Objectives
Variational Inference for Monte Carlo Objectives Andriy Mnih, Danilo Rezende June 21, 2016 1 / 18 Introduction Variational training of directed generative models has been widely adopted. Results depend
More informationApproximate Inference using MCMC
Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov
More informationBehavioral Data Mining. Lecture 7 Linear and Logistic Regression
Behavioral Data Mining Lecture 7 Linear and Logistic Regression Outline Linear Regression Regularization Logistic Regression Stochastic Gradient Fast Stochastic Methods Performance tips Linear Regression
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More information20: Gaussian Processes
10-708: Probabilistic Graphical Models 10-708, Spring 2016 20: Gaussian Processes Lecturer: Andrew Gordon Wilson Scribes: Sai Ganesh Bandiatmakuri 1 Discussion about ML Here we discuss an introduction
More informationBayesian Hidden Markov Models and Extensions
Bayesian Hidden Markov Models and Extensions Zoubin Ghahramani Department of Engineering University of Cambridge joint work with Matt Beal, Jurgen van Gael, Yunus Saatci, Tom Stepleton, Yee Whye Teh Modeling
More informationIntroduction to Stochastic Gradient Markov Chain Monte Carlo Methods
Introduction to Stochastic Gradient Markov Chain Monte Carlo Methods Changyou Chen Department of Electrical and Computer Engineering, Duke University cc448@duke.edu Duke-Tsinghua Machine Learning Summer
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationProbabilistic Graphical Models Lecture 20: Gaussian Processes
Probabilistic Graphical Models Lecture 20: Gaussian Processes Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 30, 2015 1 / 53 What is Machine Learning? Machine learning algorithms
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationThe Variational Gaussian Approximation Revisited
The Variational Gaussian Approximation Revisited Manfred Opper Cédric Archambeau March 16, 2009 Abstract The variational approximation of posterior distributions by multivariate Gaussians has been much
More informationJoint Emotion Analysis via Multi-task Gaussian Processes
Joint Emotion Analysis via Multi-task Gaussian Processes Daniel Beck, Trevor Cohn, Lucia Specia October 28, 2014 1 Introduction 2 Multi-task Gaussian Process Regression 3 Experiments and Discussion 4 Conclusions
More informationSandwiching the marginal likelihood using bidirectional Monte Carlo. Roger Grosse
Sandwiching the marginal likelihood using bidirectional Monte Carlo Roger Grosse Ryan Adams Zoubin Ghahramani Introduction When comparing different statistical models, we d like a quantitative criterion
More informationPreviously Monte Carlo Integration
Previously Simulation, sampling Monte Carlo Simulations Inverse cdf method Rejection sampling Today: sampling cont., Bayesian inference via sampling Eigenvalues and Eigenvectors Markov processes, PageRank
More informationVariational Principal Components
Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings
More informationTalk on Bayesian Optimization
Talk on Bayesian Optimization Jungtaek Kim (jtkim@postech.ac.kr) Machine Learning Group, Department of Computer Science and Engineering, POSTECH, 77-Cheongam-ro, Nam-gu, Pohang-si 37673, Gyungsangbuk-do,
More information