Learning Representations for Hyperparameter Transfer Learning
1 Learning Representations for Hyperparameter Transfer Learning
Cédric Archambeau, DALI 2018, Lanzarote, Spain
2 My co-authors: Rodolphe Jenatton, Valerio Perrone, Matthias Seeger
4 Tuning deep neural nets for optimal performance
LeNet5 [LBBH98]
The search space X is large and diverse:
- Architecture: # hidden layers, activation functions, ...
- Model complexity: regularization, dropout, ...
- Optimisation parameters: learning rates, momentum, batch size, ...
5 Two straightforward approaches (Figure by Bergstra and Bengio, 2012)
Exhaustive search on a regular or random grid:
- Complexity is exponential in p
- Wasteful of resources, but easy to parallelise
- Memoryless
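To make the memoryless nature concrete, here is a minimal random-search sketch in Python. It is an illustration, not code from the talk; `objective`, the two example hyperparameters, and `n_trials` are hypothetical placeholders.

import random

def sample_config():
    # Draw each hyperparameter independently; both parameters below are
    # hypothetical examples of a search space.
    return {
        "learning_rate": 10 ** random.uniform(-5, -1),
        "num_hidden_layers": random.randint(1, 5),
    }

def random_search(objective, n_trials=50):
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        # Memoryless: each draw ignores all previous evaluations.
        config = sample_config()
        loss = objective(config)  # expensive: train and validate a model
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss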
6 Can we do better?
7 Hyperparameter transfer learning
11 Democratising machine learning
- Abstract away training algorithms
- Abstract away representation (feature engineering)
- Abstract away computing infrastructure
- Abstract away memory constraints
- Abstract away network architecture
12 Black-box global optimisation
The function f to optimise can be non-convex. The number of hyperparameters p is moderate (typically < 20). Our goal is to solve the following optimisation problem:
    $x^\star = \operatorname*{argmin}_{x \in \mathcal{X}} f(x)$.
Evaluating f(x) is expensive. There is no analytical form or gradient, and evaluations may be noisy.
14 Example: tuning deep neural nets [SLA12, SRS+15, KFB+16]
LeNet5 [LBBH98]
$f(x)$ is the validation loss of the neural net as a function of its hyperparameters $x$. Evaluating $f(x)$ is very costly: a single evaluation can take up to weeks!
15 Bayesian (black-box) optimisation [MTZ78, SSW+16]
$x^\star = \operatorname*{argmin}_{x \in \mathcal{X}} f(x)$
Canonical algorithm:
    Build a surrogate model M of f                    # cheaper to evaluate
    Initialise the set of evaluated candidates C = {}
    While some BUDGET is available:
        Select a candidate x_new in X using M and C   # exploration/exploitation
        Collect the evaluation y_new of f at x_new    # time-consuming
        Update C = C u {(x_new, y_new)}
        Update M with C                               # update surrogate model
        Update BUDGET
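As a concrete illustration, here is a minimal Python sketch of this loop. It is not the speaker's implementation: `surrogate` (with fit/predict behaviour), `acquisition`, and `sample_candidates` are hypothetical interfaces, and the acquisition is maximised over a finite random pool for simplicity.

import numpy as np

def bayesian_optimisation(f, sample_candidates, surrogate, acquisition, budget):
    # C = {(x, y)}: candidates evaluated so far.
    X = [sample_candidates(1)[0]]
    y = [f(X[0])]                                      # initial, time-consuming evaluation
    for _ in range(budget - 1):
        surrogate.fit(np.asarray(X), np.asarray(y))    # update M with C
        pool = sample_candidates(1000)                 # finite candidate pool over X
        scores = [acquisition(x, surrogate) for x in pool]
        x_new = pool[int(np.argmax(scores))]           # exploration/exploitation trade-off
        y_new = f(x_new)                               # time-consuming evaluation
        X.append(x_new)
        y.append(y_new)
    best = int(np.argmin(y))
    return X[best], y[best]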
20 Bayesian (black-box) optimisation with Gaussian processes [JSW98]
1. Learn a probabilistic model of f, which is cheap to evaluate:
   $y_i \mid f(x_i) \sim \mathcal{N}(f(x_i), \varsigma^2)$, where $f(x) \sim \mathcal{GP}(0, K)$.
2. Given the observations $y = (y_1, \ldots, y_n)$, compute the predictive mean and the predictive standard deviation:
   $\mu(x_\star) = k_\star^\top (K + \varsigma^2 I)^{-1} y$, $\sigma^2(x_\star) = k(x_\star, x_\star) - k_\star^\top (K + \varsigma^2 I)^{-1} k_\star$.
3. Repeatedly query f by balancing exploitation against exploration!
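A minimal sketch of step 2 in Python, assuming the standard Gaussian process regression posterior above (the kernel `rbf` and all function names are illustrative, not from the talk):

import numpy as np

def gp_predict(X, y, X_star, kernel, noise_var):
    # Posterior mean and standard deviation of a GP(0, K) regression model
    # observed under Gaussian noise of variance noise_var.
    K = kernel(X, X) + noise_var * np.eye(len(X))   # K + varsigma^2 I
    K_s = kernel(X, X_star)                         # cross-covariances k_*
    alpha = np.linalg.solve(K, y)
    mu = K_s.T @ alpha                              # predictive mean
    v = np.linalg.solve(K, K_s)
    var = np.diag(kernel(X_star, X_star)) - np.sum(K_s * v, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))        # predictive std deviation

# Example kernel: squared exponential (an assumption, not specified in the slides).
def rbf(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)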
23 Bayesian optimisation in practice (Image credit: Javier González)
24 What is wrong with the Gaussian process surrogate? Inference scales as $\mathcal{O}(N^3)$ in the number of evaluations $N$.
25 Adaptive Bayesian linear regression (ABLR) [Bis06]
The model:
    $P(y \mid w, z, \beta) = \prod_n \mathcal{N}(y_n \mid \phi_z(x_n)^\top w, \beta^{-1})$, $P(w \mid \alpha) = \mathcal{N}(0, \alpha^{-1} I_D)$.
The predictive distribution:
    $P(y_\star \mid x_\star, \mathcal{D}) = \int P(y_\star \mid x_\star, w)\, P(w \mid \mathcal{D})\, dw = \mathcal{N}(\mu(x_\star), \sigma^2(x_\star))$.
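A minimal sketch of the ABLR predictive computation, following the standard Bayesian linear regression equations in [Bis06] (function and variable names are illustrative):

import numpy as np

def ablr_predict(Phi, y, phi_star, alpha, beta):
    # Posterior over weights: N(m, S) with S = (alpha I + beta Phi^T Phi)^{-1}
    # and m = beta S Phi^T y (Bishop, 2006, ch. 3).
    D = Phi.shape[1]
    S_inv = alpha * np.eye(D) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ y
    # Predictive distribution at a new input with features phi_star = phi_z(x_star):
    mu = phi_star @ m
    sigma2 = 1.0 / beta + phi_star @ S @ phi_star
    return mu, sigma2

Note that the cost is cubic in the number of features D rather than in the number of evaluations N, which is what makes the linear model attractive as a surrogate.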
26 Multi-task ABLR for transfer learning
1. Multi-task extension of the model:
   $P(y_t \mid w_t, z, \beta_t) = \prod_{n_t} \mathcal{N}(y_{n_t} \mid \phi_z(x_{n_t})^\top w_t, \beta_t^{-1})$, $P(w_t \mid \alpha_t) = \mathcal{N}(0, \alpha_t^{-1} I_D)$.
2. Shared features $\phi_z(x)$:
   - Explicit feature set (e.g., RBF)
   - Random kitchen sinks [RR+07]
   - Learned by a feedforward neural net
3. Multi-task objective (see the sketch below):
   $\rho\left(z, \{\alpha_t, \beta_t\}_{t=1}^T\right) = \sum_{t=1}^T \log P(y_t \mid z, \alpha_t, \beta_t)$
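A sketch of how the multi-task objective could be evaluated, using the standard Bayesian linear regression log marginal likelihood (a hedged illustration; `phi_z` stands for the shared feature map and is assumed given):

import numpy as np

def blr_log_evidence(Phi, y, alpha, beta):
    # Log marginal likelihood log P(y | z, alpha, beta) of Bayesian linear
    # regression with features Phi = phi_z(X) (Bishop, 2006, eq. 3.86).
    N, D = Phi.shape
    A = alpha * np.eye(D) + beta * Phi.T @ Phi
    m = beta * np.linalg.solve(A, Phi.T @ y)
    E = 0.5 * beta * np.sum((y - Phi @ m) ** 2) + 0.5 * alpha * m @ m
    return (0.5 * D * np.log(alpha) + 0.5 * N * np.log(beta) - E
            - 0.5 * np.linalg.slogdet(A)[1] - 0.5 * N * np.log(2.0 * np.pi))

def multi_task_objective(tasks, phi_z, hypers):
    # Sum of per-task log evidences; in ABLR this is maximised jointly over
    # the shared feature parameters z and the per-task (alpha_t, beta_t).
    return sum(blr_log_evidence(phi_z(X_t), y_t, a_t, b_t)
               for (X_t, y_t), (a_t, b_t) in zip(tasks, hypers))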
28 Warm-start procedure for hyperparameter optimisation (HPO): leave-one-task-out.
29 A representation to optimise parametrised quadratic functions
$f_t(x) = \frac{a_{2,t}}{2} \|x\|_2^2 + a_{1,t} \mathbf{1}^\top x + a_{0,t}$
[Two plots of the best value found vs. iteration: (left) transfer learning with baselines [KO11], comparing random search, GP, GP transfer (ctx), ABLR NN, ABLR RKS, ABLR NN transfer, and ABLR NN transfer (ctx); (right) transfer learning with neural nets [SRS+15, SKFH16], comparing DNGO, DNGO transfer (ctx), BOHAMIANN, BOHAMIANN transfer (ctx), ABLR NN, ABLR NN transfer, and ABLR NN transfer (ctx).]
30 A representation to warm-start HPO across OpenML data sets
[Two plots of (1 - AUC) of the best model vs. iteration, comparing random search, ABLR NN, ABLR RKS, ABLR NN transfer, GP, and GP transfer (ctx, L1): (left) transfer learning in SVM; (right) transfer learning in XGBoost.]
31 Learning better representations by jointly modelling multiple signals
Tasks are now auxiliary signals related to the target function to optimise. The target is still the validation loss $f(x)$; examples of auxiliary signals are the training cost or the training loss.
32 A representation for transfer learning across OpenML data sets
[Plot of the test error of the best model vs. iteration, comparing random search, GP, ABLR, ABLR train loss (2), ABLR cost (2), ABLR cost + train loss (3), and ABLR cost + epochwise train loss (21): transfer learning across LIBSVM data sets.]
33 Conclusion
Bayesian optimisation automates machine learning:
- Algorithm tuning
- Model tuning
- Pipeline tuning
Bayesian optimisation is a model-based approach that can leverage auxiliary information:
- It can exploit dependency structures [JAGS17]
- It can be extended to warm-start HPO jobs [PJSA17]
- It is a key building block of meta-learning
Effective representations for HPO transfer learning can be learned.
37 References I
[BB12] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13:281-305, 2012.
[Bis06] C. M. Bishop. Pattern Recognition and Machine Learning. Springer New York, 2006.
[JAGS17] Rodolphe Jenatton, Cédric Archambeau, Javier González, and Matthias Seeger. Bayesian optimization with tree-structured dependencies. In Proceedings of the 34th International Conference on Machine Learning (ICML), volume 70 of Proceedings of Machine Learning Research, Sydney, Australia, August 2017. PMLR.
[JSW98] Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455-492, 1998.
38 References II
[KFB+16] Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, and Frank Hutter. Fast Bayesian optimization of machine learning hyperparameters on large datasets. Technical report, arXiv preprint, 2016.
[KO11] Andreas Krause and Cheng S. Ong. Contextual Gaussian process bandit optimization. In Advances in Neural Information Processing Systems (NIPS), 2011.
[LBBH98] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[MTZ78] Jonas Mockus, Vytautas Tiesis, and Antanas Zilinskas. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2:117-129, 1978.
[PJSA17] V. Perrone, R. Jenatton, M. Seeger, and C. Archambeau. Multiple adaptive Bayesian linear regression for scalable Bayesian optimization with warm start. arXiv e-prints, December 2017.
39 References III
[RR+07] Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems (NIPS) 20, 2007.
[SHDL17] Matthias Seeger, Asmus Hetzel, Zhenwen Dai, and Neil D. Lawrence. Auto-differentiating linear algebra. Technical report, arXiv preprint, 2017.
[SKFH16] Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, and Frank Hutter. Bayesian optimization with robust Bayesian neural networks. In Advances in Neural Information Processing Systems (NIPS), 2016.
[SLA12] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems (NIPS), 2012.
40 References IV
[SRS+15] Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Mostofa Patwary, Prabhat, and Ryan Adams. Scalable Bayesian optimization using deep neural networks. In Proceedings of the International Conference on Machine Learning (ICML), 2015.
[SSW+16] Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148-175, 2016.
[VVRBT14] Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. OpenML: networked science in machine learning. ACM SIGKDD Explorations Newsletter, 15(2):49-60, 2014.