Talk on Bayesian Optimization

Talk on Bayesian Optimization
Jungtaek Kim (jtkim@postech.ac.kr)
Machine Learning Group, Department of Computer Science and Engineering, POSTECH, 77 Cheongam-ro, Nam-gu, Pohang-si 37673, Gyungsangbuk-do, Republic of Korea
Jan 13, 2016

Table of Contents
- Bayesian Optimization
  - Bayesian Optimization for Expensive Black-box Function
  - Algorithm of Bayesian Optimization
  - Supplement: Gaussian Process
  - Supplement: Gaussian Process Regression
- Acquisition Functions for Bayesian Optimization
  - Traditional Acquisition Functions
  - Probability of Improvement
  - Expected Improvement
  - GP-Upper Confidence Bound
- Reference

Bayesian Optimization

Bayesian Optimization for Expensive Black-box Function

Bayesian optimization is a powerful strategy for finding the extrema of objective functions that are expensive to evaluate, where one does not have a closed-form expression for the objective function, but where one can obtain observations at sampled values.

The prior represents our belief about the space of possible objective functions. The posterior distribution is

$P(f \mid D_{1:t}) \propto P(D_{1:t} \mid f) P(f)$,

where $D_{1:t} = \{x_{1:t}, f(x_{1:t})\}$, $f(x_i)$ is the observation of the objective function at $x_i$, $P(f)$ is the prior distribution, and $P(D_{1:t} \mid f)$ is the likelihood. [Brochu et al., 2009]

Algorithm of Bayesian Optimization

Algorithm 1 Bayesian Optimization
Require: Initial data $D_{1:I} = \{(x_i, y_i)\}_{i=1}^{I}$.
1: for $t = 1, 2, \dots$ do
2:   Predict a function $f(x \mid D_{1:I+t-1})$, considered as the objective function.
3:   Find $x_{I+t}$ by optimizing the acquisition function: $x_{I+t} = \arg\max_x a(x \mid D_{1:I+t-1})$.
4:   Sample the objective function: $y_{I+t} = f(x_{I+t}) + \epsilon_{I+t}$.
5:   Update $D_{1:I+t} = \{D_{1:I+t-1}, (x_{I+t}, y_{I+t})\}$.
6: end for
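To make the loop concrete, here is a minimal Python sketch of Algorithm 1 on a one-dimensional domain. It is not from the talk: the model-fitting and acquisition helpers are supplied by the caller (for instance the GP-regression and EI sketches in the supplements below), and a dense grid stands in for the inner optimization of step 3.

```python
import numpy as np

def bayesian_optimization(objective, fit_model, acquisition, bounds,
                          X_init, y_init, n_iter=20):
    """Run Algorithm 1 on a 1-D domain [lo, hi].

    fit_model(X, y, grid) -> (mu, sigma) on the grid; acquisition(mu, sigma,
    f_best) -> scores. Both are assumed helpers, not part of the talk.
    """
    X, y = list(X_init), list(y_init)          # D_{1:I}
    lo, hi = bounds
    grid = np.linspace(lo, hi, 1000)           # grid stand-in for step 3's arg max
    for _ in range(n_iter):
        mu, sigma = fit_model(np.array(X), np.array(y), grid)        # step 2
        x_next = grid[np.argmax(acquisition(mu, sigma, np.max(y)))]  # step 3
        y_next = objective(x_next)             # step 4: y = f(x) + noise
        X.append(float(x_next))                # step 5: update D_{1:I+t}
        y.append(float(y_next))
    best = int(np.argmax(y))
    return X[best], y[best]
```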

Supplement: Gaussian Process

A Gaussian process (GP) is a collection of random variables, any finite number of which have a joint Gaussian distribution. Generally, a GP is expressed as

$f \sim \mathcal{GP}(m(x), k(x, x'))$,

where

$m(x) = \mathbb{E}[f(x)]$,
$k(x, x') = \mathbb{E}[(f(x) - m(x))(f(x') - m(x'))]$.

[Rasmussen and Williams, 2006]
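As a small illustration of this definition (my own, not from the slides), the following sketch draws sample functions from a zero-mean GP prior with the squared-exponential kernel of the next slide; the helper name `se_kernel` and its default hyperparameters are assumptions.

```python
import numpy as np

def se_kernel(X1, X2, sigma_f=1.0, length=1.0):
    """Squared-exponential covariance: sigma_f^2 exp(-(x - x')^2 / (2 l^2))."""
    sq_dist = (X1[:, None] - X2[None, :]) ** 2
    return sigma_f**2 * np.exp(-0.5 * sq_dist / length**2)

# Any finite set of inputs has a joint Gaussian distribution: f(X) ~ N(0, K(X, X)).
X = np.linspace(-5.0, 5.0, 100)
K = se_kernel(X, X) + 1e-8 * np.eye(len(X))   # small jitter keeps K positive definite
prior_samples = np.random.multivariate_normal(np.zeros(len(X)), K, size=3)
```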

Supplement: Gaussian Process Regression

The squared-exponential covariance function in one dimension has the following form:

$k(x, x') = \sigma_f^2 \exp\left(-\frac{1}{2l^2}(x - x')^2\right) + \sigma_n^2 \delta_{xx'}$,

where $\sigma_f$ is the signal standard deviation, $l$ is the length scale, and $\sigma_n$ is the noise standard deviation. The mean and covariance of the predictive distribution are

$\text{mean} = K(X_{\text{test}}, X_{\text{training}}) (K(X_{\text{training}}, X_{\text{training}}) + \sigma_n^2 I)^{-1} y$,
$\text{covariance} = K(X_{\text{test}}, X_{\text{test}}) - K(X_{\text{test}}, X_{\text{training}}) (K(X_{\text{training}}, X_{\text{training}}) + \sigma_n^2 I)^{-1} K(X_{\text{training}}, X_{\text{test}})$.

[Rasmussen and Williams, 2006]
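These two equations translate directly into Python, reusing the `se_kernel` sketch above. The explicit matrix inverse is kept for readability; in practice a Cholesky factorization is preferred.

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, sigma_f=1.0, length=1.0, sigma_n=0.1):
    """Predictive mean and covariance of GP regression with the SE kernel."""
    K = se_kernel(X_train, X_train, sigma_f, length) + sigma_n**2 * np.eye(len(X_train))
    K_s = se_kernel(X_test, X_train, sigma_f, length)   # K(X_test, X_training)
    K_ss = se_kernel(X_test, X_test, sigma_f, length)   # K(X_test, X_test)
    K_inv = np.linalg.inv(K)                            # prefer Cholesky in practice
    mean = K_s @ K_inv @ y_train
    cov = K_ss - K_s @ K_inv @ K_s.T
    return mean, cov
```

The pointwise standard deviation $\sigma(x)$ used by the acquisition functions below is the square root of the diagonal of this predictive covariance.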

Acquisition Functions for Bayesian Optimization

An acquisition function selects the next point at which to evaluate the expensive black-box function. Traditionally, the Probability of Improvement (PI) [Kushner, 1964], the Expected Improvement (EI) [Mockus et al., 1978], and the GP-Upper Confidence Bound (GP-UCB) [Srinivas et al., 2010] are used for Bayesian optimization. Several functions, such as Predictive Entropy Search (PES) [Hernandez-Lobato et al., 2014] and combinations of existing functions, have been suggested recently.

Traditional Acquisition Functions

PI:
$a_{\text{PI}}(x; \{x_n, y_n\}, \theta) = \Phi(Z)$

EI:
$a_{\text{EI}}(x; \{x_n, y_n\}, \theta) = \begin{cases} (\mu(x) - f(x^+)) \Phi(Z) + \sigma(x) \phi(Z) & \text{if } \sigma(x) > 0 \\ 0 & \text{if } \sigma(x) = 0 \end{cases}$

GP-UCB:
$a_{\text{UCB}}(x; \{x_n, y_n\}, \theta) = \mu(x; \{x_n, y_n\}, \theta) + \beta \sigma(x; \{x_n, y_n\}, \theta)$

where
$Z = \begin{cases} \frac{\mu(x) - f(x^+)}{\sigma(x)} & \text{if } \sigma(x) > 0 \\ 0 & \text{if } \sigma(x) = 0 \end{cases}$

Probability of Improvement

PI is given, with a trade-off parameter $\xi \geq 0$, by

$a_{\text{PI}}(x) = \mathbb{E}[I^0] = P(f(x) \geq f(x^+) + \xi) = \Phi\left(\frac{\mu(x) - f(x^+) - \xi}{\sigma(x)}\right)$,

where $I = \max\{0, \mu(x) - f(x^+) - \xi\}$.
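A direct numerical reading of this formula (a sketch; the function name and the assumption that $\sigma(x) > 0$ at the evaluated points are mine):

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, f_best, xi=0.01):
    """a_PI(x) = Phi((mu(x) - f(x+) - xi) / sigma(x)); assumes sigma > 0."""
    return norm.cdf((np.asarray(mu) - f_best - xi) / np.asarray(sigma))
```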

Example of PI

Figure 1: The objective function is red and the acquisition function is blue. The green point is the last acquired point, and the x points are training data.

Exploration-exploitation trade-off of PI

Expected Improvement

EI is given, with a trade-off parameter $\xi \geq 0$, by

$a_{\text{EI}}(x) = \mathbb{E}[I^1] = \begin{cases} (\mu(x) - f(x^+) - \xi) \Phi(Z) + \sigma(x) \phi(Z) & \text{if } \sigma(x) > 0 \\ 0 & \text{if } \sigma(x) = 0, \end{cases}$

where $I = \max\{0, \mu(x) - f(x^+) - \xi\}$ and

$Z = \begin{cases} \frac{\mu(x) - f(x^+) - \xi}{\sigma(x)} & \text{if } \sigma(x) > 0 \\ 0 & \text{if } \sigma(x) = 0. \end{cases}$
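The case split above, vectorized in Python (a sketch; the function name and the `np.where` handling of $\sigma(x) = 0$ are mine):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """a_EI(x) = (mu - f+ - xi) Phi(Z) + sigma phi(Z) if sigma > 0, else 0."""
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    imp = mu - f_best - xi
    # Guard the division so sigma = 0 yields Z = 0, then mask those points to 0.
    Z = np.where(sigma > 0.0, imp / np.where(sigma > 0.0, sigma, 1.0), 0.0)
    ei = imp * norm.cdf(Z) + sigma * norm.pdf(Z)
    return np.where(sigma > 0.0, ei, 0.0)
```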

Example of EI

Figure 2: The objective function is red and the acquisition function is blue. The green point is the last acquired point, and the x points are training data.

Exploration-exploitation trade-off of EI

GP-Upper Confidence Bound

GP-Upper Confidence Bound is

$a_{\text{UCB}}(x) = \mu(x) + \beta \sigma(x)$,

where $\beta$ is given.
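This one is a single line in Python (a sketch with a fixed $\beta$; Srinivas et al. [6] instead derive a schedule $\beta_t$ that yields regret guarantees):

```python
import numpy as np

def gp_ucb(mu, sigma, beta=1.0):
    """a_UCB(x) = mu(x) + beta * sigma(x); larger beta weights exploration more."""
    return np.asarray(mu) + beta * np.asarray(sigma)
```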

Example of GP-UCB

Figure 3: The objective function is red and the acquisition function is blue. The green point is the last acquired point, and the x points are training data. $\beta$ is 1.0.

Exploration-exploitation trade-off of GP-UCB

Reference

[1] Z. Ghahramani. Probabilistic machine learning and artificial intelligence. Nature, 521:452-459, 2015.
[2] E. Brochu, V. M. Cora, and N. de Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. Technical Report UBC TR-2009-23 and arXiv:1012.2599v1, 2009.
[3] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[4] H. J. Kushner. A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. Journal of Basic Engineering, 86:97-106, 1964.
[5] J. Mockus, V. Tiesis, and A. Zilinskas. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2:117-129, 1978.
[6] N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. ICML, 2010.
[7] J. M. Hernández-Lobato, M. W. Hoffman, and Z. Ghahramani. Predictive entropy search for efficient global optimization of black-box functions. NIPS, 2014.