Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration


1 Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration
Emile Contal, David Buffoni, Alexandre Robicquet, Nicolas Vayatis
CMLA, ENS Cachan, France
September 25, 2013

2-4 Motivating example: Sequential optimization
[Figure: curve $f(x)$ with observations $f(x_1), \dots, f(x_4)$ at $x_1, \dots, x_4$; the question is where to place the next query $x_{n+1}$.]

5 Motivating example: Batch optimization
[Figure: the same curve and observations; the question is where to place a batch of three queries $x_{n+1}^{(1)}, x_{n+1}^{(2)}, x_{n+1}^{(3)}$.]

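6 Problem Statement
Setup:
Unknown $f : \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subset \mathbb{R}^d$ is compact and convex.
Find $x^\star = \operatorname{argmax}_{x \in \mathcal{X}} f(x)$.
At iteration $t$, query a batch of $K$ locations $x_t^1, \dots, x_t^K$.
Observe the noisy evaluations of $f$, $y_t^1, \dots, y_t^K \in \mathbb{R}$, with $y_t^k = f(x_t^k) + \epsilon_t^k$ where $\epsilon_t^k \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2)$.
Examples:
Heavy numerical experiment on a cluster with $K$ cores
Sensor placement with $K$ sensors
Laboratory experiment, ...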

7 Bandit setting
Cumulative regret: Exploration / Exploitation
Batch cumulative regret: $R_T^K = \sum_{t=1}^{T} \left( f(x^\star) - \max_{k \le K} f(x_t^k) \right)$
Full cumulative regret: $R_{TK} = \sum_{t=1}^{T} \sum_{k=1}^{K} \left( f(x^\star) - f(x_t^k) \right)$
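The two regret notions can be compared on a small example. The sketch below is illustrative only: the function names and the table of values $f(x_t^k)$ are hypothetical, and it assumes the true maximum $f(x^\star)$ is known, which is only the case in synthetic benchmarks.

```python
def batch_cumulative_regret(f_star, batches):
    """R_T^K: at each iteration only the best point of the batch is charged."""
    return sum(f_star - max(batch) for batch in batches)


def full_cumulative_regret(f_star, batches):
    """R_TK: every one of the T*K queries is charged."""
    return sum(f_star - value for batch in batches for value in batch)


# Hypothetical noise-free values f(x_t^k) for T = 3 iterations of K = 2 queries,
# with f(x*) = 1.0.
values = [[0.25, 0.5], [0.75, 0.5], [1.0, 0.75]]
batch_regret = batch_cumulative_regret(1.0, values)
full_regret = full_cumulative_regret(1.0, values)
print(batch_regret, full_regret)
```

Since the best point of a batch is at least as good as the batch average, $R_T^K \le R_{TK} / K$ always holds.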

8 Gaussian Processes
Definition:
$f \sim GP(m, k)$, with mean function $m : \mathcal{X} \to \mathbb{R}$ and kernel function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+$, when for all $x_1, \dots, x_n$ the values $(f(x_1), \dots, f(x_n))$ are distributed as a multivariate Gaussian with mean and covariance given by $m$ and $k$.
Bayesian Inference (Rasmussen and Williams, 2005):
At iteration $t$, with observations $Y_{X_t}$ at $X_t = \{x_1^1, \dots, x_t^K\}$, the posterior distribution $\Pr[f \mid Y_{X_t}]$ is a Gaussian process with mean and covariance given by Bayesian inference.
At each point $x \in \mathcal{X}$, we can compute a prediction $\mu_{t+1}(x)$ and its uncertainty $\sigma_{t+1}^2(x)$.
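As a rough illustration of this inference step, the sketch below computes the posterior mean and variance at one point for a zero-mean prior, using the standard formulas $\mu(x) = k_x^\top (K + \sigma^2 I)^{-1} y$ and $\sigma^2(x) = k(x, x) - k_x^\top (K + \sigma^2 I)^{-1} k_x$. The RBF kernel, its length scale, the noise variance, the helper names and the data are all assumptions made for the example; it is written in plain Python for self-containment, not for speed.

```python
import math


def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel on scalars, with k(x, x) = 1 (assumed choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def gp_posterior(x_obs, y_obs, x_new, noise_var=1e-2):
    """Posterior mean and variance of f(x_new) under a zero-mean GP prior."""
    n = len(x_obs)
    gram = [[rbf(x_obs[i], x_obs[j]) + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    k_new = [rbf(x, x_new) for x in x_obs]
    mean = sum(k * a for k, a in zip(k_new, solve(gram, y_obs)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_new, solve(gram, k_new)))
    return mean, max(var, 0.0)  # clip tiny negative values due to rounding


# Four hypothetical observations, then a query near the data and one far away.
x_obs = [0.1, 0.4, 0.6, 0.9]
y_obs = [0.2, 0.8, 0.5, -0.1]
mean_near, var_near = gp_posterior(x_obs, y_obs, 0.4)
mean_far, var_far = gp_posterior(x_obs, y_obs, 2.0)
print(var_near, var_far)
```

The variance is small next to an observation and returns to the prior value $k(x, x) = 1$ far from the data, which is the uncertainty the exploration step relies on.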

9 Example: Bayesian inference with 4 observations

10 Upper and Lower Confidence Bounds
Definition:
Fix $0 < \delta < 1$,
$f_t^+(x) = \mu_t(x) + \sqrt{\beta_t \sigma_t^2(x)}$
$f_t^-(x) = \mu_t(x) - \sqrt{\beta_t \sigma_t^2(x)}$
with $\beta_t = O(\log(t / \delta))$ defined in Srinivas et al. (2012).
Property:
$\forall x \in \mathcal{X}, \forall t \ge 1$, $f(x) \in [f_t^-(x), f_t^+(x)]$ holds with probability at least $1 - \delta$.

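11 Relevant Region
Definition:
The Relevant Region $R_t$ and the extended region $R_t^+$ are defined by
$y_t = \max_{x \in \mathcal{X}} f_t^-(x)$
$R_t = \{ x \in \mathcal{X} \mid f_t^+(x) \ge y_t \}$
$R_t^+ = \{ x \in \mathcal{X} \mid \mu_t(x) + 2 \sqrt{\beta_{t+1} \sigma_t^2(x)} \ge y_t \}$
Property:
$\operatorname{argmax}_{x \in \mathcal{X}} f_{t+1}^+(x) \in R_t^+$ with high probability.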
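On a finite grid the two regions reduce to simple filters. The sketch below assumes the confidence width is $\sqrt{\beta_t}\,\sigma_t(x)$, as in standard GP-UCB, and that the search space has been discretized; the function name, the grid values and the choice $\beta_t = \beta_{t+1} = 4$ are hypothetical.

```python
import math


def relevant_regions(mu, sigma, beta_t, beta_next):
    """Return the grid indices of R_t and of the extended region R_t^+.

    mu[i] and sigma[i] are the posterior mean and standard deviation at grid point i.
    """
    width = math.sqrt(beta_t)
    upper = [m + width * s for m, s in zip(mu, sigma)]  # f_t^+
    lower = [m - width * s for m, s in zip(mu, sigma)]  # f_t^-
    y_t = max(lower)  # best lower confidence bound over the grid
    region = [i for i, u in enumerate(upper) if u >= y_t]
    extended = [i for i, (m, s) in enumerate(zip(mu, sigma))
                if m + 2.0 * math.sqrt(beta_next) * s >= y_t]
    return region, extended


# Hypothetical posterior on a four-point grid.
mu = [0.0, 0.5, 1.0, 0.25]
sigma = [0.25, 0.25, 0.125, 0.5]
region, extended = relevant_regions(mu, sigma, beta_t=4.0, beta_next=4.0)
print(region, extended)
```

In this example the first grid point is discarded from $R_t$ because its upper bound falls below the best lower bound, but the doubled width keeps it in $R_t^+$.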

12 UCB and Pure Exploration
UCB policy:
$x_t^1 \leftarrow \operatorname{argmax}_{x \in R_t^+} f_t^+(x)$
Pure Exploration policy:
For $2 \le k \le K$, $x_t^k \leftarrow \operatorname{argmax}_{x \in R_t^+} \sigma_t^{(k)}(x)$,
where $\sigma_t^{(k)}(x)$ is the updated deviation after having selected $x_t^1, \dots, x_t^{k-1}$.

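13 GP-UCB-PE
Algorithm 1: GP-UCB-PE
for $t = 1, 2, \dots$ do
    Compute $\mu_t$ and $\sigma_t^2$ with Bayesian inference on $y_1^1, \dots, y_{t-1}^K$
    Compute $R_t^+$
    $x_t^1 \leftarrow \operatorname{argmax}_{x \in R_t^+} f_t^+(x)$
    for $k = 2, \dots, K$ do
        Update $\sigma_t^{(k)}$
        $x_t^k \leftarrow \operatorname{argmax}_{x \in R_t^+} \sigma_t^{(k)}(x)$
    Query $\{x_t^k\}_{1 \le k \le K}$
    Observe $\{y_t^k\}_{1 \le k \le K}$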
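The loop can be sketched on a one-dimensional grid as follows. This is not the authors' MATLAB implementation: it assumes a discretized search space, an RBF kernel with an arbitrary length scale, and the finite-domain schedule $\beta_t = 2 \log(|D| t^2 \pi^2 / (6\delta))$, and every function name is hypothetical. It uses the fact that the posterior variance does not depend on the observed values, so the Pure Exploration points can be chosen before any of the batch is evaluated.

```python
import math
import random


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel (assumed choice), with k(x, x) = 1."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def condition(mu, cov, i, noise_var, y=None):
    """Condition the grid posterior on one noisy observation at grid index i.

    With y=None only the covariance is updated, which is all the Pure
    Exploration step needs.
    """
    n = len(mu)
    gain = [cov[j][i] / (cov[i][i] + noise_var) for j in range(n)]
    if y is not None:
        residual = y - mu[i]
        mu = [mu[j] + gain[j] * residual for j in range(n)]
    row_i = cov[i][:]
    cov = [[cov[j][l] - gain[j] * row_i[l] for l in range(n)] for j in range(n)]
    return mu, cov


def gp_ucb_pe(f, grid, T, K, noise_var=1e-2, delta=0.1, seed=0):
    rng = random.Random(seed)
    n = len(grid)
    mu = [0.0] * n                                     # zero prior mean
    cov = [[rbf(a, b) for b in grid] for a in grid]    # prior covariance
    history = []
    for t in range(1, T + 1):
        # Assumed confidence schedule for a finite domain of size n.
        beta = 2.0 * math.log(n * (t * math.pi) ** 2 / (6.0 * delta))
        beta_next = 2.0 * math.log(n * ((t + 1) * math.pi) ** 2 / (6.0 * delta))
        sd = [math.sqrt(max(cov[j][j], 0.0)) for j in range(n)]
        upper = [mu[j] + math.sqrt(beta) * sd[j] for j in range(n)]
        y_low = max(mu[j] - math.sqrt(beta) * sd[j] for j in range(n))
        # Extended relevant region R_t^+.
        region = [j for j in range(n)
                  if mu[j] + 2.0 * math.sqrt(beta_next) * sd[j] >= y_low]
        # First point of the batch: UCB.
        batch = [max(region, key=lambda j: upper[j])]
        # Remaining points: Pure Exploration on the updated variance.
        cov_k = cov
        for _ in range(2, K + 1):
            _, cov_k = condition(mu, cov_k, batch[-1], noise_var)
            batch.append(max(region, key=lambda j: cov_k[j][j]))
        # Query the whole batch, then update the posterior with the observations.
        for j in batch:
            y = f(grid[j]) + rng.gauss(0.0, math.sqrt(noise_var))
            mu, cov = condition(mu, cov, j, noise_var, y)
            history.append((grid[j], y))
    return history


f = lambda x: math.sin(3.0 * x)          # hypothetical objective on [0, 1]
grid = [i / 50 for i in range(51)]
history = gp_ucb_pe(f, grid, T=10, K=3)
best_x = max(history, key=lambda pair: f(pair[0]))[0]
print(best_x, f(best_x))
```

In a real parallel setting the K evaluations in the last inner loop would be dispatched concurrently; they are run one after another here only to keep the sketch short.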

14 Example: GP-UCB-PE
[Figure: GP-UCB-PE illustrated on a one-dimensional example; horizontal axis $x$.]

15 Regret Bounds
Theorem:
With $f \sim GP(0, k)$ and $k(x, x) \le 1$, with high probability:
$R_T^K = O\left( \sqrt{\frac{T}{K} \gamma_{TK} \log T} \right)$ and $R_{TK} = O\left( \sqrt{TK \gamma_{TK} \log T} \right)$
Mutual Information:
$\gamma_{TK}$ is the maximum mutual information about $f$ obtainable by a sequence of $TK$ queries.
For the linear kernel, $\gamma_{TK} = O(d \log TK)$.
For the RBF kernel, $\gamma_{TK} = O\left( (\log TK)^{d+1} \right)$.

16 Corollary
Corollary (Batch vs Sequential):
With $K \ll T$, the improvement of the parallel strategy over the sequential one is $\sqrt{K}$ with respect to $R_T^K$.
Remark (GP-BUCB, Desautels et al. (2012)):
Compared to GP-BUCB, there is no need for an initialization phase. The improvement can be doubly exponential in the dimension $d$.
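The $\sqrt{K}$ factor can be read off by comparing the orders of the two upper bounds. The worked comparison below is a heuristic sketch, not the authors' proof: it assumes the sequential GP-UCB bound of Srinivas et al. (2012) has order $\sqrt{T \gamma_T \log T}$, that the batch bound has order $\sqrt{(T/K) \gamma_{TK} \log T}$, and it uses the RBF growth rate of $\gamma$; the notation $R_T^{\mathrm{seq}}$ is introduced here for the sequential regret.

```latex
\begin{align*}
  R_T^{\mathrm{seq}} &= O\!\left(\sqrt{T\,\gamma_T \log T}\right),
  \qquad
  R_T^K = O\!\left(\sqrt{\tfrac{T}{K}\,\gamma_{TK} \log T}\right), \\[4pt]
  \frac{\sqrt{T\,\gamma_T \log T}}{\sqrt{\tfrac{T}{K}\,\gamma_{TK} \log T}}
  &= \sqrt{K}\,\sqrt{\frac{\gamma_T}{\gamma_{TK}}}, \\[4pt]
  \text{RBF kernel, } K \le T:\quad
  \log(TK) \le 2 \log T
  &\;\Longrightarrow\;
  (\log TK)^{d+1} \le 2^{d+1} (\log T)^{d+1}.
\end{align*}
```

So for $K \ll T$ the batch bound is smaller by $\sqrt{K}$, up to a factor that depends on the dimension $d$ but not on $T$ or $K$.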

17 Experiments
Setup:
Competitors: GP-BUCB (Desautels et al., 2012) and SM-UCB (Azimi et al., 2010)
Assessment: 3 synthetic problems and 3 real applications
[Figure: two of the synthetic problems, (a) Himmelblau and (b) Gaussian mixture.]

18 Results: mean instantaneous batch regret and confidence interval over 64 experiments
[Figure: regret $r_t^K$ against iteration $t$ for GP-BUCB, SM-UCB and GP-UCB-PE; panels (a) Generated GP, (b) Himmelblau, (c) Gaussian mixture, (d) Mackey-Glass, (e) Tsunamis, (f) Abalone.]

19 Conclusion
GP-UCB-PE:
Theoretical upper bounds on the cumulative regret
Efficient in practice
Easy to implement
Implementation:
MATLAB source codes, documentation and data sets are available at

20 References
Azimi, J., Fern, A., and Fern, X. (2010). Batch Bayesian optimization via simulation matching. In Advances in Neural Information Processing Systems 24. Curran Associates, Inc.
Desautels, T., Krause, A., and Burdick, J. (2012). Parallelizing exploration-exploitation tradeoffs with Gaussian process bandit optimization. In Proceedings of the 29th International Conference on Machine Learning. icml.cc / Omnipress.
Rasmussen, C. E. and Williams, C. (2005). Gaussian Processes for Machine Learning. MIT Press.
Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. W. (2012). Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5).
