Gaussian Process Optimization with Mutual Information



1 Gaussian Process Optimization with Mutual Information

Emile Contal (1), Vianney Perchet (2), Nicolas Vayatis (1)
(1) CMLA, Ecole Normale Supérieure de Cachan & CNRS, France
(2) LPMA, Université Paris Diderot & CNRS, France
June 22, 2014

2 Problem Statement

Sequential Optimization
Let $f : \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subset \mathbb{R}^d$ is compact and convex. We consider the problem of finding the maximum of $f$, denoted by $f(x^\star) = \max_{x \in \mathcal{X}} f(x)$, via sequential queries $f(x_1), f(x_2), \dots$

Noisy Observations
At iteration $T$ we choose $x_{T+1}$ using the previous noisy observations $Y_T = \{y_1, \dots, y_T\}$, where for all $t \le T$: $y_t = f(x_t) + \epsilon_t$ and $\epsilon_t \overset{\text{iid}}{\sim} \mathcal{N}(0, \eta^2)$.

Contal, Perchet, Vayatis Gaussian Process Optimization with Mutual Information 2

3 Gaussian Processes

Definition
$f \sim GP(m, k)$, with mean function $m : \mathcal{X} \to \mathbb{R}$ and kernel function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+$, when for all $x_1, \dots, x_n \in \mathcal{X}$ we have:
$\bigl(f(x_1), \dots, f(x_n)\bigr) \sim \mathcal{N}\bigl([m(x_i)]_i, [k(x_i, x_j)]_{i,j}\bigr)$.

Bayesian Inference
Given $Y_T$, the posterior distribution $\Pr[f \mid Y_T]$ is a GP with mean $\mu_{T+1}$ (prediction) and variance $\sigma^2_{T+1}$ (uncertainty), both computed by Bayesian inference.
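The posterior mean and variance mentioned above have a closed form under Gaussian noise. The following is a minimal sketch, assuming a zero prior mean and a squared-exponential kernel with unit prior variance; the names `rbf_kernel` and `gp_posterior` and the `lengthscale` parameter are illustrative and do not come from the slides.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(X_obs, y_obs, X_query, noise_var, kernel=rbf_kernel):
    """Posterior mean and variance of a zero-mean GP at the rows of X_query.

    X_obs: (T, d) queried points, y_obs: (T,) noisy observations,
    noise_var: the noise variance eta^2.
    """
    K = kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_s = kernel(X_obs, X_query)                 # shape (T, n_query)
    L = np.linalg.cholesky(K)                    # K = L L^T
    weights = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = K_s.T @ weights                         # k_*^T K^{-1} y
    v = np.linalg.solve(L, K_s)
    var = np.diag(kernel(X_query, X_query)) - np.sum(v**2, axis=0)
    return mu, np.maximum(var, 0.0)              # clip tiny negative round-off
```

With a very small noise variance the posterior mean interpolates the observations and the variance at the observed points collapses towards zero, while far from the data the posterior reverts to the prior.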

4 Gaussian Processes

[Figure]

5 Objective

Cumulative Regret
The efficiency of a policy is measured via the cumulative regret:
$R_T = \sum_{t=1}^{T} \bigl(f(x^\star) - f(x_t)\bigr)$.

Goal
The cumulative regret is unknown in practice. Our aim is to obtain upper bounds on $R_T$ that hold with high probability.
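On a synthetic benchmark where the maximum is known, the cumulative regret can be computed directly from the noiseless values at the queried points. A minimal sketch follows; the function names are illustrative, and `f_max` is assumed to be known, which is only the case in simulations.

```python
import numpy as np

def cumulative_regret(f_values, f_max):
    """R_T for T = 1..len(f_values): running sum of f_max - f(x_t)."""
    return np.cumsum(f_max - np.asarray(f_values, dtype=float))

def average_regret(f_values, f_max):
    """R_T / T for T = 1..len(f_values)."""
    R = cumulative_regret(f_values, f_max)
    return R / np.arange(1, len(R) + 1)
```

A policy that eventually queries only maximizers has a cumulative regret that stops growing, so its average regret tends to zero.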

6 Related Work

Dani et al. 2008
In the general linear optimization problem, the cumulative regret has a lower bound of $\Omega(d\sqrt{T})$.

Srinivas et al. 2012
For the GP-UCB algorithm with a linear GP, the cumulative regret has an upper bound of $O(d\sqrt{T})$ with high probability.

7 Mutual Information: An Important Ingredient

Information Gain
The information gain on $f$ at $X_T$ is the mutual information between $f$ and $Y_T$. For a GP distribution, with $K_T$ the kernel matrix of $X_T$:
$I_T(X_T) = \frac{1}{2} \log\det(I + \eta^{-2} K_T)$.
We define $\gamma_T = \max_{|X| = T} I_T(X)$, the maximum information gain obtainable from a sequence of $T$ query points.

Empirical Lower Bound
For GPs with bounded variance, we have [Srinivas et al. 2012]:
$\hat\gamma_T = \sum_{t=1}^{T} \sigma_t^2(x_t) \le C\gamma_T$, where $C = \frac{2}{\log(1 + \eta^{-2})}$.
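The information gain formula can be evaluated numerically from a kernel matrix. The sketch below uses `numpy.linalg.slogdet` to avoid overflow in the determinant; the function name `information_gain` is illustrative.

```python
import numpy as np

def information_gain(K, noise_var):
    """0.5 * log det(I + K / noise_var) for a T x T kernel matrix K.

    noise_var is the observation noise variance eta^2.
    """
    T = K.shape[0]
    # I + K / noise_var is positive definite, so the returned sign is +1.
    _sign, logdet = np.linalg.slogdet(np.eye(T) + K / noise_var)
    return 0.5 * logdet
```

For example, two uncorrelated points with unit prior variance and unit noise variance give an information gain of $\log 2$, twice the gain of a single such point.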

8 GP-MI: A Novel Algorithm for Sequential Optimization

$\hat\gamma_0 \leftarrow 0$
for $t = 1, 2, \dots$ do
    Compute $\mu_t$ and $\sigma_t^2$ using Bayesian inference
    $\phi_t(x) \leftarrow \sqrt{\alpha} \bigl( \sqrt{\sigma_t^2(x) + \hat\gamma_{t-1}} - \sqrt{\hat\gamma_{t-1}} \bigr)$
    $x_t \leftarrow \operatorname{argmax}_{x \in \mathcal{X}} \mu_t(x) + \phi_t(x)$
    $\hat\gamma_t \leftarrow \hat\gamma_{t-1} + \sigma_t^2(x_t)$
    Query at $x_t$ and observe $y_t$
end
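The loop above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the search space is replaced by a finite grid, the prior is a zero-mean GP with a squared-exponential kernel of lengthscale 0.2, and the helper names (`rbf_kernel`, `posterior`, `gp_mi`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel (assumed here; the slides do not fix a kernel)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def posterior(K_grid, idx, y, noise_var):
    """Posterior mean and variance on the grid, given observations y at grid indices idx."""
    n = K_grid.shape[0]
    if len(idx) == 0:
        return np.zeros(n), np.diag(K_grid).copy()
    K_oo = K_grid[np.ix_(idx, idx)] + noise_var * np.eye(len(idx))
    K_og = K_grid[idx, :]
    # Solve for K_oo^{-1} y and K_oo^{-1} K_og in one call.
    sol = np.linalg.solve(K_oo, np.column_stack([np.asarray(y), K_og]))
    mu = K_og.T @ sol[:, 0]
    var = np.diag(K_grid) - np.sum(K_og * sol[:, 1:], axis=0)
    return mu, np.maximum(var, 0.0)

def gp_mi(objective, grid, n_iter, noise_var=0.01, delta=0.05, seed=0):
    """Run the GP-MI loop on a finite grid; returns queries, observations, gamma_hat."""
    rng = np.random.default_rng(seed)
    alpha = np.log(2.0 / delta)
    K_grid = rbf_kernel(grid, grid)
    idx, y = [], []
    gamma_hat = 0.0                               # gamma_hat_0 = 0
    for _ in range(n_iter):
        mu, var = posterior(K_grid, idx, y, noise_var)
        phi = np.sqrt(alpha) * (np.sqrt(var + gamma_hat) - np.sqrt(gamma_hat))
        i = int(np.argmax(mu + phi))              # x_t maximizes mu_t + phi_t
        gamma_hat += var[i]                       # gamma_hat_t = gamma_hat_{t-1} + sigma_t^2(x_t)
        idx.append(i)
        y.append(objective(grid[i]) + rng.normal(0.0, np.sqrt(noise_var)))
    return grid[idx], np.array(y), gamma_hat
```

A possible call is `gp_mi(lambda x: -(x[0] - 0.3) ** 2, np.linspace(0, 1, 50)[:, None], n_iter=20)`. Because $\hat\gamma_t$ grows with every query, the exploration term $\phi_t$ shrinks over time even at points whose posterior variance stays the same.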

9 Main Result

Regret bounds for GP-MI
For all $\delta > 0$ and $T > 1$, set $\alpha = \log\frac{2}{\delta}$. The cumulative regret $R_T$ incurred by the GP-MI algorithm on $f$ distributed as a GP perturbed by independent Gaussian noise with variance $\eta^2$ satisfies the following bound:
$\Pr\bigl[ R_T \le 5\sqrt{\alpha C \gamma_T} + 4\sqrt{\alpha} \bigr] \ge 1 - \delta$, where $C = \frac{2}{\log(1 + \eta^{-2})}$.

10 Application to Specific Kernels

For the linear kernel: $R_T = O\bigl(\sqrt{d \log T}\bigr)$
For the RBF kernel: $R_T = O\bigl(\sqrt{(\log T)^{d+1}}\bigr)$
For the Matérn kernel: $R_T = O\bigl(\sqrt{T^a \log T}\bigr)$, where $a = \frac{d(d+1)}{2\nu + d(d+1)} < 1$ and $\nu$ is the Matérn parameter.

11 High-Probability Bounds for $R_T$

Gaussian Martingale
The sequence $M_T = R_T - \sum_{t=1}^{T} \bigl(\mu_t(x^\star) - \mu_t(x_t)\bigr)$ is a Gaussian martingale with respect to $Y_{T-1}$.

Concentration Inequalities [Bercu et al. 2008]
For all $\delta > 0$ and $T > 1$, with $y = 8(C\gamma_T + 1)$ we have:
$\Pr\Bigl[ M_T \le \sqrt{2\alpha y} + \sqrt{\tfrac{2\alpha}{y}} \sum_{t=1}^{T} \sigma_t^2(x^\star) \Bigr] \ge 1 - \delta$,
and therefore:
$\Pr\Bigl[ R_T \le \sqrt{2\alpha y} + \sqrt{\tfrac{2\alpha}{y}} \sum_{t=1}^{T} \sigma_t^2(x^\star) + \sum_{t=1}^{T} \bigl(\mu_t(x^\star) - \mu_t(x_t)\bigr) \Bigr] \ge 1 - \delta$.

12 Regret Bounds for the GP-MI Algorithm

Inequality for the GP-MI Algorithm
Using the function $\phi_t$ as defined in GP-MI, we have:
$\sum_{t=1}^{T} \bigl(\mu_t(x^\star) - \mu_t(x_t)\bigr) \le \sum_{t=1}^{T} \bigl(\phi_t(x_t) - \phi_t(x^\star)\bigr) \le \sqrt{\alpha C \gamma_T} - \sqrt{\tfrac{2\alpha}{y}} \sum_{t=1}^{T} \sigma_t^2(x^\star)$.

Regret bounds
Plugging this inequality into the previous concentration bound for $R_T$:
$\Pr\bigl[ R_T \le 5\sqrt{\alpha C \gamma_T} + 4\sqrt{\alpha} \bigr] \ge 1 - \delta$.
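The constants 5 and 4 in the final bound follow from substituting $y = 8(C\gamma_T + 1)$ and using $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$. A worked version of that arithmetic, valid on the event of probability at least $1 - \delta$ given by the concentration inequality:

```latex
\begin{align*}
R_T &\le \sqrt{2\alpha y}
      + \sqrt{\tfrac{2\alpha}{y}} \sum_{t=1}^{T} \sigma_t^2(x^\star)
      + \sum_{t=1}^{T} \bigl(\mu_t(x^\star) - \mu_t(x_t)\bigr) \\
    &\le \sqrt{2\alpha y} + \sqrt{\alpha C \gamma_T}
      && \text{(the terms in } \sigma_t^2(x^\star) \text{ cancel)} \\
    &= 4\sqrt{\alpha (C\gamma_T + 1)} + \sqrt{\alpha C \gamma_T}
      && \text{(since } 2y = 16(C\gamma_T + 1)\text{)} \\
    &\le 5\sqrt{\alpha C \gamma_T} + 4\sqrt{\alpha}.
\end{align*}
```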

13 Empirical Average Regret

[Figure: empirical average regret $R_T/T$ of EI, UCB, and MI on (a) Generated GP (d = 2), (b) Generated GP (d = 4), (c) Gaussian mixture, (d) Himmelblau.]

14 Empirical Average Regret

[Figure: empirical average regret $R_T/T$ of UCB, EI, and MI on (e) Branin, (f) Goldstein, (g) Tsunamis, (h) Mackey-Glass.]

15 Implementation

Exact inference for Gaussian likelihood
The numerical complexity is $O(T^2)$ using sequential Cholesky updates of the covariance matrix.

Algorithms for non-Gaussian likelihood
For other likelihood functions (e.g. Laplacian or Student's t), one can use the expectation propagation (EP) algorithm or Monte Carlo sampling.
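The sequential Cholesky update can be illustrated as follows: when a new query is added, the factor of the enlarged covariance matrix is obtained from the previous factor by one triangular solve, which costs $O(t^2)$ instead of the $O(t^3)$ of a full refactorization. This is a sketch with hypothetical function names; `k_self` is assumed to already include the noise variance.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L x = b for lower-triangular L in O(t^2) operations."""
    x = np.zeros(len(b))
    for i in range(len(b)):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

def cholesky_append(L, k_new, k_self):
    """Extend the lower Cholesky factor L of a t x t matrix A to the factor of
    [[A, k_new], [k_new^T, k_self]].

    k_new: covariances between the new point and the t previous points.
    k_self: variance of the new observation (kernel value plus noise variance).
    """
    t = L.shape[0]
    row = forward_substitution(L, k_new)
    diag = np.sqrt(k_self - row @ row)   # positive when the enlarged matrix is positive definite
    L_new = np.zeros((t + 1, t + 1))
    L_new[:t, :t] = L
    L_new[t, :t] = row
    L_new[t, t] = diag
    return L_new
```

Starting from an empty `(0, 0)` factor and appending one query per iteration keeps the factorization current throughout the optimization loop.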

16 Open Questions and Discussion

Theoretical guarantees for simple regret
Empirical performance for simple regret
Kernel learning procedure
Calibration of $\delta$
