Natural Evolution Strategies for Direct Search


PGMO-COPI 2014: Recent Advances on Continuous Randomized Black-box Optimization
Thursday, October 30, 2014
Tobias Glasmachers, Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany

Introduction

Function value free black-box optimization:
1. Minimize f : R^d → R.
2. 0-th order (direct search) setting.
3. Black-box model: x → [f] → f(x).
4. Don't ever rely on function values, only on comparisons f(x) < f(x′), e.g., for ranking.

Introduction

Algorithm operations:
1. Generate a solution x ∈ R^d (most of the logic).
2. Evaluate f(x) (black box, most of the computation time).
3. Compare (rank) against other solutions: f(x) < f(x′)?

Introduction: Challenges

Facing a black-box objective, anything can happen. We'd better prepare for the following challenges:
- non-smooth or even discontinuous objective
- multi-modality
- observations of objective values perturbed by noise
- high dimensionality, e.g., d ≈ 1000
- black-box constraints (possibly non-smooth, ...)

Introduction: Application Examples

1. Tuning of parameters of expensive simulations, e.g., in engineering.
2. Tuning of parameters of machine learning algorithms.

Evolution Strategies

    input m ∈ R^d, σ > 0, C (= I) ∈ R^{d×d}
    loop
        sample offspring x_1, ..., x_λ ~ N(m, σ² C)
        evaluate f(x_1), ..., f(x_λ)
        select new population of size µ (e.g., the best offspring)
        update mean vector m
        update global step size σ
        update covariance matrix C
    until stopping criterion met

- randomized
- population-based
- rank-based (function-value free)
- step size control
- metric learning (covariance matrix adaptation, CMA)
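
To make the loop concrete, here is a minimal Python sketch of a (µ,λ) evolution strategy on the sphere function. It only illustrates the structure above: the step size rule is a crude placeholder and the covariance matrix is kept fixed, so this is not CMA-ES.

    # Minimal (mu, lambda) evolution strategy sketch (illustrative only).
    # The step size rule is a crude placeholder; C is not adapted here.
    import numpy as np

    def sphere(x):
        return float(np.dot(x, x))

    def simple_es(f, d=10, lam=20, mu=5, sigma=1.0, iters=200, seed=0):
        rng = np.random.default_rng(seed)
        m = rng.normal(size=d)            # mean of the search distribution
        C = np.eye(d)                     # covariance matrix (fixed in this sketch)
        for _ in range(iters):
            # sample offspring from N(m, sigma^2 C)
            X = rng.multivariate_normal(m, sigma**2 * C, size=lam)
            fx = np.array([f(x) for x in X])
            # rank-based selection: keep the mu best offspring
            best = X[np.argsort(fx)[:mu]]
            m_new = best.mean(axis=0)
            # naive success-based step size heuristic (placeholder, not real step size control)
            sigma *= 1.1 if f(m_new) < f(m) else 0.9
            m = m_new
        return m, sigma

    m, sigma = simple_es(sphere)
    print('f(m) =', sphere(m))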

Evolution Strategies: Design Principles

Why rely on a population?
Not necessary; the (1+1)-ES hillclimber also works, but populations are more robust for noisy and multi-modal problems.

Why online adaptation of the step size σ?
Scale-invariant algorithm ⇒ linear convergence on scale-invariant problems ⇒ locally linear convergence on C² functions.

Why discard function values and keep only ranks?
Invariance under strictly monotonic (rank-preserving) transformations of objective values.

Why CMA?
Efficient optimization of ill-conditioned problems, similar to (quasi-)Newton methods.

This Talk

Given samples and their ranks, how to update the distribution parameters?

Information Geometric Perspective

Change of perspective: the algorithm state is no longer the population x_1, ..., x_µ but the search distribution N(m, C).
Parameters θ = (m, C) ∈ Θ; distribution P_θ with density p_θ.
For multivariate Gaussians: Θ = R^d × P_d, where
    P_d = { M ∈ R^{d×d} : M = Mᵀ, M positive definite }.
This gives a statistical manifold of distributions { P_θ : θ ∈ Θ }, identified with Θ.
It is equipped with an intrinsic (Riemannian) geometry; the metric tensor is given by the Fisher information matrix I(θ).

Information Geometric Perspective

Goal: optimization of θ ∈ Θ instead of x ∈ R^d.
Lift the objective function f : R^d → R to an objective W_f : Θ → R.
Simplest choice:
    W_f(θ) = E_{x~P_θ}[ f(x) ]
More flexible choice:
    W_f(θ) = E_{x~P_θ}[ w(f(x)) ]
with a monotonic weight function w : R → R.

Information Geometric Perspective

The expectation operator adds one degree of smoothness. Hence, under weak assumptions, W_f(θ) can be optimized with gradient descent (GD):
    θ ← θ − η ∇_θ W_f(θ)
Log-likelihood trick:
    ∇_θ W_f(θ) = ∇_θ E_{x~P_θ}[ w(f(x)) ] = E_{x~P_θ}[ ∇_θ log(p_θ(x)) · w(f(x)) ]
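
The trick is a standard score-function identity; assuming p_θ is differentiable in θ and that differentiation and integration may be exchanged, the derivation is one line (in LaTeX):

    \nabla_\theta W_f(\theta)
      = \nabla_\theta \int w(f(x))\, p_\theta(x)\, dx
      = \int w(f(x))\, \nabla_\theta p_\theta(x)\, dx
      = \int w(f(x))\, p_\theta(x)\, \nabla_\theta \log p_\theta(x)\, dx
      = \mathbb{E}_{x \sim P_\theta}\!\left[ \nabla_\theta \log p_\theta(x)\, w(f(x)) \right],

using \nabla_\theta p_\theta(x) = p_\theta(x)\, \nabla_\theta \log p_\theta(x).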

Information Geometric Perspective

Problem: this does not work well. Why?
We have to replace the plain gradient on the (Euclidean) parameter space Θ with the natural gradient, which respects the intrinsic geometry of the statistical manifold:
    ∇̃_θ W_f(θ) = I(θ)⁻¹ ∇_θ W_f(θ)
The natural gradient is invariant under changes of the parameterization θ ↦ P_θ.
The (natural) gradient vector field Θ → TΘ defines the (natural) gradient flow φ^t(θ), tangential to ∇̃_θ W_f(θ).
Optimization: follow the inverse flow curves t ↦ φ^(−t)(θ) from θ into a (local) minimum of W_f.
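
As a concrete special case (a standard textbook computation, added here for illustration): for a one-dimensional Gaussian N(m, σ²) with parameters θ = (m, σ), the Fisher information matrix is diagonal,

    I(\theta) = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 2/\sigma^2 \end{pmatrix},
    \qquad
    \tilde\nabla_\theta W_f(\theta) = I(\theta)^{-1} \nabla_\theta W_f(\theta)
      = \begin{pmatrix} \sigma^2 \, \partial W_f / \partial m \\ (\sigma^2/2) \, \partial W_f / \partial \sigma \end{pmatrix},

so the natural gradient rescales both coordinates by the current step size, which already hints at the scale-invariance discussed earlier.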

Information Geometric Perspective

Black-box setting: the expectation
    E_{x~P_θ}[ ∇_θ log(p_θ(x)) · w(f(x)) ]
is intractable. Monte Carlo (MC) approximation:
    ∇_θ W_f(θ) ≈ G(θ) = (1/λ) Σ_{i=1}^{λ} ∇_θ log(p_θ(x_i)) · w(f(x_i))
for x_1, ..., x_λ ~ P_θ.
The estimator satisfies E[G(θ)] = ∇_θ W_f(θ) (it is unbiased and consistent), hence following G(θ) amounts to stochastic gradient descent.
This yields the MC approximation of the natural gradient, G̃(θ) = I(θ)⁻¹ G(θ).
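
For a Gaussian N(m, C) the score with respect to the mean is ∇_m log p_θ(x) = C⁻¹(x − m), so the mean part of G(θ) can be written down explicitly. A minimal Python sketch (function and variable names are mine, chosen for illustration):

    # MC estimate of the mean part of G(theta) for a Gaussian N(m, C),
    # using the score function grad_m log p(x) = C^{-1} (x - m).
    import numpy as np

    def mc_mean_gradient(f, w, m, C, lam=100, seed=0):
        rng = np.random.default_rng(seed)
        X = rng.multivariate_normal(m, C, size=lam)
        scores = (X - m) @ np.linalg.inv(C)        # row i: grad_m log p(x_i)
        weights = np.array([w(f(x)) for x in X])   # w(f(x_i))
        return (scores * weights[:, None]).mean(axis=0)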

Information Geometric Perspective

Stochastic Natural Gradient Descent (SNGD) update rule:
    θ ← θ − η G̃(θ)
SNGD is a two-fold approximation of the natural gradient flow:
- discretized time: Euler steps,
- randomized gradient: MC sampling.
The natural gradient flow is invariant under the choice of distribution parameters θ ↦ P_θ.
The SNGD algorithm retains this invariance only to first order, due to the Euler steps.

Ollivier et al. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. arXiv:1106.3708, 2011.

Natural (Gradient) Evolution Strategies

Natural (gradient) Evolution Strategies (NES) closely follow this scheme. They were the first ES derived this way.
NES approach: optimization of the expected objective value with Gaussian distributions.
The offspring x_1, ..., x_λ act as the MC sample for the estimation of G(θ).
Closed-form Fisher tensor for N(m, C):
    I_{i,j}(θ) = (∂m/∂θ_i)ᵀ C⁻¹ (∂m/∂θ_j) + ½ tr( C⁻¹ (∂C/∂θ_i) C⁻¹ (∂C/∂θ_j) )
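
For the straightforward parameterization θ = (m, vec(C)), this formula gives a block-diagonal Fisher matrix (a standard identity, stated here in LaTeX for concreteness):

    I(\theta) = \begin{pmatrix} C^{-1} & 0 \\ 0 & \tfrac{1}{2}\, C^{-1} \otimes C^{-1} \end{pmatrix},

with the first block acting on the mean m and the Kronecker-product block acting on vec(C).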

Natural (Gradient) Evolution Strategies

NES algorithm:
    input θ ∈ Θ, λ ∈ N, η > 0
    loop
        sample x_1, ..., x_λ ~ P_θ
        evaluate f(x_1), ..., f(x_λ)
        G(θ) ← (1/λ) Σ_{i=1}^{λ} ∇_θ log(p_θ(x_i)) · f(x_i)
        G̃(θ) ← I(θ)⁻¹ G(θ)
        θ ← θ − η G̃(θ)
    until stopping criterion met

Wierstra et al. Natural Evolution Strategies. CEC, 2008, and JMLR, 2014.
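
A minimal Python sketch of one such iteration for a separable Gaussian N(m, diag(σ)²), where the inverse Fisher matrix reduces to the coordinate-wise factors σ² (for m) and σ²/2 (for σ). Raw f-values are used as in the pseudocode above; the next slide explains why rank-based utilities work better in practice. All names and constants are illustrative.

    # One NES iteration for a separable Gaussian N(m, diag(sigma^2)).
    # Natural gradient = inverse Fisher (sigma^2 for m, sigma^2/2 for sigma)
    # applied to the MC gradient estimate.
    import numpy as np

    def nes_step(f, m, sigma, lam=50, eta=0.01, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        Z = rng.normal(size=(lam, m.size))
        X = m + sigma * Z                      # samples from N(m, diag(sigma^2))
        w = -np.array([f(x) for x in X])       # minimize f = ascend on -f
        # plain gradient via the log-likelihood trick
        g_m = (w[:, None] * Z / sigma).mean(axis=0)            # d log p / dm     = z / sigma
        g_s = (w[:, None] * (Z**2 - 1) / sigma).mean(axis=0)   # d log p / dsigma = (z^2 - 1) / sigma
        # natural gradient update
        m_new = m + eta * sigma**2 * g_m
        sigma_new = np.maximum(sigma + eta * (sigma**2 / 2) * g_s, 1e-12)
        return m_new, sigma_new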

Natural (Gradient) Evolution Strategies

NES works better when replacing f(x_i) in
    G(θ) = (1/λ) Σ_{i=1}^{λ} ∇_θ log(p_θ(x_i)) · f(x_i)
with rank-based utility values w_1, ..., w_λ.
Sample ranks are f-quantile estimators, hence the utility values can be represented as w(f(x_i)) with a special weight function w = w_θ based on the f-quantiles under the current distribution P_θ.
This turns NES into a function value free algorithm.
Benefits: invariance under monotonic transformations of objective values, and linear convergence on scale-invariant problems.
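
One common choice (used in the NES literature; shown here as an illustrative Python helper) assigns each sample a fixed weight depending only on its rank, so the raw f-values never enter the update:

    import numpy as np

    def rank_utilities(lam):
        # utility values w_1 >= ... >= w_lam for ranks 1 (best) to lam (worst),
        # shifted so that they sum to zero
        ranks = np.arange(1, lam + 1)
        u = np.maximum(0.0, np.log(lam / 2 + 1) - np.log(ranks))
        return u / u.sum() - 1.0 / lam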

Exponential Natural Evolution Strategies

xNES (exponential NES) algorithm:
    input (m, A) ∈ Θ, λ ∈ N, η_m, η_A > 0
    loop
        sample z_1, ..., z_λ ~ N(0, I)
        transform x_i ← A z_i + m, i = 1, ..., λ
        evaluate f(x_1), ..., f(x_λ)
        G_m(θ) ← (1/λ) Σ_{i=1}^{λ} w(f(x_i)) · z_i
        G_C(θ) ← (1/λ) Σ_{i=1}^{λ} w(f(x_i)) · ½ (z_i z_iᵀ − I)
        m ← m + η_m · A G_m(θ)
        A ← A · exp( ½ η_A G_C(θ) )
    until stopping criterion met

Glasmachers et al. Exponential Natural Evolution Strategies. GECCO, 2010.
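
A minimal Python sketch of this loop on the sphere function. The utility weights, learning rates, and the symmetric matrix exponential helper are illustrative choices, not the tuned defaults from the xNES paper; the best offspring receives the largest utility, so the mean moves toward better samples.

    # xNES-style sketch following the pseudocode above (illustrative only).
    import numpy as np

    def expm_sym(S):
        # matrix exponential of a symmetric matrix via eigendecomposition
        vals, vecs = np.linalg.eigh(S)
        return (vecs * np.exp(vals)) @ vecs.T

    def xnes(f, d=5, lam=20, eta_m=1.0, eta_A=0.1, iters=500, seed=0):
        rng = np.random.default_rng(seed)
        m, A = rng.normal(size=d), np.eye(d)
        u = np.linspace(1.0, -1.0, lam) / lam      # simple zero-sum rank utilities
        for _ in range(iters):
            Z = rng.normal(size=(lam, d))
            X = m + Z @ A.T                        # x_i = A z_i + m
            order = np.argsort([f(x) for x in X])  # best (smallest f) first
            w = np.empty(lam)
            w[order] = u                           # utility by rank
            G_m = (w[:, None] * Z).mean(axis=0)
            G_C = 0.5 * (np.einsum('i,ij,ik->jk', w, Z, Z) / lam - w.mean() * np.eye(d))
            m = m + eta_m * A @ G_m
            A = A @ expm_sym(0.5 * eta_A * G_C)
        return m

    m = xnes(lambda x: float(np.dot(x, x)))
    print('f(m) =', float(np.dot(m, m)))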

Natural (Gradient) Evolution Strategies

The resulting algorithm is indeed an ES (the R^d perspective) and at the same time an SNGD algorithm (the Θ perspective).
ES traditionally have three distinct and often very different mechanisms:
- optimization: adaptation of m,
- step size control: adaptation of σ,
- metric learning: adaptation of C.
SNGD has only a single mechanism.
Astonishing insight: all (or most) parameters can be updated with a single mechanism.

Exponential Natural Evolution Strategies

Although derived in a completely different way, this algorithm turns out to be closely related to CMA-ES.
Akimoto et al. Bidirectional relation between CMA evolution strategies and natural evolution strategies. PPSN, 2010.
In particular, the update of m is identical, and the rank-µ update of CMA-ES coincides with an NES update.
This is astonishing; it is surely not a coincidence.
And it is insightful: it means that CMA-ES is (essentially) an SNGD algorithm.

Summary

Evolution Strategies (ES) are randomized direct search methods suitable for continuous black-box optimization.
Here we have focused on a core question: how to update the distribution parameters in a principled way?
The parameter space is equipped with the non-Euclidean information geometry of the corresponding statistical manifold of search distributions.
SNGD on a stochastically relaxed problem results in a direct search algorithm: the Natural (gradient) Evolution Strategy (NES).
The SNGD parameter update is by no means restricted to Gaussian distributions. It is a general construction template for update equations of continuous distribution parameters.

Thank you! Questions?