Natural Evolution Strategies for Direct Search

Natural Evolution Strategies for Direct Search
PGMO-COPI 2014: Recent Advances on Continuous Randomized Black-Box Optimization
Thursday, October 30, 2014
Tobias Glasmachers, Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany

Introduction

Function-value-free black-box optimization:
1 Minimize f : R^d → R.
2 0-th order (direct search) setting.
3 Black-box model: x → f → f(x).
4 Don't ever rely on function values, only on comparisons f(x) < f(x'), e.g., for ranking.

Introduction

Algorithm operations:
1 generate a solution x ∈ R^d (most of the logic)
2 evaluate f(x) (black box, most of the computation time)
3 compare (rank) against other solutions: f(x) < f(x')?
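To make this interface concrete, here is a tiny Python sketch (all names are illustrative, not from the slides): the optimizer may request evaluations of the black box, which dominate the computation time, and compare the returned values, but it never sees gradients or exploits absolute function values.

```python
import numpy as np

# Minimal 0-th order black-box interface: evaluations are counted (they dominate
# the cost), and the optimizer should only use comparisons of f-values.

class BlackBox:
    def __init__(self, f):
        self._f = f
        self.evaluations = 0

    def evaluate(self, x):
        self.evaluations += 1
        return self._f(x)

    @staticmethod
    def better(fx, fy):
        # comparison f(x) < f(x'), the only way f-values should be used
        return fx < fy

box = BlackBox(lambda x: float(np.sum(x ** 2)))   # example objective
fa = box.evaluate(np.array([1.0, 2.0]))
fb = box.evaluate(np.array([0.5, 0.5]))
print(box.better(fb, fa), box.evaluations)        # True, 2
```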

Introduction: Challenges

Facing a black-box objective, anything can happen. We'd better prepare for the following challenges:
non-smooth or even discontinuous objective
multi-modality
observations of objective values perturbed by noise
high dimensionality, e.g., d ≥ 1000
black-box constraints (possibly non-smooth, ...)

Introduction: Application Examples

1 Tuning of parameters of expensive simulations, e.g., in engineering.
2 Tuning of parameters of machine learning algorithms.

Evolution Strategies

input m ∈ R^d, σ > 0, C (= I) ∈ R^{d×d}
loop
    sample offspring x_1, ..., x_λ ∼ N(m, σ²C)
    evaluate f(x_1), ..., f(x_λ)
    select new population of size µ (e.g., best offspring)
    update mean vector m
    update global step size σ
    update covariance matrix C
until stopping criterion met

Key ingredients: randomized; population-based; rank-based (function-value free); step size control; metric learning (covariance matrix adaptation, CMA).
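A deliberately minimal Python instance of this template, assuming a fixed covariance C = I and a crude success-based step-size rule standing in for proper step-size control; all constants and names are illustrative, not the mechanisms of any particular published ES.

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_es(f, m, sigma=1.0, lam=10, mu=3, iters=200):
    """A (mu/mu, lambda)-ES skeleton: sample, evaluate, select, recombine."""
    fm = f(m)
    for _ in range(iters):
        offspring = m + sigma * rng.standard_normal((lam, len(m)))  # sample
        fvals = np.array([f(x) for x in offspring])                 # evaluate
        best = np.argsort(fvals)[:mu]                               # select mu best
        m_new = offspring[best].mean(axis=0)                        # update mean
        f_new = f(m_new)
        sigma *= 1.2 if f_new < fm else 0.8   # crude step-size control (illustrative)
        m, fm = m_new, f_new                  # covariance C stays fixed at I here
    return m, sigma

m, sigma = simple_es(lambda x: float(np.sum(x ** 2)), m=np.array([5.0, 5.0]))
print(m, sigma)   # m should end up near the optimum at the origin
```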

Evolution Strategies: Design Principles

Why rely on a population? Not necessary: a (1+1)-ES hillclimber also works, but populations are more robust on noisy and multi-modal problems.

Why online adaptation of the step size σ? It makes the algorithm scale-invariant, which yields linear convergence on scale-invariant problems and locally linear convergence on C² functions.

Why discard function values and keep only ranks? Invariance to strictly monotonic (rank-preserving) transformations of objective values.

Why CMA? Efficient optimization of ill-conditioned problems, similar to (quasi-)Newton methods.

This Talk

Given samples and their ranks, how to update the distribution parameters?

Information Geometric Perspective

Change of perspective: the algorithm state is no longer the population x_1, ..., x_µ but the distribution N(m, C).
Parameters θ = (m, C) ∈ Θ, distribution P_θ, pdf p_θ.
For multi-variate Gaussians: Θ = R^d × P_d, where P_d = { M ∈ R^{d×d} | M = M^T, M positive definite }.
Statistical manifold of distributions { P_θ | θ ∈ Θ } ≅ Θ.
It is equipped with an intrinsic (Riemannian) geometry; the metric tensor is given by the Fisher information matrix I(θ).

Information Geometric Perspective

Goal: optimization of θ ∈ Θ instead of x ∈ R^d.
Lift the objective function f : R^d → R to an objective W_f : Θ → R.
Simplest choice:
    W_f(θ) = E_{x∼P_θ}[f(x)]
More flexible choice:
    W_f(θ) = E_{x∼P_θ}[w(f(x))]
with a monotonic weight function w : R → R.

Information Geometric Perspective

The expectation operator adds one degree of smoothness. Hence, under weak assumptions, W_f(θ) can be optimized with gradient descent (GD):
    θ ← θ − η ∇_θ W_f(θ)
Log-likelihood trick:
    ∇_θ W_f(θ) = ∇_θ E_{x∼P_θ}[w(f(x))] = E_{x∼P_θ}[∇_θ log(p_θ(x)) · w(f(x))]
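A small numpy sketch of the log-likelihood trick, assuming the simplest choice W_f(θ) = E[f(x)] and a Gaussian P_θ = N(m, σ²I) with σ held fixed, so that ∇_m log p_θ(x) = (x − m)/σ²; the objective and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return float(np.sum(x ** 2))       # example objective (sphere)

def grad_Wf_wrt_mean(m, sigma, lam=5000):
    """MC estimate of grad_m E_{x~N(m, sigma^2 I)}[f(x)] via the log-likelihood
    trick: average of grad_m log p(x_i) * f(x_i), with grad_m log p = (x - m)/sigma^2."""
    xs = m + sigma * rng.standard_normal((lam, len(m)))
    scores = (xs - m) / sigma ** 2                       # score functions
    fvals = np.array([f(x) for x in xs])
    return (scores * fvals[:, None]).mean(axis=0)

m = np.array([2.0, -1.0])
print(grad_Wf_wrt_mean(m, sigma=0.3))
# analytically grad_m E[f] = 2*m = [4, -2]; gradient descent on W_f drives m to 0
```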

Information Geometric Perspective

Problem: this does not work well. Why?
We have to replace the plain gradient on the (Euclidean) parameter space Θ with the natural gradient, which respects the intrinsic geometry of the statistical manifold:
    ∇̃_θ W_f(θ) = I(θ)⁻¹ ∇_θ W_f(θ)
The natural gradient is invariant under changes of the parameterization θ ↦ P_θ.
The (natural) gradient vector field Θ → TΘ defines the (natural) gradient flow φ_t(θ), tangential to ∇̃_θ W_f(θ).
Optimization: follow the inverse flow curves t ↦ φ_t(θ) from θ into a (local) minimum of W_f.

Information Geometric Perspective

Black-box setting: the expectation
    E_{x∼P_θ}[∇_θ log(p_θ(x)) · w(f(x))]
is intractable.
Monte Carlo (MC) approximation:
    ∇_θ W_f(θ) ≈ G(θ) = (1/λ) Σ_{i=1}^{λ} ∇_θ log(p_θ(x_i)) · w(f(x_i))
for x_1, ..., x_λ ∼ P_θ.
The estimator is unbiased, E[G(θ)] = ∇_θ W_f(θ), hence following G(θ) amounts to stochastic gradient descent.
This yields the natural gradient MC approximation G̃(θ) = I(θ)⁻¹ G(θ).
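As a sketch of the difference between G(θ) and G̃(θ), consider a separable Gaussian with parameters θ = (m_i, σ_i) per coordinate; for this parameterization the Fisher matrix is diagonal, with entries 1/σ_i² for the mean and 2/σ_i² for the standard deviation, so the natural gradient is just a per-coordinate rescaling. The weighting w is taken to be the raw f-value here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_gradients(m, sigma, f, lam=200):
    """Plain MC gradient G(theta) and natural gradient I(theta)^{-1} G(theta)
    for a separable Gaussian; here w(f(x)) = f(x) for simplicity."""
    z = rng.standard_normal((lam, len(m)))
    x = m + sigma * z                                 # x_i ~ N(m, diag(sigma^2))
    w = np.array([f(xi) for xi in x])
    # score functions of the Gaussian w.r.t. m and sigma
    G_m = (z / sigma * w[:, None]).mean(axis=0)
    G_s = ((z ** 2 - 1.0) / sigma * w[:, None]).mean(axis=0)
    # multiply by the inverse Fisher: diag(sigma^2) for m, diag(sigma^2 / 2) for sigma
    return (G_m, G_s), (sigma ** 2 * G_m, 0.5 * sigma ** 2 * G_s)

m, sigma = np.array([1.0, -2.0]), np.array([0.5, 0.5])
plain, natural = mc_gradients(m, sigma, f=lambda x: float(np.sum(x ** 2)))
print("plain:", plain, "\nnatural:", natural)
```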

Information Geometric Perspective

Stochastic Natural Gradient Descent (SNGD) update rule:
    θ ← θ − η G̃(θ)
SNGD is a two-fold approximation of the gradient flow: discretized time (Euler steps) and a randomized gradient (MC sampling).
The natural gradient flow is invariant under the choice of distribution parameterization θ ↦ P_θ.
The SNGD algorithm is invariant only in first-order approximation, due to the Euler steps.
Ollivier et al. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. arXiv:1106.3708, 2011.

Natural (Gradient) Evolution Strategies

Natural (gradient) Evolution Strategies (NES) closely follow this scheme. They were the first ES derived this way.
NES approach: optimization of the expected objective value over Gaussian search distributions.
The offspring x_1, ..., x_λ serve as the MC sample for estimating G(θ).
Closed-form Fisher information matrix for N(m, C):
    I_{i,j}(θ) = (∂m/∂θ_i)^T C⁻¹ (∂m/∂θ_j) + ½ tr(C⁻¹ (∂C/∂θ_i) C⁻¹ (∂C/∂θ_j))
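As a quick sanity check of the closed form, the sketch below evaluates it for a one-dimensional Gaussian parameterized by θ = (m, σ), where C = σ², ∂m/∂θ = (1, 0) and ∂C/∂θ = (0, 2σ); the well-known result I(θ) = diag(1/σ², 2/σ²) drops out.

```python
import numpy as np

def gaussian_fisher_1d(sigma):
    """Fisher matrix for a 1-D Gaussian with theta = (m, sigma), via
    I_ij = dm_i C^{-1} dm_j + 0.5 * tr(C^{-1} dC_i C^{-1} dC_j)."""
    C = sigma ** 2
    dm = [1.0, 0.0]           # dm/dm, dm/dsigma
    dC = [0.0, 2.0 * sigma]   # dC/dm, dC/dsigma
    I = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            I[i, j] = dm[i] * (1.0 / C) * dm[j] + 0.5 * (dC[i] / C) * (dC[j] / C)
    return I

print(gaussian_fisher_1d(0.5))   # expected diag(1/sigma^2, 2/sigma^2) = diag(4, 8)
```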

Natural (Gradient) Evolution Strategies

NES algorithm
input θ ∈ Θ, λ ∈ N, η > 0
loop
    sample x_1, ..., x_λ ∼ P_θ
    evaluate f(x_1), ..., f(x_λ)
    G(θ) ← (1/λ) Σ_{i=1}^{λ} ∇_θ log(p_θ(x_i)) · f(x_i)
    G̃(θ) ← I(θ)⁻¹ G(θ)
    θ ← θ − η G̃(θ)
until stopping criterion met

Wierstra et al. Natural Evolution Strategies. CEC, 2008 and JMLR, 2014.

Natural (Gradient) Evolution Strategies

NES works better when replacing f(x_i) in
    G(θ) = (1/λ) Σ_{i=1}^{λ} ∇_θ log(p_θ(x_i)) · f(x_i)
with rank-based utility values w_1, ..., w_λ.
Sample ranks are f-quantile estimators, hence the utility values can be represented as w(f(x_i)) with a special weight function w = w_θ based on f-quantiles under the current distribution P_θ.
This turns NES into a function-value-free algorithm.
Benefits: invariance under monotonic transformations of objective values, and linear convergence on scale-invariant problems.
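One common way to realize such rank-based utilities is shown below; the specific truncated log-linear shaping is a popular choice in the NES/CMA-ES literature, not mandated by the slides, and fitness values enter only through their ranks.

```python
import numpy as np

def rank_based_utilities(fvals):
    """Utility values w_1, ..., w_lambda that depend only on the ranks of the
    offspring (rank 0 = best for minimization); they sum to zero."""
    lam = len(fvals)
    ranks = np.argsort(np.argsort(fvals))
    raw = np.maximum(0.0, np.log(lam / 2 + 1) - np.log(ranks + 1))
    return raw / raw.sum() - 1.0 / lam

print(rank_based_utilities([3.2, 0.1, 7.5, 1.8]))
# the best sample gets the largest positive weight, the worst ones negative weights
```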

Exponential Natural Evolution Strategies

xNES (exponential NES) algorithm
input (m, A) ∈ Θ, λ ∈ N, η_m, η_A > 0
loop
    sample z_1, ..., z_λ ∼ N(0, I)
    transform x_i ← A z_i + m for i = 1, ..., λ
    evaluate f(x_1), ..., f(x_λ)
    G_m(θ) ← (1/λ) Σ_{i=1}^{λ} w(f(x_i)) · z_i
    G_C(θ) ← (1/λ) Σ_{i=1}^{λ} w(f(x_i)) · ½ (z_i z_i^T − I)
    m ← m − η_m · A G_m(θ)
    A ← A · exp(−½ η_A G_C(θ))
until stopping criterion met

Glasmachers et al. Exponential Natural Evolution Strategies. GECCO, 2010.
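Below is a compact, self-contained Python sketch following the structure of this pseudocode on the sphere function. It uses rank-based utilities with the common larger-is-better convention, so the mean and covariance-factor updates appear with a plus sign; learning rates, the utility scheme, and all other constants are illustrative defaults rather than values prescribed by the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def xnes(f, m, A, lam=12, eta_m=1.0, eta_A=0.1, iters=300):
    """Minimal xNES-style loop: sample in natural coordinates z, map through A,
    rank, and apply natural-gradient updates to m and the factor A."""
    d = len(m)
    # fixed rank-based utilities, best sample first, summing to zero
    raw = np.maximum(0.0, np.log(lam / 2 + 1) - np.log(np.arange(1, lam + 1)))
    u = raw / raw.sum() - 1.0 / lam
    for _ in range(iters):
        z = rng.standard_normal((lam, d))
        x = z @ A.T + m                               # x_i = A z_i + m
        zs = z[np.argsort([f(xi) for xi in x])]       # sort: best (smallest f) first
        G_m = u @ zs                                  # sum_i u_i z_i
        G_C = sum(ui * (np.outer(zi, zi) - np.eye(d)) for ui, zi in zip(u, zs))
        m = m + eta_m * A @ G_m                       # natural-gradient mean update
        # multiplicative update of A via the matrix exponential of symmetric G_C
        evals, V = np.linalg.eigh(0.5 * eta_A * G_C)
        A = A @ (V * np.exp(evals)) @ V.T
    return m, A

m_opt, _ = xnes(lambda x: float(np.sum(x ** 2)), m=np.array([3.0, -2.0, 1.0]), A=np.eye(3))
print(m_opt)   # should be close to the optimum at the origin
```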

Natural (Gradient) Evolution Strategies

The resulting algorithm is indeed an ES (R^d perspective), and at the same time an SNGD algorithm (Θ perspective).
ES traditionally have three distinct and often very different mechanisms for optimization: adaptation of the mean m, step size control (adaptation of σ), and metric learning (adaptation of C).
SNGD has only a single mechanism.
Astonishing insight: all (or most) parameters can be updated with a single mechanism.

Exponential Natural Evolution Strategies

Although derived in a completely different way, this algorithm turns out to be closely related to CMA-ES.
Akimoto et al. Bidirectional relation between CMA evolution strategies and natural evolution strategies. PPSN 2010.
In particular, the update of m is identical, and the rank-µ update of CMA-ES coincides with a NES update.
This is astonishing; it is surely not a coincidence.
This is insightful: it means that CMA-ES is (essentially) an SNGD algorithm.

Summary

Evolution Strategies (ES) are randomized direct search methods suitable for continuous black-box optimization.
Here we have focused on a core question: how to update the distribution parameters in a principled way?
The parameter space is equipped with the non-Euclidean information geometry of the corresponding statistical manifold of search distributions.
SNGD on a stochastically relaxed problem results in a direct search algorithm: the Natural-gradient Evolution Strategy (NES) algorithm.
The SNGD parameter update is by no means restricted to Gaussian distributions; it is a general construction template for update equations of continuous distribution parameters.

Thank you! Questions?


More information

Stochastic Subgradient Method

Stochastic Subgradient Method Stochastic Subgradient Method Lingjie Weng, Yutian Chen Bren School of Information and Computer Science UC Irvine Subgradient Recall basic inequality for convex differentiable f : f y f x + f x T (y x)

More information

Support Vector Machines and Bayes Regression

Support Vector Machines and Bayes Regression Statistical Techniques in Robotics (16-831, F11) Lecture #14 (Monday ctober 31th) Support Vector Machines and Bayes Regression Lecturer: Drew Bagnell Scribe: Carl Doersch 1 1 Linear SVMs We begin by considering

More information

Condensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C.

Condensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C. Condensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C. Spall John Wiley and Sons, Inc., 2003 Preface... xiii 1. Stochastic Search

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc.

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010

More information

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

Greedy Layer-Wise Training of Deep Networks

Greedy Layer-Wise Training of Deep Networks Greedy Layer-Wise Training of Deep Networks Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle NIPS 2007 Presented by Ahmed Hefny Story so far Deep neural nets are more expressive: Can learn

More information

FALL 2018 MATH 4211/6211 Optimization Homework 4

FALL 2018 MATH 4211/6211 Optimization Homework 4 FALL 2018 MATH 4211/6211 Optimization Homework 4 This homework assignment is open to textbook, reference books, slides, and online resources, excluding any direct solution to the problem (such as solution

More information

Exercises * on Linear Algebra

Exercises * on Linear Algebra Exercises * on Linear Algebra Laurenz Wiskott Institut für Neuroinformatik Ruhr-Universität Bochum, Germany, EU 4 February 7 Contents Vector spaces 4. Definition...............................................

More information

Lecture 1: Supervised Learning

Lecture 1: Supervised Learning Lecture 1: Supervised Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech ISYE6740/CSE6740/CS7641: Computational Data Analysis/Machine from Portland, Learning Oregon: pervised learning (Supervised)

More information

Logistic Regression Logistic

Logistic Regression Logistic Case Study 1: Estimating Click Probabilities L2 Regularization for Logistic Regression Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 10 th,

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information