Benchmarking Natural Evolution Strategies with Adaptation Sampling on the Noiseless and Noisy Black-box Optimization Testbeds

Tom Schaul
Courant Institute of Mathematical Sciences, New York University
Broadway, New York, USA
schaul@cims.nyu.edu

ABSTRACT

Natural Evolution Strategies (NES) are a recent member of the class of real-valued optimization algorithms that are based on adapting search distributions. Exponential NES (xNES) are the most common instantiation of NES, and particularly appropriate for the BBOB benchmarks, given that many of them are non-separable and that their problem dimensions are relatively small. The technique of adaptation sampling, which adapts learning rates online, further improves the algorithm's performance. This report provides the most extensive empirical results on that combination (xNES-as) to date, on both the noise-free and noisy BBOB testbeds.

Categories and Subject Descriptors

G.1.6 [Numerical Analysis]: Optimization: global optimization, unconstrained optimization; F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems

General Terms

Algorithms

Keywords

Evolution Strategies, Natural Gradient, Benchmarking

1. INTRODUCTION

Evolution strategies (ES), in contrast to traditional evolutionary algorithms, aim at repeating the type of mutation that led to good individuals. We can characterize those mutations by an explicitly parameterized search distribution from which new candidate samples are drawn, akin to estimation of distribution algorithms (EDA). Covariance matrix adaptation ES (CMA-ES []) innovated the field by introducing a parameterization that includes the full covariance matrix, allowing them to solve highly non-separable problems.

A more recent variant, natural evolution strategies (NES [, , , ]), aims at a higher level of generality, providing a procedure to update the search distribution's parameters for any type of distribution, by ascending the gradient towards higher expected fitness. Further, it has been shown [, ] that following the natural gradient to adapt the search distribution is highly beneficial, because it appropriately normalizes the update step with respect to its uncertainty and makes the algorithm scale-invariant.

Exponential NES (xNES), the most common instantiation of NES, uses a search distribution parameterized by a mean vector and a full covariance matrix, and is thus most similar to CMA-ES (in fact, the precise relation is described in [] and []). Given the relatively small problem dimensions of the BBOB benchmarks, and the fact that many are non-separable, it is also among the most appropriate NES variants for the task.

In this report, we retain the original formulation of xNES (including all parameter settings, except for an added stopping criterion), augmented with a technique for the online adaptation of its learning rate called adaptation sampling, which is designed to speed up convergence. We describe the empirical performance on all benchmark functions (both noise-free and noisy) of the BBOB workshop.
2. NATURAL EVOLUTION STRATEGIES

Natural evolution strategies (NES) maintain a search distribution $\pi$ and adapt the distribution parameters $\theta$ by following the natural gradient [] of expected fitness $J$, that is, maximizing

$$J(\theta) = \mathbb{E}_\theta[f(z)] = \int f(z)\, \pi(z \mid \theta)\, dz.$$

Just like their close relative CMA-ES [], NES algorithms are invariant under monotone transformations of the fitness function and linear transformations of the search space. In each iteration the algorithm produces $n$ samples $z_i \sim \pi(z \mid \theta)$, $i \in \{1, \ldots, n\}$, i.i.d. from its search distribution, which is parameterized by $\theta$. The gradient w.r.t. the parameters $\theta$ can be rewritten (see []) as

$$\nabla_\theta J(\theta) = \nabla_\theta \int f(z)\, \pi(z \mid \theta)\, dz = \mathbb{E}_\theta\left[ f(z)\, \nabla_\theta \log \pi(z \mid \theta) \right],$$

from which we obtain a Monte Carlo estimate

$$\nabla_\theta J(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} f(z_i)\, \nabla_\theta \log \pi(z_i \mid \theta)$$

of the search gradient. The key step then consists in replacing this gradient by the natural gradient, defined as $F^{-1} \nabla_\theta J(\theta)$, where

$$F = \mathbb{E}\left[ \nabla_\theta \log \pi(z \mid \theta)\, \nabla_\theta \log \pi(z \mid \theta)^\top \right]$$

is the Fisher information matrix. The search distribution is iteratively updated using natural gradient ascent

$$\theta \leftarrow \theta + \eta\, F^{-1} \nabla_\theta J(\theta)$$

with learning rate parameter $\eta$.

2.1 Exponential NES

While the NES formulation is applicable to arbitrary parameterizable search distributions [, ], the most common variant employs multinormal search distributions. For that case, two helpful techniques were introduced in [], namely an exponential parameterization of the covariance matrix, which guarantees positive-definiteness, and a novel method for changing the coordinate system into a natural one, which makes the algorithm computationally efficient. The resulting algorithm, NES with a multivariate Gaussian search distribution using both these techniques, is called xNES; its pseudocode is given in Algorithm 1.

Algorithm 1: Exponential NES (xNES)

    input: f, μ_init, η_σ, η_B, u_k
    initialize: μ ← μ_init, σ ← 1, B ← I
    repeat
        for k = 1 ... n do
            draw sample s_k ~ N(0, I)
            z_k ← μ + σ B s_k
            evaluate the fitness f(z_k)
        end
        sort {(s_k, z_k)} with respect to f(z_k) and assign utilities u_k to each sample
        compute gradients:
            ∇_δ J ← Σ_k u_k s_k
            ∇_M J ← Σ_k u_k (s_k s_k^T − I)
            ∇_σ J ← tr(∇_M J) / d
            ∇_B J ← ∇_M J − ∇_σ J · I
        update parameters:
            μ ← μ + σ B ∇_δ J
            σ ← σ · exp(η_σ/2 · ∇_σ J)
            B ← B · exp(η_B/2 · ∇_B J)
    until stopping criterion is met

2.2 Adaptation Sampling

Adaptation sampling, first introduced in [], is a recent meta-learning technique [] that can adapt hyper-parameters online, in an economical way that is grounded in a measure of statistical improvement. Here, we apply it to the learning rate of the global step-size, η_σ.

The idea is to consider whether a larger learning rate η′_σ > η_σ would have been more likely to generate the good samples in the current batch. For this we determine the (hypothetical) search distribution π(· | θ′) that would have resulted from such a larger update. Then we compute importance weights

$$w_k = \frac{\pi(z_k \mid \theta')}{\pi(z_k \mid \theta)}$$

for each of the n samples z_k in our current population, which was generated from the actual search distribution π(· | θ). We then conduct a weighted Mann-Whitney test [] (appendix A) to determine whether the set {rank(z_k)} is inferior to its reweighted counterpart {w_k · rank(z_k)} (corresponding to the larger learning rate), with statistical significance ρ. If so, we increase the learning rate by a factor of 1 + c′, up to at most η_σ = 1 (where c′ = 0.1). Otherwise it decays towards its initial value:

$$\eta_\sigma \leftarrow (1 - c')\, \eta_\sigma + c'\, \eta_{\sigma,\mathrm{init}}$$

The procedure is summarized in Algorithm 2 (for details and derivations, see []). The combination of xNES with adaptation sampling is dubbed xNES-as.

One interpretation of why adaptation sampling is helpful is that, half-way into the search (after a local attractor has been found, e.g., towards the end of the valley on the Rosenbrock benchmarks f8 or f9), the convergence speed can be boosted by an increased learning rate. For such situations, an online adaptation of hyper-parameters is inherently well-suited.

Algorithm 2: Adaptation sampling

    input: η_{σ,t}, η_{σ,init}, θ_t, θ_{t−1}, {(z_k, f(z_k))}, c′, ρ
    output: η_{σ,t+1}
    compute the hypothetical θ′, given θ_{t−1} and a larger learning rate than η_{σ,t}
    for k = 1 ... n do
        w_k ← π(z_k | θ′) / π(z_k | θ)
    end
    S ← {rank(z_k)}
    S′ ← {w_k · rank(z_k)}
    if weighted-Mann-Whitney(S, S′) < ρ then
        return (1 − c′) η_σ + c′ η_{σ,init}
    else
        return min((1 + c′) η_σ, 1)
    end
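For concreteness, the following is a minimal NumPy/SciPy sketch of the xNES loop of Algorithm 1, written for minimization. It is not the PyBrain implementation used for the experiments; the fixed iteration budget, the test function, and the constants (chosen to match the defaults listed in Table 1 below) are illustrative assumptions.

    import numpy as np
    from scipy.linalg import expm

    def xnes(f, mu, iters=2000, seed=0):
        # Minimal xNES sketch (cf. Algorithm 1): natural-gradient ascent on a
        # Gaussian search distribution N(mu, sigma^2 B B^T), used here for minimization.
        rng = np.random.default_rng(seed)
        d = len(mu)
        n = 4 + int(3 * np.log(d))                      # population size
        eta_mu = 1.0
        eta_sigma = eta_B = 3 * (3 + np.log(d)) / (5 * d * np.sqrt(d))
        # rank-based utilities u_1 >= ... >= u_n, shifted to sum to zero
        ranks = np.arange(1, n + 1)
        u = np.maximum(0.0, np.log(n / 2 + 1) - np.log(ranks))
        u = u / u.sum() - 1.0 / n
        sigma, B = 1.0, np.eye(d)
        for _ in range(iters):
            s = rng.standard_normal((n, d))             # samples in natural coordinates
            z = mu + sigma * s @ B.T                    # candidates z_k = mu + sigma * B s_k
            s = s[np.argsort([f(zk) for zk in z])]      # sort: best (lowest f) first
            g_delta = u @ s                             # gradient for the mean
            g_M = (u[:, None] * s).T @ s - u.sum() * np.eye(d)
            g_sigma = np.trace(g_M) / d                 # step-size component
            g_B = g_M - g_sigma * np.eye(d)             # shape component
            mu = mu + eta_mu * sigma * B @ g_delta
            sigma *= np.exp(0.5 * eta_sigma * g_sigma)
            B = B @ expm(0.5 * eta_B * g_B)
        return mu

    # usage: minimize a 5-dimensional sphere function starting from (3, ..., 3)
    print(xnes(lambda x: float(np.sum(x ** 2)), mu=np.full(5, 3.0)))

In this form the sketch already minimizes simple test functions; the restart logic of Section 3 and the adaptation-sampling extension of Section 2.2 are deliberately omitted.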
Table 1: Default parameter values for xNES (including the utility function and adaptation sampling) as a function of the problem dimension d.

    parameter      default value
    n              4 + ⌊3 log(d)⌋
    η_σ = η_B      3 (3 + log(d)) / (5 d √d)
    u_k            max(0, log(n/2 + 1) − log(k)) / Σ_{j=1}^{n} max(0, log(n/2 + 1) − log(j)) − 1/n
    ρ              1/2 − 1/(3 (d + 1))
    c′             0.1
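The quantities that drive Algorithm 2 are the importance weights w_k and the capped multiplicative update of η_σ. Below is a minimal sketch of that decision for a Gaussian search distribution: SciPy's plain (unweighted) Mann-Whitney test stands in for the weighted variant used in the paper, the orientation of the test follows the prose of Section 2.2, c′ = 0.1 and the cap η_σ ≤ 1 mirror Table 1 and Section 2.2, and the default significance level is only a placeholder.

    import numpy as np
    from scipy.stats import mannwhitneyu, multivariate_normal

    def importance_weights(z, mu, cov, mu_hyp, cov_hyp):
        # w_k = pi(z_k | theta') / pi(z_k | theta): likelihood ratio of each sample
        # under the hypothetical (larger-learning-rate) distribution vs. the actual one.
        return (multivariate_normal(mu_hyp, cov_hyp).pdf(z)
                / multivariate_normal(mu, cov).pdf(z))

    def adapt_eta_sigma(fitnesses, weights, eta, eta_init,
                        c_prime=0.1, rho=0.3, eta_max=1.0):
        # Adaptation-sampling rule: increase eta_sigma when the reweighted ranks look
        # significantly better, otherwise decay it towards its initial value.
        # rho should be set as in Table 1; 0.3 is only a placeholder default.
        ranks = np.argsort(np.argsort(fitnesses)) + 1.0     # rank 1 = best sample
        # is the plain rank set inferior to its importance-reweighted counterpart?
        _, p = mannwhitneyu(ranks, weights * ranks, alternative="greater")
        if p < rho:
            return min((1.0 + c_prime) * eta, eta_max)      # larger update would have helped
        return (1.0 - c_prime) * eta + c_prime * eta_init   # decay towards eta_sigma_init

In a full xNES-as loop, `weights` would be computed from the hypothetical distribution obtained by redoing the σ-update with the larger learning rate, and the returned value would replace η_σ for the next generation.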

3. EXPERIMENTAL SETTINGS

We use identical default hyper-parameter values for all benchmarks (both noisy and noise-free functions), which are taken from [, ]. Table 1 summarizes all the hyper-parameters used.

In addition, we make use of the provided target fitness f_opt to trigger independent algorithm restarts, using a simple ad-hoc procedure: the run is restarted if the log-progress during the past d evaluations is too small, i.e., if

$$\log\left|f_{\mathrm{opt}} - f_t\right| - \log\left|f_{\mathrm{opt}} - f_{t-d}\right| < \frac{(r+1)\, m}{\log\left|f_{\mathrm{opt}} - f_t\right| + 8},$$

where m is the remaining budget of evaluations divided by d, f_t is the best fitness encountered until evaluation t, and r is the number of restarts so far. It turns out that this use of f_opt is technically not permitted by the BBOB guidelines, so strictly speaking a different restart strategy should be employed, for example the one described in []. The total budget scales with the dimension d.

Implementations of this and other NES algorithm variants are available in Python through the PyBrain machine learning library [], as well as in other languages at www.idsia.ch/~tom/nes.html.

4. CPU TIMING

A timing experiment was performed to determine the CPU time per function evaluation, and how it depends on the problem dimension. For each dimension, the algorithm was restarted with a maximum budget of /d evaluations, until at least 30 seconds had passed. Our xNES-as implementation (in Python, based on the PyBrain [] library), running on an Intel Xeon, required an average time of ., .9, ., ., .9, . milliseconds per function evaluation for dimensions , , , , , 8 respectively (the function evaluations themselves take about . ms).

5. RESULTS

Results of xNES-as on the noiseless testbed (from experiments conducted according to [] on the benchmark functions given in [, 8]) are presented in the figures and the first table below. Similarly, results of xNES-as on the testbed of noisy functions (from experiments conducted according to [] on the benchmark functions given in [, 9]) are presented in the corresponding figures and the second table.

6. DISCUSSION

The top rows of the ECDF figures give a good overview picture, showing that across all benchmarks taken together, xNES-as performs almost as well as the best and better than most of the BBOB 2009 contestants. Beyond this high-level perspective the results speak for themselves, of course; we will just highlight a few observations.

According to the tables, the only conditions where xNES-as significantly outperforms all algorithms from the BBOB 2009 competition are on function f and in the early phase on f8, f8 and f9. We observe the worst performance on multimodal functions like f, f and f that other algorithms tackle very easily. Comparing different types of noise, xNES-as appears to be least sensitive to Cauchy noise and most sensitive to uniform noise.

[Figure: Empirical cumulative distribution functions (ECDFs) of the noisy benchmark functions in 5-D and 20-D. Plotted is the fraction of trials versus running time (left subplots) or versus Δf (right subplots); see the noiseless-function ECDF figure below for details.]

From the ERT loss-ratio figure and tables, we observe a good loss ratio across the board on all benchmarks, with the best ones on moderate functions, ill-conditioned functions, and for all levels of noise. These results hold equally in dimensions 5 and 20. On the other hand, the algorithm is less competitive on (noisy or noise-free) multimodal benchmarks, which we expect to be directly related to its small default population size.
Acknowledgements

The author wants to thank the organizers of the BBOB workshop for providing such a well-designed benchmark setup, and especially such high-quality post-processing utilities. This work was funded in part through an AFR postdoc grant of the National Research Fund Luxembourg.

[Figure: Expected number of f-evaluations (ERT, with lines) to reach f_opt + Δf, median number of f-evaluations to reach the most difficult target that was reached at least once (+), and maximum number of f-evaluations in any trial (x), all divided by dimension and plotted as log10 values versus dimension, for Δf = {10, 1, 10^-1, 10^-2, 10^-3, 10^-5, 10^-8}; one panel per noiseless benchmark function f1-f24 (Sphere through Lunacek bi-Rastrigin). Numbers above ERT symbols indicate the number of successful trials. The light thick line with diamonds indicates the respective best result from BBOB-2009 for Δf = 10^-8. Horizontal lines mean linear scaling, slanted grid lines depict quadratic scaling.]

[Figure: The same ERT scaling plots as above, for the noisy benchmark functions f101-f130 (Sphere, Rosenbrock, Step-ellipsoid, Ellipsoid, Sum of different powers, Schaffer F7, Griewank-Rosenbrock and Gallagher variants subject to Gaussian, uniform and Cauchy noise).]

[Figure: Empirical cumulative distribution functions (ECDFs) for the noiseless functions in 5-D and 20-D, plotting the fraction of trials with an outcome not larger than the respective value on the x-axis; one row per function group (all functions, separable, moderate, ill-conditioned, multi-modal, and weakly structured functions). Left subplots: ECDF of the number of function evaluations (FEvals) divided by the search space dimension D, to fall below f_opt + Δf, where Δf is given in the legend. Right subplots: ECDF of the best achieved Δf divided by 10^-8 for running times of D, 10 D, 100 D, ... function evaluations (from right to left). The thick red line represents the most difficult target value f_opt + 10^-8. Legends indicate the number of functions that were solved in at least one trial. Light brown lines in the background show ECDFs for Δf = 10^-8 of all algorithms benchmarked during BBOB-2009.]

[Table: Expected running time (ERT in number of function evaluations) divided by the best ERT measured during BBOB-2009 (given in the respective first row) for different Δf values, for the noiseless functions f1-f24 in 5-D and 20-D. The median number of conducted function evaluations is additionally given in italics if ERT(10^-7) = ∞. #succ is the number of trials that reached the final target f_opt + 10^-8.]

[Figure: ERT loss ratio versus a given budget FEvals. The target value f_t used for a given FEvals is the smallest (best) recorded function value such that ERT(f_t) ≤ FEvals for the presented algorithm. Shown is FEvals divided by the respective best ERT(f_t) from BBOB-2009, for all noiseless functions f1-f24 (left columns) and all noisy functions f101-f130 (right columns), in 5-D and 20-D. Line: geometric mean. Box-whisker error bars: 25-75 percentile with median (box), 10-90 percentile (caps), and minimum and maximum ERT loss ratio (points). The vertical line gives the maximal number of function evaluations in a single trial in this function subset.]

[Table: ERT ratios, as in the previous table, for the noisy functions f101-f130 in 5-D and 20-D.]

7. REFERENCES

[1] S. I. Amari. Natural gradient works efficiently in learning. Neural Computation, 10:251-276, 1998.
[2] S. Finck, N. Hansen, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Presentation of the noiseless functions. Technical Report 2009/20, Research Center PPE, 2009. Updated February 2010.
[3] S. Finck, N. Hansen, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2010: Presentation of the noisy functions. Technical Report 2009/21, Research Center PPE, 2010.
[4] N. Fukushima, Y. Nagata, S. Kobayashi, and I. Ono. Proposal of distance-weighted exponential natural evolution strategies. In IEEE Congress on Evolutionary Computation. IEEE, 2011.
[5] T. Glasmachers, T. Schaul, and J. Schmidhuber. A natural evolution strategy for multi-objective optimization. In Parallel Problem Solving from Nature (PPSN), 2010.
[6] T. Glasmachers, T. Schaul, Y. Sun, D. Wierstra, and J. Schmidhuber. Exponential natural evolution strategies. In Genetic and Evolutionary Computation Conference (GECCO), Portland, OR, 2010.
[7] N. Hansen, A. Auger, S. Finck, and R. Ros. Real-parameter black-box optimization benchmarking 2012: Experimental setup. Technical report, INRIA, 2012.
[8] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions. Technical Report RR-6829, INRIA, 2009. Updated February 2010.
[9] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Noisy functions definitions. Technical Report RR-6869, INRIA, 2009. Updated February 2010.
[10] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9:159-195, 2001.
[11] T. Schaul. Studies in Continuous Black-box Optimization. Ph.D. thesis, Technische Universität München, 2011.
[12] T. Schaul. Natural evolution strategies converge on sphere functions. In Genetic and Evolutionary Computation Conference (GECCO), Philadelphia, PA, 2012.
[13] T. Schaul, J. Bayer, D. Wierstra, Y. Sun, M. Felder, F. Sehnke, T. Rückstieß, and J. Schmidhuber. PyBrain. Journal of Machine Learning Research, 11:743-746, 2010.
[14] T. Schaul and J. Schmidhuber. Metalearning. Scholarpedia, 5(6):4650, 2010.
[15] Y. Sun, D. Wierstra, T. Schaul, and J. Schmidhuber. Stochastic search using the natural gradient. In International Conference on Machine Learning (ICML), 2009.
[16] D. Wierstra, T. Schaul, T. Glasmachers, Y. Sun, and J. Schmidhuber. Natural evolution strategies. Technical report, 2011.
[17] D. Wierstra, T. Schaul, J. Peters, and J. Schmidhuber. Natural evolution strategies. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), Hong Kong, China, 2008.