Bio-inspired Continuous Optimization: The Coming of Age

Bio-inspired Continuous Optimization: The Coming of Age. Anne Auger, Nikolaus Hansen, Nikolas Mauny, Raymond Ros, Marc Schoenauer. TAO Team, INRIA Futurs, FRANCE. http://tao.lri.fr First.Last@inria.fr CEC 2007, Singapore, September 2007

Problem Statement: Continuous Domain Search/Optimization. The problem: minimize a fitness function (objective function, loss function) in continuous domain, $f : S \subseteq \mathbb{R}^n \to \mathbb{R}$, in the Black Box scenario (direct search): x → f(x). Hypotheses: domain-specific knowledge is only used within the black box; gradients are not available.

Problem Statement: Continuous Domain Search/Optimization. The problem: minimize a fitness function (objective function, loss function) in continuous domain, $f : S \subseteq \mathbb{R}^n \to \mathbb{R}$, in the Black Box scenario (direct search): x → f(x). Typical examples: shape optimization (e.g. using CFD), curve fitting, airfoils; model calibration, biological, physical; parameter identification, controller, plants, images.
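A minimal sketch of this black-box interface (the sphere objective below is just a hypothetical stand-in for an expensive simulator):

    import numpy as np

    def black_box(x):
        # Stand-in for the unknown objective: only the value f(x) is returned,
        # no gradient and no structural information.
        return float(np.sum(np.asarray(x) ** 2))

    # A direct-search method can only probe candidate points:
    candidate = np.random.uniform(-5.0, 5.0, size=10)
    print(black_box(candidate))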

Optimization techniques. Numerical methods (Applied Mathematicians): long history, but they heavily rely on theoretical convergence proofs, require regularity, and face numerical pitfalls (numerical gradient?). Bio-inspired algorithms (Computer Scientists): recent, trendy methods, but computationally heavy and without convergence proofs; mostly from the AI field, but divided! (well, almost, see later).

This talk. Goal: an empirical comparison, on some artificial testbed illustrating typical difficulties of continuous optimization, between some bio-inspired algorithms and some (one!) deterministic optimization method(s), in the black-box scenario, without specific intensive parameter tuning.

1 Continuous optimization and stochastic search: The algorithms. 2 Problem difficulties: Ruggedness, Ill-Conditioning, Non-separability. 3 Implementations and parameter settings: Algorithm implementations, Tuning DE. 4 Experiments and results: Experimental conditions, Outputs and Performance measures. 5 Conclusion: Further work, Conclusions.

The algorithms: Bio-inspired Optimization Algorithms. Darwinian Artificial Evolution: repeat (parent selection, variation, survival selection). Pre-selection: from the CEC 05 Challenge. Particle Swarm Optimization (Eberhart & Kennedy, 95): perturb particle velocity toward global best and local best; update global best and local best. Differential Evolution (Storn & Price, 95): add difference vector(s); uniform crossover, parent component kept with proba. 1 - CR; keep the best of parent and offspring. Covariance Matrix Adaptation-ES (Hansen & Ostermeier, 96): Gaussian mutation; keep the λ/2 best of the λ offspring; update the mutation parameters.

The algorithms: BFGS. Gradient-based methods: $x_{t+1} = x_t - \rho_t d_t$ with $\rho_t = \arg\min_\rho \{F(x_t - \rho d_t)\}$ (line search). Choice of $d_t$, the descent direction? BFGS, a Quasi-Newton method: maintain an approximation $\hat H_t$ of the Hessian of f; solve $\hat H_t d_t = \nabla f(x_t)$ for $d_t$; compute $x_{t+1}$ and update $\hat H_t \to \hat H_{t+1}$. Converges if the quadratic approximation of F holds around the optimum. Reliable and robust on quadratic functions!
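A minimal sketch of this setting, using scipy's BFGS with a finite-difference (numerical) gradient as a stand-in for the Matlab built-in used in the talk; the quadratic objective and its conditioning are illustrative assumptions:

    import numpy as np
    from scipy.optimize import minimize

    def f_quadratic(x, H):
        # Convex quadratic test objective f(x) = x^T H x.
        return float(x @ H @ x)

    n = 10
    H = np.diag(np.logspace(0, 3, n))          # ill-conditioned diagonal Hessian, CN = 1e3
    x0 = np.random.uniform(-2.0, 8.0, size=n)

    # BFGS with a finite-difference gradient (no analytic gradient supplied).
    res = minimize(lambda x: f_quadratic(x, H), x0, method='BFGS',
                   options={'gtol': 1e-9, 'maxiter': 10000})
    print(res.nit, res.fun)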

1 Continuous optimization and stochastic search. 2 Problem difficulties: Ruggedness, Ill-Conditioning, Non-separability. 3 Implementations and parameter settings. 4 Experiments and results. 5 Conclusion.

What makes a problem hard? Non-convexity: invalidates most of the deterministic theory. Ruggedness: non-smooth, discontinuous, noisy. Multimodality: presence of local optima. Dimensionality: line search is trivial in one dimension; the magnificence of high dimensionality. Ill-conditioning. Non-separability.

Ruggedness: Monotonous transformation invariance. Monotonous transformations: e.g. $\sum_i x_i^2$, $\sqrt{\sum_i x_i^2}$ and $(\sum_i x_i^2)^2$ have the same level sets. Invariance: comparison-based algorithms (PSO, DE, CMA-ES, ...) are invariant w.r.t. monotonous transformations of the fitness; gradient-based methods (BFGS, ...) are not.
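A small check of this invariance, assuming a handful of candidate fitness values: any strictly increasing transformation leaves their ranking, and hence the behaviour of a purely comparison-based algorithm, unchanged:

    import numpy as np

    def ranks(values):
        # Rank of each value within the array (0 = smallest).
        return np.argsort(np.argsort(values))

    f_vals = np.array([4.0, 0.25, 9.0, 1.0])        # e.g. sum x_i^2 on four candidates
    for g in (np.sqrt, np.square, lambda v: np.log(1 + v)):
        # strictly increasing g preserves the ranking of the candidates
        assert np.array_equal(ranks(f_vals), ranks(g(f_vals)))
    print("rankings identical under monotonous transformations")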

Ruggedness: Multimodality. Bio-inspired algorithms are global search algorithms, but their performance on multi-modal problems depends on the population size. BFGS has no population :-) and the starting point is crucial on multimodal functions: replace the population by multiple restarts, from uniformly distributed points or from the perturbed final point of the previous trial (the Hessian is anyway reset to $I_n$).

Ill-Conditioning. The Condition Number (CN) of a positive-definite matrix H is the ratio of its largest and smallest eigenvalues. If f is quadratic, $f(x) = x^T H x$, the CN of f is that of its Hessian H. More generally, the CN of f is that of its Hessian wherever it is defined. Graphically, ill-conditioned means squeezed lines of equal function value. [Figure: level sets of a quadratic function for an increasing condition number.] Issue: the gradient does not point toward the minimum...
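A minimal sketch of the condition number computation, on a hypothetical diagonal ellipsoid Hessian:

    import numpy as np

    def condition_number(H):
        # CN of a positive-definite matrix: ratio of largest to smallest eigenvalue.
        eigvals = np.linalg.eigvalsh(H)
        return eigvals.max() / eigvals.min()

    alpha = 1e6
    H_elli = np.diag([alpha ** (i / 9.0) for i in range(10)])   # 10-D ellipsoid Hessian
    print(condition_number(H_elli))                              # ~1e6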

Ill-Conditioning: A priori discussion. Bio-inspired algorithms: PSO and DE, the population can point toward the minimum; CMA-ES, the covariance matrix can take longer to learn. BFGS: the numerical gradient can raise numerical problems; the Hessian matrix can take longer to learn.

Non-separability: Separability. Definition (Separable Problem): a function f is separable if $\arg\min_{(x_1,\dots,x_n)} f(x_1,\dots,x_n) = \big(\arg\min_{x_1} f(x_1,\dots),\ \dots,\ \arg\min_{x_n} f(\dots,x_n)\big)$, i.e. one can solve n independent 1-D optimization problems. Example: additively decomposable functions, $f(x_1,\dots,x_n) = \sum_{i=1}^{n} f_i(x_i)$, e.g. the Rastrigin function. [Figure: level sets of the Rastrigin function.]
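A minimal sketch of this decomposition on a hypothetical separable ellipsoid: each coordinate is minimized independently with a 1-D solver:

    import numpy as np
    from scipy.optimize import minimize_scalar

    # Separable ellipsoid f(x) = sum_i c_i * x_i^2 decomposes into n independent
    # 1-D problems min_{x_i} c_i * x_i^2.
    coeffs = np.logspace(0, 6, 5)                      # hypothetical per-coordinate weights
    x_star = np.array([minimize_scalar(lambda t, c=c: c * t * t).x for c in coeffs])
    print(np.round(x_star, 8))                         # each coordinate minimized on its own -> 0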

Non-separability: Designing Non-Separable Problems. Rotating the coordinate system: if $f : x \mapsto f(x)$ is separable, then $f_{rot} : x \mapsto f(Rx)$, with R a rotation matrix, is non-separable (Hansen, Ostermeier & Gawelczyk, 95; Salomon, 96). [Figure: level sets of the Rastrigin function before and after rotation.]
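A minimal sketch of this construction, with a random orthogonal matrix standing in for the rotation R (the function names here are illustrative):

    import numpy as np

    def random_rotation(n, seed=0):
        # Random orthogonal matrix via QR decomposition of a Gaussian matrix.
        rng = np.random.default_rng(seed)
        q, r = np.linalg.qr(rng.standard_normal((n, n)))
        return q * np.sign(np.diag(r))        # fix column signs for a well-defined result

    def rastrigin(x):
        x = np.asarray(x)
        return 10.0 * len(x) + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

    n = 10
    R = random_rotation(n)
    f_rot = lambda x: rastrigin(R @ x)         # f(Rx): same landscape, non-separable coordinates
    print(rastrigin(np.zeros(n)), f_rot(np.zeros(n)))   # optimum value preserved (0.0)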

Non-separability: Rotational invariance. Bio-inspired algorithms: PSO is not rotationally invariant (see next slide); DE, the crossover is not rotationally invariant, rotational invariance iff CR = 1; CMA-ES is rotationally invariant. BFGS: the numerical gradient can raise numerical problems, added to ill-conditioning effects.

Non-separability: PSO and rotational invariance. [Figures: a sample swarm; the same swarm, rotated.] $V_i^j(t+1) = V_i^j(t) + c_1 U_i^j(0,1)\,(p_i^j - x_i^j(t)) + c_2 \tilde U_i^j(0,1)\,(g_i^j - x_i^j(t))$: the second term makes particle i approach its "previous" best and the third term the "global" best, with the uniform weights U and $\tilde U$ drawn independently for every coordinate j.
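A minimal sketch of this update (the acceleration coefficients c1, c2 are hypothetical): the uniform weights are drawn coordinate-wise, which is what ties the behaviour to the coordinate system:

    import numpy as np

    rng = np.random.default_rng(1)
    c1 = c2 = 1.5                                  # hypothetical acceleration coefficients

    def pso_velocity_update(v, x, p_best, g_best):
        # Coordinate-wise uniform weights U, U~ in [0,1): this per-coordinate sampling
        # is exactly what breaks rotational invariance.
        u1 = rng.uniform(size=x.shape)
        u2 = rng.uniform(size=x.shape)
        return v + c1 * u1 * (p_best - x) + c2 * u2 * (g_best - x)

    v = np.zeros(3); x = np.array([1.0, 2.0, 3.0])
    print(pso_velocity_update(v, x, p_best=np.zeros(3), g_best=np.ones(3)))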


1 Continuous optimization and stochastic search. 2 Problem difficulties. 3 Implementations and parameter settings: Algorithm implementations, Tuning DE. 4 Experiments and results. 5 Conclusion.

Algorithm implementations: The algorithms. Default implementations. DE: Matlab code from http://www.icsi.berkeley.edu/~storn/code.html (not really giving default parameters). PSO: Standard PSO 2006, C code from http://www.particleswarm.info/standard_pso_2006.c (remark: C code rather than Matlab code here). CMA-ES: Matlab code from http://www.bionik.tu-berlin.de/user/niko/. BFGS: the Matlab built-in implementation, widely (blindly?) used, using a numerical gradient + multiple restarts (local or global).

Tuning DE: The problem with DE. Control parameters: NP, F, CR, stopping criterion... and the strategy used to generate the difference vector: perturb a random or the best individual, number of difference vectors (1 or 2), slightly mutate the perturbation, or all of the above :-). From the public Matlab code: 1 DE/rand/1, the basic algorithm; 2 DE/local-to-best/1; 3 DE/best/1 with jitter; 4 DE/rand/1 with per-vector-dither, F = rand(F, 1); 5 DE/rand/1 with global-dither; 6 either-or-algorithm.
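A minimal sketch of the basic DE/rand/1 step with binomial (uniform) crossover, assuming the usual parameter names NP, F and CR:

    import numpy as np

    rng = np.random.default_rng(2)

    def de_rand_1(pop, i, F=0.8, CR=1.0):
        # DE/rand/1: mutant = x_r1 + F * (x_r2 - x_r3), then uniform crossover;
        # with CR = 1 the crossover is a no-op, which preserves rotational invariance.
        NP, n = pop.shape
        r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], size=3, replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])
        cross = rng.uniform(size=n) < CR
        cross[rng.integers(n)] = True              # guarantee at least one mutant component
        return np.where(cross, mutant, pop[i])

    pop = rng.uniform(-2.0, 8.0, size=(20, 10))    # NP = 20 candidates in dimension 10
    trial = de_rand_1(pop, i=0)
    print(trial.shape)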

Tuning DE: DE tuning. Experimental conditions: rotated ellipsoid, fixed condition number and dimension, stop when f_elli falls below the tolerance. Design of Experiments: the 6 variants, F ∈ {.3, .5, .7, .9}, CR ∈ {.3, .5, .7, .9}, NP ∈ {3, 5, 10} × dim, 3 runs per setting :-(. A poor DOE, yet totally unrealistic for RWA (real-world applications). [Figures: log(#evals) vs CR, one panel per strategy, one curve per value of F.]

Tuning DE: DE Experimental Setting and Discussion. Couldn't decide among 2 variants: DE2 (strategy 2, F = .8, CR = 1., NP = 10 × dim) and DE5 (strategy 5, F = [?], CR = 1., NP = 10 × dim). Discussion: DE2 is very close to the recommended parameters, except that the recommended CR is .9; but rotational invariance holds iff CR = 1 (no crossover).

1 Continuous optimization and stochastic search. 2 Problem difficulties. 3 Implementations and parameter settings. 4 Experiments and results: Experimental conditions, Outputs and Performance measures. 5 Conclusion.

Experimental conditions: The parameters. Common parameters: default parameters (except for DE); MaxEval = 10^7; fitness tolerance = 10^-9; domain [-2, 8]^d, optimum not at the center; 21 runs in each case (except for BFGS when there was little success). Population size, standard values for n = 10, 20, 40: PSO, 10 + floor(2 sqrt(n)) = 16, 18, 22; CMA-ES, 4 + floor(3 ln n) = 10, 12, 15; DE, 10 n = 100, 200, 400. To be increased for multi-modal functions, e.g. Rastrigin.
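The population-size formulas assumed above (Standard PSO 2006 swarm size, the default CMA-ES offspring number, and the common DE recommendation NP = 10 n) reproduce these values; a minimal sketch:

    import math

    def default_popsizes(n):
        # Assumed default population sizes: Standard PSO 2006 swarm size,
        # CMA-ES offspring number lambda, and the common DE recommendation NP = 10 n.
        return {'PSO': 10 + math.floor(2 * math.sqrt(n)),
                'CMA-ES': 4 + math.floor(3 * math.log(n)),
                'DE': 10 * n}

    for n in (10, 20, 40):
        print(n, default_popsizes(n))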

Experimental conditions: The testbed. Test functions: Ellipsoid function, quadratic, both separable and rotated, for different condition numbers; Rosenbrock function, almost unimodal, both original (non-separable) and rotated, for different condition numbers; Rastrigin function, highly multi-modal, both separable and rotated; DiffPow, convex but flat; sqrt(sqrt(Ellipsoid)) and sqrt(DiffPow), unimodal but non-convex.
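A sketch of the four base test functions under their standard definitions; the DiffPow exponent schedule and the default condition number alpha are assumptions consistent with the formulas quoted later in the talk:

    import numpy as np

    def ellipsoid(x, alpha=1e6):
        # Convex, separable; alpha is the condition number of the Hessian.
        x = np.asarray(x); n = len(x)
        weights = alpha ** (np.arange(n) / (n - 1))
        return float(np.sum(weights * x ** 2))

    def rosenbrock(x, alpha=100.0):
        x = np.asarray(x)
        return float(np.sum((1 - x[:-1]) ** 2 + alpha * (x[1:] - x[:-1] ** 2) ** 2))

    def rastrigin(x):
        x = np.asarray(x)
        return float(10 * len(x) + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

    def diffpow(x):
        x = np.asarray(x); n = len(x)
        return float(np.sum(np.abs(x) ** (2 + 10 * np.arange(n) / (n - 1))))

    print(ellipsoid(np.ones(10)), rosenbrock(np.ones(10)),
          rastrigin(np.zeros(10)), diffpow(np.ones(10)))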

Outputs and Performance measures: Comparisons, following the CEC 2005 challenge on continuous optimization. Goals: find the best possible fitness value, at minimal cost (number of function evaluations). Statistical measures must take both precision and cost into account; the usual averages and standard deviations of fitness values are irrelevant to assess precision. Issue: for obvious practical reasons one needs to impose a precision threshold on the fitness and a maximum number of iterations.

Outputs and Performance measures: the SP1 measure. SP1 (Success Performance one): the average required effort for success, SP1 = (average # evaluations of the successful runs) / (proportion of successful runs). It measures the effort to reach a given precision on the fitness (success); the same total budget of fitness evaluations is used to allow comparisons. Estimated after a fixed number of runs: high (unknown) variance in the case of few successes. It could also be estimated after a fixed number of successes, which would allow to control the variance. A single value, insufficient alone anyway.
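A minimal sketch of the SP1 computation on hypothetical run data:

    import numpy as np

    def sp1(evals, successes):
        # SP1 = average number of evaluations of the successful runs,
        #       divided by the fraction of successful runs.
        evals = np.asarray(evals, dtype=float)
        successes = np.asarray(successes, dtype=bool)
        if not successes.any():
            return np.inf
        return evals[successes].mean() / successes.mean()

    # Hypothetical data for 5 runs: evaluations used, and whether the target was hit.
    print(sp1([1200, 1500, 900, 2000, 1100], [True, True, False, True, True]))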

Outputs and Performance measures: Cumulative distributions. They cope with both the fitness threshold and the bound on the number of evaluations. [Figures: Rosenbrock function, 21 trials, tol 1e-9, default population size: left, % of runs vs # evaluations needed to reach the success threshold (1e-9); right, % of runs vs fitness value reached before the maximum number of evaluations (1e7).]

Outputs and Performance measures: Dynamic behaviors. Median, best, worst, 25- and 75-percentiles. [Figure: function value vs function evaluations, all runs of the 2 settings rosenbrock_de2 and rosenbrock_de5.] Comparing DE2 and DE5 on Rosenbrock, dimension 20.

Ellipsoid. $f_{elli}(x) = \sum_{i=1}^{n} \alpha^{\frac{i-1}{n-1}} x_i^2 = x^T H_{elli} x$, with $H_{elli} = \mathrm{diag}(1, \dots, \alpha)$: convex, separable. $f_{elli}^{rot}(x) = f_{elli}(Rx) = x^T H_{elli}^{rot} x$, with R a random rotation and $H_{elli}^{rot} = R^T H_{elli} R$: convex, non-separable. [Figures: level sets of the separable and rotated ellipsoid.] $\mathrm{cond}(H_{elli}) = \mathrm{cond}(H_{elli}^{rot}) = \alpha$; α ranges from 1 upward, and α = 10^6 (an axis ratio of 10^3) is typical for real-world problems.

Separable Ellipsoid function: PSO, DE2, DE5, CMA-ES, and BFGS. [Figure: SP1 vs condition number, Ellipsoid, 21 trials, tolerance 1e-9, eval max 1e+7; one curve per algorithm (BFGS, DE5, DE2, PSO, CMA-ES) plus BFGS on sqrt(sqrt(ellipsoid)).]

Separable Ellipsoid function: PSO, DE2, DE5, CMA-ES, and BFGS, dimension 20. [Figure: same layout.]

Separable Ellipsoid function: PSO, DE2, DE5, CMA-ES, and BFGS. [Figure: same layout, another dimension.]

Rotated Ellipsoid function: PSO, DE2, DE5, CMA-ES, and BFGS. [Figure: SP1 vs condition number, Rotated Ellipsoid, 21 trials, tolerance 1e-9, eval max 1e+7; one curve per algorithm plus BFGS on sqrt(sqrt(rotated ellipsoid)).]

Rotated Ellipsoid function: PSO, DE2, DE5, CMA-ES, and BFGS, dimension 20. [Figure: same layout.]

Rotated Ellipsoid function: PSO, DE2, DE5, CMA-ES, and BFGS. [Figure: same layout, another dimension.]

Separable Ellipsoid function: PSO, DE2, DE5, CMA-ES, and BFGS (shown again for comparison). [Figure: as above.]

Ellipsoid: Discussion. Bio-inspired algorithms: in the separable case, PSO and DE are insensitive to the conditioning... but PSO rapidly fails to solve the rotated version... while CMA-ES and DE (CR = 1) are rotation invariant; DE scales poorly with the dimension, about d^2.5 compared to d^1.5 for PSO, CMA-ES and BFGS. ... vs BFGS: BFGS fails to solve the ill-conditioned cases on quadratic functions! (Matlab messages: "Roundoff error is stalling convergence", "Line search couldn't find an acceptable point in the current search direction".) CMA-ES is only about 7 times slower.

Rosenbrock function (Banana). $f_{rosen}(x) = \sum_{i=1}^{n-1} \left[ (1 - x_i)^2 + \alpha\, (x_{i+1} - x_i^2)^2 \right]$. Non-separable, but... the rotated version was also run. $\alpha = 10^2$: the classical Rosenbrock function; here $\alpha = 1, \dots, 10^8$. Multi-modal for dimension > 3. [Figure: level sets of the Rosenbrock function.]

Rosenbrock functions: PSO, DE2, DE5, CMA-ES, and BFGS. [Figure: SP1 vs beta (the α parameter), 21 trials, tolerance 1e-9, eval max 1e+7.]

Rosenbrock functions: PSO, DE2, DE5, CMA-ES, and BFGS, dimension 20. [Figure: same layout.]

Rosenbrock functions: PSO, DE2, DE5, CMA-ES, and BFGS. [Figure: same layout, another dimension.]

Rosenbrock function: cumulative distributions for PSO, DE2, DE5, CMA-ES, and BFGS, one slide per value of α. [Figures, for each α: 21 trials, tol 1e-9, default population size; left, % success vs # evaluations needed to reach the success threshold (1e-9); right, % success vs fitness value reached before the maximum number of evaluations (1e7).]

Rosenbrock: Discussion. Bio-inspired algorithms: PSO is sensitive to non-separability; DE still scales badly with the dimension. ... vs BFGS: numerical premature convergence on ill-conditioned problems; both local and global restarts improve the results.

Rastrigin function. $f_{rast}(x) = 10\,n + \sum_{i=1}^{n} \left( x_i^2 - 10 \cos(2\pi x_i) \right)$: separable, multi-modal. $f_{rast}^{rot}(x) = f_{rast}(Rx)$, with R a random rotation: non-separable, multi-modal. [Figures: level sets of the separable and rotated Rastrigin function.]

Rastrigin function, SP1 vs fitness value: PSO, DE2, DE5, CMA-ES, and BFGS, one slide per population size (from the default NP up to much larger values). [Figures: SP1 vs function value to reach, with curves for CMA-ES, PSO, DE2, DE5 and BFGS on both the separable and the rotated Rastrigin.]

Rastrigin function, cumulative distributions: PSO, DE2, DE5, CMA-ES, and BFGS, for the same population sizes. [Figures, for each population size: 21 trials, tol 1e-9; left, % success vs # evaluations needed to reach the success threshold (1e-9); right, % success vs fitness value reached before the maximum number of evaluations (1e7).]

Rastrigin: Discussion. Bio-inspired algorithms: increasing the population size improves the results, and the optimal size is algorithm-dependent; CMA-ES and PSO solve the separable case, PSO about 10 times slower; only CMA-ES solves the rotated Rastrigin reliably, and it requires a large population size. ... vs BFGS: gets stuck in local optima, whatever the restart strategy; no numerical premature convergence.

Away from quadraticity: Non-linear scaling invariance. Comparison-based algorithms are insensitive to monotonous transformations of the fitness; this is true for DE, PSO and all ESs. BFGS is not, and its convergence results do depend on convexity. Other test functions: a simple transformation of the ellipsoid, $f_{SSE}(x) = \sqrt{\sqrt{f_{elli}(x)}}$; the DiffPow function, $f_{DiffPow}(x) = \sum_i |x_i|^{\,2 + 10\frac{i-1}{n-1}}$, and $\sqrt{\mathrm{DiffPow}}$.

DiffPow, SP1 vs fitness values: PSO, DE2, DE5, CMA-ES, and BFGS. [Figure: SP1 vs the DiffPow value to reach (sum of different powers); curves for BFGS, DE5, DE2, PSO and CMA-ES on diffpow and on sqrt(diffpow).]

Sqrt and DiffPow: Discussion. Bio-inspired algorithms: invariant; PSO performs best, as expected (DiffPow is separable)! ... vs BFGS: worse on sqrt(sqrt(Ellipsoid)) than on the Ellipsoid; better on sqrt(DiffPow) than on DiffPow (closer to quadraticity?); premature numerical convergence for high CN... fixed by the local restart strategy.

1 Continuous optimization and stochastic search. 2 Problem difficulties. 3 Implementations and parameter settings. 4 Experiments and results: Experimental conditions, Outputs and Performance measures. 5 Conclusion.

Further work: What is missing? The algorithms: other deterministic methods (e.g. Scilab procedures); the PCX crossover operator; specific evolution engines. The testbed: noisy functions, constrained functions. The statistics: confidence bounds for SP1 and other precision/cost measures. Real-world functions: which ones??? Do complete experiments.

Conclusions: Summary. Bio-inspired algorithms: all are monotonous-transformation invariant; PSO is very good... but only on separable (easy) functions; DE scales up poorly with the dimension and is sensitive to non-separability when CR < 1; CMA-ES is a clear best choice (redo the experiments with the parameter-less restart version). BFGS: the optimal choice for quasi-quadratic functions, but it can suffer from numerical premature convergence for high condition numbers; even with restart procedures, it fails on multimodal problems (dimension 10 only here).

Conclusions: The coming of age. The message to our Applied Maths colleagues, bio-inspired vs BFGS: CMA-ES is only about 7 times slower than BFGS on (quasi-)quadratic functions, but it is less hindered by high conditioning, is monotonous-transformation invariant, and is a global search method! Moreover, theoretical results are catching up: linear convergence for SA-ES (Auger, 05), with a bound on the convergence speed; on-going work for CMA-ES.