Tutorial CMA-ES Evolution Strategies and Covariance Matrix Adaptation
1 Tutorial CMA-ES Evolution Strategies and Covariance Matrix Adaptation Anne Auger & Nikolaus Hansen INRIA Research Centre Saclay Île-de-France Project team TAO University Paris-Sud, LRI (UMR 8623), Bat ORSAY Cedex, France Anne Auger & Nikolaus Hansen () CMA-ES June, / 82
2 Content
1 Problem Statement: Black Box Optimization and Its Difficulties; Non-Separable Problems; Ill-Conditioned Problems
2 Evolution Strategies: A Search Template; The Normal Distribution; Invariance
3 Step-Size Control: Why Step-Size Control; One-Fifth Success Rule; Path Length Control (CSA)
4 Covariance Matrix Adaptation: Covariance Matrix Rank-One Update; Cumulation (the Evolution Path); Covariance Matrix Rank-µ Update
5 CMA-ES Summary
6 Theoretical Foundations
7 Comparing Experiments
8 Summary and Final Remarks
3 Problem Statement: Continuous Domain Search/Optimization
Task: minimize an objective function (fitness function, loss function) in continuous domain, $f : \mathcal{X} \subseteq \mathbb{R}^n \to \mathbb{R}$, $x \mapsto f(x)$
Black box scenario (direct search scenario): $x \to$ [black box] $\to f(x)$
- gradients are not available or not useful
- problem domain specific knowledge is used only within the black box, e.g. within an appropriate encoding
Search costs: number of function evaluations
4 Problem Statement: Continuous Domain Search/Optimization
Goal
- fast convergence to the global optimum, or to a robust solution x
- a solution x with small function value f(x) at least search cost; these are two conflicting objectives
Typical Examples
- shape optimization (e.g. using CFD): curve fitting, airfoils
- model calibration: biological, physical
- parameter calibration: controller, plants, images
Problems
- exhaustive search is infeasible
- naive random search takes too long
- deterministic search is not successful / takes too long
Approach: stochastic search, Evolutionary Algorithms
7 Objective Function Properties
We assume $f : \mathcal{X} \subset \mathbb{R}^n \to \mathbb{R}$ to be non-linear, non-separable and to have at least moderate dimensionality, say $n \not\ll 10$ (n not much smaller than 10).
Additionally, f can be
- non-convex
- multimodal: there are possibly many local optima
- non-smooth: derivatives do not exist
- discontinuous
- ill-conditioned
- noisy
- ...
Goal: cope with any of these function properties; they are related to real-world problems
9 What Makes a Function Difficult to Solve?
Why stochastic search?
- non-linear, non-quadratic, non-convex: on linear and quadratic functions much better search policies are available
- ruggedness: non-smooth, discontinuous, multimodal, and/or noisy function
- dimensionality (size of search space): (considerably) larger than three
- non-separability: dependencies between the objective variables
- ill-conditioning
[Figure: gradient direction versus Newton direction]
10 Ruggedness
non-smooth, discontinuous, multimodal, and/or noisy
[Figure: fitness cut from a 5-D example, (easily) solvable with evolution strategies]
11 Curse of Dimensionality
The term "curse of dimensionality" (Richard Bellman) refers to problems caused by the rapid increase in volume associated with adding extra dimensions to a (mathematical) space.
Example: Consider placing 100 points onto a real interval, say [0, 1]. To get similar coverage, in terms of distance between adjacent points, the 10-dimensional space $[0, 1]^{10}$ would require $100^{10} = 10^{20}$ points. The original 100 points now appear as isolated points in a vast empty space.
Remark: distance measures break down in higher dimensionalities (the central limit theorem kicks in).
Consequence: a search policy (e.g. exhaustive search) that is valuable in small dimensions might be useless in moderate or large dimensional search spaces.
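The arithmetic behind the example can be checked in a few lines. This is only an illustrative sketch; the variable names are chosen here and do not come from the slides.

```python
# Grid coverage cost: keeping 100 points per axis in 10 dimensions.
points_per_axis = 100
dimension = 10

# A full grid needs points_per_axis ** dimension points.
grid_points = points_per_axis ** dimension

# Fraction of that grid which 100 points would cover.
coverage = points_per_axis / grid_points

print(grid_points == 10 ** 20)
print(coverage)
```

The count grows exponentially in the dimension, which is why exhaustive search stops being an option well before n = 10.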
15 Separable Problems
Definition (Separable Problem): a function f is separable if
$\arg\min_{(x_1,\dots,x_n)} f(x_1,\dots,x_n) = \left( \arg\min_{x_1} f(x_1,\dots), \dots, \arg\min_{x_n} f(\dots,x_n) \right)$
It follows that f can be optimized in a sequence of n independent 1-D optimization processes.
Example: additively decomposable functions
$f(x_1,\dots,x_n) = \sum_{i=1}^{n} f_i(x_i)$
e.g. the Rastrigin function
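A minimal sketch of what separability buys: the Rastrigin function is minimized by n independent 1-D grid searches, one coordinate at a time. The grid, the start point and the helper name `coordinatewise_minimize` are assumptions made for this illustration, not part of the tutorial.

```python
import numpy as np

def rastrigin(x):
    """Additively decomposable (separable) Rastrigin function."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0))

def coordinatewise_minimize(f, x0, grid):
    """Optimize each coordinate once over `grid`, keeping the others fixed."""
    x = np.array(x0, dtype=float)
    for i in range(x.size):
        candidates = np.tile(x, (grid.size, 1))
        candidates[:, i] = grid                      # vary only coordinate i
        values = [f(c) for c in candidates]
        x[i] = grid[int(np.argmin(values))]
    return x

grid = np.linspace(-5.12, 5.12, 1025)                # step 0.01, contains 0
x_best = coordinatewise_minimize(rastrigin, [3.0, -2.0, 1.0, 4.0, -4.0], grid)
print(x_best, rastrigin(x_best))
```

The cost is n times the grid size rather than the grid size to the power n. On a non-separable function a single pass like this would in general not find the optimum.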
16 Non-Separable Problems
Building a non-separable problem from a separable one [1, 2]: rotating the coordinate system
- $f : x \mapsto f(x)$ separable
- $f : x \mapsto f(Rx)$ non-separable, where R is a rotation matrix
[1] Hansen, Ostermeier, Gawelczyk (1995). On the adaptation of arbitrary normal mutation distributions in evolution strategies: The generating set adaptation. Sixth ICGA, Morgan Kaufmann
[2] Salomon (1996). Reevaluating Genetic Algorithm Performance under Coordinate Rotation of Benchmark Functions; A survey of some theoretical and practical aspects of genetic algorithms. BioSystems, 39(3)
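The construction can be sketched as follows, assuming a random rotation obtained from a QR decomposition and an axis-parallel ellipsoid as the separable base function. Both choices, and all names, are illustrative assumptions rather than the setup used by the cited papers.

```python
import numpy as np

def random_rotation(n, seed=0):
    """Random orthogonal matrix with determinant +1 (assumed construction via QR)."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    q = q * np.sign(np.diag(r))          # make the factorization unique
    if np.linalg.det(q) < 0:             # turn a reflection into a rotation
        q[:, 0] *= -1.0
    return q

def ellipsoid(x):
    """Separable base function: a sum of independently scaled squares."""
    x = np.asarray(x, dtype=float)
    scales = 10.0 ** (6.0 * np.arange(x.size) / (x.size - 1))
    return float(np.sum(scales * x**2))

R = random_rotation(4)

def rotated_ellipsoid(x):
    """Non-separable variant f(Rx): the variables are now coupled."""
    return ellipsoid(R @ np.asarray(x, dtype=float))

e = np.array([1.0, 0.0, 0.0, 0.0])
print(ellipsoid(e), rotated_ellipsoid(R.T @ e))
```

The rotated function has the same level sets as the original, only turned, so a rotationally invariant algorithm should perform identically on both.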
17 Ill-Conditioned Problems: Curvature of level sets
Consider the convex-quadratic function
$f(x) = \frac{1}{2}(x - x^*)^T H (x - x^*)$, which for $x^* = 0$ reads $\frac{1}{2}\sum_i h_{i,i}\, x_i^2 + \frac{1}{2}\sum_{i \ne j} h_{i,j}\, x_i x_j$
H is the Hessian matrix of f and symmetric positive definite.
[Figure: level sets with the gradient direction $-f'(x)^T$ and the Newton direction $-H^{-1} f'(x)^T$]
Ill-conditioning means squeezed level sets (high curvature). The condition number equals nine in the figure. Very large condition numbers are not unusual in real world problems.
If $H \approx I$ (small condition number of H), first order information (e.g. the gradient) is sufficient. Otherwise second order information (estimation of $H^{-1}$) is necessary.
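A small numerical sketch of the two directions on a convex-quadratic function with condition number nine. The matrix H = diag(1, 9), the optimum at zero and the evaluation point are assumptions picked for illustration.

```python
import numpy as np

H = np.diag([1.0, 9.0])                  # assumed Hessian, optimum at x* = 0
cond = np.linalg.cond(H)                 # ratio of largest to smallest eigenvalue

x = np.array([1.0, 1.0])
grad = H @ x                             # gradient of 0.5 * x^T H x

gradient_dir = -grad                     # steepest descent direction
newton_dir = -np.linalg.solve(H, grad)   # -H^{-1} grad, computed without inverting H

print(cond)
print(gradient_dir, newton_dir)
```

The Newton step lands exactly on the optimum, while the gradient direction is dominated by the steep axis and points well away from it.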
18 What Makes a Function Difficult to Solve? ... and what can be done
The Problem | The Approach in ESs and continuous EDAs
Dimensionality, Non-Separability | exploiting the problem structure: locality, neighborhood, encoding
Ill-conditioning | second order approach: changes the neighborhood metric
Ruggedness | non-local policy, large sampling width (step-size), as large as possible while preserving a reasonable convergence speed; population-based method, stochastic, non-elitistic; recombination operator serves as repair mechanism; restarts
21 Metaphors
Evolutionary Computation | Optimization
individual, offspring, parent | candidate solution, decision variables, design variables, object variables
population | set of candidate solutions
fitness function | objective function, loss function, cost function, error function
generation | iteration
22 Evolution Strategies
23 Stochastic Search
A black box search template to minimize $f : \mathbb{R}^n \to \mathbb{R}$
Initialize distribution parameters $\theta$, set population size $\lambda \in \mathbb{N}$
While not terminate
1. Sample distribution $P(x \mid \theta) \to x_1, \dots, x_\lambda \in \mathbb{R}^n$
2. Evaluate $x_1, \dots, x_\lambda$ on f
3. Update parameters $\theta \leftarrow F_\theta(\theta, x_1, \dots, x_\lambda, f(x_1), \dots, f(x_\lambda))$
Everything depends on the definition of $P$ and $F_\theta$; deterministic algorithms are covered as well.
In many Evolutionary Algorithms the distribution P is implicitly defined via operators on a population, in particular, selection, recombination and mutation.
Natural template for Estimation of Distribution Algorithms.
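The template translates almost line by line into code. In the sketch below, `stochastic_search` mirrors the three steps, while `sample_gaussian` and `move_to_best` are one deliberately naive, hypothetical choice of P and F_θ (isotropic Gaussian with a fixed step-size, mean moved to the best sample). They are not the CMA-ES update.

```python
import numpy as np

def stochastic_search(f, theta, sample, update, lam, iterations, seed=0):
    """Generic sample / evaluate / update loop of the search template."""
    rng = np.random.default_rng(seed)
    for _ in range(iterations):
        xs = sample(theta, lam, rng)               # 1. sample P(x | theta)
        fs = np.array([f(x) for x in xs])          # 2. evaluate on f
        theta = update(theta, xs, fs)              # 3. theta <- F_theta(...)
    return theta

def sample_gaussian(theta, lam, rng):
    """Hypothetical P: isotropic normal around the mean with fixed sigma."""
    mean, sigma = theta
    return mean + sigma * rng.standard_normal((lam, mean.size))

def move_to_best(theta, xs, fs):
    """Hypothetical F_theta: the new mean is the best sample, sigma is unchanged."""
    _, sigma = theta
    return xs[int(np.argmin(fs))], sigma

def sphere(x):
    return float(np.sum(np.asarray(x) ** 2))

theta0 = (np.full(3, 3.0), 0.3)
m_final, _ = stochastic_search(sphere, theta0, sample_gaussian, move_to_best,
                               lam=10, iterations=100)
print(sphere(m_final))
```

With a fixed step-size the search stalls at a distance from the optimum that is set by sigma, which is the motivation for step-size control later in the tutorial.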
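31 The CMA-ES
Input: $m \in \mathbb{R}^n$, $\sigma \in \mathbb{R}_+$, $\lambda$
Initialize: $C = I$, $p_c = 0$, $p_\sigma = 0$
Set: $c_c \approx 4/n$, $c_\sigma \approx 4/n$, $c_1 \approx 2/n^2$, $c_\mu \approx \mu_w/n^2$, $c_1 + c_\mu \le 1$, $d_\sigma \approx 1 + \sqrt{\mu_w/n}$, and $w_{i=1\dots\lambda}$ such that $\mu_w = 1 / \sum_{i=1}^{\mu} w_i^2 \approx 0.3\,\lambda$
While not terminate
- sampling: $x_i = m + \sigma y_i$, $y_i \sim \mathcal{N}_i(0, C)$, for $i = 1, \dots, \lambda$
- update mean: $m \leftarrow \sum_{i=1}^{\mu} w_i x_{i:\lambda} = m + \sigma y_w$, where $y_w = \sum_{i=1}^{\mu} w_i y_{i:\lambda}$
- cumulation for C: $p_c \leftarrow (1 - c_c)\, p_c + \mathbb{1}_{\{\|p_\sigma\| < 1.5\sqrt{n}\}} \sqrt{1 - (1 - c_c)^2}\, \sqrt{\mu_w}\, y_w$
- cumulation for σ: $p_\sigma \leftarrow (1 - c_\sigma)\, p_\sigma + \sqrt{1 - (1 - c_\sigma)^2}\, \sqrt{\mu_w}\, C^{-1/2} y_w$
- update C: $C \leftarrow (1 - c_1 - c_\mu)\, C + c_1\, p_c p_c^T + c_\mu \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}\, y_{i:\lambda}^T$
- update of σ: $\sigma \leftarrow \sigma \times \exp\left( \frac{c_\sigma}{d_\sigma} \left( \frac{\|p_\sigma\|}{\mathrm{E}\|\mathcal{N}(0, I)\|} - 1 \right) \right)$
Not covered on this slide: termination, restarts, useful output, boundaries and encoding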
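The update equations on this slide can be sketched compactly in NumPy. This is a teaching sketch under stated assumptions, not the authors' reference implementation. The slide's approximate constants are used as exact values (which only makes sense for n that is not small). The default population size, the logarithmic weights and the closed-form approximation of E‖N(0, I)‖ are common choices assumed here, and termination is a fixed iteration budget.

```python
import numpy as np

def cma_es(f, m, sigma, lam=None, max_iter=400, seed=0):
    rng = np.random.default_rng(seed)
    m = np.asarray(m, dtype=float)
    n = m.size
    lam = lam or 4 + int(3 * np.log(n))            # assumed default population size
    mu = lam // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))   # assumed log weights
    w /= w.sum()
    mu_w = 1.0 / np.sum(w**2)
    c_c, c_sigma = 4.0 / n, 4.0 / n                # constants as set on the slide
    c_1 = 2.0 / n**2
    c_mu = min(mu_w / n**2, 1.0 - c_1)             # enforce c_1 + c_mu <= 1
    d_sigma = 1.0 + np.sqrt(mu_w / n)
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n**2))  # approx. E||N(0,I)||
    C, p_c, p_sigma = np.eye(n), np.zeros(n), np.zeros(n)
    for _ in range(max_iter):
        eigval, B = np.linalg.eigh(C)              # C = B diag(eigval) B^T
        D = np.sqrt(np.maximum(eigval, 1e-20))
        z = rng.standard_normal((lam, n))
        y = (z * D) @ B.T                          # rows y_i ~ N(0, C)
        x = m + sigma * y                          # sampling
        fx = np.array([f(xi) for xi in x])
        y_sel = y[np.argsort(fx)[:mu]]             # y_{i:lambda}, best mu
        y_w = w @ y_sel
        m = m + sigma * y_w                        # update mean
        h = float(np.linalg.norm(p_sigma) < 1.5 * np.sqrt(n))
        p_c = (1 - c_c) * p_c + h * np.sqrt(1 - (1 - c_c)**2) * np.sqrt(mu_w) * y_w
        C_inv_sqrt = B @ np.diag(1.0 / D) @ B.T
        p_sigma = ((1 - c_sigma) * p_sigma
                   + np.sqrt(1 - (1 - c_sigma)**2) * np.sqrt(mu_w) * (C_inv_sqrt @ y_w))
        C = ((1 - c_1 - c_mu) * C + c_1 * np.outer(p_c, p_c)
             + c_mu * (y_sel.T * w) @ y_sel)       # rank-one plus rank-mu update
        sigma *= np.exp(c_sigma / d_sigma * (np.linalg.norm(p_sigma) / chi_n - 1))
    return m, sigma

def sphere(x):
    return float(np.sum(np.asarray(x) ** 2))

m_final, sigma_final = cma_es(sphere, np.full(10, 3.0), 1.0)
print(sphere(m_final), sigma_final)
```

The eigendecomposition is recomputed every iteration for clarity; production implementations usually update it less often and add the pieces the slide lists as not covered (termination criteria, restarts, boundary handling).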
32 Evolution Strategies
New search points are sampled normally distributed
$x_i \sim m + \sigma\, \mathcal{N}_i(0, C)$ for $i = 1, \dots, \lambda$
as perturbations of m, where $x_i, m \in \mathbb{R}^n$, $\sigma \in \mathbb{R}_+$, $C \in \mathbb{R}^{n \times n}$, and where
- the mean vector $m \in \mathbb{R}^n$ represents the favorite solution
- the so-called step-size $\sigma \in \mathbb{R}_+$ controls the step length
- the covariance matrix $C \in \mathbb{R}^{n \times n}$ determines the shape of the distribution ellipsoid
Here, all new points are sampled with the same parameters.
The question remains how to update m, C, and σ.
34 Why Normal Distributions?
1. widely observed in nature, for example as phenotypic traits
2. only stable distribution with finite variance: stable means that the sum of normal variates is again normal, $\mathcal{N}(x, A) + \mathcal{N}(y, B) \sim \mathcal{N}(x + y, A + B)$; helpful in design and analysis of algorithms; related to the central limit theorem
3. most convenient way to generate isotropic search points: the isotropic distribution does not favor any direction and supports rotational invariance
4. maximum entropy distribution with finite variance: the least possible assumptions on f in the distribution shape
35 Normal Distribution
[Figure: probability density of the 1-D standard normal distribution]
[Figure: probability density of a 2-D normal distribution]
37 The Multi-Variate (n-dimensional) Normal Distribution
Any multi-variate normal distribution $\mathcal{N}(m, C)$ is uniquely determined by its mean value $m \in \mathbb{R}^n$ and its symmetric positive definite $n \times n$ covariance matrix C.
The mean value m
- determines the displacement (translation)
- is the value with the largest density (modal value)
- the distribution is symmetric about the distribution mean
The covariance matrix C
- determines the shape
- geometrical interpretation: any covariance matrix can be uniquely identified with the iso-density ellipsoid $\{x \in \mathbb{R}^n \mid (x - m)^T C^{-1} (x - m) = 1\}$
[Figure: 2-D normal distribution]
38 Lines of Equal Density
... any covariance matrix can be uniquely identified with the iso-density ellipsoid $\{x \in \mathbb{R}^n \mid (x - m)^T C^{-1} (x - m) = 1\}$
- $\mathcal{N}(m, \sigma^2 I) \sim m + \sigma\, \mathcal{N}(0, I)$: one degree of freedom σ; components are independent standard normally distributed
- $\mathcal{N}(m, D^2) \sim m + D\, \mathcal{N}(0, I)$: n degrees of freedom; components are independent, scaled
- $\mathcal{N}(m, C) \sim m + C^{1/2}\, \mathcal{N}(0, I)$: $(n^2 + n)/2$ degrees of freedom; components are correlated
where I is the identity matrix (isotropic case) and D is a diagonal matrix (reasonable for separable problems), and $A\, \mathcal{N}(0, I) \sim \mathcal{N}(0, A A^T)$ holds for all A.
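The identity A N(0, I) ∼ N(0, A Aᵀ) is what makes sampling from N(m, C) practical. A minimal sketch, assuming the symmetric square root from an eigendecomposition and an arbitrary example covariance matrix chosen here:

```python
import numpy as np

rng = np.random.default_rng(1)
m = np.array([1.0, -2.0])
C = np.array([[4.0, 1.5],
              [1.5, 1.0]])                       # assumed example covariance

# Symmetric square root C^{1/2} = B diag(sqrt(eigval)) B^T.
eigval, B = np.linalg.eigh(C)
C_half = B @ np.diag(np.sqrt(eigval)) @ B.T

# x = m + C^{1/2} z with z ~ N(0, I), one sample per row.
z = rng.standard_normal((200_000, 2))
x = m + z @ C_half.T

emp_cov = np.cov(x, rowvar=False)
print(emp_cov)
```

Any A with A Aᵀ = C would do, for example a Cholesky factor; the symmetric root is convenient because the same decomposition also yields C^{-1/2}.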
41 Effect of Dimensionality
[Figure: 2-D normal distribution]
$\|\mathcal{N}(0, I) - \mathcal{N}(0, I)\| / \sqrt{2} \sim \|\mathcal{N}(0, I)\| \approx \mathcal{N}\left(\sqrt{n - 1/2},\, 1/2\right)$, with modal value $\sqrt{n - 1}$
yet: maximum entropy distribution
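The concentration of the norm of a standard normal vector can be checked empirically. The dimension and sample size below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Norms of 20000 independent N(0, I) samples in n dimensions.
norms = np.linalg.norm(rng.standard_normal((20_000, n)), axis=1)

mean_norm = norms.mean()
var_norm = norms.var()
print(mean_norm, np.sqrt(n - 0.5))
print(var_norm)
```

Although the density is largest at the mean, almost all samples lie in a thin shell of radius about √n, which is why the sampling width is governed by σ rather than by where the density peaks.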
42 Evolution Strategies: Terminology
Let µ: # of parents, λ: # of offspring
Plus (elitist) and comma (non-elitist) selection
- (µ + λ)-ES: selection in {parents} ∪ {offspring}
- (µ, λ)-ES: selection in {offspring}
(1 + 1)-ES
- Sample one offspring from parent m: $x = m + \sigma\, \mathcal{N}(0, C)$
- If x is better than m, select $m \leftarrow x$
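The difference between plus and comma selection fits in one function. The name `select` and the toy values are illustrative assumptions.

```python
def select(parents, offspring, f, mu, plus):
    """Return the mu best candidates under plus or comma selection."""
    # Plus selection: parents compete with offspring (elitist).
    # Comma selection: only offspring are eligible (non-elitist).
    pool = parents + offspring if plus else offspring
    return sorted(pool, key=f)[:mu]

f = abs                                   # toy objective, minimum at 0
parents, offspring = [0.1], [1.0, -2.0]

plus_survivors = select(parents, offspring, f, mu=1, plus=True)
comma_survivors = select(parents, offspring, f, mu=1, plus=False)
print(plus_survivors, comma_survivors)
```

Under plus selection the good parent survives; under comma selection it is discarded even though every offspring is worse.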
45 The (µ/µ, λ)-ES
Non-elitist selection and intermediate (weighted) recombination
Given the i-th solution point $x_i = m + \sigma\, y_i$ with $y_i := \mathcal{N}_i(0, C)$
Let $x_{i:\lambda}$ be the i-th ranked solution point, such that $f(x_{1:\lambda}) \le \dots \le f(x_{\lambda:\lambda})$.
The new mean reads
$m \leftarrow \sum_{i=1}^{\mu} w_i\, x_{i:\lambda} = m + \sigma\, y_w$, with $y_w := \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$
where $w_1 \ge \dots \ge w_\mu > 0$, $\sum_{i=1}^{\mu} w_i = 1$, $\frac{1}{\sum_{i=1}^{\mu} w_i^2} =: \mu_w \approx \frac{\lambda}{4}$
The best µ points are selected from the new solutions (non-elitistic) and weighted intermediate recombination is applied.
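A sketch of weighted recombination. The slide only states the constraints on the weights; the logarithmically decreasing weights below are a commonly used choice and are an assumption here, as are the function and variable names.

```python
import numpy as np

lam = 10
mu = lam // 2

# Assumed weight scheme: positive, decreasing, normalized to sum to one.
w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
w /= w.sum()
mu_w = 1.0 / np.sum(w**2)                 # variance effective selection mass

def recombine(m, sigma, y, fitness, w):
    """New mean m + sigma * y_w from the best len(w) steps y_i (rows of y)."""
    order = np.argsort(fitness)[: w.size]  # indices of the ranked points x_{i:lambda}
    y_w = w @ y[order]
    return m + sigma * y_w

y = np.arange(20.0).reshape(10, 2)        # toy steps
fitness = np.arange(10.0)[::-1]           # last row is the best
m_new = recombine(np.zeros(2), 1.0, y, fitness, w)
print(mu_w, m_new)
```

With equal weights µ_w equals µ; unequal weights lower it, which is why µ_w rather than µ appears in the learning rates.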
46 Invariance Under Monotonically Increasing Functions
Rank-based algorithms: the update of all parameters uses only the ranks [3]
$f(x_{1:\lambda}) \le f(x_{2:\lambda}) \le \dots \le f(x_{\lambda:\lambda})$
$g(f(x_{1:\lambda})) \le g(f(x_{2:\lambda})) \le \dots \le g(f(x_{\lambda:\lambda}))$ for all g
where g is strictly monotonically increasing; g preserves ranks
[3] Whitley. The GENITOR algorithm and selection pressure: Why rank-based allocation of reproductive trials is best. ICGA
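Rank preservation is easy to verify numerically. The transformation g below is an arbitrary strictly increasing function chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
fx = rng.standard_normal(8) ** 2          # some objective function values

def g(v):
    """Assumed strictly monotonically increasing transformation."""
    return np.exp(3.0 * v) + 1.0

ranking_f = np.argsort(fx)                # ranking under f
ranking_g = np.argsort(g(fx))             # ranking under g(f)
print(ranking_f, ranking_g)
```

An algorithm that only consumes the ranking therefore produces the same sequence of search points on f and on g∘f.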
47 Basic Invariance in Search Space
Translation invariance is true for most optimization algorithms: $f(x) \leftrightarrow f(x - a)$
Identical behavior on f and $f_a$
- $f : x \mapsto f(x)$, $x^{(t=0)} = x_0$
- $f_a : x \mapsto f(x - a)$, $x^{(t=0)} = x_0 + a$
No difference can be observed w.r.t. the argument of f
48 Rotational Invariance in Search Space
Invariance to orthogonal (rigid) transformations R, where $R R^T = I$: $f(x) \leftrightarrow f(Rx)$
e.g. true for simple evolution strategies; recombination operators might jeopardize rotational invariance
Identical behavior on f and $f_R$ [4, 5]
- $f : x \mapsto f(x)$, $x^{(t=0)} = x_0$
- $f_R : x \mapsto f(Rx)$, $x^{(t=0)} = R^{-1}(x_0)$
No difference can be observed w.r.t. the argument of f
[4] Salomon (1996). Reevaluating Genetic Algorithm Performance under Coordinate Rotation of Benchmark Functions; A survey of some theoretical and practical aspects of genetic algorithms. BioSystems, 39(3)
[5] Hansen. Invariance, Self-Adaptation and Correlated Mutations in Evolution Strategies. Parallel Problem Solving from Nature PPSN VI
49 Invariance: Impact
"The grand aim of all science is to cover the greatest number of empirical facts by logical deduction from the smallest number of hypotheses or axioms." (Albert Einstein)
- Empirical performance results, for example from benchmark functions or from solved real world problems, are only useful if they do generalize to other problems
- Invariance is a strong non-empirical statement about generalization: generalizing (identical) performance from a single function to a whole class of functions
Consequently, invariance is important for the evaluation of search algorithms.
50 Step-Size Control
51 Evolution Strategies: Recalling
New search points are sampled normally distributed
$x_i \sim m + \sigma\, \mathcal{N}_i(0, C)$ for $i = 1, \dots, \lambda$
as perturbations of m, where $x_i, m \in \mathbb{R}^n$, $\sigma \in \mathbb{R}_+$, $C \in \mathbb{R}^{n \times n}$, and where
- the mean vector $m \in \mathbb{R}^n$ represents the favorite solution, and $m \leftarrow \sum_{i=1}^{\mu} w_i\, x_{i:\lambda}$
- the so-called step-size $\sigma \in \mathbb{R}_+$ controls the step length
- the covariance matrix $C \in \mathbb{R}^{n \times n}$ determines the shape of the distribution ellipsoid
The remaining question is how to update σ and C.
52 Step-Size Control Why Step-Size Control? Why Step-Size Control 10 0 step size too small random search function value optimal step size (scale invariant) constant step size step size too large function evaluations x 10 4 f (x) = n i=1 x 2 i in [ 2.2, 0.8] n for n = 10 Anne Auger & Nikolaus Hansen () CMA-ES June, / 82
53 Step-Size Control Why Step-Size Control Why Step-Size Control? (5/5, 10)-ES, 11 runs 10 0 with optimal step-size with step-size control 10-1 m x f (x) = n i=1 x 2 i for n = 10 and x 0 [ 0.2, 0.8] n function evaluations with optimal step-size σ Anne Auger & Nikolaus Hansen () CMA-ES June, / 82
54 Step-Size Control Why Step-Size Control Why Step-Size Control? (5/5, 10)-ES, 2 11 runs 10 0 with optimal step-size with step-size control 10-1 m x f (x) = n i=1 x 2 i for n = 10 and x 0 [ 0.2, 0.8] n function evaluations with optimal versus adaptive step-size σ with too small initial σ Anne Auger & Nikolaus Hansen () CMA-ES June, / 82
55 Step-Size Control Why Step-Size Control? (5/5, 10)-ES Why Step-Size Control 10 0 with optimal step-size with step-size control respective step-size 10-1 m x f (x) = n i=1 x 2 i for n = 10 and x 0 [ 0.2, 0.8] n function evaluations comparing number of f -evals to reach m = 10 5 : Anne Auger & Nikolaus Hansen () CMA-ES June, / 82
56 Step-Size Control: Why Step-Size Control?
[Figure: $\|m - x^*\|$ and the respective step-size versus function evaluations for the (5/5, 10)-ES, with optimal step-size and with step-size control; $f(x) = \sum_{i=1}^{n} x_i^2$ in $[-0.2, 0.8]^n$; comparing optimal versus default damping parameter $d_\sigma$.]
57 Step-Size Control: Why Step-Size Control?
[Figure, left panel: function value versus function evaluations for random search, constant $\sigma$, optimal step size (scale invariant), and adaptive step size $\sigma$. Right panel: normalized progress versus normalized step size.]
The evolution window refers to the step-size interval where reasonable performance is observed.
58 Step-Size Control: Methods for Step-Size Control
1/5-th success rule [a, b], often applied with "+"-selection: increase step-size if more than 20% of the new solutions are successful, decrease otherwise
σ-self-adaptation [c], applied with ","-selection: mutation is applied to the step-size and the better, according to the objective function value, is selected; simplified global self-adaptation
path length control [d] (Cumulative Step-size Adaptation, CSA): self-adaptation derandomized and non-localized
[a] Rechenberg 1973, Evolutionsstrategie, Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog
[b] Schumer and Steiglitz, Adaptive step size random search. IEEE TAC
[c] Schwefel 1981, Numerical Optimization of Computer Models, Wiley
[d] Hansen & Ostermeier 2001, Completely Derandomized Self-Adaptation in Evolution Strategies, Evol. Comput. 9(2)
59 Step-Size Control: One-Fifth Success Rule
[Figure: two sketches, labeled "increase σ" and "decrease σ".]
60 Step-Size Control: One-Fifth Success Rule
[Figure: probability of success ($p_s$) in two situations, with labels 1/2, 1/5 and "too small".]
61 Step-Size Control: One-Fifth Success Rule
$p_s$: # of successful offspring / # offspring (per generation)
$\sigma \leftarrow \sigma \times \exp\left(\frac{1}{3} \times \frac{p_s - p_{\text{target}}}{1 - p_{\text{target}}}\right)$
Increase $\sigma$ if $p_s > p_{\text{target}}$
Decrease $\sigma$ if $p_s < p_{\text{target}}$
(1 + 1)-ES, $p_{\text{target}} = 1/5$:
IF offspring better than parent: $p_s = 1$, $\sigma \leftarrow \sigma \times \exp(1/3)$
ELSE: $p_s = 0$, $\sigma \leftarrow \sigma / \exp(1/3)^{1/4}$
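The rule above can be tried out with a small (1+1)-ES. The following is a sketch under stated assumptions rather than a reference implementation: the function names `sigma_factor` and `one_plus_one_es`, the starting point, the deliberately too small initial step-size and the iteration budget are all chosen for illustration. It uses the sphere function from the preceding slides.

```python
import math
import random

def sphere(x):
    """f(x) = sum_i x_i^2."""
    return sum(xi * xi for xi in x)

def sigma_factor(p_s, p_target=0.2):
    """Multiplier exp(1/3 * (p_s - p_target) / (1 - p_target)) from the slide."""
    return math.exp((1.0 / 3.0) * (p_s - p_target) / (1.0 - p_target))

def one_plus_one_es(f, x0, sigma0, iterations=2000, seed=1):
    """(1+1)-ES with the one-fifth success rule (p_target = 1/5)."""
    rng = random.Random(seed)
    x, fx, sigma = list(x0), f(x0), sigma0
    for _ in range(iterations):
        # One offspring: isotropic Gaussian perturbation of the parent.
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy < fx:
            x, fx = y, fy
            sigma *= sigma_factor(1.0)   # success: sigma * exp(1/3)
        else:
            sigma *= sigma_factor(0.0)   # failure: sigma / exp(1/3)^(1/4)
    return x, fx, sigma

x_best, f_best, sigma_final = one_plus_one_es(sphere, [0.8] * 10, sigma0=1e-3)
print(f_best, sigma_final)
```

With $p_{\text{target}} = 1/5$, one success followed by four failures leaves $\sigma$ unchanged, which is why the step-size is stationary at a success rate of one fifth.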
62 Step-Size Control, Path Length Control (CSA): The Concept of Cumulative Step-Size Adaptation
Measure the length of the evolution path, the pathway of the mean vector $m$ in the generation sequence.
$x_i = m + \sigma\, y_i$, $m \leftarrow m + \sigma y_w$
[Figure: two evolution paths, one labeled "decrease σ" and one labeled "increase σ".]
Loosely speaking, steps are
perpendicular under random selection (in expectation)
perpendicular in the desired situation (to be most efficient)
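63 Step-Size Control, Path Length Control (CSA): The Equations
Initialize $m \in \mathbb{R}^n$, $\sigma \in \mathbb{R}_+$, evolution path $p_\sigma = 0$, set $c_\sigma \approx 4/n$, $d_\sigma \approx 1$.
$m \leftarrow m + \sigma y_w$ where $y_w = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$ (update mean)
$p_\sigma \leftarrow (1 - c_\sigma)\, p_\sigma + \sqrt{1 - (1 - c_\sigma)^2}\, \sqrt{\mu_w}\; y_w$, where $\sqrt{1 - (1 - c_\sigma)^2}$ accounts for $1 - c_\sigma$ and $\sqrt{\mu_w}$ accounts for $w_i$
$\sigma \leftarrow \sigma \times \exp\left(\frac{c_\sigma}{d_\sigma}\left(\frac{\|p_\sigma\|}{E\|\mathcal{N}(0, I)\|} - 1\right)\right)$ (update step-size), where the factor is $> 1 \iff \|p_\sigma\|$ is greater than its expectation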
65 Step-Size Control, Path Length Control (CSA): (5/5, 10)-CSA-ES, default parameters
[Figure: $\|m - x^*\|$ and the respective step-size versus function evaluations, with optimal step-size and with step-size control; $f(x) = \sum_{i=1}^{n} x_i^2$ in $[-0.2, 0.8]^n$.]
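The CSA equations can be sketched as a (5/5, 10)-ES with $C = I$, in pure Python. Several details are assumptions that the slides do not fix: the function name `csa_es`, equal weights $w_i = 1/\mu$, the start point, the initial $\sigma$, the number of generations, and the common approximation $E\|\mathcal{N}(0, I)\| \approx \sqrt{n}\,(1 - \frac{1}{4n} + \frac{1}{21 n^2})$.

```python
import math
import random

def sphere(x):
    return sum(xi * xi for xi in x)

def csa_es(f, m0, sigma0, lam=10, mu=5, generations=300, seed=3):
    """(mu/mu, lambda)-ES with cumulative step-size adaptation and C = I."""
    rng = random.Random(seed)
    n = len(m0)
    w = [1.0 / mu] * mu                          # assumption: equal weights
    mu_w = 1.0 / sum(wi * wi for wi in w)
    c_sigma, d_sigma = 4.0 / n, 1.0              # settings from the slide (needs n > 4)
    chi_n = math.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n * n))  # approx. E||N(0,I)||
    m, sigma = list(m0), sigma0
    p_sigma = [0.0] * n
    for _ in range(generations):
        # Sample lambda steps y_i ~ N(0, I) and rank them by f(m + sigma * y_i).
        ys = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(lam)]
        ys.sort(key=lambda y: f([mi + sigma * yi for mi, yi in zip(m, y)]))
        y_w = [sum(w[i] * ys[i][j] for i in range(mu)) for j in range(n)]
        m = [mi + sigma * yj for mi, yj in zip(m, y_w)]            # update mean
        a = math.sqrt(1 - (1 - c_sigma) ** 2) * math.sqrt(mu_w)
        p_sigma = [(1 - c_sigma) * p + a * yj for p, yj in zip(p_sigma, y_w)]
        norm_p = math.sqrt(sum(p * p for p in p_sigma))
        sigma *= math.exp(c_sigma / d_sigma * (norm_p / chi_n - 1))  # update step-size
    return m, sigma

m_final, sigma_final = csa_es(sphere, [0.8] * 10, sigma0=1e-2)
print(sphere(m_final), sigma_final)
```

Because the step-size only depends on the length of $p_\sigma$ relative to its expected length under random selection, no success counting is needed and the rule works with ","-selection.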
66 Covariance Matrix Adaptation
Outline: 1 Problem Statement. 2 Evolution Strategies. 3 Step-Size Control. 4 Covariance Matrix Adaptation (Covariance Matrix Rank-One Update; Cumulation: the Evolution Path; Covariance Matrix Rank-µ Update). 5 CMA-ES Summary. 6 Theoretical Foundations. 7 Comparing Experiments. 8 Summary and Final Remarks.
67 Covariance Matrix Adaptation: Evolution Strategies, Recalling
New search points are sampled normally distributed
$x_i \sim m + \sigma\,\mathcal{N}_i(0, C)$ for $i = 1, \dots, \lambda$
as perturbations of $m$, where $x_i, m \in \mathbb{R}^n$, $\sigma \in \mathbb{R}_+$, $C \in \mathbb{R}^{n \times n}$, and where
the mean vector $m \in \mathbb{R}^n$ represents the favorite solution
the so-called step-size $\sigma \in \mathbb{R}_+$ controls the step length
the covariance matrix $C \in \mathbb{R}^{n \times n}$ determines the shape of the distribution ellipsoid
The remaining question is how to update $C$.
68-75 Covariance Matrix Adaptation, Covariance Matrix Rank-One Update: Rank-One Update
$m \leftarrow m + \sigma y_w$, $y_w = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$, $y_i \sim \mathcal{N}_i(0, C)$
[Figure sequence, one panel per slide:
initial distribution, $C = I$;
$y_w$, movement of the population mean $m$ (disregarding $\sigma$);
mixture of distribution $C$ and step $y_w$, $C \leftarrow 0.8 \times C + 0.2 \times y_w y_w^T$;
new distribution (disregarding $\sigma$);
movement of the population mean $m$;
mixture of distribution $C$ and step $y_w$, $C \leftarrow 0.8 \times C + 0.2 \times y_w y_w^T$.]
76 Covariance Matrix Adaptation, Covariance Matrix Rank-One Update: Rank-One Update
$m \leftarrow m + \sigma y_w$, $y_w = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$, $y_i \sim \mathcal{N}_i(0, C)$
[Figure: new distribution, $C \leftarrow 0.8 \times C + 0.2 \times y_w y_w^T$.]
The ruling principle: the adaptation increases the likelihood of successful steps, $y_w$, to appear again.
Another viewpoint: the adaptation follows a natural gradient approximation of the expected fitness.
77 Covariance Matrix Adaptation, Covariance Matrix Rank-One Update: Rank-One Update
Initialize $m \in \mathbb{R}^n$, and $C = I$, set $\sigma = 1$, learning rate $c_{\text{cov}} \approx 2/n^2$
While not terminate
$x_i = m + \sigma\, y_i$, $y_i \sim \mathcal{N}_i(0, C)$,
$m \leftarrow m + \sigma y_w$ where $y_w = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$
$C \leftarrow (1 - c_{\text{cov}})\, C + c_{\text{cov}}\, \mu_w\, y_w y_w^T$ (rank-one), where $\mu_w = \frac{1}{\sum_{i=1}^{\mu} w_i^2} \ge 1$
The rank-one update has been found independently in several domains [6, 7, 8, 9].
[6] Kjellström & Taxén, Stochastic Optimization in System Design, IEEE TCS
[7] Hansen & Ostermeier, Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation, ICEC
[8] Ljung, System Identification: Theory for the User
[9] Haario et al., An adaptive Metropolis algorithm, JSTOR
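A minimal NumPy sketch of this loop follows. The helper names `rank_one_update` and `generation`, the equal weights, the small ellipsoid test function and the number of iterations are assumptions for illustration. The first call reproduces the mixture $0.8\,C + 0.2\,y_w y_w^T$ for $\mu_w = 1$ and $c_{\text{cov}} = 0.2$.

```python
import numpy as np

def rank_one_update(C, y_w, mu_w, c_cov):
    """C <- (1 - c_cov) C + c_cov * mu_w * y_w y_w^T."""
    return (1.0 - c_cov) * C + c_cov * mu_w * np.outer(y_w, y_w)

def generation(f, m, sigma, C, weights, lam, c_cov, rng):
    """One loop iteration: sample, select, move the mean, update C."""
    mu = len(weights)
    mu_w = 1.0 / np.sum(weights ** 2)
    A = np.linalg.cholesky(C)                        # C = A A^T
    Y = rng.standard_normal((lam, len(m))) @ A.T     # rows y_i ~ N(0, C)
    order = np.argsort([f(m + sigma * y) for y in Y])
    y_w = weights @ Y[order[:mu]]                    # weighted mean of the best mu steps
    return m + sigma * y_w, rank_one_update(C, y_w, mu_w, c_cov)

# With C = I, y_w = (2, 0), mu_w = 1 and c_cov = 0.2 the variance along y_w grows.
C_demo = rank_one_update(np.eye(2), np.array([2.0, 0.0]), mu_w=1.0, c_cov=0.2)
print(C_demo)

# A few generations with sigma = 1 fixed, as on the slide (hypothetical test function).
rng = np.random.default_rng(1)
n, lam = 3, 6
weights = np.full(3, 1 / 3)
ellipsoid = lambda x: x[0] ** 2 + 10 * x[1] ** 2 + 100 * x[2] ** 2
m, C = np.ones(n), np.eye(n)
for _ in range(50):
    m, C = generation(ellipsoid, m, 1.0, C, weights, lam, 2 / n ** 2, rng)
print(np.linalg.eigvalsh(C))
```

Since the update adds a positive semidefinite matrix to a scaled positive definite one, $C$ stays positive definite as long as $c_{\text{cov}} < 1$.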
78 Covariance Matrix Adaptation: Covariance Matrix Rank-One Update
The covariance matrix adaptation $C \leftarrow (1 - c_{\text{cov}})\, C + c_{\text{cov}}\, \mu_w\, y_w y_w^T$
learns all pairwise dependencies between variables: off-diagonal entries in the covariance matrix reflect the dependencies
conducts a principal component analysis (PCA) of steps $y_w$, sequentially in time and space: eigenvectors of the covariance matrix $C$ are the principal components / the principal axes of the mutation ellipsoid; rotationally invariant
learns a new, rotated problem representation and a new metric (Mahalanobis): components are independent (only) in the new representation; rotationally invariant
approximates the inverse Hessian on quadratic functions: overwhelming empirical evidence, proof is in progress
is entirely independent of the given coordinate system
for $\mu = 1$: natural gradient ascent on $\mathcal{N}$
79 Covariance Matrix Adaptation
Outline: 1 Problem Statement. 2 Evolution Strategies. 3 Step-Size Control. 4 Covariance Matrix Adaptation (Covariance Matrix Rank-One Update; Cumulation: the Evolution Path; Covariance Matrix Rank-µ Update). 5 CMA-ES Summary. 6 Theoretical Foundations. 7 Comparing Experiments. 8 Summary and Final Remarks.
80 Covariance Matrix Adaptation, Cumulation: The Evolution Path
Evolution Path: Conceptually, the evolution path is the search path the strategy takes over a number of generation steps. It can be expressed as a sum of consecutive steps of the mean $m$.
An exponentially weighted sum of steps $y_w$ is used:
$p_c \propto \sum_{i=0}^{g} (1 - c_c)^{g - i}\, y_w^{(i)}$, where $(1 - c_c)^{g - i}$ are exponentially fading weights
The recursive construction of the evolution path (cumulation):
$p_c \leftarrow (1 - c_c)\, p_c + \sqrt{1 - (1 - c_c)^2}\, \sqrt{\mu_w}\; y_w$
with decay factor $(1 - c_c)$, normalization factor $\sqrt{1 - (1 - c_c)^2}\, \sqrt{\mu_w}$, and input $y_w = \frac{m - m_{\text{old}}}{\sigma}$,
where $\mu_w = \frac{1}{\sum w_i^2}$, $c_c \ll 1$. History information is accumulated in the evolution path.
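The effect of cumulation can be seen in one dimension. The sketch below, with a hypothetical helper `cumulate` and an arbitrarily chosen $c = 0.1$, feeds unit steps through the recursion (taking $\sqrt{\mu_w}\, y_w$ as the input). Steps that keep the same sign add up to a long path, while alternating steps cancel to a short one.

```python
import math

def cumulate(steps, c):
    """Apply p <- (1 - c) p + sqrt(1 - (1 - c)^2) * step, starting from p = 0."""
    p = 0.0
    a = math.sqrt(1.0 - (1.0 - c) ** 2)   # normalization factor
    for s in steps:
        p = (1.0 - c) * p + a * s
    return p

c = 0.1
same_direction = cumulate([1.0] * 500, c)                     # consistent steps
alternating = cumulate([(-1.0) ** t for t in range(500)], c)  # sign flips every step
print(same_direction, abs(alternating))
```

In the stationary limit the path length is $\sqrt{(2 - c)/c}$ for consistent unit steps and $\sqrt{c/(2 - c)}$ for alternating ones. For independent inputs of unit variance the normalization factor keeps the variance of the path at one, since $(1 - c)^2 + (1 - (1 - c)^2) = 1$.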
82 Covariance Matrix Adaptation: Cumulation, the Evolution Path
Cumulation is a widely used technique and also known as
low-pass filter
exponential smoothing in time series, forecasting
exponentially weighted moving average
iterate averaging in stochastic approximation
momentum in the back-propagation algorithm for ANNs
83 Covariance Matrix Adaptation, Cumulation: Utilizing the Evolution Path
We used $y_w y_w^T$ for updating $C$. Because $y_w y_w^T = -y_w (-y_w)^T$ the sign of $y_w$ is lost.
The sign information is (re-)introduced by using the evolution path.
$p_c \leftarrow (1 - c_c)\, p_c + \sqrt{1 - (1 - c_c)^2}\, \sqrt{\mu_w}\; y_w$, with decay factor $(1 - c_c)$ and normalization factor $\sqrt{1 - (1 - c_c)^2}\, \sqrt{\mu_w}$
$C \leftarrow (1 - c_{\text{cov}})\, C + c_{\text{cov}}\, p_c p_c^T$ (rank-one)
where $\mu_w = \frac{1}{\sum w_i^2}$, $c_{\text{cov}} \ll c_c \ll 1$ such that $1/c_c$ is the "backward time horizon".
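The following NumPy sketch illustrates why the path carries sign information that the outer product alone does not. The names `path_rank_one_update` and `run`, as well as the values $c_c = 0.2$, $c_{\text{cov}} = 0.05$, $\mu_w = 1$ and the artificial step sequences, are assumptions for the demonstration.

```python
import numpy as np

def path_rank_one_update(C, p_c, y_w, mu_w, c_c, c_cov):
    """Cumulate y_w into p_c, then update C with the outer product of the path."""
    p_c = (1 - c_c) * p_c + np.sqrt(1 - (1 - c_c) ** 2) * np.sqrt(mu_w) * y_w
    C = (1 - c_cov) * C + c_cov * np.outer(p_c, p_c)
    return C, p_c

def run(steps, c_c=0.2, c_cov=0.05):
    """Feed a sequence of steps y_w through the update, starting from C = I."""
    C, p_c = np.eye(2), np.zeros(2)
    for y_w in steps:
        C, p_c = path_rank_one_update(C, p_c, y_w, mu_w=1.0, c_c=c_c, c_cov=c_cov)
    return C

y = np.array([1.0, 0.0])
same_outer = np.allclose(np.outer(y, y), np.outer(-y, -y))   # the sign is lost here
C_same = run([y] * 50)                                       # 50 steps in one direction
C_alt = run([y if t % 2 == 0 else -y for t in range(50)])    # steps that flip sign
print(same_outer, C_same[0, 0], C_alt[0, 0])
```

Both sequences have identical outer products $y_w y_w^T$ in every generation, yet only the consistent sequence leads to a strongly increased variance along the first axis.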
86 Covariance Matrix Adaptation: Cumulation, the Evolution Path
Using an evolution path for the rank-one update of the covariance matrix reduces the number of function evaluations to adapt to a straight ridge from $O(n^2)$ to $O(n)$ [a].
[Figure: number of $f$-evaluations divided by dimension on the cigar function, versus dimension, for different settings of $c_c$, including $c_c = 1/\sqrt{n}$ and $c_c = 1/n$.]
The overall model complexity is $n^2$ but important parts of the model can be learned in time of order $n$.
[a] Hansen, Müller and Koumoutsakos. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1)
87 Covariance Matrix Adaptation, Covariance Matrix Rank-µ Update: Rank-µ Update
$x_i = m + \sigma\, y_i$, $y_i \sim \mathcal{N}_i(0, C)$, $m \leftarrow m + \sigma y_w$, $y_w = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$
The rank-µ update extends the update rule for large population sizes $\lambda$ using $\mu > 1$ vectors to update $C$ at each generation step.
The matrix
$C_\mu = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}\, y_{i:\lambda}^T$
computes a weighted mean of the outer products of the best $\mu$ steps and has rank $\min(\mu, n)$ with probability one.
With $\mu = \lambda$ weights can be negative [10].
The rank-µ update then reads
$C \leftarrow (1 - c_{\text{cov}})\, C + c_{\text{cov}}\, C_\mu$
where $c_{\text{cov}} \approx \mu_w / n^2$ and $c_{\text{cov}} \le 1$.
[10] Jastrebski and Arnold (2006). Improving evolution strategies through active covariance matrix adaptation. CEC.
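The construction of $C_\mu$ and its rank can be checked numerically. In this sketch the helper names `rank_mu_matrix` and `rank_mu_update` are hypothetical, the weights are equal, and the "selected" steps are simply random Gaussian vectors, which is enough to illustrate the rank statement.

```python
import numpy as np

def rank_mu_matrix(Y_sel, weights):
    """C_mu = sum_i w_i y_i y_i^T, with the selected steps y_i as rows of Y_sel."""
    return (Y_sel * weights[:, None]).T @ Y_sel

def rank_mu_update(C, Y_sel, weights, c_cov):
    """C <- (1 - c_cov) C + c_cov C_mu."""
    return (1 - c_cov) * C + c_cov * rank_mu_matrix(Y_sel, weights)

rng = np.random.default_rng(3)
n = 5
ranks = {}
for mu in (3, 8):
    Y_sel = rng.standard_normal((mu, n))       # stand-in for the best mu steps
    weights = np.full(mu, 1 / mu)              # assumption: equal weights
    ranks[mu] = int(np.linalg.matrix_rank(rank_mu_matrix(Y_sel, weights)))
print(ranks)                                   # rank is min(mu, n) in both cases

C_new = rank_mu_update(np.eye(n), Y_sel, weights, c_cov=0.3)
```

For $\mu < n$ the matrix $C_\mu$ alone is singular, so the convex combination with the previous $C$ is what keeps the updated matrix positive definite.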
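90 Covariance Matrix Adaptation: Covariance Matrix Rank-µ Update
$x_i = m + \sigma\, y_i$, $y_i \sim \mathcal{N}(0, C)$
$C_\mu = \frac{1}{\mu} \sum y_{i:\lambda}\, y_{i:\lambda}^T$
$C \leftarrow (1 - 1) \times C + 1 \times C_\mu$
$m_{\text{new}} \leftarrow m + \frac{1}{\mu} \sum y_{i:\lambda}$
[Figure, three panels: sampling of $\lambda = 150$ solutions where $C = I$ and $\sigma = 1$; calculating $C$ where $\mu = 50$, $w_1 = \dots = w_\mu = \frac{1}{\mu}$, and $c_{\text{cov}} = 1$; new distribution.]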
91 Covariance Matrix Adaptation, Covariance Matrix Rank-µ Update: Rank-µ CMA versus Estimation of Multivariate Normal Algorithm EMNA_global [11]
rank-µ CMA (conducts a PCA of steps):
$x_i = m_{\text{old}} + y_i$, $y_i \sim \mathcal{N}(0, C)$
$C \leftarrow \frac{1}{\mu} \sum (x_{i:\lambda} - m_{\text{old}})(x_{i:\lambda} - m_{\text{old}})^T$
$m_{\text{new}} = m_{\text{old}} + \frac{1}{\mu} \sum y_{i:\lambda}$
EMNA_global (conducts a PCA of points):
$x_i = m_{\text{old}} + y_i$, $y_i \sim \mathcal{N}(0, C)$
$C \leftarrow \frac{1}{\mu} \sum (x_{i:\lambda} - m_{\text{new}})(x_{i:\lambda} - m_{\text{new}})^T$
$m_{\text{new}} = m_{\text{old}} + \frac{1}{\mu} \sum y_{i:\lambda}$
[Figure, three panels for each method: sampling of $\lambda = 150$ solutions (dots); calculating $C$ from $\mu = 50$ solutions; new distribution.]
The CMA-update yields a larger variance in particular in gradient direction, because $m_{\text{new}}$ is the minimizer for the variances when calculating $C$.
[11] Hansen, N. (2006). The CMA Evolution Strategy: A Comparing Review. In J.A. Lozano, P. Larrañaga, I. Inza and E. Bengoetxea (Eds.). Towards a new evolutionary computation. Advances in estimation of distribution algorithms.
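The difference between the two estimates can be made concrete with one generation of samples. This sketch uses $\lambda = 150$, $\mu = 50$, $C = I$ and $\sigma = 1$ as in the figure; the linear test function and the seed are assumptions. With equal weights the two matrices differ exactly by the outer product of the mean shift $m_{\text{new}} - m_{\text{old}}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam, mu = 2, 150, 50
m_old = np.zeros(n)
f = lambda x: x[0] + x[1]                    # hypothetical linear function (minimized)

X = m_old + rng.standard_normal((lam, n))    # samples with C = I and sigma = 1
best = np.argsort([f(x) for x in X])[:mu]    # indices of the mu best points
X_sel = X[best]
m_new = X_sel.mean(axis=0)                   # equal weights w_i = 1/mu

D_old = X_sel - m_old                        # steps, relative to the old mean
D_new = X_sel - m_new                        # points, relative to the new mean
C_cma = D_old.T @ D_old / mu                 # rank-mu CMA estimate
C_emna = D_new.T @ D_new / mu                # EMNA_global estimate
shift = m_new - m_old

var_cma = shift @ C_cma @ shift              # variance measure along the mean shift
var_emna = shift @ C_emna @ shift
print(var_cma, var_emna)
```

The identity `C_cma = C_emna + outer(shift, shift)` shows that the CMA estimate adds variance exactly along the direction in which the mean moved, which on a linear function is the (negative) gradient direction.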
92 Covariance Matrix Adaptation: Covariance Matrix Rank-µ Update
The rank-µ update
increases the possible learning rate in large populations, roughly from $2/n^2$ to $\mu_w/n^2$
can reduce the number of necessary generations roughly from $O(n^2)$ to $O(n)$ [12], given $\mu_w \propto \lambda \propto n$
Therefore the rank-µ update is the primary mechanism whenever a large population size is used, say $\lambda \ge 3n + 10$.
The rank-one update
uses the evolution path and reduces the number of necessary function evaluations to learn straight ridges from $O(n^2)$ to $O(n)$.
Rank-one update and rank-µ update can be combined.
[12] Hansen, Müller, and Koumoutsakos. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1)
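95 CMA-ES Summary: Summary of Equations, The Covariance Matrix Adaptation Evolution Strategy
Input: $m \in \mathbb{R}^n$, $\sigma \in \mathbb{R}_+$, $\lambda$
Initialize: $C = I$, and $p_c = 0$, $p_\sigma = 0$,
Set: $c_c \approx 4/n$, $c_\sigma \approx 4/n$, $c_1 \approx 2/n^2$, $c_\mu \approx \mu_w/n^2$, $c_1 + c_\mu \le 1$, $d_\sigma \approx 1 + \sqrt{\mu_w/n}$, and $w_{i=1\dots\lambda}$ such that $\mu_w = \frac{1}{\sum_{i=1}^{\mu} w_i^2} \approx 0.3\,\lambda$
While not terminate
$x_i = m + \sigma\, y_i$, $y_i \sim \mathcal{N}_i(0, C)$, for $i = 1, \dots, \lambda$ (sampling)
$m \leftarrow \sum_{i=1}^{\mu} w_i\, x_{i:\lambda} = m + \sigma y_w$ where $y_w = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$ (update mean)
$p_c \leftarrow (1 - c_c)\, p_c + \mathbb{1}_{\{\|p_\sigma\| < 1.5\sqrt{n}\}}\, \sqrt{1 - (1 - c_c)^2}\, \sqrt{\mu_w}\; y_w$ (cumulation for $C$)
$p_\sigma \leftarrow (1 - c_\sigma)\, p_\sigma + \sqrt{1 - (1 - c_\sigma)^2}\, \sqrt{\mu_w}\; C^{-\frac{1}{2}} y_w$ (cumulation for $\sigma$)
$C \leftarrow (1 - c_1 - c_\mu)\, C + c_1\, p_c p_c^T + c_\mu \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}\, y_{i:\lambda}^T$ (update $C$)
$\sigma \leftarrow \sigma \times \exp\left(\frac{c_\sigma}{d_\sigma}\left(\frac{\|p_\sigma\|}{E\|\mathcal{N}(0, I)\|} - 1\right)\right)$ (update of $\sigma$)
Not covered on this slide: termination, restarts, useful output, boundaries and encoding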
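The equations on this slide translate fairly directly into NumPy. The following is a sketch under stated assumptions, not the authors' source code: the default $\lambda = 4 + \lfloor 3 \ln n \rfloor$, $\mu = \lfloor \lambda/2 \rfloor$, the log-decreasing weights, the approximation of $E\|\mathcal{N}(0, I)\|$, the fixed iteration budget and the ellipsoid test function are choices made here for illustration. Termination, restarts and boundary handling are omitted, as on the slide.

```python
import numpy as np

def cma_es(f, m0, sigma0, lam=None, max_iter=500, seed=0):
    """Minimal CMA-ES following the summary equations (assumes n > 4)."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m0, dtype=float)
    n, sigma = m.size, float(sigma0)
    lam = lam or 4 + int(3 * np.log(n))          # assumption: common default
    mu = lam // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))  # assumption: log weights
    w /= w.sum()
    mu_w = 1.0 / np.sum(w ** 2)
    c_c = c_sigma = 4.0 / n
    c_1 = 2.0 / n ** 2
    c_mu = min(mu_w / n ** 2, 1.0 - c_1)         # enforces c_1 + c_mu <= 1
    d_sigma = 1.0 + np.sqrt(mu_w / n)
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))  # approx. E||N(0,I)||
    C, p_c, p_sigma = np.eye(n), np.zeros(n), np.zeros(n)
    for _ in range(max_iter):
        d2, B = np.linalg.eigh(C)                # C = B diag(d2) B^T
        d = np.sqrt(np.maximum(d2, 1e-20))
        Y = (rng.standard_normal((lam, n)) * d) @ B.T    # rows y_i ~ N(0, C)
        order = np.argsort([f(m + sigma * y) for y in Y])
        Y_sel = Y[order[:mu]]
        y_w = w @ Y_sel
        m = m + sigma * y_w                                          # update mean
        h = 1.0 if np.linalg.norm(p_sigma) < 1.5 * np.sqrt(n) else 0.0
        p_c = (1 - c_c) * p_c + h * np.sqrt(1 - (1 - c_c) ** 2) * np.sqrt(mu_w) * y_w
        C_inv_sqrt_y = B @ ((B.T @ y_w) / d)                         # C^{-1/2} y_w
        p_sigma = ((1 - c_sigma) * p_sigma
                   + np.sqrt(1 - (1 - c_sigma) ** 2) * np.sqrt(mu_w) * C_inv_sqrt_y)
        C = ((1 - c_1 - c_mu) * C + c_1 * np.outer(p_c, p_c)
             + c_mu * (Y_sel * w[:, None]).T @ Y_sel)                # update C
        sigma *= np.exp(c_sigma / d_sigma * (np.linalg.norm(p_sigma) / chi_n - 1))
    return m, sigma, C

def ellipsoid(x):
    """Hypothetical test function with condition number 100."""
    n = len(x)
    return float(np.sum(10.0 ** (2.0 * np.arange(n) / (n - 1)) * x ** 2))

m_final, sigma_final, C_final = cma_es(ellipsoid, m0=np.full(10, 0.8), sigma0=0.5)
print(ellipsoid(m_final), sigma_final)
```

The indicator in the $p_c$ update is evaluated here with $p_\sigma$ from the previous iteration, following the order of the equations on the slide; implementations may instead use the freshly updated $p_\sigma$. The eigendecomposition is recomputed in every iteration for simplicity, which costs $O(n^3)$ per generation.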
96 CMA-ES Summary: Source Code Snippet
More informationComputational Intelligence Winter Term 2017/18
Computational Intelligence Winter Term 2017/18 Prof. Dr. Günter Rudolph Lehrstuhl für Algorithm Engineering (LS 11) Fakultät für Informatik TU Dortmund mutation: Y = X + Z Z ~ N(0, C) multinormal distribution
More informationCLASSICAL gradient methods and evolutionary algorithms
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 2, NO. 2, JULY 1998 45 Evolutionary Algorithms and Gradient Search: Similarities and Differences Ralf Salomon Abstract Classical gradient methods and
More informationComparison of NEWUOA with Different Numbers of Interpolation Points on the BBOB Noiseless Testbed
Comparison of NEWUOA with Different Numbers of Interpolation Points on the BBOB Noiseless Testbed Raymond Ros To cite this version: Raymond Ros. Comparison of NEWUOA with Different Numbers of Interpolation
More informationIntroduction to Optimization
Introduction to Optimization Blackbox Optimization Marc Toussaint U Stuttgart Blackbox Optimization The term is not really well defined I use it to express that only f(x) can be evaluated f(x) or 2 f(x)
More informationChallenges in High-dimensional Reinforcement Learning with Evolution Strategies
Challenges in High-dimensional Reinforcement Learning with Evolution Strategies Nils Müller and Tobias Glasmachers Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany {nils.mueller, tobias.glasmachers}@ini.rub.de
More informationBenchmarking Natural Evolution Strategies with Adaptation Sampling on the Noiseless and Noisy Black-box Optimization Testbeds
Benchmarking Natural Evolution Strategies with Adaptation Sampling on the Noiseless and Noisy Black-box Optimization Testbeds Tom Schaul Courant Institute of Mathematical Sciences, New York University
More informationNoisy Optimization: A Theoretical Strategy Comparison of ES, EGS, SPSA & IF on the Noisy Sphere
Noisy Optimization: A Theoretical Strategy Comparison of ES, EGS, SPSA & IF on the Noisy Sphere S. Finck Vorarlberg University of Applied Sciences Hochschulstrasse 1 Dornbirn, Austria steffen.finck@fhv.at
More informationBenchmarking Gaussian Processes and Random Forests on the BBOB Noiseless Testbed
Benchmarking Gaussian Processes and Random Forests on the BBOB Noiseless Testbed Lukáš Bajer,, Zbyněk Pitra,, Martin Holeňa Faculty of Mathematics and Physics, Charles University, Institute of Computer
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationParticle Swarm Optimization with Velocity Adaptation
In Proceedings of the International Conference on Adaptive and Intelligent Systems (ICAIS 2009), pp. 146 151, 2009. c 2009 IEEE Particle Swarm Optimization with Velocity Adaptation Sabine Helwig, Frank
More informationEvolutionary Ensemble Strategies for Heuristic Scheduling
0 International Conference on Computational Science and Computational Intelligence Evolutionary Ensemble Strategies for Heuristic Scheduling Thomas Philip Runarsson School of Engineering and Natural Science
More informationChapter 10. Theory of Evolution Strategies: A New Perspective
A. Auger and N. Hansen Theory of Evolution Strategies: a New Perspective In: A. Auger and B. Doerr, eds. (2010). Theory of Randomized Search Heuristics: Foundations and Recent Developments. World Scientific
More informationLocal-Meta-Model CMA-ES for Partially Separable Functions
Local-Meta-Model CMA-ES for Partially Separable Functions Zyed Bouzarkouna, Anne Auger, Didier Yu Ding To cite this version: Zyed Bouzarkouna, Anne Auger, Didier Yu Ding. Local-Meta-Model CMA-ES for Partially