Mirrored Sampling in Evolution Strategies With Weighted Recombination


Mirrored Sampling in Evolution Strategies With Weighted Recombination

Anne Auger, Dimo Brockhoff, Nikolaus Hansen

To cite this version: Anne Auger, Dimo Brockhoff, Nikolaus Hansen. Mirrored Sampling in Evolution Strategies With Weighted Recombination. In: Natalio Krasnogor and Pier Luca Lanzi (eds.), Genetic and Evolutionary Computation Conference (GECCO 2011), Jul 2011, Dublin, Ireland. ACM Press, 2011.

Submitted to HAL on 29 Jul 2011. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Mirrored Sampling in Evolution Strategies With Weighted Recombination

Anne Auger, TAO Team, INRIA Saclay Île-de-France, LRI, University Paris Sud, 91405 Orsay Cedex, France
Dimo Brockhoff, Sysmo Team, LIX, École Polytechnique, 91128 Palaiseau Cedex, France
Nikolaus Hansen, TAO Team, INRIA Saclay Île-de-France, LRI, University Paris Sud, 91405 Orsay Cedex, France

ABSTRACT
This paper introduces mirrored sampling into evolution strategies (ESs) with weighted multi-recombination. Two further heuristics are introduced: pairwise selection selects at most one of two mirrored vectors in order to avoid a bias due to recombination; selective mirroring only mirrors the worst solutions of the population. Convergence rates on the sphere function are derived that also yield upper bounds for the convergence rate on any spherical function. The optimal fraction of offspring to be mirrored is, regardless of pairwise selection, one without selective mirroring and about 19% with selective mirroring, where the convergence rate reaches a value of about 0.390. This is an improvement of 56% compared to the best known convergence rate of 0.25 with positive recombination weights.

Categories and Subject Descriptors: [G.1.6 Numerical Analysis]: Optimization, global optimization, unconstrained optimization; [F.2.1 Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems.
General Terms: Algorithms, Theory.

(c) ACM, 2011. This is the authors' version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published at GECCO, July 12-16, 2011, Dublin, Ireland.

1. INTRODUCTION
Derandomization of random numbers is a general technique where independent samples are replaced by dependent ones. Recent studies showed, for example, how derandomization via mirrored sampling can improve (1,λ)- and (1+λ)-ESs [6, 3]. Instead of generating λ independent and identically distributed (i.i.d.) search points in iteration k as X_k + σ_k N_i, where X_k is the current search point, σ_k the current step-size, and N_i a random sample from a multivariate normal distribution, the (1 +, λ)-ES with mirrored sampling always pairs samples one by one and produces λ/2 independent search points X_k + σ_k N_i and λ/2 dependent ones as X_k − σ_k N_i (1 ≤ i ≤ λ/2). In the end, the best out of these λ search points is used as next search point X_{k+1} in the (1,λ)-ES, and in the (1+λ)-ES the best out of the λ new points and the old X_k.

Several ES variants using mirrored mutations showed noticeable improvements over their unmirrored counterparts, not only in theoretical investigations on simple test functions such as the sphere function, but also in exhaustive experiments within the COCO/BBOB framework [6, 3]. Up to now, the results were restricted to single-parent (1 +, λ)-ESs, though the idea is, in principle, applicable in a straightforward manner to population-based ESs such as the (μ/μ_w, λ)-ES, where the μ best out of the λ offspring are used to compute the new search point X_{k+1} via weighted recombination. However, the direct application of mirrored mutations in population-based ESs, as for example proposed in a more general way by Teytaud et al. [12], results in an undesired bias on the step-size, as was argued already in [6]. The purpose of this paper is to introduce mirrored mutations into ESs with weighted recombination without introducing a bias on the length of the recombined step. The main idea hereby is pairwise selection, which allows only the better solution of a mirrored/unmirrored solution pair to possibly contribute to the weighted recombination.
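As an illustration of the mirrored sampling scheme just described, the following minimal Python sketch (not the authors' implementation; the function name mirrored_offspring and the use of NumPy are assumptions for this transcript) generates the offspring of one iteration of a (1,λ)-ES with mirrored sampling.

    import numpy as np

    def mirrored_offspring(x, sigma, lam, rng=np.random.default_rng()):
        """Return lam offspring: lam/2 i.i.d. points X + sigma*N_i and their
        mirrored counterparts X - sigma*N_i (lam is assumed to be even)."""
        N = rng.standard_normal((lam // 2, len(x)))   # independent mutation vectors
        return np.vstack([x + sigma * N,              # unmirrored samples
                          x - sigma * N])             # mirrored (dependent) samples

    # comma-selection on the sphere function: the best of the lam points survives
    sphere = lambda z: float(np.sum(z ** 2))
    x, sigma = np.ones(10), 0.3
    x_next = min(mirrored_offspring(x, sigma, lam=8), key=sphere)

In a (1+λ)-ES the old X would additionally take part in the comparison.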
In detail, the contributions of this paper are the introduction of several ES variants that combine mirrored mutations and weighted recombination without a bias on the recombined step, a theoretical investigation of the algorithms' convergence rates in finite and infinite dimension on spherical functions, the computation of optimal recombination weights, and an experimental comparison of convergence rates with only positive recombination weights, in particular evaluating the impact of mirrored mutations.

The paper is organized as follows. After introducing the baseline (μ/μ_w, λ)-ES, Sec. 2 explains in detail how mirrored mutations can be introduced in this algorithm. Section 3 theoretically investigates the convergence rate of three variants in finite and infinite dimension. Section 4 presents a comparison of the algorithms based on the numerical estimation of their convergence rates on the sphere function. Section 5 summarizes and concludes the paper.

Notations. For a random vector x ∈ R^n, [x]_1 will denote its first coordinate. The vector (1, 0, ..., 0) will be denoted e_1. A random vector following a multivariate normal distribution with mean vector zero and covariance matrix identity will be called standard multivariate normal.

Algorithm 1: (μ/μ_w, λ_iid + λ_m)-ES with random or selective mirroring, with or without resampled mutation lengths for the mirrored vectors

1: given: f : R^n → R, X_0 ∈ R^n, σ_0 > 0, λ_iid ∈ N_+, λ_m ∈ {0, ..., λ_iid}, μ ≤ λ_iid, i.i.d. standard multivariate normal vectors N_r for r ∈ N, weights w ∈ R^μ with Σ_i w_i = 1 and w_i ≥ 0
2: r ← 0 (index of current random sample)
3: k ← 0 (iteration counter, for notational consistency)
4: while stopping criterion not fulfilled do
5:   /* generate λ_iid offspring independently */
6:   i ← 0 (offspring counter)
7:   while i < λ_iid do
8:     i ← i + 1
9:     r ← r + 1
10:    X_i^k = X_k + σ_k N_r
11:  if selective mirroring then
12:    reorder the offspring such that f(X_1^k) ≤ ... ≤ f(X_{λ_iid}^k)
13:  /* mirror λ_m offspring */
14:  while i < λ_iid + λ_m do
15:    i ← i + 1
16:    /* dependent sample, with new length ‖N_r‖ if lengths are resampled */
17:    if resampled lengths then
18:      r ← r + 1
19:      X_i^k = X_k − σ_k ‖N_r‖ (X_{i−λ_m}^k − X_k) / ‖X_{i−λ_m}^k − X_k‖
20:    else
21:      X_i^k = X_k − (X_{i−λ_m}^k − X_k)
22:  /* weighted recombination */
23:  (X_1^k, ..., X_μ^k) ← the μ best according to
24:    argsort( f(X_1^k), ..., f(X_{λ_iid−λ_m}^k),
25:             min{f(X_{λ_iid−λ_m+1}^k), f(X_{λ_iid+1}^k)}, ...,
26:             min{f(X_{λ_iid}^k), f(X_{λ_iid+λ_m}^k)} )
27:  X_{k+1} = X_k + Σ_{i=1}^{μ} w_i (X_i^k − X_k)
28:  σ_{k+1} = update(σ_k, X_1^k, ..., X_μ^k, X_{k+1})
29:  k ← k + 1 (iteration counter)

2. MIRRORING AND WEIGHTED RECOMBINATION

2.1 The Standard (μ/μ_w, λ)-ES
As our baseline algorithm, we briefly recapitulate the standard (μ/μ_w, λ)-ES with weighted recombination and show its pseudocode in Algorithm 1, where λ_m = 0 and therefore λ = λ_iid. Weighted recombination generalizes intermediate multi-recombination where all weights are equal, has been studied in [10, 7, 2], and must nowadays be regarded as standard in ESs. Given a starting point X_0 ∈ R^n, an initial step-size σ_0 > 0, a population size λ ∈ N_+, and weights w ∈ R^μ with Σ_i w_i = 1 for a chosen μ ≤ λ, the (μ/μ_w, λ)-ES generates at iteration k λ independent search points from a multivariate normal distribution with mean X_k and variance σ_k², and recombines the μ best of them in terms of a weighted sum to become the new mean X_{k+1} of the next iteration (in case of negative weights, the steps must be recombined, see line 27 in Algorithm 1). Typically, μ is chosen as λ/2 and w_i ∝ ln((λ+1)/2) − ln(i) > 0 in the scope of the CMA-ES [7]. As update rule for the step-size σ in the (μ/μ_w, λ)-ES, several techniques such as self-adaptation [11] or cumulative step-size adaptation [9] are available.

Figure 1: Illustration of i.i.d. mutations (left), mirrored mutations (middle), and mirrored mutations with resampled lengths (right). Dashed arrows depict the mirrored samples. Dotted lines connect points with equal function value.

Of particular theoretical interest is the scale-invariant constant step-size σ_k = σ‖X_k‖, which depends on the distance to the optimum (assumed WLOG in zero) and which allows to prove bounds on the convergence rate of evolution strategies with any adaptive step-size update, see Sec. 3.

2.2 The Mirroring Idea
Derandomized mutations [12] and, more recently, mirrored mutations [6, 3] have been proposed to replace the independent mutations in evolution strategies by dependent ones in order to reduce the probability of unlucky events, resulting in an increase in the convergence speed of the algorithms. Instead of sampling all λ offspring i.i.d., an algorithm with mirrored mutations samples only λ/2 i.i.d. offspring as X_i^k = X_k + σ_k N_i (1 ≤ i ≤ λ/2) and up to λ/2 further offspring, depending on the already drawn samples, as X_i^k = X_k − σ_k N_{i−λ/2} for λ/2 + 1 ≤ i ≤ λ, see Fig. 1, left versus middle. In evolution strategies with weighted recombination and cumulative step-size adaptation, mirrored mutations cause a bias towards smaller step-sizes [6, Fig. 4], see Fig. 2. The bias can cause premature convergence of the algorithm. The reason for the bias is that if both samples X_k + σ_k N_i and X_k − σ_k N_i are considered within weighted recombination, they partly cancel each other out and the realized shift of X_k will be smaller than with independent mutations.
Consequently, derandomized step-size control like cumulative step-size adaptation [9] will cause the step-size to shrink. (Footnote 1: Mutative self-adaptation has no such bias, but suffers in combination with weighted recombination from a far too small control target step-size and can achieve close to optimal step-sizes only with a peculiar parameter tuning.) In this paper, we therefore introduce pairwise selection, which prevents this bias: unmirrored and mirrored offspring are paired two-by-two and only the better among the unmirrored sample X_k + σ_k N_i and its mirrored counterpart X_k − σ_k N_i is used within the weighted recombination, but never both.

Here, we introduce a new notation: the number of independent offspring per iteration is denoted by λ_iid and the number of mirrored offspring per iteration is denoted by λ_m, where in each iteration λ = λ_iid + λ_m solutions are evaluated on f. As a result, 0 ≤ λ_m ≤ λ_iid, which gives the standard (μ/μ_w, λ)-ES in case λ_m = 0. We denote the new algorithm (μ/μ_w, λ_iid + λ_m)-ES. Note that the idea of sequential mirroring of [6, 3], i.e., stopping the generation of new offspring as soon as a better solution than the parent is found, is not applied here: with recombination, the meaning of a comparison with the parent is not unique and additional algorithm design decisions would be necessary. (Footnote 2: The super-parent and distribution mean X_k, resulting from the weighted recombination, is not directly comparable to the offspring because, depending on μ, σ_k and λ, with a large probability all λ i.i.d. sampled offspring might be worse. However, a feasible heuristic could be to compare with the best offspring from the last iteration.) Instead, selective mirroring is introduced.

Figure 2: Illustration of the bias towards smaller step-sizes introduced by recombination of mirrored vectors under random selection in the CMA-ES. Shown is the step-size σ versus the number of function evaluations of 20 runs on a purely random fitness function in dimension 10. The upper ten graphs show the (5/5_w, 10)-CMA-ES, revealing a random walk on log(σ). The lower ten graphs show the same CMA-ES with mirrored samples but without pairwise selection of the mirrored samples.

2.3 Random Versus Selective Mirroring
We consider two variants of the (μ/μ_w, λ_iid + λ_m)-ES that differ in the choice of mirrored offspring (Footnote 3: Adaptive variants with a variable number of mirrored offspring that depends on the observed fitness function values have also been considered but are not included here.): the (μ/μ_w, λ_iid + λ_m^ran)-ES, where λ_m randomly chosen offspring are mirrored, and the (μ/μ_w, λ_iid + λ_m^sel)-ES with selective mirroring, where only the λ_m worst offspring are selected for mirroring. The reason behind the latter variant of selecting the worst offspring for mirroring is the following: in particular on fitness functions with convex sublevel sets (Footnote 4: The sublevel set S_l contains all points in search space with a fitness value of at most l: S_l = {x ∈ R^n : f(x) ≤ l}.), we do not expect the best offspring to improve by mirroring. For an offspring that is better than the current search point X_k, mirroring would always result in a worse solution, since never both an independently drawn solution and its mirrored counterpart can be better than the parent in case of convex sublevel sets [3, Proposition 2]. Regarding the comparison of random and selective mirroring, two questions arise: (i) how much faster can an ES become with selective mirroring, and (ii) what is the optimal choice for the number λ_m of mirrored offspring? Both questions will be answered in the following by theoretical investigations of the algorithms' convergence rates.

2.4 Resampled Mirrored Vector Lengths
Within the (μ/μ_w, λ)-ES, solutions that happen to originate from a comparatively long step tend to be worse than average. Therefore, the solutions chosen by selective mirroring are biased towards long mutation steps, and their mirrors tend to be bad solely because they originate from a long mirrored step (still they tend to be better than the original samples). Hence, we consider a variant of mirroring where the lengths of the mirrored vectors σ_k N_r are i.i.d. resampled, i.e., where the mirrored offspring X_k − σ_k N_r is replaced by X_k − σ_k ‖N_{r+1}‖ N_r / ‖N_r‖, with ‖N_{r+1}‖ the newly sampled length of the mirrored vector, cp. Fig. 1, right. We refer to this last variant as (μ/μ_w, λ_iid + λ_m^sel)-ES with resampled mutation lengths. Algorithm 1 shows the pseudocode of all variants with random/selective mirroring and with/without resampled lengths of the mirrored offspring. Theoretical results in the next section will not only show how much improvement in the convergence rate can be gained by the resampled lengths, but also that the variants with and without resampled lengths are the same if the dimension goes to infinity.
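The variants of Algorithm 1 can be condensed into a single iteration step. The following Python sketch is an illustration only (the original code is the MATLAB package referenced in Section 4; names and defaults are chosen for readability): it covers random or selective mirroring, pairwise selection, optional resampled lengths, and weighted recombination, while the step-size update is omitted.

    import numpy as np

    def es_iteration(f, x, sigma, lam_iid, lam_m, weights, selective=True,
                     resample_lengths=False, rng=np.random.default_rng()):
        """One iteration of a (mu/mu_w, lam_iid + lam_m)-ES with pairwise selection."""
        n = len(x)
        X = x + sigma * rng.standard_normal((lam_iid, n))   # lam_iid independent offspring
        fX = np.array([f(xi) for xi in X])
        if selective:                          # selective mirroring: mirror the lam_m worst
            order = np.argsort(fX)
            X, fX = X[order], fX[order]
        steps = [X[i] - x for i in range(lam_iid - lam_m)]
        fvals = [fX[i] for i in range(lam_iid - lam_m)]
        for i in range(lam_iid - lam_m, lam_iid):            # mirrored pairs
            d = X[i] - x
            if resample_lengths:               # keep the direction, draw a fresh length
                d = sigma * np.linalg.norm(rng.standard_normal(n)) * d / np.linalg.norm(d)
            f_mir = f(x - d)
            if f_mir < fX[i]:                  # pairwise selection: keep the better point
                steps.append(-d)
                fvals.append(f_mir)
            else:
                steps.append(X[i] - x)
                fvals.append(fX[i])
        best = np.argsort(fvals)[:len(weights)]              # mu best of lam_iid candidates
        return x + sum(w * steps[j] for w, j in zip(weights, best))

    # usage on the sphere function with CMA-like positive weights
    sphere = lambda z: float(np.sum(z ** 2))
    mu, lam_iid, lam_m = 3, 6, 2
    w = np.log((lam_iid + 1) / 2) - np.log(np.arange(1, mu + 1))
    x_new = es_iteration(sphere, np.ones(10), 0.3, lam_iid, lam_m, w / w.sum())

Each mirrored pair contributes at most one point to the recombination, which is exactly the mechanism that removes the step-size bias discussed above.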
2.5 Algorithm Parameters
The algorithms we have described involve several parameters: the number λ_iid of independent samples, the number λ_m of mirrored offspring, the number μ of offspring to be recombined, the weights w for recombination, and the step-size σ. The convergence rates depend on the choice of these parameters. In the sequel, we investigate upper bounds for the convergence rate on spherical functions by investigating the optimal choice of the different parameters, given Σ_i w_i = 1.

3. CONVERGENCE RATE UPPER BOUNDS ON SPHERICAL FUNCTIONS
In order to find optimal settings for the different parameters, we investigate convergence rates on spherical functions having WLOG the optimum in zero, i.e., functions x ↦ g(‖x‖) with g ∈ M, where M denotes the set of functions g : R → R that are strictly increasing. Convergence rates depend on the step-size adaptation chosen. We study the case of the scale-invariant constant step-size where σ_k = σ‖X_k‖, which we refer to as scale-invariant ES (however, as most ESs are scale invariant, the name is somewhat abusive). For an optimal choice of the constant σ, the scale-invariant ES gives the upper bound for convergence rates achievable by any strategy with step-size adaptation on spherical functions (see below). This case is of great relevance, because also practical step-size control mechanisms can achieve close-to-optimal step-sizes on spherical functions. For the different algorithm variants with scale-invariant step-size, we prove linear convergence in expectation in the following sense: there exists a CR ∈ R such that for all k, k_0 ∈ N with k > k_0

E[ ln( ‖X_k‖ / ‖X_{k_0}‖ ) / ( Λ (k − k_0) ) ] = −CR ,     (1)

where Λ is the number of evaluations per iteration, introduced to define the convergence rate per function evaluation. The constant CR (Footnote 5: Convergence takes place if and only if CR > 0, and for non-elitist ESs only numerical integration of CR, which is expressed as an expectation of continuous random variables and thus as an integral, can reveal the sign of CR. In contrast to our results, classical progress-rate derivations [1, 2, 5] only approximate the actual strategy behavior, and the result of the approximation can be comparatively poor for small values of μ/λ. Consequently, the classical progress rate ϕ might be negative even when de facto convergence takes place, or vice versa [4].) is the convergence rate of the algorithm

and depends on the dimension and the parameters σ, μ, λ_iid, λ_m, and w, see Sections 2.1 and 2.5 and Algorithm 1. The convergence rate defined in (1) is compatible with almost sure convergence [4]. Hence, we will prove that with scale-invariant step-size, almost surely

ln( ‖X_k‖ / ‖X_0‖ ) / ( Λ k ) → −CR .     (2)

More loosely we may say that ‖X_{k+1}‖ ≈ exp(−Λ CR) ‖X_k‖.

3.1 The (μ/μ_w, λ)-ES
To serve as a baseline algorithm for a later comparison with algorithms using mirrored mutations, we first investigate the convergence rate of the scale-invariant version of the standard (μ/μ_w, λ)-ES (see Algorithm 1 with λ_m = 0).

3.1.1 Finite Dimension Results
At each step, λ independent vectors N_i following a standard multivariate normal distribution are sampled to create the offspring X_i^k = X_k + σ‖X_k‖ N_i. The offspring are ranked according to their fitness function value. We denote by Z_{1:λ}, ..., Z_{λ:λ} the sorted vector of multivariate normal vectors, such that the best offspring equals X_k + σ‖X_k‖ Z_{1:λ}, the second best X_k + σ‖X_k‖ Z_{2:λ}, etc. The distribution of (Z_{1:λ}, ..., Z_{λ:λ}) depends a priori on X_k. However, in the scale-invariant step-size case on spherical functions the distribution is independent of X_k and is determined by the ranking of ‖e_1 + σ N_i‖ for i = 1, ..., λ, which can be simplified to the ranking of 2[N_i]_1 + σ‖N_i‖². These results are stated in the following lemma.

Lemma 1. For the scale-invariant (μ/μ_w, λ)-ES minimizing spherical functions, the probability distribution of the vector (Z_{1:λ}, ..., Z_{λ:λ}) is independent of X_k and equals

( Z_{1:λ}, ..., Z_{λ:λ} ) = argsort{ h_σ(N_1), ..., h_σ(N_λ) } ,     (3)

where h_σ(x) = 2[x]_1 + σ‖x‖² and the N_i, 1 ≤ i ≤ λ, are λ independent standard multivariate normal vectors.

Proof. At iteration k, starting from X_k, the distribution of the selected N_i is determined by the ranking of ‖X_k + σ‖X_k‖ N_i‖, 1 ≤ i ≤ λ. Normalizing by ‖X_k‖ will not change the ranking, such that the distribution is determined by the ranking of ‖X_k/‖X_k‖ + σ N_i‖ for 1 ≤ i ≤ λ. However, since the distribution of N_i is spherical, the distribution of the selected N_i will be the same if we start from any vector with unit norm like e_1, so WLOG the distribution will be determined by ranking ‖e_1 + σ N_i‖ for 1 ≤ i ≤ λ, or, since composing with the square will not change the ranking, ‖e_1 + σ N_i‖² for 1 ≤ i ≤ λ. We develop ‖e_1 + σ N_i‖² and obtain 1 + 2σ[N_i]_1 + σ²‖N_i‖². The ranking will not be affected if we subtract 1 and divide by σ, such that the distribution of the selected N_i is determined by the ranking with respect to h_σ(N_i).

In the (μ/μ_w, λ)-ES, the μ best offspring X_k + σ‖X_k‖ Z_{i:λ}, i = 1, ..., μ, are recombined into the new parent X_{k+1} = X_k + σ‖X_k‖ Σ_{i=1}^{μ} w_i Z_{i:λ}, where w_1, ..., w_μ ∈ R and Σ_i w_i = 1. The next theorem gives the expression of the convergence rate associated to the (μ/μ_w, λ)-ES with scale-invariant step-size as a function of σ and w = (w_1, ..., w_μ).

Theorem 1. For the (μ/μ_w, λ)-ES with scale-invariant step-size on x ↦ g(‖x‖), g ∈ M, (1) and (2) hold and the convergence rate equals

CR(σ, w) = −(1/(2λ)) E[ ln( 1 + 2σ Σ_{i=1}^{μ} w_i [Z_{i:λ}]_1 + σ² ‖ Σ_{i=1}^{μ} w_i Z_{i:λ} ‖² ) ] ,     (4)

where w_i ∈ R and Σ_i w_i = 1.

Proof. We start from X_{k+1} = X_k + σ‖X_k‖ Σ_i w_i Z_{i:λ}, which we normalize by ‖X_k‖ before taking the logarithm:

ln( ‖X_{k+1}‖ / ‖X_k‖ ) = ln ‖ X_k/‖X_k‖ + σ Σ_{i=1}^{μ} w_i Z_{i:λ} ‖ .     (5)

Using the isotropy of the sphere function and of the multivariate normal distribution, together with the previous lemma, we find that the random variables in the RHS of the previous equation are i.i.d., distributed as ln ‖ e_1 + σ Σ_i w_i Z_{i:λ} ‖.
Applying the Law of Large Numbers, we thus find that

ln( ‖X_k‖ / ‖X_0‖ ) / ( λ k ) = (1/(λ k)) Σ_{i=0}^{k−1} ln( ‖X_{i+1}‖ / ‖X_i‖ ) → (1/λ) E[ ln ‖ e_1 + σ Σ_i w_i Z_{i:λ} ‖ ]   as k → ∞.

We develop the RHS of the previous equation using the identity

ln ‖ e_1 + u ‖ = (1/2) ln( 1 + 2[u]_1 + ‖u‖² ) , for u ∈ R^n ,     (6)

which can be obtained in a straightforward way by writing ln ‖e_1 + u‖ as (1/2) ln ‖e_1 + u‖² and developing the norm. We then obtain that (2) holds with the convergence rate CR as given in the theorem. To obtain the convergence in expectation as defined in (1), we take the expectation in (5). For a more detailed argumentation why the expectation exists, and for the independence of the random variables ln(‖X_{k+1}‖/‖X_k‖), we refer to [8].

The convergence rate of the (μ/μ_w, λ)-ES is a function of σ and the weights. With CR(σ, w) from (4) and the constraint Σ_i w_i = 1, the optimal convergence rate computes to

CR^opt_{μ/μ_w, λ} = max_{y ∈ R^μ} CR_{μ/μ_w, λ}( Σ_i y_i , y / Σ_i y_i ) .     (7)
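The expectation in (4) has no closed form in finite dimension but is straightforward to estimate numerically via the ranking of Lemma 1. The following sketch (an illustration assuming NumPy; the parameter values in the example are arbitrary) sorts standard normal vectors by h_σ(x) = 2[x]_1 + σ‖x‖² and averages the log term of Theorem 1.

    import numpy as np

    def cr_estimate(sigma, weights, lam, dim, n_samples=10_000,
                    rng=np.random.default_rng(1)):
        """Monte Carlo estimate of CR(sigma, w) from Theorem 1 (rate per evaluation)."""
        weights = np.asarray(weights)
        mu = len(weights)
        N = rng.standard_normal((n_samples, lam, dim))
        h = 2 * N[..., 0] + sigma * np.sum(N ** 2, axis=-1)   # h_sigma ranking (Lemma 1)
        idx = np.argsort(h, axis=1)[:, :mu]                   # the mu best per replicate
        Z = np.take_along_axis(N, idx[..., None], axis=1)     # Z_{1:lam}, ..., Z_{mu:lam}
        u = np.einsum('i,sid->sd', weights, Z)                # recombined step divided by sigma
        logs = np.log(1 + 2 * sigma * u[:, 0] + sigma ** 2 * np.sum(u ** 2, axis=1))
        return -logs.mean() / (2 * lam)                       # positive values mean convergence

    # example: mu = 3 equal weights, lambda = 12, dimension 10
    print(cr_estimate(sigma=0.3, weights=np.full(3, 1 / 3), lam=12, dim=10))

Maximizing such an estimate over σ and the weights, as in (7), yields the optimal finite-dimensional convergence rate.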

Optimal weights can be obtained as w_i^opt = y_i^opt / Σ_j y_j^opt with y^opt = argmax_{y ∈ R^μ} CR_{μ/μ_w,λ}( Σ_i y_i, y/Σ_i y_i ), and the optimal step-size equals σ^opt = Σ_i y_i^opt. The convergence rate of the (μ/μ_w, λ)-ES with scale-invariant constant step-size gives the upper bound for the convergence rate of any step-size adaptive (μ/μ_w, λ)-ES with isotropic mutations on spherical functions [8, Theorem 1].

3.1.2 Asymptotic Results
In the following, we investigate the limit of the convergence rate when the dimension of the search space goes to infinity.

Theorem 2. The convergence rate of the (μ/μ_w, λ)-ES on the class of spherical functions x ↦ g(‖x‖), g ∈ M, given scale-invariant step-size and Σ_i w_i = 1, satisfies

lim_{n→∞} n CR( σ/n, w ) = −( (σ²/2) Σ_{i=1}^{μ} w_i² + σ Σ_{i=1}^{μ} w_i E[N_{i:λ}] ) / λ ,

where N_{i:λ} is the i-th order statistic of λ independent normal distributions with mean 0 and variance 1, i.e., the i-th smallest of λ independent variables N_i ∼ N(0, 1).

Proof. Let N_i, 1 ≤ i ≤ λ, be λ independent standard multivariate normal vectors and recall that −2λ n CR(σ/n, w) = E[ n ln( 1 + (2σ/n) Σ_i w_i [Z_{i:λ}]_1 + (σ²/n²) ‖Σ_i w_i Z_{i:λ}‖² ) ]. With the set of all permutations of {1, ..., λ} denoted by P(λ), we can write

n ln( 1 + (2σ/n) Σ_i w_i [Z_{i:λ}]_1 + (σ²/n²) ‖Σ_i w_i Z_{i:λ}‖² ) = Σ_{π ∈ P(λ)} n ln( 1 + (2σ/n) Σ_i w_i [N_{π(i)}]_1 + (σ²/n²) ‖Σ_i w_i N_{π(i)}‖² ) 1{ h_{σ/n}(N_{π(1)}) ≤ ... ≤ h_{σ/n}(N_{π(λ)}) } .     (8)

For any permutation π ∈ P(λ) and any 1 ≤ i ≤ λ, h_{σ/n}(N_{π(i)}) = 2[N_{π(i)}]_1 + (σ/n) ‖N_{π(i)}‖², such that lim_{n→∞} h_{σ/n}(N_{π(i)}) = 2[N_{π(i)}]_1 + σ. Therefore,

1{ h_{σ/n}(N_{π(1)}) ≤ ... ≤ h_{σ/n}(N_{π(λ)}) } → 1{ [N_{π(1)}]_1 ≤ ... ≤ [N_{π(λ)}]_1 } .     (9)

In addition, since every component of the vector Σ_i w_i N_{π(i)} follows a normal distribution with mean zero and variance Σ_i w_i², we have by the Law of Large Numbers that ‖Σ_i w_i N_{π(i)}‖²/n converges to Σ_i w_i², and thus

(n/2) ln( 1 + (2σ/n) Σ_i w_i [N_{π(i)}]_1 + (σ²/n²) ‖Σ_i w_i N_{π(i)}‖² ) → σ Σ_i w_i [N_{π(i)}]_1 + (σ²/2) Σ_i w_i² .     (10)

Injecting the limits from (9) and (10) into (8), we obtain

(n/2) ln( 1 + (2σ/n) Σ_i w_i [Z_{i:λ}]_1 + (σ²/n²) ‖Σ_i w_i Z_{i:λ}‖² ) → Σ_{π ∈ P(λ)} ( σ Σ_i w_i [N_{π(i)}]_1 + (σ²/2) Σ_i w_i² ) 1{ [N_{π(1)}]_1 ≤ ... ≤ [N_{π(λ)}]_1 } .     (11)

In the RHS of the previous equation, we recognize the distribution of order statistics of standard normal distributions. Thus,

(n/2) ln( 1 + (2σ/n) Σ_i w_i [Z_{i:λ}]_1 + (σ²/n²) ‖Σ_i w_i Z_{i:λ}‖² ) → σ Σ_i w_i N_{i:λ} + (σ²/2) Σ_i w_i² .     (12)

To find the announced result, we need to obtain the limit in expectation. To do so, we need to verify that the random variables are uniformly integrable. For this quite technical step we refer to [8].

The asymptotic classical progress rate ϕ of the (μ/μ_w, λ)-ES is derived from an approximation of ‖X_{k+1}‖/‖X_k‖ and coincides with the limit from the previous theorem, ϕ_{μ/μ_w,λ}(σ, w) = λ lim_{n→∞} n CR(σ/n, w). As for the finite dimensional case, we consider the variable y = σ w ∈ R^μ with σ = Σ_i y_i and compute the optimal asymptotic convergence rate, which is reached for y_i^opt = −E[N_{i:λ}]:

CR^{opt,∞}_{μ/μ_w,λ} = max_{y ∈ R^μ} lim_{n→∞} n CR( Σ_i y_i , y/Σ_i y_i ) = (1/(2λ)) Σ_{i=1}^{μ} E[N_{i:λ}]² ,     (13)

as already found in [2]. Optimal weights are proportional to y_i^opt, thus w_i^{opt,∞} = E[N_{i:λ}] / Σ_{j=1}^{μ} E[N_{j:λ}]. Whether or not negative weights are allowed does not affect the optimal positive weight values, aside from the normalization factor.
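The asymptotic optimum (13) only involves the expectations E[N_{i:λ}] of normal order statistics. A small sketch (illustrative; a numerical quadrature of the order-statistic densities would serve equally well) estimates them by simulation and evaluates the optimal weights, step-size, and rate.

    import numpy as np

    def asymptotic_optimum(lam, mu, n_samples=200_000, rng=np.random.default_rng(2)):
        """Estimate E[N_{i:lam}] for i = 1..mu and the optimum of equation (13)."""
        order_stats = np.sort(rng.standard_normal((n_samples, lam)), axis=1)[:, :mu]
        e = order_stats.mean(axis=0)          # E[N_{i:lam}], negative for the best ranks
        weights = e / e.sum()                 # optimal weights, proportional to E[N_{i:lam}]
        sigma_opt = -e.sum()                  # optimal normalized step-size, sum of -E[N_{i:lam}]
        cr_opt = np.sum(e ** 2) / (2 * lam)   # equation (13)
        return weights, sigma_opt, cr_opt

    w, s, cr = asymptotic_optimum(lam=10, mu=5)
    print(cr)   # roughly 0.2 per evaluation for lam = 10 and mu = 5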
3.2 The (μ/μ_w, λ)-ES With Random and Selective Mirroring
Following the same approach as in the previous section, we analyze the convergence rates of the different mirroring variants, first for finite dimension and then asymptotically in the dimension. We define (Z_1, ..., Z_{λ_iid}) as the vector of ordered steps to be recombined; namely, for a given ES variant, the best point to be recombined (to which the highest weight will be given) is X_k + σ‖X_k‖ Z_1, the second best is X_k + σ‖X_k‖ Z_2, and so on. Among the different algorithm variants, the distribution of the vector of ordered steps changes. In the sequel, we express this distribution for the ES variants with random mirroring, selective mirroring, and selective mirroring with resampled lengths.

Selected vector for random mirroring. In random mirroring, we mirror λ_m arbitrary vectors among the λ_iid independent ones. Without loss of generality, we can mirror the λ_m last vectors. For the mirrored pairs, only the best of the two vectors is recombined. The distribution of the resulting vector of ordered steps is expressed in the following lemma:

Lemma 2. In the (μ/μ_w, λ_iid + λ_m^ran)-ES with scale-invariant step-size on spherical functions, the distribution of the vector of ordered steps to be recombined is given by

( Z_1, ..., Z_{λ_iid} ) = argsort{ h_σ(N_1), ..., h_σ(N_{λ_iid−λ_m}), min{ h_σ(N_{λ_iid−λ_m+1}), h_σ(−N_{λ_iid−λ_m+1}) }, ..., min{ h_σ(N_{λ_iid}), h_σ(−N_{λ_iid}) } } ,     (14)

where h_σ(x) = 2[x]_1 + σ‖x‖².

Proof. Let N_i, 1 ≤ i ≤ λ_iid, be λ_iid independent standard multivariate normal vectors. At iteration k, starting from X_k, we rank the individuals X_k + σ‖X_k‖ N_i for 1 ≤ i ≤ λ_iid − λ_m together with the best of the mirrored/unmirrored pairs for the λ_m last individuals, i.e., we rank ‖X_k + σ‖X_k‖ N_i‖ for 1 ≤ i ≤ λ_iid − λ_m with min{ ‖X_k + σ‖X_k‖ N_i‖, ‖X_k − σ‖X_k‖ N_i‖ } for i = λ_iid − λ_m + 1, ..., λ_iid. As in Lemma 1, we find that the ranking does not change if we normalize by ‖X_k‖ and if we start from e_1, such that the distribution is determined by the ranking of ‖e_1 + σ N_i‖ for 1 ≤ i ≤ λ_iid − λ_m and min{ ‖e_1 + σ N_i‖, ‖e_1 − σ N_i‖ } for i = λ_iid − λ_m + 1, ..., λ_iid. As in Lemma 1, we square the terms and develop them to find that the distribution is determined by the ranking according to h_σ as given in (14).

Selected vector for selective mirroring. In selective mirroring, where we mirror the worst offspring, we need to sort the offspring to determine which offspring to mirror. Let Y be defined as

Y := (Y_1, ..., Y_{λ_iid}) := argsort{ h_σ(N_1), ..., h_σ(N_{λ_iid}) } ,

where h_σ(x) = 2[x]_1 + σ‖x‖². Then for the worst λ_m vectors of Y, we select the pairwise best among offspring and mirrored one, and we keep the other vectors unchanged:

Y_i^sel = Y_i ,   i = 1, ..., λ_iid − λ_m     (15)
Y_i^sel = argmin{ h_σ(Y_i), h_σ(−Y_i) } ,   λ_iid − λ_m + 1 ≤ i ≤ λ_iid .     (16)

Finally, as expressed in the following lemma, the distribution of the ordered steps to be recombined is the result of the sorting of the Y_i^sel vectors:

Lemma 3. In the (μ/μ_w, λ_iid + λ_m^sel)-ES with scale-invariant step-size on spherical functions, the distribution of the vector of ordered steps to be recombined is given by

( Z_1, ..., Z_{λ_iid} ) = argsort{ h_σ(Y_1^sel), ..., h_σ(Y_{λ_iid}^sel) } ,     (17)

where Y_i^sel is defined in (15) and (16).

Proof. As in Lemma 1 and Lemma 2, the ranking can be done normalizing by ‖X_k‖ and starting from e_1. Thus, it follows from the way we have defined Y_i^sel that the distribution of the vector of ordered steps is determined by (17).

Selected vector for selective mirroring with resampled length. The selective mirroring with resampled length algorithm differs from the previous one in the mirroring step in that only the direction is kept and the length is independently resampled according to its original χ-distribution with n degrees of freedom. Assuming that sorting of the offspring has been made according to Y as described above, the Y^sel vector is given by

Y_i^sel = Y_i   for i = 1, ..., λ_iid − λ_m     (18)

and, for i = λ_iid − λ_m + 1, ..., λ_iid,

Y_i^sel = argmin{ h_σ(Y_i), h_σ( −‖Ñ_i‖ Y_i / ‖Y_i‖ ) } ,     (19)

where the Ñ_i are independent vectors following a standard multivariate normal distribution. As for the previous algorithm, the distribution of the ordered steps to be recombined is the result of the sorting of the Y_i^sel vectors:

Lemma 4. In the (μ/μ_w, λ_iid + λ_m^sel)-ES with resampled lengths, with scale-invariant step-size on spherical functions, the distribution of the vector of ordered steps to be recombined is given by

( Z_1, ..., Z_{λ_iid} ) = argsort{ h_σ(Y_1^sel), ..., h_σ(Y_{λ_iid}^sel) } ,     (20)

where Y_i^sel is defined in (18) and (19).

Proof. As in Lemma 1 and Lemma 2, the ranking can be done normalizing by ‖X_k‖ and starting from e_1. Thus, it follows from the way we have defined Y_i^sel that the distribution of the vector of ordered steps is determined by (20).

Similarly as for the (μ/μ_w, λ)-ES, we find that the convergence rate of the (μ/μ_w, λ_iid + λ_m^ran)-ES and of the (μ/μ_w, λ_iid + λ_m^sel)-ES with and without resampled mutation lengths can be expressed in the following way.

Theorem 3. The convergence rate of the (μ/μ_w, λ_iid + λ_m^ran)-ES and of the (μ/μ_w, λ_iid + λ_m^sel)-ES with and without resampled mutation lengths equals

CR(σ, w) = −( 1 / (2(λ_iid + λ_m)) ) E[ ln( 1 + 2σ Σ_{i=1}^{μ} w_i [Z_i]_1 + σ² ‖ Σ_{i=1}^{μ} w_i Z_i ‖² ) ] ,     (21)

where w_i ∈ R and Σ_i w_i = 1, and the distributions of the random vector (Z_1, ..., Z_{λ_iid}) are defined in Lemmas 2, 3 and 4, respectively.

Proof.
The proof is similar to the proof of Theorem 1, injecting the distribution of the random vectors (Z_1, ..., Z_{λ_iid}) for the different algorithms.

As for the (μ/μ_w, λ)-ES, optimal convergence rates are solutions of the maximization problem max_{y ∈ R^μ} CR( Σ_i y_i, y/Σ_i y_i ) with CR from Theorem 3.

3.2.1 Asymptotic Results
We investigate the limit of the convergence rate given in Theorem 3 when the dimension goes to infinity. For the (μ/μ_w, λ_iid + λ_m^ran)-ES, we define the random vector (Z_1, ..., Z_{λ_iid}) ∈ R^{λ_iid} as

( Z_1, ..., Z_{λ_iid} ) = argsort{ N_1, ..., N_{λ_iid−λ_m}, −|N_{λ_iid−λ_m+1}|, ..., −|N_{λ_iid}| } ,     (22)

where the N_i are independent standard normal variables. For the (μ/μ_w, λ_iid + λ_m^sel)-ES with or without resampled lengths, we define the vector

( Z_1, ..., Z_{λ_iid} ) = argsort{ Y_1, ..., Y_{λ_iid−λ_m}, −|Y_{λ_iid−λ_m+1}|, ..., −|Y_{λ_iid}| } ,     (23)

where (Y_1, ..., Y_{λ_iid}) = argsort{ N_1, ..., N_{λ_iid} }. The asymptotic convergence rate for the different variants is given in the following theorem.

Theorem 4. The convergence rate of the (μ/μ_w, λ_iid + λ_m^ran)-ES and of the (μ/μ_w, λ_iid + λ_m^sel)-ES with or without resampled lengths, with scale-invariant step-size and weights w ∈ R^μ with Σ_i w_i = 1, on the class of spherical functions x ↦ g(‖x‖), g ∈ M, satisfies

lim_{n→∞} n CR( σ/n, w ) = −( (σ²/2) Σ_{i=1}^{μ} w_i² + σ Σ_{i=1}^{μ} w_i E[Z_i] ) / ( λ_iid + λ_m ) ,

where the distribution of Z_i is given in (22) for the (μ/μ_w, λ_iid + λ_m^ran)-ES and in (23) for the (μ/μ_w, λ_iid + λ_m^sel)-ES with or without resampled lengths for the mirroring.

Figure 3: Optimal asymptotic convergence rates CR^{opt,∞}_{λ_iid, λ_m} (Equation 24) for the (μ/μ_w, λ_iid + λ_m)-ES versus the ratio λ_m/λ_iid of mirrored and independent offspring, for various numbers λ_iid of independent offspring. Left: (μ/μ_w, λ_iid + λ_m^ran)-ES with random mirroring. Right: (μ/μ_w, λ_iid + λ_m^sel)-ES with selective mirroring, mirroring the worst λ_m of the λ_iid independent offspring. In addition, the right-hand plot shows the theoretical result for λ_iid → ∞ of Equation 25 as a dashed line.

Proof. The proof follows the same lines as the proof for the (μ/μ_w, λ)-ES (Theorem 2). The limit is the same for selective mirroring with or without resampled lengths because asymptotically ‖N‖/‖N′‖ goes to one when n goes to infinity for any two independent standard multivariate normal vectors N and N′.

Similarly to the (μ/μ_w, λ)-ES case, we find that the optimal convergence rate is given by

CR^{opt,∞}_{λ_iid, λ_m} = ( Σ_{i=1}^{μ} E[Z_i]² ) / ( 2 (λ_iid + λ_m) ) ,     (24)

and the optimal weights equal w_i^{opt,∞} = E[Z_i] / Σ_{j=1}^{μ} E[Z_j]. We remark that the asymptotic convergence rate for selective mirroring is the same with or without resampled lengths for the mirrored vectors. Thus, the resampling of lengths can only affect finite dimensional results. We conclude this paragraph with a conjecture on an expression for the optimal asymptotic convergence rate of the (μ/μ_w, λ_iid + λ_m)-ES as a function of λ_m/λ_iid.

Conjecture 1. The optimal ∞-asymptotic convergence rate of the (μ/μ_w, λ_iid + λ_m^sel)-ES with selective mirroring and positive recombination weights, in the limit λ_iid → ∞ and for r = lim λ_m/λ_iid ≤ 1/2, is given by

CR^{opt,∞,∞}_{sel-mirr}(r) = ( 1 / (2(1+r)) ) [ 1/2 + ∫_{G^{-1}(1−r)}^{∞} x² g(x) dx ] = ( 1 / (2(1+r)) ) [ 1/2 + r + G^{-1}(1−r) g(G^{-1}(1−r)) ] ,     (25)

where g and G are the pdf and cdf of the standard normal distribution, respectively, and where the term beyond 1/2 inside the brackets is positive for 0 < r ≤ 1/2.

CR^{opt,∞,∞}_{sel-mirr} is shown as the top dashed graph in Fig. 3, right. CR^{opt,∞,∞}_{sel-mirr}(0) and CR^{opt,∞,∞}_{sel-mirr}(1/2) compute to 1/4 and 1/3, respectively, and the unique maximum in (0, 1/2] is attained at a ratio λ_m/λ_iid = r of about 0.19, where the convergence rate reaches a value of about 0.390.

4. SIMULATIONS
Due to their implicit nature, some of the above derived optimal convergence rates are difficult to compare directly. However, we can easily estimate the rates by means of Monte Carlo sampling, allowing us to compare the performance of the proposed algorithms in infinite and finite dimension. Moreover, we find the optimal ratio of mirrored offspring and, theoretically and on spherical functions, the fastest algorithm. All convergence rates are estimated with only positive recombination weights, and 10^6 samples are used for each combination of λ_iid and λ_m and for each algorithm. The MATLAB code is available at gforge.inria.fr/mirroring/.

Random and Selective Mirroring in Infinite Dimension. Figure 3 shows estimated optimal convergence rates versus the ratio of mirrored offspring. In all cases, the convergence rate monotonically improves with an increasing number of offspring λ. For random mirroring (left subfigure), the convergence rate also increases monotonically with the number of mirrored offspring λ_m. The optimal ratio λ_m/λ_iid is therefore one. With an increasing number of offspring λ, the convergence rate however approaches 0.25 for any ratio λ_m/λ_iid.
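The Monte Carlo procedure behind Figure 3 in the asymptotic case follows directly from (22), (23), and (24): sample the vector Z, average to obtain E[Z_i], and use only the components with negative mean, since the weights are restricted to be positive. The sketch below is an illustrative Python re-implementation (the paper's own code is the MATLAB package linked above); the sample size is reduced for brevity.

    import numpy as np

    def cr_opt_asymptotic(lam_iid, lam_m, selective, n_samples=200_000,
                          rng=np.random.default_rng(3)):
        """Estimate CR_opt of equation (24) for random (22) or selective (23) mirroring."""
        N = rng.standard_normal((n_samples, lam_iid))
        if selective:
            N = np.sort(N, axis=1)                # mirror the lam_m worst (largest) samples
        N[:, lam_iid - lam_m:] = -np.abs(N[:, lam_iid - lam_m:])   # pairwise selection: min(x, -x)
        Z = np.sort(N, axis=1)                    # ordered steps to be recombined
        e = Z.mean(axis=0)
        e = e[e < 0]                              # positive weights: only negative E[Z_i] contribute
        return np.sum(e ** 2) / (2 * (lam_iid + lam_m))

    # two points of the selective-mirroring curve for lam_iid = 20 (cf. Figure 3, right)
    for lam_m in (0, 4):
        print(lam_m, cr_opt_asymptotic(20, lam_m, selective=True))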
The results look quite different with selective mirroring (right subfigure): for λ_m ∈ {0, λ_iid}, selective mirroring cannot have any effect, but for any 0 < λ_m < λ_iid the convergence rate is consistently better than with random mirroring and has a unique optimum slightly below λ_m = λ_iid/5. Figure 4 shows the best convergence rates from Fig. 3 plotted versus λ = λ_iid + λ_m, together with the corresponding optimal ratio λ_m/λ_iid for selective mirroring. For random mirroring, the known limit convergence rate for λ → ∞ with λ_m ∈ {0, λ_iid} is 0.25 (for λ_m = λ_iid this follows immediately from the optimal value of 0.5 with negative recombination weights). For selective mirroring the limit is close to 0.39 with λ_m/λ_iid ≈ 0.19, see Conjecture 1 above. Note that the unsmoothness of the ratio stems from discretization: not all values of the ratio λ_m/λ_iid are possible, which in particular has an effect for small λ_iid.
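Conjecture 1 can be checked numerically against the values quoted above. The sketch below (illustrative; it assumes SciPy for the normal pdf and inverse cdf) evaluates (25), recovering roughly 0.25 for r near 0, 1/3 at r = 1/2, and a maximum of about 0.39 near r ≈ 0.19.

    import numpy as np
    from scipy.stats import norm

    def cr_conjecture(r):
        """Optimal asymptotic rate of equation (25), with r = lam_m / lam_iid <= 1/2."""
        a = norm.ppf(1.0 - r)                  # G^{-1}(1 - r)
        tail = r + a * norm.pdf(a)             # equals the integral of x^2 g(x) over [a, infinity)
        return (0.5 + tail) / (2.0 * (1.0 + r))

    print(cr_conjecture(1e-12), cr_conjecture(0.5))   # ~0.25 and ~1/3
    grid = np.linspace(0.01, 0.5, 4901)
    r_best = grid[np.argmax(cr_conjecture(grid))]
    print(r_best, cr_conjecture(r_best))              # maximum near r ~ 0.19, value ~ 0.390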

Figure 4: Extracted normalized optimal convergence rates (solid lines) for the (μ/μ_w, λ_iid + λ_m^ran)-ES (bottom line) and the (μ/μ_w, λ_iid + λ_m^sel)-ES (top line) of Fig. 3 for different numbers of offspring λ = λ_iid + λ_m, together with the corresponding optimal ratio of mirrored and unmirrored offspring for the selective mirroring variant (dashed).

5. SUMMARY AND CONCLUSION
We have introduced mirrored sampling in ESs with multi-recombination. Two important tricks are used: selective mirroring, where only the worst λ_m offspring are mirrored, and pairwise selection, where at most one offspring from any mirrored couple is selected for recombination. Less importantly, the length of mirrored vectors might be resampled. The obtained theoretical results support the effectiveness of selective mirroring in particular: the new algorithm improves the known convergence rate record for ESs with positive recombination weights by 56%, from 0.25 to about 0.39. This is a huge improvement, and the new (μ/μ_w, λ_iid + λ_m)-ES, where λ_m ≈ 0.19 λ_iid, is also more than 60% faster than the fastest single-parent mirrored ES and almost twice as fast as the regular (1+1)-ES in the asymptotic limit, cp. [3]. Only strategies with negative recombination weights are known to realize larger convergence rates, up to 0.5, cp. [2]. Negative weights however have the disadvantage that they use points for recombination that have never been even remotely evaluated, that is, they rely on quite specific properties of the fitness function. Compared to the strategy with optimal positive and negative recombination weights, the optimal (μ/μ_w, λ_iid + λ_m)-ES loses out in two ways. About 19% additional offspring are evaluated (with negative weights they are simply used without being evaluated); these additional evaluations lead to a maximal loss of 1 − 1/1.19 ≈ 16% in convergence speed. Furthermore, about 0.3 of the offspring are entirely disregarded for recombination (they have small negative weights otherwise). From the overall loss of 1 − 0.39/0.5 = 22% we can imply that the latter disregard contributes a loss of about 6% in convergence speed.

In preliminary experiments, not shown in this paper, mirrored sampling applied in CMA-ES, using the default recombination weights, improves the convergence speed in small populations, while its effect in large populations is almost negligible. (Footnote 6: A strong adverse effect that was first observed on a single multimodal function was due to a too small evaluation budget and vanished under more appropriate experimental conditions.) This is not surprising, as with large λ mirroring becomes much less effective, because offspring similar to the mirrored ones are already present in the population. Additionally, the reason to apply large populations is not to achieve faster convergence rates. (Footnote 7: Note however that the effectiveness of mirroring does not imply that the ES operates only in a local neighborhood predominated by linear terms in the fitness function expansion.) Considering that small populations are the default setting and that mirrored sampling is simple and has the potential to be cleverly exploited for the covariance matrix update, mirrored sampling might become a future standard method in practice.

6. REFERENCES
[1] D.V. Arnold. Optimal weighted recombination. In Foundations of Genetic Algorithms (FOGA 2005). Springer Verlag, 2005.
[2] D.V. Arnold. Weighted multirecombination evolution strategies. Theoretical Computer Science, 361:18-37, 2006.
[3] A. Auger, D. Brockhoff, and N. Hansen. Analyzing the impact of mirrored sampling and sequential selection in elitist evolution strategies. In Foundations of Genetic Algorithms (FOGA 2011). ACM, 2011. To appear.
[4] A. Auger and N. Hansen. Reconsidering the progress rate theory for evolution strategies in finite dimensions.
In Genetic and Evolutionary Computation Conference (GECCO 2006), 2006.
[5] H.-G. Beyer. The Theory of Evolution Strategies. Natural Computing Series. Springer-Verlag, 2001.
[6] D. Brockhoff, A. Auger, N. Hansen, D. V. Arnold, and T. Hohm. Mirrored sampling and sequential selection for evolution strategies. In Parallel Problem Solving from Nature (PPSN XI), pages 11-21. Springer, 2010.
[7] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159-195, 2001.
[8] M. Jebalia and A. Auger. Log-linear convergence of the scale-invariant (μ/μ_w, λ)-ES and optimal μ for intermediate recombination for large population sizes. Research Report RR-7275, INRIA, June 2010.
[9] A. Ostermeier, A. Gawelczyk, and N. Hansen. Step-size adaptation based on non-local use of selection information. In Conference on Problem Solving From Nature (PPSN III), pages 189-198, 1994.
[10] G. Rudolph. Convergence Properties of Evolutionary Algorithms. Verlag Dr. Kovac, Hamburg, 1997.
[11] H.-P. Schwefel. Evolution and Optimum Seeking. Sixth-Generation Computer Technology Series. John Wiley & Sons, Inc., New York, 1995.
[12] O. Teytaud, S. Gelly, and J. Mary. On the ultimate convergence rates for isotropic algorithms and the best choices among various forms of isotropy. In Conference on Parallel Problem Solving from Nature (PPSN IX). Springer, 2006.


More information

Separation of Variables

Separation of Variables Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical

More information

THE RELAXATION SPEED IN THE CASE THE FLOW SATISFIES EXPONENTIAL DECAY OF CORRELATIONS

THE RELAXATION SPEED IN THE CASE THE FLOW SATISFIES EXPONENTIAL DECAY OF CORRELATIONS HE RELAXAIO SPEED I HE CASE HE FLOW SAISFIES EXPOEIAL DECAY OF CORRELAIOS Brice Franke, hi-hien guyen o cite this version: Brice Franke, hi-hien guyen HE RELAXAIO SPEED I HE CASE HE FLOW SAISFIES EXPOEIAL

More information

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number

More information

KNN Particle Filters for Dynamic Hybrid Bayesian Networks

KNN Particle Filters for Dynamic Hybrid Bayesian Networks KNN Particle Filters for Dynamic Hybri Bayesian Networs H. D. Chen an K. C. Chang Dept. of Systems Engineering an Operations Research George Mason University MS 4A6, 4400 University Dr. Fairfax, VA 22030

More information

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France APPROXIMAE SOLUION FOR RANSIEN HEA RANSFER IN SAIC URBULEN HE II B. Bauouy CEA/Saclay, DSM/DAPNIA/SCM 91191 Gif-sur-Yvette Ceex, France ABSRAC Analytical solution in one imension of the heat iffusion equation

More information

Lecture 2 Lagrangian formulation of classical mechanics Mechanics

Lecture 2 Lagrangian formulation of classical mechanics Mechanics Lecture Lagrangian formulation of classical mechanics 70.00 Mechanics Principle of stationary action MATH-GA To specify a motion uniquely in classical mechanics, it suffices to give, at some time t 0,

More information

Nonlinear Adaptive Ship Course Tracking Control Based on Backstepping and Nussbaum Gain

Nonlinear Adaptive Ship Course Tracking Control Based on Backstepping and Nussbaum Gain Nonlinear Aaptive Ship Course Tracking Control Base on Backstepping an Nussbaum Gain Jialu Du, Chen Guo Abstract A nonlinear aaptive controller combining aaptive Backstepping algorithm with Nussbaum gain

More information

Analyzing Tensor Power Method Dynamics in Overcomplete Regime

Analyzing Tensor Power Method Dynamics in Overcomplete Regime Journal of Machine Learning Research 18 (2017) 1-40 Submitte 9/15; Revise 11/16; Publishe 4/17 Analyzing Tensor Power Metho Dynamics in Overcomplete Regime Animashree Ananumar Department of Electrical

More information

Switching Time Optimization in Discretized Hybrid Dynamical Systems

Switching Time Optimization in Discretized Hybrid Dynamical Systems Switching Time Optimization in Discretize Hybri Dynamical Systems Kathrin Flaßkamp, To Murphey, an Sina Ober-Blöbaum Abstract Switching time optimization (STO) arises in systems that have a finite set

More information

On the Surprising Behavior of Distance Metrics in High Dimensional Space

On the Surprising Behavior of Distance Metrics in High Dimensional Space On the Surprising Behavior of Distance Metrics in High Dimensional Space Charu C. Aggarwal, Alexaner Hinneburg 2, an Daniel A. Keim 2 IBM T. J. Watson Research Center Yortown Heights, NY 0598, USA. charu@watson.ibm.com

More information

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum October 6, 4 ARDB Note Analytic Scaling Formulas for Crosse Laser Acceleration in Vacuum Robert J. Noble Stanfor Linear Accelerator Center, Stanfor University 575 San Hill Roa, Menlo Park, California 945

More information

Problem Statement Continuous Domain Search/Optimization. Tutorial Evolution Strategies and Related Estimation of Distribution Algorithms.

Problem Statement Continuous Domain Search/Optimization. Tutorial Evolution Strategies and Related Estimation of Distribution Algorithms. Tutorial Evolution Strategies and Related Estimation of Distribution Algorithms Anne Auger & Nikolaus Hansen INRIA Saclay - Ile-de-France, project team TAO Universite Paris-Sud, LRI, Bat. 49 945 ORSAY

More information

Reactive Power Compensation in Mechanical Systems

Reactive Power Compensation in Mechanical Systems Reactive Power Compensation in Mechanical Systems Carlos Rengifo, Bassel Kaar, Yannick Aoustin, Christine Chevallereau To cite this version: Carlos Rengifo, Bassel Kaar, Yannick Aoustin, Christine Chevallereau.

More information

Optimized Schwarz Methods with the Yin-Yang Grid for Shallow Water Equations

Optimized Schwarz Methods with the Yin-Yang Grid for Shallow Water Equations Optimize Schwarz Methos with the Yin-Yang Gri for Shallow Water Equations Abessama Qaouri Recherche en prévision numérique, Atmospheric Science an Technology Directorate, Environment Canaa, Dorval, Québec,

More information

Agmon Kolmogorov Inequalities on l 2 (Z d )

Agmon Kolmogorov Inequalities on l 2 (Z d ) Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,

More information

Equilibrium in Queues Under Unknown Service Times and Service Value

Equilibrium in Queues Under Unknown Service Times and Service Value University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University

More information

CONTROL CHARTS FOR VARIABLES

CONTROL CHARTS FOR VARIABLES UNIT CONTOL CHATS FO VAIABLES Structure.1 Introuction Objectives. Control Chart Technique.3 Control Charts for Variables.4 Control Chart for Mean(-Chart).5 ange Chart (-Chart).6 Stanar Deviation Chart

More information

Quantum Mechanics in Three Dimensions

Quantum Mechanics in Three Dimensions Physics 342 Lecture 20 Quantum Mechanics in Three Dimensions Lecture 20 Physics 342 Quantum Mechanics I Monay, March 24th, 2008 We begin our spherical solutions with the simplest possible case zero potential.

More information

Convergence of Random Walks

Convergence of Random Walks Chapter 16 Convergence of Ranom Walks This lecture examines the convergence of ranom walks to the Wiener process. This is very important both physically an statistically, an illustrates the utility of

More information

A study on ant colony systems with fuzzy pheromone dispersion

A study on ant colony systems with fuzzy pheromone dispersion A stuy on ant colony systems with fuzzy pheromone ispersion Louis Gacogne LIP6 104, Av. Kenney, 75016 Paris, France gacogne@lip6.fr Sanra Sanri IIIA/CSIC Campus UAB, 08193 Bellaterra, Spain sanri@iiia.csic.es

More information

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems Construction of the Electronic Raial Wave Functions an Probability Distributions of Hyrogen-like Systems Thomas S. Kuntzleman, Department of Chemistry Spring Arbor University, Spring Arbor MI 498 tkuntzle@arbor.eu

More information

Advanced Optimization

Advanced Optimization Advanced Optimization Lecture 3: 1: Randomized Algorithms for for Continuous Discrete Problems Problems November 22, 2016 Master AIC Université Paris-Saclay, Orsay, France Anne Auger INRIA Saclay Ile-de-France

More information

θ x = f ( x,t) could be written as

θ x = f ( x,t) could be written as 9. Higher orer PDEs as systems of first-orer PDEs. Hyperbolic systems. For PDEs, as for ODEs, we may reuce the orer by efining new epenent variables. For example, in the case of the wave equation, (1)

More information

Calculus in the AP Physics C Course The Derivative

Calculus in the AP Physics C Course The Derivative Limits an Derivatives Calculus in the AP Physics C Course The Derivative In physics, the ieas of the rate change of a quantity (along with the slope of a tangent line) an the area uner a curve are essential.

More information

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency Transmission Line Matrix (TLM network analogues of reversible trapping processes Part B: scaling an consistency Donar e Cogan * ANC Eucation, 308-310.A. De Mel Mawatha, Colombo 3, Sri Lanka * onarecogan@gmail.com

More information

Permuted Orthogonal Block-Diagonal Transformation Matrices for Large Scale Optimization Benchmarking

Permuted Orthogonal Block-Diagonal Transformation Matrices for Large Scale Optimization Benchmarking Permute Orthogonal Block-Diagonal Transformation Matrices for Large Scale Optimization Benchmarking Ouassim Ait Elhara, Anne Auger, Nikolaus Hansen To cite this version: Ouassim Ait Elhara, Anne Auger,

More information

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes Leaving Ranomness to Nature: -Dimensional Prouct Coes through the lens of Generalize-LDPC coes Tavor Baharav, Kannan Ramchanran Dept. of Electrical Engineering an Computer Sciences, U.C. Berkeley {tavorb,

More information

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling Balancing Expecte an Worst-Case Utility in Contracting Moels with Asymmetric Information an Pooling R.B.O. erkkamp & W. van en Heuvel & A.P.M. Wagelmans Econometric Institute Report EI2018-01 9th January

More information

Ramsey numbers of some bipartite graphs versus complete graphs

Ramsey numbers of some bipartite graphs versus complete graphs Ramsey numbers of some bipartite graphs versus complete graphs Tao Jiang, Michael Salerno Miami University, Oxfor, OH 45056, USA Abstract. The Ramsey number r(h, K n ) is the smallest positive integer

More information

II. First variation of functionals

II. First variation of functionals II. First variation of functionals The erivative of a function being zero is a necessary conition for the etremum of that function in orinary calculus. Let us now tackle the question of the equivalent

More information

Lecture 6: Calculus. In Song Kim. September 7, 2011

Lecture 6: Calculus. In Song Kim. September 7, 2011 Lecture 6: Calculus In Song Kim September 7, 20 Introuction to Differential Calculus In our previous lecture we came up with several ways to analyze functions. We saw previously that the slope of a linear

More information

Quality Gain Analysis of the Weighted Recombination Evolution Strategy on General Convex Quadratic Functions

Quality Gain Analysis of the Weighted Recombination Evolution Strategy on General Convex Quadratic Functions Quality Gain Analysis of the Weighted Recombination Evolution Strategy on General Convex Quadratic Functions Youhei Akimoto, Anne Auger, Nikolaus Hansen To cite this version: Youhei Akimoto, Anne Auger,

More information

Database-friendly Random Projections

Database-friendly Random Projections Database-frienly Ranom Projections Dimitris Achlioptas Microsoft ABSTRACT A classic result of Johnson an Linenstrauss asserts that any set of n points in -imensional Eucliean space can be embee into k-imensional

More information

Implicit Differentiation

Implicit Differentiation Implicit Differentiation Thus far, the functions we have been concerne with have been efine explicitly. A function is efine explicitly if the output is given irectly in terms of the input. For instance,

More information