CMA-ES with Optimal Covariance Update and Storage Complexity


Oswin Krause (Dept. of Computer Science, University of Copenhagen, Copenhagen, Denmark), Dídac R. Arbonès (Dept. of Computer Science, University of Copenhagen, Copenhagen, Denmark), Christian Igel (Dept. of Computer Science, University of Copenhagen, Copenhagen, Denmark)

29th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Abstract

The covariance matrix adaptation evolution strategy (CMA-ES) is arguably one of the most powerful real-valued derivative-free optimization algorithms, finding many applications in machine learning. The CMA-ES is a Monte Carlo method, sampling from a sequence of multi-variate Gaussian distributions. Given the function values at the sampled points, updating and storing the covariance matrix dominates the time and space complexity in each iteration of the algorithm. We propose a numerically stable quadratic-time covariance matrix update scheme with minimal memory requirements based on maintaining triangular Cholesky factors. This requires a modification of the cumulative step-size adaptation (CSA) mechanism in the CMA-ES, in which we replace the inverse of the square root of the covariance matrix by the inverse of the triangular Cholesky factor. Because the triangular Cholesky factor changes smoothly with the matrix square root, this modification does not change the behavior of the CMA-ES in terms of required objective function evaluations, as verified empirically. Thus, the described algorithm can and should replace the standard CMA-ES if updating and storing the covariance matrix matters.

1 Introduction

The covariance matrix adaptation evolution strategy, CMA-ES [Hansen and Ostermeier, 2001], is recognized as one of the most competitive derivative-free algorithms for real-valued optimization [Beyer, 2007; Eiben and Smith, 2015]. The algorithm has been successfully applied in many unbiased performance comparisons and numerous real-world applications. In machine learning, it is mainly used for direct policy search in reinforcement learning and hyperparameter tuning in supervised learning (e.g., see Gomez et al. [2008]; Heidrich-Meisner and Igel [2009a,b]; Igel [2010], and references therein).

The CMA-ES is a Monte Carlo method for optimizing functions f : R^d → R. The objective function f does not need to be continuous and can be multi-modal, constrained, and disturbed by noise. In each iteration, the CMA-ES samples from a d-dimensional multivariate normal distribution, the search distribution, and ranks the sampled points according to their objective function values. The mean and the covariance matrix of the search distribution are then adapted based on the ranked points. Given the ranking of the sampled points, the runtime of one CMA-ES iteration is ω(d²), because the square root of the covariance matrix is required, which is typically computed by an eigenvalue decomposition. If the objective function can be evaluated efficiently and/or d is large, the computation of the matrix square root can easily dominate the runtime of the optimization process.

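To make this cost concrete, the following minimal NumPy sketch (an illustration on our part, not code from the paper) computes C^{1/2} and C^{−1/2} from an eigenvalue decomposition; this cubic-time step is what dominates an iteration when d is large.

```python
import numpy as np

def sqrt_and_inv_sqrt(C):
    """Matrix square root and inverse square root of a symmetric positive definite C."""
    eigvals, B = np.linalg.eigh(C)       # C = B diag(eigvals) B^T, an O(d^3) operation
    D = np.sqrt(eigvals)
    return (B * D) @ B.T, (B / D) @ B.T  # C^{1/2} and C^{-1/2}

C = np.array([[4.0, 1.0], [1.0, 3.0]])
C_sqrt, C_inv_sqrt = sqrt_and_inv_sqrt(C)
assert np.allclose(C_sqrt @ C_sqrt, C)
assert np.allclose(C_sqrt @ C_inv_sqrt, np.eye(2))
```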
Various strategies have been proposed to address this problem. The basic approach for reducing the runtime is to perform an update of the matrix only every τ ∈ Ω(d) steps [Hansen and Ostermeier, 1996, 2001], effectively reducing the time complexity to O(d²). However, this forces the algorithm to use outdated matrices during most iterations and can increase the number of function evaluations. Furthermore, it leads to an uneven distribution of computation time over the iterations. Another approach is to restrict the model complexity of the search distribution [Poland and Zell, 2001; Ros and Hansen, 2008; Sun et al., 2013; Akimoto et al., 2014; Loshchilov, 2014, 2015], for example, to consider only diagonal matrices [Ros and Hansen, 2008]. However, this can lead to a drastic increase in the number of function evaluations needed to approximate the optimum if the objective function is not compatible with the restriction, for example, when optimizing highly non-separable problems while only adapting the diagonal of the covariance matrix [Omidvar and Li, 2010]. More recently, methods were proposed that update the Cholesky factor of the covariance matrix instead of the covariance matrix itself [Suttorp et al., 2009; Krause and Igel, 2015]. This works well for some CMA-ES variants (e.g., the (1+1)-CMA-ES and the multi-objective MO-CMA-ES [Suttorp et al., 2009; Krause and Igel, 2015; Bringmann et al., 2013]); however, the original CMA-ES relies on the matrix square root, which cannot be replaced one-to-one by a Cholesky factor.

In the following, we explore the use of the triangular Cholesky factorization instead of the square root in the standard CMA-ES. In contrast to previous attempts in this direction, we present an approach that comes with a theoretical justification for why it does not deteriorate the algorithm's performance. This approach leads to the optimal asymptotic storage and runtime complexity when adaptation of the full covariance matrix is required, as is the case for non-separable ill-conditioned problems. Our CMA-ES variant, referred to as Cholesky-CMA-ES, reduces the runtime complexity of the algorithm with no significant change in the number of objective function evaluations. It also reduces the memory footprint of the algorithm.

Section 2 briefly describes the original CMA-ES algorithm (for details we refer to Hansen [2015]). In Section 3 we propose our new method for approximating the step-size adaptation. We give a theoretical justification for the convergence of the new algorithm. We provide empirical performance results comparing the original CMA-ES with the new Cholesky-CMA-ES on various benchmark functions in Section 4. Finally, we discuss our results and draw our conclusions.

2 Background

Before we briefly describe the CMA-ES to fix our notation, we discuss some basic properties of using a Cholesky decomposition to sample from a multi-variate Gaussian distribution. Sampling from a d-dimensional multi-variate normal distribution N(m, Σ), m ∈ R^d, Σ ∈ R^{d×d}, is usually done using a decomposition of the covariance matrix Σ. This could be the square root of the matrix, Σ = HH with H ∈ R^{d×d}, or a lower triangular Cholesky factorization Σ = AA^T, which is related to the square root by the QR-decomposition H = AE, where E is an orthogonal matrix. We can sample a point x from N(m, Σ) using a sample z ∼ N(0, I) by x = Hz + m = AEz + m = Ay + m, where we set y = Ez. We have y ∼ N(0, I) since E is orthogonal. Thus, as long as we are only interested in the value of x and do not need y, we can sample using the Cholesky factor instead of the matrix square root.
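The following NumPy sketch (an illustration under the assumptions above, not the paper's Shark/C++ implementation) checks empirically that sampling with a lower triangular Cholesky factor A of Σ, i.e., x = m + σAz with z ∼ N(0, I), produces samples with covariance σ²Σ, just as sampling with the matrix square root does.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
M = rng.normal(size=(d, d))
Sigma = M @ M.T + d * np.eye(d)              # an arbitrary symmetric positive definite covariance
m, sigma = rng.normal(size=d), 0.5

A = np.linalg.cholesky(Sigma)                # lower triangular, Sigma = A A^T
x = m + sigma * A @ rng.standard_normal(d)   # one sample from N(m, sigma^2 Sigma)

# Empirical check: the covariance of many such samples approaches sigma^2 * Sigma.
Z = rng.standard_normal((200_000, d))
X = m + sigma * Z @ A.T
assert np.allclose(np.cov(X, rowvar=False), sigma**2 * Sigma, rtol=0.05, atol=0.05)
```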
2.1 CMA-ES

The CMA-ES has been proposed by Hansen and Ostermeier [1996, 2001]; its most recent version is described by Hansen [2015]. In the t-th iteration of the algorithm, the CMA-ES samples λ points from a multivariate normal distribution N(m_t, σ_t² C_t), evaluates the objective function f at these points, and adapts the parameters C_t ∈ R^{d×d}, m_t ∈ R^d, and σ_t ∈ R_+. In the following, we present the update procedure in a slightly simplified form (for didactic reasons, we refer to Hansen [2015] for the details). All parameters (µ, λ, ω_{i=1,...,µ}, c_σ, d_σ, c_c, c_1, c_µ) are set to their default values [Hansen, 2015, Table 1].

For a minimization task, the λ points are ranked by function value such that f(x_{1,t}) ≤ f(x_{2,t}) ≤ ... ≤ f(x_{λ,t}). The distribution mean is set to the weighted average m_{t+1} = Σ_{i=1}^{µ} ω_i x_{i,t}. The weights depend only on the ranking, not on the function values directly. This renders the algorithm invariant under order-preserving transformations of the objective function. Points with smaller ranks (i.e., better objective function values) are given a larger weight ω_i, with Σ_{i=1}^{λ} ω_i = 1. The weights are zero for ranks larger than µ < λ, where typically µ = λ/2. Thus, points with function values worse than the median do not enter the adaptation process of the parameters.

The covariance matrix is updated using two terms, a rank-1 and a rank-µ update. For the rank-1 update, a long term average of the changes of m_t is maintained,¹

    p_{c,t+1} = (1 − c_c) p_{c,t} + √(c_c (2 − c_c) µ_eff) (m_{t+1} − m_t)/σ_t ,    (1)

where µ_eff = 1 / Σ_{i=1}^{µ} ω_i² is the effective sample size given the weights. Note that p_{c,t} is large when the algorithm performs steps in the same direction, while it becomes small when the algorithm performs steps in alternating directions. The rank-µ update estimates the covariance of the weighted steps x_{i,t} − m_t, 1 ≤ i ≤ µ. Combining the rank-1 and rank-µ updates gives the final update rule for C_t, which can be motivated by principles from information geometry [Akimoto et al., 2012]:

    C_{t+1} = (1 − c_1 − c_µ) C_t + c_1 p_{c,t+1} p_{c,t+1}^T + (c_µ/σ_t²) Σ_{i=1}^{µ} ω_i (x_{i,t} − m_t)(x_{i,t} − m_t)^T    (2)

So far, the update is (apart from initialization) invariant under affine linear transformations (i.e., x → Bx + b, B ∈ GL(d, R)).

The update of the global step-size parameter σ_t is based on the cumulative step-size adaptation algorithm (CSA). It measures the correlation of successive steps in a normalized coordinate system. The goal is to adapt σ_t such that the steps of the algorithm become uncorrelated. Under the assumption that uncorrelated steps are standard normally distributed, a carefully designed long term average over the steps should have the same expected length as a χ-distributed random variable, denoted by E{χ}. The long term average has the form

    p_{σ,t+1} = (1 − c_σ) p_{σ,t} + √(c_σ (2 − c_σ) µ_eff) C_t^{−1/2} (m_{t+1} − m_t)/σ_t    (3)

with p_{σ,1} = 0. The normalization by the factor C_t^{−1/2} is the main difference between equations (1) and (3). It is important because it corrects for a change of C_t between iterations. Without this correction, it is difficult to measure correlations accurately in the un-normalized coordinate system. For the update, the length of p_{σ,t+1} is compared to the expected length E{χ}, and σ_t is changed depending on whether the average step taken is longer or shorter than expected:

    σ_{t+1} = σ_t exp( (c_σ/d_σ) ( ‖p_{σ,t+1}‖/E{χ} − 1 ) )    (4)

This update is not proven to preserve invariance under affine linear transformations [Auger, 2015], and it is conjectured that it does not.

3 Cholesky-CMA-ES

In general, computing the matrix square root or the Cholesky factor of a d × d matrix has time complexity ω(d²) (i.e., it scales worse than quadratically). To reduce this complexity, Suttorp et al. [2009] have suggested to replace the process of updating the covariance matrix and decomposing it afterwards by updates operating directly on the decomposition (i.e., the covariance matrix is never computed and stored explicitly; only its factorization is maintained). Krause and Igel [2015] have shown that the update of C_t in equation (2) can be rewritten as a quadratic-time update of its triangular Cholesky factor A_t with C_t = A_t A_t^T. They consider the special case µ = λ = 1. We propose to extend this update to the standard CMA-ES, which leads to a runtime of O(µd²). As typically µ = O(log(d)), this gives a large speed-up compared to the explicit recomputation of the Cholesky factor or the inverse of the covariance matrix.

Unfortunately, the fast Cholesky update cannot be applied directly to the original CMA-ES. To see this, consider the term s_t = C_t^{−1/2} (m_{t+1} − m_t) in equation (3). Rewriting p_{σ,t+1} in terms of s_t in a non-recursive fashion, we obtain

    p_{σ,t+1} = Σ_{k=1}^{t} (1 − c_σ)^{t−k} √(c_σ (2 − c_σ) µ_eff) s_k / σ_k .

¹ Given c_c, the factors in (1) are chosen to compensate for the change in variance when adding distributions. If the ranking of the points were purely random, √µ_eff (m_{t+1} − m_t)/σ_t ∼ N(0, C_t), and if C_t = I and p_{c,t} ∼ N(0, I), then p_{c,t+1} ∼ N(0, I).
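For concreteness, the following NumPy sketch (a simplified illustration, not the paper's implementation) performs one parameter update according to equations (1)–(4). The rows of X are assumed to hold the µ best sampled points, already sorted; E{χ} is replaced by its standard approximation; and the eigendecomposition used to obtain C_t^{−1/2} is the cubic-time step whose removal is the subject of the next section.

```python
import numpy as np

def cma_update(m, sigma, C, p_c, p_sigma, X, weights,
               c_c, c_1, c_mu, c_sigma, d_sigma):
    """One simplified CMA-ES update; X holds the mu best points as rows (best first)."""
    d = m.size
    mu_eff = 1.0 / np.sum(weights ** 2)
    m_new = weights @ X                                    # weighted mean of the mu best points
    y = (m_new - m) / sigma

    # Equations (3) and (4): CSA needs C^{-1/2}, obtained here by a cubic-time eigendecomposition.
    eigvals, B = np.linalg.eigh(C)
    C_inv_sqrt = (B / np.sqrt(eigvals)) @ B.T
    p_sigma = ((1 - c_sigma) * p_sigma
               + np.sqrt(c_sigma * (2 - c_sigma) * mu_eff) * C_inv_sqrt @ y)
    chi = np.sqrt(d) * (1 - 1 / (4 * d) + 1 / (21 * d ** 2))   # approximation of E{chi}
    sigma_new = sigma * np.exp(c_sigma / d_sigma * (np.linalg.norm(p_sigma) / chi - 1))

    # Equations (1) and (2): rank-1 and rank-mu covariance update.
    p_c = (1 - c_c) * p_c + np.sqrt(c_c * (2 - c_c) * mu_eff) * y
    Z = (X - m) / sigma
    C_new = ((1 - c_1 - c_mu) * C
             + c_1 * np.outer(p_c, p_c)
             + c_mu * (Z.T * weights) @ Z)
    return m_new, sigma_new, C_new, p_c, p_sigma
```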

Algorithm 1: The Cholesky-CMA-ES.

    input: λ, µ, m_1, ω_{i=1,...,µ}, c_σ, d_σ, c_c, c_1, and c_µ
    A_1 = I, p_{c,1} = 0, p_{σ,1} = 0
    for t = 1, 2, ... do
        for i = 1, ..., λ do
            x_{i,t} = σ_t A_t y_{i,t} + m_t,  y_{i,t} ∼ N(0, I)
        sort x_{i,t}, i = 1, ..., λ, increasing by f(x_{i,t})
        m_{t+1} = Σ_{i=1}^{µ} ω_i x_{i,t}
        p_{c,t+1} = (1 − c_c) p_{c,t} + √(c_c (2 − c_c) µ_eff) (m_{t+1} − m_t)/σ_t
        // apply formula (2) to A_t
        A_{t+1} ← √(1 − c_1 − c_µ) A_t
        A_{t+1} ← rankOneUpdate(A_{t+1}, c_1, p_{c,t+1})
        for i = 1, ..., µ do
            A_{t+1} ← rankOneUpdate(A_{t+1}, c_µ ω_i, (x_{i,t} − m_t)/σ_t)
        // update σ using ŝ_k as in (5)
        p_{σ,t+1} = (1 − c_σ) p_{σ,t} + √(c_σ (2 − c_σ) µ_eff) A_t^{−1} (m_{t+1} − m_t)/σ_t
        σ_{t+1} = σ_t exp( (c_σ/d_σ) ( ‖p_{σ,t+1}‖/E{χ} − 1 ) )

Algorithm 2: rankOneUpdate(A, β, v)

    input: Cholesky factor A ∈ R^{d×d} of C, β ∈ R, v ∈ R^d
    output: Cholesky factor A' of C + β v v^T
    α ← v, b ← 1
    for j = 1, ..., d do
        A'_{jj} ← √(A_{jj}² + (β/b) α_j²)
        γ ← A_{jj}² b + β α_j²
        for k = j + 1, ..., d do
            α_k ← α_k − (α_j / A_{jj}) A_{kj}
            A'_{kj} ← (A'_{jj}/A_{jj}) A_{kj} + (A'_{jj} β α_j / γ) α_k
        b ← b + β α_j² / A_{jj}²

By the RQ-decomposition, we can find C_t^{1/2} = A_t E_t with E_t being an orthogonal matrix and A_t lower triangular. When replacing s_t by ŝ_t = A_t^{−1} (m_{t+1} − m_t), we obtain

    p_{σ,t+1} = Σ_{k=1}^{t} (1 − c_σ)^{t−k} √(c_σ (2 − c_σ) µ_eff) E_k^T ŝ_k / σ_k .

Thus, replacing C_t^{−1/2} by A_t^{−1} introduces a new random rotation matrix E_t^T, which changes in every iteration. Obtaining E_t from A_t can be achieved by the polar decomposition, which is a cubic-time operation: currently no algorithms are known that can update an existing polar decomposition from an updated Cholesky factor in less than cubic time. Thus, if our goal is to apply the fast Cholesky update, we have to perform the update without this correction factor:

    p_{σ,t+1} ≈ Σ_{k=1}^{t} (1 − c_σ)^{t−k} √(c_σ (2 − c_σ) µ_eff) ŝ_k / σ_k .    (5)
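A direct NumPy transcription of the rank-one Cholesky update in Algorithm 2 (variable names and the consistency check are our own) looks as follows; it updates the lower triangular factor in O(d²) without ever forming C + βvv^T.

```python
import numpy as np

def rank_one_update(A, beta, v):
    """Return the lower triangular Cholesky factor of A A^T + beta * v v^T (beta > 0)."""
    A = A.copy()
    alpha = np.array(v, dtype=float)
    b = 1.0
    d = len(v)
    for j in range(d):
        ajj = A[j, j]
        new_ajj = np.sqrt(ajj**2 + (beta / b) * alpha[j]**2)
        gamma = ajj**2 * b + beta * alpha[j]**2
        for k in range(j + 1, d):
            alpha[k] -= (alpha[j] / ajj) * A[k, j]
            A[k, j] = (new_ajj / ajj) * A[k, j] + (new_ajj * beta * alpha[j] / gamma) * alpha[k]
        A[j, j] = new_ajj
        b += beta * alpha[j]**2 / ajj**2
    return A

# Consistency check against a recomputed factorization.
rng = np.random.default_rng(1)
M = rng.normal(size=(6, 6))
C = M @ M.T + 6 * np.eye(6)
A = np.linalg.cholesky(C)
v = rng.normal(size=6)
assert np.allclose(rank_one_update(A, 0.3, v), np.linalg.cholesky(C + 0.3 * np.outer(v, v)))
```

In Algorithm 1, A_{t+1} is obtained by scaling A_t with √(1 − c_1 − c_µ) and then calling this routine once with β = c_1 and µ more times with β = c_µ ω_i; the product A_t^{−1}(m_{t+1} − m_t) needed for the step-size path can be computed by a quadratic-time triangular solve instead of an explicit inverse.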

This introduces some error, but we will show in the following that we can expect this error to be small and to decrease over time as the algorithm converges to the optimum. For this, we need the following result.

Lemma 1. Consider a sequence of symmetric positive definite matrices (C_t)_{t=0}^{∞} and let C̄_t = C_t/(det C_t)^{1/d}. Assume that C̄_t → C̄ as t → ∞ and that C̄ is symmetric positive definite with det C̄ = 1. Let C̄_t^{1/2} = Ā_t E_t denote the RQ-decomposition of C̄_t^{1/2}, where E_t is orthogonal and Ā_t lower triangular. Then E_{t−1} E_t^T → I as t → ∞.

Proof. Let C̄^{1/2} = Ā Ē be the RQ-decomposition of C̄^{1/2}. As det C̄ ≠ 0, this decomposition is unique. Because the RQ-decomposition is continuous, it maps convergent sequences to convergent sequences. Therefore E_t → Ē as t → ∞, and thus E_{t−1} E_t^T → Ē Ē^T = I.

This result establishes that, when C_t converges to a certain shape (but not necessarily to a certain scaling), A_t and thus E_t will also converge (up to scaling). Thus, as we only need the norm of p_{σ,t+1}, we can rotate the coordinate system; multiplying with E_t we obtain

    E_t p_{σ,t+1} = Σ_{k=1}^{t} (1 − c_σ)^{t−k} √(c_σ (2 − c_σ) µ_eff) E_t E_k^T ŝ_k / σ_k .    (6)

Therefore, if E_t E_k^T → I, the error in the norm will also vanish due to the exponential weighting in the summation. Note that this does not hold for an arbitrary decomposition C_t = B_t B_t^T. If we do not constrain B_t to be triangular and allow any matrix, we no longer have a bijective mapping between C_t and B_t, and the introduction of (d − 1)d/2 additional degrees of freedom (as, e.g., in the update proposed by Suttorp et al. [2009]) allows the creation of non-converging sequences of E_t even for C_t = const.

As the CMA-ES is a randomized algorithm, we cannot assume convergence of C_t. However, in simplified algorithms the expectation of C_t converges [Beyer, 2014]. Still, the reasoning behind Lemma 1 establishes that the error caused by replacing s_t by ŝ_t is small if C_t changes slowly. Equation (6) shows that the error depends only on the rotation of coordinate systems. As the mapping from C_t to the triangular factor A_t is one-to-one and smooth, the coordinate-system changes in every step will be small, and because of the exponentially decaying weighting, only the last few coordinate systems matter at a particular time step t.

The Cholesky-CMA-ES algorithm is given in Algorithm 1. One can derive the algorithm from the standard CMA-ES by decomposing (2) into a number of rank-1 updates, C_{t+1} = (((αC_t + β_1 v_1 v_1^T) + β_2 v_2 v_2^T) + ...), and applying them to the Cholesky factor using Algorithm 2.

Properties of the update rule. The O(µd²) complexity of the update in the Cholesky-CMA-ES is asymptotically optimal.² Apart from the theoretical guarantees, there are several additional advantages compared to approaches using a non-triangular Cholesky factorization (e.g., Suttorp et al. [2009]). First, as only triangular matrices have to be stored, the storage complexity is optimal. Second, the diagonal elements of a triangular Cholesky factor are the square roots of the eigenvalues of the factorized matrix, that is, we get the eigenvalues of the covariance matrix for free. These are important, for example, for monitoring the conditioning of the optimization problem and, in particular, to enforce lower bounds on the variances of σ_t C_t projected on its principal components. Third, a triangular matrix can be inverted in quadratic time. Thus, we can efficiently compute A_t^{−1} from A_t when needed, instead of having two separate quadratic-time updates for A_t and A_t^{−1}, which requires more memory and is prone to numerical instabilities.
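As a small numerical illustration of Lemma 1 (our own sketch, not from the paper): when C_t converges in shape, the orthogonal factor E_t obtained from the triangular factorization of C_t^{1/2} converges as well, so the rotation E_t E_{t−1}^T approaches the identity.

```python
import numpy as np

def orthogonal_factor(C):
    """E with C^{1/2} = A E, where A is the lower triangular Cholesky factor of C."""
    eigvals, B = np.linalg.eigh(C)
    C_sqrt = (B * np.sqrt(eigvals)) @ B.T
    A = np.linalg.cholesky(C)
    return np.linalg.solve(A, C_sqrt)   # E = A^{-1} C^{1/2}; orthogonal since A A^T = C

rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4))
C_limit = M @ M.T + 4 * np.eye(4)       # limit of the sequence C_t
E_prev = None
for t in range(1, 8):
    C_t = C_limit + 0.5**t * np.diag(rng.uniform(size=4))    # C_t -> C_limit
    E_t = orthogonal_factor(C_t)
    if E_prev is not None:
        print(t, np.linalg.norm(E_t @ E_prev.T - np.eye(4)))  # tends to 0
    E_prev = E_t
```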
4 Experiments and Results

Experiments. We compare the Cholesky-CMA-ES with other CMA-ES variants.³ The reference CMA-ES implementation uses a delay strategy in which the matrix square root is computed only every max{1, ⌊1/(10d(c_1 + c_µ))⌋} iterations [Hansen, 2015], which equals one for the dimensions considered in our experiments. We call this variant CMA-ES-Ref.

² Actually, the complexity is related to the complexity of multiplying two d × µ matrices. We assume a naïve implementation of matrix multiplication. With a faster multiplication algorithm, the complexity can be reduced accordingly.

³ We added our algorithm to the open-source machine learning library Shark [Igel et al., 2008] and use LAPACK for high efficiency.

[Figure 1: Number of function evaluations required to reach f(x) < 10^{-14}, plotted over the problem dimensionality d (medians of 100 trials). Panels: (a) Sphere, (b) Cigar, (c) Discus, (d) Ellipsoid, (e) Rosenbrock, (f) DiffPowers; algorithms: Cholesky-CMA-ES, Suttorp-CMA-ES, CMA-ES/d, CMA-ES-Ref. The graphs for CMA-ES-Ref and Cholesky-CMA-ES overlap.]

[Figure 2: Runtime in seconds over the problem dimensionality d for the same algorithms and benchmark functions (medians of 100 trials). Note the logarithmic scaling on both axes.]

Table 1: Benchmark functions used in the experiments (additionally, a rotation matrix B transforms the variables, x → Bx).

    Name              f(x)
    Sphere            ‖x‖²
    Rosenbrock        Σ_{i=0}^{d−2} ( 100 (x_{i+1} − x_i²)² + (1 − x_i)² )
    Discus            x_0² + Σ_{i=1}^{d−1} 10^{−6} x_i²
    Cigar             10^{−6} x_0² + Σ_{i=1}^{d−1} x_i²
    Ellipsoid         Σ_{i=0}^{d−1} 10^{−6 i/(d−1)} x_i²
    Different Powers  Σ_{i=0}^{d−1} |x_i|^{2 + 10 i/(d−1)}

[Figure 3: Function value evolution over time, log f(m_t) versus time in seconds, on the benchmark functions with d = 128. Panels: (a) Sphere, (b) Cigar, (c) Discus, (d) DiffPowers, (e) Ellipsoid, (f) Rosenbrock; algorithms: Cholesky-CMA-ES, Suttorp-CMA-ES, CMA-ES/d, CMA-ES-Ref. Shown are single runs, namely those with runtimes closest to the corresponding median runtimes.]
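For reference, a NumPy sketch of two of the benchmark functions and of a rotated evaluation (the function definitions follow the standard textbook forms matching Table 1; the QR-based sampling of the rotation matrix is our own choice, not taken from the paper):

```python
import numpy as np

def rosenbrock(x):
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2)

def ellipsoid(x):
    d = len(x)
    return np.sum(10.0**(-6.0 * np.arange(d) / (d - 1)) * x**2)

def random_rotation(d, rng):
    """Random orthogonal matrix via the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(3)
d = 16
B = random_rotation(d, rng)
x0 = rng.uniform(size=d)        # starting points are drawn uniformly from [0, 1]^d
print(rosenbrock(B @ x0), ellipsoid(B @ x0))
```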

As an alternative, we experimented with delaying the update for d steps; we refer to this variant as CMA-ES/d. We also adapted the non-triangular Cholesky factor approach by Suttorp et al. [2009] to the state-of-the-art implementation of the CMA-ES; we refer to the resulting algorithm as Suttorp-CMA-ES.

We considered the standard benchmark functions for derivative-free optimization given in Table 1. Sphere is considered to show that on a spherical function the step-size adaptation does not behave differently; Cigar, Discus, and Ellipsoid model functions with different convex shapes near the optimum; Rosenbrock tests learning a function with bends, which lead to slowly converging covariance matrices in the optimization process; DiffPowers is an example of a function with arbitrarily bad conditioning. To test rotation invariance, we applied a rotation matrix to the variables, x → Bx, B ∈ SO(d, R). This is done for every benchmark function, and a rotation matrix was chosen randomly at the beginning of each trial. All starting points were drawn uniformly from [0, 1]^d, except for Sphere, where we sampled from N(0, I). For each function, we varied d ∈ {4, 8, 16, ..., 256}. Due to the long running times, we only computed CMA-ES-Ref up to d = 128. For the given range of dimensions, for every choice of d, we ran 100 trials from different initial points and monitored the number of iterations and the wall-clock time needed to sample a point with a function value below 10^{-14}. For Rosenbrock we excluded the trials in which the algorithm did not converge to the global optimum. We further evaluated the algorithms on additional benchmark functions inspired by Stich and Müller [2012] and measured the change of rotation introduced by the Cholesky-CMA-ES at each iteration (E_t); see the supplementary material.

Results. Figure 1 shows that CMA-ES-Ref and Cholesky-CMA-ES require the same number of function evaluations to reach a given objective value. The CMA-ES/d requires slightly more evaluations, depending on the benchmark function. When considering the wall-clock runtime, the Cholesky-CMA-ES was significantly faster than the other algorithms. As expected from the theoretical analysis, the higher the dimensionality, the more pronounced the differences; see Figure 2 (note the logarithmic scales). For d = 64 the Cholesky-CMA-ES was already 20 times faster than the CMA-ES-Ref. The drastic differences in runtime become apparent when inspecting single trials. Note that for d = 256 the matrix size exceeded the L2 cache, which affected the performance of the Cholesky-CMA-ES and Suttorp-CMA-ES. Figure 3 plots the trials with runtimes closest to the corresponding median runtimes for d = 128.

5 Conclusion

CMA-ES is a ubiquitous algorithm for derivative-free optimization. It has proven to be a highly efficient direct policy search algorithm and a useful tool for model selection in supervised learning. We proposed the Cholesky-CMA-ES, which can be regarded as an approximation of the original CMA-ES. We gave theoretical arguments for why our approximation, which only affects the global step-size adaptation, does not impair performance. The Cholesky-CMA-ES achieves a better, asymptotically optimal time complexity of O(µd²) for the covariance update and optimal memory complexity. It allows for numerically stable computation of the inverse of the Cholesky factor in quadratic time and provides the eigenvalues of the covariance matrix without additional costs. We empirically compared the Cholesky-CMA-ES to the state-of-the-art CMA-ES with delayed covariance matrix decomposition.
Our experiments demonstrate a significant increase in optimization speed. As expected, the Cholesky-CMA-ES needed the same number of objective function evaluations as the standard CMA-ES, but required much less wall-clock time, and this speed-up increases with the search space dimensionality. Still, our algorithm scales quadratically with the problem dimensionality. If the dimensionality gets so large that maintaining a full covariance matrix becomes computationally infeasible, one has to resort to low-dimensional approximations [e.g., Loshchilov, 2015], which, however, bear the risk of a significant drop in optimization performance. Thus, we advocate our new Cholesky-CMA-ES for scaling up CMA-ES to large optimization problems for which updating and storing the covariance matrix is still possible, for example, for training neural networks in direct policy search.

Acknowledgements. We acknowledge support from the Innovation Fund Denmark through the projects "Personalized breast cancer screening" (OK, CI) and "Cyber Fraud Detection Using Advanced Machine Learning Techniques" (DRA, CI).

References

Y. Akimoto, Y. Nagata, I. Ono, and S. Kobayashi. Theoretical foundation for CMA-ES from information geometry perspective. Algorithmica, 64(4):698-716, 2012.

Y. Akimoto, A. Auger, and N. Hansen. Comparison-based natural gradient optimization in high dimension. In Proceedings of the 16th Annual Genetic and Evolutionary Computation Conference (GECCO). ACM, 2014.

A. Auger. Analysis of Comparison-based Stochastic Continuous Black-Box Optimization Algorithms. Habilitation thesis, Faculté des Sciences d'Orsay, Université Paris-Sud, 2015.

H.-G. Beyer. Evolution strategies. Scholarpedia, 2(8):1965, 2007.

H.-G. Beyer. Convergence analysis of evolutionary algorithms that are based on the paradigm of information geometry. Evolutionary Computation, 22(4), 2014.

K. Bringmann, T. Friedrich, C. Igel, and T. Voß. Speeding up many-objective optimization by Monte Carlo approximations. Artificial Intelligence, 204:22-29, 2013.

A. E. Eiben and J. Smith. From evolutionary computation to the evolution of things. Nature, 521, 2015.

F. Gomez, J. Schmidhuber, and R. Miikkulainen. Accelerated neural evolution through cooperatively coevolved synapses. Journal of Machine Learning Research, 9, 2008.

N. Hansen and A. Ostermeier. Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation. In Proceedings of IEEE International Conference on Evolutionary Computation (CEC 1996). IEEE, 1996.

N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159-195, 2001.

N. Hansen. The CMA evolution strategy: A tutorial. Technical report, Inria Saclay Île-de-France, Université Paris-Sud, LRI, 2015.

V. Heidrich-Meisner and C. Igel. Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search. In Proceedings of the 26th International Conference on Machine Learning (ICML 2009), 2009.

V. Heidrich-Meisner and C. Igel. Neuroevolution strategies for episodic reinforcement learning. Journal of Algorithms, 64(4):152-168, 2009.

C. Igel, T. Glasmachers, and V. Heidrich-Meisner. Shark. Journal of Machine Learning Research, 9, 2008.

C. Igel. Evolutionary kernel learning. In Encyclopedia of Machine Learning. Springer-Verlag, 2010.

O. Krause and C. Igel. A more efficient rank-one covariance matrix update for evolution strategies. In Proceedings of the 2015 ACM Conference on Foundations of Genetic Algorithms (FOGA XIII). ACM, 2015.

I. Loshchilov. A computationally efficient limited memory CMA-ES for large scale optimization. In Proceedings of the 16th Annual Genetic and Evolutionary Computation Conference (GECCO). ACM, 2014.

I. Loshchilov. LM-CMA: An alternative to L-BFGS for large scale black-box optimization. Evolutionary Computation, 2015.

M. N. Omidvar and X. Li. A comparative study of CMA-ES on large scale global optimisation. In AI 2010: Advances in Artificial Intelligence, volume 6464 of LNAI. Springer, 2010.

J. Poland and A. Zell. Main vector adaptation: A CMA variant with linear time and space complexity. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2001). Morgan Kaufmann Publishers, 2001.

R. Ros and N. Hansen. A simple modification in CMA-ES achieving linear time and space complexity. In Parallel Problem Solving from Nature (PPSN X). Springer, 2008.

S. U. Stich and C. L. Müller. On spectral invariance of randomized Hessian and covariance matrix adaptation schemes. In Parallel Problem Solving from Nature (PPSN XII). Springer, 2012.

Y. Sun, T. Schaul, F. Gomez, and J. Schmidhuber. A linear time natural evolution strategy for non-separable functions. In 15th Annual Conference on Genetic and Evolutionary Computation Conference Companion. ACM, 2013.

T. Suttorp, N. Hansen, and C. Igel. Efficient covariance matrix update for variable metric evolution strategies. Machine Learning, 75(2):167-197, 2009.
