UNIVERSITY OF DORTMUND
REIHE COMPUTATIONAL INTELLIGENCE
COLLABORATIVE RESEARCH CENTER 531

Design and Management of Complex Technical Processes and Systems by means of Computational Intelligence Methods

When the plus strategy performs better than the comma strategy - and when not

Jens Jägersküpper and Tobias Storch

No. CI-29/06

Technical Report, ISSN 1433-3325, November 2006

Secretary of the SFB 531, University of Dortmund, Dept. of Computer Science/XI, 44221 Dortmund, Germany

This work is a product of the Collaborative Research Center 531, "Computational Intelligence", at the University of Dortmund and was printed with financial support of the Deutsche Forschungsgemeinschaft.

When the plus strategy performs better than the comma strategy - and when not

Jens Jägersküpper and Tobias Storch
Department of Computer Science 2, University of Dortmund, 44221 Dortmund, Germany
{jens.jaegerskuepper, tobias.storch}@cs.uni-dortmund.de

(Supported by the German Research Foundation (DFG) through the collaborative research center "Computational Intelligence" (SFB 531) resp. grant We 1066.)

Abstract. Occasionally there have been long debates on whether to use elitist selection or not. In the present paper the simple (1,λ) EA and (1+λ) EA operating on {0,1}^n are compared by means of a rigorous runtime analysis. It turns out that only values for λ that are logarithmic in n are interesting. An illustrative function is presented for which newly developed proof methods show that the (1,λ) EA, where λ is logarithmic in n, outperforms the (1+λ) EA for any λ. For smaller offspring populations the (1,λ) EA is inefficient on every function with a unique optimum, whereas for larger λ the two randomized search heuristics behave almost equivalently.

I. INTRODUCTION

Evolutionary algorithms (EAs) belong to the broad class of general randomized search heuristics. Their area of application is as huge as their variety, and they have been applied successfully in numerous situations. Among the best-known and simplest EAs are the (µ+λ) EA and the (µ,λ) EA [1]. Here µ indicates that a parent population of size µ is used, whereas λ denotes the application of an offspring population of size λ. Whether the elements of the descendant population are selected from the parent and offspring populations together or from the offspring population only is indicated by "+" and ",", respectively. Thus, for the comma strategy necessarily λ ≥ µ (for λ = µ there is actually no selection). Runtime analysis started with very simple EAs such as the (1+1) EA on example functions [2], [3]. Nowadays, one is able to analyze its runtime on practically relevant problems such as the maximum matching problem [4]. Furthermore, for more complex EAs and (typical) example functions, the effects of applying either a larger offspring or a larger parent population size have been investigated theoretically [5], [6]. In this paper, we aim at a systematic comparison of the plus and the comma strategy with respect to the offspring population size. These investigations improve our ability to choose an appropriate selection method, a question which has been debated for a long time.

Furthermore, they contribute to the discussion on the effects of selection pressure in evolutionary computation. In order to concentrate on these effects, we consider simple EAs that allow for a rigorous analysis but avoid unnecessary complications due to the effects of other EA components. Here we consider the maximization of pseudo-Boolean objective (fitness) functions f: {0,1}^n → R, n ∈ N. We investigate the following optimization heuristics, known as the (1+λ) EA and the (1,λ) EA, using a parent population of size one and standard bit-mutation mutate_{1/n}(x), where each bit of x ∈ {0,1}^n is flipped independently with probability 1/n, cf. [1], [5].

(1+λ) EA and (1,λ) EA
1. Set t := 1 and choose x_t ∈ {0,1}^n uniformly at random.
2. Set t := t+1 and let y_{t,1} := mutate_{1/n}(x_{t-1}), ..., y_{t,λ} := mutate_{1/n}(x_{t-1}).
3. Choose y_t ∈ {y_{t,1}, ..., y_{t,λ}} arbitrarily among all elements with largest f-value.
   (1+λ) EA: If f(y_t) ≥ f(x_{t-1}), then set x_t := y_t, else set x_t := x_{t-1}.
   (1,λ) EA: Set x_t := y_t.
4. Goto 2.

The number of f-evaluations performed until the t(n)-th step is completed equals 1 + λ·(t(n) − 1) for t(n) ≥ 1. In contrast to the (1+λ) EA, the (1,λ) EA occasionally accepts an element that is worse than the previous one (unless the function to be optimized is constant). This can avoid stagnation in local optima. However, it may also cause a slow(er) movement towards global optima. It has often been argued that the difference between an elitist (1+λ) EA and a non-elitist (1,λ) EA is of little importance in {0,1}^n, e.g. [5]. Here we will point out in detail when this is correct, but we will also demonstrate when this is definitely not the case. More precisely, in Section III we show that a comparison of plus and comma strategy is interesting in particular for offspring populations of size λ with ln(n)/14 < λ = O(ln n). Investigating λ = 1 for the (1+λ) EA does make sense, but for the (1,λ) EA it does not: for any λ ≤ ln(n)/14, the comma strategy indeed fails to optimize any function with a unique global optimum. Furthermore, for λ = ω(ln n) (i.e., ln(n)/λ → 0 as n → ∞) it is rather unlikely to observe any difference between the populations of the (1+λ) EA and the (1,λ) EA within a polynomial number of steps. These observations are applied and extended for a simple (unimodal) example function, and (asymptotically) tight bounds on the heuristics' runtimes are obtained. In Section IV we extend the well-known proof technique of f-based partitions such that it can be applied to the (1+λ) EA and the (1,λ) EA. With the help of this method we demonstrate the algorithms' different strengths and weaknesses. Namely, for a simple (multimodal) example function we apply the method and demonstrate the possible major disadvantage of the plus strategy compared to the comma strategy. The runtime bounds to be presented are again tight. We finish with a summary and some conclusions in Section V, and continue with some preliminaries in the following Section II.
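For concreteness, the following Python sketch implements the two heuristics exactly as defined above. It is a minimal illustration of ours: the evaluation budget, the function names, and the ONEMAX usage example are our own additions (the pseudocode above runs forever and assumes nothing about f beyond a known optimum for stopping).

```python
import random

def mutate(x, n):
    # standard bit-mutation: flip each bit independently with probability 1/n
    return [b ^ (random.random() < 1.0 / n) for b in x]

def one_star_lambda_ea(f, n, lam, plus, max_steps=10**6):
    """(1+lam) EA for plus=True, (1,lam) EA for plus=False.

    Returns the number of f-evaluations until a global optimum is created
    for the first time, or None if max_steps generations did not suffice.
    For simplicity we assume max f = n, as for ONEMAX."""
    x = [random.randint(0, 1) for _ in range(n)]
    fx, evals = f(x), 1
    for _ in range(max_steps):
        if fx == n:
            return evals
        offspring = [mutate(x, n) for _ in range(lam)]
        fy, y = max((f(z), z) for z in offspring)  # best offspring; ties broken deterministically here
        evals += lam
        if plus:
            if fy >= fx:        # plus strategy: elitist selection
                x, fx = y, fy
        else:
            x, fx = y, fy       # comma strategy: always take the best offspring
    return None

onemax = sum                    # ONEMAX(x) = number of one-bits
print(one_star_lambda_ea(onemax, 100, 16, plus=True))
print(one_star_lambda_ea(onemax, 100, 16, plus=False))
```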

II. PRELIMINARIES

The efficiency of a randomized algorithm is (usually) measured in the following way. For a symbol ∗ ∈ {+; ,} let T^{f_n}_{∗λ}(x_1) denote the random variable which corresponds to the number of function evaluations, the runtime, of the (1∗λ) EA to create an optimum of f_n: {0,1}^n → R, n ∈ N, for the first time, where the initial element is x_1 ∈ {0,1}^n. (We can ignore a stopping criterion and analyze an infinite stochastic process.) If for a sequence of functions f = (f_1, ..., f_n, ...) the expected runtime of the (1∗λ) EA to optimize f_n, namely ∑_{x_1 ∈ {0,1}^n} E[T^{f_n}_{∗λ}(x_1)]/2^n (since the initial element x_1 is chosen uniformly at random), is bounded by a polynomial in n, then we call the (1∗λ) EA efficient on f, whereas we call it totally inefficient if the probability that an optimum is created remains exponentially small even after an exponential number of steps. In this case, even a polynomially bounded number of (parallel) independent multistarts of the algorithm is still totally inefficient. For the notations on asymptotics see [7].

III. SMALL AND LARGE OFFSPRING POPULATIONS

A. Small Offspring Populations

We take a closer look at the smallest possible offspring population size. On the one hand, the (1+1) EA (which can reasonably be applied) optimizes any function in an expected runtime of O(n^n), and functions are known where it needs an expected runtime of Θ(n^n). On the other hand, the (1+1) EA optimizes any linear function in an expected runtime of O(n ln n), and it needs an expected runtime of Θ(n ln n) if the linear function has a unique global optimum [2]. In contrast, the (1,1) EA (which cannot reasonably be applied) optimizes any function in an expected runtime of O(2^n), but it also needs an expected runtime of Θ(2^n) if the function has a unique global optimum [3]. This is because the closer the search point x_t of the (1,1) EA is to the unique optimum, the larger the probability that x_{t+1} is located farther away from the optimum.

Let us consider the (1,λ) EA with larger offspring populations, yet λ ≤ ln(n)/14. We demonstrate that a strong drift away from the optimum still exists. Namely, if x_t is reasonably close to the optimum, then with a large probability all elements of the offspring population are even farther away from the optimum. Since comma selection is applied, one of these elements becomes x_{t+1}. Thus, it is time-consuming to create the optimum.

Theorem 1. Given a function f: {0,1}^n → R with a unique global optimum x* ∈ {0,1}^n and λ ≤ ε(n)·ln(n)/7 with ε(n) ∈ [7/ln n, 1/2], with probability 1 − 2^{−Ω(n^{1−ε(n)})} the (1,λ) EA needs a runtime larger than 2^{n^{1−ε(n)}} to optimize f.

With ε(n) := 7/ln n we obtain for the (1,1) EA a lower bound of 2^{Ω(n)} on the runtime to optimize f, which holds not only in expectation (cf. the result in [3]) but also with an overwhelming probability. Even the (1, ln(n)/14) EA (i.e., we choose ε(n) := 1/2) is still totally inefficient.
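The drift that Theorem 1 exploits can be made visible numerically. The following Monte Carlo experiment is our own illustration (not from the paper): for a parent at Hamming distance d from the unique optimum, it estimates the probability that all λ offspring of one step are strictly farther from the optimum than the parent, which is precisely the event that forces comma selection to move backwards.

```python
import random

def offspring_distance(d, n):
    # Hamming distance to the optimum after standard bit-mutation of a parent
    # at distance d: each of the d wrong bits is repaired with probability 1/n,
    # each of the n-d correct bits is spoiled with probability 1/n.
    repaired = sum(random.random() < 1.0 / n for _ in range(d))
    spoiled = sum(random.random() < 1.0 / n for _ in range(n - d))
    return d - repaired + spoiled

def pr_all_farther(d, n, lam, trials=10_000):
    hits = sum(
        all(offspring_distance(d, n) > d for _ in range(lam))
        for _ in range(trials)
    )
    return hits / trials

n = 200
for lam in (1, 2, 4):
    print(lam, [round(pr_all_farther(d, n, lam), 3) for d in (1, 5, 20)])
```

For small λ the whole offspring population lies farther from the optimum with constant probability once the parent is close to it, so the comma strategy is pushed away again and again.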

4 To prove Theorem, we recall a result on Markov processes (and drift. A Markov process M on m < states, namely 0,..., m, is described by a stochastic m mmatrix P of transition probabilities (P i, j, 0 i, j m : probability to transit from state i to state j and a stochastic row vector p [0, ] m of initialization probabilities (p i, 0 i m : probability to initialize in state i. The i th entry of the stochastic row vector pp t corresponds to the probability of M being in state i after the t th step for 0 i m and t. For more detailed investigations of Markov processes see [7]. The following result was proven in [4] and goes back to a result in [8]. Lemma 2 Given a Markov process M on m states, a state l {0,..., m }, and α(l, β(l, γ(l > 0, if m j=0 p i, j e α(l ( j i /β(l i {0,..., l}, 2 m j=0 p i, j e α(l ( j l + γ(l i {l,..., m }, then the 0 th entry of the m-vector pp t is bounded from above by t e α(l l β(l ( + γ(l + In the following we prove Theorem : Proof: The runtime is larger than +λ (t(n t(n, if the unique optimum x is not created in the first t(n steps. We assume that, once the (,λ EA has created x, afterwards x would be kept forever. Thus, we are interested in the event x 2 n ε(n = x. If its probability is 2 Ω(n ε(n, then we obtain the claimed result. We describe a Markov process M on n + states with the following property. At least with the same probability, M is in a state i,..., n after t(n steps as the (,λ EA generates an element x t(n with Hamming distance H[x t(n, x ] i from the optimum x. If this holds for all i {0,..., n}, the (,λ EA generates the optimum at most with the same probability as M reaches state 0 (in a given number of steps. With p i := ( n i /2 n, M has the desired property for t(n = (even equality holds. If M is in state i after t(n steps, with at least the same probability holds H[x t(n, x ] i for the (,λ EA with x t(n. In this situation, assume that the probability of creating x t(n+ is bounded above by p i, j, where H[x t(n+, x ] j. Ensuring p i,0 + + p i, j p i, j is sufficient, so that M has the desired property also in the following step. For j < i we set p i, j to (at least the maximum of the probabilities that an x t(n generates x t(n+ with H[x t(n+, x ] = j, so that the inequality holds for j < i. We set p 0,0 := and the inequality holds for i = 0. We set p i,i+ for i to (at most the minimum of the probabilities that x t(n generates x t(n+ with H[x t(n+, x ] i +. Moreover, p i, j := 0 for j i + 2 a well as p i,i := p i,i+ i j=0 p i, j, so that the inequality holds for j i and i. In order to apply Lemma 2 for M with l := n ε(n, α(l := 6/5, β(l := 32l, and γ(l :=, we have to prove that the following two conditions are fulfilled. l j=0 p j.

In the following we prove Theorem 1.

Proof: The runtime is larger than 1 + λ·(t(n) − 1) ≥ t(n) if the unique optimum x* is not created in the first t(n) steps. We assume that, once the (1,λ) EA has created x*, afterwards x* would be kept forever. Thus, we are interested in the event x_{2^{n^{1−ε(n)}}} ≠ x*. If its probability is 1 − 2^{−Ω(n^{1−ε(n)})}, then we obtain the claimed result. We describe a Markov process M on n+1 states with the following property: M is in a state i, ..., n after t(n) steps at least with the same probability as the (1,λ) EA generates an element x_{t(n)} with Hamming distance H[x_{t(n)}, x*] ≥ i from the optimum x*. If this holds for all i ∈ {0, ..., n}, the (1,λ) EA generates the optimum at most with the same probability as M reaches state 0 (in a given number of steps). With p_i := (n choose i)/2^n, M has the desired property for t(n) = 1 (even equality holds). If M is in state i after t(n) steps, then with at least the same probability H[x_{t(n)}, x*] ≥ i holds for the (1,λ) EA. In this situation, assume that the probability of creating an x_{t(n)+1} with H[x_{t(n)+1}, x*] ≤ j is bounded from above by some value q_{i,j}; ensuring p_{i,0} + ··· + p_{i,j} ≥ q_{i,j} is then sufficient for M to have the desired property also in the following step. For j < i we set p_{i,j} to (at least) the maximum of the probabilities that an x_{t(n)} generates an x_{t(n)+1} with H[x_{t(n)+1}, x*] = j, so that the inequality holds for j < i. We set p_{0,0} := 1, and the inequality holds for i = 0. For i ≥ 1 we set p_{i,i+1} to (at most) the minimum of the probabilities that x_{t(n)} generates an x_{t(n)+1} with H[x_{t(n)+1}, x*] ≥ i+1. Moreover, p_{i,j} := 0 for j ≥ i+2 as well as p_{i,i} := 1 − p_{i,i+1} − ∑_{j=0}^{i−1} p_{i,j}, so that the inequality holds for j ≥ i and i ≥ 1. In order to apply Lemma 2 for M with l := n^{1−ε(n)}, α(l) := 6/5, β(l) := 32·l, and γ(l) := 1, we have to prove that the following two conditions are fulfilled.

1) ∑_{j=0}^{n} p_{i,j}·e^{−(6/5)·(j−i)} ≤ 1 − 1/(32·n^{1−ε(n)}) for all i ∈ {1, ..., n^{1−ε(n)}}.

Firstly, we consider j < i and an element x with H[x, x*] = i+k, 0 ≤ k ≤ n−i. In order to decrease the Hamming distance from the optimum to j for at least one of the λ offspring, i+k−j out of i+k specific bits must flip. Hence,
p_{i,j} ≤ max{λ·(i+k choose i+k−j)·(1/n)^{i+k−j} : k ∈ {0, ..., n−i}} ≤ λ·(i choose i−j)·(1/n)^{i−j},
since (i+k choose i+k−j) = (i choose i−j)·[(i+1)···(i+k)]/[(i−j+1)···(i−j+k)] ≤ (i choose i−j)·n^k. Furthermore, with (i choose i−j) ≤ i^{i−j} and i ≤ n^{1−ε(n)} it holds that
p_{i,j} ≤ λ·i^{i−j}/n^{i−j} ≤ λ·(n^{1−ε(n)})^{i−j}/n^{i−j} = λ·n^{−ε(n)·(i−j)}.

Secondly, we consider j = i+1 for i ≥ 1 and an element x with H[x, x*] = i+k, 0 ≤ k ≤ n−i. For k = 0 it is sufficient that each of the λ offspring equals x except for one bit which is flipped such that the Hamming distance to the optimum is increased. For k ≥ 1 it is sufficient that each of the λ offspring is a duplicate of x. In both cases the Hamming distance to the optimum is at least i+1. Hence,
p_{i,i+1} ≥ min{(((n−i)/n)·(1−1/n)^{n−1})^λ, ((1−1/n)^n)^λ} ≥ (6/17)^λ,
because (1−1/n)^n ≥ 6/17 for n large enough and ((n−i)/n)·(1−1/n)^{n−1} ≥ (n − n^{1−ε(n)})/(en) ≥ (1 − e^{−7})/e ≥ 6/17 (recall ε(n) ≥ 7/ln n). Furthermore, using the fact that ln(6/17) ≥ −7/6, we have
p_{i,i+1} ≥ (6/17)^λ ≥ (6/17)^{ε(n)·ln(n)/7} ≥ n^{−ε(n)/6}.

Since p_{i,i} ≤ 1 − p_{i,i+1} and p_{i,i+1} ≥ n^{−ε(n)/6}, it remains to prove that
∑_{j=0}^{i−1} λ·n^{−ε(n)·(i−j)}·e^{(6/5)·(i−j)} + (1 − n^{−ε(n)/6}) + n^{−ε(n)/6}·e^{−6/5} ≤ 1 − 1/(32·n^{1−ε(n)})
for 0 < i ≤ n^{1−ε(n)}. By an index transformation and due to the convergence of the infinite geometric series,
∑_{j=0}^{i−1} λ·n^{−ε(n)·(i−j)}·e^{(6/5)·(i−j)} = ∑_{j=1}^{i} λ·(n^{−ε(n)}·e^{6/5})^j ≤ λ·n^{−ε(n)}·e^{6/5}/(1 − n^{−ε(n)}·e^{6/5}).
Furthermore, with λ ≤ ε(n)·ln(n)/7 and n^{−ε(n)} ≤ e^{−7}, a straightforward calculation shows that this is at most (2/3)·n^{−ε(n)/6}.
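As an aside, the numerical constants used in this calculation (and in the remainder of the paper) are elementary to verify. The following lines are our own sanity check of the inequalities that the proofs rely on:

```python
import math

# constants used in the proof of Theorem 1
assert math.log(6 / 17) >= -7 / 6           # gives p_{i,i+1} >= n^(-eps(n)/6)
assert 1 / 3 - math.exp(-6 / 5) >= 1 / 32   # closes condition 1 of Lemma 2
assert (1 - math.exp(-7)) / math.e >= 6 / 17
assert 2 * math.exp(-6 / 5) < 1             # makes 2^l * e^(-(6/5)l) exponentially small
# constants used for Lemma 3 and Theorem 4 (Section III-B)
assert math.log(11 / 17) <= -2 / 5          # lambda >= (5/2)(1+c) ln t(n) suffices
for n in range(20, 10_000):
    assert (1 - 1 / n) ** n >= 6 / 17       # duplicate probability, n large enough
print("all constants check out")
```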

Since ∑_{j=1}^{i} λ·(n^{−ε(n)}·e^{6/5})^j ≤ (2/3)·n^{−ε(n)/6} and p_{i,i+1} ≥ n^{−ε(n)/6}, the inequality mentioned above is fulfilled:
(2/3)·n^{−ε(n)/6} + (1 − n^{−ε(n)/6}) + n^{−ε(n)/6}·e^{−6/5} = 1 − n^{−ε(n)/6}·(1/3 − e^{−6/5}) ≤ 1 − n^{−ε(n)/6}/32 ≤ 1 − 1/(32·n^{1−ε(n)}),
since 1/3 − e^{−6/5} ≥ 1/32 and ε(n) ≤ 1/2.

2) ∑_{j=0}^{n} p_{i,j}·e^{−(6/5)·(j−n^{1−ε(n)})} ≤ 2 for all i ∈ {n^{1−ε(n)}+1, ..., n}.

Similar to the proof that the first condition is met, we also have p_{i,j} ≤ λ·n^{−ε(n)·(n^{1−ε(n)}−j)} for j < n^{1−ε(n)}. Thus,
∑_{j=0}^{n^{1−ε(n)}−1} λ·n^{−ε(n)·(n^{1−ε(n)}−j)}·e^{(6/5)·(n^{1−ε(n)}−j)} + ∑_{j ≥ n^{1−ε(n)}} p_{i,j}·e^{−(6/5)·(j−n^{1−ε(n)})} ≤ (2/3)·n^{−ε(n)/6} + 1 ≤ 2
for n^{1−ε(n)} < i ≤ n.

To apply Lemma 2 we must finally estimate ∑_{j=0}^{l} p_j. Since l = n^{1−ε(n)} ≤ n/e^7 (as ε(n) ≥ 7/ln n),
∑_{j=0}^{l} (n choose j)/2^n ≤ ∑_{j=0}^{n/e^7} (n choose j)/2^n ≤ n·(n choose n/e^7)/2^n ≤ n·(en/(n/e^7))^{n/e^7}/2^n = e^{ln n + 8·n/e^7 − n·ln 2} ≤ e^{−n/3}.
Now, applying Lemma 2 with t = 2^{n^{1−ε(n)}} leads to a probability of at most
2^{n^{1−ε(n)}}·e^{−(6/5)·n^{1−ε(n)}}·32·n^{1−ε(n)}·(1+1) + e^{−n/3} = 2^{−Ω(n^{1−ε(n)})}
that M reaches state 0 within the first 2^{n^{1−ε(n)}} steps. □

B. Large Offspring Populations

With an offspring population size λ of any appreciable size, the (1+λ) EA and the (1,λ) EA will not differ significantly in the way they search {0,1}^n. This was claimed in [5], since in this situation "... the offspring population will almost surely contain at least one exact copy of the parent." We extend this statement and make it more precise in the following. Therefore, let f be a function, and for t(n) ≥ 1 let
s_{t(n)} := x_1, y_{2,1}, ..., y_{2,λ}, y_2, x_2, y_{3,1}, ..., y_{3,λ}, y_3, ..., x_{t(n)}, y_{t(n)+1,1}, ..., y_{t(n)+1,λ}, y_{t(n)+1}
be a sequence of (λ+2)·t(n) elements of {0,1}^n. The (1∗λ) EA observes s_{t(n)} (while optimizing f) if with positive probability the following holds: the elements x_1, ..., x_{t(n)} appear as the first t(n) parents and, for t ∈ {2, ..., t(n)+1}, y_t can appear as the selected offspring out of the λ offspring y_{t,1}, ..., y_{t,λ} of x_{t−1}.

We consider a sequence s_{t(n)} observed by the (1∗λ) EA. Recall that the (1∗λ) EA and the (1∗'λ) EA differ only in step 3, where ∗' denotes the other one of the two symbols {+; ,}. If f(y_t) ≥ f(x_{t−1}) for all t ∈ {2, ..., t(n)}, then the condition in step 3 is always fulfilled. The (1∗λ) EA and the (1∗'λ) EA observe with equal probability the same x_1, and with the same parent x_{t−1} the same sequence of offspring y_{t,1}, ..., y_{t,λ}; even the selected offspring y_t is determined identically. Thus, the (1∗λ) EA and the (1∗'λ) EA observe the sequence s_{t(n)} with equal probability while optimizing f. The set of these sequences is denoted by S^{+,,}_{f,t(n)}.

If f(y_t) < f(x_{t−1}) for some t ∈ {2, ..., t(n)}, then the condition in step 3 is not fulfilled in step t. In case f(y_t) < f(x_{t−1}), also y_t ≠ x_{t−1}, and the (1+λ) EA and the (1,λ) EA surely select different elements to be x_t. Thus, such a sequence s_{t(n)} is observed by at most one of the two algorithms; the set of these sequences observed (only) by the (1∗λ) EA is denoted by S^{∗}_{f,t(n)}.

We bound the probability that the (1+λ) EA and the (1,λ) EA observe a sequence of S^{+,,}_{f,t(n)}. If at least one of the offspring y_{t,1}, ..., y_{t,λ} is a duplicate of its parent x_{t−1}, then necessarily f(y_t) ≥ f(x_{t−1}). The probability of this event is bounded from below by 1 − (1 − (1−1/n)^n)^λ ≥ 1 − (1 − 6/17)^λ = 1 − (11/17)^λ for n large enough. With probability at most (t(n)−1)·(11/17)^λ, this fails for at least one t ∈ {2, ..., t(n)}.

Lemma 3. Given f: {0,1}^n → R and n large enough, with probability at least 1 − (t(n)−1)·(11/17)^λ the (1+λ) EA as well as the (1,λ) EA (with an arbitrary x_1) observe a sequence from S^{+,,}_{f,t(n)}, for t(n) ≥ 1.

This lemma helps to transfer success probabilities and even expected optimization times from the (1+λ) EA to the (1,λ) EA and vice versa, in particular when the offspring population is large with respect to the period considered. First, consider a runtime of l(n) with l(n) ≤ 1+λ, i.e., at most two steps. For any λ, the (1∗λ) EA and the (1∗'λ) EA optimize a function f within the first l(n) function evaluations with equal probability in this case. Now consider a runtime l(n) with 2 + λ·(t(n)−1) ≤ l(n) ≤ 1 + λ·t(n) for t(n) ≥ 2, i.e., at most t(n)+1 steps. Let E be the event that the (1∗λ) EA has not optimized the function f within the first l(n) function evaluations. This event occurs iff the (1∗λ) EA observes a sequence s_{t(n)} whose first l(n) elements x_1, y_{2,1}, ..., y_{2,λ}, y_{3,1}, ..., y_{3,λ}, ..., y_{t(n)+1,1}, ..., y_{t(n)+1,l(n)−1−λ·(t(n)−1)} are all non-optimal. We decompose E into two disjoint events: E_1, that s_{t(n)} ∈ S^{+,,}_{f,t(n)}, and E_2, that s_{t(n)} ∉ S^{+,,}_{f,t(n)}, i.e., s_{t(n)} ∈ S^{∗}_{f,t(n)}. Let E', E'_1, E'_2 denote the corresponding events for the (1∗'λ) EA. As we have seen, each sequence from S^{+,,}_{f,t(n)} occurs with the same probability for the (1∗λ) EA and the (1∗'λ) EA. Thus Pr[E_1] = Pr[E'_1] and hence
Pr[E] = Pr[E_2] + Pr[E_1] = Pr[E_2] + Pr[E'_1] = Pr[E_2] + Pr[E'] − Pr[E'_2].
Consider λ ≥ (5/2)·(1+c(n))·ln t(n), where c(n) ≥ 0. By Lemma 3, the (1∗λ) EA and the (1∗'λ) EA observe a sequence from S^{+,,}_{f,t(n)} with probability at least
1 − (t(n)−1)·(11/17)^λ ≥ 1 − t(n)·(11/17)^{(5/2)·(1+c(n))·ln t(n)} ≥ 1 − t(n)·(1/t(n))^{1+c(n)} = 1 − 1/t(n)^{c(n)},
since ln(11/17) ≤ −2/5. So a sequence of S^{∗}_{f,t(n)} is observed with probability at most 1/t(n)^{c(n)} by the (1∗λ) EA, and likewise for the (1∗'λ) EA. Since 0 ≤ Pr[E_2] ≤ 1/t(n)^{c(n)} and 0 ≤ Pr[E'_2] ≤ 1/t(n)^{c(n)}, we obtain −1/t(n)^{c(n)} ≤ Pr[E_2] − Pr[E'_2] ≤ 1/t(n)^{c(n)}.
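The quantity driving Lemma 3 can be evaluated in closed form. The following minimal computation (our own illustration) prints the exact probability that the offspring population contains a copy of the parent, next to the lower bound 1 − (11/17)^λ used above:

```python
def pr_copy_of_parent(n, lam):
    # probability that at least one of lam standard bit-mutations flips nothing
    p_dup = (1 - 1 / n) ** n
    return 1 - (1 - p_dup) ** lam

for n in (100, 10_000):
    for lam in (1, 5, 20):
        print(n, lam, round(pr_copy_of_parent(n, lam), 4),
              ">=", round(1 - (11 / 17) ** lam, 4))
```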

Theorem 4. Let f: {0,1}^n → R, x_1 ∈ {0,1}^n, and n large enough be given.
1) For 0 ≤ l(n) ≤ 1+λ it holds that
Pr[T^f_{+λ}(x_1) > l(n)] = Pr[T^f_{,λ}(x_1) > l(n)].
2) For 2 + λ·(t(n)−1) ≤ l(n) ≤ 1 + λ·t(n), where t(n) ≥ 2 and λ ≥ (5/2)·(1+c(n))·ln t(n) with c(n) ≥ 0, it holds that
Pr[T^f_{+λ}(x_1) > l(n)] ≤ Pr[T^f_{,λ}(x_1) > l(n)] + 1/t(n)^{c(n)} and
Pr[T^f_{,λ}(x_1) > l(n)] ≤ Pr[T^f_{+λ}(x_1) > l(n)] + 1/t(n)^{c(n)}.

The next section shows an exemplary application of this result.

C. Application to ONEMAX

Let us investigate one of the best-known functions, namely ONEMAX: {0,1}^n → R, where ONEMAX(x) := x_1 + ··· + x_n counts the number of one-bits of x. Even its analogue in continuous search spaces is well studied, e.g. in [9]. Part 1 of the following theorem was proven in [5] (it even holds for the (1+λ) EA with an arbitrarily fixed x_1).

Let us consider a phase of E_λ := ⌈3·max{E[T^{ONEMAX}_{+λ}(x_1)] : x_1 ∈ {0,1}^n}/λ⌉ steps, each creating λ offspring. By Markov's inequality [7], the (1+λ) EA does not create the optimum in such a phase with probability at most E[T^{ONEMAX}_{+λ}(x_1)]/(λ·E_λ) ≤ 1/3, for every x_1. We observe that for λ ≥ 3·ln n it holds that (5/2)·(1 + 1/7)·ln E_λ ≤ λ, since E_λ ≤ c·n for an appropriately large constant c. Hence, by Theorem 4.2 with c(n) := 1/7, the (1,λ) EA does not create the optimum in such a phase with probability at most 1/3 + 1/E_λ^{1/7} ≤ 1/2, i.e., with probability at least 1/2 it does. In the case of a failure we can repeat the argumentation. The expected number of repetitions is bounded from above by 2, and we obtain part 3 of the following theorem since 1 + 2·λ·E_λ = O(max{E[T^{ONEMAX}_{+λ}(x_1)] : x_1 ∈ {0,1}^n}). Finally, part 2 of the following theorem results from Theorem 1, since ONEMAX has the unique optimum 1^n.

Theorem 5.
1) The expected runtime of the (1+λ) EA on ONEMAX is O(n·ln n) if λ = O((ln n)·(ln ln n)/ln ln ln n), and O(λ·n) if λ = Ω(ln n).
2) If λ ≤ ε(n)·ln(n)/7 for ε(n) ∈ [7/ln n, 1/2], then with probability 1 − 2^{−Ω(n^{1−ε(n)})} the (1,λ) EA needs a runtime larger than 2^{n^{1−ε(n)}} to optimize ONEMAX.
3) If λ ≥ 3·ln n, then the expected runtime of the (1,λ) EA on ONEMAX is O(n·ln n) if λ = O((ln n)·(ln ln n)/ln ln ln n), and O(λ·n) in general.
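The dichotomy of Theorem 5 shows up clearly in simulations. The following experiment is our own illustration (the budget of 200,000 evaluations and the run counts are arbitrary choices); it compares both strategies on ONEMAX for λ = 1 and λ = ⌈3 ln n⌉:

```python
import math
import random

def run(n, lam, plus, max_evals):
    x = [random.randint(0, 1) for _ in range(n)]
    fx, evals = sum(x), 1
    while fx < n and evals + lam <= max_evals:
        fy, y = max(
            (sum(z), z)
            for z in ([b ^ (random.random() < 1 / n) for b in x] for _ in range(lam))
        )
        evals += lam
        if not plus or fy >= fx:
            x, fx = y, fy
    return evals if fx == n else None

n = 100
for lam in (1, math.ceil(3 * math.log(n))):
    for plus in (True, False):
        results = [run(n, lam, plus, 200_000) for _ in range(5)]
        done = [r for r in results if r is not None]
        print(f"(1{'+' if plus else ','}{lam}) EA:",
              f"{len(done)}/5 runs found the optimum,",
              f"evals: {sorted(done) if done else 'none within budget'}")
```

The (1,1) EA fails in every run (its parent performs an almost unbiased random walk), while the (1+1) EA and both variants with λ = ⌈3 ln n⌉ succeed quickly, in agreement with parts 1 to 3 of the theorem.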

IV. OFFSPRING POPULATIONS WHICH ARE NEITHER LARGE NOR SMALL

We present two proof techniques, one for the (1+λ) EA and one for the (1,λ) EA, which are inspired by the method of f-based partitions from [10]. They demonstrate the different strengths and weaknesses of the two selection strategies. The original method of f-based partitions helps to upper bound the expected runtime of the (1+1) EA for the optimization of a particular function and is widely applied. Recently, this method was successfully extended to a (µ+1) EA in [11].

Given f: {0,1}^n → R and A, B ⊆ {0,1}^n with A, B ≠ ∅, the relation A <_f B holds iff f(a) < f(b) for all a ∈ A, b ∈ B. We call A_0, ..., A_m an f-based partition iff A_0, ..., A_m is a partition of {0,1}^n, A_0 <_f ··· <_f A_m, and A_m contains optima only, i.e., f(a) = max{f(b) : b ∈ {0,1}^n} for each a ∈ A_m. Moreover, for i ∈ {0, ..., m−1} let p(a), a ∈ A_i, denote the probability that a mutation of a is in A_{i+1} ∪ ··· ∪ A_m, and let p(A_i) := min{p(a) : a ∈ A_i}; i.e., p(A_i) is a lower bound on the probability to leave A_i with a single mutation.

A. (1+λ) EA

For the (1+λ) EA to leave A_i, i < m, once and for all, at least one of the λ offspring must be in A_{i+1} ∪ ··· ∪ A_m.

Lemma 6. Given f: {0,1}^n → R and an f-based partition A_0, ..., A_m, let p^+_i := 1 − (1 − p(A_i))^λ for i ∈ {0, ..., m−1}. The (1+λ) EA (with an arbitrarily initialized x_1) optimizes f in an expected runtime of at most 1 + λ·(1/p^+_0 + ··· + 1/p^+_{m−1}).

Proof: We describe a Markov process M on m+1 states with the following property: the probability that M is in a state i, ..., m after t(n) steps is at most the probability that the (1+λ) EA generates an element x_{t(n)} ∈ A_i ∪ ··· ∪ A_m. If this holds for all i ∈ {0, ..., m}, the (1+λ) EA generates an optimum at least with the same probability as M reaches state m (in a given number of steps). We set p_0 := 1 and p_i := 0 for i ≥ 1, so that M has the claimed property for t(n) = 1 and arbitrary x_1. If M is in state i after t(n) steps, then with at least the same probability x_{t(n)} ∈ A_i ∪ ··· ∪ A_m holds for the (1+λ) EA. In this situation, the elitist strategy makes it impossible to create an x_{t(n)+1} ∈ A_0 ∪ ··· ∪ A_{i−1}. Moreover, p^+_i is a lower bound on the probability to create an x_{t(n)+1} ∈ A_{i+1} ∪ ··· ∪ A_m, since it suffices that at least one of the λ offspring is therein. Thus, we set p_{i,j} := 0 for 0 ≤ j < i ≤ m and for i+2 ≤ j ≤ m, p_{i,i+1} := p^+_i and p_{i,i} := 1 − p^+_i for 0 ≤ i < m, and p_{m,m} := 1. This ensures that M has the desired property also in the following step. The expected number of steps to move from state i to state m equals E_i = 1 + p^+_i·E_{i+1} + (1 − p^+_i)·E_i, i.e., E_i = 1/p^+_i + E_{i+1}, for i ∈ {0, ..., m−1}, and E_m = 0. Thus, E_0 equals 1/p^+_0 + ··· + 1/p^+_{m−1}. With the initialization and the λ function evaluations in each further step, the (1+λ) EA optimizes f in an expected runtime of at most 1 + λ·E_0. □

0 in each further step, the (+λ EA optimizes f in an expected runtime of at most + λ E 0. For λ = we obtain the original result for the (+ EA presented in [0]. We apply this method exemplarily to ONEMAX. We consider the partition A 0,..., A n with A i := {x x = i}. Then p + i ( n i λ e λ(n i en = en+λ(n i (cf. en + λ(n i λ(n i en [5]. Hence, by applying Lemma 6, the (+λ EA optimizes ONEMAX in an expected runtime of at most + λ n en+λ(n i i=0 = O(n ln n + λn. This already proves a major λ(n i part of Theorem 5.. B. (,λ EA Theorem 4 enables us to easily transfer Lemma 6 to the (,λ EA. For steps which do not create a duplicate, we may pessimistically assume that they lead to a disadvantage, or we may optimistically assume that they lead to an advantage, depending on whether we aim at an upper or at a lower bound on the (expected runtime. Given f : {0, } n R, let A 0,..., A m be a (not necessarily f -based partition of {0, } n such that A m consists of optima only. Let the probability that a mutation of a A i generates some b in A 0 A i such that f (b f (a be denoted by p (a, in A i+ A m such that f (b > f (a be denoted by p + (a. Thus, p (A i := max{p (a a A i } is an upper bound on the probability that an offspring is generated such that A i is (possibly left, but in the wrong direction (namely A 0 A i is hit even if a duplicate of the parent is generated. Moreover, p + (A i := min{p + (a a A i } is a lower bound on the probability that an offspring is generated such that A i is left in the right direction, namely A i+ A m is hit. Lemma 7 Given f : {0, } n R and a partition A 0,..., A m of {0, } n such that A m consists of optima only, let for i {0,..., m} p + i := max{ ( p (A i λ ( p (A i p + (A i λ, p(a i λ } and p i := ( p (A i λ + ( 7 p (A i p + (A i λ. The (,λ EA (with an arbitrarily initialized x optimizes f in an expected runtime of at most + λ ( p + p + + 0 p + + + + + p p p + m + p m p + p+ m + p m p +. m Proof: Note that p + i is a lower bound on the probability that the (,λ EA with x t(n A i generates an x t(n+ A i+ A m, since for this to happen it is sufficient that either at least one offspring in A i+ A m with a larger function value than each element of A i and no offspring in A 0 A i with a function value at least

B. (1,λ) EA

Theorem 4 enables us to easily transfer Lemma 6 to the (1,λ) EA. For steps which do not create a duplicate of the parent, we may pessimistically assume that they lead to a disadvantage, or we may optimistically assume that they lead to an advantage, depending on whether we aim at an upper or at a lower bound on the (expected) runtime.

Given f: {0,1}^n → R, let A_0, ..., A_m be a (not necessarily f-based) partition of {0,1}^n such that A_m consists of optima only. Let the probability that a mutation of a ∈ A_i generates some b in A_0 ∪ ··· ∪ A_{i−1} such that f(b) ≥ f(a) be denoted by p^−(a), and the probability that it generates some b in A_{i+1} ∪ ··· ∪ A_m such that f(b) > f(a) be denoted by p^+(a). Thus, p^−(A_i) := max{p^−(a) : a ∈ A_i} is an upper bound on the probability that an offspring is generated such that A_i is (possibly) left in the wrong direction (namely, A_0 ∪ ··· ∪ A_{i−1} is hit) even if a duplicate of the parent is generated as well. Moreover, p^+(A_i) := min{p^+(a) : a ∈ A_i} is a lower bound on the probability that an offspring is generated such that A_i is left in the right direction, namely A_{i+1} ∪ ··· ∪ A_m is hit. Finally, let p(A_i) be defined as in the preceding subsection, i.e., as the minimum probability that a mutation of an a ∈ A_i generates some b ∈ A_{i+1} ∪ ··· ∪ A_m, regardless of the f-values.

Lemma 7. Given f: {0,1}^n → R and a partition A_0, ..., A_m of {0,1}^n such that A_m consists of optima only, let for i ∈ {0, ..., m−1}
p^+_i := max{(1 − p^−(A_i))^λ − (1 − p^−(A_i) − p^+(A_i))^λ, p(A_i)^λ}
and
p^−_i := 1 − (1 − p^−(A_i))^λ + (11/17 − p^−(A_i) − p^+(A_i))^λ.
The (1,λ) EA (with an arbitrarily initialized x_1) optimizes f in an expected runtime of at most
1 + λ·(1/p^+_0 + ··· + 1/p^+_{m−1})·(1 + p^−_1/p^+_1)···(1 + p^−_{m−1}/p^+_{m−1}).

Proof: Note that p^+_i is a lower bound on the probability that the (1,λ) EA with x_{t(n)} ∈ A_i generates an x_{t(n)+1} ∈ A_{i+1} ∪ ··· ∪ A_m. For this to happen it is sufficient that either at least one offspring in A_{i+1} ∪ ··· ∪ A_m with a larger f-value than each element of A_i is created and no offspring in A_0 ∪ ··· ∪ A_{i−1} with an f-value at least as large as that of some element of A_i is created, the probability of which is bounded from below by
∑_{j=1}^{λ} (λ choose j)·p^+(A_i)^j·(1 − p^−(A_i) − p^+(A_i))^{λ−j} = (1 − p^−(A_i))^λ − (1 − p^−(A_i) − p^+(A_i))^λ,
or that all λ offspring are in A_{i+1} ∪ ··· ∪ A_m, the probability of which is bounded from below by p(A_i)^λ.

Moreover, p^−_i is an upper bound on the probability that the (1,λ) EA with x_{t(n)} ∈ A_i generates an x_{t(n)+1} ∈ A_0 ∪ ··· ∪ A_{i−1}: for this not to happen it is sufficient that at least one offspring in A_i ∪ A_{i+1} ∪ ··· ∪ A_m with an f-value at least as large as that of each element of A_i is generated (e.g., a duplicate of the parent) but no offspring in A_0 ∪ ··· ∪ A_{i−1} with an f-value at least as large as that of an element of A_i. The probability of this is bounded from below by
∑_{j=1}^{λ} (λ choose j)·(6/17 + p^+(A_i))^j·(1 − p^−(A_i) − 6/17 − p^+(A_i))^{λ−j} = (1 − p^−(A_i))^λ − (11/17 − p^−(A_i) − p^+(A_i))^λ,
since the probability of generating a duplicate equals (1−1/n)^n ≥ 6/17. The complementary probability is exactly p^−_i.

Let T_x denote the expected number of steps until an element of A_m is generated (for the first time) when starting with x, and let T_i := max{T_x : x ∈ A_i ∪ ··· ∪ A_m}, so that T_i ≥ T_{i+1} and T_0 ≥ T_x for all x ∈ {0,1}^n. For x ∈ A_i with i < m, bounding the expected remaining time after one step pessimistically (time at most T_0 after a step into A_0 ∪ ··· ∪ A_{i−1}, at most T_i after a step within A_i, at most T_{i+1} after a step into A_{i+1} ∪ ··· ∪ A_m) and using the bounds just derived yields
T_i ≤ 1 + p^−_i·T_0 + (1 − p^−_i − p^+_i)·T_i + p^+_i·T_{i+1}.
Hence T_0 is bounded from above by the expected number E_0 of steps that the following Markov process M on the states 0, ..., m needs to reach state m from state 0: p_{i,0} := p^−_i, p_{i,i+1} := p^+_i, and p_{i,i} := 1 − p^−_i − p^+_i for 1 ≤ i ≤ m−1; p_{0,1} := p^+_0 and p_{0,0} := 1 − p^+_0; p_{m,m} := 1; all other transition probabilities are 0. Indeed, E_i ≥ T_i for all i.

To bound E_0, consider the transitions from state i to state i+1. After each such transition, M reaches state m without another visit to state 0 with probability ∏_{j=i+1}^{m−1} p^+_j/(p^+_j + p^−_j), since at each level j > i the first decisive move goes up rather than back to 0 with probability p^+_j/(p^+_j + p^−_j). Every path from 0 to m passes through the transition from i to i+1, so the expected number of these transitions is the reciprocal, ∏_{j=i+1}^{m−1} (1 + p^−_j/p^+_j). Since state i is left upwards with probability p^+_i/(p^+_i + p^−_i) per departure and each departure takes 1/(p^+_i + p^−_i) steps in expectation, the expected number of steps spent in state i equals (1/p^+_i)·∏_{j=i+1}^{m−1} (1 + p^−_j/p^+_j). Summing over all states,
E_0 = ∑_{i=0}^{m−1} (1/p^+_i)·∏_{j=i+1}^{m−1} (1 + p^−_j/p^+_j) ≤ (1/p^+_0 + ··· + 1/p^+_{m−1})·(1 + p^−_1/p^+_1)···(1 + p^−_{m−1}/p^+_{m−1}).
With the initialization and the λ function evaluations in each further step, the (1,λ) EA optimizes f in an expected runtime of at most 1 + λ·E_0. □
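Lemma 7 thus turns into a mechanical recipe: given the three probability estimates per class, the bound is a closed-form expression. The following small calculator is our own sketch, following the reconstruction of the lemma given above; clamping the (11/17 − ...) term at 0 is our own guard against negative bases, and the three-class toy input is invented for illustration only.

```python
def lemma7_bound(lam, p_minus, p_plus, p_leave):
    """Upper bound of Lemma 7 on the expected runtime of the (1,lam) EA.

    p_minus[i] = p^-(A_i), p_plus[i] = p^+(A_i), p_leave[i] = p(A_i)
    for the non-optimal classes A_0, ..., A_{m-1}."""
    m = len(p_plus)
    pp, pm = [], []
    for i in range(m):
        up = max(
            (1 - p_minus[i]) ** lam - (1 - p_minus[i] - p_plus[i]) ** lam,
            p_leave[i] ** lam,
        )
        down = (1 - (1 - p_minus[i]) ** lam
                + max(0.0, 11 / 17 - p_minus[i] - p_plus[i]) ** lam)
        pp.append(up)
        pm.append(down)
    total = sum(1 / u for u in pp)
    prod = 1.0
    for i in range(1, m):
        prod *= 1 + pm[i] / pp[i]
    return 1 + lam * total * prod

# toy example with three non-optimal classes
print(lemma7_bound(10, [0.0, 1e-4, 1e-4], [0.1, 0.05, 0.02], [0.1, 0.05, 0.02]))
```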

We apply this lemma exemplarily in the following section.

C. Application to CLIFF

A comparison of the (1+λ) EA and the (1,λ) EA for the optimization of a function f with a unique optimum is interesting especially for λ = Θ(ln n): on the one hand, for λ ≤ ln(n)/14, the (1,λ) EA cannot optimize f efficiently at all; on the other hand, for λ = ω(ln n), it is impossible that the (1∗λ) EA is efficient for f while the (1∗'λ) EA is totally inefficient for f. Let us investigate the function CLIFF: {0,1}^n → R with
CLIFF(x) := ONEMAX(x) − n/3, if ONEMAX(x) ≥ n − n/3,
CLIFF(x) := ONEMAX(x), if ONEMAX(x) < n − n/3.
Its analogue in continuous search spaces has been studied in [12]. Typically, the (1+λ) EA waits for a long time at the cliff, which consists of all elements x with ONEMAX(x) < n − n/3, whereas the (1,λ) EA approaches the border of the cliff and, after a short while, jumps over the cliff and hardly ever drops back. The following theorem proves that the (1,λ) EA is efficient with an offspring population size that is logarithmic in n, whereas the (1+λ) EA is totally inefficient for any offspring population size. (The opposite effect could also be illustrated.)
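Before stating the formal result, the asymmetry at the cliff can be quantified directly. The following estimate is our own illustration (the choice n = 60 and the trial count are arbitrary): it compares the probability that comma selection is forced over the cliff in a single step, i.e., that all λ offspring of a parent right below the cliff land above it, with the probability that the elitist strategy ever accepts a crossing, which requires a single mutation to come within one bit of the optimum.

```python
import random

n = 60
zeros = n // 3 + 1          # zero-bits of a parent right below the cliff

def offspring_above_cliff():
    # more zero-bits repaired than one-bits spoiled => ONEMAX increases
    z = sum(random.random() < 1 / n for _ in range(zeros))
    o = sum(random.random() < 1 / n for _ in range(n - zeros))
    return z > o

trials = 100_000
p_up = sum(offspring_above_cliff() for _ in range(trials)) / trials
for lam in (2, 4, 8):
    print(f"comma, lambda={lam}: P[whole population above cliff] ~ {p_up**lam:.2e}")

# the (1+lambda) EA accepts an above-cliff offspring only if its CLIFF-value
# is not worse, which requires flipping at least n//3 of the zeros at once
p_plus = zeros * (1 / n) ** (n // 3)
print(f"plus, per offspring: P[acceptable jump] <= {p_plus:.2e}")
```

Even for moderate λ, the comma strategy's per-step chance of crossing the cliff exceeds the plus strategy's acceptance probability by dozens of orders of magnitude, which is the mechanism behind parts 1 and 3 of the next theorem.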

Theorem 8.
1) With probability 1 − 2^{−Ω(n)}, the (1+λ) EA needs a runtime larger than n^{n/4} to optimize CLIFF.
2) If λ ≤ ε(n)·ln(n)/7 for ε(n) ∈ [7/ln n, 1/2], then with probability 1 − 2^{−Ω(n^{1−ε(n)})} the (1,λ) EA needs a runtime larger than 2^{n^{1−ε(n)}} to optimize CLIFF.
3) If λ ≥ 5·ln n, then the expected runtime of the (1,λ) EA on CLIFF is O(e^{5λ}).
4) The expected runtime of the (1,λ) EA on CLIFF is larger than min{n^{n/4}, e^{λ/4}}/3.

Proof:
1) From a parent x_{t(n)} with ONEMAX(x_{t(n)}) < n − n/3, the (1+λ) EA only generates a parent x_{t(n)+1} with either ONEMAX(x_{t(n)+1}) < n − n/3 (case 1) or ONEMAX(x_{t(n)+1}) ≥ ONEMAX(x_{t(n)}) + n/3 (case 2). For case 2 to occur, at least n/3 bits must flip in a single mutation. The probability that this happens at least once within n^{n/4} mutations is bounded from above by n^{n/4}/(n/3)! ≤ n^{n/4}·(e/(n/3))^{n/3} = 2^{−Ω(n·ln n)}. As long as only case 1 occurs, the optimum is not generated. The probability that already the initial element x_1 satisfies ONEMAX(x_1) ≥ n − n/3 is bounded from above by ∑_{i=n−n/3}^{n} (n choose i)/2^n ≤ n·(n choose n/3)/2^n, and applying Stirling's formula yields n·(n choose n/3) ≤ 2^{28n/29} for n large enough. Hence this probability is at most 2^{28n/29}/2^n = 2^{−Ω(n)}, and with probability 1 − 2^{−Ω(n·ln n)} − 2^{−Ω(n)} = 1 − 2^{−Ω(n)} the (1+λ) EA has not optimized CLIFF within runtime n^{n/4}.

2) The result follows from Theorem 1, since CLIFF has the unique optimum 1^n.

3) For λ > n·ln n the result follows since the optimum is generated with probability at least 1/n^n in each single mutation (at most n bits have to flip) and λ·n^n ≤ e^{5λ} in this case. So let λ ≤ n·ln n and l := ln λ/ln ln λ. In order to apply Lemma 7, we distinguish three classes of partition sets A_i and determine p^+_i and p^−_i in each case. Note that (11/17)^λ ≤ (11/17)^{5·ln n} ≤ 1/n².

a) A_i := {x : ONEMAX(x) = i} for 0 ≤ i ≤ n − n/3 − 1. Since p^−(A_i) = 0, it holds that p^−_i ≤ (11/17)^λ ≤ 1/n² for all these i. Moreover, for i < n − n/3 − 1 a mutation creates an offspring in a higher class with a larger CLIFF-value with probability at least p^+(A_i) ≥ (n/3)·(1/n)·(1−1/n)^{n−1} ≥ 1/9 (flip exactly one of the at least n/3 zero-bits), and hence (movement towards the cliff)
p^+_i ≥ 1 − (1 − p^+(A_i))^λ ≥ 1/9 for 0 ≤ i < n − n/3 − 1.
For the last class below the cliff we use the second term in the definition of p^+_i: each single offspring increases the number of one-bits with probability at least (n/3)·(1/n)·(1−1/n)^{n−1} ≥ 1/9, and if all λ offspring do so, the next parent lies above the cliff. Hence (jump over the cliff)
p^+_{n−n/3−1} ≥ p(A_{n−n/3−1})^λ ≥ (1/9)^λ.

b) The elements just above the cliff are grouped into blocks of l consecutive ONEMAX-values each: A_{n−n/3+i} := {x : n − n/3 + i·l ≤ ONEMAX(x) < n − n/3 + (i+1)·l} for 0 ≤ i < n/(21·l). It holds that p^+(A_{n−n/3+i}) ≥ 1/λ for λ large enough, since it is sufficient to flip exactly l out of the n − ONEMAX(x) ≥ n/3 − (i+1)·l ≥ n/5 zero-bits of an x ∈ A_{n−n/3+i}; hence
p^+(A_{n−n/3+i}) ≥ (n/5 choose l)·(1/n)^l·(1−1/n)^{n−l} ≥ (1/(5·e·l))^l ≥ 1/λ,
where the last inequality uses l = ln λ/ln ln λ and holds for λ large enough, since then (ln λ/ln ln λ)·(ln ln ln λ − ln 5 − 1) ≥ 0.

Furthermore, for the first two blocks a return to the cliff is probable, whereas movement off the cliff is improbable: for i ∈ {0, 1} we only use the trivial bound p^−_{n−n/3+i} ≤ 1 − p^+_{n−n/3+i} (probable return to the cliff), which holds since the corresponding transitions of the Markov process in Lemma 7 are disjoint, together with (improbable movement off the cliff)
p^+_{n−n/3+i} ≥ (1 − p^−(A_{n−n/3+i}))^λ − (1 − p^−(A_{n−n/3+i}) − p^+(A_{n−n/3+i}))^λ ≥ p^+(A_{n−n/3+i})·(1 − p^−(A_{n−n/3+i}) − p^+(A_{n−n/3+i}))^{λ−1} ≥ (6/7)^λ/λ,
since p^+(A_{n−n/3+i}) ≥ 1/λ and p^−(A_{n−n/3+i}) + p^+(A_{n−n/3+i}) ≤ 1/7 for these two blocks.

For the blocks with i ≥ 2 the situation is reversed. A return to the cliff requires that the number of one-bits decreases by at least l·i in a single mutation (only below the cliff can an element of a lower class have an at least as large CLIFF-value), so that p^−(A_{n−n/3+i}) ≤ 1/(l·i)!, and a straightforward calculation using l = ln λ/ln ln λ shows λ/(l·i)! ≤ λ^{−i/3} for λ large enough. Hence we obtain (improbable return to the cliff)
p^−_{n−n/3+i} ≤ λ·p^−(A_{n−n/3+i}) + (11/17)^λ ≤ λ^{−i/3} + 1/n² for 2 ≤ i < n/(21·l),
as well as, using (1 − p^−(A_{n−n/3+i}))^λ ≥ 1 − λ^{−i/3} (probable movement off the cliff),
p^+_{n−n/3+i} ≥ (1 − λ^{−i/3})·(1 − (1 − 1/λ)^λ) ≥ 1/2 for 2 ≤ i < n/(21·l) and λ large enough.

c) A_{n−n/3+n/(21·l)+i} := {x : ONEMAX(x) = n − n/3 + n/21 + i} for 0 ≤ i ≤ n/3 − n/21. A return to the cliff now requires that at least (n/(21·l))·l = n/21 bits flip in a single mutation, so that p^−(A_{n−n/3+n/(21·l)+i}) ≤ 1/(n/21)!, and we obtain (improbable return to the cliff)
p^−_{n−n/3+n/(21·l)+i} ≤ λ/(n/21)! + (11/17)^λ ≤ e^{−n} + 1/n² for 0 ≤ i ≤ n/3 − n/21 and n large enough.

Furthermore, since p^+(A_{n−n/3+n/(21·l)+i}) ≥ (1/n)·(1−1/n)^{n−1} ≥ 1/(en) for i < n/3 − n/21 (flip exactly one of the remaining zero-bits), we have (probable movement off the cliff)
p^+_{n−n/3+n/(21·l)+i} ≥ (1 − λ·e^{−n})·(1 − (1 − 1/(en))^λ) ≥ 1/(3n) for 0 ≤ i < n/3 − n/21.

As argued, Lemma 7 now implies for the (1,λ) EA an upper bound on the expected runtime to optimize CLIFF of
1 + λ·(∑_i 1/p^+_i)·∏_{i≥1}(1 + p^−_i/p^+_i).
Here the classes of type (a) contribute at most 9n + 9^λ to the sum and a factor of at most (1 + 9/n²)^n·(1 + 9^λ/n²) = O(1)·O(e^{9λ/4}/n²) to the product, since ln 9 ≤ 9/4 and 9^λ ≥ n² for λ ≥ 5·ln n. The classes of type (b) contribute at most 2·λ·(7/6)^λ + n to the sum and a factor of at most (1 + λ·(7/6)^λ)²·∏_{i≥2}(1 + 2·λ^{−i/3} + 2/n²) = O(λ²·(7/6)^{2λ})·O(1) to the product. The classes of type (c) contribute at most n² to the sum and a factor of at most (1 + 3n·(e^{−n} + 1/n²))^{n/3} = O(1) to the product. Therefore, the (1,λ) EA optimizes CLIFF in an expected runtime of at most
λ·O(n² + 9^λ + λ·(7/6)^λ)·O(e^{9λ/4}/n²)·O(λ²·(7/6)^{2λ})·O(1) = O(e^{5λ}),
since 2·ln 9 + 3·ln(7/6) < 5 and the remaining polynomial factors are e^{o(λ)} for λ ≥ 5·ln n.

4) By part 1, with probability 1 − 2^{−Ω(n)} the (1+λ) EA needs a runtime larger than min{n^{n/4}, e^{λ/4}} to optimize CLIFF. If min{n^{n/4}, e^{λ/4}} ≤ 1+λ, then by Theorem 4.1 the same holds for the (1,λ) EA, and in particular with probability at least 1/3. If min{n^{n/4}, e^{λ/4}} > 1+λ, then by Theorem 4.2 with c(n) := 3/5, the (1,λ) EA needs a runtime larger than min{n^{n/4}, e^{λ/4}} with probability at least (1 − 2^{−Ω(n)}) − 1/t(n)^{3/5} ≥ 1/3, since 2 ≤ t(n) ≤ e^{λ/4} ensures λ ≥ (5/2)·(1 + 3/5)·ln t(n). Consequently, the (1,λ) EA needs an expected runtime larger than min{n^{n/4}, e^{λ/4}}/3 to optimize CLIFF. □

It is worth noting that the proof of Theorem 8.3 implies the following: for λ ≥ 5·ln n, the (1,λ) EA creates the optimum of CLIFF within runtime O(λ·n²) with probability at least Ω(e^{−5λ}).

Let us take a short look at the multistart variant/extension of EAs, where a particular EA A is (independently) restarted after runtime l(n); this variant is denoted by A_{l(n)}. We observe the following for λ ≥ 3·ln l(n) and polynomially bounded values of l(n): if the (1+λ) EA_{l(n)} is efficient, then so is the (1,λ) EA_{l(n)}. And as we have seen, there exist functions where the (1,λ) EA_{l(n)} is efficient but the (1+λ) EA_{l(n)} is not. So, in this situation and in case of doubt, one should prefer the (1,λ) EA_{l(n)}. In cases where λ ≤ ln(n)/14, however, one should definitely prefer the (1+λ) EA_{l(n)}. Thus, we have obtained a somewhat general rule for when to apply comma selection and when to apply plus selection.

V. SUMMARY AND CONCLUSIONS

We have compared the (1,λ) EA and the (1+λ) EA operating on {0,1}^n, and it has been pointed out why only offspring populations of a size logarithmic in n are interesting. For smaller values of λ, the (1,λ) EA is totally inefficient on every function with a unique optimum, whereas for larger values, the (1,λ) EA and the (1+λ) EA behave equivalently with high probability. These investigations have been exemplified for ONEMAX. For the example function CLIFF, we have analyzed rigorously when and why the (1,λ) EA outperforms the (1+λ) EA. To this end, a simple but powerful proof method has been developed. Our results thus underline how important the correct choice of the selection operator is, and that this choice depends on the offspring population size.

REFERENCES

[1] H.-P. Schwefel, Evolution and Optimum Seeking. Wiley, 1995.
[2] S. Droste, T. Jansen, and I. Wegener, "On the analysis of the (1+1) evolutionary algorithm," Theoretical Computer Science, vol. 276, pp. 51-81, 2002.
[3] J. Garnier, L. Kallel, and M. Schoenauer, "Rigorous hitting times for binary mutations," Evolutionary Computation, vol. 7, pp. 173-203, 1999.
[4] O. Giel and I. Wegener, "Searching randomly for maximum matchings," Electronic Colloquium on Computational Complexity, Report 76, 2004.
[5] T. Jansen, K. De Jong, and I. Wegener, "On the choice of the offspring population size in evolutionary algorithms," Evolutionary Computation, vol. 13, pp. 413-440, 2006.
[6] C. Witt, "Runtime analysis of the (µ+1) EA on simple pseudo-Boolean functions," Evolutionary Computation, vol. 14, pp. 65-86, 2006.
[7] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge University Press, 1995.
[8] J. He and X. Yao, "Drift analysis and average time complexity of evolutionary algorithms," Artificial Intelligence, vol. 127, pp. 57-85, 2001.
[9] J. Jägersküpper, "Analysis of a simple evolutionary algorithm for minimization in Euclidean spaces," in Proc. 30th International Colloquium on Automata, Languages and Programming (ICALP 2003), 2003, pp. 1068-1079.
[10] I. Wegener, "Methods for the analysis of evolutionary algorithms on pseudo-Boolean functions," in Evolutionary Optimization, 2002, pp. 349-369.
[11] T. Storch, "On the choice of the population size," in Proc. Genetic and Evolutionary Computation Conference (GECCO 2004), LNCS 3102, 2004, pp. 748-760.
[12] J. Jägersküpper and T. Storch, "How comma selection helps with the escape from local optima," in Proc. 9th International Conference on Parallel Problem Solving From Nature (PPSN IX), 2006, pp. 52-61.

6 It is worth to note that the proof of Theorem 8.3 implies the following: For λ 5 ln n, the (,λ EA creates the optimum of CLIFF in runtime O(λn 2 with probability Ω(e 5λ, at least. Let us take a short look at the multistart variant/extension of EAs, where a particular EA A is (independently restarted after runtime l(n, denoted by A l(n. We observe the following for λ 3 ln l(n and polynomially bounded values of l(n. If the (+λ EA l(n is efficient, then also the (,λ EA l(n is efficient. And as we have seen, there exist functions where the (,λ EA l(n is efficient, but the (+λ EA l(n is not. So, in this situation and in case of doubt, one should prefer the (,λ EA l(n. In cases when λ ln(n/4, however, one should definitely prefer the (+λ EA l(n. Thus, we have obtained a somewhat general rule when to apply the comma or the plus selection. V. SUMMARY AND CONCLUSIONS We have compared the (,λ EA and the (+λ EA operating on {0, } n, and it has been pointed out why only the consideration of offspring populations of logarithmic size in n are interesting. For smaller values of λ, the (,λ is totally inefficient on every function with a unique optimum, whereas for larger values, the (,λ EA and (+λ EA behave equivalently with a high probability. These investigations have been exemplified by ONEMAX. For the example function CLIFF, we have analyzed rigorously when and why the (,λ EA outperforms the (+λ EA. Therefore, a simple but powerful proof method has been developed. However, our results support depending on the offspring population size the importance of a correct choice of the selection operator. REFERENCES [] H.-P. Schwefel, Evolution and Optimum Seeking. Wiley, 995. [2] S. Droste, T. Jansen, and I. Wegener, On the analysis of the (+ evolutionary algorithm, Theoretical Computer Science, vol. 276, pp. 5 8, 2002. [3] J. Garnier, L. Kallel, and M. Schoenauer, Rigorous hitting times for binary mutations, Evolutionary Computation, vol. 7, pp. 73 203, 999. [4] O. Giel and I. Wegener, Searching randomly for maximum matchings, Electronic Colloquium on Computational Complexity, vol. 76, 2004. [5] T. Jansen, K. De Jong, and I. Wegener, On the choice of the offspring population size in evolutionary algorithms, Evolutionary Computation, vol. 3, pp. 43 440, 2006. [6] C. Witt, Runtime analysis of the (µ+ EA on simple pseudo-boolean functions, Evolutionary Computation, vol. 4, pp. 65 86, 2006. [7] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge University Press, 995. [8] J. He and X. Yao, Drift analysis and average time complexity of evolutionary algorithms, Artificial Intelligence, vol. 27, pp. 57 85, 200. [9] J. Jägersküpper, Analysis of a simple evolutionary algorithm for minimization in Euclidean spaces, in Proc. 30th International Colloquium on Automata, Languages, and Programming ICALP 2003, 2003, pp. 068 079. [0] I. Wegener, Methods for the analysis of evolutionary algorithms on pseudo-boolean functions, in Evolutionary Optimization, 2002, pp. 349 369. [] T. Storch, On the choice of the population size, in Proc. Genetic and Evolutionary Computation Conference GECCO 2004, LNCS 302, 2004, pp. 748 760. [2] J. Jägersküpper and T. Storch, How comma selection helps with the escape from local optima, in Proc. International Conference on Parallel Problem Solving From Nature IX PPSN 2006, 2006, pp. 52 6.