Refined Upper Bounds on the Expected Runtime of Non-elitist Populations from Fitness-Levels


Duc-Cuong Dang
ASAP Research Group
School of Computer Science
University of Nottingham

ABSTRACT
Recently, an easy-to-use fitness-level technique was introduced to prove upper bounds on the expected runtime of randomised search heuristics with non-elitist populations and unary variation operators. Following this work, we present a new and much more detailed analysis of the population dynamics, leading to a significantly improved fitness-level technique. In addition to improving the technique, the proof has been simplified. With the new fitness-level technique, the upper bound on the runtime in terms of generations can be improved from linear to logarithmic in the population size. Increasing the population size therefore has a smaller impact on the runtime than previously thought. To illustrate this improvement, we show that the current bounds on the runtime of EAs with non-elitist populations on many example functions can be significantly reduced. Furthermore, the new fitness-level technique makes the relationship between the selective pressure and the runtime of the algorithm explicit. Surprisingly, a very weak selective pressure is sufficient to optimise many functions in expected polynomial time. This observation has important consequences, some of which are explored in a companion paper [1].

Categories and Subject Descriptors
F.2 [Theory of Computation]: Analysis of Algorithms and Problem Complexity

General Terms
Theory, Algorithms, Complexity

Keywords
Runtime, Drift analysis, Non-elitism, Fitness-levels

1. INTRODUCTION
The goal of this paper is to estimate the expected runtime of algorithms covered by the scheme of Algorithm 1.
Per Kristian Lehre
ASAP Research Group
School of Computer Science
University of Nottingham
perkristian.lehre@nottingham.ac.uk

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. GECCO '14, July 12-16, 2014, Vancouver, BC, Canada. Copyright 2014 ACM.

Without loss of generality, we assume maximisation. The scheme defines a class of algorithms which can be instantiated by specifying the variation operator p_mut and the selection mechanism p_sel. The general term population selection-variation algorithm is used to emphasise that it not only covers Evolutionary Algorithms (EAs), but also other population-based search heuristics. Here, we assume that the algorithm optimises a fitness function f : X -> R, implicitly given by the selection mechanism p_sel. In addition, the description of p_sel allows the scheme to cover more possibilities than static/classical optimisation, e.g., optimisation under uncertainty (see [1]). The algorithm keeps a vector P_t in X^λ, t ≥ 0, of λ search points. In analogy with evolutionary algorithms, the vector will be referred to as a population, and the vector elements as individuals. Each iteration of the inner loop is called a selection-variation step, and each iteration of the outer loop, which accounts for λ iterations of the inner loop, is called a generation. The initial population is sampled uniformly at random.
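As a concrete illustration of the scheme just described, the following is a minimal sketch in Python (not the paper's formal algorithm statement). Here k-tournament selection and bitwise mutation are example instantiations of p_sel and p_mut, and all parameter values are arbitrary illustrative choices.

```python
import random

def run_psv(f, n, lam, k, chi, generations, rng):
    """Minimal sketch of a population selection-variation algorithm:
    k-tournament as p_sel and bitwise mutation as p_mut (example choices)."""
    # Initial population: lam bitstrings sampled uniformly at random.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(lam)]
    best = max(pop, key=f)
    for _ in range(generations):
        new_pop = []
        for _ in range(lam):
            # p_sel: fittest of k individuals sampled uniformly with replacement.
            parent = max((rng.choice(pop) for _ in range(k)), key=f)
            # p_mut: flip each bit independently with probability chi/n.
            child = [b ^ (rng.random() < chi / n) for b in parent]
            new_pop.append(child)
        pop = new_pop  # non-elitist: offspring replace parents entirely
        best = max(best, max(pop, key=f), key=f)
    return best

rng = random.Random(1)
onemax = lambda x: sum(x)
best = run_psv(onemax, n=20, lam=40, k=4, chi=1.0, generations=200, rng=rng)
```

Note that, because parent and offspring populations are non-overlapping, tracking the best-so-far individual is done outside the population itself.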
In subsequent generations, a new population P_{t+1} is generated by independently sampling λ individuals from the existing population P_t according to p_sel, and perturbing each of the sampled individuals by a variation operator p_mut. P_{t+1} and P_t are called the offspring and parent populations at generation t. The algorithm is classified as non-elitist, because the parent and offspring populations are non-overlapping. The algorithmic scheme is restricted to unary variation operators, i.e., those where each individual has only one parent. Higher-arity variation operators, e.g., crossover operators, are not covered.

Algorithm 1 Population Selection-Variation Algorithm
Require: Finite state space X, and initial population P_0 ~ Unif(X^λ).
1: for t = 0, 1, 2, ... until termination condition met do
2:   for i = 1 to λ do
3:     Sample I_t(i) in [λ] according to p_sel(P_t).
4:     x := P_t(I_t(i)).
5:     Sample x' according to p_mut(x).
6:     P_{t+1}(i) := x'.
7:   end for
8: end for

Algorithm 1 has been studied in a sequence of papers [10, 9, 8, 1]. Its running time depends critically on the balance between the selective pressure imposed by p_sel, and the amount of variation introduced by p_mut [10]. When the selective pressure falls below a certain threshold which depends

on the mutation rate, the expected running time of the algorithm becomes exponential [9]. Conversely, if the selection pressure exceeds this threshold, it is possible to show using a fitness-level argument [13] that the expected running time is bounded from above by a polynomial [8]. The present paper improves this latest result in two directions: by providing more precise upper bounds on the expected runtime, and by providing results when the selective pressure is arbitrarily close to the above-mentioned threshold (i.e., when δ in Theorem 8 is close to 0). Recall that fitness-levels are among the simplest and probably the oldest ways to derive upper bounds on the expected runtime of elitist EAs [13]. The idea is to partition the search space X into so-called fitness levels A_1, ..., A_m of X, such that for all j < m, all the search points in fitness level A_j have inferior function value to the search points in fitness level A_{j+1}, and all global optima are in the last level A_m. Due to elitism, which means the offspring population has to compete with the best individuals of the parent population, the EA will never lose the highest fitness level found so far. If the probability of mutating any search point in fitness level A_j into one of the higher fitness levels is at least s_j, then the expected time until this occurs is at most 1/s_j. The expected time to overcome all the inferior levels, i.e., the expected runtime, is by linearity of expectation no more than Σ_{j=1}^{m-1} 1/s_j. This simple technique can sometimes provide tight upper bounds on the expected runtime. Recently, in [12], the method was shown to be efficient also for deriving tight lower bounds; the key idea is to estimate the number of fitness-levels skipped on average. Non-elitist algorithms, such as Algorithm 1, may lose the current best solution. Therefore, to guarantee the optimisation of f, a set of conditions has to be applied so that the population does not frequently fall down to lower fitness-levels.
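As a concrete instance of the classical elitist argument above, the following snippet evaluates the fitness-level bound Σ_j 1/s_j for the (1+1) EA on OneMax, with the standard level choice: from a search point with j one-bits, s_j is lower-bounded by the probability of flipping one of the n-j remaining 0-bits and nothing else. This is an illustrative sketch; the specific numbers are not from the paper.

```python
def fitness_level_bound(n, chi=1.0):
    """Elitist fitness-level upper bound sum_j 1/s_j for the (1+1) EA
    on OneMax with bitwise mutation rate chi/n."""
    # Probability of flipping one specific bit and no other bit.
    q = (chi / n) * (1 - chi / n) ** (n - 1)
    # At level j (j one-bits), any of the n-j zero-bits works: s_j = (n-j)*q.
    return sum(1.0 / ((n - j) * q) for j in range(n))

bound = fitness_level_bound(100)
```

The bound grows like e * n * ln(n), matching the familiar O(n log n) runtime of the (1+1) EA on OneMax.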
Those conditions were introduced in [8]; they imply a large enough population size and a strong enough selection pressure relative to the variation operator. When the conditions are satisfied, [8] concludes that the runtime is bounded from above by O(mλ² + Σ_{j=1}^{m-1} 1/s_j), assuming that the values of the parameters can be fixed as constants (details are given in Section 4). The proof divides the run of the algorithm into phases. A phase is considered successful if the population does not fall to a lower fitness level during the phase. The duration of each phase, conditional on the phase being successful, is analysed separately using drift analysis [5]. An unconditional upper bound on the expected running time of the algorithm is obtained by taking into account the success probabilities of the phases. The contribution of this paper is a more precise and general upper bound for the fitness-level technique of [8]. In particular, we reduce the above bound to O(mλ ln λ + Σ_{j=1}^{m-1} 1/s_j) and allow the parameter δ (see Theorem 8) to be arbitrarily close to 0. These refinements are achieved through a much more detailed analysis of the population dynamics. The remainder of the paper is organised as follows. Section 2 provides a set of preliminary results that are crucial for our improvement. Our new theorem is presented with its proof in Section 3. The new results for the set of functions previously considered in [8] are described in Section 4. Further potential applications of the new form of the theorem are also discussed in that section. Finally, some conclusions are drawn.

2. PRELIMINARIES
For any positive integer n, define [n] := {1, 2, ..., n}. The indicator function of a logical statement X is denoted by [X]. The natural logarithm is denoted by ln, and the logarithm to base 2 by log. For a bitstring x of length n, define |x|_1 := Σ_{i=1}^n x_i. The ordering of the elements in a population vector P in X^λ according to decreasing f-value will be denoted x_1, x_2, ..., x_λ, i.e., such that f(x_1) ≥ f(x_2) ≥ ... ≥ f(x_λ).
For any constant γ in (0, 1], the individual x_⌈γλ⌉ will be referred to as the γ-ranked individual of the population. Variation operators are formally represented as transition matrices p_mut : X × X -> [0, 1] over the search space, where p_mut(x | y) represents the probability of perturbing an individual y into an individual x. Selection operators are represented as probability distributions over the set of integers [λ], where the conditional probability p_sel(i | P_t) represents the probability of selecting individual P_t(i), i.e., the i-th individual from population P_t. For notational convenience, the probability of selecting an individual x in P_t will also be denoted by p_sel(x | P_t). Each individual within a generation is sampled independently from the same distribution p_sel. It makes sense to assume that p_sel is monotone with respect to an underlying fitness function f : X -> R.

Definition 1 ([8]). A selection mechanism p_sel is f-monotone if for all P in X^λ and all pairs i, j in [λ], it holds that p_sel(i | P) ≥ p_sel(j | P) whenever f(P(i)) ≥ f(P(j)).

Informally, the selective pressure of a selection mechanism refers to the degree to which the selection mechanism selects individuals that have higher f-values. We define the cumulative selection probability of a selection mechanism p_sel with respect to a fitness function f as follows.

Definition 2 ([8]). The cumulative selection probability β associated with a selection mechanism p_sel is defined for all γ in (0, 1] and P in X^λ by
β(γ, P) := Σ_{y in P} p_sel(y | P) · [f(y) ≥ f(x_⌈γλ⌉)].

Informally, β(γ, P) is the probability of selecting an individual with fitness at least as high as that of the γ-ranked individual. We write β(γ) instead of β(γ, P) when β can be bounded independently of the population vector P. Recall the definition of stochastic dominance.

Definition 3. A random variable A is stochastically dominated by a random variable B, denoted A ⪯ B, if Pr(A > x) ≤ Pr(B > x) for all x in R. Equivalently, A ⪯ B holds if and only if E[f(A)] ≤ E[f(B)] for any non-decreasing function f.
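Definition 2 can be evaluated directly when p_sel is known. The sketch below computes β(γ, P) from explicit rank-selection probabilities and checks it against the standard closed form 1 - (1 - ⌈γλ⌉/λ)^k for k-tournament over distinct fitness values; the closed form is a well-known identity used here only as an illustrative cross-check.

```python
from math import ceil

def beta(gamma, fitnesses, p_sel):
    """Cumulative selection probability (Definition 2): the total probability
    of selecting an individual at least as fit as the gamma-ranked one.
    `fitnesses` must be sorted decreasingly; p_sel[i] is the probability of
    selecting the individual of rank i (0-indexed best first)."""
    lam = len(fitnesses)
    cutoff = fitnesses[ceil(gamma * lam) - 1]
    return sum(p for f, p in zip(fitnesses, p_sel) if f >= cutoff)

lam, k = 10, 3
fit = list(range(lam, 0, -1))  # distinct fitness values, decreasing
# k-tournament: rank i is selected iff it is the best of the k draws.
p_sel = [((lam - i) ** k - (lam - i - 1) ** k) / lam ** k for i in range(lam)]
b = beta(0.3, fit, p_sel)
closed = 1 - (1 - ceil(0.3 * lam) / lam) ** k
```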
Our main tool to prove Theorem 8 is the following additive drift theorem.

Theorem 4 (Drift Theorem [5]). Let X_1, X_2, ... be a stochastic process over S, let d : S -> R_{≥0} be a distance function on S, and let F_t be the filtration induced by X_1, ..., X_t. Define T to be the first time t such that d(X_t) = 0. If there exists a constant Δ > 0 such that
1. for all t ≥ 0 : Pr(d(X_t) < B) = 1, and
2. for all t ≥ 0 : E[d(X_t) - d(X_{t+1}) | F_t] ≥ Δ,
then E[T] ≤ B/Δ.
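The theorem can be exercised on a toy process (a hypothetical illustration, not from the paper): a walk on {0, ..., B} whose distance to 0 decreases by at least Δ = 2p - 1 in expectation at every step, so Theorem 4 guarantees E[T] ≤ B/(2p-1). The simulation below checks the empirical average against that bound (Monte Carlo with a fixed seed).

```python
import random

def hitting_time(B, p, rng):
    """Walk started at B; moves down 1 w.p. p, up 1 w.p. 1-p (capped at B).
    Distance d(x) = x; per-step drift is at least 2p - 1 everywhere."""
    x, t = B, 0
    while x > 0:
        x = min(B, x - 1 if rng.random() < p else x + 1)
        t += 1
    return t

rng = random.Random(0)
B, p = 20, 0.8
times = [hitting_time(B, p, rng) for _ in range(2000)]
avg = sum(times) / len(times)
```

Since every trajectory needs at least B steps, the empirical mean sits between B and the drift-theorem bound B/(2p-1).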

The most common application of the drift theorem in evolutionary computation is to bound the expected time until the process reaches a potential of 0, i.e., d(X_t) = 0. The so-called potential function d serves the purpose of measuring the progress toward the optimum. In our case, X_t represents the number of individuals in the population that have advanced to a higher fitness level. We are interested in the time until X_t ≥ γ_0 λ, i.e., until some constant fraction of the population has advanced to the higher fitness level, sometimes called the take-over time. The earlier fitness-level theorem for populations [8] used a linear potential function of the form d(x) = C - x, and obtained an expected progress proportional to X_t. This situation is somehow inverse to that of multiplicative drift [2], as we are waiting for the process X_t to reach a large value. It is well known that d should be chosen such that the drift is a constant independent of X_t. In this paper, we therefore consider a potential function of the form d(x) = C - ln(1+cx), for appropriate choices of C and c. However, in order to bound the drift with this potential function, it is insufficient to only consider the expectation of X_t. The following lemma shows that it suffices to exploit the fact that X_t is binomially distributed.

Lemma 5. If X ~ Bin(λ, p) with p ≥ (1+δ)i/λ and i ≥ 1 for some δ > 0, then
E[ln((1+cX)/(1+ci))] ≥ cε,
where ε = min{1/2, δ/2} and c = ε⁴/24.

Proof. Let Y ~ Bin(λ, (1+ε)i/λ); since ε ≤ δ, it holds that Y ⪯ X, and therefore E[ln((1+cX)/(1+ci))] ≥ E[ln((1+cY)/(1+ci))]. It is thus sufficient to show that E[ln((1+cY)/(1+ci))] ≥ cε to complete the proof. We consider the two following cases.

For i ≥ 8/ε³: let q := Pr(Y < (1+ε/2)i); then by the law of total probability, and using Y ≥ 0,
E[ln((1+cY)/(1+ci))] ≥ (1-q) ln(1 + (cεi/2)/(1+ci)) - q ln(1+ci).
Note that ci = ε⁴i/24 ≥ (ε⁴/24)(8/ε³) = ε/3, so
(cεi/2)/(1+ci) = (ε/2)/(1 + 1/(ci)) ≥ (ε/2)/(1 + 3/ε) ≥ ε²/7.
We then have, based on Lemma 16,
ln(1 + (cεi/2)/(1+ci)) ≥ ε²/7 - ε⁴/98.
Using Theorem 15 with E[Y] = (1+ε)i gives q ≤ exp(-ε²i/(8(1+ε))), and together with e^x > x²/2 this shows that q, and hence also q ln(1+ci) ≤ q·ci, vanishes polynomially fast in i. Putting everything together, a direct calculation shows that E[ln((1+cY)/(1+ci))] ≥ cε in this case.

For i ≤ 8/ε³: the condition i ≥ 1 implies that E[Y] = (1+ε)i ≥ 1+ε, and for binomially distributed random variables we have Var[Y] ≤ E[Y], so
E[Y²] = Var[Y] + E[Y]² ≤ E[Y]²(1 + 1/(1+ε)).
Using Lemma 16, and ci ≤ ε/3 in this case, we get
E[ln((1+cY)/(1+ci))] ≥ cE[Y] - (c²/2)E[Y²] - ci = cεi - (c²/2)E[Y²],
and a direct calculation shows that the right-hand side is at least cε.

Combining the two cases, we have, for all i ≥ 1, E[ln((1+cX)/(1+ci))] ≥ E[ln((1+cY)/(1+ci))] ≥ cε.

We will also need the following results.

Lemma 6 (Lemma 8 in [8]). If X ~ Bin(λ, p) with p ≥ (1+δ)i/λ, then E[e^{-κX}] ≤ e^{-κi} for any κ in (0, δ].

Proof. For the completeness of our main theorem, we detail the proof of [8] as follows. The value of the moment generating function M_X of the binomially distributed variable X at t = -κ is
E[e^{-κX}] = M_X(-κ) = (1 + p(e^{-κ} - 1))^λ.
It follows from Lemma 18 and from κ ≤ δ that
p(1 - e^{-κ}) ≥ (1+δ)(i/λ)·κ/(1+κ) ≥ κi/λ.
Altogether, E[e^{-κX}] ≤ (1 - κi/λ)^λ ≤ e^{-κi}.

Lemma 7. Let Y_1, Y_2, ..., Y_n be n random variables such that Y_1(ω) ≤ Y_i(ω) for all i in [n] and all outcomes ω. Then for any random variable Z over [n], it holds that E[Y_Z] ≥ E[Y_1]. Note that in the case where Z and the Y_i are dependent, E[Y_i] ≥ δ for all i does not necessarily imply E[Y_Z] ≥ δ. However, such a conclusion can be drawn if there is a specific Y_1 with E[Y_1] ≥ δ that is dominated by all the other Y_i, as stated in the lemma.

Proof. For any outcome ω of the sample space Ω, let j := Z(ω) in [n]. Therefore Y_{Z(ω)}(ω) = Y_j(ω) ≥ Y_1(ω). Hence Y_{Z(ω)}(ω) ≥ Y_1(ω) for all ω in Ω, and E[Y_Z] ≥ E[Y_1].
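Both lemmas can be sanity-checked numerically with exact binomial sums (an illustrative check with arbitrary parameter values, using the constants ε = min{δ/2, 1/2} and c = ε⁴/24 as stated above):

```python
from math import comb, exp, log

def binom_pmf(lam, p, x):
    # Exact Bin(lam, p) probability mass at x.
    return comb(lam, x) * p**x * (1 - p) ** (lam - x)

lam, i, delta = 100, 10, 0.5
p = (1 + delta) * i / lam       # smallest p allowed by the hypotheses
eps = min(0.5, delta / 2)       # = 0.25
c = eps**4 / 24

# Lemma 5: E[ln((1+cX)/(1+ci))] >= c*eps for X ~ Bin(lam, p).
drift = sum(binom_pmf(lam, p, x) * (log(1 + c * x) - log(1 + c * i))
            for x in range(lam + 1))

# Lemma 6: E[e^(-kappa*X)] = (1 + p(e^(-kappa)-1))^lam <= e^(-kappa*i)
# for any kappa in (0, delta].
kappa = delta / 2
mgf = (1 + p * (exp(-kappa) - 1)) ** lam
```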

3. A REFINED FITNESS-LEVEL THEOREM
For notational convenience, define for j in [m] the set A_j^+ := ∪_{i=j+1}^m A_i, i.e., the set of search points at higher fitness levels than A_j. We have the following theorem and corollary for the runtime of non-elitist populations.

Theorem 8. Given a function f : X -> R and an f-based partition A_1, ..., A_m, let T be the number of function evaluations until Algorithm 1 with an f-monotone selection mechanism p_sel obtains an element in A_m for the first time. If there exist parameters p_0, s_1, ..., s_{m-1}, s_* in (0, 1], γ_0 in (0, 1) and δ > 0 such that
(C1) p_mut(y in A_j^+ | x in A_j) ≥ s_j ≥ s_* for all j in [m-1],
(C2) p_mut(y in A_j ∪ A_j^+ | x in A_j) ≥ p_0 for all j in [m],
(C3) β(γ, P) p_0 ≥ (1+δ)γ for all P in X^λ and all γ in (0, γ_0],
(C4) λ ≥ (2/a) ln(16m/(a c ε s_*)), with a = δ²γ_0/(2(1+δ)), ε = min{δ/2, 1/2} and c = ε⁴/24,
then
E[T] ≤ (2/(cε)) ( mλ(1 + ln(1+cλ)) + (p_0/((1+δ)γ_0)) Σ_{j=1}^{m-1} 1/s_j ).

The new theorem has the same form as the one in [8]. Similarly to the classical fitness-level argument, it assumes an f-based partition A_1, ..., A_m of the search space X. Each subset A_j is called a fitness level. Condition (C1) specifies that for each fitness level A_j, the upgrade probability, i.e., the probability that an individual in fitness level A_j is mutated into a higher fitness level, is lower bounded by a parameter s_j. Condition (C2) requires that there exists a lower bound p_0 on the probability that an individual will not downgrade to a lower fitness level. In the classical setting where p_mut is bitwise mutation, it suffices to let p_0 be the probability of not mutating any bits; e.g., with mutation rate χ/n, one can choose p_0 close to e^{-χ}, since (1-χ/n)^n ≥ e^{-χ}(1 - χ²/n). Condition (C3) requires that the selective pressure (see Definition 2) induced by the selection mechanism is sufficiently strong: the probability of selecting one of the fittest γλ individuals in the population and not downgrading the selected individual via mutation (probability p_0) should exceed γ by a factor of at least 1+δ. In applications, the parameter δ may depend on the optimisation problem.
The last condition (C4) requires that the population size λ is sufficiently large. The required population size depends on the number of fitness levels m, the selection pressure δ (c and ε are functions of δ), and the upgrade probabilities s_j, which are problem-dependent parameters. A population size of λ = Θ(ln n), where n is the number of problem dimensions, is sufficient for many pseudo-Boolean functions. If all the conditions (C1)-(C4) are satisfied, then an upper bound on the expected runtime of the algorithm is guaranteed. The new theorem makes the relation between the expected runtime and the parameters, including δ, explicit. Most notably, the assumption that δ is a constant, as required by the old theorem in [8], is removed. This allows the new theorem to be applied in more complex settings, such as optimisation under incomplete information or uncertainty (see e.g. [1]). If δ is bounded from below by a constant, we have the following corollary.

Corollary 9. In addition to (C1)-(C4), if p_0, γ_0 and δ can be fixed as constants with respect to m, then
E[T] = O( mλ ln λ + Σ_{j=1}^{m-1} 1/s_j ).

The corollary trivially holds because a, c and ε are constants, being derived from the constants δ and γ_0. In that case, the first part of the runtime is reduced from O(mλ²), as previously stated in [8], to O(mλ ln λ). In other words, the overhead of increasing the population size is significantly smaller than previously thought. We now show the proof of Theorem 8.

Proof of Theorem 8. We use the following notation. The number of individuals at fitness level j or higher at generation t is denoted by X_t^j for j in [m]. The current fitness level of the population at generation t is denoted by Z_t, where Z_t = l iff X_t^l ≥ γ_0 λ and X_t^{l+1} < γ_0 λ. Note that Z_t is uniquely defined, as it is the fitness level of the γ_0-ranked individual at generation t.
In addition, since we are interested in upper bounds on the runtime, the stopping time of the process can be taken as the first point where X_t^m ≥ γ_0 λ, or equivalently Z_t = m. We also use q_j to denote the probability of generating at least one individual at a fitness level strictly better than j in the next generation, given that at least γ_0 λ individuals of the current population are at fitness level j or higher. By this definition,
q_j ≥ 1 - (1 - β(γ_0)s_j)^λ ≥ β(γ_0)λs_j/(1 + β(γ_0)λs_j),
due to Lemma 18. Using the following potential function, the theorem can now be proved relying solely on the additive drift argument:
g(t) = g_1(t) + g_2(t), with
g_1(t) = (m - Z_t) ln(1+cλ) - ln(1 + cX_t^{Z_t+1}), and
g_2(t) = e^{-κX_t^{Z_t+1}}/q_{Z_t} + Σ_{j=Z_t+1}^{m-1} 1/q_j,
for some κ in (0, δ) to be fixed later. The function is bounded from above by
g(t) ≤ m ln(1+cλ) + Σ_{j=1}^{m-1} 1/q_j ≤ m ln(1+cλ) + Σ_{j=1}^{m-1} (1 + 1/(β(γ_0)λs_j))
≤ m(1 + ln(1+cλ)) + (1/(β(γ_0)λ)) Σ_{j=1}^{m-1} 1/s_j =: B.
At generation t, we use R = Z_{t+1} - Z_t to denote the random variable describing the progress in fitness levels. To simplify the notation, let l = Z_t, i = X_t^{l+1}, X = X_{t+1}^{l+1} and X' = X_{t+1}^{l+R+1}. Then Δ := g(t) - g(t+1) = Δ_1 + Δ_2, with
Δ_1 = g_1(t) - g_1(t+1) = R ln(1+cλ) + ln((1+cX')/(1+ci)),
Δ_2 = g_2(t) - g_2(t+1) = e^{-κi}/q_l - e^{-κX'}/q_{l+R} + Σ_{j=l+1}^{l+R} 1/q_j.
Let F_t be the filtration induced by P_t, define E_t to be the event that the population in the next generation does

not fall down to a lower level, i.e., E_t : Z_{t+1} ≥ Z_t, and let Ē_t be the complementary event. We have
E[Δ | F_t] = (1 - Pr(Ē_t | F_t)) E[Δ | F_t, E_t] + Pr(Ē_t | F_t) E[Δ | F_t, Ē_t]
= E[Δ | F_t, E_t] - Pr(Ē_t | F_t) (E[Δ | F_t, E_t] - E[Δ | F_t, Ē_t]).
We first compute the conditional forward drift E[Δ | F_t, E_t]. Note that R is a non-negative random variable under condition E_t, and Δ is a random variable indexed by R, i.e., Δ = Y_R, where
Y_0 = ln((1+cX)/(1+ci)) + e^{-κi}/q_l - e^{-κX}/q_l, and for r ≥ 1,
Y_r = r ln(1+cλ) + ln((1+cX')/(1+ci)) + e^{-κi}/q_l - e^{-κX'}/q_{l+r} + Σ_{j=l+1}^{l+r} 1/q_j.
One can verify that Y_r ≥ Y_0 for every fixed index r and every outcome. It follows from Lemma 7 that E[Δ | F_t, E_t] = E[Y_R | F_t, E_t] ≥ E[Y_0 | F_t, E_t], so we only need to consider r = 0 to lower bound the forward drift. We separate two cases: i = 0, the event denoted by Z_t^0, and i ≥ 1, the event denoted by Z_t^{≥1}. Recall that each individual is generated independently of the others, so given Z_t^{≥1} we have X ~ Bin(λ, p) where p ≥ β(i/λ)p_0 ≥ (1+δ)i/λ; the last inequality is due to condition (C3). Hence, by Lemma 5, it holds given the event Z_t^{≥1} that
E[ln((1+cX)/(1+ci)) | F_t, E_t, Z_t^{≥1}] ≥ cε,
and by Lemma 6, e^{-κi} - E[e^{-κX} | F_t, E_t, Z_t^{≥1}] ≥ 0, so
E[Δ | F_t, E_t, Z_t^{≥1}] ≥ cε.
For i = 0, recall that q_l ≤ Pr(X ≥ 1 | F_t, E_t, Z_t^0), so
E[Δ | F_t, E_t, Z_t^0] ≥ Pr(X ≥ 1 | F_t, E_t, Z_t^0) · E[(e^{-κ·0} - e^{-κX})/q_l | F_t, E_t, Z_t^0, X ≥ 1] ≥ 1 - e^{-κ}.
So the conditional forward drift is E[Δ | F_t, E_t] ≥ min{cε, 1 - e^{-κ}}. Furthermore, κ can be picked in the non-empty interval (-ln(1-cε), δ), which is a subset of (0, δ), so that 1 - e^{-κ} > cε and hence E[Δ | F_t, E_t] ≥ cε. Next, we bound the conditional backward drift, which can be done for the worst case:
E[Δ | F_t, Ē_t] ≥ -B ≥ -( m(1 + ln(1+cλ)) + m/(β(γ_0)λs_*) ).
The probability of not having event E_t is computed as follows. Recall that X_t^l ≥ γ_0 λ, and X_{t+1}^l is binomially distributed with success probability at least β(γ_0)p_0 ≥ (1+δ)γ_0, so E[X_{t+1}^l | F_t] ≥ (1+δ)γ_0 λ.
The event Ē_t happens when the number of individuals at fitness level l or higher is strictly less than γ_0 λ in the next generation:
Pr(Ē_t | F_t) = Pr(X_{t+1}^l < γ_0 λ | F_t)
≤ Pr(X_{t+1}^l ≤ (1 - δ/(1+δ)) E[X_{t+1}^l | F_t] | F_t)
≤ exp(-(δ/(1+δ))² E[X_{t+1}^l | F_t]/2)  (due to Theorem 15)
≤ exp(-δ²γ_0λ/(2(1+δ))) = e^{-aλ}.
Recall condition (C4) on the population size: λ ≥ (2/a) ln(16m/(acεs_*)), i.e., e^{aλ/2} ≥ 16m/(acεs_*). Since e^{aλ/2} ≥ aλ/2 (by Lemma 16), we get e^{aλ} ≥ (aλ/2)·(16m/(acεs_*)) = 8mλ/(cεs_*), hence
Pr(Ē_t | F_t) ≤ e^{-aλ} ≤ cεs_*/(8mλ).
Altogether, we have the drift
E[Δ | F_t] ≥ cε - (cεs_*/(8mλ)) ( cε + m(1 + ln(1+cλ)) + m/(β(γ_0)λs_*) ),
and a straightforward calculation shows that the subtracted terms total at most cε/2, so that E[Δ | F_t] ≥ cε/2. Finally, by Theorem 4 the expected number of generations is at most B/(cε/2); multiplying by λ, and using 1/β(γ_0) ≤ p_0/((1+δ)γ_0) from condition (C3), we obtain the runtime in terms of evaluations:
E[T] ≤ (2/(cε)) ( mλ(1 + ln(1+cλ)) + (p_0/((1+δ)γ_0)) Σ_{j=1}^{m-1} 1/s_j ).

Following the proof of the theorem, we observe that for each current fitness level j = Z_t, the runtime of the algorithmic scheme is separated into two phases: first, the algorithm waits for the generation of an advanced individual, i.e., one with fitness above level Z_t; then the γ_0-upper portion of the population is filled up with advanced individuals through harmless mutations, including replications and occasional upgrades. The algorithm is considered to have left the current fitness level when there are at least γ_0 λ advanced individuals in the

population. The first phase is governed by the probability q_j, which is at least 1 - (1 - β(γ_0)s_j)^λ. In fact, the lower bound for q_j is loosely estimated using a weak version of Lemma 18. So one may wonder whether the true value of λ/q_j could be smaller than 1/s_j, hypothetically improving the expected optimisation times of populations over those of the standard (1+1) EA. The answer is no, due to the following result.

Lemma 10. For any γ_0 in (0,1), β(γ_0) in (0,1], s_j in (0,1] and λ in N, we have
λ/(1 - (1 - β(γ_0)s_j)^λ) ≥ 1/(β(γ_0)s_j) ≥ 1/s_j.

Proof. From the conditions, we have -β(γ_0)s_j in [-1, 0), so it follows from Theorem 17 that (1 - β(γ_0)s_j)^λ ≥ 1 - β(γ_0)s_jλ. Then
λ/(1 - (1 - β(γ_0)s_j)^λ) ≥ λ/(β(γ_0)s_jλ) = 1/(β(γ_0)s_j) ≥ 1/s_j.

The term λ/q_j is the expected waiting time, in function evaluations, for Algorithm 1 to generate one advanced individual. This is similar to the waiting time of a (1,λ) EA to leave level j. So it is impossible for the new bound to improve on these optimisation times, unless parallel implementations are considered. It is therefore crucial to fully understand the population dynamics of the second phase to finely determine the runtime.

4. APPLICATIONS
We focus on Evolutionary Algorithms (EAs) with bitwise mutation as the variation operator, and consider pseudo-Boolean functions. We use the same notation χ/n as the literature to denote the probability of flipping a bit position in a bitwise mutation. As the first application, we use Theorem 8 to improve the results of [8] on static functions. The following selection mechanisms were considered in [8]. In tournament selection of size k, denoted k-tournament, k individuals are sampled uniformly with replacement from the current population, and the fittest of these is selected. Ties are broken uniformly at random. In ranking selection, individuals are assigned ranks from 0 to 1, where the best has rank 0 and the worst has rank 1. Following [4], a function α : R -> R is considered a ranking function if α(x) ≥ 0 for all x in [0, 1], and ∫_0^1 α(x)dx = 1. The selection mechanism chooses individuals such that the probability of selecting individuals ranked γ or better is ∫_0^γ α(x)dx.
Note that this coincides with our definition of β(γ). Linear ranking selection uses the ranking function α(x) := η - 2(η - 1)x with η in (1, 2]. In (µ,λ)-selection, parent solutions are selected uniformly at random among the best µ out of the λ individuals in the current population. Fitness proportionate selection with power scaling parameter ν is defined for maximisation problems as
p_sel(i | P_t, f) := f(P_t(i))^ν / Σ_{j=1}^λ f(P_t(j))^ν, for all i in [λ].
Setting ν = 1 gives the classical version. The following lemma summarises some results of [8].

Lemma 11. For any constants δ' > 0 and p_0 in (0,1), there exist constants δ > 0 and γ_0 in (0,1) such that condition (C3) of Theorem 8 is satisfied for
- k-tournament selection with k ≥ (1+δ')/p_0,
- linear ranking selection with η ≥ (1+δ')/p_0,
- (µ,λ)-selection with λ/µ ≥ (1+δ')/p_0,
- fitness proportionate selection with ν ≥ f_max ln(2/p_0).

Proof. See Lemmas 5, 6, 7 and 8 in [8].

Table 1: Expected runtime of Algorithm 1 with the corresponding parameter settings. For clarity, all constant factors are omitted or summarised by c. (*) The result for fitness proportionate selection only holds for the OneMax function.

Selection Mechanism      | Parameter
Fitness proportionate (*) | ν > f_max ln(2e^χ)
Linear ranking            | η > e^χ
k-tournament              | k > e^χ
(µ,λ)                     | λ > µe^χ

Problem      | Population size | E[T]
OneMax       | λ ≥ c ln n      | O(nλ ln λ)
LeadingOnes  | λ ≥ c ln n      | O(nλ ln λ + n²)
Linear       | λ ≥ c ln n      | O(nλ ln λ + n²)
k-unimodal   | λ ≥ c ln(nk)    | O(kλ ln λ + nk)
Jump_r       | λ ≥ cr ln n     | O(nλ ln λ + (n/χ)^r)

The lemma implies that δ and γ_0 can be fixed as constants for bitwise mutation operators with constant χ, by setting the parameters of the selection mechanisms appropriately. In those cases, Corollary 9 provides the runtime for the following functions:
OneMax(x) := Σ_{i=1}^n x_i = |x|_1,
LeadingOnes(x) := Σ_{i=1}^n Π_{j=1}^i x_j,
Linear(x) := Σ_{i=1}^n c_i x_i,
Jump_r(x) := |x|_1 if |x|_1 ≤ n - r or |x|_1 = n, and 0 otherwise.
We also analyse the runtime of k-unimodal functions. Recall that a pseudo-Boolean function f is called unimodal if every bitstring x is either optimal, or has a Hamming-neighbour x' such that f(x') > f(x).
We say that a unimodal function is k-unimodal if it has k distinct function values f_1 < f_2 < ... < f_k. Remark that LeadingOnes is a particular case of a k-unimodal function with k = n+1, and OneMax is a special case of Linear with c_i = 1 for all i in [n]. Corresponding to the new results reported in Table 1, the following theorem improves the results previously reported in [8].

Theorem 12. Algorithm 1 with bitwise mutation rate χ/n for any constant χ > 0, and where p_sel is either fitness proportionate, linear ranking, k-tournament, or (µ,λ)-selection with the parameter settings satisfying column Parameter in the first part of Table 1, has the expected runtimes indicated in column E[T] of the second part, given population sizes respecting column Population size of the same part. The result for fitness proportionate selection only holds for the OneMax function.
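The logarithmic dependence of condition (C4) on m/s_* is what keeps the required population sizes in Table 1 at Θ(ln n) and Θ(ln(nk)). The sketch below evaluates the form of (C4) as reconstructed in Theorem 8 above; the constants should be treated as illustrative, and the point is only the growth behaviour.

```python
from math import log

def min_population_size(m, s_star, delta, gamma0=0.5):
    """Population size required by condition (C4), as reconstructed in the
    text: lambda >= (2/a) * ln(16*m/(a*c*eps*s_star)), with
    a = delta^2*gamma0/(2*(1+delta)), eps = min(delta/2, 1/2), c = eps^4/24."""
    eps = min(delta / 2, 0.5)
    c = eps**4 / 24
    a = delta**2 * gamma0 / (2 * (1 + delta))
    return (2 / a) * log(16 * m / (a * c * eps * s_star))

# m/s_* grows by a factor 10^6 here, yet the required lambda grows
# by less than a factor 2 -- the dependence is only logarithmic.
lam1 = min_population_size(10**3, 1e-3, delta=1.0)
lam2 = min_population_size(10**6, 1e-6, delta=1.0)
```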

Corollary 13. Algorithm 1 with either linear ranking, fitness proportionate, k-tournament, or (µ,λ)-selection can optimise OneMax in O(n ln n ln ln n). With either linear ranking, k-tournament, or (µ,λ)-selection, the algorithm can optimise LeadingOnes and Linear in O(n²), and k-unimodal functions with k ≤ n^d in O(nk), for any constant d in N.

Proof. The corollary is obtained from the theorem by taking the smallest admissible population size. It is clear that we have O(n ln n ln ln n) for OneMax, and O(n ln n ln ln n + n²) = O(n²) for Linear and LeadingOnes. For k-unimodal functions, because k ≤ n^d we have c ln(nk) ≤ c(1+d) ln n. It suffices to pick λ = c(1+d) ln n, and the runtime is bounded by O(k ln(nk) ln ln(nk) + nk) = O(k ln n ln ln n + nk) = O(nk).

Remark that, except for OneMax, the overhead of having a population of solutions vanishes in all those cases. The proof of the theorem is similar to the one in [8], except that the main tool is Corollary 9. We recall those arguments shortly as follows. The f-based partitions and upgrade probabilities are similar to the ones in [6, 7, 12]. For a Linear function f, without loss of generality we assume the weights are sorted, c_1 ≥ c_2 ≥ ... ≥ c_n > 0; then set m = n+1 and choose
A_j := { x in {0,1}^n | Σ_{i=1}^{j-1} c_i ≤ f(x) < Σ_{i=1}^{j} c_i } [6].
For OneMax, LeadingOnes, and k-unimodal functions with k distinct function values f_1 < ... < f_k, we set m = k and use the partition A_j := {x in {0,1}^n | f(x) = f_j} [12]. For Linear functions, and for k-unimodal functions (including LeadingOnes, where k = n+1), it is sufficient to flip one specific bit, and no other bits, to reach a higher fitness level. For these functions we therefore choose, for all j, the upgrade probabilities s_j ≥ (χ/n)(1-χ/n)^{n-1} = Ω(1/n) =: s_*. For OneMax, it is sufficient to flip one of the remaining 0-bits, and no other bits, to escape fitness level j; for this function we therefore choose the upgrade probabilities s_j := (n-j+1)(χ/n)(1-χ/n)^{n-1} = Ω((n-j+1)/n), and s_* := s_n. For Jump_r, we set m := n-r+3, and choose A_1 := {x in {0,1}^n | n-r < |x|_1 < n}, A_j := {x in {0,1}^n | |x|_1 = j-2} for 2 ≤ j ≤ m-1, and A_m := {x in {0,1}^n | |x|_1 = n}.
In order to escape fitness level A_1 or the highest non-optimal level A_{m-1}, it is sufficient to flip at most r specific bits and no other bits. For the other fitness levels j > 1, it suffices to flip one of the n-j+2 0-bits and no other bits. Hence we choose the upgrade probabilities s_1 = s_{m-1} := (χ/n)^r (1-χ/n)^{n-r} = Ω((χ/n)^r) =: s_*, and s_j := (n-j+2)(χ/n)(1-χ/n)^{n-1} = Ω((n-j+2)/n) otherwise. By the above partitions, the first condition (C1) is satisfied. Next, we set the parameter p_0 to be a lower bound on the probability (1-χ/n)^n of not flipping any bits, so that condition (C2) is satisfied. For any constant ε' > 0, it holds for all n ≥ χ²/ln(1+ε') that
(1-χ/n)^n ≥ e^{-χ-χ²/n} ≥ e^{-χ}/(1+ε').
We can thus set p_0 = 1/((1+ε')e^χ), which is a constant. According to the parameter settings of Table 1 and Lemma 11, there exist constants δ and γ_0 so that (C3) is satisfied. Regarding (C4): for Linear functions, including OneMax, m/s_* = O(n²); for k-unimodal functions, including LeadingOnes, m/s_* = O(nk); for Jump_r, m/s_* = O(n^{r+1}/χ^r). Condition (C4) is satisfied if the population size λ is set according to column Population size of Table 1, since (C4) depends on m/s_* only logarithmically. All conditions are satisfied, and the upper bounds in Table 1 follow.

We now explore the possibility of using Theorem 8 for runtime analysis on uncertain, or stochastic, optimisation problems. We assume that the algorithm needs to optimise a deterministic pseudo-Boolean function f : {0,1}^n -> R. However, the algorithm only has access to a stochastic approximation function F, such that F(x) is a random variable which represents the noisy and uncertain evaluation of f(x). The approximation function F must contain some information about f. Formally, we say that F is δ-biased with respect to f if for all x, y in {0,1}^n such that f(x) > f(y), it holds that Pr(F(x) > F(y)) ≥ 1/2 + δ.

Theorem 14. Let f : {0,1}^n -> R be any pseudo-Boolean function, and F(x) a random variable for any x in {0,1}^n. Consider Algorithm 1 with 2-tournament selection and bitwise mutation with mutation rate χ/n.
If there exist an f-based partition A_1, ..., A_m and two constants d > 0 and D ≥ 1 such that
(L1) F is (1/n^d)-biased with respect to f,
(L2) χ ≤ 1/(2(n^d + 1)),
(L3) (C1)-(C2) of Theorem 8 hold with Σ_{j=1}^{m-1} 1/s_j = O(n^D),
then there exists some λ in poly(n) such that the expected optimisation time when using F is in poly(n).

The first condition (L1) implies that the probability of selecting the fitter individual in a tournament needs only to be slightly larger than 1/2.

Proof. Condition (L3) implies that condition (C1) can be satisfied with 1/s_j in poly(n) and 1/s_* in poly(n). Therefore, it suffices to show that condition (C3) holds for some δ with 1/δ in poly(n), because this implies that there exists a λ in poly(n) satisfying condition (C4). We first compute β(γ), the probability of selecting an individual in the upper γ-portion of the population. This selection event occurs when: (i) both contestants of the tournament are among the best ⌈γλ⌉ individuals in the population, or (ii) exactly one of them is, and the better one of them is picked. Condition (L1) then gives
β(γ) ≥ γ² + 2γ(1-γ)(1/2 + 1/n^d) = γ(1 + 2(1-γ)/n^d).
Here we can fix γ_0 := 1/2, so that γ ≤ 1/2 and β(γ) ≥ γ(1 + 1/n^d). We now look at p_0, which can be chosen to be exactly (1-χ/n)^n, the probability of not mutating any bit position. From condition (L2), we have
(1-χ/n)^n ≥ 1 - χ ≥ (2n^d + 1)/(2(n^d + 1)) =: p_0,
where the first inequality is due to Theorem 17. Now, altogether,
β(γ)p_0 ≥ γ(1 + 1/n^d)·(2n^d + 1)/(2(n^d + 1)) = γ(1 + 1/(2n^d)).
So δ is at least 1/(2n^d), where d is a constant, i.e., 1/δ is upper bounded by a polynomial in n.

The theorem indicates that if an EA can be shown to efficiently optimise a pseudo-Boolean function f using a fitness-level theorem, then a non-elitist EA with appropriate parameter settings may optimise the same function in expected polynomial time, using only access to a δ-biased approximation function F with 1/δ in poly(n). Specific settings of noisy and uncertain approximation functions are analysed in detail in a companion paper [1].
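The key selection estimate in the proof above is elementary and can be checked directly. In the sketch below, `bias` plays the role of n^{-d} (illustrative values only):

```python
def beta_two_tournament(gamma, bias):
    """Probability of selecting an individual from the top gamma-fraction
    under 2-tournament when comparisons are only (1/2 + bias)-reliable:
    either both contestants are in the top fraction, or exactly one is
    and the noisy comparison favours it."""
    return gamma**2 + 2 * gamma * (1 - gamma) * (0.5 + bias)

# Inequality used in the proof: beta(gamma) >= gamma*(1 + bias) for gamma <= 1/2,
# since gamma^2 + 2*gamma*(1-gamma)*(1/2+bias) = gamma*(1 + 2*(1-gamma)*bias).
checks = [beta_two_tournament(g, b) >= g * (1 + b)
          for g in (0.1, 0.25, 0.5)
          for b in (1e-3, 1e-2, 0.1)]
```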

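Returning to the parameter settings used earlier in this section, the following sketch (the values of χ, r and ε are illustrative assumptions, not the paper's settings) numerically checks the no-flip probability bound (1 − χ/n)^n ≥ 1/((1+ε)e^χ) and evaluates the upgrade probabilities for bitwise mutation:

```python
# Numerical sanity check of the fitness-level quantities for bitwise
# mutation with rate chi/n (hypothetical values of chi, r, eps).
import math

def no_flip_prob(n: int, chi: float) -> float:
    """Probability that bitwise mutation with rate chi/n flips no bit."""
    return (1 - chi / n) ** n

def upgrade_prob_level1(n: int, chi: float, r: int) -> float:
    """s_1: flip a specific set of r bits and keep the other n-r bits."""
    return (chi / n) ** r * (1 - chi / n) ** (n - r)

def upgrade_prob(n: int, chi: float, j: int) -> float:
    """s_j for j > 1: flip one of the n-j 0-bits and no other bit."""
    return (n - j) * (chi / n) * (1 - chi / n) ** (n - 1)

chi, eps, r = 1.0, 0.1, 2
p0 = 1 / ((1 + eps) * math.e ** chi)   # constant lower bound for p_0
for n in [100, 1000, 10000]:
    # (1 - chi/n)^n >= 1/((1+eps) e^chi) once n is large enough
    assert no_flip_prob(n, chi) >= p0
```

The check illustrates why p_0 can be treated as a constant: for fixed χ, the no-flip probability converges to e^{−χ} from above the chosen bound.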
5. CONCLUSIONS We have presented a new version of the main theorem introduced in [8]. The new theorem offers a very general tool to analyse the expected runtime of randomised search heuristics with non-elitist populations on many optimisation problems. Sharing similarities with the classical fitness-level technique, our method is straightforward and easy to use. The structure of the new proof has been simplified, and a more detailed analysis of the population dynamics has been provided. This leads to significantly improved upper bounds on the expected runtimes of EAs on many pseudo-Boolean functions. More importantly, the new fitness-level technique makes the relationship between the selective pressure and the runtime of the algorithm explicit. Surprisingly, a very weak selective pressure is sufficient to optimise many functions in expected polynomial time. This observation has interesting consequences, e.g., for optimisation under uncertainty. A very first application of this kind is discussed in our companion paper [1].
Acknowledgments The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no 618091 (SAGE) and from the British Engineering and Physical Sciences Research Council (EPSRC) under grant no EP/F033214/1 (LANCS).
6. REFERENCES
[1] D.-C. Dang and P. K. Lehre. Evolution under partial information. To appear in the Proceedings of the 16th Annual Conference on Genetic and Evolutionary Computation (GECCO 2014), Vancouver, Canada, 2014.
[2] B. Doerr, D. Johannsen, and C. Winzen. Multiplicative drift analysis. Algorithmica, 64(4):673-697, 2012.
[3] D. Dubhashi and A. Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, New York, NY, USA, 2009.
[4] D. E. Goldberg and K. Deb. A comparative analysis of selection schemes used in genetic algorithms. In Foundations of Genetic Algorithms (FOGA I), pages 69-93, 1990.
[5] J. He and X. Yao. Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence, 127(1):57-85, 2001. Erratum in Artificial Intelligence, 140(1/2):245-248, 2002.
[6] T. Kötzing, F. Neumann, D. Sudholt, and M. Wagner. Simple max-min ant systems and the optimization of linear pseudo-boolean functions. In Foundations of Genetic Algorithms (FOGA XI), pages 209-218, 2011.
[7] J. Lässig and D. Sudholt. General scheme for analyzing running times of parallel evolutionary algorithms. In Proceedings of Parallel Problem Solving from Nature (PPSN XI), pages 234-243, 2010.
[8] P. K. Lehre. Fitness-levels for non-elitist populations. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO 2011), pages 2075-2082, New York, NY, USA, 2011. ACM.
[9] P. K. Lehre. Negative drift in populations. In Proceedings of Parallel Problem Solving from Nature (PPSN XI), volume 6238 of LNCS, pages 244-253. Springer Berlin/Heidelberg, 2010.
[10] P. K. Lehre and X. Yao. On the impact of mutation-selection balance on the runtime of evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 16(2):225-241, 2012.
[11] D. S. Mitrinović. Elementary Inequalities. P. Noordhoff Ltd, Groningen, 1964.
[12] D. Sudholt. A new method for lower bounds on the running time of evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 17(3):418-435, 2013.
[13] I. Wegener. Methods for the analysis of evolutionary algorithms on pseudo-boolean functions. In Evolutionary Optimization, volume 48, pages 349-369. Springer US, 2002.
APPENDIX
Theorem 15 (Chernoff's inequality [3]). If X = Σ_{i=1}^{m} X_i, where the X_i ∈ {0, 1}, i ∈ [m], are independent random variables, then Pr[X ≤ (1 − ε)E[X]] ≤ exp(−ε²E[X]/2) for any ε ∈ [0, 1].
Lemma 16. For all x ≥ 0, x ≥ ln(1 + x) ≥ x − x²/2.
Proof. The first inequality is due to e^x ≥ 1 + x. For the second, the function f(x) = ln(1 + x) − x + x²/2 is non-decreasing, since f'(x) = x²/(1 + x) ≥ 0. So f(x) ≥ f(0) = 0 for all x ≥ 0, i.e., ln(1 + x) ≥ x − x²/2.
Theorem 17 (Bernoulli's inequality [11]). If x > −1 and n ∈ N, then (1 + x)^n ≥ 1 + nx.
Lemma 18. For n ∈ N and x ∈ [0, 1], we have (1 − x)^n ≥ e^{−xn}(1 − x²n).
Proof. From e^x ≥ 1 + x, it follows that 1 − x = (1 − x²)/(1 + x) ≥ e^{−x}(1 − x²), hence (1 − x)^n ≥ e^{−nx}(1 − x²)^n = e^{−nx} Σ_{k=0}^{n} C(n, k)(−x²)^k. For x ∈ [0, 1], applying Theorem 17 with −x² > −1 gives (1 − x²)^n ≥ 1 − x²n, which corresponds to cutting off the sum at k = 1.
Lemma 19. For all x ∈ R, e^{2x} − e^x ≥ xe^x, with equality only at x = 0; and for x > 0, (e^{2x} − e^x)/x > e^x.
Proof. Multiplying both sides of e^x − x − 1 ≥ 0 by e^x > 0 and adding xe^x to both sides gives the first result; since the inequality is strict for x ≠ 0, dividing by x > 0 gives the second.
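The elementary inequalities collected in this appendix (the logarithm bounds, Bernoulli's inequality, the bound on (1 − x)^n, and the final exponential inequality) are easy to spot-check numerically. The following sketch verifies them on a small grid of sample points (the grid is an illustrative choice):

```python
# Numerical spot checks of the appendix inequalities on sample points.
import math

# x >= ln(1 + x) >= x - x^2/2 for x >= 0
for x in [0.0, 0.1, 0.5, 1.0, 2.0]:
    assert x >= math.log(1 + x) >= x - x * x / 2

for n in [1, 5, 50]:
    for x in [0.0, 0.05, 0.2]:
        # Bernoulli: (1 + x)^n >= 1 + n x for x > -1
        assert (1 + x) ** n >= 1 + n * x
        # (1 - x)^n >= e^{-xn} (1 - x^2 n) for x in [0, 1]
        assert (1 - x) ** n >= math.exp(-x * n) * (1 - x * x * n) - 1e-12

# e^{2x} - e^x >= x e^x for all real x (equality only at x = 0)
for x in [-1.0, -0.1, 0.0, 0.3, 2.0]:
    assert math.exp(2 * x) - math.exp(x) >= x * math.exp(x) - 1e-12
```

A small tolerance (1e-12) absorbs floating-point rounding at the equality cases.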

The Nottingham eprints service makes this work by researchers of the University of Nottingham available open access under the following conditions.

Dang, Duc-Cuong and Lehre, Per Kristian (2016) Runtime analysis of non-elitist populations: from classical optimisation to partial information. Algorithmica, 75(3), pp. 428-461. ISSN 1432-0541.


Supplementary Notes for W. Rudin: Principles of Mathematical Analysis Supplementary Notes for W. Rudin: Principles of Mathematical Analysis SIGURDUR HELGASON In 8.00B it is customary to cover Chapters 7 in Rudin s book. Experience shows that this requires careful planning

More information

Introduction to Black-Box Optimization in Continuous Search Spaces. Definitions, Examples, Difficulties

Introduction to Black-Box Optimization in Continuous Search Spaces. Definitions, Examples, Difficulties 1 Introduction to Black-Box Optimization in Continuous Search Spaces Definitions, Examples, Difficulties Tutorial: Evolution Strategies and CMA-ES (Covariance Matrix Adaptation) Anne Auger & Nikolaus Hansen

More information

Efficient Approximation for Restricted Biclique Cover Problems

Efficient Approximation for Restricted Biclique Cover Problems algorithms Article Efficient Approximation for Restricted Biclique Cover Problems Alessandro Epasto 1, *, and Eli Upfal 2 ID 1 Google Research, New York, NY 10011, USA 2 Department of Computer Science,

More information

Lecture 5: The Principle of Deferred Decisions. Chernoff Bounds

Lecture 5: The Principle of Deferred Decisions. Chernoff Bounds Randomized Algorithms Lecture 5: The Principle of Deferred Decisions. Chernoff Bounds Sotiris Nikoletseas Associate Professor CEID - ETY Course 2013-2014 Sotiris Nikoletseas, Associate Professor Randomized

More information

Combining Shared Coin Algorithms

Combining Shared Coin Algorithms Combining Shared Coin Algorithms James Aspnes Hagit Attiya Keren Censor Abstract This paper shows that shared coin algorithms can be combined to optimize several complexity measures, even in the presence

More information

1 + lim. n n+1. f(x) = x + 1, x 1. and we check that f is increasing, instead. Using the quotient rule, we easily find that. 1 (x + 1) 1 x (x + 1) 2 =

1 + lim. n n+1. f(x) = x + 1, x 1. and we check that f is increasing, instead. Using the quotient rule, we easily find that. 1 (x + 1) 1 x (x + 1) 2 = Chapter 5 Sequences and series 5. Sequences Definition 5. (Sequence). A sequence is a function which is defined on the set N of natural numbers. Since such a function is uniquely determined by its values

More information

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries Chapter 1 Measure Spaces 1.1 Algebras and σ algebras of sets 1.1.1 Notation and preliminaries We shall denote by X a nonempty set, by P(X) the set of all parts (i.e., subsets) of X, and by the empty set.

More information

Learning symmetric non-monotone submodular functions

Learning symmetric non-monotone submodular functions Learning symmetric non-monotone submodular functions Maria-Florina Balcan Georgia Institute of Technology ninamf@cc.gatech.edu Nicholas J. A. Harvey University of British Columbia nickhar@cs.ubc.ca Satoru

More information

Introduction. Genetic Algorithm Theory. Overview of tutorial. The Simple Genetic Algorithm. Jonathan E. Rowe

Introduction. Genetic Algorithm Theory. Overview of tutorial. The Simple Genetic Algorithm. Jonathan E. Rowe Introduction Genetic Algorithm Theory Jonathan E. Rowe University of Birmingham, UK GECCO 2012 The theory of genetic algorithms is beginning to come together into a coherent framework. However, there are

More information

MATHS 730 FC Lecture Notes March 5, Introduction

MATHS 730 FC Lecture Notes March 5, Introduction 1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists

More information

Notes on Discrete Probability

Notes on Discrete Probability Columbia University Handout 3 W4231: Analysis of Algorithms September 21, 1999 Professor Luca Trevisan Notes on Discrete Probability The following notes cover, mostly without proofs, the basic notions

More information

Generating Hard but Solvable SAT Formulas

Generating Hard but Solvable SAT Formulas Generating Hard but Solvable SAT Formulas T-79.7003 Research Course in Theoretical Computer Science André Schumacher October 18, 2007 1 Introduction The 3-SAT problem is one of the well-known NP-hard problems

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

Average Case Complexity: Levin s Theory

Average Case Complexity: Levin s Theory Chapter 15 Average Case Complexity: Levin s Theory 1 needs more work Our study of complexity - NP-completeness, #P-completeness etc. thus far only concerned worst-case complexity. However, algorithms designers

More information

INVARIANT SUBSETS OF THE SEARCH SPACE AND THE UNIVERSALITY OF A GENERALIZED GENETIC ALGORITHM

INVARIANT SUBSETS OF THE SEARCH SPACE AND THE UNIVERSALITY OF A GENERALIZED GENETIC ALGORITHM INVARIANT SUBSETS OF THE SEARCH SPACE AND THE UNIVERSALITY OF A GENERALIZED GENETIC ALGORITHM BORIS MITAVSKIY Abstract In this paper we shall give a mathematical description of a general evolutionary heuristic

More information

Genetic Algorithms: Basic Principles and Applications

Genetic Algorithms: Basic Principles and Applications Genetic Algorithms: Basic Principles and Applications C. A. MURTHY MACHINE INTELLIGENCE UNIT INDIAN STATISTICAL INSTITUTE 203, B.T.ROAD KOLKATA-700108 e-mail: murthy@isical.ac.in Genetic algorithms (GAs)

More information

TECHNISCHE UNIVERSITÄT DORTMUND REIHE COMPUTATIONAL INTELLIGENCE COLLABORATIVE RESEARCH CENTER 531

TECHNISCHE UNIVERSITÄT DORTMUND REIHE COMPUTATIONAL INTELLIGENCE COLLABORATIVE RESEARCH CENTER 531 TECHNISCHE UNIVERSITÄT DORTMUND REIHE COMPUTATIONAL INTELLIGENCE COLLABORATIVE RESEARCH CENTER 531 Design and Management of Complex Technical Processes and Systems by means of Computational Intelligence

More information

Evolutionary Computation

Evolutionary Computation Evolutionary Computation Lecture Algorithm Configura4on and Theore4cal Analysis Outline Algorithm Configuration Theoretical Analysis 2 Algorithm Configuration Question: If an EA toolbox is available (which

More information

Restarting a Genetic Algorithm for Set Cover Problem Using Schnabel Census

Restarting a Genetic Algorithm for Set Cover Problem Using Schnabel Census Restarting a Genetic Algorithm for Set Cover Problem Using Schnabel Census Anton V. Eremeev 1,2 1 Dostoevsky Omsk State University, Omsk, Russia 2 The Institute of Scientific Information for Social Sciences

More information