arXiv: v1 [cs.NE] 4 Sep 2017

Theoretical Analysis of Stochastic Search Algorithms

Per Kristian Lehre
School of Computer Science, University of Birmingham, Birmingham, UK

Pietro S. Oliveto
Department of Computer Science, University of Sheffield, Sheffield, UK

September 5, 2017

Abstract

Theoretical analyses of stochastic search algorithms, albeit few, have always existed since these algorithms became popular. Starting in the nineties a systematic approach to analyse the performance of stochastic search heuristics has been put in place. This quickly increasing basis of results allows, nowadays, the analysis of sophisticated algorithms such as population-based evolutionary algorithms, ant colony optimisation and artificial immune systems. Results are available concerning problems from various domains including classical combinatorial and continuous optimisation, single and multi-objective optimisation, and noisy and dynamic optimisation. This chapter introduces the mathematical techniques that are most commonly used in the runtime analysis of stochastic search heuristics. Careful attention is given to the very popular artificial fitness levels and drift analysis techniques, for which several variants are presented. To aid the reader's comprehension of the presented mathematical methods, these are applied to the analysis of simple evolutionary algorithms for artificial example functions. The chapter is concluded by providing references to more complex applications and further extensions of the techniques for the obtainment of advanced results.

1 Introduction

Stochastic search algorithms, also called randomised search heuristics, are general purpose optimisation algorithms that are often used when it is not possible to design a specific algorithm for the problem at hand. Common reasons are the lack of available resources (e.g., enough money and/or time) or an insufficient knowledge of the complex optimisation problem, which has not been studied extensively before. Other times, the only way of acquiring knowledge about the problem is by evaluating the quality of candidate solutions.

Well-known stochastic search algorithms are random local search and simulated annealing. Other more complicated approaches are inspired by processes observed in nature. Popular examples are evolutionary algorithms (EAs) inspired by the concept of natural evolution, ant colony optimisation (ACO) inspired by ant foraging behaviour and artificial immune systems (AIS) inspired by the immune system of vertebrates.

The main advantage of stochastic search heuristics is that, being general purpose algorithms, they can be applied to a wide range of applications while requiring hardly any knowledge of the problem at hand. Also, the simplicity with which they can be applied implies that practitioners can use them to find high quality solutions to a wide variety of problems without needing skills and knowledge of algorithm design. Indeed, numerous applications report high performance results which make them widely used in practice. However, through experimental work and applications it is difficult to understand the reasons for these successes. In particular, given a stochastic search algorithm, it is unclear on which kind of problems it will achieve good performance and on which it will perform poorly. Even more crucial is the lack of understanding of how the parameter settings influence the performance of the algorithms.

The goal of a rigorous theoretical foundation of stochastic search algorithms is to answer questions of this nature by explaining the success or the failure of these methods in practical applications. The benefits of a theoretical understanding are threefold: (a) guiding the choice of the best algorithm for the problem at hand, (b) determining the optimal parameter settings, and (c) aiding the algorithm design, ultimately leading to the achievement of better algorithms.

Theoretical studies of stochastic optimisation methods have always existed, albeit few, since these algorithms became popular. In particular, the increasing popularity gained by evolutionary and genetic algorithms in the seventies led to various attempts at building a theory for these algorithms. However, such initial studies attempted to provide insights on the behaviour of evolutionary algorithms rather than estimating their performance. The most popular of these theoretical frameworks was probably the schema theory introduced by Holland [13] and made popular by Goldberg [11].

In the early nineties a very different approach appeared to the analysis of evolutionary algorithms and consequently randomised search heuristics in general, driven by the insight that these heuristics are indeed randomised algorithms, albeit general-purpose ones, and as such they should be analysed in a similar spirit to that of classical randomised algorithms [25]. For the last 25 years this field has kept growing considerably and nowadays several advanced and powerful tools have been devised that allow the analysis of the performance of involved stochastic search algorithms for problems from various domains. These include problems from classical combinatorial and continuous optimisation, dynamic optimisation and noisy optimisation. The generality of the developed techniques has allowed their application to the analyses of several families of stochastic search algorithms including evolutionary algorithms, local search, Metropolis, simulated annealing, ant colony optimisation, artificial immune systems, particle swarm optimisation and estimation of distribution algorithms, amongst others.

The aim of this chapter is to introduce the reader to the most common and powerful tools used in the performance analysis of randomised search heuristics. Since the main focus is the understanding of the methods, these will be applied to the analysis of very simple evolutionary algorithms for artificial example functions. The hope is that the structure of the functions and the behaviour of the algorithms are easy to grasp so the attention of the reader may be mostly focused on the mathematical techniques that will be presented. At the end of the chapter, references to complex applications of the techniques for the obtainment of advanced results will be pointed out for further reading.

2 Computational Complexity of Stochastic Search Algorithms

From the perspective of computer science, stochastic search heuristics are randomised algorithms, although more general than problem-specific ones. Hence, it is natural to analyse their performance in the classical way as done in computer science. From this perspective an algorithm should be correct, i.e., for every instance of the problem (the input) the algorithm halts with the correct solution (i.e., the correct output), and it should be efficient in terms of its computational complexity, i.e., the algorithm uses the computational resources wisely. The resources usually considered are the number of basic computations to find the solution (i.e., time) and the amount of memory required (i.e., space).

Differently from problem-specific algorithms, the goal behind general-purpose algorithms such as stochastic search heuristics is to deliver good performance independently of the problem at hand. In other words, a general-purpose algorithm is correct if it visits the optimal solution of any problem in finite time. If the optimum is never lost afterwards, then a stochastic search algorithm is said to converge to the optimal solution. In a formal sense the latter condition for convergence is required because most search heuristics are not capable of recognising when an optimal solution has been found (i.e., they do not halt). However, it suffices to keep track of the best found solution during the run of the algorithm, hence the condition is trivial to satisfy.

What is particularly relevant, and can make a huge difference to the usefulness of a stochastic search heuristic for a given problem, is its time complexity. In each iteration the evaluation of the quality of a solution is generally far more expensive than the other algorithmic steps. As a result, it is very common to measure time as the number of evaluations of the fitness function (also called objective function) rather than counting the number of basic computations. Since randomised algorithms make random choices during their execution, the runtime of a stochastic search heuristic $A$ to optimise a function $f$ is a random variable $T_{A,f}$. The main measure of interest is:

1. The expected runtime $E[T_{A,f}]$: the expected number of fitness function evaluations until the optimum of $f$ is found.

For skewed runtime distributions, the expected runtime may be a deceiving measure of algorithm performance. The following measure therefore provides additional information.

2. The success probability in $t$ steps $\Pr(T_{A,f} \le t)$: the probability that the optimum is found within $t$ steps.

Just like in the classical theory of efficient algorithms, the time is analysed in relation to growing input length and usually described using asymptotic notation [3]. A search heuristic $A$ is said to be efficient for a function (class) $f$ if the runtime grows as a polynomial function of the instance size. On the other hand, if the runtime grows as an exponential function, then the heuristic is said to be inefficient. See Figure 1 for an illustrative distinction.

Figure 1: An efficient search heuristic (blue) versus an inefficient search heuristic (red) for a given instance.

3 Evolutionary Algorithms

A general framework of an evolutionary algorithm is the ($\mu$+$\lambda$) EA defined in Algorithm 1. The algorithm evolves a population of $\mu$ candidate solutions, generally called the parent population. At each generation an offspring population of $\lambda$ individuals is created by selecting individuals from the parent population uniformly at random and by applying a mutation operator to them. The generation is concluded by selecting the $\mu$ fittest individuals out of the $\mu + \lambda$ parents and offspring. Algorithm 1 presents a formal definition.

Algorithm 1 ($\mu$+$\lambda$) EA
1: Initialisation: Initialise $P_0 = \{x^{(1)}, \ldots, x^{(\mu)}\}$ with $\mu$ individuals chosen uniformly at random from $\{0,1\}^n$; $t \leftarrow 0$;
2: for $i = 1, \ldots, \lambda$ do
3:   Selection for Reproduction: Choose $x \in P_t$ uniformly at random;
4:   Variation: Create $y^{(i)}$ by flipping each bit in $x$ with probability $p_m$;
5: end for
6: Selection for Replacement: Create the new population $P_{t+1}$ by choosing the best $\mu$ individuals out of $\{x^{(1)}, \ldots, x^{(\mu)}, y^{(1)}, \ldots, y^{(\lambda)}\}$;
7: $t \leftarrow t + 1$; Continue at 2;

In order to apply the algorithm for the optimisation of a fitness function $f : \{0,1\}^n \to \mathbb{R}$, some parameters need to be set: the population size $\mu$, the offspring population size $\lambda$ and the mutation rate $p_m$. Generally $p_m = 1/n$ is considered a good setting for the mutation rate. Also, in practical applications a stopping criterion has to be defined since the algorithm does not halt. A fixed number of generations or a fixed number of fitness function evaluations are usually decided in advance. Since the objective of the analysis is to calculate the time required to reach the optimal (or an approximate) solution for the first time, no stopping condition is required, and one can assume that the algorithms are allowed to run forever.

The + symbol in the algorithm's name indicates that elitist truncation selection is applied. This means that the whole population consisting of both parents and offspring is sorted according to fitness and the best $\mu$ are retained for the next generation. Some criterion needs to be decided in case the best $\mu$ individuals are not uniquely defined. Ties between solutions of equal fitness may be broken uniformly at random. Often offspring are preferred over parents of equal fitness. In the latter case, if $\mu = \lambda = 1$ are set, then the standard (1+1) EA is obtained, a very simple and well studied evolutionary algorithm. On the other hand, if some stochastic selection mechanism was used instead of the elitist mechanism and a crossover operator was added as variation mechanism, then Algorithm 1 would become a genetic algorithm (GA) [11]. Given the importance of the (1+1) EA in this chapter, a formal definition is given in Algorithm 2.

Algorithm 2 (1+1) EA
1: Initialisation: Initialise $x^{(0)}$ uniformly at random from $\{0,1\}^n$; $t \leftarrow 0$;
2: Variation: Create $y$ by flipping each bit in $x^{(t)}$ with probability $p_m = 1/n$;
3: Selection for Replacement:
4: if $f(y) \ge f(x^{(t)})$ then
5:   $x^{(t+1)} \leftarrow y$
6: end if
7: $t \leftarrow t + 1$; Continue at 2;

The algorithm is initialised with a random bitstring. At each generation a new candidate solution is obtained by flipping each bit with probability $p_m = 1/n$. The number of bits that flip can be represented by a binomial random variable $X \sim \mathrm{Bin}(n, p)$ where $n$ is the number of bits (i.e., the number of trials) and $p = 1/n$ is the probability of a success (i.e., a bit actually flips), while $1 - p = 1 - 1/n$ is the probability of a failure (i.e., the bit does not flip). Then, the expected number of bits that flip in one generation is given by the expectation of the binomial random variable, $E[X] = n \cdot p = n \cdot 1/n = 1$.

The algorithm behaves in a very different way compared to the random local search (RLS) algorithm that flips exactly one bit per iteration. Although the (1+1) EA flips exactly one bit in expectation per iteration, many more bits may flip, or even none at all. In particular, the (1+1) EA is a global optimiser because there is a positive probability that any point in the search space is reached in each generation. As a consequence, the algorithm will find the global optimum in finite time. On the other hand, RLS is a local optimiser since it gets stuck once it reaches a local optimum because it only flips one bit per iteration.

The probability that a binomial random variable $X \sim \mathrm{Bin}(n, p)$ takes value $j$ (i.e., $j$ bits flip) is

$$\Pr(X = j) = \binom{n}{j} p^j (1 - p)^{n-j}.$$

Hence, the probability that the (1+1) EA flips exactly one bit is

$$\Pr(X = 1) = \binom{n}{1} \cdot \frac{1}{n} \left(1 - \frac{1}{n}\right)^{n-1} = \left(1 - \frac{1}{n}\right)^{n-1} \ge 1/e.$$

So the outcome of one generation of the (1+1) EA is similar to that of RLS in only approximately 1/3 of the generations. The probability that two bits flip is exactly half the probability that one flips:

$$\Pr(X = 2) = \binom{n}{2} \frac{1}{n^2} \left(1 - \frac{1}{n}\right)^{n-2} = \frac{n(n-1)}{2} \cdot \frac{1}{n^2} \left(1 - \frac{1}{n}\right)^{n-2} = \frac{1}{2} \left(1 - \frac{1}{n}\right)^{n-1} \approx 1/(2e).$$

On the other hand, the probability that no bits flip at all is:

$$\Pr(X = 0) = \binom{n}{0} (1/n)^0 (1 - 1/n)^{n} \approx 1/e.$$

The latter result implies that in more than 1/3 of the iterations no bits flip. This should be taken into account when evaluating the fitness of the offspring, especially for expensive fitness functions.
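The flip probabilities above can be checked numerically. The following is a minimal sketch, assuming standard bitwise mutation with rate $1/n$; the helper names `flip_probability` and `mutate` are illustrative and do not come from the chapter.

```python
import math
import random


def flip_probability(n: int, j: int) -> float:
    """Pr(X = j) for X ~ Bin(n, 1/n): exactly j of the n bits flip."""
    p = 1.0 / n
    return math.comb(n, j) * p**j * (1.0 - p) ** (n - j)


def mutate(x: list[int], rng: random.Random) -> list[int]:
    """Standard bitwise mutation: flip each bit independently with probability 1/n."""
    n = len(x)
    return [1 - bit if rng.random() < 1.0 / n else bit for bit in x]


if __name__ == "__main__":
    n = 100
    for j in range(4):
        print(f"Pr(X = {j}) = {flip_probability(n, j):.4f}")
    print(f"1/e = {1 / math.e:.4f}")
```

For any $n \ge 2$ the value for $j = 2$ is half of the value for $j = 1$, matching the identity derived above, and the values for $j = 0$ and $j = 1$ both approach $1/e$ as $n$ grows.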

In general, the probability that $i$ bits flip decreases exponentially with $i$:

$$\Pr(X = i) = \binom{n}{i} \frac{1}{n^i} \left(1 - \frac{1}{n}\right)^{n-i} \le \frac{1}{i!} \left(1 - \frac{1}{n}\right)^{n-i} \approx \frac{1}{i!\, e}.$$

In the worst case all the bits may need to flip to reach the optimum in one step. This event has probability $1/n^n$. Since this is always a lower bound on the probability of reaching the optimum in each generation, by a simple waiting time argument an upper bound of $O(n^n)$ may be derived for the expected runtime of the (1+1) EA on any pseudo-Boolean function $f : \{0,1\}^n \to \mathbb{R}$. It is simple to design an example trap function for which the algorithm actually requires $\Theta(n^n)$ expected steps to reach the optimum [10]. This simple result further motivates why it is fundamental to gain a foundational understanding of how the runtime of stochastic search heuristics depends on the parameters of the problem and on the parameters of the algorithms.

4 Test Functions

Test functions are artificially designed to analyse the performance of stochastic search algorithms when they face optimisation problems with particular characteristics. These functions are used to highlight characteristics of function classes which may make the optimisation process easy or hard for a given algorithm. For this reason they are often referred to as toy problems. The analysis on test functions of simple and well understood structure has allowed the development of several general techniques for the analysis. Afterwards, these techniques have made it possible to analyse the same algorithms for more complicated problems with practical applications, such as classical combinatorial optimisation problems. Furthermore, in recent years several standard techniques originally developed for simple algorithms have been extended to allow the analyses of more realistic algorithms. In this section the test functions that will be used as example functions throughout the chapter are introduced.

Figure 2: A linear unitation block of length $m$ starting at position $m + k$ and ending at position $k$. A linear unitation block of length $n$ is the Onemax function.

The most popular test function is

$$\text{Onemax}(x) := \sum_{i=1}^{n} x_i,$$

which simply counts the number of one-bits in the bitstring. The global optimum is the bitstring of only one-bits. Onemax is the easiest function with unique global optimum for the (1+1) EA [5].

A particularly difficult test function for stochastic search algorithms is the needle-in-a-haystack function

$$\text{Needle}(x) := \prod_{i=1}^{n} x_i.$$

It consists of a huge plateau of fitness value zero apart from only one optimal point of fitness value one, represented by the bitstring of only one-bits. This function is hard for search heuristics because all the search points apart from the optimum have the same fitness. As a consequence, the algorithms cannot gather any information about where the needle is by sampling search points.

Both Onemax and Needle (as defined above) have the property that the function values only depend on the number of ones in the bitstring. The class of functions with this property is called functions of unitation:

$$\text{Unitation}(x) := f\left(\sum_{i=1}^{n} x_i\right).$$
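The three definitions above translate directly into code. The sketch below is only an illustration, assuming bitstrings are represented as sequences of 0/1 integers; the function names are chosen here for readability and are not taken from the chapter.

```python
from typing import Callable, Sequence


def onemax(x: Sequence[int]) -> int:
    """Onemax(x): the sum of the bits, i.e. the number of one-bits."""
    return sum(x)


def needle(x: Sequence[int]) -> int:
    """Needle(x): the product of the bits, which is 1 only for the all-ones string."""
    return int(all(x))


def unitation(f: Callable[[int], float]) -> Callable[[Sequence[int]], float]:
    """Lift a function of the number of ones to a fitness function on bitstrings."""
    return lambda x: f(sum(x))
```

With this representation, Onemax is `unitation(lambda ones: ones)`, and Needle on bitstrings of length `n` is `unitation(lambda ones: 1 if ones == n else 0)`.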

Throughout this chapter, functions of unitation will be used as a general example class to demonstrate the use of the techniques that will be introduced. For simplicity of the analysis, the optimum is assumed to be the bitstring of only one-bits. For the analysis the function of unitation will be divided into three different kinds of sub-blocks: linear blocks, gap blocks and plateau blocks. Each block will be defined by its length parameter $m$ (i.e., the number of bits in the block) and by its position parameter $k$ (i.e., each block starts at bitstrings with $m + k$ zeroes and ends at bitstrings with $k$ zeroes). Given a unitation function, it is divided into sub-blocks proceeding from left to right, from the all-zeroes bitstring towards the all-ones bitstring. In the definitions below, $|x|$ denotes the number of one-bits in $x$, so that $n - |x|$ is the number of zeroes.

If the fitness increases with the number of ones, then a linear block is created. The linear block ends when the function value stops increasing with the number of ones.

$$\text{Linear}(|x|) = \begin{cases} a|x| + b & \text{if } k < n - |x| \le k + m \\ 0 & \text{otherwise.} \end{cases}$$

See Figure 2 for an illustration.

If the fitness function decreases with the number of ones, then a gap block is created. The gap block ends when the fitness value reaches for the first time a higher value than the value at the beginning of the block.

$$\text{Gap}(|x|) = \begin{cases} a & \text{if } n - |x| = k + m \\ 0 & \text{otherwise.} \end{cases}$$

See Figure 3 for an illustration.

Figure 3: A gap unitation block of length $m$ starting at position $m + k$ and ending at position $k$. A gap unitation block of length 1 is the Needle function.

If the fitness remains the same as the number of ones in the bitstrings increases, then a plateau block is created. The block ends at the first point where the fitness value changes.

$$\text{Plateau}(|x|) = \begin{cases} a & \text{if } k < n - |x| \le k + m \\ 0 & \text{otherwise.} \end{cases}$$

See Figure 4 for an illustration. By proceeding from left to right the whole search space is subdivided into blocks. See Figure 5 for an illustration.

Let the unitation function be subdivided into $r$ sub-functions $f_1, f_2, \ldots, f_r$, and let $T_i$ be the runtime for an elitist search heuristic to optimise each sub-function $f_i$.
Then by linearity of expectation, an upper bound on the expected runtime of an elitist stochastic search heuristic for the unitation function is:

$$E[T] \le E\left[\sum_{i=1}^{r} T_i\right] = \sum_{i=1}^{r} E[T_i].$$

Hence, an upper bound on the total runtime for the unitation function may be achieved by calculating upper bounds on the runtime for each block separately. Once these are obtained, summing all the bounds yields an upper bound on the total runtime. Care is needed when calculating upper bounds on the runtime to overcome a plateau block when this is followed by a gap block, because points straight after the end of the plateau will have lower fitness values and hence will not be accepted. In these special cases, the upper bound for the Plateau block needs to be multiplied by the upper bound for the Gap block to achieve a correct upper bound on the runtime to overcome both blocks.

Figure 4: A plateau unitation block of length $m$ starting at position $m + k$ and ending at position $k$.

Figure 5: An illustration of how the search space of a unitation function may be subdivided into blocks of the three kinds.

In the remainder of the chapter upper and lower bounds for each type of block will be derived as example applications of the presented runtime analysis techniques. The reader will then be able to calculate the runtime of the (1+1) EA and other evolutionary algorithms for any such unitation function.

By simply using waiting time arguments it is possible to derive upper and lower bounds on the runtime of the (1+1) EA for the Gap block. Assuming that the algorithm is at the beginning of the gap block, then to reach the end it is sufficient to flip $m$ zero-bits into one-bits and leave the other bits unchanged. On the other hand, it is a necessary condition to flip at least $m$ zero-bits because all search points achieved by flipping fewer than $m$ zero-bits have a fitness value of zero and would not be accepted by selection. Given that there are $m + k$ zero-bits available at the beginning of the block, the following upper and lower bounds on the probability $p$ of reaching the end of the block follow:

$$\frac{1}{e} \left(\frac{m + k}{nm}\right)^{m} \le \frac{1}{e} \binom{m + k}{m} \left(\frac{1}{n}\right)^{m} \le p \le \binom{m + k}{m} \left(\frac{1}{n}\right)^{m} \le \left(\frac{(m + k)e}{nm}\right)^{m}.$$

Here the outer inequalities are achieved by using $(n/k)^k \le \binom{n}{k} \le (en/k)^k$ for $k \ge 1$. Then by simple waiting time arguments, the expected time for the (1+1) EA to optimise a Gap block of length $m$ and position $k$ is upper and lower bounded by

$$\left(\frac{nm}{(m + k)e}\right)^{m} \le \binom{m + k}{m}^{-1} n^{m} \le E[T] \le e \binom{m + k}{m}^{-1} n^{m} \le e \left(\frac{nm}{m + k}\right)^{m}.$$

5 Tail Inequalities

The runtime of a stochastic search algorithm $A$ for a function (class) $f$ is a random variable $T_{A,f}$ and the main goal of a runtime analysis is to calculate its expectation $E[T_{A,f}]$. Sometimes the expected runtime may be particularly large, but there may also be a high probability that the actual optimisation time is significantly lower. In these cases a result about the success probability within $t$ steps considerably helps the understanding of the algorithm's performance. On other occasions it may be interesting to simply gain knowledge about the probability that the actual optimisation time deviates from the expected runtime. In such circumstances tail inequalities turn out to be very useful tools, since they yield bounds on the runtime that hold with high probability. An example of the expectation of a random variable and its probability distribution is given in Figure 6.

Figure 6: The expectation of a random variable and its probability distribution. The tails are highlighted in red.

Given the expectation of a random variable, which often may be estimated easily, tail inequalities give bounds on the probability that the actual random variable deviates from its expectation [25, 24]. The simplest tail inequality is Markov's inequality. Many strong tail inequalities are derived from Markov's inequality.

Theorem 1 (Markov's Inequality). Let $X$ be a random variable assuming only non-negative values. Then for all $t \in \mathbb{R}^+$,

$$\Pr(X \ge t) \le \frac{E[X]}{t}.$$

The power of the inequality is that no knowledge about the random variable is required apart from it being non-negative. Let $X$ be a random variable indicating the number of bits flipped in one iteration of the (1+1) EA. As seen in the previous section, one bit is flipped per iteration in expectation, i.e., $E[X] = 1$. One may wonder what the probability is that more than one bit is flipped in one time step. A straightforward application of Markov's inequality reveals that in at least half of the iterations either one bit is flipped or none:

$$\Pr(X \ge 2) \le \frac{E[X]}{2} = \frac{1}{2}.$$

Similarly, one may want to gain some information on how many ones are contained in the bitstring at initialisation, given that in expectation there are $E[X] = n/2$ (here $X$ is a binomial random variable with parameters $n$ and $p = 1/2$). An application of Markov's inequality yields that the probability of having more than $(2/3)n$ ones at initialisation is bounded by

$$\Pr(X \ge (2/3)n) \le \frac{E[X]}{(2/3)n} = \frac{n/2}{(2/3)n} = 3/4. \qquad (1)$$

Since $X$ is binomially distributed, it is reasonable to expect that, for large enough $n$, the actual number of obtained ones at initialisation would be more concentrated around the expected value. In particular, while the bound is obviously correct, the probability that the initial bitstring has more than $(2/3)n$ ones is much smaller than 3/4. However, to achieve such a result more information about the random variable should be required by the tail inequality (i.e., that it is binomially distributed). An important class of tail inequalities used in the analysis of stochastic search heuristics are Chernoff bounds.

Theorem 2 (Chernoff Bounds). Let $X_1, X_2, \ldots, X_n$ be independent random variables taking values in $\{0, 1\}$. Define $X = \sum_{i=1}^{n} X_i$, which has expectation $E[X] = \sum_{i=1}^{n} \Pr(X_i = 1)$.

(a) $\Pr(X \le (1 - \delta)E[X]) \le e^{-E[X]\delta^2/2}$ for $0 \le \delta \le 1$.

(b) $\Pr(X > (1 + \delta)E[X]) \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{E[X]}$ for $\delta > 0$.

An application of Chernoff bounds reveals that the probability that the initial bitstring has more than $(2/3)n$ one-bits is exponentially small in the length of the bitstring. Let $X = \sum_{i=1}^{n} X_i$ be the random variable summing up the random values $X_i$ of each of the $n$ bits. Since each bit is initialised to one with probability 1/2, it holds that $\Pr(X_i = 1) = 1/2$ and $E[X] = n/2$. By fixing $\delta = 1/3$ it follows that $(1 + \delta)E[X] = (2/3)n$ and finally, by applying inequality (b),

$$\Pr(X > (2/3)n) \le \left(\frac{e^{1/3}}{(4/3)^{4/3}}\right)^{n/2} < \left(\frac{29}{30}\right)^{n/2}.$$

In fact, an exponentially small probability of deviating from $n/2$ by a constant fraction of the bitstring length, i.e., by $cn$ for any constant $c > 0$, may easily be obtained by Chernoff bounds.

6 Artificial Fitness Levels (AFL)

The artificial fitness levels technique is a very simple method to achieve upper bounds on the runtime of elitist stochastic optimisation algorithms. Despite its simplicity, it often achieves very good bounds on the runtime. The idea behind the method is to divide the search space of size $2^n$ into $m$ disjoint fitness-based partitions $A_1, \ldots, A_m$ of increasing fitness such that $f(A_i) < f(A_j)$ for all $i < j$. The union of these partitions should cover the whole search space and the level of highest fitness $A_m$ should contain the global optimum (or all global optima if there is more than one).

Figure 7: A partition of the search space satisfying the conditions of an $f$-based partition.

Definition 3. A tuple $(A_1, A_2, \ldots, A_m)$ is an $f$-based partition of $f : \mathcal{X} \to \mathbb{R}$ if
1. $A_1 \cup A_2 \cup \cdots \cup A_m = \mathcal{X}$
2. $A_i \cap A_j = \emptyset$ for $i \ne j$
3. $f(A_1) < f(A_2) < \cdots < f(A_m)$
4. $f(A_m) = \max_x f(x)$

For functions of unitation, a natural way of defining a fitness-based partition is to divide the search space into $n + 1$ levels, each defined by the number of ones in the bitstring. For the Onemax function, where fitness increases with the number of ones in the bitstring, the fitness levels would be naturally defined as $A_i := \{x \in \{0,1\}^n \mid \text{Onemax}(x) = i\}$.

6.1 AFL - Upper Bounds

Given a fitness-based partition of the search space, it is obvious that an elitist algorithm using only one individual will only accept points of the search space that belong to levels of higher or equal fitness to the current level. Once a new fitness level has been reached, the algorithm will never return to previous levels.
This implies that each fitness level has to be left at most once by the algorithm. Since in the worst case all fitness levels are visited, the sum of the expected times to leave all levels is an upper bound on the expected time to reach the global optimum. The artificial fitness levels method simplifies this idea by only requiring a lower bound $s_i$ on the probability of leaving each level $A_i$ rather than asking for the exact probabilities to leave each level.
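The idea of summing expected waiting times can be sketched in a few lines. The code below is an illustration under stated assumptions rather than part of the chapter: `afl_upper_bound` simply sums $1/s_i$ over the non-optimal levels, and `onemax_leaving_bounds` assumes bounds of the form $s_i = (n - i)/(en)$, which hold for the (1+1) EA on Onemax because flipping one of the $n - i$ zero-bits and nothing else suffices to leave level $A_i$. Both function names are hypothetical.

```python
import math
from typing import Sequence


def afl_upper_bound(s: Sequence[float]) -> float:
    """Sum of expected waiting times 1/s_i over all non-optimal fitness levels."""
    if any(not 0.0 < s_i <= 1.0 for s_i in s):
        raise ValueError("each s_i must be a probability in (0, 1]")
    return sum(1.0 / s_i for s_i in s)


def onemax_leaving_bounds(n: int) -> list[float]:
    """Assumed lower bounds s_i = (n - i) / (e * n) for levels i = 0, ..., n - 1."""
    return [(n - i) / (math.e * n) for i in range(n)]


if __name__ == "__main__":
    n = 100
    bound = afl_upper_bound(onemax_leaving_bounds(n))
    print(f"AFL upper bound for n = {n}: {bound:.1f}")
    print(f"e * n * (ln n + 1)        : {math.e * n * (math.log(n) + 1):.1f}")
```

For these particular $s_i$ the sum equals $e n H_n$, where $H_n$ is the $n$-th harmonic number, so it grows as $O(n \ln n)$.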

Theorem 4 (Artificial Fitness Levels). Let $f : \mathcal{X} \to \mathbb{R}$ be a fitness function, $A_1, \ldots, A_m$ a fitness-based partition of $f$ and $s_1, \ldots, s_{m-1}$ be lower bounds on the corresponding probabilities of leaving the respective fitness levels for a level of better fitness. Then the expected runtime of an elitist algorithm using a single individual is

$$E[T_{A,f}] \le \sum_{i=1}^{m-1} 1/s_i.$$

The artificial fitness levels method will now be applied to derive an upper bound on the expected runtime of the (1+1) EA for the Onemax function. Afterwards, the bound will be generalised to general linear blocks of unitation.

Theorem 5. The expected runtime of the (1+1) EA on Onemax is $O(n \ln n)$.

Proof. The artificial fitness levels method will be applied to the $n + 1$ partitions defined by the number of ones in the bitstring, i.e., $A_i := \{x \in \{0,1\}^n \mid \text{Onemax}(x) = i\}$. This means that all bitstrings with $i$ ones and $n - i$ zeroes belong to fitness level $A_i$. For each level $A_i$, the method requires a lower bound on the probability of reaching any level $A_j$ where $j > i$. To reach a level of higher fitness it is necessary to increase the number of ones in the bitstring. However, it is sufficient to flip a zero into a one and leave the remaining bits unchanged. Since the probability of flipping a bit is $1/n$ and there are $n - i$ zeroes that may be flipped, a lower bound on the probability to reach a level of higher fitness from level $A_i$ is:

$$s_i \ge (n - i) \cdot \frac{1}{n} \left(1 - \frac{1}{n}\right)^{n-1} \ge \frac{n - i}{en},$$

where $(1 - 1/n)^{n-1}$ is the probability of leaving $n - 1$ bits unchanged and the inequality follows because $(1 - 1/n)^{n-1} \ge 1/e$ for all $n \in \mathbb{N}$. Then by the artificial fitness levels method (Theorem 4),

$$E\left[T_{(1+1)\,\text{EA},\,\text{Onemax}}\right] \le \sum_{i=0}^{n-1} 1/s_i \le \sum_{i=0}^{n-1} \frac{en}{n - i} = en \sum_{i=1}^{n} \frac{1}{i} = O(n \ln n).$$

Theorem 6. The expected runtime of the (1+1) EA for a linear block of length $m$ ending at position $k$ is $O(n \ln((m + k)/k))$.

Proof. Apply the artificial fitness levels method where each partition $A_i$ consists of the bitstrings in the block with $i$ zeroes. Then the probability of leaving a fitness level is bounded by $s_i \ge (i/n)(1 - 1/n)^{n-1} \ge i/(en)$. Given that at most $m$ fitness levels need to be left and that the block starts at position $m + k$ and ends at position $k$, by Theorem 4 the expected runtime is:

$$E[T] \le \sum_{i=k+1}^{k+m} \frac{en}{i} = en \sum_{i=k+1}^{k+m} \frac{1}{i} = en \left(\sum_{i=1}^{k+m} \frac{1}{i} - \sum_{i=1}^{k} \frac{1}{i}\right) \le en \ln\left(\frac{m + k}{k}\right).$$

6.2 AFL - Lower Bounds

Recently Sudholt introduced an artificial fitness levels method to obtain lower bounds on the runtime of stochastic search algorithms [35]. Since lower bounds are aimed for, apart from the probabilities of leaving each fitness level, the method also needs to take into account the probability that some levels may be skipped by the algorithm.

Theorem 7. Consider a fitness function $f : \mathcal{X} \to \mathbb{R}$ and $A_1, \ldots, A_m$ a fitness-based partition of $f$. Let $u_i$ be the probability of starting in level $A_i$, $s_i$ be an upper bound on the probability of leaving $A_i$ and $p_{i,j}$ be an upper bound on the probability of jumping from level $A_i$ to level $A_j$. If there exists some $0 < \chi \le 1$ such that for all $j > i$

$$p_{i,j} \ge \chi \sum_{k=j}^{m-1} p_{i,k},$$

then the expected runtime of an elitist algorithm using a single individual is

$$E[T_{A,f}] \ge \chi \sum_{i=1}^{m-1} u_i \sum_{j=i}^{m-1} \frac{1}{s_j}.$$

The method will first be illustrated for the (1+1) EA on the Onemax function. Afterwards, the result will be generalised to general linear blocks of unitation.

Theorem 8. The expected runtime of the (1+1) EA on Onemax is $\Omega(n \ln n)$.

Proof. Apply the artificial fitness levels method on the $n + 1$ partitions defined by the number of ones in the bitstring, i.e., $A_i := \{x \in \{0,1\}^n \mid \text{Onemax}(x) = i\}$. This means that all bitstrings with $i$ ones and $n - i$ zeroes belong to fitness level $A_i$. To apply the artificial fitness levels method, bounds on $s_i$ and $\chi$ need to be derived. An upper bound on the probability of leaving fitness level $A_i$ is simply $s_i \le (n - i)/n$ because it is a necessary condition that at least one zero flips to reach a better fitness level. The bound follows because each bit flips with probability $1/n$ and there are $n - i$ zeroes available to be flipped.

In order to find a suitable $\chi$, the method requires a lower bound on $p_{i,j}$ and an upper bound on $\sum_{k=j}^{m-1} p_{i,k}$. For the lower bound on $p_{i,j}$, notice that in order to reach level $A_j$ it is sufficient to flip $j - i$ zeroes out of the $n - i$ zeroes available and leave all the other bits unchanged. Hence the following bound is obtained:

$$p_{i,j} \ge \binom{n - i}{j - i} \left(\frac{1}{n}\right)^{j-i} \left(1 - \frac{1}{n}\right)^{n-(j-i)}.$$

For an upper bound on the sum, notice that to reach any level $A_k$, $k \ge j$, from level $A_i$ it is necessary to flip at least $j - i$ zeroes out of the $n - i$ available zeroes. So,

$$\sum_{k=j}^{m-1} p_{i,k} \le \binom{n - i}{j - i} \left(\frac{1}{n}\right)^{j-i},$$

and for $\chi := 1/e$ the condition of Theorem 7 is satisfied as follows:

$$p_{i,j} \ge \left(1 - \frac{1}{n}\right)^{n-(j-i)} \sum_{k=j}^{m-1} p_{i,k} \ge \left(1 - \frac{1}{n}\right)^{n-1} \sum_{k=j}^{m-1} p_{i,k} \ge \chi \sum_{k=j}^{m-1} p_{i,k}.$$

By Eq. (1), the probability that the initial search point has fewer than $(2/3)n$ one-bits is at least $1 - 3/4 = 1/4$. The statement of Theorem 7 now yields

$$E[T_{A,f}] > \frac{1}{e} \sum_{i=0}^{(2/3)n} u_i \sum_{j=i}^{n-1} \frac{1}{s_j} \ge \frac{1}{e} \sum_{i=0}^{(2/3)n} u_i \sum_{j=(2/3)n}^{n-1} \frac{1}{s_j} \ge \frac{1}{4e} \sum_{j=(2/3)n}^{n-1} \frac{n}{n - j} = \frac{n}{4e} \sum_{j=1}^{n/3} \frac{1}{j}.$$

It now follows that $E[T_{A,f}] = \Omega(n \log n)$.

Similarly, the following result may also be proved for linear blocks of unitation functions by defining the fitness partitions as $A_i := \{x : n - |x| = k + m - i\}$ for $0 \le i \le m$.

Theorem 9. The expected runtime of the (1+1) EA for a linear block of length $m$ ending at position $k$ is $\Omega(n \ln((m + k)/k))$.
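Theorems 5 and 8 together state that the expected runtime of the (1+1) EA on Onemax is of order $n \ln n$. A small simulation can make this tangible. The following is a minimal sketch, not part of the chapter: it assumes the (1+1) EA of Algorithm 2 with mutation rate $1/n$, counts generations (one offspring evaluation each, so the initial evaluation is not included), and uses the hypothetical helper names `one_plus_one_ea_onemax` and `average_runtime`.

```python
import math
import random


def one_plus_one_ea_onemax(n: int, rng: random.Random) -> int:
    """Run the (1+1) EA on Onemax and return the number of generations
    until the all-ones bitstring is found for the first time."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = sum(x)
    generations = 0
    while fx < n:
        # Bitwise mutation: flip each bit independently with probability 1/n.
        y = [1 - bit if rng.random() < 1.0 / n else bit for bit in x]
        fy = sum(y)
        if fy >= fx:  # accept offspring of equal or better fitness
            x, fx = y, fy
        generations += 1
    return generations


def average_runtime(n: int, runs: int, seed: int = 0) -> float:
    """Average number of generations over several independent runs."""
    rng = random.Random(seed)
    return sum(one_plus_one_ea_onemax(n, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    for n in (20, 40, 80):
        avg = average_runtime(n, runs=50)
        print(f"n = {n:3d}  average = {avg:8.1f}  n ln n = {n * math.log(n):8.1f}")
```

The averages are empirical estimates only; they illustrate the growth rate but do not replace the bounds proved above.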

12 6.3 Level-based aalysis of o-elitist populatios A weakess with the classical artificial fitess level techique is that it is limited to search heuristics that oly keep oe solutio, such as the 1+1) EA, ad it heavily relies o the selectio mechaism to use elitism. [4] recetly itroduced the so-called level-based aalysis, a geeralisatio of fitess level theorems for o-elitist evolutioary algorithms which is also applicable to search heuristics with populatios, ad usig higher arity operators such as crossover. Their theorem applies to ay algorithm that ca be expressed i the form of Algorithm 3, such as geetic algorithms [4] ad estimatio of distributio algorithms UMDA [7]. The mai compoet of the algorithm is a radom operator D which give the curret populatio P t X λ returs a probability distributio DP t) over the search space X. The ext populatio P t+1 is obtaied by samplig idividuals idepedetly from this distributio. Algorithm 3 Populatio-based algorithm with idepedet samplig 1: Iitialisatio: t 0; Iitialise P t uiformly at radom from X λ. 2: Variatio ad Selectio: 3: for i = 1... λ do Sample P t+1 i) DP t ) 4: ed for 5: t t + 1; Cotiue at 2 I cotrast to classical fitess-level theorems, the level-based theorem Theorem 10) oly assumes a partitio A 1,..., A m+1) of the search space X, ad ot a f-based partitio see Defiitio 3). Each of the sets A j, j [m+1] is called a level, ad the symbol A + j := m+1 i=j+1 Ai deotes the set of search poits above level A j. Give a costat γ 0 0, 1), a populatio P X λ is cosidered to be at level A j with respect to γ 0 if P A + j 1 γ0λ ad P A+ j < γ0λ meaig that at least a γ 0 fractio of the populatio is i level A j or higher. Theorem 10 [4]). Give ay partitio of a fiite set X ito m o-overlappig subsets A 1,..., A m+1), defie T := mi{tλ P t A m+1 > 0} to be the first poit i time that elemets of A m+1 appear i P t of Algorithm 3. 
If there exist parameters $z_1, \dots, z_m, z_* \in (0,1]$, $\delta > 0$, and a constant $\gamma_0 \in (0,1)$ such that for all $j \in [m]$, $P \in X^\lambda$, $y \sim D(P)$ and $\gamma \in (0, \gamma_0]$ it holds

(C1) $\Pr\left(y \in A_j^+ \mid |P \cap A_{j-1}^+| \ge \gamma_0\lambda\right) \ge z_j \ge z_*$,
(C2) $\Pr\left(y \in A_j^+ \mid |P \cap A_{j-1}^+| \ge \gamma_0\lambda \text{ and } |P \cap A_j^+| \ge \gamma\lambda\right) \ge (1+\delta)\gamma$, and
(C3) $\lambda \ge \frac{2}{a} \ln\left(\frac{16m}{a c \varepsilon z_*}\right)$ with $a = \frac{\delta^2 \gamma_0}{2(1+\delta)}$, $\varepsilon = \min\{\delta/2, 1/2\}$ and $c = \varepsilon^4/24$,

then

\[ E[T] \le \frac{2}{c\varepsilon} \left( m\lambda\left(1 + \ln(1 + c\lambda)\right) + \sum_{j=1}^{m} \frac{1}{z_j} \right). \]

The theorem provides an upper bound on the expected optimisation time of Algorithm 3 if it is possible to find a partition $(A_1, \dots, A_{m+1})$ of the search space $X$ and accompanying parameters $\gamma_0, \delta, z_1, \dots, z_m, z_*$ such that conditions (C1), (C2), and (C3) are satisfied. Condition (C1) requires a non-zero probability $z_j$ of creating an individual in level $A_{j+1}$ or higher if there are already at least $\gamma_0\lambda$ individuals in level $A_j$ or higher. In typical applications, this imposes some conditions on the variation operator. The condition is analogous to the probability $s_j$ in the artificial fitness level technique. Condition (C2) requires that if in addition there are $\gamma\lambda$ individuals at level $A_{j+1}$ or better, then the probability of producing an individual in level $A_{j+1}$ or better should be larger than $\gamma$ by a multiplicative factor $1+\delta$. In typical applications, this imposes some conditions on the strength of the selective pressure in the algorithm. Finally, condition (C3) imposes minimal requirements on the population size in terms of the parameters above.

As an example application of the level-based theorem, the $(\mu, \lambda)$ EA is analysed, which is the non-elitist variant of the $(\mu + \lambda)$ EA shown in Algorithm 1. The two algorithms differ in the selection step (line 6), where the new population $P_{t+1}$ in the $(\mu, \lambda)$ EA is chosen as the best $\mu$ individuals out of $\{y_1, \dots, y_\lambda\}$, breaking ties uniformly at random. While the $(\mu + \lambda)$ EA

13 always retais the best µ idividuals i the populatio hece the ame elitist), the µ, λ) EA always discards the old idividuals x 1),..., x µ). At first sight, it may appear as if the µ, λ) EA caot be expressed i the form of Algorithm 3. The µ idividuals x 1),..., x µ) that are kept i each geeratio are ot idepedet due to the iheret sortig of the offsprig. However, takig a differet perspective, the populatio of the algorithm at time t could also be iterpreted as the λ offsprig y 1),..., y λ). I this alterative iterpretatio, the ew populatio is ow created by samplig uiformly at radom amog the µ best idividuals i the populatio, ad applyig the mutatio operator. The operator D i Algorithm 3 ca ow be defied as i Algorithm 4. Algorithm 4 Operator D correspodig to µ, λ) EA 1: Selectio: Sort the populatio P t = y 1),..., y λ) ) such that fy 1) ) fy 2) )... fy λ) ). Select x uiformly at radom amog {y 1),..., y µ) }. 2: Variatio mutatio): Create x by flippig each bit i x with probability χ/. 3: retur x The followig lemma will be useful whe estimatig the probability that the mutatio operator does ot flip ay bit positios. Lemma 11. For ay δ 0, 1) ad χ > 0, if χ + δ)χ/δ) the 1 χ ) 1 δ)e χ. Proof. Note first that l1 δ) < δ, hece ) χ 1 χ l1 δ)) + δ χ χ + δ). By makig use of the fact that 1 1/x) x 1 1/e ad simplifyig the expoet as above 1 χ ) [ 1 χ ) ] /χ) 1 χ l1 δ) 1 δ)e χ. The expected optimisatio time of the µ, λ) EA o Oemax ca ow be expressed i terms of the mutatio rate χ/ ad the problem size assumig some costraits o the populatio sizes µ ad λ. The theorem is valid for a wide rage of mutatio rates χ/. I the classical settig of χ = 1, the expected optimisatio time reduces to Oλ l λ). Theorem 12. The expected optimisatio time of the µ, λ) EA with bitwise mutatio rate χ/ where χ 0, /2), ad populatio sizes µ ad λ satisfyig for ay costat δ 0, 1) ) λ 1 + δ µ e χ, ad λ 4 ) ) 1 δ δ 2 e l χ δ 7 χ o Oemax is for ay χ + δ)/χ/δ) o more tha ) 1536 λ lλ) + eχ l + 2) + Oλ). δ 5 χ1 δ) Proof. 
Apply the level-based theorem with the same $m := n + 1$ partitions as in the proof of Theorem 8. Since the parameter $\delta$ is assumed to be some constant $\delta \in (0,1)$, it also holds that the parameters $a$, $\varepsilon$, and $c$ are positive constants. The parameters $\gamma_0, z_1, \dots, z_m$, and $z_*$ will be chosen later. To verify that conditions (C1) and (C2) hold for any $j \in [m]$, it is necessary to estimate the probability that operator $D$ produces a search point $x'$ with $j+1$ one-bits when applied to a population $P$ containing at least $\gamma_0\lambda$ individuals, each having at least $j$ one-bits (formally $|P \cap A_{j-1}^+| \ge \gamma_0\lambda$). Such an event is called a successful sample. Condition (C1) asks for bounds $z_j$ for each $j \in [m]$ on the probability that the search point $x'$ returned by Algorithm 4 contains $j+1$ one-bits. First choose the parameter setting $\gamma_0 := \mu/\lambda$. This parameter setting is convenient, because the selection step in Algorithm 4

14 always picks a idividual x amog the best µ = γ 0λ idividuals i the populatio. By the assumptio that P A + j 1 γ0λ, the algorithm will always select a idividual x cotaiig at least j + i oe-bits for some o-egative iteger i 0. Assume without loss of geerality, that the first j bit-positios i the selected idividual x are oe-bits, ad let k, j < k, be ay of the other bit positios. If there is a zero-bit i positio k or if i 2, the a successful sample occurs if the mutatio operator flips oly bit positio k. If there is a oe-bit i positio k, ad if i = 1, the the step is still successful if the mutatio operator flips oe of the bit positios. Sice the probability of ot flippig a positio is higher tha the probability of flippig a positio, i.e., 1 χ/ χ/, the probability of a successful sample is therefore i both cases at least j)χ/)1 χ/) 1. 2) By Lemma 11, the probability above is at least z j := j)χ/)e χ 1 δ). The parameter z is chose to be the miimal amog these probabilities, i.e. z := χ/)e χ 1 δ). Coditio C2) assumes i additio that γλ < µ idividuals have fitess j + 1 or higher. I this case, it suffices that the selectio mechaism picks oe of the best γλ idividuals amog the µ idividuals, ad that oe of the bits are mutated i the selected idividual. The probability of this evet is at least γλ µ 1 χ/) γλ µ e χ 1 δ) Hece, to satisfy coditio C2), it suffices to require that which is true wheever γλ µ exp χ)1 δ) γ1 + δ), λ µ ) 1 + δ e χ. 1 δ To check coditio C3), otice that ε = δ/2, ad c = δ 4 /384, hece a = δ2 λ/µ) 21 + δ) δ2 e χ 21 δ), ad acεz δ2 χ 2 cε = δ7 χ 1536 Coditio C3) is ow satisfied, because the populatio size λ is required to fulfil ) ) 2 16m a l 41 δ) 24576m l λ acεz δ 2 e χ δ 7 χ All coditios are satisfied, ad the theorem follows. 6.4 Coclusios The artificial fitess levels method was first described by Wegeer [37]. The origial method was desiged for the achievemet of upper bouds o the rutime of stochastic search heuristics usig oly oe idividual such as the 1+1) EA. 
Since then, several extensions of the method have been devised for the analysis of more sophisticated algorithms. Sudholt introduced the method presented in Section 7 for the obtainment of lower bounds on the runtime [35]. In an early study, [38] used a potential function that generalises the fitness level argument of [37] to analyse the ($\mu$+1) EA. His analysis achieved tight upper bounds on the runtime of the ($\mu$+1) EA on LeadingOnes and Onemax by waiting for a sufficient number of individuals of the population to take over a given fitness level $A_i$ before calculating the probability to reach a fitness level of higher fitness. Chen et al. extended the analysis to offspring populations by analysing the (N+N) EA, also taking into account the take-over process [2]. Lehre introduced a general fitness-level method for arbitrary population-based EAs with non-elitist selection mechanisms and unary variation operators [20]. This technique was later generalised further into the level-based method presented in Section 6.3 [4]. The method allows the analysis of sophisticated non-elitist heuristics such as genetic algorithms equipped with mutation, crossover and stochastic selection mechanisms, both for classical as well as noisy and uncertain optimisation [6].
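The scheme of Algorithm 3, with the operator $D$ of Algorithm 4 from Section 6.3, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `comma_ea_operator` and `run_independent_sampling`, the `max_generations` cap and the use of Onemax as the fitness function are all hypothetical choices. Ties in the ranking are broken uniformly at random by shuffling before a stable sort.

```python
import random


def onemax(x):
    """Number of one-bits in the bitstring x."""
    return sum(x)


def comma_ea_operator(population, mu, chi, rng, fitness=onemax):
    """One sample from D(P): select uniformly among the mu best
    individuals, then flip each bit with probability chi/n."""
    shuffled = list(population)
    rng.shuffle(shuffled)  # random tie-breaking before the stable sort
    ranked = sorted(shuffled, key=fitness, reverse=True)
    parent = rng.choice(ranked[:mu])
    n = len(parent)
    return [1 - b if rng.random() < chi / n else b for b in parent]


def run_independent_sampling(n, lam, mu, chi, rng,
                             max_generations=10_000, fitness=onemax):
    """Population-based loop with independent sampling.

    Returns t * lam for the first generation t whose population
    contains the all-ones string, or None if the cap is reached.
    """
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(lam)]
    for t in range(max_generations + 1):
        if any(fitness(x) == n for x in population):
            return t * lam
        # Every new individual is sampled independently from D(P_t).
        population = [comma_ea_operator(population, mu, chi, rng, fitness)
                      for _ in range(lam)]
    return None


if __name__ == "__main__":
    print(run_independent_sampling(n=20, lam=60, mu=10, chi=1.0,
                                   rng=random.Random(7)))
```

Re-sorting the population for every sample is wasteful; it is kept here only so that `comma_ea_operator` reads as a self-contained distribution $D(P_t)$, which is the view the level-based theorem takes.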

Figure 8: An illustration of the drift at time step $k$ of a process represented by the random variable $X$ and a distance function $d$.

7 Drift Analysis

Drift analysis is a very flexible and powerful tool that is widely used in the analysis of stochastic search algorithms. The high level idea is to predict the long term behaviour of a stochastic process by measuring the expected progress towards a target in a single step. Naturally, a measure of progress needs to be introduced, which is generally called a distance function. Given a random variable $X_k$ representing the current state of the process at step $k$, over a finite set of states $S$, a distance function $d : S \to \mathbb{R}_0^+$ is defined such that $d(X_k) = 0$ if and only if $X_k$ is a target point (e.g., the global optimum). Drift analysis aims at deriving the expected time to reach the target by analysing the decrease in distance in each step, i.e., $d(X_k) - d(X_{k+1})$. The expected value of this decrease in distance, $\Delta_k = E[d(X_k) - d(X_{k+1}) \mid X_k]$, is called the drift. See Figure 8 for an illustration. If the initial distance from the target is $d(X_0)$ and a bound on the drift (i.e., the expected improvement in each step) is known, then bounds on the expected runtime to reach the target may be derived.

7.1 Additive Drift Theorem

The additive drift theorem was introduced to the field of evolutionary computation by He and Yao [12]. The theorem allows both upper and lower bounds on the runtime of stochastic search algorithms to be derived. Consider a distance function $Y_k = d(X_k)$ indicating the current distance, at time $k$, of the stochastic process from the optimum. The theorem simply states that if at each time step $k$ the drift is at least some value $\varepsilon$ (i.e., the process has moved closer to the target), then the expected number of steps to reach the target is at most $Y_0/\varepsilon$. Conversely, if the drift in each step is at most some value $\varepsilon$, then the expected number of steps to reach the target is at least $Y_0/\varepsilon$.

Theorem 13 (Additive Drift Theorem). Given a stochastic process $X_1, X_2, \dots$
over an interval $[0, b] \subset \mathbb{R}$ and a distance function $d : S \to \mathbb{R}_0^+$ such that $d(X_k) = 0$ if and only if $X_k$ contains the target. Let $Y_k = d(X_k)$ for all $k$, define $T := \min\{k \ge 0 \mid Y_k = 0\}$, and assume $E[T] < \infty$. Verify the following conditions:

(C1+) $\forall k \quad E[Y_{k+1} - Y_k \mid Y_k > 0] \le -\varepsilon$
(C1−) $\forall k \quad E[Y_{k+1} - Y_k \mid Y_k > 0] \ge -\varepsilon$

Then,
1. If (C1+) holds for an $\varepsilon > 0$, then $E[T \mid Y_0] \le b/\varepsilon$.
2. If (C1−) holds for an $\varepsilon > 0$, then $E[T \mid Y_0] \ge Y_0/\varepsilon$.

An example application of the additive drift theorem follows concerning the (1+1) EA for plateau blocks of functions of unitation of length $m$ positioned such that $k > n/2 + \varepsilon n$.

Theorem 14. The expected runtime of the (1+1)-EA for a plateau block of length $m$ ending at position $k > n/2 + \varepsilon n$ is $\Theta(m)$.

Proof. The additive drift theorem will be applied to derive both upper and lower bounds on the expected runtime. The starting point is a bitstring $X_0$ with $m + k$ zeroes and the target point is a bitstring $X_t$ with $k$ zeroes. Choose to use the natural distance function

$Y_t = d(X_t) := |X_t|_0$ that counts the number of zeroes in the bitstring. Subtract $k$ from the distance such that target points with $k$ zeroes have distance 0 and the initial point has distance $m$. As long as points on the plateau are generated, they will be accepted because all plateau points have equal fitness. Given that each bit flips with probability $1/n$, and at each step the current search point has $Y_t$ zeroes and $n - Y_t$ ones, the drift is

\[ \Delta_t := E[Y_t - Y_{t+1} \mid Y_t > 0] = \frac{Y_t}{n} - \frac{n - Y_t}{n} = \frac{2Y_t}{n} - 1. \]

A lower bound on the drift is obtained by considering that as long as the end of the plateau has not been reached there are always at least $k$ zeroes that may be flipped (i.e., $Y_t \ge k$). Accordingly, for an upper bound, at most $m + k$ zeroes may be available to be flipped (i.e., $Y_t \le m + k$). Hence,

\[ \frac{2k}{n} - 1 \le \Delta_t \le \frac{2(m+k)}{n} - 1. \]

Then by additive drift analysis (Theorem 13),

\[ E[T \mid Y_0] \le \frac{m}{2k/n - 1} = \frac{mn}{2k - n} = O(m) \]

and

\[ E[T \mid Y_0] \ge \frac{m}{2(m+k)/n - 1} = \frac{mn}{2(m+k) - n} = \Omega(m), \]

where the last equalities hold as long as $k > n/2 + \varepsilon n$.

Note again that if the plateau block is followed by a gap block, then an upper bound on the expected time to optimise both blocks is achieved by multiplying the upper bounds obtained for each block. This is necessary because points in the gap will not be accepted by the (1+1) EA.

Figure 9: An illustration of the condition of the Additive Drift Theorem. If the expected distance to the optimum decreases by at least $\varepsilon$ at each step (i.e., condition C1+), then an upper bound on the runtime is achieved. If the distance decreases by at most $\varepsilon$ at each step (i.e., condition C1−), then a lower bound on the runtime is obtained.

7.2 Multiplicative Drift Theorem

In the additive drift theorem the worst case decrease in distance is considered. If the expected decrease in distance changes considerably in different areas of the search space, then the estimate on the drift may be too pessimistic for the obtainment of tight bounds on the expected runtime. Drift analysis of the (1+1) EA for the classical Onemax function will serve as an example of this problem.
Since the global optimum is the all-ones bitstring and the fitness increases with the number of ones, a natural distance function is $Y_t = d(X_t) = n - \mathrm{Onemax}(X_t)$, which simply counts the number of zeroes in the current search point. Then the distance will be zero once the optimum is found. Points with fewer one-bits than the current search point will not be accepted by the algorithm because of their lower fitness. So the drift is never negative, i.e., $\Delta_t \ge 0$, and the amount of progress is the expected number of ones gained in each step. In order to find an upper bound on the runtime, a lower bound on the drift is needed (i.e., the worst case improvement). Such a worst case occurs when the current search point is optimal except for one 0-bit. In this case the maximum decrease in distance that may be achieved in a step is $Y_t - Y_{t+1} = 1$, and to achieve such progress it is necessary that the algorithm flips the zero into a one and leaves the other bits unchanged. Hence, the drift is

\[ \Delta_t \ge \frac{1}{n}\left(1 - \frac{1}{n}\right)^{n-1} \ge \frac{1}{en} := \varepsilon. \]

Since the expected initial distance is $E[Y_0] = n/2$ due to random initialisation, the drift theorem yields

\[ E[T \mid Y_0] \le \frac{E[Y_0]}{\varepsilon} = \frac{n/2}{1/(en)} = \frac{e}{2} n^2 = O(n^2). \]

In Section 6 it was proven that the runtime of the (1+1) EA for Onemax is $\Theta(n \ln n)$, hence a bound of $O(n^2)$ is not tight. The reason is that on functions such as Onemax the amount of progress made by the algorithm depends crucially on the distance from the optimum. For Onemax in particular, larger progress per step is achieved when the current search point has many zeroes that may be flipped. As the algorithm approaches the optimal solution the amount of expected progress in each step becomes smaller because search points have increasingly more one-bits than zero-bits in the bitstring. In such cases a distance function that takes into account these properties of the objective function needs to be used. For Onemax a correct bound is achieved by using a distance function that is logarithmic in the number of zeroes $i$, i.e., $Y_t = d(X_t) := \ln(i + 1)$, where a 1 is added to $i$ in the argument of the logarithm such that the global optimum has distance zero (i.e., $\ln(1) = 0$). With such a distance measure, the decrease in distance when flipping a zero and leaving the rest of the bitstring unchanged is

\[ \ln(i+1) - \ln(i) = \ln\left(1 + \frac{1}{i}\right) \ge \frac{1}{2i}, \]

where the last inequality holds for all $i \ge 1$. Since it is sufficient to flip a zero and leave everything else unchanged to obtain an improvement, the drift is

\[ \Delta_t \ge \frac{i}{en} \cdot \frac{1}{2i} = \frac{1}{2en} := \varepsilon. \]

Given that the maximum possible distance is $Y_0 \le \ln(n+1)$, the drift theorem yields

\[ E[T] \le \frac{Y_0}{\varepsilon} \le \frac{\ln(n+1)}{1/(2en)} = 2en\ln(n+1) = O(n \ln n). \]

The multiplicative drift theorem was introduced as a handy tool to deal with situations such as the one described above, where the amount of progress depends on the distance from the target.

Theorem 15 (Multiplicative Drift Theorem [8]). Let $\{X_t\}_{t \in \mathbb{N}_0}$ be random variables describing a Markov process over a finite state space $S \subseteq \mathbb{R}$. Let $T$ be the random variable that denotes the earliest point in time $t \in \mathbb{N}_0$ such that $X_t = 0$. If there exist $\delta, c_{\min}, c_{\max} > 0$ such that for all $t < T$, 1. $E[X_t - X_{t+1} \mid X_t] \ge \delta X_t$ and 2.
$c_{\min} \le X_t \le c_{\max}$, then

\[ E[T] \le \frac{2}{\delta} \ln\left(1 + \frac{c_{\max}}{c_{\min}}\right). \]

The following derivation of an upper bound on the runtime of the (1+1) EA for linear blocks illustrates the multiplicative drift theorem.

Theorem 16. The expected time for the (1+1)-EA to optimise a linear unitation block of length $m$ ending at position $k$ is $O(n \ln((m+k)/k))$.

Proof. Let $X_t$ be the number of zero-bits in the bitstring at time step $t$, representing the distance from the end of the linear block. By remembering that increases in distance are not accepted due to elitism, the expected distance at time step $t+1$ can be bounded by

\[ E[X_{t+1} \mid X_t] \le X_t - 1 \cdot \frac{X_t}{en} = X_t\left(1 - \frac{1}{en}\right), \]

simply by considering that if a zero-bit is flipped and nothing else, then the distance decreases by 1. Then the drift is:

\[ E[X_t - X_{t+1} \mid X_t] \ge X_t - X_t\left(1 - \frac{1}{en}\right) = \frac{1}{en} X_t := \delta X_t. \]

By fixing $k = c_{\min} \le X_t \le c_{\max} = m + k$, the multiplicative drift theorem yields

\[ E[T] \le \frac{2}{\delta} \ln\left(1 + \frac{c_{\max}}{c_{\min}}\right) = 2en \ln\left(1 + \frac{m+k}{k}\right) = O(n \ln((m+k)/k)). \]
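To make the bound of Theorem 15 concrete, the following sketch evaluates $\frac{2}{\delta}\ln(1 + c_{\max}/c_{\min})$ numerically and instantiates it with the values used in the proof of Theorem 16 ($\delta = 1/(en)$, $c_{\min} = k$, $c_{\max} = m + k$). The function names are hypothetical and chosen for illustration only; the code merely restates the formulas above.

```python
import math


def multiplicative_drift_bound(delta, c_min, c_max):
    """Upper bound (2/delta) * ln(1 + c_max/c_min) on E[T],
    in the form of the multiplicative drift theorem stated above."""
    if delta <= 0 or c_min <= 0 or c_max < c_min:
        raise ValueError("need delta > 0 and 0 < c_min <= c_max")
    return (2.0 / delta) * math.log(1.0 + c_max / c_min)


def linear_block_bound(n, m, k):
    """Bound for a linear block of length m ending at position k,
    using delta = 1/(e*n), c_min = k and c_max = m + k."""
    delta = 1.0 / (math.e * n)
    return multiplicative_drift_bound(delta, c_min=k, c_max=m + k)


if __name__ == "__main__":
    n = 100
    print("linear block (m=10, k=10):", linear_block_bound(n, 10, 10))
    # Onemax corresponds to c_min = 1 and c_max = n.
    print("Onemax:", multiplicative_drift_bound(1.0 / (math.e * n), 1, n))
```

The second call corresponds to the choice $c_{\min} = 1$, $c_{\max} = n$, which gives a value of order $n \ln n$.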

By fixing $1 = c_{\min} \le X_t \le c_{\max} = n$, an $O(n \ln n)$ bound on the expected runtime of the (1+1) EA for Onemax is achieved.

7.3 Variable Drift Theorem

The multiplicative drift theorem is applicable when the drift of a stochastic process is linear with respect to the current position. However, in some stochastic processes, the drift is nonlinear in the current position, i.e.,

\[ E[X_t - X_{t+1} \mid X_t \ge x_{\min}] \ge h(X_t) \qquad (3) \]

for some function $h$. The following variable drift theorem provides bounds on the expectation and the tails of the hitting time distribution of such processes, given some assumptions about the function $h$.

Theorem 17 (Corollary 1 in [22]). Let $(X_t)_{t \ge 0}$ be a stochastic process over some state space $S \subseteq \{0\} \cup [x_{\min}, x_{\max}]$, where $x_{\min} \ge 0$. Let $h : [x_{\min}, x_{\max}] \to \mathbb{R}^+$ be a differentiable function. Then the following statements hold for the first hitting time $T := \min\{t \mid X_t = 0\}$.

(i) If $E[X_t - X_{t+1} \mid X_t \ge x_{\min}] \ge h(X_t)$ and $h'(x) \ge 0$, then
\[ E[T \mid X_0] \le \frac{x_{\min}}{h(x_{\min})} + \int_{x_{\min}}^{X_0} \frac{1}{h(y)}\, dy. \]

(ii) If $E[X_t - X_{t+1} \mid X_t \ge x_{\min}] \le h(X_t)$ and $h'(x) \le 0$, then
\[ E[T \mid X_0] \ge \frac{x_{\min}}{h(x_{\min})} + \int_{x_{\min}}^{X_0} \frac{1}{h(y)}\, dy. \]

(iii) If $E[X_t - X_{t+1} \mid X_t \ge x_{\min}] \ge h(X_t)$ and $h'(x) \ge \lambda$ for some $\lambda > 0$, then
\[ \Pr(T \ge t \mid X_0) < \exp\left(-\lambda\left(t - \frac{x_{\min}}{h(x_{\min})} - \int_{x_{\min}}^{X_0} \frac{1}{h(y)}\, dy\right)\right). \]

(iv) If $E[X_t - X_{t+1} \mid X_t \ge x_{\min}] \le h(X_t)$ and $h'(x) \le -\lambda$ for some $\lambda > 0$, then
\[ \Pr(T < t \mid X_0 > 0) < \frac{e^{\lambda t} - e^{\lambda}}{e^{\lambda} - 1} \exp\left(-\frac{\lambda x_{\min}}{h(x_{\min})} - \lambda\int_{x_{\min}}^{X_0} \frac{1}{h(y)}\, dy\right). \]

To illustrate the variable drift theorem, an upper bound on the optimisation time of the (1+1) EA on the class of linear functions with bounded coefficients will be derived. More formally, this class of functions contains any function of the form $f(x) := \sum_{i=1}^{n} w_i x_i$, with bounded, positive coefficients $w_1, \dots, w_n \in (w_{\min}, w_{\max})$ where $0 < w_{\min} < w_{\max}$. The drift function $h$ in this example turns out to be linear, hence the multiplicative drift theorem could have been applied instead.

Theorem 18. The expected optimisation time of the (1+1) EA on linear functions is less than $t(n) := en(\ln(n) + \ln(w_{\max}/w_{\min}) + 1)$, and the probability that the optimisation time exceeds $t(n) + ren$ for any $r \ge 0$ is no more than $e^{-r}$.

Proof.
Define the distance $X_t$ at time $t$ to be the function value that remains at time $t$, i.e.,

\[ X_t := \left(\sum_{i=1}^{n} w_i\right) - \left(\sum_{i=1}^{n} w_i x_i^{(t)}\right) = \sum_{i=1}^{n} w_i\left(1 - x_i^{(t)}\right), \]

where $x_i^{(t)}$ is the $i$-th bit in the current search point at time $t$. For any $i \in [n]$, assume that the mutation operator flipped only bit position $i$, and no other bit positions, an event denoted by the symbol $E_i$. If $x_i^{(t)} = 0$, then bit position $i$ flipped from 0 to 1 and the distance reduced by $w_i$. Otherwise, if $x_i^{(t)} = 1$, then bit position $i$ flipped from 1 to 0, the new search point was not accepted, and the distance reduced by 0. Hence, the distance always reduces by $w_i\left(1 - x_i^{(t)}\right)$
