On the Effectiveness of Sampling for Evolutionary Optimization in Noisy Environments


Chao Qian 1,2  chaoqian@ustc.edu.cn
Yang Yu 2  yuy@nju.edu.cn
Ke Tang 1  ketang@ustc.edu.cn
Yaochu Jin 3  yaochu.jin@surrey.ac.uk
Xin Yao 1,4  x.yao@cs.bham.ac.uk
Zhi-Hua Zhou 2  zhouzh@nju.edu.cn

1 USTC-Birmingham Joint Research Institute in Intelligent Computation and Its Applications, School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
2 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
3 Department of Computer Science, University of Surrey, Guildford, GU2 7XH, UK
4 Center of Excellence for Research in Computational Intelligence and Applications, School of Computer Science, University of Birmingham, Birmingham, B15 2TT, UK

Abstract

In real-world optimization tasks, the objective (i.e., fitness) function evaluation is often disturbed by noise due to a wide range of uncertainties. Evolutionary algorithms are often employed in noisy optimization, where reducing the negative effect of noise is a crucial issue. Sampling is a popular strategy for dealing with noise: to estimate the fitness of a solution, it evaluates the fitness multiple ($k$) times independently and then uses the sample average to approximate the true fitness. Obviously, sampling can make the fitness estimation closer to the true value, but it also increases the estimation cost. Previous studies mainly focused on empirical analysis and design of efficient sampling strategies, while the impact of sampling is unclear from a theoretical viewpoint. In this paper, we show that sampling can speed up noisy evolutionary optimization exponentially via rigorous running time analysis. For the (1+1)-EA solving the OneMax and the LeadingOnes problems under prior (e.g., one-bit) or posterior (e.g., additive Gaussian) noise, we prove that, under a high noise level, the running time can be reduced from exponential to polynomial by sampling. The analysis also shows that a gap of one on the value of $k$ for sampling can lead to an exponential difference on the expected running time, cautioning for a careful selection of $k$. We further prove by using two illustrative examples that sampling can be more effective for noise handling than parent populations and threshold selection, two strategies that have been shown to be robust to noise. Finally, we also show that sampling can be ineffective when noise does not bring a negative impact.

Keywords

Robust optimization, optimization in noisy environments, evolutionary algorithms, running time analysis, computational complexity.

Corresponding author

© 200X by the Massachusetts Institute of Technology. Evolutionary Computation x(x): xxx-xxx

1 Introduction

In many real-world optimization tasks, the exact objective (i.e., fitness) evaluation of candidate solutions is almost impossible, and only a noisy one can be obtained. Evolutionary algorithms (EAs) (Bäck, 1996) are general-purpose optimization algorithms inspired by natural phenomena, and have been widely and successfully applied to solve noisy optimization problems (Jin and Branke, 2005; Bianchi et al., 2009; Zeng et al., 2015). During evolutionary optimization, handling noise in fitness evaluation is very important, since noise may mislead the search direction and then deteriorate the efficiency of EAs. Many studies have thus focused on reducing the negative effect of noise in evolutionary optimization (Arnold, 2002; Beyer, 2000; Jin and Branke, 2005).

One popular way to cope with noise in fitness evaluation is sampling (Arnold and Beyer, 2006), which, instead of evaluating the fitness of one solution only once, evaluates the fitness $k$ times and then uses the average to approximate the true fitness. Sampling obviously can reduce the standard deviation of the noise by a factor of $\sqrt{k}$, while also increasing the computation cost $k$ times. This makes the fitness estimation closer to the true value, but computationally more expensive. In order to reduce the sampling cost as much as possible, many smart sampling approaches have been proposed, including adaptive (Aizawa and Wah, 1994; Stagge, 1998) and sequential (Branke and Schmidt, 2003, 2004) methods, which dynamically decide the size of $k$ for each solution in each generation.

The impact of sampling on the convergence of EAs in noisy optimization has been empirically and theoretically investigated (Gutjahr, 2003; Arnold and Beyer, 2006; Heidrich-Meisner and Igel, 2009; Rolet and Teytaud, 2010). On the running time, a more practical performance measure for how soon an algorithm can solve a problem, previous experimental studies have reported conflicting conclusions.
In (Aizawa and Wah, 1994), it was shown that sampling can speed up a standard genetic algorithm on two test functions, while in (Cantú-Paz, 2004), sampling led to a larger computation time for a simple generational genetic algorithm on the OneMax function. However, little work has been done on theoretically analyzing the impact of sampling on the running time. Thus, there are many fundamental theoretical issues on sampling that have not been addressed, e.g., whether sampling can reduce the running time of EAs from exponential to polynomial in noisy environments, and whether sampling will increase the running time in some cases.

The running time is usually counted by the number of fitness evaluations needed to find an optimal solution for the first time, because the fitness evaluation is deemed the most costly computational process (Droste et al., 2002; Yu and Zhou, 2008; Qian et al., 2015b). Rigorous running time analysis has been a leading theoretical aspect for randomized search heuristics (Neumann and Witt, 2010; Auger and Doerr, 2011). Recently, progress has been made on the running time analysis of EAs. Numerous analytical results for EAs solving synthetic problems as well as combinatorial problems have been reported, e.g., (Neumann and Witt, 2010; Auger and Doerr, 2011). Meanwhile, general running time analysis approaches have also been proposed, e.g., drift analysis (He and Yao, 2001; Doerr et al., 2012b; Doerr and Goldberg, 2013), fitness-level methods (Wegener, 2002; He and Yao, 2003; Sudholt, 2013; Dang and Lehre, 2015b), and switch analysis (Yu et al., 2015; Yu and Qian, 2015). However, most of them focus on noise-free environments, where the fitness evaluation is exact.

For EAs in noisy environments, few results have been reported on running time analysis. Droste (2004) first analyzed the (1+1)-EA on the OneMax problem in the presence of one-bit noise and showed the maximal noise level $\log(n)/n$ allowing a polynomial running time, where the noise level is characterized by the noise probability $p \in [0,1]$ and $n$ is the problem size. This result was later extended to the LeadingOnes problem and to many different noise models in (Gießen and Kötzing, 2016), which also proved that small populations of size $\Theta(\log n)$ can make elitist EAs, i.e., the ($\mu$+1)-EA and the (1+$\lambda$)-EA, perform well under high noise levels. The robustness of populations to noise was also proved in the setting of non-elitist EAs with mutation only (Dang and Lehre, 2015a) or uniform crossover only (Prugel-Bennett et al., 2015). However, Friedrich et al. (2015) showed the limitation of parent populations to cope with noise by proving that the ($\mu$+1)-EA needs super-polynomial time for solving OneMax in the presence of additive Gaussian noise $\mathcal{N}(0,\sigma^2)$ with $\sigma^2 \ge n^3$. This difficulty can be overcome by the compact genetic algorithm (cGA) (Friedrich et al., 2015) and a simple Ant Colony Optimization (ACO) algorithm (Friedrich et al., 2016), both of which find the optimal solution in polynomial time with a high probability. Recently, Qian et al. (2015a) proved that the threshold selection strategy is also robust to noise: the expected running time of the (1+1)-EA using threshold selection on OneMax in the presence of one-bit noise is always polynomial regardless of the noise level. They also showed the limitation of threshold selection under asymmetric one-bit noise and further proposed smooth threshold selection, which can overcome the difficulty. Note that there was also a sequence of papers analyzing the running time of ACO on single destination shortest paths (SDSP) problems with edge weights disturbed by noise (Sudholt and Thyssen, 2012; Doerr et al., 2012a; Feldmann and Kötzing, 2013).

In addition to the above results, there exist two other pieces of work on running time analysis in noisy evolutionary optimization that involve sampling. Akimoto et al. (2015) proved that sampling with a large enough $k$ can make optimization under additive unbiased noise behave as optimization in a noise-free environment, and thus concluded that noisy optimization using sampling can be solved in $k \cdot r$ running time, where $r$ is the noise-free running time. A similar result was also achieved for an adaptive Pareto sampling (APS) algorithm solving bi-objective optimization problems under additive Gaussian noise $\mathcal{N}(0,\sigma^2)$ (Gutjahr, 2012). These results, however, do not describe any impact of sampling on the running time, because they do not compare with the running time in noisy optimization without sampling.

In this paper, we show that sampling can speed up noisy evolutionary optimization exponentially via rigorous running time analysis. For the (1+1)-EA solving the OneMax and the LeadingOnes problems under prior (e.g., one-bit) or posterior (e.g., additive Gaussian) noise, we prove that the running time is exponential when the noise level is high (i.e., Theorems 1, 4, 6, 7), while sampling can reduce the running time to be polynomial (i.e., Theorems 3, 5, Corollaries 1, 2). Particularly, for the (1+1)-EA solving OneMax under one-bit noise with $p = 1$, the analysis also shows that a gap of one on the value of $k$ for sampling can lead to an exponential difference on the expected running time (i.e., Theorems 2, 3), which reveals that a careful selection of $k$ is important for the effectiveness of sampling.

As previous studies (Qian et al., 2015a; Gießen and Kötzing, 2016) have shown that parent populations and threshold selection can bring about robustness to noise, we also compare sampling with these two strategies. On the OneMax problem under additive Gaussian noise $\mathcal{N}(0,\sigma^2)$ with $\sigma^2 \ge n^3$, the ($\mu$+1)-EA needs super-polynomial time (Friedrich et al., 2015) (i.e., Theorem 8), while the (1+1)-EA using sampling can solve the problem in polynomial time (i.e., Corollary 1). On the OneMax problem under asymmetric one-bit noise with $p = 1$, the (1+1)-EA using threshold selection needs at least exponential time (Qian et al., 2015a) (i.e., Theorem 9), while the (1+1)-EA using sampling can solve it in $O(n \log^2 n)$ time (i.e., Theorem 10). Therefore, these results show that sampling can be more tolerant of noise than parent populations and threshold selection, respectively.

Finally, for the (1+1)-EA solving the Trap problem under additive Gaussian noise, we prove that noise does not bring a negative impact. Under the assumption that the positive impact of noise increases with the noise level, we conjecture that sampling is ineffective in this case since it will decrease the noise level. The conjecture is verified by experiments. Note that the conjecture is consistent with that in (Qian et al., 2015a). In that work it is hypothesized that the impact of noise is correlated with the problem hardness: when the problem is EA-hard (He and Yao, 2004) w.r.t. a specific EA (e.g., the Trap problem for the (1+1)-EA), noise can be helpful and does not need to be handled, but when the problem is EA-easy (He and Yao, 2004), noise can be harmful and needs to be tackled.

This paper extends our preliminary work (Qian et al., 2014) and improves one previous statement. In (Qian et al., 2014), we proved a sufficient condition under which sampling is ineffective, and applied it to the cases of the (1+1)-EA solving OneMax and Trap under additive Gaussian noise. The proof assumed the monotonicity of a quantity. By finding that an upper/lower bound of the quantity is monotonic, we hypothesized that the quantity itself is also monotonic. Considering that this property does not always hold, we have corrected our previous statement on the OneMax problem by proving that sampling with a moderate sample size can exponentially reduce the running time of the (1+1)-EA compared with no sampling (i.e., Theorem 6, Corollary 1). Meanwhile, both analysis and experiments (i.e., Section 6) show that sampling is ineffective on the Trap problem.

The rest of this paper is organized as follows. Section 2 introduces some preliminaries.
The robustness analysis of sampling to prior and posterior noise is presented in Sections 3 and 4, respectively. Section 5 compares sampling with the other two strategies, parent populations and threshold selection, on the robustness to noise. Section 6 gives a case where sampling is ineffective. Section 7 concludes the paper.

2 Preliminaries

In this section, we first introduce the noise models, problems and evolutionary algorithms studied in this paper, respectively, then describe the sampling strategy, and finally present the analysis tools that we use throughout this paper.

2.1 Noise Models

Noise models can be generally divided into two categories: prior and posterior (Jin and Branke, 2005; Gießen and Kötzing, 2016). For prior noise, the noise comes from the variation on a solution instead of the evaluation process. One-bit noise as presented in Definition 1 is a representative one, which flips a random bit of a solution before evaluation with probability $p$. For posterior noise, the noise comes from the variation on the fitness of a solution. A representative model is additive Gaussian noise as presented in Definition 2, which adds a value drawn from a Gaussian distribution. Both one-bit noise and additive Gaussian noise have been widely used in previous empirical and theoretical studies, e.g., (Beyer, 2000; Droste, 2004; Jin and Branke, 2005; Gießen and Kötzing, 2016). In this paper, we will also use these two kinds of noise models.

Definition 1 (One-bit Noise). Given a parameter $p \in [0,1]$, let $f^{n}(x)$ and $f(x)$ denote the noisy and true fitness of a binary solution $x \in \{0,1\}^n$, respectively, then

$$f^{n}(x) = \begin{cases} f(x) & \text{with probability } 1-p, \\ f(x') & \text{with probability } p, \end{cases}$$

where $x'$ is generated by flipping a uniformly randomly chosen bit of $x$.

Definition 2 (Additive Gaussian Noise). Given a Gaussian distribution $\mathcal{N}(\theta,\sigma^2)$, let $f^{n}(x)$ and $f(x)$ denote the noisy and true fitness of a solution $x$, respectively, then

$$f^{n}(x) = f(x) + \delta,$$

where $\delta$ is randomly drawn from $\mathcal{N}(\theta,\sigma^2)$, denoted by $\delta \sim \mathcal{N}(\theta,\sigma^2)$.

In addition to the above noises, we also consider a variant of one-bit noise called asymmetric one-bit noise (Qian et al., 2015a), in Definition 3. For the flipping of asymmetric one-bit noise on a solution $x \in \{0,1\}^n$, if $|x|_0 = 0$, a random 1 bit is flipped; if $|x|_0 = n$, a random 0 bit is flipped; otherwise, the probability of flipping a specific 0 bit is $\frac{1}{2|x|_0}$, and the probability of flipping a specific 1 bit is $\frac{1}{2(n-|x|_0)}$, where $|x|_0 = n - \sum_{i=1}^{n} x_i$ is the number of 0-bits of $x$. Note that for one-bit noise, the probability of flipping any specific bit is $\frac{1}{n}$.

Definition 3 (Asymmetric One-bit Noise). Given a parameter $p \in [0,1]$, let $f^{n}(x)$ and $f(x)$ denote the noisy and true fitness of a binary solution $x \in \{0,1\}^n$, respectively, then $f^{n}(x) = f(x)$ with probability $1-p$, otherwise $f^{n}(x) = f(x')$, where $x'$ is generated by flipping the $j$-th bit of $x$, and $j$ is a uniformly randomly chosen position of

$$\begin{cases} \text{all bits of } x, & \text{if } |x|_0 = 0 \text{ or } n; \\ \begin{cases} \text{0 bits of } x, & \text{with probability } 1/2; \\ \text{1 bits of } x, & \text{with probability } 1/2, \end{cases} & \text{otherwise.} \end{cases}$$

2.2 Optimization Problems

As most theoretical analyses of EAs start from simple synthetic problems, we also use two well-known test functions OneMax and LeadingOnes, which have been widely studied in both noise-free (e.g., (He and Yao, 2001; Droste et al., 2002; Sudholt, 2013)) and noisy (e.g., (Droste, 2004; Dang and Lehre, 2015a; Gießen and Kötzing, 2016)) evolutionary optimization.

The OneMax problem as presented in Definition 4 aims to maximize the number of 1-bits of a solution. Its optimal solution is $11\ldots1$ (briefly denoted as $1^n$) with the function value $n$. It has been shown that the expected running time of the (1+1)-EA on OneMax is $\Theta(n \log n)$ (Droste et al., 2002).

Definition 4 (OneMax). The OneMax Problem of size $n$ is to find an $n$ bits binary string $x^*$ such that

$$x^* = \arg\max_{x \in \{0,1\}^n} \Big( f(x) = \sum_{i=1}^{n} x_i \Big).$$

The LeadingOnes problem as presented in Definition 5 aims to maximize the number of consecutive 1-bits counting from the left of a solution. Its optimal solution is $1^n$ with the function value $n$. It has been proved that the expected running time of the (1+1)-EA on LeadingOnes is $\Theta(n^2)$ (Droste et al., 2002).
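The three noise models and the two test functions described so far can be written down in a few lines. The following is a minimal sketch, assuming solutions are Python lists of 0/1 integers; all function names (`onemax`, `leading_ones`, `one_bit_noise`, `gaussian_noise`, `asymmetric_one_bit_noise`) are illustrative choices and do not come from the paper.

```python
import random


def onemax(x):
    """True OneMax fitness: the number of 1-bits."""
    return sum(x)


def leading_ones(x):
    """True LeadingOnes fitness: consecutive 1-bits counted from the left."""
    count = 0
    for bit in x:
        if bit == 0:
            break
        count += 1
    return count


def flip(x, j):
    """Return a copy of x with the j-th bit flipped."""
    y = list(x)
    y[j] = 1 - y[j]
    return y


def one_bit_noise(f, x, p, rng):
    """Prior noise: with probability p, evaluate f on x with one uniform bit flipped."""
    if rng.random() < p:
        return f(flip(x, rng.randrange(len(x))))
    return f(x)


def gaussian_noise(f, x, theta, sigma, rng):
    """Posterior noise: add a draw from N(theta, sigma^2) to the true fitness."""
    return f(x) + rng.gauss(theta, sigma)


def asymmetric_one_bit_noise(f, x, p, rng):
    """Prior noise that picks the 0-bits or the 1-bits with probability 1/2 each."""
    if rng.random() >= p:
        return f(x)
    zeros = [j for j, bit in enumerate(x) if bit == 0]
    ones = [j for j, bit in enumerate(x) if bit == 1]
    if not zeros or not ones:
        j = rng.randrange(len(x))   # all bits are equal: flip any position
    elif rng.random() < 0.5:
        j = rng.choice(zeros)       # a specific 0-bit has probability 1 / (2 |x|_0)
    else:
        j = rng.choice(ones)        # a specific 1-bit has probability 1 / (2 (n - |x|_0))
    return f(flip(x, j))
```

Every call draws fresh randomness from `rng`, so evaluating the same solution twice can return different values, which is what the noisy fitness $f^{n}$ models.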

Definition 5 (LeadingOnes). The LeadingOnes Problem of size $n$ is to find an $n$ bits binary string $x^*$ such that

$$x^* = \arg\max_{x \in \{0,1\}^n} \Big( f(x) = \sum_{i=1}^{n} \prod_{j=1}^{i} x_j \Big).$$

We will also use an EA-hard problem Trap in Definition 6, the aim of which is to maximize the number of 0-bits of a solution except for the optimal solution $1^n$. Its optimal function value is $C - n > 0$, and the function value for any non-optimal solution is not larger than 0. The expected running time of the (1+1)-EA on Trap has been proven to be $\Theta(n^n)$ (Droste et al., 2002).

Definition 6 (Trap). The Trap Problem of size $n$ is to find an $n$ bits binary string $x^*$ such that, let $C > n$,

$$x^* = \arg\max_{x \in \{0,1\}^n} \Big( f(x) = C \cdot \prod_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i \Big).$$

2.3 Evolutionary Algorithms

In this paper, we consider the (1+1)-EA as described in Algorithm 1, which is a simple EA for maximizing pseudo-Boolean problems over $\{0,1\}^n$. The (1+1)-EA reflects the common structure of EAs. It maintains only one solution (i.e., the population size is 1), and repeatedly improves the current solution by using bit-wise mutation (i.e., step 3) and selection (i.e., steps 4 and 5). The (1+1)-EA has been widely used in the running time analysis of EAs, see (Neumann and Witt, 2010; Auger and Doerr, 2011).

Algorithm 1 ((1+1)-EA). Given a function $f$ over $\{0,1\}^n$ to be maximized, it consists of the following steps:
1. $x :=$ uniformly randomly selected from $\{0,1\}^n$.
2. Repeat until the termination condition is met
3.   $x' :=$ flip each bit of $x$ independently with probability $1/n$.
4.   if $f(x') \ge f(x)$
5.     $x := x'$.

For the (1+1)-EA in noisy environments, only a noisy fitness value $f^{n}(x)$ is available, and thus step 4 of Algorithm 1 changes to be "if $f^{n}(x') \ge f^{n}(x)$". Note that we assume that the reevaluation strategy is used as in (Droste, 2004; Doerr et al., 2012a; Gießen and Kötzing, 2016), that is, when accessing the fitness of a solution, it is always calculated by sampling a new random variate, or drawing a new random single-bit mask. For example, for the (1+1)-EA, both $f^{n}(x')$ and $f^{n}(x)$ will be evaluated and reevaluated in each iteration.
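Algorithm 1 with the reevaluation strategy can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name `one_plus_one_ea`, the `max_iters` cap, and the convention of passing the (possibly noisy) evaluator and the true optimality check as callables are all assumptions made for the example.

```python
import random


def one_plus_one_ea(noisy_f, is_optimal, n, rng, max_iters=100_000):
    """Run the (1+1)-EA with reevaluation; return the final solution and iteration count.

    noisy_f(x)    -- returns a fresh, possibly noisy, fitness value on every call
    is_optimal(x) -- checks optimality w.r.t. the true fitness (stopping criterion)
    """
    # Step 1: uniform random initial solution.
    x = [rng.randrange(2) for _ in range(n)]
    iterations = 0
    # Step 2: repeat until an optimum is found or the cap is reached.
    while iterations < max_iters and not is_optimal(x):
        # Step 3: bit-wise mutation, each bit flips independently with probability 1/n.
        y = [1 - bit if rng.random() < 1.0 / n else bit for bit in x]
        # Steps 4-5: reevaluate BOTH parent and offspring, then select.
        if noisy_f(y) >= noisy_f(x):
            x = y
        iterations += 1
    return x, iterations


if __name__ == "__main__":
    rng = random.Random(1)
    n = 10
    # Noise-free OneMax: the evaluator is simply the number of 1-bits.
    solution, iters = one_plus_one_ea(sum, lambda s: sum(s) == n, n, rng)
    print(solution, iters)
```

Because the parent is reevaluated in every iteration, each iteration costs two fitness evaluations, so the number of evaluations is twice the returned iteration count plus the cost of the initial evaluation.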
The running time in noisy optimization is usually defined as the number of fitness evaluations needed to find an optimal solution w.r.t. the true fitness function $f$ for the first time (Droste, 2004; Akimoto et al., 2015; Gießen and Kötzing, 2016).

In noisy optimization, a worse solution may appear to have a better fitness and then survive to replace the true better solution, which appears to have a worse fitness. This may mislead the search direction of EAs, and then deteriorate the efficiency of EAs. To deal with this problem, a selection strategy for EAs handling noise was proposed (Markon et al., 2001; Bartz-Beielstein, 2005).

threshold selection: an offspring solution will be accepted only if its fitness is larger than the parent solution by at least a predefined threshold $\tau \ge 0$.

For example, when using threshold selection, the 4th step of the (1+1)-EA in Algorithm 1 changes to be "if $f(x') \ge f(x) + \tau$" rather than "if $f(x') \ge f(x)$". Such a strategy can reduce the risk of accepting a bad solution due to noise. In (Qian et al., 2015a), it has been proved that threshold selection with $\tau = 1$ can make the (1+1)-EA solve the OneMax problem in polynomial time even if one-bit noise occurs with probability 1.

2.4 Sampling

In noisy evolutionary optimization, sampling as described in Definition 7 has often been used to reduce the negative effect of noise (Aizawa and Wah, 1994; Stagge, 1998; Branke and Schmidt, 2003, 2004). It approximates the true fitness $f(x)$ using the average of a number of random evaluations. Sampling can estimate the true fitness more accurately. For example, the output fitness $\hat{f}(x)$ by sampling under additive Gaussian noise $\mathcal{N}(\theta,\sigma^2)$ can be represented by $f(x) + \delta$ with $\delta \sim \mathcal{N}(\theta,\sigma^2/k)$, that is, sampling reduces the variance of noise by a factor of $k$. However, the computation time for the fitness estimation of a solution is also increased by $k$ times.

Definition 7 (Sampling). Sampling first evaluates the fitness of a solution $k$ times independently and obtains the noisy fitness values $f^{n}_1(x), \ldots, f^{n}_k(x)$, and then outputs their average as

$$\hat{f}(x) = \frac{1}{k} \sum_{i=1}^{k} f^{n}_i(x).$$

For the (1+1)-EA using sampling, the 4th step of Algorithm 1 changes to be "if $\hat{f}(x') \ge \hat{f}(x)$". Note that $k = 1$ is equivalent to that sampling is not used.

2.5 Analysis Tools

To derive running time bounds in this paper, we first model EAs as Markov chains, and then use a variety of drift theorems.

The evolution process usually goes forward only based on the current population, thus an EA can be modeled as a Markov chain $\{\xi_t\}_{t=0}^{+\infty}$ (e.g., in (He and Yao, 2001; Yu and Zhou, 2008)) by taking the EA's population space $\mathcal{X}$ as the chain's state space, i.e., $\xi_t \in \mathcal{X}$. Note that the population space $\mathcal{X}$ consists of all possible populations. Let $\mathcal{X}^* \subseteq \mathcal{X}$ denote the set of all optimal populations, which contain at least one optimal solution.
The goal of the EA is to reach $\mathcal{X}^*$ from an initial population. Thus, the process of an EA seeking $\mathcal{X}^*$ can be analyzed by studying the corresponding Markov chain with the optimal state space $\mathcal{X}^*$. Note that we consider the discrete state space (i.e., $\mathcal{X}$ is discrete) in this paper.

Given a Markov chain $\{\xi_t\}_{t=0}^{+\infty}$ and $\xi_{\hat{t}} = x$, we define its first hitting time (FHT) as a random variable $\tau$ such that $\tau = \min\{t \mid \xi_{\hat{t}+t} \in \mathcal{X}^*, t \ge 0\}$. That is, $\tau$ is the number of steps needed to reach the optimal space for the first time starting from $\xi_{\hat{t}} = x$. The mathematical expectation of $\tau$, $\mathbb{E}[\tau \mid \xi_{\hat{t}} = x] = \sum_{i=0}^{+\infty} i \cdot P(\tau = i)$, is called the expected first hitting time (EFHT) of this chain starting from $\xi_{\hat{t}} = x$. If $\xi_0$ is drawn from a distribution $\pi_0$, $\mathbb{E}[\tau \mid \xi_0 \sim \pi_0] = \sum_{x \in \mathcal{X}} \pi_0(x)\, \mathbb{E}[\tau \mid \xi_0 = x]$ is called the EFHT of the Markov chain over the initial distribution $\pi_0$. Thus, the expected running time of the corresponding EA starting from $\xi_0 \sim \pi_0$ is equal to $N_1 + N_2 \cdot \mathbb{E}[\tau \mid \xi_0 \sim \pi_0]$, where $N_1$ and $N_2$ are the number of fitness evaluations for the initial population and each iteration, respectively. For example, for the (1+1)-EA using sampling, $N_1 = k$ and $N_2 = 2k$ due to the reevaluation strategy. Note that when involving the expected running time of an EA on a problem in this paper, it is the expected running time starting from a uniform initial distribution $\pi_u$, i.e., $N_1 + N_2 \cdot \mathbb{E}[\tau \mid \xi_0 \sim \pi_u] = N_1 + N_2 \cdot \sum_{x \in \mathcal{X}} \frac{1}{|\mathcal{X}|} \mathbb{E}[\tau \mid \xi_0 = x]$. Thus, in order to analyze the expected running time of EAs, we just need to analyze the EFHT of the corresponding Markov chains.

In the following, we introduce the drift theorems which will be used to derive the EFHT of Markov chains in the paper.

Drift analysis was first introduced to the running time analysis of EAs by He and Yao (2001). Since then, it has become a popular tool in this field, and many variants have been proposed (e.g., in (Doerr et al., 2012b; Doerr and Goldberg, 2013)). In this paper, we will use its additive (i.e., Lemma 1) as well as multiplicative (i.e., Lemma 2) version. To use them, a function $V(x)$ has to be constructed to measure the distance of a state $x$ to the optimal state space $\mathcal{X}^*$. The distance function $V(x)$ satisfies that $V(x \in \mathcal{X}^*) = 0$ and $V(x \notin \mathcal{X}^*) > 0$. Then, we need to investigate the progress on the distance to $\mathcal{X}^*$ in each step, i.e., $\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$. For additive drift analysis (i.e., Lemma 1), an upper bound of the EFHT can be derived through dividing the initial distance by a lower bound of the progress. Multiplicative drift analysis (i.e., Lemma 2) is much easier to use when the progress is roughly proportional to the current distance to the optimum.

Lemma 1 (Additive Drift Analysis (He and Yao, 2001)). Given a Markov chain $\{\xi_t\}_{t=0}^{+\infty}$ and a distance function $V(x)$, if for any $t \ge 0$ and any $\xi_t$ with $V(\xi_t) > 0$, there exists a real number $c > 0$ such that

$$\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t] \ge c,$$

then the EFHT satisfies that $\mathbb{E}[\tau \mid \xi_0] \le V(\xi_0)/c$.

Lemma 2 (Multiplicative Drift Analysis (Doerr et al., 2012b)). Given a Markov chain $\{\xi_t\}_{t=0}^{+\infty}$ and a distance function $V(x)$, if for any $t \ge 0$ and any $\xi_t$ with $V(\xi_t) > 0$, there exists a real number $c > 0$ such that

$$\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t] \ge c \cdot V(\xi_t),$$

then the EFHT satisfies that

$$\mathbb{E}[\tau \mid \xi_0] \le \frac{1 + \log(V(\xi_0)/V_{\min})}{c},$$

where $V_{\min} = \min\{V(x) \mid V(x) > 0\}$.

The simplified drift theorem (Oliveto and Witt, 2011, 2012) as presented in Lemma 3 was proposed to prove exponential lower bounds on the FHT of Markov chains, where $X_t$ is usually represented by a mapping of $\xi_t$. It requires two conditions: a constant negative drift and exponentially decaying probabilities of jumping towards or away from the goal state. To relax the requirement of a constant negative drift, the simplified drift theorem with self-loops (Rowe and Sudholt, 2014) as presented in Lemma 4 has been proposed, which takes into account large self-loop probabilities.

Lemma 3 (Simplified Drift Theorem (Oliveto and Witt, 2011, 2012)). Let $X_t$, $t \ge 0$, be real-valued random variables describing a stochastic process over some state space. Suppose there exists an interval $[a, b] \subseteq \mathbb{R}$, two constants $\delta, \epsilon > 0$ and, possibly depending on $l := b - a$, a function $r(l)$ satisfying $1 \le r(l) = o(l/\log(l))$ such that for all $t \ge 0$ the following two conditions hold:
1. $\mathbb{E}[X_t - X_{t+1} \mid a < X_t < b] \le -\epsilon$,
2. $P(|X_{t+1} - X_t| \ge j \mid X_t > a) \le \frac{r(l)}{(1+\delta)^j}$ for $j \in \mathbb{N}_0$.
Then there is a constant $c > 0$ such that for $T := \min\{t \ge 0 : X_t \le a \mid X_0 \ge b\}$ it holds $P(T \le 2^{cl/r(l)}) = 2^{-\Omega(l/r(l))}$.

Lemma 4 (Simplified Drift Theorem with Self-loops (Rowe and Sudholt, 2014)). Let $X_t$, $t \ge 0$, be real-valued random variables describing a stochastic process over some state space. Suppose there exists an interval $[a, b] \subseteq \mathbb{R}$, two constants $\delta, \epsilon > 0$ and, possibly depending on $l := b - a$, a function $r(l)$ satisfying $1 \le r(l) = o(l/\log(l))$ such that for all $t \ge 0$ the following two conditions hold:
1. $\mathbb{E}[X_t - X_{t+1} \mid X_t = i] \le -\epsilon \cdot P(X_{t+1} \ne i \mid X_t = i)$ for $a < i < b$,
2. $P(|X_{t+1} - X_t| \ge j \mid X_t = i) \le \frac{r(l)}{(1+\delta)^j} \cdot P(X_{t+1} \ne i \mid X_t = i)$ for $i > a$, $j \in \mathbb{N}_0$.
Then there is a constant $c > 0$ such that for $T := \min\{t \ge 0 : X_t \le a \mid X_0 \ge b\}$ it holds $P(T \le 2^{cl/r(l)}) = 2^{-\Omega(l/r(l))}$.

3 Robustness to Prior Noise

In this section, by comparing the expected running time of the (1+1)-EA with or without sampling for solving the OneMax and the LeadingOnes problems under one-bit noise, we show the robustness of sampling to prior noise.

3.1 The OneMax Problem

One-bit noise with $p = 1$ is considered here. We first analyze the case in which sampling is not used. Note that Droste (2004) proved that the expected running time is super-polynomial for $p \in \omega(\log(n)/n)$. Gießen and Kötzing (2016) have recently reproved the super-polynomial lower bound for $p \in \omega(\log(n)/n) \cap 1 - \omega(\log(n)/n)$ by using the simplified drift theorem (Oliveto and Witt, 2011, 2012). However, their proof does not cover $p = 1$. Here, we use the simplified drift theorem with self-loops (Rowe and Sudholt, 2014) to prove the exponential running time lower bound for $p = 1$, as shown in Theorem 1.

Theorem 1. For the (1+1)-EA solving the OneMax problem under one-bit noise with $p = 1$, the expected running time is exponential.

Proof. We use Lemma 4 to prove this theorem. Let $X_t$ be the number of 0-bits of the solution after $t$ iterations of the (1+1)-EA. We consider the interval $[0, n^{1/4}]$, i.e., the parameters $a = 0$ (i.e., the global optimum) and $b = n^{1/4}$ in Lemma 4.
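Then, we analyze the drift $\mathbb{E}[X_t - X_{t+1} \mid X_t = i]$ for $1 \le i < n^{1/4}$. Let $p_{i,i+d}$ denote the probability that the next solution after bit-wise mutation and selection has $i + d$ ($-i \le d \le n - i$) number of 0-bits (i.e., $X_{t+1} = i + d$). We thus have

$$\mathbb{E}[X_t - X_{t+1} \mid X_t = i] = \sum_{d=1}^{i} d \cdot p_{i,i-d} - \sum_{d=1}^{n-i} d \cdot p_{i,i+d}. \qquad (1)$$

We then analyze the probabilities $p_{i,i+d}$ for $i \ge 1$. Let $P_d$ denote the probability that the offspring solution $x'$ generated by bit-wise mutation has $i + d$ number of 0-bits. Note that one-bit noise with $p = 1$ makes the noisy fitness and the true fitness of a solution have a gap of one, i.e., $|f^{n}(x) - f(x)| = 1$. For a solution $x$ with $|x|_0 = i$, $f^{n}(x) = n - i + 1$ with a probability of $\frac{i}{n}$; otherwise, $f^{n}(x) = n - i - 1$. Let $x$ and $x'$ denote the current solution and the offspring solution, respectively.

(1) When $d \ge 3$, $f^{n}(x') \le n - i - d + 1 \le n - i - 2 < f^{n}(x)$. Thus, the offspring $x'$ will be discarded in this case, which implies that $\forall d \ge 3 : p_{i,i+d} = 0$.

(2) When $d = 2$, the offspring solution $x'$ will be accepted if and only if $f^{n}(x') = n - i - 1 = f^{n}(x)$, the probability of which is $\frac{i+2}{n} \cdot \frac{n-i}{n}$, since it needs to flip one 0-bit of $x'$ and flip one 1-bit of $x$ in noise. Thus, $p_{i,i+2} = P_2 \cdot \frac{i+2}{n} \cdot \frac{n-i}{n}$.

(3) When $d = 1$, $x'$ will be accepted if and only if $f^{n}(x') = n - i \wedge f^{n}(x) = n - i - 1$, the probability of which is $\frac{i+1}{n} \cdot \frac{n-i}{n}$, since it needs to flip one 0-bit of $x'$ and flip one 1-bit of $x$ in noise. Thus, $p_{i,i+1} = P_1 \cdot \frac{i+1}{n} \cdot \frac{n-i}{n}$.

(4) When $d = -1$, $x'$ will be rejected if and only if $f^{n}(x') = n - i \wedge f^{n}(x) = n - i + 1$, the probability of which is $\frac{n-i+1}{n} \cdot \frac{i}{n}$, since it needs to flip one 1-bit of $x'$ and flip one 0-bit of $x$ in noise. Thus, $p_{i,i-1} = P_{-1} \cdot \big(1 - \frac{n-i+1}{n} \cdot \frac{i}{n}\big)$.

(5) When $d \le -2$, $f^{n}(x') \ge n - i - d - 1 \ge n - i + 1 \ge f^{n}(x)$. Thus, the offspring $x'$ will always be accepted in this case, which implies that $\forall d \le -2 : p_{i,i+d} = P_d$.

We then bound the probabilities $P_d$. For $d > 0$, $P_d \ge \binom{n-i}{d} \frac{1}{n^d} \big(1 - \frac{1}{n}\big)^{n-d}$, since it is sufficient to flip $d$ 1-bits and keep other bits unchanged; $P_{-d} \le \binom{i}{d} \frac{1}{n^d}$, since it is necessary to flip at least $d$ 0-bits. Thus, we can upper bound $\sum_{d=2}^{i} d P_{-d}$ as follows:

$$\sum_{d=2}^{i} d P_{-d} \le \sum_{d=2}^{i} d \binom{i}{d} \frac{1}{n^d} = \sum_{d=1}^{i} d \binom{i}{d} \frac{1}{n^d} - \frac{i}{n} = \frac{i}{n} \sum_{d=0}^{i-1} \binom{i-1}{d} \frac{1}{n^d} - \frac{i}{n} = \frac{i}{n} \Big( \Big(1 + \frac{1}{n}\Big)^{i-1} - 1 \Big).$$

For $P_{-1}$, we also need a tighter upper bound (see Lemma 2 in (Paixão et al., 2015)): $P_{-1} \le \frac{i}{n} \big(1 - \frac{1}{n}\big)^{n-1} \cdot 1.14$. By applying these probabilities to Eq. (1), we have

$$\begin{aligned}
\mathbb{E}[X_t - X_{t+1} \mid X_t = i] &= \Big(1 - \frac{n-i+1}{n} \cdot \frac{i}{n}\Big) P_{-1} + \sum_{d=2}^{i} d P_{-d} - \frac{i+1}{n} \cdot \frac{n-i}{n} P_1 - 2 \cdot \frac{i+2}{n} \cdot \frac{n-i}{n} P_2 \\
&\le \Big(1 - \frac{n-i+1}{n} \cdot \frac{i}{n}\Big) \frac{i}{n} \Big(1 - \frac{1}{n}\Big)^{n-1} \cdot 1.14 + \frac{i}{n} \Big( \Big(1 + \frac{1}{n}\Big)^{i-1} - 1 \Big) \\
&\quad - \frac{i+1}{n} \cdot \frac{n-i}{n} \cdot \frac{n-i}{n} \Big(1 - \frac{1}{n}\Big)^{n-1} - 2 \cdot \frac{i+2}{n} \cdot \frac{n-i}{n} \cdot \frac{(n-i)(n-i-1)}{2n^2} \Big(1 - \frac{1}{n}\Big)^{n-2} \\
&\le \frac{i}{n} \Big(1 - \frac{1}{n}\Big)^{n-1} (1.14 - 2) + O\Big(\Big(\frac{i}{n}\Big)^2\Big) \qquad (\text{since } i < n^{1/4}) \\
&\le -\frac{0.86}{e} \cdot \frac{i}{n} + O\Big(\Big(\frac{i}{n}\Big)^2\Big). \qquad \Big(\text{by } \Big(1 - \frac{1}{n}\Big)^{n-1} \ge \frac{1}{e}\Big)
\end{aligned}$$

To investigate condition 1 of Lemma 4, we also need to analyze the probability $P(X_{t+1} \ne i \mid X_t = i)$ for $1 \le i < n^{1/4}$. We have

$$P(X_{t+1} \ne i \mid X_t = i) = \Big(1 - \frac{n-i+1}{n} \cdot \frac{i}{n}\Big) P_{-1} + \sum_{d=2}^{i} P_{-d} + \frac{i+1}{n} \cdot \frac{n-i}{n} P_1 + \frac{i+2}{n} \cdot \frac{n-i}{n} P_2.$$

It is easy to verify that $P(X_{t+1} \ne i \mid X_t = i) = \Theta(\frac{i}{n})$. Thus, $\mathbb{E}[X_t - X_{t+1} \mid X_t = i] = -\Omega(P(X_{t+1} \ne i \mid X_t = i))$, which implies that condition 1 of Lemma 4 holds.

For condition 2 of Lemma 4, we need to compare $P(|X_{t+1} - X_t| \ge j \mid X_t = i)$ with $\frac{r(l)}{(1+\delta)^j} \cdot P(X_{t+1} \ne i \mid X_t = i)$ for $i \ge 1$. We rewrite $P(X_{t+1} \ne i \mid X_t = i)$ as $P(|X_{t+1} - X_t| \ge 1 \mid X_t = i)$, and show that condition 2 holds with $\delta = 1$ and $r(l) = \frac{32e}{7}$. For $j \in \{1, 2, 3\}$, it trivially holds, because $\frac{r(l)}{(1+\delta)^j} > 1$. For $j \ge 4$, according to the analysis on $p_{i,i+d}$, we have

$$P(|X_{t+1} - X_t| \ge j \mid X_t = i) = \sum_{d=j}^{i} P_{-d} \le \binom{i}{j} \frac{1}{n^j} \le \frac{1}{j!} \Big(\frac{i}{n}\Big)^j \le 2 \cdot 2^{-j} \cdot \frac{i}{n},$$

where the first inequality is because for decreasing the number of 0-bits by at least $j$ in mutation, it is necessary to flip at least $j$ 0-bits. Furthermore, we have

$$P(|X_{t+1} - X_t| \ge 1 \mid X_t = i) \ge p_{i,i-1} = \Big(1 - \frac{n-i+1}{n} \cdot \frac{i}{n}\Big) P_{-1} \ge \Big(1 - \frac{n-i+1}{n} \cdot \frac{i}{n}\Big) \frac{i}{n} \Big(1 - \frac{1}{n}\Big)^{n-1} \ge \frac{7}{16e} \cdot \frac{i}{n},$$

where the last inequality holds with $n \ge 2$. Thus,

$$\frac{r(l)}{(1+\delta)^j} \cdot P(|X_{t+1} - X_t| \ge 1 \mid X_t = i) \ge \frac{32e}{7} \cdot 2^{-j} \cdot \frac{7}{16e} \cdot \frac{i}{n} = 2 \cdot 2^{-j} \cdot \frac{i}{n} \ge P(|X_{t+1} - X_t| \ge j \mid X_t = i),$$

which implies that condition 2 of Lemma 4 holds.

Note that $l = b - a = n^{1/4}$. Thus, by Lemma 4, the probability that the running time is $2^{O(n^{1/4})}$ when starting from a solution $x$ with $|x|_0 \ge n^{1/4}$ is exponentially small. Due to the uniform initial distribution, the probability that the initial solution $x$ has $|x|_0 < n^{1/4}$ is exponentially small by Chernoff's inequality. Thus, the expected running time is exponential.

Then, we analyze the case in which sampling with $k = 2$ is used. The expected running time is still exponential, as shown in Theorem 2. The proof is very similar to that of Theorem 1. The change of the probabilities $p_{i,i+d}$ caused by increasing $k$ from 1 to 2 does not affect the application of the simplified drift theorem with self-loops (i.e., Lemma 4). The detailed proofs are shown in the supplementary material due to space limitations.

Theorem 2. For the (1+1)-EA solving the OneMax problem under one-bit noise with $p = 1$, if using sampling with $k = 2$, the expected running time is exponential.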
We have shown that sampling with $k = 2$ is not effective. In the following, we prove that increasing $k$ from 2 to 3 can reduce the expected running time to be polynomial as shown in Theorem 3, the proof of which is accomplished by applying multiplicative drift analysis (Doerr et al., 2012b).
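To get a feel for how the sample size changes the selection step, the acceptance probabilities can be computed exactly for small instances. The sketch below assumes OneMax under one-bit noise with $p = 1$, where each noisy evaluation of a solution with $z$ 0-bits returns $f(x) + 1$ with probability $z/n$ and $f(x) - 1$ otherwise; the function names `estimate_distribution` and `acceptance_probability` are hypothetical and only serve this illustration.

```python
from fractions import Fraction
from math import comb


def estimate_distribution(n, zeros, k):
    """Exact distribution of k * f_hat(x) for a solution with `zeros` 0-bits.

    The sum of the k noisy values is used instead of the average so that
    all keys stay integers; this does not change any comparison.
    """
    up = Fraction(zeros, n)              # noise flips a 0-bit: value f(x) + 1
    dist = {}
    for m in range(k + 1):               # m evaluations return f(x) + 1
        prob = comb(k, m) * up**m * (1 - up)**(k - m)
        total = k * (n - zeros) + m - (k - m)
        dist[total] = prob
    return dist


def acceptance_probability(n, parent_zeros, offspring_zeros, k):
    """Probability that the offspring's estimate is at least the parent's."""
    parent = estimate_distribution(n, parent_zeros, k)
    child = estimate_distribution(n, offspring_zeros, k)
    return sum(pc * pp
               for c, pc in child.items()
               for s, pp in parent.items()
               if c >= s)


if __name__ == "__main__":
    n, i = 10, 3
    for k in (1, 2, 3):
        reject_better = 1 - acceptance_probability(n, i, i - 1, k)
        accept_worse = acceptance_probability(n, i, i + 1, k)
        print(k, reject_better, accept_worse)
```

Calling `acceptance_probability(n, i, i - 1, k)` gives the chance that a truly better offspring survives selection, and `acceptance_probability(n, i, i + 1, k)` the chance that a truly worse one does, so the two selection errors can be compared across sample sizes.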

Theorem 3. For the (1+1)-EA solving the OneMax problem with $n \ge 8$ under one-bit noise with $p = 1$, if using sampling with $k = 3$, the expected running time is $O(n \log n)$.

Proof. We use Lemma 2 to prove this theorem. We first construct a distance function $V(x)$ as $\forall x \in \mathcal{X} = \{0,1\}^n$, $V(x) = |x|_0$, where $|x|_0 = n - \sum_{i=1}^{n} x_i$ is the number of 0-bits of the solution $x$. It is easy to verify that $V(x \in \mathcal{X}^* = \{1^n\}) = 0$ and $V(x \notin \mathcal{X}^*) > 0$.

Then, we investigate $\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x]$ for any $x$ with $V(x) > 0$ (i.e., $x \notin \mathcal{X}^*$). We denote the number of 0-bits of the current solution $x$ by $i$ (where $1 \le i \le n$). Let $p_{i,i+d}$ be the probability that the next solution after bit-wise mutation and selection has $i + d$ number of 0-bits (where $-i \le d \le n - i$). Note that we are referring to the true number of 0-bits of a solution instead of the effective number of 0-bits after noisy evaluation. Thus,

$$\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x] = \sum_{d=1}^{i} d \cdot p_{i,i-d} - \sum_{d=1}^{n-i} d \cdot p_{i,i+d}. \qquad (2)$$

We then analyze $p_{i,i+d}$ for $i \ge 1$ as in the proof of Theorem 1. Note that for a solution $x$, the fitness value output by sampling with $k = 3$ is the average of noisy fitness values output by three independent fitness evaluations, i.e., $\hat{f}(x) = (f^{n}_1(x) + f^{n}_2(x) + f^{n}_3(x))/3$.

(1) When $d \ge 3$, $\hat{f}(x') \le n - i - d + 1 \le n - i - 2 < \hat{f}(x)$. Thus, the offspring $x'$ will be discarded, then we have $\forall d \ge 3 : p_{i,i+d} = 0$.

(2) When $d = 2$, $x'$ will be accepted if and only if $\hat{f}(x') = n - i - 1 = \hat{f}(x)$, the probability of which is $\big(\frac{i+2}{n}\big)^3 \big(\frac{n-i}{n}\big)^3$, since it needs to always flip one 0-bit of $x'$ and flip one 1-bit of $x$ in three noisy fitness evaluations. Thus, $p_{i,i+2} = P_2 \cdot \big(\frac{i+2}{n}\big)^3 \big(\frac{n-i}{n}\big)^3$.

(3) When $d = 1$, there are three possible cases for the acceptance of $x'$: $\hat{f}(x') = n - i \wedge \hat{f}(x) = n - i - 1$, $\hat{f}(x') = n - i \wedge \hat{f}(x) = n - i - \frac{1}{3}$ and $\hat{f}(x') = n - i - \frac{2}{3} \wedge \hat{f}(x) = n - i - 1$. The probability of $\hat{f}(x') = n - i$ is $\big(\frac{i+1}{n}\big)^3$, since it needs to always flip one 0-bit of $x'$ in three noisy evaluations. The probability of $\hat{f}(x') = n - i - \frac{2}{3}$ is $3 \big(\frac{i+1}{n}\big)^2 \frac{n-i-1}{n}$, since it needs to flip one 0-bit of $x'$ in two noisy evaluations and flip one 1-bit in the other noisy evaluation. Similarly, we can derive that the probabilities of $\hat{f}(x) = n - i - 1$ and $\hat{f}(x) = n - i - \frac{1}{3}$ are $\big(\frac{n-i}{n}\big)^3$ and $3 \big(\frac{n-i}{n}\big)^2 \frac{i}{n}$, respectively.
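Thus,
\[
p_{i,i+1} = P_1 \cdot \left( \left(\tfrac{i+1}{n}\right)^3 \left( \left(\tfrac{n-i}{n}\right)^3 + 3\left(\tfrac{n-i}{n}\right)^2 \tfrac{i}{n} \right) + 3\left(\tfrac{i+1}{n}\right)^2 \tfrac{n-i-1}{n} \left(\tfrac{n-i}{n}\right)^3 \right).
\]

(4) When $d = -1$, there are three possible cases for the rejection of $x'$: $\hat{f}(x') = n - i \wedge \hat{f}(x) = n - i + 1$; $\hat{f}(x') = n - i \wedge \hat{f}(x) = n - i + \frac{1}{3}$; and $\hat{f}(x') = n - i + \frac{2}{3} \wedge \hat{f}(x) = n - i + 1$. The probability of $\hat{f}(x') = n - i$ is $\left(\frac{n-i+1}{n}\right)^3$, since it needs to always flip one 1-bit of $x'$ in three noisy evaluations. The probability of $\hat{f}(x') = n - i + \frac{2}{3}$ is $3\left(\frac{n-i+1}{n}\right)^2 \frac{i-1}{n}$, since it needs to flip one 1-bit of $x'$ in two noisy evaluations and flip one 0-bit in the other evaluation. Similarly, we can derive that the probabilities of $\hat{f}(x) = n - i + 1$ and $\hat{f}(x) = n - i + \frac{1}{3}$ are $\left(\frac{i}{n}\right)^3$ and $3\left(\frac{i}{n}\right)^2 \frac{n-i}{n}$, respectively. Thus,
\[
p_{i,i-1} = P_{-1} \cdot \left( 1 - \left(\tfrac{n-i+1}{n}\right)^3 \left( \left(\tfrac{i}{n}\right)^3 + 3\left(\tfrac{i}{n}\right)^2 \tfrac{n-i}{n} \right) - 3\left(\tfrac{n-i+1}{n}\right)^2 \tfrac{i-1}{n} \left(\tfrac{i}{n}\right)^3 \right).
\]

(5) When $d \le -2$, $\hat{f}(x') \ge n - i - d - 1 \ge n - i + 1 \ge \hat{f}(x)$. Thus, $x'$ will always be accepted, and we have $\forall d \le -2: p_{i,i+d} = P_d$.

By applying these probabilities to Eq. (2), we have
\[
\begin{aligned}
\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x] &\ge p_{i,i-1} - p_{i,i+1} - 2 \cdot p_{i,i+2} \qquad (3) \\
&= P_{-1} \cdot \left( 1 - \left(\tfrac{n-i+1}{n}\right)^3 \left( \left(\tfrac{i}{n}\right)^3 + 3\left(\tfrac{i}{n}\right)^2 \tfrac{n-i}{n} \right) - 3\left(\tfrac{n-i+1}{n}\right)^2 \tfrac{i-1}{n} \left(\tfrac{i}{n}\right)^3 \right)
\end{aligned}
\]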
\[
\qquad - P_1 \cdot \left( \left(\tfrac{i+1}{n}\right)^3 \left( \left(\tfrac{n-i}{n}\right)^3 + 3\left(\tfrac{n-i}{n}\right)^2 \tfrac{i}{n} \right) + 3\left(\tfrac{i+1}{n}\right)^2 \tfrac{n-i-1}{n} \left(\tfrac{n-i}{n}\right)^3 \right) - 2 P_2 \cdot \left(\tfrac{i+2}{n}\right)^3 \left(\tfrac{n-i}{n}\right)^3 .
\]

We simplify the above expression by simple mathematical calculations. The term subtracted inside the factor of $P_{-1}$ is first bounded from above; by replacing $i$ with $n - i$ in that bound, we get the corresponding upper bound on the factor of $P_1$, which has the same form with $i$ and $n - i$ exchanged. Substituting both bounds turns Eq. (3) into Eq. (4). (The explicit expressions of these two bounds and of Eq. (4) are illegible in the source and are not reproduced here.)

We then bound the three mutation probabilities $P_{-1}$, $P_1$ and $P_2$. For decreasing the number of 0-bits by 1 in mutation, it is sufficient to flip one 0-bit and keep the other bits unchanged, thus we have $P_{-1} \ge \frac{i}{n}\left(1 - \frac{1}{n}\right)^{n-1}$. For increasing the number of 0-bits by 2, it is necessary to flip at least two 1-bits, thus we have $P_2 \le \binom{n-i}{2} \frac{1}{n^2} = \frac{(n-i)(n-i-1)}{2n^2}$. For increasing the number of 0-bits by 1, it needs to flip one more 1-bit than the number of 0-bits it flips, thus we have
\[
P_1 = \sum_{k=1}^{\min\{n-i,\, i+1\}} \binom{n-i}{k} \binom{i}{k-1} \left(\frac{1}{n}\right)^{2k-1} \left(1 - \frac{1}{n}\right)^{n-2k+1} \le \cdots \le \frac{n-i}{n} \left( 1 + \frac{i}{n} \sum_{k=2}^{+\infty} \frac{1}{k!} \right) = \frac{n-i}{n} \left( 1 + (e-2) \frac{i}{n} \right).
\]

By applying these probability bounds to Eq. (4), we obtain a lower bound on the drift in terms of $i$ and $n$ only. When $i \ge 2$, we have $1 + 1/i \le 3/2$ and $1 + 2/i \le 2$, so that $\frac{i+1}{n} \le \frac{3}{2} \cdot \frac{i}{n}$ and $\frac{i+2}{n} \le 2 \cdot \frac{i}{n}$; together with $P_{-1} \ge \frac{i}{en}$, the lower bound becomes a positive constant multiple of $i/n$ once $n$ is large enough. When $i = 1$, substituting $i = 1$ and the mutation probability bounds into Eq. (3) likewise gives a positive constant multiple of $1/n$ once $n$ is large enough. (The explicit chains of inequalities, and the thresholds on $n$ under which their last steps hold, are illegible in the source.)

Thus, the condition of Lemma 2 holds with $\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x] \ge \frac{c}{n} \cdot V(\xi_t)$ for a constant $c > 0$. We then get, noting that $V_{\min} = 1$ and $V(x) \le n$,
\[
\mathbb{E}[\tau \mid \xi_0] \le \frac{n}{c} \left( 1 + \log V(\xi_0) \right) \in O(n \log n),
\]
i.e., the expected running time is upper bounded by $O(n \log n)$.

Thus, we have shown that sampling is robust to noise for the (1+1)-EA solving the OneMax problem in the presence of one-bit noise. By comparing Theorem 2 with Theorem 3, we also find that a gap of one on the value of $k$ can lead to an exponential difference on the expected running time, which reveals that a careful selection of $k$ is important for the effectiveness of sampling. The complexity transition from $k = 2$ to
$k = 3$ is because sampling with $k = 3$ can make false progress (i.e., accepting solutions with more 0-bits) dominated by true progress (i.e., accepting solutions with fewer 0-bits), while sampling with $k = 2$ is not sufficient.

We have also conducted experiments to complement the theoretical results, which give bounds only. For each value of $n$ and $k$, we run the (1+1)-EA 1000 times independently. In each run, we record the number of fitness evaluations until an optimal solution w.r.t. the true fitness function is found for the first time. Then the total number of evaluations of the 1000 runs is averaged as the estimation of the expected running time, called the estimated ERT. We will always compute the estimated ERT in this way for the experiments throughout this paper. We estimate the expected running time of the (1+1)-EA using sampling with $k$ from 1 to 30. The results for $n = 40, 50, 60$ are plotted in Figure 1. We can observe that the curves are high at $k = 1, 2$ and drop suddenly at $k = 3$, which is consistent with our theoretical results in Theorems 1-3. Note that the curves grow linearly from $k = 3$ onward, which is because ERT $= 2k \cdot$ EFHT (i.e., the number of fitness evaluations in each iteration $\times$ the number of iterations), and when the noise has been sufficiently reduced by sampling, the number of iterations cannot be further reduced as $k$ increases, but the sampling cost increases linearly with $k$.

[Figure 1: Estimated ERT for the (1+1)-EA using sampling on the OneMax problem under one-bit noise with $p = 1$. Panels: (a) $n = 40$; (b) $n = 50$; (c) $n = 60$. Horizontal axis: sample size $k$; vertical axis: estimated ERT.]

The LeadingOnes Problem

One-bit noise with $p = 1/2$ is considered here. For the case in which sampling is not used, Gießen and Kötzing (2016) have proved the exponential running time lower bound as shown in Theorem 4. We prove in Theorem 5 that sampling can reduce the expected running time to be polynomial.

Theorem 4.
(Gießen and Kötzing, 2016) For the (1+1)-EA solving the LeadingOnes problem under one-bit noise with $p = 1/2$, the expected running time is $2^{\Omega(n)}$.

Theorem 5. For the (1+1)-EA solving the LeadingOnes problem under one-bit noise with $p = 1/2$, if using sampling with $k = 10n^4$, the expected running time is $O(n^6)$.

Proof. We use Lemma 1 to prove this theorem. Let $\mathrm{LO}(x) = \sum_{i=1}^{n} \prod_{j=1}^{i} x_j$ denote the number of leading 1-bits of a solution $x$. We first construct a distance function $V(x)$ as $\forall x \in \mathcal{X} = \{0,1\}^n$, $V(x) = n - \mathrm{LO}(x)$. It is easy to verify that $V(x \in \mathcal{X}^* = \{1^n\}) = 0$ and $V(x \notin \mathcal{X}^*) > 0$.

Then, we analyze $\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x]$ for any $x$ with $V(x) > 0$. For the current solution $x$, assume that $\mathrm{LO}(x) = i$ (where $0 \le i \le n - 1$). Let $x'$ be the offspring solution produced by mutating $x$. We consider three mutation cases for $\mathrm{LO}(x')$:

(1) The $l$-th leading 1-bit is flipped and the first $(l-1)$ leading 1-bits remain unchanged, which leads to $\mathrm{LO}(x') = l - 1$. Thus, $\forall 1 \le l \le i: P(\mathrm{LO}(x') = l - 1) = \left(1 - \frac{1}{n}\right)^{l-1} \frac{1}{n}$.
(2) The $(i+1)$-th bit (which must be 0) is flipped and the first $i$ leading 1-bits remain unchanged, which leads to $\mathrm{LO}(x') \ge i + 1$. Thus, we have $P(\mathrm{LO}(x') \ge i + 1) = \left(1 - \frac{1}{n}\right)^{i} \frac{1}{n}$.
(3) The first $(i+1)$ bits remain unchanged, which leads to $\mathrm{LO}(x') = i$. Thus, $P(\mathrm{LO}(x') = i) = \left(1 - \frac{1}{n}\right)^{i+1}$.

Assume that $\mathrm{LO}(x') = j$. We then analyze the acceptance probability of $x'$, i.e., $P(\hat{f}(x') \ge \hat{f}(x))$. Note that $\hat{f}(x) = \left(\sum_{i=1}^{k} f^n_i(x)\right)/k$, where $f^n_i(x)$ is the fitness output by one independent noisy evaluation. By one-bit noise with $p = 1/2$, the $f^n(x)$ value can be calculated as follows:
(1) The noise does not occur, whose probability is $1 - p = \frac{1}{2}$. Thus, $P(f^n(x) = i) = \frac{1}{2}$.
(2) The noise occurs, the probability of which is $p = \frac{1}{2}$.
(2.1) It flips the $l$-th leading 1-bit, then $f^n(x) = l - 1$. Thus, we have $\forall 1 \le l \le i: P(f^n(x) = l - 1) = \frac{1}{2n}$.
(2.2) It flips the $(i+1)$-th bit, which leads to $f^n(x) \ge i + 1$. Thus, we have $P(f^n(x) \ge i + 1) = \frac{1}{2n}$. Note that $f^n(x)$ reaches the minimum $i + 1$ when $x$ has a 0-bit at position $i + 2$, and $f^n(x)$ reaches the maximum $n$ when $x$ has all 1-bits from position $i + 2$ onward.
(2.3) Otherwise, $f^n(x)$ remains unchanged. Thus, we have $P(f^n(x) = i) = \frac{1}{2}\left(1 - \frac{i+1}{n}\right)$.

For each $i$, let $x^{opt}_i$ be the solution which has only 1-bits except for the $(i+1)$-th bit (i.e., $x^{opt}_i = 1^i 0 1^{n-i-1}$), and let $x^{pes}_i$ be the solution with $i$ leading 1-bits and otherwise only 0-bits (i.e., $x^{pes}_i = 1^i 0^{n-i}$). Then we have the stochastic ordering $f^n(x^{pes}_i) \preceq f^n(x) \preceq f^n(x^{opt}_i)$, which implies that $\hat{f}(x^{pes}_i) \preceq \hat{f}(x) \preceq \hat{f}(x^{opt}_i)$. We can similarly get $\hat{f}(x^{pes}_j) \preceq \hat{f}(x') \preceq \hat{f}(x^{opt}_j)$. Thus, it is easy to see that
\[
P(\hat{f}(x^{pes}_j) \ge \hat{f}(x^{opt}_i)) \le P(\hat{f}(x') \ge \hat{f}(x)) \le P(\hat{f}(x^{opt}_j) \ge \hat{f}(x^{pes}_i)). \qquad (5)
\]
Let $P_{mut}(x, x')$ be the probability that $x'$ is generated by mutating $x$.
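The noisy evaluation described above can be enumerated exactly for small $n$. The following minimal Python sketch (helper names are illustrative and not from the paper) builds the distribution of $f^n(x)$ for LeadingOnes under one-bit noise with $p = 1/2$ using exact fractions, and checks the stochastic ordering between $x^{pes}_i$, an arbitrary $x$ with $\mathrm{LO}(x) = i$, and $x^{opt}_i$ on one small instance.

```python
from collections import defaultdict
from fractions import Fraction


def leading_ones(x):
    """Number of leading 1-bits of x."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def noisy_distribution(x, p=Fraction(1, 2)):
    """Exact distribution of one noisy LeadingOnes evaluation of x
    under one-bit noise: with probability p a uniformly chosen bit of
    a copy of x is flipped before evaluation."""
    n = len(x)
    dist = defaultdict(Fraction)
    dist[leading_ones(x)] += 1 - p          # noise does not occur
    for j in range(n):                      # noise flips bit j
        y = list(x)
        y[j] = 1 - y[j]
        dist[leading_ones(y)] += p / n
    return dict(dist)


def tail(dist, v):
    """P(value >= v)."""
    return sum(pr for val, pr in dist.items() if val >= v)


def dominated_by(lower, upper, n):
    """True if `lower` is stochastically dominated by `upper`."""
    return all(tail(lower, v) <= tail(upper, v) for v in range(n + 1))


def mean(dist):
    return sum(val * pr for val, pr in dist.items())


if __name__ == "__main__":
    n, i = 6, 2
    x_pes = [1, 1, 0, 0, 0, 0]   # 1^i 0^(n-i)
    x_any = [1, 1, 0, 1, 0, 1]   # some x with LO(x) = i
    x_opt = [1, 1, 0, 1, 1, 1]   # 1^i 0 1^(n-i-1)
    d_pes, d_any, d_opt = map(noisy_distribution, (x_pes, x_any, x_opt))
    print(dominated_by(d_pes, d_any, n), dominated_by(d_any, d_opt, n))
    print(mean(d_pes), mean(d_opt))
```

The same helper can be used to check numerically the expectations and variances of $f^n(x^{opt}_i)$ and $f^n(x^{pes}_i)$ that the remainder of the proof relies on.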
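By combining the mutation probability with the acceptance probability, we have
\[
\begin{aligned}
\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x] &= \sum_{j=0}^{n} \sum_{x': \mathrm{LO}(x') = j} P_{mut}(x, x') \cdot P(\hat{f}(x') \ge \hat{f}(x)) \cdot (n - i - (n - j)) \qquad (6) \\
&\ge \sum_{j=0}^{i-1} \sum_{x': \mathrm{LO}(x') = j} P_{mut}(x, x') \cdot P(\hat{f}(x^{opt}_j) \ge \hat{f}(x^{pes}_i)) \cdot (j - i) \\
&\quad + \sum_{j=i+1}^{n} \sum_{x': \mathrm{LO}(x') = j} P_{mut}(x, x') \cdot P(\hat{f}(x^{pes}_j) \ge \hat{f}(x^{opt}_i)) \cdot (j - i) \qquad \text{(by Eq. (5))} \\
&\ge \sum_{j=0}^{i-1} \sum_{x': \mathrm{LO}(x') = j} P_{mut}(x, x') \cdot P(\hat{f}(x^{opt}_{i-1}) \ge \hat{f}(x^{pes}_i)) \cdot (j - i) \\
&\quad + \sum_{j=i+1}^{n} \sum_{x': \mathrm{LO}(x') = j} P_{mut}(x, x') \cdot P(\hat{f}(x^{pes}_{i+1}) \ge \hat{f}(x^{opt}_i)) \cdot (i + 1 - i),
\end{aligned}
\]
where the last inequality holds because $P(\hat{f}(x^{opt}_j) \ge \hat{f}(x^{pes}_i)) \le P(\hat{f}(x^{opt}_{i-1}) \ge \hat{f}(x^{pes}_i))$ for $j \le i - 1$, and $P(\hat{f}(x^{pes}_j) \ge \hat{f}(x^{opt}_i)) \ge P(\hat{f}(x^{pes}_{i+1}) \ge \hat{f}(x^{opt}_i))$ for $j \ge i + 1$. Continuing the derivation,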
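\[
\begin{aligned}
&= \left(1 - \tfrac{1}{n}\right)^{i} \tfrac{1}{n} \cdot P(\hat{f}(x^{pes}_{i+1}) \ge \hat{f}(x^{opt}_i)) - \sum_{j=0}^{i-1} \left(1 - \tfrac{1}{n}\right)^{j} \tfrac{1}{n} \cdot (i - j) \cdot P(\hat{f}(x^{opt}_{i-1}) \ge \hat{f}(x^{pes}_i)) \\
&\ge \frac{1}{en} \cdot P(\hat{f}(x^{pes}_{i+1}) \ge \hat{f}(x^{opt}_i)) - \frac{i(i+1)}{2n} \cdot P(\hat{f}(x^{opt}_{i-1}) \ge \hat{f}(x^{pes}_i)),
\end{aligned}
\]
where the equality holds because $\sum_{j=i+1}^{n} \sum_{x': \mathrm{LO}(x') = j} P_{mut}(x, x') = P(\mathrm{LO}(x') \ge i + 1) = \left(1 - \frac{1}{n}\right)^{i} \frac{1}{n}$ and $\forall 0 \le j \le i - 1: \sum_{x': \mathrm{LO}(x') = j} P_{mut}(x, x') = P(\mathrm{LO}(x') = j) = \left(1 - \frac{1}{n}\right)^{j} \frac{1}{n}$, and the inequality holds because $i \le n - 1$ (so that $\left(1 - \frac{1}{n}\right)^{i} \ge \frac{1}{e}$), $\left(1 - \frac{1}{n}\right)^{j} \le 1$ and $\sum_{j=0}^{i-1} (i - j) = \frac{i(i+1)}{2}$.

We then bound the probabilities $P(\hat{f}(x^{pes}_{i+1}) \ge \hat{f}(x^{opt}_i))$ and $P(\hat{f}(x^{opt}_{i-1}) \ge \hat{f}(x^{pes}_i))$. First, we have
\[
P(\hat{f}(x') \ge \hat{f}(x)) = P\left( \Big(\sum_{i=1}^{k} f^n_i(x')\Big)/k \ge \Big(\sum_{i=1}^{k} f^n_i(x)\Big)/k \right) = P\left( \sum_{i=1}^{k} f^n_i(x') - \sum_{i=1}^{k} f^n_i(x) \ge 0 \right) = P(Z(x', x) \ge 0),
\]
where the random variable $Z(x', x)$ is used to represent $\sum_{i=1}^{k} f^n_i(x') - \sum_{i=1}^{k} f^n_i(x)$ for convenience. We then calculate the expectation and variance of $f^n(x^{opt}_i)$ and $f^n(x^{pes}_i)$. Based on the analysis of $f^n(x)$, we can easily derive
\[
\begin{aligned}
\mathbb{E}[f^n(x^{opt}_i)] &= \frac{i}{2} + \sum_{j=0}^{i-1} \frac{1}{2n} \cdot j + \frac{1}{2n} \cdot n + \frac{1}{2}\left(1 - \frac{i+1}{n}\right) \cdot i = i + \frac{1}{2} - \frac{i^2 + 3i}{4n}, \\
\mathbb{E}[f^n(x^{pes}_i)] &= \frac{i}{2} + \sum_{j=0}^{i-1} \frac{1}{2n} \cdot j + \frac{1}{2n} \cdot (i+1) + \frac{1}{2}\left(1 - \frac{i+1}{n}\right) \cdot i = i - \frac{i^2 + i - 2}{4n}, \\
\mathrm{Var}(f^n(x^{opt}_i)) &= \mathbb{E}[(f^n(x^{opt}_i))^2] - (\mathbb{E}[f^n(x^{opt}_i)])^2 \\
&= \frac{i^2}{2} + \sum_{j=1}^{i-1} \frac{1}{2n} \cdot j^2 + \frac{1}{2n} \cdot n^2 + \frac{1}{2}\left(1 - \frac{i+1}{n}\right) \cdot i^2 - \left(i + \frac{1}{2} - \frac{i^2 + 3i}{4n}\right)^2, \\
\mathrm{Var}(f^n(x^{pes}_i)) &= \mathbb{E}[(f^n(x^{pes}_i))^2] - (\mathbb{E}[f^n(x^{pes}_i)])^2 \\
&= \frac{i^2}{2} + \sum_{j=1}^{i-1} \frac{1}{2n} \cdot j^2 + \frac{1}{2n} \cdot (i+1)^2 + \frac{1}{2}\left(1 - \frac{i+1}{n}\right) \cdot i^2 - \left(i - \frac{i^2 + i - 2}{4n}\right)^2 .
\end{aligned}
\]
Both variances are then bounded from above by constant multiples of $n^2$; the explicit bounds, which the source states to hold for $n$ above a small threshold, are illegible in the source. Thus, we have
\[
\mathbb{E}[Z(x^{pes}_{i+1}, x^{opt}_i)] = k\left(\mathbb{E}[f^n(x^{pes}_{i+1})] - \mathbb{E}[f^n(x^{opt}_i)]\right) = \frac{k}{2},
\]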
\[
\begin{aligned}
\mathbb{E}[Z(x^{opt}_{i-1}, x^{pes}_i)] &= k\left(\mathbb{E}[f^n(x^{opt}_{i-1})] - \mathbb{E}[f^n(x^{pes}_i)]\right) = -\frac{k}{2}, \\
\mathrm{Var}(Z(x^{pes}_{i+1}, x^{opt}_i)) &= k\left(\mathrm{Var}(f^n(x^{pes}_{i+1})) + \mathrm{Var}(f^n(x^{opt}_i))\right) \le \frac{19}{12} k n^2, \\
\mathrm{Var}(Z(x^{opt}_{i-1}, x^{pes}_i)) &= k\left(\mathrm{Var}(f^n(x^{opt}_{i-1})) + \mathrm{Var}(f^n(x^{pes}_i))\right) \le \frac{19}{12} k n^2 .
\end{aligned}
\]
Then, we can get the bounds on the probabilities $P(\hat{f}(x^{pes}_{i+1}) \ge \hat{f}(x^{opt}_i))$ and $P(\hat{f}(x^{opt}_{i-1}) \ge \hat{f}(x^{pes}_i))$ by Chebyshev's inequality. Note that $Z(x', x)$ is integer-valued. Writing $Z = Z(x^{pes}_{i+1}, x^{opt}_i)$, we have
\[
\begin{aligned}
P(\hat{f}(x^{pes}_{i+1}) \ge \hat{f}(x^{opt}_i)) &= P(Z \ge 0) = P(Z > -1) = P\left(Z - \mathbb{E}[Z] > -(1 + k/2)\right) \\
&\ge 1 - P\left(|Z - \mathbb{E}[Z]| \ge 1 + k/2\right) \\
&\ge 1 - \frac{\mathrm{Var}(Z)}{(1 + k/2)^2} \qquad \text{(by Chebyshev's inequality)} \\
&\ge 1 - \frac{19 n^2}{3k}.
\end{aligned}
\]
Similarly, writing $Z = Z(x^{opt}_{i-1}, x^{pes}_i)$, we have
\[
\begin{aligned}
P(\hat{f}(x^{opt}_{i-1}) \ge \hat{f}(x^{pes}_i)) &= P(Z \ge 0) = P\left(Z - \mathbb{E}[Z] \ge k/2\right) \le P\left(|Z - \mathbb{E}[Z]| \ge k/2\right) \\
&\le \frac{\mathrm{Var}(Z)}{(k/2)^2} \qquad \text{(by Chebyshev's inequality)} \\
&\le \frac{19 n^2}{3k}.
\end{aligned}
\]
By applying these two probability bounds to Eq. (6), we have
\[
\begin{aligned}
\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x] &\ge \frac{1}{en}\left(1 - \frac{19 n^2}{3k}\right) - \frac{i(i+1)}{2n} \cdot \frac{19 n^2}{3k} \\
&\ge \frac{1}{en}\left(1 - \frac{19}{30 n^2}\right) - \frac{19}{60 n} \qquad \text{(by } i \le n - 1 \text{ and } k = 10n^4\text{)} \\
&\in \Omega\left(\frac{1}{n}\right).
\end{aligned}
\]
Thus, the condition of Lemma 1 holds with $\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x] \in \Omega(1/n)$. We can get, noting that $V(x) = n - \mathrm{LO}(x) \le n$,
\[
\mathbb{E}[\tau \mid \xi_0] \le O(n) \cdot V(\xi_0) \in O(n^2),
\]
i.e., the expected number of iterations of the (1+1)-EA for finding the optimal solution is upper bounded by $O(n^2)$. Because the expected running time is $2k$ (i.e., the number of fitness evaluations in each iteration) $\times$ the expected number of iterations and $k = 10n^4$, we conclude that the expected running time is $O(n^6)$.

4 Robustness to Posterior Noise

In the above section, we have shown that sampling can be robust to one-bit noise (a kind of prior noise) for the (1+1)-EA solving the OneMax and the LeadingOnes problems. In this section, by comparing the expected running time of the (1+1)-EA with or without sampling for solving OneMax and LeadingOnes under additive Gaussian noise, we will prove that sampling can also be robust to posterior noise.

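4.1 The OneMax Problem

Additive Gaussian noise $N(\theta, \sigma^2)$ with $\sigma^2 \ge 1$ is considered here. We first analyze the case in which sampling is not used. By applying the original simplified drift theorem (Oliveto and Witt, 2011, 2012), we prove that the expected running time is exponential, as shown in Theorem 6.

Theorem 6. For the (1+1)-EA solving the OneMax problem under additive Gaussian noise $N(\theta, \sigma^2)$ with $\sigma^2 \ge 1$, the expected running time is exponential.

Proof. We use Lemma 3 to prove this theorem. Let $X_t$ be the number of 0-bits of the solution after $t$ iterations of the (1+1)-EA. We consider the interval $[0, n^{1/4}]$, i.e., the parameters $a = 0$ and $b = n^{1/4}$ in Lemma 3. Then, we analyze the drift $\mathbb{E}[X_t - X_{t+1} \mid X_t = i]$ for $1 \le i < n^{1/4}$. Let $p_{i,i+d}$ denote the probability that the next solution after bit-wise mutation and selection has $i + d$ ($-i \le d \le n - i$) 0-bits (i.e., $X_{t+1} = i + d$), and let $P_d$ denote the probability that the offspring solution $x'$ generated by bit-wise mutation has $i + d$ 0-bits (i.e., $|x'|_0 = i + d$). Then, we have, for $d \ne 0$,
\[
p_{i,i+d} = P_d \cdot P(f^n(x') \ge f^n(x)) = P_d \cdot P(n - i - d + \delta_1 \ge n - i + \delta_2) = P_d \cdot P(\delta_1 - \delta_2 \ge d) = P_d \cdot P(\delta \ge d),
\]
where $\delta_1, \delta_2 \sim N(\theta, \sigma^2)$ and $\delta \sim N(0, 2\sigma^2)$. We thus have
\[
\begin{aligned}
\mathbb{E}[X_t - X_{t+1} \mid X_t = i] &= \sum_{d=1}^{i} d \cdot p_{i,i-d} - \sum_{d=1}^{n-i} d \cdot p_{i,i+d} \le \sum_{d=1}^{i} d \cdot p_{i,i-d} - p_{i,i+1} \qquad (7) \\
&\le \sum_{d=1}^{i} d \cdot P_{-d} - P_1 \cdot P(\delta \ge 1).
\end{aligned}
\]
Let $\delta' \sim N(0, 1)$. Then, $P(\delta \ge 1) = P\left(\delta' \ge \frac{1}{\sqrt{2}\sigma}\right) \ge P\left(\delta' \ge \frac{1}{\sqrt{2}}\right) = P\left(\delta' \le -\frac{1}{\sqrt{2}}\right) \ge 0.23$, where the first inequality is by $\sigma \ge 1$, and the last one is obtained by calculating the CDF of the standard normal distribution. Furthermore, $P_1 \ge \frac{n-i}{n}\left(1 - \frac{1}{n}\right)^{n-1} \ge \frac{n-i}{en}$, and $P_{-d} \le \binom{i}{d} \frac{1}{n^d}$. Applying these probability bounds to Eq. (7), we have
\[
\begin{aligned}
\mathbb{E}[X_t - X_{t+1} \mid X_t = i] &\le \sum_{d=1}^{i} d \binom{i}{d} \frac{1}{n^d} - \frac{n-i}{en} \cdot 0.23 = \frac{i}{n}\left(1 + \frac{1}{n}\right)^{i-1} - \frac{0.23}{e} + \frac{0.23\, i}{en} \\
&\le \frac{i}{n}\left(e + \frac{0.23}{e}\right) - \frac{0.23}{e} = -\frac{0.23}{e} + O\left(\frac{n^{1/4}}{n}\right) \qquad \text{(since } i < n^{1/4}\text{)}.
\end{aligned}
\]
Thus, $\mathbb{E}[X_t - X_{t+1} \mid X_t = i] = -\Omega(1)$, which implies that condition 1 of Lemma 3 holds. For condition 2, we need to investigate $P(|X_{t+1} - X_t| \ge j \mid X_t \ge 1)$. Because it is necessary to flip at least $j$ bits, we have
\[
P(|X_{t+1} - X_t| \ge j \mid X_t \ge 1) \le \binom{n}{j} \frac{1}{n^j} \le \frac{1}{j!} \le 2 \cdot \frac{1}{2^j},
\]
which implies that condition 2 of Lemma 3 holds with $\delta = 1$ and $r(l) = 2$.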
Note that $l = b - a = n^{1/4}$. Thus, by Lemma 3, the expected running time is exponential.
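The constant 0.23 used in the proof can be checked numerically. The sketch below is only an illustration (the function names are not from the paper): it evaluates $P(\delta \ge d)$ for $\delta \sim N(0, 2\sigma^2)$ through the standard normal CDF, which in Python can be expressed with `math.erf`.

```python
import math


def std_normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def accept_worse_prob(d, sigma):
    """P(delta >= d) for delta ~ N(0, 2*sigma^2), i.e. the probability
    that noise makes an offspring whose true fitness is worse by d look
    at least as good as its parent (single evaluation each)."""
    return 1.0 - std_normal_cdf(d / (math.sqrt(2.0) * sigma))


if __name__ == "__main__":
    for sigma in (1.0, 2.0, 10.0):
        print(sigma, accept_worse_prob(1, sigma))
```

For $\sigma = 1$ and $d = 1$ the value is slightly below 0.24, and it increases towards 1/2 as $\sigma$ grows, which is why $\sigma \ge 1$ suffices for the bound.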

Note that Friedrich et al. (2015) have proved that for solving OneMax under additive Gaussian noise $N(0, \sigma^2)$ with $\sigma^2 \ge n^3$, the classical $(\mu+1)$-EA needs superpolynomial expected running time. Our result in Theorem 6 is complementary to their result with $\mu = 1$, since it covers a constant variance.

We then prove in Corollary 1 that using sampling can reduce the expected running time to be polynomial. The proof idea is that sampling with a large enough $k$ can reduce the noise to be $\sigma^2 = O(\log n / n)$, which allows a polynomial running time, as shown in the following lemma. In the following analysis, let $\mathrm{poly}(n)$ indicate any polynomial of $n$.

Lemma 5. (Gießen and Kötzing, 2016) Suppose posterior noise, sampling from some distribution $D$ with variance $\sigma^2$. Then we have that the (1+1)-EA optimizes OneMax in polynomial time if $\sigma^2 = O(\log n / n)$.

Corollary 1. For the (1+1)-EA solving the OneMax problem under additive Gaussian noise $N(\theta, \sigma^2)$ with $\sigma^2 \ge 1$ and $\sigma^2 \in O(\mathrm{poly}(n))$, if using sampling with $k = \lceil n\sigma^2 / \log n \rceil$, the expected running time is polynomial.

Proof. The noisy fitness is $f^n(x) = f(x) + \delta$, where $\delta \sim N(\theta, \sigma^2)$. The fitness output by sampling is $\hat{f}(x) = \left(\sum_{i=1}^{k} f^n_i(x)\right)/k = \left(\sum_{i=1}^{k} (f(x) + \delta_i)\right)/k = f(x) + \sum_{i=1}^{k} \delta_i / k$, where $\delta_i \sim N(\theta, \sigma^2)$. Thus, $\hat{f}(x) = f(x) + \delta'$, where $\delta' \sim N(\theta, \sigma^2/k)$. That is, sampling reduces the variance $\sigma^2$ of the noise to $\sigma^2/k$. Because $k = \lceil n\sigma^2 / \log n \rceil$, we have $\sigma^2/k \le \log n / n$. By Lemma 5, the expected number of iterations of the (1+1)-EA for finding the optimal solution is polynomial. We know that the expected running time is $2k \times$ the expected number of iterations. Since $\sigma^2 \in O(\mathrm{poly}(n))$, the expected running time is polynomial.

Thus, the comparison between Theorem 6 and Corollary 1 corrects our previous statement in (Qian et al., 2014) that sampling is ineffective for the (1+1)-EA solving OneMax under additive Gaussian noise.

We have conducted experiments to complement the theoretical results, which give bounds only. For the additive Gaussian noise, we set $\theta = 0$ and $\sigma = 1$. The results for $n = 10, 20, 30$ are plotted in Figure 2. Note that the point with $k = 1$ in the figure corresponds to the ERT without sampling.
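An experiment of this kind could be sketched as follows. This is not the authors' code: the function name, the number of runs, the seed and the iteration cap are illustrative assumptions. The sketch also uses the observation from the proof of Corollary 1 that the average of $k$ evaluations with noise $N(\theta, \sigma^2)$ is distributed as the true fitness plus a single $N(\theta, \sigma^2/k)$ term, so one Gaussian draw per solution replaces $k$ draws while $2k$ evaluations are still counted per iteration.

```python
import math
import random


def estimated_ert_gaussian(n, k, sigma=1.0, theta=0.0,
                           runs=20, seed=0, max_iters=20_000):
    """Average number of fitness evaluations over `runs` independent
    runs of the (1+1)-EA with sample size k on OneMax under additive
    Gaussian noise. Runs that hit the (hypothetical) cap max_iters are
    counted with the evaluations spent so far."""
    rng = random.Random(seed)
    total_evals = 0
    # Std. dev. of the averaged noise: sigma / sqrt(k).
    s = sigma / math.sqrt(k)
    for _ in range(runs):
        x = [rng.randint(0, 1) for _ in range(n)]
        for _ in range(max_iters):
            if sum(x) == n:  # optimum w.r.t. the true fitness
                break
            # Bit-wise mutation with rate 1/n.
            y = [1 - b if rng.random() < 1.0 / n else b for b in x]
            fx = sum(x) + rng.gauss(theta, s)
            fy = sum(y) + rng.gauss(theta, s)
            total_evals += 2 * k  # k evaluations for x and k for y
            if fy >= fx:
                x = y
    return total_evals / runs


if __name__ == "__main__":
    for k in (1, 2, 3, 5, 10):
        print(k, estimated_ert_gaussian(n=20, k=k, runs=20, seed=1))
```

Because the cost per iteration grows linearly in $k$, any benefit of sampling must come from a reduction in the number of iterations that outweighs the factor $k$.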
From Figure 2(b) and (c), we can observe that the ERT has a fast drop at the beginning of the curve, reaches the minimum at a small sample size, and consistently grows after that. The minimum is much smaller than the value at $k = 1$, thus it is clear that a moderate sampling can reduce the running time from no sampling, which is consistent with our theoretical result. However, in Figure 2(a), the ERT always increases with $k$, which is similar to what was observed in Figure 1 in (Qian et al., 2014). The setting in (Qian et al., 2014) is $n = 10$, $\theta = 0$ and $\sigma = 10$. A too small $n$ (e.g., $n = 10$) makes the decrease of the number of iterations easily dominated by the increase of $k$; therefore, we did not observe the dropping stage of the curve.

4.2 The LeadingOnes Problem

Additive Gaussian noise $N(\theta, \sigma^2)$ with $\sigma^2 \ge n^2$ is considered here. We first analyze the case in which sampling is not used. Using the original simplified drift theorem (Oliveto and Witt, 2011, 2012), we prove that the expected running time is exponential, as shown in Theorem 7.

Theorem 7. For the (1+1)-EA solving the LeadingOnes problem under additive Gaussian noise $N(\theta, \sigma^2)$ with $\sigma^2 \ge n^2$, the expected running time is exponential.

Proof. We use Lemma 3 to prove this theorem. Let $X_t$ be the number of 0-bits of the
