Analysis of Noisy Evolutionary Optimization When Sampling Fails


Chao Qian, Member, IEEE, Chao Bian, Student Member, IEEE, Yang Yu, Member, IEEE, Ke Tang, Senior Member, IEEE, and Xin Yao, Fellow, IEEE

arXiv [cs.NE], October 2018

Abstract—In noisy evolutionary optimization, sampling is a common strategy to deal with noise. By the sampling strategy, the fitness of a solution is evaluated multiple times (called the sample size) independently, and its true fitness is then approximated by the average of these evaluations. Previous studies on sampling are mainly empirical. In this paper, we first investigate the effect of the sample size from a theoretical perspective. By analyzing the (1+1)-EA on the noisy LeadingOnes problem, we show that as the sample size increases, the running time can reduce from exponential to polynomial, but then return to exponential. This suggests that a proper sample size is crucial in practice. Then, we investigate what strategies can work when sampling with any fixed sample size fails. By two illustrative examples, we prove that using parent or offspring populations can be better. Finally, we construct an artificial noisy example to show that when using neither sampling nor populations is effective, adaptive sampling (i.e., sampling with an adaptive sample size) can work. This, for the first time, provides theoretical support for the use of adaptive sampling.

Index Terms—Noisy optimization, evolutionary algorithms, sampling, population, running time analysis.

I. INTRODUCTION

Evolutionary algorithms (EAs) are a type of general-purpose randomized optimization algorithms, inspired by natural evolution. They have been widely applied to solve real-world optimization problems, which are often subject to noise. Sampling is a popular strategy for dealing with noise: to estimate the fitness of a solution, it evaluates the fitness multiple (m) times (called the sample size) independently and then uses the sample average to approximate the true fitness. Sampling reduces the variance of the noise by a factor of m, but also increases the computation time for the fitness estimation of a solution by a factor of m. Previous studies mainly focused on the empirical design of efficient sampling methods, e.g., adaptive sampling [4], [5], which dynamically decides the sample size m for each solution in each generation. The theoretical analysis of sampling was rarely touched.

Due to their sophisticated behaviors of mimicking natural phenomena, the theoretical analysis of EAs is difficult. Much effort has thus been devoted to understanding the behavior of EAs from a theoretical viewpoint [], [7], but most of it focuses on noise-free optimization. The presence of noise further increases the randomness of optimization, and thus also increases the difficulty of analysis. For running time analysis, one essential theoretical aspect of noisy evolutionary optimization, only a few results have been reported. The classic (1+1)-EA was first studied on the OneMax and LeadingOnes problems under various noise models [3], [7], [0], [4], [], [7]. The results showed that the (1+1)-EA is efficient only under low noise levels; e.g., for the (1+1)-EA solving OneMax in the presence of one-bit noise, the maximal noise level allowing a polynomial running time is O(log n/n), where the noise level is characterized by the noise probability p ∈ [0,1] and n is the problem size. Later studies mainly proved the robustness of different strategies to noise, including using populations [6], [7], [4], [], [7], sampling [], [3] and threshold selection [4].

(C. Qian and C. Bian are with the School of Computer Science and Technology, University of Science and Technology of China, Hefei, China; e-mails: chaoqian@ustc.edu.cn, biancht@mail.ustc.edu.cn. Y. Yu is with the National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; e-mail: yuy@nju.edu.cn. K. Tang and X. Yao are with the Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China; e-mails: {tangk3, xiny}@sustc.edu.cn. This work is extended from [5].)
For example, the (µ+1)-EA with a sufficiently large logarithmic parent population size µ [4], the (1+λ)-EA with a sufficiently large logarithmic offspring population size λ [4], the (1+1)-EA using sampling with a suitable sample size m [3], or the (1+1)-EA using threshold selection with a suitable threshold τ [4] can solve OneMax in polynomial time even if the probability of one-bit noise reaches 1. Note that there was also a sequence of papers analyzing the running time of the compact genetic algorithm [3] and of ant colony optimization algorithms [8], [], [], [6] on noisy problems, including OneMax as well as the combinatorial optimization problem of single-destination shortest paths.

The very few running time analyses involving sampling [], [3] mainly showed the effectiveness of sampling with a large enough fixed sample size m. For example, for the (1+1)-EA solving OneMax under one-bit noise with p = ω(log n/n), using sampling with m = 4n^3 can reduce the running time exponentially. In addition, Akimoto et al. [] proved that using sampling with a large enough m can make optimization under additive unbiased noise behave as noiseless optimization. However, there are still many fundamental theoretical issues that have not been addressed, e.g., how the sample size affects the effectiveness of sampling, and what strategies can work when sampling fails.

In this paper, we first theoretically investigate the effect of the sample size. It may be believed that once the sample size m reaches an effective value, the running time will always remain polynomial as m continues to increase. We give a counterexample, i.e., the (1+1)-EA solving LeadingOnes under one-bit noise with p = 1. Qian et al. [] have shown that the running time will reduce from exponential to polynomial when m = 4n^4 log n/15. We prove that the running time will return to exponential when m ≥ n^5. Our analysis suggests that the selection of the sample size should be done carefully in practice.

Algorithm 1 (1+1)-EA
Given a pseudo-Boolean function f to be maximized, the procedure of the (1+1)-EA:
1: Let x be a uniformly chosen solution from {0,1}^n.
2: Repeat until some termination condition is met
3:   x' := flip each bit of x independently with prob. 1/n.
4:   if f(x') ≥ f(x) then x := x'.

Then, we theoretically compare the two strategies of using populations and using sampling with respect to their robustness to noise. Previous studies have shown that both of them are effective for solving OneMax under one-bit noise [4], [], [3], while using sampling is better for solving OneMax under additive Gaussian noise [3]. Here, we complement this comparison by constructing two specific noisy OneMax problems. For one of them, using parent populations is better than using sampling, while for the other, using offspring populations is better. In both cases, we prove that the employed parent and offspring population sizes are almost tight. We also give an artificial noisy OneMax problem where neither using populations nor using sampling is effective. For this case, we further prove that using adaptive sampling can reduce the running time exponentially, which provides some theoretical justification for the good empirical performance of adaptive sampling [8], [3].

This paper extends our preliminary work [5]. When comparing sampling with populations, we only considered parent populations in [5]. To get a complete understanding, we add the analysis of using offspring populations. We construct a new noisy example to show that using offspring populations can be better than using sampling (i.e., Theorems 9 and 10 in Section V). For the noisy example in Section VI, where we previously proved that neither sampling nor parent populations are effective while adaptive sampling can work, we now prove that using offspring populations is also ineffective (i.e., Theorem 14 in Section VI). To show that using parent populations is better than using sampling, we only gave an effective parent population size in [5]. We now add the analysis of the tightness of the effective parent population size (i.e., Theorem 8 in Section IV) as well as of the effective offspring population size (i.e., Theorem 11 in Section V).

The rest of this paper is organized as follows. Section II introduces some preliminaries. Section III analyzes the effect of the sample size. The effectiveness of using parent and offspring populations when sampling fails is proved in Sections IV and V, respectively. Section VI then shows that when neither sampling nor populations are effective, adaptive sampling can work. Finally, Section VII concludes the paper.

II. PRELIMINARIES

In this section, we first introduce the EAs and the sampling strategy, and then present the analysis tools that will be used in this paper.

Algorithm 2 (µ+1)-EA
Given a pseudo-Boolean function f to be maximized, the procedure of the (µ+1)-EA:
1: Let P be a set of µ uniformly chosen solutions from {0,1}^n.
2: Repeat until some termination condition is met
3:   x := a solution uniformly selected from P at random.
4:   x' := flip each bit of x independently with prob. 1/n.
5:   Let z ∈ argmin_{z ∈ P} f(z); ties are broken randomly.
6:   if f(x') ≥ f(z) then P := (P \ {z}) ∪ {x'}.

Algorithm 3 (1+λ)-EA
Given a pseudo-Boolean function f to be maximized, the procedure of the (1+λ)-EA:
1: Let x be a uniformly chosen solution from {0,1}^n.
2: Repeat until some termination condition is met
3:   Let Q := ∅.
4:   for i = 1 to λ do
5:     x' := flip each bit of x independently with prob. 1/n.
6:     Q := Q ∪ {x'}.
7:   Let z ∈ argmax_{z ∈ Q} f(z); ties are broken randomly.
8:   if f(z) ≥ f(x) then x := z.

A. Evolutionary Algorithms

The (1+1)-EA (i.e., Algorithm 1) maintains only one solution, and iteratively tries to produce one better solution by bit-wise mutation and selection. The (µ+1)-EA (i.e., Algorithm 2) uses a parent population of size µ. In each iteration, it also generates one new solution x', and then uses x' to replace the worst solution in the population P if x' is not worse. The (1+λ)-EA (i.e., Algorithm 3) uses an offspring population of size λ. In each iteration, it generates λ offspring solutions independently by mutating the parent solution x, and then uses the best offspring solution to replace the parent solution if it is not worse. When µ = 1 and λ = 1, both the (µ+1)-EA and the (1+λ)-EA degenerate to the (1+1)-EA. Note that for the (µ+1)-EA, a slightly different updating rule is also used in [3], [30]: x' is simply added to P and then the worst solution in P ∪ {x'} is deleted. Our results about the (µ+1)-EA derived in this paper also apply to that setting.

In noisy optimization, only a noisy fitness value f^n(x) instead of the exact one f(x) can be accessed. Note that in our analysis, the algorithms are assumed to use the reevaluation strategy as in [8], [0], [4]. That is, besides evaluating the noisy fitness f^n(x') of offspring solutions, the noisy fitness values of the parent solutions are reevaluated in each iteration. The running time of EAs is usually measured by the number of fitness evaluations until an optimal solution w.r.t. the true fitness function f is found for the first time [], [0], [4].
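To make the re-evaluation strategy and its interaction with sampling concrete, the following minimal Python sketch implements the (1+1)-EA of Algorithm 1 on a generic noisy fitness oracle; the oracle interface, the fixed evaluation budget and the parameter values are illustrative assumptions rather than part of the analyzed setting (m = 1 corresponds to not using sampling).

import random

def sample_average(noisy_fitness, x, m):
    # Estimate the fitness of x by averaging m independent noisy evaluations.
    return sum(noisy_fitness(x) for _ in range(m)) / m

def one_plus_one_ea(noisy_fitness, n, m=1, max_evals=100000):
    # (1+1)-EA with bit-wise mutation; the parent is re-evaluated in every
    # iteration, and each fitness estimate uses m independent samples.
    x = [random.randint(0, 1) for _ in range(n)]
    evals = m                                    # cost of estimating the initial solution
    while evals + 2 * m <= max_evals:
        y = [1 - b if random.random() < 1.0 / n else b for b in x]   # mutation
        fx = sample_average(noisy_fitness, x, m)   # re-evaluate the parent
        fy = sample_average(noisy_fitness, y, m)   # evaluate the offspring
        evals += 2 * m
        if fy >= fx:                             # keep the offspring if it appears no worse
            x = y
    return x

if __name__ == "__main__":
    onemax = lambda x: sum(x)                    # noise-free OneMax, only to exercise the loop
    print(sum(one_plus_one_ea(onemax, n=20, m=1, max_evals=20000)))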

B. Sampling

Sampling, as described in Definition 1, is a common strategy to deal with noise. It approximates the true fitness f(x) using the average of a number of random evaluations. The number m of random evaluations is called the sample size. Note that m = 1 implies that sampling is not used. Qian et al. [], [3] have theoretically shown the robustness of sampling to noise. Particularly, they proved that by using sampling with some fixed sample size, the running time of the (1+1)-EA for solving OneMax and LeadingOnes under noise can reduce from exponential to polynomial.

Definition 1 (Sampling). Sampling first evaluates the fitness of a solution m times independently and obtains the noisy fitness values f^n_1(x), f^n_2(x), ..., f^n_m(x), and then outputs their average, i.e., ˆf(x) = (1/m) Σ_{i=1}^{m} f^n_i(x).

Adaptive sampling dynamically decides the sample size for each solution in the optimization process, instead of using a fixed size. For example, one popular strategy [4], [5] is to first estimate the fitness of two solutions by a small number of samples, and then sequentially increase the number of samples until the difference can be significantly discriminated. It has been found useful in many applications [8], [3], while there has been no theoretical work supporting its effectiveness.

C. Analysis Tools

EAs often generate offspring solutions only based on the current population; thus, an EA can be modeled as a Markov chain {ξ_t}_{t≥0} (e.g., [6], [3]) by taking the EA's population space X as the chain's state space (i.e., ξ_t ∈ X) and taking the set X* of all optimal populations as the chain's target state space. Note that the population space X consists of all possible populations, and an optimal population contains at least one optimal solution. Given a Markov chain {ξ_t}_{t≥0} and ξ_t̂ = x, we define its first hitting time as τ = min{t ≥ 0 | ξ_{t̂+t} ∈ X*}. The expectation of τ, E[τ | ξ_t̂ = x] = Σ_{i≥0} i·P(τ = i | ξ_t̂ = x), is called the expected first hitting time (EFHT). If ξ_0 is drawn from a distribution π_0, then E[τ | ξ_0 ∼ π_0] = Σ_{x ∈ X} π_0(x)·E[τ | ξ_0 = x] is called the EFHT of the chain over the initial distribution π_0. Thus, the expected running time of the (µ+1)-EA starting from ξ_0 ∼ π_0 is µ + (µ+1)·E[τ | ξ_0 ∼ π_0], where the first µ is the cost of evaluating the initial population, and µ+1 is the cost of one iteration, in which the offspring solution x' is evaluated and the µ parent solutions are reevaluated. Similarly, the expected running time of the (1+λ)-EA starting from ξ_0 ∼ π_0 is 1 + (1+λ)·E[τ | ξ_0 ∼ π_0], where the first 1 is the cost of evaluating the initial solution, and 1+λ is the cost of one iteration, in which the λ offspring solutions are evaluated and the parent solution is reevaluated. For the (1+1)-EA, the expected running time is obtained by setting µ = 1 or λ = 1, i.e., 1 + 2·E[τ | ξ_0 ∼ π_0]. For the (1+1)-EA with sampling, it becomes m + 2m·E[τ | ξ_0 ∼ π_0], because the fitness estimation of a solution needs m independent evaluations. Note that in this paper, we consider the expected running time of an EA starting from a uniform initial distribution.

Next, we introduce several drift theorems which will be used to analyze the EFHT of Markov chains in this paper. The multiplicative drift theorem (i.e., Theorem 1) [9] is for deriving upper bounds on the EFHT. First, a distance function V(x) satisfying V(x) = 0 for x ∈ X* and V(x) > 0 for x ∉ X* needs to be designed to measure the distance of a state x to the target state space X*. Then, we need to analyze the drift towards X* in each step, i.e., E[V(ξ_t) − V(ξ_{t+1}) | ξ_t]. If the drift in each step is roughly proportional to the current distance to the optimum, we can derive an upper bound on the EFHT accordingly.

Theorem 1 (Multiplicative Drift [9]). Given a Markov chain {ξ_t}_{t≥0} and a distance function V over X, if for any t ≥ 0 and any ξ_t with V(ξ_t) > 0 there exists c > 0 such that E[V(ξ_t) − V(ξ_{t+1}) | ξ_t] ≥ c·V(ξ_t), then it holds that E[τ | ξ_0] ≤ (1 + log(V(ξ_0)/V_min))/c, where V_min denotes the minimum among all possible positive values of V.

The simplified negative drift theorem (i.e., Theorem 2) [8], [9] is for proving exponential lower bounds on the EFHT of Markov chains, where X_t is often represented by a mapping of ξ_t. From Theorem 2, we can see that two conditions are required: a constant negative drift and exponentially decaying probabilities of jumping towards or away from the target state. By building a relationship between the jumping distance and the length of the drift interval, a more general theorem, simplified negative drift with scaling [0], as presented in Theorem 3, has been proposed. Theorem 4 gives the original negative drift theorem [5], which is stronger because both simplified versions are proved by using this original theorem.

Theorem 2 (Simplified Negative Drift [8], [9]). Let X_t, t ≥ 0, be real-valued random variables describing a stochastic process over some state space. Suppose there exist an interval [a,b] ⊆ R, two constants δ, ε > 0 and, possibly depending on l := b − a, a function r(l) satisfying 1 ≤ r(l) = o(l/log l) such that for all t ≥ 0:
(1) E[X_t − X_{t+1} | a < X_t < b] ≤ −ε,
(2) for all j ∈ N^+: P(|X_{t+1} − X_t| ≥ j | X_t > a) ≤ r(l)/(1+δ)^j.
Then there exists a constant c > 0 such that for T := min{t ≥ 0 : X_t ≤ a | X_0 ≥ b} it holds that P(T ≤ 2^{cl/r(l)}) = 2^{−Ω(l/r(l))}.

Theorem 3 (Simplified Negative Drift with Scaling [0]). Let X_t, t ≥ 0, be real-valued random variables describing a stochastic process over some state space. Suppose there exist an interval [a,b] ⊆ R and, possibly depending on l := b − a, a drift bound ε := ε(l) > 0 as well as a scaling factor r := r(l) such that for all t ≥ 0:
(1) E[X_t − X_{t+1} | a < X_t < b] ≤ −ε,
(2) for all j ∈ N^+: P(|X_{t+1} − X_t| ≥ jr | X_t > a) ≤ e^{−j},
(3) 1 ≤ r ≤ min{ε²l, εl/(132 log(εl))}.
Then it holds for the first hitting time T := min{t ≥ 0 : X_t ≤ a | X_0 ≥ b} that P(T ≤ e^{εl/(132r²)}) = O(e^{−εl/(132r²)}).

Theorem 4 (Negative Drift [5]). Let X_t, t ≥ 0, be real-valued random variables describing a stochastic process over some state space. Pick two real numbers a(l) and b(l) depending on a parameter l such that a(l) < b(l) holds. Let T(l) be the random variable denoting the earliest point in time t ≥ 0 such that X_t ≤ a(l) holds. Suppose there exist λ(l) > 0 and p(l) > 0 such that for all t ≥ 0:
E[e^{−λ(l)(X_{t+1} − X_t)} | a(l) < X_t < b(l)] ≤ p(l).
Then it holds for all time bounds L(l) ≥ 0 that
P(T(l) ≤ L(l) | X_0 ≥ b(l)) ≤ e^{−λ(l)(b(l) − a(l))} · L(l) · D(l) · p(l),
where D(l) = max{1, E[e^{−λ(l)(X_{t+1} − b(l))} | X_t ≥ b(l)]}.
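The adaptive sampling strategy of Section II-B is only described informally above (start from a few samples, then keep sampling until the two solutions can be distinguished). One possible way to instantiate this idea is sketched below in Python; the initial sample size, the doubling schedule and the significance threshold z are assumptions made for illustration, not the scheme analyzed later in Section VI.

import random
from statistics import mean, stdev

def adaptive_compare(noisy_fitness, x, y, m0=5, max_m=10000, z=2.0):
    # Sequentially sample the noisy fitness of x and y until the estimated
    # difference of the sample means exceeds z standard errors (or a cap is hit).
    fx = [noisy_fitness(x) for _ in range(m0)]
    fy = [noisy_fitness(y) for _ in range(m0)]
    while len(fx) < max_m:
        diff = mean(fx) - mean(fy)
        se = (stdev(fx) ** 2 / len(fx) + stdev(fy) ** 2 / len(fy)) ** 0.5
        if se == 0 or abs(diff) > z * se:
            break                                # the difference looks significant; stop sampling
        fx.extend(noisy_fitness(x) for _ in range(len(fx)))   # double the sample sizes
        fy.extend(noisy_fitness(y) for _ in range(len(fy)))
    return mean(fx), mean(fy), len(fx)

The sample size spent on a single comparison thus adapts to how hard the two solutions are to separate, instead of being fixed in advance.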

III. THE EFFECT OF SAMPLE SIZE

Previous studies [], [3] have shown that for noisy evolutionary optimization, sampling with some fixed sample size m can decrease the running time exponentially in some situations. For example, for the (1+1)-EA solving the OneMax problem under one-bit noise with noise probability p = ω(log n/n), the expected running time is super-polynomial [0], while by using sampling with m = 4n^3, the running time reduces to polynomial []. Then, a natural question is whether the running time will always stay polynomial when using any polynomially bounded sample size larger than the effective m. It may be believed that the answer is yes, since the sample size m has already been effective and using a larger sample size will make the fitness estimation more accurate. For example, for the (1+1)-EA solving OneMax under one-bit noise, it is easy to see from Lemma 3 in [] that using a sample size larger than 4n^3 will make the probability of accepting a truly worse solution in a comparison continue to decrease, and the running time will obviously stay polynomial. In this section, we give a counterexample by considering the (1+1)-EA solving the LeadingOnes problem under one-bit noise, which suggests that the selection of the sample size should be done carefully in practice.

As presented in Definition 2, the goal of the LeadingOnes problem is to maximize the number of consecutive 1-bits counting from the left of a solution. We can easily see that the optimal solution is the string with all 1s (denoted as 1^n). As presented in Definition 3, the one-bit noise model flips a random bit of a solution before evaluation with probability p. When p = 1, it was known [] that the expected running time of the (1+1)-EA is exponential, while the running time reduces to polynomial by using sampling with m = 4n^4 log n/15. We prove in Theorem 5 that the running time of the (1+1)-EA returns to exponential if m ≥ n^5.

Definition 2 (LeadingOnes). The LeadingOnes problem is to find a binary string x ∈ {0,1}^n that maximizes f(x) = Σ_{i=1}^{n} Π_{j=1}^{i} x_j.

Definition 3 (One-bit Noise). Given a parameter p ∈ [0,1], let f^n(x) and f(x) denote the noisy and the true fitness of a solution x ∈ {0,1}^n, respectively; then
f^n(x) = f(x) with prob. 1 − p, and f^n(x) = f(x') with prob. p,
where x' is generated by flipping a uniformly randomly chosen bit of x.

From Lemma 6 in [], we can find the reason why sampling is effective only with a moderate sample size. In most cases, if f(x) > f(y), the expected gap between f^n(x) and f^n(y) is positive, which implies that a larger sample size is better since it will decrease P(ˆf(x) ≤ ˆf(y)). However, when x = 1^n and y is close to the optimum, the expectation of f^n(1^n) − f^n(y) can be negative, which implies that a larger sample size is worse since it will increase P(ˆf(1^n) ≤ ˆf(y)). Thus, neither a small nor a large sample size is effective. The sample size m = 4n^4 log n/15 just makes a good tradeoff, which leads to a not too large probability of ˆf(1^n) ≤ ˆf(y) and a sufficiently small probability of ˆf(x) ≤ ˆf(y) for two solutions x and y with f(x) > f(y) and E[f^n(x) − f^n(y)] > 0.
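The sign flip described above can be checked directly. Under one-bit noise with p = 1, the expected noisy fitness of the optimum 1^n is (n−1)/2, while that of the neighboring solution 1^{n−1}0 is (n−1)/2 + 1/n, i.e., strictly larger. The short Python sketch below estimates both expectations by Monte Carlo simulation; the problem size and the number of trials are arbitrary choices for illustration.

import random

def leading_ones(x):
    # Number of consecutive 1-bits counting from the left.
    lo = 0
    for b in x:
        if b != 1:
            break
        lo += 1
    return lo

def one_bit_noise(f, x, p=1.0):
    # With probability p, evaluate f on a copy of x with one uniformly chosen bit flipped.
    if random.random() < p:
        x = list(x)
        i = random.randrange(len(x))
        x[i] = 1 - x[i]
    return f(x)

if __name__ == "__main__":
    n, trials = 10, 200000
    opt = [1] * n                        # the optimum 1^n
    near = [1] * (n - 1) + [0]           # the solution 1^{n-1}0
    est = lambda s: sum(one_bit_noise(leading_ones, s) for _ in range(trials)) / trials
    # Expected values: (n-1)/2 = 4.5 for 1^n and (n-1)/2 + 1/n = 4.6 for 1^{n-1}0,
    # so averaging ever more samples only makes the wrong comparison more certain.
    print("estimated E[f^n(1^n)]      =", est(opt))
    print("estimated E[f^n(1^{n-1}0)] =", est(near))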

Theorem 5. For the (1+1)-EA solving LeadingOnes under one-bit noise with p = 1, the expected running time is exponential []; if using sampling with m = 4n^4 log n/15, the expected running time is polynomial []; if using sampling with m ≥ n^5, the expected running time is exponential.

Proof. We only need to prove the case m ≥ n^5. Our main idea is to show that before reaching the optimal solution, the algorithm will first find the solution 1^{n−1}0 or 1^{n−2}01 with probability at least 1/(n+2), while the probability of leaving 1^{n−1}0 or 1^{n−2}01 in one iteration is exponentially small. Combining these two points, the theorem holds.

Let a Markov chain {ξ_t}_{t≥0} model the analyzed evolutionary process, and let LO(x) denote the true number of leading 1-bits of a solution x. For any t ≥ 1, let C_t denote the event that at time t, the (1+1)-EA finds a solution with at least n−2 leading 1-bits for the first time, i.e., LO(ξ_t) ≥ n−2 and for all t' < t, LO(ξ_{t'}) < n−2; let A_t and B_t denote the sub-events of C_t which additionally require ξ_t ∈ {1^{n−1}0, 1^{n−2}01} and ξ_t ∈ {1^n, 1^{n−2}00}, respectively. Thus, before reaching the optimal solution, the (1+1)-EA finds a solution in {1^{n−1}0, 1^{n−2}01} with probability at least Σ_{t≥1} P(A_t | C_t) P(C_t).

We first show that P(A_t | C_t) ≥ 1/(n+2). Assume that ξ_{t−1} = x, where LO(x) < n−2. Let P_mut(x, y) denote the probability that x is mutated into y by bit-wise mutation. Then,
P(A_t | C_t) = (P_mut(x, 1^{n−1}0)·P(ˆf(1^{n−1}0) ≥ ˆf(x)) + P_mut(x, 1^{n−2}01)·P(ˆf(1^{n−2}01) ≥ ˆf(x)))/P(C_t).   (3)
For P(ˆf(1^{n−1}0) ≥ ˆf(x)) and P(ˆf(1^{n−2}01) ≥ ˆf(x)), we apply Hoeffding's inequality to derive the lower bound 1 − e^{−n/2}. By the definition of one-bit noise with p = 1, we get, for 0 ≤ k ≤ n−1,
E[f^n(1^k01^{n−k−1})] = (1/n)(Σ_{j=1}^{k} (j−1) + (n−k−1)k + n).
Then we have, for 1 ≤ k ≤ n−1,
E[f^n(1^k01^{n−k−1})] − E[f^n(1^{k−1}01^{n−k})] = (n−k−1)/n.   (4)
Thus, for k ≤ n−3, E[f^n(1^{n−1}0)] − E[f^n(1^k01^{n−k−1})] ≥ E[f^n(1^{n−1}0)] − E[f^n(1^{n−3}011)] = 1/n. Since LO(x) ≤ n−3 and E[f^n(x)] ≤ E[f^n(1^{LO(x)}01^{n−LO(x)−1})], we have E[f^n(1^{n−1}0)] − E[f^n(x)] ≥ 1/n.

Let r = E[ˆf(x) − ˆf(1^{n−1}0)]. Since the ˆf value obtained by sampling is the average of m independent evaluations, r = E[f^n(x)] − E[f^n(1^{n−1}0)] ≤ −1/n. As the noisy fitness values lie in [0, n], Hoeffding's inequality gives
P(ˆf(x) ≥ ˆf(1^{n−1}0)) = P(ˆf(x) − ˆf(1^{n−1}0) − r ≥ −r) ≤ exp(−2mr²/(2n)²) ≤ e^{−n/2},   (5)
where the last inequality is by |r| ≥ 1/n and m ≥ n^5. It is easy to see from Eq. (4) that E[f^n(1^{n−2}01)] = E[f^n(1^{n−1}0)]. Thus, we can similarly get
P(ˆf(x) ≥ ˆf(1^{n−2}01)) ≤ e^{−n/2}.   (6)
Applying Eqs. (5) and (6) to Eq. (3), we get
P(A_t | C_t) ≥ (1 − e^{−n/2})(P_mut(x, 1^{n−1}0) + P_mut(x, 1^{n−2}01))/P(C_t).
Since P(B_t | C_t) ≤ (P_mut(x, 1^n) + P_mut(x, 1^{n−2}00))/P(C_t), we have
P(A_t | C_t)/P(B_t | C_t) ≥ (1 − e^{−n/2})(P_mut(x, 1^{n−1}0) + P_mut(x, 1^{n−2}01))/(P_mut(x, 1^n) + P_mut(x, 1^{n−2}00)).
The four target solutions differ only in their last two bits, so this ratio is determined by the last two bits of x. If x_{n−1} = x_n = 0 or x_{n−1} = x_n = 1, the ratio is at least (1 − e^{−n/2})·2(n−1)/((n−1)² + 1) ≥ 1/(n+1); if x_{n−1} + x_n = 1, we can similarly derive that P(A_t | C_t) ≥ P(B_t | C_t). Since P(A_t | C_t) + P(B_t | C_t) = 1, our claim that P(A_t | C_t) ≥ 1/(n+2) holds in all cases. Thus, the probability that the (1+1)-EA finds a solution in {1^{n−1}0, 1^{n−2}01} before reaching the optimum is at least
Σ_{t≥1} P(A_t | C_t) P(C_t) ≥ (1/(n+2)) Σ_{t≥1} P(C_t) = (1/(n+2))·P(LO(ξ_0) < n−2),
where the equality holds because the union of the events C_t with t ≥ 1 is equivalent to the event that the initial solution ξ_0 has less than n−2 leading 1-bits; by the uniform initial distribution, P(LO(ξ_0) < n−2) = 1 − o(1).

We then show that after finding 1^{n−1}0 or 1^{n−2}01, the probability of the (1+1)-EA leaving this state in each iteration is exponentially small. From Eqs. (5) and (6), we know that for any x with LO(x) < n−2 and y ∈ {1^{n−1}0, 1^{n−2}01}, P(ˆf(x) ≥ ˆf(y)) ≤ e^{−n/2}. For x ∈ {1^n, 1^{n−2}00} and y ∈ {1^{n−1}0, 1^{n−2}01}, it is easy to verify that E[f^n(y) − f^n(x)] = 1/n; using the same analysis as for Eq. (5), we get P(ˆf(x) ≥ ˆf(y)) ≤ e^{−n/2} for such x and y as well. Combining the above two cases, we get, for any x ∉ {1^{n−1}0, 1^{n−2}01} and y ∈ {1^{n−1}0, 1^{n−2}01}, P(ˆf(x) ≥ ˆf(y)) ≤ e^{−n/2}. Thus, the probability of leaving {1^{n−1}0, 1^{n−2}01} in each step is exponentially small, which completes the proof.

IV. PARENT POPULATIONS CAN WORK ON SOME TASKS WHERE SAMPLING FAILS

Previous studies [4], [], [3] have shown that both using populations and using sampling can bring robustness to noise. For example, for the OneMax problem under one-bit noise with p = ω(log n/n), the (1+1)-EA needs exponential time to find the optimum [0], while using a sufficiently large parent population size µ [4], a sufficiently large offspring population size λ [4] or a sample size m = 4n^3 [] can all reduce the running time to polynomial. Then, a natural question is whether there exist cases where only one of these two strategies (i.e., populations and sampling) is effective. This question has been partially addressed. For the OneMax problem under additive Gaussian noise with large variances, it was shown that the (µ+1)-EA with µ = ω(1) needs super-polynomial time to find the optimum [3], while the (1+1)-EA using sampling can find the optimum in polynomial time [3]. Now, we try to solve the other part of this question; that is, we are to prove that using populations can be better than using sampling.

In this section, we show that compared with using sampling, using parent populations can be more robust to noise. Particularly, we compare the (1+1)-EA using sampling with the (µ+1)-EA for solving OneMax under symmetric noise. As presented in Definition 4, the goal of the OneMax problem is to maximize the number of 1-bits, and the optimal solution is 1^n. As presented in Definition 5, symmetric noise returns a false fitness 2n − f(x) with probability 1/2. It is easy to see that under this noise model, the distribution of f^n(x) for any x is symmetric about n.

Definition 4 (OneMax). The OneMax problem is to find a binary string x ∈ {0,1}^n that maximizes f(x) = Σ_{i=1}^{n} x_i.

Definition 5 (Symmetric Noise). Let f^n(x) and f(x) denote the noisy and the true fitness of a solution x, respectively; then
f^n(x) = f(x) with prob. 1/2, and f^n(x) = 2n − f(x) with prob. 1/2.

We prove in Theorem 6 that the expected running time of the (1+1)-EA using sampling with any sample size m is exponential. From the proof, we can find the reason why using sampling fails. Under symmetric noise, the distribution of f^n(x) for any x is symmetric about n. Thus, for any two solutions x and y, the distribution of f^n(x) − f^n(y) is symmetric about 0. By sampling, the distribution of ˆf(x) − ˆf(y) is still symmetric about 0, which implies that the offspring solution will always be accepted with probability at least 1/2 in each iteration of the (1+1)-EA. Such a behavior is analogous to a random walk, and thus the optimization is inefficient.
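This random-walk behavior can also be observed numerically. Under symmetric noise (Definition 5), the sample-average difference ˆf(y) − ˆf(x) is symmetric about 0 for every sample size m, so a strictly worse offspring y is accepted with probability at least 1/2 no matter how large m is. The Python sketch below estimates this acceptance probability; the chosen parent solution, problem size and number of trials are arbitrary illustration values.

import random

def one_max(x):
    return sum(x)

def symmetric_noise(f, x, n):
    # Return f(x) with prob. 1/2 and 2n - f(x) with prob. 1/2 (Definition 5).
    fx = f(x)
    return fx if random.random() < 0.5 else 2 * n - fx

def acceptance_probability(n=20, m=1, trials=5000):
    # Estimate P(fhat(y) >= fhat(x)) for a solution y that is strictly worse than x.
    x = [1] * (n // 2) + [0] * (n - n // 2)
    y = list(x)
    y[0] = 0                                     # y has one more 0-bit than x
    hits = 0
    for _ in range(trials):
        fhat_x = sum(symmetric_noise(one_max, x, n) for _ in range(m)) / m
        fhat_y = sum(symmetric_noise(one_max, y, n) for _ in range(m)) / m
        hits += fhat_y >= fhat_x
    return hits / trials

if __name__ == "__main__":
    for m in (1, 10, 100):
        # The estimate stays around 1/2 for every m: a larger sample size does not help here.
        print("m =", m, "acceptance probability ~", acceptance_probability(m=m))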

Theorem 6. For the (1+1)-EA solving OneMax under symmetric noise, if using sampling, the expected running time is exponential.

Proof. We apply the simplified negative drift theorem (i.e., Theorem 2) to prove it. Let X_t = |x|_0 denote the number of 0-bits of the solution x maintained by the (1+1)-EA after running t iterations. We consider the interval [0, n/10], i.e., the parameters a = 0 and b = n/10 in Theorem 2. We then analyze E[X_t − X_{t+1} | X_t = i] for 1 ≤ i < n/10. The drift is divided into two parts, E^+ and E^−; that is, E[X_t − X_{t+1} | X_t = i] = E^+ − E^−, where   (7)
E^+ = Σ_{x': |x'|_0 < i} P_mut(x, x')·P(ˆf(x') ≥ ˆf(x))·(i − |x'|_0),
E^− = Σ_{x': |x'|_0 > i} P_mut(x, x')·P(ˆf(x') ≥ ˆf(x))·(|x'|_0 − i).
To analyze E^+, we use the trivial upper bound 1 on P(ˆf(x') ≥ ˆf(x)). Then, we have
E^+ ≤ Σ_{x': |x'|_0 < i} P_mut(x, x')·(i − |x'|_0) ≤ i/n,   (8)
where the last inequality is directly derived from Eq. (7) in the proof of Theorem 9 in [].

For E^−, we have to consider that the number of 0-bits is increased. We only consider the cases where exactly one 1-bit is flipped (i.e., |x'|_0 = i+1), whose probability is ((n−i)/n)(1 − 1/n)^{n−1} ≥ (n−i)/(en). Let Z = f^n(x') − f^n(x). By the definition of symmetric noise, the value of Z can be −1, −(2i+1), 2i+1 and +1, each with probability 1/4. It is easy to see that the distribution of Z is symmetric about 0, i.e., −Z has the same distribution as Z. Since ˆf(x') − ˆf(x) is the average of m independent random variables which have the same distribution as Z, the distribution of ˆf(x') − ˆf(x) is also symmetric about 0, and thus P(ˆf(x') ≥ ˆf(x)) ≥ 1/2. Then, E^− ≥ ((n−i)/(en))·(1/2)·1 = (n−i)/(2en). By combining E^+ and E^−, we get E[X_t − X_{t+1} | X_t = i] ≤ i/n − (n−i)/(2en) ≤ −0.05, where the last inequality is by i < n/10. Thus, condition (1) of Theorem 2 holds with ε = 0.05. To make |X_{t+1} − X_t| ≥ j, it is necessary to flip at least j bits of x. Thus, we get P(|X_{t+1} − X_t| ≥ j | X_t) ≤ C(n,j)·(1/n)^j ≤ 1/j! ≤ 2·(1/2)^j. That is, condition (2) of Theorem 2 holds with δ = 1 and r(l) = 2. Note that l = b − a = n/10. By Theorem 2, we can conclude that the expected running time is exponential.

We prove in Theorem 7 that the (µ+1)-EA with µ = 3 log n can find the optimum in O(n log³ n) time. The reason for the effectiveness of using parent populations is that the true best solution will be discarded only if it appears worse than all the other solutions in the population, the probability of which can be made very small by using a logarithmic parent population size. Note that this finding is consistent with that in [4].

Theorem 7. For the (µ+1)-EA solving OneMax under symmetric noise, if µ = 3 log n, the expected running time is O(n log³ n).

Proof. We apply the multiplicative drift theorem (i.e., Theorem 1) to prove it. Note that the state of the corresponding Markov chain is now a population, i.e., a set of µ solutions. We first design a distance function V: for any population P, V(P) = min_{x ∈ P} |x|_0, i.e., the minimum number of 0-bits of the solutions in P. It is easy to see that V(P) = 0 iff P ∈ X*, i.e., P contains the optimum. Then, we investigate E[V(ξ_t) − V(ξ_{t+1}) | ξ_t = P] for any P with V(P) > 0 (i.e., P ∉ X*). Assume that currently V(P) = i, where i ≥ 1. We again divide the drift into two parts: E[V(ξ_t) − V(ξ_{t+1}) | ξ_t = P] = E^+ − E^−, where
E^+ = Σ_{P': V(P') < i} P(ξ_{t+1} = P' | ξ_t = P)·(i − V(P')),
E^− = Σ_{P': V(P') > i} P(ξ_{t+1} = P' | ξ_t = P)·(V(P') − i).

For E^+, we need to consider that the best solution in P is improved. Let x ∈ argmin_{x ∈ P} |x|_0; then |x|_0 = i. In one iteration of the (µ+1)-EA, a solution x' with |x'|_0 = i−1 can be generated by selecting x and flipping only one 0-bit in mutation, whose probability is at least (1/µ)·(i/n)(1 − 1/n)^{n−1} ≥ i/(eµn). If x' is not added to P, it must hold that f^n(x') < f^n(x'') for every x'' ∈ P, which happens with probability 1/2^µ since f^n(x'') > f^n(x') iff f^n(x'') = 2n − f(x''). Thus, the probability that x' is added to P (which implies V(ξ_{t+1}) ≤ i−1) is 1 − 1/2^µ. We then get E^+ ≥ (i/(eµn))·(1 − 1/2^µ).

For E^−, if there are at least two solutions x, y ∈ P such that |x|_0 = |y|_0 = i, it obviously holds that E^− = 0. Otherwise, V(ξ_{t+1}) > V(ξ_t) = i implies that for the unique best solution x ∈ P and any x'' ∈ P \ {x}, f^n(x'') ≥ f^n(x), which happens with probability at most 1/2^{µ−1} since f^n(x'') ≥ f^n(x) requires f^n(x'') = 2n − f(x''). Thus, P(V(ξ_{t+1}) > i) ≤ 1/2^{µ−1}. Furthermore, V can increase by at most n. Thus, E^− ≤ n/2^{µ−1}.

By combining E^+ and E^−, we get
E[V(ξ_t) − V(ξ_{t+1}) | ξ_t] ≥ (i/(eµn))(1 − 1/2^µ) − n/2^{µ−1} ≥ i/(10n log n) = V(ξ_t)/(10n log n),
where the second inequality holds for n large enough, since µ = 3 log n. Thus, by Theorem 1, E[τ | ξ_0] ≤ 10n log n·(1 + log n) = O(n log² n), which implies that the expected running time is O(n log³ n), since the algorithm needs to evaluate the offspring solution and reevaluate the µ parent solutions in each iteration.
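For comparison with the sampling-based (1+1)-EA above, a minimal Python sketch of the (µ+1)-EA with re-evaluation (Algorithm 2) is given below; the stopping rule, the tie-breaking and the parameter values are illustrative assumptions. The mechanism behind Theorem 7 is visible in the update step: the true best solution can only be removed if every other solution in the population appears at least as good under noise, an event whose probability decays exponentially with µ under symmetric noise.

import random

def mu_plus_one_ea(noisy_fitness, n, mu, max_iters):
    # (mu+1)-EA: uniform parent selection, bit-wise mutation, and replacement of the
    # solution that looks worst; all fitness values are re-evaluated in every iteration.
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    for _ in range(max_iters):
        parent = random.choice(pop)
        child = [1 - b if random.random() < 1.0 / n else b for b in parent]
        pop_fits = [noisy_fitness(x) for x in pop]          # re-evaluation of all parents
        child_fit = noisy_fitness(child)
        worst = min(range(mu), key=lambda i: pop_fits[i])   # ties broken by index here
        if child_fit >= pop_fits[worst]:
            pop[worst] = child
    return pop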
In the following, we show that the parent population size µ = 3 log n is almost tight for making the (µ+1)-EA efficient. Particularly, we prove that µ = O(1) is insufficient. Note that the proof is accomplished by applying the original negative drift theorem (i.e., Theorem 4) instead of the simplified versions (i.e., Theorems 2 and 3). To apply the simplified negative drift theorems, we would have to show that the probabilities of jumping towards and away from the target are exponentially decaying. However, the probability of jumping away from the target is at least a constant in the studied case. To jump away from the

7 7 target, t s suffcet that oe o-best soluto the curret populato s cloed by mutato ad the the best soluto s deleted the process of updatg the populato. The former evet happes wth probablty µ µ = Θ, ad the latter happes wth probablty, whch s Θ for µ µ = O. The orgal egatve drft theorem s stroger tha the smplfed oes, ad ca be appled here to prove the expoetal rug tme. Theorem 8. For the µ+-ea solvg OeMax uder symmetrc ose, f µ = O, the expected rug tme s expoetal. Proof. We apply the orgal egatve drft theorem.e., Theorem 4 to prove t. Let X t = Y t hz t, where Y t = m x P x 0 deotes the mmum umber of 0-bts of the soluto the populato P after t teratos of the µ+-ea, Z t = x P x 0 = Y t } deotes the umber of solutos P that have the mmum 0-bts Y t, ad for,,...,µ}, h = dµ d µ d µ wth d = µ+4. Note that 0 = h < h <... < hµ <, ad X t 0 ff Y t = 0,.e., P cotas at least oe optmum. We set l =, λl = ad cosder the terval [0,c ], where c = 3d,.e., the parameters al = 0 ad bl = c µ Theorem 4. The, we aalyze Eq.. It s easy to verfy that Eq. s equvalet to the followg equato: e X t r pl. PX t+ =r al<x t <bl r X t 9 We dvde the term the left sde of Eq. 9 to two parts: r < X t.e., X t+ < X t ad r > X t.e., X t+ > X t. We frst cosder X t+ < X t. Sce X t+ = Y t+ hz t+, X t = Y t hz t ad 0 hz t+,hz t <, we have X t+ < X t ff Y t+ Y t < 0 or Y t+ = Y t hz t+ > hz t. We cosder these two cases separately. Y t+ Y t = j. It mples that a ew soluto x wth x 0 = Y t j s geerated the t+-th terato of the algorthm. Suppose that x s geerated from some soluto x whch must satsfy that x 0 Y t selected from P, the x : x 0=Y t j P mut x,x x : x 0=Y t j Yt j j P mut x Y t,x j Yt < c j, where x j deotes ay soluto wth j 0-bts, the secod equalty s because t s ecessary to flp at least j 0-bts, ad the last equalty s by Y t = X t +hz t < bl+ = c. Meawhle, we have X t X t+ = Y t hz t Y t+ +hz t+ = j hz t j, where the secod equalty s by hz t+ = h = 0. Y t+ = Y t hz t+ > hz t. It mples that Z t < µ ad a ew soluto x wth x 0 = Y t s geerated. Suppose that the t + -th terato, the soluto selected from P for mutato s x. If x 0 > Y t, x : x 0=Y t P mut x,x x : x 0=Y t P mut x Yt+,x Y t+ = Yt+. If x 0 = Y t, x : x 0=Y t P mut x,x + Y t Yt j= j j e + Y t j= Yt j e + Yt/ Y. Sce Y t/ t = X t + hz t < bl+ = c ad c = 3d =, we have µ 3 µµ+4 P mut x,x. x : x 0=Y t Meawhle, t must hold that Z t+ = Z t +, thus we have X t X t+ = hz t+ hz t = hz t + hz t. By combg the above two cases, we get PX t+ = r al < X t < bl e X t r 0 r<x t Y t c j e j e + hzt+ hzt, Z t < µ 0, Z t = µ j= Y t ce j hzt + hz + t, Z t < µ 0, Z t = µ j= ce ce + hzt + hz t, Z t < µ 0, Z t = µ, where the secod equalty s by 0 < hz t + hz t < ad e s +s for 0 < s <. The, we cosder X t+ > X t. It s easy to verfy that X t+ > X t ff the t+-th terato, the ewly geerated soluto x satsfes that x 0 > Y t ad oe soluto x P wth x 0 = Y t s deleted. We frst aalyze the probablty of geeratg a ew soluto x wth x 0 > Y t. Suppose that the soluto selected from P for mutato s x. If x 0 > Y t, t s suffcet that all bts of x are ot flpped, thus x : x 0>Y t P mut x,x e. If x 0 = Y t, t s suffcet that oly oe -bt ofxs flpped, thus x : x 0>Y t P mut x,x Yt Yt e. Note that Y t = X t +hz t < bl + = c ad c = = 3 µµ+4 Θ for µ = O. Thus, we have P mut x,x c. e x : x 0>Y t We the aalyze the probablty of deletg oe soluto x P wth x 0 = Y t. Sce t s suffcet that the ftess evaluato of all solutos P x } wth more tha Y t 0- bts s affected by ose, the probablty s at least / µ. We fally aalyze X t X t+. 
If Z t =, Y t+ Y t +, thus X t X t+ = Y t Y t+ +hz t+ hz t hµ. If Z t, we have Y t+ = Y t ad Z t+ = Z t, thus X t X t+ = hz t+ hz t = hz t hz t. Note that for X t+ > X t, e Xt Xt+ < 0. Thus, we have PX t+ = r al < X t < bl e Xt r r>x t µ c e c µ+ d e hµ, Z t = e hzt hzt, Z t hµ, Zt = hz t hz t, Z t, e hµ, Zt = hz t hz t, Z t,

8 8 where the secod equalty s by e s s+s / = s+ s/ s/ for < s < 0, ad the last s by d = µ+4 ad c =. 3 By combg µµ+4 Eq. 0 ad Eq., we ca get PX t+ =r al<x t <bl e X t r ce ce + r X t hz t + hz t + d hµ, Z t = hz t + hz t + d hz t hz t, < Z t < µ. d hz t hz t, Z t = µ hµ If Z t =, hz = dµ d µ d t+ hz t d µ µ d µ d = d, ad µ we have hz t + hz t + d hµ = hz t + hz t d d hµ hµ. If < Z t < µ, hz t hz t hz = dµ Z t+ d µ Z t t+ hz t = d, ad smlarly we have d µ Z t d µ Z t hz t + hz t + d hz t hz t = hz t hz t + hµ hµ. If Z t = µ, d hz t hz t = hµ hµ. Thus, the above equato cotues wth d ce ce + d hµ hµ = /ce + d d µ 3 d µ = d µ, d d µ where the secod equalty s by c = 3d ad d 4. The µ codto of Theorem 4.e., Eq. or equvaletly Eq. 9 thus holds wth pl = d µ. Now we vestgate Dl = max,e e λl Xt+ bl X t bl } = max,e e bl Xt+ X t bl } Eq.. To derve a upper bod o Dl, we oly eed to aalyze E e bl Xt+ X t bl. E e bl Xt+ X t bl = PY t+ = r X t bl r bl + r<bl E e bl Xt+ X t bl,y t+ = r PY t+ = r X t bl E e bl Xt+ X t bl,y t+ = r. Whe Y t+ = r bl, we have bl X t+ = bl Y t+ + hz t+ hz t+ <. The we cosder the case that Y t+ < bl. Sce X t = Y t hz t bl, we have Y t bl > Y t+, whch mples that Y t bl ad Y t+ bl. To make Y t+ = r bl, t s ecessary that a ew soluto x wth x 0 = r bl s geerated by mutato. Let x deote the soluto selected from the populato P for mutato. Note that x 0 Y t bl. The, for r bl, PY t+ = r X t bl x : x P 0=r mutx,x x : x P 0=r mutx bl,x bl bl r bl r bl bl r. Furthermore, for Y t+ < Y t, t must hold that Z t+ =, ad thusbl X t+ = bl Y t+ +hz t+ = bl Y t+. Thus, the above equato cotues wth e+ r bl bl r bl e bl r e+ = e+ bl j= bl j e j e+ e bl / e bl / /e bl e+ /ce e+, where the fourth equalty s by bl bl+ = c ad the last equalty s by c = 3d. Thus, µ } Dl = max,e e bl Xt+ X t bl e+. Let Ll = e c/ Theorem 4. Note that c = 3d µ = 3 µµ+4 = Θ for µ = O. The, by Theorem 4, we get PTl e c/ X 0 bl e c e c/ e+ d µ = e Ω. By Cheroff bouds, for ay x chose from 0,} u.a.r., P x 0 < c = e Ω. By the uo boud, PY 0 < c µ e Ω = e Ω, whch mples that PX 0 < bl = PY 0 hz 0 < bl PY 0 < bl+ = PY 0 < c = e Ω. Thus, the expected rug tme s expoetal. V. OFFSPRING POPULATIONS CAN WORK ON SOME TASKS WHERE SAMPLING FAILS I the above secto, we have show that usg paret populatos ca be better tha usg samplg. We the show the superorty of usg offsprg populatos over samplg o the robustess to ose. Partcularly, we compare the +-EA usg samplg wth the +λ-ea, o the OeMax problem uder reverse ose. As preseted Defto 6, reverse ose returs a reverse ftess fx wth probablty /. Defto 6 Reverse Nose. Let f x ad fx deote the osy ad true ftess of a soluto x, respectvely, the f fx wth prob. /, x = fx wth prob. /. For OeMax uder reverse ose, t s easy to see that for ay two solutos x ad y, the dstrbuto of f x f y s symmetrc about 0. Thus, as we have foud uder symmetrc ose, the algorthm behavor by usg samplg s aalogous to radom walk, ad thus the optmzato s effcet. The proof of Theorem 9 s very smlar to that of Theorem 6, whch proves the expoetal rug tme requred by the +-EA usg samplg to solve OeMax uder symmetrc ose. Theorem 9. For the +-EA solvg OeMax uder reverse ose, f usg samplg, the expected rug tme s expoetal. Proof. The proof ca be accomplshed the same way as that of Theorem 6. The proof of Theorem 6 apples the smplfed egatve drft theorem.e., Theorem, ad aalyzes the drft EX t X t+ X t = by dvdg t to two parts: E + ad E. 
I the proof of Theorem 6, the aalyss of the postve drft E + does ot rely o the ose model, ad thus the upper boud E + / stll holds here. For the lower boud e of the egatve drft E, t reles o the property that for ay two solutos x wth x 0 = ad x wth x 0 = +, the

9 9 dstrbuto of f x f x s symmetrc about 0. By the defto of reverse ose, f x f x ca be +,, ad, each wth probablty /4; thus ts dstrbuto s stll symmetrc about 0. The, t stll holds that E e. Accordg to Theorem, we ca get that the expected rug tme s expoetal. By usg offsprg populatos, the probablty of losg the curret ftess becomes very small. Ths s because a far umber of offsprg solutos wth ftess ot worse tha the curret ftess wll be geerated wth a hgh probablty the reproducto of each terato of the +λ-ea, ad the curret ftess becomes worse oly f all these good offsprg solutos ad the paret soluto are evaluated correctly, the probablty of whch ca be very small by usg a logarthmc offsprg populato sze. Thus, usg offsprg populatos ca lead to a effcet optmzato, as show Theorem 0. Note that the reaso for the effectveess of usg offsprg populatos foud here s cosstet wth that [4]. Theorem 0. For the +λ-ea solvg OeMax uder reverse ose, f λ = 8 log, the expected rug tme s Olog. Proof. We apply Theorem to prove t. Each state of the correspodg Markov cha ξ t } + t=0 s just a soluto here. That s, ξ t correspods to the soluto after rugtteratos of the +λ-ea. We desg the dstace fucto as for x 0,}, Vx = x 0. Assume that curretly x 0 =, where. To aalyze EVξ t Vξ t+ ξ t = x, we dvde t to two parts as the proof of Theorem 7. That s, EVξ t Vξ t+ ξ t = x = E + E, where E + = Pξ t+ = y ξ t = x y 0, y: y 0< E = y: y 0> Pξ t+ = y ξ t = x y 0. For E +, sce y 0 <, we have y 0. Thus, E + Pξ t+ = y ξ t = x = P ξ t+ 0 < ξ t = x. y: y 0< To make ξ t+ 0 <, t requres that at least oe soluto x wth x 0 < s geerated the reproducto ad at least oe of them s evaluated correctly. To geerate a soluto x wth x 0 < by mutatg x, t s suffcet that oly oe 0-bt of x s flpped, whose probablty s e. Thus, each terato of the +λ-ea, the probablty of geeratg at least oe offsprg soluto x wth x 0 < s at least λ e λ e e. +λ e If λ e λ e +λ e >, e λ ; otherwse, e λ λ e. Thus, e λ m, λ e } = m, 4log e }, where the equalty s by λ = 8 log. Sce each soluto s evaluated correctly wth probablty, P ξ t+ 0 < ξ t = x m, 4log e }. Thus, E + m, 4log } e = m 4, log } e 4. For E, sce y 0, we have E P ξ t+ 0 > ξ t = x. Let q = x : x P 0 mutx,x deote the probablty of geeratg a offsprg solutox wth at most0-bts by mutatg x. Sce t s suffcet that o bt s flpped or oly oe 0-bt s flpped mutato,q + e. Now we aalyze P ξ t+ 0 > ξ t = x. Assume that the reproducto, exactly k offsprg solutos wth at most 0-bts are geerated, where 0 k λ; t happes wth probablty λ k q k q λ k. If k < λ, the soluto the ext geerato has more tha 0-bts.e., ξ t+ 0 > ff the ftess evaluato of these k offsprg solutos ad the paret soluto x s all affected by ose, whose probablty s. If k = λ, the soluto the ext geerato must have k+ at most 0-bts.e., ξ t+ 0. Thus, λ λ P ξ t+ 0 > ξ t = x = k k=0 q q k q λ k k+ λ λ, e where the last equalty s by q e. We the get E 8log e.3.3, where the base of the logarthm s. By calculatg E + E, we have EVξ t Vξ t+ ξ t = 5 Vξ t, where the secod equalty holds wth large eough. Thus, by Theorem, Eτ ξ 0 5+log = Olog, whch mples that the expected rug tme s Olog, sce t eeds to reevaluate the paret soluto ad evaluate the λ = 8log offsprg solutos each terato. The, we prove that a costat offsprg populato sze λ = O s ot suffcet to allow solvg the osy problem polyomal tme. Ths also mples that the effectve value λ = 8log derved the above theorem s early tght. From the proof, we ca fd that λ = O caot guaratee a suffcetly small probablty of losg the curret ftess, ad thus the optmzato s effcet. 
Theorem. For the +λ-ea solvg OeMax uder reverse ose, f λ = O, the expected rug tme s expoetal. Proof. We apply Theorem to prove t. Let X t = x 0 deote the umber of 0-bts of the soluto x mataed by the +λ-ea after rug t teratos. We cosder the terval [0, ],.e., a = 0 ad b = Theorem. 6e λ 6e λ We the aalyze EX t X t+ X t = for < 6e. We dvde the drft as follows: λ EX t X t+ X t = = E + E, where

10 0 E + = PX t+ = j X t = j, E = j=+ PX t+ = j X t = j. For E +, we eed to derve a upper boud o PX t+ = j X t = for j <. Note that X t+ = j mples that at least oe offsprg soluto x wth x 0 = j s geerated by mutatg x the reproducto. Thus, we have PX t+ =j X t = λ P mut x,x λ The, we get E + λ = λ x : x 0< x : x 0=j x : x 0=j x : x 0=j P mut x,x. P mut x,x j P mut x,x x 0 λ, where the last equalty s drectly derved by Eq. 8. For E, we easly have E PX t+ = j X t = = PX t+ > X t =. j=+ Let q = x : x P 0 mutx,x, where x s ay soluto wth 0-bts. Usg the same aalyss as Eq., we ca get λ λ PX t+ > X t = = q k q λ k k k+ k=0 = q λ q λ = q λ q λ + q λ q λ q λ 8e λ, where the last equalty s by q e ad q x : x P 0=+ mutx,x e 4. Thus, E λ/8e λ. By calculatg E + E, we have EX t X t+ X t = λ λ 8e λ λ 6e λ, where the last equalty s by < 6e. Thus, codto λ of Theorem holds wth ǫ = λ, whch s a costat for 6e λ λ = O. To make X t+ X t j, t s ecessary that at least oe offsprg soluto geerated by mutatg x flps at least j bts of x. Let pj deote the probablty that at least j bts of x are flpped mutato. We easly have pj j. Thus, j P X t+ X t j X t pj λ λ pj λ j j λ 3 j,.e., codto of Theorem holds wth δ= ad rl= λ=o. Note thatl=b a= =Θ. By Theorem, 6e λ we get that the expected rug tme s expoetal. VI. ADAPTIVE SAMPLING CAN WORK ON SOME TASKS WHERE BOTH SAMPLING AND POPULATIONS FAIL I ths secto, we frst theoretcally vestgate whether there exst cases where usg ether populatos or samplg s effectve. We gve a postve aswer by cosderg OeMax uder segmeted ose. The, we prove that such a stuato, usg adaptve samplg ca be effectve, whch provdes some theoretcal justfcato for the good emprcal performace of adaptve samplg practce [8], [3]. As preseted Defto 7, the OeMax problem s dvded to four segmets. I oe segmet, the ftess s evaluated correctly, whle the other three segmets, the ftess s dsturbed by dfferet oses. We prove Theorem that the expected rug tme of the +-EA usg samplg wth ay sample sze m s expoetal. From the proof, we ca fd the reaso for the effectveess of samplg. For two solutosx adx wth x 0 = x 0 +.e.,fx = fx +, the expected gaps betwee f x ad f x are postve ad egatve, respectvely, the segmets of 00 < x 0 50 ad 00 < x Thus, the former segmet, a larger sample sze s better sce t wll decrease Pˆfx ˆfx, whle the latter segmet, a larger sample sze s worse sce t wll crease Pˆfx ˆfx. Furthermore, there s o moderate sample sze whch ca make a good tradeoff. Thus, samplg fals ths case. Defto 7 OeMax uder Segmeted Nose. For ay x 0,}, the osy ftess value f x s calculated as: f x 0 > 50, f x = x 0 ; f 00 < x 0 50, f x 0 wth prob. /+/, x = 3+ x 0 wth prob. / /; 3 f 00 < x 0 00, f 4 x 0 wth prob. /, x = + x 0 3 wth prob. /; 4 f x 0 00, f x = 4 x 0 wth prob. /5, 4 δ wth prob. 4/5, where δ s radomly draw from a cotuous uform dstrbuto U[0,], ad /00 N +. Theorem. For the +-EA solvg OeMax uder segmeted ose, f usg samplg, the expected rug tme s expoetal. Proof. We dvde the proof to two parts accordg to the rage of m. Let X t = x 0 deote the umber of 0-bts of the soluto x mataed by the +-EA after rug t teratos. Whe m 4 400, we apply Theorem to prove that startg from X 0 50, the expected umber of teratos utl X t 4 00 s expoetal. Whem > 400, we apply Theorem to prove that startg from X 0 00, the expected umber of teratos utl X t 00 s expoetal. Due to the uform tal dstrbuto, both X 0 50 ad X 0 00 hold wth a hgh probablty. Thus, for ay m, the expected rug tme

11 utl fdg the optmum s expoetal. For the proof of each part, codto of Theorem trvally holds, ad we oly eed to show that EX t X t+ X t s upper bouded by a egatve costat. [Part I: m 4 ] We cosder the terval [ ]. As , 50 the proof of Theorem 6, we compute the drft EX t X t+ X t = where 00 < < 50 by E+ E.e., Eq. 7. For E, we cosder the cases where oly oe -bt of x s flpped mutato. That s, x 0 = +. We the show that the offsprg soluto x s accepted wth probablty at least 0.07.e., Pˆfx ˆfx 0.07 by cosderg two subcases for m. m 4. For 00 < k 50, let xk deote a soluto wth k 0-bts. Accordg to case of Defto 7, we have Ef x k = + k + = k ; Varf x k = + k + 3+k 4 3+k k 0 +k +4k 4. Let Y = f x f x. Note that x 0 = 00, 50 ad x 0 = +. The, we get that µ := EY = ad σ := VarY. Let Z = Y µ. The, we have EZ = 0, VarZ = σ ad ρ := E Z , where the last equalty holds wth large eough. Note that ˆfx ˆfx µ s the average of m depedet radom varables, whch have the same dstrbuto as Z. By Berry- Essee equalty [9], P ˆfx ˆfx µ m x σ Φx ρ σ 3 m, where Φx deotes the cumulatve dstrbuto fucto of the stadard ormal dstrbuto. Thus, Pˆfx ˆfx 0 = Pˆfx ˆfx µ µ ˆfx = P ˆfx µ m σ Φ µ m σ ρ σ 3 m 0.07, µ m σ where the last equalty s derved by µ =, 4 m 4 400, σ ad ρ 9 3. m 3. It holds that Pˆfx ˆfx 3 0., sce t s suffcet thatf x s always evaluated to3++ m depedet evaluatos. Combg the above two cases, our clam that Pˆfx ˆfx 0.07 holds. Note that < /50. Thus, we have E For E +, we ca smlarly get E + 50 as the proof of Theorem 6, because the offsprg soluto x s optmstcally assumed to be always accepted. Thus, the drft satsfes that EX t X t+ X t = = E + E 0./50. [Part II: m > ] We cosder the terval [ 00, 00 ], ad compute the drft EX t X t+ X t = where 00 < < 00 by E+ E.e., Eq. 7. For the egatve drft, we show that the probablty of acceptg the offsprg soluto x wth x 0 = + s at least 0.9. Let x k deote a soluto wth k 0-bts. Accordg to case 3 of Defto 7, we have, for 00 < k < 00, Ef x k f x k+ = 4 3+k +3+k+ 8; ad for 00 < k 00, Varf x k = +k6 + 4 k Ef x k / The, µ := Eˆfx ˆfx 8 ad σ := Varˆfx ˆfx = Varf x f x /m 8 5 /m. By Chebyshev s equalty ad m > 4 400, we have Pˆfx ˆfx P ˆfx ˆfx µ µ σ /µ 0., where the last equalty holds wth large eough. Thus, E e For E+, we stll have E Thus, the drft satsfes that EX t X t+ X t = = E + E 0.3. To prove the effectveess of paret populatos, we derve a suffcet codto for the expoetal rug tme of the µ+-ea requred to solve OeMax uder ose, whch s spred from Theorem 4 [3]. We geeralze ther result from addtve ose to arbtrary ose. As show Lemma, the codto tutvely meas that whe the soluto s close to the optmum, the probablty of dscardg t from the populato decreases learly w.r.t. the populato sze µ, whch s, however, ot small eough to make a effcet optmzato. Note that for the case where usg paret populatos works Secto IV, the probablty of dscardg the best soluto from the populato decreases expoetally w.r.t. µ. Let poly dcate ay polyomal of. Lemma. For the µ+-ea where µ poly solvg OeMax uder ose, f for ay y wth y > ad ay set of µ solutos Q = x,x,...,x µ }, Pf y < m x Qf x 3/5µ+, 5 the the expected rug tme s expoetal.

12 Proof. Let ξ t deote the populato after t teratos of the algorthm. Let X t deote the umber of solutos wth - bts ξ t. Let a = ad b = 0. We frst use a ductve proof to show that t 0, > a : EX t µb a. 6 For t = 0, due to the uform tal dstrbuto, we have EX 0 = µ /. Note that for j 3, j+ / j = j /3 j+ /3+. Thus, for > a, / / / b a, whch mples that > a,ex 0 µba. We the assume that 0 t k, > a : EX t µba, ad aalyze EX k+ for > a. Let X k = X0 k,xk,...,xk, l = l 0,l,...,l, l = =0 l 3 ad p = 5µ+. Let x deote the offsprg soluto geerated the t + -th terato of the algorthm, ad let x deote ay soluto wth -bts. Let P mut x,y deote the probablty that x s mutated to y by bt-wse mutato. We use P mut x j,x = y: y P = mutx j,y to deote the probablty of geeratg a soluto wth -bts by mutatg ay soluto wth j -bts. The, we have EX k+ X k = EEX k+ X k X k = PX k = l l =µ P x =,x ad ay x ξ k are ot deleted X k =l P x, oe x ξ k s deleted X k = l PX k =l P x = X k =l l +p l =µ = l =µ l =µ P x = X k = l l p PX k =l P x = X k =l p l p = PX k =l l j µ P mutx j,x p l p = p P mut x j,x l =µ = p PX k = ll p P mut x j,x µ PX k = l l p l =0 = p µ l =µ µ l PX k = l l j µ PX k j = l j l j µ P mut x j,x EXj p EX k k, where the secod equalty s because X k+ X k = ff x = ad x s added to the populato meawhle the solutos wth -bts ξ k are ot deleted; X k+ X k = ff x = ad oe soluto wth -bts ξ k s deleted, the frst equalty s because ay soluto wth -bts s deleted wth probablty at least p = 3 5µ+ by the codto Eq. 5, ad the fourth equalty s sce a paret soluto s uformly selected from ξ k for mutato. We further derve a upper boud o µ P mut x j,x EXj k as follows: µ P mut x j,x EXj k = a µ + j=a+ + + j= P mut x j,x EXj k j=+ a a j j + b a j a j j=a+ +b a l + + b a j l l= j=+ a a j +b a a b j j=a+ + e + l a + b j l= j=+ a b a + b a b a + e + a + b b a /, where the frst equalty s derved by applyg j a : P mut x j,x P mut x a,x a a a, EXk j = E Xk j = µ, j > a : EXk j µba j ad some smple upper bouds o P mut x j,x for j > a, the thrd equalty s by 0 < c < : + l= cl = c c = /c, ad the last s by a = , b = 0 ad > a. Combg the above two formulas, we get EX k+ X k p b a / p EX k, whch mples that EX k+ p b a /+ p EX k µ + 5µ+ 5µ+ µba µb a, where the secod equalty s by p = 3 5µ+ ad EXk µb a, ad the last equalty holds wth µ. Thus, our clam that t 0, > a : EX t µba holds. Based o Eq. 6 ad Markov s equalty, we get, for ay t 0, PX t EXt µba. Note that X t s the umber of optmal solutos the populato after t teratos. Let T = b a/. The, the probablty of fdg the optmal soluto T teratos s P t T,X t T t=0 PXt T µb a = µ b a /, whch s expoetally small for µ poly. Ths mples that the expected rug tme for fdg the optmal soluto s expoetal.

13 3 By verfyg the codto of Lemma, we prove Theorem 3 that the µ+-ea wth µ poly eeds expoetal tme for solvg OeMax uder segmeted ose. Theorem 3. For the µ+-ea where µ poly solvg OeMax uder segmeted ose, the expected rug tme s expoetal. Proof. We apply Lemma to prove t. For ay soluto y wth y 0 /00 ad Q = x,...,x µ }, let A deote the evet that f y < m x Qf x. We wll show that PA 4 5µ+, whch mples that the codto Eq. 5 holds sce y 0 /00 covers the requred rage of y > 599/600. Let B l 0 l µ deote the evet that l solutos Q are evaluated to have egatve osy ftess values. Note that for ay x, f x < 0 mples that x 0 /00, ad f x = 4 δ where δ U[0,]. For 0 l µ, PA B l Pf y < 0 B l PA f y < 0,B l. Uder the codtos f y < 0 ad B l, the osy ftess values of y ad the correspodg l solutos Q satsfy the same cotuous dstrbuto 4 δ where δ U[0,], thus PA f y < 0,B l l+ µ+. The, we get PA B l 4 5 µ+ ad PA = µ l=0 PA B l PB l 4 5µ+. By Lemma, the theorem holds. We the show Theorem 4 that usg offsprg populatos s also effectve ths case. By usg offsprg populatos, the probablty of mprovg the curret ftess becomes very small whe the soluto s the d segmet.e., 00 < x Ths s because a far umber of offsprg solutos wth ftess ot better tha the curret ftess wll be geerated wth a hgh probablty, ad the curret ftess becomes better oly f all these bad offsprg solutos ad the paret soluto are evaluated correctly, the probablty of whch almost decreases expoetally w.r.t. λ. Note that for the +λ-ea solvg OeMax uder reverse ose.e., Theorem 0, the effectveess of usg offsprg populatos s due to the small probablty of losg the curret ftess, sce t requres a far umber of offsprg solutos wth ftess ot worse tha the curret ftess to be evaluated correctly. Therefore, we ca see that usg offsprg populatos ca geerate a far umber of good ad bad offsprg solutos smultaeously, ad whether t wll be effectve depeds o the cocrete osy problem. Theorem 4. For the +λ-ea where λ poly solvg OeMax uder segmeted ose, the expected rug tme s expoetal. Proof. We apply the smplfed egatve drft theorem wth scalg.e., Theorem 3 to prove t. Let X t = x 0 deote the umber of 0-bts of the soluto x mataed by the +λ- EA after rugt teratos. We cosder the terval[ 75, 50 ],.e., a = 75 ad b = 50 Theorem 3. Frst, we aalyze EX t X t+ X t = for 75 < < 50. As the proof of Theorem, the drft s dvded to two parts: E + = PX t+ = j X t = j ad E = j=+ PX t+ = j X t = j. For E +, we cosder that the umber of 0-bts s decreased. Let q = x : x P 0,+} mutx,x,.e., the probablty of geeratg a soluto wthor+ 0-bts by mutatgx. Sce t s suffcet to flp o bts or flp oly oe -bt, q +. Now we aalyzepx t+ = j X t = for < j <. Assume that the reproducto, exactly k 00 offsprg solutos wth or + 0-bts are geerated, where 0 k λ; t happes wth probablty λ k q k q λ k. For k = λ, the soluto the ext geerato must have at least 0-bts.e., X t+. For 0 k < λ, each of the remag λ k solutos has j 0-bts wth probablty pj q, where pj := x : x P 0=j mutx,x. Thus, uder the codto that exactly k offsprg solutos wth or + 0-bts are geerated, the probablty that at least oe offsprg soluto has j 0-bts s pj q λ k. Furthermore, to make the soluto the ext geerato have j 0-bts.e., X t+ = j, t s ecessary that the ftess evaluato of these k offsprg solutos ad the paret soluto x s ot affected by ose, the probablty of whch s + k+. 
Thus, we have, for 00 < j <, PX t+ = j X t = λ λ q k q λ k pj λ k k+ k q + k=0 λ λ q k q λ k pj λ k k q + k+ k=0 λ k λ = pjλ + q k + q λ k k=0 λ λ = pjλ + q pjλ, 3 where the last equalty s by q + e For λ, λ+ 3 λ+ /λ 3 λ = λ+ λ 3, ad ote that 3, 3. Thus, we have, for 00 < j <, PX t+ = j X t = pj = P mut x,x. 7 For 0 j 00, we have x : x 0=j PX t+ = j X t = pj λ λ pj λ λ 8 λ j j j /300, where the frst equalty s because to make X t+ = j, t s ecessary that at least oe offsprg soluto wth j 0-bts s geerated, ad the last equalty s by > 75 ad j 00. By applyg Eqs. 7 ad 8 to E +, we get E + P mut x,x j /00<j< x : x 0=j + 0 j /00 λ j /300 + λ / ,


More information

The Occupancy and Coupon Collector problems

The Occupancy and Coupon Collector problems Chapter 4 The Occupacy ad Coupo Collector problems By Sarel Har-Peled, Jauary 9, 08 4 Prelmares [ Defto 4 Varace ad Stadard Devato For a radom varable X, let V E [ X [ µ X deote the varace of X, where

More information

Chapter 3 Sampling For Proportions and Percentages

Chapter 3 Sampling For Proportions and Percentages Chapter 3 Samplg For Proportos ad Percetages I may stuatos, the characterstc uder study o whch the observatos are collected are qualtatve ature For example, the resposes of customers may marketg surveys

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON430 Statstcs Date of exam: Frday, December 8, 07 Grades are gve: Jauary 4, 08 Tme for exam: 0900 am 00 oo The problem set covers 5 pages Resources allowed:

More information

CIS 800/002 The Algorithmic Foundations of Data Privacy October 13, Lecture 9. Database Update Algorithms: Multiplicative Weights

CIS 800/002 The Algorithmic Foundations of Data Privacy October 13, Lecture 9. Database Update Algorithms: Multiplicative Weights CIS 800/002 The Algorthmc Foudatos of Data Prvacy October 13, 2011 Lecturer: Aaro Roth Lecture 9 Scrbe: Aaro Roth Database Update Algorthms: Multplcatve Weghts We ll recall aga) some deftos from last tme:

More information

Lecture 9: Tolerant Testing

Lecture 9: Tolerant Testing Lecture 9: Tolerat Testg Dael Kae Scrbe: Sakeerth Rao Aprl 4, 07 Abstract I ths lecture we prove a quas lear lower boud o the umber of samples eeded to do tolerat testg for L dstace. Tolerat Testg We have

More information

A New Family of Transformations for Lifetime Data

A New Family of Transformations for Lifetime Data Proceedgs of the World Cogress o Egeerg 4 Vol I, WCE 4, July - 4, 4, Lodo, U.K. A New Famly of Trasformatos for Lfetme Data Lakhaa Watthaacheewakul Abstract A famly of trasformatos s the oe of several

More information

CHAPTER 4 RADICAL EXPRESSIONS

CHAPTER 4 RADICAL EXPRESSIONS 6 CHAPTER RADICAL EXPRESSIONS. The th Root of a Real Number A real umber a s called the th root of a real umber b f Thus, for example: s a square root of sce. s also a square root of sce ( ). s a cube

More information

For combinatorial problems we might need to generate all permutations, combinations, or subsets of a set.

For combinatorial problems we might need to generate all permutations, combinations, or subsets of a set. Addtoal Decrease ad Coquer Algorthms For combatoral problems we mght eed to geerate all permutatos, combatos, or subsets of a set. Geeratg Permutatos If we have a set f elemets: { a 1, a 2, a 3, a } the

More information

X ε ) = 0, or equivalently, lim

X ε ) = 0, or equivalently, lim Revew for the prevous lecture Cocepts: order statstcs Theorems: Dstrbutos of order statstcs Examples: How to get the dstrbuto of order statstcs Chapter 5 Propertes of a Radom Sample Secto 55 Covergece

More information

Bounds on the expected entropy and KL-divergence of sampled multinomial distributions. Brandon C. Roy

Bounds on the expected entropy and KL-divergence of sampled multinomial distributions. Brandon C. Roy Bouds o the expected etropy ad KL-dvergece of sampled multomal dstrbutos Brado C. Roy bcroy@meda.mt.edu Orgal: May 18, 2011 Revsed: Jue 6, 2011 Abstract Iformato theoretc quattes calculated from a sampled

More information

1 Solution to Problem 6.40

1 Solution to Problem 6.40 1 Soluto to Problem 6.40 (a We wll wrte T τ (X 1,...,X where the X s are..d. wth PDF f(x µ, σ 1 ( x µ σ g, σ where the locato parameter µ s ay real umber ad the scale parameter σ s > 0. Lettg Z X µ σ we

More information

CHAPTER VI Statistical Analysis of Experimental Data

CHAPTER VI Statistical Analysis of Experimental Data Chapter VI Statstcal Aalyss of Expermetal Data CHAPTER VI Statstcal Aalyss of Expermetal Data Measuremets do ot lead to a uque value. Ths s a result of the multtude of errors (maly radom errors) that ca

More information

THE ROYAL STATISTICAL SOCIETY 2016 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5

THE ROYAL STATISTICAL SOCIETY 2016 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5 THE ROYAL STATISTICAL SOCIETY 06 EAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5 The Socety s provdg these solutos to assst cadtes preparg for the examatos 07. The solutos are teded as learg ads ad should

More information

Introduction to Probability

Introduction to Probability Itroducto to Probablty Nader H Bshouty Departmet of Computer Scece Techo 32000 Israel e-mal: bshouty@cstechoacl 1 Combatorcs 11 Smple Rules I Combatorcs The rule of sum says that the umber of ways to choose

More information

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections ENGI 441 Jot Probablty Dstrbutos Page 7-01 Jot Probablty Dstrbutos [Navd sectos.5 ad.6; Devore sectos 5.1-5.] The jot probablty mass fucto of two dscrete radom quattes, s, P ad p x y x y The margal probablty

More information

Homework 1: Solutions Sid Banerjee Problem 1: (Practice with Asymptotic Notation) ORIE 4520: Stochastics at Scale Fall 2015

Homework 1: Solutions Sid Banerjee Problem 1: (Practice with Asymptotic Notation) ORIE 4520: Stochastics at Scale Fall 2015 Fall 05 Homework : Solutos Problem : (Practce wth Asymptotc Notato) A essetal requremet for uderstadg scalg behavor s comfort wth asymptotc (or bg-o ) otato. I ths problem, you wll prove some basc facts

More information

Part 4b Asymptotic Results for MRR2 using PRESS. Recall that the PRESS statistic is a special type of cross validation procedure (see Allen (1971))

Part 4b Asymptotic Results for MRR2 using PRESS. Recall that the PRESS statistic is a special type of cross validation procedure (see Allen (1971)) art 4b Asymptotc Results for MRR usg RESS Recall that the RESS statstc s a specal type of cross valdato procedure (see Alle (97)) partcular to the regresso problem ad volves fdg Y $,, the estmate at the

More information

Unimodality Tests for Global Optimization of Single Variable Functions Using Statistical Methods

Unimodality Tests for Global Optimization of Single Variable Functions Using Statistical Methods Malaysa Umodalty Joural Tests of Mathematcal for Global Optmzato Sceces (): of 05 Sgle - 5 Varable (007) Fuctos Usg Statstcal Methods Umodalty Tests for Global Optmzato of Sgle Varable Fuctos Usg Statstcal

More information

Analysis of Variance with Weibull Data

Analysis of Variance with Weibull Data Aalyss of Varace wth Webull Data Lahaa Watthaacheewaul Abstract I statstcal data aalyss by aalyss of varace, the usual basc assumptos are that the model s addtve ad the errors are radomly, depedetly, ad

More information

Bayes (Naïve or not) Classifiers: Generative Approach

Bayes (Naïve or not) Classifiers: Generative Approach Logstc regresso Bayes (Naïve or ot) Classfers: Geeratve Approach What do we mea by Geeratve approach: Lear p(y), p(x y) ad the apply bayes rule to compute p(y x) for makg predctos Ths s essetally makg

More information

Pseudo-random Functions

Pseudo-random Functions Pseudo-radom Fuctos Debdeep Mukhopadhyay IIT Kharagpur We have see the costructo of PRG (pseudo-radom geerators) beg costructed from ay oe-way fuctos. Now we shall cosder a related cocept: Pseudo-radom

More information

The Mathematical Appendix

The Mathematical Appendix The Mathematcal Appedx Defto A: If ( Λ, Ω, where ( λ λ λ whch the probablty dstrbutos,,..., Defto A. uppose that ( Λ,,..., s a expermet type, the σ-algebra o λ λ λ are defed s deoted by ( (,,...,, σ Ω.

More information

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best Error Aalyss Preamble Wheever a measuremet s made, the result followg from that measuremet s always subject to ucertaty The ucertaty ca be reduced by makg several measuremets of the same quatty or by mprovg

More information

Summary of the lecture in Biostatistics

Summary of the lecture in Biostatistics Summary of the lecture Bostatstcs Probablty Desty Fucto For a cotuos radom varable, a probablty desty fucto s a fucto such that: 0 dx a b) b a dx A probablty desty fucto provdes a smple descrpto of the

More information

X X X E[ ] E X E X. is the ()m n where the ( i,)th. j element is the mean of the ( i,)th., then

X X X E[ ] E X E X. is the ()m n where the ( i,)th. j element is the mean of the ( i,)th., then Secto 5 Vectors of Radom Varables Whe workg wth several radom varables,,..., to arrage them vector form x, t s ofte coveet We ca the make use of matrx algebra to help us orgaze ad mapulate large umbers

More information

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions. Ordary Least Squares egresso. Smple egresso. Algebra ad Assumptos. I ths part of the course we are gog to study a techque for aalysg the lear relatoshp betwee two varables Y ad X. We have pars of observatos

More information

Derivation of 3-Point Block Method Formula for Solving First Order Stiff Ordinary Differential Equations

Derivation of 3-Point Block Method Formula for Solving First Order Stiff Ordinary Differential Equations Dervato of -Pot Block Method Formula for Solvg Frst Order Stff Ordary Dfferetal Equatos Kharul Hamd Kharul Auar, Kharl Iskadar Othma, Zara Bb Ibrahm Abstract Dervato of pot block method formula wth costat

More information

Lecture 3 Probability review (cont d)

Lecture 3 Probability review (cont d) STATS 00: Itroducto to Statstcal Iferece Autum 06 Lecture 3 Probablty revew (cot d) 3. Jot dstrbutos If radom varables X,..., X k are depedet, the ther dstrbuto may be specfed by specfyg the dvdual dstrbuto

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statstc ad Radom Samples A parameter s a umber that descrbes the populato. It s a fxed umber, but practce we do ot kow ts value. A statstc s a fucto of the sample data,.e., t s a quatty whose

More information

F. Inequalities. HKAL Pure Mathematics. 進佳數學團隊 Dr. Herbert Lam 林康榮博士. [Solution] Example Basic properties

F. Inequalities. HKAL Pure Mathematics. 進佳數學團隊 Dr. Herbert Lam 林康榮博士. [Solution] Example Basic properties 進佳數學團隊 Dr. Herbert Lam 林康榮博士 HKAL Pure Mathematcs F. Ieualtes. Basc propertes Theorem Let a, b, c be real umbers. () If a b ad b c, the a c. () If a b ad c 0, the ac bc, but f a b ad c 0, the ac bc. Theorem

More information

Functions of Random Variables

Functions of Random Variables Fuctos of Radom Varables Chapter Fve Fuctos of Radom Varables 5. Itroducto A geeral egeerg aalyss model s show Fg. 5.. The model output (respose) cotas the performaces of a system or product, such as weght,

More information

L5 Polynomial / Spline Curves

L5 Polynomial / Spline Curves L5 Polyomal / Sple Curves Cotets Coc sectos Polyomal Curves Hermte Curves Bezer Curves B-Sples No-Uform Ratoal B-Sples (NURBS) Mapulato ad Represetato of Curves Types of Curve Equatos Implct: Descrbe a

More information

22 Nonparametric Methods.

22 Nonparametric Methods. 22 oparametrc Methods. I parametrc models oe assumes apror that the dstrbutos have a specfc form wth oe or more ukow parameters ad oe tres to fd the best or atleast reasoably effcet procedures that aswer

More information

Chapter 4 Multiple Random Variables

Chapter 4 Multiple Random Variables Revew for the prevous lecture: Theorems ad Examples: How to obta the pmf (pdf) of U = g (, Y) ad V = g (, Y) Chapter 4 Multple Radom Varables Chapter 44 Herarchcal Models ad Mxture Dstrbutos Examples:

More information

Solving Constrained Flow-Shop Scheduling. Problems with Three Machines

Solving Constrained Flow-Shop Scheduling. Problems with Three Machines It J Cotemp Math Sceces, Vol 5, 2010, o 19, 921-929 Solvg Costraed Flow-Shop Schedulg Problems wth Three Maches P Pada ad P Rajedra Departmet of Mathematcs, School of Advaced Sceces, VIT Uversty, Vellore-632

More information

Dimensionality Reduction and Learning

Dimensionality Reduction and Learning CMSC 35900 (Sprg 009) Large Scale Learg Lecture: 3 Dmesoalty Reducto ad Learg Istructors: Sham Kakade ad Greg Shakharovch L Supervsed Methods ad Dmesoalty Reducto The theme of these two lectures s that

More information

(b) By independence, the probability that the string 1011 is received correctly is

(b) By independence, the probability that the string 1011 is received correctly is Soluto to Problem 1.31. (a) Let A be the evet that a 0 s trasmtted. Usg the total probablty theorem, the desred probablty s P(A)(1 ɛ ( 0)+ 1 P(A) ) (1 ɛ 1)=p(1 ɛ 0)+(1 p)(1 ɛ 1). (b) By depedece, the probablty

More information

Algorithms Design & Analysis. Hash Tables

Algorithms Design & Analysis. Hash Tables Algorthms Desg & Aalyss Hash Tables Recap Lower boud Order statstcs 2 Today s topcs Drect-accessble table Hash tables Hash fuctos Uversal hashg Perfect Hashg Ope addressg 3 Symbol-table problem Symbol

More information

Qualifying Exam Statistical Theory Problem Solutions August 2005

Qualifying Exam Statistical Theory Problem Solutions August 2005 Qualfyg Exam Statstcal Theory Problem Solutos August 5. Let X, X,..., X be d uform U(,),

More information

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements Aoucemets No-Parametrc Desty Estmato Techques HW assged Most of ths lecture was o the blacboard. These sldes cover the same materal as preseted DHS Bometrcs CSE 90-a Lecture 7 CSE90a Fall 06 CSE90a Fall

More information

arxiv:math/ v1 [math.gm] 8 Dec 2005

arxiv:math/ v1 [math.gm] 8 Dec 2005 arxv:math/05272v [math.gm] 8 Dec 2005 A GENERALIZATION OF AN INEQUALITY FROM IMO 2005 NIKOLAI NIKOLOV The preset paper was spred by the thrd problem from the IMO 2005. A specal award was gve to Yure Boreko

More information

The Selection Problem - Variable Size Decrease/Conquer (Practice with algorithm analysis)

The Selection Problem - Variable Size Decrease/Conquer (Practice with algorithm analysis) We have covered: Selecto, Iserto, Mergesort, Bubblesort, Heapsort Next: Selecto the Qucksort The Selecto Problem - Varable Sze Decrease/Coquer (Practce wth algorthm aalyss) Cosder the problem of fdg the

More information

MATH 247/Winter Notes on the adjoint and on normal operators.

MATH 247/Winter Notes on the adjoint and on normal operators. MATH 47/Wter 00 Notes o the adjot ad o ormal operators I these otes, V s a fte dmesoal er product space over, wth gve er * product uv, T, S, T, are lear operators o V U, W are subspaces of V Whe we say

More information

Mu Sequences/Series Solutions National Convention 2014

Mu Sequences/Series Solutions National Convention 2014 Mu Sequeces/Seres Solutos Natoal Coveto 04 C 6 E A 6C A 6 B B 7 A D 7 D C 7 A B 8 A B 8 A C 8 E 4 B 9 B 4 E 9 B 4 C 9 E C 0 A A 0 D B 0 C C Usg basc propertes of arthmetc sequeces, we fd a ad bm m We eed

More information

å 1 13 Practice Final Examination Solutions - = CS109 Dec 5, 2018

å 1 13 Practice Final Examination Solutions - = CS109 Dec 5, 2018 Chrs Pech Fal Practce CS09 Dec 5, 08 Practce Fal Examato Solutos. Aswer: 4/5 8/7. There are multle ways to obta ths aswer; here are two: The frst commo method s to sum over all ossbltes for the rak of

More information

Lecture Notes Types of economic variables

Lecture Notes Types of economic variables Lecture Notes 3 1. Types of ecoomc varables () Cotuous varable takes o a cotuum the sample space, such as all pots o a le or all real umbers Example: GDP, Polluto cocetrato, etc. () Dscrete varables fte

More information

Pseudo-random Functions. PRG vs PRF

Pseudo-random Functions. PRG vs PRF Pseudo-radom Fuctos Debdeep Muhopadhyay IIT Kharagpur PRG vs PRF We have see the costructo of PRG (pseudo-radom geerators) beg costructed from ay oe-way fuctos. Now we shall cosder a related cocept: Pseudo-radom

More information

Analysis of System Performance IN2072 Chapter 5 Analysis of Non Markov Systems

Analysis of System Performance IN2072 Chapter 5 Analysis of Non Markov Systems Char for Network Archtectures ad Servces Prof. Carle Departmet of Computer Scece U Müche Aalyss of System Performace IN2072 Chapter 5 Aalyss of No Markov Systems Dr. Alexader Kle Prof. Dr.-Ig. Georg Carle

More information

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model Chapter 3 Asmptotc Theor ad Stochastc Regressors The ature of eplaator varable s assumed to be o-stochastc or fed repeated samples a regresso aalss Such a assumpto s approprate for those epermets whch

More information

Chapter 11 The Analysis of Variance

Chapter 11 The Analysis of Variance Chapter The Aalyss of Varace. Oe Factor Aalyss of Varace. Radomzed Bloc Desgs (ot for ths course) NIPRL . Oe Factor Aalyss of Varace.. Oe Factor Layouts (/4) Suppose that a expermeter s terested populatos

More information

Multivariate Transformation of Variables and Maximum Likelihood Estimation

Multivariate Transformation of Variables and Maximum Likelihood Estimation Marquette Uversty Multvarate Trasformato of Varables ad Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Assocate Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Copyrght 03 by Marquette Uversty

More information

STATISTICAL INFERENCE

STATISTICAL INFERENCE (STATISTICS) STATISTICAL INFERENCE COMPLEMENTARY COURSE B.Sc. MATHEMATICS III SEMESTER ( Admsso) UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION CALICUT UNIVERSITY P.O., MALAPPURAM, KERALA, INDIA -

More information

Random Variables and Probability Distributions

Random Variables and Probability Distributions Radom Varables ad Probablty Dstrbutos * If X : S R s a dscrete radom varable wth rage {x, x, x 3,. } the r = P (X = xr ) = * Let X : S R be a dscrete radom varable wth rage {x, x, x 3,.}.If x r P(X = x

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 00 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for the

More information

Lecture 07: Poles and Zeros

Lecture 07: Poles and Zeros Lecture 07: Poles ad Zeros Defto of poles ad zeros The trasfer fucto provdes a bass for determg mportat system respose characterstcs wthout solvg the complete dfferetal equato. As defed, the trasfer fucto

More information

MA 524 Homework 6 Solutions

MA 524 Homework 6 Solutions MA 524 Homework 6 Solutos. Sce S(, s the umber of ways to partto [] to k oempty blocks, ad c(, s the umber of ways to partto to k oempty blocks ad also the arrage each block to a cycle, we must have S(,

More information

18.413: Error Correcting Codes Lab March 2, Lecture 8

18.413: Error Correcting Codes Lab March 2, Lecture 8 18.413: Error Correctg Codes Lab March 2, 2004 Lecturer: Dael A. Spelma Lecture 8 8.1 Vector Spaces A set C {0, 1} s a vector space f for x all C ad y C, x + y C, where we take addto to be compoet wse

More information

Chapter 8: Statistical Analysis of Simulated Data

Chapter 8: Statistical Analysis of Simulated Data Marquette Uversty MSCS600 Chapter 8: Statstcal Aalyss of Smulated Data Dael B. Rowe, Ph.D. Departmet of Mathematcs, Statstcs, ad Computer Scece Copyrght 08 by Marquette Uversty MSCS600 Ageda 8. The Sample

More information

1 Convergence of the Arnoldi method for eigenvalue problems

1 Convergence of the Arnoldi method for eigenvalue problems Lecture otes umercal lear algebra Arold method covergece Covergece of the Arold method for egevalue problems Recall that, uless t breaks dow, k steps of the Arold method geerates a orthogoal bass of a

More information

Multiple Linear Regression Analysis

Multiple Linear Regression Analysis LINEA EGESSION ANALYSIS MODULE III Lecture - 4 Multple Lear egresso Aalyss Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur Cofdece terval estmato The cofdece tervals multple

More information

3. Basic Concepts: Consequences and Properties

3. Basic Concepts: Consequences and Properties : 3. Basc Cocepts: Cosequeces ad Propertes Markku Jutt Overvew More advaced cosequeces ad propertes of the basc cocepts troduced the prevous lecture are derved. Source The materal s maly based o Sectos.6.8

More information

Idea is to sample from a different distribution that picks points in important regions of the sample space. Want ( ) ( ) ( ) E f X = f x g x dx

Idea is to sample from a different distribution that picks points in important regions of the sample space. Want ( ) ( ) ( ) E f X = f x g x dx Importace Samplg Used for a umber of purposes: Varace reducto Allows for dffcult dstrbutos to be sampled from. Sestvty aalyss Reusg samples to reduce computatoal burde. Idea s to sample from a dfferet

More information

PPCP: The Proofs. 1 Notations and Assumptions. Maxim Likhachev Computer and Information Science University of Pennsylvania Philadelphia, PA 19104

PPCP: The Proofs. 1 Notations and Assumptions. Maxim Likhachev Computer and Information Science University of Pennsylvania Philadelphia, PA 19104 PPCP: The Proofs Maxm Lkhachev Computer ad Iformato Scece Uversty of Pesylvaa Phladelpha, PA 19104 maxml@seas.upe.edu Athoy Stetz The Robotcs Isttute Carege Mello Uversty Pttsburgh, PA 15213 axs@rec.r.cmu.edu

More information

UNIT 2 SOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS

UNIT 2 SOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS Numercal Computg -I UNIT SOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS Structure Page Nos..0 Itroducto 6. Objectves 7. Ital Approxmato to a Root 7. Bsecto Method 8.. Error Aalyss 9.4 Regula Fals Method

More information

Chapter 14 Logistic Regression Models

Chapter 14 Logistic Regression Models Chapter 4 Logstc Regresso Models I the lear regresso model X β + ε, there are two types of varables explaatory varables X, X,, X k ad study varable y These varables ca be measured o a cotuous scale as

More information

Complete Convergence and Some Maximal Inequalities for Weighted Sums of Random Variables

Complete Convergence and Some Maximal Inequalities for Weighted Sums of Random Variables Joural of Sceces, Islamc Republc of Ira 8(4): -6 (007) Uversty of Tehra, ISSN 06-04 http://sceces.ut.ac.r Complete Covergece ad Some Maxmal Iequaltes for Weghted Sums of Radom Varables M. Am,,* H.R. Nl

More information

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA THE ROYAL STATISTICAL SOCIETY 3 EXAMINATIONS SOLUTIONS GRADUATE DIPLOMA PAPER I STATISTICAL THEORY & METHODS The Socety provdes these solutos to assst caddates preparg for the examatos future years ad

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Marquette Uverst Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Coprght 08 b Marquette Uverst Maxmum Lkelhood Estmato We have bee sag that ~

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ  1 STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Recall Assumpto E(Y x) η 0 + η x (lear codtoal mea fucto) Data (x, y ), (x 2, y 2 ),, (x, y ) Least squares estmator ˆ E (Y x) ˆ " 0 + ˆ " x, where ˆ

More information

1 0, x? x x. 1 Root finding. 1.1 Introduction. Solve[x^2-1 0,x] {{x -1},{x 1}} Plot[x^2-1,{x,-2,2}] 3

1 0, x? x x. 1 Root finding. 1.1 Introduction. Solve[x^2-1 0,x] {{x -1},{x 1}} Plot[x^2-1,{x,-2,2}] 3 Adrew Powuk - http://www.powuk.com- Math 49 (Numercal Aalyss) Root fdg. Itroducto f ( ),?,? Solve[^-,] {{-},{}} Plot[^-,{,-,}] Cubc equato https://e.wkpeda.org/wk/cubc_fucto Quartc equato https://e.wkpeda.org/wk/quartc_fucto

More information

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades STAT 101 Dr. Kar Lock Morga 11/20/12 Exam 2 Grades Multple Regresso SECTIONS 9.2, 10.1, 10.2 Multple explaatory varables (10.1) Parttog varablty R 2, ANOVA (9.2) Codtos resdual plot (10.2) Trasformatos

More information

Laboratory I.10 It All Adds Up

Laboratory I.10 It All Adds Up Laboratory I. It All Adds Up Goals The studet wll work wth Rema sums ad evaluate them usg Derve. The studet wll see applcatos of tegrals as accumulatos of chages. The studet wll revew curve fttg sklls.

More information

D KL (P Q) := p i ln p i q i

D KL (P Q) := p i ln p i q i Cheroff-Bouds 1 The Geeral Boud Let P 1,, m ) ad Q q 1,, q m ) be two dstrbutos o m elemets, e,, q 0, for 1,, m, ad m 1 m 1 q 1 The Kullback-Lebler dvergece or relatve etroy of P ad Q s defed as m D KL

More information