Playing against Hedge

Int. J. Communications, Network and System Sciences, 2014, 7, 497-507
Published Online December 2014 in SciRes. http://www.scirp.org/journal/ijcns
Miltiades E. Anagnostou, Maria A. Lambrou

School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
Department of Shipping, Trade and Transport, University of the Aegean, Chios, Greece
Email: miltos@central.ntua.gr, mlambrou@aegean.gr

Received October 2014; revised November 2014; accepted December 2014

Copyright © 2014 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/

How to cite this paper: Anagnostou, M.E. and Lambrou, M.A. (2014) Playing against Hedge. Int. J. Communications, Network and System Sciences, 7, 497-507.

Abstract
Hedge has been proposed as an adaptive scheme, which guides the player's hand in a multi-armed bandit full information game. Applications of this game exist in network path selection, load distribution, and network interdiction. We perform a worst case analysis of the Hedge algorithm by using an adversary, who will consistently select penalties so as to maximize the player's loss, assuming that the adversary's penalty budget is limited. We further explore the performance of binary penalties, and we prove that the optimum binary strategy for the adversary is to make greedy decisions.

Keywords: Hedge Algorithm, Adversary, Online Algorithm, Greedy Algorithm, Periodic Performance, Binary Penalties, Path Selection, Network Interdiction

1. Introduction
The problems of adaptive network path selection and load distribution have often been considered as games that are played simultaneously and independently by agents controlling flows in a network. A possible abstraction of these and other related problems is the bandit game. In the multi-armed bandit game [1] a player chooses one out of $N$ strategies (or "machines" or "options" or "arms"). A loss or penalty $\ell_i$ (or a reward, which can be modeled as a negative loss) is assigned to each strategy $i$ ($i = 1, 2, \ldots, N$) after each round of the game. An agent facing repeated selections will possibly try to exploit the experience accumulated so far. A popular algorithm that can guide the agent in each selection round is the multiplicative updates algorithm, or Hedge. In this paper we calculate the worst possible performance of Hedge by using the adversarial technique, i.e. we investigate the behavior of an intelligent adversary, who tries to maximize the player's cumulative loss. In Section 1 we describe Hedge; in Section 2 we give a rigorous formulation of the adversary's problem; in Section 3 we give a recursive solution; and in Section 4 we present sample numerical results. Finally, in Section 5 we
explore binary adversarial strategies. Our main result is that the greedy adversarial strategy is optimal among binary strategies.

1.1. The Bandit Game
In a generalized bandit game the player is allowed to play mixed strategies, i.e. to assign a fraction $p_i$ (such that $\sum_{i=1}^{N} p_i = 1$) of the total bet to option $i$, thereby getting a loss equal to $\sum_{i=1}^{N} p_i \ell_i$. Alternatively, $p_i$ can be interpreted as the probability that the player places the bet on option $i$. In the bandit version only the total loss is announced to the player, while in the full information version the whole penalty vector $(\ell_1, \ell_2, \ldots, \ell_N)$ is announced. A game consists of $T$ rounds; a superscript $t$ marks the $t$-th round ($t = 1, \ldots, T$). The player will naturally try to minimize the total cumulative loss

$$L = \sum_{t=1}^{T} \sum_{i=1}^{N} p_i^t \ell_i^t \qquad (1)$$

by controlling the bet distribution, i.e. by properly selecting the variables $p_i^t$. We make the additional assumption that the loss budget is limited in each round, by imposing the constraint $\sum_{i=1}^{N} \ell_i^t = 1$. An extremely lucky player, or a player with inside information, would select the minimum penalty option in each round and put all of his or her bet on it, thereby achieving a total loss equal to $\sum_{t=1}^{T} \min_i \ell_i^t$.

1.2. The Hedge Algorithm
Quite a few algorithmic solutions that guide the player's hand in the full information game have appeared in the literature. Freund and Schapire have proposed the Hedge algorithm [2] for the full information game. Auer, Cesa-Bianchi, Freund and Schapire have proposed the Exp3 algorithm in [3]. Allenberg-Neeman and Neeman proposed a Hedge variant, the GL (Gain-Loss) algorithm, for the full information game with gains and losses [4]. Dani, Hayes and Kakade have proposed the GeometricHedge algorithm in [5], and a modification was proposed by Bartlett, Dani et al. in [6]. Recently Cesa-Bianchi and Lugosi have proposed the ComBand algorithm for the bandit version [7]. A comparison can be found in [8].

Hedge maintains a vector $w^t = (w_1^t, w_2^t, \ldots, w_N^t)$ of weights, such that $w_i^t \ge 0$ ($t = 1, 2, \ldots, T$ and $i = 1, 2, \ldots, N$). In each round Hedge chooses the bet allocation according to the normalized weights

$$p_i^t = \frac{w_i^t}{\sum_{j=1}^{N} w_j^t}.$$

When the opponent reveals the loss vector of this round, the next round weight $w_i^{t+1}$ is determined so as to reflect the loss results, i.e. $w_i^{t+1} = w_i^t \beta^{\ell_i^t}$ for some fixed $\beta$ such that $0 < \beta < 1$. In [9] Auer, Cesa-Bianchi, Freund and Schapire have proved that the expected Hedge performance and the expected performance of the best arm differ by at most $O(\sqrt{T \ln N})$. Freund and Schapire [2] have given a loss upper bound, which relates the total cumulative loss to the total loss of the best arm.

1.3. Competitive Analysis
The competitive analysis of an algorithm, which in this paper is Hedge, involves a comparison of its performance with the performance of the optimal offline algorithm. In the bandit game the optimal offline algorithm, i.e. the optimal player's decisions given the sequence of all penalties in advance, is trivial: in each round the player simply bets everything on the option with the lowest penalty. According to S. Irani and A. Karlin (Section 13.3 of [10]), a technique for finding bounds is to use an adversary who plays against the online algorithm and concocts an input that forces it to incur a high cost. Using an adversary is just an illustrative way of saying that we try to find the worst possible performance of an online algorithm. In our analysis the adversary tries to maximize Hedge's total loss by controlling the penalty vector (under a limited budget).
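
As an illustration of the update rule of Section 1.2 and of the cumulative loss (1), here is a minimal Python sketch. It is not code from the paper: the function names `hedge_loss` and `offline_optimum` are hypothetical, the full information version is assumed (every penalty vector is known after its round, and each vector sums to 1), and the weights are left unnormalized between rounds, which does not change the bet fractions $p_i^t$.

```python
from typing import List, Sequence


def hedge_loss(w0: Sequence[float], beta: float,
               penalties: Sequence[Sequence[float]]) -> float:
    """Total cumulative loss (1) of Hedge with parameter beta.

    w0        -- initial positive weights; only their ratios matter.
    penalties -- one penalty vector per round, assumed to sum to 1.
    """
    w: List[float] = [float(x) for x in w0]
    total = 0.0
    for loss_vec in penalties:
        s = sum(w)
        p = [wi / s for wi in w]                          # bet fractions p_i^t
        total += sum(pi * li for pi, li in zip(p, loss_vec))
        # multiplicative update: w_i^{t+1} = w_i^t * beta ** l_i^t
        w = [wi * beta ** li for wi, li in zip(w, loss_vec)]
    return total


def offline_optimum(penalties: Sequence[Sequence[float]]) -> float:
    """Loss of a player who knows all penalties and always picks the cheapest option."""
    return sum(min(loss_vec) for loss_vec in penalties)
```

For example, with two options, $w = (0.5, 0.5)$, $\beta = 0.5$ and penalty vectors $(1, 0), (0, 1), (1, 0)$, the sketch gives a Hedge loss of $1/2 + 2/3 + 1/2 = 5/3$, while the lucky offline player pays 0.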

1.4. Interpretations and Applications
In this section we offer some interpretations from the areas of 1) communication networks and 2) transportation. The general setting always involves a number of options or arms, which must be selected by a player without any knowledge of the future. Bandit models have been used in quite diverse decision making situations. In [11] He, Chen, Wang and Liu have used a bandit model to maximize the revenue of a search engine provider, who charges for advertisements on a per-click basis. They have subsequently defined the "armed bandit problem with shared information": arms are partitioned into groups and loss information is shared only among players using arms of the same group. In [12] Park and Lee have used a multi-armed bandit model for lane selection in automated highways and for autonomous vehicle traffic control.

1.4.1. Traffic Load Distribution
This first application example can take multiple interpretations, which always involve a selection in a competitive environment, in which competition is limited. It can be seen as 1) a path selection problem in networking, 2) a transport means (mode) choice or path selection problem, or 3) a computational load distribution problem, which we mention at the end of this section. First, we describe the problem in the context of networking. Consider $N$ similar independent paths (in the simplest case just parallel links), which join a pair of nodes. A traffic volume equal to $Q$ is sent from one node of the pair to the other in $T$ consecutive time periods or rounds by a population of agents. $Q$ is the same in each round, but the allocation of $Q$ to paths, i.e. $(Q_1^t, Q_2^t, \ldots, Q_N^t)$ such that $\sum_{i=1}^{N} Q_i^t = Q$, is different in each round. An agent $A$ produces a constant amount of traffic equal to $q$, such that $q \ll Q$, in $T$ consecutive rounds, and allocates a part equal to $q_i^t$ ($\sum_{i=1}^{N} q_i^t = q$) to the $i$-th path in round $t$. If we assume a linear delay (or cost) model, the average delay (or cost) experienced by $A$'s traffic in the $t$-th round is proportional to $\sum_{i=1}^{N} Q_i^t q_i^t / q$. Linear models are used for simplicity in network analysis [13] and can be realistic if a network resource still operates in the linear region of the delay vs load curve, e.g. when delay is calculated in a link that does not operate very close to capacity. Agent $A$ aims at minimizing the total delay for its own traffic and may use Hedge to determine the quantities $q_i^t$ in round $t$, assuming that $A$ knows the performance of its own traffic in each path in the past time period. Note that the maximum delay in a round occurs if $A$ puts the whole $q$ in a single path together with the whole traffic of the competition, i.e. with $Q$; then $A$'s average delay in this round equals $Q$. On the contrary, if $Q$ is evenly distributed over all paths, $A$'s allocation decision does not really matter, as the average delay will be equal to $\left(\sum_{i=1}^{N} q_i^t / q\right)(Q/N) = Q/N$. Of course the minimum delay in a round will occur if $A$ puts the whole $q$ in an empty path, thereby achieving zero delay. In the notation of Section 1.1 this is a Hedge game with $p_i^t = q_i^t / q$ and $\ell_i^t = Q_i^t / Q$, since the average delay equals $Q \sum_{i} p_i^t \ell_i^t$.

The above problem can also be formulated as a more general problem of distributing workload over a collection of parallel resources (e.g. distributing jobs to parallel processors). A. Blum and C. Burch have used the following motivating scenario in [14]: a process runs on some machine in an environment with $N$ machines in total. The process may move to a different machine at the end of a time interval. The load found on a machine at time round $t$ is the penalty felt by the process.

1.4.2. Interdiction
Although an adversary is usually a technical (fictional) concept, which serves the worst case analysis of online algorithms, in some environments a real adversary, who intentionally tries to oppose a player, does exist. An example is the interdiction problem. We present a version of the interdiction problem in a network security context. An attacker attacks $N$ resources (e.g. launches a distributed denial of service attack on nodes, servers, etc., see [15]) by sending streams of harmful packets to resource $i$ at a rate $w_i$ (where $i = 1, \ldots, N$ and the total rate $\sum_{i=1}^{N} w_i$ is constant). A defender assigns a defense mechanism of intensity $\ell_i$ (e.g. a filter that is able to detect and avoid harmful packets with a probability proportional to $\ell_i$) to resource $i$. At the end of a time interval $T$, e.g. one day, both the attacker and the defender revise the flows and the distribution of defense mechanisms to resources respectively,
based on past performance. Similar interpretations exist in transportation network environments, as in border and customs control, including illegal immigration control. An interdiction problem formulation can also be used in a maritime transport security context: pirates attack the vessels traversing a maritime route. In [16] Vaněk et al. assign the role of the player to the pirate. The pirate operates in rounds, starting and finishing in his home port. In each round he selects a sea area (arm) to sail to and searches for possible victim vessels. A patrol force distributes the available escort resources to sea areas (arms), and the pirate's gains are inversely proportional to the strength of the defender's forces in this area. Naval forces reallocate their own resources to sea areas.

2. Problem Formulation
In this paper we aim at finding the worst case performance of Hedge. Effectively, we try to solve the following problem:

Problem 1. Given a number $N$ of options, an initial normalized weight vector $w = (w_1, w_2, \ldots, w_N)$ with $\sum_{i=1}^{N} w_i = 1$, and a Hedge parameter $\beta$, find the sequence $\ell^1, \ell^2, \ldots, \ell^T$ that maximizes the player's total cumulative loss

$$H_T(\beta) = \sum_{t=1}^{T} \sum_{i=1}^{N} p_i^t \ell_i^t \qquad (2)$$

where $\ell^t = (\ell_1^t, \ldots, \ell_N^t)$ is the penalty vector in round $t$ ($t = 1, 2, \ldots, T$), such that $\sum_{i=1}^{N} \ell_i^t = 1$, and the $t$-th round bet fractions $p_i^t$ are given by

$$p_i^t = \frac{w_i \beta^{\sum_{\tau=1}^{t-1} \ell_i^{\tau}}}{\sum_{j=1}^{N} w_j \beta^{\sum_{\tau=1}^{t-1} \ell_j^{\tau}}}, \quad t = 2, \ldots, T, \qquad p_i^1 = w_i \qquad (3)$$

Clearly the objective function (2) is a function of a) the initial weights $w$, b) the $TN$ variables $\ell_i^t$, and c) $\beta$. Due to the normalization of both weights and penalties there are $(T+1)(N-1)$ independent variables in total. In the following we use $H_T(\beta; w_1, \ldots, w_N; \ell_1^1, \ldots, \ell_N^1, \ldots, \ell_1^T, \ldots, \ell_N^T)$ or $H_T(w; \ell^1, \ldots, \ell^T)$ instead of $H_T(\beta)$ whenever it is necessary to refer to these variables.

3. Recursion
Assuming that a given round starts with weights $w = (w_1, \ldots, w_N)$ and the adversary generates penalties $\ell = (\ell_1, \ldots, \ell_N)$, the next round will start with weights $W(w, \ell) = (W_1(w, \ell), \ldots, W_N(w, \ell))$, where

$$W_i(w, \ell) = \frac{w_i \beta^{\ell_i}}{\sum_{j=1}^{N} w_j \beta^{\ell_j}}, \quad i = 1, \ldots, N \qquad (4)$$

Then, the total loss of a $T$ round game, which starts with weights $w$, can be written as the sum of the losses of a single round game, which starts with weights $w$, and a $T-1$ round game, which starts with weights $W(w, \ell^1)$:

$$H_T(w; \ell^1, \ldots, \ell^T) = H_1(w; \ell^1) + H_{T-1}\left(W(w, \ell^1); \ell^2, \ldots, \ell^T\right) \qquad (5)$$

Note that the term that expresses the contribution of the last $T-1$ rounds depends only on the updated weights provided by the initial round. This Markovian property can be generalized in the following sense: a $T_1 + T_2$ round game can be seen as consisting of a $T_1$ round game $g_1$ followed by a $T_2$ round game $g_2$, whose initial weights are the final weights of $g_1$; no further details about $g_1$ are passed to $g_2$. Denoting the solution to Problem 1 by $\hat{H}_T(w) = \max_{\ell^1, \ldots, \ell^T} H_T(w; \ell^1, \ldots, \ell^T)$, the following recursive formula for $\hat{H}_T(w)$ can be derived from (5):
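$$\hat{H}_T(w) = \max_{\ell} \left[ H_1(w; \ell) + \hat{H}_{T-1}\left(W(w, \ell)\right) \right] \qquad (6)$$

where $\ell$ is the penalty vector chosen by the adversary in the initial round.

The optimal penalties can also be computed recursively. Let $\lambda^{t;T}(w) = \left(\lambda_1^{t;T}(w), \ldots, \lambda_N^{t;T}(w)\right)$, where $\lambda_i^{t;T}(w)$ denotes the optimal penalty of the $i$-th option in the $t$-th round of a $T$ round game starting with weights $w$. The optimal penalty vector of the initial round is the value of $\ell$ that maximizes (6). Therefore

$$\lambda^{1;T}(w) = \arg\max_{\ell} \left[ H_1(w; \ell) + \hat{H}_{T-1}\left(W(w, \ell)\right) \right] \qquad (7)$$

In all other rounds $t = 2, 3, \ldots, T$ the optimal penalties are such that the total loss of the rest of the game is maximized, i.e. such that $\hat{H}_{T-1}\left(W(w, \lambda^{1;T}(w))\right)$ is achieved. Since $\hat{H}_{T-1}(w')$ is achieved by the penalties $\lambda^{t;T-1}(w')$, the total loss $\hat{H}_{T-1}\left(W(w, \lambda^{1;T}(w))\right)$ is realized by using $\lambda^{t;T-1}\left(W(w, \lambda^{1;T}(w))\right)$. Therefore

$$\lambda^{t+1;T}(w) = \lambda^{t;T-1}\left(W(w, \lambda^{1;T}(w))\right), \quad t = 1, 2, \ldots, T-1 \qquad (8)$$

4. Two Option Games and Numerical Results
In this section we exploit the recursive methodology presented in the previous section in order to provide some numerical results for two option games. We compare these results with available bounds in the literature.

We consider $N = 2$, i.e. two option games. We keep only the independent penalties in the extended notation and use the more compact version $H_T(w; \ell^1, \ell^2, \ldots, \ell^T)$, where $w = w_1$ and $\ell^t = \ell_1^t$ now refer to the first option only. As an example, the loss of a single round game is given by

$$H_1(w; \ell) = w\ell + (1 - w)(1 - \ell) \qquad (9)$$

Also, since the initial weights are $(w, 1 - w)$, we simplify the maximum cumulative loss $\hat{H}_T(w, 1 - w)$ to $\hat{H}_T(w)$. Assuming losses $\ell_1 = \ell$ and $\ell_2 = 1 - \ell$, the next round will start with weights $(W(w, \ell), 1 - W(w, \ell))$, where

$$W(w, \ell) = \frac{w\beta^{\ell}}{w\beta^{\ell} + (1 - w)\beta^{1-\ell}} \qquad (10)$$

Then (6) is simplified to

$$\hat{H}_T(w) = \max_{0 \le \ell \le 1} \left[ H_1(w; \ell) + \hat{H}_{T-1}\left(W(w, \ell)\right) \right] \qquad (11)$$

where $\ell$ is the penalty chosen by the adversary for the first option in the initial round. The iteration starts from $\hat{H}_1(w)$, i.e. the loss of a single round game. In such a game the adversary controls a single penalty variable, as the loss is given by (9). The adversary will evidently choose binary values, i.e. $\ell = 1$ ($\ell = 0$) if $w > 1/2$ ($w < 1/2$), and the maximum total loss is $\hat{H}_1(w) = \max\{w, 1 - w\}$, i.e.

$$\hat{H}_1(w) = \begin{cases} 1 - w, & w \le 1/2 \\ w, & w \ge 1/2 \end{cases} \qquad (12)$$

The graph of $\hat{H}_1(w)$ appears as the lowest V-shaped curve in Figure 1. The fact that $\hat{H}_1(w)$ is a piecewise linear function of $w$ with a breakpoint (i.e. a sudden change in its slope) creates even more breakpoints in $\hat{H}_2(w)$, $\hat{H}_3(w)$, and so on. Therefore, while it is possible to use the aforementioned recursion in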
order to find analytical expressions for the maximum total loss and the associated penalties, the analysis becomes quite complicated even for small values of the number of rounds $T$. We omit this tedious analysis and instead present numerical results based on the recursive methodology given above. We have implemented a numerical computation based on (11). $\hat{H}_T(w)$ is approximated by $K + 1$ samples in the interval $[0, 1]$, i.e. by $\hat{H}_T(w_n)$, where $w_n = n/K$ and $n = 0, 1, \ldots, K$. In the same way the functions $H_1(w; \ell)$ and $W(w, \ell)$ are represented by $(K + 1)^2$ samples $H_1(w_m; w_n)$ and $W(w_m, w_n)$ ($m, n = 0, 1, \ldots, K$). We have used K = …. Initially we create $\hat{H}_1(w_n)$ by using (9). Then we use the result as input to (11) and create $\hat{H}_2(w_n)$. Then we use the already calculated $\hat{H}_2$ to calculate $\hat{H}_3$, then $\hat{H}_3$ to calculate $\hat{H}_4$, and so on. In Figure 1 we show $\hat{H}_T(w)$ as a function of the initial weight $w = w_1$ in games with up to ten rounds ($T = 1, 2, \ldots, 10$) for different values of $\beta$. Observe that the shape of $\hat{H}_T(w)$ is more interesting for unreasonably small values of $\beta$.

Figure 1. Plot of $\hat{H}_T(w)$ (maximum loss in a $T$ round game) vs $w$ for $\beta = 0.1, 0.3, \ldots, 0.9$ and $T = 1, 2, \ldots, 10$.

The optimal penalties can be determined by using formulas (7) and (8) for $N = 2$. In Figure 2 we draw one of the curves of Figure 1 together with the respective optimal penalties. The final round optimal penalty is certain to be binary, since the adversary will assign $\ell = 1$ to the option with the greatest weight. However, the optimal penalties of the first two rounds are clearly non-binary.

5. Binary and Greedy Schemes
The penalty values in the first two rounds of the example of Figure 2 prove that the adversary's optimal penalties are not necessarily binary. However, in this example $\beta$ is unnaturally close to 0, whereas in practical Hedge implementations $\beta$ is chosen close to 1; this choice achieves a more gradual adaptation to losses. Both experimental and analytical evidence show that the optimal penalties rapidly tend to binary values as $\beta$ approaches 1. Effectively, it seems that results very close to the optimum can be achieved by a binary adversary, i.e. an adversary that resorts to binary values only. On the other hand, the optimal adversarial policy with binary penalties can be found exhaustively as

$$\hat{H}_T^{\mathrm{bin}}(w) = \max_{b^1, \ldots, b^T \in S} H_T(w; b^1, \ldots, b^T)$$

where $S$ is the set of binary vectors $b = (b_1, b_2, \ldots, b_N)$ such that $\sum_{i=1}^{N} b_i = 1$, i.e. only one component equals 1. The complexity of this calculation evidently grows as $N^T$, the number of binary penalty sequences. However, in the following we show that the optimal binary adversary is in fact the greedy adversary; the latter achieves binary optimality in linear time. A greedy adversary is eager to punish the maximum weight option as much as possible in each round. Thus
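the adversary will assign exactly one unit of penalty to the option with the maximum current weight, and zero penalties to all other options. Given a sufficient number of rounds, it is easy to see that the weights of an $N$ option game are equalized, so that any two weights $p_i, p_j$ satisfy $p_i / p_j \le 1/\beta$. When equalization is achieved, a periodic phenomenon starts and the greedy penalties form a rotation scheme.

Figure 2. Plot of the maximum total loss of a 4 round game vs $w$ for a value of $\beta$ close to 0, together with the optimal penalties of the four rounds.

5.1. Greedy Behavior
We explore the greedy pattern in a two option game; the analysis can easily be generalized to $N$ options. Assuming initial weights $w_1, w_2$ ($w_1 + w_2 = 1$) such that $w_1 > 1/2 > w_2$, a greedy adversary will choose $\ell_1^1 = \ell_1^2 = \cdots = \ell_1^n = 1$ and $\ell_1^{n+1} = 0$ iff $w_1\beta^{n-1} > w_2 > w_1\beta^{n}$, where $n \ge 1$ (having assumed $w_1 > w_2$). At round $n + 1$ the weight of the second option becomes, for the first time, greater than the weight of the first option, and a loss equal to 1 is assigned to the second option. Therefore, in the next round $n + 2$ the weights (before normalization) are $w_1\beta^{n}$ and $w_2\beta$, or equivalently $w_1\beta^{n-1}$ and $w_2$, i.e. the weights of round $n$ appear for the second time. In the next round they become $w_1\beta^{n}$ and $w_2$ again, and in general they oscillate between these two pairs periodically. Therefore the total loss in a pair of subsequent rounds is equal to

$$\frac{w_1\beta^{n}}{w_1\beta^{n} + w_2\beta} + \frac{w_2}{w_1\beta^{n} + w_2}$$

The value of $n$ is determined by the initially assumed inequality, and since $n$ must be an integer,

$$n = \left\lceil \frac{\ln w_2 - \ln w_1}{\ln \beta} \right\rceil$$

The loss in the first $n$ rounds $t = 1, 2, \ldots, n$ is equal to

$$\sum_{\tau=0}^{n-1} \frac{w_1\beta^{\tau}}{w_1\beta^{\tau} + w_2} \qquad (13)$$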
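
The greedy pattern above can be checked by direct simulation. The Python sketch below is illustrative only, and its helper names are hypothetical. It assumes two options with $w_1 > 1/2$, renormalizes the weights after every round (to avoid underflow; this does not change the bet fractions), and breaks ties in favour of option 1, which is an assumption since ties do not arise under the strict inequalities used above. It computes $n$ as the ceiling of $(\ln w_2 - \ln w_1)/\ln\beta$ and compares the simulated losses with (13) and with the two-round loss of the periodic phase.

```python
import math
from typing import List


def greedy_losses(w1: float, beta: float, rounds: int) -> List[float]:
    """Per-round Hedge losses against a greedy adversary (two options).

    In every round the whole unit penalty goes to the option with the larger
    current weight; ties go to option 1 (an assumption).
    """
    a, b = w1, 1.0 - w1                  # weights of options 1 and 2
    losses = []
    for _ in range(rounds):
        if a >= b:                       # punish option 1
            losses.append(a / (a + b))
            a *= beta
        else:                            # punish option 2
            losses.append(b / (a + b))
            b *= beta
        s = a + b                        # renormalize; bet fractions unchanged
        a, b = a / s, b / s
    return losses


def n_steps(w1: float, beta: float) -> int:
    """Number n of initial rounds in which option 1 is punished (w1 > 1/2)."""
    w2 = 1.0 - w1
    return math.ceil((math.log(w2) - math.log(w1)) / math.log(beta))


def first_phase_loss(w1: float, beta: float) -> float:
    """Closed form (13): sum over tau = 0..n-1 of w1 b^tau / (w1 b^tau + w2)."""
    w2, n = 1.0 - w1, n_steps(w1, beta)
    return sum(w1 * beta**t / (w1 * beta**t + w2) for t in range(n))


def pair_loss(w1: float, beta: float) -> float:
    """Loss of one two-round period of the oscillation that follows."""
    w2, n = 1.0 - w1, n_steps(w1, beta)
    return (w1 * beta**n / (w1 * beta**n + w2 * beta)
            + w2 / (w1 * beta**n + w2))
```

For instance, with $w_1 = 0.9$ and $\beta = 0.5$ the sketch gives $n = 4$; the first four simulated losses add up to (13), and every later pair of rounds reproduces the two-round loss.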

Therefore, for an even positive integer $T$ the total loss in $n + T$ rounds is

$$H_{n+T}(\beta) = \sum_{\tau=0}^{n-1} \frac{w_1\beta^{\tau}}{w_1\beta^{\tau} + w_2} + \frac{T}{2}\left( \frac{w_1\beta^{n}}{w_1\beta^{n} + w_2\beta} + \frac{w_2}{w_1\beta^{n} + w_2} \right)$$

In a game with more than two options it is straightforward to show that in the steady (periodic) state the weights tend to become equal, i.e. almost equal to $1/N$, where $N$ is the number of options. Consequently, the loss in a $T$ round game is $H_T(\beta) \approx T/N$.

5.2. Optimality of the Greedy Behavior
The following proposition provides a simple polynomial solution to the problem of finding the optimal binary adversary.

Proposition 1. The greedy strategy is optimal for the adversary among all strategies with binary penalties.

Proof: Due to the normalization of weights and penalties, in the proof we mention only the weights and penalties of option 1. Define

$$f(x) = \frac{\omega\beta^{x}}{\omega\beta^{x} + (1 - \omega)\beta^{-x}}.$$

Assuming an initial weight $\omega$ and penalties $\ell^1, \ell^2, \ldots, \ell^n$ in the first $n$ rounds, the weight that emerges before the $(n+1)$-th round is $f(x)$, where $x = \sum_{t=1}^{n} \ell^t - n/2$. Effectively, two options are available to the adversary in each step: either 1) to assign a penalty equal to 1, which produces an incremental loss equal to $f(x)$ and updates the weight to $f(x + \delta)$, or 2) to assign a zero penalty, which produces a loss equal to $1 - f(x)$ and an updated weight equal to $f(x - \delta)$, where $\delta = 1/2$. This looks like a new game, in which the adversary is the player. The player's status is determined by a real number $x$, and the possible rewards are $f(x)$ and $1 - f(x)$. If the player opts for $f(x)$, this brings him to a new status $x + \delta$; if he opts for $1 - f(x)$, this brings him to $x - \delta$. In our case $f(-\infty) = 1$, $f(+\infty) = 0$ and $f(0) = \omega$. If $\xi$ is the root of $f(x) = 1/2$ (or $f(x) = 1 - f(x)$), then $f(x) \ge 1/2$ for $x \le \xi$ and $f(x) \le 1/2$ for $x \ge \xi$. Moreover, it is easy to prove that there is an odd symmetry around $(\xi, 1/2)$, i.e. $f(\xi + x) + f(\xi - x) = 2f(\xi) = 1$, and that $f(x)$ is concave in $(-\infty, \xi)$ and convex in $(\xi, +\infty)$. Assume that $\omega \ge 1/2$; then $f(0) = \omega \ge 1/2$ and $\xi \ge 0$. If the current status of the player is $x < \xi$, the greedy behavior is to move $\lceil (\xi - x)/\delta \rceil$ times to the right, which (unless $T$ is too short) brings the player to a point $x'$ such that $x' \ge \xi$. If $x' > \xi$, then $1 - f(x') > 1/2 > f(x')$, and the greedy player must choose $1 - f(x')$ and move back to $x' - \delta < \xi$. Effectively, this starts an oscillation between $x' - \delta$ and $x'$, which lasts until the end of the game. In the following we prove that this behavior is optimal, in spite of the fact that the rewards around $\xi$ are low. The main idea behind this sketch of proof is that a retreat (with the consequently low reward $1 - f(x)$) is never a good investment for the future.

Assume $x$ is the player's status and $T$ steps (rounds) remain until the end of the game, while $x + T\delta < \xi$. The player executes $M$ forward steps from $x$ to $x + M\delta$, with rewards $f(x), f(x + \delta), \ldots, f(x + (M-1)\delta)$. Then $M$ backward steps with rewards $1 - f(x + M\delta), \ldots, 1 - f(x + \delta)$ are executed; consequently $x$ is reached again. In the rest of the game, i.e. until the $T$-th step, greedy selections are made. This course of events is shown on curve (a) in Figure 3, where the dots mark the rewards achieved (some dots have been vertically displaced by a small amount so as to be distinguishable from other dots at the same position). If greedy selections had been made all the way, the course of events would be as shown by curve (b). If $y_t = x + (t-1)\delta$ describes the status of the adversary on the greedy curve (b) at the $t$-th step and $x_t$ the status on curve (a), then $f(x_t) = f(y_t)$ for $t = 1, \ldots, M$. Furthermore, $f(x_{2M+t}) = f(y_t)$ for $t = 1, \ldots, T - 2M$. Therefore the difference between the cumulative reward on curve (b) and curve (a) is
$$R = \sum_{t=1}^{T} f(y_t) - \left[ \sum_{i=1}^{M} f\big(x + (i-1)\delta\big) + \sum_{i=1}^{M} \big(1 - f(x + i\delta)\big) + \sum_{t=1}^{T-2M} f(y_t) \right] = \sum_{t=T-2M+1}^{T} f\big(x + (t-1)\delta\big) - M - \big[ f(x) - f(x + M\delta) \big] \qquad (14)$$

Figure 3. Sample paths of player behavior, which are used in the proof of Proposition 1.

Effectively, we need to show that $R \ge 0$. First, let us make some observations and explore other variations of $R$. Note that $R$, as given by (14), is positive if the cumulative reward of the back and forth movement (in the first $2M$ steps) is less than the reward of the last $2M$ steps. However, as $T$ increases, the position of the last step approaches $\xi$, and it can be shown that the cumulative reward of the last $2M$ steps decreases. This property will be proved later, and is due to the convexity and monotonicity properties of $f$. When $T$ further increases, some of the very last $2M$ steps of the greedy behavior enter the phase of oscillation around $\xi$, and for $T$ sufficiently large all $2M$ of them belong to the oscillation phase. Note, however, that the oscillation phase rewards are those closest to 1/2, which is the lower limit of all greedy steps. If the greedy algorithm is to be optimal, even the $2M$ oscillatory steps should bring a cumulative reward greater than the original back and forth movement. On the other hand, if we prove this last inequality, this will also prove that $R \ge 0$ in (14), since there the last $2M$ steps bring more reward than the $2M$ oscillatory steps.

Let $(\psi_1, \psi_2)$ be the pair of oscillation points around $\xi$, i.e. $\psi_1 = x + \lfloor (\xi - x)/\delta \rfloor \delta$ and $\psi_2 = \psi_1 + \delta$. In the worst case, which has just been mentioned, the last $2M$ steps form $M$ oscillation pairs, each with reward $f(\psi_1) + 1 - f(\psi_2)$, so that

$$R = M\big[f(\psi_1) + 1 - f(\psi_2)\big] - M - \big[f(x) - f(x + M\delta)\big] = M\big[f(\psi_1) - f(\psi_2)\big] - \big[f(x) - f(x + M\delta)\big]$$

However, $f(x) - f(x + M\delta)$ can be seen as the sum of the $M$ terms $f(x + i\delta) - f(x + (i+1)\delta)$, $i = 0, \ldots, M-1$. We shall further prove that each of these terms is at most the difference inside the first brackets, i.e.

$$f(x + i\delta) - f\big(x + (i+1)\delta\big) \le f(\psi_1) - f(\psi_2),$$

which, summed over $i$, yields $R \ge 0$. This is a consequence of the following lemma.

Lemma 1. For any concave function $f(x)$ the following inequality is true:

$$f(x) - f(x + \Delta x) \le f(x + \Delta x) - f(x + 2\Delta x) \qquad (15)$$

Inequality (15) holds because
$$\frac{f(x + \Delta x) - f(x)}{\Delta x} \ge \frac{f(x + 2\Delta x) - f(x + \Delta x)}{\Delta x} \qquad (16)$$

which is a consequence of the mean value theorem, stating that there is a point $\phi_1 \in (x, x + \Delta x)$ such that $f'(\phi_1) = \frac{f(x + \Delta x) - f(x)}{\Delta x}$. Also, there is a point $\phi_2 \in (x + \Delta x, x + 2\Delta x)$ such that $f'(\phi_2) = \frac{f(x + 2\Delta x) - f(x + \Delta x)}{\Delta x}$. However, $f$ is a concave function, and its derivative is non-increasing; therefore $\phi_1 \le x + \Delta x \le \phi_2$ implies $f'(\phi_1) \ge f'(x + \Delta x) \ge f'(\phi_2)$, which proves (16). In fact (15) can easily be generalized to any intervals of the same length, even overlapping ones, i.e. if $x_1 \le x_2$, then

$$f(x_1) - f(x_1 + \Delta x) \le f(x_2) - f(x_2 + \Delta x) \qquad (17)$$

Due to (15) each successive interval of equal length (i.e. $\Delta x$) produces an incremental reward $f(x) - f(x + \Delta x)$ that is smaller than the incremental reward of the next interval, and of all succeeding intervals, as long as $f$ remains concave. Effectively, Lemma 1 proves that the incremental reward of the rightmost interval that does not contain $\xi$, i.e. the interval $(\psi_1 - \delta, \psi_1)$, is the highest among the rewards of all intervals of the same length that begin to the left of $\psi_1 - \delta$. Unfortunately, our aim was to prove $R \ge 0$ in (14), which would be secured if $f$ remained concave in $(\psi_1, \psi_2)$, e.g. if $\psi_1 = \xi - \delta$ and $\psi_2 = \xi$. However, this is not true in general, since at $\xi$ the function $f$ turns from concave to convex. Fortunately, the term $f(\psi_1) - f(\psi_2)$, which covers the interval $(\psi_1, \psi_2)$, can be seen as the sum of rewards related to the concave $f$ in $(\psi_1, \xi)$ and the convex $f$ in $(\xi, \psi_2)$. Due to the odd symmetry around $\xi$, $f(\xi + (\psi_2 - \xi)) + f(\xi - (\psi_2 - \xi)) = 2f(\xi)$; therefore $f(\psi_2) = 2f(\xi) - f(2\xi - \psi_2)$, and

$$f(\psi_1) - f(\psi_2) = f(\psi_1) - 2f(\xi) + f(2\xi - \psi_2) = \big[f(\psi_1) - f(\xi)\big] + \big[f(2\xi - \psi_2) - f(\xi)\big].$$

However, due to the concavity of $f$ and (17), $f(\psi_1) - f(\xi) \ge f(\psi_1 - \delta) - f(\xi - \delta)$ and $f(2\xi - \psi_2) - f(\xi) \ge f\big(2\xi - \psi_2 - (\xi - \psi_1)\big) - f\big(\xi - (\xi - \psi_1)\big) = f(\xi - \delta) - f(\psi_1)$. Therefore

$$f(\psi_1) - f(\psi_2) \ge f(\psi_1 - \delta) - f(\xi - \delta) + f(\xi - \delta) - f(\psi_1) = f(\psi_1 - \delta) - f(\psi_1).$$

This result states that the interval $(\psi_1, \psi_1 + \delta) = (\psi_1, \psi_2)$, which contains $\xi$, provides a higher reward than the previous interval $(\psi_1 - \delta, \psi_1)$, which in turn provides a higher reward than any previous interval of the same length. Therefore we have seen so far that a sequence of penalties that begins at some $x < \xi$ and involves one fold can be reduced to a sequence without any folds, and with improved total reward, as shown in Figure 4. In Figure 4 a sequence of steps with a single fold, which starts at $x$ and ends at some point $x'$, is shown together with the respective greedy sequence, which starts at $x$ and ends at $x' + 2M\delta$. If the sequence must extend beyond $\xi$, the additional steps are oscillation steps around $\xi$. The rest of this proof is just an application of this result, so that a sequence with an arbitrary number of folds can be reduced to a foldless sequence with improved reward.

Figure 4. Reduction of a sequence of penalties, which contains a fold, to a sequence without folds and with improved total reward.
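
For small games, Proposition 1 can also be checked numerically by enumerating all $2^T$ binary penalty sequences of a two option game and comparing the best of them with the greedy sequence. The Python sketch below is such a sanity check, not a replacement for the proof; the function names, the tolerance and the parameter grid are hypothetical choices, and ties in the greedy rule are broken in favour of option 1.

```python
import itertools
from typing import List, Sequence


def binary_game_loss(w1: float, beta: float, bits: Sequence[int]) -> float:
    """Hedge loss of a two option game; bit 1 (0) puts the unit penalty
    on option 1 (option 2) in the corresponding round."""
    a, b = w1, 1.0 - w1
    total = 0.0
    for bit in bits:
        p1 = a / (a + b)
        if bit:
            total += p1
            a *= beta
        else:
            total += 1.0 - p1
            b *= beta
    return total


def greedy_bits(w1: float, beta: float, rounds: int) -> List[int]:
    """Penalty sequence of the greedy adversary (punish the heavier option)."""
    a, b = w1, 1.0 - w1
    bits = []
    for _ in range(rounds):
        bit = 1 if a >= b else 0
        bits.append(bit)
        if bit:
            a *= beta
        else:
            b *= beta
    return bits


def best_binary_loss(w1: float, beta: float, rounds: int) -> float:
    """Exhaustive maximum over all 2**rounds binary penalty sequences."""
    return max(binary_game_loss(w1, beta, bits)
               for bits in itertools.product((0, 1), repeat=rounds))


def greedy_is_optimal(w1: float, beta: float, rounds: int,
                      tol: float = 1e-9) -> bool:
    """True if the greedy sequence reaches the exhaustive binary optimum."""
    greedy = binary_game_loss(w1, beta, greedy_bits(w1, beta, rounds))
    return greedy >= best_binary_loss(w1, beta, rounds) - tol
```

The exhaustive search costs $O(2^T T)$ operations, which is exactly why a proof of greedy optimality is useful beyond toy sizes.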

Suppose that the initial position of the game is at $x$, and that $x \le \xi$ (otherwise reverse the initial probabilities $\omega$, $1 - \omega$). Suppose also that the initial sequence does not extend beyond $\psi_2$, i.e. it does not reach $\xi$, or it involves a number of oscillations around $\xi$. Then take the last fold and reduce it as mentioned, i.e. by replacing it with an equal number of greedy steps at the end of the current sequence. If these steps reach $\xi$, they are oscillation steps. Repeat the same step until all folds have disappeared (oscillations do not count as folds). If the original sequence does extend beyond $\xi$, the approach is the same, but the reader should note that the application of this algorithm will finally reduce the part that extends beyond $\psi_2$ to oscillations between $\psi_1$ and $\psi_2$.

6. Conclusion
We summarize the main results of this paper. A worst case (adversarial) performance analysis of the Hedge algorithm has been presented, under the assumption of limited penalties per round. A recursive expression has been given for the evaluation of the maximum total cumulative loss; this expression can be exploited both numerically and analytically. Moreover, binary penalty schemes provide an excellent approximation to the optimal scheme and, remarkably, the greedy binary strategy has been proved optimal among binary schemes for the adversary.

References
[1] Robbins, H. (1952) Some Aspects of the Sequential Design of Experiments. Bulletin of the American Mathematical Society, 58.
[2] Freund, Y. and Schapire, R.E. (1997) A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55, 119-139.
[3] Auer, P., Cesa-Bianchi, N., Freund, Y. and Schapire, R.E. (2002) The Nonstochastic Multiarmed Bandit Problem. SIAM Journal on Computing, 32.
[4] Allenberg-Neeman, C. and Neeman, B. (2004) Full Information Game with Gains and Losses. Algorithmic Learning Theory: 15th International Conference, 3244.
[5] Dani, V., Hayes, T.P. and Kakade, S.M. (2008) The Price of Bandit Information for Online Optimization. In: Platt, J.C., Koller, D., Singer, Y. and Roweis, S., Eds., Advances in Neural Information Processing Systems, MIT Press, Cambridge.
[6] Bartlett, P., Dani, V., Hayes, T., Kakade, S., Rakhlin, A. and Tewari, A. (2008) High-Probability Regret Bounds for Bandit Online Linear Optimization. Proceedings of the Annual Conference on Learning Theory (COLT), Helsinki.
[7] Cesa-Bianchi, N. and Lugosi, G. (2012) Combinatorial Bandits. Journal of Computer and System Sciences, 78, 1404-1422.
[8] Uchiya, T., Nakamura, A. and Kudo, M. (2010) Algorithms for Adversarial Bandit Problems with Multiple Plays. In: Hutter, M., Stephan, F., Vovk, V. and Zeugmann, T., Eds., Algorithmic Learning Theory, Lecture Notes in Artificial Intelligence No. 6331, Springer.
[9] Auer, P., Cesa-Bianchi, N., Freund, Y. and Schapire, R.E. (1995) Gambling in a Rigged Casino: The Adversarial Multi-Armed Bandit Problem. Proceedings of 36th Annual Symposium on Foundations of Computer Science, Milwaukee, 322-331.
[10] Hochbaum, D.S. (1995) Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, Boston.
[11] He, D., Chen, W., Wang, L. and Liu, T.-Y. (2013) Online Learning for Auction Mechanism in Bandit Setting. Decision Support Systems, 56.
[12] Park, C. and Lee, J. (2012) Intelligent Traffic Control Based on Multi-Armed Bandit and Wireless Scheduling Techniques. International Conference on Advances in Vehicular System, Technologies and Applications, Venice.
[13] Bertsekas, D.P. (1998) Network Optimization. Athena Scientific, Belmont.
[14] Blum, A. and Burch, C. (2000) On-Line Learning and the Metrical Task System Problem. Machine Learning, 39.
[15] Cole, S.J. and Lim, C. (2008) Algorithms for Network Interdiction and Fortification Games. Springer Optimization and Its Applications, 17.
[16] Vaněk, O., Jakob, M. and Pěchouček, M. (2011) Using Agents to Improve International Maritime Transport Security. IEEE Intelligent Systems, 26, 90-95.
