Beating the Adaptive Bandit with High Probability

Jacob Abernethy, Computer Science Division, UC Berkeley
Alexander Rakhlin, Department of Statistics, University of Pennsylvania

Abstract — We provide a principled way of proving Õ(√T) high-probability guarantees for partial-information (bandit) problems over arbitrary convex decision sets. First, we prove a regret guarantee for the full-information problem in terms of local norms, both for entropy and self-concordant barrier regularization, unifying these methods. Given one such algorithm as a black-box, we can convert a bandit problem into a full-information problem using a sampling scheme. The main result states that a high-probability Õ(√T) bound holds whenever the black-box, the sampling scheme, and the estimates of the missing information satisfy a number of conditions, which are relatively easy to check. At the heart of the method is a construction of linear upper bounds on confidence intervals. As applications of the main result, we provide the first known efficient algorithm for the sphere with an Õ(√T) high-probability bound. We also derive the result for the n-simplex, improving the O(√(nT log(nT))) bound of Auer et al [3] by replacing the log T term with log log T and closing the gap to the lower bound of Ω(√(nT)). While Õ(√T) high-probability bounds should hold for general decision sets through our main result, the construction of linear upper bounds depends on the particular geometry of the set; we believe that the sphere example already exhibits the necessary ingredients. The guarantees we obtain hold for adaptive adversaries (unlike the in-expectation results of [1]), and the algorithms are efficient, provided that the linear upper bounds on confidence can be computed.

I. INTRODUCTION

The problem of Online Convex Optimization, in which a player attempts to minimize his regret against a possibly adversarial sequence of convex cost functions, is now quite well-understood. The more recent research trend has been to consider various limited-information versions of this problem.
In particular, the bandit version of Online Linear Optimization (OLO) has received much attention in the past few years. To be precise, the problem we are interested in can be phrased as a repeated game between the player and the adversary. At each round, the player picks a decision from the allowed convex set of moves, and the adversary simultaneously picks a linear cost function from her allowed set of moves. Unlike the well-understood OLO game, in the bandit version only the cost of the decision is revealed to the player, not the cost function itself. The adversary, on the other hand, is aware of the complete history. The aim of the player is to minimize the regret: the cumulative cost incurred over the course of the game minus the cumulative cost of the best fixed decision. The scarcity of information revealed to the player makes the problem difficult. The first efficient algorithm with an Õ(√T) guarantee on the regret for optimization over arbitrary convex sets was recently obtained in [1]. This guarantee was shown to hold in expectation, and the question of obtaining guarantees in high probability was left open. In this paper, we develop a general framework for obtaining high-probability statements for bandit problems. We aim to provide a clean picture, building upon the mechanism employed in [3], [5]. We also simplify the proof of [1] for the regret of regularization with a self-concordant barrier and put it into the context of a general class of regret bounds based on local norms. A reader surveying the literature on bandit optimization can easily get confused trying to distinguish between the results. Thus, we first itemize some recent papers according to the following criteria: (a) efficient algorithm vs inefficient algorithm; (b) arbitrary convex set vs simplex or the set of flows in a graph; (c) optimal Õ(√T) vs suboptimal (e.g. O(T^{2/3})) guarantee; (d) in-expectation vs high-probability guarantee; and (e) whether the result holds for an adaptive adversary or only an oblivious one.
For all the results we are aware of (including the ones in this paper), a high-probability guarantee on the regret naturally covers the case of an adaptive adversary. This is not necessarily true for the in-expectation results. With respect to these parameters, Auer et al [3] obtained an efficient algorithm for the simplex, with an optimal guarantee which holds in high probability. McMahan and Blum [13] and Flaxman et al [11] obtained efficient algorithms for an arbitrary convex set with suboptimal guarantees which hold in expectation against an adaptive adversary. Awerbuch and Kleinberg [4] obtained an efficient algorithm for the set of flows with a suboptimal guarantee which holds in expectation against an adaptive adversary. György et al [12] obtained an efficient algorithm for the set of flows with a suboptimal guarantee which holds in high probability.¹ Dani et al [9] obtained an inefficient algorithm for an arbitrary set, with an optimal guarantee which holds in expectation against an oblivious adversary. The algorithm can be implemented efficiently for the set of flows. Bartlett et al [5] extended the result of [9] to obtain an inefficient algorithm for an arbitrary set, with an optimal

¹ The authors also obtained an optimal guarantee for the set of flows in the setting where the lengths of all edges on the chosen path are revealed. This does not match the bandit problem considered in this paper.

guarantee which holds in high probability. The algorithm cannot (in a straightforward way) be implemented efficiently for the set of flows. Abernethy et al [1] exhibited an efficient algorithm for an arbitrary convex set, with an optimal guarantee which holds in expectation against an oblivious adversary. In this paper, we obtain an efficient algorithm for the sphere and the simplex with an optimal guarantee which holds in high probability (and, thus, against an adaptive adversary). Analogous results can be obtained for other convex sets; however, such results would have to be considered on a per-case basis, as the specific geometry of the set plays an important role in obtaining an efficient algorithm with an optimal high-probability guarantee. This paper is organized as follows. In Section II, we discuss full-information algorithms which will be used as black-boxes for bandit optimization. In Section II-B we prove the known regret guarantees which arise from regularization with a strongly convex function. We argue that these guarantees are not strong enough to be used for bandit optimization and, in Section II-C, we introduce a notion of local norms. We prove general regret guarantees with respect to these norms for regularization with a self-concordant barrier and, for the case of the n-simplex, with the entropy function. This allows us to have a unified analysis of bandit optimization with either of these two methods as a black-box. Section III discusses the method of using a randomized algorithm for converting a full-information algorithm into a bandit algorithm. We discuss the advantages of high-probability results over the in-expectation results and explain why the straightforward way of applying concentration inequalities does not work. Section IV contains the main results of the paper. We state the main result, Theorem 4.1, and then apply it to various settings in the subsequent sections. The multiarmed bandit setting (the simplex case) is considered in Section V-A, and we improve upon the result of Auer et al [3] by removing the log T factor. We provide a solution for the sphere in Section V-B.
In passing, we mention how the in-expectation result for general convex sets of [1] immediately follows from Theorem 2.3. Another sampling scheme for general bodies is suggested, although we do not go into the details. The proof of our main result, Theorem 4.1, is given in Section VI. It is based on lemmas whose proofs can be found in the technical report [2].

II. FULL-INFORMATION ALGORITHMS

In this paper, we strive to obtain the most general results possible. To this end, the bandit algorithms in Section IV will take as a subroutine an abstract full-information black-box for regret minimization. We devote the present section to describing known guarantees for some full-information algorithms, as well as to developing a new family of guarantees under local norms. The latter are suited to the study of bandit optimization. To make things concrete, the full-information setting is that of online linear optimization, which is phrased as the following game between the learner (player) and the environment (adversary). Let K ⊆ R^n be a closed convex set. At each time step t = 1 to T:

- Player chooses x_t ∈ K
- Adversary independently chooses f_t ∈ R^n
- Player suffers loss f_t·x_t and observes f_t

The aim of the player (algorithm) is to minimize the regret against any comparator u ∈ K:

R_T(u) := Σ_{t=1}^T f_t·x_t − Σ_{t=1}^T f_t·u.

A. Algorithms

Let R(x) be a convex function. We consider the following family (with respect to the choice of R) of Follow the Regularized Leader algorithms:

Algorithm 1: Follow the Regularized Leader (FTRL)
Input: η > 0. On the first round, play x_1 := arg min_{x∈K} R(x). On round t+1, play

x_{t+1} := arg min_{x∈K} [ η Σ_{s=1}^t f_s·x + R(x) ].   (1)

Without loss of generality, we assume that min_x R(x) = 0, since the arg min is unchanged by constant shifts of R. We begin with a well-known fact, whose easy induction proof can be found e.g. in [16].

Proposition 2.1: The regret of Algorithm 1, relative to a comparator u ∈ K, can be upper bounded as

R_T(u) ≤ Σ_{t=1}^T f_t·(x_t − x_{t+1}) + η^{−1} R(u).   (2)

The FTRL algorithm is closely related to the following Mirror Descent-style algorithm [8], [16].

Algorithm 2: Mirror Descent with Projections
On the first round, play x_1 := arg min_{x∈K} R(x).
On round t+1, compute

x̃_{t+1} := arg min_{x∈R^n} η f_t·x + D_R(x, x_t)

and then play the projected point

x_{t+1} := arg min_{x∈K} D_R(x, x̃_{t+1}),

where D_R(x, y) := R(x) − R(y) − ∇R(y)·(x − y) is the Bregman divergence of R. This algorithm is given in two steps although it can be described in one. Indeed, the point x_{t+1} can simply be obtained as the solution to arg min_{x∈K} η f_t·x + D_R(x, x_t). However, we emphasize the unprojected point x̃_{t+1}, as it gives us an occasionally more useful regret bound:

Proposition 2.2: The regret of Algorithm 2, relative to a comparator u ∈ K, can be upper bounded as

R_T(u) ≤ Σ_{t=1}^T f_t·(x_t − x̃_{t+1}) + η^{−1} R(u).   (3)

The analogue of Proposition 2.1 also holds:

R_T(u) ≤ Σ_{t=1}^T f_t·(x_t − x_{t+1}) + η^{−1} R(u).   (4)

We also note that the two algorithms coincide if R is a barrier. We refer to [16] for the proofs of these facts.

B. Regret Bounds with Respect to Fixed Norms

The regret bounds stated in Propositions 2.1 and 2.2 are not ultimately satisfying. In particular, it is not immediately obvious whether the terms f_t·(x_t − x_{t+1}) are small. Notice that the point x_{t+1} depends both on f_t and on the behavior of R. It would be much more appealing if we could remove the dependence on the points x_t and have the regret depend solely on the Adversary's choices f_t and our choice of regularizer. This can indeed be achieved if we require certain conditions on our regularizer. The typical approach is to require that R be strongly convex with respect to some norm ‖·‖, which implies that

‖x_t − x_{t+1}‖² ≤ ⟨∇R(x_t) − ∇R(x_{t+1}), x_t − x_{t+1}⟩ ≤ ‖∇R(x_t) − ∇R(x_{t+1})‖* ‖x_t − x_{t+1}‖,   (5)

where ‖·‖* is the norm dual to ‖·‖, and the last step follows by Hölder's inequality. Hence, strong convexity of R implies ‖x_t − x_{t+1}‖ ≤ ‖∇R(x_t) − ∇R(x_{t+1})‖*, making possible the following result.

Proposition 2.3: When R is strongly convex with respect to the norm ‖·‖, then for Algorithms 1 and 2 we have the following regret bound²:

R_T(u) ≤ η Σ_{t=1}^T ‖f_t‖*² + η^{−1} R(u).

Proof: For the case of FTRL (Algorithm 1), when R is a barrier function (and thus the arg min is always attained in the interior of K), it is a convenient fact that ∇R(x_t) − ∇R(x_{t+1}) = η f_t. Applying Hölder's inequality in the statement of Proposition 2.1 leads to the desired result. If R is not a barrier, an application of the Kolmogorov criterion (see [7], Theorem 2.4.2) for generalized projections at step (5) yields the statement of the Proposition. For Algorithm 2, the proof is a bit more involved, but is well known (see e.g. [6]). Again, we refer the reader to [16], [17] for details.

The easiest way to see Proposition 2.3 at work is to assume that f_t ∈ B_p and K ⊆ B_q, the unit zero-centered balls with respect to the ℓ_p and ℓ_q norms, where (p, q) is a dual pair.
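Before specializing to the simplex, the bound of Proposition 2.3 can be exercised numerically. The sketch below (all names and parameter values are our own toy choices, not from the paper) runs Algorithm 1 with the quadratic regularizer R(x) = ½‖x‖₂² over the unit Euclidean ball, for which update (1) reduces to projecting −η Σ_s f_s onto the ball:

```python
import numpy as np

def ftrl_step(cum_loss, eta):
    """FTRL update (1) for R(x) = 0.5*||x||_2^2 and K = unit L2 ball:
    argmin_{x in K} eta*<cum_loss, x> + R(x) is the Euclidean
    projection of -eta*cum_loss onto the ball."""
    x = -eta * cum_loss
    return x / max(1.0, np.linalg.norm(x))

rng = np.random.default_rng(0)
n, T, eta = 5, 200, 0.05
cum_loss = np.zeros(n)
player_loss = 0.0
for t in range(T):
    x = ftrl_step(cum_loss, eta)        # x_{t+1} built from f_1, ..., f_t
    f = rng.uniform(-1, 1, size=n)      # adversary's linear cost f_t
    player_loss += f @ x
    cum_loss += f

# The best fixed comparator in hindsight lies on the boundary of the ball.
u = -cum_loss / np.linalg.norm(cum_loss)
regret = player_loss - cum_loss @ u
# Proposition 2.3: regret <= eta * sum_t ||f_t||_2^2 + R(u)/eta.
assert regret <= eta * T * n + 0.5 / eta
```

Since ‖f_t‖₂² ≤ n and R(u) = 1/2 here, the final assertion is exactly the guarantee of Proposition 2.3 for this run.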
When faced with the particular choice of the (ℓ_∞, ℓ_1) pair of norms, the natural choice of regularization is the unnormalized negative entropy function

R(x) = Σ_i (x[i] log x[i] − x[i]) + (1 + log n),   (6)

defined over the positive orthant. Here the 1 + log n term ensures that min R = 0 over the n-simplex K. It is easy to see that this regularization function leads to the so-called exponential weights:

x_{t+1}[i] = exp(−η Σ_{s=1}^t f_s[i]) / Σ_{j=1}^n exp(−η Σ_{s=1}^t f_s[j]),

and indeed this is true for both Algorithm 1 and Algorithm 2. For the future, it is useful to note that the unprojected update x̃_{t+1} has the very simple unnormalized form:

x̃_{t+1}[i] = x_t[i] exp(−η f_t[i]).   (7)

It is well known that the entropy function has the useful property of strong convexity with respect to the ℓ_1 norm. We can thus apply Proposition 2.3 to obtain:

R_T(u) ≤ η Σ_{t=1}^T ‖f_t‖_∞² + η^{−1} log N,

where the log N arises by taking R(·) at any corner of the n-simplex. In the expert setting it is typical to assume that ‖f_t‖_∞ ≤ 1, and so setting η = √(log N / T) appropriately we obtain

R_T(u) ≤ ηT + η^{−1} log N = 2√(T log N).

² We also mention that a more refined proof leads to a constant of 1/2 instead of 1 in front of the η Σ_t ‖f_t‖*² term.

C. Regret Bounds with Respect to Local Norms

The analysis of Proposition 2.3 is the typical approach, and indeed it can be shown that the above bound for exponential weights is very tight, i.e. within a small constant factor of optimal. On the other hand, there are times when we cannot assume that f_t is bounded with respect to a fixed norm. This is particularly relevant in the bandit setting, where we will be estimating the functions f_t, yet our estimates will blow up depending on the location of the point x_t. In such cases, to obtain tighter bounds, it will be necessary to measure the size of f_t with respect to a changing norm. While it may not be obvious at present, the ideal choice of norm is the one induced by the inverse Hessian of R at the point x_t. From now on, define ‖z‖_x := √(z^⊤ ∇²R(x) z), where z ∈ R^n is arbitrary and where R is assumed to be the regularizer in question. The dual of the norm ‖·‖_x is precisely the norm with respect to the inverse Hessian, i.e. ‖z‖_{x,*} := √(z^⊤ ∇²R(x)^{−1} z).
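The fixed-norm exponential-weights guarantee just derived can be checked on simulated data. The sizes, seed, and random losses below are invented for illustration; the run follows update (7) with renormalization:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 400, 8
losses = rng.uniform(0, 1, size=(T, n))   # f_t[i] in [0, 1]
eta = np.sqrt(np.log(n) / T)

x = np.full(n, 1.0 / n)                   # x_1 minimizes the entropy (6)
player = 0.0
for f in losses:
    player += x @ f
    x = x * np.exp(-eta * f)              # unnormalized update (7)
    x = x / x.sum()                       # renormalize onto the simplex
best = losses.sum(axis=0).min()           # best fixed expert in hindsight
regret = player - best
assert regret <= 2 * np.sqrt(T * np.log(n))   # the 2*sqrt(T log N) bound
```

The final assertion instantiates the bound 2√(T log N) with N = n experts.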
Our goal will now be to obtain bounds of the form

R_T(u) ≤ η Σ_{t=1}^T (‖f_t‖_{t,*})² + η^{−1} R(u).   (8)

Let us introduce the following shorthand: ‖z‖_t := ‖z‖_{x_t} for the norm defined with respect to x_t, and ‖z‖_{t,*} for its dual.

For the case when R(x) = ½‖x‖₂² (leading to the online gradient descent algorithm), this bound is easy: since ∇²R(x) = I_n and R is strongly convex with respect to the ℓ_2 norm, we already know that

R_T(u) ≤ η Σ_{t=1}^T ‖f_t‖₂² + η^{−1} R(u) = η Σ_{t=1}^T (‖f_t‖_{t,*})² + η^{−1} R(u).

1) Regret guarantee for the entropy regularizer: For the entropic regularization case mentioned above, proving a regret bound with respect to the local norm requires a little more work. First notice that ∇²R(x) = diag(x[1]^{−1}, ..., x[n]^{−1}), and that 1 − e^{−x} ≤ x for all real x. Next, using Eq. (7),

‖x_t − x̃_{t+1}‖_t² = Σ_{i=1}^n (x_t[i] − x̃_{t+1}[i])² / x_t[i] = Σ_{i=1}^n x_t[i] (1 − e^{−η f_t[i]})² ≤ η² Σ_{i=1}^n x_t[i] f_t[i]² = η² (‖f_t‖_{t,*})².

Now we make special use of Proposition 2.2. By Hölder's inequality,

R_T(u) ≤ Σ_{t=1}^T ‖f_t‖_{t,*} ‖x_t − x̃_{t+1}‖_t + η^{−1} R(u) ≤ η Σ_{t=1}^T (‖f_t‖_{t,*})² + η^{−1} R(u).

It can be verified that Algorithms 1 and 2 produce the same x_t when R is the entropy function and K is the simplex. Thus, we have proved the following theorem.

Theorem 2.1: The exponential weights algorithm (either Algorithm 1 or Algorithm 2) enjoys the following bound in terms of local norms:

R_T(u) ≤ η Σ_{t=1}^T (‖f_t‖_{t,*})² + η^{−1} R(u).

As a side remark, we mention that one can prove the same guarantee (with a slightly worse constant) by starting from Eq. (2) instead of Eq. (3). A lemma, which can be found in the Appendix of [2], implies that

‖x_t − x_{t+1}‖_t² = Σ_{i=1}^n x_t[i] ( 1 − e^{−η f_t[i]} / Σ_{j=1}^n x_t[j] e^{−η f_t[j]} )² ≤ β Σ_{i=1}^n x_t[i] (η f_t[i])² = β (‖η f_t‖_{t,*})²

for a small constant β.

2) Regret guarantee for the self-concordant regularizer: It was shown in [1] that, for the case of linear bandit optimization, the regularization function must have the property that it curves strongly near the boundary. Indeed, it was observed that the Hessian of R must behave roughly as the inverse distance 1/d, or even the inverse squared distance 1/d², to the boundary. The entropy function discussed above possesses the former property on the n-simplex, but functions with this 1/d growth property are not readily available for general convex sets. To obtain a function whose Hessian grows as 1/d² is much easier: the self-concordant barrier, of which the log barrier is the standard example, is the central object of study in Interior Point Methods.
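The key entropy inequality above, ‖x_t − x̃_{t+1}‖_t² ≤ η² (‖f_t‖_{t,*})², can be checked numerically at a random point. The dimensions and seed below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n, eta = 10, 0.1
x = rng.dirichlet(np.ones(n))        # x_t in the simplex
f = rng.uniform(0, 1, size=n)        # f_t with nonnegative entries

x_tilde = x * np.exp(-eta * f)       # unprojected update (7)

# ||x_t - x~_{t+1}||_t^2 with Hessian(R) = diag(1/x_t[i]):
lhs = np.sum((x - x_tilde) ** 2 / x)
# eta^2 * (||f_t||_{t,*})^2 = eta^2 * sum_i x_t[i] * f_t[i]^2:
rhs = eta ** 2 * np.sum(x * f ** 2)
assert lhs <= rhs + 1e-12            # since 1 - e^{-a} <= a for a >= 0
```

The comparison is exactly the chain of inequalities displayed above, with the Hessian diag(x[i]^{−1}) plugged in.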
In particular, self-concordant barriers always exist and can be efficiently computed for many known bodies (see, e.g., [14]). For a convex set with linear constraints, the typical choice of a self-concordant barrier is simply the sum of negative log distances to each boundary. That is, if the set is defined by Ax ≤ b, then we would let R(x) = −Σ_i log(b_i − a_i^⊤ x), where a_i is the i-th row of A. It is true that, up to a constant, R is strongly convex with respect to the ℓ_2 norm, and we can then easily prove a bound in terms of ‖f_t‖₂². On the other hand, it is precisely the case of bandit linear optimization for which it is useful to bound the regret in terms of the local norms ‖f_t‖_{t,*}, as in (8). It was shown in [1] that the Hessian of a self-concordant barrier not only plays a crucial role in bounding the regret, but also gives a handle on the local geometry through the notion of a Dikin ellipsoid. We refer the reader to [1] for more information on the Dikin ellipsoid and its relation to sampling. As before, we can use Hölder's inequality to bound f_t·(x_t − x_{t+1}) ≤ ‖f_t‖_{t,*} ‖x_t − x_{t+1}‖_t, and now, as in the previous section, we would like to replace ‖x_t − x_{t+1}‖_t with the dual quantity η ‖f_t‖_{t,*}. While it is not immediately obvious how this should be accomplished, we can appeal to several nice results about self-concordant functions which make our job easy. Define the objective of Algorithm 1 as

Φ_t(x) = η Σ_{s=1}^t f_s·x + R(x).

Since the barrier R goes to infinity at the boundary of the set K, we have that x_{t+1} is the unconstrained minimizer of Φ_t. To begin our short journey into the land of Interior Point Methods, define the Newton decrement for Φ_t as

λ(x, Φ_t) := ‖∇Φ_t(x)‖_{x,*} = √( ∇Φ_t(x)^⊤ ∇²Φ_t(x)^{−1} ∇Φ_t(x) ),

and note that since R is self-concordant, so is Φ_t. The above quantity can be used to measure, roughly, how far a point is from the global optimum:

Theorem 2.2 (e.g. [14]): For any self-concordant function g, whenever λ(x, g) ≤ 1/2, we have

‖x − arg min g‖_x ≤ 2 λ(x, g),

where the local norm ‖·‖_x is defined with respect to g, i.e. ‖y‖_x := √(y^⊤ ∇²g(x) y).

We can immediately apply this theorem using the objective Φ_t and the point x_t.
Recalling that ∇²Φ_t = ∇²R, we see that, under the conditions of the theorem,

‖x_t − x_{t+1}‖_t = ‖x_t − arg min Φ_t‖_t ≤ 2 λ(x_t, Φ_t) = 2η ‖f_t‖_{t,*};

the last equality holds because, as is easy to check, ∇Φ_t(x_t) = η f_t (indeed, x_t is the unconstrained minimizer of Φ_{t−1}, so ∇Φ_{t−1}(x_t) = 0, and Φ_t(x) = Φ_{t−1}(x) + η f_t·x). We therefore have:

Theorem 2.3: Suppose that for all t ∈ {1, ..., T} we have η ‖f_t‖_{t,*} ≤ 1/2, and that R(·) is self-concordant. Then

R_T(u) ≤ 2η Σ_{t=1}^T [‖f_t‖_{t,*}]² + η^{−1} R(u).

Given Theorem 2.3, the result of Abernethy, Hazan, and Rakhlin [1] follows immediately, as we show in Section V-C.

III. BANDIT FEEDBACK

In the bandit version of online linear optimization, the function f_t is not revealed to us except for its value at x_t. The mechanism employed by all algorithms known to the authors is to construct a biased or unbiased estimate f̃_t of the vector f_t from the single number revealed to us, and to feed it to the black-box full-information algorithm. In order to construct f̃_t, the algorithm has to randomly sample y_t around x_t instead of deterministically playing x_t. Hence, the template bandit algorithm is: at round t, predict y_t such that E y_t ≈ x_t; observe f_t·y_t; construct f̃_t; feed it into the black box; and obtain the new x_{t+1}. The particular method for sampling y_t and constructing f̃_t will be called the sampling scheme. The regret of the above procedure, relative to a comparator u, is

R_T(u) = Σ_{t=1}^T f_t·(y_t − u).

However, the guarantees for the black-box are for a different quantity, which we denote by

R̃_T(u) = Σ_{t=1}^T f̃_t·(x_t − u).

Let E_t denote the conditional expectation, given the random variables for time steps 1, ..., t−1. If it is the case that E_t f̃_t = f_t and E_t y_t = x_t, then for any fixed u,

E [ Σ_{t=1}^T f̃_t·(x_t − u) ] = E [ Σ_{t=1}^T f_t·(y_t − u) ].   (9)

We conclude that E R_T(u) = E R̃_T(u). Hence, the expected regret against a fixed u can be bounded through the expected regret of the black-box. There are two downsides to the above argument. The first is that an in-expectation result is much weaker than the corresponding high-probability statement, as the variance of the quantities involved can be (and, in fact, is) very large. It is not very satisfying to say that the regret is of the correct order in expectation but has fluctuations of a higher order of magnitude. The second weakness is the fact that u is fixed and, therefore, cannot depend on the random moves of the player; in other words, the adversary must be oblivious. Both of these downsides are overcome by proving a high-probability guarantee.
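The template just described can be sketched end-to-end. The toy instantiation below (simplex decision set, uniform exploration mixture, importance-weighted estimate, and an exponential-weights black box, anticipating Section V-A) uses invented parameters and a simulated adversary, and is only meant to show where each piece of the template sits:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, gamma, eta = 4, 500, 0.1, 0.05
f_true = rng.uniform(0, 1, size=(T, n))   # hidden loss vectors f_t

x = np.full(n, 1.0 / n)                   # black box's current point x_t
total = 0.0
for t in range(T):
    p = (1 - gamma) * x + gamma / n       # sampling distribution for y_t
    i = rng.choice(n, p=p)                # play the corner y_t = e_i
    loss = f_true[t, i]                   # only f_t . y_t is revealed
    f_hat = np.zeros(n)
    f_hat[i] = loss / p[i]                # estimate with E_t f~_t = f_t
    x = x * np.exp(-eta * f_hat)          # exponential-weights black box
    x = x / x.sum()
    total += loss
assert 0.0 <= total <= T                  # losses lie in [0, 1]
```

The mixing parameter gamma plays the role of the slack E y_t ≈ x_t discussed around condition (B) of the next section.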
It is tempting to use the following (incorrect) argument for proving a high-probability bound on R_T(u), given an Õ(√T) bound on E R̃_T(u): to obtain a high-probability bound, fix a u ∈ K and use the Azuma-Hoeffding inequality to show an O(√T) concentration of R_T(u) around E R_T(u); next, replace E R_T(u) by E R̃_T(u), which is Õ(√T), and take a union bound over a discretization of the possible u. The last step only introduces a log factor into the bound, as we discuss later. This approach fails³ for the simple reason that, through the martingale difference argument, R_T(u) is concentrated around the sum of conditional expectations Σ_t E_t f_t·(y_t − u), not the full expectation E Σ_t f_t·(y_t − u). The sum of conditional expectations of the f_t·(y_t − u) terms is indeed equal to the sum of conditional expectations of the f̃_t·(x_t − u) terms. However, we do not know how to bound the latter: the regret guarantee for the black-box is for the expected regret, not the sum of conditional expectations, thus breaking the argument. Indeed, for proving high-probability bounds, a more refined analysis is needed. We try to convey the big picture in the next section and illustrate it by proving a high-probability bound for the sphere and the simplex, using regularization with a self-concordant barrier and with entropy, respectively, as black-boxes.

IV. HIGH PROBABILITY BOUNDS

We now present a template algorithm for bandit optimization. We assume that a full-information black-box algorithm for linear optimization is available to us. At each time step t = 1 to T:

- Decide on the sampling scheme for this round, i.e. construct a distribution for y_t with E y_t ≈ x_t.
- Draw a sample y_t ∈ K from the distribution and observe the loss f_t·y_t.
- Construct f̃_t such that E_t f̃_t = f_t.
- Construct a linear bias function g_t(u) = ĝ_t·u + μ_t.
- Feed f̃_t − α ĝ_t into the black-box (the constant μ_t does not affect the update) and receive x_{t+1}.

The algorithm requires two parameters, α and η, which in turn depend on various aspects of the problem. The following is the main result of the paper.

Theorem 4.1: Suppose f_t ∈ B_p for all t and K ⊆ B_q, where p and q are dual. Let α = √( log(2 log(T)/δ'') / (Tn) ).
Suppose we can find c_1, c_2, c_3, c_4, c_5, c_6 ≥ 0 such that for all t ∈ {1, ..., T} all of the following hold:

(A) The black-box full-information algorithm enjoys a regret bound of the form R̃_T(u) ≤ c_1 η Σ_t [‖f̃_t‖_{t,*}]² + η^{−1} R(u), with the local norm defined by ∇²R(x_t).
(B) ‖E y_t − x_t‖_q ≤ c_2 √(n/T).
(C) f̃_t·(x_t − u) ≤ c_3 √(nT) for all u ∈ K.
(D) We can construct a linear function g_t(u) = ĝ_t·u + μ_t such that (x_t − u)^⊤ (E_t f̃_t f̃_t^⊤) (x_t − u) ≤ g_t(u) for all u ∈ K, and g_t(x_t) ≤ c_4 n.
(E) The construction satisfies [‖f̃_t − α ĝ_t‖_{t,*}]² ≤ c_5 √T.
(F) On average, the norm is small: E_t [‖f̃_t − α ĝ_t‖_{t,*}]² ≤ c_6.

³ We thank Ambuj Tewari for very helpful discussions in understanding this.

(G) The conditions for the regret bound in (A) to hold are satisfied (e.g. η ‖f̃_t − α ĝ_t‖_{t,*} ≤ 1/2 for the log-barrier).

Then, for any fixed u ∈ K, with probability at least 1 − (δ + δ' + δ''),

Σ_{t=1}^T f_t·(y_t − u) ≤ η^{−1} R(u) + η T A_1 + √T A_2,

where

A_1 = c_1 ( c_6 + c_5 √(8 log(1/δ')) )   and
A_2 = √(8 log(1/δ)) + c_2 √n + (2c_3 + c_4 + 2) √n log(2 log(T)/δ'').

Remark 4.1: As long as c_1, ..., c_6 depend only weakly (e.g. logarithmically) on T, we obtain the optimal Õ(√T) dependence by setting η ∝ T^{−1/2}. The growth of the bound in terms of n depends on the problem at hand and the sampling method.

Remark 4.2: To obtain a statement which holds with probability at least 1 − δ for all u simultaneously, a union bound needs to be taken. For a set K which can be represented as a convex hull of a number of its vertices, the union bound introduces an extra logarithm of this number of vertices (see the simplex example below). For a set such as the sphere, an extra step of discretizing the set into a fine grid and taking a union over this (exponential) discretization is required. This technique can introduce an extra √(n log T) factor into the bound (see [10], [5] for details). Since this step depends on the particular K at hand, we leave it out of the main result.

Remark 4.3: The requirement (B) is a relaxation of E y_t = x_t. This slack is absolutely crucial for (D) to even be possible. In the simplex case the slack corresponds to mixing in a uniform distribution, which Auer et al [3] interpret as an exploration step. For the sphere case, it corresponds to staying O(T^{−1/2}) away from the boundary. From the point of view of the proof, the relaxation allows us to construct g_t, i.e. to control the sum of conditional variances of f̃_t·(x_t − u). We note that the slack is not necessary for bounding the expected regret only. This points to the large variance of the estimates and the weakness of the in-expectation results.

A. A Proof Sketch

Let us sketch the mechanism for proving high-probability bounds, which is applicable to a wide variety of sets and assumptions. We already mentioned that R_T(u) is concentrated, for a fixed u ∈ K, around the sum of conditional expectations Σ_t E_t f_t·(y_t − u), with typical deviations of O(√T).
The latter is equal to the sum of conditional expectations Σ_t E_t f̃_t·(x_t − u). The tricky part is in proving that R̃_T(u) is concentrated around this sum. The typical fluctuations of R̃_T(u) are more than √T, as the magnitude of f̃_t depends on T. Thus, the only statement we can make is that, with high probability,

Σ_t E_t f̃_t·(x_t − u) ≤ Σ_t f̃_t·(x_t − u) + c √(Var_T),

where Var_T is the sum of conditional variances, growing faster than linearly in T. The magic comes from splitting the √(Var_T) term into T terms by the arithmetic-geometric mean inequality and absorbing each of these terms into f̃_t, thereby biasing the estimates. At a high level, we are adding the standard deviation at each time step to the estimates f̃_t. Since this confidence interval is a concave function, black-box optimization over the modified f̃_t's will not work; the second magic step (due to this paper) is to find a linear function which uniformly bounds the confidence interval over the whole set K. If this can be done, the modified linear functions are fed to the black-box, which enjoys an upper bound of η Σ_t (‖·‖_{t,*})² in terms of the local norms of the modified functions. Finally, we show that this quantity is concentrated around the sum of conditional expectations of the corresponding terms with typical deviations of O(√T), and the sum of conditional expectations itself is bounded by O(√T) if the f̃_t's have been constructed carefully. The last result critically depends on the availability of a regret guarantee with local norms, as exhibited earlier in the paper. The above paragraph is an informal description of the proof, which can be found in Section VI. We refer to [2] for the details.

V. APPLICATIONS: THEOREM 4.1 AT WORK

For the sampling schemes below, we show that our constructions satisfy the conditions of Theorem 4.1, implying a high-probability guarantee of Õ(√T). For each scheme, we provide a visual depiction of the distribution from which we draw y_t: the size of the dots represents the relative probability mass, while the dotted ellipsoid represents a sphere in the local norm at x_t. Note that in the case of self-concordant R, this ellipsoid (the Dikin ellipsoid) is contained in the set, which allows us to sample from its eigenvectors (see [1]). A.
Example 1: Solution for the simplex

This case corresponds to the nonstochastic multiarmed bandit problem [3]. We assume that K is the simplex (i.e. q = 1) and 0 ≤ f_t[i] ≤ 1 (p = ∞).

Regularizer R: We set our regularization function to be the entropy (6) and use Algorithm 1 or 2 as the black-box.

Sampling of y_t: Let γ = √(n/T). Given the point x_t in the simplex, sample y_t = e_i with probability p_t[i] := (1 − γ) x_t[i] + γ/n.

Construction of f̃_t: Given the above sampling scheme, we define our estimates f̃_t the usual way:

f̃_t = (f_t·e_i) e_i / p_t[i] = f_t[i] e_i / p_t[i]   when y_t = e_i.   (10)
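Unbiasedness of the estimate (10) and the matrix bound used below in condition (D) can be verified by summing over the n outcomes exactly; the dimensions and the mixing parameter in this snippet are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, gamma = 6, 0.2
x = rng.dirichlet(np.ones(n))            # x_t in the simplex
f = rng.uniform(0, 1, size=n)            # true loss vector, f_t[i] <= 1
p = (1 - gamma) * x + gamma / n          # p_t[i] = (1-gamma)x_t[i] + gamma/n

mean = np.zeros(n)
second = np.zeros((n, n))
for i in range(n):                       # exact expectation over y_t = e_i
    f_hat = np.zeros(n)
    f_hat[i] = f[i] / p[i]               # estimate (10)
    mean += p[i] * f_hat
    second += p[i] * np.outer(f_hat, f_hat)

assert np.allclose(mean, f)              # E_t f~_t = f_t
# E_t f~ f~^T = diag(f[i]^2/p[i]) is dominated by diag(1/p[i]):
gap = np.diag(1.0 / p) - second
assert np.all(np.linalg.eigvalsh(gap) >= -1e-9)
```

The second assertion is the positive-semidefinite domination E_t f̃_t f̃_t^⊤ ⪯ Σ_i e_i e_i^⊤ / p_t[i] that drives the variance bound.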

Construction of g_t: The following g_t is appropriate for this problem:

g_t(u) := 2 + Σ_{i=1}^n u[i] / p_t[i].

Before we get started, we note a couple of useful facts that we use several times below:

x_t[i] / p_t[i] ≤ (1 − γ)^{−1}   and   p_t[i] ≥ γ/n = 1/√(nT).

Now we check the conditions of the theorem.

(A) Since we are using entropy as our regularization, we have already shown in Theorem 2.1 how to obtain the necessary bound with c_1 = 1.

(B) Notice that E y_t = (1 − γ) x_t + γ·unif(n), and thus ‖E y_t − x_t‖_1 = γ ‖unif(n) − x_t‖_1 ≤ 2γ = 2√(n/T), i.e. c_2 = 2.

(C) Since u is in the simplex, we see that c_3 = 1:

f̃_t·(x_t − u) ≤ max_i f̃_t[i] ≤ max_i 1/p_t[i] ≤ n/γ = √(nT).

(D) We check that g_t does indeed bound the variance. We can first compute

E_t f̃_t f̃_t^⊤ = Σ_{i=1}^n (f_t[i]² / p_t[i]) e_i e_i^⊤ ⪯ Σ_{i=1}^n (1/p_t[i]) e_i e_i^⊤.

We can now upper bound the variance of the estimated losses, but we need to do this on the entire simplex. Fortunately, since we are upper bounding a quadratic (a convex function), it suffices to check the corners u = e_i:

(x_t − e_i)^⊤ (E_t f̃_t f̃_t^⊤) (x_t − e_i) ≤ Σ_{j=1}^n (x_t[j] − 1[i = j])² / p_t[j] < 1/p_t[i] + Σ_{j≠i} x_t[j]² / p_t[j] ≤ 1/p_t[i] + (1 − γ)^{−2} Σ_{j≠i} p_t[j] ≤ 1/p_t[i] + 2 = g_t(e_i),

where we use the fact that (1 − γ)^{−2} ≤ 2 when T ≥ 16n. Additionally, we see that c_4 = 3:

g_t(x_t) = 2 + Σ_{i=1}^n x_t[i] / p_t[i] ≤ 2 + n (1 − γ)^{−1} ≤ 3n.

(E) We now check that, in the ‖·‖_{t,*} norm, the biased estimate is not too big. It is easy to check that ∇²R(x_t) = Σ_i x_t[i]^{−1} e_i e_i^⊤ and ∇²R(x_t)^{−1} = Σ_i x_t[i] e_i e_i^⊤. Now, assuming y_t = e_j, we can bound:

‖f̃_t − α ĝ_t‖_{t,*}² = Σ_{i=1}^n x_t[i] ( 1[i = j] f_t[j]/p_t[j] − α/p_t[i] )² ≤ 2 x_t[j] / p_t[j]² + 2α² Σ_{i=1}^n x_t[i] / p_t[i]² ≤ 2 (1 − γ)^{−2} n/γ + 2α² (1 − γ)^{−1} n²/γ.

Substituting γ = √(n/T), we obtain c_5 = 2 √n (1 − γ)^{−2} + 2 α² n^{3/2} (1 − γ)^{−1}.

(F) We also must check that, in expectation, the biased estimate is of constant order in the ‖·‖_{t,*} norm:

E_t [‖f̃_t − α ĝ_t‖_{t,*}]² ≤ 2 E_t ‖f̃_t‖_{t,*}² + 2α² ‖ĝ_t‖_{t,*}² ≤ 2 Σ_{i=1}^n x_t[i] / p_t[i] + 2α² Σ_{i=1}^n x_t[i] / p_t[i]² ≤ 4n + 4α² n^{3/2} √T =: c_6.

We conclude that

A_1 = 4n + 4α² n^{3/2} √T + ( 2√n (1 − γ)^{−2} + 2α² n^{3/2} (1 − γ)^{−1} ) √(8 log(1/δ'))   and
A_2 = √(8 log(1/δ)) + 2√n + 7 √n log(2 log(T)/δ'').

Now we switch to Big-O notation to elucidate the dependence on T and n. Recalling that α² = O( log(log(T)/δ'') / (Tn) ), we observe that A_1 = O(n) and A_2 = O(√n log log T).
Theorem 4.1 now states that, with probability at least 1 − (δ + δ' + δ''),

Σ_{t=1}^T f_t·(y_t − u) ≤ η^{−1} R(u) + η T A_1 + √T A_2

for any fixed u. Since the regret is a linear functional, it attains its maximum at one of the vertices of the simplex. Hence, unlike in the next section, we only need to take a union bound over these vertices to arrive at a statement for all u ∈ K. We thus set δ = δ' = δ'' = δ*/n. Observe that A_1's asymptotic dependence on n does not change, while A_2 now becomes O(√n log(n log T)). For any vertex u, the (shifted) entropy is R(u) = log n. Setting η = √( log n / (nT) ), we conclude that, with high probability,

∀u ∈ K:  Σ_{t=1}^T f_t·(y_t − u) = O( √(nT) log(n log T) ).

This bound improves upon the result of Auer et al [3], who obtained an O(√(nT log(nT))) bound for the problem (Algorithm EXP3.P). Our bound replaces the log T term with log log T, closing the gap to the lower bound of Ω(√(nT)). We conjecture that log log T growth in terms of T is the sharpest bound possible, due to the Law of the Iterated Logarithm. In the full version of the paper, we will use sharper concentration inequalities to keep the log log T under the square root.

B. Example 2: Solution for the Euclidean sphere

Suppose that K = B_2 ⊆ R^n and that the choices of the adversary are also ℓ_2-bounded by 1, i.e. p = q = 2. We point out that with the sampling scheme of [1] it is impossible to construct g_t satisfying the requirements of Theorem 4.1 (see also Section V-C). The modified sampling procedure below is key to reducing the variance of the estimates.

Regularizer R: We set our regularization function to be the standard log-barrier R(x) = −log(1 − ‖x‖₂²) for the sphere and use Algorithm 1 as the black-box.

Sampling of y_t: We can assume without loss of generality that x_t ≠ 0, so define z_t := x_t / ‖x_t‖₂. Towards the goal of keeping our sampled point y_t away from the boundary, define γ_t := max(1 − ‖x_t‖₂, √(n/T)). Now construct some n − 1 orthonormal basis of the subspace perpendicular to z_t, which we will call Perp(z_t). Sample our prediction y_t as follows:

y_t = z_t with prob. 1 − 3γ_t/4;   y_t = −z_t with prob. γ_t/4;   y_t = ±w ∈ Perp(z_t) with prob. γ_t/(4(n − 1)) each.   (11)

Construction of f̃_t: Given the above sampling scheme, we define our estimates f̃_t as follows:

f̃_t = (f_t·y_t) y_t / (2 Pr(y_t)),   (12)

where the probabilities Pr(·) are defined in equation (11). It is straightforward to check that E_t f̃_t = f_t.

Construction of g_t: The following choice of g_t will be shown to satisfy the requirements:

g_t(u) := (4n/γ_t)(1 − u·z_t) + 4n.

We now check the conditions of the theorem to verify that this construction leads to a high-probability bound.

(A) Since we are using a self-concordant regularizer, we already showed in Theorem 2.3 how to obtain the necessary bound with c_1 = 2.

(B) Notice that E y_t = (1 − γ_t) z_t, and because 1 − γ_t ≤ ‖x_t‖₂ ≤ 1 − γ_t + √(n/T) it follows that ‖E y_t − x_t‖₂ ≤ √(n/T). Hence c_2 = 1.

(C) Since we assume that ‖u‖₂ ≤ 1, we have ‖x_t − u‖₂ ≤ 2, and moreover |y_t·(x_t − u)| ≤ 1 for the perpendicular directions y_t ⊥ x_t. In all cases this gives

f̃_t·(x_t − u) = (f_t·y_t) y_t·(x_t − u) / (2 Pr(y_t)) ≤ 2n/γ_t ≤ 2√(nT),   i.e.  c_3 = 2.

(D) We check that g_t does indeed bound the variance. We first upper bound the matrix E_t f̃_t f̃_t^⊤:

E_t f̃_t f̃_t^⊤ = Σ_y Pr(y) (f_t·y)² y y^⊤ / (4 Pr(y)²) ⪯ (1/2) max_y Pr(y)^{−1} I_n ⪯ (2n/γ_t) I_n,

since the range of y is over ± vectors from an orthonormal basis and, by construction, each of these probabilities is at least γ_t/(4n). Now we can bound:

(x_t − u)^⊤ (E_t f̃_t f̃_t^⊤) (x_t − u) ≤ (2n/γ_t) ‖x_t − u‖₂² ≤ (2n/γ_t) (2 − 2 x_t·u) ≤ (4n/γ_t)(1 − z_t·u) + 4n = g_t(u).

Additionally, we check that the bias is not too large at x_t.
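As a sanity check (with invented dimensions and an invented point x_t), the distribution (11) and the estimate (12) can be verified by enumerating the 2n support points: the probabilities sum to one, E_t f̃_t = f_t, and E y_t = (1 − γ_t) z_t as required in (B):

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 5, 10000
x = rng.standard_normal(n)
x = 0.7 * x / np.linalg.norm(x)          # a point inside the unit ball
z = x / np.linalg.norm(x)                # z_t
gamma = max(1 - np.linalg.norm(x), np.sqrt(n / T))

# Complete z to an orthonormal basis via QR; columns 1.. span Perp(z).
Q, _ = np.linalg.qr(np.column_stack([z, rng.standard_normal((n, n - 1))]))
support = [(z, 1 - 3 * gamma / 4), (-z, gamma / 4)]
for k in range(1, n):
    w = Q[:, k]
    support += [(w, gamma / (4 * (n - 1))), (-w, gamma / (4 * (n - 1)))]

f = rng.standard_normal(n)
f = f / np.linalg.norm(f)                # ||f_t||_2 <= 1

mean = sum(pr * ((f @ y) * y / (2 * pr)) for y, pr in support)  # (12)
ey = sum(pr * y for y, pr in support)
assert np.isclose(sum(pr for _, pr in support), 1.0)
assert np.allclose(mean, f)              # unbiasedness of (12)
assert np.allclose(ey, (1 - gamma) * z)  # E y_t = (1 - gamma_t) z_t
assert np.linalg.norm(ey - x) <= np.sqrt(n / T) + 1e-9
```

The unbiasedness falls out because the support, weighted by 1/(2 Pr(y)), reconstructs the identity over the orthonormal basis {z_t} ∪ Perp(z_t).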
Recalling that z_t = x_t/‖x_t‖₂ and since 1 − ‖x_t‖₂ ≤ γ_t by construction,

g_t(x_t) = (4n/γ_t)(1 − ‖x_t‖₂) + 4n ≤ 20n,   i.e.  c_4 = 20.

(E) We now check that, in the ‖·‖_{t,*} norm, the biased estimate is not too big. We can lower bound

∇²R(x_t) = 2(1 − ‖x_t‖₂²)^{−1} I + 4(1 − ‖x_t‖₂²)^{−2} x_t x_t^⊤ ⪰ (1 − ‖x_t‖₂)^{−1} (I − z_t z_t^⊤) + (1/2)(1 − ‖x_t‖₂)^{−2} z_t z_t^⊤,

using 1 − ‖x_t‖₂² ≤ 2(1 − ‖x_t‖₂) for ‖x_t‖₂ ∈ [0, 1]. This tells us that the eigenvalues of ∇²R(x_t) are bounded from below by (1 − ‖x_t‖₂)^{−1} in all directions orthogonal to x_t, and by (1/2)(1 − ‖x_t‖₂)^{−2} in the direction of x_t. Thus

∇²R(x_t)^{−1} ⪯ (1 − ‖x_t‖₂)(I − z_t z_t^⊤) + 2(1 − ‖x_t‖₂)² z_t z_t^⊤.

Now that we have control of the norm induced by ∇²R(x_t)^{−1}, we can bound

‖ĝ_t‖_{t,*}² = (4n/γ_t)² z_t^⊤ ∇²R(x_t)^{−1} z_t ≤ (4n/γ_t)² · 2(1 − ‖x_t‖₂)² ≤ 64 n².

If y_t = z_t or −z_t,

‖f̃_t‖_{t,*}² ≤ (2/γ_t)² z_t^⊤ ∇²R(x_t)^{−1} z_t ≤ (2/γ_t)² · 2(1 − ‖x_t‖₂)² ≤ 32.

If ±y_t ∈ Perp(z_t),

‖f̃_t‖_{t,*}² ≤ (2(n − 1)/γ_t)² y_t^⊤ ∇²R(x_t)^{−1} y_t ≤ 4(n − 1)² (1 − ‖x_t‖₂) / γ_t² ≤ 4 n^{3/2} T^{1/2}.

These last two bounds give us, for T large enough,

‖f̃_t − α ĝ_t‖_{t,*}² ≤ 8 n^{3/2} T^{1/2} + 128 n² α²,   i.e.  c_5 = 8 n^{3/2} + 128 n² α².

(F) We also must check that, in expectation, the biased estimate is of constant order in the ‖·‖_{t,*} norm:

E_t [‖f̃_t − α ĝ_t‖_{t,*}]² ≤ 2 E_t ‖f̃_t‖_{t,*}² + 2α² ‖ĝ_t‖_{t,*}² = 2 Σ_y Pr(y) ‖f̃_t(y)‖_{t,*}² + 128 n² α² < 16 n² + 128 n² α² =: c_6.

(G) Theorem 2.3 comes with the requirement that η ‖f̃_t − α ĝ_t‖_{t,*} ≤ 1/2. From (E), ‖f̃_t − α ĝ_t‖_{t,*} = O(T^{1/4}). By taking η = O(T^{−1/2}), the requirement is satisfied for T large enough.

We conclude that

A_1 = 16 n² + 128 n² α² + ( 8 n^{3/2} + 128 n² α² ) √(8 log(1/δ'))   and
A_2 = √(8 log(1/δ)) + √n + 26 √n log(2 log(T)/δ'').

Recalling that α = √( log(2 log(T)/δ'') / (Tn) ), we observe that A_1 = O(n²) and A_2 = O(√n log log T). Theorem 4.1 then gives us, with η = √(log T) / (n √T),

Σ_{t=1}^T f_t·(y_t − u) ≤ η^{−1} R(u) + η T A_1 + √T A_2 = O( n √(T log T) )

with high probability, for any fixed u ∈ K which is T^{−1/2} away from the boundary. The asymptotic behavior in terms of n and T exactly matches the in-expectation result of [1], as the self-concordance parameter ϑ = 1 for the sphere. Now, to make the result uniform over all u, we discretize the set K into a grid of size T^{n/2} and take a union bound over all u in this set (see [9], [5] for details). Setting δ = δ' = δ'' = δ*/T^{n/2} leads to replacing all three log(1/δ) terms by (n/2) log T + log(1/δ*). Inspecting A_1, we observe that this substitution introduces a √(n log T) factor in front of the 8 n^{3/2} term, which, when balanced with η, exhibits η T A_1 = O(n √(T log T)) behavior. However, √T A_2 = O(n^{3/2} √T log T) now becomes the dominating term, as the log(2 log(T)/δ'') is not under the square root. We conclude that, with high probability,

∀u ∈ K:  Σ_{t=1}^T f_t·(y_t − u) = O( n^{3/2} √T log T ).

A more careful analysis, involving a sharper inequality in one of the steps of the proof of Theorem 4.1 (see [2]), should reduce the dependence on n to linear. This will be carried out in the full version of this paper.

C. Example 3: Recovering the result of [1]

While it does not require Theorem 4.1, for the sake of completeness we show that the in-expectation result of [1] follows immediately from Theorem 2.3.
For any convex set, the sampling procedure proposed in that paper is:

Regularizer R: the regularization function is a $\vartheta$-self-concordant barrier for $K$, whose existence is guaranteed (see [15], [14]). We use Algorithm 1 as the black-box.

Sampling of $y_t$: Let $\{e_1, \ldots, e_n\}$ and $\{\lambda_1, \ldots, \lambda_n\}$ be the sets of eigenvectors and eigenvalues of $\nabla^2 R(x_t)$. Choose $i$ uniformly at random from $\{1, \ldots, n\}$ and $\varepsilon = \pm 1$ with probability $1/2$. Sample $y_t = x_t + \varepsilon \lambda_i^{-1/2} e_i$.

Construction of $\tilde f_t$: Define $\tilde f_t := n\, f_t(y_t)\, \varepsilon\, \lambda_i^{1/2} e_i$. Since here we are not interested in high-probability bounds, we do not need to construct $g_t$.

Appealing to (9) and Theorem 2.3, it only remains to bound $\|\tilde f_t\|_{*,t}$. By construction, $\|\tilde f_t\|_{*,t}^2 = \tilde f_t^\top \big(\nabla^2 R(x_t)\big)^{-1} \tilde f_t \le n^2$. For any $u$ which is $1/T^2$ away from the boundary, $R(u) \le 2\vartheta \log T$ (see [1]). Thus, with $\eta = \sqrt{\vartheta \log T}/(n\sqrt{T})$, we obtain $\mathbb{E}\, R_T(u) \le 4n\sqrt{\vartheta T \log T}$, which recovers the in-expectation result with a slightly better constant.

The sampling scheme presented here does not satisfy the conditions of Theorem 4.1. Indeed, following the discussion in Remark 4.3, it is easy to prove (even for $K = [0,1]$) that it is impossible to construct $g_t$ with the desired properties. In other words, the variance of the estimates is larger than the desired regime. This realization was indeed the main motivation for this paper.

D. Example 4: Sampling schemes for general bodies

We remark that, while $R$ has to be fixed throughout the game, the sampling scheme does not. As long as the requirements of Theorem 4.1 are satisfied at each step, the high-probability bound holds true. The main difficulty in obtaining a result for general convex bodies $K$ is in the construction of $g_t(u)$, an upper bound on the variance. Such a function heavily depends on the geometry and must be constructed on a per-case basis. We conjecture that the following two sampling schemes, one for a curved boundary (similar to the spherical case) and one for a flat boundary (similar to the simplex case), should be enough to deal with most nice sets $K$.
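As a sanity check on the one-point construction of Example 3, the sketch below verifies that $\tilde f_t$ is an unbiased estimate of the loss vector when the loss is linear, $f_t(y) = \langle f, y\rangle$, by enumerating all $2n$ outcomes of $(i, \varepsilon)$. The particular Hessian (the ball barrier) and the test vectors are illustrative assumptions:

```python
import numpy as np

def estimate_expectation(f, x, H):
    """Exact expectation of the one-point estimate
    f_tilde = n * f_t(y_t) * eps * lambda_i^{1/2} e_i
    under the sampling y_t = x_t + eps * lambda_i^{-1/2} e_i,
    for a linear loss f_t(y) = <f, y>."""
    lam, E = np.linalg.eigh(H)          # eigenvalues, orthonormal eigenvectors
    n = len(f)
    total = np.zeros(n)
    for i in range(n):                  # i uniform on {1..n}, eps = +/-1 w.p. 1/2
        for eps in (1.0, -1.0):
            y = x + eps * lam[i] ** -0.5 * E[:, i]
            f_tilde = n * (f @ y) * eps * lam[i] ** 0.5 * E[:, i]
            total += f_tilde / (2 * n)  # Pr(i, eps) = 1 / (2n)
    return total

# Unbiasedness: the expectation recovers the loss vector f exactly.
rng = np.random.default_rng(1)
n = 4
f = rng.normal(size=n)
x = rng.normal(size=n)
x *= 0.5 / np.linalg.norm(x)            # a point inside the unit ball
s = 1.0 - x @ x
H = (2.0 / s) * np.eye(n) + (4.0 / s ** 2) * np.outer(x, x)  # ball-barrier Hessian
assert np.allclose(estimate_expectation(f, x, H), f)
```

Averaging over $\varepsilon$ cancels the $\langle f, x_t\rangle$ term and the $\lambda_i^{\pm 1/2}$ factors, leaving $\langle f, e_i\rangle e_i$; summing over the orthonormal eigenbasis returns $f$.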

- Put a large mass (e.g. $O(1/n)$) on $n-1$ points along the flat boundary, and put a small probability mass on a far-away point.
- As in the spherical case, put a large mass (close to 1) on a single point close to $x_t$ and small mass on $2(n-1)$ other points far away.

VI. PROOFS

We state four lemmas whose proofs can be found in the technical report [2].

Lemma 6.1: With probability at least $1 - \delta$,
$$\sum_{t=1}^T f_t(y_t - u) \le \sum_{t=1}^T f_t(x_t - u) + \sqrt{8 T \log(1/\delta)} + c_2 \sqrt{nT}.$$

The following lemma is based on a result proved in [5].

Lemma 6.2: For any $\delta < e^{-1}$ and $T \ge 4$, with probability at least $1 - 2(\log T)\delta$,
$$\sum_{t=1}^T f_t(x_t - u) \le \sum_{t=1}^T \tilde f_t(x_t - u) + 2\max\left\{ \sqrt{2 \sum_{t=1}^T \mathbb{E}_t\big[\tilde f_t(x_t - u)\big]^2},\ \big(1 + 2 c_3 \sqrt{n}\big)\sqrt{\log(1/\delta)} \right\}\sqrt{\log(1/\delta)}.$$

Lemma 6.3: For any $\delta' < e^{-1}$ and $T \ge 4$, with probability at least $1 - \delta'$,
$$\sum_{t=1}^T \tilde f_t(x_t - u) \le \sum_{t=1}^T \big(\tilde f_t - \alpha g_t\big)(x_t - u) + \big[2 c_3 + (c_4 + 2)\big]\sqrt{n T \log\big(2\log(T)/\delta'\big)}.$$

The final ingredient is the following concentration result.

Lemma 6.4: With probability at least $1 - \delta''$,
$$\sum_{t=1}^T \eta^2 \big\|\tilde f_t - \alpha g_t\big\|_{*,t}^2 \le \sum_{t=1}^T \eta^2\, \mathbb{E}_t\big\|\tilde f_t - \alpha g_t\big\|_{*,t}^2 + \eta^2 c_5 \sqrt{8 T \log(1/\delta'')}.$$

Combining the above lemmas, we now prove the theorem.

Proof: [Proof of Theorem 4.1] Combining Lemma 6.1 and Lemma 6.3, we obtain that
$$\sum_{t=1}^T f_t(y_t - u) \le \sum_{t=1}^T \big(\tilde f_t - \alpha g_t\big)(x_t - u) + \sqrt{8 T \log(1/\delta)} + c_2 \sqrt{nT} + \big[2 c_3 + (c_4 + 2)\big]\sqrt{n T \log\big(2\log T/\delta'\big)}$$
with probability at least $1 - (\delta + \delta')$. By the black-box guarantee applied to the functions $\tilde f_t - \alpha g_t$, for any fixed $u \in K$,
$$\sum_{t=1}^T \big(\tilde f_t - \alpha g_t\big)(x_t - u) \le \eta^{-1}\, 2 R(u) + c_1 \eta \sum_{t=1}^T \big\|\tilde f_t - \alpha g_t\big\|_{*,t}^2.$$
Combining the results, with probability at least $1 - (\delta + \delta')$,
$$\sum_{t=1}^T f_t(y_t - u) \le \eta^{-1}\, 2 R(u) + c_1 \eta \sum_{t=1}^T \big\|\tilde f_t - \alpha g_t\big\|_{*,t}^2 + c_2 \sqrt{nT} + \sqrt{8 T \log(1/\delta)} + \big[2 c_3 + (c_4 + 2)\big]\sqrt{n T \log\big(2\log T/\delta'\big)}.$$
Finally, by Lemma 6.4 and our assumption (F), with a bit of algebra we arrive at the statement of Theorem 4.1.

REFERENCES

[1] J. Abernethy, E. Hazan, and A. Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In Proceedings of the Twenty-First Annual Conference on Learning Theory, 2008.
[2] J. Abernethy and A. Rakhlin. Beating the adaptive bandit with high probability. Technical Report UCB/EECS, EECS Department, University of California, Berkeley, Jan. 2009.
[3] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32(1):48-77, 2002.
[4] Baruch Awerbuch and Robert D. Kleinberg.
Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In STOC '04: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pages 45-53, New York, NY, USA, 2004. ACM.
[5] P. L. Bartlett, V. Dani, T. Hayes, S. Kakade, A. Rakhlin, and A. Tewari. High-probability regret bounds for online optimization. In Proceedings of the Twenty-First Annual Conference on Learning Theory, 2008.
[6] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett., 31(3):167-175, 2003.
[7] Y. Censor and S. A. Zenios. Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, 1997.
[8] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[9] Varsha Dani, Thomas Hayes, and Sham Kakade. The price of bandit information for online optimization. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, 2008.
[10] Varsha Dani and Thomas P. Hayes. Robbing the bandit: less regret in online geometric optimization against an adaptive adversary. In SODA '06: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, New York, NY, USA, 2006. ACM.
[11] Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In SODA '05: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, 2005. Society for Industrial and Applied Mathematics.
[12] A. György, T. Linder, G. Lugosi, and G. Ottucsák. The on-line shortest path problem under partial monitoring. Journal of Machine Learning Research, 8, 2007.
[13] H. Brendan McMahan and Avrim Blum. Online geometric optimization in the bandit setting against an adaptive adversary. In COLT, 2004.
[14] A. Nemirovski and M. Todd. Interior-point methods for optimization. Acta Numerica, 2008.
[15] Y. E. Nesterov and A. S. Nemirovskii. Interior Point Polynomial Algorithms in Convex Programming.
SIAM, Philadelphia, 1994.
[16] A. Rakhlin and A. Tewari. Lecture notes on online learning. Available at http://www-stat.wharton.upenn.edu/~rakhlin/papers/online_learning.pdf.
[17] Shai Shalev-Shwartz. Online Learning: Theory, Algorithms, and Applications. PhD thesis, Hebrew University, 2007.


In this chapter the model of free motion under gravity is extended to objects projected at an angle. When you have completed it, you should Cambridge Universiy Press 978--36-60033-7 Cambridge Inernaional AS and A Level Mahemaics: Mechanics Coursebook Excerp More Informaion Chaper The moion of projeciles In his chaper he model of free moion

More information

Essential Maps and Coincidence Principles for General Classes of Maps

Essential Maps and Coincidence Principles for General Classes of Maps Filoma 31:11 (2017), 3553 3558 hps://doi.org/10.2298/fil1711553o Published by Faculy of Sciences Mahemaics, Universiy of Niš, Serbia Available a: hp://www.pmf.ni.ac.rs/filoma Essenial Maps Coincidence

More information

Matlab and Python programming: how to get started

Matlab and Python programming: how to get started Malab and Pyhon programming: how o ge sared Equipping readers he skills o wrie programs o explore complex sysems and discover ineresing paerns from big daa is one of he main goals of his book. In his chaper,

More information

Fishing limits and the Logistic Equation. 1

Fishing limits and the Logistic Equation. 1 Fishing limis and he Logisic Equaion. 1 1. The Logisic Equaion. The logisic equaion is an equaion governing populaion growh for populaions in an environmen wih a limied amoun of resources (for insance,

More information

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY ECO 504 Spring 2006 Chris Sims RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY 1. INTRODUCTION Lagrange muliplier mehods are sandard fare in elemenary calculus courses, and hey play a cenral role in economic

More information

15. Vector Valued Functions

15. Vector Valued Functions 1. Vecor Valued Funcions Up o his poin, we have presened vecors wih consan componens, for example, 1, and,,4. However, we can allow he componens of a vecor o be funcions of a common variable. For example,

More information

A DELAY-DEPENDENT STABILITY CRITERIA FOR T-S FUZZY SYSTEM WITH TIME-DELAYS

A DELAY-DEPENDENT STABILITY CRITERIA FOR T-S FUZZY SYSTEM WITH TIME-DELAYS A DELAY-DEPENDENT STABILITY CRITERIA FOR T-S FUZZY SYSTEM WITH TIME-DELAYS Xinping Guan ;1 Fenglei Li Cailian Chen Insiue of Elecrical Engineering, Yanshan Universiy, Qinhuangdao, 066004, China. Deparmen

More information

Single and Double Pendulum Models

Single and Double Pendulum Models Single and Double Pendulum Models Mah 596 Projec Summary Spring 2016 Jarod Har 1 Overview Differen ypes of pendulums are used o model many phenomena in various disciplines. In paricular, single and double

More information

Cash Flow Valuation Mode Lin Discrete Time

Cash Flow Valuation Mode Lin Discrete Time IOSR Journal of Mahemaics (IOSR-JM) e-issn: 2278-5728,p-ISSN: 2319-765X, 6, Issue 6 (May. - Jun. 2013), PP 35-41 Cash Flow Valuaion Mode Lin Discree Time Olayiwola. M. A. and Oni, N. O. Deparmen of Mahemaics

More information

u(x) = e x 2 y + 2 ) Integrate and solve for x (1 + x)y + y = cos x Answer: Divide both sides by 1 + x and solve for y. y = x y + cos x

u(x) = e x 2 y + 2 ) Integrate and solve for x (1 + x)y + y = cos x Answer: Divide both sides by 1 + x and solve for y. y = x y + cos x . 1 Mah 211 Homework #3 February 2, 2001 2.4.3. y + (2/x)y = (cos x)/x 2 Answer: Compare y + (2/x) y = (cos x)/x 2 wih y = a(x)x + f(x)and noe ha a(x) = 2/x. Consequenly, an inegraing facor is found wih

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

20. Applications of the Genetic-Drift Model

20. Applications of the Genetic-Drift Model 0. Applicaions of he Geneic-Drif Model 1) Deermining he probabiliy of forming any paricular combinaion of genoypes in he nex generaion: Example: If he parenal allele frequencies are p 0 = 0.35 and q 0

More information

Instructor: Barry McQuarrie Page 1 of 5

Instructor: Barry McQuarrie Page 1 of 5 Procedure for Solving radical equaions 1. Algebraically isolae one radical by iself on one side of equal sign. 2. Raise each side of he equaion o an appropriae power o remove he radical. 3. Simplify. 4.

More information

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 38, No. 2, May 2013, pp. 209 227 ISSN 0364-765X (prin) ISSN 1526-5471 (online) hp://dx.doi.org/10.1287/moor.1120.0562 2013 INFORMS On Boundedness of Q-Learning Ieraes

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Aricle from Predicive Analyics and Fuurism July 6 Issue An Inroducion o Incremenal Learning By Qiang Wu and Dave Snell Machine learning provides useful ools for predicive analyics The ypical machine learning

More information

Families with no matchings of size s

Families with no matchings of size s Families wih no machings of size s Peer Franl Andrey Kupavsii Absrac Le 2, s 2 be posiive inegers. Le be an n-elemen se, n s. Subses of 2 are called families. If F ( ), hen i is called - uniform. Wha is

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC This documen was generaed a :45 PM 8/8/04 Copyrigh 04 Richard T. Woodward. An inroducion o dynamic opimizaion -- Opimal Conrol and Dynamic Programming AGEC 637-04 I. Overview of opimizaion Opimizaion is

More information

Stochastic Linear Optimization under Bandit Feedback

Stochastic Linear Optimization under Bandit Feedback Sochasic Linear Opimizaion under Bandi Feedback Varsha Dani and homas P. Hayes and Sham M. Kakade Absrac In he classical sochasic k-armed bandi problem, in each of a sequence of rounds, a decision maker

More information

The following report makes use of the process from Chapter 2 in Dr. Cumming s thesis.

The following report makes use of the process from Chapter 2 in Dr. Cumming s thesis. Zaleski 1 Joseph Zaleski Mah 451H Final Repor Conformal Mapping Mehods and ZST Hele Shaw Flow Inroducion The Hele Shaw problem has been sudied using linear sabiliy analysis and numerical mehods, bu a novel

More information