Regret in Online Combinatorial Optimization
Jean-Yves Audibert
Imagine, Université Paris Est, and Sierra, CNRS/ENS/INRIA
audibert@imagine.enpc.fr

Sébastien Bubeck
Department of Operations Research and Financial Engineering, Princeton University
sbubeck@princeton.edu

Gábor Lugosi
ICREA and Pompeu Fabra University
gabor.lugosi@upf.edu

March 29, 2013

Abstract

We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the minimal loss she would have achieved by picking, in hindsight, the best possible action. Our goal is to understand the magnitude of the best possible (minimax) regret. We study the problem under three different assumptions for the feedback the decision maker receives: full information, and the partial information models of the so-called semi-bandit and bandit problems. In the full information case we show that the standard exponentially weighted average forecaster is a provably suboptimal strategy. For the semi-bandit model, by combining the Mirror Descent algorithm and the INF (Implicitly Normalized Forecaster) strategy, we are able to prove the first optimal bounds. Finally, in the bandit case we discuss existing results in light of a new lower bound, and suggest a conjecture on the optimal regret in that case.

1 Introduction.

In this paper we consider the framework of online linear optimization. The setup may be described as a repeated game between a decision maker (or simply player, or forecaster) and an adversary as follows: at each time instance $t = 1, \dots, n$, the player chooses, possibly in a randomized way, an action from a given finite action set $\mathcal{A} \subset \mathbb{R}^d$. The action chosen by the player at time $t$ is denoted by $a_t \in \mathcal{A}$. Simultaneously to the player, the adversary chooses a loss vector $z_t \in \mathcal{Z} \subset \mathbb{R}^d$, and the loss incurred by the forecaster is $a_t^T z_t$. The goal of the player is to minimize the expected cumulative loss $\mathbb{E} \sum_{t=1}^n a_t^T z_t$, where the expectation is taken with respect to the player's internal randomization and, possibly, the adversary's randomization. In the basic full-information version of this problem, the player observes the adversary's move $z_t$ at the end of round $t$. Another important model for feedback is the so-called bandit problem, in which the player only observes the incurred loss $a_t^T z_t$. As a measure of performance we define the regret (see Footnote 1) of the player as
$$R_n = \mathbb{E} \sum_{t=1}^n a_t^T z_t - \min_{a \in \mathcal{A}} \mathbb{E} \sum_{t=1}^n a^T z_t.$$

In this paper we address a specific example of online linear optimization: we assume that the action set $\mathcal{A}$ is a subset of the $d$-dimensional hypercube $\{0,1\}^d$ such that $\|a\|_1 = m$ for all $a \in \mathcal{A}$, and that the adversary has a bounded loss per coordinate, that is (see Footnote 2), $\mathcal{Z} = [0,1]^d$. We call this setting online combinatorial optimization. As we will see below, this restriction of the general framework contains a rich class of problems. Indeed, in many interesting cases, actions are naturally represented by Boolean vectors. In addition to the full information and bandit versions of online combinatorial optimization, we also consider another type of feedback which makes sense only in this combinatorial setting. In the semi-bandit version, we assume that the player observes only the coordinates of $z_t$ that were played in $a_t$, that is, the player observes the vector $(a_{t,1} z_{t,1}, \dots, a_{t,d} z_{t,d})$. All three variants of online combinatorial optimization are sketched in Figure 1.

More rigorously, online combinatorial optimization is defined as a repeated game between a player and an adversary.
At each round $t = 1, \dots, n$ of the game, the player chooses a probability distribution $p_t$ over the set of actions $\mathcal{A} \subset \{0,1\}^d$ and draws a random action $a_t \in \mathcal{A}$ according to $p_t$. Simultaneously, the adversary chooses a vector $z_t \in [0,1]^d$. More formally, $z_t$ is a measurable function of the past $(p_s, a_s, z_s)_{s=1,\dots,t-1}$. In the full information case, $p_t$ is a measurable function of $(p_s, a_s, z_s)_{s=1,\dots,t-1}$. In the semi-bandit case, $p_t$ is a measurable function of $(p_s, a_s, (a_{s,i} z_{s,i})_{i=1,\dots,d})_{s=1,\dots,t-1}$, and in the bandit problem it is a measurable function of $(p_s, a_s, a_s^T z_s)_{s=1,\dots,t-1}$.

[Footnote 1: In the full information version, it is straightforward to obtain upper bounds for the stronger notion of regret $\mathbb{E}\left[ \sum_{t=1}^n a_t^T z_t - \min_{a \in \mathcal{A}} \sum_{t=1}^n a^T z_t \right]$, which is always at least as large as $R_n$. However, for partial information games, this requires more work. In this paper we only consider $R_n$ as a measure of the regret.]

[Footnote 2: Note that since all actions have the same size, i.e., $\|a\|_1 = m$ for all $a \in \mathcal{A}$, one can reduce the case of $\mathcal{Z} = [\alpha, \beta]^d$ to $\mathcal{Z} = [0,1]^d$ via a simple renormalization.]

Parameters: set of actions $\mathcal{A} \subset \{0,1\}^d$; number of rounds $n \in \mathbb{N}$.
For each round $t = 1, 2, \dots, n$:
(1) the player chooses a probability distribution $p_t$ over $\mathcal{A}$ and draws a random action $a_t \in \mathcal{A}$ according to $p_t$;
(2) simultaneously, the adversary selects a loss vector $z_t \in [0,1]^d$ without revealing it;
(3) the player incurs the loss $a_t^T z_t$. She observes
- the loss vector $z_t$ in the full information setting,
- the coordinates $(z_{t,i} a_{t,i})_{i=1,\dots,d}$ in the semi-bandit setting,
- the instantaneous loss $a_t^T z_t$ in the bandit setting.
Goal: the player tries to minimize her cumulative loss $\sum_{t=1}^n a_t^T z_t$.

Figure 1: Online combinatorial optimization.

1.1 Motivating examples.

Many problems can be tackled under the online combinatorial optimization framework. We give here three simple examples.

m-sets. In this example we consider the set $\mathcal{A}$ of all $\binom{d}{m}$ Boolean vectors in dimension $d$ with exactly $m$ ones. In other words, at every time step, the player selects $m$ actions out of $d$ possibilities. When $m = 1$, the semi-bandit and bandit versions coincide and correspond to the standard adversarial multi-armed bandit problem.

Online shortest path problem. Consider a communication network represented by a graph in which one has to send a sequence of packets from one fixed vertex to another. For each packet one chooses a path through the graph and suffers a certain delay, which is the sum of the delays on the edges of the path. Depending on the traffic, the delays on the edges may change, and, at the end of each round, according to the assumed level of feedback, the player observes either the delays of all edges, the delays of each edge on the chosen path, or only the total delay of the chosen path. The player's objective is to minimize the total delay for the sequence of packets. One can represent the set of valid paths from the starting vertex to the end vertex as a set $\mathcal{A} \subset \{0,1\}^d$, where $d$ is the number of edges. If at time $t$, $z_t \in [0,1]^d$ is the vector of delays on the edges, then the delay of a path $a \in \mathcal{A}$ is $z_t^T a$. Thus this problem is an instance of online combinatorial optimization in dimension $d$, where $d$ is the number of edges in the graph. In this paper we assume, for simplicity, that all valid paths have the same length $m$.

Ranking. Consider the problem of selecting a ranking of $m$ items out of $M$ possible items.
For example, a website could have a set of $M$ ads, and it has to select a ranked list of $m$ of these ads to appear on the webpage. One can rephrase this problem as selecting a matching of size $m$ in the complete bipartite graph $K_{m,M}$, with $d = mM$ edges. In the online learning version of this problem, each day the website chooses one such list, and gains one dollar for each click on the ads. This problem can easily be formulated as an online combinatorial optimization problem.

Our theory applies to many more examples, such as spanning trees (which can be useful in certain communication problems) or $m$-intervals.
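To make the combinatorial setting concrete, here is a minimal sketch (not from the paper; the helper names `m_sets` and `regret` are our own) of the $m$-sets example: actions are binary vectors with exactly $m$ ones, losses are linear, and the regret compares the incurred loss to the best fixed action in hindsight.

```python
import itertools
import numpy as np

def m_sets(d, m):
    """All binary vectors in {0,1}^d with exactly m ones (the m-sets action set)."""
    actions = []
    for ones in itertools.combinations(range(d), m):
        a = np.zeros(d, dtype=int)
        a[list(ones)] = 1
        actions.append(a)
    return np.array(actions)

def regret(actions, losses, played):
    """Realized regret: cumulative loss of the played actions minus the
    cumulative loss of the best fixed action in hindsight."""
    incurred = sum(a @ z for a, z in zip(played, losses))
    best = min(actions @ losses.sum(axis=0))
    return incurred - best

# Toy run: d = 4, m = 2, n = 3 rounds, losses in [0,1]^d.
A = m_sets(4, 2)             # 6 actions
Z = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 0.0]])
played = [A[0], A[0], A[0]]  # A[0] selects coordinates {0, 1}
print(regret(A, Z, played))  # 3.0: best action in hindsight picks {0, 3} for loss 0
```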
1.2 Previous work.

Full information. The full-information setting is now fairly well understood, and an optimal regret bound in terms of $m$, $d$, $n$ was obtained by Koolen, Warmuth, and Kivinen [26]. Previous papers under full information feedback also include Gentile and Warmuth [14], Kivinen and Warmuth [25], Grove, Littlestone, and Schuurmans [15], Takimoto and Warmuth [34], Kalai and Vempala [22], Warmuth and Kuzmin [36], Herbster and Warmuth [19], and Hazan, Kale, and Warmuth [18].

Semi-bandit. The first paper on the adversarial multi-armed bandit problem (i.e., the special case of $m$-sets with $m = 1$) is by Auer, Cesa-Bianchi, Freund, and Schapire [4], who derived a regret bound of order $\sqrt{dn \log d}$. This result was improved to $\sqrt{dn}$ by Audibert and Bubeck [2, 3]. György, Linder, Lugosi, and Ottucsák [16] consider the online shortest path problem and derive regret bounds which are suboptimal in terms of the dependency on $m$ and $d$. Uchiya, Nakamura, and Kudo [35] (respectively, Kale, Reyzin, and Schapire [23]) derived regret bounds that are optimal up to logarithmic factors for the case of $m$-sets (respectively, for the problem of ranking selection).

Bandit. McMahan and Blum [27] and Awerbuch and Kleinberg [5] were the first to consider this setting, and obtained regret bounds with a suboptimal dependence on $n$. The first paper with the optimal dependency on $n$ was by Dani, Hayes, and Kakade [12]. The dependency on $m$ and $d$ was then improved in various ways by Abernethy, Hazan, and Rakhlin [1], Cesa-Bianchi and Lugosi [11], and Bubeck, Cesa-Bianchi, and Kakade [9]. We discuss these bounds in detail in Section 4. In particular, we argue that the optimal regret bound in terms of $d$ and $m$ is still an open problem. We also refer the interested reader to the recent survey [8] for an overview of bandit problems in various other settings.

1.3 Contribution and contents of the paper.

In this paper we are primarily interested in the optimal minimax regret in terms of $m$, $d$ and $n$.
More precisely, our aim is to determine the order of magnitude of the following quantity. For a given feedback assumption, write $\sup$ for the supremum over all adversaries and $\inf$ for the infimum over all allowed strategies for the player under the feedback assumption (recall the definitions of adversary and player from the introduction). Then we are interested in
$$\max_{\mathcal{A} \subset \{0,1\}^d :\ \|a\|_1 = m\ \forall a \in \mathcal{A}}\ \inf\ \sup\ R_n.$$

Our contribution to the study of this quantity is threefold. First, we unify the algorithms used in Abernethy, Hazan, and Rakhlin [1], Koolen, Warmuth, and Kivinen [26], Uchiya, Nakamura, and Kudo [35], and Kale, Reyzin, and Schapire [23] under the umbrella of mirror descent. The idea of mirror descent goes back to Nemirovski [28] and Nemirovski and Yudin [29]. A somewhat similar concept was re-discovered in online learning by Herbster and Warmuth [20], Grove, Littlestone, and Schuurmans [15], and Kivinen and Warmuth [25] under the name of potential-based gradient descent, see [10, Chapter 11]. Recently, these ideas have been flourishing, see for instance Shalev-Shwartz [33], Rakhlin [30], Hazan [17], and Bubeck [7]. Our main theorem (Theorem 2) allows one to recover almost all known regret bounds for online combinatorial optimization.

This first contribution leads to our second main result, the improvement of the known upper bounds for the semi-bandit game. In particular, we propose a different proof of the minimax regret bound of order $\sqrt{nd}$ in the standard $d$-armed bandit game that is much simpler than the one provided in Audibert and Bubeck [3], and which also improves the constant factor. In addition to these upper bounds we prove two new lower bounds. First we answer a question of Koolen, Warmuth, and Kivinen [26] by showing that the exponentially weighted average forecaster is provably suboptimal for online combinatorial optimization. Our second lower bound is a minimax lower bound in the bandit setting which improves known results by an order of magnitude. A summary of known bounds and the new bounds proved in this paper can be found in Table 1.

              Full Information         Semi-Bandit    Bandit
Lower Bound   $m\sqrt{n \log(d/m)}$    $\sqrt{mdn}$   $m\sqrt{dn}$
Upper Bound   $m\sqrt{n \log(d/m)}$    $\sqrt{mdn}$   $m^{3/2}\sqrt{dn \log(d/m)}$

Table 1: Bounds on the minimax regret up to constant factors. The new results are set in boldface. In this paper we also show that EXP2 in the full information case has a regret bounded below by $d^{3/2}\sqrt{n}$ when $m$ is of order $d$.

The paper is organized as follows. In Section 2 we introduce the two algorithms discussed in this paper. In particular, in Section 2.1 we discuss the popular exponentially weighted average forecaster and we show that it is a provably suboptimal strategy. Then, in Section 2.2, we describe our main algorithm, OSMD (Online Stochastic Mirror Descent), and prove a general regret bound in terms of the Bregman divergence of the Fenchel-Legendre dual of the Legendre function defining the strategy. In Section 3 we derive upper bounds for the regret in the semi-bandit case for OSMD with appropriately chosen Legendre functions.
Finally, in Section 4 we prove a new lower bound for the bandit setting, and we formulate a conjecture on the correct order of magnitude of the regret for that problem, based on this new result and the regret bounds obtained in [1, 9].

2 Algorithms.

In this section we discuss two classes of algorithms that have been proposed for online combinatorial optimization.

2.1 Expanded Exponential weights (EXP2).

The simplest approach to online combinatorial optimization is to consider each action of $\mathcal{A}$ as an independent expert, and then apply a generic regret minimizing strategy. Perhaps the most popular such strategy is the exponentially weighted average forecaster, see, e.g., [10]. This strategy is sometimes called Hedge, see Freund and Schapire [13]. We call the resulting strategy for the online combinatorial optimization problem EXP2, see Figure 2. In the full information setting, EXP2 corresponds to Expanded Hedge, as defined in Koolen, Warmuth, and Kivinen [26]. In the semi-bandit case, EXP2 was studied by György, Linder, Lugosi, and Ottucsák [16], while in the bandit case it was studied by Dani, Hayes, and Kakade [12], Cesa-Bianchi and Lugosi [11], and Bubeck, Cesa-Bianchi, and Kakade [9]. Note that in the bandit case, EXP2 is mixed with an exploration distribution, see Section 4 for more details.

Despite strong interest in this strategy, no optimal regret bound has been derived for it in the combinatorial setting. More precisely, the best bound which can be derived from a standard argument (see, for example, [12] or [26]) is of order $m^{3/2}\sqrt{n \log(d/m)}$. On the other hand, in [26] the authors showed that by using Mirror Descent (see the next section) with the negative entropy, one obtains a regret bounded by $m\sqrt{n \log(d/m)}$. Furthermore, this latter bound is clearly optimal up to a numerical constant, as one can see from the standard lower bound in prediction with expert advice (consider the set $\mathcal{A}$ that corresponds to playing $m$ expert problems in parallel, with $d/m$ experts in each problem). In [26] the authors leave as an open question the problem of whether it is possible to improve the bound for EXP2 to obtain the optimal order of magnitude. The following theorem shows that this is impossible, and that in fact EXP2 is a provably suboptimal strategy.

Theorem 1. Let $n \geq d$. There exists a subset $\mathcal{A} \subset \{0,1\}^d$ such that, in the full information setting, the regret of the EXP2 strategy, for any learning rate $\eta$, satisfies
$$\sup_{\text{adversary}} R_n \geq 0.01\, d^{3/2} \sqrt{n}.$$
The proof is deferred to the Appendix.

2.2 Online Stochastic Mirror Descent.

In this section we describe the main algorithm studied in this paper. We call it Online Stochastic Mirror Descent (OSMD). Each term in this name refers to a part of the algorithm. Mirror Descent originates in the work of Nemirovski and Yudin [29].
The idea of mirror descent is to perform a gradient descent where the update with the gradient is performed in the dual space, defined by some Legendre function $F$, rather than in the primal (see below for a precise formulation). The Stochastic part takes its origin from Robbins and Monro [31] and from Kiefer and Wolfowitz [24]. The key idea is that it is enough to observe an unbiased estimate of the gradient, rather than the true gradient, in order to perform a gradient descent. Finally, the Online part comes from Zinkevich [37], who derived the Online Gradient Descent (OGD) algorithm, a version of gradient descent tailored to online optimization.

To properly describe the OSMD strategy, we recall a few concepts from convex analysis, see Hiriart-Urruty and Lemaréchal [21] for a thorough treatment of this subject. Let $\mathcal{D} \subset \mathbb{R}^d$ be an open convex set, and $\bar{\mathcal{D}}$ the closure of $\mathcal{D}$.

Definition 1. We call Legendre any continuous function $F : \bar{\mathcal{D}} \to \mathbb{R}$ such that
(i) $F$ is strictly convex and continuously differentiable on $\mathcal{D}$,
(ii) $\lim_{x \to \bar{\mathcal{D}} \setminus \mathcal{D}} \|\nabla F(x)\| = +\infty$.
[Footnote 3: By the equivalence of norms in $\mathbb{R}^d$, this definition does not depend on the choice of the norm.]

The Bregman divergence $D_F : \bar{\mathcal{D}} \times \mathcal{D} \to \mathbb{R}$ associated to a Legendre function $F$ is defined by
$$D_F(x, y) = F(x) - F(y) - (x - y)^T \nabla F(y).$$
Moreover, we say that $\mathcal{D}^* = \nabla F(\mathcal{D})$ is the dual space of $\mathcal{D}$ under $F$. We also denote by $F^*$ the Legendre-Fenchel transform of $F$, defined by
$$F^*(u) = \sup_{x \in \mathcal{D}} \left( x^T u - F(x) \right).$$

Lemma 1. Let $F$ be a Legendre function. Then $F^{**} = F$ and $\nabla F^* = (\nabla F)^{-1}$ on the set $\mathcal{D}^*$. Moreover, for all $x, y \in \mathcal{D}$,
$$D_F(x, y) = D_{F^*}\left( \nabla F(y), \nabla F(x) \right). \tag{1}$$

EXP2:
Parameter: learning rate $\eta$.
Let $p_1 = (1/|\mathcal{A}|, \dots, 1/|\mathcal{A}|) \in \mathbb{R}^{|\mathcal{A}|}$.
For each round $t = 1, 2, \dots, n$:
(a) Play $a_t \sim p_t$ and observe
- the loss vector $z_t$ in the full information game,
- the coordinates $z_{t,i} \mathbb{1}_{a_{t,i}=1}$ in the semi-bandit game,
- the instantaneous loss $a_t^T z_t$ in the bandit game.
(b) Estimate the loss vector $z_t$ by $\tilde{z}_t$. For instance, one may take
- $\tilde{z}_t = z_t$ in the full information game,
- $\tilde{z}_{t,i} = \dfrac{z_{t,i}\, a_{t,i}}{\sum_{a \in \mathcal{A} : a_i = 1} p_t(a)}$ in the semi-bandit game,
- $\tilde{z}_t = P_t^{+} a_t a_t^T z_t$, with $P_t = \mathbb{E}_{a \sim p_t}\left[ a a^T \right]$, in the bandit game.
(c) Update the probabilities: for all $a \in \mathcal{A}$,
$$p_{t+1}(a) = \frac{\exp\left( -\eta\, a^T \tilde{z}_t \right) p_t(a)}{\sum_{b \in \mathcal{A}} \exp\left( -\eta\, b^T \tilde{z}_t \right) p_t(b)}.$$

Figure 2: The EXP2 strategy. The notation $\mathbb{E}_{a \sim p_t}$ denotes the expected value with respect to the random choice of $a$ when it is distributed according to $p_t$.
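In the full information case, the EXP2 update of Figure 2 is just a multiplicative-weights update over the expanded action set. The following is a minimal sketch (function and variable names are our own), kept deliberately naive: it enumerates all of $\mathcal{A}$ explicitly, which is only feasible for small action sets.

```python
import numpy as np

def exp2_full_info(actions, losses, eta):
    """EXP2 in the full information setting: exponentially weighted average
    forecaster over the expanded action set A (each action is one expert).
    Returns the sequence of distributions p_1, ..., p_{n+1} over A."""
    N = len(actions)
    p = np.full(N, 1.0 / N)           # p_1 is uniform over A
    dists = [p.copy()]
    for z in losses:                  # full information: z_tilde = z
        w = p * np.exp(-eta * actions @ z)
        p = w / w.sum()               # multiplicative update, renormalized
        dists.append(p.copy())
    return dists

# Toy run: 3 actions in {0,1}^2, two rounds of losses.
A = np.array([[1, 0], [0, 1], [1, 1]])
Z = np.array([[1.0, 0.0], [1.0, 0.0]])
dists = exp2_full_info(A, Z, eta=1.0)
# The mass concentrates on the action (0, 1), which has zero cumulative loss.
print(dists[-1].argmax())  # 1
```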
The lemma above is the key to understanding how a Legendre function acts on the space. The gradient $\nabla F$ maps $\mathcal{D}$ to the dual space $\mathcal{D}^*$, and $\nabla F^*$ is the inverse mapping, from the dual space back to the original (primal) space. Moreover, (1) shows that the Bregman divergence in the primal space corresponds exactly to the Bregman divergence of the Legendre-Fenchel transform in the dual space. A proof of this result can be found, for example, in [10, Chapter 11].

We now have all the ingredients to describe the OSMD strategy, see Figure 3 for the precise formulation. Note that step (d) is well defined if the following consistency condition is satisfied:
$$\nabla F(x) - \eta \tilde{z}_t \in \mathcal{D}^*, \qquad \forall x \in \mathrm{Conv}(\mathcal{A}) \cap \mathcal{D}. \tag{2}$$

In the full information setting, algorithms of this type were studied by Abernethy, Hazan, and Rakhlin [1], Rakhlin [30], and Hazan [17]. In these papers the authors adopted the presentation suggested by Beck and Teboulle [6], which corresponds to a Follow-the-Regularized-Leader (FTRL) type strategy. There the focus was on $F$ being strongly convex with respect to some norm. Moreover, in [1] the authors also consider the bandit case, and switch to $F$ being a self-concordant barrier for the convex hull of $\mathcal{A}$ (see Section 4 for more details). Another line of work studied this type of algorithm with $F$ being the negative entropy, see Koolen, Warmuth, and Kivinen [26] for the full information case, and Uchiya, Nakamura, and Kudo [35] and Kale, Reyzin, and Schapire [23] for specific instances of the semi-bandit case. All these results are unified and described in detail in Bubeck [7]. In this paper we consider a new type of Legendre function $F$, inspired by Audibert and Bubeck [3], see Section 3.

Regarding computational complexity, OSMD is efficient as soon as the polytope $\mathrm{Conv}(\mathcal{A})$ can be described by a number of constraints that is polynomial in $d$. Indeed, in that case steps (a)-(b) can be performed efficiently jointly (one can get an algorithm by looking at the proof of Carathéodory's theorem), and step (d) is a convex program with a polynomial number of constraints.
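The duality of Lemma 1 can be sanity-checked numerically (this check is ours, not from the paper). With the negative entropy $F(x) = \sum_i x_i \log x_i - x_i$ on $\mathcal{D} = (0,\infty)^d$ one has $\nabla F(x) = \log x$, $F^*(u) = \sum_i e^{u_i}$ and $\nabla F^*(u) = e^u = (\nabla F)^{-1}(u)$, and identity (1) can be verified on random points:

```python
import numpy as np

# Negative entropy F(x) = sum_i x_i log x_i - x_i, with dual F*(u) = sum_i exp(u_i).
# We check Lemma 1:  D_F(x, y) = D_{F*}(grad F(y), grad F(x)).

def F(x):      return np.sum(x * np.log(x) - x)
def gradF(x):  return np.log(x)
def Fstar(u):  return np.sum(np.exp(u))
def gradFs(u): return np.exp(u)

def bregman(f, gradf, x, y):
    """Bregman divergence D_f(x, y) = f(x) - f(y) - (x - y)^T grad f(y)."""
    return f(x) - f(y) - gradf(y) @ (x - y)

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 2.0, size=3)
y = rng.uniform(0.1, 2.0, size=3)
lhs = bregman(F, gradF, x, y)                       # primal divergence
rhs = bregman(Fstar, gradFs, gradF(y), gradF(x))    # dual divergence, arguments swapped
print(np.isclose(lhs, rhs))  # True
```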
In many interesting examples (such as $m$-sets, selection of rankings, spanning trees, and paths in acyclic graphs) one can describe the convex hull of $\mathcal{A}$ by a polynomial number of constraints, see Schrijver [32]. On the other hand, there also exist important examples where this is not the case, such as paths on general graphs. Also note that for some specific examples it is possible to implement OSMD with improved computational complexity, see Koolen, Warmuth, and Kivinen [26].

In this paper we restrict our attention to the combinatorial learning setting in which $\mathcal{A}$ is a subset of $\{0,1\}^d$ and the loss is linear. However, one should note that this specific form of $\mathcal{A}$ plays no role in the definition of OSMD. Moreover, if the loss is not linear, then one can modify OSMD by performing a gradient update with a gradient of the loss rather than the loss vector $z_t$. See Bubeck [7] for more details on this approach.

The following result is at the basis of our improved regret bounds for OSMD in the semi-bandit setting, see Section 3.

Theorem 2. Suppose that (2) is satisfied and that the loss estimates are unbiased, in the sense that $\mathbb{E}_{a_t \sim p_t} \tilde{z}_t = z_t$. Then the regret of the OSMD strategy satisfies
$$R_n \leq \sup_{a \in \mathcal{A}} \frac{F(a) - F(x_1)}{\eta} + \frac{1}{\eta} \sum_{t=1}^n \mathbb{E}\, D_{F^*}\!\left( \nabla F(x_t) - \eta \tilde{z}_t,\ \nabla F(x_t) \right).$$
OSMD:
Parameters: learning rate $\eta > 0$; Legendre function $F$ defined on $\bar{\mathcal{D}} \supset \mathrm{Conv}(\mathcal{A})$.
Let $x_1 \in \mathrm{argmin}_{x \in \mathrm{Conv}(\mathcal{A})} F(x)$.
For each round $t = 1, 2, \dots, n$:
(a) Let $p_t$ be a distribution on the set $\mathcal{A}$ such that $x_t = \mathbb{E}_{a \sim p_t}\, a$.
(b) Draw a random action $a_t$ according to the distribution $p_t$ and observe the feedback.
(c) Based on the observed feedback, estimate the loss vector $z_t$ by $\tilde{z}_t$.
(d) Let $w_{t+1} \in \mathcal{D}$ satisfy
$$\nabla F(w_{t+1}) = \nabla F(x_t) - \eta \tilde{z}_t. \tag{3}$$
(e) Project the weight vector $w_{t+1}$ defined by (3) onto the convex hull of $\mathcal{A}$:
$$x_{t+1} \in \mathrm{argmin}_{x \in \mathrm{Conv}(\mathcal{A})} D_F(x, w_{t+1}). \tag{4}$$

Figure 3: Online Stochastic Mirror Descent (OSMD).

Proof. Let $a \in \mathcal{A}$. Using that $a_t$ and $\tilde{z}_t$ are unbiased estimates of $x_t$ and $z_t$, we have
$$\mathbb{E}\,(a_t - a)^T z_t = \mathbb{E}\,(x_t - a)^T \tilde{z}_t.$$
Using (3) and applying the definition of the Bregman divergence, one obtains
$$\eta\, \tilde{z}_t^T (x_t - a) = (a - x_t)^T \left( \nabla F(w_{t+1}) - \nabla F(x_t) \right) = D_F(a, x_t) + D_F(x_t, w_{t+1}) - D_F(a, w_{t+1}).$$
By the Pythagorean theorem for Bregman divergences (see, e.g., Lemma 11.3 of [10]), we have $D_F(a, w_{t+1}) \geq D_F(a, x_{t+1}) + D_F(x_{t+1}, w_{t+1})$, hence
$$\eta\, \tilde{z}_t^T (x_t - a) \leq D_F(a, x_t) + D_F(x_t, w_{t+1}) - D_F(a, x_{t+1}) - D_F(x_{t+1}, w_{t+1}).$$
Summing over $t$ gives
$$\eta \sum_{t=1}^n \tilde{z}_t^T (x_t - a) \leq D_F(a, x_1) - D_F(a, x_{n+1}) + \sum_{t=1}^n \left( D_F(x_t, w_{t+1}) - D_F(x_{t+1}, w_{t+1}) \right).$$
By the nonnegativity of the Bregman divergences, we get
$$\eta \sum_{t=1}^n \tilde{z}_t^T (x_t - a) \leq D_F(a, x_1) + \sum_{t=1}^n D_F(x_t, w_{t+1}).$$
From (1), one has $D_F(x_t, w_{t+1}) = D_{F^*}\left( \nabla F(x_t) - \eta \tilde{z}_t, \nabla F(x_t) \right)$. Moreover, by writing the first-order optimality condition for $x_1$, one directly obtains $D_F(a, x_1) \leq F(a) - F(x_1)$, which concludes the proof.

Note that, if $F$ admits a Hessian, denoted $\nabla^2 F$, that is always invertible, then one can prove that, up to a third-order term in $\tilde{z}_t$, the regret bound can be written as
$$R_n \lesssim \sup_{a \in \mathcal{A}} \frac{F(a) - F(x_1)}{\eta} + \frac{\eta}{2} \sum_{t=1}^n \mathbb{E}\, \tilde{z}_t^T \left( \nabla^2 F(x_t) \right)^{-1} \tilde{z}_t. \tag{5}$$
The main technical difficulty is to control the third-order error term in this inequality.

3 Semi-bandit feedback.

In this section we consider online combinatorial optimization with semi-bandit feedback. As we already discussed, in the full information case Koolen, Warmuth, and Kivinen [26] proved that OSMD with the negative entropy is a minimax optimal strategy. We first prove a regret bound when one uses this strategy with the following estimate for the loss vector:
$$\tilde{z}_{t,i} = \frac{z_{t,i}\, a_{t,i}}{x_{t,i}}. \tag{6}$$
Note that this is a valid estimate, since it makes use only of $(z_{t,1} a_{t,1}, \dots, z_{t,d} a_{t,d})$. Moreover, it is unbiased with respect to the random draw of $a_t$ from $p_t$, since, by definition, $\mathbb{E}_{a_t \sim p_t}\, a_t = x_t$. In other words, $\mathbb{E}_{a_t \sim p_t}\, \tilde{z}_t = z_t$.

Theorem 3. The regret of OSMD with $F(x) = \sum_{i=1}^d x_i \log x_i - \sum_{i=1}^d x_i$, $\mathcal{D} = (0, +\infty)^d$, and any non-negative unbiased loss estimate $\tilde{z}_t \geq 0$ satisfies
$$R_n \leq \frac{m \log(d/m)}{\eta} + \frac{\eta}{2} \sum_{t=1}^n \sum_{i=1}^d \mathbb{E}\, x_{t,i}\, \tilde{z}_{t,i}^2.$$
In particular, with the estimate (6) and $\eta = \sqrt{\frac{2 m \log(d/m)}{n d}}$,
$$R_n \leq \sqrt{2 m d n \log(d/m)}.$$
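Before turning to the proof, here is a minimal runnable sketch of this strategy (our own illustration, not the paper's implementation), specialized to $m = 1$, i.e., $\mathcal{A}$ is the canonical basis and $\mathrm{Conv}(\mathcal{A})$ is the simplex. In that case the Bregman projection (4) under the negative entropy reduces to a plain renormalization; for general $m$ one would need a genuine Bregman projection onto $\mathrm{Conv}(\mathcal{A})$, which we skip here.

```python
import numpy as np

def osmd_neg_entropy_bandit(d, n, eta, loss_fn, rng):
    """OSMD with the negative entropy and the estimate (6), specialized to
    m = 1 (the d-armed bandit): p_t = x_t, and the projection step is a
    renormalization on the simplex.  Sketch only."""
    x = np.full(d, 1.0 / d)              # x_1 = argmin F over the simplex (uniform)
    total_loss = 0.0
    for t in range(n):
        i = rng.choice(d, p=x)           # draw a_t ~ p_t
        z = loss_fn(t)                   # adversary's loss vector in [0,1]^d
        total_loss += z[i]
        z_tilde = np.zeros(d)
        z_tilde[i] = z[i] / x[i]         # estimate (6): unbiased, E[z_tilde] = z
        w = x * np.exp(-eta * z_tilde)   # dual update (3): grad F(w) = grad F(x) - eta z_tilde
        x = w / w.sum()                  # projection (4) on the simplex
    return total_loss

rng = np.random.default_rng(42)
d, n = 5, 2000
eta = np.sqrt(2 * np.log(d) / (n * d))   # Theorem 3's eta with m = 1
# Stochastic adversary (an assumption for the demo): arm 0 is slightly better.
means = np.array([0.3, 0.5, 0.5, 0.5, 0.5])
loss = osmd_neg_entropy_bandit(d, n, eta,
                               lambda t: rng.binomial(1, means).astype(float), rng)
print(loss < n * 0.5)  # the learner beats a uniformly random arm
```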
Proof. One can easily see that for the negative entropy the dual space is $\mathcal{D}^* = \mathbb{R}^d$. Thus (2) is verified and OSMD is well defined. Moreover, again by straightforward computations, one can see that
$$D_{F^*}\left( \nabla F(x), \nabla F(y) \right) = \sum_{i=1}^d y_i\, \Theta\!\left( \nabla_i F(x) - \nabla_i F(y) \right), \tag{7}$$
where $\Theta(x) = \exp(x) - 1 - x$. Thus, using Theorem 2 and the facts that $\Theta(x) \leq \frac{x^2}{2}$ for $x \leq 0$ and $\sum_{i=1}^d x_{t,i} = m$, one obtains
$$R_n \leq \sup_{a \in \mathcal{A}} \frac{F(a) - F(x_1)}{\eta} + \frac{1}{\eta} \sum_{t=1}^n \mathbb{E}\, D_{F^*}\left( \nabla F(x_t) - \eta \tilde{z}_t, \nabla F(x_t) \right) \leq \sup_{a \in \mathcal{A}} \frac{F(a) - F(x_1)}{\eta} + \frac{\eta}{2} \sum_{t=1}^n \sum_{i=1}^d \mathbb{E}\, x_{t,i}\, \tilde{z}_{t,i}^2.$$
The proof of the first inequality is concluded by noting that, by Jensen's inequality,
$$F(a) - F(x_1) = \sum_{i=1}^d x_{1,i} \log \frac{1}{x_{1,i}} = m \sum_{i=1}^d \frac{x_{1,i}}{m} \log \frac{1}{x_{1,i}} \leq m \log\left( \sum_{i=1}^d \frac{x_{1,i}}{m} \cdot \frac{1}{x_{1,i}} \right) = m \log \frac{d}{m}.$$
The second inequality follows from
$$\mathbb{E}\, x_{t,i}\, \tilde{z}_{t,i}^2 \leq \mathbb{E}\, \frac{a_{t,i}}{x_{t,i}} = 1.$$

Using the standard $\sqrt{dn}$ lower bound for the multi-armed bandit (which corresponds to the case where $\mathcal{A}$ is the canonical basis), see, e.g., [Theorem 30, [3]], one can directly obtain a lower bound of order $\sqrt{mdn}$ for our setting. Thus the upper bound derived in Theorem 3 has an extraneous logarithmic factor compared to the lower bound. This phenomenon already appeared in the basic multi-armed bandit setting. In that case, the extra logarithmic factor was removed in Audibert and Bubeck [2] by resorting to a new class of strategies for the expert problem, called INF (Implicitly Normalized Forecaster). Next we generalize this class of algorithms to the combinatorial setting, and thus remove the extra logarithmic factor.

First we introduce the notion of a potential and the associated Legendre function.

Definition 2. Let $\omega \geq 0$. A function $\psi : (-\infty, a) \to \mathbb{R}_+^*$ (for some $a \in \mathbb{R} \cup \{+\infty\}$) is called an $\omega$-potential if it is convex, continuously differentiable, and satisfies
$$\lim_{x \to -\infty} \psi(x) = \omega, \qquad \lim_{x \to a} \psi(x) = +\infty, \qquad \psi' > 0, \qquad \int_\omega^{\omega+1} |\psi^{-1}(s)|\, ds < +\infty.$$
To every potential $\psi$ we associate the function $F_\psi$ defined on $\mathcal{D} = (\omega, +\infty)^d$ by
$$F_\psi(x) = \sum_{i=1}^d \int_\omega^{x_i} \psi^{-1}(s)\, ds.$$
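The construction of $F_\psi$ in Definition 2 can be sanity-checked numerically (our own check, not from the paper): with $\psi = \exp$, one has $\psi^{-1} = \log$ and $\int_0^x \log s\, ds = x \log x - x$, the per-coordinate negative entropy, as claimed below.

```python
import math

# Definition 2 builds F_psi(x) = sum_i integral_0^{x_i} psi^{-1}(s) ds.
# With psi = exp (so psi^{-1} = log), each summand should equal x log x - x.

def F_psi_coord(x, steps=200000):
    """Midpoint-rule approximation of the integral of log over (0, x];
    log is integrable at 0, so the improper integral converges."""
    h = x / steps
    return sum(math.log((k + 0.5) * h) * h for k in range(steps))

x = 1.7
numeric = F_psi_coord(x)
closed = x * math.log(x) - x
print(abs(numeric - closed) < 1e-4)  # True
```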
In this paper we restrict our attention to $0$-potentials, which we simply call potentials. A non-zero value of $\omega$ may be used to derive regret bounds that hold with high probability, instead of pseudo-regret bounds, see Footnote 1. The first-order optimality condition for (4) implies that OSMD with $F_\psi$ is a direct generalization of INF with potential $\psi$, in the sense that the two algorithms coincide when $\mathcal{A}$ is the canonical basis. Note, in particular, that with $\psi(x) = \exp(x)$ we recover the negative entropy for $F_\psi$. In [3], the choice of $\psi(x) = (-x)^{-q}$ with $q > 1$ was recommended. We show in Theorem 4 that here, again, this choice gives a minimax optimal strategy.

Lemma 2. Let $\psi$ be a potential. Then $F = F_\psi$ is Legendre, and for all $u, v \in \mathcal{D}^* = (-\infty, a)^d$ such that $u_i \leq v_i$ for all $i \in \{1, \dots, d\}$,
$$D_{F^*}(u, v) \leq \frac{1}{2} \sum_{i=1}^d \psi'(v_i) (u_i - v_i)^2.$$

Proof. A direct examination shows that $F = F_\psi$ is a Legendre function. Moreover, since $\nabla F^*(u) = (\nabla F)^{-1}(u) = (\psi(u_1), \dots, \psi(u_d))$, we obtain
$$D_{F^*}(u, v) = \sum_{i=1}^d \left( \int_{v_i}^{u_i} \psi(s)\, ds - (u_i - v_i)\, \psi(v_i) \right).$$
From a Taylor expansion, we get
$$D_{F^*}(u, v) \leq \sum_{i=1}^d \frac{1}{2} \max_{s \in [u_i, v_i]} \psi'(s)\, (u_i - v_i)^2.$$
Since the function $\psi$ is convex and $u_i \leq v_i$, we have
$$\max_{s \in [u_i, v_i]} \psi'(s) \leq \psi'\left( \max(u_i, v_i) \right) = \psi'(v_i),$$
which gives the desired result.

Theorem 4. Let $\psi$ be a potential. The regret of OSMD with $F = F_\psi$ and any non-negative unbiased loss estimate $\tilde{z}_t$ satisfies
$$R_n \leq \sup_{a \in \mathcal{A}} \frac{F(a) - F(x_1)}{\eta} + \frac{\eta}{2} \sum_{t=1}^n \sum_{i=1}^d \mathbb{E}\, \frac{\tilde{z}_{t,i}^2}{(\psi^{-1})'(x_{t,i})}.$$
In particular, with the estimate (6), $\psi(x) = (-x)^{-q}$ with $q > 1$, and $\eta = \sqrt{\frac{2}{q-1}\, \frac{m^{1-2/q}}{d^{1-2/q}\, n}}$, we get
$$R_n \leq q \sqrt{\frac{2}{q-1}\, m d n}.$$
With $q = 2$ this gives
$$R_n \leq 2\sqrt{2 m d n}.$$
In the case $m = 1$, the above theorem improves on the bound $R_n \leq 8\sqrt{nd}$ obtained in Theorem 11 of [3].

Proof. First note that, since $\mathcal{D}^* = (-\infty, a)^d$ and $\tilde{z}_t$ has non-negative coordinates, OSMD is well defined, that is, (2) is satisfied. The first inequality follows from Theorem 2, Lemma 2, and the fact that $\psi'(\psi^{-1}(x)) = \frac{1}{(\psi^{-1})'(x)}$.

Let $\psi(x) = (-x)^{-q}$. Then $\psi^{-1}(x) = -x^{-1/q}$ and $F(x) = -\frac{q}{q-1} \sum_{i=1}^d x_i^{1-1/q}$. In particular, note that by Hölder's inequality, since $\sum_{i=1}^d x_{1,i} = m$,
$$F(a) - F(x_1) \leq \frac{q}{q-1} \sum_{i=1}^d x_{1,i}^{1-1/q} \leq \frac{q}{q-1}\, m^{1-1/q}\, d^{1/q}.$$
Moreover, note that $(\psi^{-1})'(x) = \frac{1}{q} x^{-1-1/q}$, and
$$\sum_{i=1}^d \mathbb{E}\, \frac{\tilde{z}_{t,i}^2}{(\psi^{-1})'(x_{t,i})} \leq \sum_{i=1}^d q\, x_{t,i}^{1/q} \leq q\, m^{1/q}\, d^{1-1/q},$$
which concludes the proof.

4 Bandit feedback.

In this section we consider online combinatorial optimization with bandit feedback. This setting is much more challenging than the semi-bandit case, and in order to obtain sublinear regret bounds all known strategies add an exploration component to the algorithm. For example, in EXP2, instead of playing an action at random according to the exponentially weighted average distribution $p_t$, one draws a random action from $p_t$ with probability $1 - \gamma$ and from some fixed exploration distribution $\mu$ with probability $\gamma$. On the other hand, in OSMD, one randomly perturbs $x_t$ to some $\tilde{x}_t$, and then plays at random a point in $\mathcal{A}$ such that on average one plays $\tilde{x}_t$.

In Bubeck, Cesa-Bianchi, and Kakade [9], the authors study the EXP2 strategy with the exploration distribution $\mu$ supported on the contact points between the polytope $\mathrm{Conv}(\mathcal{A})$ and the John ellipsoid of this polytope (i.e., the ellipsoid of minimal volume enclosing the polytope). Using this method they are able to prove the best known upper bound for online combinatorial optimization with bandit feedback. They show that the regret of EXP2 mixed with John's exploration and with the estimate described in Figure 2 satisfies
$$R_n \leq 2 m^{3/2} \sqrt{3 d n \log \frac{ed}{m}}.$$
Our next theorem shows that no strategy can achieve a regret smaller than a constant times $m\sqrt{dn}$, leaving a gap of a factor of order $\sqrt{m \log(d/m)}$. As we argue below, we conjecture that the lower bound is of the correct order of magnitude.
However, improving the upper bound seems to require substantially new ideas. Note that the following bound gives a limitation that no strategy can surpass, contrary to Theorem 1, which was dedicated to the EXP2 strategy.
Theorem 5. Let $n \geq d \geq 2m$. There exists a subset $\mathcal{A} \subset \{0,1\}^d$ with $\|a\|_1 = m$ for all $a \in \mathcal{A}$ such that, under bandit feedback, one has
$$\inf_{\text{strategies}}\ \sup_{\text{adversaries}}\ R_n \geq 0.02\, m \sqrt{dn}, \tag{8}$$
where the infimum and the supremum are taken over the class of strategies for the player and for the adversary, as defined in the introduction.

Note that it should not come as a surprise that EXP2 with John's exploration is suboptimal, since even in the full information case the basic EXP2 strategy is provably suboptimal, see Theorem 1. We conjecture that the correct order of magnitude for the minimax regret in the bandit case is $m\sqrt{dn}$, as the above lower bound suggests. A promising approach to resolve this conjecture is to consider again the OSMD strategy. However, we believe that in the bandit case one has to consider Legendre functions with a non-diagonal Hessian, contrary to the Legendre functions considered so far in this paper. Abernethy, Hazan, and Rakhlin [1] propose to use a self-concordant barrier function for the polytope $\mathrm{Conv}(\mathcal{A})$. They then randomly perturb the point $x_t$ given by OSMD using the eigenstructure of the Hessian. This approach leads to a regret upper bound of order $md\sqrt{\theta n \log n}$ when $\mathrm{Conv}(\mathcal{A})$ admits a $\theta$-self-concordant barrier function. Unfortunately, even when there exists an $O(1)$-self-concordant barrier, this bound is still larger than the conjectured optimal bound by a factor of order $\sqrt{d}$. In fact, it was proved in [9] that in some cases there exist better choices for the Legendre function and the perturbation than those described in [1], even when there is an $O(1)$-self-concordant function for the action set. How to generalize this approach to the polytopes involved in online combinatorial optimization is a challenging open problem.

A Proof of Theorem 1.

For the sake of simplicity, we assume that $d$ is a multiple of 4 and that $n$ is even. We consider the following subset of the hypercube:
$$\mathcal{A} = \left\{ a \in \{0,1\}^d : \sum_{i=1}^{d/2} a_i = \frac{d}{4} \ \text{and} \ \left( a_i = 1\ \forall i \in \{d/2+1, \dots, d/2+d/4\} \ \text{or} \ a_i = 1\ \forall i \in \{d/2+d/4+1, \dots, d\} \right) \right\}.$$
That is, choosing a point in $\mathcal{A}$ corresponds to choosing a subset of $d/4$ elements among the first half of the coordinates, and choosing one of the two disjoint intervals of size $d/4$ in the second half of the coordinates.

We prove that for any parameter $\eta$ there exists an adversary such that EXP2 with parameter $\eta$ has a regret of at least $\frac{nd}{16} \tanh\left( \frac{\eta d}{8} \right)$, and that there exists another adversary such that its regret is at least $\min\left( \frac{d \log 2}{12 \eta}, \frac{nd}{12} \right)$. As a consequence, we have
$$\sup_{\text{adversary}} R_n \geq \max\left( \frac{nd}{16} \tanh\frac{\eta d}{8},\ \min\left( \frac{d \log 2}{12\eta}, \frac{nd}{12} \right) \right) \geq \min\left( \max\left( \frac{nd}{16} \tanh\frac{\eta d}{8},\ \frac{d \log 2}{12\eta} \right),\ \frac{nd}{12} \right) \geq \min\left( A,\ \frac{nd}{12} \right),$$
with
$$A = \min_{\eta \in [0, +\infty)} \max\left( \frac{nd}{16} \tanh\frac{\eta d}{8},\ \frac{d \log 2}{12 \eta} \right) \geq \min\left( \min_{\eta:\, \eta d \geq 8} \frac{nd}{16} \tanh(1),\ \min_{\eta:\, \eta d < 8} \max\left( \frac{n \eta d^2}{128} \tanh(1),\ \frac{d \log 2}{12 \eta} \right) \right) \geq \min\left( \frac{nd}{16} \tanh(1),\ \sqrt{\frac{n d^3 \log 2 \tanh(1)}{1536}} \right) \geq \min\left( 0.04\, nd,\ 0.01\, d^{3/2} \sqrt{n} \right),$$
where we used the fact that $\tanh$ is concave and increasing on $\mathbb{R}_+$ (so that $\tanh(x) \geq x \tanh(1)$ for $x \leq 1$). As $n \geq d$, this implies the stated lower bound.

First we prove the lower bound $\frac{nd}{16}\tanh\left(\frac{\eta d}{8}\right)$. Define the following adversary:
$$z_{t,i} = \begin{cases} 1 & \text{if } i \in \{d/2+1, \dots, d/2+d/4\} \text{ and } t \text{ odd}, \\ 1 & \text{if } i \in \{d/2+d/4+1, \dots, d\} \text{ and } t \text{ even}, \\ 0 & \text{otherwise.} \end{cases}$$
This adversary always puts a zero loss on the first half of the coordinates, and alternates between a loss of $d/4$ for choosing the first interval in the second half of the coordinates and a loss of $d/4$ for choosing the second interval. At the beginning of odd rounds, any vertex $a \in \mathcal{A}$ has the same cumulative loss, and thus EXP2 picks its action uniformly at random, which yields an expected cumulative loss equal to $nd/16$ over the odd rounds. On the other hand, at even rounds the probability distribution over the vertices is always the same. More precisely, the probability of selecting a vertex which contains the interval $\{d/2+d/4+1, \dots, d\}$ (i.e., the interval with a $d/4$ loss at this round) is exactly $\frac{1}{1+\exp(-\eta d/4)}$. This adds an expected cumulative loss equal to $\frac{nd}{8} \cdot \frac{1}{1+\exp(-\eta d/4)}$. Finally, note that the loss of any fixed vertex is $nd/8$. Thus, we obtain
$$R_n = \frac{nd}{16} + \frac{nd}{8} \cdot \frac{1}{1+\exp(-\eta d/4)} - \frac{nd}{8} = \frac{nd}{16} \tanh\left( \frac{\eta d}{8} \right).$$

It remains to show a lower bound proportional to $1/\eta$. To this end, we consider a different adversary, defined by
$$z_{t,i} = \begin{cases} 1 - \varepsilon & \text{if } i \leq d/4, \\ 1 & \text{if } i \in \{d/4+1, \dots, d/2\}, \\ 0 & \text{otherwise}, \end{cases}$$
for some fixed $\varepsilon > 0$. Note that against this adversary the choice of the interval in the second half of the components does not matter. Moreover, by symmetry, the weight assigned by EXP2 to any coordinate in $\{d/4+1, \dots, d/2\}$ is the same at any round; finally, this weight is decreasing with $t$. Thus, denoting by $k_t$ the number of components selected at round $t$ in the first $d/4$ coordinates, and noting that the per-round loss of an action with $k$ such components is $d/4 - \varepsilon k$ (so that the number of actions with exactly $k$ such components is proportional to $\binom{d/4}{k}\binom{d/4}{d/4-k} = \binom{d/4}{k}^2$), we have
$$R_n = \varepsilon \sum_{t=1}^n \mathbb{E}\left[ \frac{d}{4} - k_t \right] \geq n \varepsilon\, \frac{\sum_{k=0}^{d/4} \binom{d/4}{k}^2 \left( \frac{d}{4} - k \right) e^{\eta n \varepsilon k}}{\sum_{k=0}^{d/4} \binom{d/4}{k}^2\, e^{\eta n \varepsilon k}}.$$
Thus, taking $\varepsilon = \min\left( \frac{\log 2}{\eta n}, 1 \right)$, so that $e^{\eta n \varepsilon} = \min(2, e^{\eta n})$, yields
$$R_n \geq \min\left( \frac{d \log 2}{4\eta}, \frac{nd}{4} \right) \cdot \frac{\sum_{k=0}^{d/4} \binom{d/4}{k}^2 \left( 1 - \frac{4k}{d} \right) \min(2, e^{\eta n})^k}{\sum_{k=0}^{d/4} \binom{d/4}{k}^2 \min(2, e^{\eta n})^k} \geq \min\left( \frac{d \log 2}{12\eta}, \frac{nd}{12} \right),$$
where the last inequality follows from Lemma 3 in the appendix. This concludes the proof of the lower bound.

B Proof of Theorem 5.

The structure of the proof is similar to that of [3, Theorem 30], which deals with the simple case where $m = 1$. The most important conceptual difference is contained in Lemma 4, which is at the heart of this new proof. The main argument follows the line of standard lower bounds for bandit problems, see, e.g., [10]: the worst-case regret is bounded from below by taking an average over a conveniently chosen class of strategies for the adversary. Then, by Pinsker's inequality, the problem is reduced to computing the Kullback-Leibler divergence of certain distributions. The main technical argument, given in Lemma 4, proves manageable bounds for the relevant Kullback-Leibler divergence.

For the sake of simplifying notation, we assume that $d$ is a multiple of $m$, and we identify $\{0,1\}^d$ with the set of $m \times \frac{d}{m}$ binary matrices. We consider the following set of actions:
$$\mathcal{A} = \left\{ a \in \{0,1\}^{m \times \frac{d}{m}} : \sum_{j=1}^{d/m} a(i,j) = 1,\ \forall i \in \{1, \dots, m\} \right\}.$$
In other words, the player is playing in parallel $m$ finite games with $d/m$ actions each. From Step 1 to Step 3 we restrict our attention to deterministic strategies for the player, and we show how to extend the results to arbitrary strategies in Step 4.

First step: definitions.
We denote by I_{i,t} ∈ {1, ..., d/m} the random variable such that a_t(i, I_{i,t}) = 1. That is, I_{i,t} is the action chosen at time t in the ith game. Moreover, let τ be drawn uniformly at random from {1, ..., n}. In this proof we consider random adversaries indexed by A. More precisely, for α ∈ A, we define the α-adversary as follows: for any t ∈ {1, ..., n}, z_t(i, j) is drawn from a Bernoulli distribution with parameter 1/2 − ε α(i, j). In other words, against adversary α, in the ith game, the action j such that α(i, j) = 1 has a loss slightly smaller in expectation than the other actions. We denote by E_α integration with respect to the loss generation process of the α-adversary. We write P_{i,α} for the probability distribution of α(i, I_{i,τ}) when the player faces the α-adversary. Note that we have P_{i,α}(1) = E_α (1/n) Σ_{t=1}^{n} 1{α(i, I_{i,t}) = 1}; hence, against the α-adversary, we have

    R_n = E_α Σ_{t=1}^{n} Σ_{i=1}^{m} ε (1 − 1{α(i, I_{i,t}) = 1}) = nε Σ_{i=1}^{m} (1 − P_{i,α}(1)),

which implies (since the maximum is larger than the mean)

    max_{α ∈ A} R_n ≥ nε Σ_{i=1}^{m} ( 1 − (1/(d/m)^m) Σ_{α ∈ A} P_{i,α}(1) ).    (9)

Second step: information inequality.

Let P_{i,−α} be the probability distribution of α(i, I_{i,τ}) against the adversary which plays like the α-adversary except that in the ith game the losses of all coordinates are drawn from a Bernoulli distribution of parameter 1/2. We call it the (−i, α)-adversary and we denote by E_{−i,α} integration with respect to its loss generation process. By Pinsker's inequality,

    P_{i,α}(1) ≤ P_{i,−α}(1) + √( (1/2) KL(P_{i,−α}, P_{i,α}) ),

where KL denotes the Kullback-Leibler divergence. Moreover, note that by symmetry of the adversaries (the law of I_{i,τ} under the (−i, β)-adversary does not depend on the ith row of β, and for each β exactly one α agreeing with β outside game i satisfies α(i, I_{i,τ}) = 1),

    (1/(d/m)^m) Σ_{α ∈ A} P_{i,−α}(1) = (1/(d/m)^m) Σ_{α ∈ A} E_{−i,α} α(i, I_{i,τ})
        = (1/(d/m)^m) Σ_{β ∈ A} E_{−i,β} Σ_{α: (−i,α) = (−i,β)} α(i, I_{i,τ}) · (1/(d/m))
        = m/d,    (10)

and thus, thanks to the concavity of the square root,

    (1/(d/m)^m) Σ_{α ∈ A} P_{i,α}(1) ≤ m/d + √( (1/(2(d/m)^m)) Σ_{α ∈ A} KL(P_{i,−α}, P_{i,α}) ).    (11)
Third step: computation of KL(P_{i,−α}, P_{i,α}) with the chain rule.

Note that since the forecaster is deterministic, the sequence of observed losses up to time n, W_n ∈ {0, ..., m}^n, uniquely determines the empirical distribution of plays and, in particular, the probability distribution of α(i, I_{i,τ}) conditionally on W_n is the same for any adversary. Thus, if we denote by P^n_α (respectively P^n_{−i,α}) the probability distribution of W_n when the forecaster plays against the α-adversary (respectively the (−i, α)-adversary), then one can easily prove that

    KL(P_{i,−α}, P_{i,α}) ≤ KL(P^n_{−i,α}, P^n_α).

Now we use the chain rule for Kullback-Leibler divergence iteratively to introduce the probability distributions P^t_α of the observed losses W_t up to time t. More precisely, we have

    KL(P^n_{−i,α}, P^n_α)
      = KL(P^1_{−i,α}, P^1_α) + Σ_{t=2}^{n} Σ_{w_{t−1} ∈ {0,...,m}^{t−1}} P^{t−1}_{−i,α}(w_{t−1}) KL( P^t_{−i,α}(· | w_{t−1}), P^t_α(· | w_{t−1}) )
      = KL(B, B′) 1{α(i, I_{i,1}) = 1} + Σ_{t=2}^{n} Σ_{w_{t−1}: α(i, I_{i,t}) = 1} P^{t−1}_{−i,α}(w_{t−1}) KL( B_{w_{t−1}}, B′_{w_{t−1}} ),

where B_{w_{t−1}} and B′_{w_{t−1}} are sums of m Bernoulli distributions with parameters in {1/2, 1/2 − ε} and such that the number of Bernoullis with parameter 1/2 in B_{w_{t−1}} is equal to the number of Bernoullis with parameter 1/2 in B′_{w_{t−1}} plus one. Now using Lemma 4 (see below) we obtain

    KL( B_{w_{t−1}}, B′_{w_{t−1}} ) ≤ 8ε² / ((1 − 4ε²) m).

In particular, this gives

    KL(P^n_{−i,α}, P^n_α) ≤ (8ε² / ((1 − 4ε²) m)) E_{−i,α} Σ_{t=1}^{n} 1{α(i, I_{i,t}) = 1} = (8ε² n / ((1 − 4ε²) m)) P_{i,−α}(1).

Summing and plugging this into (11), we obtain (again thanks to (10)), for ε ≤ 1/8,

    (1/(d/m)^m) Σ_{α ∈ A} P_{i,α}(1) ≤ m/d + ε √(8n/d).

To conclude the proof of (8) for deterministic players, one needs to plug this last inequality into (9), along with straightforward computations.

Fourth step: Fubini's theorem to handle non-deterministic players.

Consider now a randomized player, and let E_rand denote the expectation with respect to the randomization of the player. Then one has, thanks to Fubini's theorem,

    (1/(d/m)^m) Σ_{α ∈ A} E_α [ Σ_{t=1}^{n} a_t^T z_t − Σ_{t=1}^{n} α^T z_t ] = E_rand (1/(d/m)^m) Σ_{α ∈ A} E_α [ Σ_{t=1}^{n} a_t^T z_t − Σ_{t=1}^{n} α^T z_t ].
Now note that if we fix the realization of the forecaster's randomization, then the results of the previous steps apply and, in particular, one can lower bound (1/(d/m)^m) Σ_{α ∈ A} E_α [ Σ_{t=1}^{n} a_t^T z_t − Σ_{t=1}^{n} α^T z_t ] as before (note that α is the optimal action in expectation against the α-adversary).
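The second step of the proof above rests on Pinsker's inequality, which converts a Kullback-Leibler divergence bound into a bound on how far an adversary can shift the player's play probabilities. A minimal numerical sketch of that mechanism (Python; the helper name `kl_bernoulli` is ours, and plain Bernoulli laws stand in for the distributions P_{i,α} of the proof):

```python
import math

def kl_bernoulli(p, q):
    """KL(Ber(p), Ber(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Pinsker's inequality: total variation distance <= sqrt(KL/2).
# For two Bernoulli laws the total variation distance is |p - q|.
for p in [0.3, 0.4, 0.5]:
    for eps in [0.01, 0.05, 0.1]:
        q = p - eps
        assert abs(p - q) <= math.sqrt(kl_bernoulli(p, q) / 2) + 1e-12

# As in the proof, biasing a fair coin by eps costs a KL of order eps^2:
# KL(Ber(1/2 - eps), Ber(1/2)) = 2*eps^2 + O(eps^4).
eps = 0.01
assert abs(kl_bernoulli(0.5 - eps, 0.5) - 2 * eps ** 2) < 1e-6
```

The ε² rate of the divergence is what allows the adversary's bias ε to be tuned against the horizon n when extracting the final lower bound.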
C Technical lemmas

Lemma 3 For any k ∈ N and any 1 ≤ c ≤ 2, we have

    [ Σ_{i=0}^{k} (1 − i/k) C(k, i)² c^i ] / [ Σ_{i=0}^{k} C(k, i)² c^i ] ≥ 1/3.

Proof Let f(c) denote the expression on the left-hand side of the inequality. Introduce the random variable X, which is equal to i ∈ {0, ..., k} with probability C(k, i)² c^i / Σ_{j=0}^{k} C(k, j)² c^j. We have f(c) = E[1 − X/k] and

    f′(c) = (1/c) E[X (1 − X/k)] − (1/c) E[X] E[1 − X/k] = − Var(X) / (ck) ≤ 0.

So the function f is decreasing on [1, 2], and therefore it suffices to consider c = 2. Numerator and denominator of the left-hand side differ only by the factor (1 − i/k). A lower bound for the left-hand side can thus be obtained by showing that the terms for i close to k are not essential to the value of the denominator. To prove this, we may use Stirling's formula, which implies that for any k ≥ 2 and i ∈ [1, k − 1],

    √( k / (2πi(k − i)) ) (k/i)^i (k/(k − i))^{k−i} e^{−1/6} < C(k, i) < √( k / (2πi(k − i)) ) (k/i)^i (k/(k − i))^{k−i} e^{1/12}.

Introduce λ = i/k and χ(λ) = 2^λ λ^{−2λ} (1 − λ)^{−2(1−λ)}. Squaring the bounds above and multiplying by 2^i, we have

    [χ(λ)]^k e^{−1/3} / (2πλ(1 − λ)k) < C(k, i)² 2^i < [χ(λ)]^k e^{1/6} / (2πλ(1 − λ)k).    (12)

Lemma 3 can be numerically verified for k ≤ 10^6. We now consider k > 10^6. Since the function χ can be shown to be decreasing on [0.656, 1], for λ ≥ 0.666 the inequality C(k, i)² 2^i < [χ(0.666)]^k e^{1/6} holds. We have χ(0.657)/χ(0.666) > 1.005; consequently, for k > 10^6, we have [χ(0.666)]^k < [χ(0.657)]^k / (πk³). So for λ ≥ 0.666 and k > 10^6, we have

    C(k, i)² 2^i < [χ(0.657)]^k e^{1/6} / (πk³) = (e^{1/6} / (πk³)) min_{λ ∈ [0.656, 0.657]} [χ(λ)]^k < (1/k²) max_{i ∈ {1,...,k−1}: i/k ≤ 0.666} C(k, i)² 2^i,    (13)

where the last inequality comes from (12) and the fact that there exists i ∈ {1, ..., k − 1} such that i/k ∈ [0.656, 0.657]. Inequality (13) implies that for any i ∈ {1, ..., k} with i ≥ 0.666k, we have

    C(k, i)² 2^i < (1/k²) max_{j ∈ {1,...,k−1}: j ≤ 0.666k} C(k, j)² 2^j ≤ (1/k²) Σ_{j < 0.666k} C(k, j)² 2^j.
To conclude, introducing A = Σ_{0 ≤ i < 0.666k} C(k, i)² 2^i, we have

    [ Σ_{i=0}^{k} (1 − i/k) C(k, i)² 2^i ] / [ Σ_{i=0}^{k} C(k, i)² 2^i ] ≥ (1 − 0.666) A / (A + A/k) ≥ 1/3.

Lemma 4 Let l and n be integers with 1 ≤ n/2 ≤ l ≤ n. Let p, p′, q, p_1, ..., p_n be real numbers in (0, 1/2] with q ∈ {p, p′}, p_1 = ⋯ = p_l = q and p_{l+1} = ⋯ = p_n. Let B (resp. B′) be the sum of n + 1 independent Bernoulli distributions with parameters p, p_1, ..., p_n (resp. p′, p_1, ..., p_n). We have

    KL(B, B′) ≤ 2 (p − p′)² / ( (1 − p)(n + 2) q ).

Proof Let Z, Z′, Z_1, ..., Z_n be independent Bernoulli random variables with parameters p, p′, p_1, ..., p_n. Define S = Σ_{i=1}^{l} Z_i, T = Σ_{i=l+1}^{n} Z_i and V = Z + S. By a slight and usual abuse of notation, we use KL to denote the Kullback-Leibler divergence of both probability distributions and random variables. Then we may write (the inequality is an easy consequence of the chain rule for Kullback-Leibler divergence)

    KL(B, B′) = KL(Z + S + T, Z′ + S + T) ≤ KL( (Z + S, T), (Z′ + S, T) ) = KL(Z + S, Z′ + S).

Let s_k = P(S = k) for k = −1, 0, ..., l + 1 (so that s_{−1} = s_{l+1} = 0). Using the equality

    s_k = C(l, k) q^k (1 − q)^{l−k} = (q/(1 − q)) · ((l − k + 1)/k) · s_{k−1},

which holds for 1 ≤ k ≤ l + 1, we obtain

    KL(Z + S, Z′ + S) = Σ_{k=0}^{l+1} P(V = k) log[ P(Z + S = k) / P(Z′ + S = k) ]
        = Σ_{k=0}^{l+1} P(V = k) log[ (p s_{k−1} + (1 − p) s_k) / (p′ s_{k−1} + (1 − p′) s_k) ]
        = Σ_{k=0}^{l+1} P(V = k) log[ (p (1 − q) k + (1 − p)(l − k + 1) q) / (p′ (1 − q) k + (1 − p′)(l − k + 1) q) ]
        = E log[ ((p − q) V + (1 − p) q (l + 1)) / ((p′ − q) V + (1 − p′) q (l + 1)) ].    (14)
First case: q = p′. By Jensen's inequality, using that EV = p + lp′ in this case, we get from (14)

    KL(Z + S, Z′ + S) ≤ log[ ( (p − p′)(p + lp′) + (1 − p) p′ (l + 1) ) / ( (1 − p′) p′ (l + 1) ) ]
        = log[ 1 + (p − p′)² / ((1 − p′) p′ (l + 1)) ]
        ≤ (p − p′)² / ((1 − p′) p′ (l + 1)).

Second case: q = p. In this case, V is a binomial distribution with parameters l + 1 and p. From (14), we have

    KL(Z + S, Z′ + S) = − E log[ ( (p′ − p) V + (1 − p′) p (l + 1) ) / ( (1 − p) p (l + 1) ) ]
        = − E log[ 1 + (p′ − p)(V − EV) / ((1 − p) p (l + 1)) ].    (15)

To conclude, we will use the following lemma.

Lemma 5 The following inequality holds for any x ≥ x₀ with x₀ ∈ (0, 1):

    log(x) ≥ x − 1 − (x − 1)² / (2 x₀).

Proof Introduce f(x) = −(x − 1) + (x − 1)²/(2x₀) + log(x). We have

    f′(x) = −1 + (x − 1)/x₀ + 1/x = (x − 1)(x − x₀) / (x₀ x).

Hence f′ is negative on (x₀, 1) and positive on (1, +∞), so f attains its minimum over [x₀, +∞) at x = 1, where f(1) = 0. This leads to f nonnegative on [x₀, +∞).

Finally, from Lemma 5 and (15), using x₀ = 1 − p, we obtain

    KL(Z + S, Z′ + S) ≤ ( (p′ − p)² / (2 x₀) ) · E[(V − EV)²] / ((1 − p) p (l + 1))²
        = (p′ − p)² (l + 1) p (1 − p) / ( 2 (1 − p)³ p² (l + 1)² )
        = (p − p′)² / ( 2 (1 − p)² (l + 1) p ).

Acknowledgements

G. Lugosi is supported by the Spanish Ministry of Science and Technology grant MTM and by the PASCAL2 Network of Excellence under an EC grant.
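Both technical lemmas above lend themselves to a direct numerical check, in the spirit of the verification for k ≤ 10^6 invoked in the proof of Lemma 3. A small sketch (Python; the function name `lemma3_ratio` is ours) evaluates the ratio of Lemma 3 and the inequality of Lemma 5 on a grid, including the boundary case k = 1, c = 2 where the ratio equals 1/3 exactly:

```python
import math
from math import comb

def lemma3_ratio(k, c):
    """Left-hand side of Lemma 3: sum_i (1 - i/k) C(k,i)^2 c^i / sum_i C(k,i)^2 c^i."""
    num = sum((1 - i / k) * comb(k, i) ** 2 * c ** i for i in range(k + 1))
    den = sum(comb(k, i) ** 2 * c ** i for i in range(k + 1))
    return num / den

# Lemma 3: the ratio is at least 1/3 for every k and every 1 <= c <= 2
# (k kept moderate here to stay within float range).
for k in [1, 2, 5, 20, 100]:
    for c in [1.0, 1.25, 1.5, 1.75, 2.0]:
        assert lemma3_ratio(k, c) >= 1 / 3 - 1e-12

# Lemma 5: log(x) >= (x - 1) - (x - 1)^2 / (2 x0) for all x >= x0 in (0, 1).
for x0 in [0.1, 0.5, 0.9]:
    for j in range(1000):
        x = x0 + 0.01 * j
        assert math.log(x) >= (x - 1) - (x - 1) ** 2 / (2 * x0) - 1e-12
```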
References

[1] J. Abernethy, E. Hazan, and A. Rakhlin, Competing in the dark: An efficient algorithm for bandit linear optimization, Proceedings of the 21st Annual Conference on Learning Theory (COLT), 2008.
[2] J.-Y. Audibert and S. Bubeck, Minimax policies for adversarial and stochastic bandits, Proceedings of the 22nd Annual Conference on Learning Theory (COLT), 2009.
[3] J.-Y. Audibert and S. Bubeck, Regret bounds and minimax policies under partial monitoring, Journal of Machine Learning Research, 2010.
[4] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire, The non-stochastic multi-armed bandit problem, SIAM Journal on Computing, no. 1, 2002.
[5] B. Awerbuch and R. Kleinberg, Adaptive routing with end-to-end feedback: distributed learning and geometric approaches, STOC '04: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, 2004.
[6] A. Beck and M. Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters, no. 3, 2003.
[7] S. Bubeck, Introduction to online optimization, Lecture Notes, 2011.
[8] S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, no. 1, 2012.
[9] S. Bubeck, N. Cesa-Bianchi, and S. M. Kakade, Towards minimax policies for online linear optimization with bandit feedback, arXiv preprint.
[10] N. Cesa-Bianchi and G. Lugosi, Prediction, learning, and games, Cambridge University Press, 2006.
[11] N. Cesa-Bianchi and G. Lugosi, Combinatorial bandits, Journal of Computer and System Sciences, 2011, to appear.
[12] V. Dani, T. Hayes, and S. Kakade, The price of bandit information for online optimization, Advances in Neural Information Processing Systems (NIPS), vol. 20, 2008.
[13] Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 1997.
[14] C. Gentile and M. Warmuth, Linear hinge loss and average margin, Advances in Neural Information Processing Systems (NIPS), 1998.
[15] A. Grove, N. Littlestone, and D. Schuurmans, General convergence results for linear discriminant updates, Machine Learning, 2001.
[16] A. György, T. Linder, G. Lugosi, and G. Ottucsák, The on-line shortest path problem under partial monitoring, Journal of Machine Learning Research, 2007.
[17] E. Hazan, The convex optimization approach to regret minimization, in Optimization for Machine Learning (S. Sra, S. Nowozin, and S. Wright, eds.), MIT Press, 2011.
[18] E. Hazan, S. Kale, and M. Warmuth, Learning rotations with little regret, Proceedings of the 23rd Annual Conference on Learning Theory (COLT), 2010.
[19] D. P. Helmbold and M. Warmuth, Learning permutations with exponential weights, Journal of Machine Learning Research, 2009.
[20] M. Herbster and M. Warmuth, Tracking the best expert, Machine Learning, 1998.
[21] J.-B. Hiriart-Urruty and C. Lemaréchal, Fundamentals of convex analysis, Springer, 2001.
[22] A. Kalai and S. Vempala, Efficient algorithms for online decision problems, Journal of Computer and System Sciences, 2005.
[23] S. Kale, L. Reyzin, and R. Schapire, Non-stochastic bandit slate problems, Advances in Neural Information Processing Systems (NIPS), 2010.
[24] J. Kiefer and J. Wolfowitz, Stochastic estimation of the maximum of a regression function, Annals of Mathematical Statistics, 1952.
[25] J. Kivinen and M. Warmuth, Relative loss bounds for multidimensional regression problems, Machine Learning, 2001.
[26] W. Koolen, M. Warmuth, and J. Kivinen, Hedging structured concepts, Proceedings of the 23rd Annual Conference on Learning Theory (COLT), 2010.
[27] H. McMahan and A. Blum, Online geometric optimization in the bandit setting against an adaptive adversary, Proceedings of the 17th Annual Conference on Learning Theory (COLT), 2004.
[28] A. Nemirovski, Efficient methods for large-scale convex optimization problems, Ekonomika i Matematicheskie Metody, 1979 (in Russian).
[29] A. Nemirovski and D. Yudin, Problem complexity and method efficiency in optimization, Wiley Interscience, 1983.
[30] A. Rakhlin, Lecture notes on online learning, 2009.
[31] H. Robbins and S. Monro, A stochastic approximation method, Annals of Mathematical Statistics, 1951.
[32] A. Schrijver, Combinatorial optimization, Springer, 2003.
[33] S. Shalev-Shwartz, Online learning: Theory, algorithms, and applications, Ph.D. thesis, The Hebrew University of Jerusalem, 2007.
[34] E. Takimoto and M. Warmuth, Paths kernels and multiplicative updates, Journal of Machine Learning Research, 2003.
[35] T. Uchiya, A. Nakamura, and M. Kudo, Algorithms for adversarial bandit problems with multiple plays, Proceedings of the 21st International Conference on Algorithmic Learning Theory (ALT), 2010.
[36] M. Warmuth and D. Kuzmin, Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension, Journal of Machine Learning Research, 2008.
[37] M. Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003.
More informationVapnik-Chervonenkis theory
Vapnk-Chervonenks theory Rs Kondor June 13, 2008 For the purposes of ths lecture, we restrct ourselves to the bnary supervsed batch learnng settng. We assume that we have an nput space X, and an unknown
More informationAnnouncements EWA with ɛ-exploration (recap) Lecture 20: EXP3 Algorithm. EECS598: Prediction and Learning: It s Only a Game Fall 2013.
Lecture 0: EXP3 Algorthm 1 EECS598: Predcton and Learnng: It s Only a Game Fall 013 Prof. Jacob Abernethy Lecture 0: EXP3 Algorthm Scrbe: Zhhao Chen Announcements None 0.1 EWA wth ɛ-exploraton (recap)
More informationSupporting Information
Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to
More informationSupplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso
Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed
More informationarxiv:submit/ [cs.lg] 30 Aug 2011
No Internal Regret va Neghborhood Watch Dean Foster Department of Statstcs Unversty of Pennsylvana Alexander Rakhln Department of Statstcs Unversty of Pennsylvana arxv:submt/0308560 cs.lg 30 Aug 2011 August
More informationLecture Space-Bounded Derandomization
Notes on Complexty Theory Last updated: October, 2008 Jonathan Katz Lecture Space-Bounded Derandomzaton 1 Space-Bounded Derandomzaton We now dscuss derandomzaton of space-bounded algorthms. Here non-trval
More informationEEE 241: Linear Systems
EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they
More informationCSCE 790S Background Results
CSCE 790S Background Results Stephen A. Fenner September 8, 011 Abstract These results are background to the course CSCE 790S/CSCE 790B, Quantum Computaton and Informaton (Sprng 007 and Fall 011). Each
More informationMarkov Chain Monte Carlo Lecture 6
where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways
More informationLecture Randomized Load Balancing strategies and their analysis. Probability concepts include, counting, the union bound, and Chernoff bounds.
U.C. Berkeley CS273: Parallel and Dstrbuted Theory Lecture 1 Professor Satsh Rao August 26, 2010 Lecturer: Satsh Rao Last revsed September 2, 2010 Lecture 1 1 Course Outlne We wll cover a samplng of the
More informationComputing Correlated Equilibria in Multi-Player Games
Computng Correlated Equlbra n Mult-Player Games Chrstos H. Papadmtrou Presented by Zhanxang Huang December 7th, 2005 1 The Author Dr. Chrstos H. Papadmtrou CS professor at UC Berkley (taught at Harvard,
More informationThe lower and upper bounds on Perron root of nonnegative irreducible matrices
Journal of Computatonal Appled Mathematcs 217 (2008) 259 267 wwwelsevercom/locate/cam The lower upper bounds on Perron root of nonnegatve rreducble matrces Guang-Xn Huang a,, Feng Yn b,keguo a a College
More informationEconomics 101. Lecture 4 - Equilibrium and Efficiency
Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of
More informationCSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography
CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve
More informationLecture 4: Universal Hash Functions/Streaming Cont d
CSE 5: Desgn and Analyss of Algorthms I Sprng 06 Lecture 4: Unversal Hash Functons/Streamng Cont d Lecturer: Shayan Oves Gharan Aprl 6th Scrbe: Jacob Schreber Dsclamer: These notes have not been subjected
More informationLogistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton
More informationEcon107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)
I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes
More informationLinear, affine, and convex sets and hulls In the sequel, unless otherwise specified, X will denote a real vector space.
Lnear, affne, and convex sets and hulls In the sequel, unless otherwse specfed, X wll denote a real vector space. Lnes and segments. Gven two ponts x, y X, we defne xy = {x + t(y x) : t R} = {(1 t)x +
More informationOn the Multicriteria Integer Network Flow Problem
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of
More informationFinding Dense Subgraphs in G(n, 1/2)
Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng
More informationMaximal Margin Classifier
CS81B/Stat41B: Advanced Topcs n Learnng & Decson Makng Mamal Margn Classfer Lecturer: Mchael Jordan Scrbes: Jana van Greunen Corrected verson - /1/004 1 References/Recommended Readng 1.1 Webstes www.kernel-machnes.org
More information