Online Learning with Queries


Chao-Kai Chiang and Chi-Jen Lu
Institute of Information Science, Academia Sinica, Taipei, Taiwan ({chaokai,cjlu}@iis.sinica.edu.tw); Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan.

Abstract
The online learning problem requires a player to iteratively choose an action in an unknown and changing environment. In the standard setting of this problem, the player has to choose an action in each round before knowing anything about the corresponding loss. However, there are situations in which it seems possible for the player to spend effort or resources to collect some prior information before her actions. This motivates us to study a variant of the online learning problem in which the player is allowed to query $B$ bits of the loss vector in each round before choosing her action. Suppose each loss value is represented by $K$ bits and distinct loss values differ by at least some amount $\delta$, and suppose there are $N$ actions to choose from and $T$ rounds to play. We provide an algorithm for this problem which achieves a regret of the following form. Before $B$ approaches $B_1 = NK/2$, the regret stays at $O(\sqrt{T \ln N})$; after $B$ exceeds $B_1$ but before it approaches $B_2 = NK/2 + 3K/2 - 1$, the regret drops slightly to $O(\sqrt{(T \ln N)/N})$; and after $B$ exceeds $B_2$, the regret takes a dramatic drop to $(N \ln N)/\delta$. Our algorithm is in fact close to optimal, as we also provide regret lower bounds which almost match the regret upper bounds achieved by our algorithm.

1 Introduction
Many situations in daily life seem to involve making repeated decisions in an unknown and changing environment, including examples such as trading stocks, commuting to work, routing in a network, forecasting weather, playing games, etc. This motivates the study of the well-known online learning problem, in which a player iteratively chooses an action and receives a loss (or a reward) for a number of rounds. In each round, the player must choose her action before knowing the corresponding loss, but after choosing her action, she gets to know the whole loss vector (one entry per action) of that round. The player would like to have an online algorithm which can learn from the past and hopefully make better decisions as time goes by, so that the total accumulated loss is small. The standard way of evaluating such an online algorithm is to compare its total loss with that of the best fixed action in hindsight. The difference between these two losses is called the regret, and the goal of an online algorithm is to minimize its regret. There have been many wonderful works on this problem, and it has grown into a rich topic with contributions coming from several areas such as machine learning, algorithm design, and statistics. More information can be found in survey papers such as [3, 5] or the nice book [6], and a sample of more recent works includes [1, 17, 4, 7, 16, 8, 10, 18, 11, 2].

For the online learning problem with $N$ actions to choose from and $T$ rounds to play, there are algorithms which achieve a regret of $O(\sqrt{T \ln N})$, and this bound is in fact tight, as a matching lower bound of $\Omega(\sqrt{T \ln N})$ can be shown (see e.g. [6]). Note that these bounds hold in the most general and adversarial setting, in which the loss vector in each round could be any arbitrary one in $[0,1]^N$. On the other hand, it becomes possible to achieve a smaller regret when the loss vectors have constraints. For the online convex optimization problem, which generalizes the online learning problem, Hazan, Agarwal, and Kale [13] showed that when the loss functions satisfy some nice properties, such as strict convexity (with bounded first and second derivatives), a regret of $O(\ln T)$ can be achieved.
The result, however, does not seem to carry over to the online learning problem. For the online linear optimization problem, Hazan and Kale [14] considered the case in which the sequence of $T$ loss functions has a small variation $V$, and they showed that a regret of $O(\sqrt{V})$ can be achieved. They also have an analogous result for the online learning problem. Another situation in which one can have constraints on the loss functions/vectors, even though they could still be arbitrary, is when one can obtain some prior information about them. For the online linear optimization problem, Hazan and Megiddo [15] showed that if the player knows the first entry of the loss vector (as the prior information) before choosing her action in each round, a regret of $O(N^2 \ln T)$ can be achieved. They also considered modeling the prior information in each round as some state vector and measuring the regret against (stronger) offline algorithms which are allowed to have their action in that round depend in a certain way on the same prior information. In this setting, they showed that a regret can be achieved which depends on $T$ in the form of $O(T^{1-1/(d+2)})$, where $d$ is the dimension of the state vectors.

In these previous works, the online player seems to be considered as having a passive role in the environment, with no control over the constraints on the loss functions: either the player is in a somewhat benign environment in which the loss functions themselves satisfy some nice property, or the player passively observes some revealed information about the loss functions. On the other hand, there are scenarios in which it seems possible for the player to spend effort or resources to actively collect some information of her choice about the loss functions. For example, before deciding which route to take, a driver may first select some routes and try to collect their traffic conditions; before deciding which stocks to trade, an investor may first select some stocks and try to do some research on their potential; before choosing the next move in a game, a player may first select some moves and try to evaluate how good they are. However, in most situations, one is unlikely to have an unlimited amount of effort or resources to collect all the information one would like to have; therefore, one needs to decide how to spend the limited effort or resources in an efficient way.

We would like to initiate a study of such scenarios. As a start, we consider modifying the online learning problem in the following way. In each round, we give the player a $B$-bit budget which allows her to query $B$ bits of her choice from the loss vector before choosing her action, where we assume that each loss value is represented by a $K$-bit string and distinct loss values differ by at least some amount $\delta$. We allow the queries to be made in a randomized way, but we also allow an adversary to set each bit of a loss vector after receiving the corresponding query made by the online algorithm, although we still require that the adversary fix a loss vector before seeing the algorithm's action. This has the purpose of limiting the power of queries and capturing the potential delay between the queries and actions made by the algorithm. Note that our model has the original online learning problem as a special case, when $B = 0$. On the other hand, when $B = NK$, one can achieve zero regret, since one has enough budget to figure out the whole loss vector and choose the best action in each round. The interesting case is when the value of $B$ lies in the middle, and some questions arise. With a limited number of queries, where should one spend them? It is natural to expect that with a larger $B$, one can obtain more information about the loss vectors and achieve a smaller regret, but what does the regret look like as a function of the budget bound $B$? We will try to answer these questions in this paper, by providing an algorithm for this problem together with lower bounds on the regret which almost match those achieved by the algorithm.

Our algorithm is based on the well-known weighted average algorithm, which achieves an optimal regret for the original online learning problem. To work in our new setting, we add a step for making the queries and modify the way an action is chosen in each round (while keeping the weights updated in the same multiplicative way). Instead of using the probability distribution $p^t$ of the weighted average algorithm to choose an action in each round, we use $p^t$ to guide our queries, and from the query result, we modify the distribution $p^t$ by moving probabilities around among some actions. Our strategy is to use queries to find actions with different loss values, so that by moving the probabilities to actions with a smaller loss, the expected loss in that step can be reduced below that of the weighted average algorithm.
We start the queries on actions with larger probabilities in $p^t$, hoping that a larger amount of probability can be moved around so that a larger reduction of the loss can be achieved. The regret which our algorithm achieves depends on the budget bound $B$ in the following way. Before $B$ approaches the bound $B_1 = NK/2$, the regret remains at $O(\sqrt{T \ln N})$, which is of the same order as that of the no-query case ($B = 0$). After $B$ passes the bound $B_1$ but before it approaches the bound $B_2 = NK/2 + 3K/2 - 1$, there is a noticeable drop of the regret to $O(\sqrt{(T \ln N)/N})$. Finally, after $B$ passes the bound $B_2$, the regret takes a dramatic drop to $(N \ln N)/\delta$, which is independent of $T$. One may see our regret bound as having two phase transitions, one minor and one major, at the two critical points $B_1$ and $B_2$.

One may wonder whether this interesting shape of the regret bound is just an artifact of the particular algorithm we design. We show that this is not the case and that it actually comes from the nature of the problem. We do this by providing regret lower bounds which almost match the regret bounds achieved by our algorithm. As a result, we know that unless one can query close to half of the bits in the loss vectors, the queries do not help much, as they can only reduce the regret by a constant factor. Moreover, even when one can have the number of queries close to $B_2$, one can only reduce the regret by a factor of $\sqrt{N}$. On the other hand, according to our algorithm, when the budget bound exceeds $B_2$, the queries suddenly become extremely useful, and the regret can be made extremely small, not even depending on $T$.

It is a pleasant surprise to see how the budget affects the regret in such an interesting way.

We consider our work as a preliminary step in the new direction of allowing queries in online learning. There are many questions that remain to be answered, and next we list three of them. First, recall that in our model, we allow an adversary to set each bit of a loss vector after receiving the corresponding query made by the online algorithm. This somewhat limits the power of the queries, even though the queries are allowed to be randomized. Still, we show that queries can be very powerful when their number exceeds some threshold. We would like to understand whether the queries could become even more powerful when the adversary has to fix a loss vector before the online algorithm makes any query on it. Next, our algorithm is based on the specific weighted average algorithm, and our regret analysis seems to rely crucially on some of its special properties. We would like to understand whether it is possible to modify any existing online algorithm, instead of just the weighted average algorithm, to use queries to achieve a smaller regret. Finally, in our query model, we allow the online algorithm to obtain individual bits of a loss vector, but this may not be realistic in some settings. For those settings, we would like to have more appropriate query models which capture the kind of information one can obtain from loss vectors, and then to design algorithms which can utilize such queries to achieve small regret.

The outline of the paper is the following. In Section 2, we introduce some definitions and provide some basic facts. In Section 3, we consider a special case of the problem and provide a simple algorithm with a simple analysis, which contains the essential ideas. Then we provide an algorithm and analyze its regret for the general problem in Sections 4 and 5. Finally, we prove regret lower bounds which almost match the regret upper bounds achieved by our algorithm.

2 Preliminaries
First, we introduce some notation which will be used in this paper. For a binary vector $v$, let $\#_1(v)$ denote the number of ones in $v$. For a set $S$, let $|S|$ denote the number of elements in $S$. For a positive integer $N$, let $[N]$ denote the set $\{1, 2, \ldots, N\}$.

Next, let us describe the original online learning problem. Suppose there is a set of $N$ available actions and there are a total of $T$ rounds to play. In each round $t \in [T]$, an online algorithm $A$ chooses to play an action according to some distribution $p^t = (p_1^t, \ldots, p_N^t)$ over the $N$ actions, where $p_i^t$ is the probability that $A$ plays action $i$ in round $t$. After that, a loss vector $\ell^t = (\ell_1^t, \ldots, \ell_N^t) \in [0,1]^N$ is revealed to $A$, where $\ell_i^t$ is the loss of playing action $i$ in round $t$, and $A$ suffers an expected loss $\sum_i p_i^t \ell_i^t$. The expected loss of $A$ in $T$ rounds of play is

$L_A^T = \sum_{t=1}^{T} \sum_i p_i^t \ell_i^t$,

and we compare it with that of the best fixed action in hindsight, which is

$L_{\min}^T = \min_i \sum_{t=1}^{T} \ell_i^t$.

The goal of $A$ is to minimize its regret, defined as $R_A^T = L_A^T - L_{\min}^T$. For this problem, there are algorithms which achieve an optimal regret of $O(\sqrt{T \ln N})$. Next, we describe one of them, called the weighted average algorithm and denoted $A_0$, which will be used later to build our algorithm. In each round $t$, $A_0$ maintains a weight vector $w^t = (w_1^t, \ldots, w_N^t)$ (initially, $w_i^1 = 1/N$) together with the distribution $p^t = (p_1^t, \ldots, p_N^t)$ such that for each $i \in [N]$,

(2.1)  $p_i^t = w_i^t / W^t$, where $W^t = \sum_{j \in [N]} w_j^t$,

and performs the following two steps:

Step 1: $A_0$ plays an action sampled according to the distribution $p^t = (p_1^t, \ldots, p_N^t)$.
Step 2: After receiving the loss vector $\ell^t = (\ell_1^t, \ldots, \ell_N^t)$, $A_0$ updates its weights to $w^{t+1} = (w_1^{t+1}, \ldots, w_N^{t+1})$ according to the rule that for each $i \in [N]$,

(2.2)  $w_i^{t+1} = w_i^t \cdot e^{-\eta \ell_i^t}$,

where the parameter $\eta$ is the learning rate, which one can choose. Several ways are known for bounding the regret of this algorithm. However, for our result to work, we will use the particular one given in the following lemma, which we prove in Appendix A. Note that it guarantees a regret of at most $(\ln N)/\eta + \eta T = 2\sqrt{T \ln N}$ by choosing $\eta = \sqrt{(\ln N)/T}$.
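Before stating that lemma, here is a minimal Python sketch of $A_0$ (our illustration, not code from the paper). The callback `get_loss_vector`, which reveals $\ell^t$ after the action is played, is a hypothetical stand-in for the environment.

```python
import math
import random

def weighted_average(T, N, get_loss_vector, eta):
    """Sketch of the weighted average algorithm A0."""
    w = [1.0 / N] * N                                  # initially w^1_i = 1/N
    total_loss = 0.0
    for t in range(T):
        W = sum(w)
        p = [wi / W for wi in w]                       # (2.1): p^t_i = w^t_i / W^t
        action = random.choices(range(N), weights=p)[0]            # Step 1
        loss = get_loss_vector(t)                      # full vector revealed afterwards
        total_loss += loss[action]
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, loss)]  # (2.2)
    return total_loss
```

Calling it with `eta = math.sqrt(math.log(N) / T)` corresponds to the $2\sqrt{T \ln N}$ guarantee noted above.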

Lemma 2.1. For any $i_1, i_2, \ldots, i_T \in [N]$, the regret of $A_0$ is at most

$\frac{\ln N}{\eta} + \eta \sum_{t=1}^{T} \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t$.

In this paper, we study a new setting in which the online algorithm is allowed to query some information about the loss vector before choosing its action to play in each round. More precisely, in each round $t$, the algorithm is allowed to query $B$ bits from the loss vector $\ell^t$. Here, we assume that each loss value $\ell_i^t$ comes from a set of at most $2^K$ values, so that we can represent each value by a $K$-bit string, with a smaller binary representation for a smaller loss value, and we assume furthermore that any two distinct loss values differ by at least some amount $\delta$. For the clarity of our presentation, we assume here that $\delta$ (and thus $K$) is a constant, and we also assume that the algorithm knows the numbers $B$, $\delta$, and $T$ before it starts. In each round, while we allow the online algorithm to make randomized queries, we also allow the adversary the power to set the bits of the loss vector after receiving the corresponding queries, but still the adversary must fix the loss vector before seeing the action chosen by the algorithm.

3 A Special Case
In this section, we provide a simple example showing that even with a one-bit query in each round, it becomes possible to reduce the regret significantly. We use this simple case to illustrate the basic ideas, which will be extended to the more difficult general case in the next section. The result of this section is the following.

Theorem 3.1. For the special case of the online learning problem with $N$ actions such that loss vectors are from $\{0,1\}^N$ and the budget bound is $B = 1$ per round, there exists an algorithm $A_1$ which achieves a regret of at most $N \ln N$.

Before proving the theorem, let us first see how some partial information about a loss vector can be used to save some loss for the online algorithm. One example is that if we know $\ell_i^t > \ell_j^t$ in round $t$, then by moving some probability $q_i$ from playing action $i$ to playing action $j$, we can reduce the expected loss by the amount

(3.3)  $q_i \ell_i^t - q_i \ell_j^t = q_i (\ell_i^t - \ell_j^t) = q_i$,

since $\ell_i^t, \ell_j^t \in \{0,1\}$, which means that a larger $q_i$ gives a larger saving. This suggests that we query the bit $\ell_i^t$ when action $i$ is the one that we initially plan to play with the highest probability, hoping that from it we can move a large probability to some other action with a smaller loss value. Using this idea, we design the algorithm $A_1$ and analyze its regret next.

Proof. (of Theorem 3.1) The algorithm $A_1$ is based on the weighted average algorithm $A_0$ described in the previous section, but it adds a query step and then modifies the distribution of actions in each round. More precisely, in round $t$, $A_1$ maintains a weight vector $w^t = (w_1^t, \ldots, w_N^t)$ and the distribution $p^t = (p_1^t, \ldots, p_N^t)$ defined as in (2.1), but it replaces Step 1 of $A_0$ by the following:

Step 1.1. $A_1$ queries the bit $\ell_{i_t}^t$ of the loss vector, where $i_t$ is the action such that $p_{i_t}^t \geq p_j^t$ for every $j \in [N]$.

Step 1.2. $A_1$ derives the distribution $\hat{p}^t$ from $p^t$ by moving its probabilities in the following way. If $\ell_{i_t}^t = 0$, then $A_1$ moves all the probabilities of the other actions to action $i_t$, so that $\hat{p}_{i_t}^t = 1$ and $\hat{p}_j^t = 0$ for any $j \neq i_t$. If $\ell_{i_t}^t = 1$, then $A_1$ moves the probability of action $i_t$ to the other actions evenly, so that $\hat{p}_{i_t}^t = 0$ and $\hat{p}_j^t = p_j^t + p_{i_t}^t/(N-1)$ for any $j \neq i_t$.

Step 1.3. $A_1$ plays an action sampled according to the distribution $\hat{p}^t$.

Next, we analyze the regret of $A_1$. We do this by comparing it with that of $A_0$, which by Lemma 2.1 is at most

$\frac{\ln N}{\eta} + \eta \sum_{t=1}^{T} \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t$.

According to (3.3), in each round $t$, by moving the probabilities around, the algorithm $A_1$ can reduce the loss of $A_0$ by some amount $s_t$, such that when $\ell_{i_t}^t = 0$,

$s_t = \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t (\ell_i^t - \ell_{i_t}^t) = \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t$,

and when $\ell_{i_t}^t = 1$,

$s_t = \sum_{i: \ell_i^t \neq \ell_{i_t}^t} \frac{p_{i_t}^t}{N-1} (\ell_{i_t}^t - \ell_i^t) \geq \frac{1}{N} \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t$,

since $p_{i_t}^t \geq p_i^t$ for any $i$. As a result, the regret of $A_1$ is at most

$\frac{\ln N}{\eta} + \sum_{t=1}^{T} \Big( \eta \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t - s_t \Big) \leq \frac{\ln N}{\eta} + \Big( \eta - \frac{1}{N} \Big) \sum_{t=1}^{T} \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t = N \ln N$,

by choosing $\eta = 1/N$. This proves Theorem 3.1. □
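For concreteness, the following is a minimal sketch of $A_1$ in Python (again our illustration, not the paper's code). The hypothetical callbacks `query_bit` and `get_loss_vector` stand for the one-bit query and the end-of-round feedback.

```python
import math
import random

def algorithm_A1(T, N, query_bit, get_loss_vector):
    """Sketch of A1 for binary losses and a one-bit query budget per round."""
    eta = 1.0 / N                        # learning rate from the proof of Theorem 3.1
    w = [1.0 / N] * N
    total_loss = 0.0
    for t in range(T):
        W = sum(w)
        p = [wi / W for wi in w]
        i_t = max(range(N), key=lambda i: p[i])      # heaviest action
        bit = query_bit(t, i_t)                      # Step 1.1: spend the one-bit budget
        if bit == 0:                                 # Step 1.2: pile all mass on i_t ...
            p_hat = [0.0] * N
            p_hat[i_t] = 1.0
        else:                                        # ... or spread its mass evenly
            p_hat = [pi + p[i_t] / (N - 1) for pi in p]
            p_hat[i_t] = 0.0
        action = random.choices(range(N), weights=p_hat)[0]        # Step 1.3
        loss = get_loss_vector(t)                    # revealed after the action
        total_loss += loss[action]
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, loss)]  # same update as A0
    return total_loss
```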

4 Main Result
In this section, we consider the general online learning problem described in Section 2. We generalize the algorithm $A_1$ of the previous section to the general setting, and our main result is the following theorem.

Theorem 4.1. Let $D = \max\{0,\, NK/2 + 3K/2 - 1 - B\}$. Then for the general online learning problem described in Section 2, there exists an online algorithm $A_2$ which, given a budget of $B$ queries per round, achieves a regret

$R_{A_2}^T \leq \begin{cases} (N \ln N)/\delta & \text{if } D = 0, \\ \sqrt{8DT(\ln N)/(NK)} & \text{if } D > 0, \end{cases}$

for a large enough $T$.

Before proving the theorem, let us try to understand the somewhat complicated-looking regret bound better, and in particular, to see how the regret is affected by the budget bound $B$. First, observe that as $B$ increases from zero, the quantity $D$ decreases, and consequently the regret $R_{A_2}^T$ decreases. This matches what one would normally expect. Next, let us take a closer look at how $R_{A_2}^T$ decreases as $B$ increases. Interestingly, the value of $R_{A_2}^T$ appears to go through two phase transitions, one minor and one major, around $B = NK/2$ and $B = NK/2 + 3K/2 - 1$, in the following sense. When $B \leq (1-\varepsilon)NK/2$ for any small positive constant $\varepsilon$, $R_{A_2}^T$ remains at $O(\sqrt{T \ln N})$, which is of the same order as that of the no-query case ($B = 0$). When $NK/2 \leq B \leq NK/2 + (1-\varepsilon) \cdot 3K/2$ for any small positive constant $\varepsilon$, $R_{A_2}^T$ takes a noticeable drop to $O(\sqrt{(T \ln N)/N})$. Finally, when $B \geq NK/2 + 3K/2 - 1$, $R_{A_2}^T$ takes a dramatic drop to $(N \ln N)/\delta$, which is very small and independent of $T$.

Next, we proceed to prove Theorem 4.1 by providing the algorithm $A_2$ and then bounding its regret, in the following two subsections respectively.

4.1 The Algorithm $A_2$. The algorithm $A_2$ is based on the algorithm $A_1$ of the previous section (which in turn is based on the weighted average algorithm $A_0$), but it modifies Step 1.1 (for making queries) and Step 1.2 (for deriving the distribution $\hat{p}^t$) in order to handle the more general case.

Consider any round $t$. Just as in $A_1$, we would like to use queries to find out some relationships among the losses of actions, so that we can move probabilities to actions with a smaller loss. Now in the general case, which can have $B > 1$ and $K > 1$, we need to decide where to spend the $B$ bits of budget; if we spend them efficiently, we can find out more relationships. We call an action $i$ heavier than an action $j$ if $p_i^t \geq p_j^t$, and we call it lighter than $j$ otherwise. Let $i_t$ denote the heaviest action. Our strategy is to use its loss value $\ell_{i_t}^t$ as a basis and to find out its relationship with $\ell_i^t$ for as many actions $i$ as possible. Here, we look first for a partial relationship such as $\ell_i^t \leq \ell_{i_t}^t$, instead of an exact one such as $\ell_i^t = \ell_{i_t}^t$ or $\ell_i^t < \ell_{i_t}^t$, so that we can spend as few queries as possible and still know some way to move the probability. Following the idea in Section 3, we query heavier actions before lighter ones, hoping that larger probabilities can be moved among them. Formally, in each round $t$, $A_2$ replaces Step 1.1 of $A_1$ by the following (a sketch of one possible implementation appears after this step):

Step 1.1. Before the $B$-bit budget runs out, $A_2$ queries the $K$ bits of $\ell_{i_t}^t$, where $i_t$ is the heaviest action, and then repeats the following if $\ell_{i_t}^t \notin \{1^K, 0^K\}$:
(a) $A_2$ finds the next heaviest action $i$.
(b) $A_2$ queries those bits of $\ell_i^t$ in the positions which have zeros in $\ell_{i_t}^t$ if $\ell_{i_t}^t$ has fewer zeros than ones (i.e., $\#_1(\ell_{i_t}^t) > K/2$), and queries the other bits of $\ell_i^t$ otherwise. (For example, if $\ell_{i_t}^t = 100$, then $A_2$ queries only the leftmost bit of $\ell_i^t$.)
(c) If any of the queried bits in $\ell_i^t$ differs from the corresponding bit in $\ell_{i_t}^t$, $A_2$ queries all the remaining bits in $\ell_i^t$.

Note that if $\ell_{i_t}^t$ equals $1^K$ or $0^K$, $A_2$ does not make any further query on any other action $i$, because it already knows the relationship $\ell_i^t \leq \ell_{i_t}^t$ or $\ell_i^t \geq \ell_{i_t}^t$, respectively. If in Step 1.1(b) all the queried bits match the corresponding bits in $\ell_{i_t}^t$, $A_2$ knows the relationship $\ell_i^t \leq \ell_{i_t}^t$ or $\ell_i^t \geq \ell_{i_t}^t$ when those bits are all zeros or all ones, respectively. Otherwise (if there is a mismatch), then in Step 1.1(c) $A_2$ queries the remaining bits in $\ell_i^t$ to determine whether $\ell_i^t < \ell_{i_t}^t$ or $\ell_i^t > \ell_{i_t}^t$.
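The following Python sketch shows one way to organize the bookkeeping of Step 1.1 (our illustration, with simplified budget handling, not the paper's code). The hypothetical callback `query_bit(i, pos)` returns bit `pos` of action $i$'s $K$-bit loss, with position 0 the most significant; the function classifies every action by its known relation to the heaviest action.

```python
def a2_query_step(p, K, B, query_bit):
    """Sketch of Step 1.1 of A2: spend at most B bit queries and return the
    actions grouped by their known relation to the loss of the heaviest action."""
    N = len(p)
    order = sorted(range(N), key=lambda i: -p[i])      # heavier actions first
    i_t, rest = order[0], order[1:]
    budget = [B]                                       # boxed so ask() can spend it

    def ask(i, pos):
        if budget[0] == 0:
            return None                                # budget exhausted
        budget[0] -= 1
        return query_bit(i, pos)                       # assumed to return 0 or 1

    sets = {"<": [], "<=": [], "=": [i_t], ">=": [], ">": [], "?": []}
    base = [ask(i_t, pos) for pos in range(K)]         # read l_{i_t} in full
    if None in base:                                   # could not even read the base
        sets["?"].extend(rest)
        return sets
    if sum(base) in (0, K):                            # l_{i_t} is 0^K or 1^K
        sets[">=" if sum(base) == 0 else "<="].extend(rest)
        return sets
    ones = [q for q in range(K) if base[q] == 1]
    zeros = [q for q in range(K) if base[q] == 0]
    probe = zeros if len(ones) > len(zeros) else ones  # query the cheaper side
    for i in rest:
        got = {q: ask(i, q) for q in probe}
        if None in got.values():                       # ran out mid-action
            sets["?"].append(i)
            continue
        if all(got[q] == base[q] for q in probe):      # all probed bits match
            sets["<=" if probe is zeros else ">="].append(i)
            continue
        for q in range(K):                             # mismatch: read the rest
            if q not in got:
                got[q] = ask(i, q)
        if None in got.values():
            sets["?"].append(i)
            continue
        bits = [got[q] for q in range(K)]
        sets["<" if bits < base else ">"].append(i)    # full comparison, MSB first
    return sets
```

Once the budget is exhausted, `ask` returns `None` and each remaining action falls into the unknown class; this matches how the analysis below charges at most $K-1$ wasted bits to the single partially queried action.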

From such information, $A_2$ can divide the $N$ actions into six sets, $I_<$, $I_\leq$, $I_=$, $I_\geq$, $I_>$, and $I_?$, in the following way. If $A_2$ knows $\ell_i^t < \ell_{i_t}^t$ or $\ell_i^t > \ell_{i_t}^t$, it puts action $i$ in $I_<$ or $I_>$, respectively. If $A_2$ only knows $\ell_i^t \leq \ell_{i_t}^t$ or $\ell_i^t \geq \ell_{i_t}^t$, it puts action $i$ in $I_\leq$ or $I_\geq$, respectively. If $A_2$ still does not know any relationship between $\ell_i^t$ and $\ell_{i_t}^t$ after running out of budget, it puts action $i$ in $I_?$. Finally, let $I_= = \{i_t\}$.

With such information at hand, $A_2$ derives the new distribution $\hat{p}^t$ from the distribution $p^t$ by trying to move probabilities to actions with a smaller loss. We say that the probabilities of some set $I$ of actions are moved to another set $I'$ of actions evenly if $\hat{p}_i^t = 0$ for $i \in I$ and $\hat{p}_i^t = p_i^t + \sum_{j \in I} p_j^t / |I'|$ for $i \in I'$. Formally, in each round $t$, $A_2$ replaces Step 1.2 of $A_1$ by the following (a sketch in the same style follows):

Step 1.2. $A_2$ derives the distribution $\hat{p}^t$ from $p^t$ by moving its probabilities in the following way:
If $I_< \neq \emptyset$, $A_2$ moves all the probabilities from $I_= \cup I_\leq \cup I_\geq \cup I_>$ to some $i_0 \in I_<$.
If $I_< = \emptyset \neq I_\leq$, $A_2$ moves all the probabilities from $I_= \cup I_\geq \cup I_>$ to $I_\leq$ evenly.
If $I_< = \emptyset = I_\leq$, $A_2$ moves all the probabilities from $I_\geq \cup I_>$ to $I_=$.

The other steps of the algorithm $A_1$ are all inherited without change by the algorithm $A_2$, except that now $A_2$ sets its learning rate as

(4.4)  $\eta = \begin{cases} \delta/N & \text{if } D = 0, \\ \sqrt{NK(\ln N)/(2TD)} & \text{if } D > 0. \end{cases}$
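Continuing the sketch above, the redistribution of Step 1.2 can be written as follows (ours, not the paper's; `sets` is the hypothetical dictionary returned by the query-step sketch).

```python
def a2_move_probabilities(p, sets):
    """Sketch of Step 1.2 of A2: derive p_hat from p using the query results."""
    p_hat = list(p)

    def move(sources, targets):
        """Move all probability mass from `sources` onto `targets`, evenly."""
        mass = sum(p_hat[i] for i in sources)
        for i in sources:
            p_hat[i] = 0.0
        for i in targets:
            p_hat[i] += mass / len(targets)

    if sets["<"]:
        # some action is known to be strictly better than i_t: pile mass onto one
        move(sets["="] + sets["<="] + sets[">="] + sets[">"], [sets["<"][0]])
    elif sets["<="]:
        # only "no worse than i_t" actions are known: spread the rest over them
        move(sets["="] + sets[">="] + sets[">"], sets["<="])
    else:
        # nothing better is known: move the provably no-better mass onto i_t
        move(sets[">="] + sets[">"], sets["="])
    return p_hat                          # actions in I_? keep their probability
```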

Next, we show that the algorithm $A_2$ indeed achieves the regret bound given in Theorem 4.1.

4.2 Proof of Theorem 4.1. We follow the analysis in Section 3. For each round $t$, let $s_t$ denote the amount of loss $A_2$ saves over $A_0$ by moving the probabilities around (and playing according to the distribution $\hat{p}^t$ instead of $p^t$), and let

$r_t = \eta \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t - s_t$.

According to Lemma 2.1 and the discussion in Section 3, we can bound the regret of $A_2$ as

(4.5)  $R_{A_2}^T \leq \frac{\ln N}{\eta} + \sum_{t=1}^{T} r_t$.

We then bound each $r_t$ by the following lemma.

Lemma 4.1. Let $D = \max\{0,\, NK/2 + 3K/2 - 1 - B\}$, and suppose $\eta \leq \delta/N$. Then for any $t \in [T]$,

$r_t \leq \frac{2\eta D}{NK}$.

We will prove the lemma in Section 5. For now, let us apply it to the bound in (4.5) and consider two cases, depending on the value of $D$. If $D = 0$, by choosing $\eta = \delta/N$, we have

$R_{A_2}^T \leq \frac{\ln N}{\eta} = \frac{N \ln N}{\delta}$.

If $D > 0$, by choosing $\eta = \sqrt{NK(\ln N)/(2TD)}$, which is at most $\delta/N$ for a large enough $T$, we have

$R_{A_2}^T \leq \frac{\ln N}{\eta} + \frac{2\eta DT}{NK} = \sqrt{\frac{8DT \ln N}{NK}}$.

This completes the proof of Theorem 4.1. □

5 Proof of Lemma 4.1
Consider any $t \in [T]$, and let $i_t$ be the heaviest action, which $A_2$ queries first in round $t$. Recall that

$r_t = \eta \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t - s_t$,

where $s_t$ is the saving of loss in round $t$ from playing according to the probability distribution $\hat{p}^t$ instead of $p^t$. Our goal is to show that

(5.6)  $r_t \leq \frac{2\eta D}{NK}$,

where $D = \max\{0,\, NK/2 + 3K/2 - 1 - B\}$. For this, we consider two cases, depending on whether or not $|I_<| = 0$.

First, let us consider the easier case that $|I_<| \neq 0$. In this case, the algorithm $A_2$ moves the probability $p_{i_t}^t$ from the action $i_t$ (and possibly also probabilities from other actions) to some action $i_0 \in I_<$ with $\ell_{i_0}^t < \ell_{i_t}^t$, which means that the saving of loss is $s_t \geq p_{i_t}^t (\ell_{i_t}^t - \ell_{i_0}^t)$. Since $i_t$ is the heaviest action, we have $p_{i_t}^t \geq 1/N$, and since distinct loss values differ by at least $\delta$, we have $\ell_{i_t}^t - \ell_{i_0}^t \geq \delta$. As a result, we have

$r_t = \eta \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t - s_t \leq \eta - \frac{\delta}{N} \leq 0$,

by the assumption that $\eta \leq \delta/N$. Thus, the bound in (5.6) holds in this case.

Next, let us consider the more difficult case that $|I_<| = 0$. We rely on the following claim, which we prove in Subsection 5.1.

Claim 5.1. If $|I_<| = 0$, then we have

$r_t \leq \eta \sum_{k \in I_?} p_k^t - (\delta - \eta) \sum_{j \in I_>} p_j^t$  and  $|I_?| - |I_>| \leq \frac{2D}{K}$.

Recall that $A_2$ queries heavier actions before lighter ones, which implies that $p_k^t \leq p_j^t$ for any $k \in I_?$ and $j \notin I_?$. Now let $I_{?1}$ be the set of the $|I_>|$ heaviest actions in $I_?$, and let $I_{?2}$ be the set of the remaining actions in $I_?$, so that $I_{?2}$ consists of the $|I_{?2}| \leq 2D/K$ lightest actions among all the $N$ actions. Then we have

$\sum_{k \in I_?} p_k^t = \sum_{k \in I_{?1}} p_k^t + \sum_{k \in I_{?2}} p_k^t \leq \sum_{j \in I_>} p_j^t + \frac{|I_{?2}|}{N} \leq \sum_{j \in I_>} p_j^t + \frac{2D}{NK}$,

and substituting this into the bound in Claim 5.1, we obtain

$r_t \leq \eta \Big( \sum_{j \in I_>} p_j^t + \frac{2D}{NK} \Big) - (\delta - \eta) \sum_{j \in I_>} p_j^t = \frac{2\eta D}{NK} - (\delta - 2\eta) \sum_{j \in I_>} p_j^t \leq \frac{2\eta D}{NK}$,

since $\delta \geq 2\eta$ by the assumption that $\eta \leq \delta/N$. Thus, the bound in (5.6) also holds in the case that $|I_<| = 0$. To complete the proof of Lemma 4.1, it remains to prove Claim 5.1, which we do next.

5.1 Proof of Claim 5.1. Assume $|I_<| = 0$. Let us consider two cases according to the range of $\#_1(\ell_{i_t}^t)$, as the algorithm $A_2$ behaves differently in them.

Case 1: $\#_1(\ell_{i_t}^t) \leq K/2$. In this case, $A_2$ starts its queries on the positions corresponding to ones in $\ell_{i_t}^t$, and after finishing all the queries, each action $i \neq i_t$ belongs to one of the three sets $I_\geq$, $I_>$, and $I_?$. Since $A_2$ moves all the probabilities from $I_\geq \cup I_>$ to $I_= = \{i_t\}$, it reduces the loss of $A_0$ by at least

$\sum_{i \in I_\geq:\, \ell_i^t > \ell_{i_t}^t} p_i^t (\ell_i^t - \ell_{i_t}^t) + \sum_{j \in I_>} p_j^t (\ell_j^t - \ell_{i_t}^t)$,

which implies that

(5.7)  $s_t \geq \eta \sum_{i \in I_\geq:\, \ell_i^t > \ell_{i_t}^t} p_i^t + \delta \sum_{j \in I_>} p_j^t$,

since distinct loss values differ by at least $\delta \geq \eta$. On the other hand,

(5.8)  $\sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t \leq \sum_{i \in I_\geq:\, \ell_i^t > \ell_{i_t}^t} p_i^t + \sum_{j \in I_>} p_j^t + \sum_{k \in I_?} p_k^t$,

and note that $\eta$ times the first term in (5.8) is at most the first term in (5.7), since $\eta \leq \delta$. As a result,

$r_t = \eta \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t - s_t \leq \eta \Big( \sum_{j \in I_>} p_j^t + \sum_{k \in I_?} p_k^t \Big) - \delta \sum_{j \in I_>} p_j^t = \eta \sum_{k \in I_?} p_k^t - (\delta - \eta) \sum_{j \in I_>} p_j^t$.

Next, let us bound $|I_?| - |I_>|$. We may assume that $A_2$ runs out of budget in Step 1.1, because otherwise we have $|I_?| = 0$ and hence $|I_?| - |I_>| \leq 0 \leq 2D/K$. Assuming that no budget remains, and since the numbers of queries $A_2$ spends on the actions in $I_=$, $I_\geq$, $I_>$, and $I_?$ are at most $K$, $(K/2)|I_\geq|$, $K|I_>|$, and $K-1$, respectively, we have

$B \leq K + \frac{K}{2}|I_\geq| + K|I_>| + (K - 1)$.

On the other hand, we know that $N = 1 + |I_\geq| + |I_>| + |I_?|$, and by combining these two bounds to eliminate $|I_\geq|$, we obtain

$|I_?| - |I_>| \leq \frac{2}{K}\Big( \frac{NK}{2} + \frac{3K}{2} - 1 - B \Big) \leq \frac{2D}{K}$.

Case 2: $\#_1(\ell_{i_t}^t) > K/2$. In this case, $A_2$ starts its queries on the positions corresponding to zeros in $\ell_{i_t}^t$, and after finishing all the queries, each action $i \neq i_t$ belongs to one of the three sets $I_\leq$, $I_>$, and $I_?$. Since $A_2$ moves all the probabilities from $I_= \cup I_>$ to $I_\leq$ evenly, it reduces the loss of $A_0$ by at least

$\sum_{i \in I_\leq:\, \ell_i^t < \ell_{i_t}^t} \frac{p_{i_t}^t}{|I_\leq|} (\ell_{i_t}^t - \ell_i^t) + \sum_{j \in I_>} \sum_{i \in I_\leq} \frac{p_j^t}{|I_\leq|} (\ell_j^t - \ell_i^t)$,

which implies that

(5.9)  $s_t \geq \frac{\delta}{N} \sum_{i \in I_\leq:\, \ell_i^t < \ell_{i_t}^t} p_i^t + \delta \sum_{j \in I_>} p_j^t$.

On the other hand,

(5.10)  $\sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t \leq \sum_{i \in I_\leq:\, \ell_i^t < \ell_{i_t}^t} p_i^t + \sum_{j \in I_>} p_j^t + \sum_{k \in I_?} p_k^t$,

and note that again $\eta$ times the first term in (5.10) is at most the first term in (5.9), because $\eta \leq \delta/N$ and $p_{i_t}^t \geq p_i^t$ for any $i$. Therefore, by subtracting (5.9) from $\eta$ times (5.10), we obtain the same bound for $r_t$ as in Case 1. Furthermore, following a similar argument as in Case 1, one can show that

$B \leq K + \frac{K}{2}|I_\leq| + K|I_>| + (K - 1)$  and  $N = 1 + |I_\leq| + |I_>| + |I_?|$,

which together give $|I_?| - |I_>| \leq 2D/K$. □

6 Lower Bounds
In this section, we provide regret lower bounds which almost match the upper bounds achieved by our algorithm. The result of this section is the following; for the simplicity of our presentation, we assume here that $K$ is even.

Theorem 6.1. Suppose $\varepsilon$ is any constant in $(0,1)$ and $c$ is a large enough constant. Then any algorithm $A$ for the general online learning problem must have

$R_A^T \geq \begin{cases} \Omega(\sqrt{T \ln N}) & \text{if } B \leq (N - \varepsilon N)K/2, \\ \Omega(\sqrt{(T \ln N)/N}) & \text{if } B \leq (N - c)K/2, \end{cases}$

for a large enough $N$.

Here we do not attempt to prove a matching lower bound for the case which has the $(N \ln N)/\delta$ regret upper bound in Theorem 4.1, since we consider that bound to be extremely small: it can be seen as a constant in terms of $T$. Our proof of Theorem 6.1 basically follows the approach of [9, 3] for proving lower bounds on approximately solving a game. A key tool used there is a lower bound on the tail of the binomial distribution, while for our proof, we need the following bound for more general distributions.

Lemma 6.1. Suppose that $\mu, \delta_1, \delta_2, \ldots, \delta_n$ are constants in $(0,1)$, and $X_1, X_2, \ldots, X_n$ are independent random variables such that for each $i \in [n]$, $\Pr[X_i = \mu - \delta_i] = \Pr[X_i = \mu + \delta_i] = 1/2$. Then for any $\lambda \geq c/\sqrt{n}$ for a large enough constant $c$, we have

$\Pr\Big[ \sum_i X_i \leq (1 - \lambda)\mu n \Big] \geq e^{-O(\lambda^2 n)}$.

We will prove the lemma in Subsection 6.1; for now, let us proceed to prove Theorem 6.1.

Proof. (of Theorem 6.1) Consider any algorithm $A$. We would like to show the existence of a sequence of $T$ loss vectors from which $A$ suffers a large regret. We prove its existence by the probabilistic method: we generate the $T$ loss vectors in some probabilistic way. Let us view the $T$ loss vectors as an $N \times T$ matrix, in which the entry in row $i \in [N]$ and column $t \in [T]$ is the loss value of action $i$ in round $t$. Here we consider $2^K$ possible loss values in the range from $0$ to $1 - 2^{-K}$, with the natural $K$-bit binary representation. We would like each entry to be independently distributed and have the same expected value $\mu$, for some constant $\mu$. This means that the expected loss of $A$ in each round is exactly $\mu$, and the loss vectors are independent of each other. Thus, a Chernoff-Hoeffding bound shows that

(6.11)  $\Pr[L_A^T \leq \mu T - v] \leq e^{-\Omega(v^2/T)}$,

for any $v > 0$.

To make $A$ spend as many queries as possible on an entry without figuring out its relationship with $\mu$, we choose $\mu$ to have the binary representation $(01)^{K/2}$, which has alternating zeros and ones. Then in each round, we answer queries and sample entries in the following way. For each query to some bit of an entry, we answer it with the corresponding bit of $\mu$. After answering all the queries, some bits of the loss vector have been fixed, and some remain free. For any entry with two adjacent bits which have not been fixed (the corresponding two bits of $\mu$ must have different values), we make the entry uncertain for $A$ as follows: set those two bits of the entry to 00 or 10 with equal probability if they are 01 in $\mu$, and set them to 01 or 11 with equal probability if they are 10 in $\mu$. All the other bits are then fixed to those of $\mu$. In this way, each entry indeed has expected value $\mu$ and is independent of the others (although some have the fixed value $\mu$), and we can see each uncertain entry as a random variable satisfying the condition in Lemma 6.1.
For the clarity of our presentation, we assume here that $\mu$ and each $\delta_i$ are constants, but it is not hard to derive the dependence on them in our bounds. Next, we analyze the regret by considering two cases, depending on the range of $B$.
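As an illustration of this construction (ours, not the paper's), the following sketch samples one entry after the round's queries have been answered with the bits of $\mu = (01)^{K/2}$; the hypothetical argument `queried_positions` is the set of bit positions of this entry that $A$ queried.

```python
import random

def sample_uncertain_entry(K, queried_positions):
    """Sketch of the adversary's sampling rule in the proof of Theorem 6.1."""
    mu_bits = [q % 2 for q in range(K)]               # (01)^{K/2}, MSB first
    free = [q for q in range(K) if q not in queried_positions]
    pair = next((q for q in free if q + 1 in free), None)  # adjacent free pair?
    bits = list(mu_bits)
    if pair is not None:
        # the two bits of mu at (pair, pair+1) are 01 or 10; flip them symmetrically
        if bits[pair] == 0:                           # "01" -> "00" or "10"
            bits[pair:pair + 2] = random.choice([[0, 0], [1, 0]])
        else:                                         # "10" -> "01" or "11"
            bits[pair:pair + 2] = random.choice([[0, 1], [1, 1]])
    return sum(b * 2.0 ** (-(q + 1)) for q, b in enumerate(bits))
```

Each entry comes out either as exactly $\mu$, or as $\mu \pm 2^{-(q+2)}$ with equal probability for some position $q$, which is precisely the two-point distribution required by Lemma 6.1.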

Case 1: $B \leq (N - \varepsilon N)K/2$ for some constant $\varepsilon \in (0,1)$. Since it takes at least $K/2$ queries to an entry to avoid its uncertainty (otherwise, the queries must miss some pair of adjacent bits), there must be at least $\varepsilon N$ actions whose corresponding entries in the loss vector are left uncertain after the budget runs out in each round. Thus, the total number of uncertain entries in the matrix is at least $\varepsilon NT$, which implies the existence of a collection $S$ of $\varepsilon N/2$ actions (rows), each of which has uncertain entries in $\varepsilon T/2$ rounds (columns). This is because otherwise the total number of uncertain entries in the matrix would be less than $(\varepsilon N/2)T + N(\varepsilon T/2) = \varepsilon NT$, a contradiction.

Now consider any action in $S$ and the $n = \varepsilon T/2$ rounds in which it has uncertain entries (fixing any additional uncertain entries to $\mu$). By applying Lemma 6.1 to those rounds, one can show that the accumulated loss of that action in those rounds is at most $(1-\lambda)\mu n$ with probability at least $e^{-O(\lambda^2 n)} \geq e^{-(1/2)\ln N} = 1/\sqrt{N}$, for some $\lambda = \Theta(\sqrt{(\ln N)/n})$, and when this happens, its total loss in $T$ rounds is at most

$(1-\lambda)\mu n + \mu(T - n) = \mu T - \lambda \mu n \leq \mu T - \Omega(\sqrt{T \ln N})$.

Therefore, the probability that some action in $S$ has such a total loss is at least $1 - (1 - 1/\sqrt{N})^{|S|} \geq 1 - e^{-\Omega(\sqrt{N})}$. On the other hand, using the bound in (6.11) with $v = \sqrt{T}$, we have $\Pr[L_A^T \leq \mu T - v] \leq e^{-\Omega(1)}$. As a result, we can conclude that

$R_A^T \geq (\mu T - v) - \big( \mu T - \Omega(\sqrt{T \ln N}) \big) = \Omega(\sqrt{T \ln N}) - \sqrt{T} = \Omega(\sqrt{T \ln N})$,

with probability at least $1 - e^{-\Omega(\sqrt{N})} - e^{-\Omega(1)} > 0$, for a large enough $N$. This implies the existence of a sequence of $T$ loss vectors from which the algorithm $A$ suffers such a large regret.

Case 2: $B \leq (N - c)K/2$ for a large enough constant $c$. Following the same reasoning as in Case 1, one can show that there must be at least $c$ uncertain entries in each round (column), and thus the total number of uncertain entries in the matrix is at least $cT$. We then claim that either there are $r(c-2)/2$ actions (rows), each of which has uncertain entries in $T/e^r$ rounds, for some $r \leq \bar{r} = \ln N - \ln\ln N$, or there are $N/\ln N$ actions, each of which has uncertain entries in $T/N$ rounds. This is because otherwise the total number of uncertain entries in the rows with the most uncertain entries would be less than

$\sum_{r=1}^{\bar{r}} \frac{c-2}{2} \cdot \frac{T}{e^{r-1}} < \frac{c-2}{2}\, T \sum_{r \geq 0} e^{-r} < (c-2)T$,

while the total number of uncertain entries in the remaining rows would be less than $(N/\ln N)(T/e^{\bar{r}}) + N(T/N) = 2T$, so the total number of uncertain entries in the matrix would be less than $(c-2)T + 2T = cT$, a contradiction.

Now let us first consider the subcase in which there are $r(c-2)/2$ actions, each of which has uncertain entries in $n = T/e^r$ rounds, for some $r \leq \bar{r}$. In this subcase, we can choose $\lambda = \Theta(\sqrt{1/n})$ and follow the argument in Case 1 to show that one of these actions has a total loss of at most

$\mu T - \lambda \mu n \leq \mu T - \Omega(\sqrt{T/e^r})$

with probability at least $1 - (1 - e^{-O(1)})^{r(c-2)/2} \geq 1 - e^{-\Omega(cr)}$. On the other hand, using the bound in (6.11) with $v = c_0\sqrt{T/e^r}$ for a small enough constant $c_0$, we have $\Pr[L_A^T \leq \mu T - v] \leq e^{-\Omega(e^{-r})}$, which implies that

$R_A^T \geq (\mu T - v) - \big( \mu T - \Omega(\sqrt{T/e^r}) \big) = \Omega(\sqrt{T/e^r}) - c_0\sqrt{T/e^r} = \Omega(\sqrt{T/e^r}) \geq \Omega(\sqrt{(T \ln N)/N})$,

with probability at least

$1 - e^{-\Omega(cr)} - e^{-\Omega(e^{-r})} \geq 1 - e^{-\Omega(cr)} - \big( 1 - \Omega(e^{-r}) \big) = \Omega(e^{-r}) - e^{-\Omega(cr)} > 0$,

for any $r \in [1, \bar{r}]$ and a large enough constant $c$.

Next, let us consider the subcase in which there are $N/\ln N$ actions, each of which has uncertain entries in $n = T/N$ rounds. In this subcase, we can choose $\lambda = \Theta(\sqrt{(\ln N)/n})$ and follow the argument in Case 1 to show that one of these actions has a total loss of at most

$\mu T - \lambda \mu n \leq \mu T - \Omega(\sqrt{(T \ln N)/N})$

with probability at least $1 - (1 - 1/\sqrt{N})^{N/\ln N} \geq 1 - e^{-\Omega(\sqrt{N}/\ln N)}$. On the other hand, using the bound in (6.11) with $v = c_0\sqrt{(T \ln N)/N}$ for a small enough constant $c_0$, we have $\Pr[L_A^T \leq \mu T - v] \leq e^{-\Omega((\ln N)/N)}$, which implies that

$R_A^T \geq (\mu T - v) - \big( \mu T - \Omega(\sqrt{(T \ln N)/N}) \big) = \Omega(\sqrt{(T \ln N)/N}) - c_0\sqrt{(T \ln N)/N} = \Omega(\sqrt{(T \ln N)/N})$,

with probability at least

$1 - e^{-\Omega(\sqrt{N}/\ln N)} - e^{-\Omega((\ln N)/N)} \geq 1 - e^{-\Omega(\sqrt{N}/\ln N)} - \big( 1 - \Omega((\ln N)/N) \big) = \Omega((\ln N)/N) - e^{-\Omega(\sqrt{N}/\ln N)} > 0$,

for a large enough $N$. From these two subcases, we can conclude the existence of a sequence of $T$ loss vectors such that $R_A^T \geq \Omega(\sqrt{(T \ln N)/N})$. □

6.1 Proof of Lemma 6.1. Let $Y = (Y_1, Y_2, \ldots, Y_n)$ be a sequence of independent random variables with $\Pr[Y_i = 1] = \Pr[Y_i = -1] = 1/2$ for each $i \in [n]$. It is known that for any $\alpha \in (0,1)$,

(6.12)  $\Pr\Big[ \sum_i Y_i \leq -\alpha n \Big] \geq e^{-O(\alpha^2 n)}$,

which can be shown using Stirling's formula. Note that each random variable $X_i$ has the same distribution as $\mu + \delta_i Y_i$, and thus

$\Pr\Big[ \sum_i X_i \leq (1-\lambda)\mu n \Big] = \Pr\Big[ \sum_i (\mu + \delta_i Y_i) \leq (1-\lambda)\mu n \Big] = \Pr\Big[ \sum_i \delta_i Y_i \leq -\lambda\mu n \Big]$.

Let $\bar{\delta} = \sum_i \delta_i / n$ and let $\gamma = \lambda\mu/\bar{\delta}$, so that $\lambda\mu n = \gamma\bar{\delta}n$. Let $A$ denote the event that $\sum_i \delta_i Y_i \leq -\gamma\bar{\delta}n$; our goal now becomes to bound $\Pr[A]$. For this, we consider another, related event, denoted $B$, that $\sum_i Y_i \leq -2\gamma n$, and we know from (6.12) that $\Pr[B] \geq e^{-O(\gamma^2 n)}$. Observe that in the simpler case when all the $\delta_i$'s are the same (and thus equal to $\bar{\delta}$), event $B$ implies event $A$, so that we have $\Pr[A] \geq \Pr[B]$. However, when the $\delta_i$'s are different, event $B$ does not necessarily imply event $A$, so $\Pr[A]$ may not be as large as $\Pr[B]$ in general. Still, we will show that $\Pr[A]$ is in fact almost as large as $\Pr[B]$. One approach is to use the inequality $\Pr[A] \geq \Pr[A \cap B] = \Pr[B] \Pr[A \mid B]$ and show that $\Pr[A \mid B]$ is large. However, it turns out to require some tedious calculation to bound $\Pr[A \mid B]$, so we take a slightly different approach. Let us decompose the event $B$ into several disjoint events in the following way. For any integer $t \leq n$, let $B_t$ be the event that exactly $t$ of the $n$ random variables $Y_1, Y_2, \ldots, Y_n$ have the value $-1$, or equivalently, $\sum_i Y_i = n - 2t$. Since $n - 2t \leq -2\gamma n$ if and only if $t \geq (1/2 + \gamma)n$, we have

$B = \bigcup_{t \geq (1/2+\gamma)n} B_t$  and  $\Pr[B] = \sum_{t \geq (1/2+\gamma)n} \Pr[B_t]$.

Then we use the following bound:

(6.13)  $\Pr[A] \geq \Pr\Big[ A \cap \Big( \bigcup_{t \geq (1/2+\gamma)n} B_t \Big) \Big] = \sum_{t \geq (1/2+\gamma)n} \Pr[A \cap B_t] = \sum_{t \geq (1/2+\gamma)n} \Pr[A \mid B_t]\, \Pr[B_t]$,

so it suffices to show that each $\Pr[A \mid B_t]$ is large. Let us fix any integer $t \geq (1/2+\gamma)n$; we next show that $\Pr[A \mid B_t] \geq 1/2$ by proving that $\Pr[\neg A \mid B_t] \leq 1/2$. Observe that the distribution of $Y = (Y_1, Y_2, \ldots, Y_n)$ conditioned on $B_t$ is the same as that of sampling uniformly from the strings in $\{-1,1\}^n$ with exactly $t$ entries equal to $-1$; let $Z = (Z_1, Z_2, \ldots, Z_n)$ denote such a conditional distribution. Then we have

(6.14)  $\Pr[\neg A \mid B_t] = \Pr\Big[ \sum_i \delta_i Z_i > -\gamma\bar{\delta}n \Big]$,

which we bound using the second moment method. Note that all the random variables $Z_1, Z_2, \ldots, Z_n$ have the same distribution and thus the same expected value, which we denote by $\beta$, with

$\beta = \frac{n-t}{n} - \frac{t}{n} = \frac{n-2t}{n} \leq -2\gamma$.

Furthermore, any two of the random variables are negatively correlated, in the following sense.

Claim 6.1. For any distinct $i, j \in [n]$, $\mathrm{E}[Z_i Z_j] \leq \mathrm{E}[Z_i]\,\mathrm{E}[Z_j]$.

We will prove the claim later. Now observe that the probability in (6.14) equals

$\Pr\Big[ \sum_i \delta_i (Z_i - \beta) > -\gamma\bar{\delta}n - \beta \sum_i \delta_i \Big] \leq \Pr\Big[ \sum_i \delta_i (Z_i - \beta) > \gamma\bar{\delta}n \Big]$,

since $-\gamma\bar{\delta}n - \beta\sum_i \delta_i \geq -\gamma\bar{\delta}n + 2\gamma\bar{\delta}n = \gamma\bar{\delta}n$. The probability above is at most

$\frac{\mathrm{E}\big[ \big( \sum_i \delta_i (Z_i - \beta) \big)^2 \big]}{(\gamma\bar{\delta}n)^2}$

by Markov's inequality, and the numerator equals

$\sum_{i,j \in [n]} \delta_i \delta_j\, \mathrm{E}[(Z_i - \beta)(Z_j - \beta)] = \sum_{i,j \in [n]} \delta_i \delta_j \big( \mathrm{E}[Z_i Z_j] - \beta^2 \big)$.

Note that when $i \neq j$, we have $\mathrm{E}[Z_i Z_j] - \beta^2 = \mathrm{E}[Z_i Z_j] - \mathrm{E}[Z_i]\mathrm{E}[Z_j] \leq 0$ by Claim 6.1, and when $i = j$, we have $\mathrm{E}[Z_i Z_j] - \beta^2 \leq \mathrm{E}[Z_i Z_j] = 1$. Combining all these bounds together, we have

$\Pr[\neg A \mid B_t] \leq \frac{\sum_i \delta_i^2}{(\gamma\bar{\delta}n)^2} \leq \frac{\sum_i \delta_i}{\gamma^2\bar{\delta}^2 n^2} = \frac{1}{\gamma^2\bar{\delta}n}$.

Since we assume that $\lambda \geq c/\sqrt{n}$ for a large enough constant $c$, we have $\gamma^2\bar{\delta}n = \lambda^2\mu^2 n/\bar{\delta} \geq c^2\mu^2/\bar{\delta} \geq 2$, and thus

$\Pr[\neg A \mid B_t] \leq \frac{1}{\gamma^2\bar{\delta}n} \leq \frac{1}{2}$.

Finally, by substituting the above bound into (6.13), we have

$\Pr[A] \geq \sum_{t \geq (1/2+\gamma)n} \frac{1}{2} \Pr[B_t] = \frac{1}{2} \Pr[B]$,

and then by applying (6.12) to bound $\Pr[B]$, we obtain $\Pr[A] \geq \frac{1}{2} e^{-O(\gamma^2 n)} = e^{-O(\lambda^2 n)}$, as $\gamma = \lambda\mu/\bar{\delta} = \Theta(\lambda)$. Thus, to finish the proof of Lemma 6.1, it remains to prove Claim 6.1, which we do next.

Proof. (of Claim 6.1) Fix any distinct $i, j \in [n]$. Note that we have

$\Pr[Z_j = 1 \mid Z_i = -1] = \frac{n-t}{n-1} \geq \frac{n-t}{n} = \Pr[Z_j = 1]$,

which implies that

$\mathrm{E}[Z_j \mid Z_i = -1] = 2\Pr[Z_j = 1 \mid Z_i = -1] - 1 \geq 2\Pr[Z_j = 1] - 1 = \mathrm{E}[Z_j]$,

and we also have

$\Pr[Z_j = 1 \mid Z_i = 1] = \frac{n-t-1}{n-1} \leq \frac{n-t}{n} = \Pr[Z_j = 1]$,

which implies that

$\mathrm{E}[Z_j \mid Z_i = 1] = 2\Pr[Z_j = 1 \mid Z_i = 1] - 1 \leq 2\Pr[Z_j = 1] - 1 = \mathrm{E}[Z_j]$.

As a result, we have

$\mathrm{E}[Z_i Z_j] = \Pr[Z_i = 1]\,\mathrm{E}[Z_j \mid Z_i = 1] - \Pr[Z_i = -1]\,\mathrm{E}[Z_j \mid Z_i = -1] \leq \Pr[Z_i = 1]\,\mathrm{E}[Z_j] - \Pr[Z_i = -1]\,\mathrm{E}[Z_j] = \mathrm{E}[Z_i]\,\mathrm{E}[Z_j]$. □

References
[1] J. Abernethy, P. Bartlett, and A. Rakhlin. Multitask learning with expert advice. In Proceedings of the 20th Annual Conference on Learning Theory (COLT), 2007.
[2] D. Angluin, J. Aspnes, J. Chen, and L. Reyzin. Learning large-alphabet and analog circuits with value injection queries. In Proceedings of the 20th Annual Conference on Learning Theory (COLT), 2007.
[3] S. Arora, E. Hazan, and S. Kale. The multiplicative weights update method: a meta algorithm and applications. Manuscript.
[4] S. Ben-David, D. Pal, and S. Shalev-Shwartz. Agnostic online learning. In Proceedings of the 22nd Annual Conference on Learning Theory (COLT), 2009.
[5] A. Blum and Y. Mansour. Learning, regret minimization, and equilibria. In Algorithmic Game Theory, Cambridge University Press, New York, 2007.
[6] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, 2006.
[7] E. Even-Dar, M. Kearns, Y. Mansour, and J. Wortman. Regret to the best vs. regret to the average. In Proceedings of the 20th Annual Conference on Learning Theory (COLT), 2007.
[8] E. Even-Dar, R. Kleinberg, S. Mannor, and Y. Mansour. Online learning for global cost functions. In Proceedings of the 22nd Annual Conference on Learning Theory (COLT), 2009.
[9] Y. Freund and R. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29, 1999.
[10] S. Guha and K. Munagala. Approximation algorithms for budgeted learning problems. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC), 2007.
[11] A. György, G. Lugosi, and G. Ottucsák. On-line sequential bin packing. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), 2008.
[12] D. Haussler, J. Kivinen, and M. K. Warmuth. Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory, 44(5), 1998.
[13] E. Hazan, A. Kalai, S. Kale, and A. Agarwal. Logarithmic regret algorithms for online convex optimization. In Proceedings of the 19th Annual Conference on Learning Theory (COLT), 2006.
[14] E. Hazan and S. Kale. Extracting certainty from uncertainty: regret bounded by variation in costs. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), 2008.
[15] E. Hazan and N. Megiddo. Online learning with prior knowledge. In Proceedings of the 20th Annual Conference on Learning Theory (COLT), 2007.
[16] E. Hazan and C. Seshadhri. Adaptive algorithms for online decision problems. Electronic Colloquium on Computational Complexity (ECCC), TR07-088, 2007.
[17] G. Lugosi, O. Papaspiliopoulos, and G. Stoltz. Online multi-task learning with hard constraints. In Proceedings of the 22nd Annual Conference on Learning Theory (COLT), 2009.
[18] M. K. Warmuth and D. Kuzmin. Online variance minimization. In Proceedings of the 19th Annual Conference on Learning Theory (COLT), 2006.
[19] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003.

A Proof of Lemma 2.1
Recall the update rule from (2.2), that for any $t \in [T]$ and $i \in [N]$, $w_i^{t+1} = w_i^t e^{-\eta \ell_i^t}$, and recall that $W^t = \sum_i w_i^t$. Following a standard analysis of the weighted average algorithm (see e.g. [5, 6]), we have

$\ln \frac{W^{T+1}}{W^1} \geq \ln \frac{e^{-\eta \sum_{t \in [T]} \ell_i^t}}{N}$ for any $i \in [N]$, so that in particular

$\ln \frac{W^{T+1}}{W^1} \geq \ln \frac{e^{-\eta L_{\min}^T}}{N} = -\eta L_{\min}^T - \ln N$,

and moreover

$\ln \frac{W^{T+1}}{W^1} = \sum_{t=1}^{T} \ln \frac{W^{t+1}}{W^t} = \sum_{t=1}^{T} \ln \sum_i \frac{w_i^t e^{-\eta \ell_i^t}}{W^t} = \sum_{t=1}^{T} \ln \sum_i p_i^t e^{-\eta \ell_i^t}$.

To get the specific bound of the lemma, we rely on the following claim.

Claim A.1. Suppose $\eta \in [0, 1/2]$, $p_i, \ell_i \in [0,1]$ for every $i \in [N]$, and $\sum_i p_i = 1$. Then for any $i^* \in [N]$,

$\ln \sum_i p_i e^{-\eta \ell_i} \leq -\eta \sum_i p_i \ell_i + \eta^2 \sum_{i: \ell_i \neq \ell_{i^*}} p_i$.

We will prove the claim later. Assuming it for now and combining it (applied in each round $t$ with $i^* = i_t$) with the bounds above, we have

$-\eta L_{\min}^T - \ln N \leq \ln \frac{W^{T+1}}{W^1} \leq -\eta \sum_{t=1}^{T} \sum_i p_i^t \ell_i^t + \eta^2 \sum_{t=1}^{T} \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t = -\eta L_{A_0}^T + \eta^2 \sum_{t=1}^{T} \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t$,

which implies that

$L_{A_0}^T - L_{\min}^T \leq \frac{\ln N}{\eta} + \eta \sum_{t=1}^{T} \sum_{i: \ell_i^t \neq \ell_{i_t}^t} p_i^t$.

Thus, to complete the proof of Lemma 2.1, it remains to prove Claim A.1, which we do next.

A.1 Proof of Claim A.1. Consider the function $f$ on $x = (x_1, \ldots, x_N) \in [0,1]^N$ defined by

$f(x) = \ln \sum_i p_i e^{-\eta x_i}$.

Our goal is then to bound the value of $f$ at the point $\ell = (\ell_1, \ell_2, \ldots, \ell_N)$. Using Taylor's theorem, by expanding $f$ at the point $\ell^* = (\ell_{i^*}, \ell_{i^*}, \ldots, \ell_{i^*})$, we have

(A.1)  $f(\ell) = f(\ell^*) + \sum_i \frac{\partial f(\ell^*)}{\partial x_i} (\ell_i - \ell_{i^*})$

(A.2)  $\qquad\quad + \frac{1}{2} \sum_{i,j \in [N]} \frac{\partial^2 f(v)}{\partial x_i \partial x_j} (\ell_i - \ell_{i^*})(\ell_j - \ell_{i^*})$,

for some $v \in [0,1]^N$. Since $\sum_i p_i = 1$, we have $f(\ell^*) = \ln \sum_i p_i e^{-\eta \ell_{i^*}} = -\eta \ell_{i^*}$, and it remains to bound the two terms in (A.1) and (A.2). Let $h(x) = \sum_i p_i e^{-\eta x_i}$, so that $f(x) = \ln h(x)$, and let

$g_i(x) = -\frac{\partial h(x)}{\partial x_i} = \eta p_i e^{-\eta x_i}$, for $i \in [N]$.

Then it is not hard to show that

$\frac{\partial f(x)}{\partial x_i} = -\frac{g_i(x)}{h(x)}$

and

$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} = \begin{cases} \eta\, \frac{g_i(x)}{h(x)} - \Big( \frac{g_i(x)}{h(x)} \Big)^2 & \text{if } i = j, \\[4pt] -\frac{g_i(x) g_j(x)}{h^2(x)} & \text{if } i \neq j. \end{cases}$

Using this, the term in (A.1) can be written as

$\sum_i \Big( -\frac{g_i(\ell^*)}{h(\ell^*)} \Big) (\ell_i - \ell_{i^*}) = \sum_i (-\eta p_i)(\ell_i - \ell_{i^*}) = \eta \ell_{i^*} - \eta \sum_i p_i \ell_i$,

while the term in (A.2) can be written as

$\frac{1}{2} \sum_i \Big( \eta\, \frac{g_i(v)}{h(v)} - \Big( \frac{g_i(v)}{h(v)} \Big)^2 \Big) (\ell_i - \ell_{i^*})^2 - \sum_{1 \leq i < j \leq N} \frac{g_i(v) g_j(v)}{h^2(v)} (\ell_i - \ell_{i^*})(\ell_j - \ell_{i^*})$

$= \frac{1}{2} \sum_i \eta\, \frac{g_i(v)}{h(v)} (\ell_i - \ell_{i^*})^2 - \frac{1}{2} \Big( \sum_i \frac{g_i(v)}{h(v)} (\ell_i - \ell_{i^*}) \Big)^2$

$\leq \frac{1}{2} \sum_i \eta\, \frac{g_i(v)}{h(v)} (\ell_i - \ell_{i^*})^2 \leq \eta^2 \sum_i p_i (\ell_i - \ell_{i^*})^2$,

where the last line follows from the fact that with $\eta \in [0, 1/2]$,

$\frac{g_i(v)}{h(v)} = \frac{\eta p_i e^{-\eta v_i}}{\sum_j p_j e^{-\eta v_j}} \leq \frac{\eta p_i e^0}{e^{-1/2}} \leq 2\eta p_i$.

Finally, by combining all these bounds together, we have

$f(\ell) \leq -\eta \ell_{i^*} + \eta \ell_{i^*} - \eta \sum_i p_i \ell_i + \eta^2 \sum_i p_i (\ell_i - \ell_{i^*})^2 \leq -\eta \sum_i p_i \ell_i + \eta^2 \sum_{i: \ell_i \neq \ell_{i^*}} p_i$,

by using the fact that $(\ell_i - \ell_{i^*})^2 \leq 1$ when $\ell_i \neq \ell_{i^*}$ (and $= 0$ otherwise). This proves Claim A.1. □
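To close, here is a small numerical sanity check of Claim A.1 (our addition, not part of the paper): it draws random instances and verifies that the left-hand side never exceeds the right-hand side.

```python
import math
import random

def claim_a1_gap(N=8, eta=0.3, trials=10000):
    """Return the largest observed value of lhs - rhs over random instances of
    Claim A.1; the claim predicts this is never positive for eta in [0, 1/2]."""
    worst = float("-inf")
    for _ in range(trials):
        raw = [random.random() for _ in range(N)]
        p = [x / sum(raw) for x in raw]                 # random distribution
        l = [random.choice([0.0, 0.25, 0.5, 0.75, 1.0]) for _ in range(N)]
        i_star = random.randrange(N)                    # random reference index
        lhs = math.log(sum(pi * math.exp(-eta * li) for pi, li in zip(p, l)))
        rhs = -eta * sum(pi * li for pi, li in zip(p, l)) \
              + eta ** 2 * sum(pi for pi, li in zip(p, l) if li != l[i_star])
        worst = max(worst, lhs - rhs)
    return worst

print(claim_a1_gap())  # expected: a non-positive number
```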


More information

Predator - Prey Model Trajectories and the nonlinear conservation law

Predator - Prey Model Trajectories and the nonlinear conservation law Predaor - Prey Model Trajecories and he nonlinear conservaion law James K. Peerson Deparmen of Biological Sciences and Deparmen of Mahemaical Sciences Clemson Universiy Ocober 28, 213 Ouline Drawing Trajecories

More information

Longest Common Prefixes

Longest Common Prefixes Longes Common Prefixes The sandard ordering for srings is he lexicographical order. I is induced by an order over he alphabe. We will use he same symbols (,

More information

Optimality Conditions for Unconstrained Problems

Optimality Conditions for Unconstrained Problems 62 CHAPTER 6 Opimaliy Condiions for Unconsrained Problems 1 Unconsrained Opimizaion 11 Exisence Consider he problem of minimizing he funcion f : R n R where f is coninuous on all of R n : P min f(x) x

More information

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 175 CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 10.1 INTRODUCTION Amongs he research work performed, he bes resuls of experimenal work are validaed wih Arificial Neural Nework. From he

More information

Let us start with a two dimensional case. We consider a vector ( x,

Let us start with a two dimensional case. We consider a vector ( x, Roaion marices We consider now roaion marices in wo and hree dimensions. We sar wih wo dimensions since wo dimensions are easier han hree o undersand, and one dimension is a lile oo simple. However, our

More information

BU Macro BU Macro Fall 2008, Lecture 4

BU Macro BU Macro Fall 2008, Lecture 4 Dynamic Programming BU Macro 2008 Lecure 4 1 Ouline 1. Cerainy opimizaion problem used o illusrae: a. Resricions on exogenous variables b. Value funcion c. Policy funcion d. The Bellman equaion and an

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

Math 2142 Exam 1 Review Problems. x 2 + f (0) 3! for the 3rd Taylor polynomial at x = 0. To calculate the various quantities:

Math 2142 Exam 1 Review Problems. x 2 + f (0) 3! for the 3rd Taylor polynomial at x = 0. To calculate the various quantities: Mah 4 Eam Review Problems Problem. Calculae he 3rd Taylor polynomial for arcsin a =. Soluion. Le f() = arcsin. For his problem, we use he formula f() + f () + f ()! + f () 3! for he 3rd Taylor polynomial

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

Essential Microeconomics : OPTIMAL CONTROL 1. Consider the following class of optimization problems

Essential Microeconomics : OPTIMAL CONTROL 1. Consider the following class of optimization problems Essenial Microeconomics -- 6.5: OPIMAL CONROL Consider he following class of opimizaion problems Max{ U( k, x) + U+ ( k+ ) k+ k F( k, x)}. { x, k+ } = In he language of conrol heory, he vecor k is he vecor

More information

Games Against Nature

Games Against Nature Advanced Course in Machine Learning Spring 2010 Games Agains Naure Handous are joinly prepared by Shie Mannor and Shai Shalev-Shwarz In he previous lecures we alked abou expers in differen seups and analyzed

More information

STA 114: Statistics. Notes 2. Statistical Models and the Likelihood Function

STA 114: Statistics. Notes 2. Statistical Models and the Likelihood Function STA 114: Saisics Noes 2. Saisical Models and he Likelihood Funcion Describing Daa & Saisical Models A physicis has a heory ha makes a precise predicion of wha s o be observed in daa. If he daa doesn mach

More information

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1 SZG Macro 2011 Lecure 3: Dynamic Programming SZG macro 2011 lecure 3 1 Background Our previous discussion of opimal consumpion over ime and of opimal capial accumulaion sugges sudying he general decision

More information

Some Ramsey results for the n-cube

Some Ramsey results for the n-cube Some Ramsey resuls for he n-cube Ron Graham Universiy of California, San Diego Jozsef Solymosi Universiy of Briish Columbia, Vancouver, Canada Absrac In his noe we esablish a Ramsey-ype resul for cerain

More information

MATH 4330/5330, Fourier Analysis Section 6, Proof of Fourier s Theorem for Pointwise Convergence

MATH 4330/5330, Fourier Analysis Section 6, Proof of Fourier s Theorem for Pointwise Convergence MATH 433/533, Fourier Analysis Secion 6, Proof of Fourier s Theorem for Poinwise Convergence Firs, some commens abou inegraing periodic funcions. If g is a periodic funcion, g(x + ) g(x) for all real x,

More information

STATE-SPACE MODELLING. A mass balance across the tank gives:

STATE-SPACE MODELLING. A mass balance across the tank gives: B. Lennox and N.F. Thornhill, 9, Sae Space Modelling, IChemE Process Managemen and Conrol Subjec Group Newsleer STE-SPACE MODELLING Inroducion: Over he pas decade or so here has been an ever increasing

More information

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8) I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

A Local Regret in Nonconvex Online Learning

A Local Regret in Nonconvex Online Learning Sergul Aydore Lee Dicker Dean Foser Absrac We consider an online learning process o forecas a sequence of oucomes for nonconvex models. A ypical measure o evaluae online learning policies is regre bu such

More information

Math 333 Problem Set #2 Solution 14 February 2003

Math 333 Problem Set #2 Solution 14 February 2003 Mah 333 Problem Se #2 Soluion 14 February 2003 A1. Solve he iniial value problem dy dx = x2 + e 3x ; 2y 4 y(0) = 1. Soluion: This is separable; we wrie 2y 4 dy = x 2 + e x dx and inegrae o ge The iniial

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

Ensamble methods: Bagging and Boosting

Ensamble methods: Bagging and Boosting Lecure 21 Ensamble mehods: Bagging and Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Ensemble mehods Mixure of expers Muliple base models (classifiers, regressors), each covers a differen par

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

Some Basic Information about M-S-D Systems

Some Basic Information about M-S-D Systems Some Basic Informaion abou M-S-D Sysems 1 Inroducion We wan o give some summary of he facs concerning unforced (homogeneous) and forced (non-homogeneous) models for linear oscillaors governed by second-order,

More information

5.1 - Logarithms and Their Properties

5.1 - Logarithms and Their Properties Chaper 5 Logarihmic Funcions 5.1 - Logarihms and Their Properies Suppose ha a populaion grows according o he formula P 10, where P is he colony size a ime, in hours. When will he populaion be 2500? We

More information

MODULE 3 FUNCTION OF A RANDOM VARIABLE AND ITS DISTRIBUTION LECTURES PROBABILITY DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE

MODULE 3 FUNCTION OF A RANDOM VARIABLE AND ITS DISTRIBUTION LECTURES PROBABILITY DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE Topics MODULE 3 FUNCTION OF A RANDOM VARIABLE AND ITS DISTRIBUTION LECTURES 2-6 3. FUNCTION OF A RANDOM VARIABLE 3.2 PROBABILITY DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE 3.3 EXPECTATION AND MOMENTS

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Aricle from Predicive Analyics and Fuurism July 6 Issue An Inroducion o Incremenal Learning By Qiang Wu and Dave Snell Machine learning provides useful ools for predicive analyics The ypical machine learning

More information

Solutions for Assignment 2

Solutions for Assignment 2 Faculy of rs and Science Universiy of Torono CSC 358 - Inroducion o Compuer Neworks, Winer 218 Soluions for ssignmen 2 Quesion 1 (2 Poins): Go-ack n RQ In his quesion, we review how Go-ack n RQ can be

More information

Comments on Window-Constrained Scheduling

Comments on Window-Constrained Scheduling Commens on Window-Consrained Scheduling Richard Wes Member, IEEE and Yuing Zhang Absrac This shor repor clarifies he behavior of DWCS wih respec o Theorem 3 in our previously published paper [1], and describes

More information

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates Biol. 356 Lab 8. Moraliy, Recruimen, and Migraion Raes (modified from Cox, 00, General Ecology Lab Manual, McGraw Hill) Las week we esimaed populaion size hrough several mehods. One assumpion of all hese

More information

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN The MIT Press, 2014 Lecure Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/i2ml3e CHAPTER 2: SUPERVISED LEARNING Learning a Class

More information

Online Learning Applications

Online Learning Applications Online Learning Applicaions Sepember 19, 2016 In he las lecure we saw he following guaranee for minimizing misakes wih Randomized Weighed Majoriy (RWM). Theorem 1 Le M be misakes of RWM and M i he misakes

More information

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 38, No. 2, May 2013, pp. 209 227 ISSN 0364-765X (prin) ISSN 1526-5471 (online) hp://dx.doi.org/10.1287/moor.1120.0562 2013 INFORMS On Boundedness of Q-Learning Ieraes

More information

References are appeared in the last slide. Last update: (1393/08/19)

References are appeared in the last slide. Last update: (1393/08/19) SYSEM IDEIFICAIO Ali Karimpour Associae Professor Ferdowsi Universi of Mashhad References are appeared in he las slide. Las updae: 0..204 393/08/9 Lecure 5 lecure 5 Parameer Esimaion Mehods opics o be

More information

Electrical and current self-induction

Electrical and current self-induction Elecrical and curren self-inducion F. F. Mende hp://fmnauka.narod.ru/works.hml mende_fedor@mail.ru Absrac The aricle considers he self-inducance of reacive elemens. Elecrical self-inducion To he laws of

More information

Seminar 4: Hotelling 2

Seminar 4: Hotelling 2 Seminar 4: Hoelling 2 November 3, 211 1 Exercise Par 1 Iso-elasic demand A non renewable resource of a known sock S can be exraced a zero cos. Demand for he resource is of he form: D(p ) = p ε ε > A a

More information

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015 Explaining Toal Facor Produciviy Ulrich Kohli Universiy of Geneva December 2015 Needed: A Theory of Toal Facor Produciviy Edward C. Presco (1998) 2 1. Inroducion Toal Facor Produciviy (TFP) has become

More information

2.7. Some common engineering functions. Introduction. Prerequisites. Learning Outcomes

2.7. Some common engineering functions. Introduction. Prerequisites. Learning Outcomes Some common engineering funcions 2.7 Inroducion This secion provides a caalogue of some common funcions ofen used in Science and Engineering. These include polynomials, raional funcions, he modulus funcion

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

Chapter 7: Solving Trig Equations

Chapter 7: Solving Trig Equations Haberman MTH Secion I: The Trigonomeric Funcions Chaper 7: Solving Trig Equaions Le s sar by solving a couple of equaions ha involve he sine funcion EXAMPLE a: Solve he equaion sin( ) The inverse funcions

More information

11!Hí MATHEMATICS : ERDŐS AND ULAM PROC. N. A. S. of decomposiion, properly speaking) conradics he possibiliy of defining a counably addiive real-valu

11!Hí MATHEMATICS : ERDŐS AND ULAM PROC. N. A. S. of decomposiion, properly speaking) conradics he possibiliy of defining a counably addiive real-valu ON EQUATIONS WITH SETS AS UNKNOWNS BY PAUL ERDŐS AND S. ULAM DEPARTMENT OF MATHEMATICS, UNIVERSITY OF COLORADO, BOULDER Communicaed May 27, 1968 We shall presen here a number of resuls in se heory concerning

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

Innova Junior College H2 Mathematics JC2 Preliminary Examinations Paper 2 Solutions 0 (*)

Innova Junior College H2 Mathematics JC2 Preliminary Examinations Paper 2 Solutions 0 (*) Soluion 3 x 4x3 x 3 x 0 4x3 x 4x3 x 4x3 x 4x3 x x 3x 3 4x3 x Innova Junior College H Mahemaics JC Preliminary Examinaions Paper Soluions 3x 3 4x 3x 0 4x 3 4x 3 0 (*) 0 0 + + + - 3 3 4 3 3 3 3 Hence x or

More information

The General Linear Test in the Ridge Regression

The General Linear Test in the Ridge Regression ommunicaions for Saisical Applicaions Mehods 2014, Vol. 21, No. 4, 297 307 DOI: hp://dx.doi.org/10.5351/sam.2014.21.4.297 Prin ISSN 2287-7843 / Online ISSN 2383-4757 The General Linear Tes in he Ridge

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION DOI: 0.038/NCLIMATE893 Temporal resoluion and DICE * Supplemenal Informaion Alex L. Maren and Sephen C. Newbold Naional Cener for Environmenal Economics, US Environmenal Proecion

More information

dt = C exp (3 ln t 4 ). t 4 W = C exp ( ln(4 t) 3) = C(4 t) 3.

dt = C exp (3 ln t 4 ). t 4 W = C exp ( ln(4 t) 3) = C(4 t) 3. Mah Rahman Exam Review Soluions () Consider he IVP: ( 4)y 3y + 4y = ; y(3) = 0, y (3) =. (a) Please deermine he longes inerval for which he IVP is guaraneed o have a unique soluion. Soluion: The disconinuiies

More information

More Digital Logic. t p output. Low-to-high and high-to-low transitions could have different t p. V in (t)

More Digital Logic. t p output. Low-to-high and high-to-low transitions could have different t p. V in (t) EECS 4 Spring 23 Lecure 2 EECS 4 Spring 23 Lecure 2 More igial Logic Gae delay and signal propagaion Clocked circui elemens (flip-flop) Wriing a word o memory Simplifying digial circuis: Karnaugh maps

More information

Overview. COMP14112: Artificial Intelligence Fundamentals. Lecture 0 Very Brief Overview. Structure of this course

Overview. COMP14112: Artificial Intelligence Fundamentals. Lecture 0 Very Brief Overview. Structure of this course OMP: Arificial Inelligence Fundamenals Lecure 0 Very Brief Overview Lecurer: Email: Xiao-Jun Zeng x.zeng@mancheser.ac.uk Overview This course will focus mainly on probabilisic mehods in AI We shall presen

More information

Families with no matchings of size s

Families with no matchings of size s Families wih no machings of size s Peer Franl Andrey Kupavsii Absrac Le 2, s 2 be posiive inegers. Le be an n-elemen se, n s. Subses of 2 are called families. If F ( ), hen i is called - uniform. Wha is

More information

Ensamble methods: Boosting

Ensamble methods: Boosting Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room

More information

EXERCISES FOR SECTION 1.5

EXERCISES FOR SECTION 1.5 1.5 Exisence and Uniqueness of Soluions 43 20. 1 v c 21. 1 v c 1 2 4 6 8 10 1 2 2 4 6 8 10 Graph of approximae soluion obained using Euler s mehod wih = 0.1. Graph of approximae soluion obained using Euler

More information

Online Learning, Regret Minimization, Minimax Optimality, and Correlated Equilibrium

Online Learning, Regret Minimization, Minimax Optimality, and Correlated Equilibrium Algorihm Online Learning, Regre Minimizaion, Minimax Opimaliy, and Correlaed Equilibrium High level Las ime we discussed noion of Nash equilibrium Saic concep: se of prob Disribuions (p,q, ) such ha nobody

More information

SMT 2014 Calculus Test Solutions February 15, 2014 = 3 5 = 15.

SMT 2014 Calculus Test Solutions February 15, 2014 = 3 5 = 15. SMT Calculus Tes Soluions February 5,. Le f() = and le g() =. Compue f ()g (). Answer: 5 Soluion: We noe ha f () = and g () = 6. Then f ()g () =. Plugging in = we ge f ()g () = 6 = 3 5 = 5.. There is a

More information