From Bandits to Experts: A Tale of Domination and Independence


Abstract

We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir [11]. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph. We also show that in the undirected case, the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner.

1 Introduction

Prediction with expert advice (see, e.g., [10, 13, 5, 8, 6]) is a general abstract framework for studying sequential prediction problems, formulated as repeated games between a player and an adversary. A well-studied example of a prediction game is the following: In each round, the adversary privately assigns a loss value to each action in a fixed set. Then the player chooses an action (possibly using randomization) and incurs the corresponding loss. The goal of the player is to control regret, which is defined as the excess loss incurred by the player as compared to the best fixed action over a sequence of rounds. Two important variants of this game have been studied in the past: the expert setting, where at the end of each round the player observes the loss assigned to each action for that round, and the bandit setting, where the player only observes the loss of the chosen action, but not that of other actions.

Let $K$ be the number of available actions, and $T$ be the number of prediction rounds. The best possible regret for the expert setting is of order $\sqrt{T \log K}$. This optimal rate is achieved by the Hedge algorithm [8] or the Follow the Perturbed Leader algorithm [9]. In the bandit setting, the optimal regret is of order $\sqrt{TK}$, achieved by the INF algorithm [2]. A bandit variant of Hedge, called Exp3 [3], achieves a regret with a slightly worse bound of order $\sqrt{TK \log K}$.
Recently, Mannor and Shamir [11] introduced an elegant way for defining intermediate observability models between the expert setting (full observability) and the bandit setting (single observability). An intuitive way of representing an observability model is through a directed graph over actions: an arc from action $i$ to action $j$ implies that when playing action $i$ we get information also about the loss of action $j$. Thus, the expert setting is obtained by choosing a complete graph over actions (playing any action reveals all losses), and the bandit setting is obtained by choosing an empty edge set (playing an action only reveals the loss of that action).

The main result of [11] concerns undirected observability graphs. The regret is characterized in terms of the independence number $\alpha$ of the undirected observability graph. Specifically, they prove that $\sqrt{T \alpha \log K}$ is the optimal regret (up to logarithmic factors) and show that a variant of Exp3, called ELP, achieves this bound when the graph is known ahead of time, where $\alpha \in \{1, \dots, K\}$ interpolates between full observability ($\alpha = 1$ for the clique) and single observability ($\alpha = K$ for the graph with no edges). Given the observability graph, ELP runs a linear program to compute the desired distribution over actions. In the case when the graph changes over time, and at each time step ELP observes the current observability graph before prediction, a bound of $\sqrt{\sum_{t=1}^{T} \alpha_t \log K}$ is shown, where $\alpha_t$ is the independence number of the graph at time $t$. A major problem left open in [11] was the characterization of regret for directed observability graphs, a setting for which they only proved partial results.

Our main result is a full characterization (to within logarithmic factors) of regret in the case of directed and dynamic observability graphs. Our upper bounds are proven using a new algorithm, called Exp3-DOM. This algorithm is efficient to run even in the dynamic case: it just needs to compute a small dominating set of the current observability graph (which must be given as side information) before prediction.¹ As in the undirected case, the regret for the directed case is characterized in terms of the independence numbers of the observability graphs (computed ignoring edge directions). We arrive at this result by showing that a key quantity emerging in the analysis of Exp3-DOM can be bounded in terms of the independence numbers of the graphs. This bound (Lemma 3 in the appendix) is based on a combinatorial construction which might be of independent interest.

We also explore the possibility of the learning algorithm receiving the observability graph only after prediction, and not before. For this setting, we introduce a new variant of Exp3, called Exp3-SET, which achieves the same regret as ELP for undirected graphs, but without the need of accessing the current observability graph before each prediction. We show that in some random directed graph models Exp3-SET also performs well. In general, we can upper bound the regret of Exp3-SET as a function of the maximum acyclic subgraph of the observability graph, but this upper bound may not be tight. Yet, Exp3-SET is much simpler and computationally less demanding than ELP, which needs to solve a linear program in each round.
There are a variety of real-world settings where partial observability models corresponding to directed and undirected graphs are applicable. One of them is route selection. We are given a graph of possible routes connecting cities: when we select a route $r$ connecting two cities, we observe the cost (say, driving time or fuel consumption) of the "edges" along that route and, in addition, we have complete information on any sub-route $r'$ of $r$, but not vice versa. We abstract this in our model by having an observability graph over routes $r$, and an arc from $r$ to any of its sub-routes $r'$.

Sequential prediction problems with partial observability models also arise in the context of recommendation systems. For example, an online retailer, which advertises products to users, knows that users buying certain products are often interested in a set of related products. This knowledge can be represented as a graph over the set of products, where two products are joined by an edge if and only if users who buy any one of the two are likely to buy the other as well. In certain cases, however, edges have a preferred orientation. For instance, a person buying a video game console might also buy a high-def cable to connect it to the TV set. Vice versa, interest in high-def cables need not indicate an interest in game consoles.

Such observability models may also arise in the case when a recommendation system operates in a network of users. For example, consider the problem of recommending a sequence of products, or contents, to users in a group. Suppose the recommendation system is hosted on an online social network, on which users can befriend each other. In this case, it has been observed that social relationships reveal similarities in tastes and interests [12]. However, social links can also be asymmetric (e.g., followers of celebrities). In such cases, followers might be more likely to shape their preferences after the person they follow, than the other way around. Hence, a product liked by a celebrity is probably also liked by his/her followers, whereas a preference expressed by a follower is more often specific to that person.
2 Learning protocol, notation, and preliminaries

As stated in the introduction, we consider an adversarial multi-armed bandit setting with a finite action set $V = \{1, \dots, K\}$. At each time $t = 1, 2, \dots$, a player (the "learning algorithm") picks some action $I_t \in V$ and incurs a bounded loss $\ell_{I_t,t} \in [0,1]$. Unlike the standard adversarial bandit problem [3, 6], where only the played action $I_t$ reveals its loss $\ell_{I_t,t}$, here we assume all the losses in a subset $S_{I_t,t} \subseteq V$ of actions are revealed after $I_t$ is played. More formally, the player observes the pairs $(i, \ell_{i,t})$ for each $i \in S_{I_t,t}$. We also assume $i \in S_{i,t}$ for any $i$ and $t$, that is, any action reveals its own loss when played. Note that the bandit setting ($S_{i,t} \equiv \{i\}$) and the expert setting ($S_{i,t} \equiv V$) are both special cases of this framework. We call $S_{i,t}$ the observation set of action $i$ at time $t$, and write $i \xrightarrow{t} j$ when at time $t$ playing action $i$ also reveals the loss of action $j$. Hence, $S_{i,t} = \{ j \in V : i \xrightarrow{t} j \}$. The family of observation sets $\{S_{i,t}\}_{i \in V}$ we collectively call the observation system at time $t$.

The adversaries we consider are nonoblivious. Namely, each loss $\ell_{i,t}$ at time $t$ can be an arbitrary function of the past player's actions $I_1, \dots, I_{t-1}$. The performance of a player $A$ is measured through the regret

\[
\max_{k \in V} \mathbb{E}\bigl[ L_{A,T} - L_{k,T} \bigr],
\]

where $L_{A,T} = \ell_{I_1,1} + \cdots + \ell_{I_T,T}$ and $L_{k,T} = \ell_{k,1} + \cdots + \ell_{k,T}$ are the cumulative losses of the player and of action $k$, respectively. The expectation is taken with respect to the player's internal randomization (since losses are allowed to depend on the player's past random actions, also $L_{k,t}$ may be random).² The observation system $\{S_{i,t}\}_{i \in V}$ is either adversarially generated (in which case, each $S_{i,t}$ can be an arbitrary function of past player's actions, just like losses are), or randomly generated (see Section 3). In this respect, we distinguish between adversarial and random observation systems.

Moreover, whereas some algorithms need to know the observation system at the beginning of each step $t$, others need not. From this viewpoint, we shall consider two online learning settings. In the first setting, called the informed setting, the whole observation system $\{S_{i,t}\}_{i \in V}$ selected by the adversary is made available to the learner before making its choice $I_t$. This is essentially the "side-information" framework first considered in [11]. In the second setting, called the uninformed setting, no information whatsoever regarding the time-$t$ observation system is given to the learner prior to prediction. We find it convenient to adopt the same graph-theoretic interpretation of observation systems as in [11].

¹ Computing an approximately minimum dominating set can be done by running a standard greedy set cover algorithm, see Section 2.
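To make the protocol concrete, the following minimal sketch (not part of the original paper) encodes an observation system as a mapping from each action to its observation set, and evaluates the regret of one realized run against the best fixed action. Actions are indexed from 0, the helper names `bandit_system`, `expert_system`, and `realized_regret` are hypothetical, and the loss sequence is assumed to be fixed in advance, so the function returns a realized regret rather than the expected regret defined above.

```python
from typing import Dict, Sequence, Set


def bandit_system(K: int) -> Dict[int, Set[int]]:
    """Observation system of the bandit setting: S_i = {i}."""
    return {i: {i} for i in range(K)}


def expert_system(K: int) -> Dict[int, Set[int]]:
    """Observation system of the expert setting: S_i = V."""
    return {i: set(range(K)) for i in range(K)}


def realized_regret(losses: Sequence[Sequence[float]],
                    actions: Sequence[int]) -> float:
    """L_{A,T} - min_k L_{k,T} for one run.

    losses[t][i] is the loss of action i in round t (assumed fixed in
    advance); actions[t] is the action the player picked in round t.
    """
    K = len(losses[0])
    player_loss = sum(losses[t][a] for t, a in enumerate(actions))
    best_fixed = min(sum(row[k] for row in losses) for k in range(K))
    return player_loss - best_fixed
```

Any intermediate observability model is a dictionary lying between these two extremes, with every action contained in its own observation set.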
At each time step $t = 1, 2, \dots$, the observation system $\{S_{i,t}\}_{i \in V}$ defines a directed graph $G_t = (V, D_t)$, where $V$ is the set of actions, and $D_t$ is the set of arcs, i.e., ordered pairs of nodes. For $j \neq i$, arc $(i, j) \in D_t$ if and only if $i \xrightarrow{t} j$ (the self-loops created by $i \xrightarrow{t} i$ are intentionally ignored). Hence, we can equivalently define $\{S_{i,t}\}_{i \in V}$ in terms of $G_t$. Observe that the outdegree $d^+_i$ of any $i \in V$ equals $|S_{i,t}| - 1$. Similarly, the indegree $d^-_i$ of $i$ is the number of actions $j \neq i$ such that $i \in S_{j,t}$ (i.e., such that $j \xrightarrow{t} i$). A notable special case of the above is when the observation system is symmetric over time: $j \in S_{i,t}$ if and only if $i \in S_{j,t}$ for all $i, j$ and $t$. In words, playing $i$ at time $t$ reveals the loss of $j$ if and only if playing $j$ at time $t$ reveals the loss of $i$. A symmetric observation system is equivalent to $G_t$ being an undirected graph or, more precisely, to a directed graph having, for every pair of nodes $i, j \in V$, either no arcs or length-two directed cycles. Thus, from the point of view of the symmetry of the observation system, we also distinguish between the directed case ($G_t$ is a general directed graph) and the symmetric case ($G_t$ is an undirected graph for all $t$). For instance, combining the terminology introduced so far, the adversarial, informed, and directed setting is when $G_t$ is an adversarially-generated directed graph disclosed to the algorithm in round $t$ before prediction, while the random, uninformed, and directed setting is when $G_t$ is a randomly generated directed graph which is not given to the algorithm before prediction.

The analysis of our algorithms depends on certain properties of the sequence of graphs $G_t$. Two graph-theoretic notions playing an important role here are those of independent sets and dominating sets. Given an undirected graph $G = (V, E)$, an independent set of $G$ is any subset $T \subseteq V$ such that no two $i, j \in T$ are connected by an edge in $E$. An independent set is maximal if no proper superset thereof is itself an independent set. The size of a largest (maximal) independent set is the independence number of $G$, denoted by $\alpha(G)$. If $G$ is directed, we can still associate with it an independence number: we simply view $G$ as undirected by ignoring arc orientation. If $G = (V, D)$ is a directed graph, then a subset $R \subseteq V$ is a dominating set for $G$ if for all $j \notin R$ there exists some $i \in R$ such that arc $(i, j) \in D$.
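The graph-theoretic notions above can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: actions are indexed from 0, an observation system is a dictionary `S` with `i in S[i]`, all function names are hypothetical, and the independence number is found by brute force, which is only feasible for very small action sets.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

ObsSystem = Dict[int, Set[int]]  # maps action i to its observation set S_i


def arcs(S: ObsSystem) -> Set[Tuple[int, int]]:
    """Arc set D: (i, j) with j in S_i and j != i (self-loops ignored)."""
    return {(i, j) for i, Si in S.items() for j in Si if j != i}


def is_symmetric(S: ObsSystem) -> bool:
    """True if every arc (i, j) comes with its reverse (j, i)."""
    D = arcs(S)
    return all((j, i) in D for (i, j) in D)


def independence_number(S: ObsSystem) -> int:
    """alpha(G), ignoring arc orientation. Brute force, exponential in |V|."""
    D = arcs(S)
    V = sorted(S)
    for size in range(len(V), 0, -1):
        for T in combinations(V, size):
            if not any((i, j) in D or (j, i) in D
                       for i, j in combinations(T, 2)):
                return size
    return 0


def is_dominating(S: ObsSystem, R: Iterable[int]) -> bool:
    """True if every action outside R is observed by some action in R."""
    covered = set(R)
    for i in covered.copy():
        covered |= S[i]
    return covered == set(S)
```

For the bandit system the independence number is $K$ and only $V$ itself is dominating; for the expert system the independence number is 1 and any single action is dominating.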
In our bandit setting, a time-$t$ dominating set $R_t$ is a subset of actions with the property that the loss of any remaining action in round $t$ can be observed by playing some action in $R_t$. A dominating set is minimal if no proper subset thereof is itself a dominating set. The domination number of directed graph $G$, denoted by $\gamma(G)$, is the size of a smallest (minimal) dominating set of $G$.

Computing a minimum dominating set for an arbitrary directed graph $G_t$ is equivalent to solving a minimum set cover problem on the associated observation system $\{S_{i,t}\}_{i \in V}$. Although minimum set cover is NP-hard, the well-known Greedy Set Cover algorithm [7], which repeatedly selects from $\{S_{i,t}\}_{i \in V}$ the set containing the largest number of uncovered elements so far, computes a dominating set $R_t$ such that $|R_t| \le \gamma(G_t)\,(1 + \ln K)$.

Finally, we can also lift the independence number of an undirected graph to directed graphs through the notion of maximum acyclic subgraphs: Given a directed graph $G = (V, D)$, an acyclic subgraph of $G$ is any graph $G' = (V', D')$ such that $V' \subseteq V$, and $D' \equiv D \cap (V' \times V')$, with no (directed) cycles. We denote by $\mathrm{mas}(G) = |V'|$ the maximum size of such $V'$. Note that when $G$ is undirected (more precisely, as above, when $G$ is a directed graph having for every pair of nodes $i, j \in V$ either no arcs or length-two cycles), then $\mathrm{mas}(G) = \alpha(G)$, otherwise $\mathrm{mas}(G) \ge \alpha(G)$. In particular, when $G$ is itself a directed acyclic graph, then $\mathrm{mas}(G) = |V|$.

Algorithm 1: Exp3-SET (algorithm for the uninformed setting)
Parameter: $\eta \in [0,1]$
Initialize: $w_{i,1} = 1$ for all $i \in V = \{1, \dots, K\}$
For $t = 1, 2, \dots$:
  1. Observation system $\{S_{i,t}\}_{i \in V}$ is generated but not disclosed;
  2. Set $p_{i,t} = w_{i,t} / W_t$ for each $i \in V$, where $W_t = \sum_{j \in V} w_{j,t}$;
  3. Play action $I_t$ drawn according to distribution $p_t = (p_{1,t}, \dots, p_{K,t})$;
  4. Observe pairs $(i, \ell_{i,t})$ for all $i \in S_{I_t,t}$;
  5. Observation system $\{S_{i,t}\}_{i \in V}$ is disclosed;
  6. For any $i \in V$ set $w_{i,t+1} = w_{i,t} \exp\bigl(-\eta\, \hat{\ell}_{i,t}\bigr)$, where $\hat{\ell}_{i,t} = \dfrac{\ell_{i,t}}{q_{i,t}}\, \mathbb{I}\{ i \in S_{I_t,t} \}$ and $q_{i,t} = \sum_{j \,:\, j \xrightarrow{t} i} p_{j,t}$.

3 Algorithms without Explicit Exploration: The Uninformed Setting

In this section, we show that a simple variant of the Exp3 algorithm [3] obtains optimal regret (to within logarithmic factors) in two variants of the uninformed setting: (1) adversarial and symmetric, (2) random and directed. We then show that even the harder adversarial and directed setting lends itself to an analysis, though with a weaker regret bound.

Exp3-SET (Algorithm 1) runs Exp3 without mixing with the uniform distribution. Similar to Exp3, Exp3-SET uses loss estimates $\hat{\ell}_{i,t}$ that divide each observed loss $\ell_{i,t}$ by the probability $q_{i,t}$ of observing it. This probability $q_{i,t}$ is simply the sum of all $p_{j,t}$ such that $j \xrightarrow{t} i$ (the sum includes $p_{i,t}$). Next, we bound the regret of Exp3-SET in terms of the key quantity

\[
Q_t = \sum_{i \in V} \frac{p_{i,t}}{q_{i,t}} = \sum_{i \in V} \frac{p_{i,t}}{\sum_{j \,:\, j \xrightarrow{t} i} p_{j,t}} . \tag{1}
\]

Each term $p_{i,t} / q_{i,t}$ can be viewed as the probability of drawing $i$ from $p_t$ conditioned on the event that $i$ was observed. Similar to [11], a key aspect of our analysis is the ability to deterministically (and nonvacuously)³ upper bound $Q_t$ in terms of certain quantities defined on $\{S_{i,t}\}_{i \in V}$. We shall do so in two ways, either irrespective of how small each $p_{i,t}$ may be (this section) or depending on suitable lower bounds on the probabilities $p_{i,t}$ (Section 4). In fact, forcing lower bounds on $p_{i,t}$ is equivalent to adding exploration terms to the algorithm, which can be done only when knowing $\{S_{i,t}\}_{i \in V}$ before each prediction (an information available only in the informed setting).

The following simple result is the building block for all subsequent results in the uninformed setting.⁴

Theorem 1 In the adversarial case, the regret of Exp3-SET satisfies

\[
\max_{k \in V} \mathbb{E}\bigl[ L_{A,T} - L_{k,T} \bigr] \le \frac{\ln K}{\eta} + \frac{\eta}{2} \sum_{t=1}^{T} \mathbb{E}[Q_t] .
\]

As we said, in the adversarial and symmetric case the observation system at time $t$ can be described by an undirected graph $G_t = (V, E_t)$. This is essentially the problem of [11], which they studied in the easier informed setting, where the same quantity $Q_t$ above arises in the analysis of their ELP algorithm. In their Lemma 3, they show that $Q_t \le \alpha(G_t)$, irrespective of the choice of the probabilities $p_t$. When applied to Exp3-SET, this immediately gives the following result.

Corollary 2 In the adversarial and symmetric case, the regret of Exp3-SET satisfies

\[
\max_{k \in V} \mathbb{E}\bigl[ L_{A,T} - L_{k,T} \bigr] \le \frac{\ln K}{\eta} + \frac{\eta}{2} \sum_{t=1}^{T} \mathbb{E}[\alpha(G_t)] .
\]

In particular, if for constants $\alpha_1, \dots, \alpha_T$ we have $\alpha(G_t) \le \alpha_t$, $t = 1, \dots, T$, then setting $\eta = \sqrt{(2 \ln K) \big/ \sum_{t=1}^{T} \alpha_t}$ gives

\[
\max_{k \in V} \mathbb{E}\bigl[ L_{A,T} - L_{k,T} \bigr] \le \sqrt{2 (\ln K) \sum_{t=1}^{T} \alpha_t} .
\]

As shown in [11], the knowledge of $\sum_{t=1}^{T} \alpha(G_t)$ for tuning $\eta$ can be dispensed with (at the cost of extra log factors in the bound) by binning the values of $\eta$ and running Exp3 on top of a pool of instances of Exp3-SET, one for each bin. The bounds proven in Corollary 2 are equivalent to those proven in [11] (Theorem 2 therein) for the ELP algorithm. Yet, our analysis is much simpler and, more importantly, our algorithm is simpler and more efficient than ELP, which requires solving a linear program at each step. Moreover, unlike ELP, Exp3-SET does not require prior knowledge of the observation system $\{S_{i,t}\}_{i \in V}$ at the beginning of each step.

We now turn to the directed setting. We first treat the random case, and then the harder adversarial case.

² Although we defined the problem in terms of losses, our analysis can be applied to the case when actions return rewards $g_{i,t} \in [0,1]$ via the transformation $\ell_{i,t} = 1 - g_{i,t}$.
³ An obvious upper bound on $Q_t$ is $K$.
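One round of Exp3-SET (Algorithm 1) and the key quantity $Q_t$ of (1) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: actions are indexed from 0, the observation system is a dictionary `S` with `i in S[i]`, all function names are hypothetical, and no rescaling is applied to guard against weight underflow over long horizons.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple


def exp3_set_probs(weights: Sequence[float]) -> List[float]:
    """Step 2: p_i = w_i / W (no mixing with the uniform distribution)."""
    W = sum(weights)
    return [w / W for w in weights]


def observation_probs(p: Sequence[float],
                      S: Dict[int, Set[int]]) -> List[float]:
    """q_i = sum of p_j over all j whose observation set contains i."""
    K = len(p)
    return [sum(p[j] for j in range(K) if i in S[j]) for i in range(K)]


def key_quantity(p: Sequence[float], S: Dict[int, Set[int]]) -> float:
    """Q = sum_i p_i / q_i, as in equation (1)."""
    q = observation_probs(p, S)
    return sum(pi / qi for pi, qi in zip(p, q) if pi > 0)


def exp3_set_round(weights: Sequence[float], losses: Sequence[float],
                   S: Dict[int, Set[int]], eta: float,
                   rng: random.Random) -> Tuple[int, List[float]]:
    """Steps 2-6 of Algorithm 1; returns (played action, new weights)."""
    K = len(weights)
    p = exp3_set_probs(weights)
    action = rng.choices(range(K), weights=p)[0]
    # The observation system is consulted only after the action is drawn.
    q = observation_probs(p, S)
    new_weights = list(weights)
    for i in S[action]:  # only observed losses are read
        estimate = losses[i] / q[i]
        new_weights[i] = weights[i] * math.exp(-eta * estimate)
    return action, new_weights
```

Unobserved actions keep their weight because their loss estimate is zero. With the expert system `key_quantity` returns 1 and with the bandit system it returns $K$, matching the two extremes of the bound $Q_t \le \alpha(G_t)$.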
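The Erdős-Rényi model is a standard model for random directed graphs $G = (V, D)$, where we are given a density parameter $r \in [0,1]$ and, for any pair $i, j \in V$, arc $(i, j) \in D$ with independent probability $r$.⁵ We have the following result.

Corollary 3 Let $G_t$ be generated according to the Erdős-Rényi model with parameter $r \in [0,1]$. Then the regret of Exp3-SET satisfies

\[
\max_{k \in V} \mathbb{E}\bigl[ L_{A,T} - L_{k,T} \bigr] \le \frac{\ln K}{\eta} + \frac{\eta\, T}{2r} \bigl( 1 - (1-r)^K \bigr) .
\]

In the above, the expectations $\mathbb{E}[\cdot]$ are w.r.t. both the algorithm's randomization and the random generation of $G_t$ occurring at each round. In particular, setting $\eta = \sqrt{\dfrac{2 r \ln K}{T \bigl( 1 - (1-r)^K \bigr)}}$ gives

\[
\max_{k \in V} \mathbb{E}\bigl[ L_{A,T} - L_{k,T} \bigr] \le \sqrt{\frac{2 (\ln K)\, T \bigl( 1 - (1-r)^K \bigr)}{r}} .
\]

⁴ All proofs are given in the supplementary material to this paper.
⁵ Self loops, i.e., arcs $(i, i)$, are included by default here.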
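As a quick numerical reading of Corollary 3, the sketch below evaluates the bound and the tuned learning rate. The function names are hypothetical, and the value at $r = 0$ is defined through the limit $(1 - (1-r)^K)/r \to K$, which is an added convention so that the formula can be evaluated at the bandit endpoint.

```python
import math
from typing import Optional


def er_factor(r: float, K: int) -> float:
    """(1 - (1 - r)^K) / r, extended by its limit K at r = 0."""
    if r == 0.0:
        return float(K)
    return (1.0 - (1.0 - r) ** K) / r


def tuned_eta(r: float, K: int, T: int) -> float:
    """eta = sqrt(2 r ln K / (T (1 - (1 - r)^K)))."""
    return math.sqrt(2.0 * math.log(K) / (T * er_factor(r, K)))


def regret_bound(r: float, K: int, T: int,
                 eta: Optional[float] = None) -> float:
    """ln K / eta + (eta T / (2 r)) (1 - (1 - r)^K); tuned eta by default."""
    if eta is None:
        eta = tuned_eta(r, K, T)
    return math.log(K) / eta + 0.5 * eta * T * er_factor(r, K)
```

With the tuned rate the bound equals $\sqrt{2 T \ln K}$ at $r = 1$ and $\sqrt{2 T K \ln K}$ in the limit $r \to 0$, and it decreases as the graph becomes denser.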

Note that as $r$ ranges in $[0,1]$ we interpolate between the bandit ($r = 0$)⁶ and the expert ($r = 1$) regret bounds.

In the adversarial setting, we have the following result.

Corollary 4 In the adversarial and directed case, the regret of Exp3-SET satisfies

\[
\max_{k \in V} \mathbb{E}\bigl[ L_{A,T} - L_{k,T} \bigr] \le \frac{\ln K}{\eta} + \frac{\eta}{2} \sum_{t=1}^{T} \mathbb{E}[\mathrm{mas}(G_t)] .
\]

In particular, if for constants $m_1, \dots, m_T$ we have $\mathrm{mas}(G_t) \le m_t$, $t = 1, \dots, T$, then setting $\eta = \sqrt{(2 \ln K) \big/ \sum_{t=1}^{T} m_t}$ gives

\[
\max_{k \in V} \mathbb{E}\bigl[ L_{A,T} - L_{k,T} \bigr] \le \sqrt{2 (\ln K) \sum_{t=1}^{T} m_t} .
\]

Observe that Corollary 4 is a strict generalization of Corollary 2 because, as we pointed out in Section 2, $\mathrm{mas}(G_t) \ge \alpha(G_t)$, with equality holding when $G_t$ is an undirected graph.

As far as lower bounds are concerned, in the symmetric setting, the authors of [11] derive a lower bound of $\Omega\bigl(\sqrt{\alpha(G)\, T}\bigr)$ in the case when $G_t = G$ for all $t$. We remark that, similar to the symmetric setting, we can derive a lower bound of $\Omega\bigl(\sqrt{\alpha(G)\, T}\bigr)$ in the directed setting. The simple observation is that given a directed graph $G$, we can define a new graph $G'$ which is made undirected just by reciprocating arcs; namely, if there is an arc $(i, j)$ in $G$ we add arcs $(i, j)$ and $(j, i)$ in $G'$. Note that $\alpha(G) = \alpha(G')$. Since in $G'$ the learner can only receive more information than in $G$, any lower bound on $G'$ also applies to $G$. Therefore we derive the following corollary to the lower bound of [11] (Theorem 4 therein).

Corollary 5 Fix a directed graph $G$, and suppose $G_t = G$ for all $t$. Then there exists a (randomized) adversarial strategy such that for any $T = \Omega\bigl(\alpha(G)^3\bigr)$ and for any learning strategy, the expected regret of the learner is $\Omega\bigl(\sqrt{\alpha(G)\, T}\bigr)$.

One may wonder whether a sharper lower bound argument exists which applies to the general directed setting and involves the larger quantity $\mathrm{mas}(G)$. Unfortunately, the above measure does not seem to be related to the optimal regret: Using Claim 1 in the appendix (see proof of Corollary 3) one can exhibit a sequence of graphs each having a large acyclic subgraph, on which the regret of Exp3-SET is still small.
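The gap between $\mathrm{mas}(G)$ and $\alpha(G)$, and the reciprocation step used for the lower bound, can be illustrated on small graphs. The sketch below is brute force and only meant for a handful of nodes; the function names are hypothetical, and the arc set is assumed to contain no self-loops.

```python
from itertools import combinations
from typing import Iterable, Sequence, Set, Tuple

Arc = Tuple[int, int]


def is_acyclic(nodes: Iterable[int], D: Set[Arc]) -> bool:
    """Check the subgraph induced by `nodes` by peeling off source nodes."""
    remaining = set(nodes)
    sub = {(i, j) for (i, j) in D if i in remaining and j in remaining}
    while remaining:
        sources = {v for v in remaining if not any(j == v for (_, j) in sub)}
        if not sources:
            return False  # every remaining node has an incoming arc: a cycle
        remaining -= sources
        sub = {(i, j) for (i, j) in sub if i in remaining and j in remaining}
    return True


def mas(V: Sequence[int], D: Set[Arc]) -> int:
    """Size of a largest node subset inducing an acyclic subgraph."""
    for size in range(len(V), 0, -1):
        if any(is_acyclic(c, D) for c in combinations(V, size)):
            return size
    return 0


def reciprocate(D: Set[Arc]) -> Set[Arc]:
    """Arc set of G': every arc (i, j) is paired with (j, i)."""
    return set(D) | {(j, i) for (i, j) in D}


def independence_number(V: Sequence[int], D: Set[Arc]) -> int:
    """alpha(G), computed on the reciprocated (undirected) arc set."""
    U = reciprocate(D)
    for size in range(len(V), 0, -1):
        for c in combinations(V, size):
            if not any((i, j) in U for i, j in combinations(c, 2)):
                return size
    return 0
```

On a total order over five nodes this gives $\mathrm{mas}(G) = 5$ and $\alpha(G) = 1$, while the reciprocated graph has $\mathrm{mas}(G') = \alpha(G') = 1$.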
The lack of a lower bound matching the upper bound provided by Corollary 4 is a good indication that something more sophisticated has to be done in order to upper bound $Q_t$ in (1). This leads us to consider more refined ways of allocating probabilities $p_{i,t}$ to nodes. However, this allocation will require prior knowledge of the graphs $G_t$.

4 Algorithms with Explicit Exploration: The Informed Setting

We are still in the general scenario where graphs $G_t$ are arbitrary and directed, but now $G_t$ is made available before prediction. We start by showing a simple example where our analysis of Exp3-SET inherently fails. This is due to the fact that, when the graph induced by the observation system is directed, the key quantity $Q_t$ defined in (1) cannot be nonvacuously upper bounded independent of the choice of probabilities $p_{i,t}$. A way around this is to introduce a new algorithm, called Exp3-DOM, which controls probabilities $p_{i,t}$ by adding an exploration term to the distribution $p_t$. This exploration term is supported on a dominating set of the current graph $G_t$. For this reason, Exp3-DOM requires prior access to a dominating set $R_t$ at each time step $t$ which, in turn, requires prior knowledge of the entire observation system $\{S_{i,t}\}_{i \in V}$.

⁶ Observe that $\lim_{r \to 0^+} \dfrac{1 - (1-r)^K}{r} = K$.
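The failure mode can be checked with exact arithmetic on the simplest directed example: a total order on $V = \{1, \dots, K\}$ with arcs $(j, i)$ for all $j > i$, and the distribution $p_i = 2^{-i}$ for $i < K$, $p_K = 2^{-K+1}$. The sketch below (hypothetical function name, 0-based list indices) computes $Q$ for this pair; it is a numerical illustration, not a proof.

```python
from fractions import Fraction


def total_order_Q(K: int) -> Fraction:
    """Q = sum_i p_i / sum_{j >= i} p_j on the total order with arcs (j, i), j > i.

    Uses p_i = 2^{-i} for i < K and p_K = 2^{-K+1}, stored 0-based in `p`.
    """
    p = [Fraction(1, 2 ** i) for i in range(1, K)] + [Fraction(1, 2 ** (K - 1))]
    assert sum(p) == 1  # p is a probability distribution
    # Action i is observed by itself and by every later action in the order.
    return sum(p[i] / sum(p[i:]) for i in range(K))
```

Every term except the last equals $1/2$ and the last equals 1, so $Q = (K+1)/2$ grows linearly with $K$ although the independence number of the graph is 1.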

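Algorithm 2: Exp3-DOM
Input: Exploration parameters $\gamma^{(b)} \in (0,1]$ for $b \in \{0, 1, \dots, \lfloor \log_2 K \rfloor\}$
Initialization: $w^{(b)}_{i,1} = 1$ for all $i \in V$ and $b \in \{0, 1, \dots, \lfloor \log_2 K \rfloor\}$
For $t = 1, 2, \dots$:
  1. Observation system $\{S_{i,t}\}_{i \in V}$ is generated and disclosed;
  2. Compute a dominating set $R_t \subseteq V$ for $G_t$ associated with $\{S_{i,t}\}_{i \in V}$;
  3. Let $b_t$ be such that $|R_t| \in \bigl[ 2^{b_t}, 2^{b_t+1} - 1 \bigr]$;
  4. Set $W^{(b_t)}_t = \sum_{i \in V} w^{(b_t)}_{i,t}$;
  5. Set $p^{(b_t)}_{i,t} = \bigl( 1 - \gamma^{(b_t)} \bigr) \dfrac{w^{(b_t)}_{i,t}}{W^{(b_t)}_t} + \dfrac{\gamma^{(b_t)}}{|R_t|}\, \mathbb{I}\{ i \in R_t \}$;
  6. Play action $I_t$ drawn according to distribution $p^{(b_t)}_t = \bigl( p^{(b_t)}_{1,t}, \dots, p^{(b_t)}_{|V|,t} \bigr)$;
  7. Observe pairs $(i, \ell_{i,t})$ for all $i \in S_{I_t,t}$;
  8. For any $i \in V$ set $w^{(b_t)}_{i,t+1} = w^{(b_t)}_{i,t} \exp\bigl( -\gamma^{(b_t)}\, \hat{\ell}^{(b_t)}_{i,t} / 2^{b_t} \bigr)$, where $\hat{\ell}^{(b_t)}_{i,t} = \dfrac{\ell_{i,t}}{q^{(b_t)}_{i,t}}\, \mathbb{I}\{ i \in S_{I_t,t} \}$ and $q^{(b_t)}_{i,t} = \sum_{j \,:\, j \xrightarrow{t} i} p^{(b_t)}_{j,t}$.

As announced, the next result shows that, even for simple directed graphs, there exist distributions $p_t$ on the vertices such that $Q_t$ is linear in the number of nodes while the independence number is 1.⁷ Hence, nontrivial bounds on $Q_t$ can be found only by imposing conditions on distribution $p_t$.

Fact 6 Let $G = (V, D)$ be a total order on $V = \{1, \dots, K\}$, i.e., such that for all $i \in V$, arc $(j, i) \in D$ for all $j = i+1, \dots, K$. Let $p = (p_1, \dots, p_K)$ be a distribution on $V$ such that $p_i = 2^{-i}$, for $i < K$ and $p_K = 2^{-K+1}$. Then

\[
Q = \sum_{i=1}^{K} \frac{p_i}{p_i + \sum_{j \,:\, j \to i} p_j} = \sum_{i=1}^{K} \frac{p_i}{\sum_{j=i}^{K} p_j} = \frac{K+1}{2} .
\]

We are now ready to introduce and analyze the new algorithm Exp3-DOM for the adversarial, informed and directed setting. Exp3-DOM (see Algorithm 2) runs $O(\log K)$ variants of Exp3 indexed by $b = 0, 1, \dots, \lfloor \log_2 K \rfloor$. At time $t$ the algorithm is given observation system $\{S_{i,t}\}_{i \in V}$, and computes a dominating set $R_t$ of the directed graph $G_t$ induced by $\{S_{i,t}\}_{i \in V}$. Based on the size $|R_t|$ of $R_t$, the algorithm uses instance $b_t = \lfloor \log_2 |R_t| \rfloor$ to pick action $I_t$. We use a superscript $b$ to denote the quantities relevant to the variant of Exp3 indexed by $b$. Similarly to the analysis of Exp3-SET, the key quantities are

\[
q^{(b)}_{i,t} = \sum_{j \,:\, i \in S_{j,t}} p^{(b)}_{j,t} = \sum_{j \,:\, j \xrightarrow{t} i} p^{(b)}_{j,t}
\qquad \text{and} \qquad
Q^{(b)}_t = \sum_{i \in V} \frac{p^{(b)}_{i,t}}{q^{(b)}_{i,t}}, \qquad b = 0, 1, \dots, \lfloor \log_2 K \rfloor .
\]

Let $T^{(b)} = \bigl\{ t = 1, \dots, T : |R_t| \in [2^b, 2^{b+1} - 1] \bigr\}$. Clearly, the sets $T^{(b)}$ are a partition of the time steps $\{1, \dots, T\}$, so that $\sum_b |T^{(b)}| = T$. Since the adversary adaptively chooses the dominating sets $R_t$, the sets $T^{(b)}$ are random.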
This causes a problem in tuning the parameters $\gamma^{(b)}$. For this reason, we do not prove a regret bound for Exp3-DOM, where each instance uses a fixed $\gamma^{(b)}$, but for a slight variant (described in the proof of Theorem 7, see the appendix) where each $\gamma^{(b)}$ is set through a doubling trick.

⁷ In this specific example, the maximum acyclic subgraph has size $K$, which confirms the looseness of Corollary 4.
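Steps 2, 3, and 5 of Exp3-DOM can be sketched as below, using the Greedy Set Cover rule described in Section 2 to obtain the dominating set. This is an illustration under stated assumptions: actions are indexed from 0, the observation system is a dictionary `S` with `i in S[i]`, ties in the greedy rule are broken by the smallest action index, the function names are hypothetical, and a fixed `gamma` is used instead of the doubling trick.

```python
from typing import Dict, List, Sequence, Set


def greedy_dominating_set(S: Dict[int, Set[int]]) -> Set[int]:
    """Greedy Set Cover on {S_i}: repeatedly pick the observation set that
    covers the largest number of still-uncovered actions."""
    uncovered = set(S)
    R: Set[int] = set()
    while uncovered:
        # i in S[i] guarantees that some set covers at least one new action.
        best = max(sorted(S), key=lambda i: len(S[i] & uncovered))
        R.add(best)
        uncovered -= S[best]
    return R


def instance_index(R: Set[int]) -> int:
    """b_t = floor(log2 |R_t|), so that 2^b <= |R_t| <= 2^(b+1) - 1."""
    return len(R).bit_length() - 1


def exp3_dom_probs(weights: Sequence[float], R: Set[int],
                   gamma: float) -> List[float]:
    """Step 5: p_i = (1 - gamma) w_i / W + (gamma / |R|) 1{i in R}."""
    W = sum(weights)
    return [(1.0 - gamma) * w / W + (gamma / len(R) if i in R else 0.0)
            for i, w in enumerate(weights)]
```

Because the exploration mass `gamma` is spread only over the dominating set, every action is observed with probability at least `gamma / len(R)`, which is the lower bound on $q^{(b)}_{i,t}$ that the analysis relies on.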

Theorem 7 In the adversarial and directed case, the regret of Exp3-DOM satisfies

\[
\max_{k \in V} \mathbb{E}\bigl[ L_{A,T} - L_{k,T} \bigr] \le \sum_{b=0}^{\lfloor \log_2 K \rfloor} \left( \frac{2^b \ln K}{\gamma^{(b)}} + \gamma^{(b)}\, \mathbb{E}\!\left[ \sum_{t \in T^{(b)}} \left( 1 + \frac{Q^{(b)}_t}{2^{b+1}} \right) \right] \right) . \tag{2}
\]

Moreover, if we use a doubling trick to choose $\gamma^{(b)}$ for each $b = 0, \dots, \lfloor \log_2 K \rfloor$, then

\[
\max_{k \in V} \mathbb{E}\bigl[ L_{A,T} - L_{k,T} \bigr] = O\!\left( (\ln K)\, \mathbb{E}\!\left[ \sqrt{ \sum_{t=1}^{T} \Bigl( 4 |R_t| + Q^{(b_t)}_t \Bigr) } \right] + (\ln K) \ln(KT) \right) . \tag{3}
\]

Importantly, the next result shows how bound (3) of Theorem 7 can be expressed in terms of the sequence $\alpha(G_t)$ of independence numbers of graphs $G_t$ whenever the Greedy Set Cover algorithm [7] (see Section 2) is used to compute the dominating set $R_t$ of the observation system at time $t$.

Corollary 8 If Step 2 of Exp3-DOM uses the Greedy Set Cover algorithm to compute the dominating sets $R_t$, then the regret of Exp3-DOM with doubling trick satisfies

\[
\max_{k \in V} \mathbb{E}\bigl[ L_{A,T} - L_{k,T} \bigr] = O\!\left( \ln(K) \sqrt{ \ln(KT) \sum_{t=1}^{T} \alpha(G_t) } + \ln(K) \ln(KT) \right),
\]

where, for each $t$, $\alpha(G_t)$ is the independence number of the graph $G_t$ induced by observation system $\{S_{i,t}\}_{i \in V}$.

5 Conclusions and work in progress

We have investigated online prediction problems in partial information regimes that interpolate between the classical bandit and expert settings. We have shown a number of results characterizing prediction performance in terms of: the structure of the observation system, the amount of information available before prediction, the nature (adversarial or fully random) of the process generating the observation system. Our results are substantial improvements over the paper [11] that initiated this interesting line of research. Our improvements are diverse, and range from considering both informed and uninformed settings to delivering more refined graph-theoretic characterizations, from providing more efficient algorithmic solutions to relying on simpler (and often more general) analytical tools.

Some research directions we are currently pursuing are the following.

1. We are currently investigating the extent to which our results could be applied to the case when the observation system $\{S_{i,t}\}_{i \in V}$ may depend on the loss $\ell_{I_t,t}$ of player's action $I_t$. Notice that this would prevent a direct construction of an unbiased estimator for unobserved losses, which many worst-case bandit algorithms (including ours, see the appendix) hinge upon.
2. The upper bound contained in Corollary 4 and expressed in terms of $\mathrm{mas}(\cdot)$ is almost certainly suboptimal, even in the uninformed setting, and we are trying to see if more adequate graph complexity measures can be used instead.
3. Our lower bound (Corollary 5) heavily relies on the corresponding lower bound in [11] which, in turn, refers to a constant graph sequence. We would like to provide a more complete characterization applying to sequences of adversarially-generated graphs $G_1, G_2, \dots, G_T$ in terms of sequences of their corresponding independence numbers $\alpha(G_1), \alpha(G_2), \dots, \alpha(G_T)$ (or variants thereof), in both the uninformed and the informed settings.

References

[1] N. Alon and J. H. Spencer. The probabilistic method. John Wiley & Sons, 2004.
[2] Jean-Yves Audibert and Sébastien Bubeck. Minimax policies for adversarial and stochastic bandits. In COLT, 2009.
[3] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
[4] Y. Caro. New results on the independence number. In Tech. Report, Tel-Aviv University, 1979.
[5] N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth. How to use expert advice. J. ACM, 44(3):427–485, 1997.
[6] N. Cesa-Bianchi and G. Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
[7] V. Chvatal. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4(3):233–235, 1979.
[8] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Euro-COLT, pages 23–37. Springer-Verlag, 1995. Also, JCSS 55(1):119–139 (1997).
[9] A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71:291–307, 2005.
[10] Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108:212–261, 1994.
[11] S. Mannor and O. Shamir. From bandits to experts: On the value of side-observations. In 25th Annual Conference on Neural Information Processing Systems (NIPS 2011), 2011.
[12] Alan Said, Ernesto W. De Luca, and Sahin Albayrak. How social relationships affect user similarities. In Proceedings of the International Conference on Intelligent User Interfaces Workshop on Social Recommender Systems, Hong Kong, 2010.
[13] V. G. Vovk. Aggregating strategies. In COLT, pages 371–386, 1990.
[14] V. K. Wey. A lower bound on the stability number of a simple graph. In Bell Lab. Tech. Memo No. 81-11217-9, 1981.

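A Technical lemmas and proofs

This section contains the proofs of all technical results occurring in the main text, along with ancillary graph-theoretic lemmas. Throughout this appendix, $\mathbb{E}_t[\cdot]$ is a shorthand for $\mathbb{E}[\,\cdot \mid I_1, \ldots, I_{t-1}]$.

Proof of Theorem 1
Following the proof of Exp3 [3], we have
$$
\begin{aligned}
\frac{W_{t+1}}{W_t}
&= \sum_{i \in V} \frac{w_{i,t+1}}{W_t}
 = \sum_{i \in V} \frac{w_{i,t}\,\exp(-\eta\,\widehat{\ell}_{i,t})}{W_t}
 = \sum_{i \in V} p_{i,t}\,\exp(-\eta\,\widehat{\ell}_{i,t}) \\
&\le \sum_{i \in V} p_{i,t}\Bigl(1 - \eta\,\widehat{\ell}_{i,t} + \tfrac{1}{2}\eta^2\,\widehat{\ell}_{i,t}^{\,2}\Bigr)
 \qquad \text{(using } e^{-x} \le 1 - x + x^2/2 \text{ for all } x \ge 0\text{)} \\
&= 1 - \eta \sum_{i \in V} p_{i,t}\,\widehat{\ell}_{i,t} + \frac{\eta^2}{2} \sum_{i \in V} p_{i,t}\,\widehat{\ell}_{i,t}^{\,2}.
\end{aligned}
$$
Taking logs, using $\ln(1 - x) \le -x$ for all $x \ge 0$, and summing over $t = 1, \ldots, T$ yields
$$
\ln \frac{W_{T+1}}{W_1} \le -\eta \sum_{t=1}^T \sum_{i \in V} p_{i,t}\,\widehat{\ell}_{i,t} + \frac{\eta^2}{2} \sum_{t=1}^T \sum_{i \in V} p_{i,t}\,\widehat{\ell}_{i,t}^{\,2}.
$$
Moreover, for any fixed comparison action $k$, we also have
$$
\ln \frac{W_{T+1}}{W_1} \ge \ln \frac{w_{k,T+1}}{W_1} = -\eta \sum_{t=1}^T \widehat{\ell}_{k,t} - \ln K.
$$
Putting together and rearranging gives
$$
\sum_{t=1}^T \sum_{i \in V} p_{i,t}\,\widehat{\ell}_{i,t} \le \sum_{t=1}^T \widehat{\ell}_{k,t} + \frac{\ln K}{\eta} + \frac{\eta}{2} \sum_{t=1}^T \sum_{i \in V} p_{i,t}\,\widehat{\ell}_{i,t}^{\,2}. \tag{4}
$$
Note that, for all $i \in V$,
$$
\mathbb{E}_t\bigl[\widehat{\ell}_{i,t}\bigr]
= \sum_{j \,:\, i \in S_{j,t}} p_{j,t}\,\frac{\ell_{i,t}}{q_{i,t}}
= \sum_{j \,:\, j \xrightarrow{t} i} p_{j,t}\,\frac{\ell_{i,t}}{q_{i,t}}
= \frac{\ell_{i,t}}{q_{i,t}} \sum_{j \,:\, j \xrightarrow{t} i} p_{j,t}
= \ell_{i,t}.
$$
Moreover,
$$
\mathbb{E}_t\bigl[\widehat{\ell}_{i,t}^{\,2}\bigr]
= \sum_{j \,:\, i \in S_{j,t}} p_{j,t}\,\frac{\ell_{i,t}^2}{q_{i,t}^2}
= \frac{\ell_{i,t}^2}{q_{i,t}^2} \sum_{j \,:\, j \xrightarrow{t} i} p_{j,t}
\le \frac{1}{q_{i,t}^2} \sum_{j \,:\, j \xrightarrow{t} i} p_{j,t}
= \frac{1}{q_{i,t}}.
$$
Hence, taking expectations $\mathbb{E}_t$ on both sides of (4), and recalling the definition of $Q_t$, we can write
$$
\sum_{t=1}^T \sum_{i \in V} p_{i,t}\,\ell_{i,t} \le \sum_{t=1}^T \ell_{k,t} + \frac{\ln K}{\eta} + \frac{\eta}{2} \sum_{t=1}^T Q_t. \tag{5}
$$
Finally, taking expectations to remove conditioning gives
$$
\mathbb{E}\bigl[L_{A,T} - L_{k,T}\bigr] \le \frac{\ln K}{\eta} + \frac{\eta}{2} \sum_{t=1}^T \mathbb{E}[Q_t],
$$
as claimed.

Proof of Corollary 3
Fix round $t$, and let $G = (V, D)$ be the Erdős-Rényi random graph generated at time $t$, $N_i^-$ be the in-neighborhood of node $i$, i.e., the set of nodes $j$ such that $(j, i) \in D$, and denote by $d_i^-$ the indegree of $i$.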

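Claim 1. Let $p_1, \ldots, p_K$ be an arbitrary probability distribution defined over $V$, $f : V \to V$ be an arbitrary permutation of $V$, and $\mathbb{E}_f$ denote the expectation w.r.t. permutation $f$ when $f$ is drawn uniformly at random. Then, for any $i \in V$, we have
$$
\mathbb{E}_f\left[\frac{p_{f(i)}}{p_{f(i)} + \sum_{j \,:\, f(j) \in N^-_{f(i)}} p_{f(j)}}\right] = \frac{1}{1 + d_i^-}.
$$
Proof. Consider selecting a subset $S \subset V$ of $1 + d_i^-$ nodes. We shall consider the contribution to the expectation when $S = N^-_{f(i)} \cup \{f(i)\}$. Since there are $K(K-1)\cdots(K - d_i^- + 1)$ terms (out of $K!$) contributing to the expectation, we can write
$$
\mathbb{E}_f\left[\frac{p_{f(i)}}{p_{f(i)} + \sum_{j \,:\, f(j) \in N^-_{f(i)}} p_{f(j)}}\right]
= \frac{1}{\binom{K}{d_i^- + 1}} \sum_{S \subset V,\, |S| = d_i^- + 1} \frac{1}{1 + d_i^-} \sum_{v \in S} \frac{p_v}{p_v + \sum_{j \in S,\, j \ne v} p_j}
= \frac{1}{1 + d_i^-} \cdot \frac{1}{\binom{K}{d_i^- + 1}} \sum_{S \subset V,\, |S| = d_i^- + 1} 1
= \frac{1}{1 + d_i^-}.
$$

Claim 2. Let $p_1, \ldots, p_K$ be an arbitrary probability distribution defined over $V$, and $\mathbb{E}$ denote the expectation w.r.t. the Erdős-Rényi random draw of arcs at time $t$. Then, for any fixed $i \in V$, we have
$$
\mathbb{E}\left[\frac{p_i}{p_i + \sum_{j \,:\, j \xrightarrow{t} i} p_j}\right] = \frac{1}{rK}\bigl(1 - (1 - r)^K\bigr).
$$
Proof. For the given $i \in V$ and time $t$, consider the Bernoulli random variables $X_j$, $j \in V \setminus \{i\}$, and denote by $\mathbb{E}_{j : j \ne i}$ the expectation w.r.t. all of them. We symmetrize $\mathbb{E}\bigl[p_i / (p_i + \sum_{j : j \xrightarrow{t} i} p_j)\bigr]$ by means of a random permutation $f$, as in Claim 1. We can write
$$
\begin{aligned}
\mathbb{E}\left[\frac{p_i}{p_i + \sum_{j \,:\, j \xrightarrow{t} i} p_j}\right]
&= \mathbb{E}_{j : j \ne i}\left[\frac{p_i}{p_i + \sum_{j \,:\, j \ne i} X_j\, p_j}\right] \\
&= \mathbb{E}_{j : j \ne i}\, \mathbb{E}_f\left[\frac{p_{f(i)}}{p_{f(i)} + \sum_{j \,:\, j \ne i} X_{f(j)}\, p_{f(j)}}\right] \qquad \text{(by symmetry)} \\
&= \mathbb{E}_{j : j \ne i}\left[\frac{1}{1 + \sum_{j \,:\, j \ne i} X_j}\right] \qquad \text{(from Claim 1)} \\
&= \sum_{m=0}^{K-1} \binom{K-1}{m} r^m (1 - r)^{K-1-m}\, \frac{1}{m+1} \\
&= \frac{1}{rK} \sum_{m=0}^{K-1} \binom{K}{m+1} r^{m+1} (1 - r)^{K-1-m} \\
&= \frac{1}{rK}\bigl(1 - (1 - r)^K\bigr).
\end{aligned}
$$

At this point, we follow the proof of Theorem 1 up until (5). We take an expectation $\mathbb{E}_{G_1, \ldots, G_T}$ w.r.t. the randomness in generating the sequence of graphs $G_1, \ldots, G_T$. This yields
$$
\mathbb{E}_{G_1, \ldots, G_T}\left[\sum_{t=1}^T \sum_{i \in V} p_{i,t}\,\ell_{i,t}\right] \le \sum_{t=1}^T \ell_{k,t} + \frac{\ln K}{\eta} + \frac{\eta}{2} \sum_{t=1}^T \mathbb{E}_{G_1, \ldots, G_T}[Q_t].
$$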
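As a numerical sanity check on Claim 2, the following Python sketch compares the binomial sum appearing in its proof with the closed form $\frac{1}{rK}(1 - (1-r)^K)$. The function names `binomial_sum` and `closed_form` are illustrative choices, not notation from the paper, and the check assumes that each of the $K - 1$ possible incoming arcs is present independently with probability $r$.

```python
import math


def binomial_sum(K, r):
    """E[1 / (1 + X)] for X ~ Binomial(K - 1, r), written as an explicit sum."""
    return sum(
        math.comb(K - 1, m) * r**m * (1 - r) ** (K - 1 - m) / (m + 1)
        for m in range(K)
    )


def closed_form(K, r):
    """Closed form (1 - (1 - r)^K) / (r K) stated in Claim 2."""
    return (1 - (1 - r) ** K) / (r * K)


if __name__ == "__main__":
    for K, r in [(5, 0.3), (20, 0.9), (50, 0.01)]:
        per_node = closed_form(K, r)
        # Summing the per-node value over the K nodes gives (1 - (1 - r)^K) / r.
        print(K, r, binomial_sum(K, r), per_node, K * per_node)
```

Multiplying the per-node value by $K$ gives $\frac{1}{r}(1 - (1-r)^K)$, the quantity used to bound the expectation of $Q_t$ in the remainder of the proof.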

We use Claim 2 to upper bound $\mathbb{E}_{G_1, \ldots, G_T}[Q_t]$ by $\frac{1}{r}\bigl(1 - (1 - r)^K\bigr)$, and take the outer expectation to remove conditioning, as in the proof of Theorem 1. This concludes the proof.

The following lemma can be seen as a generalization of Lemma 3 in [11].

Lemma 9. Let $G = (V, D)$ be a directed graph with vertex set $V = \{1, \ldots, K\}$, and arc set $D$. Let $N_i^-$ be the in-neighborhood of node $i$, i.e., the set of nodes $j$ such that $(j, i) \in D$. Then
$$
\sum_{i=1}^K \frac{p_i}{p_i + \sum_{j \in N_i^-} p_j} \le \mathrm{mas}(G).
$$
Proof. We will show that there is a subset of vertices $V'$ such that the induced graph is acyclic and $|V'| \ge \sum_{i=1}^K \frac{p_i}{p_i + \sum_{j \in N_i^-} p_j}$. We prove the lemma by growing set $V'$ starting off from $V' = \emptyset$. Let
$$
\Phi_0 = \sum_{i=1}^K \frac{p_i}{p_i + \sum_{j \in N_i^-} p_j},
$$
and $i_1$ be the vertex which minimizes $p_i + \sum_{j \in N_i^-} p_j$ over $i \in V$. We are going to delete $i_1$ from the graph, along with all its incoming neighbors (set $N^-_{i_1}$), and all edges which are incident (both departing and incoming) to these nodes, and then iterate on the remaining graph. Let us denote the in-neighborhoods of the shrunken graph from the first step by $N^-_{i,1}$. The contribution of all the deleted vertices to $\Phi_0$ is
$$
\sum_{r \in N^-_{i_1} \cup \{i_1\}} \frac{p_r}{p_r + \sum_{j \in N_r^-} p_j}
\le \sum_{r \in N^-_{i_1} \cup \{i_1\}} \frac{p_r}{p_{i_1} + \sum_{j \in N^-_{i_1}} p_j} = 1,
$$
where the inequality follows from the minimality of $i_1$.

Let $V' \leftarrow V' \cup \{i_1\}$, and $V_1 = V - (N^-_{i_1} \cup \{i_1\})$. Then from the first step we have
$$
\Phi_1 = \sum_{i \in V_1} \frac{p_i}{p_i + \sum_{j \in N^-_{i,1}} p_j}
\ge \sum_{i \in V_1} \frac{p_i}{p_i + \sum_{j \in N^-_i} p_j}
\ge \Phi_0 - 1.
$$
We apply the very same argument to $\Phi_1$ with node $i_2$ (minimizing $p_i + \sum_{j \in N^-_{i,1}} p_j$ over $i \in V_1$), to $\Phi_2$ with node $i_3$, ..., to $\Phi_{s-1}$ with node $i_s$, up until $\Phi_s = 0$, i.e., up until no nodes are left in the shrunken graph. This gives $\Phi_0 \le s = |V'|$, where $V' = \{i_1, i_2, \ldots, i_s\}$. Moreover, since in each step $r = 1, \ldots, s$ we remove all remaining arcs incoming to $i_r$, the graph induced by set $V'$ cannot contain cycles.

Proof of Corollary 4
The claim follows from a direct combination of Theorem 1 with Lemma 9.

Proof of Fact 6
Using standard properties of geometric sums, one can immediately see that
$$
\sum_{i=1}^K \frac{p_i}{\sum_{j=i}^K p_j}
= \sum_{i=1}^{K-1} \frac{2^{-i}}{2^{-K+1} + \sum_{j=i}^{K-1} 2^{-j}} + 1
= \sum_{i=1}^{K-1} \frac{2^{-i}}{2^{-i+1}} + 1
= \frac{K-1}{2} + 1
= \frac{K+1}{2},
$$
hence the claimed result.

The following graph-theoretic lemma turns out to be fairly useful for analyzing directed settings.
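It is a directed-graph counterpart to a well-known result [4, 14] holding for undirected graphs.

Lemma 10. Let $G = (V, D)$ be a directed graph, with $V = \{1, \ldots, K\}$. Let $d_i^-$ be the indegree of node $i$, and $\alpha = \alpha(G)$ be the independence number of $G$. Then
$$
\sum_{i=1}^K \frac{1}{1 + d_i^-} \le 2\alpha \ln\Bigl(1 + \frac{K}{\alpha}\Bigr).
$$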
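The inequality of Lemma 10 can be checked numerically on small graphs. The sketch below is only an illustration under stated assumptions: nodes are labelled $0, \ldots, K-1$, arcs are given as a set of `(tail, head)` pairs without self-loops, and the independence number is computed by brute force on the undirected version of the graph, which is feasible only for small $K$. The helper names are hypothetical and do not come from the paper.

```python
import itertools
import math
import random


def independence_number(K, arcs):
    """Brute-force independence number, ignoring arc orientation (small K only)."""
    adjacent = {frozenset(arc) for arc in arcs}
    for size in range(K, 0, -1):
        for subset in itertools.combinations(range(K), size):
            if all(frozenset(pair) not in adjacent
                   for pair in itertools.combinations(subset, 2)):
                return size
    return 0


def lemma10_sides(K, arcs):
    """Return (sum_i 1 / (1 + indegree_i), 2 * alpha * ln(1 + K / alpha))."""
    indegree = [0] * K
    for _tail, head in arcs:
        indegree[head] += 1
    lhs = sum(1.0 / (1 + d) for d in indegree)
    alpha = independence_number(K, arcs)
    rhs = 2 * alpha * math.log(1 + K / alpha)
    return lhs, rhs


def random_digraph(K, r, rng):
    """Each ordered pair (i, j) with i != j becomes an arc with probability r."""
    return {(i, j) for i in range(K) for j in range(K)
            if i != j and rng.random() < r}


if __name__ == "__main__":
    rng = random.Random(0)
    for r in (0.1, 0.5, 0.9):
        lhs, rhs = lemma10_sides(7, random_digraph(7, r, rng))
        print(f"r={r}: lhs={lhs:.3f}, rhs={rhs:.3f}")
```

For instance, on a directed 3-cycle every indegree equals 1, so the left-hand side is $3/2$, while $\alpha = 1$ gives the right-hand side $2 \ln 4$.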

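Proof. We will proceed by induction, starting off from the original $K$-node graph $G = G_K$ with indegrees $\{d_i^-\}_{i=1}^K = \{d_{i,K}^-\}_{i=1}^K$, and independence number $\alpha = \alpha_K$, and then progressively shrink $G$ by eliminating nodes and incident (both departing and incoming) arcs, thereby obtaining a sequence of smaller and smaller graphs $G_K, G_{K-1}, G_{K-2}, \ldots$, and associated indegrees $\{d_{i,K}^-\}_{i=1}^K$, $\{d_{i,K-1}^-\}_{i=1}^{K-1}$, $\{d_{i,K-2}^-\}_{i=1}^{K-2}$, ..., and independence numbers $\alpha_K, \alpha_{K-1}, \alpha_{K-2}, \ldots$. Specifically, in step $s$ we sort nodes $i = 1, \ldots, s$ of $G_s$ in nonincreasing value of $d_{i,s}^-$, and obtain $G_{s-1}$ from $G_s$ by eliminating node 1 (i.e., one having the largest indegree among the nodes of $G_s$), along with its incident arcs. On all such graphs, we will use the classical Turán's theorem (e.g., [1]) stating that any undirected graph with $n_s$ nodes and $m_s$ edges has an independent set of size at least $\frac{n_s}{\frac{2 m_s}{n_s} + 1}$. This implies that if $G_s = (V_s, D_s)$, then $\alpha_s$ satisfies$^{8}$
$$
\frac{|D_s|}{|V_s|} \ge \frac{|V_s|}{2\alpha_s} - \frac{1}{2}. \tag{6}
$$
Footnote 8: Notice that $|D_s|$ is at least as large as the number of edges of the undirected version of $G_s$ which the independence number $\alpha_s$ actually refers to.

We then start from $G_K$. We can write
$$
d_{1,K}^- = \max_{i = 1, \ldots, K} d_{i,K}^- \ge \frac{1}{K} \sum_{i=1}^K d_{i,K}^- = \frac{|D_K|}{|V_K|} \ge \frac{|V_K|}{2\alpha_K} - \frac{1}{2}.
$$
Hence,
$$
\sum_{i=1}^K \frac{1}{1 + d_{i,K}^-}
= \frac{1}{1 + d_{1,K}^-} + \sum_{i=2}^K \frac{1}{1 + d_{i,K}^-}
\le \frac{2\alpha_K}{\alpha_K + K} + \sum_{i=2}^K \frac{1}{1 + d_{i,K}^-}
\le \frac{2\alpha_K}{\alpha_K + K} + \sum_{i=1}^{K-1} \frac{1}{1 + d_{i,K-1}^-},
$$
where the last inequality follows from $d_{i+1,K}^- \ge d_{i,K-1}^-$, $i = 1, \ldots, K-1$, due to the arc elimination turning $G_K$ into $G_{K-1}$. Recursively applying the very same argument to $G_{K-1}$ (i.e., to the sum $\sum_{i=1}^{K-1} \frac{1}{1 + d_{i,K-1}^-}$), and then iterating all the way to $G_1$ yields the upper bound
$$
\sum_{i=1}^K \frac{1}{1 + d_{i,K}^-} \le \sum_{i=1}^K \frac{2\alpha_i}{\alpha_i + i}.
$$
Combining with $\alpha_i \le \alpha_K = \alpha$, and $\sum_{i=1}^K \frac{1}{\alpha + i} \le \ln\bigl(1 + \frac{K}{\alpha}\bigr)$ concludes the proof.

The next lemma relates the size $|R_t|$ of the dominating set $R_t$ computed by the Greedy Set Cover algorithm of [7] operating on the time-$t$ observation system $\{S_{i,t}\}_{i \in V}$ to the independence number $\alpha(G_t)$ and the domination number $\gamma(G_t)$ of $G_t$.

Lemma 11. Let $\{S_i\}_{i \in V}$ be an observation system, and $G = (V, D)$ be the induced directed graph, with vertex set $V = \{1, \ldots, K\}$, independence number $\alpha = \alpha(G)$, and domination number $\gamma = \gamma(G)$. Then the dominating set $R$ constructed by the Greedy Set Cover algorithm (see Section 2) satisfies
$$
|R| \le \min\bigl\{ \gamma(1 + \ln K),\ \lceil 2\alpha \ln K \rceil + 1 \bigr\}.
$$
Proof. As recalled in Section 2, the Greedy Set Cover algorithm of [7] achieves $|R| \le \gamma(1 + \ln K)$. In order to prove the other bound, consider the sequence of graphs $G = G_1, G_2, \ldots$, where each $G_{s+1} = (V_{s+1}, D_{s+1})$ is obtained by removing from $G_s$ the vertex $i_s$ selected by the Greedy Set Cover algorithm, together with all the vertices in $G_s$ that are dominated by $i_s$, and all arcs incident to these vertices. By definition of the algorithm, the outdegree $d_s^+$ of $i_s$ in $G_s$ is largest in $G_s$. Hence,
$$
d_s^+ \ge \frac{|D_s|}{|V_s|} \ge \frac{|V_s|}{2\alpha_s} - \frac{1}{2} \ge \frac{|V_s|}{2\alpha} - \frac{1}{2}
$$
by Turán's theorem (e.g., [1]), where $\alpha_s$ is the independence number of $G_s$ and $\alpha \ge \alpha_s$. This shows that
$$
|V_{s+1}| = |V_s| - d_s^+ - 1 \le |V_s| \Bigl(1 - \frac{1}{2\alpha}\Bigr) \le |V_s|\, e^{-1/(2\alpha)}.
$$
Iterating, we obtain $|V_s| \le K e^{-s/(2\alpha)}$. Choosing $s = \lceil 2\alpha \ln K \rceil + 1$ gives $|V_s| < 1$, thereby covering all nodes. Hence the dominating set $R = \{i_1, \ldots, i_s\}$ so constructed satisfies $|R| \le \lceil 2\alpha \ln K \rceil + 1$.

Lemma 12. If $a, b \ge 0$, and $a + b \ge B > A > 0$, then
$$
\frac{a}{a + b - A} \le \frac{a}{a + b} + \frac{A}{B - A}.
$$
Proof.
$$
\frac{a}{a + b - A} - \frac{a}{a + b} = \frac{aA}{(a + b)(a + b - A)} \le \frac{A}{a + b - A} \le \frac{A}{B - A}.
$$

We now lift Lemma 10 to a more general statement.

Lemma 13. Let $G = (V, D)$ be a directed graph, with vertex set $V = \{1, \ldots, K\}$, and arc set $D$. Let $N_i^-$ be the in-neighborhood of node $i$, i.e., the set of nodes $j$ such that $(j, i) \in D$. Let $\alpha$ be the independence number of $G$, $R \subseteq V$ be a dominating set for $G$ of size $r = |R|$, and $p_1, \ldots, p_K$ be a probability distribution defined over $V$, such that $p_i \ge \beta > 0$, for $i \in R$. Then
$$
\sum_{i=1}^K \frac{p_i}{p_i + \sum_{j \in N_i^-} p_j} \le 2\alpha \ln\left(1 + \frac{\bigl\lceil \frac{K^2}{r\beta} \bigr\rceil + K}{\alpha}\right) + 2r.
$$
Proof. The idea is to appropriately discretize the probability values $p_i$, and then upper bound the discretized counterpart of $\sum_{i=1}^K \frac{p_i}{p_i + \sum_{j \in N_i^-} p_j}$ by reducing to an expression that can be handled by Lemma 10. In order to make this discretization effective, we need to single out the terms $\frac{p_i}{p_i + \sum_{j \in N_i^-} p_j}$ corresponding to nodes $i \in R$. We first write
$$
\sum_{i=1}^K \frac{p_i}{p_i + \sum_{j \in N_i^-} p_j}
= \sum_{i \in R} \frac{p_i}{p_i + \sum_{j \in N_i^-} p_j} + \sum_{i \notin R} \frac{p_i}{p_i + \sum_{j \in N_i^-} p_j}
\le r + \sum_{i \notin R} \frac{p_i}{p_i + \sum_{j \in N_i^-} p_j}, \tag{7}
$$
and then focus on (7).

Let us discretize the unit interval$^{9}$ $(0, 1]$ into subintervals $\bigl(\frac{j-1}{M}, \frac{j}{M}\bigr]$, $j = 1, \ldots, M$, where $M = \bigl\lceil \frac{K^2}{r\beta} \bigr\rceil$. Let $\widehat{p}_i = j/M$ be the discretized version of $p_i$, with $j$ being the unique integer such that $\widehat{p}_i - 1/M < p_i \le \widehat{p}_i$.

Footnote 9: The zero value won't be of our concern here, because if $p_i = 0$, the corresponding term in (7) can be disregarded.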

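Let us focus on a single node $i \notin R$ with indegree $d_i^- = |N_i^-|$, and introduce the shorthand notation $P_i = \sum_{j \in N_i^-} p_j$, and $\widehat{P}_i = \sum_{j \in N_i^-} \widehat{p}_j$. We have that $\widehat{P}_i \ge P_i \ge \beta$, since $i$ is dominated by some node $j \in R \cap N_i^-$ such that $p_j \ge \beta$. Moreover, $P_i > \widehat{P}_i - \frac{d_i^-}{M} \ge \beta - \frac{d_i^-}{M} > 0$, and $\widehat{p}_i + \widehat{P}_i \ge \beta$. Hence, for any fixed node $i \notin R$, we can write
$$
\begin{aligned}
\frac{p_i}{p_i + P_i}
&\le \frac{\widehat{p}_i}{\widehat{p}_i + P_i}
 < \frac{\widehat{p}_i}{\widehat{p}_i + \widehat{P}_i - \frac{d_i^-}{M}} \\
&\le \frac{\widehat{p}_i}{\widehat{p}_i + \widehat{P}_i} + \frac{d_i^-/M}{\beta - d_i^-/M}
 = \frac{\widehat{p}_i}{\widehat{p}_i + \widehat{P}_i} + \frac{d_i^-}{\beta M - d_i^-}
 < \frac{\widehat{p}_i}{\widehat{p}_i + \widehat{P}_i} + \frac{r}{K - r},
\end{aligned}
$$
where in the second-last inequality we used Lemma 12 with $a = \widehat{p}_i$, $b = \widehat{P}_i$, $A = d_i^-/M$, and $B = \beta > d_i^-/M$. Recalling (7), and summing over $i$ then gives
$$
\sum_{i=1}^K \frac{p_i}{p_i + P_i}
\le r + \sum_{i \notin R} \frac{\widehat{p}_i}{\widehat{p}_i + \widehat{P}_i} + r
= \sum_{i \notin R} \frac{\widehat{p}_i}{\widehat{p}_i + \widehat{P}_i} + 2r. \tag{8}
$$
Therefore, we continue by bounding from above the right-hand side of (8). We first observe that
$$
\sum_{i \notin R} \frac{\widehat{p}_i}{\widehat{p}_i + \widehat{P}_i} = \sum_{i \notin R} \frac{\widehat{s}_i}{\widehat{s}_i + \widehat{S}_i}, \qquad \widehat{S}_i = \sum_{j \in N_i^-} \widehat{s}_j, \tag{9}
$$
where $\widehat{s}_i = M \widehat{p}_i$, $i = 1, \ldots, K$, are integers. Based on the original graph $G$, we construct a new graph $\widehat{G}$ made up of connected cliques. In particular:
• Each node $i$ of $G$ is replaced in $\widehat{G}$ by a clique $C_i$ of size $\widehat{s}_i$; nodes within $C_i$ are connected by length-two cycles.
• If arc $(i, j)$ is in $G$, then for each node of $C_i$ draw an arc towards each node of $C_j$.

We would like to apply Lemma 10 to $\widehat{G}$. Notice that, by the above construction:
• The independence number of $\widehat{G}$ is the same as that of $G$;
• The indegree $\widehat{d}_k^-$ of each node $k$ in clique $C_i$ satisfies $\widehat{d}_k^- = \widehat{s}_i - 1 + \widehat{S}_i$;
• The total number of nodes of $\widehat{G}$ is
$$
\sum_{i=1}^K \widehat{s}_i = M \sum_{i=1}^K \widehat{p}_i < M \sum_{i=1}^K \Bigl(p_i + \frac{1}{M}\Bigr) = M + K.
$$
Hence, we are in a position to apply Lemma 10 to $\widehat{G}$ with indegrees $\widehat{d}_k^-$, revealing that
$$
\sum_{i \notin R} \frac{\widehat{s}_i}{\widehat{s}_i + \widehat{S}_i}
= \sum_{i \notin R} \sum_{k \in C_i} \frac{1}{1 + \widehat{d}_k^-}
\le \sum_{i=1}^K \sum_{k \in C_i} \frac{1}{1 + \widehat{d}_k^-}
\le 2\alpha \ln\Bigl(1 + \frac{M + K}{\alpha}\Bigr).
$$
Putting together as in (8) and (9), and recalling the value of $M$ gives the claimed result.

Proof of Theorem 7
We start to bound the contribution to the overall regret of an instance indexed by $b$. When clear from the context, we remove the superscript $b$ from $\gamma^{(b)}$, $w_{i,t}^{(b)}$, $p_{i,t}^{(b)}$, and other related quantities. For any $t \in T^{(b)}$ we have
$$
\begin{aligned}
\frac{W_{t+1}}{W_t}
&= \sum_{i \in V} \frac{w_{i,t+1}}{W_t}
 = \sum_{i \in V} \frac{w_{i,t}}{W_t} \exp\bigl(-(\gamma/2^b)\,\widehat{\ell}_{i,t}\bigr) \\
&= \sum_{i \in R_t} \frac{p_{i,t} - \gamma/|R_t|}{1 - \gamma} \exp\bigl(-(\gamma/2^b)\,\widehat{\ell}_{i,t}\bigr)
 + \sum_{i \notin R_t} \frac{p_{i,t}}{1 - \gamma} \exp\bigl(-(\gamma/2^b)\,\widehat{\ell}_{i,t}\bigr) \\
&\le \sum_{i \in R_t} \frac{p_{i,t} - \gamma/|R_t|}{1 - \gamma} \left(1 - \frac{\gamma}{2^b}\,\widehat{\ell}_{i,t} + \frac{1}{2}\Bigl(\frac{\gamma}{2^b}\,\widehat{\ell}_{i,t}\Bigr)^2\right)
 + \sum_{i \notin R_t} \frac{p_{i,t}}{1 - \gamma} \left(1 - \frac{\gamma}{2^b}\,\widehat{\ell}_{i,t} + \frac{1}{2}\Bigl(\frac{\gamma}{2^b}\,\widehat{\ell}_{i,t}\Bigr)^2\right) \\
&\qquad \text{(using } e^{-x} \le 1 - x + x^2/2 \text{ for all } x \ge 0\text{)} \\
&\le 1 - \frac{\gamma/2^b}{1 - \gamma} \sum_{i \in V} p_{i,t}\,\widehat{\ell}_{i,t}
 + \frac{\gamma^2/2^b}{1 - \gamma} \sum_{i \in R_t} \frac{\widehat{\ell}_{i,t}}{|R_t|}
 + \frac{1}{2}\,\frac{(\gamma/2^b)^2}{1 - \gamma} \sum_{i \in V} p_{i,t}\,\widehat{\ell}_{i,t}^{\,2}.
\end{aligned}
$$
Taking logs, upper bounding, and summing over $t \in T^{(b)}$ yields
$$
\ln \frac{W_{|T^{(b)}|+1}}{W_1}
\le -\frac{\gamma/2^b}{1 - \gamma} \sum_{t \in T^{(b)}} \sum_{i \in V} p_{i,t}\,\widehat{\ell}_{i,t}
+ \frac{\gamma^2/2^b}{1 - \gamma} \sum_{t \in T^{(b)}} \sum_{i \in R_t} \frac{\widehat{\ell}_{i,t}}{|R_t|}
+ \frac{1}{2}\,\frac{(\gamma/2^b)^2}{1 - \gamma} \sum_{t \in T^{(b)}} \sum_{i \in V} p_{i,t}\,\widehat{\ell}_{i,t}^{\,2}.
$$
Moreover, for any fixed comparison action $k$, we also have
$$
\ln \frac{W_{|T^{(b)}|+1}}{W_1} \ge \ln \frac{w_{k,|T^{(b)}|+1}}{W_1} = -\frac{\gamma}{2^b} \sum_{t \in T^{(b)}} \widehat{\ell}_{k,t} - \ln K.
$$
Putting together, rearranging, and using $1 - \gamma \le 1$ gives
$$
\sum_{t \in T^{(b)}} \sum_{i \in V} p_{i,t}\,\widehat{\ell}_{i,t}
\le \sum_{t \in T^{(b)}} \widehat{\ell}_{k,t} + \frac{2^b \ln K}{\gamma}
+ \gamma \sum_{t \in T^{(b)}} \sum_{i \in R_t} \frac{\widehat{\ell}_{i,t}}{|R_t|}
+ \frac{\gamma}{2^{b+1}} \sum_{t \in T^{(b)}} \sum_{i \in V} p_{i,t}\,\widehat{\ell}_{i,t}^{\,2}.
$$
Reintroducing the notation $\gamma^{(b)}$ and summing over $b = 0, 1, \ldots, \lfloor \log_2 K \rfloor$ gives
$$
\sum_{t=1}^T \left(\sum_{i \in V} p_{i,t}^{(b_t)}\,\widehat{\ell}_{i,t}^{(b_t)} - \widehat{\ell}_{k,t}^{(b_t)}\right)
\le \sum_{b=0}^{\lfloor \log_2 K \rfloor} \frac{2^b \ln K}{\gamma^{(b)}}
+ \sum_{t=1}^T \gamma^{(b_t)} \sum_{i \in R_t} \frac{\widehat{\ell}_{i,t}^{(b_t)}}{|R_t|}
+ \sum_{t=1}^T \frac{\gamma^{(b_t)}}{2^{b_t+1}} \sum_{i \in V} p_{i,t}^{(b_t)} \bigl(\widehat{\ell}_{i,t}^{(b_t)}\bigr)^2. \tag{10}
$$
Now, similarly to the proof of Theorem 1, we have that $\mathbb{E}_t\bigl[\widehat{\ell}_{i,t}^{(b_t)}\bigr] = \ell_{i,t}$ and $\mathbb{E}_t\bigl[(\widehat{\ell}_{i,t}^{(b_t)})^2\bigr] \le \frac{1}{q_{i,t}^{(b_t)}}$, for any $i$ and $t$. Hence, taking expectations $\mathbb{E}_t$ on both sides of (10) and recalling the definition of $Q_t^{(b)}$ gives
$$
\sum_{t=1}^T \left(\sum_{i \in V} p_{i,t}^{(b_t)}\,\ell_{i,t} - \ell_{k,t}\right)
\le \sum_{b=0}^{\lfloor \log_2 K \rfloor} \frac{2^b \ln K}{\gamma^{(b)}}
+ \sum_{t=1}^T \gamma^{(b_t)} \sum_{i \in R_t} \frac{\ell_{i,t}}{|R_t|}
+ \sum_{t=1}^T \frac{\gamma^{(b_t)}}{2^{b_t+1}}\, Q_t^{(b_t)}. \tag{11}
$$
Moreover,
$$
\sum_{t=1}^T \gamma^{(b_t)} \sum_{i \in R_t} \frac{\ell_{i,t}}{|R_t|} \le \sum_{t=1}^T \gamma^{(b_t)} = \sum_{b=0}^{\lfloor \log_2 K \rfloor} \gamma^{(b)}\, |T^{(b)}|
$$
and
$$
\sum_{t=1}^T \frac{\gamma^{(b_t)}}{2^{b_t+1}}\, Q_t^{(b_t)} = \sum_{b=0}^{\lfloor \log_2 K \rfloor} \frac{\gamma^{(b)}}{2^{b+1}} \sum_{t \in T^{(b)}} Q_t^{(b)}.
$$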

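Hence, plugging back into (11), taking outer expectations on both sides and recalling that $T^{(b)}$ is random (since the adversary adaptively decides which steps $t$ fall into $T^{(b)}$), we get
$$
\begin{aligned}
\mathbb{E}\bigl[L_{A,T} - L_{k,T}\bigr]
&\le \sum_{b=0}^{\lfloor \log_2 K \rfloor} \mathbb{E}\left[\frac{2^b \ln K}{\gamma^{(b)}} + \gamma^{(b)}\, |T^{(b)}| + \frac{\gamma^{(b)}}{2^{b+1}} \sum_{t \in T^{(b)}} Q_t^{(b)}\right] \\
&= \sum_{b=0}^{\lfloor \log_2 K \rfloor} \left(\frac{2^b \ln K}{\gamma^{(b)}} + \gamma^{(b)}\, \mathbb{E}\left[\sum_{t \in T^{(b)}} \Bigl(1 + \frac{Q_t^{(b)}}{2^{b+1}}\Bigr)\right]\right).
\end{aligned} \tag{12}
$$
This establishes (2).

In order to prove inequality (3), we need to tune each $\gamma^{(b)}$ separately. However, a good choice of $\gamma^{(b)}$ depends on the unknown random quantity
$$
\overline{Q}^{(b)} = \sum_{t \in T^{(b)}} \Bigl(1 + \frac{Q_t^{(b)}}{2^{b+1}}\Bigr).
$$
To overcome this problem, we slightly modify Exp3-DOM by applying a doubling trick$^{10}$ to guess $\overline{Q}^{(b)}$ for each $b$. Specifically, for each $b = 0, 1, \ldots, \lfloor \log_2 K \rfloor$, we use a sequence $\gamma_r^{(b)} = \sqrt{(2^b \ln K)/2^r}$, for $r = 0, 1, \ldots$. We initially run the algorithm with $\gamma_0^{(b)}$. Whenever the algorithm is running with $\gamma_r^{(b)}$ and observes that $\sum_s \overline{Q}_s^{(b)} > 2^r$, where the sum is over all $s$ so far in $T^{(b)}$ and $\overline{Q}_s^{(b)} = 1 + Q_s^{(b)}/2^{b+1}$ denotes the corresponding summand of $\overline{Q}^{(b)}$,$^{11}$ then we restart the algorithm with $\gamma_{r+1}^{(b)}$. Because the contribution of instance $b$ to (2) is
$$
\frac{2^b \ln K}{\gamma^{(b)}} + \gamma^{(b)} \sum_{t \in T^{(b)}} \Bigl(1 + \frac{Q_t^{(b)}}{2^{b+1}}\Bigr),
$$
the regret we pay when using any $\gamma_r^{(b)}$ is at most $2\sqrt{(2^b \ln K)\, 2^r}$. The largest $r$ we need is $\bigl\lceil \log_2 \overline{Q}^{(b)} \bigr\rceil$ and
$$
\sum_{r=0}^{\lceil \log_2 \overline{Q}^{(b)} \rceil} 2^{r/2} < 5 \sqrt{\overline{Q}^{(b)}}.
$$
Footnote 10: The pseudo-code for the variant of Exp3-DOM using such a doubling trick is not displayed in this extended abstract.
Footnote 11: Notice that $\sum_s \overline{Q}_s^{(b)}$ is an observable quantity.

Since we pay regret at most 1 for each restart, we get
$$
\mathbb{E}\bigl[L_{A,T} - L_{k,T}\bigr] \le c \sum_{b=0}^{\lfloor \log_2 K \rfloor} \mathbb{E}\left[\sqrt{(\ln K)\Bigl(2^b\, |T^{(b)}| + \frac{1}{2} \sum_{t \in T^{(b)}} Q_t^{(b)}\Bigr)} + \bigl\lceil \log_2 \overline{Q}^{(b)} \bigr\rceil\right]
$$
for some positive constant $c$. Taking into account that
$$
\begin{aligned}
\sum_{b=0}^{\lfloor \log_2 K \rfloor} 2^b\, |T^{(b)}| &\le 2 \sum_{t=1}^T |R_t|, \\
\sum_{b=0}^{\lfloor \log_2 K \rfloor} \sum_{t \in T^{(b)}} Q_t^{(b)} &= \sum_{t=1}^T Q_t^{(b_t)}, \\
\sum_{b=0}^{\lfloor \log_2 K \rfloor} \bigl\lceil \log_2 \overline{Q}^{(b)} \bigr\rceil &= O\bigl((\ln K) \ln(KT)\bigr),
\end{aligned}
$$
we obtain
$$
\begin{aligned}
\mathbb{E}\bigl[L_{A,T} - L_{k,T}\bigr]
&\le c \sum_{b=0}^{\lfloor \log_2 K \rfloor} \mathbb{E}\left[\sqrt{(\ln K)\Bigl(2^b\, |T^{(b)}| + \frac{1}{2} \sum_{t \in T^{(b)}} Q_t^{(b)}\Bigr)}\right] + O\bigl((\ln K) \ln(KT)\bigr) \\
&\le c\, \lfloor \log_2 K \rfloor\, \mathbb{E}\left[\sqrt{\frac{\ln K}{\lfloor \log_2 K \rfloor} \sum_{t=1}^T \Bigl(2 |R_t| + \frac{1}{2} Q_t^{(b_t)}\Bigr)}\right] + O\bigl((\ln K) \ln(KT)\bigr) \\
&= O\left((\ln K)\, \mathbb{E}\left[\sqrt{\sum_{t=1}^T \Bigl(4 |R_t| + Q_t^{(b_t)}\Bigr)}\right] + (\ln K) \ln(KT)\right)
\end{aligned}
$$
as desired.

Proof of Corollary 8
We start off from the upper bound (3) in the statement of Theorem 7. We want to bound the quantities $|R_t|$ and $Q_t^{(b_t)}$ occurring therein at any step $t$ in which a restart does not occur (the regret for the time steps when a restart occurs is already accounted for by the term $O\bigl((\ln K) \ln(KT)\bigr)$ in (3)). Now, Lemma 11 gives $|R_t| = O\bigl(\alpha(G_t) \ln K\bigr)$. If $\gamma_t = \gamma^{(b_t)}$ for any time $t$ when a restart does not occur, it is not hard to see that $\gamma_t = \Omega\bigl(\sqrt{(\ln K)/(KT)}\bigr)$. Moreover, Lemma 13 states that
$$
Q_t = O\bigl(\alpha(G_t) \ln(K^2/\gamma_t) + |R_t|\bigr) = O\bigl(\alpha(G_t) \ln(K/\gamma_t)\bigr).
$$
Hence,
$$
Q_t = O\bigl(\alpha(G_t) \ln(KT)\bigr).
$$
Putting together as in (3) gives the desired result.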
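The bound on $|R_t|$ used in the proof of Corollary 8 comes from Lemma 11, and it can be illustrated with a short Python sketch of a greedy dominating-set construction. This is a minimal sketch under stated assumptions rather than the pseudo-code of the paper: nodes are labelled $0, \ldots, K-1$, arcs are `(i, j)` pairs meaning that playing $i$ reveals the loss of $j$, every node observes itself, ties are broken by smallest index, and the independence number is obtained by brute force on the undirected version of the graph (small $K$ only). All function names are hypothetical.

```python
import itertools
import math
import random


def greedy_dominating_set(K, arcs):
    """Greedy Set Cover on the sets S_i = {i} plus the out-neighbors of i."""
    cover = {i: {i} | {j for (a, j) in arcs if a == i} for i in range(K)}
    uncovered = set(range(K))
    R = []
    while uncovered:
        # Pick the node whose observation set covers most uncovered nodes.
        best = max(range(K), key=lambda i: len(cover[i] & uncovered))
        R.append(best)
        uncovered -= cover[best]
    return R


def independence_number(K, arcs):
    """Brute-force independence number, ignoring arc orientation (small K only)."""
    adjacent = {frozenset(arc) for arc in arcs}
    for size in range(K, 0, -1):
        for subset in itertools.combinations(range(K), size):
            if all(frozenset(pair) not in adjacent
                   for pair in itertools.combinations(subset, 2)):
                return size
    return 0


def random_digraph(K, r, rng):
    """Each ordered pair (i, j) with i != j becomes an arc with probability r."""
    return {(i, j) for i in range(K) for j in range(K)
            if i != j and rng.random() < r}


if __name__ == "__main__":
    rng = random.Random(0)
    K = 8
    arcs = random_digraph(K, 0.3, rng)
    R = greedy_dominating_set(K, arcs)
    alpha = independence_number(K, arcs)
    bound = math.ceil(2 * alpha * math.log(K)) + 1
    print(f"|R| = {len(R)}, alpha = {alpha}, ceil(2 alpha ln K) + 1 = {bound}")
```

The printed size of the greedy set can be compared with $\lceil 2\alpha \ln K \rceil + 1$, the second term of the minimum in Lemma 11.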