Optimization, Learning, and Games With Predictable Sequences


University of Pennsylvania ScholarlyCommons, Statistics Papers, Wharton Faculty Research, 2013.

Optimization, Learning, and Games With Predictable Sequences

Alexander Rakhlin, University of Pennsylvania
Karthik Sridharan, University of Pennsylvania

Recommended Citation: Rakhlin, A., & Sridharan, K. (2013). Optimization, Learning, and Games With Predictable Sequences. NIPS 2013. This paper is posted at ScholarlyCommons.


Optimization, Learning, and Games with Predictable Sequences

Alexander Rakhlin (University of Pennsylvania)    Karthik Sridharan (University of Pennsylvania)

November 2013. arXiv:1311.1869v1 [cs.LG] 8 Nov 2013

Abstract

We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to Hölder-smooth functions, and apply the results to saddle-point type problems. Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T)/T). This addresses a question of Daskalakis et al [6]. Further, we consider a partial information version of the problem. We then apply the results to convex programming and exhibit a simple algorithm for the approximate Max Flow problem.

1 Introduction

Recently, no-regret algorithms have received increasing attention in a variety of communities, including theoretical computer science, optimization, and game theory [3, 1]. The wide applicability of these algorithms is arguably due to the black-box regret guarantees that hold for arbitrary sequences. However, such regret guarantees can be loose if the sequence being encountered is not worst-case. The reduction in arbitrariness of the sequence can arise from the particular structure of the problem at hand, and should be exploited. For instance, in some applications of online methods, the sequence comes from an additional computation done by the learner, thus being far from arbitrary. One way to formally capture the partially benign nature of data is through a notion of predictable sequences [11]. We exhibit applications of this idea in several domains.
First, we show that the Mirror Prox method [9], designed for optimizing non-smooth structured saddle-point problems, can be viewed as an instance of the predictable sequence approach. Predictability in this case is due precisely to smoothness of the inner optimization part and the saddle-point structure of the problem. We extend the results to Hölder-smooth functions, interpolating between the case of well-predictable gradients and unpredictable gradients.

Second, we address the question raised in [6] about the existence of simple algorithms that converge at the rate of Õ(1/T) when employed in an uncoupled manner by players in a zero-sum finite matrix game, yet maintain the usual O(1/√T) rate against arbitrary sequences. We give a positive answer and exhibit a fully adaptive algorithm that does not require prior knowledge of whether the other player is collaborating. Here, the additional predictability comes from the fact that both players attempt to converge to the minimax value. We also tackle a partial information version of the problem where the player has access only to the real-valued payoff of the mixed actions played by the two players on each round, rather than the entire payoff vector.

Our third application is to convex programming: optimization of a linear function subject to convex constraints. This problem often arises in theoretical computer science, and we show that the idea of predictable sequences can be used here too. We provide a simple algorithm for ε-approximate Max Flow for a graph with d edges with time complexity Õ(d^{3/2}/ε), a performance previously obtained through a relatively involved procedure [8].

2 Online Learning with Predictable Gradient Sequences

Let us describe the online convex optimization (OCO) problem and the basic algorithm studied in [4, 11]. Let F be a convex set of moves of the learner. On round t = 1, ..., T, the learner makes a prediction f_t ∈ F and observes a convex function G_t on F. The objective is to keep the regret (1/T) Σ_{t=1}^T [G_t(f_t) − G_t(f*)] small for any f* ∈ F. Let R be a 1-strongly convex function w.r.t. some norm ‖·‖ on F, and let g_0 = argmin_{g∈F} R(g). Suppose that at the beginning of every round t, the learner has access to M_t, a vector computable based on the past observations or side information. In this paper we study the Optimistic Mirror Descent algorithm, defined by the interleaved sequence

f_t = argmin_{f∈F} η_t ⟨f, M_t⟩ + D_R(f, g_{t−1}),    g_t = argmin_{g∈F} η_t ⟨g, ∇G_t(f_t)⟩ + D_R(g, g_{t−1})    (1)

where D_R is the Bregman divergence with respect to R, and {η_t} is a sequence of step sizes that can be chosen adaptively based on the sequence observed so far. The method adheres to the OCO protocol since M_t is available at the beginning of round t, and ∇G_t(f_t) becomes available after the prediction f_t is made. The sequence {f_t} will be called primary, while {g_t} secondary. This method was proposed in [4] for M_t = ∇G_{t−1}(f_{t−1}), and the following lemma is a straightforward extension of the result in [11] for general M_t:

Lemma 1. Let F be a convex set in a Banach space B. Let R : B → R be a 1-strongly convex function on F with respect to some norm ‖·‖, and let ‖·‖* denote the dual norm. For any fixed step size η, the Optimistic Mirror Descent Algorithm yields, for any f* ∈ F,

Σ_{t=1}^T [G_t(f_t) − G_t(f*)] ≤ Σ_{t=1}^T ⟨f_t − f*, ∇_t⟩ ≤ η^{−1} R² + Σ_{t=1}^T ‖∇_t − M_t‖* ‖g_t − f_t‖ − (1/(2η)) Σ_{t=1}^T (‖g_t − f_t‖² + ‖g_{t−1} − f_t‖²)    (2)

where R ≥ 0 is such that D_R(f*, g_0) ≤ R², and ∇_t = ∇G_t(f_t).

When applying the lemma, we will often use the simple fact that

‖∇_t − M_t‖* ‖g_t − f_t‖ = inf_{ρ>0} { (ρ/2) ‖∇_t − M_t‖*² + (1/(2ρ)) ‖g_t − f_t‖² }.    (3)

In particular, by setting ρ = η, we obtain the (unnormalized) regret bound of η^{−1} R² + (η/2) Σ_{t=1}^T ‖∇_t − M_t‖*², which is R √(2 Σ_{t=1}^T ‖∇_t − M_t‖*²) by choosing η optimally.
Since this choice is not known ahead of time, one may either employ the doubling trick, or choose the step size adaptively:

Corollary 2. Consider the step size

η_t = R_max · min{ ( √(Σ_{i=1}^{t−1} ‖∇_i − M_i‖*²) + √(Σ_{i=1}^{t−2} ‖∇_i − M_i‖*²) )^{−1}, 1 }

with R_max² = sup_{f,g∈F} D_R(f, g). Then the regret of the Optimistic Mirror Descent algorithm is upper bounded by

3.5 R_max ( √(Σ_{t=1}^T ‖∇_t − M_t‖*²) + 1 ).
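As an illustration (not part of the original analysis), the updates in (1) take a particularly simple form when R(g) = ½‖g‖₂² on F = R^d: the Bregman divergence becomes the squared Euclidean distance and each argmin reduces to a plain gradient step. The following is a minimal sketch with a fixed step size; the function names and the choice of hint (the previous gradient) are illustrative.

```python
import numpy as np

def optimistic_md(grad, hint, eta, dim, T):
    """Optimistic Mirror Descent with R(g) = 0.5*||g||^2 on F = R^dim,
    so D_R(f, g) = 0.5*||f - g||^2 and each argmin in (1) is a plain
    gradient step.  `hint(prev_grad)` supplies the predictable M_t."""
    g = np.zeros(dim)            # secondary sequence g_t
    prev_grad = np.zeros(dim)
    played = []
    for t in range(T):
        M = hint(prev_grad)      # predictable sequence M_t
        f = g - eta * M          # primary step: f_t
        nabla = grad(t, f)       # observe gradient of G_t at f_t
        g = g - eta * nabla      # secondary step: g_t
        played.append(f)
        prev_grad = nabla
    return np.array(played)
```

On a stationary smooth objective such as G_t(f) = ½‖f − z‖², the hint M_t = ∇_{t−1} quickly becomes exact and the primary iterates lock onto the minimizer; against arbitrary gradient sequences the method retains the usual mirror-descent guarantee of Lemma 1.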

These results indicate that tighter regret bounds are possible if one can guess the next gradient ∇_t by computing M_t. One such case arises in offline optimization of a smooth function, whereby the previous gradient turns out to be a good proxy for the next one. More precisely, suppose we aim to optimize a function G(f) whose gradients are Lipschitz continuous: ‖∇G(f) − ∇G(g)‖* ≤ H ‖f − g‖ for some H > 0. In this optimization setting, no guessing of M_t is needed: we may simply query the oracle for the gradient and set M_t = ∇G(g_{t−1}). The Optimistic Mirror Descent then becomes

f_t = argmin_{f∈F} η ⟨f, ∇G(g_{t−1})⟩ + D_R(f, g_{t−1}),    g_t = argmin_{g∈F} η ⟨g, ∇G(f_t)⟩ + D_R(g, g_{t−1})

which can be recognized as the Mirror Prox method, due to Nemirovski [9]. By smoothness, ‖∇G(f_t) − M_t‖* = ‖∇G(f_t) − ∇G(g_{t−1})‖* ≤ H ‖f_t − g_{t−1}‖. Lemma 1 with Eq. (3) and ρ = η = 1/H immediately yields a bound

Σ_{t=1}^T [G(f_t) − G(f*)] ≤ H R²,

which implies that the average f̄_T = (1/T) Σ_{t=1}^T f_t satisfies G(f̄_T) − G(f*) ≤ H R²/T, a known bound for Mirror Prox. We now extend this result to arbitrary α-Hölder smooth functions, that is, convex functions G such that ‖∇G(f) − ∇G(g)‖* ≤ H ‖f − g‖^α for all f, g ∈ F.

Lemma 3. Let F be a convex set in a Banach space B and let R : B → R be a 1-strongly convex function on F with respect to some norm ‖·‖. Let G be a convex α-Hölder smooth function with constant H > 0 and α ∈ [0, 1]. Then the average f̄_T = (1/T) Σ_{t=1}^T f_t of the trajectory given by the Optimistic Mirror Descent Algorithm enjoys

G(f̄_T) − inf_{f∈F} G(f) ≤ 8 H R^{1+α} / T^{(1+α)/2}

where R ≥ 0 is such that sup_{f∈F} D_R(f, g_0) ≤ R².

This result provides a smooth interpolation between the 1/√T rate at α = 0 (that is, no predictability of the gradient is possible) and the 1/T rate when the smoothness structure allows for a dramatic speedup with a very simple modification of the original Mirror Descent.

3 Structured Optimization

In this section we consider the structured optimization problem

argmin_{f∈F} G(f)

where G(f) is of the form G(f) = sup_{x∈X} φ(f, x), with φ(·, x) convex for every x ∈ X and φ(f, ·) concave for every f ∈ F. Both F and X are assumed to be convex sets.
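In the Euclidean case, the Mirror Prox specialization above (M_t = ∇G(g_{t−1}), ρ = η = 1/H) reduces to the classical extra-gradient step. A minimal sketch under assumed setup (unconstrained domain, a gradient oracle, known smoothness constant H; all names illustrative):

```python
import numpy as np

def mirror_prox(grad, H, x0, T):
    """Euclidean Mirror Prox: Optimistic MD with the hint equal to the
    gradient at the previous secondary point and step size eta = 1/H,
    where H is the Lipschitz constant of the gradient."""
    eta = 1.0 / H
    g = np.array(x0, dtype=float)    # secondary iterate
    avg = np.zeros_like(g)
    for _ in range(T):
        f = g - eta * grad(g)        # primary step: predict with the old gradient
        g = g - eta * grad(f)        # secondary step: correct with the new gradient
        avg += f
    return avg / T                   # average of the primary trajectory
```

The returned average is the point for which the O(HR²/T) guarantee above holds; the extra gradient evaluation per round is the price of predictability.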
While G itself need not be smooth, it has been recognized that the structure can be exploited to improve rates of optimization if the function φ is smooth [10]. From the point of view of online learning, we will see that the optimization problem of the saddle-point type can be solved by playing two online convex optimization algorithms against each other (henceforth called Players I and II). Specifically, assume that Player I produces a sequence f_1, ..., f_T by using a regret-minimization algorithm, such that

(1/T) Σ_{t=1}^T φ(f_t, x_t) − inf_{f∈F} (1/T) Σ_{t=1}^T φ(f, x_t) ≤ Rate¹(x_1, ..., x_T)    (4)

and Player II produces x_1, ..., x_T with

(1/T) Σ_{t=1}^T (−φ(f_t, x_t)) − inf_{x∈X} (1/T) Σ_{t=1}^T (−φ(f_t, x)) ≤ Rate²(f_1, ..., f_T).    (5)

By a standard argument (see e.g. [7]),

inf_f (1/T) Σ_t φ(f, x_t) ≤ inf_f φ(f, x̄_T) ≤ sup_x inf_f φ(f, x) ≤ inf_f sup_x φ(f, x) ≤ sup_x φ(f̄_T, x) ≤ sup_x (1/T) Σ_t φ(f_t, x),

where f̄_T = (1/T) Σ_t f_t and x̄_T = (1/T) Σ_t x_t. By adding (4) and (5), we have

sup_{x∈X} (1/T) Σ_t φ(f_t, x) − inf_{f∈F} (1/T) Σ_t φ(f, x_t) ≤ Rate¹(x_1, ..., x_T) + Rate²(f_1, ..., f_T)    (6)

which sandwiches the previous sequence of inequalities up to the sum of the regret rates and implies near-optimality of f̄_T and x̄_T.

Lemma 4. Suppose both players employ the Optimistic Mirror Descent algorithm with, respectively, predictable sequences M_t¹ and M_t², 1-strongly convex functions R₁ on F (w.r.t. ‖·‖_F) and R₂ on X (w.r.t. ‖·‖_X), and fixed learning rates η and η′. Let {f_t} and {x_t} denote the primary sequences of the players, while {g_t}, {y_t} denote the secondary. Then for any α, β > 0,

sup_{x∈X} φ(f̄_T, x) − inf_{f∈F} sup_{x∈X} φ(f, x)    (7)
≤ (1/T) [ R₁²/η + R₂²/η′
+ Σ_t ( (α/2)‖∇_f φ(f_t, x_t) − M_t¹‖_{F*}² + (1/(2α))‖g_t − f_t‖_F² ) − (1/(2η)) Σ_t ( ‖g_t − f_t‖_F² + ‖g_{t−1} − f_t‖_F² )
+ Σ_t ( (β/2)‖∇_x φ(f_t, x_t) − M_t²‖_{X*}² + (1/(2β))‖y_t − x_t‖_X² ) − (1/(2η′)) Σ_t ( ‖y_t − x_t‖_X² + ‖y_{t−1} − x_t‖_X² ) ]

where R₁ and R₂ are such that D_{R₁}(f*, g_0) ≤ R₁² and D_{R₂}(x*, y_0) ≤ R₂², and f̄_T = (1/T) Σ_t f_t.

The proof of Lemma 4 is immediate from Lemma 1. We obtain the following corollary:

Corollary 5. Suppose φ : F × X → R is Hölder smooth in the following sense:

‖∇_f φ(f, x) − ∇_f φ(g, x)‖_{F*} ≤ H₁ ‖f − g‖_F^{α₁},    ‖∇_f φ(f, x) − ∇_f φ(f, y)‖_{F*} ≤ H₂ ‖x − y‖_X^{α₂},

and

‖∇_x φ(f, x) − ∇_x φ(g, x)‖_{X*} ≤ H₄ ‖f − g‖_F^{β₂},    ‖∇_x φ(f, x) − ∇_x φ(f, y)‖_{X*} ≤ H₃ ‖x − y‖_X^{β₁}.

Let γ = min{α₁, α₂, β₁, β₂} and H = max{H₁, H₂, H₃, H₄}. Suppose both players employ Optimistic Mirror Descent with M_t¹ = ∇_f φ(g_{t−1}, y_{t−1}) and M_t² = ∇_x φ(g_{t−1}, y_{t−1}), where {g_t} and {y_t} are the secondary sequences updated by the two algorithms, and with step sizes η = η′ = (R₁² + R₂²)^{(1−γ)/2} (2H)^{−1} T^{−(1−γ)/2}. Then

sup_{x∈X} φ(f̄_T, x) − inf_{f∈F} sup_{x∈X} φ(f, x) ≤ 4H (R₁² + R₂²)^{(1+γ)/2} / T^{(1+γ)/2}.    (8)

As revealed in the proof of this corollary, the negative terms in (7), which come from an upper bound on the regret of Player I, in fact contribute to cancellations with positive terms in the regret of Player II, and vice versa.
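To make the two-player scheme of Lemma 4 concrete, here is a minimal Euclidean sketch on a toy smooth saddle-point problem; the hints are the partial gradients at the secondary pair, mirroring the choice in Corollary 5. The function φ, the step size, and all names are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def saddle_omd(gf, gx, z0, eta, T):
    """Two Euclidean Optimistic MD players for min_f max_x phi(f, x).
    gf, gx: partial gradients of phi w.r.t. f and x.
    Hints = gradients at the secondary pair (g, y)."""
    g, y = z0                     # secondary iterates
    fbar = xbar = 0.0
    for _ in range(T):
        f = g - eta * gf(g, y)    # primary steps driven by the hints
        x = y + eta * gx(g, y)    # (Player II ascends)
        g = g - eta * gf(f, x)    # secondary steps use observed gradients
        y = y + eta * gx(f, x)
        fbar += f
        xbar += x
    return fbar / T, xbar / T

# phi(f, x) = f^2/2 + f*x - x^2/2: convex in f, concave in x, saddle at (0, 0)
fbar, xbar = saddle_omd(lambda f, x: f + x, lambda f, x: f - x,
                        z0=(1.5, 1.5), eta=0.25, T=4000)
```

The averaged pair (fbar, xbar) approaches the saddle point, in line with the sandwich argument around (6): the sum of the two players' regrets controls the duality gap of the averages.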
Such a coupling of the upper bounds on the regret of the two players can be seen as leading to faster rates under the appropriate assumptions, and this idea will be exploited to a great extent in the proofs of the next section.

4 Zero-sum Game and Uncoupled Dynamics

The notions of a zero-sum matrix game and a minimax equilibrium are arguably the most basic and important notions of game theory. The tight connection between linear programming and minimax equilibrium suggests that there might be simple dynamics that can lead the two players of the game to eventually converge to the equilibrium value. Existence of such simple or natural dynamics is of interest in behavioral economics, where one asks whether agents can discover static solution concepts of the game iteratively and without extensive communication.

More formally, let A ∈ [−1, 1]^{n×m} be a matrix with bounded entries. The two players aim to find a pair of near-optimal mixed strategies (f̄, x̄) ∈ Δ_n × Δ_m such that f̄ᵀA x̄ is close to the minimax value min_{f∈Δ_n} max_{x∈Δ_m} fᵀAx, where Δ_n is the probability simplex over n actions. Of course, this is a particular form of the saddle-point problem considered in the previous section, with φ(f, x) = fᵀAx. It is well known (and follows immediately from (6)) that the players can compute near-optimal strategies by simply playing no-regret algorithms [7]. More precisely, on round t, Players I and II predict the mixed strategies f_t and x_t and observe A x_t and f_tᵀA, respectively. While black-box regret minimization algorithms, such as Exponential Weights, immediately yield O(T^{−1/2}) convergence rates, Daskalakis et al [6] asked whether faster methods exist. To make the problem well-posed, it is required that the two players are strongly uncoupled: neither A nor the number of available actions of the opponent is known to either player, no "funny bit arithmetic" is allowed, and the memory storage of each player allows only for a constant number of payoff vectors. The authors of [6] exhibited a near-optimal algorithm that, if used by both players, yields a pair of mixed strategies that constitutes an O(log(m+n)(log T + (log(m+n))^{3/2})/T)-approximate minimax equilibrium. Furthermore, the method has a regret bound of the same order as Exponential Weights when faced with an arbitrary sequence. The algorithm in [6] is an application of the excessive gap technique of Nesterov, and requires careful choreography and interleaving of rounds between the two non-communicating players. The authors, therefore, asked whether a simple algorithm (e.g.
a modification of Exponential Weights) can in fact achieve the same result. We answer this in the affirmative. While a direct application of Mirror Prox does not yield the result (and also does not provide strong decoupling), below we show that a modification of Optimistic Mirror Descent achieves the goal. Furthermore, by choosing the step size adaptively, the same method guarantees the typical O(T^{−1/2}) regret if not faced with a compliant player, thus ensuring robustness.

In Section 4.1, we analyze the first-order information version of the problem, as described above: upon playing the respective mixed strategies f_t and x_t on round t, Player I observes A x_t and Player II observes f_tᵀA. Then, in Section 4.2, we consider an interesting extension to partial information, whereby the players submit their moves f_t, x_t but only observe the real value f_tᵀA x_t. Recall that in both cases the matrix A is not known to the players.

4.1 First-Order Information

Consider the following simple algorithm. Initialize f_0 = g_0 ∈ Δ_n and x_0 = y_0 ∈ Δ_m to be uniform distributions, set β = 1/T² and proceed as follows:

On round t, Player I performs

  Play f_t and observe A x_t
  Update  g_t(i) ∝ g_{t−1}(i) exp{−η_t [A x_t]_i},    g_t = (1 − β) g_t + (β/n) 1_n
          f_{t+1}(i) ∝ g_t(i) exp{−η_{t+1} [A x_t]_i}

while simultaneously Player II performs

  Play x_t and observe f_tᵀA
  Update  y_t(i) ∝ y_{t−1}(i) exp{η_t [f_tᵀA]_i},    y_t = (1 − β) y_t + (β/m) 1_m
          x_{t+1}(i) ∝ y_t(i) exp{η_{t+1} [f_tᵀA]_i}

Here, 1_n ∈ R^n is a vector of all ones, and both [b]_i and b(i) refer to the i-th coordinate of a vector b. Other than the mixing in of the uniform distribution, the algorithm for both players is simply Optimistic Mirror Descent with the (negative) entropy function. In fact, the step of mixing in the uniform distribution is only needed when some coordinate of g_t (resp., y_t) is smaller than 1/(nT²). Furthermore, this step is also not needed if none of the players deviate from the prescribed method.
In such a case, the resulting algorithm is simply the constant step-size Exponential Weights

f_t(i) ∝ exp{ −η ( Σ_{s=1}^{t−2} [A x_s]_i + 2 [A x_{t−1}]_i ) },

but with a factor 2 in front of the latest loss vector!

Proposition 6. Let A ∈ [−1, 1]^{n×m}, F = Δ_n, X = Δ_m. If both players use the above algorithm with, respectively, M_t¹ = A x_{t−1} and M_t² = f_{t−1}ᵀA, and the adaptive step sizes

η_t = min{ log(nT²) ( √(Σ_{i=1}^{t−1} ‖A x_i − A x_{i−1}‖∞²) + √(Σ_{i=1}^{t−2} ‖A x_i − A x_{i−1}‖∞²) )^{−1}, 1 }

and

η′_t = min{ log(mT²) ( √(Σ_{i=1}^{t−1} ‖f_iᵀA − f_{i−1}ᵀA‖∞²) + √(Σ_{i=1}^{t−2} ‖f_iᵀA − f_{i−1}ᵀA‖∞²) )^{−1}, 1 }

respectively, then the pair (f̄_T, x̄_T) is an O((log m + log n + log T)/T)-approximate minimax equilibrium. Furthermore, if only one player (say, Player I) follows the above algorithm, her regret against any sequence x_1, ..., x_T of plays is

O( ( √( log(nT) Σ_{t=1}^T ‖A x_t − A x_{t−1}‖∞² ) + log(nT) ) / T ).    (9)

In particular, this implies the worst-case regret of O(√(log(nT)/T)) in the general setting of online linear optimization.

We remark that (9) can give intermediate rates for regret in the case that the second player deviates from the prescribed strategy but produces stable moves. For instance, if the second player employs a mirror descent algorithm (or Follow the Regularized Leader / Exponential Weights method) with step size η, one can typically show stability ‖x_t − x_{t−1}‖₁ = O(η). In this case, (9) yields the rate O(η√(T log(nT))/T) for the first player. A typical setting of η ∝ T^{−1/2} for the second player still ensures the O(√(log(nT))/T) regret for the first player.

Let us finish with a technical remark. The reason for the extra step of mixing in the uniform distribution stems from the goal of having an adaptive and robust method that still attains O(T^{−1/2}) regret if the other player deviates from using the algorithm. If one is only interested in the dynamics when both players cooperate, this step is not necessary, and in this case the extraneous log T factor disappears from the above bound, leading to the O((log n + log m)/T) convergence. On the technical side, the need for the extra step is the following. The adaptive step size result of Corollary 2 involves the term R_max² ≥ sup_g D_R(f*, g), which is potentially infinite for the negative entropy function R.
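The cooperative dynamics of this subsection can be sketched as follows (omitting the uniform-mixing step and the adaptive step size, which matter only for robustness against a deviating opponent; the payoff matrix, step size, and horizon below are illustrative assumptions):

```python
import numpy as np

def optimistic_ew(A, T, eta):
    """Both players run constant-step Optimistic MD with the entropy
    regularizer, i.e. the Exponential Weights variant above.  The row
    player minimizes f^T A x; the column player maximizes it."""
    n, m = A.shape
    g, y = np.ones(n) / n, np.ones(m) / m      # secondary weights
    Mf, Mx = np.zeros(n), np.zeros(m)          # hints = last round's payoff vectors
    fbar, xbar = np.zeros(n), np.zeros(m)
    for _ in range(T):
        f = g * np.exp(-eta * Mf); f /= f.sum()    # primary (optimistic) steps
        x = y * np.exp(+eta * Mx); x /= x.sum()
        lf, lx = A @ x, A.T @ f                    # observed payoff vectors
        g = g * np.exp(-eta * lf); g /= g.sum()    # secondary (standard EW) steps
        y = y * np.exp(+eta * lx); y /= y.sum()
        Mf, Mx = lf, lx
        fbar += f; xbar += x
    return fbar / T, xbar / T
```

Eliminating the secondary sequence recovers exactly the "factor 2 on the latest loss" form displayed above; the duality gap of the averaged strategies shrinks much faster than for plain Exponential Weights, since each player's loss sequence stabilizes.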
It is possible that the doubling trick or the analysis of Auer et al [2] (who encountered the same problem for the Exponential Weights algorithm) can remove the extra log T factor while still preserving the regret minimization property. We also remark that R_max is small when R is instead the p-norm; hence, the use of this regularizer avoids the extraneous logarithmic-in-T factor while still preserving the logarithmic dependence on n and m. However, projection onto the simplex under the p-norm is not as elegant as the Exponential Weights update.

4.2 Partial Information

We now turn to the partial (or, zeroth-order) information model. Recall that the matrix A is not known to the players, yet we are interested in finding ε-optimal minimax strategies. On each round, the two players choose mixed strategies f_t ∈ Δ_n and x_t ∈ Δ_m, respectively, and observe f_tᵀA x_t. Now the question is, how many such observations do we need to get to an ε-optimal minimax strategy? Can this be done while still ensuring the usual no-regret rate?

The specific setting we consider below requires that on each round t, the two players play four times, and that these four plays are δ-close to each other (that is, ‖f_t^i − f_t^j‖₁ ≤ δ for i, j ∈ {1, ..., 4}). Interestingly, up to logarithmic factors, the fast rate of the previous section is possible even in this scenario, but we do require knowledge of the number of actions of the opposing player (or an upper bound on this number). We leave it as an open problem whether one can attain the 1/T-type rate with only one play per round.

Player I

u_1, ..., u_{n−1}: orthonormal basis of Δ_n
Initialize g_1, f_1 = (1/n) 1_n; draw i_0 ∼ Unif([n−1])
At time t = 1 to T:
  Play f_t; draw i_t ∼ Unif([n−1])
  Observe:
    r_t⁺ = (f_t + δ u_{i_t})ᵀ A x_t,    r_t⁻ = (f_t − δ u_{i_t})ᵀ A x_t
    r̄_t⁺ = (f_t + δ u_{i_t})ᵀ A x_t,    r̄_t⁻ = (f_t − δ u_{i_t})ᵀ A x_t
  Build estimates:
    â_t = (n/(2δ)) (r_t⁺ − r_t⁻) u_{i_t},    ā_t = (n/(2δ)) (r̄_t⁺ − r̄_t⁻) u_{i_t}
  Update:
    g_t(i) ∝ g_{t−1}(i) exp{−η_t â_t(i)},    g_t = (1 − β) g_t + (β/n) 1_n
    f_{t+1}(i) ∝ g_t(i) exp{−η_{t+1} ā_t(i)}
End

Player II

v_1, ..., v_{m−1}: orthonormal basis of Δ_m
Initialize y_1, x_1 = (1/m) 1_m; draw j_0 ∼ Unif([m−1])
At time t = 1 to T:
  Play x_t; draw j_t ∼ Unif([m−1])
  Observe:
    s_t⁺ = f_tᵀ A (x_t + δ v_{j_t}),    s_t⁻ = f_tᵀ A (x_t − δ v_{j_t})
    s̄_t⁺ = f_tᵀ A (x_t + δ v_{j_t}),    s̄_t⁻ = f_tᵀ A (x_t − δ v_{j_t})
  Build estimates:
    b̂_t = (m/(2δ)) (s_t⁺ − s_t⁻) v_{j_t},    b̄_t = (m/(2δ)) (s̄_t⁺ − s̄_t⁻) v_{j_t}
  Update:
    y_t(i) ∝ y_{t−1}(i) exp{η′_t b̂_t(i)},    y_t = (1 − β) y_t + (β/m) 1_m
    x_{t+1}(i) ∝ y_t(i) exp{η′_{t+1} b̄_t(i)}
End

(The barred quantities r̄_t^±, s̄_t^± correspond to the second pair of δ-close plays on round t, used to build the predictable estimates ā_t, b̄_t.)

Lemma 7. Let A ∈ [−1, 1]^{n×m}, F = Δ_n, X = Δ_m, let δ be small enough (e.g. exponentially small in m, n, T), and let β = 1/T². If both players use the above algorithms with the adaptive step sizes

η_t = min{ log(nT²) ( √(Σ_{i=1}^{t−1} ‖â_i − ā_i‖∞²) + √(Σ_{i=1}^{t−2} ‖â_i − ā_i‖∞²) )^{−1}, 1/(8m log m) }

and

η′_t = min{ log(mT²) ( √(Σ_{i=1}^{t−1} ‖b̂_i − b̄_i‖∞²) + √(Σ_{i=1}^{t−2} ‖b̂_i − b̄_i‖∞²) )^{−1}, 1/(8n log n) }

respectively, then the pair (f̄_T, x̄_T) is an O((m log(m) log(n) + n log(n) log(m))/T)-approximate minimax equilibrium. Furthermore, if only one player (say, Player I) follows the above algorithm, her regret against any sequence x_1, ..., x_T of plays is bounded by

O( ( m √( log(m) log(n) Σ_{t=1}^T ‖x_t − x_{t−1}‖₁² ) + n log(n) ) / T ).

We leave it as an open problem to find an algorithm that attains the 1/T-type rate when both players only observe the value e_iᵀ A e_j = A_{i,j} upon drawing pure actions i, j from their respective mixed strategies f_t, x_t. We hypothesize that a rate better than T^{−1/2} is not possible in this scenario.

5 Approximate Smooth Convex Programming

In this section we show how one can use the structured optimization results from Section 3 for approximately solving convex programming problems. Specifically, consider the optimization problem

argmax_{f∈G} cᵀf    s.t.    G_i(f) ≤ 1  for all  i ∈ [d]    (10)

where G is a convex set and each G_i is an H-smooth convex function. Let the optimal value of the above optimization problem be given by F* > 0, and without loss of generality assume F* is known (one typically performs binary search if it is not known). Define the sets F = {f : f ∈ G, cᵀf = F*} and X = Δ_d. The convex programming problem in (10) can now be reformulated as

argmin_{f∈F} max_{i∈[d]} G_i(f) = argmin_{f∈F} sup_{x∈X} Σ_{i=1}^d x(i) G_i(f).    (11)

This problem is in the saddle-point form, as studied earlier in the paper. We may think of the first player as aiming to minimize the above expression over F, while the second player maximizes over a mixture of constraints with the aim of violating at least one of them.

Lemma 8. Fix γ, ε > 0. Assume there exists f_0 ∈ G such that cᵀf_0 ≥ 0 and for every i ∈ [d], G_i(f_0) ≤ 1 − γ. Suppose each G_i is 1-Lipschitz over F. Consider the solution

f̂ = (1 − α) f̄_T + α f_0,   where α = ε/(ε + γ)

and f̄_T = (1/T) Σ_{t=1}^T f_t is the average of the trajectory of the procedure in Lemma 4 for the optimization problem (11). Let R₁(·) = ½‖·‖₂² and let R₂ be the entropy function. Further, let B be a known constant such that B ≥ ‖f* − g_0‖₂, where g_0 ∈ F is some initialization and f* ∈ F is the (unknown) solution to the optimization problem. Set

η = argmin_{η < 1/(2H)} { B²/η + (η log d)/(1 − 2ηH) },    η′ = η/(1 − 2ηH),

M_t¹ = Σ_{i=1}^d y_{t−1}(i) ∇G_i(g_{t−1}) and M_t² = (G_1(g_{t−1}), ..., G_d(g_{t−1})). Let the number of iterations T be such that

T > (1/ε) inf_{η < 1/(2H)} { B²/η + (η log d)/(1 − 2ηH) }.

We then have that f̂ ∈ G satisfies all d constraints and is (ε/γ)-approximate, that is,

cᵀf̂ ≥ (1 − ε/γ) F*.

Lemma 8 tells us that, using the predictable sequences approach for the two players, one can obtain an (ε/γ)-approximate solution to the smooth convex programming problem in a number of iterations at most of order 1/ε. If τ₁ (resp. τ₂) is the time complexity for a single update of the predictable sequence algorithm of Player I (resp. Player II), then the time complexity of the overall procedure is O((τ₁ + τ₂)/ε).
5.1 Application to Max Flow

We now apply the above result to the problem of finding Max Flow between a source and a sink in a network, such that the capacity constraint on each edge is satisfied. For simplicity, consider a network where each edge has capacity 1 (the method can be easily extended to the case of varying capacities). Suppose the number of edges d in the network is of the same order as the number of vertices in the network.

The Max Flow problem can be seen as an instance of a convex (linear) programming problem, and we apply the proposed algorithm for structured optimization to obtain an approximate solution. For the Max Flow problem, the sets G and F are given by sets of linear equalities. Further, if we use the squared Euclidean norm as the regularizer for the flow player, then the projection step can be performed in O(d) time using the conjugate gradient method. This is because we are simply minimizing the squared Euclidean norm subject to equality constraints, which is well conditioned. Hence τ₁ = O(d). Similarly, the Exponential Weights update has time complexity O(d), as there are order d constraints, and so the overall time complexity to produce an ε-approximate solution is given by O(nd), where n is the number of iterations of the proposed procedure. Once again, we shall assume that we know the value of the maximum flow F* (for, otherwise, we can use binary search to obtain it).
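As a toy illustration of the reformulation (11), here is a plain (non-optimistic) sketch of the dynamics suggested above: gradient descent for the primal player against Exponential Weights over the constraints. The instance, step sizes, and names are illustrative assumptions; the paper's procedure additionally equips both players with the predictable sequences of Lemma 4.

```python
import numpy as np

def gd_vs_ew(vals, grads, f0, T, eta_f, eta_x):
    """Saddle-point dynamics for min_f max_i G_i(f):
    Player I runs gradient descent on f -> sum_i x(i) G_i(f);
    Player II runs Exponential Weights over the d constraints."""
    f = np.array(f0, dtype=float)
    d = len(vals)
    x = np.ones(d) / d
    fbar = np.zeros_like(f)
    for _ in range(T):
        Gf = np.array([G(f) for G in vals])              # constraint values G_i(f)
        step = sum(xi * g(f) for xi, g in zip(x, grads)) # gradient of the mixture
        f = f - eta_f * step                             # primal player descends
        x = x * np.exp(eta_x * Gf); x /= x.sum()         # EW ascends on the mixture
        fbar += f
    return fbar / T

# Toy instance of min_f max_i G_i(f): G_1 = (f-1)^2, G_2 = (f+1)^2 on the line;
# the minimax point is f = 0 with value 1.
fbar = gd_vs_ew([lambda f: (f[0] - 1)**2, lambda f: (f[0] + 1)**2],
                [lambda f: np.array([2 * (f[0] - 1)]),
                 lambda f: np.array([2 * (f[0] + 1)])],
                f0=[0.7], T=5000, eta_f=0.05, eta_x=0.02)
```

The averaged primal iterate approximately minimizes the worst constraint; in the Max Flow setting, the descent step would additionally be projected onto the flow-conservation equalities.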

Corollary 9. Applying the procedure for smooth convex programming from Lemma 8 to the Max Flow problem, with f_0 = 0 ∈ G the zero flow, the time complexity to compute an ε-approximate Max Flow is bounded by

O( d^{3/2} √(log d) / ε ).

This time complexity matches the known result from [8], but with a much simpler procedure (gradient descent for the flow player and Exponential Weights for the constraints). It would be interesting to see whether the techniques presented here can be used to improve the dependence on d to d^{4/3} or better while maintaining the 1/ε dependence. While the result of [5] has the improved d^{4/3} dependence, the complexity in terms of ε is much worse.

6 Discussion

We close this paper with a discussion. As we showed, the notion of using extra information about the sequence is a powerful tool with applications in optimization, convex programming, and game theory, to name a few. All the applications considered in this paper, however, used some notion of smoothness for constructing the predictable process M_t. An interesting direction of further research is to isolate more general conditions under which the next gradient is predictable, perhaps even when the functions are not smooth in any sense. For instance, one could use techniques from bundle methods to further restrict the set of possible gradients the function being optimized can have at various points in the feasible set. This could then be used to solve for the right predictable sequence to use so as to optimize the bounds. Using this notion of selecting predictable sequences, one can hope to derive adaptive optimization procedures that in practice can provide rapid convergence.

Acknowledgements: We thank Vianney Perchet for insightful discussions. We gratefully acknowledge the support of NSF under grants CAREER DMS and CCF-698, as well as the Dean's Research Fund.

References

[1] S. Arora, E. Hazan, and S. Kale. The multiplicative weights update method: A meta-algorithm and applications. Theory of Computing, 8(1):121–164, 2012.

[2] P. Auer, N. Cesa-Bianchi, and C. Gentile. Adaptive and self-confident on-line learning algorithms. Journal of Computer and System Sciences, 64(1):48–75, 2002.

[3] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

[4] C.-K. Chiang, T. Yang, C.-J. Lee, M. Mahdavi, C.-J. Lu, R. Jin, and S. Zhu. Online optimization with gradual variations. In COLT, 2012.

[5] P. Christiano, J. A. Kelner, A. Madry, D. A. Spielman, and S.-H. Teng. Electrical flows, laplacian systems, and faster approximation of maximum flow in undirected graphs. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, ACM, 2011.

[6] C. Daskalakis, A. Deckelbaum, and A. Kim. Near-optimal no-regret algorithms for zero-sum games. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2011.

[7] Y. Freund and R. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79–103, 1999.

[8] A. Goldberg and S. Rao. Beyond the flow decomposition barrier. Journal of the ACM, 45(5):783–797, 1998.

[9] A. Nemirovski. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.

[10] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, 2005.

[11] A. Rakhlin and K. Sridharan. Online learning with predictable sequences. In Proceedings of the 26th Annual Conference on Learning Theory (COLT), 2013.

A Proofs

Proof of Lemma 1. For any f* ∈ F,

⟨f_t − f*, ∇_t⟩ = ⟨f_t − g_t, ∇_t − M_t⟩ + ⟨f_t − g_t, M_t⟩ + ⟨g_t − f*, ∇_t⟩.    (12)

First observe that

⟨f_t − g_t, ∇_t − M_t⟩ ≤ ‖f_t − g_t‖ ‖∇_t − M_t‖*.    (13)

Any update of the form a* = argmin_{a∈A} ⟨a, x⟩ + D_R(a, c) satisfies, for any d ∈ A,

⟨a* − d, x⟩ ≤ D_R(d, c) − D_R(d, a*) − D_R(a*, c).    (14)

This yields

⟨f_t − g_t, M_t⟩ ≤ (1/η)(D_R(g_t, g_{t−1}) − D_R(g_t, f_t) − D_R(f_t, g_{t−1}))    (15)

and

⟨g_t − f*, ∇_t⟩ ≤ (1/η)(D_R(f*, g_{t−1}) − D_R(f*, g_t) − D_R(g_t, g_{t−1})).    (16)

Combining, ⟨f_t − f*, ∇_t⟩ is upper bounded by

‖∇_t − M_t‖* ‖f_t − g_t‖ + (1/η)(D_R(g_t, g_{t−1}) − D_R(g_t, f_t) − D_R(f_t, g_{t−1})) + (1/η)(D_R(f*, g_{t−1}) − D_R(f*, g_t) − D_R(g_t, g_{t−1}))
= ‖∇_t − M_t‖* ‖f_t − g_t‖ + (1/η)(D_R(f*, g_{t−1}) − D_R(f*, g_t) − D_R(g_t, f_t) − D_R(f_t, g_{t−1}))
≤ ‖∇_t − M_t‖* ‖f_t − g_t‖ + (1/η)(D_R(f*, g_{t−1}) − D_R(f*, g_t) − ½‖g_t − f_t‖² − ½‖g_{t−1} − f_t‖²)    (17)

where in the last step we used strong convexity: for any f, f′, D_R(f, f′) ≥ ½‖f − f′‖². Summing over t = 1, ..., T yields, for any f* ∈ F,

Σ_{t=1}^T ⟨f_t − f*, ∇_t⟩ ≤ (1/η) D_R(f*, g_0) + Σ_{t=1}^T ‖∇_t − M_t‖* ‖g_t − f_t‖ − (1/(2η)) Σ_{t=1}^T (‖g_t − f_t‖² + ‖g_{t−1} − f_t‖²).

Appealing to convexity of the G_t's completes the proof.

13 Proof of Corollary. Let us re-work the proof of Lemma for the case of a changing η t. Eq. (5) and (6) are now replaced by f t g t, M t η t (D R (g t, g t ) D R (g t, f t ) D R (f t, g t )) (8) and g t f, t η t (D R (f, g t ) D R (f, g t ) D R (g t, g t )). (9) he upper bound of Eq. (7) becomes t M t f t g t + η t (D R (f, g t ) D R (f, g t ) g t f t g t f t ). Summing over t=,..., yields, for any f F, Observe that f t f, t η D R(f, g 0 )+ D R (f, g t )( )+ t M t η t η g t f t t t= ( g t f t + g t f t ) η t (η + η )R max + t M t g t f t η t ( g t f t + g t f t ) (0) η t = R max min i= t i M i + i= t i M i i= t = R max min i M i i= t i M i t M t,, () () From (), η t Rmax max t i= i M i,. Using this step size in Equation (0) and defining η =, f t f, t is upper bounded by R max t M t + + t M t g t f t R max t M t + + t+ t M t η + η t ( g t f t + g t f t ) t+ f t g t η t g t f t where we used (3) with ρ= η t+ and dropped one of the positive terms. he last two terms can be upper bounded as η t+ f t g t η t g t f t R max η (η t+ η t ) R max η +,

yielding an upper bound of

2 R_max ( √(Σ_t ‖∇_t − M_t‖*²) + 1 ) + Σ_t (η_{t+1}/2) ‖∇_t − M_t‖*² + R_max ( √(Σ_t ‖∇_t − M_t‖*²) + 1 )
= 3 R_max ( √(Σ_t ‖∇_t − M_t‖*²) + 1 ) + Σ_t (η_{t+1}/2) ‖∇_t − M_t‖*².

In view of (21),

Σ_t (η_{t+1}/2) ‖∇_t − M_t‖*² ≤ (R_max/2) Σ_t ‖∇_t − M_t‖*² ( √(Σ_{i=1}^t ‖∇_i − M_i‖*²) + √(Σ_{i=1}^{t−1} ‖∇_i − M_i‖*²) )^{−1}
= (R_max/2) Σ_t ( √(Σ_{i=1}^t ‖∇_i − M_i‖*²) − √(Σ_{i=1}^{t−1} ‖∇_i − M_i‖*²) ) = (R_max/2) √(Σ_{i=1}^T ‖∇_i − M_i‖*²),

and we arrive at

3 R_max ( √(Σ_t ‖∇_t − M_t‖*²) + 1 ) + (R_max/2) √(Σ_t ‖∇_t − M_t‖*²) ≤ 3.5 R_max ( √(Σ_{t=1}^T ‖∇_t − M_t‖*²) + 1 ).

Proof of Lemma 3. Let ∇_t = ∇G(f_t) and M_t = ∇G(g_{t−1}). Then by Lemma 1 and Hölder smoothness,

Σ_t ⟨f_t − f*, ∇_t⟩ ≤ R²/η + H Σ_t ‖f_t − g_{t−1}‖^α ‖g_t − f_t‖ − (1/(2η)) Σ_t (‖g_t − f_t‖² + ‖g_{t−1} − f_t‖²).    (23)

By (3) with ρ = η, the middle term is at most (ηH²/2) Σ_t ‖f_t − g_{t−1}‖^{2α} + (1/(2η)) Σ_t ‖g_t − f_t‖², and the second sum cancels in (23), leaving

Σ_t ⟨f_t − f*, ∇_t⟩ ≤ R²/η + (ηH²/2) Σ_t ‖f_t − g_{t−1}‖^{2α} − (1/(2η)) Σ_t ‖g_{t−1} − f_t‖².

By Hölder's inequality with conjugate powers 1/p = 1 − α and 1/q = α,

Σ_t ‖f_t − g_{t−1}‖^{2α} ≤ T^{1−α} ( Σ_t ‖f_t − g_{t−1}‖² )^α,

and, writing s = Σ_t ‖g_{t−1} − f_t‖², the AM-GM inequality gives

(ηH²/2) T^{1−α} s^α − s/(2η) ≤ η^{(1+α)/(1−α)} H^{2/(1−α)} T.

Plugging in,

Σ_t ⟨f_t − f*, ∇_t⟩ ≤ R²/η + η^{(1+α)/(1−α)} H^{2/(1−α)} T.

Setting η = R^{1−α} H^{−1} T^{−(1−α)/2} yields an upper bound of

Σ_t ⟨f_t − f*, ∇_t⟩ ≤ 2 H R^{1+α} T^{(1−α)/2} ≤ 8 H R^{1+α} T^{(1−α)/2};

dividing by T and using convexity of G gives the statement of the lemma.

Proof of Corollary 5. Using Lemma 4 with the choices α = 2η and β = 2η′,

sup_{x∈X} φ(f̄_T, x) − inf_{f∈F} sup_{x∈X} φ(f, x)
≤ (1/T) [ R₁²/η + η Σ_t ‖∇_f φ(f_t, x_t) − M_t¹‖_{F*}² − (1/(4η)) Σ_t (‖g_t − f_t‖_F² + ‖g_{t−1} − f_t‖_F²)
+ R₂²/η′ + η′ Σ_t ‖∇_x φ(f_t, x_t) − M_t²‖_{X*}² − (1/(4η′)) Σ_t (‖y_t − x_t‖_X² + ‖y_{t−1} − x_t‖_X²) ].

Using (a + b)² ≤ 2a² + 2b² and the smoothness assumption,

η ‖∇_f φ(f_t, x_t) − ∇_f φ(g_{t−1}, y_{t−1})‖_{F*}² ≤ 2η ‖∇_f φ(f_t, x_t) − ∇_f φ(g_{t−1}, x_t)‖_{F*}² + 2η ‖∇_f φ(g_{t−1}, x_t) − ∇_f φ(g_{t−1}, y_{t−1})‖_{F*}²
≤ 2η H₁² ‖f_t − g_{t−1}‖_F^{2α₁} + 2η H₂² ‖x_t − y_{t−1}‖_X^{2α₂},

and similarly

η′ ‖∇_x φ(f_t, x_t) − ∇_x φ(g_{t−1}, y_{t−1})‖_{X*}² ≤ 2η′ H₃² ‖x_t − y_{t−1}‖_X^{2β₁} + 2η′ H₄² ‖f_t − g_{t−1}‖_F^{2β₂}.

Note the coupling: the positive terms in ‖f_t − g_{t−1}‖_F arising from the regret bounds of both players are balanced against the negative term −(1/(4η)) Σ_t ‖g_{t−1} − f_t‖_F², and symmetrically the positive terms in ‖x_t − y_{t−1}‖_X are balanced against −(1/(4η′)) Σ_t ‖y_{t−1} − x_t‖_X². Each balanced pair is handled by Hölder's inequality, e.g.

Σ_t ‖f_t − g_{t−1}‖_F^{2α₁} ≤ T^{1−α₁} ( Σ_t ‖f_t − g_{t−1}‖_F² )^{α₁},

followed by the AM-GM step used in the proof of Lemma 3.

Setting $\eta_1 = \eta_2 = \eta$ and writing $H = \max\{H_1,H_2,H_3,H_4\}$ and $\gamma = \min\{\alpha,\beta\}$, the bound simplifies (for $\eta H \le 1$) to
$$\frac{R_1^2+R_2^2}{\eta} + 4\,T\,H\,(H\eta)^{\frac{1+\gamma}{1-\gamma}}.$$
Finally, picking the step size $\eta = (R_1^2+R_2^2)^{\frac{1-\gamma}{2}}\,H^{-1}\,T^{-\frac{1-\gamma}{2}}$ we conclude that
$$\sup_{x\in X}\phi\Big(\frac{1}{T}\sum_{t=1}^{T}f_t,\,x\Big) - \inf_{f\in F}\sup_{x\in X}\phi(f,x) \le 4H\,(R_1^2+R_2^2)^{\frac{1+\gamma}{2}}\,T^{-\frac{1+\gamma}{2}}. \qquad(5)$$

Proof of Proposition 6. Let $\mathcal R_1(f)= \sum_{i=1}^{n} f(i)\ln f(i)$ and, respectively, $\mathcal R_2(x)= \sum_{i=1}^{m} x(i)\ln x(i)$. These functions are strongly convex with respect to the $\ell_1$ norm on the respective flat simplex. We first upper bound the regret of Player I, writing $\nabla_t$ for a generic observation vector, later to be chosen as $Ax_t$, and $M_t$ for a generic predictable sequence, later chosen to be $Ax_{t-1}$. Observe that $\|g_t' - g_t\|_1 \le 1/T$. Let $f^* = e_{i^*}$ be a vertex of the simplex. Then
$$\langle f_t - f^*, \nabla_t\rangle = \langle f_t - g_t, \nabla_t - M_t\rangle + \langle f_t - g_t, M_t\rangle + \langle g_t - f^*, \nabla_t\rangle.$$
By the update rule,
$$\langle f_t - g_t, M_t\rangle \le \frac{1}{\eta_t}\big(D_{\mathcal R_1}(g_t, g_{t-1}') - D_{\mathcal R_1}(g_t, f_t) - D_{\mathcal R_1}(f_t, g_{t-1}')\big)$$
and
$$\langle g_t - f^*, \nabla_t\rangle \le \frac{1}{\eta_t}\big(D_{\mathcal R_1}(f^*, g_{t-1}') - D_{\mathcal R_1}(f^*, g_t) - D_{\mathcal R_1}(g_t, g_{t-1}')\big).$$
We conclude that $\langle f_t - f^*, \nabla_t\rangle$ is upper bounded by
$$\|\nabla_t - M_t\|_\infty\,\|f_t - g_t\|_1 + \frac{1}{\eta_t}\big(D_{\mathcal R_1}(g_t, g_{t-1}') - D_{\mathcal R_1}(g_t, f_t) - D_{\mathcal R_1}(f_t, g_{t-1}')\big) + \frac{1}{\eta_t}\big(D_{\mathcal R_1}(f^*, g_{t-1}') - D_{\mathcal R_1}(f^*, g_t) - D_{\mathcal R_1}(g_t, g_{t-1}')\big)$$
$$= \|\nabla_t - M_t\|_\infty\,\|f_t - g_t\|_1 + \frac{1}{\eta_t}\big(D_{\mathcal R_1}(f^*, g_{t-1}') - D_{\mathcal R_1}(f^*, g_t) - D_{\mathcal R_1}(g_t, f_t) - D_{\mathcal R_1}(f_t, g_{t-1}')\big).$$

Using strong convexity, the term involving the four divergences can be further upper bounded by
$$\frac{1}{\eta_t}\Big(D_{\mathcal R_1}(f^*, g_{t-1}') - D_{\mathcal R_1}(f^*, g_t) - \frac12\|g_t-f_t\|_1^2 - \frac12\|g_{t-1}'-f_t\|_1^2\Big)$$
$$= \frac{1}{\eta_t}\Big(D_{\mathcal R_1}(f^*, g_{t-1}') - D_{\mathcal R_1}(f^*, g_t') - \frac12\|g_t-f_t\|_1^2 - \frac12\|g_{t-1}'-f_t\|_1^2\Big) + \frac{1}{\eta_t}\big(D_{\mathcal R_1}(f^*, g_t') - D_{\mathcal R_1}(f^*, g_t)\big)$$
$$= \frac{1}{\eta_t}\Big(D_{\mathcal R_1}(f^*, g_{t-1}') - D_{\mathcal R_1}(f^*, g_t') - \frac12\|g_t-f_t\|_1^2 - \frac12\|g_{t-1}'-f_t\|_1^2\Big) + \frac{1}{\eta_t}\ln\frac{g_t(i^*)}{g_t'(i^*)},$$
where $f^*$ is 1 on coordinate $i^*$ and 0 everywhere else, and the norm is the $\ell_1$ norm. Now let us bound the last term. First, whenever $g_t'(i^*) \ge g_t(i^*)$ the term is negative. Since $g_t'(i^*) = (1-1/T)g_t(i^*) + 1/(nT)$, this happens whenever $g_t(i^*) \le 1/n$. On the other hand, for $g_t(i^*) > 1/n$ we can bound
$$\ln\frac{g_t(i^*)}{g_t'(i^*)} = \ln\frac{g_t(i^*)}{(1-1/T)g_t(i^*)+1/(nT)} \le \ln\frac{1}{1-1/T} \le \frac{2}{T}.$$
Using the above in the bound on $\langle f_t - f^*,\nabla_t\rangle$, summing over $t=1,\ldots,T$, and using the fact that the step sizes are non-increasing, we conclude that
$$\sum_{t=1}^{T}\langle f_t - f^*,\nabla_t\rangle \le \frac{1}{\eta_1}D_{\mathcal R_1}(f^*, g_0') + \sum_{t=2}^{T}D_{\mathcal R_1}(f^*, g_{t-1}')\Big(\frac{1}{\eta_t}-\frac{1}{\eta_{t-1}}\Big) + \sum_{t=1}^{T}\|\nabla_t - M_t\|_\infty\|g_t-f_t\|_1 - \sum_{t=1}^{T}\frac{1}{2\eta_t}\big(\|g_t-f_t\|_1^2+\|g_{t-1}'-f_t\|_1^2\big) + \sum_{t=1}^{T}\frac{2}{\eta_t T}$$
$$\le \Big(\frac{1}{\eta_1}+\frac{1}{\eta_T}\Big)R_{1,\max} + \sum_{t=1}^{T}\|\nabla_t - M_t\|_\infty\|g_t-f_t\|_1 - \sum_{t=1}^{T}\frac{1}{2\eta_t}\big(\|g_t-f_t\|_1^2+\|g_{t-1}'-f_t\|_1^2\big) + \frac{2}{T}\sum_{t=1}^{T}\frac{1}{\eta_t}, \qquad(6)$$
where $R_{1,\max}$ is an upper bound on the largest KL divergence between $f^*$ and any $g$ that has all coordinates at least $1/(nT)$. Since $f^*$ is a vertex of the flat simplex, we may take $R_{1,\max} \le \log(nT)$. Also note that $1/\eta_t \le c$ in the regime of interest, and so $\frac{2}{T}\sum_{t=1}^{T}\frac{1}{\eta_t} \le 2c$ for $T$ large enough. Hence we conclude that a bound on the regret of Player I is given by
$$\sum_{t=1}^{T}\langle f_t - f^*,\nabla_t\rangle \le \Big(\frac{1}{\eta_1}+\frac{1}{\eta_T}\Big)R_{1,\max} + \sum_{t=1}^{T}\|Ax_t - Ax_{t-1}\|_\infty\|g_t-f_t\|_1 - \sum_{t=1}^{T}\frac{1}{2\eta_t}\big(\|g_t-f_t\|_1^2+\|g_{t-1}'-f_t\|_1^2\big) + 2. \qquad(7)$$
Observe that, by the definition of the step size,
$$\frac{1}{\eta_{t+1}} - \frac{1}{\eta_t} \le \sqrt{R_{1,\max}^{-1}\sum_{i=1}^{t}\|Ax_i - Ax_{i-1}\|_\infty^2} - \sqrt{R_{1,\max}^{-1}\sum_{i=1}^{t-1}\|Ax_i - Ax_{i-1}\|_\infty^2},$$
while
$$\frac{1}{\eta_T} \le \max\Big\{\sqrt{R_{1,\max}^{-1}\sum_{i=1}^{T}\|Ax_i - Ax_{i-1}\|_\infty^2},\;1\Big\}.$$
With this, the upper bound on Player I's unnormalized regret is
$$2 + R_{1,\max} + 2\sqrt{R_{1,\max}\sum_{t=1}^{T}\|Ax_t - Ax_{t-1}\|_\infty^2} + \sum_{t=1}^{T}\|Ax_t - Ax_{t-1}\|_\infty\|g_t-f_t\|_1 - \sum_{t=1}^{T}\frac{1}{2\eta_t}\big(\|g_t-f_t\|_1^2+\|g_{t-1}'-f_t\|_1^2\big).$$

Adding the regret of the second player, who uses step size $\eta_t'$, the overall bound on the suboptimality, as in Eq. (6), is
$$\frac{1}{T}\bigg[4 + R_{1,\max} + R_{2,\max} + 2\sqrt{R_{1,\max}\sum_{t=1}^{T}\|Ax_t - Ax_{t-1}\|_\infty^2} + 2\sqrt{R_{2,\max}\sum_{t=1}^{T}\|A^\top f_t - A^\top f_{t-1}\|_\infty^2} + \sum_{t=1}^{T}\|Ax_t - Ax_{t-1}\|_\infty\|g_t-f_t\|_1 + \sum_{t=1}^{T}\|A^\top f_t - A^\top f_{t-1}\|_\infty\|y_t-x_t\|_1 - \sum_{t=1}^{T}\frac{1}{2\eta_t}\big(\|g_t-f_t\|_1^2+\|g_{t-1}'-f_t\|_1^2\big) - \sum_{t=1}^{T}\frac{1}{2\eta_t'}\big(\|y_t-x_t\|_1^2+\|y_{t-1}'-x_t\|_1^2\big)\bigg].$$
By over-bounding with $\sqrt c \le c+1$ for $c\ge 0$, each square-root term is replaced by the corresponding sum plus 1. Since each entry of the matrix is bounded by 1,
$$\|Ax_t - Ax_{t-1}\|_\infty \le \|x_t - x_{t-1}\|_1 \le \|x_t - y_{t-1}\|_1 + \|x_{t-1} - y_{t-1}\|_1,$$
and a similar inequality holds for the other player too. Collecting terms and canceling the resulting squared distances against the negative terms, this leads to an upper bound of
$$\frac{1}{T}\bigg[6 + R_{1,\max} + R_{2,\max} + 5\sum_{t=1}^{T}\big(\|g_{t-1}-f_t\|_1^2 - \|g_{t-1}'-f_t\|_1^2\big) + 5\sum_{t=1}^{T}\big(\|y_{t-1}-x_t\|_1^2 - \|y_{t-1}'-x_t\|_1^2\big)\bigg]. \qquad(8)$$
Now note that
$$\|g_{t-1}-f_t\|_1^2 - \|g_{t-1}'-f_t\|_1^2 = \big(\|g_{t-1}-f_t\|_1 + \|g_{t-1}'-f_t\|_1\big)\big(\|g_{t-1}-f_t\|_1 - \|g_{t-1}'-f_t\|_1\big) \le \big(\|g_{t-1}-f_t\|_1 + \|g_{t-1}'-f_t\|_1\big)\,\|g_{t-1}-g_{t-1}'\|_1 \le \frac{4}{T}. \qquad(9)$$

Similarly, we also have that $\|y_{t-1}-x_t\|_1^2 - \|y_{t-1}'-x_t\|_1^2 \le \frac{4}{T}$. Using these in Eq. (8), we conclude that the overall bound on the suboptimality is
$$\frac{6 + R_{1,\max} + R_{2,\max} + 40}{T} = \frac{6 + \log(nT) + \log(mT) + 40}{T} = \frac{6 + \log(nmT^2) + 40}{T}.$$
This proves the result for the case when both players adhere to the prescribed algorithm.

Now, consider the case when Player I adheres, but we do not make any assumption about Player II. Then, from Eq. (7) and Eq. (3) with $\rho = 1/\eta_t$, the upper bound on $\sum_{t=1}^{T}\langle f_t - f^*,\nabla_t\rangle$, the unnormalized regret of Player I, is
$$\Big(\frac{1}{\eta_1}+\frac{1}{\eta_T}\Big)R_{1,\max} + 2 + \sum_{t=1}^{T}\|Ax_t - Ax_{t-1}\|_\infty\|g_t-f_t\|_1 - \sum_{t=1}^{T}\frac{1}{2\eta_t}\big(\|g_t-f_t\|_1^2+\|g_{t-1}'-f_t\|_1^2\big)$$
$$\le \Big(\frac{1}{\eta_1}+\frac{1}{\eta_T}\Big)R_{1,\max} + 2 + \sum_{t=1}^{T}\frac{\eta_{t+1}}{2}\|Ax_t - Ax_{t-1}\|_\infty^2 + \sum_{t=1}^{T}\Big(\frac{1}{2\eta_{t+1}}-\frac{1}{2\eta_t}\Big)\|g_t-f_t\|_1^2.$$
Now, using the definition of the step size,
$$\sum_{t=1}^{T}\eta_{t+1}\|Ax_t - Ax_{t-1}\|_\infty^2 \le 2\sqrt{R_{1,\max}\sum_{t=1}^{T}\|Ax_t - Ax_{t-1}\|_\infty^2},$$
while
$$\sum_{t=1}^{T}\Big(\frac{1}{2\eta_{t+1}}-\frac{1}{2\eta_t}\Big)\|g_t-f_t\|_1^2 \le \frac{2}{\eta_{T+1}} \le 2\max\Big\{\sqrt{R_{1,\max}^{-1}\sum_{i=1}^{T}\|Ax_i - Ax_{i-1}\|_\infty^2},\,1\Big\}.$$
Combining, we get an upper bound of
$$2 + \Big(\frac{1}{\eta_1}+\frac{1}{\eta_T}\Big)R_{1,\max} + 2\sqrt{R_{1,\max}\sum_{t=1}^{T}\|Ax_t - Ax_{t-1}\|_\infty^2} + 4\max\Big\{\sqrt{R_{1,\max}^{-1}\sum_{i=1}^{T}\|Ax_i - Ax_{i-1}\|_\infty^2},\,1\Big\},$$
concluding the proof.

Proof of Lemma 7. We start with the observation that $\hat a_t$ and $\bar a_t$ are unbiased estimates of $Ax_t$ and $Ax_{t-1}$, respectively. That is, $\mathbb E_{i_t}[\hat a_t] = Ax_t$ and $\mathbb E_{i_t}[\bar a_t] = Ax_{t-1}$. Hence we have
$$\mathbb E\Big[\sum_{t=1}^{T} f_t^\top Ax_t - \inf_{f\in\Delta_n}\sum_{t=1}^{T} f^\top Ax_t\Big] \le \mathbb E\Big[\sum_{t=1}^{T}\langle f_t, \hat a_t\rangle - \inf_{f\in\Delta_n}\sum_{t=1}^{T}\langle f, \hat a_t\rangle\Big].$$

Using the same line of proof as the one used to arrive at Eq. (7) in Proposition 6, we get that the unnormalized regret for Player I can be upper bounded as
$$\mathbb E\Big[\sum_{t=1}^{T} f_t^\top Ax_t - \inf_{f\in\Delta_n}\sum_{t=1}^{T} f^\top Ax_t\Big] \le \mathbb E\Big[\Big(\frac{1}{\eta_1}+\frac{1}{\eta_T}\Big)R_{1,\max} + 2 + \sum_{t=1}^{T}\|\hat a_t - \bar a_t\|_\infty\|g_t-f_t\|_1 - \sum_{t=1}^{T}\frac{1}{2\eta_t}\big(\|g_t-f_t\|_1^2+\|g_{t-1}'-f_t\|_1^2\big)\Big]$$
$$\le \mathbb E\Big[\Big(\frac{1}{\eta_1}+\frac{1}{\eta_T}\Big)R_{1,\max} + 2 + \sum_{t=1}^{T}\frac{\eta_{t+1}}{2}\|\hat a_t - \bar a_t\|_\infty^2 + \sum_{t=1}^{T}\Big(\frac{1}{2\eta_{t+1}}-\frac{1}{2\eta_t}\Big)\|g_t-f_t\|_1^2 - \sum_{t=1}^{T}\frac{1}{4\eta_t}\big(\|g_t-f_t\|_1^2+\|g_{t-1}'-f_t\|_1^2\big)\Big].$$
Since $\|g_t-f_t\|_1^2 \le 4$, we upper bound the above by
$$\mathbb E\Big[R_{1,\max}\Big(\frac{1}{\eta_1}+\frac{1}{\eta_T}\Big) + 2 + \sum_{t=1}^{T}\frac{\eta_{t+1}}{2}\|\hat a_t - \bar a_t\|_\infty^2 + \frac{2}{\eta_{T+1}} - \sum_{t=1}^{T}\frac{1}{4\eta_t}\big(\|g_t-f_t\|_1^2+\|g_{t-1}'-f_t\|_1^2\big)\Big].$$
Since
$$\eta_t = \min\Big\{\sqrt{R_{1,\max}}\Big(\sum_{i=1}^{t-1}\|\hat a_i - \bar a_i\|_\infty^2\Big)^{-1/2},\ \frac{1}{8mR_{1,\max}}\Big\}$$
and
$$\frac{1}{\eta_t} \le \max\Big\{\sqrt{R_{1,\max}^{-1}\sum_{i=1}^{t}\|\hat a_i - \bar a_i\|_\infty^2},\ 8mR_{1,\max}\Big\},$$
the upper bound on Player I's unnormalized regret is
$$\mathbb E\Big[\sum_{t=1}^{T} f_t^\top Ax_t - \inf_{f\in\Delta_n}\sum_{t=1}^{T} f^\top Ax_t\Big] \le 56mR_{1,\max}^2 + 2 + 7\sqrt{R_{1,\max}\,\mathbb E\Big[\sum_{t=1}^{T}\|\hat a_t - \bar a_t\|_\infty^2\Big]} - \frac{1}{7mR_{1,\max}}\,\mathbb E\Big[\sum_{t=1}^{T}\big(\|g_t-f_t\|_1^2+\|g_{t-1}'-f_t\|_1^2\big)\Big]. \qquad(30)$$

Both players are honest: We first consider the case when both players play the prescribed algorithm. In this case, a similar regret bound holds for Player II. Adding the regret of the second player, who uses step size $\eta_t'$, the overall bound on the suboptimality, as in Eq. (6), is
$$\frac{1}{T}\bigg[4 + 56R_{1,\max}R_{2,\max}\big(mR_{1,\max} + nR_{2,\max}\big) + 7\sqrt{R_{1,\max}\,\mathbb E\sum_{t=1}^{T}\|\hat a_t - \bar a_t\|_\infty^2} + 7\sqrt{R_{2,\max}\,\mathbb E\sum_{t=1}^{T}\|\hat b_t - \bar b_t\|_\infty^2} - \frac{1}{7mR_{1,\max}}\,\mathbb E\sum_{t=1}^{T}\big(\|g_t-f_t\|_1^2+\|g_{t-1}'-f_t\|_1^2\big) - \frac{1}{7nR_{2,\max}}\,\mathbb E\sum_{t=1}^{T}\big(\|y_t-x_t\|_1^2+\|y_{t-1}'-x_t\|_1^2\big)\bigg].$$
Now note that, by the construction of the estimates,
$$\|\hat a_t - \bar a_t\|_\infty \le n\,\|A(x_t - x_{t-1})\|_\infty \le n\,\|x_t - x_{t-1}\|_1.$$

Similarly, we have $\|\hat b_t - \bar b_t\|_\infty \le m\,\|f_t - f_{t-1}\|_1$. Hence, using this, we can bound the suboptimality by
$$\frac{1}{T}\bigg[4 + 56R_{1,\max}R_{2,\max}\big(mR_{1,\max}+nR_{2,\max}\big) + 7n\sqrt{R_{1,\max}\,\mathbb E\sum_{t=1}^{T}\|x_t - x_{t-1}\|_1^2} + 7m\sqrt{R_{2,\max}\,\mathbb E\sum_{t=1}^{T}\|f_t - f_{t-1}\|_1^2} - \frac{1}{7mR_{1,\max}}\,\mathbb E\sum_{t=1}^{T}\big(\|g_t-f_t\|_1^2+\|g_{t-1}'-f_t\|_1^2\big) - \frac{1}{7nR_{2,\max}}\,\mathbb E\sum_{t=1}^{T}\big(\|y_t-x_t\|_1^2+\|y_{t-1}'-x_t\|_1^2\big)\bigg].$$
Using the fact that $\sqrt c \le c+1$, we further bound the suboptimality by the same expression with each square root replaced by the corresponding sum, at the additional cost of $7nR_{1,\max} + 7mR_{2,\max}$. Now note that
$$\|x_t - x_{t-1}\|_1 \le \|x_t - y_t\|_1 + \|x_{t-1} - y_t\|_1 \qquad\text{and similarly}\qquad \|f_t - f_{t-1}\|_1 \le \|f_t - g_t\|_1 + \|f_{t-1} - g_t\|_1.$$
Hence, combining the resulting positive quadratic terms with the negative ones exactly as in the proof of Proposition 6 (using $\|g_t'-g_t\|_1 \le 1/T$ and $\|y_t'-y_t\|_1 \le 1/T$), we can conclude that the suboptimality is bounded by
$$\frac{1}{T}\Big[4 + 56R_{1,\max}R_{2,\max}\big(mR_{1,\max}+nR_{2,\max}\big) + 7nR_{1,\max} + 7mR_{2,\max} + 8\big(mR_{2,\max}+nR_{1,\max}\big)\Big].$$
Just as in the proof of Proposition 6 we have $R_{1,\max} \le \log(nT)$ and $R_{2,\max} \le \log(mT)$, and so overall we get the bound on the suboptimality:
$$\frac{1}{T}\Big[4 + 56\log(nT)\log(mT)\big(m\log(nT)+n\log(mT)\big) + 7n\log(nT) + 7m\log(mT) + 8\big(m\log(mT)+n\log(nT)\big)\Big].$$

Player II deviates from the algorithm: Now let us consider the case when Player II deviates from the prescribed algorithm. In this case, starting from Eq. (30) and simply dropping the negative term, we get
$$\mathbb E\Big[\sum_{t=1}^{T} f_t^\top Ax_t - \inf_{f\in\Delta_n}\sum_{t=1}^{T} f^\top Ax_t\Big] \le 56mR_{1,\max}^2 + 2 + 7\sqrt{R_{1,\max}\,\mathbb E\Big[\sum_{t=1}^{T}\|\hat a_t - \bar a_t\|_\infty^2\Big]}.$$

As we noted earlier, $\|\hat a_t - \bar a_t\|_\infty \le n\,\|x_t - x_{t-1}\|_1$, and so
$$\mathbb E\Big[\sum_{t=1}^{T} f_t^\top Ax_t - \inf_{f\in\Delta_n}\sum_{t=1}^{T} f^\top Ax_t\Big] \le 56mR_{1,\max}^2 + 2 + 7n\sqrt{R_{1,\max}\sum_{t=1}^{T}\|x_t - x_{t-1}\|_1^2}.$$
Further noting that $R_{1,\max} \le \log(nT)$ and $R_{2,\max} \le \log(mT)$, we conclude that
$$\mathbb E\Big[\sum_{t=1}^{T} f_t^\top Ax_t - \inf_{f\in\Delta_n}\sum_{t=1}^{T} f^\top Ax_t\Big] \le 56m\log^2(nT) + 2 + 7n\sqrt{\log(nT)\sum_{t=1}^{T}\|x_t - x_{t-1}\|_1^2}.$$
This concludes the proof.

Proof of Lemma 8. Noting that the constraints are all $H$-smooth and that the objective is linear for the maximizing player, we can apply Lemma 4 to the optimization problem with $\mathcal R_1(\cdot) = \frac12\|\cdot\|^2$ and $\mathcal R_2$ the entropy function, to obtain
$$T\Big(\max_{i\in[d]} G_i(\bar f) - \inf_{f\in F}\max_{i\in[d]} G_i(f)\Big) \le \frac{\|f^* - g_0\|^2}{2\eta_1} + \frac{\log d}{\eta_2} + \sum_{t=1}^{T}\Big\langle \sum_{i=1}^{d} x_t(i)\nabla G_i(f_t) - \sum_{i=1}^{d} y_{t-1}(i)\nabla G_i(g_{t-1}),\; g_t - f_t\Big\rangle - \frac{1}{2\eta_1}\sum_{t=1}^{T}\big(\|g_t-f_t\|^2 + \|g_{t-1}-f_t\|^2\big) + \sum_{t=1}^{T}\Big(\frac{\beta}{2}\big\|G(f_t) - G(g_{t-1})\big\|_\infty^2 + \frac{1}{2\beta}\|y_t - x_t\|_1^2\Big) - \frac{1}{2\eta_2}\sum_{t=1}^{T}\big(\|y_t - x_t\|_1^2 + \|y_{t-1} - x_t\|_1^2\big).$$
(Strictly speaking, we have used a version of Lemma 4 where the first term coming from () in Lemma 1 is kept as a linear term.) Here, $G(f)$ is the vector of the values of the constraints at $f$. We then write
$$\Big\langle \sum_{i=1}^{d} x_t(i)\nabla G_i(f_t) - \sum_{i=1}^{d} y_{t-1}(i)\nabla G_i(g_{t-1}),\; g_t - f_t\Big\rangle = \Big\langle \sum_{i=1}^{d}\big(x_t(i) - y_{t-1}(i)\big)\nabla G_i(f_t),\; g_t - f_t\Big\rangle + \Big\langle \sum_{i=1}^{d} y_{t-1}(i)\big(\nabla G_i(f_t) - \nabla G_i(g_{t-1})\big),\; g_t - f_t\Big\rangle$$
$$\le \|x_t - y_{t-1}\|_1 \max_{i\in[d]}\big|\langle \nabla G_i(f_t),\, g_t - f_t\rangle\big| + \sum_{i=1}^{d} y_{t-1}(i)\,\big\|\nabla G_i(f_t) - \nabla G_i(g_{t-1})\big\|\,\|g_t - f_t\|$$
$$\le \|x_t - y_{t-1}\|_1\,\|g_t - f_t\| + H\,\|f_t - g_{t-1}\|\,\|g_t - f_t\|,$$
where we used the fact that each $\|\nabla G_i(f)\| \le 1$.

Combining, we get an upper bound of
$$\frac{\|f^* - g_0\|^2}{2\eta_1} + \frac{\log d}{\eta_2} + \sum_{t=1}^{T}\Big(\|x_t - y_{t-1}\|_1\|g_t-f_t\| + H\|f_t - g_{t-1}\|\|g_t-f_t\| + \frac{\beta}{2}\|G(f_t)-G(g_{t-1})\|_\infty^2 + \frac{1}{2\beta}\|y_t-x_t\|_1^2\Big) - \frac{1}{2\eta_1}\sum_{t=1}^{T}\big(\|g_t-f_t\|^2 + \|g_{t-1}-f_t\|^2\big) - \frac{1}{2\eta_2}\sum_{t=1}^{T}\big(\|y_t-x_t\|_1^2 + \|y_{t-1}-x_t\|_1^2\big).$$
Since $|G_i(f_t) - G_i(g_{t-1})| \le \|f_t - g_{t-1}\|$ (the gradients being bounded by 1), splitting each product with the AM-GM inequality and collecting terms yields
$$\le \frac{\|f^* - g_0\|^2}{2\eta_1} + \frac{\log d}{\eta_2} + \frac{\beta + 2H - 1/\eta_1}{2}\sum_{t=1}^{T}\big(\|g_t-f_t\|^2 + \|g_{t-1}-f_t\|^2\big) + \frac{1/\beta - 1/\eta_2}{2}\sum_{t=1}^{T}\big(\|y_t-x_t\|_1^2 + \|y_{t-1}-x_t\|_1^2\big).$$
Picking $\beta \ge 0$ such that $\eta_2 \le \beta \le \frac{1}{\eta_1} - 2H$, we get an upper bound of
$$\frac{\|f^* - g_0\|^2}{2\eta_1} + \frac{\log d}{\eta_2} \le \frac{B^2}{2\eta_1} + \frac{\log d}{\eta_2}.$$
Of course, for this choice to be possible we need to pick $\eta_1$ and $\eta_2$ such that $\eta_2 \le \frac{1}{\eta_1} - 2H$. Therefore, picking $\eta_2 = \frac{1}{\eta_1} - 2H$ with $\eta_1 \le \frac{1}{4H}$, we obtain an upper bound of $\frac{B^2}{2\eta_1} + 2\eta_1\log d$. Now, since $T$ is such that
$$\epsilon \ge \frac{1}{T}\inf_{\eta \le \frac{1}{4H}}\Big\{\frac{B^2}{2\eta} + 2\eta\log d\Big\},$$
we can conclude that
$$\max_{i\in[d]} G_i(\bar f) - \min_{f\in F}\max_{i\in[d]} G_i(f) \le \epsilon.$$
Observe that for an optimal solution $f^*$ to the original optimization problem (0) we have that $f^*\in F$ and, for every $i$, $G_i(f^*) \le 1$. Thus,
$$\max_{i\in[d]} G_i(\bar f) \le 1 + \epsilon.$$
We conclude that $\bar f\in F$ is a solution that attains the optimum value $F^*$ and almost satisfies the constraints. Now, we have from the lemma statement that $f_0\in G$ is such that $c^\top f_0 \ge 0$ and, for every $i\in[d]$, $G_i(f_0) \le 1-\gamma$. Hence, by convexity of $G_i$, we have that for every $i\in[d]$,
$$G_i\big(\alpha f_0 + (1-\alpha)\bar f\big) \le \alpha G_i(f_0) + (1-\alpha)G_i(\bar f) \le \alpha(1-\gamma) + (1-\alpha)(1+\epsilon).$$
Thus, for $\alpha = \frac{\epsilon}{\epsilon+\gamma}$ and $\hat f = (1-\alpha)\bar f + \alpha f_0$ we can conclude that $\hat f\in G$ and that all the constraints are satisfied; that is, for every $i\in[d]$, $G_i(\hat f) \le 1$. Also note that
$$c^\top \hat f = (1-\alpha)\,c^\top \bar f + \alpha\, c^\top f_0 \ge (1-\alpha)F^* = \frac{\gamma F^*}{\epsilon+\gamma}$$
and, hence, $\hat f$ is an approximate maximizer, that is,
$$c^\top(f^* - \hat f) \le F^* - \frac{\gamma F^*}{\epsilon+\gamma} = \frac{\epsilon F^*}{\epsilon+\gamma} \le \frac{\epsilon}{\gamma}F^*.$$
Thus we obtain a $\big(1+\frac{\epsilon}{\gamma}\big)$-optimal solution in the multiplicative sense, which concludes the proof.
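The final mixing step in the proof of Lemma 8 is easy to check numerically. The sketch below uses invented random linear constraints $G_i(f) = \langle v_i, f\rangle$ (stand-ins, not from the paper) to verify that the combination $\hat f = (1-\alpha)\bar f + \alpha f_0$ with $\alpha = \epsilon/(\epsilon+\gamma)$ satisfies every constraint whenever $\max_i G_i(\bar f) \le 1+\epsilon$ and $\max_i G_i(f_0) \le 1-\gamma$:

```python
import numpy as np

# Numerical check of the convexity argument at the end of Lemma 8's proof:
# if G_i(f_bar) <= 1 + eps and G_i(f0) <= 1 - gamma for all i, then the
# mixture with weight alpha = eps / (eps + gamma) on f0 satisfies G_i <= 1.
# The linear constraint vectors v_i are random stand-ins.
rng = np.random.default_rng(0)
eps, gamma = 0.3, 0.5
d, n = 4, 6

V = rng.uniform(0.1, 1.0, size=(d, n))      # rows are constraint vectors v_i
f0 = rng.uniform(size=n)
f0 *= (1 - gamma) / (V @ f0).max()          # scale so max_i G_i(f0) = 1 - gamma
f_bar = rng.uniform(size=n)
f_bar *= (1 + eps) / (V @ f_bar).max()      # scale so max_i G_i(f_bar) = 1 + eps

alpha = eps / (eps + gamma)
f_hat = (1 - alpha) * f_bar + alpha * f0    # the repaired point from the proof

print((V @ f_hat).max())                    # every constraint value is <= 1
```

Each constraint is linear, so $G_i(\hat f) = (1-\alpha)G_i(\bar f) + \alpha G_i(f_0) \le (1-\alpha)(1+\epsilon) + \alpha(1-\gamma) = 1$, exactly as in the proof.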

Proof of Corollary 9. As mentioned, for both players the time to perform each step of optimistic mirror descent in the Max Flow problem is $O(d)$. Further, note that Max Flow is a linear programming problem, so we are ready to apply Lemma 8. Specifically, for $f_0$ we use the zero flow, which is in $G$ (though not in $F$), and note that for this $f_0$ we have $\gamma = 1$. Applying Lemma 8, the number of iterations needed to reach an $\epsilon$-approximate solution is given by
$$T = \frac{1}{\epsilon}\inf_{\eta \le \frac{1}{4H}}\Big\{\frac{B^2}{2\eta} + 2\eta\log d\Big\}.$$
Now we can use $g_0 = \arg\min_{g\in F}\|g\|$, which can be computed in $O(d)$ time. Observe that $\|f^* - g_0\| \le \|f^*\| + \|g_0\| \le 2\sqrt d = B$. Also note that for linear constraints we have $H = 0$. Hence, the number of iterations is at most
$$\frac{1}{\epsilon}\inf_{\eta}\Big\{\frac{B^2}{2\eta} + 2\eta\log d\Big\} = \frac{2B\sqrt{\log d}}{\epsilon} = \frac{4\sqrt{d\log d}}{\epsilon}.$$
Since each iteration has time complexity $O(d)$, the overall complexity of the algorithm is given by
$$O\Big(\frac{d^{3/2}\sqrt{\log d}}{\epsilon}\Big);$$
this concludes the proof.
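As a closing illustration of the two-player dynamics analyzed above, here is a stripped-down sketch of optimistic exponential weights in a zero-sum matrix game. Unlike the algorithm of Proposition 6, it uses a fixed step size and no uniform smoothing, and the random game matrix is invented for the demo:

```python
import numpy as np

# Stripped-down optimistic exponential weights for the zero-sum game
# min_f max_x f^T A x.  Each player predicts that the opponent repeats
# its previous (secondary) move.  Fixed step size, no uniform mixing --
# a simplified sketch of the scheme analyzed in Proposition 6.
def exp_step(w, loss, eta):
    # Multiplicative-weights step followed by normalization.
    w = w * np.exp(-eta * loss)
    return w / w.sum()

rng = np.random.default_rng(1)
n, m, T, eta = 5, 4, 10000, 0.05
A = rng.uniform(-1, 1, size=(n, m))

gf, gx = np.ones(n) / n, np.ones(m) / m   # secondary sequences g_t, y_t
sf, sx = np.zeros(n), np.zeros(m)         # running sums of the played moves
for t in range(T):
    f = exp_step(gf, A @ gx, eta)         # play against the predicted move
    x = exp_step(gx, -(A.T @ gf), eta)
    gf = exp_step(gf, A @ x, eta)         # update with the opponent's actual move
    gx = exp_step(gx, -(A.T @ f), eta)
    sf += f
    sx += x
fbar, xbar = sf / T, sx / T

# Duality gap of the averaged strategies; it shrinks as T grows.
gap = (A.T @ fbar).max() - (A @ xbar).min()
print(gap)
```

The gap is always nonnegative (it sandwiches the game's minimax value), and under the predictable-sequence guarantees above, honest play by both sides drives it down much faster than the worst-case $O(1/\sqrt T)$ rate.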


More information

Sequential prediction with coded side information under logarithmic loss

Sequential prediction with coded side information under logarithmic loss under logarithmic loss Yanina Shkel Department of Electrical Engineering Princeton University Princeton, NJ 08544, USA Maxim Raginsky Department of Electrical and Computer Engineering Coordinated Science

More information

Lecture 24 November 27

Lecture 24 November 27 EE 381V: Large Scale Optimization Fall 01 Lecture 4 November 7 Lecturer: Caramanis & Sanghavi Scribe: Jahshan Bhatti and Ken Pesyna 4.1 Mirror Descent Earlier, we motivated mirror descent as a way to improve

More information

Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12

Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12 Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12 Lecture 4: Multiarmed Bandit in the Adversarial Model Lecturer: Yishay Mansour Scribe: Shai Vardi 4.1 Lecture Overview

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Game Theory, On-line prediction and Boosting (Freund, Schapire)

Game Theory, On-line prediction and Boosting (Freund, Schapire) Game heory, On-line prediction and Boosting (Freund, Schapire) Idan Attias /4/208 INRODUCION he purpose of this paper is to bring out the close connection between game theory, on-line prediction and boosting,

More information

Lectures 6, 7 and part of 8

Lectures 6, 7 and part of 8 Lectures 6, 7 and part of 8 Uriel Feige April 26, May 3, May 10, 2015 1 Linear programming duality 1.1 The diet problem revisited Recall the diet problem from Lecture 1. There are n foods, m nutrients,

More information

From Bandits to Experts: A Tale of Domination and Independence

From Bandits to Experts: A Tale of Domination and Independence From Bandits to Experts: A Tale of Domination and Independence Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Domination and Independence 1 / 1 From Bandits to Experts: A

More information

Bandit Online Convex Optimization

Bandit Online Convex Optimization March 31, 2015 Outline 1 OCO vs Bandit OCO 2 Gradient Estimates 3 Oblivious Adversary 4 Reshaping for Improved Rates 5 Adaptive Adversary 6 Concluding Remarks Review of (Online) Convex Optimization Set-up

More information

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden 1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized

More information

Learning, Games, and Networks

Learning, Games, and Networks Learning, Games, and Networks Abhishek Sinha Laboratory for Information and Decision Systems MIT ML Talk Series @CNRG December 12, 2016 1 / 44 Outline 1 Prediction With Experts Advice 2 Application to

More information

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Vicente L. Malave February 23, 2011 Outline Notation minimize a number of functions φ

More information

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Microsoft Research 1 Microsoft Way, Redmond,

More information

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 2018

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 2018 COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 208 Review of Game heory he games we will discuss are two-player games that can be modeled by a game matrix

More information

Online Convex Optimization. Gautam Goel, Milan Cvitkovic, and Ellen Feldman CS 159 4/5/2016

Online Convex Optimization. Gautam Goel, Milan Cvitkovic, and Ellen Feldman CS 159 4/5/2016 Online Convex Optimization Gautam Goel, Milan Cvitkovic, and Ellen Feldman CS 159 4/5/2016 The General Setting The General Setting (Cover) Given only the above, learning isn't always possible Some Natural

More information

Convex Optimization and Modeling

Convex Optimization and Modeling Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual

More information

No-Regret Algorithms for Unconstrained Online Convex Optimization

No-Regret Algorithms for Unconstrained Online Convex Optimization No-Regret Algorithms for Unconstrained Online Convex Optimization Matthew Streeter Duolingo, Inc. Pittsburgh, PA 153 matt@duolingo.com H. Brendan McMahan Google, Inc. Seattle, WA 98103 mcmahan@google.com

More information

Approximate Nash Equilibria with Near Optimal Social Welfare

Approximate Nash Equilibria with Near Optimal Social Welfare Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 015) Approximate Nash Equilibria with Near Optimal Social Welfare Artur Czumaj, Michail Fasoulakis, Marcin

More information

Fast Rates for Estimation Error and Oracle Inequalities for Model Selection

Fast Rates for Estimation Error and Oracle Inequalities for Model Selection Fast Rates for Estimation Error and Oracle Inequalities for Model Selection Peter L. Bartlett Computer Science Division and Department of Statistics University of California, Berkeley bartlett@cs.berkeley.edu

More information

CS364A: Algorithmic Game Theory Lecture #16: Best-Response Dynamics

CS364A: Algorithmic Game Theory Lecture #16: Best-Response Dynamics CS364A: Algorithmic Game Theory Lecture #16: Best-Response Dynamics Tim Roughgarden November 13, 2013 1 Do Players Learn Equilibria? In this lecture we segue into the third part of the course, which studies

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematics of Machine Learning Lecturer: Philippe Rigollet Lecture 3 Scribe: Mina Karzand Oct., 05 Previously, we analyzed the convergence of the projected gradient descent algorithm. We proved

More information

Lecture 21 November 11, 2014

Lecture 21 November 11, 2014 CS 224: Advanced Algorithms Fall 2-14 Prof. Jelani Nelson Lecture 21 November 11, 2014 Scribe: Nithin Tumma 1 Overview In the previous lecture we finished covering the multiplicative weights method and

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Bandit Convex Optimization: T Regret in One Dimension

Bandit Convex Optimization: T Regret in One Dimension Bandit Convex Optimization: T Regret in One Dimension arxiv:1502.06398v1 [cs.lg 23 Feb 2015 Sébastien Bubeck Microsoft Research sebubeck@microsoft.com Tomer Koren Technion tomerk@technion.ac.il February

More information

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem Lecture 9: UCB Algorithm and Adversarial Bandit Problem EECS598: Prediction and Learning: It s Only a Game Fall 03 Lecture 9: UCB Algorithm and Adversarial Bandit Problem Prof. Jacob Abernethy Scribe:

More information

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture 24 Scribe: Sachin Ravi May 2, 2013

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture 24 Scribe: Sachin Ravi May 2, 2013 COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture 24 Scribe: Sachin Ravi May 2, 203 Review of Zero-Sum Games At the end of last lecture, we discussed a model for two player games (call

More information

Supplementary Material for Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition

Supplementary Material for Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition Supplementary Material for Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition Serhat S. Bucak bucakser@cse.msu.edu Rong Jin rongjin@cse.msu.edu Anil

More information

Blackwell s Approachability Theorem: A Generalization in a Special Case. Amy Greenwald, Amir Jafari and Casey Marks

Blackwell s Approachability Theorem: A Generalization in a Special Case. Amy Greenwald, Amir Jafari and Casey Marks Blackwell s Approachability Theorem: A Generalization in a Special Case Amy Greenwald, Amir Jafari and Casey Marks Department of Computer Science Brown University Providence, Rhode Island 02912 CS-06-01

More information

Experts in a Markov Decision Process

Experts in a Markov Decision Process University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2004 Experts in a Markov Decision Process Eyal Even-Dar Sham Kakade University of Pennsylvania Yishay Mansour Follow

More information

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints By I. Necoara, Y. Nesterov, and F. Glineur Lijun Xu Optimization Group Meeting November 27, 2012 Outline

More information

Self-stabilizing uncoupled dynamics

Self-stabilizing uncoupled dynamics Self-stabilizing uncoupled dynamics Aaron D. Jaggard 1, Neil Lutz 2, Michael Schapira 3, and Rebecca N. Wright 4 1 U.S. Naval Research Laboratory, Washington, DC 20375, USA. aaron.jaggard@nrl.navy.mil

More information

No-regret algorithms for structured prediction problems

No-regret algorithms for structured prediction problems No-regret algorithms for structured prediction problems Geoffrey J. Gordon December 21, 2005 CMU-CALD-05-112 School of Computer Science Carnegie-Mellon University Pittsburgh, PA 15213 Abstract No-regret

More information

Online learning with feedback graphs and switching costs

Online learning with feedback graphs and switching costs Online learning with feedback graphs and switching costs A Proof of Theorem Proof. Without loss of generality let the independent sequence set I(G :T ) formed of actions (or arms ) from to. Given the sequence

More information

Lecture 10 + additional notes

Lecture 10 + additional notes CSE533: Information Theorn Computer Science November 1, 2010 Lecturer: Anup Rao Lecture 10 + additional notes Scribe: Mohammad Moharrami 1 Constraint satisfaction problems We start by defining bivariate

More information

Optimistic Bandit Convex Optimization

Optimistic Bandit Convex Optimization Optimistic Bandit Convex Optimization Mehryar Mohri Courant Institute and Google 25 Mercer Street New York, NY 002 mohri@cims.nyu.edu Scott Yang Courant Institute 25 Mercer Street New York, NY 002 yangs@cims.nyu.edu

More information