Regret bounded by gradual variation for online convex optimization


Tianbao Yang · Mehrdad Mahdavi · Rong Jin · Shenghuo Zhu

Abstract Recently, it has been shown that the regret of the Follow the Regularized Leader (FTRL) algorithm for online linear optimization can be bounded by the total variation of the cost vectors rather than by the number of rounds. In this paper, we extend this result to general online convex optimization; in particular, this resolves an open problem that has been posed in a number of recent papers. We first analyze the limitations of the FTRL algorithm in [16] when applied to online convex optimization, and extend the definition of variation to a gradual variation, which is shown to be a lower bound of the total variation. We then present two novel algorithms that bound the regret by the gradual variation of the cost functions. Unlike previous approaches that maintain a single sequence of solutions, the proposed algorithms maintain two sequences of solutions, which makes it possible to achieve a variation-based regret bound for online convex optimization. To establish the main results, we discuss a lower bound for FTRL that maintains only one sequence of solutions, and a necessary condition on the smoothness of the cost functions for obtaining a gradual variation bound. We extend the main results three-fold: (i) we present a general method to obtain a gradual variation bound measured by a general norm; (ii) we extend the algorithms to a class of online non-smooth optimization problems with a gradual variation bound; and

Tianbao Yang, NEC Laboratories America, tyang@nec-labs.com (most of this work was done while the author was at Michigan State University). Mehrdad Mahdavi and Rong Jin, Department of Computer Science and Engineering, Michigan State University, mahdavim@cse.msu.edu, rongjin@cse.msu.edu. Shenghuo Zhu, NEC Laboratories America, zsh@nec-labs.com.

(iii) we develop a deterministic algorithm for online bandit optimization in the multi-point bandit setting.

Keywords online convex optimization · regret bound · gradual variation · bandit

1 Introduction

We consider the general online convex optimization problem [6], which proceeds in trials. At each trial, the learner is asked to predict a decision vector x_t that belongs to a bounded closed convex set P ⊆ R^d; it then receives a cost function c_t : P → R and incurs a cost of c_t(x_t) for the submitted solution. The goal of online convex optimization is to come up with a sequence of solutions x_1, ..., x_T that minimizes the regret, which is defined as the difference between the cost accumulated up to trial T by the learner's decisions and the cost of the best fixed decision in hindsight, i.e.

Regret_T = Σ_{t=1}^T c_t(x_t) − min_{x∈P} Σ_{t=1}^T c_t(x).

In the special case of linear cost functions, i.e. c_t(x) = f_t^⊤x, the problem becomes online linear optimization. The goal of online convex optimization is to design algorithms that predict, with a small regret, the solution x_t at the t-th trial given partial knowledge of the past cost functions c_τ, τ = 1, ..., t−1. Over the past decade, many algorithms have been proposed for online convex optimization, especially for online linear optimization. In the first seminal paper on online convex optimization, Zinkevich [6] proposed a gradient descent algorithm with a regret bound of O(√T). When the cost functions are strongly convex, the regret bound of the online gradient descent algorithm is reduced to O(log T) with an appropriately chosen step size [13]. Another common methodology for online convex optimization, especially for online linear optimization, is based on the framework of Follow the Leader (FTL) [17]. FTL chooses x_t by minimizing the total cost incurred over all previous trials.
Since the naive FTL algorithm fails to achieve sublinear regret in the worst case, many variants have been developed to fix the problem, including Follow the Perturbed Leader (FTPL) [17], Follow the Regularized Leader (FTRL) [1], and Follow the Approximate Leader (FTAL) [13]. Other methodologies for online convex optimization introduce a potential function (or link function) to map solutions between the space of primal variables and the space of dual variables, and carry out a primal-dual update based on the potential function. The well-known Exponentiated Gradient (EG) algorithm [0] and the multiplicative weights algorithm [1] belong to this category. We note that these different algorithms are closely related. For example, in online linear optimization, the potential-based primal-dual algorithm is equivalent to the FTRL algorithm [16].

Generally, most previous studies of online convex optimization bound the regret in terms of the number of trials T. However, one expects the regret to be low in an unchanging environment, or when the cost functions are somehow correlated. Specifically, the tightest rate for the regret should depend on the variation of the sequence of cost functions rather than on the number of rounds T. An algorithm whose regret is stated in terms of the variation of the cost functions is expected to perform better when the cost sequence has low variation. It is therefore of great interest to derive a variation-based regret bound for online convex optimization in an adversarial setting. Since it has been established that the regret of a natural algorithm in a stochastic setting (i.e., when the cost functions are generated by a stationary stochastic process) can be bounded by the total variation of the cost functions [16], devising an algorithm in the fully adversarial setting (i.e., when the cost functions are chosen completely adversarially) was posed as an open problem in [5]. The question is whether or not it is possible to bound the regret of an online optimization algorithm by the variation of the observed cost functions. Recently, [16] made substantial progress along this route and proved a variation-based regret bound for online linear optimization by a tight analysis of the FTRL algorithm with an appropriately chosen step size. A similar regret bound is shown in the same paper for prediction from expert advice, by slightly modifying the multiplicative weights algorithm. In this work, we aim to take one step further and contribute to this research direction by developing algorithms for the general framework of online convex optimization with variation-based regret bounds. A preliminary version of this paper appeared at the Conference on Learning Theory (COLT) [8] as the result of a merge between two papers, as discussed in a commentary written by Satyen Kale [18].
We highlight the differences between this paper and [8]. We first motivate the definition of gradual variation by showing that the total variation bound in [16] is not necessarily small when the cost functions change slowly, and that the gradual variation is smaller than the total variation. We then extend FTRL to achieve a regret bounded by the gradual variation in Subsection 2.1. The second algorithm, in Subsection 2.2, is similar to the one that appeared in [8]. Sections 3 and 4 contain generalizations of the second algorithm to non-smooth functions and to a bandit setting, respectively, which appear in this paper for the first time. In the remainder of this section, we first present the results from [16] for online linear optimization and discuss their potential limitations when applied to online convex optimization, which motivates our work.

1.1 Online Linear Optimization

Many decision problems can be cast into online linear optimization problems, such as prediction from expert advice [6] and the online shortest path problem [5]. [16] proved the first variation-based regret bound for online linear optimization problems in an adversarial setting. Hazan and Kale's algorithm for online

linear optimization is based on the framework of FTRL. For completeness, the algorithm is shown in Algorithm 1. At each trial, the decision vector x_t is given by solving the following optimization problem:

x_t = arg min_{x∈P} Σ_{τ=1}^{t−1} f_τ^⊤x + (1/(2η))‖x‖²₂,

where f_t is the cost vector received at trial t after predicting the decision x_t, and η is a step size.

Algorithm 1 Follow The Regularized Leader for Online Linear Optimization
1: Input: η > 0
2: for t = 1, ..., T do
3:   If t = 1, predict x_t = 0
4:   If t > 1, predict x_t by x_t = arg min_{x∈P} Σ_{τ=1}^{t−1} f_τ^⊤x + (1/(2η))‖x‖²₂
5:   Receive cost vector f_t and incur loss f_t^⊤x_t
6: end for

They bound the regret by the variation of the cost vectors defined as

VAR_T = Σ_{t=1}^T ‖f_t − μ‖²₂, (1)

where μ = (1/T) Σ_{t=1}^T f_t. By assuming ‖f_t‖₂ ≤ 1 for all t and setting the value of η to η = min{2/√VAR_T, 1/6}, they showed that the regret of Algorithm 1 can be bounded by

Σ_{t=1}^T f_t^⊤x_t − min_{x∈P} Σ_{t=1}^T f_t^⊤x ≤ 15√VAR_T if VAR_T ≥ 1, and by a constant if VAR_T ≤ 1. (2)

From (2), we can see that when the variation of the cost vectors is small (less than 1), the regret is a constant; otherwise it is bounded by the variation as O(√VAR_T). This result indicates that online linear optimization in the adversarial setting is as efficient as in the stationary stochastic setting.

1.2 Online Convex Optimization

Online convex optimization generalizes online linear optimization by replacing linear cost functions with non-linear convex cost functions. It has found applications in several domains, including portfolio management [3] and online classification [19]. For example, in the online portfolio management problem, an investor wants to distribute his wealth over a set of stocks without knowing the market output in advance. If we let x_t denote the distribution on the

stocks and r_t denote the price relative vector, i.e., r_t[i] denotes the ratio of the closing price of stock i on day t to its closing price on day t−1, then a function of interest is the logarithmic growth ratio, i.e. Σ_{t=1}^T log(x_t^⊤r_t), which is a concave function to be maximized. Similar to [16], we aim to develop algorithms for online convex optimization with regret bounded by the variation of the cost functions. Before presenting our algorithms, we first show that directly applying the FTRL algorithm to general online convex optimization may not achieve the desired result.

Algorithm 2 Follow The Regularized Leader (FTRL) for Online Convex Optimization
1: Input: η > 0
2: for t = 1, ..., T do
3:   If t = 1, predict x_t = 0
4:   If t > 1, predict x_t by x_t = arg min_{x∈P} Σ_{τ=1}^{t−1} f_τ^⊤x + (1/(2η))‖x‖²₂
5:   Receive cost function c_t and incur loss c_t(x_t)
6:   Compute f_t = ∇c_t(x_t)
7: end for

To extend FTRL to online convex optimization, a straightforward approach is to use the first-order approximation of the convex cost function, i.e., c_t(x) ≈ c_t(x_t) + ∇c_t(x_t)^⊤(x − x_t), and replace the cost vector f_t in Algorithm 1 with the gradient of the cost function c_t at x_t, i.e., f_t = ∇c_t(x_t). The resulting algorithm is shown in Algorithm 2. Using the convexity of c_t, we have

Σ_{t=1}^T c_t(x_t) − min_{x∈P} Σ_{t=1}^T c_t(x) ≤ Σ_{t=1}^T f_t^⊤x_t − min_{x∈P} Σ_{t=1}^T f_t^⊤x. (3)

If we assume ‖∇c_t(x)‖₂ ≤ 1 for all t and all x ∈ P, we can apply Hazan and Kale's variation-based bound in (2) to bound the regret in (3) by the variation

VAR_T = Σ_{t=1}^T ‖f_t − μ‖²₂ = Σ_{t=1}^T ‖∇c_t(x_t) − (1/T) Σ_{τ=1}^T ∇c_τ(x_τ)‖²₂. (4)

To better understand VAR_T in (4), we rewrite it as

VAR_T = Σ_{t=1}^T ‖∇c_t(x_t) − (1/T) Σ_{τ=1}^T ∇c_τ(x_τ)‖²₂ = (1/(2T)) Σ_{t,τ=1}^T ‖∇c_t(x_t) − ∇c_τ(x_τ)‖²₂
≤ (1/T) Σ_{t,τ=1}^T ‖∇c_t(x_t) − ∇c_t(x_τ)‖²₂ + (1/T) Σ_{t,τ=1}^T ‖∇c_t(x_τ) − ∇c_τ(x_τ)‖²₂ = VAR¹_T + VAR²_T. (5)
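As a concrete illustration, the reduction in Algorithm 2 — feed the linearizations f_t = ∇c_t(x_t) to the FTRL update — can be sketched in a few lines. Taking P to be the unit Euclidean ball is an assumption of this sketch, chosen so that the arg min has a closed form as a projected, scaled gradient sum:

```python
import numpy as np

def proj_ball(x):
    """Euclidean projection onto the unit ball, our stand-in for P."""
    n = np.linalg.norm(x)
    return x / n if n > 1.0 else x

def ftrl_linearized(grad_fns, eta, d):
    """Sketch of Algorithm 2: FTRL run on the linearizations f_t = grad c_t(x_t).

    grad_fns[t-1](x) returns the gradient of c_t at x (revealed only
    after predicting x_t).  Returns the predictions x_1, ..., x_T.
    """
    grad_sum = np.zeros(d)   # f_1 + ... + f_{t-1}
    xs = []
    for g_fn in grad_fns:
        # x_t = argmin_{x in P}  sum_{tau<t} f_tau.x + ||x||^2 / (2 eta)
        x = proj_ball(-eta * grad_sum)
        xs.append(x)
        grad_sum += g_fn(x)  # f_t = grad c_t(x_t)
    return xs
```

For identical smooth costs, the actual regret of this sketch is small; the point of the decomposition in (5) is that the bound obtained through (2) can nevertheless be as large as O(√T).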

We see that the variation VAR_T is bounded by two parts: VAR¹_T essentially measures the smoothness of the individual cost functions, while VAR²_T measures the variation in the gradients of the cost functions. Let us consider an easy setting where all cost functions are identical. In this case, VAR²_T vanishes and VAR_T equals VAR¹_T/2, i.e.,

VAR_T = (1/(2T)) Σ_{t,τ=1}^T ‖∇c_t(x_t) − ∇c_τ(x_τ)‖²₂ = (1/(2T)) Σ_{t,τ=1}^T ‖∇c_t(x_t) − ∇c_t(x_τ)‖²₂ = VAR¹_T/2.

As a result, the regret of the FTRL algorithm for online convex optimization may still be bounded by O(√T), regardless of the smoothness of the cost functions. To address this challenge, we develop two novel algorithms for online convex optimization that bound the regret by the variation of the cost functions. In particular, we would like to bound the regret of online convex optimization by the variation of the cost functions defined as

GV_T = Σ_{t=1}^{T−1} max_{x∈P} ‖∇c_{t+1}(x) − ∇c_t(x)‖²₂. (6)

Note that the variation in (6) is defined in terms of the gradual difference between each cost function and its predecessor, while the variation in (1) [16] is defined in terms of the total difference between the individual cost vectors and their mean. We therefore refer to the variation defined in (6) as gradual variation¹, and to the variation defined in (1) as total variation. It is straightforward to show that when c_t(x) = f_t^⊤x, the gradual variation GV_T defined in (6) is upper bounded by the total variation VAR_T defined in (1) up to a constant factor:

Σ_{t=1}^{T−1} ‖f_{t+1} − f_t‖²₂ ≤ Σ_{t=1}^{T−1} 2( ‖f_{t+1} − μ‖²₂ + ‖f_t − μ‖²₂ ) ≤ 4 Σ_{t=1}^T ‖f_t − μ‖²₂.

On the other hand, we cannot bound the total variation by the gradual variation up to a constant. This is verified by the following example: let f_1 = ··· = f_{T/2} = f and f_{T/2+1} = ··· = f_T = g ≠ f. The total variation in (1) is

VAR_T = Σ_{t=1}^T ‖f_t − μ‖²₂ = (T/2)‖f − (f+g)/2‖²₂ + (T/2)‖g − (f+g)/2‖²₂ = (T/4)‖f − g‖²₂ = Ω(T),

while the gradual variation defined in (6) is a constant:

GV_T = Σ_{t=1}^{T−1} ‖f_{t+1} − f_t‖²₂ = ‖f − g‖²₂ = O(1).

¹ This is also termed deviation in [8].
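The contrast between the two notions can be checked numerically for linear costs. The following sketch computes both quantities for the two-block sequence above; the particular vectors f and g are arbitrary choices for illustration:

```python
import numpy as np

def total_variation(fs):
    """VAR_T = sum_t ||f_t - mu||^2, with mu the mean cost vector."""
    mu = np.mean(fs, axis=0)
    return sum(np.linalg.norm(f - mu) ** 2 for f in fs)

def gradual_variation(fs):
    """GV_T = sum_{t=1}^{T-1} ||f_{t+1} - f_t||^2 (linear costs)."""
    return sum(np.linalg.norm(fs[t + 1] - fs[t]) ** 2
               for t in range(len(fs) - 1))

# The example from the text: half the rounds use f, the other half g != f.
T = 1000
f = np.array([1.0, 0.0])
g = np.array([0.0, 1.0])
fs = [f] * (T // 2) + [g] * (T // 2)

var_t = total_variation(fs)   # grows like (T/4) ||f - g||^2
gv_t = gradual_variation(fs)  # a single jump: ||f - g||^2
```

Here VAR_T grows linearly in T (it equals 500 for T = 1000), while GV_T stays at ‖f − g‖²₂ = 2.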

Based on the above analysis, we claim that a regret bound in terms of the gradual variation is usually tighter than one in terms of the total variation. Before closing this section, we present the following lower bound for FTRL, whose proof can be found in [8].

Theorem 1 The regret of FTRL is at least Ω(min(GV_T, √T)).

The theorem motivates us to develop new algorithms for online convex optimization that achieve a gradual variation bound of O(√GV_T). The remainder of the paper is organized as follows. We present the proposed algorithms and the main results in Section 2. Section 3 generalizes the results to a special class of non-smooth functions that are composed of a smooth and a non-smooth component. Section 4 is devoted to extending the proposed algorithms to the setting where only partial feedback about the cost functions is available, i.e., online bandit convex optimization with a variation-based regret bound. Finally, in Section 5, we conclude this work and discuss a few open problems.

2 Algorithms and Main Results

Without loss of generality, we assume that the decision set P is contained in the unit ball B, i.e., P ⊆ B, and that 0 ∈ P [16]. We propose two algorithms for online convex optimization. The first algorithm is an improved FTRL and the second one is based on the mirror prox method [1]. One common feature shared by the two algorithms is that both maintain two sequences of solutions: decision vectors x_{1:T} = (x_1, ..., x_T) and searching vectors z_{1:T} = (z_1, ..., z_T) that facilitate the updates of the decision vectors. Both algorithms enjoy almost the same regret bound, up to a constant factor.
To facilitate the discussion, besides the variation of the cost functions defined in (6), we define another variation, named the extended gradual variation, as follows:

EGV_{T,2}(y_{0:T−1}) = Σ_{t=0}^{T−1} ‖∇c_{t+1}(y_t) − ∇c_t(y_t)‖²₂ ≤ ‖∇c_1(y_0)‖²₂ + GV_T, (7)

where c_0(x) ≡ 0, the sequence y_0, ..., y_{T−1} is either z_0, ..., z_{T−1} (as in the improved FTRL) or x_0, ..., x_{T−1} (as in the prox method), and the subscript 2 means that the variation is defined with respect to the ℓ₂ norm. When all cost functions are identical, GV_T becomes zero and the extended variation EGV_{T,2}(y_{0:T−1}) reduces to ‖∇c_1(y_0)‖²₂, a constant independent of the number of trials. In the sequel, we use the notation EGV_{T,2} for simplicity. In this study, we assume smooth cost functions with Lipschitz continuous gradients, i.e., there exists a constant L > 0 such that

‖∇c_t(x) − ∇c_t(z)‖₂ ≤ L‖x − z‖₂, ∀x, z ∈ P, ∀t. (8)

Algorithm 3 Improved FTRL for Online Convex Optimization
1: Input: η ∈ (0, 1]
2: Initialization: z_0 = 0 and c_0(x) ≡ 0
3: for t = 1, ..., T do
4:   Predict x_t by x_t = arg min_{x∈P} { x^⊤∇c_{t−1}(z_{t−1}) + (L/(2η))‖x − z_{t−1}‖²₂ }
5:   Receive cost function c_t and incur loss c_t(x_t)
6:   Update z_t by z_t = arg min_{x∈P} { x^⊤ Σ_{τ=1}^t ∇c_τ(z_{τ−1}) + (L/(2η))‖x‖²₂ }
7: end for

Our results show that for online convex optimization with L-smooth cost functions, the regrets of the proposed algorithms can be bounded as follows:

Σ_{t=1}^T c_t(x_t) − min_{x∈P} Σ_{t=1}^T c_t(x) ≤ O(√EGV_{T,2}) + constant. (9)

Remark 1 We would like to emphasize that our assumption about the smoothness of the cost functions is necessary to achieve the variation-based bound stated in (9). To see this, consider the special case c_1(x) = ··· = c_T(x) = c(x). If the bound in (9) held for any sequence of convex functions, then for this special case, where all the cost functions are identical, we would have Σ_{t=1}^T c(x_t) ≤ min_{x∈P} Σ_{t=1}^T c(x) + O(1), implying that x̄_T = (1/T) Σ_{t=1}^T x_t approaches the optimal solution at the rate O(1/T). This contradicts the lower complexity bound, i.e. Ω(1/√T), for any optimization method that uses only first-order information about the cost functions [, Theorem 3.2.1]. This analysis indicates that the smoothness assumption is necessary to attain a variation-based regret bound for the general online convex optimization problem. We emphasize that this contradiction holds when only gradient information about the cost functions is provided to the learner; the learner may be able to achieve a variation-based bound using second-order information about the cost functions, which is not the focus of the present work.

2.1 An Improved FTRL Algorithm for Online Convex Optimization

² Gradual variation bounds for a special class of non-smooth cost functions that are composed of a smooth component and a non-smooth component are discussed in Section 3.

The improved FTRL algorithm for online convex optimization is presented in Algorithm 3. Note that in step 6, the searching vectors z_t are updated according to the FTRL algorithm after receiving the cost function c_t. To understand the updating procedure for the decision vector x_t specified in step 4, we rewrite it as

x_t = arg min_{x∈P} { c_{t−1}(z_{t−1}) + (x − z_{t−1})^⊤∇c_{t−1}(z_{t−1}) + (L/(2η))‖x − z_{t−1}‖²₂ }. (10)

Notice that

c_t(x) ≤ c_t(z_{t−1}) + (x − z_{t−1})^⊤∇c_t(z_{t−1}) + (L/2)‖x − z_{t−1}‖²₂ ≤ c_t(z_{t−1}) + (x − z_{t−1})^⊤∇c_t(z_{t−1}) + (L/(2η))‖x − z_{t−1}‖²₂, (11)

where the first inequality follows from the smoothness condition in (8) and the second inequality follows from the fact that η ≤ 1. The inequality (11) provides an upper bound on c_t(x) and can therefore be used as an approximation of c_t(x) for predicting x_t. However, since ∇c_t(z_{t−1}) is unknown before the prediction, we use ∇c_{t−1}(z_{t−1}) as a surrogate for ∇c_t(z_{t−1}), leading to the updating rule in (10). It is this approximation that leads to the variation bound. The following theorem states the regret bound of Algorithm 3.

Theorem 2 Let c_t, t = 1, ..., T, be a sequence of convex functions with L-Lipschitz continuous gradients. By setting η = min{1, L/√EGV_{T,2}}, we have the following regret bound for Algorithm 3:

Σ_{t=1}^T c_t(x_t) − min_{x∈P} Σ_{t=1}^T c_t(x) ≤ max( L, √EGV_{T,2} ).

Remark 2 Compared with the variation bound in (5) for the FTRL algorithm, the smoothness parameter L plays the same role as VAR¹_T, which accounts for the smoothness of the cost functions, and the term EGV_{T,2} plays the same role as VAR²_T, which accounts for the variation in the cost functions. Compared to the FTRL algorithm, the key advantage of the improved FTRL algorithm is that its regret bound reduces to a constant when the cost functions change only a constant number of times along the horizon.
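A minimal sketch of Algorithm 3, again assuming P is the unit Euclidean ball so that both arg mins reduce to projections (the indexing of the updates follows the discussion around (10)):

```python
import numpy as np

def proj_ball(x):
    n = np.linalg.norm(x)
    return x / n if n > 1.0 else x

def improved_ftrl(grad_fns, eta, L, d):
    """Sketch of Algorithm 3 with P the unit ball.

    grad_fns[t-1](z) returns grad c_t(z); c_0 is the zero function.
    Note the two gradient evaluations per round, at z_{t-1} and z_t.
    Returns the predictions x_1, ..., x_T.
    """
    z = np.zeros(d)           # z_0 = 0
    grad_sum = np.zeros(d)    # sum_tau grad c_tau(z_{tau-1})
    prev_grad = np.zeros(d)   # grad c_{t-1}(z_{t-1}), the surrogate
    xs = []
    for t in range(1, len(grad_fns) + 1):
        # step 4: x_t = argmin x.grad c_{t-1}(z_{t-1}) + (L/(2 eta))||x - z_{t-1}||^2
        x = proj_ball(z - (eta / L) * prev_grad)
        xs.append(x)
        # step 6: FTRL update of the searching point after c_t is revealed
        grad_sum += grad_fns[t - 1](z)    # grad c_t(z_{t-1})
        z = proj_ball(-(eta / L) * grad_sum)
        prev_grad = grad_fns[t - 1](z)    # grad c_t(z_t), surrogate for round t+1
    return xs
```

With identical costs, the surrogate gradient is exact from the second round on, and the regret is a constant, as Remark 2 predicts.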
Of course, the extended variation EGV_{T,2} may not be known a priori for setting the optimal η; in this case, we can apply the standard doubling trick [6] to obtain a bound that holds uniformly over time and is at most a factor of 8 worse than the bound obtained with the optimal choice of η. The details are provided in Appendix A. To prove Theorem 2, we first present the following lemma.

Lemma 1 Let c_t, t = 1, ..., T, be a sequence of convex functions with L-Lipschitz continuous gradients. By running Algorithm 3 over T trials, we have

Σ_{t=1}^T c_t(x_t) ≤ min_{x∈P} [ (L/(2η))‖x‖²₂ + Σ_{t=1}^T ( c_t(z_{t−1}) + (x − z_{t−1})^⊤∇c_t(z_{t−1}) ) ] + (η/(2L)) Σ_{t=0}^{T−1} ‖∇c_{t+1}(z_t) − ∇c_t(z_t)‖²₂.

With this lemma, we can easily prove Theorem 2 by exploiting the convexity of c_t(x).

Proof of Theorem 2 Using ‖x‖₂ ≤ 1 for x ∈ P ⊆ B, and the convexity of c_t(x), we have

min_{x∈P} [ (L/(2η))‖x‖²₂ + Σ_{t=1}^T ( c_t(z_{t−1}) + (x − z_{t−1})^⊤∇c_t(z_{t−1}) ) ] ≤ L/(2η) + min_{x∈P} Σ_{t=1}^T c_t(x).

Combining the above result with Lemma 1, we have

Σ_{t=1}^T c_t(x_t) − min_{x∈P} Σ_{t=1}^T c_t(x) ≤ L/(2η) + (η/(2L)) EGV_{T,2}.

By choosing η = min(1, L/√EGV_{T,2}), we have the regret bound claimed in Theorem 2.

Lemma 1 is proved by induction. The key observation is that z_t is the optimal solution to the strongly convex minimization problem in Lemma 1 restricted to the first t trials, i.e.,

z_t = arg min_{x∈P} [ (L/(2η))‖x‖²₂ + Σ_{τ=1}^t ( c_τ(z_{τ−1}) + (x − z_{τ−1})^⊤∇c_τ(z_{τ−1}) ) ].

Proof of Lemma 1 We prove the inequality by induction. When T = 1, we have x_1 = z_0 = 0 and

min_{x∈P} [ (L/(2η))‖x‖²₂ + c_1(z_0) + (x − z_0)^⊤∇c_1(z_0) ] + (η/(2L))‖∇c_1(z_0)‖²₂
≥ c_1(z_0) + min_x { (L/(2η))‖x‖²₂ + (x − z_0)^⊤∇c_1(z_0) } + (η/(2L))‖∇c_1(z_0)‖²₂ = c_1(z_0) = c_1(x_1),

where the inequality follows by relaxing the minimization domain from x ∈ P to the whole space, over which the minimum value is −(η/(2L))‖∇c_1(z_0)‖²₂. We now assume the inequality holds up to trial t and aim to prove it for t + 1. To this end, we define

ψ_t(x) = [ (L/(2η))‖x‖²₂ + Σ_{τ=1}^t ( c_τ(z_{τ−1}) + (x − z_{τ−1})^⊤∇c_τ(z_{τ−1}) ) ] + (η/(2L)) Σ_{τ=0}^{t−1} ‖∇c_{τ+1}(z_τ) − ∇c_τ(z_τ)‖²₂.

According to the updating procedure for z_t in step 6, we have z_t = arg min_{x∈P} ψ_t(x). Define φ_t = ψ_t(z_t) = min_{x∈P} ψ_t(x). Since ψ_t(x) is an (L/η)-strongly convex function, we have

ψ_{t+1}(x) ≥ ψ_{t+1}(z_t) + (L/(2η))‖x − z_t‖²₂ + (x − z_t)^⊤∇ψ_{t+1}(z_t)
= ψ_{t+1}(z_t) + (L/(2η))‖x − z_t‖²₂ + (x − z_t)^⊤( ∇ψ_t(z_t) + ∇c_{t+1}(z_t) ).

Setting x = z_{t+1} = arg min_{x∈P} ψ_{t+1}(x) in the above inequality, and noting that ψ_{t+1}(z_t) = φ_t + c_{t+1}(z_t) + (η/(2L))‖∇c_{t+1}(z_t) − ∇c_t(z_t)‖²₂, we obtain

φ_{t+1} ≥ φ_t + c_{t+1}(z_t) + (η/(2L))‖∇c_{t+1}(z_t) − ∇c_t(z_t)‖²₂ + (L/(2η))‖z_{t+1} − z_t‖²₂ + (z_{t+1} − z_t)^⊤( ∇ψ_t(z_t) + ∇c_{t+1}(z_t) )
≥ φ_t + c_{t+1}(z_t) + (η/(2L))‖∇c_{t+1}(z_t) − ∇c_t(z_t)‖²₂ + (L/(2η))‖z_{t+1} − z_t‖²₂ + (z_{t+1} − z_t)^⊤∇c_{t+1}(z_t),

where the second inequality follows from the fact that z_t = arg min_{x∈P} ψ_t(x), and therefore (x − z_t)^⊤∇ψ_t(z_t) ≥ 0 for all x ∈ P. Bounding the last two terms by their minimum over x ∈ P, we have

φ_{t+1} − φ_t − (η/(2L))‖∇c_{t+1}(z_t) − ∇c_t(z_t)‖²₂ ≥ min_{x∈P} { (L/(2η))‖x − z_t‖²₂ + (x − z_t)^⊤∇c_{t+1}(z_t) } + c_{t+1}(z_t)
= min_{x∈P} { ρ(x) + c_{t+1}(z_t) + r(x) }, (12)

where ρ(x) = (L/(2η))‖x − z_t‖²₂ + (x − z_t)^⊤∇c_t(z_t) and r(x) = (x − z_t)^⊤( ∇c_{t+1}(z_t) − ∇c_t(z_t) ). To bound the right-hand side, we note that x_{t+1} is the minimizer of ρ(x) by step 4 of Algorithm 3, and ρ(x) is an (L/η)-strongly convex function, so we have

ρ(x) ≥ ρ(x_{t+1}) + (x − x_{t+1})^⊤∇ρ(x_{t+1}) + (L/(2η))‖x − x_{t+1}‖²₂ ≥ ρ(x_{t+1}) + (L/(2η))‖x − x_{t+1}‖²₂,

since (x − x_{t+1})^⊤∇ρ(x_{t+1}) ≥ 0 for all x ∈ P. Then

ρ(x) + c_{t+1}(z_t) + r(x) ≥ ρ(x_{t+1}) + c_{t+1}(z_t) + (L/(2η))‖x − x_{t+1}‖²₂ + r(x),

where ρ(x_{t+1}) = (L/(2η))‖x_{t+1} − z_t‖²₂ + (x_{t+1} − z_t)^⊤∇c_t(z_t).

Plugging the above inequality into (12), we have

φ_{t+1} − φ_t − (η/(2L))‖∇c_{t+1}(z_t) − ∇c_t(z_t)‖²₂
≥ (L/(2η))‖x_{t+1} − z_t‖²₂ + (x_{t+1} − z_t)^⊤∇c_t(z_t) + c_{t+1}(z_t) + min_{x∈P} { (L/(2η))‖x − x_{t+1}‖²₂ + (x − z_t)^⊤( ∇c_{t+1}(z_t) − ∇c_t(z_t) ) }
= (L/(2η))‖x_{t+1} − z_t‖²₂ + (x_{t+1} − z_t)^⊤∇c_{t+1}(z_t) + c_{t+1}(z_t) + min_{x∈P} { (L/(2η))‖x − x_{t+1}‖²₂ + (x − x_{t+1})^⊤( ∇c_{t+1}(z_t) − ∇c_t(z_t) ) }
≥ (L/(2η))‖x_{t+1} − z_t‖²₂ + (x_{t+1} − z_t)^⊤∇c_{t+1}(z_t) + c_{t+1}(z_t) + min_x { (L/(2η))‖x − x_{t+1}‖²₂ + (x − x_{t+1})^⊤( ∇c_{t+1}(z_t) − ∇c_t(z_t) ) }
= (L/(2η))‖x_{t+1} − z_t‖²₂ + (x_{t+1} − z_t)^⊤∇c_{t+1}(z_t) + c_{t+1}(z_t) − (η/(2L))‖∇c_{t+1}(z_t) − ∇c_t(z_t)‖²₂
≥ c_{t+1}(x_{t+1}) − (η/(2L))‖∇c_{t+1}(z_t) − ∇c_t(z_t)‖²₂,

where the first equality follows by writing (x_{t+1} − z_t)^⊤∇c_t(z_t) = (x_{t+1} − z_t)^⊤∇c_{t+1}(z_t) − (x_{t+1} − z_t)^⊤( ∇c_{t+1}(z_t) − ∇c_t(z_t) ) and combining the subtracted term with (x − z_t)^⊤( ∇c_{t+1}(z_t) − ∇c_t(z_t) ), the second inequality follows by relaxing the minimization to the whole space, and the last inequality follows from the smoothness condition on c_{t+1}(x) together with η ≤ 1. Hence φ_{t+1} ≥ φ_t + c_{t+1}(x_{t+1}); since by induction φ_t ≥ Σ_{τ=1}^t c_τ(x_τ), we have φ_{t+1} ≥ Σ_{τ=1}^{t+1} c_τ(x_τ).

2.2 A Prox Method for Online Convex Optimization

In this subsection, we present a prox method for online convex optimization that enjoys the same order of regret bound as the improved FTRL algorithm. It is closely related to the prox method in [1], maintaining two sets of vectors x_{1:T} and z_{1:T}, where x_t and z_t are computed by gradient mappings

using ∇c_{t−1}(x_{t−1}) and ∇c_t(x_t), respectively, as

x_t = arg min_{x∈P} (1/2)‖x − ( z_{t−1} − (η/L)∇c_{t−1}(x_{t−1}) )‖²₂,
z_t = arg min_{x∈P} (1/2)‖x − ( z_{t−1} − (η/L)∇c_t(x_t) )‖²₂.

Algorithm 4 A Prox Method for Online Convex Optimization
1: Input: η > 0
2: Initialization: z_0 = x_0 = 0 and c_0(x) ≡ 0
3: for t = 1, ..., T do
4:   Predict x_t by x_t = arg min_{x∈P} { x^⊤∇c_{t−1}(x_{t−1}) + (L/(2η))‖x − z_{t−1}‖²₂ }
5:   Receive cost function c_t and incur loss c_t(x_t)
6:   Update z_t by z_t = arg min_{x∈P} { x^⊤∇c_t(x_t) + (L/(2η))‖x − z_{t−1}‖²₂ }
7: end for

The detailed steps are shown in Algorithm 4, where we use an equivalent form of the updates for x_t and z_t in order to facilitate comparison with Algorithm 3. Algorithm 4 differs from Algorithm 3 in two respects: (i) to update the searching point z_t, Algorithm 3 uses the FTRL scheme with all the gradients of the cost functions at the points {z_{τ−1}}_{τ=1}^t, while Algorithm 4 uses a prox step with the single gradient ∇c_t(x_t); and (ii) to update the decision vector x_t, Algorithm 4 uses the gradient ∇c_{t−1}(x_{t−1}) instead of ∇c_{t−1}(z_{t−1}). An advantage of Algorithm 4 over Algorithm 3 is that it only requires computing one gradient, ∇c_t(x_t), for each cost function; in contrast, the improved FTRL algorithm needs to compute the gradients of c_t(x) at two searching points, z_{t−1} and z_t. It is these differences that make it easier to extend the prox method to a bandit setting, which will be discussed in Section 4. The following theorem states the regret bound of the prox method for online convex optimization.

Theorem 3 Let c_t, t = 1, ..., T, be a sequence of convex functions with L-Lipschitz continuous gradients. By setting η = (1/2) min { 1/√2, L/√EGV_{T,2} }, we have the following regret bound for Algorithm 4:

Σ_{t=1}^T c_t(x_t) − min_{x∈P} Σ_{t=1}^T c_t(x) ≤ 2 max( √2 L, √EGV_{T,2} ).

We note that, compared to Theorem 2, the regret bound in Theorem 3 is slightly worse, by a constant factor of 2. To prove Theorem 3, we need the following lemma, which is Lemma 3.1 in [1] stated in our notation.

Lemma 2 (Lemma 3.1 in [1]) Let ω(z) be an α-strongly convex function with respect to the norm ‖·‖, whose dual norm is denoted by ‖·‖*, and let D(x, z) = ω(x) − ω(z) − (x − z)^⊤∇ω(z) be the Bregman distance induced by the function ω(x). Let Z be a convex compact set, and U ⊆ Z be convex and closed. Let z ∈ Z and γ > 0. Consider the points

x = arg min_{u∈U} { γ u^⊤ξ + D(u, z) }, (13)
z₊ = arg min_{u∈U} { γ u^⊤ζ + D(u, z) }; (14)

then for any u ∈ U, we have

γ ζ^⊤(x − u) ≤ D(u, z) − D(u, z₊) + (γ²/α)‖ξ − ζ‖²* − (α/2)[ ‖x − z‖² + ‖x − z₊‖² ]. (15)

So that readers need not struggle with the more complex notation of [1], we present a detailed proof in Appendix A, which is an adaptation of the original proof to our notation. Theorem 3 can be proved using the above lemma, because the updates of x_t and z_t can be written equivalently in the forms (13) and (14), respectively. The proof below starts from (15) and bounds the summation of each term over t = 1, ..., T.

Proof of Theorem 3 First, we note that the two updates in steps 4 and 6 of Algorithm 4 fit Lemma 2 if we let U = Z = P, z = z_{t−1}, x = x_t, z₊ = z_t, and ω(x) = (1/2)‖x‖²₂, which is a 1-strongly convex function with respect to ‖·‖₂. Then D(u, z) = (1/2)‖u − z‖²₂. As a result, the two updates for x_t and z_t in Algorithm 4 are exactly the updates in (13) and (14) with z = z_{t−1}, γ = η/L, ξ = ∇c_{t−1}(x_{t−1}), and ζ = ∇c_t(x_t). Substituting these into (15), we have, for any u ∈ P,

(η/L)(x_t − u)^⊤∇c_t(x_t) ≤ (1/2)( ‖u − z_{t−1}‖²₂ − ‖u − z_t‖²₂ ) + (η²/L²)‖∇c_t(x_t) − ∇c_{t−1}(x_{t−1})‖²₂ − (1/2)( ‖x_t − z_{t−1}‖²₂ + ‖x_t − z_t‖²₂ ).

Then we have

(η/L)( c_t(x_t) − c_t(u) ) ≤ (η/L)(x_t − u)^⊤∇c_t(x_t)
≤ (1/2)( ‖u − z_{t−1}‖²₂ − ‖u − z_t‖²₂ ) + (2η²/L²)‖∇c_t(x_{t−1}) − ∇c_{t−1}(x_{t−1})‖²₂ + (2η²/L²)‖∇c_t(x_t) − ∇c_t(x_{t−1})‖²₂ − (1/2)( ‖x_t − z_{t−1}‖²₂ + ‖x_t − z_t‖²₂ )
≤ (1/2)( ‖u − z_{t−1}‖²₂ − ‖u − z_t‖²₂ ) + (2η²/L²)‖∇c_t(x_{t−1}) − ∇c_{t−1}(x_{t−1})‖²₂ + 2η²‖x_t − x_{t−1}‖²₂ − (1/2)( ‖x_t − z_{t−1}‖²₂ + ‖x_t − z_t‖²₂ ),

where the first inequality follows from the convexity of c_t(x), the second from ‖a + b‖² ≤ 2‖a‖² + 2‖b‖², and the third from the smoothness of c_t(x).

By taking the summation over t = 1, ..., T with x* = arg min_{u∈P} Σ_{t=1}^T c_t(u), using ‖x* − z_0‖₂ ≤ 1, and dividing both sides by η/L, we have

Σ_{t=1}^T c_t(x_t) − min_{x∈P} Σ_{t=1}^T c_t(x) ≤ L/(2η) + (2η/L) Σ_{t=0}^{T−1} ‖∇c_{t+1}(x_t) − ∇c_t(x_t)‖²₂ + 2ηL Σ_{t=1}^T ‖x_t − x_{t−1}‖²₂ − (L/η) B_T, (16)

where B_T = (1/2) Σ_{t=1}^T ( ‖x_t − z_{t−1}‖²₂ + ‖x_t − z_t‖²₂ ). We can bound B_T as follows:

B_T = (1/2) Σ_{t=1}^T ( ‖x_t − z_{t−1}‖²₂ + ‖x_t − z_t‖²₂ ) ≥ (1/2) Σ_{t=2}^T ( ‖x_t − z_{t−1}‖²₂ + ‖x_{t−1} − z_{t−1}‖²₂ )
≥ (1/4) Σ_{t=2}^T ‖x_t − x_{t−1}‖²₂ = (1/4) Σ_{t=1}^T ‖x_t − x_{t−1}‖²₂, (17)

where the last equality follows from x_1 = x_0. Plugging the above bound into (16), we have

Σ_{t=1}^T c_t(x_t) − min_{x∈P} Σ_{t=1}^T c_t(x) ≤ L/(2η) + (2η/L) EGV_{T,2} + ( 2ηL − L/(4η) ) Σ_{t=1}^T ‖x_t − x_{t−1}‖²₂. (18)

We complete the proof by plugging in the value of η, for which 2ηL − L/(4η) ≤ 0.

2.3 A General Prox Method and Some Special Cases

In this subsection, we first present a general prox method that yields a variation bound defined in terms of a general norm. We then discuss three special cases: online linear optimization, prediction with expert advice, and online strictly convex optimization. The omitted proofs in this subsection can easily be reproduced by mimicking the proof of Theorem 3, if necessary with the help of the previous analysis as indicated in the appropriate text.

Algorithm 5 A General Prox Method for Online Convex Optimization
1: Input: η > 0, ω(z)
2: Initialization: z_0 = x_0 = arg min_{z∈P} ω(z) and c_0(x) ≡ 0
3: for t = 1, ..., T do
4:   Predict x_t by x_t = arg min_{x∈P} { x^⊤∇c_{t−1}(x_{t−1}) + (L/η) D(x, z_{t−1}) }
5:   Receive cost function c_t and incur loss c_t(x_t)
6:   Update z_t by z_t = arg min_{x∈P} { x^⊤∇c_t(x_t) + (L/η) D(x, z_{t−1}) }
7: end for

2.3.1 A General Prox Method

The prox method, together with Lemma 2, provides an easy way to generalize the framework from the Euclidean norm to a general norm. To be precise, let ‖·‖ denote a general norm, let ‖·‖* denote its dual norm, let ω(z) be an α-strongly convex function with respect to the norm ‖·‖, and let D(x, z) = ω(x) − ω(z) − (x − z)^⊤∇ω(z) be the Bregman distance induced by ω(x). Let c_t, t = 1, ..., T, be L-smooth functions with respect to ‖·‖, i.e., ‖∇c_t(x) − ∇c_t(z)‖* ≤ L‖x − z‖. Correspondingly, we define the extended gradual variation based on the general norm as follows:

EGV_T = Σ_{t=0}^{T−1} ‖∇c_{t+1}(x_t) − ∇c_t(x_t)‖²*. (19)

Algorithm 5 gives the detailed steps of the general framework. We note that the key differences from Algorithm 4 are that z_0 is set to arg min_{z∈P} ω(z), and that the Euclidean distances in steps 4 and 6 are replaced by Bregman distances, i.e.,

x_t = arg min_{x∈P} { x^⊤∇c_{t−1}(x_{t−1}) + (L/η) D(x, z_{t−1}) },
z_t = arg min_{x∈P} { x^⊤∇c_t(x_t) + (L/η) D(x, z_{t−1}) }.

The following theorem states the variation-based regret bound for the general-norm framework, where R measures the size of P, defined as R = max_{x∈P} ω(x) − min_{x∈P} ω(x).
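For intuition, with ω(x) = (1/2)‖x‖²₂ (so D(x, z) = (1/2)‖x − z‖²₂) and P the unit ball — both assumptions of this sketch — Algorithm 5 reduces to Algorithm 4, and both prox steps become projected gradient steps; the sketch also makes the single-gradient-per-round structure explicit:

```python
import numpy as np

def proj_ball(x):
    n = np.linalg.norm(x)
    return x / n if n > 1.0 else x

def prox_method(grad_fns, eta, L, d):
    """Sketch of Algorithm 5 with omega(x) = 0.5||x||^2 over the unit
    ball, i.e. Algorithm 4.  grad_fns[t-1](x) returns grad c_t(x).
    Only one gradient per round is computed.
    """
    z = np.zeros(d)           # z_0 = x_0 = argmin omega
    prev_grad = np.zeros(d)   # grad c_{t-1}(x_{t-1}); c_0 = 0
    xs = []
    for t in range(1, len(grad_fns) + 1):
        # step 4: x_t = argmin x.grad c_{t-1}(x_{t-1}) + (L/eta) D(x, z_{t-1})
        x = proj_ball(z - (eta / L) * prev_grad)
        xs.append(x)
        # step 6: z_t = argmin x.grad c_t(x_t) + (L/eta) D(x, z_{t-1})
        g = grad_fns[t - 1](x)
        z = proj_ball(z - (eta / L) * g)
        prev_grad = g         # reused as the surrogate in round t+1
    return xs
```

On an unchanging cost sequence the gradual variation is a constant, and the sketch's regret indeed stays bounded independently of T.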

Theorem 4 Let c_t, t = 1, ..., T, be a sequence of convex functions that are L-smooth with respect to the norm ‖·‖, let ω(z) be α-strongly convex with respect to ‖·‖, and let EGV_T be defined in (19). By setting η = min { α/(2√2), L√(αR)/√(2 EGV_T) }, we have the following regret bound for Algorithm 5:

Σ_{t=1}^T c_t(x_t) − min_{x∈P} Σ_{t=1}^T c_t(x) ≤ 2 max( 2√2 LR/α, √(2R·EGV_T/α) ).

2.3.2 Online Linear Optimization

Here we consider online linear optimization and present the algorithm and the gradual variation bound for this setting as a special case of the proposed algorithm. In particular, we are interested in bounding the regret by the gradual variation

EGV^f_{T,2} = Σ_{t=0}^{T−1} ‖f_{t+1} − f_t‖²₂,

where f_t, t = 1, ..., T, are the linear cost vectors and f_0 = 0. Since linear functions are smooth functions that satisfy the inequality in (8) for any positive L > 0, we can apply Algorithm 4 to online linear optimization with any positive value of L³. The regret bound of Algorithm 4 for online linear optimization is presented in the following corollary.

Corollary 1 Let c_t(x) = f_t^⊤x, t = 1, ..., T, be a sequence of linear functions. By setting η = 1/√(2 EGV^f_{T,2}) and L = 1 in Algorithm 4, we have

Σ_{t=1}^T f_t^⊤x_t − min_{x∈P} Σ_{t=1}^T f_t^⊤x ≤ √(2 EGV^f_{T,2}).

Remark 3 Note that the regret bound in Corollary 1 is stronger than the regret bound obtained in [16] for online linear optimization, due to the fact that the gradual variation is smaller than the total variation.

2.3.3 Prediction with Expert Advice

In the problem of prediction with expert advice, the decision vector x is a distribution over m experts, i.e., x ∈ P = {x ∈ R^m₊ : Σ_{i=1}^m x_i = 1}. Let f_t ∈ R^m denote the costs of the m experts in trial t. Similar to [16], we would like to bound the regret of prediction from expert advice by the gradual variation defined in the infinity norm, i.e.,

EGV^f_{T,∞} = Σ_{t=0}^{T−1} ‖f_{t+1} − f_t‖²_∞.

³ We simply set L = 1 for online linear optimization and prediction with expert advice.

Since this is a special online linear optimization problem, we can apply Algorithm 4 to obtain a regret bound as in Corollary 1, i.e.,

Σ_{t=1}^T f_t^⊤x_t − min_{x∈P} Σ_{t=1}^T f_t^⊤x ≤ √(2 EGV^f_{T,2}) ≤ √(2m EGV^f_{T,∞}).

However, the above regret bound scales badly with the number of experts. We can obtain a better regret bound of order O(√(EGV^f_{T,∞} ln m)) by applying the general prox method in Algorithm 5 with ω(x) = Σ_{i=1}^m x_i ln x_i and D(x, z) = Σ_{i=1}^m x_i ln(x_i/z_i). The two updates in Algorithm 5 become

x^i_t = z^i_{t−1} exp(−(η/L) f^i_{t−1}) / Σ_{j=1}^m z^j_{t−1} exp(−(η/L) f^j_{t−1}), i = 1, ..., m,
z^i_t = z^i_{t−1} exp(−(η/L) f^i_t) / Σ_{j=1}^m z^j_{t−1} exp(−(η/L) f^j_t), i = 1, ..., m.

The resulting regret bound is formally stated in the following corollary.

Corollary 2 Let c_t(x) = f_t^⊤x, t = 1, ..., T, be a sequence of linear functions in prediction with expert advice. By setting η = √(ln m / EGV^f_{T,∞}), L = 1, ω(x) = Σ_{i=1}^m x_i ln x_i and D(x, z) = Σ_{i=1}^m x_i ln(x_i/z_i) in Algorithm 5, we have

Σ_{t=1}^T f_t^⊤x_t − min_{x∈P} Σ_{t=1}^T f_t^⊤x ≤ 2√(EGV^f_{T,∞} ln m).

Remark 4 By the definition of EGV^f_{T,∞}, the regret bound in Corollary 2 is of order O(√(Σ_t max_i |f^i_{t+1} − f^i_t|² ln m)), which is similar to the regret bound obtained in [16] for prediction with expert advice. However, the definitions of the variation are not exactly the same. In [16], the authors bound the regret of prediction with expert advice by O(√(ln m · max_i Σ_{t=1}^T |f^i_t − μ_i|²) + ln m), where the variation is the maximum total variation over all experts. To compare the two regret bounds, we first consider two extreme cases. When the costs of all experts are the same, the variation in Corollary 2 is a standard gradual variation, while the variation in [16] is a standard total variation; according to the previous analysis, a gradual variation is smaller than a total variation, so the regret bound in Corollary 2 is better than that in [16]. In the other extreme case, when the costs of each expert are the same across all iterations, both regret bounds are constants.
More generally, if we assume the maximum total variation is small, say a constant, then ∑_{t=0}^{T−1} |f_{t+1}^i − f_t^i|^2 is also a constant for any i ∈ [m]. By the trivial analysis

∑_{t=0}^{T−1} max_i |f_{t+1}^i − f_t^i|^2 ≤ m max_i ∑_{t=0}^{T−1} |f_{t+1}^i − f_t^i|^2,

the regret bound in Corollary 2 might be worse, up to a factor of √m, than that in [16].

Remark 5 It was shown in [8] that both the regret bounds in Corollary 1 and Corollary 2 are optimal, because they match the lower bounds for a special sequence of loss functions. In particular, for online linear optimization, if all loss functions except the first T_k = EGV_{f,T,2} are all-0 functions, then the known lower bound Ω(√T_k) [2] matches the upper bound in Corollary 1. Similarly, for prediction with expert advice, if all loss functions except the first T_k = EGV_{f,T,∞} are all-0 functions, then the known lower bound Ω(√(T_k ln m)) [6] matches the upper bound in Corollary 2.

2.3.4 Online Strictly Convex Optimization

In this subsection, we present an algorithm that achieves a logarithmic variation bound for online strictly convex optimization [14]. In particular, we assume the cost functions c_t(x) are not only smooth but also strictly convex, defined formally as follows.

Definition 1 For β > 0, a function c(x) : P → R is β-strictly convex if for any x, z ∈ P

c(x) ≥ c(z) + ∇c(z)^T (x − z) + β (x − z)^T ∇c(z) ∇c(z)^T (x − z).   (20)

Strictly convex functions so defined include strongly convex functions and exponentially concave functions as special cases, as long as the gradient of the function is bounded. To see this, if c(x) is a β′-strongly convex function with a bounded gradient ‖∇c(x)‖_2 ≤ G, then

c(x) ≥ c(z) + ∇c(z)^T (x − z) + (β′/2)‖x − z‖_2^2 ≥ c(z) + ∇c(z)^T (x − z) + (β′/(2G^2)) (x − z)^T ∇c(z) ∇c(z)^T (x − z),

thus c(x) is β′/(2G^2)-strictly convex. Similarly, if c(x) is exp-concave, i.e., there exists α > 0 such that h(x) = exp(−α c(x)) is concave, then c(x) is β-strictly convex with β = (1/2) min{1/(4GD), α} (c.f. Lemma 2 in [14]), where D is the diameter of the domain. Therefore, in addition to smoothness and strict convexity, we also assume all the cost functions have bounded gradients, i.e., ‖∇c_t(x)‖_2 ≤ G. In [8], a logarithmic gradual variation regret bound has been proved. For completeness, we present the algorithm in the framework of the general prox method and summarize the results and analysis.
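The generalized-Euclidean updates used in this subsection can be sketched on an unconstrained domain, where the matrix H_t accumulates gradient outer products; the exact constants in H_t are our reconstruction of the garbled source, and the quadratic losses used to exercise the updates are illustrative assumptions, not the paper's setting:

```python
import numpy as np

def strictly_convex_prox(grad_fn, T, d, beta, G):
    """Sketch of the x_t / z_t updates of this subsection on R^d, with
    H_t = (1 + beta*G^2) I + beta * sum_tau grad_tau grad_tau^T (assumed form).

    grad_fn(t, x) returns grad c_t(x). Returns the T x d array of decisions.
    """
    H = (1.0 + beta * G * G) * np.eye(d)   # H_0 before any gradient is seen
    z = np.zeros(d)
    g_prev = np.zeros(d)                   # gradient of c_0 = 0
    xs = []
    for t in range(T):
        # x_t: minimize x^T g_{t-1} + 0.5*||x - z_{t-1}||^2_{H_{t-1}}
        x = z - np.linalg.solve(H, g_prev)
        xs.append(x)
        g = grad_fn(t, x)
        H = H + beta * np.outer(g, g)      # H_t now includes grad c_t(x_t)
        # z_t: minimize x^T g_t + 0.5*||x - z_{t-1}||^2_{H_t}
        z = z - np.linalg.solve(H, g)
        g_prev = g
    return np.array(xs)
```

Because H_t grows with the accumulated outer products, the effective step sizes shrink over time, which is what yields the logarithmic dependence in the regret bound.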
To derive a logarithmic gradual variation bound for online strictly convex optimization, we need to change the Euclidean distance function in Algorithm 4 to a generalized Euclidean distance function. Specifically, at trial t, we let

H_t = (1 + βG^2) I + β ∑_{τ=0}^{t} ∇c_τ(x_τ) ∇c_τ(x_τ)^T

and use the generalized Euclidean distance D_t(x, z) = (1/2)‖x − z‖_{H_t}^2 = (1/2)(x − z)^T H_t (x − z) in updating

x_t and z_t, i.e.,

x_t = arg min_{x∈P} { x^T ∇c_{t−1}(x_{t−1}) + (1/2)‖x − z_{t−1}‖_{H_{t−1}}^2 },
z_t = arg min_{x∈P} { x^T ∇c_t(x_t) + (1/2)‖x − z_{t−1}‖_{H_t}^2 }.   (21)

To prove the regret bound, we can prove an inequality similar to that in Lemma 2 by applying ω(x) = (1/2)‖x‖_{H_t}^2, which is stated as follows:

∇c_t(x_t)^T (x_t − z) ≤ D_t(z, z_{t−1}) − D_t(z, z_t) + (1/2)‖∇c_t(x_t) − ∇c_{t−1}(x_{t−1})‖_{H_t^{−1}}^2 − (1/2)[ ‖x_t − z_{t−1}‖_{H_t}^2 + ‖x_t − z_t‖_{H_t}^2 ].

Then, by applying the inequality in (20) for strictly convex functions, we obtain the following:

c_t(x_t) − c_t(z) ≤ D_t(z, z_{t−1}) − D_t(z, z_t) − β‖x_t − z‖_{h_t}^2 + (1/2)‖∇c_t(x_t) − ∇c_{t−1}(x_{t−1})‖_{H_t^{−1}}^2 − (1/2)[ ‖x_t − z_{t−1}‖_{H_t}^2 + ‖x_t − z_t‖_{H_t}^2 ],

where h_t = ∇c_t(x_t) ∇c_t(x_t)^T. It remains to apply the analysis in [8] to obtain a logarithmic gradual variation bound, which is stated in the following corollary; its proof is deferred to Appendix C.

Corollary 3 Let c_t(x), t = 1, ..., T, be a sequence of β-strictly convex and L-smooth functions with gradients bounded by G. We assume 8dL ≥ 1; otherwise we can set L = 1/(8d). An algorithm that adopts the updates in (21) has its regret bounded by

∑_{t=1}^T c_t(x_t) − min_{x∈P} ∑_{t=1}^T c_t(x) ≤ 1 + βG^2 + (8d/β) ln max(16dL, β EGV_{T,2}),

where EGV_{T,2} = ∑_{t=0}^{T−1} ‖∇c_{t+1}(x_t) − ∇c_t(x_t)‖_2^2 and d is the dimension of x ∈ P.

3 Online Non-Smooth Optimization with Gradual Variation Bound

All the results we have obtained so far rely on the assumption that the cost functions are smooth. Moreover, at the beginning of Section 2 we showed that for general non-smooth functions, when the only information presented to the learner is the first-order information about the cost functions, it is impossible to obtain a regret bound by gradual variation. In this section, however, we show that a gradual variation bound is achievable for a special class of non-smooth functions that are composed of a smooth component and a non-smooth component.

We consider two categories of non-smooth components. In the first category, we assume that the non-smooth component is a fixed function and is relatively simple, so that the composite gradient mapping can be solved without much computational overhead compared to the gradient mapping. A common example that falls into this category is a non-smooth regularizer. For example, in addition to the basic domain P, one may want to enforce a sparsity constraint on the decision vector x, i.e., ‖x‖_0 ≤ k < d, which is important in feature selection. However, the sparsity constraint ‖x‖_0 ≤ k is a non-convex function, and it is usually implemented by adding an ℓ_1 regularizer λ‖x‖_1 to the objective function, where λ > 0 is a regularization parameter. Therefore, at each iteration the cost function is given by c_t(x) + λ‖x‖_1. To prove a regret bound by gradual variation for this type of non-smooth optimization, we first present a simplified version of the general prox method and show that it has exactly the same regret bound as stated in Subsection 2.3.1, and then extend the algorithm to non-smooth optimization with a fixed non-smooth component.

In the second category, we assume that the non-smooth component can be written with an explicit maximization structure. In general, we consider a time-varying non-smooth component, present a primal-dual prox method, and prove a min-max regret bound by gradual variation. When the non-smooth components are equal across all trials, the usual regret is bounded by the min-max bound plus a variation in the non-smooth component. To see an application of the min-max regret bound, we consider the problem of online classification with hinge loss and show that the number of mistakes can be bounded by a variation in the sequential examples.
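For the first category, with g(x) = λ‖x‖_1 and Euclidean ω on an unconstrained domain, the composite gradient mapping mentioned above has a closed-form soft-thresholding solution; a minimal sketch (the argument names and the generic step size η/L are our notation, not fixed by the source):

```python
import numpy as np

def soft_threshold(v, tau):
    """Entrywise solution of min_x 0.5*(x - v)^2 + tau*|x|."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def composite_mapping(g, z, lam, eta_over_L):
    """Composite gradient mapping for the l1 regularizer on R^d:

        argmin_x  <x, g> + (L/(2*eta)) * ||x - z||_2^2 + lam * ||x||_1

    which reduces to soft-thresholding the gradient step z - (eta/L)*g.
    """
    return soft_threshold(z - eta_over_L * g, eta_over_L * lam)
```

This is why the ℓ_1 composite mapping costs essentially the same as a plain gradient mapping: it is a gradient step followed by an entrywise shrinkage.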
Before moving to the detailed analysis, it is worth mentioning that several works have proposed algorithms for optimizing the two types of non-smooth functions described above with an optimal convergence rate of O(1/T) [4,3]. Therefore, the existence of a regret bound by gradual variation for these two types of non-smooth optimization does not contradict the impossibility argument in Section 2.

3.1 Online Non-Smooth Optimization with a Fixed Non-Smooth Component

3.1.1 A Simplified Version of Algorithm 5

In this subsection, we present a simplified version of Algorithm 5, which is the foundation on which we develop the algorithm for non-smooth optimization. The key trick is to replace the domain constraint x ∈ P with a non-smooth function in the objective. Let δ_P(x) denote the indicator function of the domain P, i.e.,

δ_P(x) = 0 if x ∈ P, and δ_P(x) = ∞ otherwise.

Then the proximal gradient mapping for updating x_t (step 4 in Algorithm 5) is equivalent to

x_t = arg min_x x^T ∇c_{t−1}(x_{t−1}) + (L/η) D(x, z_{t−1}) + δ_P(x).   (22)

By the first-order optimality condition, there exists a sub-gradient v_t ∈ ∂δ_P(x_t) such that

∇c_{t−1}(x_{t−1}) + (L/η)(∇ω(x_t) − ∇ω(z_{t−1})) + v_t = 0.   (23)

Thus, x_t is equal to

x_t = arg min_x x^T (∇c_{t−1}(x_{t−1}) + v_t) + (L/η) D(x, z_{t−1}).   (24)

Then we can change the update for z_t to

z_t = arg min_x x^T (∇c_t(x_t) + v_t) + (L/η) D(x, z_{t−1}).   (25)

The key ingredient of the above update, compared to step 6 in Algorithm 5, is that we explicitly use the sub-gradient v_t that satisfies the optimality condition for x_t instead of solving a domain-constrained optimization problem. The advantage of updating z_t by (25) is that we can easily compute z_t from the first-order optimality condition, i.e.,

∇c_t(x_t) + v_t + (L/η)(∇ω(z_t) − ∇ω(z_{t−1})) = 0.   (26)

Note that equation (23) gives v_t = −∇c_{t−1}(x_{t−1}) − (L/η)(∇ω(x_t) − ∇ω(z_{t−1})). By plugging this into (26), we arrive at the following simplified update for z_t:

∇ω(z_t) = ∇ω(x_t) − (η/L)(∇c_t(x_t) − ∇c_{t−1}(x_{t−1})).

The simplified version of Algorithm 5 is presented in Algorithm 6.

Remark 6 We make three remarks for Algorithm 6. First, the searching point z_t does not necessarily belong to the domain P, which is usually not a problem given that the decision point x_t is always in P. Nevertheless, the update can be followed by a projection step z_t = arg min_{x∈P} D(x, z_t) to ensure that the searching point also stays in the domain P, where we slightly abuse the notation z_t for the solution of ∇ω(z_t) = ∇ω(x_t) − (η/L)(∇c_t(x_t) − ∇c_{t−1}(x_{t−1})). Second, the update in step 6 can be implemented by [6, Chapter 11]:

z_t = ∇ω*(∇ω(x_t) − (η/L)(∇c_t(x_t) − ∇c_{t−1}(x_{t−1}))),

Algorithm 6 A Simplified General Prox Method for Online Convex Optimization
1: Input: η > 0, ω(z)
2: Initialization: z_0 = x_0 = arg min_{z∈P} ω(z) and c_0(x) = 0
3: for t = 1, ..., T do
4: Predict x_t by x_t = arg min_{x∈P} { x^T ∇c_{t−1}(x_{t−1}) + (L/η) D(x, z_{t−1}) }
5: Receive cost function c_t and incur loss c_t(x_t)
6: Update z_t by solving ∇ω(z_t) = ∇ω(x_t) − (η/L)(∇c_t(x_t) − ∇c_{t−1}(x_{t−1}))
7: end for

where ω* is the Legendre-Fenchel conjugate of ω. For example, when ω(x) = (1/2)‖x‖_2^2, we have ω*(x) = (1/2)‖x‖_2^2 and the update for the searching point is given by z_t = x_t − (η/L)(∇c_t(x_t) − ∇c_{t−1}(x_{t−1})); when ω(x) = ∑_i x_i ln x_i, we have ω*(x) = ln[∑_i exp(x_i)] and the update for the searching point can be computed by

[z_t]_i ∝ [x_t]_i exp(−(η/L)[∇c_t(x_t) − ∇c_{t−1}(x_{t−1})]_i), s.t. ∑_i [z_t]_i = 1.

Third, the key inequality in (15) for proving the regret bound still holds for ζ = ∇c_{t−1}(x_{t−1}) + v_{t−1} and ξ = ∇c_t(x_t) + v_t, by noting the equivalence between the pairs (24), (25) and (13), (14); it is given below:

(η/L)(∇c_t(x_t) + v_t)^T (x_t − x) ≤ D(x, z_{t−1}) − D(x, z_t) + (η^2/(αL^2))‖∇c_t(x_t) − ∇c_{t−1}(x_{t−1})‖_*^2 − (α/2)[ ‖x_t − z_{t−1}‖^2 + ‖x_t − z_t‖^2 ], ∀x ∈ P,

where v_t ∈ ∂δ_P(x_t). As a result, we can apply the same analysis as in the proof of Theorem 3 to obtain the same regret bound as in Theorem 4 for Algorithm 6. Note that the above inequality remains valid even if we take a projection step after the update for z_t, due to the generalized Pythagorean inequality D(x, z_t') ≤ D(x, z_t), ∀x ∈ P [6], where z_t' = arg min_{x∈P} D(x, z_t).

3.1.2 A Gradual Variation Bound for Online Non-Smooth Optimization

In the spirit of Algorithm 6, we present an algorithm for online non-smooth optimization of functions c_t(x) = ĉ_t(x) + g(x) with a regret bound by gradual

Algorithm 7 A General Prox Method for Online Non-Smooth Optimization with a Fixed Non-Smooth Component
1: Input: η > 0, ω(z)
2: Initialization: z_0 = x_0 = arg min_{z∈P} ω(z) and c_0(x) = 0
3: for t = 1, ..., T do
4: Predict x_t by x_t = arg min_x { x^T ∇ĉ_{t−1}(x_{t−1}) + (L/η) D(x, z_{t−1}) + g(x) }
5: Receive cost function c_t and incur loss c_t(x_t)
6: Update z_t by z_t = ∇ω*(∇ω(x_t) − (η/L)(∇ĉ_t(x_t) − ∇ĉ_{t−1}(x_{t−1}))) and z_t = arg min_{x∈P} D(x, z_t)
7: end for

variation EGV_T = ∑_{t=0}^{T−1} ‖∇ĉ_{t+1}(x_t) − ∇ĉ_t(x_t)‖_*^2. The trick is to solve the composite gradient mapping

x_t = arg min_x x^T ∇ĉ_{t−1}(x_{t−1}) + (L/η) D(x, z_{t−1}) + g(x),

and to update z_t by

∇ω(z_t) = ∇ω(x_t) − (η/L)(∇ĉ_t(x_t) − ∇ĉ_{t−1}(x_{t−1})).

Algorithm 7 shows the detailed steps, and Corollary 4 states the regret bound, which can be proved similarly.

Corollary 4 Let c_t = ĉ_t + g, t = 1, ..., T, be a sequence of convex functions, where the ĉ_t are L-smooth w.r.t. ‖·‖ and g is a non-smooth function; let ω(z) be an α-strongly convex function w.r.t. ‖·‖, and let EGV_T be defined as in (19). By setting η = (1/2) min{α/2, LR/√(EGV_T)}, we have the following regret bound:

∑_{t=1}^T c_t(x_t) − min_{x∈P} ∑_{t=1}^T c_t(x) ≤ 2R max(2LR/α, √(EGV_T)).

3.2 Online Non-Smooth Optimization with an Explicit Max Structure

In the previous subsection, we assumed that the composite gradient mapping with the non-smooth component can be solved efficiently. Here, we replace this assumption with an explicit max structure of the non-smooth component. In what follows, we present a primal-dual prox method for such non-smooth cost functions and prove its regret bound. We consider a general setting, where

the non-smooth functions c_t(x) have the following structure:

c_t(x) = ĉ_t(x) + max_{u∈Q} ⟨A_t x, u⟩ − φ_t(u),   (27)

where ĉ_t(x) and φ_t(u) are L_1-smooth and L_2-smooth functions, respectively, and A_t ∈ R^{m×d} is a matrix used to characterize, together with φ_t(u), the non-smooth component of c_t(x) by maximization. Similarly, we define a dual cost function φ̄_t(u) as

φ̄_t(u) = −φ_t(u) + min_{x∈P} [ ⟨A_t x, u⟩ + ĉ_t(x) ].   (28)

We refer to x as the primal variable and to u as the dual variable. To motivate this setup, let us consider online classification with the hinge loss ℓ_t(w) = max(0, 1 − y_t w^T x_t), where we slightly abuse the notation (x_t, y_t) to denote the attribute-label pair received at trial t. It is straightforward to see that ℓ_t(w) is a non-smooth function and can be cast into the form in (27) by

ℓ_t(w) = max_{α∈[0,1]} α(1 − y_t w^T x_t) = max_{α∈[0,1]} −α y_t x_t^T w + α.

To present the algorithm and analyze its regret bound, we introduce some notation. Let F_t(x, u) = ĉ_t(x) + ⟨A_t x, u⟩ − φ_t(u); let ω_1(x) be an α_1-strongly convex function defined on the primal variable x w.r.t. a norm ‖·‖_p, and ω_2(u) an α_2-strongly convex function defined on the dual variable u w.r.t. a norm ‖·‖_q. Correspondingly, let D_1(x, z) and D_2(u, v) denote the induced Bregman distances, respectively. We assume that the domains P, Q are bounded and that the matrices A_t have a bounded norm, i.e.,

max_{x∈P} ‖x‖_p ≤ R_1, max_{u∈Q} ‖u‖_q ≤ R_2,
max_{x∈P} ω_1(x) − min_{x∈P} ω_1(x) ≤ M_1^2, max_{u∈Q} ω_2(u) − min_{u∈Q} ω_2(u) ≤ M_2^2,
‖A_t‖_{p,q} = max_{‖x‖_p ≤ 1, ‖u‖_q ≤ 1} ⟨A_t x, u⟩ ≤ σ.   (29)

Let ‖·‖_{p,*} and ‖·‖_{q,*} denote the dual norms of ‖·‖_p and ‖·‖_q, respectively. To prove a variational regret bound, we define a gradual variation as follows:

EGV_{T,p,q} = ∑_{t=0}^{T−1} [ ‖∇ĉ_{t+1}(x_t) − ∇ĉ_t(x_t)‖_{p,*}^2 + (R_1^2 + R_2^2)‖A_{t+1} − A_t‖_{p,q}^2 + ‖∇φ_{t+1}(u_t) − ∇φ_t(u_t)‖_{q,*}^2 ].   (30)

Given the above notation, Algorithm 8 shows the detailed steps and Theorem 5 states a min-max bound.
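The max-structure representation of the hinge loss above is easy to verify numerically; the following sketch checks max_{α∈[0,1]} α(1 − y w^T x) against the closed-form hinge loss over a grid of α values (the grid evaluation is purely illustrative):

```python
import numpy as np

def hinge(w, x, y):
    """Closed-form hinge loss max(0, 1 - y * w.x)."""
    return max(0.0, 1.0 - y * np.dot(w, x))

def hinge_via_max(w, x, y, grid=1001):
    """Evaluate max_{alpha in [0,1]} alpha * (1 - y * w.x) on a grid,
    i.e., the explicit-max representation (27) of the hinge loss."""
    alphas = np.linspace(0.0, 1.0, grid)
    return float(np.max(alphas * (1.0 - y * np.dot(w, x))))
```

Since the objective is linear in α, the maximum is attained at α = 1 when the margin term 1 − y w^T x is positive and at α = 0 otherwise, recovering the hinge loss exactly.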

Algorithm 8 General Prox Method for Online Non-Smooth Optimization with an Explicit Max Structure
1: Input: η > 0, ω_1(z), ω_2(v)
2: Initialization: z_0 = x_0 = arg min_{z∈P} ω_1(z), v_0 = u_0 = arg min_{v∈Q} ω_2(v) and ĉ_0(x) = φ_0(u) = 0
3: for t = 1, ..., T do
4: Update u_t by u_t = arg max_{u∈Q} { u^T A_{t−1} x_{t−1} − φ_{t−1}(u) − (L/η) D_2(u, v_{t−1}) }
5: Predict x_t by x_t = arg min_{x∈P} { x^T (∇ĉ_{t−1}(x_{t−1}) + A_{t−1}^T u_t) + (L/η) D_1(x, z_{t−1}) }
6: Receive cost function c_t and incur loss c_t(x_t)
7: Update v_t by v_t = arg max_{u∈Q} { u^T A_t x_t − φ_t(u) − (L/η) D_2(u, v_{t−1}) }
8: Update z_t by z_t = arg min_{x∈P} { x^T (∇ĉ_t(x_t) + A_t^T u_t) + (L/η) D_1(x, z_{t−1}) }
9: end for

Theorem 5 Let c_t(x) = ĉ_t(x) + max_{u∈Q} ⟨A_t x, u⟩ − φ_t(u), t = 1, ..., T, be a sequence of non-smooth functions. Assume that ĉ_t(x), φ_t(u) are L-smooth functions with L = max(L_1, L_2), and that the domains P, Q and the matrices A_t satisfy the boundedness conditions in (29). Let ω_1(x) be an α_1-strongly convex function w.r.t. the norm ‖·‖_p, let ω_2(u) be an α_2-strongly convex function w.r.t. the norm ‖·‖_q, and let α = min(α_1, α_2). By setting

η = min{ √(M_1^2 + M_2^2) / √(EGV_{T,p,q}), α/(4(σ + L)) }

in Algorithm 8, we have

max_{u∈Q} ∑_{t=1}^T F_t(x_t, u) − min_{x∈P} ∑_{t=1}^T F_t(x, u_t) ≤ 4√(M_1^2 + M_2^2) max( √(M_1^2 + M_2^2)(σ + L)/α, √(EGV_{T,p,q}) ).

To facilitate understanding, we break the proof into several lemmas. The following lemma is analogous to Lemma 2.

Lemma 3 Let θ = (x; u) denote the concatenated vector with norm ‖θ‖ = √(‖x‖_p^2 + ‖u‖_q^2) and dual norm ‖θ‖_* = √(‖x‖_{p,*}^2 + ‖u‖_{q,*}^2). Let ω(θ) = ω_1(x) + ω_2(u) and D(θ, ζ) =

D_1(x, z) + D_2(u, v). Then

η[ ∇_x F_t(x_t, u_t)^T (x_t − x) − ∇_u F_t(x_t, u_t)^T (u_t − u) ] ≤ D(θ, ζ_{t−1}) − D(θ, ζ_t) + η^2 ‖∇_x F_t(x_t, u_t) − ∇_x F_{t−1}(x_{t−1}, u_{t−1})‖_{p,*}^2 + η^2 ‖∇_u F_t(x_t, u_t) − ∇_u F_{t−1}(x_{t−1}, u_{t−1})‖_{q,*}^2 − (α/2)[ ‖x_t − z_{t−1}‖_p^2 + ‖u_t − v_{t−1}‖_q^2 + ‖x_t − z_t‖_p^2 + ‖u_t − v_t‖_q^2 ].

Proof The updates of x_t, u_t in Algorithm 8 can be seen as applying the updates in Lemma 2 with θ_t = (x_t; u_t) in place of x, ζ_{t−1} = (z_{t−1}; v_{t−1}) in place of z, and ζ_t = (z_t; v_t) in place of z_+. Note that ω(θ) is an α = min(α_1, α_2)-strongly convex function with respect to the norm ‖θ‖. Applying the results in Lemma 2 then completes the proof.

Applying the convexity of F_t(x, u) in terms of x and the concavity of F_t(x, u) in terms of u to the result in Lemma 3, we have

η(F_t(x_t, u) − F_t(x, u_t)) ≤ D_1(x, z_{t−1}) − D_1(x, z_t) + D_2(u, v_{t−1}) − D_2(u, v_t)
+ η^2 ‖∇ĉ_t(x_t) − ∇ĉ_{t−1}(x_{t−1}) + A_t^T u_t − A_{t−1}^T u_{t−1}‖_{p,*}^2
+ η^2 ‖∇φ_t(u_t) − ∇φ_{t−1}(u_{t−1}) + A_t x_t − A_{t−1} x_{t−1}‖_{q,*}^2
− (α/2)[ ‖x_t − z_{t−1}‖_p^2 + ‖x_t − z_t‖_p^2 ] − (α/2)[ ‖u_t − v_{t−1}‖_q^2 + ‖u_t − v_t‖_q^2 ]
≤ D_1(x, z_{t−1}) − D_1(x, z_t) + D_2(u, v_{t−1}) − D_2(u, v_t)   (31)
+ 2η^2 ‖∇ĉ_t(x_t) − ∇ĉ_{t−1}(x_{t−1})‖_{p,*}^2 + 2η^2 ‖A_t^T u_t − A_{t−1}^T u_{t−1}‖_{p,*}^2
+ 2η^2 ‖∇φ_t(u_t) − ∇φ_{t−1}(u_{t−1})‖_{q,*}^2 + 2η^2 ‖A_t x_t − A_{t−1} x_{t−1}‖_{q,*}^2
− (α/2)[ ‖x_t − z_{t−1}‖_p^2 + ‖x_t − z_t‖_p^2 ] − (α/2)[ ‖u_t − v_{t−1}‖_q^2 + ‖u_t − v_t‖_q^2 ].

The following lemma provides the tools for proceeding with the bound.

Lemma 4

‖∇ĉ_t(x_t) − ∇ĉ_{t−1}(x_{t−1})‖_{p,*}^2 ≤ 2‖∇ĉ_t(x_{t−1}) − ∇ĉ_{t−1}(x_{t−1})‖_{p,*}^2 + 2L^2 ‖x_t − x_{t−1}‖_p^2
‖∇φ_t(u_t) − ∇φ_{t−1}(u_{t−1})‖_{q,*}^2 ≤ 2‖∇φ_t(u_{t−1}) − ∇φ_{t−1}(u_{t−1})‖_{q,*}^2 + 2L^2 ‖u_t − u_{t−1}‖_q^2
‖A_t x_t − A_{t−1} x_{t−1}‖_{q,*}^2 ≤ 2R_1^2 ‖A_t − A_{t−1}‖_{p,q}^2 + 2σ^2 ‖x_t − x_{t−1}‖_p^2
‖A_t^T u_t − A_{t−1}^T u_{t−1}‖_{p,*}^2 ≤ 2R_2^2 ‖A_t − A_{t−1}‖_{p,q}^2 + 2σ^2 ‖u_t − u_{t−1}‖_q^2

Proof We prove the first and the third inequalities; the other two can be proved similarly. For the first, by the triangle inequality and the smoothness of ĉ_t(x),

‖∇ĉ_t(x_t) − ∇ĉ_{t−1}(x_{t−1})‖_{p,*} ≤ ‖∇ĉ_t(x_t) − ∇ĉ_t(x_{t−1})‖_{p,*} + ‖∇ĉ_t(x_{t−1}) − ∇ĉ_{t−1}(x_{t−1})‖_{p,*} ≤ L‖x_t − x_{t−1}‖_p + ‖∇ĉ_t(x_{t−1}) − ∇ĉ_{t−1}(x_{t−1})‖_{p,*},

and the claim follows from (a + b)^2 ≤ 2a^2 + 2b^2. For the third,

‖A_t x_t − A_{t−1} x_{t−1}‖_{q,*} ≤ ‖(A_t − A_{t−1}) x_{t−1}‖_{q,*} + ‖A_t (x_t − x_{t−1})‖_{q,*} ≤ R_1 ‖A_t − A_{t−1}‖_{p,q} + σ ‖x_t − x_{t−1}‖_p.

Lemma 5 For any x ∈ P and u ∈ Q, we have

η(F_t(x_t, u) − F_t(x, u_t)) ≤ D_1(x, z_{t−1}) − D_1(x, z_t) + D_2(u, v_{t−1}) − D_2(u, v_t)
+ 4η^2 [ ‖∇ĉ_t(x_{t−1}) − ∇ĉ_{t−1}(x_{t−1})‖_{p,*}^2 + ‖∇φ_t(u_{t−1}) − ∇φ_{t−1}(u_{t−1})‖_{q,*}^2 + (R_1^2 + R_2^2)‖A_t − A_{t−1}‖_{p,q}^2 ]
+ 4η^2 (σ^2 + L^2) ‖x_t − x_{t−1}‖_p^2 − (α/2)[ ‖x_t − z_{t−1}‖_p^2 + ‖x_t − z_t‖_p^2 ]
+ 4η^2 (σ^2 + L^2) ‖u_t − u_{t−1}‖_q^2 − (α/2)[ ‖u_t − v_{t−1}‖_q^2 + ‖u_t − v_t‖_q^2 ].

Proof The lemma is proved by plugging the inequalities of Lemma 4 into the inequality in (31) and collecting terms.


More information

Lecture 24 November 27

Lecture 24 November 27 EE 381V: Large Scale Optimization Fall 01 Lecture 4 November 7 Lecturer: Caramanis & Sanghavi Scribe: Jahshan Bhatti and Ken Pesyna 4.1 Mirror Descent Earlier, we motivated mirror descent as a way to improve

More information

Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations

Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations JMLR: Workshop and Conference Proceedings vol 35:1 20, 2014 Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations H. Brendan McMahan MCMAHAN@GOOGLE.COM Google,

More information

A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds

A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds Proceedings of Machine Learning Research 76: 40, 207 Algorithmic Learning Theory 207 A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds Pooria

More information

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm CS61: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm Tim Roughgarden February 9, 016 1 Online Algorithms This lecture begins the third module of the

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Convergence rate of SGD

Convergence rate of SGD Convergence rate of SGD heorem: (see Nemirovski et al 09 from readings) Let f be a strongly convex stochastic function Assume gradient of f is Lipschitz continuous and bounded hen, for step sizes: he expected

More information

Optimization, Learning, and Games with Predictable Sequences

Optimization, Learning, and Games with Predictable Sequences Optimization, Learning, and Games with Predictable Sequences Alexander Rakhlin University of Pennsylvania Karthik Sridharan University of Pennsylvania Abstract We provide several applications of Optimistic

More information

Better Algorithms for Benign Bandits

Better Algorithms for Benign Bandits Better Algorithms for Benign Bandits Elad Hazan IBM Almaden 650 Harry Rd, San Jose, CA 95120 ehazan@cs.princeton.edu Satyen Kale Microsoft Research One Microsoft Way, Redmond, WA 98052 satyen.kale@microsoft.com

More information

Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora

Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: (Today s notes below are

More information

Supplementary Material of Dynamic Regret of Strongly Adaptive Methods

Supplementary Material of Dynamic Regret of Strongly Adaptive Methods Supplementary Material of A Proof of Lemma 1 Lijun Zhang 1 ianbao Yang Rong Jin 3 Zhi-Hua Zhou 1 We first prove the first part of Lemma 1 Let k = log K t hen, integer t can be represented in the base-k

More information

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,

More information

Learnability, Stability, Regularization and Strong Convexity

Learnability, Stability, Regularization and Strong Convexity Learnability, Stability, Regularization and Strong Convexity Nati Srebro Shai Shalev-Shwartz HUJI Ohad Shamir Weizmann Karthik Sridharan Cornell Ambuj Tewari Michigan Toyota Technological Institute Chicago

More information

Online Submodular Minimization

Online Submodular Minimization Online Submodular Minimization Elad Hazan IBM Almaden Research Center 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Yahoo! Research 4301 Great America Parkway, Santa Clara, CA 95054 skale@yahoo-inc.com

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

Primal-dual subgradient methods for convex problems

Primal-dual subgradient methods for convex problems Primal-dual subgradient methods for convex problems Yu. Nesterov March 2002, September 2005 (after revision) Abstract In this paper we present a new approach for constructing subgradient schemes for different

More information

Efficient Methods for Stochastic Composite Optimization

Efficient Methods for Stochastic Composite Optimization Efficient Methods for Stochastic Composite Optimization Guanghui Lan School of Industrial and Systems Engineering Georgia Institute of Technology, Atlanta, GA 3033-005 Email: glan@isye.gatech.edu June

More information

The Interplay Between Stability and Regret in Online Learning

The Interplay Between Stability and Regret in Online Learning The Interplay Between Stability and Regret in Online Learning Ankan Saha Department of Computer Science University of Chicago ankans@cs.uchicago.edu Prateek Jain Microsoft Research India prajain@microsoft.com

More information

Optimal Regularized Dual Averaging Methods for Stochastic Optimization

Optimal Regularized Dual Averaging Methods for Stochastic Optimization Optimal Regularized Dual Averaging Methods for Stochastic Optimization Xi Chen Machine Learning Department Carnegie Mellon University xichen@cs.cmu.edu Qihang Lin Javier Peña Tepper School of Business

More information

Accelerating Online Convex Optimization via Adaptive Prediction

Accelerating Online Convex Optimization via Adaptive Prediction Mehryar Mohri Courant Institute and Google Research New York, NY 10012 mohri@cims.nyu.edu Scott Yang Courant Institute New York, NY 10012 yangs@cims.nyu.edu Abstract We present a powerful general framework

More information

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Online Learning Repeated game: Aim: minimize ˆL n = Decision method plays a t World reveals l t L n l t (a

More information

Dynamic Regret of Strongly Adaptive Methods

Dynamic Regret of Strongly Adaptive Methods Lijun Zhang 1 ianbao Yang 2 Rong Jin 3 Zhi-Hua Zhou 1 Abstract o cope with changing environments, recent developments in online learning have introduced the concepts of adaptive regret and dynamic regret

More information

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/ɛ)

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/ɛ) 1 28 Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/) Yi Xu yi-xu@uiowa.edu Yan Yan yan.yan-3@student.uts.edu.au Qihang Lin qihang-lin@uiowa.edu Tianbao Yang tianbao-yang@uiowa.edu

More information

Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms

Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms Pooria Joulani 1 András György 2 Csaba Szepesvári 1 1 Department of Computing Science, University of Alberta,

More information

Better Algorithms for Benign Bandits

Better Algorithms for Benign Bandits Better Algorithms for Benign Bandits Elad Hazan IBM Almaden 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Microsoft Research One Microsoft Way, Redmond, WA 98052 satyen.kale@microsoft.com

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

Stochastic and Adversarial Online Learning without Hyperparameters

Stochastic and Adversarial Online Learning without Hyperparameters Stochastic and Adversarial Online Learning without Hyperparameters Ashok Cutkosky Department of Computer Science Stanford University ashokc@cs.stanford.edu Kwabena Boahen Department of Bioengineering Stanford

More information

Exponentiated Gradient Descent

Exponentiated Gradient Descent CSE599s, Spring 01, Online Learning Lecture 10-04/6/01 Lecturer: Ofer Dekel Exponentiated Gradient Descent Scribe: Albert Yu 1 Introduction In this lecture we review norms, dual norms, strong convexity,

More information

Online Learning and Sequential Decision Making

Online Learning and Sequential Decision Making Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning

More information

Online Prediction: Bayes versus Experts

Online Prediction: Bayes versus Experts Marcus Hutter - 1 - Online Prediction Bayes versus Experts Online Prediction: Bayes versus Experts Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza Artificiale IDSIA, Galleria 2, CH-6928 Manno-Lugano,

More information

Convex hull of two quadratic or a conic quadratic and a quadratic inequality

Convex hull of two quadratic or a conic quadratic and a quadratic inequality Noname manuscript No. (will be inserted by the editor) Convex hull of two quadratic or a conic quadratic and a quadratic inequality Sina Modaresi Juan Pablo Vielma the date of receipt and acceptance should

More information

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16 600.463 Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16 25.1 Introduction Today we re going to talk about machine learning, but from an

More information

Gambling in a rigged casino: The adversarial multi-armed bandit problem

Gambling in a rigged casino: The adversarial multi-armed bandit problem Gambling in a rigged casino: The adversarial multi-armed bandit problem Peter Auer Institute for Theoretical Computer Science University of Technology Graz A-8010 Graz (Austria) pauer@igi.tu-graz.ac.at

More information

Learning, Games, and Networks

Learning, Games, and Networks Learning, Games, and Networks Abhishek Sinha Laboratory for Information and Decision Systems MIT ML Talk Series @CNRG December 12, 2016 1 / 44 Outline 1 Prediction With Experts Advice 2 Application to

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization

Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization JMLR: Workshop and Conference Proceedings vol 40:1 16, 2015 28th Annual Conference on Learning Theory Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization Mehrdad

More information

15-859E: Advanced Algorithms CMU, Spring 2015 Lecture #16: Gradient Descent February 18, 2015

15-859E: Advanced Algorithms CMU, Spring 2015 Lecture #16: Gradient Descent February 18, 2015 5-859E: Advanced Algorithms CMU, Spring 205 Lecture #6: Gradient Descent February 8, 205 Lecturer: Anupam Gupta Scribe: Guru Guruganesh In this lecture, we will study the gradient descent algorithm and

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Online Convex Optimization MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Online projected sub-gradient descent. Exponentiated Gradient (EG). Mirror descent.

More information

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,

More information

Online Passive-Aggressive Algorithms

Online Passive-Aggressive Algorithms Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 35, No., May 010, pp. 84 305 issn 0364-765X eissn 156-5471 10 350 084 informs doi 10.187/moor.1090.0440 010 INFORMS On the Power of Robust Solutions in Two-Stage

More information

WE consider an undirected, connected network of n

WE consider an undirected, connected network of n On Nonconvex Decentralized Gradient Descent Jinshan Zeng and Wotao Yin Abstract Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been

More information

Logarithmic Regret Algorithms for Online Convex Optimization

Logarithmic Regret Algorithms for Online Convex Optimization Logarithmic Regret Algorithms for Online Convex Optimization Elad Hazan Amit Agarwal Satyen Kale November 20, 2008 Abstract In an online convex optimization problem a decision-maker makes a sequence of

More information

Online Optimization : Competing with Dynamic Comparators

Online Optimization : Competing with Dynamic Comparators Ali Jadbabaie Alexander Rakhlin Shahin Shahrampour Karthik Sridharan University of Pennsylvania University of Pennsylvania University of Pennsylvania Cornell University Abstract Recent literature on online

More information

9 Classification. 9.1 Linear Classifiers

9 Classification. 9.1 Linear Classifiers 9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive

More information

Stochastic Semi-Proximal Mirror-Prox

Stochastic Semi-Proximal Mirror-Prox Stochastic Semi-Proximal Mirror-Prox Niao He Georgia Institute of echnology nhe6@gatech.edu Zaid Harchaoui NYU, Inria firstname.lastname@nyu.edu Abstract We present a direct extension of the Semi-Proximal

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

Lecture 4: Lower Bounds (ending); Thompson Sampling

Lecture 4: Lower Bounds (ending); Thompson Sampling CMSC 858G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling Instructor: Alex Slivkins Scribed by: Guowei Sun,Cheng Jie 1 Lower bounds on regret (ending) Recap from

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information