Online Learning with Non-Convex Losses and Non-Stationary Regret
Xiang Gao, Xiaobo Li, Shuzhong Zhang
University of Minnesota

Abstract

In this paper, we consider online learning with non-convex loss functions. Similar to Besbes et al. [2015], we apply non-stationary regret as the performance metric. In particular, we study the regret bounds under different assumptions on the information available regarding the loss functions. When the gradient of the loss function at the decision point is available, we propose an online normalized gradient descent algorithm (ONGD) to solve the online learning problem. In another situation, when only the value of the loss function is available, we propose a bandit online normalized gradient descent algorithm (BONGD). Under a condition to be called weak pseudo-convexity (WPC), we show that both algorithms achieve a cumulative regret bound of $O(\sqrt{T(1+V_T)})$, where $V_T$ is the total temporal variation of the loss functions, thus establishing a sublinear regret bound for online learning with non-convex loss functions and a non-stationary regret measure.

1 Introduction

Online convex optimization (OCO) has been studied extensively in the literature. In OCO, at each period $t \in \{1, 2, \ldots, T\}$, an online player chooses a feasible strategy $x_t$ from a decision set $X \subseteq \mathbb{R}^n$, and suffers a loss given by $f_t(x_t)$, where $f_t$ is a convex loss function. One key feature of OCO is that the player must make the decision for period $t$ without knowing the loss function $f_t$. The performance of an OCO algorithm is usually measured by the stationary regret, which compares the accumulated loss suffered by the player with the loss suffered by the best fixed strategy. Specifically, the stationary regret is defined

Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain. PMLR: Volume 84. Copyright 2018 by the authors.
as
$$\mathrm{Regret}_S(\{x_t\}) := \sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x^*),$$
where $x^*$ is one best fixed decision in hindsight, i.e., $x^* \in \arg\min_{x\in X}\sum_{t=1}^T f_t(x)$. Several sublinear cumulative regret bounds measured by the stationary regret have been established in the literature. For example, Zinkevich [2003] proposed an online gradient descent algorithm which achieves a regret bound of order $O(\sqrt{T})$ for convex loss functions. The order of the regret can be further improved to $O(\log T)$ if the loss functions are strongly convex (see Hazan et al. [2007]). Moreover, the bounds are shown to be tight for OCO with convex / strongly convex loss functions respectively in Abernethy et al. [2009]. One important extension of OCO is the so-called bandit online convex optimization, where the online player is only supposed to know the function value $f_t(x_t)$ at $x_t$, instead of the entire function $f_t$. In particular, when the player can only observe the function value at a single point, Flaxman et al. [2005] established an $O(T^{3/4})$ regret bound for general convex loss functions by constructing a zeroth-order approximation of the gradient. Assuming that the loss functions are smooth, the regret bound can be improved to $O(T^{2/3})$ by incorporating a self-concordant regularizer (see Saha and Tewari [2011]). Alternatively, if multiple points can be queried at the same time, Agarwal et al. [2010] showed that the regret can be further improved to $O(\sqrt{T})$ and $O(\log T)$ for convex / strongly convex loss functions respectively. As suggested by its name, the loss functions in OCO are assumed to be convex. Only a handful of papers have studied online learning with non-convex loss functions. In the existing works, mostly heuristic algorithms were proposed (e.g. Gasso et al. [2011], Ertekin et al. [2011]) without focusing on establishing sublinear regret bounds. There are a few noticeable exceptions though.
Hazan and Kale [2012] developed an algorithm that achieves $O(\sqrt{T})$ and $O(T^{2/3})$ regret bounds for the full-information and bandit settings respectively, by assuming the loss functions to be submodular. Zhang et al. [2015] showed that an $O(T^{2/3})$ regret bound still holds if the loss functions are in
the form of a composition between a non-increasing scalar function and a linear function. The stationary regret requires the benchmark strategy to remain unchanged throughout the periods. This assumption may not be relevant in some applications. Recently, a new performance metric known as the non-stationary regret was proposed by Besbes et al. [2015]. The non-stationary regret compares the cumulative losses of the online player with the losses of the best possible responses:
$$\mathrm{Regret}_{NS}(\{x_t\}) := \sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x_t^*),$$
where $x_t^* \in \arg\min_{x\in X} f_t(x)$. Clearly, the non-stationary regret is never less than the stationary regret. Besbes et al. [2015] prove that if there is no restriction on the changes of the loss functions, then the non-stationary regret is linear in $T$ regardless of the strategies. To obtain meaningful bounds, the authors assumed that the temporal change of the sequence of functions $\{f_t\}$ is bounded. Specifically, the loss functions are assumed to be taken from the set
$$\Big\{\{f_1, f_2, \ldots, f_T\} : \sum_{t=1}^{T-1}\|f_t - f_{t+1}\| \le V_T\Big\}, \quad (3)$$
where $\|f_t - f_{t+1}\| := \sup_{x\in X}|f_t(x) - f_{t+1}(x)|$. For nonzero temporal change $V_T$, Besbes et al. [2015] then propose algorithms with sublinear non-stationary regret bounds: $O(V_T^{1/3}T^{2/3})$, $O(V_T^{1/2}T^{1/2})$, and $O(V_T^{1/3}T^{2/3})$ respectively, for the cases where the loss functions are: convex with noisy gradients, strongly convex with noisy gradients, and strongly convex with noisy function values. Note that $V_T > 0$ is assumed in the bounds. More recently, Yang et al. [2016] also studied non-stationary regret bounds for OCO. They proposed an uncertainty set $S_p$ of sequences of functions in which the worst-case variation of the optimal solutions $x_t^*$ of $f_t$ (referred to as the path variation) is bounded:
$$S_p := \Big\{\{f_1, f_2, \ldots, f_T\} : \max_{x_t^* \in \arg\min_{x\in X} f_t(x)}\ \sum_{t=1}^{T-1}\|x_t^* - x_{t+1}^*\| \le V_T\Big\}.$$
They then proved upper and lower bounds for several different feedback structures and function classes within the convex class.
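The gap between the two regret notions can be seen on a small numerical example. The following sketch (a toy illustration of ours, not from the paper; the drifting quadratic losses are our own choice) computes both regrets for a player that always plays $0$ against losses $f_t(x) = (x - c_t)^2$ with a drifting minimizer $c_t$:

```python
# Toy illustration: stationary vs. non-stationary regret.
# Losses f_t(x) = (x - c_t)^2 with drifting minimizers c_t (our own example).

def f(x, c):
    return (x - c) ** 2

T = 100
cs = [t / T for t in range(1, T + 1)]    # the minimizer drifts from ~0 to 1
plays = [0.0] * T                        # the player always plays x_t = 0

cum_loss = sum(f(x, c) for x, c in zip(plays, cs))

# Stationary regret: compare against the best *fixed* point in hindsight
# (approximated here by a grid search over [0, 1]).
best_fixed = min(sum(f(x, c) for c in cs) for x in [i / 100 for i in range(101)])
regret_stationary = cum_loss - best_fixed

# Non-stationary regret: compare against the per-round minimizers x_t* = c_t.
regret_nonstationary = cum_loss - sum(f(c, c) for c in cs)   # f_t(x_t*) = 0

print(regret_stationary, regret_nonstationary)
```

Since the per-round benchmark is at least as good as any fixed point, the non-stationary regret always dominates the stationary one, as the printed values confirm.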
In particular, they showed that some existing algorithms, with some modification, can achieve $O(V_T)$, $O(\sqrt{V_T T})$ and $O(\sqrt{V_T T})$ regret bounds for true gradient feedback with a smoothness condition, noisy gradient feedback, and two-point bandit feedback, respectively. Note that $V_T > 0$ is assumed in the bounds. In this paper, we consider online non-convex optimization with non-stationary regret as the performance metric. To the best of our knowledge, such a combination had not been studied before. For each period $t$, even after the decision $x_t$ is made, the online player is not assumed to know the function $f_t$; instead, only some partial information regarding the loss at $x_t$ is revealed. Specifically, only $\nabla f_t(x_t)$ (in the first-order setting) or $f_t(x_t)$ (in the zeroth-order setting) is available to the player. Similar to Yang et al. [2016], we define the uncertainty set $S_T$ of sequences of functions as follows:
$$S_T := \Big\{\{f_1, f_2, \ldots, f_T\} : \exists\, x_t^* \in \arg\min_{x\in X} f_t(x),\ t = 1,\ldots,T,\ \text{s.t.}\ \sum_{t=1}^{T-1}\|x_t^* - x_{t+1}^*\| \le V_T\Big\}.$$
Note that $S_p \subseteq S_T$ for the same $V_T$. In particular, consider a static sequence $f_t \equiv f$, $t = 1, \ldots, T$, where $f$ has multiple optimal solutions. This sequence clearly belongs to $S_T$ even when $V_T = 0$. However, $V_T$ would have to be linear in $T$ in order for this sequence to be in $S_p$. We propose the Online Normalized Gradient Descent (ONGD) and the novel Bandit Online Normalized Gradient Descent (BONGD) algorithms for the first-order setting and the zeroth-order setting respectively. For loss functions satisfying (4) and a condition to be introduced later, we show that these two algorithms both achieve an $O(\sqrt{T(1+V_T)})$ regret bound. Compared to the regret bounds in Yang et al. [2016], our regret bound for the first-order setting is worse, but it is the same for the zeroth-order setting. Note, however, that our loss functions are non-convex and we use a weaker version of the variational constraint. Regarding non-convex objective functions, a related work in the literature is Hazan et al.
[2015], where the authors propose a Normalized Gradient Descent (NGD) method for solving an optimization model with a so-called strictly locally quasi-convex (SLQC) function as the objective. They further show that NGD converges to an $\epsilon$-optimal minimum within $O(1/\epsilon^2)$ iterations. This paper generalizes the results in Hazan et al. [2015] in the following aspects: (i) Hazan et al. [2015] consider an optimization model, while this paper considers an online learning model. (ii) Hazan et al. [2015] assume the objective function to be strictly locally quasi-convex (SLQC). In this paper we introduce the notion of weak pseudo-convexity (WPC), which will be shown to be a weaker condition than SLQC, and we show that the regret bounds hold if the objective functions are weakly pseudo-convex. (iii) Hazan et al. [2015] consider only the first-order setting, while our proposed BONGD algorithm works for the bandit (zeroth-order) setting as well. The rest of the paper is organized as follows. Section 2 presents some preparations, including the assumptions and notation. In Section 3, we present the ONGD algorithm and prove its regret bound. In Section 4, we present the
BONGD algorithm for the zeroth-order setting and show its regret bound under some additional assumptions. Finally, we conclude the paper in Section 5.

2 Problem Setup

In this section, we present the assumptions underlying our online learning model and introduce some notation that will be used in the paper. Let $X \subseteq \mathbb{R}^n$ be a convex decision set that is known to the player. For every period $t \in \{1, 2, \ldots, T\}$, the loss function is $f_t$. Throughout the paper, we assume that $X$ is bounded, i.e., there exists $R > 0$ such that $\|x\| \le R$ for all $x \in X$. We present the following definitions regarding the loss functions.

Definition 1 (Bounded Gradient). A function $f$ is said to have bounded gradient if there exists a finite positive value $M$ such that $\|\nabla f(x)\| \le M$ for all $x \in X$.

Note that if $f$ has bounded gradient, then it is also Lipschitz continuous with Lipschitz constant $M$ on the set $X$.

Definition 2 (Weak Pseudo-Convexity). A function $f$ is said to be weakly pseudo-convex (WPC) if there exists $K > 0$ such that
$$f(x) - f(x^*) \le K\,\Big\langle \frac{\nabla f(x)}{\|\nabla f(x)\|},\ x - x^*\Big\rangle$$
holds for all $x \in X$, with the convention that $\nabla f(x)/\|\nabla f(x)\| = 0$ if $\nabla f(x) = 0$, where $x^*$ is one optimal solution, i.e., $x^* \in \arg\min_{x\in X} f(x)$.

Here we discuss some implications of weak pseudo-convexity. If a differentiable function $f$ is Lipschitz continuous with constant $M$ and pseudo-convex, then we have (see a similar derivation in Nesterov [2004])
$$f(x) - f(y) \le M\,\Big\langle \frac{\nabla f(x)}{\|\nabla f(x)\|},\ x - y\Big\rangle$$
for all $y, x$ with $f(x) \ge f(y)$. Therefore, we can simply let $K = M$, and such a function is also weakly pseudo-convex. Moreover, as another example, the star-convex functions proposed by Nesterov and Polyak [2006] are weakly pseudo-convex.

Proposition 1. If $f$ is star-convex and smooth with bounded gradient in $X$, then $f$ is weakly pseudo-convex.

The proof of Proposition 1 can be found in the supplementary file. We next introduce a property that is essentially the same as the SLQC property introduced in Hazan et al. [2015].
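The WPC inequality in Definition 2 can be verified numerically on a concrete function. In the sketch below (our own verification script, not from the paper), we take the non-negative 2-homogeneous polynomial $f(x) = x_1^2 + x_2^2 + x_1x_2$ on the unit disc; since $\langle\nabla f(x), x\rangle = 2f(x)$ and $\|\nabla f(x)\| \le 3$ there, the inequality should hold with $K = M/\alpha = 3/2$:

```python
import math, random

def f(x1, x2):
    return x1 * x1 + x2 * x2 + x1 * x2       # non-negative, 2-homogeneous, min at 0

def grad(x1, x2):
    return (2 * x1 + x2, 2 * x2 + x1)

K = 1.5                                      # K = M / alpha with M = 3, alpha = 2
random.seed(0)
for _ in range(10000):
    # draw a random point in the unit disc
    r, th = math.sqrt(random.random()), random.uniform(0, 2 * math.pi)
    x1, x2 = r * math.cos(th), r * math.sin(th)
    g1, g2 = grad(x1, x2)
    gnorm = math.hypot(g1, g2)
    if gnorm == 0:
        continue
    # WPC inequality: f(x) - f(x*) <= K * <g/||g||, x - x*> with x* = 0
    lhs = f(x1, x2)
    rhs = K * (g1 * x1 + g2 * x2) / gnorm
    assert lhs <= rhs + 1e-9
print("WPC inequality verified on 10000 random points")
```

The assertion never fires, consistent with the proof sketch of Proposition 3 below ($\alpha$-homogeneity gives $f(x) = \langle\nabla f(x), x\rangle/\alpha$, so $K = M/\alpha$ works).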
Definition 3 (Acute Angle). The gradient of $f$ is said to satisfy the acute angle condition if there exists a positive value $Z$ such that
$$\cos\angle\big(\nabla f(x),\, x - x^*\big) = \frac{\langle \nabla f(x),\ x - x^*\rangle}{\|\nabla f(x)\|\,\|x - x^*\|} \ge Z > 0$$
holds for all $x \in X$ with $\nabla f(x) \ne 0$, where $x^*$ is one optimal solution, i.e., $x^* \in \arg\min_{x\in X} f(x)$.

The following proposition shows that the acute angle condition, together with Lipschitz continuity, implies weak pseudo-convexity.

Proposition 2. If $f$ has bounded gradient and satisfies the acute angle condition, then $f$ is weakly pseudo-convex.

The proof of Proposition 2 can be found in the supplementary file. The class of weakly pseudo-convex functions certainly goes beyond the acute angle condition. For example, below is another class of functions satisfying WPC.

Proposition 3. If $f$ has bounded gradient and satisfies $\alpha$-homogeneity with respect to its minimum, i.e., there exists $\alpha > 0$ such that
$$f\big(t(x - x^*) + x^*\big) - f(x^*) = t^{\alpha}\,\big(f(x) - f(x^*)\big)$$
for all $x \in X$ and $t \ge 0$, where $x^* \in \arg\min_{x\in X} f(x)$, then $f$ is weakly pseudo-convex.

The proof of Proposition 3 can be found in the supplementary file. Proposition 3 suggests that every non-negative homogeneous polynomial satisfies WPC with respect to $x^* = 0$. Take the polynomial plotted in Figure 1 as an example. It is easy to verify that it satisfies the condition in Proposition 3, and thus is weakly pseudo-convex. In Figure 1, the surface of the function and one of its sub-level sets are plotted. The function is not quasi-convex, since the sub-level set is non-convex. However, this function does satisfy the acute angle condition in Definition 3. Note that if $f_i(x)$ is $\alpha_i$-homogeneous with respect to a shared minimum $x^*$ for all $i \in I$ with $\alpha_i \ge \alpha > 0$, and the gradients of the $f_i$ are uniformly bounded over $X$, then $\sum_{i\in I} f_i(x)$ is WPC. As a result, we can construct functions that are WPC but do not satisfy the acute angle condition. Consider the two-dimensional function $f(x) = x_1^2 + |x_2|^{3/2}$, and suppose that $X$ is the unit disc centered at the origin. Clearly, $f$ is differentiable and Lipschitz continuous on $X$.
Also, it is the sum of a 2-homogeneous function and a 3/2-homogeneous function with the shared minimum $(0,0)$. Thus $f$ is WPC. We compute
$$\cos\angle\big(\nabla f(x),\ x - x^*\big) = \frac{\langle\nabla f(x),\ x\rangle}{\|\nabla f(x)\|\,\|x\|} = \frac{2x_1^2 + \tfrac{3}{2}|x_2|^{3/2}}{\sqrt{4x_1^2 + \tfrac{9}{4}|x_2|}\ \sqrt{x_1^2 + x_2^2}}.$$
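The degeneration of the acute angle for this function can be checked numerically. The sketch below (our own check; the curve $x(t) = (t, t^{4/3})$ is one path approaching the minimizer $x^* = 0$) evaluates the cosine above directly from $f$ and shows it decaying toward $0$:

```python
import math

def cos_angle(x1, x2):
    """cos of the angle between grad f and x - x*, for f(x) = x1^2 + |x2|^(3/2), x* = 0."""
    g = (2 * x1, 1.5 * math.copysign(math.sqrt(abs(x2)), x2))
    num = g[0] * x1 + g[1] * x2
    return num / (math.hypot(*g) * math.hypot(x1, x2))

# Along x(t) = (t, t^(4/3)) the cosine behaves like (7/3) t^(1/3) for small t.
for t in [1e-1, 1e-3, 1e-6, 1e-9]:
    print(t, cos_angle(t, t ** (4.0 / 3.0)))
```

The printed values shrink with $t$, so no uniform lower bound $Z > 0$ exists: the acute angle condition fails even though the function is WPC.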
Figure 1: Plot of a WPC function that is not quasi-convex (surface of the polynomial and one of its non-convex sub-level sets).

Consider the parameterized path $x(t) = (t, t^{4/3})$ with $t > 0$. On this path, we have
$$\cos\angle\big(\nabla f(x),\ x - x^*\big) = \frac{\tfrac{7}{2}\,t^2}{\sqrt{4t^2 + \tfrac{9}{4}t^{4/3}}\ \sqrt{t^2 + t^{8/3}}}.$$
Therefore, along the path, as $t$ approaches $0$, we have $\cos\angle(\nabla f(x), x - x^*) \to 0$. This example shows that a WPC function may fail to satisfy the acute angle condition.

As we mentioned before, in order to establish a sublinear non-stationary regret bound, we need to confine the loss functions $\{f_t\}$ in a unified manner. Therefore, we introduce the uncertainty set $S_T$ of the loss functions, as the set of admissible loss-function sequences whose total variation of minimizers is bounded by $V_T$.

Definition 4 (Uncertainty Set). The uncertainty set of functions $S_T$ is defined as
$$S_T := \Big\{\{f_1, f_2, \ldots, f_T\} : \exists\, x_t^* \in \arg\min_{x\in X} f_t(x),\ t = 1,\ldots,T,\ \text{s.t.}\ \sum_{t=1}^{T-1}\|x_t^* - x_{t+1}^*\| \le V_T\Big\}, \quad (4)$$
where $V_T \ge 0$.

In the zeroth-order setting to be discussed in Section 4, only the function value is available. Therefore, some randomized approaches are needed in the algorithm. To account for this situation, we introduce the expected non-stationary regret for an algorithm that outputs a random sequence $\{x_t\}$.

Definition 5 (Expected Non-Stationary Regret). The expected non-stationary regret for a randomized algorithm $A$ is defined as
$$\mathrm{ERegret}_{NS}(\{x_t\}) := \mathbb{E}\Big[\sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x_t^*)\Big], \quad (5)$$
where the expectation is taken over the filtration generated by the random sequence $\{x_t\}$ produced by $A$.

3 The First-Order Setting

In this section, we assume that for each period, the gradient information at the current point is available to the online player after the decision is made. Specifically, at each period $t$, the sequence of events is as follows:

1. The online player chooses a strategy $x_t$;
2. The regret $f_t(x_t) - f_t(x_t^*)$ is incurred, but is not necessarily known to the online player;
3. The online player receives the feedback $\nabla f_t(x_t)$.

We propose the Online Normalized Gradient Descent algorithm (ONGD) in this setting. The normalized gradient descent method was first proposed in Nesterov [2004] and can be applied to solve pseudo-convex minimization problems. The ONGD algorithm uses the first-order information $\nabla f_t(x_t)$ to compute the normalized vector $\nabla f_t(x_t)/\|\nabla f_t(x_t)\|$ as the search direction. Similar to the standard gradient method, it moves along that search direction with a specific stepsize $\eta > 0$ and then projects the point back onto the decision set $X$; see Algorithm 1 for the details. Note that $\Pi_X(y) := \arg\min_{x\in X}\|y - x\|$ is the projection operator.

Algorithm 1 Online Normalized Gradient Descent
  Input: feasible set $X$, time horizon $T$, stepsize $\eta$
  Initialization: $x_1 \in X$
  for $t = 1$ to $T$ do
    choose $x_t$ and receive the feedback $g_t = \nabla f_t(x_t)$
    if $\|g_t\| > 0$ then
      $x_{t+1} = \Pi_X\big(x_t - \eta\, g_t/\|g_t\|\big)$
    else
      $x_{t+1} = x_t$
    end if
  end for

The main result is shown in the following theorem, which claims an $O(\sqrt{T(1+V_T)})$ non-stationary regret bound for ONGD.
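Before the formal analysis, note that Algorithm 1 is short enough to implement directly. The sketch below is our own minimal implementation; the drifting quadratic losses, the ball-shaped decision set, and the stepsize value are illustrative choices, not the tuned values from the analysis:

```python
import math

def project_ball(x, R=1.0):
    """Projection onto the Euclidean ball of radius R (our choice of X)."""
    n = math.hypot(*x)
    return x if n <= R else tuple(R * xi / n for xi in x)

def ongd(grads, x1, eta, R=1.0):
    """ONGD: x_{t+1} = Pi_X(x_t - eta * g_t / ||g_t||), keeping x_t if g_t = 0."""
    xs = [x1]
    for g in grads:                       # one gradient oracle per round
        gt = g(xs[-1])
        gn = math.hypot(*gt)
        if gn > 0:
            step = tuple(xi - eta * gi / gn for xi, gi in zip(xs[-1], gt))
            xs.append(project_ball(step, R))
        else:
            xs.append(xs[-1])
    return xs[:-1]                        # the T points actually played

# Slowly drifting minimizers c_t, so the path variation V_T is small.
T = 2000
cs = [(0.5 * math.cos(2 * math.pi * t / T), 0.5 * math.sin(2 * math.pi * t / T))
      for t in range(T)]
grads = [lambda x, c=c: (2 * (x[0] - c[0]), 2 * (x[1] - c[1])) for c in cs]

eta = 1.0 / math.sqrt(T)                  # illustrative stepsize of order 1/sqrt(T)
xs = ongd(grads, (1.0, 0.0), eta)
regret = sum((x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2 for x, c in zip(xs, cs))
print(regret / T)                         # average per-round regret stays small
```

On losses $f_t(x) = \|x - c_t\|^2$ the iterates track the moving minimizers, so the average regret per round vanishes as $T$ grows, in line with the sublinear bound below.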
Theorem 1. Let $V_T$ be as defined in (4). For any sequence of loss functions $\{f_t\} \in S_T$ where each $f_t$ is weakly pseudo-convex with a common constant $K$, let the stepsize be $\eta = \sqrt{(4R^2 + 6RV_T)/T}$. Then the following regret bound holds for ONGD:
$$\mathrm{Regret}_{NS}(\{x_t\}) \le 2K\sqrt{T(R^2 + 1.5RV_T)} = O\big(\sqrt{T(1+V_T)}\big).$$

Proof: Let $x_t^*$, $t = 1, \ldots, T$, be a sequence of optimal solutions satisfying the condition in Definition 4, and let $z_t := \|x_t - x_t^*\|$. Then we have
$$z_{t+1}^2 = \|x_{t+1} - x_t^* + x_t^* - x_{t+1}^*\|^2 \le \|x_{t+1} - x_t^*\|^2 + 6R\,\|x_t^* - x_{t+1}^*\|,$$
using $\|x_{t+1} - x_t^*\| \le 2R$ and $\|x_t^* - x_{t+1}^*\| \le 2R$ to bound the cross and square terms. By the non-expansiveness of the projection,
$$\|x_{t+1} - x_t^*\|^2 = \Big\|\Pi_X\Big(x_t - \eta\,\frac{g_t}{\|g_t\|}\Big) - x_t^*\Big\|^2 \le z_t^2 + \eta^2 - 2\eta\,\Big\langle \frac{g_t}{\|g_t\|},\ x_t - x_t^*\Big\rangle.$$
By rearranging terms and multiplying $K/(2\eta)$ on both sides, we have
$$K\Big\langle \frac{g_t}{\|g_t\|},\ x_t - x_t^*\Big\rangle \le \frac{K}{2}\Big(\frac{z_t^2 - z_{t+1}^2}{\eta} + \eta + \frac{6R}{\eta}\|x_t^* - x_{t+1}^*\|\Big).$$
By Definition 2, noting that $g_t = \nabla f_t(x_t)$, we have
$$f_t(x_t) - f_t(x_t^*) \le K\Big\langle \frac{\nabla f_t(x_t)}{\|\nabla f_t(x_t)\|},\ x_t - x_t^*\Big\rangle \le \frac{K}{2}\Big(\frac{z_t^2 - z_{t+1}^2}{\eta} + \eta + \frac{6R}{\eta}\|x_t^* - x_{t+1}^*\|\Big).$$
Summing these inequalities from $t = 1, \ldots, T$, we have
$$\mathrm{Regret}_{NS}(\{x_t\}) \le \frac{K}{2}\Big(\frac{z_1^2 - z_{T+1}^2}{\eta} + \eta T + \frac{6R}{\eta}V_T\Big) \le \frac{K}{2}\Big(\frac{4R^2 + 6RV_T}{\eta} + \eta T\Big). \quad (6)$$
As a result, by noting $\eta = \sqrt{(4R^2 + 6RV_T)/T}$, we have
$$\mathrm{Regret}_{NS}(\{x_t\}) \le K\sqrt{T(4R^2 + 6RV_T)} = 2K\sqrt{T(R^2 + 1.5RV_T)} = O\big(\sqrt{T(1+V_T)}\big). \qquad \square$$

4 The Zeroth-Order Setting

In the previous section, it was assumed that the gradient information is available, which may not be the case in some applications. Such exceptions include the multi-armed bandit problem, dynamic pricing and Bayesian optimization. Therefore, in this section, we consider the setting where the online player only receives the function value $f_t(x_t)$, instead of the gradient $\nabla f_t(x_t)$, as the feedback. As mentioned above, the zeroth-order (or bandit) setting has been studied in the OCO literature. The main technique in that literature (see Flaxman et al. [2005] for example) is to construct a zeroth-order approximation of the gradient of a smoothed function. That smoothed function is often created by integrating the original loss function against a chosen probability distribution. By querying random samples of the function value according to that distribution, the player is able to create an unbiased zeroth-order approximation of the gradient of the smoothed function.

This is, however, not applicable in our online normalized gradient descent algorithm, since what we need is the direction of the gradient. Therefore, we shall first develop a new type of zeroth-order oracle that can approximate the gradient direction, without averaging multiple samples of gradients, when the norm of the gradient is not too small. To proceed, we require some additional conditions on the loss functions.

Definition 6 (Error Bound). There exist $D > 0$ and $\gamma > 0$ such that
$$\|x - x_t^*\| \le D\,\|\nabla f_t(x)\|^{\gamma}$$
for all $x \in X$ and all $t$, where $x_t^*$ is the optimal solution to $f_t$, i.e., $x_t^* \in \arg\min_{x\in X} f_t(x)$.

Since $X$ is a compact set, the error bound condition is essentially the requirement of a unique optimal solution and no non-global local minima.

Definition 7 (Lipschitz Gradient). There exists a positive number $L$ such that
$$\|\nabla f_t(x) - \nabla f_t(y)\| \le L\,\|x - y\|$$
for all $x, y \in X$ and all $t$.

We introduce some notation that will be used in the subsequent analysis: $\mathbb{S}_n$ is the unit sphere in $\mathbb{R}^n$; $m(A)$ is the measure of a set $A \subseteq \mathbb{R}^n$; $\beta_n$ is the area of the unit sphere $\mathbb{S}_n$; $ds_n$ is the differential element on $\mathbb{S}_n$; $\mathbb{1}_A(x)$ is the indicator function of a set $A$; $\mathrm{sign}(\cdot)$ is the sign function.
Before we present the main results, several lemmas are in order. The first lemma concerns a geometric property of the unit sphere.

Lemma 1. For any non-zero vector $d \in \mathbb{R}^n$ and $0 < \delta < 1$, let
$$S^x_\delta := \big\{ v \in \mathbb{S}_n : |\langle d, v\rangle| < \delta^2 \big\}.$$
If $\|d\| \ge \delta$, then there exists a constant $C_n > 0$ such that $m(S^x_\delta) < C_n\,\delta$.

Proof: We have $m(S^x_\delta) = \int_{v\in\mathbb{S}_n} \mathbb{1}_{S^x_\delta}(v)\, ds_n$. By the symmetry of $\mathbb{S}_n$, we may assume w.l.o.g. that $d = (0, \ldots, 0, \|d\|)$. Let $a := \delta^2/\|d\| \le \delta$, so that $S^x_\delta = \{v \in \mathbb{S}_n : |v_n| < a\}$. Integrating in spherical coordinates and bounding the resulting arcsine terms gives
$$m(S^x_\delta) \le \beta_{n-1}\big(\arcsin a - \arcsin(-a)\big) < \pi\,\beta_{n-1}\, a \le \pi\,\beta_{n-1}\,\delta.$$
By setting $C_n = \pi\,\beta_{n-1}$, the desired result follows. $\square$

The next lemma leads to an unbiased estimator of the direction of a vector.

Lemma 2. Suppose $d \in \mathbb{R}^n$ and $d \ne 0$. Then
$$\int_{v\in\mathbb{S}_n} \mathrm{sign}\big(\langle d, v\rangle\big)\, v\, ds_n = P_n\,\frac{d}{\|d\|},$$
where $P_n > 0$ is a constant depending only on $n$.

Proof: By the symmetry of $\mathbb{S}_n$, we may again assume $d = (0, \ldots, 0, \|d\|)$, so that $\mathrm{sign}(\langle d, v\rangle) = \mathrm{sign}(v_n)$. Notice that if $v \in \mathbb{S}_n$, then $u = (-v_1, \ldots, -v_{n-1}, v_n)$ is also in $\mathbb{S}_n$. As a result, the components of the integral orthogonal to $d$ cancel, the integral points in the direction $d/\|d\|$, and its length is
$$P_n = \int_{v\in\mathbb{S}_n} \mathrm{sign}(v_n)\, v_n\, ds_n = \int_{v\in\mathbb{S}_n} |v_n|\, ds_n > 0. \qquad \square$$

Using the previous lemmas, we have the following result, which constructs a zeroth-order estimator for the normalized gradient.

Theorem 2. Suppose $f$ has Lipschitz gradient with constant $L$ and $\|\nabla f(x)\| \ge \delta$ at $x$. Let $\epsilon = \delta^2/L$. Then we have
$$\Big\| \mathbb{E}_{v}\big[\mathrm{sign}\big(f(x+\epsilon v) - f(x)\big)\, v\big] - Q_n\,\frac{\nabla f(x)}{\|\nabla f(x)\|} \Big\| \le D_n\,\delta,$$
where $v$ is a random vector uniformly distributed over $\mathbb{S}_n$, $Q_n := P_n/\beta_n$, and $D_n := 2C_n/\beta_n$.
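Lemma 2 says that $\mathbb{E}_v[\mathrm{sign}(\langle d, v\rangle)\,v]$ points exactly in the direction $d/\|d\|$. This is easy to check by simulation; the sketch below (our own Monte Carlo check, for $n = 2$, where the constant evaluates to $Q_2 = \mathbb{E}[|v_1|] = 2/\pi$) draws $v$ uniformly from the unit circle and compares the empirical mean with $(2/\pi)\,d/\|d\|$:

```python
import math, random

random.seed(1)
d = (3.0, -4.0)                 # any non-zero vector; here ||d|| = 5
N = 200_000

sx = sy = 0.0
for _ in range(N):
    th = random.uniform(0.0, 2.0 * math.pi)
    v = (math.cos(th), math.sin(th))              # uniform on the unit circle
    s = 1.0 if d[0] * v[0] + d[1] * v[1] > 0 else -1.0
    sx += s * v[0]
    sy += s * v[1]

mx, my = sx / N, sy / N                           # empirical E[sign(<d,v>) v]
unit_d = (d[0] / 5.0, d[1] / 5.0)
scale = 2.0 / math.pi                             # the constant Q_2 for n = 2

print(mx, my)                   # close to scale * unit_d = (0.382, -0.509)
```

The empirical mean matches $Q_2\, d/\|d\|$ to within Monte Carlo noise, which is exactly the property the estimator of Theorem 2 exploits.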
Proof: By Definition 7,
$$\big| f(x+\epsilon v) - f(x) - \epsilon\,\langle\nabla f(x), v\rangle \big| \le \frac{L}{2}\,\epsilon^2\,\|v\|^2 = \frac{\epsilon\,\delta^2}{2}.$$
Take $d = \nabla f(x)$ in Lemma 1. Since $|\langle\nabla f(x), v\rangle| \ge \delta^2$ for every $v \in \mathbb{S}_n \setminus S^x_\delta$, we obtain, for such $v$,
$$\mathrm{sign}\big(f(x+\epsilon v) - f(x)\big) = \mathrm{sign}\big(\langle\nabla f(x), v\rangle\big).$$
Therefore, decomposing the integral over $\mathbb{S}_n$ into the parts on $\mathbb{S}_n\setminus S^x_\delta$ and on $S^x_\delta$, replacing $\mathrm{sign}(f(x+\epsilon v) - f(x))$ by $\mathrm{sign}(\langle\nabla f(x), v\rangle)$ off $S^x_\delta$, and applying Lemma 2, we obtain
$$\Big\| \mathbb{E}_v\big[\mathrm{sign}\big(f(x+\epsilon v) - f(x)\big)\,v\big] - \frac{P_n}{\beta_n}\,\frac{\nabla f(x)}{\|\nabla f(x)\|} \Big\| \le \frac{2\,m(S^x_\delta)}{\beta_n} \le \frac{2C_n}{\beta_n}\,\delta,$$
where the error term collects the two integrands (each of norm at most $1$) that can differ only on $S^x_\delta$, whose measure is bounded by Lemma 1. Noting that $Q_n = P_n/\beta_n$ and $D_n = 2C_n/\beta_n$, the theorem is proved. $\square$

Based on Theorem 2, for a given $\delta > 0$ we have a zeroth-order estimator for the normalized gradient, given as
$$G_t(x_t, v_t) := \mathrm{sign}\big(f_t(x_t + \epsilon v_t) - f_t(x_t)\big)\, v_t, \quad (7)$$
where $\epsilon = \delta^2/L$ and $v_t$ is a uniformly distributed random vector over $\mathbb{S}_n$. Theorem 2 implies that the distance between the (suitably scaled) expected estimator and the normalized gradient can be controlled up to a factor of $\delta$. Essentially, the Bandit Online Normalized Gradient Descent (BONGD) algorithm replaces the normalized gradient in ONGD by $G_t(x_t, v_t)$.

Algorithm 2 Bandit Online Normalized Gradient Descent
  Input: feasible set $X$, time horizon $T$, stepsize $\eta$, parameter $\delta > 0$
  Initialization: $x_1 \in X$, $\epsilon = \delta^2/L$
  for $t = 1$ to $T$ do
    sample $v_t$ uniformly over $\mathbb{S}_n \subset \mathbb{R}^n$;
    play $x_t$ and $x_t + \epsilon v_t$;
    receive the feedbacks $f_t(x_t)$ and $f_t(x_t + \epsilon v_t)$;
    set $G_t(x_t, v_t) = \mathrm{sign}\big(f_t(x_t + \epsilon v_t) - f_t(x_t)\big)\, v_t$;
    update $x_{t+1} = \Pi_X\big(x_t - \eta\, G_t(x_t, v_t)\big)$.
  end for

Note that Algorithm 2 actually outputs a random sequence of vectors $\{x_t\}$; hence the notion of expected non-stationary regret is applicable here. Let $\{F_t\}$ denote the filtration generated by $\{x_t\}$; then $v_t$ is independent of $F_t$. Note also that, at each step, Algorithm 2 queries the function at a second point $x_t + \epsilon v_t$. Therefore, besides the output sequence $\{x_t\}$, we need to include $\{x_t + \epsilon v_t\}$ in our regret. We thus define
$$\mathrm{ERegret}_{NS}\big(\{x_t\}, \{x_t + \epsilon v_t\}\big) := \mathbb{E}\Big[\sum_{t=1}^T f_t(x_t) - f_t(x_t^*)\Big] + \mathbb{E}\Big[\sum_{t=1}^T f_t(x_t + \epsilon v_t) - f_t(x_t^*)\Big].$$
The following theorem shows that, by choosing $\eta$ and $\delta$ appropriately, we can still achieve an $O(\sqrt{T(1+V_T)})$ expected non-stationary regret bound.
Theorem 3. Let $V_T$ be as defined in (4). Assume that the loss functions have Lipschitz gradients (Definition 7), satisfy the error bound condition (Definition 6), and are weakly pseudo-convex with bounded gradient. For any sequence of loss functions $\{f_t\} \in S_T$, applying BONGD with $\eta = \sqrt{(4R^2 + 6RV_T)/T}$ and $\delta = \min\{T^{-1/(2\gamma)}, T^{-1/2}\}$, where $Q_n = P_n/\beta_n$ and $P_n$ is the constant from Lemma 2, the following regret bound holds:
$$\mathrm{ERegret}_{NS}\big(\{x_t\}, \{x_t + \epsilon v_t\}\big) = O\big(\sqrt{T(1+V_T)}\big).$$

Proof: Let $z_t := \|x_t - x_t^*\|$. As in the proof of Theorem 1, using $\|G_t(x_t, v_t)\| = 1$ and the non-expansiveness of the projection,
$$z_{t+1}^2 \le \big\|x_t - \eta\, G_t(x_t, v_t) - x_t^*\big\|^2 + 6R\,\|x_t^* - x_{t+1}^*\| \le z_t^2 + \eta^2 - 2\eta\,\big\langle G_t(x_t, v_t),\ x_t - x_t^*\big\rangle + 6R\,\|x_t^* - x_{t+1}^*\|.$$
By rearranging the terms, we have
$$K\,\big\langle G_t(x_t, v_t),\ x_t - x_t^*\big\rangle \le \frac{K}{2}\Big(\frac{z_t^2 - z_{t+1}^2}{\eta} + \eta + \frac{6R}{\eta}\|x_t^* - x_{t+1}^*\|\Big). \quad (8)$$
Now, based on $\|\nabla f_t(x_t)\|$, we have two different cases.

Case 1: $\|\nabla f_t(x_t)\| \ge \delta$. In this case, by Theorem 2,
$$\Big\| \mathbb{E}\big[G_t(x_t, v_t)\,\big|\, F_t\big] - Q_n\,\frac{\nabla f_t(x_t)}{\|\nabla f_t(x_t)\|} \Big\| \le D_n\,\delta.$$
Therefore, by weak pseudo-convexity and by taking expectations in (8),
$$f_t(x_t) - f_t(x_t^*) \le K\Big\langle \frac{\nabla f_t(x_t)}{\|\nabla f_t(x_t)\|},\ x_t - x_t^*\Big\rangle \le \frac{K}{2Q_n}\Big(\frac{\mathbb{E}[z_t^2] - \mathbb{E}[z_{t+1}^2]}{\eta} + \eta + \frac{6R}{\eta}\|x_t^* - x_{t+1}^*\|\Big) + \frac{4C_nKR}{P_n}\,\delta, \quad (9)$$
where the last term bounds $\tfrac{D_nK}{Q_n}\delta\,\|x_t - x_t^*\| \le \tfrac{2RD_nK}{Q_n}\delta = \tfrac{4C_nKR}{P_n}\delta$.

Case 2: $\|\nabla f_t(x_t)\| < \delta$. In this case, by the error bound property (Definition 6), $\|x_t - x_t^*\| \le D\,\|\nabla f_t(x_t)\|^{\gamma} < D\,\delta^{\gamma}$. Therefore, due to the boundedness of the gradient,
$$f_t(x_t) - f_t(x_t^*) \le M\,\|x_t - x_t^*\| \le MD\,\delta^{\gamma}.$$
Moreover, since $|\mathbb{E}[\langle G_t(x_t, v_t), x_t - x_t^*\rangle \mid F_t]| \le \|x_t - x_t^*\| \le D\delta^\gamma$, adding the expectation of (8) (which is then bounded below by $-\tfrac{K\beta_n}{P_n}D\delta^\gamma$), it follows that
$$f_t(x_t) - f_t(x_t^*) \le \frac{K}{2Q_n}\Big(\frac{\mathbb{E}[z_t^2] - \mathbb{E}[z_{t+1}^2]}{\eta} + \eta + \frac{6R}{\eta}\|x_t^* - x_{t+1}^*\|\Big) + \Big(\frac{K\beta_n}{P_n}D + MD\Big)\delta^{\gamma}. \quad (10)$$
In view of (9) and (10), if we let
$$U := \max\Big\{\frac{4C_nKR}{P_n},\ \frac{K\beta_n}{P_n}D + MD\Big\},$$
then in either case the following inequality holds (with $\bar\gamma := \min\{\gamma, 1\}$, since $\delta < 1$):
$$f_t(x_t) - f_t(x_t^*) \le \frac{K}{2Q_n}\Big(\frac{\mathbb{E}[z_t^2] - \mathbb{E}[z_{t+1}^2]}{\eta} + \eta + \frac{6R}{\eta}\|x_t^* - x_{t+1}^*\|\Big) + U\,\delta^{\bar\gamma}.$$
Summing these inequalities over $t = 1, \ldots, T$, and bounding the second-point losses by $\mathbb{E}[f_t(x_t + \epsilon v_t) - f_t(x_t^*)] \le \mathbb{E}[f_t(x_t) - f_t(x_t^*)] + M\epsilon$, we have
$$\mathrm{ERegret}_{NS}\big(\{x_t\}, \{x_t + \epsilon v_t\}\big) \le \frac{K}{Q_n}\Big(\frac{4R^2 + 6RV_T}{\eta} + \eta T\Big) + 2UT\,\delta^{\bar\gamma} + \frac{MT\,\delta^2}{L}.$$
By choosing $\eta = \sqrt{(4R^2 + 6RV_T)/T}$ and $\delta = \min\{T^{-1/(2\gamma)}, T^{-1/2}\}$, we have
$$\mathrm{ERegret}_{NS}\big(\{x_t\}, \{x_t + \epsilon v_t\}\big) \le \frac{2K}{Q_n}\sqrt{T(4R^2 + 6RV_T)} + 2U\sqrt{T} + \frac{M}{L} = O\big(\sqrt{T(1+V_T)}\big). \qquad \square$$

Therefore, under the additional error bound condition (Definition 6) and the Lipschitz continuity of the gradient (Definition 7) on the loss functions (e.g. the function depicted in Figure 1 satisfies all the conditions of Theorem 3), the expected regret of BONGD remains $O(\sqrt{T(1+V_T)})$, which matches both the upper and the lower bound in Yang et al. [2016] for general Lipschitz continuous convex cost functions and two-point bandit feedback. Moreover, the zeroth-order estimator for the normalized gradient could be of interest in its own right.
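Algorithm 2 can likewise be sketched in a few lines. The implementation below is our own illustration; the drifting quadratic losses, the stepsize, and the exploration radius $\epsilon$ are illustrative values, not the tuned $\eta$ and $\delta$ from Theorem 3:

```python
import math, random

def project_ball(x, R=1.0):
    """Projection onto the Euclidean ball of radius R (our choice of X)."""
    n = math.hypot(*x)
    return x if n <= R else tuple(R * xi / n for xi in x)

def bongd(fs, x1, eta, eps, R=1.0, seed=0):
    """Bandit ONGD: G_t = sign(f_t(x_t + eps*v_t) - f_t(x_t)) * v_t, two queries per round."""
    rng = random.Random(seed)
    xs, x = [], x1
    for f in fs:
        th = rng.uniform(0.0, 2.0 * math.pi)
        v = (math.cos(th), math.sin(th))          # v_t uniform on the unit circle
        s = 1.0 if f((x[0] + eps * v[0], x[1] + eps * v[1])) - f(x) > 0 else -1.0
        g = (s * v[0], s * v[1])                  # one-bit direction estimate
        xs.append(x)
        x = project_ball((x[0] - eta * g[0], x[1] - eta * g[1]), R)
    return xs

# Slowly drifting minimizers, as in the first-order example.
T = 5000
cs = [(0.4 * math.cos(2 * math.pi * t / T), 0.4 * math.sin(2 * math.pi * t / T))
      for t in range(T)]
fs = [lambda x, c=c: (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2 for c in cs]

xs = bongd(fs, (1.0, 0.0), eta=1.0 / math.sqrt(T), eps=1e-3)
avg_regret = sum(f(x) for f, x in zip(fs, xs)) / T
print(avg_regret)       # small average regret per round
```

Even though each round reveals only two function values and the update direction carries a single bit of gradient information, the iterates still track the drifting minimizers, mirroring the guarantee of Theorem 3.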
5 Concluding Remarks

In this paper, we considered online learning with non-convex loss functions and a non-stationary regret measure, and established $O(\sqrt{T(1+V_T)})$ regret bounds, where $V_T$ is the total variation of the minimizers of the loss functions, for a gradient-type algorithm and a bandit-type algorithm under suitable conditions on the non-convex losses. As a direction for future research, it would be interesting to find out whether the same regret bound can still be established without knowing $V_T$ in advance. Moreover, it remains open to extend the results to the setting where the loss functions may be noisy and non-smooth.
References

Jacob Abernethy, Alekh Agarwal, Peter L. Bartlett, and Alexander Rakhlin. A stochastic view of optimal regret through minimax duality. arXiv preprint arXiv:0903.5328, 2009.

Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In COLT, 2010.

Omar Besbes, Yonatan Gur, and Assaf Zeevi. Non-stationary stochastic optimization. Operations Research, 63(5):1227-1244, 2015.

S. Ertekin, L. Bottou, and C. L. Giles. Nonconvex online support vector machines. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2):368-381, 2011.

Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2005.

Gilles Gasso, Aristidis Pappaioannou, Marina Spivak, and Léon Bottou. Batch and online learning algorithms for nonconvex Neyman-Pearson classification. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):28, 2011.

Elad Hazan and Satyen Kale. Online submodular minimization. The Journal of Machine Learning Research, 13:2903-2922, 2012.

Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169-192, 2007.

Elad Hazan, Kfir Levy, and Shai Shalev-Shwartz. Beyond convexity: Stochastic quasi-convex optimization. In Advances in Neural Information Processing Systems, 2015.

Yurii Nesterov. Introductory Lectures on Convex Optimization, volume 87. Springer Science & Business Media, 2004.

Yurii Nesterov and Boris Polyak. Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1):177-205, 2006.

Ankan Saha and Ambuj Tewari. Improved regret guarantees for online smooth convex optimization with bandit feedback. In International Conference on Artificial Intelligence and Statistics, 2011.

Tianbao Yang, Lijun Zhang, Rong Jin, and Jinfeng Yi. Tracking slowly moving clairvoyant: Optimal dynamic regret of online learning with true and noisy gradient. In International Conference on Machine Learning, 2016.

Lijun Zhang, Tianbao Yang, Rong Jin, and Zhi-Hua Zhou. Online bandit learning for a special class of non-convex losses. In AAAI, 2015.

Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, 2003.
More informationLecture: Adaptive Filtering
ECE 830 Spring 2013 Statistical Signal Processing instructors: K. Jamieson and R. Nowak Lecture: Adaptive Filtering Adaptive filters are commonly used for online filtering of signals. The goal is to estimate
More informationOnline Submodular Minimization
Online Submodular Minimization Elad Hazan IBM Almaden Research Center 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Yahoo! Research 4301 Great America Parkway, Santa Clara, CA 95054 skale@yahoo-inc.com
More informationOptimistic Bandit Convex Optimization
Optimistic Bandit Convex Optimization Mehryar Mohri Courant Institute and Google 25 Mercer Street New York, NY 002 mohri@cims.nyu.edu Scott Yang Courant Institute 25 Mercer Street New York, NY 002 yangs@cims.nyu.edu
More informationBandit Convex Optimization: T Regret in One Dimension
Bandit Convex Optimization: T Regret in One Dimension arxiv:1502.06398v1 [cs.lg 23 Feb 2015 Sébastien Bubeck Microsoft Research sebubeck@microsoft.com Tomer Koren Technion tomerk@technion.ac.il February
More informationStochastic Optimization
Introduction Related Work SGD Epoch-GD LM A DA NANJING UNIVERSITY Lijun Zhang Nanjing University, China May 26, 2017 Introduction Related Work SGD Epoch-GD Outline 1 Introduction 2 Related Work 3 Stochastic
More informationOracle Complexity of Second-Order Methods for Smooth Convex Optimization
racle Complexity of Second-rder Methods for Smooth Convex ptimization Yossi Arjevani had Shamir Ron Shiff Weizmann Institute of Science Rehovot 7610001 Israel Abstract yossi.arjevani@weizmann.ac.il ohad.shamir@weizmann.ac.il
More informationLecture 23: Online convex optimization Online convex optimization: generalization of several algorithms
EECS 598-005: heoretical Foundations of Machine Learning Fall 2015 Lecture 23: Online convex optimization Lecturer: Jacob Abernethy Scribes: Vikas Dhiman Disclaimer: hese notes have not been subjected
More informationAn Online Convex Optimization Approach to Blackwell s Approachability
Journal of Machine Learning Research 17 (2016) 1-23 Submitted 7/15; Revised 6/16; Published 8/16 An Online Convex Optimization Approach to Blackwell s Approachability Nahum Shimkin Faculty of Electrical
More informationarxiv: v2 [stat.ml] 7 Sep 2018
Projection-Free Bandit Convex Optimization Lin Chen 1, Mingrui Zhang 3 Amin Karbasi 1, 1 Yale Institute for Network Science, Department of Electrical Engineering, 3 Department of Statistics and Data Science,
More informationarxiv: v1 [math.oc] 1 Jul 2016
Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the
More informationThe convex optimization approach to regret minimization
The convex optimization approach to regret minimization Elad Hazan Technion - Israel Institute of Technology ehazan@ie.technion.ac.il Abstract A well studied and general setting for prediction and decision
More informationExponential Weights on the Hypercube in Polynomial Time
European Workshop on Reinforcement Learning 14 (2018) October 2018, Lille, France. Exponential Weights on the Hypercube in Polynomial Time College of Information and Computer Sciences University of Massachusetts
More informationarxiv: v4 [math.oc] 5 Jan 2016
Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The
More informationOnline Passive-Aggressive Algorithms
Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il
More informationOnline Optimization with Gradual Variations
JMLR: Workshop and Conference Proceedings vol (0) 0 Online Optimization with Gradual Variations Chao-Kai Chiang, Tianbao Yang 3 Chia-Jung Lee Mehrdad Mahdavi 3 Chi-Jen Lu Rong Jin 3 Shenghuo Zhu 4 Institute
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationBandits for Online Optimization
Bandits for Online Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Bandits for Online Optimization 1 / 16 The multiarmed bandit problem... K slot machines Each
More informationAdaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Vicente L. Malave February 23, 2011 Outline Notation minimize a number of functions φ
More informationThe Interplay Between Stability and Regret in Online Learning
The Interplay Between Stability and Regret in Online Learning Ankan Saha Department of Computer Science University of Chicago ankans@cs.uchicago.edu Prateek Jain Microsoft Research India prajain@microsoft.com
More informationMinimizing Adaptive Regret with One Gradient per Iteration
Minimizing Adaptive Regret with One Gradient per Iteration Guanghui Wang, Dakuan Zhao, Lijun Zhang National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China wanggh@lamda.nju.edu.cn,
More informationOnline Passive-Aggressive Algorithms
Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il
More informationThe Online Approach to Machine Learning
The Online Approach to Machine Learning Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53 Summary 1 My beautiful regret 2 A supposedly fun game I
More informationarxiv: v1 [cs.lg] 8 Nov 2010
Blackwell Approachability and Low-Regret Learning are Equivalent arxiv:0.936v [cs.lg] 8 Nov 200 Jacob Abernethy Computer Science Division University of California, Berkeley jake@cs.berkeley.edu Peter L.
More informationOptimization for Machine Learning
Optimization for Machine Learning Editors: Suvrit Sra suvrit@gmail.com Max Planck Insitute for Biological Cybernetics 72076 Tübingen, Germany Sebastian Nowozin Microsoft Research Cambridge, CB3 0FB, United
More informationLogarithmic Regret Algorithms for Online Convex Optimization
Logarithmic Regret Algorithms for Online Convex Optimization Elad Hazan 1, Adam Kalai 2, Satyen Kale 1, and Amit Agarwal 1 1 Princeton University {ehazan,satyen,aagarwal}@princeton.edu 2 TTI-Chicago kalai@tti-c.org
More informationDistributed online optimization over jointly connected digraphs
Distributed online optimization over jointly connected digraphs David Mateos-Núñez Jorge Cortés University of California, San Diego {dmateosn,cortes}@ucsd.edu Southern California Optimization Day UC San
More informationOnline Learning: Random Averages, Combinatorial Parameters, and Learnability
Online Learning: Random Averages, Combinatorial Parameters, and Learnability Alexander Rakhlin Department of Statistics University of Pennsylvania Karthik Sridharan Toyota Technological Institute at Chicago
More informationOnline Learning and Online Convex Optimization
Online Learning and Online Convex Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Learning 1 / 49 Summary 1 My beautiful regret 2 A supposedly fun game
More informationAn Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an
More informationThe Algorithmic Foundations of Adaptive Data Analysis November, Lecture The Multiplicative Weights Algorithm
he Algorithmic Foundations of Adaptive Data Analysis November, 207 Lecture 5-6 Lecturer: Aaron Roth Scribe: Aaron Roth he Multiplicative Weights Algorithm In this lecture, we define and analyze a classic,
More informationRegret bounded by gradual variation for online convex optimization
Noname manuscript No. will be inserted by the editor Regret bounded by gradual variation for online convex optimization Tianbao Yang Mehrdad Mahdavi Rong Jin Shenghuo Zhu Received: date / Accepted: date
More informationDistributed online optimization over jointly connected digraphs
Distributed online optimization over jointly connected digraphs David Mateos-Núñez Jorge Cortés University of California, San Diego {dmateosn,cortes}@ucsd.edu Mathematical Theory of Networks and Systems
More informationExponentiated Gradient Descent
CSE599s, Spring 01, Online Learning Lecture 10-04/6/01 Lecturer: Ofer Dekel Exponentiated Gradient Descent Scribe: Albert Yu 1 Introduction In this lecture we review norms, dual norms, strong convexity,
More informationConvergence rate of SGD
Convergence rate of SGD heorem: (see Nemirovski et al 09 from readings) Let f be a strongly convex stochastic function Assume gradient of f is Lipschitz continuous and bounded hen, for step sizes: he expected
More informationMulti-armed Bandits: Competing with Optimal Sequences
Multi-armed Bandits: Competing with Optimal Sequences Oren Anava The Voleon Group Berkeley, CA oren@voleon.com Zohar Karnin Yahoo! Research New York, NY zkarnin@yahoo-inc.com Abstract We consider sequential
More informationOnline Submodular Minimization
Journal of Machine Learning Research 13 (2012) 2903-2922 Submitted 12/11; Revised 7/12; Published 10/12 Online Submodular Minimization Elad Hazan echnion - Israel Inst. of ech. echnion City Haifa, 32000,
More informationOptimization, Learning, and Games with Predictable Sequences
Optimization, Learning, and Games with Predictable Sequences Alexander Rakhlin University of Pennsylvania Karthik Sridharan University of Pennsylvania Abstract We provide several applications of Optimistic
More informationBregman Divergence and Mirror Descent
Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,
More informationA Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints
A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints Hao Yu and Michael J. Neely Department of Electrical Engineering
More informationAlireza Shafaei. Machine Learning Reading Group The University of British Columbia Summer 2017
s s Machine Learning Reading Group The University of British Columbia Summer 2017 (OCO) Convex 1/29 Outline (OCO) Convex Stochastic Bernoulli s (OCO) Convex 2/29 At each iteration t, the player chooses
More informationA Drifting-Games Analysis for Online Learning and Applications to Boosting
A Drifting-Games Analysis for Online Learning and Applications to Boosting Haipeng Luo Department of Computer Science Princeton University Princeton, NJ 08540 haipengl@cs.princeton.edu Robert E. Schapire
More informationMaking Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization Alexander Rakhlin University of Pennsylvania Ohad Shamir Microsoft Research New England Karthik Sridharan University of Pennsylvania
More informationConstrained Stochastic Gradient Descent for Large-scale Least Squares Problem
Constrained Stochastic Gradient Descent for Large-scale Least Squares Problem ABSRAC Yang Mu University of Massachusetts Boston 100 Morrissey Boulevard Boston, MA, US 015 yangmu@cs.umb.edu ianyi Zhou University
More informationLecture 25: Subgradient Method and Bundle Methods April 24
IE 51: Convex Optimization Spring 017, UIUC Lecture 5: Subgradient Method and Bundle Methods April 4 Instructor: Niao He Scribe: Shuanglong Wang Courtesy warning: hese notes do not necessarily cover everything
More informationLower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization
JMLR: Workshop and Conference Proceedings vol 40:1 16, 2015 28th Annual Conference on Learning Theory Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization Mehrdad
More information1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016
AM 1: Advanced Optimization Spring 016 Prof. Yaron Singer Lecture 11 March 3rd 1 Overview In this lecture we will introduce the notion of online convex optimization. This is an extremely useful framework
More informationOnline Learning of Optimal Strategies in Unknown Environments
215 IEEE 54th Annual Conference on Decision and Control (CDC) December 15-18, 215. Osaka, Japan Online Learning of Optimal Strategies in Unknown Environments Santiago Paternain and Alejandro Ribeiro Abstract
More informationOnline Learning with Predictable Sequences
JMLR: Workshop and Conference Proceedings vol (2013) 1 27 Online Learning with Predictable Sequences Alexander Rakhlin Karthik Sridharan rakhlin@wharton.upenn.edu skarthik@wharton.upenn.edu Abstract We
More informationA Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds
Proceedings of Machine Learning Research 76: 40, 207 Algorithmic Learning Theory 207 A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds Pooria
More informationOnline Sparse Linear Regression
JMLR: Workshop and Conference Proceedings vol 49:1 11, 2016 Online Sparse Linear Regression Dean Foster Amazon DEAN@FOSTER.NET Satyen Kale Yahoo Research SATYEN@YAHOO-INC.COM Howard Karloff HOWARD@CC.GATECH.EDU
More informationWorst-Case Complexity Guarantees and Nonconvex Smooth Optimization
Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex
More informationLarge-scale Stochastic Optimization
Large-scale Stochastic Optimization 11-741/641/441 (Spring 2016) Hanxiao Liu hanxiaol@cs.cmu.edu March 24, 2016 1 / 22 Outline 1. Gradient Descent (GD) 2. Stochastic Gradient Descent (SGD) Formulation
More informationSubgradient Methods for Maximum Margin Structured Learning
Nathan D. Ratliff ndr@ri.cmu.edu J. Andrew Bagnell dbagnell@ri.cmu.edu Robotics Institute, Carnegie Mellon University, Pittsburgh, PA. 15213 USA Martin A. Zinkevich maz@cs.ualberta.ca Department of Computing
More informationRegret Bounds for Online Portfolio Selection with a Cardinality Constraint
Regret Bounds for Online Portfolio Selection with a Cardinality Constraint Shinji Ito NEC Corporation Daisuke Hatano RIKEN AIP Hanna Sumita okyo Metropolitan University Akihiro Yabe NEC Corporation akuro
More informationarxiv: v1 [math.oc] 7 Dec 2018
arxiv:1812.02878v1 [math.oc] 7 Dec 2018 Solving Non-Convex Non-Concave Min-Max Games Under Polyak- Lojasiewicz Condition Maziar Sanjabi, Meisam Razaviyayn, Jason D. Lee University of Southern California
More informationConvergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity
Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity Benjamin Grimmer Abstract We generalize the classic convergence rate theory for subgradient methods to
More informationThird-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima
Third-order Smoothness elps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima Yaodong Yu and Pan Xu and Quanquan Gu arxiv:171.06585v1 [math.oc] 18 Dec 017 Abstract We propose stochastic
More informationA DELAYED PROXIMAL GRADIENT METHOD WITH LINEAR CONVERGENCE RATE. Hamid Reza Feyzmahdavian, Arda Aytekin, and Mikael Johansson
204 IEEE INTERNATIONAL WORKSHOP ON ACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 2 24, 204, REIS, FRANCE A DELAYED PROXIAL GRADIENT ETHOD WITH LINEAR CONVERGENCE RATE Hamid Reza Feyzmahdavian, Arda Aytekin,
More informationHigher-Order Methods
Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth
More informationZeroth-Order Online Alternating Direction Method of Multipliers: Convergence Analysis and Applications
Zeroth-Order Online Alternating Direction Method of Multipliers: Convergence Analysis and Applications Sijia Liu Jie Chen Pin-Yu Chen Alfred O. Hero University of Michigan Northwestern Polytechnical IBM
More informationThe FTRL Algorithm with Strongly Convex Regularizers
CSE599s, Spring 202, Online Learning Lecture 8-04/9/202 The FTRL Algorithm with Strongly Convex Regularizers Lecturer: Brandan McMahan Scribe: Tamara Bonaci Introduction In the last lecture, we talked
More informationPreference Based Adaptation for Learning Objectives
Preference Based Adaptation for Learning Objectives Yao-Xiang Ding Zhi-Hua Zhou National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 223, China {dingyx, zhouzh}@lamda.nju.edu.cn
More information15-859E: Advanced Algorithms CMU, Spring 2015 Lecture #16: Gradient Descent February 18, 2015
5-859E: Advanced Algorithms CMU, Spring 205 Lecture #6: Gradient Descent February 8, 205 Lecturer: Anupam Gupta Scribe: Guru Guruganesh In this lecture, we will study the gradient descent algorithm and
More informationConvergence and No-Regret in Multiagent Learning
Convergence and No-Regret in Multiagent Learning Michael Bowling Department of Computing Science University of Alberta Edmonton, Alberta Canada T6G 2E8 bowling@cs.ualberta.ca Abstract Learning in a multiagent
More informationAdaptive Negative Curvature Descent with Applications in Non-convex Optimization
Adaptive Negative Curvature Descent with Applications in Non-convex Optimization Mingrui Liu, Zhe Li, Xiaoyu Wang, Jinfeng Yi, Tianbao Yang Department of Computer Science, The University of Iowa, Iowa
More informationTutorial: PART 2. Optimization for Machine Learning. Elad Hazan Princeton University. + help from Sanjeev Arora & Yoram Singer
Tutorial: PART 2 Optimization for Machine Learning Elad Hazan Princeton University + help from Sanjeev Arora & Yoram Singer Agenda 1. Learning as mathematical optimization Stochastic optimization, ERM,
More informationONLINE VARIANCE-REDUCING OPTIMIZATION
ONLINE VARIANCE-REDUCING OPTIMIZATION Nicolas Le Roux Google Brain nlr@google.com Reza Babanezhad University of British Columbia rezababa@cs.ubc.ca Pierre-Antoine Manzagol Google Brain manzagop@google.com
More informationNoisy Streaming PCA. Noting g t = x t x t, rearranging and dividing both sides by 2η we get
Supplementary Material A. Auxillary Lemmas Lemma A. Lemma. Shalev-Shwartz & Ben-David,. Any update of the form P t+ = Π C P t ηg t, 3 for an arbitrary sequence of matrices g, g,..., g, projection Π C onto
More informationStochastic gradient methods for machine learning
Stochastic gradient methods for machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines, Nicolas Le Roux and Mark Schmidt - January 2013 Context Machine
More informationStochastic Composition Optimization
Stochastic Composition Optimization Algorithms and Sample Complexities Mengdi Wang Joint works with Ethan X. Fang, Han Liu, and Ji Liu ORFE@Princeton ICCOPT, Tokyo, August 8-11, 2016 1 / 24 Collaborators
More informationStochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values
Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values Mengdi Wang Ethan X. Fang Han Liu Abstract Classical stochastic gradient methods are well suited
More informationLearning with stochastic proximal gradient
Learning with stochastic proximal gradient Lorenzo Rosasco DIBRIS, Università di Genova Via Dodecaneso, 35 16146 Genova, Italy lrosasco@mit.edu Silvia Villa, Băng Công Vũ Laboratory for Computational and
More informationStochastic and online algorithms
Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem
More informationarxiv: v1 [cs.lg] 4 Jul 2017
Robust Optimization for Non-Convex Objectives Robert Chen Computer Science Harvard University Brendan Lucier Microsoft Research New England Yaron Singer Computer Science Harvard University Vasilis Syrgkanis
More informationGeneralized Boosting Algorithms for Convex Optimization
Alexander Grubb agrubb@cmu.edu J. Andrew Bagnell dbagnell@ri.cmu.edu School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 523 USA Abstract Boosting is a popular way to derive powerful
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationLogarithmic Regret Algorithms for Strongly Convex Repeated Games
Logarithmic Regret Algorithms for Strongly Convex Repeated Games Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci & Eng, The Hebrew University, Jerusalem 91904, Israel 2 Google Inc 1600
More informationBetter Algorithms for Benign Bandits
Better Algorithms for Benign Bandits Elad Hazan IBM Almaden 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Microsoft Research One Microsoft Way, Redmond, WA 98052 satyen.kale@microsoft.com
More informationOnline Learning, Mistake Bounds, Perceptron Algorithm
Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which
More informationComparison of Modern Stochastic Optimization Algorithms
Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,
More information