Online Learning with Non-Convex Losses and Non-Stationary Regret


Xiang Gao (University of Minnesota), Xiaobo Li (University of Minnesota), Shuzhong Zhang (University of Minnesota)

Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain. PMLR: Volume 84. Copyright 2018 by the authors.

Abstract

In this paper, we consider online learning with non-convex loss functions. Similar to Besbes et al. [2015], we apply the non-stationary regret as the performance metric. In particular, we study the regret bounds under different assumptions on the information available regarding the loss functions. When the gradient of the loss function at the decision point is available, we propose an online normalized gradient descent algorithm (ONGD) to solve the online learning problem. In another situation, when only the value of the loss function is available, we propose a bandit online normalized gradient descent algorithm (BONGD). Under a condition to be called weak pseudo-convexity (WPC), we show that both algorithms achieve a cumulative regret bound of O(√(T(1+V_T))), where V_T is the total temporal variation of the loss functions, thus establishing a sublinear regret bound for online learning with non-convex loss functions and a non-stationary regret measure.

1 Introduction

Online convex optimization (OCO) has been studied extensively in the literature. In OCO, at each period t ∈ {1, 2, ..., T}, an online player chooses a feasible strategy x_t from a decision set X ⊆ R^n and suffers a loss f_t(x_t), where f_t is a convex loss function. One key feature of OCO is that the player must make the decision for period t without knowing the loss function f_t. The performance of an OCO algorithm is usually measured by the stationary regret, which compares the accumulated loss suffered by the player with the loss suffered by the best fixed strategy. Specifically, the stationary regret is defined as

Regret_T^S({x_t}) = Σ_{t=1}^T f_t(x_t) − Σ_{t=1}^T f_t(x*),   (1)

where x* is one best fixed decision in hindsight, i.e., x* ∈ argmin_{x∈X} Σ_{t=1}^T f_t(x).

Several sublinear cumulative regret bounds measured by the stationary regret have been established in the literature. For example, Zinkevich [2003] proposed an online gradient descent algorithm which achieves a regret bound of order O(√T) for convex loss functions. The order of the regret can be further improved to O(log T) if the loss functions are strongly convex (see Hazan et al. [2007]). Moreover, these bounds are shown to be tight for OCO with convex / strongly convex loss functions respectively in Abernethy et al. [2009].

One important extension of OCO is the so-called bandit online convex optimization, where the online player is only supposed to know the function value f_t(x_t) at x_t, instead of the entire function f_t. In particular, when the player can only observe the function value at a single point, Flaxman et al. [2005] established an O(T^{3/4}) regret bound for general convex loss functions by constructing a zeroth-order approximation of the gradient. Assuming that the loss functions are smooth, the regret bound can be improved to O(T^{2/3}) by incorporating a self-concordant regularizer (see Saha and Tewari [2011]). Alternatively, if multiple points can be queried at the same time, Agarwal et al. [2010] showed that the regret can be further improved to O(T^{1/2}) and O(log T) for convex / strongly convex loss functions respectively.

As suggested by its name, the loss functions in OCO are assumed to be convex. Only a handful of papers have studied online learning with non-convex loss functions. In the existing works, mostly heuristic algorithms were proposed (e.g. Gasso et al. [2011], Ertekin et al. [2011]) without focusing on establishing sublinear regret bounds. There are a few noticeable exceptions though. Hazan and Kale [2012] developed an algorithm that achieves O(T^{1/2}) and O(T^{2/3}) regret bounds for the full-information and bandit settings respectively, by assuming the loss functions to be submodular.

Zhang et al. [2015] showed that an O(T^{2/3}) regret bound still holds if the loss functions are compositions of a non-increasing scalar function with a linear function.

The stationary regret requires the benchmark strategy to remain unchanged throughout the periods. This assumption may not be relevant in some applications. Recently, a new performance metric known as the non-stationary regret was proposed by Besbes et al. [2015]. The non-stationary regret compares the cumulative losses of the online player with the losses of the best possible responses:

Regret_T^NS({x_t}) = Σ_{t=1}^T f_t(x_t) − Σ_{t=1}^T f_t(x*_t),   (2)

where x*_t ∈ argmin_{x∈X} f_t(x). Clearly, the non-stationary regret is never less than the stationary regret. Besbes et al. [2015] prove that if there is no restriction on the changes of the loss functions, then the non-stationary regret is linear in T regardless of the strategies. To obtain meaningful bounds, the authors assumed that the temporal change of the sequence of functions {f_t} is bounded. Specifically, the loss functions are assumed to be taken from the set

V := { {f_1, f_2, ..., f_T} : Σ_{t=1}^{T−1} ||f_t − f_{t+1}|| ≤ V_T },   (3)

where ||f_t − f_{t+1}|| := sup_{x∈X} |f_t(x) − f_{t+1}(x)|. For nonzero temporal change V_T, Besbes et al. [2015] then propose algorithms with sublinear non-stationary regret bounds: O(V_T^{1/3} T^{2/3}), O(V_T^{1/2} T^{1/2}), and O(V_T^{1/3} T^{2/3}) respectively, for the cases where the loss functions are: convex with noisy gradients, strongly convex with noisy gradients, and strongly convex with noisy function values. Note that V_T > 0 is assumed in these bounds.

More recently, Yang et al. [2016] also studied non-stationary regret bounds for OCO. They proposed an uncertainty set S_p of sequences of functions in which the worst-case variation of the optimal solutions x*_t of f_t (referred to as the path variation) is bounded:

S_p := { {f_1, f_2, ..., f_T} : max_{x*_t ∈ argmin_{x∈X} f_t(x)} Σ_{t=1}^{T−1} ||x*_t − x*_{t+1}|| ≤ V_T }.

They then proved upper and lower bounds for several different feedback structures and functional classes within the convex function class. In particular, they showed that some existing algorithms, with suitable modifications, achieve O(V_T), O(√(T V_T)) and O(√(T V_T)) bounds for the true-gradient (with smoothness), noisy-gradient and two-point bandit feedback settings, respectively. Note that V_T > 0 is assumed in these bounds.

In this paper, we consider online non-convex optimization with the non-stationary regret as the performance metric. To the best of our knowledge, such a combination has not been studied before. For each period t, even after the decision x_t is made, the online player is not assumed to know the function f_t; instead, only some partial information regarding the loss at x_t is revealed. Specifically, only ∇f_t(x_t) (in the first-order setting) or f_t(x_t) (in the zeroth-order setting) is available to the player. Similar to Yang et al. [2016], we define the uncertainty set S_T of sequences of functions as follows:

S_T := { {f_1, f_2, ..., f_T} : ∃ x*_t ∈ argmin_{x∈X} f_t(x), t = 1, ..., T, s.t. Σ_{t=1}^{T−1} ||x*_t − x*_{t+1}|| ≤ V_T }.

Note that S_p ⊆ S_T for the same V_T. In particular, consider a static sequence f_t ≡ f, t = 1, ..., T, where f has multiple optimal solutions. This sequence clearly belongs to S_T even when V_T = 0. However, V_T would have to be linear in T in order for this sequence to be in S_p.

We propose the Online Normalized Gradient Descent (ONGD) and the novel Bandit Online Normalized Gradient Descent (BONGD) algorithms for the first-order setting and the zeroth-order setting respectively. For loss functions satisfying (4) and a condition to be introduced later, we show that these two algorithms both achieve an O(√(T(1+V_T))) regret bound. Compared to the regret bounds in Yang et al. [2016], our regret bound for the first-order setting is worse, but it is the same for the zeroth-order setting. Note, however, that our loss functions are non-convex and we use a weaker version of the variational constraint.

Regarding non-convex objective functions, a related work in the literature is Hazan et al. [2015], where the authors propose a Normalized Gradient Descent (NGD) method for solving an optimization model whose objective is a so-called strictly locally quasi-convex (SLQC) function. They further show that NGD converges to an ε-optimal minimum within O(1/ε²) iterations. This paper generalizes the results in Hazan et al. [2015] in the following aspects. First, Hazan et al. [2015] consider an optimization model, while this paper considers an online learning model. Second, Hazan et al. [2015] assume the objective function to be strictly locally quasi-convex (SLQC); in this paper we introduce the notion of weak pseudo-convexity (WPC), which will be shown to be a weaker condition than SLQC, and we show that the regret bounds hold if the objective functions are weakly pseudo-convex. Third, Hazan et al. [2015] consider only the first-order setting, while our proposed BONGD algorithm works for the bandit (zeroth-order) setting as well.

The rest of the paper is organized as follows. Section 2 presents some preparations, including the assumptions and notation. In Section 3, we present the ONGD algorithm and prove its regret bound. In Section 4, we present the BONGD algorithm for the zeroth-order setting and show its regret bound under some assumptions. Finally, we conclude the paper in Section 5.
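The following is a small, purely illustrative Python sketch (not from the paper) that makes the two regret notions and the variation budget concrete. The quadratic losses, the drifting minimizers c_t, and the player trajectory are made-up examples.

```python
import numpy as np

# Illustrative only: quadratic losses f_t(x) = ||x - c_t||^2 on R^2,
# with minimizers c_t drifting slowly over time (a made-up example).
T, rng = 200, np.random.default_rng(0)
c = np.cumsum(0.01 * rng.standard_normal((T, 2)), axis=0)  # slowly moving minimizers
x = np.zeros((T, 2))                                        # some player trajectory

f = lambda point, t: float(np.sum((point - c[t]) ** 2))
losses = np.array([f(x[t], t) for t in range(T)])

# Stationary regret: compare with the best *fixed* point in hindsight.
x_fixed = c.mean(axis=0)            # minimizer of sum_t f_t for this quadratic example
stationary = losses.sum() - sum(f(x_fixed, t) for t in range(T))

# Non-stationary regret: compare with the per-period minimizers x_t^* = c_t.
nonstationary = losses.sum() - sum(f(c[t], t) for t in range(T))

# Variation budget V_T: total movement of the minimizers, as in the set S_T.
V_T = np.sum(np.linalg.norm(np.diff(c, axis=0), axis=1))

print(stationary, nonstationary, V_T)
```

As the definitions suggest, the non-stationary regret computed this way is never smaller than the stationary one, and it is only meaningful to bound it when V_T grows sublinearly in T.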

2 Problem Setup

In this section, we present the assumptions underlying our online learning model and introduce some notation that will be used in the paper. Let X ⊆ R^n be a convex decision set that is known to the player. For every period t ∈ {1, 2, ..., T}, the loss function is f_t. Throughout the paper, we assume that X is bounded, i.e., there exists R > 0 such that ||x|| ≤ R for all x ∈ X. We present the following definitions regarding the loss functions.

Definition 1 (Bounded Gradient) A function f is said to have bounded gradient if there exists a finite positive value M such that ||∇f(x)|| ≤ M for all x ∈ X.

Note that if f has bounded gradient, then it is also Lipschitz continuous with Lipschitz constant M on the set X.

Definition 2 (Weak Pseudo-Convexity) A function f is said to be weakly pseudo-convex (WPC) if there exists K > 0 such that

f(x) − f(x*) ≤ K ⟨∇f(x), x − x*⟩ / ||∇f(x)||

holds for all x ∈ X, with the convention that ∇f(x)/||∇f(x)|| = 0 if ∇f(x) = 0, where x* is one optimal solution, i.e., x* ∈ argmin_{x∈X} f(x).

Here we discuss some implications of weak pseudo-convexity. If a differentiable function f is Lipschitz continuous and pseudo-convex, then we have (see a similar derivation in Nesterov [2004])

f(x) − f(y) ≤ M ⟨∇f(x), x − y⟩ / ||∇f(x)||

for all x, y with f(x) ≥ f(y), where M is the Lipschitz constant. Therefore, we can simply let K = M, and the function is also weakly pseudo-convex. Moreover, as another example, the star-convex functions proposed by Nesterov and Polyak [2006] are weakly pseudo-convex.

Proposition 1 If f is star-convex and smooth with bounded gradient on X, then f is weakly pseudo-convex.

The proof of Proposition 1 can be found in the supplementary file. We next introduce a property that is essentially the same as the SLQC property introduced in Hazan et al. [2015].

Definition 3 (Acute Angle) The gradient of f is said to satisfy the acute angle condition if there exists a positive value Z such that

cos(∠(∇f(x), x − x*)) = ⟨∇f(x), x − x*⟩ / (||∇f(x)|| ||x − x*||) ≥ Z > 0

holds for all x ∈ X, with the convention that ∇f(x)/||∇f(x)|| = 0 if ∇f(x) = 0, where x* is one optimal solution, i.e., x* ∈ argmin_{x∈X} f(x).

The following proposition shows that the acute angle condition, together with Lipschitz continuity, implies weak pseudo-convexity.

Proposition 2 If f has bounded gradient and satisfies the acute angle condition, then f is weakly pseudo-convex.

The proof of Proposition 2 can be found in the supplementary file. The class of weakly pseudo-convex functions certainly goes beyond the acute angle condition. For example, below is another class of functions satisfying WPC.

Proposition 3 If f has bounded gradient and satisfies α-homogeneity with respect to its minimum, i.e., there exists α > 0 such that

f(t(x − x*) + x*) − f(x*) ≤ t^α (f(x) − f(x*))

for all x ∈ X and t ∈ [0, 1], where x* ∈ argmin_{x∈X} f(x), then f is weakly pseudo-convex.

The proof of Proposition 3 can be found in the supplementary file. Proposition 3 suggests that all non-negative homogeneous polynomials satisfy WPC with respect to 0. Take f(x) = x1² + x2² + x1²x2² as an example. It is easy to verify that f satisfies the condition in Proposition 3, and thus is weakly pseudo-convex. In Figure 1, the surface of f(x) and a sublevel set of this function are plotted. The function is not quasi-convex, since the sublevel set is non-convex. However, this function satisfies the acute-angle condition in Definition 3.

Note that if f_i(x) is α_i-homogeneous with respect to the shared minimum x* for all i = 1, ..., I with α_i ≥ α > 0, and the gradients of the f_i are uniformly bounded over a set X, then Σ_{i=1}^I f_i(x) is WPC. As a result, we can construct functions that are WPC but do not satisfy the acute-angle condition. Consider the two-dimensional function f(x) = x1² + |x2|^{3/2}, and suppose that X is the unit disc centered at the origin. Clearly, f(x) is differentiable and Lipschitz continuous on X. Also, it is the sum of a 2-homogeneous function and a 3/2-homogeneous function with the shared minimum (0, 0). Thus f(x) is WPC. We compute that

cos(∠(∇f(x), x − x*)) = ⟨∇f(x), x − x*⟩ / (||∇f(x)|| ||x − x*||) = (2x1² + (3/2)|x2|^{3/2}) / (√(4x1² + (9/4)|x2|) √(x1² + x2²)).

Consider the parameterized path (x1, x2) = (√t, t^{2/3}) with t > 0. On this path, we have

cos(∠(∇f(x), x − x*)) = ((7/2) t) / (√(4t + (9/4)t^{2/3}) √(t + t^{4/3})),

which is of order t^{1/6} for small t. Therefore, along the path, as t approaches 0, we have cos(∠(∇f(x), x − x*)) → 0. This example shows that a WPC function may fail to satisfy the acute angle condition.
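As a quick sanity check of this counterexample, the following illustrative Python snippet (not from the paper) evaluates the cosine above along the path (√t, t^{2/3}); the slow decay, roughly t^{1/6}, is visible numerically.

```python
import numpy as np

# Numerical check of the counterexample: f(x) = x1^2 + |x2|^(3/2) on the unit disc,
# with x* = (0, 0). The cosine of the angle between grad f(x) and x - x* tends to 0
# along the path x(t) = (sqrt(t), t^(2/3)) as t -> 0.
def cos_angle(x1, x2):
    grad = np.array([2.0 * x1, 1.5 * np.sign(x2) * np.abs(x2) ** 0.5])
    diff = np.array([x1, x2])                  # x - x* with x* = (0, 0)
    return grad @ diff / (np.linalg.norm(grad) * np.linalg.norm(diff))

for t in [1e-2, 1e-4, 1e-6, 1e-8]:
    print(t, cos_angle(np.sqrt(t), t ** (2.0 / 3.0)))   # decays roughly like t^(1/6)
```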

[Figure 1: Plot of a WPC function that is not quasi-convex.]

As we mentioned before, in order to establish a sublinear non-stationary regret bound, we need to confine the loss functions {f_t} in a unified manner. Therefore, we introduce the uncertainty set of the loss functions S_T, as the set of admissible loss-function sequences whose total variation of the minimizers is bounded by V_T.

Definition 4 The uncertainty set of functions S_T is defined as

S_T := { {f_1, f_2, ..., f_T} : ∃ x*_t ∈ argmin_{x∈X} f_t(x), t = 1, ..., T, s.t. Σ_{t=1}^{T−1} ||x*_t − x*_{t+1}|| ≤ V_T },   (4)

where V_T ≥ 0.

In the zeroth-order setting to be discussed in Section 4, only the function value is available. Therefore, some randomized approach is needed in the algorithm. To account for this situation, we introduce the expected non-stationary regret as the performance metric for an algorithm that outputs a random sequence {x_t}.

Definition 5 The expected non-stationary regret of a randomized algorithm A is defined as

ERegret_T^NS({x_t}) = E[ Σ_{t=1}^T f_t(x_t) − Σ_{t=1}^T f_t(x*_t) ],   (5)

where the expectation is taken over the filtration generated by the random sequence {x_t} produced by A.

3 The First-Order Setting

In this section, we assume that for each period the gradient information at the current point is available to the online player after the decision is made. Specifically, at each period t, the sequence of events is as follows:

1. The online player chooses a strategy x_t;
2. A regret f_t(x_t) − f_t(x*_t) is incurred, but is not necessarily known to the online player;
3. The online player receives the feedback ∇f_t(x_t).

We propose the Online Normalized Gradient Descent (ONGD) algorithm in this setting. The normalized gradient descent method was first proposed in Nesterov [2004], where it is applied to pseudo-convex minimization problems. The ONGD algorithm uses the first-order information ∇f_t(x_t) to compute the normalized vector ∇f_t(x_t)/||∇f_t(x_t)|| as the search direction. Similar to the standard gradient method, it moves along that search direction with a specific stepsize η > 0 and then projects the point back onto the decision set X; see Algorithm 1 for the details. Note that in Algorithm 1, Π_X(y) := argmin_{x∈X} ||y − x|| is the projection operator.

Algorithm 1 Online Normalized Gradient Descent
Input: feasible set X, number of time periods T
Initialization: x_1 ∈ X
for t = 1 to T do
  choose x_t and receive the feedback g_t = ∇f_t(x_t)
  if ||g_t|| > 0 then
    x_{t+1} = Π_X(x_t − η g_t / ||g_t||)
  else
    x_{t+1} = x_t
  end if
end for
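A minimal Python sketch of Algorithm 1 follows. It assumes, for illustration only, that X is a Euclidean ball of radius R (so projection is a rescaling) and that a stand-in callback grad_oracle(t, x) supplies the feedback ∇f_t(x_t); these names and the ball-shaped feasible set are our assumptions, not part of the paper.

```python
import numpy as np

def project_ball(y, R):
    """Projection onto the Euclidean ball of radius R (assumed shape of X)."""
    nrm = np.linalg.norm(y)
    return y if nrm <= R else (R / nrm) * y

def ongd(grad_oracle, T, eta, R, x0):
    """Sketch of ONGD: normalized gradient step followed by projection."""
    x = np.array(x0, dtype=float)
    history = []
    for t in range(1, T + 1):
        history.append(x.copy())           # play x_t
        g = grad_oracle(t, x)              # receive feedback grad f_t(x_t)
        if np.linalg.norm(g) > 0:
            x = project_ball(x - eta * g / np.linalg.norm(g), R)
        # otherwise the gradient vanishes and x_{t+1} = x_t
    return history
```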

The main result is shown in the following theorem, which claims an O(√(T(1+V_T))) non-stationary regret bound for ONGD.

Theorem 1 Let V_T be as defined in (4). For any sequence of loss functions {f_t} ∈ S_T where each f_t is weakly pseudo-convex with a common constant K, let the stepsize be η = √((4R² + 6RV_T)/T). Then the following regret bound holds for ONGD:

Regret_T^NS({x_t}) ≤ 2K √(R²T + 1.5R V_T T) = O(√(T(1+V_T))).

Proof: Let x*_t, t = 1, ..., T, be a sequence of optimal solutions satisfying the condition in Definition 4, and let z_t := ||x_t − x*_t||. Then we have

z_{t+1}² = ||x_{t+1} − x*_{t+1}||²
        = ||x_{t+1} − x*_t + x*_t − x*_{t+1}||²
        ≤ ||x_{t+1} − x*_t||² + 2||x_{t+1} − x*_t|| ||x*_t − x*_{t+1}|| + ||x*_t − x*_{t+1}||²
        ≤ ||Π_X(x_t − η g_t/||g_t||) − x*_t||² + 6R ||x*_t − x*_{t+1}||
        ≤ ||x_t − η g_t/||g_t|| − x*_t||² + 6R ||x*_t − x*_{t+1}||
        = z_t² + η² − 2η ⟨g_t, x_t − x*_t⟩/||g_t|| + 6R ||x*_t − x*_{t+1}||.

Rearranging terms, dividing by 2η and multiplying by K, we have

K ⟨g_t, x_t − x*_t⟩/||g_t|| ≤ K [ (z_t² − z_{t+1}²)/(2η) + η/2 + (3R/η) ||x*_t − x*_{t+1}|| ].

By Definition 2, noting that g_t = ∇f_t(x_t), we have

f_t(x_t) − f_t(x*_t) ≤ K ⟨∇f_t(x_t), x_t − x*_t⟩/||∇f_t(x_t)|| ≤ K [ (z_t² − z_{t+1}²)/(2η) + η/2 + (3R/η) ||x*_t − x*_{t+1}|| ].

Summing these inequalities over t = 1, ..., T, we have

Regret_T^NS({x_t}) ≤ K [ (z_1² − z_{T+1}²)/(2η) + ηT/2 + (3R/η) Σ_{t=1}^{T−1} ||x*_t − x*_{t+1}|| ]   (6)
                  ≤ K [ (4R² + 6RV_T)/(2η) + ηT/2 ].

As a result, by noting η = √((4R² + 6RV_T)/T), we have

Regret_T^NS({x_t}) ≤ K √(T(4R² + 6RV_T)) = 2K √(R²T + 1.5R V_T T) = O(√(T(1+V_T))).

4 The Zeroth-Order Setting

In the previous section it was assumed that the gradient information is available, which may not be the case in some applications; such exceptions include the multi-armed bandit problem, dynamic pricing and Bayesian optimization. Therefore, in this section we consider the setting where the online player only receives the function value f_t(x_t), instead of the gradient ∇f_t(x_t), as the feedback.

As mentioned above, the zeroth-order (or bandit) setting has been studied in the OCO literature. The main technique (see Flaxman et al. [2005] for example) is to construct a zeroth-order approximation of the gradient of a smoothed function. That smoothed function is often created by integrating the original loss function against a chosen probability distribution. By querying random samples of the function value according to a probability distribution, the player is able to create an unbiased zeroth-order approximation of the gradient of the smoothed function. This is, however, not applicable to our online normalized gradient descent algorithm, since what we need is the direction of the gradient. Therefore, we shall first develop a new type of zeroth-order oracle that can approximate the gradient direction, without averaging multiple gradient samples, whenever the norm of the gradient is not too small. To proceed, we require some additional conditions on the loss functions.

Definition 6 (Error Bound) There exist D > 0 and γ > 0 such that

||x − x*_t|| ≤ D ||∇f_t(x)||^γ   for all x ∈ X and t = 1, ..., T,

where x*_t is the optimal solution of f_t, i.e., x*_t = argmin_{x∈X} f_t(x).

Since X is a compact set, the error bound condition is essentially the requirement that the optimal solution is unique and there is no non-optimal local minimum.

Definition 7 (Lipschitz Gradient) There exists a positive number L such that

||∇f_t(x) − ∇f_t(y)|| ≤ L ||x − y||   for all x, y ∈ X and t = 1, ..., T.

We introduce some notation that will be used in the subsequent analysis: S(n) denotes the unit sphere in R^n; m(A) denotes the measure of a set A ⊆ R^n; β_n denotes the area of the unit sphere S(n); ds_n denotes the differential element on the unit sphere S(n); 1_A(x) denotes the indicator function of the set A; and sign(·) denotes the sign function.
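As an aside on the notation, uniform samples from S(n) are what the bandit algorithm of this section will query along; a standard way to draw them, shown in the illustrative helper below (ours, not from the paper), is to normalize a standard Gaussian vector.

```python
import numpy as np

def uniform_on_sphere(n, rng=None):
    """Draw v uniformly from the unit sphere S(n) by normalizing a Gaussian vector."""
    if rng is None:
        rng = np.random.default_rng()
    v = rng.standard_normal(n)
    return v / np.linalg.norm(v)
```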

Before we present the main results, several lemmas are in order. The first lemma concerns a geometric property of the unit sphere.

Lemma 1 For any non-zero vector d ∈ R^n and 0 < δ < 1, define

S^x_δ := { v ∈ S(n) : |⟨d, v⟩| < δ }.

If ||d|| ≥ √δ, then there exists a constant C_n > 0 such that m(S^x_δ)/β_n < C_n √δ.

Proof: We have m(S^x_δ) = ∫_{S(n)} 1_{S^x_δ}(v) ds_n. By the symmetry of S(n), we may assume without loss of generality that d = (0, ..., 0, ||d||), so that S^x_δ = { v ∈ S(n) : |v_n| < δ/||d|| }. Let a := δ/||d||; since ||d|| ≥ √δ and δ < 1, we have a ≤ √δ < 1. Evaluating the integral in spherical coordinates and using arcsin(a) − arcsin(−a) < πa, we obtain m(S^x_δ) < πβ_{n−1} a ≤ πβ_{n−1} √δ. Setting C_n := πβ_{n−1}/β_n, the desired result follows.

The next lemma leads to an unbiased first-order estimator of the direction of a vector.

Lemma 2 Suppose d ∈ R^n and d ≠ 0. Then

∫_{S(n)} sign(⟨d, v⟩) v ds_n = P_n d/||d||,

where P_n > 0 is a constant.

Proof: By the symmetry of S(n), we may again assume d = (0, ..., 0, ||d||), so that ∫_{S(n)} sign(⟨d, v⟩) v ds_n = ∫_{S(n)} sign(v_n) v ds_n. Notice that if v = (v_1, v_2, ..., v_{n−1}, v_n) ∈ S(n), then u = (−v_1, −v_2, ..., −v_{n−1}, v_n) is also in S(n). As a result, the components orthogonal to d cancel, and the above integral lies along the direction d/||d|| = (0, ..., 0, 1), with length

P_n := ∫_{S(n)} sign(v_n) v_n ds_n = ∫_{S(n)} |v_n| ds_n > 0,

which can be evaluated explicitly in spherical coordinates.

Using the previous lemmas, we have the following result, which constructs a zeroth-order estimator of the normalized gradient.

Theorem 2 Suppose f has Lipschitz gradient (Definition 7) and ||∇f(x)|| ≥ √δ at the point x, where 0 < δ < 1. Let ε = δ²/L. Then

|| E_{v∼S(n)}[ sign(f(x + εv) − f(x)) v ] − (P_n/β_n) ∇f(x)/||∇f(x)|| || ≤ D_n √δ,

where v is a random vector uniformly distributed over S(n), P_n is the constant in Lemma 2, and D_n := 2C_n.

Proof: By Definition 7,

| f(x + εv) − f(x) − ε ⟨∇f(x), v⟩ | ≤ (L/2) ε² ||v||² = Lε²/2,

so that | (f(x + εv) − f(x))/ε − ⟨∇f(x), v⟩ | ≤ Lε/2. Let S^x_δ be the set in Lemma 1 with d = ∇f(x). Since |⟨∇f(x), v⟩| ≥ δ for v ∈ S(n)\S^x_δ, letting ε = δ²/L gives Lε/2 = δ²/2 < δ, and thus

sign(f(x + εv) − f(x)) = sign( (f(x + εv) − f(x))/ε ) = sign( ⟨∇f(x), v⟩ )   for all v ∈ S(n)\S^x_δ.

Therefore,

E_{v∼S(n)}[ sign(f(x + εv) − f(x)) v ]
  = (1/β_n) [ ∫_{S(n)\S^x_δ} sign(f(x + εv) − f(x)) v ds_n + ∫_{S^x_δ} sign(f(x + εv) − f(x)) v ds_n ]
  = (1/β_n) [ ∫_{S(n)} sign(⟨∇f(x), v⟩) v ds_n − ∫_{S^x_δ} sign(⟨∇f(x), v⟩) v ds_n + ∫_{S^x_δ} sign(f(x + εv) − f(x)) v ds_n ]
  = (P_n/β_n) ∇f(x)/||∇f(x)|| + (1/β_n) [ ∫_{S^x_δ} sign(f(x + εv) − f(x)) v ds_n − ∫_{S^x_δ} sign(⟨∇f(x), v⟩) v ds_n ],

where the last equality is due to Lemma 2. Putting the estimations together, and noting that each of the two remaining integrands has norm at most one, we have

|| E_{v∼S(n)}[ sign(f(x + εv) − f(x)) v ] − (P_n/β_n) ∇f(x)/||∇f(x)|| || ≤ 2 m(S^x_δ)/β_n ≤ 2C_n √δ.

Since D_n = 2C_n, the theorem is proved.

Based on Theorem 2, for a given δ > 0 we have a zeroth-order estimator of the normalized gradient given by

G_t(x_t, v_t) := sign( f_t(x_t + εv_t) − f_t(x_t) ) v_t,   (7)

where ε = δ²/L and v_t is a uniformly distributed random vector over S(n). Theorem 2 implies that the distance between (a scaled expectation of) this estimator and the normalized gradient can be controlled up to a factor of √δ. Essentially, the Bandit Online Normalized Gradient Descent (BONGD) algorithm replaces the normalized gradient in ONGD by G_t(x_t, v_t).

Algorithm 2 Bandit Online Normalized Gradient Descent
Input: feasible set X, number of time periods T, parameter δ > 0
Initialization: x_1 ∈ X, ε = δ²/L
for t = 1 to T do
  sample v_t uniformly over S(n) ⊆ R^n
  play x_t and x_t + εv_t; receive feedbacks f_t(x_t) and f_t(x_t + εv_t)
  set G_t(x_t, v_t) = sign( f_t(x_t + εv_t) − f_t(x_t) ) v_t
  update x_{t+1} = Π_X(x_t − η G_t(x_t, v_t))
end for

Note that Algorithm 2 actually outputs a random sequence of vectors {x_t}; hence the notion of expected non-stationary regret is applicable here. Let {F_t} denote the filtration generated by {x_t}; then v_t is independent of F_t. Note also that in Algorithm 2, at each step the function is queried at another point x_t + εv_t. Therefore, besides its output sequence {x_t}, we need to include {x_t + εv_t} in the regret. We thus define

ERegret_T^NS({x_t}, {x_t + εv_t}) = E[ Σ_{t=1}^T ( f_t(x_t) − f_t(x*_t) ) ] + E[ Σ_{t=1}^T ( f_t(x_t + εv_t) − f_t(x*_t) ) ].
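A minimal Python sketch of Algorithm 2 follows, under the same stand-in assumptions as the ONGD sketch above: X is a Euclidean ball of radius R, and a hypothetical callback value_oracle(t, x) supplies the bandit feedback f_t(x). The two-point query and the sign-based direction estimate follow the algorithm description; everything else is illustrative, not the authors' code.

```python
import numpy as np

def bongd(value_oracle, T, eta, R, delta, L, x0, rng=None):
    """Sketch of BONGD: two-point bandit feedback, sign-based direction estimate."""
    if rng is None:
        rng = np.random.default_rng()
    eps = delta ** 2 / L                      # epsilon = delta^2 / L, as in Algorithm 2
    x = np.array(x0, dtype=float)
    n, history = x.size, []
    for t in range(1, T + 1):
        v = rng.standard_normal(n)
        v /= np.linalg.norm(v)                # v_t uniform over S(n)
        history.append((x.copy(), x + eps * v))   # play x_t and x_t + eps * v_t
        f0 = value_oracle(t, x)                   # feedback f_t(x_t)
        f1 = value_oracle(t, x + eps * v)         # feedback f_t(x_t + eps * v_t)
        G = np.sign(f1 - f0) * v              # G_t(x_t, v_t): sign-based estimate (7)
        y = x - eta * G
        nrm = np.linalg.norm(y)
        x = y if nrm <= R else (R / nrm) * y  # project back onto the ball (stand-in for X)
    return history
```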

The following theorem shows that by choosing η and δ appropriately, we can still achieve an O(√(T(1+V_T))) expected non-stationary regret bound.

Theorem 3 Let V_T be as defined in (4). Assume that the loss functions have Lipschitz gradients (Definition 7), satisfy the error bound condition (Definition 6), and are weakly pseudo-convex with bounded gradients. For any sequence of loss functions {f_t} ∈ S_T, applying BONGD with η = √((4R² + 6RV_T)/T) and δ = min{T^{−1/γ}, T^{−1}}, the following regret bound holds:

ERegret_T^NS({x_t}, {x_t + εv_t}) = O(√(T(1+V_T))).

Proof: Let z_t := ||x_t − x*_t|| and Q_n := P_n/β_n, where P_n is the constant in Lemma 2. Then

z_{t+1}² = ||x_{t+1} − x*_{t+1}||²
        ≤ ||x_{t+1} − x*_t||² + 2||x_{t+1} − x*_t|| ||x*_t − x*_{t+1}|| + ||x*_t − x*_{t+1}||²
        ≤ ||Π_X(x_t − ηG_t(x_t, v_t)) − x*_t||² + 6R ||x*_t − x*_{t+1}||
        ≤ ||x_t − ηG_t(x_t, v_t) − x*_t||² + 6R ||x*_t − x*_{t+1}||
        ≤ z_t² + η² − 2η ⟨G_t(x_t, v_t), x_t − x*_t⟩ + 6R ||x*_t − x*_{t+1}||,

where we used ||G_t(x_t, v_t)|| ≤ 1. Rearranging the terms and multiplying by K/Q_n, we have

(K/Q_n) ⟨G_t(x_t, v_t), x_t − x*_t⟩ ≤ (K/Q_n) [ (z_t² − z_{t+1}²)/(2η) + η/2 + (3R/η) ||x*_t − x*_{t+1}|| ].   (8)

Now, based on ||∇f_t(x_t)||, we distinguish two cases.

Case 1: ||∇f_t(x_t)|| ≥ √δ. In this case, by Theorem 2 we have || E[G_t(x_t, v_t) | F_t] − Q_n ∇f_t(x_t)/||∇f_t(x_t)|| || ≤ D_n √δ. Therefore, by weak pseudo-convexity and (8),

f_t(x_t) − f_t(x*_t) ≤ K ⟨∇f_t(x_t), x_t − x*_t⟩ / ||∇f_t(x_t)||
  ≤ (K/Q_n) E[ ⟨G_t(x_t, v_t), x_t − x*_t⟩ | F_t ] + (D_n K/Q_n) √δ ||x_t − x*_t||
  ≤ (K/Q_n) [ (E[z_t² | F_t] − E[z_{t+1}² | F_t])/(2η) + η/2 + (3R/η) ||x*_t − x*_{t+1}|| ] + (2D_n K R/Q_n) √δ.   (9)

Case 2: ||∇f_t(x_t)|| < √δ. In this case, by the error bound property (Definition 6) we have ||x_t − x*_t|| ≤ D ||∇f_t(x_t)||^γ < D δ^{γ/2}. Therefore, due to the boundedness of the gradients, f_t(x_t) − f_t(x*_t) ≤ M ||x_t − x*_t|| ≤ MD δ^{γ/2}. Moreover, since | E[ ⟨G_t(x_t, v_t), x_t − x*_t⟩ | F_t ] | ≤ ||x_t − x*_t|| < D δ^{γ/2}, inequality (8) yields

f_t(x_t) − f_t(x*_t) ≤ (K/Q_n) [ (E[z_t² | F_t] − E[z_{t+1}² | F_t])/(2η) + η/2 + (3R/η) ||x*_t − x*_{t+1}|| ] + (KD/Q_n + MD) δ^{γ/2}.   (10)

In view of (9) and (10), if we let U := max{ 2D_n KR/Q_n, KD/Q_n + MD }, then in either case the following inequality holds:

f_t(x_t) − f_t(x*_t) ≤ (K/Q_n) [ (E[z_t² | F_t] − E[z_{t+1}² | F_t])/(2η) + η/2 + (3R/η) ||x*_t − x*_{t+1}|| ] + U (√δ + δ^{γ/2}).

Summing these inequalities over t = 1, ..., T, taking total expectations and telescoping, and noting that f_t(x_t + εv_t) − f_t(x*_t) ≤ f_t(x_t) − f_t(x*_t) + Mε ||v_t|| = f_t(x_t) − f_t(x*_t) + Mδ²/L, we have

ERegret_T^NS({x_t}, {x_t + εv_t}) ≤ 2(K/Q_n) [ E[z_1²]/(2η) + ηT/2 + (3R/η)V_T ] + 2UT (√δ + δ^{γ/2}) + MT δ²/L
                                 ≤ 2(K/Q_n) [ (4R² + 6RV_T)/(2η) + ηT/2 ] + 2UT (√δ + δ^{γ/2}) + MT δ²/L.

By choosing η = √((4R² + 6RV_T)/T) and δ = min{T^{−1/γ}, T^{−1}}, we have √δ ≤ T^{−1/2}, δ^{γ/2} ≤ T^{−1/2} and δ² ≤ T^{−1}, hence

ERegret_T^NS({x_t}, {x_t + εv_t}) ≤ (2K/Q_n) √(T(4R² + 6RV_T)) + 4U √T + M/L = O(√(T(1+V_T))).

Therefore, under the additional error bound condition (Definition 6) and the Lipschitz continuity of the gradients (Definition 7) on the loss functions (e.g., the function depicted in Figure 1 satisfies all the conditions of Theorem 3), the expected regret of BONGD remains O(√(T(1+V_T))), which matches both the upper and the lower bound in Yang et al. [2016] for general Lipschitz continuous convex cost functions with two-point bandit feedback. Moreover, the zeroth-order estimator of the normalized gradient could be of interest in its own right.

5 Concluding Remarks

In this paper, we considered online learning with non-convex loss functions and a non-stationary regret measure, and established O(√(T(1+V_T))) regret bounds, where V_T is the total variation of the minimizers of the loss functions, for a gradient-type algorithm and a bandit-type algorithm under some conditions on the non-convex loss functions. As a direction for future research, it would be interesting to find out whether the same regret bound can still be established without knowing V_T in advance. Moreover, it remains open to extend the results to the setting where the loss functions may be noisy and non-smooth.

References

Jacob Abernethy, Alekh Agarwal, Peter L. Bartlett, and Alexander Rakhlin. A stochastic view of optimal regret through minimax duality. arXiv preprint arXiv:0903.5328, 2009.

Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In COLT, pages 28-40. Citeseer, 2010.

Omar Besbes, Yonatan Gur, and Assaf Zeevi. Non-stationary stochastic optimization. Operations Research, 63(5):1227-1244, 2015.

S. Ertekin, L. Bottou, and C. L. Giles. Nonconvex online support vector machines. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2):368-381, 2011.

Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 385-394. Society for Industrial and Applied Mathematics, 2005.

Gilles Gasso, Aristidis Pappaioannou, Marina Spivak, and Léon Bottou. Batch and online learning algorithms for nonconvex Neyman-Pearson classification. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):28, 2011.

Elad Hazan and Satyen Kale. Online submodular minimization. The Journal of Machine Learning Research, 13:2903-2922, 2012.

Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169-192, 2007.

Elad Hazan, Kfir Levy, and Shai Shalev-Shwartz. Beyond convexity: Stochastic quasi-convex optimization. In Advances in Neural Information Processing Systems, pages 1594-1602, 2015.

Yurii Nesterov. Introductory Lectures on Convex Optimization, volume 87. Springer Science & Business Media, 2004.

Yurii Nesterov and Boris Polyak. Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1):177-205, 2006.

Ankan Saha and Ambuj Tewari. Improved regret guarantees for online smooth convex optimization with bandit feedback. In International Conference on Artificial Intelligence and Statistics, pages 636-642, 2011.

Tianbao Yang, Lijun Zhang, Rong Jin, and Jinfeng Yi. Tracking slowly moving clairvoyant: Optimal dynamic regret of online learning with true and noisy gradient. In International Conference on Machine Learning, pages 449-457, 2016.

Lijun Zhang, Tianbao Yang, Rong Jin, and Zhi-Hua Zhou. Online bandit learning for a special class of non-convex losses. In AAAI, pages 3158-3164, 2015.

Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, pages 928-936, 2003.
