A Tale of Two Metrics: Simultaneous Bounds on Competitiveness and Regret


Minghong Lin, Computer Science, California Institute of Technology
Adam Wierman, Computer Science, California Institute of Technology
Lachlan L. H. Andrew, Centre for Advanced Internet Architectures, Swinburne University of Technology
Alan Roytman, Computer Science, University of California, Los Angeles
Adam Meyerson, Google Inc.

ABSTRACT

We consider algorithms for smoothed online convex optimization (SOCO) problems. SOCO is a variant of the class of online convex optimization (OCO) problems that is strongly related to the class of metrical task systems (MTS). Both of these have a rich history and a wide range of applications. Prior literature on these problems has focused on two performance metrics: regret and the competitive ratio. There exist known algorithms with sublinear regret and known algorithms with constant competitive ratios; however, no known algorithm achieves both simultaneously. In this paper, we show that this is due to a fundamental incompatibility between these two metrics: no algorithm (deterministic or randomized) can achieve sublinear regret and a constant competitive ratio, even in the case when the objective functions are linear. However, we also exhibit an algorithm that, for the important special case of one-dimensional decision spaces, provides sublinear regret while maintaining a competitive ratio that grows arbitrarily slowly. To evaluate the performance of the algorithm in practice, we consider a case study focused on dynamic capacity provisioning in data centers.

1. INTRODUCTION

Online convex optimization has a wide array of applications, e.g., the k-experts problem [1, 2], portfolio management [3] and network routing [4]; there is also a significant literature developing and analyzing algorithms that can learn effectively in this setting, e.g., [5-7]. In an online convex optimization (OCO) problem, a learner interacts with an environment in a sequence of rounds.
During each round t: (i) the learner must choose an action x_t from a convex decision space F; (ii) the environment reveals a convex cost function c_t; and (iii) the learner experiences cost c_t(x_t). The goal is to design learning algorithms that minimize the cost experienced over a (long) horizon of T rounds. In this paper, we study a generalization of online convex optimization that we term smoothed online convex optimization (SOCO). The only change in SOCO compared to OCO is that the cost experienced by the learner is now c_t(x_t) + ‖x_t − x_{t−1}‖, where ‖·‖ is a norm. That is, the learner experiences an additional smoothing cost, or switching cost, associated with changing the action. By adding switching costs to online convex optimization, SOCO captures a wide variety of important applications. In fact, a strong motivation for studying SOCO is the recent development of dynamic capacity provisioning algorithms for data centers [8-15], where the goal is to dynamically control the number and placement of active servers (x_t) in order to minimize a combination of the delay and energy costs (captured by c_t) and the switching costs involved in cycling servers into power-saving modes and migrating data (‖x_t − x_{t−1}‖).
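As a concrete toy illustration of this cost structure, the total SOCO cost of an action sequence can be computed directly from the definition. The sketch below is ours, not from the paper: it uses scalar actions, the absolute-value norm, and made-up quadratic hit costs with a data-center flavour (x tracks a load level).

```python
def soco_cost(costs, actions, x0=0.0):
    """Total SOCO cost: hit costs c_t(x_t) plus switching costs |x_t - x_{t-1}|.

    `costs` is a list of per-round cost functions, `actions` the decisions x_t.
    Scalar actions and the absolute-value norm are assumed for simplicity.
    """
    total, prev = 0.0, x0
    for c, x in zip(costs, actions):
        total += c(x) + abs(x - prev)  # hit cost + switching cost
        prev = x
    return total

# Hypothetical loads; c_t(x) penalizes deviating from the current load.
loads = [0.2, 0.8, 0.3]
costs = [lambda x, l=l: (x - l) ** 2 for l in loads]

# Holding one action avoids switching but pays hit cost every round...
print(soco_cost(costs, [0.4, 0.4, 0.4]))
# ...while tracking the load pays no hit cost but switches every round.
print(soco_cost(costs, loads))
```

The tension between these two strategies is exactly what the switching-cost term introduces relative to plain OCO.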
Other computer systems applications of recent interest include video streaming [16], where the encoding quality of a video (x t ) needs to change dynamically in response to network congestion, but where large changes in encoding quality are visually annoying to users; and optical networking [17], in which there is a cost to reestablishing a light path. In addition to applications within computer systems, there are a number of problems in industrial optimization that can be modeled by SOCO. One that is of particular interest is the dynamic dispatching of electricity generators, where a particular portfolio of generation (x t ) must be allocated to cover demand, but in addition to the time-varying operating costs of the generators (c t ) there are significant setup and ramping costs associated with changing x t [18]. Further, many applications typically modeled using online convex optimization have, in reality, some cost associated with a change of action; and so may be better modeled using SOCO rather than OCO. For example, OCO encodes the so-called k-experts problem, which has many applications where switching costs can be important, e.g., in stock portfolio management there is a cost associated with adjusting the stock portfolio owned. In fact, switching costs have long been considered important in related learning problems, such as the k-armed bandit problem which has a considerable literature studying algorithms that can learn effectively
despite switching costs [19, 20]. Further, SOCO has applications even in contexts where there are no costs associated with switching actions. For example, if there is concept drift in a penalized estimation problem, then it is natural to make use of a regularizer (switching cost) term in order to control the speed of the drift of the estimator.

Two communities, two performance metrics. Though the precise formulation of SOCO does not appear to have been studied previously, there are two large bodies of literature that have studied formulations very related to SOCO: (i) the online convex optimization (OCO) literature within the machine learning community, e.g., [6, 7], and (ii) the metrical task system (MTS) literature within the algorithms community, e.g., [21, 22]. We discuss these literatures in detail in Section 3. While there are several differences between the formulations in the two communities, a notable difference is that they focus on different performance metrics. In the OCO literature, the goal is typically to design a learning algorithm that minimizes the regret, which is the difference between the cost of the algorithm and the cost of the offline optimal static solution. In this context, a number of algorithms have been shown to provide sub-linear regret (a.k.a. no regret). For example, online gradient descent can achieve O(√T)-regret [6]. Though such guarantees are proven only in the absence of switching costs, we show in Section 3.1 that the same regret bound also holds for SOCO. In the MTS literature, the goal is to design learning algorithms that minimize the competitive ratio, which is the maximum ratio between the cost of the algorithm and the cost of the offline optimal (dynamic) solution. In this setting, most results tend to be negative, e.g., for any metric space, the competitive ratio of an MTS with states chosen from that space grows without bound as the number of states grows [21, 23].
However, recent work has shown that positive results can be obtained in the case when the cost function and decision space are convex. In the data center context, when x_t is scalar, Lazy Capacity Provisioning (LCP) is 3-competitive [10]. Interestingly, the focus on different performance metrics in the OCO and MTS communities has led the communities to develop quite different styles of algorithms. The differences between the algorithms are highlighted by the fact that the algorithms developed in the OCO community universally perform poorly with respect to the competitive ratio, and the algorithms developed in the MTS community universally perform poorly with respect to regret.

Summary of contributions. This paper seeks to understand the relationship between algorithms that focus on minimizing regret and those that focus on minimizing the competitive ratio. To this end, the two questions we seek to answer are: (i) Can an algorithm simultaneously achieve both a constant competitive ratio and sub-linear regret? (ii) Can bounds on the competitive ratio be translated into bounds on the regret, and vice versa? Despite the rich literatures focusing on SOCO-like formulations in each of the OCO and MTS communities, there are very few examples of research along the directions above. Perhaps the first, and for a long time only, such paper is the work of Blum and Burch [24], which provides some connections between the OCO and MTS literatures in the special case where the switching costs are a fixed constant. In this context, [24] shows how to translate bounds on regret into bounds on the competitive ratio, and vice versa. However, beyond this result there has been almost no success at connecting the two literatures until a recent paper of Buchbinder et al. [25].
This paper uses a primal-dual approach to develop an algorithm that can do well for the α-unfair competitive ratio, which is a hybrid of the competitive ratio and regret that provides comparison to the dynamic optimal when α = 1 and to the static optimal as α → ∞ (see Section 2). The algorithm makes heavy use of α, and so does not simultaneously perform well for regret and the competitive ratio, but the result highlights the feasibility of unified approaches for algorithm design across the competitive ratio and regret. Additionally, it is worth noting that there is work focused on achieving simultaneous guarantees with respect to the static and dynamic optimal solutions in other settings, e.g., decision making on lists and trees [26], and there have been some attempts to use algorithmic approaches from machine learning in the context of MTSs [27, 28]. In the current paper, we provide answers to both of the questions above in the context of SOCO. In answer to the first question, we show that there is a fundamental incompatibility between regret and the competitive ratio: no algorithm can maintain both sublinear regret and a constant competitive ratio (Theorem 2). This incompatibility does not stem from a pathological example: it holds even in the simple case when c_t is linear and x_t is scalar. Further, it holds for both deterministic and randomized algorithms and when the α-unfair competitive ratio is considered. Thus, it is no accident that algorithms from the OCO community perform poorly with respect to the competitive ratio and algorithms from the MTS community perform poorly with respect to regret. This is disappointing, since there are many applications where providing guarantees on both regret and the competitive ratio would be desirable.
For example, providing sub-linear regret can be seen as providing justification for incurring the added complexity of implementing a dynamic control policy (you always improve upon any simple static policy), while providing a constant competitive ratio can be seen as providing a near-optimal guarantee. Though providing sub-linear regret and a constant competitive ratio is impossible, we show that it is possible to nearly achieve this goal. More specifically, we present a new algorithm named Randomly Biased Greedy (RBG) which achieves a competitive ratio of (1 + γ) while maintaining O(max{T/γ, γ}) regret for γ ≥ 1 on one-dimensional action spaces. If γ can be chosen as a function of T, then this algorithm can be used to balance between regret and the competitive ratio. For example, it can achieve sub-linear regret while having an arbitrarily slowly growing competitive ratio, or it can achieve O(√T) regret while maintaining an O(√T) competitive ratio. Note that, unlike the scheme of [25], this algorithm has a finite competitive ratio on continuous action spaces and provides a simultaneous guarantee on both regret and the competitive ratio. The main limitation of RBG is that the guarantees on regret only hold when the action space is one-dimensional (unlike those for the algorithm in [25]). However, this special case is important in a number of settings. For example, it has been applied for dynamic capacity provisioning in data centers [10, 11], it is a crucial subproblem for k-server on binary trees [29], it has been used to study the handover of cell phones [30, 31], and it can be used for
controlling the degradation of quality in video with varying encoding rate [16]. Importantly, the analysis of RBG provides an answer to the second question above. In particular, the analysis provides a generic approach that can be used to translate between bounds on the competitive ratio and bounds on the regret, as well as a general technique for parameterizing any algorithm in order to achieve simultaneous guarantees on the competitive ratio and regret. Finally, we illustrate the contrast between algorithms focused on regret and algorithms focused on the competitive ratio using a case study in Section 6. Additionally, this case study provides a practical evaluation of the performance of RBG. The case study we focus on is that of dynamic capacity provisioning in a data center, i.e., dynamically adapting the number of active servers in a data center in response to dynamic load.

2. PROBLEM FORMULATION

Our focus is on smoothed online convex optimization (SOCO) problems, a class of online optimization problems that arises in a wide range of applications, as discussed in the introduction. To ground the work in this paper, we present a case study of dynamic capacity provisioning in data centers in Section 6. Formally, a SOCO problem is defined as follows. There is a convex decision/action space F ⊆ (R₊)ⁿ and a sequence of cost functions {c_1, c_2, ...}, where each c_t : F → R₊. At each time t, a learner/algorithm chooses an action vector x_t ∈ F and the environment chooses a cost function c_t. The algorithm is then evaluated on the cost function and pays a switching penalty/cost corresponding to the difference between consecutive actions.

Cost formulation. More specifically, consider the following sequence of environment revelations and algorithm decisions: ..., x_t, c_t, x_{t+1}, c_{t+1}, ....
The cost of a (strictly) online algorithm A over a horizon of T rounds is

    C(A) = E[ ∑_{t=1}^T c_t(x_t) + ‖x_t − x_{t−1}‖ ],

where x_1, ..., x_T are the decisions of algorithm A, x_0 = 0 without loss of generality, the expectation is over any randomness used by the algorithm, and ‖·‖ is a seminorm on Rⁿ. (Recall that a seminorm satisfies the axioms of a norm except that ‖x‖ = 0 does not imply x = 0.) We write C°(A) when we do not wish to include switching costs, i.e., C°(A) = E[ ∑_{t=1}^T c_t(x_t) ]. As we have discussed in the introduction, there are two main literatures that focus on SOCO-like formulations: OCO and MTS. In these two literatures, the orders of events in a round are different. The OCO setting follows the ordering described above, where the algorithm plays first. However, in the MTS setting the environment plays first. Consequently, the algorithm can be considered to have a 1-step lookahead. In general, an algorithm may have a longer lookahead, and so we define the cost of an online algorithm A with lookahead i as

    C_i(A) = E[ ∑_{t=1}^T c_t(x_{t+i}) + ‖x_{t+i} − x_{t+i−1}‖ ],

where x_i = 0 without loss of generality. Using this notation, the cost considered by the MTS literature is C_1. As above, we write C°_i(A) if we do not want to consider switching costs. Importantly, it is possible to relate C_i to C_{i−1}, and thus to relate the cost considered by the MTS literature to the cost considered by the OCO literature, as done in [24, 25]. Specifically, the penalty incurred by the learner due to acting before the t-th cost function is revealed can be bounded by

    c_t(x_t) − c_t(x_{t+1}) ≤ ∇c_t(x_t) · (x_t − x_{t+1}) ≤ ‖∇c_t(x_t)‖₂ ‖x_t − x_{t+1}‖₂,   (1)

where ‖·‖₂ is the Euclidean norm. A common assumption in OCO, which we adopt in this paper, is that the ‖∇c_t(·)‖₂ are bounded on a given instance, and thus the difference between the OCO and MTS costs (i.e., C_0 and C_1) is not too large.
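The chain of inequalities in (1) follows from convexity (first-order condition) and Cauchy-Schwarz, and is easy to sanity-check numerically. The sketch below uses the convex function c(x) = ‖x‖₂², whose gradient is 2x; the quadratic is our choice of example, since the argument only needs convexity and bounded gradients.

```python
import random

random.seed(0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm2(u):
    return dot(u, u) ** 0.5

# Check c(x) - c(y) <= grad_c(x).(x - y) <= ||grad_c(x)||_2 ||x - y||_2
# on random point pairs, for c(x) = ||x||_2^2 with grad_c(x) = 2x.
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(3)]
    y = [random.gauss(0, 1) for _ in range(3)]
    grad = [2 * a for a in x]
    diff = [a - b for a, b in zip(x, y)]
    lhs = dot(x, x) - dot(y, y)        # c(x) - c(y)
    mid = dot(grad, diff)              # first-order (convexity) bound
    rhs = norm2(grad) * norm2(diff)    # Cauchy-Schwarz bound
    assert lhs <= mid + 1e-9 <= rhs + 2e-9
print("inequality (1) holds on 1000 random pairs")
```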
One final variation on the cost function that we consider is the α-penalized cost, which penalizes changes in action by an extra factor of α ≥ 1. Formally, we define C_i^α as follows:

    C_i^α(A) = E[ ∑_{t=1}^T c_t(x_{t+i}) + α‖x_{t+i} − x_{t+i−1}‖ ].

Performance metrics. To evaluate the performance of algorithms for SOCO problems, it is typical to compare the cost of the algorithm with the cost of an offline optimal solution. However, there are a wide variety of optimal solutions that serve as benchmarks for comparison. Further, the form of the comparison can be additive or multiplicative, depending on the community. Within the OCO literature, the typical benchmark that is compared against is the optimal offline static action, i.e.,

    OPT_s = min_{x ∈ F} ∑_{t=1}^T c_t(x).

Using this benchmark for comparison, the most typical performance metric considered is the regret. The regret of an online learning algorithm is defined as the (additive) difference between its cost and the cost of the optimal static action vector. Specifically, the regret of algorithm A on instances C, R°(A), is less than ρ(T) if for any sequence of cost functions (c_1, ..., c_T) ∈ C,

    C°(A) − OPT_s ≤ ρ(T).   (2)

In the traditional definition of regret given above, switching costs and lookahead are not considered. However, a natural generalization is R_i(A), for which C°(A) is replaced by C_i(A) in (2). The larger the set C, the stronger an upper bound on regret is, and the smaller C, the stronger a lower bound is. In this paper, except where noted, we take C to be the set of sequences of convex functions mapping (R₊)ⁿ to R₊ with gradient uniformly bounded over the sequence. Within the MTS literature, the typical benchmark that is compared against is the optimal offline (dynamic) solution:

    OPT_d = min_{(x_1, ..., x_T) ∈ F^T} ∑_{t=1}^T c_t(x_t) + ‖x_t − x_{t−1}‖.

Note that the minimal cost is the same regardless of lookahead, since the cost functions are fixed.
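In one dimension, the static benchmark OPT_s and the resulting regret of a given action sequence can be computed by brute force. A minimal sketch, with a discretized decision space and cost functions of our own choosing (`static_opt` and `regret` are hypothetical helper names, not from the paper):

```python
def static_opt(costs, grid):
    """OPT_s: the best single action held for the whole horizon (1-D grid search)."""
    return min(sum(c(x) for c in costs) for x in grid)

def regret(costs, actions, grid):
    """Additive gap between the algorithm's hit cost and OPT_s (no switching costs)."""
    alg_cost = sum(c(x) for c, x in zip(costs, actions))
    return alg_cost - static_opt(costs, grid)

grid = [i / 100 for i in range(101)]  # F = [0, 1], discretized
costs = [lambda x, l=l: (x - l) ** 2 for l in [0.0, 1.0, 0.0, 1.0]]

# The targets alternate between 0 and 1, so the best static action is x = 1/2.
print(static_opt(costs, grid))             # 1.0  (four rounds at cost 1/4 each)
print(regret(costs, [0.5, 0.5, 0.5, 0.5], grid))
```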
Using this benchmark for comparison, the most typical performance metric considered is the competitive ratio, which compares the cost of the algorithm to that of the offline optimal. In the MTS literature, the cost typically considered is C_1. However, in more generality, the competitive ratio with lookahead i, denoted by CR_i(A), is ρ(T) if for any sequence of cost functions (c_1, ..., c_T) ∈ C,

    C_i(A) ≤ ρ(T) · OPT_d + O(1).   (3)

The case of i = 1 corresponds to the typical notion of competitive ratio studied in the MTS literature (since C_1 is used to measure the cost in this literature).

Bridging competitiveness and regret. Clearly, there is significant motivation for studying regret and the competitive ratio. The goal of this paper is to understand the relationship between these measures. There are two important contrasts between regret and the competitive ratio that are important to distinguish: (i) the benchmarks chosen for comparison are the static and dynamic offline optimal solutions, respectively, and (ii) the comparisons are additive and multiplicative, respectively. A main result of this paper is that regret and the competitive ratio are fundamentally incompatible, in that no algorithm can perform well on both measures. Thus, it is important to understand which of (i) and/or (ii) above leads to this incompatibility. To tease apart these issues, we need to define variations of the competitive ratio and regret that allow us to bridge these differences. First, there are a variety of options for bridging (i). In particular, there are a number of hybrid measures in both the OCO and MTS literatures that use benchmarks that are, in some sense, in between the static optimal and the dynamic optimal. For example, Adaptive-Regret [32] is defined as the maximum regret over any interval, where the static optimum can differ for different intervals, and internal regret [33] compares the online policy against a simple perturbation of that policy. The metric we use is termed the α-unfair competitive ratio [23-25], and we denote it by CR_i^α(A) in the case of i-lookahead.
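For intuition, the dynamic benchmark can be computed exactly on a discretized one-dimensional decision space by dynamic programming over the previous action. The sketch below is illustrative only (the helper name, grid, and cost functions are ours): it computes the α-unfair dynamic optimum defined next, and shows how raising α pushes the optimum from a tracking trajectory toward a static one.

```python
def alpha_unfair_opt(costs, grid, alpha, x0=0.0):
    """min over trajectories of sum_t c_t(x_t) + alpha * |x_t - x_{t-1}|,
    computed by dynamic programming over a discretized 1-D action space."""
    best = {x0: 0.0}  # best[p] = cheapest cost of any trajectory ending at p
    for c in costs:
        best = {x: min(v + alpha * abs(x - p) for p, v in best.items()) + c(x)
                for x in grid}
    return min(best.values())

grid = [0.0, 0.5, 1.0]
costs = [lambda x, l=l: 2 * abs(x - l) for l in [0.0, 1.0, 0.0, 1.0]]

print(alpha_unfair_opt(costs, grid, alpha=1.0))   # 3.0: tracking the minimizers wins
print(alpha_unfair_opt(costs, grid, alpha=10.0))  # 4.0: staying at x = 0 wins
```

With α = 1 this is the dynamic optimum used by the competitive ratio; for large α the switching penalty forces an essentially static trajectory.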
Formally, CR_i^α(A) is defined exactly the same as the competitive ratio except that the benchmark for comparison is

    OPT_d^α = min_{(x_1, ..., x_T) ∈ F^T} ∑_{t=1}^T c_t(x_t) + α‖x_t − x_{t−1}‖,

where α ≥ 1. Specifically, CR_i^α(A) is ρ(T) if (3) holds with OPT_d replaced by OPT_d^α. Note that OPT_d^α transitions between the dynamic optimal (when α = 1) and the static optimal (for large enough α). Second, it is simple to bridge difference (ii). To this end, we define the competitive difference as follows. In particular, the α-unfair competitive difference with lookahead i on instances C, CD_i^α(A), is the additive comparison with the dynamic optimal. That is, CD_i^α(A) is ρ(T) if for any sequence of cost functions (c_1, ..., c_T) ∈ C,

    C_i(A) − OPT_d^α ≤ ρ(T).   (4)

This measure allows for a smooth transition between regret (when α is large enough) and an additive version of the competitive ratio (when α = 1).

3. BACKGROUND

As we have already mentioned, there are two large literatures that have focused on designing algorithms for SOCO-like problems: the online convex optimization (OCO) literature in the machine learning community and the metrical task system (MTS) literature in the algorithms community. Importantly, the OCO literature has primarily focused on developing algorithms to minimize regret, while the MTS literature has primarily focused on developing algorithms to minimize the competitive ratio. In the following, we briefly discuss related work from these two communities in order to provide context for the results in the current paper.

3.1 Online convex optimization

The OCO problem has a rich history and a wide range of important applications. Historically, in computer science, OCO is perhaps most associated with the so-called k-experts problem [1, 2], a discrete-action version of online optimization wherein at each round t the learning algorithm must choose one of k possible actions, which can be viewed as following the advice of one of k experts.
However, there are also a number of other areas where OCO has a long history, e.g., portfolio management [3, 34]. The traditional OCO formulation is very related to the SOCO formulation introduced in Section 2. The key differentiating features are that (i) the typical performance metric considered is regret, (ii) switching costs are not considered, and (iii) the learner must act before the environment reveals the cost function. So, the cost function of interest in this literature is C°(A) and the performance metric of interest is R°(A). Following the treatment of [6, 7], the typical assumptions are that the decision space F is non-empty, bounded and closed, and that the Euclidean norms of the gradients, ‖∇c_t(·)‖₂, are also bounded. In this setting, a number of algorithms have been shown to achieve sublinear regret, i.e., R°(A) = o(T). An important example of a sublinear-regret, a.k.a. no-regret, learning algorithm is online gradient descent (OGD), which is parameterized by learning rates η_t. OGD works as follows.

Algorithm 1 (Online Gradient Descent, OGD). Select arbitrary x_1 ∈ F. In time step t ≥ 1, select

    x_{t+1} = P(x_t − η_t ∇c_t(x_t)),

where P(y) = arg min_{x ∈ F} ‖x − y‖₂ is the projection under the Euclidean norm.

Note that by choosing the learning rates η_t carefully, OGD is able to achieve sub-linear regret for OCO. In particular, the variation in [6] uses η_t = Θ(1/√t) and obtains O(√T)-regret. Other variations are also possible. For example, [7] achieves a tighter regret bound of O(log T) by choosing η_t = 1/(γt) in settings where the cost functions additionally have minimal curvature, i.e., ∇²c_t(x) − γI_n is positive semidefinite for all x and t, where I_n is the identity matrix of size n. In addition to algorithms based on gradient descent, more recent algorithms such as Online Newton Step and Follow the Approximate Leader [7] also attain O(log T)-regret bounds for a class of cost functions. Note that none of the work discussed above considers switching costs.
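A minimal sketch of Algorithm 1 on a one-dimensional decision space F = [lo, hi], with the η_t = 1/√t schedule of [6]. The toy quadratic instance and helper names are ours, for illustration only.

```python
import math

def ogd(grad_fns, lo=0.0, hi=1.0, x1=0.0):
    """Online gradient descent on F = [lo, hi] with step sizes eta_t = 1/sqrt(t).

    `grad_fns` is a list of gradient functions, one per round; grad_fns[t-1](x)
    is the gradient of c_t at x (revealed only after x_t is played).
    Returns the played actions x_1, ..., x_{T+1}.
    """
    xs = [x1]
    for t, grad in enumerate(grad_fns, start=1):
        eta = 1.0 / math.sqrt(t)
        step = xs[-1] - eta * grad(xs[-1])
        xs.append(min(hi, max(lo, step)))  # Euclidean projection onto [lo, hi]
    return xs

# Hypothetical instance: c_t(x) = (x - 0.7)^2 every round, so grad = 2(x - 0.7).
grad_fns = [lambda x: 2 * (x - 0.7)] * 20
xs = ogd(grad_fns)
print(xs[-1])  # the iterates settle at the static optimum 0.7
```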
To extend the literature discussed above from OCO to SOCO, we need to track the switching costs incurred by the algorithms. Intuitively, since the algorithms rely on learning rates η_t that decay with t, choosing η_t appropriately provides a parameter for controlling the switching costs incurred by the algorithm. Interestingly, the choices of η_t used by the algorithms designed for OCO also turn out to be good choices for controlling switching costs. In particular, all of the classical gradient descent learning algorithms still have sub-linear regret for SOCO.

Proposition 1. Consider an online gradient descent algorithm A on a finite dimensional space with learning rates
such that ∑_{t=1}^T η_t = O(ρ₁(T)). If R°(A) = O(ρ₂(T)), then R₀(A) = O(ρ₁(T) + ρ₂(T)).

The proof is included in Appendix A. An immediate consequence of Proposition 1 is that the online gradient descent algorithms in [6, 7], which use η_t = 1/√t and η_t = 1/(γt), are still O(√T)-regret and O(log T)-regret respectively when switching costs are considered, since in these cases ρ₁(T) = O(ρ₂(T)). Note that a similar result can be obtained for Online Newton Step [7]. Importantly, though the regret of OGD algorithms is sublinear, it can easily be shown that the competitive ratio of these algorithms is unbounded.

3.2 Metrical Task Systems

Like OCO, MTS also has a rich history and a wide range of important applications. Historically, MTS is perhaps most associated with the so-called k-server problem [29]. In this problem, there are k servers, each in some state, and a sequence of requests is incrementally revealed. To serve a request, the system must choose one of the servers to move to the state necessary to serve the request, which happens at some cost depending on the source and destination states. The formulation of SOCO in Section 2 is actually, in many ways, a special case of the MTS formulation. In general, the MTS formulation differs in that (i) the cost functions c_t are not assumed to be convex, (ii) the decision space is typically assumed to be discrete and is not necessarily embedded in a vector space, and (iii) the switching cost is an arbitrary metric d(x_t, x_{t−1}) rather than a seminorm ‖x_t − x_{t−1}‖. In this context, the cost function studied by MTS is typically C_1 and the performance metric of interest is the competitive ratio, specifically CR_1(A), although the α-unfair competitive ratio CR_1^α also receives attention.
The weakening of the assumptions on the cost functions, and the fact that the competitive ratio uses the dynamic optimum as the benchmark, mean that most of the results in the MTS setting are negative when compared with those for OCO. In particular, it has been proven that, given an arbitrary metric decision space of size n, any deterministic algorithm must be Ω(n)-competitive [21]. Further, any randomized algorithm must be Ω(√(log n / log log n))-competitive [23]. These results motivate imposing additional structure on the cost functions in order to attain positive results. For example, it is commonly assumed that the metric is the uniform metric, in which d(x, y) is equal for all x ≠ y; that assumption was made in [25] in its study of the tradeoff between the competitive ratio and regret. For comparison with OCO, an alternative natural restriction is to impose convexity assumptions on the cost function and the decision space, as done in this paper. Upon restricting c_t to be convex, F to be convex, and ‖·‖ to be a seminorm, the MTS formulation becomes quite similar to the SOCO formulation. This restricted class has been the focus of a number of recent papers, and some positive results have emerged. In particular, [10] shows that when F is a one-dimensional normed space (it is sufficient to consider the absolute value norm, since for every seminorm on R, ‖x‖ = ‖1‖ |x|), a deterministic online algorithm named Lazy Capacity Provisioning (LCP) is 3-competitive. However, if the additional restriction that F be a one-dimensional normed space is removed, the problem is again hard. In particular, we can perform a reduction from the parking permit problem [35] to show that the competitive ratio for SOCO is unbounded. But lookahead can provide a tool to avoid this hardness. In particular, a recent algorithm named Averaging Fixed Horizon Control (AFHC) [11] can obtain a competitive ratio of 1 + O(1/i) when i-lookahead is considered, i.e., CR_i(AFHC) = 1 + O(1/i).
Importantly, though the algorithms described above provide constant competitive ratios, in all cases it is easy to see that their regret is linear.

4. THE INCOMPATIBILITY OF REGRET AND THE COMPETITIVE RATIO

Though regret and the competitive ratio have largely been studied independently in the OCO and MTS literatures, respectively, it is natural to strive for algorithms that have strong guarantees with respect to both. In fact, in many settings there is significant motivation for studying each of these two metrics. For example, if the goal is to learn a static concept (e.g., the c_t are stationary in some sense), then regret is the natural measure. However, if one is attempting to learn a dynamic concept, then the comparison point is the dynamic (offline) optimal, and so the competitive ratio is a natural metric. Thus, doing well for both the competitive ratio and regret corresponds to being able to learn both static and dynamic concepts well. Similarly, the importance of both metrics can be seen naturally in the language of control. In such a context, regret measures the performance relative to the optimal static control policy, and so if one wishes to show that dynamic control is not more risky than static control, one needs to exhibit an algorithm with small regret. However, once dynamic control is being used, one wishes to have an optimality guarantee compared to the dynamic (offline) optimal, and so the competitive ratio is a natural measure. So, one would hope for an algorithm which is always as good as the optimal static controller and nearly as good as the best dynamic controller. Interestingly, it is easy to see that none of the algorithms we have discussed to this point achieves both goals. For example, though online gradient descent has sublinear regret, its competitive ratio is infinite. Similarly, though LCP is 3-competitive, it has linear regret. It turns out that this is no accident.
We show below that the two goals are fundamentally incompatible. That is, any algorithm that has sublinear regret for OCO necessarily has an infinite competitive ratio for MTS; and any algorithm that has a constant competitive ratio for MTS necessarily has at least linear regret for OCO. Further, our results give lower bounds on the simultaneous guarantees that are possible for regret and the competitive ratio. In discussing this incompatibility, there are a number of subtleties as a result of the differences in formulation between the OCO literature, where regret is the focus, and the MTS literature, where competitive ratio is the focus. In particular, there are four key differences which are important to highlight: (i) the use of C 0 in OCO and C 1 in MTS; (ii) OCO does not consider switching costs while MTS does; (iii) regret relies on an additive comparison while the competitive ratio relies on a multiplicative comparison; and (iv) regret compares to the static optimal while competitive ratio compares to the dynamic optimal. Note that the first two are intrinsic to the costs, while the latter are intrinsic to the performance metric. The following teases apart which of these differences create incompatibility and which do not. In particular, we prove that (i) and (iv) create incompati-

6 bilities. Our main result in this section states that there is an incompatibility between regret in the OCO setting and the competitive ratio in the MTS setting, i.e., between R (A) and CR 1(A). Naturally, the incompatibility remains if switching costs are added to regret, i.e., R 0(A) is considered. Further, the incompatibility remains when the competitive difference is considered, and so both the comparison with the static optimal and the dynamic optimal are additive. In fact, the incompatibility remains even when the α-unfair competitive ratio/difference is considered. Perhaps most surprisingly, the incompatibility remains when there is lookahead, i.e., when C i and C i+1 are considered. Theorem 2. Consider an arbitrary seminorm on R n, constants γ > 0, α > 0 and i N. There is a C containing a single sequence of cost functions such that, for large enough T, for all deterministic and randomized algorithms A, CR α i+1(a), CD α i+1, and R i(a) must satisfy CR i+1(a) α + Ri(A) γ, T and (5) CD i+1(a) α + R i(a) γt. (6) Note that the proof of Theorem 2 uses linear cost functions and a one-dimensional decision space, and the construction of the cost functions does not depend on T. This highlights that the incompatibility arises even in simple instances. The impact of this result can be stark. In particular, it is impossible for a learning algorithm to be able to learn static concepts with sublinear regret in the OCO setting, while having a constant competitive ratio (sub linear competitive difference) for learning dynamic concepts in the MTS setting. Even more strikingly, in the context of control theory, any dynamic controller that has a constant competitive ratio (sublinear competitive difference), must have linear regret, and so there must be cases where it does significantly worse than the best static controller. 
Thus, one cannot simultaneously guarantee that a dynamic policy is always as good as the best static policy and nearly as good as the optimal dynamic policy.

One aspect of Theorem 2 that is worth discussing further is that the cost functions used by regret and the competitive ratio are off by one. This is motivated by the different settings in the OCO and MTS literatures; however, it is natural to ask the same question for R_0(A) and CR^α_0(A). A parallel result to Theorem 2 holds in this context, but it turns out that the result is primarily due to the hardness of attaining a small CR^α_0, i.e., the hardness of mimicking the dynamic optimum without 1-step lookahead. Note that this is not the case for Theorem 2.

Proposition 3. Consider an arbitrary seminorm ‖·‖ on R^n, constants γ > 0 and α > 0, and a deterministic or randomized online algorithm A. For large enough T, CR^α_0(A), CD^α_0(A), and R_0(A) must satisfy

    CR^α_0(A) + R_0(A)/T ≥ γ, and    (7)
    CD^α_0(A) + R_0(A) ≥ γT.    (8)

Proofs. We now prove Theorem 2 and Proposition 3 using one-dimensional examples. Note that these examples can easily be embedded into higher dimensions if desired. [Footnote 3: To embed these examples into (R_+)^n, replace the functions f^α_i(x) by θ^α_i(x_1, ..., x_n) = f^α_i(x_1). The lower bounds on C^α_i and C^α_{i+1} do not depend on the choice of norm. Moreover, the (static or dynamic) optimum is no worse than the solution constrained to have x_2 = ... = x_n = 0. This means that the theorems hold when ‖·‖ is replaced by the seminorm defined by ‖(x_1, ..., x_n)‖' = ‖(x_1, 0, ..., 0)‖. With this substitution, all costs depend only on x_1, and the result follows from the one-dimensional case.]

Let ᾱ = max(1, α). Given a > 0 and b > 0, define two possible cost functions on F = [0, 1/ᾱ]: f^α_1(x) = b + aᾱx and f^α_2(x) = b + a(1 − ᾱx). These functions are similar to those used in [36] to study online gradient descent for a concept of bounded total variation. To simplify notation, let D(t) = 1/2 − ᾱE[x^t], and note that D(t) ∈ [−1/2, 1/2].

Proof of Theorem 2. Consider a system with costs c^t = f^α_1 if t is odd and c^t = f^α_2 if t is even. Then

    C_i(A) ≥ Σ_{t odd} (aᾱE[x^{t+i}] + b) + Σ_{t even} (a(1 − ᾱE[x^{t+i}]) + b)
           = bT + a Σ_{t=1}^T (1/2 + (−1)^t D(t + i))
           = (a/2 + b)T + a Σ_{t=1}^T (−1)^t D(t + i),

and similarly

    C_{i+1}(A) ≥ (a/2 + b)T + a Σ_{t=1}^T (−1)^t D(t + i + 1).

The static optimum is no worse than the scheme that sets x^t = 1/(2ᾱ) for all t, which has total cost at most (a/2 + b)T + 1/2. The α-unfair dynamic optimum for C_{i+1} is no worse than the scheme that sets x^t = 0 if t is odd and x^t = 1/ᾱ if t is even, which has total α-unfair cost at most (b + 1)T. Hence

    R_i(A) ≥ a Σ_{t=1}^T (−1)^t D(t + i) − 1/2,
    CR^α_{i+1}(A) ≥ [(a/2 + b)T + a Σ_{t=1}^T (−1)^t D(t + i + 1)] / [(b + 1)T],
    CD^α_{i+1}(A) ≥ (a/2 − 1)T + a Σ_{t=1}^T (−1)^t D(t + i + 1).

Thus, since D(t) ∈ [−1/2, 1/2],

    (b + 1)T (CR^α_{i+1}(A) + R_i(A)/T) ≥ (a/2 + b)T − (b + 1)/2 + a Σ_{t=1}^T (−1)^t (D(t + i + 1) + (b + 1)D(t + i)),

where

    a Σ_{t=1}^T (−1)^t (D(t + i + 1) + (b + 1)D(t + i))
        = ab Σ_{t=1}^T (−1)^t D(t + i) − a(D(i + 1) − (−1)^T D(T + i + 1))
        ≥ −abT/2 − a.

To establish (5), it is then sufficient that

    (a/2 + b)T − (b + 1)/2 − abT/2 − a ≥ γ(b + 1)T.

For b = 1/2 and a = 30γ + 30, this holds for T ≥ 5.

Similarly,

    CD^α_{i+1}(A) + R_i(A) ≥ (a/2 − 1)T − 1/2 + a((−1)^T D(T + i + 1) − D(i + 1))
                           ≥ (a/2 − 1)T − 1/2 − a.

Taking a = 30γ + 30 again establishes (6) for T ≥ 3.

Proof of Proposition 3. To prove Proposition 3, consider the cost function sequence with c^t(·) = f^0_2 if E[x^t] ≤ 1/2 and c^t(·) = f^0_1 otherwise, on decision space [0, 1], where x^t is the (random) choice of the algorithm at round t. Here the expectation is taken over the marginal distribution of x^t conditioned on c^1, ..., c^{t−1}, averaging out the dependence on the realizations of x^1, ..., x^{t−1}. Notice that this sequence can be constructed by an oblivious adversary before the execution of the algorithm. The following lemma, proven in Appendix B, is the key step in establishing Proposition 3.

Lemma 4. Given any algorithm A, the sequence of cost functions chosen by the above oblivious adversary makes the regret, the α-unfair competitive ratio, and the α-unfair competitive difference satisfy

    R_0(A), R(A) ≥ a Σ_{t=1}^T |1/2 − E[x^t]| − 1/2,    (9)
    CR^α_0(A) ≥ [(a/2 + b)T + a Σ_{t=1}^T |1/2 − E[x^t]|] / [(b + α)T],    (10)
    CD^α_0(A) ≥ (a/2 − α)T + a Σ_{t=1}^T |1/2 − E[x^t]|.    (11)

From (9) and (10) in Lemma 4, we have

    CR^α_0(A) + R_0(A)/T ≥ (a/2 + b)/(b + α) − 1/(2T).

For a > 2γ(b + α), the right-hand side is bigger than γ for sufficiently large T, which satisfies (7). Next, from (9) and (11),

    CD^α_0(A) + R_0(A) ≥ (a/2 − α)T − 1/2.

By choosing a > 2(γ + α), the right-hand side is at least γT for sufficiently large T, which satisfies (8). Hence, taking a > 2 max{γ(b + α), γ + α} gives an instance that simultaneously satisfies (7) and (8).

5. BALANCING REGRET AND THE COMPETITIVE RATIO

Given the incompatibility of regret and the competitive ratio highlighted by Theorem 2, it is necessary to reevaluate the goals for algorithm design. For example, given that sublinear regret is impossible alongside a constant competitive ratio, the next best goal is a class of algorithms that can ensure εT regret, for an arbitrarily small constant ε, while remaining O(1)-competitive.
Or, it may be enough to ensure that the algorithm is log T- or log log T-competitive while retaining sublinear regret, even though this is not a constant guarantee. In this section, we present a novel algorithm, Randomly Biased Greedy (RBG), which achieves simultaneous bounds on regret in the OCO setting and the competitive ratio in the MTS setting when the decision space F is one-dimensional. Further, RBG can be used to trade off guarantees on regret and the α-unfair competitive ratio and, in fact, our techniques provide a general approach for the design and analysis of algorithms that simultaneously balance the two performance metrics. The algorithm takes a norm N as its input:

Algorithm 2 (Randomly Biased Greedy, RBG(N)). Given a norm N, define w^0(x) = N(x) for all x and w^t(x) = min_y {w^{t−1}(y) + c^t(y) + N(x − y)}. Generate a random number r ∈ (−1, 1). For each time step t, go to the state x^t which minimizes Y^t(x^t) = w^{t−1}(x^t) + rN(x^t).

RBG is motivated by [29], where the MTS problem is solved for a line metric space. Note that the algorithm makes use of randomness, but only in a very limited way: it parameterizes its bias using a single random number r ∈ (−1, 1). Given this bias, the algorithm works by choosing actions greedily to minimize its work function w^t(x). Notice that RBG can be implemented by solving a convex minimization problem in each round, or approximated to any given precision using a dynamic programming approach similar to the classical Work Function algorithm [37].

As stated, RBG performs well for the α-unfair competitive ratio but poorly for regret. In particular, we show in Theorem 5 that, for any norm, RBG(N) with N = ‖·‖ is 2-competitive. So RBG has a constant competitive ratio, and thus has linear regret by Theorem 2. To remedy this, we consider an idea motivated by the incompatibility results in Section 4: to do well on regret, an algorithm needs to be sticky, i.e., it needs to settle down over time.
One way to force an algorithm to be more sticky is to increase the switching costs. In particular, we show below that one can achieve a balance between regret and the α-unfair competitive ratio by parameterizing RBG with the "wrong" norm. More specifically, recall that the absolute value is essentially the only norm on R, in the sense that every norm ‖·‖ on R satisfies ‖x‖ = ‖1‖·|x|. We take advantage of this to use different scalings of ‖1‖ to weight the switching cost differently. In particular, the performance of the algorithm can be bounded as follows.

Theorem 5. For a SOCO problem in a one-dimensional normed space ‖·‖, running RBG(N) with a one-dimensional norm having N(1) = γ‖1‖ as input (where γ ≥ 1) attains an α-unfair competitive ratio CR^α_1 of (1 + γ)/min{γ, α} and a regret R_0 of O(max{T/γ, γ}) with probability 1.

A few remarks on Theorem 5 are in order. First, note that Theorem 5 holds in the setting of the stronger of the two incompatibility results, i.e., Theorem 2. This setting corresponds to comparing the performance of algorithms in the MTS and OCO settings, i.e., the cost functions are mismatched in addition to the differences in the performance metrics. Second, note that if N(1) = α‖1‖, then RBG(N) has an α-unfair competitive ratio of 1 + 1/α and has linear regret (with or without switching costs). However, if one wishes to obtain a suboptimal constant α-unfair competitive ratio and εT regret, for an arbitrarily small ε, one can choose N(1) = ‖1‖/ε. Similarly, if one wishes to obtain sublinear regret and T is known in advance, choosing N(1) = γ(T)‖1‖ for some increasing function γ(T) achieves an O(γ(T)) α-unfair competitive ratio and O(max{T/γ(T), γ(T)}) regret.
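To make Algorithm 2 concrete, here is a minimal sketch of RBG(N) on a discretized one-dimensional decision space; the discretization, the uniform draw of r, and the example cost functions are our own illustrative assumptions (the paper treats F as continuous):

```python
import random

def rbg(costs, gamma=1.0, grid=101, seed=0):
    """Sketch of RBG(N) on F = [0, 1] discretized to `grid` points,
    with N(x) = gamma * |x| (so N(1) = gamma)."""
    xs = [i / (grid - 1) for i in range(grid)]
    N = lambda d: gamma * abs(d)
    w = [N(x) for x in xs]                  # w^0(x) = N(x)
    r = random.Random(seed).uniform(-1, 1)  # the single random bias
    choices = []
    for cost in costs:                      # cost: one round's c^t
        # go to the state minimizing Y^t(x) = w^{t-1}(x) + r * N(x)
        j = min(range(grid), key=lambda i: w[i] + r * N(xs[i]))
        choices.append(xs[j])
        # work-function update:
        # w^t(x) = min_y { w^{t-1}(y) + c^t(y) + N(x - y) }
        w = [min(w[k] + cost(xs[k]) + N(x - xs[k]) for k in range(grid))
             for x in xs]
    return choices

# Alternating linear costs, as in the lower-bound constructions.
alternating = [(lambda x: x) if t % 2 == 0 else (lambda x: 1 - x)
               for t in range(10)]
```

On this example a large gamma makes the switching term dominate, so RBG stays put (sticky, low regret against a static comparator), while a small gamma makes it track the minimizers of the accumulated work function; this mirrors the tradeoff that Theorem 5 quantifies.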

If T is not known in advance, N(1) can be gradually increased as the rounds go on, and bounds similar to those in Theorem 5 still hold. Note that, because of the max, the optimal regret achievable by RBG(N) is O(√T), which matches the optimal regret in the online learning setting with arbitrary convex costs [6]. However, the α-unfair competitive ratio can be made to grow arbitrarily slowly while maintaining sublinear regret by choosing γ(T) appropriately.

Proof of Theorem 5. To prove Theorem 5, we state and apply a more general result, which provides a useful tool for designing algorithms that simultaneously balance regret and the α-unfair competitive ratio. In particular, for any algorithm A, define OperatingCost(A) = Σ_{t=1}^T c^t(x^{t+1}) and SwitchingCost(A) = Σ_{t=1}^T ‖x^{t+1} − x^t‖, so that C_1(A) = OperatingCost(A) + SwitchingCost(A). For ease of notation, define OPT_N to be the cost of the dynamic optimal solution under a different norm N with N(1) = γ‖1‖ (γ ≥ 1). Note that no penalty factor α is paid on the switching cost in OPT_N. The key step in proving Theorem 5 is the following lemma, which is proven in Appendix C.

Lemma 6. Consider a one-dimensional SOCO problem with norm ‖·‖ and an online algorithm A which, when run with norm N, satisfies OperatingCost(A(N)) ≤ OPT_N + O(1) along with SwitchingCost(A(N)) ≤ βOPT_N + O(1), where β = O(1). Fix a norm N such that N(1) = γ‖1‖ with γ ≥ 1. Then A(N) has α-unfair competitive ratio CR^α_1(A(N)) = (1 + β) max{γ/α, 1} and regret R_0(A(N)) = O(max{βT, (1 + β)γ}) for the original SOCO problem with norm ‖·‖.

Given this lemma, Theorem 5 follows from the following two lemmas, proven in Appendices D and E. In particular, suppose RBG is evaluated under the norm ‖·‖. The first lemma states that the operating cost of RBG(N), with N(1) = γ‖1‖, is at most OPT_N; the second states that the switching cost of RBG(N) is at most OPT_N/γ.

Lemma 7. Given a SOCO problem with norm ‖·‖, E[OperatingCost(RBG(N))] ≤ OPT_N.

Lemma 8.
Given a one-dimensional SOCO problem with norm ‖·‖, E[SwitchingCost(RBG(N))] ≤ OPT_N/γ.

6. CASE STUDY

To ground the analytic results in this paper, we end with a case study illustrating the contrast between algorithms designed for regret and those designed for the competitive ratio. To do this, we consider the setting of dynamic capacity provisioning in data centers. This is an application that has received significant attention in the networking literature recently [8–15], and is a setting where SOCO problems and algorithms have been formulated and applied. In this application, the goal is to dynamically control the number of active servers x^t in a data center in order to minimize the operating cost given time-varying workloads, cooling efficiencies, and power costs.

Experimental model. To model dynamic capacity provisioning, we adopt the formulation and parameter settings introduced by Lin et al. in [10], to allow easy comparison with prior literature.

[Figure 1: Workload traces from two anonymous data centers (workload vs. time in days).]

The key piece of the model is to capture the operating costs of the data center, c^t(x^t), given the choice of the number of active servers x^t. We model the operating costs c^t(x^t) as a weighted sum of delay costs (i.e., revenue loss due to increased delay) and energy costs. For simplicity, we assume the load balancer dispatches workload to active servers uniformly, at rate λ^t/x^t per server (which is widely deployed, and turns out to be optimal for many systems). Further, we assume the power draw of an active server handling arrival rate λ is e(λ) and the delay cost is g(λ). A common power profile for typical servers is an affine function e(λ) = e_0 + e_1λ, where e_0 and e_1 are constants; see, e.g., [38]. The delay cost is modeled as g(λ) = γλd(λ) [39], where d(λ) is the average delay for users and γ is a weighting factor depending on the Internet service.
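The per-round cost in this model can be sketched as follows (our own illustration; it assumes the M/GI/1-PS delay model d(λ) = 1/(µ − λ) and the experimental parameter values used in this case study, namely µ = 1, e(λ) = 1 (so e_0 = 1, e_1 = 0), and delay weight γ = 0.1; the infinity guard is a crude stand-in for the gradient cap B described in the text):

```python
mu, e0, e1, gamma = 1.0, 1.0, 0.0, 0.1

def operating_cost(x, lam):
    """Cost of running x servers under total arrival rate lam, with load
    dispatched uniformly at rate lam/x per server."""
    if x <= 0 or lam / x >= mu:
        return float("inf")  # overloaded: average delay is unbounded
    per_server = lam / x     # arrival rate seen by each server
    energy = x * (e0 + e1 * per_server)           # affine server power
    delay_cost = gamma * lam / (mu - per_server)  # x * g(lam/x)
    return energy + delay_cost

# More servers reduce delay but consume more energy, so for a fixed
# workload the cost is minimized at an interior number of servers.
best = min(range(1, 200), key=lambda x: operating_cost(x, 100.0))
```

The convexity of this cost in x is what places the provisioning problem inside the SOCO framework once the switching cost ‖x^t − x^{t−1}‖ is added.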
The average delay is modeled using the standard M/GI/1 processor-sharing queue, i.e., d(λ) = 1/(µ − λ), where µ is the service rate of the server [40]. However, if we apply this delay cost directly, the operating cost c^t will have unbounded gradient. Therefore, we enforce an additional condition on the resulting c^t, that |dc^t(x^t)/dx^t| ≤ B, which corresponds to an SLA requirement that the average delay be bounded. Further, we normalize the power cost of an active idle server for one time slot to 1, and ‖1‖ is the switching cost incurred when moving a server into and out of a power-saving mode, including wear-and-tear costs.

To model a realistic scenario, we perform trace-based experiments using workloads drawn from two real-world anonymous data centers. The first trace covers a 14-day period at 10-minute granularity, i.e., T = 2016; the second covers a 21-day period at 1-hour granularity. Both are shown in Figure 1, and both exhibit strong daily and weekly patterns, which is common for Internet services. The peak-to-mean ratios of the workloads are both between 1.5 and 2, which is common for medium- to large-scale data centers.

In our experiments, we normalize µ = 1 and set e(λ) = 1, since the energy consumption of current servers is dominated by the fixed costs [41]. The weight γ reflects revenue lost due to customers being deterred by delay; we set γ = 0.1. The gradient bound is B = 100. Since the power cost of an active idle server for one time slot is e_0 = 1, the norm ‖1‖ measures the duration a server must be powered down to outweigh the switching cost. We set ‖1‖ so that turning a server off and back on costs the same as the energy consumption for two hours; that is, ‖1‖ = 6 with 10-minute time slots and ‖1‖ = 1 with 1-hour time slots. This was chosen as an approximation of the time a server should sleep so that the wear-and-tear cost of power cycling matches the operating

costs [42].

[Figure 2: Dynamic control provided by OGD and RBG over a two-day period, showing x^t versus time (hours) for the workload, OPT, Static OPT, OGD, RBG(6), and RBG(96).]

[Figure 3: Relative costs of OGD and RBG (normalized cost of OGD, Static OPT, and RBG under C_0 and C_1, versus ‖1‖_RBG/‖1‖): (a) Trace 1; (b) Trace 2.]

Experimental results. The results of our numerical experiments are shown in Figures 2 and 3. Figure 2 shows the offline optimal solution (OPT), the offline optimal static solution (Static OPT), and the output of OGD and RBG(k), where for convenience k = ‖1‖_RBG/‖1‖, over two days of the first trace. OGD, the algorithm designed to optimize regret, converges to the optimal static solution, while RBG(6), the algorithm designed to optimize the competitive ratio, mimics the optimal dynamic solution. If we run RBG with a larger norm, i.e., RBG(96), its solution becomes close to the optimal static solution. This contrasting behavior is a consequence of the performance metric each algorithm is designed to optimize.

Figure 3 shows the costs of the algorithms normalized to the cost of the optimal dynamic solution for the two traces (see footnote 4). In Figure 3a the relative cost of the offline optimal static solution is small (about 1.2), which implies that both OGD and RBG perform quite well in this realistic setting, in contrast to the analytical incompatibility results for adversarial settings in Section 4. Further, as ‖1‖_RBG increases, the solution of RBG deviates further from the offline optimal solution, so the C_1 cost increases and approaches the cost of the static optimal solution. The tradeoff induced by ‖1‖_RBG is apparent when the second trace is considered: Figure 3b shows that the cost C_0 (without lookahead) can decrease as ‖1‖_RBG increases.

7. CONCLUDING REMARKS

In this paper we have studied the class of smoothed online convex optimization (SOCO) problems.
[Footnote 4: The solution x depends on which C_i is being optimized, but the cost does not.]

Variants of this class of problems have been studied in both the online learning community (online convex optimization problems) and the online algorithms community (metrical task systems). These two communities have focused on different performance measures: regret and the competitive ratio, respectively. The goal of this paper is to understand the relationship between these two performance measures.

Our first contribution is to show that these measures are fundamentally incompatible. Specifically, algorithms that have sublinear regret necessarily have an infinite competitive ratio, and algorithms with a constant competitive ratio necessarily have at least linear regret. Thus, the performance measure chosen has a significant impact on the style of algorithm designed, which helps explain why algorithms from the online convex optimization and metrical task system communities are so different. Our second contribution is the design of an algorithm that balances regret and the competitive ratio, achieving sublinear regret while its competitive ratio grows arbitrarily slowly. Further, the proof technique used to analyze this algorithm provides a generic approach for developing algorithms that balance the two metrics.

There are a number of interesting directions that this work motivates. In particular, the SOCO formulation is still under-explored, and many variations of the formulation discussed here are not yet understood. For example, is it possible to trade off between regret and the competitive ratio in bandit versions of SOCO? More generally, the message of this paper is that regret and the competitive ratio are incompatible within the formulation of SOCO, and it is quite interesting to try to understand how generally this holds.
For example, does the incompatibility result proven here extend to settings where the cost functions are random instead of adversarial, e.g., variations of SOCO such as k-armed bandit problems with switching costs?

8. REFERENCES

[1] M. Herbster and M. K. Warmuth, "Tracking the best expert," Mach. Learn., vol. 32, no. 2.
[2] N. Littlestone and M. K. Warmuth, "The weighted majority algorithm," Inf. Comput., vol. 108, no. 2.
[3] T. M. Cover, "Universal portfolios," Mathematical Finance, vol. 1, no. 1, pp. 1–29.
[4] N. Bansal, A. Blum, S. Chawla, and A. Meyerson, "Online oblivious routing," in Proc. ACM Symp. Parallel Algorithms and Architectures (SPAA), 2003.
[5] A. Kalai and S. Vempala, "Efficient algorithms for universal portfolios," in Proc. IEEE Symp. Foundations of Computer Science (FOCS).
[6] M. Zinkevich, "Online convex programming and generalized infinitesimal gradient ascent," in Proc. Int. Conf. Machine Learning (ICML), T. Fawcett and N. Mishra, Eds. AAAI Press, 2003.
[7] E. Hazan, A. Agarwal, and S. Kale, "Logarithmic regret algorithms for online convex optimization," Mach. Learn., vol. 69.
[8] D. Kusic and N. Kandasamy, "Risk-aware limited lookahead control for dynamic resource provisioning in enterprise computing systems," Cluster Computing, vol. 10, no. 4.
[9] A. Gandhi, V. Gupta, M. Harchol-Balter, and M. Kozuch, "Optimality analysis of energy-performance trade-off for server farm management," Performance Evaluation, no. 11.
[10] M. Lin, A. Wierman, L. L. H. Andrew, and E. Thereska, "Dynamic right-sizing for power-proportional data centers," in Proc. IEEE INFOCOM, 2011.


More information

Regret bounded by gradual variation for online convex optimization

Regret bounded by gradual variation for online convex optimization Noname manuscript No. will be inserted by the editor Regret bounded by gradual variation for online convex optimization Tianbao Yang Mehrdad Mahdavi Rong Jin Shenghuo Zhu Received: date / Accepted: date

More information

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research OLSO Online Learning and Stochastic Optimization Yoram Singer August 10, 2016 Google Research References Introduction to Online Convex Optimization, Elad Hazan, Princeton University Online Learning and

More information

Competitive Management of Non-Preemptive Queues with Multiple Values

Competitive Management of Non-Preemptive Queues with Multiple Values Competitive Management of Non-Preemptive Queues with Multiple Values Nir Andelman and Yishay Mansour School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel Abstract. We consider the online problem

More information

Experts in a Markov Decision Process

Experts in a Markov Decision Process University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2004 Experts in a Markov Decision Process Eyal Even-Dar Sham Kakade University of Pennsylvania Yishay Mansour Follow

More information

Bandits for Online Optimization

Bandits for Online Optimization Bandits for Online Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Bandits for Online Optimization 1 / 16 The multiarmed bandit problem... K slot machines Each

More information

Quasi-Newton Algorithms for Non-smooth Online Strongly Convex Optimization. Mark Franklin Godwin

Quasi-Newton Algorithms for Non-smooth Online Strongly Convex Optimization. Mark Franklin Godwin Quasi-Newton Algorithms for Non-smooth Online Strongly Convex Optimization by Mark Franklin Godwin A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy

More information

Distributed online optimization over jointly connected digraphs

Distributed online optimization over jointly connected digraphs Distributed online optimization over jointly connected digraphs David Mateos-Núñez Jorge Cortés University of California, San Diego {dmateosn,cortes}@ucsd.edu Mathematical Theory of Networks and Systems

More information

The Parking Permit Problem

The Parking Permit Problem The Parking Permit Problem Adam Meyerson November 3, 2004 Abstract We consider online problems where purchases have time durations which expire regardless of whether the purchase is used or not. The Parking

More information

A Polynomial-Time Algorithm for Pliable Index Coding

A Polynomial-Time Algorithm for Pliable Index Coding 1 A Polynomial-Time Algorithm for Pliable Index Coding Linqi Song and Christina Fragouli arxiv:1610.06845v [cs.it] 9 Aug 017 Abstract In pliable index coding, we consider a server with m messages and n

More information

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden Research Center 650 Harry Rd San Jose, CA 95120 ehazan@cs.princeton.edu Satyen Kale Yahoo! Research 4301

More information

arxiv: v1 [cs.ni] 18 Apr 2017

arxiv: v1 [cs.ni] 18 Apr 2017 Zijun Zhang Dept. of Computer Science University of Calgary zijun.zhang@ucalgary.ca Zongpeng Li Dept. of Computer Science University of Calgary zongpeng@ucalgary.ca Chuan Wu Dept. of Computer Science The

More information

Algorithms, Games, and Networks January 17, Lecture 2

Algorithms, Games, and Networks January 17, Lecture 2 Algorithms, Games, and Networks January 17, 2013 Lecturer: Avrim Blum Lecture 2 Scribe: Aleksandr Kazachkov 1 Readings for today s lecture Today s topic is online learning, regret minimization, and minimax

More information

Online Learning for Time Series Prediction

Online Learning for Time Series Prediction Online Learning for Time Series Prediction Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values.

More information

Hybrid Machine Learning Algorithms

Hybrid Machine Learning Algorithms Hybrid Machine Learning Algorithms Umar Syed Princeton University Includes joint work with: Rob Schapire (Princeton) Nina Mishra, Alex Slivkins (Microsoft) Common Approaches to Machine Learning!! Supervised

More information

Lecture 2: Paging and AdWords

Lecture 2: Paging and AdWords Algoritmos e Incerteza (PUC-Rio INF2979, 2017.1) Lecture 2: Paging and AdWords March 20 2017 Lecturer: Marco Molinaro Scribe: Gabriel Homsi In this class we had a brief recap of the Ski Rental Problem

More information

Online Learning with Experts & Multiplicative Weights Algorithms

Online Learning with Experts & Multiplicative Weights Algorithms Online Learning with Experts & Multiplicative Weights Algorithms CS 159 lecture #2 Stephan Zheng April 1, 2016 Caltech Table of contents 1. Online Learning with Experts With a perfect expert Without perfect

More information

Optimal Sleeping Mechanism for Multiple Servers with MMPP-Based Bursty Traffic Arrival

Optimal Sleeping Mechanism for Multiple Servers with MMPP-Based Bursty Traffic Arrival 1 Optimal Sleeping Mechanism for Multiple Servers with MMPP-Based Bursty Traffic Arrival Zhiyuan Jiang, Bhaskar Krishnamachari, Sheng Zhou, arxiv:1711.07912v1 [cs.it] 21 Nov 2017 Zhisheng Niu, Fellow,

More information

Distributed online optimization over jointly connected digraphs

Distributed online optimization over jointly connected digraphs Distributed online optimization over jointly connected digraphs David Mateos-Núñez Jorge Cortés University of California, San Diego {dmateosn,cortes}@ucsd.edu Southern California Optimization Day UC San

More information

Online Convex Optimization

Online Convex Optimization Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed

More information

On the Generalization Ability of Online Strongly Convex Programming Algorithms

On the Generalization Ability of Online Strongly Convex Programming Algorithms On the Generalization Ability of Online Strongly Convex Programming Algorithms Sham M. Kakade I Chicago Chicago, IL 60637 sham@tti-c.org Ambuj ewari I Chicago Chicago, IL 60637 tewari@tti-c.org Abstract

More information

Reproducing Kernel Hilbert Spaces

Reproducing Kernel Hilbert Spaces 9.520: Statistical Learning Theory and Applications February 10th, 2010 Reproducing Kernel Hilbert Spaces Lecturer: Lorenzo Rosasco Scribe: Greg Durrett 1 Introduction In the previous two lectures, we

More information

Worst-Case Bounds for Gaussian Process Models

Worst-Case Bounds for Gaussian Process Models Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis

More information

A Hierarchy of Suboptimal Policies for the Multi-period, Multi-echelon, Robust Inventory Problem

A Hierarchy of Suboptimal Policies for the Multi-period, Multi-echelon, Robust Inventory Problem A Hierarchy of Suboptimal Policies for the Multi-period, Multi-echelon, Robust Inventory Problem Dimitris J. Bertsimas Dan A. Iancu Pablo A. Parrilo Sloan School of Management and Operations Research Center,

More information

39 Better Scalable Algorithms for Broadcast Scheduling

39 Better Scalable Algorithms for Broadcast Scheduling 39 Better Scalable Algorithms for Broadcast Scheduling NIKHIL BANSAL, Eindhoven University of Technology RAVISHANKAR KRISHNASWAMY, Princeton University VISWANATH NAGARAJAN, IBM T.J. Watson Research Center

More information

Dynamic Capacity Provisioning of Heterogeneous Servers

Dynamic Capacity Provisioning of Heterogeneous Servers Dynamic Capacity Provisioning of Heterogeneous Servers Minghong Lin, Lachlan L. H. Andrew, Adam Wierman California Institute of Technology, Email: {mhlin,adamw}@caltech.edu Swinburne University of Technology,

More information

Approximation Metrics for Discrete and Continuous Systems

Approximation Metrics for Discrete and Continuous Systems University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science May 2007 Approximation Metrics for Discrete Continuous Systems Antoine Girard University

More information

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect

More information

A STAFFING ALGORITHM FOR CALL CENTERS WITH SKILL-BASED ROUTING: SUPPLEMENTARY MATERIAL

A STAFFING ALGORITHM FOR CALL CENTERS WITH SKILL-BASED ROUTING: SUPPLEMENTARY MATERIAL A STAFFING ALGORITHM FOR CALL CENTERS WITH SKILL-BASED ROUTING: SUPPLEMENTARY MATERIAL by Rodney B. Wallace IBM and The George Washington University rodney.wallace@us.ibm.com Ward Whitt Columbia University

More information

Lecture 16: FTRL and Online Mirror Descent

Lecture 16: FTRL and Online Mirror Descent Lecture 6: FTRL and Online Mirror Descent Akshay Krishnamurthy akshay@cs.umass.edu November, 07 Recap Last time we saw two online learning algorithms. First we saw the Weighted Majority algorithm, which

More information

Ad Placement Strategies

Ad Placement Strategies Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad

More information

A lower bound for scheduling of unit jobs with immediate decision on parallel machines

A lower bound for scheduling of unit jobs with immediate decision on parallel machines A lower bound for scheduling of unit jobs with immediate decision on parallel machines Tomáš Ebenlendr Jiří Sgall Abstract Consider scheduling of unit jobs with release times and deadlines on m identical

More information

Online Optimization : Competing with Dynamic Comparators

Online Optimization : Competing with Dynamic Comparators Ali Jadbabaie Alexander Rakhlin Shahin Shahrampour Karthik Sridharan University of Pennsylvania University of Pennsylvania University of Pennsylvania Cornell University Abstract Recent literature on online

More information

Scheduling for the tail: Robustness versus Optimality

Scheduling for the tail: Robustness versus Optimality Scheduling for the tail: Robustness versus Optimality Jayakrishnan Nair, Adam Wierman 2, and Bert Zwart 3 Department of Electrical Engineering, California Institute of Technology 2 Computing and Mathematical

More information

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon. Administration CSCI567 Machine Learning Fall 2018 Prof. Haipeng Luo U of Southern California Nov 7, 2018 HW5 is available, due on 11/18. Practice final will also be available soon. Remaining weeks: 11/14,

More information

arxiv: v2 [math.pr] 22 Dec 2014

arxiv: v2 [math.pr] 22 Dec 2014 Non-stationary Stochastic Optimization Omar Besbes Columbia University Yonatan Gur Stanford University Assaf Zeevi Columbia University first version: July 013, current version: December 4, 014 arxiv:1307.5449v

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Online Convex Optimization MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Online projected sub-gradient descent. Exponentiated Gradient (EG). Mirror descent.

More information

Lecture: Adaptive Filtering

Lecture: Adaptive Filtering ECE 830 Spring 2013 Statistical Signal Processing instructors: K. Jamieson and R. Nowak Lecture: Adaptive Filtering Adaptive filters are commonly used for online filtering of signals. The goal is to estimate

More information

Lecture 3: Lower Bounds for Bandit Algorithms

Lecture 3: Lower Bounds for Bandit Algorithms CMSC 858G: Bandits, Experts and Games 09/19/16 Lecture 3: Lower Bounds for Bandit Algorithms Instructor: Alex Slivkins Scribed by: Soham De & Karthik A Sankararaman 1 Lower Bounds In this lecture (and

More information

On-line Learning and the Metrical Task System Problem

On-line Learning and the Metrical Task System Problem Machine Learning, 39, 35 58, 2000. c 2000 Kluwer Academic Publishers. Printed in The Netherlands. On-line Learning and the Metrical Task System Problem AVRIM BLUM avrim+@cs.cmu.edu CARL BURCH cburch+@cs.cmu.edu

More information

Yevgeny Seldin. University of Copenhagen

Yevgeny Seldin. University of Copenhagen Yevgeny Seldin University of Copenhagen Classical (Batch) Machine Learning Collect Data Data Assumption The samples are independent identically distributed (i.i.d.) Machine Learning Prediction rule New

More information

The Algorithmic Foundations of Adaptive Data Analysis November, Lecture The Multiplicative Weights Algorithm

The Algorithmic Foundations of Adaptive Data Analysis November, Lecture The Multiplicative Weights Algorithm he Algorithmic Foundations of Adaptive Data Analysis November, 207 Lecture 5-6 Lecturer: Aaron Roth Scribe: Aaron Roth he Multiplicative Weights Algorithm In this lecture, we define and analyze a classic,

More information

Robustness and performance of threshold-based resource allocation policies

Robustness and performance of threshold-based resource allocation policies Robustness and performance of threshold-based resource allocation policies Takayuki Osogami Mor Harchol-Balter Alan Scheller-Wolf Computer Science Department, Carnegie Mellon University, 5000 Forbes Avenue,

More information

Online Convex Optimization for Dynamic Network Resource Allocation

Online Convex Optimization for Dynamic Network Resource Allocation 217 25th European Signal Processing Conference (EUSIPCO) Online Convex Optimization for Dynamic Network Resource Allocation Tianyi Chen, Qing Ling, and Georgios B. Giannakis Dept. of Elec. & Comput. Engr.

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

IN this paper, we consider the capacity of sticky channels, a

IN this paper, we consider the capacity of sticky channels, a 72 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 1, JANUARY 2008 Capacity Bounds for Sticky Channels Michael Mitzenmacher, Member, IEEE Abstract The capacity of sticky channels, a subclass of insertion

More information

New Algorithms for Contextual Bandits

New Algorithms for Contextual Bandits New Algorithms for Contextual Bandits Lev Reyzin Georgia Institute of Technology Work done at Yahoo! 1 S A. Beygelzimer, J. Langford, L. Li, L. Reyzin, R.E. Schapire Contextual Bandit Algorithms with Supervised

More information

Speed Scaling in the Non-clairvoyant Model

Speed Scaling in the Non-clairvoyant Model Speed Scaling in the Non-clairvoyant Model [Extended Abstract] ABSTRACT Yossi Azar Tel Aviv University Tel Aviv, Israel azar@tau.ac.il Zhiyi Huang University of Hong Kong Hong Kong zhiyi@cs.hku.hk In recent

More information

Dynamic Control of Running Servers

Dynamic Control of Running Servers Dynamic Control of Running Servers Esa Hyytiä 1,2, Douglas Down 3, Pasi Lassila 2, and Samuli Aalto 2 1 Department of Computer Science, University of Iceland, Iceland 2 Department of Communications and

More information

MDP Preliminaries. Nan Jiang. February 10, 2019

MDP Preliminaries. Nan Jiang. February 10, 2019 MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process

More information

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization racle Complexity of Second-rder Methods for Smooth Convex ptimization Yossi Arjevani had Shamir Ron Shiff Weizmann Institute of Science Rehovot 7610001 Israel Abstract yossi.arjevani@weizmann.ac.il ohad.shamir@weizmann.ac.il

More information

Operations Research Letters. Instability of FIFO in a simple queueing system with arbitrarily low loads

Operations Research Letters. Instability of FIFO in a simple queueing system with arbitrarily low loads Operations Research Letters 37 (2009) 312 316 Contents lists available at ScienceDirect Operations Research Letters journal homepage: www.elsevier.com/locate/orl Instability of FIFO in a simple queueing

More information

Information Storage Capacity of Crossbar Switching Networks

Information Storage Capacity of Crossbar Switching Networks Information Storage Capacity of Crossbar Switching etworks ABSTRACT In this work we ask the fundamental uestion: How many bits of information can be stored in a crossbar switching network? The answer is

More information

On the Impossibility of Black-Box Truthfulness Without Priors

On the Impossibility of Black-Box Truthfulness Without Priors On the Impossibility of Black-Box Truthfulness Without Priors Nicole Immorlica Brendan Lucier Abstract We consider the problem of converting an arbitrary approximation algorithm for a singleparameter social

More information

Time Series Prediction & Online Learning

Time Series Prediction & Online Learning Time Series Prediction & Online Learning Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values. earthquakes.

More information

The Online Approach to Machine Learning

The Online Approach to Machine Learning The Online Approach to Machine Learning Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53 Summary 1 My beautiful regret 2 A supposedly fun game I

More information

OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS

OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS Xiaofei Fan-Orzechowski Department of Applied Mathematics and Statistics State University of New York at Stony Brook Stony

More information

A Primal-Dual Randomized Algorithm for Weighted Paging

A Primal-Dual Randomized Algorithm for Weighted Paging A Primal-Dual Randomized Algorithm for Weighted Paging Nikhil Bansal Niv Buchbinder Joseph (Seffi) Naor April 2, 2012 Abstract The study the weighted version of classic online paging problem where there

More information

Exponential Weights on the Hypercube in Polynomial Time

Exponential Weights on the Hypercube in Polynomial Time European Workshop on Reinforcement Learning 14 (2018) October 2018, Lille, France. Exponential Weights on the Hypercube in Polynomial Time College of Information and Computer Sciences University of Massachusetts

More information

On the Partitioning of Servers in Queueing Systems during Rush Hour

On the Partitioning of Servers in Queueing Systems during Rush Hour On the Partitioning of Servers in Queueing Systems during Rush Hour This paper is motivated by two phenomena observed in many queueing systems in practice. The first is the partitioning of server capacity

More information

Learning with Large Number of Experts: Component Hedge Algorithm

Learning with Large Number of Experts: Component Hedge Algorithm Learning with Large Number of Experts: Component Hedge Algorithm Giulia DeSalvo and Vitaly Kuznetsov Courant Institute March 24th, 215 1 / 3 Learning with Large Number of Experts Regret of RWM is O( T

More information