A Tale of Two Metrics: Simultaneous Bounds on Competitiveness and Regret


Minghong Lin, Computer Science, California Institute of Technology
Adam Wierman, Computer Science, California Institute of Technology
Lachlan L. H. Andrew, Centre for Advanced Internet Architectures, Swinburne University of Technology
Alan Roytman, Computer Science, University of California, Los Angeles
Adam Meyerson, Google Inc.

ABSTRACT

We consider algorithms for smoothed online convex optimization (SOCO) problems. SOCO is a variant of the class of online convex optimization (OCO) problems that is strongly related to the class of metrical task systems (MTS). Both of these have a rich history and a wide range of applications. Prior literature on these problems has focused on two performance metrics: regret and the competitive ratio. There exist known algorithms with sublinear regret and known algorithms with constant competitive ratios; however, no known algorithm achieves both simultaneously. In this paper, we show that this is due to a fundamental incompatibility between these two metrics: no algorithm (deterministic or randomized) can achieve sublinear regret and a constant competitive ratio, even in the case when the objective functions are linear. However, we also exhibit an algorithm that, for the important special case of one-dimensional decision spaces, provides sublinear regret while maintaining a competitive ratio that grows arbitrarily slowly. To evaluate the performance of the algorithm in practice, we consider a case study focused on dynamic capacity provisioning in data centers.

1. INTRODUCTION

Online convex optimization has a wide array of applications, e.g., the k-experts problem [1, 2], portfolio management [3] and network routing [4]; there is also a significant literature developing and analyzing algorithms that can learn effectively in this setting, e.g., [5-7]. In an online convex optimization (OCO) problem, a learner interacts with an environment in a sequence of rounds.
During each round t: (i) the learner must choose an action x_t from a convex decision space F; (ii) the environment reveals a convex cost function c_t; and (iii) the learner experiences cost c_t(x_t). The goal is to design learning algorithms that minimize the cost experienced over a (long) horizon of T rounds. In this paper, we study a generalization of online convex optimization that we term smoothed online convex optimization (SOCO). The only change in SOCO compared to OCO is that the cost experienced by the learner is now c_t(x_t) + ‖x_t − x_{t−1}‖, where ‖·‖ is a norm. That is, the learner experiences an additional smoothing cost, or switching cost, associated with changing the action. By adding switching costs to online convex optimization, SOCO captures a wide variety of important applications. In fact, a strong motivation for studying SOCO is the recent development of dynamic capacity provisioning algorithms for data centers [8-15], where the goal is to dynamically control the number and placement of active servers (x_t) in order to minimize a combination of the delay and energy costs (captured by c_t) and the switching costs involved in cycling servers into power-saving modes and migrating data (‖x_t − x_{t−1}‖).
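As a concrete toy illustration of this cost structure, the total SOCO cost of an action sequence can be computed directly from the definition. The sketch below is ours, not from the paper: it uses scalar actions, the absolute-value norm, and made-up quadratic hit costs with a data-center flavour (x tracks a load level).

```python
def soco_cost(costs, actions, x0=0.0):
    """Total SOCO cost: hit costs c_t(x_t) plus switching costs |x_t - x_{t-1}|.

    `costs` is a list of per-round cost functions, `actions` the decisions x_t.
    Scalar actions and the absolute-value norm are assumed for simplicity.
    """
    total, prev = 0.0, x0
    for c, x in zip(costs, actions):
        total += c(x) + abs(x - prev)  # hit cost + switching cost
        prev = x
    return total

# Hypothetical loads; c_t(x) penalizes deviating from the current load.
loads = [0.2, 0.8, 0.3]
costs = [lambda x, l=l: (x - l) ** 2 for l in loads]

# Holding one action avoids switching but pays hit cost every round...
print(soco_cost(costs, [0.4, 0.4, 0.4]))
# ...while tracking the load pays no hit cost but switches every round.
print(soco_cost(costs, loads))
```

The tension between these two strategies is exactly what the switching-cost term introduces relative to plain OCO.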
Other computer systems applications of recent interest include video streaming [16], where the encoding quality of a video (x t ) needs to change dynamically in response to network congestion, but where large changes in encoding quality are visually annoying to users; and optical networking [17], in which there is a cost to reestablishing a light path. In addition to applications within computer systems, there are a number of problems in industrial optimization that can be modeled by SOCO. One that is of particular interest is the dynamic dispatching of electricity generators, where a particular portfolio of generation (x t ) must be allocated to cover demand, but in addition to the time-varying operating costs of the generators (c t ) there are significant setup and ramping costs associated with changing x t [18]. Further, many applications typically modeled using online convex optimization have, in reality, some cost associated with a change of action; and so may be better modeled using SOCO rather than OCO. For example, OCO encodes the so-called k-experts problem, which has many applications where switching costs can be important, e.g., in stock portfolio management there is a cost associated with adjusting the stock portfolio owned. In fact, switching costs have long been considered important in related learning problems, such as the k-armed bandit problem which has a considerable literature studying algorithms that can learn effectively
despite switching costs [19, 20]. Further, SOCO has applications even in contexts where there are no costs associated with switching actions. For example, if there is concept drift in a penalized estimation problem, then it is natural to make use of a regularizer (switching cost) term in order to control the speed of the drift of the estimator.

Two communities, two performance metrics. Though the precise formulation of SOCO does not appear to have been studied previously, there are two large bodies of literature that have studied formulations very related to SOCO: (i) the online convex optimization (OCO) literature within the machine learning community, e.g., [6, 7], and (ii) the metrical task system (MTS) literature within the algorithms community, e.g., [21, 22]. We discuss these literatures in detail in Section 3. While there are several differences between the formulations in the two communities, a notable difference is that they focus on different performance metrics. In the OCO literature, the goal is typically to design a learning algorithm that minimizes the regret, which is the difference between the cost of the algorithm and the cost of the offline optimal static solution. In this context, a number of algorithms have been shown to provide sub-linear regret (a.k.a. no regret). For example, online gradient descent can achieve O(√T)-regret [6]. Though such guarantees are proven only in the absence of switching costs, we show in Section 3.1 that the same regret bound also holds for SOCO. In the MTS literature, the goal is to design learning algorithms that minimize the competitive ratio, which is the maximum ratio between the cost of the algorithm and the cost of the offline optimal (dynamic) solution. In this setting, most results tend to be negative, e.g., for any metric space, the competitive ratio of an MTS with states chosen from that space grows without bound as the number of states grows [21, 23].
However, recent work has shown that positive results can be obtained in the case when the cost function and decision space are convex. In the data center context, when x_t is scalar, Lazy Capacity Provisioning (LCP) is 3-competitive [10]. Interestingly, the focus on different performance metrics in the OCO and MTS communities has led the communities to develop quite different styles of algorithms. The differences between the algorithms are highlighted by the fact that the algorithms developed in the OCO community universally perform poorly with respect to the competitive ratio, and the algorithms developed in the MTS community universally perform poorly with respect to regret.

Summary of contributions. This paper seeks to understand the relationship between algorithms that focus on minimizing regret and those that focus on minimizing the competitive ratio. To this end, the two questions we seek to answer are: (i) Can an algorithm simultaneously achieve both a constant competitive ratio and sub-linear regret? (ii) Can bounds on the competitive ratio be translated into bounds on the regret, and vice versa? Despite the rich literatures focusing on SOCO-like formulations in each of the OCO and MTS communities, there are very few examples of research along the directions above. Perhaps the first, and for a long time only, such paper is the work of Blum and Burch [24], which provides some connections between the OCO and MTS literatures in the special case where the switching costs are a fixed constant. In this context, [24] shows how to translate bounds on regret into bounds on the competitive ratio, and vice versa. However, beyond this result there has been almost no success at connecting the two literatures until a recent paper of Buchbinder et al. [25].
This paper uses a primal-dual approach to develop an algorithm that can do well for the α-unfair competitive ratio, which is a hybrid of the competitive ratio and regret that provides comparison to the dynamic optimal when α = 1 and to the static optimal as α → ∞ (see Section 2). The algorithm makes heavy use of α, and so does not simultaneously perform well for regret and the competitive ratio, but the result highlights the feasibility of unified approaches for algorithm design across the competitive ratio and regret. Additionally, it is worth noting that there is work focused on achieving simultaneous guarantees with respect to the static and dynamic optimal solutions in other settings, e.g., decision making on lists and trees [26], and there have been some attempts to use algorithmic approaches from machine learning in the context of MTSs [27, 28]. In the current paper, we provide answers to both of the questions above in the context of SOCO. In answer to the first question, we show that there is a fundamental incompatibility between regret and the competitive ratio: no algorithm can maintain both sublinear regret and a constant competitive ratio (Theorem 2). This incompatibility does not stem from a pathological example: it holds even in the simple case when c_t is linear and x_t is scalar. Further, it holds for both deterministic and randomized algorithms and when the α-unfair competitive ratio is considered. Thus, it is no accident that algorithms from the OCO community perform poorly with respect to the competitive ratio and algorithms from the MTS community perform poorly with respect to regret. This is disappointing, since there are many applications where providing guarantees on both regret and the competitive ratio would be desirable.
For example, providing sub-linear regret can be seen as providing justification for incurring the added complexity of implementing a dynamic control policy (you always improve upon any simple static policy), while providing a constant competitive ratio can be seen as providing a near-optimal guarantee. Though providing sub-linear regret and a constant competitive ratio is impossible, we show that it is possible to nearly achieve this goal. More specifically, we present a new algorithm named Randomly Biased Greedy (RBG) which achieves a competitive ratio of (1 + γ) while maintaining O(max{T/γ, γ}) regret for γ ≥ 1 on one-dimensional action spaces. If γ can be chosen as a function of T, then this algorithm can be used to balance between regret and the competitive ratio. For example, it can achieve sub-linear regret while having an arbitrarily slowly growing competitive ratio, or it can achieve O(√T) regret while maintaining an O(√T) competitive ratio. Note that, unlike the scheme of [25], this algorithm has a finite competitive ratio on continuous action spaces and provides a simultaneous guarantee on both regret and the competitive ratio. The main limitation of RBG is that the guarantees on regret only hold when the action space is one-dimensional (unlike those for the algorithm in [25]). However, this special case is important in a number of settings. For example, it has been applied for dynamic capacity provisioning in data centers [10, 11], it is a crucial subproblem for k-server on binary trees [29], it has been used to study the handover of cell phones [30, 31], and it can be used for
controlling the degradation of quality in video with varying encoding rate [16]. Importantly, the analysis of RBG provides an answer to the second question above. In particular, the analysis provides a generic approach that can be used to translate between bounds on the competitive ratio and bounds on the regret, as well as a general technique for parameterizing any algorithm in order to achieve simultaneous guarantees on the competitive ratio and regret. Finally, we illustrate the contrast between algorithms focused on regret and algorithms focused on the competitive ratio using a case study in Section 6. Additionally, this case study provides a practical evaluation of the performance of RBG. The case study we focus on is that of dynamic capacity provisioning in a data center, i.e., dynamically adapting the number of active servers in a data center in response to dynamic load.

2. PROBLEM FORMULATION

Our focus is on smoothed online convex optimization (SOCO) problems, a class of online optimization problems that arises in a wide range of applications, as discussed in the introduction. To ground the work in this paper, we present a case study of dynamic capacity provisioning in data centers in Section 6. Formally, a SOCO problem is defined as follows. There is a convex decision/action space F ⊆ (R₊)ⁿ and a sequence of cost functions {c_1, c_2, ...}, where each c_t : F → R₊. At each time t, a learner/algorithm chooses an action vector x_t ∈ F and the environment chooses a cost function c_t. The algorithm is then evaluated on the cost function and pays a switching penalty/cost corresponding to the difference between consecutive actions.

Cost formulation. More specifically, consider the following sequence of environment revelations and algorithm decisions: ..., x_t, c_t, x_{t+1}, c_{t+1}, ....
The cost of a (strictly) online algorithm A over a horizon of T rounds is

    C(A) = E[ ∑_{t=1}^T c_t(x_t) + ‖x_t − x_{t−1}‖ ],

where x_1, ..., x_T are the decisions of algorithm A, x_0 = 0 without loss of generality, the expectation is over any randomness used by the algorithm, and ‖·‖ is a seminorm on Rⁿ. (Recall that a seminorm satisfies the axioms of a norm except that ‖x‖ = 0 does not imply x = 0.) We write C°(A) when we do not wish to include switching costs, i.e., C°(A) = E[ ∑_{t=1}^T c_t(x_t) ]. As we have discussed in the introduction, there are two main literatures that focus on SOCO-like formulations: OCO and MTS. In these two literatures, the orders of events in a round are different. The OCO setting follows the ordering described above, where the algorithm plays first. However, in the MTS setting the environment plays first. Consequently, the algorithm can be considered to have a 1-step lookahead. In general, an algorithm may have a longer lookahead, and so we define the cost of an online algorithm A with lookahead i as

    C_i(A) = E[ ∑_{t=1}^T c_t(x_{t+i}) + ‖x_{t+i} − x_{t+i−1}‖ ],

where x_i = 0 without loss of generality. Using this notation, the cost considered by the MTS literature is C_1. As above, we write C°_i(A) if we do not want to consider switching costs. Importantly, it is possible to relate C_i to C_{i−1}, and thus to relate the cost considered by the MTS literature to the cost considered by the OCO literature, as done in [24, 25]. Specifically, the penalty incurred by the learner due to acting before the t-th cost function is revealed can be bounded by

    c_t(x_t) − c_t(x_{t+1}) ≤ ∇c_t(x_t) · (x_t − x_{t+1}) ≤ ‖∇c_t(x_t)‖₂ ‖x_t − x_{t+1}‖₂,   (1)

where ‖·‖₂ is the Euclidean norm. A common assumption in OCO, which we adopt in this paper, is that the ‖∇c_t(·)‖₂ are bounded on a given instance, and thus the difference between the OCO and MTS costs (i.e., C_0 and C_1) is not too large.
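The chain of inequalities in (1) follows from convexity (first-order condition) and Cauchy-Schwarz, and is easy to sanity-check numerically. The sketch below uses the convex function c(x) = ‖x‖₂², whose gradient is 2x; the quadratic is our choice of example, since the argument only needs convexity and bounded gradients.

```python
import random

random.seed(0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm2(u):
    return dot(u, u) ** 0.5

# Check c(x) - c(y) <= grad_c(x).(x - y) <= ||grad_c(x)||_2 ||x - y||_2
# on random point pairs, for c(x) = ||x||_2^2 with grad_c(x) = 2x.
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(3)]
    y = [random.gauss(0, 1) for _ in range(3)]
    grad = [2 * a for a in x]
    diff = [a - b for a, b in zip(x, y)]
    lhs = dot(x, x) - dot(y, y)        # c(x) - c(y)
    mid = dot(grad, diff)              # first-order (convexity) bound
    rhs = norm2(grad) * norm2(diff)    # Cauchy-Schwarz bound
    assert lhs <= mid + 1e-9 <= rhs + 2e-9
print("inequality (1) holds on 1000 random pairs")
```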
One final variation on the cost function that we consider is the α-penalized cost, which penalizes changes in action by an extra factor of α ≥ 1. Formally, we define C_i^α as follows:

    C_i^α(A) = E[ ∑_{t=1}^T c_t(x_{t+i}) + α‖x_{t+i} − x_{t+i−1}‖ ].

Performance metrics. To evaluate the performance of algorithms for SOCO problems, it is typical to compare the cost of the algorithm with the cost of an offline optimal solution. However, there are a wide variety of optimal solutions that serve as benchmarks for comparison. Further, the form of the comparison can be additive or multiplicative, depending on the community. Within the OCO literature, the typical benchmark that is compared against is the optimal offline static action, i.e.,

    OPT_s = min_{x ∈ F} ∑_{t=1}^T c_t(x).

Using this benchmark for comparison, the most typical performance metric considered is the regret. The regret of an online learning algorithm is defined as the (additive) difference between its cost and the cost of the optimal static action vector. Specifically, the regret of algorithm A on instances C, R°(A), is less than ρ(T) if for any sequence of cost functions (c_1, ..., c_T) ∈ C,

    C°(A) − OPT_s ≤ ρ(T).   (2)

In the traditional definition of regret given above, switching costs and lookahead are not considered. However, a natural generalization is R_i(A), for which C°(A) is replaced by C_i(A) in (2). The larger the set C, the stronger an upper bound on regret is, and the smaller C, the stronger a lower bound is. In this paper, except where noted, we take C to be the set of sequences of convex functions mapping (R₊)ⁿ to R₊ with gradient uniformly bounded over the sequence. Within the MTS literature, the typical benchmark that is compared against is the optimal offline (dynamic) solution:

    OPT_d = min_{(x_1, ..., x_T) ∈ F^T} ∑_{t=1}^T c_t(x_t) + ‖x_t − x_{t−1}‖.

Note that the minimal cost is the same regardless of lookahead, since the cost functions are fixed.
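In one dimension, the static benchmark OPT_s and the resulting regret of a given action sequence can be computed by brute force. A minimal sketch, with a discretized decision space and cost functions of our own choosing (`static_opt` and `regret` are hypothetical helper names, not from the paper):

```python
def static_opt(costs, grid):
    """OPT_s: the best single action held for the whole horizon (1-D grid search)."""
    return min(sum(c(x) for c in costs) for x in grid)

def regret(costs, actions, grid):
    """Additive gap between the algorithm's hit cost and OPT_s (no switching costs)."""
    alg_cost = sum(c(x) for c, x in zip(costs, actions))
    return alg_cost - static_opt(costs, grid)

grid = [i / 100 for i in range(101)]  # F = [0, 1], discretized
costs = [lambda x, l=l: (x - l) ** 2 for l in [0.0, 1.0, 0.0, 1.0]]

# The targets alternate between 0 and 1, so the best static action is x = 1/2.
print(static_opt(costs, grid))             # 1.0  (four rounds at cost 1/4 each)
print(regret(costs, [0.5, 0.5, 0.5, 0.5], grid))
```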
Using this benchmark for comparison, the most typical performance metric considered is the competitive ratio, which compares the cost of the algorithm to that of the offline optimal. In the MTS literature, the cost typically considered is C_1. However, in more generality, the competitive ratio with lookahead i, denoted by CR_i(A), is ρ(T) if for any sequence of cost functions (c_1, ..., c_T) ∈ C,

    C_i(A) ≤ ρ(T) · OPT_d + O(1).   (3)

The case of i = 1 corresponds to the typical notion of competitive ratio studied in the MTS literature (since C_1 is used to measure the cost in this literature).

Bridging competitiveness and regret. Clearly, there is significant motivation for studying regret and the competitive ratio. The goal of this paper is to understand the relationship between these measures. There are two important contrasts between regret and the competitive ratio that are important to distinguish: (i) the benchmarks chosen for comparison are the static and dynamic offline optimal solutions, respectively, and (ii) the comparisons are additive and multiplicative, respectively. A main result of this paper is that regret and the competitive ratio are fundamentally incompatible, in that no algorithm can perform well on both measures. Thus, it is important to understand which of (i) and/or (ii) above leads to this incompatibility. To tease apart these issues, we need to define variations of the competitive ratio and regret that allow us to bridge these differences. First, there are a variety of options for bridging (i). In particular, there are a number of hybrid measures in both the OCO and MTS literatures that use benchmarks that are, in some sense, in between the static optimal and the dynamic optimal. For example, Adaptive-Regret [32] is defined as the maximum regret over any interval, where the static optimum can differ for different intervals, and internal regret [33] compares the online policy against a simple perturbation of that policy. The metric we use is termed the α-unfair competitive ratio [23-25], and we denote it by CR_i^α(A) in the case of i-lookahead.
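For intuition, the dynamic benchmark can be computed exactly on a discretized one-dimensional decision space by dynamic programming over the previous action. The sketch below is illustrative only (the helper name, grid, and cost functions are ours): it computes the α-unfair dynamic optimum defined next, and shows how raising α pushes the optimum from a tracking trajectory toward a static one.

```python
def alpha_unfair_opt(costs, grid, alpha, x0=0.0):
    """min over trajectories of sum_t c_t(x_t) + alpha * |x_t - x_{t-1}|,
    computed by dynamic programming over a discretized 1-D action space."""
    best = {x0: 0.0}  # best[p] = cheapest cost of any trajectory ending at p
    for c in costs:
        best = {x: min(v + alpha * abs(x - p) for p, v in best.items()) + c(x)
                for x in grid}
    return min(best.values())

grid = [0.0, 0.5, 1.0]
costs = [lambda x, l=l: 2 * abs(x - l) for l in [0.0, 1.0, 0.0, 1.0]]

print(alpha_unfair_opt(costs, grid, alpha=1.0))   # 3.0: tracking the minimizers wins
print(alpha_unfair_opt(costs, grid, alpha=10.0))  # 4.0: staying at x = 0 wins
```

With α = 1 this is the dynamic optimum used by the competitive ratio; for large α the switching penalty forces an essentially static trajectory.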
Formally, CR_i^α(A) is defined exactly the same as the competitive ratio except that the benchmark for comparison is

    OPT_d^α = min_{(x_1, ..., x_T) ∈ F^T} ∑_{t=1}^T c_t(x_t) + α‖x_t − x_{t−1}‖,

where α ≥ 1. Specifically, CR_i^α(A) is ρ(T) if (3) holds with OPT_d replaced by OPT_d^α. Note that OPT_d^α transitions between the dynamic optimal (when α = 1) and the static optimal (for large enough α). Second, it is simple to bridge difference (ii). To this end, we define the competitive difference as follows. In particular, the α-unfair competitive difference with lookahead i on instances C, CD_i^α(A), is the additive comparison with the dynamic optimal. That is, CD_i^α(A) is ρ(T) if for any sequence of cost functions (c_1, ..., c_T) ∈ C,

    C_i(A) − OPT_d^α ≤ ρ(T).   (4)

This measure allows for a smooth transition between regret (when α is large enough) and an additive version of the competitive ratio (when α = 1).

3. BACKGROUND

As we have already mentioned, there are two large literatures that have focused on designing algorithms for SOCO-like problems: the online convex optimization (OCO) literature in the machine learning community and the metrical task system (MTS) literature in the algorithms community. Importantly, the OCO literature has primarily focused on developing algorithms to minimize regret, while the MTS literature has primarily focused on developing algorithms to minimize the competitive ratio. In the following, we briefly discuss related work from these two communities in order to provide context for the results in the current paper.

3.1 Online convex optimization

The OCO problem has a rich history and a wide range of important applications. Historically, in computer science, OCO is perhaps most associated with the so-called k-experts problem [1, 2], a discrete-action version of online optimization wherein at each round t the learning algorithm must choose one of k possible actions, which can be viewed as following the advice of one of k experts.
However, there are also a number of other areas where OCO has a long history, e.g., portfolio management [3, 34]. The traditional OCO formulation is very related to the SOCO formulation introduced in Section 2. The key differentiating features are that (i) the typical performance metric considered is regret, (ii) switching costs are not considered, and (iii) the learner must act before the environment reveals the cost function. So, the cost function of interest in this literature is C°(A) and the performance metric of interest is R°(A). Following the treatment of [6, 7], the typical assumptions are that the decision space F is non-empty, bounded and closed, and that the Euclidean norms of the gradients, ‖∇c_t(·)‖₂, are also bounded. In this setting, a number of algorithms have been shown to achieve sublinear regret, i.e., R°(A) = o(T). An important example of a sublinear-regret, a.k.a. no-regret, learning algorithm is online gradient descent (OGD), which is parameterized by learning rates η_t. OGD works as follows.

Algorithm 1 (Online Gradient Descent, OGD). Select arbitrary x_1 ∈ F. In time step t ≥ 1, select

    x_{t+1} = P(x_t − η_t ∇c_t(x_t)),

where P(y) = arg min_{x ∈ F} ‖x − y‖₂ is the projection under the Euclidean norm.

Note that by choosing the learning rates η_t carefully, OGD is able to achieve sub-linear regret for OCO. In particular, the variation in [6] uses η_t = Θ(1/√t) and obtains O(√T)-regret. Other variations are also possible. For example, [7] achieves a tighter regret bound of O(log T) by choosing η_t = 1/(γt) in settings where the cost functions additionally have minimal curvature, i.e., ∇²c_t(x) − γI_n is positive semidefinite for all x and t, where I_n is the identity matrix of size n. In addition to algorithms based on gradient descent, more recent algorithms such as Online Newton Step and Follow the Approximate Leader [7] also attain O(log T)-regret bounds for a class of cost functions. Note that none of the work discussed above considers switching costs.
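A minimal sketch of Algorithm 1 on a one-dimensional decision space F = [lo, hi], with the η_t = 1/√t schedule of [6]. The toy quadratic instance and helper names are ours, for illustration only.

```python
import math

def ogd(grad_fns, lo=0.0, hi=1.0, x1=0.0):
    """Online gradient descent on F = [lo, hi] with step sizes eta_t = 1/sqrt(t).

    `grad_fns` is a list of gradient functions, one per round; grad_fns[t-1](x)
    is the gradient of c_t at x (revealed only after x_t is played).
    Returns the played actions x_1, ..., x_{T+1}.
    """
    xs = [x1]
    for t, grad in enumerate(grad_fns, start=1):
        eta = 1.0 / math.sqrt(t)
        step = xs[-1] - eta * grad(xs[-1])
        xs.append(min(hi, max(lo, step)))  # Euclidean projection onto [lo, hi]
    return xs

# Hypothetical instance: c_t(x) = (x - 0.7)^2 every round, so grad = 2(x - 0.7).
grad_fns = [lambda x: 2 * (x - 0.7)] * 20
xs = ogd(grad_fns)
print(xs[-1])  # the iterates settle at the static optimum 0.7
```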
To extend the literature discussed above from OCO to SOCO, we need to track the switching costs incurred by the algorithms. Intuitively, since the algorithms rely on learning rates η_t that decay with t, choosing η_t appropriately provides a parameter for controlling the switching costs incurred by the algorithm. Interestingly, the choices of η_t used by the algorithms designed for OCO also turn out to be good choices for controlling switching costs. In particular, all of the classical gradient descent learning algorithms still have sub-linear regret for SOCO.

Proposition 1. Consider an online gradient descent algorithm A on a finite dimensional space with learning rates
such that ∑_{t=1}^T η_t = O(ρ₁(T)). If R°(A) = O(ρ₂(T)), then R₀(A) = O(ρ₁(T) + ρ₂(T)).

The proof is included in Appendix A. An immediate consequence of Proposition 1 is that the online gradient descent algorithms in [6, 7], which use η_t = 1/√t and η_t = 1/(γt), are still O(√T)-regret and O(log T)-regret respectively when switching costs are considered, since in these cases ρ₁(T) = O(ρ₂(T)). Note that a similar result can be obtained for Online Newton Step [7]. Importantly, though the regret of OGD algorithms is sublinear, it can easily be shown that the competitive ratio of these algorithms is unbounded.

3.2 Metrical Task Systems

Like OCO, MTS also has a rich history and a wide range of important applications. Historically, MTS is perhaps most associated with the so-called k-server problem [29]. In this problem, there are k servers, each in some state, and a sequence of requests is incrementally revealed. To serve a request, the system must choose one of the servers to move to the state necessary to serve the request, which happens at some cost depending on the source and destination states. The formulation of SOCO in Section 2 is actually, in many ways, a special case of the MTS formulation. In general, the MTS formulation differs in that (i) the cost functions c_t are not assumed to be convex, (ii) the decision space is typically assumed to be discrete and is not necessarily embedded in a vector space, and (iii) the switching cost is an arbitrary metric d(x_t, x_{t−1}) rather than a seminorm ‖x_t − x_{t−1}‖. In this context, the cost function studied by MTS is typically C_1 and the performance metric of interest is the competitive ratio, specifically CR_1(A), although the α-unfair competitive ratio CR_1^α also receives attention.
The weakening of the assumptions on the cost functions, and the fact that the competitive ratio uses the dynamic optimum as the benchmark, mean that most of the results in the MTS setting are negative when compared with those for OCO. In particular, it has been proven that, given an arbitrary metric decision space of size n, any deterministic algorithm must be Ω(n)-competitive [21]. Further, any randomized algorithm must be Ω(√(log n / log log n))-competitive [23]. These results motivate imposing additional structure on the cost functions in order to attain positive results. For example, it is commonly assumed that the metric is the uniform metric, in which d(x, y) is equal for all x ≠ y; that assumption was made in [25] in its study of the tradeoff between the competitive ratio and regret. For comparison with OCO, an alternative natural restriction is to impose convexity assumptions on the cost function and the decision space, as done in this paper. Upon restricting c_t to be convex, F to be convex, and ‖·‖ to be a seminorm, the MTS formulation becomes quite similar to the SOCO formulation. This restricted class has been the focus of a number of recent papers, and some positive results have emerged. In particular, [10] shows that when F is a one-dimensional normed space (it is sufficient to consider the absolute value norm, since for every seminorm on R, ‖x‖ = ‖1‖ |x|), a deterministic online algorithm named Lazy Capacity Provisioning (LCP) is 3-competitive. However, if the additional restriction that F be a one-dimensional normed space is removed, the problem is again hard. In particular, we can perform a reduction from the parking permit problem [35] to show that the competitive ratio for SOCO is unbounded. But lookahead can provide a tool to avoid this hardness. In particular, a recent algorithm named Averaging Fixed Horizon Control (AFHC) [11] can obtain a competitive ratio of 1 + O(1/i) when i-lookahead is considered, i.e., CR_i(AFHC) = 1 + O(1/i).
Importantly, though the algorithms described above provide constant competitive ratios, in all cases it is easy to see that their regret is linear.

4. THE INCOMPATIBILITY OF REGRET AND THE COMPETITIVE RATIO

Though regret and the competitive ratio have largely been studied independently in the OCO and MTS literatures, respectively, it is natural to strive for algorithms that have strong guarantees with respect to both. In fact, in many settings there is significant motivation for studying each of these two metrics. For example, if the goal is to learn a static concept (e.g., the c_t are stationary in some sense), then regret is the natural measure. However, if one is attempting to learn a dynamic concept, then the comparison point is the dynamic (offline) optimal, and so the competitive ratio is a natural metric. Thus, doing well for both the competitive ratio and regret corresponds to being able to learn both static and dynamic concepts well. Similarly, the importance of both metrics can be seen naturally in the language of control. In such a context, regret measures the performance relative to the optimal static control policy, and so if one wishes to show that dynamic control is not more risky than static control, one needs to exhibit an algorithm with small regret. However, once dynamic control is being used, one wishes to have an optimality guarantee compared to the dynamic (offline) optimal, and so the competitive ratio is a natural measure. So, one would hope for an algorithm which is always as good as the optimal static controller and nearly as good as the best dynamic controller. Interestingly, it is easy to see that none of the algorithms we have discussed to this point achieves both goals. For example, though online gradient descent has sublinear regret, its competitive ratio is infinite. Similarly, though LCP is 3-competitive, it has linear regret. It turns out that this is no accident.
We show below that the two goals are fundamentally incompatible. That is, any algorithm that has sublinear regret for OCO necessarily has an infinite competitive ratio for MTS; and any algorithm that has a constant competitive ratio for MTS necessarily has at least linear regret for OCO. Further, our results give lower bounds on the simultaneous guarantees that are possible for regret and the competitive ratio. In discussing this incompatibility, there are a number of subtleties as a result of the differences in formulation between the OCO literature, where regret is the focus, and the MTS literature, where competitive ratio is the focus. In particular, there are four key differences which are important to highlight: (i) the use of C 0 in OCO and C 1 in MTS; (ii) OCO does not consider switching costs while MTS does; (iii) regret relies on an additive comparison while the competitive ratio relies on a multiplicative comparison; and (iv) regret compares to the static optimal while competitive ratio compares to the dynamic optimal. Note that the first two are intrinsic to the costs, while the latter are intrinsic to the performance metric. The following teases apart which of these differences create incompatibility and which do not. In particular, we prove that (i) and (iv) create incompati-

6 bilities. Our main result in this section states that there is an incompatibility between regret in the OCO setting and the competitive ratio in the MTS setting, i.e., between R (A) and CR 1(A). Naturally, the incompatibility remains if switching costs are added to regret, i.e., R 0(A) is considered. Further, the incompatibility remains when the competitive difference is considered, and so both the comparison with the static optimal and the dynamic optimal are additive. In fact, the incompatibility remains even when the α-unfair competitive ratio/difference is considered. Perhaps most surprisingly, the incompatibility remains when there is lookahead, i.e., when C i and C i+1 are considered. Theorem 2. Consider an arbitrary seminorm on R n, constants γ > 0, α > 0 and i N. There is a C containing a single sequence of cost functions such that, for large enough T, for all deterministic and randomized algorithms A, CR α i+1(a), CD α i+1, and R i(a) must satisfy CR i+1(a) α + Ri(A) γ, T and (5) CD i+1(a) α + R i(a) γt. (6) Note that the proof of Theorem 2 uses linear cost functions and a one-dimensional decision space, and the construction of the cost functions does not depend on T. This highlights that the incompatibility arises even in simple instances. The impact of this result can be stark. In particular, it is impossible for a learning algorithm to be able to learn static concepts with sublinear regret in the OCO setting, while having a constant competitive ratio (sub linear competitive difference) for learning dynamic concepts in the MTS setting. Even more strikingly, in the context of control theory, any dynamic controller that has a constant competitive ratio (sublinear competitive difference), must have linear regret, and so there must be cases where it does significantly worse than the best static controller. 
Thus, one cannot simultaneously guarantee that a dynamic policy is always as good as the best static policy and nearly as good as the optimal dynamic policy.

One aspect of Theorem 2 that is worth discussing further is that the cost functions used by regret and the competitive ratio are off by one. This is motivated by the different settings in the OCO and MTS literatures; however, it is natural to ask the same question for R_0(A) and CR^α_0(A). A parallel result to Theorem 2 holds in this context, but it turns out that the result is primarily due to the hardness of attaining a small CR^α_0, i.e., the hardness of mimicking the dynamic optimum without 1-step lookahead. Note that this is not the case for Theorem 2.

Proposition 3. Consider an arbitrary seminorm ‖·‖ on R^n, constants γ > 0 and α > 0, and a deterministic or randomized online algorithm A. For large enough T, CR^α_0(A), CD^α_0(A), and R_0(A) must satisfy

    CR^α_0(A) + R_0(A)/T ≥ γ, and    (7)
    CD^α_0(A) + R_0(A) ≥ γT.    (8)

Proofs. We now prove Theorem 2 and Proposition 3 using one-dimensional examples. Note that these examples can easily be embedded into higher dimensions if desired. [Footnote 3: To embed these examples into (R_+)^n, replace the functions f^α_i(x) by θ^α_i(x_1, ..., x_n) = f^α_i(x_1). The lower bounds on C^α_i and C^α_{i+1} do not depend on the choice of norm. Moreover, the (static or dynamic) optimum is no worse than the solution constrained to have x_2 = ... = x_n = 0. This means that the theorems hold when ‖·‖ is replaced by the seminorm defined by ‖(x_1, ..., x_n)‖' = ‖(x_1, 0, ..., 0)‖. With this substitution, all costs depend only on x_1, and the result follows from the one-dimensional case.]

Let ᾱ = max(1, α). Given a > 0 and b > 0, define two possible cost functions on F = [0, 1/ᾱ]: f^α_1(x) = b + aᾱx and f^α_2(x) = b + a(1 − ᾱx). These functions are similar to those used in [36] to study online gradient descent for a concept of bounded total variation. To simplify notation, let D(t) = 1/2 − ᾱE[x^t], and note that D(t) ∈ [−1/2, 1/2].

Proof of Theorem 2. Consider a system with costs c^t = f^α_1 if t is odd and c^t = f^α_2 if t is even. Then

    C_i(A) ≥ Σ_{t odd} (aᾱE[x^{t+i}] + b) + Σ_{t even} (a(1 − ᾱE[x^{t+i}]) + b)
           = bT + a Σ_{t=1}^T (1/2 + (−1)^t D(t + i))
           = (a/2 + b)T + a Σ_{t=1}^T (−1)^t D(t + i),

and similarly

    C_{i+1}(A) ≥ (a/2 + b)T + a Σ_{t=1}^T (−1)^t D(t + i + 1).

The static optimum is no worse than the scheme that sets x^t = 1/(2ᾱ) for all t, which has total cost at most (a/2 + b)T + 1/2. The α-unfair dynamic optimum for C_{i+1} is no worse than the scheme that sets x^t = 0 if t is odd and x^t = 1/ᾱ if t is even, which has total α-unfair cost at most (b + 1)T. Hence

    R_i(A) ≥ a Σ_{t=1}^T (−1)^t D(t + i) − 1/2,
    CR^α_{i+1}(A) ≥ [(a/2 + b)T + a Σ_{t=1}^T (−1)^t D(t + i + 1)] / [(b + 1)T],
    CD^α_{i+1}(A) ≥ (a/2 − 1)T + a Σ_{t=1}^T (−1)^t D(t + i + 1).

Thus, since D(t) ∈ [−1/2, 1/2],

    (b + 1)T (CR^α_{i+1}(A) + R_i(A)/T) ≥ (a/2 + b)T − (b + 1)/2 + a Σ_{t=1}^T (−1)^t (D(t + i + 1) + (b + 1)D(t + i)),

where

    a Σ_{t=1}^T (−1)^t (D(t + i + 1) + (b + 1)D(t + i))
        = ab Σ_{t=1}^T (−1)^t D(t + i) − a(D(i + 1) − (−1)^T D(T + i + 1))
        ≥ −abT/2 − a.

To establish (5), it is then sufficient that

    (a/2 + b)T − (b + 1)/2 − abT/2 − a ≥ γ(b + 1)T.

For b = 1/2 and a = 30γ + 30, this holds for T ≥ 5.

Similarly,

    CD^α_{i+1}(A) + R_i(A) ≥ (a/2 − 1)T − 1/2 + a((−1)^T D(T + i + 1) − D(i + 1))
                           ≥ (a/2 − 1)T − 1/2 − a.

Taking a = 30γ + 30 again establishes (6) for T ≥ 3.

Proof of Proposition 3. To prove Proposition 3, consider the cost function sequence with c^t(·) = f^0_2 if E[x^t] ≤ 1/2 and c^t(·) = f^0_1 otherwise, on decision space [0, 1], where x^t is the (random) choice of the algorithm at round t. Here the expectation is taken over the marginal distribution of x^t conditioned on c^1, ..., c^{t−1}, averaging out the dependence on the realizations of x^1, ..., x^{t−1}. Notice that this sequence can be constructed by an oblivious adversary before the execution of the algorithm. The following lemma, proven in Appendix B, is the key step in establishing Proposition 3.

Lemma 4. Given any algorithm A, the sequence of cost functions chosen by the above oblivious adversary makes the regret, the α-unfair competitive ratio, and the α-unfair competitive difference satisfy

    R_0(A), R(A) ≥ a Σ_{t=1}^T |1/2 − E[x^t]| − 1/2,    (9)
    CR^α_0(A) ≥ [(a/2 + b)T + a Σ_{t=1}^T |1/2 − E[x^t]|] / [(b + α)T],    (10)
    CD^α_0(A) ≥ (a/2 − α)T + a Σ_{t=1}^T |1/2 − E[x^t]|.    (11)

From (9) and (10) in Lemma 4, we have

    CR^α_0(A) + R_0(A)/T ≥ (a/2 + b)/(b + α) − 1/(2T).

For a > 2γ(b + α), the right-hand side is bigger than γ for sufficiently large T, which satisfies (7). Next, from (9) and (11),

    CD^α_0(A) + R_0(A) ≥ (a/2 − α)T − 1/2.

By choosing a > 2(γ + α), the right-hand side is at least γT for sufficiently large T, which satisfies (8). Hence, taking a > 2 max{γ(b + α), γ + α} gives an instance that simultaneously satisfies (7) and (8).

5. BALANCING REGRET AND THE COMPETITIVE RATIO

Given the incompatibility of regret and the competitive ratio highlighted by Theorem 2, it is necessary to reevaluate the goals for algorithm design. For example, given that sublinear regret is impossible alongside a constant competitive ratio, the next best goal is a class of algorithms that can ensure εT regret, for an arbitrarily small constant ε, while remaining O(1)-competitive.
Or, it may be enough to ensure that the algorithm is log T- or log log T-competitive while retaining sublinear regret, even though this is not a constant guarantee. In this section, we present a novel algorithm, Randomly Biased Greedy (RBG), which achieves simultaneous bounds on regret in the OCO setting and the competitive ratio in the MTS setting when the decision space F is one-dimensional. Further, RBG can be used to trade off guarantees on regret and the α-unfair competitive ratio and, in fact, our techniques provide a general approach for the design and analysis of algorithms that simultaneously balance the two performance metrics. The algorithm takes a norm N as its input:

Algorithm 2 (Randomly Biased Greedy, RBG(N)). Given a norm N, define w^0(x) = N(x) for all x and w^t(x) = min_y {w^{t−1}(y) + c^t(y) + N(x − y)}. Generate a random number r ∈ (−1, 1). For each time step t, go to the state x^t which minimizes Y^t(x^t) = w^{t−1}(x^t) + rN(x^t).

RBG is motivated by [29], where the MTS problem is solved for a line metric space. Note that the algorithm makes use of randomness, but only in a very limited way: it parameterizes its bias using a single random number r ∈ (−1, 1). Given this bias, the algorithm works by choosing actions greedily to minimize its work function w^t(x). Notice that RBG can be implemented by solving a convex minimization problem in each round, or approximated to any given precision using a dynamic programming approach similar to the classical Work Function algorithm [37].

As stated, RBG performs well for the α-unfair competitive ratio but poorly for regret. In particular, we show in Theorem 5 that, for any norm, RBG(N) with N = ‖·‖ is 2-competitive. So RBG has a constant competitive ratio, and thus has linear regret by Theorem 2. To remedy this, we consider an idea motivated by the incompatibility results in Section 4: to do well on regret, an algorithm needs to be sticky, i.e., it needs to settle down over time.
One way to force an algorithm to be more sticky is to increase the switching costs. In particular, we show below that one can achieve a balance between regret and the α-unfair competitive ratio by parameterizing RBG with the "wrong" norm. More specifically, recall that the absolute value is essentially the only norm on R, in the sense that every norm ‖·‖ on R satisfies ‖x‖ = ‖1‖·|x|. We take advantage of this to use different scalings of ‖1‖ to weight the switching cost differently. In particular, the performance of the algorithm can be bounded as follows.

Theorem 5. For a SOCO problem in a one-dimensional normed space ‖·‖, running RBG(N) with a one-dimensional norm having N(1) = γ‖1‖ as input (where γ ≥ 1) attains an α-unfair competitive ratio CR^α_1 of (1 + γ)/min{γ, α} and a regret R_0 of O(max{T/γ, γ}) with probability 1.

A few remarks on Theorem 5 are in order. First, note that Theorem 5 holds in the setting of the stronger of the two incompatibility results, i.e., Theorem 2. This setting corresponds to comparing the performance of algorithms in the MTS and OCO settings, i.e., the cost functions are mismatched in addition to the differences in the performance metrics. Second, note that if N(1) = α‖1‖, then RBG(N) has an α-unfair competitive ratio of 1 + 1/α and has linear regret (with or without switching costs). However, if one wishes to obtain a suboptimal constant α-unfair competitive ratio and εT regret, for an arbitrarily small ε, one can choose N(1) = ‖1‖/ε. Similarly, if one wishes to obtain sublinear regret and T is known in advance, choosing N(1) = γ(T)‖1‖ for some increasing function γ(T) achieves an O(γ(T)) α-unfair competitive ratio and O(max{T/γ(T), γ(T)}) regret.
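To make Algorithm 2 concrete, here is a minimal sketch of RBG(N) on a discretized one-dimensional decision space; the discretization, the uniform draw of r, and the example cost functions are our own illustrative assumptions (the paper treats F as continuous):

```python
import random

def rbg(costs, gamma=1.0, grid=101, seed=0):
    """Sketch of RBG(N) on F = [0, 1] discretized to `grid` points,
    with N(x) = gamma * |x| (so N(1) = gamma)."""
    xs = [i / (grid - 1) for i in range(grid)]
    N = lambda d: gamma * abs(d)
    w = [N(x) for x in xs]                  # w^0(x) = N(x)
    r = random.Random(seed).uniform(-1, 1)  # the single random bias
    choices = []
    for cost in costs:                      # cost: one round's c^t
        # go to the state minimizing Y^t(x) = w^{t-1}(x) + r * N(x)
        j = min(range(grid), key=lambda i: w[i] + r * N(xs[i]))
        choices.append(xs[j])
        # work-function update:
        # w^t(x) = min_y { w^{t-1}(y) + c^t(y) + N(x - y) }
        w = [min(w[k] + cost(xs[k]) + N(x - xs[k]) for k in range(grid))
             for x in xs]
    return choices

# Alternating linear costs, as in the lower-bound constructions.
alternating = [(lambda x: x) if t % 2 == 0 else (lambda x: 1 - x)
               for t in range(10)]
```

On this example a large gamma makes the switching term dominate, so RBG stays put (sticky, low regret against a static comparator), while a small gamma makes it track the minimizers of the accumulated work function; this mirrors the tradeoff that Theorem 5 quantifies.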

If T is not known in advance, N(1) can be gradually increased as the rounds go on, and bounds similar to those in Theorem 5 still hold. Note that, because of the max, the optimal regret achievable by RBG(N) is O(√T), which matches the optimal regret in the online learning setting with arbitrary convex costs [6]. However, the α-unfair competitive ratio can be made to grow arbitrarily slowly while maintaining sublinear regret by choosing γ(T) appropriately.

Proof of Theorem 5. To prove Theorem 5, we state and apply a more general result, which provides a useful tool for designing algorithms that simultaneously balance regret and the α-unfair competitive ratio. In particular, for any algorithm A, define OperatingCost(A) = Σ_{t=1}^T c^t(x^{t+1}) and SwitchingCost(A) = Σ_{t=1}^T ‖x^{t+1} − x^t‖, so that C_1(A) = OperatingCost(A) + SwitchingCost(A). For ease of notation, define OPT_N to be the cost of the dynamic optimal solution under a different norm N with N(1) = γ‖1‖ (γ ≥ 1). Note that no penalty factor α is paid on the switching cost in OPT_N. The key step in proving Theorem 5 is the following lemma, which is proven in Appendix C.

Lemma 6. Consider a one-dimensional SOCO problem with norm ‖·‖ and an online algorithm A which, when run with norm N, satisfies OperatingCost(A(N)) ≤ OPT_N + O(1) along with SwitchingCost(A(N)) ≤ βOPT_N + O(1), where β = O(1). Fix a norm N such that N(1) = γ‖1‖ with γ ≥ 1. Then A(N) has α-unfair competitive ratio CR^α_1(A(N)) = (1 + β) max{γ/α, 1} and regret R_0(A(N)) = O(max{βT, (1 + β)γ}) for the original SOCO problem with norm ‖·‖.

Given this lemma, Theorem 5 follows from the following two lemmas, proven in Appendices D and E. In particular, suppose RBG is evaluated under the norm ‖·‖. The first lemma states that the operating cost of RBG(N), with N(1) = γ‖1‖, is at most OPT_N; the second states that the switching cost of RBG(N) is at most OPT_N/γ.

Lemma 7. Given a SOCO problem with norm ‖·‖, E[OperatingCost(RBG(N))] ≤ OPT_N.

Lemma 8.
Given a one-dimensional SOCO problem with norm ‖·‖, E[SwitchingCost(RBG(N))] ≤ OPT_N/γ.

6. CASE STUDY

To ground the analytic results in this paper, we end with a case study illustrating the contrast between algorithms designed for regret and those designed for the competitive ratio. To do this, we consider the setting of dynamic capacity provisioning in data centers. This is an application that has received significant attention in the networking literature recently [8–15], and is a setting where SOCO problems and algorithms have been formulated and applied. In this application, the goal is to dynamically control the number of active servers x^t in a data center in order to minimize the operating cost given time-varying workloads, cooling efficiencies, and power costs.

Experimental model. To model dynamic capacity provisioning, we adopt the formulation and parameter settings introduced by Lin et al. in [10], to allow easy comparison with prior literature.

[Figure 1: Workload traces from two anonymous data centers (workload vs. time in days).]

The key piece of the model is to capture the operating costs of the data center, c^t(x^t), given the choice of the number of active servers x^t. We model the operating costs c^t(x^t) as a weighted sum of delay costs (i.e., revenue loss due to increased delay) and energy costs. For simplicity, we assume the load balancer dispatches workload to active servers uniformly, at rate λ^t/x^t per server (which is widely deployed, and turns out to be optimal for many systems). Further, we assume the power draw of an active server handling arrival rate λ is e(λ) and the delay cost is g(λ). A common power profile for typical servers is an affine function e(λ) = e_0 + e_1λ, where e_0 and e_1 are constants; see, e.g., [38]. The delay cost is modeled as g(λ) = γλd(λ) [39], where d(λ) is the average delay for users and γ is a weighting factor depending on the Internet service.
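The per-round cost in this model can be sketched as follows (our own illustration; it assumes the M/GI/1-PS delay model d(λ) = 1/(µ − λ) and the experimental parameter values used in this case study, namely µ = 1, e(λ) = 1 (so e_0 = 1, e_1 = 0), and delay weight γ = 0.1; the infinity guard is a crude stand-in for the gradient cap B described in the text):

```python
mu, e0, e1, gamma = 1.0, 1.0, 0.0, 0.1

def operating_cost(x, lam):
    """Cost of running x servers under total arrival rate lam, with load
    dispatched uniformly at rate lam/x per server."""
    if x <= 0 or lam / x >= mu:
        return float("inf")  # overloaded: average delay is unbounded
    per_server = lam / x     # arrival rate seen by each server
    energy = x * (e0 + e1 * per_server)           # affine server power
    delay_cost = gamma * lam / (mu - per_server)  # x * g(lam/x)
    return energy + delay_cost

# More servers reduce delay but consume more energy, so for a fixed
# workload the cost is minimized at an interior number of servers.
best = min(range(1, 200), key=lambda x: operating_cost(x, 100.0))
```

The convexity of this cost in x is what places the provisioning problem inside the SOCO framework once the switching cost ‖x^t − x^{t−1}‖ is added.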
The average delay is modeled using the standard M/GI/1 processor-sharing queue, i.e., d(λ) = 1/(µ − λ), where µ is the service rate of the server [40]. However, if we apply this delay cost directly, the operating cost c^t will have unbounded gradient. Therefore, we enforce an additional condition on the resulting c^t, that |dc^t(x^t)/dx^t| ≤ B, which corresponds to an SLA requirement that the average delay be bounded. Further, we normalize the power cost of an active idle server for one time slot to 1, and ‖1‖ is the switching cost incurred when moving a server into and out of a power-saving mode, including wear-and-tear costs.

To model a realistic scenario, we perform trace-based experiments using workloads drawn from two real-world anonymous data centers. The first trace covers a 14-day period at 10-minute granularity, i.e., T = 2016; the second covers a 21-day period at 1-hour granularity. Both are shown in Figure 1, and both exhibit strong daily and weekly patterns, which is common for Internet services. The peak-to-mean ratios of the workloads are both between 1.5 and 2, which is common for medium- to large-scale data centers.

In our experiments, we normalize µ = 1 and set e(λ) = 1, since the energy consumption of current servers is dominated by the fixed costs [41]. The weight γ reflects revenue lost due to customers being deterred by delay; we set γ = 0.1. The gradient bound is B = 100. Since the power cost of an active idle server for one time slot is e_0 = 1, the norm ‖1‖ measures the duration a server must be powered down to outweigh the switching cost. We set ‖1‖ so that turning a server off and back on costs the same as the energy consumption for two hours; that is, ‖1‖ = 6 with 10-minute time slots and ‖1‖ = 1 with 1-hour time slots. This was chosen as an approximation of the time a server should sleep so that the wear-and-tear cost of power cycling matches the operating

costs [42].

[Figure 2: Dynamic control provided by OGD and RBG over a two-day period, showing x^t versus time (hours) for the workload, OPT, Static OPT, OGD, RBG(6), and RBG(96).]

[Figure 3: Relative costs of OGD and RBG (normalized cost of OGD, Static OPT, and RBG under C_0 and C_1, versus ‖1‖_RBG/‖1‖): (a) Trace 1; (b) Trace 2.]

Experimental results. The results of our numerical experiments are shown in Figures 2 and 3. Figure 2 shows the offline optimal solution (OPT), the offline optimal static solution (Static OPT), and the output of OGD and RBG(k), where for convenience k = ‖1‖_RBG/‖1‖, over two days of the first trace. OGD, the algorithm designed to optimize regret, converges to the optimal static solution, while RBG(6), the algorithm designed to optimize the competitive ratio, mimics the optimal dynamic solution. If we run RBG with a larger norm, i.e., RBG(96), its solution becomes close to the optimal static solution. This contrasting behavior is a consequence of the performance metric each algorithm is designed to optimize.

Figure 3 shows the costs of the algorithms normalized to the cost of the optimal dynamic solution for the two traces (see footnote 4). In Figure 3a the relative cost of the offline optimal static solution is small (about 1.2), which implies that both OGD and RBG perform quite well in this realistic setting, in contrast to the analytical incompatibility results for adversarial settings in Section 4. Further, as ‖1‖_RBG increases, the solution of RBG deviates further from the offline optimal solution, so the C_1 cost increases and approaches the cost of the static optimal solution. The tradeoff induced by ‖1‖_RBG is apparent when the second trace is considered: Figure 3b shows that the cost C_0 (without lookahead) can decrease as ‖1‖_RBG increases.

7. CONCLUDING REMARKS

In this paper we have studied the class of smoothed online convex optimization (SOCO) problems.
[Footnote 4: The solution x depends on which C_i is being optimized, but the cost does not.]

Variants of this class of problems have been studied in both the online learning community (online convex optimization problems) and the online algorithms community (metrical task systems). These two communities have focused on different performance measures: regret and the competitive ratio, respectively. The goal of this paper is to understand the relationship between these two performance measures.

Our first contribution is to show that these measures are fundamentally incompatible. Specifically, algorithms that have sublinear regret necessarily have an infinite competitive ratio, and algorithms with a constant competitive ratio necessarily have at least linear regret. Thus, the performance measure chosen has a significant impact on the style of algorithm designed, which helps explain why algorithms from the online convex optimization and metrical task system communities are so different. Our second contribution is the design of an algorithm that balances regret and the competitive ratio, achieving sublinear regret while its competitive ratio grows arbitrarily slowly. Further, the proof technique used to analyze this algorithm provides a generic approach for developing algorithms that balance the two metrics.

There are a number of interesting directions that this work motivates. In particular, the SOCO formulation is still under-explored, and many variations of the formulation discussed here are not yet understood. For example, is it possible to trade off between regret and the competitive ratio in bandit versions of SOCO? More generally, the message of this paper is that regret and the competitive ratio are incompatible within the formulation of SOCO, and it is quite interesting to try to understand how generally this holds.
For example, does the incompatibility result proven here extend to settings where the cost functions are random instead of adversarial, e.g., variations of SOCO such as k-armed bandit problems with switching costs?

8. REFERENCES

[1] M. Herbster and M. K. Warmuth, "Tracking the best expert," Mach. Learn., vol. 32, no. 2.
[2] N. Littlestone and M. K. Warmuth, "The weighted majority algorithm," Inf. Comput., vol. 108, no. 2.
[3] T. M. Cover, "Universal portfolios," Mathematical Finance, vol. 1, no. 1, pp. 1–29.
[4] N. Bansal, A. Blum, S. Chawla, and A. Meyerson, "Online oblivious routing," in Proc. ACM Symp. Parallel Algorithms and Architectures (SPAA), 2003.
[5] A. Kalai and S. Vempala, "Efficient algorithms for universal portfolios," in Proc. IEEE Symp. Foundations of Computer Science (FOCS).
[6] M. Zinkevich, "Online convex programming and generalized infinitesimal gradient ascent," in Proc. Int. Conf. Machine Learning (ICML), T. Fawcett and N. Mishra, Eds. AAAI Press, 2003.
[7] E. Hazan, A. Agarwal, and S. Kale, "Logarithmic regret algorithms for online convex optimization," Mach. Learn., vol. 69.
[8] D. Kusic and N. Kandasamy, "Risk-aware limited lookahead control for dynamic resource provisioning in enterprise computing systems," Cluster Computing, vol. 10, no. 4.
[9] A. Gandhi, V. Gupta, M. Harchol-Balter, and M. Kozuch, "Optimality analysis of energy-performance trade-off for server farm management," Performance Evaluation, no. 11.
[10] M. Lin, A. Wierman, L. L. H. Andrew, and E. Thereska, "Dynamic right-sizing for power-proportional data centers," in Proc. IEEE INFOCOM, 2011.


More information

Regret bounded by gradual variation for online convex optimization

Regret bounded by gradual variation for online convex optimization Noname manuscript No. will be inserted by the editor Regret bounded by gradual variation for online convex optimization Tianbao Yang Mehrdad Mahdavi Rong Jin Shenghuo Zhu Received: date / Accepted: date

More information

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research OLSO Online Learning and Stochastic Optimization Yoram Singer August 10, 2016 Google Research References Introduction to Online Convex Optimization, Elad Hazan, Princeton University Online Learning and

More information

Competitive Management of Non-Preemptive Queues with Multiple Values

Competitive Management of Non-Preemptive Queues with Multiple Values Competitive Management of Non-Preemptive Queues with Multiple Values Nir Andelman and Yishay Mansour School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel Abstract. We consider the online problem

More information

Experts in a Markov Decision Process

Experts in a Markov Decision Process University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2004 Experts in a Markov Decision Process Eyal Even-Dar Sham Kakade University of Pennsylvania Yishay Mansour Follow

More information

Bandits for Online Optimization

Bandits for Online Optimization Bandits for Online Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Bandits for Online Optimization 1 / 16 The multiarmed bandit problem... K slot machines Each

More information

Quasi-Newton Algorithms for Non-smooth Online Strongly Convex Optimization. Mark Franklin Godwin

Quasi-Newton Algorithms for Non-smooth Online Strongly Convex Optimization. Mark Franklin Godwin Quasi-Newton Algorithms for Non-smooth Online Strongly Convex Optimization by Mark Franklin Godwin A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy

More information

Distributed online optimization over jointly connected digraphs

Distributed online optimization over jointly connected digraphs Distributed online optimization over jointly connected digraphs David Mateos-Núñez Jorge Cortés University of California, San Diego {dmateosn,cortes}@ucsd.edu Mathematical Theory of Networks and Systems

More information

The Parking Permit Problem

The Parking Permit Problem The Parking Permit Problem Adam Meyerson November 3, 2004 Abstract We consider online problems where purchases have time durations which expire regardless of whether the purchase is used or not. The Parking

More information

A Polynomial-Time Algorithm for Pliable Index Coding

A Polynomial-Time Algorithm for Pliable Index Coding 1 A Polynomial-Time Algorithm for Pliable Index Coding Linqi Song and Christina Fragouli arxiv:1610.06845v [cs.it] 9 Aug 017 Abstract In pliable index coding, we consider a server with m messages and n

More information

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden Research Center 650 Harry Rd San Jose, CA 95120 ehazan@cs.princeton.edu Satyen Kale Yahoo! Research 4301

More information

arxiv: v1 [cs.ni] 18 Apr 2017

arxiv: v1 [cs.ni] 18 Apr 2017 Zijun Zhang Dept. of Computer Science University of Calgary zijun.zhang@ucalgary.ca Zongpeng Li Dept. of Computer Science University of Calgary zongpeng@ucalgary.ca Chuan Wu Dept. of Computer Science The

More information

Algorithms, Games, and Networks January 17, Lecture 2

Algorithms, Games, and Networks January 17, Lecture 2 Algorithms, Games, and Networks January 17, 2013 Lecturer: Avrim Blum Lecture 2 Scribe: Aleksandr Kazachkov 1 Readings for today s lecture Today s topic is online learning, regret minimization, and minimax

More information

Online Learning for Time Series Prediction

Online Learning for Time Series Prediction Online Learning for Time Series Prediction Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values.

More information

Hybrid Machine Learning Algorithms

Hybrid Machine Learning Algorithms Hybrid Machine Learning Algorithms Umar Syed Princeton University Includes joint work with: Rob Schapire (Princeton) Nina Mishra, Alex Slivkins (Microsoft) Common Approaches to Machine Learning!! Supervised

More information

Lecture 2: Paging and AdWords

Lecture 2: Paging and AdWords Algoritmos e Incerteza (PUC-Rio INF2979, 2017.1) Lecture 2: Paging and AdWords March 20 2017 Lecturer: Marco Molinaro Scribe: Gabriel Homsi In this class we had a brief recap of the Ski Rental Problem

More information

Online Learning with Experts & Multiplicative Weights Algorithms

Online Learning with Experts & Multiplicative Weights Algorithms Online Learning with Experts & Multiplicative Weights Algorithms CS 159 lecture #2 Stephan Zheng April 1, 2016 Caltech Table of contents 1. Online Learning with Experts With a perfect expert Without perfect

More information

Optimal Sleeping Mechanism for Multiple Servers with MMPP-Based Bursty Traffic Arrival

Optimal Sleeping Mechanism for Multiple Servers with MMPP-Based Bursty Traffic Arrival 1 Optimal Sleeping Mechanism for Multiple Servers with MMPP-Based Bursty Traffic Arrival Zhiyuan Jiang, Bhaskar Krishnamachari, Sheng Zhou, arxiv:1711.07912v1 [cs.it] 21 Nov 2017 Zhisheng Niu, Fellow,

More information

Distributed online optimization over jointly connected digraphs

Distributed online optimization over jointly connected digraphs Distributed online optimization over jointly connected digraphs David Mateos-Núñez Jorge Cortés University of California, San Diego {dmateosn,cortes}@ucsd.edu Southern California Optimization Day UC San

More information

Online Convex Optimization

Online Convex Optimization Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed

More information

On the Generalization Ability of Online Strongly Convex Programming Algorithms

On the Generalization Ability of Online Strongly Convex Programming Algorithms On the Generalization Ability of Online Strongly Convex Programming Algorithms Sham M. Kakade I Chicago Chicago, IL 60637 sham@tti-c.org Ambuj ewari I Chicago Chicago, IL 60637 tewari@tti-c.org Abstract

More information

Reproducing Kernel Hilbert Spaces

Reproducing Kernel Hilbert Spaces 9.520: Statistical Learning Theory and Applications February 10th, 2010 Reproducing Kernel Hilbert Spaces Lecturer: Lorenzo Rosasco Scribe: Greg Durrett 1 Introduction In the previous two lectures, we

More information

Worst-Case Bounds for Gaussian Process Models

Worst-Case Bounds for Gaussian Process Models Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis

More information

A Hierarchy of Suboptimal Policies for the Multi-period, Multi-echelon, Robust Inventory Problem

A Hierarchy of Suboptimal Policies for the Multi-period, Multi-echelon, Robust Inventory Problem A Hierarchy of Suboptimal Policies for the Multi-period, Multi-echelon, Robust Inventory Problem Dimitris J. Bertsimas Dan A. Iancu Pablo A. Parrilo Sloan School of Management and Operations Research Center,

More information

39 Better Scalable Algorithms for Broadcast Scheduling

39 Better Scalable Algorithms for Broadcast Scheduling 39 Better Scalable Algorithms for Broadcast Scheduling NIKHIL BANSAL, Eindhoven University of Technology RAVISHANKAR KRISHNASWAMY, Princeton University VISWANATH NAGARAJAN, IBM T.J. Watson Research Center

More information

Dynamic Capacity Provisioning of Heterogeneous Servers

Dynamic Capacity Provisioning of Heterogeneous Servers Dynamic Capacity Provisioning of Heterogeneous Servers Minghong Lin, Lachlan L. H. Andrew, Adam Wierman California Institute of Technology, Email: {mhlin,adamw}@caltech.edu Swinburne University of Technology,

More information

Approximation Metrics for Discrete and Continuous Systems

Approximation Metrics for Discrete and Continuous Systems University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science May 2007 Approximation Metrics for Discrete Continuous Systems Antoine Girard University

More information

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect

More information

A STAFFING ALGORITHM FOR CALL CENTERS WITH SKILL-BASED ROUTING: SUPPLEMENTARY MATERIAL

A STAFFING ALGORITHM FOR CALL CENTERS WITH SKILL-BASED ROUTING: SUPPLEMENTARY MATERIAL A STAFFING ALGORITHM FOR CALL CENTERS WITH SKILL-BASED ROUTING: SUPPLEMENTARY MATERIAL by Rodney B. Wallace IBM and The George Washington University rodney.wallace@us.ibm.com Ward Whitt Columbia University

More information

Lecture 16: FTRL and Online Mirror Descent

Lecture 16: FTRL and Online Mirror Descent Lecture 6: FTRL and Online Mirror Descent Akshay Krishnamurthy akshay@cs.umass.edu November, 07 Recap Last time we saw two online learning algorithms. First we saw the Weighted Majority algorithm, which

More information

Ad Placement Strategies

Ad Placement Strategies Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad

More information

A lower bound for scheduling of unit jobs with immediate decision on parallel machines

A lower bound for scheduling of unit jobs with immediate decision on parallel machines A lower bound for scheduling of unit jobs with immediate decision on parallel machines Tomáš Ebenlendr Jiří Sgall Abstract Consider scheduling of unit jobs with release times and deadlines on m identical

More information

Online Optimization : Competing with Dynamic Comparators

Online Optimization : Competing with Dynamic Comparators Ali Jadbabaie Alexander Rakhlin Shahin Shahrampour Karthik Sridharan University of Pennsylvania University of Pennsylvania University of Pennsylvania Cornell University Abstract Recent literature on online

More information

Scheduling for the tail: Robustness versus Optimality

Scheduling for the tail: Robustness versus Optimality Scheduling for the tail: Robustness versus Optimality Jayakrishnan Nair, Adam Wierman 2, and Bert Zwart 3 Department of Electrical Engineering, California Institute of Technology 2 Computing and Mathematical

More information

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon. Administration CSCI567 Machine Learning Fall 2018 Prof. Haipeng Luo U of Southern California Nov 7, 2018 HW5 is available, due on 11/18. Practice final will also be available soon. Remaining weeks: 11/14,

More information

arxiv: v2 [math.pr] 22 Dec 2014

arxiv: v2 [math.pr] 22 Dec 2014 Non-stationary Stochastic Optimization Omar Besbes Columbia University Yonatan Gur Stanford University Assaf Zeevi Columbia University first version: July 013, current version: December 4, 014 arxiv:1307.5449v

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Online Convex Optimization MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Online projected sub-gradient descent. Exponentiated Gradient (EG). Mirror descent.

More information

Lecture: Adaptive Filtering

Lecture: Adaptive Filtering ECE 830 Spring 2013 Statistical Signal Processing instructors: K. Jamieson and R. Nowak Lecture: Adaptive Filtering Adaptive filters are commonly used for online filtering of signals. The goal is to estimate

More information

Lecture 3: Lower Bounds for Bandit Algorithms

Lecture 3: Lower Bounds for Bandit Algorithms CMSC 858G: Bandits, Experts and Games 09/19/16 Lecture 3: Lower Bounds for Bandit Algorithms Instructor: Alex Slivkins Scribed by: Soham De & Karthik A Sankararaman 1 Lower Bounds In this lecture (and

More information

On-line Learning and the Metrical Task System Problem

On-line Learning and the Metrical Task System Problem Machine Learning, 39, 35 58, 2000. c 2000 Kluwer Academic Publishers. Printed in The Netherlands. On-line Learning and the Metrical Task System Problem AVRIM BLUM avrim+@cs.cmu.edu CARL BURCH cburch+@cs.cmu.edu

More information

Yevgeny Seldin. University of Copenhagen

Yevgeny Seldin. University of Copenhagen Yevgeny Seldin University of Copenhagen Classical (Batch) Machine Learning Collect Data Data Assumption The samples are independent identically distributed (i.i.d.) Machine Learning Prediction rule New

More information

The Algorithmic Foundations of Adaptive Data Analysis November, Lecture The Multiplicative Weights Algorithm

The Algorithmic Foundations of Adaptive Data Analysis November, Lecture The Multiplicative Weights Algorithm he Algorithmic Foundations of Adaptive Data Analysis November, 207 Lecture 5-6 Lecturer: Aaron Roth Scribe: Aaron Roth he Multiplicative Weights Algorithm In this lecture, we define and analyze a classic,

More information

Robustness and performance of threshold-based resource allocation policies

Robustness and performance of threshold-based resource allocation policies Robustness and performance of threshold-based resource allocation policies Takayuki Osogami Mor Harchol-Balter Alan Scheller-Wolf Computer Science Department, Carnegie Mellon University, 5000 Forbes Avenue,

More information

Online Convex Optimization for Dynamic Network Resource Allocation

Online Convex Optimization for Dynamic Network Resource Allocation 217 25th European Signal Processing Conference (EUSIPCO) Online Convex Optimization for Dynamic Network Resource Allocation Tianyi Chen, Qing Ling, and Georgios B. Giannakis Dept. of Elec. & Comput. Engr.

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

IN this paper, we consider the capacity of sticky channels, a

IN this paper, we consider the capacity of sticky channels, a 72 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 1, JANUARY 2008 Capacity Bounds for Sticky Channels Michael Mitzenmacher, Member, IEEE Abstract The capacity of sticky channels, a subclass of insertion

More information

New Algorithms for Contextual Bandits

New Algorithms for Contextual Bandits New Algorithms for Contextual Bandits Lev Reyzin Georgia Institute of Technology Work done at Yahoo! 1 S A. Beygelzimer, J. Langford, L. Li, L. Reyzin, R.E. Schapire Contextual Bandit Algorithms with Supervised

More information

Speed Scaling in the Non-clairvoyant Model

Speed Scaling in the Non-clairvoyant Model Speed Scaling in the Non-clairvoyant Model [Extended Abstract] ABSTRACT Yossi Azar Tel Aviv University Tel Aviv, Israel azar@tau.ac.il Zhiyi Huang University of Hong Kong Hong Kong zhiyi@cs.hku.hk In recent

More information

Dynamic Control of Running Servers

Dynamic Control of Running Servers Dynamic Control of Running Servers Esa Hyytiä 1,2, Douglas Down 3, Pasi Lassila 2, and Samuli Aalto 2 1 Department of Computer Science, University of Iceland, Iceland 2 Department of Communications and

More information

MDP Preliminaries. Nan Jiang. February 10, 2019

MDP Preliminaries. Nan Jiang. February 10, 2019 MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process

More information

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization racle Complexity of Second-rder Methods for Smooth Convex ptimization Yossi Arjevani had Shamir Ron Shiff Weizmann Institute of Science Rehovot 7610001 Israel Abstract yossi.arjevani@weizmann.ac.il ohad.shamir@weizmann.ac.il

More information

Operations Research Letters. Instability of FIFO in a simple queueing system with arbitrarily low loads

Operations Research Letters. Instability of FIFO in a simple queueing system with arbitrarily low loads Operations Research Letters 37 (2009) 312 316 Contents lists available at ScienceDirect Operations Research Letters journal homepage: www.elsevier.com/locate/orl Instability of FIFO in a simple queueing

More information

Information Storage Capacity of Crossbar Switching Networks

Information Storage Capacity of Crossbar Switching Networks Information Storage Capacity of Crossbar Switching etworks ABSTRACT In this work we ask the fundamental uestion: How many bits of information can be stored in a crossbar switching network? The answer is

More information

On the Impossibility of Black-Box Truthfulness Without Priors

On the Impossibility of Black-Box Truthfulness Without Priors On the Impossibility of Black-Box Truthfulness Without Priors Nicole Immorlica Brendan Lucier Abstract We consider the problem of converting an arbitrary approximation algorithm for a singleparameter social

More information

Time Series Prediction & Online Learning

Time Series Prediction & Online Learning Time Series Prediction & Online Learning Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values. earthquakes.

More information

The Online Approach to Machine Learning

The Online Approach to Machine Learning The Online Approach to Machine Learning Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53 Summary 1 My beautiful regret 2 A supposedly fun game I

More information

OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS

OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS Xiaofei Fan-Orzechowski Department of Applied Mathematics and Statistics State University of New York at Stony Brook Stony

More information

A Primal-Dual Randomized Algorithm for Weighted Paging

A Primal-Dual Randomized Algorithm for Weighted Paging A Primal-Dual Randomized Algorithm for Weighted Paging Nikhil Bansal Niv Buchbinder Joseph (Seffi) Naor April 2, 2012 Abstract The study the weighted version of classic online paging problem where there

More information

Exponential Weights on the Hypercube in Polynomial Time

Exponential Weights on the Hypercube in Polynomial Time European Workshop on Reinforcement Learning 14 (2018) October 2018, Lille, France. Exponential Weights on the Hypercube in Polynomial Time College of Information and Computer Sciences University of Massachusetts

More information

On the Partitioning of Servers in Queueing Systems during Rush Hour

On the Partitioning of Servers in Queueing Systems during Rush Hour On the Partitioning of Servers in Queueing Systems during Rush Hour This paper is motivated by two phenomena observed in many queueing systems in practice. The first is the partitioning of server capacity

More information

Learning with Large Number of Experts: Component Hedge Algorithm

Learning with Large Number of Experts: Component Hedge Algorithm Learning with Large Number of Experts: Component Hedge Algorithm Giulia DeSalvo and Vitaly Kuznetsov Courant Institute March 24th, 215 1 / 3 Learning with Large Number of Experts Regret of RWM is O( T

More information