Network Utility Maximization in Adversarial Environments

Technical Report

Qingkai Liang and Eytan Modiano
Laboratory for Information and Decision Systems
Massachusetts Institute of Technology, Cambridge, MA

arXiv:1712.08672v2 [cs.NI] 27 Dec 2017

Abstract

Stochastic models have been dominant in network optimization theory for over two decades, due to their analytical tractability. However, these models fail to capture non-stationary or even adversarial network dynamics, which are of increasing importance for modeling the behavior of networks under malicious attacks or characterizing short-term transient behavior. In this paper, we consider the network utility maximization problem in adversarial network settings. In particular, we focus on the tradeoffs between total queue length and utility regret, which measures the difference in network utility between a causal policy and an "oracle" that knows the future within a finite time horizon. Two adversarial network models are developed to characterize the adversary's behavior. We provide lower bounds on the tradeoff between utility regret and queue length under these adversarial models, and analyze the performance of two control policies (i.e., the Drift-plus-Penalty algorithm and the Tracking Algorithm).

I. INTRODUCTION

Stochastic network models have been dominant in network optimization theory for over two decades, due to their analytical tractability. For example, it is often assumed in wireless networks that the variation of traffic patterns and the evolution of channel capacity follow some stationary stochastic process, such as the i.i.d. model and the ergodic Markov model. Many important network control policies (e.g., MaxWeight [1] and the Drift-plus-Penalty policy [2]) have been derived to optimize network performance under those stochastic network dynamics. However, non-stationary or even adversarial dynamics have been of increasing importance in recent years.
For example, modern communication networks frequently suffer from Distributed Denial-of-Service (DDoS) attacks or jamming attacks [3], where traffic injections and channel conditions are controlled by some malicious entity in order to degrade network performance. As a result, it is important to develop efficient control policies that optimize network performance even in adversarial settings. However, extending the traditional stochastic network optimization framework to the adversarial setting is non-trivial because many important notions and analytical tools developed for stochastic networks cannot be applied in adversarial settings. For example, traditional stochastic network optimization focuses on long-term network performance, while in an adversarial environment the network may not have any steady state or well-defined long-term time averages. Thus, typical steady-state analysis and many equilibrium-based notions such as the network throughput region cannot be used in networks with adversarial dynamics, and it is important to understand transient network performance within a finite time horizon in a non-stationary/adversarial environment.

This work was supported by NSF Grant CNS-1524317 and by DARPA I2O and Raytheon BBN Technologies under Contract No. HR0011-15-C-0097.

In this paper, we investigate efficient network control policies that can maximize network utility within a finite time horizon while keeping the total queue length small in an adversarial environment. In particular, we focus on the following optimization problem:

$$\max_{\alpha_t \in D_{\omega_t}} \ \sum_{t=0}^{T-1} U(\alpha_t, \omega_t) \quad \text{s.t.} \quad \sum_{t=0}^{T-1} a_i(t) \le \sum_{t=0}^{T-1} b_i(t), \ \forall i, \tag{1}$$

where $U(\alpha_t, \omega_t)$ is the network utility gained in slot $t$ under the control action $\alpha_t$ (constrained to some action space $D_{\omega_t}$) and the network event $\omega_t$ (which includes information about exogenous arrivals, link capacities, etc.). The sequence of network events $\{\omega_t\}$ follows an arbitrary (possibly adversarial) process. The objective is to maximize the total network utility gained within a finite time horizon $T$ subject to the constraint that, for each queue $i$, the total arrivals $a_i(t)$ do not exceed the total departures $b_i(t)$ during the time horizon.

A. Main Results

We develop general adversarial network models and propose a new finite-time performance metric, referred to as utility regret (the formal definition is given in Section II-B):

$$R_T^{\pi} = \sum_{t=0}^{T-1} U(\alpha_t^*, \omega_t) - \sum_{t=0}^{T-1} U(\alpha_t^{\pi}, \omega_t),$$

where $\{\alpha_t^{\pi}\}$ is the sequence of control actions taken by a policy $\pi$, and $\{\alpha_t^*\}$ is the optimal sequence of actions for solving (1), generated by an oracle that knows the future. Note that a control policy $\pi$ may trivially maximize the network utility by simply ignoring the constraint in (1) (e.g., admitting all the exogenous traffic) such that the utility regret becomes zero or even negative.¹ However, such an action may significantly violate the constraint in (1) and lead to large queue length. Therefore, there is a tradeoff between the utility regret and the queue length achieved by a control policy, which is similar to the well-known utility-delay tradeoff in traditional stochastic network optimization [4]. In this paper, we investigate this tradeoff in an adversarial environment. The main results are as follows.

We prove that it is impossible to simultaneously achieve both low utility regret and low queue length if the adversary is unconstrained. In particular, there exist some adversarial network dynamics such that either the utility regret or the total queue length grows at least linearly with the time horizon $T$ under any causal control policy. This impossibility result motivates us to study constrained adversarial dynamics.

We develop two adversarial network models where the network dynamics are constrained to some admissible set. In particular, we first consider the $W$-constrained adversary model, where under the optimal policy the total arrivals do not exceed the total services within any window of $W$ slots. We then propose a more general adversary model called the $V_T$-constrained adversary, where the total queue length generated by the oracle during its sample path is upper bounded by $V_T$. By varying the value of $V_T$, the proposed $V_T$-constrained model covers a wide range of adversarial settings: from a strictly constrained adversary to a fully unconstrained adversary.

We develop lower bounds on the tradeoffs between utility regret and queue length under both the $W$-constrained and the $V_T$-constrained adversary models. It is shown that no causal policy can simultaneously achieve both sublinear utility regret and sublinear queue length if $W$ or $V_T$ grows linearly with $T$.

We also analyze the tradeoffs achieved by two control algorithms, the Drift-plus-Penalty algorithm [2] and the Tracking Algorithm [5], [6], under the two adversarial models.
In particular, both algorithms simultaneously achieve sublinear utility regret and sublinear queue length whenever $W$ or $V_T$ grows sublinearly with $T$, yet the theoretical regret bound under the Tracking Algorithm is better than that under the Drift-plus-Penalty algorithm. The Tracking Algorithm also asymptotically attains the optimal tradeoffs under the $W$-constrained adversary model.

B. Related Work

The study of adversarial network models dates back more than two decades. Rene Cruz [7] provided the first concrete example of networks with adversarial dynamics, which were later generalized by Borodin et al. [8] under the Adversarial Queueing Theory (AQT) framework. In AQT, in each time slot, the adversary injects a set of packets at some of the nodes. In order to avoid trivially overloading the system, the AQT framework imposes a stringent window constraint: the maximum traffic injected in every link over any window of $W$ time slots should not exceed the amount of traffic that the link can serve during that interval. Andrews et al. [9] introduced a more generalized adversary model known as the Leaky Bucket (LB) model that differs from AQT by allowing some traffic burst during any time interval. The AQT model and the LB model have given rise to a large number of results since their introduction, most of which are about network stability under several simple scheduling policies such as FIFO (see [10] for a review of these results). However, the AQT and the LB models assume that only packet injections are adversarial while the underlying network topology and link states remain fixed. Such a static network model does not capture many adversarial environments, such as wireless networks under jamming attacks where the adversary can control the channel states.

¹ Negative utility regret may occur since any optimal solution $\{\alpha_t^*\}$ is required to satisfy the constraint in (1) while an arbitrary policy $\pi$ may violate this constraint.
Andrews and Zhang [5], [6] extended the AQT model to single-hop dynamic wireless networks, where both packet injections and link states are controlled by an adversary, and proved the stability of the MaxWeight algorithm in this context. Jung et al. [11], [12] further extended the results of [5], [6] to multi-hop dynamic networks. Our window-based $W$-constrained model is inspired by and similar to the adversarial models used in [5], [6], [11], [12].

While the above-mentioned works focused on network stability, our work is most related to the universal network utility maximization problem by Neely [13], where network utility needs to be maximized subject to stability constraints under adversarial network dynamics. The time-average performance of an algorithm is measured with respect to a so-called "$W$-slot look-ahead policy". Such a policy has perfect knowledge about network dynamics over the next $W$ slots, but it is required that under this policy the total arrivals to each queue should not exceed the total amount of service offered to that queue during every window of $W$ slots. As a result, it is similar to our $W$-constrained model, where stringent window constraints have to be enforced.

Our paper expands previous work in a number of fundamental ways. First, we develop lower bounds on the tradeoffs between utility regret and queue length under both the $W$-constrained and the $V_T$-constrained adversary models. As far as we know, none of the existing works (e.g., [5], [6], [11]-[13]) provide lower bounds in any kind of adversarial network model. Second, we provide analysis under the new $V_T$-constrained adversary model, which generalizes the adversarial network dynamics models used by existing works. To the best of our knowledge, existing works (e.g., [5], [6], [11]-[13]) all use the $W$-constrained adversary model or similar window-based variants due to its analytical tractability. In this paper, we propose a new $V_T$-constrained adversary model which gets rid of the window constraints. Due to the lack of window-based structure, the analysis carried out in existing works cannot be applied to the $V_T$-constrained model. We develop new analytical results under the new $V_T$-constrained model by converting the $V_T$-constrained model to a $W$-constrained model with a carefully selected window size $W$.

C. Organization of this Paper

The rest of this paper is organized as follows. We first introduce the system model and relevant performance metrics in Section II. We study the $W$-constrained and $V_T$-constrained adversary models in Sections III and IV, respectively. Finally, simulation results and conclusions are given in Sections V and VI, respectively.

II. SYSTEM MODEL

Consider a network with $N$ queues (the set of all queues is denoted by $\mathcal{N} = \{1, \dots, N\}$). Time is slotted with a finite horizon $\mathcal{T} = \{0, \dots, T-1\}$. Let $\omega_t$ denote the network event that occurs in slot $t$, which indicates the current network parameters, such as a vector of conditions for each link, a vector of exogenous arrivals to each node, or other relevant information about the current network links and exogenous arrivals. The set of all possible network events is denoted by $\Omega$. At the beginning of each time slot $t$, the network operator observes the current network event $\omega_t$ and chooses a control action $\alpha_t$ from some action space $D_{\omega_t}$ that can depend on $\omega_t$. The network event $\omega_t$ and the control action $\alpha_t$ together produce the service vector $b(\alpha_t, \omega_t) \triangleq b(t) = (b_1(t), \dots, b_N(t))$ and the arrival vector $a(\alpha_t, \omega_t) \triangleq a(t) = (a_1(t), \dots, a_N(t))$. Note that $a_i(t)$ includes both the admitted exogenous arrivals from outside the network to queue $i$, and the endogenous arrivals from other queues (i.e., routed packets from other queues to queue $i$). Thus, the above network model accounts for both single-hop and multi-hop networks, and the control action $\alpha_t$ may correspond to, for example, joint admission control, routing, rate allocation and scheduling decisions in a multi-hop network.

Let $Q(t) = (Q_1(t), \dots, Q_N(t))$ be the queue length vector at the beginning of slot $t$ (before the arrivals in that slot). The queueing dynamics are

$$Q_i(t+1) = \big[Q_i(t) + a_i(t) - b_i(t)\big]^+, \quad \forall i \in \mathcal{N},\ t \in \mathcal{T},$$

where $[x]^+ = \max\{x, 0\}$. We assume that the sequence of network events $\{\omega_t\}$ is generated according to an arbitrary process (possibly non-stationary or even adversarial), except for the following boundedness assumption.
Under any network event and any control action, the arrivals and the service rates in each slot are bounded by constants that are independent of the time horizon $T$: for any $\omega_t \in \Omega$ and any $\alpha_t \in D_{\omega_t}$,

$$0 \le a_i(\alpha_t, \omega_t) \le A, \qquad 0 \le b_i(\alpha_t, \omega_t) \le B, \qquad \forall i \in \mathcal{N}.$$

For simplicity, we assume $B \ge A$ such that both arrivals and services are upper bounded by $B$ in each slot.

A policy $\pi$ generates a sequence of control actions $(\alpha_0^{\pi}, \dots, \alpha_{T-1}^{\pi})$ within the time horizon. In each slot $t$, the queue length vector, the arrival vector and the service rate vector under policy $\pi$ are denoted by $Q^{\pi}(t)$, $a^{\pi}(t)$ and $b^{\pi}(t)$, respectively. A causal policy is one that generates the current control action $\alpha_t$ only based on the knowledge up until the current slot $t$. In contrast, a non-causal policy may generate the current control action $\alpha_t$ based on knowledge of the future.

Let $U(\alpha_t, \omega_t)$ be the network utility gained in slot $t$ if action $\alpha_t$ is taken under network event $\omega_t$. We assume that under any control action and any network event, network utility is bounded:

$$U_{\min} \le U(\alpha_t, \omega_t) \le U_{\max}, \quad \forall \omega_t \in \Omega,\ \alpha_t \in D_{\omega_t}.$$

A commonly-used form of the network utility function is $U(\alpha_t, \omega_t) = \sum_{i} U_i(x_i(t))$, where $x_i(t)$ is the amount of admitted exogenous traffic to queue $i$ in slot $t$. Typical examples include $U(\alpha_t, \omega_t) = \sum_i x_i(t)$ (total throughput), $U(\alpha_t, \omega_t) = \sum_i \log(x_i(t))$ (proportional fairness), etc. In wireless networks with power control, another widely-used network utility function is $U(\alpha_t, \omega_t) = -\sum_i P_i(t)$, where $P_i(t)$ is the power allocated to queue $i$ in slot $t$. This utility function aims to minimize the total power consumption.

In this paper, we consider the following network utility maximization problem, referred to as NUM.

$$\textbf{NUM:} \quad \max_{\alpha_t \in D_{\omega_t}} \ \sum_{t=0}^{T-1} U(\alpha_t, \omega_t) \tag{2}$$
$$\text{s.t.} \quad \sum_{t=0}^{T-1} a_i(t) \le \sum_{t=0}^{T-1} \tilde{b}_i(t), \quad \forall i \in \mathcal{N}, \tag{3}$$

where $\tilde{b}_i(t) = \min\{b_i(\alpha_t, \omega_t), Q_i(t)\}$ is the actual packet departures from queue $i$ in slot $t$. The objective (2) is to maximize the total network utility gained in the time horizon. The constraint (3) requires that the total arrivals to each queue should not exceed the total amount of departures from that queue during the time horizon.
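As a rough illustration of the queueing dynamics and of constraint (3), the following Python sketch simulates a single queue for given arrival and offered-service sequences. The function names `simulate_queue` and `satisfies_constraint_3` and the sample numbers are illustrative assumptions, not part of the model; the recursion and the actual departures $\min\{b_i(t), Q_i(t)\}$ are taken from the definitions above.

```python
from typing import List, Sequence, Tuple


def simulate_queue(arrivals: Sequence[float], services: Sequence[float],
                   q0: float = 0.0) -> Tuple[List[float], List[float]]:
    """Run Q(t+1) = max(Q(t) + a(t) - b(t), 0) for one queue.

    Returns the queue trajectory Q(0), ..., Q(T) and the actual
    departures min(b(t), Q(t)) in each slot.
    """
    if len(arrivals) != len(services):
        raise ValueError("arrivals and services must cover the same slots")
    queue = [q0]
    departures = []
    for a, b in zip(arrivals, services):
        q = queue[-1]
        departures.append(min(b, q))        # actual departures in this slot
        queue.append(max(q + a - b, 0.0))   # queueing dynamics
    return queue, departures


def satisfies_constraint_3(arrivals: Sequence[float],
                           departures: Sequence[float]) -> bool:
    """Constraint (3): total arrivals do not exceed total actual departures."""
    return sum(arrivals) <= sum(departures)


# Hypothetical 4-slot sample path for a single queue.
arrivals = [2, 0, 1, 0]
services = [0, 1, 1, 3]
queue, departures = simulate_queue(arrivals, services)
feasible = satisfies_constraint_3(arrivals, departures)
```

In this toy sample path the backlog built up in slot 0 is cleared by the end of the horizon, so the action sequence is feasible for NUM in the sense of (3).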
Note that the above optimization problem is a natural analogue of the traditional stochastic network optimization problem [4], where the time-average utility is maximized subject to certain network stability constraints. Indeed, if we consider a stochastic network with an infinite time horizon, then the objective (2) is equivalent to maximizing time-average network utility, and the constraint (3) requires that the time-average arrival rate to each queue should not exceed the time-average service rate, which is equivalent to rate stability.²

A. Asymptotic Notations

Let $f$ and $g$ be two functions defined on some subset of real numbers. Then $f(x) = O(g(x))$ if $\limsup_{x \to \infty} \frac{f(x)}{g(x)} < \infty$. Similarly, $f(x) = \Omega(g(x))$ if $\liminf_{x \to \infty} \frac{f(x)}{g(x)} > 0$. Also, $f(x) = \Theta(g(x))$ if $f(x) = O(g(x))$ and $f(x) = \Omega(g(x))$. In addition, $f(x) = o(g(x))$ if $\lim_{x \to \infty} \frac{f(x)}{g(x)} = 0$, and in this case we say that $f(x)$ is sublinear in $g(x)$.

² A network is rate-stable under a control policy $\pi$ if $\sum_{i \in \mathcal{N}} Q_i^{\pi}(T)/T \to 0$ as $T \to \infty$.

B. Performance Metrics

Our objective is to find a causal control policy that can maximize the network utility while keeping the total queue length small. Note that a network with adversarial dynamics may not have any steady state or well-defined time averages. Hence, it is crucial to understand the transient behavior of the network, and the traditional equilibrium-based performance metrics may not be appropriate in an adversarial setting. As a result, we introduce the notion of utility regret to measure the finite-time performance achieved by a control policy.

Definition 1 (Utility Regret). Given the time horizon $T$, the utility regret achieved by a policy $\pi$ under a sequence of network events $\omega_0, \dots, \omega_{T-1}$ is defined to be

$$R_T^{\pi}(\{\omega_0, \dots, \omega_{T-1}\}) = \sum_{t=0}^{T-1} U(\alpha_t^*, \omega_t) - \sum_{t=0}^{T-1} U(\alpha_t^{\pi}, \omega_t), \tag{4}$$

where $\{\alpha_t^*\}$ is an optimal solution to NUM generated by an oracle that knows the entire sequence of network events $\{\omega_0, \dots, \omega_{T-1}\}$ in advance.

In this setup, a policy $\pi$ is chosen and then the adversary selects the sequence of network events $\{\omega_0, \dots, \omega_{T-1}\}$ that maximizes the regret. Intuitively, the notion of utility regret captures the worst-case utility difference between a causal policy and an ideal $T$-slot lookahead non-causal policy. Note that any optimal solution $\{\alpha_t^*\}$ to NUM is a utility-maximizing policy subject to the constraint (3) that it clears all the backlogs within the time horizon. A causal control policy may trivially maximize the network utility by simply ignoring the stability constraint (3) (e.g., admitting all the exogenous traffic) such that the utility regret becomes zero or even negative. However, such an action may significantly violate the stability constraint (3) and lead to large total queue length. As a result, there is a tradeoff between the utility regret and the total queue length achieved by a causal control policy.

A desirable first-order characteristic of a "good" policy $\pi$ is that it simultaneously achieves sublinear utility regret and sublinear queue length w.r.t. the time horizon $T$, i.e., $R_T^{\pi} = o(T)$ and $\sum_{i \in \mathcal{N}} Q_i^{\pi}(T) = o(T)$.
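To make Definition 1 concrete, here is a minimal Python sketch that evaluates (4) from two per-slot utility sequences. It assumes the oracle's utilities are already available (computing them requires solving NUM offline, which is not shown); the function name `utility_regret` and the numbers are hypothetical.

```python
from typing import Sequence


def utility_regret(oracle_utils: Sequence[float],
                   policy_utils: Sequence[float]) -> float:
    """Definition 1: sum_t U(alpha*_t, w_t) - sum_t U(alpha^pi_t, w_t)."""
    if len(oracle_utils) != len(policy_utils):
        raise ValueError("both sequences must cover the same time horizon")
    return sum(oracle_utils) - sum(policy_utils)


# Hypothetical per-slot utilities over a 4-slot horizon.
oracle = [1, 1, 0, 1]          # oracle respects constraint (3)
cautious = [1, 0, 0, 1]        # causal policy that admits less traffic
admit_all = [1, 1, 1, 1]       # policy that ignores constraint (3)

regret_cautious = utility_regret(oracle, cautious)
regret_admit_all = utility_regret(oracle, admit_all)
```

The second policy illustrates why regret alone is not enough: ignoring constraint (3) can push the regret below zero, at the cost of backlog that the metric in (4) does not see.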
Sublinear utility regret guarantees that $R_T^{\pi}/T \to 0$ as the time horizon $T \to \infty$, meaning that the time-average utility gained under policy $\pi$ asymptotically approaches that under the optimal non-causal policy. In other words, the long-term time-average utility is optimal. Sublinear queue length ensures $\sum_{i \in \mathcal{N}} Q_i^{\pi}(T)/T \to 0$ as $T \to \infty$, which is equivalent to rate stability. Note that simultaneously achieving sublinear utility regret and sublinear queue length is equivalent to maximizing long-term time-average utility subject to rate stability, which is the goal of traditional stochastic network optimization [4].

Note that simultaneously achieving sublinear utility regret and sublinear queue length is just a coarse-grained requirement for a good tradeoff between utility regret and queue length. In an adversarial setting with no steady state, the fine-grained growth rates of utility regret and queue length are equally important and should also be well balanced. A better tradeoff in terms of their growth rates implies that the policy has a better learning ability and can adapt to the adversarial environment faster.

Unfortunately, the following theorem shows that in general no causal policy can simultaneously achieve both sublinear utility regret and sublinear queue length.

Theorem 1. For any causal policy $\pi$, there exists a sequence of network events $\omega_0, \dots, \omega_{T-1}$ such that either the utility regret $R_T^{\pi}(\{\omega_0, \dots, \omega_{T-1}\}) = \Omega(T)$ or the total queue length $\sum_{i \in \mathcal{N}} Q_i^{\pi}(T) = \Omega(T)$.

Proof: We prove this theorem by considering a specific one-hop network with 2 users and constructing a sequence of adversarial network dynamics such that either the utility regret or the total queue length grows at least linearly with the time horizon $T$. More specifically, the time horizon is split into two parts. In the first $T/2$ slots, the adversary just generates some regular network events, lets the policy run and observes the queue lengths of the two users.
In the remaining $T/2$ slots, the adversary sets the capacity to zero for the user with a longer queue and creates sufficient capacity for the other user such that the performance of the causal policy is significantly degraded while the oracle can still perform very well. See Appendix A for details.

Theorem 1 shows that it is impossible to achieve sublinear utility regret while maintaining sublinear queue length if the adversary has unconstrained power in determining the network dynamics. As a result, in the following two sections, we develop two adversary models where the sequence of network events (i.e., the network dynamics) that the adversary can select is constrained to some admissible set. In Section III, we consider the $W$-constrained adversary model, which is an extension of the widely-known yet very stringent model used in Adversarial Queueing Theory. Next, in Section IV, we develop a more relaxed adversary model called the $V_T$-constrained adversary. Lower bounds on the tradeoffs between utility regret and queue length, as well as the performance of some commonly-used algorithms, are analyzed under the two adversary models.

III. W-CONSTRAINED ADVERSARY MODEL

In this section, we investigate the $W$-constrained adversary model, which is an extension of the classical Adversarial Queueing Theory (AQT) model [8]. It has stringent constraints on the set of admissible network dynamics that the adversary can set, yet is analytically tractable, which facilitates our subsequent investigation of a more relaxed adversary model in Section IV. We first give the definition of $W$-constrained network dynamics.

Definition 2 ($W$-Constrained Dynamics). Given a window size $W \in \{1, \dots, T\}$, a sequence of network events $\omega_0, \dots, \omega_{T-1}$ is $W$-constrained if

$$\sum_{\tau=t}^{t+W-1} a_i^*(\tau) \le \sum_{\tau=t}^{t+W-1} b_i^*(\tau), \quad \forall i \in \mathcal{N},\ \forall t, \tag{5}$$

where $\{a^*(t)\}$ and $\{b^*(t)\}$ are the arrivals and services under the optimal solution to NUM for the above sequence of network events. Note that if there exist multiple optimal solutions to NUM, then constraint (5) is only required to be satisfied by any one of them. Any network satisfying the above is called a $W$-constrained network.

In other words, under the optimal (possibly non-causal) policy, the total amount of arrivals to each queue does not exceed the total amount of service offered to that queue during any window of $W$ slots. Denote by $\mathcal{W}$ the set of all sequences of network events $\{\omega_0, \dots, \omega_{T-1}\}$ that are $W$-constrained. Then the $W$-constrained adversary can only select the sequence of network events from the constrained set $\mathcal{W}$.

In the following, we first provide a lower bound on the tradeoffs between utility regret and queue length under the $W$-constrained adversary model (Section III-A), and then analyze the tradeoffs achieved by several common control policies (Section III-B). Note that throughout this section we mainly focus on the dependence of utility regret and queue length on $W$ and $T$ while treating the number of users $N$ as a constant.

A. Lower Bound on the Tradeoffs

The following theorem provides a lower bound on the tradeoffs between utility regret and queue length under the $W$-constrained adversary model.

Theorem 2. For any causal policy $\pi$, there exists a sequence of network events $\{\omega_0, \dots, \omega_{T-1}\} \in \mathcal{W}$ such that

$$R_T^{\pi}(\{\omega_0, \dots, \omega_{T-1}\}) + c \sum_{i \in \mathcal{N}} Q_i^{\pi}(T) \ge c\, W,$$

where $c > 0$ is some constant.

Proof: We prove this theorem by constructing a sequence of network events such that the lower bound is attained. The construction is similar to the one used in the proof of Theorem 1. The difference is that the constructed sequence of network events is $W$-constrained here. See Appendix B for the detailed proof.

Note that if the window size $W$ is comparable with the time horizon $T$, i.e., $W = \Theta(T)$, the above theorem shows that no causal policy can simultaneously achieve sublinear utility regret and sublinear queue length under the $W$-constrained adversary model.
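The window condition in Definition 2 can be checked mechanically once one optimal solution to NUM is known. The Python sketch below is a hedged illustration: it assumes the per-slot arrivals and services under that optimal solution are given as inputs (finding them is the oracle's job and is not shown), and it interprets "any window of $W$ slots" as every run of $W$ consecutive slots that fits inside the horizon. The name `is_w_constrained` is hypothetical.

```python
from typing import Sequence


def is_w_constrained(opt_arrivals: Sequence[Sequence[float]],
                     opt_services: Sequence[Sequence[float]],
                     window: int) -> bool:
    """Check the window condition on every run of `window` consecutive slots.

    opt_arrivals[t][i] and opt_services[t][i] are the arrivals to and the
    service offered to queue i in slot t under one optimal solution to NUM.
    """
    horizon = len(opt_arrivals)
    if len(opt_services) != horizon:
        raise ValueError("arrivals and services must cover the same slots")
    if not 1 <= window <= horizon:
        raise ValueError("window size must lie in {1, ..., T}")
    num_queues = len(opt_arrivals[0])
    for start in range(horizon - window + 1):
        slots = range(start, start + window)
        for i in range(num_queues):
            arrived = sum(opt_arrivals[t][i] for t in slots)
            served = sum(opt_services[t][i] for t in slots)
            if arrived > served:
                return False          # this window violates the condition
    return True


# Hypothetical single-queue sample path under the optimal solution.
opt_a = [[0], [1], [0], [1]]
opt_s = [[1], [1], [1], [1]]
ok_w1 = is_w_constrained(opt_a, opt_s, window=1)
bursty_a = [[2], [0], [2], [0]]
bursty_w1 = is_w_constrained(bursty_a, opt_s, window=1)
bursty_w2 = is_w_constrained(bursty_a, opt_s, window=2)
```

The bursty path fails the check with $W = 1$ but passes with $W = 2$, which matches the intuition that a larger window admits more variation in the network dynamics.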
On the other hand, if $W = o(T)$, there might exist some causal policy that attains sublinear utility regret and sublinear queue length simultaneously, which we investigate in the next section. In particular, we show that the above lower bound can be asymptotically attained by some causal policy.

B. Algorithm Performance in W-Constrained Networks

In this section, we analyze the tradeoffs between utility regret and queue length achieved by two network control algorithms under the $W$-constrained adversary model. The first is the famous Drift-plus-Penalty algorithm [2] that was proved to achieve good utility-delay tradeoffs in stochastic networks. The second is a generalized version of the Tracking Algorithm [5], [6] that was originally proposed for Adversarial Queueing Theory. In particular, we show that the Tracking Algorithm attains the tradeoff lower bound in Theorem 2.

1) Drift-plus-Penalty Algorithm: In each slot $t$, the Drift-plus-Penalty algorithm observes the current network event $\omega_t$ and the queue length vector $Q(t)$, and chooses the following control action $\alpha_t^{DP}$:

$$\alpha_t^{DP} = \arg\max_{\alpha_t \in D_{\omega_t}} \ \sum_{i \in \mathcal{N}} Q_i(t) \big[ b_i(\alpha_t, \omega_t) - a_i(\alpha_t, \omega_t) \big] + V\, U(\alpha_t, \omega_t), \tag{6}$$

where $V > 0$ is a parameter controlling the tradeoffs between utility regret and queue length. Note that $\sum_{i} Q_i(t) [ b_i(\alpha_t, \omega_t) - a_i(\alpha_t, \omega_t) ]$ corresponds to the "drift" part while $V\, U(\alpha_t, \omega_t)$ is the "penalty" part.

The control actions in the Drift-plus-Penalty algorithm can usually be decomposed into several actions. For example, in one-hop networks without routing, $a(t)$ corresponds to the admitted exogenous arrival vector in slot $t$. Suppose that the utility function is of the form $U(\alpha_t, \omega_t) = \sum_i U_i(a_i(t))$. Then the Drift-plus-Penalty algorithm can be decomposed into the solutions of two sub-problems.

(Admission Control) Choose $a(t) = \arg\max_{a} \ V \sum_i U_i(a_i) - \sum_i Q_i(t)\, a_i$.
(Resource Allocation and Scheduling) Choose $b(t) = \arg\max_{b} \ \sum_i Q_i(t)\, b_i$.

The first part is usually a convex optimization problem while the second part corresponds to the MaxWeight policy [1].

The following theorem gives the performance of the Drift-plus-Penalty algorithm in $W$-constrained networks.

Theorem 3. In any $W$-constrained network, the Drift-plus-Penalty algorithm with parameter $V$ achieves $O\!\left(\frac{TW}{V}\right)$ utility regret and the total queue length is $O\!\left(\sqrt{T(W + V)}\right)$.

Proof: The proof is based on Lyapunov drift analysis. However, instead of considering the one-slot drift as in the traditional stochastic analysis, we find upper bounds on the $W$-slot drift-plus-penalty term and make sample-path arguments. See Appendix C for details.

There are several important observations about Theorem 3. First, if the parameter $V$ is set appropriately, then sublinear utility regret and sublinear queue length can be simultaneously achieved by the Drift-plus-Penalty algorithm in $W$-constrained networks as long as $W = o(T)$. For example, if $W = \Theta(T^{1/2})$, then setting $V = \Theta(T^{3/4})$ yields a utility regret of $O(T^{3/4})$ and a total queue length of $O(T^{7/8})$. Noticing that sublinear utility regret and sublinear queue length cannot be achieved simultaneously by any causal policy if $W = \Omega(T)$ (Theorem 2), we have the following corollary.

Corollary 1. Under the $W$-constrained adversary model, sublinear utility regret and sublinear queue length are simultaneously achievable if and only if $W = o(T)$.

Second, the performance of the Drift-plus-Penalty algorithm could be much worse than the lower bound in Theorem 2. For example, if $W = \Theta(T^{1/2})$, then one of the tradeoffs implied by the lower bound is that the utility regret is $\Theta(T^{1/2})$ and the total queue length is also $\Theta(T^{1/2})$, which is not achievable by the Drift-plus-Penalty algorithm. In the next section, we develop an algorithm that has better performance and attains the lower bound.

2) Tracking Algorithm: The tradeoff bounds achieved by the Drift-plus-Penalty algorithm are relatively loose compared to the lower bound in Theorem 2. In this section, we develop the Tracking Algorithm, which has better performance and attains the lower bound in Theorem 2. The original Tracking Algorithm was proposed in [5], [6] to solve a scheduling problem under the Adversarial Queueing Theory model. However, it only works for a very specific network model: the network has to be single-hop where the arrival vector is independent of the control action, and the control action has to satisfy the primary interference constraints, i.e., only one link incident on the same node can be activated in each slot. Next, we extend the original Tracking Algorithm to accommodate the general network model considered in this paper.

Let $\Omega$ be the set of all possible network events that could happen in each slot. In order for the Tracking Algorithm to work, the cardinality of $\Omega$ has to be finite (otherwise it could be discretized into a finite set as in [5]). For example, in a single-hop network, suppose each network event $\omega_t$ corresponds to a couple $(A(t), S(t))$, where $A(t)$ is a vector of exogenous packet arrivals in slot $t$ and $S(t)$ is a vector of link states in slot $t$. For any link $i$ and time $t$, assume that $0 \le A_i(t) \le B$ and $A_i(t)$ is an integer, and each link only has a finite number of $S$ states. Then $|\Omega| = (SB)^N$. The Tracking Algorithm is given in Algorithm 1.
It maintains an action queue $\mathcal{Q}_{\omega}$ for each type of network event $\omega \in \Omega$. The action queue $\mathcal{Q}_{\omega}$ stores the optimal actions that the Tracking Algorithm should have taken when network event $\omega$ occurred. Note that the sequence of optimal control actions cannot be calculated online but can be calculated every $W$ slots due to the window structure (5). In the Tracking Algorithm, the sequence of optimal actions during each window is added to the action queues in batch at the end of this window (steps 8-9). Here, the optimal actions during a window $[t - W + 1, t]$ correspond to any optimal solution to (7), which is also a part of the optimal solution to NUM. In each slot $t$, the Tracking Algorithm first observes the current network event $\omega_t = \omega$. If the corresponding action queue $\mathcal{Q}_{\omega}$ is not empty (i.e., there are some actions we should have taken but have not taken yet), the algorithm just sets the control action as the first action in the action queue $\mathcal{Q}_{\omega}$, and the action is removed from the action queue $\mathcal{Q}_{\omega}$ (steps 3-5). If the action queue is empty, the algorithm may take any feasible action. In our analysis, we assume that no action is taken when the action queue is empty.

$$\max \ \sum_{\tau=t-W+1}^{t} U(\alpha_\tau, \omega_\tau) \quad \text{s.t.} \quad \sum_{\tau=t-W+1}^{t} a_i(\alpha_\tau, \omega_\tau) \le \sum_{\tau=t-W+1}^{t} b_i(\alpha_\tau, \omega_\tau), \ \forall i; \qquad \alpha_\tau \in D_{\omega_\tau}, \ \forall \tau. \tag{7}$$

Algorithm 1 Tracking Algorithm (TA)
 1: Initialize $\mathcal{Q}_{\omega} = \emptyset$ for each $\omega \in \Omega$.
 2: for $t = 0, \dots, T-1$ do
 3:   Observe the current network event $\omega_t = \omega$.
 4:   if action queue $\mathcal{Q}_{\omega}$ is not empty then
 5:     Choose the control action $\alpha_t^{TA}$ as the first action in $\mathcal{Q}_{\omega}$ and remove this action from $\mathcal{Q}_{\omega}$.
 6:   end if
 7:   if $\mathrm{mod}(t, W) = W - 1$ then
 8:     Compute the sequence of optimal control actions $\{\alpha_\tau^*\}_{\tau=t-W+1}^{t}$ in the past window $[t - W + 1, t]$, which is any optimal solution to (7).
 9:     For each slot $\tau$ in the past window $[t - W + 1, t]$, enqueue the computed optimal action $\alpha_\tau^*$ into the action queue $\mathcal{Q}_{\omega_\tau}$, where $\omega_\tau$ is the network event occurring in slot $\tau$.
10:   end if
11: end for

The following theorem gives the tradeoff between utility regret and queue length achieved by the Tracking Algorithm under the $W$-constrained adversary model.

Theorem 4. In any $W$-constrained network, the Tracking Algorithm achieves $O(W)$ utility regret and the total queue length is $O(W)$.

Proof: Since the Tracking Algorithm updates the optimal actions every $W$ slots and replays these actions whenever possible, the number of unfulfilled actions in any action queue is at most $W$. Thus, the performance gap between the Tracking Algorithm and the optimal policy is also $O(W)$. See Appendix D for details.

There are several important observations about Theorem 4. First, under the $W$-constrained adversary model, sublinear utility regret and sublinear queue length can be simultaneously achieved by the Tracking Algorithm as long as $W = o(T)$. Moreover, the tradeoff achieved by the Tracking Algorithm is better than that of the Drift-plus-Penalty algorithm, in terms of their dependence on $W$ and $T$. For example, if $W = \Theta(T^{1/2})$, the Tracking Algorithm can achieve $O(T^{1/2})$ utility regret and $O(T^{1/2})$ total queue length, while such a tradeoff is not attainable by the Drift-plus-Penalty algorithm.
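The bookkeeping of Algorithm 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: network events are hashable values, `solve_window` is a hypothetical user-supplied oracle that returns one optimal action per slot for problem (7) on a completed window, and `None` stands for "no action taken" when an action queue is empty (the convention used in the analysis above).

```python
from collections import defaultdict, deque
from typing import Callable, Hashable, List, Optional, Sequence


def tracking_algorithm(
    events: Sequence[Hashable],
    window: int,
    solve_window: Callable[[Sequence[Hashable]], Sequence[object]],
) -> List[Optional[object]]:
    """Replay per-window optimal actions, following the steps of Algorithm 1.

    Returns the action taken in each slot (None when the queue was empty).
    """
    action_queues = defaultdict(deque)       # step 1: one queue per event type
    taken: List[Optional[object]] = []
    for t, event in enumerate(events):       # steps 2-3: observe the event
        pending = action_queues[event]
        # Steps 4-6: replay the oldest stored action for this event type.
        taken.append(pending.popleft() if pending else None)
        if t % window == window - 1:         # step 7: a window just ended
            past = events[t - window + 1: t + 1]
            optimal = solve_window(past)     # step 8: offline solution of (7)
            for past_event, action in zip(past, optimal):
                action_queues[past_event].append(action)   # step 9
    return taken


def toy_solver(past_events):
    """Hypothetical stand-in for an optimal solver of (7)."""
    return ["serve" if e == "good" else "idle" for e in past_events]


events = ["good", "bad", "good", "bad", "good", "good"]
taken = tracking_algorithm(events, window=2, solve_window=toy_solver)
```

In this toy run nothing is replayed during the first window, and the last slot finds its queue empty because the most recent window has not been solved yet; this lag of at most one window is what the $O(W)$ bounds in Theorem 4 reflect.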

Second, the Tracking Algorithm asymptotically achieves the lower bound in Theorem 2 in the sense that it ensures that $R_T^{TA}(\{\omega_0, \dots, \omega_{T-1}\}) + \sum_{i \in \mathcal{N}} Q_i^{TA}(T) = O(W)$ for any $\{\omega_0, \dots, \omega_{T-1}\} \in \mathcal{W}$. As a result, the Tracking Algorithm asymptotically achieves the optimal tradeoff between utility regret and queue length w.r.t. $W$ and $T$.

Third, the Tracking Algorithm needs to maintain a virtual queue for each type of network event, while the size of the network event space $\Omega$ may be exponential in the number of users $N$. As a result, the Tracking Algorithm may not be a practical algorithm. The purpose of presenting the Tracking Algorithm is to demonstrate that the lower bound in Theorem 2 can be asymptotically achieved by a causal policy. Note that Andrews and Zhang [5] proposed a method to get rid of the exponential dependence on $N$, at the expense of a much more involved algorithm.

Finally, the Tracking Algorithm described in Algorithm 1 only achieves one point on the tradeoff curve since it only tracks the optimal solution to NUM. One approach to enable tunable tradeoffs is to relax the optimization problem (7). For example, the first constraint in (7) can be modified to

$$\sum_{\tau=t-W+1}^{t} a_i(\alpha_\tau, \omega_\tau) \le \sum_{\tau=t-W+1}^{t} b_i(\alpha_\tau, \omega_\tau) + V, \quad \forall i,$$

for some parameter $V$. Clearly, by tuning the value of $V$, the optimal solution to the relaxed version of (7) (denoted by $\{\alpha_t^V\}$) can achieve different tradeoffs. By tracking the solution $\{\alpha_t^V\}$, the Tracking Algorithm can achieve tunable tradeoffs. The analysis of the tunable Tracking Algorithm is similar to the proof of Theorem 4 but requires more specific assumptions on the utility function, and is omitted due to space constraints.

Note that the above Tracking Algorithm requires $W$ as a parameter. We discuss how to properly select the value of $W$ in Section IV-C2.

IV. V_T-CONSTRAINED ADVERSARY MODEL

The aforementioned $W$-constrained model is relatively restrictive, since the stringent constraints (5) have to be satisfied for every window of $W$ slots. In this section, we consider a more general adversary model where the window constraints (5) are relaxed.
he new adversary model s parameterzed by the nherent varaton n the sequence of network events, whch s measured as follows. Gven a sequence of network events ω 0,, ω and a possbly non-causal polcy, we defne V π {ω 0,, ω } = max t Q π t. he above functon measures the peak queue length acheved by polcy π durng ts sample path. We further defne V {ω 0,, ω } to be the peak queue length durng the sample path of the optmal soluton to NUM under the sequence of network events ω 0,, ω. If there are multple optmal solutons to NUM, then the one wth the smallest value of V s consdered. Note that V only depends on {ω 0,, ω } and measures the nherent varatons n the sequence of network events. Now we defne the noton of V -constraned network dynamcs where the value of V s constraned by some budget V. Defnton 3 V -Constraned Dynamcs. Gven some V 0, N B, a sequence of network events ω 0,, ω s V -constraned f V {ω 0,, ω } V. Any network satsfyng the above s called a V -constraned network. Denote by V the set of all possble sequences of network events that are V -constraned. A V -constraned adversary can only select the sequence of network events from the set V. Note that we restrct the range of V to 0, N B snce the peak queue length wthn slots s at most N B. Any larger value of V has the same effect as V = N B. Note also that the larger V s, the more varatons the network could have. By varyng the value of V from 0 to N B, the above V -constraned adversary model covers a wde range of adversaral settngs: from a strctly constraned adversary V = 0,.e., the arrvals should not exceed the servces for each queue n every slot to a completely unconstraned adversary V = N B. In the followng, we frst provde a lower bound on the tradeoffs between utlty regret and queue length under the V -constraned adversary model n Secton IV-A and then analyze the performance of the Drft-plus-Penalty polcy and the rackng Algorthm n Secton IV-B. A. 
Lower Bound on the Tradeoffs

The following theorem provides a lower bound on the tradeoffs between utility regret and queue length under the $V_T$-constrained adversary model.

Theorem 5. For any causal policy $\pi$, there exists a sequence of network events $\{\omega_0,\dots,\omega_{T-1}\} \in \mathcal{V}_T$ such that

$$R_T^{\pi}(\{\omega_0,\dots,\omega_{T-1}\}) + c \sum_i Q_i^{\pi}(T) \ge c' V_T,$$

where $c, c' > 0$ are constants.

Proof: The proof is the same as that for Theorem 2 except that we replace $W$ with $V_T$, and is thus omitted for brevity.

Theorem 5 shows that if $V_T = \Omega(T)$, then no causal policy can simultaneously achieve sublinear utility regret and sublinear queue length under the $V_T$-constrained adversary model. On the other hand, if $V_T = o(T)$, there might exist some causal policy that attains sublinear utility regret and sublinear queue length simultaneously, which we investigate in Section IV-B.
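The peak-queue measure $V^{\pi}$ and the $V_T$-constrained condition of Definition 3 can be illustrated numerically. The sketch below is not taken from the report: the function names are hypothetical, and it assumes the queue update $Q_i(t+1) = \max\{Q_i(t) + a_i(t) - b_i(t), 0\}$ with empty initial queues, which may differ in detail from the dynamics defined earlier in the paper.

```python
def peak_queue_length(admitted, served):
    """Peak total queue length along one sample path.

    admitted[t][i] and served[t][i] are the admitted arrivals a_i(t) and the
    offered service b_i(t) of queue i in slot t.
    Assumption: Q_i(t+1) = max(Q_i(t) + a_i(t) - b_i(t), 0), Q_i(0) = 0.
    """
    queues = [0.0] * len(admitted[0])
    peak = 0.0
    for a_t, b_t in zip(admitted, served):
        queues = [max(q + a - b, 0.0) for q, a, b in zip(queues, a_t, b_t)]
        peak = max(peak, sum(queues))
    return peak


def is_v_constrained(admitted_opt, served_opt, budget):
    """Check Definition 3 for the sample path of an optimal NUM solution.

    The caller must supply the optimal (possibly non-causal) decisions;
    this helper does not solve NUM itself.
    """
    return peak_queue_length(admitted_opt, served_opt) <= budget
```

For instance, with two links that each receive 2 packets in slot 0, and an oracle that serves link 1 in slot 0 and link 2 in slot 1, `peak_queue_length([[2, 2], [0, 0]], [[2, 0], [0, 2]])` evaluates to 2.0, so this two-slot sequence is $V_T$-constrained for any budget of at least 2.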

B. Algorithm Performance in $V_T$-Constrained Networks

In this section, we analyze the tradeoffs between utility regret and queue length achieved by two algorithms in $V_T$-constrained networks: the Drift-plus-Penalty algorithm and the Tracking Algorithm. In particular, we show that both algorithms simultaneously achieve sublinear utility regret and sublinear queue length if $V_T = o(T)$.

1) Drift-plus-Penalty Algorithm: The Drift-plus-Penalty algorithm discussed in Section III-B can be directly applied to the $V_T$-constrained setting. The following theorem gives the tradeoffs between utility regret and queue length achieved by the Drift-plus-Penalty algorithm under the $V_T$-constrained adversary model.

Theorem 6. In any $V_T$-constrained network, the Drift-plus-Penalty algorithm with parameter $V > 0$ achieves

$$O\!\left(\frac{T^{2/3} V_T^{4/3}}{V} + \frac{V_T^{1/3} T^{7/6}}{V^{1/2}}\right)$$

utility regret and the total queue length is

$$O\!\left(V_T^{1/3} T^{2/3} + T^{1/2} V^{1/2}\right).$$

Proof: We first divide the time horizon into frames of $W$ slots. Then we apply the analysis used in the $W$-constrained adversary model and derive bounds on the $W$-slot drift-plus-penalty term, which further leads to upper bounds on utility regret and queue length. The value of $W$ is carefully chosen to optimize these bounds. See Appendix E for details.

There are several observations about Theorem 6. First, the Drift-plus-Penalty algorithm achieves sublinear utility regret and sublinear queue length under the $V_T$-constrained adversary model whenever $V_T = o(T)$. For example, if $V_T = \Theta(T^{1/2})$ and we set $V = \Theta(T^{5/6})$, then the utility regret and the total queue length are both $O(T^{11/12})$. Notice that sublinear utility regret and sublinear queue length cannot be simultaneously achieved by any causal policy if $V_T = \Omega(T)$ (Theorem 5). We have the following corollary.

Corollary 2. Under the $V_T$-constrained adversary model, sublinear utility regret and sublinear queue length are simultaneously achievable if and only if $V_T = o(T)$.

Second, the Drift-plus-Penalty algorithm does not attain the lower bound in Theorem 5.
For example, if $V_T = \Theta(T^{1/2})$, one of the tradeoffs implied by the lower bound is that the utility regret is $\Theta(T^{1/2})$ and the total queue length is also $\Theta(T^{1/2})$, which is not achievable by the Drift-plus-Penalty algorithm. In fact, although the Drift-plus-Penalty algorithm can achieve sublinear utility regret and sublinear queue length, the tradeoff bound in Theorem 6 is relatively loose. In the next section, we show that the Tracking Algorithm can achieve a better tradeoff bound than the Drift-plus-Penalty algorithm.

2) Tracking Algorithm: The Tracking Algorithm introduced under the $W$-constrained adversary model requires that the window constraints (5) be satisfied for some window size $W$. However, there might be no window structure under the $V_T$-constrained adversary model, and thus the Tracking Algorithm cannot be directly applied in $V_T$-constrained networks. We slightly modify the Tracking Algorithm in two aspects. First, the window size $W$ is set to be $W = \sqrt{T V_T}$ under the $V_T$-constrained adversary model. Second, in step 8 of the original Tracking Algorithm, the optimization problem (7) is modified to be

$$\begin{aligned}
\max \quad & \sum_{\tau=t-W+1}^{t} U(\alpha_\tau, \omega_\tau) \\
\text{s.t.} \quad & \sum_{\tau=t-W+1}^{t} a_i(\alpha_\tau, \omega_\tau) \le \sum_{\tau=t-W+1}^{t} b_i(\alpha_\tau, \omega_\tau) + V_T, \quad \forall i, \\
& \alpha_\tau \in D_{\omega_\tau}, \quad \forall \tau.
\end{aligned} \tag{8}$$

In particular, the first constraint in (7) is relaxed by allowing some bursts up to $V_T$. Note that by the definition of $V_T$-constrained networks, the optimal solution to NUM is also a feasible solution to (8).

Under the above setting, the utility regret and the total queue length achieved by the Tracking Algorithm in $V_T$-constrained networks are given by the following theorem (see footnote 3).

Theorem 7. Under the $V_T$-constrained adversary model, the Tracking Algorithm achieves $O(\sqrt{T V_T})$ utility regret and the total queue length is $O(\sqrt{T V_T})$.

Proof: The proof is similar to the analysis under the $W$-constrained adversary model, except that an additional $V_T$ term is added in the first constraint of (8). See Appendix F for details.

There are several important observations about Theorem 7. First, the Tracking Algorithm can simultaneously achieve sublinear utility regret and sublinear queue length whenever $V_T = o(T)$.
Second, the performance of the Tracking Algorithm is better than that of the Drift-plus-Penalty algorithm in $V_T$-constrained networks. For example, if we set $W = \Theta(\sqrt{T V_T})$ and $V_T = \Theta(\sqrt{T})$, then the Tracking Algorithm achieves $O(T^{3/4})$ utility regret and $O(T^{3/4})$ queue length, which is not achievable by the Drift-plus-Penalty algorithm. Finally, the Tracking Algorithm does not attain the tradeoff lower bound in Theorem 5. Thus, finding a causal policy that can close the gap remains an open problem.

Note that the above Tracking Algorithm requires $V_T$ as a parameter. We discuss how to properly select the value of $V_T$ in Section IV-C2.

Footnote 3: As is discussed in Section III-B2, the set of possible network events should be finite in order for the Tracking Algorithm to work.

C. Discussions

1) Relationship between Adversary Models: The $V_T$-constrained adversary model generalizes the $W$-constrained adversary model: any sequence of network events that is $W$-constrained must also be $V_T$-constrained with $V_T = O(W)$ due to the window structure (note that the peak queue length under the optimal policy is at most $NWB$). The analysis in the $V_T$-constrained adversary model also gives a more general condition for sublinear utility regret and sublinear queue length.

2) Choosing Parameters for Tracking Algorithm: Note that the Tracking Algorithm requires $V_T$ as a parameter. Unfortunately, in practice, it is impossible to know the precise value of $V_T$ for a given network in advance. To alleviate this issue, we can search for the correct value of $V_T$. Note that the range for $V_T$ is $[0, TNB]$. Then one may perform binary search to find the correct value of $V_T$ by running the Tracking Algorithm with different values of $V_T$ over multiple episodes within the time horizon (e.g., if the time horizon is $T = 10^5$ slots, then one episode could be $10^3$ slots). Similar techniques can be applied if the Tracking Algorithm is used in $W$-constrained networks, where the value of $W$ is required as an input parameter.

V. SIMULATIONS

In this section, we empirically validate the theoretical bounds derived in this paper and compare the performance of the Drift-plus-Penalty algorithm and the Tracking Algorithm. In our simulations, we consider a single-hop network with $N = 2$ users. In each slot $t$, the central controller observes the current network event $\omega_t = (A(t), S(t))$, where $A(t)$ is the exogenous arrival vector and $S(t)$ is the channel rate vector for each link in slot $t$. Then the controller makes an admission control and a scheduling decision. The constraint on the admission control action is $0 \le a_i(t) \le A_i(t)$ for each link $i$, and the constraint on the scheduling decision is that at most one of the links can be served in each slot. The network utility is $U(\alpha_t, \omega_t) = \sum_i \log(1 + a_i(t))$ (proportional fairness).

We consider a scenario where the channel rate vector in each slot is controlled by an adaptive adversary. Time is divided into frames of $W$ slots. In the first $W/2$ slots of each frame, the exogenous arrivals to each user are 10 packets/slot and the channel rate for each user is also 10 packets/slot. In the remaining slots of each frame, there are no exogenous arrivals to either user, while the channel rate is zero for the user with the longer queue and 10 packets/slot for the other user.
If the two users have the same queue length, ties are broken randomly. Such a scenario is similar to the one that we use to prove the tradeoff lower bound under the $W$-constrained adversary model (see the proof of Theorem 2), and it has been shown that this is a $W$-constrained adversary and also a $V_T$-constrained adversary with $V_T = 5W$.

Figure 1 illustrates the growth of the total queue length and the utility regret with the time horizon under the Drift-plus-Penalty algorithm with different values of $V$ and under the Tracking Algorithm. First, when $W = \Theta(\sqrt{T})$, the Drift-plus-Penalty algorithm can simultaneously achieve sublinear utility regret and sublinear queue length if the parameter $V$ is set appropriately (for example, $V = \Theta(T^{3/4})$). Note that setting $V$ to some very large value (e.g., $V = \Theta(T^2)$) still achieves sublinear utility regret and sublinear queue length, though the theoretical bound on queue length (see Theorem 3) is at least linear in $T$ when $V = \Omega(T)$, which shows that the performance upper bound is not tight in this scenario. The Tracking Algorithm also simultaneously achieves sublinear utility regret and sublinear queue length when $W = \Theta(\sqrt{T})$. However, when $W = \Theta(T)$, both algorithms fail to achieve desirable performance: either the utility regret or the queue length grows linearly with $T$. In fact, the lower bound in Theorem 2 shows that no causal policy can achieve both sublinear utility regret and sublinear queue length if $W = \Theta(T)$.

Figure 2 shows the tradeoffs between utility regret and queue length under the Drift-plus-Penalty algorithm and the Tracking Algorithm, where we fix the time horizon to be $T = 10^4$ slots and the window size $W = \Theta(\sqrt{T})$. Note that for the Drift-plus-Penalty algorithm, we plot a tradeoff curve since it achieves different tradeoffs by tuning the parameter $V$, while only a single tradeoff point is plotted for the Tracking Algorithm. It is observed that the Tracking Algorithm achieves a better tradeoff point that is not achievable by the Drift-plus-Penalty algorithm.
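The adaptive-adversary scenario used in these simulations can be reproduced in a few lines. The following is a minimal sketch rather than the authors' simulation code: all function names are hypothetical, the queue update $Q_i(t+1) = \max\{Q_i(t) + a_i(t) - b_i(t), 0\}$ is an assumption, and the per-slot rule is a textbook form of Drift-plus-Penalty (maximize $V \log(1 + a_i) - Q_i a_i$ for admission, serve the link with the largest $Q_i S_i$), which may differ in detail from the version in Section III-B.

```python
import math
import random

RATE = 10.0  # packets/slot, as in the simulated scenario


def network_event(t, W, queues, rng):
    """Return (arrivals, channel_rates) for slot t under the adaptive adversary."""
    if t % W < W // 2:
        # First half of each frame: arrivals and rates are 10 for both users.
        return [RATE, RATE], [RATE, RATE]
    # Second half: no arrivals; the user with the longer queue gets rate zero.
    if queues[0] == queues[1]:
        blocked = rng.randrange(2)  # ties are broken randomly
    else:
        blocked = 0 if queues[0] > queues[1] else 1
    rates = [RATE, RATE]
    rates[blocked] = 0.0
    return [0.0, 0.0], rates


def drift_plus_penalty_step(queues, arrivals, rates, V):
    """One slot of an assumed textbook Drift-plus-Penalty rule."""
    # Admission: maximize V*log(1 + a) - Q*a over 0 <= a <= A, per user.
    admitted = []
    for q, a_max in zip(queues, arrivals):
        a = a_max if q <= 0 else min(max(V / q - 1.0, 0.0), a_max)
        admitted.append(a)
    # Scheduling: serve the single link with the largest Q_i * S_i.
    chosen = max(range(2), key=lambda i: queues[i] * rates[i])
    service = [0.0, 0.0]
    service[chosen] = rates[chosen]
    return admitted, service


def simulate(T, W, V, seed=0):
    """Return (total utility, final total queue length, peak total queue length)."""
    rng = random.Random(seed)
    queues = [0.0, 0.0]
    utility, peak = 0.0, 0.0
    for t in range(T):
        arrivals, rates = network_event(t, W, queues, rng)
        admitted, service = drift_plus_penalty_step(queues, arrivals, rates, V)
        utility += sum(math.log(1.0 + a) for a in admitted)
        # Assumed queue dynamics: Q(t+1) = max(Q(t) + a(t) - b(t), 0).
        queues = [max(q + a - b, 0.0)
                  for q, a, b in zip(queues, admitted, service)]
        peak = max(peak, sum(queues))
    return utility, sum(queues), peak
```

Calling `simulate(T, W, V)` for a range of `T` with `W` and `V` scaled as functions of `T` gives the kind of growth curves discussed for Figure 1. Computing the utility regret additionally requires the oracle utility, which this sketch does not compute.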
In addition, the theoretical lower bound for any causal policy (Theorem 2) and the theoretical performance upper bounds for both algorithms (Theorems 3 and 4) are also validated in the figure.

VI. CONCLUSIONS

In this paper, we focus on optimizing network utility within a finite time horizon under adversarial network models. We show that no causal policy can simultaneously achieve both sublinear utility regret and sublinear queue length if the network dynamics are unconstrained, and investigate two constrained adversary models. We first consider the restrictive $W$-constrained adversary model and then propose a more relaxed $V_T$-constrained adversary model. Lower bounds on the tradeoffs between utility regret and queue length are derived under the two adversary models, and the performance of two control policies, i.e., the Drift-plus-Penalty algorithm and the Tracking Algorithm, is analyzed. It is shown that the Tracking Algorithm asymptotically attains the optimal tradeoffs under the $W$-constrained adversary model and that the Tracking Algorithm has a better tradeoff bound than that of the Drift-plus-Penalty algorithm.

REFERENCES

[1] L. Tassiulas and A. Ephremides, "Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks," IEEE Transactions on Automatic Control, vol. 37, no. 12, pp. 1936-1948, 1992.
[2] M. J. Neely, E. Modiano, and C.-P. Li, "Fairness and optimal stochastic control for heterogeneous networks," IEEE/ACM Transactions on Networking (TON), vol. 16, no. 2, pp. 396-409, 2008.
[3] Y. Zou, J. Zhu, X. Wang, and L. Hanzo, "A survey on wireless security: Technical challenges, recent advances, and future trends," Proceedings of the IEEE, vol. 104, no. 9, pp. 1727-1765, 2016.
[4] M. J. Neely, "Stochastic network optimization with application to communication and queueing systems," Synthesis Lectures on Communication Networks, vol. 3, no. 1, pp. 1-211, 2010.
[5] M. Andrews and L.
Zhang, "Scheduling over a time-varying user-dependent channel with applications to high speed wireless data," in The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings., 2002, pp. 293-302.
[6] M. Andrews and L. Zhang, "Scheduling over nonstationary wireless channels with finite rate sets," IEEE/ACM Transactions on Networking, vol. 14, no. 5, pp. 1067-1077, Oct. 2006.

[Figure 1: four panels plotting Total Queue Length and Utility Regret against Time Horizon for the Tracking Algorithm and for Drift-plus-Penalty with $V = \Theta(1)$, $V = \Theta(T^{3/4})$ and $V = \Theta(T^2)$. Panels: (a) Queue Length ($W = \Theta(\sqrt{T})$), (b) Utility Regret ($W = \Theta(\sqrt{T})$), (c) Queue Length ($W = \Theta(T)$), (d) Utility Regret ($W = \Theta(T)$).]

Fig. 1. Growth of total queue length and utility regret with the time horizon under an adaptive $W$-adversary.

[Figure 2: Utility Regret versus Total Queue Length, showing Drift-plus-Penalty (simulated performance and theoretical upper bound), Tracking Algorithm (simulated performance and theoretical upper bound), and the lower bound for any causal policy.]

Fig. 2. Tradeoffs between utility regret and total queue length (double log scale). The time horizon is fixed to be $T = 10^4$ slots and $W = \Theta(\sqrt{T})$.

[7] R. L. Cruz, "A calculus for network delay. I. Network elements in isolation," IEEE Transactions on Information Theory, vol. 37, no. 1, pp. 114-131, 1991.
[8] A. Borodin, J. Kleinberg, P. Raghavan, M. Sudan, and D. P. Williamson, "Adversarial queuing theory," Journal of the ACM (JACM), vol. 48, no. 1, pp. 13-38, 2001.
[9] M. Andrews, B. Awerbuch, A. Fernandez, T. Leighton, Z. Liu, and J. Kleinberg, "Universal-stability results and performance bounds for greedy contention-resolution protocols," Journal of the ACM (JACM), vol. 48, no. 1, pp. 39-69, 2001.
[10] V. Cholvi and J. Echague, "Stability of FIFO networks under adversarial models: State of the art," Computer Networks, vol. 51, no. 15, pp. 4460-4474, 2007.
[11] M. Andrews, K. Jung, and A.
Stolyar, "Stability of the max-weight routing and scheduling protocol in dynamic networks and at critical loads," in Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, ser. STOC '07. ACM, 2007, pp. 145-154.
[12] S. Lim, K. Jung, and M. Andrews, "Stability of the max-weight protocol in adversarial wireless networks," IEEE/ACM Trans. Netw., vol. 22, no. 6, pp. 1859-1872, Dec. 2014.
[13] M. J. Neely, "Universal scheduling for networks with arbitrary traffic, channels, and mobility," in Decision and Control (CDC), 2010 49th IEEE Conference on. IEEE, 2010, pp. 1822-1829.
[14] S. Shalev-Shwartz, "Online learning and online convex optimization," Foundations and Trends in Machine Learning, vol. 4, no. 2, pp. 107-194, 2012.

APPENDIX A
PROOF OF THEOREM 1

We prove this theorem by constructing a sequence of network events $\omega_0,\dots,\omega_{T-1}$ such that either the utility regret or the total queue length grows at least linearly with the time horizon $T$.

Consider a single-hop network with 2 links. In each slot $t$, the central controller observes the current network event $\omega_t = (A(t), S(t))$, where $A(t)$ is the exogenous arrival vector and $S(t)$ is the channel rate vector for each link in slot $t$. Then the controller makes an admission control and a scheduling decision. The constraint on the admission control action is $0 \le a_i(t) \le A_i(t)$ for each link $i$, and the constraint on the scheduling decision is that at most one of the links can be served in each slot. The network utility is a function of the admitted traffic vector $a(t)$, i.e., $U(\alpha_t, \omega_t) = U(a(t)) = \sum_i U_i(a_i(t))$, where $U_i(x)$ is concave and strictly increasing in $x$. In particular, any subderivative of $U_i(x)$ over the range $x \in [0, B]$ is lower bounded by some constant $c > 0$. Typical examples of such utility functions are $U(a(t)) = \sum_i a_i(t)$ (total throughput) and $U(a(t)) = \sum_i \log a_i(t)$ (proportional fairness).

Without loss of generality, assume that the time horizon $T$ is an even number. The exogenous arrivals and channel rates in the first $T/2$ slots are

$$A_1(t) = A_2(t) = 2, \quad S_1(t) = S_2(t) = 2, \quad t = 0, \dots, \tfrac{T}{2} - 1.$$

For any causal policy $\pi$, let $B_1^{\pi}$ and $B_2^{\pi}$ be the number of packets cleared over link 1 and link 2 during the first $T/2$ slots, respectively.
Also let $A_1^{\pi}$ and $A_2^{\pi}$ be the number of admitted packets to link 1 and link 2 during the first $T/2$ slots, respectively. Then the queue length vector after the first $T/2$ slots is

$$Q_i^{\pi}(T/2) = A_i^{\pi} - B_i^{\pi}, \quad i = 1, 2.$$

Under the scheduling constraint, the total number of packets that can be cleared in the first $T/2$ slots is at most $T$. Then we have $B_1^{\pi} + B_2^{\pi} \le T$, which implies that $\min\{B_1^{\pi}, B_2^{\pi}\} \le T/2$. Define $i^{*} \in \arg\min_i B_i^{\pi}$. In the remaining $T/2$ slots, the adversary can set

$$A_{i^{*}}(t) = 0, \quad S_{i^{*}}(t) = 0, \quad t = T/2, \dots, T - 1.$$

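For the other link (its index is denoted by $i'$), the adversary can set

$$A_{i'}(t) = 0, \quad S_{i'}(t) = 2, \quad t = T/2, \dots, T - 1.$$

Since there is no capacity to clear any packet over link $i^{*}$ in the remaining $T/2$ slots, we have

$$Q_{i^{*}}^{\pi}(T) = Q_{i^{*}}^{\pi}(T/2) = A_{i^{*}}^{\pi} - B_{i^{*}}^{\pi}. \tag{9}$$

Note that the optimal non-causal policy can admit all the exogenous traffic while keeping the total queue length $\sum_i Q_i(T) = 0$ by serving link $i^{*}$ in the first $T/2$ slots and serving link $i'$ in the remaining $T/2$ slots. As a result, the utility regret is

$$\begin{aligned}
R_T^{\pi}(\{\omega_0,\dots,\omega_{T-1}\}) &= \sum_{t} U(a^{*}(t)) - \sum_{t} U(a^{\pi}(t)) \\
&= \sum_{t} \sum_{i} \big[ U_i(a_i^{*}(t)) - U_i(a_i^{\pi}(t)) \big] \\
&\ge c \sum_{t} \sum_{i} \big( a_i^{*}(t) - a_i^{\pi}(t) \big) \\
&= c\,(2T - A_1^{\pi} - A_2^{\pi}),
\end{aligned} \tag{10}$$

where the inequality is due to the concavity of the utility function and the fact that the subderivatives of the utility function are lower-bounded by $c > 0$. The last equality holds because the total admitted traffic by the optimal policy is $\sum_t \sum_i a_i^{*}(t) = 2T$ while the total admitted traffic by the causal policy $\pi$ is $\sum_t \sum_i a_i^{\pi}(t) = A_1^{\pi} + A_2^{\pi}$. Then it follows that

$$\begin{aligned}
R_T^{\pi}(\{\omega_0,\dots,\omega_{T-1}\}) + c \sum_i Q_i^{\pi}(T) &\ge R_T^{\pi}(\{\omega_0,\dots,\omega_{T-1}\}) + c\, Q_{i^{*}}^{\pi}(T) \\
&\ge c\,(2T - A_1^{\pi} - A_2^{\pi} + A_{i^{*}}^{\pi} - B_{i^{*}}^{\pi}) \\
&= c\,(2T - A_{i'}^{\pi} - B_{i^{*}}^{\pi}) \\
&\ge c\,(2T - T - T/2) = cT/2,
\end{aligned}$$

where the second inequality is due to (9) and (10), and the last inequality holds because the total admitted traffic over link $i'$ is $A_{i'}^{\pi} \le T$ and the amount of cleared traffic over link $i^{*}$ is $B_{i^{*}}^{\pi} \le T/2$ by the definition of $i^{*}$. Therefore, it is impossible for any causal policy $\pi$ to simultaneously achieve both sublinear utility regret and sublinear queue length; otherwise we would have $R_T^{\pi}(\{\omega_0,\dots,\omega_{T-1}\}) + c \sum_i Q_i^{\pi}(T) = o(T)$, contradicting the bound above.

Remark: Note that the above construction requires the value of $T$. We can eliminate the dependence on the time horizon by using the standard doubling trick (see Section 2.3.1 in [14]).

APPENDIX B
PROOF OF THEOREM 2

We prove this theorem by constructing a sequence of network events $\{\omega_0,\dots,\omega_{T-1}\} \in \mathcal{W}_T$ such that the lower bound is attained. Consider the same network setting as in the proof of Theorem 1. Without loss of generality, assume that the window size $W$ is an even number. The exogenous arrivals and channel rates in the first $W/2$ slots are

$$A_1(t) = A_2(t) = 2, \quad S_1(t) = S_2(t) = 2, \quad t = 0, \dots, \tfrac{W}{2} - 1.$$

For any causal policy $\pi$, let $B_1^{\pi}$ and $B_2^{\pi}$ be the number of packets cleared over link 1 and link 2 during the first $W/2$ slots, respectively.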
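Also let $A_1^{\pi}$ and $A_2^{\pi}$ be the number of admitted packets to link 1 and link 2 during the first $W/2$ slots, respectively. Then the queue length vector after the first $W/2$ slots is

$$Q_i^{\pi}(W/2) = A_i^{\pi} - B_i^{\pi}, \quad i = 1, 2.$$

Under the scheduling constraint, the total number of packets that can be cleared in the first $W/2$ slots is at most $W$. Then we have $B_1^{\pi} + B_2^{\pi} \le W$, which implies that $\min\{B_1^{\pi}, B_2^{\pi}\} \le W/2$. Define $i^{*} \in \arg\min_i B_i^{\pi}$. In the remaining $W/2$ slots, the adversary can set

$$A_{i^{*}}(t) = 0, \quad S_{i^{*}}(t) = 0, \quad t = W/2, \dots, W - 1.$$

For the other link (its index is denoted by $i'$), the adversary can set

$$A_{i'}(t) = 0, \quad S_{i'}(t) = 2, \quad t = W/2, \dots, W - 1.$$

Since there is no capacity to clear any packet over link $i^{*}$ in the remaining $W/2$ slots, we have

$$Q_{i^{*}}^{\pi}(W) = Q_{i^{*}}^{\pi}(W/2) = A_{i^{*}}^{\pi} - B_{i^{*}}^{\pi}. \tag{11}$$

Note that the optimal non-causal policy can admit all the exogenous traffic while satisfying the window constraints (5) by serving link $i^{*}$ in the first $W/2$ slots and serving link $i'$ in the remaining $W/2$ slots. As a result, the utility regret over these $W$ slots is

$$\begin{aligned}
R_W^{\pi}(\{\omega_0,\dots,\omega_{W-1}\}) &= \sum_{t} U(a^{*}(t)) - \sum_{t} U(a^{\pi}(t)) \\
&= \sum_{t} \sum_{i} \big[ U_i(a_i^{*}(t)) - U_i(a_i^{\pi}(t)) \big] \\
&\ge c \sum_{t} \sum_{i} \big( a_i^{*}(t) - a_i^{\pi}(t) \big) \\
&= c\,(2W - A_1^{\pi} - A_2^{\pi}),
\end{aligned} \tag{12}$$

where the inequality is due to the concavity of the utility function and the fact that the subderivatives of the utility function are lower-bounded by $c > 0$. The last equality holds because the total admitted traffic by the optimal policy is $\sum_t \sum_i a_i^{*}(t) = 2W$ while the total admitted traffic by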