Proactive Serving Decreases User Delay Exponentially: The Light-tailed Service Time Case


Shaoquan Zhang, Longbo Huang, Minghua Chen, and Xin Liu

Abstract—In online service systems, the delay experienced by users from service request to service completion is one of the most critical performance metrics. To improve the user delay experience, recent industrial practice suggests a modern system design mechanism: proactive serving, where the service system predicts future user requests and allocates its capacity to serve these upcoming requests proactively. This approach complements the conventional mechanism of capacity boosting. In this paper, we propose queuing models for online service systems with proactive serving capability and characterize the user delay reduction achieved by proactive serving. In particular, we show that proactive serving decreases average delay exponentially (as a function of the prediction window size) in the case where service times follow light-tailed distributions. Furthermore, the exponential decrease in user delay is robust against prediction errors (in terms of miss detection and false alarm) and user demand fluctuation. Compared to the conventional mechanism of capacity boosting, proactive serving is more effective in decreasing delay when the system is in the light-load regime. Our trace-driven evaluations demonstrate the practical power of proactive serving: for example, for the data trace of light-tailed YouTube videos, the average user delay decreases by 50% when the system predicts 60 seconds ahead. Our results provide, from a queuing-theoretical perspective, justification for the practical application of proactive serving in online service systems.

Index Terms—Proactive serving, queuing models, user delay

I. INTRODUCTION

The fast-growing number of personal devices with Internet access, e.g., smart mobile devices, has led to the blooming of diverse online service systems, such as cloud computing, cloud storage, online social networks, mobile Internet access, and a variety of online communication applications.
In online service systems, the delay experienced by users from service request to service completion is one of the most critical performance metrics. For example, experiments at Amazon showed that every 100-millisecond increase in the loading time of Amazon.com decreased revenue by 1% [3]. Google also found that an extra 0.5 seconds in search page generation time decreased traffic by 20% [3]. Traditionally, to reduce user delay and to improve quality of experience, service providers often resort to capacity boosting, i.e., increasing the service capacity by deploying more servers. However, such a mechanism may be expensive as it needs to provision for peak demand, which results in low average utilization due to the bursty nature of service requests, especially when user arrivals are time-varying. Recent industry practice suggests proactive serving, i.e., serving upcoming requests before they arrive, as a modern approach for reducing user delay. Proactive serving is based on a key observation: many service requests are predictable.

This paper was presented in part at ACM SIGMETRICS 2014 as a poster paper [1] and at ACM MAMA 2015 [2]. The work presented in this paper was supported in part by the National Basic Research Program of China (Project No. 23CB3367) and the University Grants Committee of the Hong Kong Special Administrative Region, China (Area of Excellence Grant Project No. AoE/E-2/8 and General Research Fund No. 4295). The work of Longbo Huang was supported in part by the National Basic Research Program of China Grants 2CBA3, 2CBA3, the National Natural Science Foundation of China Grants 633, 636363, 63395, and the China Youth 1000-talent grant. The work of Xin Liu was partially supported by NSF through grants CNS-54746, CNS-4576, and CCF-423542. Shaoquan Zhang and Minghua Chen are with the Department of Information Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong. Longbo Huang is with the Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China. Xin Liu was with Microsoft Research Asia, Beijing, China.
This technique has been widely used in computer systems, for example in cache pre-loading and command pre-fetching.¹ Similarly, in cloud service systems, it is common to have predictable service requests. For example, in cloud computing platforms, service jobs, such as indexing, page ranking, backing up, crawling, and performing maintenance, are often predictable. In fact, in an industrial-grade cloud computing system, researchers observe a significant portion, e.g., up to 76%, of the workload to be periodic and thus predictable [4]. Furthermore, individual user behaviors often follow predictable patterns [5]. For instance, if a user watches sports news regularly in the morning, then such content can be pre-loaded to the user's device. Recently, Amazon launched Amazon Silk, a mobile web browser for its Kindle Fire [5]. All web traffic from the browser goes through the Amazon cloud and is managed by the servers in the cloud. Based on cached user traffic, the cloud uses machine learning techniques to predict what users will browse next. When service capacity in the cloud is available, web pages that are likely to be requested are pre-loaded to users' tablets. Thus, when a user clicks on the corresponding content, it loads instantly, reducing the user delay to zero. This technique speeds up request response and improves the user browsing experience.

The above-mentioned application scenarios suggest proactive serving as a new design mechanism for reducing user delay: based on user request arrival prediction, the system can allocate its capacity proactively and pre-serve upcoming requests, reducing the delay experienced by users. This observation naturally leads to two fundamental questions: How much can we reduce user delay by proactive serving?

¹ Cache pre-loading means that the system can send content to users' caches before users request it. Command pre-fetching means that the system can run commands before they are actually called.

How does proactive serving compare to capacity boosting in reducing user delay?

Motivated by these open questions, in this paper we investigate the fundamentals of proactive serving from a queuing theory perspective. In particular, we study proactive serving with a prediction window of size ω, in which one has the ability to predict upcoming requests and serve them when capacity allows. We investigate how proactive serving reduces user delay as a function of ω. We also consider the case of imperfect prediction. As discussed in Section II, this work differs from all existing ones in its model and problem settings. It provides theoretical foundations for using proactive serving to reduce user delay.

Challenges. We address two technical challenges in our study. First, a generic approach to obtaining the average user delay is to model the queuing system with proactive serving capability (based on perfect or imperfect prediction) using a multi-dimensional Markov chain, then compute its steady-state distribution and subsequently the average user delay. However, it is highly non-trivial to derive closed-form expressions for the steady-state distributions of multi-dimensional Markov chains, and it is hard to generalize this approach to scenarios with imperfect prediction. We address this challenge by developing a new approach that relates the user delay distribution with proactive serving to that without proactive serving. This allows us to obtain the delay distribution without proactive serving, which is usually a less complicated task, and then to derive the desired one with proactive serving. This simple approach reveals insights into the delay reduction enabled by proactive serving. More importantly, the approach can be generalized to various queuing models. Second, even with our new approach, it is still non-trivial to characterize user delay with proactive serving in the presence of prediction errors. We carefully model the system behavior under two types of prediction errors, namely miss detection and false alarm, as a priority queue.
Through an involved derivation, we obtain closed-form expressions for the average user delay. The expressions allow us to reveal the exact relationship between the delay performance and the system design parameters.

Contributions. We make the following contributions. In Section III, we present the first set of queuing models for service systems with proactive serving capability. These models allow us to leverage queuing theory tools to characterize the delay reduction achieved by proactive serving. In Section IV, for stable M/M/1 queuing systems with proactive serving capability, we show that the average user delay decreases exponentially in the prediction window size ω. Furthermore, based on the insights from the M/M/1 system, we prove that the exponential delay decrement holds for the more general G/G/1 queuing systems. In Section V, we study the impact of imperfect prediction on delay reduction. We consider two types of prediction errors: miss detection and false alarm. We construct queuing models for the study and characterize the average user delay under proactive serving with imperfect prediction. We show that in the presence of miss detection, the average user delay still decreases exponentially in the prediction window size ω, but it converges to a positive constant determined by the fraction of miss detections (instead of converging to zero as would happen with perfect prediction). Meanwhile, in the presence of false alarms, we show that there exists an effective service rate determined by both the system service rate and the fraction of false alarms. User delay decreases exponentially to zero as the prediction window size increases if the actual user request arrival rate is smaller than the effective service rate; otherwise, it decreases to a positive constant determined by the fraction of false alarms. In Section VI, by comparing proactive serving to capacity boosting, we show that proactive serving is more effective in reducing user delay in the light-load regime. In Section VII, we evaluate the performance of proactive serving using simulations based on real-world traces.
Specifically, for the data trace of light-tailed YouTube videos, user delay decreases by 50% when the system can predict 60 seconds ahead. For the data trace of heavy-tailed YouTube videos, user delay decreases by 50% when the system can predict 84 seconds ahead. We note that there are efficient mechanisms for predicting user requests in the near future for industrial-grade VoD systems [6].

II. RELATED WORK

In [7], the authors study how prediction can be used to facilitate queue admission control, i.e., which requests should be redirected away (up to a certain rate) in order to minimize the average queue length. They find that, in the heavy-load regime (arrival rate λ → 1), when the size of the look-ahead window is O(log 1/(1−λ)), the achieved average queue length is the same as that when we have complete future knowledge. A similar idea is also adopted in [8] to reduce waiting times for patients in the Emergency Department by proactively managing the admission control. These two works use future information to control the admission of requests into the system, instead of pooling idle service capacity to proactively serve future requests as we explore in this paper. In the setting where the system can pre-serve upcoming requests based on prediction, the authors in [9] consider predictive scheduling in controlled queuing systems. They propose the Predictive Backpressure algorithm to achieve the optimal utility performance. With proactive serving, the authors in [10] show that the probability of server outage in wireless networks decreases exponentially with the size of the look-ahead window. They also show that appropriate use of prediction by primary users can improve the gain of the secondary network at no cost to primary users in cognitive networks. The authors in [11] explore the idea of proactive serving in data networks to shape user demand so that the time-average expected cost incurred by the service provider is minimized.
In [12], the authors combine smart pricing and proactive content caching in mobile service systems and show that the combination can increase the profit of the service provider while at the same time reducing end-user costs. In [13] and [14], the authors provide a prediction model and develop techniques for proactively serving web content. What differentiates our work from theirs is that we study the effect of proactive serving on decreasing user delay.

In addition, there have been many works on proactively serving mobile content based on predicting mobile users' traffic and mobility patterns [15], [16], [17], [18], [19]. In [20], the authors present a system architecture for mobile pre-fetching where informed pre-fetching is structured as a library to which any mobile application may link. For general computer systems, the authors in [21], [22], [23] explore prediction and pre-fetching for file systems, databases, and DRAM, respectively. These works mainly focus on practical system designs and call for theoretical investigation.

Fig. 1. A single-queue service system.

III. MODEL

A. Service System without Proactive Serving

Consider the service system shown in Fig. 1. Incoming user requests arrive at the system according to a continuous process {A(t), t ≥ 0}. For all t ≥ 0, A(t) ∈ {0, 1}, where A(t) = 1 if a user request arrives at t and A(t) = 0 otherwise. Servers in the system provide service upon user requests. When a user request arrives, the request will be served if there is idle service capacity. Otherwise, the request waits in the queue Q(t) until service is available. The request leaves the system upon service completion. We define user delay as the time from a user request's arrival till its departure. Traditionally, queuing theory has been applied to model such a system and study its performance. In particular, queuing-theoretic analysis suggests boosting system capacity as a principled mechanism to reduce average user delay. For example, one can use the standard M/M/1 queuing model to represent the service system in Fig. 1, and it is well known that the average user delay decreases inverse-proportionally in the service capacity.

B. Service System with Proactive Serving

We now consider a service system that can proactively serve upcoming user requests based on arrival prediction. For ease of presentation, we consider perfect prediction of arrivals in this section and in Section IV, and we study imperfect prediction in Section V. As shown in Fig. 2, we assume that the system can predict user request arrivals ω amount of time ahead.
That is, at time t, the system knows exactly the request arrival epochs in (t, t+ω] and the corresponding users who generate the requests.² We assume that the system does not know the service times, i.e., how long it takes to serve a request. The rationale behind this assumption is that even if the system knows the workload of a request (e.g., the size of a video), it is still difficult for the system to know exactly how long it takes to serve the request because of the dynamics in server capabilities and the time-varying available bandwidth between servers and users. Based on arrival prediction, the system can serve upcoming requests proactively. The user requests that are pre-served will not enter the system. For the service system without proactive serving capability, servers remain idle when there are no requests in the system. In contrast, for the service system with proactive serving capability, server capacity can be allocated to serve upcoming requests when there are no requests in the system.

² For service systems that are content-centric, we consider the case where the timescale at which the content refreshes/updates is much larger compared to the prediction horizon. Thus, content delivered proactively to a customer will not be outdated by the time the customer actually requests it.

Fig. 2. Prediction model: Each upright arrow represents a request arrival. At time t, the system knows, by its prediction mechanism, the request arrival epochs in (t, t + ω] (red solid arrows) and the corresponding users who generate these requests.

We depict the service system with proactive serving capability under perfect prediction in Fig. 3. In the figure, Q_ω(t) denotes the queue that stores the requests that have arrived at the system and are waiting for service at time t. W_ω(t) denotes the prediction window of size ω. Each user request first goes through the prediction window W_ω(t) and then enters the queue Q_ω(t). The server can serve the requests in both Q_ω(t) and W_ω(t). We remark that each request entering W_ω(t) will transit to Q_ω(t) after exactly ω amount of time, if it has not been pre-served completely before then.
Requests will not queue up in W_ω(t). Thus W_ω(t) should be viewed as a pipe rather than a queue. User delay corresponds to the time that a request spends in Q_ω(t) and with the server, and it does not include the time spent in W_ω(t). Slightly abusing notation, we sometimes use Q_ω(t) and W_ω(t) to denote their corresponding sizes, respectively. For example, W_ω(t) can represent the number of arrivals within the prediction window that have not been pre-served completely at time t. In this paper, we assume no constraints on the user side when proactive serving is conducted. For example, when the cache pre-loading scenario is considered, we assume that there is no limit on the user cache size and that the cache can always be accessed.

C. Queuing Models for Service Systems with Proactive Serving Capability

In this paper, we are interested in understanding the fundamental benefit of proactive serving for user delay reduction. For this purpose, we extend the classical queuing models to capture the proactive serving behavior. In the classical queuing models, requests arrive randomly at the system. If all servers are busy, requests wait in the queue for service. Servers serve requests according to a service policy. The amount of time to serve a request is random. In our extended models, there is also a prediction window (modeled as a pipe). Each request first goes through the pipe before entering the queue. Servers can serve requests in either the queue or the pipe according to a service policy.

Fig. 3. Service system with perfect prediction: Q_ω(t) represents the queue of the requests that have arrived at the system and are waiting for service at time t. W_ω(t) is the prediction window of size ω. Each arrived user request first goes through the prediction window W_ω(t) and then enters the queue Q_ω(t). The server can observe and serve the requests in both Q_ω(t) and W_ω(t).

Fig. 4. M/M/1: The arrival process is {A(t+ω), t ≥ 0}. Service times of requests are independent and identically exponentially distributed with mean 1/µ. The initial value Q_p(0) is |A(0:ω)| + Q_ω(0). The service policy is FCFS.

In classical queuing theory, Kendall's notation is widely used to describe a queuing system. We extend it to describe the service system with proactive serving capability as A/S/k[ω]/POLICY. Here A represents the distribution of the request inter-arrival times, S represents the distribution of the service times, k is the number of servers, and POLICY denotes the service discipline, such as First-Come-First-Served (FCFS). [ω] denotes that the system can predict upcoming arrivals ω amount of time ahead and serve them proactively. For example, one can use M/M/1[ω]/FCFS to model the system in Fig. 3. Both the inter-arrival times and the service times follow exponential distributions. There is a single server in the system. The system can predict upcoming arrivals ω amount of time ahead and serve them proactively. When the server becomes idle, it first examines Q_ω(t). If Q_ω(t) > 0, the server serves the request at the head of Q_ω(t). If Q_ω(t) = 0, the server then checks the prediction window W_ω(t) and pre-serves the earliest request in it. A queuing system is stable if arrivals happen slower than service completions, and is unstable otherwise. Unstable systems do not have steady-state distributions and consequently do not have a well-defined average user delay. Thus we only study user delay for stable queuing systems.
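The service discipline just described can be illustrated with a small event-driven simulation. The sketch below is our own illustrative code, not from the paper, and the function and variable names are ours; it relies on the observation that, with a single server under FCFS, "serve the head of Q_ω(t), otherwise pre-serve the earliest request in W_ω(t)" reduces to starting each request as soon as the server is free and the request has become visible, i.e., ω time units before its actual arrival.

```python
import random

def proactive_mm1_delays(lam, mu, w, n, seed=1):
    """Sketch of M/M/1[w]/FCFS with perfect prediction: each request
    becomes visible w time units before it arrives, and the single
    FCFS server may start it as soon as it is both idle and able to
    see the request. Delay counts only time after the actual arrival;
    time spent inside the prediction window is excluded."""
    rng = random.Random(seed)
    t_arr = 0.0   # arrival epoch of the current request
    t_free = 0.0  # epoch at which the server next becomes idle
    delays = []
    for _ in range(n):
        t_arr += rng.expovariate(lam)            # Poisson(lam) arrivals
        start = max(t_free, t_arr - w)           # visible from t_arr - w on
        t_free = start + rng.expovariate(mu)     # Exp(mu) service time
        delays.append(max(0.0, t_free - t_arr))  # 0 => fully pre-served
    return delays
```

For example, with λ = 0.5, µ = 1, and w = 0 the empirical mean delay is close to 1/(µ − λ) = 2, while increasing w drives both the mean delay and the fraction of requests with positive delay down, in line with the analysis of Section IV.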
To focus on characterizing the benefits of proactive serving and to avoid the complications of service policy design, we assume FCFS as the service policy in the rest of the paper, unless mentioned otherwise. Our analysis shows that even under the simple scheduling policy FCFS, proactive serving is very effective in reducing user delay.

IV. PROACTIVE SERVING WITH PERFECT PREDICTION

In this section, we start from the simple M/M/1[ω] model and show that proactive serving can decrease the average user delay exponentially. Based on the insights from the M/M/1[ω] model, we extend the analysis to the general G/G/1[ω] model and show that the exponential delay decrement holds regardless of the inter-arrival time and service time distributions. We remark that we have also obtained similar exponential user-delay decrement results for the Markovian/Geo/1[ω] model, in which the user request arrival rate is time-varying, governed by an underlying Markov process. We skip the details here due to space limitations and refer interested readers to our technical report [24].

A. Average User Delay of M/M/1[ω]

In an M/M/1[ω] model, there is a single server. User requests arrive according to a Poisson process {A(t), t ≥ 0} with rate λ, and the service times of requests are independent and identically distributed according to an exponential distribution with parameter µ. The system can predict upcoming arrivals ω amount of time ahead and serve them proactively. Define D_ω as the user delay and ρ = λ/µ. When ω = 0, i.e., no proactive serving, the M/M/1[ω] model reduces to the classical M/M/1 queuing model. It is well known that the average user delay is given by

E[D_0] = 1/(µ − λ).  (1)

The probability density function of D_0 is also known [25]:

f_{D_0}(t) = (µ − λ) e^{−(µ−λ)t},  t ≥ 0.  (2)

To characterize the average user delay with proactive serving, i.e., E[D_ω], a generic approach is to discretize the system and then use a multi-dimensional Markov chain to model the numbers of requests in the queue and in the prediction window, where user requests passing through the window can be captured by moving from one state to its subsequent state in the Markov chain.
If we can compute the steady-state distribution of the multi-dimensional Markov chain, then we can compute the average number of user requests in the queue, and consequently the average user delay by using Little's Law. However, this approach suffers from two limitations. One is that it is difficult to derive the stationary distributions of multi-dimensional Markov chains. The other is that it is hard to generalize the method to study more complex models. Detailed discussions can be found in Appendix B. Instead of applying the generic method, we leverage the problem structure to analyze the average user delay. To analyze the user delay for M/M/1[ω], we first prove that Q_sum(t) = Q_ω(t) + W_ω(t) evolves the same as an M/M/1 queue with a properly initialized queue. Based on this observation, the distribution of the user delay under proactive serving, i.e., D_ω, turns out to be a shifted version of that of the user delay without proactive serving as shown in (2). Once we know the distribution of D_ω, we can compute the average user delay E[D_ω]. More interestingly, this approach can be applied to various queuing models.
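The baseline quantities in (1) and (2) are easy to sanity-check numerically. The following minimal sketch (our own, not part of the paper; the function names are ours) evaluates them and confirms that the density integrates to one:

```python
import math

def mm1_avg_delay(lam, mu):
    """Eq. (1): E[D_0] = 1/(mu - lam) for a stable M/M/1 queue."""
    assert mu > lam > 0, "stability requires lam < mu"
    return 1.0 / (mu - lam)

def mm1_delay_pdf(t, lam, mu):
    """Eq. (2): f_{D_0}(t) = (mu - lam) * exp(-(mu - lam) * t), t >= 0."""
    return (mu - lam) * math.exp(-(mu - lam) * t)

# Example: lam = 0.5, mu = 1.0 gives E[D_0] = 2.0, and a crude
# midpoint Riemann sum of the pdf over [0, 40] is close to 1.
```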

To proceed, consider Q_ω(t) and W_ω(t) as a group. The request arrival process of the group is {A(t+ω), t ≥ 0}. The time to serve a request is exponentially distributed. The server serves the requests in the group according to the FCFS policy. This essentially mimics an M/M/1 queue. Intuitively, the size of the group evolves statistically the same as the queue size of the M/M/1 queue. The following lemma confirms this observation.

Lemma 1: Define Q_p(0) = |A(0:ω)| + Q_ω(0), where A(0:ω) = {A(τ), 0 < τ ≤ ω} is the set of arrivals from time 0 to time ω and |A(0:ω)| is its size. Let Q_p(t) be an M/M/1 queue with initial value Q_p(0) as shown in Fig. 4. We have Q_sum(t) = Q_p(t) for all t ≥ 0.

Proof: Under the condition Q_p(0) = |A(0:ω)| + Q_ω(0), we have Q_sum(0) = Q_p(0). By time t, Q_sum(t) and Q_p(t) accept the same set of arrivals. Because both queues contain the same set of arrivals and adopt the same queuing discipline, Q_sum(t) and Q_p(t) have the same sequence of departures up to time t. As a result, Q_sum(t) = Q_p(t).

Lemma 1 reveals a useful observation: the total time that a user request spends in M/M/1[ω], i.e., the sum of the times in the prediction window W_ω(t), in the queue Q_ω(t), and with the server, is statistically the same as that in an M/M/1 queue. As discussed at the end of Section III-B, the time spent in W_ω(t) is excluded from the user delay calculation. Then the user delay in M/M/1[ω], i.e., D_ω, has the same distribution as max(0, D_0 − ω). We leverage this observation to obtain the distribution of D_ω for M/M/1[ω] in the following lemma.

Lemma 2: Let f_{D_ω}(t) be the probability density function of D_ω. We have

f_{D_ω}(t) = f_{D_0}(t + ω),  t > 0,

and

Pr(D_ω = 0) = 1 − ∫_0^∞ f_{D_ω}(t) dt = 1 − e^{−(µ−λ)ω}.

Proof: Because the arrival process A(t) is stationary, the delay distribution of arrivals in Q_p is the same as that of M/M/1 in (2). By Lemma 1, Q_sum(t) = Q_p(t) for any t ≥ 0. Then the same request in A(t) spends the same amount of time in Q_sum and Q_p. As shown in Fig. 6, if not pre-served, a request goes through the prediction window W_ω(t) before it enters Q_ω, which costs ω amount of time.
If a request spends T_1 amount of time in Q_p, it will spend [T_1 − T]^+ amount of time in Q^T. ([T_1 − T]^+ = 0 means the request is pre-served completely.) Therefore, the delay distribution of requests in Q^T is a shifted version of that of those in Q_p. An illustrating example of the distributions derived in Lemma 2 is shown in Fig. 5. Note that the arguments in the proofs of Lemmas 1 and 2 are also used in our sister paper [9]. We present the proofs here for completeness.

Fig. 5. Probability density function (PDF) of user delay with/without future prediction: The PDF under perfect prediction (the vertical arrow at the origin and the dashed curve) can be obtained by shifting the PDF without prediction (dashed curve) T units left, where T = 2.

Although Lemmas 1 and 2 are established for the M/M/1[T] model, they can be extended to the general G/G/1[T] model as shown in the following corollary, which will be used in the next subsection.

Corollary 1: In the G/G/1[T] model, the user delay distribution, denoted as f^G_{D^T}(t), can be obtained from that of the G/G/1 model, denoted as f^G_{D^0}(t), as follows:

f^G_{D^T}(t) = f^G_{D^0}(t + T), t > 0,

and

Pr(D^T = 0) = ∫_0^T f^G_{D^0}(t) dt.

The proof is similar to that of Lemma 2 and is relegated to Appendix A. Lemma 2 and Corollary 1 allow us to obtain the distribution of D^T from that of D^0, which usually is well studied in queueing theory. Based on Lemma 2, for the M/M/1[T] model, we obtain the average request delay E[D^T] as follows.

Theorem 1: Assume μ > λ. The average user delay of the M/M/1[T] model with perfect prediction is given by

E[D^T] = (1/(μ−λ)) e^{−(μ−λ)T}. (3)

Proof: Based on Lemma 2,

E[D^T] = ∫_0^∞ t f_{D^0}(t + T) dt = (1/(μ−λ)) e^{−(μ−λ)T}.

Theorem 1 says that the average user delay decreases exponentially in the prediction window size T. This indicates that a little bit of future information can improve the user delay experience significantly. In particular, this result suggests that, for Amazon's new mobile web browser described in Section I, if the Amazon cloud can predict upcoming web requests accurately, the request response time can be reduced significantly.
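The shift relation behind Theorem 1 is easy to sanity-check numerically: generate FCFS M/M/1 sojourn times D^0 with the Lindley recursion, apply the Lemma 2 mapping D^T = max(0, D^0 − T), and compare the empirical mean with (3). A minimal sketch; the function names and parameter values are illustrative, not taken from the paper:

```python
import math
import random

def mm1_sojourn_times(lam, mu, n, seed=0):
    # Lindley recursion for FCFS M/M/1: W_{k+1} = max(0, W_k + S_k - X_{k+1});
    # the sojourn time of customer k is D_k = W_k + S_k.
    rng = random.Random(seed)
    w, delays = 0.0, []
    for _ in range(n):
        s = rng.expovariate(mu)                     # service time S_k
        delays.append(w + s)                        # sojourn time D^0
        w = max(0.0, w + s - rng.expovariate(lam))  # next waiting time
    return delays

def avg_delay_with_prediction(delays, T):
    # Lemma 2: D^T has the same distribution as max(0, D^0 - T).
    return sum(max(0.0, d - T) for d in delays) / len(delays)

lam, mu, T = 3.0, 4.0, 2.0
d0 = mm1_sojourn_times(lam, mu, 200_000)
analytic = math.exp(-(mu - lam) * T) / (mu - lam)   # eq. (3)
print(avg_delay_with_prediction(d0, T), analytic)   # the two should be close
```

With T = 0 the empirical mean recovers the classical M/M/1 average delay 1/(μ−λ).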
From Theorem 1, in the heavy-load regime where the arrival rate λ is close to the service rate μ, (3) can be approximated by 1/(μ−λ) − T when T is small, which is linear in T. In contrast, when the system is in the light-load regime, (3) decreases exponentially in T. This indicates that proactive serving is more effective in decreasing delay in the light-load regime than in the heavy-load regime. The reason is as follows. In the light-load regime, the number of requests that enter Q^T(t) for service is small and the queue is empty most of the time. As such, most of the service capacity can be spared to serve upcoming requests. Consequently, many requests are served proactively and thus experience zero delay. In contrast, in the heavy-load regime, the server is busy serving requests in Q^T(t) most of the time. As a result, a request has little chance to be served proactively, especially when the prediction window

size T is small. Therefore, proactive serving has limited delay reduction capability.

B. Average User Delay of G/G/1[T]

In this subsection, we extend the analysis to the general G/G/1[T] model. In G/G/1[T], the inter-arrival times of user requests are independent, identically, and generally distributed with mean 1/λ and variance σ_λ^2. The service times of user requests are also independent, identically, and generally distributed with mean 1/μ and variance σ_μ^2. Moreover, inter-arrival times and service times are independent. The system can predict upcoming arrivals T amount of time ahead and serve them proactively. By definition, under the FCFS queueing policy, user delay is the summation of queueing delay and service time, where queueing delay is defined as the time from when the user request arrives at the system till the system starts serving it. In G/G/1[T], service times follow a general distribution. As a result, the user delay distribution is also general. As shown by Corollary 1, proactive serving essentially shifts the delay distribution left by T amount of time. Therefore, the average user delay will decrease monotonically to zero as T increases, yet the decrement may not be exponential in T. Instead, we focus on the effect of proactive serving on reducing user queueing delay (i.e., user delay minus service time) and show that proactive serving decreases the average user queueing delay exponentially as T increases. Let QD^T denote the user queueing delay when the system predicts T amount of time ahead and QD^0 denote the user queueing delay without proactive serving. The explicit form of the distribution of QD^0 in G/G/1 is still an open problem. However, the authors of [26], [27] have derived a useful upper bound on the tail of the distribution of QD^0. Let random variable X be the inter-arrival time and Y be the service time. Let U(s) denote the Laplace-Stieltjes transform of the random variable Y − X. Define

θ* ≜ sup{θ > 0 : U(−θ) ≤ 1}. (4)

The following lemma presents a sufficient condition for such θ* to exist.

Lemma 3 ([26]): θ* defined in (4) exists if the following condition is satisfied:

∃ θ ∈ (0, ∞), s.t. E[e^{θY}] < ∞.
(5)

The condition in (5) requires that the convergence region of the moment generating function of Y includes an interval around the origin. Clearly, whether the condition in (5) is satisfied depends on the distribution of Y, and it is satisfied for many popular distributions for modeling service time, including Exponential, Gamma, and Weibull [28], [29]. Assuming that the condition in (5) is satisfied and θ* defined in (4) exists, the tail of the distribution of QD^0, i.e., Pr(QD^0 ≥ t), can be upper-bounded as follows [26], [27]:

Pr(QD^0 ≥ t) ≤ e^{−θ* t}, t > 0. (6)

Combined with the distribution-shift effect under proactive serving, the upper bound in (6) enables us to prove that the average user queueing delay, i.e., E[QD^T], decreases exponentially under proactive serving.

Theorem 2: Assume μ > λ and there exists a real θ > 0 such that E[e^{θY}] < ∞. The average queueing delay of G/G/1[T] with perfect prediction satisfies

E[QD^T] ≤ (1/θ*) e^{−θ* T}, (7)

where θ* exists and is defined in (4).

Proof: Let f_{QD^T}(t) denote the distribution of QD^T. Let QD^0 denote the queueing delay and f_{QD^0}(t) denote the corresponding distribution without proactive serving. First, we show that the distribution of QD^T can be obtained by shifting that of QD^0 by T units. Although Corollary 1 is established for the distribution of user delay, it can be easily extended to the distribution of queueing delay based on the same proof. We have

f_{QD^T}(t) = f_{QD^0}(t + T) for t > 0

and

Pr(QD^T = 0) = ∫_0^T f_{QD^0}(t) dt.

Now we are ready to show that the average queueing delay decreases exponentially in T. From [26], we have the following about the tail of the distribution of QD^0:

Pr(QD^0 ≥ t) ≤ e^{−θ* t}. (8)

Then we have

E[QD^T] = ∫_0^∞ Pr(QD^T ≥ t) dt = ∫_0^∞ Pr(QD^0 ≥ t + T) dt ≤ ∫_0^∞ e^{−θ*(t+T)} dt = (1/θ*) e^{−θ* T}.

Theorem 2 says that, as long as the service time distribution satisfies condition (5), the average user queueing delay decreases exponentially under proactive serving. In general, computing the explicit form of θ* is difficult.
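For concrete distributions, θ* can nevertheless be computed numerically: with X and Y independent, U(−θ) = E[e^{θY}] · E[e^{−θX}], and a bisection locates the largest θ with U(−θ) ≤ 1. A sketch under illustrative names; for the M/M/1 special case the answer is known to be θ* = μ − λ, which makes a convenient check:

```python
def theta_star(mgf_Y, mgf_negX, hi, tol=1e-9):
    # Bisection for the largest theta in (0, hi) with
    # U(-theta) = E[e^{theta Y}] * E[e^{-theta X}] <= 1   (cf. eq. (4)).
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mgf_Y(mid) * mgf_negX(mid) <= 1.0:
            lo = mid
        else:
            hi = mid
    return lo

lam, mu = 3.0, 4.0
mgf_Y = lambda t: mu / (mu - t)        # Y ~ Exp(mu), valid for t < mu
mgf_negX = lambda t: lam / (lam + t)   # E[e^{-t X}] for X ~ Exp(lam)
print(theta_star(mgf_Y, mgf_negX, hi=mu - 1e-6))   # ~ mu - lam = 1.0
```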
In Lemma 4 below, for the heavy-load regime (ρ ≜ λ/μ → 1 but remaining strictly less than 1), one can leverage the results in [30], [31] to show that the higher-order terms of the Taylor expansion (around 0) of U(s) can be neglected, so that an approximate expression of θ* can be obtained. Since the arguments and details are similar to those in [30], [31] (for analyzing queueing system performance without proactive serving), we skip the details here and refer interested readers to [30], [31].

Lemma 4: When ρ = λ/μ → 1 but remains strictly less than 1, θ* as defined in (4) can be approximated by

θ* ≈ 2(1/λ − 1/μ) / (σ_λ^2 + σ_μ^2). (9)

From (9), we can see how the inter-arrival time and service time distributions affect the average queueing delay under proactive serving. First, we can see that θ* is a decreasing function of λ and an increasing function of μ. This is intuitive. The incoming workload increases when λ increases or μ decreases. Consequently, the server is more dedicated to serving requests in the queue, and thus the number of pre-served

Fig. 6. Service system with imperfect prediction: The miss-detection process {A_1(t)}_{t≥0} cannot be served proactively. Requests in {A_1(t)}_{t≥0} enter Q^T(t) for service directly. The process {A_2(t)}_{t≥0} includes false alarms and actual arrivals that are predicted correctly. Requests in {A_2(t)}_{t≥0} go through W^T(t) and can be pre-served by the server. p is the probability that a request in {A_2(t)}_{t≥0} is an actual arrival.

requests decreases. Then the delay reduction by proactive serving is less significant. From (9), we can also see that θ* is a decreasing function of σ_λ^2 or σ_μ^2. When the variance of the inter-arrival time or service time increases, the incoming workload becomes more bursty. In this case, the result in (9) suggests that proactive serving is less effective in reducing user delay.

V. PROACTIVE SERVING WITH IMPERFECT PREDICTION

In Section IV, we analyzed the benefits of proactive serving under perfect arrival prediction. In this section, we look at two more realistic scenarios that correspond to two common types of prediction errors, and we study the performance of proactive serving under these settings. For ease of analysis and illustration, we consider a single-server setting.

A. Modeling

The first type of error is failing to predict actual arrivals, i.e., miss detection (also called false negative). When miss detection happens, the missed arrivals cannot be proactively served. Intuitively, such errors result in a side flow into the system and will affect the benefit of proactive serving. The other type of error is false alarm (also called false positive), which happens when the system mistakenly predicts nonexistent arrivals. Such false arrivals will not eventually enter the system for service. However, the system may incorrectly allocate resources to pre-serve them, resulting in wasted service capacity. We represent the system with these two types of prediction errors using the model shown in Fig. 6.
In this model, {A_1(t)}_{t≥0} represents the process of miss detections and {A_2(t)}_{t≥0} represents the process of predicted arrivals, which includes false alarms and actual arrivals that are predicted correctly. Q^T(t) stores requests that have already entered the system and are waiting for service at time t. W^T(t) is the prediction window with size T. Requests in {A_2(t)}_{t≥0} go through the prediction window and can be served proactively by the server. In contrast, requests in {A_1(t)}_{t≥0} enter Q^T(t) directly and cannot be served proactively. False alarms in {A_2(t)}_{t≥0} disappear once they leave the prediction window and do not enter Q^T(t). For tractability, we make the following two assumptions on {A_1(t)}_{t≥0} and {A_2(t)}_{t≥0}. First, we assume that the probability that a request in {A_2(t)}_{t≥0} is an actual arrival is p. With probability p, a request of {A_2(t)}_{t≥0} will enter Q^T(t); with probability 1 − p, it will disappear once it leaves the prediction window. The larger p is, the more accurate the prediction is. Second, we assume that {A_1(t)}_{t≥0} and {A_2(t)}_{t≥0} are independent Poisson processes, which is reasonable for systems with a large number of users. Let λ_1 and λ_2 be the arrival rates of {A_1(t)}_{t≥0} and {A_2(t)}_{t≥0}, respectively. Since the miss detections and the actual arrivals among the predicted arrivals together compose the actual arrivals, we have

λ_1 + pλ_2 = λ. (10)

Recall that λ is the rate of the actual arrival process. In this system, the server applies the FCFS service policy. Different from the case of perfect prediction, here we assume that the requests in Q^T(t) and the arrivals from {A_1(t)}_{t≥0} have preemptive priority. That is, the arrivals in W^T(t) will be pre-served only when the queue is empty and there is no new arrival entering Q^T(t) from {A_1(t)}_{t≥0}.

B. Impact of Miss Detection

Now suppose that there are only miss detections in the system, i.e., p = 1 and λ_1 + λ_2 = λ. In this case, the delay reduction can be less significant as compared to the perfect-prediction case, because miss detections cannot be pre-served by the system.
To characterize the impact of miss detection, we follow the same idea used in the perfect-prediction case, that is, linking the distribution of D^T to that of a single-queue system without proactive serving. After obtaining the distribution of D^T, we then calculate the average user delay, i.e., E[D^T]. Miss detections create a sub-arrival process into Q^T(t) that mixes with the requests in {A_2(t)}_{t≥0}. This makes it challenging to derive the user delay distribution. In the derivation, we need to apply the residue theorem [32] with carefully designed branch cuts [33] to invert Laplace transforms to obtain the average user delay. We arrive at the following result.

Theorem 3: Assume λ = λ_1 + λ_2 < μ. The average user delay in the presence of miss detections is given by the closed-form expression (11) derived in Appendix C, which consists of a positive constant term plus terms that decrease exponentially in T.

Proof: The proof is relegated to Appendix C.

The average user delay expression in (11) consists of two terms. The first term is a positive constant representing the delay experienced by the miss-detected user requests. As shown in Fig. 6, miss detections enter Q^T(t) directly without going through the prediction window. As such, these miss detections cannot be served proactively and will always experience positive delay. The second term in (11) decreases exponentially

Fig. 7. The average request delay under perfect prediction, miss detection, and false alarm.

Fig. 8. Impact of miss detection on the average request delay with λ = 3 and μ = 4. The fraction of miss detections is λ_1/λ.

Fig. 9. Impact of false alarms on the average request delay with λ = 3 and μ = 4. The fraction of false alarms is 1 − p.

in T and vanishes as the system predicts sufficiently far into the future, demonstrating the effectiveness of proactive serving. We validate the closed-form expression of the average user delay in Theorem 3 by plotting the average user delay by (11) (the red curve marked by down-triangles) and by simulation (the black curve marked by "+") in Fig. 7, under the setting of λ_1 = 1, λ = 3, and μ = 4. We observe that (i) the two curves coincide, which verifies Theorem 3, and (ii) the user delay decreases exponentially as the system predicts further. The user delay is eventually dominated by the first term in (11), as expected. Compared to the scenario with perfect prediction, we note that the decay rate in T is smaller. This is intuitive, because miss detections occupy part of the server capacity and thus the capacity available for proactive serving is reduced. We plot the average user delay as a function of both the miss-detection fraction and the prediction window size T in Fig. 8, based on (11). The fraction of miss detections is λ_1/λ. As seen, proactive serving is more effective when there are few miss detections (i.e., high prediction accuracy). This matches our intuition, since more miss detections mean that more user requests go directly into the system without having the chance to be served proactively. The plot also suggests that when there are substantial miss detections, predicting further into the future is not effective in reducing user delay, and we are better off focusing on improving the prediction accuracy (i.e., reducing the fraction of miss detections).

C.
Impact of False Alarm

When there are only false alarms in the system, i.e., λ_1 = 0 and pλ_2 = λ, server capacity is wasted whenever a false alarm is pre-served. As a result, the benefit of proactive serving will be less significant as compared to the perfect-prediction case. Different from miss detections, false alarms do not enter Q^T(t). Therefore, the system cannot be modeled by a single-queue system without proactive serving. As a result, we cannot carry out the same equivalence argument as for the miss-detection case in Section V-B; the idea used in the miss-detection scenario cannot be applied here. To address the difficulty, we first discretize the system and model its evolution by a multi-dimensional Markov chain, where a user request passing through the window is captured by moving from one state to its subsequent state in the Markov chain. This Markov chain is not time-reversible. Then, by studying the stationary distribution of the Markov chain, we obtain the average user delay by applying Little's Law. We arrive at the following result.

Theorem 4: Assume λ = pλ_2 < μ (0 < p ≤ 1). The average user delay in the presence of false alarms is given by the closed-form expression (12), which takes different forms in the three cases λ < pμ, pμ < λ < μ, and λ = pμ. Due to the space limitation, the proof is included in [24].

From (12), it is not immediately clear that E[D^T] decreases at an exponential rate. Instead, we show in the proof in [24] that E[D^T] can be lower- and upper-bounded by functions which decrease exponentially in T. The average delay decreases exponentially to zero when λ < pμ and decreases exponentially to a constant value when λ > pμ. We call pμ the effective service rate. Theorem 4 says that the average user delay decreases exponentially to zero in the prediction window size T if the user request arrival rate is smaller than the effective service rate, i.e., λ < pμ, and otherwise to a positive constant determined by the fraction of false alarms.
We plot the average user delay under the impact of false alarms by (12) (the pink curve marked by up-triangles) and by simulation (the blue curve marked by dots) in Fig. 7, under the setting of λ_2 = 3.5, λ = 3, and μ = 4. The results clearly verify Theorem 4 and show that the average delay decreases exponentially in T. As compared to the perfect-prediction case, the decay rate is smaller. This is because part of the server capacity is wasted on serving false alarms. Moreover, we can derive the following from (12):

lim_{T→∞} E[D^T] = 0, when λ < pμ, i.e., λ/μ < p ≤ 1;
lim_{T→∞} E[D^T] = (λ − pμ) / ((1 − p)λ(μ − λ)), when pμ ≤ λ < μ, i.e., 0 < p ≤ λ/μ. (13)

This equation shows that the system cannot reduce the delay to zero if pμ < λ < μ. This is because statistically a 1 − p fraction of the service capacity allocated for proactive serving is consumed by false alarms, and the remaining p fraction is not enough to pre-serve all the actual arrivals before they enter the system. As such, the user delay cannot be reduced to zero no matter how far we can predict into the future. Meanwhile,

Fig. 10. The average request delay vs. how far we predict into the future under different prediction strategies.

Fig. 11. How far do we need to predict in order to achieve the same delay performance as when the server capacity is increased by m times?

Fig. 12. To achieve the same delay performance, the system can select different combinations of proactive serving and system capacity boosting.

when λ ≤ pμ, the delay can always be reduced to zero as long as the system can predict sufficiently far. In Fig. 9, we show the impact of false alarms on user delay based on (12). The fraction of false alarms in the predicted arrivals is 1 − p. As seen from the figure, the average user delay increases when more false alarms exist in the system. When λ > pμ (corresponding to 1 − p > 0.25 in the simulation), the average delay does not decrease to zero. The larger 1 − p is, the less helpful proactive serving is in reducing user delay. In practice, system designers should improve their prediction algorithms to keep the fraction of false alarms less than the threshold 1 − λ/μ to extract the full potential of proactive serving. If the fraction of false alarms is inevitably larger than the threshold, then capacity boosting may be a better choice for system designers to decrease the average delay.

Discussion: In addition to the impact on user delay reduction, false alarms may incur additional costs. For example, in the content pre-fetching scenario, serving false alarms will waste bandwidth on the system side, and bandwidth, storage, and energy on the user side. Thus system designers should take into account the consequences of false alarms when designing the request prediction algorithm.

D. Impact of Miss Detection and False Alarm

When miss detections and false alarms are both present in the system, it is difficult to analyze the user delay due to the coupling of the two effects.
Instead, we conduct simulations to investigate the system behavior. We consider three cases. In the first case, the prediction mechanism results in few miss detections but many false alarms. In the second case, the prediction mechanism leads to few false alarms but many miss detections. The third case is in between. The first two can be considered as extreme cases. The simulation results under the setting of μ = 5 are shown in Fig. 10. A few comments are in order for Fig. 10. First of all, user delay still decreases exponentially as the system predicts further. Second, when there are many false alarms, the system cannot remove delay completely, which aligns with (13). When there are many miss detections, the delay decay rate is smaller because the server is occupied with miss detections, which also aligns with the results in Fig. 8. When the numbers of miss detections and false alarms are moderate, compared to the other two cases, user delay decreases rapidly and proactive serving leads to a small delay for users.

VI. COMPARISON WITH CAPACITY BOOSTING

In this section, we compare capacity boosting and proactive serving with perfect prediction as two principal design mechanisms for reducing user delay. In the case of imperfect prediction, the closed-form expressions do not admit a clear comparison, and thus we resort to performing the comparison based on simulation results in Section VII. For ease of discussion, we focus on the M/M/1[T] system as discussed in Section IV-A. For the M/M/1 system without proactive serving capability, the average user delay with service capacity mμ is given by 1/(mμ − λ). By comparing it with the average delay of the M/M/1[T] system with proactive serving capability in (3), we obtain the amount of prediction (measured in prediction window size) needed to obtain the same delay reduction as boosting the capacity by m times. Denoting the corresponding prediction window size by T(m), we have the following theorem.

Theorem 5: Assume λ < μ. For the M/M/1[T] system with perfect prediction, T(m) (m ≥ 1) is given by

T(m) = (1/(μ − λ)) ln((m − ρ)/(1 − ρ)).
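Before turning to the proof, note that the trade-off in Theorem 5 is straightforward to tabulate; a small sketch with illustrative parameter values (function name is ours, not the paper's):

```python
import math

def prediction_window_for_boost(m, lam, mu):
    # T(m) from Theorem 5: the prediction window size whose delay reduction
    # matches boosting the service capacity from mu to m*mu.
    rho = lam / mu
    return math.log((m - rho) / (1 - rho)) / (mu - lam)

# rho = 0.5, mu = 1: matching a 5x capacity boost takes T(5) = 2*ln(9) ~ 4.4
# time units of prediction; the growth in m is only logarithmic.
for m in (1, 2, 5, 10):
    print(m, round(prediction_window_for_boost(m, 0.5, 1.0), 2))
```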
Proof: For the M/M/1 system without proactive serving, the average user delay is 1/(mμ − λ) when the service capacity is mμ. To achieve the same delay performance, T(m) should satisfy

E[D^{T(m)}] = (1/(μ − λ)) e^{−(μ−λ)T(m)} = 1/(mμ − λ).

Then we obtain T(m) = (1/(μ − λ)) ln((m − ρ)/(1 − ρ)), where ρ = λ/μ.

We plot T(m) as a function of m in Fig. 11 under different ρ with μ = 1 (recall that ρ = λ/μ). We observe that T(m) increases logarithmically in the system capacity m. This suggests that proactive serving is more effective than boosting the system capacity for keeping the system in the low-delay regime. For example, to achieve the same average user delay, proactive serving with a prediction window size of 4 time units is equivalent to increasing the server capacity by 5 times when ρ = 0.5. In Fig. 11, when ρ = 0.9, the system is in the heavy-load regime when m is small and in the light-load regime when m

is large. The logarithmic curve then indicates that proactive serving is more effective in delay reduction than capacity boosting in the light-load regime. The reason is as follows. When the workload is light, the number of requests that enter the queue for service is small, and thus most of the service capacity can be spared to serve future requests. As a result, most requests are served proactively and thus experience low delay.

It is also conceivable to combine proactive serving and capacity boosting to achieve a desired average user delay for a system. To evaluate this idea, we run simulations under the setting of λ_1 = 0.4, λ_2 = 8, μ = 10 and p = 0.95 (for modeling imperfect prediction), as defined in Fig. 6. We plot the different combinations of T (representing proactive serving capability) and m (representing service capacity) that achieve the same desired delay target in Fig. 12. All the combinations of T and m on the same isoline achieve the same average user delay. For example, to obtain an average user delay of 0.1 time unit, the system can serve proactively 7.4 time units ahead without boosting the service capacity, or it can boost the service capacity and serve proactively only 0.9 time unit ahead. The system designer can select the best combination according to the average user delay requirement and various resource constraints. In practice, the choice of strategy to improve system delay performance may involve additional considerations. For capacity boosting, the incurred operation cost is a non-negligible factor. For proactive serving, the costs of bandwidth, storage, and energy due to false alarms also need to be considered. Along this line, we have obtained initial results by adapting the well-studied network utility maximization framework to achieve an optimized performance trade-off between delay reduction and multiple other design considerations. Due to the space limitation, we skip the details and refer interested readers to our technical report [24].

VII.
SIMULATIONS

We carry out simulations to study the impact of proactive serving on reducing the average user delay under different practical request arrivals and settings. Our objective is to evaluate: (i) What is the delay performance of the system with proactive serving capability? (ii) How do prediction errors affect the delay reduction? (iii) How does proactive serving compare to capacity boosting?

A. Parameters and Settings

To carry out our simulations, we collect two sets of videos from YouTube [35]. One set of 557 videos are popular ones, and the other set of 443 videos are randomly chosen. By studying the histogram of the durations of the popular videos, we observe that the histogram fits a light-tailed Gaussian distribution well, as shown in Fig. 13. This observation also confirms the measurement results in [36] that the video lengths of popular YouTube categories follow Gaussian distributions. For the randomly-chosen videos, we observe that the empirical CCDF (Complementary Cumulative Distribution Function) of their durations fits a heavy-tailed power-law distribution well, as shown in Fig. 14. Note that it is common to study the CCDF to evaluate whether a distribution is heavy-tailed or not. We evaluate the delay performance of the popular YouTube videos, representing the light-tailed service time case, and that of the randomly-chosen YouTube videos, representing the heavy-tailed service time case. In the simulations of popular YouTube videos, we set the number of servers in the system to 65. In the simulations of randomly-chosen videos, we set the number of servers to 27. In one second, a single server can serve a video of 7 seconds, which is about 1 MB in size. Such a setting is chosen for two reasons. First, under it, the workload levels for the two data sets are the same, so that we can compare simulation results across the two sets. Second, it makes sure that the average user delay is reasonable. The system adopts the FCFS service policy. To investigate the impact of prediction errors, we model the miss detections and false alarms in the simulations as follows.
In each time unit, a q fraction of the request arrivals are miss detections. The remaining 1 − q fraction compose the actual arrivals among the predicted request arrivals that can be served proactively. Among the predicted arrivals, a 1 − p fraction are false alarms and a p fraction are actual arrivals. Larger q means more miss detections, and smaller p means more false alarms in the system. For perfect prediction, q = 0 and p = 1.

B. Delay Reduction by Proactive Serving

The simulation results under the popular videos (with light-tailed durations) and the randomly-selected videos (with heavy-tailed durations) are shown in Figs. 15 and 16, respectively. For the popular videos, under perfect prediction, the user delay can be reduced by 50% when the system predicts 60 seconds ahead. For the randomly-chosen videos, under perfect prediction, the user delay can be reduced by 50% when the system predicts 84 seconds ahead. These simulation results suggest that the system can improve the user delay experience significantly by proactive serving. Furthermore, for the popular videos with light-tailed durations, the delay curve under perfect prediction can be fitted nicely by the exponential function 9.9e^{−0.012T}, with an R-squared value of 99.74%. This verifies the theoretical results we obtained in Section IV-B. In Fig. 16, we observe similar delay reduction as in Fig. 15. That is, when T is small, increasing T leads to a larger delay reduction than when T is large. On the other hand, the delay reduction rate in Fig. 16 is smaller than that in Fig. 15. Requests for elephant videos play an important role in the reduction rate. Among the randomly-chosen videos with heavy-tailed durations, elephant videos are not rare. Requests for such videos occupy the server for a long time. Therefore, the delay reduction under the heavy-tailed data set is less significant. It remains an interesting direction to characterize the delay performance under proactive serving for the case of heavy-tailed durations.

C. Impact of Miss Detection and False Alarm

The simulation results are shown in Figs. 15 and 16.
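The (q, p) error model used in these simulations can be sketched as a simple stream builder (a hypothetical helper, not the authors' simulator): each actual arrival is marked as a miss detection with probability q, and the predicted stream is then padded with false alarms so that actual arrivals form a p fraction of it (p > 0 assumed):

```python
import random

def build_streams(n_actual, q, p, seed=0):
    # Split n_actual arrivals into miss detections and predicted actuals,
    # then add false alarms so that actuals are a p-fraction of predictions.
    rng = random.Random(seed)
    missed = sum(rng.random() < q for _ in range(n_actual))
    predicted_actual = n_actual - missed
    false_alarms = round(predicted_actual * (1 - p) / p)  # requires p > 0
    return missed, predicted_actual, false_alarms

print(build_streams(10_000, q=0.0, p=1.0))    # perfect prediction: (0, 10000, 0)
print(build_streams(10_000, q=0.05, p=0.95))  # few misses, few false alarms
```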
As seen, both miss detections and false alarms lead to smaller delay reductions, but proactive serving can still reduce user delay significantly. For example, in Fig. 15, the user delay can be

Fig. 13. The histogram of popular-video durations fits well with a Gaussian distribution with an R-squared value [34] of 92%, suggesting that the distribution of video durations is light-tailed.

Fig. 14. The CCDF of randomly-chosen video durations fits well with a power-law distribution with an R-squared value of 97%, indicating that the distribution of video durations is heavy-tailed.

Fig. 15. The average request delay vs. how far we predict into the future under the light-tailed video set.

Fig. 16. The average request delay vs. how far we predict into the future under the heavy-tailed video set.

reduced by 50% when the system predicts about 23 seconds ahead, even when the prediction misses 50% of the actual arrivals (the red curve with circle markers). Under all the data traces, proactive serving is relatively more sensitive to false alarms than to miss detections, which matches what we observed in Section V-D. The reason is that some of the server capacity allocated to future requests is wasted due to false alarms, and thus the server capacity is not well utilized.

D. Comparison with Capacity Boosting

We compare proactive serving under perfect prediction with system capacity boosting using all the YouTube video traces. The results are shown in Tab. I. In the table, the first column is how far in time the system serves proactively with perfect prediction. The second column is by how many percent the number of servers needs to increase to achieve the same delay performance. The third column is the percentage decrease of the utilization ratio of each server when the total number of servers increases. The fourth column is by how many percent the average user delay is reduced. For example, the first row of Tab. I says that serving proactively 12 seconds ahead can reduce the average user delay by 54.1%.
To achieve the same delay performance, the system needs to increase the number of servers by 10%, which results in the utilization rate of each server being decreased by 9.1%.

T (sec.) | # of servers | utilization | average delay
12 | +10% | -9.1% | -54.1%
29 | +20% | -16.7% | -75.3%
31 | +30% | -23.1% | -89.9%
TABLE I. COMPARISON WITH SYSTEM CAPACITY BOOSTING UNDER THE YOUTUBE DATA TRACES.

From Tab. I, we first observe that, as the system capacity increases, the rate of increase of T slows down, where T is how far the system needs to predict to achieve the same delay performance as capacity boosting. This agrees with what we observed in Fig. 11: proactive serving is more effective than capacity boosting in the light-load regime. For example, to achieve the delay reduction attained by boosting the capacity by 10%, one needs to predict request arrivals 12 seconds ahead. Meanwhile, suppose we have already boosted the capacity by 20%; to achieve the same delay reduction as boosting another 10% of the capacity (assuming it incurs the same amount of expense), one only needs to increase the prediction window size from 29 to 31 seconds, which is smaller than the 12-second increment required in the previous (heavier load) case. Results from Tab. I also show that, under the YouTube data traces, boosting the capacity by 30% is equivalent to predicting 31 seconds ahead; both reduce the average user delay by about 90%. Since predicting 31 seconds ahead with decent accuracy is very feasible according to the study on predicting user requests for an industrial-grade VoD system [6], this means one can achieve the same delay reduction by leveraging prediction, without the significant investment needed to increase the service capacity by 30%.

VIII. CONCLUSIONS AND FUTURE WORK

In this paper, we investigate the fundamentals of proactive serving from a queueing theory perspective. We show that proactive serving decreases average delay exponentially (as a function of the prediction window size) under perfect prediction.
We show that in the presence of miss detection, the average user delay still decreases exponentially in the prediction window size, but to a positive constant determined by the fraction of miss detections. Meanwhile, in the presence of false alarms, we show that there exists an effective service rate pµ. The average user delay decreases exponentially to zero as the prediction window size increases if the actual user request arrival rate is smaller than the effective service rate; otherwise, it decreases to a positive constant determined by the fraction of false alarms. Compared with the conventional mechanism of capacity boosting, we show that proactive serving is more effective in decreasing user delay in the light-workload regime. Our trace-driven evaluation results demonstrate the practical power of proactive serving: e.g., for the data trace of light-tailed Youtube videos, user delay decreases by 5% when the system can predict 6 seconds ahead. Our study applies to general online service models that use prediction, such as pre-loading content of interest on mobile devices and pre-fetching Youtube videos. Therefore, our results not only provide a solid theoretical foundation for proactive serving, but also reveal insights into its practical application. Our research offers several interesting future directions. First, while in this paper we focus on characterizing the benefits of proactive serving, it would also be valuable to consider the cost/overhead involved in proactive serving, e.g., bandwidth

cost due to false alarms, to properly evaluate its effectiveness. Second, while this paper only considers proactively serving one predicted request at a time, it would be conceivable and interesting to generalize the analysis to consider serving multiple predicted requests simultaneously. Third, while we consider time-homogeneous prediction accuracy in this paper, it would be interesting to generalize the study to cases with time-heterogeneous prediction accuracy, where short-term prediction is more accurate than long-term prediction. Last but not least, it would be interesting to analyze the delay reduction benefits of proactive serving under work-conserving policies other than the FCFS used in our current study.

REFERENCES

[1] S. Zhang, L. Huang, M. Chen, and X. Liu, "Effect of proactive serving on user delay reduction in service systems," in ACM SIGMETRICS, 2014.
[2] ——, "Proactive serving decreases user delay exponentially," in Proceedings of the Workshop on MAthematical performance Modeling and Analysis (MAMA), 2015.
[3] R. Kohavi and R. Longbotham, "Online experiments: Lessons learned," IEEE Computer, 2007.
[4] A. Khan, X. Yan, S. Tao, and N. Anerousis, "Workload characterization and prediction in the cloud: A multiple time series approach," in IEEE Network Operations and Management Symposium, 2012.
[5] Kindle fire. http://www.amazon.com/gp/product/b5vvob2.
[6] G. Gursun, M. Crovella, and I. Matta, "Describing and forecasting video access patterns," in Proc. INFOCOM, 2011.
[7] J. Spencer, M. Sudan, and K. Xu, "Queueing with future information," Annals of Applied Probability, 2014.
[8] K. Xu and C. W. Chan, "Using future information to reduce waiting times in the emergency department," Manufacturing & Service Operations Management, 2016.
[9] L. Huang, S. Zhang, M. Chen, and X. Liu, "When backpressure meets predictive scheduling," IEEE Trans. Networking, 2016.
[10] J. Tadrous, A. Eryilmaz, and H. El Gamal, "Proactive resource allocation: harnessing the diversity and multicast gains," IEEE Trans. Information Theory, 2013.
[11] ——, "Proactive content download and user demand shaping for data networks," IEEE Trans. Networking, 2015.
[12] ——, "Joint smart pricing and proactive content caching for mobile services," IEEE Trans. Networking, 2016.
[13] X. Chen and X. Zhang, "A popularity-based prediction model for web prefetching," IEEE Computer, 2003.
[14] ——, "Coordinated data prefetching for web contents," Computer Communications, 2005.
[15] T. Anagnostopoulos, C. Anagnostopoulos, S. Hadjiefthymiades, M. Kyriakakos, and A. Kalousis, "Predicting the location of mobile users: a machine learning approach," in International Conference on Pervasive Services, 2009.
[16] R. Mayrhofer, H. Radi, and A. Ferscha, "Recognizing and predicting context by learning from user behavior," in The International Conference on Advances in Mobile Multimedia, 2003.
[17] S. Sigg, Development of a novel context prediction algorithm and analysis of context prediction schemes. Kassel University Press, 2008.
[18] V. S. Tseng and K. W. Lin, "Efficient mining and prediction of user behavior patterns in mobile web systems," Information and Software Technology, 2006.
[19] Y. Xu, M. Lin, H. Lu, G. Cardone, N. Lane, Z. Chen, A. Campbell, and T. Choudhury, "Preference, context and communities: a multi-faceted approach to predicting smartphone app usage patterns," in Proc. ISWC, 2013.
[20] B. D. Higgins, J. Flinn, T. J. Giuli, B. Noble, C. Peplin, and D. Watson, "Informed mobile prefetching," in Proc. MobiSys, 2012.
[21] M. Palmer and S. B. Zdonik, Fido: A cache that learns to fetch. Brown University, Department of Computer Science, 1991.
[22] D. Kotz and C. S. Ellis, "Practical prefetching techniques for parallel file systems," in Parallel and Distributed Information Systems, 1991.
[23] H. Yu and G. Kedem, "DRAM-page based prediction and prefetching," in IEEE Computer Design, 2000.

Fig. 7. G/G/1: The arrival process is {A(t + ω)}_{t≥0}. Service times of requests are independent and identically distributed with mean 1/µ. The initial value Q_g(0) is A(0 : ω) + Q(0). The service policy is FCFS.

[24] S. Zhang, L. Huang, M. Chen, and X.
Liu, "Proactive serving decreases user delay exponentially: The light-tailed service time case," The Chinese University of Hong Kong, Hong Kong, Tech. Rep., 2015. [Online]. Available: http://www.ie.cuhk.edu.hk/~mhchen/paper/proactive_serving.tr.pdf
[25] A. O. Allen, Probability, statistics, and queueing theory: with computer science applications. Gulf Professional Publishing, 1990.
[26] L. Kleinrock, Queueing Systems, Volume II: Computer Applications. Wiley-Interscience, 1976.
[27] J. Kingman, "Inequalities in the theory of queues," Journal of the Royal Statistical Society, 1970.
[28] Moment generating function. http://en.wikipedia.org/wiki/Moment-generating_function.
[29] G. Muraleedharan, A. Rao, P. Kurup, N. U. Nair, and M. Sinha, "Modified weibull distribution for maximum and significant wave height simulation and prediction," Coastal Engineering, 2007.
[30] J. Kingman, "On queues in heavy traffic," Journal of the Royal Statistical Society, 1962.
[31] H. Kobayashi, "Bounds for the waiting time in queuing systems," Computer Architectures and Networks, 1974.
[32] Residue theorem. http://en.wikipedia.org/wiki/Residue_theorem.
[33] Branch cut. http://en.wikipedia.org/wiki/Branch_point.
[34] N. R. Draper and H. Smith, Applied regression analysis. Wiley-Interscience, 1998.
[35] Youtube. http://www.youtube.com.
[36] X. Cheng, C. Dale, and J. Liu, "Understanding the characteristics of internet short video sharing: Youtube as a case study," in Proc. IMC, 2007.
[37] L. Kleinrock, Queueing Systems, Volume I: Theory. Wiley-Interscience, 1975.
[38] I. Adan and J. Resing, Queueing theory. Eindhoven University of Technology, Eindhoven, 2002.

APPENDIX A
PROOF OF COROLLARY 1

Consider a general queuing system as shown in Fig. 13. We can always find a queuing system without proactive serving capability, as in Fig. 7, where the arrival process is delayed by ω amount of time, the initial value of the queue is equal to A(0 : ω) + Q(0), and the service policy is the same as that adopted in the general queuing system. As such, we get the same equivalence between the two systems as in Lemma 1.
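This construction can be sanity-checked on sample paths. The sketch below is our own slotted-time illustration (Bernoulli arrivals and service opportunities, illustrative parameters; not the authors' code): a queue that may pre-serve the earliest request visible in its lookahead window is compared, slot by slot, against an ordinary queue fed the same arrivals advanced by ω slots and initialized with the first ω slots of arrivals, as in Fig. 7.

```python
import random

def simulate(T=5_000, omega=5, p_arr=0.5, p_srv=0.6, seed=7):
    rng = random.Random(seed)
    a = [1 if rng.random() < p_arr else 0 for _ in range(T + omega + 1)]
    srv = [rng.random() < p_srv for _ in range(T)]

    # System 1: proactive serving, FCFS over all requests visible in the
    # lookahead window (arrival slots 0 .. t + omega).
    pending = a[:]          # pending[i] = 1 iff the request of slot i is unserved
    head = 0                # earliest index that may still be pending
    vis = sum(a[:omega])    # unserved requests currently visible
    pre_served = 0          # requests served before their arrival slot
    q1 = []
    for t in range(T):
        vis += a[t + omega]            # slot t + omega enters the window
        if srv[t] and vis > 0:
            i = head
            while not pending[i]:
                i += 1
            pending[i] = 0             # serve the earliest visible request
            head = i + 1
            vis -= 1
            if i > t:
                pre_served += 1        # a future request was served proactively
        q1.append(vis)

    # System 2 (Fig. 7): plain queue, arrivals advanced by omega slots,
    # initial backlog = arrivals of the first omega slots.
    g, q2 = sum(a[:omega]), []
    for t in range(T):
        g += a[t + omega]
        if srv[t] and g > 0:
            g -= 1
        q2.append(g)
    return q1, q2, pre_served

q1, q2, pre_served = simulate()
print(q1 == q2, pre_served > 0)  # identical sample paths; pre-serving did occur
```

The two traces agree exactly at every slot even though the first system serves some requests before they arrive, which is the equivalence exploited in the proof.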
Based on how f_D^G(t) is derived in [37], the delay distribution of the system in Fig. 7 is the same as f_D^G(t). This is because we consider the delay distribution when the system is in steady state; the initial value of Q_g (which is bounded) and the shifted arrival process then do not alter the delay distribution. Then, based on the same argument as in Lemma 2, we can show that the delay distribution under proactive serving can be obtained by shifting the delay distribution without proactive serving to the left by ω amount of time.

APPENDIX B

Discussion: An alternative approach to obtaining E[D^ω] in Theorem 1 is to apply a Markov chain model, which works as follows. First, we discretize the system: we chop time into slots of equal length, and let δ denote the slot length. At time slot t, A(t) becomes a Bernoulli random variable

Fig. 8. M/M/1 with preemptive priority: The two arrival processes are {A₁(t)}_{t≥0} and {A₂(t + ω)}_{t≥0}, respectively. Service times of requests are independent and identically exponentially distributed with mean 1/µ. The initial value of Q_m is A₂(0 : ω) + Q(0). Requests in {A₁(t)}_{t≥0} have preemptive priority over those in {A₂(t + ω)}_{t≥0}.

with probability λδ. In each time slot, the server is on with probability µδ and off with probability 1 − µδ. When the server is on, it can serve one request in the slot. Q(t) stores the requests that are waiting in the system for service at slot t. The prediction window W(t) is chopped into ω/δ small windows, denoted by {wᵢ(t)}_{1≤i≤ω/δ}. Each request first goes through a pipeline of these small windows, from w_{ω/δ}(t) to w₁(t), before entering Q(t). If A(t + iδ) = 1, then the system observes a request in the window wᵢ(t), which can be served proactively. Based on the above steps, the system can be modeled by a multi-dimensional Markov chain with state (w_{ω/δ}(t), w_{ω/δ−1}(t), ..., w₁(t), Q(t)). By solving for the stationary distribution of this Markov chain, we can obtain the average user delay of the discretized system by applying Little's Law. Then, by taking the limit as δ → 0, we finally get E[D^ω]. Remark that the structure of the Markov chain is very complicated, which makes the derivation of the stationary distribution highly involved. Compared with the approach used in the paper, this alternative approach is complicated and does not provide intuitive insight. At the same time, it is hard to generalize to more complex models: for example, for the system in Section V, the Markov chain that models the discretized system is much more complex and is rather challenging to solve.

APPENDIX C
PROOF OF THEOREM 3

Instead of FCFS, the system under imperfect prediction gives preemptive priority to the requests of {A₁(t)}_{t≥0}, and within the same arrival process FCFS is adopted. Under this policy, Q_sum(t) ≜ Q^ω(t) + W(t) evolves the same as the system in Fig. 8.
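As a quick numerical check of this equivalence, the sketch below (our illustration, with arbitrary admissible rates λ₁ = λ₂ = 0.3 and µ = 1) simulates the continuous-time chain of the Fig. 8 system, where class-1 requests preempt class-2 ones, and estimates the time-average distribution of the total number of requests; it matches the geometric distribution of an ordinary M/M/1 with load ρ = (λ₁ + λ₂)/µ, the fact used for πᵢ in the proof.

```python
import random

def total_occupancy(lam1=0.3, lam2=0.3, mu=1.0, steps=500_000, seed=3):
    """Time-average distribution of the total number in a preemptive-priority M/M/1."""
    rng = random.Random(seed)
    n1 = n2 = 0                      # class-1 and class-2 requests in system
    time_in, t = {}, 0.0
    for _ in range(steps):
        rate = lam1 + lam2 + (mu if n1 + n2 > 0 else 0.0)
        dt = rng.expovariate(rate)
        time_in[n1 + n2] = time_in.get(n1 + n2, 0.0) + dt
        t += dt
        u = rng.random() * rate
        if u < lam1:
            n1 += 1                  # class-1 arrival
        elif u < lam1 + lam2:
            n2 += 1                  # class-2 arrival
        elif n1 > 0:
            n1 -= 1                  # service completion; class 1 preempts class 2
        else:
            n2 -= 1
    return {k: v / t for k, v in time_in.items()}

pi = total_occupancy()
rho = 0.6
for i in range(4):
    print(i, round(pi[i], 3), round((1 - rho) * rho ** i, 3))
```

The total occupancy is insensitive to the priority rule here because the server never idles while work is present and service times are i.i.d. exponential across classes.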
The proof is similar to that of Lemma 1 and is thus omitted. Consider a request in {A₂(t)}_{t≥0}. Similar to Lemma 2, if it spends T time in Q_m, it will spend [T − ω]⁺ in Q^ω. The distribution of the delay that requests in {A₂(t)}_{t≥0} spend in Q^ω can therefore be obtained by shifting that in Q_m to the left by ω. Let f₂^ω(t) and f₂^m(t) be the density functions of the delay that requests of {A₂(t)}_{t≥0} spend in Q^ω and Q_m, respectively. We have f₂^ω(t) = f₂^m(t + ω) when t > 0. Next we first calculate f₂^m(t), from which we then obtain f₂^ω(t). To do so, as the first step, we express f₂^m(t) as a function of the distribution of the busy period of an M/M/1 system. By leveraging existing knowledge on the busy period of the M/M/1 system (in particular, the Laplace transform of the distribution of the busy period length is known), we get the Laplace transform of f₂^m(t). Then, based on the relationship between f₂^m(t) and f₂^ω(t), we get the Laplace transform of f₂^ω(t).

Now we focus on the system in Fig. 8. It can be modeled by a Markov chain; let πᵢ be the stationary probability that the number of requests in the system is i. By standard analysis, we have πᵢ = (1 − (λ₁ + λ₂)/µ) ((λ₁ + λ₂)/µ)ⁱ. Consider a request of {A₂(t)}_{t≥0}, denoted by a₂. Let T be the time that a₂ spends in the system, and let N be the total number of requests already in the system when a₂ enters. Let T_{i+1} be the time that a₂ spends in the system conditioned on N = i. Then we have

P(T ≤ t) = Σ_{i≥0} πᵢ P(T ≤ t | N = i) = Σ_{i≥0} πᵢ P(T_{i+1} ≤ t). (4)

T_{i+1} is equal to the time until a standard M/M/1 system is empty again when there are i + 1 requests initially in it [38]. Note that T₁ is the length of a busy period. Denote the probability density function of T_{i+1} by f_{T_{i+1}}(t). By (4), we get f₂^m(t) = Σ_{i≥0} πᵢ f_{T_{i+1}}(t). From [38], we know that the Laplace transform of f_{T_{i+1}}(t) is

F_{i+1}(s) = [ ((λ₁ + µ + s) − √((λ₁ + µ + s)² − 4λ₁µ)) / (2λ₁) ]^{i+1}.

So the Laplace transform of f₂^m(t) is

Σ_{i≥0} πᵢ F_{i+1}(s) = 2(µ − λ₁ − λ₂) / (µ − λ₁ − 2λ₂ + s + √((λ₁ + µ + s)² − 4λ₁µ)).
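The busy-period transform quoted from [38] can be spot-checked by Monte Carlo. The sketch below is our illustration with assumed values λ₁ = 0.5, µ = 1, s = 0.7: it estimates E[e^{−sT₁}] over simulated busy periods started by a single request and compares the estimate with the closed form F₁(s).

```python
import math, random

def busy_period(lam, mu, rng):
    """Length of an M/M/1 busy period started by one request."""
    n, t = 1, 0.0
    while n > 0:
        t += rng.expovariate(lam + mu)              # time to next event
        n += 1 if rng.random() < lam / (lam + mu) else -1
    return t

lam, mu, s = 0.5, 1.0, 0.7
rng = random.Random(11)
mc = sum(math.exp(-s * busy_period(lam, mu, rng)) for _ in range(200_000)) / 200_000
a = lam + mu + s
closed = (a - math.sqrt(a * a - 4 * lam * mu)) / (2 * lam)
print(round(mc, 3), round(closed, 3))
```

Raising the same closed form to the power i + 1 gives F_{i+1}(s), since the first-passage time from i + 1 requests to an empty system is a sum of i + 1 independent busy periods.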
By definition, the Laplace transform of f₂^ω(t) is

F^ω(s) = ∫₀^∞ e^{−st} f₂^m(t + ω) dt + ∫₀^ω f₂^m(t) dt
= e^{sω} ∫_ω^∞ e^{−st} f₂^m(t) dt + ∫₀^ω f₂^m(t) dt
= e^{sω} [ F^m(s) − ∫₀^ω e^{−st} f₂^m(t) dt ] + ∫₀^ω f₂^m(t) dt,

where F^m(s) = Σ_{i≥0} πᵢ F_{i+1}(s) is the Laplace transform of f₂^m(t), and the term ∫₀^ω f₂^m(t) dt accounts for the probability mass of zero delay. Based on this Laplace transform, we are ready to calculate the average delay of the requests of {A₂(t)}_{t≥0} in Q^ω, denoted by E[D₂^ω]. By the definition of the Laplace transform,

E[D₂^ω] = − dF^ω(s)/ds |_{s=0} = µ / ((µ − λ₁)(µ − λ₁ − λ₂)) − ω + ∫₀^ω (ω − t) f₂^m(t) dt.

The derivative of E[D₂^ω] with respect to ω is then

dE[D₂^ω]/dω = −1 + ∫₀^ω f₂^m(t) dt = − ∫_ω^∞ f₂^m(t) dt.

Now consider ∫_ω^∞ f₂^m(t) dt as a function of ω. Its Laplace transform is equal to

(1/s) ( 1 − 2(µ − λ₁ − λ₂) / (µ − λ₁ − 2λ₂ + s + √((λ₁ + µ + s)² − 4λ₁µ)) ),

which is due to the integration property of the Laplace transform. Denote this transform of ∫_ω^∞ f₂^m(t) dt by Ĝ(s); next, we invert it to get the expression of ∫_ω^∞ f₂^m(t) dt. Define

s₁ ≜ −(√µ + √λ₁)², s₂ ≜ −(√µ − √λ₁)², s₃ ≜ λ₂(λ₁ + λ₂ − µ)/(λ₁ + λ₂), s₄ ≜ 0.

s₁ and s₂ are branch points of Ĝ(s), and s₄ is a simple pole of Ĝ(s). When (λ₁ + λ₂)² > λ₁µ, s₃ is also a simple pole of Ĝ(s). The residue of e^{sω} Ĝ(s) at s₃ is

Res(s₃) = [λ₁µ − (λ₁ + λ₂)²] / [λ₂(λ₁ + λ₂)] · e^{λ₂(λ₁ + λ₂ − µ)ω/(λ₁ + λ₂)},

and the residue at s₄ is Res(s₄) = 0. Consider the closed contour L + C_R shown in Fig. 9, where L is the vertical segment from σ − jR to σ + jR and the detour around the branch cut is denoted C_r. We have

∮ e^{sω} Ĝ(s) ds = ∫_L e^{sω} Ĝ(s) ds + ∫_{C_R} e^{sω} Ĝ(s) ds + ∫_{C_r} e^{sω} Ĝ(s) ds = 2πj Res(s₄) + 2πj Res(s₃) · 1_{(λ₁+λ₂)² > λ₁µ}, (5)

where the last equality is based on Cauchy's Theorem. As R → ∞, (1/2πj) ∫_{σ−jR}^{σ+jR} e^{sω} Ĝ(s) ds becomes the inverse transform of Ĝ(s). According to Jordan's lemma, ∫_{C_R} e^{sω} Ĝ(s) ds → 0 as R → ∞. Thus, to calculate the inverse transform of Ĝ(s), we only need to calculate lim_{r→0} ∫_{C_r} e^{sω} Ĝ(s) ds:

lim_{r→0} ∫_{C_r} e^{sω} Ĝ(s) ds
= lim_{r→0} ∫₀^{2π} Ĝ(s₁ + re^{jθ}) e^{(s₁ + re^{jθ})ω} jre^{jθ} dθ + lim_{r→0} ∫₀^{2π} Ĝ(s₂ + re^{jθ}) e^{(s₂ + re^{jθ})ω} jre^{jθ} dθ
+ ∫₀^{4√(λ₁µ)} 2(µ − λ₁ − λ₂) e^{(s₂ − x)ω} / [ (s₂ − x)(µ − λ₁ − 2λ₂ + s₂ − x − j√(x(4√(λ₁µ) − x))) ] dx
− ∫₀^{4√(λ₁µ)} 2(µ − λ₁ − λ₂) e^{(s₂ − x)ω} / [ (s₂ − x)(µ − λ₁ − 2λ₂ + s₂ − x + j√(x(4√(λ₁µ) − x))) ] dx
= ∫₀^{4√(λ₁µ)} j(µ − λ₁ − λ₂) √(x(4√(λ₁µ) − x)) e^{(s₂ − x)ω} / [ (s₂ − x)((λ₁ + λ₂)x + (√(λ₁µ) − λ₁ − λ₂)²) ] dx,

where, in the first equality, the first two integrals are those around s₁ and s₂ (they are equal to 0), and the last two are those along the branch cut from s₂ to s₁ and from s₁ to s₂, respectively. By (5), we then obtain ∫_ω^∞ f₂^m(t) dt. Finally, we get

dE[D₂^ω]/dω = − ∫_ω^∞ f₂^m(t) dt
= [λ₁µ − (λ₁ + λ₂)²] / [λ₂(λ₁ + λ₂)] · e^{λ₂(λ₁+λ₂−µ)ω/(λ₁+λ₂)} · 1_{(λ₁+λ₂)² > λ₁µ}
+ (1/2π) ∫₀^{4√(λ₁µ)} (µ − λ₁ − λ₂) √(x(4√(λ₁µ) − x)) e^{(s₂ − x)ω} / [ (s₂ − x)((λ₁ + λ₂)x + (√(λ₁µ) − λ₁ − λ₂)²) ] dx. (6)

Then we derive the average delay of requests of {A₂(t)}_{t≥0} as

E[D₂^ω] = µ / ((µ − λ₁)(µ − λ₁ − λ₂))
− [ ((λ₁ + λ₂)² − λ₁µ)(1 − e^{λ₂(λ₁+λ₂−µ)ω/(λ₁+λ₂)}) ] / [ λ₂²(µ − λ₁ − λ₂) ] · 1_{(λ₁+λ₂)² > λ₁µ}
− (1/2π) ∫₀^{4√(λ₁µ)} (µ − λ₁ − λ₂) √(x(4√(λ₁µ) − x)) (1 − e^{(s₂ − x)ω}) / [ (s₂ − x)²((λ₁ + λ₂)x + (√(λ₁µ) − λ₁ − λ₂)²) ] dx.

Because of the preemptive priority, the average delay of requests of {A₁(t)}_{t≥0} is E[D₁] = 1/(µ − λ₁). We then obtain the average delay over all requests as E[D^ω] = λ₁/(λ₁ + λ₂) · E[D₁] + λ₂/(λ₁ + λ₂) · E[D₂^ω], based on the law of total expectation. Now consider the second (integral) term of (6):

(1/2π) ∫₀^{4√(λ₁µ)} (µ − λ₁ − λ₂) √(x(4√(λ₁µ) − x)) e^{(s₂ − x)ω} / [ (s₂ − x)((λ₁ + λ₂)x + (√(λ₁µ) − λ₁ − λ₂)²) ] dx
= C e^{−((√µ − √λ₁)² + ξ)ω}, where C is a negative constant and ξ is a constant between 0 and 4√(λ₁µ), by the mean value theorem. Now, in (6), the coefficients of the two exponential terms are negative, and in both exponential terms the coefficient of ω is also negative. We can therefore find positive constants C₁, C₂, ξ₁, ξ₂ such that

−C₁ e^{−ξ₁ω} < dE[D₂^ω]/dω < −C₂ e^{−ξ₂ω}.

As a result, E[D₂^ω] decreases exponentially in ω. Because dE[D^ω]/dω = λ₂/(λ₁ + λ₂) · dE[D₂^ω]/dω, we conclude that E[D^ω] decreases exponentially in ω.

Fig. 9. Contour integration: The contour consists of L and C_R. s₁ and s₂ are branch points; s₃ and s₄ are simple poles.
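The exponential decay just established can also be observed numerically. In the sketch below (our illustration, with assumed admissible rates λ₁ = λ₂ = 0.3 and µ = 1, so µ > λ₁ + λ₂), the stationary class-2 delay in Q_m is sampled as in (4): draw the occupancy N from the geometric distribution πᵢ, then simulate the first-passage time T_{N+1} of an M/M/1 with arrival rate λ₁; the sample mean is compared with µ/((µ − λ₁)(µ − λ₁ − λ₂)), and E[(T − ω)⁺] is seen to fall off roughly geometrically as the window ω grows.

```python
import random

lam1, lam2, mu = 0.3, 0.3, 1.0
rho = (lam1 + lam2) / mu
rng = random.Random(5)

def first_passage(n):
    """Time for an M/M/1 with arrival rate lam1 to empty, starting from n requests."""
    t = 0.0
    while n > 0:
        t += rng.expovariate(lam1 + mu)
        n += 1 if rng.random() < lam1 / (lam1 + mu) else -1
    return t

def sample_occupancy():
    """Draw N with P(N = i) = (1 - rho) * rho**i."""
    i = 0
    while rng.random() < rho:
        i += 1
    return i

samples = [first_passage(sample_occupancy() + 1) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(round(mean, 2), round(mu / ((mu - lam1) * (mu - lam1 - lam2)), 2))
tails = [sum(max(t - w, 0.0) for t in samples) / len(samples) for w in (0, 4, 8, 12)]
print([round(x, 3) for x in tails])   # E[(T - w)^+] for growing windows w
```

Successive values shrink by a roughly constant factor, consistent with dE[D₂^ω]/dω being sandwiched between two decaying exponentials.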

Shaoquan Zhang received his B.Eng. degree from the University of Science and Technology of China in 2008. He received his Ph.D. degree in Information Engineering from The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, in 2014. His research interests include networked system analysis and algorithm design, and distributed and stochastic network optimization.

Xin Liu received her Ph.D. degree in electrical engineering from Purdue University in 2002. She is currently a Professor in the Computer Science Department at the University of California, Davis. From March 2012 to June 2014, she worked in the wireless networking group at Microsoft Research Asia. She has studied cellular scheduling algorithms, cognitive radio networks, and wireless mesh networks. Her current research focuses on data-driven approaches in networking. She has received the NSF CAREER award (2005), the Outstanding Engineering Junior Faculty Award from the UC Davis College of Engineering (2005), and the Chancellor's Fellowship (2011).

Longbo Huang received the Ph.D. degree in Electrical Engineering from the University of Southern California in August 2011. He then worked as a postdoctoral researcher in the Electrical Engineering and Computer Sciences department at the University of California at Berkeley from July 2011 to August 2012. Since August 2012, Dr. Huang has been an assistant professor at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University (Beijing, China). Dr. Huang was a visiting scholar at the LIDS lab at MIT in summer 2012 and summer 2014, and at the EECS department at UC Berkeley in summer 2013. He was also a visiting professor at the Institute of Network Coding at CUHK in winter 2012 and at MSRA in 2015. Dr. Huang was selected into China's Youth 1000-talent program in 2013 and received the outstanding teaching award from Tsinghua University in 2014. Dr. Huang's current research interests are in the areas of learning and optimization, network control, data center networking, smart grid, and mobile networks.

Minghua Chen (S'04 M'06 SM'13) received his B.Eng. and M.S. degrees from the Dept.
of Electronic Engineering at Tsinghua University in 1999 and 2001, respectively. He received his Ph.D. degree from the Dept. of Electrical Engineering and Computer Science at the University of California at Berkeley in 2006. He spent one year visiting Microsoft Research Redmond as a Postdoc Researcher. He joined the Dept. of Information Engineering, The Chinese University of Hong Kong, in 2007, where he is currently an Associate Professor. He is also an Adjunct Associate Professor in the Institute of Interdisciplinary Information Sciences, Tsinghua University. He received the Eli Jury award from UC Berkeley in 2007 (presented to a graduate student or recent alumnus for outstanding achievement in the area of Systems, Communications, Control, or Signal Processing) and The Chinese University of Hong Kong Young Researcher Award in 2013. He also received several best paper awards, including the IEEE ICME Best Paper Award in 2009, the IEEE Transactions on Multimedia Prize Paper Award in 2009, and the ACM Multimedia Best Paper Award in 2012. He is currently an Associate Editor of the IEEE/ACM Transactions on Networking. He serves as a TPC Co-Chair of ACM e-Energy 2016 and a General Co-Chair of ACM e-Energy 2017. His recent research interests include energy systems (e.g., smart power grids and energy-efficient data centers), distributed optimization, multimedia networking, wireless networking, network coding, and delay-constrained network information flow.