Online Learning Approaches in Maximizing Weighted Throughput


Zhi Zhang, Fei Li, Songqing Chen
Department of Computer Science
George Mason University
Fairfax, Virginia
{zzhang8, lifei,

Abstract

Motivated by providing quality-of-service for next-generation IP-based networks, we design algorithms to schedule packets with values and deadlines. Packets arrive over time; each packet has a non-negative value and an integer deadline. In each time step, at most one packet can be sent. Packets can be dropped at any time before they are sent. The objective is to maximize the total value gained by delivering packets no later than their respective deadlines. This model is the well-studied bounded-delay model (Hajek, CISS; Kesselman et al., SICOMP 2004), for which competitive online algorithms have been developed extensively. In a generalization of this model, the success of delivering a packet in each time step depends on the reliability of the communication channel. In this paper, we apply online learning approaches to this model as well as a few of its variants. We design online learning algorithms and analyze their performance theoretically in terms of external regret. We also measure these algorithms' performance experimentally. We conclude that no online learning algorithm has a constant regret. Our online learning algorithms outperform the competitive algorithms in algorithmic simplicity and running complexity. In practice, these online learning algorithms generally work no worse than the best known competitive online algorithm for maximizing weighted throughput.

I. INTRODUCTION

In the past few decades, the Internet has continued to serve various types of applications. The routers in the IP-based infrastructure execute the Internet's main functionality. Figure 1 illustrates the functionalities of buffer management inside routers. A buffer management algorithm is in charge of two tasks: queuing packets and delivering packets.
When new packets arrive, buffer management decides which ones to accept and queue for potential delivery, and which ones already in the buffer to drop permanently due to packet deadline or buffer capacity constraints. In each time step, the buffer management selects a pending packet in the buffer to send. Information about future incoming packets is completely unknown to us. The buffer management can be regarded as an online scheduling algorithm processing prioritized packets.

For ease of implementation and deployment, most current routers forward packets in a first-in-first-out manner and treat all packets equally: they simply send the earliest-released packet, and when the buffer is full of packets, later arrivals are dropped immediately. However, the diversity of applications running over the Internet has resulted in heterogeneity, and in unpredictable or even chaotic network traffic [23], [24]. Thus, it is more reasonable to consider differentiation among packets from different types of applications. One such effort is to provide the proportional differentiated services called DiffServ (see [7], [19] and the references therein). These practical concerns have made buffer management at routers significant and challenging in providing effective quality-of-service (QoS) support to various applications.

Fig. 1. Buffer management inside of routers

In the QoS buffer management setting, we use a value to represent a packet's priority, and we maximize the weighted throughput, which is defined as the total value of the packets sent. (In this paper, we use value, weight, and priority interchangeably.) If the release time and value of each packet are known ahead of time, an optimal schedule can be found efficiently (see Section III-A). The optimal offline algorithm is clairvoyant: it has the whole input sequence in advance when designing its scheduling policy. However, in real applications, we do not know all such information ahead of time. Rather, packets arrive over time in an online manner, and we only learn about a packet and its associated characteristics when it actually arrives.
Thus, an optimal offline scheduling algorithm is unrealistic and cannot be applied in real buffer management. Instead, we have to design online buffer management that makes its decisions based on the input that it has seen so far.

A. Related work

Principally, there are two main streams of research targeting differentiated services for the Internet via efficient buffer management. One stream provides statistical guarantees under some mild assumptions on the stochastic properties of the traffic flows. These algorithms are called stochastic algorithms. Statistical guarantees depend heavily on how successfully the input traffic flows are modeled, and probabilistic analysis is used to calculate such an algorithm's expected performance.

In contrast to providing statistical guarantees, other researchers study the algorithms' performance under worst-case scenarios. These algorithms can be classified into competitive online algorithms and online learning algorithms. Quite extensive work has been done on competitive online algorithms. To our knowledge, our work in this paper is the first endeavor to tackle this buffer management problem using the online learning approach.

Research on competitive online algorithms starts from Aiello et al. [1], Mansour et al. [22], and Kesselman et al. [12]. In their models, no assumptions are made on the input traffic flows, and the competitive ratio [4] is used to measure an online algorithm's performance. These studies avoid the difficulty introduced by the not-yet-successful modeling of the input traffic flow [24]. The models in [1], [12] have attracted much attention and have been studied extensively in the past few years.

In order to evaluate the performance of an online competitive algorithm that lacks information about future input, we compare it with an optimal offline algorithm. An online algorithm is called k-competitive if its weighted throughput on any instance is at least 1/k of the weighted throughput of an optimal offline algorithm on this instance. The upper bounds of competitive ratios are achieved by some known online algorithms. A competitive ratio less than the lower bound is not reachable by any online algorithm.

For the bounded-delay model (the buffer size is assumed infinite), an optimal offline algorithm has been proposed in [12], running in $O(n \log n)$ time, where $n$ is the number of packets released. For online algorithms, the best known lower bound on the competitive ratio of deterministic algorithms is $\phi = (1+\sqrt{5})/2 \approx 1.618$ [11], [6], [2]; this lower bound also applies to instances in which the deadlines of the packets (weakly) increase with their release dates.
A simple greedy algorithm that always schedules the maximum-value pending packet in the buffer is 2-competitive [11], [12]. For a variant in which the deadlines of the packets (weakly) increase with their release dates, Li et al. [16] propose an optimal deterministic $\phi$-competitive algorithm. Using the same analysis, but in a more complicated way, Li et al. provide a ($3/\phi \approx 1.854$)-competitive deterministic algorithm [16] for the general model. Independently, Englert and Westermann present a competitive deterministic memoryless algorithm and a ($2\sqrt{2} - 1 \approx 1.828$)-competitive deterministic algorithm [8]. Closing the gap [1.618, 1.828] of the competitive ratio for deterministic algorithms is a difficult open problem. A randomized online algorithm with a competitive ratio of $e/(e-1)$ is proposed in [5]. The lower bound on the competitive ratio of randomized algorithms is 1.25. How to tighten the gap [1.25, 1.582] in the randomized bounded-delay model remains open.

If the buffer size $B$ is a finite number, the generalization of the bounded-delay model is called the bounded-buffer model. In [13], a 3-competitive deterministic algorithm and a competitive randomized algorithm whose ratio is stated in terms of $\phi$ are given. Fung [10] provides a 2-competitive deterministic algorithm and Li presents an alternative proof [15]. When the number of size-bounded buffers is more than 1, Azar and Levy [3] provide a competitive deterministic algorithm and Li [14] improves the competitive ratio.

B. Paper organization

In this paper, we first present the model that we study and the metric that we use to measure online learning algorithms in Section II. Then we introduce our online learning algorithms in Section III. The theoretical analysis of these online learning algorithms is presented in Section IV. Simulation results are provided in Section V. Finally, we conclude the paper in Section VI.

II. THE MODEL

We study the following model for QoS buffer management. Time is discrete. A time step $t$ represents the time interval $(t-1, t]$. Packets arrive over time. Each packet $p$ has a non-negative value $v_p \in \mathbb{R}^+$ and an integer deadline $d_p \in \mathbb{Z}^+$.
The deadline $d_p$ specifies the time by which $p$ should be sent. There is a buffer. Packets already in the buffer can be dropped at any time before they are served. A dropped packet cannot be delivered any more. In each time step, at most one packet from the buffer can be sent; the success of delivering a packet depends on the channel's reliability, which is not known ahead of time. The objective is to maximize weighted throughput, defined as the total value of the packets transmitted by their respective deadlines. This model is called the generalized model; it is called the bounded-delay model if the channel's reliability is always 1.

Essentially, this buffer management is an online decision making problem. In general, people consider worst-case guarantees for online algorithms. Competitive analysis is one of the widely accepted ways of measuring an online algorithm's worst-case performance in theoretical computer science and operations research. However, competitive analysis is too pessimistic, as the adversary is empowered to change the input sequence according to what the online algorithm does over time. Many alternative metrics have been proposed (see [4] and the references therein). In this paper, we consider external regret as the metric for measuring the performance of online (learning) algorithms.

Let $D = \{D_1, D_2, \ldots, D_n\}$ be the set of possible decisions in each time period. Denote the profit/gain incurred at time $t$ from taking decision $D_j$ by $G_t^j$. We assume throughout that profits are bounded; for example, by $n\, v_{\max}$, where $v_{\max}$ is the maximal value of a packet.

Definition 1: Regret [9]. Any scheme (deterministic or randomized) for selecting decisions can be described in terms of the probability, $\mu_t^j$, of choosing decision $D_j$ at time $t$. Consider now a scheme $S$ for selecting decisions. Let $\{\mu_t\}_{t \ge 0}$ be the probability weights implied by the scheme. Then, the expected gain from using $S$, $G(S)$, over $T$ periods will be

$$G(S) = \sum_{t=1}^{T} \sum_{D_j \in D} \mu_t^j G_t^j.$$

We define the regret incurred by $S$ from using decision $D_j$, relative to an alternative decision $D_i$, to be

$$R_T^j(S) = \max\Big\{0, \sum_{t=1}^{T} \mu_t^j \big(G_t^i - G_t^j\big)\Big\},$$

so that the regret is positive exactly when $D_i$ would have earned more than $D_j$ over the periods in which $D_j$ was used. The regret from using $S$ will be $R_T(S) = \sum_{D_j \in D} R_T^j(S)$. The scheme $S$ will have the no internal regret property if its expected regret is small, i.e., $R_T(S) = o(T)$.

III. ONLINE LEARNING ALGORITHMS

In Section III-A, we design an optimal offline algorithm for the generalized bounded-delay model. In Section III-B, we show the limit of online learning for the bounded-delay model, especially for the model in which the adversarial information is not revealed over time. This result can be extended to the generalized model. At last, we present a few online learning algorithms for the bounded-delay model and a few of its variants in Section III-C.

A. An optimal offline algorithm

We consider the offline setting of the generalized bounded-delay model, in which the success ratio of delivering a packet in a time step varies in the range [0, 1]. We call this ratio channel reliability. This model is motivated by applications of wireless communications. The problem is stated as follows: given a set of packets with all their characteristics known in advance, and a sequence of static channel states with known reliability, we are looking for a schedule with the maximum total weight.

Theorem 1: The generalized bounded-delay model in which the channel reliability varies can be solved in pseudo-polynomial time.

Proof: If the buffer size $B$ is infinite, an optimal offline algorithm has been given for the bounded-delay model in [12] with a running time of $O(n \log n)$. If the buffer size $B$ is a constant, an optimal offline algorithm is proposed in [18] with a running time of $O(n^2 \log \max\{B, n\})$.

Now, we consider the generalization in which the channel reliability may not always be constant. The buffer size $B$ is assumed infinite. Given a set of packets $P = \{p_1, \ldots, p_n\}$, each packet $p_i$ is denoted as a 3-tuple $(r_i, d_i, v_i)$. Let $r_{\min} = \min\{r_i \mid i = 1, \ldots, n\}$ and $d_{\max} = \max\{d_i \mid i = 1, \ldots, n\}$. All possible scheduling time slots are in the range $[r_{\min}, d_{\max}]$; they are known as feasible intervals. Consider these $m$ ($:= d_{\max} - r_{\min} + 1$) time slots.
Let the sequence of channel reliability be $\{\eta_1, \ldots, \eta_m\}$. The reduction from the generalized bounded-delay model to the maximum weighted bipartite matching problem works as follows. We first construct a graph $G = (V, E)$ with $V = L \cup R$, where $L \cap R = \emptyset$. Each vertex $a_i \in L$ represents a packet $p_i$ and each vertex $b_j \in R$ represents a time slot in the feasible interval. Obviously, we have $|L| = n$ and $|R| = m$. We create an edge connecting $a_i$ and $b_j$ whenever $r_{p_i} \le b_j \le d_{p_i}$. Each edge has an expected cost defined as $c(e_{ij}) = \eta_j v_{p_i}$, where $\eta_j$ is the channel reliability at the time slot represented by vertex $b_j$ and $v_{p_i}$ is the weight of the packet represented by vertex $a_i$. At this point, our problem can be solved by finding the maximum weighted matching of graph $G$, which results in a maximum expected total weight $E(W)$.

The running complexity of this algorithm includes two parts: the construction part and the part of solving the reduced problem. The construction takes linear time $O(|V| + |E|)$. Calculating a maximum weighted matching takes time $O(|E|^3)$, using the Hungarian algorithm.

B. The power of learning

After introducing the offline solution to the generalized bounded-delay model, we consider online solutions. We note that competitive online algorithms and online learning algorithms can both be applied as online approaches to this model without making any assumptions on the packets' stochastic properties. The main question to be answered is what power/limit an adversarial input sequence may have. That is, does there exist a finite input instance on which no online learning algorithm can achieve a constant external regret?

Theorem 2: Consider a finite but large input sequence. There is no online learning algorithm that can have a constant external regret.

Proof: In order to show that, for a given finite input sequence, no online learning algorithm has a constant external regret, we need to show that, compared with the best algorithm in hindsight, any online learning algorithm has a constant competitive ratio greater than 1.
We consider a specific case in which all packets' deadlines are weakly increasing along their release times. We call such instances agreeable deadline instances. This example is modified from the one given in [6].

We first show that for any online policy $\pi$, its competitive ratio satisfies $c \ge \phi - \epsilon$, for any small $\epsilon > 0$. At each time step $i$, two packets $p_i$ and $p'_i$ with span (the difference between deadline and release time) 1 and 2 respectively are released, with values $v_i$ and $v_{i+1}$ respectively. Assume a deterministic online algorithm runs policy $\pi$, and its competitive ratio is $c$. Initially, we set $\tau = \sqrt{5} - 2$, $v_0 = 1$, $v_1 = \phi + \epsilon$, and $v_{i+1} = (v_i - v_{i-1})/\tau$. (Explicitly, we have $v_i = (1 - \epsilon)\phi^i + \epsilon(\phi + 1)^i$, $i \ge 0$.)

1. Let $k$ be a sufficiently large number. If there exists $0 \le j < k$ such that $\pi$ selects $p'_j$ in step $j$ (i.e., time interval $[j, j+1]$), its adversary stops releasing packets after $j$. That is, $\pi$ does not select $p_j$ to send in step $j$, and $\forall i < j$, $\pi$ selects $p_i$ to send in step $i$. On the contrary, $\pi$'s adversary delivers $p'_i$ in step $i$, $\forall i < j$, $p_j$ in step $j$, and $p'_j$ in step $j + 1$. Note that $\lim_{k \to \infty} v_k / v_{k-1} = \phi + 1$ and, summing the recurrence $\tau v_{i+1} = v_i - v_{i-1}$ over $i = 1, \ldots, n-1$, $\sum_{i=2}^{n} v_i = (v_{n-1} - v_0)/\tau$. The competitive ratio is:

$$c \ge \frac{(v_1 + v_2 + \cdots + v_{j+1}) + v_j}{(v_0 + v_1 + \cdots + v_{j+1}) - v_j} > 1 + \frac{2\tau}{1-\tau} - \Big(\frac{2\tau}{1-\tau}\Big)^2 \epsilon > \phi - \epsilon.$$

2. Otherwise, the adversary releases all packets $p_i$ and $p'_i$ up to step $k - 1$, and at time $k$ only $p_k$ is released. $\forall i < k$, $\pi$ selects $p_i$ to send in step $i$. $\pi$'s adversary delivers all packets $p'_i$ in step $i$ up to step $k$, where $p_k$ is sent. Assume $k$ is large.

$$c \ge \frac{v_1 + \cdots + v_k + v_k}{v_0 + v_1 + \cdots + v_k} = 1 + \frac{v_k - v_0}{v_0 + v_1 + \cdots + v_k} \xrightarrow{k \to \infty} 1 + \frac{(1 + \phi)\, v_{k-1}}{v_{k-1}/\tau} = 1 + (\phi + 1)\tau = \phi,$$

where the limit uses $v_k / v_{k-1} \to \phi + 1$ and $\sum_{i=2}^{k} v_i = (v_{k-1} - v_0)/\tau$.

Thus, no algorithm can achieve a competitive ratio better than $\phi > 1$. In turn, no online learning algorithm has a constant expected regret during the course of the whole schedule.

The limit of online learning algorithms is due to the fact that a partial input sequence cannot reveal more information on the optimality of each static algorithm until the end of the input sequence.

C. Online learning algorithms

We design online learning algorithms for both the bounded-delay model and its generalization, as well as a few of its variants. We name a static algorithm an expert. Based on Occam's razor (a principle stating that "the simplest explanation is usually the correct one"), we prefer simpler explanations in general. Consider the model that we study. Packet values, packet deadlines, and channel reliability are the three factors that we need to consider in the online decision making procedure. Thus, the static algorithms should be simple functions of packet values, packet deadlines, and channel reliability only. Let $v_{\min}$ and $v_{\max}$ denote the minimum value and the maximum value of a packet in the input sequence.

First, we design a few experts (static algorithms with hindsight). Then we design online learning algorithms based on the observed performance of these experts. Note that the number of experts cannot be large, since this number determines the running time of the online learning algorithm in each time step. Thus, we apply the well-known geometric rounding technique. Consider all distinct packet values $v_{\min} = v_1 < v_2 < \cdots < v_n = v_{\max}$. We let $v_{\max} = (1+\delta)^k v_{\min}$ for some $k$. Then each value $v_j$ falls in a range $[v_{\min}(1+\delta)^{i-1}, v_{\min}(1+\delta)^i)$. Thus, we have $\lceil \log_{1+\delta} (v_{\max}/v_{\min}) \rceil$ distinct intervals. Let $M = \lceil \log_{1+\delta} (v_{\max}/v_{\min}) \rceil$.

We first introduce how these $M$ experts work. Then we present our online learning algorithm under various scenarios respectively. We have two phases of delivering a packet for each expert.
In the first phase, the reliability of the channel is predicted. In the second phase, the expert chooses one pending packet to send. In selecting a packet to send, packet values play the key role and we have two (families of) strategies.

1) A strategy based on absolute values: Sort all pending packets in increasing order of deadlines, with ties broken in favor of the one with a larger value. Choose the first packet with a value $\ge$ the expert's predefined threshold value. If such a packet is not available in its buffer, then this expert sends nothing. This algorithm is described in Algorithm 1.

Algorithm 1 EBAV($v_{\min}$, $v_{\max}$, $\delta$, $j \in [1, M]$)
  Sort packets in the buffer in canonical order; i.e., in increasing order of deadlines with ties broken in favor of larger values.
  Send the first packet $p$ satisfying $v_p \ge v_{\max}/(1+\delta)^j$. If $p$ does not exist in the buffer, then send nothing.

Each expert only admits the packets that it would send eventually. Thus, the expert EBAV with a parameter $j$ only accepts packets with values $\ge v_{\max}/(1+\delta)^j$. If the buffer size is limited, when overflow happens, we apply the greedy approach to filter out the packets that cannot be accommodated. That is, we drop the minimum-value packets when packet overflow happens, with ties broken arbitrarily.

2) A strategy based on relative values: Sort all pending packets in increasing order of deadlines, with ties broken in favor of the one with a larger value. Choose the first packet with a value $\ge$ the expert's predefined threshold ratio of the maximum-value pending packet in the buffer, with ties broken in favor of the earlier released packet. We note that such a packet is always in the buffer.

Algorithm 2 EBRV($v_{\min}$, $v_{\max}$, $\delta$, $j \in [1, M]$)
  Sort packets in the buffer in canonical order; i.e., in increasing order of deadlines with ties broken in favor of larger values.
  Let $h$ be the maximum-value packet in the buffer.
  Send the first packet $p$ satisfying $v_p \ge v_h/(1+\delta)^j$.

The packet $p$ can always be found, since either the first packet (in canonical order) or the packet $h$ is a candidate.
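The two expert families above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the `Packet` record, the function names, and the three-packet buffer in the usage note are hypothetical, and the channel-prediction phase and admission control are left out. The parameter values $\delta = 0.5$, $v_{\max} = 7$, $v_{\min} = 2$ are the ones used in the paper's worked example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical record type; the paper only specifies a value and a deadline.
    name: str
    deadline: int
    value: float


def num_experts(v_min, v_max, delta):
    """M = ceil(log_{1+delta}(v_max / v_min)), the geometric-rounding expert count."""
    return math.ceil(math.log(v_max / v_min) / math.log(1.0 + delta))


def canonical_order(buffer):
    """Increasing deadline, ties broken in favor of the larger value."""
    return sorted(buffer, key=lambda p: (p.deadline, -p.value))


def ebav_pick(buffer, v_max, delta, j):
    """EBAV expert j: first packet in canonical order with value >= v_max/(1+delta)^j."""
    threshold = v_max / (1.0 + delta) ** j
    for p in canonical_order(buffer):
        if p.value >= threshold:
            return p
    return None  # no packet clears the absolute threshold: send nothing


def ebrv_pick(buffer, delta, j):
    """EBRV expert j: first packet with value >= v_h/(1+delta)^j, v_h = max pending value."""
    if not buffer:
        return None
    threshold = max(p.value for p in buffer) / (1.0 + delta) ** j
    for p in canonical_order(buffer):
        if p.value >= threshold:
            return p  # always reached: the maximum-value packet itself qualifies
    return None
```

As a usage note, with a hypothetical buffer `[Packet("a", 1, 2.0), Packet("b", 2, 7.0), Packet("c", 2, 4.0)]`, the EBAV expert with `j = 1` skips the earlier-deadline packet `a` because its value is below the threshold 7/1.5, while the expert with `j = 4` sends `a`. Sorting on every call keeps the sketch short; a priority queue would be the natural choice if per-step cost mattered.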
In admitting packets, we apply the approach of identifying the optimal provisional schedule, which is similar to the one defined and used in competitive online algorithms. Given a set of pending packets $P$, a provisional schedule $S$ specifies which packets in $P$ should be sent in which time step. An optimal provisional schedule $S^*$ is one that achieves the maximum weighted throughput among all provisional schedules on pending packets $P$ (channel reliability is assumed to be 1). Clearly, an optimal provisional schedule $S^*$ at time $t$ can be calculated via a maximum-weighted bipartite matching over pending packets in $O(|P|^2)$ (see [13]).

3) Assume the channel reliability is always perfect: This is the typical bounded-delay model. Here we introduce two online learning algorithms based on the same strategy: follow the best expert. One simply follows the strategy of the expert who has the best gain up to the current time. The other follows the strategy of the expert who has the best gain after delivering all the pending packets in its buffer successfully. We call these two online learning algorithms Follow COPT and Follow OPT, respectively.

4) Assume the channel reliability varies over time: When the channel reliability is not a fixed value over time, we need to consider the variability of the channel quality as well. Thus, we employ a set of experts in predicting the channel's reliability over time. In estimating the channel states, we introduce two experts. (We can adapt our algorithm to multiple experts.) Each expert insists on giving a fixed prediction of the future channel states, which is either H (representing "high") or L (representing "low"). These two experts are named EH and EL respectively. We then use the weighted majority algorithm [20], [21] in predicting the status of the channel reliability.

We associate credits with these experts, which indicate to what extent an expert's opinion can be trusted. Let the credit for expert EH at time $t$ be $c_t^{EH}$ and for expert EL be $c_t^{EL}$. Initially, we set $c_0^{EH} = c_0^{EL} = 1$. In each time step $1, 2, \ldots, t-1$, we have a label indicating whether the channel reliability is H or L. Then we proceed as in the following Winnow algorithm.

Algorithm 3 Winnow(EH, EL)
  Initially, we set $c_0^{EH} = c_0^{EL} = 1$.
  if $c_t^{EH} \ge c_t^{EL}$ then
    Predict H.
  else
    Predict L.
  end if
  if the channel's quality was high then
    $c_t^{EH} = 2c_t^{EH}$;
    $c_t^{EL} = c_t^{EL}/2$;
  else
    $c_t^{EH} = c_t^{EH}/2$;
    $c_t^{EL} = 2c_t^{EL}$.
  end if

However, in admitting packets, we are not only predicting the next step's channel reliability; we also need to estimate the future channel status when we calculate an optimal provisional schedule. Hence, we need to study the chain effect of the prediction. Let $E(\cdot)$ be the predicted value using the Winnow algorithm.

Lemma 1: If $E(c_t^{EH}) \ge E(c_t^{EL})$ (respectively, $E(c_t^{EH}) < E(c_t^{EL})$) holds at time slot $t$, then $E(c_{t+1}^{EH}) \ge E(c_{t+1}^{EL})$ (respectively, $E(c_{t+1}^{EH}) < E(c_{t+1}^{EL})$) holds at time $t + 1$.

Proof: Let $\eta_t$ denote the predicted reliability at time $t$. We have

$$\eta_t = \frac{E(c_t^{EH})}{E(c_t^{EH}) + E(c_t^{EL})}.$$

Based on the Winnow algorithm, we have

$$E(c_t^{EH}) = \eta_{t-1} E(c_{t-1}^{EH}) + (1 - \eta_{t-1}) \frac{E(c_{t-1}^{EH})}{2}, \qquad (1)$$

$$E(c_t^{EL}) = \eta_{t-1} \frac{E(c_{t-1}^{EL})}{2} + (1 - \eta_{t-1}) E(c_{t-1}^{EL}). \qquad (2)$$

Given the assumption that $E(c_t^{EH}) \ge E(c_t^{EL})$, we have $\eta_t \ge 1/2$. Then from Equations 1 and 2, we have

$$E(c_{t+1}^{EH}) = (1 + \eta_t)\frac{1}{2} E(c_t^{EH}) \ge \frac{3}{4} E(c_t^{EH}),$$

$$E(c_{t+1}^{EL}) = \Big(1 - \frac{1}{2}\eta_t\Big) E(c_t^{EL}) \le \frac{3}{4} E(c_t^{EL}).$$

Hence $E(c_{t+1}^{EH}) \ge E(c_{t+1}^{EL})$; the other case is symmetric. Lemma 1 is completed.
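A minimal Python sketch of the two-expert Winnow scheme and of the expected-credit recursion may help in following Lemma 1. It is an illustration under stated assumptions rather than the authors' code: the function names and the label sequences are hypothetical, and `expected_step` assumes the expected-credit updates have the form of Equations (1) and (2), i.e. $E(c_{t+1}^{EH}) = \frac{1+\eta_t}{2} E(c_t^{EH})$ and $E(c_{t+1}^{EL}) = (1 - \frac{\eta_t}{2}) E(c_t^{EL})$.

```python
def winnow_predict(c_eh, c_el):
    """Predict 'H' when expert EH holds at least as much credit as EL, else 'L'."""
    return "H" if c_eh >= c_el else "L"


def winnow_update(c_eh, c_el, channel_was_high):
    """Double the credit of the expert that was right and halve the other one."""
    if channel_was_high:
        return 2.0 * c_eh, c_el / 2.0
    return c_eh / 2.0, 2.0 * c_el


def run_winnow(labels):
    """Run Winnow over observed labels ('H'/'L'); return predictions and final credits."""
    c_eh, c_el = 1.0, 1.0  # initial credits c_0^EH = c_0^EL = 1
    predictions = []
    for label in labels:
        predictions.append(winnow_predict(c_eh, c_el))
        c_eh, c_el = winnow_update(c_eh, c_el, label == "H")
    return predictions, (c_eh, c_el)


def expected_step(e_eh, e_el):
    """One step of the expected-credit recursion, assuming the form of Eqs. (1)-(2)."""
    eta = e_eh / (e_eh + e_el)
    new_eh = eta * e_eh + (1.0 - eta) * e_eh / 2.0
    new_el = eta * e_el / 2.0 + (1.0 - eta) * e_el
    return new_eh, new_el


def eta_trajectory(e_eh, e_el, steps):
    """Predicted reliabilities eta_0, ..., eta_steps along the recursion."""
    etas = [e_eh / (e_eh + e_el)]
    for _ in range(steps):
        e_eh, e_el = expected_step(e_eh, e_el)
        etas.append(e_eh / (e_eh + e_el))
    return etas
```

For the hypothetical label sequence `["L", "L", "H", "H", "H"]`, `run_winnow` predicts H, L, L, L, H and ends with credits (2.0, 0.5). Starting `eta_trajectory` from expected credits where EH is ahead, for example (1.2, 1.0), produces a non-decreasing sequence of predicted reliabilities, which is the behaviour Lemma 1 describes.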
Theorem 3 (Monotonicity of $\eta$): If the initial expected credits of the experts satisfy $E(c_0^{EH}) \ge E(c_0^{EL})$ (respectively, $E(c_0^{EH}) < E(c_0^{EL})$), the predicted reliability of the channel $\eta_t$ is non-decreasing (respectively, non-increasing) in $t$.

Proof: Theorem 3 can be proved by induction. Based on the definition of $\eta_t$ and $E(\cdot)$, we have

$$\eta_t = \frac{E(c_t^{EH})}{E(c_t^{EH}) + E(c_t^{EL})},$$

$$\eta_{t+1} = \frac{E(c_{t+1}^{EH})}{E(c_{t+1}^{EH}) + E(c_{t+1}^{EL})} = \frac{(1 + \eta_t)\frac{1}{2} E(c_t^{EH})}{(1 + \eta_t)\frac{1}{2} E(c_t^{EH}) + \big(1 - \frac{1}{2}\eta_t\big) E(c_t^{EL})} = \frac{(1 + \eta_t) E(c_t^{EH})}{(1 + \eta_t) E(c_t^{EH}) + (2 - \eta_t) E(c_t^{EL})}.$$

Therefore, we have

$$\frac{\eta_{t+1}}{\eta_t} = \frac{(1 + \eta_t)\big(E(c_t^{EH}) + E(c_t^{EL})\big)}{(1 + \eta_t) E(c_t^{EH}) + (2 - \eta_t) E(c_t^{EL})} = \frac{(1 + \eta_t) E(c_t^{EH}) + (1 + \eta_t) E(c_t^{EL})}{(1 + \eta_t) E(c_t^{EH}) + (2 - \eta_t) E(c_t^{EL})}.$$

Assume that we have the initial case of $E(c_0^{EH}) \ge E(c_0^{EL})$; then, based on Lemma 1, we have $E(c_t^{EH}) \ge E(c_t^{EL})$. That is, $1/2 \le \eta_t \le 1$. So $(2 - \eta_t)/2 \le 3/4 \le (1 + \eta_t)/2$, which indicates $(1 + \eta_t) E(c_t^{EL}) \ge (2 - \eta_t) E(c_t^{EL})$ (with $E(c_t^{EL}) > 0$). Thus, it is obvious that $\eta_{t+1}/\eta_t \ge 1$. Theorem 3 is completed.

Theorem 3 indicates that the predicted reliability of the channel at a certain time depends strongly on the result predicted in the previous step. Thus, the predicted status can be rolled over along the time in which we send packets. We have the following algorithm.

Algorithm 4 PCR(EH, EL)
  Apply the Winnow algorithm to predict the channel's reliability.
  if $E(c_t^{EH}) \ge E(c_t^{EL})$ then
    Assign all predicted channel status High.
  else
    Assign all predicted channel status Low.
  end if

After we identify the channel status over time, we can apply the bipartite matching to find the optimal provisional schedule and arrange packets in canonical order. In summary, if $E(c_t^{EH}) \ge E(c_t^{EL})$, we assign all the pending packets to the latest time slots in which they can feasibly be sent by their deadlines. If $E(c_t^{EH}) < E(c_t^{EL})$, these pending packets are assigned to the earliest time slots for delivery.

5) There are only two experts: When there are only two experts in predicting the channel's reliability, the online learning algorithm makes its decision based on these two experts, and we consider this variant for analysis in Section IV.

D. An example

To better illustrate how our online learning approach works and how it performs against the optimal offline algorithm and the best known competitive online algorithms, we give the following example. We assume the channel reliability is always 1. There are 4 packets $p_1$, $p_2$, $p_3$ and $p_4$. Their release times, deadlines, and values are listed in Table I. Remember that packets are unit-length.

TABLE I
PACKETS INFORMATION
packet    release time    deadline    value
$p_1$
$p_2$
$p_3$
$p_4$

We compare the following three kinds of algorithms on this same input.

1) The optimal offline algorithm: It is obvious that the offline algorithm can send all the packets before their respective deadlines and obtain the maximum total weighted throughput. The packets are sent in the order $p_1$, $p_2$, $p_3$, $p_4$, and this yields a total value of 18.

2) The competitive online algorithm: Here, we consider algorithms SEMI-GREEDY [17] and EDF$_\beta$ [17]. For this example, we let the algorithm SEMI-GREEDY work under a perfect channel state such that it sends the maximum-value packet in each time step. It sends $p_2$ at time 0 ($p_1$ expires) and $p_4$ at time 2 ($p_3$ expires). The total value SEMI-GREEDY gains is 12. For the algorithm EDF$_\beta$ ($\beta = 2$), we send the earliest-deadline packet whose value is at least $1/\beta$ times the maximum value of a pending packet in the buffer. Packet $p_2$ will be sent at time 0 since $v_2 > \beta v_1 = 2v_1$. At time 2, EDF$_\beta$ chooses $p_3$ to send since $\beta v_3 = 2v_3 > v_4$. The packet $p_4$ will be sent afterwards. Therefore, EDF$_\beta$ ($\beta = 2$) gains a total value of 16.

3) The online learning algorithm: We use the algorithm EBAV as an example.
The algorithm EBRV works in a similar way and we omit it here. We choose $\delta = 0.5$ as the input parameter of the algorithm. We also have $v_{\max} = 7$ and $v_{\min} = 2$. So the number of experts is $M = \lceil \log_{1+\delta} (v_{\max}/v_{\min}) \rceil = 4$. Following Algorithm 1, the schedule list of each expert is shown in Table II. The online learning algorithm EBAV runs the following strategy: follow the expert who has the best gain up to the current time.

TABLE II
THE SCHEDULE LISTS OF EXPERTS
expert      packets sent over $t = 0, 1, 2, 3$ (in order)    value
expert 1    $p_2$, $p_4$                                      12
expert 2    $p_2$, $p_3$, $p_4$                               16
expert 3    $p_2$, $p_3$, $p_4$                               16
expert 4    $p_1$, $p_2$, $p_3$, $p_4$                        18

Initially, the online learning algorithm simply uses EDF as its strategy since no expert exists to follow yet. Therefore, the online learning algorithm schedules $p_1$ at time $t = 0$. After all experts finish scheduling their first packets, experts 1, 2 and 3 are the best for step 1. Assume the online algorithm follows expert 1 at this step; then it will schedule $p_2$ at $t = 1$ since $v_2 > v_{\max}/(1+\delta)$. At the end of step 2, expert 4 has the best gain over the time steps so far. The online learning algorithm switches to expert 4's strategy, which sends $p_3$ at time $t = 2$. The online algorithm follows expert 4 for time $t = 2$ and $t = 3$. The total value that it gains is 18.

Table III summarizes the weighted throughput of these 4 different kinds of algorithms on the above example. The online learning algorithm achieves the optimal gain that can be achieved by an offline algorithm.

TABLE III
PERFORMANCE OF ALGORITHMS
Algorithm                   Schedule                      Value
offline                     $p_1$, $p_2$, $p_3$, $p_4$    18
SEMI-GREEDY                 $p_2$, $p_4$                  12
EDF$_\beta$ ($\beta = 2$)   $p_2$, $p_3$, $p_4$           16
EBAV                        $p_1$, $p_2$, $p_3$, $p_4$    18

IV. ANALYSIS OF THE ONLINE LEARNING ALGORITHMS

We apply the local optimization technique to analyze the online learning algorithms that we develop. We first consider an instance with only two values for packets. Then we show that the regret of the online learning algorithm is bounded by a ratio of these two values. Finally, we claim that this ratio can be generalized to the case in which packets can have multiple values.
Assume that there are only two kinds of packets, denoted as $p_s$ and $p_b$, where $v_s = 1$ and $v_b = \alpha > 1$. In this case, it is obvious that only two experts are needed. Expert 1 is denoted as EA and it will schedule the earliest-deadline packet whose weight is $\ge v_s$. If both $p_s$ and $p_b$ exist and they share the same deadline, expert 1 will schedule $p_b$. Expert 2 is denoted as EB and it will schedule the earliest-deadline packet whose weight is $\ge \alpha$. If only $p_s$ remains in the current buffer, expert 2 will schedule $p_s$.

Let the best expert among EA and EB be EXP. Let the total gains of the best expert and of the online learning algorithm be $G_{EXP}$ and $G_{ON}$ respectively. Then the average regret $R$ per time step is expressed as

$$R = \frac{G_{EXP} - G_{ON}}{T}, \qquad (3)$$

where $T$ is the overall time spent in scheduling all the packets for the online learning algorithm.

We study the regret $R$ case by case. Given a time step $t$, if the online algorithm has exactly the same behavior (schedules the same packet) as the best expert in hindsight, then the average regret will be zero. Otherwise, there exists a time step $t$ such that the packet scheduled by the online learning algorithm is different from the one scheduled by the best expert.

Assume the best expert EXP sends a packet $p_s$ and the online learning algorithm ON schedules a packet $p_b$, and assume EXP outperforms ON.

Remark 1: The best expert must send a packet $p_b$ in the future, i.e., after the time $t$ at which the online learning algorithm sends the packet $p_b$.

Proof: We prove Remark 1 by contradiction. Assume that the claim fails, i.e., at no time $t' > t$ does EXP send a packet $p_b$. Then, in any time step $t' > t$, the best expert sends a packet with a weight less than or equal to that sent by the online learning algorithm, since no packet with value $v_b$ is sent by EXP. Since $t$ is the first time step at which the online learning algorithm differs from the best expert, the best expert will have a strictly smaller gain than the online learning algorithm. This contradicts the assumption that the best expert outperforms the online learning algorithm.

In this case, we charge EXP the value $1 + \alpha$ and charge ON the value $\alpha$. We have an internal regret of value 1 and the ratio of gains is bounded by $(1 + \alpha)/\alpha$.

Assume the best expert EXP sends a packet $p_b$ and the online learning algorithm ON schedules a packet $p_s$. In this case, the packet $p_b$ is either sent by ON already or it is pending in ON's buffer. If ON has sent this packet in a previous time step $t'$, we can charge EXP value $\alpha$ to $t'$. In the current step $t$, we charge EXP and ON values $\alpha$ and 1 respectively. Overall, the ratio of the charged values in $t$ and $t'$ is bounded by $\frac{\alpha + \alpha}{1 + \alpha}$. If ON schedules the packet $p_b$ in a future time step $t'$, this may incur a loss of value $\alpha$ due to the packet deadline constraint. We charge EXP and ON values $\alpha$ and $\alpha$ in $t'$ respectively. Overall, the ratio of the charged values is bounded by $\frac{\alpha + \alpha}{1 + \alpha}$.
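As a small worked check (an added illustration, not part of the original analysis), both charging ratios obtained above stay below 2 for every $\alpha > 1$:

```latex
\[
\frac{1+\alpha}{\alpha} \;=\; 1 + \frac{1}{\alpha} \;<\; 2,
\qquad
\frac{\alpha+\alpha}{1+\alpha} \;=\; 2 - \frac{2}{1+\alpha} \;<\; 2
\qquad (\alpha > 1).
\]
```

For instance, taking $\alpha = 2$ as an arbitrary example gives ratios $3/2$ and $4/3$ respectively. The first ratio decreases in $\alpha$ while the second increases, so neither case dominates the other for all $\alpha$.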
This analysis approach can be applied to multiple values rather than only two. We skip the details of expanding this analysis. We give a tight example for the online learning algorithm we present.

Example 1: We have two kinds of packets with values 1 and $\alpha$ respectively. Initially, $B$ packets with value 1 arrive and their deadlines are $1, 2, \ldots, B$; at the same time, $B$ packets with value $\alpha$ arrive and their deadlines are $B + 1, B + 2, \ldots, 2B$. For the online learning algorithm EBRV (or EBAV), if it chooses the packets with value 1 to send (of course, it plans to send the packets with value $\alpha$ in the future), we generate an input of $B$ packets with value $\alpha$ at time $B + 1$, and these packets have deadlines of $2B$. If the online learning algorithm chooses packets with value $\alpha$ to send in steps $1, 2, \ldots, B$, we stop releasing new packets. In either case, the average regret is $\frac{\alpha - 1}{2}$ per time step.

V. SIMULATION

We develop a prototype to simulate the performance of online learning algorithms. A packet generator generates packets with release times, deadlines and values. In our setting, we assume all these numbers are generated uniformly at random. In total, 2000 packets are generated. The range of the release times is [0, 999]. The range of the deadlines is [0, 999] and the range of the packet values is [0, 99]. The minimum packet value is $v_{\min} = 0.009$. For the current version of our simulation, the channel state is fixed.

We design the offline algorithm as follows. We assume the channel reliability is always perfect. We implement EBAV and EBRV from Section III-C. In geometric rounding, we set $\delta = 0.5$. In total, there are 23 experts. In our simulated results, the best expert is the second expert, and it sends 1000 packets. In each time step, the best expert chooses a packet with a value $\ge v_{\max}/[(1 + \delta)^2 v_{\min}]$ to send. The worst expert schedules only 1 packet, the most valuable packet.
The expert setting the threshold as the minimum-value packet yields a total gain of [...], and 1000 packets are scheduled successfully. The online learning algorithm, which follows the best expert so far, gains a profit of [...] with 999 packets scheduled. Thus, the regret of the online learning algorithm is bounded by a value of 52 (< [...]), and the algorithm is almost as good as the optimal expert. Figure 2 shows the simulated results. In Figure 2, the dash-dotted line represents the performance of our online learning algorithm. In this figure, we take the experts with index $j$ of 0, 1, 2, 5, 20 among all 23 experts.

Fig. 2. Simulated results of the online learning algorithms

We also develop the prototype to simulate the performance of the optimal offline algorithm as well as two competitive
online algorithms: SEMI-GREEDY [17] and EDF$_\beta$ [17]. The simulation is run on the same data set used above. In our simulated results, the optimal offline algorithm yields a total gain of [...]. It sends 1000 packets. The algorithm SEMI-GREEDY yields a near-optimal result of [...]. EDF$_\beta$ earns a total gain of [...], where $\beta = 2$. We notice that the performance of both online algorithms is very close to the optimal offline solution. Figure 3 shows the simulated results. In Figure 3, the dotted line represents the performance of our optimal offline algorithm, the dash-dotted line represents the performance of SEMI-GREEDY, and the solid line represents the performance of EDF$_\beta$.

Fig. 3. Simulated results of the optimal offline algorithm and online algorithms

In summary, both the online learning algorithms and the competitive online algorithms perform nearly as well as the optimal offline algorithm. Our online learning algorithm has a much lower running complexity ($O(\log n)$ for each arriving packet) than the competitive online algorithms ($O(n)$ for each arriving packet).

VI. CONCLUSION

In this paper, we design algorithms to schedule packets with values and deadlines. This model has been studied extensively using competitive online algorithms. We apply online learning approaches to this model as well as a few of its variants. The performance of these online learning algorithms is analyzed theoretically in terms of external regret. We also measure their performance experimentally. We conclude that no online learning algorithm has a constant regret. However, in general, our online learning algorithm works almost as well as the best known competitive online algorithm in practice. In our future work, we will study semi-online algorithms, assuming part of the input information is known.

ACKNOWLEDGEMENT

The work is partially supported by NSF under CCF and CNS grants, and by AFOSR.

REFERENCES

[1] W. Aiello, Y. Mansour, S. Rajagopalan, and A. Rosen. Competitive queue policies for differentiated services. In Proceedings of the 19th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM).
[2] N. Andelman, Y. Mansour, and A. Zhu. Competitive queuing policies for QoS switches. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA).
[3] Y. Azar and N. Levy. Multiplexing packets with arbitrary deadlines in bounded buffers. In Lecture Notes in Computer Science (SWAT), pages 5-16.
[4] A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press.
[5] F. Y. L. Chin, M. Chrobak, S. P. Y. Fung, W. Jawor, J. Sgall, and T. Tichy. Online competitive algorithms for maximizing weighted throughput of unit jobs. Journal of Discrete Algorithms, 4.
[6] F. Y. L. Chin and S. P. Y. Fung. Online scheduling with partial job values: Does timesharing or randomization help? Algorithmica, 37(3).
[7] C. Dovrolis, D. Stiliadis, and P. Ramanathan. Proportional differentiated services: Delay differentiation and packet scheduling. IEEE/ACM Transactions on Networking, 10(1):12-26.
[8] M. Englert and M. Westermann. Considering suppressed packets improves buffer management in QoS switches. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA).
[9] D. P. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7-35.
[10] S. P. Y. Fung. Bounded delay packet scheduling in a bounded buffer. arXiv [cs.DS], July.
[11] B. Hajek. On the competitiveness of online scheduling of unit-length packets with hard deadlines in slotted time. In Proceedings of the 2001 Conference on Information Sciences and Systems (CISS).
[12] A. Kesselman, Z. Lotker, Y. Mansour, B. Patt-Shamir, B. Schieber, and M. Sviridenko. Buffer overflow management in QoS switches. SIAM Journal on Computing (SICOMP), 33(3).
[13] F. Li. Competitive scheduling of packets with hard deadlines in a finite capacity queue. In Proceedings of the 28th IEEE International Conference on Computer Communications (INFOCOM).
[14] F. Li. Improved online algorithms for multiplexing weighted packets in bounded buffers. Lecture Notes in Computer Science (AAIM), 5564.
[15] F. Li. Packet scheduling in a size-bounded buffer. arXiv [cs.DS], July.
[16] F. Li, J. Sethuraman, and C. Stein. An optimal online algorithm for packet scheduling with agreeable deadlines. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA).
[17] F. Li and Z. Zhang. Online maximizing weighted throughput in a fading channel. In Proceedings of the 2009 IEEE International Symposium on Information Theory (ISIT).
[18] F. Li and Z. Zhang. Scheduling weighted packets with deadlines over a fading channel. In Proceedings of the 43rd Annual IEEE Conference on Information Sciences and Systems (CISS).
[19] J. Liebeherr and N. Christin. A QoS architecture for quantitative service differentiation. IEEE Communication Magazine, 41(6):38-45.
[20] N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2.
[21] N. Littlestone and M. Warmuth. The weighted majority algorithm. Information and Computation, 108(2).
[22] Y. Mansour, B. Patt-Shamir, and O. Lapid. Optimal smoothing schedules for real-time streams. In Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing (PODC), pages 21-29.
[23] V. Paxson and S. Floyd. Wide-area traffic: The failure of Poisson modeling. IEEE/ACM Transactions on Networking, 1995.
[24] W. Willinger and V. Paxson. Where mathematics meets the Internet. Notices of the American Mathematical Society.
