Stable Scheduling Policies for Maximizing Throughput in Generalized Constrained Queueing Systems

Prasanna Chaporkar, Student Member, IEEE, Saswati Sarkar, Member, IEEE

Abstract: We consider a class of queueing networks referred to as "generalized constrained queueing networks", which form the basis of several different communication networks and information systems. These networks consist of a collection of queues such that only certain sets of queues can be concurrently served. Whenever a queue is served, the system receives a certain reward. Different rewards are obtained for serving different queues, and furthermore, the reward obtained for serving a queue depends on the set of concurrently served queues. We demonstrate that the dependence of the rewards on the schedules alters fundamental relations between performance metrics like throughput and stability. Specifically, maximizing the throughput is no longer equivalent to maximizing the stability region; we therefore need to maximize one subject to certain constraints on the other. Since stability is critical for bounding packet delays and buffer overflow, we focus on maximizing the throughput subject to stabilizing the system. We design provably optimal scheduling strategies that attain this goal by scheduling the queues for service based on the queue lengths and the rewards provided by different selections. The proposed scheduling strategies are however computationally complex. We subsequently develop techniques to reduce the complexity and yet attain the same throughput and stability region. We demonstrate that our framework is general enough to accommodate random rewards and random scheduling constraints.

Index Terms: Constrained queueing networks, stability, throughput, optimization, wireless, multicast, randomized algorithms

I. INTRODUCTION

Constrained queueing networks have been extensively used to model several systems of practical interest including wireless networks [35], [34], [25], [27], input queued switches [23] and database systems [34].
A constrained queueing network is a collection of queues such that only certain sets of queues can be concurrently served; these schedulable sets depend on the underlying system. Whenever a queue is served, the system receives a certain reward. In such systems, queues need to be selected for service such that (i) the total reward earned by the system per unit time (throughput) is maximized and (ii) each queue is served often enough such that the mean queue length in each queue is bounded (system stability). The two goals turn out to be equivalent if the service of each queue (i.e., the transmission of each packet) fetches the same reward. The performances of such networks are now reasonably well understood owing to several seminal contributions [1], [3], [4], [6], [24], [25], [26], [35]. We now investigate constrained queueing networks where different rewards are obtained for transmitting packets from different queues, and furthermore, the reward obtained for serving a queue depends on the set of concurrently served queues. Such generalized constrained queueing networks form the basis of several communication and information systems of practical interest, but have not received adequate attention in the research community. We first provide examples of such systems, and subsequently demonstrate that new resource allocation goals and techniques are required for capturing the trade-off between different performance metrics in these systems.

First, consider one-to-many communications in wireless networks. Here, a sender may wish to transmit its packets to multiple receivers in its communication range. Due to the broadcast property of the wireless transmission, a single transmission may reach all these receivers. Here, each sender constitutes a queue, and the reward attained by a transmission is the number of receivers who successfully receive it. Since different multicast groups have different numbers of receivers, the reward attained by serving different queues will be different. Furthermore, whether a receiver can successfully decode a transmission depends on other transmissions in its neighborhood. Thus, the reward associated with each transmission depends on the set of queues served concurrently. For example, in Figure 1, when $S_2$ is transmitting to $R_6$, $R_1$ and $R_2$ cannot receive a transmission from $S_1$ as both the transmissions will collide at these receivers. Hence, $S_1$ receives a reward of 5 when $S_1$ alone is served, and it receives a reward of 3 when $S_1$ and $S_2$ are served together. Thus, the reward for $S_1$ depends on the set of queues served.

Fig. 1. An example demonstrating the application of generalized constrained queueing networks to one-to-many communication in wireless networks. There are two senders $S_1$, $S_2$ and 6 receivers $R_1$ to $R_6$. The dashed circles indicate the communication ranges of the senders. A single transmission from $S_1$ can reach all its receivers, $R_1,\ldots,R_5$. Here, $R_6$ is $S_2$'s receiver. Each sender corresponds to a queue. Here, $L = \{\vec{l}_1 = [0\ 0],\ \vec{l}_2 = [1\ 0],\ \vec{l}_3 = [0\ 1],\ \vec{l}_4 = [1\ 1]\}$, and $r_k(\vec{l}_1) = 0$ for $k \in \{1,2\}$, $r_1(\vec{l}_2) = 5$, $r_2(\vec{l}_2) = 0$, $r_1(\vec{l}_3) = 0$, $r_2(\vec{l}_3) = 1$, $r_1(\vec{l}_4) = 3$, $r_2(\vec{l}_4) = 1$.

Fig. 2. A database system with four tables T1, ..., T4 that are accessed by three applications U1, U2 and U3. The arrows indicate the tables each application updates. When there are concurrent requests for updates in the same table, the request from the application with the lowest id is honored. Note that if all three applications try to simultaneously update the database, then U1, U2 and U3 achieve rewards 3, 1 and 0 respectively. If only U2 and U3 try to simultaneously update the database, then they achieve rewards 2 and 1 respectively.

P. Chaporkar is with Indian Institute of Technology, Mumbai, India. His email address is chaporkar@ee.iitb.ac.in. S. Sarkar is with the Department of Electrical and Systems Engineering at University of Pennsylvania. Her email address is swati@seas.upenn.edu. This work was supported by the National Science Foundation under grants ANI-0106984, NCR-0238340 and CNS-0435306.

Now, consider one-to-one communication in wireless networks.
Success of each transmission depends upon the interference due to concurrent transmissions in the network and the channel state. Let the reward for each transmission be 1 if the transmission is successful. Thus, different transmissions attain different rewards depending on the set of queues served. Furthermore, here, the same selection of sessions may generate different rewards at different times as the interferences randomly change due to fading; rewards may therefore be random.

Next, in many database systems, a single update operation from an application involves updates in many tables. Here, each application constitutes a queue, and the reward attained by an update operation is the number of tables that are successfully updated. Since different applications need to update different numbers of tables, rewards received by serving different queues will be different. Moreover, if many applications try to update the same table, then only one of them can do so, as the access to these tables is controlled to avoid inconsistencies due to concurrent updates. Thus, the reward for a queue depends on the set of queues served. We demonstrate this using a specific application in Figure 2.

Our contribution is to provide a mathematical framework for modeling and optimizing key performance attributes in generalized constrained queueing networks. First, we define appropriate performance metrics (Section II). Next, we demonstrate that the fundamental relations between performance metrics such as throughput and stability change due to the dependence of the rewards on the set of queues served (Section III). Specifically, maximizing the throughput is no longer equivalent to maximizing the stability region; we therefore need to maximize one subject to certain constraints on the other. Since stability is critical for bounding packet delays and buffer overflow, we focus on maximizing the throughput subject to stabilizing the system. We design provably optimal scheduling strategies that attain this goal by scheduling the queues for service based on the queue lengths and the rewards provided by different selections (Section IV).
These scheduling strategies are however computationally complex. We next develop a framework to reduce the computational complexity and yet attain the optimum performance (Section V).

Finally, we consider some possible generalizations (Section VI) and describe the related work (Section VII).

II. SYSTEM MODEL

We consider a queueing network with $n$ queues. We assume that time is slotted. In each queue $k \in \{1,\ldots,n\}$ packets arrive as per arrival process $\{\Lambda_k(t)\}_{t=1}^{\infty}$, where $\Lambda_k(t)$ is the number of arrivals in queue $k$ during slot $t$. Arrivals for the same session in different slots are independent and identically distributed. The arrival processes for different sessions are independent but need not be identically distributed. We assume that $\Lambda_k(t) \le A_{\max}$ in each slot $t$ and for any $k$. Let $\lambda_k \stackrel{\mathrm{def}}{=} E[\Lambda_k(t)]$ and let $\vec{\lambda} = (\lambda_1,\ldots,\lambda_n)$ denote the arrival rate vector. Each packet can be served in at most one slot, and it departs the system at the end of the slot in which it is served. This assumption has been motivated by the fact that in wireless networks multiple transmissions of the same packet consume additional energy and increase the interference for other transmissions.

We denote by $Q_k(t)$ the queue length of the $k$th queue at the beginning of slot $t$. Also, $\vec{Q}(t) = [Q_1(t) \cdots Q_n(t)]$. A queue can only be served if it has a packet to transmit, and in each slot in which it is served it transmits one packet. The indicator $l_k(t) = 1$ if the $k$th queue is served in slot $t$, and is 0 otherwise. The vector $\vec{l}(t) = [l_1(t) \cdots l_n(t)]$ denotes the service vector in slot $t$.

The system constraints may prohibit simultaneous service of certain queues. Thus, not all $2^n$ $n$-dimensional binary vectors need be valid service vectors. Let $L = \{\vec{l}_1,\ldots,\vec{l}_m\}$ denote the set of all valid service vectors, and let $l_{ik}$ denote the $k$th element of $\vec{l}_i \in L$. Clearly $m \le 2^n$. For example, Figure 1 shows a constrained queueing network with $n = 2$ and $m = 4$. Now, if the system has an additional constraint that all the receivers should receive every packet, then $S_1$ and $S_2$ cannot be served concurrently. Thus, in this case, $L = \{\vec{l}_1 = [0\ 0],\ \vec{l}_2 = [1\ 0],\ \vec{l}_3 = [0\ 1]\}$ and $m = 3$.

We assume the following about $L$. If $\vec{l} \in L$, then every binary vector $\vec{l}_1 \le \vec{l}$ also belongs to $L$, where the inequality is element-wise.
In other words, if a certain set of queues can be served simultaneously, then any subset of these queues can also be served simultaneously. Note that this assumption holds in wireless networks.

For each $\vec{l}_i \in L$ and queue length vector $\vec{Q}$, we define an $n$-dimensional vector $\vec{l}_i(\vec{Q})$ as follows. The $k$th component of $\vec{l}_i(\vec{Q})$ equals $l_{ik}$ if $Q_k > 0$, and is 0 otherwise. Clearly, for each $\vec{l}_i \in L$ and $\vec{Q}$, $\vec{l}_i(\vec{Q}) \in L$.

The system receives a reward for serving each queue, and the reward obtained for serving the $k$th queue in slot $t$, $r_k(\vec{l}(t))$, is a function of the service vector $\vec{l}(t)$ in slot $t$, for each $k$. We assume that $r_k(\vec{l}(t)) \le G_k < \infty$ for each $k$. We initially assume that the reward for each queue is a deterministic function of the service vector, and later generalize to allow the reward to randomly depend on the service vector (Section VI). Refer to Figure 1 for some example rewards.

We assume the following properties of the reward function. First, if $l_k = 0$ then $r_k(\vec{l}) = 0$. Thus, if a queue is not served then it does not receive any reward. Next, for any $\vec{l}_1, \vec{l}_2 \in L$, if $\vec{l}_1 \le \vec{l}_2$ and $l_{1k} > 0$, then $r_k(\vec{l}_1) \ge r_k(\vec{l}_2)$. Thus, $r_k(\vec{l}(\vec{Q})) \ge r_k(\vec{l})$ for any $\vec{Q}$, $\vec{l} \in L$ and $k$ such that $Q_k > 0$.

We justify this assumption in the context of one of the application scenarios, wireless networks. In wireless networks, when fewer queues transmit, the interference is less in the system and therefore, usually, the queues that transmit receive higher reward. If this is not the case, e.g., when the probability of success increases with increase in interference due to the use of sophisticated decoding strategies, then if an empty queue is selected, it can transmit a signal so as to ensure that other queues do not receive less reward because it is empty. This may increase the overall energy consumption, but our focus here is to maximize the throughput. Joint minimization of the energy consumption and maximization of the throughput is an interesting topic for future research. The assumption can also be similarly justified for database systems.

Next, we present some important definitions.
Definition 1 (Scheduling Policy): A scheduling policy $\pi$ decides the service vector $\vec{l}^{\pi}(t)$ in each slot $t \ge 1$ such that $\vec{l}^{\pi}(t) \in L$ and $l_i^{\pi}(t) = 0$ if $Q_i(t) = 0$.

This class includes offline policies that decide their service vectors based on the knowledge of packet arrivals in each past, present and even future slot. Since $r_k(\vec{l}^{\pi}(u)) = 0$ if $l_k^{\pi}(u) = 0$, and $l_k^{\pi}(u) = 0$ if $Q_k(u) = 0$, we have $r_k(\vec{l}^{\pi}(u)) = 0$ if $Q_k(u) = 0$. Thus, irrespective of the scheduling policy, no queue receives any reward in a slot in which it is empty. Transmission of a signal from an empty queue is not considered service for the empty queue.
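The model ingredients introduced so far (the valid service vectors $L$, the reward function $r_k$, and the effective service vector $\vec{l}(\vec{Q})$) can be illustrated on the two-queue system of Figure 1. The following is only a minimal sketch: the names `REWARDS`, `VALID_VECTORS`, `effective_vector`, `is_downward_closed` and `rewards_are_monotone` are hypothetical and introduced here for illustration; the numbers are taken from the caption of Figure 1.

```python
from itertools import product

# Figure 1: queue 1 is S1, queue 2 is S2.
# Maps a service vector l to the reward pair (r_1(l), r_2(l)).
REWARDS = {
    (0, 0): (0, 0),
    (1, 0): (5, 0),
    (0, 1): (0, 1),
    (1, 1): (3, 1),
}
VALID_VECTORS = set(REWARDS)  # the set L


def effective_vector(l, q):
    """Return l(Q): keep l_k only for queues that are non-empty."""
    return tuple(lk if qk > 0 else 0 for lk, qk in zip(l, q))


def is_downward_closed(valid):
    """Check: if l is valid, every binary vector below it is valid too."""
    for l in valid:
        for sub in product((0, 1), repeat=len(l)):
            if all(s <= x for s, x in zip(sub, l)) and sub not in valid:
                return False
    return True


def rewards_are_monotone(rewards):
    """Check: l1 <= l2 and l1_k > 0 imply r_k(l1) >= r_k(l2)."""
    for l1, r1 in rewards.items():
        for l2, r2 in rewards.items():
            if all(a <= b for a, b in zip(l1, l2)):
                for k, served in enumerate(l1):
                    if served and r1[k] < r2[k]:
                        return False
    return True
```

For instance, `effective_vector((1, 1), (0, 3))` drops the empty first queue, and both structural assumptions of this section hold for the Figure 1 data.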

Definition 2 (Throughput): For an arrival rate vector $\vec{\lambda}$, the throughput under a scheduling policy $\pi$, $\Omega^{\pi}(\vec{\lambda})$, is the reward it receives per unit time. Mathematically,
$$\Omega^{\pi}(\vec{\lambda}) = \liminf_{t\to\infty} \frac{1}{t} \sum_{u=1}^{t} \sum_{k=1}^{n} r_k(\vec{l}^{\pi}(u)).$$
Since $r_k(\vec{l}^{\pi}(u)) = 0$ if $l_k^{\pi}(u) = 0$, and $l_k^{\pi}(u) \in \{0,1\}$,
$$\Omega^{\pi}(\vec{\lambda}) = \liminf_{t\to\infty} \frac{1}{t} \sum_{u=1}^{t} \sum_{k=1}^{n} r_k(\vec{l}^{\pi}(u))\, l_k^{\pi}(u). \tag{1}$$
Note that if the reward $r_k(\vec{l})$ is the number of receivers of session $k$ that receive a packet when the service vector is $\vec{l}$, the throughput under $\pi$ is the sum, over all receivers, of the number of packets each receiver receives per unit time. This is consistent with the usual definition of throughput in a communication network.

Definition 3 (Loss): The loss under a scheduling policy $\pi$ at any slot $t$ is the difference between the sum of the maximum possible rewards of the queues it serves at $t$ and the reward it obtains at $t$. The loss under a scheduling policy $\pi$, $L^{\pi}(\vec{\lambda})$, is its total loss per unit time. Mathematically,
$$L^{\pi}(\vec{\lambda}) = \liminf_{t\to\infty} \frac{1}{t} \sum_{u=1}^{t} \sum_{k=1}^{n} \left(G_k - r_k(\vec{l}^{\pi}(u))\right) l_k^{\pi}(u).$$
In a communication network, usually, the loss experienced by a receiver denotes the number of packets transmitted by its source that it does not receive per unit time, and the network loss denotes the sum of the losses of all receivers. Again, if the reward $r_k(\vec{l})$ is the number of receivers of session $k$ that receive a packet when the service vector is $\vec{l}$, then the formal definition of loss in Definition 3 has the same connotation as above.

Definition 4 (System Stability): The queueing system is said to be stable if the time average of queue lengths is finite for each queue, i.e., $\limsup_{t\to\infty} \frac{\sum_{u=1}^{t} Q_i(u)}{t} < \infty$ with probability (w.p.) 1 for each $i$.

A scheduling policy that stabilizes the system is called a stable scheduling policy. The stability region of a scheduling policy is the set of arrival rate vectors for which the system is stable under the policy. The stability region of the system, $\Theta$, is the union of the stability regions of all scheduling policies. A scheduling policy whose stability region equals $\Theta$ is said to maximize the stability region. Let $C$ denote the convex hull of the vectors in $L$ and $C^{o}$ denote the interior of $C$. In their seminal work, Tassiulas et al. (Theorem 3.2, [35]) showed that $C^{o} \subseteq \Theta \subseteq C$.
Definition 5 (Stabilizable Arrival Rate Vector): We denote the arrival rate vector $\vec{\lambda}$ as stabilizable if $\vec{\lambda} \in C^{o}$.

Definition 6 (Throughput Optimality): A stable scheduling policy is said to be throughput optimal if w.p. 1 it attains the maximum throughput among all the stable scheduling policies. We denote the throughput attained by such a policy for arrival rate vector $\vec{\lambda} \in \Theta$ by $\Omega_{\max}(\vec{\lambda})$.

Definition 7 ($\epsilon$-Throughput Optimality): A scheduling policy $\pi$ is said to be $\epsilon$-throughput optimal for an $\epsilon > 0$ if (a) it is stable, and (b) $\Omega^{\pi}(\vec{\lambda}) \ge \Omega_{\max}(\vec{\lambda}) - \epsilon$ w.p. 1.

In the next section, we show that in generalized constrained queueing networks maximizing the stability region is not equivalent to maximizing the throughput. Since stability is imperative for guaranteeing bounded delay and for limiting packet drop due to buffer overflow, we aim to maximize the throughput subject to stabilizing the system. Specifically, our goal is to design $\epsilon$-throughput optimal policies.

We now investigate the relation between the throughput and the loss. Now,
$$L^{\pi}(\vec{\lambda}) = \sum_{k=1}^{n} G_k \liminf_{t\to\infty} \frac{1}{t} \sum_{u=1}^{t} l_k^{\pi}(u) - \Omega^{\pi}(\vec{\lambda}).$$
Note that if a system is stable under policy $\pi$, $\lim_{t\to\infty} \frac{1}{t} \sum_{u=1}^{t} l_k^{\pi}(u) = \lambda_k$ w.p. 1. Thus, $L^{\pi}(\vec{\lambda}) = \sum_{k=1}^{n} \lambda_k G_k - \Omega^{\pi}(\vec{\lambda})$ w.p. 1. Thus, if $\vec{\lambda}$ is in the stability region of policies $\pi_1, \pi_2$, then $\Omega^{\pi_1}(\vec{\lambda}) + L^{\pi_1}(\vec{\lambda}) = \Omega^{\pi_2}(\vec{\lambda}) + L^{\pi_2}(\vec{\lambda})$ w.p. 1. Thus, for any stabilizable arrival rate vector $\vec{\lambda}$, a throughput optimal policy must also minimize the loss, and an $\epsilon$-throughput optimal policy attains a loss which is at most $\epsilon$ more than the loss of any stable policy. Thus, we focus on obtaining $\epsilon$-throughput optimal policies.
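As a worked instance of the relation $\Omega^{\pi}(\vec{\lambda}) + L^{\pi}(\vec{\lambda}) = \sum_k \lambda_k G_k$, consider the system of Figure 1 with $\vec{\lambda} = (1/2-\epsilon,\ 1/2-\epsilon)$, which is the setting of Example 1 in Section III. The sketch below assumes $G_1 = 5$ and $G_2 = 1$, i.e., that each bound $G_k$ is taken to be the largest reward of queue $k$; the paper only requires $G_k$ to be an upper bound. It also assumes a stable policy that serves every non-empty queue, so that both queues are served together in a fraction $(1/2-\epsilon)^2$ of the slots.

$$
\begin{aligned}
\sum_{k} \lambda_k G_k &= (5+1)\left(\tfrac12-\epsilon\right) = 3 - 6\epsilon,\\
\text{loss per slot when both queues are served} &= (5-3) + (1-1) = 2,\\
L &= 2\left(\tfrac12-\epsilon\right)^2 = \tfrac12 - 2\epsilon + 2\epsilon^2,\\
\Omega &= (3 - 6\epsilon) - \left(\tfrac12 - 2\epsilon + 2\epsilon^2\right) = \tfrac52 - 4\epsilon - 2\epsilon^2 .
\end{aligned}
$$

A stable policy that never serves the two queues together incurs zero loss under these assumptions and therefore attains throughput $3 - 6\epsilon$.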

III. RELATION BETWEEN THROUGHPUT AND STABILITY

First, we examine what decisions policies are likely to make if they want to maximize only the stability region, or if they want to maximize only the throughput. A policy that aims to maximize the stability region serves as many packets as possible in a slot while giving priority to longer queues. If the policy aims to maximize the throughput, then it may wait and transmit only when the reward is high so that each packet fetches the maximum possible reward. Thus, the control decisions for maximizing the stability region and for maximizing the throughput are not equivalent. Using an example that is motivated by one-to-many communication in wireless networks (Figure 1), we next demonstrate that a policy that maximizes the stability region does not maximize the throughput.

Example 1: Consider the system shown in Figure 1. Let $\vec{\lambda} = (1/2-\epsilon,\ 1/2-\epsilon)$, where $\epsilon$ is a small positive real number. Now, consider a policy $\pi_b$ that serves each queue whenever it is non-empty. Thus, if only $S_1$ ($S_2$, resp.) is non-empty, then $\pi_b$ will select service vector $\vec{l}_2$ ($\vec{l}_3$, resp.) and achieve a reward of 5 (1, resp.). If both queues are non-empty in a slot, then $\pi_b$ will select $\vec{l}_4$ and achieve a reward of 4. Clearly, $\pi_b$ maximizes the stability region. Now, the service process for $S_1$ is independent of that for $S_2$. Using Little's law, the fraction of slots in which $S_1$ ($S_2$, resp.) is non-empty and $S_2$ ($S_1$, resp.) is empty is $1/4-\epsilon^2$ ($1/4-\epsilon^2$, resp.), and the fraction of slots in which both queues are non-empty is $(1/2-\epsilon)^2$. Thus, $\Omega^{\pi_b}(\vec{\lambda}) = (5+1)(1/4-\epsilon^2) + 4(1/2-\epsilon)^2 \approx 10/4$.

Now, consider a policy $\tilde{\pi}$ that serves only $S_1$ when $S_1$ is non-empty, and serves only $S_2$ if $S_1$ is empty and $S_2$ is non-empty. Note that $\tilde{\pi}$ is stable as $\lambda_1 + \lambda_2 = 1 - 2\epsilon < 1$. Thus, whenever $S_1$ ($S_2$, resp.) is served, the service vector is $\vec{l}_2$ ($\vec{l}_3$, resp.) and the reward is 5 (1, resp.). Since the queues are stable, $S_1$ and $S_2$ are served in a $1/2-\epsilon$ fraction of slots each. Thus, $\Omega^{\tilde{\pi}}(\vec{\lambda}) = (5+1)(1/2-\epsilon) \approx 3$. Thus, $\Omega^{\pi_b}(\vec{\lambda}) < \Omega^{\tilde{\pi}}(\vec{\lambda})$.
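The arithmetic of Example 1 can be checked numerically. The sketch below is illustrative only: the function names `serve_all_throughput` and `priority_throughput` are hypothetical, and the first function relies on the same independence argument used in the example (each queue is non-empty in a fraction of slots equal to its arrival rate, independently of the other queue).

```python
from fractions import Fraction


def serve_all_throughput(lam1, lam2, r1_alone, r2_alone, r_both):
    """Throughput of the policy that serves every non-empty queue.

    r_both is the total reward r_1(l_4) + r_2(l_4) when both are served.
    """
    only1 = lam1 * (1 - lam2)   # S1 non-empty, S2 empty
    only2 = lam2 * (1 - lam1)   # S2 non-empty, S1 empty
    both = lam1 * lam2          # both non-empty
    return r1_alone * only1 + r2_alone * only2 + r_both * both


def priority_throughput(lam1, lam2, r1_alone, r2_alone):
    """Throughput of the policy that never serves both queues together.

    Valid only when lam1 + lam2 < 1, so that the policy is stable and each
    queue is served alone at its arrival rate.
    """
    assert lam1 + lam2 < 1
    return r1_alone * lam1 + r2_alone * lam2


eps = Fraction(1, 10)
lam = Fraction(1, 2) - eps
omega_serve_all = serve_all_throughput(lam, lam, 5, 1, 3 + 1)
omega_priority = priority_throughput(lam, lam, 5, 1)
print(omega_serve_all, omega_priority)
```

Exact rational arithmetic is used so that the two throughputs can be compared without rounding; as $\epsilon$ shrinks, the values approach $10/4$ and $3$ respectively.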
Note that the policy $\pi_b$ of Example 1, which serves every non-empty queue, always transmits the maximum number of packets in each slot and also chooses the set of queues whose sum of queue lengths is the maximum. Tassiulas et al. [35] showed that a policy that satisfies the latter property maximizes the stability region in arbitrary constrained queueing networks. But Example 1 shows that $\pi_b$ does not maximize the throughput. This is because $\pi_b$ does not consider the reward structure in deciding the service vector. So, the policies designed to maximize the stability region of the constrained queueing system (e.g., see [1], [5], [19], [34], [35]) need not maximize the throughput. Thus, we need an alternate mechanism to design throughput optimal policies.

Now, we consider two policies, $\pi_1$ and $\pi_2$, that seek to maximize the reward in a greedy fashion. $\pi_1$ serves each queue only when the queue can obtain its maximum possible reward, and $\pi_2$ selects in each slot the service vector that attains the maximum possible reward among all valid service vectors in the slot. Simply put, $\pi_1$ maximizes the reward per packet, and $\pi_2$ greedily maximizes the reward in each slot. We show that $\pi_1$ does not stabilize the system even when the arrival rate vector is stabilizable, and $\pi_2$ does not attain the maximum throughput among all stable policies.

Example 2: Consider the system shown in Figure 1. Let $\vec{\lambda} = (3/4,\ 1/2)$. Clearly, $\vec{\lambda} \in C^{o}$ and the policy $\pi_b$ in Example 1 stabilizes the system. Note that $\pi_1$ will never concurrently serve both queues. Hence, the sum of the service rates provided to the two queues is at most 1. Thus, since $\lambda_1 + \lambda_2 > 1$, $\pi_1$ does not stabilize the system. Note that $\pi_1$ maximizes the reward per packet while serving queues at rates smaller than their arrival rates, and thereby compromises stability.

Example 3: Consider the system shown in Figure 1 with the difference that $r_2(\vec{l}_3) = r_2(\vec{l}_4) = 3$. Let $\vec{\lambda} = (1/4,\ 1/4)$. Note that for the above rewards, $\pi_2$ selects the same service vectors as $\pi_b$. Thus, $\pi_2$ stabilizes the system. Now, the service process for $S_1$ is independent of that for $S_2$. Using Little's law, the fraction of slots in which $S_1$ ($S_2$, resp.) is non-empty and $S_2$ ($S_1$, resp.) is empty is $(1/4)(3/4)$ ($(1/4)(3/4)$, resp.), and the fraction of slots in which both queues are non-empty is $(1/4)^2$. Thus, $\Omega^{\pi_2}(\vec{\lambda}) = \Omega^{\pi_b}(\vec{\lambda}) = (5+3)(3/16) + 6(1/16) = 15/8$. Now, consider the policy $\tilde{\pi}$ described in Example 1, which serves only $S_1$ when $S_1$ is non-empty and otherwise serves only $S_2$. Again, $\tilde{\pi}$ is stable as $\lambda_1 + \lambda_2 < 1$. Thus, whenever $S_1$ ($S_2$, resp.) is served, the service vector is $\vec{l}_2$ ($\vec{l}_3$, resp.) and the reward is 5 (3, resp.). Since the queues are stable, $S_1$ and $S_2$ are served in a $1/4$ fraction of slots each. Thus, $\Omega^{\tilde{\pi}}(\vec{\lambda}) = (5+3)(1/4) = 2$. Thus, $\pi_2$ does not attain the maximum throughput among all stable policies. The limitation of $\pi_2$ is that it myopically bases its decision in a slot solely on the aggregate reward in the slot. Thus, even when it is possible to wait and serve queues in mutually disjoint slots and achieve a higher reward per packet, $\pi_2$ serves the queues in the same slot.

The examples demonstrate that
1) a policy that maximizes the stability region need not maximize the throughput

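2) myopically maximizing the reward in each slot or the reward per packet may not maximize the throughput or stabilize the system
3) the optimal policy should wait just long enough so as to achieve the highest possible reward per packet while serving each queue at a rate higher than its arrival rate.

IV. OPTIMAL POLICIES

In this section, we propose two policies and prove that they are $\epsilon$-throughput optimal for every stabilizable arrival rate vector $\vec{\lambda}$ and $\epsilon > 0$.

A. Linear program based optimal policy ($\pi(\vec{\lambda},\delta)$)

The scheduling policy $\pi(\vec{\lambda},\delta)$ selects $\vec{l}_i \in L$ w.p. $w_i$ in every slot. If $\vec{l}_i$ is chosen in slot $t$, then $\vec{l}^{\pi(\vec{\lambda},\delta)}(t) = \vec{l}_i(\vec{Q}(t))$, i.e., the $k$th queue transmits a packet if $l_{ik} = 1$ and $Q_k(t) > 0$. Recall that $\vec{l}^{\pi(\vec{\lambda},\delta)}(t)$ is the indicator vector for the set of queues served by the policy in slot $t$. Let the policy select $\vec{l}_i$ in a slot $t$. Then $\vec{l}^{\pi(\vec{\lambda},\delta)}(t) \le \vec{l}_i$. The inequality is strict only when some queues in $\vec{l}_i$ are empty in $t$, and then, as discussed in Section II, $r_k(\vec{l}^{\pi(\vec{\lambda},\delta)}(t)) \ge r_k(\vec{l}_i)$ for each $k$ for which $Q_k(t) > 0$.

The probability distribution $\vec{w} = [w_1 \cdots w_m]$ is computed using the following linear program $LP(\vec{\lambda},\delta)$. Here, $\delta$ is a parameter.

$LP(\vec{\lambda},\delta)$:
Maximize: $U(\vec{\lambda},\delta) = \sum_{i=1}^{m} \sum_{k=1}^{n} w_i\, l_{ik}\, r_k(\vec{l}_i)$
Subject to:
1) $\sum_{i=1}^{m} w_i = 1$ and $w_i \ge 0$ for every $i$
2) $\sum_{i=1}^{m} w_i\, l_{ik} = \lambda_k + \delta$ for every $k$.

Constraint 1) ensures that $\vec{w}$ is a valid probability distribution. When $\delta > 0$, constraint 2) ensures that each queue is selected for service at a rate higher than the arrival rate in the queue. Thus, constraint 2) ensures stability. Note that $\vec{w}$, and hence the policy, depend on $\vec{\lambda}$ and the chosen $\delta$. We indicate this dependence by using the notations $\vec{w}(\vec{\lambda},\delta)$ and $\pi(\vec{\lambda},\delta)$. Now, although $LP(\vec{\lambda},\delta)$ is well-defined, it need not have any feasible solution for arbitrary $\vec{\lambda} \in \mathbb{R}^n$ and $\delta \in \mathbb{R}$. Theorem 1 shows that for all stabilizable $\vec{\lambda}$ and sufficiently small positive $\delta$, $LP(\vec{\lambda},\delta)$ is feasible and $\pi(\vec{\lambda},\delta)$ is $\epsilon$-throughput optimal. Note that allowing arbitrary $\vec{\lambda} \in \mathbb{R}^n$ and $\delta \in \mathbb{R}$ in $LP(\vec{\lambda},\delta)$ simplifies the proof of Theorem 1.

Theorem 1: Let $\vec{\lambda}$ be any stabilizable arrival rate vector. Then, for every $\epsilon > 0$ there exists a $\bar{\delta}$ such that for every $\delta \in (0, \bar{\delta})$, $LP(\vec{\lambda},\delta)$ is feasible and $\pi(\vec{\lambda},\delta)$ is $\epsilon$-throughput optimal. Furthermore,
$$\Omega^{\pi(\vec{\lambda},\delta)}(\vec{\lambda}) \ge U(\vec{\lambda},\delta) - \delta \sum_{k=1}^{n} G_k \ge \Omega_{\max}(\vec{\lambda}) - \epsilon \quad \text{w.p. 1.} \tag{2}$$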
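As a worked instance, $LP(\vec{\lambda},\delta)$ can be solved by hand for the two-queue system of Figure 1. The shorthand $a = \lambda_1 + \delta$ and $b = \lambda_2 + \delta$ is introduced here for illustration only, and $G_1 = 5$, $G_2 = 1$ is an assumed choice of the reward bounds (the largest reward of each queue). With $w_1,\ldots,w_4$ the probabilities of $\vec{l}_1 = [0\ 0]$, $\vec{l}_2 = [1\ 0]$, $\vec{l}_3 = [0\ 1]$, $\vec{l}_4 = [1\ 1]$:

$$
\begin{aligned}
&w_2 + w_4 = a, \qquad w_3 + w_4 = b, \qquad w_1 + w_2 + w_3 + w_4 = 1,\\
&\Rightarrow\ w_2 = a - w_4,\quad w_3 = b - w_4,\quad w_1 = 1 - a - b + w_4,\\
&U = 5 w_2 + 1\cdot w_3 + (3+1)\, w_4 = 5a + b - 2 w_4,\\
&w_i \ge 0 \ \Rightarrow\ \max(0,\ a+b-1) \le w_4 \le \min(a, b).
\end{aligned}
$$

Since $U$ decreases in $w_4$, the optimum uses $w_4 = \max(0,\ a+b-1)$: the two queues are served together only as often as stability forces. For the arrival rates of Example 1, $\vec{\lambda} = (1/2-\epsilon,\ 1/2-\epsilon)$, and any $\delta \le \epsilon$, we get $w_4 = 0$ and $U(\vec{\lambda},\delta) = 6(1/2-\epsilon+\delta)$, so that $U(\vec{\lambda},\delta) - \delta(G_1+G_2) = 3 - 6\epsilon$, which matches the throughput of the policy in Example 1 that never serves both queues together.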
We prove Theorem 1 in the appendix. Finally, the stability region can be maximized using arbitrary feasible solutions of $LP(\vec{\lambda},\delta)$ [5]. Specifically, if $\pi(\vec{\lambda},\delta)$ selects the service vectors as per any probability distribution that constitutes a feasible solution of $LP(\vec{\lambda},\delta)$ for any positive $\delta$, it stabilizes the system provided $\vec{\lambda}$ is stabilizable [5]. But, for attaining the maximum throughput among all stable policies, an optimal solution of $LP(\vec{\lambda},\delta)$ must be used. Specifically, for any stabilizable $\vec{\lambda}$ and $\epsilon > 0$, $\pi(\vec{\lambda},\delta)$ is $\epsilon$-optimal for any $\delta \in \left(0, \min\{\delta_{\max}(\vec{\lambda}),\ \epsilon / \sum_{k=1}^{n} G_k\}\right]$, where $\delta_{\max}(\vec{\lambda})$ is the maximum value of $\delta$ for which $LP(\vec{\lambda},\delta)$ has a feasible solution (this follows from Theorem 1 and Lemma 5 in the appendix).

B. Queue length based optimal policy ($\pi_O$)

The policy $\pi(\vec{\lambda},\delta)$ requires the knowledge of $\vec{\lambda}$ in order to obtain the optimal $\vec{w}(\vec{\lambda},\delta)$. The system may not however know $\vec{\lambda}$. We now design a policy $\pi_O$ that attains the maximum throughput among all stable policies and stabilizes the system for any stabilizable $\vec{\lambda}$ without knowing $\vec{\lambda}$. Recall that an optimal policy should wait as long as possible to achieve the highest possible reward per packet without violating system stability (Section III). Now, $\pi(\vec{\lambda},\delta)$ uses the knowledge of $\vec{\lambda}$ to ensure the above, whereas $\pi_O$ ensures the above by using only the value of $\vec{Q}(t)$.

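We now describe $\pi_O$. In slot $t$, $\pi_O$ selects the service vector $\vec{l}^{\pi_O}(t)$ such that
$$\vec{l}^{\pi_O}(t) = \arg\max_{\vec{l} \in L,\ \vec{l} = \vec{l}(\vec{Q}(t))} \sum_{k=1}^{n} \left\{ Q_k(t) - V\left(G_k - r_k(\vec{l})\right) \right\} l_k, \tag{3}$$
where $V$ is a constant. Note that the constraint $\vec{l} = \vec{l}(\vec{Q}(t))$ implies that $l_k = 0$ if $Q_k(t) = 0$ and $l_k \in \{0,1\}$ otherwise.

Theorem 2: Let $\vec{\lambda}$ be any stabilizable arrival rate vector. Then, for every $V \ge 0$, $\pi_O$ stabilizes the system. Moreover, for every $\epsilon > 0$, there exists $\bar{V}$ such that for every $V \ge \bar{V}$, $\pi_O$ is $\epsilon$-throughput optimal.

The above result implies that any stable off-line policy that takes transmission decisions based on the knowledge of past, present and future arrivals cannot attain throughput significantly more than $\Omega^{\pi_O}(\vec{\lambda})$ for every stabilizable $\vec{\lambda}$. This holds even though $\pi_O$ takes transmission decisions based only on the current queue lengths.

Now, we describe the intuition behind this result. Let
$$W(\vec{Q}, \vec{l}) \stackrel{\mathrm{def}}{=} \sum_{k=1}^{n} \left( Q_k l_k - V\left(G_k - r_k(\vec{l})\right) l_k \right), \tag{4}$$
$$W_1(\vec{Q}, \vec{l}) \stackrel{\mathrm{def}}{=} \sum_{k=1}^{n} Q_k l_k.$$
Note that intuitively $G_k - r_k(\vec{l})$ is the loss of reward of the $k$th queue when service vector $\vec{l}$ is used. Thus, in each slot $t$, $\pi_O$ selects the service vector $\vec{l}$ that maximizes the dot product, $W(\vec{Q}(t), \vec{l})$, of $\vec{l}$ and the difference between the queue length vector $\vec{Q}(t)$ and a scaled loss vector associated with $\vec{l}$. Note that a policy ($\hat{\pi}$) that selects the service vector $\vec{l}$ that maximizes the dot product, $W_1(\vec{Q}(t), \vec{l})$, of $\vec{l}$ and the queue length vector $\vec{Q}(t)$ stabilizes the system for every stabilizable $\vec{\lambda}$ [1], [18], [35]. This is because under $\hat{\pi}$ the queue length process has a negative drift when $\sum_{k=1}^{n} Q_k(t)$ is sufficiently large, for every stabilizable $\vec{\lambda}$. When $\sum_{k=1}^{n} Q_k(t) \gg V \sum_{k=1}^{n} G_k$, $W(\vec{Q}(t), \vec{l}) \approx W_1(\vec{Q}(t), \vec{l})$ for every $\vec{l} \in L$, and therefore $\pi_O$ and $\hat{\pi}$ select similar service vectors. Thus, intuitively, for every stabilizable $\vec{\lambda}$, the queue length process under $\pi_O$ should also have a negative drift when $\sum_{k=1}^{n} Q_k(t)$ is sufficiently large. Hence, $\pi_O$ also stabilizes the system for any stabilizable $\vec{\lambda}$.

We have however shown that not all stable policies attain equal throughput (Example 1). So, it is not obvious that $\pi_O$ maximizes the throughput among all policies that stabilize the system; we now provide the intuition behind why this is the case.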
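The selection rule (3) can be sketched for the two-queue system of Figure 1 by brute-force enumeration of the valid service vectors. This is a minimal illustration, not the authors' implementation: the names `REWARDS`, `G`, `weight` and `select_service_vector` are hypothetical, and `G` is assumed to be the largest reward each queue can obtain (the text only requires an upper bound). Ties are broken arbitrarily by `max`.

```python
# Figure 1 rewards: service vector -> (r_1(l), r_2(l)).
REWARDS = {
    (0, 0): (0, 0),
    (1, 0): (5, 0),
    (0, 1): (0, 1),
    (1, 1): (3, 1),
}
G = (5, 1)  # assumed reward bounds G_k (largest reward of each queue)


def weight(q, l, v):
    """W(Q, l) = sum_k (Q_k - V * (G_k - r_k(l))) * l_k, as in (4)."""
    r = REWARDS[l]
    return sum((qk - v * (gk - rk)) * lk
               for qk, gk, rk, lk in zip(q, G, r, l))


def select_service_vector(q, v):
    """Pick a maximizer of W(Q, l) over valid l that serve no empty queue."""
    candidates = [l for l in REWARDS
                  if all(lk == 0 or qk > 0 for lk, qk in zip(l, q))]
    return max(candidates, key=lambda l: weight(q, l, v))


print(select_service_vector((3, 1), 2))    # short queues: serve S1 alone
print(select_service_vector((10, 10), 2))  # long queues: serve both
```

With short queues the rule prefers vectors in which every served queue earns its full reward, while with long queues the queue-length term dominates and both queues are served. The enumeration visits every element of $L$, so its cost grows with $|L|$.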
Note that when the queue lengths are small, high throughput can be attained without violating stability by serving the queues only when they receive high rewards. On the other hand, stability can be ensured by selecting the queues with higher queue lengths and by serving a large number of packets when the queue lengths are large. We now demonstrate that $\pi_O$ follows both the above principles. For simplicity, assume that $V$, $G_k$, $r_k(\vec{l})$ are integers for all $k$, $\vec{l} \in L$. Now, when $Q_k(t) < V$, $Q_k(t) - V(G_k - r_k(\vec{l})) \ge 0$ only if $r_k(\vec{l}) = G_k$. Then, since $\pi_O$ maximizes $W(\vec{Q}(t), \vec{l})$, it will serve the $k$th queue only if the maximum possible reward is achievable. Now, if $Q_k(t) \in \{V,\ldots,2V-1\}$, then $Q_k(t) - V(G_k - r_k(\vec{l})) \ge 0$ only if $r_k(\vec{l}) \ge G_k - 1$. Thus, $\pi_O$ will serve the $k$th queue only if the achievable reward is greater than or equal to $G_k - 1$. Similarly, if $Q_k(t) \in \{(G_k-u)V,\ldots,(G_k-u+1)V-1\}$, then $\pi_O$ will serve the $k$th queue only if the achievable reward is greater than or equal to $u$. In summary, $\pi_O$ attains the maximum possible reward for every packet while maintaining stability by dynamically selecting the service vectors based on the queue lengths. Thus, $\pi_O$ attains the maximum throughput among all stable policies.

Now, we prove Theorem 2 using a combination of optimization and Lyapunov theories. Neely et al. [27] proposed this proof technique in a different context.

Proof: Consider a stabilizable $\vec{\lambda}$. For any policy $\pi$,
$$Q_k(t+1) = Q_k(t) + \Lambda_k(t) - l_k^{\pi}(t). \tag{5}$$
Now for $\pi_O$, $\{\vec{Q}(t)\}_{t \ge 1}$ is an irreducible, aperiodic and countable Markov chain. Now, consider the Lyapunov function
$$f(\vec{Q}(t)) = \sum_{k=1}^{n} \left(Q_k(t)\right)^2. \tag{6}$$

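Let $M \stackrel{\mathrm{def}}{=} n(A_{\max}^2 + 1)$. From (5) and (6), it follows that
$$f(\vec{Q}(t+1)) - f(\vec{Q}(t)) \le M + \sum_{k=1}^{n} 2Q_k(t)\Lambda_k(t) - \sum_{k=1}^{n} 2Q_k(t)\, l_k^{\pi_O}(t). \tag{7}$$
Thus,
$$
\begin{aligned}
E\left[f(\vec{Q}(t+1)) - f(\vec{Q}(t)) \mid \vec{Q}(t)\right]
&\le M + \sum_{k=1}^{n} 2Q_k(t)\lambda_k - E\left[\sum_{k=1}^{n} 2Q_k(t)\, l_k^{\pi_O}(t) \,\Big|\, \vec{Q}(t)\right]\\
&= M + \sum_{k=1}^{n} 2Q_k(t)\lambda_k - 2E\left[W(\vec{Q}(t), \vec{l}^{\pi_O}(t)) \mid \vec{Q}(t)\right]\\
&\quad - 2V E\left[\sum_{k=1}^{n} \left(G_k - r_k(\vec{l}^{\pi_O}(t))\right) l_k^{\pi_O}(t) \,\Big|\, \vec{Q}(t)\right].
\end{aligned} \tag{8}
$$
Now, since $\vec{\lambda}$ is stabilizable, we can obtain a small enough positive $\delta$ such that $\pi(\vec{\lambda},\delta)$ is $\epsilon/2$-throughput optimal (Theorem 1). We consider $\pi(\vec{\lambda},\delta)$ for such a $\delta$. Here, $\vec{l}^{\pi(\vec{\lambda},\delta)}(t)$ is the service vector $\pi(\vec{\lambda},\delta)$ would have used at $t$ if it had a queue length vector of $\vec{Q}(t)$ at $t$. From the definition of $\pi_O$ (equation (3)), for every $t$ and $\pi$,
$$E\left[W(\vec{Q}(t), \vec{l}^{\pi_O}(t)) \mid \vec{Q}(t)\right] \ge E\left[W(\vec{Q}(t), \vec{l}^{\pi}(t)) \mid \vec{Q}(t)\right].$$
Thus, from (8), for every $\delta$,
$$
\begin{aligned}
E\left[f(\vec{Q}(t+1)) - f(\vec{Q}(t)) \mid \vec{Q}(t)\right]
&\le M + \sum_{k=1}^{n} 2Q_k(t)\lambda_k - 2E\left[W(\vec{Q}(t), \vec{l}^{\pi(\vec{\lambda},\delta)}(t)) \mid \vec{Q}(t)\right]\\
&\quad - 2V E\left[\sum_{k=1}^{n} \left(G_k - r_k(\vec{l}^{\pi_O}(t))\right) l_k^{\pi_O}(t) \,\Big|\, \vec{Q}(t)\right].
\end{aligned} \tag{9}
$$
Now $\pi(\vec{\lambda},\delta)$ chooses each service vector $\vec{l}_i$ w.p. $w_i(\vec{\lambda},\delta)$ independent of the queue lengths, and subsequently serves only those queues that are included in the selected service vector and are also non-empty. Thus, if $Q_k(t) > 0$,
$$E\left[l_k^{\pi(\vec{\lambda},\delta)}(t) \mid \vec{Q}(t)\right] = \sum_{i=1}^{m} l_{ik}\, w_i(\vec{\lambda},\delta) = \lambda_k + \delta. \tag{10}$$
In addition, recall that if $\pi(\vec{\lambda},\delta)$ selects $\vec{l}_i$, and if the $k$th queue is non-empty, it receives a reward of at least $r_k(\vec{l}_i)$. Thus, if $Q_k(t) > 0$,
$$
\begin{aligned}
E\left[\left(G_k - r_k(\vec{l}^{\pi(\vec{\lambda},\delta)}(t))\right) l_k^{\pi(\vec{\lambda},\delta)}(t) \mid \vec{Q}(t)\right]
&\le \sum_{i=1}^{m} w_i(\vec{\lambda},\delta)\left(G_k - r_k(\vec{l}_i)\right) l_{ik}\\
&= G_k(\lambda_k + \delta) - \sum_{i=1}^{m} w_i(\vec{\lambda},\delta)\, r_k(\vec{l}_i)\, l_{ik}.
\end{aligned}
$$
If $Q_k(t) = 0$,
$$E\left[\left(G_k - r_k(\vec{l}^{\pi(\vec{\lambda},\delta)}(t))\right) l_k^{\pi(\vec{\lambda},\delta)}(t) \mid \vec{Q}(t)\right] = 0.$$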

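Now, $\sum_{i=1}^{m} w_i(\vec{\lambda},\delta)\, r_k(\vec{l}_i)\, l_{ik} \le G_k(\lambda_k + \delta)$ since $r_k(\vec{l}_i) \le G_k$ and $\sum_{i=1}^{m} w_i(\vec{\lambda},\delta)\, l_{ik} = \lambda_k + \delta$. Thus, for every $\vec{Q}(t)$,
$$E\left[\left(G_k - r_k(\vec{l}^{\pi(\vec{\lambda},\delta)}(t))\right) l_k^{\pi(\vec{\lambda},\delta)}(t) \mid \vec{Q}(t)\right] \le G_k(\lambda_k + \delta) - \sum_{i=1}^{m} w_i(\vec{\lambda},\delta)\, r_k(\vec{l}_i)\, l_{ik}. \tag{11}$$
From (4), (10), and (11), it follows that
$$E\left[W(\vec{Q}(t), \vec{l}^{\pi(\vec{\lambda},\delta)}(t)) \mid \vec{Q}(t)\right] \ge \sum_{k=1}^{n} \left(Q_k(t) - V G_k\right)(\lambda_k + \delta) + V\, U(\vec{\lambda},\delta).$$
Hence, from (9),
$$
\begin{aligned}
E\left[f(\vec{Q}(t+1)) - f(\vec{Q}(t)) \mid \vec{Q}(t)\right]
&\le M - 2\delta \sum_{k=1}^{n} Q_k(t) + 2V \sum_{k=1}^{n} G_k(\lambda_k + \delta) - 2V\, U(\vec{\lambda},\delta)\\
&\quad - 2V E\left[\sum_{k=1}^{n} \left(G_k - r_k(\vec{l}^{\pi_O}(t))\right) l_k^{\pi_O}(t) \,\Big|\, \vec{Q}(t)\right].
\end{aligned} \tag{12}
$$

1) Stability of $\pi_O$: From (12), since $0 \le r_k(\vec{l}) \le G_k$ and $l_k \ge 0$ for all $k$, $\vec{l} \in L$, it follows that for every stabilizable $\vec{\lambda}$ and every non-negative $V$,
$$E\left[f(\vec{Q}(t+1)) - f(\vec{Q}(t)) \mid \vec{Q}(t)\right] \le M - 2\delta \sum_{k=1}^{n} Q_k(t) + 2V \sum_{k=1}^{n} G_k(\lambda_k + \delta).$$
Let $A = \left\{ \vec{Q} : \sum_{k=1}^{n} Q_k \le \frac{V}{\delta} \sum_{k=1}^{n} G_k(\lambda_k + \delta) + \frac{M+1}{2\delta} \right\}$. Then,
$$E\left[f(\vec{Q}(t+1)) - f(\vec{Q}(t)) \mid \vec{Q}(t)\right] \begin{cases} < \infty & \text{for all } \vec{Q}(t),\\ \le -1 & \text{if } \vec{Q}(t) \notin A. \end{cases}$$
Thus, since $A$ is finite, by Foster's Theorem (Theorem 2.2.3 in [20]), $\{\vec{Q}(t)\}_{t \ge 1}$ is positive recurrent, and for each queue the expected queue length under its stationary distribution is finite. Thus, the system is stable under $\pi_O$.

2) $\epsilon$-Throughput Optimality of $\pi_O$: Taking expectation on both sides of (12) with respect to the stationary distribution of $\{\vec{Q}(t)\}_{t \ge 1}$, we obtain
$$
\begin{aligned}
E\left[f(\vec{Q}(t+1)) - f(\vec{Q}(t))\right]
&\le M - 2\delta \sum_{k=1}^{n} E[Q_k(t)] + 2V \sum_{k=1}^{n} G_k(\lambda_k + \delta) - 2V\, U(\vec{\lambda},\delta)\\
&\quad - 2V E\left[\sum_{k=1}^{n} \left(G_k - r_k(\vec{l}^{\pi_O}(t))\right) l_k^{\pi_O}(t)\right].
\end{aligned} \tag{13}
$$
Now, $\sum_{u=1}^{t} l_k^{\pi_O}(u)$ is the number of departures from queue $k$ in $[0,t)$ under $\pi_O$. Since the queue length process $\{\vec{Q}(t)\}_{t \ge 1}$ under $\pi_O$ is a positive recurrent Markov chain, for every $k$,
$$E\left[l_k^{\pi_O}(t)\right] = \lim_{u\to\infty} \frac{1}{u} \sum_{v=1}^{u} l_k^{\pi_O}(v) = \lambda_k \quad \text{w.p. 1,} \tag{14}$$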

and
$$\sum_{k=1}^{n} E\left[r_k(\vec{l}^{\pi_O}(t))\, l_k^{\pi_O}(t)\right] = \lim_{u\to\infty} \frac{1}{u} \sum_{v=1}^{u} \sum_{k=1}^{n} r_k(\vec{l}^{\pi_O}(v))\, l_k^{\pi_O}(v) \ \text{ w.p. 1 } = \Omega^{\pi_O}(\vec{\lambda}) \quad \text{(from (1)).} \tag{15}$$
Moreover, since the expectations are with respect to the stationary distribution of $\vec{Q}(t)$, it follows that
$$E\left[f(\vec{Q}(t+1))\right] = E\left[f(\vec{Q}(t))\right]. \tag{16}$$
From (13), (14), (15) and (16), it follows that
$$
\begin{aligned}
\Omega^{\pi_O}(\vec{\lambda}) &\ge U(\vec{\lambda},\delta) - \delta \sum_{k=1}^{n} G_k - \frac{M}{2V}\\
&\ge \Omega_{\max}(\vec{\lambda}) - \frac{\epsilon}{2} - \frac{M}{2V} \quad \text{(from Theorem 1)}
\end{aligned} \tag{17}
$$
$$\ge \Omega_{\max}(\vec{\lambda}) - \epsilon \quad \text{if } V \ge M/\epsilon.$$
The result follows.

Finally, we comment on the role of the parameter $V$ in determining the throughput of $\pi_O$. From (17), it can be seen that if $V < M/\epsilon = n(A_{\max}^2 + 1)/\epsilon$, then no throughput guarantee can be provided for $\pi_O$. Note that $A_{\max}$ determines the burstiness of the arrival process. Thus, the minimum required value of $V$ is higher for more bursty arrival processes.

C. Computation time for $\pi(\vec{\lambda},\delta)$ and $\pi_O$

In the worst case, the cardinality of $L$ can be $2^n$ as it may contain all $n$-dimensional binary vectors. Then, $\pi(\vec{\lambda},\delta)$ can be computed by solving a linear program with $O(2^n)$ variables and $O(n)$ constraints. Thus, the time and the memory required to compute $\pi(\vec{\lambda},\delta)$ is $O(2^n)$ in the worst case. Under $\pi_O$, we need to find an $\vec{l} \in L$ that maximizes $W(\vec{Q}(t), \vec{l})$ for every $t$. Since $|L|$ is $O(2^n)$, the time required to compute the optimal service vector in each slot is also $O(2^n)$ unless some additional structure on the queueing system is assumed. We next propose two optimal policies which require polynomial computation time in every slot.

V. COMPUTATIONALLY SIMPLE OPTIMAL POLICIES

We provide a general framework for designing computationally simple policies for maximizing the throughput subject to attaining stability by considering the notion of inaccurate scheduling (Subsection V-A). We subsequently utilize this framework to design two computationally simple policies for maximizing the throughput subject to stabilizing the system (Subsections V-B, V-C). Finally, we discuss how these policies can be implemented using distributed computation (Subsection V-D).

A. Inaccurate scheduling for maximizing the throughput subject to stabilizing the system

We first describe a class of scheduling policies referred to as inaccurate scheduling.
Note that the notion of inaccurate scheduling has earlier been proposed for designing computationally simple policies for maximizing the stability region [23], [32], [34]. Our contribution here is to generalize this notion to attain the goal of maximizing the throughput subject to stabilizing the system while using simple computations.

We consider policies $\pi$ for which the state $\{\vec{Y}(t) = (\vec{Q}(t), \vec{l}^{\pi}(t))\}_{t \ge 1}$ constitutes an irreducible, aperiodic and countable Markov chain. This assumption holds when $\vec{l}^{\pi}(t)$ is computed iteratively based on $\vec{Q}(t)$ and $\vec{l}^{\pi}(t-1)$. Note that then $\{\vec{Q}(t)\}_{t \ge 1}$ may not be a Markov process.

Definition 8 ($\gamma$-inaccurate Policy): A policy $\pi_{\gamma}$ is called $\gamma$-inaccurate if in each slot $t$ it selects a service vector $\vec{l}^{\pi_{\gamma}}(t)$ such that
$$W\left(\vec{Q}(t), \vec{l}^{\pi_{\gamma}}(t)\right) \ge \max_{\vec{l} \in L,\ \vec{l} = \vec{l}(\vec{Q}(t))} W(\vec{Q}(t), \vec{l}) - X(\vec{Y}(t)), \tag{18}$$

where $X(\vec Y(t))$ is a random variable that depends on $\vec Y(t)$ (i.e., the distribution of $X(\vec Y(t))$ is determined by the current system state $\vec Y(t)$), and if $\{\vec Y(t)\}_{t \ge 1}$ has a stationary distribution then the expectation $E[X(\vec Y(t))]$ under the stationary distribution is less than or equal to $\gamma$. Any service vector that satisfies (18) is called a $\gamma$-inaccurate service vector. Note that if $\gamma$ is large, then the number of $\gamma$-inaccurate service vectors will be large, and hence the time needed to find one such service vector may be small. We show that for appropriate choices of $V$ all stable $\gamma$-inaccurate policies are $\epsilon$-throughput optimal.

Theorem 3: Let $\vec\lambda$ be any stabilizable arrival rate vector and $\gamma$ be an arbitrary $\gamma$-inaccurate policy. Then, for every $\epsilon > 0$ and $\gamma < \infty$, there exists $V_0$ such that for every $V \ge V_0$,
1) if $\{\vec Y(t)\}_{t \ge 1}$ is a positive recurrent Markov chain, then $\gamma$ is $\epsilon$-throughput optimal, and
2) if $E[X(\vec Y(t)) \mid \vec Y(t) = \vec Y] \le \gamma$ for every $\vec Y$, then $\{\vec Y(t)\}_{t \ge 1}$ is a positive recurrent Markov chain, and $\gamma$ stabilizes the system.

Now, we provide the intuition. For simplicity of explanation, we assume that $X(\vec Y) \le \gamma$ for every $\vec Y$, and hence the condition in 2) of Theorem 3 holds. We first explain why $\gamma$-inaccurate policies maximize the stability region [34]. For large queue lengths, $\max_{\vec l \in L:\ \vec l = \vec l(\vec Q)} W(\vec Q, \vec l) \gg \gamma$, and hence from (18),

$$W(\vec Q, \vec l^{\gamma}) \approx \max_{\vec l \in L:\ \vec l = \vec l(\vec Q)} W(\vec Q, \vec l).$$

Thus, $O$ and $\gamma$ select similar service vectors when the queue lengths are large. We have shown that for every stabilizable $\vec\lambda$, $O$ has a negative drift when the queue lengths are large. Thus, $\gamma$ also has a negative drift for large queue lengths. Hence, $\gamma$ stabilizes the system whenever $\vec\lambda$ is stabilizable. Incidentally, other approximate policies may also maximize the stability region. For example, any policy that satisfies (18) with $W(\vec Q(t), \vec l(t))$ and $W(\vec Q(t), \vec l)$ replaced by $W_1(\vec Q(t), \vec l(t))$ and $W_1(\vec Q(t), \vec l)$, respectively, maximizes the stability region [23, 32, 34].
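As a small illustration of condition (18), the following Python sketch computes, by exhaustive search, the gap between the best feasible weight and the weight of a candidate service vector. It assumes that the weight function has the form $W(\vec Q, \vec l) = \sum_k (Q_k - V(G_k - r_k(\vec l)))\, l_k$, which is how $W$ enters the drift bounds of this section. The helper names (`project`, `weight`, `slack`) and the toy reward function are hypothetical and chosen only for illustration.

```python
def project(l, q):
    """Return l(Q): the service vector l with empty queues switched off."""
    return tuple(lk if qk > 0 else 0 for lk, qk in zip(l, q))


def weight(q, l, reward, G, V):
    """Assumed form: W(Q, l) = sum_k (Q_k - V * (G_k - r_k(l))) * l_k."""
    r = reward(l)
    return sum((qk - V * (gk - rk)) * lk
               for qk, gk, rk, lk in zip(q, G, r, l))


def slack(q, l, L, reward, G, V):
    """Gap between the maximum of W over {c in L : c = c(Q)} and W(Q, l)."""
    feasible = [c for c in L if project(c, q) == c]
    best = max(weight(q, c, reward, G, V) for c in feasible)
    return best - weight(q, l, reward, G, V)


# Toy system (hypothetical): two queues that cannot be served together,
# unit reward for every served queue, G_k = 1 and V = 10.
L = [(0, 0), (1, 0), (0, 1)]
unit_reward = lambda l: (1, 1)
q = (5, 3)
gap = slack(q, (0, 1), L, unit_reward, G=(1, 1), V=10)
```

In this sketch a service vector satisfies (18) in the current slot whenever the realized value of $X$ is at least `gap`. The exhaustive search over $L$ is exactly the step that the policies of this section try to avoid.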
The key difference between only stabilizing the system and attaining the maximum possible throughput subject to stabilizing the system is the following: for the former it is sufficient to appropriately select the service vector when the queue lengths are large, whereas for the latter appropriate selection of service vectors is required for all values of queue lengths. Hence, it is not clear that $\gamma$ maximizes the throughput as well; we now explain why this is in fact somewhat counter-intuitive. Note that for small queue lengths $\max_{\vec l \in L:\ \vec l = \vec l(\vec Q)} W(\vec Q, \vec l)$ may be smaller than or comparable with $\gamma$. Then, (18) does not guarantee that the service vectors selected by $\gamma$ and $O$ are similar. Hence, it is not clear that $\gamma$ achieves the same throughput as $O$, which attains the maximum throughput.

We now explain why Theorem 3 holds. We argue that for a proper choice of parameters the queue lengths and the service vectors under $O$ and $\gamma$ become similar. Clearly, in the first slot, both systems have the same queue length vector, $\vec Q$. Now, note that for large $V$, $W(\vec Q, \vec l_1)$ and $W(\vec Q, \vec l_2)$ significantly differ if $\vec l_1$ and $\vec l_2$ are significantly different. Thus, due to (18), and since $W(\vec Q, \vec l^{\gamma}) \approx \max_{\vec l \in L:\ \vec l = \vec l(\vec Q)} W(\vec Q, \vec l)$, we have $\vec l^{\gamma} \approx \vec l^{O}$. Thus, the queue lengths in the next slot are also similar in both systems. Recursive use of the same argument shows that the queue lengths and the service vectors selected in each slot are similar in both systems. Thus, both policies attain similar throughput, and $\gamma$ is throughput optimal for large $V$.

Next, we prove Theorem 3.

Proof: We assume that $\vec\lambda$ is stabilizable. We define the following Lyapunov function: $f(\vec Y(t)) = \sum_{k=1}^{n} (Q_k(t))^2$. Using analysis similar to that for obtaining (8),

$$E\big[f(\vec Y(t+1)) - f(\vec Y(t)) \mid \vec Y(t)\big] \le M + 2\sum_{k=1}^{n} Q_k(t)\lambda_k - 2E\big[W(\vec Q(t), \vec l^{\gamma}(t)) \mid \vec Y(t)\big] - 2V E\Big[\sum_{k=1}^{n} \big(G_k - r_k(\vec l^{\gamma}(t))\big)\, l^{\gamma}_k(t) \,\Big|\, \vec Y(t)\Big]. \tag{19}$$

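From (18) and (19), it follows that

$$E\big[f(\vec Y(t+1)) - f(\vec Y(t)) \mid \vec Y(t)\big] \le M + 2\sum_{k=1}^{n} Q_k(t)\lambda_k + 2E\big[X(\vec Y(t)) \mid \vec Y(t)\big] - 2E\Big[\max_{\vec l \in L:\ \vec l = \vec l(\vec Q(t))} W(\vec Q(t), \vec l) \,\Big|\, \vec Y(t)\Big] - 2V E\Big[\sum_{k=1}^{n} \big(G_k - r_k(\vec l^{\gamma}(t))\big)\, l^{\gamma}_k(t) \,\Big|\, \vec Y(t)\Big].$$

Using arguments similar to those in the proof of (12) from (8), we can prove that

$$E\big[f(\vec Y(t+1)) - f(\vec Y(t)) \mid \vec Y(t)\big] \le M + 2E\big[X(\vec Y(t)) \mid \vec Y(t)\big] - 2\delta\sum_{k=1}^{n} Q_k(t) + 2V\sum_{k=1}^{n} G_k(\lambda_k + \delta) - 2V\, U(\vec\lambda,\delta) - 2V E\Big[\sum_{k=1}^{n} \big(G_k - r_k(\vec l^{\gamma}(t))\big)\, l^{\gamma}_k(t) \,\Big|\, \vec Y(t)\Big], \tag{20}$$

where $\delta$ is such that $(\vec\lambda,\delta)$ is $\epsilon/2$-throughput optimal.

1) Proof for 1): Let the process $\{\vec Y(t)\}_{t \ge 1}$ be a positive recurrent Markov chain. Then this process has a stationary distribution. Taking expectation on both sides of (20) with respect to this stationary distribution, we obtain

$$E\big[f(\vec Y(t+1)) - f(\vec Y(t))\big] \le M + 2E\big[X(\vec Y(t))\big] - 2\delta E\Big[\sum_{k=1}^{n} Q_k(t)\Big] + 2V\sum_{k=1}^{n} G_k(\lambda_k + \delta) - 2V\, U(\vec\lambda,\delta) - 2V E\Big[\sum_{k=1}^{n} \big(G_k - r_k(\vec l^{\gamma}(t))\big)\, l^{\gamma}_k(t)\Big]. \tag{21}$$

Since $\{\vec Y(t)\}_{t \ge 1}$ is a positive recurrent Markov chain,

$$E\big[l^{\gamma}_k(t)\big] = \lim_{u\to\infty} \frac{1}{u}\sum_{v=1}^{u} l^{\gamma}_k(v) = \lambda_k \ \text{ w.p. 1}, \tag{22}$$

and

$$E\Big[\sum_{k=1}^{n} r_k(\vec l^{\gamma}(t))\, l^{\gamma}_k(t)\Big] = \lim_{u\to\infty} \frac{1}{u}\sum_{v=1}^{u}\sum_{k=1}^{n} r_k(\vec l^{\gamma}(v))\, l^{\gamma}_k(v) \ \text{ w.p. 1} \ = \Omega^{\gamma}(\vec\lambda) \quad \text{(from (1))}. \tag{23}$$

From stationarity,

$$E\big[f(\vec Y(t+1))\big] = E\big[f(\vec Y(t))\big]. \tag{24}$$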

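From (21), (22), (23) and (24), and since $E[X(\vec Y(t))] \le \gamma$ (from Definition 8), it follows that

$$\Omega^{\gamma}(\vec\lambda) \ge U(\vec\lambda,\delta) - \delta\sum_{k=1}^{n} G_k - \frac{M + 2\gamma}{2V} \ge \Omega_{\max}(\vec\lambda) - \frac{\epsilon}{2} - \frac{M + 2\gamma}{2V} \quad \text{(from Theorem 1)}$$

$$\ge \Omega_{\max}(\vec\lambda) - \epsilon \quad \text{if } V \ge \frac{M + 2\gamma}{\epsilon}.$$

The result follows.

2) Proof for 2): Now, let $E[X(\vec Y(t)) \mid \vec Y(t) = \vec Y] \le \gamma$ for all $\vec Y$. Thus, from (20), for every $\gamma$ and $V \ge 0$,

$$E\big[f(\vec Y(t+1)) - f(\vec Y(t)) \mid \vec Y(t)\big] \le M + 2\gamma - 2\delta\sum_{k=1}^{n} Q_k(t) + 2V\sum_{k=1}^{n} G_k(\lambda_k + \delta).$$

Let

$$B = \Big\{\vec Y = (\vec Q, \vec l) : \vec l \in L,\ \sum_{k=1}^{n} Q_k \le \frac{V}{\delta}\sum_{k=1}^{n} G_k(\lambda_k + \delta) + \frac{M + 2\gamma + 1}{2\delta}\Big\}.$$

Thus,

$$E\big[f(\vec Y(t+1)) - f(\vec Y(t)) \mid \vec Y(t)\big] \ \begin{cases} < \infty & \text{for all } \vec Y(t), \\ < -1 & \text{if } \vec Y(t) \notin B. \end{cases}$$

Thus, since $B$ is finite, by Foster's Theorem (Theorem 2.2.3 in [20]), $\{\vec Y(t)\}_{t \ge 1}$ is positive recurrent, and the expectations of the queue lengths under its stationary distribution are finite. Hence, $\gamma$ stabilizes the system.

The main challenge in computing $\gamma$-inaccurate service vectors is that $W(\vec Q, \vec l^{O})$ may not be known, and in most cases its computation is complex. Thus, even verifying whether a given $\vec l$ is $\gamma$-inaccurate may be computationally complex. We circumvent this challenge by designing a computationally simple approach that obtains $\gamma$-inaccurate service vectors without requiring the knowledge of $W(\vec Q, \vec l^{O})$.

B. Periodic Computation of Optimal Schedule

We divide the time axis in intervals of length $T$, i.e., in intervals of the form $[KT, (K+1)T - 1]$. Let

$$\vec l^{OPT}(t) \overset{\text{def}}{=} \arg\max_{\vec l \in L:\ \vec l = \vec l(\vec Q(t))} W(\vec Q(t), \vec l).$$

We consider a policy $T$ that computes $\vec l^{OPT}(t)$ at the beginning of each interval, i.e., in the slots $KT$ for $K \ge 0$, and throughout the interval serves each selected queue while it is non-empty. The time needed to compute $T$ is $O(2^n/T)$ in the amortized sense, i.e., $\sup_{t \ge 1} \{\sum_{u=1}^{t} c(u)/t\}$ is $O(2^n/T)$ on every sample path, where $c(u)$ is the computational complexity in slot $u$ [17]. Thus, if we choose $T$ to be sufficiently large (of the order of $2^n$), then $T$ requires $O(1)$ computation time in the amortized sense. In the following lemma, we show that $X(\vec Y(t)) \le \gamma$ for all $\vec Y(t)$, where $\gamma = nT(A_{\max} + 2)$.

Lemma 1: Let $\vec Q(t)$ be the queue length vector under $T$ in slot $t$. Then,

$$W(\vec Q(t), \vec l^{T}(t)) \ge \max_{\vec l \in L:\ \vec l = \vec l(\vec Q(t))} W(\vec Q(t), \vec l) - nT(A_{\max} + 2).$$

Proof: Without loss of generality let $t \in [KT, (K+1)T - 1]$ for some $K$. Now, from (5),

$$\sum_{k=1}^{n} Q_k(t) \le \sum_{k=1}^{n} Q_k(KT) + \sum_{u=KT}^{t}\sum_{k=1}^{n} \Lambda_k(u) \le \sum_{k=1}^{n} Q_k(KT) + nTA_{\max}. \tag{25}$$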

Similarly, from (5),

$$\sum_{k=1}^{n} Q_k(t) \ge \sum_{k=1}^{n} Q_k(KT) - \sum_{u=KT}^{t}\sum_{k=1}^{n} l^{T}_k(u) \ge \sum_{k=1}^{n} Q_k(KT) - nT. \tag{26}$$

Now, from (4), (25) and (26), we obtain

$$\max_{\vec l \in L:\ \vec l = \vec l(\vec Q(t))} W(\vec Q(t), \vec l) - W(\vec Q(t), \vec l^{T}(t)) \le \sum_{k=1}^{n} \Big(Q_k(KT) - V\big(G_k - r_k(\vec l^{OPT}(t))\big)\Big)\, l^{OPT}_k(t) - \sum_{k=1}^{n} \Big(Q_k(KT) - V\big(G_k - r_k(\vec l^{T}(t))\big)\Big)\, l^{T}_k(t) + nT(A_{\max} + 1)$$

$$= W(\vec Q(KT), \vec l^{OPT}(t)) - W(\vec Q(KT), \vec l^{T}(t)) + nT(A_{\max} + 1). \tag{27}$$

Now, from (3),

$$W(\vec Q(KT), \vec l^{OPT}(t)) \le W(\vec Q(KT), \vec l^{OPT}(KT)). \tag{28}$$

Also, the service vector selected by $T$ changes in the interval only if some queues empty during the period (and then the change is to not serve them). Hence $\vec l^{T}(t) \le \vec l^{OPT}(KT)$, and if $l^{T}_k(t) < l^{OPT}_k(KT)$ then $Q_k(KT) \le T$. Thus, $r_k(\vec l^{T}(t)) \ge r_k(\vec l^{OPT}(KT))$ for all $k$ for which $Q_k(t) > 0$. Hence,

$$W(\vec Q(KT), \vec l^{T}(t)) \ge W(\vec Q(KT), \vec l^{OPT}(KT)) - nT. \tag{29}$$

The result follows from (27), (28) and (29).

However, $\vec Y(t)$ is not a Markov chain under $T$. Thus, in spite of Lemma 1, $T$ is not $\gamma$-inaccurate in the sense of Definition 8. Now, $\{\vec Y(tT)\}$ is an irreducible, aperiodic Markov chain, and the framework for $\gamma$-inaccurate scheduling can be generalized to such cases. We omit this generalization for brevity. Nevertheless, using Lemma 1 and a proof similar to that for Theorem 3, we can prove that when $\vec\lambda$ is stabilizable, $T$ is $\epsilon$-throughput optimal for every $\epsilon > 0$. We formally state this in the following theorem, and prove it in the appendix.

Theorem 4: Let $\vec\lambda$ be any stabilizable arrival rate vector. Then, for every $\epsilon > 0$, there exists $V_0$ such that for every $V \ge V_0$ the policy $T$ is $\epsilon$-throughput optimal.

The main challenge in using $T$ is that it needs to periodically compute the optimal service vector. Since the time required in each such computation is exponential in $n$, such computations may become infeasible for large $n$. We next propose an optimal randomized policy which requires $O(n)$ computation time in every slot.

C. Optimal Randomized Policy ($R$)

We now propose a randomized policy $R$ which has been inspired by a randomized policy proposed by Tassiulas [34]. The policy in [34] attains the maximum possible stability region in a constrained queueing network using linear-time computations in each slot. Our contribution here is to show that linear-time computable randomized policies can also maximize the throughput subject to stabilizing the system. We now describe $R$.
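In every slot $t \ge 0$, $R$ generates a service vector $\vec l(t)$ randomly among all service vectors $\vec l \in L$ such that $\vec l(\vec Q(t)) = \vec l$, as per a distribution $P_{\vec Q(t)}$. In every slot $t \ge 1$, once a random vector is generated as above, $R$ obtains $\vec l^{R}(t)$ iteratively as follows:

$$\vec l^{R}(t) = \begin{cases} \vec l(t) & \text{if } W\big(\vec Q(t), (\vec l^{R}(t-1))(\vec Q(t))\big) < W(\vec Q(t), \vec l(t)), \\ (\vec l^{R}(t-1))(\vec Q(t)) & \text{otherwise.} \end{cases}$$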
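The iteration above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the candidate is drawn with the fair-coin rule mentioned later in this subsection (each non-empty queue is picked w.p. 1/2, and no queue is served if the pick is not in $L$), the set $L$ is assumed to contain the all-zero vector and to remain valid when empty queues are switched off, and the names `project`, `sample_candidate`, `step_R` and the toy weight function are hypothetical.

```python
import random


def project(l, q):
    """Return l(Q): the service vector l with empty queues switched off."""
    return tuple(lk if qk > 0 else 0 for lk, qk in zip(l, q))


def sample_candidate(q, L, rng):
    """Pick each non-empty queue w.p. 1/2; serve nothing if the pick is not in L."""
    pick = tuple(1 if qk > 0 and rng.random() < 0.5 else 0 for qk in q)
    return pick if pick in L else tuple(0 for _ in q)


def step_R(q, l_prev, L, weight, rng):
    """One slot of the rule: switch to the candidate only if it has larger weight."""
    candidate = sample_candidate(q, L, rng)
    kept = project(l_prev, q)
    return candidate if weight(q, kept) < weight(q, candidate) else kept


# Toy run (hypothetical): two queues that cannot be served together, and a
# queue-length-only weight standing in for W.
L = {(0, 0), (1, 0), (0, 1)}
toy_weight = lambda q, l: sum(qk * lk for qk, lk in zip(q, l))
rng = random.Random(0)
q, l = (4, 7), (0, 0)
for _ in range(20):
    l = step_R(q, l, L, toy_weight, rng)
```

Each call does work linear in $n$ apart from the membership test and the two weight evaluations, so the per-slot cost depends on how cheaply membership in $L$ and $W$ can be evaluated in the system at hand.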

Thus, in any slot, $R$ uses a new service vector only when it increases the value of $W(\cdot)$; otherwise it continues with the service vector used in the previous slot. It is interesting to observe that the randomized policy proposed by Tassiulas [34], which maximizes the stability region using linear computation time in each slot, uses a new service vector only when it increases the value of $W_1(\cdot)$. Note that the distribution $P_{\vec Q(t)}$ may depend on the current queue length vector. We only consider distributions $P_{\vec Q}$ such that for every $\vec Q$, $P_{\vec Q}(\vec l^{OPT}) \ge \mu$ for some $\mu > 0$.

Lemma 2: Let $\vec\lambda$ be a stabilizable arrival rate vector. Then $\{\vec Y(t) = (\vec Q(t), \vec l^{R}(t))\}_{t \ge 1}$ is a positive recurrent Markov chain, and $R$ stabilizes the system.

We prove Lemma 2 in the appendix. Now, we show that $R$ is $\frac{n}{\mu}(A_{\max} + 2)$-inaccurate.

Lemma 3: Let $\vec Q(t)$ be the queue length vector under $R$ in slot $t$. Then, for any initial distribution of $\vec Y(t)$,

$$E\Big[\max_{\vec l \in L:\ \vec l = \vec l(\vec Q(t))} W(\vec Q(t), \vec l) - W(\vec Q(t), \vec l^{R}(t))\Big] \le \frac{n}{\mu}(A_{\max} + 2).$$

Proof: Since $P_{\vec Q}(\vec l^{OPT}) \ge \mu$ for every $\vec Q$, $\vec l^{R}(t) = \vec l^{OPT}(t)$ infinitely often w.p. 1. Let $\{\kappa_K\}_{K \ge 1}$ be the slots in which $\vec l^{R}(t) = \vec l^{OPT}(t)$. Again, since $P_{\vec Q}(\vec l^{OPT}) \ge \mu$ for every $\vec Q$, $E[\kappa_{K+1} - \kappa_K] \le 1/\mu$. Consider the $K$ for which $t \in [\kappa_K, \kappa_{K+1} - 1]$. As in Lemma 1, we obtain

$$\max_{\vec l \in L:\ \vec l = \vec l(\vec Q(t))} W(\vec Q(t), \vec l) - W(\vec Q(t), \vec l^{R}(t)) \le n(\kappa_{K+1} - \kappa_K)(A_{\max} + 2).$$

Thus, the result follows since $E[\kappa_{K+1} - \kappa_K] \le 1/\mu$.

Now, from part 1 of Theorem 3 and Lemmas 2 and 3, it follows that $R$ is $\epsilon$-throughput optimal for any $\mu > 0$. We formally state this in the following theorem.

Theorem 5: Let $\vec\lambda$ be a stabilizable arrival rate vector. Then, for every $\epsilon > 0$, there exists $V_0$ such that for every $V \ge V_0$ the policy $R$ is $\epsilon$-throughput optimal.

Now, if $\mu = 2^{-n}$, $R$ can be computed in $O(n)$ time in each slot: each non-empty queue can be selected w.p. 1/2, and if the resulting vector is not in $L$, then no queue is served.

D. Distributed Implementation of $R$ and $T$

Distributed scheduling can be defined in different ways. One definition is to consider a policy as distributed if each node selects its action based on its observation, its state, and the information it acquires by exchanging messages with its neighbors.
Such policies are then evaluated on the basis of their performance and the frequency and the amount of message exchange. Another definition is to consider a policy as distributed if each node selects its action based on its observation, its state, and the states and actions of nodes in a certain neighborhood.

We first describe how $R$ and $T$ can be implemented as per the first definition. The time axis can be divided in periods of length $T$. Each node can broadcast its queue length at the beginning of every period. The period length $T$ should be selected so that the broadcasts in a period reach other nodes in the same period. For executing $T$, each node computes the optimal service vector at the beginning of every period based on the broadcasts it receives in the previous period. For executing $R$, each node randomly selects a service vector at the beginning of each period, subsequently chooses between the service vectors selected in the current and previous periods based on the broadcasts it receives in the previous period, and finally uses the chosen service vector throughout the period. All nodes use the same seeds in the random number generators and therefore obtain the same random selections. For both policies, each node's computations depend on the queue lengths of other nodes in the previous period. Theorems 4 and 5 still hold. The message exchange complexity can be made arbitrarily small in both cases by increasing $T$.

Determination of an optimal policy which is distributed as per the second definition of distributed scheduling remains open. Note that the design of such scheduling policies in the precursor problem, that of maximizing the stability region, is still not completely understood, although some illuminating results have been obtained recently

[10, 29, 36]. We hope that the optimality results in this paper and the recent advances in the context of distributed scheduling will motivate further exploration of the above open problem. Finally, Ross et al. have obtained local search based policies, which are likely to be computationally simple in practice, for maximizing the stability region of certain classes of constrained queueing networks [31]. It will be interesting to determine whether the throughput can be maximized subject to stabilizing the system using similar local search policies, and how the computation time required by the $\gamma$-inaccurate policies we propose compares with that of the resulting local search policies.

VI. DISCUSSIONS AND GENERALIZATIONS

We now generalize our framework so as to obtain optimal policies when some of the assumptions made in Section II do not hold. First, we have so far assumed that a packet is discarded only after it is transmitted. We discuss how our framework can be generalized to allow a queue to discard some or all packets before transmitting them, and examine the advantages and disadvantages of this option (Subsection VI-A). We next describe how our framework can be generalized to accommodate random rewards and random sets of valid service vectors $L$ (Subsection VI-B). Finally, we discuss how $L$ and the reward functions can be chosen so as to attain certain performance goals in an important application domain for this framework, that of wireless networks (Subsection VI-C).

A. Discarding packets before transmission

In Section II, we have assumed that each packet is discarded from its queue only after it is served once. However, in practice, a packet may be discarded from its queue even before it is served. The availability of this option enlarges the stability region, and its judicious use increases the throughput. For example, in Example 1 in Section III, when $\vec\lambda = (1 - \epsilon, 1 - \epsilon)$ where $\epsilon$ is a small positive number, $\Omega_{\max}(\vec\lambda) \approx 4$ ($\Omega_{b} \approx 4$). Now, if $S_2$ can discard packets before serving them, there is a stable policy that attains a throughput close to 5. But, clearly, indiscriminate use of this option substantially reduces the throughput.
We now show that appropriate augmentation of $L$ allows us to design policies that attain the maximum possible throughput in the presence of this option. Let $O$ be the original system that does not allow packets to be discarded before transmission, and let $\hat O$ be the new system which allows the above. In $\hat O$ a queue is said to be served when a packet is removed from its queue. The service vectors in $\hat O$ have $2n$ 0-1 components. The first $n$ components denote which queues are being served, and the remaining components denote whether the packets from the queues that are being served are transmitted or discarded before transmission.

We obtain the set $\hat L$ of valid service vectors of $\hat O$ from the corresponding set $L$ of $O$ as follows. Let $\vec l \in L$ and let $\vec l$ have $i$ zero components, where $0 \le i \le n$. Now, $\vec l$ corresponds to $2^i$ service vectors in $\hat L$, and each of these service vectors (a) transmits packets from the queues $\vec l$ was serving in $O$ and (b) discards packets from a certain (possibly empty) subset of the queues which $\vec l$ was not serving in $O$. Note that the set of queues $\vec l$ was not serving in $O$ has $2^i$ subsets. Thus, the number of service vectors generated by $\vec l$ is $2^i$. Let $\hat{\vec l}_1$ be one such service vector generated by $\vec l$. Since $\vec l$ and $\hat{\vec l}_1$ transmit packets from the same queues, $r_k(\hat{\vec l}_1) = r_k(\vec l)$ for each $k \in \{1, \ldots, n\}$.

The stability region of $\hat O$ is a (possibly improper) superset of that of $O$. This is because as long as the arrival rate of a queue is less than 1, it can be stabilized in $\hat O$ by simply discarding all its packets before transmission. Thus, the stability region of $\hat O$ is a superset of $\{\vec\lambda : 0 \le \lambda_i < 1,\ i = 1, \ldots, n\}$ and a subset of $\{\vec\lambda : 0 \le \lambda_i \le 1,\ i = 1, \ldots, n\}$. For any $\vec\lambda$ that is stabilizable in $O$, the maximum throughput of a stable policy in $O$ is less than or equal to the maximum throughput of a stable policy in $\hat O$. This is because every policy in $O$ is a valid policy in $\hat O$, since for each $\vec l \in L$ there exists $\hat{\vec l} \in \hat L$ that does not discard packets from any queue before transmission, and transmits packets from the same queues which $\vec l$ serves.

Note that the policies $O$, $\gamma$, $T$ and $R$ can be defined in $\hat O$ similarly to the way they are defined in $O$; the only difference is that $L$ must be substituted by $\hat L$.
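The construction of $\hat L$ from $L$ can be made concrete with a short Python sketch. The encoding of the last $n$ components is an assumption made here for illustration (a 1 in position $n + k$ marks queue $k$ as discarding its packet rather than transmitting it), and the function names `augment` and `transmitted` are hypothetical.

```python
from itertools import product


def augment(l):
    """Generate the 2**i augmented vectors for a service vector l with i zeros.

    Output layout (assumed): first n entries = queues served (by transmission
    or by discarding), last n entries = queues whose packet is discarded.
    """
    n = len(l)
    idle = [k for k in range(n) if l[k] == 0]
    augmented = []
    for flags in product((0, 1), repeat=len(idle)):
        discard = [0] * n
        for k, flag in zip(idle, flags):
            discard[k] = flag
        served = [lk | dk for lk, dk in zip(l, discard)]
        augmented.append(tuple(served) + tuple(discard))
    return augmented


def transmitted(l_hat):
    """Recover the queues that actually transmit: served but not discarded."""
    n = len(l_hat) // 2
    return tuple(s & (1 - d) for s, d in zip(l_hat[:n], l_hat[n:]))


vectors = augment((1, 0, 0))  # one transmitting queue, two idle queues
```

Every vector returned for a given $\vec l$ transmits from the same queues as $\vec l$, which is why the rewards $r_k$ are unchanged by the augmentation.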
The performance guarantees for these generalized versions in $\hat O$, i.e., Theorems 1 to 5, are the same as those for $O$. However, note that the higher throughput and the larger stability region are attained in $\hat O$ at the cost of fairness. Specifically, for any $\vec\lambda$ that is stabilizable in $O$, if an $\epsilon$-throughput-optimal policy in $\hat O$ attains a throughput which is higher than that of an $\epsilon$-throughput-optimal policy in $O$, then the former discards packets before transmission from some queues. Thus, it attains the higher throughput by being unfair to some queues. Also, in communication networks, in the presence of this option, some receivers may only receive a small fraction of the packets transmitted by the corresponding

sources, which will in turn prevent them from successfully decoding the transmitted information. Thus, this option is not likely to be widely used (refer to Section VI-C).

B. Random rewards and random $L$

We have so far assumed that the reward received by the $k$th queue in slot $t$ is completely determined by the service vector chosen in $t$. We now allow the rewards to be random variables (r.v.s) that depend on an external random component in addition to the service vector (Subsection VI-B). This generalization is relevant in the context of wireless networks, where the success of a transmission is a random event whose probability depends on the fading state of the channels. Thus, in one-to-many or one-to-one communication the reward is a r.v. whose distribution depends on the service vector and the channel fading state between each sender-receiver pair. We generalize $O$, $\gamma$, $T$ and $R$ so as to maximize the throughput subject to stabilizing the system in the presence of random rewards.

We first formally describe the generalization. We consider a random process $\{S(t)\}_{t \ge 1}$ which in any slot is in state $S_z$ with probability $b_z$, $z = 1, 2, \ldots, Z$, independent of its state in any other slot and also independent of the arrival process in any slot. Here, $b_z > 0$ for each $z \in \{1, \ldots, Z\}$. The policy knows $S(t)$ at the beginning of slot $t$. The reward received by the $k$th queue when $\vec l$ is the service vector and the process $S(t)$ is in state $S$ is a random variable, $R_k(\vec l, S)$, whose distribution depends on $\vec l$ and $S$. Let $r_k(\vec l, S) \overset{\text{def}}{=} E[R_k(\vec l, S)] \le G_k$ for every $\vec l$ and $S$. We assume that $R_k(\vec l, S) = 0$ if $l_k = 0$, and $r_k(\vec l_1, S) \ge r_k(\vec l_2, S)$ if $\vec l_1 \le \vec l_2$ and $l_{1k} > 0$. Thus, the throughput $\Omega(\vec\lambda)$ under a policy and arrival rate vector $\vec\lambda$ is

$$\Omega(\vec\lambda) = \liminf_{t\to\infty} \frac{1}{t}\sum_{u=1}^{t}\sum_{k=1}^{n} R_k(\vec l(u), S(u)) = \liminf_{t\to\infty} \frac{1}{t}\sum_{u=1}^{t}\sum_{k=1}^{n} R_k(\vec l(u), S(u))\, l_k(u). \tag{30}$$

Finally, when the arrival rate vector is $\vec\lambda$, the maximum throughput of any stable policy is $\Omega_{\max}(\vec\lambda)$. We next elucidate the above formalisms with a specific example.

Example 4: In Fig. 1 assume that the channel to each receiver is in good (bad, resp.) state w.p. 0.8 (0.2, resp.), and each receiver can decode the packet w.p. 0.9 (0.2, resp.) when its channel is in good (bad, resp.)
state and it is not in the range of any other sender that is transmitting packets. The state of a channel in a slot is independent of that in other slots and also independent of the states of other channels in any slot. In each slot, the system knows the states of all channels, but does not know whether a receiver can decode the packet its sender transmits. Thus, the system has 64 states corresponding to different combinations of channel states. Now, if $\vec l = \vec l_2$ ($\vec l = \vec l_4$, resp.), $R_1(\vec l, S)$ equals the number of receivers in the set $\{R_1, \ldots, R_5\}$ ($\{R_3, R_4, R_5\}$, resp.) that can decode the packet $S_1$ transmits, and if $\vec l \in \{\vec l_1, \vec l_3\}$, $R_1(\vec l, S) = 0$. Next, $R_2(\vec l, S) = 1$ if $l_2 = 1$ and $R_6$ can decode the packet, and $R_2(\vec l, S) = 0$ otherwise. Thus, $R_1(\vec l, S)$ and $R_2(\vec l, S)$ are random variables whose distributions depend on $\vec l$ and $S$. For example, $r_1(\vec l_2, S) = 4.5$ if $S$ is such that the channels to $R_1, \ldots, R_5$ are in good state, and $r_2(\vec l, S) = 0.9$ if $l_2 = 1$ and $S$ is such that the channel to $R_6$ is in good state.

First, note that since $L$ does not depend on $S(t)$, the stability region of the system remains the same. Now, we present the optimality results. We first describe how the policy $(\vec\lambda,\delta)$ can be generalized. In any slot $t$ in which $S(t) = S_z$, the generalized policy $\widehat{(\vec\lambda,\delta)}$ selects $\vec l_i$ w.p. $w_{iz}$. If $\vec l_i$ is selected in slot $t$, then the system selects the service vector $\vec l_i(\vec Q(t))$, i.e., $\vec l^{\widehat{(\vec\lambda,\delta)}}(t) = \vec l_i(\vec Q(t))$. The probability distribution $\vec w_z = [w_{1z} \cdots w_{mz}]$ for every $z = 1, \ldots, Z$ is computed using the following linear program.

$\widehat{LP}(\vec\lambda,\delta)$:
Maximize: $\hat U(\vec\lambda,\delta) = \sum_{z=1}^{Z} b_z \sum_{i=1}^{m} w_{iz} \sum_{k=1}^{n} l_{ik}\, r_k(\vec l_i, S_z)$
Subject to:
1) $\sum_{i=1}^{m} w_{iz} = 1$ for every $z \in \{1, \ldots, Z\}$
2) $w_{iz} \ge 0$ for every $i \in \{1, \ldots, m\}$ and $z \in \{1, \ldots, Z\}$
3) $\sum_{z=1}^{Z} b_z \sum_{i=1}^{m} w_{iz}\, l_{ik} = \lambda_k + \delta$ for every $k \in \{1, \ldots, n\}$.

Note that $\widehat{LP}(\vec\lambda,\delta)$ is similar to $LP(\vec\lambda,\delta)$; the only difference is that the distribution for selecting the service vectors depends on the state $S(t)$ of the system.
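To make the role of the state-dependent weights concrete, the following Python sketch evaluates the objective and checks constraints 1) to 3) of the linear program above for a given candidate $w$. It does not solve the program. The data layout (`w[z][i]`, `L[i][k]`, a reward callback `r(l, z)`), the function names and the two-state toy instance are all hypothetical.

```python
def lp_objective(b, w, L, r):
    """Objective: sum_z b_z sum_i w[z][i] sum_k L[i][k] * r(L[i], z)[k]."""
    return sum(
        b[z] * w[z][i] * sum(lk * rk for lk, rk in zip(L[i], r(L[i], z)))
        for z in range(len(b))
        for i in range(len(L))
    )


def lp_feasible(b, w, L, lam, delta, tol=1e-9):
    """Check constraints 1)-3) of the linear program for a candidate w."""
    for wz in w:
        if abs(sum(wz) - 1.0) > tol or min(wz) < -tol:
            return False
    for k in range(len(lam)):
        rate = sum(b[z] * w[z][i] * L[i][k]
                   for z in range(len(b)) for i in range(len(L)))
        if abs(rate - (lam[k] + delta)) > tol:
            return False
    return True


# Toy instance (hypothetical): two equally likely states, two queues that
# cannot be served together, reward 1 in state 0 and 0.5 in state 1.
b = (0.5, 0.5)
L = [(0, 0), (1, 0), (0, 1)]
r = lambda l, z: (1.0, 1.0) if z == 0 else (0.5, 0.5)
lam, delta = (0.3, 0.3), 0.1

w_blind = [[0.2, 0.4, 0.4], [0.2, 0.4, 0.4]]  # ignores the state
w_aware = [[0.0, 0.5, 0.5], [0.4, 0.3, 0.3]]  # serves more in the good state
u_blind = lp_objective(b, w_blind, L, r)
u_aware = lp_objective(b, w_aware, L, r)
```

Both candidates meet the same service-rate constraint, yet the state-aware one earns a larger objective value in this toy instance, which is the effect that letting the distribution depend on $S(t)$ is meant to capture.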