Network Utility Maximization over Partially Observable Markovian Channels

WIOPT

Chih-ping Li, Student Member, IEEE and Michael J. Neely, Senior Member, IEEE

Abstract: This paper considers maximizing throughput utility in a multi-user network with partially observable Markov ON/OFF channels. Instantaneous channel states are never known, and all control decisions are based on information provided by ACK/NACK feedback from past transmissions. This system can be viewed as a restless multi-armed bandit problem with a concave objective function of the time average reward vector. Such problems are generally intractable. However, we provide an approximate solution by optimizing the concave objective over a non-trivial inner bound on the network performance region, where the inner bound is constructed by randomizing well-designed stationary policies. Using a new frame-based Lyapunov drift argument, we design a policy of admission control and channel selection that stabilizes the network with throughput utility that can be made arbitrarily close to the optimal in the inner performance region. Our problem has applications in limited channel probing in wireless networks, dynamic spectrum access in cognitive radio networks, and target tracking of unmanned aerial vehicles. Our analysis generalizes the MaxWeight-type scheduling policies in stochastic network optimization theory from time-slotted systems to frame-based systems that have policy-dependent frame sizes.

I. INTRODUCTION

This paper studies a multi-user wireless scheduling problem in a partially observable environment. We consider a base station serving $N$ users via $N$ independent Markov ON/OFF channels (see Fig. 1).

Fig. 1. The Markov ON/OFF model for channel $n \in \{1, 2, \ldots, N\}$, with states ON (1) and OFF (0) and transition probabilities $P_{n,00}$, $P_{n,01}$, $P_{n,10}$, $P_{n,11}$.

Time is slotted with normalized slots $t \in \mathbb{Z}^{+}$. Channel states are fixed in every slot, and can only change at slot boundaries. Suppose the base station has unlimited data to send for all users. In every slot, the channel states are unknown, and the base station selects at most one user to which it blindly transmits a packet.
The transmission succeeds if the used channel is ON, and fails otherwise. At the end of a slot, an error-free ACK is fed back from the served user to the base station over an independent control channel (absence of an ACK is considered as a NACK). Since channels are ON/OFF and correlated over time, the ACK/NACK feedback provides partial information of future channel states, and can improve future scheduling decisions for better performance.

Author note: Chih-ping Li (web: http://www-scf.usc.edu/~chihpinl) and Michael J. Neely (web: http://www-rcf.usc.edu/~mjneely) are with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA. This material is supported in part by one or more of the following: the DARPA IT-MANET grant W911NF , the NSF Career grant CCF , and the Network Science Collaborative Technology Alliance sponsored by the U.S. Army Research Laboratory.

The goal is to design a control policy that maximizes a concave utility function of the throughput vector from all channels. Specifically, let $y_n(t)$ be the number of packets delivered to user $n \in \{1, \ldots, N\}$ in slot $t$. Define

$$\bar{y}_n \triangleq \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}[y_n(\tau)]$$

as the throughput of user $n$. Let $\Lambda$ denote the network capacity region, defined as the closure of the set of all achievable throughput vectors $\bar{y} \triangleq (\bar{y}_n)_{n=1}^{N}$. The goal is to solve:

$$\text{maximize: } g(\bar{y}), \quad \text{subject to: } \bar{y} \in \Lambda, \tag{1}$$

where $g(\cdot)$ is a concave, continuous, nonnegative, and nondecreasing function.

The interest in the above problem comes from its many applications. One application is limited channel probing over wireless networks. Consider the same wireless downlink as stated above, except that at most one channel is explicitly probed in every slot. A packet is served over the probed channel if the state is ON. This setup is essentially the same as our original problem, except that channels are probed differently (implicit probing by ACK/NACK feedback versus probing by explicit signaling).
The motivation for studying limited channel probing is that, in a fast-changing environment where full channel probing may be infeasible due to timing overhead, we shall probe channels wisely and exploit channel memory to improve network performance. As an example of (1), we may additionally provide fairness to all users, such as a variant of rate proportional fairness [1], [2], by solving:

$$\text{maximize: } g(\bar{y}) = \sum_{n=1}^{N} \log(1 + \bar{y}_n), \quad \text{subject to: } \bar{y} \in \Lambda. \tag{2}$$

In cognitive radio networks [3], [4], a secondary user has access to a collection of Markov ON/OFF channels. Every channel reflects the occupancy of a spectrum by a primary user, and the secondary user opportunistically transmits data over unused spectrums for better spectrum efficiency. In target tracking of unmanned aerial vehicles (UAVs) [5], a UAV detects one of the many targets in every slot. Every Markov channel reflects the movement of a target; a channel is ON if its associated target moves to a hotspot, and OFF otherwise. Delivering a packet over a channel represents gaining a reward by locating a target at its hotspot. In the above two applications, possible goals include maximizing a weighted sum $g(\bar{y}) = \sum_{n=1}^{N} c_n \bar{y}_n$ of throughputs/time-average rewards, where $c_n$ are given constants, or providing fairness to different spectrums/targets by solving (2).

The problem (1) is challenging because the current information available for each channel depends on the past transmission decisions. This problem belongs to the class of restless multi-armed bandit (RMAB) problems [6], which are generally intractable [7]. In addition, the network capacity region $\Lambda$ in (1) does not seem to have a closed form expression (see [8] for more discussions). Therefore we must resort to approximation methods to solve (1).

In this paper, we propose an achievable region approach to construct an approximate solution to (1). There are two steps: (i) we construct a good inner performance region $\Lambda_{\text{in}} \subset \Lambda$ for the original problem, then (ii) we solve the constrained problem:

$$\text{maximize: } g(\bar{y}), \quad \text{subject to: } \bar{y} \in \Lambda_{\text{in}}, \tag{3}$$

which serves as an approximation to the original problem (1). In previous work [8], we have constructed a non-trivial inner performance region $\Lambda_{\text{in}}$ using the rich structure of Markov channels (see Section III for details). The inner performance region $\Lambda_{\text{in}}$ is rendered as a convex hull of performance vectors of a well-designed collection of round robin policies. The tightness of the inner region $\Lambda_{\text{in}}$ (see Fig. 2 for an example) is analyzed in [8] when channels are statistically identical. In this special case we show that the gap between the boundary of the inner region $\Lambda_{\text{in}}$ and that of the full performance region $\Lambda$ decreases to zero geometrically fast as the reference direction moves closer to the 45-degree angle.$^1$

The main contribution of this paper is with respect to the second step of the achievable region approach: Given an inner performance region $\Lambda_{\text{in}}$, we construct a policy that solves (3) using Lyapunov drift theory.
Lyapunov drift theory was originally developed for throughput optimal control over time-slotted wireless networks [9], [10], later extended to optimize various performance objectives such as average power [11] or rate utility functions [1], [12], [13] in wireless networks, and recently generalized to optimize dynamic systems that have a renewal structure. The intuition is the following. Since the performance region $\Lambda_{\text{in}}$ is a convex hull of performance vectors of the round robin policies we design, the problem (3) is solved by some random mixture of these policies. Using Lyapunov drift theory (see more details in Section IV), we greedily construct a sequence of round robin policies whose long-term time sharing can approximate the optimal solution as closely as desired, with some tradeoffs discussed later.

$^1$ We remark that the tightness of the inner region $\Lambda_{\text{in}}$ is difficult to check in general cases, although the region is intuitively large by the nature of its construction. The bottom line is that constructing an intuitively good and easily achievable inner performance region is of practical interest, because satisfying performance outside the inner region may inevitably involve solving much more complicated partially observable Markov decision processes. From this view, in intractable RMAB problems, we may regard an inner performance region as an operational performance region, which shall be gradually improved by a deeper investigation into the problem structure.

Our control policy that solves (3) has two components. To facilitate the solution to (3), we keep an infinite-capacity queue for every user at the base station, and design an admission control algorithm that admits data into the queues for eventual transmissions. In every slot, the amount of data admitted for every user is decided by solving a simple convex program.$^2$ In addition, the base station deploys a sequence of round robin policies implemented frame by frame, where every frame is one round of execution by a round robin policy.
The round robin policy used in every frame is selected by maximizing an average drift minus reward ratio over the average frame size (c.f. (18)). We emphasize that this new ratio rule generalizes the MaxWeight-type policies [1], [10] for stochastic network optimization from time-slotted wireless networks to frame-based systems in which the distribution of the random frame size is policy-dependent.

We prove that the above policy of admission control and channel scheduling yields a throughput vector $\bar{y}$ satisfying

$$g(\bar{y}) \geq g(\bar{y}^{*}) - \frac{B}{V}, \tag{4}$$

where $g(\bar{y}^{*})$ is the optimal objective of problem (3), $B > 0$ is a finite constant, $V > 0$ is a predefined control parameter, and we temporarily assume that all limits exist. By choosing $V$ sufficiently large in (4), the performance utility $g(\bar{y})$ can be made arbitrarily close to the optimal $g(\bar{y}^{*})$, with the tradeoff that the average queue size at the base station grows linearly with $V$. We remark that the proof of (4) does not require the knowledge of the optimal utility $g(\bar{y}^{*})$.

In the literature, stochastic utility maximization over wireless networks is solved in [1], [12], assuming that channel states are i.i.d. over slots and are known perfectly and instantly. Limited channel probing in wireless networks is studied in different contexts in [17]-[22], also assuming that channel states are i.i.d. over time. This paper generalizes the framework in [1] to wireless networks with limited channel probing and time-correlated channels, and uses channel memory to improve performance. RMAB problems with Markov ON/OFF projects have previously been studied for the maximization of the sum of time average or discounted rewards. In particular, prior work shows that greedy round robin policies are optimal in some special cases; we modify these policies in [8] for the construction of a tractable inner performance region $\Lambda_{\text{in}}$. Index policies such as Whittle's index [6] are constructed in [26], [27], and are shown to have good performance by simulations. A $(2 + \epsilon)$-approximate algorithm is derived in [28] based on duality methods.
This paper provides a new mathematical programming method for optimizing nonlinear objective functions of time average rewards in RMAB problems. In the literature RMAB problems are mostly studied with linear objective functions. The two popular methods for linear RMABs, Whittle's index [6] and (partially observable) Markov decision theory [29], seem difficult to apply to nonlinear RMABs because they are based on dynamic programming ideas. Extensions of our new method in this paper to other RMAB problems with general project state space are left for future research.

$^2$ The admission control decision decouples into separable one-dimensional convex programs that are easily solved in real time when the throughput utility $g(\bar{y})$ is a sum of one-dimensional utility functions.
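As a hedged illustration of footnote 2, consider the separable case in which each user's admission decision maximizes $V g_n(r) - Q_n r$ over $r \in [0, 1]$ (the per-user program described in Section IV-A). The sketch below assumes the logarithmic utility $g_n(r) = \log(1 + r)$ from (2); the function names `admit_log_utility` and `objective` and the sample values of $V$ and $Q_n$ are illustrative assumptions, not part of the paper.

```python
import math


def admit_log_utility(V, Q):
    """Maximize V*log(1 + r) - Q*r over r in [0, 1].

    Assumes g_n(r) = log(1 + r). The objective is concave, so the
    stationary point r = V/Q - 1 is clipped to the interval [0, 1].
    """
    if Q <= 0:
        # No backlog penalty: the objective is increasing, so admit the maximum.
        return 1.0
    return min(1.0, max(0.0, V / Q - 1.0))


def objective(V, Q, r):
    """Per-user admission objective V*g_n(r) - Q*r with g_n(r) = log(1 + r)."""
    return V * math.log(1.0 + r) - Q * r


if __name__ == "__main__":
    V = 10.0  # hypothetical control parameter
    for Q in (0.0, 4.0, 8.0, 25.0):  # hypothetical backlogs
        r = admit_log_utility(V, Q)
        print(f"Q={Q:5.1f}  r={r:.3f}  objective={objective(V, Q, r):.3f}")
```

Under these assumptions a larger backlog pushes the admitted amount toward zero, while a larger $V$ admits more data, which is the tradeoff described for (4).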

In the rest of the paper, the detailed network model is in the next section. Section III introduces the inner performance region $\Lambda_{\text{in}}$ constructed in [8]. Our dynamic control policy is motivated and given in Section IV, followed by performance analysis.

II. DETAILED NETWORK MODEL

Beside the network model introduced in the previous section, we suppose that every Markov ON/OFF channel $n \in \{1, \ldots, N\}$ changes states at slot boundaries by the transition probability matrix

$$P_n = \begin{bmatrix} P_{n,00} & P_{n,01} \\ P_{n,10} & P_{n,11} \end{bmatrix},$$

where state ON is represented by 1 and OFF by 0, and $P_{n,ij}$ denotes the transition probability from state $i$ to $j$. Assume that the matrices $P_n$ are known. We assume that every channel is positively correlated over time, so that an ON state is more likely followed by the same state. An equivalent mathematical definition is $P_{n,01} + P_{n,10} < 1$ for all $n$.

We suppose that every user has a higher-layer data source of unlimited packets at the base station. The base station keeps a network-layer queue $Q_n(t)$ of infinite capacity for every user $n \in \{1, \ldots, N\}$, where $Q_n(t)$ denotes the backlog for user $n$ in slot $t$. In every slot, the base station admits $r_n(t) \in [0, 1]$ packets for user $n$ from its data source into queue $Q_n(t)$. For simplicity, we assume that $r_n(t)$ takes real values in $[0, 1]$ for all $n$.$^3$ Let $\mu_n(t) \in \{0, 1\}$ denote the service rate for user $n$ in slot $t$. The queueing process $\{Q_n(t)\}_{t=0}^{\infty}$ of user $n$ evolves as

$$Q_n(t+1) = \max[Q_n(t) - \mu_n(t), 0] + r_n(t). \tag{5}$$

Initially $Q_n(0) = 0$ for all $n$. We say queue $Q_n(t)$ is (strongly) stable if its limiting average backlog is finite, i.e.,

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}[Q_n(\tau)] < \infty.$$

The network is stable if all queues $(Q_1(t), \ldots, Q_N(t))$ are stable. Clearly a sufficient condition for stability is:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{n=1}^{N} \mathbb{E}[Q_n(\tau)] < \infty. \tag{6}$$

Our goal is to design a policy that admits the right amount of data into the network and serves them properly by channel scheduling, so that the network is stable with throughput utility that can be made arbitrarily close to the optimal solution to the problem (3).

$^3$ We can accommodate the integer-value assumption of $r_n(t)$ by introducing auxiliary queues; see [1] for an example.

III. A PERFORMANCE INNER BOUND

This section presents an inner performance region $\Lambda_{\text{in}}$ constructed in previous work [8] using randomized round robin policies; see [8] for details. For every channel $n \in \{1, \ldots, N\}$, let $P^{(k)}_{n,ij}$ denote the $k$-step transition probability from state $i$ to $j$, and $\pi_{n,\text{ON}}$ be the stationary probability of state ON. We define the information state for user $n$ in slot $t$, denoted by $\omega_n(t)$, as the conditional probability that channel $n$ is ON in slot $t$ given all past channel observations. Namely,

$$\omega_n(t) \triangleq \Pr[s_n(t) = \text{ON} \mid \text{channel observation history}],$$

where $s_n(t)$ denotes the state of channel $n$ in slot $t$. Conditioning on the most recent channel observation, we observe that $\omega_n(t)$ takes values in the countably infinite set $\mathcal{W}_n \triangleq \{P^{(k)}_{n,01}, P^{(k)}_{n,11} : k \in \mathbb{N}\} \cup \{\pi_{n,\text{ON}}\}$. The information state vector $(\omega_n(t))_{n=1}^{N}$ is a sufficient statistic [29]; it is optimal to act based only on the $(\omega_n(t))_{n=1}^{N}$ information. Let $n(t)$ denote the channel observed in slot $t$ via ACK/NACK feedback. The probability $\omega_n(t)$ on channel $n \in \{1, \ldots, N\}$ evolves as:

$$\omega_n(t+1) = \begin{cases} P_{n,01}, & \text{if } n = n(t),\ s_n(t) = \text{OFF} \\ P_{n,11}, & \text{if } n = n(t),\ s_n(t) = \text{ON} \\ \omega_n(t) P_{n,11} + (1 - \omega_n(t)) P_{n,01}, & \text{if } n \neq n(t). \end{cases} \tag{7}$$

A. Randomized round robin policy

Let $\Phi$ denote the set of all nonzero $N$-dimensional binary vectors. Every vector $\phi \triangleq (\phi_n)_{n=1}^{N} \in \Phi$ represents a collection of active channels, where we say channel $n$ is active if $\phi_n = 1$. Let $M(\phi)$ denote the number of ones (or active channels) in $\phi$. Consider the next dynamic round robin policy $\text{RR}(\phi)$ that serves active channels in $\phi$, possibly with different order in different rounds. This is the building block of randomized round robin policies that we will introduce shortly.

Dynamic Round Robin Policy $\text{RR}(\phi)$:
1) In every round, we assume an ordering of active channels in $\phi$ is given.
2) When switching to an active channel $n$: With probability $P^{(M(\phi))}_{n,01} / \omega_n(t)$, we keep transmitting packets over channel $n$ until a NACK is received, after which we switch to the next active channel according to the predefined ordering.
Otherwise, we transmit over channel $n$ a dummy packet with no information content for one slot (used for channel sensing), then switch to the next active channel.
3) Update probabilities $(\omega_n(t))_{n=1}^{N}$ by (7) in every slot.

These $\text{RR}(\phi)$ policies are carefully designed to have good and, more importantly, tractable performance. Work [24] shows that, when channels are statistically identical, serving all channels by greedy round robin policies (different from the above) maximizes the sum throughput of the network. Thus, intuitively, we get a good achievable throughput region $\Lambda_{\text{in}}$ by randomly mixing round robin policies each of which serves a different subset of channels.

Randomized Round Robin Policy RandRR:
1) In every round, pick a binary vector $\phi \in \Phi \cup \{0\}$ with some probability $\alpha_\phi$, where $\alpha_0 + \sum_{\phi \in \Phi} \alpha_\phi = 1$.
2) If a vector $\phi \in \Phi$ is selected, run policy $\text{RR}(\phi)$ for one round using the channel ordering of least recently used first. Otherwise, $\phi = 0$, idle the system for one slot. At the end of either case, go to Step 1.

We note that, in every round of a RandRR policy, a $\text{RR}(\phi)$ policy is feasible only if $P^{(M(\phi))}_{n,01} \leq \omega_n(t)$ whenever an active channel $n$ starts service (see Step 2 of the $\text{RR}(\phi)$ policy). This condition is guaranteed by serving active channels in the order of least recently used first [8, Lemma 6]. Thus all RandRR policies are feasible.$^4$ The following results present the amount of service opportunities provided by a RandRR policy to every user $n$.

Theorem 1 ([8]). (i) In every round of a RandRR policy, when policy $\text{RR}(\phi)$ is randomly chosen for service, let $L^{\phi}_n$ denote the time duration an active channel $n$ is accessed. The duration $L^{\phi}_n$ has the probability distribution:

$$L^{\phi}_n = \begin{cases} 1 & \text{with prob. } 1 - P^{(M(\phi))}_{n,01} \\ j \geq 2 & \text{with prob. } P^{(M(\phi))}_{n,01} \, (P_{n,11})^{(j-2)} \, P_{n,10} \end{cases}$$

and

$$\mathbb{E}[L^{\phi}_n] = 1 + \frac{P^{(M(\phi))}_{n,01}}{P_{n,10}}. \tag{8}$$

(ii) In the duration $L^{\phi}_n$, channel $n$ serves $(L^{\phi}_n - 1)$ packets.

Theorem 1 shows that the distribution of $L^{\phi}_n$ is independent of the information state vector $(\omega_n(t))_{n=1}^{N}$ at the start of a transmission round; it only depends on the number of channels, $M(\phi)$, chosen for service in a round. This observation implies that the transmission rounds in a RandRR policy have i.i.d. durations. Moreover, for every fixed user $n$, the number of user-$n$ packets served in a round is also i.i.d. over different rounds. This leads to the following corollary.

Corollary 1. (i) Let $T_k$ denote the duration of the $k$th transmission round in a RandRR policy. The random variables $T_k$ are i.i.d. over different $k$ with

$$\mathbb{E}[T_k] = \alpha_0 + \sum_{\phi \in \Phi} \alpha_\phi \Big( \sum_{n : \phi_n = 1} \mathbb{E}[L^{\phi}_n] \Big),$$

which is computed by conditioning on the policy chosen in a round.
(ii) Let $N_{n,k}$ denote the number of packets served for user $n$ in round $T_k$. For each fixed $n$, the random variables $N_{n,k}$ are i.i.d. over different $k$ with

$$\mathbb{E}[N_{n,k}] = \sum_{\phi : \phi_n = 1} \alpha_\phi \, \mathbb{E}[L^{\phi}_n - 1],$$

which is computed by conditioning on the $\text{RR}(\phi)$ policy that is chosen and uses channel $n$.
(iii) Because $N_{n,k}$ and $T_k$ are i.i.d. over $k$, the throughput of user $n$ under a RandRR policy is equal to $\mathbb{E}[N_{n,k}] / \mathbb{E}[T_k]$.

B. The inner performance region $\Lambda_{\text{in}}$

In this paper, we define the inner performance region $\Lambda_{\text{in}}$ in (3) as the set of all throughput vectors achieved by the class of RandRR policies. Equivalently, the inner throughput region $\Lambda_{\text{in}}$ can be viewed as a convex hull of the zero vector and the performance vectors of the subset of RandRR policies, each of which executes a fixed $\text{RR}(\phi)$ policy in every round. A closed form expression of the inner region $\Lambda_{\text{in}}$ is given in [8, Theorem 1]. An example is given next. Consider a two-user system with statistically identical channels with $P_{01} = P_{10} = 0.2$. Fig. 2 shows the tightness of the inner throughput region $\Lambda_{\text{in}}$ compared to the (unknown) full network capacity region $\Lambda$. We note that points B, A, and C in Fig. 2 maximize the sum throughput of the network in directions $(0, 1)$, $(1, 1)$, and $(1, 0)$, respectively [24]. Thus the boundary of $\Lambda$ is a concave curve connecting these points.

Fig. 2. The closeness of the inner throughput region $\Lambda_{\text{in}}$ and the network capacity region $\Lambda$ in a two-user network with statistically identical channels (axes $\lambda_1$ and $\lambda_2$; marked points A, B, C, D).

$^4$ The feasibility of RandRR policies is proved in [8] under the special case that there are no idle operations ($\alpha_0 = 0$). Using the monotonicity of the $k$-step transition probabilities $\{P^{(k)}_{n,01}, P^{(k)}_{n,11}\}$, the feasibility can be similarly proved for the extended RandRR policies considered here.

IV. NETWORK UTILITY MAXIMIZATION

A. The QRRNUM policy

Following the above discussions, the problem (3) is a well-defined convex program. Yet, solving (3) is difficult because the performance region $\Lambda_{\text{in}}$ is represented as a convex hull of $2^N$ performance vectors. The following admission control and channel scheduling policy solves (3) in a dynamic manner with low complexity.

Queue-dependent Round Robin for Network Utility Maximization (QRRNUM):

(Admission control) At the start of every round, observe the current queue backlog $Q(t) = (Q_1(t), \ldots, Q_N(t))$ and solve the convex program

$$\text{maximize: } V g(r) - \sum_{n=1}^{N} Q_n(t) \, r_n \tag{9}$$

$$\text{subject to: } r_n \in [0, 1], \ \forall n \in \{1, \ldots, N\}, \tag{10}$$

where $V > 0$ is a predefined control parameter, and vector $r \triangleq (r_n)_{n=1}^{N}$.
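Let $(r^{\text{QRR}}_n)_{n=1}^{N}$ denote the solution to (9)-(10). In every slot of the current round, admit $r^{\text{QRR}}_n$ packets into queue $Q_n(t)$ for every user $n \in \{1, \ldots, N\}$.

(Channel scheduling) At the start of every round, over all nonzero binary vectors $\phi = (\phi_n)_{n=1}^{N} \in \Phi$, let $\phi^{\text{QRR}}$ be the maximizer of the ratio

$$\frac{\sum_{n=1}^{N} Q_n(t) \, \mathbb{E}[L^{\phi}_n - 1] \, \phi_n}{\sum_{n=1}^{N} \mathbb{E}[L^{\phi}_n] \, \phi_n}, \tag{11}$$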

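where $\mathbb{E}[L^{\phi}_n]$ is given in (8). If the maximum of (11) is positive, run policy $\text{RR}(\phi^{\text{QRR}})$ for one round using the channel ordering of least recently used first. Otherwise, idle the system for one slot. At the end of either case, start a new round of admission control and channel scheduling.

When the utility function $g(\cdot)$ is a sum of individual utilities, i.e., $g(r) = \sum_{n=1}^{N} g_n(r_n)$, problem (9)-(10) decouples into $N$ one-dimensional convex programs, each of which maximizes the weighted difference $V g_n(r_n) - Q_n(t) r_n$ over $r_n \in [0, 1]$, which can be solved efficiently in real time. The most complex part of the QRRNUM policy is to maximize the ratio (11). In the following we present a bisection algorithm [16] that searches for the maximum of (11) with exponentially fast speed. This algorithm is motivated by the next lemma.

Lemma 1 ([16, Lemma 7.5]). Let $a(\phi)$ and $b(\phi)$ denote the numerator and denominator of (11), respectively. Define

$$\theta^{*} \triangleq \max_{\phi \in \Phi} \left\{ \frac{a(\phi)}{b(\phi)} \right\}, \qquad c(\theta) \triangleq \max_{\phi \in \Phi} \, [a(\phi) - \theta \, b(\phi)].$$

Then the following is true:
1) If $\theta = \theta^{*}$, then $c(\theta) = 0$.
2) If $\theta < \theta^{*}$, then $c(\theta) > 0$.
3) If $\theta > \theta^{*}$, then $c(\theta) < 0$.

The value $c(\theta)$ can be easily computed by noticing

$$c(\theta) = \max_{k \in \{1, \ldots, N\}} \left\{ \max_{\phi \in \Phi_k} \, [a(\phi) - \theta \, b(\phi)] \right\}, \tag{12}$$

where $\Phi_k \subset \Phi$ denotes the set of binary vectors having $k$ ones. The inner maximum of (12) is equal to

$$\max_{\phi \in \Phi_k} \left\{ \sum_{n=1}^{N} \left[ \frac{P^{(k)}_{n,01}}{P_{n,10}} \, (Q_n(t) - \theta) - \theta \right] \phi_n \right\},$$

which is solved by sorting the values $\frac{P^{(k)}_{n,01}}{P_{n,10}} (Q_n(t) - \theta) - \theta$.

Intuition from Lemma 1: To search for the optimal ratio $\theta^{*}$, suppose initially we know $\theta^{*} \in [\theta_{\min}, \theta_{\max}]$ for some $\theta_{\min}$ and $\theta_{\max}$. We compute the midpoint $\theta_{\text{mid}} = \frac{1}{2}(\theta_{\min} + \theta_{\max})$ and evaluate $c(\theta_{\text{mid}})$. If $c(\theta_{\text{mid}}) > 0$, we have $\theta_{\text{mid}} < \theta^{*}$ and thus $\theta^{*} \in [\theta_{\text{mid}}, \theta_{\max}]$; one such bisection operation reduces the feasible region of $\theta^{*}$ by half. By iterating the bisection, we can find $\theta^{*}$ quickly. Notice that the maximizer of $c(\theta^{*})$ is the desired policy $\phi^{\text{QRR}}$, since by definition we have $a(\phi) - \theta^{*} b(\phi) \leq 0$ for all $\phi \in \Phi$ and $a(\phi^{\text{QRR}}) - \theta^{*} b(\phi^{\text{QRR}}) = 0$.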
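The search described above can be sketched in a few lines. The block below is a minimal illustration under stated assumptions, not the authors' implementation: it evaluates $c(\theta)$ by the sorting argument in (12) and bisects on $\theta$. It uses the standard closed form for the $k$-step transition probability of a two-state Markov chain, $P^{(k)}_{n,01} = P_{n,01}\,[1 - (1 - P_{n,01} - P_{n,10})^{k}] / (P_{n,01} + P_{n,10})$, together with the mean in (8). The function names, the tolerance, and the simple starting interval $[0, \max_n Q_n(t)]$ (valid because $\mathbb{E}[L^{\phi}_n - 1] < \mathbb{E}[L^{\phi}_n]$) are assumptions made for this example.

```python
def k_step_p01(p01, p10, k):
    """k-step OFF->ON transition probability of a two-state Markov chain."""
    s = p01 + p10
    return p01 * (1.0 - (1.0 - s) ** k) / s


def ratio(phi, Q, P01, P10):
    """The ratio (11) for a nonzero binary vector phi."""
    m = sum(phi)  # M(phi), the number of active channels
    x = [k_step_p01(P01[n], P10[n], m) / P10[n] for n in range(len(Q))]  # E[L - 1]
    a = sum(Q[n] * x[n] * phi[n] for n in range(len(Q)))
    b = sum((1.0 + x[n]) * phi[n] for n in range(len(Q)))
    return a / b


def c_of_theta(theta, Q, P01, P10):
    """Return (c(theta), a maximizing phi) using the sorting argument of (12)."""
    N = len(Q)
    best_val, best_phi = None, None
    for k in range(1, N + 1):
        scores = sorted(
            ((k_step_p01(P01[n], P10[n], k) / P10[n] * (Q[n] - theta) - theta, n)
             for n in range(N)),
            reverse=True,
        )
        top = scores[:k]  # the k largest per-channel values
        val = sum(s for s, _ in top)
        if best_val is None or val > best_val:
            chosen = {n for _, n in top}
            best_val = val
            best_phi = tuple(1 if n in chosen else 0 for n in range(N))
    return best_val, best_phi


def maximize_ratio(Q, P01, P10, tol=1e-9):
    """Bisection on theta; returns (phi, ratio(phi)), within tol of the maximum ratio."""
    lo, hi = 0.0, max(Q)  # assumed bracket: 0 <= a(phi)/b(phi) <= max_n Q_n
    _, phi = c_of_theta(lo, Q, P01, P10)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        val, phi_mid = c_of_theta(mid, Q, P01, P10)
        if val > 0:
            lo, phi = mid, phi_mid  # theta* > mid, and phi_mid has ratio > mid
        else:
            hi = mid                # theta* <= mid
    return phi, ratio(phi, Q, P01, P10)
```

Each evaluation of $c(\theta)$ in this sketch sorts $N$ values for each of the $N$ candidate sizes $k$, so one bisection step costs on the order of $N^2 \log N$ operations instead of enumerating all $2^N - 1$ vectors in $\Phi$.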
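The bisection algorithm that maximizes (11): Initially, define $\theta_{\min} \triangleq 0$ and

$$\theta_{\max} \triangleq \frac{\Big( \sum_{n=1}^{N} Q_n(t) \Big) \, \max_{n \in \{1, \ldots, N\}} \left\{ \dfrac{\pi_{n,\text{ON}}}{P_{n,10}} \right\}}{1 + \min_{n \in \{1, \ldots, N\}} \left\{ \dfrac{P_{n,01}}{P_{n,10}} \right\}}$$

so that $\theta_{\min} \leq a(\phi)/b(\phi) \leq \theta_{\max}$ for all $\phi \in \Phi$. It follows that $\theta^{*} \in [\theta_{\min}, \theta_{\max}]$.$^5$ Compute $\theta_{\text{mid}} = \frac{1}{2}(\theta_{\min} + \theta_{\max})$ and $c(\theta_{\text{mid}})$. If $c(\theta_{\text{mid}}) = 0$, we have $\theta^{*} = \theta_{\text{mid}}$ and $\phi^{\text{QRR}}$ is the maximizer of $c(\theta^{*})$. When $c(\theta_{\text{mid}}) < 0$, update the feasible region of $\theta^{*}$ as $[\theta_{\min}, \theta_{\text{mid}}]$. If $c(\theta_{\text{mid}}) > 0$, update the feasible region of $\theta^{*}$ as $[\theta_{\text{mid}}, \theta_{\max}]$. In either case, repeat the bisection process.

$^5$ The value $\theta_{\max}$ is created by noting that, in a positively correlated channel, the $k$-step transition probabilities $P^{(k)}_{n,01}$ and $P^{(k)}_{n,11}$ increase and decrease with $k$, respectively; both sequences have the same limit $\pi_{n,\text{ON}}$.

B. Lyapunov drift inequality

The construction of the QRRNUM policy follows a new Lyapunov drift argument. We start with constructing a frame-based Lyapunov drift inequality over a frame of size $T$, where $T$ is possibly random but has a finite second moment bounded by a constant $C$ so that $C \geq \mathbb{E}[T^2 \mid Q(t)]$ for all $t$ and all possible $Q(t)$. Intuition for constructing such an inequality is shown later. By iteratively applying (5), it is not hard to show

$$Q_n(t+T) \leq \max\Big[ Q_n(t) - \sum_{\tau=0}^{T-1} \mu_n(t+\tau), \, 0 \Big] + \sum_{\tau=0}^{T-1} r_n(t+\tau) \tag{13}$$

for every $n \in \{1, \ldots, N\}$. We define the quadratic Lyapunov function $L(Q(t)) \triangleq \frac{1}{2} \sum_{n=1}^{N} Q_n^2(t)$ as a scalar measure of the queue size vector $Q(t)$. Define the $T$-slot Lyapunov drift

$$\Delta_T(Q(t)) \triangleq \mathbb{E}[L(Q(t+T)) - L(Q(t)) \mid Q(t)]$$

as a conditional expected change of queue sizes across $T$ slots, where the expectation is with respect to the randomness of the network within the $T$ slots, including the randomness of $T$. By taking the square of (13) for every $n$, using the inequalities

$$\max[a - b, 0] \leq a \ \ \forall a, b \geq 0, \qquad (\max[a - b, 0])^2 \leq (a - b)^2, \qquad \mu_n(t) \leq 1, \qquad r_n(t) \leq 1,$$

to simplify terms, summing all resulting inequalities, and taking conditional expectation on $Q(t)$, we can show

$$\Delta_T(Q(t)) \leq B - \mathbb{E}\Big[ \sum_{n=1}^{N} Q_n(t) \sum_{\tau=0}^{T-1} [\mu_n(t+\tau) - r_n(t+\tau)] \ \Big|\ Q(t) \Big], \tag{14}$$

where $B \triangleq NC > 0$ is a constant.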
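Subtracting from both sides of (14) the weighted sum $V \, \mathbb{E}\big[ \sum_{\tau=0}^{T-1} g(r(t+\tau)) \mid Q(t) \big]$, where $V > 0$ is a predefined control parameter and $r(t+\tau)$ is an admitted data vector, we get the Lyapunov drift inequality

$$\Delta_T(Q(t)) - V \, \mathbb{E}\Big[ \sum_{\tau=0}^{T-1} g(r(t+\tau)) \ \Big|\ Q(t) \Big] \leq B - f(Q(t)) - h(Q(t)), \tag{15}$$

where

$$f(Q(t)) \triangleq \sum_{n=1}^{N} Q_n(t) \, \mathbb{E}\Big[ \sum_{\tau=0}^{T-1} \mu_n(t+\tau) \ \Big|\ Q(t) \Big], \tag{16}$$

$$h(Q(t)) \triangleq \mathbb{E}\Big[ \sum_{\tau=0}^{T-1} \Big( V g(r(t+\tau)) - \sum_{n=1}^{N} Q_n(t) \, r_n(t+\tau) \Big) \ \Big|\ Q(t) \Big]. \tag{17}$$

The inequality (15) holds for any scheduling policy over a frame of any size $T$.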
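As a numerical sanity check on the frame-level bound (13) and on the squaring step behind (14), the sketch below iterates the one-slot update (5) over a frame and compares the result with both bounds for a single queue. It is an illustration under stated assumptions: the function names, the random service and admission sequences, and the frame lengths are hypothetical, and the per-queue bound $\tfrac{1}{2}[Q_n^2(t+T) - Q_n^2(t)] \leq T^2 - Q_n(t) \sum_{\tau} [\mu_n(t+\tau) - r_n(t+\tau)]$ is the sample-path form that follows from the listed inequalities before expectations are taken.

```python
import random


def run_frame(q0, mu, r):
    """Apply the one-slot update (5) over a frame; mu and r are per-slot sequences."""
    q = q0
    for m, a in zip(mu, r):
        q = max(q - m, 0.0) + a
    return q


def frame_bound(q0, mu, r):
    """Right side of (13): max[Q - sum(mu), 0] + sum(r)."""
    return max(q0 - sum(mu), 0.0) + sum(r)


def drift_bound(q0, mu, r):
    """Sample-path bound on (Q(t+T)^2 - Q(t)^2)/2: T^2 - Q*(sum(mu) - sum(r))."""
    T = len(mu)
    return T * T - q0 * (sum(mu) - sum(r))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for _ in range(5):
        T = rng.randint(1, 6)                           # hypothetical frame length
        q0 = rng.uniform(0.0, 5.0)                      # hypothetical initial backlog
        mu = [rng.randint(0, 1) for _ in range(T)]      # service in {0, 1}
        r = [rng.uniform(0.0, 1.0) for _ in range(T)]   # admissions in [0, 1]
        q_end = run_frame(q0, mu, r)
        print(q_end <= frame_bound(q0, mu, r) + 1e-12,
              0.5 * (q_end ** 2 - q0 ** 2) <= drift_bound(q0, mu, r) + 1e-9)
```

Summing the per-queue bound over the $N$ users and taking the conditional expectation, with $\mathbb{E}[T^2 \mid Q(t)] \leq C$, gives the constant $B = NC$ in (14).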

C. Intuition behind the Lyapunov drift inequality

The desired network control policy shall stabilize all queues $(Q_1(t), \ldots, Q_N(t))$ and maximize the throughput utility $g(\cdot)$. For queue stability, we want to minimize the Lyapunov drift $\Delta_T(Q(t))$, because it captures the expected growth of queue sizes over a duration of time. On the other hand, to increase throughput utility, we want to admit more data into the system for service, and maximize the expected sum utility $\mathbb{E}\big[ \sum_{\tau=0}^{T-1} g(r(t+\tau)) \mid Q(t) \big]$. Minimizing Lyapunov drift and maximizing throughput utility, however, conflict with each other, because queue sizes increase with more data admitted into the system. To capture this tradeoff, it is natural to minimize a weighted difference of Lyapunov drift and throughput utility, which is the left side of (15). The tradeoff is controlled by the positive parameter $V$. Intuitively, a large $V$ value puts more weight on throughput utility, thus throughput utility is improved, at the expense of the growth of the queue size reflected in $\Delta_T(Q(t))$. The construction of the inequality (15) provides a useful bound on the weighted difference of Lyapunov drift and throughput utility.

The QRRNUM policy that we will construct in the next section uses the above ideas with two modifications. First, it suffices to minimize a bound on the weighted difference of Lyapunov drift and throughput utility, i.e., the right side of (15). Second, since the weighted difference of Lyapunov drift and throughput utility in (15) is made over a frame of $T$ slots, where the value $T$ is random and depends on the policy implemented within the frame, it is natural to normalize the weighted difference by the average frame size, and we will minimize the resulting ratio (see (18)). This new ratio rule is a generalization of the MaxWeight policies for stochastic network optimization over frame-based systems.

D. Construction of the QRRNUM policy

We consider the policy that, at the start of every round, observes the current queue backlog vector $Q(t)$ and maximizes over all feasible policies the expression:

$$\frac{f(Q(t)) + h(Q(t))}{\mathbb{E}[T \mid Q(t)]} \tag{18}$$

over a frame of size $T$, where the numerator is defined in (16) and (17).
Every feasible policy here consists of: 1) an admission policy that admits packets into queues $Q(t)$ for all users in every slot; 2) a randomized round robin policy RandRR (given in Section III-A) for data delivery. The frame size $T$ in (18) is the length of one transmission round under the candidate RandRR policy, and its distribution depends on the backlog vector $Q(t)$ via the queue-dependent choice of policy RandRR. When the feasible policy that maximizes (18) is chosen, it is executed for one round of transmission, after which a new policy is chosen for the next round based on the updated ratio of (18), and so on.

Next we simplify the maximization of (18); the result is the QRRNUM policy in Section IV-A. In $h(Q(t))$, the optimal admitted data vector $r(t+\tau)$ in every slot is independent of the frame size $T$ and of the rate allocations $\mu_n(t+\tau)$ in $f(Q(t))$. In addition, it should be the same for all $\tau \in \{0, \ldots, T-1\}$, and is the solution to (9)-(10). These observations lead to the admission control subroutine in the QRRNUM policy. Let $\Psi^{*}(Q(t))$ denote the optimal objective of (9)-(10). Since the optimal admitted data vector is independent of the frame size $T$, we get $h(Q(t)) = \mathbb{E}[T \mid Q(t)] \, \Psi^{*}(Q(t))$, and (18) is equal to

$$\frac{f(Q(t))}{\mathbb{E}[T \mid Q(t)]} + \Psi^{*}(Q(t)). \tag{19}$$

It indicates that finding the optimal admission policy is independent of finding the optimal RandRR policy that maximizes the first term of (19).

Next we evaluate the first term of (19) under a fixed RandRR policy with parameters $\{\alpha_\phi\}_{\phi \in \Phi \cup \{0\}}$. In the rest of the section, when we use a $\text{RR}(\phi)$ policy for one round, the channel ordering of least recently used first is always adopted. Conditioning on the choice of $\phi$, we get

$$f(Q(t)) = \sum_{\phi \in \Phi \cup \{0\}} \alpha_\phi \, f(Q(t), \text{RR}(\phi)),$$

where $f(Q(t), \text{RR}(\phi))$ denotes the term $f(Q(t))$ in (16) evaluated under the policy $\text{RR}(\phi)$ for one round; for convenience we have denoted by $\text{RR}(0)$ the policy of idling the system for one slot. Similarly, by conditioning we can show$^6$

$$\mathbb{E}[T] = \mathbb{E}[T \mid Q(t)] = \sum_{\phi \in \Phi \cup \{0\}} \alpha_\phi \, \mathbb{E}[T_{\text{RR}(\phi)}],$$

where $T_{\text{RR}(\phi)}$ denotes the duration of one transmission round under the $\text{RR}(\phi)$ policy. It follows that

$$\frac{f(Q(t))}{\mathbb{E}[T \mid Q(t)]} = \frac{\sum_{\phi \in \Phi \cup \{0\}} \alpha_\phi \, f(Q(t), \text{RR}(\phi))}{\sum_{\phi \in \Phi \cup \{0\}} \alpha_\phi \, \mathbb{E}[T_{\text{RR}(\phi)}]}. \tag{20}$$

The next lemma shows that there always exists a $\text{RR}(\phi)$ policy maximizing (20) over all RandRR policies for one round of transmission. Therefore it suffices to focus on the class of $\text{RR}(\phi)$ policies in every round of transmission.

$^6$ Given a fixed RandRR policy, the frame size $T$ no longer depends on the backlog vector $Q(t)$, and $\mathbb{E}[T] = \mathbb{E}[T \mid Q(t)]$.

Lemma 2. We index $\text{RR}(\phi)$ policies for all $\phi \in \Phi \cup \{0\}$. For the $\text{RR}(\phi)$ policy with index $k$, define

$$f_k \triangleq f(Q(t), \text{RR}(\phi)), \qquad D_k \triangleq \mathbb{E}[T_{\text{RR}(\phi)}].$$

Without loss of generality, assume

$$\frac{f_1}{D_1} \geq \frac{f_k}{D_k}, \quad \forall k \in \{2, 3, \ldots, 2^N\}.$$

Then for any probability distribution $\{\alpha_k\}_{k \in \{1, \ldots, 2^N\}}$ with $\alpha_k \geq 0$ and $\sum_{k=1}^{2^N} \alpha_k = 1$, we have

$$\frac{f_1}{D_1} \geq \frac{\sum_{k=1}^{2^N} \alpha_k f_k}{\sum_{k=1}^{2^N} \alpha_k D_k}.$$

Proof of Lemma 2: Omitted due to space constraint.

By Lemma 2, next we evaluate the first term of (19) under a given $\text{RR}(\phi)$ policy. When $\phi = 0$, we get $f(Q(t)) / \mathbb{E}[T \mid Q(t)] = 0$. Otherwise, fix some $\phi \in \Phi$. For each active channel $n$ in $\phi$, we denote by $L^{\phi}_n$ the amount of time the network stays with channel $n$ in one round of transmission under policy $\text{RR}(\phi)$. The probability distribution and the mean of $L^{\phi}_n$ are given in Theorem 1. It follows that under the $\text{RR}(\phi)$ policy we have

$$\mathbb{E}[T] = \mathbb{E}[T \mid Q(t)] = \sum_{n : \phi_n = 1} \mathbb{E}[L^{\phi}_n],$$

$$\mathbb{E}\Big[ \sum_{\tau=0}^{T-1} \mu_n(t+\tau) \ \Big|\ Q(t) \Big] = \begin{cases} \mathbb{E}[L^{\phi}_n] - 1 & \text{if } \phi_n = 1 \\ 0 & \text{if } \phi_n = 0. \end{cases}$$

As a result,

$$\frac{f(Q(t))}{\mathbb{E}[T \mid Q(t)]} = \frac{\sum_{n=1}^{N} Q_n(t) \, \mathbb{E}[L^{\phi}_n - 1] \, \phi_n}{\sum_{n=1}^{N} \mathbb{E}[L^{\phi}_n] \, \phi_n}. \tag{21}$$

The above simplifications lead to the channel scheduling subroutine of the QRRNUM policy.

V. PERFORMANCE ANALYSIS

Theorem 2. Let $y_n(t) = \min[Q_n(t), \mu_n(t)]$ be the number of packets delivered to user $n$ in slot $t$; define $y(t) \triangleq (y_n(t))_{n=1}^{N}$. For any control parameter $V > 0$, the QRRNUM policy stabilizes all queues $(Q_1(t), \ldots, Q_N(t))$ and yields throughput utility satisfying

$$\liminf_{t \to \infty} \, g\Big( \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}[y(\tau)] \Big) \geq g(\bar{y}^{*}) - \frac{B}{V}, \tag{22}$$

where $g(\bar{y}^{*})$ is the optimal objective of the constrained problem (3) and $B > 0$ is a finite constant.

Proof of Theorem 2: In Appendix A.

Theorem 2 shows that the throughput utility under the QRRNUM policy is at most $B/V$ away from the optimal $g(\bar{y}^{*})$. By choosing $V$ sufficiently large, the throughput utility can be made arbitrarily close to the optimal $g(\bar{y}^{*})$ and the constrained problem (3) is solved. The tradeoff of choosing a large $V$ value is that the average queue size in the network grows linearly with $V$. Such a tradeoff agrees with the design principle of the QRRNUM policy discussed in Section IV-C.

VI. CONCLUSION

This paper provides a theoretical framework for utility maximization over a wireless network with partially observable Markov ON/OFF channels. The performance and control in this network are constrained by limiting channel probing and delayed/uncertain channel state information, but can be improved by exploiting channel memory. Overall, attacking such problems requires solving (at least approximately) high-dimensional restless multi-armed bandit problems with nonlinear objective functions of time average rewards, which are difficult to solve by existing tools such as Whittle's index or Markov decision theory. This paper provides a new achievable region method for these problems.
The idea is to first identify a good inner performance region rendered by randomizing stationary policies, and then solve the problem over the inner region, serving as an approximation to the original problem. In this paper, with an inner performance region constructed in [8], we provide a novel frame-based Lyapunov drift argument that solves the approximation problem with provably near-optimal performance. We generalize the classical MaxWeight policies from time-slotted wireless networks to frame-based ones that have policy-dependent random frame sizes. Extensions of this new achievable region method to other open stochastic optimization problems are interesting future research.

REFERENCES

[1] M. J. Neely, E. Modiano, and C.-P. Li, "Fairness and optimal stochastic control for heterogeneous networks," IEEE/ACM Trans. Netw., vol. 16, no. 2, Apr.
[2] F. P. Kelly, "Charging and rate control for elastic traffic," European Trans. Telecommunications, vol. 8. [Online]. Available: http:// frank/elastic.html
[3] Q. Zhao and B. M. Sadler, "A survey of dynamic spectrum access," IEEE Signal Process. Mag., vol. 24, no. 3, May.
[4] Q. Zhao and A. Swami, "A decision-theoretic framework for opportunistic spectrum access," IEEE Wireless Commun. Mag., vol. 14, no. 4, Aug.
[5] J. L. Ny, M. Dahleh, and E. Feron, "Multi-UAV dynamic routing with partial observations using restless bandit allocation indices," in American Control Conference, Seattle, WA, USA, Jun.
[6] P. Whittle, "Restless bandits: Activity allocation in a changing world," J. Appl. Probab., vol. 25.
[7] C. H. Papadimitriou and J. N. Tsitsiklis, "The complexity of optimal queueing network control," Math. of Oper. Res., vol. 24, May.
[8] C.-P. Li and M. J. Neely, "Exploiting channel memory for multiuser wireless scheduling without channel measurement: Capacity regions and algorithms," Performance Evaluation, 2011, accepted for publication.
[9] L. Tassiulas and A. Ephremides, "Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks," IEEE Trans. Autom. Control, vol. 37, no.
12, pp , Dec , Dynamic server allocaion o parallel queues wih randomly varying conneciviy, IEEE Trans. Inf. Theory, vol. 39, no. 2, pp , Mar M. J. Neely, Energy opimal conrol for ime varying wireless neworks, IEEE Trans. Inf. Theory, vol. 52, no. 7, pp , Jul , Dynamic power allocaion and rouing for saellie and wireless neworks wih ime varying channels, Ph.D. disseraion, Massachuses Insiue of Technology, November A. Eryilmaz and R. Srikan, Fair resource allocaion in wireless neworks using queue-lengh-based scheduling and congesion conrol, IEEE/ACM Trans. New., vol. 15, no. 6, pp , Dec M. J. Neely, Sochasic opimizaion for markov modulaed neworks wih applicaion o delay consrained wireless scheduling, in IEEE Conf. Decision and Conrol CDC), , Dynamic opimizaion and learning for renewal sysems, in Asilomar Conf. Signals, Sysems, and Compuers, Nov. 2010, invied paper. 16, Sochasic Nework Opimizaion wih Applicaion o Communicaion and Queueing Sysems. Morgan & Claypool, C.-P. Li and M. J. Neely, Energy-opimal scheduling wih dynamic channel acquisiion in wireless downlinks, IEEE Trans. Mobile Compu., vol. 9, no. 4, pp , Apr P. Chaporkar, A. Prouiere, H. Asnani, and A. Karandikar, Scheduling wih limied informaion in wireless sysems, in ACM In. Symp. Mobile Ad Hoc Neworking and Compuing MobiHoc), New Orleans, LA, May N. B. Chang and M. Liu, Opimal channel probing and ransmission scheduling for opporunisic specrum access, in ACM In. Conf. Mobile Compuing and Neworking MobiCom), New York, NY, 2007, pp P. Chaporkar, A. Prouiere, and H. Asnani, Learning o opimally exploi muli-channel diversiy in wireless sysems, in IEEE Proc. INFOCOM, P. Chaporkar and A. Prouiere, Opimal join probing and ransmission sraegy for maximizing hroughpu in wireless sysems, IEEE J. Sel. Areas Commun., vol. 26, no. 8, pp , Oc S. Guha, K. Munagala, and S. Sarkar, Joinly opimal ransmission and probing sraegies for mulichannel wireless sysems, in Conf. Informaion Sciences and Sysems, Mar

[23] Q. Zhao, B. Krishnamachari, and K. Liu, "On myopic sensing for multichannel opportunistic access: Structure, optimality, and performance," IEEE Trans. Wireless Commun., vol. 7, no. 12, Dec.
[24] S. H. A. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari, "Optimality of myopic sensing in multichannel opportunistic access," IEEE Trans. Inf. Theory, vol. 55, no. 9, Sep.
[25] S. H. A. Ahmad and M. Liu, "Multi-channel opportunistic access: A case of restless bandits with multiple plays," in Allerton Conf. Communication, Control, and Computing, 2009.
[26] K. Liu and Q. Zhao, "Indexability of restless bandit problems and optimality of Whittle's index for dynamic multichannel access," IEEE Trans. Inf. Theory, vol. 56, no. 11, Nov.
[27] J. Nino-Mora, "An index policy for dynamic fading-channel allocation to heterogeneous mobile users with partial observations," in Next Generation Internet Networks, 2008.
[28] S. Guha, K. Munagala, and P. Shi, "Approximation algorithms for restless bandit problems," Tech. Rep., Feb.
[29] D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Athena Scientific, 2005, vol. I.

APPENDIX A

Proof of Theorem 2: We need to show that all queues $(Q_n(t))_{n=1}^{N}$ are stable and that (22) is achieved. Due to space constraints, we only prove (22) here.

Under policy QRRNUM, let $t_{k-1}$ and $T_k$ be the beginning and the duration of the $k$th transmission round, respectively. We have $T_k = t_k - t_{k-1}$ and $t_k = \sum_{i=1}^{k} T_i$ for all $k \in \mathbb{N}$. Every $T_k$ is the length of a transmission round of some $\mathrm{RR}(\boldsymbol{\phi})$ policy. Assume $t_0 = 0$.

To show (22), we compare the QRRNUM policy with the (unknown) solution to problem (3). By definition of the throughput region $\Lambda_{\mathrm{int}}$ in Section III-B, there exists a randomized round robin policy $\mathrm{RandRR}^*$ that solves (3) and yields the optimal throughput vector $\mathbf{y}^* = (y_n^*)_{n=1}^{N}$. Let $T^*$ denote the length of one transmission round under policy $\mathrm{RandRR}^*$. From Corollary 1, we have for every user $n \in \{1, \ldots, N\}$:

$$E\Big[\sum_{\tau=0}^{T^*-1} \mu_n(t+\tau) \,\Big|\, \mathbf{Q}(t)\Big] = E\Big[\sum_{\tau=0}^{T^*-1} \mu_n(t+\tau)\Big] = y_n^*\, E[T^*].$$
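Combining $\mathrm{RandRR}^*$ with the admission policy $\sigma^*$ that sets $r_n(t+\tau) = y_n^*$ for all users $n$ and $\tau \in \{0, \ldots, T^*-1\}$ yields$^{7}$

$$f^*(\mathbf{Q}(t)) = E[T^*] \sum_{n=1}^{N} Q_n(t)\, y_n^* \tag{23}$$

$$h^*(\mathbf{Q}(t)) = E[T^*] \Big[ V g(\mathbf{y}^*) - \sum_{n=1}^{N} Q_n(t)\, y_n^* \Big] \tag{24}$$

where (23) and (24) are $f(\mathbf{Q}(t))$ and $h(\mathbf{Q}(t))$ (see (16), (17)) evaluated under policies $\mathrm{RandRR}^*$ and $\sigma^*$, respectively. Since the QRRNUM policy maximizes (18), comparing (18) under QRRNUM and the joint policy $(\mathrm{RandRR}^*, \sigma^*)$ yields

$$f_{\mathrm{QRRNUM}}(\mathbf{Q}(t_k)) + h_{\mathrm{QRRNUM}}(\mathbf{Q}(t_k)) \ge E[T_{k+1} \mid \mathbf{Q}(t_k)]\, \frac{f^*(\mathbf{Q}(t_k)) + h^*(\mathbf{Q}(t_k))}{E[T^*]} \overset{(a)}{=} E[T_{k+1} \mid \mathbf{Q}(t_k)]\, V g(\mathbf{y}^*), \tag{25}$$

where (a) is from (23) and (24). The inequality (15) under the QRRNUM policy in the $(k+1)$th round of transmission yields

$$\Delta_{T_{k+1}}(\mathbf{Q}(t_k)) - V E\Big[\sum_{\tau=0}^{T_{k+1}-1} g(\mathbf{r}(t_k+\tau)) \,\Big|\, \mathbf{Q}(t_k)\Big] \le B - f_{\mathrm{QRRNUM}}(\mathbf{Q}(t_k)) - h_{\mathrm{QRRNUM}}(\mathbf{Q}(t_k)) \overset{(b)}{\le} B - E[T_{k+1} \mid \mathbf{Q}(t_k)]\, V g(\mathbf{y}^*), \tag{26}$$

where (b) uses (25). Taking expectation over $\mathbf{Q}(t_k)$ in (26) and summing it over $k \in \{0, \ldots, K-1\}$, we get

$$E[L(\mathbf{Q}(t_K))] - E[L(\mathbf{Q}(t_0))] - V E\Big[\sum_{\tau=0}^{t_K-1} g(\mathbf{r}(\tau))\Big] \le BK - V g(\mathbf{y}^*)\, E[t_K] \le \big(B - V g(\mathbf{y}^*)\big) E[t_K], \tag{27}$$

where the last inequality uses $t_K = \sum_{k=1}^{K} T_k \ge K$. Ignoring the first term, noting $\mathbf{Q}(t_0) = \mathbf{0}$, and dividing by $V$ yields

$$E\Big[\sum_{\tau=0}^{t_K-1} g(\mathbf{r}(\tau))\Big] \ge \Big(g(\mathbf{y}^*) - \frac{B}{V}\Big) E[t_K]. \tag{28}$$

Let $K(t)$ denote the number of transmission rounds ending by time $t$; we have $t_{K(t)} \le t < t_{K(t)+1}$. It follows that

$$E\Big[\sum_{\tau=0}^{t-1} g(\mathbf{r}(\tau))\Big] \overset{(c)}{\ge} E\Big[\sum_{\tau=0}^{t_{K(t)}-1} g(\mathbf{r}(\tau))\Big] \overset{(d)}{\ge} \Big(g(\mathbf{y}^*) - \frac{B}{V}\Big) E[t_{K(t)}] = \Big(g(\mathbf{y}^*) - \frac{B}{V}\Big) t - \Big(g(\mathbf{y}^*) - \frac{B}{V}\Big) E[t - t_{K(t)}], \tag{29}$$

where (c) uses the nonnegativity of $g(\cdot)$ and $t_{K(t)} \le t$, and (d) follows (28). Dividing (29) by $t$, taking a $\liminf$ as $t \to \infty$, and noting $E[t - t_{K(t)}] \le E[T_{K(t)+1}] < \infty$ yields

$$\liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E[g(\mathbf{r}(\tau))] \ge g(\mathbf{y}^*) - \frac{B}{V}. \tag{30}$$

Using Jensen's inequality and the concavity of $g(\cdot)$, which give $\frac{1}{t} \sum_{\tau=0}^{t-1} E[g(\mathbf{r}(\tau))] \le g(\overline{\mathbf{r}}^{(t)})$ for every $t$, we get

$$\liminf_{t \to \infty} g(\overline{\mathbf{r}}^{(t)}) \ge g(\mathbf{y}^*) - \frac{B}{V}, \tag{31}$$

where $\overline{\mathbf{r}}^{(t)} \triangleq (\overline{r}_n^{(t)})_{n=1}^{N}$ and $\overline{r}_n^{(t)} \triangleq \frac{1}{t} \sum_{\tau=0}^{t-1} E[r_n(\tau)]$. Since all queues $(Q_n(t))_{n=1}^{N}$ are stable, we can show that the time average throughput vector $\overline{\mathbf{y}}^{(t)} = (\overline{y}_n^{(t)})_{n=1}^{N}$, where $\overline{y}_n^{(t)} \triangleq \frac{1}{t} \sum_{\tau=0}^{t-1} E[y_n(\tau)]$, satisfies

$$\liminf_{t \to \infty} g(\overline{\mathbf{y}}^{(t)}) \ge \liminf_{t \to \infty} g(\overline{\mathbf{r}}^{(t)}). \tag{32}$$

Combining (31) and (32) finishes the proof.

$^{7}$ The throughput $y_n^*$ is less than or equal to one, and is thus a feasible choice of $r_n(t+\tau)$.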
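As an illustration of the channel scheduling step behind (21) in Section IV, the following is a minimal Python sketch. It assumes the subroutine picks the binary vector $\boldsymbol{\phi}$ that maximizes the ratio in (21), and it does so by exhaustive search over all nonzero $\boldsymbol{\phi}$, which is a stand-in and not necessarily the procedure used by QRRNUM. The names `best_round_robin_vector`, `mean_round_time`, and `toy_mean_round_time` are hypothetical, and the toy formula for $E[L_n^{\boldsymbol{\phi}}]$ is invented for the example; the actual means come from Theorem 1.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Phi = Tuple[int, ...]


def best_round_robin_vector(
    queues: Sequence[float],
    mean_round_time: Callable[[int, Phi], float],
) -> Tuple[Phi, float]:
    """Brute-force search for the nonzero binary vector phi maximizing (21).

    queues[n] plays the role of Q_n(t); mean_round_time(n, phi) must return
    E[L_n^phi] (assumed >= 1, since a visited channel takes at least one slot).
    """
    n_users = len(queues)
    best_phi: Phi = ()
    best_ratio = float("-inf")
    for phi in product((0, 1), repeat=n_users):
        if not any(phi):
            continue  # a transmission round must include at least one channel
        active = [n for n in range(n_users) if phi[n] == 1]
        # Numerator of (21): sum_n Q_n(t) * E[L_n^phi - 1] * phi_n
        numerator = sum(queues[n] * (mean_round_time(n, phi) - 1.0) for n in active)
        # Denominator of (21): sum_n E[L_n^phi] * phi_n
        denominator = sum(mean_round_time(n, phi) for n in active)
        ratio = numerator / denominator
        if ratio > best_ratio:
            best_phi, best_ratio = phi, ratio
    return best_phi, best_ratio


def toy_mean_round_time(n: int, phi: Phi) -> float:
    """Hypothetical E[L_n^phi]: one slot plus the number of active channels.

    This is NOT the formula of Theorem 1; it only makes the example runnable.
    """
    return 1.0 + float(sum(phi))


if __name__ == "__main__":
    phi, ratio = best_round_robin_vector([4.0, 3.0], toy_mean_round_time)
    print(phi, ratio)
```

With backlogs (4, 3) and the toy means, serving only channel 1 gives ratio 2, serving only channel 2 gives 1.5, and serving both gives 14/6, so the search returns $\boldsymbol{\phi} = (1, 1)$. Exhaustive search evaluates $2^N - 1$ vectors and is therefore only practical for small $N$.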
