Power Aware Wireless File Downloading: A Constrained Restless Bandit Approach

PROC. WIOPT 2014

Xiaohan Wei and Michael J. Neely, Senior Member, IEEE

Abstract: This paper treats power-aware throughput maximization in a multi-user file downloading system. Each user can receive a new file only after its previous file is finished. The file state processes for each user act as coupled Markov chains that form a generalized restless bandit system. First, an optimal algorithm is derived for the case of one user. The algorithm maximizes throughput subject to an average power constraint. Next, the one-user algorithm is extended to a low complexity heuristic for the multi-user problem. The heuristic uses a simple online index policy, and its effectiveness is shown via simulation. For simple 3-user cases where the optimal solution can be computed offline, the heuristic is shown to be near-optimal for a wide range of parameters.

I. INTRODUCTION

Consider a wireless access point, such as a base station or femto node, that delivers files to N different wireless users. The system operates in slotted time with time slots $t \in \{0, 1, 2, \ldots\}$. Each user can download at most one file at a time. File sizes are random, and complete delivery of a file requires a random number of time slots. A new file request is made by each user at a random time after it finishes its previous download. Let $F_n(t) \in \{0, 1\}$ represent the binary file state process for user $n \in \{1, \ldots, N\}$. The state $F_n(t) = 1$ means that user n is currently active downloading a file, while the state $F_n(t) = 0$ means that user n is currently idle. Idle times are assumed to be independent and geometrically distributed with parameter $\lambda_n$ for each user n, so that the average idle time is $1/\lambda_n$. Active times depend on the random file size and the transmission decisions that are made.
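As a small illustration of the idle-time model (a sketch of our own, not code from the paper), the Python snippet below draws idle periods slot by slot, assuming an idle user issues a new request at the end of each idle slot independently with probability $\lambda_n$; the empirical mean idle time should then be close to $1/\lambda_n$. The function name `sample_idle_slots` is hypothetical.

```python
import random

def sample_idle_slots(lam, rng):
    """Number of consecutive idle slots before a new file request.

    Assumes each idle slot ends with a new request independently with
    probability lam, so the count is geometric on {1, 2, ...} with mean 1/lam.
    """
    slots = 1
    while rng.random() >= lam:  # no request this slot: stay idle one more slot
        slots += 1
    return slots

rng = random.Random(1)
lam = 0.25
samples = [sample_idle_slots(lam, rng) for _ in range(100_000)]
mean_idle = sum(samples) / len(samples)
print(f"empirical mean idle time {mean_idle:.3f} vs 1/lambda = {1 / lam:.3f}")
```

The per-slot Bernoulli draw mirrors how the idle-to-active transition is later written as a Markov transition probability, rather than sampling the geometric length in one call.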
Every slot t, the access point observes which users are active and decides to serve a subset of at most M users, where M is the maximum number of simultaneous transmissions allowed in the system ($M < N$ is assumed throughout). The goal is to maximize a weighted sum of throughput subject to a total average power constraint. The file state processes $F_n(t)$ are coupled controlled Markov chains that form a total state $(F_1(t), \ldots, F_N(t))$ that can be viewed as a restless multi-armed bandit system. Such problems are complex due to the inherent curse of dimensionality.

This paper first computes an online optimal algorithm for 1-user systems, i.e., the case $N = 1$. This simple case avoids the curse of dimensionality and provides valuable intuition. The optimal policy here is nontrivial and uses the theory of Lyapunov optimization for renewal systems [1]. The resulting algorithm makes a greedy transmission decision that affects success probability and power usage. The decision is based on a drift-plus-penalty index. Next, the algorithm is extended as a low complexity online heuristic for the N-user problem. The heuristic has the following desirable properties:

• Implementation of the N-user heuristic is as simple as comparing indices for N different 1-user problems.
• The N-user heuristic is analytically shown to meet the desired average power constraint.
• The N-user heuristic is shown in simulation to perform well over a wide range of parameters. Specifically, it is very close to optimal for example 3-user cases where an offline optimal can be computed.

The authors are with the Electrical Engineering department at the University of Southern California, Los Angeles, CA. This material is supported in part by: NSF grant CCF-0747525, the Network Science Collaborative Technology Alliance sponsored by the U.S. Army Research Laboratory W911NF-09-2-0053, and the Okawa Foundation research grant.
Prior work on wireless optimization uses Lyapunov functions to maximize throughput in cases where the users are assumed to have an infinite amount of data to send [2][3][4][5][6][7][8], or when data arrives according to a fixed rate process that does not depend on delays in the network (which necessitates dropping data if the arrival rate vector is outside of the capacity region) [4][6]. These models do not consider the interplay between arrivals at the transport layer and file delivery at the network layer. The current paper captures this interplay through the binary file state processes $F_n(t)$. This creates a complex problem of coupled Markov chains. This problem is fundamental to file downloading systems. The modeling and analysis of these systems is a significant contribution of the current paper.

Markov decision problems (MDPs) can be solved offline via linear programming [9]. This can be prohibitively complex for large dimensional problems. Low complexity solutions for coupled MDPs are possible in special cases when the coupling involves only time average constraints [10]. Finite horizon coupled MDPs are treated via integer programming in [11] and via a heuristic task decomposition method in [12]. The problem of the current paper does not fit the framework of [10]-[12] because it includes both time-average constraints (on average power expenditure) and instantaneous constraints which restrict the number of users that can be served on one slot. The latter service restriction is similar to a traditional restless multi-armed bandit (RMAB) system [13].

RMAB problems are generally complex (see the PSPACE-hardness results in [14]). A standard low complexity heuristic for such problems is Whittle's index technique [13]. Low complexity Whittle indexing has been used in RMAB models for wireless systems [15][16][17], where simulations demonstrate near optimal results. In certain special cases with symmetry, these index policies are also known to be optimal [15][16]. Unfortunately, not every RMAB problem has a Whittle's index, and such indices, if they exist, are not always easy to compute. Further, the Whittle's index framework does not consider additional time average power constraints. The algorithm developed in the current paper can be viewed as a Whittle-like indexing scheme that can always be implemented and that incorporates average power constraints. It is likely that these techniques can be extended to other constrained RMAB problems.

II. SINGLE USER SCENARIO

Consider a file downloading system that consists of only one user that repeatedly downloads files. Let $F(t) \in \{0, 1\}$ be the file state process of the user. State 1 means there is a file in the system that has not completed its download, and 0 means no file is waiting. The length of each file is independent and is either exponentially distributed or geometrically distributed (described in more detail below). Let B denote the expected file size in bits. Time is slotted. At each slot in which there is an active file for downloading, the user makes a service decision that affects both the downloading success probability and the power expenditure. After a file is downloaded, the system goes idle (state 0) and remains in the idle state for a random amount of time that is independent and geometrically distributed with parameter $\lambda > 0$.

A transmission decision is made on each slot t in which $F(t) = 1$. The decision affects the number of bits that are sent, the probability these bits are successfully received, and the power usage. Let $\alpha(t)$ denote the decision variable at slot t and let $\mathcal{A}$ represent the abstract action set with a finite number of elements. The set $\mathcal{A}$ can represent a collection of modulation and coding options for each transmission. Assume also that $\mathcal{A}$ contains an idle action denoted as 0.
The decision $\alpha(t)$ determines the following two values:

• The probability of successfully downloading a file $\phi(\alpha(t))$, where $\phi(\cdot) \in [0, 1]$ with $\phi(0) = 0$.
• The power expenditure $p(\alpha(t))$, where $p(\cdot)$ is a nonnegative function with $p(0) = 0$.

The user chooses $\alpha(t) = 0$ whenever $F(t) = 0$. The user chooses $\alpha(t) \in \mathcal{A}$ for each slot t in which $F(t) = 1$, with the goal of maximizing throughput subject to a time average power constraint.

The problem can be described by a two-state Markov decision process with binary state F(t). Given $F(t) = 1$, a file is currently in the system. This file will finish its download at the end of the slot with probability $\phi(\alpha(t))$. Hence, the transition probabilities out of state 1 are:

$$\Pr[F(t+1) = 0 \mid F(t) = 1] = \phi(\alpha(t)) \tag{1}$$
$$\Pr[F(t+1) = 1 \mid F(t) = 1] = 1 - \phi(\alpha(t)) \tag{2}$$

Given $F(t) = 0$, the system is idle and will transition to the active state in the next slot with probability λ, so that:

$$\Pr[F(t+1) = 1 \mid F(t) = 0] = \lambda \tag{3}$$
$$\Pr[F(t+1) = 0 \mid F(t) = 0] = 1 - \lambda \tag{4}$$

Define the throughput, measured in bits per slot (not files per slot), as:

$$\liminf_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} B\phi(\alpha(t))$$

The file downloading problem reduces to the following:

$$\text{Maximize:} \quad \liminf_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} B\phi(\alpha(t)) \tag{5}$$
$$\text{Subject to:} \quad \limsup_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} p(\alpha(t)) \le \beta \tag{6}$$
$$\alpha(t) \in \mathcal{A} \quad \forall t \in \{0, 1, 2, \ldots\} \text{ such that } F(t) = 1 \tag{7}$$
$$\text{Transition probabilities satisfy (1)-(4)} \tag{8}$$

where β is a positive constant that determines the desired average power constraint.

A. The memoryless file size assumption

The above model assumes that file completion success on slot t depends only on the transmission decision $\alpha(t)$, independent of history. This implicitly assumes that file length distributions have a memoryless property.
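To make the memoryless property concrete, the Python sketch below (our own illustration; the helper names are hypothetical) compares the residual survival probability $\Pr[Z > k + m \mid Z > k]$ with $\Pr[Z > m]$ for a packet count Z. For a geometric packet count the two coincide; for a packet count uniform on {5, ..., 15}, one of the non-memoryless cases simulated in Section IV-B, they do not.

```python
def survival_geometric(z, mu):
    """P[Z > z] for Z geometric on {1, 2, ...} with mean 1/mu."""
    return (1.0 - mu) ** z

def survival_uniform(z, lo, hi):
    """P[Z > z] for Z uniform on the integers lo, ..., hi."""
    return max(hi - max(z, lo - 1), 0) / (hi - lo + 1)

def residual_survival(survival, k, m):
    """P[Z > k + m | Z > k]: more than m packets remain after k were sent."""
    return survival(k + m) / survival(k)

def geo(z):
    return survival_geometric(z, mu=0.1)

def uni(z):
    return survival_uniform(z, lo=5, hi=15)

# Geometric: the residual size has the same law as a fresh file.
print(residual_survival(geo, k=7, m=3), geo(3))
# Uniform: a partially sent file is more likely to finish soon than a fresh one.
print(residual_survival(uni, k=10, m=2), uni(2))
```

The uniform case shows why the binary state F(t) is only an approximation for general file sizes: the completion probability would depend on how many packets have already been delivered.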
The memoryless property holds when each file i has independent length $B_i$ that is exponentially distributed with mean length B bits, so that:

$$\Pr[B_i > x] = e^{-x/B} \quad \text{for } x > 0$$

For example, suppose the transmission rate r(t) (in units of bits/slot) and the transmission success probability q(t) are given by general functions of $\alpha(t)$:

$$r(t) = \hat{r}(\alpha(t)), \qquad q(t) = \hat{q}(\alpha(t))$$

Then the file completion probability $\phi(\alpha(t))$ is the probability that the residual amount of bits in the file is less than or equal to r(t), and that the transmission of these residual bits is a success. By the memoryless property of the exponential distribution, the residual file length is distributed the same as the original file length. Thus, the file success probability function is:

$$\phi(\alpha(t)) = \hat{q}(\alpha(t))\Pr[B_i \le \hat{r}(\alpha(t))] = \hat{q}(\alpha(t))\int_0^{\hat{r}(\alpha(t))} \frac{1}{B} e^{-x/B}\,dx \tag{9}$$

Alternatively, history independence holds when each file i consists of a random number $Z_i$ of fixed length packets, where $Z_i$ is geometrically distributed with mean $Z = 1/\mu$. Assume each transmission sends exactly one packet, but different power levels affect the transmission success probability $q(t) = \hat{q}(\alpha(t))$. Then:

$$\phi(\alpha(t)) = \mu\hat{q}(\alpha(t)) \tag{10}$$

These memoryless file length assumptions ensure the file state can be modeled by a simple binary-valued process $F(t) \in \{0, 1\}$. However, actual file size distributions might not be memoryless. One way to treat general distributions is to approximate the file sizes as being memoryless by using a $\phi(\alpha(t))$ function defined by either (9) or (10), formed by matching the average file size B or average number of packets Z. The decisions $\alpha(t)$ are made according to the algorithm below, but the actual outcomes that arise from these decisions are not memoryless. A simulation comparison of this approximation is provided in Section IV, where it is shown to be remarkably accurate (see Fig. 4).

B. Lyapunov optimization

This subsection develops an online algorithm for problem (5)-(8). First, notice that file state 1 is recurrent under any decisions for $\alpha(t)$. Denote $t_k$ as the k-th time when the system returns to state 1. Define the renewal frame as the time period between $t_k$ and $t_{k+1}$. Define the frame size:

$$T[k] = t_{k+1} - t_k$$

Notice that $T[k] = 1$ for any frame k in which the file does not complete its download. If the file is completed on frame k, then $T[k] = 1 + G_k$, where $G_k$ is a geometric random variable with mean $\mathbb{E}[G_k] = 1/\lambda$. Each frame k involves only a single decision $\alpha(t_k)$ that is made at the beginning of the frame. Thus, the total power used over the duration of frame k is:

$$\sum_{t=t_k}^{t_{k+1}-1} p(\alpha(t)) = p(\alpha(t_k)) \tag{11}$$

Using a technique similar to that proposed in [1], we treat the time average constraint in (6) using a virtual queue Q[k] that is updated every frame k by:

$$Q[k+1] = \max\{Q[k] + p(\alpha(t_k)) - \beta T[k],\ 0\} \tag{12}$$

with initial condition $Q[0] = 0$. The algorithm is then parameterized by a constant $V \ge 0$ which affects a performance tradeoff. At the beginning of the k-th renewal frame, the user observes virtual queue Q[k] and chooses $\alpha(t_k)$ to maximize the following drift-plus-penalty (DPP) ratio [1]:

$$\max_{\alpha(t_k)\in\mathcal{A}} \frac{VB\phi(\alpha(t_k)) - Q[k]p(\alpha(t_k))}{\mathbb{E}[T[k] \mid \alpha(t_k)]} \tag{13}$$

where $\mathbb{E}[T[k] \mid \alpha(t_k)]$ can be easily computed:

$$\mathbb{E}[T[k] \mid \alpha(t_k)] = 1 + \frac{\phi(\alpha(t_k))}{\lambda}$$

Thus, (13) is equivalent to

$$\max_{\alpha(t_k)\in\mathcal{A}} \frac{VB\phi(\alpha(t_k)) - Q[k]p(\alpha(t_k))}{1 + \phi(\alpha(t_k))/\lambda} \tag{14}$$

Since there are only a finite number of elements in $\mathcal{A}$, (14) is easily computed.
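The following Python sketch (an illustration under stated assumptions, not the authors' code) implements the frame-based rule that is stated formally just below: at each frame start it picks the action maximizing the ratio (14), then applies the virtual queue update (12). The three-option action table (rates $\hat{r}$, success probabilities $\hat{q}$, powers) and the value of B are hypothetical, and $\phi$ is computed from (9), i.e. $\phi = \hat{q}(1 - e^{-\hat{r}/B})$. Ties are broken toward the idle action, which is one admissible choice of the arbitrary tie-breaking, and each frame is credited with the expected reward $B\phi(\alpha(t_k))$, as in objective (5).

```python
import math
import random

B = 1000.0  # mean file size in bits (hypothetical value)

# Hypothetical action table: name -> (rate r_hat in bits/slot, success prob q_hat, power)
OPTIONS = {"idle": (0.0, 0.0, 0.0), "low": (50.0, 0.95, 1.0), "high": (120.0, 0.8, 3.0)}

def phi_exponential(r_hat, q_hat, B):
    """File completion probability (9) for exponentially distributed file sizes."""
    return q_hat * (1.0 - math.exp(-r_hat / B))

ACTIONS = {a: (phi_exponential(r, q, B), p) for a, (r, q, p) in OPTIONS.items()}

def dpp_choice(Q, V, lam, actions=ACTIONS):
    """Action maximizing (V*B*phi - Q*p) / (1 + phi/lam), i.e. the ratio (14)."""
    def ratio(a):
        phi, p = actions[a]
        return (V * B * phi - Q * p) / (1.0 + phi / lam)
    return max(actions, key=ratio)  # "idle" is listed first, so exact ties go to idling

def simulate(V, lam, beta, num_frames, seed=0, actions=ACTIONS):
    rng = random.Random(seed)
    Q = max_Q = 0.0
    slots = power = bits = 0.0
    for _ in range(num_frames):
        phi, p = actions[dpp_choice(Q, V, lam, actions)]
        frame_len = 1                        # T[k] = 1 if the file is not finished
        if rng.random() < phi:               # file finishes: add a geometric idle period
            frame_len += 1
            while rng.random() >= lam:
                frame_len += 1
        Q = max(Q + p - beta * frame_len, 0.0)   # virtual queue update (12)
        max_Q = max(max_Q, Q)
        slots += frame_len
        power += p
        bits += B * phi                      # expected bits delivered on this frame
    return bits / slots, power / slots, Q, max_Q, slots

throughput, avg_power, Q_final, max_Q, total_slots = simulate(
    V=1.0, lam=0.2, beta=0.5, num_frames=50_000)
print(f"throughput {throughput:.2f} bits/slot, avg power {avg_power:.3f}, max Q {max_Q:.1f}")
```

Because the queue update is applied once per renewal frame with the whole frame length $T[k]$, the sketch can be checked against the deterministic queue bound of Lemma 2 and the sample-path inequality used in the proof of Lemma 1.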
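This gives the following algorithm for the single-user case:

• At each time $t_k$, the user observes virtual queue Q[k] and chooses $\alpha(t_k)$ as the solution to (14) (where ties are broken arbitrarily).
• The value Q[k+1] is computed according to (12) at the end of the k-th frame.

C. Average power constraints via queue bounds

Lemma 1: If there is a constant $C \ge 0$ such that $Q[k] \le C$ for all $k \in \{0, 1, 2, \ldots\}$, then:

$$\limsup_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} p(\alpha(t)) \le \beta$$

Proof: From (12), we know that for each frame k:

$$Q[k+1] \ge Q[k] + p(\alpha(t_k)) - T[k]\beta$$

Rearranging terms and using $T[k] = t_{k+1} - t_k$ gives:

$$p(\alpha(t_k)) \le (t_{k+1} - t_k)\beta + Q[k+1] - Q[k]$$

Fix $K > 0$. Summing over $k \in \{0, 1, \ldots, K-1\}$ gives:

$$\sum_{k=0}^{K-1} p(\alpha(t_k)) \le (t_K - t_0)\beta + Q[K] - Q[0] \le t_K\beta + C$$

The sum power over the first K frames is the same as the sum up to time $t_K$ (no power is spent before $t_0$, since $\alpha(t) = 0$ whenever $F(t) = 0$), and so:

$$\sum_{t=0}^{t_K-1} p(\alpha(t)) \le t_K\beta + C$$

Dividing by $t_K$ gives:

$$\frac{1}{t_K}\sum_{t=0}^{t_K-1} p(\alpha(t)) \le \beta + \frac{C}{t_K}$$

Taking $K \to \infty$ then gives

$$\limsup_{K\to\infty} \frac{1}{t_K}\sum_{t=0}^{t_K-1} p(\alpha(t)) \le \beta,$$

which yields the result, since for a general T with $t_K \le T - 1 < t_{K+1}$ the sum up to time T contains at most one additional bounded term $p(\alpha(t_K))$.

The next lemma shows that the queue process under our proposed algorithm is deterministically bounded. Define:

$$p_{min} = \min_{\alpha\in\mathcal{A}\setminus\{0\}} p(\alpha), \qquad p_{max} = \max_{\alpha\in\mathcal{A}\setminus\{0\}} p(\alpha)$$

Lemma 2: If $Q[0] = 0$, then under our algorithm we have for all $k > 0$:

$$Q[k] \le \max\left\{\frac{VB}{p_{min}} + p_{max} - \beta,\ 0\right\}$$

Proof: First, consider the case when $p_{max} \le \beta$. From (12) and the fact that $T[k] \ge 1$ for all k, it is clear the queue can never increase, and so $Q[k] \le Q[0] = 0$ for all $k > 0$.

Next, consider the case when $p_{max} > \beta$. We prove the assertion by induction on k. The result trivially holds for $k = 0$. Suppose it holds at $k = l$ for some $l \ge 0$, so that:

$$Q[l] \le \frac{VB}{p_{min}} + p_{max} - \beta$$

We are going to prove that the same holds for $k = l+1$. There are two cases:

1) $Q[l] \le VB/p_{min}$. In this case we have by (12):

$$Q[l+1] \le Q[l] + p_{max} - \beta \le \frac{VB}{p_{min}} + p_{max} - \beta$$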

2) $VB/p_{min} < Q[l] \le VB/p_{min} + p_{max} - \beta$. In this case, if $p(\alpha(t_l)) = 0$ then the queue cannot increase, so:

$$Q[l+1] \le Q[l] \le \frac{VB}{p_{min}} + p_{max} - \beta$$

On the other hand, if $p(\alpha(t_l)) > 0$ then $p(\alpha(t_l)) \ge p_{min}$, and so the numerator in (14) satisfies:

$$VB\phi(\alpha(t_l)) - Q[l]p(\alpha(t_l)) \le VB - Q[l]p_{min} < 0$$

and so the maximizing ratio in (14) is negative. However, the maximizing ratio in (14) cannot be negative, because the alternative choice $\alpha(t_l) = 0$ would increase the ratio to 0. This contradiction implies that we cannot have $p(\alpha(t_l)) > 0$.

The above is a sample path result that uses only the facts that $\lambda > 0$ and $0 < p_{min} \le p(\alpha) \le p_{max}$ for all $\alpha \in \mathcal{A}\setminus\{0\}$. Thus, the algorithm meets the average power constraint even if the λ, B, and $\phi(\alpha(t))$ values used in the algorithm are only estimates of the true values.

D. Optimality over randomized algorithms

Consider the following class of i.i.d. randomized algorithms: Let $\theta(\alpha)$ be non-negative numbers defined for each $\alpha \in \mathcal{A}$, and suppose they satisfy $\sum_{\alpha\in\mathcal{A}}\theta(\alpha) = 1$. Let $\alpha^*(t)$ represent a policy that, every slot t for which $F(t) = 1$, chooses $\alpha^*(t) \in \mathcal{A}$ by independently selecting strategy α with probability $\theta(\alpha)$. Then $(p(\alpha^*(t_k)), \phi(\alpha^*(t_k)))$ are independent and identically distributed (i.i.d.) over frames k. Under this algorithm, it follows by the law of large numbers that the throughput and power expenditure satisfy (with probability 1):

$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1} B\phi(\alpha^*(t)) = \frac{B\,\mathbb{E}[\phi(\alpha^*(t_k))]}{1 + \mathbb{E}[\phi(\alpha^*(t_k))]/\lambda}$$
$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1} p(\alpha^*(t)) = \frac{\mathbb{E}[p(\alpha^*(t_k))]}{1 + \mathbb{E}[\phi(\alpha^*(t_k))]/\lambda}$$

It can be shown that the optimal value of problem (5)-(8) can be achieved over this class. Thus, there exists an i.i.d. randomized algorithm $\alpha^*(t)$ that satisfies:

$$\frac{B\,\mathbb{E}[\phi(\alpha^*(t_k))]}{1 + \mathbb{E}[\phi(\alpha^*(t_k))]/\lambda} = \mu^* \tag{15}$$
$$\frac{\mathbb{E}[p(\alpha^*(t_k))]}{1 + \mathbb{E}[\phi(\alpha^*(t_k))]/\lambda} \le \beta \tag{16}$$

where $\mu^*$ is the optimal throughput for the problem (5)-(8).

E. Key feature of the drift-plus-penalty ratio

Define H[k] as the system history up to frame k, which includes all random events that occurred before frame k, and also includes the queue value Q[k] (since this is determined by the random events before frame k).
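Returning briefly to Section II-D: for a finite action set, the benchmark $\mu^*$ in (15)-(16) can be computed directly. The Python sketch below rests on two observations that we add for illustration (they are not stated in the paper). First, the ratio in (15) equals $Bx/(1 + x/\lambda)$ with $x = \mathbb{E}[\phi(\alpha^*(t_k))]$, which is increasing in x. Second, constraint (16) is equivalent to the linear condition $\mathbb{E}[p(\alpha^*(t_k)) - \beta\phi(\alpha^*(t_k))/\lambda] \le \beta$. Maximizing x is then a linear program over the distribution θ with a single inequality, so some optimal θ is supported on at most two actions, and the sketch simply enumerates those candidates. The action table in the usage lines is hypothetical.

```python
from itertools import combinations

def optimal_throughput(actions, lam, B, beta):
    """Optimal throughput mu* over i.i.d. randomized policies, cf. (15)-(16).

    actions maps each action to (phi, p). Returns None if no policy is feasible.
    """
    # (16) rewritten as E[g(alpha)] <= beta with g = p - beta*phi/lam
    g = {a: p - beta * phi / lam for a, (phi, p) in actions.items()}
    best_x = None

    def consider(x):
        nonlocal best_x
        best_x = x if best_x is None else max(best_x, x)

    for a, (phi, _) in actions.items():      # pure policies
        if g[a] <= beta:
            consider(phi)
    for a, b in combinations(actions, 2):    # two-action mixtures with (16) tight
        if g[a] > g[b]:
            a, b = b, a
        if g[a] <= beta < g[b]:
            theta_b = (beta - g[a]) / (g[b] - g[a])
            consider((1 - theta_b) * actions[a][0] + theta_b * actions[b][0])
    if best_x is None:
        return None
    return B * best_x / (1 + best_x / lam)   # objective (15) as a function of E[phi]

# Hypothetical action table: action -> (phi, power)
acts = {0: (0.0, 0.0), 1: (0.2, 0.5), 2: (0.4, 1.5)}
print(optimal_throughput(acts, lam=0.5, B=1.0, beta=0.2))
print(optimal_throughput(acts, lam=0.5, B=1.0, beta=10.0))
```

With the tight budget β = 0.2, the best policy in this example mixes the idle action with action 1; with a loose budget the pure action 2 is feasible and optimal.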
Consider the algorithm that, on frame k, observes Q[k] and chooses $\alpha(t_k)$ according to (14). The following key feature of this algorithm can be shown (see [1] for related results):

$$\frac{\mathbb{E}[-VB\phi(\alpha(t_k)) + Q[k]p(\alpha(t_k)) \mid H[k]]}{\mathbb{E}[1 + \phi(\alpha(t_k))/\lambda \mid H[k]]} \le \frac{\mathbb{E}[-VB\phi(\alpha^*(t_k)) + Q[k]p(\alpha^*(t_k)) \mid H[k]]}{\mathbb{E}[1 + \phi(\alpha^*(t_k))/\lambda \mid H[k]]}$$

where $\alpha^*(t_k)$ is any (possibly randomized) alternative decision that is based only on H[k]. Using the i.i.d. decision $\alpha^*(t_k)$ from (15)-(16) in the above, and noting that this alternative decision is independent of H[k], gives:

$$\mathbb{E}[-VB\phi(\alpha(t_k)) + Q[k]p(\alpha(t_k)) \mid H[k]] \le (-V\mu^* + Q[k]\beta)\,\mathbb{E}[1 + \phi(\alpha(t_k))/\lambda \mid H[k]] \tag{17}$$

F. Performance theorem

Theorem 1: The proposed algorithm achieves the constraint $\limsup_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1} p(\alpha(t)) \le \beta$ and yields throughput satisfying (with probability 1):

$$\liminf_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1} B\phi(\alpha(t)) \ge \mu^* - \frac{C_0}{V} \tag{18}$$

where $C_0$ is a constant.

Proof: First, for any fixed V, Lemma 2 implies that the queue is deterministically bounded. Thus, according to Lemma 1, the proposed algorithm achieves the constraint $\limsup_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1} p(\alpha(t)) \le \beta$. The rest is devoted to proving the throughput guarantee (18). Define:

$$L(Q[k]) = \frac{1}{2}Q[k]^2$$

We call this a Lyapunov function. Define a frame-based Lyapunov drift as:

$$\Delta[k] = L(Q[k+1]) - L(Q[k])$$

According to (12) we get:

$$Q[k+1]^2 \le (Q[k] + p(\alpha(t_k)) - T[k]\beta)^2$$

Thus:

$$\Delta[k] \le \frac{(p(\alpha(t_k)) - T[k]\beta)^2}{2} + Q[k](p(\alpha(t_k)) - T[k]\beta)$$

Taking a conditional expectation of the above given H[k], and recalling that H[k] includes the information Q[k], gives:

$$\mathbb{E}[\Delta[k] \mid H[k]] \le C_0 + Q[k]\,\mathbb{E}[p(\alpha(t_k)) - \beta T[k] \mid H[k]] \tag{19}$$

where $C_0$ is a constant that satisfies the following for all possible histories H[k]:

$$\mathbb{E}\left[\frac{(p(\alpha(t_k)) - T[k]\beta)^2}{2} \,\Big|\, H[k]\right] \le C_0$$

Such a constant $C_0$ exists because the power $p(\alpha(t_k))$ is deterministically bounded, and the frame sizes T[k] are bounded in second moment regardless of history.

Adding the penalty $-\mathbb{E}[VB\phi(\alpha(t_k)) \mid H[k]]$ to both sides of (19) gives:

$$\begin{aligned}
\mathbb{E}[\Delta[k] - VB\phi(\alpha(t_k)) \mid H[k]] &\le C_0 + \mathbb{E}[-VB\phi(\alpha(t_k)) + Q[k](p(\alpha(t_k)) - T[k]\beta) \mid H[k]] \\
&= C_0 - Q[k]\beta\,\mathbb{E}[T[k] \mid H[k]] + \mathbb{E}[T[k] \mid H[k]]\,\frac{\mathbb{E}[-VB\phi(\alpha(t_k)) + Q[k]p(\alpha(t_k)) \mid H[k]]}{\mathbb{E}[T[k] \mid H[k]]}
\end{aligned}$$

Using $\mathbb{E}[T[k] \mid H[k]] = \mathbb{E}[1 + \phi(\alpha(t_k))/\lambda \mid H[k]]$ in the denominator of the last term gives:

$$\mathbb{E}[\Delta[k] - VB\phi(\alpha(t_k)) \mid H[k]] \le C_0 - Q[k]\beta\,\mathbb{E}[T[k] \mid H[k]] + \mathbb{E}[T[k] \mid H[k]]\,\frac{\mathbb{E}[-VB\phi(\alpha(t_k)) + Q[k]p(\alpha(t_k)) \mid H[k]]}{\mathbb{E}[1 + \phi(\alpha(t_k))/\lambda \mid H[k]]}$$

Substituting (17) into the above expression gives:

$$\begin{aligned}
\mathbb{E}[\Delta[k] - VB\phi(\alpha(t_k)) \mid H[k]] &\le C_0 - Q[k]\beta\,\mathbb{E}[T[k] \mid H[k]] + \mathbb{E}[T[k] \mid H[k]](-V\mu^* + \beta Q[k]) \\
&= C_0 - V\mu^*\,\mathbb{E}[T[k] \mid H[k]]
\end{aligned} \tag{20}$$

Rearranging gives:

$$\mathbb{E}[\Delta[k] + V(\mu^* T[k] - B\phi(\alpha(t_k))) \mid H[k]] \le C_0 \tag{21}$$

The above is a drift-plus-penalty expression. Because we already know the queue Q[k] is deterministically bounded, and the frame sizes T[k] have bounded second moments, it follows that:

$$\sum_{k=1}^{\infty}\frac{\mathbb{E}[T[k]^2]}{k^2} < \infty$$

Thus, the drift-plus-penalty result in Proposition 2 of [18] ensures that (with probability 1):

$$\limsup_{K\to\infty}\frac{1}{K}\sum_{k=0}^{K-1}\left[\mu^* T[k] - B\phi(\alpha(t_k))\right] \le \frac{C_0}{V}$$

Thus, for any $\epsilon > 0$ one has for all sufficiently large K:

$$\frac{1}{K}\sum_{k=0}^{K-1}\left[\mu^* T[k] - B\phi(\alpha(t_k))\right] \le \frac{C_0}{V} + \epsilon$$

Rearranging implies that for all sufficiently large K:

$$\frac{\sum_{k=0}^{K-1} B\phi(\alpha(t_k))}{\sum_{k=0}^{K-1} T[k]} \ge \mu^* - \frac{(C_0/V + \epsilon)K}{\sum_{k=0}^{K-1} T[k]} \ge \mu^* - (C_0/V + \epsilon)$$

where the final inequality holds because $T[k] \ge 1$ for all k. Thus:

$$\liminf_{K\to\infty}\frac{\sum_{k=0}^{K-1} B\phi(\alpha(t_k))}{\sum_{k=0}^{K-1} T[k]} \ge \mu^* - (C_0/V + \epsilon)$$

The above holds for all $\epsilon > 0$. Taking a limit as $\epsilon \to 0$ implies:

$$\liminf_{K\to\infty}\frac{\sum_{k=0}^{K-1} B\phi(\alpha(t_k))}{\sum_{k=0}^{K-1} T[k]} \ge \mu^* - C_0/V,$$

which yields the result by noticing that $\phi(\alpha(t))$ only changes at the boundary of each frame.

The theorem shows that throughput can be pushed within O(1/V) of the optimal value $\mu^*$, where V can be chosen as large as desired to ensure throughput is arbitrarily close to optimal. The tradeoff is a queue bound that grows linearly with V according to Lemma 2, which affects the convergence time required for the constraints to be close to the desired time averages (as described in the proof of Lemma 1).

III. MULTI-USER FILE DOWNLOADING

This section considers a multi-user file downloading system that consists of N single user subsystems. Each subsystem is similar to the single-user system described in the previous section. Specifically, for the n-th user (where $n \in \{1, \ldots, N\}$):

• The file state process is $F_n(t) \in \{0, 1\}$.
• The transmission decision is $\alpha_n(t) \in \mathcal{A}_n$, where $\mathcal{A}_n$ is an abstract set of transmission options for user n.
• The power expenditure on slot t is $p_n(\alpha_n(t))$.
• The success probability on a slot t for which $F_n(t) = 1$ is $\phi_n(\alpha_n(t))$, where $\phi_n(\cdot)$ is the function that describes file completion probability for user n.
• The idle period parameter is $\lambda_n > 0$.
• The average file size is $B_n$ bits.

Assume that the random variables associated with different subsystems are mutually independent. To control the downloading process, there is a central server with only M threads ($M < N$), meaning that at most M jobs can be processed simultaneously. So at each time slot, the server has to make decisions selecting at most M out of N users to transmit a portion of their files. These decisions are further restricted by a global time average power constraint. The goal is to maximize the aggregate throughput, which is defined as

$$\liminf_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\sum_{n=1}^{N} c_n B_n\phi_n(\alpha_n(t))$$

where $c_1, c_2, \ldots, c_N$ are a collection of positive weights that can be used to prioritize users. Thus, this multi-user file downloading problem reduces to the following:

$$\text{Max:} \quad \liminf_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\sum_{n=1}^{N} c_n B_n\phi_n(\alpha_n(t)) \tag{22}$$
$$\text{S.t.:} \quad \limsup_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\sum_{n=1}^{N} p_n(\alpha_n(t)) \le \beta \tag{23}$$
$$\sum_{n=1}^{N} I(\alpha_n(t)) \le M \quad \forall t \in \{0, 1, 2, \ldots\} \tag{24}$$
$$\Pr[F_n(t+1) = 1 \mid F_n(t) = 0] = \lambda_n \tag{25}$$
$$\Pr[F_n(t+1) = 0 \mid F_n(t) = 1] = \phi_n(\alpha_n(t)) \tag{26}$$

where the constraints (25)-(26) hold for all $n \in \{1, \ldots, N\}$ and $t \in \{0, 1, 2, \ldots\}$, and where $I(\cdot)$ is the indicator function defined as:

$$I(x) = \begin{cases} 0, & \text{if } x = 0; \\ 1, & \text{otherwise.} \end{cases}$$

A. Lyapunov Indexing Algorithm

This section develops our indexing algorithm for the multi-user case using the single-user case as a stepping stone. The major difficulty is the instantaneous constraint $\sum_{n=1}^{N} I(\alpha_n(t)) \le M$. Temporarily neglecting this constraint, we use Lyapunov optimization to deal with the time average power constraint first. We introduce a virtual queue Q(t), which is again 0 at t = 0. Instead of updating it on a frame basis, the server updates this queue every slot as follows:

$$Q(t+1) = \max\left\{Q(t) + \sum_{n=1}^{N} p_n(\alpha_n(t)) - \beta,\ 0\right\} \tag{27}$$

Define $\mathcal{N}(t)$ as the set of users beginning their renewal frames at time t, so that $F_n(t) = 1$ for all such users. In general, $\mathcal{N}(t)$ is a subset of $\mathcal{N} = \{1, 2, \ldots, N\}$. Define $|\mathcal{N}(t)|$ as the number of users in the set $\mathcal{N}(t)$. At each time slot t, the server observes the queue state Q(t) and chooses $(\alpha_1(t), \ldots, \alpha_N(t))$ to maximize the following drift-plus-penalty expression subject to an instantaneous constraint:

$$\text{Max.:} \quad \sum_{n\in\mathcal{N}(t)} \frac{Vc_nB_n\phi_n(\alpha_n(t)) - Q(t)p_n(\alpha_n(t))}{1 + \phi_n(\alpha_n(t))/\lambda_n} \tag{28}$$
$$\text{S.t.:} \quad \alpha_n(t) \in \mathcal{A}_n \quad \forall n \in \mathcal{N} \tag{29}$$
$$\alpha_n(t) = 0 \quad \forall n \notin \mathcal{N}(t) \tag{30}$$
$$\sum_{n\in\mathcal{N}(t)} I(\alpha_n(t)) \le M \tag{31}$$

Notice that in (28), the term

$$g_n(\alpha_n(t)) \triangleq \frac{Vc_nB_n\phi_n(\alpha_n(t)) - Q(t)p_n(\alpha_n(t))}{1 + \phi_n(\alpha_n(t))/\lambda_n}$$

is similar to the expression (14) used in the single-user optimization. Call $g_n(\alpha_n(t))$ a reward. Now define an index for each subsystem n by:

$$\gamma_n(t) \triangleq \max_{\alpha_n(t)\in\mathcal{A}_n} g_n(\alpha_n(t)) \tag{32}$$

which is the maximum possible reward one can get from the n-th subsystem at time slot t. Thus, it is natural to define the following myopic algorithm: find the (at most) M subsystems in $\mathcal{N}(t)$ with the greatest rewards, and serve these with their corresponding optimal $\alpha_n(t)$ options in $\mathcal{A}_n$ that maximize $g_n(\alpha_n(t))$. Specifically:

• At each time slot t, the server observes the virtual queue state Q(t) and computes the indices using (32) for all $n \in \mathcal{N}(t)$.
• Activate the $\min[M, |\mathcal{N}(t)|]$ subsystems with the greatest indices, using their corresponding actions $\alpha_n(t) \in \mathcal{A}_n$ that maximize $g_n(\alpha_n(t))$.
• Update Q(t) according to (27) at the end of each slot t.

B. Theoretical Performance Analysis

In this subsection, we show that the above algorithm always satisfies the desired time average power constraint. Define:

$$p_n^{min} = \min_{\alpha_n\in\mathcal{A}_n\setminus\{0\}} p_n(\alpha_n), \qquad p_{min} = \min_n p_n^{min}$$
$$p_n^{max} = \max_{\alpha_n\in\mathcal{A}_n} p_n(\alpha_n), \qquad c_{max} = \max_n c_n, \qquad B_{max} = \max_n B_n$$

Lemma 3: Under the above Lyapunov indexing algorithm, the queue $\{Q(t)\}$ is deterministically bounded. Specifically, we have for all $t \in \{0, 1, 2, \ldots\}$:

$$Q(t) \le \max\left\{\frac{Vc_{max}B_{max}}{p_{min}} + \sum_{n=1}^{N} p_n^{max} - \beta,\ 0\right\}$$

Proof: First, consider the case when $\sum_{n=1}^{N} p_n^{max} \le \beta$. Since $Q(0) = 0$, it is clear from the updating rule (27) that Q(t) will remain 0 for all t.

Next, consider the case when $\sum_{n=1}^{N} p_n^{max} > \beta$. We prove the assertion by induction on t. The result trivially holds for t = 0. Suppose at $t = t'$ we have:

$$Q(t') \le \frac{Vc_{max}B_{max}}{p_{min}} + \sum_{n=1}^{N} p_n^{max} - \beta$$

We are going to prove that the same statement holds for $t = t' + 1$. We further divide it into two cases:

1) $Q(t') \le Vc_{max}B_{max}/p_{min}$. In this case, since the queue increases by at most $\sum_{n=1}^{N} p_n^{max} - \beta$ on one slot, we have:

$$Q(t'+1) \le \frac{Vc_{max}B_{max}}{p_{min}} + \sum_{n=1}^{N} p_n^{max} - \beta$$

2) $Vc_{max}B_{max}/p_{min} < Q(t') \le Vc_{max}B_{max}/p_{min} + \sum_{n=1}^{N} p_n^{max} - \beta$. In this case, since $\phi_n(\alpha_n(t')) \le 1$, any action $\alpha_n(t') \ne 0$ satisfies

$$Vc_nB_n\phi_n(\alpha_n(t')) - Q(t')p_n(\alpha_n(t')) \le Vc_{max}B_{max} - Q(t')p_{min} < 0,$$

so its reward is negative, while $\alpha_n(t') = 0$ gives reward 0. Thus $\alpha_n(t')$ must be 0 for all n, and all indices are 0. This implies that $Q(t'+1)$ cannot increase, and we get $Q(t'+1) \le Q(t') \le Vc_{max}B_{max}/p_{min} + \sum_{n=1}^{N} p_n^{max} - \beta$.

Theorem 2: The proposed Lyapunov indexing algorithm achieves the constraint:

$$\limsup_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\sum_{n=1}^{N} p_n(\alpha_n(t)) \le \beta$$

Proof: Using Lemma 1 under the special case that each frame only occupies one slot, we get that if $\{Q(t)\}$ is deterministically bounded, then the time average constraint is satisfied. Then, according to Lemma 3, we are done.
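The Lyapunov indexing policy of Section III-A can be sketched in Python as follows for binary action sets $\mathcal{A}_n = \{0, 1\}$, using the parameters of the first experiment in Section IV-A. This is an illustrative re-implementation under our own assumptions (all users start idle, exact ties at zero reward are resolved toward idling, and the expected reward $c_nB_n\phi_n(1)$ is credited whenever user n is served), not the authors' simulator. The checks afterwards compare the largest observed queue with the bound of Lemma 3 and the average power with the sample-path inequality behind Theorem 2.

```python
import random

def lyapunov_indexing(lam, mu, q_succ, power, c, beta, V, M, T, seed=0):
    """Simulate Lyapunov indexing with A_n = {0, 1} and geometric file sizes.

    phi_n(1) = q_succ[n] * mu[n] as in (10), and B_n = 1/mu[n] packets.
    Returns (average reward, average power, final Q, largest Q).
    """
    rng = random.Random(seed)
    N = len(lam)
    B = [1.0 / m for m in mu]
    phi = [q * m for q, m in zip(q_succ, mu)]
    F = [0] * N                                  # all users start idle
    Q = max_Q = reward = energy = 0.0

    def g1(n):
        """Reward g_n(1); g_n(0) = 0, so the index (32) is max(g_n(1), 0)."""
        return (V * c[n] * B[n] * phi[n] - Q * power[n]) / (1.0 + phi[n] / lam[n])

    for _ in range(T):
        active = [n for n in range(N) if F[n] == 1]             # the set N(t)
        ranked = sorted(active, key=g1, reverse=True)
        served = [n for n in ranked[:M] if g1(n) > 0]           # action 1 only if it beats idling
        slot_power = sum(power[n] for n in served)
        reward += sum(c[n] * B[n] * phi[n] for n in served)
        energy += slot_power
        for n in range(N):                                      # transitions (25)-(26)
            if F[n] == 0:
                F[n] = 1 if rng.random() < lam[n] else 0
            elif n in served and rng.random() < phi[n]:
                F[n] = 0
        Q = max(Q + slot_power - beta, 0.0)                     # queue update (27)
        max_Q = max(max_Q, Q)
    return reward / T, energy / T, Q, max_Q

params = dict(lam=[0.8, 0.5, 0.1], mu=[0.1, 0.2, 0.4], q_succ=[0.9, 0.8, 0.7],
              power=[2.0, 1.5, 1.0], c=[1.0, 1.5, 2.0], beta=1.0, M=1)
V, T = 10.0, 50_000
thr, avg_power, Q_final, max_Q = lyapunov_indexing(V=V, T=T, **params)
bound = (V * max(params["c"]) * (1.0 / min(params["mu"])) / min(params["power"])
         + sum(params["power"]) - params["beta"])
print(f"throughput {thr:.3f}, average power {avg_power:.3f}, max Q {max_Q:.1f} <= {bound:.1f}")
```

Sorting active users by $g_n(1)$ and keeping only positive rewards is equivalent to activating the top $\min[M, |\mathcal{N}(t)|]$ indices with their maximizing actions when $\mathcal{A}_n = \{0, 1\}$. Increasing V raises the Lemma 3 bound linearly, mirroring the tradeoff discussed after Theorem 1.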

IV. SIMULATION EXPERIMENTS

In this section, we demonstrate the near optimality of the multi-user Lyapunov indexing algorithm by extensive simulations. In the first part, we simulate the case in which the file length distribution is geometric, and show that the suboptimality gap is extremely small. In the second part, we test the robustness of our algorithm for more general scenarios in which the file length distribution is not geometric. For simplicity, it is assumed throughout that all transmissions send a fixed sized packet, all files are an integer number of these packets, and that decisions $\alpha_n(t) \in \mathcal{A}_n$ affect the success probability of the transmission as well as the power expenditure.

A. Lyapunov Indexing for multi-user downloading with geometric file length

In the first simulation we use N = 3, M = 1 with action set $\mathcal{A}_n = \{0, 1\}$ for all n, and idle period parameters $\lambda_1 = 0.8$, $\lambda_2 = 0.5$, $\lambda_3 = 0.1$. Files consist of an integer number of packets and have independent and geometrically distributed sizes with parameters $\mu_1 = 0.1$, $\mu_2 = 0.2$, and $\mu_3 = 0.4$, so that the expected file size for user $n \in \{1, 2, 3\}$ is $B_n = 1/\mu_n$ packets. The success probability functions are $\phi_1(1) = 0.9\mu_1$, $\phi_2(1) = 0.8\mu_2$, $\phi_3(1) = 0.7\mu_3$; the power expenditure functions are $p_1(1) = 2$, $p_2(1) = 1.5$, $p_3(1) = 1$; the weight parameters are $c_1 = 1$, $c_2 = 1.5$, $c_3 = 2$; and $\beta = 1$. The algorithm is run for 1 million slots. We compare the performance of our algorithm with the optimal randomized policy. The optimal policy is computed by constructing composite states (i.e., if queue 1 is at state 0, queue 2 is at state 1 and queue 3 is at state 1, we view (0, 1, 1) as a composite state) and then reformulating this MDP into a linear program (see [9]) which contains 20 variables.

In Fig. 1, we show that as our tradeoff parameter V gets larger, the objective value approaches the optimal value and achieves a near optimal performance. Fig. 2 and Fig. 3 show that V also affects the virtual queue size and the constraint gap.
As V gets larger, the average virtual queue size becomes larger and the gap becomes smaller. We also plot the upper bound on the queue size derived from Lemma 3 in Fig. 3, demonstrating that the queue is bounded.

In the second simulation, we explore the parameter space and demonstrate that in general the suboptimality gap of our algorithm is negligible. First, we define the relative error as follows:

$$\text{relative error} = \frac{|OBJ - OPT|}{OPT} \tag{33}$$

where OBJ is the objective value after running 1 million slots of our algorithm and OPT is the optimal value. We first explore the system parameters by letting the $\lambda_n$'s and $\mu_n$'s take random values between 0 and 1, choosing V = 70 and fixing the remaining parameters the same as in the last experiment. We conduct 1000 Monte-Carlo experiments and calculate the average relative error, which is 0.00064. Next, we explore the control parameters by letting the $p_n(1)$ and $\phi_n(1)/\mu_n$ values take random values between 0 and 1, choosing V = 70 and fixing the remaining parameters the same as in the first simulation. The relative error is 0.00077. Both experiments show an extremely small suboptimality gap.

[Fig. 1. Throughput versus tradeoff parameter V (curves: Lyapunov Indexing, Optimal).]

[Fig. 2. The time average power consumption versus tradeoff parameter V (y-axis: Average Power Expenditure; curve: Lyapunov Indexing).]

[Fig. 3. Average virtual queue backlog versus tradeoff parameter V (curves: Average Queuesize, Queuesize bound).]

B. Lyapunov indexing for multi-user downloading with non-memoryless file lengths

In this part, we test the sensitivity of the algorithm to different file length distributions. In particular, the uniform distribution and the Poisson distribution are implemented, while our algorithm still treats each of them as a geometric distribution with the same mean. We then compare their throughputs with the geometric case.

We still use N = 3, M = 1 with action set $\mathcal{A}_n = \{0, 1\}$ for all n. For the uniform distribution case, the file lengths of the three subsystems are uniformly distributed over [5, 15], [2, 8] and [1, 5] packets, respectively, with integer packet numbers. For the Poisson distribution case, the Poisson parameters are set to ensure means of 10, 5 and 3 packets, respectively. We then keep the remaining conditions the same as in the first simulation scenario in Section IV-A. In the algorithm we use $\phi_n(\alpha_n)$ functions defined using parameters $B_n = 1/\mu_n$ with $\mu_1 = 1/10$, $\mu_2 = 1/5$, $\mu_3 = 1/3$. While the decisions are made using these values, the effect of these decisions incorporates the actual (non-memoryless) file sizes. Fig. 4 shows the throughput-versus-V relation for the two non-memoryless cases and the memoryless case with matched means. Remarkably, the curves are almost indistinguishable. This illustrates that the indexing algorithm is robust under different file length distributions.

[Fig. 4. Throughput versus tradeoff parameter V under different file length distributions (curves: Geometric, Uniform, Poisson).]

V. CONCLUSIONS

We have investigated a file downloading system where the network delays affect the file arrival processes. The single-user case was solved by a variable frame length Lyapunov optimization method.
The technique was extended as a well-reasoned heuristic for the multi-user case. Such heuristics are important because the problem is a multi-dimensional Markov decision problem with very high complexity. The heuristic is simple, can be implemented in an online fashion, and was analytically shown to achieve the desired average power constraint. While we do not have a proof of throughput optimality for the multi-user case, simulations suggest that the algorithm is very close to optimal. Further, simulations suggest that non-memoryless file lengths can be accurately approximated by the algorithm. These methods can likely be applied in more general situations of restless multi-armed bandit problems with constraints.

REFERENCES

[1] M. J. Neely. Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool, 2010.
[2] L. Tassiulas and A. Ephremides. Dynamic server allocation to parallel queues with randomly varying connectivity. IEEE Transactions on Information Theory, vol. 39, no. 2, pp. 466-478, March 1993.
[3] A. Stolyar. Maximizing queueing network utility subject to stability: Greedy primal-dual algorithm. Queueing Systems, vol. 50, no. 4, pp. 401-457, 2005.
[4] L. Georgiadis, M. J. Neely, and L. Tassiulas. Resource allocation and cross-layer control in wireless networks. Foundations and Trends in Networking, vol. 1, no. 1, pp. 1-149, 2006.
[5] A. Eryilmaz and R. Srikant. Fair resource allocation in wireless networks using queue-length-based scheduling and congestion control. IEEE/ACM Transactions on Networking, vol. 15, no. 6, pp. 1333-1344, Dec. 2007.
[6] M. J. Neely, E. Modiano, and C. Li. Fairness and optimal stochastic control for heterogeneous networks. IEEE/ACM Transactions on Networking, vol. 16, no. 2, pp. 396-409, April 2008.
[7] S. Liu, L. Ying, and R. Srikant. Throughput-optimal opportunistic scheduling in the presence of flow-level dynamics. IEEE/ACM Trans. Networking, vol. 19, no. 4, pp. 1057-1070, Jan. 2011.
[8] L. Huang and M. J. Neely. Utility optimal scheduling in energy-harvesting networks. IEEE/ACM Trans. Networking, vol. 21, no. 4, pp. 1117-1130, Aug. 2013.
[9] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2005.
[10] M. J. Neely. Asynchronous control for coupled Markov decision systems. In Proc. Information Theory Workshop (ITW), pp. 287-291, Sept. 2012.
[11] D. Dolgov and E. Durfee. Optimal resource allocation and policy formulation in loosely-coupled Markov decision processes. In Proc. ICAPS, pp. 315-324, June 2004.
[12] N. Meuleau, M. Hauskrecht, K.-E. Kim, L. Peshkin, L. P. Kaelbling, T. Dean, and C. Boutilier. Solving very large weakly coupled Markov decision processes. In Proc. 15th National Conf. on Artificial Intelligence, 1998.
[13] P. Whittle. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, vol. 25, pp. 287-298, 1988.
[14] C. H. Papadimitriou and J. N. Tsitsiklis. The complexity of optimal queueing network control. Math. Oper. Res., vol. 24, no. 2, pp. 293-305, May 1999.
[15] K. Liu and Q. Zhao. Indexability of restless bandit problems and optimality of Whittle's index for dynamic multichannel access. IEEE Trans. Inf. Theory, vol. 56, no. 11, pp. 5547-5567, Nov. 2010.
[16] T. Javidi, B. Krishnamachari, Q. Zhao, and M. Liu. Optimality of myopic sensing in multi-channel opportunistic access. In Proc. IEEE ICC, May 2008.
[17] W. Ouyang, S. Murugesan, A. Eryilmaz, and N. B. Shroff. Exploiting channel memory for joint estimation and scheduling in downlink networks. In Proc. IEEE INFOCOM, 2011.
[18] M. J. Neely. Stability and probability 1 convergence for queueing networks via Lyapunov optimization. Journal of Applied Mathematics, doi:10.1155/2012/831909, 2012.
[19] B. Fox. Markov renewal programming by linear fractional programming. SIAM Journal on Applied Mathematics, vol. 14, no. 6, pp. 1418-1432, Nov. 1966.