Throughput Optimal Scheduling for Wireless Downlinks with Reconfiguration Delay

Vineeth Bala Sukumaran, vineethbs@gmail.com, Department of Avionics, Indian Institute of Space Science and Technology.

Abstract: We consider wireless downlinks where the base station dynamically switches between different users in order to transmit data intended for the respective users. When the base station switches from serving one user to another, there is a reconfiguration delay. For such wireless downlinks with reconfiguration delay, we consider the problem of throughput optimal scheduling. We propose the 1-lookahead scheduling policy and analytically show that it is throughput optimal. We obtain the 1-lookahead policy by using an approximate solution to a Markov decision process formulation of the scheduling problem. The approximate solution is also used to explain the biased max-weight form of 1-lookahead as well as an existing policy.

Index Terms: Wireless downlink, random connectivity, reconfiguration delay, stability region, throughput optimality.

I. INTRODUCTION

We consider a wireless downlink model where the base station switches between different users in order to transmit data intended for the respective users. When switching between users, the logical links between the user and the base station need to be configured. For example, the user's state may need to be changed from an idle to an active state. This incurs a reconfiguration delay, which is the delay between the time at which the base station scheduler decides to serve the user and the time at which the actual data transmission starts. We note that such reconfiguration delay also arises in other cases, such as satellite systems with mechanically steered antennae, electronic beamforming, optical routers [1], and radio transceivers [3]. The other important feature of such systems is that the service of a user is also affected by the connectivity of that user to the service station.
For example, the connectivity of the user is through a wireless channel subject to fading; therefore the connectivity is random over time. Motivated by such scenarios, we consider the stability region and throughput optimal scheduling for a wireless downlink model with random connectivity over time and reconfiguration delay (see Fig. 1).

For wireless networks with reconfiguration delay, max-weight policies [7] are not throughput optimal, since such policies switch between queues very frequently [2]. Prior work in [1], [2], and [4] had proposed heuristic throughput optimal policies for such systems. Celik et al. [2] proposed the variable frame max-weight (VFMW) policy, which reduces reconfiguration delay overhead by restricting switching to happen only at the ends of scheduling frames, for wireless networks with random connectivity and reconfiguration delay. Hsieh et al. [4] proposed the queue biased max-weight (QBMW) policy, which is throughput optimal for a queueing system with reconfiguration delay. The average delay performance of VFMW policies was improved by the QBMW policy. However, why the specific biased form of the QBMW policy was needed was not addressed in their paper. In this paper, we propose the 1-lookahead policy, which is another throughput optimal policy for systems with reconfiguration delay. In contrast to VFMW and QBMW, we use a formal Markov decision theoretic formulation to analytically motivate the need for the bias term. The same approach can be used to motivate the bias term used for QBMW policies.

Fig. 1. An example wireless downlink with two users and a base station (BS). The data for the users have random packet arrivals $A_i(t)$ into their packet buffers at the BS. The BS can transmit at most one packet to one user from its queue in a slot. The connections between the BS and the users are randomly on or off across time slots. When the BS switches from serving one user to another, there is a reconfiguration delay of $\Delta$ slots.
Outline, Contributions, and Notation: In Section II we discuss the queueing model that we use for analyzing the wireless downlink shown in Fig. 1, as well as its stability region. Our main contribution in this paper is the analytically well-motivated throughput optimal 1-lookahead policy, which we propose in Section III. We then discuss a Markov decision process formulation, which is used to motivate the definition of 1-lookahead policies, in Section IV. Our first secondary contribution is that we explain the bias term appearing in the QBMW policy, which has a similar form to 1-lookahead. Another secondary contribution is in the proof of throughput optimality of 1-lookahead and its relation to the proof of throughput optimality of QBMW; we provide simplifications and corrections to the proof in [4]. In this paper, all vectors are column vectors and vector transposes are denoted by $(\cdot)^T$.

II. SYSTEM MODEL AND PROBLEM FORMULATION

We consider a system of $N$ parallel queues served by a single server to model a wireless downlink for $N$ users (see footnote 1 and Fig. 1 for an example with $N = 2$). The system evolves in slotted time with the slots indexed by $t \in \{0, 1, 2, \ldots\}$. In each slot $t$, a random number $A_i(t)$ of packets arrive to the base station (BS) destined for the $i$-th user, for every $i \in \{1, 2, \ldots, N\}$. For every user, these packets are queued in a separate infinite length buffer at the BS. We assume that the random process $(A_i(t), t \geq 0)$ is an independent and identically distributed (IID) process, e.g., $A_i(t)$ can be modelled by a Bernoulli process. The arrival processes to different buffers are also assumed to be independent. We denote the arrival rate $E[A_i(0)]$ as $\lambda_i$ and the column vector $(\lambda_1, \ldots, \lambda_N)^T$ as $\lambda$.

The BS scheduler decides which user's queue is served in a slot; $S_i(t) = 1$ if the scheduler decides to serve user $i$ in slot $t$ and 0 otherwise. We assume that $\sum_{i=1}^{N} S_i(t) = 1$, i.e., at most one user can be served in a slot. We assume that there is a connection random process $(C_i(t) \in \{0, 1\}, t \geq 0)$ associated with user $i$ that models whether the BS is connected to user $i$ in slot $t$. We assume that the processes $(C_i(t), t \geq 0)$ are IID. We denote the average connection rate of queue $i$ as $\mu_i$, i.e., $E[C_i(0)] = \mu_i$. We also assume that $(C_i(t), t \geq 0)$ and $(C_j(t), t \geq 0)$ for two different queues $i$ and $j$ are independent. We assume that the channel connectivity $C_j(t)$, for all $j$, is not known to the scheduler at the start of slot $t$ (see footnote 2). We assume that if the BS is connected to user $i$ and the scheduler decides to serve user $i$, then at most one packet is removed from queue $i$ in slot $t$.

We assume that there is a reconfiguration delay of $\Delta$ slots (see Fig. 1) if the server switches from serving one queue to another. We assume that $\Delta \geq 1$. We denote the number of slots needed to finish reconfiguration at time $t$ by $R(t)$. If the scheduler switches from one queue to another at slot $t$, then $R(t) = \Delta$, and $R(t)$ then decrements by one in every slot until $R(t) = 0$, provided the scheduler stays with the queue that it has switched to. Once reconfiguration is finished, $R(t)$ stays at zero until the scheduler switches to another queue.

Footnote 1: We note that the model can also apply to a wireless uplink, but with additional assumptions on the availability of queue length information at a centralized scheduler.
We note that a packet will be removed from the $i$-th queue in the $t$-th slot only if $C_i(t) S_i(t) \mathbb{I}\{R(t) = 0\} = 1$. We define $I_i(t) = C_i(t) S_i(t) \mathbb{I}\{R(t) = 0\}$, which is called the service opportunity for queue $i$ in slot $t$. We denote the number of packets in the $i$-th queue at the beginning of slot $t$ as $Q_i(t)$. Then

$Q_i(t + 1) = (Q_i(t) - I_i(t))^+ + A_i(t)$, (1)

where $(x)^+ = \max(x, 0)$. We note that the packets which arrive in slot $t$ are assumed to stay in the system for at least one slot.

A policy $\mu$ is the sequence of decisions $(S(1), S(2), \ldots)$, where $S(t)$ is the vector $(S_1(t), \ldots, S_N(t))^T$. The time average total queue length under a policy $\mu$ is $\limsup_{T \to \infty} \frac{1}{T} E\left[\sum_{t=0}^{T-1} \sum_{i=1}^{N} Q_i(t)\right]$ and is denoted as $q(\mu)$. The queueing system with an arrival rate $\lambda$ is said to be stable under the policy $\mu$ if the time average total queue length $q(\mu)$ is finite. The stability region $\Lambda$ of the system is the set of arrival rate vectors $\lambda$ where, for each $\lambda$, there exists a policy $\mu$, possibly dependent on $\lambda$, such that the queueing system is stable. From [3] we have that if $\lambda \in \Lambda$, then there exist $\beta_j$, for all $j$, with $\beta_j \geq 0$ and $\sum_j \beta_j \leq 1$, such that $\lambda \leq \sum_j \beta_j I_j \mu_j$, where $I_j$ is a column vector of length $N$ with 1 at the $j$-th position and zero otherwise. It is known that the VFMW policy from [2] can be used to achieve any point in the stability region $\Lambda$. We note that both VFMW and QBMW are policies that are throughput optimal for the above queueing system. In the next section, we propose another throughput optimal policy, 1-lookahead; the form of this policy is analytically motivated using a Markov decision theoretic formulation.

Footnote 2: Since the $C_i(t)$ are IID, this information is not useful.

III. THE 1-LOOKAHEAD POLICY

For the queueing model, any scheduling policy at the start of a slot needs to decide whether to keep serving the current queue or to switch to any other queue. Intuitively, we should serve a queue that has a large queue length and the largest possible throughput, since the reduction in queue length would then be the greatest.
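The slot-by-slot dynamics of Section II can be sketched in a few lines of Python. This is a minimal illustration under the stated model, not the author's code: the function name `step`, its argument names, and the use of `delta` for the reconfiguration delay are assumptions made here. The counter is updated first (a switch resets it to `delta`, a stay decrements it towards zero), and a packet is removed only when the chosen queue is connected and the counter is zero, as in the queue update (1).

```python
def step(queues, serving, recon, choice, connected, arrivals, delta):
    """Advance the downlink model by one slot.

    queues    -- queue lengths Q_i(t) at the start of the slot
    serving   -- index of the queue the server was attached to in slot t-1
    recon     -- R(t-1), remaining reconfiguration slots after slot t-1
    choice    -- index of the queue the scheduler picks in slot t
    connected -- 0/1 connectivity C_i(t) for every queue
    arrivals  -- packet arrivals A_i(t) for every queue
    delta     -- reconfiguration delay in slots (assumed >= 1)
    """
    if choice != serving:
        recon = delta              # a switch restarts the countdown
    else:
        recon = max(recon - 1, 0)  # otherwise count down towards zero

    new_queues = []
    for i, (q, c, a) in enumerate(zip(queues, connected, arrivals)):
        # Service opportunity I_i(t) = C_i(t) * S_i(t) * 1{R(t) = 0}
        service = c if (i == choice and recon == 0) else 0
        new_queues.append(max(q - service, 0) + a)
    return new_queues, choice, recon
```

With `delta = 2`, a switch made in some slot yields no service in that slot or the next one, and the first possible service occurs two slots after the switch.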
This is the intuition behind the decision rule that is proposed as the 1-lookahead policy. We define our 1-lookahead policy in Algorithm 1. We assume that the server is serving queue $i$ at $t - 1$ in Algorithm 1. For the 1-lookahead policy, the weights $W_i(t)$ or $W_j(t)$ can be calculated as the expected sum throughput that can be obtained in the immediate future, consisting of $\Delta + 1$ slots, under the decision of staying with queue $i$ or switching to the $j$-th queue. If we stay with queue $i$ at $t$, the expected sum throughput, i.e., $W_i(t)$, is given by $(\Delta + 1)\mu_i$. If we decide to switch to queue $j$ at $t$, then the expected sum throughput, i.e., $W_j(t)$, is given by $\mu_j$. We find that a heuristic modification to the weight $W_i(t)$ is required to show that the 1-lookahead policy is in fact throughput optimal. We redefine $W_i(t)$ in terms of $F(Q(t))$, where $F(Q(t)) = \max\left(1, \sum_i Q_i(t)\right)^{\alpha}$ and $\alpha \in (0, 1)$.

Algorithm 1: 1-lookahead policy
1: Calculate the weights $W_j(t)$ for all $j \in \{1, 2, \ldots, N\}$.
2: If $W_i(t) Q_i(t) < W_j(t) Q_j(t)$, then switch to the $j$-th queue; else stay with the $i$-th queue.

A natural question that arises in the definition of 1-lookahead policies is why we are looking at the expected total throughput for $\Delta + 1$ slots and not $\Delta + m$ slots for some $m \geq 1$. In the next section, we motivate this and the exact form of the 1-lookahead policy by considering a Markov decision process (MDP) formulation for the problem of minimizing the time average total queue length for the queueing system discussed in our paper. We show that the 1-lookahead policy arises from a heuristic approximate solution to the average cost optimality equation for a Markov decision process formulated for the queueing system in our paper. The reason why $m = 1$ is related to the need for keeping the policy specification simple.
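The decision rule of Algorithm 1 can be sketched as follows. This is an illustrative sketch rather than the paper's implementation: the names `lookahead_weights`, `one_lookahead_decision`, and `delta` are hypothetical, and the weights used are the unmodified expected-throughput weights for IID connectivity (staying leaves `delta + 1` potential service slots with mean `mu[i]` each, switching leaves one slot with mean `mu[j]`). The heuristic modification involving $F(Q(t))$ is not included, and choosing the best competing queue when several qualify is an assumption of this sketch.

```python
def lookahead_weights(mu, current, delta):
    """Expected throughput over the next delta + 1 slots for every queue."""
    # Staying with the current queue: delta + 1 possible service slots.
    # Switching to another queue: only the last slot can carry service.
    return [(delta + 1) * m if j == current else m for j, m in enumerate(mu)]


def one_lookahead_decision(queues, mu, current, delta):
    """Return the index of the queue to attach to in this slot."""
    weights = lookahead_weights(mu, current, delta)
    scores = [w * q for w, q in zip(weights, queues)]
    best = max(range(len(queues)), key=lambda j: scores[j])
    # Switch only if another queue strictly beats the current one.
    return best if scores[best] > scores[current] else current
```

For example, with `mu = [0.5, 0.5]`, `delta = 2`, and the server on queue 0, the queue lengths `[10, 25]` give scores 15 and 12.5, so the server stays; the queue lengths `[10, 40]` give scores 15 and 20, so the server switches.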

We now show that the 1-lookahead policy has finite average queue length for any arrival rate vector within the stability region of the system, and is therefore throughput optimal.

Theorem III.1. For IID channel connection processes, for any arrival rate that is within the stability region, the 1-lookahead policy has finite average queue length.

The proof of this theorem is presented in Appendix A. We note that the proof of this theorem borrows ideas from the proof of the throughput optimality of QBMW in [4].

Comparison with QBMW: We note that the form of the 1-lookahead policy is similar to that of the QBMW policy. However, for the QBMW policy, the weight $W_i(t)$ is calculated using $F(Q(t_k))$, where $t_k$ is the time of the last switch before $t$. Since the form of 1-lookahead is almost the same as that of QBMW, the derivation of the weight terms for 1-lookahead given in the next section also motivates the form of the QBMW policy. We also note that the proof of throughput optimality of QBMW relies upon showing that the drift of a Lyapunov function, defined as the sum of squared queue lengths in a slot, is negative when considered over frames (multiple slots). We expect that during the reconfiguration delay after a switch, since there is no service, the Lyapunov drift would not be negative. Hsieh et al. [4] define a frame size parameter $T_k$ in order to show that the drift is indeed negative when considered over $T_k$ slots, which include the reconfiguration delay. However, the proof in [4] does not consider what happens when $T_k < \Delta$. We address this problem in our paper.

IV. MARKOV DECISION PROCESS FORMULATION

We consider a two queue system here for ease of exposition. We note that instead of finding a scheduling policy such that the average queue length under that policy is just finite, we reformulate our objective to find a scheduling policy that minimizes

$\limsup_{T \to \infty} \frac{1}{T} E\left[\sum_{t=0}^{T-1} \left(Q_1(t) + Q_2(t)\right)\right]$. (2)

In order to find a scheduling policy that minimizes (2), we use a Markov decision process (MDP) formulation (see [8], [9]).
The state of the MDP is defined to be $S(t) = (Q_1(t), Q_2(t), R(t), M(t))$, where $M(t) \in \{0, 1\}$ indicates whether the first queue or the second queue, respectively, has been in service at slot $t - 1$. The state space of the MDP is the Cartesian product of the state spaces of the individual components, i.e., $\mathbb{Z}_+ \times \mathbb{Z}_+ \times \{0, \ldots, \Delta\} \times \{0, 1\}$. We denote a specific state vector value as $s = (q_1, q_2, r, m)^T$. The action taken at slot $t$ is $\gamma(t)$, which is defined as either staying with the current queue ($\gamma(t) = 0$) or switching to the other queue ($\gamma(t) = 1$). The evolution of the MDP's state from slot to slot, i.e., from $S(t)$ to $S(t + 1)$, is defined in terms of its components as follows:
1) if $\gamma(t) = 0$ and $R(t) > 0$, then $R(t + 1) = R(t) - 1$ and, since $\Delta \geq 1$, there is no service from the $M(t)$-th queue; we denote this as $I_{M(t)}(t) = 0$;
2) if $\gamma(t) = 0$ and $R(t) = 0$, then we have $I_{M(t)}(t) = 1$;
3) if $\gamma(t) = 1$, then $R(t) = \Delta$ and $I_{M(t)}(t) = 0$; $M(t)$ changes to the queue that was switched to;
4) $Q_i(t + 1) = \max(Q_i(t) - I_i(t), 0) + A_i(t)$.

Since we are interested in minimizing the time average expected total queue length, the single stage cost of the MDP at slot $t$ is chosen as $Q_1(t) + Q_2(t)$. Suppose the average cost optimality equation (ACOE) [9, Chapter 6] exists (see footnote 3) for the MDP. The ACOE is of the form [9, Chapter 6, Theorem 6.3.1]

$h(s) = \min_{\gamma \in \{0, 1\}} \left\{ q_1 + q_2 - g + E\left[h(S_{+1}) \mid s\right] \right\}$, (3)

where $h(s)$ is the relative value function, which is a function of the state $s = (q_1, q_2, r, m)^T$, $g$ is the optimal (minimum) average cost or sum of average queue lengths, and $S_{+1}$ is the state that the MDP evolves to in one step starting from $s$, according to the MDP evolution described above. We note that $q_1 + q_2$ is the single stage cost when the state of the MDP is $s$. The possible actions which can be taken when the state is $s$ are: (a) 0, stay with the current queue, or (b) 1, switch to the other queue. The optimal policy is a stationary policy that chooses an action $\gamma \in \{0, 1\}$ in order to minimize the expression within the minimization on the RHS of (3). This optimal stationary policy prescribes an action $\gamma$ as a function $\gamma(s)$ of the state $s$.
Note that if the function $h(s)$ is known, then the optimal stationary policy can be completely characterized. However, in (3) both $h(s)$ and $g$ are not known. In most cases, the above functional equation cannot be solved analytically for $h(s)$ and $g$. Value iteration [9, Section 6.6] is an iterative procedure that can be used to obtain a solution to the ACOE. We let $V_1(s) = q_1 + q_2$. We define

$V_{n+1}(s) = \min_{\gamma \in \{0, 1\}} \left\{ q_1 + q_2 + E\left[V_n(S_{+1}) \mid s\right] \right\}$. (4)

We recall that $S_{+1}$ is the state that the MDP evolves to in a single slot starting from state $s$. We note that $V_n(s)$, called the value function, is the minimum expected cumulative sum of queue lengths when the system evolution is considered over $n$ slots starting with state $s$, or $S(0) = s$, i.e., the minimum of $E\left[\sum_{t=0}^{n-1} \sum_{i=1}^{N} Q_i(t) \mid S(0) = s\right]$. From [9, eq. 6.6.7], for large enough $n$, we have that the $\gamma$ which attains the minimum in the above value iteration equation (4) for every $s$ is an approximately optimal stationary policy $\gamma(s)$ for the average cost problem. Furthermore, again from [9, eq. 6.3.6], we have that for large $n$

$V_n(s) = h(s) + ng$. (5)

The motivating idea behind the definition of the 1-lookahead policy is that a good policy can be obtained from the ACOE (3) or the value iteration (4) if a good approximation can be found for $h(s)$ or $V_n(s)$. The approximations for $h(s)$ or $V_n(s)$ can be substituted into the RHS of the ACOE (3) and (4), and the minimizing $\gamma$ can be proposed as a candidate policy. Suppose an approximation to $h(s)$ is $\tilde{h}(s)$. We note that, except for a constant, $\tilde{h}(s)$ is also an approximation for $V_n(s)$ from (5).

Footnote 3: We note that one of the sufficient conditions for the existence of the ACOE is that there exists a policy under which (2) is finite. Since the VFMW policy has a finite average queue length if the arrival rate vector is within $\Lambda$, the above sufficient condition is satisfied. Other sufficient conditions for the existence of the ACOE, which deal with irreducibility of the queue length Markov chain, can be shown to hold under appropriate assumptions on the distribution of the arrival random variables. Since this informal discussion is to motivate the form of the 1-lookahead policy, these details are not included here.

Fig. 2. The relative value function $h(s)$, where $s = (q_1, q_2, r, m)$, plotted as a function of $q_2$ for different $q_1$, $r$, and $m$. The relative value function has been obtained from value iteration carried out for a system with the buffer size for both queues truncated to 1000, channel connectivity parameters $\mu_1 = \mu_2 = 0.5$, and arrival rates $\lambda_1 = 0.15$ and $\lambda_2 = 0.2$. We observe that $h(s)$ needs to be approximated by a non-linear function of $q_2$. (Plot: the vertical axis is the relative value function $h(q_1, q_2, r, m)$; the horizontal axis is the queue length $q_2$ from 0 to 200; curves are shown for $(q_1 = 10, r = 0, m = 1)$, $(q_1 = 10, r = 2, m = 1)$, and $(q_1 = 30, r = 0, m = 1)$.)

We first use a fluid model [6, Chapter 10] of the queueing system in order to motivate that $\tilde{h}(s) = \sum_i q_i^2$ is a reasonable approximation to the function $h(s)$. Under the optimal policy, let $\bar{\mu}_i$ be the time average of the service opportunity which is given to queue $i$. The fluid model of the queueing system models each queue length as a deterministic function $q_i(t)$, which evolves according to the differential equation $\frac{dq_i(t)}{dt} = -(\bar{\mu}_i - \lambda_i)$. We assume that the initial state $q_i(0)$ of queue $i$ is $q_i$, the component of our state $s$. We then obtain that $q_i(t) = \max(q_i - (\bar{\mu}_i - \lambda_i)t, 0)$. The area under the $q_i(t)$ function, or the cumulative queue length for queue $i$, is proportional to $q_i^2$. The total cumulative queue length, which is then proportional to $\sum_i q_i^2$, is the value function for the fluid model. From [6, Theorem 10.0.3] we have that the value function for the fluid model is the same as the relative value function $h(s)$, where the queue length components of $s$ are the $q_1$ and $q_2$ used in the fluid model, in an asymptotic regime where $q_1, q_2 \to \infty$. So we approximate the relative value function $h(s)$ with $\sum_i q_i^2$.
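The value iteration recursion (4) can be sketched on a small truncated state space. This is an illustrative sketch under several assumptions that are not taken from the paper: Bernoulli arrivals, queue lengths capped at `qmax`, queues indexed 0 and 1, and a bookkeeping convention in which the state component `r` is the counter value carried into a slot, so that a switch leaves `delta - 1` for the next slot. The names `transitions` and `value_iteration` are hypothetical.

```python
from itertools import product


def transitions(s, gamma, lam, mu, delta, qmax):
    """Yield (probability, next_state) pairs from state s = (q1, q2, r, m)."""
    q1, q2, r, m = s
    if gamma == 1:                       # switch: no service this slot
        r_next, m_next, can_serve = delta - 1, 1 - m, False
    elif r > 0:                          # still reconfiguring: no service
        r_next, m_next, can_serve = r - 1, m, False
    else:                                # configured and staying
        r_next, m_next, can_serve = 0, m, True
    for a1, a2, c in product((0, 1), repeat=3):
        p = (lam[0] if a1 else 1 - lam[0]) * (lam[1] if a2 else 1 - lam[1])
        p *= mu[m] if c else 1 - mu[m]
        q = [q1, q2]
        q[m] = max(q[m] - (c if can_serve else 0), 0)
        yield p, (min(q[0] + a1, qmax), min(q[1] + a2, qmax), r_next, m_next)


def value_iteration(lam, mu, delta, qmax, n_iter):
    """Return V_n as a dict over states, with n = n_iter and V_1 = q1 + q2."""
    states = list(product(range(qmax + 1), range(qmax + 1),
                          range(delta), (0, 1)))
    v = {s: s[0] + s[1] for s in states}
    for _ in range(n_iter - 1):
        v = {s: s[0] + s[1] + min(
                sum(p * v[t] for p, t in transitions(s, g, lam, mu, delta, qmax))
                for g in (0, 1))
             for s in states}
    return v
```

Subtracting the value at a reference state, for example `v[(0, 0, 0, 0)]`, from every entry gives an estimate of the relative value function for large `n_iter`; the truncation level and iteration count needed for a usable estimate depend on the arrival and connection rates.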
The second motivation for the use of $\sum_i q_i^2$ as an approximation comes from numerically solving the ACOE using value iteration. In Fig. 2 we plot $h(s)$ as a function of $q_2$ for different $q_1$, $r$, and $m$, where $h(s)$ has been obtained using value iteration for 1000 iterations and for the queue length state space truncated to 1000. The other parameters used are indicated in Fig. 2. We find that a second order polynomial, i.e., $c_1 q_2^2 + c_2 q_2 + c_3$, is a good choice for fitting the observed function. When higher degree polynomials are used, the coefficients of the higher degree terms are seen to be close to zero. We note that we exclude linear terms as well as cross terms, i.e., $q_1 q_2$, in our approximation for the sake of analytical simplicity.

We now use the sum of squares approximation to motivate the form of the 1-lookahead policy. From (4) we have that a policy that achieves the minimum in

$\min_{\gamma} \left\{ q_1 + q_2 + E\left[V_n(S_{+1}) \mid s\right] \right\}$

for large enough $n$ is approximately optimal for the average queue length minimization problem. Suppose we use the sum of squares approximation $\sum_i q_i^2$ for $V_n(s)$, motivated by (5). Then we are considering the policy that chooses $\gamma$ to achieve the minimum as follows:

$\min_{\gamma} \left\{ q_1 + q_2 + E\left[Q_1(1)^2 + Q_2(1)^2 \mid s\right] \right\}$.

Here $Q_1(1)$ and $Q_2(1)$ are the queue lengths that the MDP evolves to under an action $\gamma$, according to the MDP evolution described above, starting with $Q_1(0) = q_1$ and $Q_2(0) = q_2$. Let us consider the case where $s$ is such that queue 1 is currently being served (a similar discussion holds for the case where queue 2 is being served). For $\gamma = 0$ we have that

$E\left[Q_1(1)^2 + Q_2(1)^2 \mid s\right] = E\left[(q_1 - C_1 + A_1)^2 + (q_2 + A_2)^2\right]$,

and for $\gamma = 1$ we have that

$E\left[Q_1(1)^2 + Q_2(1)^2 \mid s\right] = E\left[(q_1 + A_1)^2 + (q_2 + A_2)^2\right]$,

since we have $\Delta \geq 1$. We see that $\gamma = 1$ will never be chosen; this happens, of course, because our approximation does not capture the possible service that can happen for queue 2 after $\Delta$ slots. The approximation does not capture the possible service because, although simple, it does not include the other state variables, especially $r$.
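The claim that switching is never chosen under the one-step sum of squares approximation can be checked numerically. The sketch below assumes Bernoulli arrivals and connectivity and a state in which queue 1 is attached with no pending reconfiguration and $q_1 \geq 1$; the function name `one_step_cost` is hypothetical. Since the connectivity variable is 0 or 1 and independent of the arrivals, the cost of staying minus the cost of switching equals $\mu_1 (1 - 2(q_1 + \lambda_1))$, which is negative whenever $q_1 \geq 1$.

```python
from itertools import product


def one_step_cost(q1, q2, gamma, lam, mu1):
    """E[Q1(1)^2 + Q2(1)^2 | s] with queue 1 attached and q1 >= 1."""
    total = 0.0
    for a1, a2, c in product((0, 1), repeat=3):
        p = (lam[0] if a1 else 1 - lam[0]) * (lam[1] if a2 else 1 - lam[1])
        p *= mu1 if c else 1 - mu1
        service = c if gamma == 0 else 0   # switching forfeits this slot
        total += p * ((q1 - service + a1) ** 2 + (q2 + a2) ** 2)
    return total
```

With $\lambda = (0.15, 0.2)$, $\mu_1 = 0.5$, and $q_1 = 10$, the difference between staying and switching is $0.5 \times (1 - 20.3) = -9.65$, independent of $q_2$.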
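In order to capture this potential service after $\Delta$ slots, we proceed by writing $V_{n+1}(s)$ as:

$V_{n+1}(s) = \min_{\gamma} \left\{ E\left[\sum_{\tau=0}^{\Delta+m-1} \left(Q_1(\tau) + Q_2(\tau)\right) \mid s\right] + E\left[V_{n-(\Delta+m-1)}(S_{+\Delta+m}) \mid s\right] \right\}$.

Here $Q_1(0) = q_1$ and $Q_2(0) = q_2$, where $q_1$ and $q_2$ are the queue length components of $s$. Also, $Q_i(\tau)$ is the queue length that the MDP evolves to in the $\tau$-th slot after taking action $\gamma$ in the first slot and then the optimal actions in all subsequent slots. In particular, note that the expectation $E\left[\sum_{\tau=0}^{\Delta+m-1} \left(Q_1(\tau) + Q_2(\tau)\right) \mid s\right]$ is computed, and the state $S_{+\Delta+m}$ is what the MDP evolves to in $\Delta + m$ slots, under the use of $\gamma$ in the first slot and the optimal policy after that. We note that the above alternate expression for $V_{n+1}(s)$ can be written for any $n > \Delta + m$, but we are interested in sufficiently large values of $n$. We now explicitly indicate that the optimal actions are taken for the first $\Delta + m - 1$ slots by using a minimization over the action variables $\gamma_1, \ldots, \gamma_{\Delta+m-1}$. We have that $V_{n+1}(s)$ is

$\min_{\gamma, \gamma_1, \ldots, \gamma_{\Delta+m-1}} \left\{ E\left[\sum_{\tau=0}^{\Delta+m-1} \left(Q_1(\tau) + Q_2(\tau)\right) \mid s\right] + E\left[V_{n-(\Delta+m-1)}(S_{+\Delta+m}) \mid s\right] \right\}$. (6)

We note that since $n$ is large, the sum of squares approximation can also be applied to $V_{n-(\Delta+m-1)}(s)$.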

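We now consider the case where $m = 1$. For $m = 1$ we note that

$V_{n+1}(s) = \min_{\gamma, \gamma_1, \ldots, \gamma_{\Delta}} \left\{ E\left[\sum_{\tau=0}^{\Delta} \left(Q_1(\tau) + Q_2(\tau)\right) \mid s\right] + E\left[V_{n-\Delta}(S_{+\Delta+1}) \mid s\right] \right\}$. (7)

Assuming that $s$ is such that queue 1 is being served, the possible action sequences can be divided into three:
A1: $\gamma = \gamma_1 = \cdots = \gamma_{\Delta} = 0$, or stay with queue 1 in every slot;
A2: $\gamma = 1$, $\gamma_1 = \cdots = \gamma_{\Delta} = 0$, or switch to queue 2 in the first slot and then stay with queue 2 for the rest of the slots;
A3: any other action sequence.

In the following, in order to differentiate between queue length evolution under different action sequences, we use $Q_i^{(j)}(t)$ to denote the queue length for queue $i$ under action sequence A$j$, e.g., $Q_1^{(3)}(t)$ for the queue length of queue 1 under action sequence A3. For the action sequence A1, we have for all $\tau \in \{0, \ldots, \Delta + 1\}$

$Q_1^{(1)}(\tau) = q_1 - \sum_{u=0}^{\tau-1} C_1(u) + \sum_{u=0}^{\tau-1} A_1(u)$ and $Q_2^{(1)}(\tau) = q_2 + \sum_{u=0}^{\tau-1} A_2(u)$.

For the action sequence A2, we have for all $\tau \in \{0, \ldots, \Delta + 1\}$

$Q_1^{(2)}(\tau) = q_1 + \sum_{u=0}^{\tau-1} A_1(u)$,

and for all $\tau \in \{0, \ldots, \Delta\}$

$Q_2^{(2)}(\tau) = q_2 + \sum_{u=0}^{\tau-1} A_2(u)$.

We note that there is no service under A2 for the first $\Delta$ slots, but at the $(\Delta + 1)$-th slot we do have a service, i.e., $Q_2^{(2)}(\Delta + 1) = q_2 - C_2(\Delta) + \sum_{u=0}^{\Delta} A_2(u)$.

We note that the action sequence corresponding to A3 above is of the form $\gamma = 0, \gamma_1 = 0, \ldots, \gamma_{\tau_s - 1} = 1, \ldots$. That is, we switch to queue 2 from queue 1 in the $\tau_s$-th slot, where $\tau_s > 1$. We note that for $m = 1$, after $\tau_s$ there cannot be any more service in the system in the first $\Delta + 1$ slots, since there is a reconfiguration delay of $\Delta$ slots. Then we have that for A3 and any $\tau \in \{0, 1, \ldots, \Delta + 1\}$

$Q_1^{(3)}(\tau) = q_1 - \sum_{u=0}^{\min(\tau, \tau_s - 1) - 1} C_1(u) + \sum_{u=0}^{\tau-1} A_1(u)$ and $Q_2^{(3)}(\tau) = q_2 + \sum_{u=0}^{\tau-1} A_2(u)$.

We note that for the same sample path of arrivals $A_1$, $A_2$ and channel connectivity $C_1$ and $C_2$, and for any $\tau \in \{0, \ldots, \Delta + 1\}$, $Q_1^{(1)}(\tau) \leq Q_1^{(3)}(\tau)$ and $Q_2^{(1)}(\tau) = Q_2^{(3)}(\tau)$. Also, under the assumption of the sum of squares form for $V_n(\cdot)$, we have that under A1

$E\left[V_{n-\Delta}(S_{+\Delta+1}) \mid s\right] = E\left[\sum_i \left(Q_i^{(1)}(\Delta + 1)\right)^2 \mid s\right]$,

which is less than the following corresponding value under A3:

$E\left[V_{n-\Delta}(S_{+\Delta+1}) \mid s\right] = E\left[\sum_i \left(Q_i^{(3)}(\Delta + 1)\right)^2 \mid s\right]$.

Therefore we conclude that for $m = 1$ and a sum of squares approximation for $V_n(s)$, the optimal sequence of actions is either A1 or A2 in the above list, but not any action sequence in A3.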
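The sample path comparison between the action sequences can be illustrated with a short simulation. This is a sketch with hypothetical names (`trajectory`, `delta`) and with queues indexed 0 and 1 instead of 1 and 2; it assumes the server starts attached to queue 0 with no pending reconfiguration, and it follows the counter convention of the system model, where a switch sets the counter to `delta` and each later stay decrements it.

```python
def trajectory(actions, q, arrivals, channels, delta):
    """Queue lengths after each slot of a two-queue system.

    actions[k] is 0 (stay) or 1 (switch) in slot k; arrivals[k] and
    channels[k] are pairs (A_1, A_2) and (C_1, C_2) for slot k.
    """
    q = list(q)
    m, r = 0, 0                      # attached queue and countdown
    path = [tuple(q)]
    for gamma, a, c in zip(actions, arrivals, channels):
        if gamma == 1:
            m, r = 1 - m, delta      # switch: countdown restarts
        else:
            r = max(r - 1, 0)
        if r == 0:                   # attached queue may be served
            q[m] = max(q[m] - c[m], 0)
        q = [q[0] + a[0], q[1] + a[1]]
        path.append(tuple(q))
    return path
```

For `delta = 2`, the sequence `[0, 0, 0]` plays the role of A1, `[1, 0, 0]` that of A2, and `[0, 1, 0]` is one member of A3. On a common sample path, the first queue is never longer under A1 than under this A3 sequence, and the second queue is identical under both, while A2 obtains its only service in the last slot of the window.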
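Thus, for $m = 1$ with the sum of squares form for $V_n(\cdot)$, we have that the optimal choice is between A1 and A2, which incidentally is a choice between staying with the current queue or switching to the other queue. This choice forms the basis for the definition of the 1-lookahead (note that $m = 1$) policy. From (7) and the fact that the optimal sequence is either A1 or A2, we have the following decision rule. We choose $\gamma = 0$ if

$E\left[\sum_{\tau=0}^{\Delta} \sum_i Q_i^{(1)}(\tau) \mid s\right] + E\left[\sum_i \left(Q_i^{(1)}(\Delta + 1)\right)^2 \mid s\right] < E\left[\sum_{\tau=0}^{\Delta} \sum_i Q_i^{(2)}(\tau) \mid s\right] + E\left[\sum_i \left(Q_i^{(2)}(\Delta + 1)\right)^2 \mid s\right]$ (8)

and $\gamma = 1$ otherwise. We have that

$E\left[\sum_{\tau=0}^{\Delta} \left(Q_1^{(1)}(\tau) + Q_2^{(1)}(\tau)\right) \mid s\right] = \sum_{\tau=0}^{\Delta} E\left[\sum_i \left(q_i + \sum_{u=0}^{\tau-1} A_i(u)\right) - \sum_{u=0}^{\tau-1} C_1(u)\right]$ (9)

and $E\left[\sum_i \left(Q_i^{(1)}(\Delta + 1)\right)^2 \mid s\right]$ is

$E\left[\left(q_1 + \sum_{\tau=0}^{\Delta} A_1(\tau) - \sum_{\tau=0}^{\Delta} C_1(\tau)\right)^2 + \left(q_2 + \sum_{\tau=0}^{\Delta} A_2(\tau)\right)^2 \mid s\right]$,

which can be simplified as

$E\left[\left(\sum_{\tau=0}^{\Delta} C_1(\tau)\right)^2\right] - 2 q_1 E\left[\sum_{\tau=0}^{\Delta} C_1(\tau)\right] - 2 E\left[\sum_{\tau=0}^{\Delta} C_1(\tau)\right] E\left[\sum_{\tau=0}^{\Delta} A_1(\tau)\right] + E\left[\left(\sum_{\tau=0}^{\Delta} A_1(\tau)\right)^2\right] + 2 q_1 E\left[\sum_{\tau=0}^{\Delta} A_1(\tau)\right] + q_1^2 + q_2^2 + 2 q_2 E\left[\sum_{\tau=0}^{\Delta} A_2(\tau)\right] + E\left[\left(\sum_{\tau=0}^{\Delta} A_2(\tau)\right)^2\right]$. (10)

Similarly, we have that

$E\left[\sum_{\tau=0}^{\Delta} \left(Q_1^{(2)}(\tau) + Q_2^{(2)}(\tau)\right) \mid s\right] = \sum_{\tau=0}^{\Delta} E\left[q_1 + \sum_{u=0}^{\tau-1} A_1(u) + q_2 + \sum_{u=0}^{\tau-1} A_2(u)\right]$ (11)

and

$E\left[\left(Q_1^{(2)}(\Delta + 1)\right)^2 + \left(Q_2^{(2)}(\Delta + 1)\right)^2 \mid s\right] = E\left[\left(q_1 + \sum_{\tau=0}^{\Delta} A_1(\tau)\right)^2 + \left(q_2 - C_2(\Delta) + \sum_{\tau=0}^{\Delta} A_2(\tau)\right)^2 \mid s\right]$
$= q_1^2 + E\left[\left(\sum_{\tau=0}^{\Delta} A_1(\tau)\right)^2\right] + 2 q_1 E\left[\sum_{\tau=0}^{\Delta} A_1(\tau)\right] + q_2^2 + E\left[\left(\sum_{\tau=0}^{\Delta} A_2(\tau)\right)^2\right] + 2 q_2 E\left[\sum_{\tau=0}^{\Delta} A_2(\tau)\right] + E\left[C_2(\Delta)^2\right] - 2 q_2 E\left[C_2(\Delta)\right] - 2 E\left[C_2(\Delta)\right] E\left[\sum_{\tau=0}^{\Delta} A_2(\tau)\right]$. (12)

We note that the comparison in (8) is equivalent to the comparison (9) + (10) < (11) + (12). Several terms are common to (9) + (10) and (11) + (12). We cancel out those common terms and keep only those terms which have a queue length term appearing in them. For large values of the queue lengths, these terms would dominate the other constant terms. Furthermore, such a choice would result in the decision rule for 1-lookahead having a simple biased max-weight form. Then we get a heuristic decision rule of choosing $\gamma = 0$ if

$-2 q_1 E\left[\sum_{\tau=0}^{\Delta} C_1(\tau)\right] < -2 q_2 E\left[C_2(\Delta)\right]$, or $q_1 E\left[\sum_{\tau=0}^{\Delta} C_1(\tau)\right] > q_2 E\left[C_2(\Delta)\right]$,

and $\gamma = 1$ otherwise. We note that this is the motivation behind the definition of the biased max-weight 1-lookahead policy.

We now consider the case where $m > 1$. Using a similar sequence of steps as for $m = 1$, it is possible to derive a decision rule comparing A1 with A2. This comparison between A1 and A2 leads to the rule of choosing $\gamma = 0$ if

$q_1 E\left[\sum_{\tau=0}^{\Delta+m-1} C_1(\tau)\right] > q_2 E\left[\sum_{\tau=\Delta}^{\Delta+m-1} C_2(\tau)\right]$.

However, for $m > 1$ we note that it is not possible to conclude that the optimal sequence of actions is either A1 or A2 but not any in A3. We illustrate this via an example here. Consider (6) for $\Delta = 2$ and $m = 3$. Assuming that we are serving queue 1, a possible sequence of decisions is $\gamma = 0$, $\gamma_1 = 1$, $\gamma_2 = 0$, $\gamma_3 = 1$, $\gamma_4 = 0$, i.e., we stay with queue 1, then switch to queue 2, wait for 2 slots, switch back to queue 1, and wait for 2 slots. Such a sequence of decisions might lead to service for both queue 1 and queue 2. We note that the simple decision rule that we have proposed in this case just compares A1 and A2, and not any policy in A3 such as the one above. If we include other comparisons to action sequences of type A3, the simplicity of the definition of 1-lookahead policies does not carry over.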
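For IID connectivity, the expectations in the heuristic rules reduce to simple products, which makes the biased max-weight form easy to see. The sketch below is illustrative only: the function name `stay_with_current` and the parameter `delta` are assumptions, and the expectations are taken to be unconditional, so that the expected service available to the current queue over `delta + m` slots is `(delta + m) * mu_cur`, and the expected service available to the other queue, which can only be served in the last `m` slots, is `m * mu_other`.

```python
def stay_with_current(q_cur, q_other, mu_cur, mu_other, delta, m=1):
    """Heuristic rule comparing A1 (stay) with A2 (switch immediately)."""
    # Staying: service is possible in each of the delta + m slots.
    stay = q_cur * (delta + m) * mu_cur
    # Switching: service is possible only in the last m slots.
    switch = q_other * m * mu_other
    return stay > switch
```

With `delta = 2`, `m = 1`, and equal connection rates of 0.5, the current queue is kept for queue lengths (10, 25) because 15 > 12.5, and abandoned for (10, 40) because 15 < 20. The factor `(delta + m) / m` acts as the bias towards the current queue, and it shrinks as `m` grows.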
Since 1-lookahead policies are throughput optimal and have a simple specification, we restrict our attention to the case of $m = 1$. We note that the correction to the weight, or bias, using $F(Q(t))$ is motivated by the proof of throughput optimality.

V. CONCLUSIONS AND FUTURE WORK

In this paper we considered queueing system models with reconfiguration delay. We proposed a new 1-lookahead policy using an approximate solution to a Markov decision process (MDP) formulation. The biased max-weight form of the 1-lookahead policy is analytically well motivated. Furthermore, since the form of 1-lookahead is similar to that of QBMW, the bias term which appears in QBMW can also be explained using this analytical development. We also prove that 1-lookahead is throughput optimal. We note that the MDP formulation can be extended to the case of correlated channel connectivity by including channel states in the MDP's state. It is possible to again define a 1-lookahead policy for the correlated channel case (see [5]); proving the throughput optimality of such a policy is part of our future work.

REFERENCES

[1] Güner D. Celik, Sem C. Borst, Philip A. Whiting, and Eytan Modiano. Dynamic scheduling with reconfiguration delays. Queueing Systems, 83(1-2):87-129, 2016.
[2] Güner D. Celik, Long B. Le, and Eytan Modiano. Dynamic server allocation over time-varying channels with switchover delay. IEEE Transactions on Information Theory, 58(9):5856-5877, 2012.
[3] Güner D. Celik and Eytan Modiano. Scheduling in networks with time-varying channels and reconfiguration delay. IEEE/ACM Transactions on Networking, 23(1):99-113, 2015.
[4] Ping-Chun Hsieh, I-Hong Hou, and Xi Liu. Delay-optimal scheduling for queueing systems with switching overhead. arXiv preprint arXiv:1701.03831, 2017.
[5] Amit Kumar and Vineeth B. S. Scheduling policies for wireless downlink with correlated random connectivity and multislot reconfiguration delay. Accepted to IEEE Communications Letters, 2017.
[6] S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, 2007.
[7] Michael J. Neely. Stochastic network optimization with application to communication and queueing systems. Synthesis Lectures on Communication Networks, 3(1):1-211, 2010.
[8] Linn I. Sennott. Stochastic dynamic programming and the control of queueing systems, volume 504. John Wiley & Sons, 2009.
[9] H. C. Tijms. A first course in stochastic models. Wiley, 2003.

APPENDIX A
PROOF OF THEOREM III.1

We prove the stability of the modified 1-lookahead policy in this section. Suppose $t_k$ is the slot at which the $k$-th switch happens. Let $T_k = t_{k+1} - t_k$ be the duration of the $k$-th frame. Let us assume that in the $k$-th frame the $i$-th queue is being

served. Then we note that at $t_k$, $\mu_i Q_i(t_k) \ge \mu_j Q_j(t_k)$ for every $j$. Furthermore, there exists a $j$ such that

$$Q_i(t_k + T_k)\,\mu_i < \frac{Q_j(t_k + T_k)\,\mu_j}{F(Q(t_k + T_k))}.$$

We first prove the following lemma, which states that for large enough queue lengths the duration $T_k$ of the frame is large. The proof is similar to that of Lemma 3 in [4].

Lemma A.1. The length $T_k$ of the $k$-th frame, for every $k$, is such that

$$T_k^{1+\alpha} > \frac{\frac{1}{N}\sum_n Q_n(t_k)\,\mu_n}{\big(A_{max} N + \sum_n Q_n(t_k)\,\mu_n\big)^{\alpha} + A_{max}},$$

where $\alpha \in (0, 1)$ is as in the definition of $F(Q(t))$.

Proof. From the definition of $T_k$ (or $t_{k+1}$), there exists a $j$ such that

$$Q_i(t_k + T_k)\,\mu_i < \frac{Q_j(t_k + T_k)\,\mu_j}{F(Q(t_k + T_k))}.$$

We note that $Q_i(t_k + T_k) \ge Q_i(t_k) - T_k$ and $Q_j(t_k + T_k) \le Q_j(t_k) + A_{max} T_k$, where we have used the fact that in $T_k$ slots the maximum amount of service for queue $i$ is $T_k$ and the maximum number of arrivals for queue $j$ is $A_{max} T_k$. We also note that the first lower bound could be negative. Substituting these bounds, we obtain

$$\big(Q_i(t_k) - T_k\big)\,\mu_i < \frac{\big(Q_j(t_k) + A_{max} T_k\big)\,\mu_j}{F(Q(t_k + T_k))},$$

$$Q_i(t_k)\,\mu_i - \mu_j Q_j(t_k) < T_k\big(F(Q(t_k + T_k))\,\mu_i + A_{max}\,\mu_j\big),$$

where we have used $F(Q(t_k + T_k)) \ge 1$. Since $\mu_i, \mu_j \le 1$ and $\mu_i Q_i(t_k) \ge \mu_j Q_j(t_k)$, we have that

$$T_k > \frac{Q_i(t_k)\,\mu_i}{F(Q(t_k + T_k)) + A_{max}}. \tag{13}$$

We note that

$$F(Q(t_k + T_k)) \le \Big(\sum_n \big(Q_n(t_k) + A_{max} T_k\big)\,\mu_n\Big)^{\alpha} \le \Big(A_{max} T_k N + \sum_n Q_n(t_k)\,\mu_n\Big)^{\alpha},$$

where we have used that there are $N$ queues and $\mu_n \le 1$. Then we have that

$$F(Q(t_k + T_k)) \le T_k^{\alpha}\Big(A_{max} N + \sum_n Q_n(t_k)\,\mu_n\Big)^{\alpha},$$

where we have used $T_k \ge 1$. Using this upper bound on $F(\cdot)$ in (13), we obtain

$$T_k^{1+\alpha} > \frac{Q_i(t_k)\,\mu_i}{\big(A_{max} N + \sum_n Q_n(t_k)\,\mu_n\big)^{\alpha} + A_{max}}.$$

We also note that $Q_i(t_k)\,\mu_i \ge Q_n(t_k)\,\mu_n$ for every $n$, and hence $Q_i(t_k)\,\mu_i \ge \frac{1}{N}\sum_n Q_n(t_k)\,\mu_n$. Hence

$$T_k^{1+\alpha} > \frac{\frac{1}{N}\sum_n Q_n(t_k)\,\mu_n}{\big(A_{max} N + \sum_n Q_n(t_k)\,\mu_n\big)^{\alpha} + A_{max}}.$$

An important corollary of this lemma is stated below.

Corollary A.2. Given an $l \in \mathbb{Z}_+$, there is a finite set $\mathcal{Q}$ of queue length vectors such that $T_k \ge T_r + l$ for $Q(t_k) \in \mathcal{Q}^c$.

Proof. From Lemma A.1 we have that

$$T_k > \left(\frac{\frac{1}{N}\sum_n Q_n(t_k)\,\mu_n}{\big(A_{max} N + \sum_n Q_n(t_k)\,\mu_n\big)^{\alpha} + A_{max}}\right)^{\frac{1}{1+\alpha}}.$$

We note that the lower bound on $T_k$ is an increasing function of $\sum_n Q_n(t_k)\,\mu_n$.
So, for a given $l$, there is a finite $\delta$ such that $T_k \ge T_r + l$ for $\sum_n Q_n(t_k)\,\mu_n > \delta$. There is a finite set $\mathcal{Q}$ of queue length vectors with $\sum_n Q_n(t_k)\,\mu_n \le \delta$, for which the above lower bound on $T_k$ does not guarantee that $T_k \ge T_r + l$.

We now prove the stability of the 1-lookahead policy using Lyapunov drift arguments. The Lyapunov function is chosen to be $L(t) = \sum_n Q_n(t)^2$. From Corollary A.2 we note that at $t_k$, for all queue length vectors except those in a finite set $\mathcal{Q}$, we have $T_k \ge T_r + l$. We first consider the case of the complement of $\mathcal{Q}$, so $T_k \ge T_r + l$ in the discussion below.

In the stability proof we consider the Lyapunov drift of the system over a frame, first between $t_k$ and $t_k + T_r + l$, and then for every slot in $\{t_k + T_r + l + 1, \ldots, t_k + T_k\}$. We note that in [4] the drift is considered between $t_k$ and $t_k + T_k$ instead of between $t_k$ and $t_k + T_r + l$. In [4], $T_k$ is defined as $T_k = F(Q(t_k))$. It is not clear why $T_k$ would be greater than $T_r$; this property is essential to the drift arguments that follow in [4]. In contrast, we consider $T_k$ to be a constant $T_r + l$, but only for those $t_k$ for which the queue length vector lies in $\mathcal{Q}^c$. Then we show that whatever Lyapunov drift we obtain for the case where $Q(t_k) \in \mathcal{Q}^c$ can be applied to the case where $Q(t_k) \in \mathcal{Q}$, but with a different constant factor in the drift expression.

The Lyapunov drift between $t_k$ and $t_k + T_r + l$ is

$$\sum_n Q_n(t_k + T_r + l)^2 - \sum_n Q_n(t_k)^2.$$

We note that

$$Q_n(t_k + T_r + l) \le \max\Big(Q_n(t_k) - \sum_{t=0}^{T_r+l-1} I_n(t_k + t),\; 0\Big) + \sum_{t=0}^{T_r+l-1} A_n(t_k + t).$$

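With $\Gamma_n$ defined as

$$\Gamma_n = \max\Big(Q_n(t_k) - \sum_{t=0}^{T_r+l-1} I_n(t_k + t),\; 0\Big) - \Big(Q_n(t_k) - \sum_{t=0}^{T_r+l-1} I_n(t_k + t)\Big),$$

we have that

$$Q_n(t_k + T_r + l) \le Q_n(t_k) - \sum_{t=0}^{T_r+l-1} I_n(t_k + t) + \Gamma_n + \sum_{t=0}^{T_r+l-1} A_n(t_k + t).$$

Then the Lyapunov drift can be bounded above as

$$\sum_n \Big(Q_n(t_k) - \sum_{t=0}^{T_r+l-1} I_n(t_k + t) + \Gamma_n + \sum_{t=0}^{T_r+l-1} A_n(t_k + t)\Big)^2 - \sum_n Q_n(t_k)^2 \le B_0 + 2\sum_{t=0}^{T_r+l-1}\sum_n Q_n(t_k)\big(A_n(t_k + t) - I_n(t_k + t)\big).$$

Here $B_0$ is a constant obtained by bounding $I_n(t_k + t)$ above by 1 and $A_n(t_k + t)$ above by $A_{max}$, and by using the fact that $\Gamma_n$ is at most $T_r + l$ and is non-zero only if $Q_n(t_k) \le T_r + l$. We note that $B_0$ is a function of $l$. The expected Lyapunov drift, conditioned on $Q(t_k)$, is then bounded above by

$$B_0 + 2\sum_n Q_n(t_k)\big((T_r + l)\lambda_n - l\,I_{\{n=i\}}\,\mu_i\big),$$

where $I_{\{n=i\}}$ is 1 if $n = i$ and 0 otherwise, since in the $k$-th frame there is service only for the $i$-th queue. Furthermore, in the first $T_r$ slots after $t_k$ there is no service, since the system is in reconfiguration. We now write the bound on the expected Lyapunov drift in vector notation as

$$B_0 + 2\big((T_r + l)\lambda - l\,I_i\,\mu_i\big)^T Q(t_k), \tag{14}$$

where all vectors are column vectors and $I_i$ is a vector of zeros except for a 1 at the $i$-th position. Suppose we consider a $\lambda$ in the stability region. Then there exists a set of $\beta_j \ge 0$, $\sum_j \beta_j \le 1$, such that $\lambda \le \sum_j \beta_j I_j \mu_j$, and we can write $\lambda = (1 - \epsilon)\sum_j \beta_j I_j \mu_j$ for some $\epsilon > 0$. Furthermore, we have that $\mu_j I_j^T Q(t_k) \le \mu_i I_i^T Q(t_k)$, since we have switched to $i$ at $t_k$. Then (14) is bounded above by

$$B_0 + 2\big((1 - \epsilon)(T_r + l) - l\big)\,\mu_i I_i^T Q(t_k).$$

For any $\epsilon > 0$ we choose $l$ such that $(1 - \epsilon)(T_r + l) - l = -\epsilon_1$ for some $\epsilon_1 > 0$. Then the expected Lyapunov drift between $t_k$ and $t_k + T_r + l$ is bounded above by

$$B_0 - 2\epsilon_1\,\mu_i I_i^T Q(t_k).$$

Now, since $\mu_j I_j^T Q(t_k) \le \mu_i I_i^T Q(t_k)$ for every $j$, this is at most

$$B_0 - \frac{2\epsilon_1}{N}\sum_n \mu_n Q_n(t_k).$$

We note that $Q_n(t_k + t) \le Q_n(t_k) + A_{max}\,t$, or

$$\sum_{t=0}^{T_r+l-1}\sum_n \mu_n Q_n(t_k + t) \le \sum_n \mu_n A_{max}\,\frac{(T_r + l)(T_r + l - 1)}{2} + (T_r + l)\sum_n \mu_n Q_n(t_k).$$

Then the expected Lyapunov drift between $t_k$ and $t_k + T_r + l$ is bounded above by

$$B_1 - \frac{2\epsilon_1}{(T_r + l)\,N}\sum_{t=0}^{T_r+l-1}\sum_n \mu_n Q_n(t_k + t),$$

where $B_1$ is a constant dependent on $l$.

We now consider the slot-to-slot drift of the Lyapunov function for slots $t \in \{t_k + T_r + l + 1, \ldots, t_k + T_k\}$. We note that we are still restricting to those $t_k$ at which $Q(t_k) \in \mathcal{Q}^c$.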
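The drift argument above needs the coefficient multiplying $\mu_i I_i^T Q(t_k)$ to be negative once $l$ service slots follow the $T_r$ reconfiguration slots. A small sketch of how such an $l$ could be computed is given below. It assumes the coefficient has the form $(1 - \epsilon)(T_r + l) - l$, which is negative exactly when $l > (1 - \epsilon)T_r / \epsilon$; the function names `drift_coefficient` and `min_extra_slots` are hypothetical, and exact rational arithmetic is used only to avoid floating-point edge cases at the boundary.

```python
import math
from fractions import Fraction


def drift_coefficient(t_r: int, l: int, eps: Fraction) -> Fraction:
    """Coefficient (1 - eps) * (t_r + l) - l (assumed form of the drift term)."""
    return (1 - Fraction(eps)) * (t_r + l) - l


def min_extra_slots(t_r: int, eps: Fraction) -> int:
    """Smallest integer l for which drift_coefficient(t_r, l, eps) < 0."""
    eps = Fraction(eps)
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    # Negative coefficient is equivalent to l > (1 - eps) * t_r / eps.
    bound = (1 - eps) * t_r / eps
    return math.floor(bound) + 1


if __name__ == "__main__":
    l = min_extra_slots(2, Fraction(1, 10))
    print(l, drift_coefficient(2, l, Fraction(1, 10)))
```

The required $l$ grows like $T_r / \epsilon$, so arrival rates close to the boundary of the stability region call for longer frames before the negative drift appears.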
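The Lyapunov drift is

$$\sum_n Q_n(t + 1)^2 - \sum_n Q_n(t)^2.$$

As in the previous case, we have that $Q_n(t + 1) = Q_n(t) - I_n(t) + A_n(t) + \Gamma_n$, and it can be shown that the Lyapunov drift is bounded above by

$$B_2 + 2\sum_n Q_n(t)\big(A_n(t) - I_n(t)\big).$$

The expected Lyapunov drift, conditioned on $Q(t)$, is bounded above by

$$B_2 + 2\sum_n Q_n(t)\big(\lambda_n - I_{\{n=i\}}\,\mu_i\big).$$

Here we note that there is service only to queue $i$ in frame $k$, and that we are considering slots after the reconfiguration delay. In vector notation we have

$$B_2 + 2\big(\lambda - I_i\,\mu_i\big)^T Q(t). \tag{15}$$

Since $\lambda$ is in the stability region, we again have that (15) is

$$B_2 + 2\Big((1 - \epsilon)\sum_j \beta_j I_j \mu_j - I_i\,\mu_i\Big)^T Q(t).$$

We note that at $t$,

$$\mu_j I_j^T Q(t) \le \Big(1 + \frac{1}{F(Q(t))}\Big)\,\mu_i I_i^T Q(t).$$

Substituting, we get the conditional drift bound as

$$B_2 + 2\Big((1 - \epsilon)\Big(1 + \frac{1}{F(Q(t))}\Big) - 1\Big)\,\mu_i I_i^T Q(t) = B_2 + 2\Big(-\epsilon + \frac{1 - \epsilon}{F(Q(t))}\Big)\,\mu_i I_i^T Q(t) = B_2 - 2\epsilon\,\mu_i I_i^T Q(t) + \frac{2(1 - \epsilon)\,\mu_i I_i^T Q(t)}{F(Q(t))}.$$

We note that $\mu_i I_i^T Q(t) \le \sum_n \mu_n I_n^T Q(t)$. Also,

$$\mu_i I_i^T Q(t) \ge \frac{1}{N\big(1 + \frac{1}{F(Q(t))}\big)}\sum_n \mu_n I_n^T Q(t).$$

Therefore we have the conditional drift bound as

$$B_2 - \frac{2\epsilon}{N\big(1 + \frac{1}{F(Q(t))}\big)}\sum_n \mu_n I_n^T Q(t) + \frac{2(1 - \epsilon)}{F(Q(t))}\sum_n \mu_n I_n^T Q(t).$$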

We note that there would exist a finite set $\mathcal{Q}_\alpha$ in the complement of which the above drift would be negative, since the dominant term in the above bound is the second term. By defining a new constant $B_3$, we can therefore write that the expected drift is bounded above by

$$B_3 - \frac{2\epsilon_2}{N}\sum_n \mu_n I_n^T Q(t).$$

We note that the above drifts were defined for $t_k$ such that $Q(t_k) \in \mathcal{Q}^c$. By redefining the constants in the drift expressions, we have that the above drift conditions hold for all $Q(t_k)$. The rest of the proof then proceeds as in [4] from equation (127) onwards, and it can be shown that the average queue length is finite.
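As a numerical companion to Lemma A.1 and Corollary A.2, the sketch below evaluates the lower bound on the frame length as a function of the weighted backlog $S = \sum_n Q_n(t_k)\,\mu_n$, and searches for a backlog level beyond which the bound exceeds a target frame length. It assumes the bound has the form $T_k > \big((S/N) / ((A_{max} N + S)^{\alpha} + A_{max})\big)^{1/(1+\alpha)}$; the function names and the doubling search are illustrative choices, not part of the paper.

```python
def frame_length_lower_bound(weighted_backlog: float, n_queues: int,
                             a_max: float, alpha: float) -> float:
    """Lower bound on T_k from Lemma A.1 (assumed form), as a function of
    S = sum_n Q_n(t_k) * mu_n."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    denom = (a_max * n_queues + weighted_backlog) ** alpha + a_max
    return ((weighted_backlog / n_queues) / denom) ** (1.0 / (1.0 + alpha))


def backlog_threshold(target_frame: float, n_queues: int,
                      a_max: float, alpha: float) -> float:
    """Return a backlog level (a power of two) at which the bound reaches
    target_frame. Because the bound is increasing in the backlog, every
    larger backlog also satisfies it; this plays the role of delta."""
    s = 1.0
    while frame_length_lower_bound(s, n_queues, a_max, alpha) < target_frame:
        s *= 2.0
    return s


if __name__ == "__main__":
    for s in (10, 100, 1000, 10000):
        print(s, round(frame_length_lower_bound(s, 2, 1, 0.5), 3))
    print(backlog_threshold(5, 2, 1, 0.5))
```

The bound grows roughly like $S^{(1-\alpha)/(1+\alpha)}$, so a smaller $\alpha$ makes the guaranteed frame length grow faster with the backlog.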