Closed and Open Loop Optimal Control of Buffer and Energy of a Wireless Device

V. S. Borkar
School of Technology and Computer Science, TIFR, Mumbai, India. borkar@tifr.res.in

A. A. Kherani, B. J. Prabhu
INRIA Sophia Antipolis, 06902 Sophia Antipolis, France. {alam,bprabhu}@sophia.inria.fr

Abstract

We study a decision problem faced by an energy-limited wireless device that operates in discrete time. There is some external arrival to the device's transmit buffer. The possible decisions are: (a) to serve some of the buffer content; (b) to order a new battery after serving the maximum possible amount that it can; or (c) to remain idle so that the battery charge can increase owing to the diffusion process, which is possible in some commercially available batteries. We look at open and closed-loop controls of the system. The closed-loop control problem is addressed using the framework of Markov Decision Processes. We consider both finite and infinite horizon discounted costs as well as average cost minimization problems. Without using any second order characteristics, we obtain results that include (i) optimality of bang-bang control, (ii) the optimality of threshold based policies, (iii) parametric monotonicity of the threshold, and (iv) uniqueness of the threshold. For the open-loop control setting we use recent advances in the application of multimodular functions to establish optimality of bracket sequence based control.

I. INTRODUCTION

Wireless devices are constrained in their operational lifetime by finite energy batteries. Therefore, energy efficient design of protocols at different layers of the protocol stack for wireless networks has recently received significant attention, see, for example, [1]-[4]. Although the primary objective of a terminal is to transmit and receive data with minimal delay, this must be done with the added constraint of minimizing the transmission costs and increasing the operational lifetime of the terminal.
In [5], the authors studied delay optimal packet scheduling policies subject to an average transmit power constraint over a wireless channel with independent fading. In [6], the authors extended this model to include Markovian fading. Although in the above mentioned articles an average transmit power constraint was imposed, an interesting feature of the battery was ignored. In [7], it was observed that a battery when left idle can regain some of its lost charge. This phenomenon, known as the relaxation phenomenon, allows a battery operated in pulsed or intermittent discharge mode to deliver more energy than the same battery operated in continuous discharge mode. This enables a user to send more packets and increase the operational lifetime of the terminal by leaving the battery idle in between packet transmissions, thus providing an incentive to remain idle even though the transmit buffer may not be empty. However, this would add to the delay of the packets queued up in the buffer. This trade-off between energy and delay leads to a decision making problem formulation where the user has to decide whether to serve packets or leave the terminal idle in order to minimize certain costs.

In this paper, we consider a discrete time system in which a user with a finite energy battery terminal has to decide whether to serve packets or to leave the system idle in each time slot. Further, the user can decide to replace the battery with a new one at an additional cost. We note that there are two variables, i.e., the energy level of the battery and the length of the transmit buffer, based on which a decision is to be made. We formulate the problem as a Markov decision process. We then derive the structural properties based on the directional derivative of the value function. We first consider a finite horizon problem and provide the structure of the optimal policy. We then extend this to the infinite horizon discounted cost problem, and finally consider the infinite horizon average cost minimization.
We then consider the problem of making an optimal decision when the remaining energy and the buffer occupancy are not known.

The outline of the paper is as follows. In Section II, we formulate the problem of closed loop control. Sections III, IV-B and IV-C deal with finite horizon discounted cost, infinite horizon discounted cost and infinite horizon average cost, respectively. Section VI deals with open loop control. For the proofs, and the definitions used in Section VI on open loop control, the reader is referred to the detailed research report [8].

II. CLOSED LOOP CONTROL

We first consider the optimal control problem where, at the beginning of each time slot, i.e., decision epoch, the device is aware of the current buffer occupancy and the energy level of the battery, and hence can take an action based on these two parameters. However, it does not have any knowledge of the amount of data that will arrive in the transmit buffer during the current time slot. In this section, we formalize the problem statement mentioned in the Introduction.

A. Problem Formulation

Let $x_n$ and $p_n$ denote the buffer length and the remaining energy level, respectively, at the beginning of the $n$th time slot. We assume $x_n$ is infinitely divisible and $x_n \in [0, \infty)$ for all $n$, i.e., the buffer content is fluid and there is infinite buffer space. The remaining energy level, $p_n$, is assumed to be bounded above by 1, i.e., $p_n \in [0, 1]$ for all $n$. A linear relationship is assumed between the amount of energy and the amount of fluid that can be transmitted using this energy. In other words, a unit of energy is required to serve each unit of fluid. The state space, $C$, is given by

$$C = \{(x, p) : x \in [0, \infty),\ p \in [0, 1]\}.$$

The cost function associated with the state $(x, p)$ is denoted by $h(x) + g(p)$, where $h(x)$ is an increasing function of $x$ and $g(p)$ is a decreasing function of $p$. We could also use a composite cost function $c(x, p)$ instead of $h(x) + g(p)$, and all the results in the paper would continue to hold. However, for simplicity, we use the above mentioned form.

Let $w_n$ denote the amount of fluid which arrives during the $n$th time slot. The random sequence $\{w_n\}$, $n \ge 0$, is assumed to be composed of independent and identically distributed random variables, each with distribution function $\mu$. We assume that when the battery is left idle in a slot, the residual battery energy (or charge) increases from $p$ to some amount $p + B(p) \ge p$. In the rest of the article, we drop the dependence of $B(p)$ on $p$ and use $B$ to denote the function. We note that the case $B = 0$ corresponds to the other practical scenario where the battery does not gain charge when left idle. The following development easily extends to the case where $B(p) < 0$.

In state $(x, p)$, the user can take one of the following actions:
1) remain idle,
2) serve some amount $u \in [0, x \wedge p]$ without reordering a new battery, or
3) serve $x \wedge p$ and reorder a new battery with residual energy level 1.
Here the symbol $\wedge$ denotes the minimum operator. We denote the action space by $A$, where $A = \{1, 2, 3\}$. A cost of $r(p)$, where $r$ is a non-decreasing function of $p$, is incurred each time a battery is reordered in state $(x, p)$. A policy $\pi$ defines an action for each $(x, p) \in C$. Let $\beta$ denote the discount factor. We study optimal policies which minimize one of the following three cost criteria.

Finite horizon discounted cost:
$$\sum_{k=0}^{N} \beta^k \left[ h(x_k) + g(p_k) \right], \quad \text{for some } N > 0.$$

Infinite horizon discounted cost:
$$\sum_{k=0}^{\infty} \beta^k \left[ h(x_k) + g(p_k) \right].$$

Infinite horizon average cost:
$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N} \left[ h(x_k) + g(p_k) \right].$$

III. THE FINITE HORIZON DISCOUNTED COST CASE

Let $V_k(x, p)$ denote the cost when the system state is $(x, p)$ and there are $k$ more decision epochs before reaching the horizon. Since the decision epoch can be determined from $V_k$, we use $x$, $p$ and $w$ instead of $x_k$, $p_k$ and $w_k$, respectively. For the finite horizon discounted cost case, the dynamic programming equation is

$$V_{k+1}(x, p) = h(x) + g(p) + \beta \min\Big\{ E_w V_k\big((x-p)^+ + w, 1\big) + r\big((p-x)^+\big),\ E_w V_k(x + w, p + B),\ \min_{0 \le u \le x \wedge p} E_w V_k(x + w - u, p - u) \Big\}. \qquad (1)$$

The expectation operator, $E_w$, is the expectation over the random variable $w$. The first term in the minimum operator corresponds to the decision of serving the maximum possible fluid and reordering a new battery. The second term corresponds to the decision of leaving the battery idle, and the third term corresponds to the decision of serving some amount of fluid. An important property satisfied by the above formulation is that

$$\frac{\partial}{\partial p} r\big((p-x)^+\big) + \frac{\partial}{\partial x} r\big((p-x)^+\big) = 0.$$

In the rest of this section, we use a fixed cost $c_3$ for battery reordering instead of $r$. We also assume that $\beta = 1$, with a note that all the results of this section are valid when we consider a discount factor $\beta < 1$. Let $N$ denote the finite horizon. We first provide a simple condition under which bang-bang control is optimal, i.e., the decision is to either serve the maximum possible quantity or to remain idle. We then provide structural results for the optimal policy, such as parametric monotonicity of the threshold policy. We would like to point out here that we obtain the results of this section without using second order characteristics of the value functions such as convexity or decreasing differences.

A. Optimality of Bang-Bang Control

Let

$$V'(x, p) = \frac{\partial}{\partial y} V(y, p)\Big|_{y=x} + \frac{\partial}{\partial q} V(x, q)\Big|_{q=p}$$

denote the directional derivative of $V(y, q)$ along the vector $(1, 1)$ at $(x, p)$. We now give a condition which is used to derive the structure of the optimal policy.

Assumption 1: The function $h'(x) + g'(p)$ has the same sign, say $S \in \{+, -\}$, for all values of $x$ and $p$.
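For candidate cost functions, the sign $S$ of Assumption 1 can be checked numerically before relying on the structural results. The following is a minimal sketch, not part of the paper: the helper name `assumption_sign`, the sample grids and the numerical cost slopes are all illustrative assumptions, and a grid check only gives evidence, not a proof, that the sign is uniform.

```python
def assumption_sign(h_prime, g_prime, xs, ps):
    """Return '+' or '-' if h'(x) + g'(p) keeps one strict sign on the grid, else None.

    h_prime, g_prime: callables for the derivatives of h and g (hypothetical inputs).
    xs, ps: sample buffer levels and energy levels on which the sign is checked.
    """
    values = [h_prime(x) + g_prime(p) for x in xs for p in ps]
    if all(v > 0 for v in values):
        return "+"
    if all(v < 0 for v in values):
        return "-"
    return None


# Linear costs h(x) = c1 * x and g(p) = -c2 * p, so h'(x) = c1 and g'(p) = -c2.
xs = [0.5 * i for i in range(21)]   # sample buffer levels in [0, 10]
ps = [0.1 * j for j in range(11)]   # sample energy levels in [0, 1]
sign_delay_heavy = assumption_sign(lambda x: 2.0, lambda p: -1.0, xs, ps)   # c1 > c2
sign_energy_heavy = assumption_sign(lambda x: 1.0, lambda p: -2.0, xs, ps)  # c1 < c2
```

With linear costs the sign is simply that of $c_1 - c_2$, so the first call corresponds to the case where delay cost dominates and the second to the case where energy cost dominates.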
Remark 1: The cost function $g(p)$ is a continuous nonincreasing function defined on the closed interval $[0, 1]$. The derivative of $g(p)$ is, therefore, negative and bounded from below. If $h(x)$ is a polynomial in $x$ such that the coefficient of $x$ is greater than $\sup_p |g'(p)|$ then $S$ will be positive for all $x$ and $p$. Similarly, if the derivative of $h(x)$ is upper bounded by $\inf_p |g'(p)|$ then $S$ will be negative for all $x$ and $p$. For example, for $c > 0$, the function h(x) = c exp xx is convex and increasing with $h'(x)$ upper bounded by $c$. This condition is quite general and is satisfied by many natural candidates for the cost functions $h(x)$ and $g(p)$. In particular, this condition is also satisfied by the linear functions $h(x) = c_1 x$ and $g(p) = -c_2 p$.

Let

$$u^* := \arg\min_{0 \le u \le x \wedge p} E_w V_k(x + w - u, p - u)$$

denote the optimal amount of fluid to be served in a slot in which the decision is to serve some fluid. Under Assumption 1, we have the following two lemmas.

Lemma 1: For all values of $x$, $p$, $w$, and $k$, $u^*$ is given by

$$u^* = \begin{cases} 0 & \text{if } V_k'(x, p) < 0 \ \ \forall (x, p), \\ x \wedge p & \text{if } V_k'(x, p) > 0 \ \ \forall (x, p). \end{cases}$$

The above lemma helps us to quantify the optimal amount of fluid to be served in a slot in which the decision is to serve some fluid. Depending on the sign of $V_k'(x, p)$, the optimal amount of fluid to be served is either zero or the maximum possible amount. We now characterize the behaviour of the sign of $V_k'(x, p)$.

Lemma 2: Under Assumption 1, the directional derivative $V_k'(x, p)$ has the same sign $S$ for all values of $k < N$, $x$, and $p$.

Corollary 1: For all values of $x$, $p$, $w$, and $k < N$, the optimal amount of fluid to be served, $u^*$, is

$$u^* = \begin{cases} 0 & \text{if } S \text{ is negative}, \\ x \wedge p & \text{if } S \text{ is positive}. \end{cases}$$

Using induction, one can prove the following lemma.

Lemma 3: $V_k(x, p)$ is decreasing in $p$ for a fixed $x$ and increasing in $x$ for a fixed $p$.

Using the previous three lemmas, we can now obtain certain characteristics of the optimal policy for the two different values of $S$.

Theorem 1: Let $r$ be a constant, equal to $c_3$.
1) If $S$ is negative then the optimal policy is to either (a) serve the maximum possible amount $x \wedge p$ and reorder, or (b) remain idle.
2) If $S$ is positive then the optimal policy is to either (a) serve the maximum possible amount $x \wedge p$ and reorder, or (b) serve the maximum possible amount $x \wedge p$ and not reorder, or (c) remain idle.
For either value of $S$, the optimal policy is of bang-bang type, i.e., either no fluid is served or the maximum possible amount of fluid is served.

Theorem 2: If $S$ is positive and $p$ is equal to 1 then the optimal policy is to first serve $x \wedge 1$ and then decide to reorder or not.

The battery charge $p$ cannot increase beyond 1. When $S$ is positive and $p$ is equal to 1, leaving the terminal idle only adds to the cost by increasing the delay of packets. Therefore, it is optimal not to remain idle.
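The bang-bang property in Corollary 1 can be illustrated on a discretized version of the dynamic programming equation (1). The sketch below is an illustration under stated assumptions, not the authors' computation: buffer and battery levels are restricted to integer grid units, costs are linear with $c_1 > c_2$ (so $S$ is positive), $\beta = 1$, the reorder cost is a constant `C3`, arrivals take two values with equal probability, and idling recovers one grid unit of charge. All parameter values and function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical illustration parameters (not taken from the paper).
J = 4                             # battery levels 0..J; level J is a full battery
C1, C2, C3 = 2.0, 1.0, 3.0        # h(x) = C1*x, g(p) = -C2*p, constant reorder cost C3
ARRIVALS = ((0, 0.5), (1, 0.5))   # (arrival in grid units, probability)
RECOVERY = 1                      # charge regained by idling one slot, capped at J


def stage_cost(i, j):
    """h + g at buffer level i and battery level j (both in grid units)."""
    return C1 * i - C2 * j


def expect(k, i, j):
    """E_w V_k(i + w, j): expectation over the next arrival."""
    return sum(prob * value(k, i + w, j) for w, prob in ARRIVALS)


def serve_costs(k, i, j):
    """E_w V_k(i + w - u, j - u) for every feasible amount u = 0..min(i, j)."""
    return [expect(k, i - u, j - u) for u in range(min(i, j) + 1)]


@lru_cache(maxsize=None)
def value(k, i, j):
    """V_k(i, j): cost with k decision epochs to go, following equation (1) with beta = 1."""
    if k == 0:
        return stage_cost(i, j)
    reorder = expect(k - 1, max(i - j, 0), J) + C3    # serve min(i, j), then reorder
    idle = expect(k - 1, i, min(j + RECOVERY, J))     # remain idle, battery recovers
    serve = min(serve_costs(k - 1, i, j))             # serve some u without reordering
    return stage_cost(i, j) + min(reorder, idle, serve)


def best_serve_amount(k, i, j):
    """Smallest minimizer u of E_w V_k(i + w - u, j - u)."""
    costs = serve_costs(k, i, j)
    return costs.index(min(costs))
```

For these parameters, `best_serve_amount(k, i, j)` comes out equal to `min(i, j)` on the states checked, which is the discrete counterpart of $u^* = x \wedge p$ when $S$ is positive. Memoized recursion is used instead of a truncated grid so that no artificial buffer cap distorts the value function near the boundary.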
For the special case where $B$ is equal to zero and $S$ is positive, we can strengthen Theorem 1 in the following way.

Theorem 3: Let $r$ be a constant, and let $B$ be zero. When $S$ is positive, the optimal policy is to either serve the maximum possible amount $x \wedge p$ and reorder, or serve the maximum possible amount $x \wedge p$ and not reorder.

Remark: When $B$ is equal to zero and $S$ is positive, it is optimal at all decision epochs to serve $x \wedge p$. We are, therefore, left with the decision to reorder or not to reorder. However, when $B$ is equal to zero and $S$ is negative, we cannot eliminate the decision to remain idle from the action space. This point will become clear later when we consider the infinite horizon discounted cost problem.

The above series of results has systematically reduced the number of choices to be made. We have also noted the optimality of the bang-bang control policy, i.e., it is optimal to either serve the maximum possible amount or to serve nothing. Next we provide some structural results for the optimal policy. We study separately the case when $S$ is negative and the case when $S$ is positive.

IV. WHEN S IS NEGATIVE

In this section we consider the case when $S$ is negative and obtain structural results for finite and infinite horizon discounted cost, and then use the vanishing discount approach to study the infinite horizon average cost case.

A. Structure of the Optimal Policy

We show that for each given value of $p$ and $k$, there is a value $x_k(p)$ such that if $x < x_k(p)$ then the optimal action is to remain idle. That is, for each value of $p$ and $k$ there exists a threshold $x_k(p)$ such that if the amount of fluid in the buffer is less than the threshold then the optimal action is to remain idle. We also show that $x_k(p)$ is an increasing function of $p$. First, we have the following lemma which can be proved using induction.

Lemma 4: The partial derivative $\frac{\partial}{\partial p} V_k(x, p)$ (resp. $\frac{\partial}{\partial x} V_k(x, p)$) is bounded above (resp. below) by $\max_p \frac{\partial}{\partial p} g(p)$ (resp. $\min_x \frac{\partial}{\partial x} h(x)$) for each $k \le N$.

Remark: If $\frac{\partial}{\partial p} g(p) < -1$ then $\frac{\partial}{\partial p} V_k(x, p) < -1$ for all values of $k \le N$.
Since we are considering the case when $S$ is negative, from Theorem 1, we only have two actions to choose from. Therefore, the dynamic programming equation simplifies to

$$V_k(x, p) = h(x) + g(p) + \min\Big\{ E_w V_{k-1}\big((x-p)^+ + w, 1\big) + c_3,\ E_w V_{k-1}(x + w, p + B) \Big\}. \qquad (2)$$

Lemma 5: At each decision epoch $k$, there is an $x_k(p)$ such that it is optimal to remain idle when $x < x_k(p)$ and the battery level is $p$.

Theorem 4: If $S = -$, i.e., the directional derivative $\frac{\partial}{\partial x} h(x) + \frac{\partial}{\partial p} g(p)$ is less than 0, then $x_k(p)$ is an increasing function of $p$.

Remark: In order to obtain the parametric monotonicity we have not used the convexity or decreasing differences properties of the value function which, in fact, are not present in our case.

Similarly we get

Corollary 2: If $X_p$ denotes the set of queue lengths $x$ such that the optimal decision is to remain idle when the battery level is $p$, then $X_p$ is increasing in $p$ in the sense that $X_p \subseteq X_{p+\delta}$ for all $\delta > 0$.

B. Infinite Horizon Discounted Cost

Now we consider the case of $N = \infty$, i.e., the infinite horizon problem. It is clear that all the properties obtained for the finite horizon problem are valid for this case also. For the finite horizon case, it was necessary to study the value function for all possible values of its arguments, i.e., $x$ and $p$. However, since we intend to study the average cost problem via the infinite horizon discounted cost case by using the vanishing discount approach, we will see that, for the case where $S$ is negative, it is enough to study the value function $V(x, p)$ only at $p$ equal to 1.

In this section, we study the structure of the infinite horizon discounted problem value function $V(x, 1)$ assuming $S$ is negative. We know that $V(\cdot, \cdot)$ satisfies the dynamic programming equation

$$V(x, p) = h(x) + g(p) + \beta \min\Big\{ E_w V\big((x-p)^+ + w, 1\big) + c_3,\ E_w V(x + w, p + B),\ \min_{0 \le u \le x \wedge p} E_w V(x - u + w, p - u) \Big\}. \qquad (3)$$

Assuming that $S$ is negative, we can use value iteration for the above problem to show that

$$V(x, p) = h(x) + g(p) + \beta \min\Big\{ E_w V\big((x-p)^+ + w, 1\big) + c_3,\ E_w V(x + w, p + B) \Big\}. \qquad (4)$$

Assume now that $h(x)$ is a concave differentiable increasing function so that $\sup_x \frac{\partial}{\partial x} h(x) = \frac{\partial}{\partial x} h(x)\big|_{x=0} \le (1-\beta) c_3$. Now we consider the value iteration for $p = 1$, i.e., assume $V_0(x, p) = h(x) + g(p)$, and apply the above minimization iteratively, generating a family of value functions $V_k(x, p)$, $k \ge 0$, so that

$$V_{k+1}(x, p) = h(x) + g(p) + \beta \min\Big\{ E_w V_k\big((x-p)^+ + w, 1\big) + c_3,\ E_w V_k(x + w, p + B) \Big\}. \qquad (5)$$

Since $p$ is equal to 1,

$$V_{k+1}(x, 1) = h(x) + g(1) + \beta \min\Big\{ E_w V_k\big((x-1)^+ + w, 1\big) + c_3,\ E_w V_k(x + w, 1) \Big\}. \qquad (6)$$

We observe from the above expression that once the battery energy reaches 1, it stays there. This simplifies the problem significantly as now, with the initial battery level at 1, the value function can be viewed as a function of $x$, the buffer occupancy, only. We now prove the following.

Theorem 5: The partial derivative of the value function with respect to $x$ at $x$ equal to zero, $\frac{\partial}{\partial x} V_k(x, 1)\big|_{x=0}$, is less than or equal to $(1 - \beta^{k+1}) c_3$, and $V_k(x, 1)$ is concave in $x$ for all $k \ge 0$.

Remark: The above result suggests that if $\sup_x \frac{\partial}{\partial x} h(x) = \frac{\partial}{\partial x} h(x)\big|_{x=0} \le (1-\beta) c_3$ then it is optimal to remain idle forever whenever the battery is fully charged.
This result, though seemingly counter-intuitive, can be explained by the fact that we are considering a discounted cost, which gives weight to near future cost only. To be able to give more consideration to the distant future, we require $\beta$ very close to unity, in which case the condition of the theorem, $\sup_x \frac{\partial}{\partial x} h(x) = \frac{\partial}{\partial x} h(x)\big|_{x=0} \le (1-\beta) c_3$, does not hold. This point will be clearer when we consider the average cost optimization problem, where one gives all weight to the distant future costs.

Remark: If $h(x)$ is not concave then the conclusion is weaker: we can only bound the slope at the origin, $\frac{\partial}{\partial x} V_{k+1}(x, 1)\big|_{x=0}$, in terms of $\frac{\partial}{\partial x} h(x)\big|_{x=0}$ and $(1-\beta) c_3$.

If we consider a linear form for the buffer cost, i.e., $h(x) = c_1 x$, then we have the following.

Theorem 6: If $h(x) = c_1 x$ with $c_1 \le (1-\beta) c_3$ then

$$\frac{\partial}{\partial x} V_k(x, 1)\Big|_{x=0} = \frac{1 - \beta^{k+1}}{1 - \beta}\, c_1,$$

so that $V_k(x, 1)$ is linear in $x$ for all $k \ge 0$.

Corollary 3: If $h(x) = c_1 x$ with $c_1 \le (1-\beta) c_3$ then $V(x, 1) = \frac{c_1 x + g(1)}{1 - \beta}$.

Theorem 7: If $S$ is negative and if $h(x)$ is a concave function such that $\inf_x h'(x) = (1-\beta) c_3 + L$ for $L > 0$, then there exists an $N$ such that $V_N(x, 1) > V_N(0, 1) + c_3$ for some $x < 1$.

Note that once the battery attains its maximum capacity, i.e., 1, then if $S$ is negative, the battery always remains fully charged. Hence, if the initial battery charge is 1 then we can consider $V_n(x, p)$ as a function of $x$ alone when considering the infinite horizon problem. For this case we have the result of Theorem 8, which requires the following definition.

Definition: We say that a function $f(x)$ is 1-convex if $f'(x) \le f'(x + 1)$ for all $x$.

Theorem 8: If $S$ is negative, and $h(x)$ is 1-convex and increasing, then $V_n(x, 1)$ is continuous and 1-convex, i.e., $\frac{\partial}{\partial x} V_n(x + 1, 1) \ge \frac{\partial}{\partial x} V_n(x, 1)$ for all $x \ge 0$.

Remark: The above result also implies that $V_n(x, 1)$ is neither convex nor concave in general.

Now we assume, without loss of generality, that $\inf_x h'(x) = 1$ (this can always be done by appropriately scaling $g(p)$ and $c_3$). We now use the result from Lemma 4 to provide the structure of the optimal policy.
Theorem 9: If $S$ is negative, and $h(x)$ is 1-convex and increasing with $h'(0) = 1$ and $c_3 < 1$, then there is a unique threshold $T$ such that, for $p = 1$, if $x \le T$ then it is optimal to remain idle for the infinite horizon problem, else it is optimal to serve $x \wedge 1$ and reorder the battery.

Theorem 7 can now be applied to the case where $h(x) = c_1 x$, and then Theorem 9 can be used to obtain more structural results when $h(x) = c_1 x$. We thus have the following structure when $S$ is negative and $h(x) = c_1 x$, when starting from $p = 1$:
1) if $c_1 \le (1-\beta) c_3$ then it is optimal to always remain idle when using a discount factor of $\beta$;
2) else there is a $T < c_3$ such that it is optimal to remain idle for $x < T$ and to reorder the battery for $x > T$.
The first result is obtained from Theorem 5 and the second part is obtained as follows: since $c_1 > (1-\beta) c_3$, starting from $p = 1$, in the value iteration we will ultimately get a stage at which $V(x, 1) > V(0, 1) + c_3$ for some $x < 1$. Now, since the structure derived for $h(x)$ convex is valid here, Theorem 9 can be invoked and a similar proof yields the conclusion.

Let us now make the dependence of the value function on $\beta$ explicit and use $V_{k,\beta}(\cdot, \cdot)$ to represent the value function in the $k$th step.

Theorem 10: If $c_3 < 1$ then for each $k$ and $x$, $V_{k,\beta}(x, 1)$ is non-decreasing in $\beta$.

Let $x_k(\beta)$ denote the unique threshold for the $k$-step-to-go cost function when the discount factor is $\beta$.

Lemma 6: The derivative $\frac{\partial}{\partial x} V_{k,\beta}(x, 1)$ is a non-decreasing function of $\beta$.

Theorem 11: The derivative $\frac{\partial}{\partial \beta} x_k(\beta) \le 0$.

C. Average Cost

We now consider the problem of optimizing the infinite horizon average cost when $S$ is negative. The approach is to use results from the infinite horizon discounted cost optimization and then use the standard vanishing discount approach with $\beta \to 1$. It is clear that if the average cost exists, it is independent of the initial state so that we can, without loss of generality, assume that $p_0 = 1$. We saw that, for the discounted cost case when $S$ is negative, once the level 1 is attained, it is retained throughout. This makes the structural results obtained in Section IV-B for this particular case directly relevant to the analysis of the average cost problem.

We first need to establish some continuity conditions (Conditions W in [5]). It is easily shown that these conditions are satisfied in our problem. A sufficient condition for the existence of a stationary average optimal policy, which can be obtained as a limit of discounted cost optimal policies $f_\beta(x)$, is provided in [9]. In our problem $f_\beta : \mathbb{R} \to \{0, 1\}$, where 0 means remaining idle and 1 means serving $x \wedge 1$ and reordering. Define $w_\beta(x) = V_\beta(x) - \inf_x V_\beta(x)$.

Theorem 12 (Schäl, Theorem 3.8): Suppose there exists a policy $\Psi$ and an initial state $x$ such that the average cost $V^\Psi(x) < \infty$. Let $\sup_{\beta < 1} w_\beta(x) < \infty$ for all $x$ and let the Conditions W hold. Then there exists a stationary policy $f^*$ which is average cost optimal and the optimal cost is independent of the initial state.
Also, $f^*$ is limit discount optimal in the sense that, for any $x$ and given any sequence of discount factors converging to one, there exists a subsequence $\{\beta_m\}$ of discount factors and a sequence $x_m \to x$ such that $f^*(x) = \lim_{m \to \infty} f_{\beta_m}(x_m)$.

In order to apply the above result we need to show:
1) Existence of a policy $\Psi$: The policy of serving $x \wedge 1$ in every slot yields a finite average cost.
2) $\sup_{\beta < 1} w_\beta(x) < \infty$ for all $x$: For any $\beta$, since $V_\beta(x)$ is monotone increasing, it follows that $x^* := \arg\min_x V_\beta(x) = 0$. We have $V_\beta(x^*) = h(0) + \beta E_w V_\beta(w)$. Now, for any fixed $x_0 = x$, consider a policy that serves $x_j \wedge 1$ and reorders until the first time the queue is empty (let us denote this time by a random variable $Z$). Then it can be shown that

$$w_\beta(x) = V_\beta(x) - V_\beta(0) \le \sum_{n=1}^{\infty} \left[ \sum_{j=0}^{n} h\Big( \big( x + \sum_{i=1}^{j} w_i - j \big)^+ \Big) \right] P(Z = n),$$

where the expression on the right hand side is independent of $\beta$ and finite almost surely if $E[W] < 1$. The required condition is thus verified.

Hence, for the average cost criterion the cost is independent of the initial state. So we can, without loss of generality, assume $p_0 = 1$. Now we use the results of Section IV-B to obtain our main result.

Theorem 13: For the average cost optimization problem, if $S$ is negative and $h(x)$ is a convex function, then there exists a threshold based policy which gives the minimum cost.

V. THE CASE OF S = + WITH B ≡ 0

We now consider the case when $S$ is positive and $B \equiv 0$. Our starting point is Theorem 3, which says that if $B \equiv 0$ then the optimal policy serves $x \wedge p$ and then decides whether to reorder the battery or not. Thus, in this case the dynamic programming equation is

$$V_{k+1}(x, p) = h(x) + g(p) + \beta \min\Big\{ E_w V_k\big((x-p)^+ + w, 1\big) + c_3,\ E_w V_k\big((x-p)^+ + w, (p-x)^+\big) \Big\}. \qquad (7)$$

We need to compare $V_k\big((x-p)^+ + w, 1\big) + c_3$ with $V_k\big((x-p)^+ + w, (p-x)^+\big)$ in order to obtain the policy at $(x, p)$. It is noted from these terms that the optimal policy should be a function only of $x - p$, i.e., the decision is the same for all $(x, p)$ for which $x - p$ is the same. This structure helps us in accurately characterizing the optimal policy, which is done below.

A. Finite Horizon Discounted Cost

Consider the dynamic programming equation for the case $x \le p$:

$$V_{k+1}(x, p) = h(x) + g(p) + \beta \min\Big\{ E_w V_k(w, 1) + c_3,\ E_w V_k(w, p - x) \Big\}. \qquad (8)$$

Observe that the first term under the minimization operation is independent of $x$ and $p$. Now, it was shown in Lemma 4 that $\frac{\partial}{\partial p} V_k(x, p)$ is bounded above by a negative quantity. Since $V_k(w, 1) \le V_k(w, 1) + c_3$, it follows that, for each $w$, there is a value $p_k(w)$ such that $V_k(w, l) > V_k(w, 1) + c_3$ for all $l < p_k(w)$ and $V_k(w, l) < V_k(w, 1) + c_3$ for all $l > p_k(w)$. Note here that it is possible that $p_k(w) = 0$, but what is important is that, owing to the negative value of $\frac{\partial}{\partial p} V_k(x, p)$, the set $\{l : V_k(w, l) < V_k(w, 1) + c_3\}$ is connected and hence has a smallest element, that is, $p_k(w)$. Note also that $p_k(w)$ is independent of the state $(x, p)$. By taking expectation over $w$, we obtain

Theorem 14: For a fixed $x < p$, there is a quantity $p_k$ such that it is optimal to reorder the battery when $x < p < x + p_k$ and it is optimal not to reorder the battery when $x + p_k < p$.

In order to derive the structure of the optimal policy for the case $x > p$, we again use Lemma 4. Consider the dynamic programming equation for the case $x > p$:

$$V_{k+1}(x, p) = h(x) + g(p) + \beta \min\Big\{ E_w V_k(x - p + w, 1) + c_3,\ E_w V_k(x - p + w, 0) \Big\}. \qquad (9)$$

From Lemma 4, if $\sup_l \frac{\partial}{\partial l} g(l) \le -c_3$, then $\frac{\partial}{\partial p} V_k(x, p) \le -c_3$ for all values of $k$, thus implying that $E_w V_k(x - p + w, 1) + c_3 \le E_w V_k(x - p + w, 0)$. The condition $\sup_l \frac{\partial}{\partial l} g(l) < -c_3$ also implies that $p_k(w) > 0$ for all values of $w$, thus $p_k > 0$. Hence we have

Theorem 15: If $\sup_l \frac{\partial}{\partial l} g(l) \le -c_3$ then it is optimal to reorder the battery after serving whenever $x > (p - p_k)^+$.

This result, along with Theorem 14, gives the complete structure of the optimal policy when $\sup_l \frac{\partial}{\partial l} g(l) \le -c_3$. Now, note that if it turns out that $\frac{\partial}{\partial p} V_k(x, p) > -c_3$ for all values of $x$ and $p$ then $p_k(w) = 0$ for all $w$ and, if $x > p$, $V_k(x - p + w, 1) + c_3 > V_k(x - p + w, 0)$ for all $w$, so that we get

Theorem 16: If $\inf_p \inf_x \frac{\partial}{\partial p} V_k(x, p) \ge -c_3$ for all values of $k$ then it is optimal to never reorder the battery.

The results obtained here are very similar to those obtained for the case when $S$ is negative in the sense that we get a threshold based policy where the existence of a nontrivial threshold depends on the slope of the cost functions. We are now considering the infinite horizon discounted/average cost problems for this case and expect them to provide results of the same flavour as those obtained when $S$ is negative.

VI. OPEN LOOP CONTROL

The key result obtained for the average cost optimization problem was the existence of a threshold based policy for the case when $S$ is negative. The problem with such an approach is that the threshold depends on the distribution of the arrival process, so that the computation of the threshold becomes hard. We may also look at other suboptimal policies that are easily implementable. This can be done, for example, by restricting the policies to those that do not require state information. Such policies need not be stationary. A possible decision problem now would be to find an optimal sequence of 0s and 1s, where 0 corresponds to reordering a new battery and 1 corresponds to remaining idle. We would like to have a bound on the long term average cost of reordering while minimizing the average buffer occupancy cost.

Let $\{a_n\}_{n \ge 1}$, $a_n \in \{0, 1\}$, be a sequence of controls so that $a_n = 1$ indicates the decision of remaining idle at the $n$th decision instant and $a_n = 0$ implies serving the maximum possible amount of fluid and reordering a new battery. We also require an upper bound on the rate at which the battery can be reordered, thereby taking into account the cost of reordering a battery. This can be done by letting

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} a_n \ge p,$$

where $p$ is chosen so that the system is also stable. For the system to be stable, the long term average of the given service should satisfy the following inequality:

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} (1 - a_n) \ge E[W],$$

i.e., $1 - p \ge E[W]$. We now have the problem of minimizing

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} x_n(a) \quad \text{subject to} \quad \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} a_n \ge p,$$

where $a$ is a control sequence and $x_n = \big(x_{n-1} - (1 - a_n)\big)^+ + w_n$ is the buffer occupancy at decision instant $n$. Without loss of generality, we assume that $x_0 = 0$ so that $x_1 = w_1$. For our case we have the following.

Theorem 17: The function $f_N(a) := x_N(a)$ is multimodular for each $N$.

Now we make the following assumption: the maximum amount of arrival in a slot is bounded. Under this assumption, all the conditions in Theorem 6 of [10] are satisfied (we can take the required sequence $b_n \equiv 0$ since $W$ is bounded). We have thus established the optimality of bracket sequences of rate $p$ for open loop control of the system under consideration.

VII. CONCLUSION

We considered jointly optimal scheduling and power control of a wireless device, and formulated it as a Markov decision process problem. We considered the cases of optimizing finite and infinite horizon discounted costs as well as that of infinite horizon average cost.
The problem becomes hard as the underlying state space is two-dimensional and important second order properties like convexity or increasing/decreasing differences do not hold. We established the optimality of a bang-bang policy which is threshold based. We also studied the behaviour of this threshold and obtained parametric monotonicity results. We then considered the problem of open-loop control of the system, where the decision maker does not have knowledge of the system state. For this case we proved that using a bracket sequence based policy results in optimal performance.
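As an illustration of the open-loop result of Section VI, the following minimal Python sketch generates a bracket sequence and evaluates the buffer recursion $x_n = (x_{n-1} - (1 - a_n))^+ + w_n$ under it. It is not taken from the paper: the function names `bracket_sequence` and `average_buffer` are hypothetical, the bracket sequence is assumed to follow the usual definition $a_n = \lfloor np + \theta \rfloor - \lfloor (n-1)p + \theta \rfloor$ with phase $\theta$, and the constant arrival stream of 0.5 per slot is chosen only to make the comparison deterministic.

```python
from fractions import Fraction
from math import floor


def bracket_sequence(p, n_slots, theta=Fraction(0)):
    """Return a_1..a_N with a_n = floor(n*p + theta) - floor((n-1)*p + theta).

    Assumed (standard) definition of a bracket sequence of rate p;
    a_n = 1 means "remain idle", a_n = 0 means "serve and reorder".
    Passing p as a Fraction keeps the floor computations exact.
    """
    return [floor(n * p + theta) - floor((n - 1) * p + theta)
            for n in range(1, n_slots + 1)]


def average_buffer(controls, arrivals):
    """Average of x_n over the horizon, with x_0 = 0 and
    x_n = max(x_{n-1} - (1 - a_n), 0) + w_n."""
    x = 0.0
    total = 0.0
    for a, w in zip(controls, arrivals):
        x = max(x - (1 - a), 0.0) + w
        total += x
    return total / len(controls)


# Illustrative comparison (all numbers below are assumptions for the example).
N = 1000
arrivals = [0.5] * N                          # constant arrivals, E[W] = 0.5
regular = bracket_sequence(Fraction(1, 2), N)  # idle rate p = 1/2, evenly spread
bunched = ([1] * 10 + [0] * 10) * (N // 20)    # same idle rate, clustered in blocks

print("bracket sequence:", average_buffer(regular, arrivals))
print("bunched sequence:", average_buffer(bunched, arrivals))
```

Both control sequences idle in exactly half of the slots, so they incur the same reordering rate; the sketch only shows that spreading the idle slots as evenly as possible keeps the buffer smaller in this toy case, which is the behaviour the multimodularity argument leads one to expect.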

ACKNOWLEDGEMENT

This work was supported by project no. 2900-IT- from the Centre Franco-Indien pour la Promotion de la Recherche Avancée (CEFIPRA).

REFERENCES

[1] S. Cui, A. J. Goldsmith, and A. Bahai, "Energy-constrained modulation optimization," to appear in IEEE Trans. on Wireless Communications, 2004.
[2] W. Ye, J. Heidemann, and D. Estrin, "An Energy-Efficient MAC Protocol for Wireless Sensor Networks," in Proceedings of the IEEE INFOCOM, 2002, pp. 1567-1576.
[3] Michele Zorzi and Ramesh R. Rao, "Energy Efficiency of TCP in a Local Wireless Environment," Mob. Netw. Appl., vol. 6, no. 3, pp. 265-278, 2001.
[4] C. E. Price, K. M. Sivalingam, P. Agarwal, and J.-C. Chen, "A Survey of Energy Efficient Network Protocols for Wireless and Mobile Networks," ACM/Baltzer Journal on Wireless Networks, vol. 7, no. 4, pp. 343-358, 2001.
[5] Munish Goyal, Anurag Kumar, and Vinod Sharma, "Power Constrained and Delay Optimal Policies for Scheduling Transmissions over a Fading Channel," in Proceedings of the IEEE INFOCOM, 2003.
[6] G. S. Rajadhyaksha and V. S. Borkar, "Transmission Rate Control Over Randomly Varying Channels," Probability in Engineering and Informational Sciences, vol. 19, no. 1, pp. 73-82, 2005.
[7] T. F. Fuller, M. Doyle, and J. S. Newman, "Relaxation phenomena in lithium-ion insertion cells," J. Electrochem. Soc., vol. 141, pp. 982-990, 1994.
[8] V. S. Borkar, A. A. Kherani, and B. J. Prabhu, "Control of Buffer and Energy of a Wireless Device: Closed and Open Loop Approaches," Tech. Rep. RR-5414, INRIA, December 2004.
[9] M. Schäl, "Average Optimality in Dynamic Programming with General State Space," Math. of Operations Res., vol. 18, pp. 163-172, 1993.
[10] E. Altman, B. Gaujal, and A. Hordijk, Discrete-Event Control of Stochastic Networks: Multimodularity and Regularity, Lecture Notes in Mathematics. Springer Verlag, 2003.