Online Power Control for Block i.i.d. Energy Harvesting Channels


Dor Shaviv and Ayfer Özgür

arXiv preprint [cs.IT], 25 Jun 2017

Abstract. We study the problem of online power control for energy harvesting communication nodes with random energy arrivals and a finite battery. We assume a block i.i.d. stochastic model for the energy arrivals, in which the energy arrivals are constant for a fixed duration $T$ but are independent across different blocks, drawn from an arbitrary distribution. This model serves as a simple approximation to a random process with coherence time $T$. We propose a simple online power control policy and prove that its performance gap to the optimal throughput is bounded by a constant which is independent of the parameters of the problem. This also yields a simple formula for the approximately optimal long-term average throughput, which sheds some light on the qualitative behavior of the throughput and how it depends on the coherence time of the energy arrival process. Our results show that, perhaps counter-intuitively, for a fixed mean energy arrival rate the throughput decreases with increasing coherence time $T$ of the energy arrival process. In particular, the battery size needed to approach the AWGN capacity of the channel increases linearly with the coherence time of the process. Finally, we show that our results can provide an approximation to the information-theoretic capacity of the same channel.

Index Terms: Energy harvesting, online power control, channel capacity, finite battery, block i.i.d.

I. INTRODUCTION

Recent advances in energy harvesting technologies enable wireless devices to harvest the energy they need for communication from the natural resources in their environment. This development opens the exciting possibility to build wireless networks that are self-powered, self-sustainable, and which have lifetimes limited by their hardware and not the size of their batteries.
Communication with such wireless devices requires the design of good power control policies that can maximize throughput under random energy availability. In particular, available energy should not be consumed too fast, or transmission can be interrupted in the future due to an energy outage; on the other hand, if the energy consumption is too slow, it can result in the wasting of the harvested energy and missed recharging opportunities in the future due to an overflow in the battery capacity. This problem has received significant interest in the recent literature.

This work was supported in part by a Robert Bosch Stanford Graduate Fellowship, in part by the National Science Foundation under Grant CCF, and in part by the Center for Science of Information, an NSF Science and Technology Center, under Grant CCF. This work was presented in part at the 2017 IEEE Wireless Communications and Networking Conference (WCNC) and submitted to 2017 IEEE GLOBECOM [2]. The authors are with the Department of Electrical Engineering, Stanford University, Stanford, CA, USA (e-mail: shaviv@stanford.edu; aozgur@stanford.edu).

In the offline case, when future energy arrivals are known ahead of time, the problem has an explicit solution [3]–[5]. The optimal policy keeps energy consumption as constant as possible over time while ensuring no energy wasting due to an overflow in the battery capacity. The more interesting case is the online scenario, where future energy arrivals are random and unknown. When the energy arrivals are i.i.d., the problem can be modeled as a Markov Decision Process (MDP) and solved numerically using dynamic programming [6]–[9]. However, this numerical approach can be computationally demanding, and it does not provide insight into the structure of the optimal online power control policy or the qualitative behavior of the resultant throughput, namely how it varies with the parameters of the problem.
This kind of insight can be critical for design considerations, such as choosing the size of the battery to employ at the transmitter. More recently, in [20] we developed a simple online policy which provably achieves a near-optimal throughput for any distribution of the energy arrivals (see also precursory work in [8], [9] and subsequent extensions). The gap between the throughput achieved by this scheme and the optimal throughput can be explicitly bounded by a constant, independent of the distribution of the energy arrivals and any of the problem parameters. This leads to a simple approximation for the optimal throughput, which sheds some light on the qualitative behavior of the optimal throughput and its dependence on major problem parameters.

All of the above solutions, including the MDP approach, are applicable only when the energy arrival process is i.i.d., and therefore the next energy arrival at each time instant is impossible to predict. However, most natural energy harvesting processes, such as solar energy or wind energy, are far from i.i.d. and are highly correlated over time. For processes of this type, an i.i.d. model is very far from the actual behavior of the process. The research on optimal online power control for non-i.i.d. processes with finite battery size is very scarce. For example, [3] proposes a simple policy for general stationary ergodic arrival processes which becomes asymptotically optimal as the battery size tends to infinity; however, this strategy can be arbitrarily far from optimality at finite battery size. For a finite battery, [28] studies the information-theoretic capacity of a model with a general Markov arrival process and provides upper and lower bounds on capacity. However, these bounds can be arbitrarily far from optimality, and moreover they do not provide any qualitative understanding of the actual capacity of the system.

In this work, we consider energy arrival processes which follow a block i.i.d. model.
This means that the energy arrivals remain constant for a fixed period of time, say $T$ time slots, and then change to an independent realization for the next $T$ time slots. This can model, for example, a solar panel which harvests energy from the sun, where the appearance of clouds can change randomly and block certain amounts of sunshine for a certain period of time. This process can be approximated by a block i.i.d. model. Additionally, this is a good model for a device which harvests RF energy from other transmitting devices in its environment. Such transmitting devices typically transmit continuously for certain periods of time and are silent for the remaining periods (as in TDMA, for example), which warrants a block i.i.d. model. Note that block i.i.d. models have been popularly used in wireless communication to capture correlations in the channel fading process by a simple model. In this case, $T$ is called the coherence time of the channel, which corresponds to the time duration over which the channel remains approximately constant [29]. Analogously, we refer to $T$ as the coherence time of the energy arrival process in this paper.

We propose a simple policy and establish its near-optimality for this block i.i.d. model. This policy combines features of the optimal offline [4] and approximately-optimal online [20] strategies for the i.i.d. ($T = 1$) model. Since in the beginning of each block the future energy arrivals are known for a duration of $T$ channel uses, energy allocations for the entire block can be decided on ahead of time, akin to the offline setting. In particular, power allocation inside each block is constant, as implied by the optimal offline strategy, and ensures that energy is not wasted due to an overflow in the battery capacity. On the other hand, the energy arrivals are i.i.d. across different blocks, and the situation across blocks is akin to the online setting. In particular, between different blocks the policy resembles the Fixed Fraction Policy of [20], where a constant fraction of the currently available energy in the battery is allocated to the channel.
However, achieving the optimal throughput within a constant gap requires a non-trivial combination of these two schemes. In the same spirit as [20], we develop a lower bound to the throughput achieved by our proposed policy by modifying the distribution of the energy arrivals. We do so in a way that, as we show, produces worse throughput than the original distribution, and for which we can analytically evaluate the throughput. (In [20] this modified distribution was simply a Bernoulli distribution; here the modification is slightly more involved.)

We then proceed to develop a nearly-tight upper bound on the optimal throughput achievable under the block i.i.d. model. The throughput achieved with an infinite battery, namely the AWGN capacity $\frac{1}{2}\log(1+\mu)$, where $\mu$ is the mean of the energy arrival rate, is always an upper bound on the throughput achievable with any finite battery size. This was the upper bound used in the i.i.d. case in [20]. However, this upper bound turns out to be too loose in general for the block i.i.d. case; indeed, we show that this upper bound is nearly achievable (up to a bounded gap) only when the battery size is large enough, specifically when $B \ge \mu + T(E_{\max} - \mu)$, where $E_{\max}$ is the maximal energy arrival. Note that for fixed $\mu$ and $E_{\max}$, as the coherence time $T$ of the energy arrival process increases, a larger battery is needed to approach the AWGN capacity. This is somewhat counter-intuitive, since one may expect a large coherence time to increase the optimal throughput, as it results in larger lookahead. We show that when $B < \mu + T(E_{\max} - \mu)$, the optimal throughput can be significantly smaller than the AWGN capacity. We finally show that the difference between the throughput achieved by our proposed strategy and our upper bound is bounded by $\frac{1}{2}\log e \approx 0.72$, regardless of the values of the problem parameters.
While in this paper we mostly focus on the online power control problem for energy harvesting nodes, we show that this problem is central to understanding and achieving the information-theoretic capacity of this channel. Following the approach in [30], which focused on an i.i.d. model for the energy arrival process, we show that the information-theoretic capacity of the channel can be approximated by the corresponding optimal online throughput also under a block i.i.d. model. The upper bound on the gap between the two performance metrics we develop in this paper depends on the entropy rate of the energy arrival process, which decreases with increasing coherence time. It is also possible to bound the gap between these two performance metrics by the entropy rate of the online power control process (rather than the entropy rate of the energy harvesting process itself). By modifying the online power control policy to have a constant entropy rate, along the lines of [30], we believe it is possible to show that the information-theoretic capacity and the optimal online throughput are indeed within a constant gap of each other, independent of the parameters of the problem.

II. SYSTEM MODEL

We begin by introducing notation. Let $\mathbb{E}$ denote expectation. All logarithms are taken to base 2. For a process $\{x_t\}_{t=1}^{\infty}$ with a block structure, it will be convenient to have special notation for the $j$-th slot in the $i$-th block:

$$x^{(i)}_j := x_{(i-1)T+j}, \qquad j = 1,\ldots,T, \quad i = 1,2,\ldots$$

We consider the discrete-time online power control problem for an energy harvesting transmitter communicating over an additive Gaussian channel. The transmitter is equipped with a battery of finite capacity $B$, which is being continuously recharged by an exogenous energy harvesting process. Let $E_t \in \mathcal{E}$ be the energy harvested at discrete time $t$. We assume $E_t$ is a block i.i.d. stochastic process with block duration $T$. More precisely, let $\{E^{(i)}\}_{i=1}^{\infty}$ be an i.i.d.
random process, where $E^{(i)}$ is a nonnegative random variable (RV) taking values in a set $\mathcal{E}$, with marginal distribution $P_E$. To simplify the analysis, we assume $P_E$ is a discrete distribution and $\mathcal{E}$ is a finite set; however, our results hold in more generality for arbitrary discrete or continuous distributions. Then the process $E_t$ is given by

$$E_t = E^{(i)}, \qquad t = (i-1)T+1,\ldots,iT, \quad i = 1,2,\ldots$$

We assume $E^{(i)} > 0$ with positive probability; otherwise $E^{(i)} = 0$ w.p. 1 and the problem is degenerate.

A power control policy for an energy harvesting system is a sequence of mappings from energy arrivals to a non-negative number, which will denote a level of instantaneous power. In this work, we will focus on online policies; an online policy $g = \{g_t\}_{t=1}^{\infty}$ is a sequence of mappings $g_t : \mathcal{E}^t \to \mathbb{R}_+$, $t = 1,2,\ldots$, such that the instantaneous power at time $t$ is $g_t(E_1, E_2, \ldots, E_t)$. In words, the power allocation at time $t$ can depend only on the realizations of the energy arrival process up to time $t$ (and not future realizations of the random process), although the probabilistic model for the energy arrival process, i.e. the fact that it is block i.i.d. with coherence time $T$ and distribution $P_E$, is known ahead of time. By allocating power $g_t$ at time $t$, we get an instantaneous rate equal to the AWGN capacity, i.e.

$$C(g_t) := \tfrac{1}{2}\log(1+g_t). \tag{1}$$

Let $b_t$ be the amount of energy available in the battery at the beginning of time slot $t$. An admissible policy $g$ is one that satisfies the following constraints for every possible harvesting sequence $\{E_t\}_{t=1}^{\infty}$:

$$0 \le g_t \le b_t, \qquad t = 1,2,\ldots \tag{2}$$

$$b_t = \min\{b_{t-1} - g_{t-1} + E_t,\; B\}, \qquad t = 2,3,\ldots \tag{3}$$

where we assume $b_1 = B$ without loss of generality. For a given policy $g$, we define the $N$-horizon expected total throughput to be $\mathcal{T}_N(g) = \mathbb{E}\left[\sum_{t=1}^{N} C(g_t)\right]$, where the expectation is over the energy arrivals $E_1,\ldots,E_N$. The long-term average throughput of the same policy is defined as

$$\mathcal{T}(g) = \liminf_{N\to\infty} \frac{1}{N}\,\mathcal{T}_N(g). \tag{4}$$

Our goal is to characterize the optimal online power control policy and the resultant optimal long-term average throughput:

$$\Theta = \sup_{g \text{ admissible}} \mathcal{T}(g). \tag{5}$$

III. PRELIMINARY DISCUSSION

A. Background

The optimal offline power control policy has been explicitly characterized in [3]–[5], in which the energy arrival sequence $\{E_t\}_{t=1}^{\infty}$ is assumed to be known ahead of time. Additionally, in [20] we develop a near-optimal online power control policy for the case of i.i.d. energy arrivals and provide an approximate expression for the resultant long-term average throughput, with a bounded gap to optimality. In particular, the Fixed Fraction Policy of [20] allocates a fixed fraction $q$ of the currently available energy at each channel use.
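The admissibility constraints (2) and (3) and the rate function (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `awgn_rate` and `average_rate` are hypothetical, and the policy is restricted to depend on the current battery level alone, which is a simplifying assumption (a general online policy may depend on the whole arrival history).

```python
import math


def awgn_rate(power):
    """Instantaneous rate C(g) = 1/2 * log2(1 + g), as in (1)."""
    return 0.5 * math.log2(1.0 + power)


def average_rate(policy, arrivals, B):
    """Empirical per-slot rate of a battery-state policy on one arrival sequence.

    `policy(b)` maps the battery level b_t to a power g_t (assumption: the
    policy sees only the battery level). `arrivals[t-1]` plays the role of E_t.
    """
    b = B          # b_1 = B, so E_1 is not used
    g = 0.0
    total = 0.0
    for t, e in enumerate(arrivals, start=1):
        if t > 1:
            b = min(b - g + e, B)      # battery recursion (3)
        g = policy(b)
        if not 0.0 <= g <= b:          # admissibility constraint (2)
            raise ValueError("policy violates 0 <= g_t <= b_t")
        total += awgn_rate(g)
    return total / len(arrivals)
```

For instance, `average_rate(lambda b: 0.5 * b, [0, 1, 0, 1], B=2)` runs a policy that spends half of the battery in every slot; the fraction 0.5 is an arbitrary illustrative choice.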
More precisely, let

$$q := \frac{\mathbb{E}[\min(E_t, B)]}{B}.$$

Note that $0 \le q \le 1$. Then

$$g_t = q\, b_t, \qquad t = 1,2,\ldots$$

The main result of [20] is to establish the optimality of this online strategy within a constant additive gap for any i.i.d. process ($T = 1$).

Theorem 1 (Theorem 2 in [20]). Let $E_t$ be an i.i.d. nonnegative process, and let $g$ be the Fixed Fraction Policy. Then the throughput achieved by $g$ is bounded by

$$\mathcal{T}(g) \ge C\big(\mathbb{E}[\min(E_t, B)]\big) - \tfrac{1}{2}\log e,$$

where $C(\cdot)$ is the AWGN capacity defined in (1).

Note that the AWGN capacity $C(\mathbb{E}[\min(E_t, B)])$ is an upper bound on the achievable throughput for any distribution (see [20, Prop. 2]). Observe that whenever there is an energy arrival larger than the battery size, $E_t > B$, the battery will be completely recharged to $B$ and the remaining energy is discarded, as per (3). Hence, effectively, this is as if an energy arrival of $E_t = B$ occurred. We can therefore replace the energy arrival process with $\min(E_t, B)$, and $\mathbb{E}[\min(E_t, B)]$ is the mean energy harvested by the transmitter.

There is an alternative way to view the quantity $\mathbb{E}[\min(E_t, B)]$, which will be useful in the sequel. Observe that whenever the event $\{E_t > B\}$ occurs, the memory of the system, which is encapsulated in the state of the battery, is essentially erased. This induces a regenerative structure for the online decision process, and the behaviors of different epochs (the periods between consecutive events $\{E_t > B\}$) are statistically independent and identical. Let $p = \Pr(E_t > B)$ and observe that the average length of an epoch is $\tau = 1/p$. The average energy available for transmission in a single epoch is given by

$$\varepsilon = B + \left(\tfrac{1}{p} - 1\right)\mathbb{E}[E_t \mid E_t \le B],$$

because the battery is fully charged at the beginning of the epoch, and the average amount of energy harvested in each of the subsequent time slots is $\mathbb{E}[E_t \mid E_t \le B]$. Therefore, the average energy per time slot which is available for transmission is given by

$$\frac{\varepsilon}{\tau} = pB + (1-p)\,\mathbb{E}[E_t \mid E_t \le B] = \mathbb{E}[\min(E_t, B)].$$

B. Preliminary Results

For the block i.i.d.
model considered in this paper, it can be observed that the problem can be formulated as a Markov Decision Process (MDP), where each time step of the MDP corresponds to $T$ time slots of the original communication system. Let $i$ denote the $i$-th step of this MDP. Then we define the state as the pair $(b_{(i-1)T+1}, E_{(i-1)T+1}) = (b^{(i)}_1, E^{(i)})$. The action (or control) is the vector of power allocations for the entire block, $(g_{(i-1)T+1},\ldots,g_{iT}) = (g^{(i)}_1,\ldots,g^{(i)}_T)$, which must satisfy the energy constraints (2) and (3). The disturbance is $E^{(i+1)}$, and the next state pair $(b^{(i+1)}_1, E^{(i+1)})$ is given by

$$b^{(i+1)}_1 = \min\{b^{(i)}_T - g^{(i)}_T + E^{(i+1)},\; B\}, \tag{6}$$

where the state variable $E^{(i+1)}$ is of course equal to the disturbance itself. The stage reward is given by $r_i = \sum_{j=1}^{T} C(g^{(i)}_j)$, and the goal is to optimize the expected long-term average reward per stage, given by $\liminf_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} \mathbb{E}[r_i]$.

In fact, this MDP can be further simplified. First, it can be easily seen that since the energy arrivals for the entire block are known ahead of time, it is suboptimal to have battery overflows inside the block (unless $E^{(i)} > B$, in which case overflows are inevitable). That is, $b^{(i)}_j = b^{(i)}_{j-1} - g^{(i)}_{j-1} + E^{(i)}$ for $j = 2,\ldots,T$. Otherwise, if there is some $j$ such that $b^{(i)}_{j-1} - g^{(i)}_{j-1} + E^{(i)} > B$, one can simply increase $g^{(i)}_{j-1}$, and consequently the reward, without affecting the state. According to this observation, and by concavity of the logarithm, it follows that it is optimal to set $g^{(i)}_1 = g^{(i)}_2 = \ldots = g^{(i)}_{T-1}$.

Thus the control is reduced to the pair $(g^{(i)}_1, g^{(i)}_T)$ (note that in general $g^{(i)}_1$ is not equal to $g^{(i)}_T$). This is made formal in the following lemma, which is proved in Appendix A.

Lemma 1. The MDP defined previously is equivalent to the following MDP, with state pair $(b^{(i)}_1, E^{(i)})$, action pair $(g^{(i)}_1, g^{(i)}_T)$, and disturbance $E^{(i+1)}$. The actions must satisfy the constraints

$$0 \le g^{(i)}_1 \le \min\left(E^{(i)} - \frac{E^{(i)} - b^{(i)}_1}{T-1},\; B\right), \tag{7}$$

$$0 \le g^{(i)}_T \le b^{(i)}_T, \tag{8}$$

where $b^{(i)}_T = \min\{b^{(i)}_1 + (T-1)(E^{(i)} - g^{(i)}_1),\; B\}$. The state evolves according to the function

$$\big(b^{(i+1)}_1,\; E^{(i+1)}\big) = \big(\min\{b^{(i)}_T - g^{(i)}_T + E^{(i+1)},\; B\},\; E^{(i+1)}\big),$$

and the stage reward is given by

$$r\big(b^{(i)}_1, E^{(i)}, g^{(i)}_1, g^{(i)}_T\big) = (T-1)\,C(g^{(i)}_1) + C(g^{(i)}_T). \tag{9}$$

Additionally, it follows from the fact that $0 \le g^{(i)}_T \le b^{(i)}_T$, $b^{(i+1)}_1 = \min(b^{(i)}_T - g^{(i)}_T + E^{(i+1)}, B)$, and $E^{(i+1)}$ is independent of the state (or equivalently, by the principle of optimality [3]), that the policy for $g^{(i)}_T$ can be a function of $b^{(i)}_T$ instead of $(b^{(i)}_1, E^{(i)})$.

The optimal policy can be found by solving the Bellman equation [32]; however, this is hard to solve explicitly, even for the simple case of $T = 1$ (i.i.d. energy arrivals). Alternatively, it can be solved numerically using value iteration, but this can require extensive computational resources. Specifically, since the state space is a continuous interval and the action space is a two-dimensional rectangle, only an approximate solution can be found. This is done by quantizing the state and action spaces, a process which suffers from the curse of dimensionality. Additionally, the numerical solution cannot provide insight into the structure of the optimal policy and the qualitative behavior of the optimal throughput, namely how it varies with the parameters of the problem.

In the next section, we propose an explicit online power control policy and show that it is within a constant gap of $\frac{1}{2}\log e \approx 0.72$ to optimality, analogously to Theorem 2 of [20] stated above. This gap does not depend on any of the parameters of the problem, namely $T$, $B$, or the distribution of the energy arrivals $P_E$.
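One stage of the reduced MDP of Lemma 1 can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `block_step` and the numerical tolerance are hypothetical, $T \ge 2$ is assumed, and the bound used for the first action follows constraint (7) as read here, i.e. $g^{(i)}_1 \le \min\big(E^{(i)} - (E^{(i)} - b^{(i)}_1)/(T-1),\, B\big)$.

```python
import math


def C(g):
    """AWGN rate C(g) = 1/2 * log2(1 + g)."""
    return 0.5 * math.log2(1.0 + g)


def block_step(b1, E, g1, gT, E_next, B, T, tol=1e-12):
    """One stage of the reduced block MDP (assumes T >= 2).

    State: (b1, E). Action: g1 used in slots 1..T-1 and gT in slot T.
    Returns the next battery level at the start of the next block and the
    stage reward (T-1)*C(g1) + C(gT).
    """
    g1_max = min(E - (E - b1) / (T - 1), B)            # constraint (7)
    if not -tol <= g1 <= g1_max + tol:
        raise ValueError("g1 violates constraint (7)")
    bT = min(b1 + (T - 1) * (E - g1), B)               # battery at slot T
    if not -tol <= gT <= bT + tol:
        raise ValueError("gT violates constraint (8)")
    reward = (T - 1) * C(g1) + C(gT)                   # stage reward (9)
    b1_next = min(bT - gT + E_next, B)                 # state update (6)
    return b1_next, reward
```

A value-iteration solver would call such a step function on a quantized grid of states and actions, which is exactly where the dimensionality cost mentioned above arises.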
Moreover, this policy yields a simple and insightful formula for the approximate throughput, which clarifies how the battery size needs to be chosen in terms of $T$ and $P_E$ for the resultant throughput to approach the AWGN capacity.

IV. MAIN RESULT

Note that if $E^{(i)} > B$, then $b^{(i)}_1 = b^{(i)}_2 = \ldots = b^{(i)}_T = B$ regardless of the allocated energy. Hence, we can treat such energy arrivals as if $E^{(i)} = B$, and for the rest of this section we will assume $E^{(i)} \le B$.

Before we formally state the main result of the paper, we informally motivate the policy we propose for the block i.i.d. energy arrival model. In the light of the discussion in the previous section, a natural way to extend the Fixed Fraction Policy of [20] to the block i.i.d. model can be as follows: for an appropriately chosen $q \in [0,1]$, let

$$g^{(i)}_j = \frac{q}{T}\big(b^{(i)}_1 + (T-1)E^{(i)}\big), \qquad j = 1,\ldots,T. \tag{10}$$

The intuition behind this extension can be understood as follows: since the total energy to be harvested throughout a block is known ahead of time, in the first time slot of the block this strategy decides on the total energy to be allocated in the current block $i$ by taking into account both the energy available in the battery in the first time slot of the block, $b^{(i)}_1$, and the energy that will be harvested in the remaining time slots, $(T-1)E^{(i)}$. The sum of these two quantities, i.e. the energy we already have in the battery plus the energy we know we will harvest, can be thought of as the energy we effectively have for this block. The total energy allocated to block $i$ is simply a fraction $q$ of the energy we effectively have. This total energy is then uniformly divided over the $T$ channel uses in the block, due to the concavity of the reward function, akin to the optimal offline strategy.

We will adopt the policy in (10) unless this allocation leads to an overflow of the battery during the block, and therefore a wasting of the harvested energy. Note that this strategy will not lead to a battery overflow throughout the block if and only if

$$\left(1 - \frac{T-1}{T}\,q\right)\big(b^{(i)}_1 + (T-1)E^{(i)}\big) \le B. \tag{11}$$
When this is the case, the battery state at the beginning of the last time slot of the block is given by

$$b^{(i)}_T = b^{(i)}_1 + (T-1)\big(E^{(i)} - g^{(i)}_1\big) = \left(1 - \frac{T-1}{T}\,q\right)\big(b^{(i)}_1 + (T-1)E^{(i)}\big). \tag{12}$$

Therefore, when (11) is satisfied, we can write the policy in (10) in a way that agrees with the optimal policy structure for the MDP formulation discussed in Section III-B:

$$g^{(i)}_1 = \frac{q}{T}\big(b^{(i)}_1 + (T-1)E^{(i)}\big), \qquad g^{(i)}_T = \frac{q}{q + T(1-q)}\; b^{(i)}_T. \tag{13}$$

For blocks in which the condition (11) is not satisfied, we would want to modify the policy (10) so as not to waste the harvested energy. Note that this condition can be checked at the beginning of the block, and the energy allocations can be increased from that in (10) if the condition is not satisfied. In particular, if (11) is not satisfied, we modify the policy to:

$$g^{(i)}_1 = \min\left(E^{(i)} - \frac{B - b^{(i)}_1}{T-1},\; B\right), \qquad g^{(i)}_T = \frac{q}{q + T(1-q)}\; B. \tag{14}$$

The energy allocations in the first $T-1$ time slots are increased so that energy is not wasted and the battery is fully charged after the last energy arrival, i.e. $b^{(i)}_T = B$. Note that the energy allocated at the last time slot follows the same policy as (13), since $b^{(i)}_T = B$.

While we can use the condition in (11) for switching between the two modes of the policy, for small and large energy arrivals respectively, as discussed above, we would want to simplify this condition in a way that does not significantly degrade the performance but simplifies the following discussion. If we assume the battery was empty at the end of the previous block, i.e. $b^{(i)}_1 = E^{(i)}$, the no battery overflow condition (11) would be equivalent to $E^{(i)} \le E_c$, where $E_c$ is a critical energy level given by

$$E_c = \frac{B}{q + T(1-q)}. \tag{15}$$

Note that when $E^{(i)} > E_c$, battery overflow will occur regardless of the state of the battery at the end of the previous block. Specifically, we propose to allocate energy according to (14) when $E^{(i)} > E_c$, and use (13) when $E^{(i)} \le E_c$.

It remains to choose the fixed fraction $q$. Recall that in the i.i.d. case, as discussed in Section III-A, $q$ was chosen to be $\frac{\mathbb{E}[\min(E_t, B)]}{B}$, where $B$ is the size of the battery and $\mathbb{E}[\min(E_t, B)]$ is the average energy available in an epoch, which is the period between consecutive large arrival events $\{E_t > B\}$. In the block i.i.d. case, observe that when the event $\{E^{(i)} > E_c\}$ occurs at block $i$, the battery will be fully charged at the end of the block. Hence, let $p = \Pr(E^{(i)} > E_c)$, and imagine we put aside the first $T-1$ time slots of the large arrival block (in which we abandon the fixed fraction policy) and instead concentrate only on the subsequent slots (where we do apply it). The average energy available for this period can be computed as

$$\varepsilon = B + \left(\tfrac{1}{p} - 1\right) T\; \mathbb{E}\big[E^{(i)} \mid E^{(i)} \le E_c\big],$$

because the battery is fully charged at the last time slot of the large arrival block, and at each one of the subsequent low-energy blocks the transmitter harvests an average amount of energy equal to $T\,\mathbb{E}[E^{(i)} \mid E^{(i)} \le E_c]$. The duration of an epoch is on average $\tau = 1 + \left(\tfrac{1}{p} - 1\right)T$ slots. Therefore, the average energy available per time slot during this period is again given by $\varepsilon/\tau$. Note that the system is reset whenever an energy arrival larger than $E_c$ occurs, which leaves the battery fully charged at the beginning of the next epoch. Therefore, inspired by the i.i.d. case, given $E_c$ we may want to choose

$$q = \frac{\varepsilon/\tau}{E_c} = \frac{B + \left(\tfrac{1}{p} - 1\right) T\; \mathbb{E}\big[E^{(i)} \mid E^{(i)} \le E_c\big]}{E_c\left(1 + \left(\tfrac{1}{p} - 1\right)T\right)}. \tag{16}$$
Recall, however, that given $q$ we want to choose $E_c$ as in (15). These two desired relations for $E_c$ and $q$, along with the identity $\mathbb{E}[\min(E^{(i)}, E_c)] = pE_c + (1-p)\,\mathbb{E}[E^{(i)} \mid E^{(i)} \le E_c]$, can be solved to obtain the following equation:

$$T E_c - (T-1)\,\mathbb{E}\big[\min(E^{(i)}, E_c)\big] = B, \tag{17}$$

which can be solved for $E_c$ for given $B$, $T$, and $P_E$ (it is shown in Appendix B that it has a unique solution in the interval $[0, B]$). Additionally, combining (15) and (17) yields the following simple formula for $q$ given $E_c$:

$$q = \frac{\mathbb{E}\big[\min(E^{(i)}, E_c)\big]}{E_c}. \tag{18}$$

Note that this is essentially the same expression for $q$ as in the i.i.d. case, with $B$ replaced by $E_c$. Indeed, when $T = 1$, eq. (17) reduces to $E_c = B$ and hence (18) reduces to $\frac{\mathbb{E}[\min(E^{(i)}, B)]}{B}$.

To summarize, the online policy we propose for the block i.i.d. case is given as follows.

Policy 1. Given $B$, $T$, and $P_E$ (the distribution of $E^{(i)}$), compute $E_c$ and $q$ according to (17) and (18). Then apply

$$g^{(i)}_1 = \begin{cases} \dfrac{q}{T}\big(b^{(i)}_1 + (T-1)E^{(i)}\big) & \text{if } E^{(i)} \le E_c, \\[2mm] E^{(i)} - \dfrac{B - b^{(i)}_1}{T-1} & \text{if } E_c < E^{(i)} \le B, \\[2mm] B & \text{if } B < E^{(i)}, \end{cases} \qquad g^{(i)}_T = \frac{q}{q + T(1-q)}\; b^{(i)}_T, \tag{19}$$

and note that $g^{(i)}_j = g^{(i)}_1$ for $j = 2,\ldots,T-1$.

The main result of this paper is to prove that this policy is optimal within the same gap as in the i.i.d. case, as stated in the following theorem.

Theorem 2. Let $E_c$ be the unique solution of (17), and let $p = \Pr(E^{(i)} > E_c)$. Then the optimal throughput is bounded by

$$\bar{\Theta} - \tfrac{1}{2}\log e \;\le\; \Theta \;\le\; \bar{\Theta},$$

where

$$\bar{\Theta} = \frac{p(T-1)}{T}\; \mathbb{E}\left[ C\left(\min\left\{E^{(i)} - \frac{B - E^{(i)}}{T-1},\; B\right\}\right) \,\middle|\, E^{(i)} > E_c \right] + \frac{p + (1-p)T}{T}\; C\Big(\mathbb{E}\big[\min(E^{(i)}, E_c)\big]\Big), \tag{20}$$

and the lower bound is achieved by Policy 1.

Note that the structure of the approximately optimal throughput expression has a natural interpretation in terms of Policy 1. The expression has two terms, corresponding to the two different operation modes of the policy. The first term corresponds to the throughput achieved in the first $T-1$ time slots of a large energy arrival block. Note that these time slots correspond to a fraction $\frac{p(T-1)}{T}$ of the total time on average. In the remaining fraction of the time we apply the Fixed Fraction Policy, which, analogously to the i.i.d.
case, achieves a throughput $C(\mathbb{E}[\min(E^{(i)}, E_c)])$, where $\mathbb{E}[\min(E^{(i)}, E_c)] = \varepsilon/\tau = qE_c$ is the average available energy rate for a low-energy period.

Theorem 2 is proved by showing the throughput obtained by Policy 1 under the process $E^{(i)}$ is lower bounded by the throughput obtained by Policy 1 under a different block i.i.d. energy arrivals process $\hat{E}^{(i)}$ with a modified distribution. This modified distribution has structure similar to a Bernoulli distribution (the exact distribution will be precisely defined in Section V). The analysis of Policy 1 turns out to be easier for Bernoulli distributions, since the regenerative structure discussed previously is inherent in the arrivals process. Specifically, the case when $E^{(i)} \in \{0, \bar{E}\}$ for some $\bar{E} > 0$ was solved in prior work and was the basis for this work.

Denote $\mu = \mathbb{E}[E^{(i)}]$. For all ergodic energy arrival processes (including block i.i.d.), the AWGN capacity $\frac{1}{2}\log(1+\mu)$ is always an upper bound on the throughput for any finite battery size. However, as shown in the following corollary, in our block i.i.d. model this is nearly achievable only if the battery size is large enough.
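The quantities in Theorem 2 are easy to evaluate numerically for a discrete arrival distribution. The sketch below is illustrative only: it assumes (17) in the form $T E_c - (T-1)\,\mathbb{E}[\min(E^{(i)}, E_c)] = B$, solves it by bisection on $[0, B]$ (the left-hand side is increasing in $E_c$), and then evaluates (18) and (20). The function names `mean_min`, `solve_Ec`, and `approx_throughput` are hypothetical.

```python
import math


def C(x):
    """AWGN rate C(x) = 1/2 * log2(1 + x)."""
    return 0.5 * math.log2(1.0 + x)


def mean_min(values, probs, c):
    """E[min(E, c)] for a discrete distribution."""
    return sum(p * min(e, c) for e, p in zip(values, probs))


def solve_Ec(values, probs, B, T, iters=200):
    """Bisection for the root of T*x - (T-1)*E[min(E, x)] = B on [0, B]."""
    lo, hi = 0.0, B
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if T * mid - (T - 1) * mean_min(values, probs, mid) < B:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_throughput(values, probs, B, T):
    """Return (E_c, q, approximate throughput) following (17), (18), (20)."""
    vals = [min(e, B) for e in values]      # arrivals above B act like B
    Ec = solve_Ec(vals, probs, B, T)
    m = mean_min(vals, probs, Ec)           # E[min(E, E_c)] = q * E_c
    q = m / Ec
    p = sum(pr for e, pr in zip(vals, probs) if e > Ec)
    first = 0.0
    if T > 1:
        # p * E[C(.) | E > E_c] written as a sum over the large arrivals
        first = (T - 1) / T * sum(
            pr * C(min(e - (B - e) / (T - 1), B))
            for e, pr in zip(vals, probs) if e > Ec)
    second = (p + (1 - p) * T) / T * C(m)
    return Ec, q, first + second
```

As a usage example, for arrivals in $\{0, 1\}$ with equal probability, $B = 1$ and $T = 2$, the call `approx_throughput([0, 1], [0.5, 0.5], 1.0, 2)` gives $E_c = 2/3$ and $q = 1/2$.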

Corollary 1. If $B \ge \mu + T(E_{\max} - \mu)$, where $E^{(i)} \le E_{\max}$ with probability 1, the approximate throughput reduces to

$$\bar{\Theta} = C(\mu) = \tfrac{1}{2}\log(1+\mu). \tag{21}$$

Proof. Since $E^{(i)} \le \mu + \frac{B-\mu}{T}$, choose $E_c = \mu + \frac{B-\mu}{T}$ and observe that $\mathbb{E}[\min(E^{(i)}, E_c)] = \mu$, and therefore $E_c$ is the solution to (17). It follows that $p = 0$ and (20) reduces to (21).

We identify the case $B \ge \mu + T(E_{\max} - \mu)$ as the large battery regime. The threshold $\mu + T(E_{\max} - \mu)$ can be intuitively interpreted as follows: when the battery size is infinite, it is straightforward to observe that the optimal policy is to allocate a constant amount of power equal to the mean energy arrival rate, $g_t = \mu$. Assume the battery was empty prior to the beginning of block $i$, i.e. $b^{(i)}_1 = E^{(i)}$, and we apply this policy for all the time slots in block $i$. Then the battery level at the last time slot of the block will be $b^{(i)}_T = \mu + T(E^{(i)} - \mu)$. This implies that we would need a battery size of at least $\mu + T(E_{\max} - \mu)$ in order to not waste energy due to an overflow, since $E^{(i)}$ can take values up to $E_{\max}$. However, the fact that we can nearly achieve the AWGN capacity as soon as the battery size is larger than this threshold is indeed surprising.

A. Connection to Channel Capacity

In this section, we will show how the approximate throughput of Theorem 2 can provide an approximation to the information-theoretic capacity of the channel. In this section we will use the notation $X^n_m = (X_m, X_{m+1}, \ldots, X_n)$ for $m \le n$, and $X^n = X^n_1$.

We consider an AWGN channel, i.e. the output is $Y_t = X_t + Z_t$, where $Z_t \sim \mathcal{N}(0, 1)$ and $X_t \in \mathbb{R}$ is the input. Instead of the energy constraints (2) and (3), which were only concerned with the amount of transmitted power at each time slot, the information-theoretic model imposes energy constraints on the amplitude of the transmitted symbol: at time $t$,

$$X_t^2 \le B_t, \tag{22}$$

$$B_t = \min\{B_{t-1} - X_{t-1}^2 + E_t,\; B\}, \tag{23}$$

where again we assume $B_1 = B$, and the harvesting process $\{E_t\}_{t=1}^{\infty}$ is the same as in Section II.
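To make the amplitude constraints (22) and (23) concrete, the following sketch checks whether a given real input sequence is feasible for a given arrival sequence. It is illustrative only; the function name `is_feasible` and the floating-point tolerance are hypothetical choices.

```python
def is_feasible(x, energy, B, tol=1e-12):
    """Check X_t^2 <= B_t for all t, with B_1 = B and the recursion (23).

    `x[t-1]` plays the role of X_t and `energy[t-1]` the role of E_t
    (E_1 is not used because the battery starts full).
    """
    battery = B
    for t, (xt, et) in enumerate(zip(x, energy)):
        if t > 0:
            # B_t = min(B_{t-1} - X_{t-1}^2 + E_t, B)
            battery = min(battery - x[t - 1] ** 2 + et, B)
        if xt ** 2 > battery + tol:
            return False
    return True
```

For example, with $B = 1$ and arrivals `[0, 0, 1]`, the sequence `[1.0, 0.0, 1.0]` is feasible, while `[1.0, 0.5, 1.0]` is not, because the battery is empty in the second slot.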
Instead of a power control policy, our goal is to find a set of encoding functions and a decoding function:

$$f^{\mathrm{enc}}_t : \mathcal{M} \times \mathcal{E}^t \to \mathcal{X}, \quad t = 1,\ldots,N, \qquad f^{\mathrm{dec}} : \mathcal{Y}^N \to \mathcal{M},$$

where $\mathcal{M} = \{1,\ldots,2^{NR}\}$ is the message set and $\mathcal{X} = \mathcal{Y} = \mathbb{R}$ are the input and output spaces. As usual, the capacity $C$ is the supremum of all rates $R$ for which the probability of error vanishes as $N \to \infty$ (see [30] for a detailed formulation of the problem).

The main result of this section is the following approximation to capacity.

Theorem 3. The capacity of the energy harvesting channel with block i.i.d. energy arrivals is bounded by

$$\bar{\Theta} - \frac{1}{T} H(E^{(i)}) - \frac{1}{2}\log\left(\frac{\pi e^2}{2}\right) \;\le\; C \;\le\; \bar{\Theta}, \tag{24}$$

where $\bar{\Theta}$ is given by (20).

This is the counterpart of Theorem 1 in [30], in which the i.i.d. case ($T = 1$) is considered. The proof follows similarly, with some modifications to account for a block i.i.d. energy arrivals process. Specifically, we have the following intermediate result.

Proposition 1. The capacity of the energy harvesting channel with block i.i.d. energy arrivals is bounded by

$$\Theta - \frac{1}{T} H(E^{(i)}) - \frac{1}{2}\log\left(\frac{\pi e}{2}\right) \;\le\; C \;\le\; \Theta, \tag{25}$$

where $\Theta$ is the optimal throughput defined in (5).

See Appendix C for the proof. Clearly, applying Theorem 2 to (25) yields Theorem 3.

Remark 1. The lower bound in (25) can be tightened, as in [30, Thm. 4], to

$$C \ge \mathcal{T}(g) - H(g) - \frac{1}{2}\log\left(\frac{\pi e}{2}\right)$$

for any policy $g$, where $H(g) = \limsup_{N\to\infty} \frac{1}{N} H\big(g^N(E^N)\big)$ is the entropy rate of the process $\{g_t(E^t)\}_{t=1}^{\infty}$. This allows choosing policies for which $H(g)$ is bounded by a constant which is independent of the statistics of $E^{(i)}$, thereby making the gap dependent on the entropy rate of the power control process instead of the entropy rate of the energy harvesting process itself. It is shown in [30] that in the i.i.d. ($T = 1$) case, one can design online power control policies that have constant entropy rate (in particular, $H(g) \le 1$ bit per channel use), independent of the distribution of the energy arrivals, and at the same time are within a constant gap to the optimal throughput. This suggests that in the i.i.d.
case, the information-theoretic capacity of the channel can be approximated by the optimal throughput within a constant gap, independent of the parameters of the problem. In the block i.i.d. case, the entropy rate of Policy 1, which we develop in the previous section, cannot be simply bounded by a constant independent of the distribution of the energy arrivals. We believe it is possible to modify Policy 1 along the lines of [30] to obtain a policy with a constant entropy rate, and hence show that the information-theoretic capacity and the optimal online throughput are indeed within a constant gap of each other, independent of the parameters of the problem. However, this is less critical for the block i.i.d. case, since the entropy rate of the energy arrival process decreases to zero with increasing blocklength. Hence, we expect the gap term $\frac{1}{T}H(E^{(i)})$ in (25) to be small for sufficiently large values of $T$.

In the remainder of the paper, we provide the proof of Theorem 2. In Section V we prove the lower bound on the throughput, and in Section VI we derive an upper bound which differs from the lower bound by no more than 0.72 bits per channel use. Section VII concludes the paper.
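The behavior of Policy 1 can be explored with a short Monte Carlo simulation. The sketch below is illustrative and rests on stated assumptions: it uses the reduced block description (power $g^{(i)}_1$ in the first $T-1$ slots and $g^{(i)}_T$ in the last one), treats arrivals above $B$ as equal to $B$, requires $T \ge 2$, and solves (17) by bisection. The names `solve_Ec` and `simulate_policy1`, the seed, and the number of blocks are hypothetical choices.

```python
import math
import random


def C(x):
    """AWGN rate C(x) = 1/2 * log2(1 + x)."""
    return 0.5 * math.log2(1.0 + x)


def mean_min(values, probs, c):
    """E[min(E, c)] for a discrete distribution."""
    return sum(p * min(e, c) for e, p in zip(values, probs))


def solve_Ec(values, probs, B, T, iters=200):
    """Bisection for T*x - (T-1)*E[min(E, x)] = B on [0, B], as in (17)."""
    lo, hi = 0.0, B
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if T * mid - (T - 1) * mean_min(values, probs, mid) < B:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_policy1(values, probs, B, T, n_blocks, seed=0):
    """Empirical per-slot throughput of Policy 1 (assumes T >= 2)."""
    vals = [min(e, B) for e in values]          # arrivals above B act like B
    Ec = solve_Ec(vals, probs, B, T)
    q = mean_min(vals, probs, Ec) / Ec          # fixed fraction, as in (18)
    rng = random.Random(seed)
    leftover = B                                # makes the first b_1 equal B
    total = 0.0
    for _ in range(n_blocks):
        E = rng.choices(vals, weights=probs)[0]
        b1 = min(leftover + E, B)
        if E <= Ec:
            g1 = q / T * (b1 + (T - 1) * E)     # small-arrival mode
        else:
            g1 = E - (B - b1) / (T - 1)         # large-arrival mode, b_T = B
        bT = min(b1 + (T - 1) * (E - g1), B)
        gT = q / (q + T * (1 - q)) * bT
        total += (T - 1) * C(g1) + C(gT)
        leftover = bT - gT
    return total / (n_blocks * T)
```

Calling `simulate_policy1([0, 1], [0.5, 0.5], 1.0, 2, 100000)` simulates Bernoulli arrivals with $B = 1$ and $T = 2$; the result can be compared against the expression in (20) and against $\bar{\Theta} - \frac{1}{2}\log e$ to observe the gap stated in Theorem 2 on a concrete instance.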

V. LOWER BOUND

In this section we will show that Policy 1 achieves a throughput which is lower bounded by $\hat\Theta - \frac{1}{2}\log e$, as defined in Theorem 2. This will be done in three parts. In the first part, we will derive the lower bound for the special case of Bernoulli energy arrivals, i.e. $E^{(i)} \in \{0, \bar{E}\}$ for some $\bar{E} > 0$. In the second part, we will generalize this to a larger class of energy arrival distributions, dubbed semi-Bernoulli distributions, which maintain the regenerative structure exhibited by Bernoulli energy arrivals. Finally, in the third part, we will show that for any block i.i.d. energy arrival distribution we can find a modified distribution of the form discussed in the second part, for which the throughput under Policy 1 is a lower bound to the throughput under the original distribution.

A. Bernoulli Energy Arrivals

We start with the special case of Bernoulli energy arrivals, namely

$$E^{(i)} = \begin{cases} \bar{E} & \text{w.p. } p_0 \\ 0 & \text{w.p. } 1 - p_0 \end{cases} \tag{26}$$

for some $0 \le p_0 \le 1$. We will assume the energy level $\bar{E}$ satisfies $E_c \le \bar{E} \le \bar{B}$, where $E_c$ is the solution to (7). This implies $\mathbb{E}\min(E^{(i)}, E_c) = p_0 E_c$, which yields $E_c = \frac{\bar{B}}{p_0 + (1 - p_0)T}$ and $q = p_0$ from (7) and (8) respectively. In the following proposition, we show that the lower bound in Theorem 2 holds for this special case.

Proposition 2. Let $E^{(i)}$ be distributed Bernoulli as in (26) with $E_c \le \bar{E} \le \bar{B}$. Then the throughput obtained by Policy 1 is lower bounded by $\mathcal{T}(g) \ge \hat\Theta - \frac{1}{2}\log e$.

Proof. Assume for now $\bar{E} > E_c$; the special case $\bar{E} = E_c$ will be treated afterwards. We have $p = \Pr(E^{(i)} > E_c) = p_0$, and the approximate throughput in (20) is given by

$$\hat\Theta = \frac{p_0 (T-1)}{T}\, C\Big(\bar{E} - \frac{\bar{B} - \bar{E}}{T-1}\Big) + \frac{p_0 + (1 - p_0)T}{T}\, C(p_0 E_c). \tag{27}$$

It can be observed that Policy 1 reduces to

$$g^{(i)}_j = \begin{cases} \frac{p_0}{T}\, b^{(i)}_1 & \text{if } E^{(i)} = 0 \\ \bar{E} - \frac{\bar{B} - b^{(i)}_1}{T-1} & \text{if } E^{(i)} = \bar{E} \end{cases} \quad j = 1, \ldots, T-1, \qquad g^{(i)}_T = \frac{p_0}{p_0 + (1 - p_0)T}\, b^{(i)}_T.$$

We see that whenever there is a positive energy arrival, the battery will be fully charged to $\bar{B}$ by the end of the block. Consider the Markov reward process [34, Ch. 8.2] obtained by applying Policy 1. This comprises the state process $(b^{(i)}_1, E^{(i)})$ and a reward function given by $r(b^{(i)}_1, E^{(i)}) = (T-1)\, C(g^{(i)}_1) + C(g^{(i)}_T)$.
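It can be verified that the reward function is given by

$$r(b^{(i)}_1, E^{(i)}) = \begin{cases} T\, C\big(\frac{p_0}{T}\, b^{(i)}_1\big) & \text{if } E^{(i)} = 0 \\ (T-1)\, C\Big(\bar{E} - \frac{\bar{B} - b^{(i)}_1}{T-1}\Big) + C(p_0 E_c) & \text{if } E^{(i)} = \bar{E}. \end{cases}$$

The battery state evolves according to $b^{(i+1)}_1 = \min\big(b^{(i)}_T - g^{(i)}_T + E^{(i+1)}, \bar{B}\big)$, where

$$b^{(i)}_T - g^{(i)}_T = \begin{cases} (1 - p_0)\, b^{(i)}_1 & \text{if } E^{(i)} = 0 \\ (1 - p_0)\, T E_c & \text{if } E^{(i)} = \bar{E}. \end{cases} \tag{28}$$

In what follows, we will lower bound the throughput obtained by Policy 1 by analyzing the average long-term reward under the following reward process:

$$\tilde{r}(b^{(i)}_1, E^{(i)}) = \begin{cases} T\, C\big(\frac{p_0}{T}\, b^{(i)}_1\big) & \text{if } E^{(i)} = 0 \\ (T-1)\, C\Big(\bar{E} - \frac{\bar{B} - \bar{E}}{T-1}\Big) + C(p_0 E_c) & \text{if } E^{(i)} = \bar{E}. \end{cases}$$

Note that $r(b^{(i)}_1, E^{(i)}) \ge \tilde{r}(b^{(i)}_1, E^{(i)})$ since $b^{(i)}_1 \ge E^{(i)}$. Hence the throughput obtained by Policy 1 is lower bounded by

$$\mathcal{T}(g) = \liminf_{N \to \infty} \frac{1}{NT} \sum_{i=1}^{N} r(b^{(i)}_1, E^{(i)}) \ge \liminf_{N \to \infty} \frac{1}{NT} \sum_{i=1}^{N} \tilde{r}(b^{(i)}_1, E^{(i)}). \tag{29}$$

Observe that the process $\tilde{r}(b^{(i)}_1, E^{(i)})$ is a regenerative process, where regeneration occurs whenever $E^{(i)} = \bar{E}$. Then, by the renewal reward theorem (see e.g. Theorem 3.1 in [35, Ch. VI], [36, Prop. 7.3], or [20, Lemma 1]), eq. (29) becomes:

$$\mathcal{T}(g) \ge \frac{1}{T\, \mathbb{E}L}\, \mathbb{E}\Big[\sum_{i=1}^{L} \tilde{r}(b^{(i)}_1, E^{(i)})\Big], \tag{30}$$

where $E^{(1)} = \bar{E}$, $E^{(i)} = 0$ for $i > 1$, and $L$ is a Geometric RV with parameter $p_0$, representing the number of blocks between consecutive positive energy arrivals. From (28) it follows that $b^{(i)}_1 = (1 - p_0)^{i-1}\, T E_c$ for $i \ge 2$, and therefore (30) becomes

$$\mathcal{T}(g) \ge \frac{1}{T\, \mathbb{E}L}\, \mathbb{E}\Big[(T-1)\, C\Big(\bar{E} - \frac{\bar{B} - \bar{E}}{T-1}\Big) + C(p_0 E_c) + \sum_{i=2}^{L} T\, C\big(p_0 (1 - p_0)^{i-1} E_c\big)\Big]. \tag{31}$$

Using the fact that $\log(1 + \alpha x) \ge \log(1 + x) + \log\alpha$ for $0 \le \alpha \le 1$, we can lower bound

$$C\big(p_0 (1 - p_0)^{i-1} E_c\big) \ge C(p_0 E_c) + \frac{i-1}{2}\log(1 - p_0).$$

Substituting in (31):

$$\begin{aligned} \mathcal{T}(g) &\ge \frac{1}{T\, \mathbb{E}L}\, \mathbb{E}\Big[(T-1)\, C\Big(\bar{E} - \frac{\bar{B} - \bar{E}}{T-1}\Big) + C(p_0 E_c) + (L-1)\, T\, C(p_0 E_c) + T\, \frac{L^2 - L}{4}\log(1 - p_0)\Big] \\ &= \frac{p_0 (T-1)}{T}\, C\Big(\bar{E} - \frac{\bar{B} - \bar{E}}{T-1}\Big) + \Big(\frac{p_0}{T} + 1 - p_0\Big) C(p_0 E_c) + \frac{1 - p_0}{2 p_0}\log(1 - p_0) \\ &\ge \hat\Theta - \frac{1}{2}\log e, \end{aligned}$$

where the last step is due to (27) and because $\frac{1-p}{2p}\log(1-p) \ge -\frac{1}{2}\log e$ for $0 \le p \le 1$.

It remains to show the lower bound holds for $\bar{E} = E_c$. In this case we have $p = \Pr(E^{(i)} > E_c) = 0$, and the approximate throughput (20) is

$$\hat\Theta = C(p_0 E_c). \tag{32}$$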
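The main case $\bar{E} > E_c$ can be sanity-checked numerically. The sketch below simulates the reduced Bernoulli form of the policy and compares the empirical throughput with the approximate throughput of (27) and the claimed gap. It rests on several assumptions that are readings of the formulas above rather than statements from the paper: $C(x) = \frac{1}{2}\log_2(1+x)$, $E_c = \bar{B}/(p_0 + (1-p_0)T)$, the battery update $b \leftarrow \min(b - g + E, \bar{B})$, and $T \ge 2$. All function names and the parameter values are hypothetical.

```python
import math
import random


def C(x: float) -> float:
    """Assumed rate function: (1/2) log2(1 + x), in bits per channel use."""
    return 0.5 * math.log2(1.0 + x)


def approx_throughput(p0: float, T: int, B: float, E_bar: float) -> float:
    """Approximate throughput for Bernoulli arrivals, as read from (27)."""
    E_c = B / (p0 + (1.0 - p0) * T)
    first = p0 * (T - 1) / T * C(E_bar - (B - E_bar) / (T - 1))
    second = (p0 + (1.0 - p0) * T) / T * C(p0 * E_c)
    return first + second


def simulate_policy(p0, T, B, E_bar, n_blocks, seed=0):
    """Empirical per-slot throughput of the reduced Bernoulli policy."""
    rng = random.Random(seed)
    battery = 0.0
    total = 0.0
    for _ in range(n_blocks):
        E = E_bar if rng.random() < p0 else 0.0
        battery = min(battery + E, B)  # battery level at the first slot
        if E > 0:
            g = E - (B - battery) / (T - 1)  # charges battery to B by slot T
        else:
            g = p0 * battery / T
        for _ in range(T - 1):
            total += C(g)
            battery = min(battery - g + E, B)
        g_last = p0 / (p0 + (1.0 - p0) * T) * battery  # last slot of block
        total += C(g_last)
        battery -= g_last
    return total / (n_blocks * T)


if __name__ == "__main__":
    theta_hat = approx_throughput(0.3, 5, 1000.0, 800.0)
    rate = simulate_policy(0.3, 5, 1000.0, 800.0, 20000)
    print(theta_hat - 0.5 * math.log2(math.e), rate, theta_hat)
```

The example parameters are chosen with large energies so that the lower bound $\hat\Theta - \frac{1}{2}\log_2 e$ is positive and therefore informative; for small energies the bound can be negative and trivially satisfied.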

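Policy 1 takes the form

$$g^{(i)}_j = \begin{cases} \frac{p_0}{T}\, b^{(i)}_1 & \text{if } E^{(i)} = 0 \\ \frac{p_0}{T}\big(b^{(i)}_1 + (T-1) E_c\big) & \text{if } E^{(i)} = E_c \end{cases} \quad j = 1, \ldots, T-1, \qquad g^{(i)}_T = \frac{p_0}{p_0 + (1 - p_0)T}\, b^{(i)}_T.$$

Observe that here as well, whenever there is a positive energy arrival, the battery will be fully charged to $\bar{B}$ by the end of the block. This can be seen by computing the battery state at the end of the block when $E^{(i)} = E_c$:

$$b^{(i)}_T = \min\Big\{\Big(1 - \frac{p_0 (T-1)}{T}\Big)\big(b^{(i)}_1 + (T-1) E_c\big),\ \bar{B}\Big\}.$$

Since $b^{(i)}_1 \ge E^{(i)} = E_c$:

$$\Big(1 - \frac{p_0 (T-1)}{T}\Big)\big(b^{(i)}_1 + (T-1) E_c\big) \ge \Big(1 - \frac{p_0 (T-1)}{T}\Big) T E_c = \bar{B}.$$

Therefore the battery state evolves exactly the same as in the case $\bar{E} > E_c$. The reward function is given by

$$r(b^{(i)}_1, E^{(i)}) = \begin{cases} T\, C\big(\frac{p_0}{T}\, b^{(i)}_1\big) & \text{if } E^{(i)} = 0 \\ (T-1)\, C\big(\frac{p_0}{T}(b^{(i)}_1 + (T-1) E_c)\big) + C(p_0 E_c) & \text{if } E^{(i)} = E_c. \end{cases}$$

As before, we will lower bound the reward function using the fact that $b^{(i)}_1 \ge E^{(i)}$, giving

$$\tilde{r}(b^{(i)}_1, E^{(i)}) = \begin{cases} T\, C\big(\frac{p_0}{T}\, b^{(i)}_1\big) & \text{if } E^{(i)} = 0 \\ T\, C(p_0 E_c) & \text{if } E^{(i)} = E_c. \end{cases}$$

Repeating the previous steps, we obtain

$$\mathcal{T}(g) \ge C(p_0 E_c) + \frac{1 - p_0}{2 p_0}\log(1 - p_0) \ge \hat\Theta - \frac{1}{2}\log e.$$

B. Semi-Bernoulli Energy Arrivals

Now consider an energy arrival process $E^{(i)}$ and let $E_c$ be the solution to (7). We say that $E^{(i)}$ has a semi-Bernoulli distribution if

$$\Pr(0 < E^{(i)} < E_c) = 0. \tag{33}$$

In other words, $E^{(i)}$ can either be 0 or take a value which is at least $E_c$, but not any value in the open interval $(0, E_c)$. Note that the Bernoulli distribution from the previous section, namely $E^{(i)} \in \{0, \bar{E}\}$ with $E_c \le \bar{E}$, clearly satisfies this condition; however, in general $E^{(i)}$ can take an arbitrary number of values.

Observe that $\mathbb{E}\min(E^{(i)}, E_c) = E_c \Pr(E^{(i)} \ge E_c)$, which from (8) yields $q = \Pr(E^{(i)} \ge E_c)$. Denote $p = \Pr(E^{(i)} > E_c)$ as in Theorem 2. Note that $q = p + \Pr(E^{(i)} = E_c)$. The approximate throughput $\hat\Theta$ in (20) is given by

$$\begin{aligned} \hat\Theta &= \frac{p(T-1)}{T}\, \mathbb{E}\Big[C\Big(\min\Big\{E^{(i)} - \frac{\bar{B} - E^{(i)}}{T-1},\ \bar{B}\Big\}\Big) \,\Big|\, E^{(i)} > E_c\Big] + \frac{p + (1-p)T}{T}\, C(q E_c) && (34) \\ &= \frac{q(T-1)}{T}\, \mathbb{E}\Big[C\Big(\min\Big\{E^{(i)} - \frac{\bar{B} - E^{(i)}}{T-1},\ \bar{B}\Big\}\Big) \,\Big|\, E^{(i)} \ge E_c\Big] - \frac{T-1}{T}\Pr(E^{(i)} = E_c)\, C\Big(E_c - \frac{\bar{B} - E_c}{T-1}\Big) + \frac{p + (1-p)T}{T}\, C(q E_c) && (35) \\ &= \frac{q(T-1)}{T}\, \mathbb{E}\Big[C\Big(\min\Big\{E^{(i)} - \frac{\bar{B} - E^{(i)}}{T-1},\ \bar{B}\Big\}\Big) \,\Big|\, E^{(i)} \ge E_c\Big] + \frac{p + (1-p)T - (T-1)\Pr(E^{(i)} = E_c)}{T}\, C(q E_c) && (36) \\ &= \frac{q(T-1)}{T}\, \mathbb{E}\Big[C\Big(\min\Big\{E^{(i)} - \frac{\bar{B} - E^{(i)}}{T-1},\ \bar{B}\Big\}\Big) \,\Big|\, E^{(i)} \ge E_c\Big] + \frac{q + (1-q)T}{T}\, C(q E_c), && (37) \end{aligned}$$

where (36) is due to (15). The following proposition generalizes the result of Proposition 2 to semi-Bernoulli distributions.

Proposition 3.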
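Suppose $E^{(i)}$ and $E_c$ satisfy $\Pr(0 < E^{(i)} < E_c) = 0$. Then the throughput obtained by Policy 1 is lower bounded by $\mathcal{T}(g) \ge \hat\Theta - \frac{1}{2}\log e$.

Proof. Observe that Policy 1 reduces to

$$g^{(i)}_j = \begin{cases} \frac{q}{T}\, b^{(i)}_1 & \text{if } E^{(i)} = 0 \\ \frac{q}{T}\big(b^{(i)}_1 + (T-1) E_c\big) & \text{if } E^{(i)} = E_c \\ E^{(i)} - \frac{\bar{B} - b^{(i)}_1}{T-1} & \text{if } E_c < E^{(i)} \le \bar{B} \\ \bar{B} & \text{if } \bar{B} < E^{(i)} \end{cases} \quad j = 1, \ldots, T-1, \qquad g^{(i)}_T = \frac{q}{q + (1-q)T}\, b^{(i)}_T.$$

As in the previous section, whenever there is a positive energy arrival (or equivalently $E^{(i)} \ge E_c$), the battery will be fully charged to $\bar{B}$ by the end of the block. Taking the appropriate Markov reward process with reward function

$$r(b^{(i)}_1, E^{(i)}) = \begin{cases} T\, C\big(\frac{q}{T}\, b^{(i)}_1\big) & \text{if } E^{(i)} = 0 \\ (T-1)\, C\big(\frac{q}{T}(b^{(i)}_1 + (T-1) E_c)\big) + C(q E_c) & \text{if } E^{(i)} = E_c \\ (T-1)\, C\Big(E^{(i)} - \frac{\bar{B} - b^{(i)}_1}{T-1}\Big) + C(q E_c) & \text{if } E_c < E^{(i)} \le \bar{B} \\ (T-1)\, C(\bar{B}) + C(q E_c) & \text{if } \bar{B} < E^{(i)}, \end{cases}$$

we can again obtain a lower bound on the reward function by using the relation $b^{(i)}_1 \ge E^{(i)}$ for the appropriate cases (specifically when $E^{(i)} > 0$):

$$\tilde{r}(b^{(i)}_1, E^{(i)}) = \begin{cases} T\, C\big(\frac{q}{T}\, b^{(i)}_1\big) & \text{if } E^{(i)} = 0 \\ T\, C(q E_c) & \text{if } E^{(i)} = E_c \\ (T-1)\, C\Big(E^{(i)} - \frac{\bar{B} - E^{(i)}}{T-1}\Big) + C(q E_c) & \text{if } E_c < E^{(i)} \le \bar{B} \\ (T-1)\, C(\bar{B}) + C(q E_c) & \text{if } E^{(i)} > \bar{B}. \end{cases}$$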

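This can be written succinctly as follows:

$$\tilde{r}(b^{(i)}_1, E^{(i)}) = \begin{cases} T\, C\big(\frac{q}{T}\, b^{(i)}_1\big) & \text{if } E^{(i)} = 0 \\ (T-1)\, C\Big(\min\Big\{E^{(i)} - \frac{\bar{B} - E^{(i)}}{T-1},\ \bar{B}\Big\}\Big) + C(q E_c) & \text{if } E^{(i)} \ge E_c. \end{cases}$$

The process $\tilde{r}(b^{(i)}_1, E^{(i)})$ is regenerative, with regeneration occurring at the event $E^{(i)} \ge E_c$. The renewal reward theorem takes the form

$$\liminf_{N \to \infty} \frac{1}{NT} \sum_{i=1}^{N} \tilde{r}(b^{(i)}_1, E^{(i)}) = \frac{1}{T\, \mathbb{E}L}\, \mathbb{E}\Big[\sum_{i=1}^{L} \tilde{r}(b^{(i)}_1, E^{(i)}) \,\Big|\, E^{(1)} \ge E_c,\ E^{(i)} = 0,\ i \ge 2\Big].$$

The derivation carries on almost identically to the proof of Proposition 2, where the only difference is in the first term of (31). We obtain

$$\begin{aligned} \mathcal{T}(g) &\ge \frac{1}{T\, \mathbb{E}L}\, \mathbb{E}\Big[(T-1)\, C\Big(\min\Big\{E^{(1)} - \frac{\bar{B} - E^{(1)}}{T-1},\ \bar{B}\Big\}\Big) + C(q E_c) \,\Big|\, E^{(1)} \ge E_c\Big] + \frac{1}{T\, \mathbb{E}L}\, \mathbb{E}\Big[\sum_{i=2}^{L} T\, C\big(q (1-q)^{i-1} E_c\big)\Big] && (38) \\ &\ge \frac{q(T-1)}{T}\, \mathbb{E}\Big[C\Big(\min\Big\{E^{(i)} - \frac{\bar{B} - E^{(i)}}{T-1},\ \bar{B}\Big\}\Big) \,\Big|\, E^{(i)} \ge E_c\Big] + \frac{q + (1-q)T}{T}\, C(q E_c) - \frac{1}{2}\log e && (39) \\ &= \hat\Theta - \frac{1}{2}\log e, && (40) \end{aligned}$$

where (40) is due to (37).

C. General Block i.i.d. Energy Arrivals

Now consider an arbitrary distribution of energy arrivals $E^{(i)}$. We lower bound the throughput obtained by Policy 1 using a technique inspired by [20]. There, the throughput obtained by the proposed policy was lower bounded by showing that the Bernoulli harvesting process yields the worst performance compared to all other i.i.d. processes with the same mean. Accordingly, we suggest a mean-preserving modification to the energy arrival distribution; specifically, the modified distribution will be semi-Bernoulli as defined in the previous section. We then show that the throughput obtained by Policy 1 under this modified distribution is lower than under the original distribution. Subsequently, the throughput obtained under the modified harvesting process is readily lower bounded using Proposition 3.

We begin by defining the modified energy arrival process:

$$\hat{E}^{(i)} := W \cdot \mathbb{1}\{E^{(i)} \le E_c\} + E^{(i)} \cdot \mathbb{1}\{E^{(i)} > E_c\}, \tag{41}$$

where $\mathbb{1}\{\cdot\}$ is the indicator function and $W$ is a Bernoulli RV independent of $E^{(i)}$, with $W \in \{0, E_c\}$. To make sure the mean is preserved, i.e. $\mathbb{E}\hat{E}^{(i)} = \mathbb{E}E^{(i)}$, we write

$$\mathbb{E}\big[E^{(i)}\, \mathbb{1}\{E^{(i)} \le E_c\}\big] = \mathbb{E}\big[W\, \mathbb{1}\{E^{(i)} \le E_c\}\big] = \Pr(W = E_c)\, E_c \Pr(E^{(i)} \le E_c),$$

giving

$$\Pr(W = E_c) = \frac{\mathbb{E}\big[E^{(i)} \,\big|\, E^{(i)} \le E_c\big]}{E_c}.$$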
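(42)

With this distribution of $W$, the probability of positive energy arrival $\hat{E}^{(i)}$ is given by:

$$\Pr(\hat{E}^{(i)} > 0) = \Pr(\hat{E}^{(i)} \ge E_c) = (1-p)\Pr(W = E_c) + p = \frac{\mathbb{E}\big[E^{(i)}\, \mathbb{1}\{E^{(i)} \le E_c\}\big]}{E_c} + \mathbb{E}\big[\mathbb{1}\{E^{(i)} > E_c\}\big] = \frac{\mathbb{E}\min(E^{(i)}, E_c)}{E_c} = q. \tag{43}$$

In what follows, we will analyze the long-term average throughput obtained by Policy 1 under the original harvesting process $E^{(i)}$, as well as under the modified process $\hat{E}^{(i)}$. As before, we consider the Markov reward process induced by applying Policy 1. It is convenient to describe the system using simpler state variables, defined below:

$$x_i = b^{(i)}_1 + (T-1) E^{(i)}, \tag{44}$$
$$s_i = \mathbb{1}\{E^{(i)} > E_c\}. \tag{45}$$

Using these new state variables, Policy 1 can be expressed as follows:

$$g^{(i)}_j = \begin{cases} \frac{q}{T}\, x_i & \text{if } s_i = 0 \\ \min\Big\{\frac{x_i - \bar{B}}{T-1},\ \bar{B}\Big\} & \text{if } s_i = 1 \end{cases} \quad j = 1, \ldots, T-1, \qquad g^{(i)}_T = \begin{cases} \frac{q}{T}\min(x_i, T E_c) & \text{if } s_i = 0 \\ q E_c & \text{if } s_i = 1. \end{cases}$$

Accordingly, the reward function is given by

$$r(x_i, s_i) = \begin{cases} (T-1)\, C\big(\frac{q}{T}\, x_i\big) + C\big(\frac{q}{T}\min(x_i, T E_c)\big) & \text{if } s_i = 0 \\ (T-1)\, C\Big(\min\Big\{\frac{x_i - \bar{B}}{T-1},\ \bar{B}\Big\}\Big) + C(q E_c) & \text{if } s_i = 1, \end{cases} \tag{46}$$

and the state dynamics are given by²

$$x_{i+1} = \begin{cases} \min\big\{(1-q)\min(x_i, T E_c) + T E^{(i+1)},\ \bar{B} + (T-1) E^{(i+1)}\big\} & \text{if } s_i = 0 \\ \min\big\{(1-q)\, T E_c + T E^{(i+1)},\ \bar{B} + (T-1) E^{(i+1)}\big\} & \text{if } s_i = 1, \end{cases} \tag{47}$$
$$s_{i+1} = \mathbb{1}\{E^{(i+1)} > E_c\}. \tag{48}$$

² While the state variables were changed from $(b^{(i)}_1, E^{(i)})$ to $(x_i, s_i)$, the disturbance is still $E^{(i+1)}$, which is independent of the current state.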
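The mean-preserving modification in (41) and (42) and the identity (43) can be illustrated on a small discrete distribution. The sketch below is only a numerical illustration under stated assumptions: the energy distribution is a finite probability mass function, the threshold `E_c` is passed in as a free parameter rather than solved from (7), and $q$ is computed as $\mathbb{E}\min(E^{(i)}, E_c)/E_c$. The function name `modified_pmf` and the example values are hypothetical.

```python
def modified_pmf(pmf: dict, E_c: float) -> dict:
    """Build the pmf of the modified arrival from a discrete pmf.

    Mass at values <= E_c is moved to the two points {0, E_c} so that the
    mean is preserved; mass at values > E_c is left untouched.
    """
    low_mass = sum(pr for e, pr in pmf.items() if e <= E_c)
    low_mean_mass = sum(e * pr for e, pr in pmf.items() if e <= E_c)
    # Pr(W = E_c) = E[E | E <= E_c] / E_c, with W independent of E.
    w = low_mean_mass / (E_c * low_mass) if low_mass > 0 else 0.0
    out = {e: pr for e, pr in pmf.items() if e > E_c}
    out[E_c] = out.get(E_c, 0.0) + low_mass * w
    out[0.0] = out.get(0.0, 0.0) + low_mass * (1.0 - w)
    return out


def mean(pmf: dict) -> float:
    return sum(e * pr for e, pr in pmf.items())


if __name__ == "__main__":
    # Hypothetical example distribution and threshold.
    pmf = {0.0: 0.2, 1.0: 0.3, 2.0: 0.1, 5.0: 0.4}
    E_c = 3.0
    hat = modified_pmf(pmf, E_c)
    q = sum(min(e, E_c) * pr for e, pr in pmf.items()) / E_c
    print(mean(pmf), mean(hat))
    print(q, sum(pr for e, pr in hat.items() if e > 0))
```

In this example the modified distribution places no mass in the open interval $(0, E_c)$, so it is semi-Bernoulli in the sense of (33).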

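Define the $N$-horizon total throughput obtained by Policy 1 when the initial state is $(x, s)$:

$$J_N(x, s) = \mathbb{E}\Big[\sum_{i=1}^{N} r(x_i, s_i) \,\Big|\, x_1 = x,\ s_1 = s\Big]. \tag{49}$$

The long-term average expected throughput obtained by Policy 1 is given by

$$\mathcal{T}(g) = \liminf_{N \to \infty} \frac{1}{NT}\, J_N(x, s)$$

for any $x \in [0, \bar{B}]$, $s \in \{0, 1\}$. For the modified harvesting process, we similarly define the processes $\hat{x}_i$ and $\hat{s}_i$ given by (47) and (48) respectively, by replacing $E^{(i+1)}$ with $\hat{E}^{(i+1)}$. The $N$-horizon total throughput obtained by Policy 1 under the modified energy arrival process is given by

$$\hat{J}_N(x, s) = \mathbb{E}\Big[\sum_{i=1}^{N} r(\hat{x}_i, \hat{s}_i) \,\Big|\, \hat{x}_1 = x,\ \hat{s}_1 = s\Big],$$

and the long-term average throughput (for which we provided a lower bound in Proposition 3 in the previous section) is given by

$$\hat{\mathcal{T}}(g) = \liminf_{N \to \infty} \frac{1}{NT}\, \hat{J}_N(x, s).$$

In the following proposition, we claim that the $N$-horizon expected throughput for the original distribution of block i.i.d. energy arrivals $E^{(i)}$ is greater than the throughput obtained for the modified distribution $\hat{E}^{(i)}$, for any $N$ and any initial state $(x, s)$.

Proposition 4. For any $x \in [0, \bar{B}]$, $s \in \{0, 1\}$, and integer $N \ge 1$:

$$J_N(x, s) \ge \hat{J}_N(x, s). \tag{50}$$

This is proved by induction, making use of the concavity and monotonicity of the reward function $r(x, s)$. We defer the proof to Appendix D.

By taking $N \to \infty$, an immediate corollary of Proposition 4 is $\mathcal{T}(g) \ge \hat{\mathcal{T}}(g)$. Since $\hat{E}^{(i)}$ is a semi-Bernoulli process, i.e. it satisfies $\Pr(0 < \hat{E}^{(i)} < E_c) = 0$, we can readily obtain a lower bound on $\hat{\mathcal{T}}(g)$ by applying Proposition 3 from the previous section:

$$\hat{\mathcal{T}}(g) \ge \frac{p(T-1)}{T}\, \mathbb{E}\Big[C\Big(\min\Big\{\hat{E}^{(i)} - \frac{\bar{B} - \hat{E}^{(i)}}{T-1},\ \bar{B}\Big\}\Big) \,\Big|\, \hat{E}^{(i)} > E_c\Big] + \frac{p + (1-p)T}{T}\, C\big(E_c \Pr(\hat{E}^{(i)} \ge E_c)\big) - \frac{1}{2}\log e,$$

where the above expression for the approximate throughput $\hat\Theta$ of the modified process $\hat{E}^{(i)}$ is given by (34). Note that $p = \Pr(\hat{E}^{(i)} > E_c) = \Pr(E^{(i)} > E_c)$. By construction (41), the first term is equivalent to

$$\mathbb{E}\Big[C\Big(\min\Big\{\hat{E}^{(i)} - \frac{\bar{B} - \hat{E}^{(i)}}{T-1},\ \bar{B}\Big\}\Big) \,\Big|\, \hat{E}^{(i)} > E_c\Big] = \mathbb{E}\Big[C\Big(\min\Big\{E^{(i)} - \frac{\bar{B} - E^{(i)}}{T-1},\ \bar{B}\Big\}\Big) \,\Big|\, E^{(i)} > E_c\Big].$$

For the second term, we use (43) followed by (8) to obtain

$$C\big(E_c \Pr(\hat{E}^{(i)} \ge E_c)\big) = C(E_c\, q) = C\big(\mathbb{E}\min(E^{(i)}, E_c)\big).$$

It follows that

$$\mathcal{T}(g) \ge \frac{p(T-1)}{T}\, \mathbb{E}\Big[C\Big(\min\Big\{E^{(i)} - \frac{\bar{B} - E^{(i)}}{T-1},\ \bar{B}\Big\}\Big) \,\Big|\, E^{(i)} > E_c\Big] + \frac{p + (1-p)T}{T}\, C\big(\mathbb{E}\min(E^{(i)}, E_c)\big) - \frac{1}{2}\log e, \tag{51}$$

which concludes the proof of the lower bound in Theorem 2.

VI. UPPER BOUND

Fix an arbitrary policy $g$ and consider the expected total throughput for $N$ blocks, or equivalently $NT$ time slots:

$$\begin{aligned} \mathcal{T}_{NT}(g) &= \sum_{t=1}^{NT} \mathbb{E}\, C(g_t) = \sum_{i=1}^{N} \sum_{j=1}^{T} \mathbb{E}\, C(g^{(i)}_j) \\ &= \sum_{j=1}^{T-1} \mathbb{E}\, C(g^{(1)}_j) + \mathbb{E}\, C(g^{(N)}_T) + \sum_{i=2}^{N} \Big(\sum_{j=1}^{T-1} \mathbb{E}\, C(g^{(i)}_j) + \mathbb{E}\, C(g^{(i-1)}_T)\Big) \\ &\le \sum_{i=2}^{N} \Big(\sum_{j=1}^{T-1} \mathbb{E}\, C(g^{(i)}_j) + \mathbb{E}\, C(g^{(i-1)}_T)\Big) + T\, C(\bar{B}), \end{aligned} \tag{52}$$

where the inequality is because $g^{(i)}_j \le b^{(i)}_j \le \bar{B}$ for any $i, j$. According to Lemma 1 (specifically (68) and (69) in Appendix A), we can upper bound the total rate of the first $T-1$ time slots in each block as follows:

$$\mathcal{T}_{NT}(g) \le \sum_{i=2}^{N} (T-1)\, \mathbb{E}\Big[C\Big(\min\Big\{E^{(i)} - \frac{b^{(i)}_T - b^{(i)}_1}{T-1},\ \bar{B}\Big\}\Big)\Big] + \sum_{i=2}^{N} \mathbb{E}\, C(g^{(i-1)}_T) + T\, C(\bar{B}).$$

Since $b^{(i)}_1 = \min\{b^{(i-1)}_T - g^{(i-1)}_T + E^{(i)},\ \bar{B}\}$ and $C(\cdot)$ is nondecreasing:

$$\mathcal{T}_{NT}(g) \le \sum_{i=2}^{N} \theta_i + T\, C(\bar{B}), \tag{53}$$

where we have denoted, for $i = 2, \ldots, N$:

$$\theta_i = (T-1)\, \mathbb{E}\Big[C\Big(\min\Big\{E^{(i)} - \frac{b^{(i)}_T - b^{(i-1)}_T + g^{(i-1)}_T - E^{(i)}}{T-1},\ \bar{B}\Big\}\Big)\Big] + \mathbb{E}\, C(g^{(i-1)}_T).$$

We break the expectation over $E^{(i)}$ as shown in equations (54)–(57) below, where (54) is because $\Pr(E^{(i)} \le E_c) = 1 - p$; (55) is due to the monotonicity of $C(\cdot)$; (56) is because $g^{(i-1)}_T$ is independent of $E^{(i)}$ (by definition of an online policy and because $E^{(i)}$ are i.i.d.) and by denoting $\bar{p} = \Pr(E^{(i)} > \bar{B})$; and (57) is by applying Jensen's inequality to the two terms in the first expectation.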

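$$\begin{aligned} \theta_i &\overset{(i)}{=} \mathbb{E}\, C(g^{(i-1)}_T) + (1-p)(T-1)\, \mathbb{E}\Big[C\Big(\min\Big\{E^{(i)} - \frac{b^{(i)}_T - b^{(i-1)}_T + g^{(i-1)}_T - E^{(i)}}{T-1},\ \bar{B}\Big\}\Big) \,\Big|\, E^{(i)} \le E_c\Big] \\ &\qquad + (T-1) \sum_{x > E_c} P_E(x)\, \mathbb{E}\Big[C\Big(\min\Big\{x - \frac{b^{(i)}_T - b^{(i-1)}_T + g^{(i-1)}_T - x}{T-1},\ \bar{B}\Big\}\Big) \,\Big|\, E^{(i)} = x\Big] && (54) \\ &\overset{(ii)}{\le} \mathbb{E}\, C(g^{(i-1)}_T) + (1-p)(T-1)\, \mathbb{E}\Big[C\Big(E^{(i)} - \frac{b^{(i)}_T - b^{(i-1)}_T + g^{(i-1)}_T - E^{(i)}}{T-1}\Big) \,\Big|\, E^{(i)} \le E_c\Big] \\ &\qquad + (T-1) \sum_{E_c < x \le \bar{B}} P_E(x)\, \mathbb{E}\Big[C\Big(x - \frac{b^{(i)}_T - b^{(i-1)}_T + g^{(i-1)}_T - x}{T-1}\Big) \,\Big|\, E^{(i)} = x\Big] + \Pr(E^{(i)} > \bar{B})\, (T-1)\, C(\bar{B}) && (55) \\ &\overset{(iii)}{=} \mathbb{E}\Big[C(g^{(i-1)}_T) + (1-p)(T-1)\, C\Big(E^{(i)} - \frac{b^{(i)}_T - b^{(i-1)}_T + g^{(i-1)}_T - E^{(i)}}{T-1}\Big) \,\Big|\, E^{(i)} \le E_c\Big] \\ &\qquad + (T-1) \sum_{E_c < x \le \bar{B}} P_E(x)\, \mathbb{E}\Big[C\Big(x - \frac{b^{(i)}_T - b^{(i-1)}_T + g^{(i-1)}_T - x}{T-1}\Big) \,\Big|\, E^{(i)} = x\Big] + \bar{p}\, (T-1)\, C(\bar{B}) && (56) \\ &\overset{(iv)}{\le} \big(p + (1-p)T\big)\, \mathbb{E}\Big[C\Big(\frac{p\, g^{(i-1)}_T + (1-p)\big(T E^{(i)} - b^{(i)}_T + b^{(i-1)}_T\big)}{p + (1-p)T}\Big) \,\Big|\, E^{(i)} \le E_c\Big] \\ &\qquad + (T-1) \sum_{E_c < x \le \bar{B}} P_E(x)\, \mathbb{E}\Big[C\Big(x - \frac{b^{(i)}_T - b^{(i-1)}_T + g^{(i-1)}_T - x}{T-1}\Big) \,\Big|\, E^{(i)} = x\Big] + \bar{p}\, (T-1)\, C(\bar{B}) && (57) \end{aligned}$$

Denote the following expected values, for $i = 1, \ldots, N$:

$$\gamma^{(i)} = \mathbb{E}\, g^{(i)}_T, \qquad \mu = \mathbb{E}\big[E^{(i)} \,\big|\, E^{(i)} \le E_c\big], \qquad \beta^{(i)} = \mathbb{E}\, b^{(i)}_T,$$
$$\beta^{(i)}_0 = \mathbb{E}\big[b^{(i)}_T \,\big|\, E^{(i)} \le E_c\big], \qquad \beta^{(i)}(x) = \mathbb{E}\big[b^{(i)}_T \,\big|\, E^{(i)} = x\big], \quad E_c < x \le \bar{B}.$$

Note that since $E^{(i)} > \bar{B}$ implies $b^{(i)}_T = \bar{B}$, the following relation holds:

$$\beta^{(i)} = (1-p)\, \beta^{(i)}_0 + \sum_{E_c < x \le \bar{B}} P_E(x)\, \beta^{(i)}(x) + \bar{p}\, \bar{B}. \tag{58}$$

Now, applying Jensen's inequality to (57), and again observing that $g^{(i-1)}_T$ as well as $b^{(i-1)}_T$ are independent of $E^{(i)}$:

$$\theta_i \le \big(p + (1-p)T\big)\, C\Big(\frac{p\, \gamma^{(i-1)} + (1-p)\big(T\mu - \beta^{(i)}_0 + \beta^{(i-1)}\big)}{p + (1-p)T}\Big) + (T-1) \sum_{E_c < x \le \bar{B}} P_E(x)\, C\Big(x - \frac{\beta^{(i)}(x) - \beta^{(i-1)} + \gamma^{(i-1)} - x}{T-1}\Big) + \bar{p}\, (T-1)\, C(\bar{B}). \tag{59}$$

Substituting into (53) and dividing by $NT$ yields

$$\begin{aligned} \frac{1}{NT}\, \mathcal{T}_{NT}(g) &\le \frac{1}{N}\, C(\bar{B}) + \frac{1}{N} \sum_{i=2}^{N} \frac{p + (1-p)T}{T}\, C\Big(\frac{p\, \gamma^{(i-1)} + (1-p)\big(T\mu - \beta^{(i)}_0 + \beta^{(i-1)}\big)}{p + (1-p)T}\Big) \\ &\qquad + \frac{1}{N} \sum_{i=2}^{N} \frac{T-1}{T} \sum_{E_c < x \le \bar{B}} P_E(x)\, C\Big(x - \frac{\beta^{(i)}(x) - \beta^{(i-1)} + \gamma^{(i-1)} - x}{T-1}\Big) + \frac{N-1}{N}\, \bar{p}\, \frac{T-1}{T}\, C(\bar{B}). \end{aligned} \tag{60}$$

Denote the following time-averages of the expected values above:

$$\bar\gamma = \frac{1}{N-1} \sum_{i=1}^{N-1} \gamma^{(i)}, \qquad \bar\beta = \frac{1}{N-1} \sum_{i=1}^{N-1} \beta^{(i)}, \qquad \bar\beta_0 = \frac{1}{N-1} \sum_{i=1}^{N-1} \beta^{(i)}_0, \qquad \bar\beta(x) = \frac{1}{N-1} \sum_{i=1}^{N-1} \beta^{(i)}(x), \quad E_c < x \le \bar{B}.$$

We again apply Jensen's inequality to (60), giving (61)–(63), which are displayed following (64). Eq. (62) follows because $0 \le b^{(i)}_T \le \bar{B}$ for any $i$; and (63) is due to the fact that $\ln(1 + x + y) \le \ln(1 + x) + y$ for $x, y \ge 0$, or equivalently $C(x + y) \le C(x) + \frac{1}{2\ln 2}\, y$.

Since $g^{(i)}_T \le b^{(i)}_T$, we can take expectation to obtain $\gamma^{(i)} \le \beta^{(i)}$ for every $i$, which implies $\bar\gamma \le \bar\beta$. Similarly, $b^{(i)}_T \le \bar{B}$, which implies $\beta^{(i)}(x) \le \bar{B}$ and consequently $\bar\beta(x) \le \bar{B}$ for all $E_c < x \le \bar{B}$. Finally, the relation (58) implies

$$\bar\beta = (1-p)\, \bar\beta_0 + \sum_{E_c < x \le \bar{B}} P_E(x)\, \bar\beta(x) + \bar{p}\, \bar{B}.$$

With these constraints, we can further upper bound (63) as follows:

$$\frac{1}{NT}\, \mathcal{T}_{NT}(g) \le \frac{N-1}{N}\, \tilde\Theta + \frac{1}{N}\, C(\bar{B}) + \frac{1}{2\ln 2} \cdot \frac{\bar{B}}{NT}. \tag{64}$$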

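$$\begin{aligned} \frac{1}{NT}\, \mathcal{T}_{NT}(g) &\le \frac{1}{N}\, C(\bar{B}) + \frac{N-1}{N} \cdot \frac{p + (1-p)T}{T}\, C\Big(\frac{p\, \bar\gamma + (1-p)\big(T\mu - \bar\beta_0 + \bar\beta - \frac{\beta^{(N)}_0 - \beta^{(1)}_0}{N-1}\big)}{p + (1-p)T}\Big) \\ &\qquad + \frac{N-1}{N} \cdot \frac{T-1}{T} \sum_{E_c < x \le \bar{B}} P_E(x)\, C\Big(x - \frac{\bar\beta(x) - \bar\beta + \bar\gamma - x + \frac{\beta^{(N)}(x) - \beta^{(1)}(x)}{N-1}}{T-1}\Big) + \frac{N-1}{N}\, \bar{p}\, \frac{T-1}{T}\, C(\bar{B}) && (61) \\ &\le \frac{1}{N}\, C(\bar{B}) + \frac{N-1}{N} \cdot \frac{p + (1-p)T}{T}\, C\Big(\frac{p\, \bar\gamma + (1-p)\big(T\mu - \bar\beta_0 + \bar\beta + \frac{\bar{B}}{N-1}\big)}{p + (1-p)T}\Big) \\ &\qquad + \frac{N-1}{N} \cdot \frac{T-1}{T} \sum_{E_c < x \le \bar{B}} P_E(x)\, C\Big(x - \frac{\bar\beta(x) - \bar\beta + \bar\gamma - x - \frac{\bar{B}}{N-1}}{T-1}\Big) + \frac{N-1}{N}\, \bar{p}\, \frac{T-1}{T}\, C(\bar{B}) && (62) \\ &\le \frac{1}{N}\, C(\bar{B}) + \frac{N-1}{N} \cdot \frac{p + (1-p)T}{T}\, C\Big(\frac{p\, \bar\gamma + (1-p)\big(T\mu - \bar\beta_0 + \bar\beta\big)}{p + (1-p)T}\Big) + \frac{(1-p)\, \bar{B}}{2\ln 2 \cdot NT} \\ &\qquad + \frac{N-1}{N} \cdot \frac{T-1}{T} \sum_{E_c < x \le \bar{B}} P_E(x)\, C\Big(x - \frac{\bar\beta(x) - \bar\beta + \bar\gamma - x}{T-1}\Big) + \frac{(p - \bar{p})\, \bar{B}}{2\ln 2 \cdot NT} + \frac{N-1}{N}\, \bar{p}\, \frac{T-1}{T}\, C(\bar{B}). && (63) \end{aligned}$$

In (64), $\tilde\Theta$ is the optimal solution to the following convex optimization problem:³

$$\begin{aligned} \underset{\gamma,\ \beta,\ \beta_0,\ \{\beta(x)\}_{E_c < x \le \bar{B}}}{\text{maximize}} \quad & \frac{p + (1-p)T}{T}\, C\Big(\frac{p\, \gamma + (1-p)\big(T\mu - \beta_0 + \beta\big)}{p + (1-p)T}\Big) + \frac{T-1}{T} \sum_{E_c < x \le \bar{B}} P_E(x)\, C\Big(x - \frac{\beta(x) - \beta + \gamma - x}{T-1}\Big) + \bar{p}\, \frac{T-1}{T}\, C(\bar{B}) \\ \text{subject to} \quad & \gamma \le \beta, \\ & \beta(x) \le \bar{B}, \quad E_c < x \le \bar{B}, \\ & \beta = (1-p)\, \beta_0 + \sum_{E_c < x \le \bar{B}} P_E(x)\, \beta(x) + \bar{p}\, \bar{B}. \end{aligned} \tag{65}$$

It is shown in Appendix E, by verifying that the KKT conditions hold, that the following solution is optimal:

$$\gamma^* = \beta^* = p\, \bar{B}, \qquad \beta^*_0 = 0, \qquad \beta^*(x) = \bar{B}, \quad E_c < x \le \bar{B}. \tag{66}$$

The optimal objective is given by

$$\begin{aligned} \tilde\Theta &= \frac{p + (1-p)T}{T}\, C\Big(\frac{p\, \bar{B} + (1-p)\, T\mu}{p + (1-p)T}\Big) + \frac{T-1}{T} \sum_{E_c < x \le \bar{B}} P_E(x)\, C\Big(x - \frac{\bar{B} - x}{T-1}\Big) + \bar{p}\, \frac{T-1}{T}\, C(\bar{B}) \\ &\overset{(i)}{=} \frac{p + (1-p)T}{T}\, C\big(\mathbb{E}\min(E^{(i)}, E_c)\big) + \frac{p(T-1)}{T}\, \mathbb{E}\Big[C\Big(\min\Big\{E^{(i)} - \frac{\bar{B} - E^{(i)}}{T-1},\ \bar{B}\Big\}\Big) \,\Big|\, E^{(i)} > E_c\Big] \\ &\overset{(ii)}{=} \hat\Theta, \end{aligned} \tag{67}$$

where (i) is by (6) and (8), and (ii) is by definition of the approximate throughput (20). Substituting in (64) and taking the limit as $N \to \infty$:

$$\liminf_{N \to \infty} \frac{1}{NT}\, \mathcal{T}_{NT}(g) \le \hat\Theta.$$

Since the policy $g$ was arbitrary, this implies $\Theta \le \hat\Theta$. This concludes the proof of the upper bound in Theorem 2.

³ These constraints are only necessary but are not sufficient to describe the optimization variables; nevertheless, this is still a valid upper bound on (63).

VII. CONCLUSION

We proposed a simple online power control policy for the energy harvesting channel with block i.i.d. energy arrivals and showed that it is within a constant gap from the optimum. This resulted in a simple and insightful formula that approximates the optimal throughput. Previously, optimal power control has been characterized for the offline case and for the online case with i.i.d. energy arrivals. Our results reveal how correlation in the energy harvesting process impacts online power control and the corresponding optimal throughput. While in this paper we consider block i.i.d.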
energy arrivals with arbitrary distribution, the development of these results for an important special case, namely block i.i.d. Bernoulli arrivals, can be found in our preliminary work, which provided the insights for the current paper. To the best of our knowledge, this is the first paper to develop online power control policies with explicit guarantees on optimality for an energy arrival process with memory and a finite battery.

APPENDIX A
PROOF OF LEMMA 1

Consider an arbitrary set of power allocations $g^{(i)}_1, \ldots, g^{(i)}_T$ that satisfy the energy constraints (2) and (3). We will show that we can replace the first $T-1$ elements with an appropriate $\tilde{g}^{(i)}_1, \ldots, \tilde{g}^{(i)}_{T-1}$ such that $\tilde{g}^{(i)}_1 = \ldots = \tilde{g}^{(i)}_{T-1}$, while still preserving the energy constraints and without decreasing the reward. Specifically, let $b^{(i)}_T$ be the battery state at the last time slot of block $i$ given $g^{(i)}_1, \ldots, g^{(i)}_{T-1}$. Set

$$\tilde{g}^{(i)}_1 = \ldots = \tilde{g}^{(i)}_{T-1} = \min\Big(E^{(i)} - \frac{b^{(i)}_T - b^{(i)}_1}{T-1},\ \bar{B}\Big).$$

We will show that this is an admissible policy which produces a higher reward than the original policy and results in the same battery state $b^{(i)}_T$ at the end of the block. First, the reward obtained by the original policy can be upper bounded using concavity of the function $C(\cdot)$:

$$r_i = \sum_{j=1}^{T-1} C(g^{(i)}_j)$$
