
Battery-Powered Devices in WPCNs

Alessandro Biason, Student Member, IEEE, and Michele Zorzi, Fellow, IEEE

arXiv:1601.06847v2 [cs.IT] 28 Jan 2016

Abstract: Wireless powered communication networks are becoming an effective solution for improving the self-sustainability of mobile devices. In this context, a hybrid access point transfers energy to a group of nodes, which use the harvested energy to perform computation or transmission tasks. While the availability of the wireless energy transfer mechanism opens up new frontiers, an appropriate choice of the network parameters (e.g., transmission powers, transmission duration, amount of transferred energy, etc.) is required in order to achieve high performance. In this work, we study the throughput optimization problem in a system composed of an access point which recharges the batteries of two devices at different distances. In the literature, the main focus so far has been on slot-oriented optimization, in which all the harvested energy is used in the same slot in which it is harvested. However, this approach is strongly suboptimal because it does not exploit the possibility of storing the energy and using it at a later time. Thus, instead of considering the slot-oriented case, we address the long-term maximization. This choice greatly increases the optimization complexity, since it requires considering, e.g., the channel state realizations, their statistics, and the battery evolution. Our objective is to find the best scheduling scheme, both for the energy transferred by the access point and for the data sent by the two nodes. We discuss how to perform the maximization with optimal as well as approximate techniques and show that the slot-oriented policies proposed so far are strongly suboptimal in the long run.

Index Terms: WPCN, doubly near-far, energy harvesting, energy transfer, power transfer, WSN, MDP, approximate MDP, Value Iteration, optimization, policies, finite battery.

The authors are with the Dept. of Information Engineering, University of Padova, Padova, Italy. email: {biasonal,nil,zorzi}@dei.unipd.it. A preliminary version of this paper will be presented at IEEE WCNC 2016 [1].

I. INTRODUCTION

New generation devices, e.g., Wireless Sensor Networks (WSNs) or mobile cellular networks, are able to provide high communication performance in terms of throughput or delay at the cost of computational complexity and demanding power supplies. Wireless Energy Transfer (WET) has been recognized as one of the most appealing solutions for supplying mobile devices when their batteries cannot be easily or cheaply replaced. Via WET it becomes possible to greatly extend the network lifetime and improve the devices' performance by avoiding energy outage situations. Nowadays, it is possible to transfer powers of tens to hundreds of microwatts at distances of 10 m and 5 m (see, for example, the Powercast company products [2]) and thus it becomes possible to supply ultra-low power mid-range networks. Differently from standard ambient energy harvesting techniques, WET has the major advantage of being fully controlled and does not rely on an external random phenomenon.

We consider a Wireless Powered Communication Network (WPCN) in which a base station transfers energy to different users. In WPCNs, one of the main goals is to design the scheduling procedure in order to better exploit the available resources and meet some Quality of Service (QoS) criteria (e.g., delay, packet drop rate, throughput). When a node is far away from the base station, it experiences a worse channel than the closer devices, on average, both for data transmission in uplink and for energy harvesting in downlink. Therefore, in order to develop a fair system in which all users achieve the same QoS, more resources have to be used to feed the far users.
This phenomenon is known in the literature as the doubly near-far effect and has been solved in the slot-oriented case (in which all the transferred energy is also used for data transmission in the same time slot) [3]. However, in the battery-powered case, in which the harvested energy can be stored and used at a later time, new considerations can be made (channel condition, current battery levels, battery sizes, future energy arrivals, etc.) and the optimization approach becomes more involved. The goal of the present work is to investigate such a problem.

Wireless Energy Transfer techniques have experienced a renewed research interest in the last few years [4] and several applications can be found in the WSN field, where low-power devices are fed with the transferred energy and use it for transmission or computation purposes. Different aspects of WET have been studied by both industry and academia, e.g., in terms of circuit and rectenna design [5] but also in terms of transmission protocols by the communication and networking community. In this field, three major research areas can be identified so far: SWIPT, energy cooperation and WPCN.

In SWIPT (Simultaneous Wireless Information and Power Transfer) systems, the trade-offs between information and energy transfer are investigated [6]. Nowadays, because of hardware constraints of the current technology, truly simultaneous data and energy transmission is not possible yet, and therefore the Time Splitting (TS) and Power Splitting (PS) approaches were introduced [7]. The TS approach, in which WET and data transmission are temporally interleaved, was studied in [8], [9], whereas PS was analyzed in [10]-[13].

A second research area studies the energy cooperation paradigm, where different nodes exchange their energy to improve the system performance. This is particularly suitable for achieving energy fairness among devices when one has more energy resources (e.g., it is recharged by an external and powerful ambient energy source).
The concept of energy cooperation was introduced in [14], in which Gurakan et al. studied a system of a few nodes and defined the optimal offline communication schemes. Recently, [15] studied a similar system in which nodes receive energy from the same external energy source and introduced a save-then-transmit scheme. [16] analyzes a multiterminal network with energy harvesting nodes which transfer and receive energy from other devices in the network. [17] introduced achievable performance upper bounds for a transmitter-receiver case with finite energy buffers with or without energy cooperation. In [18], routing with energy cooperation was studied. Even if the previous research topics have mainly been studied separately, it is expected that in the near future a system which considers multiple aspects of WET will be analyzed or developed.

While several different kinds of WET mechanisms are available, e.g., inductive coupling or strongly coupled magnetic resonances [19], [20], in this work we focus on Radio-Frequency WET (RF-WET). Indeed, since RF-WET is a far-field WET technique, it is suitable for powering several devices simultaneously in a distributed area. Via dedicated components, namely rectifiers [21] (which, for example, can be composed of a diode [22], a bridge of diodes or a voltage rectifier multiplier), the devices are able to convert the input RF signal into DC voltage, which can be used to refill their batteries. The RF signal can be harvested from the environment (e.g., this may be possible in a city where several electromagnetic sources are available), or from a dedicated source, i.e., a particular node (generally the access point) which emits RF signals to feed the devices (commercial products for RF-WET transmission/reception are already available, see [2]). This last kind of scenario is known as a wireless powered communication network (WPCN).

In a WPCN where multiple devices harvest energy from the base station and transmit data in uplink, a doubly near-far effect is present: a user far away from the base station experiences, on average, a worse channel than the others both in uplink (therefore it has to use more energy to perform its transmission) and in downlink (thus it gathers less energy). The doubly near-far problem was initially studied in [3].
The authors introduced a harvest-then-transmit scheme in which the time horizon is divided in slots and every slot is divided in two phases: first, the access point transfers energy to the devices and, secondly, the devices use the harvested energy to transmit data in the uplink channel. The trade-offs between the times to use for transferring energy and transmitting data were investigated and the optimal scheduling scheme was provided. The authors extended their previous work in [23], where user cooperation was taken into account in a two-device system. It was shown that coordination is a powerful technique which can effectively improve the system performance. Nevertheless, because of the additional complexity demanded to compute the scheduling scheme, and the unavoidable coordination and physical proximity required among devices, the cooperation solution may not be suitable for every scenario.

[24] described a harvest-then-cooperate protocol, in which source and relay work cooperatively in the uplink phase for the source's information transmission. The authors also derived an approximate closed-form expression for the average throughput of the proposed protocol. [25] studied the case of devices with energy and data queues and described a Lyapunov approach to derive the stochastic optimal control algorithm which minimizes the expected energy downlink power and stabilizes the queues. The long-term performance of a single-user system for a simple transmission scheme was presented in closed form in [26]. [27] modeled a WPCN with a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and minimized the total number of waiting packets in the network. Similarly to [3], a WPCN was studied in [28], where the access point also has the capability of beamforming the transferred RF signal in order to serve the most disadvantaged users and to guarantee throughput fairness.
The authors managed to convert a non-convex optimization problem into a spectral radius minimization problem, which can be efficiently solved. [29] studied the applicability of the massive multiple-input-multiple-output (MIMO) technology to a WPCN. With massive MIMO it becomes possible to receive data from several different devices simultaneously (thanks to spatial multiplexing) but also to improve the downlink performance by using sharp beams.

Most previous works describe a half-duplex system in which uplink and downlink cannot be performed simultaneously. Instead, the full-duplex case was studied in [30], [31]. [30] optimized the time allocations for WET and data transmission for different users in order to maximize the weighted sum rate of the uplink transmission. The authors considered perfect as well as imperfect self-interference cancellation (SIC) at the access point and showed that, when SIC is performed effectively, the full-duplex case outperforms the half-duplex one. A survey of recent advances and future perspectives in the WPCN field can be found in [32].

In this work, we consider a WPCN composed of an Access Point (AP) and two distributed nodes. AP transfers energy in downlink to the nodes, which use the harvested energy for transmission purposes. Our system model is similar to that of [3], [26], [28]. As in [26], we consider battery-powered devices and focus on the long-term performance. However, [26] considered only one device, whereas, in the present work, we consider the near-far effect problem when multiple devices are present. Moreover, differently from [26], we describe how to derive the optimal strategy to maximize the throughput of the system, whereas [26] focuses on the performance evaluation of a given strategy. [3], [28] describe a problem similar to what we analyze, but they focus on the optimization in a single slot and not in the long term. This assumption turns out to be very restrictive in practice.
Indeed, in our numerical evaluation we will describe the differences between these two approaches and show that focusing only on a greedy slot-oriented optimization is strongly suboptimal in the long run.

We study the throughput maximization problem and solve it optimally, via the Markov Decision Process (MDP) theory, and approximately, exploiting the results we derived in the optimization section. We explicitly study the trade-offs among battery size, amount of available energy, fading effects and performance. We show how fading and amount of downlink energy are related and describe how the system changes when the power supply is scarce or abundant. This work can be considered as a first step to understand the key trade-offs and optimization problems in a WPCN with finite battery-powered devices.

The paper is organized as follows. Section II defines the system model we analyzed and introduces the optimization problem, which is solved in Sections III and IV (optimally and approximately, respectively). We briefly describe the slot-oriented maximization in Section V. Section VI presents our numerical results. Finally, Section VII concludes the paper.

II. SYSTEM MODEL AND OPTIMIZATION PROBLEM

We consider a system composed of three nodes: one Access Point (AP) with Wireless Energy Transfer (WET) capabilities and two devices, namely $D_1$ and $D_2$. Via an RF-WET mechanism, AP recharges the batteries (with finite capacities $B_{1,\max}$ J and $B_{2,\max}$ J) of the two devices. It is assumed that AP has an unlimited amount of energy available. The devices use the energy transferred in downlink from the access point to upload data packets. An approach similar to the harvest-then-transmit protocol proposed in [28] is adopted to keep the devices operational. Under this scheme, time is divided in slots of length $T$ and slot $k$ corresponds to the time interval $[kT, (k+1)T)$. Every slot is divided in two phases:¹

1) uplink (UP): in the first phase, which lasts for $\tau_1 + \tau_2 \le T$ seconds, the two devices transmit data to AP in a TDMA fashion using the energy stored in their batteries;
2) downlink (DL): during the second $\tau_{AP} \triangleq T - \tau_1 - \tau_2$ seconds, $D_1$ and $D_2$ harvest the energy transferred from the access point and store it in their batteries.

AP is assumed to have multiple antennas and is able to perform energy beamforming in order to split the energy transferred to $D_1$ or $D_2$ during the DL phase, whereas $D_1$ and $D_2$ are assumed to be equipped with an omnidirectional antenna.

A. Uplink Phase

At the beginning of a slot, device $D_i$ ($i \in \{1, 2\}$) has $B_i \in [0, B_{i,\max}]$ J of energy stored. In a TDMA fashion, first device 1 and then device 2 occupy the channel to transmit data in the uplink channel for $\tau_1$ and $\tau_2$ seconds, respectively. The transmission powers $\rho_1$ and $\rho_2$ and the time allocations $\tau_1$ and $\tau_2$ can change dynamically in every slot and are the control variables of our optimization. We assume that the main source of energy consumption is the transmission and therefore neglect the circuitry costs.
Note that device $D_i$ is constrained to consume an amount of energy $E_i \triangleq \tau_i \rho_i \le B_i$ in the upload phase. We also impose the realistic constraint $\rho_i \in [P_{i,\min}, P_{i,\max}]$ when a transmission is performed. We assume that, in every slot, the devices always have enough data to transmit, i.e., the transmission data queue is always non-empty. This assumption is useful to characterize the maximum throughput of the system. According to Shannon's formula, when a power $\rho_i$ is used, the noise power is $\sigma^2$ and the uplink channel gain is $h_i$, the transmission rate of device $D_i$ is

$$R(\rho_i, h_i) = \log\left(1 + \frac{h_i \rho_i}{\sigma^2}\right). \tag{1}$$

Thus, during a single slot, the amount of transmitted data is the time reserved for device $D_i$ multiplied by the transmission rate, $\tau_i R(\rho_i, h_i)$.

¹ Unlike in [3], we choose to consider first the uplink and then the downlink phases in order to more easily track the energy level of the two devices when we set up the MDP formulation in Section II-C.

The uplink channel is affected by flat fading, which remains constant over the same slot but may change from slot to slot. The channel gain $h_i$ can be expressed as $h_i = \bar{h}_i \theta_i$, where $\theta_i$ is a random variable which represents the fading and $\bar{h}_i$ is the average channel gain, obtained by considering the path loss effects as $\bar{h}_i = h_{0,i} d_i^{-\gamma_i}$. Here $h_{0,i}$ is the signal power gain at a reference distance of 1 m, $d_i$ is the distance between $D_i$ and AP expressed in meters, and $\gamma_i$ is the path loss exponent. Note that the device closer to AP experiences, on average, a better channel and spends less energy than the other to transmit the same amount of data (near-far problem).

B. Downlink Phase

The downlink period lasts for $\tau_{AP} \triangleq T - \tau_1 - \tau_2$ seconds. During this phase, the access point sends two energy beams to the devices. The sent signal $x_{ET}$ can be seen as the combination of two contributions:

$$x_{ET} = b_1 s_1 + b_2 s_2, \tag{2}$$

where $s_i$ represents the signal carrying energy toward device $i$ and $b_i$ is its directional gain.
Since $s_i$ does not carry information, we assume that it is an i.i.d. random variable with zero mean and unit variance. The total power transferred by AP is $|b_1|^2 + |b_2|^2 \le Q_{\max}$, where $Q_{\max} < \infty$ is a technology parameter which represents the maximum power that can be used to transfer energy (e.g., on the order of some watts). The received signal at device $D_i$ is

$$y_i = \sqrt{g_i}\, x_{ET} + n_i, \tag{3}$$

where $g_i$ is the channel gain from AP to device $D_i$ and $n_i$ is the receiver noise, assumed negligible for energy transfer. We assume that beamforming is perfect, thus $y_i = \sqrt{g_i}\, b_i s_i$. The corresponding transferred power is $P_{i,rc} = \eta g_i |b_i|^2 = \eta g_i Q_i$, where $\eta$ is a constant in $(0, 1]$ that models the energy conversion losses at the devices and $Q_i$ is the power sent to device $D_i$, with $Q_1 + Q_2 \le Q_{\max}$. The term $g_i$ can be explicitly written as $g_i = \bar{g}_i \kappa_i$ in order to consider the flat fading effects, where $\bar{g}_i = g_{0,i} d_i^{-\delta_i}$ and $\kappa_i$ are defined similarly to $\bar{h}_i$ and $\theta_i$ in Section II-A ($g_{0,i}$ is the signal power gain at a reference distance of 1 m and $\delta_i$ is the path loss exponent). In summary, when a power $Q_i$ is transferred to device $i$, the stored energy is

$$C_i = \tau_{AP}\, \eta\, Q_i\, g_{0,i} \left(\frac{1}{d_i}\right)^{\delta_i} \kappa_i. \tag{4}$$

The channel gain components in uplink ($\bar{h}_1$, $\bar{h}_2$) and downlink ($\bar{g}_1$, $\bar{g}_2$) can be assumed equal if the transmission is performed in the same frequency band, which is a common assumption in WPCNs [3]. Finally, note that the downlink channel of the user farther from AP is worse (on average), leading to a doubly near-far scenario.

C. Batteries

In every slot, the energy level of battery $i$ is updated according to

$$B_i \leftarrow \min\{B_{i,\max},\ B_i - E_i + C_i\}. \tag{5}$$
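The per-slot quantities of Equations (1), (4) and (5) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, and the base-2 logarithm in the rate is an assumption (the text writes only "log").

```python
import math


def rate(rho, h, sigma2):
    """Eq. (1): R(rho, h) = log(1 + h * rho / sigma^2).

    Assumption: base-2 logarithm; the paper does not state the base.
    """
    return math.log2(1.0 + h * rho / sigma2)


def harvested_energy(tau_ap, eta, q, g0, d, delta, kappa):
    """Eq. (4): C = tau_AP * eta * Q * g0 * (1/d)^delta * kappa."""
    return tau_ap * eta * q * g0 * (1.0 / d) ** delta * kappa


def battery_update(b, e, c, b_max):
    """Eq. (5): B <- min(B_max, B - E + C), with the constraint E <= B."""
    if e > b:
        raise ValueError("consumed energy E cannot exceed stored energy B")
    return min(b_max, b - e + c)


# Illustrative (made-up) numbers: one slot for a single device.
C = harvested_energy(tau_ap=0.5, eta=0.8, q=2.0, g0=1.0, d=2.0, delta=2.0, kappa=1.0)
B_next = battery_update(b=1.0, e=0.4, c=C, b_max=1.0)
```

Note that the update saturates at `b_max`: any harvested energy beyond the battery capacity is lost, which is the effect that the min in (5) captures.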

Note that the arguments of the min in (5) are always non-negative because the energy consumption $E_i$ is chosen such that $E_i \le B_i$. We also highlight that $C_i$ is a random variable because of the channel fading. The min operation is used to explicitly account for the effects of finite batteries. The battery evolution depends upon the choices of all parameters $\tau_i$, $\rho_i$, $\tau_{AP}$ and $Q_i$, which are the control variables of our optimization and will be analyzed in the next section.

In order to perform the optimization, we model the system with a discrete Markov Chain (MC). In particular, we discretize the battery of $D_i$ in $b_{i,\max} + 1$ levels, where $b_{i,\max}$ represents the maximum number of energy quanta that can be stored in the battery and one energy quantum corresponds to $B_{i,\max}/b_{i,\max}$ J. There exists a trade-off between the precision of the discrete approximation and the corresponding numerical complexity of the model. In general, if $b_{i,\max}$ is sufficiently high, the discrete model can be considered as a good approximation of the continuous system. Equation (5) can be rewritten in terms of energy quanta:

$$b_i \leftarrow \min\{b_{i,\max},\ b_i - e_i + c_i\}.$$

In every slot, only an integer number of energy quanta $e_i$ can be extracted from the battery. Similarly, only an integer number of energy quanta can be harvested, thus we define $c_i \triangleq \lfloor C_i\, b_{i,\max}/B_{i,\max} \rfloor$ (the floor is used to obtain a lower bound to the real performance, whereas an upper bound can be similarly obtained using the ceiling). Similarly, if the channel fading is described by a continuous r.v., we discretize it using a finite number of intervals. In the rest of the paper, the bold notation is used to identify a pair of values, e.g., $\mathbf{a} = (a_1, a_2)$.

D. Optimization Problem

We define a policy $\mu$ as an action probability measure over the state set, namely $\mathcal{S}$. $\mathcal{S}$ represents all the combinations of battery levels $\mathbf{b}$ and channels $\mathbf{g}$, $\mathbf{h}$.
We assume that the policy is computed by a central controller (e.g., the access point), which knows the state of the two batteries and the joint channel state $(\mathbf{g}, \mathbf{h})$, and distributed among nodes.² Note that, while estimating the uplink channel is a standard task, downlink channel estimation may be more challenging due to the hardware limitations of the energy receivers. However, by exploiting innovative techniques, e.g., [33], it is possible to obtain accurate CSI for the downlink channel as well.

² In the cases in which CSI is only partially available, our model is useful to characterize the performance upper bound. A detailed analysis of the partial CSI case is left for future study.

For every state $s = (\mathbf{b}, \mathbf{g}, \mathbf{h}) \in \mathcal{S}$, $\mu$ defines with which probability an action $a$ is performed. $a$ summarizes the data transmission durations $\boldsymbol{\tau}$, the energy transfer duration $\tau_{AP}$, the transmission powers $\boldsymbol{\rho}$, and the amounts of energy $\mathbf{Q}$ to send over the two beams, i.e., $a = (\boldsymbol{\tau}, \tau_{AP}, \boldsymbol{\rho}, \mathbf{Q})$. Formally, $\mu$ defines $P_\mu(a|s)$, with $\sum_{a \in \mathcal{A}_s} P_\mu(a|s) = 1$, where $\mathcal{A}_s$ is the set of the possible actions in state $s$ (e.g., $\mathcal{A}_s$ includes the energy constraints imposed by the battery levels). For the sake of presentation simplicity, in the next sections we use a deterministic policy $\mu$, i.e., $P_\mu(a|s)$ is equal to 1 for $a = \bar{a}_s$ and to 0 for $a \ne \bar{a}_s$, where $\bar{a}_s$ is an action in $\mathcal{A}_s$. However, in our numerical evaluation we consider a more general random policy.

Our focus is on the long-term throughput optimization problem. This is suitable for scenarios in which nodes operate in the same position for a sufficient amount of time (e.g., sensors), but can be easily extended to the finite-horizon case with similar techniques. Our goal is to maximize the minimum throughput value reached by both devices in order to increase the QoS. Formally, the reward $G_\mu$ is expressed as

$$G_\mu = \min\{G_{1,\mu},\ G_{2,\mu}\}, \tag{6}$$

$$G_{i,\mu} \triangleq \liminf_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} \mathbb{E}\left[\tau_i R(\rho_i, h_i)\right], \quad i \in \{1, 2\}. \tag{7}$$

The expectation is taken with respect to the channel conditions.
The maximization process is

$$\mu^\star = \arg\max_{\mu} G_\mu, \tag{8}$$

where $\mu^\star$ is the Optimal Policy (OP).

III. OPTIMAL SOLUTION

In this section, we will show how to solve the problem described in Section II-D and obtain OP. In particular, by exploiting the Markov Decision Process (MDP) theory, the optimization process can be simplified by focusing on the optimization of $\bar{a}_s$ for every fixed $s$ instead of considering the whole function $\mu$, i.e., the optimization can be parallelized (see Bellman's equation in [34]). Moreover, we will describe how it is possible to reduce the action $a = (\boldsymbol{\tau}, \tau_{AP}, \boldsymbol{\rho}, \mathbf{Q})$ to a simpler action with only four entries, $\tilde{a} = (\tau_{AP}, Q_1, \mathbf{e})$.

A. Max-min Problem

We now derive a simple technique to deal with the max-min optimization problem of Equation (8). Indeed, since standard dynamic programming techniques are designed for min or max (and not max-min) problems, we recast the problem in a standard form. Consider a new optimization problem, similar to the previous one except for the objective function, which becomes $H_\mu(\alpha)$ instead of $G_\mu$:

$$H_\mu(\alpha) = \alpha G_{1,\mu} + (1 - \alpha) G_{2,\mu}, \tag{9}$$

where $\alpha \in [0, 1]$ is a constant. Note that the new problem

$$\mu_\alpha = \arg\max_{\mu} H_\mu(\alpha) \tag{10}$$

is expressed in a max form, and thus is easier to solve. If $\alpha = 1$ [$\alpha = 0$], then we are maximizing the performance of device $D_1$ [$D_2$] only and neglecting the other device. We name $\mu_\alpha$ the policy which maximizes $H_\mu(\alpha)$ for a given $\alpha$. Since $\mu_\alpha$ depends upon $\alpha$, also $G_{1,\mu_\alpha}$ and $G_{2,\mu_\alpha}$ implicitly depend upon $\alpha$. It is straightforward to show that $G_{1,\mu_\alpha}$ [$G_{2,\mu_\alpha}$] increases [decreases] as $\alpha$ increases. We now want to find the value $\bar{\alpha}$ such that the new problem coincides with the original one. Consider the following intuitive result.

Lemma 1. The optimal solution of Problem (8) allocates the same throughput to both users.

Therefore, we impose Lemma 1 as a design constraint for the new problem and name $\bar{\alpha}$ the value of $\alpha$ at which such condition is satisfied, i.e., $G_{1,\mu_{\bar{\alpha}}} = G_{2,\mu_{\bar{\alpha}}}$. Under this condition, we have

$$H_{\mu_{\bar{\alpha}}}(\bar{\alpha}) = G_{1,\mu_{\bar{\alpha}}} = G_{2,\mu_{\bar{\alpha}}}. \tag{11}$$

As a consequence, at $\alpha = \bar{\alpha}$, we obtain $\mu^\star \equiv \mu_{\bar{\alpha}}$, i.e., OP (the solution of (8)) coincides with the new policy $\mu_{\bar{\alpha}}$ which maximizes $H_\mu(\bar{\alpha})$. This procedure simplifies the numerical optimization because $\mu_{\bar{\alpha}}$ can be found by exploiting standard stochastic optimization algorithms, e.g., the Value Iteration Algorithm (VIA) or the Policy Iteration Algorithm (PIA) [34].

Practically, the value $\bar{\alpha}$ which satisfies (11) can be found with a bisection search as follows. First, arbitrarily fix $\alpha \in [0, 1]$ and maximize $H_\mu(\alpha)$ with VIA or PIA. Using the optimal solution, compute $G_{1,\mu_\alpha}$ and $G_{2,\mu_\alpha}$. If $G_{1,\mu_\alpha}$ is greater [less] than $G_{2,\mu_\alpha}$, then decrease [increase] $\alpha$ and repeat the procedure. The algorithm is repeated until the throughputs of the two nodes are within $\epsilon$ of each other, with $\epsilon$ a sufficiently small constant. In the following, we will (equivalently) deal with $H_\mu(\alpha)$ instead of $G_\mu$.

B. Bellman's Equation Structure

The most suitable algorithms for solving our problem are VIA or PIA. In the following we describe the policy improvement step, which is one of the basic operations of both algorithms (see [34, Sec. 7.4, Vol. 1]). We define the cost-to-go function associated to state $s$ as $J_s$. The policy improvement step exploits Bellman's equation as follows:

$$J_s \leftarrow \max_{a \in \mathcal{A}_s} \Big\{ r_\alpha(\boldsymbol{\tau}, \boldsymbol{\rho}\,|\,\mathbf{h}) + \sum_{s'} P(s'|s, a)\, J_{s'} \Big\}, \tag{12}$$

$$r_\alpha(\boldsymbol{\tau}, \boldsymbol{\rho}\,|\,\mathbf{h}) \triangleq \alpha \tau_1 R(\rho_1, h_1) + (1 - \alpha) \tau_2 R(\rho_2, h_2).$$

The probability of going from state $s$ to state $s'$ given the action $a$ can be expressed as

$$P(s'|s, a) \overset{(a)}{=} P(\mathbf{b}', \mathbf{g}', \mathbf{h}' \,|\, \mathbf{b}, \mathbf{g}, \mathbf{h}, a) \tag{13a}$$
$$\overset{(b)}{=} P(\mathbf{b}', \mathbf{g}', \mathbf{h}' \,|\, \mathbf{b}, \mathbf{g}, a) \tag{13b}$$
$$\overset{(c)}{=} f(\mathbf{g}', \mathbf{h}')\, P(\mathbf{b}' \,|\, \mathbf{b}, \mathbf{g}, a) \tag{13c}$$
$$\overset{(d)}{=} f(\mathbf{g}', \mathbf{h}')\, P(b_1' \,|\, b_1, g_1, a)\, P(b_2' \,|\, b_2, g_2, a), \tag{13d}$$

where $f(\mathbf{g}, \mathbf{h})$ is the pmf of the channel state (note that the randomness is given by the fading components $\theta_i$ and $\kappa_i$ only).
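The bisection search over $\alpha$ described in Section III-A can be sketched as follows. This is a minimal sketch under stated assumptions: `solve_for_alpha` is a hypothetical callback standing in for a full VIA/PIA run that returns the pair $(G_{1,\mu_\alpha}, G_{2,\mu_\alpha})$, the search starts from the midpoint of $[0, 1]$ (the text allows an arbitrary starting value), and `toy_solver` is a made-up monotone stand-in used only to exercise the loop.

```python
def find_alpha_bar(solve_for_alpha, eps=1e-6, max_iter=100):
    """Bisection on alpha until |G1 - G2| <= eps.

    solve_for_alpha(alpha) -> (G1, G2) is assumed to maximize H_mu(alpha)
    (e.g., with VIA or PIA) and return the two long-term throughputs.
    It relies on G1 increasing and G2 decreasing with alpha.
    """
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        alpha = 0.5 * (lo + hi)
        g1, g2 = solve_for_alpha(alpha)
        if abs(g1 - g2) <= eps:
            break
        if g1 > g2:
            hi = alpha  # device 1 is favored: decrease alpha
        else:
            lo = alpha  # device 2 is favored: increase alpha
    return alpha, g1, g2


def toy_solver(alpha):
    """Hypothetical stand-in for the MDP solver (NOT the paper's model)."""
    return 2.0 * alpha, 1.0 - alpha


alpha_bar, g1, g2 = find_alpha_bar(toy_solver)
```

Each call to `solve_for_alpha` is a complete MDP solution, so the cost of the outer loop is the number of bisection steps times the cost of one VIA/PIA run.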
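In (13), step (a) holds by definition. Step (b) holds because the uplink channel does not influence the battery evolution (given the action). Step (c) holds because the channel is i.i.d. over time and independent of the other quantities. The last step holds because the states of the batteries evolve independently in the two devices, given a fixed action.

Exploiting Equation (4) and the MDP formulation, the transition probabilities can be expressed as follows. If $b_i' < b_{i,\max}$,

$$P(b_i' \,|\, b_i, g_i, a) = \chi\left\{ b_i + \left\lfloor \left(-\tau_i \rho_i + \eta g_i \tau_{AP} Q_i\right) \frac{b_{i,\max}}{B_{i,\max}} \right\rfloor = b_i' \right\}, \tag{14}$$

otherwise

$$P(b_{i,\max} \,|\, b_i, g_i, a) = \chi\left\{ b_i + \left\lfloor \left(-\tau_i \rho_i + \eta g_i \tau_{AP} Q_i\right) \frac{b_{i,\max}}{B_{i,\max}} \right\rfloor \ge b_{i,\max} \right\}. \tag{15}$$

$\chi\{\cdot\}$ is the indicator function and the floor is used to discretize the energy and use the MDP approach. (14)-(15) indicate that the battery transitions follow a deterministic scheme (given the action and the state of the system). Intuitively, this happens because the randomness of the channel fading is already included in $g_i$. Therefore, (12) can be reformulated as follows:

$$J_s \leftarrow \max_{a \in \mathcal{A}_s} \Big\{ r_\alpha(\boldsymbol{\tau}, \boldsymbol{\rho}\,|\,\mathbf{h}) + \sum_{\mathbf{g}', \mathbf{h}'} f(\mathbf{g}', \mathbf{h}')\, J_{(\mathbf{b}', \mathbf{g}', \mathbf{h}')} \Big\}, \tag{16}$$

with $\mathbf{b}'$ defined according to (14)-(15). Note that, with this observation, we can avoid iterating over $\mathbf{b}'$, saving computation time. Another interesting point is that $\mathbf{b}'$ does not depend upon the particular values of $\boldsymbol{\tau}$ and $\boldsymbol{\rho}$ but only upon their products $\tau_1 \rho_1$ and $\tau_2 \rho_2$. We will use this property in the next section.

C. Variables Reduction

Each step of VIA or PIA requires solving only the maximization in Equation (16), which can be formally written as (in this subsection we always refer to a fixed state $s = (\mathbf{b}, \mathbf{g}, \mathbf{h})$)

$$\max_{\boldsymbol{\tau}, \tau_{AP}, \boldsymbol{\rho}, \mathbf{Q}} \ r_\alpha(\boldsymbol{\tau}, \boldsymbol{\rho}\,|\,\mathbf{h}) + \Lambda(\boldsymbol{\tau} \odot \boldsymbol{\rho}, \tau_{AP}, \mathbf{Q}\,|\,\mathbf{g}), \tag{17a}$$

subject to

$$\tau_i \rho_i \le B_i = b_i \frac{B_{i,\max}}{b_{i,\max}}, \quad i \in \{1, 2\}, \tag{17b}$$
$$\tau_1 + \tau_2 + \tau_{AP} \le T, \tag{17c}$$
$$Q_1 + Q_2 \le Q_{\max}, \tag{17d}$$
$$\boldsymbol{\tau} \succeq \mathbf{0}, \quad \tau_{AP} \ge 0, \quad \mathbf{P}_{\min} \preceq \boldsymbol{\rho} \preceq \mathbf{P}_{\max}, \quad \mathbf{Q} \succeq \mathbf{0}. \tag{17e}$$

Constraints (17b)-(17e) represent the set $\mathcal{A}_s$.³ $\succeq$ and $\preceq$ are the component-wise inequalities. $\Lambda(\boldsymbol{\tau} \odot \boldsymbol{\rho}, \tau_{AP}, \mathbf{Q}\,|\,\mathbf{g})$ is a quantity that, as the second term in (16), does not depend upon the individual values of $\boldsymbol{\tau}$ and $\boldsymbol{\rho}$ but only on their products ($\odot$ denotes the Hadamard product).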
The dependence on the products only arises because the battery update formulas consider only the overall energy consumption of a device in a slot, which is given by the transmission duration $\tau_i$ multiplied by the transmission power $\rho_i$ (see Equations (14) and (15)).

Without deriving particular properties of $J_s$, the classic procedure to solve (17) is to perform an exhaustive search over all the seven optimization variables. However, the computation may be too demanding⁴ and simpler optimization techniques are required. In particular, in this section we propose a method to simplify the optimization.

³ Technically, we should also consider the cases in which $\rho_1 = 0$ and/or $\rho_2 = 0$. However, these are trivial cases that can be easily analyzed separately.

⁴ Note that Problem (17) must be solved for every combination of $\mathbf{b}$, $\mathbf{g}$, $\mathbf{h}$ and for every step of PIA.

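First, it can be shown that choosing $Q_1 + Q_2 = Q_{\max}$ is optimal (otherwise the available resources would be underused). Similarly, using $\tau_{AP} < T - \tau_1 - \tau_2$ is suboptimal. Therefore, without loss of optimality, we can choose $Q_2 = Q_{\max} - Q_1$ and $\tau_2 = T - \tau_{AP} - \tau_1$ and avoid iterating over $Q_2$ and $\tau_2$.

Now, fix the products $\boldsymbol{\tau} \odot \boldsymbol{\rho} = \mathbf{E}$, where $E_i$ represents the energy consumed by device $D_i$. In order to solve Problem (17), we consider the vector $\mathbf{E}$ instead of $\boldsymbol{\tau}$ and $\boldsymbol{\rho}$. Given $Q_1$, $\tau_{AP}$, $\mathbf{E}$, the particular values for the durations and the transmission powers are extracted by solving the following sub-problem:

$$\max_{\boldsymbol{\tau}, \boldsymbol{\rho}} \ r_\alpha(\boldsymbol{\tau}, \boldsymbol{\rho}\,|\,\mathbf{h}) + \Lambda(\mathbf{E}, \tau_{AP}, \mathbf{Q}\,|\,\mathbf{g}), \tag{18a}$$

subject to

$$\boldsymbol{\tau} \odot \boldsymbol{\rho} = \mathbf{E}, \tag{18b}$$
$$\tau_1 + \tau_2 = T - \tau_{AP}, \tag{18c}$$
$$\boldsymbol{\tau} \succeq \mathbf{0}, \quad \mathbf{P}_{\min} \preceq \boldsymbol{\rho} \preceq \mathbf{P}_{\max}, \tag{18d}$$

where $\Lambda(\mathbf{E}, \tau_{AP}, \mathbf{Q}\,|\,\mathbf{g})$ is a constant term that can be removed from the max argument. Problem (18) can be rewritten as a function of $\tau_1$ only:

$$\max_{\tau_1} \ \alpha \tau_1 R\left(\frac{E_1}{\tau_1}, h_1\right) + (1 - \alpha)(T - \tau_{AP} - \tau_1)\, R\left(\frac{E_2}{T - \tau_{AP} - \tau_1}, h_2\right), \tag{19a}$$

subject to

$$\tau_1 \ge \tau_{1,\min} \triangleq \max\left\{\frac{E_1}{P_{1,\max}},\ T - \tau_{AP} - \frac{E_2}{P_{2,\min}}\right\}, \tag{19b}$$
$$\tau_1 \le \tau_{1,\max} \triangleq \min\left\{\frac{E_1}{P_{1,\min}},\ T - \tau_{AP} - \frac{E_2}{P_{2,\max}}\right\}. \tag{19c}$$

(19) is a uni-dimensional maximization problem which (except in the trivial cases, e.g., $E_1 = 0$ or $E_2 = 0$ or no feasible solutions) can be easily solved by taking the derivative of the reward function, given in the following expression

$$\alpha \left( \log\left(1 + \frac{h_1}{\sigma^2} \frac{E_1}{\tau_1}\right) - \frac{E_1 h_1}{\tau_1 \sigma^2 + E_1 h_1} \right) + (1 - \alpha) \left( \frac{E_2 h_2}{(T - \tau_{AP} - \tau_1)\sigma^2 + E_2 h_2} - \log\left(1 + \frac{h_2}{\sigma^2} \frac{E_2}{T - \tau_{AP} - \tau_1}\right) \right), \tag{20}$$

and setting it to zero. It can be shown that the previous expression has a unique zero in $(0, T - \tau_{AP})$ that corresponds to the optimal value $\tau_{1,n.c.}$ of Problem (19) without constraints. The optimal solution of (19), namely $\tau_1^\star$, can be found as

$$\tau_1^\star = \max\{\min\{\tau_{1,n.c.},\ \tau_{1,\max}\},\ \tau_{1,\min}\}. \tag{21}$$

Given $\tau_{AP}$, $\mathbf{E}$ and $\tau_1^\star$, the values of $\tau_2$, $\rho_1$ and $\rho_2$ can be derived from (18b)-(18c). In summary, instead of performing an exhaustive search over seven variables, we just iterate over $\tau_{AP}$, $Q_1$ and $\mathbf{E}$, and recover the other parameters by solving (18) and choosing $Q_2 = Q_{\max} - Q_1$, $\tau_2 = T - \tau_{AP} - \tau_1$.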
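The one-dimensional sub-problem (19)-(21) can be sketched numerically as follows. This is an illustrative sketch, not the authors' implementation: the function names and the choice of a bisection root finder are assumptions, `t_ul` denotes $T - \tau_{AP}$, and the code assumes the non-trivial case $E_1, E_2 > 0$, $0 < \alpha < 1$ and strictly positive minimum powers. The natural logarithm is used in the derivative; a different base only rescales (20) and leaves its zero unchanged.

```python
import math


def d_reward(tau1, alpha, e1, e2, h1, h2, sigma2, t_ul):
    """Derivative (20) of the objective (19a), with t_ul = T - tau_AP."""
    tau2 = t_ul - tau1
    c1 = h1 * e1 / sigma2
    c2 = h2 * e2 / sigma2
    term1 = math.log1p(c1 / tau1) - c1 / (tau1 + c1)
    term2 = c2 / (tau2 + c2) - math.log1p(c2 / tau2)
    return alpha * term1 + (1.0 - alpha) * term2


def optimal_tau1(alpha, e1, e2, h1, h2, sigma2, t_ul, p_min, p_max, tol=1e-12):
    """Return tau_1* of Eq. (21), or None if (19b)-(19c) are infeasible.

    p_min and p_max are pairs (device 1, device 2) of power limits.
    """
    tau_min = max(e1 / p_max[0], t_ul - e2 / p_min[1])  # Eq. (19b)
    tau_max = min(e1 / p_min[0], t_ul - e2 / p_max[1])  # Eq. (19c)
    if tau_min > tau_max:
        return None
    # The derivative is positive near 0 and negative near t_ul, with a
    # unique zero in between, so bisection finds the unconstrained optimum.
    lo, hi = 0.0, t_ul
    while hi - lo > tol * t_ul:
        mid = 0.5 * (lo + hi)
        if d_reward(mid, alpha, e1, e2, h1, h2, sigma2, t_ul) > 0.0:
            lo = mid
        else:
            hi = mid
    tau_nc = 0.5 * (lo + hi)
    return max(min(tau_nc, tau_max), tau_min)  # Eq. (21)


# Symmetric toy example (made-up numbers): the optimum splits the uplink evenly.
tau1 = optimal_tau1(0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, (0.1, 0.1), (10.0, 10.0))
```

Once $\tau_1^\star$ is available, the remaining quantities follow as $\tau_2 = T - \tau_{AP} - \tau_1^\star$ and $\rho_i = E_i/\tau_i$.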
We also remark that $E_1$, $E_2$ must satisfy $e_i \triangleq E_i\, b_{i,\max}/B_{i,\max} \in \{0, \ldots, b_{i,\max}\}$. The previous method is useful to simplify the numerical computation. Moreover, starting from the definitions of $E_1$ and $E_2$, additional insights can be derived and in particular we can exploit the following pruning technique.

Lemma 2. The energy consumption of $D_i$ is not decreasing with its uplink channel gain $h_i$. Formally, consider state $(\mathbf{b}, \mathbf{g}, \mathbf{h}^{(A)})$ and assume that, for fixed $\tau_{AP}$ and $\mathbf{Q}$, the best $\mathbf{e}$ which maximizes (17a) is $\mathbf{e}^{(A)}$. Then, for state $(\mathbf{b}, \mathbf{g}, \mathbf{h}^{(B)})$, with $h_1^{(B)} \ge h_1^{(A)}$ and $h_2^{(B)} \le h_2^{(A)}$, the best $\mathbf{e}$, namely $\mathbf{e}^{(B)}$, satisfies

$$e_1^{(B)} \ge e_1^{(A)}, \qquad e_2^{(B)} \le e_2^{(A)} \tag{22}$$

(and similarly by switching the subscripts 1 and 2).

Intuitively, the previous lemma holds because the reward of Equation (1) increases with $h_i$. Numerically, this means that, given $\mathbf{e}^{(A)}$, we can avoid iterating over all the space $\{0, \ldots, b_1\} \times \{0, \ldots, b_2\}$ and restrict the optimization space to $\{e_1^{(A)}, \ldots, b_1\} \times \{0, \ldots, e_2^{(A)}\}$ only.

D. Low-SNR Regime

An interesting and practical case⁵ in which more analytical results can be developed and explained is the low-SNR regime. In this section we provide additional details for such a case. We assume $\frac{h_1}{\sigma^2} \rho_1 \ll 1$ and $\frac{h_2}{\sigma^2} \rho_2 \ll 1$ (low-SNR assumption), therefore $R(\rho_i, h_i) \approx \frac{h_i}{\sigma^2} \rho_i$. In this case, $r_\alpha(\boldsymbol{\tau}, \boldsymbol{\rho}\,|\,\mathbf{h})$ reduces to $\alpha E_1 \frac{h_1}{\sigma^2} + (1 - \alpha) E_2 \frac{h_2}{\sigma^2}$, i.e., it depends only upon the product $\boldsymbol{\tau} \odot \boldsymbol{\rho} = \mathbf{E}$. Therefore, the best choice is to use the maximum transmission power $P_{i,\max}$ and the minimum transmission duration $E_i/P_{i,\max}$ at both devices. In this way, the system achieves the same reward per slot and maximizes the downlink phase, thus more energy is harvested and stored. As a consequence, once $\mathbf{E}$ is specified, the downlink duration $\tau_{AP}$ is uniquely determined as $\tau_{AP} = T - E_1/P_{1,\max} - E_2/P_{2,\max}$.

E. Reducing State Space Complexity

In a general step of PIA or VIA, given the current policy, the corresponding cost-to-go function $J_s$ has to be computed (policy evaluation step [34, Sec. 7.4, Vol. 1]).
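This process is challenging when the state space is large. So far, the state of the system is the tuple $s = (\mathbf{b}, \mathbf{g}, \mathbf{h})$. However, since $\mathbf{g}$ and $\mathbf{h}$ evolve independently over time, the state space can be reduced to $s = (\mathbf{b})$ only, as follows. Define a new cost-to-go function

$$K(\mathbf{b}) \triangleq \sum_{\mathbf{g},\mathbf{h}} f(\mathbf{g},\mathbf{h})\, J(\mathbf{b},\mathbf{g},\mathbf{h}). \tag{23}$$

$K(\mathbf{b})$ substitutes $J(\mathbf{b},\mathbf{g},\mathbf{h})$ in the original problem. Indeed, we can rewrite the policy improvement step as

$$K(\mathbf{b}) = \sum_{\mathbf{g},\mathbf{h}} f(\mathbf{g},\mathbf{h}) \max_{a \in A(\mathbf{b},\mathbf{g},\mathbf{h})} \Big\{ r_\alpha(\boldsymbol{\tau},\boldsymbol{\rho} \mid \mathbf{h}) + \sum_{s'} P(s' \mid s, a)\, J(s') \Big\} \tag{24a}$$
$$\phantom{K(\mathbf{b})} = \sum_{\mathbf{g},\mathbf{h}} f(\mathbf{g},\mathbf{h}) \max_{a \in A(\mathbf{b},\mathbf{g},\mathbf{h})} \Big\{ r_\alpha(\boldsymbol{\tau},\boldsymbol{\rho} \mid \mathbf{h}) + K(\mathbf{b}') \Big\}, \tag{24b}$$

where the next battery state $\mathbf{b}'$ is defined according to (14)-(15). This procedure further simplifies the numerical computation because 1) it reduces the complexity of the policy evaluation step (there is a lower number of states) and 2) it reduces the number of elementary operations inside the max operation in the policy improvement step.

⁵ Indeed, since the amount of transferred energy is low due to the WET inefficiencies, the transmission powers are also low, leading to a low-SNR scenario.

IV. APPROXIMATE SCHEME

Finding the optimal policy is practically feasible only for a relatively small number of discrete values, which however corresponds to a rough quantization. Therefore, in this section we propose a method which is based on the characteristics of the original solution but is faster to compute and achieves approximately the same performance as OP. This is particularly useful to characterize the system performance and identify the system trade-offs.

Even with the simplifications introduced in Section III, the main challenge is to perform the policy improvement step, i.e., solving (24) for all system states. To manage this problem, several different approximate techniques have been proposed in the literature so far. An interesting idea is to approximate the function $K$ with another one that is simpler to compute. We follow this approach in the remainder of this section, and derive an Approximate Value Iteration Algorithm (App-VIA) (see [35, Sec. 6.5]).

A. Approximate Value Iteration

In the classic VIA, the optimal policy is derived by iteratively solving (24) until the cost-to-go function converges. In the approximate approach, we modify every iteration of VIA according to the following two steps:

1) compute $\tilde{K}^{(k)}(\mathbf{b})$ for every $\mathbf{b} \in \tilde{\mathcal{B}}^{(k)}$ by performing the policy improvement step (Eq. (24)), with $\tilde{\mathcal{B}}^{(k)} \subseteq \mathcal{B}$. The superscript $(k)$ denotes the $k$-th iteration of VIA and $\mathcal{B}$ is the set of all battery levels;
2) interpolate $\tilde{K}^{(k)}(\mathbf{b})$ for every $\mathbf{b} \in \mathcal{B} \setminus \tilde{\mathcal{B}}^{(k)}$ using the values of $\tilde{K}^{(k)}$ computed in the previous step.

The advantage is that the policy improvement is performed only for a subset $\tilde{\mathcal{B}}^{(k)}$ rather than for every battery level in $\mathcal{B}$. See Figure 1 for a graphical interpretation. A black circle means that $\mathbf{b} \in \tilde{\mathcal{B}}^{(k)}$. In the last case, all the battery levels are in $\tilde{\mathcal{B}}^{(k)}$, i.e., $\tilde{\mathcal{B}}^{(k)} = \mathcal{B}$. In general, $\tilde{\mathcal{B}}^{(k)}$ can dynamically change in every step of the algorithm in a deterministic, stochastic or simulation-based way. We further discuss our approach in the numerical evaluation section.

Figure 1: Different sets $\tilde{\mathcal{B}}^{(k)}$ when $b_{1,\max} = b_{2,\max} = 9$. Rows and columns correspond to $b_1$ and $b_2$, respectively.

We now discuss in more detail the two previous points. The policy improvement step becomes, for every $\mathbf{b} \in \tilde{\mathcal{B}}^{(k+1)}$,

$$\tilde{K}^{(k+1)}(\mathbf{b}) = \sum_{\mathbf{g},\mathbf{h}} f(\mathbf{g},\mathbf{h}) \max_{a \in A(\mathbf{b},\mathbf{g},\mathbf{h})} \Big\{ r_\alpha(\boldsymbol{\tau},\boldsymbol{\rho} \mid \mathbf{h}) + \hat{K}^{(k)}(\mathbf{b}') \Big\}, \tag{25}$$

where $\mathbf{b}'$ is defined according to (14)-(15). $\tilde{K}^{(k+1)}$ represents the approximate value function at step $k+1$ and is defined only in the subset $\tilde{\mathcal{B}}^{(k+1)}$, whereas $\hat{K}^{(k)}$ is such that

$$\hat{K}^{(k)}(\mathbf{b}) = \tilde{K}^{(k)}(\mathbf{b}), \quad \text{if } \mathbf{b} \in \tilde{\mathcal{B}}^{(k)}. \tag{26}$$

In the second phase of the algorithm, for all $\mathbf{b} \notin \tilde{\mathcal{B}}^{(k)}$, $\hat{K}^{(k)}(\mathbf{b})$ is derived exploiting (26) with an interpolation process or using a mean square error approximation. In practice, $\hat{K}^{(k)}$ is designed in order to approximate the true function $K^{(k)}$. We remark that $\tilde{K}^{(k+1)}$ is defined only in $\tilde{\mathcal{B}}^{(k+1)}$, whereas $\hat{K}^{(k)}$ is defined for every $\mathbf{b} \in \mathcal{B}$.

B. Convergence Properties

In the following we show that, provided that the approximation $\hat{K}^{(k)}$ is sufficiently good, the long-term reward of App-VIA is a good approximation of that of VIA. First, we introduce the notation $T(\cdot)$ as follows. Define the two sets $K^{(k)} \triangleq \{K^{(k)}(\mathbf{b}),\ \forall \mathbf{b} \in \mathcal{B}\}$ and $\hat{K}^{(k)} \triangleq \{\hat{K}^{(k)}(\mathbf{b}),\ \forall \mathbf{b} \in \mathcal{B}\}$. Then, Equations (24) and (25) can be written as

$$K^{(k+1)}(\mathbf{b}) = T\big(K^{(k)}, \mathbf{b}\big), \quad \forall \mathbf{b} \in \mathcal{B}, \tag{27}$$
$$\tilde{K}^{(k+1)}(\mathbf{b}) = T\big(\hat{K}^{(k)}, \mathbf{b}\big), \quad \forall \mathbf{b} \in \tilde{\mathcal{B}}^{(k+1)}, \tag{28}$$

respectively. Also, assume that the initial configurations are equal, i.e., $K^{(0)} = \hat{K}^{(0)}$. Note that $K^{(k+1)}$ is evaluated for every $\mathbf{b}$, whereas we compute $\tilde{K}^{(k+1)}$ only in the subset $\tilde{\mathcal{B}}^{(k+1)}$.

Proposition 1. After $N$ iterations, the cost-to-go functions of App-VIA and VIA differ by at most $N\epsilon$, i.e.,⁶

$$\big\| \underbrace{K^{(N)}}_{\text{VIA}} - \underbrace{\hat{K}^{(N)}}_{\text{App-VIA}} \big\| \le N\epsilon \tag{29}$$

with

$$\epsilon \triangleq \max_{k = 0, \ldots, N-1}\ \max_{\mathbf{b} \in \mathcal{B}} \Big| \hat{K}^{(k+1)}(\mathbf{b}) - T\big(\hat{K}^{(k)}, \mathbf{b}\big) \Big|. \tag{30}$$

Proof: See Appendix A.

We first remark that, because of (30), Proposition 1 describes a worst case analysis.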
$N$ corresponds to the number of iterations of VIA and, in our problem, it can be numerically verified that $N$ is typically small, e.g., N 1. The previous proposition provides a bound on the algorithm performance and guarantees convergence, provided that the approximation of $K^{(k+1)}$ is sufficiently good.

⁶ We adopt the notation $\| K^{(N)} - \hat{K}^{(N)} \| \triangleq \max_{\mathbf{b} \in \mathcal{B}} \big| K^{(N)}(\mathbf{b}) - \hat{K}^{(N)}(\mathbf{b}) \big|$.
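The two steps of App-VIA (Bellman update on a subset of battery levels, then interpolation over the remaining ones) can be illustrated on a toy problem. Everything specific in the sketch below is an assumption made only for illustration and is not the model of this paper: the per-slot reward is $\log(1 + h e_1) + \log(1 + h e_2)$, each battery harvests exactly one energy quantum per slot, the channel takes two values with given probabilities, the subset is a regular grid with a fixed stride (which must divide `bmax`), and the names `bellman`, `interpolate` and `app_via` are hypothetical.

```python
import numpy as np

def bellman(K, b1, b2, bmax, gains, probs, harvest=1):
    """Toy operator T(K, b): expected best (reward + value of next state)."""
    total = 0.0
    for h, p in zip(gains, probs):
        best = -np.inf
        for e1 in range(b1 + 1):          # energy quanta spent by device 1
            for e2 in range(b2 + 1):      # energy quanta spent by device 2
                n1 = min(b1 - e1 + harvest, bmax)
                n2 = min(b2 - e2 + harvest, bmax)
                r = np.log1p(h * e1) + np.log1p(h * e2)
                best = max(best, r + K[n1, n2])
        total += p * best
    return total

def interpolate(values, coarse, bmax):
    """Bilinear interpolation from the coarse grid to all levels 0..bmax."""
    full = np.arange(bmax + 1)
    rows = np.array([np.interp(full, coarse, v) for v in values])
    return np.array([np.interp(full, coarse, rows[:, j]) for j in full]).T

def app_via(bmax, stride, n_iter, gains, probs):
    """Run n_iter approximate iterations; stride=1 recovers plain VIA."""
    coarse = np.arange(0, bmax + 1, stride)       # subset of battery levels
    K_hat = np.zeros((bmax + 1, bmax + 1))        # defined on every level
    for _ in range(n_iter):
        # Step 1: Bellman update only on the coarse subset.
        K_tilde = np.array([[bellman(K_hat, b1, b2, bmax, gains, probs)
                             for b2 in coarse] for b1 in coarse])
        # Step 2: fill in the other battery levels by interpolation.
        K_hat = interpolate(K_tilde, coarse, bmax)
    return K_hat
```

With `stride=1` every battery level belongs to the subset and the routine coincides with the exact iteration, so comparing `app_via(bmax, 1, N, ...)` with `app_via(bmax, s, N, ...)` for `s > 1` gives a direct numerical view of the gap bounded in Proposition 1.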

V. TRADITIONAL SCHEME

In the literature, the main focus so far has been on the optimization in a single time slot, which we briefly report in this section for the sake of completeness. In particular, we consider the harvest-then-transmit scheme, in which all the energy harvested in a slot is used for transmission in the same slot. If $C_1$ and $C_2$ joules of energy are transferred at the beginning of the slot, in the uplink transmission phase $D_i$ is subject to the following constraint

$$E_i \le \min\{C_i, B_{i,\max}\}, \tag{31}$$

i.e., it cannot consume more energy than what it received in the same slot nor can it exceed the maximum battery size. The optimization variable is a tuple of 7 elements. Formally, the optimization problem is

$$\max_{\boldsymbol{\tau},\, \tau_{AP},\, \boldsymbol{\rho},\, \mathbf{Q}} \; \min\{\tau_1 R(\rho_1, h_1),\ \tau_2 R(\rho_2, h_2)\}, \tag{32a}$$

s.t.:

$$\tau_1 + \tau_2 + \tau_{AP} \le T, \tag{32b}$$
$$Q_1 + Q_2 \le Q_{\max}, \tag{32c}$$
$$\tau_i \rho_i \le \tau_{AP}\, \eta\, Q_i g_i, \quad i = 1, 2, \tag{32d}$$
$$\tau_i \rho_i \le B_{i,\max}, \quad i = 1, 2, \tag{32e}$$
$$\tau_i \ge 0, \quad \tau_{AP} \ge 0, \quad P_{i,\min} \le \rho_i \le P_{i,\max}, \quad Q_i \ge 0, \quad i = 1, 2. \tag{32f}$$

As in Section III, we solve separately the trivial cases ($h_i = 0$, $g_i = 0$, $\rho_i = 0$). The solution of (32) is given in Proposition 2. Constraints (32e)-(32f) identify the feasible region. In the following, $\bar{i} = 1$ if $i = 2$ and vice-versa.

Proposition 2. The optimal $\boldsymbol{\rho}^\star$ (solution of Problem (32)) can be derived as follows (the other parameters are obtained according to Equations (46)-(48) in Appendix B). Name $\tilde{\rho}_i$ the solution of

$$\eta g_i Q_{\max} + \tilde{\rho}_i = \left(\frac{\sigma^2}{h_i} + \tilde{\rho}_i\right) \log\left(1 + \frac{h_i}{\sigma^2}\tilde{\rho}_i\right). \tag{33}$$

If $\tilde{\boldsymbol{\rho}}$ and the corresponding $\boldsymbol{\tau}$, $\mathbf{Q}$, $\tau_{AP}$ lie in the feasible region, then $\boldsymbol{\rho}^\star = \tilde{\boldsymbol{\rho}}$; otherwise the optimal solution lies on the boundary of the feasible region.

Proof: See Appendix B.

Exploiting the results of the previous proposition, we can derive the optimal reward achieved in a single slot. By averaging over the channel gains, we obtain the corresponding long-term throughput

$$G_\sigma \triangleq \sum_{\mathbf{g},\mathbf{h}} f(\mathbf{g},\mathbf{h})\, \tau_1^\star R(\rho_1^\star, h_1) = \sum_{\mathbf{g},\mathbf{h}} f(\mathbf{g},\mathbf{h})\, \tau_2^\star R(\rho_2^\star, h_2), \tag{34}$$

where $\sigma$ is the slot-oriented policy which solves (32). In the numerical evaluation we will compare $G_\sigma$ and $G_\mu$.
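Equation (33) has no closed form in general, but its left-hand side minus its right-hand side is monotone in $\tilde{\rho}_i$, so a simple root finder suffices. The sketch below assumes the natural logarithm in (33) and uses bisection with a doubling bracket; the function names `residual_33` and `solve_rho` are hypothetical, and the feasibility check of Proposition 2 (e.g., $P_{i,\min} \le \tilde{\rho}_i \le P_{i,\max}$) is deliberately left out.

```python
import math

def residual_33(rho, h, g, eta, q_max, sigma2):
    """Right-hand side minus left-hand side of (33); zero at the solution."""
    return ((sigma2 / h + rho) * math.log1p(h * rho / sigma2)
            - rho - eta * g * q_max)

def solve_rho(h, g, eta, q_max, sigma2, iters=200):
    """Solve (33) for rho > 0, assuming h, g, eta, q_max, sigma2 > 0.

    The residual equals -eta*g*q_max < 0 at rho = 0 and its derivative is
    log(1 + h*rho/sigma2) > 0, so the root is unique.
    """
    lo, hi = 0.0, 1.0
    while residual_33(hi, h, g, eta, q_max, sigma2) < 0:
        hi *= 2.0                      # enlarge the bracket until sign change
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual_33(mid, h, g, eta, q_max, sigma2) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, `solve_rho(h_i, g_i, eta, q_max, sigma2)` returns the candidate $\tilde{\rho}_i$, which then has to be checked against the feasible region before being taken as $\rho_i^\star$.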
Note that, differently from $\mu$, the slot-oriented strategy is much simpler to compute but provides lower reward.

A. Low-SNR Regime

In this section we provide additional details for the low-SNR regime in the case $\boldsymbol{\rho}^\star = \tilde{\boldsymbol{\rho}}$. Equation (33) can be solved in closed form as

$$\rho_i^\star = \sqrt{\frac{\eta g_i Q_{\max} \sigma^2}{h_i}}. \tag{35}$$

Note that the optimal transmission power of device $i$ depends upon its own parameters only. If the downlink channel gain increases, more energy is harvested, therefore a higher transmission power can be used. Interestingly, the better the uplink channel gain $h_i$, the lower the transmission power. The corresponding $Q_i^\star$ can be derived using Equation (48)

$$Q_i^\star = \frac{g_{\bar{i}}\, h_{\bar{i}}\, Q_{\max}}{g_1 h_1 + g_2 h_2}. \tag{36}$$

In order to balance the system performance, $Q_i^\star$ decreases if $g_i$ or $h_i$ increases. In this case, it is better to allocate fewer resources to the node with a better channel and direct more energy to the other node. A closed form expression for the reward in a single slot can be obtained. Starting from the equations of $\boldsymbol{\tau}^\star$, $\boldsymbol{\rho}^\star$ and $\mathbf{Q}^\star$, we have

$$\tau_1^\star R(\rho_1^\star, h_1) = \tau_2^\star R(\rho_2^\star, h_2) = \frac{\eta\, g_1 g_2 h_1 h_2\, \frac{Q_{\max}}{\sigma^2}\, T}{g_2 h_2 \left(\sqrt{\eta g_1 h_1 \frac{Q_{\max}}{\sigma^2}} + 1\right) + g_1 h_1 \left(\sqrt{\eta g_2 h_2 \frac{Q_{\max}}{\sigma^2}} + 1\right)}, \tag{37}$$

which represents the highest reward that can be achieved in a single slot. The long-term reward can be obtained by combining the previous expression with (34), which can be easily solved numerically.

VI. NUMERICAL RESULTS

We study how the achievable rate changes as a function of the system parameters in different scenarios. As in [3], [28], we assume channel reciprocity for uplink and downlink, thus $g_i = h_i$ in every slot (however, we remark that our model is general and can be easily adapted to other cases). We consider an exponential random variable with unit mean for $\theta_i$ (Rayleigh fading) to model non line-of-sight links, or Nakagami fading with parameter 5 when a strong line-of-sight component is present. We explicitly consider energy conversion losses by setting $\eta = 0.8$.
If not otherwise stated, we use the following parameters: $h_{0,1} = h_{0,2} = 1.25 \cdot 10^{-3}$, $\gamma_1 = \gamma_2 = 2$ (path loss exponents), $\sigma^2 = -155$ dBm/Hz (noise power), a bandwidth of 1 MHz, $T = 5$ ms (slot duration), $Q_{\max} = 3$ W (maximum transfer power), $P_{1,\min} = P_{2,\min} = 1$ mW and $P_{1,\max} = P_{2,\max} = 1$ mW. The battery sizes are important parameters which influence the performance of the system. In particular, since with large batteries the throughput of the system saturates, we choose to focus on the case of small batteries, i.e., $B_{\max} \triangleq B_{1,\max} = B_{2,\max} \in \{0.1, \ldots, 1\}$ mJ [36]. In Figure 2 we depict the slot division (obtained by averaging all the quantities with the steady-state probabilities) with

and without throughput fairness when $d_1 = 1$ m and $d_2 = 3$ or $5$ m.

Figure 2: Average transmission powers $\rho_1$, $\rho_2$ [W], transfer powers $Q_1/Q_{\max}$, $Q_2/Q_{\max}$ [%] and durations $\tau_1$, $\tau_2$, $\tau_{AP}$ with ($\alpha = \bar{\alpha}$) and without ($\alpha = 0.5$) throughput balancing when $d_1 = 1$ m and $d_2 = 3$ or $5$ m. Panels, in order: $d_2 = 3$, $\alpha = 0.5$; $d_2 = 3$, $\alpha = \bar{\alpha}$; $d_2 = 5$, $\alpha = 0.5$; $d_2 = 5$, $\alpha = \bar{\alpha}$.

The first figure is obtained by setting $\alpha = 0.5$, i.e., the objective function is the unweighted sum of the rewards of the two devices. Since $D_1$ is closer to AP and experiences, on average, a better channel, it spends more time transmitting. Moreover, even if $Q_1 < Q_2$, $D_1$ harvests much more energy than $D_2$ on average. While this scheme achieves the maximum system sum-throughput, it does not ensure fairness. In particular, the throughput of $D_1$ is 0.88 Mbps, whereas the throughput of $D_2$ turns out to be only 0.34 Mbps. It is also worth noting that $D_2$ does not contribute much to the global performance, but a lot of resources are used to feed it ($Q_2 \gg Q_1$). $Q_1$ is smaller than $Q_2$ because the downlink channel of $D_1$ is better and thus the first device harvests much more energy. When $d_2$ increases (as in the third plot of Figure 2), the transmission duration of $D_2$ and its harvested energy become much lower. In this case, $D_2$ is so far from AP with respect to $D_1$ that it is not worth using a lot of resources to increase its throughput.

Instead, the second plot of Figure 2 is obtained at the end of the algorithm described in Section III-A, i.e., for $\alpha$ equal to $\bar{\alpha} = 0.91$. With this policy, fairness is achieved and the throughput of the two devices $G_{1,\mu} = G_{2,\mu}$ is 0.47 Mbps (which, as expected, results in a smaller sum-throughput than in the unbalanced case).
Note that to achieve this situation and to compensate for the doubly near-far effect, $D_2$ must receive much more energy and transmit with much more power than $D_1$. This phenomenon is emphasized in the last plot, in which 90% of the transfer power is devoted to $D_2$.

We remark that we used a discrete model to approximate the continuous nature of the energy stored in the batteries (see Section II-C), thus $b_{1,\max}$ and $b_{2,\max}$ play a key role in the computation of $\mu$. In particular, for larger batteries higher $b_{1,\max}$ and $b_{2,\max}$ are required, incurring additional numerical complexity, whereas for small batteries the quantization can be coarser. Nevertheless, even with small batteries, computing the optimal policy $\mu$ with PIA or VIA is a computationally intensive task. Therefore, in the following we present our results using the approximate App-VIA scheme introduced in Section IV. To assess the quality of our approximation, consider Figure 3, where we depict the throughput as a function of the distance $d_1$ for several different battery sizes. It can be seen that App-VIA closely approaches the optimal schemes, especially if the battery sizes are small.

Figure 3: Long-term reward of $\mu$ (evaluated with PIA/VIA and App-VIA) and of $\sigma$ as a function of $d_1$ when $d_2 = 3$ m and with Rayleigh fading.

In our numerical evaluation we derived $\tilde{\mathcal{B}}^{(k)}$ as shown in Figure 4 (see the black circles). The left figure represents the optimal cost-to-go function $K(\mathbf{b})$, i.e., Problem (24) has been solved for every pair $(b_1, b_2)$, whereas the right plot represents its approximation $\hat{K}(\mathbf{b})$ defined in Section IV. $\hat{K}(\mathbf{b})$ is obtained with a linear interpolator.

Figure 5 represents the throughput region of $D_1$ and $D_2$, obtained by changing $\alpha$ in $(0, 1)$. Blue circles represent the fair-throughput optimal points, whereas the red crosses are the sum-throughput optimal points.
They coincide only in the

symmetric cases $d_1 = d_2$. Otherwise, to balance the system performance, part of the throughput of one of the two devices has to be reduced. Abscissa [ordinate] points are obtained when $\alpha = 1$ [$\alpha = 0$], i.e., $D_2$ [$D_1$] is completely neglected.

Similar curves are depicted in Figure 6, where we compare Rayleigh and Nakagami fading. Even if on average the channel gains are the same in the two scenarios, when a strong line-of-sight component is present (as in Nakagami fading), better performance can be achieved because 1) it becomes easier to predict the future energy arrivals and thus to correctly manage the available energy, and 2) the system approaches the deterministic energy arrivals case, which represents an upper bound for the energy harvesting scenarios [17].

Figure 4: Cost-to-go function $K(\mathbf{b})$ (left) and its approximation $\hat{K}(\mathbf{b})$ (right). Black circles represent $\tilde{\mathcal{B}}^{(k)}$.

Figure 5: Long-term rewards $G_1$ and $G_2$ of $\mu$ and $\sigma$ when $d_2 = 3$ m and with Rayleigh fading.

Figure 6: Long-term rewards $G_1$ and $G_2$ of $\mu$ and $\sigma$ with Rayleigh and Nakagami fading when $d_2 = 3$ m and $B_{\max} = 0.15$ mJ.

Figure 7: Long-term reward of $\mu$ and $\sigma$ as a function of $E_{\max}$ when $d_2 = 3$ m.

In Figure 7 we plot the long-term reward as a function of the battery size of the first device, both for $\mu$ and $\sigma$. When $B_{\max}$ is very small, the batteries represent the system bottleneck because $D_1$ and/or $D_2$ are not able to store and use all the incoming energy.
As the battery sizes grow, the performance of the system saturates because the energy available at the access point, $Q_{\max}$, is limited. The throughput difference between low and high $B_{\max}$ is larger when $d_1$ is small because, when the battery of $D_1$ is small, the device is not able to fully exploit its channel potential, which in turn can be fully used with larger batteries. Some artifacts can be noticed (e.g., at $B_{\max} = 0.225$ mJ for the curve $d_1 = 3$ m) because we are using App-VIA and not the real optimal policy, whose throughput strictly increases with the battery sizes.

Finally, we describe how the throughput changes as a function of the distance of $D_1$ from AP. Figures 8, 9 and 10 are obtained in the high transmission power regime, i.e., $P_{1,\min} = P_{2,\min} = 1$ mW and $P_{1,\max} = P_{2,\max} = 1$ mW, whereas Figure 11 is determined in the low transmission power regime, i.e., with $P_{1,\min} = P_{2,\min} = 0.1$ mW and $P_{1,\max} = P_{2,\max} = 0.5$ mW. When $d_2$ is small (Figure 8), the difference between the slot-oriented and the long-term approaches is smaller because a lot of energy is available at the two devices, thus even an inefficient use of it leads to high performance. Instead, as $d_2$ increases (see Figure 10), the difference between the two approaches is significant, and this supports the need for a long-term optimization. As expected, in all cases the throughput decreases as $d_1$ increases. This is particularly emphasized when $d_2$ is small because,

since it is farther from AP, $D_1$ represents the performance bottleneck. Differently, when $d_2 = 5$ m, $D_2$ is the bottleneck, thus the system performance shows a weak dependence on the distance of $D_1$ from AP.

The differences between high and low transmission power regimes can be seen by comparing Figures 9 and 11. It can be seen that with lower transmission powers it is possible to achieve higher rewards. Indeed, in the analyzed scenario the distances are small, thus the uplink SNR is high even for low transmission powers. Therefore, because of the concavity of the reward function in Equation (1), with lower transmission powers it may be possible to achieve high throughput while consuming less energy, leading to an overall improvement of the system performance.

Figure 8: Long-term reward of $\mu$ and $\sigma$ as a function of $d_1$ with high transmission powers when $d_2 = 1$ m.

Figure 9: Long-term reward of $\mu$ and $\sigma$ as a function of $d_1$ with high transmission powers when $d_2 = 3$ m.

Figure 10: Long-term reward of $\mu$ and $\sigma$ as a function of $d_1$ with high transmission powers when $d_2 = 5$ m.

Figure 11: Long-term reward of $\mu$ and $\sigma$ as a function of $d_1$ with low transmission powers when $d_2 = 3$ m.

VII. CONCLUSIONS

In this work we studied the long-term throughput optimization in a wireless powered communication network composed of an access point and two distributed devices. The system alternates a downlink phase, in which AP recharges the batteries of the nodes via an RF-WET mechanism, and an uplink phase, in which both devices transmit data toward AP in a TDMA fashion. We explained how to solve the long-term throughput maximization problem optimally and approximately while explicitly considering the battery evolution and the channel state information. We simplified the optimization by exploiting the structure of Bellman's equation. Finally, we compared the long-term approach with the slot-oriented one and observed that the traditional schemes proposed in the literature are strongly sub-optimal, so that a long-term approach is needed in order to achieve high performance.

As part of our future work we would like to 1) extend the model of the system in order to consider partial CSI, storage losses or circuitry costs, 2) compare our results with those obtained using a distributed approach, and 3) extend the long-term optimization to the case with a generic number of nodes.

APPENDIX A
PROOF OF PROPOSITION 1

The proof is by induction on $N$. If $N = 0$, (29) holds because $K^{(0)} = \hat{K}^{(0)}$. Then, assume that (29) holds for some $N$. The inductive step is as follows

$$\big\| K^{(N+1)} - \hat{K}^{(N+1)} \big\| = \max_{\mathbf{b} \in \mathcal{B}} \big| K^{(N+1)}(\mathbf{b}) - \hat{K}^{(N+1)}(\mathbf{b}) \big| \tag{38a}$$
