IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY

Optimal Energy Management Policy of Mobile Energy Gateway


Yang Zhang, Dusit Niyato, Senior Member, IEEE, Ping Wang, Senior Member, IEEE, and Dong In Kim, Senior Member, IEEE

Abstract—With the advancement of wireless energy harvesting and transfer technologies, e.g., radio frequency (RF) energy, mobile nodes are fully untethered as energy supply becomes more ubiquitous. The mobile nodes can receive energy from wireless chargers, which can be static or mobile. In this paper, we introduce the use of a mobile energy gateway that can receive energy from a fixed charging facility, as well as move and transfer energy to other users. The mobile energy gateway aims to maximize its utility by optimally taking energy charging/transferring actions. We formulate the optimal energy charging/transferring problem as a Markov decision process (MDP). The MDP model is then solved to obtain the optimal energy management policy for the mobile energy gateway. Furthermore, the optimal energy management policy obtained from the MDP model is proven to have a threshold structure. We conduct an extensive performance evaluation of the MDP-based energy management scheme. The proposed MDP-based scheme outperforms several conventional baseline schemes in terms of expected overall utility.

Index Terms—Markov decision process (MDP), mobile energy gateway, wireless charging.

I. INTRODUCTION

RADIO FREQUENCY (RF) energy is one of the wireless energy harvesting and transfer techniques that support far-field wireless charging services. The other techniques are inductive coupling and magnetic resonance coupling, which are near-field charging techniques. In RF-based wireless charging, an RF signal is used as a carrier to transfer energy from a source (e.g., a wireless charger) to a consumer (e.g., a user).
RF-based wireless charging can support mobile networks composed of energy-constrained nodes and devices. This will help improve not only the energy efficiency but also the performance of the networks. The efficiency of RF-based wireless charging depends largely on the distance between the charger and the charging device. Traditionally, wireless charging is deployed at fixed chargers with a constant power supply (e.g., from a power outlet). However, in many situations (e.g., sensor networks), wireless charging can be used by a mobile energy gateway that moves and reaches other charging devices [1]. Such a mobile energy gateway improves the energy replenishment process for various mobile and wireless networks, particularly when the charging devices cannot or rarely visit a fixed wireless charging facility. However, there are a few issues when the mobile energy gateway is deployed, e.g., optimal deployment, path planning, and energy management.

Manuscript received December 26, 2014; revised April 5, 2015; accepted June 5, 2015. This work was supported in part by the Singapore Ministry of Education (MOE) under Tier-1 Grants RG18/13 and RG33/12 and Tier-2 Grant MOE2014-T ARC 4/15 and in part by the National Research Foundation of Korea funded by the Korean government (MSIP) under Grant 2014R1A5A. The review of this paper was coordinated by Dr. P. Lin. Y. Zhang, D. Niyato, and P. Wang are with the School of Computer Engineering, Nanyang Technological University, Singapore (e-mail: yzhang28@e.ntu.edu.sg; dniyato@ntu.edu.sg; wangping@ntu.edu.sg). D. I. Kim is with the School of Information and Communication Engineering, Sungkyunkwan University (SKKU), Suwon, Korea (e-mail: dikim@skku.ac.kr). Color versions of one or more of the figures in this paper are available online.

We consider a mobile network with the mobile energy gateway.
Unlike most existing works, which assume that the mobility of the energy gateway¹ can be adjusted and its path can be optimized, we consider an energy gateway with noncontrollable mobility. For example, the energy gateway may be attached to a vehicle (e.g., a bicycle or trolley) or carried by a human. The energy gateway works as a self-interested carrier (i.e., an agent) of energy between energy sources and users. It is equipped with RF charging capability, as well as energy storage, e.g., a battery. The fixed chargers and users are geographically distributed at different locations in the network. The energy gateway moves among the locations randomly, pays for the energy received from the chargers, and receives payment from users when energy is transferred. Therefore, the energy gateway aims to maximize its profit by strategically charging and transferring energy at different locations.

To address the problem of energy charging/transferring actions of the energy gateway, we propose a Markov decision process (MDP)-based scheme. By employing the MDP-based scheme, the energy gateway decides whether or not to pay for and receive energy from the fixed charger. Likewise, when the energy gateway meets users (i.e., the users are in the energy transfer range of the energy gateway), it decides whether or not to transfer energy to the users. There could be multiple independent energy gateways in the system. Each energy gateway makes the energy charging/transferring decisions rationally from its own perspective, and thus, the system operates in a distributed manner. The energy charging/transferring decision, which is referred to as a policy, is made based on the states of the energy gateway.
The states are defined as the location, the energy level of the battery, the users in the neighborhood, and the current prices of energy at the fixed chargers.

¹We use mobile energy gateway and energy gateway interchangeably in the rest of this paper.

The contributions of this paper are summarized as follows.

- We propose the concept of a self-interested energy gateway, which is equipped with RF charging capability. The energy gateway acts as an energy carrier to assist the chargers in extending the energy transmission range to remote users.
- We design an MDP-based scheme for the energy gateway to obtain the energy management policy. The optimal performance is achieved in terms of the maximized utility of the energy gateway.
- We study the structure of the optimal energy charging/transferring policy. In particular, we prove that the optimal policy obtained from the MDP-based scheme has a threshold structure with respect to the system states.
- We present an extensive performance evaluation of the MDP-based scheme. We demonstrate that the MDP-based scheme outperforms several baseline schemes for energy management actions. This is due to the fact that the MDP-based scheme takes both the current and the future system states into account.

The rest of this paper is organized as follows. We review related work in Section II. In Section III, we describe the mobile network with energy sources (i.e., fixed chargers) and an energy gateway to carry and transfer energy to users. The RF energy propagation model is presented, and a payment scheme of users is described. Section IV formulates an MDP to maximize the energy gateway's expected utility. The solution method and the existence of threshold policies of the MDP are presented in Section V. Numerical results are provided in Section VI. Finally, Section VII concludes this paper.

II. RELATED WORK

A. RF Energy Harvesting

Recently, RF energy harvesting techniques have been introduced to sustain the operation of wireless devices when wired charging or battery replacement is too costly or practically infeasible.
For example, body area wireless devices for human health monitoring [2], sensors inside civil infrastructures [3], and monitoring devices in airframes [4] can benefit from RF energy harvesting techniques. Numerous applications and research works related to RF energy harvesting were reviewed in [5] and [6].

Nishimoto et al. in [7] developed a prototype of a wireless sensor network with RF energy harvesting capability. The sensors harvest ambient RF energy from far-field (6.6 km) TV broadcast signals. Popovic et al. in [8] also implemented and studied similar RF energy harvesting sensor networks. The experiments in [7] and [8] showed that, with advanced antenna and circuit designs, the harvested RF power is able to support sensor applications. Ostaffe in [9] showed that a mobile phone emitting 0.5 W can charge a device at distances of 1, 5, and 10 m with power densities of 40 mW/m², 1.6 mW/m², and 400 μW/m², respectively. RF energy harvesting to support cognitive radio systems was discussed in [10]. Simultaneous wireless information and power transfer was proposed in [11], which allows the existing wireless network architecture to support RF energy transfer without much modification.

RF energy may be harvested from two types of sources, i.e., ambient sources (e.g., a TV tower [7]) and dedicated sources. For example, in [12] and [13], mobile chargers were deployed to power wireless sensors. The Powercaster transmitters [14], which operate with a transmit power of 1 W/3 W, and the Powerharvester receivers [14], which harvest 6 dBm/ dBm, were also developed as commercialized devices to utilize RF energy.

B. Performance Modeling and Optimization for RF Energy Harvesting

The MDP was introduced as an online optimization approach for energy harvesting in communication systems [15].
For example, in a body sensor network where each sensor node carries a rechargeable battery and an associated energy harvesting facility [16], an MDP model was formulated for the sensors to choose different transmission modes in different system states, achieving optimal energy efficiency in terms of the maximized number of successfully reported health events under constrained energy. The actions of taking different transmission modes are associated with different energy consumption levels and data rates. The system states include the energy level, the health event to be transmitted, and the energy harvesting state, where harvesting RF energy from an ambient source is modeled as a correlated two-state process. Similarly, in [17], Sultan formulated an MDP model to determine an action of sensing/being idle for an RF-energy-powered secondary user in cognitive radio networks. Energy is randomly harvested by the secondary user. An MDP model was developed in [18] to optimize the mean delay of data transmission of a sensor node. To achieve maximized throughput under quality-of-service (QoS) requirements, Niyato et al. [19] studied an MDP-based scheme for a mobile user to balance between energy charging and data transmission.

Users may be far away from RF sources and cannot receive RF energy from wireless chargers [20], [21]. To overcome the transmission range limitation, dedicated energy transmitters, e.g., chargers and relays, were proposed in the literature to move and disseminate RF energy in different areas of the system, so that RF energy can be transferred to areas without RF energy sources. For example, to charge radio frequency identification (RFID) tags with RF energy, He in [22] proposed the optimal placement of stationary RFID readers, which also supply RF energy.
The objective is to minimize the number of readers, given QoS requirements. Erol-Kantarci and Mouftah in [12] and Shi et al. in [23] introduced the idea of using mobile chargers to travel to different locations and charge multiple sensors. Erol-Kantarci and Mouftah [12] proposed an optimal path for mobile chargers, which is obtained based on the shortest Hamiltonian cycle of the locations to visit. Erol-Kantarci and Mouftah in [13] extended the scheme in [12] by considering priorities of different sensors. An integer linear programming optimization model was applied to maximize the power received by the prioritized sensors.

To the best of our knowledge, the works in the literature related to RF energy harvesting in mobile networks did not consider the perspective of a mobile energy transmitter (i.e.,

a mobile energy gateway), which is designed to carry energy from sources to energy users, aiming at maximizing the profit in terms of utility or monetary reward; this is the main issue that is the focus of this paper.

III. SYSTEM MODEL

[Fig. 1. System description.]

We consider the mobile network with fixed wireless chargers (i.e., energy suppliers), an energy gateway (i.e., an energy messenger), and users (i.e., energy consumers), as shown in Fig. 1. There are multiple fixed chargers at different locations in the network. The energy gateway moves among locations, visiting chargers and users. The energy gateway is equipped with a battery and energy transfer interfaces. When the energy gateway is at a charger, a certain amount of energy can be transferred to the battery of the energy gateway. Then, a certain price has to be paid by the energy gateway to the charger. The energy price of different chargers can be different and time varying. We assume that the energy gateway visits only one charger at a time, and thus, it receives energy only from the corresponding charger. By contrast, when the energy gateway is at the users, a certain amount of energy may be transferred to the nearby users.

The energy transfer to the users is performed in a broadcast manner (i.e., multiple users can receive energy simultaneously), which is a typical property of RF energy transfer. The users are assumed to be geographically distributed following a Poisson spatial distribution [24]. A user receives RF energy from the energy gateway and pays a retail energy price to the energy gateway. Note that the amount of energy transferred from the charger to the energy gateway can be different from, and is usually higher than, that transferred from the energy gateway to the users.

Based on the given system model, we aim to design an energy management scheme for the energy gateway. The scheme assists the energy gateway in deciding whether to receive energy from the charger and whether to transfer energy to the users. The decision is based on the system states, which are defined based on the location, the energy prices at chargers, and the number of users that are able to receive energy from the energy gateway. To make the model tractable, we assume that the decision making by the energy gateway is time slotted, and each time slot is called a decision period. The length of a decision period is set to allow the energy to be charged or transferred, which depends on the implementation of the system. We consider that only one decision is made in one decision period.

Note that energy pricing (i.e., from chargers to the energy gateway and from the energy gateway to users) is beyond the scope of this paper. We assume that the set of prices is predetermined. Finding optimal prices is a separate issue that could be studied in future work.

A. RF Energy Propagation Model and User Payment

The energy gateway transfers energy to users by the RF energy transfer technique. With the energy successfully received, the user makes a payment to the energy gateway (e.g., in the form of a real money transaction or a fictitious token). Due to path loss, the amount of energy received by different users may vary. We assume that, if the received energy does not exceed the demand of a user, the payment of the user will be based on the actual energy received.

We assume that the energy transferring time is fixed and less than a time slot, depending on the property of the energy gateway. Signals will be sent to the end users by the energy gateway to start and terminate the energy transfer. As a result, the durations of all the end users receiving energy are identical. The amount of energy (i.e., in joules) received by user n during the energy transfer duration can be expressed using the Friis formula [6] as follows:

$$ e_n^R = \zeta_{\mathrm{RF/DC}}\, G_t\, G_{r,n} \left(\frac{\lambda}{4\pi R_n}\right)^2 E_S \tag{1} $$

where ζ_RF/DC is the RF-to-DC energy conversion efficiency [7], G_t and G_{r,n} are the energy transmitting antenna gain of the energy gateway and the receiving antenna gain of user n, respectively, λ is the wavelength of the energy transfer signal, and R_n is the distance from the energy gateway to user n. For simplicity of notation, we let ζ_RF/DC G_t G_{r,n} (λ/4π)² = g, where g is a constant. From (1), the amount of energy received by any user is inversely proportional to the square of the distance to the energy gateway, given a fixed amount of transmitted energy E_S.

The Friis transmission formula requires that the distance R_n between the user and the energy gateway satisfy R_n > R_f, where R_f is the Fraunhofer distance satisfying the following conditions:

$$ R_f = \frac{2 D_a^2}{\lambda}, \qquad R_f \gg \lambda, \qquad R_f \gg D_a \tag{2} $$

where D_a is the largest dimension of the user's antenna. For users in the near field of the energy gateway, i.e., 0 ≤ R_n ≤ R_f, we assume that the energy can be transferred without loss. Let the maximum energy demand of user n be e_n^D. Then, the distance

$$ R_n^D = \sqrt{\frac{g E_S}{e_n^D}} \tag{3} $$

is a boundary distance: the demand of the user will not be satisfied for R_n > R_n^D and will be fully met otherwise.

We consider that the energy transfer is performed in a spherical spatial area (or a circular area in 2-D space). The area is

centered at the energy gateway with radius R, where R is the longest distance at which a user can receive any energy. The area is divided into the following subareas.

- When user n is at a distance 0 ≤ R_n < max{R_f, R_n^D}, the amount of received energy is larger than the demand of the user, i.e., e_n^R > e_n^D, and thus, the energy demand of the user will be fully met.
- For max{R_f, R_n^D} ≤ R_n ≤ R, the amount of energy e_n^R received by the user will not fully satisfy the energy demand e_n^D.
- For R_n > R, the user cannot receive any energy since the received energy becomes too low. The user is defined to be in the energy outage zone of the energy gateway.

The distance R is treated as a cutoff distance. All the users located in the energy outage zone will not be considered valid users by the energy gateway.

After receiving the energy amount e_n^R, user n will inform the mobile energy gateway of this information together with the payment of the energy price. We assume that the user always reports truthful information. As assumed above, the users are geographically distributed following a Poisson spatial distribution. The probability density function of the distance l between user n (out of a maximum of N users) and the energy gateway (i.e., the origin) is expressed as follows [24]:

$$ f(n, l \mid N) = \frac{3}{R}\,\frac{B\!\left(n+\frac{2}{3},\, N-n+1\right)}{B(N-n+1,\, n)}\;\beta\!\left(\frac{l^3}{R^3};\; n+\frac{2}{3},\; N-n+1\right) \tag{4} $$

where B(a, b) = Γ(a)Γ(b)/Γ(a + b), and Γ(·) is the Gamma function. β(x; y, z) = x^{y−1}(1−x)^{z−1}/B(y, z) is the Beta density function [24].

The expected payment R(n, E_S) made by user n to the energy gateway is obtained as follows:

$$ R(n, E_S) = \int_0^{\tilde{R}} f(n, l \mid N)\, r\!\left(e_n^D\right) dl + \int_{\tilde{R}}^{R} f(n, l \mid N)\, r\!\left(g \frac{E_S}{l^2}\right) dl \tag{5} $$

where R̃ = min{R, max{R_f, R_n^D}}. The first term in (5) indicates the total payment of the users whose demands are all fully satisfied.
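As an illustration, the propagation model (1), the boundary distance (3), the distance density (4), and the payment integral (5) can be evaluated numerically. The sketch below uses assumed parameter values (conversion efficiency, antenna gains, wavelength) and an assumed linear tariff r(e) = e, none of which are specified in the paper; the B-ratio and Beta density in (4) are collapsed algebraically into a single expression.

```python
import math

# Illustrative (assumed) constants; the paper keeps these symbolic.
ZETA, G_T, G_R = 0.6, 8.0, 2.0      # RF-to-DC efficiency and antenna gains
LAM = 0.33                          # carrier wavelength in metres
G = ZETA * G_T * G_R * (LAM / (4 * math.pi)) ** 2   # lumped constant g in (1)

def received_energy(E_S, R_n, R_f):
    """Energy e_n^R received by user n: lossless inside the Fraunhofer
    distance R_f, Friis inverse-square decay (1) beyond it."""
    return E_S if R_n <= R_f else G * E_S / R_n ** 2

def boundary_distance(E_S, e_D):
    """Boundary distance R_n^D of (3): where e_n^R just equals the demand."""
    return math.sqrt(G * E_S / e_D)

def beta_fn(a, b):
    """Beta function B(a, b) via the Gamma function, as in (4)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def f_dist(n, l, N, R):
    """Density (4) of the distance l to the n-th nearest of N users; the
    B-ratio times the Beta density simplifies to
    (3 l^2 / R^3) * beta(l^3/R^3; n, N-n+1)."""
    x = (l / R) ** 3
    return (3 / R) * x ** (n - 1 / 3) * (1 - x) ** (N - n) / beta_fn(n, N - n + 1)

def expected_payment(n, N, R, E_S, e_D, R_f=0.0, steps=4000):
    """Midpoint-rule evaluation of (5), assuming a linear tariff r(e) = e."""
    R_t = min(R, max(R_f, boundary_distance(E_S, e_D)))   # R-tilde in (5)
    dl, total = R / steps, 0.0
    for i in range(steps):
        l = (i + 0.5) * dl
        # full demand inside R-tilde, partial (path-loss-limited) outside
        e = e_D if l < R_t else received_energy(E_S, l, R_f)
        total += f_dist(n, l, N, R) * e * dl
    return total
```

With a very large transmit energy E_S, the boundary distance exceeds R, every user's demand is met, and the integral reduces to r(e_n^D), matching (7) below.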
The second term in (5) indicates the total payment of the users whose demands are only partly satisfied, which is calculated from (1), due to path loss and the RF-to-DC energy conversion efficiency. The functions r(e_n^D) and r(g E_S/l²) indicate the energy price for the amount of received energy being e_n^D and e_n^R = g E_S/l², respectively.

The total average payment received by the energy gateway is obtained as

$$ R(N, E_S) = \sum_{n=1}^{N} R(n, E_S). \tag{6} $$

B. Uniform Payment for Energy Transfer

With a large enough energy E_S transferred by the energy gateway, the demands of all the users are satisfied, i.e., e_n^R ≥ e_n^D, ∀n ∈ {1, 2, …, N}. This is the case when R̃ = R, ∀n ∈ {1, 2, …, N}. The payment from user n to the energy gateway becomes

$$ R(n, E_S) = r\!\left(e_n^D\right) \tag{7} $$

since ∫₀^R f(n, l | N) dl = 1. Thus, the total average payment from all the users to the energy gateway is

$$ R(N, E_S) = \sum_{n=1}^{N} R(n, E_S) = \sum_{n=1}^{N} r\!\left(e_n^D\right) \tag{8} $$

which can be simplified as R(N, E_S) = N r(e^D) for the case in which all the users have the same energy demand, denoted by e^D.

IV. OPTIMIZATION PROBLEM FORMULATION

We formulate an MDP model to obtain the optimal energy management policy for the energy gateway. The MDP model consists of the system states, the transition matrices among the states, the actions, and the corresponding reward of the energy gateway.

A. State Space and Action Space

The state space of the mobile energy gateway is defined as follows:

$$ \mathcal{S} = \{ S = (L, E, N, P) \mid L \in \mathcal{L},\ E \in \mathcal{G},\ N \in \mathcal{N},\ P \in \mathcal{P} \} \tag{9} $$

where S is a composite state consisting of all the system state variables L, E, N, and P.

- There are, in total, L locations. The location state is denoted by L ∈ 𝓛 = {1, 2, …, L}, where 𝓛 is the set of all locations that the energy gateway can visit.
- Energy state E ∈ 𝓖 = {0, 1, …, Ē} is the current amount of energy in the energy gateway's battery. The capacity of the battery is Ē units of energy.
- User state N ∈ 𝓝 = {0, 1, …, N̄} denotes the number of users that the energy gateway can transfer energy to at the current location. We assume that the maximum number of users that the mobile energy gateway can transfer energy to is finite and denoted by N̄.
- Price state P is a composite state for the energy prices at all the chargers. P is denoted by P = (P_1, P_2, …, P_M), where P_i, i = 1, 2, …, M, is the energy price at location i with a charger, and M is the total number of locations with chargers. We assume that the price state P_i of each charger takes a value from a finite discrete set of energy prices, i.e., P_i ∈ 𝓟 = {ρ_1, ρ_2, …, ρ_K}, where 𝓟 is the set of all K possible prices. This assumption is widely adopted in the literature [25].

The action of the energy gateway is denoted by A ∈ 𝓐 = {0, 1, 2}, where 𝓐 is the action space. The action A = 1 indicates that the energy gateway requests charging from the charger at the current location. The action A = 2 indicates that the energy gateway transfers energy to the users. The action A = 0 indicates that the energy gateway is idle (i.e., doing nothing).
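The composite state space (9) is the Cartesian product of the four state variables, so its size grows as L · (Ē+1) · (N̄+1) · K^M. A minimal enumeration sketch, with toy dimensions that are our own assumptions rather than values from the paper:

```python
import itertools

# Toy dimensions (assumed): L = 3 locations, battery capacity E_bar = 5,
# at most N_bar = 2 users in range, M = 2 chargers with K = 2 price levels.
L_SET = range(1, 4)                 # location state L
G_SET = range(0, 6)                 # energy state E in {0, ..., E_bar}
N_SET = range(0, 3)                 # user state N in {0, ..., N_bar}
P_SET = list(itertools.product([1.0, 2.0], repeat=2))   # composite price P

# Composite state space (9): Cartesian product of all state variables.
STATES = list(itertools.product(L_SET, G_SET, N_SET, P_SET))
ACTIONS = (0, 1, 2)                 # idle, request charging, transfer energy

print(len(STATES))                  # 3 * 6 * 3 * 4 = 216
```

Even these small dimensions already yield 216 composite states, which illustrates why the transition matrices in the next subsection are assembled blockwise via Kronecker products rather than enumerated by hand.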

B. Transition Matrices of System States

The current state S = (L, E, N, P) transits to the next state S′ = (L′, E′, N′, P′). In the following, we derive the probability matrices for the state transitions.

1) Price State Transitions: The price state transition matrix for the charger at location i is expressed as follows:

$$ \mathbf{P}_i = \begin{bmatrix} \psi^{p_i}_{1,1} & \cdots & \psi^{p_i}_{1,K} \\ \vdots & \ddots & \vdots \\ \psi^{p_i}_{K,1} & \cdots & \psi^{p_i}_{K,K} \end{bmatrix} \tag{10} $$

where ψ^{p_i}_{k,k′} indicates the probability that the price state P_i of the charger at location i changes from P_i = k to P′_i = k′, where k, k′ ∈ 𝓟 indicate the current and next price states of the ith charger. Thus, the transition matrix for the composite price state P of all the chargers is obtained as

$$ \mathbf{W}^P = \mathbf{P}_1 \otimes \mathbf{P}_2 \otimes \cdots \otimes \mathbf{P}_M \tag{11} $$

where ⊗ is the Kronecker product. We denote the element of W^P with row p and column p′ by ψ^p_{p,p′}, which is the transition probability that the composite price state P changes from the current state p = (P_1, P_2, …, P_M) to the next state p′ = (P′_1, P′_2, …, P′_M).

2) Location State Transitions: We divide the set of locations 𝓛 into three subsets, i.e., 𝓛_B, 𝓛_S, and 𝓛_NC, based on the attributes of the locations, where 𝓛 = 𝓛_B ∪ 𝓛_S ∪ 𝓛_NC. The subset 𝓛_B includes all the locations with chargers. 𝓛_S includes all the locations where there are users but no chargers. 𝓛_NC is the subset where the energy gateway has contact with neither chargers nor users. We simply assume 𝓛_B ∩ 𝓛_S = ∅, 𝓛_B ∩ 𝓛_NC = ∅, and 𝓛_S ∩ 𝓛_NC = ∅. We denote the total number of locations in the subset 𝓛_B by L_B (i.e., L_B = |𝓛_B|). Likewise, L_S = |𝓛_S| and L_NC = |𝓛_NC|. Clearly, L = L_B + L_S + L_NC.
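The Kronecker construction in (11) can be sketched directly with NumPy. The per-charger matrices below are illustrative assumptions (M = 2 chargers, K = 2 price levels), not values from the paper:

```python
import numpy as np

# Per-charger price transition matrices of the form (10); probabilities assumed.
P1 = np.array([[0.7, 0.3],
               [0.4, 0.6]])
P2 = np.array([[0.9, 0.1],
               [0.2, 0.8]])

# Composite price transition matrix (11): Kronecker product over the chargers.
W_P = np.kron(P1, P2)

# Each row of W_P gives joint transition probabilities of (P_1, P_2); e.g.,
# W_P[0, 0] = P1[0, 0] * P2[0, 0] is the probability that both chargers keep
# their current (first) price level.
```

Because each P_i is row-stochastic, the Kronecker product W_P is also row-stochastic, with K^M rows indexing the joint price states.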
The transition of the location state L of the energy gateway can be expressed by the following transition matrix:

$$ \mathbf{W}^L = \begin{bmatrix} \mathbf{L}_{NC,NC} & \mathbf{L}_{NC,B} & \mathbf{L}_{NC,S} \\ \mathbf{L}_{B,NC} & \mathbf{L}_{B,B} & \mathbf{L}_{B,S} \\ \mathbf{L}_{S,NC} & \mathbf{L}_{S,B} & \mathbf{L}_{S,S} \end{bmatrix} \tag{12} $$

the elements of which denote the transition matrices among the three subsets. L_{a,a′}, a, a′ ∈ {NC, B, S}, contains the transition probabilities when the current location is in subset 𝓛_a and the next location is in subset 𝓛_{a′}. For example, L_{S,B} means that the current location of the energy gateway is in subset 𝓛_S and the next location is in subset 𝓛_B. Each element ψ^l_{m,m′} in L_{a,a′} denotes the transition probability from a current location m in subset a to the next location m′ in subset a′.

3) Energy State Transitions: Next, we derive the energy state transition matrix of the energy gateway. Energy state transitions can be divided into three cases.

- First, the energy state may increase. This occurs when the energy gateway receives E_B units of energy from a charger. Recall that Ē is the capacity of the energy gateway's battery. The (Ē+1) × (Ē+1)-dimensional transition matrix for this case is given in the following equation:

$$ \mathbf{E}^+(E, E' \mid L) = \begin{bmatrix} 1-\eta_L & \mathbf{0}_{1\times(E_B-1)} & \eta_L & & \\ & 1-\eta_L & \mathbf{0}_{1\times(E_B-1)} & \eta_L & \\ & & \ddots & & \vdots \\ & & & 1-\eta_L & \eta_L \\ & & & & 1 \end{bmatrix} \tag{13} $$

where each row of the matrix corresponds to the current energy state E, and each column corresponds to the energy state E′ of the next decision period. η_L is the efficiency of energy charging at location L, i.e., the probability of successful charging. 0_{1×(E_B−1)} is a row vector composed of E_B − 1 zeros.

- Second, the energy state may decrease. This can happen when the energy gateway transfers E_S units of energy to users. In this case, the energy state decreases by E_S, except that, when there are fewer than E_S units of energy in the battery, we assume that the energy gateway still transfers energy, so that the energy state decreases to E′ = 0.
The transition matrix is as shown in (14), which has the dimension (Ē+1) × (Ē+1), as follows:

$$ \mathbf{E}^-(E, E' \mid L) = \begin{bmatrix} 1 & & & \\ \vdots & & & \\ 1 & \mathbf{0}_{1\times(E_S-1)} & & \\ & 1 & \mathbf{0}_{1\times(E_S-1)} & \\ & & \ddots & \\ & & 1 & \mathbf{0}_{1\times(E_S-1)} \end{bmatrix} \tag{14} $$

where 0_{1×(E_S−1)} is a row vector composed of E_S − 1 zeros; that is, row E has its single nonzero entry 1 in column E′ = max{E − E_S, 0}.

- Third, the energy state can remain the same, for example, when the energy gateway does not receive or transfer any energy. In this case, we have the transition matrix E⁰ = I_{Ē+1}, where I_{Ē+1} is an (Ē+1) × (Ē+1) identity matrix.

The change of the energy state depends on the current location and on the action of the energy gateway. Let W^{L,E}((L, E), (L′, E′) | A) denote the transition matrix from the current composite state (L, E) to the next state (L′, E′), which has the dimension (Ē+1)L × (Ē+1)L. When action A = 0 is taken, the corresponding transition matrix is expressed as follows:

$$ \mathbf{W}^{L,E}\left((L,E),(L',E') \mid A=0\right) = \mathbf{W}^L \otimes \mathbf{E}^0 \tag{15} $$

where E⁰ indicates that the energy state E remains the same (i.e., E = E′), regardless of the location state of the energy gateway.
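The energy matrices of (13)-(15) can be sketched as follows. The cap-at-capacity behavior for charges that would overflow the battery is our reading of (13) and is marked as an assumption in the code:

```python
import numpy as np

def energy_up(E_bar, E_B, eta):
    """Charging matrix E^+ of (13): from state E, gain E_B units with
    probability eta (assumed capped at the capacity E_bar), else stay at E."""
    M = np.zeros((E_bar + 1, E_bar + 1))
    for e in range(E_bar + 1):
        e_next = min(e + E_B, E_bar)
        if e_next == e:
            M[e, e] = 1.0          # battery already full: stay w.p. 1
        else:
            M[e, e] = 1.0 - eta
            M[e, e_next] = eta
    return M

def energy_down(E_bar, E_S):
    """Discharging matrix E^- of (14): lose E_S units, flooring at empty."""
    M = np.zeros((E_bar + 1, E_bar + 1))
    for e in range(E_bar + 1):
        M[e, max(e - E_S, 0)] = 1.0
    return M

def idle_transition(W_L, E_bar):
    """Joint location/energy transition for A = 0, i.e., W^L (x) E^0 in (15)."""
    return np.kron(W_L, np.eye(E_bar + 1))
```

Both constructions produce row-stochastic matrices, so any Kronecker combination of them with a row-stochastic W^L is again a valid transition matrix.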

[Fig. 2. Energy charging action. (a) With a charger and (b) without any charger at the current location.]

When action A = 1 is taken, i.e., the energy gateway requests energy charging, the corresponding transition matrix is expressed as follows:

$$ \mathbf{W}^{L,E}\left((L,E),(L',E') \mid A=1\right) = \begin{bmatrix} \mathbf{L}_{NC,NC}\otimes\mathbf{E}^0 & \mathbf{L}_{NC,B}\otimes\mathbf{E}^0 & \mathbf{L}_{NC,S}\otimes\mathbf{E}^0 \\ \mathbf{L}_{B,NC}\otimes\mathbf{E}^+ & \mathbf{L}_{B,B}\otimes\mathbf{E}^+ & \mathbf{L}_{B,S}\otimes\mathbf{E}^+ \\ \mathbf{L}_{S,NC}\otimes\mathbf{E}^0 & \mathbf{L}_{S,B}\otimes\mathbf{E}^0 & \mathbf{L}_{S,S}\otimes\mathbf{E}^0 \end{bmatrix}. \tag{16} $$

In this case, when the energy gateway is not at any charger (i.e., the current location state L belongs to subset 𝓛_S or 𝓛_NC), the energy gateway cannot receive any energy. Consequently, E⁰ is applied to the rows corresponding to the location states L ∈ 𝓛_S and L ∈ 𝓛_NC. Otherwise, the energy gateway will receive energy, and the matrix E⁺ is applied for the location states L ∈ 𝓛_B (see Fig. 2).

When the action A = 2 is taken, the energy gateway transfers energy to users. The corresponding transition matrix is expressed as follows:

$$ \mathbf{W}^{L,E}\left((L,E),(L',E') \mid A=2\right) = \mathbf{W}^L \otimes \mathbf{E}^-. \tag{17} $$

In this case, the energy state of the battery decreases. Note that the energy transferring action can be taken at all the locations. For the locations without energy users (i.e., L ∈ 𝓛_NC ∪ 𝓛_B), the energy gateway can still transfer energy, although no users will pay for and receive the transferred energy, so the transferred energy could be wasted.

4) User State Transitions: Here, the user state is the number of users that the energy gateway can transfer energy to. The transitions of the user state depend on the location of the energy gateway. For the energy gateway moving from the current location L to the next location L′, the transition of the user state from N to N′ has the following three cases.

- The energy gateway moves to a location with users (i.e., L′ ∈ 𝓛_S). Thus, there will be some users that can receive energy from the energy gateway, and the user state is N′ ∈ 𝓝 = {0, 1, …, N̄}.
- However, when the energy gateway moves to a location with a charger (i.e., L′ ∈ 𝓛_B), there will be no users receiving energy from the energy gateway (e.g., the users may receive energy directly from the charger). Consequently, the user state of the energy gateway is N′ ≡ 0.
- When the energy gateway moves to a location with neither chargers nor users (i.e., L′ ∈ 𝓛_NC), similar to the previous case, the user state is N′ ≡ 0.

For ease of presentation, we define the following matrices:

$$ \mathbf{W}^{L,E}\big|_{L'\in\mathcal{L}_B\cup\mathcal{L}_{NC}} = \mathbf{W}^{L,E}\left((L,E),(L',E')\mid A\right)\left( \begin{bmatrix} \mathbf{I}_{(L_{NC}+L_B)\times(L_{NC}+L_B)} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}_{L\times L} \otimes \mathbf{I}_{(\bar{E}+1)\times(\bar{E}+1)} \right) \tag{18} $$

$$ \mathbf{W}^{L,E}\big|_{L'\in\mathcal{L}_S} = \mathbf{W}^{L,E}\left((L,E),(L',E')\mid A\right)\left( \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_{L_S\times L_S} \end{bmatrix}_{L\times L} \otimes \mathbf{I}_{(\bar{E}+1)\times(\bar{E}+1)} \right) \tag{19} $$

where the matrix I_{Y×Y} is a Y × Y identity matrix. W^{L,E}|_{L′∈𝓛_S} is a matrix of dimensions [L(Ē+1)] × [L(Ē+1)]; its physical meaning is that it represents the part of the transitions in matrix W^{L,E}((L,E),(L′,E′) | A) where the next location L′ of the energy gateway is in subset 𝓛_S (i.e., with users), and it masks the rest with zeros. Similarly, W^{L,E}|_{L′∈𝓛_B∪𝓛_NC} represents the part of the transitions in W^{L,E}((L,E),(L′,E′) | A) where L′ belongs to subset 𝓛_B (i.e., with a charger) or subset 𝓛_NC.

For L′ ∈ 𝓛_B ∪ 𝓛_NC, we have N′ ≡ 0. By contrast, for L′ ∈ 𝓛_S, the transition matrix of the user state N considering states L and E is derived as follows:

$$ \mathbf{W}^{L,E,N}\big|_{L'\in\mathcal{L}_S} = \begin{bmatrix} \psi^u_{0,0} & \cdots & \psi^u_{0,\bar{N}} \\ \vdots & \ddots & \vdots \\ \psi^u_{\bar{N},0} & \cdots & \psi^u_{\bar{N},\bar{N}} \end{bmatrix} \otimes \mathbf{W}^{L,E}\big|_{L'\in\mathcal{L}_S} \tag{20} $$

where ψ^u_{n,n′} is the transition probability of the event that the user state changes from n ∈ 𝓝 (for the current state) to n′ ∈ 𝓝 (for the next decision period). Given that the users are spatially distributed following a Poisson distribution with spatial density α and that the maximum number of users is finite and known (i.e., N̄ < +∞), ψ^u_{n,n′} takes the form of a truncated Poisson distribution defined as follows:

$$ \psi^u_{n,n'} = \begin{cases} e^{-\pi R^2 \alpha} \dfrac{(\pi R^2 \alpha)^{n'}}{n'!}, & n \in \mathcal{N},\ n' \in \{0, 1, \ldots, \bar{N}-1\} \\[6pt] \displaystyle\sum_{k=\bar{N}}^{+\infty} e^{-\pi R^2 \alpha} \dfrac{(\pi R^2 \alpha)^{k}}{k!}, & n \in \mathcal{N},\ n' = \bar{N}. \end{cases} \tag{21} $$
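Since the truncated Poisson probabilities in (21) do not depend on the current user state n, a single function of the next state suffices. A minimal sketch, computing the tail mass for the boundary state as one minus the partial sum:

```python
import math

def user_state_prob(n_next, N_bar, R, alpha):
    """Truncated-Poisson probability (21) that the next user state is n_next,
    for users of spatial density alpha within the coverage radius R."""
    mu = math.pi * R ** 2 * alpha               # mean number of users in range
    if n_next < N_bar:
        return math.exp(-mu) * mu ** n_next / math.factorial(n_next)
    # the tail mass for k >= N_bar is lumped into the boundary state N_bar
    return 1.0 - sum(math.exp(-mu) * mu ** k / math.factorial(k)
                     for k in range(N_bar))
```

By construction the probabilities over n′ ∈ {0, …, N̄} sum to one, so every row of the ψ^u matrix in (20) is a valid distribution.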
(21) Note that other spatial distributions and user state transition 496 processes can be applied without affecting the optimization 497 model. W L,E,N L L S is the transition matrix of (L, E, N ) 498 when the next location L is in subset L S. Similarly, 499 W L,E,N L L B LNC is the transition matrix of (L, E, N ) 500

when the next location L' is in subset L_NC or L_B, which is expressed as follows:

W_{L,E,N}|_{L' ∈ L_B ∪ L_NC} = [ 1_{(N+1)×1}  0_{(N+1)×N} ] ⊗ W_{L,E}|_{L' ∈ L_B ∪ L_NC}   (22)

where 1_{(N+1)×1} is an (N+1)×1 matrix of 1's.

Then, the transition matrix from the current composite state (L, E, N) to the next composite state (L', E', N') is given as follows:

W_{L,E,N}((L,E,N),(L',E',N') | A) = W_{L,E,N}|_{L' ∈ L_S} + W_{L,E,N}|_{L' ∈ L_B ∪ L_NC}.   (23)

5) Overall Transition Matrix: The transition matrix of the entire state space is denoted by W(S, S' | A), where the current composite state is S = (L, E, N, P) and the next composite state is S' = (L', E', N', P'), given action A taken by the energy gateway, as follows:

W(S, S' | A) = W_{L,E,N}((L,E,N),(L',E',N') | A) ⊗ W_P(P, P').   (24)

V. SOLVING THE MARKOV DECISION PROCESS OPTIMIZATION MODEL

Here, we first define an immediate utility function of the energy gateway. Then, we present the MDP model. Next, we define the threshold structure of the optimal policy obtained from the MDP model.

A. Immediate Utility Function

An immediate utility function u(S, A) is defined as the reward of the energy gateway in the current decision period, given the composite state S = (L, E, N, P). Without loss of generality, we adopt the following form of u(S, A), which takes different values for different locations L and actions A of the energy gateway:

u(S | A) = { u_B(S),   L ∈ L_B and A = 1
            u_S(S),   L ∈ L_S and A = 2
            u_0(S),   otherwise   (25)

where u_B(S) denotes the reward obtained when the energy gateway is at the location with a charger (i.e., L ∈ L_B) and the charging action A = 1 is taken. u_S(S) denotes the reward when the energy gateway is at the location with users and the energy transferring action A = 2 is taken. u_0(S) is the reward of the energy gateway being idle.
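As a side illustration, the piecewise utility in (25) can be written directly in code, following the forms of u_B, u_S, and u_0 defined next in (26)-(28). This is only a sketch: the numeric parameters (E_B, E_S, the holding-cost slope, and the per-user payment) are placeholders, not the paper's settings.

```python
# Sketch of the piecewise immediate utility u(S|A) in (25).
# A state is (location label in {"NC", "B", "S"}, energy E, users N, price P).
# Parameter values below are illustrative assumptions, not the paper's.

E_B = 1          # energy units received per charging action (assumed)
E_S = 1          # energy units transferred per transfer action (assumed)
HOLD = 0.01      # slope of a linear holding cost F(E) = HOLD * E (assumed)
PAYMENT = 1.0    # uniform payment per served user (assumed)

def F(E):
    """Linear holding cost of the stored energy."""
    return HOLD * E

def u(location, E, N, P, A):
    """Immediate utility u(S|A): pay to charge at a charger, earn at users, else idle."""
    if location == "B" and A == 1:        # u_B(S) = -E_B * P - F(E), as in (26)
        return -E_B * P - F(E)
    if location == "S" and A == 2:        # u_S(S) = R(N, E_S) - F(E), as in (27)
        return PAYMENT * N - F(E)
    return -F(E)                          # u_0(S) = -F(E), as in (28)
```

For example, u("S", 3, 2, 1.0, 2) evaluates to 2 x 1.0 - 0.03 = 1.97 under these placeholder numbers.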
u_B(S), u_S(S), and u_0(S) are defined as follows:

u_B(S) = -E_B P(L) - F(E)   (26)

where E_B is the amount of energy transferred from the charger to the energy gateway, P(L) ∈ {P_1, P_2, ..., P_M} is the current price state at the location L, and F(E) is the holding cost of the current energy state. This cost, for example, could be due to the compensation for the self-discharging effect [26], [27]. Thus

u_S(S) = R(N, E_S) - F(E)   (27)

where E_S denotes the amount of energy transmitted from the energy gateway to the users, and R(N, E_S) is the function indicating the payment from all N users at the current location. This function is defined as in (6) and (8). Thus

u_0(S) = -F(E)   (28)

where only the holding cost of energy is applied.

Note that the immediate utility function u_0(S) is used for the following cases. First, the energy gateway takes the idle action A = 0 regardless of the current location. Second, the charging action A = 1 is taken when the current location has no charger (i.e., L ∉ L_B). Third, the energy transferring action is taken when the energy gateway is not at the location with users (i.e., L ∉ L_S).

B. Solving the MDP Optimization Model

The objective of the MDP model is to obtain an optimal energy management policy for the energy gateway. A policy φ(A|S) is defined as a mapping of state S to action A to be taken by the energy gateway. The optimal policy, denoted by φ*(A|S), aims to maximize the overall utility of the energy gateway.

The following Bellman equation [28] is applied to obtain the optimal policy:

U(S) = max_{φ(A|S)} H(S | A)   (29)
φ*(A|S) = arg max_{φ(A|S)} H(S | A)   (30)
H(S | A) = u(S | A) + γ Σ_{S'} W(S, S' | A) U(S')   (31)

where S = (L, E, N, P) is the current state. The Bellman equation can be numerically solved by the value iteration algorithm [29]. H(·) denotes the overall utility of the energy gateway, including the immediate utility of the current state as well as that of all the possible future states. U(S) is the achieved optimal overall utility. φ*(A|S) is the optimal policy. γ ∈ [0, 1) is a discount factor of possible future states. u(S | A) and Σ_{S'} W(S, S' | A) U(S') are the current immediate utility and the expected future utility of the energy gateway, respectively. W(S, S' | A) is the transition probability from the current state S to the next state S', which can be obtained from the transition matrix given in (24). The complexity of solving the Bellman equation by the value iteration algorithm is O(|A||S|²) [30], where |A| is the number of actions and |S| is the total number of states.

C. Threshold Structure of MDP Solutions

Next, we introduce the concept of a threshold policy and prove the existence of the threshold policy in the optimal energy management policy obtained from solving the proposed MDP model.

1) Concept of Threshold Policy: The optimal policy φ*(A|S) of the MDP model is defined to be a threshold policy,

if the following condition holds:

φ*(A | Θ, S_{-Θ}) = { A_1,       min Θ ≤ Θ ≤ Θ_{thr,1}
                     A_i,       Θ_{thr,i-1} ≤ Θ ≤ Θ_{thr,i},  i ∈ {2, 3, ..., |A|-1}
                     A_{|A|},   Θ_{thr,|A|-1} ≤ Θ ≤ max Θ   (32)

where Θ is a state or a composite state variable, and S_{-Θ} denotes the composite state of the other states except Θ. φ*(A | Θ, S_{-Θ}) is the optimal action solved by the Bellman equation in (29)-(31), given the current state S = (Θ, S_{-Θ}). Θ_{thr,i} is called the ith threshold of state Θ. In other words, the action A is monotonic as the state Θ increases.

The existence of the threshold policy can contribute to solving the MDP model efficiently. For example, in a system with a very large number of states |S|, the value iteration algorithm is not viable due to the unmanageable complexity, as analyzed in Section V-B. As shown in the definition of the threshold policy (32), with the existence of the threshold policy, once all the thresholds Θ_{thr,i} are known, the actions A_1, ..., A_{|A|} to take on all the system states are already decided. Algorithms for directly deciding on the thresholds deserve future research. However, a few approaches have been proposed to estimate thresholds in MDP solutions, such as reinforcement learning [31] and approximation algorithms [32].

To prove that the optimal policy φ*(A|S) in (30) is a threshold policy, the concept of supermodularity/submodularity [33] is applied.

Definition 1: For x ∈ X ⊆ R and y ∈ Y ⊆ R, a function f(x, y) ∈ R is supermodular in (x, y) if f(x_1, y_1) - f(x_1, y_2) ≥ f(x_2, y_1) - f(x_2, y_2) for all x_1, x_2 ∈ X and y_1, y_2 ∈ Y with x_1 > x_2 and y_1 > y_2. Similarly, f(x, y) is submodular in (x, y) if f(x_1, y_1) - f(x_1, y_2) ≤ f(x_2, y_1) - f(x_2, y_2) for all x_1, x_2 ∈ X and y_1, y_2 ∈ Y with x_1 > x_2 and y_1 > y_2.

The supermodularity/submodularity of f(x, y) is a sufficient condition for the nondecreasing/nonincreasing monotonicity of y* = arg max_y f(x, y) in x [28], [33].
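Definition 1 can be checked numerically on a small grid. The sketch below is an illustrative aside, not part of the paper: it verifies that f(x, y) = -(y - x)² is supermodular and that its arg max over y is then nondecreasing in x, matching the monotonicity statement above.

```python
# Numerical check of Definition 1 on a finite grid: f is supermodular iff
# f(x1,y1) - f(x1,y2) >= f(x2,y1) - f(x2,y2) for all x1 > x2, y1 > y2.
import itertools

def is_supermodular(f, X, Y):
    for (x1, x2), (y1, y2) in itertools.product(
            itertools.combinations(sorted(X, reverse=True), 2),
            itertools.combinations(sorted(Y, reverse=True), 2)):
        if f(x1, y1) - f(x1, y2) < f(x2, y1) - f(x2, y2):
            return False
    return True

X = range(5)
Y = range(5)
f = lambda x, y: -(y - x) ** 2      # cross-difference is 2(x1-x2)(y1-y2) > 0

assert is_supermodular(f, X, Y)

# Supermodularity implies arg max_y f(x, y) is nondecreasing in x
# (ties broken toward the largest maximizer).
argmax = [max(Y, key=lambda y: (f(x, y), y)) for x in X]
assert argmax == sorted(argmax)
```

Here arg max_y -(y - x)² is simply y = x, which increases with x; flipping the sign of f makes it submodular and the check fails, as expected.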
Specifically, in the proposed MDP model and the Bellman equation given in (29)-(31), for a given state variable θ (e.g., the energy state E, the price state P, or the user state N), the fact that H(S | A) is supermodular/submodular in (θ, A) indicates that φ*(A | S) is nondecreasing/nonincreasing in θ.

2) Threshold Policy: First, when the energy gateway is at the location with a charger, a threshold policy exists with respect to the energy state E.

We first remove the action of energy transfer A = 2 (i.e., the energy gateway never transfers energy when it is at the location with a charger, i.e., for the current system state S = (L, E, N, P) with L ∈ L_B). The proof is direct: A = 2 is always dominated by the idle action A = 0 in this case, since the following condition holds:

H(S | A = 0) ≥ H(S | A = 2),   ∀S with L ∈ L_B.   (33)

Thus, we have the following theorem when the energy gateway is at the location with a charger.

Theorem 1: Given any user state N, price state P, and location state L ∈ L_B, the optimal action policy of the energy gateway is a threshold policy in the energy state E if the holding cost F(E) is a linear function in E. The action of the energy gateway is A = 1 if E ≤ E_{threshold}^{1→0}, and A = 0 otherwise.

The threshold policy is binary in that only the actions A ∈ {0, 1} will be taken. The intuition is that when the energy gateway has less energy in its battery, it is more likely to receive energy from the charger. The proof of Theorem 1 is in Appendix A.

Similarly, when the energy gateway is at the location with users, the charging action A = 1 is eliminated. We have the following theorem for the threshold policy with respect to the energy state E.

Theorem 2: Given any user state N, price state P, and location state L ∈ L_S, the optimal action policy of the energy gateway is a threshold policy in the energy state E, given that the holding cost F(E) is a linear function in E. The action of the energy gateway is A = 0 when E ≤ E_{threshold}^{0→2}, and A = 2 otherwise.

Again, the intuition is that when the energy gateway has more energy in its battery, it is more likely to transfer energy to the users. The proof of Theorem 2 is similar to that of Theorem 1 and is therefore omitted for brevity.

Finally, for the energy gateway at the location without any charger or users (i.e., the location state is in subset L_NC), the idle action A = 0 is always taken, and a threshold policy with respect to the energy state E exists trivially. Therefore, the existence of a threshold policy with respect to the energy state E in the optimal policy is completely proven.

In a similar spirit, when the energy gateway is at the location with a charger, a threshold policy with respect to the energy price of a particular charger P_i, i ∈ {1, 2, ..., M}, exists, as stated in the following theorem.

Theorem 3: Given any user state N, price state P_{-i} = (P_1, ..., P_{i-1}, P_{i+1}, ..., P_M) (i.e., all price components except the ith), and location state L ∈ L_B, the optimal action policy of the energy gateway is a threshold policy in the ith price state component P_i.

The intuition is that if the energy price is higher, the energy gateway is less likely to receive energy from the charger, since charging yields a smaller reward.

When the energy gateway is at the location with users, we have the following theorem for a threshold policy with respect to the user state N.

Theorem 4: Given any price state P, energy state E, and location state L ∈ L_S, the optimal action policy of the energy gateway is a threshold policy in the user state N.

The intuition is that when more users can receive energy from the energy gateway, the energy gateway is more likely to take the action to transfer energy due to the higher reward.
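To make the threshold structure concrete, the sketch below solves a toy one-location instance of the Bellman equation (29)-(31) by value iteration and inspects the resulting policy over the energy state. All numbers (battery size, prices, user arrival probability, holding-cost slope) are illustrative assumptions rather than the paper's settings, and the single-variable state is a deliberate simplification of the composite state (L, E, N, P).

```python
# Value iteration for a toy single-state-variable energy MDP (illustrative only):
# state e = stored energy, actions A=0 (idle) and A=1 (charge one unit).
# Each period a user arrives w.p. P_USER and buys one unit for price R_SELL;
# charging costs C_BUY per unit; holding cost is linear with slope H_HOLD.
import numpy as np

E_MAX, GAMMA = 5, 0.9
C_BUY, R_SELL, P_USER, H_HOLD = 0.6, 1.0, 0.5, 0.05
N_S, N_A = E_MAX + 1, 2

# Build transition matrices W[a] and immediate utilities u[e, a].
W = np.zeros((N_A, N_S, N_S))
u = np.zeros((N_S, N_A))
for e in range(N_S):
    for a in (0, 1):
        e_mid = min(e + a, E_MAX)            # energy after a possible charge
        u[e, a] = -H_HOLD * e - C_BUY * a
        if e_mid > 0:                        # a sale can happen this period
            u[e, a] += P_USER * R_SELL
            W[a, e, e_mid - 1] += P_USER
            W[a, e, e_mid] += 1.0 - P_USER
        else:                                # empty battery, nothing to sell
            W[a, e, e_mid] = 1.0

# Value iteration on U(s) = max_a [ u(s,a) + gamma * sum_s' W(s,s'|a) U(s') ].
U = np.zeros(N_S)
for _ in range(1000):
    H = u + GAMMA * np.einsum("ast,t->sa", W, U)
    U_new = H.max(axis=1)
    if np.max(np.abs(U_new - U)) < 1e-10:
        break
    U = U_new
policy = H.argmax(axis=1)    # 0/1 action per energy level
```

With these numbers the computed policy charges at low energy levels and idles once the battery is sufficiently full, i.e., it exhibits the binary threshold in the energy state described by Theorem 1.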
The proofs of Theorems 3 and 4 are similar to that of Theorem 1 and are therefore omitted for brevity.

VI. NUMERICAL RESULTS

A. System Settings

1) System Parameters: Unless otherwise stated, we use the following parameter settings to evaluate and compare the performance of different schemes.

There are three locations in the network: Location L = 1 has neither a charger nor a user, i.e., L = 1 is in subset

L_NC. Location L = 2 belongs to L_B, where the charger exists. At location L = 3, the energy gateway can transfer energy to users, i.e., L = 3 ∈ L_S. The transition matrix W_L of the location state L is given in (34); it indicates that the energy gateway has the probability of 0.29 to be with the charger and the probability of 0.69 to be with users. The battery of the energy gateway has a capacity of five units of energy, i.e., E = 5. The charger provides the energy charging service at three different prices, denoted by P = {0.1, 1.0, 5.0}. The price state changes among the three prices uniformly, i.e., W_P = [1/3]_{3×3}, as in (11). The spatial density of users is α per unit of area, and the energy transferring range is set as R = 10 m. The energy gateway receives one unit of energy from the charger and transfers one unit of energy to users at a time, i.e., E_B = E_S = 1. A fixed probability of successfully receiving energy from the charger is used. For the immediate utility function given in (25), we assume that the cost of holding energy is negligible, i.e., F(E) ≡ 0. The utility function of charging is expressed as u_B(S) = -E_B P. For transferring energy to users, we consider the case where the energy demands of all users are met, as in (8). We set the uniform payment r(E_d) ≡ 1.0. Therefore, in (25), u_S(S) = R(N, E_S) = 1.0 × N, where 1.0 indicates the payment from a user to the energy gateway. The Bellman equation is solved with discount factor γ.

2) Baseline Schemes and Evaluation Criteria: We compare the proposed MDP-based scheme with four baseline energy management schemes, as follows.

1) Greedy scheme (GRDY): The energy gateway always takes the action that maximizes the immediate utility function u(S | A) of the current decision period (i.e., a myopic strategy), regardless of all past and future system states.
2) Location-aware scheme (LOCA): The energy gateway always takes the charging (A = 1), transferring (A = 2), and idle (A = 0) actions at the locations with a charger (i.e., subset L_B), with users (subset L_S), and with neither a charger nor users (subset L_NC), respectively.
3) Random scheme (RND): The action taken by the energy gateway is randomly selected from A = {0, 1, 2}, with probability 1/3 for each action.
4) Location-aware random scheme (LRND): The energy gateway takes actions A = 0 and A = 1 when it is at a location in subset L_B, and takes actions A = 0 and A = 2 when it is at a location in subset L_S. Finally, it takes action A = 0 at a location in subset L_NC.

We assume that the energy gateway is initialized at any state S ∈ S with probability p_ent = 1/|S|. By adopting different energy management schemes, we evaluate the expected utility of the energy gateway, the energy charging (or transferring) rate, the average energy level, and the successful energy transferring rate. Here, the successful energy transferring rate is the probability of the states at which the energy gateway receives and stores enough energy to be transferred.

3) Threshold Policy: Fig. 3 shows that an optimal energy management policy obtained from the proposed MDP-based scheme is a threshold policy. In particular, the threshold policy with respect to the price state P is shown in Fig. 3(a)-(d). Fig. 3(a) and (c) shows the policies for the location state L = 1, i.e., the energy gateway is at the location with a charger. In this case, the action taken by the energy gateway changes from A = 1 (i.e., charging) to A = 0 (i.e., idle) as the price state P increases. For example, in Fig. 3(a), at the energy state E = 2, action A = 1 is taken when P = 1 as well as when P = 2, and A changes to 0 when P increases to 3. However, when the energy gateway is at the location with users, i.e., L = 2, as shown in Fig. 3(b) and (d), no threshold policy exists with respect to P, since the energy gateway cannot request and receive energy from the charger. Consequently, the actions are not affected by the price state P.

In Fig. 3(d) and (f), when the energy gateway is at the location with users, i.e., L = 2, the threshold policy exists with respect to N. As N increases, the action of the energy gateway changes from A = 0 to A = 2 (i.e., an energy transferring action). This is due to the fact that the energy gateway gains higher utility by transferring energy when more users can receive energy. By contrast, as shown in Fig. 3(c) and (e), where the location state is fixed as L = 1 (i.e., at the location with a charger), there is no threshold policy with respect to N, since the number of users does not affect the charging decision of the energy gateway.

The energy state E affects the action of the energy gateway when it is at the location with either the charger or the users, as shown in Fig. 3(a), (b), (e), and (f). When the energy gateway is at the location L = 1 where the charger exists, as shown in Fig. 3(a) and (e), the action changes from A = 1 to A = 0 as E increases. This is because the energy gateway tends to request and receive energy when its battery (energy) level is low [e.g., E ≤ 3 in Fig. 3(e)]. The energy gateway stops charging when its energy level is high enough [e.g., E > 3 in Fig. 3(e)] to avoid the cost of charging. By contrast, when the energy gateway is at the location with users (i.e., L = 2), the energy transferring action A = 2 is preferred as E becomes larger [e.g., E ≥ 2 in Fig. 3(b)]. Specifically, the energy gateway is more likely to transfer energy to the users when it has sufficient energy in its battery.

B. Maximum Energy Capacity of Mobile Energy Gateway: Impacts on Optimality

We evaluate different performance measures and compare the proposed MDP-based scheme with the other baseline schemes. The results are shown in Figs. 4 and 5 as the maximum capacity E of the energy gateway's battery varies.

Fig. 4(a) shows the expected utilities of the energy gateway by adopting different energy management schemes. The

proposed MDP-based scheme achieves optimal performance in terms of utility compared with all the baseline schemes. Although the computational complexity of solving the MDP-based scheme is O(|A||S|²), which is larger than the O(1) of the baseline schemes, the expected utility obtained in Fig. 4(a) increases significantly compared with the baseline schemes. As the maximum capacity E of the energy gateway's battery increases, the utilities obtained from the MDP-based and baseline schemes increase. This is because when E becomes large, the energy gateway can store more energy to be transferred to users and, thus, gain more utility.

Fig. 4(b) and (c) shows the energy charging (A = 1) rate from the chargers and the energy transferring (A = 2) rate to users, respectively. Fig. 4(b) and (c) highlights that the energy charging and transferring rates first increase and then become stable at a certain level as the maximum capacity E increases. In this case, when E is relatively small, the increased capacity E allows the energy gateway to receive and store more energy (i.e., taking action A = 1). Thus, the energy gateway has more opportunity to transfer energy (i.e., A = 2) to the users. Consequently, both the energy charging/transferring rates increase. However, as E continues to increase, the cost (i.e., negative utility) of charging u_B(S) prevents the energy gateway from charging and, thus, curtails the energy transfer. Therefore, both the curves of the energy charging/transferring rates plateau when E is large enough, i.e., when E ≥ 4 in Fig. 4(b) and (c).

Fig. 3. Threshold in actions for different (a) price state P and energy state E (when L = 1 and N = 2), (b) price state P and energy state E (when L = 2 and N = 2), (c) price state P and user state N (when L = 1 and E = 2), (d) price state P and user state N (when L = 2 and E = 2), (e) user state N and energy state E (when L = 1 and P = 1), and (f) user state N and energy state E (when L = 2 and P = 1).

Fig. 4. Impacts of maximum energy capacity E on (a) the expected utility, (b) the energy charging rate, and (c) the energy transferring rate.
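As a final illustration, the truncated Poisson user-state distribution of (21) can be evaluated under this section's transfer range R = 10 m. The density α and the cap N below are assumed placeholders, since the transcription does not preserve their numeric values.

```python
# Truncated Poisson user-state probabilities psi^u_{n,n'} from (21):
# the number of users inside the transfer range R is Poisson with mean
# pi * R^2 * alpha, and all mass for counts >= N is lumped into n' = N.
import math

R = 10.0          # energy transferring range in meters (as in the settings)
ALPHA = 0.005     # spatial user density per unit area -- assumed placeholder
N_MAX = 3         # maximum trackable number of users -- assumed placeholder

mean = math.pi * R * R * ALPHA   # expected user count within the range

def psi(n_next):
    """Probability that the next user state is n_next (independent of n)."""
    if n_next < N_MAX:
        return math.exp(-mean) * mean ** n_next / math.factorial(n_next)
    # n_next == N_MAX: truncated tail, 1 - P(count < N_MAX), equivalent to
    # the infinite sum over k >= N in (21).
    return 1.0 - sum(
        math.exp(-mean) * mean ** k / math.factorial(k) for k in range(N_MAX))

dist = [psi(n) for n in range(N_MAX + 1)]
```

By construction each row of the user-state transition matrix in (20) built from these values is a valid probability distribution: the entries are nonnegative and sum to one.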


More information

EE 550: Notes on Markov chains, Travel Times, and Opportunistic Routing

EE 550: Notes on Markov chains, Travel Times, and Opportunistic Routing EE 550: Notes on Markov chains, Travel Times, and Opportunistic Routing Michael J. Neely University of Southern California http://www-bcf.usc.edu/ mjneely 1 Abstract This collection of notes provides a

More information

Performance Analysis of a Threshold-Based Relay Selection Algorithm in Wireless Networks

Performance Analysis of a Threshold-Based Relay Selection Algorithm in Wireless Networks Communications and Networ, 2010, 2, 87-92 doi:10.4236/cn.2010.22014 Published Online May 2010 (http://www.scirp.org/journal/cn Performance Analysis of a Threshold-Based Relay Selection Algorithm in Wireless

More information

A Learning Theoretic Approach to Energy Harvesting Communication System Optimization

A Learning Theoretic Approach to Energy Harvesting Communication System Optimization 1 A Learning Theoretic Approach to Energy Harvesting Communication System Optimization Pol Blasco, Deniz Gündüz and Mischa Dohler CTTC, Barcelona, Spain Emails:{pol.blasco, mischa.dohler}@cttc.es Imperial

More information

Stability Analysis of Slotted Aloha with Opportunistic RF Energy Harvesting

Stability Analysis of Slotted Aloha with Opportunistic RF Energy Harvesting 1 Stability Analysis of Slotted Aloha with Opportunistic RF Energy Harvesting Abdelrahman M.Ibrahim, Ozgur Ercetin, and Tamer ElBatt arxiv:151.6954v2 [cs.ni] 27 Jul 215 Abstract Energy harvesting (EH)

More information

Continuous-Model Communication Complexity with Application in Distributed Resource Allocation in Wireless Ad hoc Networks

Continuous-Model Communication Complexity with Application in Distributed Resource Allocation in Wireless Ad hoc Networks Continuous-Model Communication Complexity with Application in Distributed Resource Allocation in Wireless Ad hoc Networks Husheng Li 1 and Huaiyu Dai 2 1 Department of Electrical Engineering and Computer

More information

250 (headphones list price) (speaker set s list price) 14 5 apply ( = 14 5-off-60 store coupons) 60 (shopping cart coupon) = 720.

250 (headphones list price) (speaker set s list price) 14 5 apply ( = 14 5-off-60 store coupons) 60 (shopping cart coupon) = 720. The Alibaba Global Mathematics Competition (Hangzhou 08) consists of 3 problems. Each consists of 3 questions: a, b, and c. This document includes answers for your reference. It is important to note that

More information

Reinforcement Learning. Introduction

Reinforcement Learning. Introduction Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control

More information

Call Completion Probability in Heterogeneous Networks with Energy Harvesting Base Stations

Call Completion Probability in Heterogeneous Networks with Energy Harvesting Base Stations Call Completion Probability in Heterogeneous Networks with Energy Harvesting Base Stations Craig Wang, Salman Durrani, Jing Guo and Xiangyun (Sean) Zhou Research School of Engineering, The Australian National

More information

An Adaptive Clustering Method for Model-free Reinforcement Learning

An Adaptive Clustering Method for Model-free Reinforcement Learning An Adaptive Clustering Method for Model-free Reinforcement Learning Andreas Matt and Georg Regensburger Institute of Mathematics University of Innsbruck, Austria {andreas.matt, georg.regensburger}@uibk.ac.at

More information

Simplex Algorithm for Countable-state Discounted Markov Decision Processes

Simplex Algorithm for Countable-state Discounted Markov Decision Processes Simplex Algorithm for Countable-state Discounted Markov Decision Processes Ilbin Lee Marina A. Epelman H. Edwin Romeijn Robert L. Smith November 16, 2014 Abstract We consider discounted Markov Decision

More information

Preference Elicitation for Sequential Decision Problems

Preference Elicitation for Sequential Decision Problems Preference Elicitation for Sequential Decision Problems Kevin Regan University of Toronto Introduction 2 Motivation Focus: Computational approaches to sequential decision making under uncertainty These

More information

Lecture 7: Wireless Power Transfer

Lecture 7: Wireless Power Transfer Advanced Topics on Wireless Ad Hoc Networks Lecture 7: Wireless Power Transfer Sotiris Nikoletseas Professor CEID - ETY Course 2017-2018 Sotiris Nikoletseas, Professor Wireless Power Transfer 1 / 61 Wireless

More information

Energy Harvesting Multiple Access Channel with Peak Temperature Constraints

Energy Harvesting Multiple Access Channel with Peak Temperature Constraints Energy Harvesting Multiple Access Channel with Peak Temperature Constraints Abdulrahman Baknina, Omur Ozel 2, and Sennur Ulukus Department of Electrical and Computer Engineering, University of Maryland,

More information

Artificial Intelligence & Sequential Decision Problems

Artificial Intelligence & Sequential Decision Problems Artificial Intelligence & Sequential Decision Problems (CIV6540 - Machine Learning for Civil Engineers) Professor: James-A. Goulet Département des génies civil, géologique et des mines Chapter 15 Goulet

More information

Optimum Repartition of Transport Capacities in the Logistic System using Dynamic Programming

Optimum Repartition of Transport Capacities in the Logistic System using Dynamic Programming Theoretical and Applied Economics Volume XVIII (011), No. 8(561), pp. 17-0 Optimum Repartition of Transport Capacities in the Logistic System using Dynamic Programming Gheorghe BĂŞANU Bucharest Academy

More information

Energy Efficient Transmission Strategies for Body Sensor Networks with Energy Harvesting

Energy Efficient Transmission Strategies for Body Sensor Networks with Energy Harvesting Energy Efficient Transmission Strategies for Body Sensor Networks with Energy Harvesting Alireza Seyedi Department of ECE, University of Rochester, Rochester, NY USA e-mail: alireza@ece.rochester.edu Biplab

More information

Exploiting Mobility in Cache-Assisted D2D Networks: Performance Analysis and Optimization

Exploiting Mobility in Cache-Assisted D2D Networks: Performance Analysis and Optimization 1 Exploiting Mobility in Cache-Assisted D2D Networks: Performance Analysis and Optimization Rui Wang, Jun Zhang, Senior Member, IEEE, S.H. Song, Member, IEEE, and arxiv:1806.04069v1 [cs.it] 11 Jun 2018

More information

Revenue Maximization in a Cloud Federation

Revenue Maximization in a Cloud Federation Revenue Maximization in a Cloud Federation Makhlouf Hadji and Djamal Zeghlache September 14th, 2015 IRT SystemX/ Telecom SudParis Makhlouf Hadji Outline of the presentation 01 Introduction 02 03 04 05

More information

Final Exam December 12, 2017

Final Exam December 12, 2017 Introduction to Artificial Intelligence CSE 473, Autumn 2017 Dieter Fox Final Exam December 12, 2017 Directions This exam has 7 problems with 111 points shown in the table below, and you have 110 minutes

More information

HDR - A Hysteresis-Driven Routing Algorithm for Energy Harvesting Tag Networks

HDR - A Hysteresis-Driven Routing Algorithm for Energy Harvesting Tag Networks HDR - A Hysteresis-Driven Routing Algorithm for Energy Harvesting Tag Networks Adrian Segall arxiv:1512.06997v1 [cs.ni] 22 Dec 2015 March 12, 2018 Abstract The work contains a first attempt to treat the

More information

Lecture 3: Markov Decision Processes

Lecture 3: Markov Decision Processes Lecture 3: Markov Decision Processes Joseph Modayil 1 Markov Processes 2 Markov Reward Processes 3 Markov Decision Processes 4 Extensions to MDPs Markov Processes Introduction Introduction to MDPs Markov

More information

On the Approximate Linear Programming Approach for Network Revenue Management Problems

On the Approximate Linear Programming Approach for Network Revenue Management Problems On the Approximate Linear Programming Approach for Network Revenue Management Problems Chaoxu Tong School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853,

More information

MS&E338 Reinforcement Learning Lecture 1 - April 2, Introduction

MS&E338 Reinforcement Learning Lecture 1 - April 2, Introduction MS&E338 Reinforcement Learning Lecture 1 - April 2, 2018 Introduction Lecturer: Ben Van Roy Scribe: Gabriel Maher 1 Reinforcement Learning Introduction In reinforcement learning (RL) we consider an agent

More information

Channel Probing in Communication Systems: Myopic Policies Are Not Always Optimal

Channel Probing in Communication Systems: Myopic Policies Are Not Always Optimal Channel Probing in Communication Systems: Myopic Policies Are Not Always Optimal Matthew Johnston, Eytan Modiano Laboratory for Information and Decision Systems Massachusetts Institute of Technology Cambridge,

More information

Performance of Wireless-Powered Sensor Transmission Considering Energy Cost of Sensing

Performance of Wireless-Powered Sensor Transmission Considering Energy Cost of Sensing Performance of Wireless-Powered Sensor Transmission Considering Energy Cost of Sensing Wanchun Liu, Xiangyun Zhou, Salman Durrani, Hani Mehrpouyan, Steven D. Blostein Research School of Engineering, College

More information

Optimal Power Allocation for Cognitive Radio under Primary User s Outage Loss Constraint

Optimal Power Allocation for Cognitive Radio under Primary User s Outage Loss Constraint This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE ICC 29 proceedings Optimal Power Allocation for Cognitive Radio

More information

Planning in Markov Decision Processes

Planning in Markov Decision Processes Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Planning in Markov Decision Processes Lecture 3, CMU 10703 Katerina Fragkiadaki Markov Decision Process (MDP) A Markov

More information

Energy minimization based Resource Scheduling for Strict Delay Constrained Wireless Communications

Energy minimization based Resource Scheduling for Strict Delay Constrained Wireless Communications Energy minimization based Resource Scheduling for Strict Delay Constrained Wireless Communications Ibrahim Fawaz 1,2, Philippe Ciblat 2, and Mireille Sarkiss 1 1 LIST, CEA, Communicating Systems Laboratory,

More information

Application-Level Scheduling with Deadline Constraints

Application-Level Scheduling with Deadline Constraints Application-Level Scheduling with Deadline Constraints 1 Huasen Wu, Xiaojun Lin, Xin Liu, and Youguang Zhang School of Electronic and Information Engineering, Beihang University, Beijing 100191, China

More information

ABSTRACT WIRELESS COMMUNICATIONS. criterion. Therefore, it is imperative to design advanced transmission schemes to

ABSTRACT WIRELESS COMMUNICATIONS. criterion. Therefore, it is imperative to design advanced transmission schemes to ABSTRACT Title of dissertation: DELAY MINIMIZATION IN ENERGY CONSTRAINED WIRELESS COMMUNICATIONS Jing Yang, Doctor of Philosophy, 2010 Dissertation directed by: Professor Şennur Ulukuş Department of Electrical

More information

Admission control schemes to provide class-level QoS in multiservice networks q

Admission control schemes to provide class-level QoS in multiservice networks q Computer Networks 35 (2001) 307±326 www.elsevier.com/locate/comnet Admission control schemes to provide class-level QoS in multiservice networks q Suresh Kalyanasundaram a,1, Edwin K.P. Chong b, Ness B.

More information

Open Loop Optimal Control of Base Station Activation for Green Networks

Open Loop Optimal Control of Base Station Activation for Green Networks Open Loop Optimal Control of Base Station Activation for Green etworks Sreenath Ramanath, Veeraruna Kavitha,2 and Eitan Altman IRIA, Sophia-Antipolis, France, 2 Universite d Avignon, Avignon, France Abstract

More information

Full-Duplex Cooperative Cognitive Radio Networks with Wireless Energy Harvesting

Full-Duplex Cooperative Cognitive Radio Networks with Wireless Energy Harvesting Full-Duplex Cooperative Cognitive Radio Networks with Wireless Energy Harvesting Rui Zhang, He Chen, Phee Lep Yeoh, Yonghui Li, and Branka Vucetic School of Electrical and Information Engineering, University

More information

Location Determination Technologies for Sensor Networks

Location Determination Technologies for Sensor Networks Location Determination Technologies for Sensor Networks Moustafa Youssef University of Maryland at College Park UMBC Talk March, 2007 Motivation Location is important: Determining the location of an event

More information

16.410/413 Principles of Autonomy and Decision Making

16.410/413 Principles of Autonomy and Decision Making 16.410/413 Principles of Autonomy and Decision Making Lecture 23: Markov Decision Processes Policy Iteration Emilio Frazzoli Aeronautics and Astronautics Massachusetts Institute of Technology December

More information

Distributed power allocation for D2D communications underlaying/overlaying OFDMA cellular networks

Distributed power allocation for D2D communications underlaying/overlaying OFDMA cellular networks Distributed power allocation for D2D communications underlaying/overlaying OFDMA cellular networks Marco Moretti, Andrea Abrardo Dipartimento di Ingegneria dell Informazione, University of Pisa, Italy

More information

Internet Monetization

Internet Monetization Internet Monetization March May, 2013 Discrete time Finite A decision process (MDP) is reward process with decisions. It models an environment in which all states are and time is divided into stages. Definition

More information

Distributed Joint Offloading Decision and Resource Allocation for Multi-User Mobile Edge Computing: A Game Theory Approach

Distributed Joint Offloading Decision and Resource Allocation for Multi-User Mobile Edge Computing: A Game Theory Approach Distributed Joint Offloading Decision and Resource Allocation for Multi-User Mobile Edge Computing: A Game Theory Approach Ning Li, Student Member, IEEE, Jose-Fernan Martinez-Ortega, Gregorio Rubio Abstract-

More information

MULTIPLE CHOICE QUESTIONS DECISION SCIENCE

MULTIPLE CHOICE QUESTIONS DECISION SCIENCE MULTIPLE CHOICE QUESTIONS DECISION SCIENCE 1. Decision Science approach is a. Multi-disciplinary b. Scientific c. Intuitive 2. For analyzing a problem, decision-makers should study a. Its qualitative aspects

More information

The Simplex and Policy Iteration Methods are Strongly Polynomial for the Markov Decision Problem with Fixed Discount

The Simplex and Policy Iteration Methods are Strongly Polynomial for the Markov Decision Problem with Fixed Discount The Simplex and Policy Iteration Methods are Strongly Polynomial for the Markov Decision Problem with Fixed Discount Yinyu Ye Department of Management Science and Engineering and Institute of Computational

More information

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu

More information

Stochastic Content-Centric Multicast Scheduling for Cache-Enabled Heterogeneous Cellular Networks

Stochastic Content-Centric Multicast Scheduling for Cache-Enabled Heterogeneous Cellular Networks 1 Stochastic Content-Centric Multicast Scheduling for Cache-Enabled Heterogeneous Cellular Networks Bo Zhou, Ying Cui, Member, IEEE, and Meixia Tao, Senior Member, IEEE Abstract Caching at small base stations

More information

Sequential Decisions

Sequential Decisions Sequential Decisions A Basic Theorem of (Bayesian) Expected Utility Theory: If you can postpone a terminal decision in order to observe, cost free, an experiment whose outcome might change your terminal

More information

Data Gathering and Personalized Broadcasting in Radio Grids with Interferences

Data Gathering and Personalized Broadcasting in Radio Grids with Interferences Data Gathering and Personalized Broadcasting in Radio Grids with Interferences Jean-Claude Bermond a,b,, Bi Li b,a,c, Nicolas Nisse b,a, Hervé Rivano d, Min-Li Yu e a Univ. Nice Sophia Antipolis, CNRS,

More information

HUB NETWORK DESIGN MODEL IN A COMPETITIVE ENVIRONMENT WITH FLOW THRESHOLD

HUB NETWORK DESIGN MODEL IN A COMPETITIVE ENVIRONMENT WITH FLOW THRESHOLD Journal of the Operations Research Society of Japan 2005, Vol. 48, No. 2, 158-171 2005 The Operations Research Society of Japan HUB NETWORK DESIGN MODEL IN A COMPETITIVE ENVIRONMENT WITH FLOW THRESHOLD

More information

Assortment Optimization under the Multinomial Logit Model with Nested Consideration Sets

Assortment Optimization under the Multinomial Logit Model with Nested Consideration Sets Assortment Optimization under the Multinomial Logit Model with Nested Consideration Sets Jacob Feldman School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853,

More information

Online Power Control Optimization for Wireless Transmission with Energy Harvesting and Storage

Online Power Control Optimization for Wireless Transmission with Energy Harvesting and Storage Online Power Control Optimization for Wireless Transmission with Energy Harvesting and Storage Fatemeh Amirnavaei, Student Member, IEEE and Min Dong, Senior Member, IEEE arxiv:606.046v2 [cs.it] 26 Feb

More information

Spectrum Sharing in RF-Powered Cognitive Radio Networks using Game Theory

Spectrum Sharing in RF-Powered Cognitive Radio Networks using Game Theory Spectrum Sharing in RF-owered Cognitive Radio Networks using Theory Yuanye Ma, He Henry Chen, Zihuai Lin, Branka Vucetic, Xu Li The University of Sydney, Sydney, Australia, Email: yuanye.ma, he.chen, zihuai.lin,

More information

Wireless Transmission with Energy Harvesting and Storage. Fatemeh Amirnavaei

Wireless Transmission with Energy Harvesting and Storage. Fatemeh Amirnavaei Wireless Transmission with Energy Harvesting and Storage by Fatemeh Amirnavaei A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in The Faculty of Engineering

More information

AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN

AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN A Thesis Presented to The Academic Faculty by Bryan Larish In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

More information

Online Scheduling for Energy Harvesting Broadcast Channels with Finite Battery

Online Scheduling for Energy Harvesting Broadcast Channels with Finite Battery Online Scheduling for Energy Harvesting Broadcast Channels with Finite Battery Abdulrahman Baknina Sennur Ulukus Department of Electrical and Computer Engineering University of Maryland, College Park,

More information

Final Exam December 12, 2017

Final Exam December 12, 2017 Introduction to Artificial Intelligence CSE 473, Autumn 2017 Dieter Fox Final Exam December 12, 2017 Directions This exam has 7 problems with 111 points shown in the table below, and you have 110 minutes

More information

Linear Programming in Matrix Form

Linear Programming in Matrix Form Linear Programming in Matrix Form Appendix B We first introduce matrix concepts in linear programming by developing a variation of the simplex method called the revised simplex method. This algorithm,

More information

Changes in the Spatial Distribution of Mobile Source Emissions due to the Interactions between Land-use and Regional Transportation Systems

Changes in the Spatial Distribution of Mobile Source Emissions due to the Interactions between Land-use and Regional Transportation Systems Changes in the Spatial Distribution of Mobile Source Emissions due to the Interactions between Land-use and Regional Transportation Systems A Framework for Analysis Urban Transportation Center University

More information

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides

More information

Today s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes

Today s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes Today s s Lecture Lecture 20: Learning -4 Review of Neural Networks Markov-Decision Processes Victor Lesser CMPSCI 683 Fall 2004 Reinforcement learning 2 Back-propagation Applicability of Neural Networks

More information

Optimal Energy Management Strategies in Wireless Data and Energy Cooperative Communications

Optimal Energy Management Strategies in Wireless Data and Energy Cooperative Communications 1 Optimal Energy Management Strategies in Wireless Data and Energy Cooperative Communications Jun Zhou, Xiaodai Dong, Senior Member, IEEE arxiv:1801.09166v1 [eess.sp] 28 Jan 2018 and Wu-Sheng Lu, Fellow,

More information

Supplementary Material to Resolving Policy Conflicts in Multi-Carrier Cellular Access

Supplementary Material to Resolving Policy Conflicts in Multi-Carrier Cellular Access Supplementary Material to Resolving Policy Conflicts in Multi-Carrier Cellular Access Proofs to the theoretical results in paper: Resolving Policy Conflicts in Multi-Carrier Cellular Access ZENGWEN YUAN,

More information

Chapter 5 A Modified Scheduling Algorithm for The FIP Fieldbus System

Chapter 5 A Modified Scheduling Algorithm for The FIP Fieldbus System Chapter 5 A Modified Scheduling Algorithm for The FIP Fieldbus System As we stated before FIP is one of the fieldbus systems, these systems usually consist of many control loops that communicate and interact

More information

Distributed Reinforcement Learning Based MAC Protocols for Autonomous Cognitive Secondary Users

Distributed Reinforcement Learning Based MAC Protocols for Autonomous Cognitive Secondary Users Distributed Reinforcement Learning Based MAC Protocols for Autonomous Cognitive Secondary Users Mario Bkassiny and Sudharman K. Jayaweera Dept. of Electrical and Computer Engineering University of New

More information

1 Markov decision processes

1 Markov decision processes 2.997 Decision-Making in Large-Scale Systems February 4 MI, Spring 2004 Handout #1 Lecture Note 1 1 Markov decision processes In this class we will study discrete-time stochastic systems. We can describe

More information

Optimal Sleeping Mechanism for Multiple Servers with MMPP-Based Bursty Traffic Arrival

Optimal Sleeping Mechanism for Multiple Servers with MMPP-Based Bursty Traffic Arrival 1 Optimal Sleeping Mechanism for Multiple Servers with MMPP-Based Bursty Traffic Arrival Zhiyuan Jiang, Bhaskar Krishnamachari, Sheng Zhou, arxiv:1711.07912v1 [cs.it] 21 Nov 2017 Zhisheng Niu, Fellow,

More information