Learning Relaying Strategies in Cellular D2D Networks with Token-Based Incentives

Size: px

Start display at page:

Download "Learning Relaying Strategies in Cellular D2D Networks with Token-Based Incentives"

Lora Angel Hubbard
5 years ago
Views:

1 Learning Relaying Sraegies in Cellular D2D Neworks wih Token-Based Incenives Nicholas Masronarde, Viral Pael Deparmen of Elecrical Engineering Sae Universiy of New York a Buffalo Buffalo, NY USA Jie Xu, Mihaela van der Schaar Deparmen of Elecrical Engineering Universiy of California a Los Angeles Los Angeles, CA USA Absrac We consider a cellular nework where inelligen cellular devices owned by selfish users are incenivized o cooperae wih each oher by using okens, which hey exchange elecronically o buy and sell downlink relay services, hereby increasing he nework s capaciy. We endow each device wih he abiliy o learn is opimal cooperaion sraegy online in order o maximize is long-erm uiliy in he dynamic nework environmen. We invesigae he impac of he oken exchange sysem on he overall downlink nework performance and he performance of individual devices in various deploymen scenarios involving mixures of high and low mobiliy users. Our resuls sugges ha devices have he greaes incenive o cooperae when he nework conains many highly mobile users (e.g., users in moor vehicles). Moreover, wihin he oken sysem, devices can effecively learn o cooperae online, and achieve over 20% higher hroughpu on average han wih direc ransmission alone, all while selfishly maximizing heir own uiliy. Keywords D2D cooperaive relaying; oken exchange sysem; online learning; LTE-Advanced I. INTRODUCTION Cooperaive device-o-device (D2D) relaying can increase he capaciy of cellular neworks [1] and has he poenial o significanly improve he performance of imporan applicaions such as muliuser wireless video sreaming [2][3]. For his reason, cooperaive communicaion and D2D echnologies are cenral o he LTE-Advanced sandard, which aims o address an anicipaed exponenial increase in mobile daa over he nex few years. However, mos exising work on cooperaive ransmission assumes ha all nework devices will ac as relays whenever requesed. Unforunaely, his assumpion does no hold when devices are cellphones/lapops/ables ha are owned by self-ineresed users, which aim o maximize heir own uiliy. In his siuaion, devices may no be willing o use heir limied baery energy o relay oher devices daa wihou any direc benefi o hemselves. Therefore, o simulae cooperaion among selfineresed devices, and realize he full poenial of cooperaive relaying, proper incenives mus be designed ino he sysem. One way o incenivize cooperaion is o inroduce a paymen o reward he self-ineresed devices when hey ac as relays and inroduce a charge o he devices ha receive help from relays. To his end, elecronic okens have been proposed [4][5]. The key idea is ha, each ime a device receives help from a relay, i pays he relay wih a oken, which he relay can use o ge relay service in he fuure. As long as he expeced fuure benefi of having an addiional oken ouweighs he immediae cos of relaying, hen a self-ineresed device will be willing o ac as a relay [5]. In his paper, we adop a oken-based approach similar o [5]. However, unlike in [5], where he focus is on designing oken-based incenive schemes from a sysem perspecive, in his paper, we invesigae how he devices can learn o adap heir wireless cooperaion and oken gahering sraegies online, a run-ime, based on heir experience. Specifically, we sudy how an individual user in he sysem can learn is opimal (bes response) cooperaion sraegy online, and how his sraegy impacs and is impaced by he oher heerogeneous devices in he nework, which are simulaneously learning. There is a lo of grea lieraure on cooperaive relaying in wireless and cellular neworks (e.g., [1]-[3][6][7][10][12]). However, hese works do no consider selfish users ha require incenives o cooperae, which is he focus of his paper. Imporanly, many of he echniques and soluions in he exising lieraure on cooperaive relaying can be implemened in conuncion wih our proposed framework. Our conribuions are as follows: We model each device s decision problem, i.e., wheher or no o relay, as a Markov decision process (MDP). The obecive of a device is o maximize is long-erm uiliy, which is defined as he difference beween he benefi i gains when i receives daa hrough a relay and he energy cos i incurs when i relays daa for anoher device. We propose a low complexiy learning algorihm ha enables each individual device o opimize is long-erm uiliy in he dynamic nework environmen. We sysemaically evaluae he sysem performance in various deploymen scenarios involving boh high mobiliy and low mobiliy users. Our simulaion resuls sugges ha devices have he greaes incenive o cooperae when he nework conains many highly mobile users (e.g., users in moor vehicles). The remainder of his paper is organized as follows. In Secion II, we presen he sysem model and formulae an individual device s opimizaion problem. In Secion III, we propose a simple and effecive algorihm ha an individual device can deploy o learn he opimal cooperaion sraegy online. In Secion IV, we presen our simulaion resuls. We conclude in Secion V /13/$ IEEE 163 1

2 II. SYSTEM MODEL A. Cooperaive Downlink Nework Model We consider a cellular nework comprising N mobile ransceiver devices. 1 We assume ha ime is sloed ino discree ime inervals indexed by N. A any given ime, a fracion of he devices need o receive daa from he base saion; however, some of hese devices may experience bad channel condiions (e.g., due o fading or shadowing), which will limi heir received daa rae. In his siuaion, inermediae devices can ac as device-o-device relays o help deliver he daa from he base saion o he desinaion device. We use an Amplify-and-Forward (AF) cooperaion scheme [12] and a simple algorihm for deciding when a device will reques a relay and which relay i will selec. Imporanly, he oken sysem and learning soluion proposed in his paper are no dependen on hese specific deails: in fac, hey can be applied o Decode-and-Forward (DF) cooperaion schemes and any oher relay selecion sraegies ha use one relay a a ime for each cooperaive link, e.g., [6][7]. 2 Oubound relay demand rae: Suppose ha in slo, device {1,, N} wans o receive daa from he base saion (device 0). We assume ha device has a arge,arge receive rae r ha is achievable a a arge Signal-o,arge Inerference-and-Noise Raio (SINR) Γ. Assuming ha 0 he source power is fixed a P, here is a non-zero probabiliy λ ha he device will no achieve is arge rae in slo : i.e., he ouage probabiliy 0,arge 0,arge Pr( r r Pr( λ = < ) = Γ <Γ ), (1) 0 where Γ is he SINR of he channel beween he base saion 0 and device, and r is he corresponding receive rae for device. If device is unable o receive is arge rae during slo, hen i will aemp o find a relay hrough which i can achieve is arge rae. For his reason, we refer o λ as device s insananeous oubound relay demand rae (ORDR). Noe ha a device s insananeous ORDR is unknown a priori because i depends on he device s geographic locaion, is disance from he neares base saion, and he underlying nework condiions. Moreover, since devices are mobile, heir insananeous ORDRs are ime-varying. For convenience, in he remainder of his secion, we model device s ORDR as a fixed value λ ; however, in Secion III, we assume ha i is ime-varying and propose a learning algorihm ha enables he device o dynamically adap as is ORDR changes over ime. Illusraive relay selecion sraegy: If device is unable o receive is arge rae during slo, hen i aemps o find a relay hrough which i can achieve is arge rae. Following sandard relay channel analysis for AF cooperaion [12], we obain he following received SINR over he cooperaive link: 1 The proposed soluion can also be applied o an ad-hoc nework where here is no base saion. 2 The sysem can be exended o work wih cooperaion schemes ha use muliple relays, e.g., hose based on randomized space-ime block coding [10]. 0i i 0i ΓΓ 0i i 1 Γ = Γ +Γ +, (2) where i {1,, N} denoes he relay device and Γ and are he SINRs of he source-relay and relay-desinaion channels, respecively. Assuming ha he relay ransmission i power is P, device selecs he relay i ha can mee he arge SINR while using he leas power: i.e., i i cell( ) i 0i,arg 0i i Γ = argmin { P : Γ Γ }, (3) where cell( ) is he se of devices ha are wihin range of he same base saion as device. If here is no relay for which 0i,arg Γ Γ ha is also willing o provide relay services, hen device will receive is daa direcly from he base 0, arg saion a he rae r < r. B. Token Sysem Providing a relay ransmission coss he relay energy wihou providing i any immediae benefi. Consequenly, wihou proper incenives, no ransceiver will be willing o cooperae. To overcome his, we incenivize devices o cooperae hrough he use of okens, which hey exchange elecronically o buy and sell relay services. Therefore, in order for a device o reques relay service, i mus have a oken available o buy he service from anoher device. Afer a device places a relay reques, a relay replies wih eiher a posiive or negaive acknowledgemen (ACK/NACK) [5] indicaing if i is willing o relay or no (we describe how his decision is made in Secion II.D). If he device s reques is ACK d, hen a oken ransfer akes place (from he requesing device o he relaying device) and relay ransmission occurs. Several echniques ha enable secure elecronic oken ransacions have been proposed [4][11], which require no cenral eniy. Our work assumes ha such echnology is in place o implemen he proposed oken exchange sysem. C. Model of a Nework Device We now describe our model of device {1,, N}. Token holding sae: A a given ime, device holds k K = {0,1,..., T} okens, where T is he oal number of okens in he nework. Baery sae/energy budge: Mobile devices have limied baery energy o suppor heir operaions. We le p max represen he energy ha he user allocaes o relaying daa for oher users. p max can be se based on user preferences. Wih each relay ransmission ha device provides, i expends some energy and herefore reduces is energy budge p [0, pmax ]. When is energy budge reaches 0, he device can neiher relay nor ask for relay service. We refer o p = 0 as he dead sae. In our problem formulaion, we assume ha p is coninuous; however, we quanize p in our simulaion resuls o reduce he implemenaion complexiy. Noe ha he budge is for energy spen relaying; hence, i is no decreased when ransmiing or receiving he device s own daa

3 Oubound Relay Reques Inbound Relay Reques Receiver Candidae Relay Oubound Requesλ Device (Receiver) Inbound Reques µ ACK σ ( k, p ) (a) ACK e (c) Device (Candidae Relay) Receiver Relay Device (Receiver) ( k, p ) ( k 1, p) Device (Relay) ( k, p ) ( k + 1, p c) (b) (d) Fig. 1. Illusraion of oubound and inbound relay requess in cellular nework wih cooperaive downlink ransmission. (a) Device requess a relay wih probabiliy λ and is reques is ACK d wih probabiliy e. (b) Afer is reques is ACK d, device ges help from he relay in exchange for one oken. (c) Device receives a relay reques wih probabiliy µ and ACK s he reques based on is cooperaion sraegy σ ( k, p ). (d) Afer i ACK s he reques, device acs as a relay and expends c unis of energy in exchange for one oken. Oubound relay demand rae (ORDR): λ denoes he h device s ORDR as defined in (1). I should be inerpreed as he probabiliy ha device wans o use a relay o receive is daa from he base saion (i is no he probabiliy ha he device acually ges help from a relay). A device s ORDR is unknown a priori for he reasons described in Secion II.A. The definiion of he ORDR is illusraed in Fig. 1(a). Inbound relay demand rae (IRDR): Le µ denoe he probabiliy ha device is asked o relay daa for anoher device. A device s IRDR is unknown a priori because i depends on he device s geographic locaion, is disance from he neares base saion, he locaions of oher devices in he nework, and he underlying nework condiions. The definiion of he IRDR is illusraed in Fig. 1(c). Relay recruimen efficiency: Le e denoe he h device s relay recruimen efficiency, which is defined as he following condiional probabiliy: e 0,arg k > p = Pr(ACK recv'd Γ <Γ, 0, > 0). (4) In words, e is he condiional probabiliy ha device ges help from a relay (i.e., receives an ACK), given ha i requires a relay ransmission, has enough okens o pay a relay, and has non-zero baery energy. I follows ha, if k > 0 and p > 0, hen he probabiliy ha device acually receives an ACK from a relay in a ime slo can be wrien as λ e. A device s relay recruimen efficiency is unknown a priori because i depends on he device s geographic locaion, is disance from he neares base saion, he locaions of oher devices in he nework, he underlying nework condiions, and he cooperaion sraegies, oken holdings, and baery saes of he oher devices. The definiion of he relay recruimen efficiency is illusraed in Fig. 1(a). Cos and benefi: Le b ( r ) be he benefi gained by device when i receives daa hrough a relay a daa rae r and le c be he cos incurred by device (i.e., energy cos) when i provides a relay ransmission o anoher device. Le b =E br ( ) be he expeced benefi over all possible ransmission raes. We normalize he coss o his expeced benefi (i.e., by dividing he cos by b ). Cooperaion acions: When he device receives a relay reques, i mus decide if i will ACK he reques. The acion se ( p A ) depends on he device s baery sae p : i.e., A ( p ) = {0,1} if p > 0 and A ( p ) = {0} if p = 0, where a = 1 A ( p ) (ACK) means ha he device is willing o ac as a relay and a = 0 A ( p ) (NACK) means ha i is no. 3 If he device is in he dead sae (i.e., p = 0 ), hen i canno ac as a relay (i.e., A (0) = {0} ). Cooperaion sraegy/policy: Le σ ( k, p ) A ( p ) denoe he device s cooperaion sraegy when i has k okens and is in power saep. Given is IRDR and cooperaion sraegy, he probabiliy ha device acually relays daa for anoher device in a ime slo is µσ ( k, p ). The role of he cooperaion sraegy is illusraed in Fig. 1(c). Sae evoluion: Denoe he device s sae by s = ( k, p ) S, where k K = {0,1,..., T} is is oken holding sae and p [0, p ] is is baery sae/energy max 3 More precisely, A ( p ) = {0,1} if p > c, where c is he amoun of energy required o relay. However, o simplify he model descripion, we do no explicily wrie his. This mismach beween model and realiy is largely insignifican because i only changes he behavior of he sysem in he firs ime slo when p < c

4 budge. When he device acs as a relay, i gains one oken and loses c unis of baery energy: i.e., ( k, p ) ( k + 1, p c ). When he device uses a relay, i loses one oken and remains in he same baery sae: ( k, p ) ( k 1, p ). The evoluion of he sae is illusraed in Fig. 1(b,d). Le ([, P k p ] [ k, p ], a ) denoe he sae ransiion probabiliy funcion, which gives he probabiliy of ransiioning from sae s = ( k, p ) o sae s = ( k, p ) afer aking cooperaion decision a. The sae ransiion probabiliy funcion is defined as follows: P ([0, p ] [0, p ], a ) = 1 µ a, P ([1, p c ] [0, p ], a ) = µ a, P ([ k 1, p ] [ k, p ], a ) = λe, P ([ k, p ] [ k, p ], a ) = 1 λ e µ a, P ([ k + 1, p c ] [ k, p ], a ) = µ a, P ([ k,0] [ k,0], a ) = 1, where a A ( p ) and ([, Ρ k p ] [ k, p ], a ) = 0 under all cases ha are no included in (5). The firs and second lines of (5) indicae ha if device has zero okens and non-zero baery energy, hen i remains in ha sae wih probabiliy 1 µa (i.e., i does no provide help as a relay) or gains one oken and loses c unis of baery energy wih probabiliy µ a (i.e., i provides help as a relay), respecively. The hird, fourh, and fifh lines of (5) indicae ha if device has nonzero okens and non-zero baery energy, hen i ges help in exchange for a oken wih probabiliy λ e, remains in he same sae wih probabiliy 1 λ e µ a (i.e., i neiher ges help nor provides help), and gains one oken and loses c unis of baery energy wih probabiliy µ a (i.e., i provides help), respecively. The final line of (5) indicaes ha device canno gain or lose okens if is baery is dead (i.e., p = 0 ). Expeced uiliy: Le u ( k, p, a ) denoe he expeced uiliy of being in sae s = ( k, p ) and aking cooperaion acion a : specifically, µ ac, ifk = 0, p > 0 u ( k, p, a ) = λ eb µ ac, if k > 0, p > 0 (6) 0, oherwise. In (6), when a device has non-zero okens (i.e., k > 0 ), i eiher ges help from a relay wih probabiliy λ e and receives benefi b or provides relay service wih probabiliy µ a and incurs cos c. If he device does no have any okens (i.e., k = 0 ), i canno reques help bu can provide help; herefore, i incurs cos c wih probabiliyµ a. If he device is in a dead sae (i.e., p = 0 ), i canno seek or provide help. (5) D. Opimal Collaboraion Sraegy for a Single Device We formulae he problem of deermining a device s opimal cooperaion sraegy as an MDP [8]. The opimal sae value funcion V ( k, p ) represens how good i is o be in each sae. I is given by he following Bellman equaion: { (,, ) β ( k, )] } ( p ) V ( k, p ) = max u k p a + E [ V p a A, (7) Where E [ ] denoes an expecaion over he nex sae disribuion defined in (5) and β is an economic discoun facor reflecing he fac ha having a oken now is more valuable han having one in he fuure. The key idea in (7) is ha he opimal cooperaion decision in sae s = ( k, p ) no only depends on he immediae uiliy, bu also on he expeced fuure uiliy. Indeed, a long-erm opimizaion is required because, if he device only considers is immediae uiliy, i will never choose o ac as a relay (because i will incur a cos c wihou any direc benefi o iself). The opimal cooperaion sraegy/policy σ ( k, p ) can be deermined by aking he argumen ha maximizes (7). Assuming ha he ransiion probabiliy funcion and uiliy funcion are known a priori, he opimal value funcion can be compued using he well-known value ieraion algorihm [8]. In pracice, however, he cos and ransiion probabiliy funcions are unknown a priori (due o he fac ha he ORDR λ, IRDR µ, and relay recruimen efficiency e are unknown), so device canno direcly apply value ieraion o find he opimal policy. Insead, i mus learn is opimal cooperaion sraegy online based on experience. We discuss our proposed learning soluion in Secion III. E. Balance of Oubound and Inbound Token Exchanges On average, a device will only be able o ge relay service as ofen as i provides relay service because i mus earn as many okens as i spends: specifically, he following condiion mus hold Pr(Device receives inbound ACK [-1 Token]) (8) = Pr(Device sends oubound ACK [+1 Token]) In his secion, we characerize his balance afer making several simplifying assumpions. Firs, we are only ineresed in he balance of oken exchanges while he device has non-zero baery energy, so we assume ha he power sae is non-zero and fixed. Second, since he ORDR λ, IRDR µ, and relay recruimen efficiency e are ime-varying, i is difficul o characerize he seady sae behavior; hence, we assume ha hese parameers are fixed. Under hese assumpions, (8) is equivalen o: λ e Pr( k > 0) = µ E[ σ ( k, p )], (9) 1 Token + 1 Token where he lef hand side is he probabiliy ha device uses a relay (and herefore spends one oken), he righ hand side is 166 4

he probabiliy ha i acs as a relay (and herefore earns one oken), Pr( k > 0) is he probabiliy ha i has a leas one oken, and E [ σ ( k, p )] is he probabiliy ha i ACK s an k inbound relay reques, where

The opimal cooperaion sraegy σ ( k, p ) is hreshold in he oken sae k and he hreshold depends on he baery sae p : specifically, 1, if k Kh( p ), σ ( k, p ) = (10) 0, oherwise, where he hreshold Kh ( p

5 he probabiliy ha i acs as a relay (and herefore earns one oken), Pr( k > 0) is he probabiliy ha i has a leas one oken, and E [ σ ( k, p )] is he probabiliy ha i ACK s an k inbound relay reques, where he expecaion is aken over is seady-sae oken holding disribuion. F. Threshold Sraegies Device s cooperaion decision depends on is oken and baery saes. The opimal cooperaion sraegy σ ( k, p ) is hreshold in he oken sae k and he hreshold depends on he baery sae p : specifically, 1, if k Kh( p ), σ ( k, p ) = (10) 0, oherwise, where he hreshold Kh ( p ) depends on he baery sae, and Kh ( p ) is non-decreasing in p. In oher words, as he device s baery drains, i will wan o conserve is energy and will herefore use a lower hreshold. A proof of his resul is lef as fuure work; however, i has been shown in [5] ha policies are hreshold when here is unlimied baery energy. A represenaive opimal cooperaion policy is illusraed in Fig. 2. For compuaional reasons (i.e., o compue he opimal policy wih value ieraion), we quanize he baery sae ino eleven bins (one represening he dead sae) and we limi he maximum number of okens ha he device can hold o 20. III. LEARNING THE OPTIMAL POLICY σ generaed using Fig. 2. Illusraive opimal cooperaion policy ( kp, ) β= 0.99, λ e= 0.45, µ= 0.45, and c / b= All devices in he cellular nework experience differen environmenal dynamics depending on heir geographic locaion in he nework, heir disance from he neares base saion, he underlying nework condiions, and, imporanly, he locaions, cooperaion sraegies, oken holdings, and power saes of he oher nework devices. These dynamics are capured by he a priori unknown and ime-varying ORDR λ, IRDR µ, and relay recruimen efficiency e, for all {1,, N}, as defined in Secion II.C. In his secion, we propose a hybrid offline/online learning algorihm ha a device can deploy o learn he opimal collaboraion sraegy σ ( k, p ) on-he-fly, despie hese parameers being unknown and ime-varying. In he online phase of he proposed algorihm, device esimaes several unknown parameers (namely, is ORDR λ, IRDR µ, relay recruimen efficiency e, and ransmission cos c ). Then, in each ime slo, device selecs is cooperaion sraegy based on he esimaed values. In principle, device could use hese esimaed values o populae he uiliy and ransiion probabiliy funcions defined in (6) and (5), respecively, and hen use value ieraion o compue he opimal cooperaion sraegy corresponding o he esimaed parameers; however, recompuing he opimal policy in every ime slo is compuaionally prohibiive. For his reason, we propose o firs compue a collecion of cooperaion policies offline, which correspond o a represenaive se of discreized nework parameers, and hen use a simple online able look-up o selec device s cooperaion sraegy in each ime slo. To reduce he size of he look-up able, insead of esimaing he ORDR λ and relay recruimen efficiency e independenly, we direcly esimae he oubound relay success rae π = λe [0,1 / 2]. 4 Addiionally, since he opimal policy is hreshold in he oken sae as described in Secion II.F, each policy can be represened compacly wih only one (hreshold) value per baery sae. We now describe he offline and online phases in more deail. 2 Offline phase: Le 1 X Π= { π, π,, π }, Μ= { µ, µ, µ }, and C = { c c,, c } be finie ses conaining represenaive values of he oubound relay success rae, IRDR, and relay cos, respecively. In he offline phase of he hybrid learning algorihm, we compue he collecion of, Y, Z x y z x y z policies { σ( kp, π, µ, c ) : π, µ, c } x y z Π Μ C. To compue he policy σ( kp, π, µ, c ), we firs populae he uiliy and ransiion probabiliy funcions defined in (6) and (5), respecively, wih x π, y µ, and z c (in place of π, µ, and c ). Subsequenly, we use value ieraion o compue he opimal cooperaion sraegy corresponding o he represenaive parameers. Online phase: In he algorihm s online porion, he device mainains run-ime esimaes of he oubound relay success rae π, denoed by π ˆ, and he IRDR µ, denoed by µ ˆ. These esimaes can be made using an exponenial average of successful relay ransmissions and incoming relay requess, respecively. Afer evaluaing he cos c required o provide a 4 The maximum oubound relay success rae is ½. This is because, on average, a device can only use a relay as ofen as i relays

6 relay ransmission in slo, he device follows he policy σ ( k, p f( π ˆ ), ( ˆ ), ( z gµ hc )), where f : [0,1 / 2] Π, g : [0,1] Μ, and h : [0,1] C map π ˆ, µ ˆ, and c o he neares values in Π, Μ, and C, respecively. IV. SIMULATION RESULTS In his secion, we presen our simulaion resuls. In Secion IV.A, we describe he simulaion seup. In Secion IV.B, we invesigae how, wihin he oken sysem, users mobiliy and baery saes impac heir own performance, he performance of oher users, and he overall nework performance. A. Cellular Nework Simulaion Seup We assume ha N = 1500 mobile ransceivers are uniformly and randomly disribued in a 10 km x 10 km square area consising of 100 cells wih size 1 km x 1 km. There is one base saion a he cener of each cell. In each ime slo, each ransceiver moves o a nearby locaion according o a random waypoin mobiliy model and needs o receive daa from he base saion in he corresponding cell wih probabiliy 0.3. We consider pah loss and shadow fading for he channel model such ha [9], P = P PL( d ) 10 αlog( d / d ) χ, (11) where rx x 0 0 P rx and P x are he receive and ransmi powers (in db), respecively, PLd ( 0) is he pah loss of he reference disance d 0, d is he disance beween he source and desinaion, α is he pah loss facor, and χ is a Gaussian disribued random variable represening he effec of shadow fading. We assume ha he maximum ransmission power of a ransceiver is 15 dbm, he channel bandwidh is 10 MHz, and he arge daa rae is 10 Mbps for all devices. If he arge daa rae canno be achieved wih he maximum power, hen he device requess a relay ransmission. Using he above parameer values, he average ORDR hroughou he nework is approximaely λ= 0.1. The required relay ransmission power is calculaed as he minimum power in he opimizaion defined in (3). The (normalized) cos c is he raio of he relay ransmission power (in mw) o he expeced benefi E br ( ) of achieving rae r. All devices adap heir cooperaion sraegy using he hybrid offline/online learning algorihm. Finally, we assume ha here are a oal of T= 9000 okens in he nework ha are uniformly and randomly disribued among he users a he sar of he simulaion. B. Impac of Mobiliy In his secion, we sudy how each user s mobiliy impacs is own performance, he performance of oher users, and he overall nework performance. We use he simulaion es-bed described in Secion IV.A and assume ha all users deploy he hybrid offline/online learning algorihm wih x π {0.05, 0.1,, 0.45} =Π, y µ {0.05, 0.1,, 0.45} =Μ, and c {0.05, 0.10,, 0.95} =C o compue he collecion of policies offline.. We consider wo classes of users: 1. High mobiliy users move a speeds beween 50 and 120 km/hour. For insance, users in moor vehicles on roads or highways are considered high mobiliy users. These users play differen roles in he nework over ime because someimes hey will be far from he base saion and have a high ORDR, and someimes hey will be closer o he base saion where hey will have a high IRDR. 2. Low mobiliy users move a speeds beween 0 and 8 km per hour. For example, users in offices, resaurans, or on foo are considered o be low mobiliy users. Due o heir limied mobiliy, hese users ypically do no swich roles over ime and heir ORDR and IRDR are relaively saic. Since cellular neworks usually conain boh high mobiliy and low mobiliy users, we now invesigae how differen mobiliy mixures impac he nework performance. We consider five mixures in which he nework comprises 10%, 30%, 50%, 70%, and 90% high mobiliy users, and he remaining users have low mobiliy. In oal, here are N = 1500 users. We assume ha all users sar in he maximum baery sae, which is defined such ha each user can ACK an average of 1000 relay requess before enering he dead sae. In Fig. 3, we plo he average ORDR λ, average IRDR µ, average modified relay recruimen efficiency e Pr( k> 0, p> 0) 5, and average hroughpu gain 6 over 3000 ime slos for users in each mobiliy class and for all users combined. The lower and upper error bars in Fig. 3 indicae he 25 h and 75 h perceniles, respecively, over he class of users. ORDR and IRDR: Fig. 3(a) and Fig. 3(b) illusrae he average ORDR and average IRDR, respecively, under he various user mobiliy mixures. Recall ha, a he beginning of he simulaion, users are uniformly disribued hroughou he nework. Since he low mobiliy users do no significanly deviae from heir saring posiions, and he high mobiliy users move many places hroughou he nework, he average ORDR and IRDR for users in each class is approximaely he same; however, he variaion of ORDRs and IRDRs across users wihin each class varies significanly. In paricular, he low mobiliy users have a much larger range of ORDRs and IRDRs because if hey are a he periphery of a cell, hen hey will have large ORDRs and small IRDRs, and if hey are in he core of a cell, hen hey will have small ORDRs and large IRDRs. In conras, he high mobiliy users move beween he peripheries and cores of various cells over ime, and herefore experience less deviaion in hese parameers across he populaion. Modified relay recruimen efficiency and ACK rae: Fig. 3(c) illusraes he average modified relay recruimen efficiency under he various user mobiliy mixures. The device s modified relay recruimen efficiency 5 The device s modified relay recruimen efficiency is he probabiliy ha is oubound requess are ACK d given ha a relay is required [see (4)]. 6 The hroughpu gain is he raio of he acual hroughpu o he direc ransmission hroughpu. Since he acual hroughpu is always greaer han or equal o he direc hroughpu, he minimum hroughpu gain is

e Pr( k> 0, p> 0) is he condiional probabiliy ha is oubound requess are ACK d given ha a relay is required [see (4)]. Imporanly, his parameer is inimaely ied o each user s ORDR and IRDR.

These effecs emerge because low mobiliy users have srongly imbalanced ORDRs and IRDRs unless hey are lucky enough o be siuaed beween he core and periphery of a cell.

Wihou okens, i canno recrui relays, which reduces is modified relay recruimen efficiency. Noe ha his also hurs oher users because i reduces opporuniies o earn okens.

7 e Pr( k> 0, p> 0) is he condiional probabiliy ha is oubound requess are ACK d given ha a relay is required [see (4)]. Imporanly, his parameer is inimaely ied o each user s ORDR and IRDR. On average, low mobiliy users have lower modified relay recruimen efficiencies and more variaion of hese parameers across he populaion. These effecs emerge because low mobiliy users have srongly imbalanced ORDRs and IRDRs unless hey are lucky enough o be siuaed beween he core and periphery of a cell. If a device s ORDR is larger han is IRDR, i.e., λ > µ, hen i ends o run ou of okens (i.e., Pr( k > 0) in (9) will be small). Wihou okens, i canno recrui relays, which reduces is modified relay recruimen efficiency. Noe ha his also hurs oher users because i reduces opporuniies o earn okens. On he oher hand, if a device s ORDR is smaller han is IRDR, i.e., λ < µ, hen i will end o collec a surplus of okens and no have any incenive o ACK incoming relay requess, which in urn reduces he relay recruimen efficiencies of he oher devices in he nework. Noe ha he ACK rae, i.e., he probabiliy ha a device ACKs an inbound reques [he expecaion E [ σ ( k, p )] in (9)], is direcly proporional o he modified relay recruimen efficiency as shown in (9); consequenly, we omi i from Fig. 3. Throughpu gain relaive o direc ransmission: Fig. 3(d) illusraes he average hroughpu gain under he various user mobiliy mixures. Clearly, having a higher fracion of high mobiliy users in he nework improves he average nework hroughpu (relaive o direc ransmission only) because hese users play differen roles in he nework over ime, which enables a coninuous exchange of okens beween devices in he periphery and core of each cell. [2] O. Alay, P. Liu, Y. Wang, E. Erkip, and S. S. Panwar, Cooperaive layered video mulicas using randomized disribued space ime codes, IEEE Trans. on Mulimedia, vol. 13, no. 5, pp , Oc [3] N. Masronarde, F. Verde, D. Darsena, A. Scaglione, and M. van der Schaar, Transmiing imporan bis and sailing high radio waves: a decenralized cross-layer approach o cooperaive video ransmission, IEEE J. on Selec. Areas in Comm. Cooperaive Neworking Challenges and Applicaions, vol. 30, no. 9, pp , Oc [4] L. Buyan and J.-P. Hubaux, Nugles: a virual currency o simulae cooperaion in self-organized mobile ad hoc neworks, EPFL echinical repor, Jan [5] J. Xu and M. van der Schaar, Token sysem design for auonomic wireless relay neworks, IEEE Trans on Communicaions, o appear, [6] A. S. Ibrahim, A. K. Sadek, W. Su, and K. J. R. Liu, Cooperaive communicaions wih relay-selecion: when o cooperae and whom o cooperae wih?, IEEE Trans. on Wireless Communicaions, vol. 7, no. 7, pp , July [7] R. Madan, N. Meha, A. Molisch, and J. Zhang, Energy-efficien cooperaive relaying over fading channels wih simple relay selecion, IEEE Trans. on Wireless Communicaions, vol. 7, no. 8, pp , Aug [8] M. L. Puerman, Markov Decision Processes: Discree Sochasic Dynamic Programming. New York: John Wiley & Sons, [9] R. Jain, Channel models: a uorial, Available online: hp:// [10] B. Sirkeci-Mergen and A. Scaglione, Randomized space-ime coding for disribued cooperaive communicaion, IEEE Trans. on Signal Processing, vol. 55, pp , Oc [11] M. Franklin and M. Reier, Fair exchange wih a semi-rused hird pary, ACM conference on Compuer and Communicaion Securiy, [12] Y. Zhao, R. Adve, T. J. Lim, Improving amplify-and-forward relay neworks: opimal power allocaion versus selecion, IEEE Trans. Wireless Commun., vol. 6, no. 8, V. CONCLUSION Cooperaive communicaion and D2D echnologies are cenral o he LTE-Advanced sandard. However, hese echnologies assume ha users are willing o share heir limied baery energy o relay daa for oher users, even when doing so may reduce heir uiliy. In his paper, we use oken exchanges o provide self-ineresed users wih incenives o relay daa for oher users. We formulae each device s decision problem, i.e., wheher or no o relay, as a Markov decision process. We propose a simple and effecive learning algorihm ha a device can deploy o learn is opimal cooperaion sraegy online based on is experience. We evaluae he nework performance in various deploymen scenarios involving boh high mobiliy and low mobiliy users. Our simulaion resuls indicae ha individual devices have he greaes incenive o cooperae when he nework conains many highly mobile users (e.g., users in moor vehicles). REFERENCES [1] T. C.-Y Ng and W. Yu, Join opimizaion of relay sraegies and resource allocaions in cooperaive cellular neworks, IEEE Trans. on Selec. Areas in Comm., vol. 25, no. 2, Feb Fig. 3. Impac of differen mobiliy mixures on differen user classes. (a) Average ORDR λ. (b) Average IRDR µ. (c) Average modified relay recruimen efficiency e Pr( k> 0, p> 0). (d) Average hroughpu gain

Vehicle Arrival Models : Headway

Chaper 12 Vehicle Arrival Models : Headway 12.1 Inroducion Modelling arrival of vehicle a secion of road is an imporan sep in raffic flow modelling. I has imporan applicaion in raffic flow simulaion where