Author(s): Asheralieva, Alia; Miyanaga, Yoshikazu. Citation: Mobile Information Systems, 2016. Issue Date: 2016.


Title: Dynamic Resource Allocation with Integrated Reinforcement Learning for a D2D-Enabled LTE-A Network with Access to Unlicensed Band
Author(s): Asheralieva, Alia; Miyanaga, Yoshikazu
Citation: Mobile Information Systems, 2016
Issue Date: 2016
Doc URL: http://hdl.handle.net/25/64588
Rights (URL): https://creativecommons.org/licenses/by/4.0/
Type: article
File Information: pdf
Hokkaido University Collection of Scholarly and Academic Papers

Mobile Information Systems, Volume 2016, 18 pages
http://dx.doi.org/10.1155/2016/

Research Article

Dynamic Resource Allocation with Integrated Reinforcement Learning for a D2D-Enabled LTE-A Network with Access to Unlicensed Band

Alia Asheralieva and Yoshikazu Miyanaga

Laboratory of Information Communication Networks, School of Information Science and Technology, Hokkaido University, Sapporo, Japan

Correspondence should be addressed to Alia Asheralieva; aasheralieva@gmail.com

Received 30 May 2016; Revised 8 September 2016; Accepted 6 October 2016

Academic Editor: Juan C. Cano

Copyright © 2016 A. Asheralieva and Y. Miyanaga. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We propose a dynamic resource allocation algorithm for device-to-device (D2D) communication underlying a Long Term Evolution Advanced (LTE-A) network, with reinforcement learning (RL) applied for unlicensed channel allocation. In the considered system, the inband and outband resources are assigned by the LTE evolved NodeB (eNB) to different device pairs to maximize the network utility subject to the target signal-to-interference-and-noise ratio (SINR) constraints. Because of the absence of an established control link between the unlicensed and cellular radio interfaces, the eNB cannot acquire any information about the quality and availability of unlicensed channels. As a result, the considered problem becomes a stochastic optimization problem that can be dealt with by deploying a learning theory (to estimate the random unlicensed channel environment). Consequently, we formulate the outband D2D access as a dynamic single-player game in which the player (eNB) estimates its possible strategy and expected utility for all of its actions based only on its own local observations, using a joint utility and strategy estimation based reinforcement learning (JUSTE-RL) with regret algorithm.
The proposed approach for resource allocation demonstrates near-optimal performance after a small number of RL iterations and surpasses the other comparable methods in terms of energy efficiency and throughput maximization.

1. Introduction

D2D communication is a direct communication between users transmitting over the cellular spectrum (inband) or operating on an unlicensed band (i.e., outband). The main advantages of inband D2D communication are the increased spectrum efficiency and the possibility of quality of service (QoS) provisioning for different cellular/D2D users. The chief obstacles to the implementation of inband D2D access are (i) interference mitigation (between the users transmitting over the same frequency bands) and (ii) resource allocation [1]. Effective resource allocation and interference management strategies can significantly improve the performance of cellular networks. The objectives here could be different (such as improvement of spectrum efficiency, cellular coverage, network throughput, or user experience) but, to achieve the optimal system performance, the problems of cellular/D2D mode selection, spectrum assignment, power allocation, and interference mitigation should be considered jointly in the algorithm design. Related contributions in this area are [2–10], which study the problem of interference mitigation for underlying D2D communication. It should be noted, however, that the majority of proposed formulations (except [2, 3]) do not deal with the issues of mode selection, spectrum assignment, and interference management in a joint fashion but rather by splitting the original problem into smaller subproblems (see, e.g., [10]) or by separating the time scales of these subproblems (e.g., [9]). Hence, although the complexity of such methods is less than the complexity of a joint resource allocation, their efficiency in maximizing a given optimality criterion is clearly downgraded.

Outband D2D communication (carried over Wi-Fi Direct [11], ZigBee [12], or Bluetooth [13]) eliminates the need for interference mitigation but can be distorted by

the randomness of unlicensed channels. Existing works on outband D2D access focus on such issues as power consumption (e.g., [14–17]) and coordination between cellular and wireless interfaces ([18–21]). Some of these works ([14, 15, 21]) suggest control of the unlicensed band by the cellular network (which requires a certain amount of cooperation and information exchange between different radio interfaces). Other works (e.g., [17, 18, 20]) imply autonomous operation of D2D devices (based on stochastic modeling of unlicensed channels).

The main contributions of this work are as follows. We consider a network-controlled D2D communication in which the licensed and unlicensed spectrum resources, user modes, and transmission power levels are allocated to different device pairs by the LTE eNB to maximize the overall network utility. We consider a general network deployment scenario where the unlicensed band is assumed to be provided by one or more radio access technologies (RATs) based on orthogonal frequency division multiple access (OFDMA), carrier sense multiple access with collision avoidance (CSMA/CA), frequency-hopping code division multiple access (FH-CDMA), or any other multiple access method. It is assumed that all device pairs are equipped with different wireless interfaces allowing them to connect to the appropriate RAT and use CSMA/CA to avoid collisions when operating on the unlicensed band. Hence, each unlicensed channel becomes available to a D2D pair only when it is idle. Unlike many previous works, we jointly solve the problems of inband/outband access, mode selection, and spectrum/power assignment by combining these problems into one optimization problem, which allows the inband network resources to be allocated and the D2D traffic to be offloaded in the most effective way (in terms of maximizing the overall network utility). Note that the formulated problem can be solved to optimality only if the global channel and network knowledge (including the precise information on the operating conditions of the licensed and unlicensed channels) is available to the eNB.
However, because of the absence of an established control link between the unlicensed and cellular radio interfaces, the eNB cannot get any information about the quality and availability of the unlicensed channels. As a result, the considered resource allocation problem becomes a stochastic optimization problem that can be dealt with by deploying a learning theory [22] (to estimate the random unlicensed channel environment). Consequently, we formulate the outband D2D access as a dynamic single-player game in which the player (eNB) estimates its possible strategy and expected utility for all of its actions based only on its own local observations, using JUSTE-RL with regret (originally proposed in [23]).

The main idea behind RL is that the actions leading to a higher network utility at the current stage should be granted higher probabilities at the next stage [22]. In the simplest form of RL (described, e.g., in [24]), a learning agent estimates its best strategy based on its observed utility without any prior information about its operating environment. This form of RL requires only algebraic operations, but its convergence to the equilibrium state is not guaranteed [25]. In Q-learning [22], the utility is estimated using some value-action function. This RL method converges to a Nash equilibrium (NE) state. However, it requires maximization of the action-value at every stage, which can be computationally demanding [22]. In the JUSTE-RL algorithm (described in detail in [23]), a learning agent estimates not only its own strategy but also the expected utility for all of its actions. Unlike Q-learning, JUSTE-RL does not need to perform optimization of the action-value (since only algebraic operations are required to update the strategies) and, hence, it has a lower computational complexity. On the other hand, unlike a basic RL algorithm, JUSTE-RL converges to an ε-NE [23, 25].

It is worth mentioning that, in wireless communications, RL has been studied in the context of various spectrum access problems. In [26, 27], learning has been employed to minimize the interference (created by adjacent nodes) in partially overlapping channels.
This problem has been formulated as an exact potential graphical game admitting a pure-strategy NE and, therefore, the proposed approach is not realizable in a broader range of problems. A cognitive network with multiple players has been analyzed in [28]. In this work, the learning and channel selection have been separated into two different procedures, which increased the complexity of the proposed resource allocation approach. Besides, the stability of the final solution was not verified. A multi-player game for inband D2D access, where the players (D2D users) learn their optimal strategies based on the throughput performance in a stochastic environment, has been studied in [29]. It was assumed that each D2D user can transmit over the vacant cellular channels using CSMA/CA, implying that there are no channels with interfering users (i.e., each orthogonal channel can be occupied by at most one cellular/D2D user). Although the authors consider a scenario with two D2D users operating on the same channel, it is not clear how a D2D user can sense whether the user operating on the channel is cellular or D2D. An autonomous D2D access in heterogeneous cellular networks comprising multiple low-power and high-power BSs with (possibly) overlapping spectrum bands has been investigated in [30]. This problem has been modeled as a stochastic noncooperative game with multiple players (D2D pairs) admitting a mixed-strategy NE. The goal of each player was to jointly select the wireless channel and power level to maximize its reward, defined as the difference between the achieved throughput and the cost of power consumption, constrained by the minimum tolerable SINR requirements of this D2D pair. To solve this problem, a fully autonomous multiagent Q-learning algorithm (which does not require any information exchange and/or cooperation among different users) is developed and implemented in an LTE-A network.

The rest of the paper is organized as follows. A general network model for inband and outband network operation is described in Section 2.
A general problem and the algorithms for unlicensed and licensed resource allocation are formulated in Section 3. The algorithm implementation, including the proposed resource allocation procedure in an LTE-A network and performance evaluation, is presented in Section 4. The paper is finalized in the Conclusion.

2. Network Model

In this paper, the problem of resource allocation for D2D communication is investigated for both the uplink (UL) and downlink (DL) directions. Accordingly, the discussion through the rest of the paper is applicable (if not stated otherwise) to either direction.

Consider a basic LTE-A network consisting of one eNB and $N$ user pairs, denoted PU$_1$, ..., PU$_N$, with $\mathcal{N} = \{1, \ldots, N\}$ being the set of user pair indices. It is assumed that a fixed licensed spectrum band of the eNB spans $K$ resource blocks (RBs), numbered RB$_1$, ..., RB$_K$, with $\mathcal{K} = \{1, \ldots, K\}$ denoting the set of RB indices comprising the bandwidth. The network runs on a slotted-time basis with the time axis partitioned into equal nonoverlapping time intervals (slots) of length $T_s$, with $t$ denoting an integer-valued slot index.

Each pair of users can communicate with each other either in the traditional cellular mode (CM) via the eNB or in a D2D mode (DM) without traversing the eNB. Let $\mathcal{C} \subseteq \mathcal{N}$ be the set of the indices of device pairs that can operate only in CM and let $\mathcal{D} = \mathcal{N} \setminus \mathcal{C}$ denote the set of the indices of potential D2D pairs. (The indices in $\mathcal{C}$ and $\mathcal{D}$ can be determined based on, e.g., the user application (such as video sharing, gaming, and proximity-aware social networking) in which the pair of devices could potentially be in range for direct communication. Such information can be acquired from a standard session initiation protocol (SIP) procedure, which handles the session setups and user arrivals in LTE networks. Interested readers are referred to [31] for a comprehensive description of an SIP procedure and its use in D2D access.) In our network, any potential D2D pair can be allocated either cellular or D2D mode (based on the results of the resource allocation procedure). Consequently, we define a binary mode allocation variable $c_n(t)$, $n \in \mathcal{N}$, equaling 1 if PU$_n$ is allocated CM at slot $t$, and 0 otherwise. Note that $c_n(t) = 1$ for all $n \in \mathcal{C}$.

Further, we consider the following models of D2D access.

(i) Inband D2D: a D2D pair operates within the licensed LTE spectrum in an underlay to cellular communication.
(ii) Outband D2D: a D2D pair transmits over the unlicensed band by exploiting other RATs, such as Wi-Fi Direct [11], ZigBee [12], or Bluetooth [13]. (It is assumed that all user devices are equipped with the corresponding wireless interfaces to be able to communicate using a suitable RAT.)

We assume that there is no coordination and/or information exchange between different wireless interfaces. To differentiate the pairs according to their D2D access, we define a binary channel access variable $b_n(t)$, $n \in \mathcal{N}$, equaling 1 if PU$_n$ operates inband at slot $t$, and 0 otherwise. Note that all cellular users can access only the LTE bands. Hence, $b_n(t) = 1$ for all $n \in \mathcal{C}$.

2.1. Inband Network Operation. In LTE/LTE-A, RBs are allocated to cellular users by the eNBs using a standard packet scheduling procedure [32]. The use of packet scheduling in a D2D-enabled LTE-A network is described in detail in [33]. In short, a packet scheduling process can be explained as follows. In the UL direction, at the beginning of any slot $t$, each user is required to collect and transmit its buffer status information. After collecting this data, a user sends the scheduling request (SR) with its buffer status information to the eNB via a dedicated physical uplink control channel (PUCCH). After receiving all the SRs, the eNB allocates the RBs to the users (according to a certain scheduling algorithm) and responds to all the SRs by sending the scheduling grants (SGs) together with the allocation information to the corresponding users via dedicated physical downlink control channels (PDCCHs) [33]. In the DL, the eNB readily finds out the DL buffer status for each user, allocates the RBs, and sends the SGs with allocation information via PDCCHs [33]. In the framework used in this paper, the above scheduling process is applied for both the cellular and D2D communication with some modifications (the corresponding resource allocation procedure will be described in Section 4).

Let us further define a binary RB allocation variable $a_n^k(t)$, $n \in \mathcal{N}$, $k \in \mathcal{K}$, equaling 1 if PU$_n$ is allocated RB$_k$ at slot $t$, and 0 otherwise. Each RB can be allocated to at most one cellular user.
Hence,

$$ \sum_{n \in \mathcal{N}} a_n^k(t)\, b_n(t)\, c_n(t) \le 1, \quad \forall k \in \mathcal{K}. \tag{1a} $$

The number of D2D users operating on the same RBs is unlimited. Additionally, to maximize the network utilization, we enforce each RB to be allocated to at least one user. That is,

$$ \sum_{n \in \mathcal{N}} a_n^k(t)\, b_n(t) \ge 1, \quad \forall k \in \mathcal{K}. \tag{1b} $$

Note that both the OFDMA used for DL transmissions and the single carrier frequency division multiple access (SC-FDMA) applied in the UL direction provide orthogonality of resource allocation to cellular communications. This allows achieving a minimal level of cochannel interference between the transmitter-receiver pairs located within one cell [34]. Thus, when information is transmitted by a cellular/D2D user, it will be distorted only by the users operating on the same RB(s).

Let $G_{nm}^k$, $n, m \in \mathcal{N}$, $k \in \mathcal{K}$, denote the channel gain coefficient between the transmitter of PU$_n$ and the receiver of PU$_m$ operating on RB$_k$ (for $n \in \mathcal{C}$, $G_{nn}^k$ indicates the channel gain coefficient between PU$_n$ operating on RB$_k$ and the eNB). In an LTE system, the instantaneous values of $G_{nm}^k$ can be obtained from the channel state information (CSI) through the use of special reference signals (RSs) [35] and, hence, they are known to the eNB and the users. Then, for any PU$_n$ operating on RB$_k$, the SINR at slot $t$ in the UL direction is described by

$$ \mathrm{SINR}_n^k(t) = \frac{a_n^k(t)\, p_n(t)\, G_{nn}^k}{\sum_{j \in \mathcal{N} \setminus \{n\}} a_n^k(t)\, a_j^k(t)\, p_j(t)\, G_{jn}^k + N_0}, \quad \forall n \in \mathcal{N},\ k \in \mathcal{K}, \tag{2} $$

where $N_0$ is the variance of zero-mean additive white Gaussian noise (AWGN) power and $p_n(t)$ is the transmission

power allocated to PU$_n$ at slot $t$. Clearly, $p_n(t)$ is nonnegative and cannot exceed some predefined maximal level $P_n^{\max}$; that is,

$$ 0 \le p_n(t) \le P_n^{\max}, \quad \forall n \in \mathcal{N}. \tag{3} $$

At any slot $t$, the inband service rate of PU$_n$ depends on the number of RBs allocated to this device pair and the SINR in each RB. That is,

$$ r_n^L(t) = \omega^L \sum_{k \in \mathcal{K}} a_n^k(t) \log\left(1 + \mathrm{SINR}_n^k(t)\right), \quad \forall n \in \mathcal{N}, \tag{4} $$

where $r_n^L(t)$ is the service rate of PU$_n$ (in bits per slot or bps) over the licensed (inband) spectrum and $\omega^L$ is the bandwidth of one LTE RB ($\omega^L = 180$ kHz).

2.2. Outband Network Operation. We consider $M$ separate outband wireless channels numbered, for notation consistency, as $C_{K+1}, \ldots, C_{K+M}$. (In this paper, we consider a general scenario where the unlicensed outband access can be based on OFDMA, CSMA/CA (in the case of Wi-Fi Direct), FH-CDMA (in the case of Bluetooth), or any other multiple access method.) We denote by $\mathcal{M} = \{K+1, \ldots, K+M\}$ the set of channel indices within the unlicensed band and use a binary channel allocation variable $a_n^m(t)$, $n \in \mathcal{N}$, $m \in \mathcal{M}$, to indicate whether PU$_n$ is allocated the unlicensed channel $C_m$ (in which case $a_n^m(t) = 1$) or not ($a_n^m(t) = 0$). Note that $a_n^m(t) = 0$ for all $n \in \mathcal{C}$ and $m \in \mathcal{M}$ (since cellular users can access only the LTE bands). For $n \in \mathcal{D}$, $b_n(t)$ equals 0 if $\sum_{m \in \mathcal{M}} a_n^m(t) \ge 1$, and 1 otherwise (i.e., if $\sum_{m \in \mathcal{M}} a_n^m(t) = 0$). Hence,

$$ b_n(t) = \max\Big\{0,\ 1 - \sum_{m \in \mathcal{M}} a_n^m(t)\Big\}, \quad \forall n \in \mathcal{D}. \tag{5a} $$

To avoid collisions, the D2D pairs use a CSMA/CA method when operating outband. As a result, each unlicensed channel $C_m$ is available to D2D communication only when it is idle. Additionally, to reduce the possibility of collisions between D2D users, we assume that, at any slot $t$, at most one device pair can transmit over each unlicensed channel $C_m$. That is,

$$ \sum_{n \in \mathcal{D}} a_n^m(t) \le 1,\ \forall m \in \mathcal{M}; \qquad \sum_{n \in \mathcal{D}} \left(1 - b_n(t)\right) \le M. \tag{5b} $$

The transmission procedure for a pair of D2D users operating outband is described as follows. At the beginning of slot $t$, one of the users starts sensing the allocated unlicensed channel $C_m$ (for simplicity, we assume perfect sensing). If the channel is free, the transmission phase (of length $T_{tr}^m$, such that $0 \le T_{tr}^m \le T_s$) begins. Note that the duration of $T_{tr}^m$ is random.
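It depends on the availability of the channel $C_m$ and the applied CSMA/CA scheme. The probability density function (p.d.f.) of $T_{tr}^m$ is not calculated here (since it has no impact on the further analysis in this paper). An example of such calculations can be found in [36].

Let $G_{nn}^m$, for all $m \in \mathcal{M}$, denote the channel gain coefficient between the transmitter and receiver of PU$_n$ operating on unlicensed channel $C_m$. Then, the SINR of PU$_n$ transmitting over the channel $C_m$ at slot $t$ can be expressed by

$$ \mathrm{SINR}_n^m(t) = \frac{a_n^m(t)\, p_n(t)\, G_{nn}^m}{N_0}, \quad \forall n \in \mathcal{D},\ m \in \mathcal{M}, \tag{6a} $$

and the service rate of PU$_n$ over the unlicensed (outband) spectrum is described by

$$ r_n^U(t) = \sum_{m \in \mathcal{M}} \frac{T_{tr}^m}{T_s}\, \omega_m^U\, a_n^m(t) \log\left(1 + \mathrm{SINR}_n^m(t)\right), \quad \forall n \in \mathcal{D}, \tag{6b} $$

where $\omega_m^U$ is the bandwidth (in Hz) of unlicensed channel $C_m$. Note that neither the eNB nor the D2D users have prior information about the quality and availability of unlicensed channels. Therefore, the exact values of $G_{nn}^m$ and $T_{tr}^m$ are unknown to the eNB and the D2D users.

3. Resource Allocation Problem

3.1. Problem Statement. We define a binary $N \times K$-dimensional RB allocation matrix $\mathbf{a}_t^L$ and a binary $N \times M$-dimensional unlicensed channel allocation matrix $\mathbf{a}_t^U$ as

$$ \mathbf{a}_t^L = \begin{bmatrix} a_1^1(t) & \cdots & a_1^K(t) \\ \vdots & \ddots & \vdots \\ a_N^1(t) & \cdots & a_N^K(t) \end{bmatrix}, \qquad \mathbf{a}_t^U = \begin{bmatrix} a_1^{K+1}(t) & \cdots & a_1^{K+M}(t) \\ \vdots & \ddots & \vdots \\ a_N^{K+1}(t) & \cdots & a_N^{K+M}(t) \end{bmatrix}, \tag{7} $$

respectively. We also define a binary $N$-dimensional D2D access allocation vector $\mathbf{b}_t = (b_1(t), \ldots, b_N(t))$, a binary $N$-dimensional mode allocation vector $\mathbf{c}_t = (c_1(t), \ldots, c_N(t))$, and a real-valued $N$-dimensional power allocation vector $\mathbf{p}_t = (p_1(t), \ldots, p_N(t))$. Then, the sets of all admissible values for $\mathbf{a}_t^L$, $\mathbf{a}_t^U$, $\mathbf{b}_t$, $\mathbf{c}_t$, and $\mathbf{p}_t$ are described by

$$ \mathbf{A}^L = \left\{ \mathbf{a}_t^L \mid a_n^k(t) \in \{0, 1\},\ \forall n \in \mathcal{N},\ k \in \mathcal{K} \right\}; \tag{8a} $$

$$ \mathbf{A}^U = \Big\{ \mathbf{a}_t^U \mid a_n^m(t) \in \{0, 1\},\ a_i^m(t) = 0,\ \sum_{j \in \mathcal{D}} a_j^m(t) \le 1,\ \forall n \in \mathcal{D},\ i \in \mathcal{C},\ m \in \mathcal{M} \Big\}; \tag{8b} $$

$$ \mathbf{B} = \Big\{ \mathbf{b}_t \mid b_n(t) \in \{0, 1\},\ b_i(t) = 1,\ \sum_{j \in \mathcal{D}} \left(1 - b_j(t)\right) \le M,\ \forall n \in \mathcal{D},\ i \in \mathcal{C} \Big\}; \tag{8c} $$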


$$ \mathbf{C} = \left\{ \mathbf{c}_t \mid c_n(t) \in \{0, 1\},\ c_i(t) = 1,\ \forall n \in \mathcal{N},\ i \in \mathcal{C} \right\}; \tag{8d} $$

$$ \mathbf{P} = \left\{ \mathbf{p}_t \mid 0 \le p_n(t) \le P_n^{\max},\ \forall n \in \mathcal{N} \right\}. \tag{8e} $$

An example of a D2D-enabled network with all defined optimization variables is shown in Figure 1.

Figure 1: A D2D-enabled cellular network with three cellular pairs (PU$_2$, PU$_5$, and PU$_7$) and five D2D pairs (PU$_1$, PU$_3$, PU$_4$, PU$_6$, and PU$_8$). Three of the D2D pairs (PU$_1$, PU$_3$, and PU$_4$) use inband access and two D2D pairs (PU$_6$, PU$_8$) are allocated the unlicensed channels. In this example, different cellular and D2D pairs interfere with each other when transmitting over RB$_2$ (where PU$_2$ interferes with PU$_1$), RB$_3$ (PU$_1$ interferes with PU$_5$), RB$_4$ (PU$_5$ interferes with PU$_3$), RB$_5$ (PU$_7$ interferes with PU$_3$), and RB$_6$ (PU$_4$ interferes with PU$_7$). (The diagram shows the eNB, the cellular and D2D links, the assignment of RB$_1$ to RB$_6$ and of the unlicensed channels $C_7$ to $C_9$, and the corresponding $\mathbf{a}_t^L$, $\mathbf{a}_t^U$, $\mathbf{b}_t$, and $\mathbf{c}_t$.)

Ideally, at any slot $t$, the eNB should distribute the network resources among the users to maximize their aggregated service rate, that is, to maximize the sum

$$ \sum_{n \in \mathcal{N}} r_n(t) = \sum_{n \in \mathcal{N}} \left( b_n(t)\, r_n^L(t) + \left(1 - b_n(t)\right) r_n^U(t) \right), \tag{9} $$

where $r_n(t)$ represents the service rate of PU$_n$ (operating either inband or outband). However, when communicating

over the unlicensed spectrum, each D2D pair should transmit at a maximal power level to achieve the high SINR regime (and, consequently, a high service rate), which, in turn, results in increased power consumption of mobile terminals. Therefore, when formulating the utility of each device pair, we should also consider the cost of power consumption, to quantify the trade-off between the achieved rate and the power level (as in [37]). Accordingly, we can define the utility $u_n(t)$ of PU$_n$ at slot $t$ as the difference between its instantaneous service rate $r_n(t)$ and the cost of power consumption:

$$ u_n(t) = r_n(t) - \upsilon_n p_n(t) = b_n(t)\, r_n^L(t) + \left(1 - b_n(t)\right) r_n^U(t) - \upsilon_n p_n(t), \tag{10} $$

where $\upsilon_n \ge 0$ is the cost per unit (W) level of power for PU$_n$. Using the above definition, we can express our resource allocation problem as follows:

$$ \text{maximize} \quad \sum_{n \in \mathcal{N}} u_n(t) = \sum_{n \in \mathcal{N}} \left[ b_n(t)\, r_n^L(t) + \left(1 - b_n(t)\right) r_n^U(t) - \upsilon_n p_n(t) \right], \tag{11a} $$

$$ \text{subject to} \quad \mathbf{a}_t^L \in \mathbf{A}^L,\ \mathbf{a}_t^U \in \mathbf{A}^U,\ \mathbf{b}_t \in \mathbf{B},\ \mathbf{c}_t \in \mathbf{C},\ \mathbf{p}_t \in \mathbf{P}, \tag{11b} $$

$$ \sum_{n \in \mathcal{N}} a_n^k(t)\, b_n(t)\, c_n(t) \le 1, \quad \forall k \in \mathcal{K}, \tag{11c} $$

$$ \sum_{n \in \mathcal{N}} a_n^k(t)\, b_n(t) \ge 1, \quad \forall k \in \mathcal{K}, \tag{11d} $$

$$ b_n(t) = \max\Big\{0,\ 1 - \sum_{m \in \mathcal{M}} a_n^m(t)\Big\}, \quad \forall n \in \mathcal{D}, \tag{11e} $$

$$ \mathrm{SINR}_n^k(t) \ge \mathrm{SINR}_n^{\mathrm{tar}}, \quad \forall n \in \mathcal{N},\ k \in \mathcal{K} \cup \mathcal{M}, \tag{11f} $$

where the constraint (11f) is necessary to protect the users from heavy interference (here $\mathrm{SINR}_n^{\mathrm{tar}}$ stands for the minimal SINR level acceptable by PU$_n$).

Note that information on the sets $\mathcal{C}$ and $\mathcal{D}$ is readily available at the eNB. The values of $G_{nm}^k$, for $n \in \mathcal{N}$, $m \in \mathcal{N}$, and $k \in \mathcal{K}$, are obtained by the eNB from the CSI carried by the RSs. The only missing information is related to $r_n^U(t)$, which depends on the parameters $T_{tr}^m$ (representing the availability of the unlicensed channel $C_m$ in our model) and $G_{nn}^m$ (which defines the quality of unlicensed channel $C_m$), for all $m \in \mathcal{M}$. The latter parameter is determined by the unlicensed channel allocations and, hence, the eNB can adapt to the changes of $G_{nn}^m$ in time and space. Since there is no coordination (and no information exchange) between the LTE and outband RAT interfaces, solving (11a)–(11f) to optimality might be impossible, which is a rather strong argument in favor of applying reinforcement learning (RL) for resource allocation.
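The rate and utility definitions in (2), (4), (6a), (6b), and (10) can be illustrated with a small numerical sketch. It is only an illustration under stated assumptions: the function names, the nested-list layout `G[j][n][k]` (gain from the transmitter of pair `j` to the receiver of pair `n` on RB `k`), the base-2 logarithm, and all toy values are hypothetical choices made here, not part of the paper.

```python
import math

def inband_rate(n, a_L, p, G, noise, bw_rb):
    """Licensed-band rate of pair n, following (2) and (4).

    Assumption: log base 2, so the rate is in bits per unit bandwidth-time.
    """
    rate = 0.0
    for k in range(len(a_L[n])):
        if not a_L[n][k]:
            continue  # RB k is not allocated to pair n
        # Interference comes only from the other pairs sharing RB k.
        interference = sum(
            a_L[j][k] * p[j] * G[j][n][k]
            for j in range(len(a_L)) if j != n
        )
        sinr = p[n] * G[n][n][k] / (interference + noise)
        rate += bw_rb * math.log2(1.0 + sinr)
    return rate

def outband_rate(n, a_U, p, G_u, noise, bw_u, t_tr, t_slot):
    """Unlicensed-band rate of pair n, following (6a) and (6b)."""
    rate = 0.0
    for m in range(len(a_U[n])):
        if not a_U[n][m]:
            continue  # channel m is not allocated to pair n
        sinr = p[n] * G_u[n][m] / noise  # no co-channel interference outband
        # Only the fraction t_tr[m] / t_slot of the slot carries data.
        rate += (t_tr[m] / t_slot) * bw_u[m] * math.log2(1.0 + sinr)
    return rate

def utility(b_n, rate_L, rate_U, cost, power):
    """Utility (10): achieved rate minus the cost of transmit power."""
    return b_n * rate_L + (1 - b_n) * rate_U - cost * power

# Toy inband case: two pairs share one RB (hypothetical gains).
a_L = [[1], [1]]
p = [1.0, 1.0]
G = [[[6.0], [1.0]], [[1.0], [6.0]]]
r_L = inband_rate(0, a_L, p, G, noise=1.0, bw_rb=1.0)

# Toy outband case: one pair on one channel that is free half of the slot.
r_U = outband_rate(0, [[1]], [1.0], [[7.0]], noise=1.0,
                   bw_u=[2.0], t_tr=[0.5], t_slot=1.0)

print(utility(1, r_L, 0.0, cost=0.5, power=1.0))  # inband pair
print(utility(0, 0.0, r_U, cost=0.5, power=1.0))  # outband pair
```

In this toy setting the eNB could evaluate `inband_rate` directly, whereas `t_tr` and `G_u` are exactly the quantities it cannot observe in advance, which is why the outband term has to be learned.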
The main idea behind RL is that the actions (unlicensed channel allocations) leading to a higher network utility at slot $t$ should be granted higher probabilities at slot $t+1$, and vice versa [22]. In the simplest form of RL (presented in [24]), the learning agent estimates its possible strategies based on the locally observed utility without any prior information about the operating environment. This form of RL requires only algebraic operations but does not guarantee convergence to an equilibrium [25]. In Q-learning [22], the agent's utility is estimated using some value-action function. Under certain (easily satisfied) conditions, this algorithm converges (with probability 1) to an NE state. However, it requires maximization of the action-value at every slot (which can be computationally demanding depending on the structure of the chosen value-action function) [20]. In the JUSTE-RL algorithm [23], the learning agent estimates not only its own strategy but also the expected utility for all of its actions. Unlike Q-learning, JUSTE-RL does not need to perform optimization of the action-value (since only algebraic operations are required to update the strategies) and, hence, it has a lower computational complexity. On the other hand, unlike a basic RL algorithm, JUSTE-RL converges to an ε-NE [23, 25]. We now show how JUSTE-RL with regret can be applied to our problem.

3.2. Unlicensed Channel Allocation. To apply JUSTE-RL with regret to our problem, we represent it as a game with one player (the eNB) having no information about the operating environment. The finite set of the eNB's actions $\mathbf{A}^U$ represents the set of all admissible unlicensed channel allocation decisions. The objective of the eNB is to select, at any slot $t$, an

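action $A_t = \mathbf{a}_t^U \in \mathbf{A}^U$ to maximize the eNB's utility $u_t = \sum_{n \in \mathcal{N}} u_n(t)$. In the following, we use the notation $a_n^m(t)$, $m \in \mathcal{M}$, to specify the eNB's decision regarding the allocation of an unlicensed channel $C_m$ to a pair PU$_n$ and $\mathbf{a}_t^U$ to describe all unlicensed channel allocations by the eNB when selecting a particular action $A_t$ at slot $t$. We also use $\mathbf{b}_t$ to denote the D2D access allocation vector and $r_n^U(t)$ to indicate the outband service rate achieved by playing the action $A_t$. After taking an action $A_t = \mathbf{a}_t^U$ at slot $t$, the eNB observes the (random) service rate $r_n^U(t)$ and estimates the network utility $u_t = u_t(A_t)$ by solving the following problem:

$$ \text{maximize} \quad u_t = \sum_{n \in \mathcal{N}} \left[ b_n(t)\, r_n^L(t) + \left(1 - b_n(t)\right) r_n^U(t) - \upsilon_n p_n(t) \right], \tag{12a} $$

$$ \text{subject to} \quad \mathbf{a}_t^L \in \mathbf{A}^L,\ \mathbf{c}_t \in \mathbf{C},\ \mathbf{p}_t \in \mathbf{P}, \tag{12b} $$

$$ \sum_{n \in \mathcal{N}} a_n^k(t)\, b_n(t)\, c_n(t) \le 1, \quad \forall k \in \mathcal{K}, \tag{12c} $$

$$ \sum_{n \in \mathcal{N}} a_n^k(t)\, b_n(t) \ge 1, \quad \forall k \in \mathcal{K}, \tag{12d} $$

$$ \mathrm{SINR}_n(t) \ge \mathrm{SINR}_n^{\mathrm{tar}}, \quad \forall n \in \mathcal{N}, \tag{12e} $$

where

$$ b_n(t) = \max\Big\{0,\ 1 - \sum_{m \in \mathcal{M}} a_n^m(t)\Big\}, \quad \forall n \in \mathcal{D}, \tag{12f} $$

and $b_n(t) = 1$, $\forall n \in \mathcal{C}$. Note that, unlike problem (11a)–(11f), problem (12a)–(12e) can be solved to optimality (since $r_n^U(t)$ is known). It has three optimization variables $\mathbf{a}_t^L$, $\mathbf{c}_t$, and $\mathbf{p}_t$ and, hence, its complexity is lower than the complexity of (11a)–(11f) (the method for solving (12a)–(12e) is presented in the next subsection).

We also define a mixed-strategy probability $\pi_t$ of playing an action $A_t = \mathbf{a}_t^U$ at slot $t$ as

$$ \pi_t(A_t) = \{\pi_t(B)\}_{B \in \mathbf{A}^U}, \quad \pi_t(B) = \Pr\{A_t = B\}, \quad \forall A_t \in \mathbf{A}^U, \tag{13a} $$

and a regret $\rho_t(A_t)$ for not playing this action at slot $t$ as

$$ \rho_t(A_t) = \max\{0,\ \hat{u}_t(A_t) - u_t\}, \quad \forall A_t \in \mathbf{A}^U, \tag{13b} $$

where $\hat{u}_t(A_t)$ denotes the eNB's current estimate of the utility of action $A_t$ and $u_t$ is the utility observed at slot $t$. In JUSTE-RL, the probability distribution of a regret over all possible actions becomes the Boltzmann-Gibbs distribution (also known as the canonical ensemble), given by [22]

$$ G\{\rho_t(A_t)\} = \frac{\exp\{\rho_t(A_t)/kT_B\}}{\sum_{B \in \mathbf{A}^U} \exp\{\rho_t(B)/kT_B\}}, \quad \forall A_t \in \mathbf{A}^U, \tag{14} $$

where $k$ is the Boltzmann constant (in J/K) and $T_B$ is the system temperature (in K). High temperatures make all actions almost equiprobable and low temperatures result in greedy action selection [22].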
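The effect of the temperature in (14) can be checked with a short sketch. It is a minimal illustration, assuming that the product $kT_B$ is folded into a single positive parameter (called `temperature` here) and that regrets are stored in a dictionary keyed by hypothetical action labels; neither choice comes from the paper.

```python
import math

def boltzmann_gibbs(regrets, temperature):
    """Map per-action regrets to selection probabilities as in (14).

    `temperature` stands for the product k * T_B (an assumption of this sketch).
    """
    # Subtracting the largest regret before exponentiating avoids overflow
    # and leaves the normalized probabilities unchanged.
    peak = max(regrets.values())
    weights = {
        action: math.exp((rho - peak) / temperature)
        for action, rho in regrets.items()
    }
    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}

# Hypothetical regrets for three candidate channel allocations.
regrets = {"A1": 0.0, "A2": 1.0, "A3": 2.0}

hot = boltzmann_gibbs(regrets, temperature=100.0)   # close to uniform
cold = boltzmann_gibbs(regrets, temperature=0.05)   # close to greedy

print(hot)
print(cold)
```

With the high temperature the three probabilities stay near one third each, while with the low temperature almost all probability mass moves to the action with the largest regret, which matches the qualitative behavior described above.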
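Using the above definitions, the dynamics of JUSTE-RL with regret can be described as [23]

$$ \hat{u}_{t+1}(B) = \hat{u}_t(B) + \alpha_t\, \mathbb{1}_{\{A_t = B\}} \left( u_t - \hat{u}_t(B) \right), $$
$$ \rho_{t+1}(B) = \rho_t(B) + \beta_t \left( \hat{u}_t(B) - u_t - \rho_t(B) \right), $$
$$ \pi_{t+1}(B) = \pi_t(B) + \gamma_t \left( G\{\rho_t(B)\} - \pi_t(B) \right), \tag{15} $$

for all $B \in \mathbf{A}^U$, where $\hat{u}_t(B)$ is the estimated utility of action $B$, $u_t$ is the utility observed at slot $t$, and $\alpha_t$, $\beta_t$, and $\gamma_t$ are the learning rates, such that [23]

$$ \lim_{T \to \infty} \sum_{\tau=1}^{T} \alpha_\tau = +\infty, \quad \lim_{T \to \infty} \sum_{\tau=1}^{T} \alpha_\tau^2 < +\infty, $$
$$ \lim_{T \to \infty} \sum_{\tau=1}^{T} \beta_\tau = +\infty, \quad \lim_{T \to \infty} \sum_{\tau=1}^{T} \beta_\tau^2 < +\infty, $$
$$ \lim_{T \to \infty} \sum_{\tau=1}^{T} \gamma_\tau = +\infty, \quad \lim_{T \to \infty} \sum_{\tau=1}^{T} \gamma_\tau^2 < +\infty, $$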

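$$ \lim_{t \to \infty} \frac{\gamma_t}{\alpha_t} = 0, \quad \lim_{t \to \infty} \frac{\beta_t}{\gamma_t} = 0. \tag{16a} $$

Typically, the learning rates are set to [22]:

$$ \alpha_t = \frac{1}{(t+1)^{\upsilon_\alpha}}, \quad \beta_t = \frac{1}{(t+1)^{\upsilon_\beta}}, \quad \gamma_t = \frac{1}{(t+1)^{\upsilon_\gamma}}, \tag{16b} $$

where $\upsilon_\alpha \in (0.5, 1]$, $\upsilon_\beta \in (0.5, 1]$, and $\upsilon_\gamma \in (0.5, 1]$, with the exponents ordered so that the ratio conditions in (16a) hold. The initial values $\hat{u}_0(A)$, $\rho_0(A)$, and $\pi_0(A)$ should be sufficiently close to zero, for all $A \in \mathbf{A}^U$. The dynamics (15) converge to the ε-Nash equilibrium. Note that a Nash equilibrium point for (15) is given by [23]:

$$ \lim_{t \to \infty} \pi_t(A) = \pi^*(A), \quad \forall A \in \mathbf{A}^U. \tag{17} $$

The corresponding learning algorithm for unlicensed channel allocation is presented in Algorithm 1 (where $T$ indicates the total simulation length in slots).

Algorithm 1: JUSTE-RL with regret for unlicensed channel allocation.
Initialization:
(1) Input $\upsilon_\alpha$, $\upsilon_\beta$, $\upsilon_\gamma$, $T$;
(2) For all $A_0 \in \mathbf{A}^U$, set $\hat{u}_0(A_0) \leftarrow 0$, $\rho_0(A_0) \leftarrow 0$, $\pi_0(A_0) \leftarrow 0$;
(3) For all $n \in \mathcal{N}$, set $r_n^U(0) \leftarrow 0$;
Main Loop:
(4) While ($t \le T$) do
(5) Select $A_t \leftarrow \arg\max_{B \in \mathbf{A}^U} \pi_t(B)$ and set $\mathbf{a}_t^U \leftarrow A_t$;
(6) For all $n \in \mathcal{D}$, set $b_n(t) = \max\{0,\ 1 - \sum_{m \in \mathcal{M}} a_n^m(t)\}$;
(7) For all $n \in \mathcal{C}$, set $b_n(t) = 1$;
(8) Execute $A_t$ and observe $r_n^U(t)$, for all $n \in \mathcal{N}$;
(9) Solve (12a)–(12e) to find the optimal $u_t(A_t) = u_t$;
(10) For all $B \in \mathbf{A}^U$, update $\hat{u}_{t+1}(B)$, $\rho_{t+1}(B)$, $\pi_{t+1}(B)$ using (15);
(11) End.

Note that this algorithm converges when $\pi_t(B) = \pi^*(B)$, for all $B \in \mathbf{A}^U$. The complexity of JUSTE-RL with regret is mainly determined by the size of the action set $\mathbf{A}^U$, since, at any slot $t$, we have to select an action $A_t \in \mathbf{A}^U$ that maximizes $\pi_t(A_t)$ (the dynamics in (15) require only algebraic operations and, thus, their computational complexity is negligible). Consequently, the worst-case time complexity of Algorithm 1 is $O(n)$, where $n = |\mathbf{A}^U|$ (determined by $N$ and $M$) is the size of our action set.

3.3. Inband Resource Allocation. Consider (12a)–(12e), which represents a joint mode, RB, and power level allocation problem. This problem has two binary optimization variables $\mathbf{a}_t^L$ and $\mathbf{c}_t$, one real-valued variable $\mathbf{p}_t$, a nonlinear objective (12a), and nonlinear constraints (12c) and (12e). Hence, it belongs to the family of mixed-integer nonlinear programming (MINLP) problems.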
It has been well established in the past (see, e.g., [38]) that all MINLP problems involving binary variables (such as (12a)–(12e)) are Nondeterministic Polynomial-time- (NP-) hard. For an immediate NP-hardness proof for the considered problem, note that, given that the entries of $\mathbf{a}_t^L$ can be either 0 or 1, any feasible solution to (12a)–(12e) is a subset of vertices. The constraint (12d) also implies that at least one end point of every edge is included in this subset. Hence, the solution to this problem describes a vertex cover, for which finding a minimum is NP-hard.

Most of the MINLP solution techniques involve the construction of the following relaxations to the considered problem: a nonlinear programming (NLP) relaxation (the original problem without integer restrictions) and a mixed-integer linear-programming (MILP) relaxation (the original problem where the nonlinearities are replaced by supporting hyperplanes). To form the MILP and NLP relaxations to (12a)–(12e), let us first represent the problem in the following equivalent form (with auxiliary variables $d_n^k$ bounding the per-RB rate through (18f)):

$$ \text{minimize} \quad \sum_{n \in \mathcal{N}} \Big[ \upsilon_n p_n(t) - \omega^L b_n(t) \sum_{k \in \mathcal{K}} a_n^k(t)\, d_n^k - \left(1 - b_n(t)\right) r_n^U(t) \Big], \tag{18a} $$

$$ \text{subject to} \quad \mathbf{a}_t^L \in \mathbf{A}^L,\ \mathbf{c}_t \in \mathbf{C},\ \mathbf{p}_t \in \mathbf{P}, \tag{18b} $$

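$$ \sum_{n \in \mathcal{N}} a_n^k(t)\, b_n(t) \ge 1, \quad \forall k \in \mathcal{K}, \tag{18c} $$

$$ g_k^1(\mathbf{a}_t^L, \mathbf{c}_t) = \sum_{n \in \mathcal{N}} a_n^k(t)\, b_n(t)\, c_n(t) - 1 \le 0, \quad \forall k \in \mathcal{K}, \tag{18d} $$

$$ g_n^2(\mathbf{a}_t^L, \mathbf{p}_t) = \mathrm{SINR}_n(t) - \mathrm{SINR}_n^{\mathrm{tar}} \ge 0, \quad \forall n \in \mathcal{N}, \tag{18e} $$

$$ g_{n,k}^3(\mathbf{a}_t^L, \mathbf{p}_t) = \mathrm{SINR}_n^k(t) - 2^{d_n^k} + 1 \ge 0, \quad \forall n \in \mathcal{N},\ k \in \mathcal{K}, \tag{18f} $$

where objective (18a) and constraints (18b) and (18c) are linear, while constraints (18d)–(18f) are nonlinear. The MILP relaxation to (18a)–(18f) at a given point $(\mathbf{a}_t^{L0}, \mathbf{c}_t^0, \mathbf{p}_t^0)$ is given by

$$ \text{minimize} \quad \sum_{n \in \mathcal{N}} \Big[ \upsilon_n p_n(t) - \omega^L b_n(t) \sum_{k \in \mathcal{K}} a_n^k(t)\, d_n^k - \left(1 - b_n(t)\right) r_n^U(t) \Big], \tag{19a} $$

$$ \text{subject to} \quad \mathbf{a}_t^L \in \mathbf{A}^L,\ \mathbf{c}_t \in \mathbf{C},\ \mathbf{p}_t \in \mathbf{P}, \tag{19b} $$

$$ \sum_{n \in \mathcal{N}} a_n^k(t)\, b_n(t) \ge 1, \quad \forall k \in \mathcal{K}, \tag{19c} $$

$$ g_k^1(\mathbf{a}_t^L, \mathbf{c}_t) + \nabla g_k^1(\mathbf{a}_t^{L0}, \mathbf{c}_t^0)^T \begin{bmatrix} \mathbf{a}_t^L - \mathbf{a}_t^{L0} \\ \mathbf{c}_t - \mathbf{c}_t^0 \end{bmatrix} \le 0, \quad \forall k \in \mathcal{K}, \tag{19d} $$

$$ g_n^2(\mathbf{a}_t^L, \mathbf{p}_t) + \nabla g_n^2(\mathbf{a}_t^{L0}, \mathbf{p}_t^0)^T \begin{bmatrix} \mathbf{a}_t^L - \mathbf{a}_t^{L0} \\ \mathbf{p}_t - \mathbf{p}_t^0 \end{bmatrix} \ge 0, \quad \forall n \in \mathcal{N}, \tag{19e} $$

$$ g_{n,k}^3(\mathbf{a}_t^L, \mathbf{p}_t) + \nabla g_{n,k}^3(\mathbf{a}_t^{L0}, \mathbf{p}_t^0)^T \begin{bmatrix} \mathbf{a}_t^L - \mathbf{a}_t^{L0} \\ \mathbf{p}_t - \mathbf{p}_t^0 \end{bmatrix} \ge 0, \quad \forall n \in \mathcal{N},\ k \in \mathcal{K}. \tag{19f} $$

The NLP relaxation to (18a)–(18f) is given by

$$ \text{minimize} \quad \sum_{n \in \mathcal{N}} \Big[ \upsilon_n p_n(t) - \omega^L b_n(t) \sum_{k \in \mathcal{K}} a_n^k(t)\, d_n^k - \left(1 - b_n(t)\right) r_n^U(t) \Big], \tag{20a} $$

$$ \text{subject to} \quad \mathbf{a}_t^L \in \tilde{\mathbf{A}}^L,\ \mathbf{c}_t \in \tilde{\mathbf{C}},\ \mathbf{p}_t \in \mathbf{P}, \tag{20b} $$

$$ \sum_{n \in \mathcal{N}} a_n^k(t)\, b_n(t) \ge 1, \quad \forall k \in \mathcal{K}, \tag{20c} $$

$$ g_k^1(\mathbf{a}_t^L, \mathbf{c}_t) \le 0, \quad \forall k \in \mathcal{K}, \tag{20d} $$

$$ g_n^2(\mathbf{a}_t^L, \mathbf{p}_t) \ge 0, \quad \forall n \in \mathcal{N}, \tag{20e} $$

$$ g_{n,k}^3(\mathbf{a}_t^L, \mathbf{p}_t) \ge 0, \quad \forall n \in \mathcal{N},\ k \in \mathcal{K}, \tag{20f} $$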

where

$$ \tilde{\mathbf{A}}^L = \left\{ \mathbf{a}_t^L \mid 0 \le a_n^k(t) \le 1,\ \forall n \in \mathcal{N},\ k \in \mathcal{K} \right\}; \tag{20g} $$

$$ \tilde{\mathbf{C}} = \left\{ \mathbf{c}_t \mid 0 \le c_n(t) \le 1,\ c_i(t) = 1,\ \forall n \in \mathcal{N},\ i \in \mathcal{C} \right\}. \tag{20h} $$

In general, all MINLP problems can be solved using either exact techniques (e.g., branch-and-bound [39]) or heuristic methods (such as local branching [40], large neighborhood search [41], and feasibility pump [42]). Since we are interested in a reasonably simple and fast algorithm, it is more convenient to use heuristics to solve (18a)–(18f). Among numerous heuristic techniques, feasibility pump (FP) [43] is perhaps the simplest and most effective method for producing more and better solutions in a shorter average running time (the local convergence properties of FP for nonconvex problems have been proved in [44]).

The fundamental idea of an FP heuristic is to decompose the MINLP problem into two parts: integer feasibility and constraint feasibility. The former is achieved by rounding (solving the MILP relaxation to the original problem), the latter by projection (solving the NLP relaxation). The algorithm generates two sequences of integral and rounding points.
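The first sequence of integral points, $\{(\hat{\mathbf{a}}^{Li}, \hat{\mathbf{c}}^i, \hat{\mathbf{p}}^i)\}_{i=1}^{I}$, $I = 1, 2, \ldots$, contains the solutions that may violate the nonlinear constraints; the second sequence, $\{(\tilde{\mathbf{a}}^{Li}, \tilde{\mathbf{c}}^i, \tilde{\mathbf{p}}^i)\}_{i=1}^{I}$, comprises the rounding points that are feasible for the MILP relaxation but might not be integral. Particularly, with the input $(\tilde{\mathbf{a}}^{L1}, \tilde{\mathbf{c}}^1, \tilde{\mathbf{p}}^1)$ being a solution to the NLP relaxation (20a)–(20f), FP generates two sequences by solving the following problems, for $i = 1, \ldots, I$:

\[ (\hat{\mathbf{a}}^{Li}, \hat{\mathbf{c}}^i, \hat{\mathbf{p}}^i) = \arg\min \big\| (\mathbf{a}^L, \mathbf{c}, \mathbf{p}) - (\tilde{\mathbf{a}}^{Li}, \tilde{\mathbf{c}}^i, \tilde{\mathbf{p}}^i) \big\|_1, \tag{21a} \]
\[ \text{subject to} \ \mathbf{a}^L \in \mathcal{A}^L, \quad \mathbf{c} \in \mathcal{C}, \quad \mathbf{p} \in \mathcal{P}, \tag{21b} \]
\[ a_n^k(t) \le b_n(t), \quad n \in \mathbf{N},\ k \in \mathbf{K}, \tag{21c} \]
\[ g_k^1(\mathbf{a}^{L0}, \mathbf{c}^0) + \nabla g_k^1(\mathbf{a}^{L0}, \mathbf{c}^0)^T \begin{bmatrix} \mathbf{a}^L - \mathbf{a}^{L0} \\ \mathbf{c} - \mathbf{c}^0 \end{bmatrix} \ge 0, \quad k \in \mathbf{K}, \tag{21d} \]
\[ g_n^2(\mathbf{a}^{L0}, \mathbf{p}^0) + \nabla g_n^2(\mathbf{a}^{L0}, \mathbf{p}^0)^T \begin{bmatrix} \mathbf{a}^L - \mathbf{a}^{L0} \\ \mathbf{p} - \mathbf{p}^0 \end{bmatrix} \ge 0, \quad n \in \mathbf{N}, \tag{21e} \]
\[ g_{n,k}^3(\mathbf{a}^{L0}, \mathbf{p}^0) + \nabla g_{n,k}^3(\mathbf{a}^{L0}, \mathbf{p}^0)^T \begin{bmatrix} \mathbf{a}^L - \mathbf{a}^{L0} \\ \mathbf{p} - \mathbf{p}^0 \end{bmatrix} \ge 0, \quad n \in \mathbf{N},\ k \in \mathbf{K}; \tag{21f} \]

\[ (\tilde{\mathbf{a}}^{Li+1}, \tilde{\mathbf{c}}^{i+1}, \tilde{\mathbf{p}}^{i+1}) = \arg\min \big\| (\mathbf{a}^L, \mathbf{c}, \mathbf{p}) - (\hat{\mathbf{a}}^{Li}, \hat{\mathbf{c}}^i, \hat{\mathbf{p}}^i) \big\|_2, \tag{22a} \]
\[ \text{subject to} \ \mathbf{a}^L \in \tilde{\mathcal{A}}^L, \quad \mathbf{c} \in \tilde{\mathcal{C}}, \quad \mathbf{p} \in \mathcal{P}, \tag{22b} \]
\[ a_n^k(t) \le b_n(t), \quad n \in \mathbf{N},\ k \in \mathbf{K}, \tag{22c} \]
\[ g_k^1(\mathbf{a}^L, \mathbf{c}) \ge 0, \quad k \in \mathbf{K}, \tag{22d} \]
\[ g_n^2(\mathbf{a}^L, \mathbf{p}) \ge 0, \quad n \in \mathbf{N}, \tag{22e} \]
\[ g_{n,k}^3(\mathbf{a}^L, \mathbf{p}) \ge 0, \quad n \in \mathbf{N},\ k \in \mathbf{K}, \tag{22f} \]

where $\|\cdot\|_1$ and $\|\cdot\|_2$ are the $l_1$-norm and $l_2$-norm, respectively. The rounding is carried out by solving the problem (21a)–(21f) and the projection is the solution to (22a)–(22f). Consequently, an FP algorithm alternates between the rounding and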

projection steps until $(\hat{\mathbf{a}}^{Li}, \hat{\mathbf{c}}^i, \hat{\mathbf{p}}^i) = (\tilde{\mathbf{a}}^{Li}, \tilde{\mathbf{c}}^i, \tilde{\mathbf{p}}^i)$ (which implies feasibility) or until the number of iterations $i$ has reached its predefined limit $I$. The workflow of the algorithm is presented in Algorithm 2.

Note that, to retain the local convergence, the problems (21a)–(21f) and (22a)–(22f) have to be solved exactly in the FP algorithm. The problem (22a)–(22f) (and, hence, (20a)–(20f)) can be solved using any standard NLP method. In this paper, an interior point algorithm (described, e.g., in [45]), which has a polynomial $O(n^2)$ time complexity, is applied to solve (20a)–(20f) and (22a)–(22f). The MILP problem (21a)–(21f) is relatively simple and, therefore, it can be solved to optimality by any technique from the family of branch-and-bound methods (e.g., [46]).

Note that, in general, finding an optimal solution to any joint resource allocation problem with integrality constraints is NP-hard (which has been shown in [47]). Consequently, most of the recent approaches to such problems focus on finding high-quality suboptimal solutions using, for example, relaxation (by removing all the integer restrictions, as has been done in [47, 48]) or iterative two-stage algorithms that determine the optimal integral solutions given fixed power levels and then find the optimal power allocation with fixed integral points (e.g., [49]). In this paper, instead of relaxation or iteration, we directly apply a heuristic FP algorithm that has a polynomial $O(n^c)$ time complexity in the size $n$ of the problem (with $c$ being some real constant) [43]. (Note that in our case, the size $n$ of the problem (18a)–(18f) is proportional to $|\mathcal{A}^L| \cdot |\mathcal{C}| \cdot |\mathcal{P}| = N^3 K$. The numerical results showing the complexity of the proposed algorithm will be presented in Section 4.) Hence, the presented heuristic approach has moderate complexity compared to the previously proposed algorithms for resource allocation with integrality constraints, whose complexity ranges from linear [21, 30, 47, 48, 50] to polynomial [2, 3, 49, 51–53].

4. Algorithm Implementation

4.1. Resource Allocation Procedure.
We now discuss the implementation of the proposed algorithms (presented in Section 3) in an LTE-A network. The following scheduling procedure is repeated at the beginning of each slot $t$.

(i) All users send their SRs to the eNB via dedicated PUCCHs. Note that the SRs may contain some useful control information, such as the updated target SINR level $\mathrm{SINR}_n^{\mathrm{tar}}$ or the observed throughput on the unlicensed channel $r_n^U(t)$.

(ii) After receiving the SRs from all of the users, the eNB performs resource allocation (by assigning the modes, RBs and unlicensed channels, and power levels to user pairs according to Algorithms 1 and 2) and sends the SGs with optimal allocations to the corresponding users via PDCCHs.

(iii) After receiving the SGs, the users start their data transmissions over the allocated RBs/unlicensed channels with the assigned mode and power levels.

As already mentioned, we deploy CSMA/CA for outband D2D access using the procedure described in IEEE 802.11 [54]. As dictated by [54], if a certain D2D pair $PU_n$, $n \in \mathbf{D}$, is allocated one or more unlicensed channels then, prior to transmission, one of the users must first sense the channel (to determine whether it is idle) for the duration of a distributed coordination function interframe space (DIFS). DIFS (which is 34 μs long) consists of a short interframe space (SIFS) equaling 16 μs and 2 Wi-Fi slots (each equal to 9 μs). After DIFS, a user must typically defer its transmission for a random number of slots, generated from 0 to CW − 1 (where CW is the contention window size), to allow the other devices to share the channel in a fair manner. Given that the minimum CW value is $CW_{\min} = 16$, the device will, on average, wait for about 7.5 Wi-Fi slots before transmission. Thus, the average channel access delay is 16 μs + (2 + 7.5) × 9 μs = 101.5 μs (independent of service rate).
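The delay figure above follows from simple arithmetic, which the short sketch below reproduces. The timing constants are the ones quoted in the text; the function and constant names are illustrative, and the mean backoff assumes a counter drawn uniformly from the integers 0 to CW − 1.

```python
SIFS_US = 16.0   # short interframe space, microseconds
SLOT_US = 9.0    # one Wi-Fi slot, microseconds
CW_MIN = 16      # minimum contention window size


def difs_us(sifs=SIFS_US, slot=SLOT_US):
    """DIFS = SIFS + 2 Wi-Fi slots."""
    return sifs + 2 * slot


def mean_backoff_slots(cw=CW_MIN):
    """Mean of a backoff counter drawn uniformly from 0..cw-1."""
    return (cw - 1) / 2


def mean_access_delay_us(cw=CW_MIN, sifs=SIFS_US, slot=SLOT_US):
    """Average channel access delay: DIFS plus the mean backoff time."""
    return difs_us(sifs, slot) + mean_backoff_slots(cw) * slot


delay = mean_access_delay_us()
print(difs_us(), mean_backoff_slots(), delay)
```

With the default values this gives a DIFS of 34 μs, a mean backoff of 7.5 slots, and an average access delay of 101.5 μs.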
Since the slot duration in the LTE system ($T_s$ = 1 ms) is much longer than the average channel access delay (101.5 μs), it is expected that, on average, a D2D pair will be able to exchange the data within the scheduled period. In this case, each of the users in a D2D pair should observe the achieved throughput $r_n^U(t)$ and report this value to the eNB when sending its SR. Otherwise (if a D2D pair is not able to exchange the data within one slot), the D2D users send the value $r_n^U(t) = 0$ to the eNB. Note that CSMA/CA does not allow two-way data transmission. Hence, the second device in a D2D pair can start its data transmission only after the first user has finished its transmission.

It is worth mentioning that, at some point in time, JUSTE-RL will reach its equilibrium state. However, even after the equilibrium has been reached, the eNB continues the learning process, because the network environment (channel quality, network traffic, and the number of active users) is likely to change over time, resulting in different optimal mode, RB/unlicensed channel, and power allocations.

4.2. Simulation Model. A simulation model of the network has been implemented upon a standard LTE-A platform using the OPNET simulation and development package [55]. The model consists of one eNB and $N$ user pairs randomly positioned inside a three-sector hexagonal cell (with the antenna pattern specified in [56]). It is assumed that the users operate outdoors in a typical urban environment and are stationary throughout all simulation runs. Each user device has its own traffic generator, enabling a variety of traffic patterns. For simplicity, in the examples below, the user traffic is modeled as a full buffer with load of 0 packets per second and packet size of 0 bytes. In all simulations, $C$ = 0 cellular pairs, $\upsilon_\beta = 1.1\upsilon_\alpha$, $\upsilon_\gamma = 1.1\upsilon_\beta$, $T$ = 1000 slots, $T_B$ = 0 6 K, and $I$ = 1000 iterations; the target SINR levels for each device pair are set as $\mathrm{SINR}_n^{\mathrm{tar}} = \mathrm{SINR}^{\mathrm{tar}}$ = 0 dB, for all $n \in \mathbf{N}$. The licensed band of the eNB comprises $K$ = 100 RBs (equivalent to 20 MHz). The unlicensed band comprises $M$ = 4 nonoverlapping OFDM channels with $\omega_m^U$ = 0 MHz, for $m \in \mathbf{M}$.
The main simulation parameters of our model are listed in Table 1. Other parameters are set in accordance with 3GPP specifications [56].

Algorithm 2: FP algorithm for inband resource allocation.

Initialization:
(1) Input $I$, $T$;
(2) While ($t \le T$) do
(3) Input $r_n^U(t)$, $\mathbf{a}^U$, $\mathbf{b}$;
(4) Solve (20a)–(20f) to find the optimal $(\tilde{\mathbf{a}}^{L1}, \tilde{\mathbf{c}}^1, \tilde{\mathbf{p}}^1)$;
Main Loop:
(5) While ($i \le I$) do
Rounding:
(6) Solve (21a)–(21f) to find the optimal $(\hat{\mathbf{a}}^{Li}, \hat{\mathbf{c}}^i, \hat{\mathbf{p}}^i)$;
(7) If ($(\hat{\mathbf{a}}^{Li}, \hat{\mathbf{c}}^i, \hat{\mathbf{p}}^i) = (\tilde{\mathbf{a}}^{Li}, \tilde{\mathbf{c}}^i, \tilde{\mathbf{p}}^i)$) then break;
Projection:
(8) Solve (22a)–(22f) to find the optimal $(\tilde{\mathbf{a}}^{Li+1}, \tilde{\mathbf{c}}^{i+1}, \tilde{\mathbf{p}}^{i+1})$;
(9) Set $i \leftarrow i + 1$;
(10) End.

Table 1: Simulation parameters of LTE-A model.

Parameter | Value
Cell radius | 0 m
Frame structure | Type 2 (time division duplex)
Slot duration | 1 ms
TDD configuration | 0
eNodeB Tx power | 46 dBm
UE active/idle Tx power | 23/2 dBm
Noise power | −174 dBm/Hz
Path loss, cellular link | log(d), d [km]
NLOS path loss, D2D link | 40 log(d) + 30 log(f) + 49, d [km], f [Hz]
LOS path loss, D2D link | 6.9 log(d) + 20 log(f/5), d [m], f [GHz]
Shadowing st. dev. | 0 dB (cell mode); 2 dB (D2D mode)

In this paper, the evaluation of the proposed approach for inband and outband resource allocation, referred to as JRA (JUSTE-RL based resource allocation), is divided into two parts. In the first part, we analyze the performance of JUSTE-RL with regret for unlicensed channel allocation (Algorithm 1). In the second part, we examine the efficiency of the proposed joint inband/outband resource allocation (Algorithms 1 and 2). In the following, the performance of JRA is compared with the performance of the following resource allocation techniques.

(i) First is joint inband/outband resource allocation with ε-greedy Q-learning (GQL) [57] based on formulations (12a)–(12e) and (18a)–(18f), where the unlicensed channels are allocated to the users by the LTE eNB. In GQL, at any slot $t$, an action with the largest Q-value is selected with probability 1 − ε and the other actions are selected uniformly at random with probability ε. In all simulation experiments, the value of ε is set in accordance with the most common suggestions (provided, e.g., in [22]), as ε = 0.1.
(ii) Second is the centralized optimal strategy (COS), where the inband and outband network resources are allocated to the users by solving (11a)–(11f) directly based on global channel and network knowledge. Note that COS corresponds to the most efficient strategy (in terms of network utility maximization), although it is not practically realizable (since in real network deployment scenarios, precise information about the quality and availability of unlicensed channels is not available). (In this paper, we use an FP algorithm to find the optimal solution to (11a)–(11f) or (18a)–(18f) in GQL and COS.)

(iii) Third is the social heuristic for multimode D2D communication (SMD) in an LTE-A network, proposed in [21] to reduce the complexity of the original optimization problem for joint inband/outband resource allocation. This algorithm assigns user modes and resources to maximize the social welfare based on the global channel and network knowledge. The eNB creates a randomly ordered list of the D2D pairs. Then, it computes the aggregated network utility for each mode of the first user in the list and assigns this user a mode that provides the highest aggregated utility. This process is repeated for all D2D pairs.

(iv) Fourth is the greedy heuristic for multimode D2D communication (GMD) in LTE-A networks [21], where the modes and inband/outband network resources are allocated to maximize the individual users' welfare based on the global channel and network knowledge. Similar to SMD, the eNB creates a randomly ordered list of the D2D pairs. After this, it computes the utility for each mode of the first user in the list and assigns this user a mode assuring the highest individual utility. This process is repeated for all D2D pairs.

(v) Fifth is the ranked heuristic for multimode D2D communication (RMD) in LTE-A networks [21]. Here the eNB evaluates the utility of each user in each mode (based on the global channel and network knowledge)

and sorts the D2D pairs according to their utilities in descending order. Next, the eNB allocates the first user in the list a mode that guarantees the highest aggregated network utility. This process is repeated for all D2D pairs.

Note that all algorithms used in our performance evaluation are simulated with identical system parameters.

4.3. Performance of a Learning Algorithm. We start with the performance evaluation of JUSTE-RL with regret for unlicensed channel allocation (outlined in Algorithm 1). Figures 2 and 3 demonstrate the learning speed of JRA. Figure 2 shows the average number of RL iterations (slots) necessary for convergence of strategies in JRA (at the point where the strategy estimates $\pi_t(B)$ have converged for all $B \in \mathcal{A}^U$) with different values of $\upsilon_\alpha \in [0.55, 0.8]$, $\upsilon_\beta = 1.1\upsilon_\alpha \in [0.6, 0.88]$, and $\upsilon_\gamma = 1.1\upsilon_\beta \in [0.67, 0.97]$ and a varying number of user pairs, $N \in$ [, 0]. The average number of RL iterations (slots) necessary for convergence of utilities in JRA (at the point where the utility estimates $u_t(B)$ have converged for all $B \in \mathcal{A}^U$), with the same ranges of $\upsilon_\alpha$, $\upsilon_\beta$, $\upsilon_\gamma$, and $N$, is plotted in Figure 3.

Figure 2: The average number of RL iterations (slots) necessary for convergence of strategies in JRA with different values of $\upsilon_\alpha$ and fixed $C$ = 0. (x-axis: number of user pairs, $N$; y-axis: average number of RL iterations (slots) before strategy convergence; curves for $\upsilon_\alpha$ = 0.8, 0.75, 0.7, 0.65, 0.6, 0.55.)

Figure 3: The average number of RL iterations (slots) necessary for convergence of utilities in JRA with different values of $\upsilon_\alpha$ and fixed $C$ = 0. (x-axis: number of user pairs, $N$; y-axis: average number of RL iterations (slots) before utility convergence; curves for $\upsilon_\alpha$ = 0.8, 0.75, 0.7, 0.65, 0.6, 0.55.)

The accuracy of estimation in JRA is presented in Figures 4 and 5. Figure 4 shows the absolute error of strategy estimation in JRA, denoted as $\delta_\pi$, defined as a sum of the absolute differences between the actual optimal strategies and the estimated strategies upon the algorithm termination.
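The error measure just described is a plain sum of absolute differences over the action set. A minimal sketch is given below, assuming the estimated and actual values are available as dictionaries keyed by action; the action labels and probabilities are made up for illustration and are not results from the paper. The same function applies unchanged to utility estimates.

```python
def absolute_estimation_error(estimated, actual):
    """Sum over all actions B of |estimated[B] - actual[B]|."""
    if estimated.keys() != actual.keys():
        raise ValueError("both mappings must cover the same action set")
    return sum(abs(estimated[b] - actual[b]) for b in actual)


# Hypothetical strategy (probability of playing each action) at termination
pi_estimated = {"ch1": 0.5, "ch2": 0.25, "ch3": 0.25}
# Hypothetical actual optimal strategy
pi_actual = {"ch1": 0.75, "ch2": 0.25, "ch3": 0.0}

delta_pi = absolute_estimation_error(pi_estimated, pi_actual)
print(delta_pi)
```

For these illustrative inputs the error is 0.25 + 0 + 0.25 = 0.5.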
That is,

\[ \delta_\pi = \sum_{B \in \mathcal{A}^U} \big| \pi_T(B) - \pi^{*}(B) \big|, \tag{23} \]

where $\pi_T(B)$ is an optimal strategy estimated in JRA upon the algorithm termination (at slot $T$) and $\pi^{*}(B)$ is the actual optimal strategy obtained by playing an action $B \in \mathcal{A}^U$.

Figure 4: The absolute error of strategy estimation $\delta_\pi$ in JRA calculated upon the algorithm termination with different values of $\upsilon_\alpha$ and fixed $C$ = 0. (x-axis: number of user pairs, $N$; y-axis: $\delta_\pi$; curves for $\upsilon_\alpha$ = 0.8, 0.75, 0.7, 0.65, 0.6, 0.55.)

Figure 5 demonstrates the absolute error of utility estimation in JRA, denoted $\delta_u$, defined as a sum of the absolute differences between the actual and the estimated optimal network utilities upon the algorithm termination; that is,

\[ \delta_u = \sum_{B \in \mathcal{A}^U} \big| u_T(B) - u^{*}(B) \big|, \tag{24} \]

where $u_T(B)$ is an optimal network utility estimated in JRA upon the algorithm termination (at slot $T$) and $u^{*}(B)$ is the actual optimal network utility obtained by playing an action $B \in \mathcal{A}^U$.

Figure 5: The absolute error of utility estimation $\delta_u$ in JRA calculated upon the algorithm termination with different values of $\upsilon_\alpha$ and fixed $C$ = 0. (x-axis: number of user pairs, $N$; y-axis: $\delta_u$; curves for $\upsilon_\alpha$ = 0.8, 0.75, 0.7, 0.65, 0.6, 0.55.)

The observations in Figures 2–5 show that the rates of convergence of strategies and utilities and the accuracy of strategy and utility estimation are almost the same. Furthermore, we find that the number of iterations necessary for the algorithm convergence and the absolute estimation error strongly depend on the setting of the parameters $\upsilon_\alpha$, $\upsilon_\beta$, and $\upsilon_\gamma$: the worst performance is attained with $\upsilon_\alpha$ = 0.8, $\upsilon_\beta$ = 0.88, and $\upsilon_\gamma$ = 0.97 and the best with $\upsilon_\alpha$ = 0.55, $\upsilon_\beta$ = 0.6, and $\upsilon_\gamma$ = 0.67. Such results are rather predictable since the parameters $\upsilon_\alpha$, $\upsilon_\beta$, and $\upsilon_\gamma$ are related to the parameters $\alpha$, $\beta$, and $\gamma$ (see (16b)), which have a direct influence on the learning rate of JRA [22, 23].

In Figures 6 and 7, the instantaneous network utility $u_t = \sum_{n \in \mathbf{N}} u_n(t)$ is presented as a function of time in scenarios with low network load ($N$ = 00) and high network load ($N$ = 0) and fixed $C$. Here the proposed JRA technique is simulated with the settings $\upsilon_\alpha$ = 0.55, $\upsilon_\beta$ = 0.6, and $\upsilon_\gamma$ = 0.67. The graphs in these figures show that the efficiency of JRA and GQL improves gradually over time. After about 300 slots (which is the average time necessary for the convergence of strategies and utilities in Algorithm 1), JRA demonstrates near-optimal results. GQL needs a little longer (≈400 slots) to converge, after which its performance also becomes very close to the performance of COS. Unlike JRA and GQL, the performance of COS, SMD, GMD, and RMD is consistent over time (since these algorithms do not involve any learning process). We also observe that the network utility attained in SMD, GMD, and RMD is much smaller than that in COS.
To understand such poor performance of SMD, GMD, and RMD, note that, in these algorithms, the original resource allocation problem is divided into two separate problems: (i) mode selection and (ii) packet scheduling. After that, the mode selection problem is solved using very plain heuristics (social, greedy, or ranked), which reduces the complexity of the original optimization problem (from exponential to linear) but has a negative impact on the performance of these techniques in terms of network utility maximization [21].

Figure 6: The instantaneous network utility $u_t$ in different algorithms with fixed $N$ = 00 and $C$ = 0. (x-axis: number of iterations (slots), $t$; y-axis: instantaneous network utility, $u_t$ (×10^6); curves for GQL, JRA, COS, SMD, GMD, RMD.)

Figure 7: The instantaneous network utility $u_t$ in different algorithms with fixed $N$ = 0 and $C$ = 0. (x-axis: number of iterations (slots), $t$; y-axis: instantaneous network utility, $u_t$ (×10^6); curves for GQL, JRA, COS, SMD, GMD, RMD.)

4.4. Performance of Joint Inband/Outband Allocation. We now evaluate the efficiency of the proposed inband/outband resource allocation (Algorithms 1 and 2). The graphs in Figures 8–10 demonstrate the computational complexity, solution time, and solution accuracy of different resource

allocation techniques in the experiments with $\upsilon_\alpha$ = 0.55, $\upsilon_\beta$ = 0.6, and $\upsilon_\gamma$ = 0.67 (in JRA) collected during the entire simulation period $T$. Particularly, in Figures 8 and 9, the average number of algorithm iterations (per slot) and the solution time (in μs) are presented as a function of the number of user pairs $N$. Figure 10 shows the average relative deviation from the optimal solution, denoted as $\Delta$ and calculated according to

\[ \Delta = \frac{1}{T} \sum_{t=1}^{T} \left( \frac{\| \mathbf{a}^{Li} - \mathbf{a}^{L*} \|}{\| \mathbf{a}^{L*} \|} + \frac{\| \mathbf{a}^{Ui} - \mathbf{a}^{U*} \|}{\| \mathbf{a}^{U*} \|} + \frac{\| \mathbf{c}^{i} - \mathbf{c}^{*} \|}{\| \mathbf{c}^{*} \|} + \frac{\| \mathbf{p}^{i} - \mathbf{p}^{*} \|}{\| \mathbf{p}^{*} \|} \right) \tag{25} \]

in GQL, JRA, and COS and

\[ \Delta = \frac{1}{T} \sum_{t=1}^{T} \frac{\| \mathbf{x}^{i} - \mathbf{x}^{*} \|}{\| \mathbf{x}^{*} \|} \tag{26} \]

in SMD, GMD, and RMD. In the above equations, $(\mathbf{a}^{Li}, \mathbf{a}^{Ui}, \mathbf{c}^{i}, \mathbf{p}^{i})$ is the optimal solution found in GQL, JRA, and COS, $(\mathbf{a}^{L*}, \mathbf{a}^{U*}, \mathbf{c}^{*}, \mathbf{p}^{*})$ is the actual optimal solution to the original resource allocation problem (11a)–(11f), $\mathbf{x}^{i}$ is the mode allocation in SMD, GMD, and RMD, and $\mathbf{x}^{*}$ is the actual optimal allocation (a solution to the optimization problem originally stated in [21]).

Figure 8: The average number of FP iterations (per slot) necessary for the convergence of the algorithms with fixed $C$ = 0 collected during $T$ slots. (x-axis: number of user pairs, $N$; y-axis: average number of FP iterations (per slot) before convergence; curves for GQL, JRA, COS, SMD, GMD, RMD.)

Figure 9: The average solution time (in μs) of different algorithms with fixed $C$ = 0 collected during $T$ slots. (x-axis: number of user pairs, $N$; y-axis: average solution time (μs); curves for GQL, JRA, COS, SMD, GMD, RMD.)

Figure 10: The average relative deviation from the optimal solution $\Delta$ in different algorithms with fixed $C$ = 0 collected during $T$ slots. (x-axis: number of user pairs, $N$; y-axis: $\Delta$; curves for GQL, JRA, COS, SMD, GMD, RMD.)

It follows from these figures that all simulated strategies have moderate computational complexity. Predictably, COS has the highest complexity because the number of optimization variables $(\mathbf{a}^{L}, \mathbf{a}^{U}, \mathbf{c}, \mathbf{p})$ in this algorithm is bigger than that in JRA, GQL, SMD, GMD, and RMD. The lowest complexity and solution accuracy are achieved in SMD, GMD, and RMD (which have a linear time complexity but are based on very raw approximations and plain heuristic assumptions).
Figures 11–13 present the observations collected at slot $t$ = 0 with $\upsilon_\alpha$ = 0.55, $\upsilon_\beta$ = 0.6, and $\upsilon_\gamma$ = 0.67 (in JRA). The average user throughput $\bar{r}$ (in kbits/s) and the average

transmission power (per user) $\bar{p}$ (in dBm) in different algorithms, estimated according to

\[ \bar{r} = \frac{1}{N} \sum_{n \in \mathbf{N}} \big( r_n^L(t) + r_n^U(t) \big), \qquad \bar{p} = \frac{1}{N} \sum_{n \in \mathbf{N}} p_n(t), \tag{27} \]

are shown in Figures 11 and 12, respectively. The instantaneous network utility $u_t$ in different algorithms depending on the target SINR level, $\mathrm{SINR}^{\mathrm{tar}}$, with a fixed number of user pairs, $N$ = 00, is plotted in Figure 13.

Figure 11: The average user throughput $\bar{r}$ (in kbits/s) in different algorithms with fixed $C$ = 0 observed at slot $t$ = 0. (x-axis: number of user pairs, $N$; y-axis: average user throughput, $\bar{r}$ (kbits/s); curves for GQL, JRA, COS, SMD, GMD, RMD.)

Figure 12: The average user transmit power (in dBm) in different algorithms with fixed $C$ = 0 observed at slot $t$ = 0. (x-axis: number of user pairs, $N$; y-axis: average transmit power per user, $\bar{p}$ (dBm); curves for GQL, JRA, COS, SMD, GMD, RMD.)

Figure 13: The instantaneous network utility $u_t$ in different algorithms with fixed $N$ = 00 and fixed $C$ = 0 observed at slot $t$ = 0. (x-axis: target SINR level, $\mathrm{SINR}^{\mathrm{tar}}$; y-axis: instantaneous network utility, $u_t$ (×10^6); curves for GQL, JRA, COS, SMD, GMD, RMD.)

The obtained results demonstrate that the average user throughput decreases with the number of user pairs $N$ (Figure 11). This is rather predictable because, when the network load increases, the number of RBs or unlicensed channels available for each user decreases, resulting in a reduced throughput. Besides, to achieve the desired SINR levels, the users tend to transmit at a higher power level (see Figure 12) when the total number of user pairs in the network increases. The graphs in Figure 13 show that the network utilities in different resource allocation schemes are described by some concave functions of $\mathrm{SINR}^{\mathrm{tar}}$. To understand such results, note that with too low settings of $\mathrm{SINR}^{\mathrm{tar}}$ ($\mathrm{SINR}^{\mathrm{tar}}$ < 0 dB), the total user throughput reduces because of the bad channel conditions, leading to a decreased network utility. On the other hand, when $\mathrm{SINR}^{\mathrm{tar}}$ is too high ($\mathrm{SINR}^{\mathrm{tar}}$ > 4 dB), the throughput (and, consequently, the network utility) degrades due to the shortage of available bandwidth, since the number of channels with suitable data transmission conditions becomes very small (because not all of them satisfy the SINR requirements of the users).
We also observe that, in all simulated scenarios, the performance of JRA is very close to optimal (i.e., the one achieved in COS). GQL performs a little worse than JRA but still better than the heuristic algorithms (SMD, GMD, and RMD).

5. Conclusion

This paper introduces a JRA algorithm for a D2D-enabled LTE-A network with access to unlicensed band provided by one or more RATs based on different channel access methods (OFDMA, CSMA/CA, FH-CDMA, etc.). In the presented framework, the inband/outband network resources (cellular/D2D modes, spectrum, and power) are allocated jointly by the LTE eNB to maximize the total network utility. Unlike most of the previously proposed techniques for

outband D2D communication (which presume a certain level of coordination and information exchange between licensed and unlicensed systems), our JUSTE-RL based approach for unlicensed channel assignment is fully autonomous and has demonstrated relatively fast (≈300 RL iterations) convergence to ε-Nash equilibrium (given the appropriate settings of learning rates). Simulation results also show that the proposed joint inband/outband resource allocation strategy outperforms other relevant spectrum and power management schemes in terms of energy efficiency and throughput maximization.

Competing Interests

The authors declare that they have no competing interests.

References

[1] A. Asadi, Q. Wang, and V. Mancuso, "A survey on device-to-device communication in cellular networks," IEEE Communications Surveys and Tutorials, vol. 16, no. 4, pp. 1801–1819, 2014.
[2] A. Asheralieva and Y. Miyanaga, "Dynamic buffer status-based control for LTE-A network with underlay D2D communication," IEEE Transactions on Communications, vol. 64, no. 3, 2016.
[3] A. Asheralieva and Y. Miyanaga, "QoS oriented mode, spectrum and power allocation for D2D communication underlaying LTE-A network," IEEE Transactions on Vehicular Technology, 2016.
[4] J. Kim, S. Kim, J. Bang, and D. Hong, "Adaptive mode selection in D2D communications considering the bursty traffic model," IEEE Communications Letters, vol. 20, no. 4, pp. 712–715, 2016.
[5] D. Penda, L. Fu, and M. Johansson, "Energy efficient D2D communications in dynamic TDD systems," https://arxiv.org/abs/
[6] K. Yang, S. Martin, L. Boukhatem, J. Wu, and X. Bu, "Energy-efficient resource allocation for device-to-device communications overlaying LTE networks," in Proceedings of the IEEE 82nd Vehicular Technology Conference (VTC Fall '15), pp. 1–6, Boston, Mass, USA, September 2015.
[7] A. Asadi, P. Jacko, and V. Mancuso, "Modeling multi-mode D2D communications in LTE," ACM SIGMETRICS, vol. 42, no. 2, pp. 55–57, 2014.
[8] F. Malandrino, C. Casetti, C. F. Chiasserini, and Z.
Limani, Uplink and downlink resource allocaion in D2D-enabled heerogeneous neworks, in Proceedings of he IEEE Wireless Communicaions and Neworking Conference Workshops (WCNCW 4),pp.87 92,April204. [9] D.Feng,L.Lu,Y.-W.Yi,G.Y.Li,G.Feng,andS.Li, Device-odevice communicaions underlaying cellular neworks, IEEE Transacions on Communicaions, vol.6,no.8,pp , 203. [0] L. Su, Y. Ji, P. Wang, and F. Liu, Resource allocaion using paricle swarm opimizaion for D2D communicaion underlay of cellular neworks, in Proceedings of he IEEE Wireless Communicaions and Neworking Conference (WCNC 3), pp , IEEE, Shanghai, China, April 203. [] Wi-Fi Alliance, Wi-Fi Peer-o-Peer (P2P) Specificaion version., Wi-Fi Alliance Specificaion,, 200. [2] Z. Alliance, Zigbee Specificaion, Documen r06 (version),, [3] Blueooh Specificaion, Blueooh Specificaion version., 200, hp:// [4] A. Asadi and V. Mancuso, Energy efficien opporunisic uplink packe forwarding in hybrid wireless neworks, in Proceedings of he 4h ACM Inernaional Conference on Fuure Energy Sysems (e-energy 3), pp , Berkeley, Calif, USA, May 203. [5] A. Asadi and V. Mancuso, On he compound impac of opporunisic scheduling and D2D communicaions in cellular neworks, in Proceedings of he 6h ACM Inernaional ConferenceonModeling,AnalysisandSimulaionofWirelessand Mobile Sysems (MSWiM 3), pp , November 203. [6] A. Asadi and V. Mancuso, WiFi Direc and LTE D2D in acion, in Proceedings of he 6h IFIP/IEEE Wireless Days Conference (WD 3), pp. 8, Valencia, Spain, November 203. [7] Q. Wang and B. Rengarajan, Recouping opporunisic gain in dense base saion layous hrough energy-aware user cooperaion, in Proceedings of he IEEE 4h Inernaional Symposium on a World of Wireless, Mobile and Mulimedia Neworks (WoWMoM 3),pp. 9,June203. [8] B.Zhou,S.Ma,J.Xu,andZ.Li, Group-wisechannelsensing and resource pre-allocaion for LTE D2D on ISM band, in Proceedings of he IEEE Wireless Communicaions and Neworking Conference (WCNC 3), pp. 8 22, IEEE, Shanghai, China, April 203. 
[9] M. Ji, G. Caire, and A. F. Molisch, Wireless device-o-device caching neworks: basic principles and sysem performance, IEEE Journal on Seleced Areas in Communicaions,vol.34,no.,pp.76 89,206. [20] H. Cai, I. Koprulu, and N. B. Shroff, Exploiing double opporuniies for deadline based conen propagaion in wireless neworks, in Proceedings of he 32nd IEEE Conference on Compuer Communicaions (IEEE INFOCOM 3),pp , Turin, Ialy, April 203. [2] A. Asadi, P. Jacko, and V. Mancuso, Modeling muli-mode D2D communicaions in LTE, hps://arxiv.org/abs/ [22] R. S. Suon and A. G. Baro, Reinforcemen Learning: An Inroducion, MIT Press, Cambridge, Mass, USA, 998. [23] S. M. Perlaza, H. Tembine, and S. Lasaulce, How can ignoran bu paien cogniive erminals learn heir sraegy and uiliy? in Proceedings of he IEEE h Inernaional Workshop on Signal Processing Advances in Wireless Communicaions (SPAWC 0), June 200. [24] Y. Xing and R. Chandramouli, Sochasic learning soluion for disribued discree power conrol game in wireless daa neworks, IEEE/ACM Transacions on Neworking, vol. 6, no. 4, pp , [25] L. Rose, S. Lasaulce, S. M. Perlaza, and M. Debbah, Learning equilibria wih parial informaion in decenralized wireless neworks, IEEE Communicaions Magazine, vol. 49, no. 8, pp , 20. [26]Y.Xu,Q.Wu,L.Shen,J.Wang,andA.Anpalagan, Opporunisic specrum access wih spaial reuse: graphical game and uncoupled learning soluions, IEEE Transacions on Wireless Communicaions,vol.2,no.0,pp ,203. [27]Y.Xu,Q.Wu,J.Wang,L.Shen,andA.Anpalagan, Opporunisic specrum access using parially overlapping channels: graphical game and uncoupled learning, IEEE Transacions on Communicaions,vol.6,no.9,pp ,203. [28] D. Kalahil, N. Nayyar, and R. Jain, Decenralized learning for muliplayer muliarmed bandis, IEEE Transacions on Informaion Theory,vol.60,no.4,pp ,204.

19 8 Mobile Informaion Sysems [29] S. Maghsudi and S. Sańczak, Channel selecion for neworkassised D2D communicaion via no-regre bandi learning wih calibraed forecasing, IEEE Transacions on Wireless Communicaions,vol.4,no.3,pp ,205. [30] A. Asheralieva and Y. Miyanaga, An auonomous learningbased algorihm for join channel and power level selecion by D2D pairs in heerogeneous cellular neworks, IEEE Transacions on Communicaions,vol.64,no.9,pp ,206. [3] 3rd Generaion Parnership Projec, Physical channels and modulaion, Technical Specificaion 3GPP TS 36.2 V9..0, 200. [32] 3rd Generaion Parnership Projec; Technical Specificaion, E-UTRA; MAC proocol specificaion, 3GPP TS 36.2 V2.5.0, 205. [33] L. Lei, Z. Zhong, C. Lin, and X. Shen, Operaor conrolled device-o-device communicaions in LTE-advanced neworks, IEEE Wireless Communicaions,vol.9,no.3,pp.96 04,202. [34] H. Holma and A. Toskala, LTE for UMTS: Evoluion o LTE- Advanced, John Wiley & Sons, New York, NY, USA, 20. [35] I. F. Akyildiz, D. M. Guierrez-Esevez, and E. C. Reyes, The evoluion o 4G cellular sysems: LTE-Advanced, Physical Communicaion,vol.3,no.4,pp ,200. [36] Y.Xu,J.Wang,Q.Wu,A.Anpalagan,andY.-D.Yao, Opporunisic specrum access in unknown dynamic environmen: a game-heoreic sochasic learning soluion, IEEE Transacions on Wireless Communicaions, vol., no. 4, pp , 202. [37] H. Li, Muli-agen Q-learning of channel selecion in muliuser cogniive radio sysems: a wo by wo case, in Proceedings of he IEEE Inernaional Conference on Sysems, Man and Cyberneics (SMC 09), pp , Ocober [38] T. Berhold, Heurisic Algorihms in Global MINLP Solvers, Verlag Dr. Hu, 204. [39] S. Burer and A. N. Lechford, Non-convex mixed-ineger nonlinear programming: a survey, SurveysinOperaionsResearch and Managemen Science,vol.7,no.2,pp.97 06,202. [40] M. Fischei and A. Lodi, Local branching, Mahemaical Programming,vol.98,no. 3,pp.23 47,2003. [4] A. 
Lodi, The heurisic (dark) side of MIP solvers, in Hybrid Meaheurisics,vol.434ofSudies in Compuaional Inelligence, pp , Springer, Berlin, Germany, 203. [42] C.D Ambrosio,A.Frangioni,L.Liberi,andA.Lodi, Asorm of feasibiliy pumps for nonconvex MINLP, Mahemaical Programming B,vol.36,no.2,pp ,202. [43] M. Fischei and D. Salvagnin, Feasibiliy pump 2.0, Mahemaical Programming Compuaion,vol.,no.2-3,pp , [44] C.D Ambrosio,A.Frangioni,L.Liberi,andA.Lodi, Asorm of feasibiliy pumps for nonconvex MINLP, Mahemaical Programming,vol.36,no.2,pp ,202. [45] S. P. Boyd and L. Vandenberghe, Convex Opimizaion, Cambridge Universiy Press, [46] S. Leyffer, Inegraing SQP and branch-and-bound for mixed ineger nonlinear programming, Compuaional Opimizaion and Applicaions,vol.8,no.3,pp ,200. [47] R. Sun, M. Hong, and Z.-Q. Luo, Join downlink base saion associaion and power conrol for max-min fairness: compuaion and complexiy, IEEEJournalonSelecedAreasin Communicaions,vol.33,no.6,pp ,205. [48] Q. Kuang, W. Uschick, and A. Dozler, Opimal join user associaion and resource allocaion in heerogeneous neworks via sparsiy pursui, hps://arxiv.org/abs/ [49] K. Shen and W. Yu, Disribued pricing-based user associaion for downlink heerogeneous cellular neworks, IEEE Journal on Seleced Areas in Communicaions, vol. 32, no. 6, pp. 00 3, 204. [] M.Peng,X.Xie,Q.Hu,J.Zhang,andH.V.Poor, Conracbased inerference coordinaion in heerogeneous cloud radio access neworks, IEEE Journal on Seleced Areas in Communicaions, vol. 33, no. 6, pp , 205. [5] D. Fooladivanda and C. Rosenberg, Join resource allocaion and user associaion for heerogeneous wireless cellular neworks, IEEE Transacions on Wireless Communicaions, vol. 2, no., pp , 203. [52] M. Sanjabi, M. Razaviyayn, and Z.-Q. Luo, Opimal join base saion assignmen and beamforming for heerogeneous neworks, IEEE Transacions on Signal Processing, vol. 62, no. 8, pp. 9 96, 204. 
[53]Q.Han,B.Yang,X.Wang,K.Ma,C.Chen,andX.Guan, Hierarchical-game-based uplink power conrol in femocell neworks, IEEE Transacions on Vehicular Technology, vol. 63, no. 6, pp , 204. [54] IEEE Sandards Associaion, IEEE 802 :LocalandMeropolian Area Nework Sandards: IEEE 802. Sandard,202. [55] OPNET, hp:// [56] Evolved Universal Terresrial Radio Access (E-UTRA) and Evolved Universal Terresrial Radio Access Nework (E- UTRAN), 3GPP TS , version V9.4.0, 200. [57] M. Bennis, S. Guruacharya, and D. Niyao, Disribued learning sraegies for inerference miigaion in femocell neworks, in Proceedings of he IEEE Global Telecommunicaions Conference (GLOBECOM ), pp. 5, Houson, Tex, USA, December 20.
