
Echo State Transfer Learning for Data Correlation Aware Resource Allocation in Wireless Virtual Reality

Mingzhe Chen, Walid Saad, Changchuan Yin, and Mérouane Debbah

Beijing Laboratory of Advanced Information Network, Beijing University of Posts and Telecommunications, Beijing, China 100876, Emails: chenmingzhe@bupt.edu.cn, ccyin@ieee.org.
Wireless@VT, Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA, Email: walids@vt.edu.
Mathematical and Algorithmic Sciences Lab, Huawei France R&D, Paris, France, Email: merouane.debbah@huawei.com.

Abstract: In this paper, the problem of data correlation aware resource management is studied for a network of wireless virtual reality (VR) users communicating over cloud-based small cell networks (SCNs). In the studied model, the small base stations (SBSs), which have limited computation resources, act as VR control centers that collect tracking information from the VR users over the cellular uplink and send VR images to the VR users over the downlink. In such a setting, VR users may send or request correlated or similar data (panoramic images and tracking data). This potential spatial data correlation can be factored into the resource allocation problem to reduce the traffic load in both the uplink and the downlink. This VR resource allocation problem is formulated as a noncooperative game that allows jointly optimizing the computation and spectrum resources, while being cognizant of the data correlation. To solve this game, a transfer learning algorithm based on the machine learning framework of echo state networks (ESNs) is proposed. Unlike conventional reinforcement learning algorithms that must be executed each time the environment changes, the proposed algorithm can intelligently transfer information on the learned utility, across time, to rapidly adapt to environmental dynamics due to factors such as changes in the users' content or data correlation.
Simulation results show that the proposed algorithm achieves up to 6.7% and 8.2% gains in terms of delay compared to Q-learning with data correlation and Q-learning without data correlation, respectively. The results also show that the proposed algorithm has a faster convergence time than Q-learning and can guarantee low delays.

I. INTRODUCTION

Virtual reality (VR) can enable users to virtually hike the Grand Canyon or carry out a secret mission as a video game hero without leaving their room. However, due to the wired connections of conventional VR devices, the users are significantly restricted in the type of actions that they can take and the VR applications that they can experience. To enable pervasive and truly immersive VR applications, VR systems can be operated using wireless networking technologies [1]. However, operating VR devices over wireless cellular systems such as small cell networks (SCNs) faces many challenges [1] that include tracking accuracy, extremely low delay, and effective image compression.

The existing literature has studied a number of problems related to wireless VR, such as in [1]-[4]. The authors in [1] exposed the future challenges of VR systems over a wireless network. However, this work is restricted to a preliminary survey that does not provide any technical solutions for optimizing wireless VR. In [2], a channel access scheme for wireless multi-user VR systems is proposed. The authors in [3] proposed an alternating current magnetic field-based tracking system to track the position and orientation of a VR user's head. However, existing works such as [2] and [3] only focus on the improvement of one VR quality-of-service (QoS) metric, such as tracking or delay. Indeed, this prior art does not develop any VR-specific model that can capture all factors of VR QoS or jointly consider the uplink and downlink and, hence, these works fall short in addressing the challenges of optimizing VR QoS for wireless users.
In [4], we proposed a wireless VR model that captures the tracking accuracy, processing delay, and transmission delay, and proposed a machine learning based algorithm to solve the resource allocation problem. However, this work focuses only on spectrum allocation and ignores the data correlation in the data transmission of VR users. Indeed, the sensors placed at a VR user can collect the tracking data of other users and, hence, the tracking data of VR users may be correlated. Moreover, when the VR users are watching a football game from different perspectives, the cloud only needs to transmit one 360° image to the SBS; the SBS can then rotate the image and transmit it to the different users. In this case, using data correlation to reduce the traffic load can improve the transmission delay.

The main contribution of this paper is to introduce a novel framework for enabling VR applications over wireless cellular networks. To the best of our knowledge, this is the first work that jointly considers data correlation, spectrum resource allocation, and computation resource allocation for VR over cellular networks. Hence, our key contributions include:

• We propose a novel VR model to jointly capture the downlink and uplink transmission delay, backhaul transmission delay, and computation time, thus effectively quantifying the VR delay for all users in a wireless VR network.
• For the considered VR applications over wireless networks, we analyze the allocation of resource blocks jointly over the uplink and downlink, together with the allocation of the computation resources used to process the uplink data.
• We formulate the problem as a noncooperative game in which the players are the small base stations (SBSs). Each player seeks to find an optimal resource allocation scheme to optimize a utility function that captures the VR delay.
• To solve this game, we propose a transfer learning algorithm based on echo state networks (ESNs) [5] to find the Nash equilibrium of the game.
The proposed algorithm can intelligently transfer information on the learned utility across time and, hence, allow adaptation to environmental dynamics due to factors such as changes in the users' data correlation. Simulation results show that the proposed algorithm can, respectively, yield 6.7% and 8.2% gains in terms of delay compared to Q-learning with data correlation and Q-learning without data correlation.

II. SYSTEM MODEL AND PROBLEM FORMULATION

Consider the downlink and uplink transmission of a cloud-based SCN servicing a set $\mathcal{U}$ of $U$ wireless VR users and a set $\mathcal{B}$ of $B$ SBSs. Here, the downlink is used to transmit the VR images displayed on each user's VR device while the uplink is

used to transmit the tracking information that is used to determine each VR user's location and orientation. The SBSs are connected to a cloud via capacity-constrained backhaul links and the SBSs serve their users using the cellular band. Here, $V_F$ represents the maximum backhaul transmission rate for all users. We focus on entertainment VR applications such as watching immersive videos and playing immersive games. In our model, the SBSs adopt an orthogonal frequency division multiple access (OFDMA) technique and transmit over a set $\mathcal{V}$ of $V$ uplink resource blocks and a set $\mathcal{S}$ of $S$ downlink resource blocks. The coverage of each SBS is a circular area with radius $r$ and each SBS only allocates resource blocks to the users located in its coverage range. We also assume that all of the resource blocks of each SBS are allocated to its associated users.

A. Data Correlation Model

1) Downlink Data Correlation Model: In VR wireless networks, multiple VR users may play the same immersive game with different locations and orientations. In this case, the cloud can exploit the data correlation between the users that are playing the same immersive game to reduce the traffic load of the backhaul links. For example, when the users are watching the same immersive sports game, the cloud can extract the difference between the VR images of these users and will only need to transmit to an SBS the data that is unique to each user. However, when the VR users are playing different immersive games, the data correlation between the users is low and, hence, the cloud needs to transmit entire VR images to the associated VR users. In order to define the data correlation of VR images, we first assume that the number of pixels that user $i$ needs to construct the VR images is $N_i$ and the number of different pixels between any pair of users $i$ and $k$ is $N_{ik}$. Here, $N_{ik}$ is calculated by the cloud using image processing methods such as motion search [6].
Then, the data correlation between user $i$ and user $k$ can be defined as follows:

$$\phi_{ik} = \frac{N_{ik}}{N_i + N_k}, \tag{1}$$

where $N_k$ is the number of pixels that user $k$ needs to construct the VR images during a period. Indeed, (1) captures the difference between the images of users $i$ and $k$. From (1), we can see that when user $i$ and user $k$ are associated with the same SBS, the cloud only needs to transmit $(N_i + N_k) - (N_i + N_k)\phi_{ik}$ pixels to that SBS.

2) Uplink Data Correlation Model: In the uplink, the users must transmit the tracking information to the SBSs. The tracking information is collected by the sensors placed at a VR user's headset or near the VR user. It has been shown that, for most data-gathering applications, the data source can be modeled as a Gaussian field [7]. The uplink data is collected by the sensors and, hence, the uplink data can be assumed to follow a Gaussian distribution. We can assume that the tracking data, $X_i$, collected by each VR user $i$ is a Gaussian random variable with mean $\mu_i$ and variance $\sigma_i^2$. In wireless VR, observations from proximal VR devices are often correlated due to dense deployment. Hence, we consider the power exponential model [8] to capture the spatial correlation of VR tracking data. Here, the covariance $\sigma_{ij}$ between user $i$ and user $j$ separated by distance $d_{ij}$ is:

$$\sigma_{ij} = \mathrm{cov}(X_i, X_j) = \sigma_i \sigma_j e^{-d_{ij}^{\alpha}/\kappa}, \tag{2}$$

where $\alpha$ and $\kappa$ capture the significance of distance variation on data correlation.

B. Delay Model

In an SCN, the VR images are transmitted from the cloud to the SBSs and then to the users. The tracking information is transmitted from the users to the SBSs and processed at each corresponding SBS. In this case, the backhaul links are only used for VR image transmission and the transmission rate of each VR image from the cloud to the SBS can be given as $V_{F_i} = V_F / U$. Here, we assume that the backhaul transmission rate of each user is equal and we do not consider the optimization of the backhaul transmission.
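The two correlation models in (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the numerical values are hypothetical and are not taken from the paper.

```python
import math

def downlink_correlation(n_ik, n_i, n_k):
    """Eq. (1): phi_ik = N_ik / (N_i + N_k)."""
    return n_ik / (n_i + n_k)

def pixels_to_transmit(n_i, n_k, phi_ik):
    """Pixels sent to a shared SBS: (N_i + N_k) - (N_i + N_k) * phi_ik."""
    total = n_i + n_k
    return total - total * phi_ik

def uplink_covariance(sigma_i, sigma_j, d_ij, alpha, kappa):
    """Eq. (2): power exponential model, sigma_i * sigma_j * exp(-d^alpha / kappa)."""
    return sigma_i * sigma_j * math.exp(-(d_ij ** alpha) / kappa)

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
phi = downlink_correlation(n_ik=500, n_i=1000, n_k=1000)
print(phi)                                   # 0.25
print(pixels_to_transmit(1000, 1000, phi))   # 1500.0
# The covariance decays as the distance between the two users grows.
print(uplink_covariance(1.0, 1.0, 1.0, 2, 10.0) > uplink_covariance(1.0, 1.0, 5.0, 2, 10.0))  # True
```

In this sketch, a larger $\phi_{ik}$ reduces the number of pixels sent over the backhaul, and the uplink covariance shrinks with distance at a rate set by $\alpha$ and $\kappa$.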
In a VR model, we need to capture the VR transmission requirements such as high data rate, low delay, and accurate tracking and, hence, we consider the transmission delay as the main VR QoS metric of interest. The downlink rate of user $i$ associated with SBS $j$ is:

$$c_{ij}(\boldsymbol{s}_{ij}) = \sum_{k=1}^{S} s_{ij,k} B \log_2\left(1 + \gamma_{ij,k}\right), \tag{3}$$

where $\boldsymbol{s}_{ij} = [s_{ij,1}, \ldots, s_{ij,S}]$ is the vector of resource blocks that SBS $j$ allocates to user $i$ with $s_{ij,k} \in \{1, 0\}$. Here, $s_{ij,k} = 1$ indicates that resource block $k$ is allocated to user $i$.

$$\gamma_{ij,k} = \frac{P_B h_{ij}^{k}}{N_0^2 + \sum_{l \in \mathcal{R}^k, l \neq j} P_B h_{il}^{k}}$$

is the signal-to-interference-plus-noise ratio (SINR) between user $i$ and SBS $j$ over resource block $k$. Here, $\mathcal{R}^k$ represents the set of the SBSs that use downlink resource block $k$, $B$ is the bandwidth of each subcarrier, $P_B$ is the transmit power of SBS $j$, which is assumed to be equal for all SBSs, $N_0^2$ is the variance of the Gaussian noise, and $h_{ij}^{k} = g_{ij}^{k} d_{ij}^{-\beta}$ is the path loss between user $i$ and SBS $j$ over resource block $k$, where $g_{ij}^{k}$ is the Rayleigh fading parameter, $d_{ij}$ is the distance between user $i$ and SBS $j$, and $\beta$ is the path loss exponent. Based on (1) and (3), the downlink transmission delay at time slot $t$ is:

$$D_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right) = \frac{L_i(\phi_i^{\max})}{c_{ij}(\boldsymbol{s}_{ij})} + \frac{L_i(\phi_i^{\max})}{V_F / U}, \tag{4}$$

where $L_i(\phi_i^{\max})$ is the data that user $i$ needs to construct a VR image during a period and $\phi_i^{\max} = \max_{k \in \mathcal{U}_j, k \neq i} \phi_{ik}$ is the maximum downlink data correlation between user $i$ and the other users associated with SBS $j$. Finding the maximum data correlation allows minimizing the data that is transmitted in the downlink to construct a VR image. Here, the first term is the transmission time from SBS $j$ to user $i$ and the second term is the transmission time from the cloud to SBS $j$.

We assume that $P_U$ is the transmit power of each user, which is assumed to be equal for all users. The bandwidth of each uplink resource block is also $B$.
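In this case, the uplink rate of each user $i$ associated with SBS $j$ is:

$$c_{ij}(\boldsymbol{v}_{ij}) = \sum_{k=1}^{V} v_{ij,k} B \log_2\left(1 + \gamma_{ij,k}\right), \tag{5}$$

where $\boldsymbol{v}_{ij} = [v_{ij,1}, \ldots, v_{ij,V}]$ is the vector of resource blocks that SBS $j$ allocates to user $i$ with $v_{ij,k} \in \{1, 0\}$.

$$\gamma_{ij,k} = \frac{P_U h_{ij}^{k}}{\sigma^2 + \sum_{l \in \mathcal{U}^k, l \neq j} P_U h_{il}^{k}}$$

is the SINR between user $i$ and SBS $j$ over resource block $k$, where $\mathcal{U}^k$ represents the set of users that use uplink resource block $k$. In this case, the uplink transmission delay can be given by $K_i(\sigma_i^{\max}) / c_{ij}(\boldsymbol{v}_{ij})$, where $K_i(\sigma_i^{\max})$ is the data that needs to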

be transmitted and $\sigma_i^{\max} = \max_{k \in \mathcal{U}_j, k \neq i} \sigma_{ik}$ is the maximum uplink data correlation between user $i$ and the other users associated with SBS $j$. Similarly, finding the maximum data correlation allows minimizing the uplink data that SBS $j$ uses to determine user $i$'s location and orientation. In the uplink, the tracking information can be directly processed by the SBSs, which have limited computation power. Here, the computation resource of each SBS, $c$, represents its ability to compute the tracking data. Each SBS $j$ will allocate its total computation power to the associated users and, hence, $m_{ij}$ is used to represent the computation power that SBS $j$ allocates to user $i$ with $\sum_{i \in \mathcal{U}_j} m_{ij} = m$, where $\mathcal{U}_j$ represents the set of the users associated with SBS $j$. The computation time that SBS $j$ needs to process the tracking data collected by user $i$ is $K_i(\sigma_i^{\max}) / m_{ij}$ and the total uplink delay can be given by:

$$D_{ij}\left(K_i(\sigma_i^{\max}), \boldsymbol{v}_{ij}, m_{ij}\right) = \frac{K_i(\sigma_i^{\max})}{c_{ij}(\boldsymbol{v}_{ij})} + \frac{K_i(\sigma_i^{\max})}{m_{ij}}, \tag{6}$$

where the first term is the transmission time from user $i$ to SBS $j$ and the second term is the computation time for user $i$'s data. Here, the computation time depends on the computation resource that SBS $j$ allocates to each user, which in turn affects the uplink delay.

C. Utility Function Model

In order to jointly consider the transmission delay in both the uplink and the downlink, we introduce a method based on the framework of multi-attribute utility theory [9] to construct an appropriate utility function. We first introduce the utility functions of the transmission delay in the uplink and the downlink separately. Then, we formulate the total utility function based on [9].
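As a rough illustration of the delay model, the following Python sketch evaluates the rate expressions in (3) and (5) and the delays in (4) and (6). The helper names and all numerical values are hypothetical placeholders, and the channel gains are passed in directly rather than drawn from a Rayleigh fading model.

```python
import math

def sinr(p_tx, h_signal, noise_var, h_interferers):
    """SINR on one resource block: P*h / (noise + sum of P*h_l over interferers)."""
    interference = sum(p_tx * h for h in h_interferers)
    return p_tx * h_signal / (noise_var + interference)

def rate(alloc, sinrs, bandwidth_hz):
    """Eqs. (3)/(5): sum_k alloc_k * B * log2(1 + gamma_k), with alloc_k in {0, 1}."""
    return sum(a * bandwidth_hz * math.log2(1.0 + g) for a, g in zip(alloc, sinrs))

def downlink_delay(data_bits, dl_rate, backhaul_rate, num_users):
    """Eq. (4): SBS-to-user time plus cloud-to-SBS time at rate V_F / U."""
    return data_bits / dl_rate + data_bits / (backhaul_rate / num_users)

def uplink_delay(data_bits, ul_rate, compute_rate):
    """Eq. (6): user-to-SBS time plus processing time at the SBS."""
    return data_bits / ul_rate + data_bits / compute_rate

# Hypothetical example: resource blocks 1 and 3 are allocated to the user.
c = rate(alloc=[1, 0, 1], sinrs=[3.0, 7.0, 1.0], bandwidth_hz=1e6)
print(c)  # 3000000.0
print(downlink_delay(3e6, c, backhaul_rate=1e9, num_users=10))
# Doubling the computation share m_ij lowers the uplink delay.
print(uplink_delay(1e6, 2e6, compute_rate=1e6), uplink_delay(1e6, 2e6, compute_rate=2e6))
```

The sketch makes the structure of the model visible: spectrum allocation enters only through the rate term, while the backhaul share and the computation share each contribute a separate additive delay.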
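The utility function of the downlink transmission delay is constructed based on the normalization of the downlink transmission delay $D_{ij}(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij})$ in (4), and can be given by:

$$\bar{D}_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right) = \begin{cases} \dfrac{D_{ij,\max} - D_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right)}{D_{ij,\max} - \gamma}, & D_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right) \geq \gamma, \\ 1, & D_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right) < \gamma, \end{cases} \tag{7}$$

where $\gamma$ is the maximal tolerable delay for each VR user (the maximum supported by the VR system being used) and $D_{ij,\max} = \max_{\boldsymbol{s}_{ij}} D_{ij}(L_i(0), \boldsymbol{s}_{ij})$ is the maximal transmission delay. From (7), we can see that, when the downlink transmission delay is smaller than $\gamma$, the utility value will remain at 1. This is due to the fact that, when the delay meets the system requirement, the network will encourage the SBSs to reallocate the resource blocks to other users. The utility function for the uplink transmission is:

$$\bar{D}_{ij}\left(K_i(\sigma_i^{\max}), \boldsymbol{v}_{ij}, m_{ij}\right) = \begin{cases} \dfrac{D_{ij,\max} - D_{ij}\left(K_i(\sigma_i^{\max}), \boldsymbol{v}_{ij}, m_{ij}\right)}{D_{ij,\max} - \gamma}, & D_{ij}\left(K_i(\sigma_i^{\max}), \boldsymbol{v}_{ij}, m_{ij}\right) \geq \gamma, \\ 1, & D_{ij}\left(K_i(\sigma_i^{\max}), \boldsymbol{v}_{ij}, m_{ij}\right) < \gamma, \end{cases} \tag{8}$$

where $\gamma$ is the maximal tolerable delay for the VR tracking information transmission and $D_{ij,\max} = \max_{\boldsymbol{v}_{ij}, m_{ij}} D_{ij}(K_i(0), \boldsymbol{v}_{ij}, m_{ij})$ is the maximal uplink delay. Based on (7) and (8), the total utility function that captures both the downlink and uplink delay for user $i$ associated with SBS $j$ is:

$$U_{ij}\left(\boldsymbol{s}_{ij}, \boldsymbol{v}_{ij}, m_{ij}\right) = \bar{D}_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right) \bar{D}_{ij}\left(K_i(\sigma_i^{\max}), \boldsymbol{v}_{ij}, m_{ij}\right). \tag{9}$$

Here, $L_i(\phi_i^{\max})$ and $K_i(\sigma_i^{\max})$ are determined by the user association scheme. In order to capture the gain that stems from the allocation of the resource blocks and the computational capabilities, we state the following result:

Theorem 1. The utility gain of user $i$'s delay due to an increase in the amount of allocated resource blocks and computational resources is:

i) The gain that stems from an increase $\Delta \boldsymbol{v}_{ij}$ in the allocated uplink resource blocks, $\Delta U_{ij}$, is given by:

$$\Delta U_{ij} = \begin{cases} f_{ij}\left(\dfrac{1}{c_{ij}(\boldsymbol{v}_{ij})}\right), & c_{ij}(\Delta \boldsymbol{v}_{ij}) \gg c_{ij}(\boldsymbol{v}_{ij}), \\ f_{ij}\left(\dfrac{c_{ij}(\Delta \boldsymbol{v}_{ij})}{c_{ij}(\boldsymbol{v}_{ij})^2}\right), & c_{ij}(\Delta \boldsymbol{v}_{ij}) \ll c_{ij}(\boldsymbol{v}_{ij}), \\ f_{ij}\left(\dfrac{c_{ij}(\Delta \boldsymbol{v}_{ij})}{c_{ij}(\boldsymbol{v}_{ij})^2 + c_{ij}(\boldsymbol{v}_{ij}) c_{ij}(\Delta \boldsymbol{v}_{ij})}\right), & \text{else}, \end{cases} \tag{10}$$

where $f_{ij}(x) = \dfrac{\bar{D}_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right) K_i(\sigma_i^{\max})\, x}{D_{ij,\max} - \gamma}$, with $D_{ij,\max}$ and $\gamma$ referring to the uplink quantities in (8).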
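ii) The gain that stems from an increase $\Delta \boldsymbol{s}_{ij}$ in the number of downlink resource blocks allocated to user $i$, $\Delta U_{ij}$, is:

$$\Delta U_{ij} = \begin{cases} f_{ij}\left(\dfrac{1}{c_{ij}(\boldsymbol{s}_{ij})}\right), & c_{ij}(\Delta \boldsymbol{s}_{ij}) \gg c_{ij}(\boldsymbol{s}_{ij}), \\ f_{ij}\left(\dfrac{c_{ij}(\Delta \boldsymbol{s}_{ij})}{c_{ij}(\boldsymbol{s}_{ij})^2}\right), & c_{ij}(\Delta \boldsymbol{s}_{ij}) \ll c_{ij}(\boldsymbol{s}_{ij}), \\ f_{ij}\left(\dfrac{c_{ij}(\Delta \boldsymbol{s}_{ij})}{c_{ij}(\boldsymbol{s}_{ij})^2 + c_{ij}(\boldsymbol{s}_{ij}) c_{ij}(\Delta \boldsymbol{s}_{ij})}\right), & \text{else}, \end{cases} \tag{11}$$

where $f_{ij}(x) = \dfrac{\bar{D}_{ij}\left(K_i(\sigma_i^{\max}), \boldsymbol{v}_{ij}, m_{ij}\right) L_i(\phi_i^{\max})\, x}{D_{ij,\max} - \gamma}$, with $\bar{D}_{ij}$ denoting the normalized delay utility in (7) and (8), and $D_{ij,\max}$ and $\gamma$ referring to the downlink quantities in (7).

iii) The gain that stems from an increase $\Delta m$ in the amount of computation resources allocated to user $i$, $\Delta U_{ij}$, is:

$$\Delta U_{ij} = \frac{\bar{D}_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right) K_i(\sigma_i^{\max})\, \Delta m}{\left(D_{ij,\max} - \gamma\right) m_{ij}\left(m_{ij} + \Delta m\right)}. \tag{12}$$

Proof. For i), the gain that stems from an increase in the allocated uplink resource blocks, $\Delta U_{ij}$, can be given by:

$$\Delta U_{ij} = U_{ij}\left(\boldsymbol{s}_{ij}, \boldsymbol{v}_{ij} + \Delta \boldsymbol{v}_{ij}, m_{ij}\right) - U_{ij}\left(\boldsymbol{s}_{ij}, \boldsymbol{v}_{ij}, m_{ij}\right) = \bar{D}_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right)\left[\bar{D}_{ij}\left(K_i(\sigma_i^{\max}), \boldsymbol{v}_{ij} + \Delta \boldsymbol{v}_{ij}, m_{ij}\right) - \bar{D}_{ij}\left(K_i(\sigma_i^{\max}), \boldsymbol{v}_{ij}, m_{ij}\right)\right]. \tag{13}$$

Substituting (8) and (6) into (13), and noting that the rate in (5) is additive in the allocated resource blocks so that $c_{ij}(\boldsymbol{v}_{ij} + \Delta \boldsymbol{v}_{ij}) = c_{ij}(\boldsymbol{v}_{ij}) + c_{ij}(\Delta \boldsymbol{v}_{ij})$, (13) can be rewritten as follows:

$$\Delta U_{ij} = \bar{D}_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right) \frac{\dfrac{K_i(\sigma_i^{\max})}{c_{ij}(\boldsymbol{v}_{ij})} - \dfrac{K_i(\sigma_i^{\max})}{c_{ij}(\boldsymbol{v}_{ij} + \Delta \boldsymbol{v}_{ij})}}{D_{ij,\max} - \gamma} = \bar{D}_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right) \frac{K_i(\sigma_i^{\max})}{D_{ij,\max} - \gamma} \cdot \frac{c_{ij}(\Delta \boldsymbol{v}_{ij})}{c_{ij}(\boldsymbol{v}_{ij})^2 + c_{ij}(\boldsymbol{v}_{ij}) c_{ij}(\Delta \boldsymbol{v}_{ij})}. \tag{14}$$

Here, when $c_{ij}(\Delta \boldsymbol{v}_{ij}) \gg c_{ij}(\boldsymbol{v}_{ij})$, $\dfrac{c_{ij}(\Delta \boldsymbol{v}_{ij})}{c_{ij}(\boldsymbol{v}_{ij})^2 + c_{ij}(\boldsymbol{v}_{ij}) c_{ij}(\Delta \boldsymbol{v}_{ij})} \approx \dfrac{1}{c_{ij}(\boldsymbol{v}_{ij})}$ and, consequently, $\Delta U_{ij} = \dfrac{\bar{D}_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right) K_i(\sigma_i^{\max})}{\left(D_{ij,\max} - \gamma\right) c_{ij}(\boldsymbol{v}_{ij})}$. Moreover, when $c_{ij}(\Delta \boldsymbol{v}_{ij}) \ll c_{ij}(\boldsymbol{v}_{ij})$, $\dfrac{c_{ij}(\Delta \boldsymbol{v}_{ij})}{c_{ij}(\boldsymbol{v}_{ij})^2 + c_{ij}(\boldsymbol{v}_{ij}) c_{ij}(\Delta \boldsymbol{v}_{ij})} \approx \dfrac{c_{ij}(\Delta \boldsymbol{v}_{ij})}{c_{ij}(\boldsymbol{v}_{ij})^2}$ and, consequently, $\Delta U_{ij} = \dfrac{\bar{D}_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right) K_i(\sigma_i^{\max})\, c_{ij}(\Delta \boldsymbol{v}_{ij})}{\left(D_{ij,\max} - \gamma\right) c_{ij}(\boldsymbol{v}_{ij})^2}$. For any other case, $\Delta U_{ij} = \dfrac{\bar{D}_{ij}\left(L_i(\phi_i^{\max}), \boldsymbol{s}_{ij}\right) K_i(\sigma_i^{\max})}{D_{ij,\max} - \gamma} \cdot \dfrac{c_{ij}(\Delta \boldsymbol{v}_{ij})}{c_{ij}(\boldsymbol{v}_{ij})^2 + c_{ij}(\boldsymbol{v}_{ij}) c_{ij}(\Delta \boldsymbol{v}_{ij})}$.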

Cases ii) and iii) can be proved using a method similar to that of case i). This completes the proof.

From Theorem 1, we can see that the allocation of spectrum and computation resources jointly determines the delay utility. Indeed, Theorem 1 provides guidance for the SBSs when they select actions in the learning algorithm that is proposed in Section III.

D. Problem Formulation

Given the defined system model, our goal is to develop an effective resource allocation scheme that allocates resource blocks and computation power to maximize the utility functions of all users. However, the maximization problem depends not only on the resource block allocation and the computation resource allocation but also on the user associations. Moreover, the utility value of each SBS depends not only on its own choice of resource allocation scheme but also on the schemes of the remaining SBSs. In addition, the data correlation among the users varies from one period to the next, which will affect the resource allocation and user association. In this case, we first formulate a noncooperative game $\mathcal{G} = \left[\mathcal{B}, \{\mathcal{A}_j\}_{j \in \mathcal{B}}, \{U_j\}_{j \in \mathcal{B}}\right]$. In this game, the players $j \in \mathcal{B}$ are the SBSs, $\mathcal{A}_j$ represents the action set of each SBS $j$, and $U_j$ is the utility function of each SBS $j$. Here, an action of SBS $j$, $\boldsymbol{a}_j$, consists of: i) the downlink resource allocation vector $\boldsymbol{s}_j = [\boldsymbol{s}_{1j}, \boldsymbol{s}_{2j}, \ldots, \boldsymbol{s}_{U_j j}]$, ii) the uplink resource allocation vector $\boldsymbol{v}_j = [\boldsymbol{v}_{1j}, \boldsymbol{v}_{2j}, \ldots, \boldsymbol{v}_{U_j j}]$, and iii) the computation resource allocation vector $\boldsymbol{m}_j = [m_{1j}, m_{2j}, \ldots, m_{U_j j}]$. Here, $m_{ij} \in \mathcal{M}$, $i \in \mathcal{U}_j$, where $\mathcal{M} = \left\{\frac{c}{M}, \frac{2c}{M}, \ldots, c\right\}$ is a finite set of $M$ level fractions of SBS $j$'s total computation resource. We assume that each SBS $j$ adopts one action at each time slot $t$. Then, the utility function of each SBS $j$ can be given by:

$$u_j\left(\boldsymbol{a}_j, \boldsymbol{a}_{-j}\right) = \frac{1}{T} \sum_{t=1}^{T} \sum_{i \in \mathcal{U}_j} U_{ij,t}\left(\boldsymbol{s}_{ij}, \boldsymbol{v}_{ij}, m_{ij}\right), \tag{15}$$

where $\boldsymbol{a}_j \in \mathcal{A}_j$ is an action of SBS $j$ and $\boldsymbol{a}_{-j}$ denotes the action profile of all SBSs other than SBS $j$. Indeed, (15) captures the average utility value of each SBS $j$ over $T$ time slots.
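The normalized utility in (7) and (8) and the general ("else") branch of the uplink gain in Theorem 1 can be checked numerically with the short Python sketch below. It assumes that the delays stay at or above $\gamma$, so that the utility is in its linear regime; the function names and the numbers are hypothetical.

```python
def normalized_utility(delay, delay_max, gamma):
    """Eqs. (7)/(8): 1 below the tolerable delay gamma, linear in the delay above it."""
    if delay < gamma:
        return 1.0
    return (delay_max - delay) / (delay_max - gamma)

def uplink_gain_closed_form(dl_utility, k_data, c_v, c_dv, delay_max, gamma):
    """General branch of Eq. (10): f(x) with x = c(dv) / (c(v)^2 + c(v) * c(dv))."""
    x = c_dv / (c_v ** 2 + c_v * c_dv)
    return dl_utility * k_data * x / (delay_max - gamma)

def uplink_gain_direct(dl_utility, k_data, c_v, c_dv, m, delay_max, gamma):
    """Eq. (13): difference of the total utility (9) after and before adding c(dv)."""
    before = normalized_utility(k_data / c_v + k_data / m, delay_max, gamma)
    after = normalized_utility(k_data / (c_v + c_dv) + k_data / m, delay_max, gamma)
    return dl_utility * (after - before)

# Hypothetical values: K = 10, c(v) = 2, c(dv) = 3, m = 5, D_max = 20, gamma = 1.
closed = uplink_gain_closed_form(0.8, 10.0, 2.0, 3.0, 20.0, 1.0)
direct = uplink_gain_direct(0.8, 10.0, 2.0, 3.0, 5.0, 20.0, 1.0)
print(abs(closed - direct) < 1e-12)  # True
```

The two computations agree because the computation term $K_i(\sigma_i^{\max})/m_{ij}$ cancels in the difference, which is why $m_{ij}$ does not appear in (10).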
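Let $\pi_{j,\boldsymbol{a}_{ij}} = \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}_{\{\boldsymbol{a}_{j,t} = \boldsymbol{a}_{ij}\}} = \Pr\left(\boldsymbol{a}_{j,t} = \boldsymbol{a}_{ij}\right)$ be the probability of SBS $j$ using action $\boldsymbol{a}_{ij}$. Here, $\boldsymbol{a}_{j,t}$ represents the action that SBS $j$ uses at time $t$ and $\boldsymbol{a}_{j,t} = \boldsymbol{a}_{ij}$ denotes that SBS $j$ adopts action $\boldsymbol{a}_{ij}$ at time $t$. $\boldsymbol{\pi}_j = \left[\pi_{j,\boldsymbol{a}_{1j}}, \ldots, \pi_{j,\boldsymbol{a}_{|\mathcal{A}_j| j}}\right]$ is the action selection strategy of SBS $j$, with $|\mathcal{A}_j|$ being the number of actions of SBS $j$. Based on the definition of the strategy, the utility function in (15) is given by:

$$u_j\left(\boldsymbol{a}_j, \boldsymbol{a}_{-j}\right) = \frac{1}{T} \sum_{t=1}^{T} U_{j,t}\left(\boldsymbol{a}_j, \boldsymbol{a}_{-j}\right) = \sum_{\boldsymbol{a} \in \mathcal{A}} U_j\left(\boldsymbol{a}_j, \boldsymbol{a}_{-j}\right) \prod_{j \in \mathcal{B}} \pi_{j,\boldsymbol{a}_j}, \tag{16}$$

where $\boldsymbol{a} \in \mathcal{A}$ with $\mathcal{A}$ being the action set of all SBSs.

Given the proposed model, our goal is to solve the proposed resource allocation game. A solution for this game is the mixed-strategy Nash equilibrium (NE), formally defined as follows [11]: A mixed strategy profile $\boldsymbol{\pi}^* = \left(\boldsymbol{\pi}_1^*, \ldots, \boldsymbol{\pi}_B^*\right) = \left(\boldsymbol{\pi}_j^*, \boldsymbol{\pi}_{-j}^*\right)$ is a mixed-strategy Nash equilibrium if, $\forall j \in \mathcal{B}$ and $\forall \boldsymbol{\pi}_j$, we have:

$$u_j\left(\boldsymbol{\pi}_j^*, \boldsymbol{\pi}_{-j}^*\right) \geq u_j\left(\boldsymbol{\pi}_j, \boldsymbol{\pi}_{-j}^*\right), \tag{17}$$

where $u_j\left(\boldsymbol{\pi}_j, \boldsymbol{\pi}_{-j}\right) = \sum_{\boldsymbol{a} \in \mathcal{A}} U_j\left(\boldsymbol{a}_j, \boldsymbol{a}_{-j}\right) \prod_{j \in \mathcal{B}} \pi_{j,\boldsymbol{a}_j}$ is the expected utility of SBS $j$ when it selects the mixed strategy $\boldsymbol{\pi}_j$. For our game, the mixed-strategy NE for the SBSs represents a solution of the game at which each SBS $j$ can minimize the delay for its associated users, given the actions of its opponents.

III. ECHO STATE NETWORKS FOR SELF-ORGANIZING RESOURCE ALLOCATION

Next, we introduce a transfer reinforcement learning (RL) algorithm that can be used to find an NE of the VR game. To satisfy the delay requirement for the VR transmission, we propose a transfer RL algorithm based on the neural network framework of echo state networks (ESNs) [5]. Traditional RL algorithms such as Q-learning typically rely on a Q-table to record the utility values. However, as the number of players and actions increases, the number of utility values that the Q-table needs to include will increase exponentially and, hence, the Q-table may not be able to record all of the needed utility values. In contrast, the proposed algorithm uses a utility function approximation method to record the utility values and, hence, it can be used for large networks and large utility spaces.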
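The expected utility under mixed strategies in (16) and (17) sums the utility of every joint action profile weighted by the product of the players' action probabilities. The Python sketch below illustrates this on a toy two-SBS game; the action labels, the utility function, and the helper name are hypothetical and are not part of the paper's model.

```python
from itertools import product

def expected_utility(utility, strategies):
    """Sum over joint action profiles of U(profile) * prod_j pi_j(a_j), as in (16).

    utility: function mapping an action profile (tuple) to a float.
    strategies: list of dicts {action: probability}, one per SBS.
    """
    total = 0.0
    for profile in product(*(s.keys() for s in strategies)):
        prob = 1.0
        for strategy, action in zip(strategies, profile):
            prob *= strategy[action]
        total += prob * utility(profile)
    return total

def toy_utility_sbs0(profile):
    """Toy utility: high when the two SBSs pick different resource blocks."""
    return 1.0 if profile[0] != profile[1] else 0.2

uniform = {"rb1": 0.5, "rb2": 0.5}
pure_rb1 = {"rb1": 1.0, "rb2": 0.0}

mixed_value = expected_utility(toy_utility_sbs0, [uniform, uniform])
deviation_value = expected_utility(toy_utility_sbs0, [pure_rb1, uniform])
print(round(mixed_value, 6), round(deviation_value, 6))  # 0.6 0.6
```

In this toy example, SBS 0 gains nothing by deviating from the uniform strategy to a pure action while its opponent stays uniform, which is the condition in (17) holding with equality for that player. The number of profiles grows exponentially with the number of SBSs, which is the scaling issue that motivates function approximation instead of a Q-table.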
Moreover, in a dynamic network in which the users' computation resources and data correlation may change over time, traditional RL algorithms need to be executed each time the network changes. In contrast, the proposed ESN transfer RL algorithm can find the relationship between the utility functions before and after the environment changes. After learning this relationship, the proposed algorithm can use the historic learning result to find a mixed-strategy NE. The proposed transfer RL algorithm consists of two components: i) an ESN-based RL algorithm and ii) an ESN-based transfer learning algorithm. The ESN-based RL algorithm is based on our work in [4] and, thus, here we only introduce the ESN-based transfer learning algorithm.

We first assume that, before the users' state information changes, the strategy, action, and utility of each SBS $j$ are $\boldsymbol{\pi}_j$, $\boldsymbol{a}_j$, and $\hat{u}_j(\boldsymbol{a}_j, \boldsymbol{a}_{-j})$, while the strategy, action, and utility of SBS $j$ after the users' state information changes are $\boldsymbol{\pi}'_j$, $\boldsymbol{a}'_j$, and $\hat{u}'_j(\boldsymbol{a}'_j, \boldsymbol{a}'_{-j})$. Since the number of users associated with SBS $j$ is unchanged, the sets of actions and strategies of SBS $j$ will not change when the users' state information changes. In this case, the proposed ESN-based transfer learning algorithm is used to find the relationship between $\hat{u}_j(\boldsymbol{a}_j, \boldsymbol{a}_{-j})$ and $\hat{u}'_j(\boldsymbol{a}'_j, \boldsymbol{a}'_{-j})$ when SBS $j$ only knows $\hat{u}_j(\boldsymbol{a}_j, \boldsymbol{a}_{-j})$. This means that the proposed algorithm can transfer the information from the already learned utility $\hat{u}_j(\boldsymbol{a}_j, \boldsymbol{a}_{-j})$ to the new utility $\hat{u}'_j(\boldsymbol{a}'_j, \boldsymbol{a}'_{-j})$ that must be learned. The ESN-based transfer learning algorithm of each SBS $j$ consists of three components: a) input, b) output, and c) ESN model, which are given by:

• Input: The ESN-based transfer learning algorithm takes the strategies of the SBSs and the action that SBS $j$ uses at time $t$ as input, which is given by $\boldsymbol{x}_{j,t} = \left[\boldsymbol{\pi}_1, \ldots, \boldsymbol{\pi}_B, \boldsymbol{a}_{j,t}\right]^{\mathrm{T}}$.
• Output: The output of the ESN-based transfer learning algorithm at time $t$ is the deviation of the utility values when the users' information changes, $y_{j,t} = \hat{u}'_j(\boldsymbol{a}_{j,t}) - \hat{u}_j(\boldsymbol{a}_{j,t})$.
• ESN Model: An ESN model is used to find the relationship between the input $\boldsymbol{x}_{j,t}$ and the output $y_{j,t}$. The ESN model consists of the output weight matrix $\boldsymbol{W}_j^{\mathrm{out}} \in \mathbb{R}^{1 \times N_w}$ and the dynamic reservoir containing the input weight matrix $\boldsymbol{W}_j^{\mathrm{in}} \in \mathbb{R}^{N_w \times (B+1)}$ and the recurrent matrix $\boldsymbol{W}_j \in \mathbb{R}^{N_w \times N_w}$, with $N_w$ being the number of dynamic reservoir units. Here, the dynamic reservoir is used to store historic ESN information that includes

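input, reservoir state, and output. This information is used to build the relationship between the input and the output. The update process of the dynamic reservoir is given by:

$$\boldsymbol{\mu}_{j,t} = f\left(\boldsymbol{W}_j \boldsymbol{\mu}_{j,t-1} + \boldsymbol{W}_j^{\mathrm{in}} \boldsymbol{x}_{j,t}\right), \tag{18}$$

where $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ is the tanh function. Based on the dynamic reservoir state, the ESN-based transfer learning algorithm combines the state with the output weight matrix to approximate the deviation of the utility value, which can be given by:

$$y_{j,t} = \boldsymbol{W}_{j,t}^{\mathrm{out}} \boldsymbol{\mu}_{j,t}, \tag{19}$$

where $\boldsymbol{W}_{j,t}^{\mathrm{out}}$ is the output weight matrix at time slot $t$. The output weight matrix is updated as follows:

$$\boldsymbol{W}_{j,t+1}^{\mathrm{out}} = \boldsymbol{W}_{j,t}^{\mathrm{out}} + \lambda \left(\hat{u}'_j(\boldsymbol{a}_{j,t}) - \hat{u}_j(\boldsymbol{a}_{j,t}) - y_{j,t}\right) \boldsymbol{\mu}_{j,t}^{\mathrm{T}}, \tag{20}$$

where $\lambda$ is the learning rate and $\hat{u}'_j(\boldsymbol{a}_{j,t}) - \hat{u}_j(\boldsymbol{a}_{j,t})$ is the actual deviation between the two utility values. In this case, the ESN-based transfer learning algorithm can find the relationship between the utility functions when the users' state information changes and, hence, reduce the number of iterations that the RL algorithm needs to learn the new utility values. The proposed, distributed ESN-based learning algorithm performed by each SBS $j$ is summarized in Table I.

TABLE I
ESN-BASED LEARNING ALGORITHM FOR RESOURCE ALLOCATION

Inputs: $\boldsymbol{x}_{j,t}$ and $\boldsymbol{x}'_{j,t}$
Initialize: $\boldsymbol{W}_j^{\mathrm{in}}$, $\boldsymbol{W}_j$, $\boldsymbol{W}_j^{\mathrm{out}}$, $\boldsymbol{W}_j'^{\mathrm{in}}$, $\boldsymbol{W}'_j$, $\boldsymbol{W}_j'^{\mathrm{out}}$, $y_j = 0$, and $y'_j = 0$.
for each time $t$ do
  a) Estimate the value of the utility function $\hat{u}_{j,t}$ based on (9).
  if $t == 1$
    b) Set the mixed strategy $\boldsymbol{\pi}_{j,t}$ uniformly.
  else
    c) Set the mixed strategy $\boldsymbol{\pi}_{j,t}$ based on the $\varepsilon$-greedy exploration.
  end if
  d) Broadcast the index of the mixed strategy to other SBSs.
  e) Receive the index of the mixed strategy as input $\boldsymbol{x}_{j,t}$.
  f) Perform an action based on the mixed strategy.
  g) Use the index of the mixed strategies and action as input $\boldsymbol{x}'_{j,t}$.
  h) Estimate the value of the difference of utility function $y_{j,t}$.
  i) Update the dynamic reservoir state $\boldsymbol{\mu}_{j,t}$.
  j) Update the output weight matrix $\boldsymbol{W}_j^{\mathrm{out}}$ based on $y_{j,t}$.
end for

TABLE II
SYSTEM PARAMETERS

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| $F$ | 1000 | $P_B$ | 20 dBm |
| $B$ | 2 MHz | $S$, $V$ | 5, 5 |
| $N_w$ | 1000 | $\sigma^2$ | -95 dBm |
| $N_v$ | 6 | $\lambda$, $\lambda'$ | 0.03, 0.3 |
| $m$ | 5 | $r_B$ | 30 m |
| $\alpha$ | 2 | $V_F$ | 100 Gbit/s |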
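The reservoir update in (18), the readout in (19), and the output weight update in (20) can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that are not from the paper: the class name `TransferESN`, the random weight initialization and its scaling, the small reservoir size, the constant input, and the fixed target deviation of 0.5 are all hypothetical.

```python
import math
import random

def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

class TransferESN:
    def __init__(self, n_inputs, n_reservoir, learning_rate=0.03, seed=0):
        rng = random.Random(seed)
        # Assumed initialization: fixed random input and recurrent weights.
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                     for _ in range(n_reservoir)]
        # Recurrent weights are scaled down so the reservoir state stays stable.
        self.w = [[rng.uniform(-0.5, 0.5) / n_reservoir for _ in range(n_reservoir)]
                  for _ in range(n_reservoir)]
        self.w_out = [0.0] * n_reservoir
        self.state = [0.0] * n_reservoir
        self.lr = learning_rate

    def step(self, x):
        """Eq. (18): mu_t = tanh(W mu_{t-1} + W_in x_t); Eq. (19): y_t = W_out mu_t."""
        rec = matvec(self.w, self.state)
        inp = matvec(self.w_in, x)
        self.state = [math.tanh(r + i) for r, i in zip(rec, inp)]
        return sum(w * m for w, m in zip(self.w_out, self.state))

    def update(self, target, y):
        """Eq. (20): W_out <- W_out + lambda * (target - y) * mu_t^T."""
        err = target - y
        self.w_out = [w + self.lr * err * m for w, m in zip(self.w_out, self.state)]

def train_demo(steps=300):
    """Learn a fixed utility deviation for one (strategy, action) input."""
    esn = TransferESN(n_inputs=5, n_reservoir=20)
    x = [0.25, 0.25, 0.25, 0.25, 1.0]  # stand-in for [pi_1, ..., pi_B, a_{j,t}]
    target = 0.5                        # stand-in for the actual utility deviation
    errors = []
    for _ in range(steps):
        y = esn.step(x)
        errors.append(abs(target - y))
        esn.update(target, y)
    return errors

errors = train_demo()
print(errors[0], errors[-1] < errors[0])  # 0.5 True
```

Only the output weights are trained, which is the usual ESN design choice: the input and recurrent weights stay fixed, so each update in (20) is a cheap linear correction driven by the prediction error.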
The proposed algorithm is guaranteed to converge to an NE and this convergence follows from [4].

IV. SIMULATION RESULTS

For our simulations, we consider a cloud-based SCN deployed within a circular area with radius $r = 100$ m. $U = 25$ users and $B = 4$ SBSs are uniformly distributed in this SCN area. The rate requirement of VR transmission is 25.32 Mbit/s [4]. The detailed parameters are listed in Table II. For comparison purposes, we use the ESN algorithm and a baseline Q-learning algorithm in [4].

[Fig. 1. Average delay of each user vs. number of SBSs. Vertical axis: average delay of each serviced user (ms); horizontal axis: number of SBSs. Curves: proposed ESN-based learning algorithm, Q-learning algorithm, Q-learning without data correlation, ESN algorithm.]

[Fig. 2. Convergence of the proposed algorithm and Q-learning. Vertical axis: delay utility of each SBS; horizontal axis: number of iterations. Curves: ESN-based transfer learning algorithm, ESN-based learning algorithm, Q-learning with data correlation; the convergent points are marked.]

Fig. 1 shows how the average delay per user changes as the number of SBSs varies. Fig. 1 shows that, as the number of SBSs increases, the average delay of all algorithms first decreases and then increases. This is due to the fact that, as the number of SBSs increases, the number of users located in each SBS's coverage decreases and, hence, the average delay decreases. However, as the number of SBSs keeps increasing, the interference will also increase. Fig. 1 also shows that our algorithm achieves up to 6.7% and 8.2% gains in terms of average delay compared to Q-learning with data correlation and Q-learning without data correlation, respectively, for 6 SBSs. This is due to the fact that our algorithm can transfer information across time. From Fig. 1, we can also see that the deviation between the two Q-learning algorithms decreases as the number of SBSs increases. This implies that, as the number of SBSs increases, the number of users associated with each SBS decreases and, hence, the data correlation of the users decreases. Fig. 1
also shows that the delay gain of the proposed algorithm is small compared with the ESN algorithm. However, the proposed algorithm can converge much faster, as shown in Fig. 2.

Fig. 2 shows the number of iterations needed until convergence for the proposed approach, the ESN algorithm, and Q-learning with data correlation when the users' information changes. In this figure, we can see that, as time elapses, the delay utilities of all considered algorithms increase until they converge to their final values. Fig. 2 also shows that the proposed algorithm achieves, respectively, 22.5% and 36% gains in terms of the number of iterations needed to reach convergence compared to the ESN algorithm and Q-learning. This implies that the proposed algorithm can apply the already learned utility value to the new utility value that must be learned as the users' information changes.

V. CONCLUSION

In this paper, we have proposed a novel resource allocation framework for optimizing delay for wireless VR services with data correlation. We have formulated the problem as a noncooperative game and we have proposed a novel transfer learning algorithm based on echo state networks to solve the game. The proposed learning algorithm can use the existing learning result

to directly find the optimal resource allocation when the users' state information changes and, hence, can quickly converge to a mixed-strategy NE. Simulation results have shown that the proposed algorithm has a faster convergence time than Q-learning and guarantees low delays for VR services.

REFERENCES

[1] E. Baştuğ, M. Bennis, M. Médard, and M. Debbah, "Towards interconnected virtual reality: Opportunities, challenges and enablers," arXiv preprint arXiv:1611.05356, 2016.
[2] J. Ahn, Y. Yong Kim, and R. Y. Kim, "Delay oriented VR mode WLAN for efficient wireless multi-user virtual reality device," in Proc. of IEEE International Conference on Consumer Electronics, Las Vegas, NV, USA, March 2017.
[3] M. Singh and B. Jung, "High-definition wireless personal area tracking using AC magnetic field for virtual reality," in Proc. of IEEE Virtual Reality, Los Angeles, California, USA, March 2017.
[4] M. Chen, W. Saad, and C. Yin, "Virtual reality over wireless networks: Quality-of-service model and learning-based resource management," available online: arxiv.org/abs/1703.04209, Mar. 2017.
[5] H. Jaeger, "Short term memory in echo state networks," in GMD Report, 2001.
[6] J. F. Yang, S. C. Chang, and C. Y. Chen, "Computation reduction for motion search in low rate video coders," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 10, pp. 948-951, Oct. 2002.
[7] N. Cressie and C. K. Wikle, Statistics for spatio-temporal data, John Wiley & Sons, 2015.
[8] M. C. Vuran, O. B. Akan, and I. F. Akyildiz, "Spatio-temporal correlation: theory and applications for wireless sensor networks," Computer Networks, vol. 45, no. 3, pp. 245-259, June 2004.
[9] A. E. Abbas, "Constructing multiattribute utility functions for decision analysis," INFORMS Tutorials in Operations Research, pp. 62-98, Oct. 2010.
[10] M. Chen, W. Saad, C. Yin, and M. Debbah, "Echo state transfer learning for data correlation aware joint computation and resource allocation in wireless virtual reality," available online: http://resume.walidsaad.com/pdf/extendedasilomarpaper.pdf, May 2017.
[11] Z. Han, D. Niyato, W. Saad, T. Basar, and A. Hjørungnes, Game Theory in Wireless and Communication Networks: Theory, Models, and Applications, Cambridge University Press, 2012.