Echo State Transfer Learning for Data Correlation Aware Resource Allocation in Wireless Virtual Reality

Mingzhe Chen, Walid Saad, Changchuan Yin, and Mérouane Debbah

Beijing Laboratory of Advanced Information Network, Beijing University of Posts and Telecommunications, Beijing, China 100876, Emails: chenmingzhe@bupt.edu.cn, ccyin@ieee.org.
Wireless@VT, Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA, Email: walids@vt.edu.
Mathematical and Algorithmic Sciences Lab, Huawei France R&D, Paris, France, Email: merouane.debbah@huawei.com.

Abstract: In this paper, the problem of data correlation-aware resource management is studied for a network of wireless virtual reality (VR) users communicating over cloud-based small cell networks (SCNs). In the studied model, the small base stations (SBSs), which have limited computation resources, act as VR control centers that collect the tracking information from VR users over the cellular uplink and send VR images to the users over the downlink. In such a setting, VR users may send or request correlated or similar data (panoramic images and tracking data). This potential spatial data correlation can be factored into the resource allocation problem to reduce the traffic load in both the uplink and the downlink. This VR resource allocation problem is formulated as a noncooperative game that allows jointly optimizing the computation and spectrum resources, while being cognizant of the data correlation. To solve this game, a transfer learning algorithm based on the machine learning framework of echo state networks (ESNs) is proposed. Unlike conventional reinforcement learning algorithms that must be executed each time the environment changes, the proposed algorithm can intelligently transfer information on the learned utility, across time, to rapidly adapt to environmental dynamics due to factors such as changes in the users' content or data correlation.
Simulation results show that the proposed algorithm achieves up to 6.7% and 8.2% gains in terms of delay compared to Q-learning with data correlation and Q-learning without data correlation, respectively. The results also show that the proposed algorithm has a faster convergence time than Q-learning and can guarantee low delays.

I. INTRODUCTION

Virtual reality (VR) can enable users to virtually hike the Grand Canyon or carry out a secret mission as a video game hero without leaving their room. However, due to the wired connections of conventional VR devices, the users are significantly restricted in the type of actions that they can take and the VR applications that they can experience. To enable pervasive and truly immersive VR applications, VR systems can be operated using wireless networking technologies [1]. However, operating VR devices over wireless cellular systems such as small cell networks (SCNs) faces many challenges [1] that include tracking accuracy, extremely low delay, and effective image compression.

The existing literature has studied a number of problems related to wireless VR, such as in [1]-[4]. The authors in [1] exposed the future challenges of VR systems over a wireless network. However, this work is restricted to a preliminary survey that does not provide any technical solutions for optimizing wireless VR. In [2], a channel access scheme for wireless multi-user VR systems is proposed. The authors in [3] proposed an alternating current magnetic field-based tracking system to track the position and orientation of a VR user's head. However, existing works such as [2] and [3] only focus on improving one VR quality-of-service (QoS) metric, such as tracking or delay. Indeed, this prior art does not develop any VR-specific model that can capture all factors of VR QoS and jointly consider the uplink and downlink and, hence, these works fall short in addressing the challenges of optimizing VR QoS for wireless users.
In [4], we proposed a wireless VR model that captures the tracking accuracy, processing delay, and transmission delay, and proposed a machine learning based algorithm to solve the resource allocation problem. However, this work focused only on spectrum allocation and ignored the data correlation in the data transmission of VR users. Indeed, the sensors placed at a VR user can collect the tracking data of other users and, hence, the tracking data of VR users may be correlated. Moreover, when the VR users are watching a football game from different perspectives, the cloud only needs to transmit one 360° image to the SBS; the SBS can then rotate the image and transmit it to different users. In this case, using data correlation to reduce the traffic load can improve the transmission delay.

The main contribution of this paper is to introduce a novel framework for enabling VR applications over wireless cellular networks. To the best of our knowledge, this is the first work that jointly considers the data correlation, spectrum resource allocation, and computation resource allocation for VR over cellular networks. Hence, our key contributions include:

• We propose a novel VR model to jointly capture the downlink and uplink transmission delay, backhaul transmission delay, and computation time, thus effectively quantifying the VR delay for all users in a wireless VR network.
• For the considered VR applications over wireless networks, we analyze the allocation of resource blocks jointly over the uplink and downlink, together with the allocation of computation resources for the uplink. We formulate the problem as a noncooperative game in which the players are the small base stations (SBSs). Each player seeks to find an optimal resource allocation scheme to optimize a utility function that captures the VR delay.
• To solve this game, we propose a transfer learning algorithm based on echo state networks (ESNs) [5] to find the Nash equilibrium of the game.
The proposed algorithm can intelligently transfer information on the learned utility across time and, hence, adapt to environmental dynamics due to factors such as changes in the users' data correlation.

Simulation results show that the proposed algorithm can, respectively, yield 6.7% and 8.2% gains in terms of delay compared to Q-learning with data correlation and Q-learning without data correlation.

II. SYSTEM MODEL AND PROBLEM FORMULATION

Consider the downlink and uplink transmission of a cloud-based SCN servicing a set $\mathcal{U}$ of $U$ wireless VR users and a set $\mathcal{B}$ of $B$ SBSs. Here, the downlink is used to transmit the VR images displayed on each user's VR device while the uplink is

used to transmit the tracking information that is used to determine each VR user's location and orientation. The SBSs are connected to a cloud via capacity-constrained backhaul links and the SBSs serve their users using the cellular band. Here, $V_F$ represents the maximum backhaul transmission rate for all users. We focus on entertainment VR applications such as watching immersive videos and playing immersive games. In our model, the SBSs adopt an orthogonal frequency division multiple access (OFDMA) technique and transmit over a set $\mathcal{V}$ of $V$ uplink resource blocks and a set $\mathcal{S}$ of $S$ downlink resource blocks. The coverage of each SBS is a circular area with radius $r$ and each SBS only allocates resource blocks to the users located in its coverage range. We also assume that all resource blocks of each SBS are allocated to its associated users.

A. Data Correlation Model

1) Downlink Data Correlation Model: In VR wireless networks, multiple VR users may play the same immersive game with different locations and orientations. In this case, the cloud can exploit the data correlation between the users that are playing the same immersive game to reduce the traffic load of the backhaul links. For example, when the users are watching the same immersive sports game, the cloud can extract the difference between the VR images of these users and will only need to transmit the data that is unique to each user to an SBS. However, when the VR users are playing different immersive games, the data correlation between the users is low and, hence, the cloud needs to transmit entire VR images to the associated VR users. In order to define the data correlation of VR images, we first assume that the number of pixels that user $i$ needs to construct the VR images is $N_i$ and the number of different pixels between any pair of users $i$ and $k$ is $N_{ik}$. Here, $N_{ik}$ is calculated by the cloud using image processing methods such as motion search [6].
Then, the data correlation between user $i$ and user $k$ can be defined as follows:

$$\phi_{ik} = \frac{N_{ik}}{N_i + N_k}, \tag{1}$$

where $N_k$ is the number of pixels that user $k$ needs to construct the VR images during a period. Indeed, (1) captures the difference between the images of users $i$ and $k$. From (1), we can see that when user $i$ and user $k$ are associated with the same SBS, the cloud only needs to transmit $N_i + N_k - (N_i + N_k)\phi_{ik}$ pixels to that SBS.

2) Uplink Data Correlation Model: In the uplink, the users must transmit the tracking information to the SBSs. The tracking information is collected by the sensors placed at a VR user's headset or near the VR user. It has been shown that, for most data-gathering applications, the data source can be modeled as a Gaussian field [7]. The uplink data is collected by the sensors and, hence, can be assumed to follow a Gaussian distribution. We assume that the tracking data, $X_i$, collected by each VR user $i$ is a Gaussian random variable with mean $\mu_i$ and variance $\sigma_i^2$. In wireless VR, observations from proximal VR devices are often correlated due to the dense deployment. Hence, we consider the power exponential model [8] to capture the spatial correlation of VR tracking data. Here, the covariance $\sigma_{ij}$ between user $i$ and user $j$ separated by distance $d_{ij}$ is:

$$\sigma_{ij} = \mathrm{cov}(X_i, X_j) = \sigma_i \sigma_j e^{-d_{ij}^{\alpha}/\kappa}, \tag{2}$$

where $\alpha$ and $\kappa$ capture the significance of distance variation on data correlation.

B. Delay Model

In an SCN, the VR images are transmitted from the cloud to the SBSs and then to the users. The tracking information is transmitted from the users to the SBSs and processed at each corresponding SBS. In this case, the backhaul links are only used for VR image transmission and the transmission rate of each VR image from the cloud to the SBS can be given as $V_{Fi} = V_F / U$. Here, we assume that the backhaul transmission rate is equal for all users and we do not consider the optimization of the backhaul transmission.
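The two correlation measures in (1) and (2), together with the equal backhaul split, can be illustrated with a minimal Python sketch. The function names, argument names, and the numbers in the usage lines are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def downlink_correlation(n_i, n_k, n_ik):
    """Downlink correlation of (1): phi_ik = N_ik / (N_i + N_k)."""
    return n_ik / (n_i + n_k)


def uplink_covariance(sigma_i, sigma_j, d_ij, alpha, kappa):
    """Power exponential covariance of (2):
    sigma_ij = sigma_i * sigma_j * exp(-d_ij**alpha / kappa)."""
    return sigma_i * sigma_j * math.exp(-(d_ij ** alpha) / kappa)


def backhaul_rate_per_user(v_f, num_users):
    """Equal split of the backhaul rate: V_Fi = V_F / U."""
    return v_f / num_users


# Illustrative (made-up) values.
phi = downlink_correlation(n_i=100, n_k=100, n_ik=50)
cov_near = uplink_covariance(1.0, 1.0, d_ij=1.0, alpha=2.0, kappa=10.0)
cov_far = uplink_covariance(1.0, 1.0, d_ij=5.0, alpha=2.0, kappa=10.0)
```

In this sketch the covariance decays with distance, so `cov_far` is smaller than `cov_near`, which matches the role of $\alpha$ and $\kappa$ described in the text.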
In a VR model, we need to capture the VR transmission requirements such as high data rate, low delay, and accurate tracking and, hence, we consider the transmission delay as the main VR QoS metric of interest. The downlink rate of user $i$ associated with SBS $j$ is:

$$c_{ij}(s_{ij}) = \sum_{k=1}^{S} s_{ij,k} B \log_2\left(1 + \gamma_{ij,k}\right), \tag{3}$$

where $s_{ij} = [s_{ij,1}, \ldots, s_{ij,S}]$ is the vector of resource blocks that SBS $j$ allocates to user $i$ with $s_{ij,k} \in \{1, 0\}$. Here, $s_{ij,k} = 1$ indicates that resource block $k$ is allocated to user $i$, and

$$\gamma_{ij,k} = \frac{P_B h_{ij}^{k}}{N_0^2 + \sum_{l \in \mathcal{R}^k, l \neq j} P_B h_{il}^{k}}$$

is the signal-to-interference-plus-noise ratio (SINR) between user $i$ and SBS $j$ over resource block $k$. Here, $\mathcal{R}^k$ represents the set of SBSs that use downlink resource block $k$, $B$ is the bandwidth of each subcarrier, $P_B$ is the transmit power of SBS $j$, which is assumed to be equal for all SBSs, $N_0^2$ is the variance of the Gaussian noise, and $h_{ij}^{k} = g_{ij}^{k} d_{ij}^{-\beta}$ is the path loss between user $i$ and SBS $j$ over resource block $k$, where $g_{ij}^{k}$ is the Rayleigh fading parameter, $d_{ij}$ is the distance between user $i$ and SBS $j$, and $\beta$ is the path loss exponent. Based on (1) and (3), the downlink transmission delay at time slot $t$ is:

$$D_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) = \frac{L_i(\phi_i^{\max})}{c_{ij}(s_{ij})} + \frac{L_i(\phi_i^{\max})}{V_F / U}, \tag{4}$$

where $L_i(\phi_i^{\max})$ is the data that user $i$ needs to construct a VR image during a period and $\phi_i^{\max} = \max_{k \in \mathcal{U}_j, k \neq i} \phi_{ik}$ is the maximum downlink data correlation between user $i$ and the other users associated with SBS $j$. Finding the maximum data correlation allows minimizing the data that must be transmitted in the downlink to construct a VR image. Here, the first term is the transmission time from SBS $j$ to user $i$ and the second term is the transmission time from the cloud to SBS $j$. We assume that $P_U$ is the transmit power of each user, which is assumed to be equal for all users. The bandwidth of each uplink resource block is also $B$.
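The downlink rate in (3) and the two-term delay in (4) can be traced with a short Python sketch. It assumes the channel gains are already known (fading and path loss are not simulated), and all function names and numeric values are hypothetical, not taken from the paper.

```python
import math


def sinr(p_tx, h_signal, h_interferers, noise_var):
    """SINR on one resource block: desired power over noise plus interference."""
    interference = sum(p_tx * h for h in h_interferers)
    return p_tx * h_signal / (noise_var + interference)


def rate(alloc, sinrs, bandwidth):
    """Rate of (3): sum over resource blocks of s_k * B * log2(1 + SINR_k)."""
    return sum(a * bandwidth * math.log2(1.0 + g) for a, g in zip(alloc, sinrs))


def downlink_delay(data_bits, rate_bps, v_f, num_users):
    """Delay of (4): SBS-to-user time plus cloud-to-SBS (backhaul) time."""
    return data_bits / rate_bps + data_bits / (v_f / num_users)


# Two resource blocks, only the first one allocated (illustrative values).
gammas = [sinr(1.0, 2.0, [1.0], 1.0), 3.0]
c_ij = rate([1, 0], gammas, bandwidth=1e6)
delay = downlink_delay(data_bits=1e6, rate_bps=c_ij, v_f=1e8, num_users=10)
```

Allocating the second block as well (`[1, 1]`) raises the rate and therefore lowers only the first term of the delay; the backhaul term is unaffected by the resource block allocation.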
In this case, the uplink rate of each user $i$ associated with SBS $j$ is:

$$c_{ij}(v_{ij}) = \sum_{k=1}^{V} v_{ij,k} B \log_2\left(1 + \gamma_{ij,k}\right), \tag{5}$$

where $v_{ij} = [v_{ij,1}, \ldots, v_{ij,V}]$ is the vector of resource blocks that SBS $j$ allocates to user $i$ with $v_{ij,k} \in \{1, 0\}$, and

$$\gamma_{ij,k} = \frac{P_U h_{ij}^{k}}{\sigma^2 + \sum_{l \in \mathcal{U}^k, l \neq j} P_U h_{il}^{k}}$$

is the SINR between user $i$ and SBS $j$ over resource block $k$, with $\mathcal{U}^k$ representing the set of users that use uplink resource block $k$. In this case, the uplink transmission delay can be given by $\frac{K_i(\sigma_i^{\max})}{c_{ij}(v_{ij})}$, where $K_i(\sigma_i^{\max})$ is the data that needs to

be transmitted and $\sigma_i^{\max} = \max_{k \in \mathcal{U}_j, k \neq i} \sigma_{ik}$ is the maximum uplink data correlation between user $i$ and the other users associated with SBS $j$. Similarly, finding the maximum data correlation allows minimizing the uplink data that SBS $j$ uses to determine user $i$'s location and orientation. In the uplink, the tracking information can be directly processed by the SBSs, which have limited computation power. Here, the computation resource of each SBS, $c$, represents its ability to compute the tracking data. Each SBS $j$ will allocate the total computation power to the associated users and, hence, $m_{ij}$ is used to represent the computation power that SBS $j$ allocates to user $i$ with $\sum_{i \in \mathcal{U}_j} m_{ij} = m$. $\mathcal{U}_j$ represents the set of users associated with SBS $j$. The computation time that SBS $j$ needs to process the tracking data collected by user $i$ is $\frac{K_i(\sigma_i^{\max})}{m_{ij}}$ and the total uplink delay can be given by:

$$D_{ij}^{\mathrm{U}}\left(K_i(\sigma_i^{\max}), v_{ij}, m_{ij}\right) = \frac{K_i(\sigma_i^{\max})}{c_{ij}(v_{ij})} + \frac{K_i(\sigma_i^{\max})}{m_{ij}}, \tag{6}$$

where the first term is the transmission time from user $i$ to SBS $j$ and the second term is the computation time for user $i$'s data. Here, the computation time depends on the computation resource that SBS $j$ allocates to each user, which in turn affects the uplink delay.

C. Utility Function Model

In order to jointly consider the transmission delay in both uplink and downlink, we introduce a method based on the framework of multi-attribute utility theory [9] to construct an appropriate utility function. We first introduce the utility functions of the transmission delay in the uplink and downlink separately. Then, we formulate the total utility function based on [9].
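The uplink delay in (6) adds a computation term to the transmission term, so both the resource block allocation and the computation allocation matter. The following Python sketch illustrates this under the assumption that the computation power is expressed in the same data units per second as the rate; the helper names and numbers are hypothetical.

```python
def uplink_delay(data_bits, rate_bps, compute_rate):
    """Delay of (6): transmission time K / c(v) plus computation time K / m."""
    return data_bits / rate_bps + data_bits / compute_rate


def split_is_feasible(allocations, total):
    """Check that the per-user computation shares add up to the SBS total."""
    return abs(sum(allocations) - total) < 1e-9


# One SBS with total computation power 5 shared by two users (made-up values).
shares = [2.0, 3.0]
feasible = split_is_feasible(shares, total=5.0)
delays = [uplink_delay(data_bits=6.0, rate_bps=3.0, compute_rate=m) for m in shares]
```

In this toy setting the user with the larger computation share sees the smaller total delay, even though both users have the same uplink rate.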
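The utility function of the downlink transmission delay is constructed based on the normalization of the downlink transmission delay, and can be given by:

$$U_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) = \begin{cases} \dfrac{D_{ij,\max}^{\mathrm{D}} - D_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right)}{D_{ij,\max}^{\mathrm{D}} - \gamma^{\mathrm{D}}}, & D_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) \geq \gamma^{\mathrm{D}}, \\ 1, & D_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) < \gamma^{\mathrm{D}}, \end{cases} \tag{7}$$

where $\gamma^{\mathrm{D}}$ is the maximal tolerable delay for each VR user (the maximum supported by the VR system being used) and $D_{ij,\max}^{\mathrm{D}} = \max_{s_{ij}} D_{ij}^{\mathrm{D}}\left(L_i(0), s_{ij}\right)$ is the maximal transmission delay. From (7), we can see that, when the downlink transmission delay is smaller than $\gamma^{\mathrm{D}}$, the utility value remains at 1. This is due to the fact that, when the delay meets the system requirement, the network will encourage the SBSs to reallocate the resource blocks to other users. The utility function for the uplink transmission is:

$$U_{ij}^{\mathrm{U}}\left(K_i(\sigma_i^{\max}), v_{ij}, m_{ij}\right) = \begin{cases} \dfrac{D_{ij,\max}^{\mathrm{U}} - D_{ij}^{\mathrm{U}}\left(K_i(\sigma_i^{\max}), v_{ij}, m_{ij}\right)}{D_{ij,\max}^{\mathrm{U}} - \gamma^{\mathrm{U}}}, & D_{ij}^{\mathrm{U}}\left(K_i(\sigma_i^{\max}), v_{ij}, m_{ij}\right) \geq \gamma^{\mathrm{U}}, \\ 1, & D_{ij}^{\mathrm{U}}\left(K_i(\sigma_i^{\max}), v_{ij}, m_{ij}\right) < \gamma^{\mathrm{U}}, \end{cases} \tag{8}$$

where $\gamma^{\mathrm{U}}$ is the maximal tolerable delay for the VR tracking information transmission and $D_{ij,\max}^{\mathrm{U}} = \max_{v_{ij}, m_{ij}} D_{ij}^{\mathrm{U}}\left(K_i(0), v_{ij}, m_{ij}\right)$ is the maximal uplink delay. Based on (7) and (8), the total utility function that captures both downlink and uplink delay for user $i$ associated with SBS $j$ is:

$$U_{ij}(s_{ij}, v_{ij}, m_{ij}) = U_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) U_{ij}^{\mathrm{U}}\left(K_i(\sigma_i^{\max}), v_{ij}, m_{ij}\right). \tag{9}$$

Here, $L_i(\phi_i^{\max})$ and $K_i(\sigma_i^{\max})$ are determined by the user association scheme. In order to capture the gain that stems from the allocation of the resource blocks and the computational capabilities, we state the following result:

Theorem 1. The utility gain of user $i$'s delay due to an increase in the amount of allocated resource blocks and computational resources is:

i) The gain that stems from an increase $\Delta v_{ij}$ in the allocated uplink resource blocks, $\Delta U_{ij}$, is given by:

$$\Delta U_{ij} = \begin{cases} f_{ij}\left(c_{ij}(v_{ij})\right), & c_{ij}(\Delta v_{ij}) \gg c_{ij}(v_{ij}), \\ f_{ij}\left(\dfrac{c_{ij}(v_{ij})^2}{c_{ij}(\Delta v_{ij})}\right), & c_{ij}(\Delta v_{ij}) \ll c_{ij}(v_{ij}), \\ f_{ij}\left(\dfrac{c_{ij}(v_{ij})^2 + c_{ij}(v_{ij}) c_{ij}(\Delta v_{ij})}{c_{ij}(\Delta v_{ij})}\right), & \text{else}, \end{cases} \tag{10}$$

where $f_{ij}(x) = \dfrac{U_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) K_i(\sigma_i^{\max})}{\left(D_{ij,\max}^{\mathrm{U}} - \gamma^{\mathrm{U}}\right) x}$.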
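ii) The gain that stems from an increase $\Delta s_{ij}$ in the number of downlink resource blocks allocated to user $i$, $\Delta U_{ij}$, is:

$$\Delta U_{ij} = \begin{cases} f_{ij}\left(c_{ij}(s_{ij})\right), & c_{ij}(\Delta s_{ij}) \gg c_{ij}(s_{ij}), \\ f_{ij}\left(\dfrac{c_{ij}(s_{ij})^2}{c_{ij}(\Delta s_{ij})}\right), & c_{ij}(\Delta s_{ij}) \ll c_{ij}(s_{ij}), \\ f_{ij}\left(\dfrac{c_{ij}(s_{ij})^2 + c_{ij}(s_{ij}) c_{ij}(\Delta s_{ij})}{c_{ij}(\Delta s_{ij})}\right), & \text{else}, \end{cases} \tag{11}$$

where $f_{ij}(x) = \dfrac{U_{ij}^{\mathrm{U}}\left(K_i(\sigma_i^{\max}), v_{ij}, m_{ij}\right) L_i(\phi_i^{\max})}{\left(D_{ij,\max}^{\mathrm{D}} - \gamma^{\mathrm{D}}\right) x}$.

iii) The gain that stems from an increase $\Delta m$ in the amount of computation resources allocated to user $i$, $\Delta U_{ij}$, is:

$$\Delta U_{ij} = \frac{U_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) K_i(\sigma_i^{\max}) \Delta m}{\left(D_{ij,\max}^{\mathrm{U}} - \gamma^{\mathrm{U}}\right) m_{ij} \left(m_{ij} + \Delta m\right)}. \tag{12}$$

Proof. For i), the gain that stems from an increase in the allocated uplink resource blocks, $\Delta U_{ij}$, can be given by:

$$\Delta U_{ij} = U_{ij}(s_{ij}, v_{ij} + \Delta v_{ij}, m_{ij}) - U_{ij}(s_{ij}, v_{ij}, m_{ij}) = U_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) \left[ U_{ij}^{\mathrm{U}}\left(K_i(\sigma_i^{\max}), v_{ij} + \Delta v_{ij}, m_{ij}\right) - U_{ij}^{\mathrm{U}}\left(K_i(\sigma_i^{\max}), v_{ij}, m_{ij}\right) \right]. \tag{13}$$

Substituting (8) and (6) into (13), and using the fact that the rate in (5) is additive over resource blocks so that $c_{ij}(v_{ij} + \Delta v_{ij}) = c_{ij}(v_{ij}) + c_{ij}(\Delta v_{ij})$, (13) can be rewritten as follows:

$$\Delta U_{ij} = U_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) \frac{\dfrac{K_i(\sigma_i^{\max})}{c_{ij}(v_{ij})} - \dfrac{K_i(\sigma_i^{\max})}{c_{ij}(v_{ij} + \Delta v_{ij})}}{D_{ij,\max}^{\mathrm{U}} - \gamma^{\mathrm{U}}} = U_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) \frac{K_i(\sigma_i^{\max})}{D_{ij,\max}^{\mathrm{U}} - \gamma^{\mathrm{U}}} \cdot \frac{c_{ij}(\Delta v_{ij})}{c_{ij}(v_{ij})^2 + c_{ij}(v_{ij}) c_{ij}(\Delta v_{ij})}. \tag{14}$$

Here, when $c_{ij}(\Delta v_{ij}) \gg c_{ij}(v_{ij})$, $\dfrac{c_{ij}(\Delta v_{ij})}{c_{ij}(v_{ij})^2 + c_{ij}(v_{ij}) c_{ij}(\Delta v_{ij})} \approx \dfrac{1}{c_{ij}(v_{ij})}$ and, consequently, $\Delta U_{ij} = \dfrac{U_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) K_i(\sigma_i^{\max})}{\left(D_{ij,\max}^{\mathrm{U}} - \gamma^{\mathrm{U}}\right) c_{ij}(v_{ij})}$. Moreover, when $c_{ij}(\Delta v_{ij}) \ll c_{ij}(v_{ij})$, $\dfrac{c_{ij}(\Delta v_{ij})}{c_{ij}(v_{ij})^2 + c_{ij}(v_{ij}) c_{ij}(\Delta v_{ij})} \approx \dfrac{c_{ij}(\Delta v_{ij})}{c_{ij}(v_{ij})^2}$ and, consequently, $\Delta U_{ij} = \dfrac{U_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) K_i(\sigma_i^{\max}) c_{ij}(\Delta v_{ij})}{\left(D_{ij,\max}^{\mathrm{U}} - \gamma^{\mathrm{U}}\right) c_{ij}(v_{ij})^2}$. For any other case, $\Delta U_{ij} = \dfrac{U_{ij}^{\mathrm{D}}\left(L_i(\phi_i^{\max}), s_{ij}\right) K_i(\sigma_i^{\max})}{D_{ij,\max}^{\mathrm{U}} - \gamma^{\mathrm{U}}} \cdot \dfrac{c_{ij}(\Delta v_{ij})}{c_{ij}(v_{ij})^2 + c_{ij}(v_{ij}) c_{ij}(\Delta v_{ij})}$.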

Cases ii) and iii) can be proved using a method similar to that of case i). This completes the proof.

From Theorem 1, we can see that the allocation of spectrum and computation resources jointly determines the delay utility. Indeed, Theorem 1 provides guidance for the SBSs when they select actions in the learning algorithm that is proposed in Section III.

D. Problem Formulation

Given the defined system model, our goal is to develop an effective resource allocation scheme that allocates resource blocks and computation power to maximize the utility functions of all users. However, the maximization problem depends not only on the resource block allocation and computation resource allocation but also on the user associations. Moreover, the utility value of each SBS depends not only on its own choice of resource allocation scheme but also on the schemes of the remaining SBSs. In addition, the data correlation among the users varies from one period to the next, which affects the resource allocation and user association. In this case, we first formulate a noncooperative game $\mathcal{G} = \left[\mathcal{R}, \{\mathcal{A}_j\}_{j \in \mathcal{R}}, \{U_j\}_{j \in \mathcal{R}}\right]$. In this game, the players $j \in \mathcal{R}$ are the SBSs, $\mathcal{A}_j$ represents the action set of each SBS $j$, and $U_j$ is the utility function of each SBS $j$. Here, an action of SBS $j$, $a_j$, consists of: i) the downlink resource allocation vector $s_j = [s_{1j}, s_{2j}, \ldots, s_{U_j j}]$, ii) the uplink resource allocation vector $v_j = [v_{1j}, v_{2j}, \ldots, v_{U_j j}]$, and iii) the computation resource allocation vector $m_j = [m_{1j}, m_{2j}, \ldots, m_{U_j j}]$. Here, $m_{ij} \in \mathcal{M}$, $\forall i \in \mathcal{U}_j$, where $\mathcal{M} = \left\{\frac{c}{M}, \frac{2c}{M}, \ldots, c\right\}$ is a finite set of $M$ level fractions of SBS $j$'s total computation resource $c$. We assume that each SBS $j$ adopts one action at each time slot $t$. Then, the utility function of each SBS $j$ can be given by:

$$u_j(a_j, a_{-j}) = \frac{1}{T} \sum_{t=1}^{T} \sum_{i \in \mathcal{U}_j} U_{ij,t}(s_{ij}, v_{ij}, m_{ij}), \tag{15}$$

where $a_j \in \mathcal{A}_j$ is an action of SBS $j$ and $a_{-j}$ denotes the action profile of all SBSs other than SBS $j$. Indeed, (15) captures the average utility value of each SBS $j$.
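The closed form in case i) of Theorem 1 can be checked numerically with a few lines of Python. The sketch assumes that the uplink delay stays in the linear region of the utility (at or above the tolerable delay) and that the rate is additive over resource blocks; the function names and all numeric values are hypothetical.

```python
def delay_utility(delay, max_delay, gamma):
    """Normalized delay utility: 1 below gamma, linear between gamma and max_delay."""
    if delay < gamma:
        return 1.0
    return (max_delay - delay) / (max_delay - gamma)


def uplink_gain_direct(u_down, k_bits, m, max_delay, gamma, c_v, c_dv):
    """Gain computed directly as U(v + dv) - U(v), with U = U_down * U_up."""
    before = delay_utility(k_bits / c_v + k_bits / m, max_delay, gamma)
    after = delay_utility(k_bits / (c_v + c_dv) + k_bits / m, max_delay, gamma)
    return u_down * (after - before)


def uplink_gain_closed_form(u_down, k_bits, max_delay, gamma, c_v, c_dv):
    """General ('else') case of the theorem: f evaluated at (c_v^2 + c_v c_dv) / c_dv."""
    x = (c_v ** 2 + c_v * c_dv) / c_dv
    return u_down * k_bits / ((max_delay - gamma) * x)


# Made-up numbers: both delays (11 and 6) lie above gamma = 1.
direct = uplink_gain_direct(0.5, 1000.0, 1000.0, 20.0, 1.0, 100.0, 100.0)
closed = uplink_gain_closed_form(0.5, 1000.0, 20.0, 1.0, 100.0, 100.0)
```

The computation term cancels in the difference, which is why the closed form does not depend on the computation allocation. If the delay drops below the tolerable delay, the utility saturates at 1 and the closed form no longer applies.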
Let $\pi_{j,a_{ij}} = \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}_{\{a_{j,t} = a_{ij}\}} = \Pr\left(a_{j,t} = a_{ij}\right)$ be the probability of SBS $j$ using action $a_{ij}$. Here, $a_{j,t}$ represents the action that SBS $j$ uses at time $t$ and $a_{j,t} = a_{ij}$ denotes that SBS $j$ adopts action $a_{ij}$ at time $t$. $\pi_j = \left[\pi_{j,a_{1j}}, \ldots, \pi_{j,a_{|\mathcal{A}_j| j}}\right]$ is the action selection strategy of SBS $j$, with $|\mathcal{A}_j|$ being the number of actions of SBS $j$. Based on the definition of the strategy, the utility function in (15) is given by:

$$u_j(a_j, a_{-j}) = \frac{1}{T} \sum_{t=1}^{T} U_{j,t}(a_j, a_{-j}) = \sum_{a \in \mathcal{A}} U_j(a_j, a_{-j}) \prod_{j \in \mathcal{B}} \pi_{j,a_j}, \tag{16}$$

where $a \in \mathcal{A}$ with $\mathcal{A}$ being the action set of all SBSs. Given the proposed model, our goal is to solve the proposed resource allocation game. A solution for this game is the mixed-strategy Nash equilibrium (NE), formally defined as follows []: A mixed strategy profile $\pi^* = (\pi_1^*, \ldots, \pi_B^*) = (\pi_j^*, \pi_{-j}^*)$ is a mixed-strategy Nash equilibrium if, $\forall j \in \mathcal{R}$ and $\forall \pi_j$, we have:

$$u_j(\pi_j^*, \pi_{-j}^*) \geq u_j(\pi_j, \pi_{-j}^*), \tag{17}$$

where $u_j(\pi_j, \pi_{-j}) = \sum_{a \in \mathcal{A}} U_j(a_j, a_{-j}) \prod_{j \in \mathcal{B}} \pi_{j,a_j}$ is the expected utility of SBS $j$ when it selects the mixed strategy $\pi_j$. For our game, the mixed-strategy NE for the SBSs represents a solution of the game at which each SBS $j$ can minimize the delay for its associated users, given the actions of its opponents.

III. ECHO STATE NETWORKS FOR SELF-ORGANIZING RESOURCE ALLOCATION

Next, we introduce a transfer reinforcement learning (RL) algorithm that can be used to find an NE of the VR game. To satisfy the delay requirement for the VR transmission, we propose a transfer RL algorithm based on the neural network framework of echo state networks (ESNs) [6]. Traditional RL algorithms such as Q-learning typically rely on a Q-table to record the utility value. However, as the number of players and actions increases, the number of utility values that the Q-table needs to include increases exponentially and, hence, the Q-table may not be able to record all of the needed utility values. In contrast, the proposed algorithm uses a utility function approximation method to record the utility value and, hence, it can be used for large networks and large utility spaces.
Moreover, in a dynamic network in which the users' computation resource and data correlation may change over time, traditional RL algorithms need to be executed each time the network changes. In contrast, the proposed ESN transfer RL algorithm can find the relationship between the utility functions before and after the environment changes. After learning this relationship, the proposed algorithm can use the historic learning result to find a mixed-strategy NE. The proposed transfer RL algorithm consists of two components: i) an ESN-based RL algorithm and ii) an ESN-based transfer learning algorithm. The ESN-based RL algorithm is based on our work in [4] and, thus, here, we only introduce the ESN-based transfer learning algorithm.

We first assume that, before the users' state information changes, the strategy, action, and utility of each SBS $j$ are $\pi_j$, $a_j$, and $\hat{u}_j(a_j, a_{-j})$, while the strategy, action, and utility of SBS $j$ after the users' state information changes are $\pi'_j$, $a'_j$, and $\hat{u}'_j(a'_j, a'_{-j})$. Since the number of users associated with SBS $j$ is unchanged, the sets of actions and strategies of SBS $j$ will not change when the users' state information changes. In this case, the proposed ESN-based transfer learning algorithm is used to find the relationship between $\hat{u}_j(a_j, a_{-j})$ and $\hat{u}'_j(a'_j, a'_{-j})$ when SBS $j$ only knows $\hat{u}_j(a_j, a_{-j})$. This means that the proposed algorithm can transfer the information from the already learned utility $\hat{u}_j(a_j, a_{-j})$ to the new utility $\hat{u}'_j(a'_j, a'_{-j})$ that must be learned. The ESN-based transfer learning algorithm of each SBS $j$ consists of three components: a) input, b) output, and c) ESN model, which are given by:

Input: The ESN-based transfer learning algorithm takes the strategies of the SBSs and the action that SBS $j$ uses at time $t$ as input, which is given by $x_{t,j} = [\pi_1, \ldots, \pi_B, a_{j,t}]^{\mathrm{T}}$.

Output: The output of the ESN-based transfer learning algorithm at time $t$ is the deviation of the utility values when the users' information changes, $y_{j,t} = \hat{u}'_j(a_{j,t}) - \hat{u}_j(a_{j,t})$.
ESN Model: An ESN model is used to find the relationship between the input $x_{t,j}$ and the output $y_{t,j}$. The ESN model consists of the output weight matrix $W_j^{\mathrm{out}} \in \mathbb{R}^{1 \times N_w}$ and the dynamic reservoir containing the input weight matrix $W_j^{\mathrm{in}} \in \mathbb{R}^{N_w \times (B+1)}$ and the recurrent matrix $W_j \in \mathbb{R}^{N_w \times N_w}$, with $N_w$ being the number of dynamic reservoir units. Here, the dynamic reservoir is used to store historic ESN information that includes

input, reservoir state, and output. This information is used to build the relationship between the input and output. The update process of the dynamic reservoir is given by:

$$\mu_{j,t} = f\left(W_j \mu_{j,t-1} + W_j^{\mathrm{in}} x_{j,t}\right), \tag{18}$$

where $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ is the tanh function. Based on the dynamic reservoir state, the ESN-based transfer learning algorithm combines the reservoir state with the output weight matrix to approximate the deviation of the utility value, which can be given by:

$$y_{j,t} = W_{j,t}^{\mathrm{out}} \mu_{j,t}, \tag{19}$$

where $W_{j,t}^{\mathrm{out}}$ is the output weight matrix at time slot $t$. The output weight matrix is updated according to:

$$W_{j,t+1}^{\mathrm{out}} = W_{j,t}^{\mathrm{out}} + \lambda \left( \hat{u}'_j(a_{j,t}) - \hat{u}_j(a_{j,t}) - y_{j,t} \right) \mu_{j,t}^{\mathrm{T}}, \tag{20}$$

where $\lambda$ is the learning rate and $\hat{u}'_j(a_{j,t}) - \hat{u}_j(a_{j,t})$ is the actual deviation between the two utility values. In this case, the ESN-based transfer learning algorithm can find the relationship between the utility functions when the users' state information changes and, hence, reduce the number of iterations that the RL algorithm needs to learn the new utility values. The proposed, distributed ESN-based learning algorithm performed by each SBS $j$ is summarized in Table I.

TABLE I
ESN-BASED LEARNING ALGORITHM FOR RESOURCE ALLOCATION

Inputs: $x_{j,t}$ and $x'_{j,t}$
Initialize: $W_j^{\mathrm{in}}$, $W_j$, $W_j^{\mathrm{out}}$, $W_j^{\prime\mathrm{in}}$, $W'_j$, $W_j^{\prime\mathrm{out}}$, $y_j = 0$, and $y'_j = 0$.
for each time $t$ do
  (a) Estimate the value of the utility function $\hat{u}_{j,t}$ based on (19).
  if $t == 1$
    (b) Set the mixed strategy $\pi_{j,t}$ uniformly.
  else
    (c) Set the mixed strategy $\pi_{j,t}$ based on the ε-greedy exploration.
  end if
  (d) Broadcast the index of the mixed strategy to other SBSs.
  (e) Receive the index of the mixed strategy as input $x_{j,t}$.
  (f) Perform an action based on the mixed strategy.
  (g) Use the index of the mixed strategies and action as input $x'_{j,t}$.
  (h) Estimate the value of the difference of utility function $y'_{j,t}$.
  (i) Update the dynamic reservoir state $\mu_{j,t}$.
  (j) Update the output weight matrix $W_j^{\mathrm{out}}$ based on $y'_{j,t}$.
end for

TABLE II
SYSTEM PARAMETERS

Parameter    Value        Parameter        Value
F            000          P_B              20 dBm
B            2 MHz        S, V             5, 5
N_w          000          σ^2              -95 dBm
N_v          6            λ, λ'            0.03, 0.3
m            5            r_B              30 m
α            2            V_F              00 Gbit/s
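The reservoir update (18), the output (19), and the output-weight update (20) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-unit reservoir, the fixed weights, the constant input, and the target deviation of 0.2 are all hypothetical values picked so that the loop is easy to follow.

```python
import math


def reservoir_update(w_res, w_in, state, x):
    """Reservoir state update of (18): mu_t = tanh(W mu_{t-1} + W_in x_t)."""
    return [
        math.tanh(
            sum(w * s for w, s in zip(w_res[n], state))
            + sum(w * xi for w, xi in zip(w_in[n], x))
        )
        for n in range(len(state))
    ]


def predict(w_out, state):
    """ESN output of (19): y_t = W_out mu_t."""
    return sum(w * s for w, s in zip(w_out, state))


def update_output_weights(w_out, state, target, lam):
    """Weight update of (20): W_out <- W_out + lam * (target - y_t) * mu_t^T."""
    err = target - predict(w_out, state)
    return [w + lam * err * s for w, s in zip(w_out, state)]


def demo(steps=100, lam=0.3, target=0.2):
    """Train on a constant input and return the remaining absolute error."""
    w_res = [[0.1, 0.0], [0.0, 0.1]]  # small recurrent weights (contractive)
    w_in = [[0.5], [-0.5]]            # one scalar input
    w_out = [0.0, 0.0]
    state = [0.0, 0.0]
    for _ in range(steps):
        state = reservoir_update(w_res, w_in, state, [1.0])
        w_out = update_output_weights(w_out, state, target, lam)
    return abs(target - predict(w_out, state))
```

Only the output weights are trained; the input and recurrent weights stay fixed, which is the usual ESN design choice and keeps each update a single vector operation. In this toy run, `demo()` returns an error that shrinks geometrically with the number of steps.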
The proposed algorithm is guaranteed to converge to an NE; this convergence follows from [4].

IV. SIMULATION RESULTS

For our simulations, we consider a cloud-based SCN deployed within a circular area with radius r = 100 m. U = 25 users and B = 4 SBSs are uniformly distributed in this SCN area. The rate requirement of VR transmission is 25.32 Mbit/s [4]. The detailed parameters are listed in Table II. For comparison purposes, we use the ESN algorithm and a baseline Q-learning algorithm in [4].

Fig. 1. Average delay of each user vs. number of SBSs.
(Axes: average delay of each serviced user in ms vs. number of SBSs. Curves: proposed ESN-based learning algorithm, Q-learning algorithm, Q-learning without data correlation, ESN algorithm.)

Fig. 2. Convergence of the proposed algorithm and Q-learning.
(Axes: delay utility of each SBS vs. number of iterations, ×10^2. Curves: ESN-based transfer learning algorithm, ESN-based learning algorithm, Q-learning with data correlation. Legend also marks the convergent point.)

Fig. 1 shows how the average delay per user changes as the number of SBSs varies. Fig. 1 shows that, as the number of SBSs increases, the average delay of all algorithms first decreases and then increases. This is due to the fact that, as the number of SBSs increases, the number of users located in each SBS's coverage decreases and, hence, the average delay decreases. However, as the number of SBSs keeps increasing, the interference also increases. Fig. 1 also shows that our algorithm achieves up to 6.7% and 8.2% gains in terms of average delay compared to Q-learning with data correlation and Q-learning without data correlation, respectively, for 6 SBSs. This is due to the fact that our algorithm can transfer information across time. From Fig. 1, we can also see that the deviation between the two Q-learning algorithms decreases as the number of SBSs increases. This implies that, as the number of SBSs increases, the number of users associated with each SBS decreases and, hence, the data correlation of users decreases. Fig. 1 also shows that the delay gain of the proposed algorithm is small compared with the ESN algorithm. However, the proposed algorithm converges much faster, as shown in Fig. 2.

Fig. 2 shows the number of iterations needed until convergence for the proposed approach, the ESN algorithm, and Q-learning with data correlation when the users' information changes. In this figure, we can see that, as time elapses, the delay utilities for all considered algorithms increase until they converge to their final values. Fig. 2 also shows that the proposed algorithm achieves, respectively, 22.5% and 36% gains in terms of the number of iterations needed to reach convergence compared to the ESN algorithm and Q-learning. This implies that the proposed algorithm can apply the already learned utility value to the new utility value that must be learned as the users' information changes.

V. CONCLUSION

In this paper, we have proposed a novel resource allocation framework for optimizing delay for wireless VR services with data correlation. We have formulated the problem as a noncooperative game and we have proposed a novel transfer learning algorithm based on echo state networks to solve the game. The proposed learning algorithm can use the existing learning result
to directly find the optimal resource allocation when the users' state information changes and, hence, can quickly converge to a mixed-strategy NE. Simulation results have shown that the proposed algorithm has a faster convergence time than Q-learning and guarantees low delays for VR services.

REFERENCES

[1] E. Baştuğ, M. Bennis, M. Médard, and M. Debbah, "Towards interconnected virtual reality: Opportunities, challenges and enablers," arXiv preprint arXiv:1611.05356, 2016.
[2] J. Ahn, Y. Yong Kim, and R. Y. Kim, "Delay oriented VR mode WLAN for efficient wireless multi-user virtual reality device," in Proc. of IEEE International Conference on Consumer Electronics, Las Vegas, NV, USA, March 2017.
[3] M. Singh and B. Jung, "High-definition wireless personal area tracking using AC magnetic field for virtual reality," in Proc. of IEEE Virtual Reality, Los Angeles, California, USA, March 2017.
[4] M. Chen, W. Saad, and C. Yin, "Virtual reality over wireless networks: Quality-of-service model and learning-based resource management," available online: arxiv.org/abs/1703.04209, Mar. 2017.
[5] H. Jaeger, "Short term memory in echo state networks," in GMD Report, 2001.
[6] J. F. Yang, S. C. Chang, and C. Y. Chen, "Computation reduction for motion search in low rate video coders," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 10, pp. 948-951, Oct. 2002.
[7] N. Cressie and C. K. Wikle, Statistics for spatio-temporal data, John Wiley & Sons, 2015.
[8] M. C. Vuran, O. B. Akan, and I. F. Akyildiz, "Spatio-temporal correlation: theory and applications for wireless sensor networks," Computer Networks, vol. 45, no. 3, pp. 245-259, June 2004.
[9] A. E. Abbas, "Constructing multiattribute utility functions for decision analysis," INFORMS Tutorials in Operations Research, pp. 62-98, Oct. 2010.
[10] M. Chen, W. Saad, C. Yin, and M. Debbah, "Echo state transfer learning for data correlation aware joint computation and resource allocation in wireless virtual reality," available online: http://resume.walidsaad.com/pdf/extendedasilomarpaper.pdf, May 2017.
[11] Z. Han, D. Niyato, W. Saad, T. Basar, and A. Hjørungnes, Game Theory in Wireless and Communication Networks: Theory, Models, and Applications, Cambridge University Press, 2012.