A Cooperation Framework for Traffic Offloading among Cellular Systems

Diep N. Nguyen, Iain B. Collings, Stephen V. Hanly, and Philip Whiting
Department of Engineering, Macquarie University, Australia
{diep.nguyen, iain.collings, stephen.hanly,

Abstract: This work introduces a novel cooperation framework that allows mobile service providers (MSPs) to offload traffic onto each other so that temporarily unused spectrum/resources of cellular bands can be opportunistically harvested. Specifically, through traffic offloading, MSPs aim to maximize their profit while maintaining their QoS commitment. For that purpose, we model the strategic cooperation between MSPs as a stochastic Markov game in which the dynamics of the MSPs' resources and user behaviors are captured by an underlying Markov decision process. We prove that the game is irreducible and admits a Nash Equilibrium at which all MSPs benefit from traffic offloading. A practical algorithm that uses only local information to govern traffic offloading at MSPs is then developed. Numerical simulations show that, by designing appropriate profit-sharing contracts, this algorithm can achieve almost the same performance as a socially optimal solution.

Index Terms: Traffic offloading, cellular, Markov game.

I. INTRODUCTION

Currently, WiFi (IEEE 802.11) is the most widely adopted solution for offloading mobile traffic at hot spots (through dual-mode devices). While a great success, WiFi is constrained by the open-access ISM bands, which have become overcrowded and often offer a poor user experience, especially in urban areas [1]. Therefore, it is foreseeable that WiFi alone cannot cope with the high wireless demand of the near future. In this work, we take a further step and enable traffic offloading between MSPs to harvest whitespaces in cellular bands.
Specifically, when receiving service requests, an MSP can decide to serve its customers or redirect them to other MSPs. (We use "customers" or "service requests", as in queueing theory, to mean data/communication sessions in cellular systems.) An MSP can also decide to either serve or reject customers offloaded from other MSPs. Serving an offloaded customer may expose the MSP to the risk of losing/rejecting its own future customers or violating its quality of service (QoS) commitment. The decision hence depends not only on the MSP's resource availability but also on the reward/payment from the customers and on the MSPs' QoS commitments.

Traffic offloading between MSPs is in line with the well-known spectrum sharing/trading concept, in which unused spectrum can be traded on a market for its owner's profit. Although various spectrum economics/sharing and auctioning mechanisms have been proposed, a dynamic spectrum market is still unlikely in the near future. This is because temporarily unused spectrum trunks are highly dynamic in both the temporal and spatial dimensions and depend on users' behaviors. Consequently, MSPs are not willing to share/trade their spectrum but rather maintain exclusive ownership so that they can access it whenever and wherever needed. Note that existing works on spectrum sharing (e.g., [2]) did not take into account the dynamics of users' behaviors and of resource availability. Those approaches fail to harvest short-lived whitespaces that, according to real-life cellular system traces, account for more than one third of the entire frequency-time resources of the cellular bands, even in urban areas [3], [4].

[Footnote: This research was supported in part by the Australian Research Council (Discovery Early Career Researcher Award) and the CSIRO Macquarie University Chair in Wireless Communications. This Chair has been established with funding provided by the Science and Industry Endowment Fund.]
Given the above, we model the traffic offloading between MSPs as a constrained stochastic Markov game [5], [6] in which MSPs (the players) aim to maximize their revenue rate (i.e., average revenue over time) while maintaining their QoS commitment. Unlike existing works (e.g., [2]), to harvest short-lived whitespaces we model the dynamics of user behaviors and resource availability by an underlying Markov Decision Process (MDP). For simplicity, we assume there are two MSPs in the game. We show that the game admits a Nash Equilibrium (NE) at which both MSPs gain higher average revenue than in the conventional (non-offloading) scenario, especially when one of them experiences heavy traffic. The theoretical results herein are applicable not only to cellular systems but also to the more general area of competitive and cooperative admission control in queueing systems. To facilitate practical implementation, we design an algorithm that requires only local information. Using a constrained Markov decision process model [7], the algorithm achieves a tight lower bound on the reward rate of the above Markov game.

II. PROBLEM STATEMENT

Two MSPs provide coverage to the same residential area, each with its own base stations. The service requests at MSP 1 and MSP 2 are assumed to be Poisson with rates $\lambda_1$ and $\lambda_2$, respectively. The network resource (e.g., spectrum) or capacity of MSP 1 and MSP 2 allows them to serve $N_1$ and $N_2$ customers simultaneously. The service time of each customer is exponentially distributed with mean $1/\mu$ (for brevity, we normalize $\mu = 1$). When a customer arrives, it is either served or rejected, i.e., the two network operators do not queue unserved customers. MSP 1 (or MSP 2) receives $p_1$ (or $p_2$) monetary units after admitting one of its own customers for service. We consider three scenarios: the two MSPs operate independently ($S_1$); the two MSPs partially cooperate ($S_2$); the two MSPs fully cooperate ($S_3$).
In $S_1$, each MSP does not share its resources with the other, i.e., an MSP only serves its own customers (as in conventional cellular systems). In $S_2$, an MSP shares its resources with the other by serving customers from the other MSP. However, to maximize its own revenue, an MSP reserves the right to reject or serve the other MSP's customers. It also decides whether to serve its own customers or redirect them to the other MSP. By serving a customer of MSP 1 (or 2), MSP 2 (or 1) is paid $\beta_1 p_1$ (or $\beta_2 p_2$), and the remaining fraction $(1-\beta_1)p_1$ (or $(1-\beta_2)p_2$) is retained by MSP 1 (or MSP 2), with $\beta_1, \beta_2 \in [0, 1]$. The partial cooperation in $S_2$ refers to the fact that the two MSPs can offload traffic onto each other (i.e., sharing resources is possible) but each reserves the right to make its own strategic decisions on accepting or rejecting a customer. To facilitate theoretical analysis, in $S_2$ we assume that MSPs share their

resource availability information and admitting/rejecting policies with each other. We later design an algorithm that guides the MSPs' decisions with only local information. In $S_3$, the two MSPs fully cooperate by pooling their resources and sharing a common interest in maximizing their total revenue.

We aim to study the following questions: (i) Will the two MSPs under $S_2$ benefit from sharing their resources, i.e., do both obtain higher revenue than under $S_1$? If so, how should such a cooperation policy be designed, and what are the best strategies for each MSP in accepting/rejecting/redirecting customers? (ii) What are the best strategies in accepting/rejecting/redirecting customers if MSPs do not share information on their resources/capacity, strategies, or traffic load (i.e., the customer arrival rate)? (iii) How does the total revenue of the two MSPs under $S_1$ and $S_2$ compare with that under $S_3$? How should a revenue-sharing mechanism $(\beta_1, \beta_2)$ between the two be designed so that both have incentives to fully cooperate?

In $S_1$, each MSP can be modeled by a classical M/M/$N_1$/$N_1$ (or M/M/$N_2$/$N_2$) queue. $S_3$ corresponds to an M/M/$(N_1 + N_2)$ queue with two types of customers (type 1 with reward $p_1$ and type 2 with reward $p_2$). We address $S_3$ in Section III by relying on a constrained Markov decision process [7]. In the following, we model the strategic traffic offloading between MSPs in $S_2$ as a constrained stochastic Markov game [5].

A. Stochastic Markov Game Formulation

Let $i$ and $j$ denote the numbers of customers being served concurrently at MSP 1 and MSP 2 at a given point in time. Let $x$ denote the type of customer that arrives at either MSP: $x = 0$ if no customer arrives, $x = 1$ if a customer of MSP 1 arrives, and $x = 2$ if a customer of MSP 2 arrives. $\mathcal{S} := \{s := (ijx)\}$ denotes the system state space. At any time instant, the system is in one of the states $(ij1)$, $(ij2)$, or $(ij0)$. The cardinality of $\mathcal{S}$ is $|\mathcal{S}| = 3(N_1+1)(N_2+1)$. Let $\mathcal{A}$ denote the pure action/strategy space of each MSP $k$, where $\mathcal{A} := \{a_k\} = \{0, 1\}$.
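As a quick sanity check on the size of the state space, the Python sketch below enumerates every state $(i, j, x)$ and confirms the count $3(N_1+1)(N_2+1)$. The helper name `enumerate_states` and the capacities used here are illustrative assumptions, not values from the paper.

```python
from itertools import product


def enumerate_states(n1, n2):
    """List all states (i, j, x): i and j are customers in service at
    MSP 1 and MSP 2, x is the arrival type (0 = none, 1 or 2 = owner)."""
    return [(i, j, x)
            for i, j, x in product(range(n1 + 1), range(n2 + 1), (0, 1, 2))]


# Hypothetical small capacities, chosen only to keep the listing short.
states = enumerate_states(2, 3)
# Map each state to a row/column index of the transition matrix.
index = {s: n for n, s in enumerate(states)}
print(len(states))  # 3 * (2 + 1) * (3 + 1) = 36
```

The `index` dictionary is the kind of lookup one would typically need to assemble the $|\mathcal{S}| \times |\mathcal{S}|$ transition matrix.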
A pure action $a_k = 1$ means that MSP $k$ admits the newly arrived customer, and $a_k = 0$ means that it refuses to admit the customer. A customer of type $k$ that is rejected by MSP $k$ is then directed to the other MSP for service. At the second MSP, this customer can be admitted or rejected, depending on that MSP's strategy. A customer who has been refused by both MSPs is discarded.

As the state space $\mathcal{S}$ is countable and the transition rate is bounded, there exists an equivalence between the continuous and discrete time domains for the MDP [8]. Hence, we can study this continuous-time MDP in its equivalent discrete-time domain. $P(s' \mid s, a_1, a_2)$ is the transition probability to state $s' := (i'j'x')$ when actions $(a_1, a_2)$ are taken by the two MSPs at state $s$. Writing $\bar{a}_k := 1 - a_k$:

$$P(i'j'x' \mid ij2, a_1, a_2) = \frac{1}{L}\begin{cases}
\lambda_2 & \text{if } x' = 2;\ i' = i + a_1\bar{a}_2;\ j' = j + a_2\\
\lambda_1 & \text{if } x' = 1;\ i' = i + a_1\bar{a}_2;\ j' = j + a_2\\
i & \text{if } x' = 0;\ i' = i + a_1\bar{a}_2 - 1;\ j' = j + a_2\\
j & \text{if } x' = 0;\ i' = i + a_1\bar{a}_2;\ j' = j + a_2 - 1\\
L - \lambda_1 - \lambda_2 - i - j & \text{if } x' = 0;\ i' = i + a_1\bar{a}_2;\ j' = j + a_2
\end{cases}$$

$$P(i'j'x' \mid ij1, a_1, a_2) = \frac{1}{L}\begin{cases}
\lambda_2 & \text{if } x' = 2;\ i' = i + a_1;\ j' = j + a_2\bar{a}_1\\
\lambda_1 & \text{if } x' = 1;\ i' = i + a_1;\ j' = j + a_2\bar{a}_1\\
i & \text{if } x' = 0;\ i' = i + a_1 - 1;\ j' = j + a_2\bar{a}_1\\
j & \text{if } x' = 0;\ i' = i + a_1;\ j' = j + a_2\bar{a}_1 - 1\\
L - \lambda_1 - \lambda_2 - i - j & \text{if } x' = 0;\ i' = i + a_1;\ j' = j + a_2\bar{a}_1
\end{cases}$$

$$P(i'j'x' \mid ij0, a_1, a_2) = \frac{1}{L}\begin{cases}
\lambda_2 & \text{if } x' = 2;\ i' = i;\ j' = j\\
\lambda_1 & \text{if } x' = 1;\ i' = i;\ j' = j\\
i & \text{if } x' = 0;\ i' = i - 1;\ j' = j\\
j & \text{if } x' = 0;\ i' = i;\ j' = j - 1\\
L - \lambda_1 - \lambda_2 - i - j & \text{if } x' = 0;\ i' = i;\ j' = j
\end{cases}$$

where $L = \lambda_1 + \lambda_2 + N_1 + N_2$. These expressions hold for $i, j > 0$ and $i < N_1$, $j < N_2$. If either MSP is full or empty (e.g., $i = 0$ or $i = N_1$), the above transition probabilities need to be revised accordingly. We omit these cases here due to space limitations.

Let the $|\mathcal{S}| \times 2$ matrices $F_1$ and $F_2$ denote the mixed/stationary strategies of MSPs 1 and 2, respectively. $F_k(s, :)$ denotes a distribution vector whose element $F_k(s, 0)$ is the probability that MSP $k$ rejects (i.e., action $a_k = 0$ is taken) and $F_k(s, 1)$ is the probability that MSP $k$ accepts (i.e., action $a_k = 1$ is taken) the arriving customer when the system is in state $s$.
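The interior-state transition probabilities above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it covers only interior states ($0 < i < N_1$, $0 < j < N_2$), it reads the offloading terms as $a_1(1-a_2)$ and $a_2(1-a_1)$ (the owner MSP decides first, the other MSP sees the customer only after a rejection), and the function name `transition` and all numeric values are hypothetical.

```python
def transition(state, a1, a2, lam1, lam2, n1, n2):
    """Return {next_state: probability} for an interior state (i, j, x),
    i.e. 0 < i < n1 and 0 < j < n2, with service rate mu = 1."""
    i, j, x = state
    L = lam1 + lam2 + n1 + n2            # uniformization constant
    if x == 1:                           # MSP 1's customer: MSP 1 decides first
        di, dj = a1, a2 * (1 - a1)
    elif x == 2:                         # MSP 2's customer: MSP 2 decides first
        di, dj = a1 * (1 - a2), a2
    else:                                # no arrival, nothing to admit
        di, dj = 0, 0
    i2, j2 = i + di, j + dj              # occupancy after the admission decision
    moves = [
        ((i2, j2, 2), lam2),                       # next event: type-2 arrival
        ((i2, j2, 1), lam1),                       # next event: type-1 arrival
        ((i2 - 1, j2, 0), i),                      # departure at MSP 1
        ((i2, j2 - 1, 0), j),                      # departure at MSP 2
        ((i2, j2, 0), L - lam1 - lam2 - i - j),    # fictitious self-transition
    ]
    probs = {}
    for nxt, rate in moves:
        probs[nxt] = probs.get(nxt, 0.0) + rate / L
    return probs


# Illustrative parameters (assumed): a type-2 arrival is rejected by
# MSP 2 (a2 = 0) and admitted by MSP 1 (a1 = 1).
p = transition((1, 1, 2), 1, 0, lam1=5.0, lam2=25.0, n1=10, n2=15)
print(sum(p.values()))
```

Boundary states would need the revised probabilities that the text omits, so they are deliberately left out of this sketch.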
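We have an $|\mathcal{S}| \times |\mathcal{S}|$ stochastic transition probability matrix $P(F_1, F_2)$ whose element $(s, s')$ is denoted by $P(s' \mid s, F_1, F_2)$, with:

$$P(s' \mid s, F_1, F_2) = \sum_{a_1 \in \mathcal{A}} \sum_{a_2 \in \mathcal{A}} P(s' \mid s, a_1, a_2)\, F_1(s, a_1)\, F_2(s, a_2) \quad (1)$$

The reward of operator $k$ at state $s$ when actions $a_1, a_2$ are executed by the two MSPs is denoted by $r_k(s, a_1, a_2)$, where:

$$r_1(ij1, a_1, a_2) = \begin{cases} p_1 & \text{if } a_1 = 1 \text{ and } i < N_1,\\ p_1(1-\beta_1) & \text{if } (a_1 = 0 \text{ or } i = N_1) \text{ and } a_2 = 1 \text{ and } j < N_2, \end{cases}$$

$$r_2(ij1, a_1, a_2) = p_1\beta_1 \quad \text{if } (a_1 = 0 \text{ or } i = N_1) \text{ and } a_2 = 1 \text{ and } j < N_2,$$

$$r_1(ij2, a_1, a_2) = p_2\beta_2 \quad \text{if } (a_2 = 0 \text{ or } j = N_2) \text{ and } a_1 = 1 \text{ and } i < N_1,$$

$$r_2(ij2, a_1, a_2) = \begin{cases} p_2 & \text{if } a_2 = 1 \text{ and } j < N_2,\\ p_2(1-\beta_2) & \text{if } (a_2 = 0 \text{ or } j = N_2) \text{ and } a_1 = 1 \text{ and } i < N_1, \end{cases}$$

and $r_1(ij0, a_1, a_2) = r_2(ij0, a_1, a_2) = 0$. In all cases not listed, the reward is zero.

Let $\Phi_k(s, F_1, F_2)$ denote the reward rate (or average reward over time) of MSP $k$ when starting at state $s$:

$$\Phi_k(s, F_1, F_2) := \lim_{T \to \infty} \frac{1}{T+1} \sum_{t=0}^{T} r_k^{(t)}(s, F_1, F_2) \quad (2)$$

where $r_k^{(t)}(s, F_1, F_2)$ is the expected reward at time $t$ (w.r.t. $F_1$ and $F_2$) of MSP $k$ when the system starts at state $s$. (Footnote 2: The discounted reward function, which is easier to analyze because its convergence/existence is guaranteed [5], can also be studied in a similar manner.)

Let the $|\mathcal{S}|$-vector $r_k^{(t)}(F_1, F_2) := [r_k^{(t)}(1, F_1, F_2), \ldots, r_k^{(t)}(|\mathcal{S}|, F_1, F_2)]^T$. We have:

$$r_k^{(t)}(F_1, F_2) = P(F_1, F_2)^t\, r_k(F_1, F_2) \quad (3)$$

where the $|\mathcal{S}|$-vector $r_k(F_1, F_2) := [r_k(1, F_1, F_2), \ldots, r_k(s, F_1, F_2), \ldots, r_k(|\mathcal{S}|, F_1, F_2)]^T$ and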

$r_k(s, F_1, F_2)$ is the immediate expected reward at state $s$:

$$r_k(s, F_1, F_2) = \sum_{a_1 \in \mathcal{A}} \sum_{a_2 \in \mathcal{A}} r_k(s, a_1, a_2)\, F_1(s, a_1)\, F_2(s, a_2) \quad (4)$$

The following proposition states the existence of the reward rate in (2).

Proposition 1: If MSPs aim to maximize the reward rate $\Phi_k(s, F_1, F_2)$, i.e., the average criterion is used, the underlying Markov decision process with state space $\mathcal{S}$ and transition probability matrix $P(F_1, F_2)$ is irreducible, and $\Phi_k(s, F_1, F_2)$ in (2) is well-defined.

Proof: The key idea is that from any state we can reach state $(000)$ by emptying the system (through service completions), and from the empty state we can reach any other state. Specifically, first note that the MDP can always move from any state $(ij0)$ to either state $(ij1)$ or $(ij2)$ due to the arrival of customers. Similarly, due to the departures of customers, from any state $(ijx)$ with $i \geq 1$ or $j \geq 1$ the process can move to state $((i-1)jx')$ or $(i(j-1)x')$. Consider MSP 1. If MSP 1 aims to maximize its average reward, then for any state $s \in \{(ij1), (ij2)\}$ in which it still has available resources (i.e., $i < N_1$) when a customer arrives, it should not always reject both types of customers. Otherwise (i.e., if $F_1(ij1, 1) = 0$ and $F_1(ij2, 1) = 0$), by (4) the immediate expected reward $r_1(s, F_1, F_2)$ can always be improved (by having either $F_1(ij1, 1) > 0$ or $F_1(ij2, 1) > 0$), and so can the average reward in (2). Hence, from any state $s \in \{(ij1), (ij2)\}$, the process can move to state $((i+1)jx)$ for $i < N_1$. In a similar manner for MSP 2, from any state $s \in \{(ij1), (ij2)\}$ the process can move to state $(i(j+1)x)$ for $j < N_2$. Thus, the state space $\mathcal{S}$ contains only one communicating class, i.e., every state $s'$ is reachable from every state $s$ with positive probability. In other words, the MDP with states in $\mathcal{S}$ is irreducible. As the underlying Markov process is irreducible, according to Theorem 5..5 [5], $\Phi_k(s, F_1, F_2)$ in (2) is well-defined and identical for all initial states $s$.

While sharing its resources, an MSP needs to maintain its QoS commitment.
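To make the claim of Proposition 1 concrete, the Python sketch below evaluates the Cesaro average in (2) for a small irreducible chain and shows that the result does not depend on the starting state. The two-state matrix, the reward vector, and the name `cesaro_reward` are toy assumptions chosen for illustration; they are not the offloading game itself.

```python
def cesaro_reward(P, r, start, horizon):
    """Average of the expected rewards over t = 0..horizon, starting from
    `start`, for a chain with transition matrix P and state rewards r."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0                      # point mass on the initial state
    total = 0.0
    for _ in range(horizon + 1):
        total += sum(d * x for d, x in zip(dist, r))   # expected reward at t
        # Advance the state distribution one step: dist <- dist * P
        dist = [sum(dist[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
    return total / (horizon + 1)


# Toy irreducible chain (assumed values); its stationary law is (2/7, 5/7).
P = [[0.5, 0.5],
     [0.2, 0.8]]
r = [1.0, 0.0]
rates = [cesaro_reward(P, r, s, 5000) for s in range(2)]
print(rates)  # both entries approach 2/7, whichever state we start from
```

The same computation applies to $P(F_1, F_2)$ and $r_k(F_1, F_2)$ once both strategies are fixed, although for large state spaces solving for the stationary distribution directly is usually cheaper than iterating.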
The QoS of MSP $k$ is measured by the probability that one of its customers does not get served (the lower this probability, the higher the QoS), denoted by $R_k(F_1, F_2)$. The QoS commitment ensures that, while sharing its resources, an operator either meets its QoS target $QoS_k$ (i.e., $QoS_k \geq R_k(F_1, F_2)$) or at least achieves the same level of QoS as if it did not share its resources with the other MSP, defined as $P_b(\lambda_k, N_k)$ (i.e., $P_b(\lambda_k, N_k) \geq R_k(F_1, F_2)$). Without resource sharing, $P_b(\lambda_k, N_k)$ is exactly the Erlang B blocking probability:

$$P_b(\lambda_k, N_k) = \frac{\lambda_k^{N_k} / N_k!}{\sum_{i=0}^{N_k} \lambda_k^{i} / i!}$$

$R_k(F_1, F_2)$ is given as $\pi\, Re_k$, where $\pi$ is the stationary distribution vector of the MDP under the stationary strategies $(F_1, F_2)$, and $Re_k$ is an $|\mathcal{S}|$-vector whose element $Re_k(s)$ is the probability that a customer of MSP $k$ does not get served given that the system is at state $s$. For MSP 1, $Re_1(s) = 0$ if $s \in \{(ij0), (ij2)\}$ and $Re_1(s) = F_1(s, 0)F_2(s, 0)$ if $s = (ij1)$ (as a customer does not get served if and only if it is rejected by both operators). Similarly, $Re_2(s) = 0$ if $s \in \{(ij0), (ij1)\}$ and $Re_2(s) = F_1(s, 0)F_2(s, 0)$ if $s = (ij2)$.

The objective of each MSP is to optimize its own stationary strategy, given the other MSP's strategy, so as to maximize its reward rate while maintaining its QoS commitment. As $\Phi_k(s, F_1, F_2)$ does not depend on the state from which the system starts (see above), for brevity let $\Phi_k(F_1, F_2)$ denote the reward rate of MSP $k$. Formally, each operator needs to solve the following problem:

$$\begin{aligned}
\underset{F_k}{\text{maximize}}\quad & \Phi_k(F_1, F_2)\\
\text{s.t.}\quad & \text{C1: } \sum_{a_k \in \mathcal{A}} F_k(s, a_k) = 1,\ \forall s\\
& \text{C2: } F_k(s, a_k) \geq 0,\ \forall s,\ a_k \in \mathcal{A}\\
& \text{C3: } \max(QoS_k, P_b(\lambda_k, N_k)) \geq R_k(F_1, F_2)
\end{aligned} \quad (5)$$

where C1 and C2 ensure that each row of $F_k$ is a probability distribution vector, and C3 enforces the QoS commitment.

B. NE Existence and Characterization

Theorem 1: There exists an NE for the game (5) in which MSPs aim to maximize their reward rates.

Proof: (5) belongs to the class of constrained Markov games [6]. For the existence of an NE of the game, we rely on the results in [6]. Theorem 2.
in [6] states that a constrained Markov game admits at least one NE if the following two conditions hold. (Ergodicity) If the average criterion is used, then the state process is an irreducible Markov chain. (Strong Slater) For any stationary strategy of the other player, a player can still find a stationary strategy of its own that ensures the constraint of the game is met. As a consequence of Proposition 1, it is easy to see that the Ergodicity condition holds. (5) is then an irreducible stochastic game, and it admits at least one NE if there is no constraint C3 (Theorem in [5]). In our case, the second condition also holds. Specifically, for MSP 1 the LHS of C3 is greater than or equal to the blocking/rejecting probability $P_b(\lambda_1, N_1)$. Hence, C3 can always be met if operator 1 refuses to serve customers from operator 2. This is realized by executing the stationary strategy with $F_1^+(s, 0) = 1$, $\forall s \in \{(ijx) \mid x = 2, 0\}$, regardless of the strategy of the other player. In other words, the Strong Slater condition holds. Thus, there exists at least one NE of the constrained Markov game (5).

Let $(F_1^*, F_2^*)$ be the stationary strategies of the two MSPs at an NE. We have the following corollary.

Corollary 1: Cooperation in game (5) is rational, i.e., both MSPs have incentives to share their resources.

Proof: If MSPs do not share their resources, the reward rates are $(1 - P_b(\lambda_1, N_1))\lambda_1 p_1$ and $(1 - P_b(\lambda_2, N_2))\lambda_2 p_2$ for MSP 1 and MSP 2, respectively. Let $F_1^+$ be the strategy of MSP 1 when it does not accept customers of type 2, i.e., $F_1^+(s, 0) = 1$, $\forall s \in \{(ijx) \mid x = 2, 0\}$, and define $F_2^+$ analogously for MSP 2. By the definition of the NE strategies $(F_1^*, F_2^*)$:

$$\begin{aligned}
\Phi_1(F_1^*, F_2^*) &\geq \Phi_1(F_1^+, F_2^*) \geq (1 - P_b(\lambda_1, N_1))\lambda_1 p_1\\
\Phi_2(F_1^*, F_2^*) &\geq \Phi_2(F_1^*, F_2^+) \geq (1 - P_b(\lambda_2, N_2))\lambda_2 p_2
\end{aligned} \quad (6)$$

In the above, $\Phi_1(F_1^+, F_2^*) \geq (1 - P_b(\lambda_1, N_1))\lambda_1 p_1$ because, although MSP 1 rejects customers from MSP 2, MSP 1's customers are still offloaded and can be accepted by MSP 2 under MSP 2's policy $F_2^*$. Corollary 1 is proved.
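The non-sharing reward rates that serve as the baseline in the proof of Corollary 1 are simple to evaluate numerically. The sketch below uses the standard recursive form of the Erlang B formula, which avoids large factorials; the function names and the numbers in the example are illustrative assumptions rather than values from the paper.

```python
def erlang_b(load, servers):
    """Erlang B blocking probability for offered load `load` (in Erlangs,
    with mu = 1) and `servers` channels, via the stable recursion
    B(0) = 1, B(n) = load * B(n-1) / (n + load * B(n-1))."""
    b = 1.0
    for n in range(1, servers + 1):
        b = load * b / (n + load * b)
    return b


def baseline_revenue(lam, servers, price):
    """Reward rate (1 - P_b(lam, N)) * lam * p of an MSP that neither
    offloads traffic nor accepts offloaded traffic."""
    return (1.0 - erlang_b(lam, servers)) * lam * price


# Toy example (assumed values): load 2, capacity 2, price 10.
print(erlang_b(2.0, 2))                # blocking probability 0.4
print(baseline_revenue(2.0, 2, 10.0))  # about 12 reward units per unit time
```

By (6), any NE reward rate of an MSP is at least the corresponding `baseline_revenue` value.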
Intuitively, Corollary 1 guarantees that each MSP does at least as well as it would if it did not participate in the traffic offloading game. The following theorem states necessary and sufficient conditions for an NE of the game (5), which can be used to find the game's NE(s).

Theorem 2: Any pair $(F_1^*, F_2^*)$ is an NE of the constrained Markov game (5) if and only if $z^* := (v_1^*, v_2^*, F_1^*, F_2^*, u_1^*, w_1^*, u_2^*, w_2^*)$ is the globally optimal solution of the following problem and its optimal value is 0:

$$\begin{aligned}
\underset{z}{\text{minimize}}\quad & \sum_{k=1}^{2} \mathbf{1}^T \left[ v_k - P(F_1, F_2)\, v_k \right]\\
\text{s.t.}\quad
& \text{C1: } T(s, v_1)\, F_2(s,:)^T \leq v_1(s)\,\mathbf{1},\ \forall s\\
& \text{C2: } r_1(s)\, F_2(s,:)^T + T(s, u_1)\, F_2(s,:)^T \leq (v_1(s) + u_1(s))\,\mathbf{1},\ \forall s\\
& \text{C3: } F_1(s,:)\, T(s, v_2) \leq v_2(s)\,\mathbf{1}^T,\ \forall s\\
& \text{C4: } F_1(s,:)\, r_2(s) + F_1(s,:)\, T(s, u_2) \leq (v_2(s) + u_2(s))\,\mathbf{1}^T,\ \forall s\\
& \text{C5: } r_k(F_1, F_2) + P(F_1, F_2)\, w_k = v_k + w_k,\ k = 1, 2\\
& \text{C6: } \sum_{a_1 \in \mathcal{A}} F_1(s, a_1) = 1;\ \sum_{a_2 \in \mathcal{A}} F_2(s, a_2) = 1,\ \forall s\\
& \text{C7: } F_k(s, a_k) \geq 0,\ \forall s,\ a_k \in \mathcal{A}\\
& \text{C8: } \max(QoS_k, P_b(\lambda_k, N_k)) \geq R_k(F_1, F_2),\ k = 1, 2
\end{aligned} \quad (7)$$

where $\mathbf{1}$ is a column vector of all ones; $v_k$, $u_k$, and $w_k$ are $|\mathcal{S}|$-vectors of auxiliary variables $v_k(s)$, $u_k(s)$, $w_k(s)$, respectively; and $T(s, v_k)$, $T(s, u_k)$, and $r_k(s)$ are $2 \times 2$ matrices whose elements are $\sum_{s' \in \mathcal{S}} P(s' \mid s, a_1, a_2)\, v_k(s')$, $\sum_{s' \in \mathcal{S}} P(s' \mid s, a_1, a_2)\, u_k(s')$, and $r_k(s, a_1, a_2)$ for $a_1, a_2 \in \{0, 1\}$, respectively.

Proof: Sufficiency: We assume $z^* = (v_1^*, v_2^*, F_1^*, F_2^*, u_1^*, w_1^*, u_2^*, w_2^*)$ is the globally optimal solution of (7) and its optimal value is 0. We will prove that $(F_1^*, F_2^*)$ is an NE of the constrained Markov game (5). From (2) and (3), we rewrite $\Phi_k(F_1, F_2)$ as:

$$\Phi_k(F_1, F_2) = \lim_{T \to \infty} \frac{1}{T+1} \sum_{t=0}^{T} P(F_1, F_2)^t\, r_k(F_1, F_2) = Q(F_1, F_2)\, r_k(F_1, F_2) \quad (8)$$

where $Q(F_1, F_2)$ is the Cesaro-limit matrix [5], defined as:

$$Q(F_1, F_2) := \lim_{T \to \infty} \frac{1}{T+1} \sum_{t=0}^{T} P^t(F_1, F_2) \quad (9)$$

As the underlying Markov process is irreducible, the above $Q(F_1, F_2)$ exists (Theorem 5..3 in [5]). Additionally:

$$Q(F_1, F_2) = Q(F_1, F_2)\, P(F_1, F_2) \quad (10)$$

C1 in (7) implies that:

$$v_1^* \geq P(F_1, F_2^*)\, v_1^*,\ \forall F_1 \quad (11)$$

hence, together with the definition of $Q(F_1, F_2)$, we also have:

$$v_1^* \geq Q(F_1, F_2^*)\, v_1^*,\ \forall F_1 \quad (12)$$

Since the objective function in (7) is zero under $z^*$, from the above we must have $v_1^* = P(F_1^*, F_2^*)\, v_1^*$. Recalling the definition of $Q(F_1, F_2)$, we then have:

$$v_1^* = Q(F_1^*, F_2^*)\, v_1^* \quad (13)$$

Multiplying both sides of C5 by $Q(F_1^*, F_2^*)$ on the left and recalling (10) and (13):

$$\Phi_1(F_1^*, F_2^*) = v_1^* \quad (14)$$

On the other hand, C2 in (7) implies that:

$$v_1^* + u_1^* \geq r_1(F_1, F_2^*) + P(F_1, F_2^*)\, u_1^*,\ \forall F_1 \quad (15)$$

Multiplying both sides of (15) by $Q(F_1, F_2^*)$ on the left and recalling (10) and (12):

$$Q(F_1, F_2^*)\, v_1^* + Q(F_1, F_2^*)\, u_1^* \geq Q(F_1, F_2^*)\, r_1(F_1, F_2^*) + Q(F_1, F_2^*)\, u_1^* \quad (16)$$

$$v_1^* \geq \Phi_1(F_1, F_2^*) \quad (16a)$$

From (14) and (16a): $\Phi_1(F_1^*, F_2^*) \geq \Phi_1(F_1, F_2^*)$. In a similar way, from C3, C4, and C5, we can also show that $\Phi_2(F_1^*, F_2^*) \geq \Phi_2(F_1^*, F_2)$. In other words, $(F_1^*, F_2^*)$ is an NE of the constrained Markov game (5).

Necessity: We need to prove that if $(F_1^*, F_2^*)$ is an NE of the constrained Markov game (5), then there exists $(v_1^*, v_2^*, u_1^*, w_1^*, u_2^*, w_2^*)$ such that $z^* = (v_1^*, v_2^*, F_1^*, F_2^*, u_1^*, w_1^*, u_2^*, w_2^*)$ is the globally optimal solution of (7) and its optimal value is 0. For that purpose, we construct a feasible solution $z^*$ (i.e., all constraints in (7) hold) and show that 0 is the optimal value of (7), which is attained by $z^*$.

First set $v_k^* = \Phi_k(F_1^*, F_2^*)$, viewed as a constant $|\mathcal{S}|$-vector. Note that for a given stationary strategy of one MSP, e.g., $F_2^*$, MSP 1 finds its optimal stationary strategy $F_1^*$ by solving for the optimal stationary policy of an MDP with transition probability $P(F_1, F_2^*)$. Applying Proposition and in [5] to the MDP with transition probability $P(F_1, F_2^*)$, we have:

$$\Phi_1(F_1^*, F_2^*) \geq P(F_1, F_2^*)\, \Phi_1(F_1^*, F_2^*),\ \forall F_1 \quad (17)$$

and there exists $u_1^*$ such that:

$$\Phi_1(F_1^*, F_2^*) + u_1^* \geq r_1(F_1, F_2^*) + P(F_1, F_2^*)\, u_1^*,\ \forall F_1 \quad (18)$$

As $(F_1^*, F_2^*)$ is an NE, and recalling $v_1^* = \Phi_1(F_1^*, F_2^*)$, the following inequalities hold:

$$\begin{aligned}
v_1^* &\geq P(F_1, F_2^*)\, v_1^*,\ \forall F_1\\
v_1^* + u_1^* &\geq r_1(F_1, F_2^*) + P(F_1, F_2^*)\, u_1^*,\ \forall F_1
\end{aligned} \quad (19)$$

Since the above inequalities hold $\forall F_1$, constraints C1 and C2 must hold. Similarly, we can show that there exists $u_2^*$ such that C3 and C4 also hold. According to Theorem 5..3 in [5], for the MDP with transition probability $P(F_1^*, F_2^*)$ (at the NE), constraint C5 holds for both MSPs by setting:

$$w_k^* = \left( I - P(F_1^*, F_2^*) + Q(F_1^*, F_2^*) \right)^{-1} \left( r_k(F_1^*, F_2^*) - v_k^* \right) \quad (20)$$

The strategy pair $(F_1^*, F_2^*)$ is an NE of the constrained Markov game (5); thus $(F_1^*, F_2^*)$ has to lie within the strategy space of both MSPs, defined by constraints C1, C2, and C3 in (5). Hence, constraints C6, C7, and C8 of (7) must hold. We have thus constructed a feasible solution $z^*$ of (7). Note that for any feasible solution of (7), constraints C1 and C3 imply that $v_k \geq P(F_1, F_2)\, v_k$. In other words, the objective function of (7) is lower-bounded by 0. By recalling (8) and (10), the NE stationary strategy pair $(F_1^*, F_2^*)$ from $z^*$ attains this lower bound. The proof is completed.

Problem (7) may have multiple solutions, i.e., possibly multiple NEs. Using a gradient-based algorithm, we can numerically obtain a solution very close (within $10^{-7}$) to the optimal value of (7), which is lower-bounded by 0.

Remark 1: First, (7) is a nonlinear problem (with nonlinear constraints and a nonlinear objective function) that involves $10 \times 3(N_1+1)(N_2+1)$ variables. For a reasonable capacity size (e.g., 10 customers per pico-cell), the number of variables, $30 \times 11 \times 11 = 3630$, is very large. Hence, the computational complexity involved in solving (7) is significant. In fact, we attempted to solve it with a gradient-based algorithm, but it takes a long time. Second, computing the NE via problem (7) requires an MSP to reveal its capacity and resource availability status to the other MSP. Quite apart from whether MSPs are willing to disclose such business-sensitive information, this approach requires additional communication overhead to exchange it. In the following, we derive a practical algorithm that relies only on local information and achieves a lower bound (shown to be tight via simulations) on the NE utilities of game (5).

III. PRACTICAL IMPLEMENTATION

Note that the offloaded traffic from an MSP is not an overflow process (see footnote 3). In fact, its statistical characteristics depend on the MSP's strategy. That makes the approach in [9] not readily applicable. However, we observe that the actual offloaded traffic process from operator $k$ (in game (7)) comprises not only overflow customers (rejected for lack of resources, with rate $P_b(\lambda_k, N_k)\lambda_k$) but also customers rejected even though enough resources are available. Thus, we can find a lower bound on the reward rate under the optimal strategy (derived from (7)) by replacing the offloaded traffic process with an overflow process of rate $P_b(\lambda_k, N_k)\lambda_k$. The resulting reward rate is a lower bound because the accepting/rejecting policy derived from it is suboptimal for (7) (it always rejects customers offloaded by an MSP that still has enough resources).

We now limit our interest to MSP 1; the following results/analysis also apply to MSP 2. MSP 1 serves two types of customers, one arriving according to a Poisson process with rate $\lambda_1$ and the other following an overflow process with rate $P_b(\lambda_2, N_2)\lambda_2$. Fortunately, the authors of [9] pointed out that the average reward rate of MSP 1 can be well approximated by assuming that the overflow process is a Poisson process. MSP 1 then serves two types of customers, arriving according to two Poisson processes with rates $\lambda_1$ and $\tilde{\lambda}_2 := P_b(\lambda_2, N_2)\lambda_2$.

Remark 2: The following approach is also applicable to case $S_3$, in which the two MSPs fully cooperate by pooling their resources and maximize their total reward rate (i.e., an M/M/$(N_1 + N_2)$ queue with two classes of customers arriving with rates $\lambda_1$ and $\lambda_2$ and rewards $p_1$ and $p_2$, respectively).

To solve for the optimal admitting/rejecting policy, we denote the system state at MSP 1 as $s = (ix)$, where $i$ is the number of customers being served and $x$ is the type of the arriving customer at the MSP ($x = 0$ if no one arrives). The system state space is $\tilde{\mathcal{S}} := \{s\}$.
The transition probabilities of the state process and the corresponding rewards are as follows:

$$P(i'x' \mid i0, a_1) = \frac{1}{N_1 + \lambda_1 + \tilde{\lambda}_2}\begin{cases}
\tilde{\lambda}_2 & \text{if } x' = 2;\ i' = i\\
\lambda_1 & \text{if } x' = 1;\ i' = i\\
i & \text{if } x' = 0;\ i' = i - 1\\
N_1 - i & \text{if } x' = 0;\ i' = i
\end{cases}$$

$$P(i'x' \mid i1, a_1) = \frac{1}{N_1 + \lambda_1 + \tilde{\lambda}_2}\begin{cases}
\tilde{\lambda}_2 & \text{if } x' = 2;\ i' = i + a_1\\
\lambda_1 & \text{if } x' = 1;\ i' = i + a_1\\
i & \text{if } x' = 0;\ i' = i + a_1 - 1\\
N_1 - i & \text{if } x' = 0;\ i' = i + a_1
\end{cases} = P(i'x' \mid i2, a_1)$$

$$r(i0, a_1) = 0;\quad r(i1, a_1) = a_1 p_1;\quad r(i2, a_1) = a_1 p_2 \beta_2$$

Without the QoS commitment constraint, the optimal admission strategy that maximizes the average reward is the (deterministic) trunk reservation policy (see footnote 4) [10]. However, with the QoS commitment, one faces a constrained Markov decision process whose optimal admission policy is generally not deterministic. As MSPs do not reveal their strategies, MSP 1 has to maintain its QoS commitment regardless of MSP 2's strategy. The probability that a customer of MSP 1 does not get served, $\tilde{R}_1(F_1)$, is computed as $\tilde{R}_1(F_1) = \tilde{\pi}\, \tilde{Re}_1$, where $\tilde{\pi}$ is the stationary distribution of the MDP with state space $\tilde{\mathcal{S}}$. $\tilde{Re}_1$, similar to $Re_1$, is a vector whose element $\tilde{Re}_1(s) = F_1(s, 0)$ if $s = (i1)$ and $\tilde{Re}_1(s) = 0$ if $s = (i0)$.

The optimal stationary policy $F_1(s, a_1)$ that maximizes the average reward $V(a)$ for the constrained MDP with the states and transition probabilities above is obtained by solving the following problem [7]:

$$\begin{aligned}
\underset{f}{\text{maximize}}\quad & V(a) = \sum_{s=1}^{|\tilde{\mathcal{S}}|} \sum_{a_1 \in \mathcal{A}} r(s, a_1)\, f(s, a_1)\\
\text{s.t.}\quad & \text{C1: } W f = 0\\
& \text{C2: } \mathbf{1}^T f = 1\\
& \text{C3: } \max(QoS_1, P_b(\lambda_1, N_1)) \geq \tilde{R}_1(F_1)\\
& \text{C4: } f \geq 0
\end{aligned} \quad (21)$$

where $W$ is an $|\tilde{\mathcal{S}}| \times 2|\tilde{\mathcal{S}}|$ matrix with $w_{s',(s,a_1)} = \delta(s, s') - P(s' \mid s, a_1)$ ($\delta(s, s')$ is the Kronecker delta function). C3 enforces the QoS commitment. The optimal stationary policy is then obtained as

$$F_1(s, a_1) = \frac{f(s, a_1)}{\sum_{a_1 \in \mathcal{A}} f(s, a_1)}.$$

Footnote 3: For a finite queue/capacity system, the overflow process captures customers who are not admitted due to overflow.

Footnote 4: The trunk reservation policy states that customers with higher payment/reward are always admitted, while customers with lower payment/reward are admitted into the system only if the system's size is less than a given threshold.
Remark 3: The two operators do not need to share information regarding their resources, capacity, and traffic load (i.e., $\lambda_1$, $\lambda_2$). The only external input each operator needs to make its decision is the rate of offloaded traffic from the other. This rate can be estimated/learnt locally and accurately after an initial training time, as can the customer/traffic arrival rates (i.e., $\lambda_1$, $\tilde{\lambda}_2$).

IV. NUMERICAL RESULTS

We numerically evaluate the average reward under $S_1$, $S_3$, and the lower bound of game (5) in $S_2$ using Matlab simulations. The accepting/rejecting policy for the lower bound of game (5), numerically obtained by solving (21), is then used to govern the admission policy of the two MSPs in simulations. QoS = 0.3 and QoS 2 = 0.4; N = 0; N 2 = 5; p = 4; p 2 = 7. λ = 5; λ 2 = 25. The corresponding blocking rates at two MSPs are and The total reward rate when the two MSPs do not cooperate is $(1 - P_b(\lambda_1, N_1))\lambda_1 p_1 + (1 - P_b(\lambda_2, N_2))\lambda_2 p_2$ = 277 ($S_1$). The total reward rate when the two MSPs fully cooperate ($S_3$) is obtained by solving (21). It is in our case. The cooperation gain in this case is about 20%.

The reward rates of MSP 1 and MSP 2 and their total reward rate (lower bounds in $S_2$) vs. $\beta_1$ and $\beta_2$ are shown in Figures 1(a)(b)(c) and 2(a)(b)(c), respectively. By selecting appropriate $\beta_1$ and $\beta_2$, the lower bound on the total reward rate under $S_2$ (obtained by solving (21)) is almost equal to that under $S_3$, when both MSPs fully cooperate (Figures 1(c) and 2(c)). This means that the lower bound on the reward rate in $S_2$ can be made very tight by tuning $\beta_1$ and $\beta_2$. The critical values of $\beta_1$ and $\beta_2$ that shape the revenue-sharing contract can be found numerically so that the proposed offloading mechanism achieves social optimality ($S_3$).

Figure 1 shows that the reward rates of both MSPs are not very sensitive to $\beta_1$. This suggests that there is virtually no offloading from the underloaded system (MSP 1) to the overloaded one (MSP 2). The reward rate of MSP 2 has an almost concave shape w.r.t.
the fraction of reward ($\beta_2$) it pays MSP 1 to carry its traffic (Figure 2(b)). This is because, if the reward from serving traffic offloaded from an MSP is too small, the other MSP will reserve fewer resources for offloaded customers, which leads to a loss of revenue for the traffic owner. On the other hand, the traffic owner also earns less if it pays the other MSP too much for carrying its traffic. Additionally, the reward rate of an MSP (e.g., MSP 1) monotonically increases w.r.t. the fraction of reward (e.g., $\beta_2$) it receives for serving the other MSP's customers (Figure 2(a)). The average rewards of both MSPs vs. traffic loads $\lambda_1$ and $\lambda_2$ are shown in Figures 3 and 4. As can be seen, the higher


the traffic load, the higher the gain that can be harvested via traffic offloading; the gain can be up to 60% compared with the case without traffic offloading.

Fig. 1. Reward rate of MSP 1, MSP 2 and their total reward rate vs. $\beta_1$. Panels (a), (b), (c).
Fig. 2. Reward rate of MSP 1, MSP 2 and their total reward rate vs. $\beta_2$. Panels (a), (b), (c).
Fig. 3. Reward rate of MSP 1 vs. $\lambda_1$ and $\lambda_2$ ($\beta_1 = 0.7$, $\beta_2 = 0.8$).
Fig. 4. Reward rate of MSP 2 vs. $\lambda_1$ and $\lambda_2$ ($\beta_1 = 0.7$, $\beta_2 = 0.8$).

V. CONCLUSIONS

To harvest short-lived whitespaces in cellular bands, we proposed a cooperation framework that allows mobile/cellular service providers (MSPs) to offload traffic onto each other while maintaining their own QoS commitment. The optimal offloading strategy for each MSP was derived by solving a constrained Markov game.

REFERENCES

[1] N. Golmie, "Interference in the 2.4 GHz ISM band: Challenges and solutions," Networking for Pervasive Computing: Research from the USA National Institute of Standards and Technology, Tech. Rep., 2013.
[2] R. Berry, M. Honig, T. Nguyen, V. Subramanian, H. Zhou, and R. Vohra, "On the nature of revenue-sharing contracts to incentivize spectrum-sharing," in Proceedings of the IEEE INFOCOM Conference, 2013.
[3] P. Marques, H. Marques, J. Ribeiro, and A. Gameiro, "Coexistence analysis and cognitive opportunities selection in GSM bands," in Proceedings of the IEEE 69th Vehicular Technology Conference (VTC), April 2009, pp. 1-5.
[4] S. Kandeepan, A. Sierra, J. Campos, and I. Chlamtac, "Periodic sensing in cognitive radios for detecting UMTS/HSDPA based on experimental spectral occupancy statistics," in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), April 2010, pp. 1-6.
[5] J. Filar and K. Vrieze, Competitive Markov Decision Processes. Springer, 1997.
[6] E. Altman and A. Shwartz, "Constrained Markov games: Nash equilibria," Advances in Dynamic Games and Applications, vol. 5.
[7] A. Hordijk and F. Spieksma, "Constrained admission control to a queueing system," Advances in Applied Probability, vol. 21, no. 2, 1989.
[8] R. F. Serfozo, "An equivalence between continuous and discrete time Markov decision processes," Operations Research, vol. 27, no. 3, 1979.
[9] V. Nguyen, "On the optimality of trunk reservation in overflow process," Probability in the Engineering and Informational Sciences, vol. 5, 1991.
[10] S. Stidham, "Optimal control of admission to a queueing system," IEEE Transactions on Automatic Control, vol. 30, 1985.
