Distributed Opportunistic Spectrum Access in an Unknown and Dynamic Environment: A Stochastic Learning Approach



Huijin Cao, Jun Cai, Senior Member, IEEE

Copyright (c) 05 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

Huijin Cao and Jun Cai are with the Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, RT 5V6, Canada (caoh@myumanitoba.ca, jun.cai@umanitoba.ca).

Abstract: In this paper, the problem of distributed throughput maximization in an Opportunistic Spectrum Access (OSA) network with multiple secondary users (SUs) and multiple primary channels is investigated. To address the challenges of designing efficient solutions in a dynamic and unknown environment, we formulate the optimization problem as a noncooperative game, which is further proved to be an ordinal potential game. We then propose a Best Response (BR) based algorithm to achieve the Nash Equilibrium Points (NEPs) of the formulated game, given that there exists a coordinator for SUs to work in a round-robin fashion and a common control channel for SUs to exchange their information. To further reduce the system overhead due to information exchange among SUs, we design a new Stochastic Learning Automata (SLA) based algorithm, called N-SLA, which can converge to the pure-strategy NEPs of the formulated ordinal potential game in a fully distributed way. To the best of our knowledge, we are the first to address the convergence issue of SLA based algorithms for general ordinal potential games. Simulation results validate the effectiveness of our proposed algorithms.

Index Terms: Cognitive radio networks, opportunistic spectrum access, Nash Equilibrium Points, ordinal potential games, Stochastic Learning Automata.

I. INTRODUCTION

To lessen the dilemma between spectrum shortage and spectrum waste caused by static spectrum allocation, a more intelligent and flexible spectrum allocation paradigm, namely opportunistic spectrum access (OSA) using cognitive radio (CR) technology, has been proposed [ ]. In OSA networks, primary users (PUs), which are licensed to use a specific spectrum band, coexist with secondary users (SUs) that can only opportunistically access the spectrum holes. The concept of OSA aims to explore the opportunity of sharing licensed spectrum among both licensed and unlicensed users, and has been adopted in several applications. For example, the U.S. Federal Communications Commission (FCC) has approved commercial unlicensed operations in the ultra high frequency (UHF) spectrum [4]. Since the occupancy states of the licensed channels are time-varying and unknown to SUs, each SU needs to detect the spectrum through sensing before access. However, due to hardware limitations, each SU can only select a very limited number of channels to sense and access at any time [5, 6]. Under this scenario, selecting the best channels to sense and access becomes a key issue for OSA networks, which ultimately determines spectrum utilization efficiency and secondary throughput.

The problem of channel selection in OSA networks has been widely studied in the literature, and distributed solutions have recently drawn great attention. To capture interactions among multiple SUs, various game theoretic approaches have been proposed, e.g., [7 0], with a focus on studying the properties of the formulated games and on investigating algorithms that converge towards the Nash Equilibrium Points (NEPs). Although some progress has been achieved in this area, several problems remain unsolved. For example, most existing game theoretic solutions were based on the assumptions that all SUs have perfect knowledge about the environment and complete information about actions taken by other SUs.
Besides, the environment was always assumed to be static during the convergence process. However, such assumptions may not hold in practical systems, where the spectrum holes are time-varying and the channel availability statistics are initially unknown to SUs. In addition, in distributed scenarios, each SU only has its own local information, and acquiring knowledge about other SUs not only consumes a large amount of network resources, e.g., time, power, and bandwidth, but can also lead to high communication overhead. Moreover, coordination among SUs may not be available in some practical communication networks. Therefore, it is still important to design new solutions for distributed channel selection that resolve collisions among SUs and maximize their throughput in an unknown, dynamic environment with incomplete information.

In this paper, we consider distributed channel selection in an OSA network by jointly considering the following aspects: (1) the availability of spectrum holes is time-varying and its statistics are unknown to SUs; (2) there is no central controller and no information exchange among SUs; (3) each SU is allowed to opportunistically occupy the idle channel with a probability, and in general, such probabilities differ among SUs, depending on their individual service requirements; and (4) the interests of SUs are conflicting since each SU tries to selfishly maximize its own expected throughput. We try to answer two fundamental questions: (1) without channel statistics, how can an SU choose a channel with high availability to improve its chance of access? (2) without a central controller or any information exchange, how can an SU effectively resolve collisions and achieve self-coordination for channel access?

To address all these challenges, we first model the distributed throughput maximization problem in an OSA network within a game theoretic framework, and then incorporate the Stochastic Learning Automata (SLA) technology [] to address the missing information on both the environment and other SUs. With our design, the SUs can learn from their individual action-reward history and adjust their behaviors towards the NEPs independently and automatically. The main contributions of this paper are summarized as follows:

1) We first formulate the distributed throughput maximization problem as a noncooperative game. Then, we prove the game to be an ordinal potential game [] by carefully constructing an ordinal potential function. According to the property of ordinal potential games, the formulated game has at least one pure-strategy Nash Equilibrium Point (NEP).

2) To derive the NEPs, we first propose a Best Response (BR) [ 5] based algorithm, which guarantees convergence towards the NEPs within finite time, under the assumptions that there exists a coordinator for SUs to work in a round-robin fashion and a common control channel for SUs to exchange their information.

3) We then introduce Stochastic Learning Automata (SLA) into the formulated game to adapt decision-making to an unknown and dynamic environment. To investigate the convergence property of SLA in ordinal potential games, which has not been addressed in the literature, we define a weighted potential game and prove that it has the same NEP set as the formulated ordinal potential game. After that, we propose a new SLA based algorithm, called N-SLA, to achieve the NEPs of the weighted potential game, or equivalently the NEPs of the formulated ordinal potential game.

4) Simulation results are provided to demonstrate the convergence of both the BR based and the N-SLA algorithms, and to illustrate their efficiency in terms of sum log expected throughput.

The rest of this paper is organized as follows.
In Section II, related works are presented. In Section III, the system model and game formulation are described. We analyze the NEPs of the formulated game in Section IV. In Sections V and VI, we propose a BR based algorithm and an N-SLA algorithm to achieve NEPs, respectively. Numerical results are presented in Section VII, and we conclude the paper in Section VIII.

II. RELATED WORKS

In the literature, the problem of distributed spectrum access in OSA networks has been addressed from both non-game theoretic and game theoretic perspectives. From the non-game theoretic perspective, the authors in [6 8] investigated distributed opportunistic spectrum access by using the framework of partially observable Markov decision processes. These studies did not consider the interactions among SUs. In [9 ], distributed learning and access algorithms were developed based on multi-armed bandit (MAB) approaches. In these studies, the objective was to minimize the system regret, and the focus was on investigating the asymptotic performance of learning algorithms and on balancing the tradeoff between exploration and exploitation.

Research works from the game theoretic perspective focused more on investigating algorithms to achieve NEPs [7 0]. However, most of these algorithms were based on traditional game theory and required information exchange (such as actions and/or received payoffs) among SUs during the convergence process towards NEPs. Recently, incorporating the SLA technology into game theory has been proposed to avoid information exchange. For example, in [], Zandi et al. proposed a distributed adaptive learning and access policy for SUs, where the channel selection and access can effectively adapt to a wide range of traffic load patterns in the primary network, and can converge towards a pure strategy NEP without any information exchange. In [4], Xu et al. studied the problem of multiuser sequential channel sensing and access in dynamic cognitive radio networks.
To cope with the uncertain, dynamic, and incomplete information constraints, the authors proposed a distributed stochastic learning algorithm and proved its convergence towards NEPs. Both [] and [4] focused on the scenario where the number of primary channels was no less than the number of SUs. Such a scenario led to deterministic transmission, i.e., the transmission probability for each SU was one. In [5 8], the scenario where the number of SUs is larger than the number of primary channels was considered. Wu et al. in [5] studied the problem of distributed channel selection for interference mitigation in a time-varying radio environment without information exchange. Zheng et al. in [6] investigated a more general and practical system model where the users' activities were dynamically variable, and designed a low-complexity, fully distributed no-regret learning algorithm for channel adaptation. However, these studies focused on minimizing the experienced interference in the physical layer only, and ignored multiple-access control mechanisms at higher layers. In [7, 8], the authors employed a uniform medium access control (MAC) protocol to coordinate transmissions among SUs that tried to access the same idle channel.

Our work in this paper differs from all these existing works in the following aspects:

1) Both the scenario where the number of primary channels is no less than the number of SUs and the scenario where it is less than the number of SUs are discussed in this paper.

2) Unlike the existing works that mainly focused on homogeneous SUs and uniform MAC protocols, a more general scenario with heterogeneous SUs (each SU has a different access probability) is considered here.

3) All the existing works investigated the convergence only for exact potential games. In this paper, we design a new SLA based algorithm which can guarantee the convergence for ordinal potential games. Thus, the results obtained in this paper can be applied to more general scenarios.

III. SYSTEM MODEL AND GAME FORMULATION

A. System Model

Consider a distributed OSA network consisting of $M$ secondary users (SUs) (or equivalently, $M$ pairs of secondary transceivers) and $N$ independent primary channels, owned by $N$ primary users (PUs). Time is divided into slots of equal length. Let $X_n(t)$ denote the availability status of channel $n$ at time slot $t$: $X_n(t) = 1$ means channel $n$ is available and $X_n(t) = 0$ otherwise, as shown in Fig. 1. Without loss of generality, assume that $X_n(t)$, $n = 1, \ldots, N$, follows a stationary Bernoulli random process over $t$, with mean $\theta_n = E[X_n(t)] \in [0, 1]$ [, 4]. For ease of exposition, we consider a case where all SUs are located in a small-scale mutually interfering area [, 4].

Fig. 1. System model.

At the beginning of time slot $t$, each SU selects one channel for sensing. Based on the sensing outcome, each SU makes a decision on access. If the selected channel $n$ ($n = 1, \ldots, N$) is sensed to be idle at SU $m$ ($m = 1, \ldots, M$), the SU accesses this channel with a transmission probability $P_m$ ($0 < P_m \le 1$). Otherwise, it keeps silent in the current slot. A transmission is successful only if a single SU transmits on the given channel. Otherwise, a collision occurs. The collision-free achievable rate of SU $m$ ($m = 1, \ldots, M$) on channel $n$ ($n = 1, \ldots, N$) can be calculated as

\[ r_m^n(t) = \log\left(1 + \frac{p_m H_m^n(t)}{\sigma}\right) \tag{1} \]

where $p_m$ is the transmission power of SU $m$, $\sigma$ is the background noise, and $H_m^n(t) = (d_m)^{-\alpha} \beta_m^n(t)$ is the instantaneous channel gain of SU $m$ on channel $n$. Here, $d_m$ is the distance between the transmitter and the receiver of SU $m$, $\alpha$ is the path loss factor, and $\beta_m^n(t)$ is an exponentially distributed random fading coefficient with unit mean. Note that the channel bandwidth of each primary channel has been normalized to 1 for simplicity.

B. Game Formulation

Define $a_m$ as the channel selection action of SU $m$, $a_{-m}$ as the set of channel selection actions of all SUs except SU $m$, i.e., $a_{-m} = \{a_1, \ldots, a_{m-1}, a_{m+1}, \ldots, a_M\}$, and $C_n$ as the set of SUs who select channel $n$ ($n = 1, \ldots, N$), i.e., $C_n = \{m \in \{1, \ldots, M\} : a_m = n\}$.

For simplicity, it is assumed that the channel sensing is perfect. However, the analysis in this paper can easily be extended to the scenario with imperfect channel sensing by introducing the detection probability $P_d$ and the false alarm probability $P_f$.

Then, the achievable throughput of SU $m$ in time slot $t$ can be written as

\[ r_m(a_m, a_{-m}, t) = X_{a_m}(t)\, I_m(C_{a_m}, t)\, r_m^{a_m}(t) \tag{2} \]

where $I_m(C_{a_m}, t) = 1$ if only SU $m$ from $C_{a_m}$ transmits over channel $a_m$ in time slot $t$, and 0 otherwise. Hence, the expected throughput of SU $m$ is given by

\[ R_m(a_m, a_{-m}) = E[r_m(a_m, a_{-m}, t)] = E[X_{a_m}(t)]\, E[I_m(C_{a_m}, t)]\, E[r_m^{a_m}(t)] = \theta_{a_m} P_m \prod_{l \in C_{a_m},\, l \ne m} (1 - P_l)\, \bar{r}_m \tag{3} \]

where $\bar{r}_m = E[r_m^{a_m}(t)]$ is the mean collision-free rate of SU $m$. In this paper, we consider a distributed throughput maximization problem, where each SU tries to maximize its own expected normalized rate without a central controller:

\[ \max_{a_m = 1, \ldots, N} \hat{R}_m(a_m, a_{-m}), \quad \forall m = 1, \ldots, M \tag{4} \]

where

\[ \hat{R}_m(a_m, a_{-m}) = \frac{R_m(a_m, a_{-m})}{\bar{r}_m} = \theta_{a_m} P_m \prod_{l \in C_{a_m},\, l \ne m} (1 - P_l). \tag{5} \]

According to problem (4), each SU $m$ tries to maximize its own utility function $\hat{R}_m$, which complies with the property of noncooperative games. Thus, to solve this distributed throughput maximization problem, we formulate a noncooperative game denoted by $G = \{\mathcal{M}, \mathcal{N}, \{\hat{R}_m(a_m, a_{-m})\}_{m \in \mathcal{M}}\}$, where $\mathcal{M} = \{1, \ldots, M\}$ is the set of players (or SUs), $\mathcal{N} = \{1, \ldots, N\}$ is the set of actions (or primary channels) that each player can take, and $\hat{R}_m(a_m, a_{-m})$ is the utility of player $m$ ($m \in \mathcal{M}$) upon taking action $a_m \in \mathcal{N}$ while the other players take $a_{-m}$. Each player independently and selfishly adjusts its strategy to maximize its individual utility $\hat{R}_m(a_m, a_{-m})$.
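The closed form in (5) can be sanity-checked against a slot-level simulation of the access rule in Section III-A. The following is a minimal sketch under stated assumptions: channels and SUs are indexed from 0, sensing is perfect, and the helper names `normalized_throughput` and `simulate_success_rate` as well as the numerical values are illustrative choices, not part of the paper.

```python
import random

def normalized_throughput(m, profile, theta, P):
    """Closed form of equation (5) for SU m under a channel selection profile."""
    n = profile[m]
    value = theta[n] * P[m]
    for l, a_l in enumerate(profile):
        if l != m and a_l == n:
            value *= 1.0 - P[l]  # every co-channel SU must stay silent
    return value

def simulate_success_rate(m, profile, theta, P, num_slots, rng):
    """Fraction of slots in which SU m is the only transmitter on an idle channel."""
    n = profile[m]
    successes = 0
    for _ in range(num_slots):
        idle = rng.random() < theta[n]  # X_n(t) ~ Bernoulli(theta_n)
        # Each SU on channel n transmits with its own probability if the channel is idle.
        senders = [l for l, a_l in enumerate(profile)
                   if a_l == n and idle and rng.random() < P[l]]
        if senders == [m]:
            successes += 1
    return successes / num_slots

# Hypothetical example: two channels, three SUs; SUs 0 and 1 share channel 1.
theta = [0.5, 0.8]
P = [0.5, 0.5, 0.25]
profile = [1, 1, 0]
closed_form = normalized_throughput(0, profile, theta, P)
empirical = simulate_success_rate(0, profile, theta, P, 100000, random.Random(0))
```

Here `closed_form` evaluates $\theta_{a_m} P_m (1 - P_l)$ for the single competitor on the shared channel, and `empirical` is the corresponding success frequency, which should approach the closed form as the number of simulated slots grows.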
Definition 1: A channel selection profile $a^* = (a_1^*, \ldots, a_M^*)$ is a pure strategy NEP of $G$ if and only if no SU can improve its utility function by deviating unilaterally, i.e.,

\[ \hat{R}_m(a_m^*, a_{-m}^*) \ge \hat{R}_m(a_m, a_{-m}^*), \quad \forall m \in \mathcal{M},\ \forall a_m \in \mathcal{N}. \tag{6} \]

IV. PROPERTIES OF NEPS IN G

In this section, we investigate the properties of the NEPs in $G$. We will prove that the formulated game $G$ is an ordinal potential game [].

Definition 2: A game is called an ordinal potential game if the incentives of all players of the game for changing their actions can be reflected by a function $\Phi : a = (a_1, \ldots, a_M) \to \mathbb{R}$, called an ordinal potential function, i.e.,

\[ \hat{R}_m(a_m, a_{-m}) - \hat{R}_m(a_m', a_{-m}) > 0 \iff \Phi(a_m, a_{-m}) - \Phi(a_m', a_{-m}) > 0, \quad \forall m \in \mathcal{M},\ \forall a_m, a_m' \in \mathcal{N},\ a_m \ne a_m'. \tag{7} \]
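Definition 1 can be checked by brute force on a small instance: a profile is a pure strategy NEP when no single SU gains by switching channels. The sketch below assumes the utility of (5), 0-based indices, and hypothetical values for $\theta_n$ and $P_m$; the helper names are illustrative.

```python
from itertools import product

def utility(m, profile, theta, P):
    """Expected normalized throughput of SU m, as in equation (5)."""
    n = profile[m]
    value = theta[n] * P[m]
    for l, a_l in enumerate(profile):
        if l != m and a_l == n:
            value *= 1.0 - P[l]
    return value

def is_pure_nep(profile, theta, P, tol=1e-12):
    """Check condition (6): no unilateral deviation strictly improves any utility."""
    for m in range(len(profile)):
        current = utility(m, profile, theta, P)
        for n in range(len(theta)):
            deviated = list(profile)
            deviated[m] = n
            if utility(m, deviated, theta, P) > current + tol:
                return False
    return True

# Hypothetical instance: two SUs, two channels.
theta = [0.9, 0.6]
P = [0.5, 0.5]
neps = [a for a in product(range(len(theta)), repeat=len(P))
        if is_pure_nep(a, theta, P)]
```

In this instance sharing the better channel yields 0.225 per SU, while moving alone to the other channel yields 0.3, so only the two orthogonal profiles survive the check. Exhaustive enumeration grows as $N^M$ and is meant only as an illustration for small games.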

In general, proving that an ordinal potential function exists for a game is sufficient to prove that the game is an ordinal potential game.

Theorem 1: The formulated noncooperative game $G$ is an ordinal potential game.

Proof: To prove the theorem, we consider two scenarios separately: (I) the number of channels $N$ is no less than the number of SUs $M$, i.e., $N \ge M$, and (II) $N < M$. In scenario (I), since the spectrum resource is redundant, each SU can transmit with probability 1. Thus, an effective orthogonalization mechanism that avoids collisions among SUs is crucial. In scenario (II), each SU $m$ ($\forall m \in \mathcal{M}$) transmits with a probability $P_m$, where $0 < P_m < 1$, to share the limited spectrum with other SUs.

Scenario (I): With $P_m = 1$, $\hat{R}_m(a_m, a_{-m})$ ($\forall m \in \mathcal{M}$) can be rewritten as

\[ \hat{R}_m(a_m, a_{-m}) = \upsilon_{a_m}(|C_{a_m}|) = \begin{cases} \theta_{a_m} & |C_{a_m}| = 1 \\ 0 & \text{otherwise} \end{cases} \tag{8} \]

where $|C_{a_m}|$ denotes the number of SUs taking the same action $a_m$. Then, we can define the following bounded function $\Phi_1 : a = (a_1, \ldots, a_M) \to \mathbb{R}$,

\[ \Phi_1(a_m, a_{-m}) = \sum_{n=1}^{N} \sum_{k=1}^{|C_n|} \upsilon_n(k). \tag{9} \]

Note that equation (9) has the same form as Rosenthal's potential function [9]. Following a proof similar to the one given in [], we have

\[ \Phi_1(a_m, a_{-m}) - \Phi_1(a_m', a_{-m}) = \hat{R}_m(a_m, a_{-m}) - \hat{R}_m(a_m', a_{-m}), \quad \forall m \in \mathcal{M},\ \forall a_m, a_m' \in \mathcal{N},\ a_m \ne a_m'. \tag{10} \]

From (10), it is obvious that the property in (7) holds. Therefore, the function $\Phi_1 : a = (a_1, \ldots, a_M) \to \mathbb{R}$ is an ordinal potential function, so that the game $G$ with $P_m = 1$ ($\forall m \in \mathcal{M}$) is an ordinal potential game. Moreover, since the deviation in the utility of an arbitrary player $m$ is exactly reflected by the deviation in the ordinal potential function, the game $G$ with $P_m = 1$ ($\forall m \in \mathcal{M}$) is also an exact potential game [0].

Scenario (II): With $0 < P_m < 1$ ($\forall m \in \mathcal{M}$), we define the following bounded function $\Phi_2 : a = (a_1, \ldots, a_M) \to \mathbb{R}$:

\[ \Phi_2(a_m, a_{-m}) = -\frac{1}{2} \sum_{m \in \mathcal{M}} \beta_m \Big( \sum_{l : a_l = a_m} \beta_l + \beta_m + 2 y_m^{a_m} \Big) \tag{11} \]

where

\[ \beta_m = \log(1 - P_m), \tag{12} \]

\[ y_m^{a_m} = \log\left( \frac{\theta_{a_m} P_m}{1 - P_m} \right). \tag{13} \]

Suppose that an arbitrary SU $m$ ($m \in \mathcal{M}$) unilaterally changes its action from $a_m$ to $a_m'$ ($a_m, a_m' \in \mathcal{N}$, $a_m \ne a_m'$).
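According to (11), we have

\[ \Phi_2(a_m, a_{-m}) = -\frac{1}{2} \Big[ \beta_m \Big( \sum_{l \in C_{a_m}(a_m, a_{-m})} \beta_l + \beta_m + 2 y_m^{a_m} \Big) + \sum_{p \in C_{a_m}(a_m, a_{-m}),\, p \ne m} \beta_p \Big( \beta_m + \sum_{l \in C_{a_p}(a_m, a_{-m}),\, l \ne m} \beta_l + \beta_p + 2 y_p^{a_p} \Big) + \sum_{q \in C_{a_m'}(a_m, a_{-m})} \beta_q \Big( \sum_{l \in C_{a_q}(a_m, a_{-m})} \beta_l + \beta_q + 2 y_q^{a_q} \Big) + \sum_{t : a_t \ne a_m, a_m'} \beta_t \Big( \sum_{l \in C_{a_t}} \beta_l + \beta_t + 2 y_t^{a_t} \Big) \Big] \tag{14} \]

and

\[ \Phi_2(a_m', a_{-m}) = -\frac{1}{2} \Big[ \beta_m \Big( \sum_{l \in C_{a_m'}(a_m', a_{-m})} \beta_l + \beta_m + 2 y_m^{a_m'} \Big) + \sum_{q \in C_{a_m'}(a_m', a_{-m}),\, q \ne m} \beta_q \Big( \beta_m + \sum_{l \in C_{a_q}(a_m', a_{-m}),\, l \ne m} \beta_l + \beta_q + 2 y_q^{a_q} \Big) + \sum_{p \in C_{a_m}(a_m', a_{-m})} \beta_p \Big( \sum_{l \in C_{a_p}(a_m', a_{-m})} \beta_l + \beta_p + 2 y_p^{a_p} \Big) + \sum_{t : a_t \ne a_m, a_m'} \beta_t \Big( \sum_{l \in C_{a_t}} \beta_l + \beta_t + 2 y_t^{a_t} \Big) \Big]. \tag{15} \]

Then, we have

\[ \Phi_2(a_m, a_{-m}) - \Phi_2(a_m', a_{-m}) = -\frac{1}{2} \beta_m \Big( \sum_{l \in C_{a_m}(a_m, a_{-m})} \beta_l + \beta_m + 2 y_m^{a_m} \Big) + \frac{1}{2} \beta_m \Big( \sum_{l \in C_{a_m'}(a_m', a_{-m})} \beta_l + \beta_m + 2 y_m^{a_m'} \Big) + \frac{1}{2} \beta_m \sum_{q \in C_{a_m'}(a_m', a_{-m}),\, q \ne m} \beta_q - \frac{1}{2} \sum_{p \in C_{a_m}(a_m, a_{-m}),\, p \ne m} \beta_p \beta_m = -\beta_m \Big( \sum_{l \in C_{a_m}(a_m, a_{-m})} \beta_l + y_m^{a_m} - \sum_{l \in C_{a_m'}(a_m', a_{-m})} \beta_l - y_m^{a_m'} \Big). \tag{16} \]

Define

\[ \tilde{U}_m(a_m, a_{-m}) = \log(\hat{R}_m(a_m, a_{-m})), \quad \forall m \in \mathcal{M},\ \forall a_m \in \mathcal{N}. \tag{17} \]

Substituting (5) into (17), we have

\[ \tilde{U}_m(a_m, a_{-m}) = \log\Big( \theta_{a_m} P_m \prod_{l \in C_{a_m}(a_m, a_{-m}),\, l \ne m} (1 - P_l) \Big) = \log\left( \frac{\theta_{a_m} P_m}{1 - P_m} \right) + \sum_{l \in C_{a_m}(a_m, a_{-m})} \log(1 - P_l). \tag{18} \]

According to (12) and (13),

\[ \tilde{U}_m(a_m, a_{-m}) = y_m^{a_m} + \sum_{l \in C_{a_m}(a_m, a_{-m})} \beta_l. \tag{19} \]

Substituting (19) into (16), we have

\[ \Phi_2(a_m, a_{-m}) - \Phi_2(a_m', a_{-m}) = -\beta_m \big( \tilde{U}_m(a_m, a_{-m}) - \tilde{U}_m(a_m', a_{-m}) \big) = -\beta_m \big( \log(\hat{R}_m(a_m, a_{-m})) - \log(\hat{R}_m(a_m', a_{-m})) \big). \tag{20} \]

Since $\beta_m = \log(1 - P_m) < 0$ and the logarithm is strictly increasing,

\[ \hat{R}_m(a_m, a_{-m}) - \hat{R}_m(a_m', a_{-m}) > 0 \iff \Phi_2(a_m, a_{-m}) - \Phi_2(a_m', a_{-m}) > 0, \quad \forall m \in \mathcal{M},\ \forall a_m, a_m' \in \mathcal{N},\ a_m \ne a_m',\ 0 < P_m < 1. \tag{21} \]

Therefore, the function $\Phi_2 : a = (a_1, \ldots, a_M) \to \mathbb{R}$ is also an ordinal potential function and the game $G$ with $0 < P_m < 1$, $\forall m \in \mathcal{M}$, is an ordinal potential game.

From [0], we can conclude that there exists at least one pure strategy NEP in the formulated noncooperative game $G$. However, deriving such NEPs is not straightforward, especially for scenario (II). In the next sections, we investigate how to achieve the pure strategy NEP of $G$.

V. BEST RESPONSE (BR) BASED ALGORITHM

In this section, we propose a Best Response (BR) based algorithm to find the pure-strategy NEPs of $G$, assuming that there exist a coordinator for SUs to work in a round-robin fashion and a common control channel for SUs to broadcast their individual information, e.g., the updated actions and the transmission probabilities. The proposed BR based algorithm is divided into two stages: in stage (1), each SU distributively learns the channel availability statistics $(\theta_1, \theta_2, \ldots, \theta_N)$ by adopting the upper-confidence-bound (UCB) algorithm as shown in [ ]; based on these results, in stage (2), SUs select primary channels one by one, and in each round, one SU chooses the best response to the strategies of the SUs who have already made their decisions. Specifically, given $C_n(t)$ ($\forall n \in \mathcal{N}$) in time slot $t$, the SU chooses the best channel $n^*(t)$ satisfying

\[ n^*(t) = \arg\max_{n \in \mathcal{N}} [\omega_n(C_n(t))] \tag{22} \]

where

\[ \omega_n(C_n(t)) = \begin{cases} \upsilon_n(|C_n(t)|) & \text{if } m \in C_n(t) \\ \upsilon_n(|C_n(t)| + 1) & \text{otherwise} \end{cases} \tag{23} \]

in scenario (I), or chooses $n^*(t)$ satisfying

\[ n^*(t) = \arg\max_{n \in \mathcal{N}} [\theta_n P_m \chi_n(C_n(t))] \tag{24} \]

where

\[ \chi_n(C_n(t)) = \begin{cases} 1 & \text{if } C_n(t) = \{m\} \text{ or } C_n(t) = \emptyset \\ \prod_{l \in C_n(t),\, l \ne m} (1 - P_l) & \text{otherwise} \end{cases} \tag{25} \]

in scenario (II). The proposed BR based algorithm is summarized in Algorithm 1.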
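The two selection rules (22)-(23) and (24)-(25) can be written as small functions. This is a hedged sketch: it assumes 0-based channel and SU indices, represents the sets $C_n(t)$ as a list of Python sets called `members`, takes the statistics $\theta_n$ as already learned in stage (1), and breaks ties in favor of the lowest channel index, which the text does not specify.

```python
def upsilon(n, k, theta):
    """Equation (8): with P_m = 1, channel n pays theta_n only to a sole occupant."""
    return theta[n] if k == 1 else 0.0

def best_channel_scenario1(m, members, theta):
    """Equations (22)-(23): best response of SU m when every SU transmits with probability 1."""
    def omega(n):
        occupancy = len(members[n]) if m in members[n] else len(members[n]) + 1
        return upsilon(n, occupancy, theta)
    return max(range(len(theta)), key=omega)

def chi(n, m, members, P):
    """Equation (25): probability that no other SU on channel n transmits."""
    value = 1.0  # empty product covers C_n = {} and C_n = {m}
    for l in members[n]:
        if l != m:
            value *= 1.0 - P[l]
    return value

def best_channel_scenario2(m, members, theta, P):
    """Equation (24): channel maximizing theta_n * P_m * chi_n for SU m."""
    return max(range(len(theta)),
               key=lambda n: theta[n] * P[m] * chi(n, m, members, P))

# Hypothetical state: SU 1 already sits on channel 0, SU 0 is about to choose.
theta = [0.9, 0.6, 0.3]
P = [0.5, 0.5, 0.5]
members = [{1}, set(), set()]
choice_scenario1 = best_channel_scenario1(0, members, theta)
choice_scenario2 = best_channel_scenario2(0, members, theta, P)
```

In both scenarios SU 0 avoids the occupied channel 0 here: sharing it is worth 0 in scenario (I) and 0.225 in scenario (II), whereas the free channel 1 is worth 0.6 and 0.3, respectively.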
Theorem 2: The proposed BR based algorithm converges to a pure-strategy NEP of $G$, starting from any point.

Algorithm 1: The Best Response based Algorithm
Stage (1):
  Each SU estimates the channel availability statistics $(\theta_1, \theta_2, \ldots, \theta_N)$ by adopting the UCB algorithm.
Stage (2):
  Initialize $t = 1$, $a_m(0) = 0$, $\forall m \in \mathcal{M}$, and $C_n(t) = \emptyset$, $\forall n \in \mathcal{N}$.
  Repeat:
    Set $\mathcal{M}_1 = \mathcal{M}$, $\mathcal{M}_2 = \emptyset$.
    While $\mathcal{M}_1 \ne \emptyset$,
      The coordinator randomly selects an SU $m \in \mathcal{M}_1$.
      if in scenario (I), i.e., $P_m = 1$, $\forall m \in \mathcal{M}$,
        the SU chooses $n^*(t)$ according to (22).
      else if in scenario (II), i.e., $0 < P_m < 1$, $\forall m \in \mathcal{M}$,
        the SU chooses $n^*(t)$ according to (24).
      end if
      SU $m$ broadcasts the selected channel $n^*(t)$ through the common control channel.
      Each SU updates $\{C_n(t)\}_{n \in \mathcal{N}}$ according to the following rule:

\[ C_n(t+1) = \begin{cases} C_n(t) \setminus \{m\} & \text{if } n = a_m(t-1),\ n \ne a_m(t) \\ C_n(t) \cup \{m\} & \text{if } n = a_m(t),\ n \ne a_m(t-1) \\ C_n(t) & \text{otherwise} \end{cases} \tag{26} \]

      SU $m$ tunes to $n^*(t)$ for sensing, and transmits with probability $P_m$ if the channel is available.
      Exclude $m$ from $\mathcal{M}_1$ and include it in $\mathcal{M}_2$.
      Update $t = t + 1$.
    end While
  Until convergence.

Proof: In time slot $t$ ($t \ge 1$), suppose SU $m$ ($m \in \mathcal{M}$) is selected, and the current action profile of all the other SUs is denoted as $a_{-m}(t)$. Following (22) (under scenario (I)) or (24) (under scenario (II)), SU $m$ chooses $a_m(t) = n^*(t)$ so that

\[ \hat{R}_m(n^*(t), a_{-m}(t)) \ge \hat{R}_m(n, a_{-m}(t)), \quad \forall n \in \mathcal{N}. \tag{27} \]

Using the property of ordinal potential games (7), we have

\[ \Phi(n^*(t), a_{-m}(t)) \ge \Phi(n, a_{-m}(t)), \quad \forall n \in \mathcal{N} \tag{28} \]

where

\[ \Phi(n^*(t), a_{-m}(t)) = \begin{cases} \Phi_1(n^*(t), a_{-m}(t)) & \text{in scenario (I)} \\ \Phi_2(n^*(t), a_{-m}(t)) & \text{in scenario (II)} \end{cases} \tag{29} \]

Since $a_{-m}(t-1) = a_{-m}(t)$ and $a_m(t-1) \in \mathcal{N}$, (28) can be rewritten as

\[ \Phi(a(t)) \ge \Phi(a(t-1)). \tag{30} \]

Thus, in both scenarios, the ordinal potential function $\Phi(a(t))$ is non-decreasing with time $t$. Because both $\Phi_1(a_m, a_{-m})$ and $\Phi_2(a_m, a_{-m})$ ($\forall m \in \mathcal{M}$, $\forall a_m \in \mathcal{N}$) are bounded, $\Phi(a(t))$ converges to a (local) maximum. Since any (local) maximum of the ordinal potential function is an NEP [], the theorem holds.
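The monotonicity argument in the proof of Theorem 2 can be observed numerically for scenario (II). The sketch below makes several assumptions that are not in the paper: stage (1) is skipped and the true $\theta_n$ are used directly, the dynamics start from a random profile instead of the empty initialization of Algorithm 1, an SU only moves when its utility strictly improves, and $\Phi_2$ is evaluated in the form $-\frac{1}{2}\sum_m \beta_m(\sum_{l: a_l = a_m}\beta_l + \beta_m + 2y_m^{a_m})$. All function names and parameter values are illustrative.

```python
import math
import random

def utility(m, profile, theta, P):
    """Expected normalized throughput of SU m, as in equation (5)."""
    n = profile[m]
    value = theta[n] * P[m]
    for l, a_l in enumerate(profile):
        if l != m and a_l == n:
            value *= 1.0 - P[l]
    return value

def potential_phi2(profile, theta, P):
    """Ordinal potential of scenario (II), with beta_m = log(1 - P_m)."""
    beta = [math.log(1.0 - p) for p in P]
    total = 0.0
    for m, n in enumerate(profile):
        same_channel = sum(beta[l] for l, a_l in enumerate(profile) if a_l == n)
        y = math.log(theta[n] * P[m] / (1.0 - P[m]))
        total += beta[m] * (same_channel + beta[m] + 2.0 * y)
    return -0.5 * total

def best_response_dynamics(theta, P, rng, max_rounds=1000):
    """Round-robin best responses; returns the final profile and the potential trace."""
    num_sus, num_channels = len(P), len(theta)
    profile = [rng.randrange(num_channels) for _ in range(num_sus)]
    trace = [potential_phi2(profile, theta, P)]
    for _ in range(max_rounds):
        changed = False
        order = list(range(num_sus))
        rng.shuffle(order)  # the coordinator picks SUs in random order
        for m in order:
            def payoff(n):
                return utility(m, profile[:m] + [n] + profile[m + 1:], theta, P)
            best = max(range(num_channels), key=payoff)
            if payoff(best) > payoff(profile[m]) + 1e-12:
                profile[m] = best
                changed = True
            trace.append(potential_phi2(profile, theta, P))
        if not changed:  # a full round without any move: pure-strategy NEP
            break
    return profile, trace

# Hypothetical instance with more SUs than channels (scenario II).
theta = [0.5, 0.3, 0.7]
P = [0.4, 0.6, 0.5, 0.3, 0.7]
final_profile, potential_trace = best_response_dynamics(theta, P, random.Random(3))
```

In this sketch `potential_trace` never decreases, and the loop stops once a full round passes without any SU changing its channel, which is exactly the NEP condition of Definition 1.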
Remark 1: In the proposed BR based algorithm, information exchange among SUs is indispensable, which may result in high network overhead and may not be feasible in some practical communication networks. Therefore, a fully distributed online-adaptive algorithm is required, so that each SU can independently and adaptively adjust its own strategies based on its individually experienced action-reward history, without a coordinator or any information exchange.

VI. STOCHASTIC LEARNING AUTOMATA (SLA) BASED ALGORITHM

In this section, we propose a fully distributed algorithm based on Stochastic Learning Automata (SLA) and investigate its convergence to the pure-strategy NEP of $G$ in an unknown and dynamic environment. Since the game $G$ in scenario (I) is proved to be an exact potential game, the existing SLA based algorithms for achieving the pure-strategy NEPs (e.g., [] and [8]) can be applied directly. However, investigating the convergence property of SLA based algorithms for the ordinal potential game in scenario (II) is challenging. To the best of our knowledge, no SLA based algorithm is available in the literature for general ordinal potential games. Thus, in this section, we focus our discussion on scenario (II). We first define a modified noncooperative game $\tilde{G}$ and show that it has the same NEP set as the original game $G$. Then, by proving $\tilde{G}$ to be a weighted potential game, we propose a new SLA based algorithm, called N-SLA, to achieve the pure-strategy NEP of $\tilde{G}$, which equivalently proves the convergence towards the NEPs of $G$.

A. A Modified Noncooperative Game

We define a modified noncooperative game from $G$ as follows.

Definition 3: A modified noncooperative game is defined as $\tilde{G} = \{\mathcal{M}, \mathcal{N}, \{\tilde{R}_m(a_m, a_{-m})\}_{m \in \mathcal{M}}\}$, where $\mathcal{M}$ and $\mathcal{N}$ are the same as in the game $G$, and

\[ \tilde{R}_m(a_m, a_{-m}) = \log\Big( \theta_{a_m} P_m \prod_{l \in C_{a_m},\, l \ne m} (1 - P_l) \Big) + L, \quad \forall m \in \mathcal{M}, \]

where $L > 0$ is a predefined constant that guarantees the utility $\tilde{R}_m(a_m, a_{-m})$ is nonnegative.

Proposition 1: The modified game $\tilde{G}$ has the same NEP set as the original game $G$.
Proof: In the modified game $\tilde{G}$, each player $m$ ($m \in \mathcal{M}$) tries to maximize its utility $\tilde{R}_m(a_m, a_{-m})$, i.e.,

\[ \max_{a_m \in \mathcal{N}} \tilde{R}_m(a_m, a_{-m}), \quad \forall m \in \mathcal{M}. \tag{31} \]

Using the monotonicity of the logarithm function, we have

\[ a_m^* = \arg\max_{a_m \in \mathcal{N}} \tilde{R}_m(a_m, a_{-m}) = \arg\max_{a_m \in \mathcal{N}} \log\Big( \theta_{a_m} P_m \prod_{l \in C_{a_m},\, l \ne m} (1 - P_l) \Big) = \arg\max_{a_m \in \mathcal{N}} \theta_{a_m} P_m \prod_{l \in C_{a_m},\, l \ne m} (1 - P_l) = \arg\max_{a_m \in \mathcal{N}} \hat{R}_m(a_m, a_{-m}), \quad \forall m \in \mathcal{M}. \tag{32} \]

Thus, the optimization problems (4) and (31) have the same solution set, or in other words, the modified game $\tilde{G}$ and the original game $G$ have the same NEP set.

Proposition 2: The modified game $\tilde{G}$ is a weighted potential game.

Proof: According to (20), for $\forall m \in \mathcal{M}$ and $\forall a_m, a_m' \in \mathcal{N}$, $a_m \ne a_m'$,

\[ \Phi_2(a_m, a_{-m}) - \Phi_2(a_m', a_{-m}) = -\beta_m \big( \log(\hat{R}_m(a_m, a_{-m})) - \log(\hat{R}_m(a_m', a_{-m})) \big) = -\beta_m \big( \tilde{U}_m(a_m, a_{-m}) + L - \tilde{U}_m(a_m', a_{-m}) - L \big) = -\beta_m \big( \tilde{R}_m(a_m, a_{-m}) - \tilde{R}_m(a_m', a_{-m}) \big). \tag{33} \]

According to [], the modified game $\tilde{G}$ is a weighted potential game, where the weighted potential function is $\Phi_2 : a = (a_1, \ldots, a_M) \to \mathbb{R}$ and the weight vector is $(-\beta_1, \ldots, -\beta_M)$. Since (33) complies with the property (7), the weighted potential game is also an ordinal potential game, so that there exists at least one pure-strategy NEP in $\tilde{G}$.

In the next subsection, we propose a new SLA based algorithm to find the pure-strategy NEP of the modified game $\tilde{G}$, which is also the NEP of the original game $G$.

B. N-SLA Algorithm

1) Algorithm Description: First, define $p_m(j) = (p_{m1}(j), p_{m2}(j), \ldots, p_{mN}(j))$ as the mixed strategy of SU $m$ ($m \in \mathcal{M}$) in the $j$th iteration, where $p_{mn}(j)$ ($n \in \mathcal{N}$) is the probability for SU $m$ to choose channel $n$ in the $j$th iteration and $\sum_{n=1}^{N} p_{mn}(j) = 1$, $\forall j \ge 0$, $\forall m \in \mathcal{M}$.
The key ideas of the proposed N-SLA algorithm are as follows: 1) each SU $m$ ($m \in \mathcal{M}$) chooses one primary channel for sensing and access in each iteration according to the current mixed strategy $p_m(j)$; 2) the mixed strategy $p_m(j)$ of each SU $m$ is updated based on $\widetilde{Rw}_m(a_m(j), a_{-m}(j))$, which is defined as

\[ \widetilde{Rw}_m(a_m(j), a_{-m}(j)) = \frac{\log(\hat{R}_m(a_m(j), a_{-m}(j))) + L}{L} \tag{34} \]

and converges to a pure-strategy NEP at the end. The details of the proposed N-SLA algorithm are described in Algorithm 2.

Remark 2: Note that in our proposed N-SLA algorithm, SUs do not need to compute $\hat{R}_m(a_m(j), a_{-m}(j))$ according to (5), which would require information on both the channel statistics and the other SUs, including their channel selection strategies and their transmission probabilities. Instead, each SU $m$ ($m \in \mathcal{M}$) can estimate $\hat{R}_m(a_m(j), a_{-m}(j))$ by only measuring the successful accesses it achieves within $T$ slots on the selected channel $a_m(j)$. Since the mixed strategy of each SU is updated only based on the individually experienced action-reward history, there is no need for a coordinator to manage the sequential access of SUs, and each SU independently and automatically updates its strategy without any information exchange. Therefore, the proposed N-SLA algorithm is fully distributed.

Remark 3: According to the existing SLA based algorithms ([ 8]), the mixed strategy $p_m(j)$ of each SU $m$ is updated based on the received instantaneous reward $r_m(a_m, a_{-m}, t)$. However, such an updating rule is not feasible in our problem since the sufficient condition specified by Theorem . in [] for $\{p_m(j)\}_{m \in \mathcal{M}}$ converging to the pure-strategy NEPs no longer holds in ordinal potential games. In the implementation of the proposed N-SLA algorithm, the mixed strategy of each SU is updated based on $\widetilde{Rw}_m(a_m(j), a_{-m}(j))$. In the following, we will show that such an updating rule guarantees the convergence towards the pure-strategy NEPs of the formulated ordinal potential game.

Algorithm 2: The stochastic learning automata (SLA) based Algorithm
  Initialization: Set $j = 0$ and $p_m(j) = (\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N})$, $\forall m \in \mathcal{M}$.
  While there exists a $p_m(j)$ ($m \in \mathcal{M}$) in which the maximum $p_{mn}(j)$ ($n \in \mathcal{N}$) is less than 0.99,
    each SU $m$ selects a channel $a_m(j)$ according to its current mixed strategy $p_m(j)$.
    for $t = jT : T + jT$,
      in time slot $t$, each SU $m$ ($m \in \mathcal{M}$) performs channel sensing and channel contention on the selected channel $a_m(j)$. At the end of the $t$th slot, each SU $m$ receives the random reward $\hat{r}_m(a_m(j), a_{-m}(j), t)$ specified by

\[ \hat{r}_m(a_m(j), a_{-m}(j), t) = X_{a_m(j)}(t)\, I_m(C_{a_m(j)}, t) \tag{35} \]

    end for
    for $m = 1 : M$,
      SU $m$ estimates $\hat{R}_m(a_m(j), a_{-m}(j))$ according to

\[ \hat{R}_m(a_m(j), a_{-m}(j)) = \hat{R}_m^{est}(a_m(j), a_{-m}(j)) + \xi \tag{36} \]

\[ \hat{R}_m^{est}(a_m(j), a_{-m}(j)) = \frac{\sum_{t=jT}^{T+jT} \hat{r}_m(a_m(j), a_{-m}(j), t)}{T} \tag{37} \]

      where $\xi$ is the estimation error.
    end for
    Each SU $m$ ($m \in \mathcal{M}$) updates its $p_m(j)$ according to the following rule:

\[ \begin{aligned} p_{mn}(j+1) &= p_{mn}(j) + b\, \widetilde{Rw}_m(a_m(j), a_{-m}(j)) (1 - p_{mn}(j)), & n &= a_m(j) \\ p_{mn}(j+1) &= p_{mn}(j) - b\, \widetilde{Rw}_m(a_m(j), a_{-m}(j))\, p_{mn}(j), & n &\ne a_m(j) \end{aligned} \tag{38} \]

    where $0 < b < 1$ is the step size.
    Update $j = j + 1$.
  end While

2) Convergence of the proposed N-SLA algorithm: In this subsection, we show that under our proposed N-SLA algorithm, the channel selection probability vector $p_m(j)$ ($\forall m \in \mathcal{M}$) converges to the pure strategy NEP of the modified game $\tilde{G}$. The proof consists of two stages. In the first stage, we derive an ordinary differential equation (ODE) whose solution approximates the asymptotic behavior of the channel selection probability matrix $P(j) = [p_1^T(j), \ldots, p_M^T(j)]$ if the step size $b$ used in (38) is sufficiently small, where $(\cdot)^T$ denotes the transpose operation.
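The learning loop of Algorithm 2 can be sketched in a few lines to make the update (38) concrete. Several choices below are assumptions rather than part of the paper: the loop runs for a fixed number of iterations instead of the 0.99 stopping test, each iteration uses exactly $T$ slots, the estimate of (37) is clipped from below at $e^{-L}$ so that the reward (34) stays in $[0, 1]$ even when no access succeeds, and all parameter values are illustrative.

```python
import math
import random

def n_sla(theta, P, L, b, T, iterations, rng):
    """Sketch of the N-SLA loop: sample channels, measure successes, apply update (38)."""
    num_sus, num_channels = len(P), len(theta)
    probs = [[1.0 / num_channels] * num_channels for _ in range(num_sus)]
    for _ in range(iterations):
        # Each SU draws a channel from its current mixed strategy.
        actions = [rng.choices(range(num_channels), weights=probs[m])[0]
                   for m in range(num_sus)]
        successes = [0] * num_sus
        for _ in range(T):
            idle = [rng.random() < theta[n] for n in range(num_channels)]
            sends = [idle[actions[m]] and rng.random() < P[m] for m in range(num_sus)]
            for m in range(num_sus):
                alone = not any(sends[l] and actions[l] == actions[m]
                                for l in range(num_sus) if l != m)
                if sends[m] and alone:
                    successes[m] += 1  # reward (35) equals 1 in this slot
        for m in range(num_sus):
            # Estimate (37), clipped (assumption) so that the logarithm is defined.
            estimate = max(successes[m] / T, math.exp(-L))
            reward = (math.log(estimate) + L) / L  # normalized reward (34)
            for n in range(num_channels):
                if n == actions[m]:
                    probs[m][n] += b * reward * (1.0 - probs[m][n])
                else:
                    probs[m][n] -= b * reward * probs[m][n]
    return probs

# Hypothetical run: three SUs sharing two channels.
final_probs = n_sla(theta=[0.8, 0.5], P=[0.5, 0.6, 0.4], L=10.0, b=0.1,
                    T=50, iterations=200, rng=random.Random(1))
```

The update keeps every row of `final_probs` a valid probability vector, because the mass added to the selected channel equals the mass removed from the others. Each SU uses only its own success count, which is the sense in which the scheme needs no information exchange.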
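In the second stage, we characterize the solutions of the ODE and derive the long term behavior of $P(\cdot)$.

Ordinary differential equation (ODE): Define $\widetilde{Rw}(j) = (\widetilde{Rw}_1(a_1(j), a_{-1}(j)), \ldots, \widetilde{Rw}_M(a_M(j), a_{-M}(j)))$ as the reward profile at the $j$th iteration, and $G(\cdot)$ as a function of $P(j)$, $a(j)$, and $\widetilde{Rw}(j)$, which represents the updating rule specified by (38). Then, the matrix form of the updating rule given by (38) can be expressed as

\[ P(j+1) = P(j) + b\, G(P(j), a(j), \widetilde{Rw}(j)). \tag{39} \]

Define $f(P)$ as the conditional expectation of $G(P(j), a(j), \widetilde{Rw}(j))$ given $P(j) = P$, i.e.,

\[ f(P) = E[G(P(j), a(j), \widetilde{Rw}(j)) \mid P(j) = P]. \tag{40} \]

Lemma 1: Define a piecewise-constant interpolation of $P(j)$ as

\[ P^b(y) = P(j), \quad y \in [jb, (j+1)b). \tag{41} \]

As the step size $b \to 0$, the sequence $\{P^b(\cdot)\}$ converges weakly to $Q(\cdot)$, which is the solution of the following ODE:

\[ \frac{dQ}{dt} = f(Q), \quad Q(0) = P(0) \tag{42} \]

where $P(0)$ is the initial channel selection probability matrix.

Proof: The proof follows the same procedure used to prove Theorem . in [].

Solution to the ODE: Since $P$ contains $N \times M$ components, denoted by $p_{mn}$, $m \in \mathcal{M}$, $n \in \mathcal{N}$, the component equations of (42) can be written as

\[ \begin{aligned} \frac{dp_{mn}}{dt} &= p_{mn}(1 - p_{mn})\, E[\widetilde{Rw}_m \mid P, a_m = n] - \sum_{n' \in \mathcal{N},\, n' \ne n} p_{mn'}\, p_{mn}\, E[\widetilde{Rw}_m \mid P, a_m = n'] \\ &= p_{mn} \sum_{n' \in \mathcal{N},\, n' \ne n} p_{mn'}\, E[\widetilde{Rw}_m \mid P, a_m = n] - p_{mn} \sum_{n' \in \mathcal{N},\, n' \ne n} p_{mn'}\, E[\widetilde{Rw}_m \mid P, a_m = n'] \\ &= p_{mn} \Big( \sum_{n' \in \mathcal{N},\, n' \ne n} p_{mn'} \big( E[\widetilde{Rw}_m \mid P, a_m = n] - E[\widetilde{Rw}_m \mid P, a_m = n'] \big) \Big) \\ &= p_{mn} \Big( \sum_{n' \in \mathcal{N}} p_{mn'} \big( E[\widetilde{Rw}_m \mid P, a_m = n] - E[\widetilde{Rw}_m \mid P, a_m = n'] \big) \Big), \quad \forall m \in \mathcal{M},\ \forall n \in \mathcal{N} \end{aligned} \tag{43} \]

where the second equality uses $1 - p_{mn} = \sum_{n' \ne n} p_{mn'}$, and $E[\widetilde{Rw}_m \mid P, a_m = n]$ denotes the expected normalized reward of SU $m$ if it employs the pure strategy $n$ while any other SU $m'$ ($\forall m' \in \mathcal{M}$, $m' \ne m$) employs the mixed strategy $p_{m'}$. Specifically, $E[\widetilde{Rw}_m \mid P, a_m = n]$ can be represented as

\[ E[\widetilde{Rw}_m \mid P, a_m = n] = \sum_{(a_1, \ldots, a_{m-1}, a_{m+1}, \ldots, a_M)} \widetilde{Rw}_m(n, a_{-m}) \prod_{m' \in \mathcal{M},\, m' \ne m} p_{m' a_{m'}}. \tag{44} \]

Theorem 3: With a sufficiently small parameter $b$, $P$ converges to a pure NEP of the modified game $\tilde{G}$.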

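Proof: Define a function $F(P)$ as
$$F(P) = E[\Phi(a_m, a_{-m}) \mid P] = \sum_{n=1}^{N} p_{mn}\, E[\Phi(n, a_{-m}) \mid a_m = n, P] = \sum_{n=1}^{N} p_{mn} \sum_{(a_1, \ldots, a_{m-1}, a_{m+1}, \ldots, a_M)} \Phi(n, a_{-m}) \prod_{m' \in \mathcal{M}, m' \neq m} p_{m' a_{m'}} \tag{45}$$
Thus,
$$\frac{\partial F(P)}{\partial p_{mn}} = \sum_{(a_1, \ldots, a_{m-1}, a_{m+1}, \ldots, a_M)} \Phi(n, a_{-m}) \prod_{m' \in \mathcal{M}, m' \neq m} p_{m' a_{m'}}, \quad \forall m \in \mathcal{M},\ n \in \mathcal{N} \tag{46}$$
and
$$\frac{dF(P)}{dt} = \sum_{m \in \mathcal{M}, n \in \mathcal{N}} \frac{\partial F(P)}{\partial p_{mn}} \frac{dp_{mn}}{dt} \tag{47}$$
Substituting (43) and (46) into (47), we have
$$\begin{aligned}
\frac{dF(P)}{dt} &= \sum_{m \in \mathcal{M}, n \in \mathcal{N}} E[\Phi(n, a_{-m}) \mid P, a_m = n]\, p_{mn} \Big( \sum_{n' \in \mathcal{N}} p_{mn'} \big( E[Rw_m \mid P, a_m = n] - E[Rw_m \mid P, a_m = n'] \big) \Big) \\
&= -\frac{L}{2} \sum_{m \in \mathcal{M}, n \in \mathcal{N}, n' \in \mathcal{N}} \beta_m\, p_{mn}\, p_{mn'} \big( E[Rw_m \mid P, a_m = n] - E[Rw_m \mid P, a_m = n'] \big)^2
\end{aligned} \tag{48}$$
Since $\beta_m < 0$, $\forall m \in \mathcal{M}$, every term in (48) is nonnegative, so $F(P)$ is nondecreasing along time. In addition, since $F(P)$ is bounded as specified in (45), $P$ converges to a point $P^*$ such that $dF(P^*)/dt = 0$. Thus, from (48) and (43), we have
$$\begin{aligned}
\frac{dF(P^*)}{dt} = 0 &\Rightarrow p^*_{mn}\, p^*_{mn'} \big( E[Rw_m \mid P^*, a_m = n] - E[Rw_m \mid P^*, a_m = n'] \big) = 0, \quad \forall m \in \mathcal{M},\ n, n' \in \mathcal{N} \\
&\Rightarrow \frac{dp^*_{mn}}{dt} = 0, \quad \forall m \in \mathcal{M},\ n \in \mathcal{N} \\
&\Rightarrow P^* \text{ is a stable stationary point of the ODE (42).}
\end{aligned} \tag{49}$$
Since, following [11], all stable stationary points of the ODE are NEPs, the Theorem holds.

Fig. 2. Evolution of the channel selection actions by the BR based algorithm for the unconstrained transmission case (channel index $n$ versus slot index $t$; curves $a_1(t)$ to $a_4(t)$).
Fig. 3. Evolution of the channel selection actions by the BR based algorithm for the constrained transmission case (channel index $n$ versus slot index $t$; curves $a_1(t)$ to $a_8(t)$).
Fig. 4. Evolution of the channel selection probabilities for SU 4 by the N-SLA algorithm (channel selection probabilities $p_{41}$ to $p_{44}$ versus iteration index $j$).

VII. SIMULATION RESULTS

In this section, we provide numerical results to demonstrate the performance of both the BR based and the N-SLA algorithms. We first illustrate the convergence of both algorithms. We then evaluate the performance in terms of sum log expected throughput, which is a commonly used metric to evaluate the tradeoff between efficiency and fairness among multiple users [4, 5]. In simulations, we set b = 0., T = 00 slots, and $r_m$ =, $\forall m \in \mathcal{M}$.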
Note that this simplification may affect the absolute performance values, but not the optimization. In addition, both the mean channel availability parameters $\theta_n$ ($\forall n \in \mathcal{N}$) and the transmission probabilities $P_m$ ($0 < P_m < 1$, $\forall m \in \mathcal{M}$) are drawn from the uniform distribution, i.e., $\theta_n \sim U[0, 1]$, $\forall n \in \mathcal{N}$, and $P_m \sim U[0, 1]$, $\forall m \in \mathcal{M}$.

A. Convergence behavior of the proposed BR based and N-SLA algorithms

Figs. 2 and 3 show the convergence behavior of $a_m(t)$ ($\forall m \in \mathcal{M}$) over time $t$ under the BR based algorithm, for scenario (I) and scenario (II), respectively. In Fig. 2, $M = 4$, $N = 8$, $(\theta_1, \ldots, \theta_8)$ = (0.5, 0., 0.7, 0.6, 0., 0.8, 0., 0.), and in Fig. 3, $M = 8$, $N = 4$, $(\theta_1, \ldots, \theta_4)$ = (0.8, 0.6, 0.5, 0.), $(P_1, \ldots, P_8)$ = (0., 0., 0.6, 0.5, 0.4, 0.8, 0.6, 0.5). From these two figures, we can see that under scenario (I) (i.e., Fig. 2), the network converges after $M$ time slots and the selected channels are the $M$ most available primary channels, $(\theta_1, \theta_3, \theta_4, \theta_6) = (0.5, 0.7, 0.6, 0.8)$, while under scenario (II) (i.e., Fig. 3), because the SUs are heterogeneous and have different transmission probabilities, convergence takes longer than in scenario (I).

Figs. 4 to 8 show the convergence behavior of the N-SLA algorithm, by setting L = and all other parameters to be the same as in Fig. 3. Fig. 4 plots the evolution of the channel selection probabilities $p_{mn}(j)$ ($\forall n \in \mathcal{N}$) for an arbitrary SU. It can be seen that the channel selection probability vector $p_m(j)$ evolves from the mixed strategy $(1/4, 1/4, 1/4, 1/4)$ to the pure strategy $(0, 0, 0, 1)$ within about 450 iterations. We further show in Figs. 5 to 8 the evolution of the channel selection actions $a_m(j)$ for each SU $m$ ($m \in \mathcal{M}$). It can be seen that after the network converges, SUs , , 5, 8 always select channel to access (as shown in Fig. 5), SUs , 7 always select channel to access (as shown in Fig. 6), while SU 6 and SU 4 always select channel and channel 4 to access, respectively (as shown in Figs. 7 and 8). This channel selection result can easily be verified to be an NEP of the formulated game.

Fig. 5. Evolution of the channel selection actions for SUs , , 5, 8 by the N-SLA algorithm (channel index $n$ versus iteration index $j$).
Fig. 6. Evolution of the channel selection actions for SUs , 7 by the N-SLA algorithm (channel index $n$ versus iteration index $j$).
Fig. 7. Evolution of the channel selection actions for SU 6 by the N-SLA algorithm (channel index $n$ versus iteration index $j$).
Fig. 8. Evolution of the channel selection actions for SU 4 by the N-SLA algorithm (channel index $n$ versus iteration index $j$).
Fig. 9. Comparison of the proposed two algorithms in terms of the convergence values of the ordinal potential function (minimum and maximum values for N-SLA and BR in scenarios 1 to 3).

Fig. 9 compares the proposed BR based and N-SLA algorithms in terms of the values of the ordinal potential function at convergence. In this simulation, we consider three scenarios with respect to the transmission probability vector $(P_1, \ldots, P_M)$. In scenario 1, the transmission probabilities of SUs are dissimilar, i.e., $(P_1, \ldots, P_M)$ = (0., 0., 0., 0.4, 0.5, 0.6, 0.7, 0.8); in scenario 2, the transmission probabilities across different SUs are randomly selected as $(P_1, \ldots, P_8)$ = (0., 0., 0.6, 0.5, 0.4, 0.8, 0.6, 0.5); and in scenario 3, the transmission probabilities of all SUs are identical, i.e., $P_m = 0.5$, $\forall m \in \mathcal{M}$. Other parameter settings are $M = 8$, $N = 4$, $(\theta_1, \ldots, \theta_4)$ = (0.8, 0.6, 0.5, 0.), L is 60 for scenario 1, for scenario 2, and 80 for scenario 3. For the

two proposed algorithms, under each scenario, the maximum and the minimum values of the ordinal potential function at convergence are compared over 500 independent trials. From Fig. 9, we can see that in all three scenarios, the two proposed algorithms achieve the same maximum values, but different minimum values. The reason is as follows. As shown in the Theorem, the N-SLA algorithm converges to the NEPs only when $b \to 0$. However, $b$ cannot be too small in practice, since a smaller $b$ leads to slower convergence. Thus, for a practical value of $b$, the solutions found by the N-SLA algorithm may be non-equilibrium points, which lead to lower values of the ordinal potential function at convergence. Nevertheless, the identical maximum values achieved by the two proposed algorithms show that the N-SLA algorithm can still converge to the NEPs under the current parameter settings. Table I further shows the percentage of trials in which the N-SLA algorithm converges to NEPs in each of the three scenarios. From the table, we can see that the N-SLA algorithm converges to the NEPs most of the time, which indicates the effectiveness of the proposed fully distributed algorithm.

TABLE I
THE EQUILIBRIUM PERCENTAGE ACHIEVED BY THE N-SLA ALGORITHM IN ALL THREE SCENARIOS
scenario 1    scenario 2    scenario 3
94.6%         9.%           9.4%

Fig. 10. Performance comparison of different approaches in case I (sum log expected throughput versus the number of SUs $M$; curves: global optimal solution, random selection scheme, proposed algorithms).
Fig. 11. Performance comparison of different approaches in case II (sum log expected throughput versus the number of SUs $M$; curves: global optimal solution, random selection scheme, proposed algorithms).
Fig. 12. Performance comparison of different approaches in case III (achievable sum log expected throughput versus the number of SUs $M$; curves: global optimal solution, random selection scheme, proposed algorithms).

B.
Sum log expected throughput of the proposed algorithms

In this subsection, we evaluate the system performance in terms of the sum log expected throughput achieved by the NEP solutions. For comparison, a random selection scheme and a globally optimal solution are also simulated. In the random selection scheme, each SU randomly chooses a channel in each time slot, while the globally optimal solution maximizes the sum log expected throughput in a centralized manner, assuming that all information, including the primary channel statistics and the SUs' transmission probabilities, is known a priori. The presented results are averages over 000 independent trials.

The performance comparison is carried out under three different cases, as shown in Figs. 10-12. Case I, shown in Fig. 10, considers a homogeneous OSA network where the availability statistics of the primary channels are identical and the transmission probabilities of the SUs are identical. The parameter settings are $\theta_n = 0.7$, $\forall n \in \mathcal{N}$, and $P_m = 0.5$, $\forall m \in \mathcal{M}$. From Fig. 10, we can see that the proposed algorithms achieve the same performance as the globally optimal solution. This is because in case I, the ordinal potential function specified by (0) can be rewritten as
$$\Phi(a_m, a_{-m}) = \Big( -\frac{1}{2} \log(1 - P) \Big) \Big( \sum_{m \in \mathcal{M}} \log \hat{R}_m(a_m, a_{-m}) + M \log \theta + M \log \frac{1 - P}{P} \Big).$$
As discussed before, the proposed algorithms try to maximize the ordinal potential function $\Phi(a_m, a_{-m})$. Since the prefactor is a positive constant and the last two terms do not depend on the channel selection, the obtained NEP solution $(a_m^*, a_{-m}^*)$ satisfies
$$(a_m^*, a_{-m}^*) = \arg\max_{(a_m, a_{-m})} \Phi(a_m, a_{-m}) = \arg\max_{(a_m, a_{-m})} \sum_{m \in \mathcal{M}} \log \hat{R}_m(a_m, a_{-m}) = \arg\max_{(a_m, a_{-m})} \Big( -\sum_{n \in \mathcal{N}} C_n^2 \Big).$$
According to the procedures of the algorithm, all the pure-strategy NEPs achieve the same $\sum_{n \in \mathcal{N}} C_n^2$. Thus, the proposed algorithms achieve globally optimal performance.

Case II considers a CR network with different availability statistics across the primary channels, but identical transmission probabilities across the SUs, as shown in Fig. 11.
The parameter settings are $(\theta_1, \theta_2, \theta_3)$ = (0.7, 0.5, 0.), and $P_m = 0.5$, $\forall m \in \mathcal{M}$. From this figure, the performance achieved by the proposed algorithms is very close to the globally optimal solution. This is because in case II, $\Phi(a_m, a_{-m})$ can be rewritten as
$$\Phi(a_m, a_{-m}) = \Big( -\frac{1}{2} \log(1 - P) \Big) \Big( \sum_{m \in \mathcal{M}} \log \hat{R}_m(a_m, a_{-m}) + \sum_{m \in \mathcal{M}} \log \theta_{a_m} + M \log \frac{1 - P}{P} \Big),$$
and the NEP solution $(a_m^*, a_{-m}^*)$ achieved by the proposed algorithms satisfies
$$(a_m^*, a_{-m}^*) = \arg\max_{(a_m, a_{-m})} \Big( \sum_{m \in \mathcal{M}} \log \hat{R}_m(a_m, a_{-m}) + \sum_{m \in \mathcal{M}} \log \theta_{a_m} \Big).$$
Thus, due to the additional term $\sum_{m \in \mathcal{M}} \log \theta_{a_m}$, the proposed algorithms cannot always find the globally optimal solution.

In case III, shown in Fig. 12, the simulated CR network has different availability statistics across the primary channels and different transmission probabilities across the SUs. Specifically, the simulation parameters are set as: $(\theta_1, \theta_2, \theta_3)$ = (0.7, 0.5, 0.), $(P_1, \ldots, P_M)$ = (0., 0.7, 0.5, 0.) for M = 4, $(P_1, \ldots, P_M)$ = (0., 0.6, 0., 0.8, 0.5, 0.) for M = 6, $(P_1, \ldots, P_M)$ = (0.4, 0., 0.6, 0., 0.8, 0., 0.5, 0.6) for M = 8, and $(P_1, \ldots, P_M)$ = (0., 0.4, 0.4, 0.5, 0., 0.8, 0., 0.6, 0., 0.) for M = 0. Since in case III the ordinal potential function $\Phi(a_m, a_{-m})$ specified by (0) differs substantially from the global objective $\sum_{m \in \mathcal{M}} \log \hat{R}_m(a_m, a_{-m})$, an obvious gap exists between the proposed algorithms and the globally optimal solution. However, in all three figures, the proposed algorithms always outperform the random selection approach. Thus, we can conclude that for distributed channel access, the proposed algorithms not only converge to the pure-strategy NEPs, but also achieve better system performance in terms of sum log expected throughput than the random selection approach.

VIII. CONCLUSION

In this paper, we investigate the problem of distributed throughput maximization of an OSA network in an unknown and dynamic environment. We first formulate this distributed optimization as an ordinal potential game, which has at least one pure-strategy NEP. Then, to achieve the pure-strategy NEPs, we propose two algorithms: a BR based algorithm and an N-SLA algorithm.
The BR based algorithm is proved to converge towards NEPs, but it requires information exchange among SUs, whereas the N-SLA algorithm is fully distributed and converges towards NEPs without any information exchange. Our simulation results demonstrate that the proposed algorithms can not only guarantee convergence to NEPs, but also achieve good network performance in terms of sum log expected throughput.

REFERENCES

[1] Y. Zhao, S. Mao, J. Neel, and J. Reed, Performance evaluation of cognitive radios: Metrics, utility functions, and methodology, Proc. IEEE, vol. 97, no. 4, pp , Apr
[2] S. Haykin, Cognitive radio: Brain-empowered wireless communications, IEEE J. Sel. Areas Commun., vol., no., pp. 0-0, Feb
[3] M. Masonta, M. Mzyece, and N. Ntlatlapa, Spectrum decision in cognitive radio networks: A survey, IEEE Commun. Surveys Tuts., vol. 5, no., pp , rd Quart. 0.
[4] Fed. Commun. Comm., Second memorandum opinion and order (FCC 0-74), Sep. 00. [Online]. Available: Releases/Daily Business/ 00/db09/FCC-0-74A.pdf.
[5] M. Bkassiny, Y. Li, and S. K. Jayaweera, A survey on machine-learning techniques in cognitive radios, IEEE Commun. Surveys Tuts., vol. 5, no., pp. 6-59, rd Quart. 0.
[6] Q. Zhao, L. Tong, A. Swami, and Y. Chen, Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework, IEEE J. Sel. Areas Commun., vol. 5, no., pp , Apr
[7] X. Chen and J. Huang, Distributed spectrum access with spatial reuse, IEEE J. Sel. Areas Commun., vol., no., pp , Mar. 0.
[8] R. Southwell, X. Chen, and J. Huang, Quality of service games for spectrum sharing, IEEE J. Sel. Areas Commun., vol., no., pp , Mar. 04.
[9] M. Azarafrooz and R. Chandramouli, Distributed learning in secondary spectrum sharing graphical game, IEEE GLOBECOM, pp. -6.
[10] N. Cheng, N. Zhang, N. Lu, X. Shen, J. W. Mark, and F. Liu, Opportunistic spectrum access for CR-VANETs: A game-theoretic approach, IEEE Trans. Veh. Technol., vol. 6, no., pp. 7-5, Jan. 04.
[] P. Sastry, V. Phansalkar, and M. Thathachar, Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information, IEEE Trans. Syst., Man, Cybern. B, vol. 4, no. 5, pp , 994. [] D. Monderer and L. Shapley, Potential games, Games Economic Behavior, vol. 4, pp. 4-4, 996. [] K. Cohen, A. Leshem, and E. Zehavi, Game theoretic aspects of the multi-channel ALOHA protocol in cognitive radio networks, IEEE J. Sel. Areas Commun., vol., no., pp , Nov. 0. [4] K. Cohen and A. Leshem, Distributed game-theoretic optimization and management of multichannel ALOHA networks, IEEE/ACM Trans. on Netw., vol. 4, no., pp. 78-7, Jun. 06. [5] C. Singh, A. Kumar, and R. Sundaresan, Combined base station association and power control in multichannel cellular networks, IEEE/ACM Trans. Netw., vol. 4, no., pp , Apr. 06. [6] S. Ahmad, M. Liu, T. Javidi, et al., Optimality of myopic sensing in multichannel opportunistic access, IEEE Trans. Inf. Theory, vol. 55, no. 9, pp , 009. [7] Y. Chen, Q. Zhao, and A. Swami, Distributed spectrum sensing and access in cognitive radio networks with energy constraint, IEEE Trans. Signal Process., vol. 57, no., pp , 009. [8] Y. Chen, Q. Zhao, and A. Swami, Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors, IEEE Trans. Inf. Theory, vol. 54, no. 5, pp , May 008. [9] K. Liu and Q. Zhao, Decentralized multi-armed bandit with multiple distributed players, in Proc. 00 Inf. Theory and Appl. Workshop, pp. -0. [0] Y. Gai, B. Krishnamachari, and R. Jain, Learning mul-


12 tiuser channel allocations in cognitive radio networks: a combinatorial multi-armed bandit formulation, in Proc. IEEE DySPAN 0, pp. -9. [] K. Liu and Q. Zhao, Distributed learning in multi-armed bandit with multiple players, IEEE Trans. Signal Process., vol. 58, no., pp , 00. [] A. Anandkumar, N. Michael, and A. Tang, Opportunistic spectrum access with multiple users: learning under competition, IEEE INFOCOM 0, pp. -9. [] M. Zandi, M. Dong, and A. Grami, Distributed stochastic learning and adaptation to primary traffic for dynamic spectrum access, IEEE Trans. on Wireless Commun., vol. 5, no., pp , Mar. 06. [4] Y. Xu, Q. Wu, J. Wang, L. Shen, and A. Anpalagan, Robust multiuser sequential channel sensing and access in dynamic cognitive radio networks: Potential games and stochastic learning, IEEE Trans. on Veh. Technol., vol. 64, no. 8, pp , Aug. 05. [5] Q. Wu et al., Distributed channel selection in timevarying radio environment: Interference mitigation game with uncoupled stochastic learning, IEEE Trans. Veh. Technol., vol. 6, no. 9, pp , Nov. 0. [6] J. Zheng, Y. Cai, Y. Xu, and A. Anpalagan, Distributed channel selection for interference mitigation in dynamic environment: A game-theoretic stochastic learning solution, IEEE Trans. Veh. Technol., vol. 6, no. 9, pp , Nov. 04. [7] J. Zheng, Y. Cai, N. Lu, Y. Xu, and X. Shen, Stochastic game-theoretic spectrum access in distributed and dynamic environment, IEEE Trans. on Veh. Technol., vol. 64, no. 0, pp , Oct. 05. [8] R. W. Rosenthal, A class of games possessing pure strategy Nash equilibria, J. Game Theory, vol., pp , 97. [9] Y. Xu, J. Wang, Q. Wu, A. Anpalagan, and Y. Yao, Opportunistic spectrum access in unknown dynamic environment: A game-theoretic stochastic learning solution, IEEE Trans. on Wireless Commun., vol., no. 4, pp. 80-9, Apr. 0. [0] B. Vcking and R. Aachen, Congestion games: Optimization in competition, in Proc. Algorithms Complexity Durham Workshop, Sep [] K. Liu and Q. 
Zhao, Distributed learning in multi-armed bandit with multiple player, IEEE Trans. Signal Process., vol. 58, no., pp , Nov. 00. [] A. Anandkumar, N. Michael, K. Tang, and A. Swami, Distributed algorithms for learning and cognitive medium access with logarithmic regret, IEEE J. Sel. Areas Commun., vol. 9, no. 4, pp , Apr. 0. [] Y. Gai and B. Krishnamachari, Decentralized online learning algorithms for opportunistic spectrum access, IEEE GLOBECOM, pp. -6. [4] W. Yu, T. Kwon, and C. Shin, Multicell coordination via joint scheduling, beamforming and power spectrum adaptation, IEEE INFOCOM, pp [5] Y. Gai, H. Liu, and B. Krishnamachari, A packet dropping-based incentive mechanism for M/M/ queues with selfish users, IEEE INFOCOM, pp Huijin Cao received her B.E. and M.S. degrees in the School of Information Engineering from Zhengzhou University, Zhengzhou, China, in 00 and 0, respectively. She is currently working towards the Ph.D. degree in telecommunications with the Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, Canada. Her current research interests include cognitive radio, radio resource allocation, game theory, MDP, and machine learning. Jun Cai received the B.Sc. and M.Sc. degrees from Xi an Jiaotong University, Xi an, China, in 996 and 999, respectively, and the Ph.D. degree from the University of Waterloo, ON, Canada, in 004, all in electrical engineering. From June 004 to April 006, he was with McMaster University, Hamilton, ON, as a Natural Sciences and Engineering Research Council of Canada Postdoctoral Fellow. Since July 006, he has been with the Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, Canada, where he is currently an Associate Professor. His current research interests include energy-efficient and green communications, dynamic spectrum management and cognitive radio, radio resource management in wireless communications networks, and performance analysis. Dr. 
Cai served as the Technical Program Committee Co- Chair for the IEEE Vehicular Technology Conference 0 Fall Wireless Applications and Services Track, the IEEE Global Communications Conference (Globecom) 00 Wireless Communications Symposium, and International Wireless Communications and Mobile Computing (IWCMC) Conference 008 General Symposium; the Publicity Co-Chair for IWCMC in 00, 0, 0, and 04; and the Registration Chair for the First International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness (QShine) in 005. He also served on the editorial board of the Journal of Computer Systems, Networks, and Communications and as a Guest Editor of the special issue of the Association for Computing Machinery Mobile Networks and Applications. He received the Best Paper Award from Chinacom in 0, the Rh Award for outstanding contributions to research in applied sciences in 0 from the University of Manitoba, and the Outstanding Service Award from IEEE Globecom in 00.
