Performance of Round Robin Policies for Dynamic Multichannel Access


Changmian Wang, Bhaskar Krishnamachari, Qing Zhao and Geir E. Øien
Norwegian University of Science and Technology, Norway, {changmia, oien}@iet.ntnu.no
University of Southern California, USA, bkrishna@usc.edu
University of California, Davis, USA, qzhao@ece.ucdavis.edu

Abstract—We consider two simple round-robin sensing policies for dynamic multi-channel access in cognitive radio networks: one in which channel switching takes place when the primary user is sensed to be present, and one in which channel switching takes place when the primary user is sensed to be absent. Prior work has shown that these policies are each optimal under certain conditions when the primary user occupancy on each channel can be described as an independent two-state Markov chain. In this work, we consider a very general case in which the primary user occupancy on each channel is an arbitrary stationary and ergodic two-state process, and derive bounds on their performance. The bounds provide insight into the conditions under which these extremely simple policies perform well.

I. INTRODUCTION

Dynamic spectrum access techniques aim to improve the efficiency of radio-frequency utilization for wireless communications. In these techniques, cognitive secondary users (users that do not have priority access to spectrum) attempt to identify and exploit the opportunities for transmission that arise whenever and wherever primary users (users that own or have priority access to spectrum) are not active. We consider the following simple but fundamental dynamic multichannel access problem. There is a single backlogged secondary user. Time is discretized into slots. At the beginning of each slot, the user picks one of multiple orthogonal channels to sense. If that channel is available (i.e., no primary user activity is detected), the user may proceed to use it; otherwise it has to wait out the rest of the slot.
The secondary user must employ some policy to select a channel at each time, taking into account its past observations, so as to maximize its expected long-term throughput. This problem has been considered in prior work. However, in most of these studies, the occupancy of the primary user on the channels has been modeled as Markovian. Under such an assumption, the problem can be formulated as a partially observable Markov decision process (POMDP) [1]. In the particular case when the primary user occupancy on each channel can be modeled as an independent and identically distributed (i.i.d.) two-state Markov chain, it was shown by Zhao, Krishnamachari and Liu [2] that the myopic policy, in which the secondary user picks the channel that maximizes the immediate reward, has a semi-universal structure that obviates the need to know the details of the one-step transition probability matrix. In particular, when the Markov chain describing the primary user occupancy for each channel is positively correlated, the myopic policy is for the secondary user to stick with a channel until it is observed to be busy, in which case it must switch to the channel that was observed the longest ago from the next slot on. When the Markov chain is negatively correlated, the myopic policy is for the secondary user to stick with a channel until it is observed to be free, then switch to either the channel that was observed the longest ago or the most recent previous channel, depending on whether an odd or even number of steps have elapsed since the last channel switch. It has been shown in [2] that the myopic policy achieves optimality for the two-channel case regardless of the sign of the correlation. Building on this work, Ahmad et al. [3] show that for i.i.d. Markov chains the myopic policy is in fact optimal for any number of channels if they are positively correlated; it is also optimal for 3 channels if they are negatively correlated; however, for 4 channels it is not always optimal. Other extensions of these works have considered imperfect sensing [4] and simultaneous multichannel sensing [5], [6]. Unlike these prior works, we relax in this paper the requirement that the primary user behavior be Markovian. Instead, we consider the more general case in which the primary user behavior on each channel can be modeled as an independent stationary and ergodic binary process. We consider two simple round robin policies that are related to the above-mentioned myopic policy for the case of i.i.d. Markov channels, and develop bounds on the performance of these policies in the general setting considered here.

II. CHANNEL MODEL

There are N channels, with each channel i described by an independent two-state stationary ergodic process S^(i)(n), where n is the time index. The two states can be thought of as the busy (0) and free (1) states reflecting the occupancy of the primary user. We define the probability of state a on channel i as φ_a^(i), and the probability of seeing state a followed by state b, i.e. P{S^(i)(n) = a, S^(i)(n+1) = b}, as e_ab^(i). We define the following probability mass function (p.m.f.) for n ≥ 1:

  h_ab^(i)(n) = P{ S^(i)(m+1) = a, S^(i)(m+2) = a, ..., S^(i)(m+n−1) = a, S^(i)(m+n) = b | S^(i)(m−1) = b, S^(i)(m) = a }   (1)

Fig. 1. Illustration of the Zero-Switch Round Robin Policy and the One-Switch Round Robin Policy

which is the probability distribution of how long the process stays in one state immediately after a transition before making its next transition. The average time the process stays in a given state is thus:

  τ_a^(i) = Σ_{n=1}^{∞} n h_ab^(i)(n)   (2)

Let F_ab^(i)(n) be the cumulative distribution function of h_ab^(i)(n) and cF_ab^(i)(n) be the corresponding complementary cumulative distribution function. We can also define the following p.m.f. for n ≥ 1:

  rh_ab^(i)(n) = P{ S^(i)(m+1) = a, S^(i)(m+2) = a, ..., S^(i)(m+n−1) = a, S^(i)(m+n) = b | S^(i)(m) = a }   (3)

which is the probability distribution of the duration that the process stays in one state before making its next transition when the process is entered at a random point with an observation of state a. We denote the corresponding cumulative distribution function and complementary cumulative distribution function by rF_ab^(i)(n) and c_rF_ab^(i)(n) respectively.

III. ROUND ROBIN POLICIES

We assume the secondary user can perform perfect sensing of one channel at each time. We consider two intuitive round-robin channel sensing and access policies for the secondary user:

Zero-switch round robin policy (ZSRRP): the secondary user stays on a given channel, repeatedly accessing it on each round, until it observes a zero, then switches to the channel observed the longest ago.

One-switch round robin policy (OSRRP): the secondary user stays on a given channel until it observes a one, then switches to the channel observed the longest ago.

Note that ZSRRP is identical to the myopic policy for positively correlated i.i.d. Markov chains [2], while OSRRP is a variant of the myopic policy for negatively correlated i.i.d. Markov chains but coincides with it when there are only two channels.
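For concreteness, the two switching rules can be sketched in a few lines of Python (our own illustrative code, not from the paper; the function `run_policy` and its fixed cyclic switching order are assumptions of the sketch — under round-robin operation, the next channel in cyclic order is exactly the channel observed the longest ago):

```python
def run_policy(states, policy, start=0):
    """Simulate a round-robin sensing policy on known channel states.

    states: per-slot sequences; states[t][i] = 1 iff channel i is free
    (no primary user) in slot t.
    policy: 'ZSRRP' switches after observing a 0, 'OSRRP' after a 1.
    Returns the number of slots in which the user sensed a free channel,
    i.e., the slots it could use for transmission.
    """
    n_channels = len(states[0])
    ch, reward = start, 0
    for slot in states:
        obs = slot[ch]
        reward += obs  # a sensed free slot is used for transmission
        switch = (obs == 0) if policy == 'ZSRRP' else (obs == 1)
        if switch:
            ch = (ch + 1) % n_channels  # the channel observed longest ago
    return reward
```

For example, on the three-slot, two-channel realization [(1,0), (0,1), (1,0)], a ZSRRP user starting on channel 0 catches one free slot, while an OSRRP user catches all three.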
Thus the policies we explore in this work are closely related to strategies that have been shown to be optimal for Markov chains under certain conditions in prior work [3].

Fig. 2. The two error patterns for 2 channels for ZSRRP

IV. BOUNDS ON REGRET

Let E[T_π] be the expected average throughput for some given policy π. Further, let E[T_genie] be the expected average throughput that can be achieved by an omniscient secondary user, i.e., one that can always pick a free channel at each time slot if one is available. We define the expected regret E[R_π] as follows:

  E[R_π] = E[T_genie] − E[T_π]   (4)

Thus, if we can find an upper bound E[R_π^+] ≥ E[R_π] on the regret of a given policy, it can be used to obtain a lower bound on that policy's expected average throughput, as follows:

  E[T_π] ≥ E[T_genie] − E[R_π^+]   (5)

We first develop bounds on the regret of both ZSRRP and OSRRP for the case of two channels. The crux of our approach is to identify certain error patterns for each policy: conditions under which it may possibly be misled into missing an opportunity for transmission. We begin with rough bounds, which have the advantage of simplicity in that they consider only two consecutive time slots. We then show how these bounds may be improved by considering corrections that take more slots into account.

A. Bounds for Zero-Switch Round Robin Policy

Figure 2 shows the two error patterns for the ZSRRP policy. The horizontal lines separate the two channels, and the vertical lines separate adjacent time slots. The symbols showing the state of each channel in a given slot are 0, 1, or a dot to indicate a don't care (either 0 or 1). The small x on one of the channels in each slot indicates the channel selection and observation made by the secondary user following the ZSRRP policy. Consider the first pattern. In this case, the secondary user is on the first channel at the first time slot, and observes a 1. Regardless of what happens on the second channel in the first time slot, the ZSRRP policy requires the secondary user to stick with the first channel in the second time slot. However,

in the second slot there is a 0 on the first channel and a 1 on the second channel. Thus, the ZSRRP policy misses an opportunity that is available to the genie. Similarly, in the second error pattern, the secondary user starts from the second channel, but observes a 0 there, causing it to switch to the first channel in the second slot, which has a 0; it thus again misses an opportunity on the second channel, which has a 1 in this second time slot. Not shown are the two symmetric versions of these patterns, with the channel labels reversed. Note that these error patterns must occur for the ZSRRP policy to have missed an opportunity available to the genie. Therefore, quantifying the probability of the occurrence of these patterns yields an upper bound on the regret of ZSRRP. In the first pattern, a 1→0 transition occurs on channel 1, which can be characterized by e_10^(1) as defined above. Since the channels are mutually independent and the probability of channel 2 being in state 1 is given by φ_1^(2) based on the above definitions, the overall probability for this pattern to occur is:

  e_10^(1) φ_1^(2)   (6)

The probability of the second pattern is, similarly:

  e_01^(2) φ_0^(1)   (7)

Equivalent expressions are obtained for the symmetric cases where the channel labels are reversed. The rough upper bound on regret for 2 channels thus obtained is given by^1:

  E[R^+_ZSRRP] = Σ_{i=1}^{2} [ e_10^(i) φ_1^(i+1) + e_01^(i) φ_0^(i+1) − e_10^(i) e_01^(i+1) ]   (8)

It is not hard to see the following fact:

  e_01^(i) = e_10^(i) = 1/(τ_0^(i) + τ_1^(i))   (9)

Further, φ_0^(i) and φ_1^(i), which denote the probabilities of seeing a 0 and a 1 respectively, can also be expressed in terms of the expected lengths of time that the process stays in each state, as follows:

  φ_0^(i) = τ_0^(i)/(τ_0^(i) + τ_1^(i)),   φ_1^(i) = τ_1^(i)/(τ_0^(i) + τ_1^(i))   (10)

Using the above observations, we get the following simplified expression for the bound on the ZSRRP regret when the channels are i.i.d.:

  E[R^+_ZSRRP] = 2 e_10 φ_1 + 2 e_01 φ_0 − 2 e_10 e_01 = 2 (τ_0 + τ_1 − 1)/(τ_0 + τ_1)^2   (11)

Equation (11) shows that the upper bound on the regret of ZSRRP is inversely proportional to the sum of the expected times spent in each state. This intuitive result suggests that the policy performs well when the channel states are positively correlated, which is very much in keeping with the previous findings in the literature ([2], [3]) that it is in fact optimal for positively correlated Markovian channels.

^1 We apologize for the abuse of notation in referring to channel numbers; (i+1) denotes the next channel in the round-robin schedule, allowing for wrap-around. Thus, for two channels, if i = 2, then i+1 = 1.

Fig. 3. The two error patterns for 2 channels for OSRRP

B. Bounds for One-Switch Round Robin Policy

Figure 3 shows the error patterns under which the OSRRP policy misses opportunities. In the first case, OSRRP causes the user to switch out of a channel because it observes a 1, and then miss an opportunity in the next time slot. In the second case, OSRRP causes the user to remain on the same channel because it observes a 0, and then miss an opportunity in the next time slot. Quantifying the probability of these patterns and their symmetric counterparts (with the channel labels switched) yields a corresponding rough upper bound on the regret of OSRRP:

  E[R^+_OSRRP] = Σ_{i=1}^{2} [ e_11^(i) φ_0^(i+1) + e_00^(i) φ_1^(i+1) − e_11^(i) e_00^(i+1) ]   (12)

which, when the channels are i.i.d., becomes:

  E[R^+_OSRRP] = 2 (e_11 φ_0 + e_00 φ_1 − e_11 e_00)   (13)

V. IMPROVED UPPER BOUND ON REGRET

We now develop improved upper bounds on the regret for N = 2 by ruling out certain cases in which the error patterns described in the previous section are not actually encountered by the respective policies.

A. Improved Bound for ZSRRP

To develop the improved bounds, we make use of the following straightforward observations about the two policies:

Observation 1: For N = 2 and ZSRRP, a sufficient condition to have channel i selected at time slot n is that at time slot n−1 the other channel was in state 0 and channel i was in state 1.

Observation 2: For N = 2 and OSRRP, the sufficient condition for channel i to be selected at time slot n is

that at time slot n−1 the other channel was in state 1 and channel i was in state 0.

Fig. 4. Correct patterns for error pattern 1 for ZSRRP

Fig. 5. Correct patterns for error pattern 2 for ZSRRP

Fig. 6. Correct patterns for error pattern 1 for OSRRP

Fig. 7. Correct patterns for error pattern 2 for OSRRP

The bounds derived in the previous section are loose because they quantify the probability of occurrence of error patterns that are not always actually encountered by the corresponding policy. For example, consider the first error pattern in Figure 2. ZSRRP misses an opportunity in the second time slot if the user is on channel 1 in the first time slot. However, consider the scenario depicted in the left part of Figure 4. Although the pattern in the second and third slots in this case matches error pattern 1 for ZSRRP, because of Observation 1 above the secondary user following this policy must pick channel 2 in the second slot, not channel 1, and in doing so does not miss the opportunity in the third slot. Thus, even though the two channels have the states indicated in error pattern 1, a user following ZSRRP does not encounter the pattern and lose an opportunity as a result. There are in fact infinitely many such correct patterns, all starting with the same suffix, as shown in Figure 4. Similarly, in Figure 5, we see that the symmetric version of error pattern 2 occurs (there is a 0→1 transition on channel 1 in the second and third slots and a 0 on channel 2 in the third slot), but the user is again constrained to be on the second channel in the second slot because of the channel states in the first time slot, again as per Observation 1. And again, there is an infinite string of other related, longer correct patterns, as indicated.
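To make the simplified two-channel bound concrete, here is a small helper (ours, not the paper's) that evaluates eq. (11), E[R^+_ZSRRP] = 2(τ_0 + τ_1 − 1)/(τ_0 + τ_1)^2, in exact rational arithmetic:

```python
from fractions import Fraction

def zsrrp_rough_bound(tau0, tau1):
    """Rough upper bound on ZSRRP regret for two i.i.d. channels,
    eq. (11): 2(tau0 + tau1 - 1)/(tau0 + tau1)^2, where tau0 and tau1
    are the mean holding times of the busy (0) and free (1) states."""
    s = Fraction(tau0) + Fraction(tau1)
    return 2 * (s - 1) / s ** 2
```

For τ_0 = τ_1 = 2 the bound is 3/8, and it decays roughly like 2/(τ_0 + τ_1) as the holding times grow, matching the inverse-proportionality remark after eq. (11).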
Taking these correct patterns into account (i.e., subtracting their probability of occurrence from the probability of occurrence of the basic error patterns) improves the bound on the regret to the following:

  E[R^{+I}_ZSRRP] = E[R^+_ZSRRP] − T_1 − T_2 − T_3   (14)

where

  T_1 = Σ_{i=1}^{2} Σ_k φ^(i+1) h^(i)(k) c_rF^(i+1)(k+1),
  T_2 = Σ_{i=1}^{2} Σ_k e^(i+1) rh^(i)(2k+2) cF^(i+1)(2k+1),
  T_3 = Σ_{i=1}^{2} Σ_k φ^(i+1) h^(i)(2k+2) c_rF^(i+1)(2k+3).

B. Improved Bound for OSRRP

The improved bound for OSRRP is derived in the same way as for ZSRRP. The correct patterns in this case, for the two error patterns in Figure 3, are depicted in Figures 6 and 7. We define the three correcting terms in this case:

  S_1 = Σ_{i=1}^{2} Σ_k φ^(i+1) h^(i)(2k+1) c_rF^(i+1)(2k+2),
  S_2 = Σ_{i=1}^{2} Σ_k e^(i+1) rh^(i)(2k+1) cF^(i+1)(2k),
  S_3 = Σ_{i=1}^{2} Σ_k e^(i+1) cF^(i+1)(k) rh^(i)(k+1).

The corresponding improved bound is:

  E[R^{+I}_OSRRP] = E[R^+_OSRRP] − S_1 − S_2 − S_3   (15)

VI. SIMULATIONS

In this section, we evaluate the performance of the ZSRRP and OSRRP policies on two-channel systems through simulations and compare them with the bounds derived in the previous sections. For the simulations, we assume that the channels are i.i.d. and that the primary occupancy process in each channel can be described as a two-state semi-Markov process. We assume that the holding time for state 1 follows a Zipf distribution and that the holding time for state 0 follows a geometric distribution. The p.m.f. of a Zipf distribution is given by:

  p_Zipf(n) = n^{−a} / ζ(a),   (16)

where a is the parameter of the Zipf distribution and the ζ function is defined as ζ(a) = Σ_{i=1}^{∞} 1/i^a. The mean of the Zipf distribution is finite only if a > 2, and it is given by:

  E{n} = ζ(a−1)/ζ(a),   a > 2.   (17)

The expected holding time for state 1 can thus be computed according to equation (17). In general, the holding-time distribution of a semi-Markov process entered at a random starting point is related to the complementary cumulative holding-time distribution by

  rh_{s_i s_j}(n) = cF_{s_i s_j}(n−1) / τ_{s_i s_j}.   (18)

The complementary cumulative holding-time distribution with a random start, denoted by c_rF_{s_i s_j}, can be derived as:

  c_rF_{s_i s_j}(n) = 1 − Σ_{m=1}^{n} rh_{s_i s_j}(m)   (19)
                    = 1 − (1/τ_{s_i s_j}) Σ_{m=1}^{n} cF_{s_i s_j}(m−1).   (20)

Note that e_11 and e_00 can be expressed by means of the stationary distribution φ and the random-start complementary cumulative distribution function c_rF(n) as:

  e_11 = φ_1 c_rF_{10}(1)   (21)
  e_00 = φ_0 c_rF_{01}(1)   (22)

In Figures 8 and 9, we plot the simulated actual achieved throughput under ZSRRP and OSRRP and compare it to the various bounds. For both plots, the simulation parameters are as follows. The average holding time for state 0, τ_0, is set to two different values in each: τ_0 = 1.2 and τ_0 = 8. The parameter a of the Zipf holding-time distribution is varied from 2.5 to 3.0, which by equation (17) means that τ_1 varies from 2.75 to 1.37.

TABLE I: ACTUAL ACHIEVED THROUGHPUT FOR ZSRRP

TABLE II: ACTUAL ACHIEVED THROUGHPUT FOR OSRRP
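The channel construction used in these simulations can be reproduced with a short sketch (our own illustrative code, not the authors' simulator; the truncation of the Zipf support at `max_n`, and the state assignment — Zipf holding times for the free state 1, geometric holding times with mean τ_0 > 1 for the busy state 0 — are assumptions of the sketch):

```python
import math
import random

def sample_zipf(a, rng, max_n=10000):
    """Draw a holding time from the Zipf p.m.f. p(n) ∝ n^(-a) of eq. (16),
    truncated at max_n (an approximation for sampling purposes)."""
    weights = [n ** -a for n in range(1, max_n + 1)]
    u = rng.random() * sum(weights)
    acc = 0.0
    for n, w in enumerate(weights, start=1):
        acc += w
        if acc >= u:
            return n
    return max_n

def sample_geometric(mean, rng):
    """Geometric holding time on {1, 2, ...} with the given mean (> 1)."""
    p = 1.0 / mean
    return int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) + 1

def semi_markov_channel(T, a, tau0, seed=0):
    """Binary occupancy sequence: state 1 (free) holds for a Zipf(a)
    number of slots, state 0 (busy) for a geometric number of slots
    with mean tau0; alternating holds, starting (arbitrarily) in state 1."""
    rng = random.Random(seed)
    state, seq = 1, []
    while len(seq) < T:
        hold = sample_zipf(a, rng) if state == 1 else sample_geometric(tau0, rng)
        seq.extend([state] * hold)
        state = 1 - state
    return seq[:T]
```

Feeding two independently seeded sequences of this kind to a policy simulator reproduces the kind of experiment reported in Figures 8 and 9.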
We can see that the improved bounds generally approximate the actual achieved performance well in both cases. Both policies perform worse as a increases; this is because as the parameter a gets larger, τ_1 gets smaller compared with τ_0, reducing the throughput.

It is also of interest to know which of the two policies will perform better under a given set of circumstances. This can be determined by comparing Table I and Table II, where we vary the average holding times τ_0 and τ_1 of the 0 and 1 processes respectively. From the tables, it can be seen that ZSRRP outperforms OSRRP in the upper-right corner of the matrix, while OSRRP is the better algorithm in the lower-left corner. In addition, we plot the contour at which the throughputs achieved by the two policies cross over in Figure 10. In particular, we plot this contour for three cases: the simulated achieved throughput, and the throughput indicated by the rough and the improved bounds derived in the previous two sections. The contour showing the crossover of the lower bounds on the throughput of the two policies is of course not rigorously an upper or lower bound on the true contour, but it can nevertheless be seen as a way to approximate the crossover contour. Surprisingly, the contours for all three cases closely match each other, which implies that even the simple rough bounds can help determine the dominant regions of the two policies.

Fig. 10. Contour of intersection points under the two policies

For a conventional two-state Markov channel model, it has been proven in [2] that for N = 2, ZSRRP is optimal if p_11 > p_01 and OSRRP is optimal if p_11 < p_01. Hence the decision boundary is where p_11 = p_01, which implies p_01 + p_10 = 1, and hence we have

  1/τ_0 + 1/τ_1 = 1.   (23)

Equation (23) is therefore the boundary between the dominant decision regions for a Markov model in the case of two channels. This boundary is also shown in Figure 10 as a reference.
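Under the Markov model, the boundary (23) gives a quick classifier of which policy to prefer; the helper below is our own convenience sketch (the function name is hypothetical), and Figure 10 indicates that the same rule approximately locates the crossover even for the semi-Markov channels:

```python
from fractions import Fraction

def markov_dominant_policy(tau0, tau1):
    """Classify (tau0, tau1) by the Markov decision boundary of eq. (23),
    1/tau0 + 1/tau1 = 1: below it (long holding times, i.e. positive
    correlation) ZSRRP is optimal; above it, OSRRP."""
    s = 1 / Fraction(tau0) + 1 / Fraction(tau1)
    if s < 1:
        return 'ZSRRP'
    if s > 1:
        return 'OSRRP'
    return 'boundary'
```

For instance, equal holding times of 3 slots fall in the ZSRRP-dominant region, while (τ_0, τ_1) = (2, 2) lies exactly on the boundary.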
Fig. 8. ZSRRP bounds and simulated throughput for two i.i.d. semi-Markov channels, for τ_0 = 1.2 and τ_0 = 8 (curves: achieved by genie, actual achieved by ZSRRP, improved bound)

Fig. 9. OSRRP bounds and simulated throughput for two i.i.d. semi-Markov channels, for τ_0 = 1.2 and τ_0 = 8 (curves: achieved by genie, actual achieved by OSRRP, improved bound)

Fig. 11. The error patterns for N > 2 under ZSRRP and OSRRP

The reason why the lower part of the simulated boundary is closer to the conventional Markov boundary is that, as τ_1 gets smaller, the Zipf holding-time distribution for state 1 becomes closer to the geometric distribution, so that the semi-Markov process behaves increasingly like a conventional Markov process.

VII. BOUNDS ON REGRET FOR N > 2

Figure 11 shows the error patterns for ZSRRP when there are more than 2 channels. The situation is quite similar to the N = 2 case, but there are some minor differences. In particular, in the case of the second pattern, the relevant event probability calculations must differentiate between a 1 appearing on the channel that was abandoned and a 1 appearing elsewhere. The calculations yield the following general bound on the regret of ZSRRP for N > 2 channels:

  E[R^+_ZSRRP] = L_1 + L_2 + L_3   (24)

where

  L_1 = Σ_{i=1}^{N} e_10^(i) [ 1 − Π_{j=1, j≠i}^{N} φ_0^(j) ],
  L_2 = Σ_{i=1}^{N} e_01^(i) φ_0^(i+1),
  L_3 = Σ_{i=1}^{N} e_00^(i) φ_0^(i+1) [ 1 − Π_{j=1, j≠i,i+1}^{N} φ_0^(j) ]   (25)

Figure 11 also shows the error patterns for OSRRP for N > 2. The corresponding bound on the regret is:

  E[R^+_OSRRP] = M_1 + M_2 + M_3,   (26)

where

  M_1 = Σ_{i=1}^{N} e_00^(i) [ 1 − Π_{j=1, j≠i}^{N} φ_0^(j) ],
  M_2 = Σ_{i=1}^{N} e_11^(i) φ_0^(i+1),
  M_3 = Σ_{i=1}^{N} e_10^(i) φ_0^(i+1) [ 1 − Π_{j=1, j≠i,i+1}^{N} φ_0^(j) ]   (27)

It should be noted that, in deriving the improved bounds for the N = 2 case, we used a sufficient condition for channel selection. The same technique can also be applied to improve the bounds for N > 2; however, the expressions become more complicated as N grows large, and we omit these detailed calculations.

VIII. CONCLUSION

We have considered the problem of a single secondary user selecting which of multiple channels to access at each time so as to maximize its expected throughput in the presence of stochastic primary user activity. Unlike most of the prior work, which has focused on Markovian primary users, we have developed bounds on the performance of two simple policies when the primary traffic on each channel can be modeled as a more general independent stationary and ergodic two-state process. The crux of our technique is to bound the regret of these policies with respect to an oracle that is aware of all channel realizations in advance, using certain characteristic error patterns. The derived bounds are insightful in showing the conditions under which either policy results in low regret. Further, by comparing the bounds for the two policies, we are able to gain some insight into the conditions under which one policy outperforms the other. Our results in this paper suggest that these two simple policies may be quite efficient for dynamic multichannel access in practice when the channels are identical. An interesting question to be explored in the future is whether hybrid policies may be similarly useful in the case of non-identical channels (for instance, when some are positively correlated and others are negatively correlated).

ACKNOWLEDGMENT

The work of C. Wang and G. E. Øien was supported by the NORDITE/NFR (VERDIKT) project CROPS. The work of Q. Zhao was supported by the U.S. Army Research Office under Grant W911NF-08-1-0467.

REFERENCES

[1] Q. Zhao, L. Tong, A. Swami, and Y. Chen, "Decentralized Cognitive MAC for Opportunistic Spectrum Access in Ad Hoc Networks: A POMDP Framework," IEEE Journal on Selected Areas in Communications: Special Issue on Adaptive, Spectrum Agile and Cognitive Wireless Networks, vol. 25, no. 3, pp. 589-600, April 2007.

[2] Q. Zhao, B. Krishnamachari, and K. Liu, "On Myopic Sensing for Multi-Channel Opportunistic Access: Structure, Optimality, and Performance," IEEE Transactions on Wireless Communications, vol. 7, no. 12, part 2, December 2008.

[3] S. H. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari, "Optimality of Myopic Sensing in Multi-Channel Opportunistic Access," IEEE Transactions on Information Theory, vol. 55, no. 9, pp. 4040-4050, September 2009.

[4] K. Liu, Q. Zhao, and B. Krishnamachari, "Dynamic Multichannel Access with Imperfect Channel State Detection," IEEE Transactions on Signal Processing, to appear.

[5] K. Liu and Q. Zhao, "Channel Probing for Opportunistic Access with Multi-channel Sensing," in Proc. IEEE Asilomar Conference on Signals, Systems, and Computers, October 2008.

[6] S. Ahmad and M. Liu, "Multi-channel opportunistic access: a case of restless bandits with multiple plays," in Proc. Allerton Conference on Communication, Control, and Computing, Allerton, IL, October 2009.