Learning to optimally exploit multi-channel diversity in wireless systems


P. Chaporkar (IITB, Mumbai), A. Proutiere (Microsoft Research, UK), H. Asnani (IITB, Mumbai)

Prof. Chaporkar's work is supported by India-UK Advanced Technology Centre (IU-ATC) of Excellence in Next Generation Networks Systems and Services.

Abstract

Consider a wireless system where a transmitter may send data to a set of receivers, or on various channels, experiencing random time-varying fading. The transmitter can send data to a single receiver or on a single channel at a time and may adapt its transmission power to the radio conditions of the chosen receiver/channel. Its objective is to implement a strategy defining at each time how to select the receiver/channel and transmission power, so as to maximize its throughput, i.e., its average sending rate, under an average power constraint. The optimization problem is easy when the fading conditions of all the receivers/channels are known. In many situations, however, the instantaneous fading conditions are not known a priori; they have to be acquired, i.e., receivers/channels have to be probed, which consumes resources (time, spectrum, energy) in proportion to the number of probed receivers/channels. Hence, the transmitter may choose not to acquire the radio conditions of all the receivers/channels so as to spare resources for actual transmissions. In this paper, we aim at characterizing a joint probing, receiver/channel selection and power control strategy maximizing throughput. We provide an adaptive algorithm converging to the throughput-optimal strategy. This algorithm may be used in a wide class of wireless systems with limited information, such as broadcast systems without a priori knowledge of the instantaneous Channel-State Information (CSI). It can also be used to solve dynamic spectrum access problems such as those arising in cognitive radio systems, where secondary users can access large parts of the spectrum, but have to discover which portions of the spectrum offer more favorable radio conditions or less interference from primary users.

I. INTRODUCTION

Opportunistic resource allocation has been shown to significantly improve the performance of wireless systems by exploiting (rather than countering) location-dependent and time-varying channel conditions on various links. To employ opportunistic schemes, however, the transmitter has to know the channel state information (CSI) on each of the links. The CSI is not automatically known; it has to be acquired. Acquiring CSI on each link consumes resources (time, power and bandwidth) proportional to the number of links. Often the gain due to opportunism compensates for the resources invested in CSI acquisition, and hence many systems keep resources aside for CSI acquisition. For example, in CDMA/HDR [2] broadcast systems, a dedicated uplink channel for each receiver is maintained to communicate the CSI. In IEEE 802.16 based WiMAX systems, CSI can be obtained by polling each link at the beginning of a frame [8]. However, there is an increasing number of systems where it is not feasible to maintain dedicated resources for CSI acquisition; rather, the CSI acquisition should be done on demand. We refer to systems with on-demand CSI acquisition as limited information based MAC.

An increasingly important example of limited information based MAC is the opportunistic spectrum access methods used in systems such as cognitive radio systems. In these systems, a secondary user may access a large number of frequency bands provided that these bands are not currently occupied by licensed or primary users. In this scenario, a user willing to maximize its transmission rate has to opportunistically use the parts of the spectrum that are left idle by primary users and that offer favorable fading conditions.
Of course, the user cannot maintain dedicated resources to acquire the CSI on each frequency band and to check whether a primary user is using it. Rather, before transmitting, the user should acquire this information on a few well-selected bands.

For an optimal design of limited information based MAC, one has to strike the best exploration versus exploitation trade-off. Here, exploration refers to finding out (probing) link CSIs. Exploration consumes resources proportional to the number of links probed, and thus leaves fewer resources for the actual data transmission. On the other hand, exploitation refers to opportunistically transmitting on the probed link with the best CSI; the more links one probes, the greater the chance of finding a link with good channel conditions. In [6], we have developed a probing strategy achieving the optimal exploration vs. exploitation trade-off under the assumption that the transmitter always transmits at a fixed power. Here, our aim is to achieve the optimal trade-off when the transmitter can vary the transmit power, but has to satisfy an average power constraint. In wireless networks, power is an important resource that should be used optimally. Hence, it is important to design joint probing and power control schemes that maximize the system throughput. We investigate the throughput gain achieved using power adaptation in limited information based MAC.

We now elaborate on the analytical challenges in obtaining the optimal joint probing and power control schemes in limited information based MAC. When the fading conditions on the various links are known at the transmitter, the optimal power control scheme can be obtained as the solution of a convex optimization program.
For example, in the case where a single channel can be used at a time, the optimal scheme is to always transmit on the channel with the most favorable fading state, and to share power in time through the celebrated water-filling procedure, where the water-filling level is chosen so that the average power constraint is satisfied with equality [9]. To compute the water-filling level, one needs to know the distribution of the CSI of the chosen link (the link with the best CSI). As we will demonstrate, a similar analysis is possible even in the case of limited information based MAC, i.e., the optimal probing and power control scheme can be obtained as a solution to an optimization problem.

978-1-4244-5837-/1/$26. 21 IEEE

A major difficulty in solving this problem is that the constraint set of possible schemes is large and extremely intricate to characterize. Indeed, to compute the average power consumption of a given scheme, we need to quantify the distribution of the CSI of the link selected at the end of the probing phase, which turns out to be almost impossible. Hence, we need to solve the optimization problem without really knowing the constraint set of possible schemes. To circumvent this difficulty, we propose an on-line learning strategy that provably converges to the optimal joint probing and power control scheme. More precisely, the contributions of this paper are as follows:

- We formalize the problem of designing optimal opportunistic probing and power allocation schemes as a Constrained Markov Decision Process (Section II).
- We provide structural properties of the problem in systems where transmitting on a single link at a time is permitted. These properties allow us to characterize the throughput-optimal strategies (Section III).
- As the complexity of the numerical computation of the optimal strategies from the aforementioned characterization grows exponentially with the number of links, we propose an on-line learning algorithm with linear complexity that provably converges to the optimal strategy (Section IV).
- The results are then extended to the case where the transmitter is allowed to transmit on several links at a time (Section V).
- Finally, we illustrate and discuss, using simulations, the efficiency of the proposed optimal exploration-exploitation strategies. In particular, we evaluate the price in terms of throughput that has to be paid due to the lack of information, i.e., due to the fact that the channel states have to be acquired (Section VI).
Related work is presented in Section VII, and we conclude in Section VIII.

II. SYSTEM MODEL AND PROBLEM FORMULATION

We present the first basic model considered in this paper to analyze the problem of designing optimal exploration/exploitation strategies in limited information based MAC. We generalize this model in Section V.

A. Model

Consider a user that can transmit on $N$ channels, but on only one channel at a time. Time is slotted. The slot duration is assumed to correspond to the coherence time of the channels. We assume that the radio conditions on the various channels satisfy the block fading model: the radio conditions on channel $i$ are constant during each slot, and hence represented by a channel state $C_i(t)$ in slot $t$. The random variable $C_i(t)$ takes its values in a finite set $\mathcal{C} = \{c_1, c_2, \ldots, c_M\}$. Moreover, for each channel $i$, the random variables $C_i(t)$ are i.i.d. across slots $t$, with distribution $F_i(\cdot)$. We assume that the distribution $F_i(\cdot)$ is known to the transmitter for every $i$. The underlying assumption is that the user remains in the system for a long time, so that it can learn $F_i(\cdot)$. We also assume that the channel states are independent across channels, i.e., the random variables $C_i(t)$, over all slots $t$ and all $i = 1, \ldots, N$, are independent (spatial diversity).

At the beginning of each slot, the user may acquire the state of one or several channels sequentially. Probing a channel takes a fixed proportion $\beta$ of the slot, so that after probing $k$ channels, the fraction of the slot available for actual data transmission is $(1 - k\beta)$. When the user decides to transmit with power $p$ on the probed channel $i$ observed in state $c \in \mathcal{C}$, its transmission rate is approximated by the Shannon formula:

$$R(c, p) = \log\left(1 + \frac{c\,p}{N_0}\right),$$

where $N_0$ denotes the thermal noise power. The choice of the rate function $R(\cdot, \cdot)$ does not impact the results derived in this paper, provided that it is increasing and concave in its second argument, i.e., in power.
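As a small illustration of the rate model above, the following sketch evaluates the Shannon rate and scales it by the fraction $(1 - k\beta)$ of the slot that remains after probing $k$ channels. The function names, the natural logarithm (rates in nats) and the numerical values are assumptions made for the example only; they are not taken from the paper.

```python
import math


def shannon_rate(c, p, noise=1.0):
    """Rate R(c, p) = log(1 + c*p/N0). Natural log is an assumption (unit: nats)."""
    return math.log1p(c * p / noise)


def slot_information(c, p, k, beta, noise=1.0):
    """Rate scaled by the fraction (1 - k*beta) of the slot left after k probes."""
    fraction_left = 1.0 - k * beta
    if fraction_left < 0.0:
        raise ValueError("probing k channels would take longer than the slot")
    return fraction_left * shannon_rate(c, p, noise)


if __name__ == "__main__":
    # Illustrative values: channel state 2.0, power 1.0, probing cost beta = 0.1.
    for k in range(4):
        print(k, round(slot_information(c=2.0, p=1.0, k=k, beta=0.1), 4))
```

Each additional probe shrinks the usable part of the slot linearly, which is the cost side of the exploration versus exploitation trade-off.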
If the user transmits after probing $k$ channels, and decides to transmit at power $p$ on a channel in state $c$, the amount of information transmitted during this slot is $(1 - k\beta) R(c, p)$. Note that in [17], [5], similar models (but with fixed transmit power) have been considered and exemplified in practical systems.

In order to use the channel resources and its power reserve optimally, the user has to decide the order in which it probes channels, when to stop probing and start transmitting actual data, and finally at which power it should transmit. In short, it has to implement an optimal probing and power allocation strategy. Formally, we define such a strategy as follows. Consider an arbitrary slot (the slot considered does not play any role here as the system is i.i.d. over slots). In this slot, let $s = [s_1 \cdots s_N]$ denote an $N$-dimensional vector indicating which channels have already been probed, together with the states of these channels. If the $i$-th channel has been probed, then there exists $c \in \mathcal{C}$ such that $s_i = c$; for unprobed channels, we let $s_i = -1$. The set of all possible states is $\mathcal{S} = (\mathcal{C} \cup \{-1\})^N$.

Depending on the past decisions in the slot, and on its observations of the channel states, the user has to decide whether to probe further, or to transmit on a channel, and at which power. This decision can be random, e.g., with some probability the user decides to probe further, and with some other probability it decides to stop and transmit. In Figure 1, we give an example of decisions in a simplistic 3-channel system. In the following, we denote by $\mathcal{P}(A)$ the set of probability measures on $A$.

[Figure 1: exploration decisions $(2, p)$ and $(3, p)$ probe channels 2 and 3 (P2, P3), and the decision $(3, p)$ then triggers transmission on channel 3 (Tr 3); the state $s$ evolves as $(-1, -1, -1) \to (-1, c, -1) \to (-1, c, c')$.]

Fig. 1. Decisions made in one slot. Exploration phase of duration $2\beta$: channels 2 and 3 are probed. Exploitation phase of duration $(1 - 2\beta)$: transmission on channel 3 at power $p$ (i.e., $(1 - 2\beta) R(c, p)$ bits are sent).
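The state bookkeeping described above can be sketched as follows. This is a minimal illustration, assuming channel states are positive numbers so that the marker $-1$ for an unprobed channel cannot be confused with an observation. The helper names (`initial_state`, `probe`, `num_probed`, `best_probed`), the 0-based channel indices and the observed values are hypothetical choices for the example.

```python
UNPROBED = -1  # marker the text uses for a channel that has not been probed


def initial_state(n_channels):
    """State in which no channel has been probed yet."""
    return (UNPROBED,) * n_channels


def probe(state, i, observed):
    """Return the new state after probing channel i (0-based) and seeing `observed`."""
    if state[i] != UNPROBED:
        raise ValueError("channel already probed")
    return state[:i] + (observed,) + state[i + 1:]


def num_probed(state):
    """Number of channels probed so far in this slot."""
    return sum(1 for x in state if x != UNPROBED)


def best_probed(state):
    """Best observed channel state, or None if nothing has been probed yet."""
    seen = [x for x in state if x != UNPROBED]
    return max(seen) if seen else None


# Replay of the sequence in Fig. 1 with made-up observations: probe channel 2,
# then channel 3 (indices 1 and 2 here), then stop and transmit.
s = initial_state(3)
s = probe(s, 1, 1.0)
s = probe(s, 2, 2.0)
beta = 0.1  # illustrative probing cost
fraction_left = 1 - num_probed(s) * beta
print(s, num_probed(s), best_probed(s), fraction_left)
```

Using immutable tuples keeps each state hashable, which is convenient if states are later used as dictionary keys or cache keys.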
Definition 1: A joint probing and power control strategy $\pi$ is a mapping from the set of states $\mathcal{S}$ to the set $\mathcal{P}(\{1, \ldots, N\} \times \mathbb{R}_+)$, i.e., in every state $s$, $\pi$ chooses a pair $(i, p)$ randomly according to the distribution $\pi(s)$. If $s_i = -1$, then the user probes channel $i$, observes its state $c$, and the system state changes to $s'$, where $s'_j = s_j$ for $j \ne i$ and $s'_i = c$. If $s_i \in \mathcal{C}$, channel $i$ has been probed already: the user stops probing and starts transmitting on channel $i$ with power $p$.

The above definition does not exclude deterministic strategies that choose a single pair $(i, p)$ (i.e., with probability 1) in each state $s$; in this case, $\pi(s) = \delta_{(i,p)}$. It is worth observing as well that the decision taken by a strategy $\pi$ is defined in all possible states $s \in \mathcal{S}$, although because of the specific choices made by $\pi$, some states may never actually be reached (for example, $\pi$ can decide that channel 1 is never probed first, in which case the state $(c, -1, \ldots, -1)$ for $c \in \mathcal{C}$ is never reached under $\pi$). The strategy $\pi$ can be defined arbitrarily in those states. We denote by $\Pi$ the set of all probing and power allocation strategies.

For a given strategy $\pi \in \Pi$, we denote by $\rho_\pi$ the corresponding occupation measure, i.e., for any subset $A \subset \mathcal{S}$ of states and any Borel set $I \subset \mathbb{R}_+$ of possible transmission powers, the probability that under $\pi$ the user stops probing in a state $s \in A$ and starts transmitting at a power $p \in I$ is:

$$\rho_\pi(A \times I) = \int_{\mathcal{S} \times \mathbb{R}_+} 1_{s \in A}\, 1_{p \in I}\, d\rho_\pi(s, p).$$

We also introduce the measure $\sigma_\pi$, corresponding to the distribution of the state in which the strategy $\pi$ stops: for any $A \subset \mathcal{S}$,

$$\sigma_\pi(A) = \int_{\mathcal{S} \times \mathbb{R}_+} 1_{s \in A}\, d\rho_\pi(s, p).$$

We refer to $\sigma_\pi(\cdot)$ as the terminal state distribution. The occupation measure $\rho_\pi$ results from the random decisions made by $\pi$, and also from the random channel states.

B. Problem formulation

We are now ready to state the problem of designing a probing and power allocation strategy maximizing the user's long-term throughput subject to an average power constraint. Since the objective is to maximize throughput, we restrict our attention to strategies that, when deciding to stop and transmit, transmit on the channel with the best observed state. In state $s$, we denote by $\bar{s} = \max\{s_i, i = 1, \ldots, N\}$ the state of the best (probed) channel.
We also denote by $k(s)$ the number of channels that have been probed in state $s$. Both the throughput $T(\pi)$ and the average power $P(\pi)$ under strategy $\pi$ are expressed through the occupation measure $\rho_\pi$:

$$T(\pi) = \int_{\mathcal{S} \times \mathbb{R}_+} d\rho_\pi(s, p)\, (1 - k(s)\beta)\, R(\bar{s}, p), \qquad (1)$$

$$P(\pi) = \int_{\mathcal{S} \times \mathbb{R}_+} d\rho_\pi(s, p)\, (1 - k(s)\beta)\, p. \qquad (2)$$

Denote by $\bar{P}$ the average power budget. Our problem is then formalized as follows:

(O1) Find $\pi \in \Pi$ maximizing $T(\pi)$ subject to $P(\pi) \le \bar{P}$.

This problem cannot be solved using classical methods, e.g., convex optimization techniques, simply because the objective and the constraint are both functions of the occupation measure, which proves quite complicated to characterize for a given strategy. In fact, the problem belongs to the class of constrained stochastic control problems [1], which are notoriously difficult. In the next section, we provide some structural properties of (O1) that will help the analysis.

III. STRUCTURAL PROPERTIES OF OPTIMAL STRATEGIES

To solve (O1), we need to study the structure of the possible optimal probing and power allocation strategies. First, we show that it is useless to randomize the power allocation. Then we prove that optimal power allocations are always obtained through water-filling. We show that this implies that solving (O1) is equivalent to identifying the saddle point of a function depending on the probing strategy and on a parameter defining the level of the water-filling procedure that provides the power allocation. Finally, we provide structural properties of the probing strategy maximizing this function.

A. Derandomizing power

We first define the set $\Pi_1 \subset \Pi$ as the set of strategies $\pi$ whose power allocation is deterministic, in the sense that when, in state $s$, $\pi$ decides to stop probing and to transmit, it picks a unique transmission power, denoted by $p_\pi(s)$. Mathematically, this implies that for any state $s$ and any subset $I$ of $\mathbb{R}_+$, $\rho_\pi(s, I) = \sigma_\pi(s)\, 1_{p_\pi(s) \in I}$.
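In the following, for any $\pi \in \Pi$, we denote by $\rho_\pi(p \mid s)$ the probability that $\pi$ selects power $p$ given that it stops probing in state $s$.

Lemma 1: Let $\pi \in \Pi$. Consider $\pi' \in \Pi_1$ that makes the same probing decisions as $\pi$, but averages the transmission power decisions made by $\pi$: for any state $s$, if $\pi$ chooses a pair $(i, p)$ for some possible power $p$, then $\pi'$ chooses $(i, p')$, with

$$p' = \int_{\mathbb{R}_+} p\, d\rho_\pi(p \mid s).$$

Then $T(\pi') \ge T(\pi)$.

Proof. Since $R(\cdot, \cdot)$ is concave in power, for any state $s$ we have, by Jensen's inequality and the definition of $\pi'$:

$$R(\bar{s}, p_{\pi'}(s)) \ge \int_{\mathbb{R}_+} d\rho_\pi(p \mid s)\, R(\bar{s}, p).$$

Moreover, since $\pi'$ makes the same probing decisions as $\pi$, both strategies have the same terminal state distribution $\sigma_\pi$. Then:

$$T(\pi) = \int_{\mathcal{S} \times \mathbb{R}_+} d\rho_\pi(s, p)\, (1 - \beta k(s))\, R(\bar{s}, p) = \sum_{s \in \mathcal{S}} \sigma_\pi(s)\, (1 - \beta k(s)) \int_{\mathbb{R}_+} d\rho_\pi(p \mid s)\, R(\bar{s}, p) \le \sum_{s \in \mathcal{S}} \sigma_\pi(s)\, (1 - \beta k(s))\, R(\bar{s}, p_{\pi'}(s)) = T(\pi').$$

B. Optimality of water-filling

Now we investigate the possible form of optimal power allocations. We fix the terminal state distribution $\sigma \in \mathcal{P}(\mathcal{S})$, and given that distribution, we seek the best power allocation. A (deterministic) power allocation is represented by a function $p : \mathcal{S} \to \mathbb{R}_+$. The throughput achieved by power allocation $p(\cdot)$ is:

$$T(\sigma, p) = \sum_{s \in \mathcal{S}} \sigma(s)\, (1 - \beta k(s))\, R(\bar{s}, p(s)).$$

The average power consumption under $p(\cdot)$ is:

$$P(\sigma, p) = \sum_{s \in \mathcal{S}} \sigma(s)\, (1 - \beta k(s))\, p(s).$$

We seek to solve, for a given $\sigma \in \mathcal{P}(\mathcal{S})$: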

($P_\sigma$) Find $p^*(\cdot)$ maximizing $T(\sigma, p)$ subject to $P(\sigma, p) \le \bar{P}$.

Clearly ($P_\sigma$) is a convex optimization problem, and if $R(\cdot, \cdot)$ is strictly concave in power, it admits a unique solution. Consider the associated Lagrangian:

$$L_\sigma(p(\cdot), \mu) = \sum_{s \in \mathcal{S}} \sigma(s)\, (1 - \beta k(s))\, [R(\bar{s}, p(s)) - \mu p(s)] + \mu \bar{P},$$

where $\mu$ denotes the Lagrange multiplier. Denote $G(\sigma, \mu) = \max_{p(\cdot)} L_\sigma(p(\cdot), \mu)$. The solution of ($P_\sigma$) is a power allocation obtained through a water-filling procedure of parameter $\mu$, as stated in the following lemma.

Lemma 2: We have:

$$G(\sigma, \mu) = \sum_{s \in \mathcal{S}} \sigma(s)\, (1 - \beta k(s))\, [R(\bar{s}, p_\mu(\bar{s})) - \mu\, p_\mu(\bar{s})] + \mu \bar{P},$$

where

$$p_\mu(\bar{s}) = \left[\frac{1}{\mu} - \frac{N_0}{\bar{s}}\right]^+.$$

Proof. The result follows by solving $\partial L_\sigma / \partial p(s) = 0$ for all $s$, and setting $p(s) = 0$ whenever the solution is negative.

C. Saddle point interpretation

From the previous result, the power allocation in a throughput-optimal strategy is necessarily obtained through a water-filling procedure. Hence, to identify such an optimal strategy $\pi^*$, we may restrict our attention to strategies defined by a probing strategy and a parameter $\mu$ defining the level of the water-filling procedure. To formalize this observation, we define the notion of probing strategy.

Definition 2: A probing strategy $\nu$ is a mapping from $\mathcal{S}$ to the set $\mathcal{P}(\{1, \ldots, N\})$, i.e., in every state $s$, $\nu$ chooses an index $i$ randomly according to the distribution $\nu(s)$. If $s_i = -1$, then the user probes channel $i$, observes its state $c_i$, and the system state changes to $s'$, where $s'_j = s_j$ for $j \ne i$ and $s'_i = c_i$. If $s_i \in \mathcal{C}$, channel $i$ has been probed already: the user stops probing, yielding the terminal state $s$.

We denote by $\mathcal{V}$ the set of probing strategies. The pair composed of a probing strategy $\nu \in \mathcal{V}$ and a power allocation obtained through water-filling of level $\mu$ (i.e., $p_\mu(\cdot)$) defines a strategy $\pi \in \Pi_1$, and we use the notation $\pi = (\nu, \mu)$. Define $\Pi_2$ as the set of such strategies:

$$\Pi_2 = \{\pi \in \Pi_1 : \exists\, \nu \in \mathcal{V},\ \mu > 0,\ \pi = (\nu, \mu)\}.$$

For a strategy $\pi = (\nu, \mu) \in \Pi_2$, the terminal state distribution $\sigma_\pi$ depends on $\pi$ through the probing strategy $\nu$ only; hence we may write $\sigma_\nu = \sigma_\pi$.
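The water-filling allocation of Lemma 2 can be illustrated numerically. The sketch below assumes a terminal state distribution given as a list of hypothetical triples (probability, number of probed channels, best observed state), computes the average power for a given level $\mu$, and uses bisection to find the $\mu$ for which the power budget is met with equality. The bisection, the function names and the example numbers are illustrative choices, not the procedure proposed in the paper.

```python
def waterfill_power(c, mu, noise=1.0):
    """Water-filling power p_mu(c) = [1/mu - N0/c]^+ for best observed state c."""
    return max(0.0, 1.0 / mu - noise / c)


def average_power(sigma, mu, beta, noise=1.0):
    """Average power: sum over terminal states of prob * (1 - k*beta) * p_mu(best)."""
    return sum(prob * (1.0 - k * beta) * waterfill_power(c, mu, noise)
               for prob, k, c in sigma)


def solve_mu(sigma, budget, beta, noise=1.0, tol=1e-10):
    """Bisection on mu: the average power is non-increasing in mu."""
    lo = 1e-12                                  # very large power at this level
    hi = max(c for _, _, c in sigma) / noise    # zero power at this level
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if average_power(sigma, mid, beta, noise) > budget:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical terminal distribution: (probability, probes k, best state).
sigma = [(0.3, 1, 0.5), (0.4, 1, 1.0), (0.3, 2, 2.0)]
mu = solve_mu(sigma, budget=1.0, beta=0.1)
print(mu, average_power(sigma, mu, beta=0.1))
```

The search is easy here only because the terminal distribution is given; for a probing strategy, that distribution is exactly the quantity that is hard to obtain.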
Summarizing what we have shown so far, solving (O1) is equivalent to solving (O2), where:

(O2) Find $\pi \in \Pi_2$ maximizing $T(\pi)$ subject to $P(\pi) \le \bar{P}$.

The following crucial result will help us characterize the solution of (O2). It states that the solution may be interpreted as the saddle point of the function $(\nu, \mu) \mapsto G(\sigma_\nu, \mu)$ defined in Section III-B.

Theorem 1: Let $\pi^* = (\nu^*, \mu^*) \in \Pi_2$. The strategy $\pi^*$ is optimal if and only if the pair $(\nu^*, \mu^*)$ satisfies the following saddle point condition: for any $\nu \in \mathcal{V}$, $\mu > 0$,

$$G(\sigma_\nu, \mu^*) \le G(\sigma_{\nu^*}, \mu^*) \le G(\sigma_{\nu^*}, \mu),$$

or equivalently,

$$G(\sigma_{\nu^*}, \mu^*) = \min_{\mu > 0} \max_{\nu \in \mathcal{V}} G(\sigma_\nu, \mu) = \max_{\nu \in \mathcal{V}} \min_{\mu > 0} G(\sigma_\nu, \mu). \qquad (3)$$

The proof of Theorem 1 is not straightforward since $G$ is not the Lagrangian of problem (O2), and hence (3) does not a priori express the strong duality of some optimization problem. Next, we present the formal proof.

Proof. First, we show that (O1) is a convex optimization problem. To do so, we need to show that (i) $\Pi$ is a convex set, and (ii) $T(\pi)$ is concave in $\pi$. Note that a joint probing and power control policy $\pi$ is characterized by its occupation measure $\rho_\pi$. Thus, the convex combination of two policies is defined as the elementwise convex combination of their occupation measures. That is, for every $\alpha \in [0, 1]$, $\pi = \alpha \pi_1 + (1 - \alpha) \pi_2$ means that $\rho_\pi(s, p) = \alpha \rho_{\pi_1}(s, p) + (1 - \alpha) \rho_{\pi_2}(s, p)$. Clearly, $\pi$ is a valid joint probing and power control policy, as it can be obtained by choosing $\pi_1$ with probability $\alpha$ and $\pi_2$ with probability $(1 - \alpha)$. Thus, $\Pi$ is a convex set.

Now, we show that $T(\pi)$ is concave in $\pi$, i.e., that $T(\pi) \ge \alpha T(\pi_1) + (1 - \alpha) T(\pi_2)$. First, note that

$$\sigma_\pi(s) = \alpha \sigma_{\pi_1}(s) + (1 - \alpha) \sigma_{\pi_2}(s), \qquad \rho_\pi(p \mid s) = \theta \rho_{\pi_1}(p \mid s) + (1 - \theta) \rho_{\pi_2}(p \mid s),$$

where

$$\theta = \frac{\alpha \sigma_{\pi_1}(s)}{\alpha \sigma_{\pi_1}(s) + (1 - \alpha) \sigma_{\pi_2}(s)}.$$

With the above observations and some algebra, it can be verified that $T(\pi) = \alpha T(\pi_1) + (1 - \alpha) T(\pi_2)$. Thus, $T(\pi)$ is linear, and in particular concave, in $\pi$. The same computation shows that $P(\pi)$ is linear in $\pi$, so the power constraint defines a convex set.

Now, we show that (O1) has the strong duality property using Slater's constraint qualification condition.
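Note that any strategy $\pi$ that allocates zero power in every terminal state, i.e., such that $\rho_\pi(\mathcal{S} \times \{0\}) = 1$, is a strictly feasible solution of (O1). Thus, Slater's condition holds. This implies that

$$\max_{\pi \in \Pi} \min_{\lambda \ge 0} \left\{ T(\pi) + \lambda (\bar{P} - P(\pi)) \right\} = \min_{\lambda \ge 0} \max_{\pi \in \Pi} \left\{ T(\pi) + \lambda (\bar{P} - P(\pi)) \right\}. \qquad (4)$$

In (4), $\lambda$ is the Lagrange multiplier. Now, by Lemma 1, we know that the optimal probing and power control strategy lies in $\Pi_1$. Thus, (4) holds even when $\Pi$ is replaced by $\Pi_1$. Let $\Pi_\sigma$ denote the set of policies $\pi$ that generate the same terminal distribution $\sigma$. Moreover, let $\Sigma = \{\sigma : \sigma = \sigma_\pi \text{ for some } \pi \in \Pi_1\}$. With this notation, the right-hand side of (4) can be written as follows:

$$\min_{\lambda \ge 0} \max_{\sigma \in \Sigma} \max_{\pi \in \Pi_\sigma} \left\{ T(\pi) + \lambda (\bar{P} - P(\pi)) \right\}. \qquad (5)$$

Consider the last optimization in (5), and note that

$$\max_{\pi \in \Pi_\sigma} \left\{ T(\pi) + \lambda (\bar{P} - P(\pi)) \right\} = \max_{p(\cdot)} L_\sigma(p(\cdot), \lambda).$$

This is because, in $\Pi_\sigma$, policies differ in their power allocation only. Thus, optimizing over $\Pi_\sigma$ is equivalent to choosing the optimal power control. Thus,

$$\min_{\lambda \ge 0} \max_{\sigma \in \Sigma} \max_{\pi \in \Pi_\sigma} \left\{ T(\pi) + \lambda (\bar{P} - P(\pi)) \right\} = \min_{\lambda \ge 0} \max_{\sigma \in \Sigma} G(\sigma, \lambda). \qquad (6)$$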

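Now, note that the left-hand side of (4) is equal to

$$\max_{\sigma \in \Sigma} \max_{\pi \in \Pi_\sigma} \min_{\lambda \ge 0} \left\{ T(\pi) + \lambda (\bar{P} - P(\pi)) \right\}. \qquad (7)$$

Using similar arguments as before, we note that

$$\max_{\pi \in \Pi_\sigma} \min_{\lambda \ge 0} \left\{ T(\pi) + \lambda (\bar{P} - P(\pi)) \right\} = \max_{p(\cdot)} \min_{\lambda \ge 0} L_\sigma(p(\cdot), \lambda).$$

Using the strong duality of ($P_\sigma$), we conclude that

$$\max_{\pi \in \Pi_\sigma} \min_{\lambda \ge 0} \left\{ T(\pi) + \lambda (\bar{P} - P(\pi)) \right\} = \min_{\lambda \ge 0} \max_{p(\cdot)} L_\sigma(p(\cdot), \lambda).$$

Thus,

$$\max_{\sigma \in \Sigma} \max_{\pi \in \Pi_\sigma} \min_{\lambda \ge 0} \left\{ T(\pi) + \lambda (\bar{P} - P(\pi)) \right\} = \max_{\sigma \in \Sigma} \min_{\lambda \ge 0} G(\sigma, \lambda). \qquad (8)$$

From (6) and (8), we conclude that

$$\min_{\lambda \ge 0} \max_{\sigma \in \Sigma} G(\sigma, \lambda) = \max_{\sigma \in \Sigma} \min_{\lambda \ge 0} G(\sigma, \lambda).$$

The result follows.

Theorem 1 provides a simple way to verify the optimality of a given strategy $\pi = (\nu, \mu)$ in $\Pi_2$. Namely, we simply have to check that:

1. $\nu = \arg\max_{\nu' \in \mathcal{V}} G(\sigma_{\nu'}, \mu)$;
2. $\mu = \arg\min_{\mu' > 0} G(\sigma_\nu, \mu')$.

Now observe that for any $\sigma \in \mathcal{P}(\mathcal{S})$, $G(\sigma, \mu)$ is minimized in $\mu$ if and only if the resulting average power consumption is exactly equal to $\bar{P}$. This follows by differentiating $G$ with respect to $\mu$: since $p_\mu(\cdot)$ maximizes the Lagrangian, $\partial G / \partial \mu = \bar{P} - P(\sigma, \mu)$. Summarizing, we have the following characterization of optimal strategies.

Corollary 1: Let $\pi^* \in \Pi$. The strategy $\pi^*$ solves (O1) if and only if $\pi^* \in \Pi_2$, i.e., there exists $(\nu^*, \mu^*) \in \mathcal{V} \times \mathbb{R}_+$ such that $\pi^* = (\nu^*, \mu^*)$, and (1) $\nu^* = \arg\max_{\nu \in \mathcal{V}} G(\sigma_\nu, \mu^*)$, (2) $P(\sigma_{\nu^*}, \mu^*) = \bar{P}$, where for any $(\nu, \mu)$, $P(\sigma_\nu, \mu)$ denotes the average power consumption under strategy $(\nu, \mu)$:

$$P(\sigma_\nu, \mu) = \sum_{s \in \mathcal{S}} \sigma_\nu(s)\, (1 - k(s)\beta)\, p_\mu(\bar{s}).$$

D. Structure of the optimal probing strategy

If one wishes to use the characterization of the solution of (O1) provided in the above corollary, one needs to be able to verify Condition (1). In other words, we need to solve the following problem for a fixed $\mu$:

($P_\mu$) Find $\nu \in \mathcal{V}$ maximizing $G(\sigma_\nu, \mu)$.

($P_\mu$) can be seen as a generalized version of stopping time problems, and as it turns out, similar problems have recently been studied and solved, see [5], [6]. We adapt the results of these existing analyses to our setting. For brevity, we introduce the following notation: for any $c \in \mathcal{C}$,

$$G(c) = R(c, p_\mu(c)) + \mu\, [\bar{P} - p_\mu(c)].$$

Assume that at a given slot, the system is in state $s$.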
- If under strategy $\nu$ we stop probing and transmit (on the best channel), the reward is $G_{tr}(s)$ with $G_{tr}(s) = (1 - k(s)\beta)\, G(\bar{s})$.
- If under strategy $\nu$ we probe a further channel $i$, the state becomes $s' = s(i)$, where for all $j \ne i$, $s'_j = s_j$ and $s'_i = C_i$. Here $C_i$ is the random variable representing the state of channel $i$.

Now denote by $G^*(s)$ the average reward under an optimal strategy $\nu^*$ starting from state $s$. Bellman's equation allows us to recursively characterize $G^*$: for any $s \in \mathcal{S}$,

$$G^*(s) = \max\left\{ G_{tr}(s),\ \max_{i : s_i = -1} E_i[G^*(s(i))] \right\},$$

where $E_i[\cdot]$ is the expectation taken with respect to the distribution $F_i(\cdot)$ of the state of the $i$-th channel. To characterize the solution $\nu^*$ of ($P_\mu$), we need to compute $G^*(s_0)$, where $s_0 = (-1, \ldots, -1)$ is the initial state. To do so, let us introduce the average reward $G_{pr,tr}(s)$ obtained when, starting in state $s$, one first probes one more channel $i$ (chosen to maximize the expected reward) and then stops and transmits (on the best channel):

$$G_{pr,tr}(s) = (1 - (k(s) + 1)\beta) \max_{i : s_i = -1} E_i[G(\max\{\bar{s}, C_i\})].$$

For the results of [5], [6] to be applicable, we need the following property of the function $G(\cdot)$, which can be easily checked.

Lemma 3: $G(\cdot)$ is a non-decreasing function.

We are now ready to provide two structural properties of the optimal probing strategy $\nu^*$, which actually characterize this strategy in some particular but relevant cases.

1) Optimal stopping rule: The following result states that in any given state $s \in \mathcal{S}$, to optimally decide whether to stop and transmit or to probe further, we only need to follow the choice made by the one-step-look-ahead strategy [5], [6].

Theorem 2: Let $\nu^*$ be the optimal probing strategy solving ($P_\mu$). In any state $s \in \mathcal{S}$, $\nu^*$ decides to probe another channel if and only if $G_{pr,tr}(s) > G_{tr}(s)$.

Theorem 2 is sufficient to characterize the optimal strategy when the states of the various channels are i.i.d. Indeed, in this case, the order in which channels are probed has no impact on the average reward, and hence we can probe channels in any order.
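Bellman's equation above can be evaluated directly in the i.i.d. case, where the state can be summarized by the number of probed channels and the best observed state. The sketch below is a minimal illustration under that assumption; the channel states, their probabilities, the values of $\beta$, $\mu$, $\bar{P}$, $N_0$ and all function names are made-up example choices. The per-state reward follows the definition of $G(c)$ given in the text, and the first probe is forced because the user can only transmit on a probed channel.

```python
import math
from functools import lru_cache

# Illustrative i.i.d. setting (all numbers are assumptions for the example).
STATES = (0.5, 1.0, 2.0)       # possible channel states
PROBS = (0.3, 0.4, 0.3)        # common distribution of every channel
N_CHANNELS = 4
BETA, MU, BUDGET, NOISE = 0.1, 0.5, 1.0, 1.0


def g(c):
    """G(c) = R(c, p_mu(c)) + mu * (budget - p_mu(c)), as defined in the text."""
    p = max(0.0, 1.0 / MU - NOISE / c)
    return math.log1p(c * p / NOISE) + MU * (BUDGET - p)


def g_tr(k, best):
    """Reward for stopping after k probes with best observed state `best`."""
    return (1.0 - k * BETA) * g(best)


@lru_cache(maxsize=None)
def g_star(k, best):
    """Optimal reward-to-go; with i.i.d. channels (k, best) summarizes the state."""
    # Before any probe there is nothing to transmit on, so stopping is excluded.
    stop = g_tr(k, best) if k > 0 else -math.inf
    if k == N_CHANNELS:
        return stop
    probe = sum(q * g_star(k + 1, max(best, c)) for c, q in zip(STATES, PROBS))
    return max(stop, probe)


# `best` = 0.0 is only a placeholder for "nothing probed yet".
value_at_start = g_star(0, 0.0)
print(value_at_start)
```

With non-identical channels the recursion would have to track which channels were probed, and the number of states then grows exponentially with the number of channels.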
However, when the channel states are not identically distributed, characterizing $\nu^\star$ becomes extremely complicated and is an open problem in general. This might be explained by the fact that the one-step-look-ahead strategy is not always optimal, as shown in [6].

2) Optimal channel probing order: As discussed above, the main challenge in characterizing $\nu^\star$ is to determine the optimal order in which channels should be probed. In general, this issue has proved intractable. However, there are special cases where it is still possible to find $\nu^\star$. Specifically, when the channel states are stochastically ordered (as defined below), the optimal order is obtained by always probing the stochastically largest un-probed channel.

Channels are stochastically ordered if there exists a permutation $\omega$ of $\{1,\ldots,N\}$ such that for all $i,j$, if $\omega(i)\le\omega(j)$, then $C_{\omega(j)}\le_{st}C_{\omega(i)}$, where $X\le_{st}Y$ if and only if $E[f(X)]\le E[f(Y)]$ for every increasing function $f$ such that $E[f(Y)]<\infty$. Without loss of generality, when the channels are stochastically ordered, we assume that the permutation $\omega$ is $\omega(i)=i$ for all $i$. An example of ordered channels is when one can write $C_i=E[C_i]\,Y_i$, where the random variables $Y_i$ are i.i.d. copies of a fixed random variable $Y$, i.e., when the

channels have similar distributions but different means. This is a common fading model in wireless networks, for example in the case of Rayleigh fading. In these settings, we can obtain an optimal probing strategy [6]:

Theorem 3: Assume that the channels are stochastically ordered. Let $\nu^\star$ be the optimal probing strategy solving $(P_\mu)$. In any state $s\in S$, under $\nu^\star$, the decision on whether to stop and transmit or to probe further is defined by the rule of Theorem 2. Moreover, if the decision is to probe further, the next channel to probe is the stochastically largest un-probed channel. In other words, we necessarily have $s=(c_1,\ldots,c_{k(s)},-1,\ldots,-1)$, and the channel to probe next is channel $k(s)+1$.

E. Summary

In this section, we have proved that the optimal probing and power control strategy $\pi^\star$ solving (O1) has the following properties: (i) the optimal power control strategy is deterministic; (ii) it is obtained via a water-filling procedure with parameter $\mu^\star$; (iii) $\pi^\star=(\nu^\star,\mu^\star)$, where $\nu^\star$ denotes the optimal probing strategy, and $(\nu^\star,\mu^\star)$ satisfies $\nu^\star=\arg\max_{\nu\in V}G(\sigma_\nu,\mu^\star)$ and $P(\sigma_{\nu^\star},\mu^\star)=\bar{P}$; finally, we have identified how to determine $\nu^\star=\arg\max_{\nu\in V}G(\sigma_\nu,\mu^\star)$.

We have theoretically characterized the optimal probing and power control strategy. However, we still need to compute the optimal water-filling parameter $\mu^\star$ numerically, which is difficult since the average power consumption depends on both the probing strategy and the water-filling parameter. Such a computation might be prohibitive on a simple mobile device. Indeed, computing the average power consumption even for a fixed strategy requires considering all possible realizations of the states of all channels, which takes $O((\#C)^N)$ operations. In the next section, we propose a simple algorithm that the user can run while exploring and exploiting the spectrum resources and that provably converges to the optimal probing and power control strategy.
In each slot, the user has to perform $O(N)$ operations to make its probing and power control decisions. The price for reducing the complexity is the time it takes for the algorithm to converge.

IV. OPTIMAL ON-LINE STRATEGY

We now provide an on-line algorithm that provably converges to the optimal joint probing and power control strategy. The algorithm may be interpreted as a multiple-timescale stochastic approximation algorithm. We first describe the algorithm, and then prove its convergence.

A. Stochastic learning algorithm

The algorithm seeks to solve $\min_\mu\max_\nu G(\sigma_\nu,\mu)$. At each slot, the parameter $\mu$, which defines the power allocation obtained through water-filling, is updated. The probing strategy $\nu$ is also updated at each slot so as to maximize $G(\sigma_\nu,\mu)$. The latter update is performed using the analysis presented in Section III-D. The update of $\mu$ is done so that $\mu$ converges to $\mu^\star$, the solution of $\partial G/\partial\mu=0$, which is equivalent to requiring that the average power consumption under $p_\mu(\cdot)$ be exactly $\bar{P}$. Formally, the algorithm maintains two random variables: the power allocation parameter $\mu_n\in\mathbb{R}_+$ in slot $n$, and $P_n\in\mathbb{R}_+$, the empirical average power consumed until slot $n$. The algorithm operates as follows.

Algorithm 1
1) In the $n$-th slot, run the probing strategy $\nu_n\in\arg\max_\nu G(\sigma_\nu,\mu_n)$ and the power allocation $p_{\mu_n}(\cdot)$;
2) At the end of slot $n$:
(i) Observe $\gamma_{n+1}$, the transmission power during slot $n$, and update $P_n$ as:
$$P_{n+1}=P_n+a_n(\gamma_{n+1}-P_n); \quad (9)$$
(ii) Update $\mu_n$ as:
$$\mu_{n+1}=\mu_n+b_n(P_{n+1}-\bar{P}). \quad (10)$$

The step-size sequences $(a_n)$ and $(b_n)$ are chosen such that $\sum_n a_n=\sum_n b_n=\infty$ and $\sum_n a_n^2<\infty$, $\sum_n b_n^2<\infty$. Note that in principle, we should update the parameter $\mu_n$ as a function of the actual average power consumption under the optimal strategy for the power allocation parametrized by $\mu_n$. This average power cannot be observed in a single slot, so we need to impose that the update of $\mu_n$ be much slower than that of $P_n$; in other words, we require that $b_n/a_n\to 0$ as $n\to\infty$.
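The two-timescale updates (9) and (10) of Algorithm 1 can be sketched as follows for i.i.d. channels. This is an illustrative sketch under several assumptions that are not part of the original text: a finite gain distribution, a rate $\log(1+cp/N_0)$ with water-filling power $[1/\mu-N_0/c]^+$, step sizes $a_n=n^{-0.8}$ and $b_n=1/n$, and a projection of $\mu_n$ onto $[\mu_{\min},\infty)$ added only to keep the water level finite. All names are hypothetical.

```python
import math
import random

def waterfill_power(c, mu, n0=1.0):
    """Assumed water-filling power [1/mu - n0/c]^+ (0 if nothing was probed)."""
    return max(0.0, 1.0 / mu - n0 / c) if c > 0 else 0.0

def reward(c, mu, p_bar, n0=1.0):
    """G(c) = R(c, p_mu(c)) + mu * (p_bar - p_mu(c)) with an assumed log rate."""
    p = waterfill_power(c, mu, n0)
    return math.log(1.0 + c * p / n0) + mu * (p_bar - p)

def run_slot(mu, p_bar, n_channels, beta, states, probs, rng):
    """Probe with the one-step look-ahead rule, then return the slot power."""
    best, k = 0.0, 0  # best = 0.0 encodes "no channel probed yet"
    while k < n_channels:
        g_tr = (1.0 - k * beta) * reward(best, mu, p_bar)
        g_pr_tr = (1.0 - (k + 1) * beta) * sum(
            q * reward(max(best, c), mu, p_bar) for c, q in zip(states, probs))
        if g_pr_tr <= g_tr:
            break  # stopping is at least as good as probing once more
        best = max(best, rng.choices(states, probs)[0])
        k += 1
    # Power is spent only during the remaining fraction (1 - k*beta) of the slot.
    return (1.0 - k * beta) * waterfill_power(best, mu)

def learn(p_bar, n_channels, beta, states, probs, n_slots,
          seed=0, mu0=1.0, mu_min=1e-3):
    """Run updates (9) and (10); returns the final (mu_n, P_n)."""
    rng = random.Random(seed)
    mu, p_avg = mu0, 0.0
    for n in range(1, n_slots + 1):
        gamma = run_slot(mu, p_bar, n_channels, beta, states, probs, rng)
        a_n, b_n = n ** -0.8, 1.0 / n              # b_n / a_n -> 0
        p_avg += a_n * (gamma - p_avg)             # fast timescale, eq. (9)
        mu = max(mu_min, mu + b_n * (p_avg - p_bar))  # slow timescale, eq. (10)
    return mu, p_avg
```

A call such as `learn(1.0, 4, 0.05, [0.5, 1.0, 2.0, 4.0], [0.25] * 4, 2000)` returns the current water-filling parameter and the running power estimate. If the empirical power exceeds the budget, $\mu_n$ increases, which lowers the water level $1/\mu_n$ and hence the transmit power.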
Note also that Algorithm 1 is easy to implement, because Step 1 only requires implementing $\nu_n$, which has been completely characterized in Section III-D; this requires $O(N)$ operations, since in the worst case we probe all channels.

B. Convergence analysis

We prove that Algorithm 1 converges to the optimal probing and power allocation strategy, i.e., the long-term throughput is optimized while satisfying the power constraint.

Theorem 4: Under Algorithm 1, we have almost surely: $\mu_n\to\mu^\star$ and $\nu_n\to\nu^\star$ as $n\to\infty$.

Proof. The updates in Algorithm 1 can be written as:
$$P_{n+1}=P_n+a_n\big(E[\gamma_{n+1}\mid\mathcal{F}_n]-P_n+M^1_{n+1}\big),$$
$$\mu_{n+1}=\mu_n+b_n\big(E[P_{n+1}\mid\mathcal{F}_n]-\bar{P}+M^2_{n+1}\big),$$
where the $\sigma$-algebra $\mathcal{F}_n=\sigma(P_m,\mu_m,\ m\le n)$ represents the past up to slot $n$, and $M^1_n$ and $M^2_n$ are martingale difference sequences defined by:
$$M^1_{n+1}=\gamma_{n+1}-E[\gamma_{n+1}\mid\mathcal{F}_n],\qquad M^2_{n+1}=P_{n+1}-E[P_{n+1}\mid\mathcal{F}_n].$$
Note that the average power $E[\gamma_{n+1}\mid\mathcal{F}_n]$ observed in slot $n$ depends on the past only through the parameter $\mu_n$; hence we can define a function $g(\cdot,\cdot)$ such that $g(\mu_n,P_n)=E[\gamma_{n+1}\mid\mathcal{F}_n]-P_n$. Similarly, $E[P_{n+1}\mid\mathcal{F}_n]$ depends on the past through $P_n$ and $\mu_n$ only, and there exists a function $h(\cdot,\cdot)$ such that $h(\mu_n,P_n)=E[P_{n+1}\mid\mathcal{F}_n]-\bar{P}$. Hence the updates in Algorithm 1 become:
$$P_{n+1}=P_n+a_n\big(g(\mu_n,P_n)+M^1_{n+1}\big),$$
$$\mu_{n+1}=\mu_n+b_n\big(h(\mu_n,P_n)+M^2_{n+1}\big).$$
These are the equations of a stochastic approximation algorithm with two time-scales, as considered in [4], Chapter 6. It can then be shown that $h$ and $g$ are Lipschitz. The conditions needed to apply the results of [4] are therefore met, and we deduce

that Algorithm 1 converges. Since the unique equilibrium point of Algorithm 1 is the point at which the power consumption is exactly $\bar{P}$ and an optimal probing strategy is used, Theorem 4 is proved.

V. MULTI-CHANNEL TRANSMISSIONS

So far, we have assumed that a user may access $N$ channels but transmits on only one of them at a time. Here, we extend the analysis to the case where the user can transmit on several channels simultaneously, provided that these channels have been probed. We assume that the various channels are orthogonal, so that concurrent transmissions on different channels do not interfere with each other. The decision problem that the user faces is similar to that investigated previously, except that when the user decides to stop and transmit, it has to decide the transmission power on each of the probed channels. Note that the user may decide not to transmit at all on a given channel by allocating zero power to this channel. As before, the user's objective is to maximize its throughput. The analysis of this problem uses methods similar to those developed in Sections III and IV.

A. Problem formulation

We first define the space of probing and power allocation strategies in the case of possible multi-channel transmissions, and then state the throughput maximization problem.

Definition 3: A joint probing and power control strategy $\tilde{\pi}$ is a mapping from the set of states $S$ to the set $\mathcal{P}(\{0,1,\ldots,N\}\times\mathbb{R}_+^N)$, i.e., in every state $s$, $\tilde{\pi}$ chooses a pair $(i,p)$ randomly according to the distribution $\tilde{\pi}(s)$. If $i>0$, then the user probes channel $i$, observes its state $c_i$, and the system state changes to $s'$, where $s'_j=s_j$ for $j\neq i$ and $s'_i=c_i$. If $i=0$, the user stops probing and starts transmitting on each channel $j$ with power $p_j$. In state $s$, let $(i,p)$ be the decision made under $\tilde{\pi}$. Then, we impose the following restrictions on the decisions of $\tilde{\pi}$:
(a) If $i>0$, $\tilde{\pi}$ probes channel $i$, which means that $i$ has not been probed earlier, i.e., $s_i=-1$.
(b) If $i=0$, then under $\tilde{\pi}$, the user stops and transmits. To ensure that it transmits on probed channels only, we impose $p_j>0$ only if $s_j\in C$.

The set of joint probing and power allocation strategies satisfying (a) and (b) is denoted by $\tilde{\Pi}$. As before, for any $\tilde{\pi}\in\tilde{\Pi}$, we define the associated occupation measure $\rho_{\tilde{\pi}}(\cdot)$ and the terminal state distribution $\sigma_{\tilde{\pi}}(\cdot)$. Using these, we can compute the throughput and the average power under $\tilde{\pi}$:
$$\tilde{T}(\tilde{\pi})=\int_{S\times\mathbb{R}_+^N}d\rho_{\tilde{\pi}}(s,p)\,(1-k(s)\beta)\sum_{j=1}^{N}R(s_j,p_j), \quad (11)$$
$$\tilde{P}(\tilde{\pi})=\int_{S\times\mathbb{R}_+^N}d\rho_{\tilde{\pi}}(s,p)\,(1-k(s)\beta)\sum_{j=1}^{N}p_j. \quad (12)$$
We seek to solve the following optimization problem:

$(\tilde{O}1)$ Find $\tilde{\pi}\in\tilde{\Pi}$ maximizing $\tilde{T}(\tilde{\pi})$ subject to $\tilde{P}(\tilde{\pi})\le\bar{P}$.

B. Optimal power allocation and saddle point interpretation

We first provide structural properties of the optimal power allocations that simplify problem $(\tilde{O}1)$. First, using the concavity of the rate function $R(\cdot,\cdot)$ in power, we can reproduce the proof of Lemma 1 and prove that we may restrict our attention to deterministic power allocations. We define $\tilde{\Pi}_1$ as the set of strategies having deterministic power allocations, and denote by $p_{\tilde{\pi}}(s)$ the power allocation vector chosen under strategy $\tilde{\pi}\in\tilde{\Pi}_1$ in state $s\in S$.

Next, we identify the optimal power allocation for a given terminal state distribution $\sigma\in\mathcal{P}(S)$. The power allocation of a strategy $\tilde{\pi}$ in $\tilde{\Pi}_1$ whose terminal state distribution is $\sigma$ is simply represented as a function $p:S\to\mathbb{R}_+^N$, and the couple $(\sigma,p(\cdot))$ uniquely defines the throughput and the average power consumption:
$$\tilde{T}(\tilde{\pi})=\tilde{T}(\sigma,p)=\sum_{s\in S}\sigma(s)\,(1-k(s)\beta)\sum_{j=1}^{N}R(s_j,p_j(s)),$$
$$\tilde{P}(\tilde{\pi})=\tilde{P}(\sigma,p)=\sum_{s\in S}\sigma(s)\,(1-k(s)\beta)\sum_{j=1}^{N}p_j(s).$$
We solve:

$(\tilde{P}_\sigma)$ Find $p(\cdot)$ maximizing $\tilde{T}(\sigma,p)$ subject to $\tilde{P}(\sigma,p)\le\bar{P}$.

The above problem is convex, with associated Lagrangian:
$$\tilde{L}_\sigma(p(\cdot),\mu)=\sum_{s\in S}\sigma(s)\,(1-\beta k(s))\sum_{j=1}^{N}\big[R(s_j,p_j(s))-\mu p_j(s)\big]+\mu\bar{P}.$$
We can then easily show that the power allocation maximizing the Lagrangian is again obtained through water-filling with parameter $\mu$.
Note that here the water-filling is performed over time and channels, i.e., for any state $s\in S$, the optimal power allocation is $p_\mu(s)$ with, for any $j\in\{1,\ldots,N\}$:
$$p_{\mu,j}(s)=0\ \text{ if } s_j=-1;\qquad p_{\mu,j}(s)=\Big[\frac{1}{\mu}-\frac{N_0}{s_j}\Big]^+=p_\mu(s_j)\ \text{ if } s_j\in C.$$
Hence we can restrict our attention to strategies within the set $\tilde{\Pi}_2$ of strategies whose power allocations are obtained through water-filling over time and channels. Any strategy $\tilde{\pi}\in\tilde{\Pi}_2$ can be represented as a couple $(\nu,\mu)\in V\times\mathbb{R}_+$, where $\nu$ is a probing strategy satisfying $\sigma_{\tilde{\pi}}=\sigma_\nu$ and $\mu$ is the time-channel water-filling parameter of the power allocation. Now for any $(\sigma,\mu)\in\mathcal{P}(S)\times\mathbb{R}_+$, define $\tilde{G}(\sigma,\mu)$ as:
$$\tilde{G}(\sigma,\mu)=\max_{p(\cdot)}\tilde{L}_\sigma(p(\cdot),\mu)=\sum_{s\in S}\sigma(s)\,(1-k(s)\beta)\sum_{j:\,s_j\in C}R(s_j,p_\mu(s_j))+\mu\Big(\bar{P}-\sum_{s\in S}\sigma(s)\,(1-k(s)\beta)\sum_{j:\,s_j\in C}p_\mu(s_j)\Big).$$
Then, as in Theorem 1, it can be shown that an optimal probing and power allocation strategy is a couple $(\nu^\star,\mu^\star)\in\tilde{\Pi}_2$ that solves the following strong max-min condition:
$$\max_{\nu\in V}\min_{\mu}\tilde{G}(\sigma_\nu,\mu)=\min_{\mu}\max_{\nu\in V}\tilde{G}(\sigma_\nu,\mu).$$
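For a fixed terminal state distribution $\sigma$, the time-channel water-filling above can be illustrated numerically. The sketch below is only a hedged example: it represents a state as a tuple whose un-probed entries are `None` (instead of $-1$), takes $\sigma$ as a dictionary from states to probabilities, and finds by bisection the parameter $\mu$ for which the average power meets the budget $\bar{P}$. It solves $(\tilde{P}_\sigma)$ for a given $\sigma$ only, not the joint problem in which $\sigma$ depends on $\mu$ through the probing strategy. All names are hypothetical.

```python
import math

def waterfill_vector(state, mu, n0=1.0):
    """Per-channel powers: 0 on un-probed channels, [1/mu - n0/c]^+ otherwise."""
    return [0.0 if (c is None or c <= 0) else max(0.0, 1.0 / mu - n0 / c)
            for c in state]

def average_power(sigma, mu, beta, n0=1.0):
    """Sum over terminal states of sigma(s) * (1 - k(s)*beta) * total power."""
    total = 0.0
    for state, prob in sigma.items():
        k = sum(c is not None for c in state)  # number of probed channels
        total += prob * (1.0 - k * beta) * sum(waterfill_vector(state, mu, n0))
    return total

def solve_mu(sigma, p_bar, beta, lo=1e-6, hi=1e6, iters=200):
    """Bisection (on a log scale) for the mu that meets the power budget.

    average_power is non-increasing in mu, so the root is bracketed as long
    as the budget is exceeded at `lo` and satisfied at `hi`.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if average_power(sigma, mid, beta) > p_bar:
            lo = mid   # too much power: raise mu (lower the water level)
        else:
            hi = mid
    return hi
```

For instance, with `sigma = {(2.0, None): 0.5, (1.0, 4.0): 0.5}`, `beta = 0.1` and a budget of 1, `solve_mu` returns a value of $\mu$ for which `average_power` equals the budget up to numerical precision.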

Finally, $\tilde{\pi}\in\tilde{\Pi}$ solves $(\tilde{O}1)$ if and only if $\tilde{\pi}=(\nu^\star,\mu^\star)\in\tilde{\Pi}_2$ with:
1) $\nu^\star=\arg\max_{\nu\in V}\tilde{G}(\sigma_\nu,\mu^\star)$,
2) $\tilde{P}(\sigma_{\nu^\star},\mu^\star)=\bar{P}$,
where for any $(\nu,\mu)$, $\tilde{P}(\sigma_\nu,\mu)$ denotes the average power consumption under strategy $(\nu,\mu)$:
$$\tilde{P}(\sigma_\nu,\mu)=\sum_{s\in S}\sigma_\nu(s)\,(1-k(s)\beta)\sum_{j:\,s_j\in C}p_\mu(s_j).$$

C. Structure of optimal probing strategies

We fix the power control to be a water-filling power allocation with parameter $\mu$. For this power control, we obtain the optimal probing strategy $\nu^\star$. For any state $s\in S$, define $A_s$ as the set of probed channels in state $s$, and $\bar{A}_s$ as the set of un-probed channels. Also define:
$$\tilde{G}(s)=\sum_{j\in A_s}G(s_j),\quad\text{where } G(c)=R(c,p_\mu(c))-\mu p_\mu(c).$$
Now, consider the system to be in state $s$. If a probing strategy terminates in $s$, then the total reward received is $\tilde{G}^{tr}(s)=(1-k(s)\beta)\,\tilde{G}(s)$. If the strategy decides to probe further, say channel $i$, then the system state changes from $s$ to $s(i)$, where $s(i)$ satisfies $A_{s(i)}=A_s\cup\{i\}$. Let $\tilde{G}^\star(s)$ denote the maximum expected reward starting from state $s$. Then, we can characterize $\tilde{G}^\star(\cdot)$ recursively, starting from the state $s^0=(-1,\ldots,-1)$, using Bellman's equation:
$$\tilde{G}^\star(s)=\max\Big\{\tilde{G}^{tr}(s),\ \max_{i\in\bar{A}_s}E_i[\tilde{G}^\star(s(i))]\Big\}.$$
In order to characterize $\nu^\star$, let us define the following term, which gives the maximum expected reward that can be obtained by probing exactly one additional channel:
$$\tilde{G}^{pr,tr}(s)=(1-(k(s)+1)\beta)\Big[\tilde{G}(s)+\max_{i\in\bar{A}_s}E_i[G(C_i)]\Big].$$

1) Optimal stopping rule: We now characterize the states in which an optimal probing strategy terminates:

Theorem 5: The optimal probing strategy $\nu^\star$ terminates in state $s$ if and only if $\tilde{G}^{tr}(s)\ge\tilde{G}^{pr,tr}(s)$.

2) Optimal channel probing order: We now fully characterize $\nu^\star$ by obtaining an optimal channel probing order.

Theorem 6: Assume that the channels are stochastically ordered. Fix any state $s\in S$ such that
$$\tilde{G}^{tr}(s)<\tilde{G}^{pr,tr}(s). \quad (13)$$
Then, $\nu^\star$ probes the stochastically largest channel in $\bar{A}_s$.

The proofs of Theorems 5 and 6 are similar to those of Theorems 2 and 3.

D. An optimal on-line strategy

Again, computing $\tilde{\pi}^\star$ can be quite difficult (exponential complexity). As in the case where transmission was allowed on a single channel only, we can propose an on-line learning algorithm that provably converges to the optimal strategy. The algorithm is exactly the same as Algorithm 1, except that we use the strategy $\tilde{\nu}_n\in\arg\max_\nu\tilde{G}(\sigma_\nu,\mu_n)$ in slot $n$.

VI. SIMULATION RESULTS

We now illustrate the throughput gains achieved by an optimal probing and power control strategy using simulation. The $N$ channels are equivalent and experience Rayleigh fading, i.i.d. across slots. The results with heterogeneous channels follow similar trends and are omitted due to space constraints. We assume that β =.4. The optimal strategy $\pi^\star$ is compared to: (1) a genie-aided strategy, which assumes that the channel states are known at the beginning of each slot; (2) a fixed-power strategy $\pi_{fp}$, where an average power $\bar{P}$ is used in each slot (in the case of multi-channel transmissions, the power is evenly spread among probed channels), and where an optimal probing strategy, given this fixed power allocation, is used as determined in [17], [6]; (3) strategies $\pi_1$ and $\pi_N$, where one or all channels are probed, and where the optimal power allocation, given this probing strategy, is used. We compute $\pi^\star$ using learning Algorithm 1 (or its equivalent in multi-channel transmission scenarios), with parameters $a(n)=(1/n)^{0.8}$ and $b(n)=1/n$. We observe a convergence time (see footnote 1) for this algorithm that lies between 2 and 3 slots with up to 25 channels.

Figures 2(a) and (b) present results when transmitting on only one channel is allowed. Figure 2(a) shows the throughput of the various strategies as a function of $N$ for a fixed SNR = 1 dB. Comparing the throughput achieved with the genie-aided strategy and the others allows us to quantify the price of information; e.g.
for N channels, the loss in throughput due to lack of channel information is around 3%, but this loss grows as $N$ increases: it should scale as $\log\log(N)$ for large $N$, because when probing is required, the throughput remains bounded as $N$ grows large. Figure 2(b) shows the throughput gain of $\pi^\star$ over other strategies as a function of average SNR. Note that the throughput gain of $\pi^\star$ over $\pi_{fp}$ is negligible except at low SNR (the gain is 9% at -1 dB). The reason is that in the high SNR regime $\log(1+\mathrm{SNR})\approx\log(\mathrm{SNR})$. With this approximation, the optimal solution of $(P_\sigma)$ is $p(s)=\bar{P}$ for every $s\in S$ and every $\sigma(\cdot)$, i.e., constant power control is almost optimal. However, note that the gain is small even for moderate SNR values, e.g., the gain is 1% at 0 dB. Thus, when transmitting on only one channel is allowed, optimizing over the probing strategy is important, whereas optimizing over power control is not crucial.

In Figure 2(c), we give the throughput gain of $\pi^\star$ over other strategies in the case of multi-channel transmissions. The throughput gain is significantly higher than that observed in Figure 2(b). Note also that the gain over $\pi_{fp}$ is substantial (at least 9% for various values of SNR). Thus, to achieve good performance, it is imperative to optimize both the probing and the power control strategies, in contrast with the case of single-channel transmissions.

VII. RELATED WORK

The problem analyzed in this paper falls into the broad class of stochastic control problems [3], where an optimal exploration vs. exploitation trade-off has to be identified.

Footnote 1: By definition, the convergence time is the first time after which the achieved throughput remains within 5% of the maximum throughput.
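The convergence-time definition of footnote 1 can be made concrete with a short helper. This is only an illustrative sketch: the function name, the list-based throughput trace, and the convention of returning `None` when the trace never settles are assumptions, not part of the original simulations.

```python
def convergence_time(throughputs, max_throughput, tol=0.05):
    """First slot index after which the trace stays within tol of the maximum.

    throughputs    : achieved throughput per slot (hypothetical trace)
    max_throughput : reference maximum throughput
    Returns None if the last sample is still outside the tolerance band.
    """
    last_bad = -1
    for n, t in enumerate(throughputs):
        if abs(t - max_throughput) > tol * max_throughput:
            last_bad = n  # remember the latest slot outside the band
    t_conv = last_bad + 1
    return t_conv if t_conv < len(throughputs) else None
```

Scanning for the last violation, rather than the first slot inside the band, matches the wording "remains within 5%": a trace that enters the band and later leaves it has not yet converged.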

[Figure 2, three panels. (a) Throughput vs. number of channels (N), Avg. SNR = 1 dB; curves: Genie-aided, Optimal, Fixed-power, Probe-all, Probe-one. (b) Throughput gain (%) vs. Avg. SNR (dB), N = 15; curves: Fixed-power, Probe-all, Probe-one. (c) Throughput gain (%) vs. Avg. SNR (dB), N = 15; curves: Fixed-power, Probe-all, Probe-one.]

Fig. 2. Throughput and throughput gains in the case of single-channel transmissions (a) and (b) and multi-channel transmissions (c).

However, as already noticed in [11], it does not correspond to any of the classical control problems, such as multi-armed bandit, stopping time, or optimal sampling problems. Indeed, in the various versions of the multi-armed bandit problem, sampling an arm (here a channel) is not allowed before exploiting it. Note that the authors of [15] propose a model for opportunistic spectrum access where, in each slot, the user chooses a channel and tries to transmit on it without acquiring its state. This model actually corresponds to the restless multi-armed bandit problem [18]. Our problem cannot be seen as a stopping time problem [7], because here, in addition to the decision to probe further or to stop and transmit, the user has to select which channel to probe next, or at which power to transmit. It would become a stopping time problem if the channels were statistically equivalent and if transmissions were made at a fixed power, e.g., as in [17]. Finally, our problem is not an optimal sampling problem, where one seeks the optimal order in which random variables should be sampled [14], since this kind of model does not allow for exploitation. The design of optimal probing and channel selection strategies has only recently been studied [13], [10], [12], [11], [5], [6], but most often under the assumptions that (i) the channel states are identically distributed and (ii) power control is not taken into account.
In [6], the authors manage to relax assumption (i), but to our knowledge, the present work is the first to consider joint probing and power control strategies.

VIII. CONCLUSION

We have considered a case where a user can access many channels for data transmission, but needs to acquire CSI to use them effectively. Acquiring CSI consumes resources, thereby reducing the resources remaining for actual data transmission. In such systems, we have designed a probing and power control strategy that maximizes the throughput. The optimal strategy is computationally simple, but can be computed only through an iterative learning algorithm. We have shown that the iterative procedure converges to the optimal policy. Key insights obtained from the numerical experiments are: (a) when a user can transmit only on a single channel, the gain through power adaptation is limited, i.e., constant power allocation with an optimal probing strategy provides near-optimal performance; (b) when a user can transmit on multiple channels simultaneously, the throughput gain through intelligent power allocation is significant (more than 9%). Hence, it is of paramount importance to use joint probing and power control to optimally exploit the available resources. Note that cognitive radio is one of the most important examples of systems in which a user can simultaneously transmit on multiple channels after acquiring CSI.

REFERENCES

[1] E. Altman. Constrained Markov Decision Processes. Chapman and Hall/CRC, 1999.
[2] P. Bender, P. Black, M. Grob, R. Padovani, N. Sindhushayana, A. Viterbi. CDMA/HDR: a bandwidth-efficient high-speed wireless data service for nomadic users. IEEE Commun. Mag., vol. 28, pp. 70-77, 2000.
[3] D. Bertsekas. Dynamic Programming and Optimal Control, 3rd edition. Athena Scientific, 2007.
[4] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Hindustan Book Agency (Cambridge University Press), 2008.
[5] N. Chang, M. Liu. Optimal channel probing and transmission scheduling for opportunistic spectrum access. In Proc. of ACM MobiCom, 2007.
[6] P. Chaporkar, A. Proutiere. Optimal Joint Probing and Transmission Strategy for Maximizing Throughput in Wireless Systems. IEEE J. on Selected Areas in Commun., vol. 26, no. 18, pp. 1546-1556, Oct. 2008.
[7] Y.S. Chow, H. Robbins, D. Siegmund. Great Expectations: The Theory of Optimal Stopping. Houghton Mifflin Company, 1971.
[8] K. Etemad. Overview of Mobile WiMAX technology and evolution. IEEE Commun. Mag., pp. 31-40, Oct. 2008.
[9] A. Goldsmith, P. Varaiya. Capacity of fading channels with channel side information. IEEE Trans. Inform. Theory, vol. 43, pp. 1986-1992, Nov. 1997.
[10] S. Guha, K. Munagala, S. Sarkar. Jointly optimal transmission and probing strategies for multichannel wireless systems. In Proc. of CISS, 2006.
[11] S. Guha, K. Munagala, S. Sarkar. Approximation Schemes for Information Acquisition and Exploitation in Multichannel Wireless Networks. In Proc. of Allerton Conf. on Commun., Control and Computing, 2006.
[12] S. Guha, K. Munagala, S. Sarkar. Optimizing Transmission Rate in Wireless Channels using Adaptive Probes. Poster paper in ACM Sigmetrics/Performance Conference, 2006.
[13] Z. Ji, Y. Yang, J. Zhou, M. Takai, R. Bagrodia. Exploiting medium access diversity in rate adaptive wireless LANs. In Proc. of ACM MobiCom, 2004.
[14] M. Kodialam. The throughput of sequential testing. Lecture Notes in Comput. Sci., vol. 2081, pp. 280-292, 2001.
[15] L. Lai, H. El Gamal, H. Jiang, H. V. Poor. Cognitive Medium Access: Exploration, Exploitation and Competition. Submitted to IEEE ToN, Oct. 2007.
[16] H. Robbins. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc., vol. 55, pp. 527-535, 1952.
[17] A. Sabharwal, A. Khoshnevis, E. Knightly. Opportunistic spectral usage: Bounds and multi-band CSMA/CA protocol. ACM/IEEE Trans. on Networking, vol. 15, no. 3, 2007.
[18] P. Whittle. Restless bandits: Activity allocation in a changing world. In: A Celebration of Applied Probability, J. Gani (Ed.), J. Appl. Probab. Spec., vol. 25, pp. 287-298, 1988.