Dynamic Rate and Channel Selection in Cognitive Radio Systems

Similar documents
Case I: 2 users In case of 2 users, the probability of error for user 1 was earlier derived to be 2 A1

Complexity of Regularization RBF Networks

Millennium Relativity Acceleration Composition. The Relativistic Relationship between Acceleration and Uniform Motion

The Effectiveness of the Linear Hull Effect

A Queueing Model for Call Blending in Call Centers

Simplification of Network Dynamics in Large Systems

Optimization of Statistical Decisions for Age Replacement Problems via a New Pivotal Quantity Averaging Approach

Methods of evaluating tests

CMSC 451: Lecture 9 Greedy Approximation: Set Cover Thursday, Sep 28, 2017

Probabilistic Graphical Models

Sensitivity Analysis in Markov Networks

A model for measurement of the states in a coupled-dot qubit

23.1 Tuning controllers, in the large view Quoting from Section 16.7:

Hankel Optimal Model Order Reduction 1

Reliability Guaranteed Energy-Aware Frame-Based Task Set Execution Strategy for Hard Real-Time Systems

Remark 4.1 Unlike Lyapunov theorems, LaSalle s theorem does not require the function V ( x ) to be positive definite.

Non-Orthogonal Random Access for 5G Mobile Communication Systems

Lightpath routing for maximum reliability in optical mesh networks

Directional Coupler. 4-port Network

Assessing the Performance of a BCI: A Task-Oriented Approach

Frequency hopping does not increase anti-jamming resilience of wireless channels

Subject: Introduction to Component Matching and Off-Design Operation % % ( (1) R T % (

max min z i i=1 x j k s.t. j=1 x j j:i T j

A Characterization of Wavelet Convergence in Sobolev Spaces

Control Theory association of mathematics and engineering

Modeling Probabilistic Measurement Correlations for Problem Determination in Large-Scale Distributed Systems

Phase Diffuser at the Transmitter for Lasercom Link: Effect of Partially Coherent Beam on the Bit-Error Rate.

A simple expression for radial distribution functions of pure fluids and mixtures

Convergence of reinforcement learning with general function approximators

Likelihood-confidence intervals for quantiles in Extreme Value Distributions

Capacity-achieving Input Covariance for Correlated Multi-Antenna Channels

Determination of the reaction order

Variation Based Online Travel Time Prediction Using Clustered Neural Networks

Coding for Random Projections and Approximate Near Neighbor Search

A Spatiotemporal Approach to Passive Sound Source Localization

Wave Propagation through Random Media

3 Tidal systems modelling: ASMITA model

Combined Electric and Magnetic Dipoles for Mesoband Radiation, Part 2

Chapter 8 Hypothesis Testing

DIGITAL DISTANCE RELAYING SCHEME FOR PARALLEL TRANSMISSION LINES DURING INTER-CIRCUIT FAULTS

Connectivity and Blockage Effects in Millimeter-Wave Air-To-Everything Networks

Robust Recovery of Signals From a Structured Union of Subspaces

7 Max-Flow Problems. Business Computing and Operations Research 608

Chapter Review of of Random Processes

Wavetech, LLC. Ultrafast Pulses and GVD. John O Hara Created: Dec. 6, 2013

Resolving RIPS Measurement Ambiguity in Maximum Likelihood Estimation

Maximum Entropy and Exponential Families

arxiv: v2 [math.pr] 9 Dec 2016

Evaluation of effect of blade internal modes on sensitivity of Advanced LIGO

Quantum secret sharing without entanglement

Product Policy in Markets with Word-of-Mouth Communication. Technical Appendix

Complementarities in Spectrum Markets

LECTURE NOTES FOR , FALL 2004

On Industry Structure and Firm Conduct in Long Run Equilibrium

A NETWORK SIMPLEX ALGORITHM FOR THE MINIMUM COST-BENEFIT NETWORK FLOW PROBLEM

An Integrated Architecture of Adaptive Neural Network Control for Dynamic Systems

Estimating the probability law of the codelength as a function of the approximation error in image compression

Measuring & Inducing Neural Activity Using Extracellular Fields I: Inverse systems approach

IMPEDANCE EFFECTS OF LEFT TURNERS FROM THE MAJOR STREET AT A TWSC INTERSECTION

Singular Event Detection

Nonreversibility of Multiple Unicast Networks

Relativity in Classical Physics

Analysis of discretization in the direct simulation Monte Carlo

Relativistic Dynamics

EECS 120 Signals & Systems University of California, Berkeley: Fall 2005 Gastpar November 16, Solutions to Exam 2

On the Bit Error Probability of Noisy Channel Networks With Intermediate Node Encoding I. INTRODUCTION

UTC. Engineering 329. Proportional Controller Design. Speed System. John Beverly. Green Team. John Beverly Keith Skiles John Barker.

Danielle Maddix AA238 Final Project December 9, 2016

Word of Mass: The Relationship between Mass Media and Word-of-Mouth

Maximum Likelihood Multipath Estimation in Comparison with Conventional Delay Lock Loops

An Adaptive Optimization Approach to Active Cancellation of Repeated Transient Vibration Disturbances

Supplementary Materials

Battery Sizing for Grid Connected PV Systems with Fixed Minimum Charging/Discharging Time

UPPER-TRUNCATED POWER LAW DISTRIBUTIONS

Average Rate Speed Scaling

Calibration of Piping Assessment Models in the Netherlands

Supporting Information

On the Licensing of Innovations under Strategic Delegation

Designing Dynamic Contests

QCLAS Sensor for Purity Monitoring in Medical Gas Supply Lines

The Laws of Acceleration

Normative and descriptive approaches to multiattribute decision making

Sensor Network Localisation with Wrapped Phase Measurements

Sensitivity of Spectrum Sensing Techniques to RF impairments

Sensor management for PRF selection in the track-before-detect context

Sensitivity analysis for linear optimization problem with fuzzy data in the objective function

10.2 The Occurrence of Critical Flow; Controls

ONLINE APPENDICES for Cost-Effective Quality Assurance in Crowd Labeling

Orchestrating Massively Distributed CDNs

Lecture 7: Sampling/Projections for Least-squares Approximation, Cont. 7 Sampling/Projections for Least-squares Approximation, Cont.

Lecture Notes 4 MORE DYNAMICS OF NEWTONIAN COSMOLOGY

Spatial Degrees of Freedom of Large Distributed MIMO Systems and Wireless Ad hoc Networks

REFINED UPPER BOUNDS FOR THE LINEAR DIOPHANTINE PROBLEM OF FROBENIUS. 1. Introduction

PoS(ISCC 2017)047. Fast Acquisition for DS/FH Spread Spectrum Signals by Using Folded Sampling Digital Receiver

SUPPORTING ultra-reliable and low-latency communications

An I-Vector Backend for Speaker Verification

CONDITIONAL CONFIDENCE INTERVAL FOR THE SCALE PARAMETER OF A WEIBULL DISTRIBUTION. Smail Mahdi

An Optimization Framework for Practical Multipath Routing in Wireless Mesh Networks

Cross-layer Optimization for Ultra-reliable and Low-latency Radio Access Networks

Taste for variety and optimum product diversity in an open economy

Transcription:

1 Dynami Rate and Channel Seletion in Cognitive Radio Systems R. Combes and A. Proutiere KTH, The Royal Institute of Tehnology arxiv:1402.5666v2 [s.it] 12 May 2014 Abstrat In this paper, we investigate dynami hannel and rate seletion in ognitive radio systems whih exploit a large number of hannels free from primary users. In suh systems, transmitters may rapidly hange the seleted (hannel, rate) pair to opportunistially learn and trak the pair offering the highest throughput. We formulate the problem of sequential hannel and rate seletion as an online optimization problem, and show its equivalene to a strutured Multi-Armed Bandit problem. The struture stems from inherent properties of the ahieved throughput as a funtion of the seleted hannel and rate. We derive fundamental performane limits satisfied by any hannel and rate adaptation algorithm, and propose algorithms that ahieve (or approah) these limits. In turn, the proposed algorithms optimally exploit the inherent struture of the throughput. We illustrate the effiieny of our algorithms using both test-bed and simulation experiments, in both stationary and non-stationary radio environments. In stationary environments, the paket suessful transmission probabilities at the various hannel and rate pairs do not evolve over time, whereas in non-stationary environments, they may evolve. In pratial senarios, the proposed algorithms are able to trak the best hannel and rate quite aurately without the need of any expliit measurement and feedbak of the quality of the various hannels. I. INTRODUCTION In ognitive radio systems, radio devies may aess a potentially large number of frequeny bands or hannels. An example of suh systems are those exploiting white spae spetrum, the unused part of the TV/UHF spetrum (unalloated or not used loally). The FCC 2008 ruling allowed unliensed devies to use parts of this spetrum, provided that devies an detet primary users (TV transmitters and wireless mirophones). As a part of the 2010 ruling [1], FCC mandates the use of a geoloation database to identify whih frequenies are free from primary users. By querying the geoloation database, we are guaranteed to obtain a set of hannels free from primary transmitters and we avoid the diffiult problem of sensing primary users. We onsider systems exploiting hannels known to be free from primary users. For the transmission of eah paket, transmitters an selet a oding rate from a finite predefined set (as in 802.11 systems for example) and a hannel from the set of available hannels. The outome of a paket transmission is random, and the probabilities of suessfully transmitting a paket using the various (hannel, rate) pairs are a priori unknown at the transmitter; they need to be learnt based on trial and error. These probabilities an vary signifiantly and randomly over time and aross hannels; they also strongly depend on the hosen oding rate. As a onsequene, traking the best (hannel, oding rate) pair for transmission may greatly improve the system performane. In this paper, we aim at designing sequential hannel and oding rate seletion shemes that effiiently trak the best available hannel and the orresponding oding rate. As shown in previous works, see e.g. [2], [3], RSSI (Reeive Signal Strength Indiator) is a poor preditor of hannel quality, and hene of the paket suessful transmission probabilities. In OFDM systems for example, this stems from the fat that RSSI does not report the individual signal strength experiened on the various sub-arriers. In order to aurately estimate the quality of a wide-band hannel, more sophistiated tehniques with speifi hardware are needed [4], [5]. But these tehniques are not typially supported in urrent ommerial radio hardware. Instead, we need to infer the quality of eah hannel at eah transmission rate through probing. Here we onsider 802.11-like systems, where the only feedbak sent from the reeiver to the transmitter is whether a data paket has been suessfully reeived or not. Hene by probing, we mean that several real data pakets have to be sent on eah hannel and at eah rate to onstrut a reliable estimate of the hannel quality. In the design of hannel and rate seletion shemes, we then fae a lassial exploration vs. exploitation trade-off problem. We need to exploit the (hannel, rate) pair that has offered the best throughput so far, whilst onstantly exploring other pairs in ase one of them is atually optimal. We rigorously formulate the design of the optimal sequential (hannel, rate) seletion algorithms as an online stohasti optimization problem. In this problem,

2 the objetive is to maximize the number of pakets suessfully sent over a finite time horizon. We show that this problem redues to a Multi-Armed Bandit (MAB) problem [6]. In MAB problems, a deision maker sequentially selets an ation (also alled an arm ), and observes the orresponding reward. Rewards of a given arm are random variables with unknown distribution. The objetive is to design sequential ation seletion strategies that maximize the expeted reward over a given time horizon. These strategies have to ahieve an optimal trade-off between exploitation (ations that have provided high rewards so far have to be seleted) and exploration (sub-optimal ations have to be hosen so as to learn their average rewards). For our (hannel, rate) seletion problem, the various arms orrespond to the deisions available at the transmitter to send pakets, i.e., an arm orresponds to a hannel and a oding rate. When a (hannel, rate) pair is seleted for a paket transmission, the reward is equal to 1 if the transmission is suessful, and equal to 0 otherwise. The average suessful paket transmission probabilities at the various (hannel, rate) pairs are unknown, and have to be learnt. The MAB problem orresponding to the design of hannel and rate seletion mehanisms is referred to as a strutured MAB problem. It differs from lassial MAB problems. (i) First, the rewards assoiated with the various rates on a given hannel are stohastially orrelated, i.e., the outomes of transmissions at different rates are not independent: for example, if a transmission at a high rate is suessful, it would be also suessful at lower rates. (ii) Then, the average throughputs ahieved at various rates exhibit natural strutural properties. For a given hannel, the throughput is an unimodal funtion of the seleted rate. (iii) In addition, most often, on all hannels, the paket suessful transmission probabilities are lose to 1 at low rates, and abruptly derease to 0 as the rate inreases. This additional struture, referred to as graphial unimodality, allows us to predit the outomes of transmissions on various hannels. As we demonstrate, orrelations and (graphial) unimodality are instrumental in the design of hannel and rate seletion mehanisms, and an be exploited to learn and trak the best (hannel, rate) pair quikly and effiiently. Finally, note that most MAB problems onsider stationary environments, whih, for our problem, means that the suessful paket transmission probabilities for the different (hannel, rate) pairs do not vary over time. In pratie, the transmitter faes a non-stationary environment as these probabilities ould evolve over time. We onsider both stationary and non-stationary radio environments. In the ase of stationary environments, we derive an upper bound of the expeted reward that an be ahieved in strutured MAB problems. This provides a fundamental performane limit that any (hannel, rate) seletion algorithm annot exeed. This limit quantifies the inevitable performane loss due to the need to explore sub-optimal (hannel, rate) pairs. It also indiates the performane gains that an be ahieved by devising shemes that optimally exploit the orrelations and the strutural properties of the MAB problem. We present sequential (hannel, rate) seletion algorithms that optimally exploit the strutural properties of the problem: for our algorithms, we prove that the performane loss due to the need to explore sub-optimal (hannel, rate) pairs does not depend on the number of available rates. We also extend our algorithms to non-stationary radio environments. Finally, we evaluate the performane of the proposed algorithms using an offie white-spae testbed operating in the 500MHz-600MHz band, and simulation experiments. To ounowledge, the problem of sequential hannel and rate seletion has only been investigated in [3], where heuristi algorithms have been developed. In ontrast, we formulate and solve this problem rigorously, i.e., we provide fundamental upper performane bounds satisfied under any hannel and rate seletion algorithm and design optimal algorithms that math these bounds. Contributions. We formulate the design of (hannel, rate) seletion algorithms as an online optimization problem, and establish its equivalene to a strutured MAB problem. We derive a performane upper bound satisfied by any (hannel, rate) seletion algorithm, depending on the assumptions made on the struture of the problem three senarios with inreasing struture are onsidered: 1. no strutural assumption is made; 2. the throughput on eah hannel is a unimodal funtion of the rate; 3. the throughput is a graphially unimodal funtion of the hannel and rate. We also quantify the performane gains that one may ahieve by exploiting the strutural properties of the problem. We propose three (hannel, rate) seletion algorithms, one for eah of the above senarios, and analyze their performane in stationary radio environments. We prove that our algorithms optimally exploit the strutural properties of the throughputs. We briefly disuss the extensions of our algorithms to non-stationary radio environments. Finally, we evaluate the performane of our algorithms using simulation experiments. To this aim, we use artifiially generated traes, as well as traes

3 extrated from a white spae test-bed operating in the UHF radio spetrum. Paper organization. The next setion is devoted to the related work. In Setions III and IV, we formulate the problem of sequential seletion of hannel and rate seletion as a strutured bandit problem. Setion V presents fundamental upper performane bounds for this problem. In Setion VI, asymptotially optimal algorithms are proposed. Setion VII deals with non-stationary radio environments. Setion VIII presents numerial experiments to evaluate the performane of our algorithms. II. RELATED WORK First observe that the joint hannel and rate seletion problem is onsiderably more diffiult than deteting hannels with no primary users as onsidered in a lot of reent works, see e.g. [7], [8], [9], [10], [11], [12], [13]. In some of these papers, a MAB framework has been used to design primary user detetion algorithms. The presene or the absene of primary users just means that a hannel is either good or bad. When seleting both hannel and rate, the dimension of the problem beomes larger, and there are multiple and numerous possible hannel states. Primary users are not onsidered in our work, as we assume that transmitters an use a geoloation database to get a list of hannels free from primary users [1]. It should also be observed that most of the work on dynami spetrum aess onsiders stationary radio environments. In [9], [10] for example, the authors use lassial stohasti ontrol tehniques (Markov Deision Proesses) to sequentially selet a hannel for transmission. The underlying assumption is that the environment is stationary, i.e., the paket suessful transmission probabilities do not evolve over time. In this paper, both stationary and non-stationary radio environments are explored. Test-bed experiments atually suggest that the environment is non-stationary in pratie, even in networks where nodes do not move suh as indoor offies, see e.g. [3]. Our problem resembles the rate adaptation problem in 802.11 systems, see e.g. [14], [15], [16]. But again, our problem has one additional dimension (a hannel has to be seleted): in turn, the number of available deisions at the transmitter is muh larger than in 802.11 systems where only the rate has to be hosen. Rate adaptation algorithms are not appliable when the hannel an also be seleted for eah paket transmission. This is due to the fat that the transmitter does not ontinuously monitor the same hannel (as in 802.11 systems), and has to swith hannels often to disover the best (hannel, rate) pair as rapidly as possible. There is an abundant literature on MAB problems, and engineers have applied these problems to dynami spetrum aess [8], [10], [11], [17]. Most existing theoretial results, see [18] for a reent survey, are onerned with unstrutured MAB problems, i.e., problems where the average reward assoiated with the various deisions are not related. For this kind of problems, Lai and Robbins [6] derived an asymptoti lower bound on regret and also designed optimal sequential deision algorithms. When the average rewards are strutured (as this is the ase for our problem), the design of optimal deision algorithms is more hallenging, see e.g. [18]. Non-stationary environments have not been extensively studied in the bandit literature: Most often unstrutured MAB only are analyzed, see [19], [20], [21]. To ounowledge, the only work dealing with joint (hannel, rate) seletion is [3]. However there, the strutural properties of the orresponding MAB problem had not been identified, and the authors only proposed algorithms based on heuristis. This ontrasts with the present work: we rigorously determine fundamental limits satisfied by any (hannel, rate) adaptation algorithm, and propose algorithms approahing these limits. III. MODELS We onsider a single link (a transmitter-reeiver pair). At time 0, the link beomes ative and the transmitter starts sending pakets to the reeiver. For eah paket, the transmitter selets a hannel from a finite set C = {1,...,C}, and a oding and modulation sheme from a finite set K = {1,...,K}. The transmission rate when using oding and modulation sheme k is denoted by and we define the set of rates R = {,k K}. R is ordered, i.e., r 1 < r 2 <... < r K. After a paket is sent, the transmitter is informed of whether the transmission has been suessful. Based on the observed past transmission suesses and failures at the various hannels and rates, the transmitter has to selet a hannel and rate pair for the next paket transmission. Let Π denote the set of all possible sequential (hannel, rate) seletion shemes. Pakets are assumed to be of equal size, and without loss of generality, for any k, the duration of a paket transmission at rate is 1/. A. Channel models For the i-th paket transmission on hannel at rate, a binary random variable X k (i) represents the suess (X k (i) = 1) or failure (X k (i) = 0) of the transmission. E[X k (i)] refers to as the paket suessful transmission probability on hannel at rate (i.e., it is the paket reeption rate). We onsider

4 both stationary and non-stationary radio environments. In stationary environments, the suess transmission probabilities on the various hannels and at different rates do not evolve over time. This arises when the system onsidered is stati (in partiular, the transmitter and reeiver do not move). In non-stationary environments, suess transmission probabilities an evolve over time. Unless otherwise speified, we onsider stationary radio environments. Non-stationary environments are treated in Setion VII. We assume that X k (i), i = 1,2,..., are independent and identially distributed, and we denote by θ k the suess transmission probability on hannel at rate : θ k = E[X k (i)]. We verified that the i.i.d. assumption holds in our test-bed and simulation framework. Denote by (,k ) the optimal (hannel, rate) pair, i.e., (,k ) argmax,k θ k. To simplify the exposition and the notation, we assume that the optimal (hannel, rate) pair is unique, i.e., θ k > θ k, for all (,k) (,k ). Our analysis an be extended in an easy way to senarios where the optimal hannel and rate pair is not unique. Note however that senarios where different hannel and rate pairs yield exatly the same throughput should happen very rarely in pratie. We further introdue, for any hannel, the optimal rate, i.e., (,k) argmax k θ k. Again for simpliity, we assume that on any hannel, the optimal rate is unique: θ k > θ k, for all k k. The throughput ahieved using (hannel, rate) pair (, k) is denoted by µ k = θ k. The maximum throughput on hannel is µ = µ k, and the throughput ahieved using the optimal (hannel, rate) pair is µ = µ = µ k. Although we do not aount for the presene of primary users in this work, we ould atually model senarios where on eah hannel, primary users oupy the hannel with some fixed probability ζ, and in an i.i.d. manner aross time. Indeed in suh senarios, we just need to replae θ k by (1 ζ )θ k. If the primary users oupy hannels for long periods of time (not in an i.i.d. manner), the analysis would be signifiantly more hallenging. This kind of situations is investigated in [9] for example. B. Strutural properties The suessful transmission probabilities θ = (θ k, C,k K) are initially unknown at the transmitter, and have to be learnt. When the number of (hannel, rate) pairs grows large, learning the best pair for transmission then beomes really hallenging. Fortunately, the outomes of transmissions using the various (hannel, rate) pairs exhibit strutural properties that an be exploited to speed up the learning proess. To emphasize the importane of exploiting the strutural properties, we onsider three senarios with inreasing struture. 1) Senario 1 No struture: If no strutural assumptions are made regarding the suessful transmission probabilities, then θ [0,1] C K. In suh senarios, we will show that the performane loss due to the need to explore sub-optimal (hannel, rate) pairs sales linearly with the number of hannels and rates. 2) Senario 2 Unimodality: First observe that the suesses and failures of transmissions on a given hannel at various rates are statistially orrelated. Indeed, if a transmission is suessful at a high rate, it has to be suessful at a lower rate. Similarly, if a lowrate transmission fails, then transmitting at a higher rate would also fail. Formally this means that for any hannel, θ = (θ 1,...,θ K ) T, where T = {η [0,1] K : η 1... η K }. Then, in pratie, it has been observed (and this is onfirmed in our numerial experiments) that the throughput ahieved on a given hannel is a unimodal funtion of the transmission rate, see e.g. [5], [16]. In other words, for any hannel, θ U, where U = {η [0,1] K : k 1,r 1 η 1 <... < 1 η k1,1 η k1 > 1+1η k1+1 >... > r K η K }. In summary in Senario 2, for any hannel, θ T U. 3) Senario 3 Graphial unimodality: We further observe (see Setion VIII) that on a given hannel, the throughput first grows linearly with the rate (the suessful transmission probability is lose to 1), and then abruptly dereases to 0. This observation has been made in earlier work, see [14] (the author refers to this senario as the steep throughput senario), [5]. This knowledge an be exploited to build a relationship between the throughputs ahieved on various hannels. Indeed, for example, the throughputs observed on two different hannels are roughly idential in their growth phase (when the rates are low and the suess probabilities are lose to 1). To exploit this observation, we remark that if it holds, the throughput is a graphially unimodal funtion of the (hannel, rate) pair as defined below. We first onstrut a direted graph G = (V,E) whose verties orrespond to the (hannel, rate) pairs. When (d,d ) E, we say that the deision d is a neighbor of deision d, and we define N(d) = {d V : (d,d ) E} as the set of neighbors of d. The throughput or average reward of deision d = (, k) is denoted by µ d = θ k. Graphial unimodality expresses the fat that when the optimal deision is d = (,k ), then for any d V, there exists a path in G from d to d along whih the expeted reward is stritly inreasing. In other

5 words there is no loal maximum in terms of expeted reward exept at d. The notion of loality is defined through that of neighborhood N(d),d V. Formally, θ U G, where U G is the set of suessful transmission probabilities θ [0,1] C K suh that, if d = (,k ) argmax (,k) θ k, for any d = (,k) V, there exists a path (d 0 = d,d 1,...,d p = d ) in G suh that for any i = 1,...,p, µ di > µ di 1. Let us now omplete the onstrution of graph G. The set of edges E is: ((,k),(,k 1)), ((,k),(,k +1)) and ((,k),(,k)), ((,k),(,k +1)) for all (hannel, rate) pair (,k), and all. An example of suh a graph G is presented in Figure 1 for 2 hannels and 4 rates. When the above observation made on θ holds (steep senario as defined in [14]), it is easy to hek that the throughput is a graphially unimodal funtion (w.r.t. graph G) of the hannel and rate. In all pratial ases, beyond the steep throughput senario, we have atually observed that the graph G as onstruted above had enough edges to guarantee the graphial unimodality of the throughput, see Setion VIII. Fig. 1 EXAMPLE OF A GRAPH PROVIDING UNIMODALITY OF THE THROUGHPUT. In summary, in Senario 3, we assume that θ T C U G. Note that there is more struture in Senario 3 than in Senario 2: if θ T C U G, then for any, θ T U. Remark 1: Note that the sharp derease of the throughput when the rate inreases may not hold in some senarios as observed in several papers. The sharp transition of the reeption rate motivates the use of graphial unimodality, but the latter is more general, and may hold even in absene of this sharp transition. Atually, we are free to design the graph G, and adapt its topology depending on the speifiity of the radio environment so as to get graphial unimodality. Remark 2: The notion of graphial unimodality is generi. Our approah onsists in onstruting a minimal graph G ombining strutural properties satisfied by the throughput as a funtion of the (hannel, rate) pair, and results from experiments run off-line, so that the throughput is graphially unimodal w.r.t. G. It should be observed that in MIMO systems (e.g. as in 802.11n and subsequent standards), the throughput is no longer a unimodal funtion of the rate (due to the different available MIMO modes, with single or multiple streams). The proposed framework an be adapted to aount for the various MIMO modes. To this aim, the set of deisions would orrespond to the set of all possible (hannel, MIMO mode, rate) triplets, and the graph G would be onstruted so that the throughput is a graphially unimodal funtion of these triplets. IV. OBJECTIVES AND MULTI-ARMED BANDITS Our goal is to devise a sequential (hannel, rate) seletion sheme that maximizes the number of pakets suessfully transmitted over a finite time horizon. Suh a design an be formulated as an online stohasti optimization problem. The hoie of the time horizon, denoted by T, is not really important as long as during time interval T, a large number of pakets an be sent so that inferring the suess transmission probabilities effiiently is possible. Consider a rate adaption sheme π Π that selets (hannel, rate) pair ( π (t),k π (t)) for the t-th paket transmission. The number of pakets γ π (T) that have been suessfully sent under algorithm π up to time T is: γ π (T) = s π k (T),k i=1 X k (i), where s π k (T) is the number of transmission attempts on hannel at rate before time T. The s k (T) s are random variables (sine the rates seleted under π depend on the past random suesses and failures), and satisfy the following onstraint: s π k (T) 1 T.,k Wald s lemma implies that the expeted number of pakets suessfully sent up to time T is: E[γ π (T)] =,k E[sπ,k (T)]θ k. Thus, our objetive is to design an algorithm solving the following online stohasti optimization problem: max π Π s.t.,k E[sπ k (T)]θ k, (1),k sπ k (T) 1 T,,k,s π k (T) N. A. An equivalent Multi-Armed Bandit (MAB) problem Next we show that the above online stohasti optimization problem is equivalent to a Multi-Armed Bandit (MAB) problem. 1) An alternative system: Without loss of generality, we assume that time an be divided into slots whose durations are suh that for any k, the time it takes to transmit one paket at rate orresponds to an integer number of slots. Under this onvention, the optimization

6 problem (1) an be written as: max π Π,k E[tπ k (T)]θ k, (2) s.t.,k tπ k (T) T,,k,t π k (T) 1 N = { u,u N}, where t π k (T) = sπ k (T)/ represents the amount of time (in slots) that the transmitter spends, before T, on sending pakets on hannel at rate. The onstraint t k (T) 1 N indiates that when a rate is seleted, this rate seletion remains the same for the next 1/ slots. By relaxing this onstraint, we obtain an optimization problem orresponding to a MAB problem. Indeed, onsider now an alternative system where rate seletion is made every slot. If at any given slot, (hannel, rate) pair (,k) is seleted for the i-th times, then if X k (i) = 1, the transmitter suessfully sends bits in this slot, and if X k (i) = 0, then no bit are reeived. A (hannel, rate) seletion algorithm then deides in eah slot whih (hannel, rate) pair to use. There is a natural mapping between rate seletion algorithms in the original system and in the alternative system: let π Π, if for the t-th paket transmission, rate is seleted under π in the original system, then π selets the same rate in the t-th slot. For the alternative system, the objetive is to design π Π solving the following optimization problem, whih an be interpreted as a relaxation of (2). max π Π s.t.,k E[tπ k (T)]θ k, (3),k tπ k (T) T,,k,t π k (T) N. The above optimization problem orresponds to a MAB problem, where in eah slot a deision is taken (i.e., a hannel and a rate are seleted), and where when (,k) is hosen, the obtained reward is with probability θ k and 0 with probability 1 θ k. 2) Regrets: We quantify the performane of an algorithm π Π in both original and alternative systems through the notion of regret. The regret up to slot T ompares the performane of π to that ahieved by an algorithm always seleting the best (hannel, rate) pair. If the parameter θ = (θ k,,k) was known, then in both systems, it would be optimal to always selet (hannel, rate) pair (,k ). The regret of algorithm π up to time slot T in the original system is then defined by: R π 1(T) = θ k T,k θ k E[s π k (T)], where x denotes the largest integer smaller than x. The regret of algorithm π up to time slot T in the alternative system is similarly defined by: R π (T) = θ k T,k θ k E[t π k (T)]. 3) Asymptoti equivalene: In the next setion, we show that an asymptoti lower bound for the regret R π (T) is of the form (θ)log(t) where (θ) is a stritly positive onstant that we an expliitly haraterize. It means that for all π Π, liminf T R π (T)/log(T) (θ). It an be also shown that there exists an algorithm π Π that atually ahieves this lower bound in the alternative system, in the sense that limsup T R π (T)/log(T) (θ). In suh a ase, we say that π is asymptotially optimal. The following proposition states that atually, the same lower bound is valid in the original system, and that any asymptotially optimal algorithm in the alternative system is also asymptotially optimal in the original system. Proposition 1: Let π Π. For any β > 0, we have: R π (T) R1 π lim inf β = lim inf (T) T log(t) T log(t) β, and R π (T) R1 π lim sup β = lim sup (T) T log(t) T log(t) β. Proof. Let T > 0. By time T, we know that there have been at least Tr 1 transmissions, but no more than Tr K. Also observe that both regrets R π and R1 π are inreasing funtions of time. We dedue that: Now R π ( Tr 1 ) R π 1(T) R π ( Tr K ). R1 π lim inf (T) R π ( Tr 1 ) lim inf T log(t) T log(t) R π ( Tr 1 ) = lim inf T log( Tr 1 ) β. The seond statement an be derived similarly. B. MAB problems Instead of trying to solve (1), we rather fous on analyzing the MAB problem (3). We know that optimal algorithms for (3) will also be optimal for the original problem. The nature of the MAB problem (3) depends on the strutural assumption made on the suessful transmission probabilities θ. In absene of suh assumption (Senario 1), we get a lassial MAB problem where the rewards provided by all deisions are independent. In Senarios 2 and 3, we get a strutured MAB problem, as we know a priori that θ belongs to a strutured set, whih helps learning the best (hannel, rate) pair. Next

7 we summarize the MAB problems obtained in the three different senarios. We have a set {1,...,C} {1,...,K} of possible deisions (i.e., (hannel,rate) pairs). If deision (, k) is taken for the i-th time, we reeive a reward X k (i). (X k (i),i = 1,2,...) are i.i.d. with Bernoulli distribution with mean θ k. The objetive is to design a deision sheme minimizing the regret R π (T) over all possible algorithms π Π. The three MAB problems differ depending on the strutural assumptions made on θ. Unstrutured MAB (P I ). No assumption is made on θ: θ [0,1] C K. Strutured MAB (P U ). We assume that θ T U for all hannel. Strutured MAB (P GU ). We assume that θ T C U G. V. REGRET LOWER BOUNDS In this setion, we derive an asymptoti (as T grows large) lower bound of the regret R π (T) satisfied by any algorithm π Π in the three MAB bandit problems (P I ), (P U ), and (P GU ). These lower bounds provide insightful theoretial performane limits satisfied by any (hannel, rate) seletion sheme. By omparing the lower bounds derived for the three problems, we also quantify the performane gains that an be ahieved by smartly exploiting the (a priori) known struture. A. Unstrutured MAB (P I ) The regret lower bound for MAB problem (P I ) an be derived using the diret tehnique used by Lai and Robbins [6]. Note that the only differene between (P I ) and the lassial MAB problems [6] lies in the fat that in (P I ), we know that the average reward of deision (,k) is of the form θ k fonown. The analysis of (P I ) is then similar to that of lassial bandit problems. We first introdue the notion of uniformly good algorithms. An algorithm π is uniformly good, if for all parameters θ, for any α > 0, we have 1 : E[t π k (T)] = o(t α ), (,k) (,k ), where t π k (T) is the number of times (hannel, rate) pair (,k) has been hosen up to the T -th deision, and(,k ) is the optimal hannel and rate pair (it depends on θ). Uniformly good algorithms exist as we shall see later on. Let N = {k : µ } note that N depends on θ. There exists k 0 suh that N = {k 0,...,K}, with the onvention k 0 = K +1 if N =. Note that if k < k 0, then for any hannel, θ k >, whih means that even if all transmissions at rate on hannel were 1 f(t) = o(g(t)) means that lim T f(t)/g(t) = 0. suessful, i.e., θ k = 1, rate would be sub-optimal. Hene, there is no need to selet rate to disover this fat, sine by only seleting rate on hannel, we get to know whether θ k > r k θ k. Finally, we introdue the Kullbak-Leibler (KL) divergene, a well-known measure for dissimilarity between two distributions. When we ompare two Bernoulli distributions with respetive averages p and q, the KL divergene is: I(p,q) = plog p 1 p q +(1 p)log 1 q. Theorem 1: Let π Π be a uniformly good rate seletion algorithm for MAB problem (P I ). We have: liminf T R π (T) log(t) I(θ), where K I (θ) = µ θ k I(θ k=k 0:k k k, µ ) + K µ θ k I(θ k=k k, µ 0 ). The proof of the previous theorem is similar to that of the regret lower bound in [6], and is omitted here. In view of this result, if we do not exploit strutural properties of the problem, then the regret of any algorithm sales at least as CK log(t). Hene, when the number of hannels and rates grow large, no algorithm is able to learn the best (hannel, rate) pair rapidly and effiiently. B. Strutured MAB (P U ) To derive a regret lower bound for MAB problem (P U ), we need to introdue additional notations. We define M = N {k 1,k + 1}. For any hannel, let N = {k : µ }, and k 0 suh that N = {k 0,...,K}, with the onvention k 0 = K + 1 if N =. Observe that for any, k 0 k 0. Let M = N {k 1,k +1}. The following theorem is proved in [22]. Due to spae limitations, all the proofs are presented in [22]. Theorem 2: Let π Π be a uniformly good (hannel, rate) seletion algorithm for MAB problem (P U ). We R have: liminf π (T) T log(t) U (θ), where U (θ) is the optimal value of the following optimization problem: inf α k 0, k, (,k) (,k ) α k (µ µ k ) s.t. k M,α ki(θ k, µ ) 1, : k k 0,α k I(θ k, µ ) 1, and k k 0,k k, inf λ C k where C k = {λ U T : λ k > µ }. l α l I(θ l,λ l ) 1, The above theorem does not provide a fully expliit regret lower bound. In partiular, it remains unlear how

8 this lower bound sales with the numbers of rates and hannels. In the following theorem, we further exploit the strutural properties of the MAB problem (P U ) to show that U (θ) sales at most linearly with the number of hannels, and does not sale with the number of rates. Theorem 3: We have U (θ) U (θ) where U(θ) = µ µ k I(θ k M k, µ ) + [ µ µ k min{i(θ k, µ ),I(θ k,θ k δ )} + ] µ µ k. (4) I(θ k,θ k + δ ) and k M δ = min (µ k µ k {k 1,k +1} k)/2. In partiular, U (θ) is proportional to the number of hannels and independent of the number of rates. From the above analysis, we onlude that the minimum regret for the MAB problem (P U ) sales at most as 3C log(t). Hene we expet that exploiting the struture of the problem (the fat that θ T U for any hannel ) may signifiantly improve the system performane. Indeed we expet a regret that does not depend on the number of available rates. In the next setion, we design an algorithm with suh a regret. C. Strutured MAB problem (P GU ) Graphial unimodal bandit problems have been reently studied in [23], [24]. A regret lower bound is derived in [24]. The only differene between our graphially unimodal MAB problem and those onsidered in [24] is that we onsider direted graphs, but the analysis is similar. We use here the notation introdued in III-B.3, and reall that N = {k : µ }. For any (,k), we define N (,k) = N(,k) N. N (,k) is the set of (hannel, rate) pairs that are neighbors of vertex (,k), and that need to be explored if one wants to know whether they provide better throughput than (, k). Theorem 4: [24] Let π Π be a uniformly good (hannel, rate) seletion algorithm for MAB problem (P GU ). We have: where R π (T) lim inf T + log(t) GU(θ), (5) GU (θ) = (,k) N (,k ) µ µ k I(θ k, µ ). In view of the above theorem, for the MAB problem (P GU ), the minimum regret sales as γlog(t), where γ is γ is the maximum node degree in the graph G. Note that for our graph G, γ 2C. Hene, by exploiting the graphial unimodal struture, we may expet to design algorithms whose regret does not depend on the number of available rates. In the next setion, an algorithm whose regret mathes the lower bound of Theorem 4 is proposed. In this setion, we have shown that the regret lower bound an be signifiantly improved when strutural assumptions are made, i.e., GU (θ) U (θ) I (θ). By exploiting the struture of the problem, we may atually design algorithms whose regrets does not depend on the number of available rates. Suh algorithms do not exist when the struture is not exploited (see Theorem 1). VI. ALGORITHMS In this setion, we present algorithms for the three MAB problems (P I ), (P U ), and (P GU ), and analyze their regrets. For the two strutured MAB problems, the proposed algorithms exhibit a regret that does not depend on the number of available rates. A. The KL-UCB algorithm for MAB problem (P I ) Classial unstrutured bandit problems have been extensively studied in the past, and numerous effiient algorithms have been proposed. We build on this previous work, and present a simple extension of KL- UCB algorithm [21] to the MAB problem (P I ). This algorithm does not exploit any strutural properties, and is asymptotially optimal: its regret mathes the lower bound derived in Theorem 1. Under the KL-UCB algorithm, eah (hannel, rate) pair (,k) is assoiated with an index q k (n) for the (n + 1)-th paket transmission: q k (n) =max{q [0, ] : t (n)i(ˆµ k(n) k, q ) log(n)+3loglog(n)}, wheret k (n) denotes the number of times(,k) has been seleted up to the n-th transmission, and ˆµ k (n) = 1 t k(n) X k (i), t k (n) i=1 is the empirial throughput or reward of (hannel, rate) pair (, k) up to the n-th transmission. The algorithm selets the (hannel, rate) pair with highest index: Algorithm 1 KL-UCB For n = 0,...,CK 1 (initialization): for the (n+1)-th transmission, selet (hannel, rate) pair (, k)(n + 1) =

9 ( +1,k +1) where n = K +k, k {0,...,K 1}. For n CK, for the (n + 1)-th transmission, selet (,k)(n+1) where (,k)(n+1) argmax (,k) q k (n). The rationale behind the design of KL-UCB is the same as that of the lassial UCB algorithm. We onstrut an index for eah (hannel, rate) pair, whih in turn onstitutes an upper onfidene bound of the orresponding throughput. By seleting the pair with the highest upper onfidene bound, we fore the exploration of suboptimal pairs if the latter have not been explored enough (in suh a ase, the upper onfidene bound of a suboptimal pair an be higher than that of the optimal pair). KL-UCB is designed so that the number of times a suboptimal pair is seleted mathes the optimal number of times it is explored in the regret lower bound. In fat, KL-UCB is known to be asymptotially optimal in lassial bandit problems [21]. It an be easily established that its extension is also optimal for the problem (P I ): Theorem 5: For any θ [0,1] C K, the regret of the π = KL-UCB algorithm satisfies: R π (T) lim sup T log(t) I(θ), In partiular, the regret under KL-UCB sales linearly with the numbers of hannels and rates. When the later beome large, the performane of KL-UCB an be quite poor. Note that one may atually derive finite-time upper bounds on the regret of KL-UCB, as done in [21]. KL- UCB is asymptotially optimal, but also provides good performane over a finite time horizon. Finally, regarding the omputational omplexity of implementing KL-UCB, note that we just require to maintain an index for eah pair, whih requires a number of operations that sales as CK after eah transmission. The omparison between the various indexes an be done with CK log(ck) operations. B. The CRS-T algorithm for MAB problem (P U ) Next, we present CRS-T (Channel and Rate Sampling with Tests), an algorithm that exploits the struture of the MAB problem (P U ), i.e., the fat that on eah hannel, the throughput is a unimodal funtion of the rate. To desribe our algorithm, we introdue the following notations. After the n-th transmission, the rate with the highest average empirial throughput on hannel is referred to as the leader on hannel, and is l (n) = argmax k ˆµ k (n). The global leader l(n) is the (hannel, rate) pair with highest average empirial throughput: l(n) = argmax (,k) ˆµ k (n). Similar to the KL-UCB index q k (n), we define the lower onfidene bound q k (n) as: q k (n) =min{q [0, ] : t (n)i(ˆµ k(n) k, q ) log(n)+3loglog(n)}, We introdue a statistial test, whih will be used to assess whether the leader on hannel, l (n), provides a larger reward than its neighbors l (n) 1, l (n)+1 on the same hannel. Define the test for hannel at time n through U (n) : U (n) = 1{q l(n) (n) sup q k (n)} k: k l (n) =1 The test an be interpreted as follows. U (n) = 1 means that l (n) is better than its neighbors with high probability, and U (n) = 0 means that we do not have enough samples to determine whetherl (n) is better than its neighbors. After the n-th paket transmission, we define U(n) = { : U (n) = 0} the set of hannels for whih we annot determine whether the leader l (n) orresponds to the best rate k on this hannel. The sequential deisions under the CRS-T algorithm are based on the indexes of the various (hannel, rate) pairs, and an be easily implemented. The index b k (n) of deision (,k) for the (n+1)-th paket transmission is: b k (n) = q k (n)1{k = l (n)}, where q k (n) is the index used in the KL-UCB algorithm. Note that the index of deision (,k) is equal to 0 if k is not the leader on hannel. The pseudo-ode for CRS-T is given below (eah time the deision is ambiguous, ties are broken arbitrarily). Algorithm 2 CRS-T For n = 0,...,CK 1 (initialization): for the (n+1)-th transmission, selet (hannel, rate) pair (, k)(n + 1) = ( +1,k +1) where n = K +k, k {0,...,K 1}. For n CK: for the (n + 1)-th transmission, selet (,k)(n+1) where if U(n), then (n+1) U(n) and k(n+1) argmin k : k l (n+1)(n) 1t (n+1)k (n); else (,k)(n+1) argmax,k b k (n). The design of the CRS-T algorithm is motivated by the following objetives: (1) For all hannels, we need to play the leader and all its neighbours until we an determine with high probability that the leader l (n) is the best rate k. (2) One we have determined that l (n) is k for all hannels, then we play the leader of the hannel with the largest index, i.e we apply KL-UCB restrited to the set of leaders.

10 Define = min (,k) min k : k k =1 µ k µ k the minimal separation between two neighboring rates on any hannel and µ = (µ k +max k k =1µ k )/2. The next theorem, proved in [22], provides a finite time upper bound on the regret under CRS-T. The asymptoti regretlog(t) CRS-T (θ) sales linearly with the number of hannels, but is independent of the number of available rates. In partiular, CRS-T exploits the struture of the MAB problem (P U ). Theorem 6: For any θ suh that for all, θ T U, and for all ǫ > 0, the regret of the CRS-T algorithm satisfies: R CRS-T (T) (1+ǫ) CRS-T (θ)log(t) +Γ ǫ ( 2 +log(log(t))), where Γ ǫ > 0 depends on ǫ, C, K and R but not on θ, and C CRS-T (θ) = τ 1 (µ µ k ), =1 k: k k 1 with ( ) τ = min min k: k k 1I(θ k, µ / ),I(θ k,µ / ) The regret under CRS-T does not depend on the number of available rates, and hene exploits (at least asymptotially) the unimodal struture of (P U ). The omputational omplexity of CRS-T is similar to that of KL-UCB beause it essentially requires to maintain the indexes of the various hannel and rate pairs: it sales linearly with CK (up to a logarithmi fator). C. The KL-UCB-U algorithm for MAB problem (P GU ) Finally, we present KL-UCB-U, an algorithm for MAB problem (P GU ). KL-UCB-U is a natural extension of an algorithm proposed in [24] for graphially unimodal bandits with undireted graphs. This algorithm is asymptotially optimal (its regret mathes the lower bound derived in Theorem 4). Reall that the global leader is denoted by l(n) before the (n + 1)-th transmission. We introdue v (,k) (n) the number of times that (hannel, rate) pair (,k) has been the global leader up to the n-th transmission: v (,k) (n) = n n =1 1{l(n ) = (,k)}. The index assoiated with deision (,k) before the (n+1)-th transmission is: { b k (n) =max q [0, ] : t k (n)i (ˆµ k (n), q ) } log(v l(n) (n))+3log(log(v l(n) (n))), For the (n + 1)-th transmission, KL-UCB-U selets the (hannel, rate) pair in the neighborhood of the leader with maximum index. Ties are broken arbitrarily. Algorithm 3 KL-UCB-U For n = 0,...,CK 1 (initialization): for the (n+1)-th transmission, selet (hannel, rate) pair (, k)(n + 1) = ( +1,k +1) where n = K +k, k {0,...,K 1}. For n CK: for the (n + 1)-th transmission, selet (,k)(n+1) where: l(n) if (v l(n) (n) 1)/γ N, (, k)(n+1) = arg max b k(n) otherwise. (,k) N(l(n)) Remember that γ is the maximum number neighbors in G of a given (hannel, rate) pair. The KL-UCB-U algorithm periodially selets the leader to make sure that the latter is often seleted. The design of KL-UCB- U is based on the lower regret bound derived for the MAB problem (P GU ). This lower bound implies that an optimal algorithm explores suboptimal (hannel, rate) pairs a number of times that sales with log(t) only for pairs that are neighbours of the the optimal pair in the graph G. Hene in KL-UCB-U, the exploration is restrited to the neighbours of the urrent leader in G. As in [24], we an establish that KL-UCB-U is asymptotially optimal: Theorem 7: For any θ T C U G, the regret of π =KL-UCB-U satisfies: R π (T) lim sup T log(t) GU(θ), In partiular, KL-UCB-U optimally exploits the struture of MAB problem (P GU ). In turn, if the throughput is a graphially unimodal funtion of the (hannel, rate) pair, then KL-UCB-U asymptotially outperforms any other algorithm, and in partiular CRS-T, an algorithm designed to exploit the unimodal struture per hannel only. Note that a finite-time regret analysis for KL-UCB-U is possible as shown in [24]. KL-UCB-U is asymptotially optimal, but also provides good performane over a finite time horizon. Finally, again, the omputational omplexity of KL-UCB-U is similar to that of KL-UCB. VII. NON-STATIONARY RADIO ENVIRONMENTS In pratie, hannel onditions may be non-stationary, i.e., the suess probabilities at various (hannel, rate) pair ould evolve over time. In many situations, the evolution over time is rather slow refer to [3] and

11 to Setion V for test-bed measurements. These slow variations allow us to devise (hannel,rate) adaptation shemes that effiiently trak the best (hannel,rate) pair for transmission. We assume that for all (,k) pairs, the transmissions outomes X k (n), n = 1,2,... are independent, with expetation θ k (n) = E[X k (n)]. At time n we define the throughput of (,k) µ k (n) = θ k (n), the best throughput µ (n) = max,k µ k (n) and the optimal deision (,r )(n) = argmax,k µ k (n). Any algorithm designed for stationary radio environments an readily be extended to non-stationary environments. These extensions are obtained by replaing empirial averages by averages over a sliding time window. Let τ 1 denote the sliding window size, and define the empirial reward ˆµ k (n) as: ˆµ τ k (n) = t τ k (n) where t τ k (n) = n n =n τ+1 n n =n τ+1 X k (n )1{(,k)(n ) = (,k)}, 1{(,k)(n ) = (,k)}, with the onvention ˆµ τ k (n) = 0 if tτ k (n) = 0. We also define the upper onfidene index of (hannel, rate) pair (,k) as: q τ k (n) = max{q [0,] : I(ˆµτ k (n), q ) log(τ)+3log(log(τ))}. We define sliding window variants of the algorithms presented in Setion VI by replaing t k (n) by t τ k (n), ˆµ k (n) by ˆµ τ k (n) and q k(n) by qk τ (n). For instane, KL-UCB with sliding window is the algorithm whih selets (,k)(n) argmax k qk τ (n). In [25], the authors show that algorithms with sliding windows effiiently trak the best deision over time provided that the environment evolves relatively slowly. This is onfirmed in [24], where the performane of algorithms similar to KL-UCB and KL-UCB-U with sliding window is analyzed. Due to spae limitation, we skip this analysis; refer to [24] for more details. The way the window size τ should be hosen in pratie is ditated by the following remarks. First, τ should be relatively small ompared to the time it takes for the paket suessful transmission probabilities to evolve. This ensures that the algorithms with sliding window trak the best hannel and rate pair. Then, τ should be suffiiently large so that the throughput of the various hannel and rate pairs ould be estimated with preision using samples olleted in an interval of time of duration τ. Typially 10 or 20 pakets sent using the same hannel and rate pair is enough. These two requirements for τ go in opposite diretions, and learly when the paket suessful transmission probabilities evolve rapidly, an appropriate design ofτ is not possible. In pratie,τ may be tuned by just observing the way these probabilities evolve (via off-line preliminary experiments). It is not neessary to adapt τ over time if the radio environment does not hange (say for example, users stay inside a building). However, in other senarios, we may need to adapt τ. VIII. NUMERICAL EXPERIMENTS In this setion we numerially illustrate the performane of the proposed (hannel, rate) seletion algorithms. To this aim, we ondut simulation experiments where our algorithms are tested against hannel quality traes that are either extrated from a test-bed [3], or artifiially generated. In the latter ase, we generate traes based on a widely used statistial model for radio propagation and on a mapping between hannel quality and probability of paket suessful transmission on a given (hannel,rate) pair [5]. A. Traes extrated form a test-bed In this subsetion we present trae-driven experiments using the test-bed desribed in [3]. The testbed is based on a SDR platform (Lyrteh SFF-SDR), and is loated in an indoor offie. The PHY layer is OFDM, as in 802.11a/g/n. There are 3 available rates {4.5, 6, 6.75} Mbps orresponding to QPSK modulation with respetive oding rates {1/2, 2/3, 3/4}. We onsider 5 hannels in the UHF band entred at {510,530,550,580,600} Mhz. The bandwidth of eah hannel is 10 Mhz, and the paket size is 1500 bytes. The trae duration is 600s. The traes are olleted as follows. The transmitter transmits 10 pakets at eah rate on a given hannel before moving to the next hannel. Eah measurement round (where eah (hannel, rate) pair is probed) hene onsists of the transmission of 10 5 3 pakets, and lasts less than 1s. This means we sample a given hannel at a given rate one every seond. In eah round and for eah hannel, we alulate, by averaging over 10 pakets, the RSSI, the suessful paket transmission probabilities and the goodputs at the 3 different rates. In Fig.2, we plot the best deision (,k ) as a funtion of time. the radio environment is non-stationary, and the optimal deision remains onstant for several seonds. Sine a paket transmission lasts about1ms, the paket suessful transmission probabilities for various

12 Best hannel Best rate 5 4 3 2 1 0 100 200 300 400 500 600 Time (s) 3 2 1 0 100 200 300 400 500 600 Time (s) Fig. 2 TEST-BED: BEST (CHANNEL, RATE) PAIR (,k )(t) AS A FUNCTION OF TIME. Throughput (Mpbs) 1.5 1 0.5 Orale Stati KL UCB KL UCB U 0 0 200 400 600 Time (s) Fig. 3 TEST-BED: THROUGHPUTS OF THE VARIOUS ALGORITHMS AS A FUNCTION OF TIME. deisions stay onstant for thousands of paket transmissions. Therefore we have quite a lot of statistial information to find the best deision. Furthermore the window size used in the tested algorithms should be of the order of a few seonds we fix it to 2s. In Fig.3, we plot the throughput under KL-UCB and KL-UCB-U algorithms. For the sake of omparison, we also plot µ (t) the throughput of an Orale algorithm that always selets the optimal deision. We also plot the throughput obtained by hoosing the best stati (hannel, rate) pair, omputed offline. We observe that seleting the best stati pair is learly sub-optimal, so that adaptive algorithms an lead to a large gain in throughput. Both deision algorithms, KL-UCB and KL- UCB-U, manage to losely follow the best (hannel, rate) pair. KL-UCB-U provides a throughput equal to 95% of that obtained under the Orale algorithm, whereas the throughput under KL-UCB is equal to 90% of that of the Orale algorithm. There is not a huge performane gap between KL-UCB and KL-UCB-U beause there are few available rates, K = 3. Hene KL-UCB explores C K = 15 (hannel,rate) pairs, while KL-UCB-U explores (in the worse ase) 2C + 1 = 11 pairs. We will show that inreasing the number of available rates K makes this differene signifiantly larger. B. Artifiial traes We also present numerial results based on a widely used statistial model for radio propagation. Namely, we assume that the hannel is a multi-path Rayleigh fading hannel. When a signal is transmitted, several delayed opies of this signal are reeived and the amplitude and phase of eah delayed opy is an independent Rayleigh fading proess. We use Jakes model to simulate Rayleigh fading with user speed set to math the time variability of the test-bed trae presented in VIII-A. This orresponds to stati users suh as laptops in an offie environment. The expeted power of eah delayed path is hosen aording to the field measurements presented in [26]. We assume that OFDM is used, and the mapping between the strength of reeived signal on eah subarrier and the probability of suessful transmission is alulated by the method presented in [5]. We onsider 5 hannels with bandwidth 20 Mhz in the 2.4 GHz band entred at {2.4, 2.41, 2.42, 2.43, 2.44}GHz, respetively. Eah hannel has 52 sub-arriers and the paket size is 1500 bytes. We onsider 8 available rates: {6,13,19.5,26,39,52,58.5,65} Mbps, and a transmitter-reeiver pair with an average SNR of 20 db. The trae length is 600 seonds. We first onsider stationary environments, so that a snapshot of the suess probabilities for all (hannel,rate) pairs is drawn and kept onstant throughout the simulation. Fig.4 shows the paket suessful transmission probabilities and throughputs of different (hannel,rate) pairs. As announed, graphial unimodality holds: the throughput on eah hannel is a unimodal funtion of the rate, and given the optimal rate k on sub-optimal hannel, there exists another hannel suh that either µ,k > µ,k or µ,k +1 > µ,k. Graphial unimodality results from the fat that we are in a steep environment as defined in [14]. Fig.5 presents