Retrospective Spectrum Access Protocol: A Completely Uncoupled Learning Algorithm for Cognitive Networks

Marceau Coupechoux, Stefano Iellamo, Lin Chen
TELECOM ParisTech (INFRES/RMS) and CNRS LTCI; University Paris XI Orsay (LRI)
CEFIPRA Workshop on New Avenues for Network Models, Indian Institute of Science, Bangalore, 14 Jan 2014

Introduction

- Opportunistic spectrum access in cognitive radio networks: secondary users (SUs) access frequency channels partially occupied by the licensed primary user (PU)
- Distributed spectrum access policies based only on past experienced payoffs, i.e. completely uncoupled dynamics, as opposed to coupled dynamics where players can observe the actions of others
- Convergence analysis based on perturbed Markov chains

Related Work

Distributed spectrum access in CRNs:
- #SUs < #channels: solutions based on multi-user Multi-Armed Bandits [Mahajan07, Anandkumar10]
- Large population of SUs: Distributed Learning Algorithm [Chen12], based on reinforcement learning and stochastic approximation; imitation-based algorithms [Iellamo13]

Bounded rationality and learning in the presence of noise:
- Bounded rationality: [Foster90, Kandori93, Kandori95, Dieckmann99, Ellison00]
- Learning in the presence of noise: [Mertikopoulos09]
- Mistake models: [Friedman01]
- Trial and Error: [Pradelski12]

Similar approaches to our algorithm in other contexts: [Marden09, Zhu13]

System Model I

- The PU uses, on the downlink (DL), a set C of C frequency channels
- Primary receivers operate in a synchronous, time-slotted fashion
- The secondary network consists of a set N of N SUs
- We assume perfect sensing

System Model II

[Block-structure figure: on channel i, the PU is active in a slot with probability 1 − µ_i; when the PU is inactive, the SUs share the available bandwidth, and each SU takes a decision for the next block at each block boundary (block t, block t+1).]

- At each time slot, channel i is free with probability µ_i
- The throughput achieved by SU j along a block is denoted T_j
- Expected throughput when the block duration is large: E[T_j] = B µ_{s_j} p_j(n_{s_j})
- p_j(·) is a function that depends on the MAC protocol, on j, and on n_{s_j}, the number of SUs on the channel chosen by j
- We assume B = 1, p_j strictly decreasing and p_j(x) ≤ 1/x for x > 0
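
As a quick illustration of this throughput model, the minimal Python sketch below evaluates E[T_j] for a toy configuration. The fair-sharing rule p(n) = 1/n and the numerical values are assumptions made only for this example; the slides only require p_j strictly decreasing with p_j(x) ≤ 1/x.

```python
# Minimal sketch of the expected-throughput model E[T_j] = B * mu_{s_j} * p_j(n_{s_j}).
# The access function p(n) = 1/n is an illustrative (fair-sharing) assumption.

def expected_throughput(mu, strategy, j, B=1.0, p=lambda n: 1.0 / n):
    """Expected block throughput of SU j.

    mu       : list of channel availability probabilities mu_i
    strategy : list giving the channel s_k chosen by each SU k
    j        : index of the SU of interest
    """
    s_j = strategy[j]
    n_sj = strategy.count(s_j)          # number of SUs on the channel chosen by j
    return B * mu[s_j] * p(n_sj)

# Toy example: 3 channels, 4 SUs; SUs 0 and 1 share channel 2
mu = [0.3, 0.5, 0.8]
strategy = [2, 2, 1, 0]
print(expected_throughput(mu, strategy, 0))   # 0.8 * 1/2 = 0.4
```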

Spectrum Access Game Formulation

Definition. The spectrum access game G is a 3-tuple (N, C, {U_j(s)}), where N is the player set and C is the strategy set of each player. When player j chooses strategy s_j ∈ C, its player-specific utility function is U_j(s_j, s_{−j}) = E[T_j] = µ_{s_j} p_j(n_{s_j}).

Lemma ([Milchtaich96]). The spectrum access game G admits at least one pure Nash equilibrium (PNE).
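
For intuition, a brute-force check on a small instance of the game confirms the lemma. The common access function p(n) = 1/n and the specific values of µ, N and C are assumptions for this sketch only.

```python
# Illustrative brute-force search for pure Nash equilibria of a toy spectrum
# access game with utilities U_j(s) = mu_{s_j} * p(n_{s_j}) and p(n) = 1/n.
from itertools import product

mu = [0.3, 0.5, 0.8]      # channel availability probabilities
N, C = 4, len(mu)         # 4 SUs, 3 channels

def utility(profile, j):
    s_j = profile[j]
    return mu[s_j] / profile.count(s_j)

def is_pure_nash(profile):
    for j in range(N):
        current = utility(profile, j)
        for c in range(C):                       # any profitable unilateral deviation?
            if c != profile[j]:
                deviated = list(profile)
                deviated[j] = c
                if utility(deviated, j) > current + 1e-12:
                    return False
    return True

pne = [p for p in product(range(C), repeat=N) if is_pure_nash(p)]
print(pne)   # non-empty, as guaranteed by [Milchtaich96] for this class of games
```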

Retrospective Spectrum Access Protocol: Motivation

- Find a distributed strategy allowing the SUs to converge to a PNE
- Uniform random imitation of another SU leads to the replicator dynamics (see the Proportional Imitation Rule in [Schlag96, Schlag99])
- Uniform random imitation of two SUs leads to the aggregate monotone dynamics (see Double Imitation in [Schlag96, Schlag99])
- Imitation on the same channel can be approximated by a double replicator dynamics [Iellamo13]
- We now want to avoid any information exchange between SUs

Retrospective Spectrum Access Protocol: RSAP I

- Each SU j has a finite memory of depth H_j containing the history (strategies and payoffs) of the H_j past iterations
- State of the system at time t: z(t) = {s_j(t − h), U_j(t − h)}_{j ∈ N, h ∈ H_j}
- Number of iterations elapsed since the highest remembered payoff: λ_j = min argmax_{h ∈ H_j} U_j(t − h)
- Inertia ρ_j: the probability that j is unable to update its strategy at iteration t [Alos-Ferrer08] (an endogenous parameter in our setting)
- Exploration probability ε(t) → 0

Retrospective Spectrum Access Protocol: RSAP II

Algorithm 1 — RSAP, executed at each SU j:
1: Initialization: set ε(t) and ρ_j.
2: At t = 0, randomly choose a channel and stay on it, store the payoff U_j(0), and set U_j(t − h) randomly for all h ∈ {1, …, H_j}.
3: while at each iteration t ≥ 1 do
4:   With probability 1 − ε(t) do
5:     if U_j(t − λ_j) > U_j(t) then
6:       Migrate to channel s_j(t − λ_j) w.p. 1 − ρ_j
7:       Stay on the same channel w.p. ρ_j
8:     else
9:       Stay on the same channel
10:    end if
11:   With probability ε(t), switch to a random channel.
12: end while
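
A minimal Python sketch of this per-SU update is given below. Class and method names are illustrative choices, not the authors' reference implementation, and the random initialization of unknown past payoffs is replaced here by simply starting with a shorter memory.

```python
# Sketch of one RSAP agent: self-imitation of the best remembered strategy,
# with inertia rho_j and exploration probability eps.
import random
from collections import deque

class RSAPUser:
    def __init__(self, n_channels, H=3, rho=0.3):
        self.C = n_channels
        self.H = H                       # memory depth H_j
        self.rho = rho                   # inertia rho_j
        self.channel = random.randrange(n_channels)
        # last H+1 (strategy, payoff) pairs, index 0 = current iteration t
        self.memory = deque(maxlen=H + 1)

    def observe(self, payoff):
        """Store the payoff obtained on the current channel at iteration t."""
        self.memory.appendleft((self.channel, payoff))

    def update(self, eps):
        """Choose the channel for the next iteration (one RSAP step)."""
        if not self.memory:                            # nothing observed yet
            return
        if random.random() < eps:                      # exploration
            self.channel = random.randrange(self.C)
            return
        # lambda_j = smallest lag achieving the highest remembered payoff
        lam = max(range(len(self.memory)),
                  key=lambda h: (self.memory[h][1], -h))
        best_channel, best_payoff = self.memory[lam]
        current_payoff = self.memory[0][1]
        if best_payoff > current_payoff and random.random() > self.rho:
            self.channel = best_channel                # self-imitation (migration)
        # otherwise stay on the same channel (inertia or no better memory)
```

At each block, each SU would first call observe() with the payoff obtained on its current channel and then update() to pick the channel for the next block; no information about other SUs is ever used.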

Retrospective Spectrum Access Protocol: RSAP III

Definition (Migration Stable State). A migration stable state (MSS) ω is a state in which no further migration is possible, i.e., U_j(t) ≥ U_j(t − h) for all h ∈ H_j and all j ∈ N.
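
The MSS condition translates directly into a small predicate over the SUs' memories; the helper below (illustrative, not from the slides) assumes each memory is stored newest-first.

```python
# Tiny helper illustrating the MSS condition U_j(t) >= U_j(t - h) for all h, j.
def is_migration_stable(memories):
    """memories[j] is the list [U_j(t), U_j(t-1), ..., U_j(t-H_j)] of SU j."""
    return all(mem[0] >= max(mem) for mem in memories)
```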

Convergence Analysis: Perturbed Markov Chain I

We have a model of evolution with noise:
- Z = { z = {s_j(t − h), U_j(t − h)}_{j ∈ N, h ∈ H_j} } is the finite state space of the stochastic process describing the system
- Unperturbed chain: P = (p_uv)_{(u,v) ∈ Z²} is the transition matrix of RSAP without exploration (i.e. ε(t) = 0 for all t)
- Perturbed chains: P(ε) = (p_uv(ε))_{(u,v) ∈ Z²} is a family of transition matrices on Z, indexed by ε ∈ [0, ε̄], associated with RSAP with exploration probability ε

Properties of P(ε):
- P(ε) is ergodic for ε > 0
- P(ε) is continuous in ε and P(0) = P
- There is a cost function c : Z² → R⁺ ∪ {∞} such that, for any pair of states (u, v), lim_{ε→0} p_uv(ε) / ε^{c_uv} exists and is strictly positive when c_uv < ∞, and p_uv(ε) = 0 for all ε > 0 when c_uv = ∞

Convergence Analysis: Perturbed Markov Chain II

Remarks:
- ε can be interpreted as a small probability that SUs do not follow the rule of the dynamics; when an SU explores, we say that there is a mutation
- The cost c_uv is the rate at which p_uv(ε) tends to zero as ε vanishes
- c_uv can also be seen as the number of mutations needed to go from state u to state v (e.g., if reaching v from u requires two simultaneous explorations, then p_uv(ε) is of order ε² and c_uv = 2)
- c_uv = 0 when p_uv > 0 in the unperturbed Markov chain
- c_uv = ∞ when the transition u → v is impossible in the perturbed Markov chain
- The unperturbed Markov chain is not necessarily ergodic; it has one or more limit sets, i.e., recurrent classes

Convergence Analysis: Perturbed Markov Chain III

Lemma ([Young93]). There exists a limit distribution µ* = lim_{ε→0} µ(ε), where µ(ε) is the stationary distribution of P(ε).

Definition. A state i ∈ Z is said to be long-run stochastically stable iff µ*_i > 0.

Lemma ([Ellison00]). The set of stochastically stable states is included in the recurrent classes (limit sets) of the unperturbed Markov chain (Z, P).
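
To make the notion concrete, the toy computation below (not from the slides) builds a 2-state perturbed chain whose transition costs are 1 and 2, and shows its stationary distribution concentrating on the harder-to-leave state as ε → 0.

```python
# Toy illustration of long-run stochastic stability: a 2-state perturbed chain
# whose stationary distribution mu(eps) concentrates on one state as eps -> 0.
# The costs c_12 = 1 and c_21 = 2 are purely illustrative numbers.
import numpy as np

def stationary(eps):
    P = np.array([[1 - eps,       eps],        # leaving state 1 costs one mutation
                  [eps**2, 1 - eps**2]])       # leaving state 2 costs two
    evals, evecs = np.linalg.eig(P.T)
    v = np.real(evecs[:, np.argmax(np.real(evals))])
    return v / v.sum()

for eps in [0.1, 0.01, 0.001]:
    print(eps, stationary(eps))
# mu(eps) -> (0, 1): state 2 is the unique stochastically stable state,
# consistent with it being harder to leave (larger radius).
```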

Convergence Analysis: Ellison's Radius-Coradius Theorem I

[Figure: a path from a state x outside D(Ω) passes through intermediate limit sets L_1, …, L_{r−1} before entering D(Ω), from which the chain reaches Ω with probability 1.]

- Ω: a union of limit sets of (Z, P)
- D(Ω): basin of attraction, the set of states from which the unperturbed chain converges to Ω with probability 1
- R(Ω): radius, the minimum cost of any path from Ω out of D(Ω)
- CR(Ω): coradius, the maximum over initial states of the minimum cost of a path to Ω
- CR*(Ω): modified coradius, obtained by subtracting from this cost the radii of the intermediate limit sets

Convergence Analysis: Ellison's Radius-Coradius Theorem II

Theorem ([Ellison00], Theorem 2; [Sandholm10], Chap. 12). Let (Z, P, P(ε)) be a model of evolution with noise and suppose that, for some set Ω which is a union of limit sets, R(Ω) > CR*(Ω). Then:
- The long-run stochastically stable set of the model is included in Ω.
- For any y ∉ Ω, the longest expected wait to reach Ω is W(y, Ω, ε) = O(ε^{−CR*(Ω)}) as ε → 0.

Proof idea. Uses the Markov chain tree theorem and the fact that it is more difficult to escape from Ω than to return to Ω.

Convergence Analysis: RSAP Convergence Analysis I

Proposition. Under RSAP, LS = MSS, i.e., every MSS is an LS, and every LS consists of a single state, which is an MSS, (a) in the general case with ρ_j > 0, or (b) in the particular case H_j = 1 and ρ_j = 0, for all j ∈ N.

Proof idea. Every MSS is obviously an LS. (a) With inertia, there is a positive probability that no SU changes its strategy for max_j H_j consecutive iterations; after such an event, the system is in an MSS. (b) If the system is in an LS, every SU switches between at most two strategies; since the dynamics without inertia is deterministic, the system would at most alternate between two states, and because each SU compares only two payoffs, the LS must in fact consist of a single state.

Convergence Analysis: RSAP Convergence Analysis II

Remark. Every PNE can be mapped to a set of states that are MSSs, i.e., LSs. Let Ω* denote the union of all the states corresponding to the PNEs.

Lemma. R(ω) = 1 for every LS ω ∉ Ω*.

Proof idea. For a congestion game G with player-specific decreasing payoff functions, the weak-FIP property holds [Milchtaich96]. Using weak-FIP, we show that a single mutation is enough to leave the basin of attraction of any MSS not in Ω* and to reach a new MSS.

Convergence Analysis: RSAP Convergence Analysis III

Lemma. CR*(Ω*) = 1.

Proof idea. From any state, there is a path of null cost to an MSS and then (by the weak-FIP property) a path to Ω* that is a sequence of MSSs, each of radius 1; subtracting the radii of the intermediate limit sets leaves a modified coradius of 1.

Lemma. R(Ω*) > 1.

Proof idea. Follows from the definition of the PNEs and of RSAP.

Convergence Analysis: RSAP Convergence Analysis IV

Theorem (Convergence of RSAP and convergence rate). If all SUs adopt RSAP with exploration probability ε → 0, then the system dynamics converges almost surely to Ω*, i.e. to a PNE of the game. The expected wait until a state in Ω* is reached, given that play in the ε-perturbed model begins in any state not in Ω*, is O(ε^{−1}) as ε → 0.

Remark. Our study readily extends to other games possessing the weak-FIP, and hence to games with the FBRP, the weak-FBRP or the FIP [Monderer96], since FIP ⇒ FBRP ⇒ weak-FBRP ⇒ weak-FIP.

Performance Evaluation: Simulation Settings

We compare our algorithm to Trial and Error (T&E, with Pradelski's optimized learning parameters [Pradelski&Young 2012]) and to the Distributed Learning Algorithm (DLA) [Chen&Huang 2012]. We consider two networks:
- Network 1: N = 50 SUs, C = 3 channels with availability probabilities µ = [0.3, 0.5, 0.8], and user-specific payoffs U_j(·) = w_j f(·), where f(·) is a decreasing function common to all SUs and w_j is a user-specific weight. We set H_j = 3 and ρ_j = 0.3 for all j.
- Network 2: N = 10 SUs, C = 2 channels with µ = [0.2, 0.8]. We set H_j = 1 and ρ_j = 0 for all j.
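
For reference, here is a small sketch of the fairness metrics reported in the next slides. Jain's index is standard; the weighted variant shown (each SU's throughput divided by its weight w_j before applying Jain's formula) is our assumed reading of the "weighted fairness index".

```python
# Jain's fairness index and an assumed weighted variant used for Network 1.
import numpy as np

def jain_index(x):
    x = np.asarray(x, dtype=float)
    return x.sum() ** 2 / (len(x) * (x ** 2).sum())

def weighted_fairness(throughputs, weights):
    t = np.asarray(throughputs, dtype=float)
    w = np.asarray(weights, dtype=float)
    return jain_index(t / w)

print(jain_index([0.4, 0.4, 0.4]))      # 1.0: perfectly fair allocation
print(jain_index([0.9, 0.05, 0.05]))    # ~0.41: very unfair allocation
```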

Performance Evaluation: Fairness Index in Network 1 (I)

[Figure: weighted fairness index (y-axis, 0.8 to 1) versus iteration t (x-axis, 0 to 200) for RSAP and for DLA with γ = 0.01, 0.1 and 1.] Weighted fairness index of RSAP and of the DLA algorithm proposed in [Chen&Huang 2012]. Each curve represents an average over 1000 independent realizations.

Performance Evaluation: Fairness Index in Network 1 (II)

[Figure: weighted fairness index (y-axis, 0.55 to 1) versus iteration t (x-axis, 0 to 200) for RSAP and for DLA with γ = 0.01.] Weighted fairness index of RSAP and of the DLA algorithm proposed in [Chen&Huang 2012]. Each curve represents a single realization of the two algorithms.

Performance Evaluation: RSAP vs T&E (I)

[Figure: Jain's fairness index (y-axis, 0 to 1) versus iteration t (x-axis, 0 to 10000).] Trial and Error fairness index on Network 2 (average over 1000 trajectories).

Performance Evaluation: RSAP vs T&E (II)

[Figure: Jain's fairness index (y-axis, 0 to 1) versus iteration t (x-axis, 0 to 100).] RSAP fairness index on Network 2.

Conclusion

- We discussed the distributed resource allocation problem in CRNs
- We proposed a fully distributed scheme, based on self-imitation, that requires no information exchange between SUs
- We proved convergence using Ellison's radius-coradius theorem
- We compared RSAP to T&E [Pradelski&Young 2012] and to DLA [Chen&Huang 2012]

Further Work

- Congestion games on graphs
- More realistic models of the channel between the SU transmitter and the SU receiver
- Learning in the presence of noise (SUs only obtain an estimate of the mean throughput at each iteration)
- Joint sensing and access problem