OFDMA Downlink Resource Allocation using Limited Cross-Layer Feedback. Prof. Phil Schniter

Similar documents
IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. XX, NO. XX, MONTH

OPTIMAL RESOURCE ALLOCATION IN OFDMA DOWNLINK SYSTEMS WITH IMPERFECT CSI

Joint Scheduling and Resource Allocation in the OFDMA Downlink: Utility Maximization under Imperfect Channel-State Information

Joint Scheduling and Resource Allocation in OFDMA Downlink Systems via ACK/NAK Feedback

Heterogeneous Doppler Spread-based CSI Estimation Planning for TDD Massive MIMO

Optimal Transmit Strategies in MIMO Ricean Channels with MMSE Receiver

Opportunistic Scheduling Using Channel Memory in Markov-modeled Wireless Networks

Algorithms for Dynamic Spectrum Access with Learning for Cognitive Radio

On the Throughput of Proportional Fair Scheduling with Opportunistic Beamforming for Continuous Fading States

Markovian Decision Process (MDP): theory and applications to wireless networks

Dynamic spectrum access with learning for cognitive radio

A POMDP Framework for Cognitive MAC Based on Primary Feedback Exploitation

On Stochastic Feedback Control for Multi-antenna Beamforming: Formulation and Low-Complexity Algorithms

Adaptive Bit-Interleaved Coded OFDM over Time-Varying Channels

NOMA: Principles and Recent Results

How Much Training and Feedback are Needed in MIMO Broadcast Channels?

Coding versus ARQ in Fading Channels: How reliable should the PHY be?

Cell throughput analysis of the Proportional Fair scheduler in the single cell environment

Exploiting Channel Memory for Joint Estimation and Scheduling in Downlink Networks A Whittle s Indexability Analysis

Cognitive Spectrum Access Control Based on Intrinsic Primary ARQ Information

Opportunistic Spectrum Access for Energy-Constrained Cognitive Radios

Improved MU-MIMO Performance for Future Systems Using Differential Feedback

Mobile Communications (KECE425) Lecture Note Prof. Young-Chai Ko

A State Action Frequency Approach to Throughput Maximization over Uncertain Wireless Channels

Direct-Sequence Spread-Spectrum

The Optimality of Beamforming: A Unified View

Dirty Paper Coding vs. TDMA for MIMO Broadcast Channels

Algorithms for Dynamic Spectrum Access with Learning for Cognitive Radio

IN this paper we consider a problem of allocating bandwidth

Journal Watch IEEE Communications- Sept,2018

Adaptive Distributed Algorithms for Optimal Random Access Channels

Channel Probing in Communication Systems: Myopic Policies Are Not Always Optimal

Wireless Channel Selection with Restless Bandits

Near-Optimal Resource Allocation for Type-II HARQ based OFDMA Networks under Rate and Power Constraints

Limited Feedback in Wireless Communication Systems

Morning Session Capacity-based Power Control. Department of Electrical and Computer Engineering University of Maryland

Multi-Armed Bandit: Learning in Dynamic Systems with Unknown Models

Optimal Power Control in Decentralized Gaussian Multiple Access Channels

2318 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 6, JUNE Mai Vu, Student Member, IEEE, and Arogyaswami Paulraj, Fellow, IEEE

Exploiting Channel Memory for Joint Estimation and Scheduling in Downlink Networks

Timing Recovery at Low SNR Cramer-Rao bound, and outperforming the PLL

Under sum power constraint, the capacity of MIMO channels

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.

Lattice Reduction Aided Precoding for Multiuser MIMO using Seysen s Algorithm

Applications of Lattices in Telecommunications

On Stochastic Feedback Control for Multi-antenna Beamforming: Formulation and Low-Complexity Algorithms

Two-Way Training: Optimal Power Allocation for Pilot and Data Transmission

Abstract. I. Introduction

STRUCTURE AND OPTIMALITY OF MYOPIC SENSING FOR OPPORTUNISTIC SPECTRUM ACCESS

Optimal Energy Allocation. for Wireless Communications. with Energy Harvesting Constraints

WIRELESS COMMUNICATIONS AND COGNITIVE RADIO TRANSMISSIONS UNDER QUALITY OF SERVICE CONSTRAINTS AND CHANNEL UNCERTAINTY

L interférence dans les réseaux non filaires

Fast-Decodable MIMO HARQ Systems

Ressource Allocation Schemes for D2D Communications

QoS-Based Power-Efficient Resource Management for LTE-A Networks with Relay Nodes

On the Optimality of Myopic Sensing. in Multi-channel Opportunistic Access: the Case of Sensing Multiple Channels

ELEC546 Review of Information Theory

Exploiting Sparsity for Wireless Communications

Analysis of Receiver Quantization in Wireless Communication Systems

Energy Optimal Transmission Scheduling in Wireless Sensor Networks

MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti

Optimum Power Allocation in Fading MIMO Multiple Access Channels with Partial CSI at the Transmitters

Learning Algorithms for Minimizing Queue Length Regret

Exploiting Partial Channel Knowledge at the Transmitter in MISO and MIMO Wireless

Delay QoS Provisioning and Optimal Resource Allocation for Wireless Networks

Improved Multiple Feedback Successive Interference Cancellation Algorithm for Near-Optimal MIMO Detection

Noncooperative Optimization of Space-Time Signals in Ad hoc Networks

4888 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 15, NO. 7, JULY 2016

Performance Analysis of BPSK over Joint Fading and Two-Path Shadowing Channels

Energy Efficient Multiuser Scheduling: Statistical Guarantees on Bursty Packet Loss

Multiuser Joint Energy-Bandwidth Allocation with Energy Harvesting - Part I: Optimum Algorithm & Multiple Point-to-Point Channels

Resource Management and Interference Control in Distributed Multi-Tier and D2D Systems. Ali Ramezani-Kebrya

Chapter 4: Continuous channel and its capacity

Energy-Efficient Resource Allocation for

Interleave Division Multiple Access. Li Ping, Department of Electronic Engineering City University of Hong Kong

IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 15, NO. 11, NOVEMBER

On Power Minimization for Non-orthogonal Multiple Access (NOMA)

Risk-Sensitive Reinforcement Learning for URLLC Traffic in Wireless Networks

Adaptive Modulation with Smoothed Flow Utility

The RF-Chain Limited MIMO System: Part II Case Study of V-BLAST and GMD

On the Optimality of the ARQ-DDF Protocol

Modeling and Simulation NETW 707

Reducing contextual bandits to supervised learning

Low-High SNR Transition in Multiuser MIMO

Power Control for Wireless VBR Video Streaming: From Optimization to Reinforcement Learning

Multi-User Gain Maximum Eigenmode Beamforming, and IDMA. Peng Wang and Li Ping City University of Hong Kong

Distributed Power Control and lcoded Power Control

SINGLE antenna differential phase shift keying (DPSK) and

Minimum BER Linear Transceivers for Block. Communication Systems. Lecturer: Tom Luo

Trust Degree Based Beamforming for Multi-Antenna Cooperative Communication Systems

Appendix B Information theory from first principles

Wireless Communications Lecture 10

Spatial and Temporal Power Allocation for MISO Systems with Delayed Feedback

Lecture 5: Antenna Diversity and MIMO Capacity Theoretical Foundations of Wireless Communications 1. Overview. CommTh/EES/KTH

Short-Packet Communications in Non-Orthogonal Multiple Access Systems

Lecture 4. Capacity of Fading Channels

Decomposition Methods for Large Scale LP Decoding

Cooperative Transmission for Wireless Relay Networks Using Limited Feedback

On joint CFO and channel estimation in single-user and multi-user OFDM systems

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY Uplink Downlink Duality Via Minimax Duality. Wei Yu, Member, IEEE (1) (2)

Transcription:

OFDMA Downlink Resource Allocation using Limited Cross-Layer Feedback Prof. Phil Schniter T. H. E OHIO STATE UNIVERSITY Joint work with Mr. Rohit Aggarwal, Dr. Mohamad Assaad, Dr. C. Emre Koksal September 20, 2010 1

Setup: Single-antenna downlink with K mobile users OFDMA with N subchannels Channels are Markov time-varying with impulse responses of length L ACK/NAK feedback on each transmitted data packet (i.e., ARQ) data data ACK/NAK data ACK/NAK ACK/NAK subchannel 1 subchannel 2 subchannel 3 2

The OFDMA Resource Allocation Problem: At each time t, we want to assign the best users (multiuser diversity) to their best subchannels (frequency diversity). We also want to optimize the powers and data-rates of assigned users. For this, we need channel state information (CSI) at the transmitter. Feedback of receiver CSI can be very costly! (Many users, subchannels.) Feedback of data-packet ACK/NAKs is free! (Provided by link layer.) Is it possible to do near-optimal resource allocation using only ACK/NAK feedback from previously transmitted data packets? Can we get enough Tx-CSI from such limited cross-layer feedback? 3

A Much Simpler Problem: Simplified scenario: K=1 user, N=1 subchannel, L=1 channel tap. Point-to-point communication over a time-varying channel. The only parameter to optimize is the transmission rate r t (bits/packet). Rate adaptation via ARQ feedback. Though it may sound simple, this is still an interesting problem! Notice that r t effects short-term throughput: r t too high the packet will not be correctly decoded. r t too low the channel will be underutilized. But r t also effects long-term throughput: Recalling that more accurate CSI leads to higher throughput... r t very high likely NAK nothing learned about CSI. r t very low likely ACK nothing learned about CSI. r t such that Pr{ACK} = 1 is most informative about CSI. 2 4

Rate Adaptation based on ARQ Feedback: Goal: Maximize total expected goodput over T packets: { T } [r1,...,r T] = argmaxe G(γ t,r t ), G(γ,r) = ( 1 ǫ(γ,r) ) r. [r 1,...,r T ] t=1 If our knowledge of the SNR γ t was decoupled from the rates {r τ } τ t, then it would be relatively easy to calculate the optimal {rt} T t=1: rt = argmaxe{g(γ t,r t )} = argmax G(γ t,r t )p(γ t )dt. r t r t But really, our knowledge of SNR γ t will depend on the ARQ feedbacks received up to time t, which are strongly influenced by rates {r τ } t 1 τ=1. The current rate allocation affects not only immediate goodput, but also future SNR knowledge (and hence future goodput)! A classical tradeoff between exploitation and exploration. 5

Optimal Rate Adaptation: Optimal rate adaptation, i.e., r t+1 = argmax r t+1 E { G(r t+1,γ t+1 )+ can be recognized as a dynamic program. T τ=t+2 Denoting the optimal expected future-goodput by { T Ḡ t+1:t(ˆǫ t,r t ) E G(rτ,γ τ ) we can write the Bellman equation as τ=t+1 G(r τ,γ τ ) ˆǫ t,r t }, } ˆǫ t,r t { Ḡ t+1:t(ˆǫ t,r t ) = max E{G(r t+1,γ t+1 ) ˆǫ t,r t } r t+1 } +E{Ḡ t+2:t([ˆǫ t,ˆǫ t+1 ],[r t,r t+1 ]) ˆǫ t,r t } with the first expectation over γ t+1 and the second expectation over ˆǫ t+1. 6

Optimal Rate Adaptation (cont.): The solution to this dynamic program is a partially observable Markov decision process (POMDP), which is intractable when the SNRs {γ t } are continuous random variables, can be solved numerically when the SNRs {γ t } are discrete, but at great cost: both complexity and memory increase exponentially with the horizon length T and the number of states used for γ t. ( PSPACE-complete ) Not practical to implement! 7

Greedy Adaptation: What if we maximize only the short-term reward? This corresponds to the greedy scheme r g t+1 argmax r t+1 E { G(r t+1,γ t+1 ) ˆǫ t,r t }, which should be much easier to implement. How good is this greedy scheme (w.r.t the optimal POMDP)? 8

The Causal Genie: Consider the optimal rate under non-degraded causal error-rate feedback: r cg t+1 argmax r t+1 E { G(r t+1,γ t+1 )+ T τ=t+2 G(r cg τ,γ τ ) ǫ t,r t }. For a given rate r t, the error ǫ t uniquely determines the SNR γ t, so that r cg t+1 = argmax r t+1 E { G(r t+1,γ t+1 )+ T τ=t+2 G(r cg τ,γ τ ) } γ t. Since future causal-genie rates {r cg τ } τ t+2 will be chosen based on perfect SNR knowledge, they will not depend on r t+1. Thus, r cg t+1 = argmax r t+1 E { G(r t+1,γ t+1 ) γt }. Optimal adaptation under non-degraded error-rate feedback is greedy! 9

The Causal Genie is an Upper Bound on the POMDP: Since E{G(r t,γ t ) ˆǫ t d,r t d } max E{G(r t,γ t ) ˆǫ t d,r t d } r t R...since rt is not necessarily short-term optimal = max E{ E{G(r t,γ t ) ˆǫ t d,r t d,ǫ t d } } ˆǫ t d,r t d r t R E { max E{G(r t,γ t ) ˆǫ t d,r t d,ǫ t d } } ˆǫ t d,r t d r t R...since max r t E{f(r t )} E{max r t = E { max r t R E{G(r t,γ t ) γ t d } ˆǫ t d,r t d } = E{G(r cg t,γ t ) ˆǫ t d,r t d }, summing and averaging both sides gives { T } { T E G(rt,γ t ) E t=1 t=1 f(r t )} for any f( ) } G(r cg t,γ t ). 10

Recap: So far we ve shown that, in the point-to-point problem, 1. Optimal rate adaptation is very difficult to implement/analyze. 2. There exists a suboptimal greedy scheme which is easy to implement. 3. The causal-genie scheme (i.e., the optimal scheme based on non-degraded feedback) is easy to implement, since it s greedy. 4. The causal genie upper bounds the performance of the optimal scheme. So, if we can show that the greedy scheme is close to the causal genie, then greedy must be close to optimal. 11

Numerical Examples Setup: Modulation: Uncoded square-qam, p = 100, packet error rate ( ( ǫ(r t,γ t ) = 1 1 2 1 1 ) ( )) 2p 3γt Q. 2 r t /p 2 r t/p 1 ARQ feedback: ˆǫ t {0,1} SNR variation: Pr(ˆǫ t = f ǫ t ) = ǫ t 1 ǫ t for f = 0 ( NAK ) for f = 1 ( ACK ) Rayleigh-fading channel gain (via Gauss-Markov process): g t+1 = (1 α)g t +αw t, {w t } i.i.d CN(0,1) SNR: γ t = C g t 2. Parameters (α,c) control the mean and coherence time of γ t. For a given α, we assume C is chosen so that E{γ t } = 25 db. 12

Steady-state goodput versus fading-rate parameter α: 5.2 5 Steady State Goodput 4.8 4.6 4.4 4.2 4 Non causal Genie Causal Genie Greedy Best fixed rate Greedy rate adaptation extracts most of the goodput gain available from the use of causal ARQ feedback! 3.8 3.6 10 4 10 3 10 2 10 1 Gauss Markov model parameter, α Best fixed-rate: r fr t+1 = argmax rt+1 G(rt+1,γ t+1 )p(γ t+1 )dγ t+1 Greedy: r g t+1 = argmax rt+1 G(rt+1,γ t+1 )p(γ t+1 ˆǫ t,r t )dγ t+1 Causal Genie: r cg t+1 = argmax rt+1 G(rt+1,γ t+1 )p(γ t+1 ǫ t,r }{{} t )dγ t+1 Non-causal Genie: r ng γ t+1 = argmax rt+1 G(r t+1,γ t+1 ) t 13

Back to the OFDMA Problem: Single-antenna downlink with K mobile users OFDMA with N subchannels Channels are Markov time-varying with impulse responses of length L ACK/NAK feedback on each transmitted data packet (i.e., ARQ) data data ACK/NAK data ACK/NAK ACK/NAK subchannel 1 subchannel 2 subchannel 3 14

Formal Objective: At each time t and subchannel n, choose each user k s next... rate r n,k,t+1 R, for some M-ary rate alphabet R, power p n,k,t+1 0, based on ARQ feedbacks F 1:t, to maximize the total future utility { T K N (1 ǫ(rn,k,τ Ḡ t+1:t = E U(,γ n,k,τ,p n,k,τ ) ) ) } r n,k,τ F 1:t }{{} τ=t+1 k=1 n=1 goodput from k on n at τ subject to the power constraint n,k p n,k,τ P con, τ, and subject to 1 (user,rate) per (time,subchannel). Here, ǫ(r,γ,p) is packet error rate, U( ) is a concave utility function, and F 1:t {0,1, } tnk collects all past ARQ feedbacks. 15

Greedy Resource Allocation: Using the indicator I n,m,k,t {0,1} to denote time-t assignment of subchannel n to user k at MCS index m, the time-t greedy allocation is { ) max E U (I } n,k,m,t+1 (1 a m e b mp n,k,m,t+1 γ n,k,t+1 )r m F 1:t I n,k,m,t+1 {0,1} k n,m p n,k,m,t+1 0 subject to n,k,mi n,k,m,t p n,k,m,t P con, t, and k,m I n,k,m,t 1, n, t, where γ n,k,t is SNR of user k on subchannel n at time t, (a m,b m,r m ) determines data rate and error rate for MCS index m, F 1:t collects all ARQ feedbacks collected from times 1 to t. A mixed-integer programming problem! 16

Greedy Resource Allocation: Exact solution: Brute Force: Compute water-filling solution for every hypothesized (subcarrier,user,rate) combination. Costs O ( (KM) N). Proposed approximate solution: Relax problem by allowing each (subcarrier,time) slot to be shared among multiple (user,rate) pairs. Can reformulate as a convex optimization problem. Can solve dual problem using an iterative algorithm with exponential convergence rate at O(N KM) per iteration. Byproducts of above algorithm yield a near-exact solution to the original mixed-integer problem. R. Aggarwal, M. Assaad, C.E. Koksal, P. Schniter, Optimal Joint Scheduling and Resource Allocation in OFDMA Downlink Systems with Imperfect Channel-State Information, manuscript in preparation. 17

Approximate Greedy Algorithm Sketch: Say that we relax the binary indicators to Ĩn,m,k,t [0,1]. Then the KKT conditions become (suppressing the time-t notation): n,k,m, µ = a m b m r m E{γ n,k e b mp n,k,m γ n,k F} (1) n,k,m, λ n = r m E{1 a m e b mp n,k,m γ n,k F} µp n,k,m (2) where {λ n } N n=1 and µ are Lagrange multipliers. A practical alg is then: 1. Initialize search region [µ,µ] with midpoint µ = (µ+µ)/2. 2. For each subchannel n, For each (k,m)... calculate p n,k,m by solving (1) and forcing a non-negative result. plug p n,k,m into (2) and calculate the corresponding λ n (k,m). Find (k,m ) = argmax (k,m) λ n (k,m). Set I n,k,m = 1 and I n,k,m (k,m) (k,m ) = 0. 3. If n p n,k,m > P con, set [µ,µ] [µ,µ], else set [µ,µ] [µ,µ]. Goto 2. 18

Approximate Greedy Algorithm Example Performance: N K M greedy goodput (brute force) approximation 1 3 9 5.9884 5.988 1 5 9 6.3501 6.3499 2 3 9 10.3251 10.3249 2 5 9 10.9778 10.9774 3 3 9 14.0573 14.0571 3 5 9 14.9653 14.9651 The practical approximation typically yields 99.99% of the goodput attained by the exact greedy scheme! We have also derived performance bounds on our approximation. 19

Tracking the SNR distribution: The greedy allocator tracks the SNR by updating the SNR distributions { p(γn,k,t+1 F 1:t ), users k and subchannels n } The SNR evolves as follows: Markov evolution of time-domain channel taps: h l,k,t+1 = (1 α)h l,k,t +αw l,k,t, w l,k,t CN(0,1), subchannel gains as a function of time-domain channel taps: H n,k,t = L 1 l=0 h l,k,t e j 2π N nk, subchannel SNRs as a function of subchannel gains: γ n,k,t = C H n,k,t 2. 20

Tracking the SNR distribution (cont.): SNR tracking can be done as follows: p(γ n,k,t+1 F 1:t ) = p(γ n,k,t+1 h k,t+1 ) p(h h k,t+1 }{{} k,t+1 F 1:t ) (3) p(h k,t+1 F 1:t ) = p(h k,t F 1:t ) = p(f k,t h k,t ) = p(f n,k,t = f γ n,k,t ) = (approx of) Dirac delta p(h k,t+1 h k,t ) p(h h k,t }{{} k,t F 1:t ) (4) Markov prediction p(f k,t h k,t )p(h k,t F 1:t 1 ) p(f k,t h k,t)p(h (Bayes rule) (5) k,t F 1:t 1 ) h k,t N p(f n,k,t γ n,k,t (h k,t )) (6) n=1 m I n,k,m,ta m e b mp n,k,m,t γ n,k,t f = 0 m I n,k,m,t(1 a m e b mp n,k,m,t γ n,k,t) f = 1 1 m I n,k,m,t f = (7) 21

Tracking the SNR distribution (cont.): Thus, for each user k, 1. measure feedbacks f k,t across all subchannels, 2. compute p(f n,k γ n,k,t (h k,t )) on h-lattice using error-rate rules (6)-(7), 3. compute p(h k,t F 1:t ) on h-lattice by updating previous posterior via (5), 4. compute p(h k,t+1 F 1:t ) on h-lattice via Markov-prediction step (4), 5. compute p(γ k,t+1 F 1:t ) on γ-lattice via h-to-γ conversion step (3). This costs O ( KNQ L h +KLQL+1 h +KNQ γ Q L h), where Q h = number of grid points used per dimension of h-lattice, Q γ = number of grid points used per dimension of γ-lattice. 22

Numerical Experiments: Setup: K = 2 users N = 2 subchannels L = 2 time-domain channel taps E{γ n,k,t } = 25dB mean subchannel SNR ρ = 0.33 subchannel correlation Plots show goodput versus fading rate α goodput versus time t power/rate/user versus time t 23

Steady-state goodput versus α: Steady State Sum Goodput 10.5 10 9.5 9 8.5 8 7.5 Global Genie Greedy Algorithm Round Robin (no feedback) 7 10 4 10 3 10 2 10 1 Fading rate α 24

Goodput for α = 0.001: 12 2 users, 2 subcarrier, α = 1e 3, 200 packets 10 Total goodput in all subcarriers 8 6 4 genie aided CSI tracked CSI prior CSI 2 genie CSI avg = 11.0324 tracked CSI avg = 10.574 prior CSI avg = 7.3398 0 0 20 40 60 80 100 120 140 160 180 200 Packet Number 25

Allocations for α = 0.001: Power Subcarrier 1 SNR estimate Subcarrier 1 Power Subcarrier 2 SNR estimate Subcarrier 2 1 0.5 2000 1000 2 users, 2 subcarriers, α = 1e 3, 200 Packets User 1 after ACK User 2 after ACK User 1 after NACK User 2 after NACK Rate change up or down 0 0 20 40 60 80 100 120 140 160 180 200 0 1 0.5 2000 1000 User 1 User 2 Actual SNR for corresponding users 20 40 60 80 100 120 140 160 180 200 0 0 20 40 60 80 100 120 140 160 180 200 0 20 40 60 80 100 120 140 160 180 200 Packet Number 26

Summary: Goal: From only ARQ feedback, optimize OFDMA users, powers, and rates to maximize finite-horizon expected goodput under an instantaneous total-power constraint. The optimal resource allocator is a POMDP, which is computationally impractical. We settle for greedy resource allocation, found to be near-optimal for practical fading rates. Greedy allocation is a mixed-integer programming problem, but we can solve it almost exactly with O(N KM) complexity. To maintain CSI, we track the SNR distribution (conditioned on past ACK/NAK feedback) of each user at each subcarrier. Preliminary experiments for 2 users and 2 subchannels indicates that our practical algorithm performs relatively close to a genie-aided upper bound on the optimal POMDP. 27

Thanks for listening! 28