OFDMA Downlink Resource Allocation using Limited Cross-Layer Feedback. Prof. Phil Schniter

OFDMA Downlink Resource Allocation using Limited Cross-Layer Feedback Prof. Phil Schniter T. H. E OHIO STATE UNIVERSITY Joint work with Mr. Rohit Aggarwal, Dr. Mohamad Assaad, Dr. C. Emre Koksal September 20, 2010 1

Setup: Single-antenna downlink with K mobile users OFDMA with N subchannels Channels are Markov time-varying with impulse responses of length L ACK/NAK feedback on each transmitted data packet (i.e., ARQ) data data ACK/NAK data ACK/NAK ACK/NAK subchannel 1 subchannel 2 subchannel 3 2

The OFDMA Resource Allocation Problem: At each time t, we want to assign the best users (multiuser diversity) to their best subchannels (frequency diversity). We also want to optimize the powers and data-rates of assigned users. For this, we need channel state information (CSI) at the transmitter. Feedback of receiver CSI can be very costly! (Many users, subchannels.) Feedback of data-packet ACK/NAKs is free! (Provided by link layer.) Is it possible to do near-optimal resource allocation using only ACK/NAK feedback from previously transmitted data packets? Can we get enough Tx-CSI from such limited cross-layer feedback? 3

A Much Simpler Problem: Simplified scenario: K=1 user, N=1 subchannel, L=1 channel tap. Point-to-point communication over a time-varying channel. The only parameter to optimize is the transmission rate r t (bits/packet). Rate adaptation via ARQ feedback. Though it may sound simple, this is still an interesting problem! Notice that r t effects short-term throughput: r t too high the packet will not be correctly decoded. r t too low the channel will be underutilized. But r t also effects long-term throughput: Recalling that more accurate CSI leads to higher throughput... r t very high likely NAK nothing learned about CSI. r t very low likely ACK nothing learned about CSI. r t such that Pr{ACK} = 1 is most informative about CSI. 2 4

Rate Adaptation based on ARQ Feedback: Goal: Maximize total expected goodput over T packets: { T } [r1,...,r T] = argmaxe G(γ t,r t ), G(γ,r) = ( 1 ǫ(γ,r) ) r. [r 1,...,r T ] t=1 If our knowledge of the SNR γ t was decoupled from the rates {r τ } τ t, then it would be relatively easy to calculate the optimal {rt} T t=1: rt = argmaxe{g(γ t,r t )} = argmax G(γ t,r t )p(γ t )dt. r t r t But really, our knowledge of SNR γ t will depend on the ARQ feedbacks received up to time t, which are strongly influenced by rates {r τ } t 1 τ=1. The current rate allocation affects not only immediate goodput, but also future SNR knowledge (and hence future goodput)! A classical tradeoff between exploitation and exploration. 5

Optimal Rate Adaptation: Optimal rate adaptation, i.e., r t+1 = argmax r t+1 E { G(r t+1,γ t+1 )+ can be recognized as a dynamic program. T τ=t+2 Denoting the optimal expected future-goodput by { T Ḡ t+1:t(ˆǫ t,r t ) E G(rτ,γ τ ) we can write the Bellman equation as τ=t+1 G(r τ,γ τ ) ˆǫ t,r t }, } ˆǫ t,r t { Ḡ t+1:t(ˆǫ t,r t ) = max E{G(r t+1,γ t+1 ) ˆǫ t,r t } r t+1 } +E{Ḡ t+2:t([ˆǫ t,ˆǫ t+1 ],[r t,r t+1 ]) ˆǫ t,r t } with the first expectation over γ t+1 and the second expectation over ˆǫ t+1. 6

Optimal Rate Adaptation (cont.): The solution to this dynamic program is a partially observable Markov decision process (POMDP), which is intractable when the SNRs {γ t } are continuous random variables, can be solved numerically when the SNRs {γ t } are discrete, but at great cost: both complexity and memory increase exponentially with the horizon length T and the number of states used for γ t. ( PSPACE-complete ) Not practical to implement! 7

Greedy Adaptation: What if we maximize only the short-term reward? This corresponds to the greedy scheme r g t+1 argmax r t+1 E { G(r t+1,γ t+1 ) ˆǫ t,r t }, which should be much easier to implement. How good is this greedy scheme (w.r.t the optimal POMDP)? 8

The Causal Genie: Consider the optimal rate under non-degraded causal error-rate feedback: r cg t+1 argmax r t+1 E { G(r t+1,γ t+1 )+ T τ=t+2 G(r cg τ,γ τ ) ǫ t,r t }. For a given rate r t, the error ǫ t uniquely determines the SNR γ t, so that r cg t+1 = argmax r t+1 E { G(r t+1,γ t+1 )+ T τ=t+2 G(r cg τ,γ τ ) } γ t. Since future causal-genie rates {r cg τ } τ t+2 will be chosen based on perfect SNR knowledge, they will not depend on r t+1. Thus, r cg t+1 = argmax r t+1 E { G(r t+1,γ t+1 ) γt }. Optimal adaptation under non-degraded error-rate feedback is greedy! 9

The Causal Genie is an Upper Bound on the POMDP: Since E{G(r t,γ t ) ˆǫ t d,r t d } max E{G(r t,γ t ) ˆǫ t d,r t d } r t R...since rt is not necessarily short-term optimal = max E{ E{G(r t,γ t ) ˆǫ t d,r t d,ǫ t d } } ˆǫ t d,r t d r t R E { max E{G(r t,γ t ) ˆǫ t d,r t d,ǫ t d } } ˆǫ t d,r t d r t R...since max r t E{f(r t )} E{max r t = E { max r t R E{G(r t,γ t ) γ t d } ˆǫ t d,r t d } = E{G(r cg t,γ t ) ˆǫ t d,r t d }, summing and averaging both sides gives { T } { T E G(rt,γ t ) E t=1 t=1 f(r t )} for any f( ) } G(r cg t,γ t ). 10

Recap: So far we ve shown that, in the point-to-point problem, 1. Optimal rate adaptation is very difficult to implement/analyze. 2. There exists a suboptimal greedy scheme which is easy to implement. 3. The causal-genie scheme (i.e., the optimal scheme based on non-degraded feedback) is easy to implement, since it s greedy. 4. The causal genie upper bounds the performance of the optimal scheme. So, if we can show that the greedy scheme is close to the causal genie, then greedy must be close to optimal. 11

Numerical Examples Setup: Modulation: Uncoded square-qam, p = 100, packet error rate ( ( ǫ(r t,γ t ) = 1 1 2 1 1 ) ( )) 2p 3γt Q. 2 r t /p 2 r t/p 1 ARQ feedback: ˆǫ t {0,1} SNR variation: Pr(ˆǫ t = f ǫ t ) = ǫ t 1 ǫ t for f = 0 ( NAK ) for f = 1 ( ACK ) Rayleigh-fading channel gain (via Gauss-Markov process): g t+1 = (1 α)g t +αw t, {w t } i.i.d CN(0,1) SNR: γ t = C g t 2. Parameters (α,c) control the mean and coherence time of γ t. For a given α, we assume C is chosen so that E{γ t } = 25 db. 12

Steady-state goodput versus fading-rate parameter α: 5.2 5 Steady State Goodput 4.8 4.6 4.4 4.2 4 Non causal Genie Causal Genie Greedy Best fixed rate Greedy rate adaptation extracts most of the goodput gain available from the use of causal ARQ feedback! 3.8 3.6 10 4 10 3 10 2 10 1 Gauss Markov model parameter, α Best fixed-rate: r fr t+1 = argmax rt+1 G(rt+1,γ t+1 )p(γ t+1 )dγ t+1 Greedy: r g t+1 = argmax rt+1 G(rt+1,γ t+1 )p(γ t+1 ˆǫ t,r t )dγ t+1 Causal Genie: r cg t+1 = argmax rt+1 G(rt+1,γ t+1 )p(γ t+1 ǫ t,r }{{} t )dγ t+1 Non-causal Genie: r ng γ t+1 = argmax rt+1 G(r t+1,γ t+1 ) t 13

Back to the OFDMA Problem: Single-antenna downlink with K mobile users OFDMA with N subchannels Channels are Markov time-varying with impulse responses of length L ACK/NAK feedback on each transmitted data packet (i.e., ARQ) data data ACK/NAK data ACK/NAK ACK/NAK subchannel 1 subchannel 2 subchannel 3 14

Formal Objective: At each time t and subchannel n, choose each user k s next... rate r n,k,t+1 R, for some M-ary rate alphabet R, power p n,k,t+1 0, based on ARQ feedbacks F 1:t, to maximize the total future utility { T K N (1 ǫ(rn,k,τ Ḡ t+1:t = E U(,γ n,k,τ,p n,k,τ ) ) ) } r n,k,τ F 1:t }{{} τ=t+1 k=1 n=1 goodput from k on n at τ subject to the power constraint n,k p n,k,τ P con, τ, and subject to 1 (user,rate) per (time,subchannel). Here, ǫ(r,γ,p) is packet error rate, U( ) is a concave utility function, and F 1:t {0,1, } tnk collects all past ARQ feedbacks. 15

Greedy Resource Allocation: Using the indicator I n,m,k,t {0,1} to denote time-t assignment of subchannel n to user k at MCS index m, the time-t greedy allocation is { ) max E U (I } n,k,m,t+1 (1 a m e b mp n,k,m,t+1 γ n,k,t+1 )r m F 1:t I n,k,m,t+1 {0,1} k n,m p n,k,m,t+1 0 subject to n,k,mi n,k,m,t p n,k,m,t P con, t, and k,m I n,k,m,t 1, n, t, where γ n,k,t is SNR of user k on subchannel n at time t, (a m,b m,r m ) determines data rate and error rate for MCS index m, F 1:t collects all ARQ feedbacks collected from times 1 to t. A mixed-integer programming problem! 16

Greedy Resource Allocation: Exact solution: Brute Force: Compute water-filling solution for every hypothesized (subcarrier,user,rate) combination. Costs O ( (KM) N). Proposed approximate solution: Relax problem by allowing each (subcarrier,time) slot to be shared among multiple (user,rate) pairs. Can reformulate as a convex optimization problem. Can solve dual problem using an iterative algorithm with exponential convergence rate at O(N KM) per iteration. Byproducts of above algorithm yield a near-exact solution to the original mixed-integer problem. R. Aggarwal, M. Assaad, C.E. Koksal, P. Schniter, Optimal Joint Scheduling and Resource Allocation in OFDMA Downlink Systems with Imperfect Channel-State Information, manuscript in preparation. 17

Approximate Greedy Algorithm Sketch: Say that we relax the binary indicators to Ĩn,m,k,t [0,1]. Then the KKT conditions become (suppressing the time-t notation): n,k,m, µ = a m b m r m E{γ n,k e b mp n,k,m γ n,k F} (1) n,k,m, λ n = r m E{1 a m e b mp n,k,m γ n,k F} µp n,k,m (2) where {λ n } N n=1 and µ are Lagrange multipliers. A practical alg is then: 1. Initialize search region [µ,µ] with midpoint µ = (µ+µ)/2. 2. For each subchannel n, For each (k,m)... calculate p n,k,m by solving (1) and forcing a non-negative result. plug p n,k,m into (2) and calculate the corresponding λ n (k,m). Find (k,m ) = argmax (k,m) λ n (k,m). Set I n,k,m = 1 and I n,k,m (k,m) (k,m ) = 0. 3. If n p n,k,m > P con, set [µ,µ] [µ,µ], else set [µ,µ] [µ,µ]. Goto 2. 18

Approximate Greedy Algorithm Example Performance: N K M greedy goodput (brute force) approximation 1 3 9 5.9884 5.988 1 5 9 6.3501 6.3499 2 3 9 10.3251 10.3249 2 5 9 10.9778 10.9774 3 3 9 14.0573 14.0571 3 5 9 14.9653 14.9651 The practical approximation typically yields 99.99% of the goodput attained by the exact greedy scheme! We have also derived performance bounds on our approximation. 19

Tracking the SNR distribution: The greedy allocator tracks the SNR by updating the SNR distributions { p(γn,k,t+1 F 1:t ), users k and subchannels n } The SNR evolves as follows: Markov evolution of time-domain channel taps: h l,k,t+1 = (1 α)h l,k,t +αw l,k,t, w l,k,t CN(0,1), subchannel gains as a function of time-domain channel taps: H n,k,t = L 1 l=0 h l,k,t e j 2π N nk, subchannel SNRs as a function of subchannel gains: γ n,k,t = C H n,k,t 2. 20

Tracking the SNR distribution (cont.): SNR tracking can be done as follows: p(γ n,k,t+1 F 1:t ) = p(γ n,k,t+1 h k,t+1 ) p(h h k,t+1 }{{} k,t+1 F 1:t ) (3) p(h k,t+1 F 1:t ) = p(h k,t F 1:t ) = p(f k,t h k,t ) = p(f n,k,t = f γ n,k,t ) = (approx of) Dirac delta p(h k,t+1 h k,t ) p(h h k,t }{{} k,t F 1:t ) (4) Markov prediction p(f k,t h k,t )p(h k,t F 1:t 1 ) p(f k,t h k,t)p(h (Bayes rule) (5) k,t F 1:t 1 ) h k,t N p(f n,k,t γ n,k,t (h k,t )) (6) n=1 m I n,k,m,ta m e b mp n,k,m,t γ n,k,t f = 0 m I n,k,m,t(1 a m e b mp n,k,m,t γ n,k,t) f = 1 1 m I n,k,m,t f = (7) 21

Tracking the SNR distribution (cont.): Thus, for each user k, 1. measure feedbacks f k,t across all subchannels, 2. compute p(f n,k γ n,k,t (h k,t )) on h-lattice using error-rate rules (6)-(7), 3. compute p(h k,t F 1:t ) on h-lattice by updating previous posterior via (5), 4. compute p(h k,t+1 F 1:t ) on h-lattice via Markov-prediction step (4), 5. compute p(γ k,t+1 F 1:t ) on γ-lattice via h-to-γ conversion step (3). This costs O ( KNQ L h +KLQL+1 h +KNQ γ Q L h), where Q h = number of grid points used per dimension of h-lattice, Q γ = number of grid points used per dimension of γ-lattice. 22

Numerical Experiments: Setup: K = 2 users N = 2 subchannels L = 2 time-domain channel taps E{γ n,k,t } = 25dB mean subchannel SNR ρ = 0.33 subchannel correlation Plots show goodput versus fading rate α goodput versus time t power/rate/user versus time t 23

Steady-state goodput versus α: Steady State Sum Goodput 10.5 10 9.5 9 8.5 8 7.5 Global Genie Greedy Algorithm Round Robin (no feedback) 7 10 4 10 3 10 2 10 1 Fading rate α 24

Goodput for α = 0.001: 12 2 users, 2 subcarrier, α = 1e 3, 200 packets 10 Total goodput in all subcarriers 8 6 4 genie aided CSI tracked CSI prior CSI 2 genie CSI avg = 11.0324 tracked CSI avg = 10.574 prior CSI avg = 7.3398 0 0 20 40 60 80 100 120 140 160 180 200 Packet Number 25

Allocations for α = 0.001: Power Subcarrier 1 SNR estimate Subcarrier 1 Power Subcarrier 2 SNR estimate Subcarrier 2 1 0.5 2000 1000 2 users, 2 subcarriers, α = 1e 3, 200 Packets User 1 after ACK User 2 after ACK User 1 after NACK User 2 after NACK Rate change up or down 0 0 20 40 60 80 100 120 140 160 180 200 0 1 0.5 2000 1000 User 1 User 2 Actual SNR for corresponding users 20 40 60 80 100 120 140 160 180 200 0 0 20 40 60 80 100 120 140 160 180 200 0 20 40 60 80 100 120 140 160 180 200 Packet Number 26

Summary: Goal: From only ARQ feedback, optimize OFDMA users, powers, and rates to maximize finite-horizon expected goodput under an instantaneous total-power constraint. The optimal resource allocator is a POMDP, which is computationally impractical. We settle for greedy resource allocation, found to be near-optimal for practical fading rates. Greedy allocation is a mixed-integer programming problem, but we can solve it almost exactly with O(N KM) complexity. To maintain CSI, we track the SNR distribution (conditioned on past ACK/NAK feedback) of each user at each subcarrier. Preliminary experiments for 2 users and 2 subchannels indicates that our practical algorithm performs relatively close to a genie-aided upper bound on the optimal POMDP. 27

Thanks for listening! 28