Approximate Fairness with Quantized Congestion Notification for Multi-tenanted Data Centers
Abdul Kabbani, Stanford University
Joint work with: Mohammad Alizadeh, Masato Yasuda, Rong Pan, and Balaji Prabhakar
Multi-tenant DCs
- Cloud computing with multiple tenants
- Need to share networking resources in a programmable way
- One tenant/class shouldn't adversely affect another:
  - Different versions of TCP, some more aggressive than others
  - Different transport protocols (e.g., UDP)
  - Malicious flows
[Figure: five flows (Flow 1 through Flow 5) sharing a link]
Flow isolation
- Classical approach: isolate packets at buffers
  - Followed by FQ/WFQ, GPS, DRR
- But isolating packets in buffers can be hard/expensive
  - Need per-flow queues, per-packet schedulers
  - Lots of datapath work
Our Approach
- Key observations for providing bandwidth slices:
  - It is not necessary to do per-packet scheduling; fairness over a few RTTs works well
  - Focus on the larger flows; small (mice) flows impose work but don't consume bandwidth
- Provide bandwidth slices by sending congestion signals differentially, rather than by scheduling
- AFD (Pan, Breslau, Prabhakar, Shenker, 2003)
  - Uses a single queue, slices bandwidth via differential dropping
Goal
[Figure: two flows through a switch, flow 1 at 1 Gbps and flow 2 at 9 Gbps. With QCN, both flows receive the same feedback, Fb = 2. With AF-QCN, flow 1 receives Fb = 1 and flow 2 receives Fb = 7, so feedback reflects each flow's share.]
Rest of the talk
- Quick overview of AFD
- Overview of QCN: L2 congestion control
- Approximate Fair QCN (AF-QCN) algorithm
AFD
- Based on 3 simple mechanisms:
  - Estimate per-flow/class arrival rate
    - counting per-class bytes over fixed intervals (T_s)
    - averaging over multiple intervals
  - Estimate fair share rate: fair share = C / #flows
  - Perform differential probabilistic dropping to drive each arrival rate to the fair (or weighted fair) rate
AFD Algorithm
- D_i = drop probability for class i's arriving packets
- M_i = arrival estimate for class i (bytes over interval T_s)
- Fair share update: Mfair <- Mfair - a1*(qlen - Qref) + a2*(qlen_old - Qref)
- If M_i <= F(Mfair, Min_i, Max_i, W_i): no drop (D_i = 0)
- If M_i > F(Mfair, Min_i, Max_i, W_i): choose D_i > 0 such that M_i*(1 - D_i) = F(Mfair, Min_i, Max_i, W_i)
[Figure: arriving class-i packets are admitted with probability 1 - D_i into a single queue with reference length Qref]
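The two AFD update rules above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names and the coefficient values a1, a2 are assumptions, and the weighted fair-share function F is reduced to a simple w_i * Mfair product.

```python
def update_fair_share(mfair, qlen, qlen_old, qref, a1=1.8, a2=1.7):
    """Adjust the fair-share estimate Mfair from the queue-length error,
    as in the AFD update rule (a1, a2 values here are illustrative)."""
    return mfair - a1 * (qlen - qref) + a2 * (qlen_old - qref)


def drop_probability(m_i, mfair, w_i=1.0):
    """Drop probability for class i chosen so that the admitted bytes
    M_i * (1 - D_i) match the (weighted) fair share w_i * Mfair."""
    target = w_i * mfair
    if m_i <= target:
        return 0.0          # class is under its share: no drops
    return 1.0 - target / m_i
```

For example, a class that arrived at twice the fair share gets a drop probability of 0.5, which drives its admitted rate back to the fair share.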
The QCN control loop
[Figure: sources S_1 ... S_N (reaction points) send to destinations D_1 ... D_N through a congestion point, which returns feedback to the reaction points]
QCN Congestion Point
- Consider the single-source, single-switch loop: source -> congestion point (switch), with queue equilibrium Q_eq
- Dynamics: sample packets, compute feedback (Fb), send it to the source
  - Fb = -(Q - Q_eq + w * dQ/dt) = -(queue offset + w * rate offset)
  - Fb is quantized to 6 bits
[Figure: reflection probability as a function of Fb, ranging between P_min and P_max]
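The congestion-point feedback computation above can be sketched as follows. This is an assumption-laden illustration: dQ/dt is approximated by the queue change since the last sample, and the function name and clamping details are mine, not from the QCN specification.

```python
def compute_fb(q, q_old, q_eq, w=2.0):
    """QCN congestion-point feedback: negative of (queue offset plus
    weighted rate offset), with dQ/dt approximated by the change in
    queue length since the previous sample."""
    q_offset = q - q_eq
    rate_offset = q - q_old           # proxy for dQ/dt
    fb = -(q_offset + w * rate_offset)
    # Only negative Fb (congestion) is fed back; |Fb| fits in 6 bits.
    return max(-63, min(0, int(fb)))
```

A queue above Q_eq and still growing yields a strongly negative Fb; a queue below equilibrium yields Fb = 0, meaning no feedback message is sent.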
QCN: Reaction Point
- Source (reaction point): transmits regular Ethernet frames
- When a congestion message arrives:
  - Multiplicative decrease: CR <- CR*(1 - G_d*Fb)
- Then Fast Recovery, followed by Active Probing
[Figure: on a congestion message the current rate drops below the target rate TR; fast recovery halves the rate deficit each cycle (Rd, Rd/2, Rd/4, Rd/8), after which active probing raises the rate further]
AF-QCN
How does it work?
[Figure: two flows through a switch, flow 1 at 1 Gbps and flow 2 at 9 Gbps. With QCN, both receive Fb = 2; with AF-QCN, flow 1 receives Fb = 1 and flow 2 receives Fb = 7.]
AF-QCN Algorithm
- Upon sampling a packet at the QCN switch, Fb is computed as:
  - Fb <- (1 - α)*Fb_QCN + α*Fb_AF
- Fb_QCN: the same value calculated by the QCN CP (flow-independent)
- Fb_AF: a fairness term calculated as in AFD (flow-dependent)
- α is small (chosen to be 1/8), to ensure QCN's good stability is retained
  - Utilization (stability) first, fairness second
  - Major difference from DRR-type schedulers
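The blending rule above is a one-liner; the sketch below only illustrates the convex combination (the function name is mine, and the Fb_AF computation itself is assumed to come from the AFD machinery described earlier).

```python
ALPHA = 1.0 / 8  # small, so QCN's stability dominates

def af_qcn_feedback(fb_qcn, fb_af, alpha=ALPHA):
    """Blend the flow-independent QCN feedback with the AFD-style
    flow-dependent fairness term: Fb = (1 - a)*Fb_QCN + a*Fb_AF."""
    return (1 - alpha) * fb_qcn + alpha * fb_af
```

With α = 1/8, a flow whose fairness term agrees with the QCN term sees no change, while an over-share flow gets a modestly stronger signal: fairness nudges the feedback without disturbing the control loop.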
Evaluation
Setup
- Simulations
  - Static flows:
    - Service rate: 10 Gbps -> 2 Gbps -> 10 Gbps
    - Class priorities: uniform and variable weights, rate caps
    - Parking-lot topology
  - Dynamic flows:
    - Bursty on-off source together with backlogged sources
    - Poisson arrivals with backlogged sources
- NetFPGA hardware evaluation
Single Link, 4 Sources, 50us RTT
[Figure: per-flow throughput under QCN vs. AF-QCN]
AF-QCN with Different Weights
- w_i = i
- Flow 4 is capped at 1 Gbps at t = 2 sec
Parking Lot
[Figure: per-flow throughput on the parking-lot topology under QCN vs. AF-QCN]
Dynamic Flows: Flow Completion Times (FCT)
- 8 RPs sharing one link
  - 4 RPs serving backlogged static flows
  - 4 RPs each serving 4 permanent connections (16 connections in total)
- Dynamic flows
  - Pareto-distributed sizes with mean 10KB
  - Poisson arrivals
  - 1Gbps total offered load

Flow size bin (KB) | FCT with QCN (usec) | FCT with AF-QCN (usec)
[1, 10)            | 2.346               | 1.586
[10, 100)          | 2.610               | 1.732
[100, 1000)        | 5.037               | 2.932
[1000, inf)        | 33.14               | 17.14
Hardware Implementation (1Gbps NetFPGA)
[Figure: two panels, same weights vs. different weights]
Summary
- AF-QCN:
  - Provides programmable bandwidth allocation via lightweight CP modifications
  - Fair at the granularity of a few msecs
  - Does not disturb the original QCN characteristics: stability, responsiveness, etc.
  - Improves flow completion times
- Similar approaches seem promising at L3
Backup
Single Link, 4 Sources, 400us RTT
[Figure: per-flow throughput under QCN vs. AF-QCN]
Dynamic Flows: Flow Completion Times (FCT)
- 8 RPs sharing one link
  - 4 RPs serving backlogged static flows
  - 4 RPs each serving 4 permanent connections (16 connections in total)
- Dynamic flows
  - Pareto-distributed sizes with mean 10KB
  - Poisson arrivals
  - 1Gbps total offered load

Flow size bin (KB) | FCT with QCN (usec) | FCT with AF-QCN (usec)
[1, 10)            | 2.346, 4.89         | 1.586, 3.23
[10, 100)          | 2.610, 4.93         | 1.732, 3.33
[100, 1000)        | 5.037, 4.47         | 2.932, 3.04
[1000, inf)        | 33.14, 22.9         | 17.14, 14.8
Fairness achieved: 40 Sources, 50us RTT
1 Bursty Source + 3 Static Sources
- 10KB bursts (totaling 1Gbps)
- 10KB bursts (totaling 10Gbps)
AF-QCN Algorithm
- Flow j's arrivals are estimated every T_s (equal to 1 msec here) as:
  - m_j <- (1 - β)*m_j + β*m_j,new
- where:
  - m_j,new denotes the packet arrivals within the last T_s interval
  - β is small enough to smooth down bursty arrivals (chosen to be 1/8)
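The arrival estimator above is a standard exponentially weighted moving average; a minimal sketch (the function name is mine, not the paper's):

```python
BETA = 1.0 / 8  # small, to smooth bursty arrivals

def update_arrival_estimate(m_j, m_j_new, beta=BETA):
    """EWMA of flow j's byte arrivals, updated once per T_s interval
    (1 msec on this slide): m_j <- (1 - beta)*m_j + beta*m_j_new."""
    return (1 - beta) * m_j + beta * m_j_new
```

With β = 1/8, a one-interval burst of double the usual arrivals raises the estimate by only 12.5%, so transient bursts do not swing the fairness term.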