A Study of Active Queue Management for Congestion Control

Victor Firoiu, vfiroiu@nortelnetworks.com, Nortel Networks, 3 Federal St., Billerica, MA 01821 USA
Marty Borden, mborden@tollbridgetech.com, Tollbridge Technologies, 872 Hermosa Dr., Sunnyvale, CA 94086 USA

Abstract
In this work we investigate mechanisms for Internet congestion control in general, and Random Early Detection (RED) in particular. We first study the current proposals for RED implementation and identify several structural problems, such as producing large traffic oscillations and introducing unnecessary overhead in fast-path forwarding. We model RED as a feedback control system and discover fundamental laws governing the traffic dynamics in TCP/IP networks. Based on this understanding, we derive a set of recommendations for the architecture and implementation of congestion control modules in routers, such as RED.

I. INTRODUCTION
Congestion control for IP networks has been a recurring problem for many years. The problem of congestion collapse encountered by early TCP/IP protocols prompted the study of end-to-end congestion control algorithms in the late 1980s and proposals such as [4], which forms the basis for TCP congestion control in current implementations. The essence of this congestion control scheme is that a TCP sender adjusts its sending rate according to the rate (probability) of packets being dropped in the network, which is considered a measure of network congestion. This algorithm is relatively well understood, and several models have been proposed and verified with increasing degrees of accuracy through simulation and Internet measurements [6], [8].

In traditional implementations of router queue management, packets are dropped when a buffer becomes full, in which case the mechanism is called Drop-Tail. More recently, other queue management mechanisms have been proposed, the most popular being Random Early Detection (RED) [3].
RED has the potential to overcome some of the problems observed with Drop-Tail, such as the synchronization of TCP flows and the correlation of drop events within a TCP flow (multiple packets being dropped in sequence). In RED, packets are randomly dropped before the buffer is completely full, and the drop probability increases with the average queue size. RED is a powerful mechanism for controlling traffic: it can provide better network utilization than Drop-Tail if properly used, but it can induce network instability and major traffic disruption if not properly configured.

RED configuration has been a problem since RED was first proposed, and many studies [2], [1], [5] have tried to address this topic. Unfortunately, most of these studies propose RED configurations (optimal sets of RED parameters) based on heuristics and simulations rather than on a systematic approach. Their common problem is that each proposed configuration is good only for the particular traffic conditions studied, and may have detrimental effects under other conditions.

In this study, we propose a general method for configuring RED congestion control modules, based on a model of RED as a feedback control system with TCP flows. We use this model, together with requirements for stability and efficiency, to derive a set of RED configuration parameters suited to a given range of traffic characteristics and line speed. Our algorithm for computing RED parameters is implemented in a configuration program that network managers can use to configure routers.

The rest of the paper is organized as follows. In Section II we construct a model of RED as a feedback control system, verify it by simulation, and use it to derive its stability conditions. In Section III we use these conditions to configure the RED control function. In Section IV we study the estimator of the average queue size and make recommendations for its configuration. Section V concludes the paper and gives some directions for future work.

II. QUEUE-SIZE-BASED CONGESTION CONTROL AS A FEEDBACK CONTROL SYSTEM

In this section, we analyze the dynamics of TCP congestion control in the presence of a queue-size-based congestion control module. First (in Section II-A), we develop a model of the average queue size when TCP flows pass through a queue system with a fixed drop probability. In Section II-B we verify this model through simulations. In Section II-C, we combine this model with RED's control element and derive the steady-state behavior of the resulting feedback control system. In Section II-D we analyze the stability of the RED control system.

A. A model of average queue size as a function of average packet drop probability

In this section we construct a model of TCP congestion control coupled with a queuing system that drops packets. To obtain an approximate but tractable model, we make several simplifying assumptions.

[Fig. 1. An n-flow feedback control system: TCP senders $A_1, \dots, A_n$ with sending rates $r_{s,i}$; a drop module with drop rate $p = H(q)$ at link $l$ (capacity $c$, between routers B and C), where the average queue size is $q = G(p)$; TCP receivers $D_1, \dots, D_n$ with throughputs $r_{t,i}$ and aggregate throughput $\sum_j r_{t,j}(p)$.]

We consider a system of $n$ TCP flows passing through a common link $l$ with capacity $c$, as in Figure 1. TCP flow $f_i$, $1 \le i \le n$, is established between hosts $A_i$ and $D_i$ and transports data in one direction, from $A_i$ to $D_i$. The traffic in the opposite direction consists only of acknowledgement packets (ACKs). We assume that all access links, from $A_i$ to B and from C to $D_i$, have enough capacity so that the link from B to C is the only bottleneck link for any flow $f_i$, i.e., the only link where the incoming traffic rate can surpass the link's capacity, and thus where packets may be discarded. We assume that the number of flows $n$ remains constant for a long period of time, and that all flows have long duration (i.e., have data available to send for a long time). We assume a TCP Reno implementation (currently the most commonly deployed), where the congestion window increases linearly over time during periods with no packet discard. If a packet discard is detected, the congestion window is halved or a timeout waiting period begins. [1] has a detailed description of TCP Reno.
In [8] and [9] we have derived a model of TCP Reno congestion control, which we use in the following. Each flow $f_i$ sends with rate $r_{s,i}$. The sending rates of all $n$ flows combine at the buffer of link $l$ and generate a queue of size $q$. The discard module at link $l$ drops packets with a probability $p$ that is a function of the average queue size. For each flow $f_i$, the packets that are not discarded are sent on link $l$ with rate $r_{t,i}$ (smaller than $r_{s,i}$ by the number of packets dropped from $f_i$ per second). Each TCP sender adjusts its sending rate according to the drop probability.

We can view this system as a feedback control system in which the controlled systems are the TCP senders, the controlling element is the drop module, the feedback signal is the drop probability, and the controlled variables are the TCP sending rates. This control system differs from a classical control system in that there are several simultaneously controlled elements instead of one. Also, the number of controlled elements (TCP flows) can vary over time. The purpose of the controlling element is to bring and keep the cumulative throughput (of all flows) below or equal to the link's capacity $c$:

$$\sum_{j=1}^{n} r_{t,j} \le c \tag{1}$$

Since we have assumed that the TCP flows have long duration and that their number does not change, the throughput of each TCP flow follows the steady-state model that we have derived in [9]:¹

$$r_{t,i}(p, R_i) = T(p, R_i)$$

where

$$T(p,R) = \begin{cases} M\,\dfrac{\frac{1-p}{p} + W(p) + \frac{Q(p,W(p))}{1-p}}{R\left(\frac{b}{2}W(p) + 1\right) + \frac{Q(p,W(p))\,F(p)\,T_0}{1-p}} & \text{if } W(p) < W_{max} \\[2ex] M\,\dfrac{\frac{1-p}{p} + W_{max} + \frac{Q(p,W_{max})}{1-p}}{R\left(\frac{b}{8}W_{max} + \frac{1-p}{p\,W_{max}} + 2\right) + \frac{Q(p,W_{max})\,F(p)\,T_0}{1-p}} & \text{otherwise} \end{cases} \tag{2}$$

$T$ is the throughput of a TCP flow (in bits/second). It depends on the packet drop probability $p$, the average round trip time $R$, the average packet size $M$ (in bits), the average number $b$ of packets acknowledged by an ACK (usually 2), the maximum congestion window size $W_{max}$ advertised by the flow's TCP receiver (in packets), and the duration $T_0$ of a basic (non-backed-off) TCP timeout (which is typically $5R$).
Also, $W$, $Q$ and $F$ have the following expressions:

$$W(p) = \frac{2+b}{3b} + \sqrt{\frac{8(1-p)}{3bp} + \left(\frac{2+b}{3b}\right)^2} \tag{3}$$

¹ Observe that the model in [9] assumes correlated drops due to Drop-Tail, whereas here we assume uncorrelated, random drops. We will see in Section II-B that our use of the model in [9] is a good approximation.

$$Q(p,w) = \min\left(1,\; \frac{\left(1-(1-p)^3\right)\left(1+(1-p)^3\left(1-(1-p)^{w-3}\right)\right)}{1-(1-p)^w}\right) \tag{4}$$

$$F(p) = 1 + p + 2p^2 + 4p^3 + 8p^4 + 16p^5 + 32p^6 \tag{5}$$

We can construct a simple model of the n-flow feedback system if we assume that all flows have the same average round trip time, $R_i = R$, and the same average packet size, $M_i = M$, $1 \le i \le n$, and that the $W_{max,i}$ are large enough not to influence $T(p,R)$. We then have

$$r_{t,i}(p,R) = r_{t,j}(p,R), \quad 1 \le i, j \le n$$

and (1) becomes

$$r_{t,i}(p,R) \le c/n, \quad 1 \le i \le n$$

We can now reduce the n-flow feedback system to a single-flow feedback system, as in Figure 2 (where we dropped the index $i$).

[Fig. 2. A single-flow feedback control system: TCP sender A with sending rate $r_s(p)$, drop module with drop rate $p = H(q)$, link $l$ of capacity $c$ with average queue size $q = G(p)$, and TCP receiver D with throughput $r_t(p)$.]

To determine the steady state of this feedback system (i.e., the average values of $r_t$, $q$ and $p$ when the system is in equilibrium), we need to determine the queue function (or "queue law") $q = G(p)$ and the control function $p = H(q)$. The control function $H$ is given by the architecture of the drop module, such as Drop-Tail or RED.

[Fig. 3. An open control system with one TCP flow: TCP sender A with rates $r_s(p)$ and $r_t(p)$, link $l$ with average queue size $q = G(p)$, and TCP receiver D.]

To determine $q = G(p)$, let us examine the open-loop (non-feedback) system in Figure 3, where $p$ is an independent parameter. Since we have assumed that $l$ is the only bottleneck link for any TCP flow, the average round trip time of a packet is the sum of the average waiting time in the queue of link $l$ and $R_o$, the propagation and transmission time on the rest of its round trip. Assuming a FIFO queuing discipline at $l$, the average waiting time in queue $l$ is $q/c$, and thus

$$R = R_o + q/c$$

Depending on the value of $p$, the system can be in one of two states.
For $p > p_0$, the link's bandwidth is underutilized, $r_t(p,R) < c/n$. Then the average queue size is negligible, $q \approx 0$, thus $R = R_o$, and the utilization of the link is

$$u(p) = \frac{r_t}{c/n} = \frac{T(p,R_o)}{c/n}, \quad p > p_0$$

For $p \le p_0$, the link's bandwidth is fully utilized, $u(p) = 1$, and the average queue size can be derived from the condition $r_t(p, R_o + q/c) = c/n$:

$$q(p) = c\left(T_R^{-1}(p, c/n) - R_o\right)$$

where $T_R^{-1}(p,y)$ is the inverse of $T(p,R)$ in $R$, i.e., $T_R^{-1}(p, T(p,R)) = R$, the function $T(p,R)$ being defined in (2). If the probability of random drop is small enough that $c\left(T_R^{-1}(p,c/n) - R_o\right) > B$ ($B$ is the buffer size), then some additional packets are dropped due to buffer overflow. Obviously, the average queue cannot be larger than the buffer size, thus

$$q(p) = \min\left(B,\; c\left(T_R^{-1}(p,c/n) - R_o\right)\right)$$

We can also determine $p_0$, the value of the drop probability at which the link regime changes between under-utilized and fully-utilized. Since $R = R_o$ for $p \ge p_0$, $p_0$ is given by $r_t(p_0, R_o) = c/n$. Denoting by $T_p^{-1}(x,R)$ the inverse of $T(p,R)$ in $p$, i.e., $T_p^{-1}(T(p,R),R) = p$, we have

$$p_0 = T_p^{-1}(c/n, R_o) \tag{6}$$

In conclusion, we have the following expressions for the average queue size and link utilization as a function of the drop probability $p$:

$$q(p) = \begin{cases} \min\left(B,\; c\left(T_R^{-1}(p,c/n) - R_o\right)\right) & \text{if } p \le p_0 \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

$$u(p) = \begin{cases} 1 & \text{if } p \le p_0 \\ \dfrac{T(p,R_o)}{c/n} & \text{otherwise} \end{cases} \tag{8}$$

In the next section, we verify the above model of the n-flow non-feedback system through simulation.
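In general, the queue law (7) and the threshold (6) have no closed form, but both are easy to evaluate numerically. The Python sketch below shows one possible way to do this. It is not the authors' configuration program. It codes (2)-(5) directly and assumes $T_0 = 5R$, as stated above. Under that assumption $T(p,R) = T(p,1)/R$, so the inverse in $R$ reduces to a division. The sketch measures $M$ in bits and $B$ in packets, and it finds $p_0$ by bisection on $p$, relying on $T$ being decreasing in $p$. The function names (`tcp_throughput`, `p0_threshold`, `queue_law`, `utilization`) are hypothetical.

```python
import math

def W(p, b=2):
    """Unconstrained average congestion window in packets, eq. (3)."""
    k = (2 + b) / (3 * b)
    return k + math.sqrt(8 * (1 - p) / (3 * b * p) + k * k)

def Q(p, w):
    """Probability that a loss indication is a timeout, eq. (4)."""
    s = 1 - p
    num = (1 - s**3) * (1 + s**3 * (1 - s**(w - 3)))
    return min(1.0, num / (1 - s**w))

def F(p):
    """Timeout back-off factor, eq. (5)."""
    return 1 + p + 2*p**2 + 4*p**3 + 8*p**4 + 16*p**5 + 32*p**6

def tcp_throughput(p, R, M=4000.0, b=2, w_max=math.inf):
    """T(p, R) of eq. (2) in bits/s, assuming T0 = 5R."""
    T0 = 5 * R
    w = min(W(p, b), w_max)
    qq = Q(p, w)
    pkts = (1 - p) / p + w + qq / (1 - p)
    if W(p, b) < w_max:
        rounds = R * (b / 2 * w + 1)
    else:
        rounds = R * (b / 8 * w_max + (1 - p) / (p * w_max) + 2)
    return M * pkts / (rounds + qq * F(p) * T0 / (1 - p))

def p0_threshold(c, n, R_o, M=4000.0):
    """Eq. (6): solve T(p0, R_o) = c/n by bisection; assumes T(0.5, R_o) < c/n."""
    lo, hi = 1e-9, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if tcp_throughput(mid, R_o, M) > c / n:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def queue_law(p, c, n, R_o, B, M=4000.0):
    """Eq. (7): average queue in packets, capped at the buffer size B."""
    R_star = tcp_throughput(p, 1.0, M) / (c / n)   # T_R^{-1}(p, c/n) when T0 = 5R
    q_pkts = c * (R_star - R_o) / M
    return min(B, max(0.0, q_pkts))                # 0 when p > p0 (under-utilized)

def utilization(p, c, n, R_o, M=4000.0):
    """Eq. (8): link utilization u(p)."""
    return min(1.0, tcp_throughput(p, R_o, M) / (c / n))

# Example: 1.5 Mb/s link, 20 flows, R_o = 100 ms, 500-byte packets, B = 75 packets.
c, n, R_o, B = 1.5e6, 20, 0.1, 75
p0 = p0_threshold(c, n, R_o)
for p in (0.005, 0.01, 0.02, 0.04, p0):
    print(f"p={p:.4f}  q={queue_law(p, c, n, R_o, B):6.2f} pkts  u={utilization(p, c, n, R_o):.3f}")
```

The sketch clamps the queue to the range $[0, B]$ with `max(0.0, ...)`. This reproduces both branches of (7) without computing $p_0$ explicitly, because for $p > p_0$ the required RTT $T_R^{-1}(p, c/n)$ is already below $R_o$.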

B. Verification through simulation

We have conducted extensive simulation experiments using the ns simulator [7]. The topology of the simulated network is as in Figure 1. We have performed three simulation sets, with link $l$ of capacity $c$ = 1.5 Mb/s, 45 Mb/s and 150 Mb/s. In all cases, all other links have significantly higher speed and buffer capacity than $l$, so that there is no packet drop due to buffer overflow on them. For each link speed $c$, we perform several experiments with different numbers of flows $n$, chosen such that the average throughput per flow, $\gamma = c/n$, lies between $\gamma_{min}$ = 12.5 Kb/s and $\gamma_{max}$ = 75 Kb/s. Each TCP flow is generated by an infinite FTP application, i.e., one that is active throughout the duration of the simulation. All TCP flows have RTT $R_o$ = 100 ms (which does not include the waiting time at queue $l$) and an average packet size of 500 bytes. The buffer at link $l$ has a size $B = 2cR_o$. In all our experiments, this buffer size is large enough to avoid packet drops due to buffer overflow.

At the output port of link $l$ at router B, we installed a module that randomly drops packets with a probability $p$, which is a configurable parameter fixed for the duration of a simulation experiment. For each experiment, we record a trace of the instantaneous queue size (sampled at each packet arrival and departure) and compute the average queue size $q_m$ over the duration of the experiment. We ran several experiments with different values of $p$ and computed the measured average queue size $q_m(p)$. In Figures 4-5 we compare $q_m(p)$ with the predicted average queue size $q(p)$ computed using (7). In Figure 4, we plot the measured and predicted average queue size as a function of the drop probability for line speed $c$ = 1.5 Mb/s and different numbers of flows $n$. The queue size is scaled by the line speed so that links with different line speeds can be compared. We plot similar graphs in Figure 5 for a line speed $c$ = 150 Mb/s.
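As a quick check of the setup arithmetic, the sketch below computes two quantities for each simulated line speed. The first is the range of flow counts implied by $\gamma_{min} \le c/n \le \gamma_{max}$. The second is the buffer size $B = 2cR_o$, expressed in packets. It takes $R_o$ = 100 ms and 500-byte packets as read from the text above. The helper name `experiment_setup` is hypothetical.

```python
R_O = 0.1                             # seconds: RTT excluding the queue at link l
PKT_BITS = 500 * 8                    # 500-byte packets
GAMMA_MIN, GAMMA_MAX = 12.5e3, 75e3   # per-flow throughput range, bits/s

def experiment_setup(c):
    """Return (n_min, n_max, B in packets) for line speed c in bits/s."""
    n_min = round(c / GAMMA_MAX)      # fewest flows: highest per-flow throughput
    n_max = round(c / GAMMA_MIN)      # most flows: lowest per-flow throughput
    buffer_pkts = 2 * c * R_O / PKT_BITS
    return n_min, n_max, buffer_pkts

for c in (1.5e6, 45e6, 150e6):
    n_min, n_max, B = experiment_setup(c)
    print(f"c = {c / 1e6:g} Mb/s: {n_min} <= n <= {n_max}, B = {B:g} packets")
```

For the 1.5 Mb/s link this gives between 20 and 120 flows and a 75-packet buffer.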
First, we observe that the predictions are close to the measurements in all cases and for all values of $p$. A second observation is that the predictions are equally valid for all link speeds. This leads us to the conclusion that the queue dynamics are independent of the link speed. Moreover, we observe that the scaled average queue size $q(p)/c$ does not depend on the link speed but only on $\gamma = c/n$, the average throughput per flow. We conclude that $\gamma$ is a measure of the level of congestion, and henceforth we take it as our definition of the level of congestion at a given link.

[Fig. 4. Measured and predicted scaled average queue size, line speed = 1.5 Mb/s. Two panels (LineSpeed = 1.5 Mb/s, NumFlows = 20, c/n = 75.0 kb/s); y-axis: average queue size / line speed [msec]; x-axis: measured drop rate; curves: predicted q(p) and measured q(p).]

[Fig. 5. Measured and predicted scaled average queue size, line speed = 150 Mb/s. Two panels (LineSpeed = 150.0 Mb/s, NumFlows = 2000, c/n = 75.0 kb/s); same axes and curves as Fig. 4.]

In all of the above experiments we also recorded the state of the link over time, i.e., the beginning and end of the busy periods (when the link is busy sending packets). At the end of each experiment, for a given drop probability $p$, we compute the average utilization of the link, $u_m(p)$, as the time average of the link-state trace.
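The observation that $q(p)/c$ depends only on $\gamma = c/n$ can be read off (7). There, $q/c = T_R^{-1}(p,\gamma) - R_o$, and this is capped by $B/c = 2R_o$ when $B = 2cR_o$. The sketch below illustrates this numerically. For brevity it replaces (2) with the simpler square-root approximation $T(p,R) \approx (M/R)\sqrt{3/(2p)}$. This substitution is an assumption of the example only, not the model used in the paper. `scaled_queue` is a hypothetical helper.

```python
import math

def scaled_queue(p, c, n, R_o=0.1, M=4000.0):
    """q(p)/c in seconds (queueing delay) with buffer B = 2*c*R_o.

    Assumption: square-root model T(p, R) = (M / R) * sqrt(3 / (2p)),
    whose inverse in R is R = M * sqrt(1.5 / p) / (c / n).
    """
    gamma = c / n                                 # per-flow throughput, bits/s
    R_star = M * math.sqrt(1.5 / p) / gamma       # RTT at which T(p, R) = c/n
    q_bits = min(2 * c * R_o, max(0.0, c * (R_star - R_o)))
    return q_bits / c

# Same per-flow throughput (75 kb/s) on a slow and a fast link.
for p in (0.01, 0.05, 0.1, 0.2):
    slow = scaled_queue(p, 1.5e6, 20)
    fast = scaled_queue(p, 150e6, 2000)
    print(f"p={p:<5} q/c: {slow * 1e3:7.2f} ms vs {fast * 1e3:7.2f} ms")
```

Changing $n$ on the same link changes $\gamma$, and with it the scaled queue. Scaling $c$ and $n$ together leaves the scaled queue unchanged.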

[Fig. 6. Measured and predicted average link utilization, line speed = 150 Mb/s. Two panels (LineSpeed = 150.0 Mb/s, NumFlows = 2000, c/n = 75.0 kb/s); y-axis: link utilization; x-axis: measured drop rate; curves: predicted u(p) and measured u(p).]

In Figure 6, we plot the measured and predicted average link utilization as a function of the drop probability for a line speed $c$ = 150 Mb/s. We obtain similar results for line speeds of 1.5 Mb/s and 45 Mb/s. In all our experiments we observe that the predictions are close to the measurements, although not as close as in the case of the average queue size. Given all the above results, we conclude that our model of queue dynamics is well confirmed through simulation. In the following sections we use this model to analyze the dynamics of RED as a feedback control system.

C. Steady-state operation of RED congestion control

Let us return to the feedback control system in Figure 2. In Section II-A we have derived an expression for the long-term (steady-state) average queue size as a function of the packet drop probability, denoted by $q(p) = G(p)$, as in (7). Assume that the drop module has a feedback control function denoted by $p = H(q_e)$, where $q_e$ is an estimate of the long-term average of the queue size. If the system of equations

$$\begin{cases} q = G(p) \\ p = H(q) \end{cases} \tag{9}$$

has a unique solution $(q_s, p_s)$, then the feedback system in Figure 2 has an equilibrium state $(q_s, p_s)$. Moreover, the system operates on average at $(q_s, p_s)$, i.e., its long-term average packet drop probability is $p_s$ and its average queue size is $q_s$.

[Fig. 7. Equilibrium point for a feedback system: the controlled system $q = G(p)$ and the feedback control function $p = H(q)$ intersect at the equilibrium point $(q_s, p_s)$.]

Figure 7 illustrates this result: the equilibrium point $(q_s, p_s)$ is at the intersection of the curves $q = G(p)$ and $p = H(q)$.
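$G$ is non-increasing in $p$, and a typical control law $H$ is non-decreasing in $q$. Hence the function $f(p) = H(G(p)) - p$ is strictly decreasing, and the intersection in Figure 7 can be found by bisection on $p$. The sketch below assumes continuous laws. Its example queue law and clipped linear control law are hypothetical stand-ins chosen only for illustration. The queue law is a square-root TCP approximation with 10 flows, $cR_o$ = 37.5 packets, and a 75-packet buffer.

```python
import math

def equilibrium(G, H, lo=1e-6, hi=1.0, iters=200):
    """Solve q = G(p), p = H(q) (system (9)) by bisection on p.

    Assumes G is non-increasing and H non-decreasing, so that
    f(p) = H(G(p)) - p is strictly decreasing with at most one root.
    """
    f = lambda p: H(G(p)) - p
    if f(lo) < 0 or f(hi) > 0:
        raise ValueError("no equilibrium in [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    p_s = 0.5 * (lo + hi)
    return G(p_s), p_s

# Hypothetical example laws (queue sizes in packets).
def G_example(p, n=10, c_ro=37.5, B=75.0):
    return min(B, max(0.0, n * math.sqrt(1.5 / p) - c_ro))

def H_example(q, q_min=12.5, q_max=37.5, p_max=0.1):
    return p_max * min(1.0, max(0.0, (q - q_min) / (q_max - q_min)))

q_s, p_s = equilibrium(G_example, H_example)
print(f"equilibrium: q_s = {q_s:.2f} packets, p_s = {p_s:.4f}")
```

If $H$ has a jump, as RED's default control function does, the bisection still terminates. It then returns the location of the jump rather than a true solution of (9).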
A justification for the above result is that the system is constrained on one hand by the queue law $q = G(p)$ (equation (7) derived in Section II-A), and on the other hand by the control module through its control function $p = H(q)$. We also observe that $(q_s, p_s)$ is the equilibrium point of the dynamic system defined in (9). We emphasize that the system resides in state $(q_s, p_s)$ on average, and does not necessarily stay in this state at all times. Indeed, the system can converge to this state or permanently oscillate around it. We consider the transient behavior of the system in Section II-D.

We can apply the above result to determine the long-term average operation of a system where the control module follows the RED algorithm [3]. In this case, the control function is

$$p = H(q_e) = \begin{cases} 0, & 0 \le q_e < q_{min} \\ \dfrac{q_e - q_{min}}{q_{max} - q_{min}}\, p_{max}, & q_{min} \le q_e < q_{max} \\ 1, & q_{max} \le q_e \le B \end{cases} \tag{10}$$

where $q_e$ is the exponentially weighted moving average of the queue size, $q_{min}$, $q_{max}$ and $p_{max}$ are configurable RED parameters, and $B$ is the buffer size. We have run simulation experiments with the RED control module, recorded traces of queue size and packet drops, and computed the average queue size $q_m$ and packet drop probability $p_m$. We have also computed the predicted operating point $(q_s, p_s)$ as the solution of the system of equations (10) and (7). In all cases, the predicted operating points were close to the measurements. For example, in Figure 8 we plot the theoretical average queue size given by (7), and the RED control function given by (10) with parameters $p_{max}$ = 0.1, $q_{min}$ = 12.5 packets, $q_{max}$ = 37.5 packets, and buffer size $B$ =

75 packets, as recommended in [2], and packet size $M$ = 500 bytes. Their intersection point represents the predicted operating point. We also plot the point $(q_m, p_m)$ resulting from simulation.

[Fig. 8. RED average operating point: measured and predicted (LineSpeed = 1.5 Mb/s, NumFlows = 20, c/n = 75.0 kb/s); y-axis: average queue size [packets]; x-axis: measured drop rate; curves: measured $q = G(p)$, predicted $q = G(p)$, RED control $p = H(q)$, and the measured RED operating point.]

We observe that the average RED operating point is close to the intersection of the queue law and the RED control curve, which confirms our model of the steady state of the RED control system.

D. Transient operation of RED congestion control

In (9) we have defined a dynamic system having the average queue size and the average packet drop rate as state variables. This system may or may not be stable around the equilibrium point, depending on the functions $H$ and $G$. Moreover, we are interested in the evolution of the instantaneous queue size over time, not only in its average. Therefore, in the following we describe the RED dynamic system more precisely.

We start from the observation that a TCP sender adjusts its congestion window (and thus its sending rate) depending on whether or not it has sensed that packets are dropped. If a packet is dropped at link $l$, this event is typically sensed at the TCP sender (which changes its rate accordingly) approximately one RTT after the packet has been dropped. Therefore, the feedback system we are about to model has a time lag of about one RTT between the moment a signal is sent by the control module (by dropping a packet or not) and the moment the controlled system (the TCP sender) reacts to this signal. The increase or decrease in the TCP sending rate produces an increase or decrease in the instantaneous queue length at the bottleneck link $l$, which prompts the RED module to change its drop rate again, and the process repeats.

In the following we model the RED feedback system as a discrete-time dynamic system with a time step of one RTT, $R$. Assume that at time $t_k$ the drop probability is $p_k$. At time $t_{k+1} = t_k + R$, the TCP senders react to $p_k$, and on average they adjust their sending rate to $r_{k+1}$. The result is that the queue size at $t_{k+1}$ is $q_{k+1} = G(p_k)$, following the queue law (7). The RED module computes a new estimate of the queue size, $q_{e,k+1} = A(q_{e,k}, q_{k+1})$, following the exponentially weighted moving average

$$A(q_{e,k}, q_{k+1}) = (1-w)\,q_{e,k} + w\,q_{k+1}$$

RED then changes its drop rate to $p_{k+1} = H(q_{e,k+1})$, according to its control law (10). In summary, we have identified a discrete-time dynamic system defined by the following system of recurrence equations:

$$\begin{cases} q_{k+1} = G(p_k) \\ q_{e,k+1} = A(q_{e,k}, q_{k+1}) \\ p_{k+1} = H(q_{e,k+1}) \end{cases}$$

A quantitative study of the transient evolution of this dynamic system can be quite complex, and we leave it for future work. In the following we give a qualitative investigation, backed by some experimental examples.

[Fig. 9. RED operating point converges.]

In a first scenario, the queue law and control law are as in Figure 8, also sketched in Figure 9. Suppose that at time $t_0$ the state is $q = 0$, $q_e = 0$ and $p = 0$. At time $t_1$, $q_1 = B$ (since the drop rate is zero, the sending rate is high, and the buffer is full) and $q_{e,1} = wB$ ($w$ is a small value, say 0.002, as recommended in [3]). So the average queue size increases a bit, and the drop rate increases a bit too, to $p_1$ (see Figure 9). The process repeats, and the system approaches the equilibrium point $(q_s, p_s)$ (defined in Section II-C). On the other hand, if at some time $t_k$ we have $p_k > p_s$, then $p_{k+1} < p_k$, since (following Figure 9) $q_{k+1} < q_{e,k}$, and thus $q_{e,k+1} < q_{e,k}$. We conclude that the equilibrium point $(q_s, p_s)$ is an attractor for

all states around it and that, once the system reaches the equilibrium state, it will stay there with only small statistical fluctuations, provided that the number of flows $n$ does not change.

[Fig. 10. Instantaneous and average queue size in time, converging case (RED, LineSpeed = 1.5 Mb/s, NumFlows = 20, buffer size = 75.0 packets); y-axis: queue size [packets]; x-axis: time [s]; traces: instantaneous queue size, average, EWMA average, $q_{max}$ and $q_{min}$.]

In Figure 10 we show the evolution over time of the instantaneous queue size and of its exponentially weighted moving average. We observe that they indeed vary moderately around the equilibrium point. This queue trace corresponds to the system whose steady state was plotted in Figure 8.

[Fig. 11. RED average operating point situated beyond $p_{max}$ = 0.1 (LineSpeed = 1.5 Mb/s, NumFlows = 120, c/n = 12.5 kb/s); y-axis: average queue size [packets]; x-axis: measured drop rate; curves: measured $q = G(p)$, predicted $q = G(p)$, RED control $p = H(q)$, and the measured RED operating point.]

In a second scenario, the queue and control laws are as in Figure 11. In this case, the equilibrium point is situated beyond $p_{max}$ = 0.1, i.e., on the horizontal line of the control law where the drop rate jumps from 0.1 to 1, as given by (10). The system, although attracted by this point, clearly cannot stay there, since the value of $p$ is not defined there. More precisely (see Figure 12), once $q_{e,k} > q_{max}$, $p_k = 1$, i.e., all packets are dropped. Then all TCP rates drop to zero, and thus $q_{k+1} = 0$ and the average queue size begins to decrease, $q_{e,k+1} < q_{e,k}$. After the average queue size $q_e$ drops below $q_{max}$, the drop rate becomes less than 1 (in fact, less than 0.1), and the queue grows again. The effect is that the queue length oscillates widely between 0 and a full buffer.

[Fig. 12. RED operating point oscillates.]

These large oscillations can propagate through the network, making it unstable and resulting in very harmful, erratic behavior.
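The recurrence above can be iterated numerically, which reproduces the two scenarios qualitatively. The sketch below is a toy model, not the ns experiments. It uses RED's control law (10) with the Figure 8 parameters. It replaces the queue law (7) with a square-root TCP approximation on a 1.5 Mb/s link ($cR_o$ = 37.5 packets, $B$ = 75 packets). The flow counts $n$ = 10 and $n$ = 30 are hypothetical values, chosen so that the equilibrium falls below and beyond $p_{max}$, respectively. Each step is one RTT and the toy queue responds instantly, so the oscillation it produces is much faster than in the measured traces. Only the qualitative behavior carries over: convergence versus swings between an empty and a full buffer.

```python
import math

B, C_RO = 75.0, 37.5   # buffer and c*R_o, in packets (toy values)

def G(p, n):
    """Toy queue law: square-root TCP model, capped to [0, B]."""
    if p <= 0.0:
        return B
    return min(B, max(0.0, n * math.sqrt(1.5 / p) - C_RO))

def H(q_e, q_min=12.5, q_max=37.5, p_max=0.1):
    """RED control law, eq. (10): ramp up to p_max, then a jump to 1."""
    if q_e < q_min:
        return 0.0
    if q_e < q_max:
        return p_max * (q_e - q_min) / (q_max - q_min)
    return 1.0

def simulate(n, w=0.002, steps=20000):
    """Iterate q_{k+1} = G(p_k), q_{e,k+1} = (1-w) q_{e,k} + w q_{k+1},
    p_{k+1} = H(q_{e,k+1}), one step per RTT; returns the queue trace."""
    p, q_e, trace = 0.0, 0.0, []
    for _ in range(steps):
        q = G(p, n)
        q_e = (1 - w) * q_e + w * q
        p = H(q_e)
        trace.append(q)
    return trace

for n in (10, 30):
    tail = simulate(n)[-2000:]
    print(f"n={n}: queue over the last 2000 RTTs in [{min(tail):.1f}, {max(tail):.1f}] packets")
```

With $n$ = 10, the queue settles at a single value near the intersection of the two laws. With $n$ = 30, the queue alternates between 0 and the full buffer indefinitely.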
[Fig. 13. Instantaneous and average queue size in time, oscillating case (RED, LineSpeed = 1.5 Mb/s, NumFlows = 120, buffer size = 75.0 packets); y-axis: queue size [packets]; x-axis: time [s]; traces as in Fig. 10.]

In Figure 13 we plot the trace of the queue size for the RED system whose steady state is shown in Figure 11. We observe that the queue size indeed oscillates between 0 and the full buffer size, and that the oscillations do not decrease over time. This kind of situation is clearly harmful and should be avoided by a proper configuration of the RED control law. Such a configuration would either avoid operation close to the discontinuity of the RED control law or eliminate the discontinuity altogether. We specify such configuration details in Section III.

Other characteristics of the RED control function, such as the slope of the segment between $q_{min}$ and $q_{max}$, may influence the stability of the feedback system and/or its rate of convergence to the equilibrium point. For example, it is likely that a small slope of the RED control function, $\alpha = \partial q / \partial p$, would result in a faster converging system than a control function with a large slope, but the system would be less stable. We leave this study for future work. In the next section, we give a set of recommendations for configuring RED that take into consideration the above

constraints.

III. RECOMMENDATIONS FOR CONFIGURING RED CONTROL FUNCTION

In this section we derive a set of recommendations for configuring the RED control function. In general, many different functions can be good candidates for RED control. Even if the space of possible functions is limited by the constraints from Section II-D, the number of functions remains infinite. To reduce the complexity of choosing such a function, we group them into a few categories and restrict their shape to be piecewise linear.

In Section II-A we have derived the queue law $q = G(p)$, as shown in Figure 14. The steady state of the system is determined by the intersection of the control law with the queue law. For example, a control law $H_1$ with a high slope results in a state with a low drop rate but a large average queue size. Conversely, $H_2$ results in a lower average queue size but a larger drop rate.

[Fig. 14. Two control laws implementing different policies: the queue law $G(p)$ and two control laws $H_1$ and $H_2$.]

In the following, we define two policies for RED control and give recommendations for configuring each of them:
- drop-conservative policy: low $p$, high $q$;
- delay-conservative policy: low $q$, high $p$.

When configuring the RED control at a specific line (output interface) of a router or switch, we start from several estimates of averages or bounds for traffic conditions:
- The line speed $c$.
- The minimum and maximum throughput per flow, $\gamma_{min}$ and $\gamma_{max}$, or alternatively, the minimum and maximum number of simultaneous flows, $n_{min}$ and $n_{max}$. For example, $\gamma_{min}$ = 28.8 Kb/s and $\gamma_{max}$ = 56 Kb/s for a WAN with predominantly dial-up traffic, or $\gamma_{min}$ = 1 Kb/s and $\gamma_{max}$ = 1 Mb/s for a LAN or enterprise environment.
- The minimum and maximum data packet size (non-ACK), $M_{min}$ and $M_{max}$, or the average, $M_{ave}$. For example, $M_{min}$ = 256 bytes, $M_{max}$ = 1500 bytes, or $M_{ave}$ = 512 bytes.
- The minimum and maximum round trip time (excluding time in the queue of the line under configuration), $R_{o,min}$ and $R_{o,max}$, or the average, $R_{o,ave}$.
For example, in a LAN, $R_{o,min}$ = 2 ms, $R_{o,max}$ = 15 ms and $R_{o,ave}$ = 1 ms, and in a WAN, $R_{o,min}$ = 5 ms, $R_{o,max}$ = 2?5 ms and $R_{o,ave}$ = 1 ms.

Each combination of $n$, $M$ and $R$ produces a different queue law $G(p)$. We are interested in applying the conditions from Section II-D. Since the conditions should hold for any queue law corresponding to any combination of $n$, $M$ and $R$ within a specified range, it is sufficient to verify them for the functions $G(p)$ corresponding to extreme values of $n$, $M$ and $R$.

We define the maximum queue law $G_{max}(p)$ to be the function $G(p)$ defined in (7) with parameters $n_{max}$, $M_{max}$ (or $M_{ave}$ if $M_{max}$ is not defined) and $R_{o,min}$ (or $R_{o,ave}$ if $R_{o,min}$ is not defined). It is easy to show that $G_{max}(p) \ge G(p)$ for all $p$, where $G(p)$ has parameter values $n$, $M$ and $R$ within their specified ranges. Similarly, we define the minimum queue law $G_{min}(p)$ to be the function $G(p)$ defined in (7) with parameters $n_{min}$, $M_{min}$ (or $M_{ave}$ if $M_{min}$ is not defined) and $R_{o,max}$ (or $R_{o,ave}$ if $R_{o,max}$ is not defined). It is easy to show that $G_{min}(p) \le G(p)$ for all $p$, where $G(p)$ has parameter values $n$, $M$ and $R$ within their specified ranges. Thus, the queue law can be anywhere between $G_{max}$ and $G_{min}$, as in Figure 15.

[Fig. 15. Range of queue laws for a given queue system: the region between $G_{min}$ and $G_{max}$, with the point $(q_{max}, p_{max})$.]

In Section II-D we have seen that a condition for stability is that the point $(q_{max}, p_{max})$ of the RED control function should be above the queue law $G$. In the context defined above, this condition becomes: $(q_{max}, p_{max})$ above $G_{max}$. For the drop-conservative policy, we are given $p_{max}$.

Then, the condition becomes $q_{max} > G_{max}(p_{max})$. We can choose, for example, $q_{max} = 1.2\,G_{max}(p_{max})$ to provide room for statistical fluctuations. For the delay-conservative policy, we are given $d_{max}$, the maximum queueing delay at the link, for example $d_{max} = 0.2\,R_{o,ave}$ in order not to add much to the average round trip time. Then $q_{max} = d_{max}\,c$, and the condition becomes $p_{max} > G_{max}^{-1}(q_{max})$. We can choose, for example, $p_{max} = 1.2\,G_{max}^{-1}(q_{max})$ to provide room for statistical fluctuations.

Our traffic estimates (reflected in $\gamma_{min}$ (or $n_{max}$), $M_{max}$ and $R_{o,min}$) imply that the system will not operate beyond the $G_{max}$ curve. Nevertheless, we need to define the control function for that region as well, in order to cover any exceptional operation. Given the instability issues uncovered in Section II-D, we strongly recommend against any discontinuity at $(q_{max}, p_{max})$, such as a jump of $p$ from $p_{max}$ to 1 as suggested in [3]. We recommend a linear segment from $(q_{max}, p_{max})$ to $(B, 1)$, where $B$ is the buffer size. We recommend dimensioning the buffer size as $B = 2q_{max}$, to allow for queue fluctuations on small time scales.

The recommendations put forth in this section involve some computations that may be complicated analytically, but have simple numerical solutions. We have incorporated all these computations into a configuration program. It takes as input the estimated parameters, such as the ranges of RTT and number of flows, along with the desired policy, and outputs the recommended parameters for the RED control function.

IV. CONFIGURING THE ESTIMATOR OF AVERAGE QUEUE SIZE

A. Queue averaging algorithms

A queue-averaging algorithm is essentially a low-pass filter on the instantaneous queue size. It is intended to filter out brief queue changes or bursts, and to estimate the longer-term queue average. The RED drop module uses this estimate to adjust its drop rate according to a control function, as studied in the previous sections.
The estimate is a moving average: at any given time, the average is computed over the set of samples taken in the previous $I$ seconds; we call $I$ the averaging interval. The exponentially weighted moving average is computed recursively from the previous average $\hat{q}_k$ and a new sample $q_k$:

$$\hat{q}_{k+1} = w\,q_k + (1-w)\,\hat{q}_k$$

where $0 < w < 1$ is a weight. It follows that the average, expressed only in terms of samples, is

$$\hat{q}_{k+1} = w \sum_{i=0}^{k} (1-w)^i\, q_{k-i}$$

If we assume that samples are taken at fixed time intervals $\delta$, then a sample taken at time $t - m\delta$ contributes to the average computed at time $t$ with a weight proportional to $(1-w)^m$. In other words, the contribution of a sample decreases exponentially with its age. Suppose we consider a sample's contribution negligible once its relative weight falls below a threshold $a$, $0 < a < 1$ (for example $a = 0.1$). Then we can determine the number $m$ of samples that are significant:

$$m = \frac{\ln a}{\ln(1-w)}$$

It follows that the time interval $I$ over which sample contributions are significant is $I = m\delta$, or:

$$I = \delta\,\frac{\ln a}{\ln(1-w)} \tag{11}$$

In the next subsection, based on traffic considerations, we develop recommendations for the values of the averaging interval $I$ and the sampling interval $\delta$. Then, the averaging weight follows from (11):

$$w = 1 - a^{\delta/I} \tag{12}$$

B. Averaging interval

A first condition on the queue averaging algorithm (exponentially weighted moving average) is to provide a good approximation of the long-term average in a system with a constant number of flows. In other words, if traffic conditions (number of flows, round trip times) do not change, the queue length estimate should be quasi-constant over time and close to the average queue length computed over a very long time interval. For example, the moving average should not be influenced by the linear increase and multiplicative decrease of flow rates produced by the TCP congestion control algorithm. We observe that the longer the averaging interval, the closer the moving average is to the long-term average.
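Equations (11) and (12) are easy to check numerically. Filtering a periodic queue signal also makes the first condition visible. The sketch below uses a hypothetical sawtooth queue that rises linearly from 20 to 40 packets, drops suddenly, and repeats every $P$ = 2 s. It samples this signal every $\delta$ = 10 ms, takes $a = 0.1$, and compares the ripple of the EWMA for $I = P/4$, $P$ and $4P$. All of these values, and the helper names, are illustrative assumptions.

```python
import math

def averaging_interval(w, delta, a=0.1):
    """Eq. (11): I = delta * ln(a) / ln(1 - w)."""
    return delta * math.log(a) / math.log(1 - w)

def weight_for_interval(I, delta, a=0.1):
    """Eq. (12): w = 1 - a**(delta / I)."""
    return 1 - a ** (delta / I)

def ewma(samples, w):
    """q_hat_{k+1} = w * q_k + (1 - w) * q_hat_k, started at the first sample."""
    q_hat, out = samples[0], []
    for q in samples:
        q_hat = w * q + (1 - w) * q_hat
        out.append(q_hat)
    return out

P, delta, periods = 2.0, 0.01, 40
per_period = round(P / delta)                       # samples per period (200)
saw = [20 + 20 * (k % per_period) / per_period for k in range(per_period * periods)]

for I in (P / 4, P, 4 * P):
    w = weight_for_interval(I, delta)
    tail = ewma(saw, w)[-5 * per_period:]           # last 5 periods, after the transient
    print(f"I={I:4.1f}s  w={w:.5f}  ripple={max(tail) - min(tail):6.3f}  mean={sum(tail) / len(tail):.3f}")
```

As $I$ grows, the ripple of the estimate shrinks while its mean over whole periods stays at the long-term average. The price is a slower response to changes in traffic, which is the trade-off addressed next.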
A second, opposing condition on the queue-averaging algorithm is to converge as fast as possible to the new long-term average after a change in traffic conditions (such as the number of flows or round trip times). We observe that the shorter the averaging interval, the faster the moving average reflects the new conditions. In the following we derive an averaging interval that provides a good compromise between these two conditions.
Let us consider a system with $n$ TCP flows having the same average round trip time $R$. Following the arguments in Section II, each TCP flow has on average a throughput of $c/n$ and a drop rate $p$. According to the TCP Reno congestion control algorithm, the sending rate increases until a loss indication occurs. If the indication is a triple-duplicate ACK (TD), the window is halved and the window increase resumes (see Figure 16). If the indication is a time-out (TO), there is a period of potentially multiple time-out intervals (when no packets are sent) alternating with small transmissions, after which the TCP window resumes its increase from one (see Figure 17). For the detailed description and analysis, please refer to [8] and [9]. In both cases, denote the period of this function by $P$. The variation in sending rate is reflected in a similar variation of the queue size, and thus the queue size has the same periodicity $P$.
[Fig. 16. Period and averaging interval of TCP sending rate, triple-duplicate (TD) case (rate vs. time, marking the average, P, and I).]
[Fig. 17. Period and averaging interval of TCP sending rate, timeout (TO) case (rate vs. time, marking the average, P, and I).]
We observe that if we take the averaging interval of our moving average to be equal to the period, $I = P$, then the moving average with interval $I$ is close to the long-term average. For an interval smaller than the average TCP period, $I < P$, the average follows the instantaneous queue size more closely. For $I > P$, the moving average has small variations, but converges rapidly to the long-term average as $I$ increases. We also observe that $I = P$ is a good averaging interval for a superposition of any number of functions having period $P$, as Figure 18 shows.
[Fig. 18. Averaging two TCP flows (flow 1, flow 2; averaging interval I; time).]
We conclude that $I = P$ is the value of choice for the averaging interval. From the models in [8] and [9], given the average RTT $R$ and the drop probability $p$, we can compute $P$ as the average TCP period $E[S]$:
$$E[S] = \begin{cases} R\left(\frac{b}{2}W(p) + 1\right) + \frac{Q(p, W(p))\,F(p)\,T_0}{1-p} & \text{if } W(p) < W_{\max} \\ R\left(\frac{b}{8}W_{\max} + \frac{1-p}{p\,W_{\max}} + 2\right) + \frac{Q(p, W_{\max})\,F(p)\,T_0}{1-p} & \text{otherwise} \end{cases} \qquad (13)$$
where all the notation was introduced in connection with equation (2).
C. Sampling the queue size
To determine the frequency at which the queue size should be sampled, let us consider again the process to be sampled, namely the instantaneous queue size. We have seen in Section IV-B that the throughput of each TCP flow has a quasi-periodic variation, where the duration between loss indications, although random, has a well-defined mean. In steady state (i.e., when the drop rate exhibits only small variations), TCP throughput has a periodic increase and decrease, as in Figure 16. More precisely, the packets in a TCP round are sent in a burst, and thus the TCP throughput varies approximately as a step function that increases every $b$ RTTs, where $b$ is typically 2, as in Figure 19.
[Fig. 19. TCP throughput is a step function (rate vs. time, steps of one RTT).]
The queue size then also behaves as a step function, changing at every RTT, since it is synchronized with the TCP rate variation. It follows that the ideal sampling rate is once every RTT, since this captures each change of value. More frequent sampling (at the same averaging interval) would not bring significantly more accuracy to the average, but would not decrease it either. Less frequent sampling, on the other hand, would miss significant changes and diminish the accuracy of the average, and thus is not recommended.
[Fig. 20. Sampling two TCP flows (flow 1, flow 2; RTT; time).]
Observe also that the same sampling interval is good for a superposition of TCP flows having the same RTT, as Figure 20 shows. If the flows have different RTTs, then, following the above logic that a longer sampling interval is worse while a shorter one is better or equal, we recommend a sampling interval equal to the minimum RTT, $\delta = R_{o,\min}$.
V. CONCLUSION
In this study, we have investigated the Random Early Detection (RED) mechanism for congestion control. First, we surveyed the existing work on RED configuration and implementation and identified potential problems. Then, we modeled queue-based congestion control (including RED) as a feedback control system. We identified the queue law governing the queue size of a link traversed by a number of TCP flows, which, in conjunction with a given control law, determines the equilibrium state of the feedback system. We validated this model through simulation experiments. Using this model, we identified potential instability problems in the feedback system; to avoid them, we recommend a configuration of the RED control law. Also, based on our understanding of TCP traffic dynamics, we derived a set of recommendations for configuring the RED queue-size estimator: the frequency of queue sampling and the averaging weight.
There are several other aspects of RED that we intend to study in the future, such as the process of queue sampling (deterministic or stochastic), the process of randomly choosing packets to drop, and the validation of our results with heterogeneous TCP traffic, such as flows with different RTTs and a mix of short- and long-lived flows. Also, our study has focused only on TCP traffic in general and its Reno implementation in particular. We plan to consider other TCP implementations such as TCP SACK, other traffic types such as open-loop (uncontrolled) UDP traffic (such as voice), and mixes of TCP and UDP traffic.
VI. ACKNOWLEDGEMENTS
The authors would like to thank Don Towsley for many useful discussions.
REFERENCES
[1] W. C. Feng, D. Kandlur, D. Saha, and K. Shin. A Self-configuring RED Gateway. In Infocom 99, 1999.
[2] S. Floyd. Notes on RED in the end-to-end-interest mailing list. 1998.
[3] S. Floyd and V. Jacobson. Random Early Detection Gateways for Congestion Avoidance. IEEE/ACM Transactions on Networking, 1(4), August 1993.
[4] V. Jacobson and M. J. Karels. Congestion Avoidance and Control. In SIGCOMM 88, 1988.
[5] D. Lin and R. Morris. Dynamics of Random Early Detection. In SIGCOMM 97, 1997.
[6] M. Mathis, J. Semke, J. Mahdavi, and T. Ott. The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm. Computer Communication Review, 27(3), July 1997.
[7] S. McCanne and S. Floyd. ns-LBL Network Simulator, 1997. Obtain via http://www-nrg.ee.lbnl.gov/ns/.
[8] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose. Modeling TCP Throughput: A Simple Model and its Empirical Validation. In ACM SIGCOMM, 1998.
[9] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose. A Stochastic Model of TCP Reno Congestion Avoidance and Control. Technical Report CMPSCI TR 99-02, Univ. of Massachusetts, Amherst, 1999.
[10] W. Stevens. TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms. RFC 2001, January 1997.