On the Variability of Internet Traffic Georgios Y Lazarou Information and Telecommunication Technology Center Department of Electrical Engineering and Computer Science The University of Kansas, Lawrence (KS) August, 2000 0-0
Some Background Information on Long-Range Dependence and Self-Similarity On the Variability of Internet Traffic Outline Introduction and Motivation Characterizing the Variability of Traffic examples Simulation Study on the Variability of Internet Traffic Conclusions and Future Work
Many empirical studies on a variety of networks have shown traffic exhibits high variability that High variability was shown to have a significant impact on performance network Several studies claim that the high variability in traffic is due long-range dependence (LRD) property of traffic process to On the Variability of Internet Traffic 2 What Is The Problem? traffic is bursty (variable) over a wide range of time scales The assertion that traffic has LRD triggered a large effort in explaining the cause of LRD in traffic studying the impact of LRD on network performance, and creating new traffic models that have LRD
Develop a new theoretical and practical framework to characterize the variability and correlation structure accurately Determine that conventional traffic models can capture the variability of traffic empirically observed over a wide range high Investigate the contribution of TCP's dynamics to LRD or high of traffic variability On the Variability of Internet Traffic 3 What Is This Research All About? of a typical network traffic process at each time scale of time scales
k = Var[X + +X k ] k(e[x]) 2 J Var[N(t) E[N(t)] IDC(t)= do not capture the fluctuation of the degree of traffic ) across time scales burstiness Var[X] 2 (E[X]) our novel measure of variability based on the slope of the ) curve at each time scale and directly related to H IDC On the Variability of Internet Traffic 4 Why New Measure of Variability? Most commonly used measures of traffic burstiness are peak-to-mean ratio (= for CBR Flows) squared coefficient of variation of interarrival times: indices of dispersion for intervals and counts: ) any value other than one! bursty traffic Hurst parameter (H)
The exact degree of traffic variability over all time scales can be obtained analytically Considerable amount of work has already been done on network performance (i.e., queueing behavior) analyzing Performance evaluation depends on traffic characteristics over a range of time scales specific to system under study (i.e., finite any model can be used as long as it captures traffic behavior this range of time scales over On the Variability of Internet Traffic 5 Why Care About Conventional Traffic Models? Analytically simpler and tractable than models with LRD associated with these models maximum buffer size)
There are not strong evidence that real network traffic exhibits dependence long-range all empirically collected and analyzed traffic traces were one about three hours long to 360000 to 000000 samples for sampling period of 0 ms ) long enough for capturing the variability and correlation ) over the time scales associated with network structure performance but not long enough to assert that traffic processes have ) LRD On the Variability of Internet Traffic 6 Does Network Traffic Have LRD? definition of LRD applies only to infinite time sequences need to check the tail of correlation structure for LRD
aggregate TCP traffic exhibits LRD behavior over a wide of time scales range transmits packets as fast as it cans and then becomes idle ) for acknowledgments waiting presence of LRD depends on whether a reliable, flow- and protocol is employed at transport layer congestion-controlled natural to assume that the dynamics of TCP have a great on its traffic variability impact On the Variability of Internet Traffic 7 What TCP Has to Do With the Variability of Traffic? TCP traffic was used in most studies to detect the presence of LRD in network traffic, or give a possible explanation of what causes the asserted LRD The results from these studies claim that not surprising observation since TCP is a bursty protocol
a plot of H v (fi) describes the behavior of a traffic process in of its variability over a range of time scales terms traditional models can capture the high variability observed network traffic over a wide range of time scales in the amount of correlation that a traffic process has at a time scale does not alone determine the degree of particular the dynamics of TCP alone can not cause considerable over a substantial range of time scales variability On the Variability of Internet Traffic 8 Major Contributions A new measured of variability: index of variability H v (fi) better measure for capturing the burstiness of traffic than H The results show that variability at that time scale
A weakly stationary discrete-time real-valued Definition: process Y = fy t ;t= 0; ; 2;:::g (μ = E[Y t ] = constant, stochastic 2 = E[(Y t μ) 2 ] < ) with an autocorrelation function r(k) is called ff dependent if long-range t μ)(y t+k μ)] E[(Y 2 = ff r(k) measures the correlation between elements of Y separated by k On the Variability of Internet Traffic 9 Long-Range Dependence X X r(k) = k= k= units of time correlations between observations that are separated in time decay to zero at a slower rate that one would expect from data following Markov-type (i.e., SRD) models Self-Similar Processes: most popular models with LRD statistical properties remain same over all time scales several definitions, asymptotically second-order
ο k fi L(k) as k! r(k) 0 < fi < and L is slowly varying at infinity, that is, where m H On the Variability of Internet Traffic 0 Asymptotically Second-Order Self-Similar Processes Assume that L(kx) lim = 8x > 0 k! L(k) L(t) = const, L(t) = log(t). i.e., each m = ; 2; 3;:::, let Y (m) = fy (m) For k ;k = ; 2; 3;:::g, where k (m) k Y Y kmm+ + + Y km = m Definition: Y is called asymptotically second-order self-similar with self-similarity parameter H if Y (m) has the same variance and autocorrelation as Y as m!. That is, 8k large enough, r (m) (k)! r(k) as m!
different processes with same H can generate vastly queueing behavior different Conclusion: the single value Hurst parameter does not capture fluctuation of traffic burstiness across time scales the On the Variability of Internet Traffic Hurst Parameter Most important parameter of self-similar processes measures the degree of self-similarity expresses the speed of decay of autocorrelation function 0:5 < H < : ) LRD 0 < H» 0:5 : ) SRD Claimed to be a good measure of variability the higher the value of H, the burstier the traffic Popular belief: higher the H, poorer the queueing performance but, there are examples showing otherwise
a traffic sequence ^Y of length N. Construct ^Y (m) by Assume ^Y into blocks of length m, and averaging the sequence dividing P N mk= (Y (m) (k) μ Y ) 2 N m successive values of m that are equidistant on a log scale, the For variance of the aggregated series is plotted versus m on a sample = slope ^H 2 On the Variability of Internet Traffic 2 Estimation of Hurst Parameter: Aggregated Variance Method over each block. Its sample variance is then given by: V ar[y (m) ] = ^ where P N t= Y t μ = Y N log-log plot. By fitting a least-squares line
On the Variability of Internet Traffic 3 Packet Traffic With Point Processes (I) Relating X X 2 X 3 X 4 X 5 X 6 a) N(t) s s s s s s s 0 2 3 4 5 6... t... b) t c) Y τ = 2 Y = Y = 2 3 τ τ τ τ 0 2 3 4 5 Y 4 = 0 Y 5 = 2... t
V ar[n(t)] IDC(t) E[N(t)] V ar[n(t)] = t On the Variability of Internet Traffic 4 Relating Packet Traffic With Point Processes (II) Counting process: fn(t);t 0g, (weakly) stationary where N(t) = supfn : n = 0; ; 2;:::; S n» tg Traffic process: Y = fy n (fi);fi > 0;n= ; 2;:::g where Y n (fi) = N[nfi] N[(n )fi] Index of dispersion for counts: : mean event (packet) arrival rate IDC(t = mfi) = m fi V ar[y (m) ] m = ; 2; 3;::: Y (m) : aggregated packet (byte) count process
a self-similar process, plotting log(idc(mfi)) versus log(m) For in an asymptotic straight line with slope 2H results d(log(idc(fi))) + d(log(fi)) index of variability of Y for the time scale fi the Y results from the superposition of M independent traffic Suppose ar[n i (fi)] dv dfi P M i= V ar[n i(fi )] M P j= j Poisson: i d(idc i (fi)) = 2 i (fi)) d(idc dfi M IDCi (fi) P Λi i= lim M P (fi) IDCi c = < ; then limfi! H v (fi) = 0:5 Λi fi! i= If On the Variability of Internet Traffic 5 Index of Variability for Traffic Processes Definition: For a general stationary traffic process Y, we call H v (fi) 2 streams: Λi!) Hv(fi ) = 0:5fi ψp M i= + fi ψp M i=! ( = 0 8fi; i ) Hv(fi ) = 0:5 8fi dfi Λi=
If C 2 i V ar[n(2fi )] V ar[n(fi )] k = : 2 MX 2(X) Ci i Λ k(fi) C k = 0; ; 2;::: Var[N(fi)] C 2 (X) = Var[X] where 2 (E[X]) On the Variability of Internet Traffic 6 Correlation Structure of Traffic Process Y Autocovariance function: ( V ar[n((k + )fi )] + 2 V ar[n((k )fi )] V ar[n(kfi)] k > ; 2 Ck(fi ) = Autocorrelation function: r k (fi) = Correlation intensity: X R(fi ) ) lim k! IDC(kfi) = rk(fi ) IDC(fi 2 k= Suppose Y is a superposition of M independent renewal processes lim IDC(kfi) = k! i= (X) = for at least one i ) R(fi ) = ) Y is LRD process
the underlying point processes of Y is a stationary renewal Suppose with interarrival times hyperexponentially distributed of process two order f 2 (x) = w ae ax + w 2 be bx where w + w 2 = Pdf: V )]= 2[(aw +bw 2 ) 2 (a 2 w +b 2 w 2 )] ar[n(fi +bw ) 3 e[aw 2 +bw ]fi + C 2 (X)fi (aw 2 IDC(fi )= 2[(aw +bw 2 ) 2 (a 2 w +b 2 w 2 )] (aw +bw ) 3 e[aw 2 +bw ]fi fi 2 limfi! IDC(fi ) = C 2 (X) ) R(fi ) < ) Y is SRD process If a = b then [(aw + bw 2 ) 2 (a 2 w + b 2 w 2 )] = 0 and C 2 (X) = ) Var[N(t)] = t and IDC(t) =, i.e., Poisson process On the Variability of Internet Traffic 7 Example: Hyperexponential Distribution of Order Two (I) Then: = E[X] = ab +bw 2 C (X)= 2h 2 a w 2 +b 2 w (aw 2 +bw ) 2 i 2 aw + C 2 (X)
On the Variability of Internet Traffic 8 Example: Hyperexponential Distribution of Order Two (II) 00 0 6 90 80 70 0 5 λ (packets/sec) 60 50 40 C 2 (X) 0 4 30 20 0 3 0 0 0 0 0 9 0 8 0 7 0 6 0 5 0 4 0 3 0 2 w 2 0 2 0 0 0 0 9 0 8 0 7 0 6 0 5 0 4 0 3 0 2 w 2 a= 00 b= 0:000
On the Variability of Internet Traffic 9 Example: Hyperexponential Distribution of Order Two (III) 0.95 (b) 0 0 0 b = 0.000 w 2 = 0.000000 0.9 (i) 0 2 H v (τ) 0.85 0.8 0.75 0.7 0.65 (ii) (iii) (iv) (v) Autocorrelation, r k (τ = 0ms) 0 3 0 4 0 5 0 6 0 7 b = 0.0 w 2 = 0.0000 0.6 (vi) 0 8 0.55 (vii) 0 9 0.5 0 2 0 0 0 0 0 2 0 3 0 4 0 5 τ (seconds) 0 0 0 0 0 0 2 0 3 0 4 0 5 0 6 0 7 Lag (k) 00 b= 0:000 w 2 = (i) 0 6 (ii) 0 7 (iii) 0 8 (iv) 0 9 a= 0 0 (vi) 0 (vii) 0 2 (v)
the underlying point-process of the packet (byte) count Suppose Y is the superposition of: sequence 0 renewal processes with interarrival times hyperexponentially of order two (RPH2) distributed 40 packet streams generated by ON/OFF traffic sources whose and OFF periods are both exponentially distributed ON On the Variability of Internet Traffic 20 Example: Superposition of Heterogeneous Traffic Processes (I) 20 two-state Markov Modulated Poisson processes (MMPP) 6 packetized voice streams C 2 (X) = 5:6323x0 5 ) Y is not Poisson Y is SRD
On the Variability of Internet Traffic 2 Example: Superposition of Heterogeneous Traffic Processes (II) 0 6 (b) 0.95 0 5 0.9 0.85 0 4 H v (τ) 0.8 0.75 0.7 R(τ) 0 3 0 2 0.65 0 0.6 0.55 Superposition RPH2 Only 0 0 0.5 0 2 0 0 0 0 0 2 0 3 0 4 0 5 τ (seconds) 0 0 2 0 0 0 0 0 2 0 3 0 4 0 5 τ Y has high variability over a range of time scales that spans 8 order of magnitude the amount of correlation that the process Y has at a particular time scale not alone determine the degree of its variability at that time scale does
W : number of packets during ON period: geometrically distributed ] E[W ] E[W fi (p)+fit On the Variability of Internet Traffic 22 Message Length OFF Time ON/OFF/Exponential Model ) Renewal Process T I +T packet stream: renewal process Pdf: f(x) = pffi(x T ) + ( p)fie fi(xt ) u(x T ) fi : mean OFF period T : packet transmission time p = : probability that the next interarrival time is T p : probability that the next interarrival time is I + T = : mean packet arrival rate
d (Var[N(fi)]) = 2 X dfi x R ty e t dt y > 0; x > 0 : incomplete Gamma function 0 2 (p 2 ) 2 IDC(fi) fi fi 2 On the Variability of Internet Traffic 23 ON/OFF/Exponential Model : Exact Analysis X X nx n pnz ( p) z fi z Var[N(fi)] = 2 p n (fi nt )u(fi nt )+2 n=0 n= z= nt )G [fi(fi nt );z] zg [fi(fi nt );z+]gu(fi nt ) ffi(fi (fi) 2 fi p n u(fi nt )+ 2 nx X n z pnz ( p) z n=0 n= z= G [fi(fi nt );z] u(fi nt ) 2 2 fi G(x; y) = (y) lim k! IDC(kfi) = C 2 (X) = 2 p 2 R(fi) = 2 : Y is SRD
~V ar[n(fi)] = d dfi 2( p)3 ~V ar[n(fi)] = ~ Var[N(fi)] Var[N(fi)] = 2 +p fi! lim lim ~ IDC(fi) = 2 +p C2 (X) fi! e ρfi On the Variability of Internet Traffic 24 ON/OFF/Exponential Model : Fluid Analysis ff = E[W ]T : mean ON period ρ = ff + fi fi 2»fi ρ e ρfi Λ p)3 2( 2 fi Enormous gain in computational speed
On the Variability of Internet Traffic 25 0.95 0.9 β = 0.6 s Link Speed = 5Mbps MML = 0000 Packets 0.95 0.9 β =00s β =000s ON/OFF/Exponential Model : Index of Variability H v (τ) 0.85 0.8 0.75 0.7 MML = 4 Packets H v (τ) 0.85 0.8 0.75 0.7 MML = 0000 Packets Link Speed = Mbps β =s β =0s 0.65 0.65 0.6 0.6 0.55 0.55 0.5 0 3 0 2 0 0 0 0 τ (s) 0.5 0 2 0 0 0 0 0 2 0 3 τ (s) 0.95 () τ = s β = 00s () 0.95 τ = s β = 000s H v 0.9 0.85 0.8 0.75 (2) τ = 0s β = 00s (3) τ = 50s β = 00s (4) τ = 00s β = 00s (2) (3) H v 0.9 0.85 0.8 0.75 τ = s β = s τ = s β = 0.5s 0.7 (4) 0.7 0.65 0.65 0.6 0.55 0.6 0.55 τ = s β = 0.s 0.5 0 3 0 2 0 0 0 0 0 2 α (seconds) 0.5 0 3 0 2 0 0 0 0 0 2 α (seconds)
validate or invalidate our assumption that the primary contributing to high variability empirically observed factor On the Variability of Internet Traffic 26 Simulation Study Main Goals: in TCP traffic is the dynamics of TCP validate the theory
On the Variability of Internet Traffic 27 Simulation Study: Network Model TCP Source TCP Dest TCP Source 2... WAN Bottleneck Node TCP Dest 2... TCP Source N TCP Dest N
On the Variability of Internet Traffic 28 Simulation Study: ON/OFF/Exponential Model (I) 9 8.5 Estimated H = 0.89 9 8.5 "logvar" "VFitLine" "VSlope-.0" 8 8 7.5 7.5 Log0(Var[Y (m) ]) 7 6.5 6 5.5 Log0(Var) 7 6.5 6 5.5 5 5 4.5 4.5 4 4 3.5 0 0.5.5 2 2.5 3 3.5 4 4.5 5 Log0(m) 3.5 0 0.5.5 2 2.5 3 3.5 4 4.5 5 Log0(m) 0.95 0.9 0.85 0.8 H v (mτ) 0.75 0.7 0.65 0.6 0.55 0.5 0 0.5.5 2 2.5 3 3.5 4 4.5 5 Log0(m) E[W ]= 0000 pkts fi = 240s Link Speed=5Mbps Flows=64 TCP RTT=200-250ms
On the Variability of Internet Traffic 29 9 Estimated H = 0.85 8 b) File Size & OFF Time: Exponentially Distributed "logvar" "VFitLine" "VSlope-.0" 8 7 Simulation Study: ON/OFF/Exponential Model (II) Log0(Var[Y (m) ]) 7 6 5 Log0(Variance) 6 5 4 4 3 3 0 0.5.5 2 2.5 3 3.5 4 4.5 5 Log0(m) 2 0 2 3 4 5 6 Log0(m) (d) 0.95 0.9 0.85 H v (mτ) 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0 0.5.5 2 2.5 3 3.5 4 4.5 5 Log(m) Flows=64 : E[W ]= 4 pkts fi = 0:6s Link Speed=0Mbps TCP RTT=200-250ms Flows=64 : E[W ]= 0000 pkts fi = 900s
On the Variability of Internet Traffic 30 35 c) Pareto/Exponential -- Aggregation Time Interval: 0 ms "Rate" 25 d) Exponential/Exponential -- Aggregation Time Interval: 0 ms "Rate" ON/OFF/Heavy-Tailed Model vs. ON/OFF/Exponential Model 30 20 25 Traffic Rate (Mbps) 20 5 Traffic Rate (Mbps) 5 0 0 5 5 0 6020 6030 6040 6050 6060 6070 6080 Time (sec) 0 8000 800 8020 8030 8040 8050 8060 Time (sec) 8 a) File Size: Pareto Distr. -- OFF Time: Exponentially Distr. "Rate" 6 b) File Size: Exponentially Distr. -- OFF Time: Exponentially Distr. "Rate" 6 4 4 2 2 Traffic Rate (Mbps) 0 8 Traffic Rate (Mbps) 0 8 6 6 4 2 4 0 0 000 2000 3000 4000 5000 6000 7000 8000 9000 0000 Time (sec) -- (Aggregation Time Interval: 0 sec) 2 0 000 2000 3000 4000 5000 6000 7000 8000 9000 0000 Time (sec) -- (Aggregation Time Interval: 0 sec) Top: Time Scale 0ms Bottom: Time Scale 0s UDP case Flows=64 Left: ON/OFF/Heavy-Tailed Right: ON/OFF/Exponential
^H = slope Heavy-Tailed: ^H= 0:87 Exponential: ^H= 0:84 Hv (0ms)=H v (0s)= 0:86 On the Variability of Internet Traffic 3 6 5 Estimated H = 0.87 -- R/S Method -- Aggregation Time Interval: 0 ms "RSstat" "RSFitLine" "RSslope0.5" 5 4.5 Estimated H = 0.84 -- R/S Method -- Aggregation Time Interval: 0 ms "RSstat" "RSFitLine" "RSslope0.5" 4 ON/OFF/Heavy-Tailed Model vs. ON/OFF/Exponential Model 4 3.5 Log0(R/S) 3 2 Log0(R/S) 3 2.5 2.5 0 0.5-0 2 3 4 5 6 7 8 Log0(d) 0 0 2 3 4 5 6 Log0(d) 0.95 0.9 0.85 0.8 H v (τ) 0.75 0.7 0.65 0.6 0.55 0.5 0 2 0 0 0 0 0 2 0 3 0 4 τ (s)
On the Variability of Internet Traffic 32 ON/OFF/Heavy-Tailed Model vs. ON/OFF/Exponential Model "AutoCorrelation" "AutoCorrelation" 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 r(k) r(k) 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0. 0 50 00 50 200 250 300 350 400 k 0. 0 50 00 50 200 250 300 350 400 k "AutoCorrelation" "AutoCorrelation" 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 r(k) r(k) 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0. 0 200 400 600 800 000 200 400 600 800 2000 k 0. 0 500 000 500 2000 k Left: ON/OFF/Heavy-Tailed TCP Win= MB BDP= 3:2MB fi=0ms Right: ON/OFF/Exponential Model TCP Win= 64KB BDP= 244KB
On the Variability of Internet Traffic 33 ON/OFF/Heavy-Tailed Model vs. ON/OFF/Exponential Model 22 20 "Rate" 6 "Rate" 8 4 6 2 Traffic Rate (Mbps) 4 2 0 8 Traffic Rate (Mbps) 0 8 6 6 4 4 2 0 0 2000 4000 6000 8000 0000 2000 4000 6000 8000 20000 Time (sec) -- (Aggregation Time Interval: 00 sec) 2 0 2000 4000 6000 8000 0000 2000 4000 6000 8000 20000 Time (sec) -- (Aggregation Time Interval: 00 sec) 0.95 0.9 0.85 0.8 H v (τ) 0.75 0.7 0.65 0.6 0.55 0.5 0 2 0 0 0 0 0 2 0 3 0 4 τ (s) Time Scale = 00s H v (00s) = 0.55 TCP traffic
Goal: Determine if the dynamics of TCP alone can cause high over a wide range of time scales in traffic variability all application-level factors that might contribute to the of traffic were eliminated variability On the Variability of Internet Traffic 34 Simulation Study: Connections With Greedy Sources Simulation experiments were conducted for the cases of no packet losses ) several with uniformly distributed RTT (300ms to 600ms) packet losses due to queue overflows random packet losses Resulting aggregate TCP traffic did not have LRD had considerable variability only at short time scales (< s)
Constructed a new and theoretical practical framework for network traffic at all time scales based on the characterizing captures degree of burstiness at each time scale ) completely characterized by V ar[n(fi)] or IDC(fi) ) new and straightforward way of calculating the for all lags and all time scales autocovariance new and practical way for computing the infinity sum of the function for each time scale autocorrelation On the Variability of Internet Traffic 35 Conclusions (I) statistical properties of the underlying point processes novel measure of variability: index of variability H v (fi)
conventional traffic models can capture the high variability observed in network traffic over a considerable empirically H v (fi) is a better measure for capturing the burstiness of traffic than the Hurst parameter network the amount of correlation that a traffic process has at a time scale does not alone determine the degree of particular the mean file size, the mean OFF period, and the source speed have a great impact on the variability of traffic link On the Variability of Internet Traffic 36 Conclusions (II) Results from analyzing several traffic models show that range of time scales variability at that time scale generated by ON/OFF/Exponential sources
the dynamics of TCP alone can not cause high variability a considerable range of time scales over the presence of high variability over a wide range of time does not necessarily depend on whether a reliable, scales and congestion-controlled protocol is employed at flowlayer transport the Hurst parameter that an empirically collected finite of sequence exhibits LRD traffic the number of samples that might be required to get a ) estimated of the Hurst parameter can be extremely good On the Variability of Internet Traffic 37 Conclusions (III) Results from analyzing TCP/UDP traffic streams suggest that without prior knowledge about the traffic process we can not conclude based alone on the estimated value ) large
Find a relation that associates H v (fi) with queueing metrics (packet loss rate and delay) performance expect different queueing behavior for each different H v (fi) over all performance relevant time scales curve Construct a methodology of how to estimate H v (fi) from measured traffic traces empirically will help to develop accurate traffic models and traffic mechanisms control compromise between the ON/OFF/Heavy-Tailed and models ON/OFF/Exponential On the Variability of Internet Traffic 38 Future Work Analyze the ON/OFF/Hyperexponential traffic model Obtain H v (fi) for several stochastic processes thad have LRD support our claim that H v (fi) is a better measure than H