Maximum Likelihood Estimation of the Flow Size Distribution Tail Index from Sampled Packet Data Patrick Loiseau 1, Paulo Gonçalves 1, Stéphane Girard 2, Florence Forbes 2, Pascale Vicat-Blanc Primet 1 1 INRIA/ENS Lyon - Université de Lyon, France 2 INRIA - Grenoble Universities, France Sigmetrics/Performance 2009 Seattle, June 18, 2009 1 / 18
Motivations I: Importance of α
Global context: Quality of Service (QoS) in networks
QoS depends on what is sent, i.e. on the file size distribution
File size distribution in the Internet: commonly modeled by heavy-tailed distributions (Crovella, Bestavros)
Characterized by its tail index α (which fixes the number of finite moments of the distribution)
α can have an impact on the QoS (Norros, Mandjes, Park, etc.)
⇒ Importance of estimating α
2 / 18
Motivations II: Necessity of Sampling
Very high speed networks: 1 Gbps and 10 Gbps
Processing every packet at such high rates is impossible, because of:
CPU load, storage capacity, lengthy data processing, energy consumption, etc.
Packet sampling: keep (deterministically or randomly) one packet out of N
How can we estimate the flow size distribution tail index α from packet-sampled data?
3 / 18
Problem Formulation I: Flow Size and Heavy-Tailed Distributions
Traffic: interleaved packets from multiple sources
Flow: set of packets sharing the same source and destination IPs and ports, and the same protocol
Flow size: number of packets, random variable X
Flow size distribution: P_X(X = i)
Zipf (discrete Pareto) distribution: P_X(X = i; α) = i^-(α+1) / ζ(α+1)
Estimation of α (no sampling): Hill (Seal), Nolan, Gonçalves
[Figure: P_X(X = i) vs. i on log-log axes, showing the straight-line Zipf decay]
4 / 18
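The Hill-type tail index estimation cited above (no sampling) can be sketched in a few lines. This is only a minimal illustration on synthetic continuous Pareto data using the classical Hill estimator, not the exact procedure used in the talk; all parameter values are illustration choices:

```python
import math
import random

def hill_estimator(samples, k):
    """Classical Hill estimator of the tail index alpha,
    computed from the k largest order statistics."""
    xs = sorted(samples, reverse=True)
    threshold = xs[k]  # the (k+1)-th largest value
    return k / sum(math.log(xs[i] / threshold) for i in range(k))

random.seed(0)
alpha = 1.5
# Pareto(alpha) samples via inverse-CDF sampling: X = U^(-1/alpha)
data = [random.random() ** (-1.0 / alpha) for _ in range(100_000)]
alpha_hat = hill_estimator(data, k=5_000)  # close to the true alpha = 1.5
```

The choice of k trades bias (too many non-tail samples) against variance (too few samples), foreshadowing the j_min trade-off discussed later in the talk.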
Problem Formulation II: The Sampling
Packet sampling:
Deterministic (practice): pick one packet every N
Probabilistic (theory): pick each packet with probability p = 1/N
Sampled flow size: random variable Y
Conditional probability: binomial, P_{Y|X}(Y = j | X = i) = B_p(i, j)
Sampled flow size distribution (key relation to be inverted):
P_Y(Y = j) [observation] = Σ_{i ≥ j} B_p(i, j) [sampling model] × P_X(X = i; α) [original distribution]
[Figure: original vs. sampled (p = 1/100) flow size distributions on log-log axes]
5 / 18
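The key relation above can be checked numerically. A minimal sketch, assuming a Zipf original distribution truncated at an arbitrary bound imax (the truncation and the parameter values are illustration choices, not from the talk):

```python
from math import comb

alpha, p, imax = 1.5, 0.1, 500  # tail index, sampling probability, truncation

# Truncated Zipf original distribution: P_X(X = i) proportional to i^-(alpha+1)
z = sum(i ** -(alpha + 1) for i in range(1, imax + 1))
px = {i: i ** -(alpha + 1) / z for i in range(1, imax + 1)}

def p_y(j):
    """Sampled flow size pmf: P_Y(Y = j) = sum_{i >= j} B_p(i, j) P_X(X = i)."""
    return sum(comb(i, j) * p**j * (1 - p) ** (i - j) * px[i]
               for i in range(max(j, 1), imax + 1))

# Sampling hides most flows entirely (P_Y(0) is large) and shifts the
# remaining mass toward small sizes; the tail index is preserved.
mass = sum(p_y(j) for j in range(0, imax + 1))  # the sampled pmf sums to 1
```

The large P_Y(Y = 0) mass is exactly why small observed sizes must be handled with care when inverting this relation.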
Framework and Existing Solutions I: 2 Types of Methods
P_Y(Y = j) [observation] = Σ_{i ≥ j} B_p(i, j) [sampling model] × P_X(X = i; α) [original distribution]
How to estimate α from sampled data?
A. Two-step methods:
1. estimate the original distribution P_X from the observation P_Y, with no a priori model
2. deduce α from the estimate P̂_X
B. One-step methods: estimate α directly from the observation P_Y
6 / 18
Framework and Existing Solutions II: 2-step Methods — Inference of the Original Distribution
Maximum Likelihood Estimation using the Expectation-Maximisation algorithm (to impose P_X(X = i) ≥ 0 for all i) [Duffield et al., SIGCOMM, 2003]
→ oscillating behavior for large flows
Expansion of the probability generating function [Hohn, Veitch, IMC, 2003]
→ reliable only for p > 1/2
Use of an a priori distribution to invert P_{Y|X} (Bayes): P_{X|Y} ∝ P_{Y|X} P_X^{ap}
Estimation of the original distribution: P̂_X = Σ_j P_{X|Y} [sampling model + a priori] × P̂_Y [observation]
How to appropriately choose the a priori model?
7 / 18
Framework and Existing Solutions III: 2-step Methods — Choice of an a priori Distribution
1. Uniform a priori (scaling method): P_X(X = i) ∝ C^st
P_{X|Y}(X = i | Y = j) ∝ B_p(i, j)
Simplified form of P_{X|Y}: rectangular window approximation
[Figure: original vs. inferred distribution, scaling method]
2. Zipf a priori: P_X(X = i) ∝ i^-(α_ap + 1)
P_{X|Y}(X = i | Y = j) ∝ B_p(i, j) i^-(α_ap + 1)
Simplified form of P_{X|Y}: concentrated on one point, i^(α_ap)(j)
[Figure: original vs. inferred distribution, Zipf a priori method]
i^(α_ap)(j): geometric mean of [j, ∞) weighted by P_{X|Y}
8 / 18
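The effect of the two a priori choices on the posterior P_{X|Y} can be sketched numerically. The parameter values (p = 0.1, α_ap = 1.5, truncation at 500, observed size j = 3) are illustration choices, not from the talk:

```python
from math import comb, log, exp

p, alpha_ap, imax = 0.1, 1.5, 500   # assumed illustration values

def posterior(j, prior):
    """P_{X|Y}(X = i | Y = j) proportional to B_p(i, j) * prior(i), i >= j."""
    support = range(max(j, 1), imax + 1)
    w = {i: comb(i, j) * p**j * (1 - p) ** (i - j) * prior(i) for i in support}
    s = sum(w.values())
    return {i: wi / s for i, wi in w.items()}

def geometric_mean(post):
    """exp of the posterior-weighted mean of log(i): the i^(alpha)(j) of the talk."""
    return exp(sum(q * log(i) for i, q in post.items()))

j = 3
gm_uniform = geometric_mean(posterior(j, lambda i: 1.0))                 # scaling
gm_zipf = geometric_mean(posterior(j, lambda i: i ** -(alpha_ap + 1)))   # Zipf a priori

# The Zipf a priori pulls the posterior toward smaller flow sizes,
# concentrating it around a single effective size i^(alpha_ap)(j).
```

With a uniform prior the posterior is the familiar negative binomial centered near (j + 1)/p; the Zipf prior shifts its effective location well below that.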
Framework and Existing Solutions IV: 1-Step Method, Stochastic Counting [Chabchoub et al., IEEE Comm. Lett., 2008]
The observation period of length T is divided into sub-series of duration T' = 5 s (< T)
W_j: number of sampled flows of size j observed in a sub-series of duration T'
EW_j is empirically estimated by averaging the W_j's obtained from each sub-series
A Poisson approximation leads to a closed-form relation between α and EW_j, which yields the following estimator:
α̂ = (j + 1) (1 − EW_{j+1} / EW_j) − 1, for j ≥ j_0
Very simple, easy to implement, fast
9 / 18
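The estimator can be sanity-checked on an idealised case where EW_j follows a pure Zipf tail exactly; this is a sketch of the formula only, not of the sub-series counting machinery, and the j values are arbitrary:

```python
def stochastic_counting_alpha(ew, j):
    """One-step estimator: alpha-hat = (j + 1) * (1 - EW_{j+1} / EW_j) - 1."""
    return (j + 1) * (1.0 - ew(j + 1) / ew(j)) - 1.0

ALPHA = 1.5

def ew(j):
    # Idealised mean count, proportional to j^-(alpha+1) as for a Zipf tail
    return j ** -(ALPHA + 1)

estimates = {j: stochastic_counting_alpha(ew, j) for j in (20, 50, 200)}
# The estimates approach ALPHA as j grows: the relation is asymptotic in j,
# which is why the estimator is only applied for j >= j_0.
```

Since EW_{j+1}/EW_j = (1 + 1/j)^-(α+1) ≈ 1 − (α+1)/j for large j, the formula recovers α up to an O(1/j) correction.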
Maximum Likelihood Estimation I: Formulation
Assumption: the original distribution is Zipf(α)
P_Y(Y = j | α) = (1 / ζ(α+1)) Σ_{i ≥ j} B_p(i, j) i^-(α+1)
n: number of observed sampled flows
Log-likelihood function: L(α) = log ( Π_{k=1}^{n} P_Y(Y = j_k | α) )
L(α) = n Σ_{j ≥ 0} P̂_Y(Y = j) [observation] × [ ln ( Σ_{i ≥ j} B_p(i, j) [sampling] i^-(α+1) [original dist.] ) − ln ζ(α+1) [normalization] ]
MLE: α̂_ML = argmax_α L(α)
10 / 18
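A direct numerical version of this formulation: simulate sampled flows from a (truncated) Zipf original distribution, then maximize L(α) over a grid. The truncation, sample size, and grid are illustration choices, and grid search stands in for the talk's actual iterative resolution:

```python
import math
import random
from collections import Counter
from math import comb

random.seed(1)
ALPHA_TRUE, P, IMAX, N = 1.5, 0.1, 2000, 20_000

# Truncated Zipf original pmf and its cdf, for inverse-CDF simulation
w = [i ** -(ALPHA_TRUE + 1) for i in range(1, IMAX + 1)]
total = sum(w)
cdf = []
acc = 0.0
for wi in w:
    acc += wi / total
    cdf.append(acc)

def draw_flow_size():
    u = random.random()
    lo, hi = 0, IMAX - 1
    while lo < hi:              # binary search for the inverse cdf
        mid = (lo + hi) // 2
        if cdf[mid] < u:
            lo = mid + 1
        else:
            hi = mid
    return lo + 1

# Simulate: draw X, then keep each packet independently with probability P
freq = Counter(sum(random.random() < P for _ in range(draw_flow_size()))
               for _ in range(N))

# Binomial weights B_P(i, j), precomputed once per observed sampled size j
bw = {j: [comb(i, j) * P**j * (1 - P) ** (i - j) if i >= j else 0.0
          for i in range(1, IMAX + 1)]
      for j in freq}

def log_likelihood(alpha):
    zipf = [i ** -(alpha + 1) for i in range(1, IMAX + 1)]
    z = sum(zipf)
    return sum(n_j * math.log(sum(b * q for b, q in zip(bw[j], zipf)) / z)
               for j, n_j in freq.items())

grid = [1.0 + 0.05 * k for k in range(21)]   # candidate values of alpha
alpha_ml = max(grid, key=log_likelihood)     # lands near ALPHA_TRUE
```

Here the synthetic data let us keep the j = 0 counts; with real sampled traces those flows are invisible, which is what the j_min device addresses.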
Maximum Likelihood Estimation II: Resolution
Differentiation of L(α) gives:
−ζ'(α+1) / ζ(α+1) = Σ_{j ≥ 0} P̂_Y(Y = j) ln i^(α)(j)
No closed-form solution
The fixed-point method and the Expectation-Maximisation algorithm lead to the same iterative solution
Approximate solution for j_min large (≥ 3, for p = 1/100):
1 / α̂_{k+1} = Σ_{j ≥ j_min} P̂_Y(Y = j) ln ( i^(α̂_k)(j) / i^(α̂_k)(j_min) )
= Hill estimation on the RV i^(α̂_k)(Y) (= the Zipf a priori method)
(Convergence: between 5 and 100 iterations (worst case))
11 / 18
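The iterative scheme can be sketched with exact (noise-free) sampled probabilities, so that only the approximation error remains. A minimal illustration assuming a truncated Zipf original distribution with α = 1.5, p = 0.1, and j_min = 3; all of these values, the truncation, and the cut-off JMAX are demo choices, not from the talk:

```python
from math import comb, log, exp

ALPHA_TRUE, P, IMAX, JMIN, JMAX = 1.5, 0.1, 2000, 3, 60

# Exact truncated-Zipf original pmf
zipf = [i ** -(ALPHA_TRUE + 1) for i in range(1, IMAX + 1)]
z = sum(zipf)
px = [q / z for q in zipf]
logi = [log(i) for i in range(1, IMAX + 1)]

# Binomial sampling weights B_P(i, j), one row per retained sampled size j
bw = {j: [comb(i, j) * P**j * (1 - P) ** (i - j) if i >= j else 0.0
          for i in range(1, IMAX + 1)]
      for j in range(JMIN, JMAX + 1)}

# Exact sampled pmf P_Y(j), restricted and renormalised to j >= JMIN
py = {j: sum(b * q for b, q in zip(bw[j], px)) for j in bw}
mass = sum(py.values())
py = {j: q / mass for j, q in py.items()}

def update(alpha):
    """One fixed-point step: Hill estimation on the geometric means i^(alpha)(j)."""
    ipw = [i ** -(alpha + 1) for i in range(1, IMAX + 1)]

    def gmean(j):   # i^(alpha)(j): geometric mean weighted by B_P(i, j) i^-(alpha+1)
        w = [b * q for b, q in zip(bw[j], ipw)]
        s = sum(w)
        return exp(sum(wi * li for wi, li in zip(w, logi)) / s)

    g_min = gmean(JMIN)
    return 1.0 / sum(q * log(gmean(j) / g_min) for j, q in py.items())

alpha = 1.0                      # initial guess
for _ in range(100):
    alpha_next = update(alpha)
    if abs(alpha_next - alpha) < 1e-6:
        alpha = alpha_next
        break
    alpha = alpha_next
```

The iterate stabilises within the iteration budget quoted on the slide, and the limit sits near the true index up to the approximation bias of the Hill-type step.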
Maximum Likelihood Estimation III: Practical Usage, Introduction of j_min
In practice, small observed flow sizes must be discarded because:
they are not actually observed (e.g. j = 0)
the distribution is only asymptotically Pareto
j_min: minimum observed sampled flow size taken into account
Determination of j_min: bias-variance trade-off, optimized iteratively
12 / 18
Results: Performance Analysis I: Evaluation Scheme
Synthetic traffic (Matlab):
100 independent ON/OFF sources emitting at 10 Mbps
50 independent realizations of T = 300 s
5 values of α: 1.1, 1.3, 1.5, 1.7, 1.9
3 sampling rates: p = 1/10, p = 1/100, p = 1/1000
between 10^6 and 10^7 original (unsampled) flows
Real Internet traffic:
Internet access router of ENS Lyon
1-hour trace on March 4, 2007, from 4:30 pm to 5:30 pm
~10^7 original flows
13 / 18
Results: Performance Analysis II: Performance of the MLE (α = 1.5)
Bias: the MLE is asymptotically unbiased
Illustration: bias < 10^-4 for a number of original flows ≥ 10^6
Variance:
[Figure: variance vs. number of original flows (10^3 to 10^7) on log-log axes, for p = 1/10, 1/100, 1/1000 and j_min = 0, 1; dashed lines show the Cramér-Rao bound]
The variance reaches the Cramér-Rao bound → efficient estimator
14 / 18
Results: Performance Analysis III: Comparison of the Different Estimators on Synthetic Traffic
[Figure: bias and standard deviation vs. α (1.1 to 1.9), one panel per sampling rate p = 1/10, 1/100, 1/1000, for the scaling method, the Zipf a priori method, the stochastic counting method, and the MLE]
15 / 18
Results: Performance Analysis IV: Comparison of the Different Estimators on Real Traffic
[Figure: number of flows vs. flow size on log-log axes]
i_min ≈ 35 (tail lower bound)
j_min = 7 for p = 1/10; j_min = 2 for p = 1/100; j_min = 2 for p = 1/1000
MLE with p = 1 (no sampling): α̂ = 0.9047 (reference)
Bias of estimation from sampled data:

p        Scaling   Zipf a p.   Stochastic counting   MLE with geom. mean approx.
1/10     0.0814    0.0234      -0.0634               0.0149
1/100    0.3694    0.0888      -0.1997               0.0169
1/1000   0.4113    0.0995       0.1360               0.0525

16 / 18
Conclusions and Perspectives
Conclusions:
The MLE naturally outperforms the other estimators
The Zipf a priori method is an approximation of the MLE
Small values of j are well taken into account → the difference between the MLE and the other estimators might shrink as j_min increases
Perspectives:
Robustness against data sets that do not perfectly match the Zipf model
Real-time perspective: study a faster algorithm that preserves the good properties of the MLE
The method can be applied to other situations, e.g. social networks: individuals are clustered into groups of heavy-tail distributed sizes, and only a cross-section of the population is observed
17 / 18
References [Duffield et al., SIGCOMM, 2003]: Estimating flow distributions from sampled flow statistics [Loiseau et al., MetroGrid, 2007]: A comparative study of different heavy tail index estimators of flow size from sampled data [Chabchoub et al., IEEE Comm. Lett., 2008]: Inference of flow statistics via packet sampling in the internet [Hohn, Veitch, IMC, 2003]: Inverting sampled traffic [Loiseau et al., Sigmetrics, 2009]: Maximum Likelihood Estimation of the Flow Size Distribution Tail Index from Sampled Packet Data 18 / 18