Maximum Likelihood Estimation of the Flow Size Distribution Tail Index from Sampled Packet Data Patrick Loiseau 1, Paulo Gonçalves 1, Stéphane Girard 2, Florence Forbes 2, Pascale Vicat-Blanc Primet 1 1 INRIA/ENS Lyon - Université de Lyon, France 2 INRIA - Grenoble Universities, France Sigmetrics/Performance 2009 Seattle, June 18, 2009 1 / 18
Motivations I: Importance of α
Global context: Quality of Service (QoS) in networks
QoS depends on what is sent, i.e. on the file size distribution
File size distribution in the Internet: commonly modeled by heavy-tailed distributions (Crovella, Bestavros)
Characterized by its tail index α (which fixes the number of finite moments of the distribution)
α can have an impact on the QoS (Norros, Mandjes, Park, etc.)
⇒ Importance of estimating α
2 / 18
Motivations II: Necessity of Sampling
Very high speed networks: 1 Gbps and 10 Gbps
Processing every packet at such high rates is impossible, because of:
CPU load, storage capacity, lengthy data processing, energy consumption, etc.
Packet sampling: keep (deterministically or randomly) one packet out of N
How can we estimate the flow size distribution tail index α from packet-sampled data?
3 / 18
Problem Formulation I: Flow Size and Heavy-Tailed Distributions
Traffic: interleaved packets from multiple sources
Flow: set of packets sharing the same source and destination IPs and ports, and the same protocol
Flow size: number of packets, random variable X
Flow size distribution: P_X(X = i)
Zipf (discrete Pareto) distribution: P_X(X = i; α) = i^-(α+1) / ζ(α+1)
Estimation of α (no sampling): Hill (Seal), Nolan, Gonçalves
[Figure: P_X(X = i) vs. i on log-log axes, showing the straight-line Zipf decay]
4 / 18
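The Hill-type tail index estimation cited above (no sampling) can be sketched in a few lines. This is only a minimal illustration on synthetic continuous Pareto data using the classical Hill estimator, not the exact procedure used in the talk; all parameter values are illustration choices:

```python
import math
import random

def hill_estimator(samples, k):
    """Classical Hill estimator of the tail index alpha,
    computed from the k largest order statistics."""
    xs = sorted(samples, reverse=True)
    threshold = xs[k]  # the (k+1)-th largest value
    return k / sum(math.log(xs[i] / threshold) for i in range(k))

random.seed(0)
alpha = 1.5
# Pareto(alpha) samples via inverse-CDF sampling: X = U^(-1/alpha)
data = [random.random() ** (-1.0 / alpha) for _ in range(100_000)]
alpha_hat = hill_estimator(data, k=5_000)  # close to the true alpha = 1.5
```

The choice of k trades bias (too many non-tail samples) against variance (too few samples), foreshadowing the j_min trade-off discussed later in the talk.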
Problem Formulation II: The Sampling
Packet sampling:
Deterministic (practice): pick one packet every N
Probabilistic (theory): pick each packet with probability p = 1/N
Sampled flow size: random variable Y
Conditional probability: binomial, P_{Y|X}(Y = j | X = i) = B_p(i, j)
Sampled flow size distribution (key relation to be inverted):
P_Y(Y = j) [observation] = Σ_{i ≥ j} B_p(i, j) [sampling model] × P_X(X = i; α) [original distribution]
[Figure: original vs. sampled (p = 1/100) flow size distributions on log-log axes]
5 / 18
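The key relation above can be checked numerically. A minimal sketch, assuming a Zipf original distribution truncated at an arbitrary bound imax (the truncation and the parameter values are illustration choices, not from the talk):

```python
from math import comb

alpha, p, imax = 1.5, 0.1, 500  # tail index, sampling probability, truncation

# Truncated Zipf original distribution: P_X(X = i) proportional to i^-(alpha+1)
z = sum(i ** -(alpha + 1) for i in range(1, imax + 1))
px = {i: i ** -(alpha + 1) / z for i in range(1, imax + 1)}

def p_y(j):
    """Sampled flow size pmf: P_Y(Y = j) = sum_{i >= j} B_p(i, j) P_X(X = i)."""
    return sum(comb(i, j) * p**j * (1 - p) ** (i - j) * px[i]
               for i in range(max(j, 1), imax + 1))

# Sampling hides most flows entirely (P_Y(0) is large) and shifts the
# remaining mass toward small sizes; the tail index is preserved.
mass = sum(p_y(j) for j in range(0, imax + 1))  # the sampled pmf sums to 1
```

The large P_Y(Y = 0) mass is exactly why small observed sizes must be handled with care when inverting this relation.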
Framework and Existing Solutions I: 2 Types of Methods
P_Y(Y = j) [observation] = Σ_{i ≥ j} B_p(i, j) [sampling model] × P_X(X = i; α) [original distribution]
How to estimate α from sampled data?
A. Two-step methods:
1. estimate the original distribution P_X from the observation P_Y, with no a priori model
2. deduce α from the estimate P̂_X
B. One-step methods: estimate α directly from the observation P_Y
6 / 18
Framework and Existing Solutions II: 2-step Methods — Inference of the Original Distribution
Maximum Likelihood Estimation using the Expectation-Maximisation algorithm (to impose P_X(X = i) ≥ 0 for all i) [Duffield et al., SIGCOMM, 2003]
→ oscillating behavior for large flows
Expansion of the probability generating function [Hohn, Veitch, IMC, 2003]
→ reliable only for p > 1/2
Use of an a priori distribution to invert P_{Y|X} (Bayes): P_{X|Y} ∝ P_{Y|X} P_X^{ap}
Estimation of the original distribution: P̂_X = Σ_j P_{X|Y} [sampling model + a priori] × P̂_Y [observation]
How to appropriately choose the a priori model?
7 / 18
Framework and Existing Solutions III: 2-step Methods — Choice of an a priori Distribution
1. Uniform a priori (scaling method): P_X(X = i) ∝ C^st
P_{X|Y}(X = i | Y = j) ∝ B_p(i, j)
Simplified form of P_{X|Y}: rectangular window approximation
[Figure: original vs. inferred distribution, scaling method]
2. Zipf a priori: P_X(X = i) ∝ i^-(α_ap + 1)
P_{X|Y}(X = i | Y = j) ∝ B_p(i, j) i^-(α_ap + 1)
Simplified form of P_{X|Y}: concentrated on one point, i^(α_ap)(j)
[Figure: original vs. inferred distribution, Zipf a priori method]
i^(α_ap)(j): geometric mean of [j, ∞) weighted by P_{X|Y}
8 / 18
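The effect of the two a priori choices on the posterior P_{X|Y} can be sketched numerically. The parameter values (p = 0.1, α_ap = 1.5, truncation at 500, observed size j = 3) are illustration choices, not from the talk:

```python
from math import comb, log, exp

p, alpha_ap, imax = 0.1, 1.5, 500   # assumed illustration values

def posterior(j, prior):
    """P_{X|Y}(X = i | Y = j) proportional to B_p(i, j) * prior(i), i >= j."""
    support = range(max(j, 1), imax + 1)
    w = {i: comb(i, j) * p**j * (1 - p) ** (i - j) * prior(i) for i in support}
    s = sum(w.values())
    return {i: wi / s for i, wi in w.items()}

def geometric_mean(post):
    """exp of the posterior-weighted mean of log(i): the i^(alpha)(j) of the talk."""
    return exp(sum(q * log(i) for i, q in post.items()))

j = 3
gm_uniform = geometric_mean(posterior(j, lambda i: 1.0))                 # scaling
gm_zipf = geometric_mean(posterior(j, lambda i: i ** -(alpha_ap + 1)))   # Zipf a priori

# The Zipf a priori pulls the posterior toward smaller flow sizes,
# concentrating it around a single effective size i^(alpha_ap)(j).
```

With a uniform prior the posterior is the familiar negative binomial centered near (j + 1)/p; the Zipf prior shifts its effective location well below that.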
Framework and Existing Solutions IV: 1-Step Method, Stochastic Counting [Chabchoub et al., IEEE Comm. Lett., 2008]
The observation period of length T is divided into sub-series of duration T' = 5 s (< T)
W_j: number of sampled flows of size j observed in a sub-series of duration T'
EW_j is empirically estimated by averaging the W_j's obtained from each sub-series
A Poisson approximation leads to a closed-form relation between α and EW_j, which yields the following estimator:
α̂ = (j + 1) (1 − EW_{j+1} / EW_j) − 1, for j ≥ j_0
Very simple, easy to implement, fast
9 / 18
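The estimator can be sanity-checked on an idealised case where EW_j follows a pure Zipf tail exactly; this is a sketch of the formula only, not of the sub-series counting machinery, and the j values are arbitrary:

```python
def stochastic_counting_alpha(ew, j):
    """One-step estimator: alpha-hat = (j + 1) * (1 - EW_{j+1} / EW_j) - 1."""
    return (j + 1) * (1.0 - ew(j + 1) / ew(j)) - 1.0

ALPHA = 1.5

def ew(j):
    # Idealised mean count, proportional to j^-(alpha+1) as for a Zipf tail
    return j ** -(ALPHA + 1)

estimates = {j: stochastic_counting_alpha(ew, j) for j in (20, 50, 200)}
# The estimates approach ALPHA as j grows: the relation is asymptotic in j,
# which is why the estimator is only applied for j >= j_0.
```

Since EW_{j+1}/EW_j = (1 + 1/j)^-(α+1) ≈ 1 − (α+1)/j for large j, the formula recovers α up to an O(1/j) correction.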
Maximum Likelihood Estimation I: Formulation
Assumption: the original distribution is Zipf(α)
P_Y(Y = j | α) = (1 / ζ(α+1)) Σ_{i ≥ j} B_p(i, j) i^-(α+1)
n: number of observed sampled flows
Log-likelihood function: L(α) = log ( Π_{k=1}^{n} P_Y(Y = j_k | α) )
L(α) = n Σ_{j ≥ 0} P̂_Y(Y = j) [observation] × [ ln ( Σ_{i ≥ j} B_p(i, j) [sampling] i^-(α+1) [original dist.] ) − ln ζ(α+1) [normalization] ]
MLE: α̂_ML = argmax_α L(α)
10 / 18
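A direct numerical version of this formulation: simulate sampled flows from a (truncated) Zipf original distribution, then maximize L(α) over a grid. The truncation, sample size, and grid are illustration choices, and grid search stands in for the talk's actual iterative resolution:

```python
import math
import random
from collections import Counter
from math import comb

random.seed(1)
ALPHA_TRUE, P, IMAX, N = 1.5, 0.1, 2000, 20_000

# Truncated Zipf original pmf and its cdf, for inverse-CDF simulation
w = [i ** -(ALPHA_TRUE + 1) for i in range(1, IMAX + 1)]
total = sum(w)
cdf = []
acc = 0.0
for wi in w:
    acc += wi / total
    cdf.append(acc)

def draw_flow_size():
    u = random.random()
    lo, hi = 0, IMAX - 1
    while lo < hi:              # binary search for the inverse cdf
        mid = (lo + hi) // 2
        if cdf[mid] < u:
            lo = mid + 1
        else:
            hi = mid
    return lo + 1

# Simulate: draw X, then keep each packet independently with probability P
freq = Counter(sum(random.random() < P for _ in range(draw_flow_size()))
               for _ in range(N))

# Binomial weights B_P(i, j), precomputed once per observed sampled size j
bw = {j: [comb(i, j) * P**j * (1 - P) ** (i - j) if i >= j else 0.0
          for i in range(1, IMAX + 1)]
      for j in freq}

def log_likelihood(alpha):
    zipf = [i ** -(alpha + 1) for i in range(1, IMAX + 1)]
    z = sum(zipf)
    return sum(n_j * math.log(sum(b * q for b, q in zip(bw[j], zipf)) / z)
               for j, n_j in freq.items())

grid = [1.0 + 0.05 * k for k in range(21)]   # candidate values of alpha
alpha_ml = max(grid, key=log_likelihood)     # lands near ALPHA_TRUE
```

Here the synthetic data let us keep the j = 0 counts; with real sampled traces those flows are invisible, which is what the j_min device addresses.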
Maximum Likelihood Estimation II: Resolution
Differentiation of L(α) gives:
−ζ'(α+1) / ζ(α+1) = Σ_{j ≥ 0} P̂_Y(Y = j) ln i^(α)(j)
No closed-form solution
The fixed-point method and the Expectation-Maximisation algorithm lead to the same iterative solution
Approximate solution for j_min large (≥ 3, for p = 1/100):
1 / α̂_{k+1} = Σ_{j ≥ j_min} P̂_Y(Y = j) ln ( i^(α̂_k)(j) / i^(α̂_k)(j_min) )
= Hill estimation on the RV i^(α̂_k)(Y) (= the Zipf a priori method)
(Convergence: between 5 and 100 iterations (worst case))
11 / 18
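The iterative scheme can be sketched with exact (noise-free) sampled probabilities, so that only the approximation error remains. A minimal illustration assuming a truncated Zipf original distribution with α = 1.5, p = 0.1, and j_min = 3; all of these values, the truncation, and the cut-off JMAX are demo choices, not from the talk:

```python
from math import comb, log, exp

ALPHA_TRUE, P, IMAX, JMIN, JMAX = 1.5, 0.1, 2000, 3, 60

# Exact truncated-Zipf original pmf
zipf = [i ** -(ALPHA_TRUE + 1) for i in range(1, IMAX + 1)]
z = sum(zipf)
px = [q / z for q in zipf]
logi = [log(i) for i in range(1, IMAX + 1)]

# Binomial sampling weights B_P(i, j), one row per retained sampled size j
bw = {j: [comb(i, j) * P**j * (1 - P) ** (i - j) if i >= j else 0.0
          for i in range(1, IMAX + 1)]
      for j in range(JMIN, JMAX + 1)}

# Exact sampled pmf P_Y(j), restricted and renormalised to j >= JMIN
py = {j: sum(b * q for b, q in zip(bw[j], px)) for j in bw}
mass = sum(py.values())
py = {j: q / mass for j, q in py.items()}

def update(alpha):
    """One fixed-point step: Hill estimation on the geometric means i^(alpha)(j)."""
    ipw = [i ** -(alpha + 1) for i in range(1, IMAX + 1)]

    def gmean(j):   # i^(alpha)(j): geometric mean weighted by B_P(i, j) i^-(alpha+1)
        w = [b * q for b, q in zip(bw[j], ipw)]
        s = sum(w)
        return exp(sum(wi * li for wi, li in zip(w, logi)) / s)

    g_min = gmean(JMIN)
    return 1.0 / sum(q * log(gmean(j) / g_min) for j, q in py.items())

alpha = 1.0                      # initial guess
for _ in range(100):
    alpha_next = update(alpha)
    if abs(alpha_next - alpha) < 1e-6:
        alpha = alpha_next
        break
    alpha = alpha_next
```

The iterate stabilises within the iteration budget quoted on the slide, and the limit sits near the true index up to the approximation bias of the Hill-type step.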
Maximum Likelihood Estimation III: Practical Usage, Introduction of j_min
In practice, small observed flow sizes must be discarded because:
they are not actually observed (e.g. j = 0)
the distribution is only asymptotically Pareto
j_min: minimum observed sampled flow size taken into account
Determination of j_min: bias-variance trade-off, optimized iteratively
12 / 18
Results: Performance Analysis I: Evaluation Scheme
Synthetic traffic (Matlab):
100 independent ON/OFF sources emitting at 10 Mbps
50 independent realizations of T = 300 s
5 values of α: 1.1, 1.3, 1.5, 1.7, 1.9
3 sampling rates: p = 1/10, p = 1/100, p = 1/1000
between 10^6 and 10^7 original (unsampled) flows
Real Internet traffic:
Internet access router of ENS Lyon
1-hour trace on March 4, 2007, from 4:30 pm to 5:30 pm
~10^7 original flows
13 / 18
Results: Performance Analysis II: Performance of the MLE (α = 1.5)
Bias: the MLE is asymptotically unbiased
Illustration: bias < 10^-4 for a number of original flows ≥ 10^6
Variance:
[Figure: variance vs. number of original flows (10^3 to 10^7) on log-log axes, for p = 1/10, 1/100, 1/1000 and j_min = 0, 1; dashed lines show the Cramér-Rao bound]
The variance reaches the Cramér-Rao bound → efficient estimator
14 / 18
Results: Performance Analysis III: Comparison of the Different Estimators on Synthetic Traffic
[Figure: bias and standard deviation vs. α (1.1 to 1.9), one panel per sampling rate p = 1/10, 1/100, 1/1000, for the scaling method, the Zipf a priori method, the stochastic counting method, and the MLE]
15 / 18
Results: Performance Analysis IV: Comparison of the Different Estimators on Real Traffic
[Figure: number of flows vs. flow size on log-log axes]
i_min ≈ 35 (tail lower bound)
j_min = 7 for p = 1/10; j_min = 2 for p = 1/100; j_min = 2 for p = 1/1000
MLE with p = 1 (no sampling): α̂ = 0.9047 (reference)
Bias of estimation from sampled data:

p        Scaling   Zipf a p.   Stochastic counting   MLE with geom. mean approx.
1/10     0.0814    0.0234      -0.0634               0.0149
1/100    0.3694    0.0888      -0.1997               0.0169
1/1000   0.4113    0.0995       0.1360               0.0525

16 / 18
Conclusions and Perspectives
Conclusions:
The MLE naturally outperforms the other estimators
The Zipf a priori method is an approximation of the MLE
Small values of j are well taken into account → the difference between the MLE and the other estimators might shrink as j_min increases
Perspectives:
Robustness against data sets that do not perfectly match the Zipf model
Real-time perspective: study a faster algorithm that preserves the good properties of the MLE
The method can be applied to other situations, e.g. social networks: individuals are clustered into groups of heavy-tail distributed sizes, and only a cross-section of the population is observed
17 / 18
References [Duffield et al., SIGCOMM, 2003]: Estimating flow distributions from sampled flow statistics [Loiseau et al., MetroGrid, 2007]: A comparative study of different heavy tail index estimators of flow size from sampled data [Chabchoub et al., IEEE Comm. Lett., 2008]: Inference of flow statistics via packet sampling in the internet [Hohn, Veitch, IMC, 2003]: Inverting sampled traffic [Loiseau et al., Sigmetrics, 2009]: Maximum Likelihood Estimation of the Flow Size Distribution Tail Index from Sampled Packet Data 18 / 18