Maximum Likelihood Estimation of the Flow Size Distribution Tail Index from Sampled Packet Data


Maximum Likelihood Estimation of the Flow Size Distribution Tail Index from Sampled Packet Data

Patrick Loiseau (1), Paulo Gonçalves (1), Stéphane Girard (2), Florence Forbes (2), Pascale Vicat-Blanc Primet (1)
(1) INRIA/ENS Lyon - Université de Lyon, France
(2) INRIA - Grenoble Universities, France

Sigmetrics/Performance 2009, Seattle, June 18, 2009

Motivations I: Importance of α

Global context: Quality of Service (QoS) in networks, which depends on what is sent, i.e. on the file size distribution.
The file size distribution in the Internet is commonly modeled by heavy-tailed distributions (Crovella, Bestavros).
Such a distribution is characterized by its tail index α, which determines the number of finite moments of the distribution.
α can have an impact on the QoS (Norros, Mandjes, Park, etc.).
Hence the importance of estimating α.

Motivations II: Necessity of Sampling

Very high speed networks: 1 Gbps and 10 Gbps.
Because of CPU load, storage capacity, long data processing times, energy consumption, etc., it is impossible to process every packet at such high rates.
Packet sampling: consider (deterministically or probabilistically) one packet out of every N.
Question: how can we estimate the flow size distribution tail index α from packet-sampled data?

Problem Formulation I: Flow Size and Heavy-Tailed Distributions

Traffic: interleaved packets from multiple sources.
Flow: set of packets sharing the same source and destination IPs and ports, and the same protocol.
Flow size: number of packets in a flow, a random variable X.
Flow size distribution: $P_X(X = i)$, modeled as Zipf (discrete Pareto):
$$P_X(X = i; \alpha) = \frac{i^{-(\alpha+1)}}{\zeta(\alpha+1)}$$
Estimation of α (no sampling): Hill (Seal), Nolan, Gonçalves.
[Figure: log-log plot of $P_X(X = i)$ versus $i$, from $i = 10^0$ to $10^4$.]
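For concreteness, a minimal Python sketch (not from the paper) evaluating this Zipf pmf; it assumes scipy for the Riemann zeta function:

```python
# Illustrative sketch, not the authors' code: Zipf (discrete Pareto) pmf
# P_X(X = i; alpha) = i^{-(alpha+1)} / zeta(alpha+1), for i = 1, 2, ...
from scipy.special import zeta

def zipf_pmf(i, alpha):
    """Probability that a flow contains exactly i packets, tail index alpha."""
    return i ** -(alpha + 1.0) / zeta(alpha + 1.0)

print(zipf_pmf(1, 1.5))   # ~ 0.745: most flows are single-packet "mice"
print(zipf_pmf(10, 1.5))  # ~ 2.4e-3
```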

Problem Formulation II: The Sampling

Packet sampling:
Deterministic (practice): pick one packet every N.
Probabilistic (theory): pick each packet independently with probability $p = 1/N$.
Sampled flow size: random variable Y.
Conditional probability: binomial, $P_{Y|X}(Y = j \mid X = i) = B_p(i, j) = \binom{i}{j} p^j (1-p)^{i-j}$.

Sampled flow size distribution (the key relation to be inverted):
$$\underbrace{P_Y(Y = j)}_{\text{observation}} = \sum_{i \geq j} \underbrace{B_p(i, j)}_{\text{sampling model}} \, \underbrace{P_X(X = i; \alpha)}_{\text{original distribution}}$$
[Figure: log-log plot of the original ($P_X$) and sampled ($P_Y$, p = 1/100) flow size distributions.]
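A sketch of this forward relation, under two assumptions of mine (the original law is Zipf, and the infinite sum is truncated at a large i_max):

```python
# Illustrative sketch: sampled flow size distribution obtained by binomially
# thinning a Zipf original distribution, P_Y(Y=j) = sum_{i>=j} B_p(i,j) P_X(i).
import numpy as np
from scipy.special import zeta
from scipy.stats import binom

def sampled_pmf(j, alpha, p, i_max=100_000):
    """P_Y(Y = j) under Bernoulli(p) packet sampling; the sum over the
    original size i is truncated at i_max (an approximation)."""
    i = np.arange(max(j, 1), i_max + 1)            # original sizes i >= max(j, 1)
    p_x = i ** -(alpha + 1.0) / zeta(alpha + 1.0)  # Zipf pmf
    return float(np.sum(binom.pmf(j, i, p) * p_x))

# At p = 1/100 most flows are missed entirely: P_Y(Y = 0) is close to 1.
print(sampled_pmf(0, 1.5, 0.01))
```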

Framework and Existing Solutions I: Two Types of Methods

$$\underbrace{P_Y(Y = j)}_{\text{observation}} = \sum_{i \geq j} \underbrace{B_p(i, j)}_{\text{sampling model}} \, \underbrace{P_X(X = i; \alpha)}_{\text{original distribution}}$$

How to estimate α from sampled data?
A. Two-step methods:
1. estimate the original distribution $P_X$ from the observation $P_Y$, with no a priori model;
2. deduce α from the estimate $\hat{P}_X$.
B. One-step methods: estimate α directly from the observation $P_Y$.

Framework and Existing Solutions II: Two-Step Methods, Inference of the Original Distribution

Maximum likelihood estimation using the Expectation-Maximization algorithm (to enforce $P_X(X = i) \geq 0$ for all i) [Duffield et al., SIGCOMM, 2003]: oscillating behavior for large flows.
Expansion of the probability generating function [Hohn, Veitch, IMC, 2003]: reliable for $p > 1/2$ only.
Use of an a priori distribution to invert $P_{Y|X}$ (Bayes): $P_{X|Y} \propto P_{Y|X} \, P_X^{ap}$.
Estimation of the original distribution:
$$\hat{P}_X(X = i) = \sum_{j} \underbrace{P_{X|Y}(X = i \mid Y = j)}_{\text{sampling model + a priori}} \; \underbrace{\hat{P}_Y(Y = j)}_{\text{observation}}$$
Question: how to appropriately choose the a priori model?

Framework and Existing Solutions III: Two-Step Methods, Choice of an a priori Distribution

1. Uniform a priori (scaling method): $P_X(X = i) \propto C^{st}$, so
$$P_{X|Y}(X = i \mid Y = j) \propto B_p(i, j)$$
Simplified form of $P_{X|Y}$: rectangular-window approximation (for fixed j, the mass spreads over a wide range of i).
[Figure: original versus inferred distribution, scaling method.]

2. Zipf a priori: $P_X(X = i) \propto i^{-(\alpha_{ap}+1)}$, so
$$P_{X|Y}(X = i \mid Y = j) \propto B_p(i, j)\, i^{-(\alpha_{ap}+1)}$$
Simplified form of $P_{X|Y}$: concentrated on one point $i^*_{(\alpha_{ap})}(j)$, the geometric mean of $[j, \infty)$ weighted by $P_{X|Y}$.
[Figure: original versus inferred distribution, Zipf a priori method.]
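A sketch contrasting the two a priori choices numerically; the truncation at i_max and the explicit normalization are my additions:

```python
# Illustrative sketch: posterior P_{X|Y}(i | j) under a uniform a priori
# (proportional to B_p(i, j)) versus a Zipf a priori (proportional to
# B_p(i, j) * i^{-(alpha_ap + 1)}).
import numpy as np
from scipy.stats import binom

def posterior(j, p, alpha_ap=None, i_max=100_000):
    """Normalized P_{X|Y}(. | Y = j); alpha_ap=None selects the uniform prior."""
    i = np.arange(max(j, 1), i_max + 1)
    w = binom.pmf(j, i, p)
    if alpha_ap is not None:
        w = w * i ** -(alpha_ap + 1.0)   # Zipf a priori weighting
    return i, w / w.sum()

def i_star(j, p, alpha_ap, i_max=100_000):
    """Geometric mean of i weighted by the Zipf-prior posterior: the single
    point on which P_{X|Y} concentrates."""
    i, w = posterior(j, p, alpha_ap, i_max)
    return np.exp(np.sum(w * np.log(i)))
```

Plotting the two posteriors for a fixed j shows the rectangular window of the uniform prior against the sharp peak of the Zipf prior near $i^*_{(\alpha_{ap})}(j)$.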

Framework and Existing Solutions IV: One-Step Method, Stochastic Counting [Chabchoub et al., IEEE Comm. Lett., 2008]

The observation period of length T is divided into sub-series of duration ΔT = 5 s (ΔT < T).
$W_j$: number of sampled flows of size j observed in a sub-series of duration ΔT.
$\mathbb{E}W_j$ is empirically estimated by averaging the $W_j$'s obtained from each sub-series.
A Poisson approximation leads to a closed-form relation between α and $\mathbb{E}W_j$, which yields the following estimator for α:
$$\hat{\alpha} = (j + 1)\left(1 - \frac{\mathbb{E}W_{j+1}}{\mathbb{E}W_j}\right) - 1, \quad \text{for } j \geq j_0$$
Very simple, easy to implement, fast.
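A sketch of this estimator; note that the closed form above is my reading of a garbled transcription, so treat the code as indicative only:

```python
# Illustrative sketch of the stochastic-counting estimator:
# alpha_hat = (j + 1) * (1 - E[W_{j+1}] / E[W_j]) - 1   (reconstructed form).
import numpy as np

def stochastic_counting_alpha(W_j, W_j_plus_1, j):
    """W_j, W_j_plus_1: arrays of per-sub-series counts of sampled flows of
    sizes j and j + 1; their means estimate E[W_j] and E[W_{j+1}]."""
    ew_j = np.mean(W_j)
    ew_j1 = np.mean(W_j_plus_1)
    return (j + 1) * (1.0 - ew_j1 / ew_j) - 1.0
```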

Maximum Likelihood Estimation I: Formulation

Assumption: the original distribution is Zipf(α), so
$$P_Y(Y = j \mid \alpha) = \frac{1}{\zeta(\alpha+1)} \sum_{i=j}^{\infty} B_p(i, j)\, i^{-(\alpha+1)}$$
n: number of observed sampled flows.
Log-likelihood function: $\mathcal{L}(\alpha) = \log\left(\prod_{k=1}^{n} P_Y(Y = j_k \mid \alpha)\right)$, i.e.
$$\mathcal{L}(\alpha) = n \sum_{j=0}^{\infty} \underbrace{\hat{P}_Y(Y = j)}_{\text{observation}} \Bigl[ \underbrace{-\ln \zeta(\alpha+1)}_{\text{normalization}} + \ln \Bigl( \sum_{i=j}^{\infty} \underbrace{B_p(i, j)}_{\text{sampling}} \, \underbrace{i^{-(\alpha+1)}}_{\text{original dist.}} \Bigr) \Bigr]$$
MLE: $\hat{\alpha}_{ML} = \arg\max_{\alpha} \mathcal{L}(\alpha)$
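A direct numerical sketch of this program, reusing sampled_pmf from the sampling sketch above; the truncation, the search bounds and scipy's bounded minimizer are all my choices, not the paper's algorithm:

```python
# Illustrative sketch: maximize L(alpha) = sum_k log P_Y(Y = j_k | alpha)
# by a bounded one-dimensional search.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_likelihood(alpha, sizes, p, i_max=100_000):
    """-L(alpha) for an array of observed sampled flow sizes (j_k >= 1)."""
    js, counts = np.unique(sizes, return_counts=True)
    pmf = np.array([sampled_pmf(j, alpha, p, i_max) for j in js])
    return -np.sum(counts * np.log(pmf))

def mle_alpha(sizes, p):
    res = minimize_scalar(neg_log_likelihood, bounds=(0.5, 3.0),
                          args=(sizes, p), method="bounded")
    return res.x
```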

Maximum Likelihood Estimation II: Resolution

Differentiating $\mathcal{L}(\alpha)$ gives:
$$-\frac{\zeta'(\alpha+1)}{\zeta(\alpha+1)} = \sum_{j=0}^{\infty} \hat{P}_Y(Y = j)\, \ln i^*_{(\alpha)}(j)$$
No closed-form solution.
The fixed-point method and the Expectation-Maximization algorithm lead to the same iterative solution.
Approximate solution for $j_{min}$ large (≈ 3, for p = 1/100):
$$\frac{1}{\hat{\alpha}_{k+1}} = \sum_{j = j_{min}}^{\infty} \hat{P}_Y(Y = j)\, \ln \frac{i^*_{(\hat{\alpha}_k)}(j)}{i^*_{(\hat{\alpha}_k)}(j_{min})}$$
This is a Hill estimation on the random variable $i^*_{(\hat{\alpha}_k)}(Y)$ (equivalent to the Zipf a priori method).
(Convergence: between 5 and 100 iterations, worst case.)
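A sketch of this iteration, reusing i_star from the a priori sketch; the starting point, iteration cap and stopping tolerance are my choices:

```python
# Illustrative sketch of the Hill-style fixed point:
# 1/alpha_{k+1} = sum_{j >= j_min} Phat_Y(j) * log(i*(j) / i*(j_min)).
import numpy as np

def fixed_point_alpha(sizes, p, j_min, alpha0=1.5, max_iter=100, tol=1e-6):
    """sizes: numpy array of observed sampled flow sizes; flows with
    j < j_min are discarded and Phat_Y is renormalized on j >= j_min."""
    js, counts = np.unique(sizes[sizes >= j_min], return_counts=True)
    p_hat = counts / counts.sum()
    alpha = alpha0
    for _ in range(max_iter):
        ref = i_star(j_min, p, alpha)                    # i*(j_min) at current alpha
        ratios = [i_star(j, p, alpha) / ref for j in js]
        alpha_new = 1.0 / np.sum(p_hat * np.log(ratios))
        if abs(alpha_new - alpha) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha
```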

Maximum Likelihood Estimation III: Practical Usage, Introduction of $j_{min}$

In practical situations, small values of the observed flow size must be discarded because:
they are actually not observed (e.g. j = 0);
the distribution is only asymptotically Pareto.
$j_{min}$: minimum observed sampled flow size taken into account.
Determination of $j_{min}$: a bias-variance trade-off, optimized iteratively.

Results, Performance Analysis I: Evaluation Scheme

Synthetic traffic (Matlab):
100 independent ON/OFF sources emitting at 10 Mbps;
50 independent realizations of duration T = 300 s;
5 values of α: 1.1, 1.3, 1.5, 1.7, 1.9;
3 sampling rates: p = 1/10, 1/100, 1/1000;
between $10^6$ and $10^7$ original (unsampled) flows.

Real Internet traffic:
Internet access router of ENS Lyon;
1-hour trace on March 4, 2007, from 4:30 pm to 5:30 pm;
about $10^7$ original flows.
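The synthetic setup is easy to emulate at a smaller scale; a sketch of mine (drawing i.i.d. approximately power-law flow sizes rather than the paper's ON/OFF sources), feeding the estimator sketches above:

```python
# Illustrative sketch: draw approximately-Zipf flow sizes, thin each flow
# with Bernoulli(p) per-packet sampling, and run the estimators above.
import numpy as np

rng = np.random.default_rng(0)
alpha_true, p, n_flows = 1.5, 1 / 100, 10**6

# Inverse-transform draw: X = ceil(U^{-1/alpha}) has a power-law tail of
# index alpha (an approximation of the exact Zipf law).
u = rng.random(n_flows)
x = np.ceil(u ** (-1.0 / alpha_true)).astype(np.int64)

y = rng.binomial(x, p)                    # sampled size of each flow
print(fixed_point_alpha(y, p, j_min=3))   # expected to be close to 1.5
```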

Results, Performance Analysis II: Performance of the MLE (α = 1.5)

Bias: the MLE is asymptotically unbiased.
Illustration: bias $< 10^{-4}$ for $10^6$ or more original flows.

Variance:
[Figure: variance versus number of original flows ($10^3$ to $10^7$), for p = 1/10, 1/100, 1/1000 and $j_{min}$ = 0, 1; dashed lines represent the Cramér-Rao bound.]
The variance reaches the Cramér-Rao bound: the estimator is efficient.

Results, Performance Analysis III: Comparison of the Different Estimators on Synthetic Traffic

[Figure: bias and standard deviation versus α (1.1 to 1.9), one panel per sampling rate (p = 1/10, 1/100, 1/1000), for the four estimators: scaling method, Zipf a priori method, stochastic counting method, MLE.]

Results, Performance Analysis IV: Comparison of the Different Estimators on Real Traffic

[Figure: number of flows versus flow size for the ENS Lyon trace, log-log axes.]
$i_{min} \approx 35$ (tail lower bound); $j_{min} = 7$ for p = 1/10, $j_{min} = 2$ for p = 1/100 and p = 1/1000.
MLE with p = 1 (no sampling): $\hat{\alpha} = 0.9047$ (reference value).

Bias of estimation from sampled data:

p      | Scaling | Zipf a priori (geom. mean approx.) | Stochastic counting | MLE
1/10   | 0.0814  | 0.0234                             | -0.0634             | 0.0149
1/100  | 0.3694  | 0.0888                             | -0.1997             | 0.0169
1/1000 | 0.4113  | 0.0995                             |  0.1360             | 0.0525

Conclusions and Perspectives

Conclusions:
The MLE naturally outperforms the other estimators.
The Zipf a priori method is an approximation of the MLE.
Small values of j are well taken into account; the difference between the MLE and the other estimators might be reduced when $j_{min}$ increases.

Perspectives:
Robustness against data sets that do not perfectly match the Zipf model.
Real-time perspective: study a faster algorithm that preserves the good properties of the MLE.
The method can be applied to other situations, e.g. social networks: individuals are clustered into groups of heavy-tailed distributed sizes, and only a cross-section of the population is observed.

References

[Duffield et al., SIGCOMM, 2003] Estimating flow distributions from sampled flow statistics.
[Loiseau et al., MetroGrid, 2007] A comparative study of different heavy tail index estimators of flow size from sampled data.
[Chabchoub et al., IEEE Comm. Lett., 2008] Inference of flow statistics via packet sampling in the Internet.
[Hohn, Veitch, IMC, 2003] Inverting sampled traffic.
[Loiseau et al., Sigmetrics, 2009] Maximum Likelihood Estimation of the Flow Size Distribution Tail Index from Sampled Packet Data.