Modeling Residual-Geometric Flow Sampling

Similar documents
Modeling Residual-Geometric Flow Sampling

Modeling Residual-Geometric Flow Sampling

Robust Lifetime Measurement in Large- Scale P2P Systems with Non-Stationary Arrivals

Maximum Likelihood Estimation of the Flow Size Distribution Tail Index from Sampled Packet Data

Self-Tuning the Parameter of Adaptive Non-Linear Sampling Method for Flow Statistics

Min Congestion Control for High- Speed Heterogeneous Networks. JetMax: Scalable Max-Min

Analysis of Rate-distortion Functions and Congestion Control in Scalable Internet Video Streaming

Towards a Universal Sketch for Origin-Destination Network Measurements

Probabilistic Near-Duplicate. Detection Using Simhash

Estimation of DNS Source and Cache Dynamics under Interval-Censored Age Sampling

Statistical Data Analysis

Temporal Multi-View Inconsistency Detection for Network Traffic Analysis

Modeling Randomized Data Streams in Caching, Data Processing, and Crawling Applications

Small active counters

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018

On Delay-Independent Diagonal Stability of Max-Min Min Congestion Control

PIQI-RCP: Design and Analysis of Rate-Based Explicit Congestion Control

Charging from Sampled Network Usage

Negative Association, Ordering and Convergence of Resampling Methods

Classic Network Measurements meets Software Defined Networking

Chapter 2 Per-Flow Size Estimators

Unsupervised Anomaly Detection for High Dimensional Data

Network Performance Tomography

Shortest Paths & Link Weight Structure in Networks

Confident Estimation for Multistage Measurement Sampling and Aggregation

SAMPLING AND INVERSION

Stochastic Network Calculus

Sample Based Estimation of Network Traffic Flow Characteristics

Router-Based Algorithms for Improving Internet Quality of Service

Statistical Analysis and Distortion Modeling of MPEG-4 FGS

Gaussian Traffic Revisited

Yale University Department of Computer Science

DETECTION OF DANGEROUS MAGNETIC FIELD RANGES FROM TABLETS BY CLUSTERING ANALYSIS

Inference For High Dimensional M-estimates. Fixed Design Results

A Comparative Study of Two Network-based Anomaly Detection Methods

High-Rank Matrix Completion and Subspace Clustering with Missing Data

Methods for sparse analysis of high-dimensional data, II

An In-Depth, Analytical Study of Sampling Techniques For Self-Similar Internet Traffic

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80

An Early Traffic Sampling Algorithm

ECEN 689 Special Topics in Data Science for Communications Networks

End-biased Samples for Join Cardinality Estimation

Methods for sparse analysis of high-dimensional data, II

Ping Li Compressed Counting, Data Streams March 2009 DIMACS Compressed Counting. Ping Li

Active Measurement for Multiple Link Failures Diagnosis in IP Networks

Ad Placement Strategies

Multivariate random variables

Lecture Notes 1: Vector spaces

Don t Let The Negatives Bring You Down: Sampling from Streams of Signed Updates

Solving Corrupted Quadratic Equations, Provably

Local and Global Stability of Symmetric Heterogeneously-Delayed Control Systems

Information theoretic perspectives on learning algorithms

On Superposition of Heterogeneous Edge Processes in Dynamic Random Graphs

Interactive Communication for Data Exchange

Machine Learning Approaches to Network Anomaly Detection

Wavelet and Time-Domain Modeling of Multi-Layer VBR Video Traffic

sparse and low-rank tensor recovery Cubic-Sketching

Consistency of the maximum likelihood estimator for general hidden Markov models

Distributed Scheduling for Achieving Multi-User Diversity (Capacity of Opportunistic Scheduling in Heterogeneous Networks)

Statistics: Learning models from data

Reorganized and Compact DFA for Efficient Regular Expression Matching

Inverting Sampled Traffic

High-Rank Matrix Completion and Subspace Clustering with Missing Data

Lecture 1 September 3, 2013

Sampling and Censoring in Estimation of Flow Distributions

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y.

Very Sparse Random Projections

TWO PROBLEMS IN NETWORK PROBING

y(x) = x w + ε(x), (1)

Continuous-time hidden Markov models for network performance evaluation

Robust Lifetime Measurement in Large-Scale P2P Systems with Non-Stationary Arrivals

On the Number of Products to Represent Interval Functions by SOPs with Four-Valued Variables

Quantile Regression for Extraordinarily Large Data

Lecture 11: Continuous-valued signals and differential entropy

Packet Loss Analysis of Load-Balancing Switch with ON/OFF Input Processes

An Overview of Traffic Matrix Estimation Methods

Internet Traffic Mid-Term Forecasting: a Pragmatic Approach using Statistical Analysis Tools

Inference with Multivariate Heavy-Tails in Linear Models

Northwestern University Department of Electrical Engineering and Computer Science

How Many Leaders Are Needed for Reaching a Consensus

Robust Testing and Variable Selection for High-Dimensional Time Series

Lecture 8: Information Theory and Statistics

Impact of traffic mix on caching performance

Discrete Random Variables

Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation

Brownian survival and Lifshitz tail in perturbed lattice disorder

Review. December 4 th, Review

Performance Evaluation and Service Rate Provisioning for a Queue with Fractional Brownian Input

Priority sampling for estimation of arbitrary subset sums

Introduction to statistics

Data Rate Theorem for Stabilization over Time-Varying Feedback Channels

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Internet Traffic Modeling and Its Implications to Network Performance and Control

Optimal Combination of Sampled Network Measurements

Design of Probing Experiments for Loss Tomography

The weighted spectral distribution; A graph metric with applications. Dr. Damien Fay. SRG group, Computer Lab, University of Cambridge.

This exam contains 6 questions. The questions are of equal weight. Print your name at the top of this page in the upper right hand corner.

STATS 306B: Unsupervised Learning Spring Lecture 2 April 2

Transcription:

Modeling Residual-Geometric Flow Sampling Xiaoming Wang Joint work with Xiaoyong Li and Dmitri Loguinov Amazon.com Inc., Seattle, WA April 13 th, 2011 1

Agenda Introduction Underlying model of residual sampling Analysis of existing estimators Proposal of new estimators Performance evaluation Conclusion 2

Introduction Traffic monitoring is an important topic for today s Internet Security, accounting, traffic engineering It has become challenging as Internet grew in scale and complexity In this talk, we focus on two problems in the general area of measuring flow sizes Determining the number of packets of elephant flows Recovering the distribution of flow sizes 3

Related Work Packet sampling Sampled NetFlow (Cisco) Adaptive NetFlow (Estan, SIGCOMM 04) Sketch-guided sampling (Kumar, INFOCOM 06) Adaptive non-linear sampling (Hu, INFOCOM 08) Flow sampling Sample-and-hold (Estan, SIGCOMM 02) Flow thinning (Hohn, IMC 03) Smart sampling (Duffield, IMC 03/SIGMETRICS 03) Flow slicing (Kompella, IMC 05) 4

Analysis of Underlying Model Our talk is based on the sampling method proposed by sample-and-hold (Estan, SIGCOMM 02) We call this method by Residual-Geometric Sampling (RGS) due to two reasons: This belongs to the class of residual-sampling techniques (Wang, INFOCOM 07/P2P 09) It can be modeled by a geometric process Our analysis of RGS covers two goals: Providing a unifying analytical model Understanding the properties of samples it collects 5

Analysis of Underlying Model 2 How does RGS work? For a sequence of packets traversing a router, it checks each packet s flow id x in some RAM table If x is found, its counter is incremented by 1 Otherwise, an entry is created for x with probability p and this packet is discarded with probability 1 p The state of a flow can be modeled by a simple geometric process 1 p Not Sampled p Sampled 1 6

Analysis of Underlying Model 3 We need several definitions: Assume that flow sizes are i.i.d Given a random flow with size L, define geometric age A L the number of packets discarded from the front Define geometric residual R L the final counter value A flow of size 9 is not sampled until the 4 th packet Packets from a flow Age A L =3 Flow size L=9 Residual R L =6 7

Analysis of Underlying Model 4 Assume flow size L has a PMF f i : Lemma 1: Probability p s of a flow being selected by RGS is: Lemma 2: PMF h i of geometric residual R L can be expressed as: 8

Agenda Introduction Underlying model of residual sampling Analysis of existing estimators Proposal of new estimators Performance evaluation Conclusion 9

Previous Method Single-Flow Usage Prior work on RGS (Estan, SIGCOMM 02) suggested following estimator of single-flow size: Theorem 1: For given size l, the expected value of estimator e(r l ) is: It tends to overestimate the original flow size by a factor of up to 1/p 10

Simulations Estimated Size p=0.01 p=0.001 11

Previous Method Single-Flow Usage 2 Quantifying the error of individual values e(r l ) in estimating flow size l Relative Root Mean Square Error (RRMSE) where is relative error Theorem 2: RRMSE of the existing RGS estimator: 12

Simulations Relative RMSE p=0.01 p=0.001 13

Previous Method Flow-Size Distribution Consider PMF q i of e(r L ) and compare it with f i Theorem 3: PMF of flow sizes estimated from e(r L ) is: where p s is the probability of a flow is selected and The estimated distribution q i is quite different from actual f i 14

Simulations Flow Size Distribution p=0.01 p=0.001 15

Agenda Introduction Underlying model of residual sampling Analysis of existing estimators Proposal of new estimators Performance evaluation Conclusion 16

URGE Single-Flow Usage For single-flow size, we propose following estimator: same different Lemma 3: is unbiased for any flow size l Theorem 4: RRMSE of as: 17

URGE Simulations Single-Flow Usage p=0.01 p=0.001 18

URGE Simulations Relative Error p=0.01 p=0.001 19

URGE Flow Size Distribution Lemma 5: The flow size distribution f i can be expressed using PMF of geometric residual h i as: For flow size distribution, we propose following estimator: Corollary 2: Estimator is asymptotically unbiased, that is, converges in probability to f i as the number M of sampled flows 20

URGE Simulations Flow Size Dist. p=0.01 p=0.001 21

URGE Convergence We next examine the effect of sample size M on the convergence of estimator p=10-4, M=3,090 p=10-5, M=337 22

URGE Convergence 2 Theorem 5: For small constants η and ξ with probability 1 ξ, following holds for j [1, i+1] if sample size M is no less than: where Φ(x) is the CDF of the standard Gaussian distribution N(0,1) 23

Performance Evaluation We applied our estimation algorithm to the traces collected by NLANR and CAIDA All of them confirm the accuracy of URGE As example, we show our experiment on dataset FRG, collected from a gigabit link between UCSD and Abilene 24

Performance Evaluation Usage Previous method URGE 25

Performance Evaluation Distribution p=0.01 p=0.0001 p=0.001 26

Conclusion We proposed a novel modeling framework for analyzing residual sampling Proved that previous estimators based on RGS had certain bias We also developed a novel set of unbiased estimators Verified them both in simulations and on Internet traces Results show that the proposed method provides an accurate and scalable solution to Internet traffic monitoring 27

The End Thanks! 28