Network Simulation Chapter 5: Traffic Modeling. Chapter Overview


Network Simulation
Chapter 5: Traffic Modeling
Prof. Dr. Jürgen Jasperneite

Chapter Overview
  1. Basic Simulation Modeling
  2. OPNET IT Guru - A Tool for Discrete Event Simulation
  3. Review of Basic Probabilities and Statistics
  4. Building valid, credible Simulation Models
  5. Traffic Modeling
  6. Output Data Analysis

Chapter Overview
  1. Basic Simulation Modeling
  2. OMNeT++ - A Tool for Discrete Event Simulation
  3. Review of Basic Probabilities and Statistics
  4. Building valid, credible Simulation Models
  5. Traffic Modeling
  6. Output Data Analysis

Overview
  - Introduction
  - Quantifying models
  - Goodness of fit tests

Introduction
  [Figure: block diagram with the labels "load parameter", "system parameter", "workload", "traffic source", "system under study", and "metrics"]

Introduction
  - Part of modeling: deciding which input probability distributions to use as input to the simulation, e.g., for interarrival times, message lengths, message types
  - Characterization of traffic is very important
  - Results of a simulation are only as good as the input
    -> Inappropriate input distribution(s) can lead to incorrect output and bad decisions
  - Many different methods are used to generate traffic sources
  - Each method has advantages and disadvantages in terms of
    - Development time
    - Flexibility
    - Accuracy

Introduction
  Traffic categories include:
  - Statistical sources
    - Exponentially distributed interarrival (IA) times
    - ON-OFF
  - Network applications
    - FTP
    - HTTP
    - Voice
    - Video
    - ...
  - Captured packet traces (trace-driven simulation)

Simple Statistical Distributions
  Statistical distributions are commonly used in performance analysis:
  - Poisson (application traffic, interarrival times)
  - Normal (packet sizes)
  - Uniform (destination addresses)
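As an illustration of the simplest statistical source listed above, the following minimal sketch generates packet arrivals with exponentially distributed interarrival times and uniformly chosen destination addresses. The function name `generate_arrivals`, the rate of 100 packets per second, and the 8 destinations are assumptions made for this example only; they do not come from the slides.

```python
import random

def generate_arrivals(rate, count, num_destinations, rng):
    """Return (arrival_time, destination) pairs for a simple statistical source.

    rate, count and num_destinations are illustrative parameters, not values
    taken from the lecture.
    """
    t = 0.0
    packets = []
    for _ in range(count):
        t += rng.expovariate(rate)               # exponential interarrival time
        dest = rng.randrange(num_destinations)   # uniform destination address
        packets.append((t, dest))
    return packets

rng = random.Random(1)  # fixed seed so the run is repeatable
packets = generate_arrivals(rate=100.0, count=10_000, num_destinations=8, rng=rng)
mean_ia = packets[-1][0] / len(packets)
print(f"mean interarrival time: {mean_ia:.5f} s (expected about {1 / 100.0:.5f} s)")
```

Such a source is quick to develop but produces independent interarrival times, so it cannot reproduce any dependence present in real traffic.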

Overview
  - Introduction
  - Quantifying models
  - Goodness of fit tests

Introduction
  Usually, observed data on the input quantities are available. Options for using them:

  Trace-driven: use the actual data values to drive the simulation
    Pros: valid vis-à-vis the real world; direct
    Cons: not generalizable

  Empirical distribution: use the data values to define a "connect-the-dots" distribution
    Pros: fairly valid; simple; fairly direct
    Cons: may limit the range of generated variates (depending on form)

  Fitted standard distribution: use the data to fit a classical distribution (exponential, uniform, Poisson, etc.)
    Pros: generalizable; fills in "holes" in the data
    Cons: may not be valid; may be difficult

Extracting distributions out of traces
  - How to overcome the finiteness of a trace? How to characterize a trace in general?
  - Consider a trace as a set $\{X_1, \dots, X_n\}$ of individual values
    - Assumption: all samples come from the same distribution
  - Construct the empirical distribution function from this set
    - Sort the $\{X_1, \dots, X_n\}$ in increasing order such that $X_{(1)} \le \dots \le X_{(n)}$
    - Define a piecewise-linear distribution function:

\[
F(x) =
\begin{cases}
0 & \text{if } x < X_{(1)} \\[4pt]
\dfrac{i-1}{n-1} + \dfrac{x - X_{(i)}}{(n-1)\,\bigl(X_{(i+1)} - X_{(i)}\bigr)} & \text{if } X_{(i)} \le x < X_{(i+1)} \text{ for } i = 1, \dots, n-1 \\[4pt]
1 & \text{if } X_{(n)} \le x
\end{cases}
\]

Empirical distribution - example
  - The figure shows an empirical distribution function for six data points
  [Figure: piecewise-linear empirical distribution; y-axis F(x) from 0 to 1, x-axis from 0 to 25 with the sorted data points X_(1), ..., X_(6) marked]
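The piecewise-linear definition above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the six data values are hypothetical (the values behind the figure are not given), and the sampler uses plain inverse-transform sampling on the piecewise-linear function.

```python
import bisect
import random

def empirical_cdf(sorted_x, x):
    """Piecewise-linear empirical distribution function F(x) for sorted data."""
    n = len(sorted_x)
    if x < sorted_x[0]:
        return 0.0
    if x >= sorted_x[-1]:
        return 1.0
    # i is the 1-based index with X_(i) <= x < X_(i+1)
    i = bisect.bisect_right(sorted_x, x)
    lo, hi = sorted_x[i - 1], sorted_x[i]
    return (i - 1) / (n - 1) + (x - lo) / ((n - 1) * (hi - lo))

def sample_empirical(sorted_x, rng):
    """Draw one variate by inverting the piecewise-linear F."""
    n = len(sorted_x)
    p = (n - 1) * rng.random()   # position in [0, n-1)
    i = int(p)                   # 0-based index of the segment that is hit
    return sorted_x[i] + (p - i) * (sorted_x[i + 1] - sorted_x[i])

# Hypothetical trace of six values (assumption for illustration only).
data = sorted([12.0, 2.0, 24.0, 9.0, 5.0, 18.0])
rng = random.Random(0)
samples = [sample_empirical(data, rng) for _ in range(1000)]
print(empirical_cdf(data, 7.0), min(samples), max(samples))
```

Note that every generated value lies between the smallest and the largest observation, which is exactly the range limitation of empirical distributions.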

Empirical distributions: Discussion
  - For realistic sample sizes, few or no data are available for the tail of a distribution
  - Empirical distributions as defined above cannot generate values larger than the maximum observed value $\max_j X_j$, although such values might be desirable
  - Adding an exponential tail to the data is possible and often useful

Traces vs. empirical distributions
  - Going from traces to empirical distributions seems quite attractive
    - An unlimited number of samples can easily be generated
  - Is there a downside?
  - Example: suppose you want to use the waiting time of customers in a queue as an input to some other simulation model
    - Trace-driven: generate a long list of individual customer waiting times (either by measurement or by simulation), store this list, and whenever a waiting time is needed, use one entry of this list
    - Empirical distribution function: take the list, compute an empirical distribution, and generate a random variate whenever a waiting time is required

Traces vs. empirical distributions: Difference?
  - Generating random variates from a distribution happens one at a time; no information about previous values is stored
    - All values are identically distributed (they come from the same distribution) and are independent of each other
    - The corresponding random variables are called IID (independent and identically distributed) variables
  - In a trace, the history of the system (how the values were generated) is still maintained, though implicitly
    - Such history can result in a mutual dependence of values
    - Consider a queue: when the person before you has to wait long, it is quite likely that you will also have to wait long
    - Waiting times in a queue are positively correlated
  - The correlation structure of traces is destroyed by simulation using empirical distributions!

Traces vs. empirical distributions: Example
  - Consider the waiting times in an M/M/1 queue
  - Compute the empirical distribution from one simulation run
  - Use this empirical distribution to generate random numbers according to it
  - The plot shows both distributions; they are reasonably similar
    - We will see what "reasonable" means shortly
  [Figure: cumulative distribution function vs. waiting time (0 to 40); curves: "Empirical distribution" and "Randomly generated according to empirical distribution"]

Traces vs. empirical distributions: Example
  - But: look at the autocorrelation of the two sets of numbers (measured from the simulation and randomly generated)
  - The autocorrelation at lag $j$ is the normalized autocovariance:

\[
\rho_{i,i+j} = \frac{C_{i,i+j}}{\sqrt{\sigma_i^2\,\sigma_{i+j}^2}},
\qquad
C_{i,i+j} = E\bigl[(X_i - \mu_i)(X_{i+j} - \mu_{i+j})\bigr]
\]

  - Compare the two curves in the graph
    - Note the slowly decaying autocorrelation for the simulated/measured data
    - Randomly generated data is practically uncorrelated
  - Generating random numbers in a direct fashion destroys the correlation structure
  [Figure: autocorrelation (-0.2 to 1) vs. lag j (1 to 11); curves: "Randomly generated data" and "Simulated/Measured data"]

Generalizing empirical distributions
  - Empirical distributions are essentially a big set of data points
    - Unwieldy, big description
    - Generating random numbers based on such an empirical distribution is quite time-consuming
    - We will soon see how
  - Is it possible to have a more compact representation?
  - Yes: look for an analytically described (closed-form) distribution function that matches the empirical distribution function!
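The autocorrelation comparison for M/M/1 waiting times can be reproduced with a short experiment. The sketch below is written under several assumptions that are not in the slides: the arrival rate 0.8 and service rate 1.0, the use of the Lindley recursion to produce the waiting-time trace, and resampling with replacement from the trace (the step-function empirical distribution rather than the piecewise-linear one) as the "randomly generated" data set.

```python
import random

def mm1_waiting_times(arrival_rate, service_rate, count, rng):
    """Waiting times in queue of successive customers (Lindley recursion)."""
    waits = []
    w = 0.0
    for _ in range(count):
        waits.append(w)
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        # The next customer waits for the remaining work, if any is left.
        w = max(0.0, w + service - interarrival)
    return waits

def autocorrelation(xs, lag):
    """Sample autocorrelation at the given lag, assuming a stationary series."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

rng = random.Random(42)
trace = mm1_waiting_times(0.8, 1.0, 20_000, rng)   # assumed rates
resampled = rng.choices(trace, k=len(trace))       # IID draws from the trace values

for lag in (1, 5, 10):
    print(lag, round(autocorrelation(trace, lag), 3),
          round(autocorrelation(resampled, lag), 3))
```

Both series share the same marginal distribution, yet only the trace retains the positive correlation between successive waiting times.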

Fitting empirical distributions
  To replace an empirical distribution by an analytically described distribution, the following steps are required:
  - Find an analytical distribution that fits the overall shape of the empirical distribution
  - As an analytical distribution is usually parameterized, find appropriate values for these parameters
  - Determine the quality of the fit

Finding proper families of distributions
  - To find a suitable family of analytical distributions, prior knowledge about the underlying empirical distribution is often available
    - E.g., certain assumptions about the arrival process directly result in Poisson distributions, etc.
  - Negative selection is also possible:
    - Some values have natural upper or lower bounds
    - E.g., values that can only be positive should not be modeled with distributions that can take on negative values

Heuristics to choose distributions
  - How to choose distributions to fit data when no prior knowledge is available?
  - Some heuristics exist
    - Summary statistics
    - Histogram
  - Note that most of these heuristics (as well as the procedures to check the quality of a fit) require the underlying data (from which the empirical distribution has been generated) to be independent
    - One way to check independence is the autocovariance

Summary statistics
  - Compute summary statistics such as mean, median, variance, coefficient of variation, or skewness (a measure of symmetry) from the original sample
  - Compare these results with the properties of a candidate distribution
    - E.g., for symmetric distributions, mean and median are equal
    - For some distributions, the coefficient of variation must be smaller than 1 or equal to 1 (exponential distribution)
  - This is mainly a means to quickly weed out inappropriate distributions from a large set of possible ones
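The summary statistics named above can be computed with the Python standard library. In this sketch the helper name `summary_statistics`, the moment-based skewness estimator, and the synthetic exponential sample (rate 2.0) are assumptions chosen for illustration.

```python
import random
import statistics

def summary_statistics(sample):
    """Mean, median, variance, coefficient of variation and skewness."""
    n = len(sample)
    mean = statistics.fmean(sample)
    median = statistics.median(sample)
    variance = statistics.variance(sample, mean)   # unbiased sample variance
    # Moment-based skewness m3 / m2^(3/2); one of several common estimators.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    return {
        "mean": mean,
        "median": median,
        "variance": variance,
        "cv": variance ** 0.5 / mean,
        "skewness": m3 / m2 ** 1.5,
    }

rng = random.Random(7)
exp_sample = [rng.expovariate(2.0) for _ in range(20_000)]  # synthetic data
stats = summary_statistics(exp_sample)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

For this exponential sample the coefficient of variation comes out close to 1 and the mean exceeds the median, so a symmetric family such as the normal distribution could be ruled out quickly.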

Histograms
  - Compute a histogram of the original data
    - Typically, equidistant buckets are useful
  - Compare the shape of the histogram with that of the density of candidate distributions
    - Many shapes are quite characteristic and easily recognized
    - Ignore differences in location and scale
  - How to choose the width/number $k$ of buckets?
    - Sturges's rule: $k = 1 + \log_2 n$, where $n$ is the number of data points
    - Better: rely on the visual impression. Aim for a smooth shape, with buckets neither too wide (detail is lost, spikes at crucial points could be missed) nor too narrow (small differences are overemphasized)
  - A histogram can often indicate whether the density is the sum of two individual densities

Histograms: Example of a multimodal distribution
  - The histogram shows data traffic between a programmable logic controller (PLC) and a human-machine interface (HMI) [1]
  - Result of a keep-alive function that exchanges packets every 5 s

  [1] Jasperneite, Jürgen: Analyse und Modellierung von Kommunikationslasten in der Fertigungstechnik. In: at - Automatisierungstechnik 49, R. Oldenbourg Verlag, pp. 206-213, Apr 2001.
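A minimal sketch of Sturges's rule and an equidistant-bucket histogram follows. Rounding $1 + \log_2 n$ down to an integer is an assumption (the rule itself yields a real number), and the bimodal test sample is synthetic, not the PLC/HMI data.

```python
import math
import random

def sturges_buckets(n):
    """Number of buckets k = 1 + log2(n), rounded down (rounding is an assumption)."""
    return math.floor(1 + math.log2(n))

def histogram(data, k):
    """Counts per bucket for k equidistant buckets spanning [min, max]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The maximum falls on the right edge; put it into the last bucket.
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    return counts

rng = random.Random(5)
# Synthetic bimodal sample: mixture of two normal densities (illustration only).
data = [rng.gauss(2.0, 0.3) for _ in range(100)] + [rng.gauss(4.5, 0.4) for _ in range(172)]
k = sturges_buckets(len(data))
for count in histogram(data, k):
    print("#" * count)
```

Printing the counts as bars gives a quick visual check of whether the density looks like a sum of two individual densities.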

Overview
  - Introduction
  - Quantifying models
  - Goodness of fit tests

Goodness of fit tests
  - Given a hypothesized distribution along with estimated parameters, how can we tell how well this hypothesis matches the real data?
  - Heuristic procedures
    - Density/histogram overplots: plot both the empirical histogram and the estimated density function in one graph and look for differences
    - Frequency comparison: plot the empirical histogram and the calculated histogram side by side and look for differences
    - Distribution function difference plot: compute the difference between the empirical and the estimated distribution and plot this difference. Ideally, the result is a horizontal line at 0
      - Directly comparing two plots of distributions is difficult for most humans
    - Probability/quantile plots: see below
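The distribution function difference heuristic can be sketched as follows. The assumptions here are mine: the sample is synthetic exponential data, the fitted model is an exponential distribution with the rate estimated as 1/mean, and the empirical distribution is evaluated as the step function i/n at the sorted data points.

```python
import math
import random

def cdf_difference(sample, fitted_cdf):
    """Pairs (x, F_empirical(x) - F_fitted(x)) at the sorted data points."""
    xs = sorted(sample)
    n = len(xs)
    return [(x, (i + 1) / n - fitted_cdf(x)) for i, x in enumerate(xs)]

rng = random.Random(3)
sample = [rng.expovariate(0.5) for _ in range(5000)]   # synthetic data

# Fit an exponential distribution by matching the sample mean.
rate_hat = 1.0 / (sum(sample) / len(sample))
diffs = cdf_difference(sample, lambda x: 1.0 - math.exp(-rate_hat * x))

max_abs = max(abs(d) for _, d in diffs)
print(f"estimated rate: {rate_hat:.3f}, largest absolute difference: {max_abs:.4f}")
```

Plotting the second component of `diffs` against the first gives the difference plot; for a good fit the points stay close to the horizontal line at 0.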

QQ-Plots
  - A way of plotting the difference between two distributions: Q-Q plots
  - A quantile is the variable value that corresponds to a fixed cumulative frequency
    - First quartile = 0.25 quantile
    - Second quartile = median = 0.5 quantile
    - Third quartile = 0.75 quantile
  - Any quantile can be read from the cdf plot

QQ-Plot
  A QQ-plot ...
  - compares two univariate (*) distributions
  - is a plot of matching quantiles -> a straight line implies that the two distributions have the same shape
  - has the units of the data
  - emphasizes differences in the tails
  (*) Involving one variable, as opposed to two (bivariate) or many (multivariate)
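As a small illustration of the quartile definitions above, the Python standard library can compute the three quartiles of a sample directly. The seven data values are hypothetical, and `statistics.quantiles` uses its default interpolation method, which is only one of several conventions for sample quantiles.

```python
import statistics

# Hypothetical sample (assumption for illustration only).
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 cuts the data into four groups, i.e. returns the 0.25, 0.5 and 0.75 quantiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
print("first quartile:", q1)
print("median:", q2)
print("third quartile:", q3)
```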

Example: QQ-Plot
  [Figure: Q-Q plot with six labeled points; axes: Sample and Normal]

Example: Old Faithful eruption durations
  Data describing the durations of eruptions of the Old Faithful geyser (in minutes):
  3.600,1.800,3.333,2.283,4.533,2.883,4.700,3.600,1.950,4.350,1.833,3.917,4.200,1.750,4.700,
  2.167,1.750,4.800,1.600,4.250,1.800,1.750,3.450,3.067,4.533,3.600,1.967,4.083,3.850,
  4.433,4.300,4.467,3.367,4.033,3.833,2.017,1.867,4.833,1.833,4.783,4.350,1.883,4.567,
  1.750,4.533,3.317,3.833,2.100,4.633,2.000,4.800,4.716,1.833,4.833,1.733,4.883,3.717,1.667,
  4.567,4.317,2.233,4.500,1.750,4.800,1.817,4.400,4.167,4.700,2.067,4.700,4.033,1.967,
  4.500,4.000,1.983,5.067,2.017,4.567,3.883,3.600,4.133,4.333,4.100,2.633,4.067,4.933,3.950,
  4.517,2.167,4.000,2.200,4.333,1.867,4.817,1.833,4.300,4.667,3.750,1.867,4.900,2.483,
  4.367,2.100,4.500,4.050,1.867,4.700,1.783,4.850,3.683,4.733,2.300,4.900,4.417,1.700,
  4.633,2.317,4.600,1.817,4.417,2.617,4.067,4.250,1.967,4.600,3.767,1.917,4.500,2.267,4.650,
  1.867,4.167,2.800,4.333,1.833,4.383,1.883,4.933,2.033,3.733,4.233,2.233,4.533,4.817,
  4.333,1.983,4.633,2.017,5.100,1.800,5.033,4.000,2.400,4.600,3.567,4.000,4.500,4.083,
  1.800,3.967,2.200,4.150,2.000,3.833,3.500,4.583,2.367,5.000,1.933,4.617,1.917,2.083,4.583,
  3.333,4.167,4.333,4.500,2.417,4.000,4.167,1.883,4.583,4.250,3.767,2.033,4.433,4.083,
  1.833,4.417,2.183,4.800,1.833,4.800,4.100,3.966,4.233,3.500,4.366,2.250,4.667,2.100,
  4.350,4.133,1.867,4.600,1.783,4.367,3.850,1.933,4.500,2.383,4.700,1.867,3.833,3.417,4.233,
  2.400,4.800,2.000,4.150,1.867,4.267,1.750,4.483,4.000,4.117,4.083,4.267,3.917,4.550,
  4.083,2.417,4.183,2.217,4.450,1.883,1.850,4.283,3.950,2.333,4.150,2.350,4.933,2.900,
  4.583,3.833,2.083,4.367,2.133,4.350,2.200,4.450,3.567,4.500,4.150,3.817,3.917,4.450,2.000,
  4.283,4.767,4.533,1.850,4.250,1.983,2.250,4.750,4.117,2.150,4.417,1.817,4.467

Histogram of eruption data
  [Figure: "Histogram of eruptions"; x-axis: eruptions (1.5 to 5.0), y-axis: density (0.0 to 0.7)]

Empirical distribution of eruption data
  [Figure: "ecdf(eruptions)"; x-axis: x (2 to 5), y-axis: Fn(x) (0.0 to 1.0)]
  - The data are evidently bimodal -> no standard distribution will fit
  - What about looking at only, e.g., the upper part?

Restricted empirical distribution
  [Figure: "ecdf(long)"; x-axis: x (3.0 to 5.0), y-axis: Fn(x) (0.0 to 1.0)]
  - Looks like a reasonable fit with a normal distribution
  - Check with a Q-Q plot!

Q-Q plot for eruption data
  [Figure: "Normal Q-Q Plot"; x-axis: theoretical quantiles (-2 to 2), y-axis: sample quantiles (3.0 to 5.0)]
  - Reasonable fit, but some differences in the tail
  - The shifted mean for the theoretical quantiles is not taken into account
  - Example taken from the R manual (see the web page www.r-project.org)
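The coordinates of such a normal Q-Q plot can be computed without a plotting library. This sketch rests on several assumptions: only the first 30 values of the eruption data listed earlier are used to keep the example short, "long" eruptions are taken to be those above 3 minutes (the slides only show that the restricted plot starts at 3.0), and the plotting positions (i - 0.5)/n are one common convention rather than necessarily the one used for the figure.

```python
from statistics import NormalDist

def normal_qq_pairs(sample):
    """(theoretical standard-normal quantile, sample quantile) pairs."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()   # mean 0, standard deviation 1
    # i runs from 0, so (i + 0.5) / n equals the plotting position (i - 0.5) / n
    # for the 1-based index used in the text.
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

# First 30 values of the eruption data from the slides.
eruptions = [
    3.600, 1.800, 3.333, 2.283, 4.533, 2.883, 4.700, 3.600, 1.950, 4.350,
    1.833, 3.917, 4.200, 1.750, 4.700, 2.167, 1.750, 4.800, 1.600, 4.250,
    1.800, 1.750, 3.450, 3.067, 4.533, 3.600, 1.967, 4.083, 3.850, 4.433,
]
long_eruptions = [x for x in eruptions if x > 3.0]   # assumed threshold

pairs = normal_qq_pairs(long_eruptions)
for theoretical, observed in pairs:
    print(f"{theoretical:7.3f}  {observed:.3f}")
```

If the points lie close to a straight line, the normal distribution is a plausible model for the long eruptions; the slope and intercept of that line reflect the standard deviation and mean, which is why the shifted mean does not show up on the standard-normal axis.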

Traffic Modeling
  - Introduction
  - Quantifying models
  - Goodness of fit tests