EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 19

Similar documents
Outline. Simulation of a Single-Server Queueing System. EEC 686/785 Modeling & Performance Evaluation of Computer Systems.

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 11

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 18

2 k, 2 k r and 2 k-p Factorial Designs

CPSC 531: Random Numbers. Jonathan Hudson Department of Computer Science University of Calgary

CS 5014: Research Methods in Computer Science. Experimental Design. Potential Pitfalls. One-Factor (Again) Clifford A. Shaffer.

Fractional Factorial Designs

Glossary availability cellular manufacturing closed queueing network coefficient of variation (CV) conditional probability CONWIP

CS 5014: Research Methods in Computer Science

14 Random Variables and Simulation

Network Simulation Chapter 6: Output Data Analysis

Notation Precedence Diagram

One Factor Experiments

Uniform random numbers generators

Simulation. Where real stuff starts

Contents LIST OF TABLES... LIST OF FIGURES... xvii. LIST OF LISTINGS... xxi PREFACE. ...xxiii

CS 147: Computer Systems Performance Analysis

Discrete-Event System Simulation

CPSC 531 Systems Modeling and Simulation FINAL EXAM

UNIT 5:Random number generation And Variation Generation

Analyzing Simulation Results

Summarizing Measured Data

Simulation. Where real stuff starts

Overall Plan of Simulation and Modeling I. Chapters

Two-Factor Full Factorial Design with Replications

CPSC 531: System Modeling and Simulation. Carey Williamson Department of Computer Science University of Calgary Fall 2017

Systems Simulation Chapter 7: Random-Number Generation

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

2 k Factorial Designs Raj Jain

IE 303 Discrete-Event Simulation L E C T U R E 6 : R A N D O M N U M B E R G E N E R A T I O N

Chapter Learning Objectives. Probability Distributions and Probability Density Functions. Continuous Random Variables

Random Number Generation. CS1538: Introduction to simulations

Exploring Monte Carlo Methods

Stochastic Simulation of

Analysis of Variance and Design of Experiments-I

Queueing Theory and Simulation. Introduction

Part I Stochastic variables and Markov chains

Chapter 2 Queueing Theory and Simulation

Statistical Methods in Particle Physics

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.

Learning Objectives for Stat 225

CS 700: Quantitative Methods & Experimental Design in Computer Science

Probability Distributions Columns (a) through (d)

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)

APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 651 APPENDIX B. BIBLIOGRAPHY 677 APPENDIX C. ANSWERS TO SELECTED EXERCISES 679

Two Factor Full Factorial Design with Replications

Tables Table A Table B Table C Table D Table E 675

Statistics For Economics & Business

Distribution Fitting (Censored Data)

If we have many sets of populations, we may compare the means of populations in each set with one experiment.

Convolution Algorithm

Stat 5101 Lecture Notes

Independent Events. Two events are independent if knowing that one occurs does not change the probability of the other occurring

2008 Winton. Statistical Testing of RNGs

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations

Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama

2 k Factorial Designs Raj Jain Washington University in Saint Louis Saint Louis, MO These slides are available on-line at:

Stochastic Simulation of Communication Networks

2WB05 Simulation Lecture 7: Output analysis

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

Lecture 2: Repetition of probability theory and statistics

HANDBOOK OF APPLICABLE MATHEMATICS

Factorial designs. Experiments

( ) ( ) Monte Carlo Methods Interested in. E f X = f x d x. Examples:

Modeling and Performance Analysis with Discrete-Event Simulation

HANDBOOK OF APPLICABLE MATHEMATICS

Sources of randomness

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.

EE/CpE 345. Modeling and Simulation. Fall Class 5 September 30, 2002

n i n T Note: You can use the fact that t(.975; 10) = 2.228, t(.95; 10) = 1.813, t(.975; 12) = 2.179, t(.95; 12) =

IE 581 Introduction to Stochastic Simulation

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.

ISyE 6644 Fall 2014 Test 3 Solutions

Plotting data is one method for selecting a probability distribution. The following

Two or more categorical predictors. 2.1 Two fixed effects

Department of Electrical- and Information Technology. ETS061 Lecture 3, Verification, Validation and Input

ACCOUNTING FOR INPUT-MODEL AND INPUT-PARAMETER UNCERTAINTIES IN SIMULATION. < May 22, 2006

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

Dr. Maddah ENMG 617 EM Statistics 10/15/12. Nonparametric Statistics (2) (Goodness of fit tests)

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

B.N.Bandodkar College of Science, Thane. Random-Number Generation. Mrs M.J.Gholba

Ch. 1: Data and Distributions

Probability Models in Electrical and Computer Engineering Mathematical models as tools in analysis and design Deterministic models Probability models

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya

Lecturer: Olga Galinina

Summarizing Measured Data

Practice Problems Section Problems

Statistics Toolbox 6. Apply statistical algorithms and probability models

Randomized Algorithms

Slides 3: Random Numbers

Random variable X is a mapping that maps each outcome s in the sample space to a unique real number x, x. X s. Real Line

Statistical Methods in HYDROLOGY CHARLES T. HAAN. The Iowa State University Press / Ames

Sociology 6Z03 Review II

Fundamentals of Applied Probability and Random Processes

2008 Winton. Review of Statistical Terminology

ISyE 6644 Fall 2014 Test #2 Solutions (revised 11/7/16)

Computer Systems Modelling

Errata and Updates for ASM Exam MAS-I (First Edition) Sorted by Date

Transcription:

EEC 686/785 Modeling & Performance Evaluation of Computer Systems Lecture 19 Department of Electrical and Computer Engineering Cleveland State University wenbing@ieee.org (based on Dr. Raj Jain s lecture notes) Outline Simulation of a Single-Server Queueing System Review of midterm # 1

Simulation of a Single-Server Queueing System 3 Simulation model Performance metrics to be measured State variables Manual simulation steps 4 Workload Characterization Techniques Workload characterization: the process of studying the real-user environments, observe the key characteristics, and develop a workload model that can be used repeated The measured workload data consists of services requested or the resource demands of a number of users on the system Workload component or workload unit - the entity that makes the service requests at the SUT interface Workload parameters or workload features Measured quantities, service requests, or resource demands

Workload Characterization Techniques 5 Averaging Single-parameter histograms Multiparameter histograms Principal component analysis Markov models Clustering Principal Component Analysis 6 Key idea: use a weighted sum of parameters to classify the components Let x ij denote the ith parameter for jth component y j = i= 1 Principal component analysis assigns weights w i s such that y j s provide the maximum discrimination among the components The quantity y j is called the principal factor The factors are ordered. First factor explains the highest percentage of the variance n w x i ij 3

Markov Models 7 Markov => the next request depends only on the last request Described by a transition matrix Given the same relative frequency of requests of different types, it is possible to realize the frequency with several different transition matrices Clustering 8 Take a sample, that is, a subset of workload components Select workload parameters Select a distance measure Remove outliers Scale all observations Perform clustering Interpret results Change parameters, or number of clusters, and repeat 3-7 Select representative components from each cluster 4

Experimental Designs - Terminology 9 Response variables: outcome Factors: variables that affect the response variable Levels: the values that a factor can assume Primary factors and secondary factors Replication: repetition of all or some experiments Design: the number of experiments, the factor level and number of replications for each experiment Experimental Unit Interaction: effect of one factor depends upon the level of the other Types of Experimental Designs 10 Simple Designs: vary one factor at a time Full Factorial Design: all combinations Fractional Factorial Designs: use only a fraction of the full factorial design 5

k Factorial Designs 11 k factors, each at two levels Easy to analyze Helps in sorting out impact of factors Good at the beginning of a study Valid only if the effect of a factor is unidirectional, i.e., the performance either continuously decreases or continuously increases as the factor is increased from min to max E.g., memory size, the number of disk drives Computation of Effects: Sign Table Method 1 For a design, the effects can be computed easily by preparing a 4 4 sign matrix as shown above Next, multiply the entries in column I by those in column y and put their sum under column I; perform similar calculation for other columns The sums under each column are divided by 4 to give the corresponding coefficients of the regression model 6

Allocation of Variation 13 Importance of a factor = proportion of the variation explained ( ) 1 Sample Variance of = = i= yi y y sy 1 Variation of y Δ Numerator For a design = i = 1 ( y y) Variation due to A=SSA= q, etc. i = sum of squares total ( SST ) SST = + qa + qb qab A k r Factorial Designs with Replication 14 r replications of k experiments k r observations Allows estimation of experimental errors Model: y = q 0 + q A x A + q B x B + q AB x A x B +e e = experimental error Computation of effects: use sign table Estimation of errors: e = y yˆ = y q + q x + q x + q x x ij ij i ij 0 A Ai B Bi AB Ai Bi 7

k r Factorial Designs with Replication 15 Allocation of Variation: SST = ( y y ) = rq + rq + rq + e ij Effects are random variables Errors ~ ij.. A B AB ij i, j SST = SSA + SSB + SSAB + SSE N(0, σ ) y ~ N( y, σ ) q 0 is normal with variance e.. e σ /( r e ) Confidence Intervals for Effects 16 Variance of errors: Estimated variance of q 0 : Similarly, s q A = s q B 1 SSE s = e ( 1) = MSE ( r 1) Δ e e r ij = s s s /( r ) 0 = s q = Confidence intervals (CI) for the effects: q i t [ r ] s 1 α / ; ( 1 ) qi CI does not include a zero => significant q AB e e r 8

17 Multiplicative Models for r Experiments Additive model. Not valid if effects do not add E.g., execution time of workloads Execution time y ij = v i w j The two effects multiply. Logarithm => additive model: log( y ) = log( v ) + log( w ) Correct model: ij where, y' ij = log( yij ) Taking an antilog of additive effects q. to get the multiplicative effects u. i j y ' = q + q x + q x + q x x + e ij 0 A A B B AB A B ij k-p Fractional Factorial Designs 18 Large number of factors Large number of experiments Full factorial design too expensive Use a fractional factorial design k-p design allows analyzing k factors with only k-p experiments k-1 design requires only half as many experiments k- design requires only one quarter of the experiments 9

Preparing Sign Table for k-p Design 19 Prepare a sign table for a full factorial design with k-p factors Mark the first column I Mark the next k-p columns with the k-p factors Of the ( k-p k + p 1) columns on the right, choose p columns and mark them with the p factors which were not chosen in step 1 Algebra of Confounding 0 Given just one confounding, it is possible to list all other confoundings Rules: I is treated as unity Any term with a power of is erased 10

Design Resolution 1 Order of an effect = number of factors included in it Order of ABCD=4, order of I=0 Order of a confounding = sum of the order of two terms E.g., AB=CDE is of order 5 Resolution of a design = minimum of orders of confoundings Notation: Resolution III k = = p R III III One Factor Experiments Used to compare alternatives of a single categorical variable For example, several processors, several caching schemes Model: yij = μ + α j + eij r = number of replications yij = ith response with jth alternative μ = mean response α j = effect of alternative j e = error term ij α = 0 j 11

Analysis of Variance (ANOVA) 3 Importance significance Important => explains a high percent of variation Significance => high contribution to the variation compared to that by errors Degree of freedom = number of independent values required to compute SSY = SS0 + SSA + SSE ar = 1 + ( a 1) + a( r 1) Note that the degrees of freedom also add up F-Test 4 Purpose: to check if SSA is significantly greater than SSE Errors are normally distributed => SSE and SSA have chi-square distributions The ratio (SSA/ν A )/ (SSE/ν e ) has an F distribution. Where ν A = a 1 = degree of freedom for SSA ν e = a(r 1) = degree of freedom for SSE Computed ratio > F[1 α ; ν A, ν e] => SSA is significantly higher than SSE SSA/ν A is called mean square of A or (MSA). Similarly, MSE = SSE/ ν e 1

Simulations - Terminology 5 State variables Event: a change in the system state Continuous-time and discrete-time modes Continuous state and discrete state models Deterministic and probabilistic models Static and dynamic models Linear and nonlinear models Open and closed models Stable and unstable models Types of Simulations 6 Emulation: using hardware or firmware Monte Carlo simulation Trace-driven simulation Discrete event simulation Process-oriented simulation 13

Components of Discrete Event Simulations Event scheduler Simulation clock and a time-advancing mechanism System state variables Event routines Input routines Report generator Initialization routines Trace routines Dynamic memory management Main program 7 Analysis of Simulation Results 8 Model verification techniques Verification => debugging, correct implementation of the model Model validation techniques Validation => model = real world, valid assumption Transient removal Terminating simulations Stopping criteria: variance estimation Variance reduction 14

Model Verification Techniques 9 Top down modular design Antibugging Structured walk-through Deterministic models Run simplified cases Trace Online graphic displays Continuity test Degeneracy tests Consistency tests Seed independence Model Validation Techniques 30 Aspects to validate Assumptions Input parameter values and distributions Output values and conclusions Techniques Expert intuition Real system measurements Theoretical results 15

Transient Removal 31 General steady state performance is interesting Remove the initial part. No exact definition => heuristics Transient removal methods Long runs Proper initialization Truncation Initial data deletion Moving average of independent replications Batch means Stopping Criteria: Variance Estimation 3 Run until confidence interval is narrow enough x ± z1 α / Var( x) Independence not applicable to most simulations. Large waiting time for ith job => large waiting time for (i+1)th job For correlated observations: Actual variance» Var(x)/n Solutions: Independent replications Batch means Method of regeneration 16

Random-Number Generation 33 A key step in developing a simulation: have a routine to generate random values for variables with a specified random distribution Random number generation: a sequence of random numbers distributed uniformly between 0 and 1 Random variate generation: the sequence is transformed to produce random values obeying the desired distribution Terminology 34 Seed: x 0 Pseudo-random: deterministic yet would pass randomness tests Fully random: not repeatable Cycle length, Tail, Period: 17

35 Desired Properties of a Good Generator It should be efficiently computable The period should be large The successive values should be independent and uniformly distributed 36 Types of Random-Number Generators Linear congruential generators Tausworthe generators Extended Fibonacci generators Combined generators 18

Seed Selection 37 Do not use zero Avoid even values Do not subdivide one stream Might lead to strong correlation Use non-overlapping streams Reuse seeds in successive replications Do not use random seeds such as the time of day Can t reproduce. Can t guarantee non-overlap Myths About Random-Number Generation 38 A complex set of operations leads to random results A single test, such as the chi-square test, is sufficient to test the goodness of a random-number generator Random numbers are unpredictable Some seeds are better than others Accurate implementation is not important Bits of successive words generated by a randomnumber generator are equally randomly distributed 19

Chi-Square Test 39 Most commonly used test. Can be used for any distribution Prepare a histogram of the observed data Compare observed frequencies with theoretical D= D=0 => exact fit D has a chi-square distribution with k 1 degrees of freedom => compare D with χ[ 1 α ; k 1] Pass with confidence α if D is less Kolmogorov-Smirnov Test Designed for continuous distributions Difference between the observed CDF (cumulative distribution function) F o (x) and the expected cdf F e (x) should be small K + = maximum observed deviation above the expected cdf K = maximum observed deviation below the expected cdf + K = nmax( F ( x) F ( x)) K = nmax( F ( x) F ( x)) x K + < K [1-α;n] and K - < K [1-α;n] => pass at α level of significance For U(0, 1): F e (x) = x, F o (x) = j/n, where x > x 1, x,, x j-1 o e e o x + j j 1 K = nmax( xj) K = nmax( xj ) j n j n 40 0

Chi-Squire vs. K-S Test 41 K-S Test Small samples Continuous distributions Differences between observed and expected CDFs Use each observation in the sample without any grouping => makes a better use of the data Cell size is not a problem Exact Chi-Square Test Large sample Discrete distributions Differences between observed and hypothesized probabilities (pdfs or pmfs) Groups observations into a small number of cells Cell sizes affect the conclusion but no firm guidelines Approximate k-dimensional Uniformity or k-distribution 4 k-distributed if: P( a un 1 < b1,..., ak un+ k 1 < bk ) = ( b1 a1) ( bk a 1 k For all choices of a i, b i in [0, 1), with b i > a i, i=1,,, k. k-distributed sequence is always (k 1)-distributed. The inverse is not true Two tests: Serial test Spectral test ) 1

Serial-Correlation Test 43 Nonzero covariance => dependence The inverse is not true R k = autocovariance at lag k = Cov[x n, x n+k ] For large n, R k is normal distributed with a mean of zero and a variance of 1/[144(n-k)] 100(1-α)% confidence interval for the autocovariance is z /(1 n ) R k 1 α / k Check if CI includes zero Spectral Test 44 Goal: to determine how densely the k-tuples {x 1, x,, x n } can fill up the k-dimensional hyperspace The k-tuples from an LCG all on a finite number of parallel hyperplanes Successive pairs would lie on a finite number of lines In three dimensions, successive triplets lie on a finite number of planes Spectral test: determine the max distance between adjacent hyperplanes Larger distance => worse generator

Random-Variate Generation 45 General techniques Inverse Transformation Rejection Composition Convolution Characterization 46 How to Select a Random-Variate Generation Technique 3

Commonly Used Distributions 47 Bernoulli, binomial, negative binomial, Poisson, geometric, Pascal, normal Gamma, Erlang, Exponential, Beta, uniform, Pareto, χ, Cachy, Lognormal, t, F, Weibull 4