CPSC 531: System Modeling and Simulation. Carey Williamson Department of Computer Science University of Calgary Fall 2017

Similar documents
Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.

Chapter 11. Output Analysis for a Single Model Prof. Dr. Mesut Güneş Ch. 11 Output Analysis for a Single Model

Slides 12: Output Analysis for a Single Model

Chapter 11 Output Analysis for a Single Model. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation

Chapter 11 Estimation of Absolute Performance

Network Simulation Chapter 6: Output Data Analysis

EE/PEP 345. Modeling and Simulation. Spring Class 11

B. Maddah INDE 504 Discrete-Event Simulation. Output Analysis (1)

CPSC 531 Systems Modeling and Simulation FINAL EXAM

Output Data Analysis for a Single System

2WB05 Simulation Lecture 7: Output analysis

Overall Plan of Simulation and Modeling I. Chapters

Simulation. Where real stuff starts

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.

Simulation. Where real stuff starts

Output Analysis for a Single Model

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.

Expected Values, Exponential and Gamma Distributions

Chapter 10 Verification and Validation of Simulation Models. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation

Construction Operation Simulation

CPSC 531: System Modeling and Simulation. Carey Williamson Department of Computer Science University of Calgary Fall 2017

Computer Systems Modelling

Expected Values, Exponential and Gamma Distributions

ISyE 6644 Fall 2014 Test #2 Solutions (revised 11/7/16)

Non Markovian Queues (contd.)

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 19

Redacted for Privacy

Outline. Simulation of a Single-Server Queueing System. EEC 686/785 Modeling & Performance Evaluation of Computer Systems.

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations

1.225J J (ESD 205) Transportation Flow Systems

57:022 Principles of Design II Final Exam Solutions - Spring 1997

7 Variance Reduction Techniques

Probability and Statistics Concepts

EECS 126 Probability and Random Processes University of California, Berkeley: Fall 2014 Kannan Ramchandran November 13, 2014.

Verification and Validation. CS1538: Introduction to Simulations

Simulation of stationary processes. Timo Tiihonen

Recap. Probability, stochastic processes, Markov chains. ELEC-C7210 Modeling and analysis of communication networks

ACCOUNTING FOR INPUT-MODEL AND INPUT-PARAMETER UNCERTAINTIES IN SIMULATION. < May 22, 2006

More on Input Distributions

Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama

Comparing Systems Using Sample Data

Queueing Theory (Part 4)

IE 581 Introduction to Stochastic Simulation. Three pages of hand-written notes, front and back. Closed book. 120 minutes.

Analysis of Simulated Data

EE 368. Weeks 3 (Notes)

Exercises Stochastic Performance Modelling. Hamilton Institute, Summer 2010

Chapter 6 Queueing Models. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation

ISyE 6644 Fall 2014 Test 3 Solutions

The exponential distribution and the Poisson process

ST 371 (IX): Theories of Sampling Distributions

Queueing Systems: Lecture 3. Amedeo R. Odoni October 18, Announcements

Name of the Student:

[Chapter 6. Functions of Random Variables]

Output Data Analysis for a Single System

Chapter 2 SIMULATION BASICS. 1. Chapter Overview. 2. Introduction

Slides 9: Queuing Models

Variance reduction techniques

Readings: Finish Section 5.2

Chapter 10 Verification and Validation of Simulation Models. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation

SOLUTIONS IEOR 3106: Second Midterm Exam, Chapters 5-6, November 8, 2012

Review of Queuing Models

ESTIMATION AND OUTPUT ANALYSIS (L&K Chapters 9, 10) Review performance measures (means, probabilities and quantiles).

Solutions to Homework Discrete Stochastic Processes MIT, Spring 2011

Discrete-Event System Simulation

ISyE 6644 Fall 2016 Test #1 Solutions

Variance reduction techniques

REAL-TIME DELAY ESTIMATION BASED ON DELAY HISTORY SUPPLEMENTARY MATERIAL

Introduction to Queueing Theory

Queueing Review. Christos Alexopoulos and Dave Goldsman 10/6/16. (mostly from BCNN) Georgia Institute of Technology, Atlanta, GA, USA

Queueing Theory and Simulation. Introduction

GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs

CS 147: Computer Systems Performance Analysis

Systems Simulation Chapter 6: Queuing Models

ARD: AN AUTOMATED REPLICATION-DELETION METHOD FOR SIMULATION ANALYSIS

Page 0 of 5 Final Examination Name. Closed book. 120 minutes. Cover page plus five pages of exam.

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences FINAL EXAMINATION, APRIL 2013

Quiz 1. Name: Instructions: Closed book, notes, and no electronic devices.

ECE-517: Reinforcement Learning in Artificial Intelligence. Lecture 4: Discrete-Time Markov Chains

ec1 e-companion to Liu and Whitt: Stabilizing Performance

Stochastic Models in Computer Science A Tutorial

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable

Bulk input queue M [X] /M/1 Bulk service queue M/M [Y] /1 Erlangian queue M/E k /1

Engineering Mathematics : Probability & Queueing Theory SUBJECT CODE : MA 2262 X find the minimum value of c.

LECTURE #6 BIRTH-DEATH PROCESS

Quiz Queue II. III. ( ) ( ) =1.3333

Section 8.1: Interval Estimation

OUTLINE. Deterministic and Stochastic With spreadsheet program : Integrated Mathematics 2

STAT FINAL EXAM

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science

Business Statistics 41000: Homework # 5

Exponential Distribution and Poisson Process

Simulation model input analysis

MEDIAN CONFIDENCE INTERVALS -GROUPING DATAINTO BATCHES AND COMPARISON WITH OTHER TECHNIQUES

Probabilities & Statistics Revision

Discrete Random Variables

Online Supplement to Delay-Based Service Differentiation with Many Servers and Time-Varying Arrival Rates

Introduction to Queueing Theory with Applications to Air Transportation Systems

Monte Carlo Integration

Problem Selected Scores

A STAFFING ALGORITHM FOR CALL CENTERS WITH SKILL-BASED ROUTING: SUPPLEMENTARY MATERIAL

Transcription:

CPSC 531: System Modeling and Simulation Carey Williamson Department of Computer Science University of Calgary Fall 2017

Quote of the Day A person with one watch knows what time it is. A person with two watches is never quite sure. -Segal s Law 2

Simulation Output Analysis Purpose: estimate system performance from simulation output Understand: Terminating and non-terminating simulations Transient and steady-state behavior Learn about statistical data analysis: Computing confidence intervals Determining the number of observations required to achieve a desired confidence interval 3

Outline Measure of performance and error Transient and steady state Types of simulations Steady-state analysis Initial data deletion Length of simulation run Confidence interval Estimating mean and variance Confidence interval for small and large samples Width of confidence interval 4

Outline Measure of performance and error Transient and steady state Types of simulations Steady-state analysis Initial data deletion Length of simulation run Confidence interval Estimating mean and variance Confidence interval for small and large samples Width of confidence interval 5

Stochastic Nature of Simulation Output Output data are random variables, because the input variables are stochastic, and model is basically an input-output transformation A queueing example: Banff park entry booth Arrival rate ~ Poisson arrival process (λ per minute) Service time ~Exponential(μ = 1.5) minutes System performance: long-run average queue length Question 1: does simulation model result agree with M/M/1 model? Question 2: is queueing better/same/worse for HyperExp() service? Question 3: how much better would it be with two servers? Suppose we run the simulation 3 times, i.e., 3 replications Each replication is for a total of 5,000 minutes Divide each replication into 5 equal subintervals (i.e., batches) of 1000 minutes Y i,j : Average number of cars in queue from time j 1 1000 to j 1000 in replication i Y i : Average number of cars in queue in replication i 6

Stochastic Nature of Simulation Output queueing example (cont d): Batched average queue length for 3 independent replications: Batching Interval Replication (minutes) Batch, j 1, Y 1j 2, Y 2j 3, Y 3j [0, 1000) 1 3.61 2.91 7.67 [1000, 2000) 2 3.21 9.00 19.53 [2000, 3000) 3 2.18 16.15 20.36 [3000, 4000) 4 6.92 24.53 8.11 [4000, 5000) 5 2.82 25.19 12.62 [0, 5000) 3.75 15.56 13.66 Inherent variability in stochastic simulation both within a single replication and across different replications The average across 3 replications, i.e., Y 1, Y 2, Y 3, can be regarded as independent observations, but averages within a replication, e.g., Y 11, Y 12, Y 13, Y 14, Y 15, are not. 7

Measure of Performance Consider estimating a performance parameter Θ The true value of Θ is unknown Can only observe simulation output Estimate Θ using independent observations obtained from independent simulation runs (i.e., replications) Θ: estimation of Θ Is unbiased if: E Θ = Θ Desired Is biased if: E Θ Θ Estimation bias = E Θ Θ 8

Measure of Error Confidence Interval (CI): We cannot know for certain how far Θ is from Θ CI attempts to bound the estimation error Θ Θ The more replications we make, the lower the error in Θ Example: queueing system Y: long-run average queue length Y i : average queue length in simulation run i Define estimator Y = ഥY, i.e., Y = 1 8 Y 1 + + Y 8 = 14.814 Can calculate a 95% confidence interval for Y such that: P Y Y ϵ = 0.95 For instance: 11.541 Y 18.087 i Y i 1 15.028 2 13.385 3 18.891 4 10.559 5 8.866 6 15.883 7 18.598 8 17.302 9

Outline Measure of performance and error Transient and steady state Types of simulations Steady-state analysis Initial data deletion Length of simulation run Confidence interval Estimating mean and variance Confidence interval for small and large samples Width of confidence interval 10

Types of Simulations 1. Terminating simulation: there is a natural event that specifies the length of the simulation: Runs for some duration of time T E, where E is a specified event that stops the simulation Starts at time 0 under well-specified initial conditions Ends at specified stopping time T E Example: Simulating banking operations over one day Opens at 8:30 am (time 0) with no customers present and 8 of the 11 tellers working (initial conditions), and closes at 4:30 pm (Time T E = 480 minutes) 11

Types of Simulations 2. Non-terminating simulation: No natural event specifying length of the simulation Runs continuously, for a very long period of time Initial conditions defined by the performance analyst Runs for some analyst-specified period of time T E Of interest are transient and steady state behavior Example: Simulating banking operations to compute the long-run mean response time of customers 12

Types of Simulations Whether a simulation is considered to be terminating or non-terminating depends on both The objectives of the simulation study, and The nature of the system. Similar statistical techniques applied to both types of simulations to estimate performance and error For non-terminating simulations: Transient and steady-state behavior are different Generally, steady-state performance is of interest 13

Transient and Steady-State Behavior Consider a queueing system Define P n, t = P n in system at time t Depends on the initial conditions Depends on time t Steady state behavior System behavior over long-run: P n = lim t P n, t Independent of the initial conditions Independent of time 14

Transient and Steady-State Behavior

Steady-State Output Analysis General approach based on independent replications: Choose the initial conditions Determine the length of simulation run Run the simulation and collect data Problem: steady-state results are affected by using artificial and potentially unrealistic initial conditions Solutions: 1. Intelligent initialization 2. Simulation warmup (initial data deletion) 16

Intelligent Initialization Initialize the simulation in a state that is more representative of long-run conditions If the system exists, collect data on it and use these data to specify typical initial conditions If the system can be simplified enough to make it mathematically solvable (e.g. queueing models), then solve the simplified model to find long-run expected or most likely conditions, and use that to initialize the simulation 17

Simulation Warmup: Initial Data Deletion Divide each simulation into two phases: Initialization phase, from time 0 to time T 0 Data-collection phase, from T 0 to stopping time T 0 + T E Important to do a thorough job of investigating the initial-condition bias: Bias is not affected by the number of replications, rather, it is affected only by deleting more data (i.e., increasing T 0 ) or extending the length of each run (i.e. increasing T E ) How to determine T 0 and T E? 18

Simulation Warmup: Initial Data Deletion How to determine T 0? After T 0, system should be more nearly representative of steady-state behavior System has reached steady state: the probability distribution of the system state is close to the steady-state probability distribution No widely accepted, objective and proven technique to guide how much data to delete to reduce initialization bias to a negligible level Heuristics such as plotting the moving averages can be used 19

Initial Data Deletion How to implement data deletion in DES? At initialization, schedule a reset event at clock + T 0 reset event routine: reset all statistical counters (for data collection) to their initial values At the end of simulation, statistical counters contain data collected after the transient period 20

Length of Simulation Run How to determine T E? Too short: results may not be reliable Too long: wasteful of resources Method to determine length of run Perform independent replications For each replication, perform initial data deletion Select length of run and number of replications such that the confidence intervals for the performance measures of interest narrow to the desired widths 21

Outline Measure of performance and error Transient and steady state Types of simulations Steady-state analysis Initial data deletion Length of simulation run Confidence interval Estimating mean and variance Confidence interval for small and large samples Width of confidence interval 22

Confidence Interval Terminology Observation: a single value of a performance measure from an experiment Example: mean response time of a web server Sample: the set of observations of a performance measure from an experiment 23

Sample Versus Population Generate several million random numbers with a given distribution and draw a sample of m observations Sample mean population mean In discrete-event simulation, population characteristics such as mean and variance are unknown Need to estimate them using simulation output 24

Estimating Mean Consider a sample of m observations, denoted by Y 1, Y 2,, Y m Example: Y i is the mean response time of a web server in the i-th experiment Sample mean: m തY = 1 m i=1 Y i Sample mean തY is an unbiased estimator for the unknown population mean 25

Estimating Variance Sample variance: m s 2 = 1 m 1 i=1 Y i തY 2 s 2 is an unbiased estimator for the unknown population variance The divisor for s 2 is m 1 and not m This is because only m 1 of the m differences are independent Given m 1 differences, m-th difference can be computed since the sum of all m differences must be zero The number of independent terms in a sum is also called its degrees of freedom 26

Confidence Interval for Mean Consider a simulation study Y: random variable denoting the performance measure corresponding to the simulation output Example: average wait time of customers in a bank Problem: Y varies across different simulation runs Consider m simulation runs Y i : simulation output in simulation run i Generally, Y 1 Y 2 Y m Solutions: 1. Characterize distribution of Y (e.g., CDF) 2. Characterize statistics of Y (e.g., mean and variance) 27

Confidence Interval for Mean Consider a simulation study Y: random variable denoting the performance measure corresponding to the simulation output Example: average wait time of customers in a bank Objective is to characterize the unknown mean μ = E[Y] Algorithm: Make m independent simulation runs to obtain m observations Y 1, Y 2,, Y m Sample mean തY is an unbiased estimator for μ Question: How far is തY from μ? Determine confidence interval for mean 28

Confidence Interval for Mean Determine bounds c 1 and c 2 such that: P c 1 μ c 2 = 1 α [c 1, c 2 ]: 1 α 100% Confidence Interval (CI) 1 α 100% : Confidence Level c 1 c 2 μ 29

Determining Confidence Interval തY is a random variable: തY = Y 1 + Y 2 + + Y m m where the Y i s are IID with the same distribution as Y~N(μ, σ 2 ) We have: E തY = E Y 1 + + E[Y m ] = m V തY = V Y 1 + + V[Y m ] m 2 = m μ m = μ m σ2 m 2 = σ2 m μ and σ 2 unknown but can be estimated by തY and s 2 30

Determining Confidence Interval Define the normalized random variable X as X = ത Y μ s/ m Theorem: The distribution of X is independent of unobservable parameter μ For large m: X follows a standard normal distribution For small m: X follows a Student s t-distribution 31

Confidence Interval for Small Samples Define the normalized random variable T as T = ത Y μ s/ m T has a standard Student s t-distribution with d = m 1 degrees of freedom It describes the distribution of the mean of a sample of m observations Symmetric distribution E T = 0, and V T = d for d 2 d 2 As m, we have T~N(0, 1) 32

Student s t-distribution d=1 d=2 d=5 d=10 d=infinity Probability Density Function (PDF) 33

Determining Confidence Interval Quantile: The x value at which the CDF takes a value a is called the a-quantile or 100a-percentile. It is denoted by x α : α = F x α = P(X x α ) f(x) α Example: X has standard normal distribution 95-percentile = 0.95-quantile = 1.6449 25-percentile = 0.25-quantile = - 0.6745 x α 34

Determining Confidence Interval Define c = t m 1,1 α/2 as: (1 α/2)-quantile of T with d = m 1 degrees of freedom: P T t m 1,1 α/2 = 1 α/2 t m 1,1 α/2 0 t m 1,1 α/2 Note that t m 1,1 α/2 does not depend on the value of the unobservable population mean μ 35

Determining Confidence Interval Therefore Which means P c T c = 1 α P c ത Y μ s/ m c = 1 α P തY c s m μ തY + c s m = 1 α (1-α)100% confidence interval of µ is given by തY c s m, തY + c s m 36

Example Sample: -0.04, -0.19, 0.14, -0.09, -0.14, 0.19, 0.04, and 0.09 Mean = 0, Sample standard deviation = 0.138 For 90% confidence interval: t 7, 0.95 = 1.895 Confidence interval for the mean 0.138 0 1.895 0 0.093 ( 0.093, 8 0.093) 37

Confidence Interval: Meaning If we take 100 samples and construct confidence interval for each sample, the interval would include the population mean in 90 cases. c 1 µ c 2 Total yes > 100(1-α) 38

Example: Assignment 2 10 replications of Banff park entry gate simulation Warmup: 10,000 minutes Number of cars: 60,000 λ 1/μ ρ Mean Q Std Dev 0.5 1.5 0.75 3.019 0.109 0.55 1.5 0.825 4.715 0.174 0.60 1.5 0.90 9.042 0.980 0.65 1.5 0.975 39.876 12.76 See graph online for 90% confidence intervals 39

Example: Assignment 2 (cont d) 10 batches from Banff park entry gate simulation Warmup: 0 minutes Number of cars: 500,000 λ 1/μ ρ Mean Q Std Dev 0.5 1.5 0.75 2.997 0.088 0.55 1.5 0.825 4.813 0.206 0.60 1.5 0.90 9.033 0.608 0.65 1.5 0.975 42.22 14.72 40

Confidence Interval for Large Samples Define normalized random variable Z as Z = ത Y μ s/ m where s is the sample standard deviation From Central Limit Theorem: Z has standard normal distribution for large m (1-α)100% confidence interval for m: z 1-a/2 = (1-a/2)-quantile of N(0,1) s Y z1 a / 2, Y z1 a / 2 m s m -z 1-a/2 0 z 1-a/2 41

Example തY= 3.90, s = 0.95, and m = 32 A 90% confidence interval for the mean = We can state with 90% confidence that the population mean is between 3.62 and 4.17 The chance of error in this statement is 10%. 42

Width of Confidence Interval Width of the confidence interval is 2 t m 1,1 α 2 s m Width can be reduced by Using a larger m (i.e., more simulation runs) Using a smaller s (i.e., longer simulation runs) 43

Number of Simulation Runs Suppose the desired width of the confidence interval is δ, and m replications have been made but the desired width is not met: Total number of replications required can be estimated by m = 2 t m 1,1 α 2 s δ 2 Number of additional replications required = m m 44

Length of Simulation Runs An alternative to increasing m is to increase total run length T 0 + T E for each replication Approach: for β 1 Increase run length from (T 0 + T E ) to β(t 0 + T E ), and Delete additional amount of data, from time 0 to time βt 0 βt 0 β(t 0 +T E ) 45