Simulation. Where real stuff starts



ToC: 1. What is a simulation? 2. Accuracy of output 3. Random Number Generators 4. How to sample 5. Monte Carlo 6. Bootstrap

1. What is a simulation?

What is a simulation? An experiment run in a computer. Important differences from a real experiment: simulated versus real time; serialization of events; initialization.

Two simulation examples. What is the difference? Simulation 1: Time to complete jobs. Customers arrive (at the same time) at Joe's shop and download a file. We want to estimate the time it takes to serve the customers. Simulation 2: Delay at a server. An information server handles requests according to some scheduling policy. We want to estimate the time it takes to serve one request.

Terminating versus non-terminating simulation

Two simulation runs. What is the difference? Scenario 1: average service time = 96 ms. Scenario 2: average service time = 101 ms. Request arrival rate = 10 /s in both scenarios, so the utilization is 0.96 in Scenario 1 and 1.01 in Scenario 2.

Scenario 2: there is no stationary regime; the simulation walks to infinity. Scenario 1: a transient phase, due to atypical initial conditions, is followed by a typical (= stationary) phase. You want to measure things only in the stationary phase. Otherwise the results are non-reproducible and non-typical.

Definition of stationarity. Stationarity is a property of a stochastic model $X_t$. Let $X_t$ be a stochastic model that represents the state of the simulator at time $t$. It is stationary iff, for any $s > 0$, $(X_{t_1}, X_{t_2}, \dots, X_{t_k})$ has the same distribution as $(X_{t_1+s}, X_{t_2+s}, \dots, X_{t_k+s})$, i.e. the simulation does not get old.

Classical cases: Markov models. Definition: the state $X_t$ is sufficient to draw the future of the simulation. This is the common case for all simulations. For a Markov model over a discrete state space: if you run the simulation long enough, it will either walk to infinity (unstable) or converge to a stationary regime. Ex: a queue with utilization $\rho > 1$ is unstable; a queue with $\rho < 1$ becomes stationary after a transient. If the state space is strongly connected (any state can be reached from any state), then there is 0 or 1 stationary regime. Ex: queue. Else, there may be several distinct stationary regimes. Ex: system with failure modes.
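The stable and unstable queue cases can be illustrated with a small sketch. It assumes a single-server FIFO queue with exponential interarrival and service times and an arrival rate of 10 requests per second (all assumptions made for the illustration; the function name and parameters are hypothetical), and uses the Lindley recursion for waiting times.

```python
import random

def mean_waiting_time(arrival_rate, mean_service, n_requests, seed=1):
    """Average waiting time of the first n_requests in a FIFO single-server queue."""
    rng = random.Random(seed)
    wait = 0.0      # waiting time of the current request (the queue starts empty)
    total = 0.0
    for _ in range(n_requests):
        total += wait
        service = rng.expovariate(1.0 / mean_service)   # assumed exponential service
        gap = rng.expovariate(arrival_rate)             # time until the next arrival
        # Lindley recursion: the next request waits for the leftover work, if any.
        wait = max(0.0, wait + service - gap)
    return total / n_requests

for mean_service in (0.096, 0.101):     # utilization 0.96 and 1.01 at 10 requests/s
    short_run = mean_waiting_time(10.0, mean_service, 10_000)
    long_run = mean_waiting_time(10.0, mean_service, 200_000)
    print(f"service {mean_service * 1000:.0f} ms: mean wait "
          f"{short_run:.2f} s over 10 000 requests, {long_run:.2f} s over 200 000")
```

With utilization above 1 the average keeps growing with the length of the run, whereas with utilization below 1 it should settle once the transient is over.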

Stationarity and transience. Knowing whether a model has a stationary regime is sometimes a hard problem. We will see important models where this is solved (ex: some queuing systems, time series models). Reasoning about your system may give you indications: do you expect growth? Do you expect seasonality? Once you believe your model is stationary, you should handle transients in order to eliminate the impact of initial conditions. Remove them (how? Look at your output and guess). Sometimes it is possible to avoid transients altogether (perfect simulation; see later, "Importance of the View Point").

Typical reasons for non-stationarity. Obvious dependency on time: seasonality, growth. It can be ignored at a small time scale (minute). Non-stability, explosion: a queue with utilization factor > 1. Non-stability, freezing simulation: the system becomes slower with time (aging), typically because there are rare events of large impact ("kings"). The longer the simulation, the larger the largest king. We'll come back to this in the chapter "Importance of the View Point".

The state $X_t$ of a simulation at time $t$ is drawn at random from a distribution $F()$, independently at every time. Is this a stationary simulation? (Figure: an example where $X_t$ is a sample drawn from a distribution with parameters 23, 100.) A. Yes B. It depends on the distribution C. It depends if the simulation terminates or not D. No E. I don't know

Solution: This simulation is stationary because the joint distribution of $(X_{t_1+s}, \dots, X_{t_k+s})$ is the same for all $s$: it is the distribution of a vector of $k$ independent random variables drawn from $F()$. It is statistically impossible to see whether a part of the simulation is young or old.

2. Accuracy of Simulation Output. A stochastic simulation produces a random output, so we need confidence intervals. Example: 30 runs of Joe's shop, to estimate the time to serve a batch of customers.

Confidence intervals. (Figure: confidence intervals for the median and for the mean.)

When computing these confidence intervals we need to make sure that: A. the simulation runs are independent B. the output is normally distributed C. Both D. None E. I don't know

Solution: If we use the median, we need to ensure that the runs are independent. If we use the mean, we also need to ensure that the runs are independent, and that we are in the domain of validity of Theorem 2.2.2, which is probably OK here; we will see a more formal answer next. Answer A. We will now see how to make sure simulation runs are independent.
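As an illustration of the two kinds of interval, here is a minimal sketch. It assumes the usual large-sample formula for the mean (1.96 standard errors) and an order-statistics interval for the median based on the Binomial(n, 1/2) distribution; the function names and the simulated run outputs are hypothetical, and this is not necessarily the exact procedure used in the course.

```python
import math
import random
import statistics

def mean_ci(samples, z=1.96):
    """Approximate 95% CI for the mean (normal approximation, large n)."""
    n = len(samples)
    m = statistics.fmean(samples)
    half = z * statistics.stdev(samples) / math.sqrt(n)
    return m - half, m + half

def median_ci(samples, level=0.95):
    """CI for the median from order statistics; requires only independent runs."""
    n = len(samples)
    xs = sorted(samples)
    # B ~ Binomial(n, 1/2) counts how many samples fall below the true median.
    cdf, j = 0.0, 0
    for i in range(n):
        cdf += math.comb(n, i) / 2 ** n      # cdf = P(B <= i)
        if cdf > (1 - level) / 2:
            break
        j = i + 1                            # largest j with P(B <= j - 1) <= alpha / 2
    if j == 0:
        raise ValueError("too few samples for this confidence level")
    return xs[j - 1], xs[n - j]              # order statistics X_(j) and X_(n+1-j)

rng = random.Random(42)
runs = [rng.lognormvariate(3.0, 0.5) for _ in range(30)]   # stand-in for 30 run outputs
print("mean CI:  ", mean_ci(runs))
print("median CI:", median_ci(runs))
```

For 30 runs the median interval is delimited by the 10th and 21st smallest outputs.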

3. Random Number Generator. A stochastic simulation does not use truly random numbers but pseudo-random numbers. The random number generator produces a random number $U(0,1)$ based on a chaotic sequence. Example (obsolete but commonly used, e.g. in ns2): the linear congruential generator $x_n = 16807\, x_{n-1} \bmod (2^{31} - 1)$, $u_n = x_n / (2^{31} - 1)$. Let us check whether the output looks random.

The linear congruential generator of ns2. Uniform QQ-plot: the output appears to be uniform. Autocorrelation plot: the autocorrelation is negligible.

Lag diagram, 1000 points: $U_{n+h}$ appears to be independent of $U_n$.

Period of the RNG. This RNG appears to produce an iid, uniform output. However, it is in fact periodic: any sequence generated by a deterministic algorithm in a given computer must be periodic (because the number of states is finite). But we can require that the period be much larger than the maximum number of uses. The ns2 RNG has period $2^{31} - 2 \approx 2 \times 10^9$, which is too small. The Mersenne Twister (Matlab's default) has period $2^{19937} - 1$.
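The checks above can be reproduced with a short sketch. It assumes the multiplier 16807 and modulus $2^{31} - 1$ (the Park-Miller "minimal standard" parameters, which the older ns2 generator is understood to use); the function names are hypothetical.

```python
def lehmer(seed, n, a=16807, m=2**31 - 1):
    """Yield n pseudo-random numbers in (0, 1) from x_k = a * x_{k-1} mod m."""
    x = seed
    for _ in range(n):
        x = (a * x) % m
        yield x / m

def autocorrelation(u, lag):
    """Sample autocorrelation of the sequence u at the given lag."""
    n = len(u)
    mean = sum(u) / n
    var = sum((v - mean) ** 2 for v in u) / n
    cov = sum((u[i] - mean) * (u[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

u = list(lehmer(seed=1, n=1000))
print("sample mean (expect about 0.5):", sum(u) / len(u))
print("lag-1 autocorrelation (expect about 0):", autocorrelation(u, 1))
```

Because the state is a single integer between 1 and $2^{31} - 2$, the sequence must repeat after at most $2^{31} - 2$ calls, which a long simulation can exhaust.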

Example when the period is too small. (Figure: incorrect intervals (too large) versus correct intervals.)

(Figure: output of two parallel streams with the ns2 RNG.)

Using RNGs. Choose your RNG carefully: it must have a period several orders of magnitude larger than the maximum number of times you will call it in the simulation. The RNG uses a seed. For independent replications you need independent seeds; sequential runs are usually sufficient. If you use parallel runs, you must find a way to obtain truly independent seeds, e.g. based on the clock, or by using an RNG that supports parallel streams.
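One possible way to give each replication its own generator and seed is sketched below. It assumes Python's `random.Random` (a Mersenne Twister) and draws seeds from the operating system's entropy source rather than from the clock; `one_replication` is a hypothetical stand-in for a real simulation run.

```python
import random
import secrets
import statistics

def one_replication(rng, n=1000):
    """Hypothetical simulation run: the mean of n exponential service times."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

def replicate(n_runs):
    """Run n_runs replications, each with its own generator and recorded seed."""
    results = []
    for _ in range(n_runs):
        seed = secrets.randbits(128)     # fresh seed from the OS entropy source
        rng = random.Random(seed)        # the run owns its generator; no shared state
        results.append((seed, one_replication(rng)))
    return results

for seed, output in replicate(3):
    print(f"seed {seed:#x}: output {output:.4f}")
```

Recording the seed with each output keeps every run reproducible, which matters when a surprising result has to be re-examined.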

4. Sampling From a Distribution. The problem: given a distribution $F$ and an RNG, produce a sample that is drawn from this distribution. This is a common task in simulation. Matlab does it for us most of the time, but not always. Two generic methods: CDF inversion and rejection sampling.

CDF inversion for a distribution with CDF $F$. Applies to a real- or integer-valued random variable. For a continuous distribution: draw $U \sim U(0,1)$ using the RNG; then $X = F^{-1}(U)$ is drawn from the distribution with CDF $F$.

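Proof. $U$ is uniformly distributed: $P(U \le u) = u$ for $u \in [0, 1]$. We define $X$ by $X = F^{-1}(U)$, i.e. $F(X) = U$. Let us compute the CDF of $X$: $P(X \le x) = P(F(X) \le F(x))$ because $F$ is monotonic increasing. Therefore $P(X \le x) = P(U \le F(x)) = F(x)$.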
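As a worked instance of CDF inversion, the sketch below assumes the exponential distribution with rate $\lambda$, for which $F(x) = 1 - e^{-\lambda x}$ and $F^{-1}(u) = -\ln(1 - u)/\lambda$. The exponential is chosen here only because its inverse has a closed form; the function name is hypothetical.

```python
import math
import random

def sample_exponential(rate, rng=random):
    """Draw one exponential sample by CDF inversion."""
    u = rng.random()                       # U ~ U(0, 1), with u in [0, 1)
    return -math.log(1.0 - u) / rate       # X = F^{-1}(U)

rng = random.Random(0)
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]
print("sample mean (expect about 1/rate = 0.5):", sum(samples) / len(samples))
```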

CDF inversion when $F$ has jumps: replace the inverse by the pseudo-inverse. (Figure: CDF of an integer-valued random variable.)

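Sampling an integer-valued RV. We want to draw a sample of the random integer $N$. Let $p_k = P(N = k)$ for $k = 0, 1, 2, \dots$ and $F(n) = p_0 + \dots + p_n$, with $F(-1) = 0$. 1. Draw $U \sim U(0,1)$ using the RNG. 2. The output is the integer $n$ such that $F(n-1) \le U < F(n)$.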
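The two steps above can be sketched as follows, assuming a finite list of probabilities $p_0, \dots, p_m$ and a binary search over the cumulative sums (the function name is hypothetical).

```python
import bisect
import itertools
import random

def make_discrete_sampler(probabilities, rng=random):
    """Return a function that samples n with probability probabilities[n]."""
    cdf = list(itertools.accumulate(probabilities))      # F(0), F(1), ..., F(m)

    def sample():
        u = rng.random()
        # Smallest n with u < F(n); min() guards against rounding in the last F(m).
        return min(bisect.bisect_right(cdf, u), len(cdf) - 1)

    return sample

rng = random.Random(0)
sample = make_discrete_sampler([0.2, 0.5, 0.3], rng)
draws = [sample() for _ in range(10_000)]
print([draws.count(k) / len(draws) for k in range(3)])   # close to [0.2, 0.5, 0.3]
```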

What does this function compute? myfun(p): if rand() < p, return 0; else return 1. A. The flip of a coin where 1 is obtained with probability $p$ and 0 with probability $1 - p$ B. The flip of a coin where 1 is obtained with probability $1 - p$ and 0 with probability $p$ C. A sample of a geometric random variable (first choice of mean) D. A sample of a geometric random variable (second choice of mean) E. I don't know

Solution: The flip of a coin that returns 0 with probability $p$ is obtained as follows: draw $U \sim U(0,1)$; if $U < p$ return 0, else return 1. Answer B.

Which one is a uniform random point? (Figure: two panels, A and B, each showing 10 000 samples.) A. A B. B C. Both D. None E. I don't know

Solution: A is a regular (non-random) sample. B is sampled from the uniform distribution. Answer B.

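How do you sample a uniform point in a rectangle? Let the rectangle be $[a, b] \times [c, d]$. Take $X \sim \mathrm{Unif}(a, b)$ and $Y \sim \mathrm{Unif}(c, d)$, where $X$ and $Y$ are independent, because the distribution of $Y$ given $X = x$ is uniform in $[c, d]$ (and therefore does not depend on $x$). To draw $n$ points: do $n$ times: draw $U$ and $V$ from the RNG and output $(a + (b - a) U,\ c + (d - c) V)$; end.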
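A minimal sketch of this loop, where the rectangle bounds `a, b, c, d` and the function name are hypothetical:

```python
import random

def sample_rectangle(a, b, c, d, n, rng=random):
    """Draw n independent points uniformly in the rectangle [a, b] x [c, d]."""
    points = []
    for _ in range(n):
        x = a + (b - a) * rng.random()     # X ~ Unif(a, b)
        y = c + (d - c) * rng.random()     # Y ~ Unif(c, d), independent of X
        points.append((x, y))
    return points

print(sample_rectangle(0.0, 2.0, -1.0, 1.0, 3, random.Random(0)))
```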

We sample a point $(X, Y)$ uniformly in some arbitrary area $A$. Are $X$ and $Y$ independent? A. Yes B. It depends on the area C. No D. I don't know

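Solution: When the area is a rectangle, we have seen that the answer is yes. For a general area, the distribution of $Y$ given $X = x$ depends on $x$ (figure: the range of $Y$ given one value of $X$ differs from the range given another value), so $X$ and $Y$ are not independent.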

What is the PDF of the uniform distribution in $A$? A. $f(x, y) = 1$ if $(x, y) \in A$ B. $f(x, y)$ equal to another expression if $(x, y) \in A$ C. Something else D. I don't know

Solution: Answer C. The pdf is constant on $A$ (that is the definition of uniform) and 0 outside $A$, but the constant is such that the total probability is 1: $f(x, y) = \frac{1}{\mathrm{area}(A)}$ if $(x, y) \in A$, and 0 otherwise.

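How can we sample a point uniformly distributed in this area? (Figure: the area $A$ and two candidate algorithms.) Algorithm A: draw $X$ uniformly over the horizontal extent of the area; then, depending on the sub-interval in which $X$ falls, draw $Y$ uniformly in the corresponding vertical range. Algorithm B: repeat: draw $(X, Y)$ uniformly in a rectangle that contains the area; until $(X, Y) \in A$. A. A B. B C. Both D. None E. I don't know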

Solution: With Algorithm A, $(X, Y)$ is not uniformly distributed: one part of the area has a higher chance of being hit than the rest. A is wrong.

Solution: Algorithm B (repeat: draw $(X, Y)$ uniformly in the rectangle; until $(X, Y) \in A$) first computes a point that is uniformly distributed in the rectangle and rejects it if it does not pass the test. This is called rejection sampling. It produces a sample of the conditional distribution of $(X, Y)$ given that $(X, Y) \in A$ (Theorem 6.2, see later).

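Solution: B produces a sample of the conditional distribution of $(X, Y)$ given that $(X, Y) \in A$. What is this distribution? For any set $B \subseteq A$: $P((X, Y) \in B \mid (X, Y) \in A) = \frac{P((X, Y) \in B)}{P((X, Y) \in A)} = \frac{\mathrm{area}(B)}{\mathrm{area}(A)}$. Thus the pdf is $\frac{1}{\mathrm{area}(A)} 1_{\{(x, y) \in A\}}$, i.e. the uniform distribution on $A$.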

Rejection sampling for a conditional distribution (Theorem 6.2). Notation in this theorem versus the previous example: the points drawn inside the loop are Unif(rectangle); the returned point is Unif($A$).

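Recap: A uniformly distributed point in an area $A$ has pdf equal to $\frac{1}{\mathrm{area}(A)} 1_{\{(x, y) \in A\}}$. It can be sampled by rejection sampling: repeat: sample a point $(X, Y)$ uniform in a bounding box around $A$; until $(X, Y) \in A$; return $(X, Y)$. Note: the distribution of the returned point is not the same as that of the point sampled inside the loop.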
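The recap can be sketched as follows. The area used here, the unit disk with bounding box $[-1, 1]^2$, is a hypothetical example chosen for the illustration and is not the area shown on the slides; the function names are also hypothetical.

```python
import random

def sample_uniform_in_area(contains, box, rng=random):
    """Rejection sampling: draw uniformly in the bounding box until the point is in the area."""
    x_min, x_max, y_min, y_max = box
    while True:
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(y_min, y_max)
        if contains(x, y):                 # accept only points inside the area
            return x, y

def in_unit_disk(x, y):
    """Membership test for the example area (an assumption, not from the slides)."""
    return x * x + y * y <= 1.0

rng = random.Random(0)
print([sample_uniform_in_area(in_unit_disk, (-1, 1, -1, 1), rng) for _ in range(3)])
```

The expected number of iterations per returned point is area(box)/area(A), so a tight bounding box keeps the loop short.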

What does this algorithm compute? 1. Sample $X$ uniformly in $[-10, 10]$; 2. compute $q$, a function of $X$ with values in $[0, 1]$ that involves $\sin X$; 3. with probability $q$ return $X$; with probability $1 - q$ goto 1. A. A sample of the random variable with pdf $K q(x)$ for some constant $K$ B. A sample of the random variable with a second candidate pdf, for some constant $K$ C. A sample of the random variable with a third candidate pdf, for some constant $K$ D. Nothing, it never terminates E. None of the above F. I don't know

Solution: Intuitively, this algorithm chooses $x$ in $[-10, 10]$ with a probability proportional to the acceptance probability computed in step 2. Answer A is correct. Formally, we will see the proof next. Note that the constant $K$ is given by the condition total probability = 1, i.e. the pdf integrates to 1 over $[-10, 10]$. It is called a normalizing constant.

Rejection Sampling for General Distributions

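Example: use rejection sampling to draw a sample of $X$ with pdf $f$. We take a pdf $g$ that is easy to sample from and a constant $c$ such that $f(x) \le c\, g(x)$ for all $x$. The rejection sampling algorithm is: 1. Sample $X \sim g$ and $U \sim \mathrm{Unif}(0, 1)$. 2. If $U \le f(X) / (c\, g(X))$, return $X$; else goto 1.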
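A minimal sketch of rejection sampling with a uniform proposal follows. The target pdf, proportional to $\sin^2(x)/x^2$ on $[-10, 10]$, is an illustrative assumption and may differ from the example on the slides; with a uniform proposal the acceptance test reduces to $U \le q(X)$ for an unnormalized density $q \le 1$. Function names are hypothetical.

```python
import math
import random

def q(x):
    """Unnormalized target density, bounded by 1 (illustrative choice)."""
    return 1.0 if x == 0.0 else (math.sin(x) / x) ** 2

def sample_by_rejection(accept_prob, low, high, rng=random):
    """Sample from the pdf proportional to accept_prob on [low, high]."""
    while True:
        x = rng.uniform(low, high)            # proposal: uniform on [low, high]
        if rng.random() <= accept_prob(x):    # accept with probability accept_prob(x)
            return x

rng = random.Random(0)
samples = [sample_by_rejection(q, -10.0, 10.0, rng) for _ in range(5_000)]
central = sum(1 for x in samples if abs(x) < math.pi) / len(samples)
print("fraction of samples in the central lobe:", central)
```

The normalizing constant never has to be computed: only the ratio between the target and the scaled proposal enters the acceptance test.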

Are these two algorithms (A and B, shown in the figure) equivalent under the stated conditions? A. Yes B. No C. It depends D. I don't know

Solution: In Algorithm B, apply a change of variable and rewrite B as: 1. sample $X$ uniformly in its range and $U$ uniformly in its range; 2. if the acceptance test holds, return $X$, else goto 1. This is the same as Algorithm A. Answer A is correct.

(Figure: the pdf, and a histogram of 2 000 samples obtained by rejection sampling.)

We sampled 5 000 points from each of three distributions in the unit square $[0, 1]^2$, with pdfs numbered 1, 2 and 3 (pdf 2 is constant, equal to 1). The samples are shown in panels A, B and C. Who's what? A. A 1, B 2, C 3 B. A 1, B 3, C 2 C. A 2, B 1, C 3 D. A 2, B 3, C 1 E. A 3, B 1, C 2 F. A 3, B 2, C 1 G. I don't know

Solution: In A, the pdf is proportional to the distance from $(x, y)$ to a given line. B is uniform. In C, the pdf is proportional to the distance from $(x, y)$ to the center of the square. Answer F.

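Which algorithm is a correct sampling of the distribution in $[0, 1]^2$ with pdf $f(x, y) = K \sqrt{(x - 0.5)^2 + (y - 0.5)^2}$? Algorithm A: 1. Sample $X$ and $Y$ uniformly in $[0, 1]$ and $U \sim \mathrm{Unif}(0, 2)$; 2. If $U \le \sqrt{(X - 0.5)^2 + (Y - 0.5)^2}$, return $(X, Y)$, else goto 1. Algorithm B: 1. Sample $X$ and $Y$ uniformly in $[0, 1]$ and $U \sim \mathrm{Unif}(0, 1)$; 2. If $U \le \sqrt{(X - 0.5)^2 + (Y - 0.5)^2}$, return $(X, Y)$, else goto 1. A. A B. B C. Both D. None E. I don't know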

Solution: Both are correct implementations of rejection sampling. We need $U \sim \mathrm{Unif}(0, c)$ where $c \ge \sqrt{(x - 0.5)^2 + (y - 0.5)^2}$ for all $(x, y)$ in the square. Algorithm B is more efficient as it requires fewer iterations on average. An even more efficient implementation would sample $U \sim \mathrm{Unif}(0, \sqrt{2}/2)$.
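The most efficient variant mentioned above can be sketched as follows, assuming the target pdf is proportional to the distance from $(x, y)$ to the center $(0.5, 0.5)$ of the unit square (the function name and the `u_max` parameter are hypothetical).

```python
import math
import random

def sample_distance_pdf(rng=random, u_max=math.sqrt(2) / 2):
    """Sample (x, y) in the unit square with pdf proportional to the distance to the center."""
    while True:
        x, y = rng.random(), rng.random()        # proposal: uniform in the unit square
        u = rng.uniform(0.0, u_max)              # u_max must bound the distance everywhere
        if u <= math.hypot(x - 0.5, y - 0.5):    # accept with probability distance / u_max
            return x, y

rng = random.Random(0)
pts = [sample_distance_pdf(rng) for _ in range(5_000)]
mean_dist = sum(math.hypot(x - 0.5, y - 0.5) for x, y in pts) / len(pts)
print("mean distance to the center:", mean_dist)
```

Passing `u_max=1.0` or `u_max=2.0` gives the same distribution with more rejected draws, because $\sqrt{2}/2$ is the largest distance from a point of the square to its center.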

Another Sample from a Weird Distribution

6.3 Ad Hoc Methods. Optimized methods exist for some common distributions (optimization = reduce computing time). If they are implemented in your tool, use them! Example: simulating a normal distribution. The inversion method is not simple (no closed form for $F^{-1}$). The rejection method is possible. But a more efficient method exists, for jointly drawing 2 independent normal RVs. There are also ad hoc methods for n-dimensional normal distributions (Gaussian vectors).
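One well-known method that draws two independent standard normal variables at once is the Box-Muller transform. Whether it is the method the slides have in mind is an assumption; the sketch below is only an illustration, and the function name is hypothetical.

```python
import math
import random

def box_muller(rng=random):
    """Return two independent N(0, 1) samples from two independent uniforms."""
    u1 = 1.0 - rng.random()                  # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

rng = random.Random(0)
pairs = [box_muller(rng) for _ in range(10_000)]
mean_first = sum(z1 for z1, _ in pairs) / len(pairs)
print("mean of the first component (expect about 0):", mean_first)
```

In practice the generator built into your tool (for example `random.gauss` in Python) is the sensible default, as the slide recommends.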

(Figure: 1 000 samples of the random Gaussian vector $(X_1, X_2)$ with zero mean and a diagonal covariance matrix.) $X_1$ and $X_2$ are independent.

(Figure: 1 000 samples of the random Gaussian vector $(X_1, X_2)$ with zero mean and a non-diagonal covariance matrix.) $X_1$ and $X_2$ are not independent.
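A correlated Gaussian vector can be sampled from independent standard normals through a Cholesky factor of the covariance matrix. The sketch below handles the 2-dimensional case by hand; the covariance matrix used in the demo is a hypothetical example, not the one from the slides.

```python
import math
import random

def sample_gaussian_vector(cov, rng=random):
    """Sample (X1, X2), zero mean, covariance cov = ((a, b), (b, c)), via a 2x2 Cholesky factor."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)           # requires cov to be positive definite
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)   # independent N(0, 1)
    return l11 * z1, l21 * z1 + l22 * z2

rng = random.Random(0)
cov = ((1.0, 0.8), (0.8, 1.0))               # hypothetical covariance matrix
pts = [sample_gaussian_vector(cov, rng) for _ in range(10_000)]
print("sample covariance of X1 and X2:", sum(x * y for x, y in pts) / len(pts))
```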

What is the distribution of $X_1$? A. Normal with $\mu = 0$ and one value of $\sigma$ B. Normal with $\mu = 0$ and another value of $\sigma$ C. Normal but with other parameters D. Non normal E. I don't know

Solution: $X_1$ has a normal distribution (the marginal of a Gaussian random vector is Gaussian, i.e. normal). The mean is 0 because the mean of the vector is 0. The variance is given by the covariance matrix and is 1. Answer 1.

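5. Monte Carlo Simulation. A simple method to compute integrals and probabilities. Idea: interpret the integral or probability as an expectation $E(\varphi(X))$ and assume you can simulate as many independent samples of $X$ as you want: 1. generate $R$ replicates $X^1, \dots, X^R$; 2. the Monte Carlo estimate of $E(\varphi(X))$ is $\hat\mu = \frac{1}{R} \sum_{r=1}^{R} \varphi(X^r)$; 3. compute a confidence interval for the mean (since $E(\varphi(X))$ is the mean of the distribution of $\varphi(X)$) as $\hat\mu \pm 1.96\, \hat\sigma / \sqrt{R}$, where $\hat\sigma$ is the sample standard deviation of the $\varphi(X^r)$.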

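Example: compute $E\left(\sqrt{(X_1 - X_2)^2 + (Y_1 - Y_2)^2}\right)$, i.e. the average distance between two points uniformly distributed in the unit square $[0, 1]^2$. Say which one is a correct implementation of the Monte Carlo method.
Algorithm A: 1. Draw $R$ points $(x_1^r, y_1^r, x_2^r, y_2^r)$, random uniform in $[0, 1]$, and let $d^r$ be the distance between $(x_1^r, y_1^r)$ and $(x_2^r, y_2^r)$. 2. $\hat\mu = \mathrm{mean}(d^r, r = 1{:}R)$. 3. $\hat\sigma = \mathrm{std}(d^r, r = 1{:}R)$. 4. Confidence interval: $\hat\mu \pm 1.96\, \hat\sigma$.
Algorithm B: $s_1 = 0$, $s_2 = 0$; do $R$ times: sample $(x_1, y_1, x_2, y_2)$ iid Unif(0, 1); $d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$; $s_1 = s_1 + d$; $s_2 = s_2 + d^2$; end. $\hat\mu = s_1 / R$; $\hat\sigma^2 = s_2 / R - \hat\mu^2$; confidence interval: $\hat\mu \pm 1.96\, \hat\sigma / \sqrt{R}$.
A. A B. B C. Both D. None E. I don't know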

Solution: Algorithms A and B compute the same $\hat\mu$ and $\hat\sigma$, but Algorithm A, line 4, uses a wrong formula for the confidence interval (it uses the formula for a prediction interval). Answer B.

Example: compute $E\left(\sqrt{(X_1 - X_2)^2 + (Y_1 - Y_2)^2}\right)$, i.e. the average distance between two points uniformly distributed in the unit square. We obtain the following Monte Carlo estimates:

R            Estimate   95% confidence interval
1 000        0.5254     [0.5102, 0.5405]
10 000       0.5227     [0.5178, 0.5276]
100 000      0.5206     [0.5190, 0.5221]
1 000 000    0.5216     [0.5211, 0.5221]

(The answer is known and is equal to 0.5214 [Ghosh B. 1943, On the distribution of random distances in a rectangle].)
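The experiment can be sketched with running sums and the confidence interval for the mean, $\hat\mu \pm 1.96\, \hat\sigma / \sqrt{R}$. The function name and the seed are hypothetical, and the numbers produced will differ from the table since they depend on the random stream.

```python
import math
import random

def monte_carlo_distance(n_replicates, seed=0):
    """Estimate the mean distance between two uniform points in the unit square."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n_replicates):
        x1, y1, x2, y2 = (rng.random() for _ in range(4))
        d = math.hypot(x1 - x2, y1 - y2)
        s1 += d                                   # running sum of distances
        s2 += d * d                               # running sum of squared distances
    mean = s1 / n_replicates
    std = math.sqrt(s2 / n_replicates - mean ** 2)
    half = 1.96 * std / math.sqrt(n_replicates)   # CI for the mean, not a prediction interval
    return mean, (mean - half, mean + half)

for r in (1_000, 10_000, 100_000):
    estimate, (low, high) = monte_carlo_distance(r)
    print(f"R = {r:>7}: {estimate:.4f}  [{low:.4f}, {high:.4f}]")
```

The interval width shrinks like $1/\sqrt{R}$, so each extra digit of accuracy costs about 100 times more replicates.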

Is it a good idea to use a confidence interval for the mean? A. Yes when $R$ is large B. Yes but it would be safer to use a CI for the median C. No because we don't know if the output is normally distributed D. I don't know

Solution: We are estimating the mean of the distribution of the distance, therefore we have no choice: we must use a CI for the mean. The formula is valid when $R$ is large, which is typically the case when we do a Monte Carlo simulation. Answer A.

Conclusion: Simulating well requires knowing the concepts of transience, confidence intervals and sampling methods.