1 Simulation Where real stuff starts
ToC 1. What is a simulation? 2. Accuracy of output 3. Random Number Generators 4. How to sample 5. Monte Carlo 6. Bootstrap 2
1. What is a simulation? 3
What is a simulation? An experiment in computer Important differences Simulated vs real time Serialization of events Initialization 4
5 Two simulation examples. What is the Simulation 1: Time to complete jobs customers arrive (at the same time) at Joe s shop and download a file. We want to estimate the time it takes to serve the customers difference? Simulation 2: Delay at a server An information server handles requests according to some scheduling policy. We want to estimate the time it takes to serve one request
Terminating versus non terminating simulation 6
7 Two Simulation Runs What is the difference? Scenario 1 Scenario 2 Average service time = 96 ms Average service time = 101 ms Request arrival rate = 100 /s
8 There is no stationary regime walk to infinity 10 /s Transient phase due to atypical initial conditions followed by typical (=stationary) phase You want to measure things only in the stationary phase Otherwise: non reproducible results, non typical
9 Definition of Stationarity A property of a stochastic model X t Let X t be a stochastic model that represents the state of the simulator at time t. It is stationary iff for any s>0 (X t1, X t2,, X tk ) has same distribution as (X t1+s, X t2+s,, X tk+s ) i.e. simulation does not get old
Classical Cases Markov models Definition: State X t is sufficient to draw the future of the simulation Common case for all simulations For a Markov model, over a discrete state space If you run the simulation long enough it will either walk to infinity (unstable) or converge to a stationary regime Ex: queue with >1: unstable queue with <1: becomes stationary after transient If the state space is strongly connected (any state can be reached from any state) then there is 0 or 1 stationary regime Ex: queue Else, there may be several distinct stationary regimes Ex: system with failure modes 10
Stationarity and Transience Knowing whether a model has a stationary regime is sometimes a hard problem We will see important models where this is solved Ex: some queuing systems Ex: time series models Reasoning about your system may give you indications Do you expect growth? Do you expect seasonality? Once you believe your model is stationary, you should handle transients in order to eliminate the impact of initial conditions Remove (how? Look at your output and guess) Sometimes it is possible to avoid transients at all (perfect simulation see later Importance of the View Point ) 11
Typical Reasons For Non Stationarity Obvious dependency on time Seasonality, growth Can be ignored at small time scale (minute) Non Stability: Explosion Queue with utilization factor >1 Non Stability: Freezing Simulation System becomes slower with time (aging) Typically because there are rare events of large impact («Kings») The longer the simulation, the larger the largest king We ll come back to this in the chapter «Importance of the View Point» 12
The state of a simulation at time is drawn at random from a distribution F(). Is this a stationary simulation? A. Yes B. It depends on the distribution is a sample drawn from the distribution 23, 100 C. It depends if the simulation terminates or not 56% D. No 26% E. I don t know 15% 4% 0% It depends on the distribut Yes It depends if the simulation No I don t know 13
14 Solution This simulation is stationary because the joint distribution of is the same for all It is the distribution of a vector of independent random variables drawn from is a sample drawn from the distribution 23, 100 It is statistically impossible to see whether a part of the simulation is young or old
15 2 Accuracy of Simulation Output A stochastic simulation produces a random output, we need confidence intervals 30 runs of Joe s shop: estimate the time to serve a batch of customers
16 Confidence Intervals Median Mean
When computing these confidence intervals we need to make sure that A. the simulation runs are independent B. the output is normally distributed 60% C. Both D. None 24% E. I don t know 16% 0% 0% the simulation runs are... the output is normally di... Both None I don t know 17
18 Solution If we use the median we need to ensure that the runs are independents If we use the mean we also need to ensure that the runs are independents that we are in the domain of validity of theorem 2.2.2 which is probably OK when we will see a more formal answer next Answer A We will now see how to make sure simulation runs are independent
3 Random Number Generator A stochastic simulation does not use truly random numbers but use pseudo random numbers The Random Number Generator produces a random number U(0,1) based on a chaotic sequence Example (obsolete but commonly used, eg in ns2: ) Let us check if output looks random 19
20 The Linear Congruential Generator of ns2 uniform qq plot appears to be uniform autocorrelation Auto correlation is negligible
Lag Diagram, 1000 points 21 appears to be independent of
22 Period of RNG This RNG appears to produce an iid, uniform output However, it is in fact periodic Any sequence generated from a deterministic algorithm in a given computer must be periodic (because the number of states is finite) But we can require that the period should be much larger than maximum number of uses The ns2 simulator has period, which is too small The Mersenne twister (matlab s default) has period
23 Example when the period is too small Incorrect intervals (too large) Correct intervals
for two parallel streams with the ns2 RNG 24
Using RNGs Be careful to have a RNG that has a period orders of magnitude larger than what you will ever use in the simulation choose your RNG carefully The RNG must have a period several orders of magnitude larger than the maximum number of times you will call it The RNG uses a seed for independent replications you need independent seeds; sequential runs usually are sufficient. If you use parallel runs, you must find an way to obtain truly independent seeds Eg based on clock Or by using an RNG that supports parallel streams 25
26 4 Sampling From A Distribution Problem is: Given a distribution, and a RNG, produce some sample that is drawn from this distribution A common task in simulation Matlab does it for us most of the time, but not always Two generic methods CDF inversion Rejection sampling
CDF Inversion for distribution with CDF Applies to real or integer valued random variable For a continuous distribution: draw using RNG; then is drawn from the distribution with CDF CDF 27
28 Proof is uniformly distributed: for We define by, i.e. Let us compute the CDF of because is monotonic increasing Therefore
29
30 CDF Inversion, when has jumps Replace inverse by pseudo inverse CDF of an integer valued random variable
31 Sampling an integer valued RV We want to draw a sample of the random integer. Let for and such that 1. Draw using the RNG 2. Output is such that
32
33 What does this function compute? myfun if rand 0 else 1 A. The flip of a coin where 1 is obtained with proba and 0 with proba 1 B. The flip of a coin where 1 is obtained with proba 1 and 0 with proba C. A sample of a geometric random variable with mean D. A sample of a geometric random variable with mean E. I don t know The flip of a coin where 1 is... The flip of a coin where 1 is... 13% 71% A sample of a geometric r... A sample of a geometric ra... 17% 0% 0% I don t know
34 Solution The flip of a coin that returns 0 with probability follows If return 0 else return 1 is obtained as Answer B
Which one is a uniform random point? A. A B. B C. Both D. None E. I don t know A 10 000 samples 64% B 16% 16% A B Both None 4% I don t know 0% 35
36 Solution A is a regular (non random) sample A B is sampled from the uniform distribution Answer B B
How do you sample a uniform point in a rectangle? 37
How do you sample a uniform point in a rectangle? where and are independent, because the distribution of given is uniform in (and therefore does not depend on do times Draw from the RNG end 38
We sample a point uniformly in some arbitrary area. Are and independent? A. Yes B. It depends on the area C. No D. I don t know 38% 38% 23% Yes It depends on the are No I don t know 0% 39
Solution When area is a rectangle, we have seen that the answer is yes For a general area, the distribution of given depends on Given Given are not independent 40
What is the PDF of the uniform distribution in?, if, if C. Something else D. I don t know 48% 32% 8% 12% 41 $$ $$,$$ $$,$$ = 1 if ($$, $$ $$,$$ $$,$$ = $$$$ if Something else I don t know
42 Solution Answer C The pdf is constant on (that is the definition of uniform) and 0 outside A but the constant is such that the total probability is 1, if
43 How can we sample a point uniformly distributed in this area? A B draw 10,20 if 0; 10 draw 20; 40 else draw 0; 20 repeat draw 10,20, 20; 40 until, A. A B. B C. Both D. None E. I don t know A 25% 38% B Both 38% None 0% I don t know 0%
44 Solution is not uniformly distributed. a higher chance of falling in A is wrong has
45 Solution B repeat draw 10,20, 20; 40 until, B first computes a point that is uniformly distributed and rejects it if it does not pass the test This is called rejection sampling It produces a sample of the conditional distribution of that (theorem 6.2 see later) given
46 Solution B produces a sample of the conditional distribution of given that What is this distribution?, Thus the pdf is,, i.e. the uniform distribution.
47 Rejection Sampling for Conditional Distribution Notation in this theorem Previous example, Unif rectangle), Unif rectangle), Unif A)
48 Recap A uniformly distributed point in an area A has pdf equal to,, It can be sampled by rejection sampling sample a point reject if not in until return uniform in a bounding box around Note: the distribution here is not the same as there
What does this algorithm compute? 1. sample uniformly in 10,10 ; 2. compute 3. with probability return with probability 1 goto 1 sin A. A sample of the random variable with pdf for some constant B. A sample of the random variable with pdf for some constant C. A sample of the random variable with pdf 1 for some constant D. Nothing, it never terminates E. None of the above F. I don t know A sample of the random var.. A sample of the random var.. 50% 14% A sample of the random var.. 9% Nothing, it never terminate 9% None of the above 14% I don t know 5% 49
50 Solution Intuitively, this algorithm chooses in with a probability proportional to Answer A is correct Formally, we will see the proof next. Note that the constant total probability =1, i.e. It is called a normalizing constant is given by the condition:
Rejection Sampling for General Distributions 51
Example: use rejection sampling to draw a sample of with pdf we take The rejection sampling algorithm is: 1. Sample and 2. If / return else goto 1 52
Are these algorithms equivalent when and? A. Yes B. No C. It depends D. I don t know A B 33% 38% 14% 14% Yes No It depends I don t know 53
54 Solution A B In Algorithm B, let Re write B as: 1. Sample uniformly in and ; 2. If return else goto 1 This is the same as Algorithm A Answer A is correct.
55
56 pdf histogram of 2 000 samples obtained by rejection sampling
We sampled 5 000 points from the distribution in with pdf Who s what? A. A 1, B 2, C 3 B. A 1, B 3, C 2 C. A 2, B 1, C 3 D. A 2, B 3, C 1 E. A 3, B 1, C 2 F. A 3, B 2, C 1 G. I don t know, 1,, 0.5 0.5 14% 14% 14% 14% 14% 14% 14% A 1, B 2, C 3 A 1, B 3, C 2 A 2, B 1, C 3 A 2, B 3, C 1 A 3, B 1, C 2 A 3, B 2, C 1 I don t know 57
58 Solution A is proportional to the distance from to the line B is uniform C is proportional to the distance from square to the center of the Answer F
Which algorithm is a correct sampling of the distribution in with pdf? A. A B. B C. Both D. None E. I don t know A B, 0.5 0.5 1. Sample and uniformly in 0,1 and ~Unif 0, 2 ; 2. If, return, else goto 1 1. Sample and uniformly in 0,1 and ~Unif 0,1 ; 2. If, return, else goto 1 45% 36% 14% 5% 0% A B Both None I don t know 59
60 Solution Both are correct implementations of rejection sampling. We need where, for all Algorithm B is more efficient as it requires fewer iterations in average. An even more efficient implementation would sample U\simUnif(0, \sqrt(2)/2)u\simunif(0, \sqrt(2)/2)
61
Another Sample from a Weird Distribution 62
6.3 Ad Hoc Methods Optimized methods exist for some common distributions Optimization = reduce computing time If implemented in your tool, use them! Example: simulating a normal distribution Inversion method is not simple (no closed form for F 1 ) Rejection method is possible But a more efficient method exists, for drawing jointly 2 independent normal RVs There are also ad hoc methods for n dimensional normal distributions (gaussian vectors) 63
64 1 000 samples of the random Gaussian vector with zero mean and covariance matrix and are independent
65 1 000 samples of the random Gaussian vector with zero mean and covariance matrix are not independent
Non normal I don t know What is the distribution of? A. Normal with B. Normal with C. Normal but with other parameters D. Non normal E. I don t know 47% 26% 11% 16% 0% 66 Normal with $$=0,$$=0 Normal but with other par. Normal with $$=0,$$
67 Solution has a normal distribution (the marginal of a Gaussian random vector is Gaussian = normal) The mean is 0 because the mean of The variance is given by the covariance matrix and is 1. Answer 1
68 4 Monte Carlo Simulation A simple method to compute integrals and probabilities Idea: interpret the integral or probability and assume you can simulate as many independent samples of X as you want: 1. generate replicates 2. the Monte Carlo estimate of is 3. compute a confidence interval for the mean (since is the mean of the distribution of ) as
Example: Compute i.e. the average distance between two points uniformly distributed in the unit square 0 ; 1. Say which one is a correct implementation of the Monte-Carlo method: A 1. Draw points, random uniform in 0,1 2. mean,, 1: 3. std,, 1: 4. 1.96 A. A B. B C. Both D. None E. I don t know 0, 0 B do times sample (,,, ) iid Unif (0,1) end / 18% 1.96 36% 32% 5% 9% A B Both None I don t know 69
70 Solution Algorithms A and B compute the same and but algorithm A, line 4, uses a wrong formula for the confidence interval (it uses the formula for a prediction interval) Answer B
Compute Example i.e. the average distance between two points uniformly distributed in the unit square. We obtain the following Monte Carlo estimates R 95% confidence interval 1 000 0.5254 0.5102 0.5405 10 000 0.5227 0.5178 0.5276 100 000 0.5206 0.5190 0.5221 1 000 000 0.5216 0.5211 0.5221 (The answer is known and is equal to 0.5214 [Ghosh B. 1943, On the distribution of random distances in a rectangle]) 71
Is it a good idea to use a confidence interval for the mean? A. Yes when is large B. Yes but it would be safer to use a CI for median C. No because we don t know if the output is normally distributed 68% D. I don t know 16% 11% 5% Yes when$$ is larg Yes but it would be safer t Nobecausewedon t kno I don t know 72
73 Solution We are estimating the mean of the distribution of we have no choice, we must use a CI for the mean therefore The formula is valid when is large, which is typically the case when we do a Monte Carlo simulation. Answer A
Conclusion Simulating well requires knowing the concepts of Transience Confidence intervals Sampling methods 74