Unit 5a: Comparisons via Simulation. Kwok Tsui (and Seonghee Kim) School of Industrial and Systems Engineering Georgia Institute of Technology



1 Unit 5a: Comparisons via Simulation Kwok Tsui (and Seonghee Kim) School of Industrial and Systems Engineering Georgia Institute of Technology

2 Motivation Simulations are typically run to compare 2 or more alternative system designs or scenarios. Simulations, as all models, provide better estimates of relative difference than they do absolute performance because the same simplifications go into all the models.

3 Types of Comparisons Determining which scenarios have similar performance. Determining which scenarios are better than a standard or default. Determining which scenario is the best. Determining how a system's performance changes as a function of controllable parameters, or optimizing over the parameters.

4 When are scenarios different? There is a distinction between statistical and practical difference. A practically meaningful difference depends on the problem at hand: 5 minutes in cycle time; $10,000 on a portfolio's return; 100 people being unable to connect.

5 Continued Statistical significance depends on how much sampling variability there is in the point estimate: A 95% confidence interval for the difference in expected cycle time between model A and B is 4 ± 5 minutes. What can we conclude? What if it is 4 ± 1 minute?

6 Controlling Significance We use statistical procedures to tell us whether we can believe the difference we see in the results from two or more simulations. We use the number of replications to control the size of the differences that are detectable; that is, to control the error in our estimates.

7 Special Opportunities In simulation, more so than in other statistical experiments, we control the source of randomness. By using the same random numbers to drive the simulation of each scenario we achieve sharper comparisons. This is known as correlated sampling or common random numbers (CRN).

8 Intuition behind CRN We want each scenario to see the same source of randomness (demands for product, service times, failed machines, customer arrivals, etc.). CRN implies that differences in observed performance will be primarily due to differences in the scenarios, not differences in the random inputs.

9 Impact of CRN Example 12.2 (Dump Truck). [Table: average response time by replication, Two Loaders vs. One Loader; values not recovered.] The outputs are variable, but CRN makes it easy to see that two loaders give a smaller response time.

10 Math behind CRN $\mathrm{Var}(Y_1 - Y_2) = \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) - 2\,\mathrm{Cov}(Y_1, Y_2)$. If scenarios are simulated independently (different random numbers), then Cov = 0. But if we use CRN then Cov > 0 (usually), reducing the variance of the difference.
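The variance identity above can be checked empirically. The following is a minimal sketch, not part of the original slides: the two "scenarios" are hypothetical (exponential service times with rates 1.0 and 1.2), and `mean_service` and `difference_samples` are illustrative names. Reusing the same seed for both scenarios stands in for CRN.

```python
import random
import statistics

def mean_service(rate, rng, n=200):
    """Average of n exponential service times with the given rate."""
    return sum(rng.expovariate(rate) for _ in range(n)) / n

def difference_samples(use_crn, reps=500):
    """Replicate Y1 - Y2 for two hypothetical scenarios (rates 1.0 and 1.2)."""
    diffs = []
    for r in range(reps):
        rng1 = random.Random(r)
        # CRN: scenario 2 reuses scenario 1's seed; otherwise an unrelated seed.
        rng2 = random.Random(r if use_crn else 10_000 + r)
        diffs.append(mean_service(1.0, rng1) - mean_service(1.2, rng2))
    return diffs

var_indep = statistics.variance(difference_samples(use_crn=False))
var_crn = statistics.variance(difference_samples(use_crn=True))
print(f"Var(Y1 - Y2), independent sampling: {var_indep:.6f}")
print(f"Var(Y1 - Y2), common random numbers: {var_crn:.6f}")
```

In this toy setting the two outputs are almost perfectly correlated under CRN, so the reduction is far larger than a real model would typically show.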

11 CRN Happens Note that CRN is, essentially, the default experiment design unless we explicitly do something to cause each simulation to use different random numbers. However, there are things we can do to make the effect of CRN stronger.

12 Making CRN Work The effect of CRN is enhanced if the same random number is used for the same purpose in each simulated scenario. The primary way to make this happen is to assign a distinct random number stream to each distinct input process (interarrival times, service times, etc.)

13 What are Streams? Remember that pseudorandom numbers are provided by a generator with a (very) long period. Streams are just different starting places (very far apart) within this long sequence. Arena has many streams(1.8 * )

14 Making CRN Work Better Use the same stream for an input process even if the distribution changes. Model A service time: Expo(7.1, 9). Model B service time: Tria(2, 6, 12, 9). In both cases the final argument (9) is the stream. If entities get any randomly assigned attributes, then assign them all at once when the entity is created.
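A minimal sketch of dedicated streams outside Arena, using Python's standard `random` module. The stream numbers, the `make_streams` helper, and the seed scheme are assumptions for illustration; the sketch also assumes Expo(7.1, 9) denotes an exponential with mean 7.1 and Tria(2, 6, 12, 9) a triangular with min 2, mode 6, max 12.

```python
import random

# Hypothetical stream numbers, one per input process.
ARRIVAL_STREAM, SERVICE_STREAM = 1, 9

def make_streams(base_seed=12345):
    """One independent generator per input process (seed scheme is illustrative)."""
    return {
        ARRIVAL_STREAM: random.Random(f"{base_seed}-{ARRIVAL_STREAM}"),
        SERVICE_STREAM: random.Random(f"{base_seed}-{SERVICE_STREAM}"),
    }

def service_time_a(streams):
    # Model A: exponential service time with mean 7.1, drawn from stream 9.
    return streams[SERVICE_STREAM].expovariate(1 / 7.1)

def service_time_b(streams):
    # Model B: triangular(min=2, mode=6, max=12), also drawn from stream 9.
    return streams[SERVICE_STREAM].triangular(2, 12, 6)

streams_a, streams_b = make_streams(), make_streams()
service_a = [service_time_a(streams_a) for _ in range(3)]
service_b = [service_time_b(streams_b) for _ in range(3)]
# Arrivals come from their own stream, so both models see identical arrivals
# regardless of how the service distribution changed.
arrivals_a = [streams_a[ARRIVAL_STREAM].expovariate(1.0) for _ in range(3)]
arrivals_b = [streams_b[ARRIVAL_STREAM].expovariate(1.0) for _ in range(3)]
print(arrivals_a == arrivals_b)
```

Because the service-time distribution changes only what is done with stream 9, the arrival process is untouched and stays identical across the two models.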

15 Making it Work EVEN Better We want Models A and B to use the same random numbers for the same purpose on each replication of Model A and Model B (as much as possible). This is difficult because two models may consume different numbers of random numbers on each replication.

16 Burning Random Numbers We can skip random numbers at the end of each replication to synchronize them. Arena does it automatically. [Table: random numbers consumed by Models A and B on Rep 1, Rep 2, and Rep 3, with the leftover numbers burned at the end of each replication; index values not recovered.]
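The synchronization idea can be sketched in Python. Note the swapped-in technique: rather than literally burning leftover numbers, this sketch restarts the generator from a replication-specific seed, which has the same effect of giving both models the same starting point on every replication. `run_rep`, `first_draws`, and the draw counts (3 and 5) are hypothetical.

```python
import random

def run_rep(n_draws, rng):
    """Stand-in for one replication that consumes n_draws random numbers."""
    return [rng.random() for _ in range(n_draws)]

def first_draws(n_draws_per_rep, reps, synchronized, seed=2024):
    """Return the first random number that each replication sees."""
    rng = random.Random(seed)
    firsts = []
    for rep in range(reps):
        if synchronized:
            # Fresh, replication-specific starting point; plays the role of
            # burning the numbers left over from the previous replication.
            rng = random.Random(f"{seed}-{rep}")
        firsts.append(run_rep(n_draws_per_rep, rng)[0])
    return firsts

# Hypothetical models: A consumes 3 numbers per replication, B consumes 5.
unsync_a = first_draws(3, reps=3, synchronized=False)
unsync_b = first_draws(5, reps=3, synchronized=False)
sync_a = first_draws(3, reps=3, synchronized=True)
sync_b = first_draws(5, reps=3, synchronized=True)
print("unsynchronized reps start together:", unsync_a == unsync_b)
print("synchronized reps start together:  ", sync_a == sync_b)
```

Without synchronization the two models agree only on the first replication and then drift apart, because they consume different amounts of the sequence.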

17 Comparing Means A standard comparison of scenarios is via differences in their mean performance. A common way to compare means is to look for overlapping confidence intervals for each mean.

18 Box & Whisker Chart Whiskers show the maximum and minimum observations. The box shows a 95% c.i. for the mean. These intervals overlap.

19 Problems with Overlapping C.I.s If each individual interval has 95% confidence, then the overall confidence for all intervals simultaneously is < 95%. If the intervals don't overlap then the scenarios are different, but they may be different even when the intervals do overlap. This approach does not exploit CRN.

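20 Better Methods We will start with the case of K=2 scenarios, numbered 1 and 2.
Scenario    Outputs from R Reps                          Statistics
1           $Y_{11}, Y_{21}, Y_{31}, \ldots, Y_{R1}$     $\bar{Y}_1, S_1^2$
2           $Y_{12}, Y_{22}, Y_{32}, \ldots, Y_{R2}$     $\bar{Y}_2, S_2^2$
1 - 2       $D_1, D_2, D_3, \ldots, D_R$                 $\bar{D}, S_D^2$
Here $D_r = Y_{r1} - Y_{r2}$ is the difference on replication $r$.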

21 Paired-t Interval Interval for the difference in means $\theta_1 - \theta_2$. Allows unequal variances, and exploits CRN. Assumes normally distributed data. $\bar{D} \pm t_{\alpha/2,\,R-1} \sqrt{S_D^2 / R}$
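A minimal sketch of the paired-t interval in Python. The function name `paired_t_interval` and the data are made up for illustration, and the t critical value is passed in by the caller (here 2.776, the standard table value for α = 0.05 with 4 degrees of freedom) so that only the standard library is needed.

```python
import math
import statistics

def paired_t_interval(y1, y2, t_crit):
    """Return (D-bar, half-width) for the paired-t interval on theta1 - theta2.

    y1[r] and y2[r] must come from the same replication r (ideally with CRN).
    t_crit is the caller-supplied t quantile with R - 1 degrees of freedom.
    """
    if len(y1) != len(y2):
        raise ValueError("paired-t needs the same number of replications")
    diffs = [a - b for a, b in zip(y1, y2)]
    r = len(diffs)
    d_bar = statistics.mean(diffs)
    s2_d = statistics.variance(diffs)  # sample variance of the differences
    return d_bar, t_crit * math.sqrt(s2_d / r)

# Illustrative (made-up) outputs from R = 5 replications of each scenario.
y1 = [10.2, 11.5, 9.8, 12.1, 10.9]
y2 = [9.6, 10.7, 9.5, 11.2, 10.1]
d_bar, half_width = paired_t_interval(y1, y2, t_crit=2.776)
print(f"{d_bar:.2f} +/- {half_width:.2f}")
```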

22 Two-Sample t Interval Assumes equal variances, no CRN. Assumes normally distributed data. Has double the degrees of freedom of the paired-t. $\bar{Y}_1 - \bar{Y}_2 \pm t_{\alpha/2,\,2(R-1)} \sqrt{\dfrac{S_1^2 + S_2^2}{R}}$
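A minimal sketch of the two-sample t interval for equal sample sizes. The function name `two_sample_t_interval` and the data are made up, and the t critical value is supplied by the caller (2.306 is the standard table value for α = 0.05 with 2(5 - 1) = 8 degrees of freedom).

```python
import math
import statistics

def two_sample_t_interval(y1, y2, t_crit):
    """Return (Y1-bar - Y2-bar, half-width), assuming equal variances,
    independent samples, and R replications of each scenario.

    t_crit is the caller-supplied t quantile with 2(R - 1) degrees of freedom.
    """
    if len(y1) != len(y2):
        raise ValueError("this sketch assumes R replications of each scenario")
    r = len(y1)
    diff = statistics.mean(y1) - statistics.mean(y2)
    s2_sum = statistics.variance(y1) + statistics.variance(y2)
    return diff, t_crit * math.sqrt(s2_sum / r)

# Illustrative (made-up) outputs from R = 5 replications of each scenario.
y1 = [10.2, 11.5, 9.8, 12.1, 10.9]
y2 = [9.6, 10.7, 9.5, 11.2, 10.1]
diff, half_width = two_sample_t_interval(y1, y2, t_crit=2.306)
print(f"{diff:.2f} +/- {half_width:.2f}")
```

This interval ignores any pairing between replications, so positive correlation induced by CRN does not narrow it.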

23 Comparison We typically prefer paired t because we have no reason to believe variances will be equal. Provided the number of reps is 10 or more, even a little bit of positive correlation from CRN will overcome the loss of degrees of freedom.

24 Practical Significance When we construct confidence intervals for $\theta_1 - \theta_2$ we want to be able to detect differences that matter. If we want to detect differences of more than ±ε, then after $R_0$ initial replications we set $R \ge \left( \dfrac{t_{\alpha/2,\,R_0-1}\, S_D}{\varepsilon} \right)^2$

25 Example 12.1 From 10 reps we get an estimate of the difference in response time between two configurations for vehicle inspection of 0.4 ± 0.9 minutes with 95% confidence. Suppose a difference of ± 0.5 minutes matters.

26 Example 12.1 continued $R \ge \left( \dfrac{t_{0.05/2,\,10-1}\, S_D}{\varepsilon} \right)^2 = \dfrac{(2.26)^2 (1.7)}{(0.5)^2} \approx 34.7$, so 35 reps.
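The arithmetic in Example 12.1 can be reproduced with a few lines of Python. This is a sketch: `required_reps` is a hypothetical helper name, and it takes the slide's values (t = 2.26, S_D² = 1.7, ε = 0.5) as given.

```python
import math

def required_reps(t_crit, s2_d, eps):
    """Smallest integer R with R >= (t_crit * S_D / eps)^2.

    s2_d is the sample variance of the differences from the initial reps.
    """
    return math.ceil(t_crit ** 2 * s2_d / eps ** 2)

reps_needed = required_reps(t_crit=2.26, s2_d=1.7, eps=0.5)
print(reps_needed)  # 35
```

The t quantile is held at its value for the initial 10 replications, so the result is an approximation; rechecking the half-width after the additional replications is a reasonable safeguard.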

27 Alternative Approach When $S_D^2$ is not available use $R \ge \dfrac{t^2_{\alpha/2,\,2(R_0-1)} \left( S_1^2 + S_2^2 \right)}{\varepsilon^2}$

28 Comparing More than Two When we compare more than two scenarios, looking at overlapping confidence intervals is even less appropriate. And looking at all differences $\theta_i - \theta_j$ is not the most efficient way to compare scenarios when our goal is to identify the best.

29 Approaches for K > 2 Form simultaneous confidence intervals for all differences. In this case we need to adjust for multiplicity. Identify a subset that contains the best; this is called subset selection. Run a multi-stage procedure specifically designed to find the best; this is called ranking (the book gives one procedure).

30 Simultaneous C.I.s Remember that if the confidence level is 1 - α, then the chance of making an error is no more than α. The Bonferroni inequality says that if we form C intervals, each at level 1 - α, then $\Pr\{\text{all intervals cover}\} \ge 1 - C\alpha$

31 Example Suppose we have K=4 scenarios, and we want to estimate $\theta_i - \theta_j$ for all C = K(K-1)/2 = 6 pairs of means with overall confidence level of 95%. Then we should form each confidence interval at the 1 - 0.05/6 ≈ 0.99 level of confidence. Notice that this makes all intervals much wider.
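The Bonferroni adjustment in this example is simple enough to compute directly. A minimal sketch, with `bonferroni_level` as a hypothetical helper name:

```python
def bonferroni_level(k, overall_alpha):
    """Return (C, per-interval confidence level) for all pairwise differences
    among k scenarios, so that overall confidence is at least 1 - overall_alpha."""
    c = k * (k - 1) // 2          # number of pairwise comparisons
    return c, 1 - overall_alpha / c

pairs, level = bonferroni_level(k=4, overall_alpha=0.05)
print(pairs, round(level, 4))
```

Because C grows quadratically in K, the per-interval level approaches 1 quickly, which is why the intervals widen so much as scenarios are added.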

32 Subset Selection Approach A subset selection procedure guarantees, with a given confidence level, to find a set that contains the best system. One way to find the best is to keep increasing R until the subset contains only one scenario.

33 Identify the Best in PAN Check box causes PAN to identify all scenarios that might be the best. The error tolerance is how far you are willing to be off from including the true best.

34 Graphical Identification of Best

35 Error Tolerance The procedure guarantees, with 95% confidence, to provide a subset of scenarios that contains the best when Tolerance = 0. When Tolerance > 0, the subset will contain the best, or a scenario within Tolerance of the best, with 95% confidence.

36 In this case an error tolerance of 0.05 (5% utilization) causes one scenario to be identified as best. We are guaranteed (with high confidence) that this is the true best, or within 0.05 of it. With the same data, an error tolerance of 0 causes 4 scenarios to be placed in the group that contains the best. Less risk, but less conclusive.

37 Intuition Compute the sample mean from each scenario. Keep the scenario with the best (largest or smallest) sample mean. Keep the other scenarios whose sample means are not too far from the best based on a type of confidence interval for the difference.
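The three steps above can be sketched as follows. This is only an illustration of the intuition, not the procedure PAN actually runs: `candidate_subset` is a hypothetical name, and `width` is a caller-supplied stand-in for the confidence-interval-based threshold, which the slides do not specify.

```python
import statistics

def candidate_subset(outputs, width, smaller_is_better=True):
    """Keep every scenario whose sample mean is within `width` of the best one.

    outputs maps a scenario name to its list of replication outputs.
    width is an assumed threshold standing in for the interval on the difference.
    """
    means = {name: statistics.mean(ys) for name, ys in outputs.items()}
    best = min(means.values()) if smaller_is_better else max(means.values())
    return sorted(name for name, m in means.items() if abs(m - best) <= width)

# Illustrative (made-up) outputs for three scenarios; smaller is better.
outputs = {
    "A": [10.1, 9.8, 10.4],
    "B": [10.3, 10.0, 10.6],
    "C": [12.9, 13.4, 13.1],
}
subset = candidate_subset(outputs, width=0.5)
print(subset)
```

With more replications the threshold in a real procedure would shrink, which is what eventually reduces the subset to a single scenario.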

38 Controlling Error If our goal is to find the best, then we can increase the number of replications until the subset has only one scenario. There is no direct way to tell how many replications will be needed, but don't add fewer than 10 replications at a time. The book contains a two-stage procedure that guarantees selecting a single scenario.
