Overall Plan of Simulation and Modeling I. Chapters

Introduction to Simulation
Discrete Simulation
Analytical Modeling
Modeling Paradigms
Input Modeling
Random Number Generation
Output Analysis
Continuous and Hybrid Simulation
Simulation Software

Simulation and Modeling I Output Data Analysis 1

Output Data Analysis

Contents
- Statistical Nature of Simulation Output
- Parameter Estimation in Statistics
- Confidence Intervals in Statistics
- Transient and Steady-State Behavior
- Statistical Analysis of Terminating Simulations
- Statistical Analysis of Steady-State Simulations
- Simulation Control in AnyLogic

Statistical Nature of Simulation Output

Statistical analysis of simulation output data
- simulation runs do not give exact answers: their outputs are observations of a stochastic process
- different runs of the same model give different numerical results
- let $X_1, X_2, \ldots$ be the output of a single simulation run; the $X_i$ are generally correlated
- let $x_1^{(1)}, x_2^{(1)}, \ldots, x_m^{(1)}$ be a realization from one run
- repeating the run $n$ times with independent random numbers gives $n$ independent replications:
  $x_1^{(1)}, x_2^{(1)}, \ldots, x_m^{(1)}$
  $x_1^{(2)}, x_2^{(2)}, \ldots, x_m^{(2)}$
  ...
  $x_1^{(n)}, x_2^{(n)}, \ldots, x_m^{(n)}$
- each replication is expected to give a different outcome

Statistical Nature of Simulation Output (cont.)

Example: bank
- 5 tellers, 1 FIFO queue
- doors open from 9 a.m. to 5 p.m.; the bank stays open until all customers have left
- exponentially distributed interarrival times (mean = 1 min.)
- exponentially distributed service times (mean = 4 min.)
- 10 replications (of one day each), each starting empty

Results for 10 independent replications of the bank model:

Rep.  Number   Finish time  Average delay in  Average      Proportion of customers
      served   (hours)      queue (minutes)   queue length delayed < 5 minutes
1     484      8.12         1.53              1.52         0.917
2     475      8.14         1.66              1.62         0.916
3     484      8.19         1.24              1.23         0.952
4     483      8.03         2.34              2.34         0.822
5     455      8.03         2.00              1.89         0.840
6     461      8.32         1.69              1.56         0.866
7     451      8.09         2.69              2.50         0.783
8     486      8.19         2.86              2.83         0.782
9     502      8.15         1.70              1.74         0.873
10    475      8.24         2.60              2.50         0.779

Parameter Estimation in Statistics

Random sampling in statistical inference (recapitulation)
- $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from the population if they are mutually independent and identically distributed (IID) with distribution function $F(x)$
- a statistic $f(X_1, X_2, \ldots, X_n)$ is also a random variable (RV), with a so-called sampling distribution
- estimator: a statistic used to estimate the value of a parameter
- the estimation is based on $n$ experimental outcomes $x_1, x_2, \ldots, x_n$; each outcome $x_i$ is a value of the RV $X_i$
- sample mean, an estimator of $\mu$: $\hat\mu = \bar X(n) = \frac{1}{n}\sum_{i=1}^{n} X_i$
- sample variance, an estimator of $\sigma^2$: $\hat\sigma^2 = S^2(n) = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar X(n)\right)^2$

Parameter Estimation in Statistics (cont.)

Overview of the various quantities
- population distribution $F(x)$ with mean $\mu$ and variance $\sigma^2$
- IID samples $X_1, X_2, \ldots, X_n$
- sample mean $\hat\mu = \bar X(n)$ and sample variance $\hat\sigma^2 = S^2(n)$
- the sampling distributions of the sample mean and of the sample variance
- the variance of the sample mean: $\mathrm{Var}[\bar X(n)] = \sigma^2/n$
- the estimator of the sample-mean variance: $\widehat{\mathrm{Var}}[\bar X(n)] = S^2(n)/n$

Parameter Estimation in Statistics (cont.)

Dependent samples
- simulation output data are almost always correlated, so the sample RVs $X_1, X_2, \ldots, X_n$ are not independent
- the sample mean is still consistent and unbiased
- but the estimator $S^2(n)/n$ of the sample-mean variance is wrong, and applying it naively leads to gross underestimation
- example: M/M/1 queue with utilization $\rho = 0.9$; consider the first ten delays $D_1, D_2, \ldots, D_{10}$; it can be shown analytically that
  $$\frac{E[S^2(10)/10]}{\mathrm{Var}[\bar D(10)]} = 0.0034$$
  i.e., the true sample-mean variance is much larger than the estimator for independent samples suggests
- two possibilities:
  - correct the formula by taking the correlations into account (possible but complicated)
  - use independent replications (our choice)
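To see the underestimation numerically, the following Python sketch (our own illustration, not part of the slides) simulates many independent replications of the first ten M/M/1 delays with $\rho = 0.9$ using the Lindley recursion $D_{i+1} = \max(0, D_i + S_i - A_{i+1})$, where $S_i$ is the service time of customer $i$ and $A_{i+1}$ the interarrival time between customers $i$ and $i+1$. It compares the average of the naive estimator $S^2(10)/10$ with the empirical variance of $\bar D(10)$ across replications. The function name `mm1_delays`, the seed and the number of replications are our assumptions; the printed ratio is only a Monte Carlo estimate, but it should come out far below 1.

```python
import random
from statistics import mean, variance


def mm1_delays(m, rng, arrival_rate=1.0, service_rate=1 / 0.9):
    """Delays D_1..D_m of an M/M/1 queue that starts empty (utilization 0.9 by default)."""
    d, delays = 0.0, []
    for _ in range(m):
        delays.append(d)
        service = rng.expovariate(service_rate)        # S_i
        interarrival = rng.expovariate(arrival_rate)   # A_{i+1}
        d = max(0.0, d + service - interarrival)       # Lindley recursion
    return delays


rng = random.Random(2024)
m, reps = 10, 5000
runs = [mm1_delays(m, rng) for _ in range(reps)]

# Empirical Var[D(10)]: variance of the per-replication sample means.
true_var = variance([mean(run) for run in runs])
# Average of the naive IID estimator S^2(10)/10, computed inside each replication.
naive_var = mean([variance(run) / m for run in runs])
print(f"naive / true = {naive_var / true_var:.3f}")
```

Using independent replications avoids the problem because the per-replication means are IID, so the ordinary formula applies to them.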

Parameter Estimation in Statistics (cont.)

Independent replications
- the experiment is replicated $n$ times, each replication with $m$ observations
- if the initial state of each replication is chosen independently (and independent random numbers are used), the results of the $n$ replications are independent
- the variance of the sample mean can then be estimated with the formulas from random sampling
- this approach is commonly applied in simulation output data analysis

Parameter Estimation in Statistics (cont.)

Independent replications (cont.)
- $x_i^{(j)}$ is the $i$th observation in the $j$th replication; the corresponding RV is $X_i^{(j)}$
- sample mean of the $j$th replication: $\bar X^{(j)}(m) = \frac{1}{m}\sum_{i=1}^{m} X_i^{(j)}$
- overall sample mean: $\bar X(n) = \frac{1}{n}\sum_{j=1}^{n} \bar X^{(j)}(m) = \frac{1}{nm}\sum_{j=1}^{n}\sum_{i=1}^{m} X_i^{(j)}$
- overall sample variance: $S^2(n) = \frac{1}{n-1}\sum_{j=1}^{n}\left[\bar X^{(j)}(m) - \bar X(n)\right]^2$
- estimated variance of the overall sample mean: $\widehat{\mathrm{Var}}[\bar X(n)] = S^2(n)/n$
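The three estimators can be computed directly from the table of replications. A minimal Python sketch, assuming every replication has the same number $m$ of observations; the function name `replication_estimates` and the toy data are ours:

```python
from statistics import mean, variance


def replication_estimates(runs):
    """runs[j][i] holds x_i^(j), observation i of replication j (every run has m values)."""
    rep_means = [mean(run) for run in runs]   # X^(j)(m), one value per replication
    n = len(rep_means)
    overall_mean = mean(rep_means)            # X(n), equal to the mean of all n*m values
    s2 = variance(rep_means)                  # S^2(n) with divisor n - 1
    return overall_mean, s2, s2 / n           # last item: estimated Var[X(n)]


runs = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [0.0, 1.0, 2.0]]
print(replication_estimates(runs))
```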

Confidence Intervals in Statistics

So far: point estimation only
- the estimate will only rarely be identical to the true value
- in non-random situations, the quality of an estimate $\hat\mu$ of a true value $\mu$ is often assessed via a bounded relative error: $\frac{|\hat\mu - \mu|}{|\mu|} \le \gamma$
- with random sampling, we can only hope to bound the relative error probabilistically: $P\left(\frac{|\bar X(n) - \mu|}{|\mu|} \le \gamma\right) = 1 - \alpha$, where $1-\alpha$ is a large probability, say 0.9, 0.95, 0.99, etc.
- as shown in the following, an interval estimate of the form $P(\bar X(n) - \delta \le \mu \le \bar X(n) + \delta) = 1 - \alpha$, known as a confidence interval, allows us to bound the relative error probabilistically

Confidence Intervals in Statistics (cont.)

Definitions for $100(1-\alpha)\%$ confidence intervals
- confidence interval $[\bar X(n) - \delta,\ \bar X(n) + \delta]$
- $\delta$ is the confidence interval half-length
- $1-\alpha$ is the confidence level, so that $P(\bar X(n) - \delta \le \mu \le \bar X(n) + \delta) = 1 - \alpha$; typical values: $\alpha = 0.1, 0.05, 0.01$
- the fraction $\varepsilon = \delta / |\bar X(n)|$ is the relative confidence interval half-length

Confidence Intervals in Statistics (cont.)

Interpretation of confidence intervals
- confidence intervals are constructed for estimating an unknown mean $\mu$
- right: if intervals are constructed this way over and over, $100(1-\alpha)\%$ of them cover $\mu$
- wrong: take one of the constructed confidence intervals and say that $\mu$ lies in this interval with probability $1-\alpha$ (once the interval has been computed, it either covers $\mu$ or it does not)
- in practice the second interpretation is common (and it is also often stated incorrectly in textbooks)

Confidence Intervals in Statistics (cont.)

Central Limit Theorem: the most important result in probability theory
- definitions: let $X_1, X_2, \ldots, X_n$ be IID RVs with finite mean $\mu$ and finite variance $\sigma^2$ (no further assumption on their distribution)
- define the statistic $Z_n = \frac{\bar X(n) - \mu}{\sqrt{\sigma^2/n}}$; note that $\mu$ and $\sigma^2/n$ are the mean and variance of $\bar X(n)$
- statement: for large $n$, the distribution of $Z_n$ converges to the standard normal distribution, for any population distribution
- interpretation: if $n$ is sufficiently large, $Z_n$ is approximately distributed as a standard normal variable; equivalently, $\bar X(n)$ is approximately $N(\mu, \sigma^2/n)$

Confidence Intervals in Statistics (cont.)

Sampling from the normal distribution
- in practice the variance $\sigma^2$ is unknown, but the sample variance $S^2(n)$ converges to $\sigma^2$ as $n$ gets large
- the central limit theorem still holds if $\sigma^2$ is replaced by $S^2(n)$: $\frac{\bar X(n) - \mu}{\sqrt{S^2(n)/n}}$ is still approximately $N(0,1)$-distributed for large $n$
- let $z_{1-\alpha/2}$ be the $(1-\alpha/2)$-quantile of the standard normal distribution; for large $n$ we get
  $$P\left(-z_{1-\alpha/2} \le \frac{\bar X(n) - \mu}{\sqrt{S^2(n)/n}} \le z_{1-\alpha/2}\right) \approx 1 - \alpha$$
- and thus an approximate $100(1-\alpha)\%$ confidence interval
  $$\left[\bar X(n) - z_{1-\alpha/2}\sqrt{S^2(n)/n},\ \bar X(n) + z_{1-\alpha/2}\sqrt{S^2(n)/n}\right]$$

Confidence Intervals in Statistics (cont.)

Sampling from the normal distribution (cont.)
- the true value $\mu$ is covered with probability (approximately) $1-\alpha$
[Figure: standard normal density $f(x)$; the shaded area between $-z_{1-\alpha/2}$ and $z_{1-\alpha/2}$ equals $1-\alpha$]
- practical problem: when is $n$ sufficiently large?

Confidence Intervals in Statistics (cont.)

Sampling from the normal distribution (cont.)
Quantiles of the standard normal distribution:

$\alpha$            0.1     0.05    0.01
$1-\alpha/2$        0.95    0.975   0.995
$z_{1-\alpha/2}$    1.645   1.960   2.576

- more values can be found in most statistical textbooks, e.g., Table T.1 in Law/Kelton
- take care whether a table is indexed by $\alpha$ or by $1-\alpha/2$!

Confidence Intervals in Statistics (cont.)

Sampling from the normal distribution (cont.)
Example
- 10 independent observations: 1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09
- $\bar X(10) = 1.34$ and $S^2(10) = 0.17$
- $z_{0.95} = 1.645$ (from the table)
- approximate 90% confidence interval:
  $$\left[1.34 - 1.645\sqrt{0.17/10},\ 1.34 + 1.645\sqrt{0.17/10}\right] = [1.34 - 0.21,\ 1.34 + 0.21] = [1.13,\ 1.55]$$
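The example can be checked with Python's standard library, which provides the normal quantile through `statistics.NormalDist().inv_cdf`. A minimal sketch; the variable names are ours:

```python
from math import sqrt
from statistics import NormalDist, mean, variance

obs = [1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09]
alpha = 0.10

n = len(obs)
x_bar = mean(obs)                          # sample mean X(10)
s2 = variance(obs)                         # sample variance S^2(10), divisor n - 1
z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{1-alpha/2} = z_{0.95}
delta = z * sqrt(s2 / n)                   # half-length of the approximate interval
lo, hi = x_bar - delta, x_bar + delta
print(f"X(10) = {x_bar:.3f}, S^2(10) = {s2:.4f}, z = {z:.3f}, CI = [{lo:.3f}, {hi:.3f}]")
```

Working with the unrounded values $\bar X(10) = 1.343$ and $S^2(10) \approx 0.1675$, the upper limit comes out at about 1.556 rather than 1.55; the slide rounds the intermediate values first.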

Confidence Intervals in Statistics (cont.)

Sampling from the Student-t distribution
- when the sample size $n$ is small ($n < 30$), the normal approximation is poor; alternatively, the Student-t distribution can be used
- this approach is commonly applied in simulation output data analysis, together with independent replications
- as an assumption, we need the $X_i$ to be normally distributed
- define the statistic $t_{n-1} = \frac{\bar X(n) - \mu}{\sqrt{S^2(n)/n}}$; it has a Student-t distribution with $n-1$ degrees of freedom (df)
- $t_{n-1,1-\alpha/2}$ is its $(1-\alpha/2)$-quantile
- an exact $100(1-\alpha)\%$ confidence interval is
  $$\left[\bar X(n) - t_{n-1,1-\alpha/2}\sqrt{S^2(n)/n},\ \bar X(n) + t_{n-1,1-\alpha/2}\sqrt{S^2(n)/n}\right]$$
- see the complementary problem of testing the hypothesis $H_0: \mu = \mu_0$ (slide 11 of "Statistics and Tests")

Confidence Intervals in Statistics (cont.)

Sampling from the Student-t distribution (cont.)
- density function of the Student-t distribution with $m$ df:
  $$f_m(x) = \frac{\Gamma\left(\frac{m+1}{2}\right)}{\sqrt{m\pi}\,\Gamma\left(\frac{m}{2}\right)}\left(1 + \frac{x^2}{m}\right)^{-\frac{m+1}{2}}$$
  (the distribution function $F(x)$ involves the Gamma and Beta functions)
- $E[X] = 0$ (for $m > 1$) and $\mathrm{Var}[X] = \frac{m}{m-2}$ (for $m > 2$)
[Figure: density functions of the standard normal distribution and of the Student-t distribution with 4 df]
- $t_{n-1,1-\alpha/2} > z_{1-\alpha/2}$, i.e., the confidence interval is larger in the Student-t case and its coverage is closer to the desired level $1-\alpha$
- $t_{n-1,1-\alpha/2} \to z_{1-\alpha/2}$ for large $n$; for $n \ge 30$, quantiles from the standard normal distribution can be used

Confidence Intervals in Statistics (cont.)

Sampling from the Student-t distribution (cont.)
Quantiles of the Student-t distribution:

$\alpha$                0.1     0.05    0.01
$1-\alpha/2$            0.95    0.975   0.995
$t_{4,1-\alpha/2}$      2.132   2.776   4.604
$t_{9,1-\alpha/2}$      1.833   2.262   3.250
$t_{19,1-\alpha/2}$     1.729   2.093   2.861
$t_{29,1-\alpha/2}$     1.699   2.045   2.756
$t_{50,1-\alpha/2}$     1.676   2.009   2.678
$t_{100,1-\alpha/2}$    1.660   1.984   2.626
$z_{1-\alpha/2}$        1.645   1.960   2.576

- more values in Table T.1 in Law/Kelton

Confidence Intervals in Statistics (cont.)

Sampling from the Student-t distribution (cont.)
- previous example of 10 observations: $\bar X(10) = 1.34$ and $S^2(10) = 0.17$
- $t_{9,0.95} = 1.833$ (from the table)
- 90% confidence interval (slightly larger now):
  $$\left[1.34 - 1.833\sqrt{0.17/10},\ 1.34 + 1.833\sqrt{0.17/10}\right] = [1.34 - 0.24,\ 1.34 + 0.24] = [1.10,\ 1.58]$$
- approximation: the confidence interval is still only an approximation if the population distribution is not normal, and we cannot assume that simulation output data are normally distributed
- question: how good is the coverage of this Student-t confidence interval for other population distributions?

Confidence Intervals in Statistics (cont.)

Coverage estimation
- sampling from distributions with known parameters: normal, exponential, chi-square with 1 df, lognormal, and hyperexponential with $F(x) = 0.9(1 - e^{-x/0.5}) + 0.1(1 - e^{-x/5.5})$
- 90% confidence intervals were constructed for estimating the mean, with sample sizes $n = 5, 10, 20, 40$
- repeated in 500 independent experiments
- since the true parameters are known, the empirical coverages can be determined from these experiments

Confidence Intervals in Statistics (cont.)

Coverage estimation (cont.)
Empirical coverages:

Distribution       Skewness  n=5    n=10   n=20   n=40
normal             0.00      0.910  0.902  0.898  0.900
exponential        2.00      0.854  0.878  0.870  0.890
chi-square         2.83      0.810  0.830  0.848  0.890
lognormal          6.18      0.758  0.768  0.842  0.852
hyperexponential   6.43      0.584  0.586  0.682  0.774

- the coverage gets closer to 90% as $n$ gets larger
- the larger the skewness, the larger the sample size needed
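Such a coverage experiment is easy to repeat. The Python sketch below is our own reconstruction of the idea for $n = 10$ and three of the five distributions; the parameter choices (every distribution with mean 1), the 2000 experiments instead of 500, the seed and the function names are assumptions. Its results will differ somewhat from the table because it uses other random numbers and another number of experiments.

```python
import random
from math import sqrt
from statistics import mean, variance

T_9_095 = 1.833  # t_{9,0.95} from the table: n = 10 and a 90% interval


def empirical_coverage(sampler, true_mean, n=10, t_q=T_9_095, experiments=2000, seed=1):
    """Fraction of Student-t intervals (each built from n IID draws) that cover true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        xs = [sampler(rng) for _ in range(n)]
        delta = t_q * sqrt(variance(xs) / n)
        hits += abs(mean(xs) - true_mean) <= delta   # True counts as 1
    return hits / experiments


def hyperexponential(rng):
    # exponential with mean 0.5 (prob. 0.9) or mean 5.5 (prob. 0.1); overall mean 1
    return rng.expovariate(1 / 0.5) if rng.random() < 0.9 else rng.expovariate(1 / 5.5)


cov_normal = empirical_coverage(lambda rng: rng.gauss(1.0, 1.0), 1.0)
cov_expo = empirical_coverage(lambda rng: rng.expovariate(1.0), 1.0)
cov_hyper = empirical_coverage(hyperexponential, 1.0)
print(cov_normal, cov_expo, cov_hyper)
```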

Confidence Intervals in Statistics (cont.)

Summary: Student-t confidence intervals
- goal: point estimate and confidence interval for the mean $\mu = E(X)$
- make $n$ independent replications with results $X_1, X_2, \ldots, X_n$ (previously written $\bar X^{(j)}(m)$ for $j = 1, \ldots, n$)
- sample mean: $\bar X(n) = \frac{1}{n}\sum_{i=1}^{n} X_i$
- sample variance: $S^2(n) = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar X(n)\right)^2$
- confidence interval half-length: $\delta = t_{n-1,1-\alpha/2}\sqrt{S^2(n)/n}$
- confidence interval: $[\bar X(n) - \delta,\ \bar X(n) + \delta]$
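The summary translates into a few lines of Python. Because the standard library has no Student-t quantile function, this sketch expects the caller to pass $t_{n-1,1-\alpha/2}$, e.g., from the quantile table (or from `scipy.stats.t.ppf` if SciPy is available); the function name `t_confidence_interval` is ours. Applied to the 10-observation example without rounding intermediate values, it gives about [1.106, 1.580], compared with [1.10, 1.58] on the example slide.

```python
from math import sqrt
from statistics import mean, variance


def t_confidence_interval(xs, t_quantile):
    """Student-t interval [X(n) - delta, X(n) + delta] for the mean of IID observations xs.
    t_quantile must be t_{n-1,1-alpha/2}, e.g. looked up in the quantile table."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = mean(xs)                              # sample mean X(n)
    delta = t_quantile * sqrt(variance(xs) / n)   # half-length; S^2(n) uses divisor n - 1
    return x_bar - delta, x_bar + delta


obs = [1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09]
lo, hi = t_confidence_interval(obs, 1.833)        # t_{9,0.95}: 90% interval for n = 10
print(f"[{lo:.3f}, {hi:.3f}]")
```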

Confidence Intervals in Statistics (cont.)

Obtaining a specified precision
- $\delta$, and thus the confidence interval, is known, i.e., we have $P(\bar X(n) - \delta \le \mu \le \bar X(n) + \delta) \approx 1 - \alpha$
- but actually we want to bound the relative error: $\frac{|\bar X(n) - \mu|}{|\mu|} \le \gamma$
- we need to find a confidence interval half-length for which a requested relative error is met

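Confidence Intervals in Statistics (cont.)

Obtaining a specified precision (cont.)
- relative half-length: $\varepsilon = \delta / |\bar X(n)|$
- we can derive the relationship (the inequality uses the triangle inequality $|\bar X(n)| = |\bar X(n) - \mu + \mu| \le |\bar X(n) - \mu| + |\mu|$):
  $$\begin{aligned}
  1 - \alpha &\approx P(\bar X(n) - \delta \le \mu \le \bar X(n) + \delta) = P(|\bar X(n) - \mu| \le \delta) \\
  &= P(|\bar X(n) - \mu| \le \varepsilon|\bar X(n)|) = P(|\bar X(n) - \mu| \le \varepsilon|\bar X(n) - \mu + \mu|) \\
  &\le P(|\bar X(n) - \mu| \le \varepsilon|\bar X(n) - \mu| + \varepsilon|\mu|) \\
  &= P\left((1 - \varepsilon)|\bar X(n) - \mu| \le \varepsilon|\mu|\right) = P\left(\frac{|\bar X(n) - \mu|}{|\mu|} \le \frac{\varepsilon}{1 - \varepsilon}\right)
  \end{aligned}$$
- hence (for $\varepsilon < 1$) the relative error is bounded by $\varepsilon/(1-\varepsilon)$ with probability (approximately) at least $1-\alpha$
- therefore choose $\delta$ such that $\varepsilon/(1-\varepsilon) < \gamma$, with $\varepsilon = \delta/|\bar X(n)|$ and the given $\gamma$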

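Confidence Intervals in Statistics (cont.)

Obtaining a specified precision (cont.)
- previous example of 10 observations: $\bar X(10) = 1.34$ and $S^2(10) = 0.17$
- confidence interval half-length $\delta = 0.24$
- relative confidence interval half-length $\varepsilon = 0.24/1.34 \approx 0.179$
- relative error bound $\varepsilon/(1-\varepsilon) \approx 0.218$
- useful procedure for a specific confidence level $1-\alpha$:
  - specify the relative error $\gamma$ (e.g., 0.1, 0.05)
  - increase $n$ (which tends to decrease $\delta$ and thus $\varepsilon$) until the relative error bound $\varepsilon/(1-\varepsilon)$ is small enough, i.e., less than $\gamma$
  - since $\varepsilon/(1-\varepsilon) < \gamma$ is equivalent to $\varepsilon < \gamma/(1+\gamma)$ (for $\varepsilon < 1$), this leads to
    $$\varepsilon = \frac{\delta}{|\bar X(n)|} < \frac{\gamma}{1+\gamma}$$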
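The stopping test above fits into a small helper. A minimal Python sketch; the function name `relative_precision` is ours, and the second call uses a hypothetical smaller half-length $\delta = 0.12$ of the kind a larger $n$ might produce:

```python
def relative_precision(x_bar, delta, gamma):
    """Return eps, the bound eps/(1 - eps), and whether the requested relative error
    gamma is met, i.e. eps < gamma/(1 + gamma). Assumes eps < 1 and x_bar != 0."""
    eps = delta / abs(x_bar)      # relative half-length
    bound = eps / (1 - eps)       # probabilistic bound on |X(n) - mu| / |mu|
    return eps, bound, eps < gamma / (1 + gamma)


print(relative_precision(1.34, 0.24, gamma=0.1))   # 10-observation example: not precise enough
print(relative_precision(1.34, 0.12, gamma=0.1))   # hypothetical smaller half-length: meets gamma
```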

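Transient and Steady-State Behavior

Convergence to steady state
- output process: $Y_1, Y_2, \ldots$
- consider the transient distribution $F_i(y \mid I) = P(Y_i \le y \mid I)$, where $I$ denotes the initial conditions, with corresponding density functions $f_i$
- if $F_i(y \mid I) \to F(y)$ as $i \to \infty$ for all $y$ and for any initial conditions $I$, then $F(y)$ is the steady-state distribution of the output process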

Transient and Steady-State Behavior (cont.)

Example: M/M/1 queue
- consider the delays $D_1, D_2, \ldots$ and their transient means $E(D_i)$
- arrival rate = 1, utilization = 0.9
- $s$ = number of customers in the system at time 0

Transient and Steady-State Behavior (cont.)

Types of simulations
- terminating simulations
  - specific initial and stopping conditions
  - bank example: simulate from 9 a.m. (initially with no customers) until at least 5 p.m. and until all customers have left
  - the measures depend on the initial and stopping conditions
  - similar to transient analysis (but not equivalent)
- steady-state simulations (nonterminating simulations)
  - no natural event specifies the simulation length
  - interest in the long-run behavior of the system when it operates normally
  - corresponds to steady-state analysis

Transient and Steady-State Behavior (cont.)

When to use terminating and when steady-state simulations? This is a modeling decision.
- example: military confrontation whose goal is to find out the winner: terminating simulation until one force has lost 30%
- example: an aerospace manufacturer must produce 100 airplanes within 18 months and wants to compare various manufacturing configurations to see whether the deadline can be met: terminating simulation until 100 airplanes have been completed
- example: a manufacturer operates 16 hours a day (two shifts), with work carrying over from one day to the next, and wants to know the throughput of parts: steady-state simulation (continuous process; the ending conditions of one day are the initial conditions of the next)
- example: a Web content provider wants to know the impact of adding new hardware to its Web server on download times: steady-state simulation

Statistical Analysis of Terminating Simulations

Approach: make $n$ independent replications
- same initial conditions for each replication
- same stopping conditions for each replication
- separate random numbers for each replication
- let $X_j$ be the summary measure of the $j$th replication (previously written $\bar X^{(j)}(m)$ for $j = 1, \ldots, n$)
- $X_1, X_2, \ldots, X_n$ are a random sample (IID), so simple statistics can be applied
- we can mostly rely on the normality assumption

Statistical Analysis of Terminating Simulations (cont.)

Estimating means (recall)
- goal: point estimate and confidence interval for the mean $\mu = E(X)$
- make $n$ replications with results $X_1, X_2, \ldots, X_n$
- sample mean: $\bar X(n) = \frac{1}{n}\sum_{i=1}^{n} X_i$
- sample variance: $S^2(n) = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar X(n)\right)^2$
- confidence interval half-length: $\delta(n, \alpha) = t_{n-1,1-\alpha/2}\sqrt{S^2(n)/n}$
- confidence interval: $[\bar X(n) - \delta(n, \alpha),\ \bar X(n) + \delta(n, \alpha)]$
- relative interval half-length $\varepsilon = \delta(n, \alpha)/|\bar X(n)|$, relative error bound $\varepsilon/(1-\varepsilon)$

Statistical Analysis of Terminating Simulations (cont.)

Example: bank, $n = 10$ replications
- estimation of the average delay in queue:
  $\bar X(10) = 2.03$, $S^2(10) = 0.31$, $\delta(10, 0.1) = t_{9,0.95}\sqrt{S^2(10)/10} = 1.833\sqrt{0.031} \approx 0.32$
  90% confidence interval: [1.71, 2.35] minutes
- estimation of the proportion of customers with a delay of less than 5 minutes:
  $\bar X(10) = 0.853$, $S^2(10) = 0.004$, $\delta(10, 0.1) = t_{9,0.95}\sqrt{S^2(10)/10} \approx 0.036$
  90% confidence interval: [0.817, 0.889]
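These intervals can be reproduced from two columns of the bank results table. A minimal Python sketch; the helper `t_interval` and the variable names are ours, and the quantile $t_{9,0.95} = 1.833$ comes from the table:

```python
from math import sqrt
from statistics import mean, variance

# Columns of the bank results table (10 replications).
avg_delay = [1.53, 1.66, 1.24, 2.34, 2.00, 1.69, 2.69, 2.86, 1.70, 2.60]            # minutes
prop_lt_5 = [0.917, 0.916, 0.952, 0.822, 0.840, 0.866, 0.783, 0.782, 0.873, 0.779]  # delay < 5 min
T_9_095 = 1.833  # t_{9,0.95}


def t_interval(xs, t_q):
    """Student-t confidence interval for the mean of the replication results xs."""
    n = len(xs)
    x_bar = mean(xs)
    delta = t_q * sqrt(variance(xs) / n)
    return x_bar - delta, x_bar + delta


lo_d, hi_d = t_interval(avg_delay, T_9_095)
lo_p, hi_p = t_interval(prop_lt_5, T_9_095)
print(f"average delay: [{lo_d:.2f}, {hi_d:.2f}], proportion: [{lo_p:.3f}, {hi_p:.3f}]")
```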

Statistical Analysis of Terminating Simulations (cont.)

Normality assumption
- actually, the $X_j$ are not normally distributed
- but if each $X_j$ is an average over a large number of data within its replication, the central limit theorem says that the $X_j$ are approximately normally distributed
- example: M/M/1 queue (utilization = 0.9), average of the first $m$ delays; histograms of the results of 500 replications:
  - $m = 25$: large skew
  - $m = 6400$: smaller skew
[Figure: histograms $h(x)$ of the averages for $m = 25$ and $m = 6400$]

Statistical Analysis of Terminating Simulations (cont.)

Coverage estimation
- experimental estimation of the coverage for the M/M/1 queue ($m = 25$) with an increasing number $n$ of replications
- from analysis: $\mu = 2.12$
- 500 times, a 90% confidence interval was constructed
- count the percentage of these intervals that cover $\mu$
- estimate the average relative half-length over all intervals

n    Coverage  Average relative half-length
5    88%       0.67
10   86%       0.44
20   89%       0.30
40   92%       0.21

- the coverage is close to 90%; the relative half-length decreases with the square root of $n$, i.e., roughly in proportion to $1/\sqrt{n}$

Statistical Analysis of Terminating Simulations (cont.)

Obtaining a specified precision
- want: a certain confidence level and relative error
- fixed-sample-size procedure
  - fix the number $n$ of replications in advance
  - it is hard to determine the $n$ required for a given relative error
  - bank example: $n = 10$ replications, $\alpha = 0.1$: $\varepsilon = 0.157$ and $\varepsilon/(1-\varepsilon) = 0.187$, interval [1.71, 2.35]
- sequential procedure
  - increase $n$ until $\varepsilon$ is small enough
  - bank example: $\alpha = 0.1$, relative error 10%; after $n = 74$ replications: $\bar X(74) = 1.76$, $S^2(74) \approx 0.67$, $\delta(74, 0.1) \approx 0.16$, $\varepsilon = 0.09$ and $\varepsilon/(1-\varepsilon) = 0.10$, interval [1.60, 1.92]

Statistical Analysis of Terminating Simulations (cont.)

Algorithm: sequential procedure
input: initial number of replications $n_0$, relative error $\gamma$, level $\alpha$ (e.g., $n_0 = 10$, $\gamma = 0.1$, $\alpha = 0.1$)

  set n = n_0
  make n replications
  compute $\bar X(n)$ and $\delta(n,\alpha)$ from $X_1, X_2, \ldots, X_n$
  while $\delta(n,\alpha)/|\bar X(n)| \ge \gamma/(1+\gamma)$ do
      make an additional replication
      n++
      compute the new $\bar X(n)$ and $\delta(n,\alpha)$
  return $[\bar X(n) - \delta(n,\alpha),\ \bar X(n) + \delta(n,\alpha)]$

The loop condition follows from requiring $\varepsilon/(1-\varepsilon) < \gamma$, i.e., $\varepsilon = \delta(n,\alpha)/|\bar X(n)| < \gamma/(1+\gamma)$.
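The algorithm can be sketched in Python as follows. Assumptions (ours, not the slides'): the quantile $t_{n-1,1-\alpha/2}$ is approximated with the Cornish-Fisher expansion (Abramowitz & Stegun 26.7.5) because the standard library has no Student-t quantile; a safety cap `max_n` is added; and the model is the hypothetical replication function `avg_delay_first_25`, the M/M/1 example from the coverage slide (average delay of the first 25 customers, utilization 0.9, empty start). The criterion is written as $\delta < \frac{\gamma}{1+\gamma}|\bar X(n)|$ to avoid dividing by zero.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def t_quantile(p, df):
    """Approximate Student-t quantile t_{df,p} via the Cornish-Fisher expansion
    (Abramowitz & Stegun 26.7.5); good to about 0.01 at df = 4, much better for larger df."""
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def sequential_ci(replicate, n0=10, gamma=0.1, alpha=0.1, max_n=100_000):
    """Add replications until delta(n, alpha) / |X(n)| < gamma / (1 + gamma).
    max_n is a safety cap (our addition; the slide's loop has none)."""
    xs = [replicate() for _ in range(n0)]
    while True:
        n = len(xs)
        x_bar = mean(xs)
        delta = t_quantile(1 - alpha / 2, n - 1) * sqrt(variance(xs) / n)
        if delta < gamma / (1 + gamma) * abs(x_bar) or n >= max_n:
            return x_bar - delta, x_bar + delta, n
        xs.append(replicate())   # make one additional replication


rng = random.Random(12345)


def avg_delay_first_25():
    """One terminating replication: average delay of the first 25 customers
    in an M/M/1 queue with utilization 0.9 that starts empty."""
    d, total = 0.0, 0.0
    for _ in range(25):
        total += d
        d = max(0.0, d + rng.expovariate(1 / 0.9) - rng.expovariate(1.0))  # Lindley recursion
    return total / 25


lo, hi, n = sequential_ci(avg_delay_first_25)
print(f"90% CI [{lo:.2f}, {hi:.2f}] after n = {n} replications")
```

With $n_0 \ge 5$ the approximate quantiles stay close to the table values (e.g., about 1.833 for $t_{9,0.95}$); for very small $n_0$ an exact quantile function should be used instead.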

Statistical Analysis of Steady-State Simulations

Characteristics of steady-state simulation
- aimed at giving insight into system behavior after a long period of time
- initial transient (nonstationary, warm-up) period right after launching
  - example: a queue that starts empty
  - data from this period lead to significant bias and must be discarded
- convergence to steady state
  - the observations still fluctuate and are (typically) correlated
  - the simulation must run long enough to achieve a specified precision

Statistical Analysis of Steady-State Simulations (cont.)

The two main problems
- duration of the initial transient period
  - need methods to estimate its duration: rules of thumb, graphical procedures, statistical tests
  - data from this period must be discarded
- analysis of correlated data
  - need stopping criteria to run the simulation until a specified precision is met
  - must deal with correlations
  - conceptually simple methods: independent replications, batch means
  - other statistical methods are more sophisticated

Statistical Analysis of Steady-State Simulations (cont.)

Rules of thumb for detecting the initial transient period
The initial transient period is considered over if, e.g.,
- the first observation is neither the minimum nor the maximum of the remaining observations
- the longest cycle of the simulated system has been executed three or four times
- the sample mean approaches a constant level with a given accuracy
- the variance of the sample mean is approximately inversely proportional to the number of observations
- the observations have crossed the sample mean k times (e.g., k = 5 is a common choice)
- ...

Statistical Analysis of Steady-State Simulations (cont.)

Schruben's test for detecting the initial transient period
- a test of whether $X_1, X_2, \ldots, X_n$ is a stationary sequence (not discussed here)
- it suggests a sequential procedure:
  1. use a heuristic rule to guess the length of the initial transient period and discard those observations
  2. make new observations
  3. use Schruben's test to check stationarity
  4. if the sequence is stationary, start the steady-state simulation; otherwise discard a number of observations from the beginning of the sequence and repeat from step 2

Statistical Analysis of Steady-State Simulations (cont.)

Welch's graphical procedure for detecting the initial transient period
- step 1) make $n$ independent replications, each of length $m$ ($n \ge 5$, $m$ large); let $Y_{ji}$ be the $i$th observation of the $j$th replication
- step 2) let $\bar Y_i = \frac{1}{n}\sum_{j=1}^{n} Y_{ji}$ ($i = 1, \ldots, m$) be the averaged process; it has the same mean as the original process, but only $1/n$ of its variance
- step 3) smooth out high-frequency oscillations (but keep the long-run trend) with a moving average of window size $w$ ($w \le m/4$):
  $$\bar Y_i(w) = \begin{cases} \dfrac{1}{2w+1}\displaystyle\sum_{s=-w}^{w} \bar Y_{i+s} & \text{if } i = w+1, \ldots, m-w \\[2ex] \dfrac{1}{2i-1}\displaystyle\sum_{s=-(i-1)}^{i-1} \bar Y_{i+s} & \text{if } i = 1, \ldots, w \end{cases}$$
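Steps 2 and 3 can be sketched in Python as follows; plotting (step 4) is left out. The function name `welch_moving_average` and the input layout (a list of replications, each a list of $m$ observations) are our assumptions:

```python
from statistics import mean


def welch_moving_average(replications, w):
    """Welch's steps 2-3: replications[j][i] = Y_ji (n replications of equal length m).
    Returns [Ybar_1(w), ..., Ybar_{m-w}(w)]."""
    m = len(replications[0])
    if not 1 <= w <= m // 4:
        raise ValueError("window w must satisfy 1 <= w <= m/4")
    ybar = [mean(column) for column in zip(*replications)]   # step 2: averaged process
    smoothed = []
    for i in range(1, m - w + 1):           # 1-based index i, as on the slide
        k = min(i - 1, w)                   # half-window: i - 1 near the start, w afterwards
        window = ybar[i - 1 - k : i + k]    # Ybar_{i-k} .. Ybar_{i+k}
        smoothed.append(sum(window) / (2 * k + 1))
    return smoothed


print(welch_moving_average([[0, 0, 0, 0, 0, 0, 0, 8], [0, 0, 0, 0, 0, 0, 0, 8]], w=2))
```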

Statistical Analysis of Steady-State Simulations (cont.)

Welch's graphical procedure for detecting the initial transient period (cont.)

Step 1, replications:
  replication 1:  $Y_{11}, Y_{12}, Y_{13}, Y_{14}, \ldots, Y_{1,m-2}, Y_{1,m-1}, Y_{1,m}$
  replication 2:  $Y_{21}, Y_{22}, Y_{23}, Y_{24}, \ldots, Y_{2,m-2}, Y_{2,m-1}, Y_{2,m}$
  ...
  replication n:  $Y_{n1}, Y_{n2}, Y_{n3}, Y_{n4}, \ldots, Y_{n,m-2}, Y_{n,m-1}, Y_{n,m}$
Step 2, averaged process:  $\bar Y_1, \bar Y_2, \bar Y_3, \bar Y_4, \ldots, \bar Y_{m-2}, \bar Y_{m-1}, \bar Y_m$
Step 3, moving average with $w = 1$:  $\bar Y_1(1), \bar Y_2(1), \bar Y_3(1), \ldots, \bar Y_{m-1}(1)$

Statistical Analysis of Steady-State Simulations (cont.)

Welch's graphical procedure for detecting the initial transient period (cont.)
- step 4) plot the moving average $\bar Y_i(w)$ and choose ("eyeball") $l$ such that $\bar Y_{l+1}(w), \bar Y_{l+2}(w), \ldots$ appear to have converged
- recommendations
  - initially, $n = 5$ or 10
  - plot the moving average for several values of $w$ and choose the smallest value for which the plot is smooth
  - same problem as with histograms: $w$ too small gives a ragged plot, $w$ too large overaggregates
  - if the result is not satisfactory, increase $n$ by 5 or 10

Statistical Analysis of Steady-State Simulations (cont.)

Example: small factory with a machining center and an inspection station
[Figure: parts → queue → machining center → queue → inspection station; good parts (0.9) leave, bad parts (0.1) return to the machining center]
- unfinished parts arrive with exponentially distributed interarrival times (mean = 1 minute)
- processing times at the machine are uniform on [0.65, 0.70] minutes
- 90% of the inspected parts are good and sent to shipping; the rest are bad and sent back to the machine for rework
- consider $N_1, N_2, \ldots$, where $N_i$ is the number of parts sent to shipping in the $i$th hour

Statistical Analysis of Steady-State Simulations (cont.)

Example (cont.): averaged process over 10 replications

Statistical Analysis of Steady-State Simulations (cont.)

Example (cont.): moving averages for $w = 20$


Statistical Analysis of Steady-State Simulations (cont.)

Independent replications for the analysis of correlated data
- assume that the length $l$ of the initial transient period has been determined
- now use independent replications (as in the terminating case):
  - make $n$ independent replications, each of length $m$
  - let $X_j$ be the summary measure of the $j$th replication, computed after discarding the data from the initial transient period
  - $X_1, X_2, \ldots, X_n$ are a random sample
  - construct the confidence interval $[\bar X(n) - \delta(n, \alpha),\ \bar X(n) + \delta(n, \alpha)]$
[Figure: replications 1 to n, each with l discarded observations followed by m - l observations that yield $X_1, \ldots, X_n$, from which $\bar X(n)$, $S^2(n)$ and $\delta(n, \alpha)$ are computed]
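A minimal Python sketch of this replication/deletion approach, assuming each replication is stored as a list of its $m$ observations, that $l$ is already known (e.g., from Welch's procedure), and that the caller supplies $t_{n-1,1-\alpha/2}$; the function name is ours:

```python
from math import sqrt
from statistics import mean, variance


def replication_deletion_ci(runs, l, t_q):
    """runs[j] = the m observations of replication j. Drop the first l of each,
    use the mean of the remaining m - l values as X_j, then a t interval over X_1..X_n."""
    xs = [mean(run[l:]) for run in runs]   # summary measures X_j without the warm-up data
    n = len(xs)
    x_bar = mean(xs)
    delta = t_q * sqrt(variance(xs) / n)
    return x_bar - delta, x_bar + delta


# toy data: two warm-up values (100) per replication; t_{2,0.95} = 2.920 for n = 3
print(replication_deletion_ci([[100, 100, 1, 3], [100, 100, 3, 5], [100, 100, 2, 4]], l=2, t_q=2.920))
```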

Statistical Analysis of Steady-State Simulations (cont.)

Independent replications for the analysis of correlated data (cont.)
- the same or different replications as for detecting the initial transient period can be used
- $m$ must be significantly larger than $l$
- factory example: $n = 10$ replications of length $m = 160$ (hours), with an initial transient period of $l = 24$ removed; 90% confidence interval: [59.51, 60.43]
- compare with the arrival rate (one part per minute, i.e., 60 parts per hour)!

Statistical Analysis of Steady-State Simulations (cont.)

Independent replications for the analysis of correlated data (cont.)
- main problem: fixed-sample-size approach
  - $m$ is model-dependent and hard to control
  - the relative error decreases if $n$ and $m$ are increased
  - the relative error decreases with the square root of $n$
- recommendations
  - a few replications: $n = 5, 10, 20, 40$
  - $m$ as large as possible
[Figure: confidence intervals for n = 10, 20, 30, 40 relative to $E[\bar X(n)]$ and the unknown $\mu$]

Statistical Analysis of Steady-State Simulations (cont.)

Multiple replications in parallel (MRIP)
- replications run as parallel processes
- they run until the specified relative error is met (see terminating simulations)
- conceptually simple
- structure:
  - user interaction
  - master process: starts and stops the simulation engines; collects data and computes statistics
  - slave processes 1, ..., n: each is a simulation engine with independent RVs that sends its data to the master

Statistical Analysis of Steady-State Simulations (cont.)

Other approaches for the analysis of correlated data
- many short runs vs. one long run
[Figure: many short runs, each yielding one value, compared with one long run split into segments yielding $X_1, \ldots, X_5$]

Statistical Analysis of Steady-State Simulations (cont.)

Other approaches for the analysis of correlated data (cont.)
Batch means
- one single long run of $m$ observations; the first $l$ observations are deleted
- the remaining observations are divided into $n$ batches of length $k = (m-l)/n$
[Figure: l deleted observations followed by batch 1, batch 2, ..., batch n with batch means $X_1, X_2, \ldots, X_n$]
- let $X_j$ be the sample mean of the $j$th batch
- if $k$ is large enough, $X_1, X_2, \ldots, X_n$ are approximately independent and the usual techniques can be used
- main problem: choice of $k$
- advantages: simple; the initial transient period has to be handled only once
- variants: spaced, overlapping, weighted batch means
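A minimal Python sketch of the batch-means interval, assuming the output of the long run is a list, $l$ is known, and the caller supplies $t_{n-1,1-\alpha/2}$; dropping leftover observations that do not fill a complete batch is our choice, and the function name is ours:

```python
from math import sqrt
from statistics import mean, variance


def batch_means_ci(obs, l, n_batches, t_q):
    """Batch means from one long run: drop the first l observations, split the rest
    into n_batches batches of length k = (m - l) // n_batches, treat batch means as IID."""
    data = obs[l:]
    k = len(data) // n_batches                 # leftover observations are ignored
    if n_batches < 2 or k < 1:
        raise ValueError("need at least 2 batches with at least one observation each")
    means = [mean(data[j * k:(j + 1) * k]) for j in range(n_batches)]
    x_bar = mean(means)
    delta = t_q * sqrt(variance(means) / n_batches)
    return x_bar - delta, x_bar + delta


# toy data: 5 warm-up values, then two batches with means 1 and 3; t_{1,0.95} = 6.314
print(batch_means_ci([100] * 5 + [1] * 10 + [3] * 10, l=5, n_batches=2, t_q=6.314))
```

Whether $k$ is large enough cannot be read off the interval itself; in practice one checks, e.g., that the batch means show no noticeable correlation.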

Statistical Analysis of Steady-State Simulations (cont.)

Other approaches for the analysis of correlated data (cont.)
Regenerative simulation
- certain instants (regeneration points) $B_j$ can be identified at which the process probabilistically starts over
- observations before and after a regeneration point are independent
- example: single-server queue, arrivals to an empty system
- estimation is based on the cycles delimited by regeneration points
- regeneration points are hard to find in general models
[Figure: delays $D_i$ over customer index $i$ with regeneration points $B_1 = 1$, $B_2 = 6$, $B_3 = 8$, $B_4 = 13$, $B_5 = 14$, $B_6 = 20$]
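For the single-server queue example, a minimal Python sketch of the classical regenerative ratio estimator: a customer whose delay is exactly 0 found the system empty, so each such customer starts a new cycle, and the point estimate is the total delay over all complete cycles divided by the number of customers in them. This is our illustration (function name hypothetical); confidence intervals for the ratio estimator need additional steps not covered here.

```python
def regenerative_estimate(delays):
    """Ratio estimator of the long-run mean delay. Cycles start at customers with
    delay exactly 0 (arrival to an empty system); the incomplete last cycle is dropped."""
    starts = [i for i, d in enumerate(delays) if d == 0.0]   # regeneration points B_j (0-based)
    cycle_sums, cycle_lengths = [], []
    for a, b in zip(starts, starts[1:]):
        cycle_sums.append(sum(delays[a:b]))   # Y_j: total delay in cycle j
        cycle_lengths.append(b - a)           # N_j: number of customers in cycle j
    if not cycle_lengths:
        raise ValueError("need at least two regeneration points")
    return sum(cycle_sums) / sum(cycle_lengths), len(cycle_lengths)


print(regenerative_estimate([0.0, 1.0, 2.0, 0.0, 3.0, 0.0, 1.0]))
```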

Statistical Analysis of Steady-State Simulations (cont.)

Other approaches for the analysis of correlated data (cont.)
Estimating the variance of the sample mean directly from correlated data
- spectral analysis
  - a discrete Fourier transform of the observations gives their spectrum in the frequency domain
  - the variance is estimated from the spectrum at zero frequency
  - estimation with good properties, but sophisticated and not easy to implement
- autoregressive method, standardized time series
  - two other approaches that estimate the variance directly
  - experimental results not as good as for spectral analysis

Statistical Analysis of Steady-State Simulations (cont.)

A reference on statistical evaluation with more details than the book of Law/Kelton, together with pseudocode procedures:
K. Pawlikowski. Steady-State Simulation of Queueing Processes: A Survey of Problems and Solutions. ACM Computing Surveys, Vol. 22, pp. 123-170, 1990.

Simulation Control in AnyLogic 6

- the sequential procedure for terminating and steady-state simulations (see the algorithm in Statistical Analysis of Terminating Simulations) may be implemented as a Parameter Variation Experiment in AnyLogic 6
- example: SSQ (as described in the chapter Discrete Simulation); simulation control is performed on the mean waiting time
- step 1 (pane General): set the number of runs per iteration to 1
  Here, we do not vary any parameters; we set constant expressions.

Simulation Control in AnyLogic 6 (cont.)

Sequential procedure for SSQ (cont.)
- step 2 (pane Replications): check the option "use replications"
  We choose "fixed number of replications" to bound their number from above by, say, 1000. The confidence level is then implicitly fixed to 95%. (The option "varying number of replications" is not well documented and does not allow specifying a relative error.)

Simulation Control in AnyLogic 6 (cont.)

Sequential procedure for SSQ (cont.)
- step 3 (pane Advanced): specify what to do after a replication and when to start another one
  - add the summary measure of each replication to a Statistics or HistogramData object of the Parameter Variation Experiment
  - stop condition:
    (getCurrentReplication() > InitialReplications) && (WaitingTimeRep.meanConfidence()/WaitingTimeRep.mean() < RelativeError/(1+RelativeError))