Performance of Skart: A Skewness- and Autoregression-Adjusted Batch-Means Procedure for Simulation Analysis

INFORMS Journal on Computing, to appear.

Ali Tafazzoli, James R. Wilson
Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State University, Campus Box 7906, Raleigh, North Carolina 27695-7906, USA, {atafazz@ncsu.edu, jwilson@ncsu.edu}

Emily K. Lada
SAS Institute, 100 SAS Campus Drive, Cary, North Carolina 27513-8617, USA, emily.lada@sas.com

Natalie M. Steiger
Maine Business School, University of Maine, Orono, Maine 04469-5723, USA, nsteiger@maine.edu

An analysis is given for an extensive experimental performance evaluation of Skart, an automated sequential batch-means procedure for constructing an asymptotically valid confidence interval (CI) on the steady-state mean of a simulation output process. Skart is designed to deliver a CI satisfying user-specified requirements on absolute or relative precision as well as coverage probability. Skart exploits separate adjustments to the half-length of the classical batch-means CI so as to account for the effects on the distribution of the underlying Student's t-statistic that arise from skewness (nonnormality) and autocorrelation of the batch means. Skart also delivers a point estimator for the steady-state mean that is approximately free of initialization bias. In an experimental performance evaluation involving a wide range of test processes, Skart compared favorably with other steady-state simulation analysis methods, namely its predecessors ASAP3, WASSP, and SBatch, as well as ABATCH, LBATCH, the Heidelberger-Welch procedure, and the Law-Carson procedure. Specifically, Skart exhibited competitive sampling efficiency and closer conformance to the given CI coverage probabilities than the other procedures, especially in the most difficult test processes.
Key words: simulation; statistical analysis; steady-state analysis; method of batch means; Cornish-Fisher expansion; autoregressive representation

1. Introduction

In a nonterminating simulation, we are often interested in long-run (steady-state) average performance measures. Let $\{X_i : i = 1, 2, \ldots\}$ denote the sequence of outputs generated by a single run of a nonterminating probabilistic simulation. If the simulation is in steady-state operation, then the random variables $\{X_i\}$ will have the same steady-state marginal cumulative distribution function (c.d.f.) $F_X(x) = \Pr\{X_i \le x\}$ for $i = 1, 2, \ldots$ and for all real $x$. Usually in a nonterminating simulation, we are interested in constructing point and confidence-interval (CI) estimators for some characteristic of the steady-state c.d.f. $F_X(\cdot)$. In this article, we seek to estimate the steady-state mean, $\mu_X = E[X_i] = \int_{-\infty}^{\infty} x \, dF_X(x)$; and we limit the discussion to output processes for which $E[|X_i|^3] < \infty$ so that the marginal mean $\mu_X$, variance $\sigma_X^2 = \mathrm{Var}[X_i] = E[(X_i - \mu_X)^2]$, and skewness $\mathrm{Sk}[X_i] = E\{[(X_i - \mu_X)/\sigma_X]^3\}$ are well defined. We let $n$ denote the length of the time series $\{X_i\}$ of outputs generated by a single, long run of the simulation.

Standard statistical methods require independent and identically distributed (i.i.d.) normal observations to yield a valid CI estimator of $\mu_X$; unfortunately in many large-scale simulation experiments, these requirements are not even approximately satisfied because the relevant simulation-generated output processes exhibit one or more of the following anomalous properties: initialization bias, correlation, and nonnormality (Law 2007).

The start-up (or initialization bias) problem is caused by a transient in the initial sequence of responses that is due to the system's starting condition. It is usually impossible to start a simulation in steady-state operation, thereby making it necessary to do the following: (a) start the simulation in some convenient initial condition that may not be typical of steady-state operation; and (b) select the duration of the warm-up period (i.e., the data-truncation point or statistics clearing time) so that beyond the warm-up period, the mean of each simulation-generated observation is sufficiently close to the steady-state mean.
If observations prior to the end of the warm-up period are included in the analysis, then the resulting estimator may be biased; and such bias in the point estimator can severely degrade not only the accuracy of the point estimator but also the probability that the associated CI will cover the steady-state mean (Law 2007). Beyond the start-up problem, the following complications also frequently arise in practice: (i) the correlation problem is caused by pronounced stochastic dependencies between successive responses generated within a single simulation run; and (ii) the nonnormality problem is caused by substantial departures from normality (in particular, nonzero skewness) in the simulation-generated responses.

Several methods have been proposed for solving the previously mentioned problems in the analysis of steady-state simulation experiments. The method of nonoverlapping batch means (NBM) is by far the most widely used and most efficient output analysis procedure in practical applications for which initialization bias, correlation, or nonnormality are significant effects (Law 2007). In the NBM method, the outputs $\{X_i : i = 1, \ldots, n\}$ are divided into $k$ adjacent nonoverlapping batches, each of size $m$, where we assume that $n = km$ and that $m$ is sufficiently large to ensure the resulting batch means are at least approximately i.i.d. normal random variables. The sample mean for the $j$th batch is

$$Y_j(m) = \frac{1}{m} \sum_{i=1}^{m} X_{(j-1)m+i} \quad \text{for } j = 1, \ldots, k;$$

and the grand mean of the individual batch means,

$$\bar{Y} = \bar{Y}(m, k) = \frac{1}{k} \sum_{j=1}^{k} Y_j(m), \tag{1}$$

is used as a point estimator for $\mu_X$. The objective is to construct a CI estimator for $\mu_X$ that is centered on a point estimator like $\bar{Y}$, where in practice some initial observations (or batches) may be deleted (truncated) to eliminate the effects of initialization bias.

When the simulation is in steady-state operation, we assume the original simulation-generated process $\{X_i\}$ is stationary (in the strict sense), so that the joint distribution of the $X_i$'s is insensitive to time shifts. We also assume the process is weakly dependent (i.e., $X_i$'s widely separated from each other in the sequence are almost independent in the sense of $\phi$-mixing (Billingsley 1968)) such that the lag-$q$ covariance, $\gamma_X(q) = E[(X_{i+q} - \mu_X)(X_i - \mu_X)]$ for $q = 0, \pm 1, \pm 2, \ldots$, satisfies $\gamma_X(q) \to 0$ sufficiently fast as $|q| \to \infty$. These weakly dependent processes typically obey a central limit theorem of the form

$$\sqrt{n}\,\bigl[\bar{X}(n) - \mu_X\bigr] \xrightarrow[n \to \infty]{D} N(0, \gamma_X), \tag{2}$$

where: $\bar{X}(n) = n^{-1} \sum_{i=1}^{n} X_i = \bar{Y}$ is the sample mean of the target output process;

$$\gamma_X = \lim_{n \to \infty} n \, \mathrm{Var}\bigl[\bar{X}(n)\bigr] = \sum_{q=-\infty}^{\infty} \gamma_X(q) \tag{3}$$

is the steady-state variance parameter (as distinguished from the process variance $\sigma_X^2 = \mathrm{Var}[X_i] = \gamma_X(0)$); and the symbol $\xrightarrow[n \to \infty]{D}$ denotes convergence in distribution. For conditions sufficient to ensure that the right-hand side of (3) is absolutely convergent and the central limit theorem (2) holds, see, for example, Theorem 20.1 of Billingsley (1968).

If after appropriate data truncation the output process $\{X_i : i = 1, \ldots, n\}$ is stationary and weakly dependent, then as $m \to \infty$ with $k$ fixed so that $n = km \to \infty$, an asymptotically valid $100(1-\alpha)\%$ CI for $\mu_X$ is

$$\bar{Y}(m, k) \pm t_{1-\alpha/2,\,k-1} \frac{S_{m,k}}{\sqrt{k}}, \tag{4}$$

where $t_{1-\alpha/2,\,k-1}$ denotes the $1 - \alpha/2$ quantile of Student's t-distribution with $k - 1$ degrees of freedom, and

$$S_{m,k}^2 = \frac{1}{k-1} \sum_{j=1}^{k} \bigl[Y_j(m) - \bar{Y}(m, k)\bigr]^2$$

is the sample variance of the $k$ batch means with batch size $m$. The asymptotic validity of the batch-means CI (4) as $m \to \infty$ with $k$ fixed is established, for example, in Theorem 1 of Steiger and Wilson (2001).

The main difficulty with conventional implementations of NBM, such as the procedure of Law and Carson (1979) and the ABATCH and LBATCH procedures of Fishman and Yarberry (1997), is the lack of a reliable method for determining a sufficiently large batch size $m$ such that the batch means $\{Y_j(m)\}$ are approximately uncorrelated, normal, and free of initialization bias. For an elaboration of this issue, see Steiger et al. (2005) and Lada et al. (2006).

In this article we summarize the results of an extensive experimental performance evaluation of Skart, a procedure for steady-state simulation analysis that is based on the method of batch means. Skart incorporates many advantages of its predecessors, namely ASAP3 (Steiger et al. 2005), WASSP (Lada and Wilson 2006), and SBatch (Lada et al. 2008), while avoiding many of their disadvantages. Based on our experimentation with a broad diversity of test processes, we reached the following conclusions: (a) Skart generally delivered closer conformance to the nominal CI coverage probability than its predecessors; (b) Skart's sampling efficiency was about the same as that of ASAP3 and superior to that of WASSP and SBatch; (c) Skart eliminated initialization bias about as well as its predecessors did; and (d) Skart and SBatch were simpler to implement and understand than ASAP3 and WASSP. There is substantial experimental evidence that ASAP3 outperforms ABATCH and LBATCH (Steiger and Wilson 2002) and the Law-Carson procedure (Lada et al. 2006). Similarly Lada et al. (2007) provide good experimental evidence that WASSP outperforms the Heidelberger-Welch procedure.
Thus we concluded that in several important respects, Skart compared favorably with many of the steady-state simulation analysis methods currently in use.

The rest of this article is organized as follows. Section 2 provides a brief overview of Skart, and Section 3 summarizes the procedures to be compared with Skart: namely, SBatch, WASSP, and ASAP3. Section 4 contains selected results from our experimental performance evaluation, and Section 5 provides an analysis of the sampling efficiencies of the selected procedures. Section 6 summarizes the main findings of this work. To make this article self-contained, a detailed algorithmic statement of Skart is provided in the Online Supplement. Some preliminary results on the formulation and evaluation of Skart are presented in Tafazzoli et al. (2008). See Tafazzoli (2009), Tafazzoli and Wilson (2010), and Tafazzoli et al. (2010) for a comprehensive discussion of the design and evaluation of Skart and N-Skart, a nonsequential variant of Skart.

2. Overview of Skart

Skart is a sequential extension of the classical NBM method. Skart addresses the start-up problem by successively applying the randomness test of von Neumann (1941) to spaced batch means with progressively increasing batch sizes and interbatch spacer sizes. When the randomness test is finally passed with a batch size $m$ and spacer size $dm$ for sufficiently large integers $m$ and $d$ (where $m \ge 1$ and $d \ge 0$), the data-truncation point (i.e., the length of the warm-up period) $w$ is defined by the initial spacer so that the leading $w = dm$ observations are truncated (ignored) in calculating the point and CI estimators of $\mu_X$. (Although the batch size $m$ may be increased on subsequent iterations of the later steps of Skart in order to satisfy the precision requirement as explained below, the truncation point $w$ remains fixed once the randomness test has been passed.)

Skart addresses the nonnormality problem by a modified Cornish-Fisher expansion for the classical batch-means Student's t-ratio that incorporates a term due to Willink (2005) accounting for any skewness in the set of truncated, nonspaced batch means that are finally delivered. Skart addresses the correlation problem by using a first-order autoregressive approximation to the autocorrelation function of the delivered set of truncated, nonspaced batch means.

Beyond the data-truncation point $w$, Skart computes $k'$ truncated, nonspaced batch means with batch size $m$,

$$Y_j(m) = \frac{1}{m} \sum_{i=1}^{m} X_{w+(j-1)m+i} \quad \text{for } j = 1, \ldots, k'; \tag{5}$$

and then Skart computes the sample mean and variance of the truncated, nonspaced batch means,

$$\bar{Y}(m, k') = \frac{1}{k'} \sum_{j=1}^{k'} Y_j(m) \quad \text{and} \quad S_{m,k'}^2 = \frac{1}{k'-1} \sum_{j=1}^{k'} \bigl[Y_j(m) - \bar{Y}(m, k')\bigr]^2, \tag{6}$$

respectively.
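As a minimal sketch of Equations (5) and (6), the following Python fragment forms the truncated, nonspaced batch means and their sample mean and variance. The function names are hypothetical, and discarding any incomplete final batch is an assumption made here for simplicity; the way Skart itself selects $w$, $m$, and $k'$ is specified in the Online Supplement and is not reproduced.

```python
from statistics import mean, variance


def truncated_batch_means(x, w, m):
    """Nonspaced batch means of Equation (5).

    Skips the first w observations (the warm-up period) and averages
    successive blocks of m observations. An incomplete final block is
    dropped, which is an assumption of this sketch.
    """
    usable = x[w:]
    k = len(usable) // m  # number of complete batches, k'
    return [mean(usable[j * m:(j + 1) * m]) for j in range(k)]


def grand_mean_and_variance(batch_means):
    """Sample mean and sample variance of Equation (6).

    statistics.variance uses the divisor k' - 1, as in Equation (6).
    """
    return mean(batch_means), variance(batch_means)


# Usage: 12 observations, warm-up w = 2, batch size m = 5.
y = truncated_batch_means(list(range(1, 13)), w=2, m=5)
ybar, s2 = grand_mean_and_variance(y)
```

With these toy inputs the two batches are the blocks 3..7 and 8..12, so the batch means are 5 and 10.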
Next Skart computes an asymptotically valid $100(1-\alpha)\%$ skewness- and autoregression-adjusted CI for $\mu_X$ having the form

$$\left[\, \bar{Y}(m, k') - G(L) \sqrt{\frac{A S_{m,k'}^2}{k'}}, \;\; \bar{Y}(m, k') - G(R) \sqrt{\frac{A S_{m,k'}^2}{k'}} \,\right], \tag{7}$$

where the skewness adjustments $G(L)$ and $G(R)$ are defined in terms of the function

$$G(\zeta) = \frac{\sqrt[3]{1 + 6\beta(\zeta - \beta)} - 1}{2\beta}, \quad \text{with } \beta = \frac{\hat{B}_{m,k''}}{6\sqrt{k''}} \tag{8}$$

and

$$\hat{B}_{m,k''} = \left\{ \begin{array}{l} \text{approximately unbiased estimator of the marginal skewness of } Y_j(m) \text{ that is} \\ \text{computed from } k'' \text{ spaced batch means with truncation point } w, \text{ batch size} \\ m, \text{ and } \lceil w/m \rceil m \text{ ignored observations in each interbatch spacer,} \end{array} \right\} \tag{9}$$

so that the skewness-adjustment function $G(\cdot)$ has the arguments $L = t_{1-\alpha/2,\,k''-1}$ and $R = t_{\alpha/2,\,k''-1}$, where for $q \in (0, 1)$, the quantity $t_{q,\nu}$ denotes the $q$ quantile of Student's t-distribution with $\nu$ degrees of freedom; and the correlation adjustment $A$ is computed as

$$A = \bigl[1 + \hat{\varphi}_{Y(m)}\bigr] \big/ \bigl[1 - \hat{\varphi}_{Y(m)}\bigr], \tag{10}$$

where the standard estimator of the lag-one correlation of the truncated, nonspaced batch means is

$$\hat{\varphi}_{Y(m)} = \frac{1}{k'-1} \sum_{j=1}^{k'-1} \frac{\bigl[Y_j(m) - \bar{Y}(m, k')\bigr]\bigl[Y_{j+1}(m) - \bar{Y}(m, k')\bigr]}{S_{m,k'}^2}. \tag{11}$$

(Note that in Equation (8), the indicated cube root $\sqrt[3]{1 + 6\beta(\zeta - \beta)}$ is understood to have the same sign as the quantity $1 + 6\beta(\zeta - \beta)$.) Thus we see that $G(L)$ and $G(R)$ are skewness-adjusted quantiles of Student's t-distribution for the left- and right-hand subintervals of the proposed CI; and the autoregression (correlation) adjustment $A$ is applied to the naive estimator $S_{m,k'}^2 / k'$ of $\mathrm{Var}[\bar{Y}(m, k')]$ so as to compensate for any residual correlation between the truncated, nonspaced batch means (5) that are used to compute the truncated grand mean $\bar{Y}(m, k')$. The specific methods for computing $w$, $m$, $k'$, $k''$, and $\hat{B}_{m,k''}$ are detailed in the Online Supplement.

The final step of Skart is to determine whether the constructed CI satisfies the user-specified precision requirement. The half-length of the CI (7) is taken to be

$$H = \max\{\, |G(L)|, \; |G(R)| \,\} \sqrt{\frac{A S_{m,k'}^2}{k'}},$$

the maximum of the lengths of the left- and right-hand subintervals of (7).
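The following Python sketch shows one way Equations (7), (8), (10), and (11) and the half-length $H$ could be evaluated for a given set of truncated, nonspaced batch means. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical; the skewness estimate $\hat{B}_{m,k''}$, the spaced batch count $k''$, and the quantile $t_{1-\alpha/2,\,k''-1}$ are taken as caller-supplied inputs because their computation is deferred to the Online Supplement; and the guard that returns $\zeta$ when $\beta$ is numerically zero is an addition made here to avoid division by zero.

```python
import math


def signed_cbrt(v):
    """Real cube root carrying the sign of v, as required by Equation (8)."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def skew_adjust(zeta, beta):
    """Skewness-adjustment function G(zeta) of Equation (8)."""
    if abs(beta) < 1e-12:
        return zeta  # limiting value as beta -> 0 (guard added in this sketch)
    return (signed_cbrt(1.0 + 6.0 * beta * (zeta - beta)) - 1.0) / (2.0 * beta)


def lag_one_correlation(y):
    """Lag-one correlation estimator of Equation (11)."""
    k = len(y)
    ybar = sum(y) / k
    s2 = sum((v - ybar) ** 2 for v in y) / (k - 1)
    cross = sum((y[j] - ybar) * (y[j + 1] - ybar) for j in range(k - 1))
    return cross / ((k - 1) * s2)


def skart_ci(y, skew_hat, k_spaced, t_upper):
    """Return (lower, upper, half_length) following Equations (7)-(11).

    y        : truncated, nonspaced batch means (length k')
    skew_hat : skewness estimate B-hat (caller-supplied; hypothetical input)
    k_spaced : number of spaced batch means k'' (caller-supplied)
    t_upper  : the 1 - alpha/2 quantile of Student's t with k'' - 1 d.f.
    """
    k = len(y)
    ybar = sum(y) / k
    s2 = sum((v - ybar) ** 2 for v in y) / (k - 1)
    phi = lag_one_correlation(y)
    a = (1.0 + phi) / (1.0 - phi)                 # Equation (10)
    beta = skew_hat / (6.0 * math.sqrt(k_spaced))
    g_left = skew_adjust(t_upper, beta)           # L = t_{1-alpha/2, k''-1}
    g_right = skew_adjust(-t_upper, beta)         # R = t_{alpha/2, k''-1} = -L
    se = math.sqrt(a * s2 / k)
    half = max(abs(g_left), abs(g_right)) * se
    return ybar - g_left * se, ybar - g_right * se, half
```

With a zero skewness estimate the interval reduces to a symmetric, correlation-adjusted t-interval; with a positive skewness estimate the right-hand subinterval is the longer of the two.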
If the CI (7) satisfies the precision requirement $H \le H^*$, where $H^*$ is given by

$$H^* = \begin{cases} \infty, & \text{for no user-specified precision level}, \\ r^* \bigl|\bar{Y}(m, k')\bigr|, & \text{for a user-specified relative precision level } r^*, \\ h^*, & \text{for a user-specified absolute precision level } h^*, \end{cases}$$

then Skart terminates, delivering the CI (7) as well as the usual CI of the form $\bar{Y}(m, k') \pm H$. If the precision requirement $H \le H^*$ is not satisfied, then Skart estimates the total number of nonspaced batches of the current batch size that are needed to satisfy the precision requirement, $k^* = (H/H^*)^2 k'$; and thus $k^* m$ is the latest estimate of the total sample size beyond the truncation point that is needed to satisfy the precision requirement. For the next iteration of Skart, the batch count is taken to be $k' \leftarrow \min\{k^*, 1{,}024\}$; and the associated batch size is updated according to

$$m \leftarrow \begin{cases} m, & \text{if } k' \ge k^*, \\ \mathrm{mid}\{1.02, \; k^*/k', \; 2.0\}\, m, & \text{if } k' < k^*, \end{cases} \tag{12}$$

where $\mathrm{mid}\{u_1, u_2, u_3\}$ denotes the median of the numbers $u_1$, $u_2$, and $u_3$. On the next iteration of Skart, the total sample size including the warm-up period is given by $n = w + k'm$. The additional observations are obtained by restarting the simulation or by retrieving extra data from storage; and then the next iteration of Skart involves reperforming the operations of Equations (5) through (12) with the updated values of $k'$, $m$, and $n$. Successive iterations of Skart are performed until a CI satisfying the precision requirement is finally delivered. See the Online Supplement for a complete description of the steps in the operation of Skart.

3. Simulation-Analysis Methods to Be Compared with Skart

3.1. Overview of ASAP3

Steiger et al. (2005) formulated ASAP3 as an improved variant of the batch-means algorithms ASAP (Steiger and Wilson 2002) and ASAP2 (Steiger et al. 2002) for steady-state simulation analysis. ASAP3 operates as follows: the batch size is progressively increased until spaced groups of four adjacent batch means pass the Shapiro-Wilk test for four-dimensional normality, where the spacer preceding each group also consists of four adjacent batch means; and then after skipping the first spacer as the warm-up period, ASAP3 fits a first-order autoregressive (AR(1)) time series model to the truncated, nonspaced batch means. If necessary, the batch size is further increased until the autoregressive parameter in the AR(1) model does not significantly exceed 0.8. Next ASAP3 computes the terms of an inverse Cornish-Fisher expansion for the classical batch-means t-ratio based on the AR(1) parameter estimates; and finally ASAP3 delivers a correlation-adjusted CI based on this expansion.
ASAP3 is a sequential procedure designed to deliver a CI satisfying a user-specified absolute or relative precision requirement. The stopping rules for Skart and SBatch are based on the stopping rules developed for ASAP3 and WASSP.

3.2. Overview of WASSP

WASSP is a wavelet-based spectral method for the analysis of steady-state simulation output (Lada and Wilson 2006). WASSP determines first a batch size and a truncation point beyond which nonoverlapping (adjacent) batch means form an approximately stationary normal process. For this purpose WASSP uses the randomness test of von Neumann (1941) to determine the size of the spacer preceding each batch that is sufficiently large to ensure the resulting spaced batch means are approximately i.i.d. Then WASSP uses the univariate normality test of Shapiro and Wilk (1965) to determine a batch size that is sufficiently large to ensure the spaced batch means are approximately normal. Next WASSP computes the discrete wavelet transform of the bias-corrected log-smoothed-periodogram of the truncated, nonspaced batch means; and the resulting wavelet coefficients are denoised by applying a soft-thresholding scheme. Then by computing the inverse discrete wavelet transform of the thresholded wavelet coefficients, WASSP delivers an estimator of the batch-means log-spectrum and ultimately the steady-state variance parameter of the original (unbatched) process. Finally WASSP combines the estimator of the steady-state variance parameter with the grand average of the truncated batch means in a sequential procedure for constructing a CI estimator of the steady-state mean that satisfies user-specified requirements on absolute or relative precision as well as coverage probability.

3.3. Overview of SBatch

SBatch (Lada et al. 2008) is a batch-means procedure in which the size of the warm-up period and the size of all subsequent batches are taken separately to be just large enough to yield spaced batch means that approximately constitute a stationary AR(1) process. Like WASSP, SBatch uses the randomness test of von Neumann (1941) and the normality test of Shapiro and Wilk (1965) to determine the size $s$ of the spacer preceding each batch and the batch size $m$ that are sufficiently large to ensure the resulting spaced batch means form an approximately (normal) AR(1) process.
Then SBatch tests the condition that 0.8 is an upper limit for the lag-one correlation of the resulting set of approximately normal, spaced batch means. Each time the correlation test is failed, the batch size $m$ is increased by 10%, the required additional observations are obtained (by restarting the simulation if necessary), a new set of spaced batch means is computed, and the correlation test is repeated for the new set of spaced batch means. Once the correlation test is passed, SBatch computes a correlation-adjusted $100(1-\alpha)\%$ CI on the steady-state mean using the current set of $k''$ approximately normal spaced batch means as follows,

$$\bar{X} \pm t_{1-\alpha/2,\,k''-1} \sqrt{\frac{A\,\hat{\sigma}^2_{\bar{X}(m,s)}}{k''}},$$

where: the midpoint $\bar{X}$ is the average of all observations except those in the first spacer; $\hat{\sigma}^2_{\bar{X}(m,s)}$ is the sample variance of the $k''$ spaced batch means with batch size $m$ and spacer size $s$; the quantity $t_{1-\alpha/2,\,k''-1}$ is the $1 - \alpha/2$ quantile of Student's t-distribution with $k'' - 1$ degrees of freedom; and $A = \bigl[1 + \hat{\varphi}_{\bar{X}(m,s)}\bigr] \big/ \bigl[1 - \hat{\varphi}_{\bar{X}(m,s)}\bigr]$ is the correlation adjustment based on $\hat{\varphi}_{\bar{X}(m,s)}$, the usual sample estimator of the lag-one correlation between the $k''$ spaced batch means with batch size $m$ and spacer size $s$. SBatch employs a stopping rule similar to that of ASAP3 and WASSP to deliver a CI satisfying a user-specified absolute or relative precision requirement.

4. Experimental Performance Evaluation

In this article, we compare the performance of Skart with that of other well-known steady-state simulation analysis procedures, namely ABATCH, ASAP3, the Heidelberger-Welch procedure, the Law-Carson procedure, SBatch, and WASSP. Several different types of test processes were used in the experimentation, including processes whose characteristics are typical of many large-scale practical applications of steady-state simulation, and processes exhibiting extremes of stochastic behavior that are commonly used to stress-test simulation analysis procedures. To demonstrate the robustness of Skart, we concentrate primarily on comparing the performance of Skart with that of SBatch, WASSP, and ASAP3. Some limited results are also presented for ABATCH, the Law-Carson procedure, and the Heidelberger-Welch procedure.

The steady-state mean response is available analytically for all the selected test problems; thus we were able to evaluate the performance of each selected procedure in terms of actual versus nominal coverage probabilities for the CIs delivered by each procedure. Beyond CI coverage probability, the performance of each simulation analysis procedure was monitored with respect to the following criteria: total sample size; average CI half-length; and variance of the CI half-length.
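The performance measures just listed can be tabulated from a set of independent replications along the following lines. This is a hedged sketch: the record layout (lower limit, upper limit, half-length, sample size) and the function names are hypothetical, and the standard error shown is the usual binomial standard error of an estimated proportion, $\sqrt{p(1-p)/N}$, evaluated at the nominal coverage probability $p$ for $N$ replications.

```python
import math
from statistics import mean, variance


def coverage_standard_error(nominal, replications):
    """Binomial standard error of a coverage estimate at the nominal level."""
    return math.sqrt(nominal * (1.0 - nominal) / replications)


def summarize(records, true_mean):
    """Summarize replications given as (lower, upper, half_length, sample_size).

    Returns the empirical coverage, the average sample size, and the
    average and sample variance of the CI half-lengths.
    """
    covered = [lo <= true_mean <= hi for lo, hi, _, _ in records]
    halves = [h for _, _, h, _ in records]
    sizes = [n for _, _, _, n in records]
    return {
        "coverage": sum(covered) / len(records),
        "avg_sample_size": mean(sizes),
        "avg_half_length": mean(halves),
        "var_half_length": variance(halves),
    }


# Usage: standard error (in percent) for nominal 90% CIs and 1,000 replications.
se_percent = 100.0 * coverage_standard_error(0.90, 1000)
```

The half-length is carried as a separate field because an asymmetric interval need not have a half-length equal to half its width.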
Each experiment included either 400 or 1,000 independent replications of the selected simulation analysis procedures, with nominal 90% and 95% CIs delivered on each replication of each procedure. To provide an indication of the asymptotic performance of each selected procedure, we used a decreasing sequence of relative-precision requirements (that is, decreasing values of $r$) with each test problem. The standard error of each CI coverage estimator depended on the number of independent replications of that CI and its nominal coverage probability. In the case of 1,000 replications, the standard error of the coverage estimator for nominal 90% CIs was approximately 0.95%; and for nominal 95% CIs, this standard error was approximately 0.69%. In the case of 400 replications, for nominal 90% CIs this standard error was approximately 1.5%; and for nominal 95% CIs, this standard error was approximately 1.1%. To obtain a reasonable level of precision in the estimation of Skart's actual coverage probabilities, we performed 1,000 replications of Skart in each experiment on each test process. The resulting errors in estimating actual CI coverage probabilities were small enough to allow a meaningful comparison of Skart's performance with that of its competitors.

4.1. M/M/1 Queue-Waiting-Time Process

Table 1 summarizes the experimental performance of Skart, SBatch, WASSP, and ASAP3 when they were applied to an M/M/1 queue-waiting-time process for a system with an empty-and-idle initial condition, an arrival rate of $\lambda = 0.9$ customers per time unit, and a service rate of $\mu = 1.0$ customers per time unit. In this system the steady-state server utilization is $\rho = 0.9$, the steady-state expected waiting time is $\mu_X = 9.0$ time units, and the steady-state standard deviation of the waiting time is $\sigma_X = 9.950$.

Table 1. Performance of Skart, SBatch, WASSP, and ASAP3 in the M/M/1 queue-waiting-time process with 90% server utilization and empty-and-idle initial condition

| Prec. req. | Performance measure | 90% CI: Skart | 90% CI: SBatch | 90% CI: WASSP | 90% CI: ASAP3 | 95% CI: Skart | 95% CI: SBatch | 95% CI: WASSP | 95% CI: ASAP3 |
|---|---|---|---|---|---|---|---|---|---|
| None | # replications | 1,000 | 1,000 | 1,000 | 400 | 1,000 | 1,000 | 1,000 | 400 |
| None | CI coverage | 87.60% | 87.1% | 87.7% | 87.5% | 93.9% | 91.6% | 93.4% | 91.5% |
| None | Avg. sample size | 42,369 | 54,371 | 18,090 | 31,181 | 42,369 | 54,371 | 17,971 | 31,181 |
| None | Avg. CI half-length | 1.7668 | 1.3864 | 3.0715 | 2.0719 | 2.298 | 1.6578 | 3.9987 | 2.5209 |
| None | Var. CI half-length | 0.2577 | 0.2603 | 2.0026 | 0.3478 | 1.2924 | 0.3725 | 3.6999 | 0.5350 |
| 15% | # replications | 1,000 | 1,000 | 1,000 | 400 | 1,000 | 1,000 | 1,000 | 400 |
| 15% | CI coverage | 89.6% | 86.6% | 87.2% | 91% | 93.9% | 91.2% | 93% | 95.5% |
| 15% | Avg. sample size | 85,996 | 66,719 | 92,049 | 103,742 | 124,169 | 88,447 | 143,920 | 140,052 |
| 15% | Avg. CI half-length | 1.1883 | 1.1556 | 1.1103 | 1.1820 | 1.2116 | 1.2046 | 1.1342 | 1.2059 |
| 15% | Var. CI half-length | 0.0199 | 0.0396 | 0.0387 | 0.0259 | 0.0174 | 0.0263 | 0.0314 | 0.0205 |
| 7.5% | # replications | 1,000 | 1,000 | 1,000 | 400 | 1,000 | 1,000 | 1,000 | 400 |
| 7.5% | CI coverage | 91.1% | 88.8% | 90.4% | 89.5% | 95.9% | 94% | 97% | 94% |
| 7.5% | Avg. sample size | 302,305 | 278,642 | 388,000 | 287,568 | 431,677 | 403,844 | 598,020 | 382,958 |
| 7.5% | Avg. CI half-length | 0.6347 | 0.6141 | 0.5866 | 0.6273 | 0.6374 | 0.6160 | 0.5950 | 0.6324 |
| 7.5% | Var. CI half-length | 0.0014 | 0.0055 | 0.0072 | 0.0023 | 0.0012 | 0.0056 | 0.0056 | 0.0020 |
| 3.75% | # replications | 1,000 | 1,000 | 1,000 | 400 | 1,000 | 1,000 | 1,000 | 400 |
| 3.75% | CI coverage | 92% | 89.8% | 94% | 89.5% | 96% | 95.2% | 97.7% | 93.5% |
| 3.75% | Avg. sample size | 1,105,417 | 1,151,178 | 1,518,400 | 969,011 | 1,586,267 | 1,618,147 | 2,361,300 | 1,341,522 |
| 3.75% | Avg. CI half-length | 0.321 | 0.3081 | 0.3060 | 0.3200 | 0.3212 | 0.3076 | 0.3060 | 0.3210 |
| 3.75% | Var. CI half-length | 0.0002 | 0.0014 | 0.0008 | 0.0004 | 0.0002 | 0.0014 | 0.0007 | 0.0004 |

The warm-up period for this process is relatively short, and consequently the effect of initialization bias on the sample mean waiting time is much less than for many of the other test processes considered in this study. However, this process is particularly interesting because in steady-state operation, we observe the following anomalies: (a) the autocorrelation function of the waiting time

process decays slowly with increasing lags; and (b) the marginal distribution of the waiting times is markedly nonnormal, having an atom at zero (that is, a nonzero probability mass at zero) and an exponential tail. These characteristics result in slow convergence to the classical requirement that the batch means are i.i.d. normal as the batch size increases.

From the results in Table 1, we concluded that all four procedures performed reasonably well in terms of conformance to the nominal coverage probability. This was to be expected, since virtually all simulation-analysis procedures have been tuned to this test problem at least to some extent. As the precision level $r$ became progressively smaller, Skart, ASAP3, and SBatch delivered CIs whose coverage probabilities converged to their nominal levels, while WASSP delivered CIs with some overcoverage; moreover, for the 7.5% and 3.75% precision levels, WASSP required substantially larger sample sizes than were required by Skart, ASAP3, or SBatch.

To put these figures into the proper perspective, note that the corresponding results for LBATCH, ABATCH, the Law-Carson procedure, and the Heidelberger-Welch procedure are inferior to most of the results in Table 1. From Table 2 of Steiger and Wilson (2002) for example, ABATCH delivered the following coverage probabilities for nominal 90% CIs with the indicated relative precision levels: (i) no precision, 60%; (ii) 15% precision, 72%; and (iii) 7.5% precision, 82%. From Table 2 of Lada et al. (2007), the corresponding coverage probabilities for the Heidelberger-Welch procedure are 67.8%, 76%, and 77%.

Next, we studied the M/M/1 queueing system with 90% server utilization as described above with arrival rate $\lambda = 0.9$ and service rate $\mu = 1.0$ but with an extreme initial condition in which $c = 113$ customers are assumed to be in the queue at time zero; and the first regular customer arrives as usual after an exponentially distributed interarrival time. This initial condition was carefully selected to induce a long transient in the queue-waiting-time statistic and to test the robustness of Skart, SBatch, and ASAP3 in removing severe initialization bias. Queue-waiting-time statistics are accumulated only for the regular customers arriving after time zero. Let $C(u)$ denote the number of customers in the system at time $u$ for all $u \ge 0$. If the initial number of customers $C(0) = c$, then it can be shown that the conditional moment generating function of $X_1$, the waiting time in the queue for the first regular customer, is

$$M_{X_1}(u) \equiv E\bigl[e^{uX_1} \,\big|\, C(0) = c\bigr] = \vartheta^c + \frac{(1-\vartheta)\bigl[1 - \vartheta^c (1 - u/\mu)^c\bigr]}{(1 - u/\mu)^c \bigl[1 - \vartheta(1 - u/\mu)\bigr]} \quad \text{for } u < \mu, \tag{13}$$

where $\vartheta = \mu/(\lambda + \mu)$ (see Appendix B of Tafazzoli (2009)). From (13) it follows that

$$E[X_1 \mid C(0) = c] = \frac{d}{du} M_{X_1}(0) = \frac{1}{\mu}\left[c - \frac{\vartheta(1 - \vartheta^c)}{1 - \vartheta}\right]$$

and

$$\mathrm{Var}[X_1 \mid C(0) = c] = \frac{d^2}{du^2} M_{X_1}(0) - E^2[X_1 \mid C(0) = c],$$

where the expression for $(d^2/du^2) M_{X_1}(0)$ is too complicated to display but is easily evaluated using Maple (Maplesoft 2003). For the M/M/1 queue-waiting-time process with $C(0) = 113$, we have

$$E[X_1 \mid C(0) = 113] = 111.889; \qquad \sqrt{\mathrm{Var}[X_1 \mid C(0) = 113]} = 10.6881. \tag{14}$$

Thus we see that with the initial condition $C(0) = 113$, the expected value of the queue waiting time for the first regular customer is

$$\frac{E[X_1 \mid C(0) = 113] - \mu_X}{\sigma_X} \doteq 10.34 \tag{15}$$

standard deviations above the steady-state mean queue waiting time. It is clear from the results in (14) and (15) that this test process is contaminated by severe initialization bias.

Table 2 summarizes the performance of Skart, SBatch, and ASAP3 for the M/M/1 queue-waiting-time process with 113 customers initially in the system. From Table 2 we concluded that Skart, SBatch, and ASAP3 all performed relatively well with respect to conformance to the nominal CI coverage probabilities. ASAP3 outperformed Skart and SBatch with respect to the average sample size in all cases. As the precision requirement became smaller, Skart required relatively smaller sample sizes than SBatch required. Comparing the results in Tables 1 and 2, we concluded that for the cases involving the larger (coarser) levels of relative precision (namely, no precision and 15%), the average sample size for the process with the extreme initial condition was substantially larger compared with the corresponding average sample size for the process having the empty-and-idle initial condition; and then the difference between these two average sample sizes decreased as the precision level decreased. For the no precision case, the sample sizes given in Table 2 for Skart, SBatch, and ASAP3 were roughly twice as large as the corresponding sample sizes for the empty-and-idle initial condition.
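The numerical values in (14) and (15) can be checked without symbolic differentiation. The following is a minimal sketch, assuming the usual reading of this initial condition: the number of initial customers served before the first regular arrival is a geometric count truncated at c, and the first regular customer then waits for an Erlang-distributed sum of the remaining exponential service times. The function names are illustrative only and do not come from the article; the steady-state mean and standard deviation use the standard M/M/1 waiting-time formulas.

```python
import math

def first_wait_moments(c, lam, mu):
    """Mean and std. dev. of the first regular customer's queue wait, given C(0) = c."""
    theta = mu / (lam + mu)  # probability that the next event is a service completion
    mean = 0.0
    second = 0.0
    for k in range(c + 1):
        # k initial customers finish service before the first regular arrival
        p = theta**k * (1.0 - theta) if k < c else theta**c
        n = c - k  # customers still ahead; the wait is Erlang(n, mu)
        mean += p * n / mu
        second += p * n * (n + 1) / mu**2
    return mean, math.sqrt(second - mean**2)

def mm1_wait_mean_sd(lam, mu):
    """Steady-state mean and std. dev. of the M/M/1 waiting time in the queue."""
    rho = lam / mu
    return rho / (mu - lam), math.sqrt(rho * (2.0 - rho)) / (mu - lam)

lam, mu, c = 0.9, 1.0, 113
m1, s1 = first_wait_moments(c, lam, mu)
mu_x, sigma_x = mm1_wait_mean_sd(lam, mu)
# These should be close to 111.889, 10.6881, and 10.34, the values in (14) and (15).
print(round(m1, 3), round(s1, 4), round((m1 - mu_x) / sigma_x, 2))
```

Summing over the truncated geometric count avoids differentiating the moment generating function twice, which is the step the text delegates to Maple.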
For the precision levels of 7.5% and 3.75%, both Skart and ASAP3 required roughly the same average sample sizes for both initial conditions, whereas SBatch required average sample sizes that were 15% to 40% larger for the initial condition $C(0) = 113$ compared with the corresponding average required sample sizes for the initial condition $C(0) = 0$. In general as the precision level decreased, we observed that the average sample size required by each procedure increased roughly as the inverse square of the precision level; and this inverse-square-law growth in the sample size rapidly swamped any effects arising from initialization bias.

Table 2. Performance of Skart, SBatch, and ASAP3 in the M/M/1 queue-waiting-time process with 90% server utilization and 113 customers initially in the system

| Prec. req. | Performance measure | 90% CI: Skart | 90% CI: SBatch | 90% CI: ASAP3 | 95% CI: Skart | 95% CI: SBatch | 95% CI: ASAP3 |
|---|---|---|---|---|---|---|---|
| None | # replications | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 |
| None | CI coverage | 91.4% | 92.1% | 93.1% | 97.5% | 96.1% | 95.7% |
| None | Avg. sample size | 74,951 | 111,198 | 57,880 | 73,503 | 111,198 | 57,876 |
| None | Avg. CI half-length | 1.6596 | 1.1843 | 1.6330 | 2.1469 | 1.4165 | 1.9699 |
| None | Var. CI half-length | 0.6722 | 0.1795 | 0.3970 | 1.4803 | 0.2570 | 0.6033 |
| 15% | # replications | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 |
| 15% | CI coverage | 93.3% | 91.9% | 91.8% | 96.9% | 96.3% | 95.5% |
| 15% | Avg. sample size | 112,622 | 124,040 | 93,296 | 159,457 | 143,172 | 127,885 |
| 15% | Avg. CI half-length | 1.1576 | 1.0729 | 1.1688 | 1.16 | 1.1710 | 1.1967 |
| 15% | Var. CI half-length | 0.0323 | 0.0544 | 0.0339 | 0.027 | 0.0413 | 0.0240 |
| 7.5% | # replications | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 |
| 7.5% | CI coverage | 92.5% | 93.7% | 90.1% | 97.6% | 97.3% | 95.6% |
| 7.5% | Avg. sample size | 318,154 | 358,533 | 300,386 | 464,443 | 514,722 | 390,574 |
| 7.5% | Avg. CI half-length | 0.6288 | 0.6115 | 0.6219 | 0.634 | 0.6101 | 0.6268 |
| 7.5% | Var. CI half-length | 0.002 | 0.0062 | 0.0027 | 0.0016 | 0.0063 | 0.0022 |
| 3.75% | # replications | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 |
| 3.75% | CI coverage | 92.4% | 92.1% | 89.3% | 96.3% | 95.3% | 94.9% |
| 3.75% | Avg. sample size | 1,126,834 | 1,329,144 | 968,361 | 1,604,888 | 1,887,500 | 1,338,628 |
| 3.75% | Avg. CI half-length | 0.3206 | 0.3079 | 0.3193 | 0.3206 | 0.3042 | 0.3214 |
| 3.75% | Var. CI half-length | 0.0002 | 0.0014 | 0.0004 | 0.0002 | 0.0015 | 0.0003 |

4.2. M/M/1 Number-in-Queue Process

Table 3 shows the result of applying Skart to construct CIs on the steady-state mean of the number-in-queue process for two M/M/1 queueing systems with the following characteristics: an empty-and-idle initial condition; arrival rates of 0.8 and 0.9 customers per time unit, respectively; and a service rate of 1.0 customers per time unit. Two versions of the M/M/1 number-in-queue process were included in the experimentation primarily to demonstrate Skart's facility for automatically handling time-persistent simulation output processes. (Although SBatch, ASAP3, and WASSP do not have comparable facilities, all three procedures could be adapted to yield CIs based on time-persistent statistics.) The sampling interval for the M/M/1 number-in-queue process was set at 1.0 time units, enforcing the collection of the time-averaged number-in-queue statistic every 1.0 time units during each simulation run. Because the service rate is 1.0 customers per

time unit and the arrival rates are relatively large, it is reasonable to monitor changes in the time-averaged number-in-queue statistic every 1.0 time units.

Table 3. Performance of Skart in the M/M/1 number-in-queue process with 80% and 90% server utilization computed over 1,000 independent replications

| Prec. req. | Performance measure | 80% utilization: 90% CIs | 80% utilization: 95% CIs | 90% utilization: 90% CIs | 90% utilization: 95% CIs |
|---|---|---|---|---|---|
| None | CI coverage | 90.00% | 94.90% | 89.20% | 93.90% |
| None | Avg. sample size | 16,352 | 16,352 | 44,599 | 44,599 |
| None | Avg. CI half-length | 0.5639 | 0.6847 | 1.6058 | 1.946 |
| None | Var. CI half-length | 0.0188 | 0.0289 | 0.23 | 0.3421 |
| 15% | CI coverage | 90.50% | 95.10% | 88.80% | 95.20% |
| 15% | Avg. sample size | 25,887 | 35,963 | 90,467 | 134,859 |
| 15% | Avg. CI half-length | 0.4290 | 0.4348 | 1.081 | 1.109 |
| 15% | Var. CI half-length | 0.0029 | 0.0023 | 0.0183 | 0.0149 |
| 7.5% | CI coverage | 91.20% | 94.30% | 90.90% | 96.70% |
| 7.5% | Avg. sample size | 88,289 | 123,514 | 331,844 | 478,595 |
| 7.5% | Avg. CI half-length | 0.2273 | 0.2285 | 0.5739 | 0.5537 |
| 7.5% | Var. CI half-length | 0.0001 | 0.0001 | 0.001 | 0.001 |
| 3.75% | CI coverage | 91.00% | 96.40% | 91.60% | 96.10% |
| 3.75% | Avg. sample size | 307,769 | 433,417 | 1,030,586 | 1,505,643 |
| 3.75% | Avg. CI half-length | 0.1144 | 0.1143 | 0.3006 | 0.3007 |
| 3.75% | Var. CI half-length | 0 | 0 | 0.0001 | 0.0001 |

From the results in Table 3, we concluded that at all selected precision levels the delivered CIs for both 80% and 90% server utilizations were in close conformance with the corresponding nominal coverage probabilities. The average simulation run length for each precision level is equal to the reported average sample size multiplied by the sampling interval. Comparing the results in Table 3 with the results in Table 1, we concluded that for the M/M/1 queue with 90% server utilization and at each level of relative precision, Skart required nearly the same average sample sizes to deliver CIs for the steady-state means of both the number-in-queue and the queue-waiting-time processes.

4.3. M/H_2/1 Queue-Waiting-Time Process

Table 4 summarizes the results for the queue-waiting-time process in an M/H_2/1 queueing system with an empty-and-idle initial condition, a mean interarrival time of 1.0, and a hyperexponential service-time distribution that is a mixture of two exponential distributions as detailed in Appendix 2 of Lada et al. (2006) such that the service times have a mean of 0.8 and a coefficient of variation of 2.0. Thus in steady-state operation this system has a server utilization of $\rho = 0.8$ and a mean queue waiting time of $\mu_X = 8.0$.
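The steady-state mean queue waiting time quoted for this system follows from the Pollaczek-Khinchine formula for an M/G/1 queue, which needs only the arrival rate and the first two moments of the service time. The sketch below is an illustration under that standard formula; the function name and argument names are hypothetical and are not taken from the article.

```python
import math

def pk_mean_queue_wait(arrival_rate, mean_service, cv_service):
    """Pollaczek-Khinchine mean waiting time in queue for an M/G/1 system."""
    rho = arrival_rate * mean_service                         # server utilization
    second_moment = mean_service**2 * (1.0 + cv_service**2)   # E[S^2] from mean and c.v.
    return arrival_rate * second_moment / (2.0 * (1.0 - rho))

# M/H_2/1 case: mean interarrival time 1.0, mean service 0.8, coefficient of variation 2.0
print(pk_mean_queue_wait(1.0, 0.8, 2.0))   # approximately 8.0
# M/M/1 case of Section 4.1: arrival rate 0.9, exponential service with mean 1.0
print(pk_mean_queue_wait(0.9, 1.0, 1.0))   # approximately 9.0
```

Only the mean and coefficient of variation of the hyperexponential distribution enter the mean waiting time, so the mixture weights of the two exponential components are not needed for this check.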

Table 4. Performance of Skart, SBatch, WASSP, and ASAP3 in the M/H_2/1 queue-waiting-time process with 80% server utilization and empty-and-idle initial condition

| Prec. req. | Performance measure | 90% CI: Skart | 90% CI: SBatch | 90% CI: WASSP | 90% CI: ASAP3 | 95% CI: Skart | 95% CI: SBatch | 95% CI: WASSP | 95% CI: ASAP3 |
|---|---|---|---|---|---|---|---|---|---|
| None | # replications | 1,000 | 1,000 | 1,000 | 400 | 1,000 | 1,000 | 1,000 | 400 |
| None | CI coverage | 90% | 89.5% | 91% | 87.8% | 93% | 94.3% | 93% | 91.8% |
| None | Avg. sample size | 30,379 | 50,777 | 23,221 | 42,022 | 30,379 | 50,777 | 22,230 | 42,022 |
| None | Avg. CI half-length | 1.808 | 1.2135 | 2.7040 | 1.6140 | 2.3306 | 1.4504 | 3.4560 | 1.9500 |
| None | Var. CI half-length | 0.462 | 0.2106 | 1.7720 | 0.5960 | 1.4362 | 0.3015 | 2.9820 | 0.9080 |
| 15% | # replications | 1,000 | 1,000 | 1,000 | 400 | 1,000 | 1,000 | 1,000 | 400 |
| 15% | CI coverage | 89.7% | 89.2% | 88.3% | 88% | 95.1% | 93.3% | 94.5% | 93.3% |
| 15% | Avg. sample size | 72,890 | 65,149 | 78,691 | 76,214 | 103,163 | 84,363 | 138,960 | 96,706 |
| 15% | Avg. CI half-length | 1.074 | 1.0286 | 0.9930 | 1.0330 | 1.106 | 1.0804 | 0.9940 | 1.0690 |
| 15% | Var. CI half-length | 0.0143 | 0.0339 | 0.0300 | 0.0270 | 0.0099 | 0.0227 | 0.0290 | 0.0170 |
| 7.5% | # replications | 1,000 | 1,000 | 1,000 | 400 | 1,000 | 1,000 | 1,000 | 400 |
| 7.5% | CI coverage | 91.3% | 89.6% | 91% | 90% | 96.4% | 94.7% | 95.7% | 94.5% |
| 7.5% | Avg. sample size | 255,363 | 254,400 | 330,580 | 228,482 | 367,391 | 364,154 | 519,990 | 309,560 |
| 7.5% | Avg. CI half-length | 0.5664 | 0.5478 | 0.5160 | 0.5620 | 0.5655 | 0.5512 | 0.5280 | 0.5650 |
| 7.5% | Var. CI half-length | 0.0011 | 0.0048 | 0.0060 | 0.0020 | 0.001 | 0.0048 | 0.0020 | 0.0003 |
| 3.75% | # replications | 1,000 | 1,000 | 1,000 | 400 | 1,000 | 1,000 | 1,000 | 400 |
| 3.75% | CI coverage | 91.8% | 89.5% | 93% | 90% | 95.9% | 94.7% | 98% | 94.7% |
| 3.75% | Avg. sample size | 929,527 | 1,028,683 | 1,283,400 | 798,234 | 1,337,112 | 1,396,922 | 2,006,800 | 1,115,986 |
| 3.75% | Avg. CI half-length | 0.2862 | 0.2722 | 0.2700 | 0.2870 | 0.2858 | 0.2729 | 0.2700 | 0.2880 |
| 3.75% | Var. CI half-length | 0.0002 | 0.0012 | 0.0009 | 0.0003 | 0.0002 | 0.0011 | 0.0009 | 0.0002 |

We concluded from Table 4 that in the case of no precision requirement, Skart and WASSP outperformed SBatch and ASAP3 with respect to average required sample size, while all four procedures achieved close conformance to the user-specified CI coverage probability. In the case of 15% precision, all four procedures performed about the same. In the 7.5% and 3.75% precision cases, Skart, SBatch, and ASAP3 delivered comparable CI coverages; however, the average sample size required by ASAP3 was smaller than the average sample sizes required by Skart, SBatch, and WASSP. At all levels of precision, the CI coverages provided by Skart and SBatch were close to the corresponding nominal levels.

4.4. First-Order Autoregressive (AR(1)) Process

The results shown in Table 5 are for applying Skart, SBatch, WASSP, and ASAP3 to an AR(1) process with the initial condition $X_0 = 0$, the autoregressive parameter $\rho = 0.995$, and the steady-state mean $\mu_X = 100$. This process is generated via the relation

$$X_t = \mu_X + \rho\,(X_{t-1} - \mu_X) + \varepsilon_t \quad \text{for } t = 1, 2, \ldots, \tag{16}$$

where $\{\varepsilon_t : t = 1, 2, \ldots\}$ are i.i.d. $N(0, \sigma_\varepsilon^2)$ with $\sigma_\varepsilon^2 = 1$.
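As a minimal sketch of how a realization of the AR(1) test process in (16) might be generated, the code below iterates the recursion from the stated initial condition using Python's standard-library generator. The function names, the parameter name `rho` for the autoregressive parameter, and the seeding convention are assumptions made for illustration; the marginal standard deviation uses the standard stationary-variance formula for an AR(1) process.

```python
import math
import random

def ar1_path(n, mu_x=100.0, rho=0.995, sigma_eps=1.0, x0=0.0, seed=None):
    """Generate X_1, ..., X_n from X_t = mu_x + rho * (X_{t-1} - mu_x) + eps_t."""
    rng = random.Random(seed)
    x = x0
    path = []
    for _ in range(n):
        x = mu_x + rho * (x - mu_x) + rng.gauss(0.0, sigma_eps)
        path.append(x)
    return path

def ar1_marginal_sd(rho, sigma_eps=1.0):
    """Steady-state marginal standard deviation of a stationary AR(1) process."""
    return sigma_eps / math.sqrt(1.0 - rho**2)

sd = ar1_marginal_sd(0.995)
print(round(sd, 4))                  # approximately 10.0125
print(round((0.0 - 100.0) / sd, 2))  # initial condition measured in standard deviations
```

With the autoregressive parameter this close to one, the deterministic part of the recursion closes only 0.5% of the remaining gap to the steady-state mean at each step, which is why the initial transient is long.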

Table 5. Performance of Skart, SBatch, WASSP, and ASAP3 in the AR(1) process

| Prec. req. | Performance measure | 90% CI: Skart | 90% CI: SBatch | 90% CI: WASSP | 90% CI: ASAP3 | 95% CI: Skart | 95% CI: SBatch | 95% CI: WASSP | 95% CI: ASAP3 |
|---|---|---|---|---|---|---|---|---|---|
| None | # replications | 1,000 | 1,000 | 1,000 | 400 | 1,000 | 1,000 | 1,000 | 400 |
| None | CI coverage | 92.9% | 91.5% | 90.9% | 95.5% | 96.6% | 95.6% | 94.5% | 98.8% |
| None | Avg. sample size | 21,537 | 29,831 | 9,866 | 41,076 | 21,537 | 29,831 | 9,824 | 41,076 |
| None | Avg. CI half-length | 2.5445 | 2.1468 | 5.3000 | 2.3300 | 3.0917 | 2.5678 | 6.7300 | 2.8300 |
| None | Var. CI half-length | 0.1554 | 0.0901 | 1.8300 | 0.1700 | 0.2269 | 0.1292 | 2.8800 | 0.2700 |
| 3.75% | # replications | 1,000 | 1,000 | 1,000 | 400 | 1,000 | 1,000 | 1,000 | 400 |
| 3.75% | CI coverage | 93.5% | 91.5% | 87% | 95.5% | 97.1% | 95.6% | 95% | 98.8% |
| 3.75% | Avg. sample size | 21,442 | 29,831 | 13,535 | 41,076 | 21,947 | 29,857 | 21,099 | 41,208 |
| 3.75% | Avg. CI half-length | 2.6716 | 2.1468 | 3.2100 | 2.3300 | 3.1723 | 2.5653 | 3.2800 | 2.8200 |
| 3.75% | Var. CI half-length | 0.1408 | 0.0901 | 0.1420 | 0.1700 | 0.1321 | 0.1233 | 0.1530 | 0.2570 |
| 1.875% | # replications | 1,000 | 1,000 | 1,000 | 400 | 1,000 | 1,000 | 1,000 | 400 |
| 1.875% | CI coverage | 94.2% | 91.2% | 93.5% | 95.5% | 96.2% | 95.8% | 97.7% | 99.3% |
| 1.875% | Avg. sample size | 46,747 | 42,182 | 57,449 | 68,474 | 68,636 | 61,001 | 90,371 | 101,526 |
| 1.875% | Avg. CI half-length | 1.7458 | 1.7764 | 1.6500 | 1.7600 | 1.729 | 1.7717 | 1.6600 | 1.7700 |
| 1.875% | Var. CI half-length | 0.0155 | 0.0084 | 0.0423 | 0.0134 | 0.0206 | 0.0100 | 0.0429 | 0.0120 |
| 0.9375% | # replications | 1,000 | 1,000 | 1,000 | 400 | 1,000 | 1,000 | 1,000 | 400 |
| 0.9375% | CI coverage | 93.8% | 92.7% | 94% | 94.3% | 97.1% | 96.9% | 98% | 97.3% |
| 0.9375% | Avg. sample size | 170,792 | 175,257 | 229,730 | 213,826 | 231,873 | 249,387 | 333,050 | 254,920 |
| 0.9375% | Avg. CI half-length | 0.9019 | 0.8861 | 0.8300 | 0.8940 | 0.9027 | 0.8855 | 0.8670 | 0.8960 |
| 0.9375% | Var. CI half-length | 0.0014 | 0.0035 | 0.0105 | 0.0026 | 0.0008 | 0.0039 | 0.0115 | 0.0021 |

The high correlation between successive observations in this process makes it a severe test of Skart's ability to handle correlated observations and to deliver an approximately valid correlation-adjusted CI. The steady-state marginal standard deviation of the AR(1) process (16) is

$$\sigma_X = \sigma_\varepsilon \big/ \sqrt{1 - \rho^2} = 10.0125;$$

and thus like the M/M/1 queue-waiting-time process with 113 customers initially in the system, the AR(1) process (16) with initial condition $X_0 = 0$ starts approximately ten standard deviations away from its steady-state mean. In both processes, there is a high level of positive correlation between successive observations, and the magnitude of the resulting initialization bias is large; however, the bias is positive for the M/M/1 queue-waiting-time process and negative for the AR(1) process. The magnitude and duration of the initial transient in simulation-generated realizations of the AR(1) process (16) were purposely designed to stress-test Skart's ability to eliminate initialization bias as well as to compensate effectively for pronounced correlation among successive observations of a target process.

Table 5 shows that for all precision levels, Skart's sampling efficiency was better than that of ASAP3. Notice that for this test problem, ASAP3-generated CIs exhibited significant overcoverage in some cases. In particular, for 95% CIs with 1.875% precision, ASAP3 delivered CIs having

99.3% coverage, which was significantly larger than the nominal level. For the no precision case, WASSP had the best sampling efficiency, with an average sample size of 9,866 and an empirical coverage probability of 90.9% for nominal 90% CIs, although the mean and variance of the CI half-lengths delivered by WASSP were significantly larger than the mean and variance of the CI half-lengths provided by Skart, SBatch, and ASAP3. WASSP also had the best sampling efficiency at the 3.75% relative precision level for nominal 90% CIs, requiring an average sample size of 13,535; by contrast, Skart, SBatch, and ASAP3 had average sample sizes of 21,442, 29,831, and 41,076, respectively. On the other hand, in this case WASSP delivered an empirical coverage probability of 87%; and by contrast the empirical coverage probabilities in this case for Skart, SBatch, and ASAP3 were 93.5%, 91.5%, and 95.5%, respectively.

4.5. AR(1)-to-Pareto (ARTOP) Process

To generate the AR(1)-to-Pareto, or ARTOP, process, we require an underlying (or "base") AR(1) process $\{Z_i : i = 1, 2, \ldots\}$ represented by

$$Z_i = \rho Z_{i-1} + b_i \quad \text{for } i = 1, 2, \ldots, \tag{17}$$

where $Z_0 \sim N(0, 1)$, and $\{b_i : i = 1, 2, \ldots\}$ i.i.d. $N(0, \sigma_b^2)$ is a white-noise process with variance $\sigma_b^2 = \sigma_Z^2 (1 - \rho^2) = 1 - \rho^2$. The base AR(1) process (17) is then supplied as input to the standard normal c.d.f. to obtain a sequence of correlated random variables $\{U_i = \Phi(Z_i) : i = 1, 2, \ldots\}$ whose marginal distribution is Uniform(0,1), where

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\zeta^2/2}\, d\zeta \quad \text{for all real } z$$

denotes the $N(0, 1)$ c.d.f. Finally, the process $\{U_i : i = 1, 2, \ldots\}$ is supplied as input to the inverse of the Pareto c.d.f.

$$F_X(x) \equiv \Pr\{X \le x\} = \begin{cases} 1 - (\xi/x)^{\psi}, & x \ge \xi, \\ 0, & x < \xi, \end{cases}$$

where $\xi > 0$ is a location parameter and $\psi > 0$ is a shape parameter, to generate the ARTOP process $\{X_i : i = 1, 2, \ldots\}$ as follows,

$$X_i = F_X^{-1}(U_i) = F_X^{-1}[\Phi(Z_i)] = \xi \big/ [1 - \Phi(Z_i)]^{1/\psi} \quad \text{for } i = 1, 2, \ldots. \tag{18}$$

The mean and variance of the ARTOP process (18) are respectively given by $\mu_X = E[X_i] = \psi\xi(\psi - 1)^{-1}$ (for $\psi > 1$) and $\sigma_X^2 = \mathrm{Var}[X_i] = \xi^2 \psi (\psi - 1)^{-2} (\psi - 2)^{-1}$ (for $\psi > 2$). The parameters of the Pareto distribution (18) are set according to $\psi = 2.1$ and $\xi = 1$; and the lag-one correlation in the base process (17) is set to $\rho = 0.995$. This provides an ARTOP process

$\{X_i : i = 1, 2, \ldots\}$ whose marginal distribution has mean, variance, skewness, and kurtosis given by $\mu_X = 1.9091$, $\sigma_X^2 = 17.3554$, $E\{[(X_i - \mu_X)/\sigma_X]^3\} = \infty$, and $E\{[(X_i - \mu_X)/\sigma_X]^4\} = \infty$, respectively. The ARTOP process (18) is particularly difficult because its marginals are highly nonnormal; in fact, infinite values of the marginal skewness and kurtosis are well beyond the type of nonnormality that Skart was designed to handle. To stress-test Skart with this process, we used the initial condition $Z_0 = 3.4$ for the base AR(1) process (17) so as to generate an ARTOP process with a long transient period; in particular, note that the corresponding initial condition $X_0 = F_X^{-1}[\Phi(Z_0)] = 43.5689$ is ten standard deviations above the steady-state mean $\mu_X$ of the ARTOP process.

The results obtained for the ARTOP process are summarized in Table 6. Note that the results reported in Table 6 for SBatch, WASSP, and ASAP3 are based on ARTOP processes that began in steady-state operation (i.e., we sampled $Z_0 \sim N(0, 1)$ to ensure $X_i \sim F_X(\cdot)$ for $i = 1, 2, \ldots$) and thus had no warm-up period. Therefore SBatch, WASSP, and ASAP3 arguably had a substantial advantage over Skart in the following performance comparison.

Table 6. Performance of Skart, SBatch, WASSP, and ASAP3 in the ARTOP process

| Prec. req. | Performance measure | 90% CI: Skart | 90% CI: SBatch | 90% CI: WASSP | 90% CI: ASAP3 | 95% CI: Skart | 95% CI: SBatch | 95% CI: WASSP | 95% CI: ASAP3 |
|---|---|---|---|---|---|---|---|---|---|
| None | # replications | 1,000 | 1,000 | 400 | 400 | 1,000 | 1,000 | 400 | 400 |
| None | CI coverage | 88.3% | 85.3% | 79% | 85.5% | 93.5% | 90.1% | 87% | 90.8% |
| None | Avg. sample size | 37,923 | 47,423 | 22,512 | 114,053 | 37,923 | 47,423 | 19,012 | 114,053 |
| None | Avg. CI half-length | 0.6399 | 0.3012 | 0.4480 | 0.1730 | 1.0307 | 0.0576 | 0.0830 | 0.0144 |
| None | Var. CI half-length | 0.6206 | 0.0403 | 0.0540 | 0.0098 | 5.927 | 0.0576 | 0.0830 | 0.0144 |
| 15% | # replications | 1,000 | 1,000 | 400 | 400 | 1,000 | 1,000 | 400 | 400 |
| 15% | CI coverage | 90.3% | 84.4% | 71.5% | 85.5% | 94.5% | 89.6% | 81% | 90.8% |
| 15% | Avg. sample size | 98,333 | 85,077 | 66,158 | 117,092 | 158,616 | 109,473 | 95,488 | 120,660 |
| 15% | Avg. CI half-length | 0.2223 | 0.1940 | 0.2230 | 0.1030 | 0.2244 | 0.2121 | 0.2230 | 0.1900 |
| 15% | Var. CI half-length | 0.0022 | 0.0030 | 0.0020 | 0.0025 | 0.0019 | 0.0031 | 0.0020 | 0.0024 |
| 7.5% | # replications | 1,000 | 1,000 | 400 | 400 | 1,000 | 1,000 | 400 | 400 |
| 7.5% | CI coverage | 88.3% | 82.3% | 85.3% | 84% | 95.7% | 88% | 91.5% | 90.3% |
| 7.5% | Avg. sample size | 333,666 | 306,781 | 345,870 | 186,517 | 478,926 | 460,613 | 520,750 | 255,512 |
| 7.5% | Avg. CI half-length | 0.1218 | 0.1154 | 0.1160 | 0.1270 | 0.1213 | 0.1144 | 0.1200 | 0.1310 |
| 7.5% | Var. CI half-length | 0.0003 | 0.0007 | 0.0005 | 0.0002 | 0.0003 | 0.0007 | 0.0004 | 0.0001 |
| 3.75% | # replications | 1,000 | 1,000 | | 400 | 1,000 | 1,000 | | 400 |
| 3.75% | CI coverage | 91.1% | 86.3% | | 88.8% | 96.6% | 93.7% | | 91% |
| 3.75% | Avg. sample size | 1,098,130 | 1,366,856 | | 734,312 | 1,588,612 | 1,943,033 | | 1,044,259 |
| 3.75% | Avg. CI half-length | 0.064 | 0.0576 | | 0.0665 | 0.0643 | 0.0580 | | 0.0668 |
| 3.75% | Var. CI half-length | 0 | 0.0002 | | 0 | 0 | 0.0002 | | 0 |

Table 6 indicates that Skart achieved close conformance to the user-specified CI coverage probabilities at all precision levels. Further examination of Table 6 revealed that Skart's performance in the ARTOP process was much better than that of SBatch, WASSP, and ASAP3. For example, in the

case of 15% relative precision, WASSP's nominal 90% and 95% CIs had coverages of 71.5% and 81%, respectively. Moreover, in the case of 7.5% precision, the nominal 90% CIs delivered by SBatch and ASAP3 had coverages of 82.3% and 84%, respectively. As summarized in Table 6, the performance of Skart in the ARTOP process suggested that in many types of practical applications, Skart should be robust against markedly nonnormal marginals, pronounced autocorrelations, and substantial initialization bias. In particular, the results in Table 6 demonstrated the effectiveness of Skart's skewness and correlation adjustments to the classical batch-means Student's t-ratio, so that neither the remaining deviations of the final batch means from normality nor the autocorrelations among the final batch means caused a loss of CI coverage that was either practically or statistically significant.

4.6. M/M/1/LIFO Queue-Waiting-Time Process

The next test process was the sequence of queue waiting times for the M/M/1/LIFO queue, with customers in the queue being served in last-in-first-out (LIFO) order, an empty-and-idle initial condition, a mean interarrival time of 1.0, and a mean service time of 0.8. In steady-state operation this system has a server utilization of $\rho = 0.8$ and a mean queue waiting time of $\mu_X = 3.20$. The M/M/1/LIFO queue-waiting-time process was selected mainly because in steady-state operation, batch means computed from the waiting times are highly skewed, even for batch sizes that are sufficiently large to ensure the batch means are nearly uncorrelated (Lada et al. 2006).

Table 7 summarizes the experimental performance of Skart, SBatch, WASSP, and ASAP3 for the queue-waiting-time process in the M/M/1/LIFO queueing system. From Table 7 we concluded that Skart had better sampling efficiency compared with that of WASSP and SBatch, especially at the less-stringent precision levels (that is, no precision and 15% relative precision). For the M/M/1/LIFO queue-waiting-time process, the large skewness of the batch means caused the normality test in SBatch and WASSP to be passed only after the significance level of the test had become practically negligible (that is, less than $10^{-30}$), resulting in excessive sample sizes for both SBatch and WASSP merely to pass the normality test. For the precision levels of 15% and 7.5%, Skart also demonstrated closer conformance to the nominal CI coverage probabilities compared with ASAP3. For example, in the case of 15% precision, the coverage probabilities for 90% CIs delivered by Skart and ASAP3 were 91.9% and 86.8%, respectively. From Table 2 of Lada et al. (2006), the Law-Carson procedure delivered the following coverage probabilities for nominal 90% CIs: (i) no precision, 64%; (ii) 15% precision, 76%; and (iii) 7.5% precision, 84%.
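When reading coverage comparisons such as 91.9% (1,000 replications) against 86.8% (400 replications), it helps to recall the standard errors quoted at the start of the experimental discussion. The sketch below assumes each coverage estimate is a binomial proportion evaluated at its nominal level, which reproduces the quoted values; the function name is illustrative only.

```python
import math

def coverage_std_error(nominal, n_reps):
    """Standard error of an estimated CI coverage, treated as a binomial proportion."""
    return math.sqrt(nominal * (1.0 - nominal) / n_reps)

for n_reps in (1000, 400):
    for nominal in (0.90, 0.95):
        se = coverage_std_error(nominal, n_reps)
        print(n_reps, nominal, f"{100 * se:.2f}%")
```

Under this assumption the four standard errors are roughly 0.95% and 0.69% for 1,000 replications and 1.5% and 1.1% for 400 replications, so a coverage estimate based on 400 replications carries noticeably more sampling noise than one based on 1,000.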

Table 7. Performance of Skart, SBatch, WASSP, and ASAP3 in the M/M/1/LIFO queue-waiting-time process with 80% server utilization

| Prec. req. | Performance measure | 90% CI: Skart | 90% CI: SBatch | 90% CI: WASSP | 90% CI: ASAP3 | 95% CI: Skart | 95% CI: SBatch | 95% CI: WASSP | 95% CI: ASAP3 |
|---|---|---|---|---|---|---|---|---|---|
| None | # replications | 1,000 | 1,000 | 400 | 400 | 1,000 | 1,000 | 400 | 400 |
| None | CI coverage | 85.6% | 91.4% | 93% | 87% | 92.6% | 95.9% | 96% | 92.5% |
| None | Avg. sample size | 21,176 | 117,416 | 125,517 | 53,958 | 21,176 | 117,416 | 124,202 | 53,958 |
| None | Avg. CI half-length | 0.5136 | 0.1891 | 0.2650 | 0.1060 | 0.6987 | 0.2255 | 0.3350 | 0.3120 |
| None | Var. CI half-length | 0.0875 | 0.0057 | 0.0230 | 0.1060 | 0.247 | 0.0080 | 0.0310 | 0.0080 |
| 15% | # replications | 1,000 | 1,000 | 400 | 400 | 1,000 | 1,000 | 400 | 400 |
| 15% | CI coverage | 91.9% | 91.3% | 90.7% | 86.8% | 94.6% | 94% | 95.2% | 92.8% |
| 15% | Avg. sample size | 27,017 | 118,209 | 124,512 | 54,017 | 36,026 | 119,903 | 126,682 | 54,265 |
| 15% | Avg. CI half-length | 0.4035 | 0.1812 | 0.2490 | 0.2600 | 0.4185 | 0.2117 | 0.2960 | 0.3080 |
| 15% | Var. CI half-length | 0.0037 | 0.0021 | 0.0110 | 0.0040 | 0.0028 | 0.0022 | 0.0110 | 0.0050 |
| 7.5% | # replications | 1,000 | 1,000 | 400 | 400 | 1,000 | 1,000 | 400 | 400 |
| 7.5% | CI coverage | 91.6% | 89.5% | 90.2% | 87.5% | 95.9% | 95.4% | 96.2% | 92.5% |
| 7.5% | Avg. sample size | 81,441 | 126,961 | 152,355 | 68,325 | 122,391 | 134,123 | 194,590 | 90,911 |
| 7.5% | Avg. CI half-length | 0.2241 | 0.1734 | 0.1860 | 0.2190 | 0.2224 | 0.1996 | 0.1990 | 0.2260 |
| 7.5% | Var. CI half-length | 0.0003 | 0.0012 | 0.0020 | 0.0005 | 0.0002 | 0.0011 | 0.0010 | 0.0003 |
| 3.75% | # replications | 1,000 | | | | 1,000 | | | |
| 3.75% | CI coverage | 91.6% | | | | 96.4% | | | |
| 3.75% | Avg. sample size | 305,903 | | | | 444,852 | | | |
| 3.75% | Avg. CI half-length | 0.1139 | | | | 0.1138 | | | |
| 3.75% | Var. CI half-length | 0 | | | | 0 | | | |

4.7. M/M/1/M/1 Queue-Waiting-Time Process

The next test process was the overall queue waiting time in a system consisting of two M/M/1 queues in series; this is usually called the M/M/1/M/1 queueing system. This system has an empty-and-idle initial condition, a mean interarrival time of 1.0, and a mean service time of 0.8 at each server. In steady-state operation, each server has a utilization of $\rho = 0.8$; and the expected total waiting time in both queues is 6.4.
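The expected total waiting time of 6.4 can be recovered from the single-queue M/M/1 formula. The sketch below assumes the standard result (Burke's theorem) that the steady-state departure process of the first M/M/1 queue is Poisson with the same rate as the arrivals, so that the second queue also behaves as an M/M/1 queue; the function name is illustrative and not from the article.

```python
import math

def mm1_mean_queue_wait(arrival_rate, service_rate):
    """Steady-state mean waiting time in queue for an M/M/1 system."""
    rho = arrival_rate / service_rate
    return rho / (service_rate - arrival_rate)

arrival_rate = 1.0 / 1.0   # mean interarrival time of 1.0
service_rate = 1.0 / 0.8   # mean service time of 0.8 at each server
per_queue = mm1_mean_queue_wait(arrival_rate, service_rate)
print(per_queue, 2 * per_queue)   # approximately 3.2 and 6.4
```

The per-queue value of about 3.2 also agrees with the mean queue waiting time quoted for the M/M/1/LIFO system, since the service order changes the distribution of the waiting time but not its mean when service times are independent of the queue discipline.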
Table 8 summarizes the results of applying Skart, ASAP3, and WASSP to the M/M/1/M/1 queue-waiting-time process. Skart, ASAP3, and WASSP achieved close conformance to the specified CI coverage probabilities at all reported precision levels, but WASSP required substantially larger average sample sizes than were required by Skart and ASAP3.

4.8. Two-State Discrete-Time Markov Chains

For the last series of test processes, we used a real-valued reward function defined on three irreducible aperiodic discrete-time Markov chains (DTMCs), each with a relatively high positive correlation structure but with marginal distributions having different levels of skewness. In partic-