Introduction to Modeling and Generating Probabilistic Input Processes for Simulation

Similar documents
Proceedings of the 2006 Winter Simulation Conference L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson, D. M. Nicol, and R. M. Fujimoto, eds.

INTRODUCTION TO MODELING AND GENERATING PROBABILISTIC INPUT PROCESSES FOR SIMULATION

Univariate Input Models for Stochastic Simulation

INTRODUCTION TO MODELING AND GENERATING PROBABILISTIC INPUT PROCESSES FOR SIMULATION. James R. Wilson

LEAST SQUARES ESTIMATION OF NONHOMOGENEOUS POISSON PROCESSES

Dr. Maddah ENMG 617 EM Statistics 10/15/12. Nonparametric Statistics (2) (Goodness of fit tests)

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations

Estimating and Simulating Nonhomogeneous Poisson Processes

IE 581 Introduction to Stochastic Simulation

Recall the Basics of Hypothesis Testing

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.

Estimation for Nonhomogeneous Poisson Processes from Aggregated Data

SIMULATION SEMINAR SERIES INPUT PROBABILITY DISTRIBUTIONS

PARAMETER ESTIMATION FOR ARTA PROCESSES

MODELING AND GENERATING MULTIVARIATE TIME SERIES WITH ARBITRARY MARGINALS AND AUTOCORRELATION STRUCTURES. Bahar Deler Barry L.

Distribution Fitting (Censored Data)

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.

EE/CpE 345. Modeling and Simulation. Fall Class 9

Modeling and Performance Analysis with Discrete-Event Simulation

Bayesian Methods for Machine Learning

EE/CpE 345. Modeling and Simulation. Fall Class 10 November 18, 2002

Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama

Subject CS1 Actuarial Statistics 1 Core Principles

Week 1 Quantitative Analysis of Financial Markets Distributions A

EE/CpE 345. Modeling and Simulation. Fall Class 5 September 30, 2002

Slides 8: Statistical Models in Simulation

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

Frequency Analysis & Probability Plots

Introduction to Reliability Theory (part 2)

CPSC 531: Random Numbers. Jonathan Hudson Department of Computer Science University of Calgary

Stat 710: Mathematical Statistics Lecture 31

Random variables. DS GA 1002 Probability and Statistics for Data Science.

Computational statistics

Chapter Learning Objectives. Probability Distributions and Probability Density Functions. Continuous Random Variables

Lecturer: Olga Galinina

STAT 6350 Analysis of Lifetime Data. Probability Plotting

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

Probability Distributions for Continuous Variables. Probability Distributions for Continuous Variables

ACCOUNTING FOR INPUT-MODEL AND INPUT-PARAMETER UNCERTAINTIES IN SIMULATION. < May 22, 2006

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

One-Sample Numerical Data

Robustness and Distribution Assumptions

IE 303 Discrete-Event Simulation L E C T U R E 6 : R A N D O M N U M B E R G E N E R A T I O N

Modeling and Simulation of Nonstationary Non-Poisson Arrival Processes

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

A DIAGNOSTIC FUNCTION TO EXAMINE CANDIDATE DISTRIBUTIONS TO MODEL UNIVARIATE DATA JOHN RICHARDS. B.S., Kansas State University, 2008 A REPORT

STATISTICS SYLLABUS UNIT I

Estimation of Quantiles

Independent Events. Two events are independent if knowing that one occurs does not change the probability of the other occurring

Chapte The McGraw-Hill Companies, Inc. All rights reserved.

Answers and expectations

Practice Problems Section Problems

Fitting Time-Series Input Processes for Simulation

The In-and-Out-of-Sample (IOS) Likelihood Ratio Test for Model Misspecification p.1/27

The exponential distribution and the Poisson process

Applied Probability and Stochastic Processes

Clinical Trial Design Using A Stopped Negative Binomial Distribution. Michelle DeVeaux and Michael J. Kane and Daniel Zelterman

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

Probability Distributions Columns (a) through (d)

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Reliability Growth in JMP 10

Stochastic Renewal Processes in Structural Reliability Analysis:

Continuous Distributions

11/16/2017. Chapter. Copyright 2009 by The McGraw-Hill Companies, Inc. 7-2

Repairable Systems Reliability Trend Tests and Evaluation

I I FINAL, 01 Jun 8.4 to 31 May TITLE AND SUBTITLE 5 * _- N, '. ', -;

A STATISTICAL TEST FOR MONOTONIC AND NON-MONOTONIC TREND IN REPAIRABLE SYSTEMS

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH. Robert R. SOKAL and F. James ROHLF. State University of New York at Stony Brook

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable

Introduction to Probability and Statistics (Continued)

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 1: Introduction, Multivariate Location and Scatter

Probabilities & Statistics Revision

MFM Practitioner Module: Quantitative Risk Management. John Dodson. September 23, 2015

SENSITIVITY OF OUTPUT PERFORMANCE MEASURES TO INPUT DISTRIBUTIONS IN QUEUEING SIMULATION MODELING

Estimating Income Distributions Using a Mixture of Gamma. Densities

Computer Simulation of Repairable Processes

Goodness-of-fit tests for randomly censored Weibull distributions with estimated parameters

Overall Plan of Simulation and Modeling I. Chapters

Estimation and Confidence Intervals for Parameters of a Cumulative Damage Model

6 Single Sample Methods for a Location Parameter

Exploratory Data Analysis. CERENA Instituto Superior Técnico

Continuous Distributions

ORDER STATISTICS, QUANTILES, AND SAMPLE QUANTILES

5.6 The Normal Distributions

A comparison of inverse transform and composition methods of data simulation from the Lindley distribution

Simulation model input analysis

Characterizing Heuristic Performance in Large Scale Dual Resource Constrained Job Shops

S1. Proofs of the Key Properties of CIATA-Ph

Continuous random variables

Simulating Uniform- and Triangular- Based Double Power Method Distributions

Proceedings of the 2016 Winter Simulation Conference T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds.

Modeling Hydrologic Chanae

Foundations of Probability and Statistics

Stat 5101 Lecture Notes

The Robustness of the Multivariate EWMA Control Chart

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur

Transcription:

Introduction to Modeling and Generating Probabilistic Input Processes for Simulation Michael E. Kuhl RIT Natalie M. Steiger University of Maine Emily K. Lada SAS Institute Inc. Mary Ann Wagner SAIC James R. Wilson NC State University <www.ise.ncsu.edu/jwilson> December 11, 2007

OVERVIEW I. Introduction II. Univariate Input Models A. Generalized Beta Distribution Family B. Johnson Translation System of Distributions C. Bézier Distribution Family III. Time-Dependent Arrival Processes IV. Conclusions and Recommendations 2

I. Introduction Stochastic simulations require valid input models e.g., probability distributions that accurately mimic the random input processes driving the target system. 3

Problems in using many conventional probability models: 1. They cannot adequately represent real-world behavior, e.g. in the tails of the underlying distribution. 2. Parameter estimation based on sample data or subjective information (expert opinion) is often troublesome. 3. Fine-tuning the fitted model is difficult; e.g., many conventional probability distributions have the following drawbacks (a) A limited number of parameters available to control the fitted distribution, and (b) No effective mechanism for directly manipulating the fitted distribution while simultaneously updating its parameter estimates. 4

Conventional approach to identifying an input model uses sample data to select from a list of well-known alternatives based on 1. informal graphical techniques such as probability plots, Q Q plots, histograms, empirical frequency distributions, or box-plots; and 2. statistical goodness-of-fit tests such as the Kolmogorov-Smirnov, chi-squared, and Anderson-Darling tests. 5

Drawbacks of conventional input modeling 1. Visual comparison of a histogram to a fitted probability density function (p.d.f.) depends on the (arbitrary) layout of the histogram. 2. Problems with statistical goodness-of-fit tests include: (a) In small samples, low power to detect lack of fit results in an inability to reject any alternatives. (b) In large samples, practically insignificant fit discrepancies result in rejection of all alternatives. 6

Problems in estimating the parameters of the selected input model from sample data: Matching the mean and standard deviation of the fitted distribution with that of the sample often fails to capture relevant shape characteristics. Some estimation methods, such as maximum likelihood and percentile matching, may simply fail to estimate some parameters. Users lack a comprehensive basis for selecting the best-fitting model. 7

Problems with parameter estimation based on subjective information (expert opinion): Subjective estimates of moments such as the mean and standard deviation can be unreliable and depend critically on the units of measurement. Subjective estimates of extreme quantiles (e.g., lower and upper limits of the fitted distribution) are unreliable. Practitioners lack definitive procedures for identifying and estimating valid input models; thus, output analysis is often based on incorrect input processes. We focus on methods for input modeling that alleviate many of these problems. 8

II. Univariate Input Models A. Generalized Beta Distribution Family If X is a continuous random variable with lower limit a and upper limit b whose distribution is to be approximated and subsequently sampled in a simulation, then often we can model the behavior of X using a generalized beta distribution. Generalized beta p.d.f. f X (x) = Ɣ(θ 1 + θ 2 ) Ɣ(θ 1 )Ɣ(θ 2 )(b a) θ 1+θ 2 1 (x a)θ 1 1 (b x) θ 2 1 for a x b, (1) where Ɣ(z) = 0 t z 1 e t dt (for z>0) denotes the gamma function. 9

The beta p.d.f. can accommodate a wide variety of shapes, including symmetric and positively or negatively skewed unimodal p.d.f. s; J - and U-shaped p.d.f. s; left- and right-triangular p.d.f. s; and uniform p.d.f. s. Some examples illustrating the range of distributional shapes achievable with the beta p.d.f. follow. 10

Positively and Negatively Skewed Unimodal Beta Densities 11

U-shaped Beta Densities 12

J -shaped and Left-triangular Beta Densities 13

Symmetric and Uniform Beta Densities 14

Cumulative distribution function (c.d.f.) of beta variate X, F X (x) = Pr{X x} = x has no convenient analytical expression. f X (w) dw for all real x, 15

Mean and variance of X are given by µ X = E[X] = θ 1b + θ 2 a θ 1 + θ 2, σ 2 X = E[( X µ X ) 2 ] = (b a) 2 θ 1 θ 2 (θ 1 + θ 2 ) 2 (θ 1 + θ 2 + 1). (2) Provided θ 1,θ 2 > 1 so that the p.d.f. (1) is unimodal, the mode is given by m = (θ 1 1)b + (θ 2 1)a. (3) θ 1 + θ 2 2 The key distributional characteristics (2) and (3) are simple functions of a, b, θ 1, and θ 2 ; and this facilitates rapid input modeling. 16

Fitting Beta Distributions to Data or Subjective Information Given the data set {X i : i = 1,...,n} of size n, welet X (1) X (2) X (n) denote the order statistics; and we compute the sample statistics â = 2X (1) X (2), b = 2X (n) X (n 1), n n X = n 1 X i, S 2 = (n 1) 1 ( Xi X ) 2. i=1 i=1 17

Moment-matching estimates of θ 1, θ 2 are computed from θ 1 = d2 1 (1 d 1) d 2 2 d 1, θ 2 = d 1(1 d 1 ) 2 d 2 2 (1 d 1 ), where d 1 = X â b â and d 2 = S. b â 18

BetaFit (AbouRizk, Halpin, and Wilson 1994) is a Windows-based package for fitting the beta distribution to sample data by computing â, b, θ 1, and θ 2 using the following estimation methods: moment matching; feasibility-constrained moment matching (so that the feasibility conditions â<x (1) and X (n) < b are always satisfied); maximum likelihood (assuming a and b are known and thus are not estimated); and ordinary least squares (OLS) and diagonally weighted least squares (DWLS) estimation of the c.d.f. BetaFit is in the public domain and is available on the Web via <www.ise.ncsu.edu/jwilson/page3>. 19

Application of BetaFit to a Sample of n = 9,980 Observations of End-to-End Chain Lengths (in Angströms) of Nafion, an Ionic Polymer Used As a Smart Material, Based on the Method of Moment Matching 20

21

Result of Applying BetaFit to Nafion Data Set Using Maximum Likelihood Estimation 22

23

Result of Applying BetaFit to Nafion Data Set Using Ordinary Least Squares Estimation of the C.d.f. 24

25

Rapid input modeling with subjective estimates â, m, and b of the minimum, mode, and maximum, respectively, of the target distribution: θ 1 = d2 + 3d + 4 d 2 + 1 and θ 2 = 4d2 + 3d + 1, (4) d 2 + 1 where d = b m. m â The mode of the fitted beta distribution will differ from m by at most 4.4%; in practice the error is usually at most 1%. 26

VIBES (AbouRizk, Halpin, and Wilson 1991) is a Windows-based package for fitting the beta distribution to subjective estimates of: 1. the endpoints a and b; and 2. any of the following combinations of distributional characteristics the mean µ X and the variance σx 2, the mean µ X and the mode m, the mode m and the variance σx 2, the mode m and an arbitrary quantile x p = FX 1 (p) for p (0, 1), or two quantiles x p and x q for p, q (0, 1). 27

Advantages of the beta distribution as an input-modeling tool: sufficient flexibility to represent with reasonable accuracy a wide diversity of distributional shapes; and convenient estimation of parameters from sample data or subjective information. Disadvantages of the beta distribution as an input-modeling tool: difficult to explain; and difficult to sample some popular beta variate generators break down when θ 1 > 10 or θ 2 > 10. 28

Application of Beta Distributions to Pharmaceutical Manufacturing Pearlswig (1995) developed a simulation of a proposed facility for manufacturing effervescent tablets. For each operation, he obtained three time estimates ( â, m, and b ) from the process engineers. Extremely conservative estimates given for upper limits (so that b m). With triangular distributions to model processing times, bottlenecks resulted in excessively low simulation estimates of annual production. Using (4), Pearlswig fitted beta distributions to all operation times; and then the simulation results conformed to production levels of similar plants elsewhere. 29

B. Johnson Translation System of Distributions To fit a distribution to the continuous random variable X, Johnson (1949a) proposed finding a translation of X to a standard normal random variable Z with mean 0 and variance 1 so that Z N(0, 1). For a detailed discussion of the Johnson translation system, see DeBrota, D. J., R. S. Dittus, S. D. Roberts, J. R. Wilson, J. J. Swain, and S. Venkatraman. 1989a. Modeling input processes with Johnson distributions. In Proceedings of the 1989 Winter Simulation Conference, pp. 308 318. Available online via <www.ise.ncsu.edu/jwilson/files/wsc89jnsn.pdf>. 30

The proposed normalizing translations have the general form ( ) X ξ Z = γ + δ g, (5) λ where γ and δ are shape parameters, λ is a scale parameter, ξ is a location parameter, and the function g( ) defines the four distribution families in the Johnson translation system, g(y) = ln(y), for S L (lognormal) family, ( ln y + ) y 2 + 1, for S U (unbounded) family, ln[y/(1 y)], for S B (bounded) family, y, for S N (normal) family. 31

Johnson c.d.f. If (5) is an exact normalizing translation of X to a standard normal random variable, then the c.d.f. of X is given by [ ( )] x ξ F X (x) = γ + δ g for all x H, λ where: (z) = (2π) 1/2 z exp( 2 1 w2) dw is the standard normal c.d.f.; and the space of X is [ξ,+ ), for S L (lognormal) family, (, + ), for S U (unbounded) family, H = [ξ,ξ + λ], for S B (bounded) family, (, + ), for S N (normal) family. 32

Johnson p.d.f. is f X (x) = δ g λ(2π) 1/2 ( ) [ ( )] x ξ 2 x ξ exp λ 1 γ + δ g 2 λ for all x H, where 1/y, for S L (lognormal) family, g 1 / y 2 + 1, for S U (unbounded) family, (y) = 1/[y/(1 y)], for S B (bounded) family, 1, for S N (normal) family. Following are examples illustrating all the distributional shapes in the Johnson system. 33

Symmetric Bimodal and Nearly Uniform Johnson S B Densities 34

Nearly J -shaped and Symmetric Unimodal Johnson S B Densities 35

Positively Skewed and Symmetric Unimodal Johnson S B Densities 36

Symmetric and Negatively Skewed Johnson S U Densities 37

Fitting Johnson Distributions to Sample Data We select an estimation method and the desired translation function g( ) and then obtain estimates of γ, δ, λ, and ξ. The Johnson system has the flexibility to match (a) any feasible combination of values for the mean µ X, variance σ 2 X, skewness and kurtosis or α X = E [( X µ X ) 3 / σ 3 X ] ( often denoted by β1 ), β X = E [( X µ X ) 4 / σ 4 X ] (often denoted by β2 ); (b) sample estimates of the moments µ X, σ 2 X, α X, and β X. 38

FITTR1 (Swain, Venkatraman, and Wilson 1988) is a software package for fitting Johnson distributions to sample data using the following estimation methods: OLS and DWLS estimation of the c.d.f.; minimum L 1 and L norm estimation of the c.d.f.; moment matching; and percentile matching. 39

VISIFIT (DeBrota et al. 1989b) is a Windows-based software package for fitting Johnson S B distributions to subjective information, possibly combined with sample data. The user must provide estimates of a, b, and any two of the following characteristics: the mode m; the mean µ X ; the median x 0.5 ; arbitrary quantile(s) x p or x q for p, q (0, 1); the width of the central 95% of the distribution; or the standard deviation σ X. Venkatraman, Swain and Wilson (1988), DeBrota et al. (1989b), FITTR1, and VISIFIT are available on the Web via <www.ise.ncsu.edu/jwilson/more_info>. 40

Generating Johnson Variates by Inversion [1] Generate Z N(0, 1). [2] Apply to Z the inverse translation X = ξ + λ g 1 ( Z γ δ ), (6) where for all real z we define the inverse translation function g 1 (z) = e z, for S L (lognormal) family, ( e z e z)/ 2, for S U (unbounded) family, 1 /( 1 + e z), for S B (bounded) family, z, for S N (normal) family. (7) 41

Application of Johnson Distributions to Smart Materials Research Matthews et al. (2006) and Weiland et al. (2005) use a multiscale modeling approach to predict material stiffness of a certain class of smart materials called ionic polymers. Material stiffness depends on effective length of the polymer chains comprising the material. In a case study of the ionic polymer Nafion, Matthews et al. (2006) develop a simulation of polymer-chain conformation on a nanoscopic level so as to generate a large number of end-to-end chain lengths. The chain-length p.d.f. is estimated and used as input to a macroscopiclevel mathematical model to predict material stiffness. 42

Johnson S U C.d.f. Fitted to n = 9,980 Nafion Chain Lengths Using DWLS Estimation 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 10 0 10 20 30 40 50 60 70 80 90 43

Johnson S U P.d.f Fitted to n = 9,980 Nafion Chain Lengths Using DWLS Estimation 0.1 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0 10 0 10 20 30 40 50 60 70 80 90 44

Matthews et al. (2006) and Weiland et al. (2005) obtain more accurate and intuitively appealing fits to Nafion chain-length data with Johnson p.d.f. s than with other distributions. Material stiffness is computed from the second derivative f X (x) of the fitted p.d.f. There is a relatively simple relationship between the Johnson parameters and material stiffness. 45

Application of Johnson Distributions to Healthcare To model arrival patterns of patients who have scheduled appointments at community healthcare clinics in San Diego, Alexopoulos et al. (2006) estimate the distribution of patient tardiness that is, deviation from the scheduled appointment time. Alexopoulos et al. (2006) perform an exhaustive analysis of 18 continuous distributions, concluding that the S U distribution provided superior fits to the available data. 46

C. Bézier Distribution Family Definition of Bézier Curves A Bézier curve is often used to approximate a smooth function on a bounded interval by forcing the Bézier curve to pass in the vicinity of selected control points { pi (x i,z i ) T : i = 0, 1,...,n } in two-dimensional Euclidean space. 47

A Bézier curve of degree n with control points {p 0, p 1,...,p n } is given parametrically by P(t) = [ P x (t; n, x), P z (t; n, z) ] T = n B n,i (t) p i for t [0, 1], (8) i=0 where x (x 0,x 1,...,x n ) T and z (z 0,z 1,...,z n ) T, and where the blending function, B n,i (t) n! i! (n i)! ti (1 t) n i for t [0, 1], (9) is the ith Bernstein polynomial for i = 0, 1,...,n. 48

Bézier Distribution and Density Functions IfX is a continuous random variable on [a, b] with c.d.f. F X ( ) and p.d.f. f X ( ), then we can approximate F X ( ) arbitrarily closely using a Bézier curve of the form (8) by taking a sufficient number (n + 1) of control points with appropriate coordinates p i = (x i,z i ) T for the ith control point, where i = 0,...,n. 49

IfX is Bézier, then the c.d.f. of X is given parametrically by P(t) = [ P x (t; n, x), P z (t; n, z) ] T = { x(t),f X [x(t)] } T for t [0, 1], (10) where n x(t) = P x (t; n, x) = B n,i (t)x i, i=0 F X [x(t)]=p z (t; n, z) = n B n,i (t)z i for t [0, 1]. (11) i=0 For a detailed discussion of Bézier distributions, see Wagner, M. A. F., and J. R. Wilson. 1996a. Using univariate Bézier distributions to model simulation input processes. IIE Transactions 28 (9): 699 711. Available online via <www.ise.ncsu.edu/jwilson/files/wagner96iie.pdf> 50

IfXis Bézier with c.d.f. F X ( ) given by (10), then the p.d.f. f X (x) is P (t) = [ Px (t; n, x), P z (t; n, x, z)] T = { x(t),f X [x(t)] } T for t [0, 1], where x(t) = Px (t; n, x) = P x(t; n, x) as in (11) and f X [x(t)]=p z (t; n, x, z) = P z(t; n 1, z) P x (t; n 1, x) = n 1 i=0 B n 1,i(t) z i, n 1 i=0 B n 1,i(t) x i where x ( x 0,..., x n 1 ) T and z ( z 0,..., z n 1 ) T, with x i x i+1 x i and z i z i+1 z i for i = 0, 1,...,n 1. 51

Generating Bézier Variates by Inversion [1] Generate a random number U Uniform[0, 1]. [2] Find t U [0, 1] such that F X [x(t U )] = n B n,i (t U )z i = U. (12) i=0 [3] Deliver the variate X = x(t U ) = n B n,i (t U )x i. i=0 Codes to implement this approach are available on the Web via <www.ise.ncsu.edu/jwilson/page3>. 52

Using PRIME to Model Bézier Distributions PRIME (Wagner and Wilson 1996a) is a Windows-based system for fitting Bézier distributions to data or subjective information. PRIME is available on the previously mentioned Web site. Control points appear as indexed black squares that can be manipulated with the mouse and keyboard. Each control point exerts on the c.d.f. a magnetic attraction whose strength is given by the associated Bernstein polynomial (9). Moving a control point causes the displayed c.d.f. to be updated (nearly) instantaneously. 53

PRIME Windows Showing the Bézier C.d.f. (Left Panel) with Its Control Points and the P.d.f. (Right Panel) 54

PRIME includes the following methods for fitting Bézier distributions to sample data: OLS estimation of the c.d.f.; minimum L 1 and L norm estimation of the c.d.f.; maximum likelihood estimation (assuming a and b are known); moment matching; and percentile matching. For automatic estimation of the number of control points, see Wagner, M. A. F., and J. R. Wilson. 1996b. Recent developments in input modeling with Bézier distributions. In Proceedings of the 1996 Winter Simulation Conference, pp. 1448 1456. Available online as <www.ise.ncsu.edu/jwilson/files/bezwsc96.pdf>. 55

Bézier Distribution Fitted to n = 9,980 Nafion Chain Lengths Using OLS Estimation of the C.d.f. 56

Advantages of the Bézier distribution family: It is extremely flexible and can represent a wide diversity of distributional shapes, including multiple modes and mixed distributions. If data are available, then the likelihood ratio test of Wagner and Wilson (1996b) can be used with any of the available estimation methods to find automatically both the number and location of the control points. In the absence of data, PRIME can be used to determine the conceptualized distribution based on known quantitative or qualitative information. As the number (n + 1) of control points increases, so does the flexibility in fitting Bézier distributions. 57

III. Time-Dependent Arrival Processes Many simulation applications require high-fidelity input models of arrival processes with arrival rates that depend strongly on time. Nonhomogeneous Poisson processes (NHPPs) have been used successfully to model complex time-dependent arrival processes in many applications. An NHPP {N(t) : t 0} is a counting process such that N(t) is the number of arrivals in the time interval (0,t]; λ(t) is the instantaneous arrival rate at time t, and λ(t) satisfies the Poisson postulates; and the (cumulative) mean-value function is given by µ(t) E[N(t)]= t 0 λ(z) dz for all t 0. (13) 58

We discuss the nonparametric approach of Leemis (1991, 2000, 2004) for modeling and simulation of NHPPs; see Leemis, L. M. 1991. Nonparametric estimation of the cumulative intensity function for a nonhomogeneous Poisson process. Management Science 37 (7): 886 900. Leemis, L. M. 2000. Nonparametric estimation of the cumulative intensity function for a nonhomogeneous Poisson process from overlapping realizations. Management Science 46 (7): 989 998. Leemis, L. M. 2004. Nonparametric estimation and variate generation for a nonhomogeneous Poisson process from event count data. IIE Transactions 36:1155 1160. 59

The context is a recent application to modeling and simulating unscheduled patient arrivals to a community healthcare clinic (Alexopoulos et al. 2005) Suppose we have a time interval (0,S] over which we observe several independent replications (realizations) of a stream of unscheduled patient arrivals constituting an NHPP with arrival rate λ(t) for t (0, S]. For example, (0,S] might represent the time period on each weekday during which unscheduled patients may walk into a clinic say, between 9 A.M. and 5 P.M. so that S = 480 minutes. 60

Suppose k realizations of the arrival stream over (0,S] have been recorded so that we have n i patient arrivals in the ith realization for i = 1, 2,...,k; and k n = n i patient arrivals accumulated over all realizations. i=1 61

Let { t (i) : i = 1,...,n } denote the overall set of arrival times for all unscheduled patients expressed as an offset from the beginning of (0,S] and then sorted in increasing order. For example, if we observed n = 250 patient arrivals over k = 5 days, each with an observation interval of length S = 480 minutes, then t (1) = 2.5 minutes means that over all 5 days, the earliest arrival occurred 2.5 minutes after the clinic opened on one of those days; and t (2) = 4.73 minutes means that the second-earliest arrival occurred 4.73 minutes after the clinic opened on one of those days. t (n) = 478.5 minutes means that the latest arrival occurred 478.5 minutes after the clinic opened on one of those days. 62

We estimate the mean-value function µ(t) as follows. We take t (0) 0 and t (n+1) S. For t (i) <t t (i+1) and i = 0, 1,...,n, we take { in µ(t) = (n + 1)k + n [ ] t t (i) (n + 1)k [ ] t (i+1) t (i) }. (14) Equation (14) provides a basis for modeling and simulating unscheduled patient-arrival streams when the arrival rate exhibits a strong dependence on time. 63

µ(t) n k n 2 (n+1)k 3n (n+1)k 2n (n+1)k n (n+1)k (0,0) t (1) t (2) t (3) t (n) t (n+1) S...... Nonparametric Estimator of Mean Value Function 64

Goodness-of-fit Testing for the Fitted Mean-Value Function In addition to the realizations of the target arrival process that were used to compute the estimated mean-value function µ(t), suppose we observe one additional realization { A i : i = 1, 2,...,n } independently of the previously observed realizations, with the ith patient arriving at time A i for i = 1,...,n. If the target arrival stream is an NHPP with mean-value function µ(t) for t (0,S], then the transformed arrival times { B i = µ ( A i) : i = 1, 2,...,n } constitute a homogeneous Poisson process with an arrival rate of 1. 65

If the target arrival stream is an NHPP with mean-value function µ(t) for t (0,S], then the corresponding transformed interarrival times { X i = B i B i 1 : i = 1, 2,...,n } (with B 0 0) constitute a random sample from an exponential distribution with a mean of 1. To test the adequacy of the fitted mean-value function µ(t) as an approximation to µ(t), apply the Kolmogorov-Smirnov test to the data set (with A 0 { X i = µ ( A ) ( ) i µ A i 1 : i = 1, 2,...n } 0), where the hypothesized c.d.f. in the goodness-of-fit test is F X i (x) = 1 e x for all x 0. 66

Generating Realizations of the Fitted NHPP [1] Set i 1 and N 0. [2] Generate U i Uniform(0, 1). [3] Set B i ln(1 U i ). [4] While B i <n/kdo Begin Set m (n + 1)kBi n Set A i t (m) + { } { (n + 1)kB i t (m+1) t (m) n Set N N + 1; Set i i + 1; Generate U i Uniform(0, 1); Set B i B i 1 ln(1 U i ). End ; } m ; NHPP Simulation Procedure of Leemis (1991) 67

Advantages of Leemis s Nonparametric Approach to Modeling and Simulation of NHPPs It does not require the assumption of any particular form for arrival rate λ(t) as a function of t. It provides a strongly consistent estimator of mean-value function that is, lim k µ(t) = µ(t) for all t (0,S] with probability 1. The simulation algorithm given above, which is based on inversion of µ(t) so that A i = µ 1 (B i ) for i = 1,...,N, is also asymptotically valid as k. 68

Application to Organ Transplantation Policy Analysis The United Network for Organ Sharing (UNOS) applied a simplified variant of this approach in the development and use of the UNOS Liver Allocation Model (ULAM) for analyzing the cadaveric liver-allocation system in the U.S. (see Harper et al. 2000). ULAM incorporated models of (a) the streams of liver-transplant patients arriving at 115 transplant centers, and (b) the streams of donated organs arriving at 61 organ procurement organizations in the United States. Virtually all these arrival streams exhibited long-term trends as well as strong dependencies on the time of day, the day of the week, and the geographic location of the arrival stream. 69

Handling Arrival Processes Having Trends and Cyclic Effects Kuhl, Sumant, and Wilson (2006) develop a semiparametric method for modeling and simulating arrival processes that may exhibit a long-term trend or nested periodic phenomena (such as daily and weekly cycles), where the latter effects might not necessarily possess the symmetry of sinusoidal oscillations. See Kuhl, M. E., S. G. Sumant, and J. R. Wilson. 2006. An automated multiresolution procedure for modeling complex arrival processes. INFORMS Journal on Computing 18 (1): 3 18. Available online via <www.ise.ncsu.edu/jwilson/files/kuhl06joc.pdf> 70

Fitted Rate Function over 100 Replications of a Test Process with One Cyclic Rate Component and Long-term Trend 700 600 500 Arrival Rate 400 300 200 100 0 0 1 2 3 4 5 6 7 8 9 10 Time t 71

Fitted Mean-Value Function over 100 Replications of a Test Process with One Cyclic Rate Component and Long-term Trend 1200 Cumulative Mean Arrivals 600 0 0 1 2 3 4 5 6 7 8 9 10 Time t 72

Web-based Input Modeling Software <www.rit.edu/ kuhl1/simulation> 73

IV. Conclusions and Recommendations The common thread running through this tutorial is the focus on robust input models that are computationally tractable and sufficiently flexible to represent adequately many of the probabilistic phenomena that arise in many applications of discrete-event stochastic simulation. Notably absent is a discussion of Bayesian input-modeling techniques a topic that will receive increasing attention in the future. Additional material on input modeling is available via <www.ise.ncsu.edu/jwilson/more_info>. 74