ESTIMATION AND OUTPUT ANALYSIS (L&K Chapters 9, 10)


Outline:
- Set up standard example and notation.
- Review performance measures (means, probabilities and quantiles).
- A framework for conducting simulation experiments and reporting results.
- Review point estimators.
- Measuring and controlling error for absolute and relative performance.
- Measuring and controlling error in steady-state simulation.

STANDARD EXAMPLE

Public Sub TTF(Lambda As Double, Mu As Double, Sum As Double)
' Sub to generate one replication of the TTF for the airline reservation system.
' Variables:
'   State  = number of operational computers
'   Lambda = computer failure rate
'   Mu     = computer repair rate
'   Sum    = generated TTF value, returned from the Call
    Dim State As Integer
    Dim Fail As Double
    Dim Repair As Double
    State = 2
    Sum = 0
    While State > 0
        If State = 2 Then
            Fail = Exponential(1 / Lambda)
            Sum = Sum + Fail
            State = 1
        Else
            Fail = Exponential(1 / Lambda)
            Repair = Exponential(1 / Mu)
            If Repair < Fail Then
                Sum = Sum + Repair
                State = 2
            Else
                Sum = Sum + Fail
                State = 0
            End If
        End If
    Wend
End Sub
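The same logic can be sketched in Python. This is an illustrative port of the VB sub above, not part of the slides: the slides' Exponential(mean) helper is replaced by the standard library's random.expovariate, which takes a rate rather than a mean.

```python
import random

def ttf(lam, mu, rng=random):
    """One replication of time to failure for the two-computer system.
    lam = computer failure rate, mu = computer repair rate."""
    state = 2        # number of operational computers
    total = 0.0      # accumulated time to failure
    while state > 0:
        fail = rng.expovariate(lam)            # time to next failure
        if state == 2:
            total += fail
            state = 1
        else:
            repair = rng.expovariate(mu)       # time to finish the repair
            if repair < fail:
                total += repair
                state = 2
            else:
                total += fail
                state = 0
    return total
```

Calling ttf repeatedly with independent streams produces the i.i.d. outputs Y_1, ..., Y_n used throughout these slides.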

NOTATION AND PERFORMANCE MEASURES
Let Y_1, Y_2, ..., Y_n denote outputs across n replications. Thus, Y_1, Y_2, ..., Y_n are independent and identically distributed (i.i.d.) with common, unknown cumulative distribution function (cdf) F. We want to estimate:
mean: µ = E[Y] = ∫ y dF(y)
probability: p = Pr{a < Y ≤ b} = ∫_a^b dF(y)
quantile: θ, where for given 0 < γ < 1, Pr{Y ≤ θ} = ∫_{−∞}^θ dF(y) = γ

Example: In the TTF example, µ could represent the expected time to failure; p the probability that failure occurs in less than 700 hours; and θ the number of hours such that failure occurs on or before then 80% of the time.
Comments:
- Means and probabilities are the same problem, since a probability is the mean of an indicator random variable Z = I(a < Y ≤ b):
  E[Z] = 0·Pr{Y ≤ a or Y > b} + 1·Pr{a < Y ≤ b} = p
- Higher moments (variances, skewness, kurtosis, etc.) can also be viewed as means.
- Quantiles cannot be (conveniently) expressed as means.

A FRAMEWORK FOR CONDUCTING EXPERIMENTS AND REPORTING RESULTS
We should distinguish between absolute and relative performance measures.
- The expected TTF of a system is an absolute performance measure.
- The difference in expected TTF between two proposed systems is a relative performance measure.
When we report simulation-based performance measures, the minimum we report is either a point estimate (our best guess) and a measure of error, or a point estimate with only significant digits. Without a measure of error, a point estimate may be meaningless.

REPORTING POINT ESTIMATES
Suppose we estimate the expected TTF. What should we report?
- A point estimate and a measure of error, ± or [ , ]: e.g., report 998 ± 61 or [937, 1059].
- A point estimate with only significant digits, together with the standard error. Rule of Thumb: use the first nonzero digit of the standard error to indicate which digit to round.

CONTROLLING ERROR
When we control point-estimator error, we can control absolute or relative error.
- Absolute error: estimate expected TTF to within ±20 hours.
- Relative error: estimate expected TTF to within ±5% of its true value.
We should reduce error to the level required for the decision we have to make. The primary ways we control error are through the experiment design (such as the number of replications) and variance-reduction techniques.

CONDUCTING EXPERIMENTS
Experiments are conducted in stages.
1. Exploratory experiments provide the information needed to form the designed experiments (sometimes called production runs). They do not control error, but they do measure it. L&K refer to these as fixed-sample-size experiments.
2. Designed experiments provide the performance estimates and control error.
3. We redesign and repeat experiments if goals are not achieved. L&K refer to these as sequential experiments.
Save raw output data when possible, and use both graphical and numerical exploration.

POINT ESTIMATION FOR MEANS
Estimate a mean µ (or probability p) by
X̄ = (1/n) Σ_{t=1}^{n} X_t
where X_t may be Y_t or Z_t.
Properties:
- E[X̄] = E[X] (either µ or p; unbiased)
- se = √Var[X̄] = σ/√n
  - if a mean, then σ = √Var[Y_t]
  - if a probability, then σ = √(p(1 − p))

ESTIMATING QUANTILES
Example: if n = 100 and γ = 0.9, the estimator works with the order statistics
Y_(1) ≤ Y_(2) ≤ ... ≤ Y_(89) ≤ Y_(90) ≤ ... ≤ Y_(100)
1. Initialization: n and 0 < γ < 1
2. Data: y[j], the jth output
3. Sort y[j] from smallest to largest so that y[1] ≤ y[2] ≤ ... ≤ y[n]
4. i ← ⌊(n + 1)γ⌋
   if i < 1, output y[1]
   if i ≥ n, output y[n]
   otherwise, f ← (n + 1)γ − i; output (1 − f)·y[i] + f·y[i+1]
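The interpolated estimator of step 4 is a few lines of Python; in this sketch the slide's 1-based y[i] becomes the 0-based y[i−1]:

```python
import math

def quantile(data, gamma):
    """Interpolated gamma-quantile estimator from the slide's algorithm."""
    y = sorted(data)
    n = len(y)
    pos = (n + 1) * gamma
    i = math.floor(pos)           # i <- floor((n+1)*gamma), 1-based
    if i < 1:
        return y[0]
    if i >= n:
        return y[-1]
    f = pos - i                   # interpolation fraction
    return (1 - f) * y[i - 1] + f * y[i]
```

For n = 100 and γ = 0.9 this interpolates between the 90th and 91st order statistics, as in the example above.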

Properties:
- E[θ̂] ≠ θ (the estimator is biased)
- θ̂ is consistent (converges to θ)
- For i.i.d. data and large n,
  se ≈ √(γ(1 − γ)/n) · dF⁻¹(u)/du |_{u=γ}
Example: How hard is it to estimate a quantile? Suppose F(y) = 1 − e^{−λy} (the exponential distribution). (Table: γ vs. the n needed for an equivalent se.)

MEASURES OF ERROR FOR ABSOLUTE PERFORMANCE
A point estimate without a measure of error (MOE) may be meaningless. The goal of simulation output analysis is to provide an appropriate MOE. We will focus on two MOEs:
- standard error: roughly the average deviation of an estimator from what it estimates; use for rounding
- confidence interval: a procedure that captures the unknown quantity with prespecified probability
With i.i.d. data (i.e., replications) this is classical statistics, but we take advantage of opportunities available in simulation.

BATCHING TRANSFORMATION
When we have a large number of replications, we can use a transformation that partitions n outputs into m batches of size b (or m macroreps of b microreps):
Y_1, ..., Y_b | Y_{b+1}, ..., Y_{2b} | ... | Y_{(m−1)b+1}, ..., Y_{mb}
We can compute (within each batch)
batch means: Ȳ_h = (1/b) Σ_{j=1}^{b} Y_{(h−1)b+j}
batch probabilities: p̂_h = (1/b) Σ_{j=1}^{b} I(A < Y_{(h−1)b+j} ≤ B)
batch quantiles (sort within each batch): θ̂_h = Y_{((h−1)b + ⌊(b+1)γ⌋)}
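A minimal sketch of the batching transformation for means and probabilities (the function names are mine, not the slides'):

```python
def batch_means(y, m):
    """Split n = m*b outputs into m nonoverlapping batches; return batch means."""
    b = len(y) // m
    return [sum(y[h * b:(h + 1) * b]) / b for h in range(m)]

def batch_probs(y, m, lo, hi):
    """Within-batch estimates of p = Pr{lo < Y <= hi}."""
    b = len(y) // m
    return [sum(1 for v in y[h * b:(h + 1) * b] if lo < v <= hi) / b
            for h in range(m)]
```

Batch quantiles would be computed the same way, sorting within each slice before picking the ⌊(b+1)γ⌋th order statistic.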

BATCHING TRANSFORMATION, CONT.
We can also batch continuous-time output processes {Y(t); 0 ≤ t ≤ T}. The ith batch mean of size b = T/m time units is
Ȳ_i = (1/b) ∫_{(i−1)b}^{ib} Y(t) dt
After the first batching, it is easy to double the batch size by just averaging pairs of batch means; for other batch sizes we have to recompute the integral.

Why batch?
- Batch means and probabilities tend to be more nearly normally distributed than the raw outputs (due to the central limit theorem). If batching gives us approximately i.i.d. normal data, then we can use classical statistics.
- Batch statistics aggregate the data, but less so than summary statistics.
- Little is lost provided m is not too small. For instance, consider a confidence interval based on the t distribution: t_{0.975,∞} = 1.96 while t_{0.975,30} = 2.04. Maintain from 10 to 40 batch means.
- Batch sizes can be used to adjust variability: if S²_i(n) > S²_l(n), then set b_i ≈ S²_i(n)/S²_l(n) and b_l = 1.

MOE FOR i.i.d. DATA
Goal: Estimate E[X] (note that X may be a batch quantile)
Data: X_1, X_2, ..., X_m i.i.d. (reps or batch statistics)
Point Estimator: X̄
Estimated Standard Error: ŝe = S/√m, where
S² = Σ_{i=1}^{m} (X_i − X̄)²/(m − 1) = [Σ_{i=1}^{m} X_i² − (Σ_{i=1}^{m} X_i)²/m]/(m − 1)
Confidence interval (normal data): X̄ ± t_{1−α/2,m−1}·ŝe
where t_{1−α/2,m−1} is the 1 − α/2 quantile of the t distribution with m − 1 degrees of freedom.
Note: S²/m = p̂(1 − p̂)/(m − 1) when X is an indicator random variable.
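A sketch of the point estimate, standard error, and CI half-width. Since the t quantile depends on α and m, this version takes it as an input (from a table or a stats library) rather than computing it:

```python
import math
import statistics

def mean_ci(x, t_crit):
    """Return (Xbar, se_hat, half_width) for the CI Xbar +/- t*se_hat.
    t_crit is t_{1-alpha/2, m-1}, supplied by the caller."""
    m = len(x)
    xbar = statistics.mean(x)
    se = statistics.stdev(x) / math.sqrt(m)   # S/sqrt(m), S uses divisor m-1
    return xbar, se, t_crit * se
```

The inputs x may be replication outputs or batch statistics, exactly as on the slide.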

algorithm batch means
1. Initialization: length of the replication, n; number of batches, m; batch size b ← ⌊n/m⌋; sum ← 0; and sumsq ← 0
2. Data: y[t], the tth observation from a single replication
3. Batch the outputs:
   (a) for j ← 1 to m:
          y[j] ← y[(j − 1)·b + 1]
          for t ← 2 to b: y[j] ← y[j] + y[(j − 1)·b + t], loop
       loop
   (b) Compute the standard error:
       for j ← 1 to m:
          y[j] ← y[j]/b
          sum ← sum + y[j]
          sumsq ← sumsq + y[j]·y[j]
       loop
   (note: batch means are now stored in y[1], ..., y[m])
4. Output: sqrt{(sumsq − sum·sum/m)/(m(m − 1))}
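The single-pass algorithm above translates directly. This sketch returns the estimated standard error of the grand mean computed from the m batch means:

```python
import math

def bm_standard_error(y, m):
    """Standard error of the grand mean from m nonoverlapping batch means,
    mirroring the slide's running sums sum and sumsq."""
    b = len(y) // m
    means = [sum(y[j * b:(j + 1) * b]) / b for j in range(m)]
    s = sum(means)                       # sum of batch means
    sq = sum(v * v for v in means)       # sum of squared batch means
    return math.sqrt((sq - s * s / m) / (m * (m - 1)))
```

This is S/√m applied to the batch means, so it can be plugged straight into the t-based confidence interval of the previous slide.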

SPECIAL CI FOR QUANTILES
If the number of replications is too small for batching, there is a nonparametric c.i. for quantiles.
Goal: Estimate θ such that Pr{Y ≤ θ} = γ
Data: Y_1, Y_2, ..., Y_n i.i.d. (reps)
Point Estimator: Y_(⌊(n+1)γ⌋) (or interpolated version)
By definition, Pr{Y_i ≤ θ} = γ. Therefore we can find integers l and u such that
Pr{Y_(l) ≤ θ ≤ Y_(u)} = Pr{l ≤ #(Y_i ≤ θ) < u} = Σ_{j=l}^{u−1} (n choose j) γ^j (1 − γ)^{n−j} ≥ 1 − α
The confidence interval is [Y_(l), Y_(u)].

For large n we can use the normal approximation to the binomial distribution to get
l = ⌊nγ − z_{1−α/2} √(nγ(1 − γ)) + 1/2⌋
u = ⌊nγ + z_{1−α/2} √(nγ(1 − γ)) + 1/2⌋ + 1
where z_{1−α/2} is the 1 − α/2 quantile of the standard normal distribution. Even with small n, the approximations for l and u provide a good place to start looking.
Example: γ = 0.9, n = 200, α = 0.05 and z = 1.96 implies l = 172 and u = 189, meaning the confidence interval is [Y_(172), Y_(189)].
Question: Why is no batching allowed?
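The index formulas are easy to check numerically; this sketch reproduces the slide's example:

```python
import math

def quantile_ci_indices(n, gamma, z=1.96):
    """Order-statistic indices (l, u) so that [Y_(l), Y_(u)] covers the
    gamma-quantile with approximate confidence; z is z_{1-alpha/2}."""
    half = z * math.sqrt(n * gamma * (1 - gamma))
    l = math.floor(n * gamma - half + 0.5)
    u = math.floor(n * gamma + half + 0.5) + 1
    return l, u
```

quantile_ci_indices(200, 0.9) returns (172, 189), matching the example above.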

EXAMPLE
Suppose we set λ = 1 and µ = 1000 in the TTF simulation and make n = 1000 replications. Our goal is to estimate µ = E[Y], p = Pr{Y < 500}, and the 0.9 quantile of Y, i.e., θ with Pr{Y ≤ θ} = 0.9.
For µ and p we will use both the raw data and m = 100 batch means of size b = 10. For the quantile estimator, we use the nonparametric methods:
i = ⌊(n + 1)γ⌋ = ⌊(1001)(0.9)⌋ = 900
f = (1001)(0.9) − 900 = 0.9
θ̂ = 0.1·Y_(900) + 0.9·Y_(901)
l = ⌊(1000)(0.9) − 1.96√((1000)(0.9)(0.1)) + 1/2⌋ = 881
u = ⌊(1000)(0.9) + 1.96√((1000)(0.9)(0.1)) + 1/2⌋ + 1 = 920

CONTROLLING ABSOLUTE ERROR
Our standard error estimator ŝe = S/√n estimates the true standard error σ/√n. If S is not too far from σ, then we can predict what our absolute error will be if we increase n. Suppose we initially make n_0 replications or batches and compute S. To obtain an absolute error less than a we need
n = ⌈(t_{1−α/2,n_0−1} S / a)²⌉
Notice that we need to take 4n to cut the se in half. This approach can be shown to be valid as a → 0.

CONTROLLING RELATIVE ERROR
Suppose we initially make n_0 replications or batches and compute X̄ and S. To obtain a relative error less than r × 100%, set
n = ⌈(t_{1−α/2,n_0−1} S / (r X̄))²⌉
Comments:
- It is important to recheck the absolute or relative error after taking the additional samples.
- To obtain a stable estimate of σ, we would like n_0 ≥ 10 reps or batches initially.
- Using n_0 − 1 degrees of freedom for t is conservative.
- When estimating a probability, planning can be done in advance, since σ = √(p(1 − p)), which is largest at p = 1/2.
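Both sample-size rules are one-liners. In this sketch the pilot statistics (S, X̄) and the t critical value are inputs:

```python
import math

def reps_for_absolute_error(s, a, t_crit):
    """n = ceil((t * S / a)^2): reps/batches for CI half-width at most a."""
    return math.ceil((t_crit * s / a) ** 2)

def reps_for_relative_error(s, xbar, r, t_crit):
    """n = ceil((t * S / (r * Xbar))^2): reps for relative error at most r."""
    return math.ceil((t_crit * s / (r * abs(xbar))) ** 2)
```

As the slide notes, after taking the additional samples the achieved error should be rechecked with the updated S (and X̄).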

ERROR CONTROL EXERCISE
Based on 1000 replications of TTF, a 95% confidence interval for the expected TTF is ... ± 60.96, based on a standard error of ... and t_{0.975,999} = ...
- How many replications are needed to estimate expected TTF to within ±20 hours?
- How many replications are needed to estimate expected TTF to within ±1% of its true value?

STEADY-STATE SIMULATION
When we attempt to estimate long-run, or steady-state, performance, we need to simulate each replication (perhaps only 1) long enough that the output data are statistically stationary. All of the point estimators we have covered are valid for stationary but dependent data. However, the MOEs are not valid unless they are applied to independent replications, or to batches that are sufficiently large that the point estimators from different batches are (nearly) independent. Typically, the standard MOEs underestimate the true level of error when applied to positively dependent data.

REASONS TO USE STEADY-STATE SIMULATION
- The appropriate planning horizon is unknown, but long.
- The appropriate initial conditions are unknown.
- The effects of initial conditions are substantial, but short lived.
- System conditions vary over time, but we want to design the system to tolerate sustained exposure to worst-case conditions.

CAUTIONS
- If the simulated time required to overcome the initial conditions is much longer than the planning horizon, then it makes no sense to do a steady-state simulation. Instead, work on determining initial conditions.
- Pretending that peak load is the norm may be overly conservative, particularly if the peak is short lived. In reality we may be able to tolerate an over-capacity situation for a short time.
- The approach to steady state is typically evaluated via stability of the mean, but this is only a necessary condition.

EXAMPLE: GI/G/1 QUEUE
Let Y_t represent the delay in queue of the tth customer, S_t the customer's service time, and G_t the interarrival-time gap. Let Y_0 = 0, S_0 = 0. Then
Y_t = max{0, Y_{t−1} + S_{t−1} − G_t}
Under certain conditions Y_t ⇒ Y as t → ∞, where ⇒ denotes weak convergence (convergence in distribution).
Problem: Estimate performance measures associated with the limiting random variable Y. This is a more difficult problem because we cannot observe Y directly.
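The Lindley recursion above is easy to simulate. This sketch uses exponential gaps and service times (an M/M/1 special case, chosen only for illustration; the parameter names are mine):

```python
import random

def gg1_delays(n, arrival_rate, service_rate, rng):
    """Delays Y_1..Y_n via Y_t = max(0, Y_{t-1} + S_{t-1} - G_t),
    with Y_0 = 0 and S_0 = 0."""
    y = 0.0          # Y_{t-1}
    s = 0.0          # S_{t-1}, starts at S_0 = 0
    out = []
    for _ in range(n):
        g = rng.expovariate(arrival_rate)     # interarrival gap G_t
        y = max(0.0, y + s - g)               # Lindley recursion
        out.append(y)
        s = rng.expovariate(service_rate)     # service time S_t
    return out
```

A long run of gg1_delays supplies the dependent, initially nonstationary output series analyzed in the rest of these slides.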

STATIONARITY (STEADY STATE)
If Y_t ∼ F_t and Y ∼ F, then weak convergence (⇒) means that F_t(a) → F(a) for every a at which F is continuous. To obtain a MOE, we will also want Y_t to be covariance stationary:
Cov[Y_t, Y_{t+h}] = γ(h), independent of t.
Steady state does not mean constant. The long run must not depend on the initial conditions, must have no regular cycles, and cannot become trapped. Y_1, Y_2, ... i.i.d. satisfies these conditions.

INITIAL-CONDITION BIAS
Even if Y_t ⇒ Y as t → ∞, we can only observe Y_1, ..., Y_m. For a finite sample, the initial conditions do matter; e.g., for the GI/G/1 queue, E[Y_1] = 0 but E[Y_2] > 0.
Ideas:
- make m large
- delete Y_1, ..., Y_d for d < m
- sample Y_1 from F
- pick Y_1 close to E[Y] or the median or mode (in general, pick a system state near the mean or modal state)

REPLICATION-DELETION APPROACH
Goal: Estimate the steady-state mean (or other quantity).
Method: Make n replications of length m, deleting d observations from each, and use replication means as the basic observations.
Y_11 ... Y_1d | Y_1,d+1 ... Y_1m → Ȳ_1
Y_21 ... Y_2d | Y_2,d+1 ... Y_2m → Ȳ_2
...
Y_n1 ... Y_nd | Y_n,d+1 ... Y_nm → Ȳ_n
The key then becomes determining the appropriate value of d. (In practice, d is often a deletion time, rather than a count.)
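A sketch of the deletion step, with d as a count of observations:

```python
def rep_deletion_means(reps, d):
    """Replication means after deleting the first d observations of each
    replication; reps is a list of per-replication output lists."""
    return [sum(r[d:]) / (len(r) - d) for r in reps]
```

The returned means Ȳ_1, ..., Ȳ_n are (approximately) i.i.d., so the i.i.d. MOE machinery from the earlier slides applies to them directly.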

MEAN PLOT
Goal: Find the point where the mean is nearly stationary.
Idea: Plot E[Y_t] vs. t and look for convergence.
Method: Make k replications of length m and average across replications:
Y_11 Y_12 ... Y_1m
Y_21 Y_22 ... Y_2m
...
Y_k1 Y_k2 ... Y_km
Ȳ(1) Ȳ(2) ... Ȳ(m)
Plot Ȳ(t) vs. t. Smoothing can help.
Note: probabilities and quantiles may converge more slowly than means, and the covariance function is even slower.

algorithm mean plot
1. Initialization: number of reps, k; length of each rep, m; array a[t] of length m; array s[t] ← 0 for t ← 1, 2, ..., m; window width, w
2. Data: y[t, j], the tth observation from replication j
3. Calculate averages:
   (a) for t ← 1 to m: for j ← 1 to k: s[t] ← s[t] + y[t, j], endfor, endfor
   (b) for t ← 1 to m: a[t] ← s[t]/k, endfor
4. Smooth plot:
   (a) for t ← 1 to w:
          s[t] ← 0
          for l ← −t + 1 to t − 1: s[t] ← s[t] + a[l + t], endfor
          s[t] ← s[t]/(2t − 1)
       endfor
   (b) for t ← w + 1 to m − w:
          s[t] ← 0
          for l ← −w to w: s[t] ← s[t] + a[l + t], endfor
          s[t] ← s[t]/(2w + 1)
       endfor
5. Output: plot the smoothed values s[t] versus t for t ← 1 to m − w
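A compact Python version of the same computation, with the same boundary handling (the averaging window shrinks near t = 1):

```python
def smoothed_mean_plot(y, w):
    """Across-replication averages a[t], then a moving-average smooth with
    window half-width w. y is a list of k replications, each of length m.
    Returns (a, s) where s[t] covers t = 1..m-w (0-based here)."""
    k, m = len(y), len(y[0])
    a = [sum(rep[t] for rep in y) / k for t in range(m)]
    s = []
    for t in range(m - w):
        half = min(w, t)                     # shrink the window near t = 0
        window = a[t - half:t + half + 1]
        s.append(sum(window) / len(window))
    return a, s
```

Plotting s against t (with any plotting tool) gives the smoothed mean plot of step 5.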

ERROR IN THE MEAN PLOT
We can put a standard error or confidence interval around the mean plot to show the variability of the plot.
Without Smoothing: Let
S²(t) = (1/(k − 1)) Σ_{i=1}^{k} (Y_it − Ȳ(t))²
and plot
Ȳ(t) ± t_{1−α/2,k−1} S(t)/√k
Since we are forming many confidence intervals, it may be sufficient to form one around every bth point only. A conservative estimate of the remaining bias is the difference between the lowest lower bound and the highest upper bound on the retained data.

With Smoothing: Typically Var[Ȳ(t, w)] < Var[Ȳ(t)], so using the previous intervals around the smoothed plot would be conservative (L&K, exercise 9.13). However,
Ȳ(t, w) = (1/(2w + 1)) Σ_{l=−w}^{w} Ȳ(l + t)
        = (1/k) Σ_{i=1}^{k} (1/(2w + 1)) Σ_{l=−w}^{w} Y_{i,l+t}
        = (1/k) Σ_{i=1}^{k} Ȳ_it(w)
Since Ȳ_1t(w), ..., Ȳ_kt(w) are i.i.d., form a c.i. based on them.

SINGLE-RUN IDEAS
Suppose we want to minimize bias by making only a single run, so that a mean plot may be too noisy. It will be useful to think of the output as represented by
Y_t = µ + X_t + b_t
where E[X_t] = 0 and b_t → 0.
Cumulative Mean Plot: Let
Ȳ_j = (1/j) Σ_{t=1}^{j} Y_t
and plot Ȳ_j vs. j. This is not a good idea (L&K, exercise 9.14).
Cusum Plot: Let S_0 = 0 and
S_j = Σ_{t=1}^{j} (Ȳ − Y_t) = j(Ȳ − Ȳ_j)
Plot S_j vs. j.
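The cusum path is simple to compute; by construction it is tied to 0 at both ends, as the next slide notes:

```python
def cusum(y):
    """Cusum path S_0, S_1, ..., S_m with S_j = sum_{t<=j} (Ybar - Y_t)."""
    m = len(y)
    ybar = sum(y) / m
    s, path = 0.0, [0.0]          # S_0 = 0
    for v in y:
        s += ybar - v
        path.append(s)
    return path
```

For unbiased data the path should cross 0 many times; a sustained one-sided excursion at the start suggests initial-condition bias.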

Under our model,
S_j = Σ_{t=1}^{j} (X̄ − X_t + b̄ − b_t)
Therefore:
- S_0 = S_m = 0 (tied to 0 at the ends)
- E[S_j] = Σ_{t=1}^{j} (b̄ − b_t)
No bias: If b_t = 0, then E[S_j] = 0 and S_j = Σ_{t=1}^{j} (X̄ − X_t) should cross 0 many times.

Bias: Suppose b_t < 0 for t ≤ t* and b_t ≈ 0 for t > t*. For j < t* we have b̄ − b_t > 0, so E[S_j] > 0. For j > t*,
E[S_{j+1} − S_j] = E[ Σ_{t=1}^{j+1} (Ȳ − Y_t) − Σ_{t=1}^{j} (Ȳ − Y_t) ] = E[Ȳ − Y_{j+1}] ≈ E[Ȳ] − µ = Bias[Ȳ] = b̄ < 0
So the slope of the curve is negative and equal to the bias. Statistical tests for bias have been based on this idea.

EFFECT OF DEPENDENCE
When the output data are dependent, the usual se is not valid. When the output data are positively dependent, the usual se is too small. For Y_1, Y_2, ..., Y_n stationary with ρ_h = Corr[Y_t, Y_{t+h}],
Var[Ȳ] = (σ²_Y/n) [1 + 2 Σ_{h=1}^{n−1} (1 − h/n) ρ_h]
while
E[S²/n] = (σ²_Y/n) [1 − (2/(n − 1)) Σ_{h=1}^{n−1} (1 − h/n) ρ_h]
giving a bias of
E[S²/n] − Var[Ȳ] = −(2σ²_Y/(n − 1)) Σ_{h=1}^{n−1} (1 − h/n) ρ_h

BATCH SIZE (CLASSICAL)
How do we choose the batch size?
1. Generate data.
2. Form batch means.
3. Evaluate independence and/or normality of the batch means.
4. If the test fails, increase the batch size (and perhaps generate more data).
Classic references: Fishman (1978), Management Science 24; Law and Carson (1979), Operations Research 27.

LAG CORRELATIONS
Batching algorithms typically focus on dependence rather than normality. Evaluate dependence via the lag correlations Corr[Y_t, Y_{t+h}] for h = 1, 2, .... If the outputs are independent, then Corr[Y_t, Y_{t+h}] = 0.
Sample lag-h correlation:
ρ̂_h = [ Σ_{i=1}^{n−h} (X_i − X̄)(X_{i+h} − X̄)/(n − h) ] / S²
Focus on the lag-1 correlation because the other lags are typically smaller. Note that ρ̂_h can be quite biased for small n.
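The sample lag-h correlation as defined above, as a sketch:

```python
import statistics

def lag_correlation(x, h):
    """Sample lag-h correlation: centered cross-products averaged over the
    n-h available pairs, divided by the sample variance S^2."""
    n = len(x)
    xbar = statistics.mean(x)
    s2 = statistics.variance(x)              # divisor n-1, as in S^2
    num = sum((x[i] - xbar) * (x[i + h] - xbar) for i in range(n - h)) / (n - h)
    return num / s2
```

In a batching procedure this would be applied to the batch means at lag h = 1, as Fishman's method on the next slide does.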

FISHMAN (1978)
Fishman assumes the total sample size n is fixed. Start with b = 1 (m = n batches). Test Corr[Ȳ_h, Ȳ_{h+1}] = 0. If the test is rejected, set b ← 2b and repeat.
Comments:
- It is not clear what the meaning of repeated tests on the same data is.
- Little is lost provided m > 30, since t_{0.975,30} = 2.04 vs. t_{0.975,∞} = 1.96.
- It is wasteful to start with m = n.
- Actually used.

LAW & CARSON
Law and Carson assume the total sample size n can be sequentially increased to obtain a c.i. with fixed relative width. The procedure calculates the lag-1 correlation for a large number of small batches, then projects what the lag-1 correlation would be for a small number (40) of large batches. If the correlation is too large, it can increase both the batch size and the sample size.
Comments: A complex procedure, but well tested.

ASYMPTOTIC VARIANCE CONSTANT
Suppose that the asymptotic variance constant
σ² = lim_{n→∞} n·Var[Ȳ]
exists. Then for large n,
Var[Ȳ] ≈ σ²/n
The more modern view of single-rep output analysis is that we are attempting to estimate σ². Note that σ² = σ²_Y for i.i.d. data.

NONOVERLAPPING BATCH MEANS (NOBM)
NOBM estimates σ² by
σ̂²_nobm = (n/(m(m − 1))) Σ_{h=1}^{m} (Ȳ_h − Ȳ)²
where
Ȳ_h = (1/b) Σ_{j=1}^{b} Y_{(h−1)b+j}
To have a low-bias estimate of σ², it is sufficient that m be small enough (b large enough) that the batch means are approximately independent.
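A direct implementation of the NOBM estimator, assuming n = m·b:

```python
def nobm_variance(y, m):
    """NOBM estimate of the asymptotic variance constant sigma^2:
    (n/(m(m-1))) * sum_h (Ybar_h - Ybar)^2."""
    n = len(y)
    b = n // m
    means = [sum(y[h * b:(h + 1) * b]) / b for h in range(m)]
    grand = sum(means) / m
    return n / (m * (m - 1)) * sum((v - grand) ** 2 for v in means)
```

Dividing the result by n gives the estimated Var[Ȳ] for the single run.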

OVERLAPPING BATCH MEANS
Notice that, for a stationary process, E[(Ȳ_h − Ȳ)²] is the same for each batch mean, and the NOBM estimator is just an average of these terms. But this will be true for any batch mean of b consecutive observations, and the expected value of a sum is the sum of the expected values. Thus, if we have a large enough batch size to be good for NOBM, there is no reason the batches have to be nonoverlapping.

Let
X̄_h = (1/b) Σ_{t=h}^{h+b−1} Y_t, for h = 1, 2, ..., n − b + 1
be all the overlapping batch means of size b. Then we can estimate σ² by
σ̂²_obm = constant · Σ_{h=1}^{n−b+1} (X̄_h − Ȳ)²
The constant that is often used is nb/((n − b + 1)(n − b)), which is correct for i.i.d. data. The asymptotic variance of the OBM estimator is 2/3 that of the NOBM estimator.
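And the OBM counterpart, using all n − b + 1 overlapping windows and the i.i.d.-correct constant:

```python
def obm_variance(y, b):
    """OBM estimate of sigma^2 with constant nb/((n-b+1)(n-b))."""
    n = len(y)
    grand = sum(y) / n
    # one batch mean per window of b consecutive observations
    means = [sum(y[h:h + b]) / b for h in range(n - b + 1)]
    c = n * b / ((n - b + 1) * (n - b))
    return c * sum((x - grand) ** 2 for x in means)
```

A production version would maintain a running window sum instead of re-summing each slice, but the O(nb) form above keeps the estimator's definition visible.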

OTHER METHODS
There are many other methods, including:
- ARMA modeling: fit a parametric model to the data for which σ² is known.
- Standardized Time Series: σ² appears in a functional central limit theorem for a standardized version of the output process.
- Regenerative Method: σ² appears as a function of the reward and length of a regenerative cycle.

ESTIMATING THE BATCH SIZE
For many variance estimators, their asymptotic bias and variance can be derived; thus, we can figure out the batch size that minimizes the asymptotic MSE. Under certain conditions,
MSE[σ̂²] = Var[σ̂²] + Bias²[σ̂²]
lim_{b→∞, n/b→∞} nb·Bias[σ̂²] = c_b γ_1 R_0
lim_{b→∞, n/b→∞} (n³/b)·Var[σ̂²] = c_v (γ_0 R_0)²
where
γ_0 = lim_{n→∞} Σ_{h=−n}^{n} ρ_h
γ_1 = lim_{n→∞} Σ_{h=−n}^{n} |h| ρ_h
R_0 = Var[Y_t], and (c_b, c_v) are constants that depend on the estimator [(1, 2) for NOBM, (1, 4/3) for OBM].

Thus for n and b large,
MSE[σ̂²] ≈ (b c_v γ_0²/n³ + c_b² γ_1²/(n² b²)) R_0²
Taking the derivative with respect to b and setting it to 0 gives an approximation to the minimum-MSE batch size:
b* = 1 + (2n c_b² γ_1²/(c_v γ_0²))^{1/3}
We can estimate b* by estimating γ_0 and γ_1. They are not easy to estimate, but in finite samples the MSE has been shown to be rather flat around the optimal batch size.

CONSISTENT ESTIMATION vs. CANCELLATION
We have been thinking about NOBM with the number of batches m fixed as the sample size n (and therefore the batch size b) increases. While this gives an asymptotically valid confidence interval, it does not give a consistent estimator of σ². Consistent estimation is required by some sequential procedures, and it also tends to give narrower confidence intervals. As the total run length n increases, a consistent estimator is obtained by letting both m and b grow as O(√n). See, for instance, Fishman and Yarberry (1997), INFORMS Journal on Computing, 9.

Cancellation: With the number of batches m fixed, as n → ∞,
√m(Ȳ − µ) ⇒ σ(m) N(0, 1) and σ̂²(m) ⇒ σ²(m) χ²_df/df
so that
√m(Ȳ − µ)/σ̂(m) ⇒ σ(m)N(0, 1)/√(σ²(m)χ²_df/df) ∼ t_df
(the unknown σ(m) cancels).
Consistent estimation: With both the batch size and the number of batches increasing as n → ∞, σ̂² ⇒ σ², so that
√n(Ȳ − µ)/σ̂ ⇒ σN(0, 1)/σ = N(0, 1)


More information

Sampling Distributions

Sampling Distributions In statistics, a random sample is a collection of independent and identically distributed (iid) random variables, and a sampling distribution is the distribution of a function of random sample. For example,

More information

Variance reduction techniques

Variance reduction techniques Variance reduction techniques Lecturer: Dmitri A. Moltchanov E-mail: moltchan@cs.tut.fi http://www.cs.tut.fi/ moltchan/modsim/ http://www.cs.tut.fi/kurssit/tlt-2706/ OUTLINE: Simulation with a given confidence;

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulation Jan-Pieter Dorsman 1 & Michel Mandjes 1,2,3 1 Korteweg-de Vries Institute for Mathematics, University of Amsterdam 2 CWI, Amsterdam 3 Eurandom, Eindhoven University of Amsterdam,

More information

Simulation optimization via bootstrapped Kriging: Survey

Simulation optimization via bootstrapped Kriging: Survey Simulation optimization via bootstrapped Kriging: Survey Jack P.C. Kleijnen Department of Information Management / Center for Economic Research (CentER) Tilburg School of Economics & Management (TiSEM)

More information

EMPIRICAL EVALUATION OF DATA-BASED DENSITY ESTIMATION

EMPIRICAL EVALUATION OF DATA-BASED DENSITY ESTIMATION Proceedings of the 2006 Winter Simulation Conference L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson, D. M. Nicol, and R. M. Fujimoto, eds. EMPIRICAL EVALUATION OF DATA-BASED DENSITY ESTIMATION E. Jack

More information

Chapter 11 Estimation of Absolute Performance

Chapter 11 Estimation of Absolute Performance Chapter 11 Estimation of Absolute Performance Purpose Objective: Estimate system performance via simulation If q is the system performance, the precision of the estimator qˆ can be measured by: The standard

More information

STAT 830 Non-parametric Inference Basics

STAT 830 Non-parametric Inference Basics STAT 830 Non-parametric Inference Basics Richard Lockhart Simon Fraser University STAT 801=830 Fall 2012 Richard Lockhart (Simon Fraser University)STAT 830 Non-parametric Inference Basics STAT 801=830

More information

STAT 443 Final Exam Review. 1 Basic Definitions. 2 Statistical Tests. L A TEXer: W. Kong

STAT 443 Final Exam Review. 1 Basic Definitions. 2 Statistical Tests. L A TEXer: W. Kong STAT 443 Final Exam Review L A TEXer: W Kong 1 Basic Definitions Definition 11 The time series {X t } with E[X 2 t ] < is said to be weakly stationary if: 1 µ X (t) = E[X t ] is independent of t 2 γ X

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

Time series models in the Frequency domain. The power spectrum, Spectral analysis

Time series models in the Frequency domain. The power spectrum, Spectral analysis ime series models in the Frequency domain he power spectrum, Spectral analysis Relationship between the periodogram and the autocorrelations = + = ( ) ( ˆ α ˆ ) β I Yt cos t + Yt sin t t= t= ( ( ) ) cosλ

More information

7.1 Basic Properties of Confidence Intervals

7.1 Basic Properties of Confidence Intervals 7.1 Basic Properties of Confidence Intervals What s Missing in a Point Just a single estimate What we need: how reliable it is Estimate? No idea how reliable this estimate is some measure of the variability

More information

Summarizing Measured Data

Summarizing Measured Data Summarizing Measured Data 12-1 Overview Basic Probability and Statistics Concepts: CDF, PDF, PMF, Mean, Variance, CoV, Normal Distribution Summarizing Data by a Single Number: Mean, Median, and Mode, Arithmetic,

More information

SEQUENTIAL ESTIMATION OF THE STEADY-STATE VARIANCE IN DISCRETE EVENT SIMULATION

SEQUENTIAL ESTIMATION OF THE STEADY-STATE VARIANCE IN DISCRETE EVENT SIMULATION SEQUENTIAL ESTIMATION OF THE STEADY-STATE VARIANCE IN DISCRETE EVENT SIMULATION Adriaan Schmidt Institute for Theoretical Information Technology RWTH Aachen University D-5056 Aachen, Germany Email: Adriaan.Schmidt@rwth-aachen.de

More information

Point and Interval Estimation II Bios 662

Point and Interval Estimation II Bios 662 Point and Interval Estimation II Bios 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2006-09-13 17:17 BIOS 662 1 Point and Interval Estimation II Nonparametric CI

More information

Chapter 10 Verification and Validation of Simulation Models. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation

Chapter 10 Verification and Validation of Simulation Models. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation Chapter 10 Verification and Validation of Simulation Models Banks, Carson, Nelson & Nicol Discrete-Event System Simulation Purpose & Overview The goal of the validation process is: To produce a model that

More information

Statistics II Lesson 1. Inference on one population. Year 2009/10

Statistics II Lesson 1. Inference on one population. Year 2009/10 Statistics II Lesson 1. Inference on one population Year 2009/10 Lesson 1. Inference on one population Contents Introduction to inference Point estimators The estimation of the mean and variance Estimating

More information

Inference on distributions and quantiles using a finite-sample Dirichlet process

Inference on distributions and quantiles using a finite-sample Dirichlet process Dirichlet IDEAL Theory/methods Simulations Inference on distributions and quantiles using a finite-sample Dirichlet process David M. Kaplan University of Missouri Matt Goldman UC San Diego Midwest Econometrics

More information

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap Patrick Breheny December 6 Patrick Breheny BST 764: Applied Statistical Modeling 1/21 The empirical distribution function Suppose X F, where F (x) = Pr(X x) is a distribution function, and we wish to estimate

More information

Resampling and the Bootstrap

Resampling and the Bootstrap Resampling and the Bootstrap Axel Benner Biostatistics, German Cancer Research Center INF 280, D-69120 Heidelberg benner@dkfz.de Resampling and the Bootstrap 2 Topics Estimation and Statistical Testing

More information

Diagnostics and Remedial Measures

Diagnostics and Remedial Measures Diagnostics and Remedial Measures Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Diagnostics and Remedial Measures 1 / 72 Remedial Measures How do we know that the regression

More information

Statistical Inference

Statistical Inference Statistical Inference Bernhard Klingenberg Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Outline Estimation: Review of concepts

More information

Probability Theory and Statistics. Peter Jochumzen

Probability Theory and Statistics. Peter Jochumzen Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................

More information

Optimal Linear Combinations of Overlapping Variance Estimators for Steady-State Simulation

Optimal Linear Combinations of Overlapping Variance Estimators for Steady-State Simulation Optimal Linear Combinations of Overlapping Variance Estimators for Steady-State Simulation Tûba Aktaran-Kalaycı, Christos Alexopoulos, David Goldsman, and James R. Wilson Abstract To estimate the variance

More information

Sampling Distributions

Sampling Distributions Sampling Distributions In statistics, a random sample is a collection of independent and identically distributed (iid) random variables, and a sampling distribution is the distribution of a function of

More information

STAT100 Elementary Statistics and Probability

STAT100 Elementary Statistics and Probability STAT100 Elementary Statistics and Probability Exam, Sample Test, Summer 014 Solution Show all work clearly and in order, and circle your final answers. Justify your answers algebraically whenever possible.

More information

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1) Summary of Chapter 7 (Sections 7.2-7.5) and Chapter 8 (Section 8.1) Chapter 7. Tests of Statistical Hypotheses 7.2. Tests about One Mean (1) Test about One Mean Case 1: σ is known. Assume that X N(µ, σ

More information

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter

More information

Simulation. Where real stuff starts

Simulation. Where real stuff starts Simulation Where real stuff starts March 2019 1 ToC 1. What is a simulation? 2. Accuracy of output 3. Random Number Generators 4. How to sample 5. Monte Carlo 6. Bootstrap 2 1. What is a simulation? 3

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Redacted for Privacy

Redacted for Privacy AN ABSTRACT OF THE THESIS OF Lori K. Baxter for the degree of Master of Science in Industrial and Manufacturing Engineering presented on June 4, 1990. Title: Truncation Rules in Simulation Analysis: Effect

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships

More information

Comparing two independent samples

Comparing two independent samples In many applications it is necessary to compare two competing methods (for example, to compare treatment effects of a standard drug and an experimental drug). To compare two methods from statistical point

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

CH.8 Statistical Intervals for a Single Sample

CH.8 Statistical Intervals for a Single Sample CH.8 Statistical Intervals for a Single Sample Introduction Confidence interval on the mean of a normal distribution, variance known Confidence interval on the mean of a normal distribution, variance unknown

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Output Analysis for a Single Model

Output Analysis for a Single Model Output Analysis for a Single Model Output Analysis for a Single Model Output analysis is the examination of data generated by a simulation. Its purpose is to predict the performance of a system or to compare

More information

SGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection

SGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection SG 21006 Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection Ioan Tabus Department of Signal Processing Tampere University of Technology Finland 1 / 28

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

Regression Estimation Least Squares and Maximum Likelihood

Regression Estimation Least Squares and Maximum Likelihood Regression Estimation Least Squares and Maximum Likelihood Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 1 Least Squares Max(min)imization Function to minimize

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let

More information

APPM/MATH 4/5520 Solutions to Exam I Review Problems. f X 1,X 2. 2e x 1 x 2. = x 2

APPM/MATH 4/5520 Solutions to Exam I Review Problems. f X 1,X 2. 2e x 1 x 2. = x 2 APPM/MATH 4/5520 Solutions to Exam I Review Problems. (a) f X (x ) f X,X 2 (x,x 2 )dx 2 x 2e x x 2 dx 2 2e 2x x was below x 2, but when marginalizing out x 2, we ran it over all values from 0 to and so

More information

STAT 520: Forecasting and Time Series. David B. Hitchcock University of South Carolina Department of Statistics

STAT 520: Forecasting and Time Series. David B. Hitchcock University of South Carolina Department of Statistics David B. University of South Carolina Department of Statistics What are Time Series Data? Time series data are collected sequentially over time. Some common examples include: 1. Meteorological data (temperatures,

More information

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests Chapters 3.5.1 3.5.2, 3.3.2 Prof. Tesler Math 283 Fall 2018 Prof. Tesler z and t tests for mean Math

More information

More on Input Distributions

More on Input Distributions More on Input Distributions Importance of Using the Correct Distribution Replacing a distribution with its mean Arrivals Waiting line Processing order System Service mean interarrival time = 1 minute mean

More information

Chapter 4 - Fundamentals of spatial processes Lecture notes

Chapter 4 - Fundamentals of spatial processes Lecture notes TK4150 - Intro 1 Chapter 4 - Fundamentals of spatial processes Lecture notes Odd Kolbjørnsen and Geir Storvik January 30, 2017 STK4150 - Intro 2 Spatial processes Typically correlation between nearby sites

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

Mathematical statistics

Mathematical statistics October 18 th, 2018 Lecture 16: Midterm review Countdown to mid-term exam: 7 days Week 1 Chapter 1: Probability review Week 2 Week 4 Week 7 Chapter 6: Statistics Chapter 7: Point Estimation Chapter 8:

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr. Simulation Discrete-Event System Simulation Chapter Comparison and Evaluation of Alternative System Designs Purpose Purpose: comparison of alternative system designs. Approach: discuss a few of many statistical

More information

Outline. Simulation of a Single-Server Queueing System. EEC 686/785 Modeling & Performance Evaluation of Computer Systems.

Outline. Simulation of a Single-Server Queueing System. EEC 686/785 Modeling & Performance Evaluation of Computer Systems. EEC 686/785 Modeling & Performance Evaluation of Computer Systems Lecture 19 Outline Simulation of a Single-Server Queueing System Review of midterm # Department of Electrical and Computer Engineering

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information

Variance reduction techniques

Variance reduction techniques Variance reduction techniques Lecturer: Dmitri A. Moltchanov E-mail: moltchan@cs.tut.fi http://www.cs.tut.fi/kurssit/elt-53606/ OUTLINE: Simulation with a given accuracy; Variance reduction techniques;

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Final Examination Statistics 200C. T. Ferguson June 11, 2009

Final Examination Statistics 200C. T. Ferguson June 11, 2009 Final Examination Statistics 00C T. Ferguson June, 009. (a) Define: X n converges in probability to X. (b) Define: X m converges in quadratic mean to X. (c) Show that if X n converges in quadratic mean

More information

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y

More information

Ch 8. MODEL DIAGNOSTICS. Time Series Analysis

Ch 8. MODEL DIAGNOSTICS. Time Series Analysis Model diagnostics is concerned with testing the goodness of fit of a model and, if the fit is poor, suggesting appropriate modifications. We shall present two complementary approaches: analysis of residuals

More information

Review of Statistics

Review of Statistics Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

BIOS 6649: Handout Exercise Solution

BIOS 6649: Handout Exercise Solution BIOS 6649: Handout Exercise Solution NOTE: I encourage you to work together, but the work you submit must be your own. Any plagiarism will result in loss of all marks. This assignment is based on weight-loss

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Introduction to statistics

Introduction to statistics Introduction to statistics Literature Raj Jain: The Art of Computer Systems Performance Analysis, John Wiley Schickinger, Steger: Diskrete Strukturen Band 2, Springer David Lilja: Measuring Computer Performance:

More information

[Chapter 6. Functions of Random Variables]

[Chapter 6. Functions of Random Variables] [Chapter 6. Functions of Random Variables] 6.1 Introduction 6.2 Finding the probability distribution of a function of random variables 6.3 The method of distribution functions 6.5 The method of Moment-generating

More information