A distribution-free tabular CUSUM chart for autocorrelated data


IIE Transactions (2007) 39, Copyright © IIE

SEONG-HEE KIM¹, CHRISTOS ALEXOPOULOS¹, KWOK-LEUNG TSUI¹ and JAMES R. WILSON²

¹H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
skim@isye.gatech.edu or christos@isye.gatech.edu or ktsui@isye.gatech.edu

²Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State University, Raleigh, NC, USA
jwilson@ncsu.edu

Received September 2005 and accepted March 2006

A distribution-free tabular CUSUM chart called DFTC is designed to detect shifts in the mean of an autocorrelated process. The chart's Average Run Length (ARL) is approximated by generalizing Siegmund's ARL approximation for the conventional tabular CUSUM chart based on independent and identically distributed normal observations. Control limits for DFTC are computed from the generalized ARL approximation. Also discussed are the choice of reference value and the use of batch means to handle highly correlated processes. The performance of DFTC compared favorably with that of other distribution-free procedures in stationary test processes having various types of autocorrelation functions as well as normal or nonnormal marginals.

Keywords: Statistical process control, tabular CUSUM chart, autocorrelated data, average run length, distribution-free statistical methods

1. Introduction

Given a stochastic process to be monitored, a Statistical Process Control (SPC) chart is used to detect any practically significant shift from the in-control status for that process, where the in-control status is defined as maintaining a specified target value for a given parameter of the monitored process, for example, the mean, the variance, or a quantile of the marginal distribution of the process.
An SPC chart is designed to yield a specified value $\mathrm{ARL}_0$ for the in-control Average Run Length (ARL) of the chart, that is, the expected number of observations sampled from the in-control process before an out-of-control alarm is (incorrectly) raised. Given several alternative SPC charts whose control limits are determined in this way, one would prefer the chart with the smallest out-of-control average run length $\mathrm{ARL}_1$, a performance measure analogous to $\mathrm{ARL}_0$ for the situation in which the monitored process is in a specific out-of-control condition. If the monitored process consists of independent and identically distributed (i.i.d.) random variables from a known distribution, such as the normal distribution, then control limits can be determined analytically for some charts, such as the Shewhart and tabular CUSUM charts, as detailed in Montgomery (2001). It is more difficult to determine control limits for an SPC chart that is applied to an autocorrelated process, and much of the recent work on this problem has focused on developing distribution-based (or model-based) SPC charts, which require one of the following properties:

1. The in-control and out-of-control versions of the monitored process must follow specific probability distributions.
2. Certain characteristics of the monitored process, such as the first-order and second-order moments (including the entire autocovariance function), must be known.

Of course, if the underlying assumptions about the probability distributions describing the target process are violated, then these charts may not perform as advertised. Moreover, the control limits for many distribution-based charts can only be determined by trial-and-error experimentation, which can be very inconvenient in practical applications.
This is especially true in circumstances that require rapid calibration of the chart and do not allow extensive preliminary experimentation on training data sets to estimate $\mathrm{ARL}_0$ for various trial values of the control limits and other parameters of the chart. We illustrate these disadvantages of distribution-based charts in more detail in the next section, using an example from intrusion detection in information systems. The limitations of distribution-based procedures can be overcome by distribution-free SPC charts. Runger and Willemain (R&W) (1995) organize the sequence of observations of the monitored process into adjacent nonoverlapping batches of equal size, and their SPC procedure is applied to the corresponding sequence of batch means. They choose a batch size large enough to ensure that the batch means are approximately i.i.d. normal, and then they apply to the batch means one of the classical SPC charts developed for i.i.d. normal data, including the Shewhart and tabular CUSUM charts. In contrast to this approach, Johnson and Bagshaw (J&B) (1974) and Kim, Alexopoulos, Goldsman, and Tsui (2006) present CUSUM-based methods that use raw (unbatched) observations instead of batch means. Computing the control limits for the latter two procedures requires an estimate of the variance parameter of the monitored process, that is, the sum of covariances at all lags. Nevertheless, these CUSUM-based charts are distribution free, since one can estimate the variance parameter using a variety of distribution-free techniques that are popular in the simulation literature; see Alexopoulos et al. (2006). For first-order autoregressive processes with a known variance parameter, Kim, Alexopoulos, Goldsman, and Tsui (2006) show that: (i) their model-free CUSUM chart, called the MFC chart, performs uniformly better than the J&B chart in terms of $\mathrm{ARL}_1$ for a given target value of $\mathrm{ARL}_0$; and (ii) the MFC chart works better than the R&W Shewhart chart for small shifts. On the other hand, Kim, Alexopoulos, Goldsman, and Tsui (2006) find that the R&W Shewhart chart performs better than the MFC chart for large shifts. This is not surprising, given that a Shewhart-type chart is generally more effective than a CUSUM-type chart in detecting large shifts in processes consisting of independent normal observations. However, Kim, Alexopoulos, Goldsman, and Tsui (2006) show that for stationary processes with nonnormal marginals, such as first-order exponential autoregressive processes, a large batch size is often required to achieve both independence and normality of the batch means.
This large batch size impairs the performance of the R&W Shewhart chart, delaying legitimate out-of-control alarms for processes with a pronounced correlation structure or large shifts; moreover, in practice it is difficult to determine a good choice for the batch size in the R&W Shewhart chart. Another approach to developing distribution-free SPC charts is taken by Ben-Gal et al. (2003), who introduce a context-based SPC methodology for state-dependent discrete-valued data generated by a finite-memory source. Unfortunately, this method is limited to univariate stochastic processes having a finite state space, and the experimental results of Ben-Gal et al. (2003) indicate that relatively large sample sizes are required to calibrate the performance of this procedure for the in-control condition. In this paper we formulate DFTC, a distribution-free tabular CUSUM chart for monitoring autocorrelated processes. The proposed chart is a generalization of the conventional tabular CUSUM chart that is designed for i.i.d. normal random variables. Moreover, to improve upon the performance of the J&B chart, DFTC incorporates a nonzero reference value into the monitoring statistic. For a reflected Brownian motion process with drift, Bagshaw and Johnson (1975) derive the density and expected value of the first-passage time to a positive threshold, and they mention that this result can be used to approximate the ARL of a CUSUM chart with a nonzero reference value. Combining this approximation with a generalization of the Brownian-motion approximation of Siegmund (1985) for the ARL of a CUSUM-based procedure that requires i.i.d. normal random variables, we designed DFTC so that it can be used with raw correlated data or with batch means based on any batch size. The rest of this article is organized as follows. Section 2 contains relevant background information, including a motivating example, notation, and assumptions. Section 3 presents the proposed DFTC chart for autocorrelated processes.
Section 4 contains an experimental comparison of the performance of DFTC with that of existing distribution-free procedures based on three test processes whose probabilistic behavior is typical of many practical applications of SPC procedures. The first test process is a stationary first-order autoregressive (AR(1)) process with the following values of the autoregressive parameter (and hence also of the lag-one correlation): 0.0, 0.25, 0.5, 0.7, 0.9, 0.95, and 0.99. The second test process is the sequence of queue waiting times generated by the M/M/1 queueing system with traffic intensities of 30 and 60%. Thus, for each configuration of this system in steady-state operation, the queue waiting-time process has the following properties: (i) its autocorrelation function decays at an approximately exponential rate; and (ii) its marginal distribution is markedly nonnormal, with an atom at zero and an exponential tail. The third test process is a stationary second-order autoregressive (AR(2)) process, whose autocorrelation function exhibits exponentially damped sinusoidal behavior; the AR(2) process itself also exhibits a kind of distorted periodicity with the same period as the autocorrelation function. Section 5 summarizes the main findings of this work.

2. Background

In this section we give a motivating example from the area of intrusion detection in information systems to illustrate the emerging need for distribution-free SPC methods. Then we define the notation used in this article, and we state our basic assumptions about the probabilistic behavior of the process to be monitored.

2.1. Motivating example

The MIT Lincoln Laboratory simulated the environment of a real computer network to provide a test-bed of data sets for comprehensive evaluation of the performance of various intrusion-detection systems. Ye et al. (2001), Ye et al. (2003), and Park (2005) derive event-intensity (arrival-rate) data from log files generated by the Basic Security Module (BSM) of a Sun SPARC 10 workstation running the Solaris operating system and functioning as one of the components of the network simulated by the MIT Lincoln Laboratory. These authors consider a Denial-of-Service (DoS) attack on the Sun workstation that leaves trails in the audit data; in particular, Ye et al. (2001), Ye et al. (2003), and Park (2005) capture the activities on the machine through a continuous stream of audit events whose occurrence times are recorded in the log files. Figure 1 shows event-intensity data (that is, the number of events in successive 1-second time intervals) derived from the BSM log files for an observation period of … seconds on a specific day in the data sets from the MIT Lincoln Laboratory. This data set is believed to be intrusion free. Since the Sun system performs a specific routine to create a log file every 60 seconds, the graph in Fig. 1 shows a pattern that repeats every 60 seconds.

Fig. 1. Event counts in 1-second time intervals derived from a BSM audit file.

After a careful analysis, Park (2005) separates the graph in Fig. 1 into the cyclic and noise components shown in Fig. 2. For the detection of a DoS attack, the noise events must be monitored. The noise data are very sparse: only 60 of the … 1-second time intervals contained noise events not related to the generation of a log file, so that the estimated probability of occurrence of at least one noise event in a given 1-second time interval is only ….

Fig. 2. Event counts in 1-second time intervals separated into cyclic (top panel) and noise (bottom panel) components.

Conventional probability distributions (in particular, the Poisson and normal distributions) cannot provide an adequate fit to the observed noise data set because of its high standard deviation. For the sample of 60 noise-event counts associated with 1-second time intervals containing at least one noise event, as depicted in the bottom panel of Fig. 2, the sample mean is 81 and the sample standard deviation is 154, which is almost twice as large as the mean. Such anomalous behavior in the noise data strongly suggests that this process cannot be adequately represented by standard univariate probability distributions; ultimately Park fitted a Bézier distribution (Wagner and Wilson, 1996) to the nonzero noise-event counts displayed in the bottom panel of Fig. 2 to drive a simulation-based performance evaluation of various intrusion-detection procedures. For this application, conventional distribution-based SPC charts are clearly inappropriate for detecting a DoS attack.

2.2. Notation and assumptions

Suppose the discrete-time stochastic process $\{Y_i : i = 1, 2, \ldots\}$ to be monitored has a steady-state distribution with marginal mean $\mathrm{E}[Y_i] = \mu$ and marginal variance $\mathrm{Var}[Y_i] = \sigma^2$. In particular, we let $\mu_0$ denote the in-control marginal mean. We let $\bar{Y}(n) = n^{-1}\sum_{i=1}^{n} Y_i$ denote the sample mean of the first $n$ observations. The standardized CUSUM, $C_n(t)$, is defined as

$$C_n(t) \equiv \frac{\sum_{i=1}^{\lfloor nt \rfloor} Y_i - \lfloor nt \rfloor \mu}{\Omega\sqrt{n}} \quad \text{for } t \in [0, 1], \tag{1}$$

where: (i) $\lfloor \cdot \rfloor$ is the floor (greatest integer) function, so that $\lfloor z \rfloor$ denotes the largest integer not exceeding $z$; and (ii) $\Omega^2$ is the variance parameter for the process $\{Y_i\}$, defined as

$$\Omega^2 \equiv \lim_{n\to\infty} n\,\mathrm{Var}[\bar{Y}(n)] = \sum_{\ell=-\infty}^{\infty} \mathrm{Cov}(Y_i, Y_{i+\ell}).$$

Let $\{W(t) : t \in [0, \infty)\}$ denote a standard Brownian motion process, so that for arbitrary $s, t \in [0, \infty)$, the random variables $W(s)$ and $W(t)$ are jointly normal with $\mathrm{E}[W(s)] = \mathrm{E}[W(t)] = 0$ and $\mathrm{Cov}[W(s), W(t)] = \min\{s, t\}$.
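As a small illustration of definition (1), the following is a minimal Python sketch that evaluates $C_n(t)$ for a given data set. It assumes that $\mu$ and $\Omega$ are known, exactly as in the definition; the function name `standardized_cusum` is hypothetical and chosen only for this example.

```python
import math
from typing import Sequence


def standardized_cusum(y: Sequence[float], mu: float, omega: float, t: float) -> float:
    """Evaluate C_n(t) of Equation (1) for t in [0, 1].

    y     -- observations Y_1, ..., Y_n
    mu    -- marginal mean of the process (assumed known here)
    omega -- square root of the variance parameter Omega^2 (assumed known here)
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if omega <= 0.0:
        raise ValueError("omega must be positive")
    n = len(y)
    count = math.floor(n * t)                 # floor(nt) terms enter the partial sum
    centered_sum = sum(y[:count]) - count * mu
    return centered_sum / (omega * math.sqrt(n))


# Usage: four observations with mu = 2 and Omega = 1.
data = [1.0, 2.0, 3.0, 4.0]
print(standardized_cusum(data, mu=2.0, omega=1.0, t=0.5))  # (1 + 2 - 2*2) / 2 = -0.5
print(standardized_cusum(data, mu=2.0, omega=1.0, t=1.0))  # (10 - 4*2) / 2 = 1.0
```

Viewed as a function of $t$, this quantity is the step function in $D[0, 1]$ that Assumption FCLT below requires to converge to $W(\cdot)$.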
For each positive integer $n$, the random function $C_n(\cdot)$ is an element of $D[0, 1]$, the space of functions on $[0, 1]$ that are right-continuous and have left-hand limits (Billingsley, 1968). Our main assumption is that $\{Y_i : i = 1, 2, \ldots\}$ satisfies the following Functional Central Limit Theorem (FCLT).

Assumption FCLT. There exist finite real constants $\mu$ and $\Omega > 0$ such that as $n \to \infty$, the sequence of random functions $\{C_n(\cdot) : n = 1, 2, \ldots\}$ converges in distribution to standard Brownian motion $W(\cdot)$ in the space $D[0, 1]$. Formally, we write $C_n(\cdot) \xrightarrow[n\to\infty]{\mathcal{D}} W(\cdot)$, where $\xrightarrow[n\to\infty]{\mathcal{D}}$ denotes convergence in distribution as $n \to \infty$.

Assumption FCLT applies to a broad class of stationary stochastic processes, including φ-mixing processes, strongly mixing processes, associated strictly stationary processes, and regenerative processes (Glynn and Iglehart, 1985, 1990). Roughly speaking, a stationary process will satisfy Assumption FCLT if, in the evolution of the process over time, the distant future is virtually independent of the past and the present; in other words, the stochastic dependence between any two subseries $\{Y_i : i \le g\}$ and $\{Y_i : i \ge g + \ell\}$ of the process separated by the lag $\ell$ in their index ("time") parameter must diminish sufficiently fast in some well-defined sense as $\ell \to \infty$. For a discussion of processes that do not satisfy Assumption FCLT, see Example 7.8 of Cox and Miller (1965). For technical reasons we also assume that for every $t \in [0, 1]$, the family of random variables $\{C_n^2(t) : n = 1, 2, \ldots\}$ is uniformly integrable; see Billingsley (1968). Let

$$B(t) = d\,t + \Omega W(t) \quad \text{for } t \in [0, \infty), \tag{2}$$

so that $B(\cdot)$ denotes Brownian motion on $[0, \infty)$ with drift parameter $d$ and variance parameter $\Omega^2$; thus for all $t \ge 0$, we have $\mathrm{E}[B(t)] = d\,t$ and $\mathrm{Var}[B(t)] = \Omega^2 t$.

2.3. Tabular CUSUM for i.i.d. normal data

Given a monitored process consisting of i.i.d.
normal random variables with marginal variance $\sigma^2$, the two-sided tabular CUSUM chart with reference value $K = k\sigma$ and control limit $H = h\sigma$ is defined by

$$S^{\pm}(n) = \begin{cases} 0, & \text{if } n = 0,\\ \max\{0,\ S^{\pm}(n-1) \pm (Y_n - \mu_0) - K\}, & \text{if } n = 1, 2, \ldots. \end{cases} \tag{3}$$

The interpretation of the ± notation in Equation (3) is that: (i) we have the initial values $S^+(0) = 0$ and $S^-(0) = 0$; and (ii) for $n = 1, 2, \ldots$, we have $S^+(n) = \max\{0, S^+(n-1) + (Y_n - \mu_0) - K\}$ and $S^-(n) = \max\{0, S^-(n-1) - (Y_n - \mu_0) - K\}$. (Similar use of the ± notation is made throughout this article.) An out-of-control alarm is raised just after the $n$th observation is taken if $S^+(n) \ge H$ or $S^-(n) \ge H$. It is well known that the tabular CUSUM chart for i.i.d. normal data has nearly optimal sensitivity to a shift $\mathrm{E}[Y_j] - \mu_0 = \mu - \mu_0$ of magnitude $2K$; see p. 415 of Montgomery (2001). Therefore, if the reference value $K$ (or the associated reference parameter $k$) is very small, then the chart is effective in detecting relatively small shifts but is less effective in detecting larger shifts than a similar chart having a substantially larger reference value. Table 1 shows ARLs of the tabular CUSUM chart with the reference parameter values $k = 0$ and $k = 0.5$. For the case $k = 0.5$, we obtained from Table 8.4 of Montgomery (2001) the value $h = 4.77$ of the control-limit parameter that is required to yield $\mathrm{ARL}_0 = \ldots$ for i.i.d. in-control normal data. For the case $k = 0$, we obtained the corresponding control-limit parameter estimate $h = 6.05$ by trial-and-error experimentation based on simulation experiments. As expected, the tabular CUSUM chart with $k = 0$ is more effective in detecting shifts of size $0.25\sigma$, but the chart with $k = 0.5$ detects any shift exceeding $0.5\sigma$ much faster.

Table 1. ARLs of the tabular CUSUM chart when the data are i.i.d. normal with marginal variance $\sigma^2 = 1$, where all estimated ARLs are based on … experiments; the numbers in parentheses represent standard errors for the corresponding estimated ARLs. Columns: shift in mean $(\mu - \mu_0)/\sigma$; estimated ARL for the tabular CUSUM with $k = 0$, $h = 6.05$; estimated ARL for the tabular CUSUM with $k = 0.5$, $h = 4.77$.

Similarly, for autocorrelated data, we can expect that introducing a nonzero reference value into a CUSUM-type chart should improve the performance of the chart. The monitoring statistic of the J&B chart is the same as that of the tabular CUSUM chart but with reference value $K = 0$. Therefore, in the design of the DFTC chart, we incorporated a nonzero reference value $K$ (or, equivalently, a nonzero reference parameter value $k$) into the monitoring statistic of the J&B chart. In the next section, we present a detailed development of the DFTC chart, including a rationale for setting the reference value $K$ and a method for determining the control limits of the procedure.

3. DFTC: A distribution-free tabular CUSUM procedure

For the one-sided monitoring statistics $S^+(n)$ and $S^-(n)$ defined in Equation (3), we have the corresponding times at which an alarm is raised,

$$T^{\pm} = \min\{n : S^{\pm}(n) \ge H \text{ and } n = 1, 2, \ldots\}. \tag{4}$$

In this section we formulate procedure DFTC by computing the one-sided ARL $\mathrm{E}[T^+]$ for the in-control condition $\mathrm{E}[Y_i] = \mu_0$.
A similar approach will yield the same final result for $\mathrm{E}[T^-]$. We exploit these results to derive the parameters of the DFTC chart. To compute $\mathrm{E}[T^+]$, we consider the following monitoring statistic, which is closely related to $S^+(n)$ but is defined in a slightly different way:

$$S(n) = \begin{cases} 0, & \text{if } n = 0,\\ S(n-1) + (Y_n - \mu_0) - K, & \text{if } n = 1, 2, \ldots. \end{cases}$$

It is easy to see that $S^+(n)$ is always equal to $S(n) - \min\{S(\ell) : \ell = 0, 1, \ldots, n\}$ for $n = 1, 2, \ldots$. Set $d = (\mathrm{E}[Y_j] - \mu_0) - K$. If $n$ is sufficiently large, then it follows from the definition (1) of the standardized CUSUM, Assumption FCLT, the definition (2) of Brownian motion with drift, and the continuous mapping theorem (Billingsley, 1968) that

$$\begin{aligned} S^+(n) &= S(n) - \min\{S(\ell) : \ell = 0, 1, \ldots, n\}\\ &\overset{\mathcal{D}}{\approx} d\,n + \Omega\sqrt{n}\,C_n(1) - \inf\{d\,tn + \Omega\sqrt{n}\,C_n(t) : 0 \le t \le 1\}\\ &\overset{\mathcal{D}}{\approx} d\,n + \Omega\sqrt{n}\,W(1) - \inf\{d\,tn + \Omega\sqrt{n}\,W(t) : 0 \le t \le 1\}\\ &\overset{\mathcal{D}}{=} d\,n + \Omega W(n) - \inf\{d\,u + \Omega W(u) : 0 \le u \le n\}\\ &\overset{\mathcal{D}}{=} B(n) - \inf\{B(u) : 0 \le u \le n\}, \end{aligned} \tag{5}$$

where $\overset{\mathcal{D}}{=}$ denotes exact equality in distribution and $\overset{\mathcal{D}}{\approx}$ denotes approximate (asymptotically exact) equality in distribution. Now the stochastic process defined by

$$Z(t) = B(t) - \inf\{B(u) : 0 \le u \le t\} \quad \text{for } t \in [0, \infty) \tag{6}$$

has a first-passage time to the threshold $H > 0$ that is given by $T_Z = \inf\{t : t \ge 0 \text{ and } Z(t) \ge H\}$. It follows from Equation (5) and an argument similar to the proof of Proposition 3.2 of Kim et al. (2005) that if $n$ is sufficiently large, then

$$T^+ \overset{\mathcal{D}}{\approx} T_Z, \tag{7}$$

so that we have

$$\mathrm{E}[T^+] \approx \mathrm{E}[T_Z] = \begin{cases} H^2/\Omega^2, & \text{if } d = 0,\\[4pt] \dfrac{\Omega^2}{2d^2}\left[\exp\left(-\dfrac{2dH}{\Omega^2}\right) - 1 + \dfrac{2dH}{\Omega^2}\right], & \text{if } d \ne 0, \end{cases} \tag{8}$$

where the formula for $\mathrm{E}[T_Z]$ on the far right-hand side of Equation (8) follows from Equation (2.1) of Bagshaw and Johnson (1975) or Theorem 3.1 of Darling and Siegert (1953). For the situation in which the $\{Y_i\}$ are i.i.d. normal random variables, Siegmund (1985, p. 7) proposes an improvement to the approximation (8) for the expected first-passage time of the process $\{S^+(n) : n = 1, 2, \ldots\}$ to the control limit $H$.
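The recursion (3), the alarm times (4), and the identity $S^+(n) = S(n) - \min\{S(\ell) : 0 \le \ell \le n\}$ used in the derivation above can be checked numerically. The following is a minimal Python sketch, not the authors' implementation; the function names are hypothetical, and $\mu_0$ and $K$ are simply passed in by the caller.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def tabular_cusum(y: Sequence[float], mu0: float, k_ref: float) -> Tuple[List[float], List[float]]:
    """Paths S+(1..n) and S-(1..n) of Equation (3), both started at 0."""
    s_plus = s_minus = 0.0
    plus_path, minus_path = [], []
    for y_n in y:
        s_plus = max(0.0, s_plus + (y_n - mu0) - k_ref)
        s_minus = max(0.0, s_minus - (y_n - mu0) - k_ref)
        plus_path.append(s_plus)
        minus_path.append(s_minus)
    return plus_path, minus_path


def alarm_time(path: Sequence[float], h: float) -> Optional[int]:
    """T = min{n >= 1 : S(n) >= H} as in Equation (4); None if no alarm occurs."""
    for n, s in enumerate(path, start=1):
        if s >= h:
            return n
    return None


def upper_cusum_via_running_min(y: Sequence[float], mu0: float, k_ref: float) -> List[float]:
    """S+(n) computed as S(n) - min{S(l) : 0 <= l <= n}, where S is the unreflected sum."""
    s = running_min = 0.0  # S(0) = 0 is included in the minimum
    out = []
    for y_n in y:
        s += (y_n - mu0) - k_ref
        running_min = min(running_min, s)
        out.append(s - running_min)
    return out


# Usage: both constructions of S+(n) agree (up to rounding) on simulated normal noise.
rng = random.Random(2024)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
plus, minus = tabular_cusum(sample, mu0=0.0, k_ref=0.1)
alt = upper_cusum_via_running_min(sample, mu0=0.0, k_ref=0.1)
assert all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(plus, alt))
```

The second construction is the one that makes the Brownian approximation (5) natural: $S(n)$ is a random walk with drift $d$, and subtracting its running minimum reproduces the reflection at zero.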
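We formulate a distribution-free generalization of Siegmund's approximation to handle the case of observations that may be correlated or nonnormal as follows:

$$\mathrm{E}[T^+] \approx \begin{cases} \dfrac{\Omega^2}{2d^2}\left\{\exp\left[-\dfrac{2d(H + 1.166\Omega)}{\Omega^2}\right] - 1 + \dfrac{2d(H + 1.166\Omega)}{\Omega^2}\right\}, & \text{if } d \ne 0,\\[6pt] \left(\dfrac{H + 1.166\Omega}{\Omega}\right)^2, & \text{if } d = 0. \end{cases} \tag{9}$$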
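Approximations (8) and (9) can be transcribed directly into code. The sketch below is a minimal Python illustration under the same assumptions as the text ($\Omega^2$ known, drift $d$ supplied by the caller); the function names are hypothetical, and the use of `math.expm1` is our own numerical choice to avoid cancellation when $d$ is close to zero, not part of the method.

```python
import math

SIEGMUND_SHIFT = 1.166  # multiple of Omega added to H in approximation (9)


def arl_reflected_bm(h: float, d: float, omega2: float) -> float:
    """Equation (8): E[T_Z] for reflected Brownian motion with drift d and variance parameter omega2."""
    if omega2 <= 0.0:
        raise ValueError("omega2 must be positive")
    if d == 0.0:
        return h * h / omega2
    x = 2.0 * d * h / omega2
    # exp(-x) - 1 + x, written with expm1 to stay accurate for small |x|
    return omega2 / (2.0 * d * d) * (math.expm1(-x) + x)


def arl_siegmund_generalized(h: float, d: float, omega2: float) -> float:
    """Equation (9): Equation (8) evaluated at H + 1.166 * Omega instead of H."""
    omega = math.sqrt(omega2)
    return arl_reflected_bm(h + SIEGMUND_SHIFT * omega, d, omega2)


# Usage: in-control drift d = -K with K = 0.1, Omega^2 = 1, and H = 5.
print(arl_reflected_bm(5.0, -0.1, 1.0))
print(arl_siegmund_generalized(5.0, -0.1, 1.0))
```

Writing (9) as (8) with a shifted threshold makes the $d = 0$ case come out automatically: it reduces to $(H + 1.166\Omega)^2/\Omega^2$.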

If the monitored process is in control, so that $\mathrm{E}[Y_i] = \mu_0$, then the right-hand side of Equation (9) yields our approximation to $\mathrm{E}[T^+]$ when we take $d = -K$. Finally, considerations of symmetry in the definition (3) of the one-sided process-monitoring statistics $S^+(n)$ and $S^-(n)$ and of their respective first-passage times defined by Equation (4) reveal that $\mathrm{E}[T^+] = \mathrm{E}[T^-]$. To derive procedure DFTC, we determine the control limits based on Equation (9), since this approximation is slightly more accurate than Equation (8). It follows that procedure DFTC has the following formal algorithmic statement.

DFTC: Distribution-Free Tabular CUSUM Procedure

Step 1. Choose $K$ and a target two-sided $\mathrm{ARL}_0$. Then calculate $H$, the solution to the equation

$$\frac{\Omega^2}{2K^2}\left\{\exp\left[\frac{2K(H + 1.166\Omega)}{\Omega^2}\right] - 1 - \frac{2K(H + 1.166\Omega)}{\Omega^2}\right\} = 2\,\mathrm{ARL}_0. \tag{10}$$

Step 2. Raise an out-of-control alarm after the $n$th observation if $S^+(n) \ge H$ or $S^-(n) \ge H$.

A search method (such as the bisection algorithm) can be used to solve Equation (10).

3.1. Determination of parameters

The control limits of the DFTC chart depend on the reference value $K$ and the target value $\mathrm{ARL}_0$ for the in-control ARL. In this subsection we search for a choice of $K$ that yields good performance for the DFTC chart by experimenting with a stationary first-order autoregressive (AR(1)) process,

$$Y_i = \mu + \phi(Y_{i-1} - \mu) + \varepsilon_i \quad \text{for } i = 1, 2, \ldots, \tag{11}$$

where: (i) the residuals satisfy $\{\varepsilon_i : i = 1, 2, \ldots\} \overset{\text{i.i.d.}}{\sim} N(0, \sigma_\varepsilon^2)$; (ii) the autoregressive parameter $\phi$ satisfies $-1 < \phi < 1$, so that Equation (11) defines a stationary process at least asymptotically as $i \to \infty$; and (iii) the initial condition $Y_0$ satisfies $Y_0 \sim N(\mu, \sigma^2)$, where $\sigma^2 = \sigma_\varepsilon^2/(1 - \phi^2)$, so that the process $\{Y_i : i = 1, 2, \ldots\}$ starts in steady-state operation. Recall that in the AR(1) process (11), the autoregressive parameter $\phi$ coincides with the lag-one correlation $\rho = \mathrm{Corr}(Y_i, Y_{i+1})$. As possible choices for the reference value $K$, we consider the following: (i) $K = k\sigma$, which is the choice for the tabular CUSUM chart with i.i.d.
normal data; or (ii) $K = k\Omega$, which seems to be the natural generalization of (i) for correlated data. The accuracy of Equation (9) depends on the extent to which the process-monitoring statistic $S^+(n)$ behaves probabilistically like the process $Z(\cdot)$ defined by Equation (6). In our computational experience, we found that if we took too large a value of $K$, then $S^+(n)$ would hit zero too frequently, so that the rates of convergence in Equations (5) and (7) were too slow for the approximation (9) to yield acceptable accuracy. Thus we concluded that $K$ should not be too large; at the same time, it should not be too close to zero, to ensure that the chart is sensitive to meaningful shifts in the process mean. In practice, observations from the monitored process are likely to be positively correlated, and in this situation the variance parameter $\Omega^2$ is often substantially larger than the marginal variance $\sigma^2$. For example, the AR(1) process (11) with autoregressive parameter $\phi = 0.9$ and marginal variance $\sigma^2 = 1$ has variance parameter $\Omega^2 = \sigma^2(1 + \phi)/(1 - \phi) = 19$; see Section 4.1. Since $K = k\Omega$ would then be much larger than $K = k\sigma$ for a given $k$, and on the basis of all the considerations mentioned in this paragraph, we decided to take $K = k\sigma$ rather than $K = k\Omega$ in the design of the DFTC chart. To find a good choice of the reference parameter $k$ yielding an effective reference value $K = k\sigma$, we set the in-control two-sided ARL equal to the target value $\mathrm{ARL}_0 = \ldots$; or, equivalently, for each of the one-sided tests based on $S^{\pm}(n)$, we set the in-control one-sided ARL equal to $2\,\mathrm{ARL}_0 = \ldots$. Then we computed $H$ from Equation (10) with $k \in \{\ldots, 0.01, 0.03, 0.05, 0.1, 0.5\}$, and we recorded the corresponding sample estimates of the actual two-sided $\mathrm{ARL}_0$ computed from 5000 independent replications of the AR(1) process (11) with $\phi \in \{0.5, 0.9\}$, as summarized in Table 2.
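Step 1 of DFTC requires solving Equation (10) for $H$. For $K > 0$ the left-hand side of (10) is strictly increasing in $H$, so a bracketing search is guaranteed to converge; the minimal Python sketch below uses plain bisection, as the text suggests. It assumes $\Omega^2$ is known, uses the one-sided target $2\,\mathrm{ARL}_0$ that corresponds to a two-sided target $\mathrm{ARL}_0$, and its example inputs ($\phi = 0.9$, $\sigma^2 = 1$, $k = 0.1$, and a target $\mathrm{ARL}_0$ of 10,000) are illustrative choices of ours, not values taken from the experiments reported here. The function names are hypothetical.

```python
import math


def dftc_in_control_arl(h: float, k_ref: float, omega2: float) -> float:
    """Left-hand side of Equation (10): approximate one-sided in-control ARL (drift d = -K)."""
    omega = math.sqrt(omega2)
    y = 2.0 * k_ref * (h + 1.166 * omega) / omega2
    return omega2 / (2.0 * k_ref ** 2) * (math.expm1(y) - y)


def dftc_control_limit(k_ref: float, omega2: float, arl0: float, rel_tol: float = 1e-12) -> float:
    """Solve Equation (10) for H by bisection, given K > 0, Omega^2 > 0 and a two-sided ARL_0."""
    if k_ref <= 0.0 or omega2 <= 0.0:
        raise ValueError("K and Omega^2 must be positive")
    target = 2.0 * arl0  # one-sided ARL matching the two-sided target
    lo, hi = 0.0, 1.0
    if dftc_in_control_arl(lo, k_ref, omega2) >= target:
        raise ValueError("target ARL_0 too small: no positive H solves Equation (10)")
    while dftc_in_control_arl(hi, k_ref, omega2) < target:
        hi *= 2.0  # expand the bracket until it contains the root
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if dftc_in_control_arl(mid, k_ref, omega2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage (illustrative inputs): AR(1) with phi = 0.9 and sigma^2 = 1, so Omega^2 = 19 and K = 0.1 * sigma.
phi, sigma = 0.9, 1.0
omega2 = sigma ** 2 * (1.0 + phi) / (1.0 - phi)
H = dftc_control_limit(k_ref=0.1 * sigma, omega2=omega2, arl0=10_000)
print(round(H, 2))
```

Bisection is used here only because the text names it; any monotone root finder would serve, since the function has a single crossing.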
Based on the results in Table 2, we concluded that the sample estimates of the actual two-sided $\mathrm{ARL}_0$ were close to the target value of … even for high values of $\phi$ when $k$ was small, say, $k \in \{\ldots, 0.01, 0.03\}$. But for large $k$ (say, $k \ge 0.5$), we found that the accuracy of the approximation deteriorated significantly even for a small value of $\phi$ such as $\phi = 0.5$. We recommend the value $k = 0.1$ on account of the following considerations: (i) the reference parameter $k$ should not be too large; (ii) the reference parameter $k$ should not be too close to zero; and (iii) in most practical applications of SPC charts, the lag-one correlation of the monitored process is rarely larger than 0.9. On the basis of the foregoing analysis and experimentation with the AR(1) process (11), we concluded that setting $k = 0.1$ helped to ensure that the actual $\mathrm{ARL}_0$ delivered by the DFTC chart was close to the target $\mathrm{ARL}_0$ for small to medium values of the lag-one correlation of the monitored process. However, we also found that the accuracy of the generalized approximation (9) to the ARL broke down for high values of the lag-one correlation, resulting in a conservative control limit $H$. For example, Table 2 shows that for the AR(1) process with lag-one correlation of 0.90, the DFTC procedure with $k = 0.1$ yielded … for the sample estimate of the two-sided actual $\mathrm{ARL}_0$ when the target $\mathrm{ARL}_0$ was …. In the next subsection, we present a method for handling processes with high correlation that ensures the actual $\mathrm{ARL}_0$ is close to the target value.

Table 2. Estimated actual two-sided $\mathrm{ARL}_0$ of the DFTC chart with the generalized approximation (9) for an AR(1) process based on 5000 experiments. Columns: $k$; estimated actual $\mathrm{ARL}_0$ for $\phi = 0.5$ and for $\phi = 0.9$.

3.2. Method for handling processes with high correlation

The DFTC chart incorporates a method for handling processes with excessively high correlation. Based on all our computational experience in applying DFTC to a wide variety of test processes, we concluded that when DFTC was applied to a process $\{Y_i\}$, the procedure worked as intended (that is, it delivered an ARL approximately equal to $\mathrm{ARL}_0$ for the in-control condition) only when

$$\rho = \mathrm{Corr}(Y_i, Y_{i+1}) \le \zeta = 0.5; \tag{12}$$

see also Bagshaw and Johnson (1975). On the other hand, if the upper limit (12) on the lag-one correlation was not satisfied, then we found it necessary to compute the batch means

$$\bar{Y}_j(m) = \frac{1}{m}\sum_{i=(j-1)m+1}^{jm} Y_i \quad \text{for } j = 1, 2, \ldots, b = \lfloor n/m \rfloor, \tag{13}$$

for a batch size $m$ just large enough to ensure that the lag-one correlation between batch means satisfied the requirement

$$\rho_{\bar{Y}(m)} = \mathrm{Corr}[\bar{Y}_j(m), \bar{Y}_{j+1}(m)] \le \zeta. \tag{14}$$

When Equation (14) was satisfied, we found that the DFTC chart performed properly when it was applied to the batch means process $\{\bar{Y}_j(m) : j = 1, \ldots, b\}$ with a variance parameter given by $\Omega^2_{\bar{Y}(m)} = \Omega^2/m$.
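The batching step, together with the lag-one correlation test (15) and the batch-size rule (16) stated in the remainder of this subsection, can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: it relies on `statistics.NormalDist` from the standard library for $z_{1-\alpha_{\mathrm{cor}}}$, its defaults mirror $\zeta = 0.5$ and $z_{1-\alpha_{\mathrm{cor}}} = z_{0.99}$ from the text, and the function names and the simulated AR(1) input are hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import List, Sequence


def batch_means(y: Sequence[float], m: int) -> List[float]:
    """Equation (13): means of b = floor(n/m) adjacent nonoverlapping batches of size m."""
    b = len(y) // m
    return [sum(y[j * m:(j + 1) * m]) / m for j in range(b)]


def lag1_correlation(y: Sequence[float]) -> float:
    """Sample lag-one correlation rho-hat, using the (n - 1) normalizations of the text."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    c1 = sum((y[i] - ybar) * (y[i + 1] - ybar) for i in range(n - 1)) / (n - 1)
    return c1 / s2


def dftc_batch_size(y: Sequence[float], zeta: float = 0.5, alpha_cor: float = 0.01) -> int:
    """Return 1 if test (15) passes (no batching); otherwise the batch size m of Equation (16)."""
    n = len(y)
    z = NormalDist().inv_cdf(1.0 - alpha_cor)  # z_{0.99} is about 2.33
    threshold = math.sin(math.asin(zeta) - z / math.sqrt(n))
    if threshold <= 0.0:
        raise ValueError("sample too short for the test (15) threshold to be positive")
    rho_hat = lag1_correlation(y)
    if rho_hat <= threshold:
        return 1
    if rho_hat >= 1.0:
        raise ValueError("rho-hat must be below 1 for Equation (16)")
    return math.ceil(math.log(threshold) / math.log(rho_hat))


# Usage: an AR(1) series with phi = 0.9 (simulated here only for illustration) requires batching.
rng = random.Random(7)
series, prev = [], rng.gauss(0.0, 1.0)
for _ in range(20_000):
    prev = 0.9 * prev + rng.gauss(0.0, math.sqrt(1.0 - 0.9 ** 2))
    series.append(prev)
m = dftc_batch_size(series)
ybar_batches = batch_means(series, m)
print(m, lag1_correlation(ybar_batches))
```

Because Equation (16) rests on the approximation $\rho_{\bar{Y}(m)} \approx \rho^m$, the printed lag-one correlation of the batch means need not fall exactly at or below $\zeta$.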
The remainder of this section details the computation of the batch size $m$ required to satisfy Equation (14), so that the resulting batch means may be used as the basic observations of a process that can be monitored effectively by the DFTC chart. Suppose we are given a realization $\{Y_i : i = 1, \ldots, n\}$ of the original (unbatched) process, from which we calculate the sample statistics

$$\bar{Y}(n) = n^{-1}\sum_{i=1}^{n} Y_i, \qquad S^2 = (n-1)^{-1}\sum_{i=1}^{n}\left[Y_i - \bar{Y}(n)\right]^2,$$

$$\hat{\rho} = (n-1)^{-1}\sum_{i=1}^{n-1}\left[Y_i - \bar{Y}(n)\right]\left[Y_{i+1} - \bar{Y}(n)\right]\Big/ S^2.$$

We test the hypothesis (12) at the level of significance $\alpha_{\mathrm{cor}} = 0.01$. If we find that

$$\hat{\rho} \le \sin\left[\sin^{-1}(\zeta) - \frac{z_{1-\alpha_{\mathrm{cor}}}}{\sqrt{n}}\right] \tag{15}$$

(with $z_{1-\alpha_{\mathrm{cor}}} = z_{0.99} = 2.33$), then we conclude that the original unbatched process $\{Y_i\}$ satisfies condition (12) and no batching is required before applying the DFTC chart. If Equation (15) is not satisfied, then we compute the required batch size according to

$$m = \left\lceil \ln\left\{\sin\left[\sin^{-1}(\zeta) - \frac{z_{1-\alpha_{\mathrm{cor}}}}{\sqrt{n}}\right]\right\} \Big/ \ln(\hat{\rho}) \right\rceil; \tag{16}$$

we compute the batch means (13) for batches of size $m$; and finally we apply DFTC to the resulting batch means process. Note that in Equation (16), $\lceil \cdot \rceil$ denotes the ceiling function, so that $\lceil z \rceil$ is the smallest integer not less than $z$.

Remark 1. A detailed justification of the test (15) for the condition (12) is given in Section 3.2 of Steiger et al. (2005).

Remark 2. The basis for the batch size formula (16) is the approximation $\rho_{\bar{Y}(m)} \approx \rho^m$, as detailed in Appendix B of Steiger et al. (2005).

4. Experiments

In this section, we compare the performance of DFTC with that of the following distribution-free SPC procedures designed for autocorrelated data.

The J&B two-sided chart: Define $S^{\pm}(n) = \max\{S^{\pm}(n-1) \pm (Y_n - \mu_0), 0\}$ for $n \ge 1$, with $S^{\pm}(0) = 0$. Choose the target two-sided $\mathrm{ARL}_0$ value, and set $H = \Omega\sqrt{2\,\mathrm{ARL}_0}$. Give an out-of-control signal after the $n$th observation if $S^+(n) > H$ or $S^-(n) > H$.

The MFC chart: Choose a target $\mathrm{ARL}_0$ value and set $H = \Omega\sqrt{\mathrm{ARL}_0}$. Raise an out-of-control signal after the $n$th observation if $\left|\sum_{j=1}^{n}(Y_j - \mu_0)\right| \ge H$.

The R&W Shewhart chart: Find a batch size $m$ such that the batch means are approximately normal with lag-one autocorrelation at most 0.1. Choose a target $\mathrm{ARL}_0$ value and find $z_{\mathrm{ON}}$ such that

$$\frac{m}{1 - \Phi(z_{\mathrm{ON}}) + \Phi(-z_{\mathrm{ON}})} = \mathrm{ARL}_0,$$

where $\Phi(\cdot)$ denotes the standard normal distribution function. Then give an out-of-control signal after observation $i = jm$ if the $j$th batch mean defined by Equation (13) satisfies

$$\left|\bar{Y}_j(m) - \mu_0\right| \ge z_{\mathrm{ON}}\sqrt{\mathrm{Var}[\bar{Y}_1(m)]},$$

where $\mathrm{Var}[\bar{Y}_1(m)]$, the marginal variance of the batch means, is assumed to be known.

It is important to recognize that all the experimental results reported in this article are based on the assumption that the variance parameter $\Omega^2$ is known. In many practical applications, this quantity must be estimated from a training data set, and it is unclear how the performance of the selected SPC procedures will be affected by estimation of the variance parameter. Nevertheless, it is reasonable to expect that any effective SPC procedure must perform well when it uses the exact value of the variance parameter; and we regard the experimental results reported below as an essential first step in identifying potentially effective methods for monitoring autocorrelated processes in practice. For each configuration of each test process, we performed 5000 independent replications of each SPC procedure with two-sided target $\mathrm{ARL}_0 = \ldots$; the resulting estimated ARLs are summarized below. Although space constraints did not allow inclusion of the corresponding estimated standard errors, the latter quantities are tabulated in the supplement to this article (Kim, Alexopoulos, Tsui, and Wilson, 2006).

4.1. AR(1) processes

For the AR(1) process (11), the marginal variance is $\sigma^2 = \sigma_\varepsilon^2/(1 - \phi^2)$; the lag-$\ell$ covariance is

$$\mathrm{Cov}(Y_i, Y_{i+\ell}) = \sigma^2\phi^{|\ell|} = \frac{\sigma_\varepsilon^2\,\phi^{|\ell|}}{1 - \phi^2} \quad \text{for } \ell = 0, \pm 1, \pm 2, \ldots;$$

and the variance parameter is

$$\Omega^2 = \sigma^2\left(\frac{1 + \phi}{1 - \phi}\right) = \frac{\sigma_\varepsilon^2}{(1 - \phi)^2}.$$

In the experiments reported below, the marginal variance of $Y_i$ was set to $\sigma^2 = 1$, so that $\sigma_\varepsilon^2 = 1 - \phi^2$.
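For readers who wish to reproduce this test process, the sketch below simulates the AR(1) process (11) started in steady state (with $Y_0 \sim N(\mu, \sigma^2)$ and $\sigma_\varepsilon^2 = \sigma^2(1 - \phi^2)$) and evaluates the variance-parameter formula above. It is a minimal Python illustration using the standard `random` module; the function names are hypothetical, and the experiments in this article were not necessarily generated this way.

```python
import math
import random
from typing import List, Optional


def ar1_path(n: int, phi: float, mu: float = 0.0, sigma2: float = 1.0,
             rng: Optional[random.Random] = None) -> List[float]:
    """Simulate Y_1, ..., Y_n from Equation (11), starting from Y_0 ~ N(mu, sigma^2)."""
    if not -1.0 < phi < 1.0:
        raise ValueError("stationarity requires -1 < phi < 1")
    rng = rng or random.Random()
    sigma_eps = math.sqrt(sigma2 * (1.0 - phi * phi))  # sigma_eps^2 = sigma^2 (1 - phi^2)
    y = rng.gauss(mu, math.sqrt(sigma2))                # Y_0 drawn from the steady-state law
    path = []
    for _ in range(n):
        y = mu + phi * (y - mu) + rng.gauss(0.0, sigma_eps)
        path.append(y)
    return path


def ar1_variance_parameter(phi: float, sigma2: float = 1.0) -> float:
    """Omega^2 = sigma^2 (1 + phi) / (1 - phi) for the AR(1) process (11)."""
    return sigma2 * (1.0 + phi) / (1.0 - phi)


# Usage: with sigma^2 = 1 and phi = 0.9, Omega^2 is 19 up to rounding, matching the value quoted earlier.
print(ar1_variance_parameter(0.9))
path = ar1_path(10_000, 0.9, rng=random.Random(1))
```

Drawing $Y_0$ from the stationary law removes any need for a warm-up period, which matters here because the in-control ARL is defined for a process in steady state.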
The shift $\mathrm{E}[Y_j] - \mu_0 = \mu - \mu_0$ was varied over the following multiples of $\sigma$: 0, 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, 3, and 4; and the autoregressive parameter $\phi$ was varied over the values 0, 0.25, 0.5, 0.7, 0.9, 0.95, and 0.99. For the R&W chart, we considered two different values for the batch size: (i) a batch size $m_1$ that yields a lag-one correlation between batch means of approximately 0.1, as prescribed in Table 3 of Runger and Willemain (1995); and (ii) a batch size $m_2$ that minimizes the mean-squared error of the nonoverlapping batch means estimator of the variance parameter,

$$\hat{\Omega}^2 = \frac{m}{b - 1}\sum_{j=1}^{b}\left[\bar{Y}_j(m) - \bar{Y}(n)\right]^2$$

(Chien et al., 1997). The asymptotically optimal batch size $m_2$ for the AR(1) process is derived by Carlstein (1986):

$$m_2 = \left\{\frac{2\phi}{1 - \phi^2}\right\}^{2/3} n^{1/3}.$$

Since the target $\mathrm{ARL}_0$ for a shift of zero was …, we used $n = \ldots$ in the above equation to compute $m_2$. Table 3 displays the estimated ARLs for small to medium values of the lag-one correlation, that is, $\phi \in \{0, 0.25, 0.5\}$.

Table 3. Two-sided ARLs in terms of number of raw observations for an AR(1) process with small or medium $\phi$ and $\sigma_\varepsilon^2 = 1 - \phi^2$. Columns: shift $(\mu - \mu_0)/\sigma$; J&B; MFC; DFTC; R&W with $m = m_1$; R&W with $m = m_2$.

Each row of the table shows the results obtained with all selected SPC procedures for a combination of the lag-one correlation $\phi$ and the standardized shift $(\mu - \mu_0)/\sigma$ that defines a specific configuration of the test process $\{Y_i\}$; the boxed entry in each row identifies the best (smallest) value of $\mathrm{ARL}_1$ (and hence the best-performing SPC procedure) for that particular configuration of the test process. Table S-1 of the supplement (Kim, Alexopoulos, Tsui, and Wilson, 2006) contains the estimated standard errors corresponding to the estimated ARLs given in Table 3. Among the three CUSUM-type charts we considered in this article, we found that the DFTC chart always outperformed the other two charts for these AR(1) processes. However, the R&W chart was more efficient than the DFTC chart in detecting large shifts. Table 4 displays estimated ARLs for large values of the lag-one correlation, that is, $\phi \in \{0.7, 0.9, 0.95, 0.99\}$. For large lag-one correlation, we tested the performance of the DFTC chart with batching as well as without batching. We used Equation (16) to compute the batch size for the DFTC chart so that the lag-one correlation of the batch means was approximately 0.5. Based on the results in Table 4, we concluded that batching helped bring $\mathrm{ARL}_0$ closer to its target value of …. Batching caused some deterioration in the performance of the DFTC chart with respect to $\mathrm{ARL}_1$, but this deterioration was minor because the required batch sizes were relatively small. From Table 4 we found that for large values of the lag-one correlation, the DFTC procedure did not always outperform the other two CUSUM-type charts. There were a few cases involving small shifts for which MFC had a smaller value of $\mathrm{ARL}_1$ than did the DFTC chart. Both MFC and DFTC were more effective than R&W in detecting small shifts. For large shifts, R&W still outperformed the three CUSUM-type charts. However, when $\phi = 0.99$, R&W required an excessive batch size, so that R&W needed one full batch even for large shifts.
This delayed legitimate out-of-control alarms and degraded the performance of the R&W chart. The problem is demonstrated more clearly in the following example involving waiting times in a single-server queueing system.

4.2.2. M/M/1 queue waiting times

In an M/M/1 queueing system, we let $A_i$ denote the interarrival time between customers $i-1$ and $i$, so that $\{A_i : i = 1, 2, \ldots\}$ are i.i.d. exponential($\mu_A$) random variables with $E[A_i] = \mu_A$; moreover, we let $B_i$ denote the service time of the $i$th customer, so that $\{B_i : i = 1, 2, \ldots\}$ are i.i.d. exponential($\mu_B$) random variables with $E[B_i] = \mu_B$. If $Y_i$ denotes the waiting time in the queue for the $i$th customer in this single-server system, then
$$Y_{i+1} = \max\{0,\; Y_i + B_i - A_{i+1}\} \quad\text{for } i = 1, 2, \ldots.$$
As detailed, for example, in Section 4.2 of Steiger and Wilson (2001), the M/M/1 queue waiting times $\{Y_i : i = 1, 2, \ldots\}$ constitute a test process with highly nonnormal marginals and an autocorrelation function that decays at an approximately exponential rate. In terms of the arrival rate $\lambda = 1/\mu_A$, the service rate $\nu = 1/\mu_B$, and the traffic intensity $\tau = \lambda/\nu$, the process $\{Y_i\}$ has the marginal distribution function
$$F_Y(y) = \Pr\{Y_i \le y\} = \begin{cases} 0, & y < 0,\\ 1-\tau, & y = 0,\\ 1-\tau e^{-(\nu-\lambda)y}, & y > 0, \end{cases} \tag{17}$$
so that the marginal mean and variance are, respectively,
$$\mu_0 = \frac{\tau^2}{\lambda(1-\tau)} \quad\text{and}\quad \sigma_Y^2 = \frac{\tau^3(2-\tau)}{\lambda^2(1-\tau)^2}. \tag{18}$$
The lag-$\ell$ covariance of the process $\{Y_i\}$ is
$$\mathrm{Cov}(Y_i, Y_{i+\ell}) = \frac{1-\tau^2}{2\pi\lambda^2}\int_0^r \frac{z^{|\ell|+3/2}\,(r-z)^{1/2}}{(1-z)^3}\,dz \quad\text{for } \ell = 0, \pm 1, \pm 2, \ldots, \tag{19}$$
where $r = 4\tau/(1+\tau)^2$, so that $0 < r < 1$; and the variance parameter is
$$\Omega^2 = \frac{\tau^3\left(\tau^3 - 4\tau^2 + 5\tau + 2\right)}{\lambda^2(1-\tau)^4}. \tag{20}$$
The service rate of the in-control process was set to $\nu = 1$. To test different levels of dependence, we took the arrival rate $\lambda \in \{0.3, 0.6\}$, so that the traffic intensity of the in-control system was $\tau \in \{0.3, 0.6\}$. We generated the process $\{Y_i : i = 1, 2, \ldots\}$ using the algorithm of Schmeiser and Song (1989), so that the process started in steady-state operation and thus had the steady-state properties (17)–(20).
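The following Python sketch shows one way to evaluate Equations (18) and (20) and to generate a stationary M/M/1 waiting-time sequence. It is a simple stand-in, not the Schmeiser and Song (1989) algorithm used in the experiments: as an assumption, it draws $Y_1$ from the stationary distribution (17) by inversion (zero with probability $1-\tau$, otherwise exponential with rate $\nu-\lambda$) and then applies the recursion for $Y_{i+1}$. The function names and the optional `shift` argument, which adds the constant $\mu-\mu_0$ in the manner described in the next paragraph, are hypothetical.

```python
import random


def mm1_moments(lam: float, nu: float) -> tuple[float, float, float]:
    """Return (mu_0, sigma_Y^2, Omega^2) from Equations (18) and (20)."""
    tau = lam / nu
    if not 0.0 < tau < 1.0:
        raise ValueError("a steady state requires 0 < tau < 1")
    mu0 = tau**2 / (lam * (1.0 - tau))
    var = tau**3 * (2.0 - tau) / (lam**2 * (1.0 - tau) ** 2)
    omega2 = tau**3 * (tau**3 - 4.0 * tau**2 + 5.0 * tau + 2.0) / (lam**2 * (1.0 - tau) ** 4)
    return mu0, var, omega2


def mm1_waiting_times(n, lam, nu, shift=0.0, seed=None):
    """Queue waiting times Y_1..Y_n started in steady state, each plus a constant shift."""
    tau = lam / nu
    if not 0.0 < tau < 1.0:
        raise ValueError("a steady state requires 0 < tau < 1")
    rng = random.Random(seed)
    # Y_1 from distribution (17) by inversion: P{Y_1 = 0} = 1 - tau and,
    # given Y_1 > 0, Y_1 is exponential with rate nu - lam.
    y = rng.expovariate(nu - lam) if rng.random() < tau else 0.0
    out = []
    for _ in range(n):
        out.append(y + shift)
        b = rng.expovariate(nu)   # service time B_i (mean 1/nu)
        a = rng.expovariate(lam)  # next interarrival time A_{i+1} (mean 1/lam)
        y = max(0.0, y + b - a)   # Y_{i+1} = max{0, Y_i + B_i - A_{i+1}}
    return out
```

For $\lambda = 0.3$ and $\nu = 1$, `mm1_moments` gives $\mu_0 \approx 0.429$, $\sigma_Y^2 \approx 1.041$, and $\Omega^2 \approx 3.96$; adding the shift after generation leaves the random-number stream, and hence the covariance structure, unchanged.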
To generate shifted data with mean $\mu$, we first generated observations from an in-control process with mean $\mu_0$ given by Equation (18), and then we added the constant $\mu - \mu_0$ to each observation. This approach yielded the desired shifts in the process mean without affecting the covariance structure. As with the AR(1) processes, the shift $\mu - \mu_0$ was varied over the following multiples of $\sigma_Y$: 0, 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, 3, and 4.

When applying DFTC to the M/M/1 queue waiting times with traffic intensities $\tau = 0.3$ and $\tau = 0.6$, we used Equation (16) to estimate that batch sizes $m =$ … and $m = 10$, respectively, would be required to achieve an approximate lag-one correlation of 0.5 for the batch means. Similarly, when applying R&W for $\tau = 0.3$ and $\tau = 0.6$, we found by trial-and-error experimentation that batch sizes $m_1 = 11$ and $m_1 = 55$, respectively, would be required to achieve a lag-one correlation of at most 0.1 for the batch means.

Table 4. Two-sided ARLs in terms of number of raw observations for an AR(1) process with high $\phi$ and $\sigma_\varepsilon^2 = 1-\phi^2$ (columns: shift $(\mu-\mu_0)/\sigma_Y$; J&B; MFC; DFTC unbatched and batched; R&W with $m = m_1$ and with $m = m_2$). Batch sizes: $\phi = 0.7$: DFTC $m = 3$, R&W $m_1 = 19$; $\phi = 0.9$: DFTC $m = 7$, R&W $m_1 = 58$; $\phi = 0.95$: DFTC $m = 15$, R&W $m_1 = 118$; $\phi = 0.99$: DFTC $m = 74$, R&W $m_1 = 596$.

Table 5. Two-sided ARLs in terms of number of raw observations for M/M/1 queue waiting times (columns: $\tau$; shift $(\mu-\mu_0)/\sigma_Y$; J&B; MFC; DFTC unbatched and batched; R&W with two batch sizes). Batch sizes: $\tau = 0.3$: R&W $m_1 = 11$; $\tau = 0.6$: DFTC $m = 10$, R&W $m_1 = 55$.

Table 5 summarizes the experimental results obtained for the M/M/1 queue-waiting-time process. From Table 5, we concluded that J&B and MFC achieved values of $\mathrm{ARL}_0$ close to the target value. However, because of the correlation and nonnormality of the monitored process $\{Y_i\}$, the $\mathrm{ARL}_0$ of the DFTC chart deviated somewhat from its target. Batching helped to reduce this deviation while causing a small degradation in the performance of DFTC with respect to $\mathrm{ARL}_1$.

The performance of R&W was significantly degraded because of the large batch size required to achieve approximately i.i.d. normal batch means. Because of the nonnormality of $Y_i$ revealed by Equation (17), the batch size $m_1$ that yielded a lag-one correlation of approximately 0.1 was not large enough to achieve approximate normality of the batch means, and the actual value of $\mathrm{ARL}_0$ for R&W deviated substantially from its target value. For example, with $m_1 = 11$, R&W delivered $\mathrm{ARL}_0 = 700$ when $\tau = 0.3$. To calibrate R&W in this situation, we increased the batch size $m$ until the estimated $\mathrm{ARL}_0$ was close to the target value. The resulting batch sizes were quite large: we had to take $m = 300$ when $\tau = 0.3$ and $m = 400$ when $\tau = 0.6$. Such large batch sizes caused catastrophic degradation in the performance of R&W in this test process. We emphasize that the trial-and-error experimentation required to determine the batch sizes used by R&W in this test process greatly exceeded the amount of experimentation required for the other procedures; thus it can be argued that R&W was given an unfair advantage in the performance comparison based on the results in Table 5.
At a minimum, we leaned over backwards to be fair in our evaluation of the R&W procedure, and this should be borne in mind in all the performance comparisons involving it.

AR(2) processes

A second-order autoregressive (AR(2)) process is generated according to the relation
$$Y_i = \mu + \phi_1(Y_{i-1}-\mu) + \phi_2(Y_{i-2}-\mu) + \varepsilon_i \quad\text{for } i = 1, 2, \ldots, \tag{21}$$
where: (i) the residuals satisfy $\{\varepsilon_i : i = 1, 2, \ldots\} \stackrel{\text{i.i.d.}}{\sim} N(0, \sigma_\varepsilon^2)$; and (ii) the autoregressive parameters satisfy the constraints $\phi_1 + \phi_2 < 1$, $\phi_2 - \phi_1 < 1$, and $-1 < \phi_2 < 1$, which ensure that Equation (21) defines a stationary process, at least asymptotically as $i \to \infty$. Before discussing the initial condition $(Y_{-1}, Y_0)$ that ensures the process $\{Y_i : i = 1, 2, \ldots\}$ starts in steady-state operation, we must first discuss the second-order moment structure of this process.
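A minimal Python sketch of this test process, assuming the moment and autocorrelation formulas presented as Equations (22) to (26) in the remainder of this subsection. The autocorrelations are generated by the standard AR(2) Yule–Walker recursion $\rho_\ell = \phi_1\rho_{\ell-1} + \phi_2\rho_{\ell-2}$ and cross-checked against the damped sinusoidal closed form; the steady-state start uses a Cholesky factor of the bivariate normal distribution of $(Y_{-1}, Y_0)$. All function names are hypothetical, and the batch-means correlation is computed from the autocorrelations by exact double sums.

```python
import math
import random


def ar2_is_stationary(phi1: float, phi2: float) -> bool:
    """Stationarity conditions (ii) for the AR(2) recursion."""
    return phi1 + phi2 < 1.0 and phi2 - phi1 < 1.0 and -1.0 < phi2 < 1.0


def ar2_moments(phi1: float, phi2: float, sigma_eps2: float) -> tuple[float, float]:
    """Return (sigma_Y^2, Omega^2) as in Equation (22)."""
    sigma_y2 = ((1 - phi2) / (1 + phi2)) * sigma_eps2 / ((1 - phi2) ** 2 - phi1**2)
    omega2 = sigma_eps2 / (1 - phi1 - phi2) ** 2
    return sigma_y2, omega2


def ar2_acf(phi1: float, phi2: float, max_lag: int) -> list[float]:
    """rho_0..rho_max_lag from rho_1 = phi1/(1 - phi2) and the Yule-Walker recursion."""
    rho = [1.0, phi1 / (1 - phi2)]
    while len(rho) <= max_lag:
        rho.append(phi1 * rho[-1] + phi2 * rho[-2])
    return rho[: max_lag + 1]


def ar2_acf_damped_sinusoid(phi1: float, phi2: float, lag: int) -> float:
    """Closed form (25); valid only when phi1^2 + 4*phi2 < 0."""
    theta = math.acos(phi1 / (2 * math.sqrt(-phi2)))
    psi = math.atan(math.tan(theta) * (1 - phi2) / (1 + phi2))
    l = abs(lag)
    return math.sqrt(-phi2) ** l * math.sin(l * theta + psi) / math.sin(psi)


def batch_means_lag1_corr(rho: list[float], m: int) -> float:
    """Lag-one correlation of adjacent size-m batch means; needs len(rho) >= 2*m."""
    var = m + 2 * sum((m - k) * rho[k] for k in range(1, m))         # m^2 Var / sigma_Y^2
    cov = sum(min(k, 2 * m - k) * rho[k] for k in range(1, 2 * m))   # m^2 Cov / sigma_Y^2
    return cov / var


def ar2_simulate(n, phi1, phi2, mu=0.0, sigma_y2=1.0, seed=None):
    """Stationary path Y_1..Y_n of (21), started from the bivariate normal (Y_{-1}, Y_0)."""
    rng = random.Random(seed)
    unit_var, _ = ar2_moments(phi1, phi2, 1.0)   # sigma_Y^2 when sigma_eps^2 = 1
    sigma_eps = math.sqrt(sigma_y2 / unit_var)    # residual s.d. giving Var[Y_i] = sigma_y2
    rho1 = phi1 / (1 - phi2)
    sd = math.sqrt(sigma_y2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y_prev2 = mu + sd * z1                                          # Y_{-1}
    y_prev1 = mu + sd * (rho1 * z1 + math.sqrt(1 - rho1**2) * z2)   # Y_0
    path = []
    for _ in range(n):
        y = mu + phi1 * (y_prev1 - mu) + phi2 * (y_prev2 - mu) + rng.gauss(0.0, sigma_eps)
        path.append(y)
        y_prev2, y_prev1 = y_prev1, y
    return path
```

With $\phi_1 = 1.8$ and $\phi_2 = -0.9$, `batch_means_lag1_corr` applied to `ar2_acf(1.8, -0.9, 100)` gives roughly 0.61 for $m = 3$, 0.38 for $m = 4$, 0.15 for $m = 5$, $-0.07$ for $m = 6$, and $-0.26$ for $m = 14$, which illustrates how the oscillating autocorrelation function makes the batch-means correlation non-monotone in $m$.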

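The marginal variance and variance parameter of this process are given by
$$\sigma_Y^2 = \left(\frac{1-\phi_2}{1+\phi_2}\right)\frac{\sigma_\varepsilon^2}{(1-\phi_2)^2-\phi_1^2} \quad\text{and}\quad \Omega^2 = \frac{\sigma_\varepsilon^2}{(1-\phi_1-\phi_2)^2}, \tag{22}$$
respectively; see, for example, Equations (3) and (4) of Sullivan and Wilson (1989). We obtain the autocorrelation function $\{\mathrm{Corr}(Y_i, Y_{i+\ell}) : \ell = 0, \pm 1, \pm 2, \ldots\}$ for the stationary AR(2) process by considering the auxiliary equation $z^2 - \phi_1 z - \phi_2 = 0$, whose roots are
$$r_1 = \frac{\phi_1 + \sqrt{\phi_1^2 + 4\phi_2}}{2} \quad\text{and}\quad r_2 = \frac{\phi_1 - \sqrt{\phi_1^2 + 4\phi_2}}{2}. \tag{23}$$
From Equation (3.5.35) of Priestley (1981), we have
$$\mathrm{Corr}(Y_i, Y_{i+\ell}) = \begin{cases} \dfrac{(1-r_2^2)\,r_1^{|\ell|+1} - (1-r_1^2)\,r_2^{|\ell|+1}}{(r_1-r_2)(1+r_1 r_2)}, & \text{if } r_1 \ne r_2,\\[2ex] \left[1 + |\ell|\left(\dfrac{1-r_1^2}{1+r_1^2}\right)\right] r_1^{|\ell|}, & \text{if } r_1 = r_2, \end{cases} \quad\text{for } \ell = 0, \pm 1, \pm 2, \ldots \tag{24}$$
If $\phi_1^2 + 4\phi_2 < 0$, then $\phi_2 < 0$ and the roots $r_1, r_2$ in Equation (23) are complex conjugates; in this case we have $r_1 r_2 = |r_1|^2 = |r_2|^2 = -\phi_2 < 1$ from the stationarity condition (ii) immediately following Equation (21). Thus we can write $r_1 = \sqrt{-\phi_2}\,\exp(\theta\sqrt{-1})$ and $r_2 = \sqrt{-\phi_2}\,\exp(-\theta\sqrt{-1})$, where $\theta = \cos^{-1}\!\left[\phi_1/\left(2\sqrt{-\phi_2}\right)\right]$. In terms of the phase $\psi = \tan^{-1}\!\left[\tan(\theta)(1-\phi_2)/(1+\phi_2)\right]$, the autocorrelation function of the AR(2) process has the damped sinusoidal form
$$\mathrm{Corr}(Y_i, Y_{i+\ell}) = \left(\sqrt{-\phi_2}\right)^{|\ell|}\left[\frac{\sin(|\ell|\theta + \psi)}{\sin(\psi)}\right] \quad\text{for } \ell = 0, \pm 1, \pm 2, \ldots \quad \left(\text{if } \phi_1^2 + 4\phi_2 < 0\right). \tag{25}$$
For a complete discussion of the behavior of the autocorrelation function of the AR(2) process in the less interesting case $\phi_1^2 + 4\phi_2 \ge 0$, see Section … of Priestley (1981). From Equations (22) and (24) it follows that the lag-one covariance of the AR(2) process is always given by
$$\mathrm{Cov}(Y_i, Y_{i+1}) = \sigma_Y^2\left(\frac{\phi_1}{1-\phi_2}\right). \tag{26}$$
If we sample the initial condition for the AR(2) process (21) according to
$$\begin{pmatrix} Y_{-1}\\ Y_0 \end{pmatrix} \sim N\!\left(\begin{bmatrix} \mu\\ \mu \end{bmatrix},\; \sigma_Y^2\begin{bmatrix} 1 & \phi_1/(1-\phi_2)\\ \phi_1/(1-\phi_2) & 1 \end{bmatrix}\right),$$
then Equation (26) implies that the process starts in steady-state operation; that is, the subsequent observations $\{Y_i : i = 1, 2, \ldots\}$ generated according to Equation (21) constitute a stationary stochastic process.

Fig. 3.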
Autocorrelation function of the AR(2) process (21) with $\phi_1 = 1.8$ and $\phi_2 = -0.9$, with the lag $\ell$ on the horizontal axis and the lag-$\ell$ autocorrelation $\mathrm{Corr}(Y_i, Y_{i+\ell})$ on the vertical axis.

In the experiments reported below, we set $\phi_1 = 1.8$ and $\phi_2 = -0.9$ to obtain a damped sinusoidal autocorrelation structure of the form (25), which differs substantially from those of the AR(1) and M/M/1 queue-waiting-time processes. Figure 3 shows the autocorrelation function of this process out to lag 100; it exhibits pseudoperiodic behavior with a period of $2\pi/\theta$ time units. From the discussion on pages … of Priestley (1981), the original AR(2) process $\{Y_i\}$ also exhibits a kind of distorted periodicity with the same period. For this test process, the marginal variance of $Y_i$ was set to $\sigma_Y^2 = 1$; therefore, Equation (22) implies that $\sigma_\varepsilon^2 = 0.037/1.9 \approx 0.0195$. The shift in the mean was varied over the following multiples of the marginal standard deviation $\sigma_Y$: 0, 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, 3, and 4. Evaluation of Equation (25) reveals that the lag-one autocorrelation is $\mathrm{Corr}(Y_i, Y_{i+1}) = 1.8/1.9 \approx 0.947$. To lessen the effects of such high correlation, we ran DFTC with batch means based on batch size $m = 14$, which was calculated from Equation (16), as well as without batching. However, one can easily verify that $m = 4$ is actually the smallest batch size that reduces the lag-one autocorrelation of the batch means below 0.5, and that the batch size $m = 14$ actually reduces the lag-one autocorrelation of the batch means to approximately $-0.259$. Because Equation (16) is designed for processes whose autocorrelation structure is similar to that of an AR(1) process, it is not surprising that Equation (16) is not particularly accurate in this test process. Nevertheless, Equation (16) yielded a reasonable batch size for this test process and for all other test processes to which it has
been applied (see, for example, Steiger et al. (2005) and Lada et al. (2006)); thus we ran DFTC with batch size $m = 14$ in this experiment. For R&W, we found that $m = 6$ was the smallest batch size that reduced the lag-one autocorrelation of the batch means to a level not exceeding 0.1.

Table 6. Two-sided ARLs in terms of number of raw observations for the AR(2) process with $\phi_1 = 1.8$ and $\phi_2 = -0.9$ (columns: shift $(\mu-\mu_0)/\sigma_Y$; J&B; MFC; DFTC unbatched and with $m = 14$; R&W).

As shown in Table 6, the estimated values of $\mathrm{ARL}_0$ for J&B and MFC were close to the target value. R&W also achieved an $\mathrm{ARL}_0$ close to the target, since normality was not a problem in the AR(2) process. However, DFTC delivered $\mathrm{ARL}_0 = 7050$ without batching and $\mathrm{ARL}_0 = 899$ with batching; these in-control average run lengths were judged to be somewhat, but not drastically, lower than the target value. We conjectured that this was mainly due to the negative correlation that dominates the AR(2) process, as shown in Fig. 3. However, this loss in $\mathrm{ARL}_0$ was easily compensated by the superb performance achieved in $\mathrm{ARL}_1$ for shift sizes ranging from 0.5 to 2.0. For example, for the shift size 0.5, DFTC without (respectively, with) batching detected the shift at the 90th (respectively, the 359th) observation on average, while the other three SPC charts required 77, 553, and 6451 observations on average. For a shift size of 1.0, DFTC without (respectively, with) batching required 56 (respectively, 84) observations on average, while the other charts required 198, 139, and 55 observations on average.

5. Conclusions and recommendations

In this article we developed DFTC, a distribution-free CUSUM chart for autocorrelated data, and evaluated its performance.
From the experimental results, we concluded that DFTC reacted more quickly to meaningful shifts than other existing distribution-free CUSUM charts, at the cost of a slight deviation of the actual $\mathrm{ARL}_0$ from its target value when the correlation was high. DFTC provides a simple way to determine the control limits, and it allows the use of raw (unbatched) observations. Batching can be used to improve the accuracy of the ARL approximation from which the control limits are computed and to reduce the deviation of the actual $\mathrm{ARL}_0$ from its target value; the batch sizes required for this purpose are usually quite small, and Equation (16) provides a routinely applicable method for choosing a good batch size. In terms of chart performance, our experiments showed that for an approximately Gaussian monitored process with small to moderate lag-one correlation, DFTC generally outperformed the other existing distribution-free CUSUM-type charts and was competitive with the R&W chart. When the monitored process exhibited marked departures from normality or a pronounced dependency structure, DFTC outperformed the existing distribution-free SPC charts for autocorrelated data that we tested, including the R&W chart. The chief limitation of the experimentation reported in this article is the assumption that the marginal variance and the variance parameter are known. In many practical applications, the uncertainty about these quantities is at least as great as the uncertainty about the process mean; thus extensive follow-up analysis and experimentation are required to evaluate the performance of the selected SPC procedures for monitoring shifts in the process mean when those procedures are augmented with appropriate variance-estimation procedures.
Nevertheless, the experimental results presented in this article strongly suggest that DFTC can serve as the foundation for an SPC procedure for correlated processes that can be applied directly in practice. This is the subject of ongoing research.

Acknowledgements

The authors thank David Goldsman and the anonymous referees for several suggestions that substantially improved this paper.

References

Alexopoulos, C., Goldsman, D. and Serfozo, R.F. (2006) Stationary processes: statistical estimation, in The Encyclopedia of Statistical Sciences, 2nd edn., Balakrishnan, N., Read, C. and Vidakovic, B. (eds.), Wiley, New York, NY.
Bagshaw, M. and Johnson, R.A. (1975) The effect of serial correlation on the performance of CUSUM tests II. Technometrics, 17(1).
Ben-Gal, I., Morag, G. and Shmilovici, A. (2003) Context-based statistical process control: a monitoring procedure for state-dependent processes. Technometrics, 45(4).
Billingsley, P. (1968) Convergence of Probability Measures, Wiley, New York, NY.
Carlstein, E. (1986) The use of subseries values for estimating the variance of a general statistic from a stationary sequence. The Annals of Statistics, 14(3).
