Multivariate Process Control Chart for Controlling the False Discovery Rate

Industrial Engineering & Management Systems Vol, No 4, December 0, pp.385-389 ISSN 598-748 EISSN 34-6473 http://dx.doi.org/0.73/iems.0..4.385 0 KIIE Multivariate Process Control Chart for Controlling e False Discovery Rate Jang-Ho Park, Chi-Hyuck Jun* Department of Industrial and Management Engineering, POSTECH, Pohang, Korea (Received: November 9, 0 / Revised: November 0, 0 / Accepted: November, 0) ABSTRACT Wi e development of computer storage and e rapidly growing ability to process large amounts of data, e multivariate control charts have received an increasing attention. The existing univariate and multivariate control charts are a single hypoesis testing approach to process mean or variance by using a single statistic plot. This paper proposes a multiple hypoesis approach to developing a new multivariate control scheme. Plotted Hotelling s T statistics are used for computing e corresponding p-values and e procedure for controlling e false discovery rate in multiple hypoesis testing is applied to e proposed control scheme. Some numerical simulations were carried out to compare e performance of e proposed control scheme wi e ordinary multivariate Shewhart chart in terms of e average run leng. The results show at e proposed control scheme outperforms e existing multivariate Shewhart chart for all mean shifts Keywords: Average Run Leng, False Discovery Rate, Multivariate Shewhart Control Chart, p-value * Corresponding Auor, E-mail: chun@postech.ac.kr. INTRODUCTION Traditionally, quality is essential part of manufacturing in various industries, such as e chemical, semiconductor, automobile, computer, and cell phone industries. These days e importance of quality is emphasized also in service industries such as e banking, telecommunications, and heal care industries. Customers demand for better quality products is growing stronger especially since ey can now share e knowledge about quality rough e Internet and social networks. Therefore, quality improvement is one of e most important aspects of business. Montgomery (007) defines e term quality improvement as e reduction of variability in processes and products. There are two causes of variability in processes. The first cause is chance cause derived from random effects such as weaer conditions. Chance cause is considered as natural. The oer cause is assignable cause such as a failure in machinery or faulty raw materials. Assignable cause can be controlled by eoretical meodology. Therefore, many companies have tried to reduce e assignable cause. Alough ere are various meods to improve quality, statistical process control (SPC) is regarded as e most scientific and valid approach. In most industries, e univariate quality variable is used for separately monitoring key measurements on final products which in some way define e quality of at product (Mac- Gregor and Kourti, 995). However, e univariate control chart cannot detect wheer e variables are correlated wi each oer. To overcome is difficulty, several multivariate control charts have been proposed based on chi-squared statistics and Hotelling s T statistics. These charts can be considered an extension of each univariate control chart. The multivariate Shewhart control charts are an extension of e -control chart (Hotelling, 947) which is e simple and most widely used. Control charts can be interpreted using a p-value approach. By using a p-value approach, we get several

Park and Jun: Industrial Engineering & Management Systems Vol, No 4, December 0, pp.385-389, 0 KIIE 386 advantages over e traditional control charts. First, a p- value approach offers better graphical displays of e performance of e process and incorporates more complex control procedures (Benamini and Kling, 999). Li et al. (0) showed at we can determine how strong e signal is and how stable e process performs at a given time using univariate cumulative sum (CUSUM) charts. If we adopt a p-value approach, e control charts can be considered as a sequential single hypoesis test. Therefore, if we can establish e distribution of plotted statistics, we can control quality more easily by setting only type I error α (Lee and Jun, 00, 0). Finally, we can apply a multiple comparison procedure to control charts by testing single hypoesis simultaneously (Benamini and Kling, 999). In statistics, e multiple comparisons or multiple testing problem occurs when one considers a set of statistical inferences simultaneously (Miller, 98). Since single hypoesis testing increases e false positive rate when various hypoeses are tested, e family wise error rate (FWER) is used in multiple comparison procedures. The FWER is e probability at at least one false positive or type I error will occur among all e hypoeses tested. Many procedures to control e FWER have been proposed such as e Bonferroni, Sidak, Tukey s, Holm s step-down, and Hochberg s step-up procedures. However, ese procedures are not widely used. They give conservative results when e number of hypoeses increases. Therefore, e utility of testing decreases. To overcome e weaknesses of e FWER, Benamini and Hochberg (995) proposed using e false discovery rate (FDR). The FDR is e expected portion of false positives among all e reected hypoeses. They also proposed a procedure for controlling e FDR. There has been an effort to apply e FDR to univariate control charts. Lee and Jun (00, 0) proposed procedures to control FDR for univariate -charts and exponentially weighted moving average (EWMA) charts. They showed at by controlling e FDR, -charts and EWMA charts give better performance an traditional control charts. Grown out of ese motivations, e obective of is paper is to provide a new multivariate process chart by controlling e FDR. The remainder of is paper is organized as follows. The multivariate process control is interpreted in terms of p-values in Section. Section also proposes a new multivariate control scheme for controlling e FDR. Section 3 compares e performances of e new control schemes using numerical experiments. Finally, Section 4 gives e conclusion.. MULTIVARIATE CONTROL SCHEME FOR CONTROLLING THE FDR Multivariate Shewhart control chart (Hotelling, 947) is used for more an two quality variables or equal to two quality variables. Basically, it uses Hotelling s T. Suppose at ere are q quality variables and e total number of observations is equal to m. Also, suppose at e single observation vector follows a multivariate normal distribution wi mean vector μ 0 and covariance matrix Σ. The proposed meod controls no single observation but subgroup. If e size of subgroup is n, e mean vector of subgroup, ( =,, 3, ) is like e following equation. = n i () n i= The vector i, is i observation which is included in subgroup. Therefore, e covariance matrix of subgroup is given by n n i= ( )( ) T S =. i i () T If μ0 =( μ0, μ0,, μ0q) is e target mean quality, en e statistics of e subgroup of e Multivariate SPC chart is defined by Hotelling s T statistics as (Anderson, 958), T = n( - μ ) S ( - μ ) (3) T - 0 0 Since e control chart controls e quality when e current condition is in an in-control state, T follows e Hotelling distribution wi a degree of freedom (q, n-q). T ~ q,n- T (4) The Hotelling T distribution is related to e more familiar F distribution. The relationship between e Hotelling T and e F distributions is F = n q T (5) q( n ) The upper control limit (UCL) of a multivariate SPC chart can be set using e above relationship, while e lower control limit (LCL) is 0 and e UCL is obtained using Eq. (6). T F (6) ( ) = q n UCL ( n q) α, q, n q Here F α,q,n q is 00 α% of e critical point of e F distribution wi a degree of freedom (q, n-q). Therefore, e multivariate SPC chart indicates an out-ofcontrol state when T is greater an or equal to T UCL. The multivariate Shewhart control scheme is considered to be a sequential single hypoesis testing which is described as, H : μ = μ vs. H : μ μ (=,, ) (7) 0 0 0

Multivariate Process Control Chart for Controlling e False Discovery Rate Vol, No 4, December 0, pp.385-389, 0 KIIE 387 If e null hypoesis μ = μ 0 is true, en e statistic T follows e Hotelling T distribution wi a degree of freedom (q, n-q). Therefore, each subgroup s p- value is p { n-q } ( qn q ) = P T > T H = G T (8) 0, q(n-) where G q,n-q is e tail probability of e F distribution wi a degree of freedom (q, n-q). The control chart indicates an out-of-control state when e p-values, p, is less an or equal to a type I error α. Anderson (958) proved at if e null hypoesis is not true, e statistics T follow a generalized T distribution ( T ) wi a degree of freedom (q, n-q). He also derived e relationship between e generalized T distribution and e non-central F distribution ( F ) to be F = T (9) n-q q(n-) where e non-centrality parameter (τ) is T 0 0 τ = n ( μ μ ) ( μ μ ) (0) If τ is 0, e non-central F distribution is exactly e same as e F distribution. Step : For each subgroup, compute T statistics. Step : For each T statistic, compute e p-value. Step 3: Apply e Benamini and Hochberg procedure. ) Specify e FDR level q. ) From e current testing point t to e previous testing point t-r+, sort e p-values in increasing order. If e ordered p-values are p (), p (),, p (r- ), p (r) e corresponding hypoeses are H (), H (),, H (r-), H (r) 3) If certain p (i) (I =,, r) values satisfy p() i qi/ r, (i =,, r) (9) e current state is considered to be out-of-control and e hypoesis H (i) is reected. 3. NUMERICAL EPERIMENTS In is section, some numerical experiments are performed to compare e BH scheme wi e multivariate Shewhart control chart. A eoretical average run leng (ARL) for e multivariate Shewhart control chart is compared wi a simulated ARL for e BH scheme. Since computing e simulated ARL for e BH scheme is difficult, e Monte Carlo simulation approach is used. In is section, two- and ree-dimensional quality variables are used for experiments. 3. Two-Dimensional Quality Variables First, e eoretical ARL is compared wi e simulated ARL using e p-value approach for e conventional multivariate Shewhart control chart. For twodimensional quality variables, e mean vector (0, 0) T, covariance matrices ( 0.; 0. ), ( 0.5; 0.5 ), and ( 0.8; 0.8 ) are used to compare e eoretical ARL and e simulated ARL. For e eoretical ARL, Eqs. (9) and (0) are used. For e simulated ARL, 0,000 iterations replicated 0 times are performed to compute an average value. Various mean shift sizes (0, 0.5,,.5,,.5, 3, 4, 5) were used for experimental purposes. If a mean shift size is 0.5, a shifted process has a mean vector of (0.5, 0.5) T, and e covariance matrix is unchanged. The critical value α was set at 0.005. The results of is experiment are provided in Table. Table indicates at e simulated results are almost exactly Table. ARL of multivariate Shewhart control chart for two-dimensional quality variables Covariance matrix Mean Shift 0. 0.5 0.8 0. 0.5 0.8 Simulation Theoretical Simulation Theoretical Simulation Theoretical 0 99.6347 00.0000 99.5738 00.0000 99.6769 00.0000 0.5 74.7893 74.8447 86.6555 86.45 96.0968 96.0559.0309.035 7.6506 7.737 33.505 33.083.5 8.953 8.9563.5473.546 4.588 4.679 4.6744 4.676 5.9996 6.000 7.3766 7.3789.5.933.944 3.690 3.675 4.4844 4.4709 3.058.0665.540.5360 3.0394 3.037 4.355.3565.56.5643.7967.793 5.5.50.9.5.363.358 ARL: average run leng.

Park and Jun: Industrial Engineering & Management Systems Vol, No 4, December 0, pp.385-389, 0 KIIE 388 e same as e eoretical results. Therefore, a p-value approach is stable and appropriate for use wi a multivariate Shewhart control chart. Also, e out-of-control ARLs are different given e same mean shifts level. Table shows e difference between maximum and minimum eigenvalue for two-dimensional covariance matrix. Tables and represent at e out-of-control ARL drops sharply when e difference between maximum and minimum eigenvalues is small. In oer words, when e quality variables are independent, e control charts detect e out-of-control signal quickly. The mean vector (0, 0) T and only e covariance matrix ( 0.5; 0.5 ) are used to compare ARLs. The span size for BH scheme was r = 0, 0, 30. For multivariate Shewhart control chart ARLs, eoretical values were used. For e BH scheme simulation, 0,000 iterations replicated 0 times were used to compute average values. Various mean shift sizes (0, 0.5,,.5,,.5, 3, 4, 5) were used for experimental purposes, and e critical value α was set at 0.005. It is axiomatic at a shorter ARL corresponds to a better control chart given e same ARL 0. A trivial case is where e ARL 0 of e MSCS is /α. Some experiments show at e ARL 0 of e BH-scheme is approximately /α when e FDR level of e BH-scheme is q = α r. Therefore, in is experiment, e FDR level was set at α r. The results of is experiment are listed in Table 3. The ARL of e BH scheme is smaller an at of e multivariate Shewhart control chart for all mean shift sizes. Therefore, e BH scheme performs better for two-dimensional quality variables. Also, in e case of e BH scheme, a larger span size results in better performance when e mean shift is small. These tendencies are e same in e various covariance matrix cases. 3. Three-Dimensional Quality Variables For ree-dimensional quality variables, e same procedure is adopted to compare e performance of e BH scheme and multivariate Shewhart control chart. To compare e performance of e BH scheme wi e multivariate Shewhart control chart, e mean vector (0, 0, 0) T and only e covariance matrix ( 0.5 0.5; 0.5 0.5; 0.5 0.5 ) are used. Oer conditions such as e number of iterations, number of replications, critical level α and mean shift size were e same as ose in e two-dimensional analysis set for above. Table 4 lists e results of is experiment. The interpretation of ese results is also e same as in e case of twodimensional quality variables. In oer words, e BH scheme performs better for ree dimensional quality variables. Also, in e case of e BH scheme, a larger span size results in better performance when e mean Table. Maximum and minimum eigenvalue for two-dimensional covariance matrix Covariance matrix st Eigenvalue nd Eigenvalue 0. 0. 0. 0. 0. 0. Difference between maximum and minimum eigenvalue. 0.8 0.4.5 0.5.0.8 0..6 Table 3. ARLs of BH scheme and multivariate shewhart control chart for two-dimensional quality variables Mean shift BH scheme (r = 0) BH scheme (r = 0) BH scheme (r = 30) Multivariate Shewhart 0 00.84 00.5796 0.60 00.0000 0.5 4.9605 06.4 98.7953 3.3485 50.045 4.649 36.5857 57.5694.5 3.7864 8.4067 5.06 30.6586.840 9.705 8.9444 8.6685.5 7.80 6.4639 6.44.558 3 5.3399 4.9407 4.988 9.07 4 3.3348 3.343 3.380 5.3909 5.5065.56.55 3.6733 ARL: average run leng, BH scheme: Benamini and Hochberg scheme.

Multivariate Process Control Chart for Controlling e False Discovery Rate Vol, No 4, December 0, pp.385-389, 0 KIIE 389 Table 4. ARLs of BH scheme and multivariate Shewhart control chart for ree-dimensional quality variables Mean shift BH scheme (r = 0) BH scheme (r = 0) BH scheme (r = 30) Multivariate Shewhart 0 00.84 00.5796 0.60 00.0000 0.5 4.9605 06.4 98.7953 3.3485 50.045 4.649 36.5857 57.5694.5 3.7864 8.4067 5.06 30.6586.840 9.705 8.9444 8.6685.5 7.80 6.4639 6.44.558 3 5.3399 4.9407 4.988 9.07 4 3.3348 3.343 3.380 5.3909 5.5065.56.55 3.6733 ARL: average run leng, BH scheme: Benamini and Hochberg scheme. shift is small. 4. CONCLUSION This paper proposed a new multivariate process control scheme, which intends to control e false discovery rate. The BH procedure is incorporated to control e FDR in e sense of multiple hypoesis testing. First, some simulation studies showed at e use of p- values in multivariate Shewhart control chart is appropriate. Finally, it was shown at e proposed control scheme outperforms e conventional chart in two- and ree- dimensional quality variables in terms of ARL. ACKNOWLEDGMENTS This research was supported by Basic Science Research Program rough e National Research Foundation of Korea from e Ministry of Education, Science and Technology (Proect No. 0-000665). REFERENCES Anderson, T. W. (958), An Introduction to Multivariate Statistical Analysis, Wiley, New York, NY. Benamini, Y. and Hochberg, Y. (995), Controlling e false discovery rate: a practical and powerful approach to multiple testing, Journal of e Royal Statistical Society B Meodological, 57(), 89-300. Benamini, Y. and Kling, Y. (999), A look at statistical process control rough e p-values, Research Paper: RP-SOR-99-08, Tel Aviv University, School of Maematical Science, Israel. Hotelling, H. (947), Multivariate quality control, illustrated by e air testing of sample bombsights. In: Eisenhart, C. (ed.), Selected Techniques of Statistical Analysis for Scientific and Industrial Research, and Production and Management Engineering, Mc- Graw-Hill Books, New York, NY. Lee, S. H. and Jun, C. H. (00), A new control scheme always better an -bar chart, Communications in Statistics-Theory and Meods, 39(9), 349-3503. Lee, S. H. and Jun, C. H. (0), A process monitoring scheme controlling false discovery rate, Communications in Statistics-Simulation and Computation, 4(0), 9-90. Li, Z., Qiu, P., Chatteree, S., and Wang, Z. (0), Using p values to design statistical process control charts, Statistical Papers, -7. MacGregor, J. F. and Kourti, T. (995), Statistical process control of multivariate processes, Control Engineering Practice, 3(3), 403-44. Miller, R. G. (98), Simultaneous Statistical Inference, Springer-Verlag, New York, NY. Montgomery, D. C. (007), Introduction to Statistical Quality Control, Academic Internet Publishers, Ventura, CA.