Journal of Industrial and Intelligent Information Vol. 4, No. 2, March 2016

Using a Computational Intelligence Hybrid Approach to Recognize the Faults of Variance Shifts for a Manufacturing Process

Yuehjen E. Shao
Department of Statistics and Information Science, Fu Jen Catholic University, New Taipei City, Taiwan
Email: stat3@mail.fju.edu.tw

Abstract: A Statistical Process Control (SPC) chart is effective in monitoring a process. When an SPC chart monitors a univariate process, it is not difficult to determine the assignable causes, because a univariate SPC chart monitors only a single quality characteristic. However, when a Multivariate Statistical Process Control (MSPC) chart is used to monitor a multivariate process, it is complicated to determine which quality characteristic(s) are at fault. This study proposes a hybrid classification model to recognize the quality characteristic(s) at fault when variance shifts occur in a multivariate process. The proposed mechanism hybridizes an Artificial Neural Network (ANN) and analysis of variance (ANOVA). The performance of the proposed approach is evaluated by conducting a series of experiments.

Index Terms: variance shift, hybrid, multivariate statistical process control chart

I. INTRODUCTION

When univariate processes are monitored, the detecting capability of Statistical Process Control (SPC) charts typically makes it easier and quicker to determine the sources of faults. This is because a univariate process has only one quality variable. The out-of-control signal simply implies that the single quality variable is at fault, and the process personnel only need to pay attention to remedying this single variable. There are, however, two or more quality variables involved in a multivariate process.
The Multivariate Statistical Process Control (MSPC) signal only indicates that process faults have occurred; it does not report which quality variable or group of variables causes those faults. Before taking correct remedial actions, the process personnel first have to identify the quality variables at fault. However, this identification is not straightforward in practice. The more quality variables a multivariate process involves, the more difficult the identification becomes.

Manuscript received April 2, 2015; revised November 2, 2015. doi: 10.18178/jiii.4.2.131-135

When monitoring a multivariate process, an out-of-control multivariate SPC signal indicates that process disturbances have occurred. The process personnel should start searching for the possible root causes of the problems. However, the process personnel are usually only aware that the underlying process is in an unsteady state. Determining which of the monitored quality variables, or which set of them, is responsible for this signal is a very difficult task. Accordingly, effective determination of the source of process faults has become a challenging issue in the process industries.

Many studies investigate the sources of a process fault. Graphical approaches, such as polygonal charts [1], line charts [2], multivariate profile charts [3] and boxplot charts [4], were employed to assist in determining the quality variables at fault in a process. However, these graphical approaches are tedious to operate and subjective. Statistical decomposition methods were then used to interpret the contributors to an SPC signal. For example, Mason et al. [5] proposed a useful approach to decompose the $T^2$ statistic into independent parts, each of which reflects the contribution of an individual quality variable. In addition, the techniques proposed in [6]-[8] applied the same concept to decompose the $T^2$ statistic.
However, these methods have not been analyzed in terms of the percentage of success in classifying the variables that have actually shifted in the process [9], [10]. The Principal Components Analysis (PCA) method was investigated to determine the quality variables at fault in a multivariate process [11]. However, it may be argued that PCA, being a linear transformation, may not reduce the dimensionality of the data efficiently. A further problem with PCA is that the directions maximizing variance do not always maximize information.

More recently, some computational intelligence approaches have been used to determine which of the quality variables is responsible for the SPC signal. Comparative studies have been conducted in [9], [10]. Both studies concluded that computational intelligence methods perform better than the decomposition approach. In addition, a two level-based model for identifying the sources of out-of-control signals was developed [12]. An ANN-based model was also proposed to identify and quantify the mean shifts for bivariate processes [13]. Another study proposed an ANN-based identifier to detect the mean shifts and simultaneously identify the sources of the shifts for multivariate processes [14]. The sources of
process variance faults have been investigated with the use of ANN and Support Vector Machine (SVM); however, only large process variance shifts were considered [15]. A hybrid model for on-line analysis of SPC signals in multivariate manufacturing processes was developed [16], [17]. The identification of which monitored quality variables are responsible for the MSPC signal has been investigated [18]; however, that study focused on mean-shift faults in a multivariate normal process. Another study combined independent component analysis and SVM to determine the fault quality variables when a step-change disturbance existed in a multivariate process [19]. In addition, and different from the present study, an integrated approach of ANOVA, ANN, and an identification strategy was used to identify the starting point of variance-shift faults in a multivariate process [20].

While most of the related studies use the process mean shift as a fault, this study considers a process variance shift as the process fault. The present study is concerned with a multivariate process with seven quality characteristics that is monitored by the $|S|$ chart, an MSPC chart. Additionally, this study assumes that the process covariance matrix has shifted from $\Sigma_0$ when the process fault has occurred. There are 29 input variables to be considered in this study. It is not practical to use all 29 variables as the inputs for the Artificial Neural Network (ANN) classifier. Consequently, this study uses a hybrid technique to select fewer but more significant explanatory variables. This is the initial phase of building the proposed scheme. The fewer but significant variables then serve as the inputs for the ANN models. This modeling is the second phase of the hybrid scheme.

The structure of this study is organized as follows. Section 2 introduces the process models and the $|S|$ chart.
Section 3 presents the methodologies used in the present study. The simulation results are addressed in Section 4. The final section provides the research findings and future directions.

II. THE MODEL

Assume that the multivariate process is initially in control, and that the sample observations can be obtained from an unknown distribution $F$ with a known mean vector $\mu_0$ and covariance matrix $\Sigma_0$. After a variance-shift fault has occurred at a certain time, the present study assumes that the process covariance matrix changes from $\Sigma_0$ to $\Sigma_1$ [20], where

$$\Sigma_0 = \begin{bmatrix} \sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,p} \\ \sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p,1} & \sigma_{p,2} & \cdots & \sigma_{p,p} \end{bmatrix},$$

and $\Sigma_1$ has the same layout, partitioned after row $i$ and column $j$, with the entries that belong to the shifted quality characteristics inflated by $\theta$, where $\theta$ is the inflated ratio.

Let the sample variance-covariance matrix in subgroup $i$ be defined as follows:

$$S_i = \frac{1}{n-1}\sum_{j=1}^{n}\left(X_{i,j}-\bar{X}_i\right)\left(X_{i,j}-\bar{X}_i\right)^{T} = \begin{bmatrix} S_{i,1,1} & S_{i,1,2} & \cdots & S_{i,1,p} \\ S_{i,2,1} & S_{i,2,2} & \cdots & S_{i,2,p} \\ \vdots & \vdots & \ddots & \vdots \\ S_{i,p,1} & S_{i,p,2} & \cdots & S_{i,p,p} \end{bmatrix}.$$

In addition, the sample generalized variances $|S_i|$, $i = 1, 2, \ldots$, and the following Upper Control Limit (UCL) and Lower Control Limit (LCL) are used to monitor a multivariate process variance shift [21]:

$$\mathrm{UCL} = |\Sigma_0|\left(b_1 + 3\,b_2^{1/2}\right), \qquad \mathrm{LCL} = \max\left(0,\ |\Sigma_0|\left(b_1 - 3\,b_2^{1/2}\right)\right),$$

where $|\Sigma_0|$ is the determinant of $\Sigma_0$, and

$$b_1 = \frac{1}{(n-1)^{p}}\prod_{i=1}^{p}(n-i), \qquad b_2 = \frac{1}{(n-1)^{2p}}\prod_{i=1}^{p}(n-i)\left[\prod_{i=1}^{p}(n-i+2) - \prod_{i=1}^{p}(n-i)\right].$$

III. THE METHODOLOGIES

The proposed hybrid approach combines ANOVA and an ANN. First, a one-way ANOVA test is applied to select important, influential variables. Then, the selected significant variables serve as the input variables for the ANN classifier.

A. ANOVA Modeling

The purpose of performing a one-way ANOVA in the initial phase is to determine whether data from the in-control and out-of-control groups have a common mean, i.e., to determine whether the measured characteristics from the in-control and out-of-control groups are actually different. Because the matrix $S_i$ is symmetric, only the elements on and above the diagonal need to be examined by the one-way ANOVA.
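As a rough illustration of this screening step, the following Python sketch computes the one-way ANOVA F statistic for each candidate variable observed in an in-control group and an out-of-control group, and keeps the variables whose statistic exceeds a threshold. It is only a sketch under stated assumptions: the function names, the toy data, and the critical value `f_crit` supplied by the caller are hypothetical and are not taken from the study.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA; `groups` holds one list of values per level."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Between-level and within-level sums of squares.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    # Note: raises ZeroDivisionError if every group is constant (ss_within == 0).
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def select_variables(in_control, out_of_control, f_crit):
    """Indices of variables whose F statistic exceeds f_crit (hypothetical helper).

    Each argument is a list of observations; each observation is a list
    holding one value per candidate variable.
    """
    selected = []
    for j in range(len(in_control[0])):
        level_1 = [obs[j] for obs in in_control]       # in-control group
        level_2 = [obs[j] for obs in out_of_control]   # out-of-control group
        if one_way_anova_f([level_1, level_2]) > f_crit:
            selected.append(j)
    return selected


# Toy data (assumed): variable 0 differs between the groups, variable 1 does not.
in_ctrl = [[1.0, 2.0], [0.9, 2.1], [1.1, 1.9], [1.0, 2.0]]
out_ctrl = [[1.5, 2.0], [1.4, 1.9], [1.6, 2.1], [1.5, 2.0]]

# 5.99 is roughly the 5% critical value of F(1, 6); the level is an assumption.
print(select_variables(in_ctrl, out_ctrl, f_crit=5.99))
```

With only two levels, the F statistic equals the square of the pooled two-sample t statistic, so the same screening could equally be phrased as a t test.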
To simplify the notation, let

$$Y_{i,1} = S_{i,1,1},\ Y_{i,2} = S_{i,1,2},\ \ldots,\ Y_{i,p} = S_{i,1,p},\ Y_{i,p+1} = S_{i,2,2},\ Y_{i,p+2} = S_{i,2,3},\ \ldots,\ Y_{i,N-1} = S_{i,p,p},\ Y_{i,N} = |S_i|,$$

where $N = 1 + p(p+1)/2$. We also let $Y_{i,j,k,l}$ be the $l$-th observation at the $k$-th level of the factor (where level 1 represents an in-control group and level 2 represents an out-of-control group) for the variable $Y_{i,j}$ mentioned above, $i = 1, 2, \ldots, T$; $j = 1, 2, \ldots, N$; $k = 1, 2$; $l = 1, 2, \ldots, n_{i,j,k}$. Let
$\mu_{i,j}$ and $\tau_{i,j,k}$ be the corresponding overall mean and treatment effect, respectively. Accordingly, the linear equation for the one-way ANOVA model is

$$Y_{i,j,k,l} = \mu_{i,j} + \tau_{i,j,k} + \varepsilon_{i,j,k,l}, \qquad k = 1, 2;\ l = 1, 2, \ldots, n_{i,j,k};\ i = 1, 2, \ldots, T;\ j = 1, 2, \ldots, N.$$

To identify significant variables, an F-test statistic is used to test the differences between the in-control and out-of-control groups. The significant variables selected in this phase are then fed into the ANN to construct a hybrid model.

B. ANN Modeling

The use of ANN for forecasting and modeling has received considerable interest. ANN possesses the following characteristics. First, ANN is a data-driven or model-free approach in which fewer assumptions are needed. ANN modeling is able to develop an internal representation of the relationship between the data or variables. Accordingly, ANN modeling does not require assumptions about the nature of the distribution of the data. In contrast, MR modeling is sometimes criticized for its strong model assumptions, such as variance homogeneity, which limit its application. Second, ANN has excellent learning capability. After fitting the sample data, an ANN can often capture the natural structure effectively and correctly infer the unseen population. This feature comes from the ANN's parallel processing capability, which enables it to approximate a large class of functions with a high degree of accuracy. The last characteristic is that ANNs are nonlinear mapping mechanisms.

ANN is a parallel system comprised of highly interconnected processing elements that are based on neurobiological models. ANN processes information through the interactions of a large number of simple processing elements, the neurons. The neural network's nodes are usually divided into three layers: the input, the output and the hidden layers.
The nodes in the input layer receive input signals from an external source, and the nodes in the output layer provide the target output signals. ANN modeling can be described briefly as follows. The relationship between the output ($y$) and the inputs ($x_1, x_2, \ldots, x_a$) in an ANN model can be formed as:

$$y = \alpha_0 + \sum_{j=1}^{b} \alpha_j\, g\!\left(\beta_{0j} + \sum_{i=1}^{a} \beta_{ij} x_i\right) + \varepsilon,$$

where $\alpha_j$ ($j = 0, 1, 2, \ldots, b$) and $\beta_{ij}$ ($i = 0, 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$) are model connection weights; $a$ is the number of input nodes; $b$ is the number of hidden nodes, and $\varepsilon$ is the error term. The transfer function in the hidden layer is often represented by a logistic function,

$$g(z) = \frac{1}{1 + \exp(-z)}.$$

IV. RESULTS AND ANALYSIS

A series of simulations was conducted in order to show the performance of the proposed approach. Without loss of generality, this study assumed that each quality characteristic was initially sampled from a normal distribution with zero mean and one standard deviation. Because we considered 7 quality characteristics for the multivariate normal process, there are $2^7 - 1$ possible types of variance shifts. They are represented by (1, 0, ..., 0), (0, 1, 0, ..., 0), ..., and (1, 1, ..., 1), where 1 denotes a quality characteristic that is at fault and 0 denotes a quality characteristic that is not at fault. For an abnormal variance vector structure, we considered the two most common cases of variance shifts for demonstration: (1, 0, 0, 0, 0, 0, 0) and (1, 1, 0, 0, 0, 0, 0). This study also considered four different values of $\theta$ in the process models, namely $\theta = 1.2$, $\theta = 1.4$, $\theta = 1.6$, and $\theta = 1.8$. The sample size was assumed to be .

Now, consider the case of a multivariate process with seven quality characteristics that is monitored by the $|S|$ chart. The sample variance-covariance matrix can be described as follows:

$$S = \begin{bmatrix} S_{11} & S_{12} & \cdots & S_{17} \\ S_{21} & S_{22} & \cdots & S_{27} \\ \vdots & \vdots & \ddots & \vdots \\ S_{71} & S_{72} & \cdots & S_{77} \end{bmatrix}.$$

In this condition, we have 29 candidate variables from the sample variance-covariance matrix: the 28 elements on and above its diagonal, plus its determinant $|S|$.
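To make the count of 29 concrete, the sketch below builds the candidate variables for one subgroup: the elements on and above the diagonal of the sample variance-covariance matrix, followed by its determinant. It is a minimal, standard-library-only illustration; the helper names, the subgroup size of 10, and the seeded standard normal toy data are assumptions made for the example, not settings reported by the study.

```python
import random


def sample_covariance(rows):
    """Sample variance-covariance matrix (divisor n - 1) of an n x p subgroup."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size, det = len(m), 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def candidate_features(rows):
    """Upper-triangular entries of S (row by row), followed by |S|."""
    s = sample_covariance(rows)
    p = len(s)
    upper = [s[a][b] for a in range(p) for b in range(a, p)]
    return upper + [determinant(s)]


# Assumed toy subgroup: 10 observations of 7 independent standard normal variables.
rng = random.Random(0)
subgroup = [[rng.gauss(0.0, 1.0) for _ in range(7)] for _ in range(10)]
features = candidate_features(subgroup)
print(len(features))  # 7 * 8 / 2 + 1 = 29
```

The ordering used here (first row of S, then the remaining part of the second row, and so on, with the determinant last) mirrors the ordering of the variables Y defined in the ANOVA subsection.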
TABLE I. VARIABLE SELECTION RESULTS BY ANOVA

θ     Variables selected
1.2   S11, S12, S13, S14, S15, S16, S17, S22, S33, S45, S66, |S| (i.e., determinant of S)
1.4   S11, S12, S13, S14, S15, S16, S17, S22, S23, S24, S25, S26, S27, S33, S44, S66, |S|
1.6   S11, S12, S13, S14, S15, S16, S17, S22, S23, S24, S25, S26, S27, S33, S36, S37, S44, S45, S66, |S|
1.8   S11, S12, S13, S14, S15, S16, S17, S22, S23, S24, S25, S26, S27, S33, S35, S36, S37, S44, S45, S66, |S|

TABLE II. TRAINING AND TESTING DATA STRUCTURE FOR ANN

Variables at fault   Output node (Y)   Number of observations
None                 0                 300
X1                   1                 300
X1 and X2            2                 300

Table I shows the variable selection results of the first phase, obtained by performing ANOVA. For example, when θ = 1.2, the variables S11, S12, S13, S14, S15, S16, S17, S22, S33, S45, S66 and |S| are selected after running the ANOVA procedure. In addition, since the quality variables at fault would not be too many in a multivariate process, this study considers that the quality variables at fault are either X1 alone or X1 and X2. The training and testing data structures used for the ANN are displayed in Table II. In this study, the training data sets include 900 data vectors. The first 300 data vectors are all from an in-control state, data vectors 301 to 600 are from the out-of-control state where X1 is at fault, and the last
300 data vectors are from the out-of-control state where the two quality variables (i.e., X1 and X2) are at fault. The structure of the testing data sets is the same as that of the training data sets.

For a typical single-stage ANN model, we have 29 input nodes. In our proposed hybrid models, we have 12, 17, 20 and 21 input nodes for the ANOVA-ANN models with θ = 1.2, θ = 1.4, θ = 1.6 and θ = 1.8, respectively. For all the models, there is only one output node (Y). This output node indicates the classification result for the process status: a value of 0 implies that the process is in control, a value of 1 implies that the process is out of control with X1 at fault, and a value of 2 implies that the process is out of control with X1 and X2 at fault.

Table III displays the simulation results for the Accurate Identification Rates (AIR) of the typical and the proposed approaches. From Table III, the proposed hybrid approach outperforms the typical ANN alone in six of the eight cases; the exceptions are X1 at fault with θ = 1.6 and θ = 1.8. For example, in the case of θ = 1.2 with the variables X1 and X2 at fault, the AIRs are 0.5333 and 0.6333 for the typical and proposed approaches, respectively. This is a relative improvement of almost 19%.

TABLE III. AIR FOR CONVENTIONAL AND PROPOSED APPROACHES

θ     Variable(s) at fault   Conventional approach   Proposed hybrid approach
1.2   X1                     57.67%                  72.00%
1.2   X1 and X2              53.33%                  63.33%
1.4   X1                     58.33%                  66.00%
1.4   X1 and X2              62.67%                  63.67%
1.6   X1                     65.00%                  64.00%
1.6   X1 and X2              72.67%                  74.33%
1.8   X1                     76.67%                  74.67%
1.8   X1 and X2              82.00%                  87.67%

Table IV shows the overall performance, in terms of AIR and the associated standard deviation, for the proposed hybrid and conventional approaches. Table IV implies that the quality variable X1 at fault can be accurately recognized with a 64.42% chance by using the conventional ANN approach. The associated standard deviation is 8.81%. Additionally, it indicates that the quality variable X1 at fault can be accurately recognized with a 69.17% chance by using the proposed hybrid model.
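The summary figures can be checked directly from the per-θ rates in Table III. The short sketch below, using only the Python standard library, recomputes the mean AIR and its standard deviation for the case where X1 is at fault, as well as the relative improvement quoted for θ = 1.2. The variable names are illustrative, and the use of the sample (n - 1) standard deviation is an assumption about how the summary values were computed.

```python
from statistics import mean, stdev

# AIR (%) for X1 at fault at theta = 1.2, 1.4, 1.6, 1.8, as read from Table III.
conventional_x1 = [57.67, 58.33, 65.00, 76.67]
proposed_x1 = [72.00, 66.00, 64.00, 74.67]


def relative_change(old, new):
    """Change from old to new, expressed as a fraction of old."""
    return (new - old) / old


conv_mean, prop_mean = mean(conventional_x1), mean(proposed_x1)
conv_sd, prop_sd = stdev(conventional_x1), stdev(proposed_x1)  # sample std (n - 1)

# Relative AIR improvement for theta = 1.2 with X1 and X2 at fault.
gain_theta_12 = relative_change(0.5333, 0.6333)

print(f"mean AIR: conventional {conv_mean:.2f}%, proposed {prop_mean:.2f}%")
print(f"std dev:  conventional {conv_sd:.2f}%, proposed {prop_sd:.2f}%")
print(f"overall AIR gain: {relative_change(conv_mean, prop_mean):.2%}")
print(f"std dev change:   {relative_change(conv_sd, prop_sd):.2%}")
print(f"gain at theta = 1.2 (X1 and X2): {gain_theta_12:.2%}")
```

Under that assumption the script reproduces the overall AIR and standard deviation values reported for X1, which suggests that each Table IV entry is simply the average over the four values of θ.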
The standard deviation associated with the proposed hybrid model is 5.00%, which is smaller than that of the conventional approach. As a result, a 7.37% AIR improvement can be achieved by using the proposed approach over the conventional approach, and the associated standard deviation is reduced by 43.26%. Also, when the quality variables X1 and X2 are both at fault, they can be accurately recognized with 67.67% and 72.25% chances by employing the conventional and proposed hybrid approaches, respectively. The associated standard deviations are 12.40% and 11.48% for the conventional and proposed approaches, respectively. Accordingly, a 6.77% AIR improvement can be achieved by using the proposed approach over the conventional approach. The associated standard deviation is improved by 7.4%.

TABLE IV. OVERALL PERFORMANCE FOR CONVENTIONAL AND PROPOSED APPROACHES

Approach                   Variable(s) at fault   AIR      Standard deviation
Conventional approach      X1                     64.42%   8.81%
Conventional approach      X1 and X2              67.67%   12.40%
Proposed hybrid approach   X1                     69.17%   5.00%
Proposed hybrid approach   X1 and X2              72.25%   11.48%

V. CONCLUSION

In this study, a hybrid approach is proposed to overcome the difficulties of determining the quality variables at fault when a variance shift has occurred in a multivariate process. The proposed ANOVA-ANN models not only have fewer input variables but also achieve better overall AIRs. The hybrid model proposed in this study is not the only possible combination of techniques. Other data mining techniques, such as multivariate adaptive regression splines or genetic algorithms, can be combined with a support vector machine to further refine the structure of the classifiers. Extensions of the proposed procedures to data-driven design or real-time implementation of fault tolerant control systems are possible. Such work deserves future attention.

ACKNOWLEDGMENT

This work is partially supported by the Ministry of Science and Technology of the Republic of China, Grant No. MOST 3-222-E-3-2.

REFERENCES

[1] L. W. Blazek, B. Novic, and M. D. Scott, "Displaying multivariate data using polyplots," Journal of Quality Technology, vol. 19, no. 2, pp.
69-74, 1987.
[2] N. Subramanyan and A. A. Houshmand, "Simultaneous representation of multivariate and corresponding univariate x-bar charts using a line graph," Quality Engineering, vol. 7, no. 4, pp. 681-682, 1995.
[3] C. Fuchs and Y. Benjamini, "Multivariate profile charts for statistical process control," Technometrics, vol. 36, no. 2, pp. 182-195, 1994.
[4] O. O. Atienza, L. T. Ching, and B. A. Wah, "Simultaneous monitoring of univariate and multivariate SPC information using boxplots," International Journal of Quality Science, vol. 3, no. 2, pp. 194-204, 1998.
[5] R. L. Mason, N. D. Tracy, and J. C. Young, "Decomposition of T2 for multivariate control chart interpretation," Journal of Quality Technology, vol. 27, pp. 99 5, 1995.
[6] R. L. Mason, N. D. Tracy, and J. C. Young, "A practical approach for interpreting multivariate T2 control chart signals," Journal of Quality Technology, vol. 29, pp. 396-406, 1997.
[7] N. H. Timm, "Multivariate quality control using finite intersection tests," Journal of Quality Technology, vol. 28, pp. 233-243, 1996.
[8] G. C. Runger, F. B. Alt, and D. C. Montgomery, "Contributors to a multivariate statistical process control signal," Communications in Statistics-Theory and Methods, vol. 25, pp. 2203-2213, 1996.
[9] F. Aparisi, G. Avendaño, and J. Sanz, "Techniques to interpret T2 control chart signals," IIE Transactions, vol. 38, no. 8, pp. 647-657, 2006.
[10] Y. E. Shao and B. S. Hsu, "Determining the contributors for a multivariate SPC chart signal using artificial neural networks and support vector machine," International Journal of Innovative Computing, Information and Control, vol. 5, pp. 4899-4906, 2009.
[11] T. Kourti and J. F. MacGregor, "Multivariate SPC methods for process and product monitoring," Journal of Quality Technology, vol. 28, pp. 409-428, 1996.
[12] S. T. A. Niaki and B. Abbasi, "Fault diagnosis in multivariate control charts using artificial neural networks," Quality and Reliability Engineering International, vol. 21, no. 8, pp. 825-840, 2005.
[13] R. S. Guh, "On-line identification and quantification of mean shifts in bivariate processes using a neural network-based approach," Quality and Reliability Engineering International, vol. 23, pp. 367-385, 2007.
[14] H. B. Hwarng and Y. Wang, "Shift detection and source identification in multivariate autocorrelated processes," International Journal of Production Research, vol. 48, no. 3, pp. 835-859, 2010.
[15] C. S. Cheng and H. P. Cheng, "Identifying the source of variance shifts in the multivariate process using neural network and support vector machines," Expert Systems with Applications, vol. 35, pp. 198-206, 2008.
[16] M. Salehi, R. B. Kazemzadeh, and A. Salmasnia, "On line detection of mean and variance shift using neural networks and support vector machine in multivariate processes," Applied Soft Computing, vol. 12, no. 9, pp. 2973-2984, 2012.
[17] M. Salehi, A. Bahreininejad, and I. Nakhai, "On-line analysis of out-of-control signals in multivariate manufacturing processes using a hybrid learning-based model," Neurocomputing, vol. 74, pp. 2083-2095, 2011.
[18] Y. E. Shao and C. D. Hou, "Hybrid artificial neural networks modeling for faults identification of a stochastic multivariate process," Abstract and Applied Analysis, vol. 2013, 2013.
[19] Y. E. Shao, C. J. Lu, and Y. C.
Wang, "A hybrid ICA-SVM approach for determining the fault quality variables in a multivariate process," Mathematical Problems in Engineering, vol. 2012, pp. 1-12, 2012.
[20] Y. E. Shao and C. D. Hou, "Fault identification in industrial processes using an integrated approach of neural network and analysis of variance," Mathematical Problems in Engineering, vol. 2013, pp. 1-7, 2013.
[21] F. B. Alt, "Multivariate quality control," in Encyclopedia of Statistical Sciences, N. L. Johnson and S. Kotz, Eds. New York, NY, USA: John Wiley & Sons, 1985, vol. 6.

Yuehjen E. Shao is a Professor in the Department of Statistics and Information Science at Fu Jen Catholic University, New Taipei City, Taiwan. He holds a Ph.D. in Decision Sciences and Engineering Systems from Rensselaer Polytechnic Institute, New York, U.S.A. His present research interests include statistical process control, data pattern recognition and classification, forecasting methods and big data.