Data reduction for multivariate analysis
The T² chart, MCUSUM, and MEWMA can handle multivariate detection cases. But when the characteristic vector x of interest is of high dimension, the detection task becomes difficult. In high dimensions the noise components can add up to a large magnitude, even if each individual component is relatively small. As a result, the aggregated noise effect can overwhelm the signal effects and make it harder to reject the null hypothesis. This is known as the "curse of dimensionality." On the other hand, it may not be necessary to use the original high-dimensional data vector to perform a detection task. By the principle of effect sparsity, it is the "vital few" rather than the "trivial many" that matter. If one can extract the so-called "vital few," then the detection task can be conducted in a (much) lower data dimension. This process of mapping a high-dimensional data vector to a low-dimensional set of "vital few" features is data reduction.
We introduce principal component analysis (PCA) as the data reduction tool here. Basic idea: looking at a 2-D data cloud, one can easily notice that the variance along certain directions is larger than along others. If a method can identify the major directions where most of the variability exists, those directions carry the "vital few" effects to be monitored. The set of transformed variables along these "vital few" directions is called the "principal components (PCs)."
Definition of principal component (PC)
- A short answer: a PC is a particular linear combination of the elements of the original p×1 vector x that has the largest variation.
- A formal mathematical definition follows.
Definition of principal component (PC)
- PCA finds the coefficient vectors a_i such that the variances of the linear combinations y_i = a_i'x are maximized (the optimization problem is sketched below).
How to find the PCs?
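One standard way to state the formal definition, writing Σ for the covariance matrix of x (the notation here is assumed and may differ from the original slide): the i-th PC is the linear combination

\[
y_i = \mathbf{a}_i^{\prime}\mathbf{x}, \qquad
\mathbf{a}_i = \arg\max_{\mathbf{a}}\; \operatorname{Var}(\mathbf{a}^{\prime}\mathbf{x})
             = \arg\max_{\mathbf{a}}\; \mathbf{a}^{\prime}\boldsymbol{\Sigma}\,\mathbf{a},
\]
\[
\text{subject to } \mathbf{a}^{\prime}\mathbf{a} = 1
\quad\text{and}\quad
\operatorname{Cov}(\mathbf{a}^{\prime}\mathbf{x},\, y_k) = 0 \ \text{ for } k = 1,\dots,i-1 .
\]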
PCA: find the a_i's that give us the PCs.
Recall Result 3.6, the spectral decomposition: E, the eigenvector matrix, can transform a set of correlated variables into a set of uncorrelated variables. As it turns out, the transformed variables also have the largest variability along their corresponding directions. So we should set a_i = e_i, the i-th eigenvector.
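For reference, the spectral decomposition being invoked can be written as (notation assumed):

\[
\boldsymbol{\Sigma} \;=\; \mathbf{E}\,\boldsymbol{\Lambda}\,\mathbf{E}^{\prime}
\;=\; \sum_{i=1}^{p} \lambda_i\, \mathbf{e}_i \mathbf{e}_i^{\prime},
\qquad
\mathbf{E} = [\mathbf{e}_1,\dots,\mathbf{e}_p],\quad
\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1,\dots,\lambda_p),
\]

so that the transformed vector y = E'x satisfies Cov(y) = E'ΣE = Λ; that is, the components of y are uncorrelated with Var(y_i) = λ_i.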
Result 3.7 (principal component analysis):
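A sketch of the standard PCA result in the notation above (stated here as the usual textbook form, not necessarily the exact wording of Result 3.7):

\[
\text{Let } (\lambda_1,\mathbf{e}_1),\dots,(\lambda_p,\mathbf{e}_p) \text{ be the eigenvalue--eigenvector pairs of } \boldsymbol{\Sigma},\quad \lambda_1 \ge \cdots \ge \lambda_p \ge 0.
\]
\[
\text{Then } y_i = \mathbf{e}_i^{\prime}\mathbf{x} \text{ is the } i\text{-th PC, with }
\operatorname{Var}(y_i) = \lambda_i,\qquad \operatorname{Cov}(y_i, y_k) = 0 \ (i \neq k),
\]
\[
\sum_{i=1}^{p}\operatorname{Var}(x_i) \;=\; \sum_{i=1}^{p}\lambda_i,
\qquad
\text{proportion of total variance due to } y_k \;=\; \frac{\lambda_k}{\sum_{i=1}^{p}\lambda_i}.
\]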
Example 3.5: Find the eigenvalues/eigenvectors of the covariance matrix of x
- Use the MATLAB function eig(), but notice that eig() arranges the eigenvalues in ascending order, so they need to be reordered before use.
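A minimal MATLAB sketch of this step (the covariance matrix Sigma below is a placeholder with made-up values, not the actual matrix from Example 3.5):

    % Placeholder covariance matrix (not the Example 3.5 matrix)
    Sigma = [4 2 0; 2 3 1; 0 1 2];

    % eig() returns the eigenvalues in ascending order ...
    [E, D] = eig(Sigma);
    lambda = diag(D);

    % ... so reorder the eigenvalues (and eigenvectors) in descending order
    [lambda, idx] = sort(lambda, 'descend');
    E = E(:, idx);

    % The i-th principal component of an observation x is y_i = E(:,i)' * x
    x = [1; 0; 2];          % an example observation
    y = E' * x;             % vector of principal component scores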
Example 3.5: Principal components
Example 3.5: a graphic illustration
PCA can also be applied to a correlation matrix.
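A minimal MATLAB sketch of this variant, assuming we start from a covariance matrix S (continuing the placeholder example above): converting S to the correlation matrix R amounts to standardizing the variables before applying PCA.

    % Convert a covariance matrix S to the correlation matrix R
    S = [4 2 0; 2 3 1; 0 1 2];          % placeholder covariance matrix
    d = sqrt(diag(S));                   % standard deviations of the variables
    R = diag(1./d) * S * diag(1./d);     % R(i,j) = S(i,j) / (d(i)*d(j))

    % PCA on R: eigenvalues/eigenvectors of the correlation matrix
    [E_R, D_R] = eig(R);
    [lambda_R, idx] = sort(diag(D_R), 'descend');
    E_R = E_R(:, idx);

    % Note: the PCs of R are generally different from the PCs of S.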
Revisit Example 3.5.
More remarks
After applying PCA to the original data set, we have the same number of PCs as the number of elements in the original vector. To reduce the data dimension, we retain only the first few principal components, corresponding to the largest eigenvalues. So the question is how to decide the number of PCs to be kept? Two graphical tools are listed below (a code sketch follows the figure).
- Pareto plot of eigenvalues: With the eigenvalues ordered from largest to smallest, select the first m eigenvalues (and the corresponding PCs) if their aggregated effect explains more than a certain percentage (say, 85%) of the total variation in the data.
- Scree plot: With the eigenvalues ordered from largest to smallest, a scree plot is a plot of λ_i versus i. We look for an elbow (bend) in the plot. The number of components is taken to be the point at which the remaining eigenvalues are relatively small and all about the same size.
Pareto and scree plots
[Figure: two panels, a Pareto plot and a scree plot of the eigenvalues]
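A minimal MATLAB sketch of the two plots, assuming the descending eigenvalues are already in the vector lambda from the earlier snippet:

    % lambda: eigenvalues sorted in descending order
    pct = 100 * lambda / sum(lambda);     % percentage of total variance per PC
    cum_pct = cumsum(pct);                % cumulative percentage explained

    % Pareto plot: bars of individual percentages with the cumulative curve
    figure;
    bar(pct); hold on;
    plot(cum_pct, '-o');
    yline(85, '--');                      % e.g., keep PCs until 85% is explained
    xlabel('principal component'); ylabel('% variance explained');

    % Scree plot: eigenvalue versus its index; look for an elbow
    figure;
    plot(lambda, '-o');
    xlabel('principal component'); ylabel('eigenvalue');

    % Number of PCs needed to reach the 85% threshold
    m = find(cum_pct >= 85, 1);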
So the question is how to decide the number of PCs to be kept?
- Minimum description length (MDL) criterion (a more objective criterion): select the number of PCs that minimizes the MDL value (see the sketch below).
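A widely used form of the criterion (due to Wax and Kailath, 1985), stated here only as a plausible version of the slide's formula, chooses the m that minimizes

\[
\mathrm{MDL}(m) \;=\; -\,n\,(p-m)\,
\ln\!\left[
\frac{\left(\prod_{i=m+1}^{p}\hat{\lambda}_i\right)^{1/(p-m)}}
     {\frac{1}{p-m}\sum_{i=m+1}^{p}\hat{\lambda}_i}
\right]
\;+\;\tfrac{1}{2}\,m\,(2p-m)\,\ln n ,
\]

where the \(\hat{\lambda}_i\) are the sample eigenvalues in descending order, p is the data dimension, and n is the sample size; the ratio inside the logarithm compares the geometric and arithmetic means of the smallest p − m eigenvalues.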
Data reduction: Example 3.6
Example 3.6: data reduction and detection in a forging process. Data are obtained by strain sensors mounted on the supporting pillars of a forging press. They are in the form of profile signals (four of these are displayed in an earlier slide of Chapter 3). Each profile is digitized into a vector of dimension p = 224. The historical dataset has a total of n = 530 profile signals. The data set is denoted as {x_i}, i = 1, ..., 530, and each x_i is a 224×1 vector.
[Figure: schematic of the forging press, with labeled parts (tie rod, flywheel, crown, bearing, punch speed, shut height, upright, bolster, die, linkage, gib, slide, bed) and the tonnage sensors mounted on the uprights]
Data reduction: Example 3.6
Example 3.6: here the objective is to perform a Phase I analysis that separates the in-control data from the rest of the data. Compute the sample statistics (the sample mean and the sample covariance matrix S), and perform a PCA on S (a substitute for the unknown Σ).
- Using the MDL criterion, 33 eigenvalues are kept.
[Figure: MDL values versus the number of principal components, 0 to 100]
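A minimal MATLAB sketch of this Phase I step, assuming the profiles are stacked in a 530×224 data matrix X (the variable name, and the use of the Wax–Kailath form of MDL from the earlier sketch, are my assumptions):

    % X: n-by-p data matrix (n = 530 profiles, p = 224 samples per profile)
    [n, p] = size(X);

    xbar = mean(X, 1)';                  % sample mean vector (p-by-1)
    S = cov(X);                          % sample covariance matrix (p-by-p)

    [E, D] = eig(S);                     % eigen-decomposition of S
    [lambda, idx] = sort(diag(D), 'descend');
    E = E(:, idx);

    % MDL criterion: evaluate MDL(m) for m = 0, ..., p-1 and pick the minimizer
    MDL = zeros(p, 1);
    for m = 0:p-1
        lam_tail = lambda(m+1:p);        % the p-m smallest eigenvalues
        geo = exp(mean(log(lam_tail)));  % geometric mean (computed via logs)
        ari = mean(lam_tail);            % arithmetic mean
        MDL(m+1) = -n*(p-m)*log(geo/ari) + 0.5*m*(2*p-m)*log(n);
    end
    [~, iMin] = min(MDL);
    m_keep = iMin - 1;                   % number of PCs selected by MDL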
Data reduction: Example 3.6
Perform a PCA on S (a substitute for Σ).
- Scree plot: [Figure: eigenvalues versus the first 10 principal components]
- Finally, retain the first three PCs.
Data reduction: Example 3.6
We use multiple univariate detection charts on the first three PCs. One rationale for doing this is that the PCs are uncorrelated, so monitoring the individual PCs will not miss a change in the correlation of the original signals. We choose α = 0.0027 for the individual charts, so the combined α for the whole procedure is 1 − (1 − 0.0027)^3 ≈ 0.0081, about three times the individual-chart α. Recall that this is a Phase I analysis. After the analysis, we need to remove the out-of-control data points (Seg #1 and Seg #3) and use the "in-control" data to establish the baseline for future monitoring and detection.
Data reduction: Example 3.6
The control charts: observe three major segments.
[Figure: individual control charts for PC1, PC2, and PC3, plotted against the index of cycles]
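A minimal MATLAB sketch of these charts, continuing from the variables above and assuming the usual 3-sigma (α = 0.0027) individual-chart limits; the centering and limit estimates here are illustrative, not necessarily the exact procedure used for the figure:

    % Scores of the first three PCs for every cycle (n-by-3)
    k = 3;
    Y = (X - xbar') * E(:, 1:k);

    figure;
    for j = 1:k
        cl  = mean(Y(:, j));             % center line
        sig = std(Y(:, j));              % estimated standard deviation
        ucl = cl + 3*sig;                % 3-sigma limits (alpha = 0.0027)
        lcl = cl - 3*sig;

        subplot(k, 1, j);
        plot(Y(:, j), '.'); hold on;
        yline(cl); yline(ucl, '--'); yline(lcl, '--');
        ylabel(sprintf('PC%d', j));
    end
    xlabel('index of cycles');

    % Cycles flagged as out of control by any of the three charts
    ooc = any(abs(Y - mean(Y)) > 3*std(Y), 2);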
Data reduction: Example 3.6
Average of the original signals corresponding to the three segments: the difference is much more subtle to notice than in the PCs.
[Figure: average tonnage (ton) versus crank angle (degree) for Seg #1, the in-control segment, and Seg #3]