Statistical Tools for Multivariate Six Sigma
Dr. Neil W. Polhemus
CTO & Director of Development
StatPoint, Inc.
The Challenge
The quality of an item or service usually depends on more than one characteristic. When the characteristics are not independent, considering each characteristic separately can give a misleading estimate of overall performance.
The Solution
Proper analysis of data from such processes requires the use of multivariate statistical techniques.
Outline
- Multivariate SPC
  - Multivariate control charts
  - Multivariate capability analysis
- Data exploration and modeling
  - Principal components analysis (PCA)
  - Partial least squares (PLS)
  - Neural network classifiers
- Design of experiments (DOE)
  - Multivariate optimization
Example #1
Textile fiber
Characteristic #1: tensile strength, 115 ± 1
Characteristic #2: diameter, 1.05 ± 0.05
Sample Data
n = 100
X Individuals Chart - strength
[X chart for strength, 100 observations: CTR = 114.98, UCL = 115.69, LCL = 114.27]
X Individuals Chart - diameter
[X chart for diameter, 100 observations: CTR = 1.05, UCL = 1.06, LCL = 1.04]
Capability Analysis - strength
[Process capability histogram for strength: LSL = 114.0, Nominal = 115.0, USL = 116.0; fitted normal with mean = 114.978, std. dev. = 0.238937]
DPM = 30.76, Cp = 1.41, Pp = 1.40, Cpk = 1.38, Ppk = 1.36, K = -0.02
Capability Analysis - diameter
[Process capability histogram for diameter: LSL = 1.04, Nominal = 1.05, USL = 1.06; fitted normal with mean = 1.04991, std. dev. = 0.00244799]
DPM = 44.59, Cp = 1.41, Pp = 1.36, Cpk = 1.39, Ppk = 1.35, K = -0.01
Scatterplot
[Plot of strength vs diameter: correlation = 0.89]
Multivariate Normal Distribution
[Surface plot of the fitted bivariate normal density of strength and diameter]
Control Ellipse
[Control ellipse for strength vs diameter]
Multivariate Capability
Determines the joint probability of being within the specification limits on all characteristics.

Variable   Observed Beyond Spec.   Estimated Beyond Spec.   Estimated DPM
strength   0.0%                    0.00307572%              30.7572
diameter   0.0%                    0.00445939%              44.5939
Joint      0.0%                    0.00703461%              70.3461
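As a sketch, the joint percentage beyond spec can be approximated by Monte Carlo simulation from the fitted bivariate normal. The means, standard deviations, and correlation below are taken from the earlier slides; this is an illustrative approximation, not the exact integration a statistics package would perform.

```python
import numpy as np

# Fitted parameters from the capability slides (strength, diameter)
mean = np.array([114.978, 1.04991])
sd = np.array([0.238937, 0.00244799])
corr = 0.89
cov = np.array([[sd[0] ** 2, corr * sd[0] * sd[1]],
                [corr * sd[0] * sd[1], sd[1] ** 2]])

rng = np.random.default_rng(0)
draws = rng.multivariate_normal(mean, cov, size=1_000_000)

# Specification limits for strength and diameter
lsl = np.array([114.0, 1.04])
usl = np.array([116.0, 1.06])

inside = np.all((draws >= lsl) & (draws <= usl), axis=1)
dpm = (1.0 - inside.mean()) * 1_000_000
print(f"estimated joint DPM = {dpm:.1f}")  # should land near the slide's 70 DPM
```

Because the two characteristics are highly correlated, the joint DPM (about 70) is noticeably less than the sum of the two univariate DPM values would suggest if treated as independent.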
Multivariate Capability
[Bivariate normal density of strength and diameter over the specification region: DPM = 70.3461]
Capability Ellipse
[99.73% capability ellipse for strength and diameter: MCP = 1.27]
Mult. Capability Indices
Defined to give the same DPM as in the univariate case.

Index   Estimate
MCP     1.27
MCR     78.80
DPM     70.3461
Z       3.80696
SQL     5.30696
Test for Normality
Shapiro-Wilk P-Values: strength 0.408004, diameter 0.615164
[Normal probability plot of strength and diameter: empirical data vs normal distribution]
More than 2 Characteristics
Calculate T-squared:

T²(i) = (x(i) - x̄)′ S⁻¹ (x(i) - x̄)

where S = sample covariance matrix and x̄ = vector of sample means.
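The formula above can be computed directly. A minimal NumPy sketch, using simulated data in place of the strength/diameter measurements (the covariance values below are illustrative, chosen to mimic the correlation on the scatterplot, not taken from the slides):

```python
import numpy as np

def t_squared(X):
    """Hotelling's T-squared for each row of X (n observations x p variables)."""
    xbar = X.mean(axis=0)                            # vector of sample means
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))   # inverse sample covariance
    d = X - xbar
    # quadratic form (x_i - xbar)' S^-1 (x_i - xbar) for every row at once
    return np.einsum('ij,jk,ik->i', d, S_inv, d)

rng = np.random.default_rng(1)
X = rng.multivariate_normal([115.0, 1.05],
                            [[0.057, 5.2e-4], [5.2e-4, 6.0e-6]], size=100)
t2 = t_squared(X)
```

Each `t2[i]` is the squared Mahalanobis distance of observation i from the centroid; points beyond the chart's UCL signal a multivariate shift even when both univariate charts look fine.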
T-Squared Chart
[Multivariate control chart of T-squared for 100 observations: UCL = 11.25]
T-Squared Decomposition
Shows how much T-squared drops when each variable is removed, indicating which variables are responsible for an out-of-control signal.

Relative Contribution to T-Squared Signal
Observation   T-Squared   diameter   strength
17            26.3659     22.9655    25.951

Large values indicate that a variable makes an important contribution.
Control Ellipsoid
[Three-dimensional control ellipsoid for strength, diameter, and a third variable]
Multivariate EWMA Chart
[Multivariate EWMA control chart for strength and diameter: UCL = 11.25, lambda = 0.2; each signal is labeled with the variable making the largest contribution]
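The MEWMA chart applies the T-squared idea to exponentially smoothed observation vectors, which makes it more sensitive to small sustained shifts. A sketch of the recursion with the exact time-varying covariance of the smoothed vector (illustrative only; the UCL is not computed here):

```python
import numpy as np

def mewma_stats(X, lam=0.2):
    """T-squared statistics on exponentially smoothed vectors Z_i."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)            # covariance of single observations
    Z = np.zeros(p)
    stats = np.empty(n)
    for i, x in enumerate(X - X.mean(axis=0)):
        Z = lam * x + (1 - lam) * Z        # EWMA update of the mean vector
        # exact variance multiplier for Z at time i+1; converges to lam/(2-lam)
        w = lam * (1 - (1 - lam) ** (2 * (i + 1))) / (2 - lam)
        stats[i] = Z @ np.linalg.solve(w * S, Z)
    return stats

rng = np.random.default_rng(2)
X = rng.multivariate_normal([115.0, 1.05],
                            [[0.057, 5.2e-4], [5.2e-4, 6.0e-6]], size=100)
m = mewma_stats(X)
```

With lambda = 1 the recursion reduces to the ordinary T-squared chart; smaller lambda values give more memory and quicker detection of small drifts.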
Generalized Variance Chart
Plots the determinant of the variance-covariance matrix for data that is sampled in subgroups.
[Generalized variance chart for 24 subgroups: UCL = 3.281E-7, CL = 7.01937E-8, LCL = 0.0]
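The plotted statistic is easy to sketch: the determinant of each subgroup's sample covariance matrix. Simulated subgroups stand in for real data here, and the chart's control limits (which depend on the in-control covariance) are omitted.

```python
import numpy as np

rng = np.random.default_rng(3)
# 24 subgroups of 5 observations on 2 characteristics (illustrative data)
subgroups = rng.multivariate_normal([115.0, 1.05],
                                    [[0.057, 5.2e-4], [5.2e-4, 6.0e-6]],
                                    size=(24, 5))

# generalized variance = det of each subgroup's covariance matrix
gen_var = np.array([np.linalg.det(np.cov(g, rowvar=False)) for g in subgroups])
```

A point above the UCL indicates that the within-subgroup variability (in all directions jointly) has increased, the multivariate analogue of an R or S chart signal.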
Data Exploration and Modeling
When the number of variables is large, the dimensionality of the problem often makes it difficult to determine the underlying relationships. Reduction of dimensionality can be very helpful.
Example #2
Matrix Plot
[Scatterplot matrix of MPG City, MPG Highway, Engine Size, Horsepower, Fueltank, Passengers, Length, Wheelbase, Width, U Turn Space, Weight]
Analysis Methods
- Predicting certain characteristics based on others (regression and ANOVA)
- Separating items into groups (classification)
- Detecting unusual items
Multiple Regression
MPG City = 29.6315 + 0.28816*Engine Size - 0.00688362*Horsepower - 0.297446*Passengers - 0.0365723*Length + 0.280224*Wheelbase + 0.111526*Width - 0.139763*U Turn Space - 0.00984486*Weight

Parameter      Estimate      Standard Error   T Statistic   P-Value
CONSTANT       29.6315       12.9763          2.28351       0.0249
Engine Size    0.28816       0.722918         0.398607      0.6912
Horsepower     -0.00688362   0.0134153        -0.513119     0.6092
Passengers     -0.297446     0.54754          -0.543241     0.5884
Length         -0.0365723    0.0447211        -0.817786     0.4158
Wheelbase      0.280224      0.124837         2.24472       0.0274
Width          0.111526      0.218893         0.5095        0.6117
U Turn Space   -0.139763     0.17926          -0.779668     0.4378
Weight         -0.00984486   0.00192619       -5.11104      0.0000

R-squared = 73.544 percent
R-squared (adjusted for d.f.) = 71.0244 percent
Standard Error of Est. = 3.02509
Mean absolute error = 1.99256
Principal Components
The goal of a principal components analysis (PCA) is to construct k linear combinations of the p variables X that contain the greatest variance:

C1 = a11*X1 + a12*X2 + ... + a1p*Xp
C2 = a21*X1 + a22*X2 + ... + a2p*Xp
...
Ck = ak1*X1 + ak2*X2 + ... + akp*Xp
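A sketch of extracting the components from the correlation matrix, the usual choice when the variables are on very different scales (as the automobile variables are). NumPy only; the data below are randomly generated placeholders, not the 93-car dataset.

```python
import numpy as np

def pca(X, k=2):
    """Principal components from the correlation matrix of X (n x p)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize columns
    R = np.corrcoef(X, rowvar=False)                   # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, :k]                        # component scores C_1..C_k
    return eigvals, eigvecs[:, :k], scores

rng = np.random.default_rng(4)
X = rng.normal(size=(93, 8)) @ rng.normal(size=(8, 8))  # correlated fake data
eigvals, weights, scores = pca(X, k=2)
```

The eigenvalues sum to p (the trace of the correlation matrix), so an eigenvalue above 1 means the component explains more variance than any single standardized variable, the usual cutoff read from a scree plot.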
Scree Plot
Shows the number of significant components.
[Plot of eigenvalue vs component number: a sharp drop after the second component]
Percentage Explained
Principal Components Analysis

Component   Eigenvalue   Percent of Variance   Cumulative Percentage
1           5.8263       72.829                72.829
2           1.09626      13.703                86.532
3           0.339796     4.247                 90.779
4           0.270321     3.379                 94.158
5           0.179286     2.241                 96.400
6           0.12342      1.543                 97.942
7           0.109412     1.368                 99.310
8           0.0552072    0.690                 100.000
Components
Table of Component Weights

Variable       Component 1   Component 2
Engine Size    0.376856      -0.205144
Horsepower     0.292144      -0.592729
Passengers     0.239193      0.730749
Length         0.369908      0.0429221
Wheelbase      0.374826      0.259648
Width          0.38949       -0.0422083
U Turn Space   0.359702      -0.0256716
Weight         0.396236      -0.0298902

First component = 0.376856*Engine Size + 0.292144*Horsepower + 0.239193*Passengers + 0.369908*Length + 0.374826*Wheelbase + 0.38949*Width + 0.359702*U Turn Space + 0.396236*Weight

Second component = -0.205144*Engine Size - 0.592729*Horsepower + 0.730749*Passengers + 0.0429221*Length + 0.259648*Wheelbase - 0.0422083*Width - 0.0256716*U Turn Space - 0.0298902*Weight
Interpretation
[Plot of C_2 vs C_1, points coded by Type: Compact, Large, Midsize, Small, Sporty, Van]
Principal Component Regression
MPG City = 22.3656 - 1.84685*size + 0.567176*unsportiness

Parameter      Estimate   Standard Error   T Statistic   P-Value
CONSTANT       22.3656    0.353316         63.302        0.0000
size           -1.84685   0.147168         -12.5492      0.0000
unsportiness   0.567176   0.339277         1.67172       0.0981

R-squared = 64.0399 percent
R-squared (adjusted for d.f.) = 63.2408 percent
Standard Error of Est. = 3.40726
Mean absolute error = 2.26553
Partial Least Squares (PLS)
Similar to PCA, except that it finds components that explain the variance in both the X's and the Y's. May be used with many X variables, even exceeding n.
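For a single response, the first PLS component can be sketched in a few lines. This is the PLS1/NIPALS step for one component only, under the usual centering convention; it is not the full multi-response algorithm behind the slides, and the data are randomly generated placeholders.

```python
import numpy as np

def pls1_first_component(X, y):
    """First PLS component for a single response y."""
    X0, y0 = X - X.mean(axis=0), y - y.mean()
    w = X0.T @ y0                   # weight vector: direction of max covariance with y
    w /= np.linalg.norm(w)
    t = X0 @ w                      # component scores
    p = X0.T @ t / (t @ t)          # X loadings (used to deflate X for later components)
    q = (y0 @ t) / (t @ t)          # regression coefficient of y on the scores
    return w, t, p, q

rng = np.random.default_rng(5)
X = rng.normal(size=(93, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=93)
w, t, p, q = pls1_first_component(X, y)
residual_ss = ((y - y.mean() - q * t) ** 2).sum()
```

Unlike a principal component, the weight vector w is pulled toward directions that covary with y, which is why PLS components usually predict better than the same number of principal components.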
Component Extraction
Starts with the number of components equal to the minimum of p and (n-1).
[Model comparison plot: percent variation explained in X and Y vs number of components (1-8)]
Coefficient Plot
[PLS coefficient plot: standardized coefficients of Engine Size, Horsepower, Passengers, Length, Wheelbase, Width, U Turn Space, and Weight for the responses MPG City, MPG Highway, and Fueltank]
Model in Original Units
MPG City = 50.0593 - 0.214083*Engine Size - 0.0347708*Horsepower - 0.884181*Passengers + 0.0294622*Length - 0.0362471*Wheelbase - 0.0882233*Width - 0.0282326*U Turn Space - 0.00391616*Weight
Classification
Principal components can also be used to classify new observations. A useful method for classification is a Bayesian classifier, which can be expressed as a neural network.
Types of Automobiles
[Plot of unsportiness vs size, points coded by Type: Compact, Large, Midsize, Small, Sporty, Van]
Neural Networks
Input layer (2 variables) -> Pattern layer (93 cases) -> Summation layer (6 neurons) -> Output layer (6 groups)
Bayesian Classifier
Begins with prior probabilities for membership in each group. Uses a Parzen-like estimator of the density function for each group:

g_j(X) = (1/n_j) * sum over i = 1..n_j of exp( -||X - X_i||^2 / (2*sigma^2) )
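A sketch of this classifier, which is exactly the pattern/summation/output structure of the network on the previous slide: one pattern unit per training case, one summation neuron per group, and an argmax output. Equal priors are assumed, and sigma and the toy data below are hypothetical.

```python
import numpy as np

def pnn_classify(X_train, y_train, x, sigma, priors=None):
    """Classify point x by the largest prior-weighted Parzen density g_j(x)."""
    classes = np.unique(y_train)
    if priors is None:
        priors = {c: 1.0 / len(classes) for c in classes}   # equal priors
    scores = {}
    for c in classes:                      # one summation neuron per group
        Xc = X_train[y_train == c]         # pattern units for group c
        d2 = ((Xc - x) ** 2).sum(axis=1)   # squared distances to training cases
        scores[c] = priors[c] * np.exp(-d2 / (2.0 * sigma ** 2)).mean()
    return max(scores, key=scores.get)     # output layer: pick the largest g_j

# Toy two-group example (hypothetical data, not the automobile dataset)
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(['small', 'small', 'large', 'large'])
label = pnn_classify(X_train, y_train, np.array([0.1, 0.0]), sigma=1.0)
```

Small sigma values make each training point dominate its own neighborhood (jagged regions); large values smooth the regions, which is exactly the effect shown on the "Changing Sigma" slide.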
Options
The prior probabilities may be determined in several ways. A training set is usually used to find a good value for sigma.
Output
Number of cases in training set: 93
Number of cases in validation set: 0
Spacing parameter used: 0.0109375 (optimized by jackknifing during training)

Training Set
Type      Members   Percent Correctly Classified
Compact   16        75.0
Large     11        100.0
Midsize   22        77.2727
Small     21        76.1905
Sporty    14        85.7143
Van       9         100.0
Total     93        82.7957
Classification Regions
[Classification plot (sigma = 0.0109375): regions of unsportiness vs size for each Type]
Changing Sigma
[Classification plot (sigma = 0.3): regions of unsportiness vs size with the larger spacing parameter]
Overlay Plot
[Classification plot (sigma = 0.3) with the observed data points overlaid]
Outlier Detection
[Control ellipse in the (size, unsportiness) plane]
Cluster Analysis
[Cluster scatterplot of unsportiness vs size: method of k-means, squared Euclidean distance, 4 clusters with centroids]
Design of Experiments
When more than one characteristic is important, finding the optimal operating conditions usually requires a tradeoff of one characteristic for another. One approach to finding a single solution is to use desirability functions.
Example #3
Myers and Montgomery (2002) describe an experiment on a chemical process:

Response variable       Goal
Conversion percentage   Maximize
Thermal activity        Maintain between 55 and 60

Input factor   Low         High
time           8 minutes   17 minutes
temperature    160 C       210 C
catalyst       1.5%        3.5%
Experiment
run   time (minutes)   temperature (degrees C)   catalyst (percent)   conversion   activity
1     10.0             170.0                     2.0                  74.0         53.2
2     15.0             170.0                     2.0                  51.0         62.9
3     10.0             200.0                     2.0                  88.0         53.4
4     15.0             200.0                     2.0                  70.0         62.6
5     10.0             170.0                     3.0                  71.0         57.3
6     15.0             170.0                     3.0                  90.0         67.9
7     10.0             200.0                     3.0                  66.0         59.8
8     15.0             200.0                     3.0                  97.0         67.8
9     8.3              185.0                     2.5                  76.0         59.1
10    16.7             185.0                     2.5                  79.0         65.9
11    12.5             160.0                     2.5                  85.0         60.0
12    12.5             210.0                     2.5                  97.0         60.7
13    12.5             185.0                     1.66                 55.0         57.4
14    12.5             185.0                     3.35                 81.0         63.2
15    12.5             185.0                     2.5                  81.0         59.2
16    12.5             185.0                     2.5                  75.0         60.4
17    12.5             185.0                     2.5                  76.0         59.1
18    12.5             185.0                     2.5                  83.0         60.6
19    12.5             185.0                     2.5                  80.0         60.8
20    12.5             185.0                     2.5                  91.0         58.9
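As a sketch, the full second-order model for conversion can be fit to these 20 runs with ordinary least squares in NumPy. This reproduces the kind of fit summarized by the Pareto chart on the next slide, not the exact STATGRAPHICS output (no standardized effects or P-values are computed here).

```python
import numpy as np

# time (A), temperature (B), catalyst (C), conversion, from the table above
data = np.array([
    [10.0, 170.0, 2.0,  74.0], [15.0, 170.0, 2.0,  51.0],
    [10.0, 200.0, 2.0,  88.0], [15.0, 200.0, 2.0,  70.0],
    [10.0, 170.0, 3.0,  71.0], [15.0, 170.0, 3.0,  90.0],
    [10.0, 200.0, 3.0,  66.0], [15.0, 200.0, 3.0,  97.0],
    [ 8.3, 185.0, 2.5,  76.0], [16.7, 185.0, 2.5,  79.0],
    [12.5, 160.0, 2.5,  85.0], [12.5, 210.0, 2.5,  97.0],
    [12.5, 185.0, 1.66, 55.0], [12.5, 185.0, 3.35, 81.0],
    [12.5, 185.0, 2.5,  81.0], [12.5, 185.0, 2.5,  75.0],
    [12.5, 185.0, 2.5,  76.0], [12.5, 185.0, 2.5,  83.0],
    [12.5, 185.0, 2.5,  80.0], [12.5, 185.0, 2.5,  91.0]])
A, B, C, y = data.T

# second-order design matrix: intercept, A, B, C, AA, BB, CC, AB, AC, BC
X = np.column_stack([np.ones_like(A), A, B, C,
                     A * A, B * B, C * C, A * B, A * C, B * C])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
r2 = 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

The same design matrix, refit with the activity column as the response, gives the model optimized in Steps #3 and #4.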
Step #1: Model Conversion
[Standardized Pareto chart for conversion; effects in decreasing order: AC, C:catalyst, CC, B:temperature, BB, BC, AA, AB, A:time]
Step #2: Optimize Conversion
Goal: maximize conversion. Optimum value = 118.174

Factor        Low     High    Optimum
time          8.0     17.0    17.0
temperature   160.0   210.0   210.0
catalyst      1.5     3.5     3.48086

[Contours of the estimated response surface for conversion vs time and catalyst at temperature = 210.0]
Step #3: Model Activity
[Standardized Pareto chart for activity; effects in decreasing order: A:time, C:catalyst, AA, AB, B:temperature, BC, BB, CC, AC]
Step #4: Optimize Activity
Goal: maintain activity at 57.5. Optimum value = 57.5

Factor        Low      High     Optimum
time          8.3      16.7     10.297
temperature   209.99   210.01   210.004
catalyst      1.66     3.35     2.31021

[Contours of the estimated response surface for activity vs time and catalyst at temperature = 210.0]
Step #5: Select Desirability Fcns.
[Desirability function for maximization: d rises from 0 at the low predicted response to 1 at the high; curves shown for s = 0.2, 0.4, 1, 2, 8]
Desirability Function - Hit Target
[Desirability function for hitting a target: d rises from 0 at the low predicted response to 1 at the target, then falls back to 0 at the high; curves shown for s = t = 0.1, 1, 5]
Combined Desirability

D = ( d1^I1 * d2^I2 * ... * dm^Im )^( 1 / (I1 + I2 + ... + Im) )

where m = number of responses and 0 <= Ij <= 5 are the impact weights. D ranges from 0 to 1.
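A sketch of the individual and combined desirabilities. Applied to the optimum on the next slide (conversion = 95.0388 with range 50 to 100, activity exactly on its 57.5 target, both impacts 3.0), it reproduces the reported D = 0.949092.

```python
import numpy as np

def d_maximize(y, low, high, s=1.0):
    """Desirability for a maximize goal: 0 below low, 1 above high."""
    return float(np.clip((y - low) / (high - low), 0.0, 1.0) ** s)

def d_target(y, low, target, high, s=1.0, t=1.0):
    """Desirability for hitting a target inside [low, high]."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** s
    return ((high - y) / (high - target)) ** t

def combined_desirability(d, impacts):
    """Weighted geometric mean D = (prod d_j^I_j)^(1 / sum I_j)."""
    d, impacts = np.asarray(d, float), np.asarray(impacts, float)
    return float(np.prod(d ** impacts) ** (1.0 / impacts.sum()))

d_conv = d_maximize(95.0388, 50.0, 100.0)    # conversion: maximize on [50, 100]
d_act = d_target(57.5, 55.0, 57.5, 60.0)     # activity: target 57.5 in [55, 60]
D = combined_desirability([d_conv, d_act], [3.0, 3.0])
```

Because D is a geometric mean, any single response with desirability 0 drives D to 0, which is what forces the optimizer to trade conversion against the activity constraint.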
Example
Optimum value = 0.949092

Factor        Low     High    Optimum
time          8.0     17.0    11.1394
temperature   160.0   210.0   210.0
catalyst      1.5     3.5     2.20119

Response     Low    High    Goal               First Weight   Second Weight   Impact
conversion   50.0   100.0   Maximize           1.0                            3.0
activity     55.0   60.0    Maintain at 57.5   1.0            1.0             3.0

Response     Optimum
conversion   95.0388
activity     57.5
Desirability Contours
[Contours of estimated desirability vs time and catalyst at temperature = 210.0]
Desirability Surface
[Estimated desirability surface vs time and catalyst at temperature = 210.0]
Overlaid Contours
[Overlay plot at temperature = 210.0: contours of conversion and activity vs time and catalyst]
References
Johnson, R.A. and Wichern, D.W. (2002). Applied Multivariate Statistical Analysis. Upper Saddle River: Prentice Hall.
Mason, R.L. and Young, J.C. (2002). Multivariate Statistical Process Control with Industrial Applications. Philadelphia: SIAM.
Montgomery, D.C. (2005). Introduction to Statistical Quality Control, 5th edition. New York: John Wiley and Sons.
Myers, R.H. and Montgomery, D.C. (2002). Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 2nd edition. New York: John Wiley and Sons.
PowerPoint Slides
Available at: www.statgraphics.com/documents.htm