ANNALS OF CLINICAL AND LABORATORY SCIENCE, Vol. 17, No. 6
Copyright 1987, Institute for Clinical Science, Inc.

Interpretation of Laboratory Results Using Multidimensional Scaling and Principal Component Analysis*

DAVID A. LACHER, M.D.†

Department of Pathology, Medical College of Ohio, Toledo, OH

* Part of the paper was presented at the 6th International Meeting on Clinical Laboratory Organizational Management, Noordwijkerhout, Netherlands, June
† Address reprint requests to David A. Lacher, M.D., Department of Pathology, Medical College of Ohio, C.S. #10008, Toledo, OH

ABSTRACT

Principal component analysis (PCA) and multidimensional scaling (MDS) are mathematical techniques that uncover the underlying structure of data by examining the relationships between variables. Both MDS and PCA use proximity measures such as correlation coefficients or Euclidean distances to generate a spatial configuration (map) of points in which the distances between points reflect the relationships between individuals and their underlying sets of data. Multidimensional scaling, when compared to PCA, gives more readily interpretable solutions of lower dimensionality and does not depend on the assumption of a linear relationship between variables. Both MDS and PCA were applied to electrolyte profiles of patients with acute renal failure and patients without apparent disease. The MDS was superior to PCA in separating renal patients from normal patients. The one-dimensional and two-dimensional solutions of MDS and PCA were compared.

Introduction

Principal component analysis (PCA) and multidimensional scaling (MDS) are mathematical techniques used to investigate the underlying relationship between variables. Both methods usually reduce the dimensional space (the variable set) while preserving the maximum amount of information.
A data set of n variables and p subjects can be visualized as a cloud of p points in n-dimensional space. Both MDS and PCA seek a lower-dimensional representation while retaining, as much as possible, the distances between points. Both methods can generate factors (derived variables) which are linear combinations of variables that reflect basic constructs (areas of generalization) in the data. Both MDS and PCA can map subjects in n (or fewer) dimensions.

There are several differences between MDS and PCA. Principal component analysis generally starts with a correlation matrix between variables, while

multidimensional scaling begins with an inter-subject distance matrix. The MDS is based on distances between points, while PCA is based on angles between vectors. Also, PCA is based on the general linear model, but MDS has no such underlying assumption. In addition, MDS may lead to fewer significant dimensions than PCA.⁴,⁵,⁷

The application of pattern recognition techniques in laboratory medicine has been discussed by Boyd.¹ Normally, physicians interpret single laboratory results quantitatively and interpret the pattern of multiple related laboratory tests qualitatively. Laboratory tests are rarely interpreted in a multivariate quantitative sense. Both MDS and PCA have classically been used in the social sciences. As a demonstration of the application of MDS and PCA to laboratory medicine, these methods were used in the analysis of electrolyte profiles of patients with renal failure and normal people without apparent disease. Multidimensional scaling and principal component analysis are compared in their ability to reduce the variable set, to construct the physiologic relationship between laboratory tests, and to discriminate between patient populations in various dimensional spaces.

Methods

SUBJECTS

Twenty-two second-year medical students were selected as the normal sample, and 22 patients with renal failure were analyzed. The diagnosis of renal failure was made by increased serum creatinine and urea nitrogen.

VARIABLES

An electrolyte profile consisting of serum sodium, potassium, chloride and bicarbonate was done on each patient. The electrolyte analysis was performed on the Beckman ASTRA analyzer.*

* Beckman Corp., Brea, CA.

DATA TRANSFORMATION

The raw test data were standardized via a Z-score transform:

$$Z = \frac{X - \bar{X}}{S}$$

where
Z = Z-score
X = raw score
X̄ = average
S = standard deviation

The following estimated averages and standard deviations of the normal population were used for the Z-score transformation:

Test (units)            Average    Standard Deviation
Sodium (meq/l)
Potassium (meq/l)
Chloride (meq/l)
Bicarbonate (meq/l)

The raw data were standardized to maintain a uniform scale for the laboratory tests. This was necessary to calculate distance measures between subjects for the MDS analysis.

STATISTICAL ANALYSIS

Descriptive statistics were analyzed for the entire group of patients and for the normal and renal patients separately. The BMDP 1D program was used to analyze the descriptive statistics.² The correlation between test variables was generated using PROC CORR of the SAS package.⁶

Principal component analysis was performed on the patient data using the SAS PROC FACTOR program. Unrotated, VARIMAX-rotated, PROMAX-rotated, and Harris-Kaiser-rotated PCA were performed using two- and three-dimensional solutions. Factor scores for renal and normal patients were plotted.

Multidimensional scaling was performed using the SAS PROC ALSCAL program. The Z-score transformed data were used to create a Euclidean distance between each pair of subjects using the following formula:

$$d_{ij} = \sqrt{\sum_{r=1}^{R} \left( Z_{ir} - Z_{jr} \right)^{2}}$$

where
d_ij = Euclidean distance between the ith and jth persons
r = test number (r = 1, ..., R; four tests in this profile)
Z_ir = test value (Z-transformed) for the ith person for the rth test
Z_jr = test value (Z-transformed) for the jth person for the rth test

Stimulus coordinates (dimensional scores) were produced by MDS and were plotted. Renal and normal patients were identified on the plots. Multiple linear regression, using the stimulus coordinates as dependent variables and laboratory tests as independent variables, was performed (using the BMDP 1R program) to establish the test regression weights for each MDS dimension.

Results and Discussion

Descriptive statistics of the standardized (Z-score transformed) laboratory data are seen in table I. Patients with renal failure had lower sodium and bicarbonate values and higher potassium values than normal patients. The mean chloride was about the same for both groups, but renal patients had more variable chloride values.

TABLE I
Descriptive Statistics of Average and Standard Deviations of Laboratory Tests

Group     N*    Sodium (Avg†, SD‡)    Chloride (Avg, SD)    Potassium (Avg, SD)    Bicarbonate (Avg, SD)
Total
Renal
Normal

Correlation Matrix
               Sodium    Chloride    Potassium    Bicarbonate
Sodium
Chloride
Potassium
Bicarbonate

* N = number of patients
† Avg = average
‡ SD = standard deviation
§ not significant at p = (two-tailed)
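The Z-score standardization and the inter-subject Euclidean distance defined in the Methods can be sketched in a few lines of NumPy. This is a minimal illustration, not the SAS procedure used in the study: the function names `z_scores` and `euclidean_distances` are hypothetical, and the sample values, reference averages, and reference standard deviations are invented placeholders, since the values used in the paper are not reproduced here.

```python
import numpy as np

def z_scores(raw, ref_mean, ref_sd):
    """Standardize each test against a reference-population mean and SD."""
    return (np.asarray(raw, dtype=float) - ref_mean) / ref_sd

def euclidean_distances(z):
    """Symmetric matrix of d_ij = sqrt(sum_r (z_ir - z_jr)^2) over all tests r."""
    diff = z[:, None, :] - z[None, :, :]      # shape: subjects x subjects x tests
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical profiles (NOT the study data).
# Columns: sodium, potassium, chloride, bicarbonate (meq/l).
raw = np.array([
    [140.0, 4.0, 103.0, 25.0],
    [132.0, 5.6, 101.0, 16.0],
    [141.0, 4.2, 105.0, 24.0],
])
ref_mean = np.array([140.0, 4.0, 103.0, 25.0])  # assumed reference averages
ref_sd = np.array([2.0, 0.4, 2.0, 2.0])         # assumed reference SDs

z = z_scores(raw, ref_mean, ref_sd)
d = euclidean_distances(z)
print(np.round(d, 2))
```

The resulting matrix `d` is the kind of inter-subject distance matrix that serves as input to a scaling program. Standardizing first keeps a test with a wide numeric range, such as sodium, from dominating the distances.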
TABLE II
Unrotated Principal Component Analysis Factor Pattern

Test                     Factor
Sodium
Chloride
Potassium
Bicarbonate
Eigenvalue
Proportion
Cumulative Proportion
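An unrotated factor pattern of the kind summarized in table II (loadings, eigenvalues, proportion, and cumulative proportion of variance) can be obtained from an eigendecomposition of the correlation matrix. The sketch below is an assumption-laden illustration, not the SAS PROC FACTOR computation: `pca_factor_pattern` is a hypothetical helper, and the data are synthetic values built to loosely mimic a salt factor and an acid-base factor.

```python
import numpy as np

def pca_factor_pattern(data):
    """Unrotated principal component factor pattern from the correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)      # variable-factor correlations
    proportion = eigvals / eigvals.sum()
    return eigvals, loadings, proportion

# Synthetic data (NOT the study data): 44 subjects, 4 standardized tests.
rng = np.random.default_rng(0)
salt = rng.normal(size=44)
acid = rng.normal(size=44)
data = np.column_stack([
    salt + 0.5 * rng.normal(size=44),    # sodium-like
    salt + 0.5 * rng.normal(size=44),    # chloride-like
    -acid + 0.5 * rng.normal(size=44),   # potassium-like
    acid + 0.5 * rng.normal(size=44),    # bicarbonate-like
])

eigvals, loadings, proportion = pca_factor_pattern(data)
cumulative = np.cumsum(proportion)
n_keep = int((eigvals > 1.0).sum())      # eigenvalue-greater-than-one rule
print(np.round(eigvals, 2), np.round(cumulative, 2), n_keep)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, each proportion is simply the eigenvalue divided by four here. The sign of each loading column is arbitrary, and rotation (varimax, promax, Harris-Kaiser) is a separate step not shown.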

TABLE III
Two-Dimensional Rotated Principal Component Analysis

               Varimax        Promax         Harris-Kaiser
Test           I     II       I     II       I     II
Sodium
Chloride
Potassium
Bicarbonate

The correlations between the electrolyte tests demonstrate several physiologic relationships (table I). For example, sodium and chloride have a high correlation (r = 0.65), reflecting a salt loss or gain. Potassium has an indirect relationship with bicarbonate (r = 0.38), reflecting hydrogen-potassium exchange. Sodium bicarbonate excretion (r = 0.39) is important in the maintenance of the acid-base homeostasis.

Principal component analysis was applied to the electrolyte profiles of renal and normal patients. The factor pattern of the unrotated solution indicated that the first two factors had eigenvalues greater than one and explained 79 percent of the variance (table II). A scree plot (eigenvalue vs. factor number) revealed no significant change in slope and, hence, was not useful in determining the dimensionality. Since simple structure was not present in the unrotated PCA solution, the orthogonal VARIMAX and the oblique PROMAX and Harris-Kaiser rotation methods were analyzed for two factors. The oblique rotated solutions did not significantly improve the simple structure of the factor pattern when compared to the orthogonal VARIMAX rotation (table III). Factor 1 had positive salient loadings for sodium and chloride, which could be interpreted as a salt dimension. Factor 2 had a positive salient loading for bicarbonate and a negative salient loading for potassium, which could be seen as an acid-base (pH) dimension. However, sodium also had a positive loading on Factor 2, probably as a result of the sodium-bicarbonate relationship.

Multidimensional scaling was also done on the electrolyte profiles of the renal and normal patients.
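To make the scaling step concrete, the sketch below embeds an inter-subject distance matrix in one, two, and four dimensions and reports a Kruskal-type stress for each. It is only an illustration under stated assumptions: it uses classical (Torgerson) metric scaling, which is a simpler technique swapped in for the ALSCAL algorithm used in the study, the stress is computed in its metric form with the input distances as targets, and the function names and the random Z-score data are hypothetical.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(dist, n_dims):
    """Classical (Torgerson) scaling: coordinates from double-centered squared distances."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]           # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

def kruskal_stress(dist, coords):
    """Stress-1 (metric form): misfit between input and configuration distances."""
    upper = np.triu_indices_from(dist, k=1)
    fitted = pairwise_distances(coords)[upper]
    return np.sqrt(((dist[upper] - fitted) ** 2).sum() / (fitted ** 2).sum())

# Synthetic Z-scores (NOT the study data): 44 subjects, 4 tests.
rng = np.random.default_rng(1)
z = rng.normal(size=(44, 4))
dist = pairwise_distances(z)

stress = {k: kruskal_stress(dist, classical_mds(dist, k)) for k in (1, 2, 4)}
coords_2d = classical_mds(dist, 2)       # stimulus coordinates for a 2-D plot
for k, value in stress.items():
    print(k, round(float(value), 4))
```

Lower stress means the low-dimensional map reproduces the original distances more faithfully, so the drop in stress from one dimension to two is the quantity to inspect when choosing the dimensionality. With four dimensions the four-variable distances are reproduced essentially exactly.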
The Kruskal stress coefficient (goodness of fit function) was reduced (R² increased) significantly from a one-dimensional to a two-dimensional solution, indicating that the two-dimensional solution was optimal. Multiple linear regression, using the stimulus (dimension) coordinates as dependent variables and the test values as independent variables, was done to interpret the dimensions (table IV). Sodium and bicarbonate had positive weights and potassium had a negative weight on Factor 1. On Factor 2, sodium and chloride had positive weights and bicarbonate had a negative weight. Sodium loaded on both factors, as in the PCA solution. Potassium, which loaded only on Factor 1, was important in the separation of renal from normal patients. It appears that Factor 1 was an acid-base (pH) scale. Factor 2 may be interpreted as an ion balance scale.

TABLE IV
Multidimensional Scaling Analysis

Dimensional Goodness of Fit Test
Number of Dimensions    Stress    R²

Two-Dimensional Multiple Linear Regression Weights
Variable        Dimension 1    Dimension 2
Sodium
Chloride
Potassium
Bicarbonate
Intercept
Multiple

Multidimensional scaling (MDS) and PCA were compared for their ability to

discriminate patients with renal failure from normal people. For the one-dimensional solution, MDS classified the patients better than PCA, seen graphically as less overlap between patient groups (figure 1). Principal component analysis did not separate the renal from normal patients as well as MDS in the two-dimensional solution (figures 2 and 3). It appears that acidosis was important in separating the two groups. Discriminant analysis or cluster analysis could also be used to classify patients, but these techniques would not readily explain the interrelationships among the variables.

FIGURE 1. One-dimensional plot of factor (stimulus) scores of 22 renal and 22 normal patients for principal component analysis (PCA) and multidimensional scaling (MDS).

FIGURE 2. Two-dimensional plot of factor scores for renal and normal patients using principal component analysis with varimax rotation. Axes: Factor I, Factor II.

FIGURE 3. Two-dimensional plot of multidimensional scaling stimulus scores for renal and normal patients. Axes: Factor I, Factor II.

Principal component analysis (PCA) and MDS were applied to a profile of chemistry tests to reduce the dimensionality of the variable set and to discriminate two patient groups. Both MDS and PCA accomplished the reduction in dimensionality but gave different interpretations of the dimensions; MDS separated the two patient groups better than PCA. Multidimensional scaling is frequently applied to the social sciences but is rarely applied to laboratory medicine. Witte used MDS for data reduction to predict bone marrow findings from tests performed in peripheral blood.⁸ Gattaz applied MDS to separate schizophrenic patients from normal patients by obtaining a two-dimensional representation of 17 cerebrospinal substances.³ Multidimensional scaling and principal component analysis can reduce a large number of variables to a few significant variables in order to simplify data analysis.

References

1. Boyd, J. C.: Use of methods of pattern recognition to assist in test selection and test interpretation. Clinics Lab Med. 2.
2. Dixon, W. J., ed.: BMDP Statistical Software Manual. Berkeley, CA, University of California Press.
3. Gattaz, W. F., Gasser, T., and Beckmann, H.: Multidimensional analysis of the concentration of 17 substances in the CSF of schizophrenics and controls. Biol. Psychiatry 20.
4. Gorsuch, R. L.: Factor Analysis, 2nd ed. Hillsdale, NJ, Lawrence Erlbaum Associates, Inc.
5. Kruskal, J. B. and Wish, M.: Multidimensional Scaling. Beverly Hills, CA, Sage Publications, Inc.
6. SAS User's Guide: Statistics, 5th ed. Cary, NC, SAS Institute, Inc.
7. Schiffman, S., Reynolds, M., and Young, F.: Introduction to Multidimensional Scaling. Orlando, FL, Academic Press.
8. Witte, D. L., Kraemer, D. F., Johnson, G. F., Dick, F. R., and Hamilton, H.: Prediction of bone marrow iron findings from tests performed on peripheral blood. Am. J. Clin. Pathol. 55, 1986.

Principal Component Analysis, A Powerful Scoring Technique

George C. J. Fernandez, University of Nevada - Reno, Reno NV 89557

ABSTRACT

Data mining is a collection of analytical techniques to uncover new

