Interpretation of Laboratory Results Using Multidimensional Scaling and Principal Component Analysis*


ANNALS OF CLINICAL AND LABORATORY SCIENCE, Vol. 17, No. 6
Copyright 1987, Institute for Clinical Science, Inc.

DAVID A. LACHER, M.D.†
Department of Pathology, Medical College of Ohio, Toledo, OH 43699

* Part of the paper was presented at the 6th International Meeting on Clinical Laboratory Organizational Management, Noordwijkerhout, Netherlands, June 1987.
† Address reprint requests to David A. Lacher, M.D., Department of Pathology, Medical College of Ohio, C.S. #10008, Toledo, OH 43699.

ABSTRACT

Principal component analysis (PCA) and multidimensional scaling (MDS) are mathematical techniques that uncover the underlying structure of data by examining the relationships between variables. Both MDS and PCA use proximity measures such as correlation coefficients or Euclidean distances to generate a spatial configuration (map) of points where distances between points reflect the relationship between individuals with their underlying set of data. Multidimensional scaling, when compared to PCA, gives more readily interpretable solutions of lower dimensionality and does not depend on the assumption of a linear relationship between variables. Both MDS and PCA were applied to electrolyte profiles of patients with acute renal failure and patients without apparent disease. The MDS was superior to PCA in separating renal patients from normal patients. The one-dimensional and two-dimensional solutions of MDS and PCA were compared.

Introduction

Principal component analysis (PCA) and multidimensional scaling (MDS) are mathematical techniques used to investigate the underlying relationship between variables. Both methods usually reduce the dimensional space (the variable set) while preserving the maximum amount of information.
A data set of n variables and p subjects can be visualized as a cloud of p points in n-dimensional space. Both MDS and PCA seek a lower dimensional representation while retaining, as much as possible, the distance between points. Both methods can generate factors (derived variables) which are linear combinations of variables that reflect basic constructs (area of generalization) in the data. Both MDS and PCA can map subjects in n (or less) dimensional space.

There are several differences between MDS and PCA. Principal component analysis generally starts with a correlation matrix between variables, while multidimensional scaling begins with an inter-subject distance matrix. The MDS is based on distances between points while PCA is based on angles between vectors. Also, PCA is based on the general linear model, but MDS has no such underlying assumption. In addition, MDS may lead to fewer significant dimensions than PCA.[4,5,7]

The application of pattern recognition techniques in laboratory medicine has been discussed by Boyd.[1] Normally, physicians interpret single laboratory results quantitatively and interpret the pattern of multiple related laboratory tests qualitatively. Laboratory tests are rarely interpreted in a multivariate quantitative sense. Both MDS and PCA have been classically used in the social sciences. As a demonstration of the application of MDS and PCA to laboratory medicine, these methods were used in the analysis of electrolyte profiles of patients with renal failure and normal people without apparent disease. Multidimensional scaling and principal component analysis are compared in their ability to reduce the variable set, to construct the physiologic relationship between laboratory tests, and to discriminate between patient populations in various dimensional spaces.

Methods

Subjects

Twenty-two second-year medical students were selected as the normal sample and 22 patients with renal failure were analyzed. The diagnosis of renal failure was made by increased serum creatinine and urea nitrogen.

Variables

An electrolyte profile consisting of serum sodium, potassium, chloride and bicarbonate was done on each patient. The electrolyte analysis was performed on the Beckman ASTRA analyzer.*

* Beckman Corp., Brea, CA.

Data Transformation

The raw test data were standardized via a Z-score transform:

$$Z = \frac{X - \bar{X}}{S}$$

where
Z = Z-score
X = raw score
\bar{X} = average
S = standard deviation

The following estimated averages and standard deviations of the normal population were used for the Z-score transformation:

Test (units)           Average   Standard Deviation
Sodium (meq/l)          140.0          2.0
Potassium (meq/l)         4.1          0.4
Chloride (meq/l)        101.0          2.5
Bicarbonate (meq/l)      27.0          1.5

The raw data were standardized to maintain a uniform scale for the laboratory tests. This was necessary to calculate distance measures between subjects for the MDS analysis.

Statistical Analysis

Descriptive statistics were analyzed for the entire group of patients and for the normal and renal patients separately. The BMDP 1D program was used to analyze the descriptive statistics.[2] The correlation between test variables was generated using PROC CORR of the SAS package.[6]

Principal component analysis was performed on the patients' data using the SAS PROC FACTOR program. Unrotated, varimax-rotated, promax-rotated, and Harris-Kaiser-rotated PCA were performed using two- and three-dimensional solutions. Factor scores for renal and normal patients were plotted.

Multidimensional scaling was performed using the SAS PROC ALSCAL program. The Z-score transformed data were used to create a Euclidean distance between each pair of subjects using the following formula:

$$d_{ij} = \sqrt{\sum_{r=1}^{4} (Z_{ir} - Z_{jr})^2}$$

where
d_{ij} = Euclidean distance between the ith and jth persons
r = test number
Z_{ir} = test value (Z-transformed) for the ith person for the rth test
Z_{jr} = test value (Z-transformed) for the jth person for the rth test

Stimulus coordinates (dimensional scores) were produced by MDS and were plotted. Renal and normal patients were identified on the plots. Multiple linear regression, using the stimulus coordinates as dependent variables and laboratory tests as independent variables, was performed (using the BMDP 1R program) to establish the test regression weights for each MDS dimension.

Results and Discussion

Descriptive statistics of the standardized (Z-score transformed) laboratory data are seen in table I.

TABLE I
Descriptive Statistics of Average and Standard Deviations of Laboratory Tests

                  Sodium          Chloride        Potassium       Bicarbonate
Group     N*    Avg†    SD‡     Avg     SD      Avg     SD      Avg     SD
Total     44   -0.92   2.38    1.24   1.98     0.92   1.67    -2.38   2.84
Renal     22   -2.09   2.79    1.11   2.65     1.65   2.03    -4.82   1.50
Normal    22    0.25   0.97    1.36   0.99     0.19   0.70     0.06   1.33

Correlation Matrix

               Sodium   Chloride   Potassium   Bicarbonate
Sodium          1.00      0.65      -0.29         0.39
Chloride        0.65      1.00      -0.23        -0.17
Potassium      -0.29     -0.23       1.00        -0.38
Bicarbonate     0.39     -0.17      -0.38         1.00

* N = number of patients
† Avg = average
‡ SD = standard deviation
r ≤ 0.29 not significant at p = 0.05 (two-tailed)
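The data transformation and scaling steps described under Methods can be sketched roughly as follows. This is a minimal illustration rather than the procedure actually run for the paper: it assumes NumPy, the function names and the six raw profiles are hypothetical, and classical (Torgerson) MDS is swapped in for SAS PROC ALSCAL, which uses a different alternating least squares algorithm.

```python
import numpy as np

# Reference averages and SDs of the normal population, as listed in the text
# (column order: sodium, potassium, chloride, bicarbonate; all in meq/l).
REF_MEAN = np.array([140.0, 4.1, 101.0, 27.0])
REF_SD = np.array([2.0, 0.4, 2.5, 1.5])


def z_scores(raw):
    """Standardize raw results: Z = (X - average) / S."""
    return (np.asarray(raw, dtype=float) - REF_MEAN) / REF_SD


def euclidean_distances(z):
    """Inter-subject distances: d_ij = sqrt(sum_r (Z_ir - Z_jr)^2)."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(d, k=2):
    """Classical (Torgerson) MDS; a stand-in for ALSCAL, not the same algorithm."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # double-centered squared distances
    vals, vecs = np.linalg.eigh(b)                # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]              # indices of the k largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# Hypothetical raw profiles for illustration only (not data from the paper).
raw = np.array([
    [140.0, 4.1, 101.0, 27.0],   # exactly at the reference averages
    [134.0, 5.3, 104.0, 18.0],   # low bicarbonate, high potassium
    [142.0, 3.9, 103.0, 26.0],
    [131.0, 5.9,  98.0, 15.0],
    [139.0, 4.3, 100.0, 28.5],
    [137.0, 4.0, 105.0, 24.0],
])

z = z_scores(raw)
d = euclidean_distances(z)
coords = classical_mds(d, k=2)                    # stimulus coordinates

# Regress each MDS dimension on the standardized tests (plus an intercept)
# to obtain test weights for interpreting the dimensions.
design = np.column_stack([z, np.ones(len(z))])
weights, *_ = np.linalg.lstsq(design, coords, rcond=None)
```

With Euclidean input distances, classical MDS coordinates are exact linear combinations of the centered test values, so the regression in this sketch fits perfectly. An iterative method such as ALSCAL need not give an exact fit.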
Patients with renal failure had lower sodium and bicarbonate values and higher potassium values than normal patients. The mean chloride was about the same for both groups, but renal patients had more variable chloride values.

TABLE II
Unrotated Principal Component Analysis Factor Pattern

                              Factor
Test                     1       2       3       4
Sodium                 0.89    0.18    0.36   -0.23
Chloride               0.68    0.70            0.23
Potassium             -0.65    0.36    0.66    0.06
Bicarbonate            0.52            0.32    0.18
Eigenvalue             1.94    1.24    0.68    0.14
Proportion             0.48    0.31    0.17    0.04
Cumulative Proportion  0.48    0.79    0.96    1.00
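As a rough cross-check, the eigenvalues and proportions in table II can be recomputed from the correlation matrix in table I. The sketch below assumes NumPy and uses the rounded correlations as printed, so it should agree with the table only to within rounding. The variable names are illustrative.

```python
import numpy as np

# Correlation matrix from table I
# (order: sodium, chloride, potassium, bicarbonate).
R = np.array([
    [ 1.00,  0.65, -0.29,  0.39],
    [ 0.65,  1.00, -0.23, -0.17],
    [-0.29, -0.23,  1.00, -0.38],
    [ 0.39, -0.17, -0.38,  1.00],
])

eigenvalues = np.linalg.eigvalsh(R)[::-1]      # sorted largest first
proportion = eigenvalues / eigenvalues.sum()   # share of total variance
cumulative = np.cumsum(proportion)
n_retained = int((eigenvalues > 1.0).sum())    # eigenvalue-greater-than-one rule

for k, (ev, p, c) in enumerate(zip(eigenvalues, proportion, cumulative), start=1):
    print(f"Factor {k}: eigenvalue {ev:.2f}, proportion {p:.2f}, cumulative {c:.2f}")
```

Because the eigenvalues of a correlation matrix sum to the number of variables (four here), each proportion is simply the eigenvalue divided by four.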

TABLE III
Two-Dimensional Rotated Principal Component Analysis

                  Varimax          Promax        Harris-Kaiser
Test             I      II       I      II       I      II
Sodium         0.80   0.43     0.78   0.36     0.32   0.78
Chloride       0.97  -0.10     0.98  -0.18    -0.23   0.99
Potassium     -0.28  -0.69    -0.23  -0.67    -0.66  -0.23
Bicarbonate   -0.10   0.92    -0.16   0.94     0.95  -0.17

The correlations between the electrolyte tests reflect several physiologic relationships (table I). For example, sodium and chloride have a high correlation (r = 0.65) reflecting a salt loss or gain. Potassium has an inverse relationship with bicarbonate (r = -0.38) reflecting hydrogen-potassium exchange. Sodium bicarbonate excretion (r = 0.39) is important in the maintenance of acid-base homeostasis.

Principal component analysis was applied to the electrolyte profiles of renal and normal patients. The factor pattern of the unrotated solution indicated that the first two factors had eigenvalues greater than one and explained 79 percent of the variance (table II). A scree plot (eigenvalue vs. factor number) revealed no significant change in slope and, hence, was not useful in determining the dimensionality. Since simple structure was not present in the unrotated PCA solution, the orthogonal varimax and the oblique promax and Harris-Kaiser rotation methods were analyzed for two factors. The oblique rotated solutions did not significantly improve the simple structure of the factor pattern when compared to the orthogonal varimax rotation (table III). Factor 1 had positive salient loadings for sodium and chloride, which could be interpreted as a salt dimension. Factor 2 had a positive salient loading for bicarbonate and a negative salient loading for potassium, which could be seen as an acid-base (pH) dimension. However, sodium also had a positive loading on Factor 2, probably as a result of the sodium-bicarbonate relationship.
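To illustrate what an orthogonal rotation does to a two-factor pattern, the sketch below applies a common SVD-based varimax iteration to loadings derived from the table I correlation matrix. It assumes NumPy and is not the SAS PROC FACTOR implementation. It omits the Kaiser row normalization that statistical packages often apply, and the sign and order of the rotated columns are arbitrary, so the output need not match table III exactly.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation (no Kaiser normalization); returns (rotated, rotation)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like term of the varimax criterion for the current rotation.
        grad = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        # The SVD gives the closest orthogonal matrix to the update direction.
        u, s, vt = np.linalg.svd(loadings.T @ grad)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


# Correlation matrix from table I
# (order: sodium, chloride, potassium, bicarbonate).
R = np.array([
    [ 1.00,  0.65, -0.29,  0.39],
    [ 0.65,  1.00, -0.23, -0.17],
    [-0.29, -0.23,  1.00, -0.38],
    [ 0.39, -0.17, -0.38,  1.00],
])

# Unrotated loadings for the first two components:
# eigenvector scaled by the square root of its eigenvalue.
vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1][:2]
loadings = vecs[:, order] * np.sqrt(vals[order])

rotated, rotation = varimax(loadings)
print(np.round(rotated, 2))
```

An orthogonal rotation redistributes variance between the factors but leaves each test's communality (the row sum of squared loadings) unchanged, which gives a simple sanity check on any implementation.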
Multidimensional scaling was also done on the electrolyte profiles of the renal and normal patients. The Kruskal stress coefficient (goodness of fit function) was reduced (R² increased) significantly from a one-dimensional to a two-dimensional solution (stress 0.341 to 0.118; R² 0.726 to 0.954), indicating that the two-dimensional solution was optimal. Multiple linear regression, using the stimulus (dimension) coordinates as dependent variables and the test values as independent variables, was done to interpret the dimensions (table IV). Sodium and bicarbonate had positive weights and potassium had a negative weight on Factor 1. On Factor 2, sodium and chloride had positive weights and bicarbonate had a negative weight. Sodium loaded on both factors as in the PCA solution. Potassium, which loaded only on Factor 1, was important in the separation of renal from normal patients. It appears that Factor 1 was an acid-base (pH) scale. Factor 2 may be interpreted as an ion balance scale.

TABLE IV
Multidimensional Scaling Analysis

Dimensional Goodness of Fit Test

Number of Dimensions   Stress    R²
1                      0.341    0.726
2                      0.118    0.954
3                      0.031    0.996

Two-Dimensional Multiple Linear Regression Weights

Variable       Dimension 1   Dimension 2
Sodium            0.198         0.194
Chloride          0.065         0.207
Potassium        -0.092         0.003
Bicarbonate       0.250        -0.203
Intercept         0.782        -0.563
Multiple          0.998         0.993

Multidimensional scaling (MDS) and PCA were compared for their ability to discriminate patients with renal failure from normal people. For the one-dimensional solution, MDS classified the patients better than PCA, seen graphically as less overlap between patient groups (figure 1). Principal component analysis did not separate the renal from normal patients as well as MDS in the two-dimensional solution (figures 2 and 3). It appears that acidosis was important in separating the two groups. Discriminant analysis or cluster analysis could also be used to classify patients, but these techniques would not readily explain the interrelationships among the variables.

Figure 1. One-dimensional plot of factor (stimulus) scores of 22 renal and 22 normal patients for principal component analysis and multidimensional scaling.

Figure 2. Two-dimensional plot of factor scores for renal and normal patients using principal component analysis with varimax rotation. Axes: Factor I (horizontal), Factor II (vertical).

Figure 3. Two-dimensional plot of multidimensional scaling stimulus scores for renal and normal patients. Axes: Factor I (horizontal), Factor II (vertical).

Principal component analysis (PCA) and MDS were applied to a profile of chemistry tests to reduce the dimensionality of the variable set and to discriminate two patient groups. Both MDS and PCA accomplished the reduction in dimensionality but gave different interpretations of the dimensions; MDS separated the two patient groups better than PCA. Multidimensional scaling is frequently applied to the social sciences but is rarely applied to laboratory medicine. Witte used MDS for data reduction to predict bone marrow findings from tests performed in peripheral blood.[8] Gattaz applied MDS to separate schizophrenic patients from normal patients by obtaining a two-dimensional representation of 17 cerebrospinal substances.[3] Multidimensional scaling and principal component analysis can reduce a large number of variables to a few significant variables in order to simplify data analysis.

References

1. Boyd, J. C.: Use of methods of pattern recognition to assist in test selection and test interpretation. Clinics Lab Med. 2:717-734, 1982.
2. Dixon, W. J., ed.: BMDP Statistical Software Manual. Berkeley, CA, University of California Press, 1985.
3. Gattaz, W. F., Gasser, T., and Beckmann, H.: Multidimensional analysis of the concentration of 17 substances in the CSF of schizophrenics and controls. Biol. Psychiatry 20:360-366, 1985.
4. Gorsuch, R. L.: Factor Analysis, 2nd ed. Hillsdale, NJ, Lawrence Erlbaum Associates, Inc., 1983.
5. Kruskal, J. B. and Wish, M.: Multidimensional Scaling. Beverly Hills, CA, Sage Publications, Inc., 1978.
6. SAS User's Guide: Statistics, 5th ed. Cary, NC, SAS Institute, Inc., 1985.
7. Schiffman, S., Reynolds, M., and Young, F.: Introduction to Multidimensional Scaling. Orlando, FL, Academic Press, 1981.
8. Witte, D. L., Kraemer, D. F., Johnson, G. F., Dick, F. R., and Hamilton, H.: Prediction of bone marrow iron findings from tests performed on peripheral blood. Am. J. Clin. Pathol. 55:202-206, 1986.