Comparison of Goodness of Fit Statistics for Linear Regression, Part II

The authors continue their discussion of the correlation coefficient in developing a calibration for quantitative analysis.

Jerome Workman Jr. and Howard Mark

Jerome Workman Jr. serves on the Editorial Advisory Board of Spectroscopy and is chief technical officer and vice president of research and engineering for Argose, Inc. (Waltham, MA). He can be reached by e-mail at jworkman@argose.com. Howard Mark serves on the Editorial Advisory Board of Spectroscopy and runs a consulting service, Mark Electronics (69 Jamie Court, Suffern, NY 10901). He can be reached via e-mail at hlmark@prodigy.net.

This column is a continuation of our previous column describing the use of goodness of fit statistical parameters (1). When developing a calibration for quantitative analysis, one must select the analyte range over which the calibration is performed. For a given standard error of analysis, the size of the range will have a direct effect on the magnitude of the correlation coefficient. The standard deviation of Y also has a direct effect, demonstrated by noting the computation for correlation between X and Y, in matrix notation, denoted as

$$r = \frac{\mathrm{covar}(X, Y)}{\mathrm{stdev}(X)\,\mathrm{stdev}(Y)} \qquad [1]$$

Note for this example that covar(X, Y) represents the covariance of (X, Y), stdev(X) is the standard deviation of the X data, and stdev(Y) is the standard deviation of the Y data. For the MathCad program (MathCad software, MathSoft Engineering & Education, Inc., Cambridge, MA), stdev(X) is represented by the variable symbol Sr, which can be thought of as the set of many possible standard deviations for a set of data X. Thus, a comparison of the correlation coefficient between two or more sets of X, Y data pairs cannot be performed adequately unless the standard deviations of the two data sets are nearly identical or unless the correlation coefficient confidence limits for the data sets are compared.
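The effect of the analyte range on equation 1 can be sketched numerically. The following is a minimal illustration, not the authors' MathCad worksheet: the data, the shared residual pattern, and the function name `correlation` are all hypothetical, and divide-by-n forms are assumed for the covariance and standard deviations (the ratio is unchanged if n - 1 is used consistently).

```python
import numpy as np

def correlation(x, y):
    """Equation 1: covar(X, Y) / (stdev(X) * stdev(Y)), divide-by-n forms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    covar = np.mean((x - x.mean()) * (y - y.mean()))
    return covar / (x.std() * y.std())  # np.std defaults to divide-by-n

# Hypothetical error pattern, identical for both sets (same size of error).
residuals = np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])

x_narrow = np.linspace(1.0, 2.0, 6)    # narrow analyte range
x_wide = np.linspace(1.0, 11.0, 6)     # wide analyte range

r_narrow = correlation(x_narrow, x_narrow + residuals)
r_wide = correlation(x_wide, x_wide + residuals)

print(f"narrow range: stdev(X) = {x_narrow.std():.3f}, r = {r_narrow:.4f}")
print(f"wide range:   stdev(X) = {x_wide.std():.3f}, r = {r_wide:.4f}")
```

Both sets carry the same errors, yet the wider range yields the larger r, which is why the two r values cannot be compared directly.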
In summary, if Set A of X, Y paired data has a correlation of 0.95, this does not necessarily indicate that it is more highly correlated than Set B of X, Y paired data with a correlation of 0.9. The meaning of this will be described in greater detail later.

Let us look at seven slightly different equations (r1 through r7, or equations 6 through 12) for calculating correlation between X (known concentration or analyte data for a set of standards) and Y (instrument measured data for those standards) using MathCad function or summation notation nomenclature. First we must define the calculation of the standard error of performance, also termed the standard error of prediction (SEP), and the calculations for the slope (K1) and the intercept (K0) for the linear regression line between X and Y. The regression line for estimating the concentration, denoted by PredX or X̂, is given as

$$\mathrm{PredX} = \hat{X} = K_1 Y + K_0 \qquad [2]$$

The standard error of performance, which represents an estimate of the prediction error (1 sigma) for a regression line, is given as

$$\mathrm{SEP} = \sqrt{\frac{\sum (\hat{X} - X)^2}{n}} \qquad [3]$$

June 2004 19(6) Spectroscopy 29
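As a rough illustration of equations 2 and 3, the sketch below fits a line predicting X from Y and computes the SEP. The data are hypothetical, and `np.polyfit` is used here only as a convenient least-squares fit; it is an assumption of this sketch, not the authors' MathCad procedure.

```python
import numpy as np

# Hypothetical calibration set (not the column's data).
X = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])   # known concentrations
Y = np.array([0.1, 1.9, 4.2, 5.8, 8.1, 9.9])    # instrument readings

# Least-squares line predicting X from Y; polyfit returns [slope, intercept].
K1, K0 = np.polyfit(Y, X, deg=1)

pred_x = K1 * Y + K0                                   # equation 2
sep = np.sqrt(np.sum((pred_x - X) ** 2) / len(X))      # equation 3, divide by n

print(f"K1 = {K1:.4f}, K0 = {K0:.4f}, SEP = {sep:.4f}")
```

Note that equation 3 as printed divides by n; some texts divide by the degrees of freedom instead, so the recipe should be checked before SEP values from different sources are compared.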
The slope (K1) and intercept (K0) of this regression line are given as

$$K_1 = \frac{n \sum (Y X) - \sum Y \sum X}{n \sum Y^2 - \left(\sum Y\right)^2} \qquad [4]$$

$$K_0 = \frac{\sum Y^2 \sum X - \sum Y \sum (Y X)}{n \sum Y^2 - \left(\sum Y\right)^2} \qquad [5]$$

The seven ways (r1 through r7) for calculating correlation as the square root of the ratio of the explained variation over the total variation between X (concentration of analyte data) and Y (measured data) are described using many notational forms. For example, many software packages provide built-in functions capable of calculating the coefficient of correlation directly from a pair of X and Y vectors, as given by r1 (equation 6). (This is the built-in MathCad correlation function.)

$$r_1 = \mathrm{corr}(X, Y) \qquad [6]$$

Several software packages contain simple command lines for performing matrix computations directly and thus are capable of conveniently computing the correlation coefficient, as shown in r2 (equation 7).

$$r_2 = \frac{\mathrm{covar}(X, Y)}{\mathrm{stdev}(X)\,\mathrm{stdev}(Y)} \qquad [7]$$

Equation 7 denotes the ratio of the covariance of X on Y to the standard deviation of X times the standard deviation of Y, where X and Y are vectors. If the software is capable of using summation notation, then one can use this algebraic form for calculating the correlation, as in r3 and r4 (equations 8 and 9, respectively).

$$r_3 = \sqrt{\frac{\sum (\hat{X} - \bar{X})^2}{\sum (X - \bar{X})^2}} \qquad [8]$$

Equation 8 is the square root of the ratio comprising the sum of the squared differences between each predicted X and the mean of all X, to the sum of the squared differences between all individual X values and the mean of all X.

Figure 1. Plots of correlation coefficient versus the standard deviation of the samples used for calibration (Sr) with a standard error of estimate of 0.1. Panels (a) and (b).

30 Spectroscopy 19(6) June 2004 www.spectroscopyonline.com
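A minimal sketch of equations 4 and 5, together with the seven correlation forms the column lists as equations 6 through 12, is given here on hypothetical data (not the data set from reference 2). `np.corrcoef` stands in for the built-in MathCad correlation function, and divide-by-n standard deviations are assumed so that the SEP of equation 3 and stdev(X) share the same denominator.

```python
import numpy as np

# Hypothetical calibration data.
X = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])   # known concentrations
Y = np.array([0.1, 1.9, 4.2, 5.8, 8.1, 9.9])    # instrument readings
n = len(X)

# Equations 4 and 5: slope and intercept for predicting X from Y.
denom = n * np.sum(Y**2) - np.sum(Y) ** 2
K1 = (n * np.sum(Y * X) - np.sum(Y) * np.sum(X)) / denom
K0 = (np.sum(Y**2) * np.sum(X) - np.sum(Y) * np.sum(Y * X)) / denom

pred_x = K1 * Y + K0                              # equation 2
sep = np.sqrt(np.sum((pred_x - X) ** 2) / n)      # equation 3
sd_x, sd_y = X.std(), Y.std()                     # divide-by-n forms
covar = np.mean((X - X.mean()) * (Y - Y.mean()))
ss_x = np.sum((X - X.mean()) ** 2)                # total variation in X
ss_y = np.sum((Y - Y.mean()) ** 2)

r = {
    "r1": np.corrcoef(X, Y)[0, 1],                                    # eq. 6
    "r2": covar / (sd_x * sd_y),                                      # eq. 7
    "r3": np.sqrt(np.sum((pred_x - X.mean()) ** 2) / ss_x),           # eq. 8
    "r4": np.sqrt(1 - np.sum((pred_x - X) ** 2) / ss_x),              # eq. 9
    "r5": np.sqrt(1 - sep**2 / sd_x**2),                              # eq. 10
    "r6": np.sqrt(1 - (sep / sd_x) ** 2),                             # eq. 11
    "r7": np.sum((X - X.mean()) * (Y - Y.mean())) / np.sqrt(ss_x * ss_y),  # eq. 12
}
for name, value in r.items():
    print(f"{name} = {value:.12f}")
```

The square-root forms (r3 through r6) return the magnitude of r only, so they match r1, r2, and r7 exactly when the slope is positive, as it is here.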
$$r_4 = \sqrt{1 - \frac{\sum (\hat{X} - X)^2}{\sum (X - \bar{X})^2}} \qquad [9]$$

Equation 9 denotes the square root of one minus the ratio comprising the sum of the squared differences between each predicted X and its corresponding X, to the sum of the squared differences between all individual X values and the mean of all X.

And if the software allows you to assign variable names as needed for specific computations, such as standard error of performance or standard deviations, then you can proceed to use computational descriptions such as r5 and r6 (equations 10 and 11, respectively) to compute the correlation.

$$r_5 = \sqrt{1 - \frac{\mathrm{SEP}^2}{(\mathrm{stdev}\,X)^2}} \qquad [10]$$

Equation 10 indicates that the correlation coefficient is represented by the square root of one minus the ratio comprising the square of the standard error of performance, to the square of the standard deviation of all X.

$$r_6 = \sqrt{1 - \left(\frac{\mathrm{SEP}}{\mathrm{stdev}\,X}\right)^2} \qquad [11]$$

Equation 11 is simply the algebraic equivalent of equation 10. Other computational methods for correlation are given in reference 2, page 5 (as shown in equation 12).

$$r_7 = \frac{\sum \left\{ (x - \bar{x})(y - \bar{y}) \right\}}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \qquad [12]$$

Figure 1c. Correlation coefficient versus Sr (panel c of Figure 1).

Figure 2. Plot of coefficient of determination (R²[Sr]) versus the standard deviation of the samples used for calibration.

You might be surprised that for our example data from reference 2, page 6, the correlation coefficient calculated using any of these methods of computation for the r-value is 0.9988795653485. When we evaluate the correlation computation, we see that, given a relatively equivalent prediction error represented by the standard error of performance, the standard deviation of the data set (X) determines the magnitude of the correlation coefficient. This is illustrated using Figures 1a and 1b. These graphics allow the correlation coefficient to be displayed for any specified standard error of prediction, also occasionally denoted as the standard error of estimate (SEE). It should be obvious that for any statistical study one must compare the actual computational
recipes used to make a calculation, rather than rely on nonstandard terminology and assume that the computations are what one expected.

For a graphical comparison of the correlation (r[Sr]) and the standard deviation of the samples used for calibration (Sr), a value is entered for the standard error of performance for a specified analyte range, as indicated through the standard deviation of that range. The resultant graphic displays Sr (as the abscissa) versus r (as the ordinate). From this graphic it can be seen how, at a constant standard error of performance, the correlation coefficient increases as the standard deviation of the data increases. Thus, when comparing correlation results for analytical methods, one must consider carefully the standard deviation of the analyte values for the samples used in order to make a fair comparison. For the example shown, the standard error of estimate is set to 0.1, while the correlation is scaled from 0 to 1 for Sr values up to 4.0. Figure 1b shows the correlation range above 0.99 for the plot in Figure 1a. Note that the correlation begins to flatten when Sr is more than an order of magnitude larger than the standard error of estimate.

Note from Figure 1c that at a certain value for the standard deviation of X (denoted as Sr), a small change in Sr results in a large apparent change in the correlation. For example, in this case where the standard error of estimate is set to 0.1, the correlation changes from 0.86 to 0.95 when Sr is changed only from 0.2 to 0.3. As is the general case, using correlation to compare analytical methods requires identical sample analyte standard deviations, or comparison of the confidence limits for the correlation coefficients, to interpret the significance of the different correlation values.

For a graphical comparison of the coefficient of determination (R²) and Sr, a value is entered for the standard error of estimate for a specified range of Sr. The resultant graphic (Figure 2) displays Sr (abscissa) versus R² (ordinate).
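The curves described for Figures 1 and 2 follow from equation 10 once the standard error is held fixed. The sketch below tabulates r and R² for a few Sr values; the function name `corr_from_errors` and the chosen Sr grid are hypothetical, and SEE is used in place of SEP on the assumption, taken from the column's own wording, that the two terms are interchangeable here.

```python
import numpy as np

def corr_from_errors(see, sr):
    """Equation 10 with SEE in place of SEP: r = sqrt(1 - (SEE / Sr)^2).

    Only meaningful for Sr >= SEE.
    """
    see = np.asarray(see, dtype=float)
    sr = np.asarray(sr, dtype=float)
    return np.sqrt(1.0 - (see / sr) ** 2)

SEE = 0.1                                            # fixed standard error
sr_values = np.array([0.2, 0.3, 0.6, 1.0, 2.0, 4.0])  # illustrative Sr grid
r_values = corr_from_errors(SEE, sr_values)

for sr, r in zip(sr_values, r_values):
    print(f"Sr = {sr:4.1f}  Sr/SEE = {sr / SEE:5.1f}  r = {r:.4f}  R^2 = {r**2:.4f}")
```

With SEE = 0.1 this gives roughly 0.87 at Sr = 0.2 and 0.94 at Sr = 0.3, in line with the steep change the column reads from Figure 1c, while the values beyond Sr = 1.0 (ten times SEE) all exceed 0.99 and change very little.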
From Figure 2 it can be seen how the coefficient of determination increases as the standard deviation of the data increases. The standard error of estimate is set at 0.1, as in the examples shown in Figures 1a and 1b. Note that the same recommendation holds whether using r or R²: relative comparisons for this statistic should not be used unless the standard deviations of the comparative data sets are identical.

Figure 3. Plot of correlation coefficient versus the ratio of Sr/SEE.

Figure 4. Plot of coefficient of determination versus standard error of estimate.

Figure 3 shows the relative ratio of the range (Sr) to the standard error of estimate (abscissa) as compared with the correlation coefficient r as the
ordinate. This graph shows that the correlation coefficient continues to increase with the ratio Sr/SEE even at large values of the ratio, although once Sr is more than an order of magnitude larger than SEE there is not much improvement in the correlation. A graphical comparison of r versus the standard error of estimate is shown in Figure 4. This graphic clearly shows that when Sr is held constant (Sr = 4), the correlation decreases as the standard error of estimate increases. Figure 5 shows the relationship between correlation and the ratio SEE/Sr: as the standard error of estimate increases relative to Sr, the correlation decreases rapidly.

Figure 5. Plot of correlation coefficient versus the ratio of the standard error of estimate and standard deviation of the samples used for calibration.

We have introduced several common methods for calculating the correlation coefficient between a set of paired X and Y data. During this discussion we have shown that the absolute values for correlation are quite dependent upon the standard deviation of the ranges for these data. Likewise, the magnitude of the standard error of performance (or standard error of estimate) is also important: the correlation changes when its magnitude changes relative to the standard deviation (or range) of the data. Thus it is important that the data ranges be equivalent when simply comparing absolute values for correlation. In future columns, we will calculate confidence limits for comparing these statistical parameters, including considerations for varying sample size.

References
1. J. Workman and H. Mark, Spectroscopy 19(4), 38 (2004).
2. J.C. Miller and J.N. Miller, Statistics for Analytical Chemistry, 2nd ed. (Ellis Horwood, New York, 1992).

Note: The authors have received some error notices regarding the recent series of columns that discussed derivatives; the results presented should therefore not be used without verification. Corrections will be published as the errors are verified and as the publishing schedule permits.
The authors apologize for any inconvenience.