
Comparison of Goodness of Fit Statistics for Linear Regression, Part II

The authors continue their discussion of the correlation coefficient in developing a calibration for quantitative analysis.

Jerome Workman Jr. and Howard Mark

Spectroscopy 19(6), June 2004

This column is a continuation of our previous column describing the use of goodness of fit statistical parameters (1). When developing a calibration for quantitative analysis, one must select the analyte range over which the calibration is performed. For a given standard error of analysis, the size of the range will have a direct effect on the magnitude of the correlation coefficient. The standard deviation of Y also has a direct effect, as can be seen from the computation of the correlation between X and Y, written in matrix notation as

$$r = \frac{\operatorname{covar}(X, Y)}{\operatorname{stdev}(X)\,\operatorname{stdev}(Y)} \qquad [1]$$

Note for this example that covar(X, Y) represents the covariance of (X, Y), stdev(X) is the standard deviation of the X data, and stdev(Y) is the standard deviation of the Y data. For the MathCad program (MathCad software, MathSoft Engineering & Education, Inc., Cambridge, MA), stdev(X) is represented by the variable symbol Sr, which can be thought of as the set of many possible standard deviations for a set of data X. Thus, a comparison of the correlation coefficient between two or more sets of X, Y data pairs cannot be performed adequately unless the standard deviations of the data sets are nearly identical or unless the confidence limits of the correlation coefficients for the data sets are compared.

Jerome Workman Jr. serves on the Editorial Advisory Board of Spectroscopy and is chief technical officer and vice president of research and engineering for Argose, Inc. (Waltham, MA). He can be reached by e-mail at jworkman@argose.com. Howard Mark serves on the Editorial Advisory Board of Spectroscopy and runs a consulting service, Mark Electronics (69 Jamie Court, Suffern, NY 10901). He can be reached via e-mail at hlmark@prodigy.net.
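Equation 1 can be checked numerically. The Python/NumPy sketch below is our own illustration, not the authors' MathCad worksheet: it assumes population moments (division by n) for covar and stdev, which we take to be MathCad's stdev convention, and the data are hypothetical. The helper name corr_eq1 is ours. The second part illustrates the range effect described above by adding the same error pattern to a narrow and to a wide set of X values; the wider range gives the larger r.

```python
import numpy as np

def corr_eq1(x, y):
    """Equation 1: r = covar(X, Y) / (stdev(X) * stdev(Y)).

    Population moments (divide by n) are used in all three terms; because
    n cancels, using n - 1 throughout would give the same r.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    covar = np.mean((x - x.mean()) * (y - y.mean()))
    return covar / (x.std() * y.std())  # ndarray.std() defaults to ddof=0

# Hypothetical illustration of the range effect: the same error pattern
# (symmetric, mean zero) is added to a narrow and to a wide set of X values.
base = np.linspace(-1.0, 1.0, 7)
errors = np.array([0.1, -0.05, -0.05, 0.0, -0.05, -0.05, 0.1])
narrow_x = 5.0 + 1.0 * base
wide_x = 5.0 + 4.0 * base

r_narrow = corr_eq1(narrow_x, narrow_x + errors)
r_wide = corr_eq1(wide_x, wide_x + errors)
print(f"narrow range: r = {r_narrow:.5f}; wide range: r = {r_wide:.5f}")
```

The scatter about the line is identical in both cases; only the spread of X differs, yet the two r values differ.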
In summary, if Set A of X, Y paired data has a correlation of 0.95, this does not necessarily indicate that it is more highly correlated than Set B of X, Y paired data with a correlation of 0.9. The meaning of this will be described in greater detail later. Let us look at seven slightly different equations (r1 through r7, or equations 6–12) for calculating the correlation between X (known concentration or analyte data for a set of standards) and Y (instrument-measured data for those standards) using MathCad function or summation notation nomenclature. First we must define the calculation of the standard error of performance, also termed the standard error of prediction (SEP), and the calculations for the slope (K1) and the intercept (K0) of the linear regression line between X and Y. The regression line for estimating the concentration, denoted by PredX or x̂, is given as

$$\text{PredX} = \hat{x} = K_1 Y + K_0 \qquad [2]$$

The standard error of performance, which represents an estimate of the prediction error (1 sigma) for a regression line, is given as

$$\text{SEP} = \sqrt{\frac{\sum (\hat{X} - X)^2}{n}} \qquad [3]$$
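A minimal sketch of equations 2 and 3, assuming NumPy and a small set of illustrative calibration values (an assumption of this sketch, not data listed in the column). Here np.polyfit(y, x, 1) is used only as a stand-in least-squares fit of X on Y to obtain K1 and K0; the function names predict_x and sep are our own.

```python
import numpy as np

def predict_x(y, k1, k0):
    """Equation 2: PredX = x_hat = K1*Y + K0 (concentration predicted from the reading)."""
    return k1 * np.asarray(y, dtype=float) + k0

def sep(x_hat, x):
    """Equation 3: SEP = sqrt(sum((x_hat - x)**2) / n), with n in the denominator as written."""
    x_hat = np.asarray(x_hat, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.sum((x_hat - x) ** 2) / x.size))

# Illustrative calibration set: X = known concentrations, Y = instrument readings.
x = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([2.1, 5.0, 9.0, 12.6, 17.3, 21.0, 24.7])

# Least-squares regression of X on Y; for deg=1, np.polyfit returns [slope, intercept].
k1, k0 = np.polyfit(y, x, 1)
x_hat = predict_x(y, k1, k0)
print(f"K1 = {k1:.5f}, K0 = {k0:.5f}, SEP = {sep(x_hat, x):.5f}")
```

Because equation 3 divides by n, SEP here is the root-mean-square prediction error; texts that divide by n − 1 or n − 2 instead will report a somewhat larger value for small n.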

The slope (K1) and intercept (K0) of this regression line are given as

$$K_1 = \frac{n\sum YX - \sum Y \sum X}{n\sum Y^2 - \left(\sum Y\right)^2} \qquad [4]$$

$$K_0 = \frac{\sum Y^2 \sum X - \sum Y \sum YX}{n\sum Y^2 - \left(\sum Y\right)^2} \qquad [5]$$

The seven ways (r1 through r7) of calculating correlation as the square root of the ratio of the explained variation to the total variation between X (concentration of analyte data) and Y (measured data) are described using many notational forms. For example, many software packages provide built-in functions capable of calculating the coefficient of correlation directly from a pair of X and Y vectors, as given by r1 (equation 6). (This is the built-in MathCad correlation function.)

$$r_1 = \operatorname{corr}(X, Y) \qquad [6]$$

Several software packages contain simple command lines for performing matrix computations directly and thus are capable of conveniently computing the correlation coefficient, as shown in r2 (equation 7).

$$r_2 = \frac{\operatorname{covar}(X, Y)}{\operatorname{stdev}(X)\,\operatorname{stdev}(Y)} \qquad [7]$$

Equation 7 denotes the ratio of the covariance of X and Y to the standard deviation of X times the standard deviation of Y, where X and Y are vectors. If the software is capable of using summation notation, then one can use this algebraic form for calculating the correlation, as in r3 and r4 (equations 8 and 9, respectively).

$$r_3 = \sqrt{\frac{\sum (\hat{X} - \bar{X})^2}{\sum (X - \bar{X})^2}} \qquad [8]$$

Equation 8 is the square root of the ratio of the sum of the squared differences between each predicted X and the mean of all X to the sum of the squared differences between all individual X values and the mean of all X.

[Figure 1 (panels a, b, and c). Plots of correlation coefficient versus the standard deviation of the samples used for calibration with a standard error of estimate of 0.1.]
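The summation forms of equations 4 and 5 and the first three correlation formulas (equations 6 to 8) can be sketched as follows. This is an illustration under stated assumptions: NumPy's corrcoef stands in for MathCad's built-in corr, population moments are used in equation 7, and the calibration values are illustrative rather than taken from the column.

```python
import numpy as np

# Illustrative calibration values (an assumption of this sketch).
x = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0])   # known concentrations (X)
y = np.array([2.1, 5.0, 9.0, 12.6, 17.3, 21.0, 24.7])  # instrument readings (Y)
n = x.size

# Equations 4 and 5: slope K1 and intercept K0 of X regressed on Y, in summation form.
den = n * np.sum(y**2) - np.sum(y) ** 2
k1 = (n * np.sum(y * x) - np.sum(y) * np.sum(x)) / den
k0 = (np.sum(y**2) * np.sum(x) - np.sum(y) * np.sum(y * x)) / den
x_hat = k1 * y + k0  # Equation 2

# Equation 6: a library correlation function (NumPy's corrcoef in place of MathCad's corr).
r1 = np.corrcoef(x, y)[0, 1]
# Equation 7: covariance over the product of standard deviations (population moments).
r2 = np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())
# Equation 8: square root of explained variation over total variation.
r3 = np.sqrt(np.sum((x_hat - x.mean()) ** 2) / np.sum((x - x.mean()) ** 2))

print(f"K1 = {k1:.5f}, K0 = {k0:.5f}")
print(f"r1 = {r1:.10f}, r2 = {r2:.10f}, r3 = {r3:.10f}")
```

One design point worth noting: r3, like any square-root form, can only return a non-negative number, so it reports |r| and cannot reveal a negative correlation, whereas r1 and r2 keep the sign.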

$$r_4 = \sqrt{1 - \frac{\sum (\hat{X} - X)^2}{\sum (X - \bar{X})^2}} \qquad [9]$$

Equation 9 denotes the square root of one minus the ratio of the sum of the squared differences between each predicted X and its corresponding X to the sum of the squared differences between all individual X values and the mean of all X.

And if the software allows you to assign variable names as needed for specific computations, such as standard error of performance or standard deviations, then you can proceed to use computational descriptions such as r5 and r6 (equations 10 and 11, respectively) to compute the correlation.

$$r_5 = \sqrt{1 - \frac{\text{SEP}^2}{\left(\operatorname{stdev}(X)\right)^2}} \qquad [10]$$

Equation 10 indicates that the correlation coefficient is represented by the square root of one minus the ratio of the square of the standard error of performance to the square of the standard deviation of all X.

$$r_6 = \sqrt{1 - \left(\frac{\text{SEP}}{\operatorname{stdev}(X)}\right)^2} \qquad [11]$$

Equation 11 is simply the algebraic equivalent of equation 10. Other computational methods for correlation are given in reference 2 (as shown in equation 12).

$$r_7 = \frac{\sum \left\{(x - \bar{x})(y - \bar{y})\right\}}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \qquad [12]$$

You might be surprised to find that, for our example data from reference 2, every one of these methods of computation gives the same r-value: 0.9988795653485. When we evaluate the correlation computation, we see that, given a relatively equivalent prediction error represented by the standard error of performance, the standard deviation of the data set (X) determines the magnitude of the correlation coefficient. This is illustrated in Figures 1a and 1b. These graphics allow the correlation coefficient to be displayed for any specified standard error of prediction, also occasionally denoted as the standard error of estimate (SEE). It should be obvious that for any statistical study one must compare the actual computational recipes used to make a calculation, rather than rely on nonstandard terminology and assume that the computations are what one expected.

[Figure 2. Plot of coefficient of determination versus the standard deviation of the samples used for calibration.]

For a graphical comparison of the correlation (r[Sr]) and the standard deviation of the samples used for calibration (Sr), a value is entered for the standard error of performance for a specified analyte range, as indicated through the standard deviation of that range. The resultant graphic displays Sr (as the abscissa) versus r (as the ordinate). From this graphic it can be seen how the correlation coefficient increases, at a constant standard error of performance, as the standard deviation of the data increases. Thus, when comparing correlation results for analytical methods, one must carefully consider the standard deviation of the analyte values for the samples used in order to make a fair comparison. For the example shown, the standard error of estimate is set to 0.1, while the correlation is scaled from 0.0 to 1.0 for Sr values from 0.0 to 4.0. Figure 1b shows the correlation range above 0.99 for the plot in Figure 1a. Note that the correlation begins to flatten when Sr is more than an order of magnitude larger than the standard error of the estimate. Note from Figure 1c that at a certain value of the standard deviation of X (denoted as Sr), a small change in Sr results in a large apparent change in the correlation. For example, in this case, where the standard error of the estimate is set to 0.1, the correlation changes from 0.86 to 0.95 when Sr is changed only from 0.2 to 0.3. As is generally the case, using correlation to compare analytical methods requires identical sample analyte standard deviations, or a comparison of the confidence limits for the correlation coefficients, to interpret the significance of the different correlation values. For a graphical comparison of the coefficient of determination (R²) and Sr, a value is entered for the standard error of estimate for a specified range of Sr. The resultant graphic (Figure 2) displays Sr (abscissa) versus R² (ordinate).
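The claim that all seven forms agree can be checked for the remaining four (equations 9 to 12), and the curves in the figures can be sketched from equation 11. In this hedged sketch, np.polyfit supplies the same least-squares line as equations 4 and 5, the calibration values are illustrative, stdev is taken as the population standard deviation (division by n), and the helper r_curve is our own. It assumes the plotted curves follow equation 11 with SEE in place of SEP, that is, r = sqrt(1 − (SEE/Sr)²), so that r depends only on the ratio SEE/Sr.

```python
import numpy as np

# Illustrative calibration values (an assumption of this sketch).
x = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0])   # known concentrations (X)
y = np.array([2.1, 5.0, 9.0, 12.6, 17.3, 21.0, 24.7])  # instrument readings (Y)
n = x.size

k1, k0 = np.polyfit(y, x, 1)                 # least-squares line of X on Y
x_hat = k1 * y + k0                          # Equation 2
sep = np.sqrt(np.sum((x_hat - x) ** 2) / n)  # Equation 3
stdev_x = x.std()                            # population standard deviation (divides by n)

dx, dy = x - x.mean(), y - y.mean()
r4 = np.sqrt(1 - np.sum((x_hat - x) ** 2) / np.sum(dx**2))     # Equation 9
r5 = np.sqrt(1 - sep**2 / stdev_x**2)                           # Equation 10
r6 = np.sqrt(1 - (sep / stdev_x) ** 2)                          # Equation 11
r7 = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))  # Equation 12
print(f"r4 = {r4:.13f}, r5 = {r5:.13f}, r6 = {r6:.13f}, r7 = {r7:.13f}")

# Pairing the n-based SEP with an (n - 1)-based standard deviation breaks the
# agreement: equation 10 then returns a slightly larger value.
r5_mismatched = np.sqrt(1 - sep**2 / x.std(ddof=1) ** 2)

def r_curve(sr, see):
    """Equation 11 with SEE in place of SEP: r = sqrt(1 - (SEE / Sr)**2).
    Returns NaN where SEE > Sr, since no real r exists there."""
    ratio = np.asarray(see, dtype=float) / np.asarray(sr, dtype=float)
    with np.errstate(invalid="ignore"):  # sqrt of a negative number -> NaN
        return np.sqrt(1.0 - ratio**2)

r_vs_sr = r_curve(np.array([0.2, 0.5, 1.0, 4.0]), 0.1)   # Sr varied, SEE fixed at 0.1
r_vs_see = r_curve(4.0, np.array([0.5, 1.0, 2.0, 3.0]))  # SEE varied, Sr fixed at 4
print("r vs Sr:", np.round(r_vs_sr, 4), " R^2 vs Sr:", np.round(r_vs_sr**2, 4))
print("r vs SEE:", np.round(r_vs_see, 4))
```

The agreement of equation 10 with the other forms rests on SEP and stdev(X) sharing the same denominator n, which is why the mismatched variant differs. Because r_curve depends only on SEE/Sr, the same function also describes plots against the ratio Sr/SEE or SEE/Sr.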
From Figure 2 it can be seen that the coefficient of determination increases as the standard deviation of the data increases. The standard error of estimate is set at 0.1, as in the examples shown in Figures 1a and 1b. Note that the same recommendation holds whether using r or R²: relative comparisons of this statistic should not be made unless the standard deviations of the data sets being compared are identical. Figure 3 shows the ratio of the range (Sr) to the standard error of estimate (abscissa) compared with the correlation coefficient r as the ordinate. This graph shows that the correlation coefficient continues to increase with the ratio Sr/SEE even when the ratio exceeds 6. Note, however, that beyond a certain ratio there is not much further improvement in the correlation.

[Figure 3. Plot of correlation coefficient versus the ratio of Sr/SEE.]

A graphical comparison of r versus the standard error of estimate is shown in Figure 4. This graphic clearly shows that when Sr is held constant (Sr = 4), the correlation decreases as the standard error of estimate increases. Figure 5 shows the relationship between correlation and the ratio SEE/Sr: as the standard error of estimate increases relative to Sr, the correlation decreases rapidly.

[Figure 4. Plot of coefficient of determination versus standard error of estimate.]

[Figure 5. Plot of correlation coefficient versus the ratio of the standard error of estimate to the standard deviation of the samples used for calibration.]

We have introduced several common methods for calculating the correlation coefficient between a set of paired X and Y data. During this discussion we have shown that the absolute values for correlation are quite dependent upon the standard deviation of the ranges for these data. Likewise, the magnitude of the standard error of performance (or standard error of estimate) is also important: it affects the correlation when its magnitude changes relative to the standard deviation (or range) of the data. Thus it is important that the data ranges be equivalent when simply comparing absolute values for correlation. In future columns, we will calculate confidence limits for comparing these statistical parameters, including considerations for varying sample size.

References
1. J. Workman and H. Mark, Spectroscopy 19(4), 38 (2004).
2. J.C. Miller and J.N. Miller, Statistics for Analytical Chemistry, 2nd ed. (Ellis Horwood, New York).

Note: The authors have received some error notices regarding the recent series of columns that discussed derivatives; the results presented should therefore not be used without verification. Corrections will be published as the errors are verified and as the publishing schedule permits.
The authors apologize for any inconvenience.