Reasons for Teaching and Using the Signed Coefficient of Determination Instead of the Correlation Coefficient

by John N. Zorich, Jr.
www.johnzorich.com

THIS IS THE MSWORD DOCUMENT FROM WHICH THE JOURNAL OF THE AMERICAN MATHEMATICAL ASSOCIATION OF TWO YEAR COLLEGES CREATED THE PUBLISHED 2018 VERSION. THE 1ST PAGE OF THE AS-PUBLISHED VERSION CAN BE FOUND AT THE END OF THIS DOCUMENT; THE FULL AS-PUBLISHED VERSION, PAGES 48-51+58, CAN BE FOUND STARTING AT: https://amatyc.site-ym.com/page/educatorsummer2018

----------------------------------------------------------------

Does the simple linear correlation coefficient, typically symbolized by $r$ or $r_{XY}$, have meaning that is mathematically rigorous? In other words, what does it mean when one $r$ value is twice as large as another? Does it mean twice as much correlation? Does $r$ measure or quantify the strength of correlation? Can any of the definitions or formulas for the correlation coefficient help answer these questions?

Correlation Coefficient

Mathematical correlation and the simple linear correlation coefficient became important parts of statistical method after their discovery in 1888 by Francis Galton (Walker, 1929). He defined them by saying that "two variable organs are said to be co-related when the variation of the one is accompanied on the average by more or less variation of the other" (Galton, 1888, p. 135); he provided no mathematically rigorous meaning (in the sense provided in the introduction) for what he called his "index of co-relation" (p. 143), which he symbolized by $r$, other than to say that it "measures the closeness of co-relation" (p. 145). As early as 1892, $r$ came to be called the "coefficient of correlation" (Edgeworth, 1892, p. 191) and was then viewed (incorrectly) as a proportion (Edgeworth, 1893, p. 674).

The following is a chronological list of examples of definitions that have been given for $r$ since Galton's time; none of these definitions helps answer the questions posed previously: "$r$ measures the correspondence between deviations from their means of the two series of observations" (Bowley, 1901, p. 30), "[$r$ is the] amount of dependence one variable has upon another" (Baten, 1938, p. 170), "[$r$ is the] degree of association between two variables" (Duncan, 1986, p. 80), "[$r$ is the] extent to which the scattergraph of the relationship between two variables fits a straight line" (Miles & Shevlin, 2001, p. 20), and "[$r$] measures the strength of the linear relationship between two variables" (Bluman, 2015, p. 543).

v1.01, August 19, 2018, Reasons for Coefficient of Determination, J. Zorich, www.johnzorich.com
Formulas for the Correlation Coefficient

There are many strikingly different looking but mathematically identical formulas for the correlation coefficient (Symonds, 1926; Rodgers & Nicewander, 1988). Although some of them are useful for introducing novice students to the general concept of correlation (Zorich, 2017), none of them helps answer the questions initially posed. The following is a sampling of such formulas.

In 1896, the first basic formula for estimating the correlation coefficient had finally been presented (Stigler, 1986, p. 343); its presenter was Karl Pearson. That is why the correlation coefficient has most often been defined as Pearson's $r$:

Formula 1: $\text{Pearson's } r = \dfrac{\text{covariance of } X, Y}{\sqrt{(\text{variance of } X)(\text{variance of } Y)}}$

Formulas 2, 3, and 4 have also been used to define the correlation coefficient. For example, Formula 2 was used by Yule in 1910 (p. 538), Formula 3 by Ezekiel in 1930 (p. 118), and Formula 4 by Dixon and Massey in 1969 (p. 203). The meanings of symbols used in these formulas are: $b_1$ = the slope of the linear regression of Y on X, $b_2$ = the slope of the linear regression of X on Y, $S_X$ = standard deviation of the X values, $S_Y$ = standard deviation of the Y values, $S_{Ye}$ = the standard deviation of the $Y_e$ values derived from the linear regression equation at each X value in an X, Y data set (i.e., $Y_e = a + bX$), and (in Formulas 2 and 3) $r$ takes its sign from the slopes of the linear regressions.

Formula 2: $r = \pm\sqrt{b_1 b_2}$

Formula 3: $r = \pm\dfrac{S_{Ye}}{S_Y}$

Formula 4: $r = b_1 \dfrac{S_X}{S_Y}$

Based upon Formula 3, it is tempting to describe the correlation coefficient as the proportion of the total variation (measured in units of standard deviation) that can be explained by the linear dependence of Y on X. In fact, that is exactly how it was described in what may have been the first book ever published solely on correlation: "[Formula 3] is then a measure of... the amount of correlation... The [correlation] coefficient is simply a measure of how large the variation in the estimated values is, in proportion to the variation in the original values" (Ezekiel, 1930, pp. 118–119).
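The claim that Formulas 1 through 4 are mathematically identical can be checked numerically. The following is a minimal sketch, not the author's own method: it assumes the formulas read $r = \text{cov}(X,Y)/\sqrt{V_X V_Y}$, $r = \pm\sqrt{b_1 b_2}$, $r = \pm S_{Ye}/S_Y$, and $r = b_1 S_X / S_Y$, uses sample (n − 1) variances throughout, and the function name and the small data set are hypothetical.

```python
import math
import statistics as st


def correlation_four_ways(x, y):
    """Return r as computed by Formulas 1-4 (hypothetical helper, for illustration only)."""
    n = len(x)
    mean_x, mean_y = st.fmean(x), st.fmean(y)

    # Sample covariance and sample variances (n - 1 denominators).
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x, var_y = st.variance(x), st.variance(y)
    s_x, s_y = math.sqrt(var_x), math.sqrt(var_y)

    b1 = cov_xy / var_x          # slope of the regression of Y on X
    b2 = cov_xy / var_y          # slope of the regression of X on Y
    a = mean_y - b1 * mean_x     # intercept of the regression of Y on X
    y_e = [a + b1 * xi for xi in x]   # Ye values from the regression line
    s_ye = st.stdev(y_e)
    sign = math.copysign(1.0, b1)     # Formulas 2 and 3 take their sign from the slope

    formula_1 = cov_xy / math.sqrt(var_x * var_y)
    formula_2 = sign * math.sqrt(b1 * b2)
    formula_3 = sign * s_ye / s_y
    formula_4 = b1 * s_x / s_y
    return formula_1, formula_2, formula_3, formula_4


# Made-up data, chosen only to exercise the function.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
for number, value in enumerate(correlation_four_ways(x, y), start=1):
    print(f"Formula {number}: r = {value:.6f}")
```

All four values agree to within floating-point rounding, which is consistent with the point that the formulas differ in appearance but not in what they compute.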
Ezekiel eventually realized that it is mathematically invalid to consider a ratio of standard deviations to be a proportion, as evidenced by the fact that the third edition of his book defined the proportion of variation in Y accounted for by X (Ezekiel & Fox, 1959, p. 17) as a ratio of variances (rather than standard deviations), and stated that the "square root of this proportion [rather than the proportion itself] is termed the coefficient of correlation" (p. 17).

No formula and/or definition for $r$ itself has ever been used to provide a mathematically rigorous explanation of what it means when one $r$ value is twice that of another. Even lengthy philosophical and technical discussions failed in this regard (e.g., 27 pages in Pearson (1911, pp. 152–178), and 115 pages in Yule (1912, pp. 157–253, 317–334)). As is generally well known, the only way to provide such an explanation is to transform each $r$ value into its respective coefficient of determination.

Coefficient of Determination

The coefficient of determination has been symbolized in various sources by $r^2$, $r^2_{XY}$, $R^2$, or $R^2_{XY}$. It equals the square of the correlation coefficient and is therefore commonly referred to as "R-squared." The earliest known reference to it is from Wright (1921): "Another coefficient which it will be convenient to use, the coefficient of determination [emphasis added] of X by A,... measures the fraction [emphasis added] of complete determination for which factor A is directly responsible" (p. 56). Almost every textbook that discusses $r^2$ describes it as a better statistic than $r$ for quantifying the explainable proportion of variation in the Y values in an X, Y data set.

The following chronological list contains examples of definitions that have been given for $r^2$ since Wright's time; they are mathematically rigorous because of their valid use of the words percentage and proportion: "[$r^2$] may be said to measure the per cent [sic] to which the variance in Y is determined by X, since it measures that proportion of all the elements of variance in Y which are also present in X" (Ezekiel, 1930, p.
10); "[$r^2$ is the] proportion of the variance of Y that can be attributed to its linear regression on X" (Snedecor & Cochran, 1967, p. 176); "[$r^2$ is the] percentage of the total variation in Y which can be attributed to the linear relationship with X" (Bohrnstedt & Knoke, 1988, p. 70); and "[$r^2$ is the] proportion of the variability in Y... predicted by the relationship with X" (Gravetter & Wallnau, 2000, p. 565).

Formulas for the Coefficient of Determination

The square of any of the formulas for the correlation coefficient could be used to calculate the coefficient of determination. For example, using Formula 3, we can derive Formula 5, in which $V$ denotes a variance:

Formula 5: $r^2 = \dfrac{S_{Ye}^2}{S_Y^2} = \dfrac{V_{Ye}}{V_Y} = \dfrac{V_Y - V_{Yu}}{V_Y}$,

where $V_{Yu}$ is defined by Formula 6, and $V_{Ye}$ is derived via Formula 7.

Formula 6: $V_{Yu} = \dfrac{\sum (Y - Y_e)^2}{n - 1}$

Formula 7: $V_Y = V_{Ye} + V_{Yu}$

In Formula 7, the total variation of Y is broken down arithmetically into two components, one of which ($V_{Ye}$) represents the variation due to the linear relationship between X and Y, and the other of which ($V_{Yu}$) represents the remainder of the variation (variation that is unexplained by, or is not due to, a linear relationship between X and Y). Thus, these formulas help provide a mathematically rigorous meaning for $r^2$, namely that it is a decimal fraction between 0 and 1, a fraction whose numerator is the amount of Y variation caused by the linear regression dependence of Y on X (as measured by $V_{Ye}$) and whose denominator is the total variation of Y (as measured by $V_Y$). Therefore, a linear coefficient of determination whose value is exactly twice that of another means that exactly twice as large a fraction of the variation in Y is due to its linear correlation with X. The same reasoning is not valid when applied to the correlation coefficient, because $r$ is a ratio of standard deviations rather than of variances. This is emphasized in the warnings which follow.

Warnings

The following chronological list contains examples of warnings given about $r$ since 1921: "For many purposes it is enough to look on it [the correlation coefficient] as giving an arbitrary scale [emphasis added] between +1 for perfect positive correlation, 0 for no correlation, and −1 for perfect negative correlation" (Wright, 1921, p. 558); "A person who has no knowledge of statistics might easily be led to the erroneous [emphasis added] idea that a correlation of $r$ = 0.80 is twice as good as a correlation of $r$ = 0.40, or that a correlation of $r$ = 0.75 is three times as good or three times as strong as a correlation of $r$ = 0.25" (Freund, 1960, p. 333); "The correlation coefficient is not a proportion and one cannot talk about one correlation coefficient being twice another, nor about one correlation coefficient being 0.2 more than another; its scale must be regarded as ordinal [emphasis added]" (Selkirk, 1981, p.
17); and "While a [correlation coefficient] value of 0.00 indicates no linear relationship and a value of ±1.00 indicates a perfect linear relationship, values between these extremes have no direct interpretation [emphasis added]" (Healey, 1984, p. 67).
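The variance decomposition behind the coefficient of determination, and the reason the warnings above hold, can be illustrated with a short calculation. This is a sketch under stated assumptions rather than a definitive implementation: it assumes $V_{Yu} = \sum (Y - Y_e)^2 / (n - 1)$ and $V_Y = V_{Ye} + V_{Yu}$ with sample (n − 1) variances, and the function names and data are hypothetical.

```python
import statistics as st


def variance_components(x, y):
    """Return (V_Y, V_Ye, V_Yu) for the least-squares regression of Y on X.

    Hypothetical helper: all variances use the sample (n - 1) denominator.
    """
    n = len(x)
    mean_x, mean_y = st.fmean(x), st.fmean(y)
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    y_e = [intercept + slope * xi for xi in x]   # values on the regression line

    v_y = st.variance(y)                         # total variation of Y
    v_ye = st.variance(y_e)                      # variation explained by the line
    v_yu = sum((yi - yei) ** 2 for yi, yei in zip(y, y_e)) / (n - 1)  # unexplained
    return v_y, v_ye, v_yu


def coefficient_of_determination(x, y):
    """r-squared as the explained fraction of the total variation in Y."""
    v_y, v_ye, _ = variance_components(x, y)
    return v_ye / v_y


# Made-up data, chosen only to exercise the functions.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
v_y, v_ye, v_yu = variance_components(x, y)
print(f"V_Y = {v_y:.4f}, V_Ye = {v_ye:.4f}, V_Yu = {v_yu:.4f}")
print(f"r^2 = {coefficient_of_determination(x, y):.4f}")

# Freund's example: r = 0.80 is twice r = 0.40, but the explained
# fraction of variation (r squared) differs by a factor of about four.
ratio_of_r = 0.80 / 0.40
ratio_of_r_squared = 0.80 ** 2 / 0.40 ** 2
print(f"ratio of r values: {ratio_of_r:.2f}; ratio of r^2 values: {ratio_of_r_squared:.2f}")
```

Because the explained and unexplained variances add up to the total, their ratio to the total is a genuine proportion; the ratio of two $r$ values has no such additive basis.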
History

The practice of reporting correlation as $r$ rather than $r^2$ originated with Galton's biological report in 1888. Unfortunately, those who first applied correlation to other fields continued that practice. Most prominent among such people was Karl Pearson, who "from the mid-1890s to the First World War... dominated statistical theory in Britain" (MacKenzie, 1981, p. 10). In Pearson's 1896 article that introduced the formula for Pearson's $r$, he repeatedly compared $r$ values (not $r^2$ values) from different data sets, thereby giving the reader the erroneous impression that magnitudes of $r$ values can be compared directly (i.e., without first converting them into $r^2$ values). The rest of his early statistical papers in which correlation figured prominently were similarly focused on $r$ rather than $r^2$ (E. S. Pearson, 1956).

Another early major contributor in this field was R. A. Fisher. His books, notably the many editions of Statistical Methods for Research Workers, have become classics (MacKenzie, 1981, p. 183). In the 1925 first edition of that book, in the 38-page chapter entitled "The Correlation Coefficient," Fisher explains the meaning of the coefficient of determination; but he does so in only one sentence. In it, $\rho$ refers to the population correlation coefficient and $\rho^2$ to the population coefficient of determination: "Of the total variance of y[,] the fraction $(1 - \rho^2)$ is independent of x, while the remaining fraction, $\rho^2$, is determined by, or calculable from, the value of x" (Fisher, 1925, p. 145). In the 1958 thirteenth and final edition of the same-titled book, he included that same one sentence (Fisher, 1958, p. 18). That sentence was not intended by Fisher to urge the reader to use $r^2$ (i.e., the sample statistic corresponding to $\rho^2$) rather than $r$ as a measure of correlation strength, as evidenced by the fact that in neither the first nor the thirteenth edition of his book was there any subsequent discussion or even mention of the meaning or interpretation of $r^2$ or $\rho^2$.
Instead, he focused on $r$, the correlation coefficient; he thereby led many of his readers to erroneously conclude that magnitudes of correlation coefficients can be compared to each other on a linear scale ("on a conventional scale," as he described it; Fisher, 1925, pp. 153–154; 1958, p. 190).

Practical Uses for the Correlation Coefficient

There are some practical applications for which the correlation coefficient is more appropriate than the coefficient of determination. For example:

In the Finance industry, "beta" is a measure of an investment's volatility versus a benchmark (such as the S&P 500). This next formula is a common expression for beta (Wikipedia, 2017):

Formula 10: $\text{beta} = r \dfrac{S_A}{S_B}$,

where $S_A$ = standard deviation of the daily % return from an investment (e.g., a stock) over a given time period, $S_B$ = standard deviation of the benchmark's daily % return over that same time period, and $r$ is the correlation coefficient between the two series of returns.

In Materials Science, one goal of mechanical product design is to ensure that the distribution of product strengths does not overlap the distribution of anticipated stresses. Statistical analysis of that overlap requires calculation of the standard deviation of the (strength − stress) distribution; the calculation is given by Dovich (1990, p. 58):

Formula 11: $S_{(X-Y)} = \sqrt{S_X^2 + S_Y^2 - 2 r S_X S_Y}$,

where $S$ stands for the standard deviation of the indicated variable or difference, X and Y refer to the strength and stress variables respectively, and $r$ is the correlation coefficient between X, Y values paired by individual on-test units of product.

Conclusion

Wherever the correlation coefficient might be used as a measure of strength and an indicator of direction of simple linear correlation, it would be better to use a signed version of the coefficient of determination (the sign being the same as for the correlation coefficient). It is proposed that such a value be symbolized by $^{S}R^{2}$ and be called the "signed-r-squared" or "signed coefficient of determination." One formula that provides both its magnitude and sign, where $r$ is the correlation coefficient, is:

Formula 12: $^{S}R^{2} = \dfrac{r^3}{|r|}$.

As discussed previously, it would be misleading to think of $r$ as a proportion or to give it a direct interpretation. As a scale, it is arbitrary and primarily ordinal in nature. Therefore, it would be better if textbooks and instructors taught measurement of the strength of simple linear correlation only in terms of $^{S}R^{2}$, and taught $r$ only in the contexts of correlation history and practical applications.
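The signed coefficient of determination proposed in the Conclusion is simple to compute. The sketch below assumes the proposed formula is $r^3 / |r|$ (equivalently, $r^2$ carrying the sign of $r$); the function name is hypothetical, and returning 0 when $r = 0$ is an added assumption, since the quotient is undefined there.

```python
import math


def signed_r_squared(r):
    """Signed coefficient of determination: r**3 / |r| (hypothetical helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        # Assumption: r**3 / |r| is undefined at r = 0, so report no correlation as 0.
        return 0.0
    return r ** 3 / abs(r)


for r in (-0.80, -0.40, 0.0, 0.40, 0.80):
    print(f"r = {r:+.2f}  ->  signed r-squared = {signed_r_squared(r):+.4f}")
```

On this scale a value twice as large does mean twice as large a fraction of explained variation, while the sign still indicates the direction of the relationship.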
References

Baten, W. D. (1938). Elementary mathematical statistics. New York, NY: John Wiley and Sons.
Bluman, A. G. (2015). Elementary statistics: A step by step approach (7th annotated instructor's ed.). New York, NY: McGraw-Hill.
Bohrnstedt, G. W., & Knoke, D. (1988). Statistics for social data analysis. Itasca, IL: F. E. Peacock.
Bowley, A. L., Sir. (1901). Elements of statistics. London, UK: P. S. King and Son.
Dixon, W. J., & Massey, F. J., Jr. (1969). Introduction to statistical analysis (3rd ed.). New York, NY: McGraw-Hill.
Dovich, R. A. (1990). Reliability statistics. Milwaukee, WI: ASQC Quality Press.
Duncan, A. J. (1986). Quality control and industrial statistics (5th ed.). Homewood, IL: Irwin.
Edgeworth, F. Y. (1892). Correlated averages. Philosophical Magazine, 5(34), 190–204. doi: 10.1080/147864490860307
Edgeworth, F. Y. (1893). Statistical correlation between social phenomena. Journal of the Royal Statistical Society, 56, 670–675.
Ezekiel, M. (1930). Methods of correlation analysis. New York, NY: John Wiley and Sons.
Ezekiel, M., & Fox, K. A. (1959). Methods of correlation and regression analysis: Linear and curvilinear (3rd ed.). New York, NY: John Wiley and Sons.
Fisher, R. A., Sir. (1925). Statistical methods for research workers. Edinburgh, Scotland: Oliver and Boyd.
Fisher, R. A., Sir. (1958). Statistical methods for research workers (13th ed., rev.). New York: Hafner.
Freund, J. E. (1960). Modern elementary statistics (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.
Galton, F. (1888). Co-relations and their measurement, chiefly from anthropometric data. Proceedings of the Royal Society of London, 45, 135–145.
Gravetter, F. J., & Wallnau, L. B. (2000). Statistics for the behavioral sciences (5th ed.). Belmont, CA: Wadsworth.
Healey, J. F. (1984). Statistics: A tool for social research. Belmont, CA: Wadsworth.
MacKenzie, D. A. (1981). Statistics in Britain, 1865–1930: The social construction of scientific knowledge. Edinburgh, Scotland: Edinburgh University Press.
Miles, J., & Shevlin, M. (2001). Applying regression and correlation: A guide for students and researchers. Thousand Oaks, CA: Sage.
Pearson, E. S. (Ed.). (1956). Karl Pearson's early statistical papers. Cambridge, UK: University Press.
Pearson, K. (1896). Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia. Philosophical Transactions of the Royal Society of London, Series A, 187, 253–318 (note: page references in the body of this present article are to a reprint in Karl Pearson's Early Statistical Papers by E. S. Pearson, cited previously). doi: 10.1098/rsta.1896.0007
Pearson, K. (1911). The grammar of science. Part I, Physical (3rd ed., rev.). London, UK: Adam and Charles Black.
Rodgers, J. L., & Nicewander, W. A. (1988). Thirteen ways to look at the correlation coefficient. The American Statistician, 42, 59–66. doi: 10.1080/00031305.1988.10475524
Selkirk, K. E. (1981). Rediguide 3: Correlation and regression. Nottingham, UK: Nottingham University.
Snedecor, G. W., & Cochran, W. G. (1967). Statistical methods (6th ed.). Ames, IA: Iowa State University Press.
Stigler, S. M. (1986). The history of statistics: The measurement of uncertainty before 1900. Cambridge, MA: Belknap Press.
Symonds, P. M. (1926). Variations of the product-moment (Pearson) coefficient of correlation. Journal of Educational Psychology, 17, 458–469. doi: 10.1037/h007008
Walker, H. M. (1929). Studies in the history of statistical method: With special reference to certain educational problems. Baltimore, MD: Williams and Wilkins. doi: 10.1037/13379-000
Wikipedia. (2017). Beta (finance) [Webpage]. Retrieved April 1, 2017 from https://en.wikipedia.org/wiki/beta_(finance)
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557–585.
Yule, G. U. (1910). The applications of the method of correlation to social and economic statistics. Bulletin de L'Institut International de Statistique, 18(1), 537–551.
Yule, G. U. (1912). An introduction to the theory of statistics (2nd ed.). London, UK: Charles Griffin.
Zorich, J. (2017). Four formulas for teaching the meaning of the correlation coefficient. MathAMATYC Educator, 8(3), 4–7.

Attachments: see next page

John N. Zorich received an MS degree in botany from the University of California, Davis, in 1979. He has worked in medical-device design and manufacturing as an independent statistical consultant and instructor since 1999. Annually since 2005, he has taught a course in introductory and applied statistics for the Biotechnology Center of Ohlone College (Newark, CA). Previously, he taught courses in applied statistics at Silicon Valley Polytechnic Institute and for the Silicon Valley ASQ Biomedical Division, and he has tutored introductory statistics at Lone Star College (Kingwood, TX). He currently resides near Houston, TX.