
Author(s): Kerby Shedden, Ph.D., 2010

License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material. Copyright holders of content included in this material should contact with any questions, corrections, or clarification regarding the use of content. For more information about how to cite these materials visit

Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please speak to your physician if you have questions about your medical condition. Viewer discretion is advised: Some medical content is graphic and may not be suitable for all viewers.

Review of Pearson Correlation

Kerby Shedden
Department of Statistics, University of Michigan
Monday 11th March

Data setting

Paired data: $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, e.g. $X_i$ is a person's age and $Y_i$ is that same person's income.

Note that this is different from having two independent samples, $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$, e.g. where we have the ages of one group of people and the incomes of a different group of people.

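Standard definition and basic properties

$$\widehat{\mathrm{Cor}}(X, Y) = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\left[\sum_{i=1}^n (X_i - \bar{X})^2 \, \sum_{i=1}^n (Y_i - \bar{Y})^2\right]^{1/2}}$$

Symmetry:

$$\widehat{\mathrm{Cor}}(X, Y) = \widehat{\mathrm{Cor}}(Y, X).$$

Affine invariance:

$$\widehat{\mathrm{Cor}}(a + bX, c + dY) = \widehat{\mathrm{Cor}}(X, Y),$$

where $a$ and $c$ are arbitrary real constants and $b > 0$, $d > 0$.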
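The standard definition and its two basic properties can be checked numerically. The following is a minimal sketch in plain Python; the function name `pearson` and the small data set are illustrative choices, not part of the original slides.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of the paired sequences x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two cases")
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations.
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative paired data (made up for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson(x, y)
# Symmetry: swapping the roles of X and Y leaves the value unchanged.
r_swapped = pearson(y, x)
# Affine invariance: shifting and rescaling with positive slopes leaves it unchanged.
r_affine = pearson([10 + 2 * xi for xi in x], [-3 + 0.5 * yi for yi in y])
print(r, r_swapped, r_affine)
```

The function divides by zero if either variable is constant, which matches the fact that the correlation coefficient is undefined in that case.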

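Alternative definition 1

$$\widehat{\mathrm{Cor}}(X, Y) = \sum_i \frac{X_i - \bar{X}}{\sqrt{\sum_j (X_j - \bar{X})^2}} \cdot \frac{Y_i - \bar{Y}}{\sqrt{\sum_j (Y_j - \bar{Y})^2}}$$

We can interpret

$$\frac{X_i - \bar{X}}{\sqrt{\sum_j (X_j - \bar{X})^2}}$$

as a standardized measure of the distance from $X_i$ to the mean value $\bar{X}$.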

Interpretation

The contribution of case $i$ to alternative definition 1 is:

$$\frac{X_i - \bar{X}}{\sqrt{\sum_j (X_j - \bar{X})^2}} \cdot \frac{Y_i - \bar{Y}}{\sqrt{\sum_j (Y_j - \bar{Y})^2}}$$

This is positive if $X_i$ and $Y_i$ lie on the same side of their mean values (i.e. if $X_i > \bar{X}$ and $Y_i > \bar{Y}$, or if $X_i < \bar{X}$ and $Y_i < \bar{Y}$). The contribution of case $i$ is negative if $X_i$ and $Y_i$ lie on opposite sides of their mean values.
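The per-case contributions can be computed directly. This sketch (the helper name `contributions` and the data are illustrative assumptions) lists one contribution per case; their sum is the correlation coefficient.

```python
import math

def contributions(x, y):
    """Per-case terms of alternative definition 1; they sum to the correlation."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    x_scale = math.sqrt(sum((xi - xbar) ** 2 for xi in x))
    y_scale = math.sqrt(sum((yi - ybar) ** 2 for yi in y))
    return [((xi - xbar) / x_scale) * ((yi - ybar) / y_scale)
            for xi, yi in zip(x, y)]

# Illustrative data: both means equal 3.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 3.0, 1.0, 5.0]

c = contributions(x, y)
total = sum(c)

for xi, yi, ci in zip(x, y, c):
    print(f"x={xi}, y={yi}, contribution={ci:+.3f}")
print("sum of contributions:", round(total, 3))
```

In this data set the second case (x = 2, y = 4) has x below its mean and y above its mean, so its contribution is negative, while the first case (x = 1, y = 2) has both values below their means and contributes positively.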

Interpretation of the correlation coefficient

[Figure. Source: Wikimedia Commons]

Anscombe's quartet

[Figure: Anscombe's quartet. Source: Wikimedia Commons]

All plots have the same marginal means and standard deviations, and the same correlation coefficient.

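Relationship to mean, variance and standard deviation

$$\widehat{\mathrm{Cov}}(X, Y) = \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) / (n - 1)$$

$$\widehat{\mathrm{Var}}(X) = \sum_{i=1}^n (X_i - \bar{X})^2 / (n - 1)$$

$$\widehat{\mathrm{SD}}(X) = \sqrt{\widehat{\mathrm{Var}}(X)}$$

$$\widehat{\mathrm{Cor}}(X, Y) = \frac{\widehat{\mathrm{Cov}}(X, Y)}{\widehat{\mathrm{SD}}(X)\, \widehat{\mathrm{SD}}(Y)}$$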

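Alternative definition 2

$$\widehat{\mathrm{Cor}}(X, Y) = \widehat{\mathrm{Cov}}(\tilde{X}, \tilde{Y}) = \widehat{\mathrm{Cor}}(\tilde{X}, \tilde{Y}),$$

where $\tilde{X}_i = (X_i - \bar{X}) / \widehat{\mathrm{SD}}(X)$ and $\tilde{Y}_i = (Y_i - \bar{Y}) / \widehat{\mathrm{SD}}(Y)$ are the standardized versions of $X$ and $Y$.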
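As a rough check of the relationships among covariance, standard deviation, and correlation, the sketch below computes the correlation from the covariance and standard deviations, and again as the covariance of the standardized variables. The helper names (`mean`, `cov`, `sd`, `standardize`) and the data are illustrative assumptions.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)

def sd(x):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(cov(x, x))

def standardize(x):
    """Subtract the mean and divide by the standard deviation."""
    xbar, s = mean(x), sd(x)
    return [(xi - xbar) / s for xi in x]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

# Correlation as covariance divided by the product of standard deviations.
cor_from_cov = cov(x, y) / (sd(x) * sd(y))

# Alternative definition 2: covariance of the standardized variables.
x_std, y_std = standardize(x), standardize(y)
cov_of_std = cov(x_std, y_std)

# The standardized variables have mean 0 and standard deviation 1,
# so their covariance and their correlation coincide.
cor_of_std = cov_of_std / (sd(x_std) * sd(y_std))
print(cor_from_cov, cov_of_std, cor_of_std)
```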

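Definition in terms of linear algebra

Inner product:

$$\langle Y, X \rangle = \sum_{i=1}^n Y_i X_i$$

Norm:

$$\|X\| = \sqrt{\langle X, X \rangle} = \sqrt{\sum_{i=1}^n X_i^2}$$

Here $X - \bar{X}$ denotes the vector with entries $X_i - \bar{X}$, and similarly for $Y - \bar{Y}$. Then

$$\widehat{\mathrm{SD}}(Y) = \|Y - \bar{Y}\| / \sqrt{n - 1}$$

$$\widehat{\mathrm{Cov}}(Y, X) = \langle Y - \bar{Y}, X - \bar{X} \rangle / (n - 1)$$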

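Alternative definition 3

$$\widehat{\mathrm{Cor}}(Y, X) = \frac{\langle Y - \bar{Y}, X - \bar{X} \rangle}{\|Y - \bar{Y}\| \, \|X - \bar{X}\|}.$$

This definition makes sense if you think of the vectors $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$ as points in $\mathbb{R}^n$.

Properties:

The correlation coefficient is zero if and only if $X - \bar{X}$ and $Y - \bar{Y}$ are orthogonal vectors.

The Cauchy-Schwarz inequality states that $|\langle U, V \rangle| \le \|U\| \, \|V\|$. Thus

$$-1 \le \widehat{\mathrm{Cor}}(X, Y) \le 1.$$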
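The inner-product form translates almost line for line into code. The sketch below uses illustrative helper names (`inner`, `norm`, `centered`, `cor`) and made-up vectors; it shows a case where the centered vectors are orthogonal, so the correlation is zero, and checks the Cauchy-Schwarz inequality on one pair of vectors.

```python
import math

def inner(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

def centered(v):
    """Subtract the mean from every entry."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def cor(x, y):
    """Alternative definition 3: normalized inner product of centered vectors."""
    xc, yc = centered(x), centered(y)
    return inner(yc, xc) / (norm(yc) * norm(xc))

# The centered vectors (-1, 0, 1) and (1, -2, 1) are orthogonal,
# so the correlation is zero even though y is not constant.
x = [1.0, 2.0, 3.0]
y = [1.0, -2.0, 1.0]
r_orthogonal = cor(x, y)

# Cauchy-Schwarz on an arbitrary pair of vectors.
u = [3.0, -1.0, 2.0]
v = [0.5, 4.0, -2.0]
cs_holds = abs(inner(u, v)) <= norm(u) * norm(v)
print(r_orthogonal, cs_holds)
```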

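Interpretation in terms of angles

If $\theta$ is the angle between $X - \bar{X}$ and $Y - \bar{Y}$, then

$$\cos(\theta) = \frac{\langle Y - \bar{Y}, X - \bar{X} \rangle}{\|Y - \bar{Y}\| \, \|X - \bar{X}\|}.$$

The correlation coefficient equals 1 if and only if $\theta = 0$, which means that the vectors $X - \bar{X}$ and $Y - \bar{Y}$ are parallel and point in the same direction, i.e. $X - \bar{X} = c \cdot (Y - \bar{Y})$ for some constant $c > 0$.

This is equivalent to $X$ and $Y$ being linearly related,

$$Y_i = a + bX_i,$$

for constants $a$ and $b > 0$.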
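To make the angle interpretation concrete, the sketch below computes the angle between the centered vectors for an exactly increasing linear relationship, an exactly decreasing one, and a noisier data set. The function name `angle_between_centered` and the data are illustrative assumptions; the cosine is clamped to [-1, 1] only to guard against floating-point rounding before calling `math.acos`.

```python
import math

def angle_between_centered(x, y):
    """Angle (in radians) between the mean-centered versions of x and y."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    xc = [xi - xbar for xi in x]
    yc = [yi - ybar for yi in y]
    dot = sum(a * b for a, b in zip(xc, yc))
    norm_x = math.sqrt(sum(a * a for a in xc))
    norm_y = math.sqrt(sum(b * b for b in yc))
    cos_theta = dot / (norm_x * norm_y)
    # Guard against values such as 1.0000000000000002 caused by rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

x = [1.0, 2.0, 3.0, 4.0]

# Y = 3 + 2X: positive slope, centered vectors point the same way.
theta_increasing = angle_between_centered(x, [3 + 2 * xi for xi in x])
# Y = 3 - 2X: negative slope, centered vectors point in opposite directions.
theta_decreasing = angle_between_centered(x, [3 - 2 * xi for xi in x])
# A relationship that is not exactly linear gives an angle strictly in between.
theta_general = angle_between_centered([1.0, 2.0, 3.0, 4.0, 5.0],
                                       [2.0, 1.0, 4.0, 3.0, 5.0])
cos_general = math.cos(theta_general)
print(theta_increasing, theta_decreasing, theta_general, cos_general)
```

The first angle is zero up to rounding (correlation 1), the second is pi up to rounding (correlation -1), and the cosine of the third recovers the correlation coefficient of that data set.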

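Relationship to least squares regression

Suppose we use simple linear regression, fit using least squares, to identify an approximate linear relationship between $X$ and $Y$ of the form

$$Y_i \approx a + bX_i.$$

The optimal values of $a$ and $b$ are

$$b = \frac{\widehat{\mathrm{Cov}}(X, Y)}{\widehat{\mathrm{Var}}(X)} = \widehat{\mathrm{Cor}}(X, Y) \, \frac{\widehat{\mathrm{SD}}(Y)}{\widehat{\mathrm{SD}}(X)}$$

and

$$a = \bar{Y} - b\bar{X}.$$

If $\widehat{\mathrm{SD}}(X) = \widehat{\mathrm{SD}}(Y)$, e.g. if $X$ and $Y$ have been standardized, then $b = \widehat{\mathrm{Cor}}(X, Y)$.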
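The slope and intercept formulas can be sketched as follows. The helper names (`least_squares_line`, `sse`) and the data are illustrative assumptions; the data were chosen so that X and Y have equal standard deviations, which makes the slope equal to the correlation. The final lines compare the sum of squared errors at the fitted line with a few nearby lines as an informal check of optimality.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)

def least_squares_line(x, y):
    """Intercept a and slope b of the least squares line Y ~ a + b X."""
    b = cov(x, y) / cov(x, x)
    a = mean(y) - b * mean(x)
    return a, b

def sse(a, b, x, y):
    """Sum of squared errors of the line a + b x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

a, b = least_squares_line(x, y)
r = cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))

# The slope equals the correlation times SD(Y) / SD(X).
slope_from_cor = r * math.sqrt(cov(y, y)) / math.sqrt(cov(x, x))

# Informal optimality check: nearby lines never have a smaller SSE.
best = sse(a, b, x, y)
nearby = [sse(a + da, b + db, x, y)
          for da in (-0.1, 0.0, 0.1) for db in (-0.1, 0.0, 0.1)]
print(a, b, r, best, min(nearby))
```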

Other connections to least squares regression

The fitted values of a regression analysis are

$$\hat{Y} = a + bX.$$

The variance of $\hat{Y}$ is

$$\widehat{\mathrm{Var}}(\hat{Y}) = b^2 \, \widehat{\mathrm{Var}}(X) = \widehat{\mathrm{Cor}}(Y, X)^2 \, \widehat{\mathrm{Var}}(Y).$$

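Other connections to least squares regression (continued)

The covariance between $Y$ and $\hat{Y}$ is

$$\begin{aligned}
\widehat{\mathrm{Cov}}(Y, \hat{Y}) &= \widehat{\mathrm{Cov}}(Y, a + bX) \\
&= b \, \widehat{\mathrm{Cov}}(Y, X) \\
&= \widehat{\mathrm{Cor}}(Y, X) \, \widehat{\mathrm{SD}}(Y) \, \widehat{\mathrm{Cov}}(Y, X) / \widehat{\mathrm{SD}}(X) \\
&= \widehat{\mathrm{Cor}}(Y, X)^2 \, \widehat{\mathrm{Var}}(Y).
\end{aligned}$$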

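Other connections to least squares regression (continued)

Since $\widehat{\mathrm{SD}}(\hat{Y}) = |b| \, \widehat{\mathrm{SD}}(X) = |\widehat{\mathrm{Cor}}(Y, X)| \, \widehat{\mathrm{SD}}(Y)$, it follows that

$$\begin{aligned}
\widehat{\mathrm{Cor}}(Y, \hat{Y}) &= \frac{\widehat{\mathrm{Cov}}(Y, \hat{Y})}{\widehat{\mathrm{SD}}(Y) \, \widehat{\mathrm{SD}}(\hat{Y})} \\
&= \frac{\widehat{\mathrm{Cor}}(Y, X)^2 \, \widehat{\mathrm{Var}}(Y)}{\widehat{\mathrm{SD}}(Y) \cdot |\widehat{\mathrm{Cor}}(Y, X)| \, \widehat{\mathrm{SD}}(Y)} \\
&= |\widehat{\mathrm{Cor}}(Y, X)|.
\end{aligned}$$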
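These identities for the fitted values can be checked numerically. The sketch below deliberately uses a data set with negative correlation (an illustrative choice, as are the helper names), so that the absolute value in the last identity matters: the correlation between Y and the fitted values is positive even though the correlation between Y and X is negative.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)

def cor(x, y):
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))

# Illustrative data with a decreasing trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 3.0, 4.0, 1.0, 2.0]

# Least squares fit and fitted values.
b = cov(x, y) / cov(x, x)
a = mean(y) - b * mean(x)
y_hat = [a + b * xi for xi in x]

r = cor(y, x)                      # negative for this data set
var_y = cov(y, y)
var_y_hat = cov(y_hat, y_hat)      # compare with r**2 * var_y
cov_y_y_hat = cov(y, y_hat)        # compare with r**2 * var_y
r_fitted = cor(y, y_hat)           # compare with abs(r)

print(r, var_y_hat, r ** 2 * var_y, cov_y_y_hat, r_fitted)
```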

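A variance decomposition

The residuals of a regression analysis are $Y - \hat{Y}$. It is a fact that the fitted values and residuals are orthogonal:

$$\langle \hat{Y}, Y - \hat{Y} \rangle = 0,$$

and

$$\langle \hat{Y} - \bar{Y}\mathbf{1}, Y - \hat{Y} \rangle = 0,$$

where $\mathbf{1}$ is the vector of ones, so that $\hat{Y} - \bar{Y}\mathbf{1}$ is the vector with entries $\hat{Y}_i - \bar{Y}$.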

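A variance decomposition (continued)

We can use the orthogonality between fitted values and residuals to decompose the variance in a regression:

$$\begin{aligned}
\|Y - \bar{Y}\|^2 &= \|Y - \hat{Y} + \hat{Y} - \bar{Y}\|^2 \\
&= \|Y - \hat{Y}\|^2 + \|\hat{Y} - \bar{Y}\|^2 + 2\langle Y - \hat{Y}, \hat{Y} - \bar{Y} \rangle \\
&= \|Y - \hat{Y}\|^2 + \|\hat{Y} - \bar{Y}\|^2.
\end{aligned}$$

Thus

$$1 = \frac{\|Y - \hat{Y}\|^2}{\|Y - \bar{Y}\|^2} + \frac{\|\hat{Y} - \bar{Y}\|^2}{\|Y - \bar{Y}\|^2}.$$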

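A variance decomposition (continued)

The term

$$\frac{\|\hat{Y} - \bar{Y}\|^2}{\|Y - \bar{Y}\|^2} = \frac{\widehat{\mathrm{Var}}(\hat{Y})}{\widehat{\mathrm{Var}}(Y)}$$

is called the $r^2$ of the regression.

Why is it called an $r^2$? Since $\widehat{\mathrm{Var}}(\hat{Y}) = \widehat{\mathrm{Cor}}(Y, X)^2 \, \widehat{\mathrm{Var}}(Y)$, it follows that

$$\frac{\|\hat{Y} - \bar{Y}\|^2}{\|Y - \bar{Y}\|^2} = \widehat{\mathrm{Cor}}(Y, X)^2.$$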
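The decomposition can be verified on a small example. In this sketch the variable names (`sst`, `sse`, `ssr`) and the data are illustrative assumptions: `sst` is the squared norm of Y minus its mean, `sse` is the squared norm of the residuals, and `ssr` is the squared norm of the centered fitted values.

```python
def mean(v):
    return sum(v) / len(v)

def inner(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 3.0, 1.0, 5.0]

xbar, ybar = mean(x), mean(y)
xc = [xi - xbar for xi in x]
yc = [yi - ybar for yi in y]

# Least squares fit (the n - 1 divisors cancel in the slope).
b = inner(xc, yc) / inner(xc, xc)
a = ybar - b * xbar
y_hat = [a + b * xi for xi in x]

resid = [yi - fi for yi, fi in zip(y, y_hat)]     # Y - Y_hat
fit_c = [fi - ybar for fi in y_hat]               # Y_hat - Ybar

# Orthogonality of residuals and (centered) fitted values.
cross = inner(resid, fit_c)
cross_uncentered = inner(resid, y_hat)

# Variance decomposition: sst = sse + ssr.
sst = inner(yc, yc)
sse = inner(resid, resid)
ssr = inner(fit_c, fit_c)

r_squared = ssr / sst
cor_squared = inner(xc, yc) ** 2 / (inner(xc, xc) * inner(yc, yc))
print(cross, sst, sse + ssr, r_squared, cor_squared)
```

Up to floating-point rounding, the cross term is zero, the two sums of squares add up to the total, and the ratio `ssr / sst` matches the squared correlation.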

Author(s): Kerby Shedden, Ph.D., 2010. License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/