Matrix Approach to Simple Linear Regression: An Overview

Aspects of matrices that you should know:
- Definition of a matrix
- Addition/subtraction/multiplication of matrices
- Symmetric/diagonal/identity matrices
- Transpose/rank/inverse of a matrix
- Determinants

Outline:
- Random vectors and matrices
- Simple linear regression model in matrix terms
- Least squares estimation of regression parameters
- Fitted values and residuals
- Inferences in regression analysis
- ANOVA results

W Zhou (Colorado State University), STAT 540, July 6th, 2015
Random Vector and Matrix

A random vector or a random matrix contains elements that are random variables.

Simple linear regression: the response variables $Y_1, \ldots, Y_n$ can be written in the form of a random vector

\[ Y_{n \times 1} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} \]

Alternative notation: $Y = (Y_1, \ldots, Y_n)^T$, or equivalently $Y^T = (Y_1, \ldots, Y_n)$.
Expectation of Random Vector/Matrix

The expectation of an $n \times 1$ random vector $Y$ is

\[ E(Y)_{n \times 1} = [E(Y_i) : i = 1, \ldots, n] = \begin{pmatrix} E(Y_1) \\ \vdots \\ E(Y_n) \end{pmatrix} \]

Since $E(Y_i) = \beta_0 + \beta_1 X_i$ for $i = 1, \ldots, n$,

\[ E(Y) = \begin{pmatrix} \beta_0 + \beta_1 X_1 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{pmatrix} \]

In general, the expectation of an $n \times p$ random matrix $Y$ is

\[ E(Y)_{n \times p} = [E(Y_{ij}) : i = 1, \ldots, n;\ j = 1, \ldots, p] \]
Variance-Covariance Matrix of Random Vector

The variance-covariance matrix of an $n \times 1$ random vector $Y$ is $\Sigma$ (also written $\sigma^2\{Y\}$):

\[ \Sigma = \mathrm{Var}(Y) = E\left[ (Y - E(Y))(Y - E(Y))^T \right] \]

\[ = E \begin{pmatrix} (Y_1 - E(Y_1))^2 & (Y_1 - E(Y_1))(Y_2 - E(Y_2)) & \cdots & (Y_1 - E(Y_1))(Y_n - E(Y_n)) \\ & (Y_2 - E(Y_2))^2 & \cdots & (Y_2 - E(Y_2))(Y_n - E(Y_n)) \\ & & \ddots & \vdots \\ & & & (Y_n - E(Y_n))^2 \end{pmatrix} \]

\[ = \begin{pmatrix} \mathrm{Var}(Y_1) & \mathrm{Cov}(Y_1, Y_2) & \cdots & \mathrm{Cov}(Y_1, Y_n) \\ & \mathrm{Var}(Y_2) & \cdots & \mathrm{Cov}(Y_2, Y_n) \\ & & \ddots & \vdots \\ & & & \mathrm{Var}(Y_n) \end{pmatrix} \]

Since $\mathrm{Cov}(Y_i, Y_j) = \mathrm{Cov}(Y_j, Y_i)$, $\Sigma$ is symmetric.
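To make the definition concrete, here is a minimal NumPy sketch (my own illustration; the 2-dimensional mean and covariance are made-up values): it estimates $E[(Y - E(Y))(Y - E(Y))^T]$ by Monte Carlo and compares the result to the $\Sigma$ used to generate the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical 2-dimensional random vector Y ~ N(mu, Sigma)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Draw many realizations of Y (one per row) and estimate E[(Y - EY)(Y - EY)^T]
Y = rng.multivariate_normal(mu, Sigma, size=200_000)
dev = Y - Y.mean(axis=0)                   # deviations Y - E(Y), estimated
Sigma_hat = dev.T @ dev / (len(Y) - 1)     # sample variance-covariance matrix

print(np.allclose(Sigma_hat, Sigma, atol=0.05))   # close to the true Sigma
print(np.allclose(Sigma_hat, Sigma_hat.T))        # True: Sigma is symmetric
```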
Variance-Covariance Matrix of Random Vector

The independent random errors $\epsilon_1, \ldots, \epsilon_n$ can be written in the form of a random vector

\[ \epsilon_{n \times 1} = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix} \]

The expectation of $\epsilon$ is $E(\epsilon) = 0_{n \times 1}$.

The variance-covariance matrix of $\epsilon$ is

\[ \sigma^2\{\epsilon\} = \Sigma_\epsilon = \sigma^2 I_{n \times n}, \]

since the errors are independent (all covariances are zero) with common variance $\sigma^2$.
Some Basic Results

For $Y$ (an $n \times 1$ random vector), $A$ (an $n \times n$ non-random matrix), and $b$ (an $n \times 1$ non-random vector), we have

\[ E(AY + b) = A E(Y) + b \]
\[ \mathrm{Var}(AY + b) = A \mathrm{Var}(Y) A^T \]

Also, for $a_{n \times 1} = (a_1, \ldots, a_n)^T$ (an $n \times 1$ non-random vector),

\[ a^T a = \sum_{i=1}^n a_i^2 := \|a\|_2^2, \]

and for $J = [1]_{n \times n}$ (the $n \times n$ matrix of 1's),

\[ a^T J a = \left( \sum_{i=1}^n a_i \right)^2. \]
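These identities are easy to check numerically. A short sketch (the distribution of $Y$ and the particular $A$, $b$, and $a$ are illustrative choices, not from the slides) verifies the mean and variance rules by simulation and the two vector identities exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Hypothetical ingredients: Y with known mean/covariance, fixed A and b
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.diag([1.0, 2.0, 3.0])
A = rng.normal(size=(n, n))
b = rng.normal(size=n)

Y = rng.multivariate_normal(mu, Sigma, size=500_000)
Z = Y @ A.T + b                          # each row is A y + b

print(np.allclose(Z.mean(axis=0), A @ mu + b, atol=0.05))   # E(AY+b) = A E(Y) + b
print(np.allclose(np.cov(Z.T), A @ Sigma @ A.T, atol=0.1))  # Var(AY+b) = A Var(Y) A^T

# a^T a = ||a||^2 and a^T J a = (sum of a_i)^2
a = np.array([1.0, -2.0, 4.0])
J = np.ones((n, n))
print(a @ a, np.sum(a**2))               # both 21.0
print(a @ J @ a, a.sum()**2)             # both 9.0
```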
Multivariate Normal Distribution

Let $Y_{m \times 1} = (Y_1, \ldots, Y_m)^T$ follow a multivariate normal distribution with mean $\mu_{m \times 1} = (\mu_1, \ldots, \mu_m)^T$ and variance-covariance matrix

\[ \Sigma_{m \times m} = (\sigma_{ij})_{i=1,\ldots,m;\ j=1,\ldots,m}. \]

We denote this by $Y \sim N(\mu, \Sigma)$. The probability density function is

\[ f(Y) = \frac{1}{(2\pi)^{m/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (Y - \mu)^T \Sigma^{-1} (Y - \mu) \right], \]

where $|\Sigma|$ is the determinant of $\Sigma$.
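As a sanity check on the density formula, a minimal sketch (the dimension, mean, covariance, and evaluation point are illustrative) computes $f(Y)$ directly and compares it against SciPy's built-in multivariate normal density:

```python
import numpy as np
from scipy.stats import multivariate_normal

m = 2
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
y = np.array([0.5, 0.5])

# Density evaluated directly from the formula on this slide
dev = y - mu
quad = dev @ np.linalg.inv(Sigma) @ dev     # (Y - mu)^T Sigma^{-1} (Y - mu)
f = np.exp(-0.5 * quad) / ((2 * np.pi) ** (m / 2) * np.linalg.det(Sigma) ** 0.5)

print(f)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(y))   # agrees with the formula
```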
Using the Notation with Simple Linear Regression

Let $Y_{n \times 1} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}$ denote the $n \times 1$ vector of response variables.

Let

\[ X_{n \times 2} = \begin{pmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix} \]

denote the $n \times 2$ design matrix of predictor variables (2 columns for simple LR).

Let $\epsilon_{n \times 1} = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$ denote the $n \times 1$ vector of random errors.

Let $\beta_{2 \times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$ denote the $2 \times 1$ vector of regression coefficients.
Simple Linear Regression in Matrix Terms

The simple linear regression model in matrix terms is

\[ Y = X\beta + \epsilon, \quad \text{where } \epsilon \sim N(0, \sigma^2 I). \]

Recall that $E(\epsilon) = 0_{n \times 1}$ and $\Sigma_\epsilon = \sigma^2 I_{n \times n}$. Thus, we have $Y \sim N(X\beta, \sigma^2 I)$, since

\[ E(Y) = E(X\beta + \epsilon) = X\beta + E(\epsilon) = X\beta \]
\[ \Sigma = \mathrm{Var}(X\beta + \epsilon) = \mathrm{Var}(\epsilon) = \sigma^2\{\epsilon\} = \sigma^2 I \]
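The matrix form translates directly into code. A minimal NumPy sketch (the sample size and true parameter values are made-up for illustration) simulates data from $Y = X\beta + \epsilon$; the same simulated dataset is reused in the later sketches:

```python
import numpy as np

rng = np.random.default_rng(540)

# Hypothetical true parameters (illustrative values, not from the slides)
n, beta0, beta1, sigma = 50, 2.0, 0.5, 1.0

x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])   # n x 2 design matrix with rows [1, X_i]
beta = np.array([beta0, beta1])
eps = rng.normal(0, sigma, size=n)     # eps ~ N(0, sigma^2 I)

Y = X @ beta + eps                     # the model Y = X beta + eps
print(Y[:5])
```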
Least Squares Estimation

Let $\hat\beta_{2 \times 1} = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix}$ denote the least squares estimate of $\beta$. As we will show,

\[ \hat\beta = (X^T X)^{-1} X^T Y. \]

Recall that the least squares method minimizes

\[ Q = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2. \]

In matrix terms,

\[ Q = (Y - X\beta)^T (Y - X\beta) = \|Y - X\beta\|_2^2. \]
Normal Equations

Let

\[ \frac{\partial Q}{\partial \beta}_{2 \times 1} = \left( \frac{\partial Q}{\partial \beta_0}, \frac{\partial Q}{\partial \beta_1} \right)^T. \]

Differentiate $Q = (Y - X\beta)^T (Y - X\beta)$ with respect to $\beta$ to obtain

\[ \frac{\partial Q}{\partial \beta} = -2 X^T (Y - X\beta). \]

Set the equation above to $0_{2 \times 1}$ and obtain the set of normal equations

\[ X^T X \beta = X^T Y. \]

Thus the least squares estimate of $\beta$ is

\[ \hat\beta = (X^T X)^{-1} X^T Y, \]

assuming that the $2 \times 2$ matrix $X^T X$ is nonsingular and thus invertible.
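A short sketch (reusing the simulated data from above) solves the normal equations $X^T X \beta = X^T Y$ directly; using np.linalg.solve rather than forming the inverse explicitly is the standard numerical practice, and np.linalg.lstsq gives an independent cross-check:

```python
import numpy as np

rng = np.random.default_rng(540)
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
Y = X @ np.array([2.0, 0.5]) + rng.normal(size=n)   # simulated, true beta = (2, 0.5)

# Solve the normal equations X^T X beta = X^T Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)                                     # close to (2, 0.5)

# Cross-check against NumPy's QR/SVD-based least squares solver
print(np.linalg.lstsq(X, Y, rcond=None)[0])
```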
Fitted Values

Let $\hat Y_{n \times 1} = \begin{pmatrix} \hat Y_1 \\ \vdots \\ \hat Y_n \end{pmatrix}$ denote the $n \times 1$ vector of fitted values. Since $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$, we have

\[ \hat Y = X \hat\beta. \]

Note that

\[ \begin{pmatrix} \hat Y_1 \\ \vdots \\ \hat Y_n \end{pmatrix} = \begin{pmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix} \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix} = \begin{pmatrix} \hat\beta_0 + \hat\beta_1 X_1 \\ \vdots \\ \hat\beta_0 + \hat\beta_1 X_n \end{pmatrix}. \]
Hat Matrix

Rewrite the vector of fitted values as

\[ \hat Y = X \hat\beta = X (X^T X)^{-1} X^T Y := H Y, \]

where the $n \times n$ hat matrix is defined as

\[ H = X (X^T X)^{-1} X^T. \]

The hat matrix $H$ is symmetric and idempotent:

\[ H^T = \left( X (X^T X)^{-1} X^T \right)^T = X (X^T X)^{-1} X^T = H, \]
\[ HH = X (X^T X)^{-1} X^T X (X^T X)^{-1} X^T = X (X^T X)^{-1} X^T = H. \]
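Both properties are easy to confirm numerically. A minimal sketch (same illustrative design matrix as before; forming $H$ explicitly is fine at small $n$, though one would avoid it for large datasets):

```python
import numpy as np

rng = np.random.default_rng(540)
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T    # n x n hat matrix

print(np.allclose(H, H.T))              # symmetric: H^T = H
print(np.allclose(H @ H, H))            # idempotent: HH = H
print(np.isclose(np.trace(H), 2))       # trace(H) = rank(X) = 2 for simple LR
```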
Residuals

Let $e_{n \times 1} = \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix}$ denote the $n \times 1$ vector of residuals. Since $e_i = Y_i - \hat Y_i$, we have

\[ e = Y - \hat Y = Y - HY = (I - H) Y. \]

$(I - H)$ is symmetric and idempotent.

Proof: $(I - H)^T = I^T - H^T = I - H$, since $H$ is symmetric, and

\[ (I - H)(I - H) = I - H - H + HH = I - 2H + H = I - H, \]

since $H$ is idempotent.
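Continuing the running sketch, the residual identities can be checked directly; the orthogonality $X^T e = 0$ is exactly the normal equations in disguise:

```python
import numpy as np

rng = np.random.default_rng(540)
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
Y = X @ np.array([2.0, 0.5]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

e = Y - X @ beta_hat                              # e = Y - Y_hat
print(np.allclose(e, (np.eye(n) - H) @ Y))        # e = (I - H) Y
print(np.allclose(X.T @ e, 0))                    # residuals orthogonal to columns of X
```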
Variance-Covariance Matrix of Residuals

The variance-covariance matrix of the residuals is

\[ \sigma^2\{e\} = \mathrm{Var}\left( (I - H) Y \right) = (I - H) \sigma^2 I (I - H)^T = \sigma^2 (I - H). \]

The estimated variance-covariance matrix of the residuals is

\[ s^2\{e\} = MSE \cdot (I - H), \quad \text{where } MSE = \frac{e^T e}{n - 2}. \]
Variance-Covariance Matrix of Fitted Values

The variance-covariance matrix of the fitted values is

\[ \sigma^2\{\hat Y\} = \mathrm{Var}(HY) = H \sigma^2 I H^T = \sigma^2 H. \]

The estimate of $\sigma^2\{\hat Y\}$ is

\[ s^2\{\hat Y\} = MSE \cdot H. \]
Regression Coefficients

The variance-covariance matrix of the least squares estimates is

\[ \sigma^2\{\hat\beta\} = \mathrm{Var}\left( (X^T X)^{-1} X^T Y \right) = (X^T X)^{-1} X^T \sigma^2 I X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}. \]

The estimate of $\sigma^2\{\hat\beta\}$ is

\[ s^2\{\hat\beta\} = MSE \cdot (X^T X)^{-1}. \]
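In code, $s^2\{\hat\beta\}$ is one line once $MSE$ is available; the square roots of its diagonal are the standard errors of $\hat\beta_0$ and $\hat\beta_1$ that appear in t-tests and confidence intervals. A sketch on the same simulated data:

```python
import numpy as np

rng = np.random.default_rng(540)
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
Y = X @ np.array([2.0, 0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
e = Y - X @ beta_hat
MSE = e @ e / (n - 2)                   # estimate of sigma^2

s2_beta = MSE * XtX_inv                 # s^2{beta_hat} = MSE (X^T X)^{-1}
se = np.sqrt(np.diag(s2_beta))          # standard errors of beta0_hat, beta1_hat
print(s2_beta)
print(se)
```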
Estimation of Mean Response and Prediction of New Observation

Define $X_h = [\, 1 \ \ X_h \,]$ (a $1 \times 2$ row vector).

NOTE: the book uses a column vector for $X_h$, not a row, so many transposes are the opposite of the book's presentation!

The variance of $\hat Y_h$ for estimating $E(Y_h)$ is

\[ \sigma^2\{\hat Y_h\} = \mathrm{Var}(X_h \hat\beta) = X_h \sigma^2\{\hat\beta\} X_h^T = \sigma^2 X_h (X^T X)^{-1} X_h^T. \]

The estimate of $\sigma^2\{\hat Y_h\}$ is

\[ s^2\{\hat Y_h\} = MSE \cdot X_h (X^T X)^{-1} X_h^T. \]

Similarly, the variance of $\hat Y_h$ for predicting $Y_{h(new)}$ is

\[ \sigma^2\{pred\} = \sigma^2 \left( 1 + X_h (X^T X)^{-1} X_h^T \right), \]

which adds the variance $\sigma^2$ of the new observation's own error to the variance of $\hat Y_h$.

The estimate of $\sigma^2\{pred\}$ is

\[ s^2\{pred\} = MSE \left( 1 + X_h (X^T X)^{-1} X_h^T \right). \]
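A sketch putting both formulas to work (the point $X_h = 5$ and the 95% level are illustrative choices), producing a confidence interval for $E(Y_h)$ and a wider prediction interval for $Y_{h(new)}$ via the $t_{n-2}$ critical value:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(540)
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
Y = X @ np.array([2.0, 0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
e = Y - X @ beta_hat
MSE = e @ e / (n - 2)

Xh = np.array([1.0, 5.0])                 # row vector [1, X_h] at X_h = 5
Yh_hat = Xh @ beta_hat
s2_mean = MSE * (Xh @ XtX_inv @ Xh)       # s^2{Yh_hat}, for estimating E(Y_h)
s2_pred = MSE * (1 + Xh @ XtX_inv @ Xh)   # s^2{pred}, for predicting Y_h(new)

tval = t.ppf(0.975, df=n - 2)             # 95% two-sided critical value
print("CI for E(Y_h): ", Yh_hat - tval * np.sqrt(s2_mean), Yh_hat + tval * np.sqrt(s2_mean))
print("PI for Y_h(new):", Yh_hat - tval * np.sqrt(s2_pred), Yh_hat + tval * np.sqrt(s2_pred))
```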
Sums of Squares

Total sum of squares:

\[ SSTO = \sum_{i=1}^n (Y_i - \bar Y)^2 = Y^T Y - \frac{1}{n} Y^T J Y \]

Error sum of squares:

\[ SSE = \sum_{i=1}^n (Y_i - \hat Y_i)^2 = e^T e = (Y - X\hat\beta)^T (Y - X\hat\beta) = Y^T Y - \hat\beta^T X^T Y \]

Regression sum of squares:

\[ SSR = SSTO - SSE = \hat\beta^T X^T Y - \frac{1}{n} Y^T J Y \]
Sums of Squares as Quadratic Forms

For a non-random symmetric $n \times n$ matrix $A$, a quadratic form is defined as

\[ Q(Y) = Y^T A Y = \sum_{i=1}^n \sum_{j=1}^n a_{ij} Y_i Y_j, \quad a_{ij} = a_{ji}. \]

Total sum of squares:

\[ SSTO = Y^T \left( I - \frac{1}{n} J \right) Y \]

Error sum of squares:

\[ SSE = Y^T (I - H) Y \]

Regression sum of squares:

\[ SSR = Y^T \left( H - \frac{1}{n} J \right) Y \]
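A final sketch (same simulated data as throughout) computes all three quadratic forms and confirms the ANOVA decomposition $SSTO = SSE + SSR$ as well as agreement with the scalar formula:

```python
import numpy as np

rng = np.random.default_rng(540)
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
Y = X @ np.array([2.0, 0.5]) + rng.normal(size=n)

I = np.eye(n)
J = np.ones((n, n))
H = X @ np.linalg.inv(X.T @ X) @ X.T

SSTO = Y @ (I - J / n) @ Y
SSE = Y @ (I - H) @ Y
SSR = Y @ (H - J / n) @ Y

print(SSTO, SSE, SSR)
print(np.isclose(SSTO, SSE + SSR))                    # ANOVA decomposition holds
print(np.isclose(SSTO, np.sum((Y - Y.mean())**2)))    # matches sum((Y_i - Ybar)^2)
```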