Properties of the least squares estimates
2019-01-18

Warmup

Let a and b be scalar constants, and X be a scalar random variable. Fill in the blanks:

E(aX + b) = ________        Var(aX + b) = ________

Goal

Recall that the least squares estimates are:

\hat{\beta}_{p \times 1} = (X^T X)^{-1} X^T y

Our goal today is to learn about the statistical properties of these estimates, in particular their expectation and variance. (A small numerical example of computing \hat{\beta} appears at the end of this page.)

Random Vectors

\hat{\beta} is a vector-valued random variable, so we first need to cover a little more background.

Let U_1, ..., U_n be scalar random variables. Then the vector U = (U_1, ..., U_n)^T is a vector-valued random variable, a.k.a. a random vector.

The expectation of U is the vector of expectations,

E(U) = (E(U_1), ..., E(U_n))^T

and the variance-covariance matrix of U is

Var(U) = Cov(U) =
  \begin{pmatrix}
    Var(U_1)      & Cov(U_1, U_2) & \cdots & Cov(U_1, U_n) \\
    Cov(U_2, U_1) & Var(U_2)      & \cdots & Cov(U_2, U_n) \\
    \vdots        & \vdots        & \ddots & \vdots        \\
    Cov(U_n, U_1) & Cov(U_n, U_2) & \cdots & Var(U_n)
  \end{pmatrix}

For example, the errors in multiple linear regression, \epsilon_i, i = 1, ..., n, are independent with mean 0 and variance \sigma^2. Then

\epsilon_{n \times 1} = (\epsilon_1, ..., \epsilon_n)^T,   E(\epsilon) = (0, ..., 0)^T = 0,   Var(\epsilon) = \sigma^2 I_n.
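As a quick numerical illustration of the least squares formula above, here is a minimal numpy sketch. The design matrix and response are small made-up values chosen only to show the computation.

```python
import numpy as np

# Small made-up example: n = 5 observations, p = 2 columns (intercept + one predictor).
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least squares estimates: beta_hat = (X^T X)^{-1} X^T y.
# Solving the normal equations is numerically preferable to forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # roughly (0.05, 1.99) for these made-up numbers
```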
Properties of Expectation and Variance for random vectors

Let:
  U_{n \times 1} be a random vector,
  A_{m \times n} be a constant matrix, and
  b_{m \times 1} be a constant vector.

Then:

E(AU + b) = A E(U) + b
Var(AU + b) = A Var(U) A^T

These are the vector analogs of the properties you wrote down in the warmup; a small simulation check of both identities follows the exercise below.

Find E(y) and Var(y), where y_{n \times 1} satisfies the multiple linear regression equation

y = X\beta + \epsilon

with E(\epsilon) = 0 and Var(\epsilon) = \sigma^2 I.
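Here is that simulation sketch in numpy. The matrix A, vector b, and the distribution chosen for U are all arbitrary illustrations; the empirical mean and covariance of AU + b should land close to the formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])          # constant 2 x 3 matrix (made up)
b = np.array([0.5, -0.5])                 # constant 2 x 1 vector (made up)

mu = np.array([1.0, 0.0, 2.0])            # E(U)
Sigma = np.diag([1.0, 4.0, 0.25])         # Var(U), chosen diagonal for simplicity

# Simulate many draws of U and form V = AU + b for each draw.
U = rng.multivariate_normal(mu, Sigma, size=200_000)   # shape (reps, 3)
V = U @ A.T + b                                        # shape (reps, 2)

print(V.mean(axis=0))              # close to A @ mu + b
print(A @ mu + b)
print(np.cov(V, rowvar=False))     # close to A @ Sigma @ A.T
print(A @ Sigma @ A.T)
```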
Expectation of the least squares estimates

Assume the regression setup (with the usual dimensions):

y = X\beta + \epsilon

where X is fixed with rank p, E(\epsilon) = 0, and Var(\epsilon) = \sigma^2 I_n.

Fill in the blanks to show the least squares estimates are unbiased:

E(\hat{\beta}) = E((X^T X)^{-1} X^T y)
  = E((X^T X)^{-1} X^T (________))      plug in the regression equation for y
  = E(________ + ________)              expand
  = E(________ + ________)              simplify term on the left, A^{-1} A = I
  = ________ + E(________)              property of expectation
  = ________                            regression assumptions
  = \beta
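A Monte Carlo sketch of this unbiasedness result, with a made-up design matrix, true \beta, and \sigma chosen arbitrarily: averaging \hat{\beta} over many simulated error vectors should come out close to \beta.

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 30, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])  # fixed design
beta = np.array([1.0, 2.0, -0.5])   # "true" coefficients, chosen arbitrarily
sigma = 1.5

reps = 20_000
estimates = np.empty((reps, p))
for r in range(reps):
    eps = rng.normal(0.0, sigma, size=n)       # E(eps) = 0, Var(eps) = sigma^2 I
    y = X @ beta + eps
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))   # average of beta_hat over simulations, close to beta
print(beta)
```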
Variance-covariance matrix of the least squares estimates

Fill in the blanks to find the variance-covariance matrix of the least squares estimates:

Var(\hat{\beta}) = Var((X^T X)^{-1} X^T y)
  = Var((X^T X)^{-1} X^T X \beta + (X^T X)^{-1} X^T \epsilon)    plug in reg. eqn. and expand
  = 0 + (________) Var(\epsilon) (________)^T                    property of Var
  = (________)(________)^T                                       regression assumption
  = \sigma^2 (________)(________)^T                              move scalar to front
  = \sigma^2 (________)                                          distribute transpose
  = \sigma^2 (X^T X)^{-1}                                        since A^{-1} A = I

A simulation check of this matrix appears at the end of this page.

We can pull out the variance of a particular parameter estimate, say \hat{\beta}_i, from the diagonal of the matrix:

Var(\hat{\beta}_i) = \sigma^2 [(X^T X)^{-1}]_{(i+1)(i+1)}

where A_{ij} indicates the element in the i-th row and j-th column of the matrix A. Why i + 1?

The off-diagonal terms tell us about the covariance between parameter estimates.

Estimating \sigma^2

To make use of the variance-covariance results we need to be able to estimate \sigma^2. An unbiased estimate is:

\hat{\sigma}^2 = \frac{1}{n - p} \sum_{i=1}^{n} e_i^2 = \frac{\|e\|^2}{n - p}
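A companion simulation sketch, again with arbitrary made-up values for the design, \beta, and \sigma: the empirical covariance matrix of \hat{\beta} across many simulations should be close to \sigma^2 (X^T X)^{-1}.

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 30, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.5

# Theoretical variance-covariance matrix of beta_hat.
theory = sigma**2 * np.linalg.inv(X.T @ X)

# Empirical covariance of beta_hat across many simulated error vectors.
reps = 20_000
estimates = np.empty((reps, p))
for r in range(reps):
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(theory)
print(np.cov(estimates, rowvar=False))   # should be close to the theoretical matrix
```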
The denominator n - p is known as the residual degrees of freedom.

Standard errors of particular parameters

The standard error of a particular parameter is then the square root of the variance, replacing \sigma^2 with its estimate:

SE(\hat{\beta}_i) = \sqrt{\hat{\sigma}^2 [(X^T X)^{-1}]_{(i+1)(i+1)}}

A worked numpy computation of \hat{\sigma}^2 and these standard errors appears at the end of this page.

Gauss-Markov Theorem

You might wonder if we can find estimates with better properties. The Gauss-Markov theorem says the least squares estimates are BLUE (Best Linear Unbiased Estimator): of all linear, unbiased estimates, the least squares estimates have the smallest variance.

Of course, if you are willing to use a non-linear and/or biased estimate, you might be able to find one with smaller variance.

For a proof, see Section 2.8 in Faraway.
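Here is that worked computation of \hat{\sigma}^2 and the standard errors by hand in numpy, on a single simulated data set; the design, \beta, and \sigma are the same arbitrary made-up values as before.

```python
import numpy as np

rng = np.random.default_rng(3)

n, p = 30, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.5
y = X @ beta + rng.normal(0.0, sigma, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat                      # residuals e
sigma2_hat = resid @ resid / (n - p)          # unbiased estimate of sigma^2
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))   # SE(beta_hat_i) for each i

print(beta_hat)
print(sigma2_hat)
print(se)
```

On the "Why i + 1?" indexing: the notes index the matrix from 1 while the parameters start at \hat{\beta}_0 (the intercept), hence the offset; numpy indexes from 0, so diagonal entry i lines up with \hat{\beta}_i directly.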
Summary

For the linear regression model

Y = X\beta + \epsilon

where E(\epsilon) = 0_{n \times 1}, Var(\epsilon) = \sigma^2 I_n, and the matrix X_{n \times p} is fixed with rank p, the least squares estimates are

\hat{\beta} = (X^T X)^{-1} X^T Y

Furthermore, the least squares estimates are BLUE, and

E(\hat{\beta}) = \beta,
Var(\hat{\beta}) = \sigma^2 (X^T X)^{-1},
E(\hat{\sigma}^2) = E\left(\frac{1}{n - p} \sum_{i=1}^{n} e_i^2\right) = \sigma^2.

We have not used any Normality assumptions to show these properties.
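Finally, a simulation sketch (with the same arbitrary made-up design, \beta, and \sigma as before) checking the claim that \hat{\sigma}^2 is unbiased for \sigma^2.

```python
import numpy as np

rng = np.random.default_rng(4)

n, p = 30, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.5

reps = 20_000
sigma2_hats = np.empty(reps)
for r in range(reps):
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hats[r] = resid @ resid / (n - p)

print(sigma2_hats.mean())   # should be close to sigma^2 = 2.25
```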