Ma 3/103: Lecture 24
Linear Regression I: Estimation
KC Border
March 3, 2017
Regression analysis

Regression analysis estimates and tests $E(Y \mid X) = f(X)$. Here $f$ is the regression function, and the components of $X = (X_1, \dots, X_K)$ are the regressors.
The standard linear model

$$Y = X\beta + \varepsilon$$
or
$$y_t = x_{t,1}\beta_1 + \cdots + x_{t,K}\beta_K + \varepsilon_t \qquad (t = 1, \dots, N).$$
The linear model is more general than you might think

Kepler's 3rd Law. The square of the orbital period of a planet is directly proportional to the cube of the semi-major axis of its orbit:
$$P^2 = cA^3, \quad\text{or}\quad 2\ln P = \ln c + 3\ln A.$$

Hubble's Law:
$$\text{red shift} = c \cdot \text{distance}.$$

Newton's Law of Gravity:
$$F = G\frac{M_1 M_2}{d^2}, \quad\text{or}\quad \ln F = \ln G + \ln M_1 + \ln M_2 - 2\ln d.$$
Polynomials:
$$y = b_0 + b_1 x + b_2 x^2 + \cdots + b_K x^K.$$

Geometric means:
$$y = b_0 x_1^{b_1} x_2^{b_2} \cdots x_K^{b_K}, \quad\text{so}\quad \ln y = \ln b_0 + b_1 \ln x_1 + \cdots + b_K \ln x_K.$$

Dummy variables, or indicators: e.g.,
$$X_1 = \begin{cases} 1 & \text{Honda} \\ 0 & \text{otherwise,} \end{cases} \qquad X_2 = \begin{cases} 1 & \text{Kawasaki} \\ 0 & \text{otherwise,} \end{cases} \qquad \dots, \qquad X_\ell = \begin{cases} 1 & \text{Ducati} \\ 0 & \text{otherwise.} \end{cases}$$
Variates

The variates $X_k$ may be fixed constants chosen by an experimenter, or they may themselves be random variables. They are called regressors. A constant variate is almost always included.
Data

We have $N$ observations of the values $x_1, \dots, x_K$ and $y$:
$$y_t = x_{t,1}\beta_1 + \cdots + x_{t,K}\beta_K + \varepsilon_t \qquad (t = 1, \dots, N),$$
where the $\varepsilon_t$'s are unobserved errors. In matrix form,
$$y = X\beta + \varepsilon.$$
$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix} \text{ is an } N\times 1 \text{ column vector}, \qquad X = \begin{pmatrix} x_{1,1} & \cdots & x_{1,K} \\ \vdots & \ddots & \vdots \\ x_{N,1} & \cdots & x_{N,K} \end{pmatrix} \text{ is an } N\times K \text{ matrix},$$
$$\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix} \text{ is a } K\times 1 \text{ column vector}, \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{pmatrix} \text{ is an } N\times 1 \text{ column vector}.$$
The estimation problem

The problem is to estimate $(\beta_1, \dots, \beta_K)$. Statistical assumptions of the standard model:
$$E(\varepsilon \mid X) = 0, \qquad \operatorname{Var}(\varepsilon \mid X) = E(\varepsilon\varepsilon' \mid X) = \sigma^2 I_{N\times N}.$$
This last assumption is known as homoskedasticity.
The Least Squares approach
Sum of squared residuals

The vector of residuals, as a function of $b$, is $y - Xb$. The sum of squared residuals (SSR) is $(y - Xb)'(y - Xb)$. Expanding yields
$$\operatorname{SSR}(b) = y'y - 2y'Xb + b'X'Xb,$$
which is a convex quadratic function of the components of $b$.
Minimizing the sum of squared residuals

By convexity, the minimum occurs wherever the gradient equals zero. The gradient is
$$\nabla \operatorname{SSR}(b) = -2X'y + 2X'Xb.$$
Thus the minimizer $\hat\beta_{OLS}$ satisfies the first-order condition $\nabla\operatorname{SSR}(\hat\beta_{OLS}) = 0$:
$$X'y = X'X\hat\beta_{OLS}.$$
This matrix equation is known as the normal equation for $\hat\beta_{OLS}$.
Least Squares Estimator

On the hypothesis that $X'X$ (a $K\times K$ matrix) is nonsingular, we then have that
$$\hat\beta_{OLS} = (X'X)^{-1}X'y$$
minimizes the sum of squared residuals. This $\hat\beta_{OLS}$ is called the ordinary least squares (OLS) estimator of $\beta$.
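The estimator above can be sketched numerically. This is a minimal illustration on simulated data (all names, sample sizes, and coefficient values are assumptions for the example, not part of the lecture):

```python
import numpy as np

# Simulate data from y = X beta + eps (illustrative values).
rng = np.random.default_rng(0)
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # constant + 2 regressors
beta = np.array([2.0, -1.0, 0.5])
y = X @ beta + rng.normal(scale=0.1, size=N)

# Solve the normal equation X'X b = X'y directly; solving the linear
# system is numerically preferable to forming (X'X)^{-1} explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With a small error variance and $N = 200$, `beta_hat` lands close to the true $(2, -1, 0.5)$.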
The singular case

What if $X'X$ is singular? Then
$$a_1 X_1 + \cdots + a_K X_K = 0,$$
where not all $a_k$ are zero. Then
$$y = \beta_1 X_1 + \cdots + \beta_K X_K + \varepsilon + c\,\underbrace{(a_1 X_1 + \cdots + a_K X_K)}_{=0} = (\beta_1 + ca_1)X_1 + \cdots + (\beta_K + ca_K)X_K + \varepsilon$$
for any value of $c$. Whenever $a_k$ is nonzero, the coefficient on $X_k$ can be whatever we want. That is, the data cannot tell us what the coefficient $\beta_k$ is, even if every error term is zero.
Properties

$$\hat\beta_{OLS} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon.$$
This is a random vector.

Set $e = y - X\hat\beta_{OLS}$. The vector $e$ of residuals is orthogonal to each $k$th column vector of values of the regressor $X_k$, i.e., $X'e = 0$, since
$$X'e = X'(y - X\hat\beta_{OLS}) = X'y - X'X\hat\beta_{OLS} = X'y - X'X(X'X)^{-1}X'y = X'y - X'y = 0.$$
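The orthogonality property $X'e = 0$ can be verified numerically. A small sketch on arbitrary simulated data (names and values are illustrative):

```python
import numpy as np

# Fit OLS to arbitrary data and check that the residuals are
# orthogonal to every regressor column, up to floating-point error.
rng = np.random.default_rng(1)
N, K = 50, 4
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
orth = X.T @ e  # should be (numerically) the zero vector
```

Note the property holds whatever $y$ is; it is a fact about the fitted values, not about the model being true.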
If the regressors include a constant term, then the fitted plane passes through the sample means. That is,
$$\bar y = \bar x_1 \hat\beta_1 + \cdots + \bar x_K \hat\beta_K.$$
Proof: $y = X\hat\beta_{OLS} + e$, so
$$\mathbf{1}'y = \mathbf{1}'X\hat\beta_{OLS} + \mathbf{1}'e,$$
where $\mathbf{1}$ is an $N$-vector of ones. Since $\mathbf{1}$ is one of the regressors, $\mathbf{1}'e = 0$. Dividing by $N$ gives $\bar y = \bar x_1\hat\beta_1 + \cdots + \bar x_K\hat\beta_K$.
The Geometry of LSE

[Figure: $y$ is projected onto the span of the regressors $x_1$ and $x_2$; the fitted vector $\hat y = \hat\beta_1 x_1 + \hat\beta_2 x_2$ lies in that span, and the residual $e = y - \hat y$ is orthogonal to it.]
OLS and MLE

When the error vector $\varepsilon$ has a multivariate normal $N(0, \sigma^2 I)$ distribution, the OLS estimator of $\beta$ is also the Maximum Likelihood Estimator.
MLE of β

The density of $\varepsilon = y - X\beta$ is the multivariate normal $N(0, \sigma^2 I)$ density
$$\left(\frac{1}{\sqrt{2\pi}}\right)^N \frac{1}{\sqrt{\det \sigma^2 I}}\, e^{-\frac12 (y-X\beta)'(\sigma^2 I)^{-1}(y-X\beta)} = \left(\frac{1}{\sqrt{2\pi}}\right)^N \left(\frac{1}{\sigma^2}\right)^{N/2} e^{-\frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta)}.$$
Taking logs, we find the log likelihood function is
$$-\frac{N}{2}\log(2\pi) - \frac{N}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta).$$
Maximizing this with respect to $\beta$ amounts to minimizing $(y-X\beta)'(y-X\beta)$, which is exactly what OLS does.
MLE of σ²

The first-order condition for the maximum with respect to $\sigma^2$ is
$$-\frac{N}{2}\frac{1}{\sigma^2} + \frac12 (y-X\beta)'(y-X\beta)\frac{1}{(\sigma^2)^2} = 0.$$
Then multiply by $2(\sigma^2)^2$ to get $-N\sigma^2 + (y-X\beta)'(y-X\beta) = 0$, so
$$\hat\sigma^2_{MLE} = \frac{e'e}{N}, \qquad\text{where } e = y - X\hat\beta.$$
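A short numerical sketch of this estimator on simulated data (the sample size, coefficients, and true $\sigma = 2$ are assumptions for the example). Note that dividing by $N$ rather than $N - K$ makes the MLE slightly smaller than the unbiased estimate that appears later in the lecture:

```python
import numpy as np

# Simulate y = X beta + eps with known error s.d. sigma = 2 (illustrative).
rng = np.random.default_rng(2)
N, K = 500, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 3.0]) + rng.normal(scale=2.0, size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

sigma2_mle = (e @ e) / N             # MLE: divides by N, biased downward
sigma2_unbiased = (e @ e) / (N - K)  # divides by N - K, unbiased
```

Both estimates should land near the true $\sigma^2 = 4$, with the MLE strictly below the unbiased version.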
$\hat\beta_{OLS}$ is unbiased

$$\hat\beta_{OLS} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon,$$
so
$$\hat\beta_{OLS} - \beta = (X'X)^{-1}X'\varepsilon$$
and
$$E(\hat\beta_{OLS} - \beta) = E\big[(X'X)^{-1}X'\varepsilon\big] = (X'X)^{-1}X'\,E\varepsilon = 0.$$
That is, $\hat\beta_{OLS}$ is unbiased: $E\hat\beta_{OLS} = \beta$.
Variance-covariance matrix of $\hat\beta_{OLS}$

$$(\hat\beta_{OLS} - \beta)(\hat\beta_{OLS} - \beta)' = (X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1},$$
so
$$\operatorname{Var}(\hat\beta_{OLS}) = E(\hat\beta_{OLS} - \beta)(\hat\beta_{OLS} - \beta)' = (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1} = \sigma^2(X'X)^{-1}.$$
Gauss-Markov Theorem

In the standard linear model, if $X$ has rank $K$, then the OLS estimator $\hat\beta_{OLS}$ is the Best Linear Unbiased Estimator (BLUE) of $\beta$ in the following sense: given any other estimator $b$ of $\beta$ that is linear in $y$ and satisfies $Eb = \beta$ for every possible value of $\beta$,
$$\operatorname{Var} b = \operatorname{Var}\hat\beta_{OLS} + P,$$
where $P$ is positive semidefinite. This implies that for any vector $w$ of weights,
$$\operatorname{Var}(w'b) \ge \operatorname{Var}(w'\hat\beta_{OLS}).$$
Proof of Gauss-Markov

Let $b = Ay$. Define
$$D = A - (X'X)^{-1}X'.$$
Then
$$b = Ay = \big(D + (X'X)^{-1}X'\big)y = \big(D + (X'X)^{-1}X'\big)(X\beta + \varepsilon) = DX\beta + \beta + \big(D + (X'X)^{-1}X'\big)\varepsilon,$$
so
$$b - \beta = DX\beta + \big(D + (X'X)^{-1}X'\big)\varepsilon. \tag{1}$$
So in expectation,
$$Eb - \beta = DX\beta + \big(D + (X'X)^{-1}X'\big)\underbrace{E\varepsilon}_{=0} = DX\beta.$$
Proof of Gauss-Markov, continued

Now $b$ is unbiased if and only if $DX\beta = 0$ for all $\beta$. Therefore $DX = 0$, so (1) becomes
$$b - \beta = \big(D + (X'X)^{-1}X'\big)\varepsilon.$$
Proof of Gauss-Markov, continued

So for an unbiased linear estimator $b$,
$$\operatorname{Var} b = E(b-\beta)(b-\beta)' = \big(D + (X'X)^{-1}X'\big)\,E(\varepsilon\varepsilon')\,\big(D + (X'X)^{-1}X'\big)'$$
$$= \sigma^2\big(D + (X'X)^{-1}X'\big)\big(D' + X(X'X)^{-1}\big) = \sigma^2\Big(DD' + \underbrace{DX}_{=0}(X'X)^{-1} + (X'X)^{-1}\underbrace{X'D'}_{=0} + (X'X)^{-1}\Big)$$
$$= \sigma^2 DD' + \operatorname{Var}\hat\beta_{OLS}.$$
But $P = \sigma^2 DD'$ is positive semidefinite, since $w'DD'w = (D'w)'(D'w) \ge 0$. q.e.d.
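The decomposition in the proof can be checked numerically. As one concrete alternative unbiased linear estimator, this sketch uses a weighted least squares matrix $A = (X'WX)^{-1}X'W$ with arbitrary positive weights $W$ (a deliberately suboptimal choice; the weights and dimensions are illustrative assumptions), and verifies that $DX = 0$ and that $P = DD'$ is positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 40, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
W = np.diag(rng.uniform(0.5, 2.0, size=N))   # arbitrary positive weights

A_ols = np.linalg.solve(X.T @ X, X.T)        # (X'X)^{-1} X'
A = np.linalg.solve(X.T @ W @ X, X.T @ W)    # alternative linear unbiased estimator
D = A - A_ols

DX = D @ X                   # should be the zero matrix (unbiasedness)
P = D @ D.T                  # excess-variance factor, should be PSD
eigs = np.linalg.eigvalsh(P) # all eigenvalues >= 0 up to rounding
```

Any other $A$ with $AX = I$ would work just as well; the weighted choice simply makes $D \ne 0$ so the excess variance is visible.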
Estimating σ²

$$e = My = M\varepsilon, \qquad\text{where } M = I - X(X'X)^{-1}X'$$
(using $MX = 0$). Since $M$ is symmetric and idempotent,
$$e'e = \varepsilon'M'M\varepsilon = \varepsilon'M\varepsilon.$$
Since $\varepsilon'M\varepsilon$ is $1\times 1$, it is equal to its trace, and since trace is a linear operator, the expected value of the trace of a random matrix is the trace of the expected matrix. Thus, by the magic of linear algebra,
$$E(e'e) = E(\varepsilon'M\varepsilon) = E\operatorname{tr}(M\varepsilon\varepsilon') = \operatorname{tr}(\sigma^2 M) = \sigma^2\big(N - \operatorname{tr}\,(X'X)^{-1}X'X\big) = (N-K)\sigma^2.$$
Estimating σ², continued

Define
$$s^2 = \frac{e'e}{N-K}, \qquad s = \sqrt{\frac{e'e}{N-K}}.$$

Theorem. If $\varepsilon \sim N(0, \sigma^2 I)$, then
$$\hat\beta_{OLS} \sim N\big(\beta, \sigma^2(X'X)^{-1}\big) \qquad\text{and}\qquad \frac{(N-K)s^2}{\sigma^2} \sim \chi^2(N-K).$$
Also, $\hat\beta_{OLS}$ and $s^2$ are independent.
Test statistics

If $\varepsilon$ is jointly normal, then for any $K$-vector $w$ of weights,
$$w'(\hat\beta_{OLS} - \beta) \sim N\big(0,\ \sigma^2 w'(X'X)^{-1}w\big),$$
so
$$\frac{w'(\hat\beta_{OLS} - \beta)}{s\sqrt{w'(X'X)^{-1}w}} \sim t(N-K). \tag{2}$$
Standard error of $\hat\beta_{k,OLS}$

Special case: $w$ is the $k$th unit coordinate vector, so
$$\frac{\hat\beta_k - \beta_k}{s\sqrt{(X'X)^{-1}_{kk}}} \sim t(N-K).$$
Since $\sigma^2(X'X)^{-1}_{kk} = \operatorname{Var}\hat\beta_{k,OLS}$, the quantity $s\sqrt{(X'X)^{-1}_{kk}}$ is the estimated standard deviation of $\hat\beta_{k,OLS}$, and is called the standard error of $\hat\beta_{k,OLS}$.
Confidence intervals for β_k

The $1-\alpha$ confidence interval for $\beta_k$ is
$$\Big(\hat\beta_k - t_{\frac{\alpha}{2},\,N-K}\, s\sqrt{(X'X)^{-1}_{kk}},\ \ \hat\beta_k + t_{\frac{\alpha}{2},\,N-K}\, s\sqrt{(X'X)^{-1}_{kk}}\Big),$$
where $t_{\frac{\alpha}{2},\,N-K}$ is the upper $\alpha/2$ critical value of the $t(N-K)$ distribution.
Testing β_k

To test
$$H_0\colon \beta_k = \beta_k^0 \quad\text{versus}\quad H_1\colon \beta_k \ne \beta_k^0,$$
compute
$$t = \frac{\hat\beta_{k,OLS} - \beta_k^0}{s\sqrt{(X'X)^{-1}_{kk}}}.$$
We reject the null hypothesis if $|t| > t_{\frac{\alpha}{2},\,N-K}$. For the null hypothesis $H_0\colon \beta_k = 0$, we have
$$t = \frac{\hat\beta_{k,OLS}}{s\sqrt{(X'X)^{-1}_{kk}}}.$$
It is this value of $t$ that statistical software reports as the t-value for $\beta_k$.
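Putting the last few slides together, here is a hedged sketch that computes the standard errors and t-values as statistical software would report them (the data-generating values are illustrative assumptions, not from the lecture):

```python
import numpy as np

# Simulate y = 1.0 + 0.8 x + eps (illustrative values).
rng = np.random.default_rng(4)
N, K = 120, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 0.8]) + rng.normal(scale=0.5, size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat

s2 = (e @ e) / (N - K)               # unbiased estimate of sigma^2
se = np.sqrt(s2 * np.diag(XtX_inv))  # standard errors of beta_hat
t_values = beta_hat / se             # t statistics for H0: beta_k = 0
```

With a true slope of $0.8$ and this much data, the slope's t-value is far above any conventional critical value, so $H_0\colon \beta_2 = 0$ would be rejected.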