Lecture 6: Geometry of OLS Estimation of Linear Regression
Xuexin Wang, WISE, Oct 2013
Matrix Algebra

An $n \times m$ matrix $A$ is a rectangular array of $nm$ elements arranged in $n$ rows and $m$ columns. A typical element of $A$ may be denoted by either $A_{ij}$ or $a_{ij}$, where $i = 1, \dots, n$ and $j = 1, \dots, m$.
A matrix with only one column or only one row is called a vector. There are two types of vectors: column vectors and row vectors.
A matrix with the same number of rows and columns is said to be square.
A square matrix $A$ is symmetric if $A_{ij} = A_{ji}$ for all $i$ and $j$.
A square matrix is said to be diagonal if $A_{ij} = 0$ for all $i \neq j$.
The transpose of $A$, obtained by interchanging its row and column subscripts, is denoted $A^T$ or $A'$.
Arithmetic Operations on Matrices

Addition and subtraction of matrices work exactly the way they do for scalars, element by element.
Matrix multiplication: $\underset{n \times m}{A}\,\underset{m \times l}{B} = \underset{n \times l}{C}$; in general $AB \neq BA$ except in special cases.
Identity matrix $I$: $IB = B = BI$.
Assuming the dimensions of the matrices are conformable for the various operations:
Distributive properties: $A(B + C) = AB + AC$, $(B + C)A = BA + CA$
Associative properties: $(A + B) + C = A + (B + C)$, $(AB)C = A(BC)$
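These rules are easy to check numerically; a minimal sketch with arbitrary example matrices, verifying non-commutativity together with the distributive and associative properties:

```python
import numpy as np

# Arbitrary conformable matrices (illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))   # 2 x 3
B = rng.standard_normal((3, 3))   # 3 x 3
C = rng.standard_normal((3, 3))   # 3 x 3

assert not np.allclose(B @ C, C @ B)            # BC != CB in general
assert np.allclose(A @ (B + C), A @ B + A @ C)  # distributive
assert np.allclose((A @ B) @ C, A @ (B @ C))    # associative
```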
Transpose and Inverse

$(AB)^T = B^T A^T$
If $A$ is invertible, then it has an inverse matrix $A^{-1}$ with the property that $AA^{-1} = A^{-1}A = I$.
If $A$ is symmetric, then so is $A^{-1}$. If $A$ is triangular, then so is $A^{-1}$.
If an $n \times n$ square matrix $A$ is invertible, then its rank is $n$; such a matrix is said to have full rank. If a square matrix does not have full rank, and therefore is not invertible, it is said to be singular.
For matrices that are not necessarily square, the rank is the largest number $m$ for which an $m \times m$ nonsingular matrix can be constructed by omitting some rows and some columns from the original matrix.
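A small numerical sketch of the invertibility and rank facts above, using `numpy.linalg` on example matrices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                 # symmetric, full rank
A_inv = np.linalg.inv(A)

assert np.linalg.matrix_rank(A) == 2       # full rank: invertible
assert np.allclose(A @ A_inv, np.eye(2))   # A A^{-1} = I
assert np.allclose(A_inv, A_inv.T)         # inverse of a symmetric matrix is symmetric

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])                 # second row is twice the first
assert np.linalg.matrix_rank(S) == 1       # not full rank: singular
```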
Regression Models and Matrix Notation

Model with one regressor:
$y_i = \beta_0 + \beta_1 x_{1i} + u_i, \quad i = 1, \dots, n$
Matrix notation: $y = X\beta + u$, where
$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} \\ 1 & x_{12} \\ \vdots & \vdots \\ 1 & x_{1n} \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$
Multiple Regression Model

Matrix notation: $y = X\beta + u$, where
$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} x_{11} & \cdots & x_{k1} \\ x_{12} & \cdots & x_{k2} \\ \vdots & & \vdots \\ x_{1n} & \cdots & x_{kn} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}$
The case of an intercept is incorporated by taking one column of $X$ to be a column of ones.
Partitioned Matrices

By columns: $\underset{n \times k}{X} = [\,\underset{n \times 1}{x_1}\;\; \underset{n \times 1}{x_2}\; \cdots\; \underset{n \times 1}{x_k}\,]$
By rows: $\underset{n \times k}{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}$, where each row $X_i$ is $1 \times k$
Into blocks: $\underset{n \times k}{X} = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix}$, with row partition $n_1 + n_2 = n$ and column partition $k_1 + k_2 = k$
If two matrices $A$ and $B$ of the same dimensions are partitioned in exactly the same way, they can be added or subtracted block by block: $A + B = [\,A_1 + B_1 \;\; A_2 + B_2\,]$
OLS Estimation of the One-Regressor Model

First-order conditions:
$\frac{1}{n}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_{1i}) = 0$
$\frac{1}{n}\sum_{i=1}^{n} x_{1i}(y_i - \beta_0 - \beta_1 x_{1i}) = 0$
Matrix form:
$\begin{pmatrix} n & \sum_{i=1}^{n} x_{1i} \\ \sum_{i=1}^{n} x_{1i} & \sum_{i=1}^{n} x_{1i}^2 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_{1i} y_i \end{pmatrix}$
that is, $X^T X \beta = X^T y$, so that
$\hat{\beta} = (X^T X)^{-1} X^T y$
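The normal equations can be solved directly; a minimal sketch on made-up data, checking that the resulting residuals are orthogonal to each column of $X$ as the first-order conditions require:

```python
import numpy as np

# Made-up data for a one-regressor model with intercept (illustration only):
# true beta_0 = 1, beta_1 = 2, small noise.
rng = np.random.default_rng(42)
n = 100
x1 = rng.standard_normal(n)
y = 1.0 + 2.0 * x1 + 0.1 * rng.standard_normal(n)

X = np.column_stack([np.ones(n), x1])          # columns: intercept, x1
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solve X'X beta = X'y

# The normal equations say X'(y - X beta_hat) = 0.
residuals = y - X @ beta_hat
assert np.allclose(X.T @ residuals, 0.0, atol=1e-8)
assert np.allclose(beta_hat, [1.0, 2.0], atol=0.1)  # close to the true values
```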
OLS Estimation of the Multiple Linear Regression Model

$SSR(\beta) = (y - X\beta)^T (y - X\beta)$
First-order condition: $X^T (y - X\beta) = 0$
Remark (method of moments): $E(x_i u) = 0$, $i = 1, \dots, k$
Hence $X^T X \beta = X^T y$
The Geometry of Linear Regression: Introduction

Consider $n$ observations of a linear regression model with $k$ regressors, $y = X\beta + u$, where $y$ and $u$ are $n$-vectors and $X$ is an $n \times k$ matrix, so that $\hat{\beta} = (X^T X)^{-1} X^T y$.
We study the numerical properties of these OLS estimates: they have nothing to do with how the data were actually generated, and they are best understood through Euclidean geometry.
The Geometry of Vector Spaces

An $n$-vector is defined as a column vector with $n$ elements, that is, an $n \times 1$ matrix.
Euclidean space in $n$ dimensions is denoted $E^n$.
Scalar (inner) product: for any two vectors $x, y \in E^n$, their scalar product is $\langle x, y \rangle = x^T y$.
It is commutative: $\langle x, y \rangle = \langle y, x \rangle$.
The norm of a vector $x$ is $\|x\| = (x^T x)^{1/2}$.
Vector Geometry in Two Dimensions

Cartesian coordinates: $x = (x_1, x_2)$, $y = (y_1, y_2)$.
Vectors are added coordinate by coordinate: $x + y = (x_1 + y_1, x_2 + y_2)$.
[Figure: addition of vectors, shown as the parallelogram construction in the plane]
The Geometry of Scalar Products

Multiplying by a scalar: $\|\alpha x\| = |\alpha| \, \|x\|$.
$\langle x, y \rangle = \|x\| \, \|y\| \cos\theta$, where $\theta$ is the angle between $x$ and $y$.
If $\cos\theta = 0$, i.e. $\theta = \pi/2$, then $x$ and $y$ are said to be orthogonal.
Cauchy-Schwarz inequality: $|\langle x, y \rangle| \le \|x\| \, \|y\|$.
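These facts can be checked on example vectors; a small sketch verifying orthogonality, the Cauchy-Schwarz inequality, and that the angle formula yields a valid cosine:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([4.0, -3.0])           # chosen so that <x, y> = 12 - 12 = 0

inner = x @ y
norm_x, norm_y = np.linalg.norm(x), np.linalg.norm(y)

assert inner == 0.0                   # x and y are orthogonal
assert abs(inner) <= norm_x * norm_y  # Cauchy-Schwarz inequality

z = np.array([1.0, 2.0])
cos_theta = (x @ z) / (norm_x * np.linalg.norm(z))
assert -1.0 <= cos_theta <= 1.0       # a valid cosine, as the formula requires
```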
Subspaces of Euclidean Space

One way of defining a subspace of $E^n$ is in terms of a set of basis vectors. The subspace of particular interest to us is the one for which the columns of $X$ provide the basis vectors.
Denote the $k$ columns of $X$ by $x_1, x_2, \dots, x_k$. The subspace associated with these $k$ basis vectors is denoted $\mathcal{S}(X)$ or $\mathcal{S}(x_1, x_2, \dots, x_k)$:
$\mathcal{S}(x_1, \dots, x_k) \equiv \{\, z \in E^n : z = \sum_{i=1}^{k} b_i x_i, \; b_i \in \mathbb{R} \,\}$
This subspace is called the subspace spanned by $x_1, \dots, x_k$, or the column space of $X$.
The orthogonal complement of $\mathcal{S}(X)$, denoted $\mathcal{S}^{\perp}(X)$, is
$\mathcal{S}^{\perp}(X) \equiv \{\, w \in E^n : w^T z = 0 \text{ for all } z \in \mathcal{S}(X) \,\}$
If the dimension of $\mathcal{S}(X)$ is $k$, then the dimension of $\mathcal{S}^{\perp}(X)$ is $n - k$.
The Geometry of OLS Estimation

$X = [\,x_1\; x_2\; \cdots\; x_k\,]$, so that
$X\beta = [\,x_1\; x_2\; \cdots\; x_k\,] \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} = \sum_{i=1}^{k} \beta_i x_i$
The OLS estimator $\hat{\beta}$ satisfies $X^T (y - X\hat{\beta}) = 0$.
The Geometry of OLS Estimation

Since $X\hat{\beta}$ and $\hat{u} = y - X\hat{\beta}$ are orthogonal, Pythagoras' Theorem gives
$\|y\|^2 = \|X\hat{\beta}\|^2 + \|\hat{u}\|^2$, i.e. $y^T y = \hat{\beta}^T X^T X \hat{\beta} + \hat{u}^T \hat{u}$: TSS = ESS + SSR
[Figure: residuals and fitted values; $y$ decomposed into $X\hat{\beta}$ and $\hat{u}$, with angle $\theta$ between $y$ and $X\hat{\beta}$]
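The Pythagorean decomposition can be verified numerically; a sketch on made-up data:

```python
import numpy as np

# Verify ||y||^2 = ||X beta_hat||^2 + ||u_hat||^2 on simulated data.
rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.standard_normal((n, k))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
fitted = X @ beta_hat
resid = y - fitted

assert np.isclose(fitted @ resid, 0.0, atol=1e-8)          # orthogonality
assert np.isclose(y @ y, fitted @ fitted + resid @ resid)  # TSS = ESS + SSR
```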
The Geometry of OLS Estimation x 2 O θ y X ˆβ B û A S(x 1, x 2) x 1 a) y projected on two regressors x 2 A ˆβ 2x 2 X ˆβ O ˆβ 1x 1 x 1 O θ y X ˆβ B û A b) The span S(x 1, x 2) of the regressors c) The vertical plane through y Linear regression in three dimensions 17 / 22
Orthogonal Projections

A projection is a mapping that takes each point of $E^n$ into a point in a subspace of $E^n$, while leaving all points in that subspace unchanged.
An orthogonal projection maps any point into the point of the subspace that is closest to it. OLS is an example of an orthogonal projection.
Projection matrices:
$P_X = X (X^T X)^{-1} X^T$
$M_X = I - X (X^T X)^{-1} X^T = I - P_X$
$X\hat{\beta} = P_X y, \quad M_X y = \hat{u}$
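A sketch of the projection matrices on made-up data, checking that $P_X y$ gives the fitted values, $M_X y$ gives the residuals, and that both matrices are idempotent and complementary:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 2
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # P_X: projects onto S(X)
M = np.eye(n) - P                      # M_X: projects onto the orthogonal complement

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(P @ y, X @ beta_hat)                  # P_X y = X beta_hat
assert np.allclose(M @ y, y - X @ beta_hat)              # M_X y = residuals
assert np.allclose(P @ P, P) and np.allclose(M @ M, M)   # idempotent
assert np.allclose(P @ M, 0.0, atol=1e-10)               # complementary: P_X M_X = 0
```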
Orthogonal Projections P X P X = P X, M X M X = M X The pair of projections P X and M X are said to be complementary projections, since the sum of P X y and M X y restores the original vector y y 2 = P X y 2 + M X y 2 P X y y P Z would be the matrix that projects on to (Z), P X,W would be the matrix that projects of (X, W) 19 / 22
Linear Transformations of Regressors

Let $A$ be a nonsingular $k \times k$ matrix and consider the transformed regressors
$XA = X [\,a_1\; a_2\; \cdots\; a_k\,] = [\,Xa_1\; Xa_2\; \cdots\; Xa_k\,]$
Then $\mathcal{S}(X) = \mathcal{S}(XA)$, and $X\beta = XA (A^{-1}\beta)$.
Fitted values and residuals are invariant to any nonsingular linear transformation of the columns of $X$, even though $\hat{\beta}$ will change.
Special case: changing the units of measurement of the regressors.
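The invariance claim can be checked numerically; a sketch with an arbitrary nonsingular $A$ on made-up data:

```python
import numpy as np

# Transforming the regressors by a nonsingular A changes beta_hat
# but leaves the fitted values (and hence residuals) unchanged.
rng = np.random.default_rng(3)
n, k = 60, 2
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)
A = np.array([[2.0, 1.0],
              [0.0, 0.5]])            # nonsingular k x k transformation

b_X = np.linalg.solve(X.T @ X, X.T @ y)
XA = X @ A
b_XA = np.linalg.solve(XA.T @ XA, XA.T @ y)

assert np.allclose(X @ b_X, XA @ b_XA)              # identical fitted values
assert not np.allclose(b_X, b_XA)                   # but different coefficients
assert np.allclose(b_XA, np.linalg.solve(A, b_X))   # b_XA = A^{-1} b_X
```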
The Frisch-Waugh-Lovell Theorem

Two groups of regressors: $y = X_1 \beta_1 + X_2 \beta_2 + u$, where $X_1$ is an $n \times k_1$ matrix, $X_2$ is an $n \times k_2$ matrix, and $X = [\,X_1\;\; X_2\,]$ with $k = k_1 + k_2$.
If $X_2^T X_1 = O$, the OLS estimator of $\beta_2$ in $y = X_2 \beta_2 + u_2$ is the same as the OLS estimator of $\beta_2$ in $y = X_1 \beta_1 + X_2 \beta_2 + u$ (this is the second condition in the omitted-variable-bias analysis).
Define $P_1 = P_{X_1} = X_1 (X_1^T X_1)^{-1} X_1^T$ and $M_1 = I - P_1$. Then
$P_1 P_X = P_X P_1 = P_1, \quad M_1 M_X = M_X$
The Frisch-Waugh-Lovell Theorem

Consider the two regressions
$y = X_1 \beta_1 + X_2 \beta_2 + u$
$M_1 y = M_1 X_2 \beta_2 + \text{residuals}$
Theorem (The Frisch-Waugh-Lovell Theorem): The OLS estimates of $\beta_2$ from the two regressions above are numerically identical, and the residuals from the two regressions are numerically identical.
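The theorem can be verified numerically; a sketch on made-up data, comparing the full regression with the partialled-out regression of $M_1 y$ on $M_1 X_2$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k1, k2 = 80, 2, 2
X1 = rng.standard_normal((n, k1))
X2 = rng.standard_normal((n, k2))
y = rng.standard_normal(n)

# Full regression: y on [X1 X2].
X = np.hstack([X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)
resid_full = y - X @ beta_full

# Partialled-out regression: M1 y on M1 X2.
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T  # annihilates S(X1)
y_t, X2_t = M1 @ y, M1 @ X2
beta2_fwl = np.linalg.solve(X2_t.T @ X2_t, X2_t.T @ y_t)
resid_fwl = y_t - X2_t @ beta2_fwl

assert np.allclose(beta2_fwl, beta_full[k1:])   # same estimates of beta_2
assert np.allclose(resid_fwl, resid_full)       # same residuals
```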