The Gauss-Markov Linear Model

$$y = X\beta + \epsilon$$

- $y$ is an $n \times 1$ random vector of responses.
- $X$ is an $n \times p$ matrix of constants with columns corresponding to explanatory variables. $X$ is sometimes referred to as the design matrix.
- $\beta$ is an unknown parameter vector in $\mathbb{R}^p$.
- $\epsilon$ is an $n \times 1$ random vector of errors.
- $E(\epsilon) = 0$ and $\operatorname{Var}(\epsilon) = \sigma^2 I$, where $\sigma^2$ is an unknown parameter in $\mathbb{R}^+$.

The Column Space of the Design Matrix

$X\beta$ is a linear combination of the columns of $X$:

$$X\beta = [x_1, \ldots, x_p] \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} = \beta_1 x_1 + \cdots + \beta_p x_p.$$

The set of all possible linear combinations of the columns of $X$ is called the column space of $X$ and is denoted by $C(X) = \{Xa : a \in \mathbb{R}^p\}$.

The Gauss-Markov linear model says $y$ is a random vector whose mean is in the column space of $X$ and whose variance is $\sigma^2 I$ for some positive real number $\sigma^2$, i.e., $E(y) \in C(X)$ and $\operatorname{Var}(y) = \sigma^2 I$, $\sigma^2 \in \mathbb{R}^+$.

An Example Column Space

For $X = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$,

$$C(X) = \{Xa : a \in \mathbb{R}\} = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix} a : a \in \mathbb{R} \right\} = \left\{ \begin{bmatrix} a \\ a \end{bmatrix} : a \in \mathbb{R} \right\}.$$

Another Example Column Space

For $X = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}$,

$$C(X) = \left\{ X \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} : a_1, a_2 \in \mathbb{R} \right\} = \left\{ a_1 \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + a_2 \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} : a_1, a_2 \in \mathbb{R} \right\} = \left\{ \begin{bmatrix} a_1 \\ a_1 \\ a_2 \\ a_2 \end{bmatrix} : a_1, a_2 \in \mathbb{R} \right\}.$$
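To make the column-space idea concrete, here is a minimal numerical sketch (my illustration, not from the original slides) that checks whether a given vector lies in $C(X)$ for the $4 \times 2$ design above, by testing whether its least-squares residual is zero. The helper name `in_col_space` is hypothetical.

```python
import numpy as np

# Design matrix from the example above: columns (1,1,0,0)' and (0,0,1,1)'
X = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]], dtype=float)

def in_col_space(X, v, tol=1e-10):
    """v lies in C(X) iff the residual from projecting v onto C(X) is zero."""
    a, *_ = np.linalg.lstsq(X, v, rcond=None)
    return np.linalg.norm(X @ a - v) < tol

print(in_col_space(X, np.array([3., 3., 7., 7.])))  # True: has the form (a1,a1,a2,a2)'
print(in_col_space(X, np.array([1., 2., 3., 4.])))  # False: first two entries differ
```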
Another Column Space Example

$$X_1 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}, \qquad X_2 = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$

$x \in C(X_1) \implies x = X_1 a$ for some $a \in \mathbb{R}^2 \implies x = X_2 \begin{bmatrix} 0 \\ a \end{bmatrix}$ for some $a \in \mathbb{R}^2 \implies x = X_2 b$ for some $b \in \mathbb{R}^3 \implies x \in C(X_2)$.

Thus, $C(X_1) \subseteq C(X_2)$.

Another Column Space Example (continued)

$x \in C(X_2) \implies x = X_2 a$ for some $a \in \mathbb{R}^3$

$$\implies x = a_1 \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} + a_2 \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + a_3 \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} a_1 + a_2 \\ a_1 + a_2 \\ a_1 + a_3 \\ a_1 + a_3 \end{bmatrix} \text{ for some } a_1, a_2, a_3 \in \mathbb{R}$$

$$\implies x = X_1 \begin{bmatrix} a_1 + a_2 \\ a_1 + a_3 \end{bmatrix} \text{ for some } a_1, a_2, a_3 \in \mathbb{R} \implies x = X_1 b \text{ for some } b \in \mathbb{R}^2 \implies x \in C(X_1).$$

Thus, $C(X_2) \subseteq C(X_1)$. We previously showed that $C(X_1) \subseteq C(X_2)$. Thus, it follows that $C(X_1) = C(X_2)$.

Estimation of E(y)

A fundamental goal of linear model analysis is to estimate $E(y)$. We could, of course, use $y$ to estimate $E(y)$. $y$ is obviously an unbiased estimator of $E(y)$, but it is often not a very sensible estimator. For example, suppose

$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mu \\ \mu \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \end{bmatrix}, \text{ and we observe } y = \begin{bmatrix} 2 \\ 6 \end{bmatrix}.$$

Should we estimate $E(y) = \begin{bmatrix} \mu \\ \mu \end{bmatrix}$ by $y = \begin{bmatrix} 2 \\ 6 \end{bmatrix}$?
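As a quick numerical cross-check of the equality $C(X_1) = C(X_2)$ (a sketch of mine, not part of the original slides): two column spaces are equal iff each matrix's rank equals the rank of the two matrices placed side by side.

```python
import numpy as np

X1 = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
X2 = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]], dtype=float)

rank = np.linalg.matrix_rank
both = np.hstack([X1, X2])

# C(X1) = C(X2) iff appending the other matrix's columns adds no new directions
print(rank(X1), rank(X2), rank(both))      # 2 2 2
print(rank(X1) == rank(both) == rank(X2))  # True
```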
Estimation of E(y)

The Gauss-Markov linear model says that $E(y) \in C(X)$, so we should use that information when estimating $E(y)$.

Consider estimating $E(y)$ by the point in $C(X)$ that is closest to $y$ (as measured by the usual Euclidean distance). This unique point is called the orthogonal projection of $y$ onto $C(X)$ and is denoted by $\hat{y}$ (although it could be argued that $\widehat{E(y)}$ might be better notation). By definition,

$$\|y - \hat{y}\| = \min_{z \in C(X)} \|y - z\|, \text{ where } \|a\| = \sqrt{\sum_{i=1}^n a_i^2}.$$

Orthogonal Projection Matrices

It can be shown that $\forall\, y \in \mathbb{R}^n$, $\hat{y} = P_X y$, where $P_X$ is a unique $n \times n$ matrix known as an orthogonal projection matrix.

- $P_X$ is idempotent: $P_X P_X = P_X$.
- $P_X$ is symmetric: $P_X' = P_X$.
- $P_X X = X$ and $X' P_X = X'$.
- $P_X = X(X'X)^- X'$, where $(X'X)^-$ is any generalized inverse of $X'X$.

Why Does P_X X = X?

$$P_X X = P_X [x_1, \ldots, x_p] = [P_X x_1, \ldots, P_X x_p] = [x_1, \ldots, x_p] = X,$$

because each column $x_j$ already lies in $C(X)$, so its orthogonal projection onto $C(X)$ is itself.

Generalized Inverses

$G$ is a generalized inverse of a matrix $A$ if $AGA = A$. We usually denote a generalized inverse of $A$ by $A^-$.

If $A$ is nonsingular, i.e., if $A^{-1}$ exists, then $A^{-1}$ is the one and only generalized inverse of $A$: $A A^{-1} A = AI = IA = A$.

If $A$ is singular, i.e., if $A^{-1}$ does not exist, then there are infinitely many generalized inverses of $A$.
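Here is a small numerical sketch (my illustration, not from the slides) that forms $P_X = X(X'X)^- X'$ for the $4 \times 2$ design used earlier, taking the Moore-Penrose pseudoinverse as one convenient choice of generalized inverse, and verifies the listed properties.

```python
import numpy as np

X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

# P_X = X (X'X)^- X'; np.linalg.pinv supplies one valid generalized inverse
P = X @ np.linalg.pinv(X.T @ X) @ X.T

print(np.allclose(P @ P, P))   # idempotent: P_X P_X = P_X
print(np.allclose(P.T, P))     # symmetric:  P_X' = P_X
print(np.allclose(P @ X, X))   # P_X X = X

# For this design, projecting any y onto C(X) averages within the two groups
y = np.array([2., 6., 1., 3.])
print(P @ y)                   # [4. 4. 2. 2.]
```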
Invariance of P_X = X(X'X)⁻X' to Choice of (X'X)⁻

If $X'X$ is nonsingular, then $P_X = X(X'X)^- X' = X(X'X)^{-1} X'$ because the only generalized inverse of $X'X$ is $(X'X)^{-1}$.

If $X'X$ is singular, then $P_X = X(X'X)^- X'$ and the choice of the generalized inverse $(X'X)^-$ does not matter because $P_X = X(X'X)^- X'$ will turn out to be the same matrix no matter which generalized inverse of $X'X$ is used. Suppose $(X'X)^-_1$ and $(X'X)^-_2$ are any two generalized inverses of $X'X$. Then

$$X(X'X)^-_1 X' = X(X'X)^-_2 X'.$$

An Example Orthogonal Projection Matrix

Suppose $y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mu \\ \mu \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \end{bmatrix}$, and we observe $y = \begin{bmatrix} 2 \\ 6 \end{bmatrix}$. Here $X = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, so

$$P_X = X(X'X)^{-1} X' = \begin{bmatrix} 1 \\ 1 \end{bmatrix} (2)^{-1} \begin{bmatrix} 1 & 1 \end{bmatrix} = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}.$$

An Example Orthogonal Projection

Thus, the orthogonal projection of $y$ onto the column space of $X = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is

$$\hat{y} = P_X y = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} 2 \\ 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix}.$$
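The invariance claim is easy to probe numerically. The sketch below (my illustration, using the fact that adding $vv'$ for a null vector $v$ of $X'X$ yields a second, different generalized inverse) builds two generalized inverses of a singular $X'X$ and confirms they produce the same $P_X$.

```python
import numpy as np

# Rank-deficient design: the first column is the sum of the other two
X = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]], dtype=float)
A = X.T @ X                          # singular 3x3 matrix

G1 = np.linalg.pinv(A)               # one generalized inverse (Moore-Penrose)
v = np.array([[1.], [-1.], [-1.]])   # null vector: A @ v = 0
G2 = G1 + v @ v.T                    # a second generalized inverse: A(G1 + vv')A = A

print(np.allclose(A @ G1 @ A, A), np.allclose(A @ G2 @ A, A))  # both satisfy AGA = A
print(np.allclose(G1, G2))                                     # False: the inverses differ
print(np.allclose(X @ G1 @ X.T, X @ G2 @ X.T))                 # True: same P_X
```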
[Four figure slides: for $X = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $y = \begin{bmatrix} 2 \\ 6 \end{bmatrix}$, plots of $y$, the line $C(X)$, and $\hat{y} = P_X y$, illustrating the orthogonal projection of $y$ onto $C(X)$.]
Optimality of ŷ as an Estimator of E(y)

The angle between $\hat{y}$ and $y - \hat{y}$ is $90°$; that is, the vectors $\hat{y}$ and $y - \hat{y}$ are orthogonal:

$$\hat{y}'(y - \hat{y}) = \hat{y}'(y - P_X y) = \hat{y}'(I - P_X)y = (P_X y)'(I - P_X)y = y' P_X'(I - P_X)y = y' P_X (I - P_X)y = y'(P_X - P_X P_X)y = y'(P_X - P_X)y = 0.$$

$\hat{y}$ is an unbiased estimator of $E(y)$:

$$E(\hat{y}) = E(P_X y) = P_X E(y) = P_X X\beta = X\beta = E(y).$$

It can be shown that $\hat{y} = P_X y$ is the best estimator of $E(y)$ in the class of linear unbiased estimators, i.e., estimators of the form $My$ for $M$ satisfying

$$E(My) = E(y) \;\forall\, \beta \in \mathbb{R}^p \iff MX\beta = X\beta \;\forall\, \beta \in \mathbb{R}^p \iff MX = X.$$

Under the Normal Theory Gauss-Markov Linear Model, $\hat{y} = P_X y$ is best among all unbiased estimators of $E(y)$.

Ordinary Least Squares (OLS) Estimation of E(y)

OLS: find a vector $\hat{b} \in \mathbb{R}^p$ such that $Q(\hat{b}) \le Q(b) \;\forall\, b \in \mathbb{R}^p$, where

$$Q(b) = \sum_{i=1}^n (y_i - x_{(i)}' b)^2.$$

Note that

$$Q(b) = \sum_{i=1}^n (y_i - x_{(i)}' b)^2 = (y - Xb)'(y - Xb) = \|y - Xb\|^2.$$

To minimize this sum of squares, we need to choose $b \in \mathbb{R}^p$ such that $Xb$ will be the point in $C(X)$ that is closest to $y$. In other words, we need to choose $b$ such that $Xb = P_X y = X(X'X)^- X' y$. Clearly, choosing $b = (X'X)^- X' y$ will work.

Ordinary Least Squares and the Normal Equations

Often calculus is used to show that $Q(\hat{b}) \le Q(b) \;\forall\, b \in \mathbb{R}^p$ if and only if $\hat{b}$ is a solution to the normal equations:

$$X'Xb = X'y.$$

If $X'X$ is nonsingular, multiplying both sides of the normal equations by $(X'X)^{-1}$ shows that the only solution to the normal equations is $b = (X'X)^{-1} X' y$.

If $X'X$ is singular, there are infinitely many solutions, which include $(X'X)^- X' y$ for all choices of generalized inverse of $X'X$:

$$X'X \left[ (X'X)^- X' y \right] = X' \left[ X(X'X)^- X' \right] y = X' P_X y = X' y.$$

Henceforth, we will use $\hat{\beta}$ to denote any solution to the normal equations.
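A small numerical sketch (mine, not from the slides) tying the pieces together: solve the normal equations for the one-mean example and confirm that the fitted vector equals $P_X y$.

```python
import numpy as np

X = np.array([[1.], [1.]])
y = np.array([2., 6.])

# Solve the normal equations X'X b = X'y (here X'X = [[2]] is nonsingular)
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)             # [4.] -- the sample mean

# The fitted vector X b equals the orthogonal projection P_X y
P = X @ np.linalg.inv(X.T @ X) @ X.T
print(X @ b, P @ y)  # [4. 4.] [4. 4.]
```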
Ordinary Least Squares Estimator of E(y) = Xβ

We call

$$X\hat{\beta} = P_X X\hat{\beta} = X(X'X)^- X' X \hat{\beta} = X(X'X)^- X' y = P_X y = \hat{y}$$

the OLS estimator of $E(y) = X\beta$ (the fourth equality uses the normal equations $X'X\hat{\beta} = X'y$).

It might be more appropriate to use $\widehat{X\beta}$ rather than $X\hat{\beta}$ to denote our estimator because we are estimating $X\beta$ rather than pre-multiplying an estimator of $\beta$ by $X$.

As we shall soon see, it does not make sense to estimate $\beta$ when $X'X$ is singular. However, it does make sense to estimate $E(y) = X\beta$ whether $X'X$ is singular or nonsingular.
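To illustrate the last point numerically (my sketch, not from the slides): with a singular $X'X$, distinct solutions of the normal equations give different $\hat{\beta}$ but identical fitted values $X\hat{\beta}$.

```python
import numpy as np

# Rank-deficient design: X'X is singular, so beta itself is not estimable
X = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]], dtype=float)
y = np.array([2., 6., 1., 3.])

A, b = X.T @ X, X.T @ y
beta1 = np.linalg.pinv(A) @ b   # one solution of the normal equations
v = np.array([1., -1., -1.])    # null vector of X (and hence of A = X'X)
beta2 = beta1 + v               # another solution: A @ beta2 = A @ beta1 = b

print(np.allclose(A @ beta1, b), np.allclose(A @ beta2, b))  # both solve X'X b = X'y
print(beta1, beta2)             # different beta-hats...
print(X @ beta1, X @ beta2)     # ...but the same fitted vector X beta-hat
```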