October 19, 2018

Algebra of Least Squares

Geometry of Least Squares

Recall that our data form a table $[Y \; X]$, where $Y$ collects $n$ observations on the dependent variable $Y$ and $X$ collects $n$ observations on the $k$-dimensional independent variable $X$:

$$X = \begin{bmatrix} X_1 & X_2 & \cdots & X_k \end{bmatrix} = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,k} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,k} \\ \vdots & \vdots & & \vdots \\ X_{n,1} & X_{n,2} & \cdots & X_{n,k} \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}.$$

We can think of $Y$ and the columns of $X$ as members of the $n$-dimensional Euclidean space $\mathbb{R}^n$. One can define a subspace of $\mathbb{R}^n$ called the column space of an $n \times k$ matrix $X$: the collection of all vectors in $\mathbb{R}^n$ that can be written as linear combinations of the columns of $X$:

$$S(X) = \left\{ z \in \mathbb{R}^n : z = Xb, \; b = (b_1, b_2, \ldots, b_k)' \in \mathbb{R}^k \right\}.$$

For two vectors $a, b$ in $\mathbb{R}^n$, the distance between $a$ and $b$ is given by the Euclidean norm$^1$ of their difference $a - b$: $\|a - b\| = \sqrt{(a - b)'(a - b)}$. Thus, the least squares problem, minimization of the sum-of-squared errors $(Y - Xb)'(Y - Xb)$, is to find, out of all elements of $S(X)$, the one closest to $Y$:

$$\min_{y \in S(X)} \|Y - y\|^2.$$

The closest point is found by "dropping a perpendicular". That is, a solution to the least squares problem, $\hat{Y} = X\hat{\beta}$, must be chosen so that the residual vector $\hat{e} = Y - \hat{Y}$ is orthogonal (perpendicular) to each column of $X$:

$$\hat{e}'X = 0.$$

As a result, $\hat{e}$ is orthogonal to every element of $S(X)$. Indeed, if $z \in S(X)$, then there exists $b \in \mathbb{R}^k$ such that $z = Xb$, and

$$\hat{e}'z = \hat{e}'Xb = 0.$$

The collection of the elements of $\mathbb{R}^n$ orthogonal to $S(X)$ is called the orthogonal complement of $S(X)$:

$$S^{\perp}(X) = \left\{ z \in \mathbb{R}^n : z'X = 0 \right\}.$$

Every element of $S^{\perp}(X)$ is orthogonal to every element in $S(X)$.

$^1$ For a vector $x = (x_1, x_2, \ldots, x_n)'$, its Euclidean norm is defined as $\|x\| = \sqrt{x'x} = \sqrt{\sum_{i=1}^{n} x_i^2}$.
The solution to the least squares problem is given by

$$\hat{Y} = X\hat{\beta} = X(X'X)^{-1}X'Y = P_X Y,$$

where $P_X = X(X'X)^{-1}X'$ is called the orthogonal projection matrix. For any vector $z \in \mathbb{R}^n$, $P_X z \in S(X)$. Furthermore, the residual vector will be in $S^{\perp}(X)$:

$$z - P_X z \in S^{\perp}(X). \tag{1}$$

To show (1), first note that, since the columns of $X$ are in $S(X)$,

$$P_X X = X(X'X)^{-1}X'X = X,$$

and, since $P_X$ is a symmetric matrix, $X'P_X = X'$. Now,

$$X'(z - P_X z) = X'z - X'P_X z = X'z - X'z = 0.$$

Thus, by the definition, the residual $z - P_X z$ belongs to $S^{\perp}(X)$. The residuals can be written as

$$\hat{e} = Y - P_X Y = (I_n - P_X)Y = M_X Y,$$

where

$$M_X = I_n - P_X = I_n - X(X'X)^{-1}X'$$

is a projection matrix onto $S^{\perp}(X)$.
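As a minimal numerical sketch (not part of the original notes), one can form $P_X$ and $M_X$ explicitly for simulated data and check that the residuals are orthogonal to the columns of $X$. All names and values below are illustrative assumptions:

```python
import numpy as np

# Sketch: least squares as an orthogonal projection, on simulated data.
rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # P_X = X (X'X)^{-1} X'
M = np.eye(n) - P                      # M_X = I - P_X

Y_hat = P @ Y   # fitted values: the element of S(X) closest to Y
e_hat = M @ Y   # residuals: the component of Y in S_perp(X)

# The residual vector is orthogonal to every column of X (up to rounding)
print(np.allclose(X.T @ e_hat, np.zeros(k)))   # True
```

In practice one would solve the least squares problem with `np.linalg.lstsq` rather than forming an $n \times n$ projection matrix; the explicit matrices here are only for illustrating the geometry.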
The projection matrices $P_X$ and $M_X$ have the following properties (a numerical check appears at the end of this subsection):

- $P_X + M_X = I_n$. This implies that, for any $z \in \mathbb{R}^n$, $z = P_X z + M_X z$.
- Symmetric: $P_X' = P_X$, $M_X' = M_X$.
- Idempotent: $P_X P_X = P_X$ and $M_X M_X = M_X$:
$$P_X P_X = X(X'X)^{-1}X'X(X'X)^{-1}X' = X(X'X)^{-1}X' = P_X,$$
$$M_X M_X = (I_n - P_X)(I_n - P_X) = I_n - 2P_X + P_X P_X = I_n - P_X = M_X.$$
- Orthogonal: $P_X M_X = P_X(I_n - P_X) = P_X - P_X P_X = P_X - P_X = 0$. This property implies that $M_X X = 0$:
$$M_X X = (I_n - P_X)X = X - P_X X = X - X = 0.$$

Note that, in the above discussion, none of the statistical assumptions, such as $\mathrm{E}(e_i X_i) = 0$, have been used. Given data $Y$ and $X$, one can always perform least squares, regardless of what data generating process stands behind the data. However, one needs a model to discuss the statistical properties of an estimator, such as unbiasedness, etc.
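Continuing the sketch above, the listed properties can be verified numerically (again, an illustration added to these notes, not a derivation):

```python
# Numerical checks of the projection-matrix properties, continuing
# the earlier sketch (P, M, X, n, k as defined there).
print(np.allclose(P + M, np.eye(n)))                    # P_X + M_X = I
print(np.allclose(P, P.T) and np.allclose(M, M.T))      # symmetry
print(np.allclose(P @ P, P) and np.allclose(M @ M, M))  # idempotency
print(np.allclose(P @ M, np.zeros((n, n))))             # P_X M_X = 0
print(np.allclose(M @ X, np.zeros((n, k))))             # M_X X = 0
```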
Partitioned regression

We can partition the matrix of regressors $X$ as follows: $X = [X_1 \; X_2]$, and write the model as

$$Y = X_1\beta_1 + X_2\beta_2 + e,$$

where $X_1$ is an $n \times k_1$ matrix, $X_2$ is $n \times k_2$, $k_1 + k_2 = k$, and

$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},$$

where $\beta_1$ and $\beta_2$ are $k_1$- and $k_2$-vectors respectively. Such a decomposition allows one to focus on a group of variables and their corresponding parameters, say $X_1$ and $\beta_1$.

One can write the following version of the normal equations (the first-order conditions of least squares), $X'X\hat{\beta} = X'Y$, as

$$\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix} \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \begin{pmatrix} X_1'Y \\ X_2'Y \end{pmatrix}.$$

One can obtain the expressions for $\hat{\beta}_1$ and $\hat{\beta}_2$ by inverting the partitioned matrix on the left-hand side of the equation above. Alternatively, let's define $M_2$ to be the projection matrix onto the space orthogonal to $S(X_2)$:

$$M_2 = I_n - X_2(X_2'X_2)^{-1}X_2'.$$

Then,

$$\hat{\beta}_1 = (X_1'M_2X_1)^{-1}X_1'M_2Y. \tag{2}$$

In order to show that, first write

$$Y = X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + \hat{e}. \tag{3}$$

Note that, by construction, $M_2\hat{e} = \hat{e}$ (since $\hat{e}$ is orthogonal to $X_2$), $M_2X_2 = 0$, $X_1'\hat{e} = 0$, and $X_2'\hat{e} = 0$.
Substitute equation (3) into the right-hand side of equation (2):

$$\begin{aligned}
(X_1'M_2X_1)^{-1}X_1'M_2Y &= (X_1'M_2X_1)^{-1}X_1'M_2\left(X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + \hat{e}\right) \\
&= (X_1'M_2X_1)^{-1}X_1'M_2X_1\hat{\beta}_1 + (X_1'M_2X_1)^{-1}X_1'\hat{e} \quad (\text{since } M_2X_2 = 0 \text{ and } M_2\hat{e} = \hat{e}) \\
&= \hat{\beta}_1.
\end{aligned}$$

Since $M_2$ is symmetric and idempotent, one can write

$$\hat{\beta}_1 = \left((M_2X_1)'(M_2X_1)\right)^{-1}(M_2X_1)'(M_2Y) = (\tilde{X}_1'\tilde{X}_1)^{-1}\tilde{X}_1'\tilde{Y},$$

where

$$\tilde{X}_1 = M_2X_1 = X_1 - X_2(X_2'X_2)^{-1}X_2'X_1 = \text{residuals from the regression of the columns of } X_1 \text{ on } X_2,$$

$$\tilde{Y} = M_2Y = Y - X_2(X_2'X_2)^{-1}X_2'Y = \text{residuals from the regression of } Y \text{ on } X_2.$$

Thus, to obtain the coefficients for the first $k_1$ regressors, instead of running the full regression with $k_1 + k_2$ regressors, one can regress $Y$ on $X_2$ to obtain the residuals $\tilde{Y}$, regress $X_1$ on $X_2$ to obtain the residuals $\tilde{X}_1$, and then regress $\tilde{Y}$ on $\tilde{X}_1$ to obtain $\hat{\beta}_1$. In other words, $\hat{\beta}_1$ shows the effect of $X_1$ after controlling for $X_2$.

Similarly to $\hat{\beta}_1$, one can write

$$\hat{\beta}_2 = (X_2'M_1X_2)^{-1}X_2'M_1Y, \quad \text{where} \quad M_1 = I_n - X_1(X_1'X_1)^{-1}X_1'.$$
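This partialling-out result is commonly known as the Frisch–Waugh–Lovell theorem. Below is a minimal numerical sketch of it, using simulated data; all names and values are illustrative assumptions:

```python
import numpy as np

# Sketch of the partialling-out result: the first k1 coefficients of the
# full regression equal the coefficients from regressing residualized Y
# on residualized X1. Simulated data.
rng = np.random.default_rng(1)
n, k1, k2 = 200, 2, 3
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
Y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([0.5, 2.0, -0.3]) + rng.normal(size=n)

# Route 1: full regression of Y on [X1 X2]; keep the first k1 coefficients
beta_full = np.linalg.lstsq(np.hstack([X1, X2]), Y, rcond=None)[0][:k1]

# Route 2: residualize Y and X1 on X2, then regress the residuals
M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
beta_fwl = np.linalg.lstsq(M2 @ X1, M2 @ Y, rcond=None)[0]

print(np.allclose(beta_full, beta_fwl))   # True: the two routes agree
```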
For example, consider a simple regression

$$Y_i = \beta_1 + \beta_2 X_i + e_i, \quad i = 1, \ldots, n.$$

Let's define an $n$-vector of ones: $\mathbf{1} = (1, 1, \ldots, 1)'$. In this case, the matrix of regressors is given by

$$\begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix} = \begin{bmatrix} \mathbf{1} & X \end{bmatrix}.$$

Consider

$$M_1 = I_n - \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}' \quad \text{and} \quad \hat{\beta}_2 = (X'M_1X)^{-1}X'M_1Y.$$

Now, $\mathbf{1}'\mathbf{1} = n$. Therefore,

$$M_1 = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}',$$

and

$$M_1X = X - \mathbf{1}\,\frac{1}{n}\mathbf{1}'X = X - \mathbf{1}\bar{X} = \begin{pmatrix} X_1 - \bar{X} \\ X_2 - \bar{X} \\ \vdots \\ X_n - \bar{X} \end{pmatrix},$$

where $\bar{X}$ is the sample average: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. Thus, the matrix $M_1$ transforms the vector $X$ into the vector of deviations from the average. We can write

$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})Y_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}.$$
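A short numerical sketch of this example, on simulated data (illustrative only): $M_1$ demeans a vector, and the slope is a ratio of demeaned cross products.

```python
import numpy as np

# Sketch: M_1 = I - (1/n) 1 1' demeans, and the simple-regression slope
# is x'M_1 y / x'M_1 x. Simulated data.
rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = 3.0 + 2.0 * x + rng.normal(size=n)

ones = np.ones(n)
M1 = np.eye(n) - np.outer(ones, ones) / n    # I - (1/n) 1 1'

print(np.allclose(M1 @ x, x - x.mean()))     # M_1 x = deviations from mean

slope = (x @ M1 @ y) / (x @ M1 @ x)          # x'M_1 y / x'M_1 x
slope_check = ((x - x.mean()) @ (y - y.mean())) / ((x - x.mean()) @ (x - x.mean()))
print(np.isclose(slope, slope_check))        # True
```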
Goodness of fit

Write

$$Y = P_XY + M_XY = \hat{Y} + \hat{e},$$

where, by construction,

$$\hat{Y} = P_XY, \quad \hat{e} = M_XY, \quad \hat{Y}'\hat{e} = Y'P_XM_XY = 0.$$

Suppose that the model contains an intercept, i.e. the first column of $X$ is the vector of ones $\mathbf{1}$. The total variation in $Y$ is

$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = Y'M_1Y = (\hat{Y} + \hat{e})'M_1(\hat{Y} + \hat{e}) = \hat{Y}'M_1\hat{Y} + \hat{e}'M_1\hat{e} + 2\hat{Y}'M_1\hat{e}.$$

Since the model contains an intercept, $\mathbf{1}'\hat{e} = 0$, and $M_1\hat{e} = \hat{e}$. Moreover, $\hat{Y}'\hat{e} = 0$, and, therefore,

$$Y'M_1Y = \hat{Y}'M_1\hat{Y} + \hat{e}'\hat{e},$$

or

$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}\left(\hat{Y}_i - \bar{\hat{Y}}\right)^2 + \sum_{i=1}^{n}\hat{e}_i^2.$$

Note that

$$\bar{Y} = \frac{1}{n}\mathbf{1}'Y = \frac{1}{n}\mathbf{1}'\left(\hat{Y} + \hat{e}\right) = \frac{1}{n}\mathbf{1}'\hat{Y} = \bar{\hat{Y}}.$$

Hence, the averages of $Y$ and its predicted values $\hat{Y}$ are equal, and we can write

$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}\left(\hat{Y}_i - \bar{Y}\right)^2 + \sum_{i=1}^{n}\hat{e}_i^2, \tag{4}$$

or

$$TSS = ESS + RSS,$$

where

$$TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \text{total sum-of-squares},$$
$$ESS = \sum_{i=1}^{n}\left(\hat{Y}_i - \bar{Y}\right)^2 = \text{explained sum-of-squares},$$

$$RSS = \sum_{i=1}^{n}\hat{e}_i^2 = \text{residual sum-of-squares}.$$

The ratio of the ESS to the TSS is called the coefficient of determination, or $R^2$:

$$R^2 = \frac{\sum_{i=1}^{n}\left(\hat{Y}_i - \bar{Y}\right)^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n}\hat{e}_i^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = 1 - \frac{\hat{e}'\hat{e}}{Y'M_1Y}.$$

Properties of $R^2$:

- Bounded between 0 and 1, as implied by decomposition (4). This property does not hold if the model does not have an intercept, and one should not use the above definition of $R^2$ in that case.
- If $R^2 = 1$, then $\hat{e}'\hat{e} = 0$, which can happen only if $Y \in S(X)$, i.e. $Y$ is exactly a linear combination of the columns of $X$.
- $R^2$ increases by adding more regressors. Suppose we have $n$ observations on regressors $Z_1, \ldots, Z_k$ and $W_1, \ldots, W_m$ and the dependent variable $Y$. Consider two regressions: the long regression with all the regressors and the short regression with only $Z_1, \ldots, Z_k$. It can be shown that the $R^2$ of the long regression must be greater than or equal to the $R^2$ of the short regression.
- $R^2$ shows how much of the sample variation in $Y$ is explained by $X$. However, our objective is to estimate population relationships, not to explain the sample variation. A high $R^2$ is not necessarily an indicator of a good regression model, and a low $R^2$ is not evidence against it.

Since $R^2$ increases with the inclusion of additional regressors, researchers often report instead the adjusted coefficient of determination $\bar{R}^2$:

$$\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k} = 1 - \frac{\hat{e}'\hat{e}/(n - k)}{Y'M_1Y/(n - 1)}.$$

The adjusted coefficient of determination discounts the fit when the number of regressors $k$ is large relative to the number of observations $n$: $\bar{R}^2$ may decrease with $k$. However, there is no strong argument for using such an adjustment.
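To close, here is a minimal sketch that computes $R^2$ and $\bar{R}^2$ on simulated data and illustrates that $R^2$ cannot fall when a regressor, even pure noise, is added. The helper `r2_stats` and all values are illustrative assumptions:

```python
import numpy as np

# Sketch: R^2 and adjusted R^2 for a regression with an intercept.
rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

def r2_stats(X, y):
    n, k = X.shape
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]   # residuals
    tss = ((y - y.mean()) ** 2).sum()                  # total sum-of-squares
    r2 = 1.0 - (e @ e) / tss
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k)      # matches the formula above
    return r2, r2_adj

X_short = np.column_stack([np.ones(n), x])
X_long = np.column_stack([X_short, rng.normal(size=n)])  # add an irrelevant regressor

print(r2_stats(X_short, y))
print(r2_stats(X_long, y))   # R^2 weakly higher; adjusted R^2 may fall
```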