Chapter 2
Linear Regression Models, OLS, Assumptions and Properties

2.1 The Linear Regression Model

The linear regression model is the single most useful tool in the econometrician's kit. The multiple regression model is the study of the relationship between a dependent variable and one or more independent variables. In general it can be written as:

y = f(x_1, x_2, ..., x_K) + ε
  = x_1β_1 + x_2β_2 + ··· + x_Kβ_K + ε.   (2.1)

The random disturbance ε arises because we cannot capture every influence on an economic variable in a model. The net effect of the omitted variables is captured by the disturbance. Notice that y is the sum of two parts: the deterministic part and the random part, ε. The objective is to estimate the unknown parameters (β_1, β_2, ..., β_K) of this model.

2.2 Assumptions

The classical linear regression model consists of a set of assumptions about how a data set will be produced by the underlying data-generating process. The assumptions are:

A1. Linearity
A2. Full rank
A3. Exogeneity of the independent variables
A4. Homoscedasticity and nonautocorrelation
A5. Data generation
A6. Normal distribution
2.2.1 Linearity

The model specifies a linear relationship between y and x_1, x_2, ..., x_K. The column vector x_k contains the n observations of the variable x_k, k = 1, 2, ..., K, and these columns can be collected in a single n × K data matrix X. When the model is estimated with a constant term, the first column of X is assumed to be a column of ones, making β_1 the coefficient associated with the constant of the model. The vector y contains the n observations y_1, y_2, ..., y_n, and the vector ε contains all n disturbances. The model can be represented as:

\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_{21} & \cdots & x_{K1} \\
1 & x_{22} & \cdots & x_{K2} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{2n} & \cdots & x_{Kn}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}

where y and ε are n × 1 and β is K × 1. In matrix form the model can be written as:

ASSUMPTION 1: y = Xβ + ε.   (2.2)

Notice that the assumption means that Equation 2.2 can be in terms of either the original variables or some transformation of them. For example, consider the following two equations:

y = Ax^β e^ε
y = Ax^β + ε.

While the first equation can be made linear by taking logs, the second equation is not linear. Typical examples include the constant elasticity model:

ln y = β_1 + β_2 ln x_2 + β_3 ln x_3 + ··· + β_K ln x_K + ε

where the elasticity between y and x_k is given by ∂ln y/∂ln x_k = β_k. Another common model is the semilog.

2.2.2 Full rank

The full rank or identification condition means that there are no exact linear relationships among the variables.

ASSUMPTION 2: X is an n × K matrix with rank K.   (2.3)

That X has full column rank means that the columns of X are linearly independent and that there are at least K observations.
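The full-rank condition is easy to check numerically. A minimal sketch (the helper name and data are my own illustration, not from the text) uses NumPy's `matrix_rank` to flag a design matrix whose third column is an exact linear combination of the other two:

```python
import numpy as np

def has_full_column_rank(X):
    """Assumption 2: rank(X) must equal the number of columns K."""
    n, K = X.shape
    return bool(n >= K and np.linalg.matrix_rank(X) == K)

rng = np.random.default_rng(0)
ones = np.ones(50)
x2 = rng.normal(size=50)

X_good = np.column_stack([ones, x2, rng.normal(size=50)])
X_bad = np.column_stack([ones, x2, 2 * ones + 3 * x2])  # exact linear dependence

print(has_full_column_rank(X_good))  # True
print(has_full_column_rank(X_bad))   # False
```

When this condition fails, (X'X)⁻¹ does not exist and the least squares coefficients derived below are not uniquely identified.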
2.2.3 Exogeneity of the independent variables

Exogeneity of the independent variables means that each of the disturbance terms is assumed to have zero expected value. This can be written as:

ASSUMPTION 3: E[ε|X] = 0.   (2.4)

This assumption means that the disturbances are purely random draws from some population and that no observation on x conveys information about the expected value of the disturbance (ε and X are uncorrelated). Given Assumption 3, Xβ is the conditional mean function because Assumption 3 implies that:

E[y|X] = Xβ.   (2.5)

2.2.4 Homoscedasticity and nonautocorrelation

The combination of homoscedasticity and nonautocorrelation is also known as spherical disturbances, and it refers to the variances and covariances of the disturbances. Homoscedasticity means constant variance and can be written as:

Var[ε_i|X] = σ² for all i = 1, 2, ..., n,
Cov[ε_i, ε_j|X] = 0 for all i ≠ j.

Typical examples of regression models with heteroscedastic errors are household expenditures and firms' profits. Autocorrelation (or serial correlation), on the other hand, means correlation between the errors in different time periods. Hence, autocorrelation is usually a problem with time series or panel data. In matrix form:

E[εε'|X] =
\begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}

which can be written as:

ASSUMPTION 4: E[εε'|X] = σ²I.   (2.6)
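Assumption 4 can be illustrated by simulation. In this sketch (the sample size, σ², and number of replications are illustrative choices of mine), averaging the outer product εε' over many draws of a spherical disturbance vector approximates E[εε'|X] = σ²I: σ² on the diagonal and zeros off it.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, reps = 4, 2.0, 200_000

# Each row of E is one draw of the n x 1 disturbance vector: homoscedastic
# (constant variance sigma2) and nonautocorrelated (independent entries).
# Averaging the outer products eps*eps' approximates E[eps eps' | X].
E = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
S = E.T @ E / reps

print(np.round(S, 2))  # roughly 2.0 on the diagonal, 0.0 off the diagonal
```

Heteroscedasticity would show up as unequal diagonal entries, and autocorrelation as nonzero off-diagonal entries.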
2.2.5 Data generation

It is mathematically convenient to assume x_i is nonstochastic, as in an agricultural experiment where y_i is the yield and x_i is the fertilizer and water applied. However, social scientists are very likely to work with stochastic x_i. The assumption we will use is that X can be a mixture of constants and random variables, and that the mean and variance of ε_i are both independent of all elements of X.

ASSUMPTION 5: X may be fixed or random.   (2.7)

2.2.6 Normal distribution

The last assumption, which is convenient but not necessary to obtain many of the results of the linear regression model, is that the disturbances follow a normal distribution with zero mean and constant variance. That is, it adds normality to Assumptions 3 and 4:

ASSUMPTION 6: ε|X ∼ N[0, σ²I].   (2.8)

The assumptions of the linear regression model are summarized in Figure 2.1.

Fig. 2.1 Classical Regression Model, from [Greene (2008)].
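Taken together, Assumptions A1–A6 describe a complete data-generating process that can be simulated directly. The sketch below (all parameter values are hypothetical, chosen only for illustration) draws a data set with a fixed design matrix and normal spherical disturbances:

```python
import numpy as np

rng = np.random.default_rng(42)
n, sigma = 500, 1.5
beta = np.array([2.0, -1.0, 0.5])   # hypothetical population parameters

# Fixed design matrix with a column of ones for the constant (A1, A2, A5)
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 10, n)])

# Mean-zero, homoscedastic, nonautocorrelated normal disturbances (A3, A4, A6)
eps = rng.normal(0.0, sigma, size=n)

y = X @ beta + eps   # A1: the linear model y = X*beta + eps

print(round(float(eps.mean()), 2))       # near 0
print(round(float(eps.std(ddof=0)), 2))  # near 1.5
```

Data generated this way satisfies every assumption of the classical model, which makes it a convenient test bed for the estimators derived in the next section.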
2.3 Ordinary Least Squares Regression

The first distinction needed at this point is between population parameters and sample estimates. From the previous discussion we have β and ε_i as population parameters; hence we use b and e_i as their sample estimates. For the population regression we have E[y_i|x_i] = x_i'β; however, β is unknown and we use its estimate b. Therefore we have:

ŷ_i = x_i'b   (2.9)

as the sample estimate of E[y_i|x_i]. For observation i, the (population) disturbance term is given by:

ε_i = y_i − x_i'β.   (2.10)

Once we estimate b, the estimate of the disturbance term ε_i is its sample counterpart, the residual:¹

e_i = y_i − x_i'b.   (2.11)

It follows that

y_i = x_i'β + ε_i = x_i'b + e_i.   (2.12)

A graphical summary of this discussion is presented in Figure 2.2, which shows the simple example of a single regressor.

2.3.1 Least Squares Coefficients

The problem at hand is to obtain an estimate of the unknown population vector β based on the sample data (y_i, x_i) for i = 1, 2, ..., n. In this section we derive the least squares estimator vector for β, denoted by b. By definition, the least squares coefficient vector minimizes the sum of squared residuals:

Σ_{i=1}^n e_{i0}² = Σ_{i=1}^n (y_i − x_i'b_0)².   (2.13)

The idea is to pick the vector b_0 that makes the summation in Equation 2.13 the smallest. In matrix notation:

min_{b_0} S(b_0) = e_0'e_0 = (y − Xb_0)'(y − Xb_0)   (2.14)
               = y'y − b_0'X'y − y'Xb_0 + b_0'X'Xb_0
               = y'y − 2y'Xb_0 + b_0'X'Xb_0.

¹ [Dougherty (2007)] follows a similar notation, but most textbooks, e.g. [Wooldridge (2009)], use β̂ as the sample estimate of β.
Fig. 2.2 Population and Sample Regression, from [Greene (2008)].

The first order necessary condition is:

∂S(b_0)/∂b_0 = −2X'y + 2X'Xb_0 = 0.   (2.15)

Let b be the solution. Then, given that X has full rank, (X'X)⁻¹ exists and the solution is:

b = (X'X)⁻¹X'y.   (2.16)

The second order condition is:

∂²S(b_0)/∂b_0∂b_0' = 2X'X,   (2.17)

which must be a positive definite matrix for a minimum. This will be the case if X has full rank; then the least squares solution b is unique and minimizes the sum of squared residuals.

Example 1. Derivation of the least squares coefficient estimators for the simple case of a single regressor and a constant:

y_i = b_0 + b_1 x_i + e_i   (2.18)
ŷ_i = b_0 + b_1 x_i

For observation i we obtain the residual, then square it, and finally sum across all n observations to obtain the sum of squared residuals:

e_i = y_i − ŷ_i   (2.19)
e_i² = (y_i − ŷ_i)²
Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i − ŷ_i)²

Again, the coefficients b_0 and b_1 are chosen to minimize the sum of squared residuals:

min_{b_0, b_1} Σ_{i=1}^n (y_i − ŷ_i)²   (2.20)
min_{b_0, b_1} Σ_{i=1}^n (y_i − b_0 − b_1 x_i)²

The first order necessary conditions are:

−2 Σ_{i=1}^n (y_i − b_0 − b_1 x_i) = 0   w.r.t. b_0   (2.21)
−2 Σ_{i=1}^n x_i (y_i − b_0 − b_1 x_i) = 0   w.r.t. b_1   (2.22)

Dividing Equation 2.21 by n and working through some algebra we obtain the OLS estimator for the constant:

b_0 = ȳ − b_1 x̄.   (2.23)

Plugging this result into Equation 2.22 we obtain:

b_1 = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^n (x_i − x̄)².   (2.24)

2.3.2 Normal equations

From the first order conditions in Equation 2.15 we can obtain the normal equations:

X'Xb − X'y = −X'(y − Xb) = −X'e = 0.   (2.25)
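Equations 2.16 and 2.23–2.25 can be verified numerically. A minimal sketch on simulated data (the true model is a hypothetical choice of mine) solves the normal equations X'Xb = X'y and checks that the single-regressor formulas and the orthogonality condition X'e = 0 agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)   # hypothetical true model

# Solve the normal equations X'Xb = X'y rather than inverting X'X explicitly
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

# Single-regressor formulas, Equations 2.23 and 2.24
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

e = y - X @ b
print(np.allclose(b, [b0, b1]))            # True: both derivations coincide
print(np.allclose(X.T @ e, 0, atol=1e-8))  # True: normal equations X'e = 0
```

Using `np.linalg.solve` on the normal equations is the standard numerically preferable alternative to forming (X'X)⁻¹ directly.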
Therefore, following from X'e = 0 we can derive a number of properties:

1. The observed values of X are uncorrelated with the residuals: for every column x_k of X, x_k'e = 0.

In addition, if the regression includes a constant:

2. The sum of the residuals is zero: x_1'e = i'e = Σ_{i=1}^n e_i = 0.
3. The sample mean of the residuals is zero: ē = (1/n) Σ_{i=1}^n e_i = 0.
4. The regression hyperplane passes through the means of the data. This follows from ē = 0: recall that e = y − Xb; dividing by n we have ē = ȳ − x̄'b = 0, which implies ȳ = x̄'b.
5. The predicted values of y are uncorrelated with the residuals: ŷ'e = (Xb)'e = b'X'e = 0.
6. The mean of the fitted values equals the mean of the actual values: because y = ŷ + e and Σ_{i=1}^n e_i = 0, the mean of ŷ_i equals ȳ.

2.3.3 Projection matrix

The matrix M (the residual maker) is fundamental in regression analysis. It is given by:

M = I − X(X'X)⁻¹X'.   (2.26)

When it premultiplies any vector y, it generates the vector of least squares residuals in a regression of y on X. It can easily be derived from the least squares residuals:

e = y − Xb   (2.27)
  = y − X(X'X)⁻¹X'y
  = (I − X(X'X)⁻¹X')y
  = My.

M is a symmetric (M = M') and idempotent (M = M²) matrix. For example, it is useful to see that if we regress X on X we have a perfect fit and the residuals are zero:

MX = 0.   (2.28)

The projection matrix P is also a symmetric and idempotent matrix formed from X. When y is premultiplied by P, the result is the vector of fitted values ŷ in the regression of y on X. It is given by:

P = X(X'X)⁻¹X'.   (2.29)

It can be obtained by starting from the equation y = Xb + e. We know that ŷ = Xb; then y = ŷ + e, which gives:

ŷ = y − e   (2.30)
  = y − My
  = (I − M)y
  = X(X'X)⁻¹X'y
  = Py.

Notice that M = I − P. In addition, M and P are orthogonal, and in a regression of X on X the fitted values are also X, that is, PX = X. Following from the results on the M and P matrices, we can see that the least squares regression partitions the vector y into two orthogonal parts, the projection and the residual:

y = Py + My.   (2.31)

2.3.4 Goodness of fit and analysis of variance

The variation of the dependent variable is captured in terms of deviations from its mean, y_i − ȳ. Then the total variation in y is the sum of squared deviations:

SST = Σ_{i=1}^n (y_i − ȳ)².   (2.32)

To decompose this sum of squared deviations into the part the regression model explains and the part the model does not explain, we first look at a single observation to get some intuition. For observation i we have:

y_i = ŷ_i + e_i = x_i'b + e_i.   (2.33)

Subtracting ȳ we obtain:

y_i − ȳ = ŷ_i − ȳ + e_i = (x_i − x̄)'b + e_i.   (2.34)

Figure 2.3 illustrates the intuition for the case of a single regressor. Let the symmetric and idempotent matrix M⁰ have (1 − 1/n) in all its diagonal elements and −1/n in all its off-diagonal elements:

M⁰ = I − (1/n)ii' =
\begin{bmatrix}
1 - \frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\
-\frac{1}{n} & 1 - \frac{1}{n} & \cdots & -\frac{1}{n} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{1}{n} & -\frac{1}{n} & \cdots & 1 - \frac{1}{n}
\end{bmatrix}

M⁰ transforms observations into deviations from sample means. Then, M⁰ is useful in computing sums of squared deviations. For example, the sum of squared deviations about the mean for x_i is:

Σ_{i=1}^n (x_i − x̄)² = x'M⁰x.   (2.35)
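The algebraic properties of M, P, and M⁰ are easy to confirm numerically. A small sketch on simulated data (the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection matrix, Equation 2.29
M = np.eye(n) - P                      # residual maker, Equation 2.26
M0 = np.eye(n) - np.ones((n, n)) / n   # demeaning matrix M^0

assert np.allclose(M, M.T) and np.allclose(M, M @ M)  # symmetric, idempotent
assert np.allclose(M @ X, 0)               # MX = 0, Equation 2.28
assert np.allclose(P @ X, X)               # PX = X: regressing X on X fits perfectly
assert np.allclose(P @ M, 0)               # P and M are orthogonal
assert np.allclose(P @ y + M @ y, y)       # y = Py + My, Equation 2.31
assert np.allclose(M0 @ y, y - y.mean())   # M^0 produces deviations from means
print("all projection identities hold")
```

In practice one never forms these n × n matrices explicitly for large n; they are conceptual devices, and residuals are computed directly as e = y − Xb.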
Fig. 2.3 Decomposition of y_i, from [Greene (2008)].

Now, if we start with y = Xb + e and premultiply it by M⁰ we obtain:

M⁰y = M⁰Xb + M⁰e.   (2.36)

Then, we transpose this equation to obtain:

y'M⁰ = b'X'M⁰ + e'M⁰.   (2.37)

Multiplying Equation 2.36 on the left by Equation 2.37:

y'M⁰y = (b'X'M⁰ + e'M⁰)(M⁰Xb + M⁰e)   (2.38)
      = b'X'M⁰Xb + b'X'M⁰e + e'M⁰Xb + e'M⁰e
      = b'X'M⁰Xb + e'e.

In the expanded expression, the second term is zero because M⁰e = e and X'e = 0, the third term is zero because e'M⁰X = e'X = 0 (the regressors are orthogonal to the residuals), and the last term reduces to e'e because M⁰e = e. Equation 2.38 shows the decomposition of the total sum of squares into the regression sum of squares plus the error sum of squares:

SST = SSR + SSE.   (2.39)
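The decomposition SST = SSR + SSE can be checked directly on simulated data; a minimal sketch (data are hypothetical, and the model includes a constant, which the decomposition requires):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])
y = X @ np.array([1.0, 0.8]) + rng.normal(0, 1, n)   # hypothetical data

b = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ b
e = y - yhat

SST = np.sum((y - y.mean()) ** 2)     # total sum of squares
SSR = np.sum((yhat - y.mean()) ** 2)  # regression (explained) sum of squares
SSE = np.sum(e ** 2)                  # error (residual) sum of squares

print(np.isclose(SST, SSR + SSE))  # True: the ANOVA decomposition holds
```

Without a constant term the cross-product terms in Equation 2.38 no longer vanish, and the decomposition fails.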
If we calculate the fraction of the total variation in y that is explained by the model, we obtain the coefficient of determination, R²:

R² = SSR/SST = b'X'M⁰Xb / y'M⁰y = 1 − e'e / y'M⁰y.   (2.40)

As we include additional variables in the model, the R² will never decrease. Hence, for small samples, a better measure of fit is the adjusted R², or R̄²:

R̄² = 1 − [e'e/(n − K)] / [y'M⁰y/(n − 1)].   (2.41)

2.4 Properties of OLS

2.4.1 Unbiasedness

The least squares estimator b is unbiased for every value of n:

b = (X'X)⁻¹X'y   (2.42)
  = (X'X)⁻¹X'(Xβ + ε)
  = β + (X'X)⁻¹X'ε

E[b|X] = β + E[(X'X)⁻¹X'ε|X] = β.

The second term is zero after taking expectations because, by Assumption 3, the disturbances have zero expected value conditional on X.

2.4.2 Variance and the Gauss-Markov Theorem

It is relatively simple to derive the sampling variance of the OLS estimators. However, the key assumption in the derivation is that the matrix X is constant. If X is not constant, then the expectations should be taken conditional on the observed X. From the derivation of the unbiasedness of OLS we have that b − β = (X'X)⁻¹X'ε. Using this in the variance-covariance matrix of the OLS estimator we have:

Var[b|X] = E[(b − β)(b − β)'|X]
         = E[((X'X)⁻¹X'ε)((X'X)⁻¹X'ε)'|X]
         = E[((X'X)⁻¹X'ε)(ε'X(X'X)⁻¹)|X]
         = (X'X)⁻¹X'E[εε'|X]X(X'X)⁻¹
         = (X'X)⁻¹X'(σ²I)X(X'X)⁻¹
         = σ²(X'X)⁻¹.   (2.43)

Gauss-Markov Theorem. In a linear regression model (with spherical disturbances), the Best Linear Unbiased Estimator (BLUE) of the coefficients is the ordinary least squares estimator. In the Gauss-Markov Theorem, "best" refers to minimum variance. In addition, the errors do not need to have a normal distribution, and X can be either stochastic or nonstochastic.

2.4.3 Estimating the Variance

In order to obtain a sample estimate of the variance-covariance matrix presented in Equation 2.43, we need an estimate of the population parameter σ². We could use:

σ̂² = (1/n) Σ_{i=1}^n e_i²,   (2.44)

which makes sense because e_i is the sample estimate of ε_i, and σ² is the expected value of ε_i². However, this estimator is biased because β is not observed directly. To obtain an unbiased estimator of σ² we can start with the expected value of the sum of squared residuals. Recall that e = My = M[Xβ + ε] = Mε. Then, the sum of squared residuals is:

e'e = ε'Mε   (2.45)

E[e'e|X] = E[ε'Mε|X]
         = E[tr(ε'Mε)|X]
         = E[tr(Mεε')|X]
         = tr(M E[εε'|X])
         = tr(M σ²I)
         = σ² tr(M)
         = σ² tr(I_n − X(X'X)⁻¹X')
         = σ² [tr(I_n) − tr(X(X'X)⁻¹X')]
         = σ² [tr(I_n) − tr(I_K)]
         = σ²(n − K),

where ε'Mε is a scalar (a 1 × 1 matrix), so it is equal to its trace, and the step E[tr(ε'Mε)|X] = E[tr(Mεε')|X] follows from the invariance of the trace under cyclic permutations. From Equation 2.45 we obtain the unbiased estimator of σ²:
s² = e'e / (n − K).   (2.46)

Hence, the standard errors of the estimators b can be obtained by first computing s² using Equation 2.46 and then plugging it into Equation 2.43.

2.4.4 Statistical Inference

Given that b is a linear function of ε, if ε has a multivariate normal distribution we have that:

b|X ∼ N[β, σ²(X'X)⁻¹].   (2.47)

2.4.4.1 Hypothesis Testing

Assuming normality conditional on X, and with S^{kk} being the kth diagonal element of (X'X)⁻¹, we have that:

z_k = (b_k − β_k) / √(σ² S^{kk})   (2.48)

has a standard normal distribution. However, σ² is an unknown population parameter. Hence, we use:

t_k = (b_k − β_k) / √(s² S^{kk}),   (2.49)

which has a t distribution with (n − K) degrees of freedom. We use Equation 2.49 for hypothesis testing about the elements of β.

2.4.4.2 Confidence Intervals

Based on Equation 2.49 we can obtain the (1 − α) confidence interval for the population parameter β_k using:

P(b_k − t_{α/2, n−K} s_{b_k} ≤ β_k ≤ b_k + t_{α/2, n−K} s_{b_k}) = 1 − α.   (2.50)

What this equation says is that intervals constructed as b_k ± t_{α/2, n−K} s_{b_k} will contain the true population parameter β_k in (1 − α) × 100% of repeated samples. Here t_{α/2, n−K} is the critical value from the t distribution with (n − K) degrees of freedom. This is illustrated in Figure 2.4.
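The whole inference pipeline of Equations 2.43, 2.46, 2.49, and 2.50 fits in a few lines. A sketch on simulated data (the parameters are hypothetical; the critical value 1.984 is t_{0.025, 100}, read from a standard t table since the block deliberately avoids non-NumPy dependencies):

```python
import numpy as np

rng = np.random.default_rng(11)
n, K = 103, 3
beta = np.array([1.0, 2.0, -0.5])   # hypothetical population parameters
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ beta + rng.normal(0.0, 1.0, n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

s2 = (e @ e) / (n - K)                               # Equation 2.46
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))   # sqrt(s^2 * S^kk)

t_stats = b / se   # t statistic for H0: beta_k = 0, Equation 2.49

# 95% confidence intervals, Equation 2.50, with t_{0.025, n-K} = 1.984
t_crit = 1.984
lower, upper = b - t_crit * se, b + t_crit * se
for k in range(K):
    print(f"b_{k}: {b[k]: .3f}  95% CI: [{lower[k]: .3f}, {upper[k]: .3f}]")
```

In applied work the critical value would come from a t quantile function (e.g. `scipy.stats.t.ppf`) rather than a hard-coded table entry.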
Fig. 2.4 Confidence Intervals.

References

[Dougherty (2007)] Dougherty, C., 2007. Introduction to Econometrics. 3rd ed. New York: Oxford University Press.
[Greene (2008)] Greene, W.H., 2008. Econometric Analysis. 6th ed. New Jersey: Pearson Prentice Hall.
[Wooldridge (2009)] Wooldridge, J.M., 2009. Introductory Econometrics: A Modern Approach. 4th ed. New York: South-Western Publishers.