Slide Set 4: CLRM estimation
Pietro Coretto, pcoretto@unisa.it
Econometrics, Master in Economics and Finance (MEF)
Università degli Studi di Napoli Federico II
Version: Thursday 24th January, 2019 (h08:41)
P. Coretto MEF CLRM estimation 1 / 22

Least Squares Method (LS)

Given an additive regression model

    y = f(X; β) + ε

note that ε is not observed, but it is a function of the observables and the unknown parameter:

    ε = y − f(X; β)

LS method: assume the signal f(X; β) is much stronger than the error ε, and look for a β such that the size of ε is as small as possible, where the size of ε is measured by some norm ‖ε‖.
Ordinary Least Squares estimator (OLS)

OLS = LS with the ℓ₂ norm. Therefore the OLS objective function is

    S(β) = ‖ε‖₂² = ε'ε = (y − f(X; β))'(y − f(X; β)),

and the OLS estimator b is defined as the optimal solution

    b = arg min_{β ∈ R^K} S(β)

For the linear model

    S(β) = ‖ε‖₂² = ε'ε = (y − Xβ)'(y − Xβ) = Σ_{i=1}^n ε_i² = Σ_{i=1}^n (y_i − x_i'β)²

S(β) is nicely convex!

Proposition (OLS estimator). The unique OLS estimator is

    b = (X'X)⁻¹ X'y

To see this, first we introduce two simple matrix derivative rules:
1. Let a, b ∈ R^p; then ∂(a'b)/∂b = ∂(b'a)/∂b = a
2. Let b ∈ R^p, and let A ∈ R^{p×p} be symmetric; then ∂(b'Ab)/∂b = 2Ab = (2b'A)'
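The claim above can be checked numerically. A minimal sketch with simulated data (the names X, y, b and the parameter values are illustrative, not from the slides): the closed-form b minimizes the convex objective S(β), so any perturbation of b can only increase it.

```python
import numpy as np

# Illustrative simulated data: constant + 2 regressors
rng = np.random.default_rng(0)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

def S(beta):
    """OLS objective: squared Euclidean norm of the residual vector."""
    e = y - X @ beta
    return e @ e

# Closed-form OLS solution b = (X'X)^(-1) X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# Convexity: perturbing b in any direction can only increase S
for _ in range(10):
    assert S(b) <= S(b + 0.1 * rng.normal(size=K))
```

Each random perturbation of the minimizer yields a strictly larger objective value, as the proposition predicts.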
Proof. Rewrite the LS objective function:

    S(β) = (y − Xβ)'(y − Xβ) = y'y − β'X'y − y'Xβ + β'X'Xβ

Note that the transpose of a scalar is the scalar itself, so that we can write y'Xβ = (y'Xβ)' = β'X'y, and hence

    S(β) = y'y − 2β'(X'y) + β'(X'X)β    (4.1)

Since S(·) is convex, there exists a minimum b which satisfies the first-order conditions

    ∂S(β)/∂β |_{β=b} = 0

By applying the previous derivative rules (1) and (2) to the 2nd and 3rd terms of (4.1):

    ∂S(b)/∂b = −2(X'y) + 2(X'X)b = 0

which leads to the so-called normal equations

    (X'X)b = X'y

The matrix X'X is square and symmetric (see homeworks). Based on A3, with probability 1 X'X is nonsingular, so (X'X)⁻¹ exists, and the normal equations can be rewritten as

    (X'X)⁻¹(X'X)b = (X'X)⁻¹X'y  ⟹  b = (X'X)⁻¹X'y

which proves the desired result.
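The first-order conditions and the normal equations can be verified numerically. The sketch below uses hypothetical simulated data; note that solving the normal equations directly (rather than forming (X'X)⁻¹ explicitly) is the numerically preferable route in practice.

```python
import numpy as np

# Hypothetical data to check the normal equations (X'X)b = X'y
rng = np.random.default_rng(1)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

# Solve the normal equations directly instead of computing (X'X)^(-1)
b = np.linalg.solve(X.T @ X, X.T @ y)

# First-order condition: the gradient -2X'y + 2(X'X)b vanishes at b
grad = -2 * (X.T @ y) + 2 * (X.T @ X @ b)
assert np.allclose(grad, 0)

# Normal equations hold at the optimum
assert np.allclose(X.T @ X @ b, X.T @ y)
```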
Formulation in terms of sample averages

It can be shown (see homeworks) that

    X'X = Σ_{i=1}^n x_i x_i'  and  X'y = Σ_{i=1}^n x_i y_i

Define

    S_xx = (1/n) X'X = (1/n) Σ_{i=1}^n x_i x_i'  and  s_xy = (1/n) X'y = (1/n) Σ_{i=1}^n x_i y_i

Therefore b = (X'X)⁻¹X'y can be written as

    b = ((1/n) X'X)⁻¹ ((1/n) X'y) = ((1/n) Σ_{i=1}^n x_i x_i')⁻¹ ((1/n) Σ_{i=1}^n x_i y_i) = S_xx⁻¹ s_xy

Once β is estimated via b, the estimated error, also called the residual, is obtained as

    e = y − Xb

Fitted values, also called predicted values, are ŷ = Xb, so that e = y − ŷ. Note that

    ŷ_i = b₁ + b₂ x_{i2} + b₃ x_{i3} + …  for all i = 1, 2, …, n

What is ŷ_i? It is the estimated conditional expectation of Y when X₁ = 1, X₂ = x_{i2}, …, X_K = x_{iK}.
Algebraic/geometric properties of the OLS

Proposition (orthogonality of residuals). The column space of X is orthogonal to the residual vector.

Proof. Write the normal equations:

    X'Xb − X'y = 0  ⟹  X'(y − Xb) = 0  ⟹  X'e = 0

Therefore for every column X_k (observed regressor) it holds that the inner product X_k'e = 0.

Proposition (residuals sum to zero). If the linear model includes the constant term, then

    Σ_{i=1}^n e_i = Σ_{i=1}^n (y_i − x_i'b) = 0

Proof. By assumption we have a linear model with a constant/intercept term, that is

    y_i = β₁ + β₂ x_{i2} + β₃ x_{i3} + … + ε_i

Therefore X₁ = 1 = (1, 1, …, 1)'. Applying the previous property to the 1st column of X:

    X₁'e = 1'e = Σ_{i=1}^n e_i = 0

and this proves the property.
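Both propositions are easy to verify on simulated data (a minimal sketch; names and parameter values are illustrative). Orthogonality X'e = 0 holds by construction, and the zero-sum property follows from orthogonality to the constant column:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # first column = constant
y = X @ np.array([2.0, -1.0, 0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

assert np.allclose(X.T @ e, 0)   # every regressor column is orthogonal to e
assert np.isclose(e.sum(), 0)    # residuals sum to zero: the model has an intercept
```

Dropping the constant column would break the second assertion but not the first.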
Proposition (fitted vector is a projection). ŷ is the projection of y onto the space spanned by the columns of X (the regressors).

Proof.

    ŷ = Xb = X(X'X)⁻¹X'y = Py

It suffices to show that P = X(X'X)⁻¹X' is symmetric and idempotent.

    P' = (X(X'X)⁻¹X')' = X((X'X)⁻¹)'X' = X((X'X)')⁻¹X' = X(X'X)⁻¹X' = P

Therefore P is symmetric. Moreover

    PP = (X(X'X)⁻¹X')(X(X'X)⁻¹X') = X(X'X)⁻¹(X'X)(X'X)⁻¹X' = X(X'X)⁻¹X' = P

which shows that P is also idempotent, and this completes the proof.

P is called the influence matrix, because it measures the impact of the observed y's on each predicted ŷ_i. The elements of the diagonal of P are called leverages, because they measure the influence of y_i on the corresponding ŷ_i.
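The properties of P can be checked directly (a sketch with simulated data; the fact that trace(P) = K, used in the last assertion, follows from P being idempotent with rank K):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = rng.normal(size=n)

# Projection (influence) matrix P = X (X'X)^(-1) X'
P = X @ np.linalg.solve(X.T @ X, X.T)

assert np.allclose(P, P.T)      # symmetric
assert np.allclose(P @ P, P)    # idempotent
assert np.allclose(P @ y, X @ np.linalg.solve(X.T @ X, X.T @ y))  # Py = y_hat

leverages = np.diag(P)               # influence of y_i on its own fitted value
assert np.isclose(leverages.sum(), K)  # trace(P) = K
```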
Proposition (orthogonal decomposition). The OLS fitting decomposes the observed vector y into the sum of two orthogonal components:

    y = ŷ + e = Py + My

Remark: orthogonality implies that the individual contributions of each term of the decomposition of y are well identified.

Proof. First notice that

    e = y − ŷ = y − Py = (I − P)y = My

where M = (I − P). Therefore y = ŷ + e = Py + My. It remains to show that ŷ = Py and e = My are orthogonal vectors. First note that MP = PM = 0; in fact

    (I − P)P = P − PP = P − P = 0

Moreover

    ⟨Py, My⟩ = (Py)'(My) = y'P'My = y'PMy = y'0y = 0

and this completes the proof.

M = I − P is called the residual maker matrix because it maps y into e. It allows us to write e in terms of the observables y and X. Properties:
M is idempotent and symmetric (show it)
MX = 0; in fact MX = (I − P)X = X − X = 0

Remark: it can be shown that this decomposition is also unique (a consequence of the Hilbert projection theorem).
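The residual maker's properties and the orthogonal decomposition can be verified numerically (illustrative simulated data again):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P                 # residual maker matrix

assert np.allclose(M @ X, 0)      # MX = 0: M annihilates the column space of X
assert np.allclose(M @ P, 0)      # MP = 0

y_hat, e = P @ y, M @ y
assert np.isclose(y_hat @ e, 0)   # the two components are orthogonal
assert np.allclose(y_hat + e, y)  # y = Py + My
```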
OLS Projection

[Figure: geometry of the OLS projection. Source: Greene, W. H. (2011), Econometric Analysis, 7th Edition]

Estimate of the variance of the error term

The minimum of the LS objective function is

    S(b) = (y − Xb)'(y − Xb) = e'e

This is called the Residual Sum of Squares:

    RSS = Σ_{i=1}^n e_i² = e'e

Note that

    e = My = M(Xβ + ε) = Mε

(using MX = 0), and therefore

    RSS = e'e = (Mε)'(Mε) = ε'M'Mε = ε'Mε

since M is symmetric and idempotent.
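The identities e = Mε and RSS = ε'Mε can only be checked in a simulation, where the true β and ε are known. A sketch under that assumption:

```python
import numpy as np

# Simulation: the true beta and eps are known, unlike with real data
rng = np.random.default_rng(9)
n, K = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, -1.0, 0.5])
eps = rng.normal(size=n)
y = X @ beta + eps

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)

RSS = e @ e
assert np.allclose(e, M @ eps)          # e = M eps, since MX = 0
assert np.isclose(RSS, eps @ M @ eps)   # RSS = eps' M eps
```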
Unbiased estimation of the error variance

    s² = (1/(n − K)) Σ_{i=1}^n e_i² = e'e/(n − K) = RSS/(n − K)

SER = standard error of the regression = s

Estimation error decomposition

The sampling estimation error is given by b − β; now

    b − β = (X'X)⁻¹X'y − β
          = (X'X)⁻¹X'(Xβ + ε) − β
          = (X'X)⁻¹(X'X)β + (X'X)⁻¹X'ε − β
          = β + (X'X)⁻¹X'ε − β
          = (X'X)⁻¹X'ε

The bias is the expected estimation error: Bias(b) = E[b − β].
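Both the variance estimator and the estimation-error identity b − β = (X'X)⁻¹X'ε can be illustrated in a simulation where β and ε are known (all names and parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 0.5, -0.5])
sigma = 2.0
y = X @ beta + sigma * rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
RSS = e @ e
s2 = RSS / (n - K)     # unbiased estimator of sigma^2
SER = np.sqrt(s2)      # standard error of the regression

# Estimation error b - beta equals (X'X)^(-1) X' eps
eps = y - X @ beta
assert np.allclose(b - beta, np.linalg.solve(X.T @ X, X.T @ eps))
```

Averaging s² over many replications of this simulation would approach σ² = 4, which is what unbiasedness means.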
TSS = total sum of squares

Let ȳ be the sample average of the observed y₁, y₂, …, y_n: ȳ = (1/n) Σ_{i=1}^n y_i, and let ȳ = (ȳ, ȳ, …, ȳ)', i.e. ȳ = ȳ1 with 1 repeated n times.

TSS = the deviance (variability) observed in the dependent variable y:

    TSS = Σ_{i=1}^n (y_i − ȳ)² = (y − ȳ)'(y − ȳ)

This is a variability measure, because it computes the squared deviations of y from its observed unconditional mean.

ESS = explained sum of squares

ESS = the overall deviance of the predicted values of y with respect to the unconditional mean of y:

    ESS = Σ_{i=1}^n (ŷ_i − ȳ)² = (ŷ − ȳ)'(ŷ − ȳ)

At first look this is not exactly a measure of variability (why?). But it turns out that another property of the OLS is that

    (1/n) Σ_{i=1}^n ŷ_i = (1/n) Σ_{i=1}^n y_i
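The equality of the sample means of ŷ and y (which makes ESS a genuine variability measure around ȳ) follows from the residuals summing to zero when the constant is included. A quick numerical check on simulated data:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept included
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b

# Fitted values share the sample mean of y, since the residuals sum to zero
assert np.isclose(y_hat.mean(), y.mean())

TSS = ((y - y.mean()) ** 2).sum()       # total sum of squares
ESS = ((y_hat - y.mean()) ** 2).sum()   # explained sum of squares
```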
TSS decomposition and goodness of fit

It can be shown (we don't do this here) that

    TSS = ESS + RSS

From the previous decomposition we get a famous (and misused) goodness-of-fit statistic:

    R² = ESS/TSS = 1 − RSS/TSS

R² is the portion of the deviance observed in y that is explained by the linear model. It is also called the coefficient of determination.

Problems with R²

R² increases by adding more regressors. For this reason it is better to look at the so-called adjusted R² (adjusted for degrees of freedom), which is computed as follows:

    R̄² = 1 − [RSS/(n − K)] / [TSS/(n − 1)]

R² ∈ [0, 1] only if the constant term is included in the model. So when you estimate without an intercept, don't be scared if you get R² < 0.

An extremely large R² is pathological: guess why!
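The decomposition and the "R² never decreases" problem are easy to demonstrate on simulated data (the helper `r2` and all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.7, -0.3]) + rng.normal(size=n)

def r2(X, y):
    """Coefficient of determination R^2 = 1 - RSS/TSS (hypothetical helper)."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    RSS = ((y - X @ b) ** 2).sum()
    TSS = ((y - y.mean()) ** 2).sum()
    return 1 - RSS / TSS

R2 = r2(X, y)
assert 0 <= R2 <= 1   # guaranteed only because the constant term is included

# Adding a pure-noise regressor never lowers R^2
X_big = np.column_stack([X, rng.normal(size=n)])
assert r2(X_big, y) >= R2 - 1e-12

# Adjusted R^2 penalises for degrees of freedom
K = X.shape[1]
b = np.linalg.solve(X.T @ X, X.T @ y)
RSS = ((y - X @ b) ** 2).sum()
TSS = ((y - y.mean()) ** 2).sum()
R2_adj = 1 - (RSS / (n - K)) / (TSS / (n - 1))
```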