Econ 7 Applied Econometrics
Topic 7: Multicollinearity (Studenmund, Chapter 8)

I. Definitions and Problems

A. Perfect Multicollinearity

Definition: Perfect multicollinearity exists in the following K-variable regression

  Yi = β1 + β2X2i + β3X3i + … + βKXKi + εi

if:

  λ2X2i + λ3X3i + … + λKXKi = 0

where the 'lambdas' are a set of parameters (not all equal to zero). This must be true for all observations. Alternatively, we could write any independent variable as an exact linear function of the others:

  X2i = -(λ3/λ2)X3i - … - (λK/λ2)XKi

This says essentially that X2 is redundant: just a linear combination of the other regressors.

Problems with perfect multicollinearity:

(1) Coefficients can't be estimated. For example, in the 2-variable regression:

  β̂2 = Σxiyi / Σxi²

Suppose Xi = λ for all i (the regressor is an exact linear function of the constant term). As a result xi = Xi - X̄ = 0 for all i, and hence the denominator is 0. Thus, the estimated slope coefficient is undefined. This result applies to MLR.

Intuition: Again, we want to estimate the partial coefficients. A partial coefficient depends on the variation in one variable and its ability to explain variation in the dependent variable beyond what can be explained by the other regressor. But we can't get variation
in one without getting variation in the other, by definition.

(2) Standard errors can't be estimated. In the 3-variable regression model, the standard error on β̂2 can be written:

  se(β̂2) = σ̂ / sqrt( Σx2i² (1 - r23²) )

But perfect multicollinearity implies r23 = 1 or r23 = -1 (r23² = 1 in either case), and the denominator is zero. Thus, standard errors are undefined (infinite).

The solution to perfect multicollinearity is trivial: drop one or several of the regressors.

B. Imperfect Multicollinearity

Definition: Imperfect multicollinearity exists in a K-variable regression if:

  λ2X2i + λ3X3i + … + λKXKi + vi = 0

where vi is a stochastic variable with mean zero and small variance. As Var(vi) → 0, imperfect becomes perfect multicollinearity. Alternatively, we could write any particular independent variable as an 'almost' exact linear function of the others:

  X2i = -(λ3/λ2)X3i - … - (λK/λ2)XKi - (1/λ2)vi

If you know K - 1 of the variables, you still don't know the Kth variable precisely.

What are the problems with imperfect multicollinearity? Coefficients can be estimated. OLS estimators are still unbiased, and minimum variance (i.e., BLUE). Imperfect multicollinearity does not violate the classical assumptions. But standard errors 'blow up': they increase with the degree of multicollinearity, which reduces the precision of our coefficient estimates.
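The 'standard errors blow up' claim is easy to see by simulation. A minimal sketch (simulated data; the function name, coefficients, and the x3 = 2·x2 + v construction are all illustrative, not from the notes): as the noise in the near-collinear regressor shrinks, imperfect multicollinearity approaches perfect and se(β̂2) explodes.

```python
import numpy as np

def ols_se_beta2(noise_sd, n=200, seed=0):
    """OLS of y on [1, x2, x3], where x3 = 2*x2 + v and v has sd noise_sd.
    Returns the estimated standard error of the coefficient on x2."""
    rng = np.random.default_rng(seed)
    x2 = rng.normal(size=n)
    x3 = 2.0 * x2 + rng.normal(scale=noise_sd, size=n)  # near-collinear regressor
    y = 1.0 + 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - X.shape[1])   # sigma-hat squared
    return np.sqrt(s2 * XtX_inv[1, 1])      # se(beta2-hat)

# As the collinearity tightens (noise_sd -> 0), se(beta2-hat) grows without bound:
for sd in [1.0, 0.1, 0.01]:
    print(sd, ols_se_beta2(sd))
```

Note that the coefficients themselves remain estimable at every step; only their precision deteriorates, exactly as the notes state.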
For example, recall:

  se(β̂2) = σ̂ / sqrt( Σx2i² (1 - r23²) )

As r23² → 1, the standard error → ∞.

Numerical example: Suppose the standard error is 1 when r23 = 0.

  If r23 = .10, then the standard error = 1.01.
  If r23 = .25, then the standard error = 1.03.
  If r23 = .50, then the standard error = 1.15.
  If r23 = .75, then the standard error = 1.51.
  If r23 = .90, then the standard error = 2.29.
  If r23 = .99, then the standard error = 7.09.

[Figure: se(β̂2) plotted against r23, rising slowly at first and then steeply as r23 approaches 1.]

The standard error increases at an increasing rate with the multicollinearity between the explanatory variables. This results in wider confidence intervals and insignificant t-ratios on our coefficient estimates (e.g., you'll have more difficulty rejecting the null that a slope coefficient is equal to zero).
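These multipliers follow directly from the formula: relative to the r23 = 0 case, the standard error is scaled by 1/sqrt(1 - r23²). A quick check in Python (the function name is ours):

```python
import math

def se_inflation(r23):
    """Multiplier applied to se(beta2-hat) relative to the r23 = 0 case."""
    return 1.0 / math.sqrt(1.0 - r23 ** 2)

for r in (0.10, 0.25, 0.50, 0.75, 0.90, 0.99):
    print(f"r23 = {r:.2f}  ->  se multiplier = {se_inflation(r):.2f}")
```

The square of this multiplier is the Variance Inflation Factor that appears in the detection methods below.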
This problem is closely related to the problem of a small sample size: in both cases, standard errors blow up. With a small sample size, the denominator is reduced by the lack of variation in the explanatory variable.

II. Methods of Detection

Three general indicators or diagnostic tests.

1. t-Ratios vs. R². Look for a high R² but few significant t-ratios. A common 'rule of thumb'. We can't reject the null hypotheses that the coefficients are individually equal to zero (t-tests), but we can reject the null hypothesis that they are simultaneously equal to zero (F-test). Not an 'exact test': what do we mean by 'few' significant t-tests, and a 'high' R²? Too imprecise. It also depends on other factors, like sample size.

2. Correlation Matrix of Regressors. Look for high pair-wise correlation coefficients in the correlation matrix of the regressors. Multicollinearity refers to a linear relationship among all or some of the regressors: any pair of independent variables may not be highly correlated, yet one variable may be a linear function of a number of others. In a 3-variable regression, multicollinearity is simply the correlation between the two explanatory variables. It is often said that a high pairwise correlation is a sufficient, but not a necessary, condition for multicollinearity. In other words, if you've got a high pairwise correlation, you've got problems; but a low pairwise correlation isn't conclusive evidence of an absence of multicollinearity.

3. Auxiliary Regressions. Run a series of regressions to look for these linear relationships among the explanatory variables. Given the definition of multicollinearity above, regress one independent variable against the others and 'test' for this linear relationship.
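Indicators (2) and (3) can both be sketched numerically. In this simulated example (all variable names and coefficients are made up), x2 is nearly a linear combination of two other regressors, so no single pairwise correlation is extreme, yet the auxiliary regression reveals the problem:

```python
import numpy as np

def aux_r2_and_vif(target, others):
    """Regress one regressor on the others (with intercept); return (R^2, VIF)."""
    Z = np.column_stack([np.ones(len(target))] + list(others))
    b, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ b
    tss = ((target - target.mean()) ** 2).sum()
    r2 = 1.0 - (resid ** 2).sum() / tss
    return r2, 1.0 / (1.0 - r2)

rng = np.random.default_rng(42)
n = 500
x3 = rng.normal(size=n)
x4 = rng.normal(size=n)
x2 = 0.7 * x3 + 0.7 * x4 + 0.2 * rng.normal(size=n)  # x2 ~ linear combo of x3, x4

# (2) Pairwise correlations look only moderate...
print(np.corrcoef([x2, x3, x4]).round(2))
# (3) ...but the auxiliary regression of x2 on x3 and x4 reveals near-collinearity:
r2, vif = aux_r2_and_vif(x2, [x3, x4])
print(f"auxiliary R^2 = {r2:.3f}, VIF = {vif:.1f}")
```

This is exactly the sense in which a high pairwise correlation is sufficient but not necessary: here every pairwise correlation is well below 1, yet the auxiliary R² is close to 1.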
For example, estimate the following:

  X2i = α1 + α3X3i + … + αKXKi + ei

where our hypothesis is that X2 is a linear function of the other regressors. We test the null hypothesis that the slope coefficients in this auxiliary regression are simultaneously equal to zero,

  H0: α3 = α4 = … = αK = 0

with the following F-test:

  F = [R2² / (K - 2)] / [(1 - R2²) / (n - K + 1)]

where R2² is the coefficient of determination with X2 as the dependent variable, and K is the number of coefficients in the original regression. This is related to the high Variance Inflation Factors discussed in the textbook, where VIF = 1 / (1 - R2²); if VIF > 5, the multicollinearity is severe. But ours is a formal test.

Summary: There is no single test for multicollinearity.

III. Remedial Measures

Once we're convinced that multicollinearity is present, what can we do about it? Since diagnosis of the ailment isn't clear cut, neither is the treatment: the appropriateness of the following remedial measures varies from one situation to another.

EXAMPLE: Estimating the labour supply of married women from 1950-1999:

  HRS = β1 + β2W_W + β3W_M + ε

where:
  HRS = Average annual hours of work of married women.
  W_W = Average wage rate for married women.
  W_M = Average wage rate for married men.
Suppose we estimate the following:

  HRS-hat = 733.7 + 48.37 W_W - .9 W_M        R² = .847
                   (34.97)     (9.)

(standard errors in parentheses). Multicollinearity is a problem here. The first tipoff is that the t-ratios are less than 1.5 (insignificant at the 10% level), yet R² is .847. But it is easy to confirm multicollinearity in this case: the correlation between the mean wage rates is .99 over our sample period! Standard errors blow up, and we can't separate the wage effects on the labour supply of married women.

Possible Solutions?

1. A Priori Information. If we know the relationship between the slope coefficients, we can substitute this restriction into the regression and eliminate the multicollinearity. Heavy reliance on economic theory. For example, suppose that β3 = -0.5β2. (We expect β2 > 0 and β3 < 0.)

  HRS = β1 + β2W_W - 0.5β2W_M + ε
      = β1 + β2(W_W - 0.5W_M) + ε
      = β1 + β2W* + ε

where we compute W* = W_W - 0.5W_M. Suppose we re-estimate and find:

  HRS-hat = 78. + 46.8 W*
                  (6.7)

Clearly, this has eliminated the multicollinearity by reducing this from a 3- to a 2-variable regression. Using the earlier assumption that β3 = -0.5β2, we get the individual coefficient estimates:

  β̂2 = 46.8,  β̂3 = -0.5 × 46.8 = -23.4

Unfortunately, such a priori information is extremely rare.
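The substitution trick can be verified by construction. A sketch with simulated data (all numbers here, including the true coefficients 50 and -25, are made up; only the restriction β3 = -0.5β2 mirrors the example above): build W* and run the 2-variable regression, then unpack both coefficients from the restriction.

```python
import numpy as np

# Simulated illustration: the true model satisfies beta3 = -0.5 * beta2,
# with beta2 = 50 and beta3 = -25 (made-up values).
rng = np.random.default_rng(1)
n = 60
w_w = 2.0 + 0.05 * np.arange(n) + rng.normal(scale=0.05, size=n)  # women's wage
w_m = 1.2 * w_w + rng.normal(scale=0.02, size=n)   # men's wage, nearly collinear
hrs = 1700 + 50 * w_w - 25 * w_m + rng.normal(scale=5, size=n)

w_star = w_w - 0.5 * w_m                     # substitute the restriction
X = np.column_stack([np.ones(n), w_star])
b, *_ = np.linalg.lstsq(X, hrs, rcond=None)
beta2_hat = b[1]
beta3_hat = -0.5 * beta2_hat                 # recovered via the restriction
print(beta2_hat, beta3_hat)                  # near 50 and -25
```

Note that the regression of hrs on w_w and w_m directly would be nearly unestimable here, while the restricted single-regressor version is well behaved.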
2. Dropping a Variable. Suppose we omit the wage of married men and estimate:

  HRS = α1 + α2W_W + v

The problem is that we're introducing 'specification bias': we're substituting one problem for another, and the remedy may be worse than the disease. Recall that the estimate of α2 is likely to be a biased estimate of β2:

  E(α̂2) = β2 + β3·b32

where the latter term, b32, comes from the regression of the omitted variable on the included regressor. In fact, the bias is increased by the multicollinearity.

3. Transformation of the Variables. One of the simplest things to do with time-series regressions is to run 'first differences'. Start with the original specification at time t:

  HRS_t = β1 + β2W_W,t + β3W_M,t + ε_t

The same linear relationship holds for the previous period as well:

  HRS_t-1 = β1 + β2W_W,t-1 + β3W_M,t-1 + ε_t-1

Subtract the second equation from the first:

  (HRS_t - HRS_t-1) = β2(W_W,t - W_W,t-1) + β3(W_M,t - W_M,t-1) + (ε_t - ε_t-1)

or

  ΔHRS_t = β2ΔW_W,t + β3ΔW_M,t + Δε_t

The advantage is that changes in wage rates may not be as highly correlated as their levels.
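The claim that differences can be much less correlated than levels is easy to check when the correlation in levels is driven by a common trend. A simulated sketch (series and parameters are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(50, dtype=float)
w_w = 1.0 + 0.04 * t + rng.normal(scale=0.05, size=50)  # women's wage: trend + noise
w_m = 1.5 + 0.05 * t + rng.normal(scale=0.05, size=50)  # men's wage: trend + noise

corr_levels = np.corrcoef(w_w, w_m)[0, 1]               # dominated by the shared trend
corr_diffs = np.corrcoef(np.diff(w_w), np.diff(w_m))[0, 1]  # trend differenced away
print(f"levels: {corr_levels:.3f}, first differences: {corr_diffs:.3f}")
```

Differencing removes the common trend that makes the levels nearly collinear, which is exactly why this transformation can help.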
The disadvantages are:

(i) The number of observations is reduced (i.e., a loss of one degree of freedom). The sample period is now 1951-1999.

(ii) It may lead to serial correlation:

  Cov(Δε_t, Δε_t-1) = Cov(ε_t - ε_t-1, ε_t-1 - ε_t-2) = -Var(ε_t-1) = -σ² ≠ 0

Again, the cure may be worse than the disease: this violates one of the classical assumptions. More on serial correlation later.

4. New Data. Two possibilities here:

(i) Extend the time series. Multicollinearity is a 'sample phenomenon'. Wage rates may be correlated over the period 1950-1999; add more years, for example, go back to 1940. The correlation may be reduced. The problem is that the data may not be available, or the relationship among the variables may have changed (i.e., the regression function isn't stable). More likely, the data simply isn't there: if it was, why wasn't it included initially?

(ii) Change the Nature or Source of the Data. Switch from time-series to cross-sectional analysis. Change the 'unit of observation': use a random sample of households at a point in time. The degree of multicollinearity in wages may be relatively lower between spouses. Or combine data sources and use 'panel data': follow a random sample of households over a number of years.

5. 'Do Nothing' (A Remedy!). Multicollinearity is not a problem if the objective of the analysis is forecasting, since it doesn't affect the overall 'explanatory power' of the regression (i.e., R²). It is more of a problem if the objective is to test the significance of individual partial coefficients.
However, the estimated coefficients are unbiased; multicollinearity simply reduces the 'precision' of the estimates. Multicollinearity is often given too much emphasis in the list of common problems with regression analysis. If it's imperfect multicollinearity, which is almost always going to be the case, then it doesn't violate the classical assumptions. It is much more of a problem if the goal is to test the significance of individual coefficients, and less of a problem for forecasting and prediction.

IV. Questions for Discussion: Q8.

V. Computing Exercise: Example 8.5. (Johnson, Ch 8)
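As a final numerical check on the differencing remedy discussed under III.3: if the original errors ε_t are i.i.d. with variance σ², consecutive differenced errors should have covariance -σ². A simulated sketch (σ = 2 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0
eps = rng.normal(scale=sigma, size=200_000)  # i.i.d. errors, variance sigma^2 = 4
d = np.diff(eps)                             # differenced errors: eps_t - eps_{t-1}
sample_cov = np.cov(d[1:], d[:-1])[0, 1]     # sample Cov(d_t, d_{t-1})
print(sample_cov)                            # close to -sigma^2 = -4
```

The negative covariance confirms that differencing manufactures serial correlation even when the original errors had none, which is the cost the notes warn about.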