Econ107 Applied Econometrics
- Claud Knight
Topics 2-4 were discussed under the classical Assumptions 1-6 (or 1-7 when normality is needed for finite-sample inference).

Question: what if some of the classical assumptions are violated?

Topic 5 deals with violations of Assumption 1 (A1 hereafter). Topics 6-8 deal with three cases of violations of the classical assumptions: multicollinearity (A6), serial correlation (A4), and heteroskedasticity (A5).

Questions to be addressed:
- What is the nature of the problem?
- What are the consequences of the problem?
- How is the problem diagnosed?
- How can the problem be remedied?
6 Multicollinearity (Studenmund, Chapter 8)

6.1 The Nature of Multicollinearity

6.1.1 Perfect Multicollinearity

1. Definition: Perfect multicollinearity exists in the regression

    Y_i = β_0 + β_1 X_{1i} + ... + β_k X_{ki} + ε_i,    (1)

if there exists a set of parameters λ_j (j = 0, 1, ..., k, not all equal to zero) such that

    λ_0 X_{0i} + λ_1 X_{1i} + ... + λ_k X_{ki} = 0,    (2)

where X_{0i} ≡ 1. Equation (2) must hold for all observations.
Alternatively, we could write one independent variable as an exact linear combination of the others. E.g., if λ_k ≠ 0, we can rewrite (2) as

    X_{ki} = -(λ_0/λ_k) X_{0i} - (λ_1/λ_k) X_{1i} - ... - (λ_{k-1}/λ_k) X_{k-1,i}.    (3)

The last expression says, essentially, that X_{ki} is redundant: it contains no information beyond that already in X_{0i}, X_{1i}, ..., X_{k-1,i} for explaining Y_i.
Example. Consider the following regression model for a consumption function:

    C = β_1 + β_2 N + β_3 S + β_4 T + ε,

where C is consumption, N is nonlabor income, S is salary, T is total income, and ε is the error term. Since T = N + S, it is not possible to separate the individual effects of the components of income (N, S) from that of total income (T). According to the model, E(C) = β_1 + β_2 N + β_3 S + β_4 T. But if we let c be any nonzero value and define β′_2 = β_2 - c, β′_3 = β_3 - c, and β′_4 = β_4 + c, then E(C) = β_1 + β′_2 N + β′_3 S + β′_4 T as well, for a different set of parameters. The same value of E(C) is thus consistent with many different parameter values.
2. Problems

(1) Coefficients cannot be estimated. Consider the regression

    Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + ε_i.    (4)

If X_{2i} = λ X_{1i} (λ ≠ 0), the parameters β_1 and β_2 cannot be identified or estimated. To see why, define β′_1 = β_1 + cλ and β′_2 = β_2 - c, where c can be any constant. Then (4) is equivalent to

    Y_i = β_0 + β′_1 X_{1i} + β′_2 X_{2i} + ε_i.    (5)

This means there are infinitely many values of c, and hence infinitely many pairs (β′_1, β′_2), for which (5) holds. We cannot separate the influence of X_{1i} from that of X_{2i} on Y_i. The same analysis extends to a generic MLR model in which one regressor can be written as a linear combination of the others.
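The observational equivalence above can be checked numerically. The sketch below is not part of the notes (the data, λ, and c are invented); it shows that when X_{2i} = λX_{1i}, the parameter sets (β_1, β_2) and (β_1 + cλ, β_2 - c) produce identical fitted values for every observation, so no sample can distinguish them.

```python
lam = 2.0                          # X2 = 2 * X1: perfect multicollinearity
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [lam * v for v in x1]

def fitted(b0, b1, b2):
    """Fitted values b0 + b1*X1 + b2*X2 for the toy sample."""
    return [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

c = 5.0                            # any constant c works
original = fitted(1.0, 3.0, 0.5)
shifted  = fitted(1.0, 3.0 + c * lam, 0.5 - c)
print(original == shifted)         # True: the parameter sets are observationally equivalent
```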
(2) Standard errors cannot be estimated. In the regression model (4), the standard error of β̂_1 can be written

    std(β̂_1) = sqrt( σ² / [ Σ_{i=1}^n (X_{1i} - X̄_1)² (1 - r²_{12}) ] ),

where r_{12} is the sample correlation between X_{1i} and X_{2i}. In the case of perfect multicollinearity (e.g., X_{2i} = λ X_{1i} + a), r_{12} = 1 or -1, so the denominator is zero. Thus std(β̂_1) = ∞.

Solution: The solution to perfect multicollinearity is trivial: drop one or several of the regressors. In the above example, we can drop either X_{2i} or X_{1i}, so that (4) can be written as

    Y_i = β_0 + (β_1 + λβ_2) X_{1i} + ε_i,

or

    Y_i = β_0 + (β_2 + β_1/λ) X_{2i} + ε_i.
By regressing Y_i on X_{1i}, we are estimating β_1 + λβ_2. Analogously, by regressing Y_i on X_{2i}, we are estimating β_2 + β_1/λ. In either case, we cannot estimate β_1 or β_2 separately.

Remarks. Perfect multicollinearity is fairly easy to avoid. Econometricians almost never talk about perfect multicollinearity. Instead, when we use the word multicollinearity, we are really talking about severe imperfect multicollinearity.
6.1.2 Imperfect Multicollinearity

Imperfect multicollinearity is a linear functional relationship between two or more independent variables that is so strong that it can significantly affect the estimation of the coefficients of those variables.

Definition: Imperfect multicollinearity exists in a k-variate regression if

    λ_0 X_{0i} + λ_1 X_{1i} + ... + λ_k X_{ki} + v_i = 0

for some stochastic variable v_i.

Remarks. 1) As Var(v_i) → 0, imperfect multicollinearity tends to perfect multicollinearity.
2) Alternatively, we could write any particular independent variable as an almost exact linear function of the others. E.g., if λ_k ≠ 0, then

    X_{ki} = -(λ_0/λ_k) X_{0i} - (λ_1/λ_k) X_{1i} - ... - (λ_{k-1}/λ_k) X_{k-1,i} - v_i/λ_k.    (6)

The above equation implies that although the relationship between X_{ki} and X_{0i}, X_{1i}, ... might be fairly strong, it is not strong enough to allow X_{ki} to be completely explained by X_{0i}, X_{1i}, ..., X_{k-1,i}; some unexplained variation remains.

3) Imperfect multicollinearity indicates a strong linear relationship among the regressors. The stronger the relationship between two or more regressors, the more likely they are to be considered significantly multicollinear.
6.2 The Consequences of (Imperfect) Multicollinearity

1. Coefficient estimators remain unbiased. Imperfect multicollinearity does not violate the classical assumptions. If all the classical Assumptions 1-6 are met, we can estimate the coefficients, and the estimators of the βs will still be centered around the true values. The OLS estimators remain unbiased and BLUE.

2. The variances/standard errors of the coefficient estimators blow up. They increase with the degree of multicollinearity. Since two or more of the regressors are strongly related, it becomes difficult to identify the separate effects of the multicollinear variables, and we are much more likely to make errors in estimating the coefficients than before we encountered multicollinearity. Imperfect multicollinearity thus reduces the precision of our coefficient estimates.
For example, in the 2-variate regression case,

    std(β̂_1) = sqrt( σ² / [ Σ_{i=1}^n (X_{1i} - X̄_1)² (1 - r²_{12}) ] ).

As r_{12} → ±1, the standard error → ∞.

Numerical example: Suppose sqrt( σ² / Σ_{i=1}^n (X_{1i} - X̄_1)² ) = 1, i.e., std(β̂_1) = 1 when r_{12} = 0. Then:
- If r_{12} = 0.10, the standard error = 1.01.
- If r_{12} = 0.25, the standard error = 1.03.
- If r_{12} = 0.50, the standard error = 1.15.
- If r_{12} = 0.75, the standard error = 1.51.
- If r_{12} = 0.90, the standard error = 2.29.
- If r_{12} = 0.99, the standard error = 7.09.

The standard error increases at an increasing rate with the multicollinearity between the explanatory variables. As a result, we will have wider confidence intervals and possibly insignificant t-ratios on our coefficient estimates, because t_1 = β̂_1 / se(β̂_1).
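The numbers in this example follow directly from the formula: with the first factor held at 1, the standard error equals 1/sqrt(1 - r²_{12}). A small sketch (Python used purely for illustration) reproduces the table:

```python
import math

def se_inflation(r12):
    """Factor by which std(beta1_hat) is inflated when corr(X1, X2) = r12."""
    return 1.0 / math.sqrt(1.0 - r12 ** 2)

for r in (0.0, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99):
    print(f"r12 = {r:.2f} -> standard error = {se_inflation(r):.2f}")
# prints 1.00, 1.01, 1.03, 1.15, 1.51, 2.29, 7.09 -- matching the table above
```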
Figure 1: A consequence of imperfect multicollinearity: the blow-up of standard errors (std(β̂_1) plotted against r_{12}).
3. The computed t-ratios will fall. This means we will have more difficulty rejecting the null hypothesis that a slope coefficient is equal to zero. This problem is closely related to the problem of a small sample size: in both cases, standard errors blow up. With a small sample size, the denominator is reduced by the lack of variation in the explanatory variable.

4. Coefficient estimates become very sensitive to changes in specification and the number of observations. The coefficient estimates may be very sensitive to the addition of one or a small number of observations, and to the deletion of a statistically insignificant variable. One may get very odd coefficient estimates, possibly with wrong signs, due to the high variance of the estimator.

5. The overall fit of the model will be largely unaffected. Even though the individual t-ratios are often quite low in the case of imperfect multicollinearity, the overall fit of the equation (R² or R̄²) will not fall much.
A hypothetical example. Suppose we want to estimate a student's consumption function. After some preliminary work, we come up with the following equation:

    C_i = β_0 + β_1 Y_{d,i} + β_2 LA_i + ε_i,

where
    C_i = the annual consumption expenditures of the ith student
    Y_{d,i} = the annual disposable income (including gifts) of the ith student
    LA_i = the liquid assets (cash, savings, etc.) of the ith student.

Please analyze the following regression outputs:

    Ĉ_i = ____ + ____ Y_{d,i} + ____ LA_i    (7)
                 (1.0307)      (0.0492)
    t:           [0.496]       [0.868]
    n = 9, R² = ____
    Ĉ_i = ____ + ____ Y_{d,i}    (8)
                 (0.157)
    t:           [6.187]
    n = 9, R² = ____

An empirical example: petroleum consumption. Suppose that we are interested in building a cross-sectional model of the demand for gasoline by state:

    C_i = β_0 + β_1 Mile_i + β_2 Tax_i + β_3 Reg_i + ε_i,

where
    C_i = the petroleum consumption in the ith state
    Mile_i = the urban highway miles within the ith state
    Tax_i = the gasoline tax rate in the ith state
    Reg_i = the motor vehicle registrations in the ith state.
Please analyze the following regression outputs:

    Ĉ_i = ____ + ____ Mile_i - 36.5 Tax_i - 0.061 Reg_i    (9)
                 (10.3)     (13.2)      (0.043)
    t:           [5.92]     [-2.77]     [-1.43]
    n = 50, R² = ____

    Ĉ_i = ____ + ____ Tax_i + ____ Reg_i    (10)
                 (16.9)     (0.012)
    t:           [-3.18]    [15.88]
    n = 50, R² = ____
6.3 The Detection of Multicollinearity

It is worth mentioning that multicollinearity exists in almost all equations; it is virtually impossible in the real world to find a set of independent variables that are totally uncorrelated with each other. Our purpose is to learn to determine how much multicollinearity exists, using three general indicators or diagnostic tools.

1. t-ratios versus R². Look for a high R² but few significant t-ratios.

Remarks. (1) Common rule of thumb: we cannot reject the null hypotheses that the coefficients are individually equal to zero (t-tests), but we can reject the null hypothesis that they are simultaneously equal to zero (F-test).

(2) This is not an exact test. What do we mean by "few significant t-ratios" and "a high R²"? Too imprecise. It also depends on other factors, like the sample size.
2. Correlation matrix of the regressors. Look at the correlation matrix for the regressors and look for high pairwise correlation coefficients.

Remarks. (1) How high is high? As a rule of thumb, we can use 0.8: if a sample correlation exceeds 0.8 in absolute value, we should be concerned about multicollinearity.

(2) Multicollinearity refers to a linear relationship among all or some of the regressors. No pair of independent variables may be highly correlated, and yet one variable may still be a near-linear function of a number of the others. (In a 2-variate regression, multicollinearity is simply the correlation between the 2 explanatory variables.)

(3) This is a sufficient, but not a necessary, condition for multicollinearity. In other words, if you have a high pairwise correlation, you have a problem; but the absence of high pairwise correlations is not conclusive evidence of the absence of multicollinearity.
3. High variance inflation factors (VIFs). The variance inflation factor (VIF) detects the degree of multicollinearity by looking at the extent to which a given explanatory variable can be explained by all the other explanatory variables in the equation. So there is a VIF for each regressor. Suppose we want to use VIFs to detect multicollinearity in the regression

    Y_i = β_0 + β_1 X_{1i} + ... + β_k X_{ki} + ε_i.    (11)

Let β̂_j denote the OLS estimator of β_j in the above regression. We need to calculate k different VIFs, one for each X_{ji} (j = 1, ..., k).

1) Run the following k auxiliary regressions:

    X_{1i} = γ_0 + γ_2 X_{2i} + ... + γ_k X_{ki} + v_{1i}
    ...
    X_{ki} = α_0 + α_1 X_{1i} + ... + α_{k-1} X_{k-1,i} + v_{ki}
2) Calculate the R² for each of the above k regressions, and denote by R²_j the R² from the linear regression of X_{ji} on all the other regressors in (11). The VIF for β̂_j is defined by

    VIF(β̂_j) = 1 / (1 - R²_j).

The higher VIF(β̂_j), the higher the variance of β̂_j (holding constant the variance of the error term), and the more severe the effects of multicollinearity.

Remarks. 1) How high is high? As a common rule of thumb, if VIF(β̂_j) > 5 for some j, then the multicollinearity is severe.

2) As the number of regressors increases, it makes sense to raise the cutoff (5) slightly.

3) In Eviews we can calculate VIF(β̂_j) after the jth auxiliary regression (i.e., run X_{ji} = α_0 + α_1 X_{1i} + ... + α_{j-1} X_{j-1,i} + α_{j+1} X_{j+1,i} + ... + α_k X_{ki} + v_{ji}, and name the equation eqj) by typing in the command window

    scalar VIFj=1/(1-eqj.@R2)

Summary: there is no single test for multicollinearity.
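As a quick illustration of the definition (not the Eviews route described above): with only two explanatory variables, the auxiliary R² is just the squared sample correlation, so VIF = 1/(1 - r²_{12}). The sketch below uses invented, nearly collinear data:

```python
import math

def corr(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 2.3, 2.9, 4.2, 5.1, 5.8]    # made-up series, nearly collinear with x1

r12 = corr(x1, x2)
vif = 1.0 / (1.0 - r12 ** 2)
# r12 is close to 1, so the VIF is far above the rule-of-thumb cutoff of 5
print(f"r12 = {r12:.3f}, VIF = {vif:.1f}")
```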
6.4 Remedies for Multicollinearity

Once we are convinced that multicollinearity is present, what can we do about it? The diagnosis of the ailment is not clear-cut, and neither is the treatment. The appropriateness of the following remedial measures varies from one situation to another.

Example. Estimating the labour supply of married women:

    Hours_t = β_0 + β_1 W_{w,t} + β_2 W_{m,t} + ε_t,    (12)

where
    Hours_t = average annual hours of work of married women
    W_{w,t} = average wage rate for married women
    W_{m,t} = average wage rate for married men.

Suppose the regression output is

    Hours^_t = ____ + ____ W_{w,t} + 22.91 W_{m,t}
                     (34.97)    (29.01)
    n = 50, R² = ____
Multicollinearity is a problem here. The t-ratios are less than 1.5 and 1 in absolute value, respectively (insignificant at the 10% level). Yet R² is high. It is easy to confirm multicollinearity in this case: the correlation between the two wage rates is as high as 0.99 over our sample period! The standard errors blow up, and we cannot separate the two wage effects on the labour supply of married women. Possible solutions?
1. A Priori Information

If we know the relationship between the slope coefficients, we can substitute this restriction into the regression and eliminate the multicollinearity. This relies heavily on economic theory.

Example. If we use time-series data to estimate the Cobb-Douglas production function, or the elasticity of output (Y) with respect to capital (K) and labor (L), we may have a multicollinearity problem: as time evolves, both K and L increase, and they can be highly correlated. Suppose that we have constant returns to scale in the Cobb-Douglas production function

    Y_t = A K_t^{β_1} L_t^{β_2} e^{ε_t}    (β_1 + β_2 = 1).

We can impose the restriction β_1 + β_2 = 1 in the regression

    ln Y_t = β_0 + β_1 ln K_t + β_2 ln L_t + ε_t
by plugging β_2 = 1 - β_1 into the above equation to obtain

    ln Y_t = β_0 + β_1 ln K_t + (1 - β_1) ln L_t + ε_t
    ln Y_t - ln L_t = β_0 + β_1 (ln K_t - ln L_t) + ε_t
    ln(Y_t/L_t) = β_0 + β_1 ln(K_t/L_t) + ε_t.

That is, we can estimate β_1 by regressing ln(Y_t/L_t) on a constant and ln(K_t/L_t). After we obtain the estimate β̂_1 of β_1, we can obtain an estimate of β_2 by β̂_2 = 1 - β̂_1.

Remarks. Unfortunately, such a priori information is extremely rare.
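The restricted regression can be checked on simulated data. In the sketch below (all numbers invented), output is generated from a known constant-returns technology with β_1 = 0.3 and no noise, and both elasticities are recovered from the single regression of ln(Y/L) on ln(K/L):

```python
import math

K = [10.0, 12.0, 15.0, 18.0, 22.0, 27.0]
L = [20.0, 23.0, 27.0, 33.0, 40.0, 48.0]
A, b1_true = 2.0, 0.3
# Generate output from Y = A * K^0.3 * L^0.7 (constant returns, noiseless)
Y = [A * k ** b1_true * l ** (1 - b1_true) for k, l in zip(K, L)]

# Simple OLS of ln(Y/L) on a constant and ln(K/L)
y = [math.log(yy / l) for yy, l in zip(Y, L)]
x = [math.log(k / l) for k, l in zip(K, L)]
mx, my = sum(x) / len(x), sum(y) / len(y)
b1_hat = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b2_hat = 1.0 - b1_hat              # recover beta2 from the restriction

print(round(b1_hat, 3), round(b2_hat, 3))   # 0.3 0.7
```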
2. Dropping a Variable

In the example of the labour supply of married women, suppose we omit the wage of married men and estimate the following model:

    Hours_t = α_0 + α_1 W_{w,t} + v_t.    (13)

In this example, it seems natural to drop the variable W_{m,t}. In other cases, it may make no statistical difference which variable is dropped; one has to rely on the theoretical underpinnings of the model, or on common sense.

A cautionary note. Sometimes we have to be careful when we consider dropping a variable in a case of multicollinearity. If a variable that should appear in the regression has been dropped, we encounter the problem of omitted variable bias, so we are substituting one problem for another; the remedy may be worse than the disease. Suppose that W_{m,t} should appear in (12). Then the OLS estimator α̂_1 in (13) is likely to be biased for β_1:

    E(α̂_1) = β_1 + β_2 b_{12},

where b_{12} is associated with the correlation between W_{w,t} and W_{m,t}.
3. Transformation of the Variables

One of the simplest things to do with time-series regressions is to run the regression on first-differenced data. Start with the original specification at time t:

    Hours_t = β_0 + β_1 W_{w,t} + β_2 W_{m,t} + ε_t.    (14)

The same linear relationship holds for the previous period (t-1) as well:

    Hours_{t-1} = β_0 + β_1 W_{w,t-1} + β_2 W_{m,t-1} + ε_{t-1}.    (15)

Subtracting (15) from (14) yields

    Hours_t - Hours_{t-1} = β_1 (W_{w,t} - W_{w,t-1}) + β_2 (W_{m,t} - W_{m,t-1}) + (ε_t - ε_{t-1}),    (16)

or

    ΔHours_t = β_1 ΔW_{w,t} + β_2 ΔW_{m,t} + Δε_t,    (17)

where, e.g., ΔHours_t = Hours_t - Hours_{t-1}. The advantage is that changes in wage rates may not be as highly correlated as their levels.
The disadvantages are:

(i) The number of observations is reduced by one (i.e., a degree of freedom is lost). The sample period changes from ____ to ____, say.

(ii) It may introduce serial correlation. Even if the ε_t are uncorrelated, the Δε_t are not, because

    Cov(Δε_t, Δε_{t-1}) = Cov(ε_t - ε_{t-1}, ε_{t-1} - ε_{t-2})
                        = Cov(ε_t, ε_{t-1}) - Cov(ε_t, ε_{t-2}) - Cov(ε_{t-1}, ε_{t-1}) + Cov(ε_{t-1}, ε_{t-2})
                        = 0 - 0 - Var(ε_{t-1}) + 0 = -Var(ε_{t-1}) ≠ 0.

Again, the cure may be worse than the disease: it violates one of the classical assumptions, and new problems need to be addressed (in a later topic).
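Since Var(Δε_t) = 2Var(ε) for i.i.d. errors, the covariance derived above implies a first-order autocorrelation of Δε_t equal to -Var(ε)/(2Var(ε)) = -0.5. A small simulation (illustrative only, with invented white-noise errors) confirms this:

```python
import random

random.seed(0)
n = 100_000
eps = [random.gauss(0.0, 1.0) for _ in range(n)]        # i.i.d. errors
d = [eps[t] - eps[t - 1] for t in range(1, n)]          # first differences

def autocorr1(z):
    """First-order sample autocorrelation of a series."""
    m = sum(z) / len(z)
    num = sum((z[t] - m) * (z[t - 1] - m) for t in range(1, len(z)))
    den = sum((v - m) ** 2 for v in z)
    return num / den

print(round(autocorr1(d), 2))   # close to -0.5
```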
4. Get More Data

Two possibilities here:

(1) Extend the data set. Multicollinearity is a sample phenomenon. Wage rates may be correlated over the sample period; add more years, for example by going further back in time, and the correlation may be reduced. The problem is that more data may not be available, or the relationship among the variables may have changed (i.e., the regression function is not stable over time). More likely, the data are simply not there; and if they are there, why not include them from the start?

(2) Change the nature or source of the data. If possible, we can switch from time-series analysis to cross-sectional or panel data analysis. The sample correlation in cross-sectional data is usually different from that in time-series data, and the use of panel data potentially reduces the multicollinearity in the total sample. For example, we can use a random sample of many households at a point in time; the degree of multicollinearity in wages may be relatively lower between spouses. Or we can use a random sample of households over a number of years.
5. Do Nothing (A Remedy!)

Multicollinearity is not a problem if the objective of the analysis is forecasting: it does not affect the overall explanatory power of the regression (i.e., R²). It is a problem if the objective is to test the significance of individual coefficients, because of the inflated variances/standard errors.

Multicollinearity is often given too much emphasis in the list of common problems with regression analysis. If it is imperfect multicollinearity, which is almost always the case, then it does not violate the classical assumptions.
Exercise: Q8.11

Questions for Discussion: Example 8.5.2

Example 8.5.2: Did the Pope's 1966 decision to allow Catholics to eat meat on non-Lent Fridays cause a shift in the demand function for fish? Consider the regression

    F_t = β_0 + β_1 PF_t + β_2 PB_t + β_3 ln Yd_t + β_4 N_t + β_5 P_t + ε_t,

where
    F_t: average pounds of fish consumed per capita in year t
    PF_t: price index for fish in year t
    PB_t: price index for beef in year t
    Yd_t: real per capita disposable income in year t (in billions of dollars)
    N_t: the number of Catholics in the US in year t (tens of thousands)
    P_t: = 1 after the Pope's 1966 decision and 0 otherwise.

Question 1: State the null and alternative hypotheses to test whether the Pope's decision played a negative role in the consumption of fish.

Question 2: Some economic theory suggests that as income rises, the portion of
that extra income devoted to the consumption of fish will decrease. Is the choice of a semilog function to relate disposable income to the consumption of fish consistent with this theory?

Question 3: Suppose the regression output is

    F̂_t = ____ + ____ PF_t + ____ PB_t + ____ ln Yd_t + ____ N_t + ____ P_t
                 (0.031)    (0.0202)   (1.87)        ( )       (0.353)
    t:           [1.27]     [ ]        [0.945]       [-0.958]  [-1.01]
    R² = 0.736, R̄² = 0.666, n = 25.

Evaluate the above regression results.

Question 4: Are there any signs of multicollinearity in the above regression model? How do you check for this by using simple correlation coefficients? What is the drawback of this approach? [Hint. To detect multicollinearity with simple correlation coefficients: after you run the regression, select Procs/Make Regressor Group on the equation window menu bar, then select View/Correlation/Common Sample on the group object menu bar.]
Question 5: How do you check for the presence of multicollinearity by using the VIF? Verify that the VIFs for PF_t and ln Yd_t are about 43.4 and 23.3, respectively. What does this suggest to us?

Question 6: Given the high correlation between ln Yd_t and N_t, it is reasonable to drop one of them. Given that the logic behind including the number of Catholics in a per capita fish consumption equation is fairly weak, we can decide to drop N_t:

    F̂_t = ____ + ____ PF_t + ____ PB_t + ____ ln Yd_t + ____ P_t
                 (0.03)     (0.019)    (1.15)        (0.26)
    t:           [0.98]     [0.24]     [0.31]        [-0.48]
    R² = 0.723, R̄² = 0.667, n = 25.

Does this solve the problem?

Question 7: In the case of prices, both PF_t and PB_t are theoretically important, so it is not advisable to drop either one. As an alternative, the textbook author suggests using RP_t = PF_t/PB_t to replace both price variables. Does it make any
sense to do so? If so, what is the expected sign of the coefficient on RP_t? The regression output now becomes

    F̂_t = ____ + ____ RP_t + ____ ln Yd_t + ____ P_t
                 (1.43)     (0.66)       (0.281)
    t:           [-1.35]    [4.13]       [0.019]
    R² = 0.640, R̄² = 0.588, n = 25.

Question 8: Based on the last regression output, can you reject the null hypothesis in Question 1?

Remark. To calculate VIF(β̂_j) (j = 1, ..., k) in Eviews:

Step 1: Run the regression of X_{ji} on (1, X_{1i}, ..., X_{j-1,i}, X_{j+1,i}, ..., X_{ki}) and name the equation eqj, for example.

Step 2: In the command window, type

    scalar vifj=1/(1-eqj.@r2)

or

    genr vifj=1/(1-eqj.@r2)

The former generates a scalar value for vifj, whereas the latter generates a series of values for vifj.
More information1 Motivation for Instrumental Variable (IV) Regression
ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data
More informationHypothesis Tests and Confidence Intervals in Multiple Regression
Hypothesis Tests and Confidence Intervals in Multiple Regression (SW Chapter 7) Outline 1. Hypothesis tests and confidence intervals for one coefficient. Joint hypothesis tests on multiple coefficients
More informationMotivation for multiple regression
Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope
More information8. Instrumental variables regression
8. Instrumental variables regression Recall: In Section 5 we analyzed five sources of estimation bias arising because the regressor is correlated with the error term Violation of the first OLS assumption
More informationRegression Analysis. BUS 735: Business Decision Making and Research
Regression Analysis BUS 735: Business Decision Making and Research 1 Goals and Agenda Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn
More informationEcon 510 B. Brown Spring 2014 Final Exam Answers
Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationApplied Statistics and Econometrics. Giuseppe Ragusa Lecture 15: Instrumental Variables
Applied Statistics and Econometrics Giuseppe Ragusa Lecture 15: Instrumental Variables Outline Introduction Endogeneity and Exogeneity Valid Instruments TSLS Testing Validity 2 Instrumental Variables Regression
More informationLecture 3: Multiple Regression
Lecture 3: Multiple Regression R.G. Pierse 1 The General Linear Model Suppose that we have k explanatory variables Y i = β 1 + β X i + β 3 X 3i + + β k X ki + u i, i = 1,, n (1.1) or Y i = β j X ji + u
More information1. The OLS Estimator. 1.1 Population model and notation
1. The OLS Estimator OLS stands for Ordinary Least Squares. There are 6 assumptions ordinarily made, and the method of fitting a line through data is by least-squares. OLS is a common estimation methodology
More informationEconometrics. 7) Endogeneity
30C00200 Econometrics 7) Endogeneity Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Common types of endogeneity Simultaneity Omitted variables Measurement errors
More informationOutline. Nature of the Problem. Nature of the Problem. Basic Econometrics in Transportation. Autocorrelation
1/30 Outline Basic Econometrics in Transportation Autocorrelation Amir Samimi What is the nature of autocorrelation? What are the theoretical and practical consequences of autocorrelation? Since the assumption
More informationHeteroscedasticity 1
Heteroscedasticity 1 Pierre Nguimkeu BUEC 333 Summer 2011 1 Based on P. Lavergne, Lectures notes Outline Pure Versus Impure Heteroscedasticity Consequences and Detection Remedies Pure Heteroscedasticity
More informationECONOMETRICS HONOR S EXAM REVIEW SESSION
ECONOMETRICS HONOR S EXAM REVIEW SESSION Eunice Han ehan@fas.harvard.edu March 26 th, 2013 Harvard University Information 2 Exam: April 3 rd 3-6pm @ Emerson 105 Bring a calculator and extra pens. Notes
More information2 Prediction and Analysis of Variance
2 Prediction and Analysis of Variance Reading: Chapters and 2 of Kennedy A Guide to Econometrics Achen, Christopher H. Interpreting and Using Regression (London: Sage, 982). Chapter 4 of Andy Field, Discovering
More informationdownload instant at
Answers to Odd-Numbered Exercises Chapter One: An Overview of Regression Analysis 1-3. (a) Positive, (b) negative, (c) positive, (d) negative, (e) ambiguous, (f) negative. 1-5. (a) The coefficients in
More informationWISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A
WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, 2015-16 Academic Year Exam Version: A INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This
More informationFöreläsning /31
1/31 Föreläsning 10 090420 Chapter 13 Econometric Modeling: Model Speci cation and Diagnostic testing 2/31 Types of speci cation errors Consider the following models: Y i = β 1 + β 2 X i + β 3 X 2 i +
More informationIntroduction to Econometrics. Heteroskedasticity
Introduction to Econometrics Introduction Heteroskedasticity When the variance of the errors changes across segments of the population, where the segments are determined by different values for the explanatory
More informationMaking sense of Econometrics: Basics
Making sense of Econometrics: Basics Lecture 4: Qualitative influences and Heteroskedasticity Egypt Scholars Economic Society November 1, 2014 Assignment & feedback enter classroom at http://b.socrative.com/login/student/
More information2. Linear regression with multiple regressors
2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions
More informationNonlinear Regression Functions
Nonlinear Regression Functions (SW Chapter 8) Outline 1. Nonlinear regression functions general comments 2. Nonlinear functions of one variable 3. Nonlinear functions of two variables: interactions 4.
More informationPanel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43
Panel Data March 2, 212 () Applied Economoetrics: Topic March 2, 212 1 / 43 Overview Many economic applications involve panel data. Panel data has both cross-sectional and time series aspects. Regression
More informationMultiple Regression Methods
Chapter 1: Multiple Regression Methods Hildebrand, Ott and Gray Basic Statistical Ideas for Managers Second Edition 1 Learning Objectives for Ch. 1 The Multiple Linear Regression Model How to interpret
More informationCHAPTER 6: SPECIFICATION VARIABLES
Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero
More informationThe general linear regression with k explanatory variables is just an extension of the simple regression as follows
3. Multiple Regression Analysis The general linear regression with k explanatory variables is just an extension of the simple regression as follows (1) y i = β 0 + β 1 x i1 + + β k x ik + u i. Because
More informationRegression #8: Loose Ends
Regression #8: Loose Ends Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #8 1 / 30 In this lecture we investigate a variety of topics that you are probably familiar with, but need to touch
More informationFinal Exam - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis March 17, 2010 Instructor: John Parman Final Exam - Solutions You have until 12:30pm to complete this exam. Please remember to put your
More informationRecent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data
Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)
More informationChapter 7. Hypothesis Tests and Confidence Intervals in Multiple Regression
Chapter 7 Hypothesis Tests and Confidence Intervals in Multiple Regression Outline 1. Hypothesis tests and confidence intervals for a single coefficie. Joint hypothesis tests on multiple coefficients 3.
More informationLectures 5 & 6: Hypothesis Testing
Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across
More informationSociology 593 Exam 1 Answer Key February 17, 1995
Sociology 593 Exam 1 Answer Key February 17, 1995 I. True-False. (5 points) Indicate whether the following statements are true or false. If false, briefly explain why. 1. A researcher regressed Y on. When
More informationEcon 444, class 11. Robert de Jong 1. Monday November 6. Ohio State University. Econ 444, Wednesday November 1, class Department of Economics
Econ 444, class 11 Robert de Jong 1 1 Department of Economics Ohio State University Monday November 6 Monday November 6 1 Exercise for today 2 New material: 1 dummy variables 2 multicollinearity Exercise
More informationOutline. Possible Reasons. Nature of Heteroscedasticity. Basic Econometrics in Transportation. Heteroscedasticity
1/25 Outline Basic Econometrics in Transportation Heteroscedasticity What is the nature of heteroscedasticity? What are its consequences? How does one detect it? What are the remedial measures? Amir Samimi
More informationSTA121: Applied Regression Analysis
STA121: Applied Regression Analysis Linear Regression Analysis - Chapters 3 and 4 in Dielman Artin Department of Statistical Science September 15, 2009 Outline 1 Simple Linear Regression Analysis 2 Using
More informationInference in Regression Model
Inference in Regression Model Christopher Taber Department of Economics University of Wisconsin-Madison March 25, 2009 Outline 1 Final Step of Classical Linear Regression Model 2 Confidence Intervals 3
More informationEconometrics Honor s Exam Review Session. Spring 2012 Eunice Han
Econometrics Honor s Exam Review Session Spring 2012 Eunice Han Topics 1. OLS The Assumptions Omitted Variable Bias Conditional Mean Independence Hypothesis Testing and Confidence Intervals Homoskedasticity
More informationUnless provided with information to the contrary, assume for each question below that the Classical Linear Model assumptions hold.
Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 1 SOLUTIONS Spring 2015 Instructor: Martin Farnham Unless provided with information to the contrary,
More informationLecture 4: Multivariate Regression, Part 2
Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above
More informationRepeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data
Panel data Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data - possible to control for some unobserved heterogeneity - possible
More information:Effects of Data Scaling We ve already looked at the effects of data scaling on the OLS statistics, 2, and R 2. What about test statistics?
MRA: Further Issues :Effects of Data Scaling We ve already looked at the effects of data scaling on the OLS statistics, 2, and R 2. What about test statistics? 1. Scaling the explanatory variables Suppose
More informationWooldridge, Introductory Econometrics, 4th ed. Chapter 6: Multiple regression analysis: Further issues
Wooldridge, Introductory Econometrics, 4th ed. Chapter 6: Multiple regression analysis: Further issues What effects will the scale of the X and y variables have upon multiple regression? The coefficients
More informationSolutions: Monday, October 15
Amherst College Department of Economics Economics 360 Fall 2012 1. Consider Nebraska petroleum consumption. Solutions: Monday, October 15 Petroleum Consumption Data for Nebraska: Annual time series data
More informationMULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS Page 1 MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level
More informationWISE International Masters
WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are
More informationEconometrics Problem Set 11
Econometrics Problem Set WISE, Xiamen University Spring 207 Conceptual Questions. (SW 2.) This question refers to the panel data regressions summarized in the following table: Dependent variable: ln(q
More informationChapter 7 Student Lecture Notes 7-1
Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model
More informationEconometrics Homework 1
Econometrics Homework Due Date: March, 24. by This problem set includes questions for Lecture -4 covered before midterm exam. Question Let z be a random column vector of size 3 : z = @ (a) Write out z
More informationRewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35
Rewrap ECON 4135 November 18, 2011 () Rewrap ECON 4135 November 18, 2011 1 / 35 What should you now know? 1 What is econometrics? 2 Fundamental regression analysis 1 Bivariate regression 2 Multivariate
More informationIntroductory Econometrics
Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction
More informationViolation of OLS assumption- Multicollinearity
Violation of OLS assumption- Multicollinearity What, why and so what? Lars Forsberg Uppsala University, Department of Statistics October 17, 2014 Lars Forsberg (Uppsala University) 1110 - Multi - co -
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 7 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 68 Outline of Lecture 7 1 Empirical example: Italian labor force
More informationInference with Simple Regression
1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems
More information6. Assessing studies based on multiple regression
6. Assessing studies based on multiple regression Questions of this section: What makes a study using multiple regression (un)reliable? When does multiple regression provide a useful estimate of the causal
More informationECON Introductory Econometrics. Lecture 13: Internal and external validity
ECON4150 - Introductory Econometrics Lecture 13: Internal and external validity Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 9 Lecture outline 2 Definitions of internal and external
More information