Topic 4: Model Specifications
Advanced Econometrics (I)
Dong Chen
School of Economics, Peking University

1 Functional Forms

1.1 Redefining Variables

Changing the unit of measurement of a variable changes the coefficient estimates and their standard errors, but it does not alter the significance of the variables or their interpretation. The results of the t tests, the F test, and R² therefore remain unchanged. If a variable enters the model in log form, rescaling it changes only the intercept, not the slope, because ln(cx) = ln c + ln x. For example:

    y = β₀ + β₁x₁ + ε,    (1)
    y = β₀ + β₁ ln x₁ + ε,    (2)
    ln y = β₀ + β₁x₁ + ε,    (3)
    ln y = β₀ + β₁ ln x₁ + ε.    (4)

1.2 β Coefficients

Sometimes, when the meaning of a marginal change in the original variable is unclear, we may report the so-called β coefficients rather than the regular coefficient estimates. To do this, replace the dependent and independent variables by their standardized values:

    z_y = (y − ȳ)/σ_y,    (5)
    z_j = (x_j − x̄_j)/σ_j.    (6)

The coefficients are then interpreted as the change in the dependent variable y, measured in standard deviations, associated with a one-standard-deviation change in an independent variable x_j. This eliminates the impact of the units of measurement, while the economic and statistical significance remain unchanged. Consider the following example:

    y_i = b₀ + b₁x_i1 + … + b_k x_ik + e_i.    (7)

Taking the sample average, we have

    ȳ = b₀ + b₁x̄₁ + … + b_k x̄_k.    (8)

Taking the difference,

    y_i − ȳ = b₁(x_i1 − x̄₁) + … + b_k(x_ik − x̄_k) + e_i.    (9)

Dividing both sides by σ_y, we have

    (y_i − ȳ)/σ_y = (b₁σ₁/σ_y)·(x_i1 − x̄₁)/σ₁ + … + (b_kσ_k/σ_y)·(x_ik − x̄_k)/σ_k + e_i/σ_y.    (10)

In the above model,

    b_jσ_j/σ_y,  j = 1, 2, …, k,    (11)

is called the β coefficient. The interpretation is that when x_j changes by one standard deviation, y changes by b_jσ_j/σ_y of its standard deviation.
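To make the computation of β coefficients concrete, here is a minimal Python sketch using only numpy. The data, variable names, and parameter values are simulated purely for illustration (nothing here comes from the notes); the point is that rescaling the raw OLS slopes by σ_j/σ_y and re-running OLS on standardized variables give identical β coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(10.0, 2.0, n)              # illustrative regressors
x2 = rng.normal(50.0, 15.0, n)
y = 1.0 + 0.8 * x1 + 0.05 * x2 + rng.normal(0.0, 1.0, n)

# Route 1: OLS on the raw variables, then rescale by sigma_j / sigma_y.
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]           # [b0, b1, b2]
beta_rescaled = b[1:] * np.array([x1.std(), x2.std()]) / y.std()

# Route 2: OLS on standardized variables (no intercept needed,
# since every standardized variable has mean zero).
Z = np.column_stack([(x1 - x1.mean()) / x1.std(),
                     (x2 - x2.mean()) / x2.std()])
zy = (y - y.mean()) / y.std()
beta_direct = np.linalg.lstsq(Z, zy, rcond=None)[0]

print(beta_rescaled)  # b_j * sigma_j / sigma_y, as in eq. (11)
print(beta_direct)    # identical up to floating-point error
```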
1.3 Natural Logarithm

It is sometimes useful to express variables in natural logarithms. There are several reasons for this:

1. In a log-linear model, changes in the variables are measured in percentage terms, so the estimates are independent of the units of measurement.
2. The coefficients can be interpreted as elasticities.
3. When y > 0, taking natural logs can reduce heteroskedasticity and skewness.
4. The values of ln y are more concentrated than those of y, which reduces the impact of extreme values on the estimates.

Natural logs are usually applied to variables that take large positive values, such as population, income, and GDP. Variables measured as percentages or proportions should not be expressed in natural logs. The following are some examples.

In the model

    ln y = β₀ + β₁ ln x + ε,    (12)

β₁ is interpreted as the elasticity of y with respect to x. This model is also called a log-linear model.

In the model

    ln y = β₀ + β₁x + ε,    (13)

100·β₁ is interpreted as the approximate percentage change in y associated with a one-unit change in x.

In the model

    y = β₀ + β₁ ln x + ε,    (14)

β₁/100 is interpreted as the approximate change in y associated with a one-percent change in x.
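As a quick numerical check of the elasticity interpretation in (12), the following sketch simulates a log-linear relationship with a true elasticity of 0.75; all values here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.lognormal(mean=10.0, sigma=0.5, size=n)   # large positive variable, e.g. income
log_y = 0.5 + 0.75 * np.log(x) + rng.normal(0.0, 0.1, n)

# Model (12): regress ln y on ln x.
X = np.column_stack([np.ones(n), np.log(x)])
b0, b1 = np.linalg.lstsq(X, log_y, rcond=None)[0]
print(b1)  # close to 0.75: a 1% increase in x raises y by about 0.75%
```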
1.4 Quadratic Terms

Consider the model

    y = β₀ + β₁x + β₂x² + ε.    (15)

Now

    Δy ≈ (β₁ + 2β₂x)·Δx.    (16)

Suppose β₁ > 0 and β₂ < 0. In this case y increases with x at first, but eventually it decreases with x. If β₁ < 0 and β₂ > 0, then y decreases with x at first, but eventually it increases with x. The turning point is where dy/dx = 0, which yields

    x* = −β₁/(2β₂).    (17)

1.5 Models with Interaction Terms

Consider the model

    y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + ε.    (18)

When we interpret β₁, we cannot consider the change in x₁ alone; we must also account for β₃, because

    ∂y/∂x₁ = β₁ + β₃x₂.    (19)

Therefore, when we consider the impact of a change in x₁ on y, we need to take the level of x₂ into account as well. Usually we evaluate x₂ at its sample mean, x̄₂.

Example 1: Consider the following estimated model, with standard errors in parentheses, n = 680, R² = 0.229, and R̄² = 0.222:

    stndfnl^ = 2.05 − 0.0067 atndrte − 1.63 priGPA − 0.128 ACT
              (1.36)  (0.0102)         (0.48)        (0.098)
             + 0.296 priGPA² + 0.0045 ACT² + 0.0056 priGPA·atndrte.
              (0.101)           (0.0022)      (0.0043)

If we look only at the coefficient on atndrte (−0.0067), we would conclude that attendance has a negative, though statistically insignificant (t = −0.66), effect on stndfnl. However, the coefficient on atndrte alone cannot reveal the relationship between atndrte and stndfnl, because it measures the effect only when priGPA = 0, which is not meaningful for the current problem. If we instead evaluate priGPA at its sample mean of 2.59, the partial effect is

    −0.0067 + 0.0056 × 2.59 ≈ 0.0078.
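Both "evaluate the derivative" calculations above take one line each. In the sketch below, the interaction numbers come from Example 1, while the quadratic coefficients are made-up values chosen only to illustrate eq. (17).

```python
# Partial effect of atndrte, eq. (19), using the Example 1 estimates,
# evaluated at the sample mean of priGPA (2.59).
b_atndrte, b_interact = -0.0067, 0.0056
print(b_atndrte + b_interact * 2.59)   # approx. 0.0078

# Turning point of a quadratic, eq. (17): x* = -b1 / (2 * b2).
# Illustrative values only: b1 > 0 and b2 < 0, so y rises then falls.
b1, b2 = 0.30, -0.004
print(-b1 / (2 * b2))                  # x* = 37.5
```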
2 Omission of Relevant Variables

So far we have assumed that the correct model specification is

    y = Xβ + ε.    (20)

There are many types of errors one might make in specifying the regressor set, two of which are very common: (1) excluding relevant variables, and (2) including irrelevant variables. What happens to the properties of the least squares estimator in the face of these mistakes?

Suppose the true data generating process (DGP) is

    y = X₁β₁ + X₂β₂ + ε,    (21)

where y is n×1, X₁ is n×K₁, β₁ is K₁×1, X₂ is n×K₂, β₂ is K₂×1, and ε ~ N(0, σ²I). But we fit the model

    y = X₁β₁ + u,    (22)

where u = X₂β₂ + ε. That is, we wrongly exclude the regressors in X₂. Normally we do not know that we have done this and apply OLS to (22), assuming u ~ N(0, σ²I). This is incorrect. In fact u = X₂β₂ + ε, and thus

    u ~ N(X₂β₂, σ²I).    (23)

That is, the error term has a non-zero mean because of the omission of relevant regressors.

The least squares estimator of β₁ from (22) is

    b₁ = (X₁′X₁)⁻¹X₁′y
       = (X₁′X₁)⁻¹X₁′(X₁β₁ + X₂β₂ + ε)
       = β₁ + (X₁′X₁)⁻¹X₁′X₂β₂ + (X₁′X₁)⁻¹X₁′ε.    (24)

Therefore,

    E(b₁) = β₁ + (X₁′X₁)⁻¹X₁′X₂β₂.    (25)

So b₁ is no longer an unbiased estimator of β₁ when we have omitted variables. Note that b₁ is unbiased only if X₂β₂ = 0 (i.e., we have not in fact omitted relevant regressors) or if X₁′X₂ = 0 (i.e., X₁ and X₂ are orthogonal).
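A small Monte Carlo makes the bias formula (25) concrete. The DGP below is an assumption chosen so that x₁ and x₂ are positively correlated and β₂ > 0, so the short regression should be biased upward; this is a sketch, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 2000
beta1, beta2 = 1.0, 2.0

b1_short = np.empty(reps)
for r in range(reps):
    x2 = rng.normal(size=n)
    x1 = 0.5 * x2 + rng.normal(size=n)       # x1 correlated with x2
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # Short regression (22): omit x2 and regress y on x1 alone.
    b1_short[r] = (x1 @ y) / (x1 @ x1)

# Mean is roughly beta1 + Cov(x1,x2)/Var(x1) * beta2 = 1 + 0.4*2 = 1.8,
# i.e. the (X1'X1)^{-1} X1'X2 beta2 term in eq. (25).
print(b1_short.mean())
```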
In addition, if we do not realize there is an omitted-variable problem, we would compute the variance of b₁ as

    Var(b₁) = σ²(X₁′X₁)⁻¹.    (26)

But this is not the appropriate formula. To see the correct one, apply least squares to the true DGP (21), which can be written as

    y = Xβ + ε,

where X = [X₁ X₂] is n×K, β = (β₁′, β₂′)′ is K×1, and K = K₁ + K₂. With the new notation, we have

    Var(b) = σ²(X′X)⁻¹    (27)
           = [ Var(b₁)      Cov(b₁, b₂) ]
             [ Cov(b₂, b₁)  Var(b₂)     ],    (28)

where b = (b₁′, b₂′)′. Therefore, the coefficients on X₁ have a covariance matrix equal to the upper-left block of (28), namely Var(b₁). It can be shown that

    Var(b₁) = σ²( X₁′X₁ − X₁′X₂(X₂′X₂)⁻¹X₂′X₁ )⁻¹    (29)
            = σ²(X₁′M₂X₁)⁻¹    (30)
            ≥ σ²(X₁′X₁)⁻¹,    (31)

where M₂ = I − X₂(X₂′X₂)⁻¹X₂′. The above analysis shows that if we omit relevant regressors, we use the wrong formula for the variance-covariance matrix of the least squares estimator. (Again, if X₁ and X₂ are orthogonal, the variance-covariance matrix has the correct form. However, as shown below, the usual estimator of σ² is still wrong even under this condition.)

Furthermore, in the presence of an omitted-variable problem, the usual OLS estimator of σ² is s² = e₁′e₁/(n − K₁), where e₁ = y − X₁b₁. It can be shown that

    E(e₁′e₁) = β₂′X₂′M₁X₂β₂ + σ²(n − K₁),    (32)

where M₁ = I − X₁(X₁′X₁)⁻¹X₁′. Therefore,

    E(s²) = E( e₁′e₁/(n − K₁) )    (33)
          = σ² + β₂′X₂′M₁X₂β₂/(n − K₁)    (34)
          ≥ σ².    (35)

So s² is a biased estimator of σ² when there are omitted variables. Note that the bias of s² disappears only if X₂β₂ = 0. Unlike b₁, which would be unbiased if X₁′X₂ = 0, s² remains biased even when X₁′X₂ = 0.

In summary, omitting relevant variables leads to a biased estimator of the coefficients, an incorrect variance-covariance matrix, and a biased estimator of the standard error of the regression. (Equivalently, we can view omitting X₂ as estimating (21) subject to the incorrect restriction β₂ = 0, which, as shown in previous chapters, leads to a biased estimator.) These problems affect the properties of every test we perform on the model and make correct inference on β₁ impossible.
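Before turning to the opposite mistake, the bias of s² in (34) can also be checked by simulation. Here x₁ and x₂ are drawn independently, so X₁′X₂ ≈ 0 and b₁ is nearly unbiased, yet s² still overshoots σ². Parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 2000
beta1, beta2, sigma2 = 1.0, 2.0, 1.0

s2 = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)            # independent of x1: X1'X2 near 0
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    b1 = (x1 @ y) / (x1 @ x1)          # short regression omitting x2
    e1 = y - x1 * b1
    s2[r] = (e1 @ e1) / (n - 1)        # K1 = 1 regressor

# Roughly sigma2 + beta2^2 = 5 here, far above sigma2 = 1: the omitted
# signal X2*beta2 ends up in the residuals, as eq. (34) predicts.
print(s2.mean())
```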
3 Inclusion of Irrelevant Variables

Suppose the true DGP is

    y = X₁β₁ + ε,  ε ~ N(0, σ²I),    (36)

but we estimate the model

    y = X₁β₁ + X₂β₂ + u.    (37)

That is, we have included extra (irrelevant) variables, X₂. We can write (37) as

    y = Xβ + u,    (38)

where X = [X₁ X₂] is n×K, β = (β₁′, β₂′)′, and K = K₁ + K₂.

In terms of the effect on the properties of the least squares estimator of β₁, including irrelevant variables is not as serious as omitting relevant regressors. Estimating (38) by OLS, we have

    b = (X′X)⁻¹X′y    (39)
      = (X′X)⁻¹X′X₁β₁ + (X′X)⁻¹X′ε.    (40)

Therefore,

    E(b) = (X′X)⁻¹X′X₁β₁.    (41)

Define the K×K₁ selection matrix

    A = [ I ]    (I is K₁×K₁, 0 is K₂×K₁),  such that XA = X₁.    (42)
        [ 0 ]

Then we can write

    E(b) = (X′X)⁻¹X′XAβ₁ = Aβ₁ = [ I ] β₁,
                                 [ 0 ]

or

    E[ b₁ ]   [ β₁ ]
     [ b₂ ] = [ 0  ].    (43)

Expression (43) implies that b₁ and b₂ are unbiased estimators of β₁ and of β₂ = 0, respectively. It can also be shown that

    E(s²) = σ²,    (44)

i.e., s² is an unbiased estimator of σ² even in the face of this misspecification.

The cost of including irrelevant variables lies in the reduced precision of the estimates (illustrated by the simulation sketch below).

1. Additional regressors mean a loss of degrees of freedom: we try to estimate more parameters from the same number of observations, which leads to a loss of precision.
2. The variance of the estimator from the over-specified model will be higher. We may think of the situation as having failed to impose the valid restriction β₂ = 0 in (37). We know that the variance of the RLS estimator is less than that of OLS when the restrictions are true, so that result applies here.

Remark 1: Ultimately, the choice among models should be guided by economic theory.
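A companion simulation for this section, under assumed parameter values: the long regression includes an irrelevant x₂ that is correlated with x₁. Both estimators of β₁ are unbiased, but the over-specified model is noticeably less precise, as the RLS argument above predicts.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, beta1 = 100, 2000, 1.0

b_true, b_long = np.empty(reps), np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # irrelevant, correlated with x1
    y = beta1 * x1 + rng.normal(size=n)        # true DGP (36) excludes x2
    b_true[r] = (x1 @ y) / (x1 @ x1)           # correct model
    X = np.column_stack([x1, x2])              # over-specified model (37)
    b_long[r] = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(b_true.mean(), b_long.mean())  # both close to 1.0: no bias
print(b_true.var(), b_long.var())    # variance larger in the long model
```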
Remark 2: The discussion also suggests that we need a systematic approach to variable selection. Recall that we have already seen some criteria.

R² = 1 − e′e/SST. However, e′e always stays the same or decreases when additional variables are added, irrespective of their relevance. Therefore, R² is not recommended.

R̄² = 1 − [e′e/(n − K)] / [SST/(n − 1)]. This is a better criterion than R², but some researchers have argued that R̄² does not penalize the loss of degrees of freedom enough when variables are added. This leads to other model selection criteria.

Some model selection criteria (a Python sketch implementing all three follows the list):

1. FPE: Akaike's (1969, 1979) Final Prediction Error criterion. In the present context,

    FPE(K_j) = [e′e/(n − K_j)]·[(n + K_j)/n],    (45)

where K_j is the number of parameters in model j. It is also known as Amemiya's (1985) Prediction Criterion (PC).

2. AIC: Akaike's (1974) Information Criterion. In the present context,

    ln(AIC_j) = ln(e′e/n) + 2K_j/n.    (46)

3. SC: Schwarz's (1978) Criterion.

    ln(SC_j) = ln(e′e/n) + K_j·ln(n)/n.    (47)
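The three criteria are straightforward to compute. A sketch on simulated data, where the true model uses three of five candidate regressors; all names and values are illustrative.

```python
import numpy as np

def fpe(e, n, k):
    """Final Prediction Error, eq. (45)."""
    return (e @ e / (n - k)) * ((n + k) / n)

def log_aic(e, n, k):
    """Akaike information criterion in the log form of eq. (46)."""
    return np.log(e @ e / n) + 2 * k / n

def log_sc(e, n, k):
    """Schwarz criterion in the log form of eq. (47)."""
    return np.log(e @ e / n) + k * np.log(n) / n

rng = np.random.default_rng(5)
n = 200
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X_full[:, :3] @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

for k in (3, 5):   # 3-parameter true model vs. 5-parameter over-fit model
    Xk = X_full[:, :k]
    e = y - Xk @ np.linalg.lstsq(Xk, y, rcond=None)[0]
    print(k, fpe(e, n, k), log_aic(e, n, k), log_sc(e, n, k))
# All three criteria are "smaller is better" and should typically
# favor the 3-parameter specification here.
```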