Answers: Problem Set 9. Dynamic Models

1. Given annual data for the period 1970-1999, you undertake an OLS regression of log Y on a time trend, defined as taking the value 1 in 1970, 2 in 1971, etc. The table gives the estimated regression coefficients. Standard errors are given in brackets, DW is the Durbin-Watson statistic and h is the Durbin h statistic.

Dependent Variable: National Income, Log Y

                      (1) OLS            (2) OLS
    Time Trend        0.026 (0.0004)     0.007 (0.003)
    Log Y(t-1)                           0.713 (0.100)
    Constant          10.598 (0.335)     3.066 (1.168)
    rho-hat           0.90               0.20
    R-squared         0.990              0.995

Since $DW \approx 2(1 - \hat{\rho})$:

In (1) DW = 2(1 - 0.9) = 0.2. Given T = 30 and k = 1, the critical values at the 5% level for Durbin-Watson are $d_{low}$ = 1.35 and $d_{upper}$ = 1.49. So observed DW < $d_{low}$. Conclude there is evidence of positive first order autocorrelation. This means OLS coefficients remain unbiased but standard errors are wrong.

In (2) DW = 2(1 - 0.2) = 1.6. Given T = 30 and k = 2, the critical values at the 5% level for Durbin-Watson are $d_{low}$ = 1.28 and $d_{upper}$ = 1.57. So now observed DW > $d_{upper}$. Conclude there is no evidence of positive first order autocorrelation. But (2) has a lagged dependent variable on the right hand side and we know this biases the DW value toward 2. So instead do the Durbin h test:

$$h = \hat{\rho}\,\sqrt{\frac{T}{1 - T \cdot \mathrm{Var}(Y_{t-1})}}$$

where $\hat{\rho}$ is estimated as 1 - (DW/2), T is the sample size and $\mathrm{Var}(Y_{t-1})$ is the square of the estimated standard error on the lagged dependent variable in equation (2) of the table above. So

$\hat{\rho}$ = 1 - (1.60/2) = 0.20
$\mathrm{Var}(Y_{t-1})$ = (0.100)^2 = 0.010

and

$$h = 0.20\,\sqrt{\frac{30}{1 - (30 \times 0.010)}} = 1.31$$

Under the null hypothesis h is approximately standard normal, so the rule is to reject the null of no autocorrelation if |h| > 1.96. Since 1.31 < 1.96, can't reject the null hypothesis of no autocorrelation.

If the standard error = 0.200 then $\mathrm{Var}(Y_{t-1})$ = (0.200)^2 = 0.040, so

$$h = 0.20\,\sqrt{\frac{30}{1 - (30 \times 0.040)}}$$

is undefined, since $T \cdot \mathrm{Var}(Y_{t-1})$ = 1.2 > 1 makes the term under the square root negative (so if the sample size is large enough it is better to try the Breusch-Godfrey test).

2. Read in the data set demand.dta from the web site. This file contains time series information on national expenditure levels of a variety of different consumption categories. Use the variables

    Income    household disposable income
    Hous      household housing expenditure

Generate the logs of household income and housing expenditure ( g lninc=log(income) g lnhous=log(hous) ). Regress the log of housing expenditure on the log of income. Interpret your results.

    . g linc=log(income)
    . g lhous=log(hous)
    . reg lhous linc

      Source |       SS       df       MS              Number of obs =      36
    ---------+------------------------------           F(  1,    34) =13484.77
       Model |  4.98049769     1  4.98049769           Prob > F      =  0.0000
    Residual |   .01255764    34  .000369342           R-squared     =  0.9975
    ---------+------------------------------           Adj R-squared =  0.9974
       Total |  4.99305533    35  .142658724           Root MSE      =  .01922

    ------------------------------------------------------------------------------
       lhous |      Coef.   Std. Err.        t    P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
        linc |   1.138336   .0098028   116.124    0.000       1.118415    1.158258
       _cons |  -3.123138   .0786026   -39.733    0.000      -3.282878   -2.963398
    ------------------------------------------------------------------------------

Since this is a log-linear model (both variables in logs), the coefficients are elasticities. So the income elasticity of housing demand = 1.14: a 1% increase in income leads to a 1.14% increase in housing expenditure (elastic).
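The arithmetic in the answers so far can be checked numerically. The following is a minimal Python sketch, not part of the original Stata session; the function names (`dw_from_rho`, `durbin_h`, `pct_change_in_y`) are hypothetical helpers introduced here for illustration, and the inputs are the figures quoted in Question 1 and in the regression output above.

```python
import math


def dw_from_rho(rho_hat):
    """Approximate Durbin-Watson statistic: DW ~ 2 * (1 - rho_hat)."""
    return 2.0 * (1.0 - rho_hat)


def durbin_h(rho_hat, n_obs, se_lagged_dep):
    """Durbin h = rho_hat * sqrt(T / (1 - T * Var)).

    Var is the squared standard error on the lagged dependent variable.
    Returns None when 1 - T * Var <= 0, where h is undefined.
    """
    denom = 1.0 - n_obs * se_lagged_dep ** 2
    if denom <= 0:
        return None
    return rho_hat * math.sqrt(n_obs / denom)


def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact % change in Y implied by a constant-elasticity (log-log) model."""
    return ((1.0 + pct_change_in_x / 100.0) ** elasticity - 1.0) * 100.0


# Question 1: DW implied by rho-hat in equations (1) and (2)
dw_eq1 = dw_from_rho(0.90)
dw_eq2 = dw_from_rho(0.20)

# Question 1: Durbin h with standard error 0.100, then 0.200 (undefined case)
h_se_010 = durbin_h(0.20, 30, 0.100)
h_se_020 = durbin_h(0.20, 30, 0.200)

# Question 2: effect of a 1% rise in income on housing expenditure
housing_response = pct_change_in_y(1.138336, 1.0)

print(f"DW (1) = {dw_eq1:.2f}, DW (2) = {dw_eq2:.2f}")
print(f"h with se 0.100 = {h_se_010:.2f}, h with se 0.200 = {h_se_020}")
print(f"1% more income -> {housing_response:.2f}% more housing expenditure")
```

Returning `None` when the term under the square root is not positive mirrors the point made in Question 1: with a standard error of 0.200 the h statistic cannot be computed at all, which is why another test is suggested.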
Now regress the log of housing expenditure on the log of income and the log of income lagged one year (to create lagged variables in Stata: sort year, then g linc1=lninc[_n-1] ). Find an estimate of the short and the long run elasticities of housing demand with respect to income.

    . g linc1=linc[_n-1]
    (1 missing value generated)
    . reg lhous linc linc1

      Source |       SS       df       MS              Number of obs =      35
    ---------+------------------------------           F(  2,    32) = 9249.59
       Model |  4.44935695     2  2.22467848           Prob > F      =  0.0000
    Residual |  .007696526    32  .000240516           R-squared     =  0.9983
    ---------+------------------------------           Adj R-squared =  0.9982
       Total |  4.45705348    34  .131089808           Root MSE      =  .01551

    ------------------------------------------------------------------------------
       lhous |      Coef.   Std. Err.        t    P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
        linc |   .3955737   .1735544     2.279    0.029       .0420549    .7490924
       linc1 |   .7219512   .1693625     4.263    0.000       .3769711    1.066931
       _cons |  -2.933258   .0767122   -38.237    0.000      -3.089516   -2.777001
    ------------------------------------------------------------------------------

Short-run elasticity is the coefficient on the current level of income = 0.40.
Long-run elasticity is the sum of the coefficients on all income variables = 0.40 + 0.72 = 1.12 (so the long run effect is similar to that implied by the 1st regression, as it should be).

Now add 2 and 3 year lags of log income. How do your results change? Calculate the correlation coefficients between log of income and these lagged values. Does this tell you there may be a problem with the data?

    . g linc2=linc[_n-2]
    (2 missing values generated)
    . g linc3=linc[_n-3]
    (3 missing values generated)
    . reg lhous linc linc1 linc2 linc3

      Source |       SS       df       MS              Number of obs =      33
    ---------+------------------------------           F(  4,    28) = 5217.72
       Model |  3.50313124     4  .875782811           Prob > F      =  0.0000
    Residual |  .004699736    28  .000167848           R-squared     =  0.9987
    ---------+------------------------------           Adj R-squared =  0.9985
       Total |  3.50783098    32  .109619718           Root MSE      =  .01296

    ------------------------------------------------------------------------------
       lhous |      Coef.   Std. Err.        t    P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
        linc |   .2763671   .1557681     1.774    0.087      -.0427093    .5954435
       linc1 |   .3315827   .2059345     1.610    0.119      -.0902548    .7534203
       linc2 |   .1449981   .2065425     0.702    0.488       -.278085    .5680812
       linc3 |   .3363033   .1427561     2.356    0.026       .0438808    .6287259
       _cons |  -2.676428   .0943476   -28.368    0.000       -2.86969   -2.483165
    ------------------------------------------------------------------------------

Note the t values are now mostly insignificant while the R-squared is still very high (a classic symptom of multicollinearity).

    . corr lhous linc linc1 linc2 linc3
    (obs=33)

             |    lhous     linc    linc1    linc2    linc3
    ---------+---------------------------------------------
       lhous |   1.0000
        linc |   0.9983   1.0000
       linc1 |   0.9988   0.9987   1.0000
       linc2 |   0.9988   0.9976   0.9987   1.0000
       linc3 |   0.9984   0.9965   0.9975   0.9987   1.0000

Can see the income variables are highly collinear.

Use the Koyck approximation to the regression specification above to find the short and long-run elasticities. How do your short and long-run elasticity estimates differ? Are the data from the last regression autocorrelated? If they are, how might this affect the interpretation of your results?

Need a lagged dependent variable to do the Koyck transformation. (No need to run estimates with lots of lags since this is equivalent to estimating the equation below; see lecture notes for details.)

$$\log(Hous)_t = a + b\,\log(Income)_t + \lambda\,\log(Hous)_{t-1} + \varepsilon_t$$

    . g lhous1=lhous[_n-1]
    (1 missing value generated)
    . tsset year
            time variable:  year, 1959 to 1994
    . regdw lhous linc lhous1

      Source |       SS       df       MS              Number of obs =      35
    ---------+------------------------------           F(  2,    32) =56655.84
       Model |  4.45579513     2  2.22789757           Prob > F      =  0.0000
    Residual |  .001258347    32  .000039323           R-squared     =  0.9997
    ---------+------------------------------           Adj R-squared =  0.9997
       Total |  4.45705348    34  .131089808           Root MSE      =  .00627

    ------------------------------------------------------------------------------
       lhous |      Coef.   Std. Err.        t    P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
        linc |   .2577097   .0529952     4.863    0.000       .1497619    .3656575
      lhous1 |   .7508883   .0452914    16.579    0.000       .6586328    .8431438
       _cons |  -.5438196   .1560751    -3.484    0.001      -.8617343   -.2259049
    ------------------------------------------------------------------------------

    Durbin-Watson Statistic = 1.51815
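The short- and long-run elasticities implied by the three dynamic regressions above follow from simple arithmetic on the reported coefficients. The sketch below is a Python illustration rather than part of the Stata session; the helper names `long_run_from_lags` and `koyck_long_run` are hypothetical, and the inputs are the unrounded coefficients copied from the output tables. The four-term sum for the regression with three lags is computed here for comparison only.

```python
def long_run_from_lags(income_coefs):
    """Long-run elasticity in a distributed lag model: sum of all income coefficients."""
    return sum(income_coefs)


def koyck_long_run(b, lam):
    """Long-run elasticity in the Koyck model: b / (1 - lambda), needs |lambda| < 1."""
    if not abs(lam) < 1:
        raise ValueError("lambda must lie strictly between -1 and 1")
    return b / (1.0 - lam)


# Coefficients on linc, linc1, ... copied from the regression output
one_lag = [0.3955737, 0.7219512]
three_lags = [0.2763671, 0.3315827, 0.1449981, 0.3363033]

# Koyck regression: coefficient on linc (b) and on lhous1 (lambda)
koyck_b = 0.2577097
koyck_lam = 0.7508883

lr_one_lag = long_run_from_lags(one_lag)
lr_three_lags = long_run_from_lags(three_lags)
lr_koyck = koyck_long_run(koyck_b, koyck_lam)

# The short-run elasticity is the coefficient on current income in each case
print(f"one lag:    short run {one_lag[0]:.2f}, long run {lr_one_lag:.2f}")
print(f"three lags: short run {three_lags[0]:.2f}, long run {lr_three_lags:.2f}")
print(f"Koyck:      short run {koyck_b:.2f}, long run {lr_koyck:.2f}")
```

Working from the unrounded coefficients, the Koyck long-run figure comes out slightly below the value obtained from coefficients rounded to two decimal places, so small differences in the second decimal are a rounding matter rather than a disagreement between methods.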
The coefficient on current income gives the short run elasticity (0.26). The coefficient on the lagged dependent variable is the estimate of λ, and the long-run multiplier is b/(1-λ) = 0.26/(1-0.75) = 1.04. So the estimate is close to, but not the same as, before.

Since the regression contains a lagged dependent variable, know that these estimates will be biased if there is autocorrelation in the residuals. For T = 35 and k = 3 (counting the constant, i.e. two regressors) the 5% Durbin-Watson bounds are $d_{low}$ = 1.34 and $d_{upper}$ = 1.58, and 1.34 < DW = 1.52 < 1.58, so the test is inconclusive as to the presence of autocorrelation. Conclude the estimates may be biased (though not by much).

3. The short-run multiplier is just the coefficient on the current value of Income:
= 0.5 in (1)
= 0.8 in (2)

The long run multiplier is given either by the sum of the coefficients on current and all lagged values of income or (see lecture notes) by LRM = b/(1-λ), where λ is the coefficient on the lagged dependent variable in (2).

In (1) LRM = 0.500 + 0.300 + 0.250 + 0.250 = 1.30
In (2) LRM = 0.8/(1-0.3) = 1.14

Would prefer the 2nd estimate since the 1st one appears to suffer from multicollinearity (high R-squared, low t values).

4. Compare (1) to the original dynamic model in levels

$$Y_t = a + b_0 X_t + b_1 X_{t-1} + b_2 X_{t-2} + \dots + b_k X_{t-k} + u_t \qquad (2)$$

and use the fact that

$$X_t = \Delta X_t + \Delta X_{t-1} + \dots + \Delta X_{t-k+1} + X_{t-k}$$

E.g. consider a model with a maximum of 2 lags

$$Y_t = a + b_0 X_t + b_1 X_{t-1} + b_2 X_{t-2} + u_t \qquad (3)$$

(1) suggests that (3) could also be written as

$$Y_t = g + d_0 \Delta X_t + d_1 \Delta X_{t-1} + d_2 X_{t-2} + u_t \qquad (4)$$

and that the coefficient on $X_{t-2}$ would be the long-run multiplier. How? Use the fact that
$$X_t = \Delta X_t + X_{t-1} = (X_t - X_{t-1}) + X_{t-1} = X_t$$

but equally

$$X_t = \Delta X_t + \Delta X_{t-1} + X_{t-2} = (X_t - X_{t-1}) + (X_{t-1} - X_{t-2}) + X_{t-2} = X_t$$

So (3) could be written

$$Y_t = a + b_0(\Delta X_t + \Delta X_{t-1} + X_{t-2}) + b_1(\Delta X_{t-1} + \Delta X_{t-2} + X_{t-3}) + b_2 X_{t-2} + u_t$$

$$= a + b_0 \Delta X_t + b_0 \Delta X_{t-1} + b_0 X_{t-2} + b_1 \Delta X_{t-1} + b_1 \Delta X_{t-2} + b_1 X_{t-3} + b_2 X_{t-2} + u_t$$

Collecting terms

$$Y_t = a + b_0 \Delta X_t + (b_0 + b_1)\Delta X_{t-1} + (b_0 + b_2) X_{t-2} + b_1(\Delta X_{t-2} + X_{t-3}) + u_t$$

Since $\Delta X_{t-2} = X_{t-2} - X_{t-3}$

$$Y_t = a + b_0 \Delta X_t + (b_0 + b_1)\Delta X_{t-1} + (b_0 + b_2) X_{t-2} + b_1(X_{t-2} - X_{t-3} + X_{t-3}) + u_t$$

The $X_{t-3}$ terms cancel, so

$$Y_t = a + b_0 \Delta X_t + (b_0 + b_1)\Delta X_{t-1} + (b_0 + b_1 + b_2) X_{t-2} + u_t$$

Comparing with (4),

$$Y_t = g + d_0 \Delta X_t + d_1 \Delta X_{t-1} + d_2 X_{t-2} + u_t \qquad (4)$$

this gives the result that

g = a
$d_0 = b_0$
$d_1 = b_0 + b_1$
$d_2 = b_0 + b_1 + b_2$

so $d_2$, the coefficient on $X_{t-2}$, is the long run multiplier estimate for periods 0 to 2.

5. Use 2 different versions of the Dickey-Fuller test (one with and one without a constant). GDP is trended upward so probably approximates a random walk with drift

$$Y_t = b_0 + Y_{t-1} + e_t$$
(Writing the coefficient on $Y_{t-1}$ as b, with b = 1 for a random walk, subtract $Y_{t-1}$ from both sides.) So

$$Y_t - Y_{t-1} = b_0 + b Y_{t-1} - Y_{t-1} + e_t$$

$$\Delta Y_t = b_0 + (b - 1) Y_{t-1} + e_t$$

$$\Delta Y_t = b_0 + g Y_{t-1} + e_t$$

and test whether g = b - 1 = 0 (if g = 0 then b = 1) and there is a random walk, i.e. the variable is non-stationary.

Unemployment is not trended but cyclical, therefore it may be better to use

$$Y_t = Y_{t-1} + e_t$$

i.e. a random walk with no drift. (Again write the coefficient on $Y_{t-1}$ as b and subtract $Y_{t-1}$ from both sides.) So

$$Y_t - Y_{t-1} = b Y_{t-1} - Y_{t-1} + e_t$$

$$\Delta Y_t = (b - 1) Y_{t-1} + e_t$$

$$\Delta Y_t = g Y_{t-1} + e_t$$

and test whether g = b - 1 = 0 (if g = 0 then b = 1) and there is a random walk, i.e. the variable is non-stationary.

The only difference is whether a constant is included in the regression, and the 5% critical values change depending on this (-1.94 with no constant, -2.86 with a constant). The null is rejected only if the t statistic is more negative than the critical value.

From the table, in (1) t = -1.0, which is not below -2.86, so can't reject the null that the coefficient is zero, and so GDP appears to be a random walk with drift, i.e. non-stationary. Using GDP in an OLS regression would lead to problems.

In (2) t = -4.0, which is below the critical value of -1.94 (|t| = 4.0 > 1.94), so reject the null that the coefficient is zero. Unemployment is not a random walk and so is stationary. OK to use this variable in OLS regressions.
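The decision rule used in both Dickey-Fuller tests can be written out as a small Python sketch. This is an illustration only: the function name `df_rejects_unit_root` and the dictionary `DF_CRITICAL_5PCT` are hypothetical, and the critical values are simply the two 5% figures quoted in the answer, not values looked up for a specific sample size.

```python
# 5% Dickey-Fuller critical values quoted in the answer,
# keyed by whether the test regression includes a constant.
DF_CRITICAL_5PCT = {False: -1.94, True: -2.86}


def df_rejects_unit_root(t_stat, with_constant):
    """Return True if the null g = 0 (unit root, non-stationary) is rejected.

    The test is one-sided: reject only when the t statistic on Y(t-1)
    is more negative than the critical value.
    """
    return t_stat < DF_CRITICAL_5PCT[with_constant]


# (1) GDP: regression with a constant, t = -1.0
gdp_rejects = df_rejects_unit_root(-1.0, with_constant=True)

# (2) Unemployment: regression without a constant, t = -4.0
unemp_rejects = df_rejects_unit_root(-4.0, with_constant=False)

print("GDP stationary at 5%?         ", gdp_rejects)
print("Unemployment stationary at 5%?", unemp_rejects)
```

Keeping the critical values negative and comparing with `<` avoids the sign confusion that arises when a negative t statistic is compared against the absolute value of the critical value.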