Spurious regressions (PE 1..)

The last lecture discussed the idea of non-stationarity and described how many economic time series are non-stationary. Unfortunately, non-stationarity trips up many of the conventional statistical techniques. It is possible to obtain apparently significant regression results from unrelated data when non-stationary series are used in regression analysis. Such regressions are said to be spurious.

The name and the publicity come from Granger & Newbold, "Spurious Regressions in Econometrics", Journal of Econometrics (1974). Granger & Newbold based their discussion on artificial sampling experiments (called Monte Carlo experiments), but in the 80s the phenomenon was given a theoretical explanation. The theory is much more sophisticated than the theory underlying the models and methods we have seen so far.

Udny Yule had some inkling of the phenomenon already in 1926! See his "Why Do We Sometimes Get Nonsense Correlations between Time-series? A Study in Sampling and the Nature of Time-series", Journal of the Royal Statistical Society. Yule focusses on correlation rather than on regression, but that's a detail.
PE 1. Fig. 1.3 (a) presents two artificially generated series. They are random walks, generated independently, and have no relation to one another. Each consists of 1000 observations. When they are cross-plotted in Figure 1.3 (b) there appears to be a clear positive relationship between them. The simple regression produces an $R^2$ of 0.70, a slope estimate of 0.84 and a t-ratio of 40.837. Such a large value for the t-ratio is apparently very strong evidence that there is a relationship between the series.

PE does not give the value of the Durbin-Watson statistic. It is very close to 0, indicating serial correlation in the errors. Look back at the discussion of DW to see that a DW of 0 corresponds to almost perfect positive serial correlation of the regression residuals.
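The single-pair exercise can be sketched in a few lines. This is only an illustration under my own assumptions (NumPy, an arbitrary seed, unit-variance shocks, a hypothetical helper called `ols_summary`); it will not reproduce the PE numbers, but it shows the same pattern, and it also checks the approximate relation DW ≈ 2(1 − r1), where r1 is the first-order autocorrelation of the residuals.

```python
import numpy as np

def ols_summary(y, x):
    """OLS of y on a constant and x; returns slope, t-ratio, R^2, DW and r1."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    rss = resid @ resid
    # Conventional OLS covariance matrix: s^2 (X'X)^{-1}
    cov = (rss / (n - k)) * np.linalg.inv(X.T @ X)
    t_slope = coef[1] / np.sqrt(cov[1, 1])
    r2 = 1.0 - rss / np.sum((y - y.mean()) ** 2)
    # Durbin-Watson: sum of squared first differences of residuals over RSS
    dw = np.sum(np.diff(resid) ** 2) / rss
    # First-order autocorrelation of the residuals
    r1 = (resid[1:] @ resid[:-1]) / rss
    return {"slope": coef[1], "t": t_slope, "r2": r2, "dw": dw, "r1": r1}

rng = np.random.default_rng(12345)          # arbitrary seed (assumption)
n = 1000
y = np.cumsum(rng.standard_normal(n))       # random walk 1
x = np.cumsum(rng.standard_normal(n))       # random walk 2, independent of y

res = ols_summary(y, x)
print(f"slope={res['slope']:.3f}  t={res['t']:.2f}  "
      f"R2={res['r2']:.3f}  DW={res['dw']:.3f}  2(1-r1)={2 * (1 - res['r1']):.3f}")
```

Changing the seed changes the sign and size of the slope, but the DW stays close to 0.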
When Granger & Newbold were writing, the popular wisdom was that serial correlation was not such a big deal; it was not appreciated that a low DW could be signalling completely misleading inferences. (See the stable case reported later.) Granger & Newbold suggested a rule of thumb when estimating regressions with time series data: if the value of $R^2$ is greater than the value of the Durbin-Watson statistic, then one should suspect a spurious regression.

A bigger Monte-Carlo experiment

PE illustrates what can happen by generating a single pair of series and doing a regression. Certain features of their results are accidental, like the sign of the slope or correlation found between the variables. To remove the suspicion that they have been lucky or unlucky with their samples I repeated their exercise 1000 times. This is essentially what Granger & Newbold did originally. Artificial sampling experiments are called Monte-Carlo experiments (you may have done some using Excel in
the first year). They are used particularly when it is hard to obtain a theoretical solution for a distribution.

I generated 1000 samples, where each sample consists of 100 observations on two variables generated as independent random walks:

\[ y_t = y_{t-1} + u_t, \quad u_t \sim IN(0,1) \]
\[ x_t = x_{t-1} + v_t, \quad v_t \sim IN(0,1). \]

(This is not quite the same experiment as in U/G E, for one of their series has a standard deviation of 0.5 and their samples have 1000 observations.) The y series and the x series will be independent if the u series and the v series on which they are based are independent. So I chose u and v to be independent.

For each of the 1000 samples I fitted the simple regression model

\[ y_t = \alpha + \beta x_t + \varepsilon_t. \]

With each sample containing 100 observations one expects reasonably good estimates of the parameters and accurate tests. Histograms based on the 1000 experiments show, in the top row, the values of the slope and intercept estimates; below them are the values of the t-statistics (for testing whether the parameter value is 0); and in the bottom right hand corner is the distribution of DW values.

The BIG BAD surprise is in the distribution of the t-statistics. The naive expectation is that they will be approximately N(0, 1). They are HUGELY more dispersed, and the chance of observing a significant result (one outside ±2) is much bigger than the chance of observing an insignificant result. The only good news is that the distribution of the DW is well away from the value of 2, which would
signal lack of serial correlation. The diagnostic test is working WELL and giving a warning.

[Figure 8: spurious regressions. Histogram panels: Yb, Constant, t-yb, t-constant, AR1, DW.]

Unrelated stationary ARs

For the sake of comparison I did the same experiments with the stable/stationary series

\[ y_t = 0.5\, y_{t-1} + u_t, \quad u_t \sim IN(0,1) \]
\[ x_t = 0.5\, x_{t-1} + v_t, \quad v_t \sim IN(0,1) \]

where again $u_t$ and $v_t$ are independent. Again I fitted the model

\[ y_t = \alpha + \beta x_t + \varepsilon_t. \]

The results for the distribution of the estimates and the t-statistics are very different from the non-stationary case and more like the results in Introduction to Econometrics.
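Both experiments can be mimicked with a short simulation. What follows is a sketch rather than the code behind the histograms: the seed, the starting value of 0 for each series, and the function names (`simulate_ar1`, `slope_t_r2_dw`, `monte_carlo`) are my own assumptions. It reports how often |t| exceeds 2 for the slope, the average DW, and how often the Granger & Newbold rule of thumb ($R^2$ > DW) fires.

```python
import numpy as np

def simulate_ar1(phi, n, rng):
    """y_t = phi * y_{t-1} + e_t, e_t ~ IN(0,1), started from 0 (assumption)."""
    e = rng.standard_normal(n)
    y = np.empty(n)
    prev = 0.0
    for t in range(n):
        prev = phi * prev + e[t]
        y[t] = prev
    return y

def slope_t_r2_dw(y, x):
    """Regress y on a constant and x; return slope t-ratio, R^2 and DW."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    rss = resid @ resid
    cov = (rss / (n - k)) * np.linalg.inv(X.T @ X)
    t_slope = coef[1] / np.sqrt(cov[1, 1])
    r2 = 1.0 - rss / np.sum((y - y.mean()) ** 2)
    dw = np.sum(np.diff(resid) ** 2) / rss
    return t_slope, r2, dw

def monte_carlo(phi, n_obs=100, n_rep=1000, seed=0):
    """Repeat the regression on n_rep independent pairs of AR(1) series."""
    rng = np.random.default_rng(seed)
    out = np.empty((n_rep, 3))
    for r in range(n_rep):
        y = simulate_ar1(phi, n_obs, rng)
        x = simulate_ar1(phi, n_obs, rng)   # drawn independently of y
        out[r] = slope_t_r2_dw(y, x)
    t, r2, dw = out.T
    return {
        "reject": np.mean(np.abs(t) > 2),   # share of 'significant' slopes
        "mean_dw": dw.mean(),
        "rule": np.mean(r2 > dw),           # share flagged by R^2 > DW
    }

rw = monte_carlo(phi=1.0)   # two independent random walks
ar = monte_carlo(phi=0.5)   # two independent stationary AR(1) series
for name, res in (("random walks", rw), ("AR(0.5)", ar)):
    print(f"{name:12s} reject={res['reject']:.3f}  "
          f"mean DW={res['mean_dw']:.3f}  R2>DW={res['rule']:.3f}")
```

With `phi=1.0` most of the slopes come out "significant" even though β = 0 by construction; with `phi=0.5` the rejection rate is above the nominal level but nowhere near as bad.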
[Figure 9: not so spurious. Histogram panels: Yb, Constant, t-yb, t-constant, AR1, DW.]

The t-tests are much better behaved: they are not massively different from the naive N(0, 1). There will be over-rejection of the (true) null that $\beta = 0$, but not the massive over-rejection found in the random walk case. The DW continues to indicate serial correlation because the residuals are picking up the serial correlation in $y_t$ ($= 0.5\, y_{t-1} + u_t$). In the present case the serial correlation messes up the standard errors of the estimates (and the t-statistics), but not disastrously.
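A rough idea of the size of the over-rejection can be had from a large-sample approximation. This is a back-of-envelope sketch, not part of the experiment: it assumes the standard asymptotic result that, for two independent stationary series, the variance of the conventional slope t-statistic is approximately the sum over all lags of the product of the two autocorrelation functions, and it writes $\rho_x(k) = \rho_y(k) = 0.5^{|k|}$ for the AR(1) series used here.

```latex
\[
\operatorname{Var}\bigl(t_{\hat\beta}\bigr)
  \approx \sum_{k=-\infty}^{\infty} \rho_x(k)\,\rho_y(k)
  = \sum_{k=-\infty}^{\infty} 0.25^{|k|}
  = \frac{1 + 0.25}{1 - 0.25}
  = \frac{5}{3},
\qquad
\operatorname{sd}\bigl(t_{\hat\beta}\bigr) \approx 1.29 .
\]
\[
P\bigl(|t_{\hat\beta}| > 2\bigr)
  \approx P\bigl(|Z| > 2/1.29\bigr)
  = P\bigl(|Z| > 1.55\bigr)
  \approx 0.12,
\qquad Z \sim N(0,1).
\]
```

So under these assumptions a test that is supposed to reject about 5% of the time rejects about 12% of the time: noticeable, but not disastrous.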